Data Processing/Manipulation

This section describes the data ingestion and manipulation layer of the library. It is the backbone of the library: its outputs feed directly into both the Metrics Calculations and Visualization sections.

Overview

There are two ways to load data into the library:

  • generate_dataframes() – use when you already have a MESH_output_streamflow.csv (or a similar pre-built CSV with interleaved QOMEAS_ / QOSIM_ columns).

  • generate_dataframes_from_mesh() – use when you want to go directly from raw MESH NetCDF outputs (QO_D_GRD.nc) and a .tb0 observed file, without first writing an intermediate CSV.

Both functions return the same dictionary structure, so all downstream metric and visualisation calls are identical regardless of which loader you use.

Reading MESH outputs directly (generate_dataframes_from_mesh)

This function combines the steps that were previously handled by the combine_mesh_sim_obs notebook script:

  1. Reads the MESH drainage-database NetCDF to get the subbasin → array-index mapping.

  2. Reads the stations GeoPackage (produced by the COMID-matching pre-processing step) to build the station → COMID lookup table.

  3. Reads simulated streamflow from one or more QO_D_GRD.nc NetCDF files.

  4. Reads observed streamflow from the .tb0 EnSim file, parsing the :StartTime header automatically.

  5. Aligns observed and simulated data to their overlapping date range.

  6. Returns the same DATAFRAMES dictionary as generate_dataframes().
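Step 5 (alignment) amounts to trimming two date-indexed tables to the dates they share. The pandas sketch below shows only that step; the helper name align_to_overlap and the toy values are illustrative assumptions, not postprocessinglib internals.

```python
import pandas as pd


def align_to_overlap(obs: pd.DataFrame, sim: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Trim two date-indexed frames to their common date range (hypothetical helper)."""
    start = max(obs.index.min(), sim.index.min())  # later of the two first dates
    end = min(obs.index.max(), sim.index.max())    # earlier of the two last dates
    return obs.loc[start:end], sim.loc[start:end]


# Toy data: observations start one day earlier, simulations end two days later.
obs = pd.DataFrame({"05BB001": [10.2, 9.85, 10.2, 10.0]},
                   index=pd.date_range("1980-12-30", periods=4, freq="D"))
sim = pd.DataFrame({"05BB001": [2.53, 2.52, 2.51, 2.50, 2.48]},
                   index=pd.date_range("1980-12-31", periods=5, freq="D"))

obs_aligned, sim_aligned = align_to_overlap(obs, sim)
print(obs_aligned.index[0].date(), obs_aligned.index[-1].date())  # 1980-12-31 1981-01-02
```

Trimming by the later start and the earlier end keeps every date inside both ranges; it does not fill gaps within that window.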

Required pre-processing files (produced once per study domain):

  • combined_discharge_stations_comids.gpkg — gauge stations with COMID assignments (produced by the COMID-matching notebook).

  • MESH_input_streamflow_latlon.tb0 — observed streamflow in EnSim format (produced by GenStreamflowAsync).

  • MESH_drainage_database_*.nc — MESH drainage database NetCDF.

Single model run

from postprocessinglib.evaluation import data, metrics, visuals

DATAFRAMES = data.generate_dataframes_from_mesh(
    input_stations_comids="combined_discharge_stations_comids.gpkg",
    input_obs="MESH_input_streamflow_latlon.tb0",
    input_ddb="MESH_drainage_database.nc",
    mesh_flow="QO_D_GRD.nc",
    warm_up=365,
)

# Metrics — identical to the CSV-based workflow
results = metrics.calculate_all_metrics(
    observed=DATAFRAMES["DF_OBSERVED"],
    simulated=DATAFRAMES["DF_SIMULATED"],
)

Multiple model runs

Pass a list of NetCDF paths to compare several runs at once. The function returns one DF_SIMULATED_1, DF_SIMULATED_2, … key per run:

DATAFRAMES = data.generate_dataframes_from_mesh(
    input_stations_comids="combined_discharge_stations_comids.gpkg",
    input_obs="MESH_input_streamflow_latlon.tb0",
    input_ddb="MESH_drainage_database.nc",
    mesh_flow=[
        "run1/QO_D_GRD.nc",
        "run2/QO_D_GRD.nc",
    ],
    warm_up=365,
)

results_run1 = metrics.calculate_all_metrics(
    observed=DATAFRAMES["DF_OBSERVED"],
    simulated=DATAFRAMES["DF_SIMULATED_1"],
)
results_run2 = metrics.calculate_all_metrics(
    observed=DATAFRAMES["DF_OBSERVED"],
    simulated=DATAFRAMES["DF_SIMULATED_2"],
)
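When the number of runs varies, the per-run keys can be collected programmatically instead of being written out by hand. A minimal sketch, assuming only the DF_SIMULATED_<n> naming described above; simulated_keys is a hypothetical helper, and the dictionary is a toy stand-in for the loader's return value (its values would be DataFrames in practice).

```python
import re


def simulated_keys(dataframes: dict) -> list[str]:
    """Return DF_SIMULATED_1, DF_SIMULATED_2, ... sorted by their numeric suffix."""
    pattern = re.compile(r"^DF_SIMULATED_(\d+)$")
    matches = [(int(m.group(1)), key) for key in dataframes if (m := pattern.match(key))]
    return [key for _, key in sorted(matches)]


# Toy stand-in for the loader output.
DATAFRAMES = {"DF_OBSERVED": None, "DF_SIMULATED_10": None,
              "DF_SIMULATED_2": None, "DF_SIMULATED_1": None}
print(simulated_keys(DATAFRAMES))
# ['DF_SIMULATED_1', 'DF_SIMULATED_2', 'DF_SIMULATED_10']
```

Sorting on the integer suffix keeps DF_SIMULATED_10 after DF_SIMULATED_2, which a plain string sort would not. Each key can then be passed as simulated=DATAFRAMES[key] to metrics.calculate_all_metrics().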

Station filtering and aggregation

Both loaders accept the same filtering and aggregation flags:

DATAFRAMES = data.generate_dataframes_from_mesh(
    ...,
    keep_stations=["05BB001", "05BA001"],   # subset of stations
    monthly_agg=True, ma_method="mean",
    yearly_agg=True,  ya_method="sum",
    long_term=True,
    warm_up=365,
)

# Extra keys are ready to use immediately
monthly  = DATAFRAMES["DF_MONTHLY"]
lt_mean  = DATAFRAMES["LONG_TERM_MEAN"]
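The keep_stations / drop_stations behaviour (documented under generate_dataframes_from_mesh below) can be illustrated on a flat DataFrame with QOMEAS_<ID> / QOSIM_<ID> columns. This is a sketch under that column-naming assumption; filter_stations is hypothetical and is not the library's implementation.

```python
import pandas as pd


def filter_stations(df, keep=None, drop=None):
    """Keep or drop QOMEAS_/QOSIM_ column pairs by station ID (hypothetical helper)."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")

    def station(col):
        # "QOSIM_05BB001" -> "05BB001": everything after the first underscore.
        return col.split("_", 1)[1]

    if keep is not None:
        wanted = set(keep)
        cols = [c for c in df.columns if station(c) in wanted]
    elif drop is not None:
        unwanted = set(drop)
        cols = [c for c in df.columns if station(c) not in unwanted]
    else:
        cols = list(df.columns)
    return df[cols]


nan = float("nan")
df = pd.DataFrame(
    [[10.20, 2.530770, nan, 1.006860], [9.85, 2.518999, nan, 1.001954]],
    columns=["QOMEAS_05BB001", "QOSIM_05BB001", "QOMEAS_05BA001", "QOSIM_05BA001"],
)
print(filter_stations(df, keep=["05BB001"]).columns.tolist())
# ['QOMEAS_05BB001', 'QOSIM_05BB001']
```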

Reading from a pre-built CSV (generate_dataframes)

If you have already exported a MESH_output_streamflow.csv (e.g. from a previous notebook run), use generate_dataframes() directly:

DATAFRAMES = data.generate_dataframes(
    csv_fpaths=["MESH_output_streamflow.csv"],
    warm_up=365,
)

For multiple CSV files (one per model run):

DATAFRAMES = data.generate_dataframes(
    csv_fpaths=[
        "run1/MESH_output_streamflow.csv",
        "run2/MESH_output_streamflow.csv",
    ],
    warm_up=365,
)
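The interleaved QOMEAS_ / QOSIM_ layout can be split into observed and simulated parts with column filters, and warm_up can be read as "drop that many leading daily rows". The sketch below assumes a single date column named DATE and uses toy values; the real MESH_output_streamflow.csv may store dates differently, and this is not how generate_dataframes() is implemented internally.

```python
import io

import pandas as pd

# Toy CSV with interleaved observed (QOMEAS_) and simulated (QOSIM_) columns.
csv_text = """DATE,QOMEAS_05BB001,QOSIM_05BB001,QOMEAS_05BA001,QOSIM_05BA001
1980-12-29,10.4,2.55,,1.02
1980-12-30,10.3,2.54,,1.01
1980-12-31,10.2,2.53,,1.01
1981-01-01,9.85,2.52,,1.00
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="DATE", parse_dates=True)

warm_up = 2              # number of leading daily rows discarded as spin-up
df = df.iloc[warm_up:]

observed = df.filter(regex=r"^QOMEAS_")   # observed-only columns
simulated = df.filter(regex=r"^QOSIM_")   # simulated-only columns
print(observed.columns.tolist())  # ['QOMEAS_05BB001', 'QOMEAS_05BA001']
print(df.index[0].date())         # 1980-12-31
```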

Returned dictionary keys

Both loaders return a dictionary with the following keys:

  • "DF" – Flat merged DataFrame (single run).

  • "DF_1", "DF_2", … – Flat merged DataFrames, one per run (multiple runs).

  • "DF_OBSERVED" – Observed-only DataFrame (QOMEAS_* columns).

  • "DF_SIMULATED" – Simulated-only DataFrame (single run).

  • "DF_SIMULATED_1", … – Per-run simulated DataFrames (multiple runs).

  • "DF_MERGED" – MultiIndex-column DataFrame (station × variable).

  • "DF_DAILY" – Daily aggregate (when daily_agg=True).

  • "DF_WEEKLY" – Weekly aggregate (when weekly_agg=True).

  • "DF_MONTHLY" – Monthly aggregate (when monthly_agg=True).

  • "DF_YEARLY" – Yearly aggregate (when yearly_agg=True).

  • "DF_CUSTOM" – Seasonal-period subset (when seasonal_p=True).

  • "LONG_TERM_MIN" / "MAX" / "MEDIAN" – Long-term seasonal aggregates (when long_term=True).

  • "DF_STATS" – Cross-simulation statistics (when stat_agg=True).
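The station × variable columns of DF_MERGED can be pictured by splitting each flat column name at its first underscore. A sketch, assuming the variable level holds the QOMEAS / QOSIM prefixes; the level names station and variable are illustrative, and the library's actual labels may differ.

```python
import pandas as pd

nan = float("nan")
flat = pd.DataFrame(
    [[10.20, 2.530770, nan, 1.006860]],
    columns=["QOMEAS_05BB001", "QOSIM_05BB001", "QOMEAS_05BA001", "QOSIM_05BA001"],
    index=pd.to_datetime(["1980-12-31"]),
)

# "QOSIM_05BB001" -> (station, variable) = ("05BB001", "QOSIM")
tuples = [(col.split("_", 1)[1], col.split("_", 1)[0]) for col in flat.columns]
merged = flat.copy()
merged.columns = pd.MultiIndex.from_tuples(tuples, names=["station", "variable"])

# Selecting a station returns its observed/simulated pair.
print(merged["05BB001"].columns.tolist())  # ['QOMEAS', 'QOSIM']
```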

Aggregation helpers

The following standalone helpers can also be applied to any DataFrame with a DatetimeIndex:

  • daily_aggregate() – Aggregate by day of year.

  • weekly_aggregate() – Aggregate by calendar week.

  • monthly_aggregate() – Aggregate by month.

  • yearly_aggregate() – Aggregate by year.

  • long_term_seasonal() – Long-term seasonal mean/min/max/quantile.

  • seasonal_period() – Slice to a recurring calendar window.

  • stat_aggregate() – Aggregate across multiple simulation columns.

  • twelve_hour_aggregate() – Aggregate to 12-hour intervals.

  • station_dataframe() – Extract individual station DataFrames.

Loaders

postprocessinglib.evaluation.data.generate_dataframes_from_mesh(input_stations_comids: str, input_obs: str, input_ddb: str, mesh_flow, warm_up: int = 0, start_date: str = '', end_date: str = '', daily_agg: bool = False, da_method: str = '', weekly_agg: bool = False, wa_method: str = '', monthly_agg: bool = False, ma_method: str = '', yearly_agg: bool = False, ya_method: str = '', seasonal_p: bool = False, sp_dperiod: tuple = [], sp_subset: tuple = None, long_term: bool = False, lt_method=None, stat_agg: bool = False, stat_method: str = None, keep_stations: list = None, drop_stations: list = None, nc_flow_var: str = 'QO', missing_val: float = -1.0) → dict

Generate the required dataframes directly from MESH NetCDF outputs and a .tb0 observed streamflow file, bypassing the need to write an intermediate MESH_output_streamflow.csv.

The returned dictionary has the same structure as generate_dataframes, so all downstream metrics and visualisation functions work identically.

Parameters:
  • input_stations_comids (str) – Path to the GeoPackage (points layer) that maps each gauge station (column Obs_NM) to its drainage-network COMID and source agency (column SRC_obs). Stations with SRC_obs == "USGS" are sorted after non-USGS stations, matching the column order in the .tb0 file produced by GenStreamflowAsync.

  • input_obs (str) – Path to the observed streamflow file in EnSim .tb0 format.

  • input_ddb (str) – Path to the MESH drainage-database NetCDF containing the subbasin variable (integer COMID for every routed segment).

  • mesh_flow (str or list of str) – Path(s) to one or more MESH simulated-flow NetCDF files. Each file must contain a 2-D variable <nc_flow_var> with dimensions (time, subbasin) (e.g. QO_D_GRD.nc). Pass a list to compare multiple model runs.

  • warm_up (int, optional) – Number of leading days to discard (spin-up period). Default 0.

  • start_date (str, optional) – Earliest date to keep ("YYYY-MM-DD"). Applied after warm-up.

  • end_date (str, optional) – Latest date to keep ("YYYY-MM-DD"). Applied after warm-up.

  • daily_agg (bool, optional) – If True, compute the daily aggregate using da_method.

  • da_method (str, optional) – Aggregation method for the daily aggregate (see daily_aggregate()).

  • weekly_agg (bool, optional) – If True, compute the weekly aggregate using wa_method.

  • wa_method (str, optional) – Aggregation method for the weekly aggregate (see weekly_aggregate()).

  • monthly_agg (bool, optional) – If True, compute the monthly aggregate using ma_method.

  • ma_method (str, optional) – Aggregation method for the monthly aggregate (see monthly_aggregate()).

  • yearly_agg (bool, optional) – If True, compute the yearly aggregate using ya_method.

  • ya_method (str, optional) – Aggregation method for the yearly aggregate (see yearly_aggregate()).

  • seasonal_p (bool, optional) – If True, apply seasonal-period filtering (same semantics as generate_dataframes).

  • sp_dperiod (tuple, optional) – Start and end of the seasonal period (same semantics as generate_dataframes).

  • sp_subset (tuple, optional) – Start and end dates of the time range (same semantics as generate_dataframes).

  • long_term (bool, optional) – If True, compute long-term seasonal aggregates (same semantics as generate_dataframes).

  • lt_method (list, optional) – Extra long-term aggregates to create (same semantics as generate_dataframes).

  • stat_agg (bool, optional) – If True, compute cross-simulation statistics.

  • stat_method (str, optional) – Extra statistics to compute across simulations.

  • keep_stations (list of str, optional) – Station IDs (Obs_NM values) to keep; all others are dropped. Mutually exclusive with drop_stations.

  • drop_stations (list of str, optional) – Station IDs to drop. Mutually exclusive with keep_stations.

  • nc_flow_var (str, optional) – Name of the streamflow variable inside the MESH NetCDF. Default "QO".

  • missing_val (float, optional) – Sentinel value for missing data in the .tb0 file. Default -1.0.
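To show how the :StartTime header, the data rows and missing_val fit together, here is a simplified reader. The toy header, the YYYY/MM/DD date format, the one-column-per-station layout and the daily time step are all assumptions; real EnSim .tb0 files carry more header keywords, and read_tb0 is hypothetical.

```python
from datetime import datetime

import pandas as pd

# Toy text that only mimics the general shape of a .tb0 table.
TB0_TEXT = """#########################################
:StartTime 1981/01/01 00:00:00.000
:EndHeader
10.20  -1.0
9.85   3.10
-1.0   3.05
"""


def read_tb0(text, station_ids, missing_val=-1.0):
    lines = text.splitlines()
    start, data_start = None, None
    for i, line in enumerate(lines):
        if line.startswith(":StartTime"):
            # Keep only the date part, e.g. "1981/01/01".
            start = datetime.strptime(line.split()[1], "%Y/%m/%d")
        elif line.startswith(":EndHeader"):
            data_start = i + 1  # data rows follow the header
            break
    if start is None or data_start is None:
        raise ValueError("missing :StartTime or :EndHeader")
    rows = [[float(v) for v in ln.split()] for ln in lines[data_start:] if ln.strip()]
    index = pd.date_range(start, periods=len(rows), freq="D")  # assumes daily steps
    df = pd.DataFrame(rows, columns=station_ids, index=index)
    return df.replace(missing_val, float("nan"))  # sentinel value -> NaN


obs = read_tb0(TB0_TEXT, ["05BB001", "05BA001"])
print(obs.isna().sum().tolist())  # [1, 1]
```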

Returns:

Same dictionary structure as generate_dataframes:

  • "DF" – flat merged DataFrame (single mesh_flow).

  • "DF_1", "DF_2", … – flat merged DataFrames per run (multiple mesh_flow files).

  • "DF_OBSERVED" – observed-only DataFrame.

  • "DF_SIMULATED" – simulated-only DataFrame (single run).

  • "DF_SIMULATED_1", … – per-run simulated DataFrames.

  • "DF_MERGED" – MultiIndex-column merged DataFrame.

  • Optional aggregation keys ("DF_DAILY", "DF_MONTHLY", "LONG_TERM_MIN", etc.) when requested.

Return type:

dict[str, pd.DataFrame]

Examples

Single model run:

>>> from postprocessinglib.evaluation import data, metrics
>>> DATAFRAMES = data.generate_dataframes_from_mesh(
...     input_stations_comids="combined_discharge_stations_comids.gpkg",
...     input_obs="MESH_input_streamflow_latlon.tb0",
...     input_ddb="MESH_drainage_database.nc",
...     mesh_flow="QO_D_GRD.nc",
...     warm_up=365,
... )
>>> results = metrics.calculate_all_metrics(
...     observed=DATAFRAMES["DF_OBSERVED"],
...     simulated=DATAFRAMES["DF_SIMULATED"],
... )

Multiple model runs:

>>> DATAFRAMES = data.generate_dataframes_from_mesh(
...     input_stations_comids="combined_discharge_stations_comids.gpkg",
...     input_obs="MESH_input_streamflow_latlon.tb0",
...     input_ddb="MESH_drainage_database.nc",
...     mesh_flow=["run1/QO_D_GRD.nc", "run2/QO_D_GRD.nc"],
...     warm_up=365,
... )
>>> results = metrics.calculate_all_metrics(
...     observed=DATAFRAMES["DF_OBSERVED"],
...     simulated=DATAFRAMES["DF_SIMULATED_1"],
... )

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.generate_dataframes(csv_fpaths: list = None, sim_fpaths: list = None, obs_fpath: str = '', warm_up: int = 0, start_date: str = '', end_date: str = '', daily_agg: bool = False, da_method: str = '', weekly_agg: bool = False, wa_method: str = '', monthly_agg: bool = False, ma_method: str = '', yearly_agg: bool = False, ya_method: str = '', seasonal_p: bool = False, sp_dperiod: tuple[str, str] = [], sp_subset: tuple[str, str] = None, long_term: bool = False, lt_method=None, stat_agg: bool = False, stat_method: str = None, keep_stations: list[str] | None = None, drop_stations: list[str] | None = None, keep_numbers: list[int] | None = None, drop_numbers: list[int] | None = None) → dict[str, pandas.core.frame.DataFrame]

Generate the required DataFrames from one or more pre-built CSV files (or from separate simulated and observed CSV files).

Parameters:
  • csv_fpaths (list of str) – Path(s) to the CSV file(s), relative or absolute; pass one file per model run. If given, sim_fpaths and obs_fpath must be None.

  • sim_fpaths (list of str) – Path(s) to the simulated-data CSV file(s). If given, obs_fpath must also be given and csv_fpaths must be None.

  • obs_fpath (str) – Path to the observed-data CSV file. If given, sim_fpaths must also be given and csv_fpaths must be None.

  • warm_up (int) – Number of leading days to discard as the model “warm-up” (spin-up) period.

  • start_date (str) – Date from which to start the calculations, in YYYY-MM-DD format.

  • end_date (str) – Date at which to end the calculations, in YYYY-MM-DD format.

  • daily_agg (bool = False) – If True, calculate and return the daily aggregate of the combined DataFrames using da_method, if provided.

  • da_method (str = "") – Method of daily aggregation. Defaults to “mean”; see daily_aggregate().

  • weekly_agg (bool = False) – If True, calculate and return the weekly aggregate of the combined DataFrames using wa_method, if provided.

  • wa_method (str = "") – Method of weekly aggregation. Defaults to “mean”; see weekly_aggregate().

  • monthly_agg (bool = False) – If True, calculate and return the monthly aggregate of the combined DataFrames using ma_method, if provided.

  • ma_method (str = "") – Method of monthly aggregation. Defaults to “mean”; see monthly_aggregate().

  • yearly_agg (bool = False) – If True, calculate and return the yearly aggregate of the combined DataFrames using ya_method, if provided.

  • ya_method (str = "") – Method of yearly aggregation. Defaults to “mean”; see yearly_aggregate().

  • seasonal_p (bool = False) – If True, calculate and return a DataFrame truncated to the seasonal period. Requires sp_dperiod.

  • sp_dperiod (tuple(str, str)) – Start and end dates of the seasonal period as two “MM-DD” strings (e.g. ("01-01", "01-31") for January 1 to January 31).

  • sp_subset (tuple(str, str)) – Start and end dates of the time range, in YYYY-MM-DD format.

  • long_term (bool = False) – If True, calculate the min, max and median values of the long-term seasonal aggregate. Additional DataFrames are created depending on the value of lt_method.

  • lt_method (list[str]) – Specifies extra long-term DataFrames to create.

  • stat_agg (bool = False) – If True, calculate the min, max and median values across the simulations at every timestep (every day). Additional series are created depending on the value of stat_method. The result is a DataFrame of the aggregated values.

  • stat_method (list[str]) – Specifies extra statistical aggregations to perform.

  • keep_stations (list of str, optional) – Station IDs to keep; all others are dropped. Mutually exclusive with drop_stations.

  • drop_stations (list of str, optional) – Station IDs to drop. Mutually exclusive with keep_stations.
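The seasonal-period option keeps only dates whose month and day fall inside sp_dperiod in every year. A minimal pandas sketch of that idea follows; seasonal_window is hypothetical, and the wrap-around branch (for windows that cross December 31) is an assumption about useful behaviour, not something documented for the library.

```python
import pandas as pd


def seasonal_window(df, start_mmdd, end_mmdd):
    """Keep rows whose month-day lies in [start_mmdd, end_mmdd] in every year (hypothetical)."""
    # Zero-padded "MM-DD" strings compare correctly as plain strings.
    mmdd = pd.Series(df.index.strftime("%m-%d"), index=df.index)
    if start_mmdd <= end_mmdd:
        mask = (mmdd >= start_mmdd) & (mmdd <= end_mmdd)
    else:  # window wraps past December 31, e.g. ("11-01", "03-31")
        mask = (mmdd >= start_mmdd) | (mmdd <= end_mmdd)
    return df[mask]


idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"QOSIM_05BB001": range(len(idx))}, index=idx, dtype=float)

january = seasonal_window(df, "01-01", "01-31")
print(len(january))  # 62 (31 days in each of two years)
```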

Returns:

A dictionary containing each requested DataFrame. Its default content is:

  • DF = merged dataframe

  • DF_SIMULATED = all simulated data

  • DF_OBSERVED = all observed data

Depending on which you requested it can also contain:

  • DF_DAILY = dataframe aggregated by days of the year

  • DF_WEEKLY = dataframe aggregated by the weeks of the year

  • DF_MONTHLY = dataframe aggregated by months of the year

  • DF_YEARLY = dataframe aggregated by all the years in the data

  • DF_CUSTOM = dataframe truncated as per the seasonal period parameters

  • DF_STATS = dataframe aggregated by the statistics of the data across the datetime

  • DF_LONGTERM_MIN = long term seasonal dataframe aggregated using the min of its daily values

  • DF_LONGTERM_MAX = long term seasonal dataframe aggregated using the max of its daily values

  • DF_LONGTERM_MEAN = long term seasonal dataframe aggregated using the mean of its daily values

    Depending on “lt_method,” you can also request that it contain:

    • DF_LONGTERM_SUM = long term seasonal dataframe aggregated using the sum of its daily values

    • DF_LONGTERM_MEDIAN = long term seasonal dataframe aggregated using the median of its daily values

    • DF_LONGTERM_Q1 = long term seasonal dataframe aggregated showing the first quartile of its daily values

    • DF_LONGTERM_Q2 = long term seasonal dataframe aggregated showing the second quartile of its daily values

    • DF_LONGTERM_Q3 = long term seasonal dataframe aggregated showing the third quartile of its daily values

Return type:

dict[str, pd.DataFrame]

Example

See the linked Jupyter notebook for usage examples.
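For stat_agg, the statistics are computed across simulations at each timestep. A toy illustration with three hypothetical runs of one station; the column names run1 to run3 are assumptions, and DF_STATS itself may be laid out differently.

```python
import pandas as pd

# Toy simulated flows for one station from three model runs.
sims = pd.DataFrame(
    {"run1": [2.0, 3.0], "run2": [4.0, 1.0], "run3": [3.0, 5.0]},
    index=pd.to_datetime(["1981-01-01", "1981-01-02"]),
)

# Row-wise statistics: one value per timestep, computed across the runs.
stats = pd.DataFrame({
    "min": sims.min(axis=1),
    "max": sims.max(axis=1),
    "median": sims.median(axis=1),
})
print(stats)
```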

Aggregation helpers

postprocessinglib.evaluation.data.daily_aggregate(df: DataFrame, method: str = 'mean', use_doy_index: bool = True) → DataFrame

Aggregate a DataFrame by day using the specified method.

Parameters:
  • df (pd.DataFrame) – DataFrame with a datetime index and numerical columns.

  • method (str) – Aggregation method: “mean”, “sum”, “median”, “min”, “max”, or “inst” (last value of day).

  • use_doy_index (bool) – If True, index is formatted as “YYYY/DOY” (e.g., “1981/001”), else keep datetime index.

Returns:

Aggregated DataFrame indexed by day.

Return type:

pd.DataFrame

Examples

Extraction of a Daily Aggregate

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12-31           10.20       2.530770            NaN       1.006860
1981-01-01            9.85       2.518999            NaN       1.001954
1981-01-02           10.20       2.507289            NaN       0.997078
1981-01-03           10.00       2.495637            NaN       0.992233
1981-01-04           10.10       2.484073            NaN       0.987417
...                    ...            ...             ...            ...
2017-12-27             NaN       4.418050            NaN       1.380227
2017-12-28             NaN       4.393084            NaN       1.372171
2017-12-29             NaN       4.368303            NaN       1.364174
2017-12-30             NaN       4.343699            NaN       1.356237
2017-12-31             NaN       4.319275            NaN       1.348359
>>> # Extract the daily aggregate by mean (the default aggregation method)
>>> daily_agg = data.daily_aggregate(df=merged_df)
>>> print(daily_agg)
              QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
    1980/366           10.20       2.530770             NaN       1.006860
    1981/001            9.85       2.518999             NaN       1.001954
    1981/002           10.20       2.507289             NaN       0.997078
    1981/003           10.00       2.495637             NaN       0.992233
    1981/004           10.10       2.484073             NaN       0.987417
    ...                  ...            ...             ...            ...
    2017/361             NaN       4.418050             NaN       1.380227
    2017/362             NaN       4.393084             NaN       1.372171
    2017/363             NaN       4.368303             NaN       1.364174
    2017/364             NaN       4.343699             NaN       1.356237
    2017/365             NaN       4.319275             NaN       1.348359
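The “YYYY/DOY” labels combine the year with the zero-padded day of year, which strftime produces with %Y/%j. A sketch of the idea on toy 6-hourly data; it is not daily_aggregate()'s own implementation.

```python
import numpy as np
import pandas as pd

# Toy 6-hourly series spanning the 1980/1981 boundary (1980 is a leap year).
idx = pd.date_range("1980-12-31", periods=8, freq=pd.offsets.Hour(6))
df = pd.DataFrame({"QOSIM_05BB001": np.arange(8, dtype=float)}, index=idx)

daily = df.resample("D").mean()               # one row per calendar day
daily.index = daily.index.strftime("%Y/%j")   # "YYYY/DOY" with a 3-digit day of year
print(daily)
```

Because 1980 is a leap year, December 31 is day 366, which matches the 1980/366 row above.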

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.weekly_aggregate(df: DataFrame, method: str = 'mean', use_week_start_index: bool = True) → DataFrame

Aggregates a DataFrame by week using the specified method.

Parameters:
  • df (pd.DataFrame) – DataFrame with datetime index and numerical columns.

  • method (str) – Aggregation method: “mean”, “sum”, “median”, “min”, “max”, “inst” (last value).

  • use_week_start_index (bool) – If True, the index will be the Monday of each week; otherwise, uses “YYYY.WW” string format.

Returns:

Aggregated DataFrame indexed by week.

Return type:

pd.DataFrame

Examples

Extraction of a Weekly Aggregate

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
                QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
    1980-12-31           10.20       2.530770            NaN       1.006860
    1981-01-01            9.85       2.518999            NaN       1.001954
    1981-01-02           10.20       2.507289            NaN       0.997078
    1981-01-03           10.00       2.495637            NaN       0.992233
    1981-01-04           10.10       2.484073            NaN       0.987417
    ...                   ...            ...             ...            ...
    2017-12-27             NaN       4.418050            NaN       1.380227
    2017-12-28             NaN       4.393084            NaN       1.372171
    2017-12-29             NaN       4.368303            NaN       1.364174
    2017-12-30             NaN       4.343699            NaN       1.356237
    2017-12-31             NaN       4.319275            NaN       1.348359
>>> # Extract the weekly aggregate by taking the minimum value per week
>>> weekly_agg = data.weekly_aggregate(df=merged_df, method="min")
>>> print(weekly_agg)
             QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
    1980.52           10.20       2.530770             NaN       1.006860
    1981.00            9.85       2.495637             NaN       0.992233
    1981.01            8.70       2.416050             NaN       0.959137
    1981.02            8.24       2.339655             NaN       0.927429
    1981.03            7.86       2.266305             NaN       0.897038
    ...                 ...            ...             ...            ...
    2017.49             NaN       4.900197             NaN       1.536146
    2017.50             NaN       4.705044             NaN       1.472965
    2017.51             NaN       4.519744             NaN       1.413064
    2017.52             NaN       4.343699             NaN       1.356237
    2017.53             NaN       4.319275             NaN       1.348359
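With use_week_start_index=True, each week is labelled by its Monday. One way to compute that label in pandas is to subtract each date's dayofweek (Monday is 0) from the date and then group on the result. This sketch assumes “Monday of each week” means the Monday on or before each date; it is not necessarily how weekly_aggregate() builds its index.

```python
import pandas as pd

idx = pd.date_range("1981-01-01", "1981-01-11", freq="D")  # Thu 1 Jan to Sun 11 Jan 1981
df = pd.DataFrame({"QOSIM_05BB001": range(len(idx))}, index=idx, dtype=float)

# Shift every date back to the Monday that starts its week.
week_start = idx - pd.to_timedelta(idx.dayofweek, unit="D")
weekly = df.groupby(week_start).min()
print(weekly)  # rows labelled 1980-12-29 and 1981-01-05
```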

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.monthly_aggregate(df: DataFrame, method: str = 'mean') → DataFrame

Returns the monthly aggregate of a given DataFrame based on the chosen method.

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with a datetime index and columns containing float type values.

  • method (str) – Aggregation method: “mean”, “min”, “max”, “median”, “sum” or “inst” (instantaneous). Default is “mean”.

Returns:

A new DataFrame with the values aggregated by month (one row per calendar month in the data).

Return type:

pd.DataFrame

Examples

Extraction of a Monthly Aggregate

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12-31           10.20       2.530770            NaN       1.006860
1981-01-01            9.85       2.518999            NaN       1.001954
1981-01-02           10.20       2.507289            NaN       0.997078
1981-01-03           10.00       2.495637            NaN       0.992233
1981-01-04           10.10       2.484073            NaN       0.987417
...                    ...            ...             ...            ...
2017-12-27             NaN       4.418050            NaN       1.380227
2017-12-28             NaN       4.393084            NaN       1.372171
2017-12-29             NaN       4.368303            NaN       1.364174
2017-12-30             NaN       4.343699            NaN       1.356237
2017-12-31             NaN       4.319275            NaN       1.348359
>>> # Extract the monthly aggregate by taking the instantaneous value of each month
>>> monthly_agg = data.monthly_aggregate(df=merged_df, method="inst")
>>> print(monthly_agg)
         QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12           10.20       2.530770            NaN       1.006860
1981-01            8.62       2.195846            NaN       0.867900
1981-02            7.20       1.940355            NaN       0.762678
1981-03            7.25       1.699932            NaN       0.664341
1981-04           15.30       3.859564            NaN       0.584523
...                 ...            ...             ...            ...
2017-08             NaN      31.050230            NaN      17.012710
2017-09             NaN      16.144130            NaN      11.127440
2017-10             NaN       6.123822            NaN       1.938875
2017-11             NaN       5.164804            NaN       1.621027
2017-12             NaN       4.319275            NaN       1.348359
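Monthly grouping maps naturally onto pandas periods, and “inst” can be read as “the row on the last timestep of each month”. A sketch under that reading, on toy data; whether monthly_aggregate() keeps a trailing NaN in the same way is not stated here.

```python
import pandas as pd

idx = pd.date_range("1981-01-30", "1981-02-02", freq="D")
df = pd.DataFrame({"QOMEAS_05BB001": [8.7, 8.62, 8.5, float("nan")]}, index=idx)

months = df.index.to_period("M")
monthly_mean = df.groupby(months).mean()   # NaN values are skipped within each month

# "inst": keep the row on the last timestep of each month, even if it is NaN.
monthly_inst = df.groupby(months).tail(1)
monthly_inst.index = monthly_inst.index.to_period("M")
print(monthly_inst)
```

GroupBy.last() would instead return 8.5 for February, because it returns the last non-missing value in each group.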

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.yearly_aggregate(df: DataFrame, method: str = 'mean') → DataFrame

Returns the yearly aggregate of a given DataFrame based on the chosen method.

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with a datetime index and columns containing float type values.

  • method (str) – Aggregation method: “mean”, “min”, “max”, “median”, “sum” or “inst” (instantaneous). Default is “mean”.

Returns:

The new dataframe with the values aggregated yearly

Return type:

pd.DataFrame

Examples

Extraction of a Yearly Aggregate

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12-31           10.20       2.530770            NaN       1.006860
1981-01-01            9.85       2.518999            NaN       1.001954
1981-01-02           10.20       2.507289            NaN       0.997078
1981-01-03           10.00       2.495637            NaN       0.992233
1981-01-04           10.10       2.484073            NaN       0.987417
...                    ...            ...             ...            ...
2017-12-27             NaN       4.418050            NaN       1.380227
2017-12-28             NaN       4.393084            NaN       1.372171
2017-12-29             NaN       4.368303            NaN       1.364174
2017-12-30             NaN       4.343699            NaN       1.356237
2017-12-31             NaN       4.319275            NaN       1.348359
>>> # Extract the yearly aggregate by taking the sum of the entire year's values
>>> yearly_agg = data.yearly_aggregate(df=merged_df, method="sum")
>>> print(yearly_agg)
          QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
    1980           10.20       2.530770            0.00       1.006860
    1981        10386.27    9273.383180         2424.26    4007.949313
    1982        12635.47    8874.369067         3163.23    4123.606233
    1983        11909.23    8214.793557         3198.17    3810.515038
    1984        13298.33    7459.351671         2699.42    3431.981225
    1985        13730.50    8487.241498         2992.40    3756.822014
    1986        12576.84   10651.883689         3103.15    4794.825198
    1987        15066.57    8947.025052         3599.74    4260.917801
    1988        12642.53   10377.241643         2972.87    4614.234614
    1989        10860.93   11118.336160         2624.79    5193.322199
    1990        11129.76   11226.011936         2650.50    5273.448490
    1991        14354.61   12143.013205         3215.89    5732.371571
    1992        17033.16    9919.064629         3885.72    4566.044810
    1993        15238.65   10265.868953         3598.67    4700.055333
    1994        15623.13    8064.390172         3777.16    4053.331783
    1995        12892.89   10526.186570         3817.08    5006.592916
    1996        12551.39    9191.247302         3249.36    4195.638177
    1997           11.20    9078.253847            0.00    4469.825844
    1998            0.00    9421.178402         3598.21    4650.819283
    1999            0.00    8683.319193         3220.62    4032.381482
    2000            0.00   10181.718825            0.00    4921.033689
    2001            0.00    7076.942619            0.00    3525.593143
    2002            0.00    8046.998223            0.00    4048.992212
    2003            0.00    9017.711719            0.00    4517.088194
    2004            0.00   11726.707770            0.00    4941.582065
    2005            0.00   11975.002047            0.00    4700.295391
    2006            0.00    8972.956022            0.00    4038.214837
    2007            0.00   11089.242586            0.00    5035.426223
    2008            0.00    9652.958064            0.00    4630.531909
    2009            0.00    8762.313253            0.00    3659.265122
    2010            0.00    8006.621137            0.00    3475.115315
    2011            0.00   10158.521707            0.00    4748.153725
    2012            0.00   13141.668859            0.00    5847.670810
    2013            0.00   11389.072535            0.00    4769.917090
    2014            0.00   12719.851800            0.00    5298.904086
    2015            0.00   12258.178724            0.00    5362.497143
    2016            0.00    9989.779678            0.00    4269.909376
    2017            0.00    8801.897128            0.00    4226.258100
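In the yearly sums above, QOMEAS_05BA001 in 1980 (NaN in the daily data) and QOMEAS_05BB001 from 1998 onwards appear as 0.00 rather than NaN. That is consistent with pandas' default, where summing an all-NaN group returns 0; this is an inference from the output, not something the source confirms. A sketch of the difference on toy data:

```python
import pandas as pd

idx = pd.to_datetime(["1997-06-01", "1997-06-02", "1998-06-01"])
obs = pd.DataFrame({"QOMEAS_05BB001": [5.6, 5.6, float("nan")]}, index=idx)

years = obs.index.year
default_sum = obs.groupby(years).sum()             # 1998 -> 0.0 (all-NaN group sums to 0)
strict_sum = obs.groupby(years).sum(min_count=1)   # 1998 -> NaN (needs >= 1 valid value)
print(default_sum)
print(strict_sum)
```

When reading the yearly table, a 0.00 can therefore mean “no data” rather than zero flow.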

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.twelve_hour_aggregate(df: DataFrame, method: str = 'mean', use_custom_index: bool = True) → DataFrame

Aggregate a DataFrame by 12-hour intervals using the specified method.

Parameters:
  • df (pd.DataFrame) – DataFrame with a datetime index and numerical columns.

  • method (str) – Aggregation method: “mean”, “sum”, “median”, “min”, “max”, or “inst” (last value of interval).

  • use_custom_index (bool) – If True, index is formatted as “YYYY-MM-DD_HH:MM”, else keep datetime index.

Returns:

Aggregated DataFrame indexed by 12-hour interval.

Return type:

pd.DataFrame

Examples

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> twelve_hour_agg = data.twelve_hour_aggregate(df=merged_df)
>>> print(twelve_hour_agg)
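The “YYYY-MM-DD_HH:MM” index can be reproduced with a 12-hour resample followed by strftime. A sketch on toy 6-hourly data, assuming the bins start at midnight (pandas' default origin for this kind of frequency); twelve_hour_aggregate() may differ in detail.

```python
import pandas as pd

idx = pd.date_range("1981-01-01", periods=6, freq=pd.offsets.Hour(6))  # toy 6-hourly data
df = pd.DataFrame({"QOSIM_05BB001": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

agg = df.resample(pd.offsets.Hour(12)).mean()      # bins [00:00, 12:00) and [12:00, 24:00)
agg.index = agg.index.strftime("%Y-%m-%d_%H:%M")   # "YYYY-MM-DD_HH:MM" labels
print(agg)
```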

postprocessinglib.evaluation.data.long_term_seasonal(df: DataFrame, method: str = 'mean') → DataFrame

Computes the long-term seasonal aggregate values of a given DataFrame by applying the specified aggregation method to each day across all years in the provided time period. The resulting data is aggregated into a single year (1 to 366 days).

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with a datetime index and columns containing float type values. Each column represents a time series to be aggregated.

  • method (str, optional) –

    The aggregation method to apply across all years for each specific day of the year. Supported methods include:

    • “mean”: Calculate the mean value of that specific day (e.g., January 1st) across all years in the dataset (default).

    • “min”: Calculate the minimum value of that specific day across all years.

    • “max”: Calculate the maximum value of that specific day across all years.

    • “median”: Calculate the median value of that specific day across all years.

    • “sum”: Calculate the sum of that specific day across all years.

    • “QX”: Calculate a specific quantile, where X is a number between 0 and 100 (e.g., “Q75” for the 75th percentile). Use uppercase “Q” for quantiles.

    Default is “mean”.

Returns:

A DataFrame with 366 rows (representing days of the year, including February 29th) and the same columns as the input. Each row represents the aggregated value for that specific day across all years.

Return type:

pd.DataFrame
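The day-of-year grouping and the “QX” convention can be sketched with a pandas groupby. long_term_by_doy is hypothetical. It groups on dayofyear, so in leap years dates from February 29 onward get a day-of-year number one higher than the same calendar date in non-leap years; the library may align the calendar differently.

```python
import pandas as pd


def long_term_by_doy(df: pd.DataFrame, method: str = "mean") -> pd.DataFrame:
    """Aggregate each day of year across all years (hypothetical helper)."""
    grouped = df.groupby(df.index.dayofyear)
    if method.startswith("Q"):                     # e.g. "Q75" -> 0.75 quantile
        result = grouped.quantile(float(method[1:]) / 100)
    else:                                          # "mean", "min", "max", "median", "sum"
        result = grouped.agg(method)
    result.index.name = "jday"
    return result


idx = pd.to_datetime(["1981-01-01", "1982-01-01", "1983-01-01", "1983-01-02"])
df = pd.DataFrame({"QOSIM_05BB001": [1.0, 2.0, 4.0, 10.0]}, index=idx)
print(long_term_by_doy(df, "Q75"))  # jday 1 -> 3.0, jday 2 -> 10.0
```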

Examples

Extraction of a Long term seasonal aggregation

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpaths=[path], warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12-31           10.20       2.530770             NaN       1.006860
1981-01-01            9.85       2.518999             NaN       1.001954
1981-01-02           10.20       2.507289             NaN       0.997078
1981-01-03           10.00       2.495637             NaN       0.992233
1981-01-04           10.10       2.484073             NaN       0.987417
...                    ...            ...             ...            ...
2017-12-27             NaN       4.418050             NaN       1.380227
2017-12-28             NaN       4.393084             NaN       1.372171
2017-12-29             NaN       4.368303             NaN       1.364174
2017-12-30             NaN       4.343699             NaN       1.356237
2017-12-31             NaN       4.319275             NaN       1.348359
>>> # Extract the long-term seasonal aggregate
>>> long_term_seasonal = data.long_term_seasonal(df=merged_df) # Recall the default is mean
>>> print(long_term_seasonal)
      QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
jday
1           9.446471       4.037666             NaN       1.130686
2           9.428125       4.014474             NaN       1.123915
3           9.660625       3.991451             NaN       1.117196
4           9.804375       3.968602             NaN       1.110529
5           9.787500       3.945921             NaN       1.103913
...              ...            ...             ...            ...
362         9.942500       4.188140             NaN       1.169614
363         9.695000       4.163847             NaN       1.162533
364         9.633125       4.139735             NaN       1.155507
365         9.516875       4.115805             NaN       1.148535
366         9.936000       4.243073             NaN       1.173174
>>> # Obtain the upper quartile (Q75)
>>> long_term_seasonal = data.long_term_seasonal(df=merged_df, method='Q75')
>>> print(long_term_seasonal)
      QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
jday
1             10.100       4.830453             NaN       1.315370
2             10.200       4.801530             NaN       1.306986
3             10.550       4.772831             NaN       1.298670
4             10.500       4.744344             NaN       1.290422
5             10.700       4.716085             NaN       1.282241
...              ...            ...             ...            ...
362           11.175       4.978491             NaN       1.372171
363           11.075       4.948421             NaN       1.364174
364           11.100       4.918590             NaN       1.356237
365           10.775       4.888982             NaN       1.348359
366           11.300       4.765379             NaN       1.317393
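Row 366 in the outputs above only receives data from leap years. Assuming `jday` follows pandas' `DatetimeIndex.dayofyear` convention (an assumption; the numbering scheme is not stated in this section), the snippet below shows which dates fall on days 60, 61, 365 and 366.

```python
import pandas as pd

# A common year (2019) and a leap year (2020)
dates = pd.to_datetime(["2019-03-01", "2020-02-29", "2020-03-01", "2019-12-31", "2020-12-31"])
for date, jday in zip(dates, dates.dayofyear):
    print(date.date(), jday)
# 2019-03-01 60
# 2020-02-29 60
# 2020-03-01 61
# 2019-12-31 365
# 2020-12-31 366
```

Under this convention, February 29 of a leap year shares day 60 with March 1 of a common year, and only December 31 of a leap year reaches day 366. If the library numbers days this way, row 366 is aggregated over fewer years than the other rows.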

postprocessinglib.evaluation.data.seasonal_period(df: DataFrame, daily_period: tuple[str, str], subset: tuple[str, str] = None, years: list[int] = None) → DataFrame

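Returns the rows of a DataFrame that fall within a recurring seasonal window (a start and end day of the year), optionally restricted to a date range or to specific years.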

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame with a datetime index and columns containing float type values.

  • daily_period (tuple(str, str)) – A tuple of two strings giving the start and end of the seasonal period in MM-DD format (e.g. ('01-01', '01-31') for January 1 to January 31).

  • subset (tuple(str, str), optional) – A tuple of two strings giving the start and end dates of the subset, in YYYY-MM-DD format. If provided, only data within this date range is considered.

  • years (list[int], optional) – A list of years to filter the dataframe by. If provided, only data from these years will be included.

Returns:

The input DataFrame, truncated to the rows that fall within the seasonal period (and within subset or years, if given).

Return type:

pd.DataFrame
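The filtering described above can be sketched in pandas by comparing zero-padded ‘MM-DD’ strings, which sort in calendar order. This is an illustrative sketch, not the library's code: the name `seasonal_period_sketch` is hypothetical, and its handling of windows that wrap past December 31 (e.g. ('12-01', '02-28')) is an assumption, since this section does not say whether seasonal_period supports them.

```python
from typing import Optional

import pandas as pd


def seasonal_period_sketch(
    df: pd.DataFrame,
    daily_period: tuple[str, str],
    subset: Optional[tuple[str, str]] = None,
    years: Optional[list[int]] = None,
) -> pd.DataFrame:
    """Keep rows whose month-day lies inside daily_period (illustrative sketch only)."""
    if subset is not None:
        # Inclusive date-range slice on the DatetimeIndex, e.g. ('1981-01-01', '1985-12-31')
        df = df.loc[subset[0]:subset[1]]
    if years is not None:
        df = df[df.index.year.isin(years)]
    # Zero-padded 'MM-DD' strings compare in calendar order
    month_day = df.index.strftime("%m-%d")
    start, end = daily_period
    if start <= end:
        mask = (month_day >= start) & (month_day <= end)
    else:
        # Window wraps past December 31, e.g. ('12-01', '02-28')
        mask = (month_day >= start) | (month_day <= end)
    return df[mask]
```

Applied to daily data for 1981 to 1985, ('01-01', '01-31') keeps 31 days from each of the five years, as in the first example below.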

Examples

Extracting a seasonal period

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpath=path, warm_up=365)
>>> merged_df = DATAFRAMES["DF"]
>>> print(merged_df)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1980-12-31           10.20       2.530770             NaN       1.006860
1981-01-01            9.85       2.518999             NaN       1.001954
1981-01-02           10.20       2.507289             NaN       0.997078
1981-01-03           10.00       2.495637             NaN       0.992233
1981-01-04           10.10       2.484073             NaN       0.987417
...                    ...            ...             ...            ...
2017-12-27             NaN       4.418050             NaN       1.380227
2017-12-28             NaN       4.393084             NaN       1.372171
2017-12-29             NaN       4.368303             NaN       1.364174
2017-12-30             NaN       4.343699             NaN       1.356237
2017-12-31             NaN       4.319275             NaN       1.348359
>>> # Extract January 1 to 31 of each year, restricted to 1981-1985 with subset
>>> seasonal_p = data.seasonal_period(df=merged_df, daily_period=('01-01', '01-31'),
...                                   subset=('1981-01-01', '1985-12-31'))
>>> print(seasonal_p)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1981-01-01            9.85       2.518999             NaN       1.001954
1981-01-02           10.20       2.507289             NaN       0.997078
1981-01-03           10.00       2.495637             NaN       0.992233
1981-01-04           10.10       2.484073             NaN       0.987417
1981-01-05            9.99       2.472571             NaN       0.982631
...                    ...            ...             ...            ...
1985-01-27           11.40       2.734883             NaN       0.809116
1985-01-28           11.60       2.721414             NaN       0.805189
1985-01-29           11.70       2.708047             NaN       0.801287
1985-01-30           11.60       2.694749             NaN       0.797410
1985-01-31           11.60       2.681550             NaN       0.793556
>>> # Extract January 1 to 10 for the years 1981, 1983 and 1985 only
>>> seasonal_p = data.seasonal_period(df=merged_df, daily_period=('01-01', '01-10'),
...                                   years=[1981, 1983, 1985])
>>> print(seasonal_p)
            QOMEAS_05BB001  QOSIM_05BB001  QOMEAS_05BA001  QOSIM_05BA001
1981-01-01            9.85       2.518999             NaN       1.001954
1981-01-02           10.20       2.507289             NaN       0.997078
1981-01-03           10.00       2.495637             NaN       0.992233
1981-01-04           10.10       2.484073             NaN       0.987417
1981-01-05            9.99       2.472571             NaN       0.982631
1981-01-06            9.69       2.461128             NaN       0.977875
1981-01-07            9.51       2.449758             NaN       0.973148
1981-01-08            8.90       2.438459             NaN       0.968450
1981-01-09            8.70       2.427217             NaN       0.963778
1981-01-10            9.00       2.416050             NaN       0.959137
1983-01-01            8.98       5.371416             NaN       2.441398
1983-01-02            8.89       5.340234             NaN       2.426411
1983-01-03            9.12       5.309281             NaN       2.411540
1983-01-04            9.37       5.278562             NaN       2.396784
1983-01-05            9.40       5.248067             NaN       2.382142
1983-01-06            9.54       5.217788             NaN       2.367613
1983-01-07            9.44       5.187746             NaN       2.353197
1983-01-08            9.21       5.157917             NaN       2.338891
1983-01-09            9.03       5.128305             NaN       2.324696
1983-01-10            8.35       5.098919             NaN       2.310610
1985-01-01           10.10       3.116840             NaN       0.920429
1985-01-02           10.20       3.100937             NaN       0.915796
1985-01-03           10.70       3.085116             NaN       0.911193
1985-01-04            9.90       3.069416             NaN       0.906620
1985-01-05            9.51       3.053805             NaN       0.902076
1985-01-06            9.27       3.038310             NaN       0.897561
1985-01-07            9.55       3.022904             NaN       0.893076
1985-01-08           10.10       3.007609             NaN       0.888619
1985-01-09           10.00       2.992402             NaN       0.884191
1985-01-10           10.00       2.977300             NaN       0.879791

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.stat_aggregate(df: DataFrame, method: str = 'mean') → DataFrame

Aggregates the simulation columns of each station (excluding the observed ‘QOMEAS’ column) on each date using the specified method, reducing several simulations to one series per station.

Parameters:
  • df (pd.DataFrame) – Input DataFrame with MultiIndex columns (station, variable).

  • method (str) – Aggregation method: one of ‘mean’, ‘sum’, ‘min’, ‘max’, ‘median’, or a quantile such as ‘q25’. Default is ‘mean’.

Returns:

Aggregated DataFrame with the same index as the input and one column per station, labelled (station, method).

Return type:

pd.DataFrame
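The row-wise aggregation can be sketched as follows. This is a hypothetical reimplementation for illustration (`stat_aggregate_sketch` is not a library function); it assumes the quantile uses pandas' default linear interpolation and that the output label is the upper-cased method name, as with the ‘Q75’ label in the example. With linear interpolation, the Q75 of two values a ≤ b is a + 0.75(b − a); for station1 on 2020-01-01 that gives 0.100856 + 0.75 × (0.932333 − 0.100856) ≈ 0.724464, matching the example output.

```python
import pandas as pd


def stat_aggregate_sketch(df: pd.DataFrame, method: str = "mean") -> pd.DataFrame:
    """Aggregate each station's simulation columns row by row (illustrative sketch only).

    Expects MultiIndex columns (station, variable) and drops the observed
    'QOMEAS' column before aggregating across the remaining columns.
    """
    sims = df.drop(columns="QOMEAS", level=1)
    pieces = {}
    for station in sims.columns.get_level_values(0).unique():
        block = sims[station]  # all simulation columns for one station
        if method.lower().startswith("q"):
            # 'q75' or 'Q75' -> 75th percentile across the simulations on each date
            values = block.quantile(float(method[1:]) / 100, axis=1)
        else:
            values = getattr(block, method)(axis=1)
        pieces[(station, method.upper())] = values
    # Tuple keys become (station, METHOD) MultiIndex columns
    return pd.DataFrame(pieces)
```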

Example

>>> import pandas as pd
>>> import numpy as np
>>> from postprocessinglib.evaluation.data import stat_aggregate
>>> # Create a sample DataFrame with MultiIndex columns
>>> dates = pd.date_range(start="2020-01-01", periods=5)
>>> columns = pd.MultiIndex.from_tuples([
...     ('station1', 'SIM1'),
...     ('station1', 'SIM2'),
...     ('station1', 'QOMEAS'),
...     ('station2', 'SIM1'),
...     ('station2', 'SIM2'),
...     ('station2', 'QOMEAS')
... ])
>>> values = np.random.rand(5, 6)  # random values, so your output will differ
>>> df = pd.DataFrame(values, index=dates, columns=columns)
>>> print(df)
            station1                      station2
                SIM1      SIM2    QOMEAS      SIM1      SIM2    QOMEAS
2020-01-01  0.932333  0.100856  0.267621  0.018376  0.120676  0.852772
2020-01-02  0.290211  0.974238  0.807189  0.904228  0.209833  0.187018
2020-01-03  0.886652  0.432085  0.028842  0.292808  0.236621  0.602609
2020-01-04  0.956156  0.586426  0.866117  0.630563  0.632772  0.722546
2020-01-05  0.941239  0.508157  0.355760  0.674235  0.668786  0.290178
>>> # Apply stat_aggregate
>>> agg_df = stat_aggregate(df, method='q75')
>>> print(agg_df)
            station1  station2
                Q75       Q75
2020-01-01  0.724464  0.095101
2020-01-02  0.803231  0.730630
2020-01-03  0.773010  0.278761
2020-01-04  0.863723  0.632220
2020-01-05  0.832968  0.672872

JUPYTER NOTEBOOK Examples

postprocessinglib.evaluation.data.station_dataframe(observed: DataFrame, simulated: DataFrame, stations: list[int] = []) → list[pandas.core.frame.DataFrame]

Extracts each station’s data from the observed and simulated DataFrames and pairs them into one DataFrame per station.

Parameters:
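  • observed (pd.DataFrame) – Observed streamflow values (position 1: datetime; positions 2 onward: one streamflow column per station).

  • simulated (pd.DataFrame) – Simulated streamflow values (position 1: datetime; positions 2 onward: one streamflow column per station).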

  • stations (list[int]) – 1-based positions of the stations to extract, from 1 up to the number of stations in the data. If the list is empty (the default), every station is returned.

Returns:

A list containing one DataFrame per station, each holding that station’s observed and simulated columns.

Return type:

list[pd.DataFrame]
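A minimal sketch of the pairing, assuming that the observed and simulated DataFrames share a datetime index and list their stations in the same column order. `station_dataframe_sketch` is hypothetical, not the library's function; it uses `None` instead of a mutable `[]` default (a common Python idiom) while still treating an empty list as "all stations".

```python
from typing import Optional

import pandas as pd


def station_dataframe_sketch(
    observed: pd.DataFrame,
    simulated: pd.DataFrame,
    stations: Optional[list[int]] = None,
) -> list[pd.DataFrame]:
    """Pair the i-th observed column with the i-th simulated column (illustrative sketch only)."""
    # None or an empty list selects every station
    positions = stations or range(1, observed.shape[1] + 1)
    paired = []
    for pos in positions:
        # Convert the 1-based station number to a 0-based column position
        obs_col = observed.iloc[:, pos - 1]
        sim_col = simulated.iloc[:, pos - 1]
        paired.append(pd.concat([obs_col, sim_col], axis=1))
    return paired
```

Passing `stations=[2]` would return a one-element list holding only the second station's observed and simulated columns.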

Example

Extracting the data for individual stations

>>> from postprocessinglib.evaluation import data
>>> path = 'MESH_output_streamflow_1.csv'
>>> DATAFRAMES = data.generate_dataframes(csv_fpath=path, warm_up=365)
>>> observed = DATAFRAMES["DF_OBSERVED"]
>>> simulated = DATAFRAMES["DF_SIMULATED"]
>>> STATIONS = data.station_dataframe(observed=observed, simulated=simulated)
>>> for station in STATIONS:
...     print(station)
            QOMEAS_05BB001  QOSIM_05BB001
1980-12-31           10.20       2.530770
1981-01-01            9.85       2.518999
1981-01-02           10.20       2.507289
1981-01-03           10.00       2.495637
1981-01-04           10.10       2.484073
...                    ...            ...
2017-12-27             NaN       4.418050
2017-12-28             NaN       4.393084
2017-12-29             NaN       4.368303
2017-12-30             NaN       4.343699
2017-12-31             NaN       4.319275
[13515 rows x 2 columns]
            QOMEAS_05BA001  QOSIM_05BA001
1980-12-31             NaN       1.006860
1981-01-01             NaN       1.001954
1981-01-02             NaN       0.997078
1981-01-03             NaN       0.992233
1981-01-04             NaN       0.987417
...                    ...            ...
2017-12-27             NaN       1.380227
2017-12-28             NaN       1.372171
2017-12-29             NaN       1.364174
2017-12-30             NaN       1.356237
2017-12-31             NaN       1.348359
[13515 rows x 2 columns]

JUPYTER NOTEBOOK Examples