Quick Start

Prerequisites

First, you need a Python environment with all required dependencies. The recommended approach is to use Anaconda and create a new environment from one of the predefined environment files in the environments directory.

Use:

conda env create -f environments/environment.yml

Installation

The library is not yet available on PyPI, so it must be installed directly from the Git repository.

To install the library use:

pip install git+https://github.com/fuadyassin/NHS_PostProcessing.git

If you want an editable install, for example to implement your own models or datasets, clone the repository using:

git clone https://github.com/fuadyassin/NHS_PostProcessing.git

or download the repository as a zip file from its GitHub page and extract it.

Either way, you end up with a directory called NHS_PostProcessing (from git clone) or NHS_PostProcessing-main (from the zip file). Change into that directory and install a local, editable copy of the package:

cd NHS_PostProcessing
pip install -e .

For the MESH direct-ingestion workflow (generate_dataframes_from_mesh), also install:

pip install xarray geopandas
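
To confirm that the installation steps above worked, a quick import check can help. The following is a minimal sketch using only the Python standard library; the helper name check_packages is hypothetical and not part of the library. It reports whether postprocessinglib (the import name used in the examples in this guide) and the optional xarray and geopandas packages can be found in the active environment.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # postprocessinglib is the import name used in this guide's examples;
    # xarray and geopandas are only needed for the MESH direct-ingestion workflow.
    status = check_packages(["postprocessinglib", "xarray", "geopandas"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, make sure the conda environment created earlier is the one that is currently active.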

Workflows

There are two main workflows; which one to use depends on the files you have available.

Workflow A — CSV-based (existing outputs)

Use this workflow when you already have a MESH_output_streamflow.csv file (produced by an earlier notebook run or a previous script).

from postprocessinglib.evaluation import data, metrics, visuals

# Step 1 – load data
DATAFRAMES = data.generate_dataframes(
    csv_fpaths=["MESH_output_streamflow.csv"],
    warm_up=365,
)

# Step 2 – compute metrics
results = metrics.calculate_all_metrics(
    observed=DATAFRAMES["DF_OBSERVED"],
    simulated=DATAFRAMES["DF_SIMULATED"],
)
print(results)

# Step 3 – plot
visuals.plot(
    merged_dataframe=DATAFRAMES["DF"],
    num_stations=1,
    title="Streamflow comparison",
)
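
The warm_up=365 argument used above is not described in this section. Assuming it is the number of initial time steps excluded from evaluation (a common way to handle model spin-up), its effect can be pictured with the sketch below. This illustrates the concept only and is not the library's implementation; the function name drop_warm_up and the sample values are made up.

```python
def drop_warm_up(series, warm_up):
    """Return `series` without its first `warm_up` entries."""
    if warm_up < 0:
        raise ValueError("warm_up must be non-negative")
    return series[warm_up:]


# Synthetic daily series of 400 values (made-up data).
flows = [float(i) for i in range(400)]

# With a 365-step warm-up, only the remaining values would be evaluated.
evaluated = drop_warm_up(flows, 365)
print(len(evaluated), "values remain after warm-up")
```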

Workflow B — MESH NetCDF direct ingestion (no intermediate CSV)

Use this workflow to go directly from raw MESH model outputs to metrics and plots without writing an intermediate CSV file.

Required inputs (produced once per study domain):

combined_discharge_stations_comids.gpkg — gauge stations with COMID assignments (from the COMID-matching pre-processing step).
MESH_input_streamflow_latlon.tb0 — observed streamflow in EnSim .tb0 format (from GenStreamflowAsync).
MESH_drainage_database_*.nc — MESH drainage database NetCDF.
QO_D_GRD.nc — MESH simulated streamflow output.
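
Because this workflow depends on four separate files, it can be worth checking that they are all present before loading anything. The following is a minimal standard-library sketch, not part of the library; the helper name find_missing_inputs is hypothetical, and the file names and pattern are copied from the list above, so adjust them if your files are named differently.

```python
from pathlib import Path

# Names and patterns taken from the list of required inputs.
# The drainage database name varies per domain, so it is matched with a glob.
REQUIRED_PATTERNS = [
    "combined_discharge_stations_comids.gpkg",
    "MESH_input_streamflow_latlon.tb0",
    "MESH_drainage_database_*.nc",
    "QO_D_GRD.nc",
]


def find_missing_inputs(directory, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match no file in `directory`."""
    base = Path(directory)
    return [pattern for pattern in patterns if not any(base.glob(pattern))]


if __name__ == "__main__":
    missing = find_missing_inputs(".")
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All required inputs found.")
```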

from postprocessinglib.evaluation import data, metrics, visuals

# Step 1 – load directly from MESH outputs
DATAFRAMES = data.generate_dataframes_from_mesh(
    input_stations_comids="combined_discharge_stations_comids.gpkg",
    input_obs="MESH_input_streamflow_latlon.tb0",
    input_ddb="MESH_drainage_database.nc",
    mesh_flow="QO_D_GRD.nc",
    warm_up=365,
)

# Step 2 – compute metrics (identical to Workflow A)
results = metrics.calculate_all_metrics(
    observed=DATAFRAMES["DF_OBSERVED"],
    simulated=DATAFRAMES["DF_SIMULATED"],
)
print(results)

# Step 3 – plot (identical to Workflow A)
visuals.plot(
    merged_dataframe=DATAFRAMES["DF"],
    num_stations=1,
    title="Streamflow comparison",
)

Comparing multiple model runs

Pass a list of NetCDF paths as mesh_flow to generate_dataframes_from_mesh() to evaluate several runs side by side:

DATAFRAMES = data.generate_dataframes_from_mesh(
    input_stations_comids="combined_discharge_stations_comids.gpkg",
    input_obs="MESH_input_streamflow_latlon.tb0",
    input_ddb="MESH_drainage_database.nc",
    mesh_flow=[
        "Average_GRU_Params/run_A/QO_D_GRD.nc",
        "Average_GRU_Params/run_B/QO_D_GRD.nc",
    ],
    warm_up=365,
)

for i in [1, 2]:
    r = metrics.calculate_all_metrics(
        observed=DATAFRAMES["DF_OBSERVED"],
        simulated=DATAFRAMES[f"DF_SIMULATED_{i}"],
    )
    print(f"Run {i}:\n", r)
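
When many runs are compared, the path list and the matching dictionary keys can be built programmatically instead of typed by hand. The sketch below assumes that the DF_SIMULATED_{i} keys are numbered from 1 in the order the paths are passed, as in the example above; the helper name build_run_inputs and the directory layout are hypothetical.

```python
from pathlib import Path


def build_run_inputs(root, run_names, filename="QO_D_GRD.nc"):
    """Return (paths, keys) for a set of run directories under `root`.

    Assumes keys are numbered from 1 in the order the paths are passed,
    following the DF_SIMULATED_1 / DF_SIMULATED_2 pattern shown above.
    """
    paths = [(Path(root) / name / filename).as_posix() for name in run_names]
    keys = [f"DF_SIMULATED_{i}" for i in range(1, len(paths) + 1)]
    return paths, keys


paths, keys = build_run_inputs("Average_GRU_Params", ["run_A", "run_B"])
for path, key in zip(paths, keys):
    print(key, "<-", path)
```

The resulting paths list could then be passed as mesh_flow, and keys used to look up each simulated DataFrame in the returned dictionary.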

With aggregations

Both workflows accept the same aggregation flags:

DATAFRAMES = data.generate_dataframes_from_mesh(
    ...,
    monthly_agg=True, ma_method="mean",
    yearly_agg=True,  ya_method="sum",
    long_term=True,
    warm_up=365,
)

monthly  = DATAFRAMES["DF_MONTHLY"]
lt_mean  = DATAFRAMES["LONG_TERM_MEAN"]
lt_min   = DATAFRAMES["LONG_TERM_MIN"]
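
For intuition about what monthly_agg=True with ma_method="mean" is expected to produce, the following standard-library sketch groups a daily series by calendar month and averages each group. It illustrates the concept only and is not the library's implementation; the function name monthly_mean and the sample values are made up.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_mean(daily):
    """Average a {date: value} series per (year, month)."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in sorted(groups.items())}


# Synthetic daily flows: 10.0 throughout January 2020, 20.0 throughout February 2020.
start = date(2020, 1, 1)
daily = {}
for offset in range(60):  # 31 days of January + 29 days of February (leap year)
    day = start + timedelta(days=offset)
    daily[day] = 10.0 if day.month == 1 else 20.0

print(monthly_mean(daily))
```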

See Data Processing/Manipulation for the full list of returned dictionary keys and aggregation options, and Metrics Calculations for all available performance metrics.