Forecast Evaluation
postprocessinglib.forecast.forecast Module
## Description Script to fetch streamflow forecasts at gauge stations from the National Surface and River Prediction System using GeoMet’s Web Coverage Service API.
Script params can be modified below the line if __name__ == ‘__main__’:
Before running this script for the first time, make sure your credentials are specified in config.cfg. The config.cfg and nsrps_model_stn_locs.json should live in the same directory as this script, or the paths to these files should be adjusted below the if __name__ == ‘__main__’: conditional block.
## Dependencies - pandas - xarray (w/ netCDF4 engine) - OWSLib >= 0.31.0 (more info: https://github.com/geopython/OWSLib)
## Anticipated Changes Currently, streamflow forecasts are available from the Deterministic Hydrologic Prediction System (DHPS). The forecasts from DHPS are the single, “best guess” of streamflow for the coming six days. Soon, forecasts will also be available from the corresponding Ensemble Hydrologic Prediction System (EHPS). EHPS produces a collection of probable streamflow forecasts out 16 days (32 days on Thursdays).
To query for streamflow forecasts at gauge stations, we must know the lat / lon coordinates of the gauge stations in the “model world”. For now, we can find the lat / lon model world coordinates from the file, nsrps_model_stn_locs.json. Soon, the model world coordinates will also be available on GeoMet. This script will require light modification to accommodate this change (see the function stns_on_grid).
## Known Limitations NSRPS assimilates observations from WSC and partner gauge stations. At the risk of grossly oversimplifying the assimilation process, we can think of the assimilated observations as being used “nudge” or “course correct” the forecasts. We have “model world” coordinates for 603 WSC stations whose obs are assimilated by the model. In theory, we can query for forecasts for gauge stations whose data is not assimilated (i.e. they don’t have model world coordinates), but this is more complicated. Note from Natalie: I have scripts to do this, but I’ve ommited the code here for simplicity and because I need to verify the results - happy to provide this code further into the pilot, if it’s needed.
Program runtime is a significant issue when querying for forecasts for more than a few dozen stations. I’ve tried several strategies for speeding things up so I’m detailing them here for future reference. Before that, some notes about querying the Web Coverage Service:
A query must be made for each forecast timestep. For a 6-day DHPS forecast, there are timesteps at an hourly interval, so a single forecast has 144 timesteps in total.
Queries to the WCS can be made for the whole grid, or for a subset of the grid.
In general, there are two strategies for querying the WCS for forecasts at gauge stations:
Query the whole grid for each forecast timestep and then extract streamflow at gauge stations after.
Query a small subset of the grid containing one station. Do this for each station for each forecast timestep.
Query for larger subsets of the grid and extract streamflow at several gauge stations in that subset after. Do this until forecasts for all stations have been extracted.
Unfortunately, methods 1 and 2 described above have approximately the same runtime. Method 2 might be slightly faster (90 mins to 2 hours to query for all 603 gauge stations). I did not try method 3.
Things I tried for method 2:
- Adding threading
This works well when a thread pool is used to asynchronously query for forecast timesteps. Forecasts for each stations are queried in serial. This is the method I use in this script.
- Adding multiprocessing with the threading from method A
I tried adding a process pool to asynchronously submit queries for stations, rather than querying stations in serial. This was definitely faster than method A, but the script would always fail - it seems that this method causes an exceedance of the max allowable number of queries to the GeoMet servers. Maybe sleeping some of the process workers might help…we could probably do a little more investigating here…
- Using asyncio and aiohttp with a GET request, rather than the OWSLib API
This was much slower than method A, so I didn’t pursue it further, but that could also be my ignorance - I have limited experience with asyncio.
## Contact natalie.gervasi@ec.gc.ca
## Last Modified September 2024 By Fuad Yassin fuad.yassin@ec.gc.ca
Functions
|
wcs factory function, returns a version specific WebCoverageService object |
|
wms factory function, returns a version specific WebMapService object |
|
Establishes a connection to GeoMet's WCS API. |
|
Retrieves the forecast time metadata from the WebMapService for the GeoMet layer (product) of interest. |
|
Returns the lat / lon index labels for the grid cell containing the "model world" gauge station. |
|
|
|
Calculate forecast errors for a list of files or DataFrames. |
|
Queries for streamflow forecasts from the WCS for a given list of gauge stations. |
|
Reads username and password from configuration file. |
|
Sorts an iterable naturally. |
|
Queries the WMS for a subset of the model grid for a single timestep from the forecast of interest. |
|
Prepare climatology DataFrames expanded to match timestamps of short_term_df, with separate lists of upper and lower bound DataFrames. |
|
Retrieves the "model world" gauge station locations from file. |