validate_data
- postprocessinglib.utilities._helper_functions.validate_data(observed: DataFrame, simulated: DataFrame)
Ensures that a pair of observed and simulated dataframes is valid.
The inputs are considered invalid if either one is not a dataframe, if the two dataframes do not have the same shape and size, or if a dataframe contains no data (an empty dataframe). The function goes through both the observed and simulated dataframes, comparing them where necessary, and raises the corresponding error when one of these checks fails.
- Parameters:
observed (pd.DataFrame) – The observed dataframe being checked.
simulated (pd.DataFrame) – The simulated dataframe being checked.
- Raises:
RuntimeError – if the sizes or shapes of both dataframes are not the same or if the dataframes are empty
ValueError – if the inputs are not dataframes
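The three checks described above can be sketched as follows. This is an illustrative stand-in, not the library's source: the function name `validate_data_sketch` is hypothetical, the order of the checks is an assumption, and only the error messages are taken from the example further down this page.

```python
import pandas as pd


def validate_data_sketch(observed, simulated):
    """Hypothetical stand-in mirroring the documented checks of validate_data."""
    # Check 1: both inputs must be pandas DataFrames.
    if not isinstance(observed, pd.DataFrame) or not isinstance(simulated, pd.DataFrame):
        raise ValueError("Both observed and simulated values must be pandas DataFrames.")

    # Check 2: both DataFrames must have the same number of rows and columns.
    if observed.shape != simulated.shape:
        raise RuntimeError("Shapes of observations and simulations must match")

    # Check 3: neither DataFrame may be empty (assumed here to mean .empty is True).
    if observed.empty or simulated.empty:
        raise RuntimeError("observed or simulated data is incomplete")
```

With this assumed ordering, the empty-data message is only reached when both dataframes already share the same shape; the library may order its checks differently.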
Example
>>> import numpy as np
>>> import pandas as pd
>>> from postprocessinglib.utilities import _helper_functions
>>> # Assuming we have the following dataframes: test_df
>>> print(test_df)
          obs1      sim1      obs2      sim2
1981  0.553080  0.266127  0.043270  0.109264
1982  0.034076  0.428959  0.507130  0.213583
1983  0.876142  0.330159  0.850529  0.522809
1984      -inf  0.474980  0.099652  0.959624
1985  0.439705  0.438630  0.294566       NaN
1986       NaN  0.134409       NaN  0.680744
1987  0.598378  0.668143  0.312386  0.345419
1988  0.934277  0.840275  0.491060       inf
1989  0.169541  0.557099  0.813971  0.006391
1990  0.219000       NaN  0.931811       NaN
>>> # extracting the observed and simulated dataframes
>>> obs = test_df.iloc[:, ::2]
>>> print(obs)
          obs1      obs2
1981  0.553080  0.043270
1982  0.034076  0.507130
1983  0.876142  0.850529
1984      -inf  0.099652
1985  0.439705  0.294566
1986       NaN       NaN
1987  0.598378  0.312386
1988  0.934277  0.491060
1989  0.169541  0.813971
1990  0.219000  0.931811
>>> sim = test_df.iloc[:, 1::2]
>>> print(sim)
          sim1      sim2
1981  0.266127  0.109264
1982  0.428959  0.213583
1983  0.330159  0.522809
1984  0.474980  0.959624
1985  0.438630       NaN
1986  0.134409  0.680744
1987  0.668143  0.345419
1988  0.840275       inf
1989  0.557099  0.006391
1990       NaN       NaN
>>> # Test 1: Testing the validate_data function with correct inputs as shown above
>>> _helper_functions.validate_data(observed=obs, simulated=sim)
>>> # No error is raised as the dataframes are valid
>>> # Test 2: Testing the validate_data function with incorrect inputs: different shapes/sizes of dataframes
>>> sim_test_2 = test_df.iloc[:, [1]] # Creating a simulated dataframe with a different shape
>>> print(sim_test_2)
          sim1
1981  0.266127
1982  0.428959
1983  0.330159
1984  0.474980
1985  0.438630
1986  0.134409
1987  0.668143
1988  0.840275
1989  0.557099
1990       NaN
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_2)
>>> # Error is raised due to the different shapes of the dataframes
>>> # The error message is as follows: "Shapes of observations and simulations must match"
>>> # Test 3: Testing the validate_data function with incorrect inputs: simulated data not being a dataframe
>>> sim_test_3 = sim.to_numpy() # converting the simulated dataframe to a numpy array
>>> sim_test_3
array([[0.26612732, 0.10926379],
       [0.42895924, 0.21358297],
       [0.33015897, 0.52280869],
       [0.47498004, 0.959624  ],
       [0.43862956,        nan],
       [0.13440895, 0.68074365],
       [0.66814327, 0.34541874],
       [0.84027532,        inf],
       [0.55709876, 0.00639057],
       [       nan,        nan]])
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_3)
>>> # Error is raised due to the simulated data not being a dataframe
>>> # The error message is as follows: "Both observed and simulated values must be pandas DataFrames."
>>> # Test 4: Testing the validate_data function with incorrect inputs: simulated dataframe being an empty dataframe
>>> sim_test_4 = pd.DataFrame(index=obs.index)
>>> print(sim_test_4)
Empty DataFrame
Columns: []
Index: [1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990]
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_4)
>>> # Error is raised due to the simulated data being an empty dataframe
>>> # The error message is as follows: "observed or simulated data is incomplete"
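The example assumes that test_df already exists. A minimal sketch for rebuilding it from the values printed above, and for inspecting the type and shape of each kind of input the tests rely on, might look like this. The variable names `one_column`, `as_series`, `as_array`, `no_columns` and `candidates` are illustrative only and do not come from the library.

```python
import numpy as np
import pandas as pd

# Values copied from the test_df printed in the example above.
test_df = pd.DataFrame(
    {
        "obs1": [0.553080, 0.034076, 0.876142, -np.inf, 0.439705,
                 np.nan, 0.598378, 0.934277, 0.169541, 0.219000],
        "sim1": [0.266127, 0.428959, 0.330159, 0.474980, 0.438630,
                 0.134409, 0.668143, 0.840275, 0.557099, np.nan],
        "obs2": [0.043270, 0.507130, 0.850529, 0.099652, 0.294566,
                 np.nan, 0.312386, 0.491060, 0.813971, 0.931811],
        "sim2": [0.109264, 0.213583, 0.522809, 0.959624, np.nan,
                 0.680744, 0.345419, np.inf, 0.006391, np.nan],
    },
    index=range(1981, 1991),
)

obs = test_df.iloc[:, ::2]   # every second column starting at 0: obs1, obs2
sim = test_df.iloc[:, 1::2]  # every second column starting at 1: sim1, sim2

one_column = test_df.iloc[:, [1]]           # list indexer keeps a DataFrame
as_series = test_df.iloc[:, 1]              # scalar indexer returns a Series
as_array = sim.to_numpy()                   # plain numpy array, not a DataFrame
no_columns = pd.DataFrame(index=obs.index)  # index only, so .empty is True

candidates = {
    "sim": sim,
    "one_column": one_column,
    "as_series": as_series,
    "as_array": as_array,
    "no_columns": no_columns,
}

# Print the type and shape that a validation step would have to compare with obs.
for name, value in candidates.items():
    print(f"{name:<11} {type(value).__name__:<10} shape={value.shape}")
```

Note that selecting a single column with a scalar position (`iloc[:, 1]`) yields a Series rather than a one-column DataFrame, so a list position (`iloc[:, [1]]`) is needed when the goal is a DataFrame with a different shape.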