validate_data

postprocessinglib.utilities._helper_functions.validate_data(observed: DataFrame, simulated: DataFrame)

Ensures that a pair of observed and simulated dataframes is valid.

The inputs are considered invalid if either one is not a DataFrame, if the two DataFrames do not have the same shape and size, or if they contain no data (empty DataFrames). The function goes through both the observed and simulated dataframes, comparing them where necessary, and raises the corresponding error if any of these checks fails.
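The checks described above can be sketched as follows. This is a minimal illustration and not the library's actual source: the function name `validate_data_sketch` is hypothetical, the order of the checks is an assumption, and the use of `DataFrame.empty` for the emptiness test is also an assumption. The error types and messages are taken from the Raises section and the Example on this page.

```python
import pandas as pd


def validate_data_sketch(observed, simulated):
    """Hypothetical restatement of the documented checks (not the library code)."""
    # Type check: both inputs must be pandas DataFrames.
    if not isinstance(observed, pd.DataFrame) or not isinstance(simulated, pd.DataFrame):
        raise ValueError("Both observed and simulated values must be pandas DataFrames.")
    # Shape check: same number of rows and columns (check order is assumed).
    if observed.shape != simulated.shape:
        raise RuntimeError("Shapes of observations and simulations must match")
    # Emptiness check: a frame with no rows or no columns holds no data.
    if observed.empty or simulated.empty:
        raise RuntimeError("observed or simulated data is incomplete")
```

Note that the sketch, like Test 1 in the Example, does not inspect the values themselves, so NaN and inf entries pass through.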

Parameters:
  • observed (pd.DataFrame) – The observed dataframe being checked.

  • simulated (pd.DataFrame) – The simulated dataframe being checked.

Raises:
  • RuntimeError: – if the sizes or shapes of both dataframes are not the same or if the dataframes are empty

  • ValueError: – if the inputs are not dataframes

Example

>>> import numpy as np
>>> import pandas as pd
>>> from postprocessinglib.utilities import _helper_functions
>>> # Assuming we have the following dataframe, test_df:
>>> print(test_df)
          obs1      sim1      obs2      sim2
1981  0.553080  0.266127  0.043270  0.109264
1982  0.034076  0.428959  0.507130  0.213583
1983  0.876142  0.330159  0.850529  0.522809
1984      -inf  0.474980  0.099652  0.959624
1985  0.439705  0.438630  0.294566       NaN
1986       NaN  0.134409       NaN  0.680744
1987  0.598378  0.668143  0.312386  0.345419
1988  0.934277  0.840275  0.491060       inf
1989  0.169541  0.557099  0.813971  0.006391
1990  0.219000       NaN  0.931811       NaN
>>> # extracting the observed and simulated dataframes
>>> obs = test_df.iloc[:, ::2]
>>> print(obs)
          obs1      obs2
1981  0.553080  0.043270
1982  0.034076  0.507130
1983  0.876142  0.850529
1984      -inf  0.099652
1985  0.439705  0.294566
1986       NaN       NaN
1987  0.598378  0.312386
1988  0.934277  0.491060
1989  0.169541  0.813971
1990  0.219000  0.931811
>>> sim = test_df.iloc[:, 1::2]
>>> print(sim)
          sim1      sim2
1981  0.266127  0.109264
1982  0.428959  0.213583
1983  0.330159  0.522809
1984  0.474980  0.959624
1985  0.438630       NaN
1986  0.134409  0.680744
1987  0.668143  0.345419
1988  0.840275       inf
1989  0.557099  0.006391
1990       NaN       NaN
>>> # Test 1: Testing the validate_data function with correct inputs as shown above
>>> _helper_functions.validate_data(observed=obs, simulated=sim)
>>> # No error is raised as the dataframes are valid
>>> # Test 2: Testing the validate_data function with incorrect inputs: different shapes/sizes of dataframes
>>> sim_test_2 = test_df.iloc[:, [1]]  # Creating a simulated dataframe with a different shape
>>> print(sim_test_2)
          sim1
1981  0.266127
1982  0.428959
1983  0.330159
1984  0.474980
1985  0.438630
1986  0.134409
1987  0.668143
1988  0.840275
1989  0.557099
1990       NaN
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_2)
>>> # Error is raised due to the different shapes of the dataframes
>>> # The error message is as follows: "Shapes of observations and simulations must match"
>>> # Test 3: Testing the validate_data function with incorrect inputs: simulated dataframe not being a dataframe
>>> sim_test_3 = sim.to_numpy()  # converting the simulated dataframe to a numpy array
>>> sim_test_3
array([[0.26612732, 0.10926379],
       [0.42895924, 0.21358297],
       [0.33015897, 0.52280869],
       [0.47498004, 0.959624  ],
       [0.43862956,        nan],
       [0.13440895, 0.68074365],
       [0.66814327, 0.34541874],
       [0.84027532,        inf],
       [0.55709876, 0.00639057],
       [       nan,        nan]])
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_3)
>>> # Error is raised due to the simulated data not being a dataframe
>>> # The error message is as follows: "Both observed and simulated values must be pandas DataFrames."
>>> # Test 4: Testing the validate_data function with incorrect inputs: simulated dataframe being an empty dataframe
>>> sim_test_4 = pd.DataFrame(index=obs.index)
>>> print(sim_test_4)
Empty DataFrame
Columns: []
Index: [1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990]
>>> _helper_functions.validate_data(observed=obs, simulated=sim_test_4)
>>> # Error is raised due to the simulated data being an empty dataframe
>>> # The error message is as follows: "observed or simulated data is incomplete"
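When preparing inputs like those in the tests above, it can help to inspect the type, shape and emptiness of each object before validating. The following sketch uses only standard pandas attributes; the small `df` is made-up illustration data, not the `test_df` from the example. It also shows that selecting a column with a single integer yields a Series, while selecting with a list yields a one-column DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up illustration data with the same column layout as the example.
df = pd.DataFrame(
    {"obs1": [0.5, np.nan], "sim1": [0.2, 0.4], "obs2": [0.1, 0.3], "sim2": [np.inf, 0.6]},
    index=[1981, 1982],
)
obs = df.iloc[:, ::2]                         # every other column, starting at 0
one_col_series = df.iloc[:, 1]                # integer selector -> Series
one_col_frame = df.iloc[:, [1]]               # list selector -> DataFrame
empty_frame = pd.DataFrame(index=obs.index)   # index only, no columns

print(type(one_col_series).__name__)          # Series
print(type(one_col_frame).__name__)           # DataFrame
print(obs.shape, one_col_frame.shape)         # (2, 2) (2, 1)
print(empty_frame.shape, empty_frame.empty)   # (2, 0) True
```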