
A Generalized KGE Metric for Improved Training (Information Extraction) and Evaluation of Dynamical Systems Models

 

Evaluating model performance is a foundational aspect of hydrological modeling. It is crucial for model development and calibration, and also for communicating results effectively to peers, stakeholders, and decision-makers. A major challenge arises from spatio-temporal non-stationarity in streamflow data, that is, variation in statistical properties such as the mean and median over time and across regions. Traditional metrics such as the Nash-Sutcliffe Efficiency (NSE) and the Mean Squared Error (MSE) often fail to account for this non-stationarity. These metrics tend to be sensitive to extreme values, and models calibrated with them tend to underrepresent the variability of the signal, especially when the models are simplified representations of complex hydrological processes. To address these shortcomings, the Kling-Gupta Efficiency (KGE) has gained popularity, particularly its skill-score version, KGEss. KGEss offers a more nuanced view of performance. However, it implicitly assumes stationarity because it benchmarks against a single fixed reference (typically the observed long-term mean), and it works best with Gaussian-like flow distributions.
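For reference, the standard KGE combines three components. These are the Pearson correlation r, the variability ratio alpha = sigma_sim / sigma_obs, and the bias ratio beta = mu_sim / mu_obs. Together they give KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2). The skill-score form rescales KGE against a benchmark as KGEss = (KGE - KGE_b) / (1 - KGE_b). When the benchmark is the long-term observed mean, KGE_b is commonly taken as 1 - sqrt(2), approximately -0.41. The following is a minimal NumPy sketch, assuming gap-free series of equal length. The function names `kge` and `kge_skill_score` are illustrative, not taken from any particular library.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# KGE of a constant "simulation" equal to the long-term observed mean:
# alpha = 0 and beta = 1. r is undefined for a constant series and is
# commonly taken as 0, which gives KGE = 1 - sqrt(2).
KGE_LT_MEAN = 1.0 - np.sqrt(2.0)


def kge_skill_score(sim, obs, kge_benchmark=KGE_LT_MEAN):
    """KGEss: 1 is a perfect fit, 0 matches the benchmark, < 0 is worse than it."""
    return (kge(sim, obs) - kge_benchmark) / (1.0 - kge_benchmark)


# Example on a synthetic, positively skewed daily series (illustrative only).
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=10.0, size=365)
sim = 1.2 * obs  # perfectly correlated, but 20% too high and 20% too variable
print(kge(sim, obs), kge_skill_score(sim, obs))  # about 0.717 and 0.800
```

The benchmark enters the skill score only as a fixed number. For this reason, the choice of benchmark reference (here the long-term mean) is where stationarity is implicitly assumed.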

Notably, when the benchmarking scale is reduced from the long-term (LT) mean to seasonal or sub-seasonal levels, the temporal non-stationarity in the mean and median of observed streamflow diminishes substantially. At these shorter temporal scales, streamflow dynamics become more stable and statistically consistent. This allows performance metrics such as KGEss to provide a more meaningful and accurate assessment of model behavior. This insight highlights the importance of choosing benchmarking scales that align with the temporal structure of the data. Building on this understanding, we introduce a Generalized KGE (GKGE) metric. GKGE replaces the fixed benchmark (typically the observed long-term mean) with a user-defined, context-sensitive benchmark, such as a seasonal mean or the mean of a quantile-based flow group. This flexibility allows the metric to adapt to the underlying structure of the data and to extract more relevant information during model training. We tested GKGE on catchments with diverse hydrological behaviors across the contiguous United States (CONUS). Training with GKGE substantially improves simulation accuracy, particularly under low-flow conditions, where traditional metrics tend to underperform.
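The abstract does not state the GKGE formula, so the sketch below is only one hypothetical reading of a "context-sensitive benchmark," not the authors' method. It assumes the following steps:

- Partition the time steps into benchmarking groups, either by season or by quantile band of observed flow.
- Within each group, score KGE against that group's own mean-flow benchmark. This reuses the 1 - sqrt(2) convention for a mean benchmark.
- Average the group scores with equal weight.

The names `gkge_sketch`, `season_groups`, and `flow_quantile_groups` are illustrative assumptions.

```python
import numpy as np


def kge(sim, obs):
    """Standard KGE from correlation r, variability ratio alpha, bias ratio beta."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# KGE of a group's own observed mean used as the benchmark "model"
# (alpha = 0, beta = 1, r taken as 0): 1 - sqrt(2).
KGE_MEAN_BENCHMARK = 1.0 - np.sqrt(2.0)


def season_groups(months):
    """Hypothetical grouping: calendar month 1-12 -> 0=DJF, 1=MAM, 2=JJA, 3=SON."""
    return (np.asarray(months) % 12) // 3


def flow_quantile_groups(obs, n_groups=4):
    """Hypothetical grouping: label each time step by its observed-flow quantile band."""
    edges = np.quantile(obs, np.linspace(0.0, 1.0, n_groups + 1))
    # Interior edges only, so labels run from 0 (lowest flows) to n_groups - 1.
    return np.digitize(obs, edges[1:-1], right=True)


def gkge_sketch(sim, obs, groups):
    """Illustrative group-wise KGE skill score (NOT the published GKGE formula).

    Each group is scored against its own mean-flow benchmark, and the group
    scores are averaged with equal weight, so low-flow groups count as much
    as high-flow groups.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    groups = np.asarray(groups)
    scores = []
    for g in np.unique(groups):
        mask = groups == g
        k = kge(sim[mask], obs[mask])
        scores.append((k - KGE_MEAN_BENCHMARK) / (1.0 - KGE_MEAN_BENCHMARK))
    return float(np.mean(scores))


# Usage on a synthetic two-year daily series (illustrative data only).
rng = np.random.default_rng(42)
days = np.arange(730)
months = (days % 365) // 31 + 1  # approximate calendar month, 1-12
obs = 20 + 15 * np.sin(2 * np.pi * days / 365) + rng.gamma(2.0, 2.0, days.size)
sim = 0.9 * obs + rng.normal(0.0, 2.0, days.size)
print("seasonal benchmark:", gkge_sketch(sim, obs, season_groups(months)))
print("flow-group benchmark:", gkge_sketch(sim, obs, flow_quantile_groups(obs)))
```

The sketch is group-wise for a specific reason. A skill score against a single fixed benchmark value, (KGE - KGE_b) / (1 - KGE_b), is an increasing affine transform of KGE. Optimizing it during training therefore reaches the same parameters as optimizing KGE itself. Scoring within groups and averaging the results changes the relative weighting of time steps, which is one way a benchmark choice could alter what a model learns.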