From 1-Year to 3-Year Inputs: How Extended Data Sequences Impact Data-Driven Model Performance in Streamflow Prediction
Aldo Tapia¹ and Andrew Bennett¹
¹Department of Hydrology and Atmospheric Sciences, The University of Arizona, Tucson, AZ
Data-driven models (DDMs), including the widely used Long Short-Term Memory (LSTM) network, have outperformed process-based models in streamflow prediction by learning from large volumes of data, although their input sequences are typically limited to one year. However, their skill in simulating streamflow in water-limited catchments remains systematically lower than in humid basins. One reason for this is that existing models do not capture the influence of long-term dynamics on streamflow.
In this study, we ask: can DDMs effectively capture longer-range interactions? We investigate the performance of a range of DDM architectures (LSTM, GRU, MLP-Mixer, and CNN) using one and three years of input data from the CAMELS dataset, and we evaluate whether extending the input sequence allows these models to incorporate long-term hydrological processes. Additionally, we introduce synthetic precipitation and temperature perturbations far in the past to assess the sensitivity of DDMs to historical events. Our findings provide insights into the ability of DDMs to leverage long-term dependencies in hydrological modeling, highlighting potential advantages and limitations of extending input sequences for streamflow prediction. This work also explores avenues for improving streamflow predictions in arid regions and offers insights into the importance of incorporating information from the distant past, such as meteorological droughts, into DDMs.
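The perturbation experiment described above can be illustrated with a minimal sketch. It is not the study's implementation. It assumes a daily time step (so one and three years become 365 and 1095 steps), a multiplicative perturbation of a 30-day block, and a hypothetical `toy_model` that stands in for a trained DDM. All function names and parameter values are illustrative assumptions.

```python
ONE_YEAR = 365          # assumed daily time step
THREE_YEARS = 3 * 365


def make_window(series, end, length):
    """Return the `length` daily values ending at index `end` (inclusive)."""
    start = end - length + 1
    if start < 0 or end >= len(series):
        raise ValueError("not enough data for this window")
    return series[start:end + 1]


def perturb_series(series, end, days_back, span=30, factor=2.0):
    """Copy `series` and scale `span` consecutive days by `factor`.

    The most recent perturbed day lies `days_back` days before index `end`.
    Multiplicative scaling is an assumption made for this sketch.
    """
    stop = end - days_back + 1   # exclusive upper bound of the block
    start = stop - span
    if span <= 0 or start < 0 or stop > len(series):
        raise ValueError("perturbation falls outside the series")
    out = list(series)
    for i in range(start, stop):
        out[i] *= factor
    return out


def toy_model(window, memory=0.995):
    """Hypothetical stand-in for a trained DDM (not one of the study's models):
    an exponentially weighted sum of the inputs in the window."""
    total, weight = 0.0, 1.0
    for value in reversed(window):   # most recent day gets the largest weight
        total += weight * value
        weight *= memory
    return (1.0 - memory) * total


def sensitivity(model, series, end, length, days_back, span=30, factor=2.0):
    """Change in model output caused by a perturbation `days_back` days ago."""
    base = model(make_window(series, end, length))
    perturbed = perturb_series(series, end, days_back, span, factor)
    return model(make_window(perturbed, end, length)) - base


# Deterministic synthetic "precipitation" record, five years long.
precip = [float(day % 5) for day in range(5 * 365)]
end = len(precip) - 1

# A perturbation 500 days in the past lies outside the 1-year window
# but inside the 3-year window.
sens_1y = sensitivity(toy_model, precip, end, ONE_YEAR, days_back=500)
sens_3y = sensitivity(toy_model, precip, end, THREE_YEARS, days_back=500)
print(sens_1y, sens_3y)
```

In this sketch the one-year input cannot respond to the perturbation at all, because the perturbed days never enter its window. The three-year input can respond, but only as strongly as the model's learned memory allows. That second effect is what a sensitivity test of this kind is meant to probe in a real DDM.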