Streamflow predictions are essential to a country's economic and human development. They are used to quantify and allocate water resources, design new hydraulic infrastructure, and assess risk, among other applications. Our study area, Chile, has an extensive dataset of streamflow records, meteorological forcings, and catchment attributes (CAMELS-CL). The goal of our study is to use this information to develop a national streamflow prediction model.
Recent studies have shown that Machine Learning (ML) models can provide better predictive performance than traditional process-based (PB) models. These findings create an opportunity to bridge the gap between ML and PB models by transferring insights gained while developing an ML model into improvements for PB models. With this in mind, we implemented the GR4J process-based catchment model as a baseline and two ML-based models, the Random Forest (RF) decision-tree approach and the Long Short-Term Memory (LSTM) dynamic state-variable approach, on 322 selected Chilean catchments. The three models were compared in detail to examine their strengths and weaknesses and to determine the best candidate for a national model. Our results show that none of the three models performed best across the entire country, and all of them had problems in the north of Chile. This indicates that additional informative attributes and variables must be incorporated into the database. Furthermore, the models show complementary performance, which suggests that an ensemble of them would combine their respective strengths and predict streamflow best. Overall, model performance was found to be strongly related to aridity, showing that this attribute is an important variable for characterizing the behavior of different catchments.
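To make the ensemble idea concrete, the following is a minimal sketch of how simulations from several models could be combined and scored. It is an illustration only and not the method used in the study: the equal-weight average, the choice of the Nash-Sutcliffe efficiency (NSE) as the score, the function names `nse` and `ensemble_mean`, and all streamflow values are assumptions introduced here.

```python
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = fmean(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def ensemble_mean(*simulations):
    """Equal-weight average of several simulated series, one value per time step."""
    return [fmean(values) for values in zip(*simulations)]


# Hypothetical streamflow values (m^3/s), made up for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
model_a = [1.0, 3.0, 5.0, 7.0]  # underestimates every time step by 1
model_b = [3.0, 5.0, 7.0, 9.0]  # overestimates every time step by 1

combined = ensemble_mean(model_a, model_b)
scores = {
    "model_a": nse(observed, model_a),
    "model_b": nse(observed, model_b),
    "ensemble": nse(observed, combined),
}
```

In this toy case the two biases cancel by construction, so the ensemble scores higher than either member. With real simulations the gain depends on how complementary the model errors actually are, and weights that vary by catchment (for example, with aridity) would be a natural refinement.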