Comparison of Box-Jenkins Models with Machine Learning Algorithms for Streamflow Yield Modeling (Case Study: Taleghan Watershed)

Document Type : Research Paper

Authors

1 Department of Reclamation of Arid and Mountainous Regions Engineering, Faculty of Natural Resources, University of Tehran

2 Assistant Professor, Department of Reclamation of Arid and Mountainous Regions Engineering, Faculty of Natural Resources, University of Tehran, Karaj, Iran

3 Natural Geography Group, Faculty of Geography, University of Tehran

DOI: 10.22059/ijswr.2025.403797.670025

Abstract

This study compared two Box-Jenkins models, the autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA), with two machine learning algorithms, the extreme learning machine (ELM) and XGBoost, for modeling and forecasting monthly streamflow yield in the Taleghan watershed. The dataset comprised monthly mean discharge time series from five hydrometric stations (Joestan, Mehran Joestan, Dehdar, Gatehdeh, and Alizan Joestan) over a 30-year period, from the 1989–1990 to the 2018–2019 water year. The data were split into 80% for training and 20% for testing, and the models were trained and evaluated with four input combinations that used one to four prior months. Performance was assessed using the root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and correlation coefficient (R). The machine learning models outperformed the classical ones, especially at the Joestan station. XGBoost achieved the best results, with an NSE of 0.978 in training and 0.961 in testing when using four-month inputs. Increasing the number of prior months improved accuracy: for XGBoost in the testing phase, using four months raised NSE by 2.1% (from 0.94 to 0.96) and reduced RMSE by 2.6% (from 0.16 to 0.15). These results indicate that machine learning models are effective tools for streamflow yield forecasting and water resources management. Future work could focus on hybrid models and on integrating climatic data.
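
The evaluation setup described in the abstract (inputs built from one to four prior months, an 80/20 train/test split, and scoring with RMSE, MAE, NSE and R) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the synthetic 360-month series, the chronological (unshuffled) split, and the persistence forecast used as a stand-in for a fitted model are all assumptions made for illustration.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags prior months, oldest first."""
    q = np.asarray(series, dtype=float)
    n = len(q)
    # Column i of X is the series shifted so that row t contains q[t + i].
    X = np.column_stack([q[i:n - n_lags + i] for i in range(n_lags)])
    y = q[n_lags:]  # target month that follows each row of X
    return X, y


def chrono_split(X, y, train_frac=0.8):
    """Split without shuffling: first train_frac of samples train, the rest test.

    Assumption: the abstract states 80%/20% but not how the split was drawn.
    """
    cut = int(len(y) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def rmse(obs, sim):
    """Root mean square error."""
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(obs - sim)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2))


def corr(obs, sim):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(obs, sim)[0, 1])


# Synthetic stand-in for 30 water years of monthly discharge (NOT the study data).
rng = np.random.default_rng(0)
months = np.arange(360)
flow = 5.0 + 3.0 * np.sin(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 0.5, 360)

for n_lags in (1, 2, 3, 4):
    X, y = make_lagged(flow, n_lags)
    X_train, y_train, X_test, y_test = chrono_split(X, y)
    # Placeholder forecast: repeat the most recent month (persistence).
    # A fitted ARIMA, SARIMA, ELM or XGBoost model would take its place here.
    y_hat = X_test[:, -1]
    print(n_lags, rmse(y_test, y_hat), mae(y_test, y_hat),
          nse(y_test, y_hat), corr(y_test, y_hat))
```

Because the persistence forecast uses only the most recent month, its scores barely change with the number of lags; a trained model would use all columns of `X_train`, and its scores can be compared across the four input combinations in the same loop.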
