Article type: Research article
Authors
1 Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran
2 Assistant Professor, Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Karaj, Iran.
3 Department of Physical Geography, Faculty of Geography, University of Tehran
Abstract
Keywords
Subjects
Article title [English]
Authors [English]
This study compared the performance of ARIMA, SARIMA, ELM, and XGBoost models for modeling and forecasting monthly streamflow yield in the Taleghan watershed. The dataset comprised monthly mean discharge time series from five hydrometric stations (Joestan, Mehran Joestan, Dehdar, Gatehdeh, and Alizan Joestan) over a 30-year period, from the 1989–1990 to the 2018–2019 water year. The data were split into 80% for training and 20% for testing, and each model was trained and evaluated with four input combinations, using the one to four preceding months as predictors. Performance was assessed with the root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and correlation coefficient (R). The machine learning models outperformed the classical ones, most clearly at Joestan station. XGBoost achieved the best results, with an NSE of 0.978 (training) and 0.961 (testing) using four-month inputs. Increasing the number of preceding months improved accuracy: for XGBoost in the testing phase, moving from one to four months raised NSE by 2.1% (0.94 to 0.96) and cut RMSE by 6.2% (0.16 to 0.15 m³/s). Machine learning models therefore offer effective tools for streamflow yield forecasting and water resources management. Future work could focus on hybrid models and the integration of climatic data.
Keywords [English]
Streamflow yield is one of the most important hydrological variables for water resources management in arid and semi-arid regions such as Iran. Accurate forecasting of monthly streamflow yield is essential for optimal reservoir operation, drought mitigation, agricultural water allocation, and sustainable water supply planning. Given Iran’s climate with low and highly irregular precipitation, reliable monthly streamflow yield forecasting is particularly critical. Major challenges in monthly streamflow yield forecasting include high temporal variability, non-stationarity, and pronounced nonlinear behavior of hydrological time series. Although classical statistical models such as ARIMA and SARIMA have long been applied, they frequently fail to capture the complex patterns inherent in monthly streamflow yield data. In recent years, machine learning and data-driven techniques have demonstrated significantly superior performance in modeling such intricate relationships. Despite these advances, issues persist, including the scarcity of long-term, high-quality monthly streamflow yield records, complex interactions between climate drivers and catchment characteristics, and the requirement for robust generalization to unseen conditions. Consequently, comparative studies and the development of hybrid approaches remain vital for further improvement. Given the pivotal role of accurate monthly streamflow yield forecasting in operational hydrology and long-term water resources planning in Iran, investment in advanced data-driven modeling has become a strategic necessity.
This study used long-term monthly mean discharge time series recorded at five hydrometric stations (Joestan, Mehran Joestan, Dehdar, Gatehdeh, and Alizan Joestan) within the Taleghan watershed, northern Iran. The dataset covers a continuous 30-year period from the 1989–1990 to the 2018–2019 water year (1368–1369 to 1397–1398 in the Iranian calendar). Missing values in the monthly time series were reconstructed using linear regression and inverse distance weighting (IDW). Homogeneity of the series was verified using the Run Test at a 95% confidence level, while long-term persistence was confirmed by Hurst exponent values greater than 0.5, indicating that the data are suitable for time-series forecasting.
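To make the gap-filling step concrete, the following is a minimal sketch of inverse distance weighting for a single missing monthly value. The source does not state how distances between stations were measured or which power was used, so the function name `idw_estimate`, the power of 2, and the distances and discharges in the example are all illustrative assumptions.

```python
def idw_estimate(neighbors, power=2.0):
    """Estimate a missing value from (distance, value) pairs of neighbouring stations.

    Each neighbour is weighted by 1 / distance**power, so closer stations
    contribute more. The power of 2 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for distance, value in neighbors:
        if distance <= 0:
            raise ValueError("distances must be positive")
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbour is required")
    return weighted_sum / weight_total


# Hypothetical example: two neighbouring stations at relative distances 1 and 2
# reporting 10 and 20 m3/s for the month that is missing at the target station.
filled = idw_estimate([(1.0, 10.0), (2.0, 20.0)])
print(filled)
```

With these hypothetical inputs the nearer station receives four times the weight of the farther one, which pulls the estimate toward its value.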
Four models were employed and compared:
Two classical approaches: ARIMA (for non-seasonal patterns) and SARIMA (for seasonal patterns)
Two advanced machine learning models: Extreme Learning Machine (ELM) and XGBoost
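The difference between the non-seasonal and seasonal classical models listed above can be illustrated through differencing: ARIMA removes trend with ordinary differences, while SARIMA additionally differences at the seasonal period (12 for monthly data). The sketch below is illustrative only; the synthetic series and the helper name `difference` are assumptions and do not reproduce the study's fitted models.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical two-year monthly series: a repeating 12-month cycle plus a linear trend.
cycle = [5, 4, 3, 2, 1, 1, 2, 4, 7, 9, 8, 6]
flow = [cycle[m % 12] + 0.1 * m for m in range(24)]

# Seasonal differencing (lag 12) cancels the repeating annual cycle,
# leaving only the constant year-on-year increase caused by the trend.
seasonal = difference(flow, lag=12)

# A further ordinary difference (lag 1) removes that remaining constant.
regular = difference(seasonal, lag=1)

print(seasonal)
print(regular)
```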
All models were developed using lagged monthly streamflow yield values (one to four previous months) as predictors. Performance was assessed using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Nash–Sutcliffe Efficiency (NSE), and Pearson correlation coefficient (R). The dataset was divided into 80% training and 20% testing subsets, and k-fold cross-validation was applied to check model robustness and limit overfitting. Stationarity was examined using the Augmented Dickey–Fuller test, and differencing was applied where necessary. All analyses and modeling were performed in the R software environment using the packages forecast, xgboost, elmNNRcpp, and hydroGOF.
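The lagged-input construction and the 80/20 split described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study. The helper names `make_lagged` and `chronological_split` are hypothetical, the series is synthetic, and splitting in time order (earliest 80% for training) is an assumption, since the source does not say how the split was drawn.

```python
def make_lagged(series, n_lags):
    """Build predictors and targets from a univariate series.

    Each row of X holds the n_lags values preceding the target month,
    ordered oldest first; y holds the corresponding target values.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling: the first part trains, the remainder tests."""
    cut = int(len(y) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Hypothetical 24-month series, used only to show the shapes involved.
flow = [float(v) for v in range(1, 25)]

X, y = make_lagged(flow, n_lags=4)  # the four-month input combination
X_train, y_train, X_test, y_test = chronological_split(X, y)

print(len(X_train), len(X_test))
print(X[0], y[0])
```

Repeating the call with `n_lags` set to 1, 2, and 3 would give the other three input combinations; note that each extra lag shortens the usable series by one month.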
The performance of the four models was systematically evaluated across the five stations using the monthly mean discharge time series. The classical ARIMA and SARIMA models provided reasonable results for relatively smooth series but exhibited clear limitations in representing sharp peaks and strong nonlinearity typically observed in monthly streamflow yield data. In contrast, the machine learning models consistently outperformed the statistical models at all stations.
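For reference, the four evaluation metrics used in this comparison follow their standard definitions, sketched here in plain Python. The study itself reports using the R package hydroGOF, so the function names and the small observed and simulated series below are illustrative assumptions only.

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


# Hypothetical observed and simulated discharges (m3/s).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

print(rmse(observed, simulated), mae(observed, simulated))
print(nse(observed, simulated), pearson_r(observed, simulated))
```

Unlike R, NSE penalizes systematic bias: a simulation that is perfectly correlated with the observations but consistently too high can still score a low NSE.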
Among all tested configurations, XGBoost achieved the highest accuracy. At Joestan station, using four lagged months as inputs, XGBoost yielded an NSE of 0.978 in the training period and 0.961 in the testing period. Increasing the number of lagged inputs from one to four months systematically improved forecasting accuracy; for XGBoost, the NSE in the testing phase increased from 0.940 to 0.960 (a 2.1% improvement), while RMSE decreased from 0.16 to 0.15 m³/s (a 6.2% reduction).
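The relative changes quoted above follow from the usual percent-change formula applied to the rounded values reported in the text. The sketch below reproduces that arithmetic; the helper name `percent_change` is an assumption for illustration.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Values as reported for XGBoost in the testing phase (one lag vs. four lags).
nse_gain = percent_change(0.940, 0.960)   # positive: NSE improved
rmse_drop = percent_change(0.16, 0.15)    # negative: RMSE decreased

print(round(nse_gain, 2), round(rmse_drop, 2))
```

Because the inputs are themselves rounded to two or three decimals, the computed percentages can differ slightly from figures derived from unrounded model output.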
The Extreme Learning Machine (ELM) also performed strongly and offered considerably faster training times, making it a practical alternative under computational constraints. Overall, XGBoost proved to be the most robust and accurate model for monthly streamflow yield forecasting in the Taleghan watershed, demonstrating excellent generalization and stability.
The results support adopting XGBoost as the preferred tool for operational monthly streamflow yield forecasting in semi-arid basins similar to Taleghan. ELM serves as an efficient, faster-training alternative, whereas the traditional ARIMA and SARIMA models remain suitable mainly for preliminary analyses of less complex monthly streamflow yield series. This study confirms the superior capability of advanced machine learning techniques in monthly streamflow yield time-series forecasting and provides a solid foundation for their operational implementation in water resources management in Iran.
All authors contributed equally to the conceptualization of the article and writing of the original and subsequent drafts.
Data available on request from the authors.
The authors would like to thank the reviewers and editor for their critical comments, which helped to improve the paper. The authors gratefully acknowledge the support and facilities provided by the Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Iran.
The authors avoided data fabrication, falsification, plagiarism, and misconduct.
The authors declare no conflict of interest.