Combining the Outlier Robust Extreme Learning Machine (ORELM) with the Seasonal Autoregressive Moving Average (SARIMA) Linear Model to Improve the Accuracy of Runoff Modeling

Article type: Research Article

Authors

1 Department of Civil Engineering, Faculty of Engineering, Razi University, Kermanshah, Iran

2 Associate Professor, Department of Water Engineering, Faculty of Agriculture, Razi University, Kermanshah, Iran

3 Department of Civil Engineering, Faculty of Engineering, Razi University, Kermanshah, Iran.

Abstract

Accurate and reliable runoff forecasting plays an important role in water resources management, but the complex nature of this variable can pose major challenges for developing suitable forecasting models. Two hybrid models, each combining a simple linear model with a simple nonlinear model, are proposed for modeling monthly runoff at hydrological station 02PL005 in the St. Lawrence River basin, Canada. The linear seasonal autoregressive integrated moving average (SARIMA) model is used to capture the linear and seasonal characteristics of runoff, while the multilayer perceptron (MLP) and the outlier robust extreme learning machine (ORELM) are used to capture the nonlinear characteristics of the data through machine learning and pattern recognition. To improve modeling accuracy, the stationarity and normality of the data were examined first, and the data were prepared for the linear stage through appropriate preprocessing. Several sub-scenarios were then defined and modeled with the linear model, and the best linear model was selected using the statistical criteria MAE, RMSE, R and AIC. In the final stage, the residuals of the linear model were modeled with the two nonlinear models, MLP and ORELM. A comparison of the proposed hybrid models showed that the SARIMA-ORELM hybrid model (AIC=249.29, R=0.71, MAE=11.2, RMSE=14.33) outperformed the SARIMA-MLP model on all criteria. The hybrid models were also compared with the standalone MLP, ORELM and SARIMA models.
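The additive linear–nonlinear scheme described in the abstract can be written compactly as follows. This is a sketch in standard Box–Jenkins notation; the backshift operator B, the polynomial symbols, the seasonal period s = 12 for monthly data and the use of n lagged residuals as inputs to the nonlinear model f are assumptions, since the abstract does not state the model orders or the input structure.

```latex
\begin{aligned}
&\text{SARIMA}(p,d,q)(P,D,Q)_s:\quad
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,
\qquad s = 12,\\[4pt]
&e_t = y_t - \hat{L}_t
\quad\text{(residual of the fitted linear model, } \hat{L}_t \text{ = SARIMA forecast)},\\[4pt]
&\hat{N}_t = f\left(e_{t-1}, e_{t-2}, \dots, e_{t-n}\right)
\quad\text{(MLP or ORELM fitted to the residuals)},\\[4pt]
&\hat{y}_t = \hat{L}_t + \hat{N}_t
\quad\text{(hybrid forecast)}.
\end{aligned}
```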



Title [English]

Combining Outlier Robust Extreme Learning Machine (ORELM) with seasonal autoregressive moving average linear model (SARIMA) to improve the accuracy of runoff modeling

Authors [English]

  • Fereshteh Nourmohammadi Dehbalaei 1
  • Arash Azari 2
  • Ali Akbar Akhtari 3
1 Department of Information Science, Faculty of Management, University of Tehran, Tehran, Iran
2 Assistant Professor, Department of Water Engineering, Razi University, Kermanshah, Iran
3 Department of Civil Engineering, Razi University, Kermanshah, Iran
Abstract [English]

Accurate and reliable runoff forecasting plays an important role in water resources management, but the complex nature of runoff can pose major challenges for developing appropriate forecasting models. Two hybrid models, each combining a simple linear model with a simple nonlinear model, are proposed for monthly runoff modeling at hydrological station 02PL005 in the St. Lawrence River basin, Canada. The seasonal autoregressive integrated moving average (SARIMA) linear model is used to capture the linear and seasonal characteristics of runoff, while artificial neural network (ANN) and outlier robust extreme learning machine (ORELM) models are used to capture the nonlinear characteristics of the data through machine learning and pattern recognition. To increase modeling accuracy, the normality and stationarity of the data were examined first, and appropriate preprocessing was applied to prepare the data for the linear stage. Different sub-scenarios were then defined and modeled with the linear model, and the best linear model was selected using the statistical criteria MAE, RMSE, R and AIC. In the final stage, the residuals of the linear model were modeled with the two nonlinear models, ANN and ORELM. A comparison of the proposed hybrid models showed that the SARIMA-ORELM hybrid model (AIC=249.29, R=0.71, MAE=11.2, RMSE=14.33) outperformed the SARIMA-MLP model on all criteria. The results of the hybrid models were also compared with those of the standalone MLP, ORELM and SARIMA models.

Keywords [English]

  • Hybrid model
  • MLP
  • Monthly Runoff
  • ORELM
  • SARIMA

EXTENDED ABSTRACT

 

Introduction

Runoff is an important component of the hydrological system and is influenced by various factors such as geographic location, topography and climate. Runoff forecasting plays an essential role in mitigating the effects of floods and droughts and in controlling erosion and sedimentation in the basin. Various hydrological models, including empirical, physically based and data-driven models, have been developed for runoff modeling. Data-driven methods have become increasingly popular because they require less knowledge of the physical behavior of the phenomenon.

Materials and Methods

The data were first divided into a training set (70% of all measured data) and a testing set (the remaining 30%). The Hurst coefficient of the data was 0.63, which indicates that the time series is long enough for modeling. Normality and stationarity tests showed that the data are non-normally distributed and non-stationary. The data were therefore normalized with normalizing functions and made stationary by removing deterministic terms through seasonal differencing. Two scenarios (without and with preprocessing) were defined, and the best linear model was selected from the resulting models. After the residuals of the linear model were computed and their independence was checked with the Ljung–Box test, the residuals were modeled with outlier robust extreme learning machine (ORELM) and multilayer perceptron (MLP) models. Finally, the output of the nonlinear model was added to the output of the linear model.
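The preprocessing and residual-checking steps described above can be sketched in Python as follows. The sketch rests on several assumptions: a chronological 70/30 split, a seasonal period of 12 months, per-calendar-month standardization as one common form of the standardization mentioned for sub-scenario 4, and the textbook Ljung–Box statistic Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k). The function names are hypothetical, and the study's exact normalizing functions and lag choices are not given in this abstract.

```python
import numpy as np
from scipy.stats import chi2


def split_train_test(y, train_frac=0.7):
    """Chronological split: first 70% for training, the rest for testing."""
    n_train = int(round(train_frac * len(y)))
    return y[:n_train], y[n_train:]


def seasonal_difference(y, s=12):
    """Seasonal differencing y_t - y_{t-s}; removes a fixed seasonal pattern."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]


def seasonal_standardize(y, s=12, ref=None, start=0):
    """Standardize each calendar month with its own mean and standard deviation.

    Statistics come from `ref` (e.g. the training part) so that test data are
    scaled with training information only. `ref` is assumed to begin at calendar
    position 0; `start` is the calendar position of y[0] and is only needed
    when `ref` is given.
    """
    y = np.asarray(y, dtype=float)
    ref = y if ref is None else np.asarray(ref, dtype=float)
    mu = np.array([ref[m::s].mean() for m in range(s)])
    sd = np.array([ref[m::s].std(ddof=1) for m in range(s)])
    idx = (np.arange(len(y)) + start) % s
    return (y - mu[idx]) / sd[idx]


def ljung_box(resid, h=12, n_params=0):
    """Ljung-Box Q statistic and p-value for residual autocorrelation up to lag h.

    n_params is the number of fitted ARMA coefficients, subtracted from the
    chi-square degrees of freedom.
    """
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e ** 2)
    lags = np.arange(1, h + 1)
    r = np.array([np.sum(e[k:] * e[:-k]) / denom for k in lags])
    q = n * (n + 2) * np.sum(r ** 2 / (n - lags))
    return q, chi2.sf(q, df=h - n_params)
```

For consistent scaling, `seasonal_standardize(train)` and `seasonal_standardize(test, ref=train, start=len(train) % 12)` use training statistics for both parts. A large `ljung_box` p-value on the SARIMA residuals is consistent with independent residuals, while a small one indicates remaining autocorrelation.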

Results and Discussion

For linear modeling with the SARIMA model, two scenarios were defined. In the first scenario, the best linear model, which used seasonal parameters and no preprocessing, achieved MAE=13.28, RMSE=17.23, R=0.62 and AIC=267.54. In the second scenario, four sub-scenarios were implemented; sub-scenario 4, which applied standardization as preprocessing, gave better results than the other sub-scenarios, with MAE=12.76, RMSE=13.11, R=0.57 and AIC=264.41. Among the nonlinear models, model 6 had the lowest error and the highest correlation (MAE=10.25, RMSE=13.48, R=0.7). Among the SARIMA-MLP combinations, model 4 had the lowest error, the highest correlation and the least complexity (MAE=11.35, RMSE=14.67, AIC=254.41, R=0.65). Among the SARIMA-ORELM combinations, model 6 performed best in terms of both accuracy and complexity (AIC=249.29, R=0.71, MAE=11.2, RMSE=14.33). Based on these statistical indicators, the best SARIMA-ORELM and SARIMA-MLP models were selected. Comparing the linear models obtained in the two scenarios showed that preprocessing through standardization increases model accuracy and reduces model complexity.
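The selection criteria reported above can be computed as in the sketch below. MAE, RMSE and the Pearson correlation R follow their standard definitions. For AIC we assume the common least-squares form AIC = n ln(SSE/n) + 2k, where k is the number of estimated parameters; the study's exact AIC definition is not stated here, so values from this sketch are not directly comparable with the reported ones. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(obs, pred, k):
    """Return MAE, RMSE, Pearson R and a least-squares AIC for one model.

    k is the number of estimated parameters (assumed AIC convention).
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = obs - pred
    n = len(obs)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r = np.corrcoef(obs, pred)[0, 1]
    aic = n * np.log(np.sum(err ** 2) / n) + 2 * k
    return {"MAE": mae, "RMSE": rmse, "R": r, "AIC": aic}
```

When comparing candidate models, lower MAE, RMSE and AIC and higher R are better. The 2k term in AIC is what penalizes additional parameters, which is why AIC serves as the complexity criterion.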

Conclusion

The main findings from comparing the hybrid models with the standalone SARIMA and MLP models are summarized below:

Based on the statistical indicators, the SARIMA-ORELM model outperforms the SARIMA-MLP model on all criteria.

The SARIMA-MLP and SARIMA-ORELM models reduced model complexity by 4.9% and 6.8%, respectively, compared with the linear model without preprocessing.
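These two percentages can be reproduced from the AIC values reported under Results and Discussion, provided complexity is measured by AIC and the reference is the best first-scenario SARIMA model (AIC=267.54). That reading is an assumption, since the text does not state the formula explicitly, but the short check confirms it matches both figures to one decimal place.

```python
def pct_reduction(base, new):
    """Percentage decrease from base to new."""
    return 100.0 * (base - new) / base


AIC_SARIMA_NO_PREPROC = 267.54  # best scenario-1 SARIMA model (no preprocessing)
AIC_BEST = {"SARIMA-MLP": 254.41, "SARIMA-ORELM": 249.29}

for name, aic in AIC_BEST.items():
    print(f"{name}: {pct_reduction(AIC_SARIMA_NO_PREPROC, aic):.1f}% lower AIC")
# SARIMA-MLP: 4.9% lower AIC
# SARIMA-ORELM: 6.8% lower AIC
```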

Among the six models considered for runoff modeling, the SARIMA model without preprocessing performed worst in terms of both error and complexity criteria.

Author Contributions

F.N.D.: Writing – original draft, Formal analysis, Conceptualization, Data curation, Methodology, Validation, Writing – review & editing. A.A.: Writing – original draft, Formal analysis, Conceptualization, Data curation, Methodology, Validation, Writing – review & editing. A.A.A.: Conceptualization, Data curation, Writing – review & editing.

Data Availability Statement

Data are available from the authors upon reasonable request.

 

Acknowledgements

The authors would like to thank all participants of the present study.

Ethical considerations

The authors avoided data fabrication, falsification, plagiarism, and misconduct.

Conflict of interest

The authors declare no conflict of interest.
