Evaluation of the GPR-PSO and KNN-PSO intelligent models for estimating the suspended sediment concentration distribution

Article type: Research article

Authors

1 Department of Water Science and Engineering, Faculty of Agriculture and Environment, Arak University, Arak, Iran

2 Department of Irrigation and Reclamation Engineering, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Department of Water Science and Engineering, Faculty of Agriculture and Environment, Arak University, Arak, Iran

10.22059/ijswr.2025.377078.669846

Abstract

The vertical distribution of suspended sediment concentration is one of the most fundamental parameters in the hydraulics of sediment transport in rivers. It plays an important role in calculating total sediment discharge in channels and rivers, so measuring it accurately has long been a goal of researchers. One way to estimate it accurately is to use intelligent models. In this study, four data-mining models (KNN, KNN-PSO, GPR, and GPR-PSO) were used to predict the sediment concentration distribution (C/Ca). All models were coded in MATLAB. The results showed that the optimization applied to the KNN and GPR models was effective and improved their accuracy. A comparison of the models showed that GPR-PSO is more accurate than the others, with RMSE = 0.0297, R2 = 0.9878, and KGE = 0.9776 in the training phase and RMSE = 0.0226, R2 = 0.9907, and KGE = 0.9715 in the testing phase. KNN-PSO ranked next, with RMSE = 0.0295, R2 = 0.9870, and KGE = 0.9864 in the training phase and RMSE = 0.0374, R2 = 0.9808, and KGE = 0.9569 in the testing phase, followed by GPR and KNN. Analysis of the results also showed that y/D and y/a are the most important parameters for obtaining more accurate results.


Article title [English]

Evaluation of GPR-PSO and KNN-PSO data-mining models for prediction of suspended sediment concentration distribution

Authors [English]

  • Mohsen Nasrabadi 1
  • Yaser Mehri 2
  • Ali Abdolrazaq Sabbar 3
  • MohammadJavad Nahvinia 1
1 Department of Water Science and Engineering, Faculty of Agriculture and Environment, Arak University, Arak, Iran.
2 Department of Irrigation and Reclamation Engineering, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.
3 Department of Water Science and Engineering, Faculty of Agriculture and Environment, Arak University, Arak, Iran.
Abstract [English]

The vertical distribution of suspended sediment concentration (SSC) is one of the most important parameters in the hydraulics of sediment transport in rivers. It plays an important role in calculating the total sediment discharge in channels and rivers, so measuring it accurately has long been a goal of researchers. One way to predict it accurately is to use intelligent models. In this study, four data-mining models (KNN, KNN-PSO, GPR, and GPR-PSO) were used to predict the distribution of sediment concentration (C/Ca). All models were coded in MATLAB. The results showed that the optimization applied to the KNN and GPR models was effective and improved their performance. Among the models compared, GPR-PSO was the most accurate, with RMSE = 0.0297, R2 = 0.9878, and KGE = 0.9776 in the training phase and RMSE = 0.0226, R2 = 0.9907, and KGE = 0.9715 in the testing phase. KNN-PSO ranked second, with RMSE = 0.0295, R2 = 0.9870, and KGE = 0.9864 in the training phase and RMSE = 0.0374, R2 = 0.9808, and KGE = 0.9569 in the testing phase. GPR and KNN ranked third and fourth, respectively. Analysis of the results also showed that y/D and y/a are the most important parameters for obtaining accurate results.

Keywords [English]

  • Concentration distribution
  • Suspended sediments
  • Data mining models
  • Gaussian process regression

Introduction

The vertical distribution of suspended sediment concentration (SSC) is one of the most fundamental parameters in the hydraulics of sediment transport in rivers. It plays an important role in calculating the total sediment discharge in channels and rivers, so measuring it accurately has long been a goal of researchers. One way to estimate it accurately is to use intelligent models.

The main aim of the present study is to present and evaluate highly accurate models for predicting suspended sediment concentration under laboratory conditions. The suspended sediment distribution is an important quantity in river engineering, and calculating it accurately yields an acceptable estimate of suspended sediment discharge. Because accurate measurement is necessary but laboratory work is difficult, intelligent models can be useful. To this end, four intelligent models were used in the present study: k-nearest neighbors (KNN), Gaussian process regression (GPR), and versions of each tuned by particle swarm optimization (KNN-PSO and GPR-PSO).

Methodology

In this study, four intelligent models (KNN, KNN-PSO, GPR, and GPR-PSO) were used to predict the distribution of suspended sediment concentration. All models were coded in MATLAB. Laboratory data from Vanoni (1946) and Einstein and Chien (1955) were used for modeling (Table 1). Together these comprise 203 data points of sediment concentration distribution. Einstein and Chien’s experiments were conducted in a flume 30.7 cm wide with a smooth bed, at slopes of 0.00185 and 0.0025, using sediment particles with diameters of 1.3, 0.94, and 0.274 mm. Vanoni’s experiments were conducted in a flume 84.5 cm wide with a rough bed and a constant slope of 0.0025, using sediment particles with diameters of 0.1, 0.13, and 0.16 mm.
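As a hedged illustration of how a KNN-PSO hybrid can be put together, the sketch below tunes the number of neighbors k of a plain KNN regressor with a minimal particle swarm search. It is written in Python for brevity rather than in the MATLAB used in the study, and it is not the authors' code: the function names, the PSO coefficients (w, c1, c2), the swarm size, the choice of k as the only tuned hyperparameter, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def validation_rmse(k, train_x, train_y, val_x, val_y):
    """Root mean square error of KNN on a held-out validation set."""
    sq = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(val_x, val_y)]
    return math.sqrt(sum(sq) / len(sq))


def pso_tune_k(objective, k_min, k_max, n_particles=8, n_iter=20, seed=0):
    """Minimal particle swarm search over one integer hyperparameter k.

    Particles move in continuous space; each position is rounded and
    clamped to an integer before evaluation, one common way to handle a
    discrete search space.
    """
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)

    def to_k(p):
        return int(min(max(round(p), k_min), k_max))

    pos = [rng.uniform(k_min, k_max) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                                  # personal best positions
    best_cost = [objective(to_k(p)) for p in pos]      # personal best costs
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g], best_cost[g]          # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], k_min), k_max)
            cost = objective(to_k(pos[i]))
            if cost < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i], cost
                if cost < g_cost:
                    g_pos, g_cost = pos[i], cost
    return to_k(g_pos), g_cost


# Synthetic one-feature data (NOT the Vanoni or Einstein and Chien measurements).
_rng = random.Random(1)
xs = [(_rng.random(),) for _ in range(120)]
ys = [math.exp(-3.0 * x[0]) + _rng.gauss(0.0, 0.02) for x in xs]
train_x, train_y = xs[:90], ys[:90]
val_x, val_y = xs[90:], ys[90:]


def objective(k):
    return validation_rmse(k, train_x, train_y, val_x, val_y)


best_k, best_rmse = pso_tune_k(objective, k_min=1, k_max=30)
print(f"best k = {best_k}, validation RMSE = {best_rmse:.4f}")
```

Rounding particle positions to integers keeps the swarm update unchanged while respecting the fact that k must be a whole number. A GPR-PSO variant would instead let the particles range over continuous kernel hyperparameters.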

Results and Discussion

The data were first normalized to the range 0 to 1. Then 80% of the data were used for training and 20% for testing. The results showed that optimization by the PSO method increased the accuracy of both the GPR and KNN models. The best model was GPR-PSO, with RMSE = 0.0297, R2 = 0.9878, and KGE = 0.9776 in the training phase and RMSE = 0.0226, R2 = 0.9907, and KGE = 0.9715 in the testing phase. The KNN-PSO model ranked second.

These results show that, when optimal values are selected for its parameters, the GPR model is more accurate than the KNN model. One reason is that the prediction surface in KNN is discrete, which limits the search space for PSO, whereas in GPR it is continuous, so PSO searches a larger parameter space and the model becomes more accurate in regression and interpolation problems. In addition, the modeling method in KNN is linear, which makes the optimization less effective, while GPR is better able to model nonlinear relationships such as those in sediment problems and is therefore more flexible. Finally, GPR can account for uncertainty in modeling, which KNN cannot: GPR is a probabilistic model that describes the distribution of the output variable given the input variables, while KNN only considers the K nearest neighbors when predicting.
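The preprocessing and scoring steps described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code. The function names, the random shuffling before the split, and the seed are assumptions. The source does not say how R2 was computed, so the squared Pearson correlation is used here as one common reading. KGE follows the standard Kling-Gupta form, 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the correlation, alpha the ratio of standard deviations, and beta the ratio of means.

```python
import math
import random


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and return (train, test) in an 80/20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (4 * len(shuffled)) // 5
    return shuffled[:cut], shuffled[cut:]


def _mean(v):
    return sum(v) / len(v)


def _std(v):
    """Population standard deviation."""
    m = _mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated values."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency; 1 is a perfect score."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)    # variability ratio
    beta = _mean(sim) / _mean(obs)   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# With 203 points, as in the data set described above, the split is 162 / 41.
train_idx, test_idx = split_80_20(range(203))
print(len(train_idx), len(test_idx))
```

Under these definitions a perfect prediction gives RMSE = 0 and R2 = KGE = 1, which is the scale on which the reported values should be read.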

Conclusions

The results show that PSO optimization had an important effect on the performance of the GPR and KNN models and increased their accuracy. Among the studied models, GPR-PSO was the best, with test-phase accuracy of RMSE = 0.0226, R2 = 0.9907, and KGE = 0.9715. Analysis of the results also showed that, in most models, y/D and y/a were identified as the important parameters for achieving the highest accuracy.

Author Contributions

Conceptualization, Y.M. and M.N.; methodology, A.S.; software, Y.M.; validation, Y.M., M.N. and M.J.N.; formal analysis, Y.M.; investigation, M.N.; resources, A.S.; data curation, M.N.; writing (original draft preparation), Y.M.; writing (review and editing), M.N.; visualization, Y.M.; supervision, M.N.

All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Not applicable

Acknowledgements

The authors would like to thank all participants of the present study.

Ethical considerations

The study was approved by the Ethics Committee of Arak University. The authors avoided data fabrication, falsification, plagiarism, and misconduct.

Conflict of interest

The authors declare no conflict of interest.

References

Abbaszadeh, H., Daneshfaraz, R., Sume, V. and Abraham, J., 2024. Experimental investigation and application of soft computing models for predicting flow energy loss in arc-shaped constrictions. AQUA—Water Infrastructure, Ecosystems and Society, 73(3), pp.637-661.
Abbaszadeh, H., Norouzi, R., Sume, V., Kuriqi, A., Daneshfaraz, R. and Abraham, J., 2023. Sill role effect on the flow characteristics experimental and regression model analytical. Fluids, 8(8), p.235.
Adnan, R.M., Liang, Z., El-Shafie, A., Zounemat-Kermani, M. and Kisi, O., 2019. Prediction of suspended sediment load using data-driven models. Water, 11(10), p.2060.
Chien, N., 1955. Effects of Heavy Sediment Concentration Near the Bed on Velocity and Sediment Distribution. Missouri River Division, Corps of Engineers, US Army.
Cigizoglu, H.K., 2004. Estimation and forecasting of daily suspended sediment data by multi-layer perceptrons. Advances in Water Resources, 27(2), pp.185-195.
Dong, X., Yu, Z., Cao, W., Shi, Y. and Ma, Q., 2020. A survey on ensemble learning. Frontiers of Computer Science, 14, pp.241-258.
Gupta, L.K., Pandey, M., Raj, P.A. and Shukla, A.K., 2023. Fine sediment intrusion and its consequences for river ecosystems: a review. Journal of Hazardous, Toxic, and Radioactive Waste, 27(1), p.04022036.
Hassanpour, F., Sharifazari, S., Ahmadaali, K., Mohammadi, S. and Sheikhalipour, Z., 2019. Development of the FCM-SVR hybrid model for estimating the suspended sediment load. KSCE Journal of Civil Engineering, 23, pp.2514-2523.
Hassanzadeh, Y., 2007. Evaluation of sediment load in a natural river. Water International, 32(1), pp.145-154.
Hassanzadeh, Y. and Abbaszadeh, H., 2023. Investigating Discharge Coefficient of Slide Gate-Sill Combination Using Expert Soft Computing Models. Journal of Hydraulic Structures, 9(1), pp.63-80. doi: 10.22055/jhs.2023.43683.1251
Hassanzadeh, Y., Abbaszadeh, H., Abedi, A. and Abraham, J., 2024. Numerical simulation of the effect of downstream material on scouring-sediment profile of combined spillway-gate. AQUA—Water Infrastructure, Ecosystems and Society, jws2024360.
Heddam, S., Naghibi, A., Khosravi, K. and Singh, S.K., 2024. Suspended sediment load prediction and tree-based algorithms. In Remote Sensing of Soil and Land Surface Processes (pp. 257-269). Elsevier.
Kaveh, K., Kaveh, H., Bui, M.D. and Rutschmann, P., 2021. Long short-term memory for predicting daily suspended sediment concentration. Engineering with Computers, 37, pp.2013-2027.
Khozani, Z.S., Safari, M.J.S., Mehr, A.D. and Mohtar, W.H.M.W., 2020. An ensemble genetic programming approach to develop incipient sediment motion models in rectangular channels. Journal of Hydrology, 584, p.124753.
Marashi, A., Kouchakzadeh, S. and Yonesi, H.A., 2023. Rotary gate discharge determination for inclusive data from free to submerged flow conditions using ENN, ENN–GA, and SVM–SA. Journal of Hydroinformatics, 25(4), pp.1312-1328.
Mehri, Y., Nasrabadi, M. and Omid, M.H., 2021. Prediction of suspended sediment distributions using data mining algorithms. Ain Shams Engineering Journal, 12(4), pp.3439-3450.
Mehri, Y., Soltani, J. and Khashehchi, M., 2019. Predicting the coefficient of discharge for piano key side weirs using GMDH and DGMDH techniques. Flow Measurement and Instrumentation, 65, pp.1-6.
Melesse, A.M., Ahmad, S., McClain, M.E., Wang, X. and Lim, Y.H., 2011. Suspended sediment load prediction of river systems: An artificial neural network approach. Agricultural Water Management, 98(5), pp.855-866.
Nasrabadi, M., Mehri, Y., Ghassemi, A. and Omid, M.H., 2021. Predicting submerged hydraulic jump characteristics using machine learning methods. Water Supply, 21(8), pp.4180-4194.
Nasrabadi, M., Riahi, S. and Samadi Borujeni, H., 2014. Evaluation of the Distribution Equations of Suspended Sediment Concentration in Open Channels. Iranian Water Research, 8(1), pp.175-185 (in Persian).
Omid, M.H. and Nasrabadi, M., 2012. Sedimentation Engineering. Tehran University Press. First Edition. 790 pages (in Persian).
Pal, D. and Ghoshal, K., 2016. Vertical distribution of fluid velocity and suspended sediment in open channel turbulent flow. Fluid Dynamics Research, 48(3), p.035501.
Prandtl, L., 1932. Zur turbulenten Strömung in Rohren und längs Platten. In Ergebnisse der aerodynamischen Versuchsanstalt zu Göttingen Lfg. 4 (pp. 18-29). De Gruyter.
Ribeiro, M.H.D.M. and dos Santos Coelho, L., 2020. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Applied Soft Computing, 86, p.105837.
Rouse, H., 1937. Modern conceptions of the mechanics of fluid turbulence. Transactions of the American Society of Civil Engineers, 102(1), pp.463-505.
Samantaray, S. and Sahoo, A., 2022. Prediction of suspended sediment concentration using hybrid SVM-WOA approaches. Geocarto International, 37(19), pp.5609-5635.
Shafai-Bejestan, M., 2008. Theory and Application of Sediment Transport Hydraulics. Shahid Chamran University Press, Ahvaz. 550 pages (in Persian).
Taki, M., Rohani, A., Soheili-Fard, F. and Abdeshahi, A., 2018. Assessment of energy consumption and modeling of output energy for wheat production by neural network (MLP and RBF) and Gaussian process regression (GPR) models. Journal of Cleaner Production, 172, pp.3028-3041.
Trojovský, P. and Dehghani, M., 2022. Pelican optimization algorithm: A novel nature-inspired algorithm for engineering applications. Sensors, 22(3), p.855.
Ulke, A., Tayfur, G. and Ozkul, S., 2009. Predicting suspended sediment loads and missing data for Gediz River, Turkey. Journal of Hydrologic Engineering, 14(9), pp.954-965.
Vanoni, V.A., 1946. Transportation of suspended sediment by water. Transactions of the American Society of Civil Engineers, 111(1), pp.67-102.
Zenko, B., Todorovski, L. and Dzeroski, S., 2001, November. A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods. In Proceedings 2001 IEEE international conference on data mining (pp. 669-670). IEEE.
Zhu, Y.M., Lu, X.X. and Zhou, Y., 2007. Suspended sediment flux modeling with artificial neural network: An example of the Longchuanjiang River in the Upper Yangtze Catchment, China. Geomorphology, 84(1-2), pp.111-125.
Zounemat-Kermani, M., Batelaan, O., Fadaee, M. and Hinkelmann, R., 2021. Ensemble machine learning paradigms in hydrology: A review. Journal of Hydrology, 598, p.126266.