Rainfall-runoff Modelling of Coastal Watersheds near Hormuz Strait Using Data Mining

Document Type: Research Paper


1 PhD Student, Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran

2 Associate Professor, Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran

3 Associate Professor, Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran

4 Associate Professor, Water Sciences and Engineering Department, Faculty of Agriculture and Natural Resources, Imam Khomeini International University, Qazvin, Iran

5 Professor, Water Engineering Department, College of Agriculture, Shiraz University, Shiraz, Iran


Estimating the runoff generated by rainfall is an important step in water resources planning, especially in ungauged river basins. Research on models that simulate river flow with minimum error is therefore necessary. In this study, the rainfall-runoff process of the Minab watershed was simulated using data mining methods, and their performance was compared to identify the most suitable one. For this purpose, eight data mining algorithms were used: Model Tree (MT), Random Forest (RF), Support Vector Machines (SVM), Bayesian Ridge Regression (BRR), Gaussian Process (GP), Extreme Gradient Boosting (XGB), Artificial Neural Network (ANN), and Multivariate Adaptive Regression Splines (MARS). The coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and the Taylor diagram were used to evaluate model performance. The results indicated that the MARS model performed best among all the models in simulating the monthly discharge of the Minab watershed. The SVM model (RMSE = 7.73) also performed well. The other models performed relatively close to one another; across all eight models, XGB had the highest RMSE (9.98) and MARS the lowest (7.7). The effect of the Persian Gulf sea surface temperature (PGSST) was then investigated by adding its values to the simulation process. The results showed that PGSST values did not improve the runoff simulation results in the study area.
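The three scalar evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the sample discharge values are hypothetical, and R² is computed here as the squared Pearson correlation between observed and simulated series, which is an assumption because the abstract does not say which definition was used (the other common one is 1 − SS_res/SS_tot).

```python
import math


def rmse(observed, simulated):
    """Root mean square error between two equal-length series."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mae(observed, simulated):
    """Mean absolute error between two equal-length series."""
    n = len(observed)
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / n


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R2, see lead-in)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical monthly discharge values, for illustration only
# (these are NOT data from the Minab watershed study).
observed = [12.0, 8.5, 20.1, 3.2, 15.0]
simulated = [10.5, 9.0, 18.0, 4.0, 16.5]

print("RMSE:", rmse(observed, simulated))
print("MAE: ", mae(observed, simulated))
print("R2:  ", r_squared(observed, simulated))
```

Lower RMSE and MAE and higher R² indicate a closer match between simulated and observed discharge, which is the sense in which the abstract ranks MARS (lowest RMSE) ahead of XGB (highest RMSE).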

