Prediction of Regional Heavy Precipitation Occurrence in the Southwest Iran Using Synoptic Variables and Data Mining Methods

Document Type: Research Paper

Authors

1 Department of Irrigation & Reclamation Engineering, Faculty of Agricultural Engineering & Technology, College of Agriculture & Natural Resources, University of Tehran, Karaj, Tehran, Iran

2 Associate Professor, Department of Space Physics, Institute of Geophysics, University of Tehran

Abstract

Short-term prediction of heavy precipitation events is crucial for flood warning and mitigation. This study proposes a novel definition of regional heavy precipitation based on the probability pattern of a typical rainstorm. Daily precipitation data from 12 synoptic stations in southwestern Iran were used for this purpose. In addition, six synoptic variables at pressure levels from 1000 to 200 hPa, taken one to five days before heavy precipitation events over a domain extending well beyond the study area, were used as predictors. All data cover the period 1987-2018. Four feature selection methods and 10 binary machine-learning classifiers were employed. The results show that synoptic data from up to four days before the events best distinguish heavy from non-heavy precipitation events. Among the four feature selection methods, Chi-Square and Extra Tree outperform Correlation and Random Forest. The Random Forest model combined with Chi-Square feature selection was the most efficient at predicting regional heavy precipitation events in the study area. Relative humidity and specific humidity 1-2 days before the events, and wind speed 2-4 days before, are relevant synoptic variables for predicting heavy precipitation events.
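
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It builds lagged predictors (values observed 1 to N days before each day) and scores each lag with a Pearson chi-square statistic against a binary heavy/non-heavy label. The function names, the toy humidity series, the labels, and the threshold of 80 used to binarize the predictor are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def lagged_features(series: Sequence[float], max_lag: int) -> List[Dict[int, float]]:
    """For each day t >= max_lag, collect the values seen 1..max_lag days earlier."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append({lag: series[t - lag] for lag in range(1, max_lag + 1)})
    return rows


def chi_square_2x2(feature_high: Sequence[bool], is_heavy: Sequence[bool]) -> float:
    """Pearson chi-square statistic for a 2x2 table:
    (feature high / low) x (heavy / non-heavy day)."""
    n = len(is_heavy)
    # Observed counts for each of the four cells.
    counts = {(f, y): 0 for f in (False, True) for y in (False, True)}
    for f, y in zip(feature_high, is_heavy):
        counts[(bool(f), bool(y))] += 1
    stat = 0.0
    for f in (False, True):
        for y in (False, True):
            row_total = counts[(f, False)] + counts[(f, True)]
            col_total = counts[(False, y)] + counts[(True, y)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns to avoid division by zero
                stat += (counts[(f, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: a daily relative-humidity series and heavy-event flags.
humidity = [40, 45, 85, 90, 50, 42, 88, 92, 47, 41, 86, 91, 44]
heavy = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]

MAX_LAG = 2      # assumption: look back two days only, to keep the example small
THRESHOLD = 80   # assumption: arbitrary cut-off used to binarize humidity

rows = lagged_features(humidity, MAX_LAG)
labels = [bool(v) for v in heavy[MAX_LAG:]]  # align labels with the lagged rows

# Score each lag; a larger statistic means a stronger association with heavy days.
scores = {
    lag: chi_square_2x2([row[lag] > THRESHOLD for row in rows], labels)
    for lag in range(1, MAX_LAG + 1)
}
ranked_lags = sorted(scores, key=scores.get, reverse=True)
```

In a full workflow the highest-scoring predictors would then be passed to a classifier such as a random forest; that step is omitted here to keep the sketch free of third-party dependencies.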
