Modelling and Prediction of Soil Classes Using Boosting Regression Tree and Random Forests Machine Learning Algorithms in Some Part of Qazvin Plain

Document Type: Research Paper


1 Science and Soil Engineering Department, Agriculture and Natural Resources Faculty, University of Tehran, Iran.

2 Soil Science Department, Faculty of Agricultural Engineering and Technology, University of Tehran

3 Ph.D. Student of Soil Resources Management, Science and Soil Engineering Department, University of Tehran


Appropriate selection of ancillary covariates is particularly important in digital soil mapping. The use of machine learning algorithms for digital mapping and for updating conventional soil maps has recently expanded in Iran. This study compared boosted regression tree (BRT) and random forest (RF) models for the spatial prediction of soil subgroup and family classes in part of the Qazvin Plain, with auxiliary variables selected using the variance inflation factor (VIF). Sixty-one pedons were located by stratified random sampling, dug, described, and classified to the family level on the basis of laboratory analyses. The most appropriate variables were selected from 15 geomorphometric and remote sensing indices using VIF. Soil-landscape modelling at the subgroup and family levels was carried out with the RF and BRT learning algorithms in RStudio, using the randomForest and C5.0 packages. Six indices were selected as input variables: CHA, DEM, STH, SI, DVI and NDVI. At the family level, overall accuracy (OA) and Kappa were 35% and 26% for BRT and 70% and 60% for RF, respectively. Sensitivity analysis based on mean decrease in accuracy (MDA) showed that the modified catchment area was the most important of the selected variables. Overall, combining a feature selection approach with effective learning algorithms makes it possible to map the spatial distribution of soil classes with acceptable accuracy, even in low-relief land.
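The VIF-based covariate screening summarised above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the function names, the toy covariate values, and the cut-off of 10 (a common rule of thumb, not a value stated in the abstract) are all assumptions made for illustration. The sketch computes each covariate's VIF as the corresponding diagonal element of the inverse correlation matrix, then repeatedly drops the covariate with the largest VIF until all remaining values fall at or below the cut-off.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equally long numeric columns.

    Assumes no column is constant (a constant column has zero variance).
    """
    names = list(columns)
    n = len(columns[names[0]])
    z = {}
    for name in names:
        m, s = mean(columns[name]), pstdev(columns[name])
        z[name] = [(v - m) / s for v in columns[name]]
    return [[sum(a * b for a, b in zip(z[p], z[q])) / n for q in names]
            for p in names]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each covariate: diagonal of the inverse correlation matrix."""
    names = list(columns)
    inv = invert(correlation_matrix(columns))
    return {name: inv[i][i] for i, name in enumerate(names)}


def select_by_vif(columns, threshold=10.0):
    """Drop the highest-VIF covariate until all VIFs are <= threshold.

    The default threshold of 10 is an assumed rule of thumb.
    """
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


if __name__ == "__main__":
    # Hypothetical toy covariates: cov_c is almost a copy of cov_a.
    toy = {
        "cov_a": [1.0, -1.0, 1.0, -1.0],
        "cov_b": [1.0, 1.0, -1.0, -1.0],
        "cov_c": [1.0, -1.0, 1.0, -0.9],
    }
    print(vif(toy))
    print(select_by_vif(toy))
```

In this toy case one of the two nearly identical covariates is removed and the weakly correlated one is retained, which mirrors the intent of reducing 15 candidate indices to a smaller, less collinear set.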

