Modelling the distribution of available phosphorus contents in surface soils of northern Khuzestan Province using a multiple linear regression model and the random forest algorithm

Document Type : Research Paper

Authors

1 Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Khuzestan, Iran

2 Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Khuzestan, Iran

3 School of Environmental Sciences, University of Guelph, Ontario, Canada

Abstract

Little information is available on the spatial distribution of the elements used to assess soil fertility in Khuzestan Province, especially available phosphorus. This study was therefore conducted to identify the soil properties that most strongly control the available phosphorus content of soils in northern Khuzestan Province and to determine which of two methods, multiple linear regression or the random forest algorithm, is more appropriate for modelling its spatial distribution. For this purpose, 250 composite soil samples (0-10 cm depth) were randomly collected from December to February 2016 using the conditional Latin hypercube sampling approach. The physical and chemical properties of the samples were then determined using standard laboratory methods. Descriptive statistics were calculated in SPSS, and the spatial variability of available phosphorus was modelled with multiple linear regression and random forest models in RStudio. The available phosphorus concentration was below 5 mg/kg in 32.4% of the samples. The two models were evaluated with the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R²) on the training set, the test set and the whole dataset; the random forest model gave more accurate estimates, with higher coefficients of determination and lower error values. The results also showed that soil organic carbon content made the greatest contribution to predicting available soil phosphorus in the study area.
In conclusion, models that can represent non-linear relationships between variables appear to be more suitable for predicting soil properties.
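
The three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study, and it assumes R² is computed as 1 minus the ratio of residual to total sum of squares (some studies report the squared correlation coefficient instead). The function names and the sample values are hypothetical and are not data from the paper.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical available-phosphorus values (mg/kg), for illustration only.
observed = [4.0, 6.0, 8.0, 10.0]
predicted_a = [5.0, 5.0, 9.0, 9.0]    # stands in for one model's predictions
predicted_b = [4.5, 6.5, 7.5, 10.5]   # stands in for a second model's predictions

for name, pred in (("model A", predicted_a), ("model B", predicted_b)):
    print(name, mae(observed, pred), rmse(observed, pred), r2(observed, pred))
```

Under these definitions, the model with lower MAE and RMSE and higher R² on the held-out test set is the one judged more accurate, which is the comparison the abstract describes.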
