Wet aggregate stability modeling based on random forest optimized with genetic algorithm

Document Type : Research Paper

Authors

1 Department of Soil Science and Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

2 Department of Soil Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

3 Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

4 Soil Science and Engineering Department, Agriculture Faculty, University of Tabriz, Tabriz, Iran

Abstract

Effective management of soil and water resources requires investigating wet aggregate stability (WAS), a fundamental indicator of soil structure and quality. In this study, two machine learning techniques were employed: random forest (RF) and random forest optimized with a genetic algorithm (GA-RF). Texture, organic matter content, and calcium carbonate equivalent (lime) were determined for 55 soil samples collected from the Arasbaran forests. Using input combinations chosen according to their correlations with WAS, modeling was performed in seven scenarios. Three performance metrics, the correlation coefficient (CC), normalized root mean square error (NRMSE), and Willmott index (WI), were used to evaluate the models. Among the standalone random forest models, RF5 performed best, with NRMSE = 0.038, CC = 0.736, and WI = 0.789. Likewise, the GA-RF5 model, optimized with a genetic algorithm and using the percentages of sand, silt, and clay as inputs, achieved NRMSE = 0.031, CC = 0.800, and WI = 0.842. Results from RF1 (NRMSE = 0.047, CC = 0.589, WI = 0.721) and GA-RF1 (NRMSE = 0.036, CC = 0.662, WI = 0.797) indicated that clay content had the strongest correlation with stability. In addition, including calcium carbonate equivalent in scenario 7 substantially improved model performance in predicting wet aggregate stability. In summary, the hybrid model combining random forest with a genetic algorithm is recommended for precise and reliable determination of wet aggregate stability in studies of soil properties.




EXTENDED ABSTRACT

Introduction

In order to effectively manage soil and water resources, it is imperative to investigate wet aggregate stability (WAS) as a fundamental indicator for assessing soil structure and quality. Given the labor-intensive and expensive nature of determining WAS values through traditional laboratory techniques, there is a clear advantage in indirectly predicting them using readily available data. Machine learning (ML) techniques present a viable alternative for this purpose. The efficacy of ML stems from its capacity to analyze data on a large scale, enabling the resolution of challenges that conventional linear methods struggle to address economically and satisfactorily. The primary objective of this study is to develop a predictive model for WAS utilizing ML, specifically the random forest (RF) method in standalone mode, and its hybrid with a genetic algorithm (GA-RF) to optimize RF parameters. This unique approach distinguishes the research in the domain of WAS prediction.
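The GA-RF idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the search space, population size, operators (truncation selection, uniform crossover, random-reset mutation), and the `toy_fitness` stand-in are all assumptions. In a real run the fitness function would train a random forest with the candidate hyperparameters and return its validation error, for example NRMSE.

```python
import random

# Hypothetical search space for two random-forest hyperparameters.
SEARCH_SPACE = {
    "n_estimators": list(range(50, 501, 50)),
    "max_depth": list(range(2, 21)),
}


def random_individual(rng):
    """Draw one candidate hyperparameter set at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: each gene is redrawn with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in ind.items()
    }


def genetic_search(fitness, pop_size=20, generations=30, elite=2, seed=0):
    """Minimise `fitness` over SEARCH_SPACE and return the best individual."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = ranked[:elite] + children  # elitism keeps the best so far
    return min(population, key=fitness)


def toy_fitness(ind):
    # Stand-in (assumption) for the validation NRMSE of a random forest
    # trained with these hyperparameters, e.g. scikit-learn's
    # RandomForestRegressor(n_estimators=..., max_depth=...).
    return abs(ind["n_estimators"] - 300) / 500 + abs(ind["max_depth"] - 8) / 20


best = genetic_search(toy_fitness)
print(best)
```

Because the two best individuals are carried over unchanged each generation, the best fitness never gets worse from one generation to the next.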

Material and Methods

The study area was a portion of forested land in the Arasbaran region. A total of 55 soil samples were collected under diverse environmental conditions and analyzed in the laboratory to determine soil texture, organic matter content, and calcium carbonate equivalent. Wet aggregate stability, measured by the Kemper and Rosenau test, served as the target for calibrating the machine learning (ML) models. Seven scenarios were explored for predicting wet aggregate stability from soil characteristics, using the random forest method in standalone mode and with optimization by a genetic algorithm. The dataset was partitioned so that 70% of the data was used for training the models and the remaining 30% was reserved for testing. The accuracy of the predictive models was then evaluated with three performance metrics: normalized root mean square error (NRMSE), correlation coefficient (CC), and Willmott index (WI).
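The three evaluation metrics named above can be computed as in the sketch below. The source does not state how RMSE was normalized, so dividing by the observed mean is an assumption (dividing by the observed range is another common convention). WI follows Willmott's standard index of agreement. The sample values are hypothetical and are not data from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient (CC) between observations and predictions."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def nrmse(obs, pred):
    """RMSE divided by the observed mean (assumed normalisation)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / _mean(obs)


def willmott_index(obs, pred):
    """Willmott index of agreement: 1 - SSE / potential error."""
    mo = _mean(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential


# Hypothetical observed and predicted WAS values (percent).
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 87.0]

print("CC    =", round(correlation_coefficient(observed, predicted), 3))
print("NRMSE =", round(nrmse(observed, predicted), 3))
print("WI    =", round(willmott_index(observed, predicted), 3))
```

A perfect prediction gives CC = 1, NRMSE = 0, and WI = 1, so higher CC and WI and lower NRMSE indicate a better model.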

Results and Discussion

The correlation coefficients between the laboratory-measured soil attributes and WAS showed a strong relationship between the selected characteristics and the target variable. Among the random forest models assessed, RF5 performed best, with NRMSE = 0.038, CC = 0.736, and WI = 0.789. The GA-RF5 model, optimized using a genetic algorithm, surpassed RF5 with NRMSE = 0.031, CC = 0.800, and WI = 0.842, showing enhanced predictive capability for WAS. A comparison of the RF5 and GA-RF5 models revealed that the genetic algorithm improved the predictive accuracy of RF, raising CC and WI by 8% and 6.72%, respectively, while reducing NRMSE by 18.42%. Scenario 5, based on the sand, silt, and clay fractions, emerged as the optimal scenario.
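The percentage changes between RF5 and GA-RF5 can be checked with a few lines of arithmetic. The sketch below takes the metric values reported in the abstract and expresses each change relative to the RF5 baseline; the helper name `relative_change` is illustrative. On this basis the CC gain comes out near 8.7%, while the reported 8% matches the same difference expressed relative to the GA-RF5 value (0.064 / 0.800).

```python
def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Metric values for scenario 5 as reported in the abstract.
rf5 = {"NRMSE": 0.038, "CC": 0.736, "WI": 0.789}
ga_rf5 = {"NRMSE": 0.031, "CC": 0.800, "WI": 0.842}

for name in rf5:
    change = relative_change(rf5[name], ga_rf5[name])
    print(f"{name}: {change:+.2f}%")
# NRMSE: -18.42%
# CC: +8.70%
# WI: +6.72%
```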

The findings from RF1 (NRMSE = 0.047, CC = 0.589, WI = 0.721) and GA-RF1 (NRMSE = 0.036, CC = 0.662, WI = 0.797) underscored the pivotal role of clay content in soil structure and its influence on WAS prediction. Clay content was identified as a critical soil property affecting WAS because it acts as a binding agent that holds soil particles together. The clay content of the analyzed soils ranged from 5% to 62.5%. In contrast, organic matter had no discernible effect on WAS, as indicated by the statistical results of the scenario 2 models. Moreover, scenarios 6 and 7 showed substantial reductions of 10.43% and 10.81% in NRMSE in both standalone and optimized modes, highlighting the beneficial effect of lime (calcium carbonate equivalent) on WAS prediction accuracy.

Conclusion

Wet aggregate stability stands as a fundamental soil attribute crucial in determining soil erodibility and hydraulic characteristics. Understanding the key soil components governing WAS is imperative for preserving soil structure integrity. An innovative approach to quantifying WAS involves utilizing easily accessible soil parameters for predictive modeling. The statistical analysis conducted revealed that the RF5 and GA-RF5 models, incorporating soil texture variables, exhibited superior predictive performance. A comparative assessment between these models highlighted the enhanced predictive capabilities of the GA-RF model in forecasting WAS. Furthermore, scenarios 1 and 3 underscored the pivotal role of clay content in soil composition, encapsulating various soil formation processes and factors. Overall, the utilization of the GA-RF machine learning technique yields satisfactory accuracy in predicting WAS based on soil attributes. Notably, organic matter (OM) was found to have negligible impact on WAS, while the inclusion of lime demonstrated a positive effect on improving WAS prediction accuracy.

References

    Alaboz, P., Dengiz, O., & Saygın, F. (2022). Estimation of soil aggregate stability by different regression methods. Conference Paper.

    Alekseeva, T.V., Sokolowska, Z., Hajnos, M., Alekseev, A.O., & Kalinin, P.I. (2009). Water stability of aggregates in subtropical and tropical soils (Georgia and China) and its relationships with the mineralogy and chemical properties. Eurasian Soil Science, 42, 415-425. DOI: 10.1134/S1064229309040085.

    Alijanpour Shalmani, A., Shabanpour, M., Asadi, H., & Bagheri, F. (2011). Estimation of Soil Aggregate Stability in Forest’s Soils of Guilan Province by Artificial Neural Networks and Regression Pedotransfer Functions. Water and Soil Science, 21(3), 153-162. (In Persian).

    Allison, L.E., & Moodie, C.D. (1965). Carbonate. Methods of Soil Analysis: Part 2 Chemical and Microbiological Properties, 9, 1379-1396. DOI: 10.2134/agronmonogr9.2.c40.

    Alqahtani, M., Gumaei, A., Mathkour, H., & Maher Ben Ismail, M. (2019). A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks. Sensors, 19(20), 4383. DOI: 10.3390/s19204383.

    Amézketa, E. (1999). Soil aggregate stability: a review. Journal of Sustainable Agriculture, 14(2-3), 83-151. DOI: 10.1300/J064v14n02_08.

    An, S., Mentler, A., Mayer, H., & Blum, W.E. (2010). Soil aggregation, aggregate stability, organic carbon and nitrogen in different soil aggregate fractions under forest and shrub vegetation on the Loess Plateau, China. Catena, 81(3), 226-233. DOI: 10.1016/j.catena.2010.04.002.

    Angers, D.A., & Carter, M.R. (2020). Aggregation and organic matter storage in cool, humid agricultural soils. In Structure and organic matter storage in agricultural soils, 193-211. CRC Press.

    Are, M., Kaart, T., Selge, A., Astover, A., & Reintam, E. (2018). The interaction of soil aggregate stability with other soil properties as influenced by manure and nitrogen fertilization. DOI: 10.13080/z-a.2018.105.025.

    Armin, M., Rouhipour, H., Ahmadi, H., Salajegheh, A., Mahdian, M.H., & Ghorbannia Kheybari, V. (2016). Relationship between Aggregate Stability and Selected Soil Properties in Taleghan Watershed. Journal of Range and Watershed Management, 69(2), 275-295. DOI: 10.22059/jrwm.2016.61683. (In Persian).

    Assiri, A. (2021). Anomaly classification using genetic algorithm-based random forest model for network attack detection. Computers, Materials & Continua, 66(1). DOI: 10.32604/cmc.2020.013813.

    Bardsirizadeh, S., Esfandiarpour Borujeni, I., Besalatpour, A.A., & Abbaszadeh Dehaji, P. (2017). Use of Geostatistical Method to Determine the Most Effective Aggregate Component for Estimating Soil Structural Stability. Journal of Water and Soil, 31(2), 533-544. DOI: 10.22067/jsw.v31i2.54438. (In Persian).

    Ben-Hur, M., Shainberg, I., Bakker, D., & Keren, R. (1985). Effect of soil texture and CaCO3 content on water infiltration in crusted soil as related to water salinity. Irrigation Science, 6, 281-294. DOI: 10.1007/BF00262473.

    Besalatpour, A.A., Shirani, H., & Esfandiarpour Borujeni, I. (2015). Modeling of soil aggregate stability using support vector machines and multiple linear regression. Journal of Water and Soil, 29(2), 406-417. DOI: 10.22067/JSW.V0I0.22620. (In Persian).

    Besalatpour, A., Hajabbasi, M.A., Ayoubi, S., Afyuni, M., Jalalian, A., & Schulin, R. (2012). Soil shear strength prediction using intelligent systems: artificial neural networks and an adaptive neuro-fuzzy inference system. Soil Science and Plant Nutrition, 58(2), 149-160. DOI: 10.1080/00380768.2012.661078.

    Besalatpour, A.A., Ayoubi, S., Hajabbasi, M.A., Jazi, A.Y., & Gharipour, A. (2014). Feature selection using parallel genetic algorithm for the prediction of geometric mean diameter of soil aggregates by machine learning methods. Arid Land Research and Management, 28(4), 383-394. DOI: 10.1080/15324982.2013.871599.

    Besalatpour, A.A., Ayoubi, S., Hajabbasi, M.A., Mosaddeghi, M., & Schulin, R. (2013). Estimating wet soil aggregate stability from easily available properties in a highly mountainous watershed. Catena, 111, 72-79. DOI: 10.1016/j.catena.2013.07.001.

    Bhattacharya, P., Maity, P.P., Ray, M., & Mridha, N. (2021). Prediction of mean weight diameter of soil using machine learning approaches. Agronomy Journal, 113(2), 1303-1316. DOI: 10.1002/agj2.20469.

    Boix-Fayos, C., Calvo-Cases, A., Imeson, A.C., & Soriano-Soto, M.D. (2001). Influence of soil properties on the aggregation of some Mediterranean soils and the use of aggregate size and stability as land degradation indicators. Catena, 44(1), 47-67. DOI: 10.1016/S0341-8162(00)00176-4.

    Bouajila, A., & Gallali, T. (2008). Soil Organic Carbon Fractions and Aggregate Stability in Carbonated. Journal of Agronomy, 7(2), 127-137.

    Bouslihim, Y., Rochdi, A., & Paaza, N.E.A. (2021). Machine learning approaches for the prediction of soil aggregate stability. Heliyon, 7(3). DOI: 10.1016/j.heliyon.2021.e06480.

    Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. DOI: 10.1023/A:1010933404324.

    Chau, K.W., Wu, C.L., & Li, Y.S. (2005). Comparison of several flood forecasting models in Yangtze River. Journal of Hydrologic Engineering, 10(6), 485-491. DOI: 10.1061/(ASCE)1084-0699(2005)10:6(485).

    Chrenková, K., Mataix-Solera, J., Dlapa, P., & Arcenegui, V. (2014). Long-term changes in soil aggregation comparing forest and agricultural land use in different Mediterranean soil types. Geoderma, 235, 290-299. DOI: 10.1016/j.geoderma.2014.07.025.

    Díaz-Zorita, M., Perfect, E., & Grove, J.H. (2002). Disruptive methods for assessing soil structure. Soil and Tillage Research, 64(1-2), 3-22. DOI: 10.1016/S0167-1987(01)00254-9.

    East Azerbaijan Meteorological Organization. (2021). Meteorological Statistics of Kaleybar Synoptic Station. Tehran: Meteorological Organization of the Islamic Republic of Iran. (In Persian).

    East Azerbaijan Natural Resources Organization. (2003). Forest Conservation Plan for Northern Arasbaran Forests (Summary of Northern Arasbaran Forests Studies). Tabriz: East Azerbaijan Natural Resources General Directorate. (In Persian).

    Falsone, G., Bonifacio, E., Santoni, S., & Zanini, E. (2006). Wet aggregate stability of some Botswana soil properties. Arid Land Research and Management, 20(1), 15-28.

    Gee, G.W., & Bauder, J.W. (1986). Particle-size analysis. Methods of Soil Analysis: Part 1 Physical and Mineralogical Methods, 5, 383-411. DOI: 10.2136/sssabookser5.1.2ed.c15.

    Hastie, T., Tibshirani, R., & Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction (Vol. 2, pp. 1-758). New York: Springer.

    Horn, R., Taubner, H., Wuttke, M., & Baumgartl, T. (1994). Soil physical properties related to soil structure. Soil and Tillage Research, 30(2-4), 187-216. DOI: 10.1016/0167-1987(94)90005-1.

    Huang, Y., Lan, Y., Thomson, S.J., Fang, A., Hoffmann, W.C., & Lacey, R.E. (2010). Development of soft computing and applications in agricultural and biological engineering. Computers and Electronics in Agriculture, 71(2), 107-127. DOI: 10.1016/j.compag.2010.01.001.

    Kemper, W.D., & Rosenau, R.C. (1986). Aggregate stability and size distribution. Methods of Soil Analysis: Part 1 Physical and Mineralogical Methods, 5, 425-442. DOI: 10.2136/sssabookser5.1.2ed.c17.

    Khazaee, A., Mosaddeghi, M.R., & Mahboubi, A.A. (2008). Structural stability assessment using wet sieving method and its relations with some intrinsic properties in 21 soil series from Hamadan province. Agricultural Research, 8(1 (A)), 171-181. (In Persian).

    Kouchami-Sardoo, I., Shirani, H., & Besalatpour, A.A. (2020). Determining the Features Influencing the Structural Stability of Soils of Arid Regions Using a Hybrid GA-ANN Algorithm. Applied Soil Research, 8(3), 129-143. (In Persian).

    Lado, M., Ben-Hur, M., & Shainberg, I. (2004). Soil wetting and texture effects on aggregate stability, seal formation, and erosion. Soil Science Society of America Journal, 68(6), 1992-1999. DOI: 10.2136/sssaj2004.1992.

    Liu, M.Y., Chang, Q.R., Qi, Y.B., Liu, J., & Chen, T. (2014). Aggregation and soil organic carbon fractions under different land uses on the tableland of the Loess Plateau of China. Catena, 115, 19-28. DOI: 10.1016/j.catena.2013.11.002.

    Nikpur, M., Mahboubi, A.A., Mosaddeghi, M.R., & Safadoust, A. (2012). Assessment of Soil Intrinsic Properties Effects on Soil Structural Stability of Some Soils in Hamadan Province. JWSS, 15(58), 85-96. DOI: 20.1001.1.24763594.1390.15.58.6.0. (In Persian).

    Martinez-Mena, M., Lopez, J., Almagro, M., Boix-Fayos, C., & Albaladejo, J. (2008). Effect of water erosion and cultivation on the soil carbon stock in a semiarid area of South-East Spain. Soil and Tillage Research, 99(1), 119-129. DOI: 10.1016/j.still.2008.01.009.

    Mbagwu, J.S. (2004). Aggregate stability and soil degradation in the tropics. In Proceedings of the conference report on the lecture given at the College on Soil Physics, Trieste, Italy, 3–21 March 2003, 246–252.

    Minhas, P.S., & Sharma, D.R. (1986). Hydraulic conductivity and clay dispersion as affected by application sequence of saline and simulated rain water. Irrigation Science, 7, 159-167. DOI: 10.1007/BF00344071.

    Mohammadian Khorasani, Sh., Homaee, M., & Pazira, M. (2015). Evaluating soil aggregate stability using classical methods and fractal models. Journal of Water and Soil Resources Conservation, 4(3), 39-51. DOI: 20.1001.1.22517480.1394.4.3.4.1. (In Persian).

    Nelson, D.W., & Sommers, L.E. (1996). Total carbon, organic carbon, and organic matter. Methods of Soil Analysis: Part 3 Chemical Methods, 5, 961-1010. DOI: 10.2136/sssabookser5.3.c34.

    Ramdhani, Y., Putra, C., & Alamsyah, D. (2023). Heart failure prediction based on random forest algorithm using genetic algorithm for feature selection. International Journal of Reconfigurable and Embedded Systems (IJRES), 12(2), 205-214. DOI: 10.11591/ijres.v12.i2.pp205-214.

    Rezaei, H., Jafarzadeh, A., Alijanpour, A., Shahbazi, F., & Valizadeh Kamran, K. (2020). Soil Organic Matter Condition in Forest Stands of Arasbaran. Water and Soil, 34(1), 115-127. DOI: 10.22067/JSW.V34I1.80633. (In Persian).

    Rezaei, H., Jafarzadeh, A.A., Alijanpour, A., Shahbazi, F., & Valizadeh Kamran, K. (2017). Genetically evolution of Arasbaran forests soils along altitudinal transects of Kaleybar Chai Sofla Sub-Basin. Water and Soil Science, 26(4.1), 151-166. (In Persian).

    Saadat, S., Esmaeelnejad, L., Rezaei, H., Mirkhani, R., & Seyedmohammadi, J. (2019). Comparing Aggregate Stability Tests as One of the Soil Physical Quality Indicators. Water and Soil, 33(2), 289-303. DOI: 10.22067/JSW.V33I2.73916. (In Persian).

    Svetnik, V., Liaw, A., Tong, C., & Wang, T. (2004). Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. In Multiple Classifier Systems: 5th International Workshop, MCS 2004, Cagliari, Italy, June 9-11, 2004. Proceedings 5 (pp. 334-343). Springer Berlin Heidelberg. DOI: 10.1007/978-3-540-25966-4_33.

    Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P., & Feuston, B.P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences, 43(6), 1947-1958. DOI: 10.1021/ci034160g.

    Tajik, F. (2004). Evaluation of soil aggregate stability in some regions of Iran. JWSS, 8(1), 107-123. DOI: 20.1001.1.24763594.1383.8.1.9.0. (In Persian).

    Tongway, D., & Hindley, N. (2004). Landscape function analysis: a system for monitoring rangeland function. African Journal of Range and Forage Science, 21(2), 109-113. DOI: 10.2989/10220110409485841.

    Wang, J.G., Yang, W., Yu, B., Li, Z.X., Cai, C.F., & Ma, R.M. (2016). Estimating the influence of related soil properties on macro- and micro-aggregate stability in ultisols of south-central China. Catena, 137, 545-553. DOI: 10.1016/j.catena.2015.11.001.

    Wang, W.C., Chau, K.W., Cheng, C.T., & Qiu, L. (2009). A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. Journal of Hydrology, 374(3-4), 294-306. DOI: 10.1016/j.jhydrol.2009.06.019.

    Wu, X., Wei, Y., Wang, J., Wang, D., She, L., Wang, J., & Cai, C. (2017). Effects of soil physicochemical properties on aggregate stability along a weathering gradient. Catena, 156, 205-215. DOI: 10.1016/j.catena.2017.04.017.

    Zeini, H.A., Al-Jeznawi, D., Imran, H., Bernardo, L.F.A., Al-Khafaji, Z., & Ostrowski, K.A. (2023). Random Forest Algorithm for the Strength Prediction of Geopolymer Stabilized Clayey Soil. Sustainability, 15(2), 1408. DOI: 10.3390/su15021408.

    Zeraatpisheh, M., Ayoubi, S., Mirbagheri, Z., Mosaddeghi, M.R., & Xu, M. (2021). Spatial prediction of soil aggregate stability and soil organic carbon in aggregate fractions using machine learning algorithms and environmental variables. Geoderma Regional, 27, e00440. DOI: 10.1016/j.geodrs.2021.e00440.

    Zhai, R., Wang, J., Yin, D., & Shangguan, Z. (2022). Wet aggregate stability modeling based on support vector machine in multiuse soils. International Journal of Distributed Sensor Networks, 18(6). DOI: 10.1177/15501329221107573.