Disaggregation of Soil Map Units Using the DSMART Model: Combining Tree-Based Models and New Soil Profile Data

Article type: Research paper

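Authors

1 Department of Soil Science and Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

2 Faculty member, Department of Soil Science Engineering, College of Agriculture and Natural Resources, University of Tehran

3 Department of Soil Science and Engineering, Faculty of Agriculture, Shahid Bahonar University of Kerman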

Abstract

Conventional soil maps consist of polygon units in which soil units are separated by sharp boundaries, but the variation of soil classes within each unit is not specified. Given the need for information on the variation of soil classes within map units, the aim of this study was to disaggregate soil map units using the DSMART method in the Abyek region. DSMART was run with C5.0 decision tree, random forest, and extreme gradient boosting models under two scenarios: (1) using information from the units of the national 1:1,000,000 legacy soil map, and (2) adding information from 230 new soil profiles classified at the soil subgroup level. Model performance and uncertainty were evaluated with quantitative indices. In the first scenario, overall map accuracy ranged from 0.29 to 0.37 and Kappa from 0.17 to 0.29, with the best results obtained from the extreme gradient boosting model, which had a confusion index of 0.74. In the second scenario, with the random forest model, overall accuracy increased to between 0.51 and 0.63 and Kappa to between 0.44 and 0.60, and the confusion index decreased to 0.65. Comparison of these maps with the spatial distribution of soil subgroups in the region indicated good agreement, which increased by 43 percent in the second scenario. Topographic variables were the most important predictors in the first scenario, and mean rainfall and the vertical vegetation index in the second. Incorporating new soil profile data increased modeling accuracy by up to 26 percent. These results confirm the effectiveness of DSMART, supplemented with additional soil profile information, for disaggregating legacy soil map units.


Article title [English]

Improving the Disaggregation of Soil Map Units Using the DSMART Method: Integrating Tree-Based Models and New Soil Profile Data

Authors [English]

  • Zahra Rasaei 1
  • Fereydoon Sarmadian 2
  • Azam Jafari 3
1 Soil Science Department, Faculty of Agriculture, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
2 Soil Science Department, Faculty of Agricultural Engineering and Technology, University of Tehran
3 Department of Soil Science, Faculty of Agriculture, Shahid Bahonar University of Kerman
Abstract [English]

Conventional soil maps consist of polygon units in which soil types are delineated by sharp boundaries, yet the variability of soil classes within each unit remains undefined. To address this, the present study disaggregates the units of Iran's legacy 1:1,000,000 soil map in the Abyek region using the DSMART model. DSMART was applied with three tree-based algorithms, C5.0, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), under two scenarios: (1) using legacy soil map information only, and (2) integrating 230 new soil profiles classified at the subgroup level. Model performance and uncertainty were evaluated using overall accuracy, the Kappa coefficient, and the confusion index. In Scenario 1, overall accuracy ranged from 0.29 to 0.37 and Kappa from 0.17 to 0.29; the best performance was achieved by XGBoost, with a confusion index of 0.74. In Scenario 2, accuracy improved to 0.51–0.63 and Kappa to 0.44–0.60, with the best results from the RF model, and the confusion index dropped to 0.65. Agreement with the observed spatial distribution of soil subgroups improved by 43% in Scenario 2. Topographic variables were most influential in Scenario 1, while mean annual rainfall and the vertical vegetation index dominated in Scenario 2. Incorporating new soil profile data enhanced model performance by up to 26%. These findings underscore the effectiveness of DSMART, particularly when enriched with new soil data, in refining legacy soil map units for more precise soil class delineation.

Keywords [English]

  • DSMART
  • Downscaling
  • Digital soil mapping
  • Machine learning

EXTENDED ABSTRACT

 

Introduction

Legacy soil data are vital for preserving national resources and serve as a foundational reference for digital soil mapping and environmental modeling. However, many conventional soil maps—such as those produced at a scale of 1:1,000,000—lack the spatial detail required for modern land management applications. These maps often rely heavily on expert judgment and broad physiographic units, resulting in polygonal representations that blend multiple soil classes without describing their internal variability. Consequently, such aggregated maps limit our understanding of soil transitions and spatial patterns critical to agricultural planning, ecological assessments, and policy decisions.

Spatial disaggregation has emerged as a solution for enhancing map resolution and extracting more detailed, pixel-level soil information from generalized units. Nevertheless, relying solely on inherited map units without refining them with updated data can propagate uncertainties and reduce the predictive power of the resulting maps. Therefore, integrating new, geo-referenced soil profile observations is essential to improve model calibration, capture local variability, and reduce epistemic uncertainty. This study aims to disaggregate and update Iran's national 1:1,000,000 legacy soil map in the Abyek region of the Qazvin Plain using DSMART (Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees) with three tree-based algorithms under two scenarios. The primary goal is to evaluate the combined effect of algorithm choice and input data richness on the spatial delineation of soil classes.

Materials and Methods

The study was conducted in the Abyek region of the Qazvin Plain, spanning 58,000 hectares with varied topography and semi-arid to arid climate conditions. To support disaggregation, eight key environmental covariates representing soil-forming factors—topography, vegetation, salinity, and climate—were selected using a robust feature selection method. Disaggregation of legacy soil map units was performed using the DSMART model under two scenarios: (1) relying solely on existing map information, and (2) incorporating 230 new soil profiles classified at the USDA subgroup level. Three tree-based machine learning algorithms (C5.0, Random Forest, and XGBoost) were evaluated. Model outputs included probabilistic soil class maps. Model validation was performed using confusion matrices, overall accuracy, Kappa coefficients, the confusion (entropy) index, and spatial concordance via Cramér's V.
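The resampling idea behind DSMART described above can be illustrated with a minimal sketch. It assumes a toy setup that is not taken from the study: two hypothetical polygons with made-up component proportions, a single hypothetical covariate per grid cell, and a 1-nearest-neighbour stand-in where the study used C5.0, Random Forest, or XGBoost. Names such as `draw_virtual_samples` and `disaggregate` are illustrative only.

```python
import random
from collections import Counter, defaultdict


def draw_virtual_samples(polygons, n_per_polygon, rng):
    """Pick random cells in each polygon and label each one with a soil
    class drawn in proportion to the polygon's component percentages."""
    samples = []
    for poly in polygons:
        classes = list(poly["components"])
        weights = [poly["components"][c] for c in classes]
        for _ in range(n_per_polygon):
            cell = rng.choice(poly["cells"])
            label = rng.choices(classes, weights=weights, k=1)[0]
            samples.append((cell, label))
    return samples


def nearest_sample_classifier(samples, covariate):
    """Stand-in classifier (NOT C5.0/RF/XGBoost): predicts the label of
    the training sample whose covariate value is closest."""
    def predict(cell):
        nearest = min(samples,
                      key=lambda s: abs(covariate[s[0]] - covariate[cell]))
        return nearest[1]
    return predict


def disaggregate(polygons, covariate, n_realizations=50,
                 n_per_polygon=10, seed=0):
    """Repeat sample -> fit -> predict, then convert the per-cell vote
    counts into class probabilities."""
    rng = random.Random(seed)
    votes = defaultdict(Counter)
    for _ in range(n_realizations):
        samples = draw_virtual_samples(polygons, n_per_polygon, rng)
        predict = nearest_sample_classifier(samples, covariate)
        for cell in covariate:
            votes[cell][predict(cell)] += 1
    return {cell: {cls: n / n_realizations for cls, n in counts.items()}
            for cell, counts in votes.items()}


# Hypothetical toy data: two map units, eight cells, one covariate.
polygons = [
    {"cells": [0, 1, 2, 3],
     "components": {"subgroup_A": 0.7, "subgroup_B": 0.3}},
    {"cells": [4, 5, 6, 7],
     "components": {"subgroup_C": 0.6, "subgroup_A": 0.4}},
]
covariate = {0: 1450.0, 1: 1420.0, 2: 1390.0, 3: 1360.0,
             4: 1250.0, 5: 1230.0, 6: 1210.0, 7: 1190.0}

probabilities = disaggregate(polygons, covariate)
for cell in sorted(probabilities):
    print(cell, probabilities[cell])
```

In a setup resembling the second scenario, labelled field profiles could presumably be appended to `samples` before fitting, so that observed classes constrain each realization alongside the virtual samples.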

Results and Discussion

Scenario 1 yielded modest accuracy, with overall accuracy ranging from 0.29 to 0.37. Among the three models, XGBoost achieved the highest Shannon diversity and composite performance index, demonstrating strength in capturing soil variability. However, high confusion indices in central and southern areas pointed to unresolved overlaps between closely related soil classes, likely due to limited input diversity and high class imbalance.
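The confusion index and Shannon diversity mentioned above are both derived from per-pixel class probabilities. The following is a minimal sketch, assuming the commonly used definition of the confusion index as one minus the gap between the two most probable classes, and Shannon entropy normalised by the logarithm of the number of classes. The exact formulas used in the study are not stated here, and the probability vectors are hypothetical.

```python
import math


def confusion_index(probs):
    """1 - (highest probability - second-highest probability).
    Values near 1 mean the two leading classes are hard to separate."""
    ranked = sorted(probs, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return 1.0 - (ranked[0] - second)


def shannon_entropy(probs, normalize=True):
    """Shannon entropy of a probability vector. Zero-probability classes
    contribute nothing. With normalize=True the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    if normalize and len(probs) > 1:
        h /= math.log(len(probs))
    return h


# Hypothetical per-pixel probabilities for three soil subgroups.
confident_pixel = [0.90, 0.05, 0.05]
ambiguous_pixel = [0.40, 0.35, 0.25]

for name, probs in [("confident", confident_pixel),
                    ("ambiguous", ambiguous_pixel)]:
    print(name,
          round(confusion_index(probs), 3),
          round(shannon_entropy(probs), 3))
```

Under these assumptions, a pixel dominated by one class has a low confusion index, while a pixel with two nearly tied classes approaches 1, which is how high values in a map can flag overlapping, closely related classes.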

Scenario 2, enhanced with 230 new profiles, resulted in substantial performance gains. Overall accuracy rose to 0.67 and Kappa reached 0.60, with agreement between predicted and observed soil subgroups improving by 43%. Random Forest achieved the highest composite accuracy–diversity score (1.34), while XGBoost maintained the lowest confusion index (0.60), reflecting more confident classifications. The inclusion of new field data improved the model's ability to predict rare subgroups and redistributed class probabilities more realistically across polygons. Notably, this also shifted the influence of environmental predictors: while topographic indices (e.g., elevation, MrVBF, valley depth) dominated Scenario 1, climatic (precipitation) and vegetation indices became more important in Scenario 2, suggesting improved model understanding of soil-environment relationships.
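Overall accuracy and Kappa, as reported above, can both be computed from a confusion matrix of observed versus predicted classes. The sketch below uses the standard definitions (Cohen's Kappa as agreement corrected for chance); the 3 x 3 matrix is hypothetical and does not reproduce the study's data.

```python
def overall_accuracy(cm):
    """Share of validation points on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's Kappa: (observed agreement - chance agreement) /
    (1 - chance agreement), with chance agreement from the marginals."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical confusion matrix: rows = observed subgroup,
# columns = predicted subgroup.
cm = [
    [20, 5, 0],
    [4, 15, 6],
    [1, 5, 14],
]

print("overall accuracy:", round(overall_accuracy(cm), 3))
print("kappa:", round(cohen_kappa(cm), 3))
```

Kappa is lower than overall accuracy whenever part of the agreement could arise by chance, which is why the two are usually reported together for class maps.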

These results confirm that DSMART's disaggregation capability is highly sensitive to the quality and quantity of input data. More accurate soil class delineation—particularly for underrepresented or complex classes—requires sufficient profile observations and landscape-driven covariates. The findings align with recent studies emphasizing the role of disaggregated models and enriched data inputs for refining legacy maps in diverse agroecological settings.

Conclusion

This study demonstrated the applicability of the DSMART model for disaggregating the units of Iran's 1:1,000,000 legacy soil map in the Abyek region. Among the tree-based models tested, Random Forest and Extreme Gradient Boosting outperformed C5.0 in modeling soil subgroups. Incorporating 230 new soil profiles substantially improved model performance, underscoring the importance of high-quality field data. Topographic variables were most influential in the baseline model, while climatic and vegetation indices became more prominent with data enrichment. The results affirm DSMART's adaptability for refining outdated soil maps and support future use of additional soil observations and soil–landscape relationships to enhance spatial prediction accuracy in complex terrains.

 

Author Contributions

Conceptualization, Z.R., F.S. and A.J.; methodology, Z.R. and A.J.; software, Z.R.; validation, Z.R.; formal analysis, Z.R.; investigation, Z.R. and A.J.; resources, Z.R.; data curation, F.S.; writing (original draft preparation), Z.R.; writing (review and editing), Z.R., F.S. and A.J.; visualization, Z.R. and A.J.; supervision, F.S.; project administration, Z.R. and F.S.; funding acquisition, Z.R. All authors have read and agreed to the published version of the manuscript.

 

Data Availability Statement

Not applicable.

 

Acknowledgements

This work is based on research funded by the Iran National Science Foundation (INSF) under project No. 4024343. The first author gratefully acknowledges the financial support provided by the INSF and the University of Tehran, both for funding during data acquisition and analysis, and for creating the necessary conditions to carry out and complete this study.

 

Ethical considerations

The study was approved by the Ethics Committee of the University of ABCD (Ethical code: IR.UT.RES.2024.500). The authors avoided data fabrication, falsification, plagiarism, and misconduct.

 

Conflict of interest

The authors declare no conflict of interest.
