Article type: Research article
Authors
1 Department of Soil Science and Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
2 Faculty member, Department of Soil Science Engineering, College of Agriculture and Natural Resources, University of Tehran
3 Department of Soil Science and Engineering, Faculty of Agriculture, Shahid Bahonar University of Kerman
Abstract
Conventional soil maps consist of polygon units in which soil types are delineated with sharp boundaries, yet the variability of soil classes within each unit remains undefined. To address this, the present study disaggregates the units of Iran's legacy 1:1,000,000-scale soil map in the Abyek region using the DSMART model. DSMART was applied with three tree-based algorithms, C5.0, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), under two scenarios: (1) using legacy soil map information only, and (2) integrating 230 new soil profiles classified at the subgroup level. Model performance and uncertainty were evaluated using overall accuracy, the Kappa coefficient, and the confusion index. In Scenario 1, map accuracy ranged from 0.29 to 0.37, with Kappa values between 0.17 and 0.29. The highest performance was achieved by XGBoost, with a confusion index of 0.74. In Scenario 2, accuracy improved to 0.51–0.63 and Kappa to 0.44–0.60, with the best results from the RF model, and the confusion index dropped to 0.65. Spatial consistency with the observed distribution of soil subgroups improved by 43% in Scenario 2. Topography was the most influential factor in Scenario 1, while mean annual rainfall and the vertical vegetation index dominated in Scenario 2. Incorporating new soil profile data enhanced model performance by up to 26%. These findings underscore the effectiveness of DSMART, particularly when enriched with new soil data, in refining legacy soil map units for more precise soil class delineation.
EXTENDED ABSTRACT
Legacy soil data are vital for preserving national resources and serve as a foundational reference for digital soil mapping and environmental modeling. However, many conventional soil maps—such as those produced at a scale of 1:1,000,000—lack the spatial detail required for modern land management applications. These maps often rely heavily on expert judgment and broad physiographic units, resulting in polygonal representations that blend multiple soil classes without describing their internal variability. Consequently, such aggregated maps limit our understanding of soil transitions and spatial patterns critical to agricultural planning, ecological assessments, and policy decisions.
Spatial disaggregation has emerged as a solution for enhancing map resolution and extracting more detailed, pixel-level soil information from generalized units. Nevertheless, relying solely on inherited map units without refining them with updated data can propagate uncertainties and reduce the predictive power of the resulting maps. Integrating new, geo-referenced soil profile observations is therefore essential to improve model calibration, capture local variability, and reduce epistemic uncertainty. This study aims to disaggregate and update Iran's national 1:1,000,000 legacy soil map in the Abyek region of the Qazvin Plain using DSMART (Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees) in conjunction with three tree-based algorithms under two scenarios. The primary goal is to evaluate the combined effect of algorithm choice and input data richness on the spatial delineation of soil classes.
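To make the resampling idea behind DSMART concrete, the following is a minimal sketch, not the implementation used in the study. It assumes that each map unit lists its soil classes with nominal proportions, that virtual samples in each unit receive a class drawn according to those proportions, and that per-pixel class probabilities are obtained by tallying predictions over many realisations. A one-covariate nearest-neighbour rule stands in for the C5.0, RF, or XGBoost classifier, and the units, class names, covariate ranges, and parameter values are all hypothetical.

```python
import random
from collections import Counter

def sample_unit(unit, n, rng):
    """Draw n virtual samples from one map unit.

    Each sample is (covariate value, class label); labels are drawn in
    proportion to the unit's nominal class composition.
    """
    lo, hi = unit["covariate_range"]
    classes = list(unit["composition"])
    weights = list(unit["composition"].values())
    labels = rng.choices(classes, weights=weights, k=n)
    return [(rng.uniform(lo, hi), label) for label in labels]

def predict_1nn(training, x):
    """Stand-in classifier (assumption): label of the nearest training sample."""
    return min(training, key=lambda s: abs(s[0] - x))[1]

def disaggregate(units, pixels, n_realisations=50, n_per_unit=30, seed=0):
    """Resample, refit, and tally predictions into per-pixel class probabilities."""
    rng = random.Random(seed)
    tallies = [Counter() for _ in pixels]
    for _ in range(n_realisations):
        training = [s for u in units for s in sample_unit(u, n_per_unit, rng)]
        for tally, x in zip(tallies, pixels):
            tally[predict_1nn(training, x)] += 1
    return [{c: k / n_realisations for c, k in t.items()} for t in tallies]

# Hypothetical map units with illustrative class proportions.
units = [
    {"composition": {"class_A": 0.7, "class_B": 0.3}, "covariate_range": (0.0, 1.0)},
    {"composition": {"class_B": 0.6, "class_C": 0.4}, "covariate_range": (1.0, 2.0)},
]
probs = disaggregate(units, pixels=[0.2, 1.8])
print(probs)
```

In a scenario with field observations, profiles with known classes could be appended to the training set of every realisation; how exactly the study combined them with the virtual samples is not detailed in this abstract.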
The study was conducted in the Abyek region of the Qazvin Plain, spanning 58,000 hectares with varied topography and semi-arid to arid climate conditions. To support disaggregation, eight key environmental covariates representing soil-forming factors—topography, vegetation, salinity, and climate—were selected using a robust feature selection method. Disaggregation of legacy soil map units was performed using the DSMART model under two scenarios: (1) relying solely on existing map information, and (2) incorporating 230 new soil profiles classified at the USDA subgroup level. Three tree-based machine learning algorithms (C5.0, Random Forest, and XGBoost) were evaluated. Model outputs included probabilistic soil class maps. Model validation was performed using confusion matrices, overall accuracy, Kappa coefficients, the confusion (entropy) index, and spatial concordance via Cramér's V.
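As an illustration of two of the validation measures named above, the sketch below computes overall accuracy and Cohen's Kappa from a confusion matrix. The matrix values are invented for demonstration and are not results from the study; rows are assumed to hold observed classes and columns predicted classes.

```python
def overall_accuracy(matrix):
    """Share of validation samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 3-class confusion matrix (rows: observed, columns: predicted).
cm = [
    [20, 5, 0],
    [4, 15, 6],
    [1, 5, 19],
]
acc = overall_accuracy(cm)   # 54 / 75 = 0.72
kappa = cohen_kappa(cm)      # (0.72 - 1/3) / (1 - 1/3) = 0.58
print(round(acc, 2), round(kappa, 2))
```

Kappa is lower than accuracy here because one third of the agreement would be expected by chance alone when the three classes are equally frequent.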
Scenario 1 yielded modest accuracy, with overall prediction rates ranging from 0.29 to 0.37. Among the three models, XGBoost achieved the highest Shannon diversity and composite performance index, demonstrating strength in capturing soil variability. However, high confusion indices in central and southern areas pointed to unresolved overlaps between closely related soil classes, likely due to limited input diversity and strong class imbalance.
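The per-pixel uncertainty measures mentioned above can be sketched as follows. This assumes the commonly used definition of the confusion index, one minus the gap between the two highest class probabilities, and a Shannon entropy normalised by the number of classes; the exact formulations used in the study are not given in this abstract, and the probability vectors are illustrative.

```python
import math

def confusion_index(probs):
    """1 - (highest probability - second-highest probability).

    Values near 1 mean the two most likely classes are almost tied.
    """
    ranked = sorted(probs, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return 1.0 - (ranked[0] - second)

def shannon_entropy(probs, normalise=True):
    """Shannon entropy of a class-probability vector, optionally scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    if normalise and len(probs) > 1:
        h /= math.log(len(probs))
    return h

# Illustrative pixel where two classes compete closely.
ambiguous = [0.5, 0.4, 0.1]
# Illustrative pixel with one clearly dominant class.
confident = [0.9, 0.05, 0.05]

print(confusion_index(ambiguous), confusion_index(confident))
print(shannon_entropy(ambiguous), shannon_entropy(confident))
```

Under these definitions, a pixel with nearly tied leading classes scores close to 1 on the confusion index, which matches the interpretation of high values as unresolved overlap between related classes.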
Scenario 2, enhanced with 230 new profiles, resulted in substantial performance gains. Overall accuracy rose to 0.67 and Kappa reached 0.60, with agreement between predicted and observed soil subgroups improving by 43%. Random Forest achieved the highest composite accuracy–diversity score (1.34), while XGBoost maintained the lowest confusion index (0.60), reflecting more confident classifications. The inclusion of new field data improved the model's ability to predict rare subgroups and redistributed class probabilities more realistically across polygons. Notably, this also shifted the influence of environmental predictors: while topographic indices (e.g., elevation, MrVBF, valley depth) dominated Scenario 1, climatic (precipitation) and vegetation indices became more important in Scenario 2, suggesting improved model understanding of soil-environment relationships.
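The agreement between predicted and observed subgroups was assessed with Cramér's V. The sketch below shows how that statistic can be computed from a contingency table of observed against predicted classes. The tables are hypothetical, and how the reported 43% improvement was derived from the statistic is not stated here, so this only illustrates the measure itself.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), ranging from 0
    (no association) to 1 (perfect association). Assumes the table has
    at least two rows and two columns with non-zero totals.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical tables (rows: observed subgroup, columns: predicted subgroup).
perfect = [[10, 0], [0, 10]]
unrelated = [[5, 5], [5, 5]]
print(cramers_v(perfect), cramers_v(unrelated))
```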
These results confirm that DSMART's disaggregation capability is highly sensitive to the quality and quantity of input data. More accurate soil class delineation—particularly for underrepresented or complex classes—requires sufficient profile observations and landscape-driven covariates. The findings align with recent studies emphasizing the role of disaggregated models and enriched data inputs for refining legacy maps in diverse agroecological settings.
This study demonstrated the applicability of the DSMART model for disaggregating the units of Iran's 1:1,000,000 legacy soil map in the Abyek region. Among the tree-based models tested, Random Forest and Extreme Gradient Boosting outperformed C5.0 in modeling soil subgroups. Incorporating 230 new soil profiles substantially improved model performance, underscoring the importance of high-quality field data. Topographic variables were most influential in the baseline model, while climatic and vegetation indices became more prominent with data enrichment. The results affirm DSMART's adaptability for refining outdated soil maps and support the future use of additional soil observations and soil–landscape relationships to enhance spatial prediction accuracy in complex terrains.
Conceptualization, Z.R., F.S. and A.J.; methodology, Z.R. and A.J.; software, Z.R.; validation, Z.R.; formal analysis, Z.R.; investigation, Z.R. and A.J.; resources, Z.R.; data curation, F.S.; writing (original draft preparation), Z.R.; writing (review and editing), Z.R., F.S. and A.J.; visualization, Z.R. and A.J.; supervision, F.S.; project administration, Z.R. and F.S.; funding acquisition, Z.R. All authors have read and agreed to the published version of the manuscript.
Not applicable.
This work is based on research funded by the Iran National Science Foundation (INSF) under project No. 4024343. The first author gratefully acknowledges the financial support provided by the INSF and the University of Tehran, both for funding during data acquisition and analysis, and for creating the necessary conditions to carry out and complete this study.
The study was approved by the Ethics Committee of the University of ABCD (Ethical code: IR.UT.RES.2024.500). The authors avoided data fabrication, falsification, plagiarism, and misconduct.
The authors declare no conflict of interest.