Soil plays a crucial role in achieving sustainable development goals because of its importance in addressing environmental challenges. Managing it well requires identifying soils that share similar management requirements, which has prompted soil scientists to use numerical classification models that group soils by similarity. In this study, we applied a traditional one-way (hierarchical) clustering model and a modern two-way clustering (biclustering, also called co-clustering) model to classify soils from parts of the Qazvin Plain. Using both models, we grouped 297 soils from the region based on a comprehensive set of morphological, physicochemical, and environmental attributes. The resulting classifications were assessed with internal and external evaluation indices, with the distribution map of soil subgroups serving as the ground-truth reference. The hierarchical clustering model, with a lower Davies-Bouldin index (DB: 1.38) and a higher adjusted Rand index (ARI: 0.49), outperformed the two-way clustering model. However, the classes from the two-way clustering model corresponded reasonably well with the topographic and soil variation in the region, as evidenced by its higher Shannon’s entropy index (1.82) compared with the hierarchical clustering model (1.62). Overall, the findings underscore the utility of two-way clustering as a contemporary data mining technique for soil classification and for identifying patterns of similarity in soil management.


Classifying soils is crucial for identifying and managing them properly, and soil scientists have recognized the value of numerical classification models for grouping soils. Continuous maps produced with digital soil mapping models give a better picture of how soil classes are distributed, which supports improved soil management. Traditional (one-way) clustering methods group soils based on their full set of properties, whereas two-way clustering methods group soils within subsets of properties on which they are similar. This study compares the effectiveness of one-way and two-way clustering models in grouping soils and in identifying relationships between different soils.

Materials and Methods

This study used a dataset of 297 soil samples from parts of the Qazvin Plain. A broad range of morphological, physicochemical, and environmental variables was used for soil grouping. Hierarchical clustering was used for one-way grouping of the soils, and a two-way clustering method was used to co-cluster them. The Davies-Bouldin (DB) index was used for internal evaluation of the groupings from both models, based on the separation between soil groups and the variance within them. The adjusted Rand index (ARI) was used for external evaluation, with the distribution map of soil subgroups serving as the ground-truth reference. Shannon’s entropy index was used to assess how well each model represented soil variability in the study area.
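The internal and external evaluation indices named above can be illustrated on toy data. The following is a minimal pure-Python sketch, not the study's implementation: it assumes the common definition of the Davies-Bouldin index (cluster scatter taken as the mean Euclidean distance of members to their centroid) and the pair-counting form of the adjusted Rand index. The names `points`, `clusters`, and `reference` and all values in them are hypothetical and stand in for soil attribute vectors, model groupings, and reference soil subgroups.

```python
from collections import Counter
from math import comb, dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index: for each cluster, take the worst ratio of
    (scatter_i + scatter_j) / distance between centroids, then average.
    Lower values mean more compact, better separated clusters."""
    members = {}
    for p, lab in zip(points, labels):
        members.setdefault(lab, []).append(p)
    centroids = {
        lab: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for lab, pts in members.items()
    }
    # Scatter: mean Euclidean distance of members to their own centroid.
    scatter = {
        lab: sum(dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in members.items()
    }
    keys = list(members)
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(worst)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the contingency table of two partitions."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: six "soils" described by two attributes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
clusters = [0, 0, 0, 1, 1, 1]                # grouping from a model
reference = ["A", "A", "A", "B", "B", "B"]   # reference subgroups

db_value = davies_bouldin(points, clusters)
ari_value = adjusted_rand_index(reference, clusters)  # identical partitions give 1.0
ari_partial = adjusted_rand_index(reference, [0, 0, 1, 1, 1, 1])
print(f"DB = {db_value:.3f}, ARI = {ari_value:.2f}, partial ARI = {ari_partial:.2f}")
```

The DB index needs only the attribute vectors and the model's own labels, which is why it serves as an internal measure, whereas the ARI compares the labels with an independent reference partition and is therefore an external measure.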

Results and Discussion

Both models successfully differentiated the region's soils in line with changes in topography and physiographic units. However, the two-way clustering model showed a slightly different pattern of soil separation, particularly in the central and southern parts of the study area. A numerical comparison showed that the one-way (hierarchical) clustering model provided better soil separation with lower within-group variance (DB: 1.38) and agreed more closely with the distribution map of soil subgroups in the region (ARI: 0.49). The two-way clustering model effectively represented the pattern of soil variation in the study area, as evidenced by a higher Shannon index (1.82) compared with the hierarchical clustering model (1.62).
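To make the Shannon index comparison concrete, the sketch below computes Shannon entropy from group proportions. It is an illustration under stated assumptions: the natural logarithm is used and proportions are taken over sample counts, whereas the source does not state the log base or whether proportions were computed over samples or mapped area. The label lists `even` and `skewed` are hypothetical and are not the study's data.

```python
from collections import Counter
from math import log


def shannon_index(labels):
    """Shannon entropy H = -sum(p_k * ln(p_k)) over group proportions p_k."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


# Hypothetical assignments of 12 soils to 4 groups (not the study's data).
even = ["g1"] * 3 + ["g2"] * 3 + ["g3"] * 3 + ["g4"] * 3
skewed = ["g1"] * 9 + ["g2"] + ["g3"] + ["g4"]

h_even = shannon_index(even)      # equals ln(4), the maximum for 4 groups
h_skewed = shannon_index(skewed)  # lower, because one group dominates
print(f"even: {h_even:.3f}, skewed: {h_skewed:.3f}")
```

Under these assumptions, the index rises with the number of groups and with how evenly soils are spread across them, so a higher value reflects a grouping that expresses more of the variability among soils.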


Although the numerical evaluations show that the hierarchical clustering model separated soil groups more efficiently, the two-way clustering model successfully grouped the region's soils in line with soil and physiographic variation. It also represented soil variation in the region effectively, as indicated by its higher Shannon entropy index. The findings affirm the efficacy of two-way clustering as a modern data mining technique for identifying similar soils and, consequently, for grouping and modeling them in digital soil mapping studies. The use of this model is recommended for examining soils in different parts of the country.

