Document Type: Research Paper
Authors
1 Nature Engineering Department, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources
2 Department of Geology, University of Isfahan
3 Department of Soil Science and Engineering, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources
Abstract
Studying and modeling the parameters that affect forest regeneration is currently expensive and complex. Forest destruction releases millions of tons of carbon dioxide into the atmosphere, intensifying the greenhouse effect and disrupting the planet's balance. Deforestation is one of the most important environmental challenges because it is a major driver of environmental degradation. Recently, machine learning models trained on experimental data, using tools such as the R programming language, have made it possible to design algorithms that accurately predict forest growth and change from environmental conditions. Machine learning is not limited to environmental problems; it has been applied successfully in fields as diverse as face and voice recognition, medicine, robotics, aquaculture, food security, and climatology. Although the technology still faces obstacles such as biased conclusions and data limitations, it is expected to bring many further advances and applications in the future. Machine learning is a form of data analysis that automates analytical model building, allowing machines to learn from experience.
In this study, eleven parameters were used as factors affecting vegetation loss in the forests of southwestern Iran, based on machine learning algorithms and satellite images: distance from villages, distance from rivers, distance from lakes, distance from roads, soil erosion, precipitation, evaporation, fire, topographic wetness index (TWI), slope percentage, and slope direction (aspect). The digital elevation model (DEM) of the region was obtained from the USGS website, and slope percentage, slope direction, topographic wetness index, and distance from rivers were derived from it. Distances from roads and villages were derived from Landsat 8 satellite data. Changes in the Normalized Difference Vegetation Index (NDVI) in forest areas were obtained by processing Landsat 8 data for the period 2013 to 2023, and forest areas whose NDVI decreased by more than 20% were classified as deforested. To train the machine learning models, 270,000 points were then selected at random across the entire study area, covering areas with and without deforestation (all land uses): about 90,000 points in forest areas with an NDVI decrease of more than 20%, and 180,000 points in forest areas with an NDVI change of less than 20%. All points were randomly divided into two groups, 70% for training and 30% for testing. The training data were used to fit the selected models and the test data to validate them. The available data were then fed to the trained models to estimate the probability of future deforestation.
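The labeling and sampling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: NDVI is computed as (NIR - Red) / (NIR + Red) with Landsat 8 OLI band 5 as NIR and band 4 as Red, and a "decrease of more than 20%" is read as a relative drop from the 2013 value (the text does not say whether the threshold is relative or absolute). The helper names `ndvi`, `is_deforested`, `sample_points`, and `train_test_split` are hypothetical.

```python
import random

def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); for Landsat 8 OLI, NIR is band 5 and Red is band 4."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat no-data pixels as NDVI 0
    return (nir - red) / total

def is_deforested(ndvi_start: float, ndvi_end: float, threshold: float = 0.20) -> bool:
    """True when NDVI fell by more than `threshold` relative to the start value.

    Assumption: the 20% criterion is a relative decrease; pixels with a
    non-positive starting NDVI are treated as non-vegetated and never flagged.
    """
    if ndvi_start <= 0:
        return False
    relative_change = (ndvi_end - ndvi_start) / ndvi_start
    return relative_change < -threshold

def sample_points(labeled, n_deforested, n_stable, seed=0):
    """Randomly draw deforested (True) and stable (False) points from (id, label) pairs."""
    rng = random.Random(seed)
    positives = [p for p in labeled if p[1]]
    negatives = [p for p in labeled if not p[1]]
    return rng.sample(positives, n_deforested) + rng.sample(negatives, n_stable)

def train_test_split(points, train_fraction=0.7, seed=42):
    """Shuffle the points, then split them into training (70%) and test (30%) sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy pixel: NDVI of about 0.60 in 2013 falling to about 0.45 in 2023 (a 25% relative drop).
start = ndvi(0.40, 0.10)
end = ndvi(0.29, 0.11)
print(is_deforested(start, end))  # True

# Scaled-down version of the 90,000 / 180,000 design (same 1:2 ratio), hypothetical labels.
pixels = [(i, i % 3 == 0) for i in range(3000)]
points = sample_points(pixels, 300, 600)
train, test = train_test_split(points)
print(len(train), len(test))  # 630 270
```

In a real workflow the same logic would run on raster arrays rather than Python lists; the list version only keeps the example dependency-free.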
Deforested areas were identified from Landsat 8 data for 2013 to 2023 as areas whose NDVI decreased by more than 20% over the ten-year period. Several criteria were used to validate the models. Six machine learning models, random forest (RF), support vector machine (SVM), generalized additive model (GAM), generalized linear model (GLM), boosted regression tree (BRT), and classification and regression tree (CART), were evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), efficiency, the true positive rate (TPR), and the false positive rate (FPR). These criteria measure the predictive ability of the models. The AUC values of the RF, BRT, SVM, GAM, CART, and GLM models are 0.986, 0.92, 0.865, 0.761, 0.743, and 0.682, respectively, showing that the random forest model predicted deforestation more accurately than the other models in this study. The sensitivity (TPR) values of the RF, BRT, CART, SVM, GAM, and GLM models were 0.953, 0.835, 0.826, 0.768, 0.668, and 0.604, respectively, and their false positive rates (FPR, equal to 1 - specificity) were 0.047, 0.142, 0.240, 0.174, 0.233, and 0.306, respectively. These figures indicate that all models are sufficiently accurate for investigating forest cover changes. Considering the AUC, TPR, and FPR indices, all models perform well, but the random forest (RF) and boosted regression tree (BRT) models are more accurate than the other four. The importance of the parameters varied between models; overall, five parameters (distance from rivers, precipitation, evaporation, distance from lakes, and distance from villages) were more important than the others. Climatic and anthropogenic factors therefore act together in the destruction of forests in Khuzestan province.
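The validation criteria above (TPR, FPR, and AUC) can be computed from a model's predicted probabilities on the test points. The sketch below is a minimal, dependency-free illustration, not the study's evaluation code. The 0.5 probability cut-off is an assumption, since the text does not state the threshold used, and AUC is computed with the rank-based Mann-Whitney formulation, which equals the area under the ROC curve. Specificity is 1 - FPR.

```python
def tpr_fpr(labels, scores, threshold=0.5):
    """Sensitivity (TPR) and false positive rate (FPR) at one probability cut-off."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if y and predicted:
            tp += 1
        elif y:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic, averaging ranks over tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of equal scores so ties share one rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy test set: 1 = deforested, 0 = not deforested.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(tpr_fpr(labels, scores))  # (0.5, 0.5)
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of deforested and stable points, which is the scale on which the RF (0.986) and GLM (0.682) values are compared.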
Although all six models are sufficiently accurate, the comparison shows that the random forest and boosted regression tree models are more accurate than the other four, whereas the generalized additive and generalized linear models are the least accurate. Overall, the random forest (RF) model has the highest accuracy of the six. Because each model uses a different calculation method, the extent of the areas identified as at risk in Khuzestan province differs between models. The random forest model confines the risk of deforestation to more limited areas than the other models, whereas the support vector machine and generalized additive models act conservatively and show a larger share of the study area's forests as at risk. Across the six models, between 2 and 8 percent of the total area of Khuzestan province (11 to 51 percent of the province's forest area) is at high risk of deforestation. On average, about 4 percent of the province's total area, including about 29 percent of its forests, has a high probability of deforestation.
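The area figures above express the same high-risk zone as a share of the whole province and as a share of its forests. A short Python sketch can make that calculation concrete. It is an illustration under assumptions, not the authors' procedure: the 0.7 probability cut-off for "high risk" is hypothetical (the text does not give class boundaries), high-risk pixels are assumed to lie inside the forest mask, and all pixels are assumed to have equal area. `high_risk_shares` and `average_shares` are hypothetical helpers.

```python
from statistics import mean

def high_risk_shares(prob_map, forest_mask, threshold=0.7):
    """Return (% of province area, % of forest area) covered by high-risk forest pixels.

    Assumptions: a pixel is high-risk when its predicted deforestation
    probability is >= threshold and it lies inside the forest mask;
    pixels have equal area, so pixel counts stand in for area.
    """
    total = forest = high_risk = 0
    for p_row, f_row in zip(prob_map, forest_mask):
        for p, is_forest in zip(p_row, f_row):
            total += 1
            if is_forest:
                forest += 1
                if p >= threshold:
                    high_risk += 1
    return 100 * high_risk / total, 100 * high_risk / forest

def average_shares(per_model):
    """Average (province %, forest %) pairs across models."""
    return mean(s[0] for s in per_model), mean(s[1] for s in per_model)

# Toy 2 x 2 grid: the top row is forest and one forest pixel is high-risk.
probabilities = [[0.9, 0.2], [0.8, 0.1]]
forest_demo = [[True, True], [False, False]]
print(high_risk_shares(probabilities, forest_demo))  # (25.0, 50.0)
```

Running `high_risk_shares` once per model and passing the results to `average_shares` mirrors how a per-model range and an overall average can be reported together.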
All authors contributed equally to the conceptualization of the article and writing of the original and subsequent drafts.
Data available on request from the authors.
The authors would like to thank all participants of the present study.
The authors avoided data fabrication, falsification, plagiarism, and misconduct.
The authors declare no conflict of interest.