Modeling vegetation changes in Zagros forests based on machine learning algorithms and satellite images

Document Type : Research Paper

Authors

1 Nature Engineering Department, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources

2 Department of Geology, University of Isfahan

3 Department of Soil Science and Engineering, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources

Abstract

This research investigated and modeled changes in vegetation cover in the forests of southwestern Iran using Landsat 8 satellite images and six machine learning algorithms: random forest (RF), boosted regression tree (BRT), support vector machine (SVM), generalized additive model (GAM), classification and regression tree (CART) and generalized linear model (GLM). First, satellite images were processed to identify deforested areas over the decade from 2013 to 2023, and changes in the Normalized Difference Vegetation Index (NDVI) in the forests of the study area were calculated. Then, the parameters affecting forest cover change were examined, and the machine learning models were validated using efficiency indices. The results showed that several factors affected forest cover in the study area and, in some areas, led to a 20% decrease in NDVI. The results also indicate that the random forest model is more accurate and precise than the other five models. The area under the curve (AUC) values of the RF, BRT, SVM, GAM, CART and GLM models are 0.986, 0.92, 0.865, 0.761, 0.743 and 0.682, respectively. The influence of individual factors on forest cover reduction differs between models; overall, the importance values of distance from the river, precipitation, evaporation, distance from villages, distance from the dam lake, erosion, distance from the road, slope direction, topographic wetness index, fire and slope are 0.176, 0.170, 0.170, 0.169, 0.167, 0.165, 0.165, 0.163, 0.162, 0.161 and 0.160, respectively.


Introduction

     Studying and modeling the parameters that affect forest regeneration is expensive and complex. Forest destruction releases millions of tons of carbon dioxide into the atmosphere, creating greenhouse conditions and disrupting the balance of the planet. Deforestation is therefore one of the most important environmental challenges and a major driver of environmental degradation. Machine learning models trained on experimental data, for example in the R programming language, can now provide accurate predictions of forest growth and change based on environmental conditions. Machine learning is not limited to environmental issues; it has been used successfully in fields as diverse as facial and voice recognition, medicine, robotics, aquaculture, food security, and climatology. Although this technology faces obstacles such as biased conclusions and data limitations, many further advances and applications are expected in the future. Machine learning is a type of data analysis that automates the building of analytical models, allowing machines to learn from experience.

Materials and Methods

    In this study, eleven parameters were used as factors affecting vegetation reduction in the forests of southwestern Iran, based on machine learning algorithms and satellite images. The factors are distance from village, distance from river, distance from lake, distance from road, soil erosion, precipitation, evaporation, fire, topographic wetness index, slope percentage and slope direction. The digital elevation model (DEM) of the region was obtained from the USGS website, and slope percentage, slope direction, topographic wetness index and distance from the river were derived from it. The layers for distance from roads and distance from villages were prepared using Landsat 8 satellite data. Changes in the Normalized Difference Vegetation Index (NDVI) in forest areas were obtained by processing Landsat 8 data for the years 2013 to 2023. Forest areas with an NDVI decrease of more than 20% were considered deforestation areas. To train the machine learning models, 270,000 points were randomly selected across the entire study area, covering areas with and without deforestation (and all land uses). Of these, about 90,000 points had an NDVI change of more than 20 percent in forest areas, and 180,000 points had an NDVI change of less than 20 percent in forest areas. All points were randomly divided into two groups (70% for training and 30% for testing). The training data were used to fit each model and the testing data were used for validation. The available data were also used as input to the trained models to analyze the probability of future deforestation.
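The labeling and splitting steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' processing chain. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red) and assumes that the 20% threshold refers to a relative decrease between the start and end dates; the function names, the sample reflectance values and the random seed are all hypothetical.

```python
import random

def ndvi(nir, red):
    """Standard NDVI from near-infrared and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def relative_ndvi_decrease(ndvi_start, ndvi_end):
    """Fractional NDVI loss between two dates (positive means loss)."""
    if ndvi_start <= 0:
        # Assumption: pixels with no vegetation signal at the start
        # are treated as unchanged.
        return 0.0
    return (ndvi_start - ndvi_end) / ndvi_start

def label_deforestation(ndvi_start, ndvi_end, threshold=0.20):
    """Return 1 if the relative NDVI loss exceeds the threshold, else 0."""
    return 1 if relative_ndvi_decrease(ndvi_start, ndvi_end) > threshold else 0

def train_test_split(points, train_fraction=0.70, seed=42):
    """Randomly split points into training and testing subsets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (NIR, red) reflectance pairs for the start and end year.
samples = [
    ((0.40, 0.10), (0.30, 0.12)),  # clear NDVI loss
    ((0.45, 0.08), (0.44, 0.08)),  # nearly unchanged
]
labels = [label_deforestation(ndvi(*start), ndvi(*end)) for start, end in samples]

train, test = train_test_split(list(range(10)))
print(labels, len(train), len(test))
```

With real data, the same labeling would be applied per pixel to the NDVI rasters of the two dates, and the split would be applied to the sampled points together with their eleven predictor values.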

Results

     To identify areas experiencing deforestation, Landsat 8 satellite data from 2013 to 2023 were used. Areas with a decrease of more than 20% in the Normalized Difference Vegetation Index (NDVI) over the ten-year period were considered areas experiencing deforestation. Six machine learning models, random forest (RF), support vector machine (SVM), generalized additive model (GAM), generalized linear model (GLM), boosted regression tree (BRT) and classification and regression tree (CART), were validated using several criteria, including ROC curves and AUC, efficiency, true positive rate (TPR) and false positive rate (FPR). These criteria indicate the predictive ability of the models. The AUC values of the RF, BRT, SVM, GAM, CART and GLM models are 0.986, 0.92, 0.865, 0.761, 0.743 and 0.682, respectively. The random forest model was therefore more accurate in predicting deforestation than the other models in the present study. The sensitivity (TPR) values of the RF, BRT, CART, SVM, GAM and GLM models were estimated as 0.953, 0.835, 0.826, 0.768, 0.668 and 0.604, respectively, and the corresponding false positive rate (FPR, equal to 1 minus specificity) values were estimated as 0.047, 0.142, 0.240, 0.174, 0.233 and 0.306, respectively. These figures indicate that all models used in this study are sufficiently accurate for investigating forest cover changes. Considering the AUC, TPR and FPR indices, all models have good accuracy, but overall the random forest (RF) and boosted regression tree (BRT) models are more accurate than the other four. The importance of the parameters varied between models; in general, five parameters (distance from the river, precipitation, evaporation, distance from the lake, and distance from the village) are more important than the others. Therefore, climatic and anthropogenic factors simultaneously affect the destruction of forests in Khuzestan province.
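For readers unfamiliar with these validation indices, the following minimal Python sketch shows how TPR, FPR and AUC can be computed from observed labels and predicted probabilities. It is an illustration, not the authors' validation code: the labels, scores and the 0.5 classification cutoff are hypothetical, and AUC is computed through its rank interpretation (the probability that a randomly chosen deforested point receives a higher score than a randomly chosen non-deforested point, with ties counted as one half).

```python
def confusion_rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # 1 - specificity
    return tpr, fpr

def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test points: 1 = deforested, 0 = not deforested.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# Assumed cutoff of 0.5 to turn probabilities into classes.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tpr, fpr = confusion_rates(y_true, y_pred)
print(round(tpr, 3), round(fpr, 3), round(auc(y_true, scores), 3))
```

The pairwise AUC loop is quadratic in the number of points, which is fine for a small illustration; for a test set of the size used in this study, a sort-based implementation from an established statistics library would be the practical choice.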

Conclusions

     Although all six models are sufficiently accurate, their comparison shows that the random forest and boosted regression tree models are more accurate than the other four, while the generalized additive and generalized linear models are less accurate than the others. Among the six models, the random forest (RF) model has the highest accuracy. Because the models use different calculation methods, the extent of the areas at risk in Khuzestan province differs between models. The random forest model assigns a risk of deforestation to more limited areas than the other models, whereas the support vector machine and the generalized models act conservatively and show larger parts of the forests of the study area at risk. The models show between 2 and 8 percent of the total area of Khuzestan province (11 to 51 percent of the province's forest area) at high risk of deforestation. On average, about 4 percent of the total area of the province has a high probability of deforestation, which includes about 29 percent of the province's forests.

Author Contributions

     All authors contributed equally to the conceptualization of the article and writing of the original and subsequent drafts.

Data Availability Statement

     Data available on request from the authors.

Acknowledgements

      The authors would like to thank all participants of the present study.

Ethical considerations

      The authors avoided data fabrication, falsification, plagiarism, and misconduct.

Conflict of interest

       The authors declare no conflict of interest.
