Evaluation of effective parameters for predicting the potassium grade of saline water by using support vector machine and random forest algorithms (case study: playa of Khoor and Biabank area city, Isfahan province)

Document Type : Research Paper

Authors

1 Department of Soil Science and Engineering, Faculty of Water and Soil Engineering, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan, Iran.

2 Department of Desert Management, Faculty of Pasture and Watershed Management, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan, Iran.

3 Department of Archaeology, Faculty of Humanities, Higher Education Institute of Architecture and Arts, Tehran, Iran.

Abstract

The importance of potassium for agricultural production has increased the demand for potassium fertilizers, and a sufficient potassium grade in an aquifer is what makes its extraction viable. The purpose of this research is to use random forest (RF) and support vector machine (SVM) algorithms to prioritize the parameters that affect the potassium grade of saline groundwater in the Khoor and Biabank playa, Isfahan province. For this purpose, 55 parameters were measured in 12 boreholes. The parameters measured as independent variables were the saturation moisture percentage, the apparent specific gravity and the porosity of the core at 15 different depths, the polygon area, the groundwater depth, the depth of the salt layer, the potassium of the surface layer, the density of the brine and the amounts of calcium, magnesium, sodium and chlorine; the potassium grade was entered into the model as the dependent variable. In the RF model, permutation feature importance (PFI) and recursive feature elimination (RFE) were used for prioritization. For the different kernels of the SVM algorithm, to prevent collinearity among the independent parameters, all combinations of the independent variables were examined, and the combination with a variance inflation factor below 8, the highest coefficient of determination and the lowest mean squared error (MSE) was selected as the best. The parameters that gave the best results in predicting the potassium grade of the brine were saturation moisture (sp), polygon area (ap), groundwater depth (duw), surface layer potassium (slp) and sodium adsorption ratio (SAR) for the RF algorithm, and porosity (n), sp, duw and SAR for the linear kernel of the SVM algorithm. The coefficients of determination of the two models are 0.99 and 0.97, respectively, which indicates the good accuracy of both algorithms.




EXTENDED ABSTRACT

Introduction:

As the world population grows, increasing the output of agricultural products has become an important issue, and potassium is one of the most widely used elements for raising crop yield. Demand for potassium fertilizers is therefore increasing. One of the main sources of potassium fertilizers is underground water. An important issue in saline water extraction is the potassium grade of the water. Conventional methods of grade estimation, such as geometric and geostatistical techniques, have low accuracy and cannot estimate the grade reliably. Machine learning algorithms are a more recent alternative that can evaluate and determine the grade of mineral resources with high accuracy.

Objective:

The aim of this research is to evaluate the effective parameters for predicting the potassium grade of saline water using machine learning algorithms (random forest and support vector machine) as new, low-cost methods, and to determine which of the independent variables most influence the potassium grade, in order to improve the utilization of potassium reserves and reduce executive, operational and laboratory costs.

Materials and methods:

The purpose of this research is to use support vector machine (SVM) and random forest (RF) algorithms to predict the potassium grade of groundwater in the Khoor and Biabank playa in Isfahan province and to prioritize the parameters that affect it. For this purpose, 55 different parameters were measured in 12 boreholes (sampling locations). The independent variables were the saturation moisture percentage of the core at 15 different depths (sp1 to sp15), the apparent specific gravity of the core at 15 different depths (pb1 to pb15), the porosity of the core at 15 different depths (n1 to n15), polygon area (ap), groundwater depth (duw), salt layer depth (dsl), surface layer potassium (slp), brine density (d) and the amounts of calcium (Ca), magnesium (Mg), sodium (Na) and chlorine (Cl). The dependent variable was the potassium grade of the brine (Potassium Grade). The three parameters n, sp and pb, each measured at 15 different depths, were each converted into a single equivalent parameter using principal component analysis (PCA). The three measured parameters Ca, Mg and Na were also combined into one input using the sodium adsorption ratio (SAR) formula. In total, 10 parameters were entered into the model as independent variables to predict the potassium grade. Both the RF and SVM models were implemented in the Python programming language, based on the relationship between the dependent variable and the independent variables. For the different kernels of the SVM algorithm, to prevent collinearity among the independent parameters, all combinations of the independent variables (2 to the power of 10 combinations) were checked, and the combination with a variance inflation factor (VIF) below 8, the highest coefficient of determination and the lowest mean squared error (MSE) was chosen as the best.
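
The preprocessing and the VIF-screened subset search described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the helper names (`sar`, `first_component`, `max_vif`, `best_subset`), the use of the first principal component as the "equivalent parameter", the meq/L units for SAR, scikit-learn as the library, and scoring on the training data are all assumptions.

```python
from itertools import combinations

import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def sar(na, ca, mg):
    """Sodium adsorption ratio; Na, Ca, Mg assumed to be in meq/L."""
    return na / np.sqrt((ca + mg) / 2.0)


def first_component(depth_block):
    """Collapse a (boreholes x depths) block into its first principal component."""
    scaled = StandardScaler().fit_transform(depth_block)
    return PCA(n_components=1).fit_transform(scaled).ravel()


def max_vif(X):
    """Largest VIF among the columns of X (diagonal of the inverse correlation matrix)."""
    if X.shape[1] == 1:
        return 1.0
    try:
        return float(np.max(np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))))
    except np.linalg.LinAlgError:
        return float("inf")  # perfectly collinear columns


def best_subset(X, y, names, vif_limit=8.0):
    """Score every non-empty subset that passes the VIF screen with a linear SVR."""
    best = None
    for k in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), k):
            cols = X[:, idx]
            if max_vif(cols) >= vif_limit:
                continue  # reject collinear combinations
            model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
            pred = model.fit(cols, y).predict(cols)
            # Rank by highest R^2, then lowest MSE (training-set scores: an assumption).
            score = (r2_score(y, pred), -mean_squared_error(y, pred))
            if best is None or score > best[0]:
                best = (score, [names[i] for i in idx])
    return best


# Synthetic stand-in for 12 boreholes; none of these numbers come from the study.
rng = np.random.default_rng(0)
n_holes = 12
sp = first_component(rng.normal(40, 5, (n_holes, 15)))
pb = first_component(rng.normal(1.4, 0.1, (n_holes, 15)))
n = first_component(rng.normal(0.45, 0.05, (n_holes, 15)))
ap, duw, dsl, slp, d, cl = rng.normal(size=(6, n_holes))
sar_values = sar(rng.uniform(50, 100, n_holes),
                 rng.uniform(5, 20, n_holes),
                 rng.uniform(5, 20, n_holes))

names = ["sp", "pb", "n", "ap", "duw", "dsl", "slp", "d", "SAR", "Cl"]
X = np.column_stack([sp, pb, n, ap, duw, dsl, slp, d, sar_values, cl])
y = 2.0 * X[:, 0] - 1.5 * X[:, 4] + rng.normal(0, 0.1, n_holes)  # toy target

(r2, neg_mse), chosen = best_subset(X, y, names)
print(chosen, round(r2, 3), round(-neg_mse, 4))
```

With 10 candidate inputs the search visits 1,023 non-empty subsets, which is cheap for 12 samples; other kernels could be compared by changing the `kernel` argument of `SVR`.
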
Permutation Feature Importance (PFI) and Recursive Feature Elimination (RFE) methods were used in the RF model to prioritize and select parameters for modeling.
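
As a rough illustration of how PFI and RFE can rank inputs for a random forest, the sketch below uses scikit-learn on synthetic data. The library choice, the number of trees, the number of permutation repeats and the target of five retained features (which only mirrors the number of parameters reported for the RF model) are assumptions, not settings taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance

# Synthetic stand-in for 12 boreholes and 10 inputs; not the study's data.
rng = np.random.default_rng(0)
names = ["sp", "pb", "n", "ap", "duw", "dsl", "slp", "d", "SAR", "Cl"]
X = rng.normal(size=(12, len(names)))
y = 3.0 * X[:, 0] + 2.0 * X[:, 4] + rng.normal(0, 0.1, 12)  # toy target

# Permutation feature importance: shuffle one column at a time and
# record how much the fitted model's score drops.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pfi = permutation_importance(forest, X, y, n_repeats=20, random_state=0)
pfi_ranking = [names[i] for i in np.argsort(pfi.importances_mean)[::-1]]

# Recursive feature elimination: refit the forest repeatedly, dropping the
# least important feature each round until five remain.
rfe = RFE(
    RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=5,
    step=1,
).fit(X, y)
rfe_selected = [name for name, keep in zip(names, rfe.support_) if keep]

print("PFI ranking:", pfi_ranking)
print("RFE selection:", rfe_selected)
```

The two methods need not agree: PFI ranks features by the score drop of one fitted model, while RFE relies on the forest's impurity-based importances at each elimination step.
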

Results and discussion:

The parameters that were effective in predicting the potassium grade of the brine were sp, ap, duw, slp and SAR for the RF algorithm, and n, sp, duw and SAR for the linear kernel of the SVM algorithm; these combinations led to the best results (high coefficient of determination and low error). The coefficient of determination was 0.99 for the RF model and 0.97 for the SVM model with the linear kernel, which indicates the good accuracy of both algorithms. Identifying the effective parameters plays a significant role in choosing suitable areas for drilling to extract potassium from saline water and avoids repeated, time-consuming laboratory tests; the developed models can be used for this purpose.
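
For reference, the two accuracy measures quoted above are conventionally defined as follows, where $y_i$ is the measured potassium grade at borehole $i$, $\hat{y}_i$ the model prediction, $\bar{y}$ the mean measured grade and $N$ the number of boreholes. The notation is introduced here for illustration and is not taken from the paper.

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}
```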

Conclusion:

Machine learning algorithms are among the most important techniques for estimating mineral grade. A large part of the country consists of arid and semi-arid areas with many playas that are rich in saline groundwater holding good potassium reserves. Because conditions in a playa are unpredictable and the environment is highly complex, identifying the effective parameters plays a significant role in choosing suitable areas for drilling to extract potassium from saline water.
