Assessing the Factors Affecting the Salinity Risk of Groundwater Using Data Mining and Statistical Methods in Arid and Semi-Arid Regions
Over the past decade, declining groundwater levels and deteriorating groundwater quality have become major issues in water resources management. In this study, three bivariate methods (frequency ratio, statistical index, and weight of evidence) and two multivariate methods (classification and regression tree (CART) and random forest) were used for groundwater salinity hazard mapping in the southern part of the Bakhtegan watershed. A groundwater salinity map was prepared using a salinity threshold (EC < 1000 µS/cm), together with thematic layers of 21 groundwater salinity conditioning factors: altitude, distance to anticlines, distance to synclines, distance to salt pans, distance to saltwater lakes, distance to dams, soil salinity index, topographic wetness index, curvature, profile curvature, plan curvature, flow accumulation, flow direction, slope, aspect, land use, soil type, climate, land cover, groundwater level drop, and groundwater level. The EC data were divided into training and validation sets, and by comparing the groundwater salinity map with the 21 independent factors, the weights of the bivariate methods and the parameters of the multivariate methods were estimated. For the southern plains of the Bakhtegan watershed, the results indicated that altitude, distance to salt pans, distance to synclines and anticlines, and distance to saltwater lakes are more important than the other factors in the occurrence of groundwater salinity. Validation of the bivariate models gave an area under the ROC curve (AUC) of 0.923 for the frequency ratio, 0.905 for the statistical index, and 0.908 for the weight of evidence method, indicating that the frequency ratio method performed better than the other two.
The multivariate results likewise showed that the random forest method, with a matching coefficient of 0.91 and a correlation coefficient of 0.85, performed better than CART, with a matching coefficient of 0.89 and a correlation coefficient of 0.82. Finally, in any such study, the efficiency of the models depends on the appropriate selection of the factors that influence the phenomenon under study, the quality of the collected data, and the quality of the maps used.
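To make the bivariate weighting and its validation concrete, the following is a minimal sketch, not the study's implementation. It assumes the standard definitions: the frequency ratio of a factor class is the share of saline cells in that class divided by the share of all cells in that class, and the statistical index is the natural logarithm of that ratio. The AUC is computed as the probability that a saline cell scores higher than a non-saline one. The function names, the two altitude classes, and the eight-cell data set are hypothetical and chosen only for illustration; for brevity the AUC is evaluated on the same cells used to derive the weights, whereas the study uses a separate validation set.

```python
import math

def frequency_ratio(classes, saline):
    """FR per class: (share of saline cells in class) / (share of all cells in class)."""
    total = len(classes)
    total_saline = sum(saline)
    ratios = {}
    for c in set(classes):
        n_class = sum(1 for k in classes if k == c)
        n_saline = sum(1 for k, s in zip(classes, saline) if k == c and s)
        ratios[c] = (n_saline / total_saline) / (n_class / total)
    return ratios

def statistical_index(ratios):
    """SI per class: ln(FR); negative infinity when a class has no saline cells."""
    return {c: math.log(fr) if fr > 0 else float("-inf") for c, fr in ratios.items()}

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: one factor (altitude class) for eight cells,
# with 1 marking a saline cell and 0 a non-saline cell.
altitude_class = ["low", "low", "low", "low", "high", "high", "high", "high"]
is_saline = [1, 1, 1, 0, 0, 0, 1, 0]

fr = frequency_ratio(altitude_class, is_saline)
si = statistical_index(fr)

# Score each cell by the FR of its class, then measure how well the scores
# separate saline from non-saline cells.
cell_scores = [fr[c] for c in altitude_class]
model_auc = auc(cell_scores, is_saline)

print(fr, si, model_auc)
```

With several conditioning factors, the per-class weights of each factor would typically be summed per cell to give the hazard score; a frequency ratio above 1 (statistical index above 0) marks a class in which salinity is over-represented.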