Magiran | Keyword search: "support vector machine"

Simulating Changes in the Water Distribution Uniformity Coefficient in Classic Fixed Sprinkler Irrigation Systems Using Data-Mining Models

Fariborz Ahmadzadeh-Kaleybar*, Shahram Shahmohammadi Kalalagh, Sina Fard Moradinia

Journal of New Approaches in Water Engineering and Environment, Year 3, Issue 2 (serial no. 6, Autumn and Winter 1403), pp. 118-135

Objective

Global water scarcity, and the large share of water consumed by crop irrigation, make it necessary to draw on a range of disciplines to raise water productivity. The water distribution uniformity coefficient of sprinkler irrigation systems is one of the key indicators for evaluating their performance, and only high values of it can justify installing these systems. The aim of this study is to use support vector machine (SVM) and gene expression programming (GEP) models to simulate the water distribution uniformity coefficient under field conditions in the Malekan plain in northwestern Iran, which lies in the Lake Urmia catchment and is under severe water stress.

Materials and Methods

Field experiments were carried out on seven farms equipped with classic fixed sprinkler irrigation systems with movable sprinklers (Komet 162, 163). The variables were sprinkler spacing on the laterals and manifolds, operating pressure, and wind speed, and distribution uniformity coefficient data were obtained. Two models, SVM and GEP, were used to simulate the uniformity coefficient. Sensitivity analysis showed that all three variables should be selected as model inputs. The training and testing stages were allocated 70 percent and 30 percent of the data, respectively. Using these data, the tuning parameters of each model were computed to reach the best output. Model performance was evaluated with four indices: RMSE (root mean square error), MAE (mean absolute error), R2 (coefficient of determination), and DDR (developed discrepancy ratio).

Results

The GEP model ranked first in simulation accuracy. The values of the indices (RMSE, MAE, R2) for GEP in the training and testing stages were (3.5087, 2.6827, 0.8634) and (1.1787, 0.9494, 0.9833), respectively. The values of the evaluation indices (RMSE, MAE, R2) for the best SVM model in the testing and training stages were (4.8917, 4.2704, 0.7884) and (2.6790, 2.4113, 0.9185), respectively. In the training stage, the value of CU (DDR(max)) for the GEP and SVM models was calculated as 7.0540 and 5.2925, respectively. In the testing stage, the value of this index for the two models was 20.8355 and 9.2863, respectively. Comparison of this index also indicated that the GEP model is more accurate than the SVM model. Overall, both models are able to simulate water distribution uniformity in sprinkler irrigation under field conditions, but the GEP model yields better results.

Keywords: Gene Expression Programming, Malekan Plain, Sensitivity Analysis, Support Vector Machine, Performance Evaluation

Simulating of Changes in Water Distribution Uniformity Coefficient in Classic Stationary Sprinkler Irrigation Using Data-Mining Models

Fariborz Ahmadzadeh-Kaleybar *, Shahram Shahmohammadi Kalalagh, Sina Fard Moradinia

Journal of New Approaches in Water Engineering and Environment, Volume:3 Issue: 2, 2025, PP 118 -135

The water distribution uniformity coefficient of sprinkler irrigation systems is an important indicator for evaluating their performance, and only high values can justify installing these systems. The purpose of this research is to use support vector machine (SVM) and gene expression programming (GEP) models to simulate the water distribution uniformity coefficient under farm conditions in the Malekan plain in northwestern Iran, which lies in the Lake Urmia catchment and is experiencing severe water stress. Field tests were carried out on seven farms equipped with a classic stationary sprinkler irrigation system with a movable sprinkler (Komet 162, 163), with sprinkler spacing on laterals and manifolds, operating pressure, and wind speed as variables, and distribution uniformity coefficient data were obtained. For GEP, the values of the indicators (RMSE, MAE, R2) in the training and test steps were (3.5087, 2.6827, 0.8634) and (1.1787, 0.9494, 0.9833), respectively. For the best SVM model, the values of the evaluation indices (RMSE, MAE, R2) in the test and training steps were (4.8917, 4.2704, 0.7884) and (2.6790, 2.4113, 0.9185), respectively. In the training step, the value of CU(DDR(max)) for the GEP and SVM models was calculated as 7.0540 and 5.2925, respectively. In the test step, the value of this index for the two models was 20.83 and 9.28, respectively. Comparison of this index also showed that the GEP model is more accurate than the SVM model.
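The three error indices reported in this abstract can be computed as in the minimal Python sketch below. It assumes R2 is the squared Pearson correlation between observed and simulated values (the abstract does not say which definition the authors used), takes the 70/30 split as a simple ordered cut, and leaves out DDR. Function names and sample values are illustrative and are not taken from the paper.

```python
import math

def split_70_30(rows):
    """Ordered split: first 70% of rows for training, the rest for testing (assumed scheme)."""
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]

def rmse(observed, simulated):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def mae(observed, simulated):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)

def r2(observed, simulated):
    """Squared Pearson correlation (one common definition of R2; an assumption here)."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    sxy = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((s - ms) ** 2 for s in simulated)
    return sxy ** 2 / (sxx * syy)

# Hypothetical uniformity coefficients (percent), for illustration only.
observed = [80.0, 82.0, 84.0, 86.0]
simulated = [81.0, 81.0, 85.0, 85.0]
print(rmse(observed, simulated), mae(observed, simulated), r2(observed, simulated))
```

Lower RMSE and MAE and a higher R2 indicate a closer fit, which is how the GEP and SVM results above are compared.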

Keywords: Gene Expression Programming, Malekan Plain, Sensitivity Analysis, Support Vector Machine, Performance Evaluation

Improving the Quality of Land Use/Land Cover Maps Using Selected Spectral Indices in Sarab County, East Azerbaijan

Ali Sarabchi, Hossein Rezaei*, Farzin Shahbazi

Journal of Water and Soil, Year 38, Issue 5 (serial no. 97, Azar and Dey 1403), pp. 591-605

Studying the land use/land cover pattern and obtaining accurate, up-to-date information about it is one of the first steps in land management. This study was conducted in an area of 8000 hectares in Sarab County to examine whether land features related to land use/land cover can be separated as fully as possible and mapped accurately. The land use/land cover pattern of the study area was classified from the visible, NIR, and SWIR bands of the OLI sensor using the support vector machine and maximum likelihood algorithms. Then, to improve the quality of the land use/land cover map, a DEM and three groups of spectral indices were examined: vegetation indices (NDVI-SAVI-LAI-EVI1-EVI2), soil indices (BSI-BSI3-MNDSI-NBI-DBSI-NBLI), and integrated indices derived from satellite images (TLIVI-ATLIVI-LST). The selected indices were fed back into the better classification algorithm and the quality of the output maps was evaluated. Comparison of overall classification accuracy and the kappa coefficient showed that, for every band combination used, the support vector machine outperformed maximum likelihood. The indices that contributed most to classification accuracy were then selected, and classification was repeated with the support vector machine only until the map accuracy measures reached their highest values. The results showed that, among the vegetation indices, LAI had the largest effect, raising classification accuracy by 2.64 percent; among the soil indices, BSI and MBI performed best, raising accuracy by 1.95 and 1.64 points, respectively; and among the integrated indices, LST and ALTIVI raised accuracy by 2.75 and 2.35 percentage points, respectively. Finally, classification was carried out with five OLI bands (visible bands + NIR + SWIR1), the selected indices (LAI, BSI, MBI, LST, and ALTIVI), and the support vector machine algorithm; overall accuracy and the kappa coefficient were 85.24% and 0.82, respectively, and the study area was divided into twelve land use/land cover classes. To make use of land use/land cover maps in sustainable land management, it is recommended that they be produced in two stages: first selecting the better algorithm, then adding spectral indices.

Keywords: Maximum Likelihood, Support Vector Machine, Sustainable Land Management

Enhancing the Accuracy of Land Use/Cover Map Using Some Spectral Indices in Sarab County–East Azerbaijan

A. Sarabchi, H. Rezaei *, F. Shahbazi

Journal of water and soil, Volume:38 Issue: 5, 2024, PP 591 -605

Introduction

High-resolution satellite imagery is widely used for Land Use/Land Cover (LULC) mapping. Analyzing LULC patterns and the data derived from land use change serves growing societal demands, improves convenience, and fosters a deeper understanding of the interaction between human activities and environmental factors. Although numerous studies have focused on remote sensing for LULC mapping, there is a pressing need to improve the quality of LULC maps to achieve sustainable land management, especially in light of recent advances. This study was carried out in an area covering approximately 8000 hectares, characterized by diverse conditions in LULC, geomorphology, and pedology. The objective was to investigate the potential for achieving maximum differentiation and accurate mapping of land features related to LULC. Additionally, the study assessed the impact of various spectral indices on the results of classifying Landsat 8 imagery, while also evaluating the efficacy of the support vector machine (SVM) and maximum likelihood algorithms in producing maps with satisfactory accuracy and precision.

Materials and Methods

As an initial step, LULC features were identified through fieldwork, and their geographic coordinates were recorded using GPS. These features included various types of LULC, soil surface characteristics, and landform types. Following the fieldwork, 12 types of LULC units were identified. Subsequently, the LULC pattern in the study area was classified using the RGB+NIR+SWIR1 bands of Landsat 8, employing both SVM and maximum likelihood classifiers. To assess the impact of various spectral indices on the accuracy of the LULC maps, a set of vegetation indices (NDVI, SAVI, LAI, EVI, and EVI2), bare soil indices (BSI, BSI3, MNDSI, NBLI, DBSI, and MBI), and integrated indices (TLIVI, ATLIVI, and LST), together with a digital elevation model of the study area, were successively incorporated into the classification algorithms. Finally, the outcomes from the two classification algorithms were compared, taking into account the influence of the applied indices. The classification process continued with the selected classifier and indices until the maximum overall accuracy and kappa coefficient were reached.
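As an illustration of how the vegetation indices named above are derived from band reflectances, the sketch below computes NDVI and SAVI with their standard formulas. On Landsat 8 OLI the red and NIR inputs correspond to bands 4 and 5. The soil adjustment factor of 0.5 is the commonly used default and is an assumption here, since the abstract does not state the value used; the reflectance values are made up.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor=0.5 is an assumed default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical surface reflectances for one pixel (NIR, red).
pixel_nir, pixel_red = 0.5, 0.1
print(ndvi(pixel_nir, pixel_red), savi(pixel_nir, pixel_red))
```

Each index yields one extra raster layer that can be stacked with the spectral bands before classification.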

Results and Discussion

Field observations revealed that the study area could be categorized into 12 primary LULC units, including irrigated farms, flow farming, dry farming, traditional gardens (with no evident order observed among planted trees), modern gardens (featuring regular rows where soil reflectance is visible between tree rows), grasslands, degraded grasslands, highland pastures (covered by Astragalus spp., dominantly), lowland pastures (covered by halophyte plants), salt domes (with no or very poor vegetation), outwash areas (River channel with many waterways), and resistant areas. The results of image classification indicated that the performance of the SVM algorithm across different band combinations is superior to that of the maximum likelihood method. Using SVM resulted in an increase in overall accuracy and Kappa coefficient by 3-8% and 0.03-0.08, respectively. For the map generated using RGB+NIR+SWIR1 bands and employing SVM, overall accuracy and Kappa coefficient were determined to be 76.6% and 0.72, respectively. Among the vegetation indices used in the SVM algorithm, LAI had the most significant impact, increasing the classification accuracy by 2.64%. Among the soil indices, BSI and MBI indices demonstrated the best performance; with BSI increasing the classification accuracy by 1.95% and MBI by 1.64%. Among the integrated indices, LST and ALTIVI enhanced the classification accuracy by 2.75% and 2.35%, respectively. It should be noted that the inclusion of the digital elevation model did not significantly improve the classification accuracy when using the support vector machine algorithm; in fact, it led to a decrease in accuracy when applied to the maximum likelihood classification. The probable reason for this issue is the different nature of DEM data compared to the other input data, as well as the limitations of parametric statistical approaches to effectively integrating data from diverse sources. 
Finally, the classification process was executed using the three visible bands, NIR, and SWIR1, in conjunction with selected indices (LAI, BSI, MBI, LST, and ALTIVI). Results indicated that using these spectral indices significantly improved classification accuracy, particularly for the DF, DGL, MG, O, and IF land cover/use classes. The calculated accuracies for these classes increased by 11.62%, 18.57%, 20.06%, 29.39%, and 33.19% respectively. Consequently, the accuracy of the classification and the Kappa coefficient (using support vector machine algorithm) increased to 85.24% and 0.82, respectively.
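Overall accuracy and the kappa coefficient, the two measures quoted throughout these results, are both read off a confusion matrix. The sketch below shows the standard calculation in plain Python; the two-class matrix is a made-up example, not data from the study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i assigned to class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix.
example = [[20, 5],
           [10, 15]]
print(overall_accuracy_and_kappa(example))
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy.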

Conclusion

In this research, we aimed to accurately map various land use/land cover types by using Landsat 8 imagery and incorporating three groups of spectral indices. Despite spectral interference and overlap among various phenomena related to LULC, the use of different spectral indices resulted in significant differentiation among LULC classes. Finally, considering the limitations of modelling in ENVI software, it is recommended to investigate the effectiveness of other classification models in more specialized software, such as R.

Keywords: Land Sustainable Management, Maximum Likelihood, Support Vector Machine

Evaluating the Performance of Three Data-Mining Models in Mapping Areas Susceptible to Gully Occurrence (Case Study: Upstream Watershed of Boustan Dam, Golestan Province)

Soraya Yaghobi, Mohsen Hosseinalizadeh*, Chouoghi Bairam Komaki, Ali Najafinejad, Hamidreza Pourghasemi

Journal of Water and Soil Management and Modeling, Year 4, Issue 4 (Winter 1403), pp. 219-238

Gully erosion is one of the most destructive forms of water erosion and causes the loss of large volumes of soil in arid and semi-arid regions. The aim of this study is to assess the susceptibility of the watershed upstream of Boustan Dam, in the northeast of Golestan Province, to gully erosion using object-based techniques and data-mining algorithms. To monitor and identify existing gullies by remote sensing, 2021 QuickBird images were used, with Orfeo software for image segmentation. Then, through field visits, 81 gullies were selected in the area. Finally, in a Python (Colab) environment, after a collinearity analysis of 23 indices that influence gully erosion, modeling was carried out with three models: random forest, maximum entropy, and support vector machine. After the collinearity analysis, seven factors (distance from faults, elevation, NDBI, NDWI, Band3, Band5, and Band7) were removed from the modeling stage because their variance inflation factor exceeded five. Analysis of the influential variables showed that, in the random forest model, rainfall, distance from the river, the HAND index, distance from roads, and valleys are the most important indices. The zoning results obtained with these indices also indicated that, in the random forest model, 8.65 percent of the area is at high or very high risk of erosion, and that this model predicted erosion-prone areas better than the maximum entropy and support vector machine models. Finally, the ROC curve was used to validate the model. AUC values for the random forest model in the training and validation stages were 0.95 and 0.94, indicating the high accuracy of this model in predicting areas highly susceptible to gully erosion. The results of this study, and the effectiveness of the object-based technique in delineating gullies, can help researchers to prevent the concentration of runoff from flood-producing rainfall in areas highly susceptible to gully occurrence by applying conservation and watershed management measures on loess lands.

Keywords: Random Forest, Maximum Entropy, Boustan Dam, HAND Index, Support Vector Machine

Evaluation of the efficiency of three data mining models in zoning areas prone to gully erosion (Case study: Upper Watershed of Boustan Dam)

Soraya Yaghobi, Mohsen Hosseinalizadeh *, Chouoghi Bairam Komaki, Ali Najafinejad, Hamidreza Pourghasemi

Journal of Water and Soil Management and Modeling, Volume:4 Issue: 4, 2024, PP 219 -238

Introduction

Gully erosion is a particularly destructive form of water erosion that can lead to alarming rates of soil loss, especially in the vulnerable landscapes of dry and semi-arid regions. This type of erosion is recognized not only for its immediate impact on land but also as a critical environmental challenge that requires urgent attention. As a result, there has been a growing emphasis on developing effective predictive models that can elucidate the temporal and spatial dynamics of gully erosion, specifically how it forms, expands, and evolves over time. This endeavor has captured the interest of soil conservation experts and researchers alike, who understand the profound implications of this issue. In recent years, remote sensing and data mining techniques have emerged as valuable tools for identifying and mapping areas susceptible to gully erosion. These methods provide essential insights for land managers and policymakers, enabling them to make informed decisions. Furthermore, the effectiveness of predictive models hinges on their advanced capabilities, which enhance their learning potential and improve the identification of relationships among various factors. Creating a sensitivity map is an essential strategy for land use planning, as it actively contributes to reducing land degradation and safeguarding natural resources. Understanding the connection between gully occurrences and influential factors is crucial for sustainable land management and environmental preservation.

Materials and Methods

This research investigates the sensitivity of the upper basin of the Boustan Dam to gully erosion using object-based techniques and data mining algorithms. To achieve this, field visits were conducted to select 81 gullies for analysis. The study examines several factors, including slope, aspect, slope length index (LS), elevation, plan curvature, distance from the river, drainage density, topographic wetness index (TWI), height above the nearest drainage (HAND), average annual rainfall, distance from roads, distance from faults, land use, geomorphology, soil texture, and satellite bands B7, B5, and B3. Additionally, the normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), and normalized difference water index (NDWI) are considered, along with geological aspects. QuickBird satellite images from 2021 and Orfeo software were used to monitor and identify gullies in the area through image segmentation. Initially, a collinearity analysis of 23 indices affecting erosion occurrence was performed, resulting in the removal of distance from the fault, digital elevation model (DEM), NDWI, NDBI, and satellite bands B3, B5, and B7 because their variance inflation factor (VIF) exceeded five. Following this collinearity screening, all remaining indices were integrated with the segmentation map obtained from the Orfeo environment. Finally, three models (Random Forest, Maximum Entropy, and Support Vector Machine) were employed to model the selected indices using Python (Colab).
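The collinearity screening described here can be sketched in plain Python as follows. The sketch uses the identity that each predictor's variance inflation factor equals the matching diagonal entry of the inverse correlation matrix, and drops every predictor whose VIF exceeds the threshold of five in a single pass. Whether the authors screened in one pass or iteratively is not stated, so the one-pass scheme, the function names, and the sample columns are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse of the correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(columns))]

def drop_collinear(named_columns, threshold=5.0):
    """One-pass screen (assumed scheme): keep predictors with VIF <= threshold."""
    names = list(named_columns)
    scores = vif([named_columns[name] for name in names])
    return [name for name, score in zip(names, scores) if score <= threshold]

# Hypothetical predictors: "b" is nearly a copy of "a".
print(vif([[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]))
```

A VIF of 1 means a predictor is uncorrelated with the others; values above the threshold signal that it is largely explained by them.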

Results and Discussion

The results from the object-oriented method in the Orfeo software further demonstrated its effectiveness in accurately identifying gullies. With an impressive accuracy rate of 91.3%, this method has proven to be highly reliable in generating machine learning maps with high precision. Findings indicate that the key factors contributing to gully erosion include the rainfall index, distance from the river, Height Above Nearest Drainage (HAND) index, distance from the road, and valley index. Torrential rain emerged as a significant driver of gully erosion, while the distance from the river was crucial due to the concentration of surface and subsurface flows toward waterways. The HAND index played a prominent role in modeling the sensitivity of the study area compared to other sub-indices derived from DEM, as it exhibited promising applications in assessing natural hazards. Locations close to roads were found to be more vulnerable to water erosion, and valleys were identified as especially susceptible to gully erosion due to their conducive conditions for rapid water flow and erosion. Extensive field studies support this observation. Furthermore, zoning results generated using these indices indicated that, within the random forest model, 544.23 hectares of the area are at high or very high risk of erosion. This model outperformed the Maximum Entropy and Support Vector Machine models in predicting erosion-prone areas. Finally, the ROC curve was utilized to validate the model, yielding AUC values of 0.95 and 0.94 in the random forest model during the training and validation stages, respectively. These results indicate the model's high accuracy in predicting areas highly susceptible to gully erosion.
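The AUC values quoted above summarize the ROC curve in a single number. A minimal way to compute it, shown below, uses the equivalent pairwise form: AUC is the share of (gully, non-gully) pairs in which the gully location receives the higher predicted score, with ties counted as half. The labels and scores are illustrative only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = gully present) and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.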

Conclusion

This study effectively used object-based image analysis algorithms and data mining techniques to create a sensitivity map of the region. The object-based method efficiently identified the local gullies using the mean shift algorithm, while the random forest algorithm excelled in predicting areas prone to gully erosion. Key factors contributing to gully erosion were identified, including rainfall, distance from the river, the HAND index, and distance from roads and valleys. The findings from this study provide valuable insights for managing and preserving basin resources. Implementing the recommendations from this research could help mitigate the impacts of gully erosion in the future and ensure the sustainability of the Boustan Dam and its surrounding ecosystem.

Keywords: Boustan Dam, HAND Index, Maximum Entropy, Random Forest Model, Support Vector Machine

Evaluating the Accuracy of the Latin Hypercube Method for Selecting Sampling Locations for Digital Mapping of Soil Properties

Zohreh Mosleh Ghahfarokhi*, Abolfazl Azadi

Journal of Water and Soil, Year 38, Issue 3 (serial no. 95, Mordad and Shahrivar 1403), pp. 367-382

Because the accuracy of all soil survey information depends on the best possible estimate of where soils change, expressed as a sampling design, it is very important to choose an efficient method that captures these changes as well as possible. To date, few studies have examined how the randomness of sample selection in the Latin hypercube method affects map accuracy. This study was conducted to evaluate the accuracy of the Latin hypercube method in selecting sampling locations for digital soil mapping in part of Borujen County, Chaharmahal and Bakhtiari Province. Since repeating field sampling many times to evaluate a soil sampling method is unreasonable, this study used simulation based on highly accurate maps instead. The Bhattacharyya distance was used to quantify the distance between the probability distribution of the original population and that of the different runs of the Latin hypercube method. Maps of soil properties (percentage of calcium carbonate equivalent, clay, and organic carbon) for the surface layer (0 to 30 cm) were produced with the support vector machine method and validated. In addition, sampling locations were selected with the Latin hypercube method at a density of 200 points over 500 runs. At each step, predictions of the soil properties were validated using R2, RMSE, and RMSE%. The results showed that, for all properties examined, the support vector machine model has acceptable accuracy (RMSE% below 40). The results also show that the differing outputs of the Latin hypercube method across runs affect modeling accuracy: the model RMSE for calcium carbonate equivalent, clay, and organic carbon ranges from 1.1, 1.1, and 0.02 to 3.2, 2, and 0.12, respectively. This effect, however, also depends on the property examined and on how much it varies in the study area.

Keywords: Bhattacharyya Distance, Support Vector Machine, Sampling Position, Digital Mapping

Evaluating the Precision of the Conditioned Latin Hypercube Sampling Method for Selecting Soil Samples to Generate Digital Maps of Soil Properties

Z. Mosleh Ghahfarokhi *, A. Azadi

Journal of water and soil, Volume:38 Issue: 3, 2024, PP 367 -382

Introduction

Soil properties play a crucial role as they determine the soil's suitability for different types of plant growth, ecosystems, and biota functioning. They have a significant impact on nutrient cycling, carbon sequestration, and soil management. Digital Soil Mapping (DSM) is a process aimed at delineating soil properties. Soil sampling for DSM serves as a fundamental step in improving prediction accuracy and is crucial for incorporating variability in terms of environmental covariates. Conditioned Latin Hypercube (CLH) sampling is a technique utilized to generate a sample of points from a multivariate distribution conditioned on one or more covariates. Numerous researchers (Ramirez-Lopez et al., 2014; Adhikari et al., 2017; Zhang et al., 2022) have endorsed this approach in their studies, following its inception by Minasny and McBratney in 2006. However, there has been limited research to date on the impact of the Latin hypercube method's random sample selection process on the accuracy of resulting maps. Hence, the central question remains: Is the Latin hypercube sampling method, which is currently widely adopted, always a dependable approach in this field?

Materials and Methods

The study area covers longitudes 50°35'47'' to 51°29'' east and latitudes 31°36'31'' to 32°15'48'' north in Borujen city, Chaharmahal and Bakhtiari Province. The region, with an average elevation of 2338 meters above sea level, receives an annual rainfall of 250 millimeters and has an average temperature of 11.5 degrees Celsius. Legacy data from earlier soil studies were used, consisting of 250 samples distributed across the study area. The properties studied were the percentages of calcium carbonate equivalent, clay, and soil organic carbon at a depth of 0 to 30 cm. Terrain variables were extracted from the ALOS PALSAR digital elevation model with a spatial resolution of 12.5 meters. In the initial stage, digital maps of calcium carbonate equivalent, clay, and soil organic carbon were generated using the support vector machine method. Modeling continued until a highly accurate model was achieved, with the root mean square error percentage (RMSE%) below 40. The Latin hypercube approach was used for sample design, with 500 repetitions. After sampling points were selected for each run using the Latin hypercube method, the points were overlaid on the detailed map and the corresponding property values were retrieved. The final map was created from the extracted points. Subsequently, the Latin hypercube approach was employed to generate soil property maps for each selected dataset. Validation was conducted using the coefficient of determination (R2), root mean square error (RMSE), and percentage root mean square error (RMSE%) over multiple iterations to ensure the accuracy of the generated maps.
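To show why repeated runs give different sampling designs, the sketch below implements plain Latin hypercube sampling on the unit cube. It is not the conditioned variant (cLHS) used in digital soil mapping, which additionally selects real map locations so that their covariate values fill the strata; the function name and the `seed` parameter are illustrative assumptions.

```python
import random

def latin_hypercube(n_samples, n_dims, seed=None):
    """Plain Latin hypercube sample on [0, 1)^n_dims.

    Each dimension is cut into n_samples equal strata and every stratum
    receives exactly one point; strata are paired at random across dimensions.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum, then shuffle the stratum order.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns.append(points)
    # Transpose so that each row is one sample.
    return [list(row) for row in zip(*columns)]

# Two runs with different seeds fill the same strata with different points.
run_a = latin_hypercube(8, 2, seed=1)
run_b = latin_hypercube(8, 2, seed=2)
print(run_a[0], run_b[0])
```

Every run satisfies the stratification constraint, yet the selected points differ, which is the run-to-run variability the study evaluates over 500 repetitions.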

Results and Discussion

The results clearly illustrate that the selected sampling positions vary with each implementation of the Latin hypercube method. It is important to note that different implementations may overlap to some extent. Consequently, the primary question arises: Is a one-time execution of the Latin hypercube sufficient for selecting study points? The findings indicate that the support vector machine model achieves satisfactory accuracy for all the examined characteristics. In the studied area, environmental factors such as slope and elevation were identified as significant predictors of the percentage of calcium carbonate equivalent.

Conclusion

In the present study, the accuracy of the Latin hypercube method was assessed for selecting sampling locations for digital soil mapping in Chaharmahal and Bakhtiari Province. Given the impracticality of collecting numerous field samples to evaluate the soil sampling method, this research employed simulation methods based on highly accurate maps for this purpose. The results indicate that the different outputs of the Latin hypercube method influence the accuracy of modeling, although this effect also depends on the specific property under investigation and the extent of its variability within the study area. Considering that the Latin hypercube method is based on the principle that samples are randomly selected in each class of environmental parameters, it is suggested that future studies using this method account for this principle. Adequate consideration should be given to it, and the selection of sampling locations should rely on multiple implementations of the Bhattacharyya distance method to ensure robustness and reliability.
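The Bhattacharyya distance mentioned here compares two probability distributions. A minimal sketch for discrete distributions is given below, assuming both are supplied as histogram counts over the same bins; how the authors binned the soil property values is not stated, so the example counts are hypothetical.

```python
import math

def bhattacharyya_distance(p_counts, q_counts):
    """Bhattacharyya distance between two histograms defined on the same bins.

    Counts are normalized to probabilities; the distance is -ln of the
    Bhattacharyya coefficient sum_i sqrt(p_i * q_i).
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    coefficient = sum(
        math.sqrt((p / p_total) * (q / q_total))
        for p, q in zip(p_counts, q_counts)
    )
    return -math.log(coefficient)

# Hypothetical histograms: full population versus one sampling run.
population = [10, 20, 30, 40]
one_run = [12, 18, 33, 37]
print(bhattacharyya_distance(population, one_run))
```

A distance near zero means the sampled points reproduce the population distribution closely, so runs can be ranked by this value.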

Keywords: Bhattacharyya Distance, Digital Soil Mapping, Sampling Position, Support Vector Machine

Evaluating the Performance of SVM, LS-SVM, and SVM-GOA Models in Simulating Peak Flood Discharge at the Poldokhtar Station

Fatemeh Tavakoli, Hamed Nozari*, Safar Marofi

Iranian Journal of Soil and Water Research, Year 55, Issue 4 (serial no. 100, Tir 1403), pp. 537-552

Flood modeling or simulation is one of the basic strategies for managing and reducing the destructive effects of floods, and identifying effective models for this purpose is one of the most important elements of watershed management. This study evaluates the accuracy of the classic support vector machine (SVM), the support vector machine combined with the grasshopper algorithm (GOA-SVM), and the least squares support vector machine (LS-SVM) in simulating peak flood discharge at the Poldokhtar station in the Karkheh basin. To this end, records of 74 flood events between 1388 and 1395 (2009 to 2016) at the Poldokhtar station, together with daily rainfall from 13 rain gauge stations in the catchment upstream of the station, were used. Of these, 52 events were selected for training and 22 for validating the models. Results were compared using four statistical indices, the coefficient of determination (R^2), root mean square error (RMSE), standard error (SE), and Nash coefficient (NS), along with an uncertainty analysis based on two indices, average relative interval length (ARIL) and percentage of coverage (POC). The results indicate the relative superiority of the LS-SVM model, with SE=0.407, RMSE=110.16, NS=0.91, and R2=0.92, over the SVM model, with SE=0.5, RMSE=137.70, NS=0.87, and R2=0.88, and the SVM-GOA model, with SE=0.519, RMSE=144.53, NS=0.83, and R2=0.9. The average run time of the LS-SVM model is on the order of a few seconds, whereas that of the SVM-GOA model is on the order of a few hours. Manually tuning the parameters of the classic SVM model is also time-consuming. Because LS-SVM has fewer tunable parameters than the SVM and SVM-GOA models, it is easier to run. LS-SVM can therefore be preferred over the other two models with confidence and by a considerable margin.

Keywords: Grasshopper Algorithm, Karkheh Basin, Poldokhtar, Flood Modeling, Support Vector Machine

Evaluating the efficiency of SVM, LS-SVM and SVM-GOA models in simulating the Flood peak discharge at the Poldokhtar station

Fatemeh Tavakoli, Hamed Nozari *, Safar Marofi

Iranian Journal of Soil and Water Research, Volume:55 Issue: 4, 2024, PP 537 -552

Flood modeling or simulation is a fundamental tool for controlling and minimizing the damaging impacts of floods, and identifying effective models for this purpose is crucial in watershed management. This study evaluates the accuracy of the classic support vector machine (SVM), the support vector machine combined with the grasshopper algorithm (SVM-GOA), and the least squares support vector machine (LS-SVM) in simulating the flood peak discharge at the Poldokhtar station in the Karkheh basin. For this study, 74 flood events from 2009 to 2016 at the Poldokhtar station and data from 13 daily rainfall stations in the upstream area for the same period were used. Of these, 52 events were allocated for training and 22 for validation. The results were compared using four statistical indicators: coefficient of determination (R2), root mean square error (RMSE), Nash efficiency (NS), and standard error (SE). Additionally, uncertainty analysis was performed using two indices: ARIL and POC. The results indicate the relative superiority of the LS-SVM model, with SE=0.407, RMSE=110.16, NS=0.91, and R2=0.92, over the SVM model, with SE=0.5, RMSE=137.70, NS=0.87, and R2=0.88, and the SVM-GOA model, with SE=0.519, RMSE=144.53, NS=0.83, and R2=0.9. The study's overall conclusion is that the LS-SVM model is more accurate, faster, and easier to implement than the SVM and SVM-GOA models, and can therefore be confidently preferred over them. The research emphasizes the importance of precise flood modeling and simulation in watershed management for mitigating the destructive impact of floods.
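Of the indicators listed, the Nash efficiency is the one most specific to hydrology. The sketch below shows its standard (Nash-Sutcliffe) form, assuming that is the definition behind the NS values in this abstract; the discharge values are invented for illustration.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum

# Hypothetical peak discharges (m^3/s) for four flood events.
observed_peaks = [100.0, 300.0, 500.0, 700.0]
simulated_peaks = [120.0, 280.0, 520.0, 680.0]
print(nash_sutcliffe(observed_peaks, simulated_peaks))
```

A value of 1 indicates a perfect match, while 0 means the model is no better than always predicting the mean observed discharge.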

Keywords: Flood Modeling, Support Vector Machine, Grasshopper Algorithm, Karkheh Basin, Poldokhtar Station

Estimating Air Concentration in Chute Spillways Using Metamodel Methods

Kiyoumars Roushangar*, Reza Saadatjoo, Hamidreza Abbaszadeh, Aydin Panahi

Iranian Journal of Soil and Water Research, Year 55, Issue 4 (serial no. 100, Tir 1403), pp. 601-613

One way to prevent negative pressure and cavitation in spillways is to aerate the flow passing over them. Understanding how air concentration varies along the spillway is very important for estimating the amount of aeration. This study examined the use of two metamodel methods, Gaussian process regression (GPR) and support vector machine (SVM), for predicting air concentration. To this end, a set of 2268 laboratory data points obtained from hydraulic models of chute spillways was used in the modeling. Various input models were defined from different combinations of the measured parameters. The results show that both methods are highly capable of estimating the required air concentration over the spillway. For the case in which artificial aeration is provided by an aerator, flow discharge (QW), the ratio of longitudinal distance from the end of the deflector to channel width (L/W), and the ratio of depth (perpendicular to the spillway) to channel width (Y/W) strongly affected the estimate of air concentration in the chute spillway. For this case, the correlation coefficient (R), determination coefficient (DC), and root mean square error were 0.9214, 0.8451, and 0.1008 for GPR and 0.9333, 0.8662, and 0.0937 for SVM, respectively. For the case without artificial aeration by an aerator, the model with input parameters Qw, L/W, Y/W, and ΔP (the difference between atmospheric pressure and the pressure under the jet) was selected as the best model, with R=0.9222, DC=0.8644, and RMSE=0.0914 for GPR and 0.87, 0.7543, and 0.123, respectively, for SVM.

Keywords: Gaussian Process Regression, Chute Spillway, Support Vector Machine, Aeration

Estimation of air concentration in chute spillway using metamodel methods

Kiyoumars Roushangar *, Reza Saadatjoo, Hamidreza Abbaszadeh, Aydin Panahi

Iranian Journal of Soil and Water Research, Volume:55 Issue: 4, 2024, PP 601 -613

One way to prevent negative pressure and cavitation in spillways is to introduce air into the flow passing over them. Understanding how air concentration varies along the spillway is important for estimating the aeration level. This study explores the application of Gaussian process regression (GPR) and support vector machine (SVM) models in predicting air concentration. To this end, a dataset of 2268 laboratory measurements obtained from hydraulic models of chute spillways was used in the modeling process. Various input models were defined based on different combinations of measured parameters. The results demonstrate the high capability of both methods in estimating air concentration over the spillway. Under artificial aeration by an aerator, flow discharge (QW), the ratio of longitudinal distance from the end of the deflector to channel width (L/W), and the ratio of depth (perpendicular to the spillway) to channel width (Y/W) strongly influenced the predicted air concentration. For this case, the correlation coefficient (R), determination coefficient (DC), and RMSE were 0.9214, 0.8451, and 0.1008, respectively, for GPR, and 0.9333, 0.8662, and 0.0937 for SVM. For scenarios without artificial aeration, the model with input parameters QW, L/W, Y/W, and ΔP (the difference between atmospheric pressure and the pressure under the jet) was selected as the superior model, with R=0.9222, DC=0.8644, and RMSE=0.0914 for GPR and values of 0.87, 0.7543, and 0.123, respectively, for SVM.

Keywords: Aeration, Chute Spillway, Gaussian Process Regression, Support Vector Machine

برآورد نیاز آبی و تبخیرتعرق واقعی با استفاده از تصاویر ماهواره ای به منظور بهبود تحویل حجمی آب در شبکه های آبیاری و زهکشی (مطالعه موردی: شبکه آبیاری و زهکشی مهاباد، استان آذربایجان غربی)

امیر نورجو*، فرید فیض الله پور

نشریه تحقیقات مهندسی سازه های آبیاری و زهکشی، سال بیست و چهارم شماره 93 (زمستان 1402)، صص 23 -42

با توجه به محدودیت کمی و کیفی آب، مدیریت و تحویل حجمی آب در شبکه های آبیاری و زهکشی امری مهم محسوب می شود. برای دستیابی به این هدف، الگوی کشت شبکه آبیاری و زهکشی مهاباد با استفاده از تصاویر ماهواره ای سنتینل 2 و روش طبقه بندی ماشین بردار پشتیبان برای سال زراعی 98-97 استخراج گردید. همچنین، با استفاده از داده های هواشناسی ایستگاه مهاباد و معادله پنمن مانتیث، حجم خالص آب مورد نیاز گیاهان غالب در محل نقاط تحویل حجمی محاسبه گردید. برای تعیین میزان تبخیر-تعرق واقعی، از تصاویر ماهواره ای لندست 8 و الگوریتم سبال استفاده شد و در نهایت نقشه های مکانی تبخیر-تعرق واقعی و نیاز خالص آبیاری برای شبکه استخراج گردید. بر اساس نتایج حاصل، 64 درصد از اراضی کشت شده (6786 هکتار) شبکه مهاباد به صورت باغی و 36 درصد از اراضی (3808 هکتار) به صورت زراعی به دست آمد. بدین ترتیب، نیاز خالص آبیاری (تبخیر و تعرق محاسباتی با کسر بارش موثر)برابر با 71 میلیون مترمکعب و نیاز ناخالص آبیاری با لحاظ راندمان آبیاری 44 درصد، برابر با 36/161 میلیون مترمکعب محاسبه گردید. همچنین، کل میزان تبخیر-تعرق حاصل از الگوریتم سبال برابر با 78/79 میلیون متر محاسبه گردید. بر اساس نقشه های کاربری اراضی، نیاز خالص آبیاری و تبخیر-تعرق واقعی، نحوه برداشت آب در شبکه مورد بررسی قرار گرفته و مشاهده شد که در اراضی بالادست شبکه و مجاور رودخانه مهاباد، نیاز آبی گیاهان برطرف شده ولی مناطق پایین دست شبکه، به علت عدم دسترسی به آب کافی، دچار تنش آبیاری شده اند.

کلید واژگان: الگوی کشت، چرخه فنولوژی گیاهی، سبال، کم آبیاری، ماشین بردار پشتیبان

Estimation of crop water requirement and actual evapotranspiration using satellite images to improve volumetric water delivery in irrigation and drainage networks (case study: Mahabad irrigation and drainage network, West Azerbaijan province)

Amir Nourjou *, Farid Feizolahpour

Irrigation and Drainage Structures Engineering Research, Volume:24 Issue: 93, 2024, PP 23 -42

Because Iran lies in an arid and semi-arid region and its water resources are limited in both quantity and quality, optimal management and volumetric delivery of water are important in irrigation and drainage networks. This requires estimating crop water requirements accurately and supplying adequate water to farmers. Remote sensing makes it possible to obtain different layers of information quickly and at low cost. Accordingly, many researchers have used remote sensing data to monitor vegetation cover, produce land use maps, and estimate crop evapotranspiration, and have found the technology appropriate for such studies. Previous work shows that few studies have examined crop evapotranspiration together with crop water requirement. The main objectives of this study are therefore to produce cropping pattern and land use maps using Sentinel 2 satellite images, to determine the water requirement at the delivery points of the irrigation network, to determine the actual evapotranspiration of the crop cover using the SEBAL algorithm and Landsat 8 images, and finally to evaluate water supply and management in the Mahabad irrigation and drainage network. To determine the cropping pattern of the network, Sentinel 2 images for the 2018-2019 crop year were used. The images were screened for coverage of the study region and percentage of cloud cover, and the selected images were pre-processed with radiometric and atmospheric corrections. The NDVI index was then calculated from the selected images. After the classification classes were defined, the phenological cycle of the crops in each class was examined and the spectral pattern of the crops during the growing season was determined.
Training samples for supervised classification were selected using existing maps, Google Earth images, false color composites, and the crop growth patterns, and some samples were set aside for validating the classified map. The cropping pattern map was then obtained using the SVM classification algorithm. After the crop classification map was generated, the water requirement of the different classes was determined with the Penman-Monteith evapotranspiration method, applying crop coefficients and irrigation application efficiency at the volumetric water delivery points. Finally, the actual evapotranspiration of the study area was calculated with the SEBAL algorithm and compared with the net water requirement map. The kappa coefficient and overall accuracy of the classified map were 0.953 and 91%, respectively. The planted agricultural area was 10594 hectares, and 1576 hectares of farmland were unplanted. Orchards covered 6786 hectares, and sugar beet, wheat, alfalfa, and corn covered 998, 1839, 693, and 278 hectares, respectively. The net irrigation water requirement was 71 million cubic meters, and the gross irrigation water requirement, at an irrigation efficiency of 44%, was 161.36 million cubic meters. The SEBAL evapotranspiration maps for the growing season gave a total evapotranspiration of 79.78 million cubic meters, which was 14% higher than the net irrigation water requirement. Finally, water consumption in the Mahabad irrigation and drainage network was evaluated from the crop classification map and a comparison of the net irrigation water requirement and evapotranspiration maps.
In the upstream farms of the network and those close to the Mahabad River, water consumption exceeded the net water requirement, while downstream areas faced deficit irrigation because of insufficient water. The results show that satellite images and remote sensing make it possible to monitor and evaluate the condition of agricultural farms on a large scale with acceptable accuracy. Water supply management and water use efficiency in irrigation and drainage networks can also be improved by producing up-to-date land use maps, determining net and gross irrigation water requirements, and comparing them with actual evapotranspiration maps.
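The relation between the net and gross irrigation requirements quoted in the abstract can be checked with a short sketch. It assumes the gross requirement is simply the net requirement divided by the irrigation efficiency; the function name is hypothetical.

```python
def gross_irrigation_requirement(net_requirement, efficiency):
    """Gross requirement = net requirement / efficiency (efficiency as a fraction)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return net_requirement / efficiency

net_mcm = 71.0      # net irrigation requirement, million cubic meters (from the abstract)
efficiency = 0.44   # irrigation efficiency of 44% (from the abstract)

gross_mcm = gross_irrigation_requirement(net_mcm, efficiency)
print(round(gross_mcm, 2))  # 161.36
```

Under this assumption the result matches the 161.36 million cubic meters reported above.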

Keywords: Crop Pattern, Plant Phenology Cycle, SEBAL, Deficit Irrigation, Support Vector Machine

تحلیل عدم قطعیت مدل های شبکه عصبی مصنوعی (ANN) و ماشین بردار پشتیبان (SVM) در پیش بینی جریان ماهانه رودخانه (مطالعه موردی: رودخانه قزل اوزن)

مجید محمدی، پویا اللهویردی پور*

نشریه مدل سازی و مدیریت آب و خاک، سال چهارم شماره 2 (تابستان 1403)، صص 311 -326

در یک دهه اخیر، روش های هوش مصنوعی بیش ترین کاربرد را در شبیه سازی فرآیندهای مختلف از جمله فرآیندهای هیدرولوژیکی داشته اند، اما نتایج این روش ها همواره با عدم قطعیت همراه بوده اند. یکی از راه حل هایی که می تواند تا حدودی این مشکل را حل نماید، تحلیل عدم قطعیت پیش بینی های صورت گرفته است. در مطالعه حاضر عدم قطعیت نتایج مدل های شبکه عصبی مصنوعی (ANN) و ماشین بردار پشتیبان (SVM) در پیش بینی جریان ماهانه رودخانه با استفاده از شبیه سازی مونت-کارلو و مقادیر 95PPU و d-factor مورد ارزیابی قرار گرفته است. در این پژوهش از داده ها و سری زمانی جریان ماهانه رودخانه قزل اوزن در یک دوره 39 ساله از سال 1355 تا 1393 برای ایستگاه آب سنجی بیانلو-یساول استفاده شده است که 75 درصد داده ها برای آموزش و 25 درصد برای آزمون مدل ها به کار رفته است. در این مدل ها به منظور تخمین جریان رودخانه، شش ترکیب مختلف ورودی شامل جریان یک، دو و سه ماه قبل و شماره ماه های جریان مورد استفاده قرار گرفت. برای ارزیابی مدل ها از معیارهای آماری ضریب همبستگی (R) و ریشه میانگین مربعات خطا (RMSE) استفاده شد. نتایج نشان داد که اگر چه مدل ANN با مقادیر R مساوی با 757/0 و RMSE مساوی با 45/9 دارای عملکرد خوبی نسبت به مدل SVM با مقادیر R مساوی با 729/0 و RMSE مساوی با 946/10 در پیش بینی جریان رودخانه است. اما نتایج این مدل با عدم قطعیت زیادی همراه است. مقایسه تحلیل عدم قطعیت نتایج مدل ها نشان داد که مدل SVM با مقادیر d-factor و 95PPU به ترتیب برابر با 155/0 و 241/17 نسبت به مدل ANN با مقادیر d-factor و 95PPU به ترتیب برابر با 993/0 و 470/85 از عدم قطعیت کم تری برخوردار است و از این لحاظ بر مدل ANN برتری دارد. مطابق نتایج این پژوهش باید با در نظر گرفتن این نکته که مدل های پیشرفته هوش مصنوعی نیز دارای عدم قطعیت هستند، نسبت به کاربرد این روش ها در زمینه های مدیریت ریسک و برنامه ریزی های آینده اقدام کرد تا بهترین عملکرد را به دست آورد.

کلید واژگان: پیش بینی جریان، رودخانه قزل اوزن، شبکه عصبی مصنوعی، عدم قطعیت، ماشین بردار پشتیبان

Uncertainty analysis of artificial neural network (ANN) and support vector machine (SVM) models in predicting monthly river flow (Case study: Ghezelozan River)

Majid Mohammadi, Pouya Allahverdipour *

Journal of Water and Soil Management and Modeling, Volume:4 Issue: 2, 2024, PP 311 -326

Introduction

River flow forecasting has been one of the important challenges of water resources management in recent decades, so many researchers have proposed different methods to improve the performance of forecasting models. In the last decade, artificial intelligence methods have been most widely used in the simulation of various processes, including hydrological processes, due to their flexibility and high accuracy in modeling. However, the results of these methods have always been associated with uncertainty due to several factors such as structure, algorithm, input data, and the type of method chosen for data calibration. One of the methods that can somewhat solve this problem is the uncertainty analysis of the predictions made by these models.

Materials and Methods

In this study, the uncertainty of the results of artificial neural network (ANN) and support vector machine (SVM) models in predicting monthly river flow was evaluated. The monthly flow time series of the Ghezelozan River at the Bianlu-Yasaul stream gauging station over 39 years, from 1976 to 2014, was used, with 75% of the data for training and 25% for testing the models. To estimate the monthly flow, six different input combinations were used, built from the flow of one, two, and three months earlier and the month number of the flow. The accuracy and performance of the models were then compared using the correlation coefficient (R) and root mean square error (RMSE). Finally, the uncertainty of the ANN and SVM results was evaluated by the Monte-Carlo method using d-factor and 95PPU values.
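The input combinations described above can be illustrated with a small sketch that builds lagged-flow features plus the month number. The function name, the starting month, and the flow values are hypothetical; the study's actual preprocessing is not given in the abstract.

```python
def build_lagged_samples(flows, start_month=1, lags=(1, 2, 3)):
    """Build (inputs, targets) where each input row holds the flows at the
    given lags followed by the calendar month (1-12) of the target value."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(flows)):
        month = (start_month - 1 + t) % 12 + 1
        inputs.append([flows[t - lag] for lag in lags] + [month])
        targets.append(flows[t])
    return inputs, targets

# Hypothetical monthly flows starting in month 10 (not the Ghezelozan record).
flows = [10.0, 12.0, 15.0, 30.0, 55.0, 40.0]
X, y = build_lagged_samples(flows, start_month=10)

for row, target in zip(X, y):
    print(row, "->", target)
```

Passing a shorter `lags` tuple, such as `(1, 2)`, would give one of the smaller input combinations.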

Results and Discussion

For the ANN model, the best performance was obtained with the combination in which the flows of the previous two months and the month number are the inputs, giving R and RMSE values of 0.757 and 9.45, respectively. For the SVM model, the best performance was obtained with the combination in which the flows of one, two, and three months earlier and the month number are the inputs, giving R and RMSE values of 0.729 and 10.946, respectively. The uncertainty analysis showed that the ANN model has more uncertainty in its output values than the SVM model, with a larger d-factor in both the training and test phases. The SVM model, with d-factor and 95PPU values of 0.155 and 17.241, respectively, has less uncertainty in its output values than the ANN model, with d-factor and 95PPU values of 0.993 and 85.470. Accordingly, the share of observed data falling within the 95% confidence range (95PPU) is considerably larger for the ANN model than for the SVM model in both the training and testing phases. The results also showed that both models have more uncertainty in months with a large volume of water, which may be due to the complexity of the process and the involvement of uncertain factors in these months, as well as the effect of factors not considered in the structure of the predictive models.
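The two uncertainty indices can be sketched as follows. This assumes one commonly used definition: the 95PPU band runs between the 2.5th and 97.5th percentiles of the Monte-Carlo predictions at each time step, the bracketed share is the percentage of observations inside that band, and the d-factor is the mean band width divided by the standard deviation of the observations. The abstract does not state the exact formulas used, and all names and data below are hypothetical.

```python
import statistics

def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is a fraction in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def uncertainty_indices(observed, ensemble):
    """ensemble[i] holds the Monte-Carlo predictions for time step i.
    Returns (percent of observations inside the band, d-factor)."""
    widths, inside = [], 0
    for obs, sims in zip(observed, ensemble):
        ordered = sorted(sims)
        lower = percentile(ordered, 0.025)
        upper = percentile(ordered, 0.975)
        widths.append(upper - lower)
        if lower <= obs <= upper:
            inside += 1
    bracketed = 100.0 * inside / len(observed)
    d_factor = statistics.mean(widths) / statistics.stdev(observed)
    return bracketed, d_factor

# Hypothetical data: 21 simulated values spread evenly around each center.
observed = [10.0, 20.0, 30.0, 40.0]
centers = [10.0, 20.0, 30.0, 60.0]   # the last ensemble misses its observation
ensemble = [[c + k for k in range(-10, 11)] for c in centers]

bracketed, d_factor = uncertainty_indices(observed, ensemble)
print(bracketed)   # 75.0
print(d_factor)
```

A narrow band (small d-factor) that still brackets most observations is the desirable outcome, so the two indices are read together.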

Conclusion

The results of the ANN and SVM models in predicting the monthly flow of the Ghezelozan River showed that although the ANN model, with R equal to 0.757 and RMSE equal to 9.45, performs well compared to the SVM model, with R equal to 0.729 and RMSE equal to 10.946, its results are associated with high uncertainty. The Monte-Carlo uncertainty analysis showed that the SVM model, with d-factor and 95PPU values of 0.155 and 17.241, respectively, has less uncertainty in predicting the monthly flow than the ANN model, with d-factor and 95PPU values of 0.993 and 85.470, and is superior to the ANN model in this respect. According to these results, advanced artificial intelligence models also carry uncertainty, and this should be taken into account when applying them in risk management and future planning in order to obtain the best performance.

Keywords: Artificial Neural Network, Flow Prediction, Ghezelozan River, Support Vector Machine, Uncertainty

پهنه بندی حساسیت وقوع زمین لغزش با استفاده از الگوریتم های یادگیری ماشین (منطقه مورد مطالعه: بخشی از حوزه آبخیز هراز)

علیرضا سپه وند*، نسرین بیرانوند

نشریه مدل سازی و مدیریت آب و خاک، سال چهارم شماره 2 (تابستان 1403)، صص 261 -278

زمین لغزش یکی از انواع پدیده های زمین شناسی در سراسر جهان است که هر ساله تلفات جانی و خسارات اقتصادی زیادی را به همراه دارد. بنابراین، این پژوهش به منظور ارزیابی پهنه بندی حساسیت وقوع زمین لغزش با استفاده از الگوریتم های مختلف یادگیری ماشین از نوع ماشین پشتیبان بردار (SVM) و رگرسیون فرآیند گاوسی (SVM) با دو کرنل (PUK و RBF) و جنگل تصادفی (RF) در بخشی از حوزه آبخیز هراز، ایران انجام شده است. در پژوهش حاضر از نه عامل شیب، جهت، ارتفاع، زمین شناسی، کاربری اراضی، فاصله از گسل، فاصله از جاده، فاصله از رودخانه و بارش به عنوان پارامترهای ورودی و نقاط لغزشی و غیرلغزشی به عنوان پارامتر خروجی برای مدل سازی و پهنه بندی حساسیت وقوع زمین لغزش استفاده شد. از مجموع 148 نقاط لغزشی و غیرلغزشی، 70 درصد برای مرحله آموزش و 30 درصد برای مرحله آزمایش مدل سازی استفاده شد. برای ارزیابی کارایی مدل ها و انتخاب مدل بهینه از معیارهای سنجش خطای مدل Accuracy، F1-score و AUC و برای تحلیل حساسیت از روش حذفی استفاده شد. نتایج به دست آمده نشان داد که مدل RF (با 9/0Accuracy =، 957/0F1-score= و 999/0AUC=) در بخش آزمایش در مقایسه با دیگر مدل ها به عنوان بهترین مدل برای پهنه بندی حساسیت وقوع زمین لغزش انتخاب شد. بر اساس نتایج نقشه پهنه بندی مشخص شد که به ترتیب 86/31، 16/32، 38/13، 73/9 و 84/12 درصد در طبقات با حساسیت خیلی کم، کم، متوسط، زیاد و خلیی زیاد قرار دارد. علاوه براین نتایج تحلیل حساسیت مدل نشان داد که جهت شیب، حساس ترین پارامتر در پهنه بندی خطر وقوع زمین لغزش است. مقایسه نتایج مدل ها نشان داد که ارتباط معناداری بین مقادیر پیش بینی شده و مقادیر مشاهداتی با استفاده از مدل های استفاده شده وجود ندارد. بر اساس نتایج به دست آمده از نقشه پهنه بندی حساسیت وقوع زمین لغزش می توان به اولویت بندی و مدیریت مناطق پایدار و با حساسیت کم به وقوع حرکت های توده ای برای اجرای عملیات عمرانی پرداخت.

کلید واژگان: حوزه آبخیز هراز، رگرسیون فرآیند گاوسی، زمین لغزش، شاخص حساسیت زمین لغزش، ماشین بردار پشتیبان، مدل جنگل تصادفی

Landslide susceptibility mapping using various soft computing techniques (Case study: A part of Haraz Watershed)

Alireza Sepahvand *, Nasrin Beiranvand

Journal of Water and Soil Management and Modeling, Volume:4 Issue: 2, 2024, PP 261 -278

Introduction

A landslide is one of the mass movements that occur on the surface of the earth. Landslides have caused notable injury and loss of human life and have destroyed infrastructure and property. They represented approximately nine percent of natural disasters worldwide during the 1990s, and studies expect this trend to continue because of increased human development. Many studies have been carried out to determine the factors affecting mass movement. In large parts of Iran, including the mountainous areas, high tectonic and seismic activity together with diverse geological and climatic conditions make much of the country prone to landslides. Landslides cause widespread damage to natural resources, human settlements, and infrastructure, contribute to mud floods and the filling of reservoirs, and occasionally result in loss of life. The social and environmental impacts of this phenomenon, such as migration and unemployment, should also not be ignored. One strategy for reducing losses from mass movements is to identify and manage unstable slope areas, and landslide hazard mapping is used to identify such areas. The main purpose of this research is to assess the parameters affecting landslide occurrence and to compare different machine learning models, including SVM, GP regression, and RF, for landslide susceptibility zoning.

Materials and Methods

The study area is a part of the Haraz Watershed, Mazandaran Province, Iran, where many landslides occur after each heavy rain. It was therefore selected as a suitable watershed for evaluating landslide susceptibility mapping (LSM). The land cover consists mainly of rangeland, and the geology consists mainly of Quaternary and Shemshak formations. The first step in assessing landslide susceptibility is gathering the necessary data and preparing the information. The factors were chosen by considering the literature, the local conditions, and previous studies. Nine parameters were identified as key factors for predicting landslide susceptibility: slope angle, slope aspect, elevation, geology, land use, distance from fault, distance from road, distance from river, and precipitation. To assess the effectiveness of GP-PUK, GP-RBF, SVM-PUK, SVM-RBF, and RF in estimating the landslide susceptibility map, field data were used. The dataset contains 148 observations of landslide occurrence and non-occurrence points, randomly separated into training (70%; 103 points) and testing (30%; 45 points) sets. Three statistical evaluation parameters were used to judge the performance of the soft computing techniques: the correlation coefficient (C.C.), root mean square error (RMSE), and Nash–Sutcliffe model efficiency (NSE).
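The random 70/30 separation described above can be sketched with the standard library. The function name and seed are hypothetical, and integer indices stand in for the 148 landslide and non-landslide points; the abstract does not say how the authors performed the split.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a copy of the points and split it into training and testing sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))  # rounds down
    return shuffled[:n_train], shuffled[n_train:]

points = list(range(148))  # placeholder identifiers for the 148 observations
train, test = split_points(points)
print(len(train), len(test))  # 103 45
```

Rounding the training count down reproduces the 103/45 division reported in the abstract.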

Results and Discussion

According to the comparison of methods, RF was the best model and its accuracy was the most suitable for estimating landslide occurrence, so RF was used to produce the landslide susceptibility map. A single-factor ANOVA test suggests that there is no significant difference between observed and predicted values of landslide occurrence and non-occurrence using the GP_PUK, GP_RBF, SVM_PUK, SVM_RBF, and Random Forest approaches. The landslide susceptibility map was divided into five classes, from non-susceptible to very high susceptibility. According to the final map, the “non-susceptible” class covers 35.86 km2, the “low susceptibility” class 36.19 km2, the “moderate susceptibility” class 15.06 km2, the “high susceptibility” class 10.95 km2, and the “very high susceptibility” class 14.46 km2 of the Haraz Watershed. Sensitivity analysis was performed to find the most significant input parameter in predicting landslide occurrence and non-occurrence. The result shows that slope aspect plays the major role compared with the other input parameters.

Conclusion

Based on these results, some zones are potentially dangerous for future habitation and development. There is therefore an immediate need to implement mitigation measures in the very high-hazard and high-hazard zones, or to avoid such zones for habitation and future development activities. Local authorities can use the results of this research to manage and plan development within their areas properly and systematically.

Keywords: Haraz Watershed, Landslide, Landslide Susceptibility Index (LSI), Support Vector Machine, Gaussian Process, Random Forest Method

ارزیابی روش های محاسبات نرم در برآورد رسوب معلق رودخانه (ایستگاه حسن آباد رودخانه تیره)

امیر مرادی نژاد*، سعید خسروبیگی، محمود اکبری، سیداحمد حسینی

نشریه مدل سازی و مدیریت آب و خاک، سال چهارم شماره 2 (تابستان 1403)، صص 241 -260

برآورد بار رسوب رودخانه ها از مسائل مهم و کاربردی در مطالعات و طراحی پروژه های مهندسی آب، مانند طراحی و توسعه شبکه های آبیاری و زهکشی، آبگیری از رودخانه و غیره است. مدل های آماری و رگرسیونی از معمول ترین روش های تحلیل هستند که اغلب با توجه به حل خطی این پدیده ها، نتایجی همراه با خطا ارائه داده اند. مدل های هیدرولیکی به دلیل نیاز به داده های زیاد و گاهی در دسترس نبودن داده های مورد نیاز و دقیق نبودن داده ها به علت خطای انسانی برای شبیه سازی رسوبات، همیشه نمی توان به آن ها اعتماد کرد. امروزه سیستم هادی هوشمند فازی و عصبی با توجه به توانایی در حل پدیده های غیرخطی و پیچیده، کاربردهای فراوانی در مسائل مختلف مهندسی آب از جمله رسوب پیدا کرده اند. هدف از پژوهش حاضر نیز ارزیابی و مقایسه چهار روش مدل های فازی-عصبی تطبیقی (ANFIS)، ماشین بردار پشتیبان (SVM)، برنامه ریزی بیان ژن (GEP) و روش گروهی کنترل داده ها GMDH در برآورد بار رسوب ایستگاه حسن آباد رودخانه تیره استان مرکزی است. بدین منظور به عملکرد چهار نوع مدل در شبیه سازی بار رسوبی رودخانه ها پرداخته، سپس نتایج چهار روش با یک دیگر و با نتایج منحنی سنجه مورد مقایسه قرار گرفت. نتایج بیان گر عملکرد قابل قبول مدل ها نسبت به منحنی سنجه است. هم چنین، نتایج برتری مدل (GMDH) با بیش ترین ضریب تبیین (R2) با مقدار 99/0 و کم ترین ریشه میانگین مربعات خطا (RMSE) بر حسب تن در روز با مقدار 0038/0 نشان داد. در این خصوص کارآیی مدل (GEP) تا حدی بهتر از مدل های SVM و ANFIS بود. در مرحله بعد، از بهترین الگوی انتخابی مدل های ANFIS، SVM و GEP به عنوان ورودی مدل GMDH استفاده شد. نتایج بیان گر عملکرد قابل قبول مدل GMDH با بیش ترین ضریب تبیین (R2) برابر 99/0 و 98/0 و کم ترین ریشه میانگین مربعات خطا به ترتیب برابر 0038/0 و 0045/0 تن در روز شد. نتایج به دست آمده نشان داد هر چهار روش داده کاوی بررسی شده به مراتب نتایج بهتری نسبت به منحنی سنجه رسوب ارائه می کنند.

کلید واژگان: بار معلق، برنامه ریزی بیان ژن، رسوب، شبکه فازی-عصبی، ماشین بردار پشتیبان

Assessing soft calculation methods in river suspended sediment estimation (Hassan Abad station of Tirah river)

Amir Moradinejad *, Saeid Khosrobeigi, Mamood Akbari, Seyed Ahmad Hosseini

Journal of Water and Soil Management and Modeling, Volume:4 Issue: 2, 2024, PP 241 -260

Introduction

Rivers are always subject to erosion and sediment transport. Sediment transport is one of the most complex topics in river engineering and a constant focus of experts and water engineers. It is one of the important hydrodynamic processes that affect many hydraulic systems and water facilities, and it is considered one of the basic problems in exploiting surface water resources globally. Estimating the sediment load of rivers is an important and practical issue in the study and design of water engineering projects, such as the design and development of irrigation and drainage networks and water extraction from rivers. Sediment concentration can be determined by direct or indirect methods, and direct methods are usually expensive and time-consuming. Many factors affect this phenomenon, which makes its analysis difficult. Statistical and regression models, the most common analysis methods, treat the phenomenon linearly and therefore cannot model sedimentation with acceptable accuracy. Hydraulic models cannot always be trusted either, because they need a large amount of data, the required data are sometimes unavailable, and the data can be inaccurate due to human error. Nowadays, fuzzy and neural intelligent systems, owing to their ability to handle complex and nonlinear phenomena, have found many applications in water engineering problems, including sedimentation. The purpose of this research is to evaluate and compare the adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), gene expression programming (GEP), and group method of data handling (GMDH) in estimating the sediment load of the Tirah River, Markazi Province.

Materials and Methods

In this research, the long-term daily records of temperature, rainfall, average flow rate, and sediment concentration at the Hasan Abad hydrometric and sediment measuring station, located on the main branch of the Tirah River, were first collected. The data were then tested for sufficiency, the correlation of river discharge, precipitation, and temperature with sediment discharge was checked, and the long-term average of suspended sediment at the station was determined. Next, a suitable combination of input variables was selected. The input patterns can be designed from the relationships among flow discharge, sediment discharge, rainfall, and temperature. Because these parameters form time series, the input patterns of the soft computing models should be designed with time delays, as in time series analysis and forecasting. After the most appropriate time delays of the discharge, sediment, temperature, and rainfall inputs were determined, the structure of the soft computing models was designed. Sediment discharge was then estimated using the SVM, GEP, ANFIS, and GMDH models, and the results were compared with one another, with the sediment rating curve, and with observational data. About 70% of the data was used for training and 20 to 30% for validation and testing.

Results and Discussion

Based on the statistical indicators used for model selection, the SVR model performed best with input pattern number one, for which its R2 and RMSE were 0.96 and 0.0047, respectively. For the same pattern, the R2 and RMSE in predicting suspended sediment in the test stage were 0.95 and 0.014 for the ANFIS model and 0.50 and 4.97 for the GEP model. The ANFIS model also performed best with pattern number one (R2 of 0.95 and RMSE of 0.014). The GEP model performed best with pattern number nine, for which its R2 and RMSE were 0.99 and 0.010, respectively. For that pattern, the R2 and RMSE in predicting suspended sediment in the test stage were 0.70 and 0.015 for the ANFIS model and 0.78 and 0.0185 tons for the SVR model.

Conclusion

The GEP model performed better than the other models, with the SVR and ANFIS models ranked second and third. In the next step, the best input patterns of the ANFIS, SVM, and GEP models were used as inputs to the GMDH model. First, input pattern one, the best pattern for the ANFIS and SVM models, was used as the GMDH input. In the training and test steps, the R2 values were 0.94 and 0.99, the RMSE values 0.0079 and 0.0038, the MSE values 0.000062 and 0.000015, and the MAPE values 0.007 and 0.003, respectively. Next, input pattern nine, the best pattern for the GEP model, was used as the GMDH input. In the training and test steps, the R2 values were 0.95 and 0.98, the RMSE values 0.0077 and 0.0045, the MSE values 0.0006 and 0.00002, and the MAPE values 363 and 502, respectively. The results showed the acceptable performance of the GMDH model, with the highest R2 of 0.99 and 0.98 and the lowest RMSE of 0.0038 and 0.0045 for the two input patterns, respectively.

Keywords: Fuzzy Neural Network, Gene Expression Programming, Suspended Load, Sedimentation, Support Vector Machine

ارزیابی پارامترهای موثرجهت پیش بینی عیار پتاسیم شورابه با استفاده از الگوریتم های ماشین بردار پشتیبان و جنگل تصادفی (مطالعه موردی: پلایای شهرستان خور و بیابانک، استان اصفهان)

مریم ایرجی*، سید علیرضا موحدی نائینی، چوقی بایرام کمکی، سهیلا ابراهیمی، بامشاد یغمایی

مجله تحقیقات آب و خاک ایران، سال پنجاه و پنجم شماره 1 (پیاپی 97، فروردین 1403)، صص 145 -161

اهمیت پتاسیم در بالا بردن کمیت و کیفیت محصولات کشاورزی، تقاضا را برای کودهای پتاسیمی افزایش داده است. تضمین استخراج پتاسیم از شورابه های زیرزمینی مقدار عیار پتاسیم در آن هاست. هدف این پژوهش استفاده از الگوریتم های جنگل تصادفی (RF) و ماشین بردار پشتیبان (SVM) به منظور اولویت بندی پارامترهای موثر بر عیار پتاسیم شورابه زیرزمینی در پلایای خور و بیابانک استان اصفهان است. به همین منظور تعداد 55 پارامتر در 12 گمانه حفاری اندازه گیری شد. پارامترهای اندازه گیری شده به عنوان متغیرهای مستقل شامل درصد رطوبت اشباع مغزه در 15عمق مختلف، جرم مخصوص ظاهری مغزه در 15عمق مختلف، تخلخل مغزه در 15عمق مختلف، مساحت پلی گون، عمق آب زیرزمینی، عمق لایه نمک، پتاسیم لایه سطحی، دانسیته شورابه و میزان عناصر کلسیم، منیزیم، سدیم، کلر و عیار پتاسیم به عنوان متغیر وابسته وارد مدل شدند. در مدلRF برای اولویت بندی، پارامترها از روش های اهمیت ویژگی جایگشت (PFI) و حذف ویژگی جایگشتی (RFE) استفاده شد. در کرنل های مختلف الگوریتم SVM به منظور جلوگیری از هم خطی پارامترهای مستقل، تمام ترکیب های حاصل از متغیرهای مستقل با در نظر گرفتن ضریب تورم واریانس کمتر از 8 و بالاترین ضریب تعیین و کمترین خطای MSE بررسی و به عنوان بهترین ترکیب انتخاب شدند. پارامترهای موثر در پیش بینی عیار پتاسیم شورابه در الگوریتم RF و تابع خطی الگوریتم SVM به ترتیب sp، ap، duw، slp، SAR و n، sp، duw و SAR بودند که منجر به بهترین نتیجه (ضریب تعیین زیاد و خطای کم) شدند. ضریب تعیین برای هر دو مدل به ترتیب 99/0 و 97/0 که نشان دهنده دقت خوب هر دو الگوریتم است.

کلید واژگان: پیش بینی عیار، جنگل تصادفی، شورابه، ماشین بردار پشتیبان

Evaluation of effective parameters for predicting the potassium grade of saline water by using support vector machine and random forest algorithms (case study: playa of Khoor and Biabank area city, Isfahan province)

Maryam Iraji *, Seyed Alireza Movahedi Naeini, Chooghi Bayram Komaki, Soheila Ebrahimi, Bamshad Yaghmaei

Iranian Journal of Soil and Water Research, Volume:55 Issue: 1, 2024, PP 145 -161

The importance of potassium for agricultural products has increased the demand for potassium fertilizers. Whether potassium can be extracted from underground brines depends on their potassium grade. The purpose of this research is to use random forest (RF) and support vector machine (SVM) algorithms to prioritize the parameters that affect the potassium grade of underground brine in the Khoor and Biabank playa in Isfahan province. For this purpose, 55 parameters were measured in 12 boreholes. The independent variables were the saturation moisture percentage, the bulk density, and the porosity of the core at 15 different depths each, the polygon area, the groundwater depth, the depth of the salt layer, the potassium of the surface layer, the brine density, and the calcium, magnesium, sodium, and chlorine contents; potassium grade was the dependent variable. In the RF model, permutation feature importance (PFI) and recursive feature elimination (RFE) were used for prioritization. For the different kernels of the SVM algorithm, to avoid collinearity among the independent parameters, all combinations of the independent variables were examined, and the combination with a variance inflation factor below 8, the highest coefficient of determination, and the lowest MSE was selected as the best. The effective parameters in predicting the potassium grade of the brine were sp, ap, duw, slp, and SAR in the RF algorithm and n, sp, duw, and SAR with the linear kernel of the SVM algorithm; these gave the best results. The coefficients of determination of the two models were 0.99 and 0.97, respectively, which indicates good accuracy for both algorithms.
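The collinearity screen described in this abstract can be illustrated with a minimal sketch. It assumes that the variance inflation factor (VIF) of each predictor is read from the diagonal of the inverse correlation matrix, which is a standard identity and not necessarily how the authors computed it. The labels `sp`, `ap`, and `duw` are borrowed from the abstract only as names; the numeric values and the helper functions are hypothetical. The sketch covers only the VIF screen, not the later ranking by coefficient of determination and MSE.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Population Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def admissible_subsets(columns, size, vif_limit=8.0):
    """Keep predictor subsets whose largest VIF stays below the limit."""
    keep = []
    for subset in combinations(columns, size):
        scores = vif({k: columns[k] for k in subset})
        if max(scores.values()) < vif_limit:
            keep.append(subset)
    return keep


# Hypothetical toy data: ap is almost a linear function of sp.
toy = {
    "sp": [30, 35, 32, 40, 38, 45, 41, 36],
    "ap": [1.50, 1.46, 1.48, 1.40, 1.41, 1.35, 1.40, 1.44],
    "duw": [5, 2, 8, 3, 7, 4, 6, 1],
}
print(admissible_subsets(toy, 2))
```

In this toy data the pair (`sp`, `ap`) is rejected because the two columns are nearly collinear, while pairs that include `duw` pass the screen.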

Keywords: grade prediction, Random forest, Saline water, Support vector machine

مدل سازی بارش- رواناب ایستگاه های هیدرومتری خرمازرد و بناب با استفاده از الگوریتم ماشین بردار پشتیبان و جنگل تصادفی

زینب بیگدلی، ابوالفضل مجنونی هریس*، رضا دلیرحسن نیا، سپیده کریمی

نشریه آب و خاک، سال سی و هفتم شماره 6 (پیاپی 92، بهمن و اسفند 1402)، صص 971 -989

شبیه سازی فرآیند بارش-رواناب می تواند نقش بسزایی در مدیریت منابع آب و مسائل هیدرولوژی داشته باشد. در این تحقیق با استفاده از مدل های داده کاوی ماشین بردار پشتیبان (SVM) و جنگل تصادفی (RF) اقدام به مدل سازی بارش- رواناب دو ایستگاه بناب و خرمازرد به ترتیب واقع بر روی رودخانه های صوفی چای و ماهپری چای (دشت مراغه) شده است. در مطالعه حاضر داده های ایستگاه های هواشناسی و هیدرومتری منطقه از سال 1355 تا 1397 از شرکت آب منطقه ای و سازمان هواشناسی استان آذربایجان شرقی دریافت گردید. تغییر روند رواناب جاری در سال 1374، باعث گردید مدت مطالعه به دو دوره قبل و بعد آن تقسیم شود. مقدار بارش و رواناب با تاخیر زمانی یک ماه بعنوان ورودی به این مدل وارد و سپس مقادیر رواناب ماهانه مشاهداتی با رواناب ماهانه تخمین زده شده با استفاده از معیارهای ارزیابی خطا مورد بررسی گرفت. نتایج نشان داد که در هر دو دوره برای ایستگاه بناب مدل SVM کارآیی بالاتری نسبت به مدل RF داشت و در ایستگاه خرمازرد نیز برای این دو دوره، مدل RF عملکرد بهتری از مدل SVM ارائه کرد. نتایج مدل سازی در مجموعه تست در دو ایستگاه نشان داد که مقدار همبستگی متقابل برای دو دوره مطالعاتی اول و دوم ایستگاه بناب به ترتیب برابر با 85/0 و 84/0 و برای ایستگاه خرمازرد برابر با 79/0 و 75/0 بدست آمد. با توجه به نتایج مقادیر آماره من کندال و سری های زمانی برای هر دو ایستگاه، روند مشخصی برای بارش در طول دوره مشاهده نشد، ولی دبی رودخانه صوفی چای در ایستگاه بناب، بخصوص بعد از سال 1374 روند صعودی و دبی رودخانه ماهپری چای روند کاملا نزولی داشته است.

کلید واژگان: بارش- رواناب، جنگل تصادفی، دشت مراغه، صوفی چای، ماشین بردار پشتیبان، مدل سازی

Rainfall-Runoff Modeling of Khormazard and Bonab Hydrometric Stations Using Support Vector Machine and Random Forest Algorithms

Z. Bigdeli, A. Majnooni-Heris *, R. Delearhasannia, S. Karimi

Journal of water and soil, Volume:37 Issue: 6, 2024, PP 971 -989

Introduction

Water plays a crucial role in the sustainable development of any region. Because our country consists primarily of arid and semi-arid regions, where the majority of rivers are also found, and given the critical state of groundwater extraction and the growing importance of surface water, it is crucial to understand the future condition of water resources within the country's watersheds (Fathollahi et al., 2015). Intelligent models make it feasible to represent inherent relationships in data that conventional mathematical methods cannot resolve. Support vector machine (SVM) and random forest (RF) algorithms are two machine learning methods used for repeatable and accurate prediction (Kisi & Parmarm, 2016). A recent study by Zarei et al. (2022) evaluated flood risk using the SVM and RF data mining models (case study: Frizi watershed). Its results showed that both the SVM and RF algorithms predicted flood risk with high accuracy, both on the training data and in overall algorithm performance. The purpose of this study is to simulate the rainfall-runoff process at the hydrometric stations at the end of the Maragheh plain (Khormazard station on the Mahpari chai river and Bonab station on the Sufichai river) in East Azerbaijan province using the SVM and RF algorithms. The study covers a period of 43 years, making it one of the few such studies in this area.

Materials and Methods

The Maragheh Sufi chai basin is situated east of Lake Urmia, within East Azarbaijan province. It covers an area of 611.89 square kilometers and lies between longitudes 45°40´ and 46°25´ east and latitudes 37°15´ and 37°55´ north. The average elevation of the basin is 1767 meters above sea level (Sharmod et al., 2015). Because the runoff trend in the data changed substantially after 1994 (without any noticeable change in the precipitation trend), the available data were divided into two periods: 1976 to 1994 and 1995 to 2019. To simulate rainfall-runoff, the average rainfall of the Maragheh plain was first calculated by the Thiessen polygon method. This rainfall series was then combined with the discharge recorded at the Bonab and Khormazard stations, lagged by one month. These inputs were supplied to two models, SVM (with a kernel function) and RF. For this purpose, 70% of the data was used for the training stage and 30% for the validation stage. Rainfall and the runoff of the preceding month were chosen as the predictor variables, and runoff was designated as the target variable. Several combinations of runoff and rainfall inputs were evaluated for modeling. The inputs consist of monthly P and Q values (Pt, Qt-1), while the output is the current runoff (Qt), with the subscript t indicating the time step. As a result, two input combinations were constructed from the Q and P data (Table 3), and the SVM and RF models were used for rainfall-runoff modeling to determine the optimal input combination.
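The input construction described above, pairing (Pt, Qt-1) with the target Qt and splitting the data 70/30, might be sketched as follows. The function names and the toy series are hypothetical, and the split here is chronological, which is an assumption: the text does not say whether the split was chronological or random.

```python
def make_lagged_samples(rain, runoff):
    """Pair inputs (P_t, Q_{t-1}) with target Q_t for t = 1 .. n-1."""
    if len(rain) != len(runoff):
        raise ValueError("rain and runoff must have the same length")
    inputs = [(rain[t], runoff[t - 1]) for t in range(1, len(runoff))]
    targets = [runoff[t] for t in range(1, len(runoff))]
    return inputs, targets


def chronological_split(inputs, targets, train_percent=70):
    """Use the first train_percent of samples for training, the rest for validation."""
    cut = len(inputs) * train_percent // 100
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Hypothetical monthly series (rainfall in mm, runoff in m3/s).
rain = [12.0, 30.5, 44.1, 8.2, 0.0, 3.4, 21.7, 35.0, 50.3, 16.8, 9.9]
runoff = [1.1, 1.8, 2.9, 2.0, 0.9, 0.7, 1.2, 2.1, 3.3, 2.2, 1.4]

X, y = make_lagged_samples(rain, runoff)
(train_X, train_y), (valid_X, valid_y) = chronological_split(X, y)
print(len(train_X), len(valid_X))
```

The first sample is lost because no preceding runoff value exists for it, so eleven months yield ten samples.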
Calculating average rainfall through the Thiessen Polygons method
Thiessen polygons, which are Voronoi cells, are used to define rainfall polygons that correspond to the surface area (Ai). These polygons are used to weight the rainfall measured by each rain gauge (ri). Consequently, for n rain gauges the area-weighted rainfall is equivalent to:
$$\bar{P} = \frac{\sum_{i=1}^{n} A_i \, r_i}{\sum_{i=1}^{n} A_i} \qquad (1)$$
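As a small worked illustration of the area weighting, the sketch below computes the Thiessen average for three gauges. The function name, polygon areas, and rainfall depths are hypothetical values chosen for easy arithmetic.

```python
def thiessen_mean(areas, rainfall):
    """Area-weighted mean rainfall: sum(A_i * r_i) / sum(A_i)."""
    if not areas or len(areas) != len(rainfall):
        raise ValueError("need one rainfall value per polygon area")
    return sum(a * r for a, r in zip(areas, rainfall)) / sum(areas)


# Hypothetical polygon areas (km2) and gauge rainfall (mm).
areas = [200.0, 300.0, 500.0]
gauges = [10.0, 20.0, 30.0]
print(thiessen_mean(areas, gauges))  # (2000 + 6000 + 15000) / 1000 = 23.0
```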
Random Forest Algorithm
Random forest is a modern tree-based method that combines a multitude of classification and regression trees. It is one of the most widely used machine learning algorithms because of its simplicity and its usability for both classification and regression tasks.
Support Vector Machine (SVM) algorithm
Support vector machines, like other artificial intelligence methods, are based on data mining algorithms. The most important functions of the support vector machine model are classification and regression.
Evaluation Criteria
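To evaluate the models and compare their effectiveness, this research uses the root mean square error (RMSE), correlation coefficient (r), coefficient of determination (R2), and Nash-Sutcliffe efficiency coefficient (NS). In their standard forms, with O_i the observed value, S_i the simulated value, \bar{O} and \bar{S} their means, and n the number of observations, these criteria are:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - S_i\right)^2} \qquad (2)$$
$$r = \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(S_i - \bar{S}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{n}\left(S_i - \bar{S}\right)^2}} \qquad (3)$$
$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(S_i - \bar{S}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{n}\left(S_i - \bar{S}\right)^2}}\right]^2 \qquad (4)$$
$$NS = 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \qquad (5)$$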
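The four evaluation criteria named in this section can be computed with a few lines of code. The sketch below uses their standard textbook definitions, which is an assumption about the exact forms used in the paper; the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pearson_r(obs, sim):
    """Correlation coefficient r; the coefficient of determination is r ** 2."""
    mo, ms = mean(obs), mean(sim)
    num = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    den = sqrt(sum((o - mo) ** 2 for o in obs) * sum((s - ms) ** 2 for s in sim))
    return num / den


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread


# Hypothetical runoff values: the simulation is offset by a constant bias of 1.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, simulated))
print(pearson_r(observed, simulated) ** 2)
print(nash_sutcliffe(observed, simulated))
```

The example shows why the criteria are reported together: a constant bias leaves r and R2 at 1 while RMSE and NS both register the error.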

Results and Discussion

Figure 6 displays the time series of rainfall and runoff during the two study periods, before and after 1994. For Bonab station, the value of Kendall's statistic for the precipitation variable was 0.044 and 0.028 in the first and second periods, respectively. For Khormazard station, the corresponding values were 0.030 and 0.028. These values are not significant at the 95% level, so annual rainfall at the two stations showed no significant trend between 1976 and 2019. The variations observed during this period were deemed normal: the rainfall time series fluctuated, with increases in some years and decreases in others. The discharge time series show varying trends. Initially, the outflow from Bonab station (panels a and b) fluctuated, with periods of both decreasing and increasing trends; in recent years, however, outflow from this station has increased. The Mann-Kendall test statistic for this station was 0.325 and 0.512 for the two study periods, respectively. These values are significant at the 95% level, indicating that the increasing trend of discharge was statistically significant in both periods. The reason for this trend at the Bonab station, compared to other entrance stations to Lake Urmia, is the lower demand for water in the Sofichai basin for agricultural and industrial purposes, in contrast to other rivers. To explore the root cause, studies should examine both groundwater and surface water sources, as well as water use in the agricultural and industrial sectors of this region.
The trend observed at Khormazard station (panels c and d) is different. Unlike Bonab station, the discharge from Khormazard station exhibited a consistent downward trend. The Mann-Kendall test statistic for the discharge variable was -0.269 and -0.412 for the two study periods, respectively. At the 95% level, the decreasing trend of discharge at this station was significant. In addition, the volume of discharge at this hydrometric station has decreased drastically since 1976 (panel d). Apart from 2007, when discharge volume increased suddenly, the inflow into Lake Urmia has remained at its lowest level throughout the years. To compare the Bonab and Khormazard stations over the two periods, rainfall and runoff statistics (average, minimum, maximum) for the first period (1976-1994) and the second period (1995-2019) are presented in Tables 4 and 5. According to both tables, the Bonab station has the highest average rainfall and runoff values in the total data column, while the Khormazard station has the lowest.
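The Mann-Kendall trend test used in this discussion can be sketched in a few lines. The text does not state whether the reported values are Kendall's tau or the standardized Z statistic, so the sketch returns both. It omits the tie correction in the variance, and the function name and sample series are hypothetical.

```python
from math import sqrt


def mann_kendall(series):
    """Return (tau, z) for a time series, without tie correction."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    return tau, z


# Hypothetical annual discharge values with a steady rise.
rising = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau, z = mann_kendall(rising)
print(tau, z, abs(z) > 1.96)  # |z| > 1.96 means significant at the 95% level
```

A positive tau indicates a rising series and a negative tau a falling one, which matches the signs reported for the Bonab and Khormazard discharge series.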
As mentioned, to model the rainfall-runoff data using the SVM and RF models, one portion of the data was used for training and another for validation. Tables 6 and 7 present the statistical indicators calculated for the training and validation sections of both models. According to these tables, the SVM model outperformed the RF model at the Bonab station in both study periods. The SVM model demonstrated superior accuracy in simulating both flow rate and monthly rainfall. Conversely, at the Khormazard station during these periods, the RF model performed better than the SVM model. The modeling results in the test set for both stations revealed that the mutual correlation values for the first and second study periods at the Bonab station were 0.85 and 0.84, respectively. For the Khormazard station, these values were 0.79 and 0.75, respectively.

Conclusion

The results indicate that for both periods at the Bonab station, the SVM model was more efficient than the RF model. Conversely, at the Khormazard station, the RF model outperformed the SVM model for both periods. For the SVM model at the Bonab station, the mutual correlation values in the test set were 0.85 and 0.84 for the first and second study periods, respectively. For the RF model at the Khormazard station, these values were 0.79 and 0.75, respectively. Other notable findings of this research come from the analysis of the rainfall and runoff time series over 43 years. The graphs obtained for both stations, along with the Mann-Kendall statistic for precipitation and flow, revealed no discernible trend in precipitation during the two study periods; instead, precipitation in these areas fluctuated. However, the time series and statistical values for the discharge of the Sofichai and Mahpari chai rivers at the Bonab and Khormazard stations showed different results. At the Bonab station, the discharge fluctuated, with an increase observed in the second period. Conversely, at the Khormazard station, the discharge trend was downward in both study periods. The outflow of the Mahpari chai River has decreased notably in recent years, as evidenced by the decreasing trend in the Mann-Kendall statistic.

Keywords: Maragheh Plain, Modeling, Rainfall-Runoff, random forest, Sufi Chai, Support vector machine

ارزیابی مدل های یادگیری ماشین در GIS جهت پیش بینی آب زیرزمینی مناطق نیمه خشک شرق ایران

مبین افتخاری*، علی حاجی الیاسی، سید احمد اسلامی نژاد

مجله آبخوان و قنات، سال چهارم شماره 2 (پیاپی 7، پاییز و زمستان 1402)، صص 49 -66

پیش بینی پتانسیل آب های زیرزمینی جهت توسعه و برنامه ریزی سیستماتیک منابع آب بسیار حیاتی است. هدف اصلی این تحقیق، توسعه مدل های یادگیری ماشینی از جمله جنگل تصادفی (RF)، درخت تصمیم (DT) و ماشین بردار پشتیبان (SVM) برای پیش بینی مناطق پتانسیلی آب زیرزمینی در دشت بیرجند است. بنابراین، برای اجرای این مطالعه، داده های ژئوهیدرولوژیکی مربوط به 37 چاه آب زیرزمینی (شامل تعداد و موقعیت چاه ها و سطح آب زیرزمینی) و 17 معیار هیدرولوژی، توپوگرافی، زمین شناسی و محیطی مورد استفاده قرار گرفت. روش انتخاب ویژگی از طریق کمترین مربعات ماشین بردار پشتیبان جهت تعیین معیارهای موثر برای بهبود عملکرد الگوریتم های یادگیری ماشین به کار گرفته شد. در نهایت، نقشه های پیش بینی پتانسیل آب زیرزمینی با استفاده از مدل های DT، RF و SVM تهیه شدند و عملکرد این مدل ها با استفاده از سطح زیر منحنی (AUC) و سایر شاخص های آماری مورد ارزیابی قرار گرفت. نتایج نشان داد که مدل DT (AUC=0.89) توانایی پیش بینی بسیار بالایی برای پتانسیل آب زیرزمینی در منطقه مورد مطالعه دارد و معیار ارتفاع به عنوان مهم ترین عامل در پیش بینی پتانسیل آب زیرزمینی در این منطقه شناخته شد. نتایج این مطالعه می تواند به عنوان راهنمایی برای تصمیم گیری و برنامه ریزی مناسب در استفاده بهینه از منابع آب زیرزمینی مورد استفاده قرار گیرد.

کلید واژگان: دشت بیرجند، نقشه های پیش بینی، جنگل تصادفی، درخت تصمیم، ماشین بردار پشتیبان

Assessment of machine learning models in GIS for predicting groundwater in semi-arid regions of eastern Iran

Mobin Eftekhari *, Ali Haji Elyasi, Seyed Ahmad Eslaminezhad

Journal of Aquifer and Qanat, Volume:4 Issue: 2, 2024, PP 49 -66

Predicting groundwater potential is crucial for the systematic development and planning of water resources. The main objective of this study is to develop machine learning models, including Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM), for predicting groundwater potential areas in the Birjand plain. To this end, geohydrological data from 37 groundwater wells (including the number and location of wells and groundwater levels) and 17 hydrological, topographical, geological, and environmental criteria were used. Feature selection was performed with the least squares support vector machine method to determine the effective criteria and improve the performance of the machine learning algorithms. Finally, predictive maps of groundwater potential were prepared using the DT, RF, and SVM models, and the performance of these models was evaluated using the area under the curve (AUC) and other statistical indicators. The results showed that the DT model (AUC=0.89) has very high predictive capability for groundwater potential in the study area, and elevation was identified as the most important factor in predicting groundwater potential in this area. The findings of this study can serve as a guide for decision-making and appropriate planning in the optimal use of groundwater resources.
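The AUC criterion used to compare the models can be computed directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below assumes binary labels (1 for a location with groundwater potential, 0 otherwise); the function name and sample scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for four locations.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(labels, scores))  # 3 of the 4 positive-negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported value of 0.89 should be read.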

Keywords: Birjand Plain, Predictive Maps, Random Forest, Decision Tree, Support Vector Machine

ارزیابی تکنیک سنجش از دور و مدل های یادگیری ماشین در برآورد تبخیر و تعرق گیاه نیشکر

محمد علوی، محمد الباجی*، منا گلابی، عبدعلی ناصری، سعید همایونی

مجله مدیریت آب و آبیاری، سال سیزدهم شماره 4 (زمستان 1402)، صص 965 -982

تخمین تبخیروتعرق گیاه در مناطق خشک و نیمه خشک چالش برانگیز است زیرا این فرایند در طول زمان و مکان بسیار پویا است. هم چنین اندازه گیری این متغیر به صورت میدانی کاری بسیار وقت گیر و هزینه بر است. لذا این پژوهش با هدف ایجاد چارچوبی برای برآورد بهینه تبخیروتعرق گیاه نیشکر در مقیاس مکانی- زمانی با استفاده از چهار مدل یادگیری ماشین (MLR، CART، SVR و GBRT) در ترکیب با داده های سنجش از دور و متغیر های هواشناسی صورت گرفت. هم چنین به منظور کاهش وابستگی به پارامترهای متعدد هواشناسی در روش های مرسوم برآورد تبخیروتعرق، هشت مدل مختلف تجربی مبتنی بر دما و چهار مدل اصلاحی هارگریوز سامانی نسبت به مدل استاندارد فایو- پنمن- مانتیث ارزیابی شد. بدین منظور داده های هواشناسی از ایستگاه هواشناسی کشت و صنعت نیشکر حکیم فارابی در دوره زمانی سه ساله (1400-1397) گردآوری شدند. نه ترکیب مختلف از متغیرهای ورودی (داده های سنجش از دور و متغیر های هواشناسی) براساس روش Information Gain Ratio طراحی شدند و سپس توسط الگوریتم های یادگیری ماشین ارزیابی شدند. نتایج نشان داد که بیش ترین دقت مدل های یادگیری ماشین براساس آماره هایR2 ، RMSE و MAE به ترتیب در مدل های CART (99/0، 41/0 و 18/0) و GBRT (99/0، 65/0 و 26/0) به دست آمد. هم چنین از بین روش های تجربی مبتنی بر دما، روش ایوانف با R2 برابر 91/0 و روش بایر رابرتسون با R2 برابر 78/0 به ترتیب بهترین و ضعیف ترین عملکرد را ثبت کردند. به طورکلی روش سنجش از دور در ترکیب با مدل های یادگیری ماشین توانست مقادیر بهتر و دقیق تری از تبخیروتعرق گیاه را در مقیاس زمان و مکان ارایه نماید.

کلید واژگان: درخت تصمیم، رگرسیون بردار پشتیبان، شاخص های طیفی، مدل درخت گرادیان بوستینگ، مدل های تجربی

Evaluating Remote Sensing Technique and Machine Learning Algorithms in Estimating Sugarcane Evapotranspiration

Mohammad Alavi, Mohammad Albaji *, Mona Golabi, Abd Ali Naseri, Saeid Homayouni

Journal of Water and Irrigation Management, Volume:13 Issue: 4, 2023, PP 965 -982

Estimating crop evapotranspiration (ETc) in arid and semi-arid areas can be difficult because the process is highly dynamic in both time and space. In addition, on-site measurement of this variable is very time-consuming and costly. This study aimed to develop a framework that accurately estimates sugarcane crop evapotranspiration on a spatio-temporal scale, using four machine learning (ML) algorithms (MLR, CART, SVR, and GBRT) combined with remote sensing (RS) data and meteorological variables. In addition, to reduce the dependence of conventional ETc equations on several meteorological parameters, eight empirical temperature-based methods and four modified Hargreaves & Samani equations were evaluated against the standard FAO-Penman-Monteith method. For this purpose, weather data were collected from the Hakim Farabi Sugarcane Agro-Industrial meteorological station for three years (2018-2021). Nine combinations of input variables (RS data and meteorological variables) were designed based on the Information Gain Ratio (IGR) method and then evaluated by the ML algorithms. The results showed that the highest accuracy of the ML algorithms, based on the R2, RMSE, and MAE statistics, was obtained with the CART (0.99, 0.41, and 0.18) and GBRT (0.99, 0.65, and 0.26) algorithms, respectively. Among the temperature-based methods, Ivanov’s equation had the best performance with an R2 of 0.91, while Baier and Robertson’s equation had the weakest performance with an R2 of 0.78 when estimating ETc. Overall, the combination of RS and ML algorithms produced more precise and reliable ETc values on both temporal and spatial scales.
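To make the idea of a temperature-based method concrete, the sketch below implements the original Hargreaves & Samani equation for reference evapotranspiration as given in FAO-56, ET0 = 0.0023 (Tmean + 17.8) (Tmax - Tmin)^0.5 Ra. This is the unmodified form, not one of the four modified equations evaluated in the study, and it yields reference rather than crop evapotranspiration. The function name and input values are hypothetical.

```python
from math import sqrt


def hargreaves_samani_et0(tmax_c, tmin_c, ra_mj_m2_day):
    """Reference evapotranspiration (mm/day) from the original Hargreaves-Samani form.

    ra_mj_m2_day is extraterrestrial radiation in MJ m-2 day-1; the factor
    0.408 converts it to an equivalent evaporation depth in mm/day.
    """
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must not be lower than tmin_c")
    tmean = (tmax_c + tmin_c) / 2.0
    ra_mm_day = 0.408 * ra_mj_m2_day
    return 0.0023 * (tmean + 17.8) * sqrt(tmax_c - tmin_c) * ra_mm_day


# Hypothetical summer day: Tmax = 35 C, Tmin = 20 C, Ra = 40 MJ m-2 day-1.
et0 = hargreaves_samani_et0(35.0, 20.0, 40.0)
print(round(et0, 2))
```

The equation needs only air temperature and a radiation term that depends on latitude and day of year, which is why such methods reduce the dependence on measured meteorological parameters.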

Keywords: Decision Tree, experimental models, Gradient boosted regression tree, spectral indices, Support vector machine

مقایسه کارایی هیدرولیکی سرریزهای غیر خطی قوسی در پلان با استفاده از شبکه های عصبی GEP و SVM

مهدی ماجدی اصل*، توحید امیدپور علویان، مهدی کوهدرق، وحید شمسی

نشریه علوم آب و خاک (علوم و فنون کشاورزی و منابع طبیعی)، سال بیست و هفتم شماره 3 (پیاپی 105، پاییز 1402)، صص 179 -199

سرریزهای غیرخطی ضمن دارا بودن مزیت های اقتصادی، قابلیت عبوردهی بیشتری را نسبت به سرریزهای خطی دارند. این سرریزها با افزایش طول تاج در یک عرض مشخص، در مقایسه با سرریزهای خطی راندمان دبی بیشتر با ارتفاع آزاد کمتر را در بالادست دارند. الگوریتم های هوشمند به دلیل توانایی زیاد در کشف رابطه های دقیق پیچیده مخفی بین پارامترهای مستقل موثر و پارامتر وابسته و همچنین صرفه جویی مالی و زمانی، جایگاه بسیار ارزشمندی بین پژوهشگران پیدا کرده اند. در این پژوهش عملکرد الگوریتم های ماشین بردار پشتیبان (SVM) و برنامه ریزی بیان ژن (GEP) در پیش بینی ضریب دبی سرریزهای غیرخطی قوسی به کمک 243 سری داده آزمایشگاهی برای سناریو اول و 247 سری داده آزمایشگاهی برای سناریو دوم بررسی شده است. پارامترهای هندسی و هیدرولیکی استفاده شده شامل بار آبی (HT/p)، ارتفاع سرریز (P)، نسبت بار آبی کل ، زاویه سیکل قوسی (Ɵ)، زاویه دیواره سیکل(α) و ضریب دبی (Cd) است. نتایج هوش مصنوعی نشان داد که ترکیب پارامترهای (H_T/p ،α ،Ɵ و Cd) به ترتیب در الگوریتم های GEP و SVM در مرحله آموزش مربوط به سناریو اول (سرریز کنگره ای با زاویه دیواره سیکل 6 درجه) به ترتیب برابر است با (0/9811=R2)، (RMSE=0/02120)، (DC=0/9807)، (R2=0/9896)، (RMSE=0/0189)، (DC=0/9871). (در سناریو دوم (سرریز کنگره ای با زاویه دیواره سیکل 12 درجه) به ترتیب برابراست با (0/9770=R2)،(RMSE=0/0193)، (DC=0/9768) و (9908/0=R2)، (RMSE=0/0128)، (DC=0/9905) که در مقایسه با دیگر ترکیب ها منجر به بهینه ترین خروجی شده است که نشان دهنده دقت بسیار مطلوب هر دو الگوریتم در پیش بینی ضریب دبی سرریز غیرخطی قوسی است. نتایج آنالیز حساسیت نشان داد که پارامتر موثر در تعیین ضریب دبی سرریز غیرخطی قوسی در GEP و هم در SVM پارامتر نسبت بار آبی کل (HT/p) است. مقایسه نتایج این پژوهش با سایر پژوهشگران نشان می دهد که شاخصه های ارزیابی برای الگوریتم های GEP و SVM پژوهش حاضر نسبت به سایر پژوهشگران برآورد بهتری دارند.

کلید واژگان: شبکه های عصبی، سرریز غیرخطی، ضریب دبی، ماشین بردار پشتیبان، برنامه ریزی بیان ژن

Comparison of Hydraulic Efficiency of Arched Non-linear Weirs in Plan Using GEP and SVM Neural Networks

M. Majedi Asl*, T. Omidpour Alavian, M. Kouhdaragh, V. Shamsi

Journal of Hydrology and Soil Science, Volume:27 Issue: 3, 2023, PP 179 -199

In addition to their economic advantages, non-linear weirs have a greater flow capacity than linear weirs. By increasing the crest length within a given width, these weirs achieve higher discharge efficiency with less upstream head than linear weirs. Intelligent algorithms have found a valuable place among researchers because of their ability to discover complex, hidden relationships between the effective independent parameters and the dependent parameter, and because they save money and time. In this research, the performance of the support vector machine (SVM) and gene expression programming (GEP) algorithms in predicting the discharge coefficient of arched non-linear weirs was investigated using 243 laboratory data series for the first scenario and 247 for the second. The geometric and hydraulic parameters used were the total head (HT), weir height (P), total head ratio (HT/p), arc cycle angle (Ɵ), cycle wall angle (α), and discharge coefficient (Cd). The results showed that, in the training phase, the parameter combination (Cd, H_T/p, α, Ɵ) gave, for the GEP and SVM algorithms respectively, (R2=0.9811, RMSE=0.02120, DC=0.9807) and (R2=0.9896, RMSE=0.0189, DC=0.9871) in the first scenario (labyrinth weir with a cycle wall angle of 6 degrees), and (R2=0.9770, RMSE=0.0193, DC=0.9768) and (R2=0.9908, RMSE=0.0128, DC=0.9905) in the second scenario (labyrinth weir with a cycle wall angle of 12 degrees). Compared with the other combinations, this combination gave the best output, which shows that both algorithms predict the discharge coefficient of arched non-linear weirs very accurately. The sensitivity analysis indicated that the most effective parameter in determining the discharge coefficient of the arched non-linear weir, in both GEP and SVM, is the total head ratio (HT/p). A comparison with the results of other researchers showed that the evaluation indices of the GEP and SVM algorithms in this research were better.
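For readers unfamiliar with the discharge coefficient, the sketch below shows how Cd is commonly related to discharge for labyrinth-type weirs, Q = (2/3) Cd L sqrt(2g) HT^1.5, where L is the crest length. The abstract does not state which head-discharge relation the authors used, so this form is an assumption, and the function names and values are hypothetical.

```python
from math import sqrt

G = 9.81  # gravitational acceleration, m/s2


def weir_discharge(cd, crest_length_m, total_head_m):
    """Discharge (m3/s) from Q = (2/3) * Cd * L * sqrt(2g) * HT ** 1.5."""
    return (2.0 / 3.0) * cd * crest_length_m * sqrt(2.0 * G) * total_head_m ** 1.5


def discharge_coefficient(q_m3s, crest_length_m, total_head_m):
    """Invert the same relation to recover Cd from a measured discharge."""
    return q_m3s / ((2.0 / 3.0) * crest_length_m * sqrt(2.0 * G) * total_head_m ** 1.5)


# Hypothetical laboratory run: Cd = 0.6, crest length 2 m, total head 0.25 m.
q = weir_discharge(0.6, 2.0, 0.25)
print(round(q, 4), round(discharge_coefficient(q, 2.0, 0.25), 3))
```

In a laboratory data set of the kind described here, the second function would produce the Cd values that the GEP and SVM models are then trained to predict from HT/p and the weir angles.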

Keywords: Neural networks, Non-linear weirs, Discharge coefficient, Support vector machine, Gene expression programming

برآورد دبی جریان در فلوم های با تنگ شدگی مثلثی شکل با استفاده از روش های یادگیری ماشین

محمدرضا زایری *

نشریه تحقیقات مهندسی سازه های آبیاری و زهکشی، سال بیست و چهارم شماره 90 (بهار 1402)، صص 55 -70

فلوم های گلو بریده که نوعی پارشال فلوم بدون بخش طولی گلوگاه می باشند، به عنوان ابزارهایی ساده و کارامد نقش بسزایی حهت اندازه گیری دبی جریان در کانال های روباز محسوب می شوند. نصب ساده، هزینه راه اندازی پایین و دقت بسیار مناسب در اندازه گیری میزان دبی جریان از ویژگی های مهم این نوع از سازه هاست. در این پژوهش از نتایج آزمایشگاهی به دست آمده از سازه فلوم گلو بریده که با قرار دادن دو صفحه مثلثی در دو طرف دیواره های کناری یک کانال مستطیلی و تشکیل مقطع مستطیلی و ذوزنقه ای به کار گرفته شد، جهت توسعه مدل های یادگیری ماشین مورد بررسی قرار گرفت. به منظور برآورد دبی جریان در این نوع از کانال ها از مدل های شامل دسته بندی گروهی داده ها (GMDH)، ماشین بردار پشتیبان (SVM) و جنگل تصادفی (RF) استفاده گردید. بدین منظور از پارامترهای هندسی و هیدرولیکی شامل عرض تنگ شدگی در محل سازه، شیب های افقی و عمودی دیواره های مثلثی شکل، عمق نسبی جریان به عنوان متغیر ورودی استفاده و دبی به عنوان متغیر خروجی (پاسخ) در نظر گرفته شد. نتایج نشان داد که مقدار آماره ریشه میانگین مربعات خطا (RMSE) برای مدل های مبتنی بر GMDH، SVM و RF به ترتیب، 033/0، 016/0 و 020/0 و مقدار آماره ضریب تعیین (R2) به ترتیب، 805/0، 951/0 و 900/0 به دست آمد. مقایسه بین تحقیقات گذشته و نتایج حاضر حاکی از برتری عملکرد مدل مبتنی بر SVM نسبت به سایر مدل های توسعه یافته بود. عمق آب به عرض مقطع تنگ شده به عنوان مهم ترین داده ورودی مدل ها توسعه یافته شناسایی شد.

کلید واژگان: جنگل تصادفی، فلوم های گلو بریده، شبکه های آبیاری، دسته بندی گروهی داده ها، ماشین بردار پشتیبان

Discharge Prediction in Flumes with Trapezoidal Contraction by Machine Learning Techniques

Mohamadreza Zayeri

Irrigation and Drainage Structures Engineering Research, Volume:24 Issue: 90, 2023, PP 55 -70

Introduction

The effective use of water for irrigation requires that flow discharge and flow volume be measured carefully. Venturi flumes are widely used structures for monitoring flow discharges in canal networks (Venturi tubes are used in pipelines). Venturi was the first to observe the effect of a local contraction in a conduit on the flow velocity distribution. Venturi flumes have a local constriction. They may be built in different shapes and are generally very accurate when operated under free outflow conditions. In longitudinal section, a Venturi flume has either a constant bottom slope or a local sill or hump in the bottom. A Venturi flume with an arc-shaped inlet is referred to as a Khafagi flume. A Montana flume is a flume without a diverging downstream wall. Polygonal flumes are simpler to construct than curved flumes and are less expensive because they are built from planar elements. The Parshall flume is characterized by a specific shape with various degrees of convergence and divergence. Since the development of the Parshall flume in 1926, many studies have sought to simplify flumes, reduce their construction costs, and increase their performance and accuracy. A review of the literature shows that the study of hydraulics in flow measurement flumes is mainly based on laboratory research; although numerical modeling is of interest, its reported accuracy is not acceptable because of the complex geometry of the section and the triangular contraction. Because of the hydraulic complexity of these flumes, researchers have instead tried to develop and use soft computing models to estimate flow characteristics in such sections, and according to published reports their accuracy has been suitable in all types of sections.
Therefore, in this research, a group method of data handling (GMDH) model, a support vector machine (SVM) model, and a random forest (RF) model were developed to estimate the discharge in flumes with triangular contraction.

Methodology

By investigating the hydraulics of flow in flumes, researchers have found that the discharge depends on the width of the contraction at the location of the structure, the horizontal and vertical slopes of the triangular walls, and the relative depth of the flow. Therefore, in the present study, five dimensionless input parameters were considered for the development of the GMDH, SVM, and RF models. The GMDH algorithm has been widely used to solve various hydraulic engineering problems. Among its most important applications are the estimation of scour around bridge piers and downstream of flip-bucket spillways, and of the discharge coefficient of flow measurement structures such as weirs. The SVM model is divided into two main groups: a) the support vector classification model and b) the support vector regression model, or SVR for short. The classification model is used for problems in which data are assigned to different classes, and the regression model is used for forecasting problems.

Results and Discussion

First, the collected data were divided into two categories, training and testing. In total, 592 data records were collected; 80% were assigned to training and the remaining 20% to testing. The training data were used for calibration and the test data for validation. Because the collected data are not a time series, they were assigned randomly to the training and testing groups. The results of the random forest model are presented first. They are given priority over the other models used in this research because developing the random forest model identifies the most important parameters for modeling and estimating the discharge.
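The random 80/20 assignment described above might look like the following minimal sketch. The function name and the fixed seed are assumptions added for reproducibility, and integer indices stand in for the 592 laboratory records.

```python
import random


def random_split(records, train_percent=80, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Indices 0..591 stand in for the 592 collected records.
train, test = random_split(range(592))
print(len(train), len(test))
```

With 592 records and integer rounding, the split yields 473 training and 119 test records.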


Keywords: Cut-throated flume, Irrigation networks, Support vector machine, GMDH, Random Forest

Investigation of land-use changes using satellite images in Zarand County, Kerman

Saeed Delgarm, Moein Ganjalikhani, Bahram Bakhtiari*

Journal of Agricultural Meteorology, Volume 11, Issue 1 (Spring and Summer 1402 SH), pp. 64-74

Land-use change, especially in agricultural land, strongly affects the microclimate, natural resource management, and flux exchange between the land surface and the atmosphere. Remote sensing is a reliable and accurate way to produce land-use maps, particularly over large areas. The aim of this study was to produce land-use maps and detect changes in land cover, comprising industrial, agricultural, residential and barren classes, in Zarand County, Kerman Province, for the years 1366 and 1399 SH (1987 and 2020) using satellite images. To classify land use into these four groups, three methods were used: maximum likelihood, artificial neural network and support vector machine; the support vector machine was selected as the preferred method. Overall, the land-use maps for the studied years indicate increases of 64 hectares in agricultural land, 17 hectares in urban land and 2 hectares in industrial land. The growth of the county's industrial areas and of its agricultural land are two very important factors in likely changes to cropping patterns and the agroclimate of the region, and they deserve more attention in long-term environmental planning for the region.

Keywords: Agroclimate, Artificial neural network, Land use, Support vector machine, Etm+

Investigation of land-use changes using satellite images in Zarand region, Kerman

S. Delgarm, M. Ganjalikhani, B. Bakhtiari*

Journal of Agricultural Meteorology, Volume:11 Issue: 1, 2023, PP 64 -74

Land-use changes, especially in agricultural lands, have a significant impact on the microclimate of a region, natural resource management and land-atmosphere interactions. Remote sensing is a reliable and precise technique for generating land-use maps. The aim of this study is to produce land-use maps and detect changes in land cover patterns, including urban, industrial, agricultural and fallow land, in the Zarand region, Kerman province, southern Iran, between 1987 and 2020. To classify land use into these four types, three methods were used: maximum likelihood, artificial neural network, and support vector machine. The support vector machine was found to be the best-performing method. In general, the land-use maps generated for the studied years showed increases of 64, 17 and 2 hectares in agricultural, urban and industrial land uses, respectively. The observed increases in industrial and agricultural land are important for possible changes in the cropping pattern and agroclimatic conditions of the region and need further investigation in long-term environmental management planning.

Keywords: Agro-climate, Artificial Neural Network, Etm+, Land-use, Support Vector Machine

Evaluating the performance of data mining models in rainfall prediction and drought analysis at the Bandar Abbas synoptic station

Emad Mahjoobi*, Hamid Abdolabadi, Javad Mahjoobi, Ehsan Ghafoori

Journal of Water and Irrigation Management, Volume 13, Issue 2 (Summer 1402 SH), pp. 429-499

The use of various data mining methods for drought prediction is common. However, the best model is mostly selected on the basis of simulation accuracy, while most studies pay less attention to the structural characteristics of the models. This paper evaluates the performance of a set of the most common data mining models, namely the multilayer perceptron artificial neural network (ANN-MLP), the radial basis function neural network (ANN-RBF), the regression decision tree (CART), the model tree (M5P) and the support vector machine (SVM), for predicting rainfall one year ahead at the Bandar Abbas synoptic station, and describes the characteristics of each. The models were calibrated and validated using raw data and the three-year moving average of climatic parameters over the record period 1347 to 1396 SH. Model performance was assessed using various statistical measures and comparative plots. The results showed that the SVM and M5P models perform well in predicting rainfall, with RMSE values of 7.93 and 8.31 mm, MAE values of 3.66 and 4.69 mm, and correlation coefficients of 0.83 and 0.82, respectively. In addition, with the exception of CART, changing the data mining tool makes a difference of 8 to 11 percent in the accuracy of the estimates; the more suitable model should therefore be chosen on the basis of other characteristics of the methods alongside their accuracy. Moreover, using the three-year moving average increased the correlation coefficient by about 78 percent and reduced RMSE by about 63 percent on average. Analysis of the long-term drought situation showed that as the period length of the standardized precipitation index increases, wet and dry years are separated more clearly.

Keywords: Decision tree, Standardized precipitation index, Artificial neural network, Support vector machine

Investigating the Performance of Data Mining Models in Rainfall Forecasting and Drought Analysis of Bandar Abbas Synoptic Station

Emad Mahjoobi *, Hamid Abdolabadi, Javad Mahjoobi, Ehsan Ghafoori

Journal of Water and Irrigation Management, Volume:13 Issue: 2, 2023, PP 429 -499

Data mining methods are commonly used in drought prediction. However, the best model is usually selected on the basis of simulation accuracy, while most studies pay little attention to the structural features of the models. This paper evaluates the performance of the most common data mining models, including the Multilayer Perceptron Artificial Neural Network (ANN-MLP), Radial Basis Function Neural Network (ANN-RBF), Regression Decision Tree (CART), Model Tree (M5P), and Support Vector Machine (SVM), in predicting monthly rainfall one year ahead at the Bandar Abbas synoptic station, and then describes the characteristics of each. The models were calibrated and validated using raw data and a three-year moving average of climatic parameters from 1347 to 1396 (Solar Hijri calendar). Model performance was evaluated using different statistical indices and comparative diagrams. The results showed that the SVM and M5P models predict rainfall well, with RMSE of 7.93 and 8.31 mm, MAE of 3.66 and 4.69 mm, and correlation coefficient (CC) of 0.83 and 0.82, respectively. With the exception of CART, changing the data mining tool makes a difference of 8 to 11 percent in the accuracy of the estimates. The most appropriate model should therefore be selected on the basis of other characteristics of the methods in addition to their accuracy. Using the three-year moving average of the input parameters increased the correlation coefficient by about 78 percent and reduced the RMSE by about 63 percent. The analysis of the long-term drought situation showed that as the period of the standardized precipitation index increases, wet and dry years are separated more clearly.
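The smoothing step and the three accuracy measures named above can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: the abstract does not say whether the three-year window is trailing or centered (a trailing window is assumed here), and the sample rainfall values are made up.

```python
import math


def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points produce no output."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def correlation(observed, predicted):
    """Pearson correlation coefficient (CC)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Made-up annual rainfall values (mm), for illustration only.
rain = [10.0, 20.0, 30.0, 40.0, 50.0]
print(moving_average(rain))
```

Smoothing removes year-to-year noise from the inputs, which is one plausible reason the reported correlation rises and the RMSE falls when the moving average is used.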

Keywords: Artificial Neural Network, Decision Tree, Standard Precipitation Index, Support vector machine

Simulation of yield and water productivity of cucumber (Cucumis sativus L.) using artificial neural networks

Sanaz Shokri, Abdolrahim Hooshmand, Mona Golabi, Naser Alemzadeansari, Dan Struve

Iranian Journal of Irrigation and Water Engineering, Serial No. 52 (Summer 1402 SH), pp. 165-182

To simulate the yield and water productivity of cucumber (Cucumis sativus L.), an experiment was carried out as a randomized complete block design with three irrigation levels of 100, 85 and 75 percent of the crop water requirement over two growing seasons in 1397 and 1398 SH, and multilayer perceptron (MLP) neural networks and the support vector machine (SVM) method were applied. The coefficient of determination, mean squared error and normalized mean squared error were then used to select the suitable and optimal model. Irrigation water amount, number of leaves per plant, temperature, evaporation and relative humidity were chosen as input data, and 60, 20 and 20 percent of all data were allocated to training, validation and testing of the model, respectively. The results showed that the MLP neural network with irrigation water amount and leaf number as inputs was more accurate in simulating fruit yield and water productivity of cucumber, with coefficients of determination of 0.92 and 0.86, respectively. Sensitivity analysis indicated that the irrigation water input, with sensitivity coefficients of 0.9 and 0.86 respectively, is the most important parameter affecting the models of water productivity and cucumber fruit yield.

Keywords: Perceptron, Support vector machine, Neural network, Sensitivity analysis, Deficit irrigation

Simulation of Yield and Water Productivity of Cucumber Plant Using Artificial Neural Network

Sanaz Shokri, Abdolrahim Hooshmand, Mona Golabi, Naser Alemzadeansari, Dan Struve

Irrigation & Water Engineering, Volume:13 Issue: 52, 2023, PP 165 -182

To simulate the yield and water productivity of cucumber (Cucumis sativus L.), an experiment was conducted as a randomized complete block design with three irrigation levels of 100, 85 and 75% of the water requirement in two growing seasons during 2017 and 2018, and perceptron neural network (MLP) and support vector machine (SVM) methods were applied. To select the appropriate and optimal model, the coefficient of determination, mean squared error and normalized mean squared error were used. The amount of irrigation water, number of leaves per plant, temperature, evaporation rate and relative humidity were selected as input data, and 60%, 20% and 20% of the total data were allocated to training, validation and testing of the model, respectively. The results showed that the MLP neural network with irrigation water and number of leaves as inputs was more accurate in simulating fruit yield and water productivity of cucumber, with coefficients of determination of 0.92 and 0.86, respectively. The sensitivity analysis indicated that irrigation water is the most influential input parameter for the water productivity and fruit yield models, with sensitivity coefficients of 0.9 and 0.86, respectively.

Keywords: Perceptron, Support Vector Machine, Neural Network, Sensitivity analysis, Deficit irrigation

Application of tree-based and kernel-based models for determining daily reference evapotranspiration in a humid and an arid region of Iran

Fatemeh Mikaeili, Saeed Samadianfard*

Journal of Soil and Plant Science, Volume 33, Issue 2 (Summer 1402 SH), pp. 35-51

Because Iran lies in an arid and semi-arid climate, evapotranspiration is one of the most influential components in assessing the water balance. Accurate estimation of this parameter is especially important for accurately calculating crop water requirements and, in turn, for designing and managing irrigation systems and water resources. The aim of the present study is to examine the ability of the support vector regression (SVR) model, the random forest (RF) model and the M5P tree model to predict daily reference crop evapotranspiration at the Astara and Sirjan stations, located in the humid and arid regions of Iran respectively, using meteorological data on minimum, mean and maximum temperature, relative humidity, solar radiation and wind speed for 2000-2020. Finally, the accuracy of these methods and of empirical methods in estimating daily reference crop evapotranspiration was compared using the root mean squared error, correlation coefficient, scatter index, Nash-Sutcliffe coefficient and Willmott index. The validation results showed that the SVR3 model (scenario three with support vector regression) and the M5P3 model (scenario three with the M5P tree model) at Astara station, using all meteorological parameters, with a correlation coefficient of 0.993 and a root mean squared error of 0.201, and the SVR3 model at Sirjan station, with a correlation coefficient of 0.982 and a root mean squared error of 0.410, gave better estimates of daily crop evapotranspiration than the Hargreaves-Samani, Makkink, Turc and Dalton empirical methods.

Keywords: Reference evapotranspiration, Random forest, M5P tree, Support vector regression, Empirical methods

Application of Tree and Kernel-Based Models for Estimating Daily Reference Evapotranspiration in Humid and Arid Regions of Iran

Fatemeh Mikaeili, Saeed Samadianfard *

Journal of Soil and Plant Science, Volume:33 Issue: 2, 2023, PP 35 -51

Background and Objectives

The gradual increase in the world's population requires a continuous increase in agricultural production. Climate change is one of the challenges facing society, and frequent droughts affect large areas of the world, which calls for more accurate management of water resources, both globally and in local catchments. Accurate estimation of the components of the hydrological cycle is essential for proper irrigation scheduling. Most of the precipitation received by the earth is returned to the atmosphere by evapotranspiration. Moreover, because every process in the plant depends on water, and evapotranspiration accounts for much of the plant's water use, a reduced water supply adversely affects photosynthesis, crop production, product quality and more. The complex, nonlinear relationship between the factors affecting evapotranspiration has led researchers to use new methods to identify and predict this parameter accurately. Reference evapotranspiration is combined with the crop coefficient to obtain the actual crop water requirement. FAO has proposed the FAO Penman-Monteith equation as the benchmark method for calculating reference evapotranspiration when measurements of this parameter are not available and there is no access to lysimetric data. Major advantages of this model are its physical basis and global validity, but the equation needs a large number of meteorological parameters that are often unavailable; empirical equations that need fewer meteorological variables, or modern methods such as artificial intelligence and machine learning, can be used instead.

Methodology

In this study, meteorological data from two stations, Astara in the humid region and Sirjan in the arid region of Iran, for the period 2000-2020 were used to predict reference crop evapotranspiration. The FAO Penman-Monteith method was used as the standard for calibrating and evaluating the empirical equations and machine learning methods. Four empirical equations, Hargreaves-Samani, Makkink, Turc and Dalton, were evaluated against the FAO Penman-Monteith model. Modelling was also performed using Support Vector Regression, Random Forest and the M5P tree model. 70% of the data were used for training and 30% for testing. Finally, the root mean squared error (RMSE), correlation coefficient (R), scatter index (SI), Nash-Sutcliffe coefficient (NS) and Willmott index (WI) were used to assess how well each method estimated reference evapotranspiration.
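Three of the less familiar measures listed above can be sketched as follows. The abstract does not give the formulas, so the commonly used definitions are assumed here: NS as one minus the ratio of squared errors to the variance of the observations, SI as RMSE divided by the observed mean, and WI as Willmott's index of agreement. The sample values are made up.

```python
import math


def nash_sutcliffe(observed, predicted):
    """NS = 1 - sum((o - p)^2) / sum((o - mean_o)^2); 1 is a perfect fit."""
    mean_o = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - num / den


def scatter_index(observed, predicted):
    """SI = RMSE / mean of observations (assumed definition); 0 is perfect."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def willmott_index(observed, predicted):
    """Willmott's index of agreement; 1 is a perfect fit."""
    mean_o = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - num / den


# Made-up daily reference evapotranspiration values (mm/day).
obs = [1.0, 2.0, 3.0]
constant_guess = [2.0, 2.0, 2.0]  # always predicts the observed mean
print(nash_sutcliffe(obs, constant_guess))
```

A model that always predicts the observed mean scores NS = 0, which is why NS is read as skill relative to that baseline.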

Findings

Different combinations of meteorological parameters were tested in four scenarios. Calibration was performed on 70% of the data and validation on the remaining 30% using Weka software. The results showed that, at Astara station, the SVR3 and M5P3 models, which use all meteorological parameters, achieved R = 0.993 and RMSE = 0.201, and that at Sirjan station the SVR3 model achieved R = 0.982 and RMSE = 0.410; these models estimated reference evapotranspiration better than the empirical methods studied. Scenario 3, with all meteorological parameters, was therefore identified as the best scenario. Among the empirical methods, Hargreaves-Samani outperformed some models, and only at Astara station. At Sirjan station, none of the empirical models performed better than the machine learning methods.
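For context on what an empirical method looks like, the Hargreaves-Samani equation can be sketched as below. The abstract does not state the exact form the authors used; this is the widely cited form given in FAO Irrigation and Drainage Paper 56, and the function name, argument names and sample inputs are assumptions.

```python
import math


def hargreaves_samani(t_min, t_max, ra_mj):
    """Reference evapotranspiration in mm/day.

    t_min, t_max: daily minimum and maximum air temperature (deg C).
    ra_mj: extraterrestrial radiation (MJ m^-2 day^-1).
    """
    t_mean = (t_min + t_max) / 2.0
    # 0.408 converts radiation in MJ m^-2 day^-1 to equivalent evaporation in mm/day.
    ra_mm = 0.408 * ra_mj
    # Guard against a negative range from bad input data.
    temp_range = max(t_max - t_min, 0.0)
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(temp_range)


# Illustrative inputs only, not station data from the study.
print(round(hargreaves_samani(t_min=10.0, t_max=30.0, ra_mj=40.0), 2))
```

The equation needs only temperature and a radiation term that depends on latitude and date, which is why it is a common fallback when the full FAO Penman-Monteith inputs are unavailable.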

Conclusion

Accurate estimation of reference evapotranspiration is essential in water resource management. In this study, meteorological data from Astara and Sirjan stations were used to evaluate the ability of the machine learning methods SVR, RF and M5P to estimate reference evapotranspiration, and the results were compared with empirical methods. The SVR3 model was highly accurate at both stations, followed by the M5P3 model in the humid area. Except for Hargreaves-Samani, the empirical methods performed poorly compared with the data-driven models. The use of SVR and M5P methods in irrigation scheduling is therefore recommended.

Keywords: Empirical methods, M5P, Random forest, Reference evapotranspiration, Support Vector Machine
