Magiran | Keyword search: "data mining"

استفاده از روش رگرسیونی و مدل GMDH در تخمین نسبت رسوب ورودی به آبگیرهای جانبی

امیر مرادی نژاد*

مجله ترویج و توسعه آبخیزداری، پیاپی 46 (پاییز 1403)، صص 41 -51

به علت تغییراتی که در توزیع سرعت در محدوده دهانه آبگیر رخ می دهد، معمولا عمل رسوب گذاری صورت می گیرد که باعث کاهش راندمان آبگیری، افزایش هزینه های اجرایی برای عملیات رسوب زدایی و درنهایت تغییر مسیر و خط القعر رودخانه به سمت ساحل مقابل آبگیر می شود. استفاده از سازه های کنترل رسوب دیوار جداکننده در جلوی آبگیر و هم زمان آب شکن در مقابل آبگیر باعث کاهش رسوب ورودی و افزایش راندمان آبگیری می شود. در تحقیق حاضر تاثیر دیوار جداکننده و سازه آب شکن در تخمین نسبت رسوب ورودی به آبگیر به صورت آزمایشگاهی، روش های داده کاوی و رگرسیون چندگانه مورد ارزیابی قرارگرفته است. ابتدا با انجام آنالیز ابعادی، نسبت های بدون بعد استخراج و رابطه بین متغیرها و مقدار آن ها در آزمایش ها مشخص گردید. با استفاده از نرم افزارهای آماری XLSTAT و SPSS از روش گام به گام و رگرسیون استاندارد (اینتر) معادلاتی برای ارتباط بین متغیرهای مستقل و وابسته استخراج شد. بعد از به دست آوردن معادلات خطای نسبی هر معادله محاسبه شد. سپس بهترین معادله که R2 آن بالا و خطای نسبی آن پایین بود انتخاب و پیشنهاد شد. در مرحله بعد با روش های شبکه های عصبی مصنوعی روش کنترل گروهی داده ها (GMDH) مدل سازی انجام و بهترین روش در تخمین نسبت رسوب ورودی به آبگیر انتخاب شد. نتایج نشان داد که در تخمین نسبت رسوب ورودی به آبگیر بهترین عملکرد مربوط به مدل (GMDH) با شاخص های آماری 0/30,R2=0/85MAD=، 0/039RMSE= و 26/95MAPE= می باشد. در روش رگرسیون گام به گام 38/0,R2= 99/4 RMSE= و روش رگرسیون اینتر 0/76,R2= 4/16 RMSE= می باشد. هم چنین روش های داده کاوی نسبت به روش رگرسیونی دقت بالاتری دارند.

کلید واژگان: آبگیر، آب شکن، دیوار جداکننده، رسوب، داده کاوی

Using the regression method and GMDH model in estimating the ratio of input sediment to lateral Intake

Amir Moradinejad *

Journal of Extension and Development of Watershed Managment, Volume:12 Issue: 46, 2024, PP 41 -51

Changes in the velocity distribution near the intake mouth usually cause sedimentation, which reduces intake efficiency, increases the operational cost of sediment removal, and eventually shifts the river course and thalweg toward the bank opposite the intake. Using sediment control structures, namely a separating wall in front of the intake together with a spur dike on the opposite side, reduces the incoming sediment and increases intake efficiency. In this research, the effect of the separating wall and the spur dike on the ratio of sediment entering the intake was evaluated experimentally and with data mining and multiple regression methods. First, dimensional analysis was performed to extract dimensionless ratios and to determine the relationship between the variables and their values in the experiments. Using the XLSTAT and SPSS statistical packages, equations relating the independent and dependent variables were derived with the stepwise and standard (enter) regression methods. The relative error of each equation was then calculated, and the best equation, with high R2 and low relative error, was selected and proposed. Next, modeling was carried out with artificial neural network methods and the group method of data handling (GMDH), and the best method for estimating the ratio of sediment entering the intake was selected. The results showed that the GMDH model performed best, with R2=0.85, MAD=0.03, RMSE=0.039, and MAPE=26.95. Stepwise regression gave R2=0.38 and RMSE=4.99, and enter regression gave R2=0.76 and RMSE=4.16. Data mining methods were also more accurate than the regression methods.
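The four indices quoted in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes MAD means the mean absolute deviation between observed and predicted values and that R2 is the coefficient of determination computed from residuals, neither of which the abstract states. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAD, RMSE, MAPE (in percent) and R2 for paired values.

    Assumed definitions (not given in the abstract):
    MAD = mean absolute error, R2 = 1 - SS_res / SS_tot.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    errors = [o - p for o, p in zip(observed, predicted)]
    mad = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an observed value is zero.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n

    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAD": mad, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Because MAPE is a percentage of the observed value while MAD and RMSE are in the units of the measured ratio, a model can show small absolute errors and still a sizeable MAPE when the observed ratios are themselves small.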

Keywords: Aquifer, Breakwater, Separating Wall, Sediment, Data-Mining

شبیه سازی مکانی تخریب سرزمین با بهره گیری از مدل نسبت فراوانی در دشت قزوین

اعظم ابوالحسنی، غلامرضا زهتابیان، حسن خسروی*، امید رحمتی، اسماعیل حیدری علمدارلو، پائولو دی اودوریکو

مجله مدیریت بیابان، پیاپی 22 (تابستان 1401)، صص 21 -38

با وجود اینکه تخریب سرزمین به عنوان چالش محیط زیستی در سطح جهان مطرح است، بررسی های اندکی در به کارگیری روش های جدید عددی (داده کاوی و آماری) برای تعیین مناطق حساس به تخریب انجام شده است. هدف از بررسی حاضر، شبیه سازی مکانی تخریب سرزمین در دشت قزوین بهره گیری از مدل نسبت فراوانی و تعیین نواحی مستعد تخریب در این دشت است. بدین منظور با بهره گیری از روند تغییرات تولید خالص اولیه طی سال های 1399-1380، نقاط وقوع تخریب سرزمین در دشت قزوین تعیین شد و به ترتیب 70% و 30% نقاط برای تهیه نقشه قابلیت تخریب سرزمین و اعتبار سنجی مدل مورد استفاده قرارگرفت. متغیر های تاثیرگذار (مستقیم و غیرمستقیم) بر تخریب سرزمین شامل دما، بارش، شیب، جهت، ارتفاع، هدایت الکتریکی و نسبت جذب سدیم آب زیرزمینی، افت سالانه آب زیرزمینی، تراز آب زیرزمینی، کاربری اراضی، شاخص تفاوت نرمال شده پوشش گیاهی، شاخص تفاوت نرمال شده شوری، شاخص شوری خاک-پوشش گیاهی، شاخص تفاوت نرمال شده رطوبت و شاخص خشکسالی مریی و مادون قرمز کوتاه ، به عنوان عامل پیش بینی کننده (مستقل) به مدل معرفی شد. در پایان، با بهره گیری از شاخص سطح زیر منحنی (AUC) کارآیی مدل در شبیه سازی مکانی تخریب سرزمین ارزیابی شد. نقشه قابلیت تخریب سرزمین نشان داد که مناطق حساس به تخریب در قسمت های شمال شرق، شمال، شمال غرب، غرب، جنوب غرب و جنوب دشت قزوین واقع شده و بیشتر کاربری مراتع خوب، متوسط و فقیر را شامل می شود. برای کاربری اراضی، بیشترین مقدار نسبت فراوانی نیز به مجموع کاربری های مرتع خوب، متوسط و فقیر (5.66) اختصاص داشت. مقدار0.7 = AUC نیز حاکی از کارآیی مناسب مدل نسبت فراوانی در شبیه سازی مکانی تخریب سرزمین بود.

کلید واژگان: تخریب سرزمین، داده کاوی، سنجش از دور، منحنی ROC، نسبت فراوانی

Spatial Simulation of Land Degradation in The Qazvin Plain Using A Frequency Ratio Model

Azam Abolhasani, Gholamreza Zehtabian, Hassan Khosravi *, Omid Rahmati, Esmail Heydari Alamdarloo, Paolo D'Odorico

Desert Management, Volume:10 Issue: 22, 2022, PP 21 -38

Although land degradation is a worldwide challenge and a destructive phenomenon, few studies have applied new numerical methods (data mining and statistical) to the spatial simulation of this phenomenon and the identification of areas sensitive to it. The aim of this study is to spatially simulate land degradation in the Qazvin plain using the frequency ratio model and to identify areas prone to degradation. For this purpose, the points where land degradation occurred in the Qazvin plain were determined from the trend of changes in net primary production from 2001 to 2020. Approximately 70% and 30% of the points were used to prepare the land degradation susceptibility map and to validate the model, respectively. Fifteen parameters affecting land degradation (directly and indirectly) were introduced into the model as predictors (independent parameters): temperature, rainfall, slope, aspect, elevation, EC and SAR of groundwater, groundwater level, annual groundwater decline, land use, normalized difference vegetation index, normalized difference salinity index, vegetation soil salinity index, normalized difference moisture index, and visible and shortwave infrared drought index. Finally, the effectiveness of the frequency ratio model for spatial simulation of land degradation was assessed using the area under the ROC curve (AUC). The land degradation susceptibility map shows that the areas prone to degradation are located in the northeast, north, northwest, west, southwest, and south of the Qazvin plain and mainly consist of good, moderate, and poor rangelands. For the land use parameter, the highest frequency ratio (5.66) was associated with the sum of good, moderate, and poor rangeland. The value of AUC = 0.7 indicates that the frequency ratio model performs adequately in the spatial simulation of land degradation.
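A frequency ratio is commonly computed per class of a predictor as the share of event points falling in that class divided by the share of the study area the class occupies. The sketch below assumes that standard definition; the abstract does not spell out the formula, and the function name and the class counts in the usage note are hypothetical, not data from the study.

```python
def frequency_ratio(event_counts, pixel_counts):
    """Frequency ratio per class.

    event_counts: {class_name: number of degradation points in the class}
    pixel_counts: {class_name: number of pixels (area) in the class}
    FR = (events in class / all events) / (pixels in class / all pixels)
    """
    total_events = sum(event_counts.values())
    total_pixels = sum(pixel_counts.values())
    if total_events == 0 or total_pixels == 0:
        raise ValueError("event and pixel totals must be positive")

    ratios = {}
    for cls, pixels in pixel_counts.items():
        event_share = event_counts.get(cls, 0) / total_events
        area_share = pixels / total_pixels
        ratios[cls] = event_share / area_share
    return ratios
```

With made-up counts such as 60 of 100 degradation points on rangeland that covers 300 of 1000 pixels, the rangeland ratio is 0.6 / 0.3 = 2.0. A value above 1 means the class holds more degradation points than its area alone would suggest, which is how a figure like the 5.66 reported for rangeland is read.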

Keywords: land degradation, Data Mining, Remote Sensing, ROC Curve, frequency ratio

پایش داده های موثر بر گرد و غبار با استفاده از طبقه بندی شورایی ایران مرکزی

محمد هاشمی نژاد*

مجله علوم و مهندسی آبخیزداری ایران، پیاپی 57 (تابستان 1401)، صص 52 -61

امروزه استفاده از داده های ثبت شده در ایستگاه های سینوپتیک کشور یکی از مهم ترین منابع تحقیقات کاربردی برای پژوهشگران است. ایستگاه های سینوپتیک، اقلیم شناسی و غیره برای واکاوی های آماری مورد بررسی قرار می گیرند. در این تحقیق با استفاده از داده کاوی به روش طبقه بندی شورایی به پایش داده های موثر بر پدیده گرد و غبار ایستگاه های سینوپتیک ایران مرکزی پرداخته شد. در این مطالعه از داده های 36 ایستگاه سینوپتیک واقع در استان های اصفهان، کرمان، یزد، سیستان و بلوچستان، سمنان، مرکزی، خراسان رضوی، قم و خراسان جنوبی استفاده شد. پارامترهای دمای متوسط روزانه، بارش روزانه، ارتفاع ایستگاه، موقعیت جغرافیایی ایستگاه، سرعت حداکثر باد، جهت سرعت حداکثر باد، نقطه شبنم در طبقه بندی شورایی مورد استفاده قرار گرفت. همچنین مهمترین عامل موثر در بین این پارامتر ها برای گرد و غبار عامل حداکثر سرعت باد است که در همه روش های طبقه بندی به عنوان مهمترین عامل نشان داده شد. همچنین سه طبقه بند KNN، SVM با کرنل RBF و شبکه عصبی MLP به عنوان اعضای شورا انتخاب شدند که با دقت 90/7 درصد منشا تولید گرد و غبار (از منظر داخلی و خارجی بودن گرد و غبار) را به درستی تشخیص می دهد.

کلید واژگان: داده کاوی، ایستگاه های سینوپتیک، گرد و غبار، طبقه بندی شورایی، فلات مرکزی ایران

Validation of Synoptic Station Data Using Ensemble Classification on Central Iran

Mohammad Hasheminejad*

Iranian Journal of Watershed Management Science and Engineering, Volume:16 Issue: 57, 2022, PP 52 -61

Today, data recorded at the country's synoptic stations are one of the most significant sources for applied research. Data recorded automatically or manually at synoptic, climatological, and other stations are used for statistical analyses. In this research, data affecting dust events at synoptic stations in central Iran were monitored using data mining with ensemble classification. Data from 36 synoptic stations were used. These stations are in Isfahan, Kerman, Yazd, Sistan and Baluchestan, Semnan, Markazi, Khorasan Razavi, Hamedan, Qom, and South Khorasan. The parameters used for the classification were daily average temperature, daily rainfall, station elevation, geographical location of the station, maximum wind speed, direction of maximum wind speed, and dew point. The results showed that the most important of these parameters for dust is the maximum wind speed, which was identified as the most significant factor in all classification methods. Three classifiers, KNN, SVM with RBF kernel, and an MLP neural network, were selected as members of the ensemble, which correctly identified the source of the dust (domestic or external origin) with 90.7 percent accuracy.
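One simple way to combine three classifiers into an ensemble is majority voting. The sketch below assumes that scheme, which the abstract does not specify, and works on label lists already produced by the three members. The function names and the labels "internal" and "external" are illustrative only.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that receives the most votes for one sample."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def ensemble_predict(per_model_predictions):
    """Combine predictions from several models sample by sample.

    per_model_predictions: list of equal-length label lists, one per model
    (for example the outputs of a KNN, an RBF-kernel SVM, and an MLP).
    """
    return [majority_vote(sample_votes) for sample_votes in zip(*per_model_predictions)]
```

With three members and two classes a tie cannot occur, which is one practical reason for choosing an odd number of classifiers in a two-class problem.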

Keywords: Data Mining, Synoptic Stations, Dust, Ensemble Classification, Central Plateau of Iran

Mapping spatial patterns of plant species based on machine-learning and regression models

H. Keshtkar *, P. Poormohammad

Desert, Volume:27 Issue: 1, Winter - Spring 2022, PP 201 -215

Various statistical techniques have been used for species distribution modeling, which attempts to predict the occurrence of a given species from environmental conditions. The current study compared the performance of three regression-based models (multivariate adaptive regression splines, generalized additive models, and generalized linear models) with three machine-learning algorithms (random forest, artificial neural networks, and generalized boosted models). Three sets of explanatory variables (climate-only, topography-only, and combined topography-climate) were also quantified for each species (Achillea millefolium, Festuca rupicola, and Centaurea jacea), and the effect of the interaction between the predictor variables and the modeling approaches on prediction accuracy was tested. Model accuracy was evaluated using the area under the curve (AUC) of the receiver operating characteristic and the true skill statistic (TSS). Regression-based approaches, especially the generalized additive model, performed better than the machine-learning approaches. The results showed that the combined topography-climate variables were the most important for mapping potentially suitable habitats of the target species. The response curves associated with these variables indicate that there are ecological thresholds for favorable growth of all plant species studied.

Keywords: plant distribution, suitable habitats, explanatory variable, Data Mining

ارزیابی و مدل سازی پارامترهای اقلیمی موثر بر تولید سالانه گونه مرتعی ریواس (Rheum ribes) با الگوریتم های داده کاوی

مهدی بشیری*، علی ماروسی

مجله مرتع، سال چهاردهم شماره 3 (پیاپی 55، پاییز 1399)، صص 435 -451

شناخت ویژگی های اقلیمی موثر بر تولید سالانه ریواس (Rheum ribes) می تواند در مدیریت و توسعه آن در مراتع مفید واقع شود. در این پژوهش عملکرد سالانه این گونه در استان خراسان رضوی با 74 پارامتر اقلیمی طی دوره 10 ساله ارزیابی و پارامترهای اقلیمی موثر با الگوریتم های داده کاوی استخراج شد. ابتدا نقش پارامترهای اقلیمی مرتبط با درجه حرارت، رطوبت، بارندگی و ساعات آفتابی، با همبستگی و رگرسیون تحلیل شد. سپس 11 الگوریتم طبقه بندی در نرم افزار MATLAB برنامه نویسی و مقایسه شدند. نتایج نشان داد که عملکرد ریواس با میانگین دمای حداکثر تابستان، دامنه تغییرات دمای اردیبهشت تا شهریور، حداکثر دمای تابستان، میزان رطوبت نسبی و بارندگی فصل بهار همبستگی مثبت دارد. ارزیابی الگوریتم ها با شاخص های ضریب تعیین و میانگین مربع خطا نشان داد در تخمین عملکرد سالانه بر مبنای عوامل اقلیمی، روش تشخیص الگو در مرحله آزمون با ضریب تعیین 46/0 و روش های رگرسیونی، طبقه بندی ممیزی و k نزدیکترین همسایه در مرحله آموزش (ضریب تعیین برابر1) بهترین عملکرد را داشتند. با ورود عوامل موثر به روش گام به گام، رگرسیون خطی در مرحله آزمون (ضریب تعیین برابر 74/0) و روش k نزدیک ترین همسایه در مرحله آموزش با ضریب تعیین برابر 1، عملکرد ریواس را دقیق تر تخمین زدند. همچنین روش پیشنهادی K نزدیک ترین به میانگین، به ترتیب با مقادیر K برابر 6 و 7 در روش های ورود تمامی عوامل و عوامل موثر حاصل از روش گام به گام، بالاترین دقت را در تخمین عملکرد محصول داشت. لذا استفاده از روش های داده کاوی و مدل پیشنهادی، در شناسایی پارامترهای اقلیمی موثر بر گونه های مرتعی مختلف روشی کاربردی معرفی می گردد.

کلید واژگان: اقلیم، داده کاوی، رگرسیون گام به گام، عملکرد سالانه، مدل سازی

Modelling Climatic Parameters Affecting the Annual Yield of Rheum Ribes Rangeland Species using Data Mining Algorithms

Mehdi Bashiri*, Ali Maroosi

Journal of Rangeland, Volume:14 Issue: 3, 2020, PP 435 -451

Identifying the climatic characteristics that affect the annual yield of Rheum ribes can be useful in managing and developing this species in rangelands. In this research, the annual yield of this species in Khorasan-Razavi province was evaluated against 74 climatic parameters over a ten-year period, and the influential climatic parameters were extracted using data mining methods. First, the role of climatic parameters associated with temperature, humidity, rainfall, and sunshine hours was analyzed using correlation and regression. Then, 11 classification algorithms were programmed in MATLAB and compared. The results showed that Rheum ribes yield is positively correlated with the average maximum summer temperature, the temperature range from May to September, the maximum summer temperature, and the relative humidity and rainfall of the spring. Evaluation of the algorithms using the coefficient of determination and mean square error showed that, in estimating annual yield from climatic factors, the pattern recognition method performed best at the testing stage (coefficient of determination 0.46), and the regression, discriminant classification, and K nearest neighbor (KNN) methods performed best at the training stage (coefficient of determination 1). When only the influential factors selected by the stepwise method were used, linear regression at the testing stage (coefficient of determination 0.74) and K nearest neighbor at the training stage (coefficient of determination 1) estimated Rheum ribes yield more accurately. In addition, the proposed K nearest to mean (KNM) method, with K equal to 6 when all factors were entered and 7 when the stepwise-selected factors were entered, had the highest accuracy in yield estimation. The application of data mining methods and the proposed model is therefore suggested as a practical approach for identifying the climatic parameters that affect different rangeland species.

Keywords: Annual yield, Climate, Data mining, Modelling, Stepwise regression

کاربرد ویژگی های ژئومورفومتری در نقشه برداری رقومی خاک با استفاده از منطق فازی و یادگیری ماشین

اصغر رحمانی، فریدون سرمدیان*، سید روح الله موسوی، سید عرفان خاموشی

نشریه مرتع و آبخیزداری، سال هفتاد و سوم شماره 1 (بهار 1399)، صص 105 -124

روش های معمول نقشه برداری خاک وابسته به نمونه برداری متراکم، متاثر از مقیاس و دانش کارشناس می باشد، بنابراین استفاده از رویکردهای جدید داده کاوی در تهیه نقشه رقومی ویژگی های خاک برای مرتفع نمودن مشکلات روش معمول هدف اصلی این تحقیق است. در این پژوهش 62 نمونه خاک از عمق 0-20 سانتی متر براساس روش شبکه منظم (300 متر) و نظر کارشناس انتخاب و ویژگی های درصد کربن آلی، رس و کربنات کلسیم در بخشی از اراضی دیم منطقه کوهین با مساحت 370 هکتار اندازه گیری گردیدند. دو دسته داده 80 و 20 درصد به ترتیب برای واسنجی و اعتبارسنجی مدل ها انتخاب گردیدند. با استفاده از نرم افزار SAGA GIS و مدل رقومی ارتفاع با قدرت تفکیک مکانی10متر ، 19 متغیر ژیومورفومتری استخراج و براساس آنالیز تجزیه مولفه های اصلی (PCA) سه متغیر ارتفاع، شاخص موقعیت توپوگرافی و شاخص شدت پستی و بلندی و همچنین براساس نظر کارشناس، نقشه واحدهای لندفرم برای مدل سازی ویژگی ها انتخاب گردیدند. مدل جنگل تصادفی دارای دقت بالاتری بود به نحوی که نتایج آن برای ویژگی های درصد کربن آلی، رس و کربنات کلسیم بر اساس آماره های ضریب تبیین (R2) به ترتیب مقادیر 63/0، 75/0 و 63/0 و ریشه میانگین مربعات خطا (RMSE) مقادیر 17/0، 5/7، 77/5 درصد و برای رویکرد SoLIM مقادیر ضریب تبیین (R2) 47/0، 42/0 و 42/0 و مقادیر ریشه میانگین مربعات خطای 2/0، 08/8 و 68/4 درصد حاصل گردید. رویکرد جنگل تصادفی با شناخت ارتباط غیرخطی و بهینه ویژگی های خاک و متغیر های محیطی موثر می تواند نقشه های رقومی را با دقت مناسب برای مدیریت و بهره برداری پایدار از اراضی پیش بینی نماید.

کلید واژگان: جنگل تصادفی، داده کاوی، مدل استنباطی خاک-زمین نما، نقشه برداری رقومی خاک

Application of Geomorphometric attributes in digital soil mapping by using of machine learning and fuzzy logic approaches

Asghar Rahmani, Fereydoon Sarmadian *, Sayed Roholla Mousavi, Seyyed Erfan Khamoshi

Journal of Range and Watershed Management, Volume:73 Issue: 1, 2020, PP 105 -124

Conventional soil mapping depends on high-density sampling and is affected by scale and expert knowledge. The main aim of this study was therefore to use new data mining methods for digital mapping of soil properties in order to resolve the problems of conventional soil survey. In this research, 62 surface soil samples were selected based on a regular grid and expert opinion, and soil organic carbon (SOC), clay content, and CaCO3 were measured in part of the dryland area of the Kuhin region, covering 372 ha. The data set was split into 80% for calibration and 20% for validation. Nineteen geomorphometric attributes were derived in SAGA GIS from a digital elevation model with 10-meter spatial resolution. Three geomorphometric covariates, elevation (DEM), topographic position index (TPI), and terrain ruggedness index (TRI), were chosen by PCA, and the landform unit map was chosen based on expert knowledge. RStudio and SoLIM Solution software were used for random forest (RF) and fuzzy logic modelling, respectively. For SOC, clay, and CaCO3, the RF model gave R2 values of 0.63, 0.75, and 0.63 and RMSE values of 0.17, 7.5, and 5.77 percent, respectively, while the SoLIM method gave R2 values of 0.47, 0.42, and 0.42 and RMSE values of 0.2, 8.08, and 4.68 percent, respectively. Overall, by capturing the nonlinear relationships between soil properties and environmental covariates, the RF model can predict digital maps with a precision suitable for sustainable land management and use.

Keywords: digital soil mapping, Data Mining, Random forest, Soil Landscape Inference Model

ارزیابی مدل های مختلف آماری در تهیه نقشه سیل گیری استان گیلان

عیسی غلامی، مهدی وفاخواه*، سیدجلیل علوی

نشریه مرتع و آبخیزداری، سال هفتاد و دوم شماره 4 (زمستان 1398)، صص 1011 -1022

به دلیل کمبود اطلاعات در اکثر حوزه های آبخیز، بسیاری از محققین برای مطالعه های هیدرولوژیکی و سیل گیری به استفاده از تجزیه و تحلیل های مکانی در سیستم اطلاعات جغرافیایی روی آوردند. پژوهش حاضر به منظور مقایسه کارایی سه مدل ماشین بردار پشتیبان (SVM)، خطی تعمیم یافته (GLM) و جمعی تعمیم یافته (GAM) در تهیه نقشه سیل گیری استان گیلان برنامه ریزی شده است. بدین منظور لایه های اطلاعاتی درجه شیب، جهت شیب، شکل شیب، ارتفاع از سطح دریا، فاصله از رودخانه، تراکم زهکشی، زمین شناسی، کاربری اراضی، شاخص رطوبت توپوگرافی و شاخص توان آبراهه در محیط سامانه اطلاعات جغرافیایی (نرم افزارهای ArcGIS و SAGA-GIS) تهیه شدند. سپس بر اساس اطلاعات 220 نقطه سیل گیر، از 70 درصد تعداد کل نقاط به منظور واسنجی و 30 درصد باقیمانده برای اعتبارسنجی و ارزیابی کارآیی مدل ها مورد استفاده قرار گرفت. نتایج ارزیابی دقت مدل ها به ترتیب با استفاده از شاخص های سطح زیر منحنی (AUC) و کاپا (Kappa) نشان داد که از نظر شاخص سطح زیر منحنی (AUC)، مدل ماشین بردار پشتیبان (SVM) با 835/0 و مدل جمعی تعمیم یافته (GAM) با 827/0 دارای دقت خیلی خوب و مدل خطی تعمیم یافته (GLM) با 79/0 دارای دقت خوب می باشد. از نظر شاخص کاپا (Kappa) مدل ماشین بردار پشتیبان (SVM) با 58/0 داری دقت خوب، مدل جمعی تعمیم یافته (GAM) با 53/0 و مدل خطی تعمیم یافته (GLM) با 48/0 دارای دقت قابل قبول می باشند. بنابراین بر اساس شاخص های مذکور مدل ماشین بردار پشتیبان (SVM) نسبت به دو مدل دیگر در شناسایی مناطق سیل گیر کارایی بالاتری دارد. همچنین عوامل فاصله از رودخانه، ارتفاع از سطح دریا و شیب بیشترین تاثیر را بر سیل گیری منطقه مورد مطالعه دارند.

کلید واژگان: سیل گیری، داده کاوی، مدل های داده محور، مدل سازی، منحنی تشخیص عملکرد، استان گیلان

Evaluating the Different Statistical Models for Flood Susceptibility Mapping in Guilan Province

Eisa Gholami, Mehdi Vafakhah *, SeyedJalil Alavi

Journal of Range and Watershed Management, Volume:72 Issue: 4, 2020, PP 1011 -1022

Because information is lacking in most watersheds, many researchers use spatial analysis within a Geographic Information System (GIS) for hydrological and flood-prone (FP) area studies. The present study was designed to compare the efficiency of three models, Support Vector Machine (SVM), Generalized Linear Model (GLM), and Generalized Additive Model (GAM), in preparing a flood susceptibility map for Guilan province, Iran. For this purpose, slope, aspect, plan curvature, elevation, distance from the river, drainage density, geology, land use, Topographic Wetness Index (TWI), and Stream Power Index (SPI) layers were derived in GIS (ArcGIS and SAGA-GIS). Of 220 flood locations, 70% were used to calibrate the models and the remaining 30% to validate their performance. Evaluation of model accuracy using the area under the curve (AUC) and Kappa indices showed that, in terms of AUC, the SVM (0.835) and the GAM (0.827) fall in the very good class and the GLM (0.79) in the good class. In terms of the Kappa index, the SVM (0.58) falls in the good class, and the GAM (0.53) and GLM (0.48) fall in the acceptable class. Therefore, based on these indices, the SVM is superior to the other two models for identifying flood-susceptible areas.
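The Kappa index measures agreement between predicted and observed classes after removing the agreement expected by chance. The sketch below computes Cohen's kappa from a two-class confusion matrix. It assumes the study used this standard form of kappa, and the function name and any counts passed to it are hypothetical, not taken from the paper.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix.

    tp: flood predicted, flood observed
    fp: flood predicted, no flood observed
    fn: no flood predicted, flood observed
    tn: no flood predicted, no flood observed
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    observed_agreement = (tp + tn) / n
    # Chance agreement: product of the marginal proportions for each class.
    chance_yes = ((tp + fp) / n) * ((tp + fn) / n)
    chance_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected_agreement = chance_yes + chance_no

    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)
```

For instance, with 20 true positives, 5 false positives, 10 false negatives, and 15 true negatives, observed agreement is 0.70 and chance agreement is 0.50, so kappa is 0.40. This shows why a model with a fairly high hit rate can still land in a lower Kappa class than its AUC class.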

Keywords: Flood Inundation Area, Data Mining, Data Driven Models, modelling, Receiver Operating Characteristic (ROC) curve, Guilan province

تعیین پتانسیل آب زیرزمینی با استفاده از روش های داده کاوی و آماری در منطقه یاسوج-سی سخت

محمدتقی آوند، سعید جانی زاده، محسن فرزین*

نشریه مرتع و آبخیزداری، سال هفتاد و دوم شماره 3 (پاییز 1398)، صص 609 -623

با افزایش جمعیت و توسعه کشاورزی نیاز به منابع آبی به شدت افزایش یافته و منابع آب زیرزمینی، بیش از پیش، به خصوص در مناطق خشک و نیمه خشک مورد توجه بسیاری قرار گرفته است. هدف از این پژوهش تهیه نقشه پتانسیل منابع آب زیرزمینی با استفاده از دو مدل داده کاوی جنگل تصادفی (RF) و آماری رگرسیون خطی تعمیم یافته (GLM) در محدوده یاسوج-سی سخت می باشد. بدین منظور لایه های اطلاعاتی شامل درجه شیب، جهت شیب، طول شیب، ارتفاع از سطح دریا، شاخص رطوبت توپوگرافی، فاصله از گسل، فاصله از آبراهه، بارندگی، کاربری اراضی، سنگ شناسی، شاخص موقعیت توپوگرافی و شاخص قدرت جریان به عنوان مهم ترین عوامل موثر بر پتانسیل آب زیرزمینی تعیین شده و در نرم افزار ArcGIS و SAGAGIS رقومی و تهیه شدند. از پراکنش 362 چشمه موجود در سطح منطقه، 70 درصد (253 چشمه) به عنوان چشمه های آموزشی و 30 درصد (109 چشمه) به عنوان چشمه های آزمایشی استفاده گردید. نتایج نشان داد که سطح طبقات حضور آب زیرزمینی با پتانسیل کم، متوسط، زیاد و خیلی زیاد در نقشه حاصل از روش جنگل تصادفی به ترتیب 78/37، 22/22، 89/18 و 11/21 درصد و در روش رگرسیون خطی تعمیم یافته به ترتیب 49/14، 04/32، 11/31 و 36/22 درصد می باشد. همچنین با حساسیت سنجی عوامل موثر در هر دو روش، عامل های بارندگی، ارتفاع از سطح دریا و فاصله از گسل حساس ترین عوامل تعیین شدند. ارزیابی دقت مدل های داده کاوی مورد استفاده در این تحقیق نیز با استفاده از منحنی عملکرد نسبی (ROC) مورد سنجش قرار گرفت. سطح زیر منحنی (AUC) برای دو مدل RF و GLM به ترتیب 92 % و 65 % درصد را نشان می دهد، بنابراین دقت مدل جنگل تصادفی در تهیه نقشه پتانسیل آب زیرزمینی در منطقه مورد مطالعه بیشتر از مدل رگرسیون خطی تعمیم یافته است. مدل های نوین داده کاوی و آماری در تلفیق با GIS برای پتانسل یابی منابع آب زیرزمینی می تواند برای مدیریت پایدار، مورد توجه طراحان و تصمیم گیران طرح های توسعه ای واقع گردد.

کلید واژگان: پتانسیل یابی منابع آب زیرزمینی، داده کاوی، جنگل تصادفی، رگرسیون خطی تعمیم یافته، یاسوج-سی سخت

Groundwater Potential Determination on Yasouj-Sisakht area Using Random Forest and Generalized Linear Statistical Models

Mohammad Taghi Avand, Saeed Janizadeh, Mohsen Farzin *

Journal of Range and Watershed Management, Volume:72 Issue: 3, 2019, PP 609 -623

With population growth and agricultural development, the demand for water resources has risen sharply, and groundwater resources are therefore receiving increasing attention, especially in arid and semi-arid regions. The aim of this research is to map the potential of groundwater resources in the Yasouj-Sisakht region using the Random Forest (RF) data mining method and the Generalized Linear Model (GLM) statistical method. For this purpose, information layers including slope, aspect, slope length, elevation, topographic wetness index (TWI), distance from fault, distance from the stream, rainfall, land use, lithology, topographic position index (TPI), and stream power index (SPI) were identified as the main factors influencing groundwater potential and were prepared in ArcGIS and SAGAGIS software. Of the 362 springs distributed across the area, 70% (253 springs) were used for training and 30% (109 springs) for testing. The results showed that the areas of the low, medium, high, and very high groundwater potential classes were 37.78, 22.22, 18.89, and 21.11%, respectively, in the random forest map and 14.49, 32.04, 31.11, and 22.36%, respectively, in the generalized linear model map. Sensitivity analysis showed that rainfall, elevation, and distance from fault were the most sensitive factors in both methods. The accuracy of the models used in this research was evaluated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC) was 92% for the RF model and 65% for the GLM model. The RF model is therefore more accurate than the GLM model in mapping groundwater potential in the study area.
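The area under the ROC curve equals the probability that a randomly chosen positive location (for example a spring) receives a higher model score than a randomly chosen negative location. The sketch below uses that pairwise interpretation. It is an illustration rather than the procedure used in the paper; the function name and the scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n * m) and intended for small samples only.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.92 and 0.65 describe a large difference in how well two models rank spring locations above non-spring locations.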

Keywords: Mapping potential, Data Mining, Random forest, Generalized linear model, Yasouj-Sisakht

Application of Satellite Data and Data Mining Algorithms in Estimating Coverage Percent (Case study: Nadoushan Rangelands, Ardakan Plain, Yazd, Iran)

Zinab Mirshekari, Majid Sadeghinia *, Saeideh Kalantari, Maryam Asadi

Journal of Rangeland Science, Volume:9 Issue: 4, Autumn 2019, PP 333 -350

Assessing and monitoring rangelands in arid regions are essential tasks for managing them. Nowadays, satellite images are used as a relatively economical and fast way to study vegetation at a variety of scales. This research aims to estimate the coverage percent using digital data from the Landsat ETM+ sensor. In late May and early June 2018, the vegetation was measured in Ardakan plain, Yazd province, Iran. Information was obtained from 320 plots along 40 transects, and satellite images corresponding to the sampling time were downloaded from the USGS website and processed. Sixteen indices, NDVI, NIR, MSI, SS, IR1, MIRV1, NVI, TVI, RAI, SAVI, LWC, PD322, PD321, PD312, PD311, and IR2, were calculated. After the index values were extracted, six data mining models were applied to make index-based predictions: Artificial Neural Network (ANN), K Nearest Neighbor (KNN), Gaussian Process (GP), Linear Regression (LR), Support Vector Machine (SVM), and Decision Tree (DT M5). Model assessment indicated that vegetation can be estimated efficiently from the indices, and the KNN model, with Root Mean Square Error (RMSE= 2.520) and Coefficient of determination (R2= 0.94) and (RMSE= 2.872 and R2= 0.96), had the highest accuracy in the training and data sets, respectively. In addition, to determine the weight and importance of the parameters in estimating the coverage percent, a weighting process was conducted based on the support vector machine. The weighting results indicated that the KNN model and the Simple Subtraction (SS) index had the highest weight and importance in terms of vegetation percent.

Keywords: Coverage percent, Data mining, Remote Sensing Indices, ETM+ sensor

تعیین آستانه موثرترین عوامل بر افزایش طول خندق ها با استفاده از الگوریتم های داده کاوی و درخت تصمیم CART (بررسی موردی: حوزه آبخیز قاضیان، استان فارس)

سید مسعود سلیمان پور*، بهرام هدایتی، مجید صوفی، محمد جواد روستا، صمد شادفر

نشریه مرتع و آبخیزداری، سال هفتاد و دوم شماره 2 (تابستان 1398)، صص 409 -426

یکی از روابط مهم در فرسایش خندقی، بررسی آستانه های ایجاد و گسترش این فرسایش است. این روابط کمک می کند تا بتوان با شناخت دقیق، راهکار مناسبی را پیش بینی نمود و از تخریب اراضی به نحو مطلوب جلوگیری به عمل آورد. طی دهه اخیر، ظهور دانش های نوین در تعیین رابطه بین متغیرها موجب توسعه روش های پیش بینی در علوم مختلف شده است و در نتیجه، بررسی قابلیت استفاده از آن ها در مباحث فرسایش و حفاظت خاک، ضروری است. همچنین با توجه به این که لازم است به منظور کنترل فرسایش خندقی، مکانیسم رشد و گسترش ابعاد خندق ها، به ویژه رشد طولی آن ها، به دقت بررسی و شناخته شود، به این منظور، پژوهش حاضر اقدام به تعیین آستانه موثرترین عوامل بر افزایش طول خندق ها با استفاده از الگوریتم های داده کاوی K-Means و درخت تصمیم CART در حوزه آبخیز قاضیان واقع در شمال استان فارس نموده است. نتایج این پژوهش که شامل اندازه گیری متغیرهای مختلف خندق ها در عملیات میدانی و آزمایشگاهی و استفاده از تکنیک های داده کاوی است نشان داد افزایش طول خندق ها در این منطقه، تابع عوامل مساحت آبخیز گسترش، هدایت الکتریکی عصاره اشباع، شیب پیشانی، درصد پوشش گیاهی و نسبت جذبی سدیم می باشد. توصیه می شود در کاهش گسترش طولی خندق ها و تولید رسوب، به کنترل فرسایش در پیشانی آن ها توجه بیشتری شود. همچنین اصلاح خاک های این منطقه به کمک اصلاح کننده ها و احیای پوشش گیاهی سازگار و افزایش ماده آلی خاک، در اولویت اقدامات موثر در کنترل گسترش طولی خندق ها قرار گیرد.

کلید واژگان: آستانه، خندق، داده کاوی، درخت تصمیم، طول، فرسایش

Determining the threshold of the most effective factors on increasing the length of gullies using data-mining algorithms and CART Decision Tree (Case study: Ghazian Watershed in Fars Province)

Seyed Masoud Soleimanpour *, Bahram Hedayati, Majid Soufi, Mohammad Javad Rousta, Samad Shadfar

Journal of Range and Watershed Management, Volume:72 Issue: 2, 2019, PP 409 -426

One of the important relations in gully erosion is the threshold for the initiation and expansion of erosion. Over the last decade, the emergence of new techniques for determining relationships between variables has led to the development of prediction methods in different sciences, so it is essential to investigate whether these methods can be used in erosion and soil conservation studies. In addition, to control gully erosion, the mechanism by which gullies grow and expand, especially in length, has to be carefully determined. For this purpose, the present study aimed to determine the thresholds of the most effective factors in increasing gully length, using the K-Means data mining algorithm and the CART decision tree in the Ghazian watershed in the north of Fars province. The results of this study, which involved measuring various gully variables in the field and in the laboratory and applying data mining techniques, showed that the increase in gully length in this area depends on the drainage area above the headcut, the electrical conductivity of the saturated extract, the headcut slope, the vegetation cover percentage, and the sodium adsorption ratio. It is recommended that more attention be paid to controlling erosion at the gully headcuts in order to reduce gully lengthening and sediment production. Improving the soils of this area with soil amendments, restoring compatible vegetation, and increasing soil organic matter should also be treated as priority actions for controlling the lengthening of gullies.

Keywords: threshold, gully, data mining, decision tree, length, erosion

تعیین موثرترین فاکتورهای کیفیت آب آشامیدنی با استفاده از روش داده کاوی QUEST (مطالعه ی موردی: شهرستان پاسارگاد استان فارس)

سید مسعود سلیمان پور *، سید حمید مصباح، بهرام هدایتی

مجله ترویج و توسعه آبخیزداری، پیاپی 21 (تابستان 1397)، ص 21

امروزه بحث کیفیت آب در بسیاری از مناطق جهان به عنوان یکی از مباحث کلیدی مطرح است؛ زیرا این امر ارتباط بسیاری با سلامتی انسان و نقش بسیار مهمی در مدیریت و بهره برداری از منابع دارد. به این منظور پژوهش حاضر برای اولین بار نسبت به تعیین موثرترین فاکتورهای کیفیت آب آشامیدنی با استفاده از روش داده کاوی در شهرستان پاسارگاد واقع در 105 کیلومتری شمال شرقی شیراز اقدام نموده است. نتایج این تحقیق که از مدل سازی با بهره گیری از درخت تصمیم QUEST در نرم افزار Clementine (نسخه ی 12)، حاصل شده است، نشان داد که موثرترین فاکتورهای کیفیت آب آشامیدنی در این منطقه، تابع سختی کل (TH) و هدایت الکتریکی (EC)، می باشد. بدین ترتیب، در صورتی که سختی کل (TH) در این شهرستان کمتر از 282/232 قسمت در میلیون، و هدایت الکتریکی (EC) آن، کمتر از 822/787 میکروموهس بر سانتی متر باشد، این آب مناسب آشامیدن می باشد. بنابراین توصیه می شود به اقدامات تصفیه و کاهش سختی آب جهت مصارف شرب (انسان) توجه گردد و انجام پایش های مستمر در قالب نمونه برداری های دوره ای منظم از منابع آب در این شهرستان در دستور کار قرار گیرد.

کلید واژگان: آب آشامیدنی، داده کاوی، سختی کل، کیفیت آب، هدایت الکتریکی

Determination of the most Effective Drinking Water Quality Factors using QUEST Data Mining Technique (Case Study: Pasargad City, Fars Province)

Syed Masoud Solimanpour *, Seyed Hamid Mesbah, Bahram Hadayati

Journal of Extension and Development of Watershed Managment, Volume:6 Issue: 21, 2018, P 21

Nowadays, water quality is one of the key issues in most regions of the world because it is closely related to human health and plays an important role in the management and exploitation of resources. This research, carried out for the first time in Pasargad city, located 105 km northeast of Shiraz, aimed to determine the most effective drinking water quality factors using a data mining technique. The results, obtained with the QUEST decision tree in Clementine software (version 12), showed that the most effective drinking water quality factors in this region are total hardness (TH) and electrical conductivity (EC). Accordingly, if total hardness in this city is less than 282.232 parts per million and electrical conductivity is less than 822.787 micromhos per centimeter, the water is suitable for drinking. It is therefore recommended that attention be paid to treating the water and reducing its hardness for human drinking use, and that continuous monitoring through regular periodic sampling of water resources in this city be put on the agenda.

Keywords: Drinking Water, Data Mining, Total Hardness, Water Quality, Electrical Conductivity
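
The threshold rule reported in the abstract above can be written as a small check. This is only a sketch of that single stated rule, not the authors' QUEST tree, which may contain further branches; the function and constant names are hypothetical, and the units follow the abstract's wording.

```python
# Thresholds as stated in the abstract (TH in parts per million,
# EC in micromhos per centimeter). Names are illustrative assumptions.
TH_LIMIT_PPM = 282.232
EC_LIMIT_UMHOS_CM = 822.787


def is_suitable_for_drinking(th_ppm: float, ec_umhos_cm: float) -> bool:
    """Return True only when both TH and EC are below the reported thresholds."""
    return th_ppm < TH_LIMIT_PPM and ec_umhos_cm < EC_LIMIT_UMHOS_CM


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the rule.
    print(is_suitable_for_drinking(250.0, 700.0))
    print(is_suitable_for_drinking(300.0, 700.0))
```

Both conditions must hold at once, so a sample that passes on hardness but exceeds the conductivity threshold is still rejected.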

بررسی کارایی مدل های مبتنی بر هوش محاسباتی در برآورد بار معلق رودخانه ها (مطالعه موردی: استان گیلان)

مریم اسدی، علی فتح زاده *

نشریه مرتع و آبخیزداری، سال هفتاد و یکم شماره 1 (بهار 1397)، صص 45 -60

آگاهی از میزان رسوب معلق رودخانه ها یکی از مسائل اساسی در پروژه های آبی است که طراحان تاسیسات آبی همواره با آن روبرو بوده اند. با توجه به صرف هزینه و زمان طولانی جهت اندازه گیری بار معلق رودخانه ها، استفاده از منحنی های سنجه رسوب معمول ترین روش برآورد بار رسوب معلق رودخانه ها محسوب می گردد. این در حالی است که روش های نوین مبتنی بر هوش مصنوعی و داده کاوی در بسیاری از علوم مهندسی رخنه کرده است. بر همین اساس هدف اصلی این تحقیق به چالش کشیدن توانمندی روش کلاسیک برآورد بار معلق در مقایسه با برخی روش های نوظهور می باشد. ما در این پژوهش شش مدل،K نزدیک ترین همسایه، شبکه عصبی پس انتشار خطا، فرآیند گوسی، درخت تصمیم گیری M5، ماشین بردار پشتیبان و ماشین بردار پشتیبان تکاملی را انتخاب و به مقایسه آنها با مدل سنجه رسوب در هشت حوزه آبخیز واقع در استان گیلان پرداختیم. طول دوره آماری داده های ورودی به مدل ها به صورت روزانه و 30 ساله در نظر گرفته شد. ارزیابی نتایج حاصله نشان داد مدل فرآیند گوسی در مقایسه با سایر مدل ها، با کمترین مجموع مربعات باقیمانده (RMSE) (متوسط مجموع مربعات باقی مانده= 05/37 در هشت حوزه) و بیشترین ضریب همبستگی (r) (متوسط ضریب همبستگی 72/0 در هشت حوزه) و با بهترین ضریب ناش- ساتکلیف (متوسط 66/0 در هشت حوزه) نسبت به سایر مدل ها از کارآیی بیشتری برخوردار است. لذا استفاده از مدل های مذکور به جای روش های معمول برآورد بار معلق می تواند دقت این برآوردها را به میزان قابل ملاحظه ای بهبود بخشد.

کلید واژگان: بار رسوبی معلق، منحنی سنجه رسوب، فرآیند گوسی، داده کاوی، هوش مصنوعی

The use of computational intelligence base models in suspended sediment load estimation (Case study: Gillan province)

Maryam Asadi, Ali Fathzadeh *

Journal of Range and Watershed Management, Volume:71 Issue: 1, 2018, PP 45 -60

Knowledge of the suspended sediment rate is one of the fundamental problems in water projects, and one that water engineers have consistently faced. Wrong estimates of sediment transport lead to incorrect design and to the destruction of hydraulic systems. Because suspended sediment is difficult to measure, sediment rating curves are considered the most common method for estimating the suspended sediment load. The main purpose of this research is to challenge the capability of this method in comparison with some state-of-the-art models. In this study, we selected six computational intelligence models, namely K-nearest neighbor (KNN), artificial neural networks (ANN), Gaussian processes (GP), M5 decision trees, support vector machine (SVM) and evolutionary support vector machine (ESVM), and compared them with the sediment rating curve model in 8 basins located in Gilan province. Daily sediment and discharge data over a 30-year period were used as the input data. Evaluation of the results indicated that the Gaussian process model has the lowest root mean square error (RMSE) and the highest correlation coefficient (r) among the models compared.

Keywords: Suspended sediment load, Sediment rating curve, Gaussian process, Data mining, Artificial intelligence
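
The evaluation criteria named in this entry (RMSE, the correlation coefficient r, and the Nash-Sutcliffe coefficient mentioned in the Persian abstract) follow standard definitions. The sketch below uses those textbook formulas; it is not the authors' code, and the function names and sample values are assumptions made for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient r."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_o = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    observed_spread = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - squared_error / observed_spread


if __name__ == "__main__":
    # Made-up sediment loads, used only to exercise the functions.
    obs = [1.0, 2.0, 3.0]
    pred = [2.0, 3.0, 4.0]
    print(rmse(obs, pred), pearson_r(obs, pred), nash_sutcliffe(obs, pred))
```

A constant bias, as in the sample values, leaves r at 1 while RMSE and the Nash-Sutcliffe coefficient both degrade, which is one reason such studies report more than one criterion.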

مقایسه کارایی روش های رگرسیون بردار پشتیبان و k-نزدیکترین همسایگی در برآورد میزان بار رسوبی معلق در رودخانه (مطالعه موردی: رودخانه لیقوان چای)

علی رضازاده جودی *، محمد تقی ستاری

نشریه مرتع و آبخیزداری، سال هفتادم شماره 2 (تابستان 1396)، صص 345 -358

برآورد بار رسوبی معلق رودخانه ها با توجه به خسارات ناشی از عدم توجه و لحاظ کردن آن، یکی از مهم ترین و اساسی ترین چالش های مطالعات انتقال رسوب و مهندسی رودخانه می باشد. با توجه به اهمیت و نقش رسوب در طراحی و نگهداری سازه های هیدرولیکی همچون سدها و همچنین برنامه ریزی جهت استفاده بهینه از منابع آبی در پایین دست رودخانه ها و حفظ منابع مغذی بالادست آن ها، همواره تلاش های بسیاری در زمینه تخمین میزان بار رسوبی معلق رودخانه ها انجام گرفته و روش های متعددی در این زمینه توسعه یافته است. اما با توجه به هزینه بر بودن اکثر روش ها و یا عدم دقت کافی در اکثر روش های تجربی مرسوم، نیاز به روش نوینی که بتواند بار رسوبی معلق رودخانه را با بیشترین دقت ممکن تخمین زند، امری ضروری به نظر می رسد. در این مطالعه میزان بار رسوبی معلق رودخانه لیقوان چای توسط روش های رگرسیون بردار پشتیبان و k-نزدیک ترین همسایگی برآورد گردیدند. نتایج نشان دهنده عملکرد مناسب هر دو روش داده کاوی بررسی شده در این تحقیق می باشد. از میان روش های بررسی شده در این تحقیق، روش رگرسیون بردار پشتیبان میزان بار رسوبی معلق رودخانه لیقوان چای را با ارائه مقادیر ضریب همبستگی برابر با 959/0 و ریشه میانگین مربعات خطا برابر با 547/43 (تن در روز) با دقت بیشتری نسبت به روش k-نزدیک ترین همسایگی پیش بینی کرد.

کلید واژگان: بار رسوبی معلق، داده کاوی، رگرسیون بردار پشتیبان، لیقوان چای، k-نزدیک ترین همسایگی

Comparison of the Efficiency of Support Vector Regression and K-Nearest Neighbor Methods in suspended sediment load Estimation in river (Case Study: Lighvan Chay River)

Ali Rezazadeh Joudi *, Mohammad Taghi Sattari

Journal of Range and Watershed Management, Volume:70 Issue: 2, 2017, PP 345 -358

Estimation of the suspended sediment load is one of the most important and fundamental challenges in sediment transport and river engineering studies, because of the damage caused when it is neglected. Given the importance and role of sediment in the design and maintenance of hydraulic structures such as dams, in planning for the efficient use of water resources downstream, and in conserving nutrients upstream, many efforts have been made to estimate the suspended sediment load, and numerous methods have been developed for this purpose. However, because most procedures are costly and most common empirical methods lack adequate accuracy, a new method that can estimate the suspended sediment load with the greatest possible precision appears necessary. In this study, the suspended sediment load of the Lighvan Chay River was estimated with the support vector regression and k-nearest neighbor methods. The results indicated that both data mining techniques explored in this study perform acceptably in estimating the suspended sediment load. Of the two, the support vector regression method, with a correlation coefficient (CC) of 0.959 and an RMSE of 43.547 ton/day, estimated the suspended sediment load of the Lighvan Chay River more accurately than the k-nearest neighbor method.

Keywords: k-Nearest neighbors, Lighvan Chay River, Data Mining, Support vector Regression, Suspended sediment load

استفاده از مدل داده کاوی CANFIS در تخمین ظرفیت تبادل کاتیونی برخی خاک های مناطق خشک و نیمه خشک

فریدون سرمدیان، علی کشاورزی

نشریه مرتع و آبخیزداری، سال شصت و نهم شماره 2 (تابستان 1395)، صص 397 -410

داده کاوی این فرصت را فراهم می کند تا داده های موجود از خاک، به مناطق دور از دسترس تعمیم داده شوند و داده های خاک را در طیفی از مقیاس ها، متراکم کرده و یا گسترش داد؛ بطوریکه می توان آن را به عنوان یکی از دستاوردهای با ارزش در جهت کمک به تصمیم گیری صحیح مدیران اجرایی تلقی نمود. ظرفیت تبادل کاتیونی (CEC) یکی از مهمترین ویژگی های شیمیایی خاک هاست که توانایی خاک در ذخیره عناصر غذایی و یا عناصر آلاینده در خاک را نشان می دهد. اندازه گیری CEC خاکهای مناطق گرم و خشک با دارا بودن خصوصیاتی مانند مواد آلی پایین و کانی شناسی خاص، به روش های معمول سخت و زمان بر است، لذا در این تحقیق از روش CANFIS (Coactive Neuro-Fuzzy Inference Systems) جهت برآورد ظرفیت تبادل کاتیونی برخی خاکهای مناطق خشک و نیمه خشک استفاده گردید. در این تحقیق از 58 نمونه خاک (بانک داده های خاک هدف) موجود در پایگاه داده های خاک (444 نمونه خاک مرجع) به نسبت 5:8 استفاده شد. به منظور بررسی همراستایی در داده ها، همبستگی بین متغیر های مستقل مورد بررسی قرار گرفت و با استفاده از روش رگرسیونی حذف پیشرو، مهمترین و تاثیرگذارترین مولفه های ورودی بر نتایج خروجی، انتخاب گردید. نتایج این تحقیق نشان داد که روش CANFIS دارای قابلیت و کارایی زیادی در تخمین ظرفیت تبادل کاتیونی خاک با استفاده از ویژگی های زودیافتی مانند بافت خاک، ماده آلی و تصاویر ماهواره ای می باشد.

کلید واژگان: پایگاه داده های خاک، داده کاوی، ظرفیت تبادل کاتیونی، ویژگی های زودیافت خاک، CANFIS

Application of CANFIS Model in Prediction of Soil Cation Exchange Capacity in Some Arid and Semi-Arid Regions of Iran

Fereydun Sarmadian, Ali Keshavarzi

Journal of Range and Watershed Management, Volume:69 Issue: 2, 2016, PP 397 -410

Data mining makes it possible to generalize existing soil data to remote areas and to upscale or downscale soil data across a range of scales, which facilitates the decision-making process of executives. Cation Exchange Capacity (CEC) is one of the most important parameters in soil databases and shows the ability of a soil to retain nutrients and pollutants. Because of the low organic matter and specific mineralogy of soils in arid and semi-arid regions, measuring their CEC is time consuming and expensive. The objective of this study was to evaluate the Coactive Neuro-Fuzzy Inference System (CANFIS) for predicting CEC in soils of arid and semi-arid regions. A total of 85 soil samples from the target area were selected from a database of 440 soil samples (the available reference database), with a ratio of 1:5. A correlation test was conducted to assess collinearity among the independent variables. A forward regression model was used to determine the most important and influential input parameters. The results indicated the reliability and high performance of the CANFIS approach in estimating CEC from easily measurable characteristics such as soil texture, organic matter, and satellite images.

Keywords: soil database, data mining, CEC, easily measurable characteristics, CANFIS

پیش بینی جریان روزانه رودخانه اهرچای با استفاده از مدل قوانین M5 و مقایسه آن با شبکه های عصبی مصنوعی المانی (ENN)

مهندس محمدرضا عبدالله پورآزاد، محمدتقی ستاری، رسول میرعباسی نجف آبادی

مجله علوم و مهندسی آبخیزداری ایران، پیاپی 33 (تابستان 1395)، صص 11 -18

برآورد صحیح آبدهی رودخانه ها یکی از موارد مهم در پیش بینی خشکسالی، سیلاب، طراحی سازه های آبی، بهره برداری از مخازن سدها و کنترل رسوب می باشد. از این رو متخصصان علوم مهندسی آب جهت برآورد دقیق جریان، از روش های هوشمند مانند شبکه های عصبی مصنوعی و روش های مختلف داده کاوی بهره گرفته اند. در این مطالعه، جهت پیش بینی جریان روزانه رودخانه اهرچای، از روش های شبکه عصبی مصنوعی المانی (ENN) و قوانین درختی M5 بهره گرفته شد. بدین منظور از داده های جریان روزانه ایستگاه هیدرومتری اورنگ واقع بر رودخانه اهرچای در استان آذربایجان شرقی برای مدل سازی استفاده شد. نتایج حاصل از پیش بینی جریان در یک روز بعد نشان داد که گرچه روش ENN در بهترین سناریو با ساختار شبکه نسبتا پیچیده 1-3-9 که بیان گر 9 گره در لایه ورودی، 3 گره در لایه پنهان و یک گره در لایه خروجی با 90/0R2=، (m3/s)028/0RMSE= و (m3/s)001/0MAE= از دقت بیش تری برخوردار است. اما روش قوانین M5 تنها با دو پارامتر جریان در روز جاری و یک روز قبل به عنوان ورودی، با 83/0 R2=، (m3/s)734/0RMSE= و (m3/s)317/0 MAE= علاوه بر سادگی، از دقت قابل قبولی نیز برخوردار بوده است. مقایسه عملکرد دو مدل نشان داد، گرچه شبکه عصبی المانی دارای دقت بالاتری نسبت به روش M5 می باشد، ولی روش M5 با توجه به ارائه قوانین کارآمد و ساده اگر-آنگاه و روابط خطی ساده برای پیش بینی جریان و نیز تعداد پارامتر ورودی موردنیاز کم تر، می تواند بعنوان یک روش جایگزین مناسب بکار گرفته شود.

کلید واژگان: داده کاوی، شبکه عصبی المانی، مدل درخت تصمیم، مدل سازی

Daily Discharge Forecast of Aharchay River using M5 Model Trees and Its Comparing with Elman Neural Networks (ENN)

Eng. Mohammad Reza Abdollah Pourazad, Dr. Mohammad Taghi Sattari, Dr. Rasoul Mirabbasi

Iranian Journal of Watershed Management Science and Engineering, Volume:10 Issue: 33, 2016, PP 11 -18

Accurate estimation of river discharge is important for forecasting droughts and floods, designing water structures, operating dam reservoirs, and controlling sediment. For this reason, water resources managers have used intelligent techniques such as Artificial Neural Networks and data mining methods such as decision trees to estimate river discharge reliably. In this study, Elman Neural Networks (ENN) and M5 model trees were used to forecast the daily discharge of the Aharchay River. Daily discharge data measured at the Orang hydrometric station on the Aharchay River were used for modeling. The results showed that, for forecasting discharge one day ahead, the ENN method gives more accurate results than the M5 model. For the best ENN scenario, which has a relatively complicated 9-3-1 structure (9 nodes in the input layer, 3 nodes in the hidden layer and 1 node in the output layer), the calculated error measures were R2=0.90, RMSE=0.028 (m3/s) and MAE=0.001 (m3/s). The corresponding values for the M5 model, which uses only two input parameters (the discharge of the current day and of the previous day), were R2=0.83, RMSE=0.734 (m3/s) and MAE=0.317 (m3/s). Comparing the two models indicated that, although the ENN approach may give more accurate results than the M5 model tree, the M5 model provides simple, understandable and applicable linear relations for forecasting daily discharge. In addition, the M5 model requires fewer input parameters than the ENN model. Thus, the M5 model tree can be used as an alternative method for forecasting daily discharge.

Keywords: Data Mining, Elman Neural Network, Decision Tree Model, Modelling
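
The two-input setup described for the M5 model (discharge on the current day and the previous day, used to forecast the next day) amounts to building lagged samples from a daily series. The sketch below shows only that data preparation step, not an M5 model tree; the function name and the sample discharge values are hypothetical.

```python
def make_lagged_samples(discharge):
    """Pair (Q[t-1], Q[t]) inputs with the Q[t+1] target for each usable day t."""
    samples = []
    # The first day has no previous value and the last day has no next value.
    for t in range(1, len(discharge) - 1):
        inputs = (discharge[t - 1], discharge[t])
        target = discharge[t + 1]
        samples.append((inputs, target))
    return samples


if __name__ == "__main__":
    # Made-up daily discharges in m3/s, used only to show the pairing.
    for inputs, target in make_lagged_samples([1.0, 1.2, 1.5, 1.4]):
        print(inputs, target)
```

A series of n daily values yields n - 2 samples, since the first and last days cannot serve as the current day.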

گزارش فنی: پیش بینی سیلاب های ساعتی رودخانه اهرچای با استفاده از روش های یادگیری ماشین

محمدتقی ستاری، محمدرضا عبدالله پورآزاد، رسول میرعباسی نجف آبادی*

مجله مهندسی و مدیریت آبخیز، سال هشتم شماره 1 (بهار 1395)، صص 115 -127

سیل یکی از حوادث طبیعی است که هر ساله خسارات بسیاری در نقاط مختلف جهان به وجود می آورد. پیش بینی دقیق سیلاب در کاهش خسارات جانی و مالی و مدیریت منابع آب از اهمیت بسزایی برخوردار است. هدف از مطالعه حاضر، مقایسه قابلیت های روش های رگرسیون ماشین بردار پشتیبان، مدل درختی M5 و مدل رگرسیون خطی در برآورد دبی سیلاب یک و دو ساعت آینده ایستگاه تازه کند در رودخانه اهرچای می باشد. داده های تاریخی دبی-اشل ساعتی ایستگاه تازه کند و 14 رویداد مهم سیل برای ایجاد مدل مورد استفاده قرار گرفت. نتایج نشان داد که روش رگرسیون ماشین بردار پشتیبان با ضریب تبیین 0.96 و جذر میانگین مربعات خطا 0.0472 (m3s-1) برای سیلاب یک ساعت بعد و R2=0.90 و RMSE=0.1596 (m3s-1) برای سیلاب دو ساعت بعد بهترین نتیجه را ارائه نمود. گرچه مدل درختی M5 دقت نسبتا کمتری نسبت به روش رگرسیون ماشین بردار پشتیبان داشت، ولی به لحاظ ارائه روابط خطی ساده و قابل فهم می تواند به عنوان یک روش کاربردی در پیش بینی دبی سیلاب های ساعتی مورد استفاده قرار گیرد.

کلید واژگان: آذربایجان شرقی، دبی، داده کاوی، رگرسیون ماشین بردار پشتیبان، مدل درختی M5

Technical Note: Hourly river flow forecast of Aharchay River using machine learning methods

Mohammadtaghi Sattari, Mohammadreza Abdollah Pourazad, Rasoul Mirabbasi Najafabadi*

Journal of Watershed Engineering and Management, Volume:8 Issue: 1, 2016, PP 115 -127

Floods are among the main natural disasters and cause serious agricultural, environmental, and socioeconomic damage in many parts of the world. Accurate estimation of river flow can play a significant role in water resources management and in protection against possible damage. This study aims to compare the abilities of Support Vector Machine (SVM), M5 model tree and Linear Regression (LR) methods in forecasting the hourly discharge of the Aharchay River. Hourly water level-discharge data and data from 14 flood events, measured at the Tazekand hydrometric station on the Aharchay River, were used for modeling. The results showed that the SVM method gives more accurate results than the M5 model and the LR method in forecasting river flow one and two hours ahead, with R2=0.96 and RMSE=0.0472 (m3s-1) for one hour ahead and R2=0.90 and RMSE=0.1596 (m3s-1) for two hours ahead. Comparing the SVM and M5 models indicated that, although the SVM approach may give more accurate results than the M5 model tree, the M5 model provides simple, understandable and applicable linear relations for forecasting hourly discharge. Thus, the M5 model tree can be used as an alternative method for forecasting hourly discharge.

Keywords: Data mining, Discharge, East Azerbaijan, M5 model trees, Support Vector Machine (SVM)

تعیین آستانه ی عوامل موثر بر گسترش طولی آبکندها با استفاده از تکنیک های داده کاوی در منطقه ی ماهورمیلاتی استان فارس

سید مسعود سلیمان پور، بهرام هدایتی، مجید صوفی، حسن احمدی

مجله علوم و مهندسی آبخیزداری ایران، پیاپی 29 (تابستان 1394)، صص 47 -56

یکی از مهم ترین انواع فرسایش آبی، فرسایش آبکندی است. این نوع فرسایش به ویژه در مناطق خشک و نیمه خشک جهان موجب تغییرات قابل ملاحظه ای در اراضی، تولید رسوب فراوان و پیامدهای زیان بار اقتصادی و اجتماعی می شود. آستانه عبارت از نقطه ای است که پس از آن رفتار سیستم تغییر می کند و می تواند به عوامل داخل و یا عوامل خارج از سیستم مرتبط باشد. یکی از سوالات مهم در مورد آبکندها این است که آستانه ی ایجاد و گسترش آن ها در چه شرایطی تامین می گردد. تعیین این شرایط از این لحاظ مهم است که در یک منطقه می توان شرایط موجود را با شرایط آستانه مقایسه و تعیین نمود که این شرایط تا چه حد به شرایط آستانه ی ایجاد و یا گسترش آبکند نزدیک است. بدین منظور پژوهش حاضر نسبت به تعیین آستانه ی عوامل موثر بر گسترش طولی آبکندها با استفاده از تکنیک های داده کاوی در منطقه ی ماهورمیلاتی واقع در جنوب غرب استان فارس اقدام نموده است. نتایج این تحقیق که از مدل سازی با بهره گیری از الگوریتم های خوشه بندی K-Means و درخت تصمیم CART در نرم افزار Clementine 12.0 حاصل شده است، نشان داد که گسترش طولی آبکندها در این منطقه، تابع عوامل نسبت جذب سدیم، اسیدیته ی خاک و مساحت خروجی می باشد. توصیه می شود اصلاح خاک های این منطقه به کمک اصلاح کننده ها، احیای پوشش گیاهی و افزایش ماده ی آلی خاک در اولویت اقدامات موثر در کنترل گسترش طولی آبکندها قرار گیرد.

کلید واژگان: آبکند، آستانه، داده کاوی، گسترش طولی

Determination of Threshold of Effective Factors on Length Expansion of Gullies Using Data Mining Techniques in Mahourmilati Region, Fars Province

S.Masood Soleimanpour, Bahram Hedayati, Majid Soufi, Hasan Ahmadi

Iranian Journal of Watershed Management Science and Engineering, Volume:9 Issue: 29, 2015, PP 47 -56

One of the most important types of water erosion is gully erosion. This type of erosion causes remarkable changes in land, abundant sediment production, and harmful social and economic consequences, especially in arid and semi-arid regions of the world. A threshold is the point after which the behavior of a system changes, and it can be related to factors inside or outside the system. One of the important questions about gullies is under what conditions the threshold for their creation and expansion is reached. Identifying these conditions is important because it makes it possible to compare the existing conditions in a region with the threshold conditions and to determine how close they are to the thresholds of gully creation or expansion. For this purpose, the current research aims to determine the thresholds of the effective factors on length expansion of gullies in the Mahourmilati region in the south-west of Fars province using data mining techniques. The findings, obtained by modeling with the K-Means clustering algorithm and the CART decision tree in Clementine software (version 12.0), indicated that the length expansion of gullies in the studied region is a function of the sodium adsorption ratio (SAR), soil acidity (pH), and output area. It is recommended that improving the soils of this region with amendments, revegetation, and increasing soil organic matter be given priority among the effective actions for controlling the length expansion of gullies.

Keywords: Gully, Threshold, Data Mining, Length Expansion

بررسی تاثیر تعدیل دامنه تغییرات داده ها بر کارایی مدل شبکه عصبی مصنوعی و درخت تصمیم رگرسیونی در پیش بینی خشکسالی

اعظم حبیبی پور، محمد تقی دستورانی، محمدرضا اختصاصی، حمیده افخمی

پژوهشنامه مدیریت حوزه آبخیز، پیاپی 3 (بهار و تابستان 1390)، صص 63 -79

خشکسالی یکی از اثرات تغییر سامانه اقلیمی است. پیش بینی خشکسالی نقش مهمی در اعمال روش های موثر مدیریت منابع آب ایفا می کند. روش های مختلفی برای مطالعه خشکسالی وجود دارد. روش تحلیل داده های بارندگی، جزء عمومی روش های تحلیل خشکسالی به شمار می رود. لذا پیش بینی دقیق و پیش از وقوع بارش می تواند شرایط را برای ارزیابی وضعیت خشکسالی فراهم نماید. هدف از این پژوهش، بررسی تأثیر پیش پردازش داده ها بر عملکرد دو مدل داده کاوی در پیش بینی خشکسالی در ایستگاه سینوپتیک یزد می باشد. در این رابطه از دو روش شبکه عصبی مصنوعی و درخت تصمیم رگرسیونی که از انواع روش های داده کاوی محسوب می شود استفاده شد و شبیه سازی ها در دو حالت کلی صورت گرفت. در حالت اول، از مقادیر اصلی برخی پارامترهای اقلیمی استفاده و میزان بارش 12 ماه پیش از وقوع پیش بینی گردید. در حالت دوم میانگین متحرک سه ساله همان داده ها به مدل معرفی و پیش بینی بر همین اساس انجام شد. در پایان برای ارزیابی دقت و درستی دو روش مورد استفاده، معیارهای آماری RMSE و R مورد استفاده قرار گرفت. یافته ها نشان داد که اعمال میانگین لغزان روی داده های اصلی به نحو چشمگیری در بهبود کارایی هر دو مدل موثر می باشد و در این شرایط هر دو روش درخت تصمیم رگرسیونی و شبکه عصبی مصنوعی در ایستگاه یزد قادرند با ضریب اطمینان بالایی میزان بارش را 12 ماه پیش از وقوع برآورد نمایند.

کلید واژگان: پیش بینی، خشکسالی، شبکه عصبی مصنوعی، درخت تصمیم رگرسیونی، داده کاوی

Evaluation of the Effects of Data range Modification on Efficiency of Regression Decision Tree and Artificial Neural Networks for Drought Prediction

A. Habibipoor, M.T. Dastorani, M.R. Ekhtesasi, H. Afkhami

Journal of Watershed Management Research, Volume:2 Issue: 3, 2011, PP 63 -79

One of the effects of climate system change is the occurrence and intensification of drought. Predicting drought conditions can play an important role in mitigating their effects and in managing the available water effectively during drought periods. Different approaches have been presented for evaluating drought. Analysis of precipitation data is the general method, since an acceptable prediction of precipitation before it occurs is necessary and effective for drought analysis. The purpose of this research is to evaluate the effect of data preprocessing on the applicability of two data mining models for drought prediction at Yazd station. In addition, during recent decades some new computer-based models have been developed for drought prediction, and in most cases they have given quite satisfactory results. In this research, precipitation, which is the main component of drought occurrence, was predicted at the Yazd synoptic meteorological station. Two data mining methods, Regression Decision Tree (RDT) and Artificial Neural Networks (ANN), were used, and simulations were carried out under two different conditions. In the first condition, the measured values of some meteorological variables were used as inputs and the amount of precipitation was predicted 12 months in advance. In the second condition, the 3-year moving average of the same data was used as the model input for predicting the precipitation amount 12 months before its occurrence. Finally, the statistical criteria R and RMSE were used to evaluate model performance under the different conditions. The results indicated that using the moving average of the data as model input considerably improved the performance of both models. Under this condition, both the RDT and ANN methods are able to predict the amount of precipitation at Yazd station 12 months before its occurrence.

Keywords: Prediction, Drought, Artificial Neural Networks, Regression Decision Tree, Data mining
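
The preprocessing step described in this entry, replacing the raw series with its moving average before modeling, can be sketched as follows. The abstract does not state the time step of the data or whether the window is trailing or centered, so the trailing window and the `window` parameter here are assumptions; for monthly data a 3-year window would correspond to 36 values.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per complete window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages


if __name__ == "__main__":
    # Made-up annual precipitation totals, used only to show the smoothing.
    print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
```

The smoothed series is shorter than the input by `window - 1` values, which matters when aligning inputs with the precipitation target 12 months ahead.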
