Tabriz Daily Rainfalls Modeling via Hybridized Tree Based and Seasonal-Trend Component Bagging Method
Precipitation is one of the most important components of the water cycle. Accurate precipitation measurement is essential for flood forecasting and control, drought analysis, runoff modeling, sediment control and management, watershed management, agricultural irrigation planning, and water quality studies. Determining the correct amount of precipitation in urban and rural areas is also important for managing floods. The precipitation process is highly non-linear and random in both time and space. Because it depends on many climatic factors, simple linear models describe it poorly and can produce large errors. Various methods and models have therefore been proposed to evaluate and predict precipitation. This study aimed to estimate the daily precipitation of Tabriz from the records of neighboring stations, using tree-based methods hybridized with the Bagging method.
In the present study, rainfall data from adjacent stations in the Urmia Lake basin (Sahand, Sarab, Urmia, Maragheh and Mahabad) for 1986-2021 were used to estimate the daily rainfall in Tabriz. About 70% of the data were used for calibration and 30% for validation. The input combinations were identified using the correlation matrix and the Relief algorithm. Modeling was performed using the tree-based data mining methods M5P, RT and REPT, and the Bagging method. The daily precipitation series of Tabriz was decomposed by the seasonal-trend analysis method, and its trend, seasonal and residual components were used in different input scenarios to investigate their effect on the modeling results. Modeling performance was evaluated with the correlation coefficient, Root Mean Square Error, Nash-Sutcliffe Efficiency and the modified Willmott coefficient.
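The four evaluation indices named above can be sketched in a few lines. This is a minimal illustration, assuming the modified Willmott coefficient refers to the absolute-error form of the index of agreement (often written d1); the function name `evaluate` and the argument names `obs` and `sim` are hypothetical and do not come from the study.

```python
import math

def evaluate(obs, sim):
    """Return R, RMSE, NSE and the modified index of agreement d1.

    obs: observed daily rainfall, sim: modeled daily rainfall (same length).
    Assumes neither series is constant, otherwise R and NSE are undefined.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n

    # Pearson correlation coefficient between observed and modeled values
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    r = cov / (sd_obs * sd_sim)

    # Root Mean Square Error
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    rmse = math.sqrt(sq_err / n)

    # Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean
    nse = 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in obs)

    # Modified index of agreement (absolute differences instead of squares)
    abs_err = sum(abs(o - s) for o, s in zip(obs, sim))
    potential = sum(abs(s - mean_obs) + abs(o - mean_obs) for o, s in zip(obs, sim))
    d1 = 1.0 - abs_err / potential

    return {"R": r, "RMSE": rmse, "NSE": nse, "d1": d1}
```

In a setup like the one described, `evaluate` would be called on the validation portion of the record (roughly the last 30% of the days), with `obs` holding the Tabriz observations and `sim` the model estimates.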
The RT and REPT methods increased the accuracy of the model and decreased its error when they were used as the base algorithm of the Bagging method. This was not the case with the M5P method, for which the bagged results were slightly weaker. Tabriz rainfall was also found to be largely influenced by Sahand rainfall, as most models gave reliable estimates when using the rainfall data of the Sahand station. This can be explained by the high correlation between Tabriz and Sahand rainfall. The best scenarios were the first (Sahand) for the M5P, RT, REPT and B-M5P methods, the fifth (Sahand, Sarab, Urmia, Maragheh and Mahabad) for the B-RT method, and the fourth (Sahand, Sarab, Urmia and Mahabad) for the B-REPT method. The best performance was found for scenario 1 of the M5P decision tree model, followed by the Bagging method with the M5P base algorithm. In general, the Bagging method produced reliable results.
Modeling without the decomposition components was then compared with modeling that included them. Adding the seasonal, trend and residual components to the input combinations significantly improved the accuracy of the results, and the Bagging method also increased the modeling accuracy in most cases. The best scenarios were the first (Sahand and residual) for the M5P and B-M5P methods, the tenth (residual, trend, seasonal, Sahand and Sarab) for the RT, REPT and B-REPT methods, and the eighth (residual, trend and Sahand) for the B-RT method. Among the stations, Sahand, due to its proximity and high correlation, and Sarab, due to its greater correlation, had a strong influence on the estimated precipitation in Tabriz. Overall, the Bagging method with the M5P base algorithm (B-M5P) in the first scenario performed best.
Thus, adding the precipitation decomposition components and using the Bagging method improve the modeling results of tree-based data mining methods.
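To make the trend, seasonal and residual inputs concrete, the sketch below splits a series into three additive parts. It is not the seasonal-trend procedure used in the study, whose settings are not given here: it swaps in a plain moving-average decomposition, and the function name `decompose_additive` and the `period` argument are hypothetical.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts that sum to it.

    A simple moving-average stand-in for a seasonal-trend decomposition.
    period: assumed length of one seasonal cycle, in time steps.
    """
    n = len(series)
    if period < 2 or n < period:
        raise ValueError("need period >= 2 and at least one full cycle of data")
    half = period // 2

    # Trend: moving average over about one period, truncated at the edges
    trend = []
    for i in range(n):
        window = series[max(0, i - half): min(n, i + half + 1)]
        trend.append(sum(window) / len(window))

    # Seasonal: mean detrended value at each position in the cycle,
    # shifted so that one full cycle sums to zero
    detrended = [x - t for x, t in zip(series, trend)]
    cycle_means = []
    for k in range(period):
        values = detrended[k::period]
        cycle_means.append(sum(values) / len(values))
    offset = sum(cycle_means) / period
    seasonal = [cycle_means[i % period] - offset for i in range(n)]

    # Residual: whatever the trend and seasonal parts do not explain
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

Each returned list has one value per day, so any of them can be placed next to the neighboring-station rainfall as an extra input column, in the spirit of scenarios such as "Sahand and residual".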
Our results showed that the Bagging method provided acceptable results in most cases. In the first case, without the decomposition components, the first scenario of the M5P method, which uses Sahand precipitation data, was selected as the best method and scenario. Sahand was therefore the most effective station for estimating Tabriz rainfall, having the highest correlation with Tabriz and the shortest distance from it. In the second case, with the decomposition components, the accuracy of the results increased significantly. The Bagging method with the M5P base algorithm, using Sahand precipitation and the residual of Tabriz precipitation as inputs, was the best modeling approach. It can be concluded that using the Bagging method and the decomposition components together with data from the station closest to the studied station gives the highest accuracy. Bagging models with tree-based algorithms can therefore be considered simple and widely used methods.
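The idea behind Bagging, fitting the same base learner on bootstrap resamples and averaging the predictions, can be sketched as follows. The base learner here is a one-split regression tree on a single input, chosen only to keep the example short; it is a hypothetical stand-in and not M5P, RT or REPT, and all function names and default values are assumptions.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single input feature."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # all x values equal: predict the overall mean
        mean_all = sum(ys) / len(ys)
        return (float("inf"), mean_all, mean_all)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagging(xs, ys, n_models=25, seed=0):
    """Fit n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Average the predictions of all base models."""
    return sum(predict_stump(m, x) for m in models) / len(models)
```

Here `xs` would play the role of a single predictor such as Sahand rainfall and `ys` the Tabriz rainfall; handling several inputs (for example Sahand plus the residual component) would need a base learner that splits on more than one feature.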