Imputation of Missing Meteorological Data with Evolutionary and Machine Learning Methods (Case Study: Long-term Monthly Precipitation and Temperature of Mashhad)
Author(s):
Article Type:
Research/Original Article (accredited ranking)
Abstract:
Introduction
Temperature and precipitation are two of the main variables in meteorology and climatology, and they are basic inputs to water resource management. The length of the record plays a pivotal role in the accurate analysis of these variables. Observation data from Iran's first synoptic station, starting in 1330 (1951), are available on the Iranian Meteorological Organization website. Historical monthly precipitation and temperature records for five stations in Iran (Mashhad, Isfahan, Tehran, Bushehr, and Jask) are available from 1880, but with gaps. These data were measured by the embassies of the United States and Britain from the Qajar period onward and were recorded in the World Weather Records books. The missing monthly values fall predominantly in the World War II period (1941-1949). Because of these gaps, accurate reconstruction of the two variables is very important. The current research aimed to estimate the missing monthly temperature and precipitation values at Mashhad station. Stations in neighboring countries were selected as predictors on the basis of their distance from Mashhad, their relationship with the Mashhad record, and the completeness of their data since 1880. Monthly precipitation at Ashgabat, Sarakhs, Kooshkah, Bayram Ali, Kerki, and Repetek in Turkmenistan served as the independent variables for reconstructing the missing rainfall at Mashhad. Likewise, the temperatures at Ashgabat, Bayram Ali, Gudan, Sarakhs, and Tajan were selected to reconstruct the monthly temperature at Mashhad station. Ten multiple regression models were fitted to the monthly rainfall of Mashhad station and six to its monthly temperature; the parameters of these models were then optimized with the genetic algorithm and the ant colony algorithm. In addition, an artificial neural network (MLP) and support vector regression were implemented to simulate the monthly precipitation and temperature of Mashhad.
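The regression step described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit on the months where the target station has data, followed by prediction for the missing months; the abstract does not specify the form of the ten precipitation or six temperature models, so the function names (`fit_regression`, `impute`), the two hypothetical neighbor stations, and the synthetic values below are assumptions for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_regression(neighbors, target):
    """Least-squares fit of target = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(r) for r in neighbors]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def impute(neighbors, target):
    """Fill None entries of target using a regression fitted on complete months."""
    known = [(x, y) for x, y in zip(neighbors, target) if y is not None]
    coef = fit_regression([x for x, _ in known], [y for _, y in known])

    def predict(x):
        return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

    return [y if y is not None else predict(x) for x, y in zip(neighbors, target)]


# Synthetic monthly precipitation (mm) at two hypothetical neighbor stations.
neighbors = [(10.0, 4.0), (22.0, 9.0), (5.0, 12.0),
             (30.0, 2.0), (15.0, 7.0), (8.0, 20.0)]
# Synthetic target built as 2 + 0.5*x1 + 0.3*x2, with two months missing.
target = [8.2, 15.7, None, 17.6, None, 12.0]
filled = impute(neighbors, target)
```

Because the synthetic target is exactly linear in the two predictors, the gaps are filled with the values the generating formula would give; with real station records the fit would only approximate the missing months.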
Materials and Methods
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). The genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection and belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. The ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. It belongs to the family of ant colony algorithms within swarm intelligence and is itself a metaheuristic optimization method. Artificial neural networks are one of the main tools used in machine learning. As the “neural” part of their name suggests, they are brain-inspired systems intended to replicate the way humans learn. Neural networks consist of input and output layers and, in most cases, one or more hidden layers of units that transform the input into something the output layer can use. They are well suited to finding patterns that are too complex or too numerous for a human programmer to extract and teach the machine to recognize. In machine learning, support vector machines (SVMs, also called support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). Support vector regression (SVR), the variant used in this study, applies the same framework to the prediction of continuous values.
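As an illustration of how selection, crossover, and mutation can tune regression parameters, the following is a minimal sketch of a real-coded genetic algorithm that searches for the two coefficients of a one-predictor linear model by minimizing RMSE. It is not the configuration used in the study: the population size, the search range of -10 to 10, the blend crossover, the Gaussian mutation scale, and the names `rmse` and `genetic_search` are all assumptions chosen for brevity.

```python
import math
import random


def rmse(coef, xs, ys):
    """Root mean square error of the linear model coef[0] + coef[1] * x."""
    errs = [(coef[0] + coef[1] * x - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errs) / len(errs))


def genetic_search(xs, ys, pop_size=40, generations=60, seed=0):
    """Minimize RMSE over two coefficients with a simple real-coded GA."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    pop = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(pop_size)]
    history = []  # best RMSE seen in each generation
    for _ in range(generations):
        pop.sort(key=lambda c: rmse(c, xs, ys))
        history.append(rmse(pop[0], xs, ys))
        elite = pop[: pop_size // 4]      # selection: keep the best quarter
        children = [c[:] for c in elite]  # elitism: the best survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            # Crossover: random blend of the two parents.
            child = [w * p + (1 - w) * q for p, q in zip(a, b)]
            # Mutation: small Gaussian perturbation of every gene.
            child = [g + rng.gauss(0, 0.3) for g in child]
            children.append(child)
        pop = children
    best = min(pop, key=lambda c: rmse(c, xs, ys))
    history.append(rmse(best, xs, ys))
    return best, history


# Synthetic data generated from y = 3 + 2x; not data from the study.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0 + 2.0 * x for x in xs]
best, history = genetic_search(xs, ys)
```

Because the elite individuals are copied unchanged into the next generation, the best RMSE recorded in `history` can never increase from one generation to the next; how close the search gets to the generating coefficients depends on the assumed settings.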
Results and Discussion
In the first stage, several multiple regression models were fitted to monthly precipitation (with coefficients ranging from 0.63 to 0.81) and six to monthly temperature (0.986 to 0.993). GA and ACO were then applied to improve the accuracy of the selected regression models by optimizing their parameters. In the next stage, ANN and SVR were each used to estimate the missing monthly values. Finally, the results of the previous stages were compared using the root mean square error (RMSE), and the best-performing models were applied to determine the missing monthly temperature and precipitation values of Mashhad. The results showed that the genetic algorithm and the ant colony algorithm improve the accuracy of the missing rainfall estimates significantly more than the other methods. The lowest RMSE among the regression models is 9.8 mm. The genetic algorithm reduces this to 2.56 mm, and the ant colony algorithm to 2.559 mm.
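The comparison step can be sketched as follows: compute the RMSE of each candidate model against observed values and keep the model with the lowest error. The model names and all numbers below are made up for illustration and are not results from the study; `rmse` and `best_model` are hypothetical helper names.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(observed, predictions_by_model):
    """Return (name, rmse) for the model with the lowest RMSE."""
    scores = {name: rmse(observed, pred)
              for name, pred in predictions_by_model.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Made-up validation values (mm); not data from the study.
observed = [12.0, 30.0, 7.0, 0.0]
predictions_by_model = {
    "regression": [15.0, 26.0, 10.0, 2.0],
    "regression_ga": [12.5, 29.0, 7.5, 0.5],
}
name, score = best_model(observed, predictions_by_model)
```

RMSE is expressed in the units of the variable itself (millimeters for precipitation), which is why the errors reported in the abstract can be read directly as typical estimation errors.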
Conclusion
Comparison of the above methods for restoring temperature and precipitation shows that the evolutionary methods (GA and ACO) perform best for estimating missing monthly precipitation, while the machine learning methods (ANN and SVR) perform best for imputing missing monthly temperature.
Keywords:
Language:
Persian
Published:
Journal of water and soil, Volume: 33, Issue: 2, 2019
Pages:
361 to 377
https://magiran.com/p2004638