Search for articles related to the keyword "Outlier" in journals of the "Mathematics" group

  • Elham Eskandari, Alireza Khastan

    The imprecision related to measurements can be managed in terms of fuzzy features, which are characterized by two components: center and spread. Outliers affect the outcome of clustering models. To overcome this problem, this paper proposes a fuzzy clustering model for L-R fuzzy data, based on a dissimilarity measure between each pair of fuzzy data defined as an adaptive weighted sum of the L1-norms of the centers and the spreads. The proposed method is robust through both its metric and its weighting approach. It estimates the weight of a given fuzzy feature on a given fuzzy cluster by considering the relevance of that feature to the cluster; if outlier fuzzy features are present in the dataset, it tends to assign them weights close to 0. To investigate in depth the capability of our model, i.e., alleviating the undesirable effects of outlier fuzzy data, we provide a wide simulation study. We consider the ability to classify correctly and the ability to recover the true prototypes, both in the presence of outliers. The comparison with other existing robust methods indicates that the proposed methodology is more robust to the presence of outliers than the other methods. Moreover, the performance of our method decreases more slowly than that of the others as the percentage of outliers increases. An application of the suggested method to a real-world categorical dataset is also provided.

    Keywords: L-R fuzzy data, Robust fuzzy clustering, L1 norm, Outlier
  • Ahad Rahimpoor, Masoud Yarmohammadi*

    In the analysis of multivariate time series, outliers can lead to model misidentification, biased parameter estimates, and poor forecasts. Detecting these points is therefore very important and has attracted attention. In this study, we use a new genetic algorithm to detect outliers in multivariate time series. In addition to locating the outliers, the algorithm also identifies their type. We then introduce the Tsay, Peña and Pankratz (TPP) method and show through simulation studies that the percentage of correctly detected outliers is higher for the genetic algorithm than for the TPP method. Gas-furnace data were also analyzed and modeled, and the genetic algorithm and TPP methods were found to detect similar outliers.

    Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. The presence of outliers, however, can violate the stationarity assumption and may lead to model misspecification, biased parameter estimates, and inaccurate predictions. Detecting these points and dealing with them properly is therefore necessary, especially for modeling and parameter estimation in VARMA models. Once outliers are detected, their effects can be removed from the series to obtain modified data. Using these modified data, estimates of the VARMA model are obtained that are least affected by the outliers. Outlier detection is also important for finding external events over time; for example, finding outliers in river water monitoring data can reveal flood times. Parameter estimation for the VAR model is less time consuming than for VARMA, and, under the invertibility condition, VARMA models can be approximated by VAR(p) for large p. We therefore use the VAR model to fit and investigate data generated from VARMA models that are contaminated by outliers. Multivariate time series observations may be contaminated with different types of outliers. The effect of the different types of outliers differs between the multivariate and univariate cases, so such observations must be assessed with a multivariate approach. In this research, we use a genetic algorithm (GA) to develop a procedure for detecting different types of outliers (additive, innovation, level shift, and temporary change outliers) in a multivariate time series. The GA searches for the outlier locations that minimize an Akaike-like information criterion (AIC), which amounts to trying to "minimize the number of outliers" and "maximize the likelihood function". GA is a numerical optimization algorithm whose idea is based on natural selection and natural genetics.
    This algorithm does not require strong assumptions to obtain the optimal value of a function and is able to search for the optimal solution in a space with several local optima. For example, if a function has several relative maxima, the GA can still find its absolute maximum. To minimize a function, the GA first generates several candidate solutions, either at random or as chosen by the user; this set of solutions is called the initial population, and each solution is called a chromosome. Then, using reproductive operators, chromosomes are combined (crossover) and mutated. If the function value of the newly produced chromosomes is lower than that of the previous chromosomes, they can be added to the population or can replace less fit chromosomes in it. This process is repeated until convergence occurs or the maximum number of iterations is reached. Furthermore, we introduce another outlier detection method, the Tsay, Peña and Pankratz (TPP) method. TPP uses test statistics based on outlier sizes and VAR parameters. The method detects outliers in three stages. In stage I, outliers are detected one at a time and their effects are removed; this is iterated until no further outlier is found. In stage II, the effects of the outliers detected in stage I are estimated simultaneously, and outliers with insignificant effects are removed. The VAR parameters are then re-estimated from the modified series of this stage. In stage III, stages I and II are repeated with the new VAR parameter estimates. In each iteration of TPP, an outlier is detected and its effect is removed from the series, giving the modified series. The parameters are then estimated from the modified series, and detection of the next outlier continues using these estimates. This may lead to biased estimates and wrong detection of the next outlier.
    In other words, in the TPP method, one detected outlier may hide another outlier (masking), or may cause an ordinary observation to be flagged as an outlier (swamping). This method also often misidentifies the type of outlier. In each iteration of the GA, by contrast, a random candidate pattern of outliers is first generated, and a temporary modified series is obtained by removing the effect of this pattern from the series. The parameters are then estimated and the pattern is evaluated. This reduces the effect of previously identified outliers on the full outlier pattern. In fact, if the random pattern captures all outliers correctly, the effect of almost all of them is eliminated in the modified series. Using this temporary modified series, the GA therefore obtains more accurate estimates and detects outliers more accurately. The simulation results confirm the validity of the GA method, and its percentage of correct outlier detection is higher than that of the TPP method. The GA does, of course, need more computing time. Also, although the VAR model is used in both detection methods, the percentage of correct outlier detection for data from VARMA models is similar to that for data from VAR models. Gas-furnace data were analyzed and modeled, and the GA and TPP methods were found to detect similar outliers. Fitting a VAR(6) model to these data shows that, in the data modified by the GA relative to the data modified by TPP, the error variance of the input gas is reduced by 17% and the error variance of carbon dioxide is reduced by 43%.

    Keywords: Multivariate time series, Outlier, Detection, Genetic algorithms
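
The generic GA loop described in the abstract above (initial population, crossover, mutation, replacement of less fit chromosomes, stop after a fixed number of iterations) can be sketched on a toy problem. This is only an illustration under stated assumptions: it minimizes a one-dimensional function with several local minima, it does not encode outlier patterns or compute an AIC-like criterion, and the names `genetic_minimize` and `bumpy` and all parameter values are hypothetical.

```python
import math
import random


def genetic_minimize(f, bounds, pop_size=30, generations=200,
                     mutation_scale=0.3, seed=0):
    """Minimize f on [lo, hi] with a very small steady-state GA (toy sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial population: each chromosome is a single real number here.
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # Crossover: blend two randomly chosen parents.
        parent_a, parent_b = rng.sample(population, 2)
        w = rng.random()
        child = w * parent_a + (1.0 - w) * parent_b
        # Mutation: add a small random jump, then clip to the bounds.
        child += rng.gauss(0.0, mutation_scale)
        child = min(max(child, lo), hi)
        # Replacement: the child replaces the worst chromosome if it is better.
        worst = max(range(pop_size), key=lambda i: f(population[i]))
        if f(child) < f(population[worst]):
            population[worst] = child
    return min(population, key=f)


def bumpy(x):
    """Toy objective with several local minima (an assumption for the demo)."""
    return x * x + 3.0 * math.sin(5.0 * x)


best_x = genetic_minimize(bumpy, (-3.0, 3.0))
```

Because only the worst chromosome is ever replaced, the best value in the population never gets worse from one iteration to the next. In the setting of the abstract, a chromosome would instead describe a candidate pattern of outlier locations and types, and `f` would be the information criterion evaluated on the correspondingly modified series.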
  • Omar Salim Ibrahim, Mohammed Jasim Mohammed

    The GLS and ML methods are the most common methods for estimating structural equation models (SEM), but they require multivariate normality. Methods with robust corrections to the standard errors and to the Chi-square-based fit indexes, such as MLR, have therefore been proposed, and they are considered superior to ML and GLS when analyzing ordinal data. With a five-point Likert scale, the data are treated as continuous, and the covariance matrix is computed as the input for ML, GLS, and MLR. Outliers are common, however, because modeling requires a large sample size; they arise either from data entry or from answers that go beyond what is expressed within a particular category. Their presence affects even methods with robust corrections: the accuracy of parameter estimates, standard errors, and fit indexes may be compromised, and improper solutions may result. A robust algorithm is therefore proposed to clean the data of outliers. It computes the robust RFCH (Reweighted Fast Consistent and High Breakdown) correlation matrix, which consists of several steps and has been modified by taking the clean data before the robust RFCH correlation matrix is calculated. The three methods are compared before treatment, in the presence of outliers, to assess the impact of the outliers on each method, and again after applying the robust RFCH method, to assess the improvement in the estimates, the standard errors, and the overall fit indexes (Chi-square, CFI, TLI, RMSEA, SRMR, and CRMR), including the robust corrections to the Chi-square index for MLR. The simulation experiment demonstrates the ability of the proposed robust RFCH method to improve the quality of parameter estimates, standard errors, and overall fit indexes.

    Keywords: outlier, robust RFCH, SEM, fit indexes, estimation methods
  • Jamil Ownuk, Hossein Baghishani*, Ahmad Nezakati
    Practitioners who use classical regression models have realized that many of their assumptions seldom hold in practice, so flexible models are needed to capture the real nature of the data. The class of generalized additive models for location, scale, and shape, which covers all parameters of a distribution, is a very flexible and popular class that can account for the complexity inherent in the data. Besides providing a regression model for the different parameters of the response distribution, and not just the mean, modeling outliers is also important. When there are only a few outliers, using heavy-tailed distributions can make the model more complicated than necessary. To overcome this issue, in this paper we introduce a new location-scale semiparametric regression model built on a semi-heavy-tailed distribution, the hyperbolic secant, within the framework of generalized additive models for location, scale, and shape. We explore the performance of the proposed model in a simulation study, compare it with the classical normal model, and illustrate its use in a real application.
    Keywords: Semi-heavy-tailed distribution, Hyperbolic secant distribution, Outlier, Penalized likelihood, Location-scale regression
  • M. Mohammadi *, M. Sarmad

    Fuzzification of the support vector machine (SVM) has been used to deal with outliers and noise. This is achieved by means of a fuzzy membership function, which is generally built on the distance of the points to the class centroid. The focus of this research is twofold. First, by taking advantage of robust statistics in the fuzzy SVM, more emphasis is placed on reducing the impact of outliers on the generalizability of the SVM. Second, a variety of membership functions for elliptical data are designed, based on the classical and robust Mahalanobis distances. The minimum covariance determinant and orthogonalised Gnanadesikan-Kettenring estimators are employed in the structure of the robust fuzzy SVM. The new membership functions rectify the disadvantages of the traditional fuzzy membership function. Simulated and real benchmark data sets confirm the effectiveness of the proposed methods. Compared with the traditional SVM and fuzzy SVM, these methods perform better at reducing the effects of outliers and significantly improve classification accuracy and generalization.

    Keywords: Support vector machine, Noise, Outlier, Robust statistics, Fuzzy membership function, Minimum covariance determinant estimator, Orthogonalised Gnanadesikan-Kettenring estimator
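
A distance-based fuzzy membership of the kind mentioned in the abstract above might look like the following sketch. It is not the authors' membership function: the form 1 / (1 + d) is an assumption chosen for illustration, the data are made up, and the classical mean and covariance are used where the paper would plug in robust estimates (minimum covariance determinant or orthogonalised Gnanadesikan-Kettenring). The function names are hypothetical.

```python
import math


def mean_and_cov(points):
    """Classical mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, center, cov):
    """Mahalanobis distance in 2D, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative rounding


def memberships(points):
    """Assumed membership: 1 / (1 + d), so distant points get small weights."""
    center, cov = mean_and_cov(points)
    return [1.0 / (1.0 + mahalanobis(p, center, cov)) for p in points]


# One class with five ordinary points and one distant point (made-up data).
class_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
                (0.5, 0.5), (6.0, 6.0)]
weights = memberships(class_points)
```

In a fuzzy SVM, weights like these would scale each point's contribution to the training loss. With classical estimates the distant point also pulls the centroid and inflates the covariance, which is the kind of distortion that robust estimators are meant to limit.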
  • A. Gholam Abri
    This paper examines the relationship between data envelopment analysis and the statistical concept of an outlier. Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision making units (DMUs) that perform similar tasks in a production system, using multiple inputs to produce multiple outputs. Identifying outliers is an important issue in statistics. In this paper, we investigate how outliers can be determined by data envelopment analysis and assess the behavior of decision making units when a sample contains an outlier. We start with a literature review, then present our proposed method and discuss its strengths and weaknesses. We provide some numerical results to demonstrate the applicability of our method.
    Keywords: Data envelopment analysis (DEA), Statistics, Outlier, Efficiency, Normal Distribution, Production Possibility Set (PPS)
  • Jalal Chachi, Mahdi Roozbeh
    Parameter estimation methods for fuzzy least-squares regression are very sensitive to unusual data such as outliers. In the presence of outliers, most existing least-squares estimation methods for this kind of model yield unexpected and unreliable estimates with large errors. In this paper, a robust least trimmed squares fuzzy regression model is therefore introduced for modeling crisp-input, fuzzy-output variables. In this approach, the objective function of the parameter estimation problem is constructed so that the sum of a subset of the smallest ordered squared residuals is minimized. The method has an algorithm that searches the observations and estimates the best parameter values from different combinations of good observations selected from the data set. This reduces the effect of outlying observations on parameter estimation. Finally, the proposed model is applied to real-world data in water engineering (hydrology), which often contain outliers. The proposed method is compared with the ordinary least-squares fuzzy regression method, in which outliers and good observations have the same influence on parameter estimation. The empirical results show that, for this data set, the proposed method fits better than the ordinary fuzzy least-squares regression model. The proposed method also identifies the outlying observations that adversely affect parameter estimation.
    Keywords: Fuzzy regression, Least trimmed squares regression, Outlier, Debi, Suspended load
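
The least trimmed squares idea in the abstract above (minimize the sum of the smallest squared residuals by searching over combinations of observations) can be sketched for ordinary crisp data. This is a simplification under stated assumptions: the responses are plain numbers instead of fuzzy outputs, the model is a straight line, the subset size `h` is a hypothetical parameter name, the data are made up, and the exhaustive search over subsets is only practical for very small samples.

```python
from itertools import combinations


def ols(points):
    """Ordinary least squares line through (x, y) points: (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def lts_fit(points, h):
    """Least trimmed squares by exhaustive search over all subsets of size h."""
    best = None
    for subset in combinations(points, h):
        intercept, slope = ols(subset)
        # Objective: sum of the h smallest squared residuals over all points.
        squared = sorted((y - intercept - slope * x) ** 2 for x, y in points)
        objective = sum(squared[:h])
        if best is None or objective < best[2]:
            best = (intercept, slope, objective)
    return best


# Seven points on the line y = 1 + 2x and one outlying response at x = 7.
data = [(float(x), 1.0 + 2.0 * x) for x in range(7)] + [(7.0, 40.0)]

ols_intercept, ols_slope = ols(data)                      # pulled by the outlier
lts_intercept, lts_slope, lts_objective = lts_fit(data, h=6)
```

On this toy data the ordinary least-squares line is tilted toward the outlying point, while the trimmed fit can ignore it because the outlier's squared residual is never among the smallest `h`.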
  • Mahnaz Mirbolouki *
    Data Envelopment Analysis (DEA) is a mathematical programming technique for evaluating the efficiency of a set of Decision Making Units (DMUs). One of the problems in DEA is identifying outlier DMUs, which behave differently from the prevailing behavior of the population. Outlier DMUs, which arise from incorrect data collection or from other unknown factors (social, political, etc.), can affect the efficiency of other DMUs. Recognizing them and either excluding them from the population or reducing their effect and bringing their status into proportion with the population can therefore improve the total efficiency of the population and prevent incorrect conclusions about it. In this paper, it is assumed that the efficiency of the population must have a unimodal symmetric distribution, and a method based on the skewness of efficiency and inefficiency is presented. The important contribution of this method is that it can recognize all the outlier DMUs in different layers.
    Keywords: Data Envelopment Analysis, Outlier, Skewness Coefficient, Normal Distribution
  • F. Rezai Balf, R. Shahverdi, M. Hosseinaei
    Outliers are data that stand out distinctly from the rest of the data. Accepting or rejecting outliers depends on various factors. The objective of this paper is to explain the conditions for accepting or rejecting outliers. Studying the congestion of the outlier units is one of the ways through which these conditions can be figured out. In this method, the outliers that have congestion are identified first, and then a decision is made about accepting or rejecting them. Some examples are then discussed to give a better understanding of the proposed method. In addition, the returns to scale of the outliers are determined and discussed using some examples.
    Keywords: Outlier, Congestion, Data envelopment analysis
Note
  • Results are sorted by publication date.
  • Your keyword was searched only in the keywords field of the articles. To exclude unrelated results, the search covered only articles in journals that share a subject with the source journal.