Automatic Error Detecting in Databases, Based on Clustering and Nearest Neighbor
Author(s):
Abstract:
Data quality affects on companies decision making, so that decisions based on data without quality incur companies high costs. Data quality has various dimensions and accuracy is the most important of these dimensions. Error detection is needed for data cleaning. Due to the huge volume of data, an automatic system is needed to perform this process without user interaction. In this paper an approach is proposed based on k-means clustering for error detection. Firstly data are clustered for each attribute. Then for each data in each cluster a method similar to k-nearest neighbor is used for detecting errors. The proposed method is able to detect multiple errors in one record. Also this approach is able to detect errors in fields with various attribute types. Experimental results show that this approach can detect 91% of errors in data on average. Also the proposed approach is compared with an automatic method which detects errors based on rule in various attribute types. Experimental results show that the proposed approach has on average 25%better performance to detect errors.
Language:
Persian
Published:
Iranian Journal of Electrical and Computer Engineering, Volume:14 Issue: 4, 2017
Pages:
349 to 356
magiran.com/p1719170
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یکساله به مبلغ 1,390,000ريال میتوانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
- حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران میشود.
- پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانههای چاپی و دیجیتال را به کاربر نمیدهد.
In order to view content subscription is required
Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!