Data mining methods for quality control of research data; Case study of Iranian Scientific Database (GANJ)

Author(s):

Azadeh Fakhrzdaeh * , Mohammad Javad Ershadi , Mohammad Mahdi Ershadi

Article Type:

Case Study (accredited rank)

Abstract:

Research information databases and search engines are among the main resources researchers use every day. To retrieve information from these databases accurately, the data must be stored correctly. Manual data quality control is costly and time-consuming. Here we propose data mining methods for controlling the quality of a research database. To this end, the common errors observed in the database are first collected. The metadata of every record, together with its error codes, is saved in a dataset. Statistical and data mining methods are then applied to this dataset to discover patterns of errors and the relationships among them. We considered Iran's scientific information database (Ganj) as a case study. Experts defined 59 errors. Key features of every record, such as its subject, its authors' names, and the name of the university, were saved in a dataset along with its error codes. The resulting dataset contained 41021 records. Statistical methods and association rules were applied to the dataset, and the relationships between errors and their patterns of repetition were discovered. Based on our results, considering 25% of the errors in each subject covers, on average, up to 80% of the errors of all the records in that subject. All the records were also clustered using K-means clustering. Although there was some similarity between records of different subjects, no evident relationship was observed between the pattern of repetition of the errors and the subject of the records.
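The coverage figure in the abstract (a quarter of the errors accounting for most error occurrences in a subject) can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the error codes, the subjects, and the toy records are all hypothetical, and the sketch assumes that "coverage" means the share of error occurrences attributable to the most frequent error codes within a subject.

```python
from collections import Counter
from math import ceil


def coverage_of_top_errors(error_lists, fraction=0.25):
    """Share of all error occurrences due to the most frequent error codes.

    error_lists: one list of error codes per record.
    fraction: share of distinct error codes to keep (0.25 = top quarter).
    """
    counts = Counter(code for codes in error_lists for code in codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Keep at least one code, rounding the cutoff up.
    k = max(1, ceil(fraction * len(counts)))
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


def coverage_by_subject(rows, fraction=0.25):
    """Apply the coverage measure separately to each subject."""
    grouped = {}
    for subject, codes in rows:
        grouped.setdefault(subject, []).append(codes)
    return {s: coverage_of_top_errors(lists, fraction) for s, lists in grouped.items()}


# Hypothetical toy data: (subject, error codes found in one record).
rows = [
    ("engineering", ["E01", "E02"]),
    ("engineering", ["E01"]),
    ("engineering", ["E01", "E03"]),
    ("engineering", ["E01", "E02"]),
    ("engineering", ["E04"]),
    ("engineering", ["E01"]),
    ("humanities", ["E05", "E06"]),
    ("humanities", ["E05"]),
]

result = coverage_by_subject(rows)
for subject, share in result.items():
    print(f"{subject}: {share:.0%} of error occurrences covered")
```

Averaging the per-subject values returned by `coverage_by_subject` would give a single summary number of the kind reported in the abstract, under the interpretation assumed here.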

Keywords:

Data quality, Research information quality, Quality control, Clustering

Language:

Persian

Published:

Journal of Information Processing and Management, Volume 38, Issue 3, 2023

Pages:

927 to 944

https://magiran.com/p2555086

Author (3)

Ershadi, Mohammad Mahdi

MSc Graduate, Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, Tehran, Iran

Other articles by the author(s)

Multi-Objective Modeling of Green Vehicle Routing Problem Using a Hybrid Extreme Learning Machine (ELM) and Genetic Programming (GP)
Mohammad Mehdi Ershadi, Mahsa Momeni Sharifabad, Mohammad Javad Ershadi *, Amir Azizi, Samaneh Behzadipour
Iranian Journal of Supply Chain Management,
Analyzing prioritization of customers' preferences in Islamic banking system using Kano model: A case study
Zeinab Rahimi Rise, Mohammad Mahdi Ershadi *
Journal of Quality Engineering and Production Optimization, Summer-Autumn 2022

Accredited scientific journal

Journal of Information Processing and Management (پژوهشنامه پردازش و مدیریت اطلاعات)

Humanities quarterly

ISSN: 2251-8223 eISSN: 2251-8231

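Published under the name «علوم اطلاع رسانی» ("Information Sciences") until autumn 1384 SH (2005).

License holder:

Iranian Research Institute for Scientific Information and Documentation

Managing director:

Dr. Mohammad Hassanzadeh

Editor-in-chief:

Dr. Seyed Rahmatollah Fattahi

Journal phone: 021-66494980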
