Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

Abstract:
Because it bears directly on authors' rights, plagiarism detection is one of the critical problems in text mining and attracts many researchers. The issue is taken seriously in higher-education institutions. Existing language-independent tools do not yield reliable results because they ignore the specific features of each language. Given the scarcity of work on Persian and the lack of reliable plagiarism checkers for the language, a method is needed that improves the accuracy of detecting plagiarized Persian phrases. This article presents such a solution, PCP. PCP is a combined method that, in addition to word meaning and stems, handles synonyms and pluralization by applying a document tree representation built on fingerprinting the text in word 3-grams. The resulting grams are extracted from the text, hashed with the BKDR hash function, and stored as the document's fingerprint in a repository of reference-document fingerprints, against which suspicious documents are checked. The proposed PCP method is evaluated by eight experiments on seven different sets, each containing suspicious and reference documents taken from the Hamshahri newspaper website. The results indicate that, in detecting similar texts, PCP improves accuracy by 21.15 percent on average compared with the localized "Winnowing" method, and by 31.65 percent on average compared with the language-independent tool.
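The fingerprinting pipeline described in the abstract (normalize words, take word 3-grams, hash each gram with BKDR, store the hashes as a document fingerprint, compare a suspicious document against the repository) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization map is a hypothetical stand-in for PCP's Persian stemming, plural and synonym handling; the BKDR seed of 131 and the 31-bit mask are common choices, assumed here; the containment score is an assumed similarity measure; and the document tree representation and Winnowing-style window selection are not reproduced (every 3-gram hash is kept).

```python
from typing import Dict, Iterable, List, Set

# Hypothetical normalization map standing in for PCP's stemming,
# plural handling and synonym substitution (not the authors' resources).
NORMALIZE: Dict[str, str] = {
    "books": "book",    # plural -> singular stem
    "volumes": "book",  # plural synonym -> shared stem
}


def bkdr_hash(text: str, seed: int = 131) -> int:
    """BKDR string hash: h = h * seed + code point, kept to 32 bits.

    The seed value 131 is an assumption; the final mask keeps 31 bits.
    """
    h = 0
    for ch in text:
        h = (h * seed + ord(ch)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def normalize(tokens: Iterable[str]) -> List[str]:
    """Map each token to its normalized form (hypothetical map above)."""
    return [NORMALIZE.get(t, t) for t in tokens]


def word_ngrams(tokens: List[str], n: int = 3) -> List[str]:
    """Contiguous word n-grams; empty if the text has fewer than n words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fingerprint(text: str, n: int = 3) -> Set[int]:
    """Fingerprint = set of BKDR hashes of the normalized word 3-grams."""
    tokens = normalize(text.lower().split())
    return {bkdr_hash(g) for g in word_ngrams(tokens, n)}


def containment(suspicious: Set[int], reference: Set[int]) -> float:
    """Assumed score: share of the suspicious fingerprint found in the reference."""
    if not suspicious:
        return 0.0
    return len(suspicious & reference) / len(suspicious)


# Hypothetical in-memory repository of reference fingerprints, keyed by id.
repository: Dict[str, Set[int]] = {}


def index_reference(doc_id: str, text: str) -> None:
    """Store a reference document's fingerprint in the repository."""
    repository[doc_id] = fingerprint(text)


def check(text: str) -> Dict[str, float]:
    """Score a suspicious document against every stored reference."""
    fp = fingerprint(text)
    return {doc_id: containment(fp, ref) for doc_id, ref in repository.items()}
```

For example, after `index_reference("ref1", "the library lent three books to the students")`, calling `check("the library lent three volumes to the students")` scores `ref1` at 1.0, because both "books" and "volumes" normalize to the same stem and every 3-gram then matches. English words are used here only for readability; a Persian deployment would need real Persian stemming and synonym resources in place of the toy map.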
Language:
English
Published:
Journal of Artificial Intelligence and Data Mining, Volume: 4, Issue: 2, Summer-Autumn 2016
Pages:
125 to 133
magiran.com/p1556153  