PHMM: Stemming on Persian Texts using Statistical Stemmer Based on Hidden Markov Model

Abstract:
Stemming is the process of finding the main morpheme of a word; it is used in natural language processing, text mining, and information retrieval systems. A stemmer extracts the stems of words. Persian stemmers fall into three main classes: structural stemmers, dictionary-based stemmers, and statistical stemmers. The precision of structural stemmers is low and the cost of dictionary-based stemmers is high; therefore, the main goal of this research was to design and implement a high-precision statistical stemmer based on a Hidden Markov Model (HMM), in order to reduce the size of the index files and increase the speed of information retrieval systems. The proposed stemmer finds the prefixes and suffixes of a word and removes them, so that the rest of the word is taken to be the stem. However, for some exceptional Persian words this procedure would mistakenly return the wrong stem. To handle them, a dictionary of Persian stems was first collected; the proposed stemmer looks each word up in this dictionary and, only if the word is not found, derives its stem with the HMM-based stemmer. The stemmer was tested on the Bijankhan corpus and the Hamshahri test collection. The results showed an increase in mean average precision and recall. The algorithm also increased the speed of the information retrieval system and decreased the size of the index files.
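The two-stage procedure described in the abstract (dictionary lookup first, HMM-based affix removal as the fallback) might look roughly like the following Python sketch. It is not the authors' implementation: the three character-level states (prefix, stem, suffix), every probability, the affix character sets, and the single dictionary entry are toy assumptions chosen for illustration, and the words are Latin transliterations ("ketabha" for "books", "tanha" for "alone") rather than Persian script. A real system would estimate the model parameters from a corpus.

```python
import math

# Hypothetical toy model: each character is tagged P (prefix), S (stem) or X (suffix).
STATES = ("P", "S", "X")
START = {"P": 0.3, "S": 0.7, "X": 0.0}           # assumed, not learned
TRANS = {                                        # left-to-right: P -> S -> X
    "P": {"P": 0.5, "S": 0.5, "X": 0.0},
    "S": {"P": 0.0, "S": 0.8, "X": 0.2},
    "X": {"P": 0.0, "S": 0.0, "X": 1.0},
}
PREFIX_CHARS = set("mi")                         # assumed characters of the prefix "mi"
SUFFIX_CHARS = set("ha")                         # assumed characters of the suffix "ha"

# Assumed exception dictionary: "tanha" ends in "ha" but is itself a stem.
EXCEPTIONS = {"tanha": "tanha"}


def emit(state, ch):
    """Toy emission probability of character ch in the given state."""
    if state == "P":
        return 0.4 if ch in PREFIX_CHARS else 0.001
    if state == "X":
        return 0.4 if ch in SUFFIX_CHARS else 0.001
    return 0.05                                  # stem: flat over all characters


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(word):
    """Return the most likely state sequence for a non-empty word."""
    scores = {s: _log(START[s]) + _log(emit(s, word[0])) for s in STATES}
    backptrs = []
    for ch in word[1:]:
        new_scores, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + _log(TRANS[p][s]))
            new_scores[s] = scores[prev] + _log(TRANS[prev][s]) + _log(emit(s, ch))
            ptr[s] = prev
        scores = new_scores
        backptrs.append(ptr)
    # Require the path to end in S or X so the stem is never empty.
    last = max(("S", "X"), key=lambda s: scores[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def hmm_stem(word):
    """Strip the characters tagged as prefix or suffix by the HMM."""
    if not word:
        return word
    return "".join(ch for ch, s in zip(word, viterbi(word)) if s == "S")


def stem(word, exceptions=EXCEPTIONS):
    """Dictionary lookup first; fall back to the HMM only on a miss."""
    return exceptions[word] if word in exceptions else hmm_stem(word)
```

With these toy parameters, `hmm_stem("tanha")` strips the final "ha" and returns "tan", which illustrates the kind of exception the dictionary is meant to catch: `stem("tanha")` returns "tanha" unchanged, while `stem("ketabha")` falls through to the HMM and returns "ketab".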
Language:
English
Published:
International Journal of Information Science and Management, Volume:14 Issue: 2, Jul-Dec 2016
Page:
107
magiran.com/p1557469  