An Effective Method of Feature Selection in Persian Text for Improving the Accuracy of Detecting Request in Persian Messages on Telegram

Author(s):

Zahra Khalifeh Zadeh, MohammadAli Zare Chahooki*

Article Type:

Research/Original Article (accredited rank)

Abstract:

In recent years, the volume of data received from social media has increased exponentially. Social media have become valuable sources of information for many analysts and for businesses seeking to expand. Automatic document classification is an essential step in extracting knowledge from these sources. In automatic text classification, words are treated as a set of features. Selecting useful features from each text reduces the size of the feature vector and improves classification performance. Many algorithms have been applied to automatic text classification. Although the methods proposed for other languages are applicable and comparable, classification and feature selection in Persian text have not been studied sufficiently. The present research is conducted on Persian text, and the introduction of a Persian dataset is part of its contribution. This article presents an approach to improving the performance of Persian text classification. The authors extracted 85,000 Persian messages from the Idekav-system, a Telegram search engine. The idea proposed for processing and classifying this textual data is to expand the feature vector by adding selected features, chosen with the most widely used feature selection methods based on local and global filters. The new feature vector is then filtered by a secondary feature selection. This secondary phase selects the more appropriate features among those added in the first step, to enhance the effect of applying wrapper methods on classification performance. In the third step, combined filter-based methods and a combination of the results of different learning algorithms are used to achieve higher accuracy. At the end of the three selection stages, the proposed method increased accuracy to as much as 0.945 and reduced training time and computation on the Persian dataset.
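The staged selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the specific filters, so the chi-square statistic (as the global filter), per-class document frequency (as the local filter), the function names, the cut-off values, and the toy English tokens below are all assumptions chosen for illustration. The wrapper step is omitted, and a plain majority vote stands in for the combination of learning algorithms.

```python
from collections import Counter

def chi_square(docs, labels, term, cls):
    """Chi-square statistic of `term` against membership in class `cls`."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        present, in_cls = term in doc, y == cls
        if present and in_cls:
            a += 1          # term present, document in class
        elif present:
            b += 1          # term present, document in another class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def select_features(docs, labels, k_first=3, k_final=3):
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    # Stage 1a (assumed global filter): one score per term, maximum over classes.
    global_scores = {
        t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab
    }
    expanded = set(top_k(global_scores, k_first))
    # Stage 1b (assumed local filter): top terms per class by document frequency.
    for c in classes:
        df = Counter(t for doc, y in zip(docs, labels) if y == c for t in doc)
        expanded |= set(top_k(df, k_first))
    # Stage 2 (secondary selection): re-rank the expanded set, keep the best.
    return top_k({t: global_scores[t] for t in expanded}, k_final)

def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one message."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data: each message is a set of tokens (hypothetical, not from the dataset).
docs = [
    {"need", "buy", "phone"},
    {"looking", "for", "laptop", "need"},
    {"selling", "phone", "cheap"},
    {"news", "today", "weather"},
]
labels = ["request", "request", "other", "other"]
selected = select_features(docs, labels)
```

Here the first stage deliberately over-selects by taking the union of global and local candidates, and the second stage prunes that union back down, which mirrors the expand-then-filter order the abstract describes.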

Keywords:

Feature Selection, Text Mining, Classification Accuracy, Machine Learning, Ensemble Classifier

Language:

English

Published:

Journal of Information Systems and Telecommunication, Volume 8, Issue 4, Oct-Dec 2020

Pages:

249 to 262

https://magiran.com/p2221799


Accredited scientific journal

Journal of Information Systems and Telecommunication

Information Systems and Telecommunication Quarterly

English-language technical and engineering quarterly

ISSN: 2322-1437 eISSN: 2345-2773

License holder:

Jahad Daneshgahi (ACECR)

Managing director:

Eng. Habibollah Asghari

Editor-in-chief:

Dr. Masoud Shafiee

Journal phone: 021-88930150

