A Two-Level Semi-supervised Clustering Technique for News Articles

Article Type:
Research/Original Article (accredited ranking)
Abstract:

The web and social media are flooded with news articles, both in volume and in diversity. Document clustering is a widely used technique for organizing and managing such data by dividing it into smaller groups. One factor that influences clustering quality is the way documents are represented. Traditional representation methods that depend on word frequencies produce large, sparse document vectors and cannot preserve proximity information between documents. Neural network-based methods do preserve proximity information, but they suffer from poor interpretability. Conceptual text representation methods overcome both shortcomings, yet semi-supervised text clustering does not currently use concept-based document representation. This paper presents a two-level semi-supervised text clustering method that uses labeled and unlabeled data simultaneously to achieve higher clustering quality. In the first level, documents are represented by concepts extracted from the raw corpus. In the second level, the semi-supervised clustering process uses the unlabeled data to capture the overall structure of the clusters and a small amount of labeled data to adjust the cluster centers. Experiments on the Reuters-21578 dataset show that the proposed model outperforms other semi-supervised approaches in both text classification and text clustering.
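The two levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the concept lexicon `CONCEPTS`, the toy documents, the plain k-means step, and the blending weight `alpha` are all assumptions made for illustration, and the paper's actual concept extraction and center-adjustment rules are not given in the abstract.

```python
from math import dist

# Hypothetical concept lexicon; the paper extracts concepts from the raw corpus.
CONCEPTS = {
    "finance": {"bank", "stock", "profit"},
    "sports": {"team", "goal", "match"},
}

def concept_vector(text):
    """Level 1 (assumed): represent a document by its normalized concept counts."""
    tokens = text.lower().split()
    counts = [sum(t in words for t in tokens) for words in CONCEPTS.values()]
    total = sum(counts) or 1
    return [c / total for c in counts]

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def kmeans(points, centers, iters=20):
    """Level 2a (assumed): plain k-means on unlabeled data to find the structure."""
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(mean(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def adjust_with_labels(centers, labeled, alpha=0.5):
    """Level 2b (assumed): pull the nearest center toward each labeled class mean."""
    by_class = {}
    for vec, cls in labeled:
        by_class.setdefault(cls, []).append(vec)
    adjusted = [list(c) for c in centers]
    cluster_of = {}
    for cls, vecs in by_class.items():
        m = mean(vecs)
        j = min(range(len(centers)), key=lambda k: dist(m, centers[k]))
        adjusted[j] = [(1 - alpha) * c + alpha * x for c, x in zip(centers[j], m)]
        cluster_of[cls] = j
    return adjusted, cluster_of

# Toy data (invented for this sketch).
unlabeled = [concept_vector(t) for t in
             ["bank stock profit", "stock profit team",
              "team goal match", "goal match bank"]]
labeled = [(concept_vector("bank profit"), "finance"),
           (concept_vector("team match"), "sports")]

centers = kmeans(unlabeled, [unlabeled[0], unlabeled[2]])
adjusted, cluster_of = adjust_with_labels(centers, labeled)
print(cluster_of, assign([concept_vector("stock bank goal")], adjusted))
```

Because each vector dimension corresponds to a named concept, the resulting cluster centers can be read directly, which is the interpretability advantage the abstract attributes to concept-based representations.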

Language:
English
Published:
International Journal of Engineering, Volume: 34, Issue: 12, Dec 2021
Page:
10
magiran.com/p2337608  