A New Algorithm for Clustering Web Pages Based on Links and Content

Author(s):

M. FATHIAN*, A.M. KARIMI-MAJD

Abstract:

Amid the vast number of web pages, users face two issues in reaching the resources they want: speed and accuracy. Both are important factors in user satisfaction with web services, and both call for an information retrieval tool that returns suitable responses. An efficient search engine can therefore help attract customers and increase their satisfaction.
However, web search engines often face a crucial problem: for vague queries, their results include highly diverse pages. This diversity makes it harder for a search engine to choose the most relevant pages, and the results it returns may be unsatisfactory from the user's perspective. In such a situation, discovering natural groupings of pages and finding their representatives helps the engine cover all admissible meanings of the user's query. Clustering is a well-known approach to this reduction task, i.e., finding a few representatives among highly diverse web pages.
In this paper, we focus on a pioneering algorithm and aim to improve both the quality of its responses and its execution speed. To do so, we propose generating the initial clusters with the well-known K-means algorithm, which can provide a suitable starting point. We also reformulate a time-consuming formula of the main algorithm by taking advantage of the properties of the link network. Furthermore, we formulate a set of significant variables that the main algorithm treats as constants, in order to increase the quality of the clustering. Experimental results on ground-truth datasets indicate that our algorithm outperforms the main algorithm by about 30% in both clustering quality and execution speed.
Moreover, as a case study, we run our algorithm on a dataset of Persian blogs, which we built by collecting the links and texts of a number of blogs. On this dataset, the algorithm yields remarkable results in terms of the extracted clusters.
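
The K-means initialization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how pages are represented or how the main algorithm consumes the initial clusters, so everything here is an assumption for illustration: pages are toy term-count vectors, the names `sq_dist`, `kmeans`, `pages`, and `initial_clusters` are hypothetical, and farthest-point seeding is used only to make the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    # Farthest-point seeding: an assumption of this sketch, chosen so the
    # result does not depend on a random number generator.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical term-count vectors for six pages (two obvious topics).
pages = [
    [5, 0, 1], [4, 1, 0], [5, 1, 1],
    [0, 4, 5], [1, 5, 4], [0, 5, 5],
]
initial_clusters = kmeans(pages, k=2)
print(initial_clusters)  # [0, 0, 0, 1, 1, 1]
```

In a combined link-and-content method, labels like these would serve only as a starting partition that a later stage refines using the link structure; that refinement step is not described in the abstract and is not sketched here.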

Keywords:

Clustering, e-commerce, content, link, search engine, complex networks

Language:

Persian

Published:

Industrial Engineering & Management Sharif, Volume 33, Issue 1, 2017

Pages:

21 to 28

magiran.com/p1753181

Journal:

Industrial Engineering & Management Sharif (Sharif Journal of Industrial Engineering and Management)

Approved scientific journal; engineering biannual

ISSN: 4741-2676 eISSN: 475x-2676

License holder:

Sharif University of Technology

Managing director:

Dr. Ali Akbar Salehi

Editor-in-chief:

Dr. Kourosh Eshghi

Journal phone: 021-66164093
