A partition-based algorithm for clustering large-scale software systems

Author(s):

Babak Pourasghar* , Habib Izadkhah , Shahriar Lotfi , Khayyam Salehi

Article Type:

Research/Original Article (with accredited ranking)

Abstract:

Clustering techniques are used to extract the structure of software to support understanding, maintenance, and refactoring. Most software clustering approaches in the literature are either hierarchical algorithms or search-based techniques. In hierarchical algorithms, clustering proceeds by merging similar clusters (or splitting dissimilar ones). These techniques suffer from drawbacks such as choosing a termination criterion and the arbitrary decisions made during the process. Because clustering software systems is NP-hard, evolutionary and search-based algorithms are more commonly used than hierarchical ones. In evolutionary algorithms, clustering a software system is treated as a search over possible candidate clusterings. Although these algorithms often achieve an appropriate structure for the software, they do not scale to large software systems. Furthermore, they cannot exploit the knowledge contained in the artifact dependency graph extracted from the source code. In a software system, an artifact can be any entity, such as a class, a function, or a file. This paper presents a new partition-based clustering algorithm that partitions the artifact dependency graph while taking the knowledge it contains into account. In addition, a new distance criterion is introduced to measure the similarity and dissimilarity of artifacts. The proposed algorithm starts from the artifact dependency graph, builds similarity matrices for the artifacts, and then iteratively refines a candidate partition until a fixed point is reached. Compared with other methods, the proposed method is expected to produce higher-quality clusterings that are closer to the expert's clustering, as measured by MoJo-FM. To demonstrate the applicability and validity of the proposed algorithm, a large-scale case study, Mozilla Firefox, is used. 
The results demonstrate that the proposed algorithm outperforms the commonly used evolutionary methods in the literature.
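The abstract describes the algorithm only at a high level: derive an artifact-to-artifact similarity matrix from the dependency graph, then refine a candidate partition until it reaches a fixed point. The Python sketch below illustrates that loop under explicit assumptions. The paper's own distance criterion is not given in the abstract, so Jaccard distance over each artifact's closed neighbourhood (the artifact plus everything it uses or is used by) serves as a stand-in. The refinement step is plain k-medoids with farthest-first seeding, a swapped-in technique that is not necessarily the authors' method. The names `partition_artifacts`, `closed_neighbourhoods`, and `jaccard_distance`, as well as the input format (a dict mapping each artifact to the set of artifacts it depends on), are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical input format: each artifact maps to the artifacts it depends on.
DependencyGraph = Dict[str, Set[str]]


def closed_neighbourhoods(graph: DependencyGraph) -> Dict[str, Set[str]]:
    """Undirected neighbourhood of every artifact, including the artifact itself."""
    nb: Dict[str, Set[str]] = {}
    for src, targets in graph.items():
        nb.setdefault(src, {src})
        for dst in targets:
            nb.setdefault(dst, {dst})
            nb[src].add(dst)  # src uses dst
            nb[dst].add(src)  # dst is used by src
    return nb


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Stand-in distance (assumption): 0 = identical neighbours, 1 = disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def partition_artifacts(graph: DependencyGraph, k: int, max_iter: int = 100) -> List[List[str]]:
    """k-medoids partitioning of artifacts, iterated until the medoids stop changing."""
    if k < 1:
        raise ValueError("k must be at least 1")
    nb = closed_neighbourhoods(graph)
    nodes = sorted(nb)
    if not nodes:
        return []
    # Pairwise distance matrix over all artifacts.
    dist = {a: {b: jaccard_distance(nb[a], nb[b]) for b in nodes} for a in nodes}

    # Farthest-first seeding: each new medoid is the artifact farthest from those chosen so far.
    medoids = [nodes[0]]
    while len(medoids) < min(k, len(nodes)):
        candidates = [n for n in nodes if n not in medoids]
        medoids.append(max(candidates, key=lambda n: min(dist[n][m] for m in medoids)))
    medoids.sort()

    clusters: Dict[str, List[str]] = {}
    for _ in range(max_iter):
        # Assignment step: every artifact joins its nearest medoid (ties broken by name).
        clusters = {}
        for n in nodes:
            nearest = min(medoids, key=lambda m: (dist[n][m], m))
            clusters.setdefault(nearest, []).append(n)
        # Update step: each cluster's new medoid minimises total distance to its members.
        new_medoids = sorted(
            min(members, key=lambda c: (sum(dist[c][x] for x in members), c))
            for members in clusters.values()
        )
        if new_medoids == medoids:  # fixed point: the partition no longer changes
            break
        medoids = new_medoids
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    example = {
        "ui.Window": {"ui.Button", "ui.Layout", "net.Http"},
        "ui.Button": {"ui.Layout"},
        "net.Http": {"net.Socket", "net.Dns"},
        "net.Socket": {"net.Dns"},
    }
    print(partition_artifacts(example, k=2))
    # [['net.Dns', 'net.Http', 'net.Socket'], ['ui.Button', 'ui.Layout', 'ui.Window']]
```

Medoids are used instead of means because artifacts are graph nodes rather than points in a vector space, so each cluster's representative has to be an actual artifact. Breaking ties by artifact name keeps the result deterministic. In the example, the single cross-module dependency (`ui.Window` on `net.Http`) does not pull the two groups together, because each artifact still shares most of its neighbourhood with its own module.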

Keywords:

Software Engineering , Reverse Engineering , Software Clustering , K-means algorithm

Language:

Persian

Published:

Signal and Data Processing, Volume:18 Issue: 4, 2022

Pages:

37 to 47

magiran.com/p2420996


Journal:

Signal and Data Processing (technical and engineering quarterly, approved scientific journal)

ISSN: 2538-4201 eISSN: 2538-421X

Proprietor:

Khajeh Nasir al-Din Toosi Research Institute for Advanced Technology Development

Director-in-charge:

Dr. Javad Sheikhzadegan

Editor-in-chief:

Dr. Mohammad Hassan Ghassemian

Journal phone: 021-83857605
