Sampling from social network graphs based on topological properties and the bee colony algorithm

Article Type:
Research/Original Article (holds an accredited ranking)
Abstract:

In recent years, sampling from the massive graphs of social networks has attracted much attention, because a small, representative sample can be analyzed far faster than the full network. Many algorithms have been proposed for sampling social network graphs. Their purpose is to create a sample that is approximately similar to the original graph in properties such as degree distribution, clustering coefficient, internal density, and community structure. Existing approaches include random walk-based methods, shortest path-based methods, and graph partitioning-based algorithms, among others. Each group of methods has its own pros and cons. Their main drawback is that they pay little attention to the high time complexity of building the sample graph and to the quality of the resulting sample. In this paper, we propose a new sampling method that introduces a new equation based on the structural properties of social networks and combines it with the bee colony algorithm. The method takes an informed, non-random approach so that the generated samples resemble the original network in topological properties, degree distribution, internal density, clustering coefficient, and community structure. Meta-heuristic sampling methods such as genetic algorithms and other evolutionary algorithms generate their initial population randomly; in contrast, our method deliberately selects the nodes used to build the initial solutions. It identifies hub and semi-hub nodes, along with other important nodes such as core nodes, and tries to keep them in the initial solutions and in the resulting samples as far as possible. This yields a high-quality final sample whose properties are close to those of the original network.
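The deliberate seeding of initial solutions described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their equation or their definition of hub, semi-hub, and core nodes, so this example simply assumes that "hub" means "highest degree". The function name `hub_seeded_sample`, the `hub_fraction` parameter, and the neighbor-based fill step are all hypothetical choices made for illustration.

```python
import random


def hub_seeded_sample(adj, sample_size, hub_fraction=0.3, seed=0):
    """Build one initial node sample that is guaranteed to contain hubs.

    adj: dict mapping each node to the set of its neighbors (undirected).
    hub_fraction: share of the sample reserved for top-degree nodes
                  (an assumed parameter, not taken from the paper).
    """
    rng = random.Random(seed)
    # Rank nodes by degree; ties are broken by node id for determinism.
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    n_hubs = max(1, int(hub_fraction * sample_size))
    sample = set(ranked[:n_hubs])

    # Fill the remaining slots from the neighbors of already chosen nodes,
    # so the sample tends to stay connected around the hubs.
    target = min(sample_size, len(adj))
    while len(sample) < target:
        frontier = sorted({u for v in sample for u in adj[v]} - sample)
        # Fall back to any unused node if the frontier is exhausted.
        pool = frontier or sorted(set(adj) - sample)
        sample.add(rng.choice(pool))
    return sample


# Toy graph: node 0 is a hub attached to a short path 4-5-6.
toy_graph = {
    0: {1, 2, 3, 4},
    1: {0},
    2: {0},
    3: {0},
    4: {0, 5},
    5: {4, 6},
    6: {5},
}
initial_sample = hub_seeded_sample(toy_graph, sample_size=4)
```

In a population-based search such as a bee colony algorithm, several such samples (for example, with different `seed` values) could serve as the starting population in place of purely random node sets.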
The resulting sample graph matches the original network well and preserves, as far as possible, its main characteristics, such as topology, the number of communities, and the large component of the original graph. The non-random, deliberate selection of nodes and their use in the initial steps of sample extraction give the proposed method two important advantages. The first is stability: the method extracts high-quality samples on every run. In other words, despite the random behavior of the bee algorithm, the final samples are mostly of similar quality. The second is a satisfactory running time for finding a new sample. A natural first concern is the time complexity and relatively slow convergence of the bee colony algorithm. In response, deliberately selecting important nodes and using them in the initial solutions gives the bee colony algorithm starting solutions that already score well on the fitness function. Experimental results on real-world networks show that the proposed method preserves degree distribution parameters, clustering coefficient, and community structure better than the other methods compared.
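To make the idea of a fitness function over sample quality concrete, the sketch below scores a candidate sample by how far its degree distribution and average clustering coefficient are from those of the original graph. This is an assumed, simplified stand-in: the abstract does not state the paper's actual fitness equation, and the names `sample_distance`, `degree_distribution`, `average_clustering`, and `induced_subgraph` are hypothetical. Lower scores mean a closer match.

```python
from collections import Counter


def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges between them."""
    nodes = set(nodes)
    return {v: adj[v] & nodes for v in nodes}


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbors of this node (each pair once).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def degree_distribution(adj):
    """Map each degree value to the fraction of nodes having it."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {d: c / n for d, c in counts.items()}


def sample_distance(adj, nodes):
    """Assumed score: total variation distance between degree
    distributions plus the gap in average clustering coefficient."""
    sub = induced_subgraph(adj, nodes)
    p, q = degree_distribution(adj), degree_distribution(sub)
    tv = 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in set(p) | set(q))
    cc_gap = abs(average_clustering(adj) - average_clustering(sub))
    return tv + cc_gap


# Toy example: a triangle, scored against the full node set and a 2-node sample.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
full_score = sample_distance(triangle, {0, 1, 2})
partial_score = sample_distance(triangle, {0, 1})
```

Comparing raw degrees between a small sample and the full graph is a deliberate simplification here; a real evaluation would likely normalize degrees or use a standard statistic, and would also need a community-structure term, which this sketch omits.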

Language:
Persian
Published:
Signal and Data Processing, Volume:17 Issue: 3, 2020
Pages:
55 to 70
magiran.com/p2205059  