Determining the number of groups in geochemical data set using pattern recognition indices on the basis of separation and compactness of clusters

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Summary
This paper presents a new approach for estimating the correct number of groups in geochemical data sets. The proposed method reduces the uncertainty of traditional methods, which are often based on expert knowledge or on a single index. Thirty pattern recognition indices, each based on the separation and compactness of clusters, are used to produce a distribution of responses (candidate numbers of groups). The optimal solution is then selected as the response with the highest frequency in this distribution. Applied to a simulated data set, the procedure correctly identified the true number of artificial clusters. Applied to a real geochemical data set, it estimated three clusters as the optimal number of groups. The three groups obtained by clustering the data are fully consistent with the geological and geochemical evidence in the study area.
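The selection rule summarized above can be written as a mode. In this formalization (the notation is introduced here for illustration and is not taken from the paper), $\hat{k}_j$ is the number of groups recommended by index $j$, for $j = 1, \dots, 30$, and $k$ ranges over the candidate numbers of groups:

```latex
f(k) = \sum_{j=1}^{30} \mathbf{1}\!\left[\hat{k}_j = k\right],
\qquad
k^{*} = \arg\max_{k \in \{k_{\min}, \dots, k_{\max}\}} f(k)
```

Here $f(k)$ is the response distribution and $k^{*}$ is its mode. The summary does not state how ties between equally frequent values are resolved.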
Introduction
Partitioning a heterogeneous data set into homogeneous subsets is an important goal of geochemical data processing, and clustering tools are usually used to achieve it. The most important practical challenge, however, is estimating the actual number of underlying groups in the data set. Traditionally, this estimate relies on descriptive geochemical information, expert knowledge, or a single statistical index. Because these approaches are unstable and uncertain, we recommend applying the whole range of indices, building the distribution of their possible responses, and then extracting the best answer from that distribution.
Methodology and Approaches
To evaluate the performance of the proposed approach, we generated a two-dimensional simulated data set containing four artificial clusters. The real geochemical data set used in this research comprises 149 soil samples collected from the North Dalli porphyry Cu-Au deposit in Markazi province. Thirty indices were used to determine the optimal number of groups in the data set. These indices come from pattern recognition, and they work by maximizing the separation between groups and the compactness within groups (that is, minimizing the within-group scatter).
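As an illustration of how a single index combines the two criteria, two widely used indices of this kind are the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices. The summary does not list the thirty indices, so these two are given only as representative examples. For a partition of $n$ samples into $k$ clusters $C_1, \dots, C_k$ with centroids $c_i$ and overall mean $\bar{x}$:

```latex
\mathrm{CH}(k) = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)},
\qquad
\operatorname{tr}(B_k) = \sum_{i=1}^{k} |C_i| \, \lVert c_i - \bar{x} \rVert^{2},
\qquad
\operatorname{tr}(W_k) = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^{2}

\mathrm{DB}(k) = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{\lVert c_i - c_j \rVert},
\qquad
S_i = \frac{1}{|C_i|} \sum_{x \in C_i} \lVert x - c_i \rVert
```

A large between-group scatter $\operatorname{tr}(B_k)$ (separation) and a small within-group scatter $\operatorname{tr}(W_k)$ (compactness) both raise CH, so the $k$ that maximizes CH is that index's recommendation; DB compares cluster spread with centroid distance and is minimized instead.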
Results and Conclusions
All indices were implemented in the R programming environment. For the simulated data, the mode of the response distribution matched the true number of artificial clusters. For the geochemical data set of the Dalli Cu-Au deposit, three clusters were identified. Clustering the geochemical data into these three groups revealed a clear geochemical zonation that corresponds to the geological and mineralogical evidence in the study area.
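The following Python sketch mirrors the procedure on a small scale; it is not the authors' R implementation. It assumes k-means as the clustering method, uses only three indices (Calinski-Harabasz, Davies-Bouldin, and mean silhouette width) as a stand-in for the thirty used in the paper, and generates its own two-dimensional data with four well-separated clusters (centers, spread, and seed are arbitrary choices). All function names are hypothetical. Each index votes for one k, and the mode of the votes is returned.

```python
import math
import random
from collections import Counter


def make_clusters(centers, n_per, sd, seed=1):
    """Simulated 2-D data: n_per Gaussian points around each center (illustrative only)."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, sd), cy + rng.gauss(0, sd))
            for cx, cy in centers for _ in range(n_per)]


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def kmeans_labels(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]
        groups = [[p for p, l in zip(points, labels) if l == i] for i in range(k)]
        new = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:  # assignments are stable
            break
        centers = new
    return labels


def clusters(points, labels):
    """Non-empty clusters as lists of points."""
    return [[p for p, l in zip(points, labels) if l == i] for i in sorted(set(labels))]


def calinski_harabasz(points, labels):
    """Between-group over within-group scatter, degree-of-freedom scaled (higher = better)."""
    gs = clusters(points, labels)
    n, k, xbar = len(points), len(gs), centroid(points)
    cs = [centroid(g) for g in gs]
    between = sum(len(g) * dist(c, xbar) ** 2 for g, c in zip(gs, cs))
    within = sum(dist(p, c) ** 2 for g, c in zip(gs, cs) for p in g)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Average worst-case ratio of cluster spread to centroid distance (lower = better)."""
    gs = clusters(points, labels)
    cs = [centroid(g) for g in gs]
    s = [sum(dist(p, c) for p in g) / len(g) for g, c in zip(gs, cs)]
    k = len(gs)
    return sum(max((s[i] + s[j]) / dist(cs[i], cs[j]) for j in range(k) if j != i)
               for i in range(k)) / k


def mean_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0 (higher = better)."""
    gs = clusters(points, labels)
    total = 0.0
    for gi, g in enumerate(gs):
        if len(g) == 1:
            continue
        for p in g:
            a = sum(dist(p, q) for q in g) / (len(g) - 1)  # p itself adds 0
            b = min(sum(dist(p, q) for q in h) / len(h)
                    for hi, h in enumerate(gs) if hi != gi)
            total += (b - a) / max(a, b)
    return total / len(points)


def vote_number_of_groups(points, k_range):
    """Each index picks one k; the mode of these picks is returned."""
    labelings = {k: kmeans_labels(points, k) for k in k_range}
    picks = {
        "CH": max(k_range, key=lambda k: calinski_harabasz(points, labelings[k])),
        "DB": min(k_range, key=lambda k: davies_bouldin(points, labelings[k])),
        "silhouette": max(k_range, key=lambda k: mean_silhouette(points, labelings[k])),
    }
    k_best, _ = Counter(picks.values()).most_common(1)[0]
    return k_best, picks


data = make_clusters([(0, 0), (10, 0), (0, 10), (10, 10)], n_per=50, sd=0.5)
k_best, picks = vote_number_of_groups(data, range(2, 7))
print(picks, k_best)
```

With clusters this well separated, the three indices should agree; on real geochemical data individual indices may disagree, which is the motivation for taking the mode over many indices rather than trusting any single one.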
Language:
Persian
Published:
Journal of Analytical and Numerical Methods in Mining Engineering, Volume 9, Issue 18, 2019
Pages:
61 to 76
https://magiran.com/p1986037  