An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Article Type:
Research/Original Article (accredited ranking)
Abstract:

Many algorithms in machine learning, pattern recognition, and data mining rely on a similarity or distance measure. For instance, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Likewise, Content-Based Information Retrieval (CBIR) systems rank the retrieved objects by their similarity to the query. Because generic measures such as Euclidean distance and cosine similarity are not appropriate in many applications, metric learning algorithms have been developed to learn an optimal distance function from data. These methods often need training data in the form of pairs or triplets. Nowadays, this training data is commonly obtained from the Internet via crowdsourcing. As a result, it may be contaminated with label noise, which degrades the performance of the learned metric. On some datasets, the learned metric may even perform worse than a generic one such as Euclidean distance. To address this emerging challenge, we present a new robust metric learning algorithm that identifies outliers and label noise simultaneously from the training side information. For this purpose, we model the probability distribution of label noise based on information in the training data. The proposed distribution function assigns a high probability to data points contaminated with label noise, while its value on normal instances is near zero. We then weight the training instances according to these probabilities in our metric learning optimization problem. The proposed optimization problem can be solved efficiently using available SVM libraries such as LibSVM. Note that the proposed approach for identifying data with label noise is general and can easily be applied to any existing metric learning algorithm.
After the metric learning phase, we use both the weights and the learned metric to improve the accuracy of metric-based classifiers such as kNN. Several experiments are conducted on both real and synthetic datasets. The results confirm that the proposed algorithm enhances the performance of the learned metric in the presence of label noise and considerably outperforms state-of-the-art peer methods at different noise levels.
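
The last step of the abstract, combining per-instance weights with a learned metric in kNN, can be illustrated with a minimal sketch. The abstract does not give the noise distribution or the exact voting rule, so everything here is an assumption made for illustration: the function names `mahalanobis_sq` and `weighted_knn_predict`, the vote weight `1 - noise_prob`, the identity matrix standing in for a learned metric, and the toy data.

```python
from collections import defaultdict

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y), with M given as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Md[i] for i in range(len(d)))

def weighted_knn_predict(query, X, y, noise_prob, M, k=3):
    """kNN vote in which each neighbor counts with weight 1 - noise_prob.

    Assumption: noise_prob[i] is an estimated probability that the label
    of training point i is wrong; the paper's actual rule may differ.
    """
    # Rank training points by distance to the query under the metric M.
    order = sorted(range(len(X)), key=lambda i: mahalanobis_sq(query, X[i], M))
    votes = defaultdict(float)
    for i in order[:k]:
        votes[y[i]] += 1.0 - noise_prob[i]
    return max(votes, key=votes.get)

# Toy data: points 1 and 2 lie in the class-0 region but carry label 1.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 0.9]]
y = [0, 1, 1, 1, 1]
noise_prob = [0.05, 0.9, 0.9, 0.05, 0.05]  # hypothetical estimates
identity = [[1.0, 0.0], [0.0, 1.0]]        # stand-in for a learned metric

weighted = weighted_knn_predict([0.02, 0.02], X, y, noise_prob, identity)
unweighted = weighted_knn_predict([0.02, 0.02], X, y, [0.0] * 5, identity)
```

In this toy setting the unweighted vote follows the two mislabeled neighbors, whereas the weighted vote discounts them and returns the label of the single trusted neighbor.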

Language:
Persian
Published:
Signal and Data Processing, Volume:19 Issue: 1, 2022
Pages:
125 to 136
magiran.com/p2456829  