Sub-Word Image Clustering in Old Printed Documents Using Template Matching

Abstract:
With the rapid growth of digital libraries, digitizing large documents has become an important topic. In a long book, similar characters, sub-words, and words occur many times. In this paper, we propose a sub-word image clustering method for applications that deal with large, uniform documents. We assume that the whole document is printed in a single font and that the print quality is poor. To test our method, we created a dataset of all sub-words of a Farsi book. The book has 233 pages and contains more than 111,000 manually labeled sub-words. We use an incremental clustering algorithm. Four simple features are extracted from each sub-word and compared with the corresponding features of each cluster center. If all feature differences lie within certain thresholds, the sub-word and the winning cluster center are compared in detail using a template matching algorithm. In our experiments, assigning the label of each cluster center to all of its members recognizes the sub-words of the book with more than 99.7% accuracy.
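The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the four features, the thresholds, or the template matching measure, so everything specific here is an assumption made for illustration: the features (height, width, ink pixel count, ink centroid column), the tolerances `feat_tol` and `match_tol`, the pixel mismatch ratio used in place of template matching, the choice of the nearest candidate as the "winner", and the use of a cluster's first member as its fixed center.

```python
from dataclasses import dataclass, field

Image = list[list[int]]  # binary image: 1 = ink, 0 = background


def features(img: Image) -> tuple[float, float, float, float]:
    """Four illustrative features (assumed, not taken from the paper)."""
    h, w = len(img), len(img[0])
    ink = sum(map(sum, img))
    # Mean column index of ink pixels; max(ink, 1) avoids division by zero.
    cx = sum(x for row in img for x, v in enumerate(row) if v) / max(ink, 1)
    return (h, w, ink, cx)


def mismatch_ratio(a: Image, b: Image) -> float:
    """Stand-in for template matching: fraction of differing pixels
    after zero-padding both images (top-left aligned) to a common size."""
    h = max(len(a), len(b))
    w = max(len(a[0]), len(b[0]))

    def px(img: Image, y: int, x: int) -> int:
        return img[y][x] if y < len(img) and x < len(img[0]) else 0

    diff = sum(px(a, y, x) != px(b, y, x) for y in range(h) for x in range(w))
    return diff / (h * w)


@dataclass
class Cluster:
    center: Image                      # first member, kept fixed (simplification)
    members: list[int] = field(default_factory=list)


def cluster_subwords(images, feat_tol=(1, 1, 3, 1.0), match_tol=0.15):
    """Incremental clustering: coarse feature filter, then fine comparison."""
    clusters: list[Cluster] = []
    for idx, img in enumerate(images):
        f = features(img)
        best, best_dist = None, None
        for c in clusters:
            diffs = [abs(p - q) for p, q in zip(f, features(c.center))]
            # Coarse test: every feature difference must be within its threshold.
            if all(d <= t for d, t in zip(diffs, feat_tol)):
                dist = sum(diffs)
                if best is None or dist < best_dist:
                    best, best_dist = c, dist
        # Fine test against the winning candidate only.
        if best is not None and mismatch_ratio(img, best.center) <= match_tol:
            best.members.append(idx)
        else:
            clusters.append(Cluster(center=img, members=[idx]))
    return clusters


def propagate_labels(clusters, center_labels):
    """Assign each cluster center's label to all members of that cluster."""
    return {i: lab for c, lab in zip(clusters, center_labels) for i in c.members}
```

Under these assumptions, only one label per cluster center has to be supplied by hand; `propagate_labels` then copies it to every member, which is the recognition step the abstract evaluates.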
Language:
Persian
Published:
Iranian Journal of Electrical and Computer Engineering, Volume 11, Issue 2, 2014
Page:
85
magiran.com/p1255746  