A Transformer-based Approach for Persian Text Chunking

Author(s):

P. Kavehzadeh, M. M. Abdollah Pour, S. Momtazi *

Article Type:

Research/Original Article (accredited ranking)

Abstract:

Over the last few years, text chunking has played a significant role among sequence labeling tasks. Although a wide variety of methods have been proposed for shallow parsing in English, most approaches proposed for text chunking in Persian are based on simple, traditional concepts. In this paper, we propose using state-of-the-art transformer-based contextualized models, namely BERT and XLM-RoBERTa, as the backbone of our models. A Conditional Random Field (CRF), a combination of Bidirectional Long Short-Term Memory (BiLSTM) and CRF, or a simple dense layer is placed after the transformer-based model to improve its performance in predicting chunk labels. Moreover, we provide a new dataset for noun phrase chunking in Persian, consisting of annotated Persian news text. Our experiments reveal that XLM-RoBERTa achieves the best performance among all the architectures tried on the proposed dataset. The results also show that a single CRF layer yields better results than a dense layer, and even than the combination of BiLSTM and CRF.

Keywords:

Persian text chunking, sequence labeling, deep learning, contextualized word representation

Language:

English

Published:

Journal of Artificial Intelligence and Data Mining, Volume 10, Issue 3, Summer 2022

Pages:

373 to 383

https://magiran.com/p2488403


Accredited scientific journal

Journal of Artificial Intelligence and Data Mining

Quarterly engineering and technology journal published in English

ISSN: 2322-5211 eISSN: 2322-4444

License holder:

Shahrood University of Technology

Managing Director and Editor-in-Chief:

Dr. Hamid Hassanpour

Journal phone: 023-32300251
