Improving the Effectiveness of Open-Domain Question-Answering Systems for Answering Multi-hop Questions in Persian Language
Author(s):
Article Type:
Research/Original Article (accredited ranking)
Abstract:
Answering complex user questions is currently one of the most popular and challenging tasks in natural language processing. Question-answering systems, as a new generation of search engines, accept user questions in natural language without semantic restrictions and return precise answers. In recent years, most research on question-answering systems has focused on English, and comparatively little effort has gone into low-resource languages such as Persian. As a result, these systems cannot handle Persian questions efficiently. This article aims to improve the effectiveness of question-answering systems in Persian by creating a dataset for answering complex multi-hop questions, that is, questions that require at least two reasoning steps to reach an answer. The dataset, called PersianMHQA, is the first open-domain question-answering dataset in Persian and contains 7,000 multi-hop questions. It was generated using Persian Wikipedia as the knowledge source. To evaluate and benchmark the dataset, recent pre-trained language models that support Persian were fine-tuned on it. The best results obtained on this dataset are an F1 score of 75.92% and an exact match of 71.73%. These results indicate that the dataset is a strong starting point for improving multi-hop question answering in Persian.
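The abstract reports F1 and exact match but does not say how they were computed. The sketch below shows one common convention for extractive question answering (token-overlap F1 and normalized string equality, as used in SQuAD-style evaluation); it is an assumption about the metric definitions, not the authors' evaluation script. The function names `normalize`, `exact_match`, and `token_f1` are hypothetical, and the whitespace tokenization is a simplification: real Persian text would likely need additional normalization (for example, unifying character variants and handling the zero-width non-joiner).

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this convention, a dataset-level score is the mean of the per-question values, which is why F1 is never lower than exact match: a partially correct answer earns partial F1 credit but zero exact match.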
Keywords:
Language:
Persian
Published:
Journal of Soft Computing and Information Technology, Volume 13, Issue 1, 2024
Pages:
1 to 10
https://magiran.com/p2782797