Efficiency of Machine Translation in the Language Processing Process; Using Context Clues in Finding the [Exact] Meaning of Quranic Words [In Persian]

Article Type:
Research/Original Article (no accredited ranking)
Abstract:

Translation is the transfer of a text's content from the source language into the target language by finding semantic equivalents between the two languages. The most important problems facing translation are ambiguities in vocabulary and sentence structure. One classification distinguishes five important types of lexical ambiguity (categorical ambiguity, homophones, homographs, polysemy, and transitive ambiguity) and two important types of structural ambiguity (real structural ambiguity and systemic ambiguity). Machine translation (MT), a part of natural language processing (NLP) within computational linguistics and artificial intelligence, is regarded as one of the automatic techniques that convert unstructured text into structured data; by turning text into information, it makes further analysis possible so that useful information can be extracted. This article, compiled using the library method, proposes a theoretical plan for resolving issues surrounding word meaning in the machine translation of the Quran. Its purpose is to support a better understanding of the meaning of Quranic words by drawing on context clues and styles of expression. In the proposed method, a more suitable equivalent is chosen in the target language by applying the context rule and text mining techniques. In this plan, context is considered at the scale of words, and it can be extended to other types if the conditions are met. In short, the plan has two steps: first, prioritizing (weighting) the words adjacent to the target word (any word within the range of verses for which there is consensus that they were revealed together); then, comparing with homonymous (polysemous) words, and also comparing the equivalents of a word with the equivalents of other words (synonymization).
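The second step, choosing among candidate equivalents once context words have been weighted, might be sketched as follows. This is only an illustration under stated assumptions: the function `choose_equivalent`, the cue-word sets, and the example weights are hypothetical and are not taken from the article. The example uses the Arabic word "ayn", which can mean either "eye" or "spring (of water)".

```python
def choose_equivalent(context_weights, candidates):
    """Return the candidate equivalent whose cue words carry the most context weight.

    context_weights: {context word: weight}, assumed to be computed beforehand.
    candidates: {target-language equivalent: set of cue words}, a hypothetical
        hand-built sense table.
    """
    def score(cues):
        # Sum the weights of the cue words that actually occur in the context.
        return sum(context_weights.get(word, 0.0) for word in cues)

    return max(candidates, key=lambda equivalent: score(candidates[equivalent]))


# Hypothetical weights for words found near one occurrence of "ayn".
context_weights = {"maa": 2.5, "nahr": 1.0, "basar": 0.3}

# Hypothetical sense table: each equivalent with transliterated cue words.
candidates = {
    "spring": {"maa", "nahr"},    # water, river
    "eye": {"basar", "nazar"},    # sight, looking
}

best = choose_equivalent(context_weights, candidates)
```

Summing cue-word weights is one simple scoring choice; other scoring rules would fit the same two-step plan equally well.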
To make the results more accurate, further specifications of the words can be prepared manually as tables that record, for example, whether the verses are Meccan or Medinan, the order of revelation of the surahs, and the concepts and interpretations given for Quranic words in dictionaries such as Lisan al-Arab by Ibn Manzur and Al-Mufradat fi Gharib al-Qur'an (Vocabulary of the Rare Words of the Qur'an) by Al-Raghib Al-Isfahani. Indexing techniques are used to obtain the input data. In the pre-processing stage, less important data (stop words such as “al-lazi (which)”, “al-lati (that is)”, “lam (not)”, “kana (was)”, “kaannama (as if)”, etc.) should be removed to get a better output. To normalize the data, diacritics can be removed to make coding easier, and to reduce the sample size, the infix of the words can be used. To prepare a record of specifications for each input word based on the rule of context clues, a tokenizer must first be created and applied to the primary data; then, across the entire collection of input verses, a weight is assigned to each word based on two criteria: spatial proximity and frequency of repetition. The closer a word is to the target word, or the more often it is repeated, the more weight it receives, reflecting a stronger semantic connection, and vice versa. Naturally, words in the same verse (with the same verse number) have a greater influence than words in other verses at a greater distance. For the frequency criterion, weighted frequency (TF/IDF weight) is used to show the importance of a word in the surah: the TF/IDF value increases in proportion to the number of times the word appears in each surah or set of input verses, and is balanced by the number of verses in the surah that contain the word.
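The pre-processing and weighting steps described above might be sketched as follows. This is a minimal illustration, not the article's implementation: the function names, the inverse-distance formula, the `same_verse_bonus` parameter, and the placeholder tokens `w1` to `w4` are all assumptions. TF/IDF is computed here as the term count in the input verses times the logarithm of (number of verses / number of verses containing the term), which is one common variant.

```python
import math
import unicodedata
from collections import Counter, defaultdict

# Transliterated stop words from the examples named in the abstract.
STOP_WORDS = {"al-lazi", "al-lati", "lam", "kana", "kaannama"}


def strip_diacritics(text):
    """Drop combining marks (Unicode category Mn), which include Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def preprocess(verse):
    """Strip diacritics, split on whitespace, and drop stop words."""
    return [tok for tok in strip_diacritics(verse).split() if tok not in STOP_WORDS]


def proximity_weights(verses, target, same_verse_bonus=2.0):
    """Weight each context word by inverse token distance to `target`.

    verses: list of token lists in reading order.
    Words in the same verse as the target get an extra multiplier (assumed).
    """
    flat = [(tok, v) for v, verse in enumerate(verses) for tok in verse]
    weights = defaultdict(float)
    for i, (tok, v_i) in enumerate(flat):
        if tok != target:
            continue
        for j, (other, v_j) in enumerate(flat):
            if other == target:
                continue  # skip the target itself, so i != j below
            bonus = same_verse_bonus if v_i == v_j else 1.0
            weights[other] += bonus / abs(i - j)
    return dict(weights)


def tf_idf(verses):
    """Term count over all verses, scaled by log(verses / verses containing the term)."""
    n = len(verses)
    tf = Counter(tok for verse in verses for tok in verse)
    df = Counter(tok for verse in verses for tok in set(verse))
    return {tok: tf[tok] * math.log(n / df[tok]) for tok in tf}


# Placeholder verses; w1..w4 stand in for real tokens.
verses = [preprocess("w1 al-lazi w2"), preprocess("w2 w3 w4")]
weights = proximity_weights(verses, "w3")
scores = tf_idf(verses)
```

With this variant, a word that occurs in every input verse receives a TF/IDF score of zero, so in practice the two criteria would need to be combined or smoothed in some way that the abstract does not specify.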
Finally, it is concluded that using the contiguity of words and the semantic relations between them, together with text mining techniques, yields a better understanding of the vocabulary and thus a more appropriate choice of the equivalent word in the target language.

Language:
Persian
Published:
Journal of Studies in Applied Language, Volume:6 Issue: 2, 2023
Pages:
101 to 130
https://magiran.com/p2572019  