Semantic Textual Similarity of Persian-English sentences using deep learning
Semantic textual similarity is a subtask of natural language processing that has attracted extensive research in recent years. Measuring semantic similarity between words, sentences, paragraphs, and documents plays an important role in natural language processing and computational linguistics. It is used in question-answering systems, fraud detection, machine translation, information retrieval, and other applications. Semantic similarity means calculating the degree of similarity between two textual documents, paragraphs, or sentences, and the task can be posed in both monolingual and cross-lingual settings. In this article, we use a parallel corpus to present, for the first time, a cross-lingual semantic similarity model for Persian-English sentences, and we then test and compare our model with the multilingual BERT model. The results show that using parallel corpora can improve the quality of sentence embeddings in two different languages. The Pearson correlation computed from the cosine similarity between sentence vectors increases from 65% for multilingual BERT to 73.77% with the proposed method. The proposed method was also tested on the Arabic-English language pair, where the results show that it outperforms multilingual BERT.
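The evaluation criterion named in the abstract, Pearson correlation over cosine similarities of sentence vectors, can be sketched as follows. This is a minimal illustration only: the abstract does not describe the implementation, so the function names (`cosine_similarity`, `pearson`, `evaluate`) are hypothetical, and the sentence vectors are assumed to have been produced already by some embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(vector_pairs, gold_scores):
    """Correlate model similarities with human similarity judgments.

    vector_pairs: list of (source_vector, target_vector) tuples, e.g. a
                  Persian sentence embedding paired with an English one.
    gold_scores:  human-annotated similarity score for each pair.
    """
    predicted = [cosine_similarity(u, v) for u, v in vector_pairs]
    return pearson(predicted, gold_scores)
```

A reported value such as 73.77% would correspond to `evaluate(...)` returning roughly 0.7377 on the test pairs, under the assumption that the percentage is the correlation coefficient multiplied by 100.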