Evaluating the Structure of the Inverted Pyramid in the Big Persian News Corpus: News Discourse Analysis based on the Correlation Coefficient between Title and News Content

Author(s):
Message:
Article Type:
Research/Original Article (دارای رتبه معتبر)
Abstract:
News discourse is a type of discourse analysis that deals with the analysis of news discourse. Due to the fact that in the formatting of news there are two hidden features of selection and prominence in the communication representation of news, the inverted pyramid of news is used to grade the importance of the discourse parts of the news. Although it is desirable to meet the structure of the inverted pyramid of news, sometimes this structure may change. In this article, we put an effort to analyse the discourse analysis of Persian news websites with the help of statistical analysis. To research the goal, data science can be used. This inter-discipline deals with data analysis from a scientific aspect, finding implicit concepts to be obtained from data analysis and extracting knowledge from the data. In the framework of data science, we examined the Persian news corpus and studied the existence of semantic correlation between the news title and the news content based on the structure of the news inverted pyramid. To achieve the goal, by using the crawling method, a relatively large news corpus with a volume of 14 billion words has been obtained from 24 news websites. After pre-processing and normalizing the corpus, in the framework of distributional semantics, the vector of title news and content have been created by using the Word2Vec tool for creating the vector model to have the vector representation of each news. After segmenting news content into three parts (lead, body and further explanation about the lead) according to the inverted pyramid, the Pearson correlation coefficient has been used to calculate the correlation between the title and each part of the news. Although Pearson's correlation coefficient was positive for a large number of news, zero value and no correlation was found for the news. On average, the correlation between the headline and the news lead and body was higher than the correlation between the headline and the lead development. This research can be used as a method to carefully select the title and content and filter the news according to the inverted pyramid structure.
Language:
Persian
Published:
Language and Linguistics, Volume:18 Issue: 35, 2023
Pages:
21 to 45
magiran.com/p2597178  
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یک‌ساله به مبلغ 1,390,000ريال می‌توانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
  • حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران می‌شود.
  • پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانه‌های چاپی و دیجیتال را به کاربر نمی‌دهد.
In order to view content subscription is required

Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!