Magiran | List of articles by author: Seyed Mostafa Fakhrahmad

The effect of lack and sparsity of data on the effectiveness of the RICeST Journal Finder results: a case study of the technical and engineering field

Narjes Vara, Mahdieh Mirzabeigi*, Hajar Sotudeh, Seyed Mostafa Fakhrahmad, Niloofar Mozafari

Journal of Information Processing and Management, Year 37, No. 4 (Serial No. 110, Summer 1401), pp. 1293-1317

Numerous factors among the components that make up recommender systems are involved in generating and presenting recommendations. The present study was conducted to understand the effect of two challenges, lack and sparsity of data, on the effectiveness of the results suggested by the RICeST Journal Finder. To this end, more than 15,000 articles from technical and engineering journals published between 1392 and 1396 SH (2013 to 2017) were collected from the journals' websites. Next, the textual elements of these articles, including title, abstract, and keywords, were extracted, normalized, and processed, and the research corpus database was created. Based on the number of collected articles and using Cochran's formula, 400 base articles that had previously been published in topically relevant journals were selected by proportional random sampling. The titles and abstracts of these articles were entered into the system as queries in two stages, before and after mitigating the two challenges of lack and sparsity of data, to obtain the journals the system suggested for publishing each article. The suggested results at each stage were then saved in an Excel file. Finally, the effectiveness of the system's results at each stage was determined by leave-one-out validation, based on precision at k. The relative frequency of the ranks showed that, in the existing state, the target journal was suggested within the top 3 ranks for only 26 percent of the queries. To mitigate the lack-of-data challenge, the data were enriched, normalized, and processed, which raised the effectiveness of the results in the top 3 ranks by 15 percent. However, in more than 30 percent of the queries the target journal was still suggested at rank 10 or beyond. Therefore, in the next stage, to mitigate the sparsity challenge, the data were categorized by subject, yielding a 30 percent increase in effectiveness in the top 3 ranks relative to the previous stage. Accordingly, lack and sparsity of data are among the factors that reduce the effectiveness of the results suggested by the RICeST Journal Finder; enriching the database, improving the processing pipeline, and categorizing the data by subject can counter these two challenges to a considerable degree and improve the effectiveness of the system's suggestions.

Keywords: effectiveness, journal recommender system, lack of data, data sparsity, RICeST Journal Finder

The Impact of Data Lack and Data Sparsity on the Effectiveness of the Results of the RICeST Journal Finder Results: A Case Study in the Field of Engineering

Narjes Vara, Mahdieh Mirzabeigi*, Hajar Sotudeh, Seyed Mostafa Fakhrahmad, Niloofar Mozafari

Journal of Information Processing and Management, Volume:37 Issue: 4, 2022, PP 1293 -1317

Several factors among the components of recommender systems are involved in generating and presenting recommendations. This study investigated the effect of two challenges, lack and sparsity of data, on the effectiveness of the results suggested by the RICeST Journal Finder. The corpus comprises more than 15,000 articles from technical and engineering journals published between 2013 and 2017, collected from the journals' websites. The textual elements of these articles were extracted, normalized, and processed, and a research corpus database was created. Based on the number of collected articles, Cochran's formula was used to select 400 base articles, each previously published in a journal relevant to its topic, by proportional random sampling. The title and abstract of each article were submitted to the system as a query in two stages, before and after mitigating the two challenges of lack and sparsity of data, to obtain the journals the system suggested for publishing the article. The suggested results at each stage were saved in Excel. Finally, the effectiveness of the system's results at each stage was determined by leave-one-out cross-validation, based on precision at k. The relative frequency of the ranks showed that, in the existing state, the target journal was suggested within the top 3 ranks for only 26% of the queries. After the data were enriched, normalized, and processed to mitigate the lack-of-data challenge, the accuracy of the results in the top 3 ranks increased by 15%, although in more than 30% of the queries the target journal was still suggested at rank 10 or beyond. After the data were then categorized thematically to mitigate the sparsity challenge, the accuracy of the results in the top 3 ranks increased by 30% compared with the previous stage.
The results of this study showed that enriching the database, improving the processing pipeline, and classifying the data thematically in the RICeST Journal Finder can reduce the two challenges of lack and sparsity of data and increase the effectiveness of the system's suggested results.
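The evaluation described above (is the target journal among the first k suggestions?) can be sketched as a top-k hit rate. This is only an illustration, assuming that the abstract's "accuracy criterion in k" means the share of queries whose known publication venue appears in the first k ranks; the function name `top_k_hit_rate` and the toy journal labels are hypothetical and do not come from the study.

```python
def top_k_hit_rate(ranked_journals, target_journals, k):
    """Share of queries whose target journal appears in the first k suggestions."""
    if len(ranked_journals) != len(target_journals):
        raise ValueError("one ranked list is needed per target journal")
    hits = sum(
        1
        for ranked, target in zip(ranked_journals, target_journals)
        if target in ranked[:k]
    )
    return hits / len(target_journals)


# Hypothetical toy data: each query is a held-out article whose real venue is known.
suggestions = [
    ["J1", "J2", "J3", "J4"],
    ["J5", "J1", "J2", "J3"],
    ["J2", "J3", "J4", "J1"],
    ["J4", "J5", "J3", "J2"],
]
targets = ["J1", "J3", "J1", "J9"]

rate_at_3 = top_k_hit_rate(suggestions, targets, k=3)
rate_at_4 = top_k_hit_rate(suggestions, targets, k=4)
```

Running the same function on the suggestions saved before and after each improvement stage would give the kind of stage-by-stage comparison the abstract reports.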

Keywords: Efficiency, Journal Finder, lack of Data, Data Sparsity, RICeST Journal Finder

A comparison of the views of researchers in engineering and the humanities on the importance of criteria for submitting an article to a journal and on the topical relevance of the results suggested by the RICeST Journal Finder

Narjes Vara, Mahdieh Mirzabeigi*, Hajar Sotudeh, Seyed Mostafa Fakhrahmad

Journal of Sciences and Techniques of Information Management, Year 8, No. 2 (Serial No. 27, Summer 1401), pp. 53-72

Aim

This study was conducted to compare the views of researchers in engineering and the humanities on the importance of the criteria for submitting an article to a journal and on the topical relevance of the results suggested by the RICeST Journal Finder.

Methodology

The study is applied in purpose and uses a survey for data collection. The first step was based on a researcher-made questionnaire and the views of researchers; the second was based on a checklist covering the textual elements of articles and the opinions of subject experts/reviewers.

Results

The findings showed that, from the perspective of researchers in both fields, peer review, the topical relevance of the article to the journal's scope, and having an impact factor were the most important criteria, while the age of the journal ranked last. In addition, the experts' assessment of the topical relevance of the system's suggested results showed that in more than 85 percent of queries the journal suggested for the article was fully relevant, with no statistically significant difference between the opinions of experts/reviewers in the two fields.

Conclusion

Given that the suggested results can be refined according to prioritized criteria, the system appears to be useful as an auxiliary tool for researchers.

Keywords: article publishing, researchers, RICeST, journal finder, journal selection criteria, relevant journal

Comparison of the views of researchers in the field of engineering and humanities about the importance of criteria for submitting an article to the journal and the degree of thematic relevance of the proposed results of the RICeST Journal Finder

Narjes Vara, Mahdieh Mirzabeigi *, Hajar Sotudeh, Seyed Mostafa Fakhrahmad

Journal of Sciences and Techniques of Information Management, Volume:8 Issue: 2, 2022, PP 53 -72

Aim

This study aims to compare the researchers' views in two fields about the importance of criteria for submitting an article to the journal and the degree of thematic relevance of the proposed results of the RICeST Journal Finder.

Methodology

This study is applied in purpose and uses a survey for data collection; it was carried out in two steps. First, because no standard questionnaire was available, the criteria that are important and common to researchers in the two fields were extracted from the literature, and a researcher-made questionnaire of 13 criteria was prepared. The face and content validity of the questionnaire were assessed by 10 experts in information science and epistemology. Then, to determine the importance of the criteria and the thematic relevance of the system's suggested results, subject matter experts (reviewers) in the two fields from all over Iran were consulted.

Findings

From the perspective of researchers in both groups, peer review, the thematic relevance of the article to the journal's scope, and having an impact factor were the most important criteria, and the age of the journal was the least important. Measuring the thematic relevance of the system's suggested results on the basis of expert opinion showed that in more than 85% of the queries the journal suggested for the article was completely relevant. There was no statistically significant difference between the opinions of experts in the two fields. It should be noted that the thematic relevance of the results was evaluated after the existing challenges in the RICeST Journal Finder had been mitigated.

Conclusion

There are many journals in the various scientific fields, so authors face the challenge of finding the most appropriate journal in which to publish their findings. The results showed that the importance national researchers assign to journal selection criteria is consistent with the findings of international studies. However, because researchers' subjective considerations vary with circumstances, no single set of factors can be prescribed for choosing a journal for a manuscript. Nevertheless, because the results can be refined according to researchers' priority criteria, journal finder systems can serve as an auxiliary tool for reaching a list of relevant journals more quickly and easily. Among the other factors that are important and can be examined objectively, regardless of the author's priorities and limitations, is the thematic fit between the manuscript and the journal to which it is to be submitted, which is the basis on which journal finder systems operate. Overall, according to the results, authors can use the RICeST Journal Finder at the national level to find relevant journals for publishing their manuscripts.

Keywords: Related Journal, Article Publishing, Journal Selection Criteria, Researchers, RICeST, Journal Finder

A novel method for automatic facet extraction in faceted search (case study: the obstetrics and gynecology domain)

Abdolhossein Farajpahlou, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani*

Journal of Information Processing and Management, Year 37, No. 3 (Serial No. 109, Spring 1401), pp. 807-837

The aim of this research is to devise and introduce a novel algorithm for facet extraction that makes it possible to identify facets empirically with the help of literary warrant. The proposed algorithm rests on two ideas. The first is that a facet emerges in context, so to detect a facet in a body of text its context must be examined. The second is that a facet is a focal point in a lexical tree that is neither very general nor very specific. Within medicine, the obstetrics and gynecology domain was chosen as the test bed. Three text corpora were selected from the literary warrant. The contextual corpus was drawn from the titles and abstracts of the articles in the top 20 journals of the field and comprised 167,071 documents. The second, the origin corpus, consisted of 2,000 articles selected at random from the contextual corpus. The third is the lexical corpus, which was extracted using a web service and the LIDF-value term-ranking measure. The output comprised 514 terms; duplicates were removed, leaving 480 important terms. The terms were then expanded in the contextual corpus with the help of the guide set, MeSH, and candidate facets were extracted based on two conditions: frequency-based shifting, meaning that more documents are associated with the term in the contextual corpus than in the origin corpus, and rank-based shifting, meaning that the term's rank rises in the contextual corpus relative to the origin corpus, which indicates that the term has become more general. Finally, using three rules (specificity, substitution, and generality), the identified facets were refined and named, and 26 facets were identified as the facets of the obstetrics and gynecology domain. Comparing the proposed algorithm with other algorithms showed that creating three partitions (the origin partition, the text-body partition, and a partition for identifying important terms), comparing a term's behavior across them, and then building a tree from the candidate facets, that is, combining the statistical approach with tree pruning, can yield better results than a purely statistical approach or tree pruning alone. In addition, comparing the algorithm's output facets with the traditional facets in this area showed that the algorithm's facets are finer-grained and more useful for browsing in information retrieval tools.
This research also found that the facets of a specialized domain differ from the general facets of medicine and can be identified and defined independently of them; however, the results cannot be generalized to all medical domains, and further research is needed in other areas.

Keywords: information retrieval, facet, faceted search, automatic facet extraction

Introducing a novel method for Automatic facet extraction in the faceted search (Case Study: gynecology and obstetrics domain)

Abdolhossein Farajpahlou, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani*

Journal of Information Processing and Management, Volume:37 Issue: 3, 2022, PP 807 -837

In this research, a new algorithm for facet extraction was developed and introduced, which makes it possible to identify facets empirically on the basis of literary warrant. A review of prior research on automatic facet extraction suggested two main ideas. The first is that a facet appears in context; therefore, to identify a facet in a corpus, its context must be examined. The second is that a facet is a focal point in a lexical tree that is neither very general nor very specific. Based on these two ideas, the obstetrics and gynaecology domain within medicine was chosen for building the corpora. The research team selected three corpora from the literary warrant. The contextual corpus was built from the titles and abstracts of the articles in the top 20 journals of the field and contained 167,071 documents. The origin corpus consisted of 2,000 randomly selected articles. The third is the lexical corpus, whose important terms were extracted using a web-based service. The output contained 514 terms; duplicates were removed, leaving 480 important terms. The terms were then expanded in the contextual corpus with the help of the guide set, MeSH, and candidate facets were extracted based on two conditions: frequency-based shifting and rank-based shifting. Finally, using three rules (specificity, substitution, and generality), the identified facets were refined and named, and 26 facets were identified in the gynaecology and obstetrics domain. A comparison of the proposed algorithm with other algorithms indicated that combining a statistical approach with tree pruning can give better results than a purely statistical approach or tree pruning alone.
A comparison of the algorithm's output facets with the traditional facets in the obstetrics and gynaecology domain also showed that the algorithm's facets are finer-grained and more useful for browsing in information retrieval tools. The study further found that the facets of a specialized domain differ from general facets and can be defined independently of them; however, the results cannot be generalized to all medical domains, and further research is needed in other fields.
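The two shifting conditions named in the abstract can be illustrated with a small sketch. It assumes one plausible reading, which the abstract does not spell out: a term passes frequency-based shifting when it occurs in more documents of the contextual corpus than of the origin corpus, and passes rank-based shifting when its frequency rank moves closer to the top in the contextual corpus. All function names and the toy documents are hypothetical, and MeSH expansion and the three refinement rules are left out.

```python
def document_frequency(corpus, terms):
    """Count, for each term, the documents that contain it (case-insensitive substring match)."""
    return {t: sum(1 for doc in corpus if t.lower() in doc.lower()) for t in terms}


def rank_by_frequency(freq):
    """Map each term to its rank (1 = most frequent); ties are broken alphabetically."""
    ordered = sorted(freq, key=lambda t: (-freq[t], t))
    return {t: i + 1 for i, t in enumerate(ordered)}


def candidate_facets(origin_corpus, context_corpus, terms):
    """Keep the terms that satisfy both assumed shifting conditions."""
    origin_df = document_frequency(origin_corpus, terms)
    context_df = document_frequency(context_corpus, terms)
    origin_rank = rank_by_frequency(origin_df)
    context_rank = rank_by_frequency(context_df)
    return [
        t
        for t in terms
        if context_df[t] > origin_df[t]        # frequency-based shifting (assumed reading)
        and context_rank[t] < origin_rank[t]   # rank-based shifting (assumed reading)
    ]


# Hypothetical toy corpora: the origin corpus is a subset of the contextual corpus.
origin = ["preeclampsia diagnosis in pregnancy", "ultrasound in pregnancy"]
context = origin + [
    "diagnosis of ovarian cancer",
    "diagnosis of endometriosis",
    "prenatal diagnosis by ultrasound",
]
terms = ["pregnancy", "diagnosis", "ultrasound"]

candidates = candidate_facets(origin, context, terms)
```

In this toy data only "diagnosis" both gains documents and climbs in rank, so it is the only candidate; a real run would compare relative rather than raw frequencies if the corpora differ greatly in size.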

Keywords: information retrieval, facet, faceted search, automatic facet extraction

Keyword and key phrase extraction from Persian texts (a review of the research conducted)

Atefeh Kalantari*, Abdolrasool Jowkar, Seyed Mostafa Fakhrahmad, Javad Abbaspour, Hajar Sotudeh, Massoud Mortazavi Nasrabad, Amir Javadi, Zahra Pourbahman

Journal of Information Processing and Management, Year 36, No. 2 (Serial No. 104, Winter 1399), pp. 563-592

Extracting the keywords and key phrases of a text is a prerequisite for many other tasks in natural language processing. Yet a review of the Persian and English literature in this area shows that only a handful of attempts have been made to extract keywords and key phrases from Persian texts. This article therefore aims to establish the current state of Persian natural language processing, and specifically of keyword and key phrase extraction from Persian texts, by briefly reviewing the Persian and English articles published in this area that used Persian texts to test their ideas. It then examines and challenges each article in terms of its methodology, implementation, evaluation method, and evaluation measures. In total, 14 Persian and 6 English articles have addressed keyword and key phrase extraction from Persian texts. Most of them rely on statistical and linguistic information. Most either have flaws in the chosen methodology or fail to explain the proposed idea clearly to the reader. Many do not use a standard dataset to evaluate the system, and the way the evaluation measures are calculated is vague or flawed. Overall, apart from 3 articles that report the implemented method in a reasonably acceptable way, the articles are neither reproducible nor generalizable. They therefore cannot serve as a baseline for evaluating future systems, nor can the ideas they propose be used with confidence to build and develop practical applications for keyword extraction.

Keywords: keyword extraction, key phrase extraction, natural language processing, Persian language, review

Keyword and phrase Extraction from Persian texts: a review of the literature

Atefeh Kalantari*, Abdolrasool Jowkar, Seyed Mostafa Fakhrahmad, Javad Abbaspour, Hajar Sotudeh, Massoud Mortazavi, Amir Javadi, Zahra Pourbahman

Journal of Information Processing and Management, Volume:36 Issue: 2, 2021, PP 563 -592

Keyword and phrase extraction is a prerequisite for many natural language processing tasks. However, a review of the related Persian and English literature showed that only a few studies have addressed the extraction of keywords and phrases from Persian texts. Thus, aiming to shed light on the state of research on keyword and phrase extraction from Persian texts, the present study reviews the Persian and English publications that have tested their research ideas on Persian texts. We also examine each study to challenge its methodology, implementation, and evaluation methods and measures. To our knowledge, a total of 14 Persian and 6 English papers have worked on the extraction of Persian keywords and phrases. An examination of the papers revealed that they were mostly based on statistical and linguistic information. A majority of the papers suffered from either an inappropriate methodology or an unclear explanation of their research ideas. They generally used non-standard datasets and vague or problematic metrics to evaluate the experimental systems. Overall, except for 3 papers that reported their proposed methods adequately, the papers lacked reproducibility and generalizability. Hence, their results cannot be confidently used as a benchmark in evaluating future work, and their proposed ideas cannot be confidently employed in developing applications for the extraction of keywords and phrases from Persian texts.
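Since the review criticizes vague evaluation metrics, it may help to see how the usual measures are computed when stated explicitly. The sketch below assumes exact-match comparison between extracted and gold-standard keywords after trimming and lowercasing; that matching rule, the function name `keyword_prf`, and the sample keywords are illustrative assumptions, not a procedure taken from any of the reviewed papers. Persian text would additionally need script normalization before matching.

```python
def keyword_prf(extracted, gold):
    """Precision, recall, and F1 for exact-match keyword extraction."""
    extracted_set = {k.strip().lower() for k in extracted}
    gold_set = {k.strip().lower() for k in gold}
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system output and author-assigned keywords for one document.
extracted = ["information retrieval", "Persian", "corpus", "stemming"]
gold = ["Information Retrieval", "persian", "keyword extraction"]

precision, recall, f1 = keyword_prf(extracted, gold)
```

Reporting the matching rule alongside the scores is what makes such numbers comparable across systems.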

Keywords: extraction, key words, key phrases, natural language processing, Persian language, review

Extracting the facets of the obstetrics and gynecology subject domain based on a user-oriented approach

Abdolhossein Farajpahlou, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani*

Health Information Management, Year 16, No. 6 (Serial No. 70, Bahman and Esfand 1398), pp. 285-293

Introduction

Although the concept of facet analysis has a long history in classification and information retrieval systems, applying the facet analysis approach in today's retrieval systems runs into problems, one of which is insufficient attention to the user as the system's main stakeholder. The aim of this study was to present a method for extracting appropriate facets in modern information retrieval systems using a user-oriented approach.

Methods

To understand users' needs and arrive at the facets of the specialized obstetrics and gynecology domain, conventional content analysis with a qualitative approach was used. First, 14 specialists in midwifery and in obstetrics and gynecology were interviewed, and the information needs of the user group were identified. The information needs were then classified with the help of subject specialists, and a facet was assigned to each category. To evaluate the usefulness of the extracted facets, an expert panel of 8 subject specialists and 8 specialists in medical library and information science was used, and agreement was assessed using the total agreement formula.

Results

Based on the codes extracted from the interviews on determining the information needs of stakeholders in the obstetrics and gynecology domain, 23 facets were obtained. Of these, 9 facets (age group, organ, treatment methods, diagnosis, disease, signs and symptoms, risk factor, complication, and prognosis) received an agreement coefficient above 80 percent and were identified by the experts as appropriate facets.

Conclusion

Extracting the facets of information retrieval systems based on a user-oriented approach turns the facets from general into specialized ones. The facets shown in the user interface will then differ for the users of each specialized domain, and specialized user interfaces take shape.

Keywords: information-seeking behavior, information storage and retrieval, classification

The User-oriented Approach for Facet Extraction in Gynecology and Obstetrics Domain

Abdolhossein Farajpahlou, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani*

Health Information Management, Volume:16 Issue: 6, 2020, PP 285 -293

Introduction

Although the concept of facet analysis has a long history in classification and information retrieval systems, the use of the facet analysis approach in current information retrieval systems has been associated with drawbacks. One of these drawbacks is the lack of proper attention to the user as the main stakeholder of the system. In this study, a method based on a user-oriented approach is presented for extracting appropriate facets in modern information retrieval systems.

Methods

To understand the needs of users and arrive at the facets of gynecology and obstetrics, conventional content analysis with a qualitative approach was employed. First, the information needs of the user group were identified through interviews with 14 specialists in the fields of gynecology and obstetrics. Then, the information needs were classified with the help of specialists in the subject area, and a facet was assigned to each category. An expert group consisting of eight subject-area specialists and eight specialists in knowledge and information science evaluated the usefulness of the extracted facets, and agreement was assessed using the total agreement formula.

Results

Based on the codes extracted from the interviews on the information needs of stakeholders in the gynecology and obstetrics domain, 23 facets were identified. Of these, 9 (age group, organ, therapeutics, diagnosis, disease, symptoms or findings, risk factor, complication, and prognosis) received a coefficient of agreement above 80% and were identified by the experts as appropriate facets.
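The abstract does not state the "total agreement formula", so the sketch below assumes the simplest reading: the share of panel members who judged a facet appropriate, with facets kept when that share exceeds 80%. The function names, the votes, and the "Cost" facet are hypothetical illustrations; only the panel size of 16 and the 80% threshold come from the abstract.

```python
def agreement_share(votes):
    """Fraction of experts who judged the facet appropriate (True votes)."""
    return sum(votes) / len(votes)


def accepted_facets(ratings, threshold=0.80):
    """Facets whose agreement share is strictly above the threshold."""
    return [facet for facet, votes in ratings.items() if agreement_share(votes) > threshold]


# Hypothetical votes from a 16-member panel (8 subject experts, 8 information specialists).
ratings = {
    "Disease": [True] * 15 + [False] * 1,    # 15/16 = 0.9375
    "Cost": [True] * 10 + [False] * 6,       # 10/16 = 0.625 (hypothetical facet)
    "Prognosis": [True] * 13 + [False] * 3,  # 13/16 = 0.8125
}

kept = accepted_facets(ratings)
```

Chance-corrected coefficients would give different values from this raw share, which is why the exact formula matters when results are compared.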

Conclusion

Extracting the facets of information retrieval systems with a user-oriented approach turns general facets into specialized ones. The facets shown in the user interface then differ for the users of each specialized domain, so that specialized user interfaces take shape.

Keywords: Information Seeking Behavior, Information Storage and Retrieval, Classification

Analyzing the application of Hyland's metadiscourse model in citation-based automatic summarization: a proposed annotation scheme for citation contexts

Pegah Tajer*, Abdorasoul Jowkar, Seyed Mostafa Fakhrahmad, Hajar Sotoudeh, Alireza Khormaee

Library and Information Science (quarterly), Year 22, No. 3 (Serial No. 87, Autumn 1398), pp. 91-111

Objective

This article aims to analyze the application of Hyland's metadiscourse model in citation-based automatic summarization of scientific texts and to propose a metadiscourse-based annotation scheme for citation contexts for use in citation-based summarization.

Methodology

This is library research; the research questions were answered by studying and analyzing sources on Hyland's metadiscourse model, automatic summarization of scientific texts, citation context analysis, and the classification of citation functions.

Findings

Hyland's interactional metadiscourse is used to show the author's perspective toward propositional information and toward the reader, draws on linguistic devices suited to the critique genre, and is suitable for analyzing citation contexts. A metadiscourse-based annotation scheme for citation contexts was therefore proposed on the basis of hedges, boosters, attitude markers, self-mentions, and engagement markers, which are the main components of Hyland's interactional metadiscourse. The scheme comprises 70 classes.

Conclusion

Hyland's interactional metadiscourse can be used to build a suitable corpus for citation-based automatic summarization, and the stages of building the classifiers the summarization process needs, refining citation contexts, and selecting sentences for the final summary can be carried out on that basis. Corpora are generally annotated according to an annotation scheme, so the proposed scheme can be useful. Because the proposed scheme is based on existing theories, annotators applying it should be asked to note, with reasons, any label that occurs to them during labeling other than those in the scheme, so that it can be added to the scheme if the desired agreement is reached.

Keywords: Hyland's metadiscourse, citation contexts, citation-based summarization, annotation scheme, scientific texts

Analyzing the Application of Hyland Metadiscourse Model for Citation-based Automatic Text Summarization: A proposed Annotation Scheme for Citation Contexts

Pegah Tajer*, Abdorasoul Jowkar, Seyed Mostafa Fakhrahmad, Hajar Sotoudeh, Alireza Khormaee

Library and Information Science, Volume:22 Issue: 3, 2019, PP 91 -111

Objective

An author's abstract contains the contributions that the author considers important. These contributions may, however, be less important to the scientific community. Supplementary information of this kind can be obtained by analyzing citing articles. The citation contexts that refer to a cited article are, in effect, summaries of that article produced by the scientific community. This type of summary, called a citation summary, can provide deeper insight into the impact of the article on the scientific community. Selecting useful citation sentences for inclusion in a system summary is one of the major challenges of citation-based automatic text summarization. Because semantic analysis of citation contexts reveals citation functions, it can be used to refine citation contexts and to place important content in the final summary. Approaches such as metadiscourse analysis, which provide more information, should therefore help produce useful summaries. Accordingly, this paper analyzes the application of the Hyland metadiscourse model to citation-based automatic summarization of scientific texts. In addition, based on the Hyland metadiscourse model, an annotation scheme is proposed for citation contexts that could be used in corpus-based citation summarization systems.

Methodology

This is library research that answers the research questions by studying and analyzing sources on the Hyland metadiscourse model, scientific text summarization, citation context analysis, and citation function classification. The scheme was developed in two stages of analysis. First, an initial scheme was created from a study of existing schemes. Then, its metadiscourse version was derived by analyzing the Hyland metadiscourse model. The proposed annotation scheme was validated by expert evaluation: three experts in information science and two in linguistics confirmed the scheme.

Findings

Hyland's interactional metadiscourse is suitable for analyzing citation contexts because it is used to represent the author's perspective on propositional information and on the reader. Moreover, interactional metadiscourse analysis applies language tools appropriate to the critique genre. Therefore, a scheme was proposed based on boosters, attitude markers, hedges, engagement markers, and self-mentions, which are the main components of Hyland's interactional metadiscourse. The proposed scheme includes 70 classes.

Conclusion

Hyland's interactional metadiscourse can be used to construct suitable corpora for citation-based automatic text summarization. Other phases of automatic summarization, such as classifier development, citation context refinement, and sentence selection, could also be performed on the basis of this type of metadiscourse. Corpora are usually annotated according to an annotation scheme, so the proposed scheme should be beneficial. However, it is a conceptual scheme based on existing theories. Annotators should therefore be asked to write down any new labels that occur to them while annotating, together with their reasons for creating them. In the next stage, if the desired agreement is reached, those labels could be added to the scheme.

Keywords: Annotation Scheme, Citation-based Summarization, Citation Contexts, Hyland Metadiscourse Model, Scientific Texts

A Correlation Study of Co-opinion and Co-citation Similarity Measures

Maryam Yaghtin, Hajar Sotudeh*, Mehdi Mohammadi, Mahdieh Mirzabeigi, Seyed Mostafa Fakhrahmad

International Journal of Information Science and Management, Volume:17 Issue: 2, Jul-Dec 2019, PP 19 -31

Co-citation forms a relational document network. Co-citation-based measures have been found effective in retrieving relevant documents. However, they are far from ideal and need further enhancement. The co-opinion concept was proposed and tested in previous research and was likewise found effective in retrieving relevant documents. The present study explores the correlation between opinion (dis)similarity measures and the traditional co-citation-based ones, including the Citation Proximity Index (CPI), co-citedness, and co-citation context similarity. The results show significant, though weak to medium, correlations between the variables. The correlations are direct for the co-opinion measure and inverse for the opinion distance. Accordingly, the two groups of measures represent some similar aspects of the relation between documents. Moreover, the weakness of the correlations implies that the two groups also represent different dimensions.
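The direct and inverse correlations described above can be illustrated with a rank correlation computed on toy data. The abstract does not say which coefficient the authors used; Spearman's rho is assumed here purely for illustration, and the scores, variable names, and the definition of opinion distance as one minus co-opinion similarity are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for index in order[i:j + 1]:
            result[index] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five document pairs.
co_opinion = [0.9, 0.4, 0.7, 0.2, 0.6]              # opinion similarity
co_citedness = [12, 3, 5, 4, 9]                     # times the pair is cited together
opinion_distance = [1 - s for s in co_opinion]      # assumed complement of similarity

rho_similarity = spearman(co_opinion, co_citedness)
rho_distance = spearman(opinion_distance, co_citedness)
```

With these toy values the similarity measure correlates positively with co-citedness and the distance measure negatively, mirroring the direction of the reported relationships but not their strength.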

Keywords: Opinion, Co-opinion, Co-citation, Correlation, Citation Proximity Index, Similarity

The growth of the facet analysis approach in knowledge organization: a hundred-year review

Abdolhossein Farajpahlou*, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani

Journal of Information Processing and Management, Year 34, No. 3 (Serial No. 97, Spring 1398), pp. 1235-1264

The facet analysis approach has grown steadily from the early twentieth century to the present. This article aims to systematically review the research on, and documentation of, faceted organization schemes and to classify these studies by topic and period. The review identified the growth and development of the approach's applications in tools for organizing and retrieving information and offered suggestions for future researchers. To this end, the first step was a comprehensive search of sources and an initial screening of documents; the second was the classification and refinement of the documents; and the third was their temporal and topical classification, the analysis of the texts, and the identification of existing gaps, followed finally by suggestions for covering those gaps. Earlier efforts produced faceted classifications, faceted thesauri and subject headings, and faceted information retrieval systems, and this work continued extensively until the 1990s. After that, with the development of computer systems and the web, facets took on a different role in retrieving information from databases. In this period a set of models, faceted metadata, faceted user interfaces, and faceted ontologies took shape, and numerous software tools were developed. From about the early twentieth century until 1990, the facet analysis approach advanced on the basis of the logical (a priori) system of classifying the sciences. From then on, because of expanding computing capabilities and growing user needs, the logical view gave way to a computational and user-oriented (a posteriori) view. Building facet structures in the semantic web environment and creating new standards, exploiting more effective methods of understanding user behavior, and attending to the historical development and transformation of science are gaps that still need further study. Covering these gaps promises a lasting impact for the facet analysis process in the future.

Keywords: knowledge organization, information retrieval, facet, facet analysis, systematic review

The development of facet analysis approach in knowledge organization: a 100-Year Review

Abdolhossein Farajpahlou*, Farideh Osareh, Seyed Mostafa Fakhrahmad, Leila Dehghani

Journal of Information Processing and Management, Volume:34 Issue: 3, 2019, PP 1235 -1264

Facet analysis approach (FAA) has exhibited a continuous growth trend since the early 20th century. The present paper is aimed to systematically review the studies and documents of the faceted organization plans as well as thematic and temporal classification of these works. This review led to the identification of the growth and development trends of applying this approach in the information organization and retrieval tools, followed by providing some suggestions for researchers in future works. Accordingly, the steps to be taken in the present work were as follows: The first step included a comprehensive search in relevant references as well as a primary review of the documents, followed by classification and refinement of the documents in the second step. The third step addressed temporal and thematic classification of the documents, analysis of the literature, and identification of the existing gaps. In the final step, some suggestions were provided for covering these gaps. The outcomes of the previous works included the development of faceted rankings, faceted glossaries and headings, and faceted information retrieval systems, the extensive use of which continued until 1990s. Subsequently, with the development of computer systems and web, the facets took another role in the retrieval of the data available in the database. During this period, a set of models, faceted metadata, faceted user interfaces, and faceted ontologies was made, which was followed by the development of several software in this field. The facet analysis approach has been developing since the early 20th century to 1990 based on the logical system (a priori) of science classification. However, since then, due to the development of the computer capabilities and growth of the users' needs, the logical perspective was replaced by the computational and user-oriented (a posteriori) perspective. 
Creating facet structures in semantic web environments and formulating new standards, using more effective methods to understand user behavior, and taking the historical development and change of the sciences into account are gaps that still require further study. Covering these gaps promises sustained effectiveness of the facet analysis process in the future.

Keywords: knowledge organization, information retrieval, facet, Facet analysis, Systematic review

Investigating text power in predicting semantic similarity

Zahra Yousefi, Hajar Sotudeh *, Mahdieh Mirzabeigi, Seyed Mostafa Fakhrahmad, Alireza Nikseresht, Mehdi Mohammadi

International Journal of Information Science and Management, Volume:17 Issue: 1, Jan-Jun 2019, PP 17 -31

This article presents an empirical evaluation of the distributional semantic power of the abstract, body, and full text, as different text levels, in predicting semantic similarity, using a collection of open access articles from PubMed. Semantic similarity is measured by two criteria, namely linear MeSH term intersection and hierarchical MeSH term distance. A random sample of 200 queries and 20,000 documents was selected from a test collection built on the CITREC open source code. The SimPack Java library was used to calculate the textual and semantic similarities. The nDCG value corresponding to each of the two semantic similarity criteria was calculated at three precision points. Finally, the nDCG values were compared using the Friedman test to determine the power of each text level in predicting semantic similarity. The results showed that text is effective in representing semantic similarity: texts with maximum textual similarity were found to be 77% and 67% semantically similar in terms of the linear and hierarchical criteria, respectively. Furthermore, text length was found to be more effective in representing hierarchical semantic similarity than linear semantic similarity. Based on the findings, it is concluded that when the subjects are homogeneous in the tree of knowledge, abstracts provide effective semantic capabilities, whereas in heterogeneous milieus, full-text processing or knowledge bases are needed to achieve IR effectiveness.

Keywords: distributional semantics, semantic similarity, textual similarity, effectiveness, information retrieval, MeSH
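
As an illustration of the nDCG measure mentioned in the abstract above, here is a minimal sketch. It assumes the "precision points" are rank cutoffs k and that each retrieved document carries a graded semantic-similarity gain; the function names, the log2 discount, and the example gains are assumptions for illustration, not details taken from the paper.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k ranked items.

    `gains` lists graded relevance values in the order the system ranked
    the documents (hypothetical semantic-similarity scores here).
    """
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical gains for one query, in the order ranked by textual similarity.
ranked_gains = [3, 0, 2, 1, 0]
scores = {k: ndcg_at_k(ranked_gains, k) for k in (1, 3, 5)}
```

A value of 1.0 at a given cutoff means the textual ranking orders the documents exactly as the semantic criterion would.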

A Hybrid Accurate Alignment method for large Persian-English corpus construction based on statistical analysis and Lexicon/Persian Word net

Mohammad Bagher Dastgheib*, Seyed Mostafa Fakhrahmad, Mansour Zolghadri Jahromi

International Journal of Information Science and Management, Volume:14 Issue: 2, Jul-Dec 2016, P 97

A bilingual corpus is a very important knowledge source and an indispensable requirement for many natural language processing (NLP) applications that involve two languages. For some languages, such as Persian, the lack of such resources is especially acute. Several applications, including statistical and example-based machine translation, need bilingual corpora in which large amounts of text from two different languages have been aligned at the sentence or phrase level. To meet this requirement, this paper proposes an accurate hybrid sentence alignment method for constructing an English-Persian parallel corpus. As the first step, the proposed method uses statistical length-based analysis to filter candidates. Punctuation marks are used as a guiding feature to reduce complexity and increase accuracy. Finally, the proposed method uses lexical knowledge to produce the final output. In the lexical analysis phase, a bilingual dictionary and a Persian semantic net (FarsNet) are used to calculate the extended semantic similarity. Experiments showed that expanding words with their synonyms through extended semantic similarity has a positive effect on the accuracy of the sentence alignment process. The proposed matching scheme also uses a semantic-load-based approach, which treats the verb as the pivot and main part of a sentence, to increase accuracy. The experimental results were promising, and the generated parallel corpus can serve as an effective knowledge source for researchers working on the Persian language.

Keywords: Parallel corpora, Hybrid sentence alignment, English, Persian corpus, Extended semantic similarity
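
The length-based candidate filtering step described in the abstract above could look roughly like the following sketch. It is a simple character-length ratio filter, not the authors' statistical model: the function names, the expected ratio, the tolerance, and the search window are all hypothetical values chosen for illustration.

```python
def length_ratio_ok(src, tgt, expected_ratio=1.0, tolerance=0.5):
    """Return True if the target/source character-length ratio is plausible.

    `expected_ratio` and `tolerance` are illustrative assumptions; a real
    system would estimate them from already aligned text.
    """
    if not src or not tgt:
        return False
    ratio = len(tgt) / len(src)
    return abs(ratio - expected_ratio) <= tolerance

def candidate_pairs(src_sents, tgt_sents, window=2, **kwargs):
    """List (source index, target index) pairs that pass the length filter.

    Only target sentences within `window` positions of the source index
    are considered, which keeps the candidate set small.
    """
    pairs = []
    for i, src in enumerate(src_sents):
        lo = max(0, i - window)
        hi = min(len(tgt_sents), i + window + 1)
        for j in range(lo, hi):
            if length_ratio_ok(src, tgt_sents[j], **kwargs):
                pairs.append((i, j))
    return pairs
```

Pairs that survive this filter would then be passed to the more expensive lexical comparison, so the filter only needs to be permissive enough not to discard true alignments.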

Word Sense Disambiguation Based on Lexical and Semantic Features Using Naive Bayes Classifier

Amir Hossein Rasekh *, Mohammad Hadi Sadreddini, Seyed Mostafa Fakhrahmad

Journal of Computing and Security, Volume:1 Issue: 2, Spring 2014, PP 123 -132

Machine translation is a branch of machine intelligence with about fifty years of history. Ambiguity of language is the most problematic issue in machine translation systems and may lead to unclear or wrong translations. One of the problems in natural language processing is the semantic and structural ambiguity of words. This paper focuses on word sense disambiguation. The existing algorithms for word sense disambiguation are evaluated, and a method is proposed based on the concept, structure, and meaning of words. The experimental results are promising and indicate that the proposed approach significantly outperforms its counterparts in terms of disambiguation accuracy.

Keywords: Natural Language Processing, Word Sense Disambiguation, Text Mining
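
To illustrate the general idea of Naive Bayes word sense disambiguation named in the title above, here is a minimal sketch. It uses only bag-of-words context features with Laplace smoothing; the class name, the smoothing constant, and the toy training data are assumptions for illustration, and the paper's own lexical and semantic feature set is not reproduced here.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Toy Naive Bayes sense classifier over context words (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def fit(self, examples):
        """`examples` is an iterable of (context_words, sense) pairs."""
        for context_words, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context_words)
            self.vocab.update(context_words)
        return self

    def predict(self, context_words):
        """Return the sense with the highest smoothed log posterior."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for word in context_words:
                score += math.log((self.word_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Hypothetical training data for the ambiguous word "bank".
training = [
    (["river", "water", "shore"], "bank/river"),
    (["money", "loan", "deposit"], "bank/finance"),
    (["water", "fishing"], "bank/river"),
    (["loan", "interest"], "bank/finance"),
]
model = NaiveBayesWSD().fit(training)
```

Calling `model.predict(["water", "shore"])` then selects the sense whose training contexts best match the given words.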

