Magiran | Keyword search: "text mining"

Using machine learning algorithms in determining the stage of breast cancer from pathology reports

Shirin Samadzad-Qushchi, Parinaz Eskandarian, Zahra Niazkhani, Ali Rashidi, Habibollah Pirnejad *

Frontiers in Health Informatics, Volume:13 Issue: 1, Winter 2024, P 182

Introduction

After a cancer diagnosis, the most important thing is to determine the stage and grade of the cancer. Pathology reports are the main source for cancer staging, but they do not contain all the information needed for the staging. However, the text of these reports is sometimes the only available information. We were interested in knowing whether text mining methods can be used to predict staging only from pathology reports.

Material and Methods

A total of 698 pathology reports of breast cancer cases, together with their TNM staging, were collected from multiple centers in West Azerbaijan Province, Iran, and used for this study. After the semi-structured reports were prepared, their texts were imported into a program written in Python 3. Three machine learning algorithms (Logistic Regression, SVM, and Naïve Bayes) and a simple pipeline were used for text mining. The performance of the algorithms was evaluated in terms of accuracy, precision, recall, and F1 score.
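The classification and evaluation steps described above can be sketched roughly as follows. This is a minimal standard-library illustration of a multinomial Naïve Bayes text classifier with macro-averaged metrics, not the authors' program: the function names, the whitespace tokenizer, the Laplace smoothing, and the toy snippets with their "early"/"late" labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(n_docs / len(docs))
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, text):
    """Return the label with the highest log-score; unseen words are skipped."""
    tokens = tokenize(text)
    scores = {}
    for label, log_prior in model["log_prior"].items():
        ll = model["log_likelihood"][label]
        scores[label] = log_prior + sum(ll[t] for t in tokens if t in ll)
    return max(scores, key=scores.get)


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    n = len(per_class)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(m[0] for m in per_class) / n,
        "recall": sum(m[1] for m in per_class) / n,
        "f1": sum(m[2] for m in per_class) / n,
    }


# Toy, invented snippets with hypothetical labels; not real pathology data.
train_docs = [
    "small tumor no nodes involved",
    "small tumor nodes negative",
    "large tumor nodes involved",
    "large tumor distant metastasis",
]
train_labels = ["early", "early", "late", "late"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, d) for d in train_docs]
print(macro_metrics(train_labels, predictions))
```

In a real evaluation the metrics would be computed on held-out reports rather than on the training texts, which are reused here only to keep the sketch short.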

Results

The Naïve Bayes algorithm scored higher than 91% on all evaluation criteria (accuracy, precision, recall, and F1 score). This means that it classified the reports with high efficiency and that its predictions were more often correct than those of the other two algorithms. Naïve Bayes also outperformed SVM and Logistic Regression in terms of accuracy, recall, and F1 score. In addition, Naïve Bayes showed faster inference, owing to its simplicity and its lower computational cost and training time.

Conclusion

We suggest using the design proposed in this study to predict breast cancer staging where staging is needed but no information other than pathology reports is available. This method may not be useful for the clinical management of cancer patients, but it can be safely used for epidemiological estimation.

Keywords: Breast Cancer, Pathology Reports, Text Mining, NLP, TNM Stage, Machine Learning

Text Mining of COVID-19 Publications to Discover and Extract Emerging Trends

Farshid Danesh, Forough Rahimi*

Iranian Journal of Medical Microbiology, Year 17, Issue 2 (Farvardin and Ordibehesht 1402), pp. 150-160

Background and Aims

The very large volume of credible COVID-19 publications worldwide makes the need to monitor and analyze the COVID-19 scientific literature increasingly clear, for researchers at the micro level and for policymakers and planners at the macro level. In other words, the results of analyzing published COVID-19 documents with text-mining methods and techniques are of particular importance to researchers, policymakers, and planners in the medical sciences at the national and international levels, which further underscores the need for such research. The main aim of this study is to identify emerging topics and trends of change in scientific vocabulary at the national and international levels in the COVID-19 subject area using text mining.

Materials and Methods

This is applied research. It was carried out using text mining and its related algorithms and techniques, together with text classification, following an analytical-comparative approach. The study population comprises all COVID-19 publications indexed in PubMed Central® (PMC). As of 20 Khordad 1400, 160,862 records had been retrieved from PubMed Central® (PMC). Of these, 3,143 were national and 157,719 were international COVID-19 publications. The Python programming language and its related libraries were used. The most important terms were identified and reported on the basis of TF-IDF weighting. Emerging topics were identified according to the growth of the weighted average.

Findings

Data analysis indicates that "covid", "infect", and "cell" are among the most important terms used in international COVID-19 publications, and that "patient", "SARS-Cov", and "covid" are the most important terms in national publications.

Conclusion

Regarding the trend of change in the vocabulary used in COVID-19 publications, one of the most important inferences is the fundamental difference between the most important terms in international and national publications: international research emphasizes the coronavirus and the infection it causes, whereas national research emphasizes patients and the coronavirus. Another important result is the year-to-year change in terms in both national and international publications. It is worth noting that these changes in terms, in national and international publications alike, are aligned with important scientific events.

Keywords: COVID-19, Text Mining, TF-IDF Weighted Frequency, Classification, Clustering, Emerging Topics, Python

Mining of Emerging trends of Covid-19 thematic areas in National and International publications

Farshid Danesh, Forough Rahimi*

Iranian Journal of Medical Microbiology, Volume:17 Issue: 2, 2023, PP 150 -160

Background and Aim

The results of analyzing the COVID-19 literature with text-mining techniques are of particular importance for researchers, policymakers, and planners in the medical sciences at the national and international levels, because they help avoid parallel research and wasted time and budget. This paper explores emerging topics and trends in scientific vocabulary at the national and international levels in the subject area of COVID-19.

Materials and Methods

This applied research was conducted using text mining, its related algorithms, and text classification. The population consists of all COVID-19 articles indexed in PubMed Central® (PMC). As of June 10, 2021, 160,862 records had been retrieved. Of these, 3,143 were national and 157,719 were international COVID-19 articles. Python and its related libraries were used. The most significant words were identified and reported based on TF-IDF weighting. Emerging topics were identified according to the weighted average growth.
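The TF-IDF weighting step can be illustrated with a short standard-library sketch. The abstract does not say which TF-IDF variant or library was used, so the choice of relative term frequency with idf = ln(N/df), the function names, and the three toy documents are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def top_terms(weights, k=3):
    """Rank terms by their TF-IDF weight summed over all documents."""
    totals = Counter()
    for doc_weights in weights:
        totals.update(doc_weights)  # adds the weights term by term
    return [term for term, _ in totals.most_common(k)]


# Toy documents, invented for the example (already stemmed and cleaned).
docs = ["covid infect cell", "covid patient", "covid cell cell"]
weights = tf_idf(docs)
print(top_terms(weights))
```

Under this variant a term that occurs in every document, such as "covid" in the toy corpus, receives a weight of zero. Smoothed idf formulas avoid that, which is one reason the reported rankings depend on the variant chosen.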

Results

"COVID", "infect", and "cell" were among the most important words used in international COVID-19 articles. In addition, the most important words in the national COVID-19 articles were "patient", "SARS-Cov", and "COVID".

Conclusion

One of the most important conclusions to be drawn from the trend of word change in the COVID-19 literature is that the most significant words in the international literature differ markedly from those in the national literature: international research focuses on COVID-19 and the infections it causes, whereas national research focuses on COVID-19 and patients. Another significant result is the year-to-year change in words in both the national and international literature.

Keywords: Covid-19, Text Mining, TF-IDF, Classification, Clustering, Emerging Topics, Python

Identifying the Trends of Global Publications in Health Information Technology Using Text-mining Techniques

Meisam Dastani, Hamideh Ehtesham, Zohreh Javanmard, Azam Sabahi, Fateme Bahador *

Shiraz Emedical Journal, Volume:23 Issue: 11, Nov 2022, P 5

Background

Due to the increased publication of articles in various scientific fields, analyzing the published topics in specialized journals is important and necessary.

Objectives

This research has identified the published topics in global publications in the health information technology (HIT) field.

Methods

This study analyzed articles in the field of HIT using text-mining techniques. For this purpose, 162,994 documents were extracted from PubMed and Scopus databases from 2000 to 2019 using the appropriate search strategy. Text mining techniques and the Latent Dirichlet Allocation (LDA) topic modeling algorithm were used to identify the published topics. Python programming language has also been used to run text-mining algorithms.

Results

This study categorized the subject of HIT-related published articles into 16 topics, the most important of which were Telemedicine and telehealth, Adoption of HIT, Radiotherapy planning techniques, Medical image analysis, and Evidence-based medicine.

Conclusions

The trends in the subjects of HIT-related published articles reflect the thematic breadth and interdisciplinary nature of this field. The publication of various topics in this scientific field has shown a growing trend in recent years.

Keywords: Health Information Technology, Text Mining, Scientific Publications, Trend, Health Information Management

Prioritizing Candidate Genes for Type 2 Diabetes Mellitus using Integrated Network and Pathway Analysis

Tejaswini Prakash, Nallur B Ramachandra

Avicenna Journal of Medical Biotechnology, Volume:14 Issue: 3, Jul-Sep 2022, PP 239 -246

Background

Type 2 Diabetes Mellitus (T2DM) has emerged as a major threat to global health that fosters life-threatening clinical complications, taking a huge toll on our society. More than 65 million Indians suffer from T2DM, making it one of the leading causes of death. T2DM and associated complications have to be constantly monitored and managed which reduces the overall quality of life and increases socioeconomic burden. Therefore, it is crucial to develop specific treatment and management strategies. In order to achieve this, it is essential to understand the underlying genetic causes and molecular mechanisms.

Methods

Integrated gene network and ontology analyses facilitate prioritization of plausible candidate genes for T2DM and also aid in understanding their mechanistic pathways. In this study, T2DM-associated genes were subjected to sequential interaction network and gene set enrichment analysis. High ranking network clusters were derived and their interrelation with pathways was assessed.

Results

About 23 significant candidate genes were prioritized from 615 T2DM-associated genes which were overrepresented in pathways related to insulin resistance, type 2 diabetes, signaling cascades such as insulin receptor signaling pathway, PI3K signaling, IGFR signaling pathway, ERBB signaling pathway, MAPK signaling pathway and their regulatory mechanisms.

Conclusion

Of these, two tyrosine kinase receptor genes, EGFR and IGF1R, were identified as common nodes and can be considered significant candidate genes in T2DM.

Keywords: Gene ontology, Hub genes identification, In silico analysis, Text mining, Type 2 diabetes mellitus

An Analytical Comparison of Iran's Scientific Documents in the Subject Area of Text Mining

Mohadeseh Rafiee Khoshnood*, Abdalsamad Keramatfar

Caspian Journal of Scientometrics, Year 9, Issue 1 (Spring and Summer 1401), pp. 104-116

Background and Aim

Policymakers try to evaluate their country's scientific performance and to measure it in terms of effectiveness and problem solving. This article presents an analytical comparison of Iran's scientific documents in the subject area of text mining, based on domestic and foreign databases.

Materials and Methods:

This descriptive survey study was conducted with a bibliometric approach. To retrieve scientific documents related to text mining from the Scopus database, related terms were searched and the results were then limited to Iran. To retrieve scientific documents from domestic journals, the Scientific Information Database of Jahad Daneshgahi was used in a similar way. Bibexcel, VOSviewer, the Python programming language, and Excel were used for data analysis.

Findings:

The total number of Iran's scientific documents in the subject area of text mining in the Scopus citation database is 1082. Of the documents indexed in Scopus, 284 (26.25%) focus on the Persian language. According to data from the Scientific Information Database, the number of scientific documents in this subject area is 89, and the number focusing on the Persian language is 51 (57.30%). Lecture Notes in Computer Science has published the largest number of Iran's international scientific documents in text mining, and the Journal of Signal and Data Processing has published the largest number of Iran's domestic ones. An independent t-test showed a significant difference between the number of scientific documents focusing on the Persian language in Scopus and in the Scientific Information Database of Jahad Daneshgahi (p<0.0001).

Conclusion:

The average growth rate of Iran's scientific documents in text mining is higher than in other subject areas. The United States, the United Kingdom, and Australia have collaborated most with Iranian researchers in this subject area. It was also found that international scientific documents focusing on the English language receive more citations than those focusing on the Persian language.

Keywords: Data Mining, Text Mining, Science Evaluation, Bibliometrics, Natural Language Processing

Analytical Comparison of Iranian Scientific Documents in Text Mining

Mohadeseh Rafiee *, Abdalsamad Keramatfar

Caspian Journal of Scientometrics, Volume:9 Issue: 1, 2022, PP 104 -116

Background and aim

Policymakers seek to evaluate their country's scientific performance and measure it in terms of effectiveness and problem-solving. The aim of this study was to make an analytical comparison of Iranian scientific documents in text mining based on domestic and foreign databases.

Materials and methods

The present study is a descriptive survey with a bibliometric approach. To find scientific documents related to text mining in the Scopus database, related terms were searched, and the results were then limited to Iran. The Scientific Information Database (SID) was used to search for Persian scientific documents. Bibexcel, VOSviewer, the Python programming language, and Excel 2017 were used to analyze the data.

Findings

The total number of Iranian scientific documents in text mining in the Scopus citation database was 1082, and 284 (26.25%) of the documents indexed in Scopus were in Persian. Moreover, according to SID, the number of scientific documents in this field was 89 and the number in Persian was 51 (57.30%). Lecture Notes in Computer Science published the most international Iranian scientific papers in text mining, and the Journal of Signal and Data Processing published the most domestic ones. A t-test showed a significant difference in the number of scientific documents in Persian between the Scopus and SID databases (p<0.0001).

Conclusion

The average growth rate of Iranian scientific documents in text mining was higher than in other subject areas. The United States, Britain, and Australia have had the most collaboration with Iranian researchers in this field. It was also found that international scientific documents in English received more citations than scientific documents in Persian.

Keywords: Data mining, Text mining, Evaluating science, Bibliometrics, Natural language processing

Features and Services of Well-designed Hospital Information Systems: A Review Study

Masoomeh Nouri Tahneh, Hamid Moghaddasi, Azam Sadat Hosseini, Farkhondeh Asadi

Archives of Advances in Biosciences, Volume:12 Issue: 2, Spring 2021, PP 55 -66

Context:

Hospitals are large information organizations with the main goal of providing high-quality, integrated, and cost-effective healthcare. This goal is more easily realized through well-designed Hospital Information Systems [HIS].

Evidence Acquisition:

In this narrative review study, 98 articles were extracted from the Science Direct, PubMed, and Google Scholar databases using the keyword "Hospital Information System". The articles were published between 1980 and 2018. After examining the quality of the articles in terms of research design and references, 41 articles remained for analysis. Relevant e-books and print books were also examined, and the features and services of HIS were investigated.

Results

For HIS, seven features were obtained: coverage of different types of data, integration of subsystems, having an enterprise metamodel, communication with other information systems, coverage of hospital units, adherence to standards, and connectivity to digital instruments. Moreover, 18 services were determined: patient management, economic management and cost reduction, legal management of data, treatment management, administrative management, presenting information based on policies, clinical decision support, managerial-administrative decision support, educational support, research support, electronic medical record generation, text mining, encoding, documentation quality improvement, medical support, resource utilization management, personnel management, and warehouse management.

Conclusions

To evaluate HIS, it is necessary to determine its features and services. Based on the features and services of HIS, an evaluation tool has been developed in this study.

Keywords: Hospital Information System, Metamodel, Electronic Medical Record, Text Mining

Identifying Emerging Trends in Scientific Texts Using TF-IDF Algorithm: A Case Study of Medical Librarianship and Information Articles

Meisam Dastani, Afshin Mousavi Chelak*, Soraya Ziaei, Faeze Delghandi

Health Technology Assessment in Action, Volume:4 Issue: 2, Nov 2020, P 1

Context

Nowadays, due to the increased publication of articles in various scientific fields, identifying the publishing trend and emerging keywords in the texts of these articles is essential. Thus, the present study has identified and analyzed the keywords used in published articles on medical librarianship and information.

Materials and Methods

In the present investigation, an exploratory and descriptive approach has been used to analyze librarianship and information articles published in specialized journals in this field from 1964 to 2019 by applying text mining techniques. The TF-IDF weighting algorithm has been applied to identify the most important keywords used in the articles. Python programming language has also been used to implement text mining algorithms.

Results

The results obtained from the TF-IDF algorithm indicate that the words “library”, “patient”, and “inform”, with weights of 95.087, 65.796, and 63.386, respectively, were the most important keywords in the published articles on medical librarianship and information. The words “Catalog”, “Book”, and “Journal” were the most important keywords in articles published between 1960 and 1970, and the words “Patient”, “Bookstore”, and “Intervent” were the most important keywords in articles on medical librarianship and information published from 2015 to 2020. The words “Blockchain”, “telerehabilit”, “Instagram”, “WeChat”, and “comic” are new keywords observed in articles on medical librarianship and information between 2015 and 2020.

Conclusion

The results of the present study reveal that the keywords used in articles on medical librarianship and information have not been consistent over time and have changed across periods. The field has thus shifted with the needs of society and with the advent and growth of information technologies.

Keywords: librarianship, information, Medical, Analysis, Text mining

An Examination of Fraud in Supplementary Health Insurance and Ways to Counter It

Sajad Ramandi, Leili Niakan*, Saeedeh Rajaee Harandi, Hadi Asheghi

Iranian Journal of Health Insurance, Year 3, Issue 3 (Serial No. 10, Autumn 1399), pp. 178-187

Introduction

Fraud has direct and indirect effects on insurers and policyholders. Given the nature of the insurance industry, the decisions of managers and officials will not reach the desired outcome without sufficient study and research. This study therefore examined fraud in supplementary health insurance and ways to counter it, taking an applied approach.

Methods

Through a comparative study of the successful experiences of leading countries in combating fraud in supplementary health insurance, together with interviews with experts in the field, the processes, underlying factors, and effects of fraud in supplementary health insurance, as well as the obstacles and challenges in those processes, were identified. Finally, solutions for preventing and controlling this phenomenon were presented. The interviews were loaded as text into MAXQDA software and then text-mined.

Findings

In total, 34 factors that give rise to fraud in the supplementary health insurance industry were identified and grouped into 6 categories: solutions related to laws and regulations, solutions related to processes, solutions related to technology, solutions related to relevant institutions and organizations, educational solutions, and cultural solutions. The findings showed that, among the underlying factors of fraud, the most influential related to the inefficiency of the supervisory body and to overlooking the phenomenon of insurance fraud.

Conclusion

If insurance companies design and launch a comprehensive and integrated system, it will be possible to prevent a large share of fraud.

Keywords: Supplementary Health Insurance, Fraud, Text Mining

Fraud Detection in Supplementary Health Insurance and Ways to Compete

Sajad Ramandi, Leili Niakan*, Saeedeh Rajaee Harandi, Hadi Asheghi

Iranian Journal of Health Insurance, Volume:3 Issue: 3, 2020, PP 178 -187

Introduction

Fraud has direct and indirect effects on insurers and insured. Due to the nature of the insurance industry, the decisions of managers and officials will not achieve the desired result without conducting sufficient research. Therefore, this research has studied and investigated fraud in complementary health insurance and ways to deal with it through a practical approach.

Methods

Through a comparative study of the successful experiences of leading countries in the fight against fraud in complementary health insurance, and through interviews with experts in this field, the processes, underlying factors, and effects of fraud in complementary health insurance, as well as the obstacles and challenges in these processes, were identified. Finally, solutions were provided to prevent and control this phenomenon. The interviews were uploaded in text format to MAXQDA software and then analyzed.

Results

In total, 34 factors that cause fraud in the complementary health insurance industry were identified and divided into six groups: “Rules and Regulations”, “Process Solutions”, “Technology Solutions”, “Solutions related to institutions and organizations”, “Educational Strategies”, and “Cultural Strategies”. Findings showed that, among the underlying factors of fraud, the most influential were “inefficiency of central insurance of Iran” and “ignoring the phenomenon of insurance fraud”.

Conclusion

If insurance companies design and launch comprehensive and integrated systems, it will be possible to prevent a high level of fraud.

Keywords: Supplementary Health Insurance, Fraud, Text Mining

Mapping the intellectual structure of epidemiology with use of co-word analysis

Hamed Baziyad*, Saeed Shirazi, Seyed Mohammadreza Hosseini, Rasoul Norouzi

Journal of Biostatistics and Epidemiology, Volume:5 Issue: 3, Summer 2019, PP 210 -215

Background and Aim

An intellectual structure is essential for managers and scholars in every field. Intellectual structures provide a comprehensive map of knowledge that can give researchers and managers a better view of their fields. Moreover, because data and information are generated rapidly and in massive amounts, reading and surveying all resources is very difficult. Intellectual maps address this problem and make it possible to control and monitor this voluminous, rapidly generated data. Epidemiology is an exciting field that has attracted many researchers. The structure and criteria of the different epidemiological fields have not yet been studied. Indeed, there has been no serious effort at knowledge discovery from the hidden information in epidemiological texts.

Methods

In this paper, in order to survey this field, an intellectual structure is provided using co-word analysis. Utilizing co-word analysis discloses relationships and structure among research subjects and topics in a field.
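As a rough illustration of the co-word idea, the sketch below counts how often two keywords are attached to the same paper. The resulting pair counts are the edge weights of a co-word network. The function name and the three toy keyword lists are hypothetical. The abstract does not describe the authors' tooling or preprocessing.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count, for each unordered keyword pair, the papers in which both occur."""
    pairs = Counter()
    for keywords in keyword_lists:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy keyword lists, invented for the example (one list per paper).
papers = [
    ["genetic", "modeling"],
    ["genetic", "illness", "modeling"],
    ["illness", "prevention"],
]
pairs = coword_counts(papers)
print(pairs.most_common(3))
```

Clusters such as those reported in the Results would then come from running a community-detection or clustering method on this weighted network, a step not shown here.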

Results

Finally, four main clusters were determined, namely: genetic (with 30.53% of surveyed papers), illness (29.47%), modeling (23.16%), and prevention (16.84%).

Conclusion

According to the epidemiology co-word network, epidemiology has not been studied from enough different perspectives, especially that of novel technologies.

Keywords: Intellectual structure of epidemiology, Co-word analysis, Text mining, Graph mining, Social network analysis

A Combined Bibliometric and Text-Mining Analysis of Scientific Output on Electronic Health Records in the PubMed Database

Shokouhian, Shabani, Cheshme Sohrabi, Asemi*

Health Information Management, Year 16, Issue 4 (Serial No. 68, Mehr and Aban 1398), pp. 190-196

Introduction

Access to complete patient information plays an important role in improving clinical care and reducing medical errors. In this regard, the electronic health record is the core part of an integrated health information system. The aim of this study was a bibliometric and text-mining analysis of the scientific output published on electronic health records in the PubMed database.

Methods

This study used bibliometric and text-mining methods on 6863 articles covering the period 2009 to 2019. The data were analyzed using Excel and VOSviewer software and the Voyant tool.

Findings

In the field studied, the topics of electronic health records, health, health care, and health care systems were of great importance in PubMed. Article production on electronic health records showed an upward trend over ten years, and the United States was the most productive country in this field. The most articles belonged to David Bates, Dean Sittig, and Hardeep Singh.

Conclusion

In the term co-occurrence map, each term represents a concept or research area in health. The results can give a clear view for science policy-making in this field, to influence the allocation and distribution of resources in scientific and technical activities. They can also help researchers choose hot topics and gain a comprehensive insight into the scientific framework of the field.

Keywords: Scientific Output, Electronic Health Records, Bibliometrics, Text Mining, PubMed

Combined Bibliometric and Text-Mining Analysis of Scientific Productions in PubMed Database in the Field of Electronic Health Records

Mahboobeh Shokouhian, Asefeh Asemi, Ahmad Shabani, Mozafar Cheshme Sohrabi

Health Information Management, Volume:16 Issue: 4, 2019, PP 190 -196

Introduction

Access to a patient's complete information is critical in improving clinical care and reducing medical errors. An Electronic Health Record is a collection of an individual's health information, from prenatal to posthumous, which is stored electronically, is available at any center and at any time, and is an integral part of an integrated health information system. The purpose of the present study was a bibliometric and text-mining analysis of scientific products in the field of Electronic Health Records in the PubMed database.

Methods

This study was carried out using bibliometric and text-mining methods. It was conducted in the academic year of 2019 on the PubMed database for the period 2009-2019, and 6863 articles were selected for review. Excel, VOSviewer, and Voyant were used for data analysis.

Results

In the field studied, the topics of electronic health records, health, health care, information, and health care systems were of great importance in PubMed. Article production in this field rose over the ten years, and the United States was the most productive country in the field. David Bates, Dean Sittig, and Hardeep Singh had the most articles in the field of study.

Conclusion

Each item in the co-occurrence vocabulary map can represent a concept or research area in health. The findings can give a clear insight for scientific policymaking in this field, to influence the allocation and distribution of resources for scientific and technical activities. They can also help researchers select state-of-the-art topics and gain a comprehensive insight into the academic context of the field.

Keywords: PubMed, Scientific Productions, Electronic Health Records, Bibliography, Text Mining

Toxins as Biological Weapons: A Text-Mining Approach to the Biomedical Literature

Reza Moazamyan Far*, Hamideh Rouhani Nejad

Journal of Police Medicine, Year 7, Issue 1 (Serial No. 24, Winter 1396), pp. 45-50

Aims

Bioterrorism is a deliberate attack that causes disease or death in humans through the use of viruses, bacteria, or toxic substances. In recent years, owing to the growing number of online article records in databases, much attention has been paid to text mining and to strategies for extracting information from biomedical articles. The aim of this study was to examine the importance of toxins as biological weapons by searching the medical literature and medical databases.

Information and Methods

This text-mining and data-review study was carried out in 1395-1396. The online software Carrot 2 was used to cluster the results of searches for specific keywords on the web, and especially in medical databases. During keyword searching, the search engine was set to PubMed and the clustering type to K-means, and the results were finally examined as bubble charts.

Findings

The neurotoxin with the most records was tetrodotoxin, with a total of 18970 records; the cytotoxin with the most records was pertussis toxin, with a total of 14390 records; and the dermally hazardous cytotoxin with the most records was zearalenone, with a total of 2656 records.

Conclusion

The priorities in tracking bioterrorism and biological weapons are early detection, public health, and control of these agents; micro- and macro-level policy-making should therefore be directed at these areas.

Keywords: Biological Weapons, Text Mining, Neurotoxins

Toxins as Biological Weapons; a Text-Mining Approach to Biomedical Literature

H. Rouhani Nejadr *, R. Moazamyan Far

Journal of Police Medicine, Volume:7 Issue: 1, 2018, PP 45 -50

Aims

Bioterrorism is an invasive attack that can cause disease or death in humans, using viruses, bacteria, or toxic substances. In recent years, due to an increase in the number of online articles in databases, much attention has been paid to the application of text mining and information extraction strategies to biomedical articles. The purpose of this study was to evaluate the importance of toxins as biological weapons by searching medical texts and medical databases.

Information & Methods

This text-mining and data research study was carried out in 2015-2016. The Carrot 2 software was used to cluster the results of keyword searches on the web, especially in biomedical databases. When searching for keywords, the search engine was set to PubMed and the clustering type to K-means, and the results were finally examined as foam trees.

Findings

Among neurotoxins, tetrodotoxin had the most records, with a total of 18970. Among cytotoxins, pertussis toxin had the most, with a total of 14390. Among dermally hazardous cytotoxins, zearalenone had the most, with a total of 2656.

Conclusion

The priorities in tracking bioterrorism and biological warfare are early detection, public health, and control of these agents, so micro- and macro-level policies should focus on these.

Keywords: Biological Warfare, Text Mining, Neurotoxins

Text Mining of Computer Engineering Articles Based on Information Retrieved from the Web of Science Database

Ali Soltani Nejad, Mohammad Ahmadinia*

Journal of the Kerman School of Management and Medical Informatics, Year 3, Issue 2 (Serial No. 5, Summer 1396), pp. 201-209

Introduction

This study was conducted to examine computer engineering documents retrieved from the Web of Science database in order to cluster and text-mine them.

Methods

This descriptive-analytical study was conducted as a survey with a text-mining approach. The study population was the computer engineering documents indexed in the Web of Science database, for which 6186 records were reported in the period 2004 to 2014. The collected data were analyzed using HistCite, Excel 2013, and RapidMiner 7.3.

Findings

For clustering, after preprocessing the data and running the k-means clustering algorithm, 8 main clusters were formed, titled "Internet and Technology", "Security of Health Information Systems", "Human-Computer Interaction", "Hidden Web", "Computer Models", "Performance of Computer Systems", "Networks and Databases", and "Knowledge Discovery Algorithms and Methods", along with a further cluster titled "Other Topics". Two criteria, precision and recall, were used to evaluate the clusters, and a value of 0.81 was obtained for both.

Conclusion

Using the words selected as keywords in the clusterings can help users save time and retrieve relevant information.

Keywords: Text Mining, Clustering, k-means Algorithm, Web of Science Database

Text Mining of Computer Engineering Articles Based on the Documents Retrieved from the Web of Science Database

Soltani Nejad A *, Ahmadiniya M

Journal of Management and Medical Informatics School, Volume:3 Issue: 2, 2017, PP 201 -209

Introduction

The aim of this study was to evaluate text mining and clustering of computer engineering documents retrieved from the Web of Science database.

Methods

This descriptive-analytical study was conducted as a survey using a text mining approach. The study population comprised all computer engineering documents indexed in the Web of Science, of which 6016 records were reported between 2004 and 2016. The collected data were analyzed with HistCite software, Excel version 2013, and RapidMiner version 7.3.

Results

To perform the clustering, the data were preprocessed and the K-means clustering algorithm was run, which produced 8 main clusters: Internet and Technology, Security of Healthcare Information Systems, Human-Computer Interaction, Semantic Web, Computer Models, Computer Systems Performance, Networks & Databases, and Knowledge Discovery Algorithms. A further cluster was labeled Other Topics. The clusters were evaluated with two criteria, precision and recall, and a value of 0.81 was obtained for both.
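The clustering and evaluation steps can be illustrated with a minimal Python sketch. It is not the study's RapidMiner workflow: the titles, the vocabulary, the reference topic labels, and the choice of initial centroids are all hypothetical, and the abstract does not say how precision and recall were computed, so the per-cluster definition below is only one plausible reading.

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in a lowercased title."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate input
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def precision_recall(labels, truth, cluster, topic):
    """Precision and recall of one cluster against one reference topic.

    Assumes the cluster and the topic each contain at least one document.
    """
    in_cluster = {i for i, lab in enumerate(labels) if lab == cluster}
    in_topic = {i for i, t in enumerate(truth) if t == topic}
    hits = len(in_cluster & in_topic)
    return hits / len(in_cluster), hits / len(in_topic)

# Hypothetical toy data standing in for preprocessed Web of Science records.
titles = [
    "network security protocol",
    "wireless network routing",
    "clustering algorithm mining",
    "data mining algorithm",
]
vocab = ["network", "security", "routing", "clustering", "algorithm", "mining"]
truth = ["networks", "networks", "knowledge discovery", "knowledge discovery"]

points = [bag_of_words(t, vocab) for t in titles]
labels = kmeans(points, [points[0], points[2]])
print(labels)
print(precision_recall(labels, truth, cluster=0, topic="networks"))
```

Fixing the initial centroids keeps the toy run deterministic; a real run would normally use random or k-means++ initialization and weighted term vectors rather than raw counts.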

Discussion and Conclusion

Using words selected as keywords in the clustering can help the user save time and retrieve the related information.

Keywords: Text mining, Clustering, K-means algorithm, Web of Science Database

A Knowledge Map for Hospital Performance Concept: Extraction and Analysis: A Narrative Review Article

Nader Markazi-Moghaddam, Mohammad Arab, Hamid Ravaghi, Arash Rashidian, Toktam Khatibi, Sanaz Zargar Balaye Jame

Iranian Journal of Public Health, Volume:45 Issue: 7, Jul 2016, PP 843 -854

Background

Performance is a multidimensional and dynamic concept. Over the past 2 decades, many studies have worked on developing the concept of hospital performance. To identify the key concepts in the hospital performance literature, we used knowledge visualization based on co-word analysis and social network analysis.

Methods

Documents were identified through a PubMed search covering 1945 to 2014, and 2350 papers entered the study after unrelated articles, duplicates, and articles without an abstract were omitted. After the articles were preprocessed and prepared, the keywords were extracted and terms were weighted using the TF-IDF weighting scheme. Support, an interestingness measure that considers the co-occurrence of the extracted keywords with the phrase "hospital performance", was calculated. Keywords with high support with "hospital performance" were selected. The term-term matrix of these selected keywords was calculated and the graph was extracted.
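The weighting and co-word steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the toy corpus is hypothetical, TF-IDF is taken in its common form tf × log(N/df), and support is taken as the fraction of documents in which a keyword and the target phrase co-occur. The abstract does not specify which variants were used.

```python
import math
from collections import Counter
from itertools import combinations

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumes every document contains at least one term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

def support(docs, term, target):
    """Fraction of documents in which `term` and `target` both occur."""
    both = sum(1 for doc in docs if term in doc and target in doc)
    return both / len(docs)

def cooccurrence(docs, keywords):
    """Term-term counts: number of documents that contain both keywords.

    Keys are alphabetically ordered pairs; the counts are the edge weights
    of the co-word graph.
    """
    counts = Counter()
    for doc in docs:
        present = sorted(set(doc) & set(keywords))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical toy corpus: each document is a list of extracted keywords.
docs = [
    ["hospital performance", "mortality", "efficiency"],
    ["hospital performance", "mortality", "length of stay"],
    ["hospital performance", "readmission rate"],
    ["mortality", "efficiency"],
]

weights = tf_idf(docs)
print(support(docs, "mortality", "hospital performance"))
print(cooccurrence(docs, ["hospital performance", "mortality", "efficiency"]))
```

Keywords whose support with the target phrase exceeds a chosen threshold would be kept, and the co-occurrence counts among them form the term-term matrix from which the graph is drawn.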

Results

The most frequent words after Hospital Performance were mortality and efficiency. The major knowledge structure of the hospital performance literature during these years shows that the keyword mortality had the highest support with hospital performance, followed by quality of care, quality improvement, discharge, length of stay, and clinical outcome. The strongest relationship was seen between electronic medical record and readmission rate.

Conclusion

Some dimensions of hospital performance, such as efficiency, effectiveness, quality, and safety, are more important, and some indicators, such as mortality, length of stay, readmission rate, and patient satisfaction, are more prominent. In the last decade, concepts such as mortality, quality of care, and quality improvement have become more significant in the hospital performance literature.

Keywords: Hospital performance, Knowledge mapping, Social network analysis, Co-word analysis, Text mining

Design and Implementation of a Structured Electronic Form for Celiac Disease Pathology Reports: A Text Mining Approach

Azadeh Kamel Ghalibaf, Farzaneh Khadem Sameni, Majid Jangi, Mohammad Reza Mazaheri Habibi, Kobra Etminani

Health Information Management, Volume:13 Issue: 1 (Serial No. 47, Farvardin and Ordibehesht 1395 SH), P 19

Introduction

Pathology reports are written as free text and contain a network of relations between medical concepts that the physician uses for reasoning and diagnosis. This study aimed to design and evaluate a model for automatically extracting these concepts and converting them into a structured form that a computer can analyze.

Methods

This applied, implementation-oriented study was carried out on 258 pathology reports with a diagnosis of celiac disease, collected at random from two pathobiology laboratories. The proposed system comprised three main phases. The first phase concerned designing a standard, structured form for celiac disease biopsy reports using the Delphi method. In the second phase, using the text mining tools provided by the Stanford University linguistics center together with an interface program designed to interpret semantic segments, the target information was extracted from the report text and stored in the standard form. In the third phase, the Marsh class of each report was determined automatically using the J48 decision tree learning algorithm.

Results

In the phase of extracting information and assigning values to the fields of the standard form, the system showed an accuracy of 76 percent. The system's accuracy in automatically determining the Marsh classification from the output of the previous stage was 62 percent; given corrected, error-free data, the accuracy of the classification algorithm rises to 84 percent.

Conclusion

In this study, designing and implementing a model for structuring celiac disease pathology reports not only made entering and retrieving information easier and faster and improved report readability, but also made it possible to process the data by computer and to find relations and patterns.

Keywords: Text mining, Celiac disease, Clinical decision support system, Delphi method, Decision tree

Design and Implementation of a Structured Electronic Form for Celiac Disease Pathology Reports: A Text Mining Approach

Azadeh Kamel-Ghalibaf, Farzaneh Khadem-Sameni, Majid Jangi, Mohammad Reza Mazaheri-Habibi, Kobra Etminani

Health Information Management, Volume:13 Issue: 1, 2016, P 19

Introduction

Pathology reports generally use an unstructured text format and contain a complex web of relations between medical concepts. To enable computers to understand and analyze the reports' free text, we aimed to convert these concepts and their relations into a structured format.

Methods

The training, validation, and evaluation of this implementation study were based on a corpus of 258 pathology reports with a positive diagnosis of celiac disease, randomly selected from the records of 2 pathology laboratories. Our proposed system consisted of 3 phases: standardization of celiac disease pathology reports using the Delphi technique with 3 experts; information extraction from free-text reports with text mining techniques using the Stanford Parser; and automatic classification of celiac disease stages in the Marsh system using the J48 decision tree algorithm.

Results

We extracted information from free-text pathology reports and assigned each piece of information to the associated predefined field in the standardized template form with an accuracy of 76%. After determining the Marsh stage for each report in the third phase, our system showed an average overall accuracy of 62%. Evaluated as an independent system with manually corrected, gold-standard input, the third phase achieved an accuracy of greater than 84%.

Conclusion

The benefits of standardized synoptic pathology reporting include enhanced completeness and improved consistency, avoidance of confusion and error, and facilitation of the faster and safer transmission of critical pathological data in comparison with narrative reports.

Keywords: Text Mining, Celiac disease, Decision Support Systems, Clinical, Delphi Technique, Decision Trees

A novel AIDS/HIV intelligent medical consulting system based on expert systems

Alireza Pour Ebrahimi, Abbas Toloui Ashlaghi, Maryam Mahdavy Rad

Journal of Education and Health Promotion, Volume:3 Issue: 9, Sep 2013, P 54

Background

The purpose of this paper is to propose a novel intelligent model for AIDS/HIV data based on an expert system and to use it to develop an intelligent medical consulting system for AIDS/HIV.

Materials and Methods

In this descriptive research, 752 frequently asked questions (FAQs) about AIDS/HIV were gathered from numerous websites about the disease. To perform the data mining and extract the intelligent model, the 6 stages of the CRISP method were completed for the FAQs: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. The C5.0 tree classification algorithm was used for modelling. The rational unified process (RUP) was used to develop the web-based medical consulting software; its stages are inception, elaboration, construction, and transition. The developed intelligent model was used in the infrastructure of the software: based on the client's inquiry and keywords, related FAQs are displayed to the client in order of rank. The ranks of the FAQs are determined gradually according to how often clients read them. Test and entertainment links are also displayed based on the displayed FAQs.

Result

The accuracy of the AIDS/HIV intelligent web‑based medical consulting system is estimated to be 78.76%.

Conclusion

AIDS/HIV medical consulting systems have been developed using intelligent infrastructure. Its intelligent model, its consulting services over systematic textual data, and its side services based on clients' activities make the implemented system unique. The research has been approved as practical by the Iranian Ministry of Health and Medical Education.

Keywords: AIDS, HIV, data mining, intelligent system, medical informatics, software engineering, text mining
