Search for articles related to the keyword "text mining" in journals in the "Medicine" group
-
Introduction
After a cancer diagnosis, the most important step is to determine the stage and grade of the cancer. Pathology reports are the main source for cancer staging, but they do not contain all the information needed for staging; however, the text of these reports is sometimes the only information available. We were interested in whether text mining methods can predict staging from pathology reports alone.
Material and Methods
A total of 698 pathology reports of breast cancer cases, together with their TNM staging, collected from multiple centers in West Azerbaijan Province, Iran, were used for this study. After the semi-structured reports were prepared, their texts were imported into a program written in Python 3. Three machine learning algorithms (Logistic Regression, SVM, and Naïve Bayes) and a simple pipeline were used for text mining. The performance of the algorithms was evaluated in terms of accuracy, precision, recall, and F1 score.
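The abstract does not name the pipeline's components; a minimal sketch of such a text-classification pipeline, assuming a TF-IDF vectorizer in front of each of the three classifiers and scikit-learn as the implementation (the toy reports and stage labels below are placeholders, not the study's data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the 698 report texts and their TNM stage labels.
reports = [
    "invasive ductal carcinoma tumor 1.5 cm no nodal involvement",
    "tumor 3 cm two of twelve axillary nodes positive",
    "large tumor 6 cm skin involvement many positive nodes",
    "small tumor negative sentinel node no metastasis",
] * 25  # repeated so the toy split has enough samples per class
stages = ["I", "II", "III", "I"] * 25

X_train, X_test, y_train, y_test = train_test_split(
    reports, stages, test_size=0.2, random_state=42, stratify=stages)

models = {
    "NaiveBayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)  # vectorize, then classify
    pipe.fit(X_train, y_train)
    print(name)
    # accuracy plus per-class precision, recall, and F1 in one report
    print(classification_report(y_test, pipe.predict(X_test)))
```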
Results
The Naïve Bayes algorithm achieved excellent results, with values above 91% on all evaluation criteria (accuracy, precision, recall, and F1 score). This means that Naïve Bayes classified the reports with high efficiency, and its predictions were more often correct than those of the other two algorithms. Naïve Bayes also outperformed SVM and Logistic Regression in accuracy, recall, and F1 score. In addition, it showed faster inference, owing to its simplicity, and lower computational and training time.
Conclusion
We suggest using the design proposed in this study for predicting breast cancer staging where it is needed but no information other than pathology reports is available. This method may not be useful for the clinical management of cancer patients, but it can safely be used for epidemiological estimations.
Keywords: Breast Cancer, Pathology Reports, Text Mining, NLP, TNM Stage, Machine Learning -
Background and Aim
The very large volume of reputable COVID-19 publications worldwide makes monitoring and analyzing the COVID-19 literature increasingly necessary, for researchers at the micro level and for policymakers and planners at the macro level. The results of analyzing COVID-19 publications with text-mining methods and techniques are therefore of particular importance for researchers, policymakers, and planners of medical sciences at the national and international levels, helping to avoid parallel research and waste of time and budget. This paper explores emerging topics and the trends in scientific vocabulary at the national and international levels in the subject area of COVID-19.
Materials and Methods
This applied research was conducted using text mining, its related algorithms and techniques, and text classification with an analytical-comparative approach. The population consists of all COVID-19 articles indexed in PubMed Central® (PMC). As of June 10, 2021, 160,862 records had been retrieved; of these, 3,143 were national and 157,719 were international COVID-19 articles. Python and its related libraries were applied. The most significant words were identified and reported based on TF-IDF weighting, and emerging topics were identified according to their weighted average growth.
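A minimal sketch of the two steps named here, under assumed details: per-year mean TF-IDF weights for each term, with "emerging" terms ranked by the growth of that mean weight across years (the scikit-learn usage and the toy corpus are assumptions, not the paper's code):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs_by_year = {  # hypothetical mini-corpus in place of the PMC records
    2020: ["covid infection outbreak", "covid patient cell response"],
    2021: ["vaccine trial covid", "vaccine efficacy variant covid"],
}
years = sorted(docs_by_year)
vec = TfidfVectorizer()
vec.fit([d for ds in docs_by_year.values() for d in ds])  # shared vocabulary
terms = vec.get_feature_names_out()

# Mean TF-IDF weight of every term within each year's documents.
mean_w = {y: np.asarray(vec.transform(docs_by_year[y]).mean(axis=0)).ravel()
          for y in years}

# Top terms overall, and terms whose mean weight grew most across the span.
overall = sum(mean_w.values())
growth = mean_w[years[-1]] - mean_w[years[0]]
print("top terms:", [terms[i] for i in overall.argsort()[::-1][:3]])
print("emerging:", [terms[i] for i in growth.argsort()[::-1][:3]])
```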
Results"COVID", "infect", and "cell" were among the most important words used in international COVID-19 articles. In addition, the most important words in the national COVID-19 articles were "patient", "SARS-Cov", and "COVID".
Conclusion
Among the most important conclusions that can be drawn from the trend of word change in the COVID-19 literature is that the most significant words in the international literature differ substantially from those in the national literature: international research focuses on COVID-19 and the infections it causes, whereas national research focuses on COVID-19 and patients. Another significant result is the year-to-year change in the words used in the national and international literature, a change that tracks major scientific events.
Keywords: Covid-19, Text Mining, TF-IDF, Classification, Clustering, Emerging Topics, Python -
Background
Given the growing number of articles published across scientific fields, analyzing the topics published in specialized journals is important and necessary.
Objectives
This research identified the topics published in global publications in the health information technology (HIT) field.
Methods
This study analyzed articles in the field of HIT using text-mining techniques. For this purpose, 162,994 documents published from 2000 to 2019 were extracted from the PubMed and Scopus databases using an appropriate search strategy. Text-mining techniques and the Latent Dirichlet Allocation (LDA) topic modeling algorithm were used to identify the published topics. The Python programming language was used to run the text-mining algorithms.
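A small sketch of LDA topic modeling in this spirit, using scikit-learn's LatentDirichletAllocation on a toy corpus (the study's 162,994 records, preprocessing, and 16-topic configuration are not reproduced; everything below is illustrative):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # made-up stand-ins for HIT article abstracts
    "telemedicine remote consultation video visit",
    "telehealth adoption hospital barriers survey",
    "radiotherapy planning dose imaging",
    "medical image segmentation deep features",
]
counts = CountVectorizer()          # LDA operates on raw term counts
X = counts.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, comp in enumerate(lda.components_):  # one weight row per topic
    top = [terms[i] for i in comp.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")
```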
Results
This study categorized the subjects of HIT-related published articles into 16 topics, the most important of which were Telemedicine and telehealth, Adoption of HIT, Radiotherapy planning techniques, Medical image analysis, and Evidence-based medicine.
Conclusions
The trends in the subjects of HIT-related published articles reflect the thematic breadth and interdisciplinary nature of this field. The publication of various topics in this scientific field has shown a growing trend in recent years.
Keywords: Health Information Technology, Text Mining, Scientific Publications, Trend, Health Information Management -
Background
Type 2 Diabetes Mellitus (T2DM) has emerged as a major threat to global health that fosters life-threatening clinical complications, taking a huge toll on our society. More than 65 million Indians suffer from T2DM, making it one of the leading causes of death. T2DM and associated complications have to be constantly monitored and managed which reduces the overall quality of life and increases socioeconomic burden. Therefore, it is crucial to develop specific treatment and management strategies. In order to achieve this, it is essential to understand the underlying genetic causes and molecular mechanisms.
Methods
Integrated gene network and ontology analyses facilitate the prioritization of plausible candidate genes for T2DM and also aid in understanding their mechanistic pathways. In this study, T2DM-associated genes were subjected to sequential interaction network and gene set enrichment analysis. High-ranking network clusters were derived, and their interrelation with pathways was assessed.
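A toy sketch of the network step, assuming an interaction edge list and degree centrality as the hub criterion (the edges below are hypothetical illustrations; the study's actual network construction and ranking scheme may differ):

```python
import networkx as nx

# Hypothetical gene-interaction edges; the study built its network from
# curated T2DM-associated gene sets.
edges = [
    ("EGFR", "IRS1"), ("EGFR", "PIK3R1"), ("IGF1R", "IRS1"),
    ("IGF1R", "PIK3R1"), ("INSR", "IRS1"), ("PIK3R1", "AKT1"),
]
g = nx.Graph(edges)

# Rank genes by degree centrality to flag candidate hubs / common nodes.
centrality = nx.degree_centrality(g)
hubs = sorted(centrality, key=centrality.get, reverse=True)
print("candidate hub genes:", hubs[:3])
```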
Results
Twenty-three significant candidate genes were prioritized from 615 T2DM-associated genes; these were overrepresented in pathways related to insulin resistance, type 2 diabetes, signaling cascades such as the insulin receptor, PI3K, IGFR, ERBB, and MAPK signaling pathways, and their regulatory mechanisms.
Conclusion
Of these, two tyrosine kinase receptor genes, EGFR and IGF1R, were identified as common nodes and can be considered significant candidate genes in T2DM.
Keywords: Gene ontology, Hub genes identification, In silico analysis, Text mining, Type 2 diabetes mellitus -
Background and Aim
Policymakers seek to evaluate their country's scientific performance and to measure its effectiveness and contribution to problem-solving. This study makes an analytical comparison of Iranian scientific documents in the subject area of text mining, based on domestic and international databases.
Materials and Methods
The present study is a descriptive survey with a bibliometric approach. To retrieve scientific documents related to text mining in the Scopus database, related terms were searched and the results were then limited to Iran. The Scientific Information Database (SID) was searched in the same way for documents from Iranian journals. Bibexcel, VOSviewer, the Python programming language, and Excel were used to analyze the data.
Findings
The total number of Iranian scientific documents on text mining in the Scopus citation database was 1,082, of which 284 (26.25%) focused on the Persian language. According to the Scientific Information Database, the number of scientific documents in this field was 89, of which 51 (57.30%) focused on Persian. Lecture Notes in Computer Science published the largest number of Iran's international documents on text mining, and the Journal of Signal and Data Processing published the largest number of domestic ones. An independent t-test showed a significant difference between the numbers of Persian-focused documents in the Scopus and SID databases (p < 0.0001).
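A sketch of such a comparison, assuming the independent t-test was run over per-year document counts (the abstract does not state the sampling unit, and the yearly figures below are made-up placeholders, not the study's data):

```python
from scipy import stats

# Hypothetical per-year counts of Persian-focused text-mining documents.
scopus_persian_per_year = [12, 18, 25, 31, 40, 52, 55, 51]
sid_persian_per_year = [2, 3, 4, 6, 7, 8, 10, 11]

t, p = stats.ttest_ind(scopus_persian_per_year, sid_persian_per_year)
print(f"t = {t:.2f}, p = {p:.5f}")  # the paper reports p < 0.0001
```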
Conclusion
The average growth rate of Iranian scientific documents on text mining was higher than in other subject areas. The United States, the United Kingdom, and Australia have collaborated most with Iranian researchers in this field. It was also found that international documents focusing on the English language received more citations than documents focusing on Persian.
Keywords: Data mining, Text mining, Evaluating science, Bibliometrics, Natural language processing -
Context:
Hospitals are large information organizations whose main goal is to provide high-quality, integrated, and cost-effective healthcare. This goal is more easily realized through well-designed Hospital Information Systems (HIS).
Evidence Acquisition:
In this narrative review study, 98 articles published between 1980 and 2018 were extracted from the Science Direct, PubMed, and Google Scholar databases using the keyword "Hospital Information System". After examining the quality of the articles in terms of research design and references, 41 articles remained for analysis. Relevant e-books and print books were also examined, and the features and services of HIS were investigated.
Results
For HIS, seven features were obtained, namely the coverage of different types of data, integration of subsystems, having an enterprise metamodel, communication with other information systems, coverage of hospital units, adherence to standards, and connectivity to digital instruments. Moreover, 18 services were determined: patient management, economic management and cost reduction, legal management of data, treatment management, administrative management, presenting information based on policies, clinical decision support, managerial-administrative decision support, educational support, research support, electronic medical record generation, text mining, encoding, documentation quality improvement, medical support, resource utilization management, personnel management, and warehouse management.
Conclusions
To evaluate HIS, it is necessary to determine its features and services. Based on the features and services of HIS, its evaluation tool has been developed in this study.
Keywords: Hospital Information System, Metamodel, Electronic Medical Record, Text Mining -
Context
Nowadays, due to the increased publication of articles in various scientific fields, identifying the publishing trend and emerging keywords in the texts of these articles is essential. Thus, the present study has identified and analyzed the keywords used in published articles on medical librarianship and information.
Materials and Methods
In the present investigation, an exploratory and descriptive approach was used to analyze librarianship and information articles published in specialized journals in this field from 1964 to 2019, applying text-mining techniques. The TF-IDF weighting algorithm was applied to identify the most important keywords used in the articles. The Python programming language was used to implement the text-mining algorithms.
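One plausible way to obtain period-wise keywords of the kind reported below: treat each period's articles as a single document and fit TF-IDF across the periods, so each period gets its own top-weighted terms (the mini-corpus and the choice of scikit-learn are assumptions, not the study's code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

periods = {  # hypothetical concatenated article text per period
    "1964-1980": "catalog book journal index card catalog",
    "2000-2014": "database web inform retrieval librarian",
    "2015-2019": "patient intervent blockchain instagram inform",
}
vec = TfidfVectorizer()
X = vec.fit_transform(periods.values())  # one TF-IDF row per period
terms = vec.get_feature_names_out()

for row, period in zip(X.toarray(), periods):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(period, "->", top)
```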
Results
The results of the TF-IDF algorithm indicate that the words "library", "patient", and "inform", with weights of 95.087, 65.796, and 63.386, respectively, were the most important keywords in the published articles on medical librarianship and information. The words "Catalog", "Book", and "Journal" were the most important keywords in articles published between 1960 and 1970, while "Patient", "Bookstore", and "Intervent" were the most important keywords in articles published from 2015 to 2020. The words "Blockchain", "telerehabilit", "Instagram", "WeChat", and "comic" are new keywords observed in articles from 2015 to 2020.
Conclusion
The results of the present study reveal that the keywords used in articles on medical librarianship and information have not remained constant over time but have shifted across different periods; with the advent and growth of information technologies, this field of science has changed in step with the needs of society.
Keywords: Librarianship, Information, Medical, Analysis, Text Mining -
Introduction
Fraud has direct and indirect effects on insurers and the insured. Given the nature of the insurance industry, the decisions of managers and officials will not achieve the desired results without sufficient study and research. This research therefore investigated fraud in supplementary health insurance, and ways to counter it, through a practical approach.
Methods
Through a comparative study of the successful experience of leading countries in fighting fraud in supplementary health insurance, together with interviews with experts in the field, the processes, underlying factors, and effects of fraud in supplementary health insurance, as well as the obstacles and challenges in those processes, were identified. Finally, solutions for preventing and controlling this phenomenon were proposed. The interviews were loaded as text into the MAXQDA software and then text-mined.
Results
In total, 34 factors that give rise to fraud in the supplementary health insurance industry were identified, and the countermeasures were divided into six groups: solutions related to rules and regulations, to processes, to technology, and to related institutions and organizations, plus educational and cultural strategies. The findings showed that among the underlying factors of fraud, the most influential were the inefficiency of the supervisory body and the ignoring of the phenomenon of insurance fraud.
Conclusion
If insurance companies design and launch comprehensive and integrated systems, it will be possible to prevent a high level of fraud.
Keywords: Supplementary Health Insurance, Fraud, Text Mining -
Background and Aim
The existence of an intellectual structure for every field is essential for managers and scholars. Intellectual structures provide a comprehensive map of knowledge that can guide researchers and managers to a better view of their fields. Besides, with massive amounts of data and information generated at high speed, reading and surveying all resources is extremely difficult. Intellectual maps solve this problem and make it possible to control and monitor this voluminous, rapidly generated output. Epidemiology is an active field on which many researchers have focused, yet the structure and criteria of its different subfields have not been studied, and there has been no serious effort to discover the knowledge hidden in epidemiological texts.
Methods
In this paper, an intellectual structure of the field is provided using co-word analysis, which discloses the relationships and structure among research subjects and topics in a field.
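A compact sketch of co-word analysis under common assumptions: count keyword pairs that co-occur within a paper, build a weighted graph, and take graph communities as a stand-in for the clusters reported below (the keyword lists and the itertools/networkx tooling are illustrative, not the paper's setup):

```python
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms import community

papers = [  # hypothetical keyword lists of epidemiology papers
    ["genetic", "mutation", "illness"],
    ["genetic", "illness", "prevention"],
    ["modeling", "simulation", "prevention"],
    ["modeling", "illness", "genetic"],
]
cooc = Counter()
for kws in papers:
    cooc.update(combinations(sorted(set(kws)), 2))  # unordered keyword pairs

g = nx.Graph()
for (a, b), w in cooc.items():
    g.add_edge(a, b, weight=w)  # edge weight = co-occurrence count

# Greedy modularity communities as a rough stand-in for subject clusters.
clusters = community.greedy_modularity_communities(g, weight="weight")
for i, c in enumerate(clusters):
    print(f"cluster {i}: {sorted(c)}")
```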
Results
Four main clusters were determined, namely genetic (30.53% of the surveyed papers), illness (29.47%), modeling (23.16%), and prevention (16.84%).
Conclusion
According to the epidemiology co-word network, the field has not been studied from enough different perspectives, especially that of novel technologies.
Keywords: Intellectual structure of epidemiology, Co-word analysis, Text mining, Graph mining, Social network analysis -
Introduction
Access to complete patient information plays an important role in improving clinical care and reducing medical errors. The Electronic Health Record, a collection of an individual's health information from the prenatal period to after death that is stored electronically and is available at any center and at any time, is the core component of an integrated health information system. The purpose of the present study was a bibliometric and text-mining analysis of the scientific output on Electronic Health Records in the PubMed database.
Methods
This study was carried out using bibliometric and text-mining methods. It was conducted in 2019 on the PubMed database for the period 2009-2019, and 6,863 articles were selected for review. Excel, VOSviewer, and Voyant were used for data analysis.
Results
In the studied field, the topics of electronic health records, health, healthcare, information, and healthcare systems were of great importance in PubMed. Article production in this field rose over the ten-year period, and the United States was the most productive country. David Bates, Dean Sittig, and Hardeep Singh had the most articles in the field of study.
Conclusion
Each term in the word co-occurrence map represents a concept or research area in health. The findings can give science policymakers in this field a clear view for shaping the allocation and distribution of resources across scientific and technical activities. They can also help researchers select hot topics and gain a comprehensive view of the field's academic landscape.
Keywords: PubMed, Scientific Productions, Electronic Health Records, Bibliometrics, Text Mining -
Aims
Bioterrorism is a calculated attack that causes disease or death in humans using viruses, bacteria, or toxic substances. In recent years, owing to the growing number of online articles available in databases, much attention has been paid to applying text mining and information-extraction strategies to biomedical articles. The purpose of this study was to evaluate the importance of toxins as biological weapons by searching medical texts and medical databases.
Information &MethodsThis text mining and data research study was carried out in 2015-2016. The Carrot 2 software was used to cluster the keyword search results into the network and especially the biomedical databases. When searching for keywords, the search engine was set up on PubMed and the cluster type was based on K-means, and at the end of the results, the results were considered as foam trees.
Findings
The highest neurotoxin record count was for tetrodotoxin, with a total of 18,970 records; the highest cytotoxin count was for pertussis toxin, with 14,390 records; and the highest count among dermally hazardous cytotoxins was for zearalenone, with 2,656 records.
Conclusion
The priorities in tracking bioterrorism and biological weapons are early detection, public health, and control of these agents, so micro- and macro-level policies should focus on these areas.
Keywords: Biological Warfare, Text Mining, Neurotoxins
-
Introduction
The aim of this study was to evaluate text mining and clustering of computer engineering documents retrieved from the Web of Science database.
Methods
This descriptive-analytical study was conducted as a survey using a text-mining approach. The research population was all computer engineering documents indexed in the Web of Science, of which 6,016 records from 2004 to 2016 were included. The collected data were analyzed with HistCite, Excel 2013, and RapidMiner 7.3.
Results
For clustering, after preprocessing the data and running K-means (a clustering algorithm), eight main clusters were established: Internet and Technology, Security of Healthcare Information Systems, Human-Computer Interaction, Semantic Web, Computer Models, Computer Systems Performance, Networks and Databases, and Knowledge Discovery Algorithms, plus an Other Topics cluster. To evaluate the clusters, the two criteria of precision and recall were used, and a value of 0.81 was obtained for both.
Discussion and Conclusion
Using the words selected as keywords in the clustering can help users save time and retrieve related information.
Keywords: Text Mining, Clustering, K-means Algorithm, Web of Science Database
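A minimal sketch of the clustering step described above, with TF-IDF vectors fed to K-means; the study used k = 8 on 6,016 records in RapidMiner, while the toy version below uses k = 2 on four made-up records and scikit-learn as an assumed substitute:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

records = [  # hypothetical stand-ins for Web of Science titles/abstracts
    "internet of things network protocol",
    "wireless network sensor technology",
    "health information system security privacy",
    "healthcare data security encryption",
]
X = TfidfVectorizer().fit_transform(records)  # sparse TF-IDF matrix
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster id assigned to each record
```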
-
Background
Performance is a multi-dimensional and dynamic concept, and considerable work over the past two decades has developed the concept of hospital performance. To identify the key concepts in the hospital performance literature, knowledge visualization based on co-word analysis and social network analysis was used.
Methods
Documents were identified through a PubMed search covering 1945 to 2014; 2,350 papers entered the study after omitting unrelated articles, duplicates, and articles without an abstract. After pre-processing and preparing the articles, keywords were extracted and terms were weighted with the TF-IDF weighting schema. Support, an interestingness measure based on the co-occurrence of each extracted keyword with the phrase "hospital performance", was calculated, and keywords with high support were selected. The term-term matrix of these keywords was calculated and the graph was extracted.
Results
The most frequent words after "hospital performance" were mortality and efficiency. The major knowledge structure of the hospital performance literature over these years shows that the keyword mortality had the highest support with hospital performance, followed by quality of care, quality improvement, discharge, length of stay, and clinical outcome. The strongest relationship was seen between electronic medical record and readmission rate.
Conclusion
Some dimensions of hospital performance, such as efficiency, effectiveness, quality, and safety, are more important, and some indicators, such as mortality, length of stay, readmission rate, and patient satisfaction, are more highlighted. In the last decade, concepts such as mortality, quality of care, and quality improvement became more significant in the hospital performance literature.
Keywords: Hospital performance, Knowledge mapping, Social network analysis, Co-word analysis, Text mining
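The support measure described above can be read as the share of documents in which a keyword co-occurs with the phrase "hospital performance"; a toy sketch under that reading, with placeholder abstracts (the study's exact definition and corpus are not reproduced):

```python
# Pure-stdlib sketch: support(kw, anchor) = fraction of documents
# containing both the keyword and the anchor phrase.
abstracts = [
    "hospital performance and mortality in teaching hospitals",
    "length of stay, mortality and hospital performance indicators",
    "nurse staffing and patient satisfaction",
]
keywords = ["mortality", "length of stay", "patient satisfaction"]

n = len(abstracts)
for kw in keywords:
    both = sum(1 for a in abstracts
               if kw in a and "hospital performance" in a)
    print(f"support({kw!r}, 'hospital performance') = {both / n:.2f}")
```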
-
Introduction
Pathology reports generally use an unstructured text format and contain a complex web of relations between medical concepts. To enable computers to understand and analyze the reports' free text, we aimed to convert these concepts and their relations into a structured format.
Methods
The training, validation, and evaluation of this implementation study were based on a corpus of 258 pathology reports with a positive diagnosis of celiac disease, randomly selected from the records of two pathology laboratories. The proposed system consisted of three phases: standardization of celiac disease pathology reports using the Delphi technique with three experts; information extraction from the free-text reports with text-mining techniques using the Stanford Parser; and automatic classification of celiac disease stages in the Marsh system using the J48 decision tree classifier.
Results
We succeeded in extracting information from the free-text pathology reports and assigning each piece of information to the associated pre-defined fields of the standardized template form with an accuracy of 76%. After the Marsh stage was determined for each report in the third phase, the system showed an average overall accuracy of 62%. Evaluated as an independent system with manually corrected, gold-standard input, the third phase achieved an accuracy greater than 84%.
Conclusion
The benefits of standardized synoptic pathology reporting include enhanced completeness and improved consistency, avoidance of confusion and error, and faster, safer transmission of critical pathological data in comparison with narrative reports.
Keywords: Text Mining, Celiac Disease, Clinical Decision Support Systems, Delphi Technique, Decision Trees
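The study used Weka's J48 (C4.5) learner; as a rough stand-in, a scikit-learn decision tree over structured fields of the kind a standardized celiac template might contain (the field names, encodings, and examples below are hypothetical, not the study's actual template):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoded fields per report:
# [intraepithelial lymphocytes raised (0/1), crypt hyperplasia (0/1),
#  villous atrophy grade (0-3)]
X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 3], [0, 0, 0]]
y = ["Marsh I", "Marsh II", "Marsh IIIa", "Marsh IIIc", "Marsh 0"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[1, 1, 2]]))  # e.g. a report with partial villous atrophy
```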
-
Background
The purpose of this paper is to propose a novel intelligent model for AIDS/HIV data based on an expert system and to use it to develop an intelligent medical consulting system for AIDS/HIV.
Materials and Methods
In this descriptive research, 752 frequently asked questions (FAQs) about AIDS/HIV were gathered from numerous websites about the disease. To perform the data mining and extract the intelligent model, the six stages of the CRISP method were completed for the FAQs: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The C5.0 tree classification algorithm was used for modeling. The Rational Unified Process (RUP), whose stages are inception, elaboration, construction, and transition, was used to develop the web-based medical consulting software. The developed intelligent model was used in the software's infrastructure: based on the client's inquiry and keywords, related FAQs are displayed to the client according to their rank, and FAQ ranks are gradually adjusted as clients read them. Based on the displayed FAQs, test and entertainment links are also shown.
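A hedged sketch of the consulting loop described above: match a client's inquiry to FAQs by TF-IDF cosine similarity, then blend in a popularity rank accumulated from clients' reading, as the abstract suggests (the FAQ texts, read counts, and the 0.9/0.1 blend are assumptions, not the system's actual scheme):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faqs = [  # hypothetical FAQ entries
    "How is HIV transmitted?",
    "What are the early symptoms of HIV infection?",
    "Where can I get an anonymous HIV test?",
]
read_counts = [40, 10, 25]  # accumulated from client reading activity

vec = TfidfVectorizer()
F = vec.fit_transform(faqs)
query = vec.transform(["what symptoms does hiv cause early on"])
sim = cosine_similarity(query, F).ravel()

# Blend text similarity with popularity; the 0.9/0.1 mix is arbitrary.
pop = [c / max(read_counts) for c in read_counts]
score = [0.9 * s + 0.1 * p for s, p in zip(sim, pop)]
for i in sorted(range(len(faqs)), key=score.__getitem__, reverse=True):
    print(f"{score[i]:.2f}  {faqs[i]}")
```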
Results
The accuracy of the AIDS/HIV intelligent web-based medical consulting system is estimated to be 78.76%.
Conclusion
An AIDS/HIV medical consulting system has been developed on an intelligent infrastructure. Being equipped with an intelligent model, providing consulting services on systematic textual data, and providing side services based on client activity make the implemented system unique. The research was approved as practical by the Iranian Ministry of Health and Medical Education.
Keywords: AIDS, HIV, data mining, intelligent system, medical informatics, software engineering, text mining