Proposing a Model for Diagnosing the Type 2 Diabetes Using a Self-Organizing Genetic Algorithm
Clinical decision support models that automatically extract knowledge from data help physicians diagnose disease early. The interpretability of a model's diagnostic rules, which lets clinicians understand how it reaches a decision and increases their confidence in its output, is a key indicator of its efficacy.
In this retrospective study, an automated hybrid rule extraction model is proposed for type 2 diabetes. The model was evaluated on the PIMA Diabetes dataset, which comprises 768 records and 9 variables. After missing values and outliers were removed in the preprocessing stage, the proposed fuzzy-genetic hybrid model was implemented in MATLAB to extract the rules. A self-organizing chromosomal structure was used to remove the complexity of tuning the genetic algorithm operators and to make the model easier to re-implement in other applications.
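The idea of a self-organizing chromosome can be illustrated with a minimal Python sketch in which each individual carries its own mutation and crossover rates, so the operator settings evolve together with the solution instead of being tuned by hand. This is only one common way to realize self-adaptation and is not taken from the study's MATLAB implementation: the `Chromosome` layout, the rate bounds, the Gaussian step sizes, the tournament selection, and the toy fitness function are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical bounds for the self-adapted mutation rate.
RATE_MIN, RATE_MAX = 0.01, 0.5


@dataclass
class Chromosome:
    genes: List[float]     # rule parameters scaled to [0, 1] (assumed encoding)
    mutation_rate: float   # operator setting carried by the individual
    crossover_rate: float  # operator setting carried by the individual


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # The parents' own crossover rates decide whether recombination happens.
    rate = (a.crossover_rate + b.crossover_rate) / 2
    if rng.random() >= rate:
        return Chromosome(list(a.genes), a.mutation_rate, a.crossover_rate)
    genes = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a.genes, b.genes)]
    return Chromosome(genes, (a.mutation_rate + b.mutation_rate) / 2, rate)


def mutate(parent: Chromosome, rng: random.Random) -> Chromosome:
    # Perturb the operator settings first, then apply the new mutation rate to the genes.
    new_mut = clamp(parent.mutation_rate + rng.gauss(0, 0.05), RATE_MIN, RATE_MAX)
    new_cx = clamp(parent.crossover_rate + rng.gauss(0, 0.05), 0.0, 1.0)
    genes = [
        clamp(g + rng.gauss(0, 0.1), 0.0, 1.0) if rng.random() < new_mut else g
        for g in parent.genes
    ]
    return Chromosome(genes, new_mut, new_cx)


def evolve(
    fitness: Callable[[Chromosome], float],
    n_genes: int,
    pop_size: int = 20,
    generations: int = 30,
    seed: int = 0,
) -> Tuple[Chromosome, List[float]]:
    """Return the best individual and the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [
        Chromosome(
            [rng.random() for _ in range(n_genes)],
            rng.uniform(RATE_MIN, RATE_MAX),
            rng.random(),
        )
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = [pop[0]]  # elitism: the current best always survives
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    # Toy fitness (an assumption): prefer genes close to 0.5.
    toy_fitness = lambda c: -sum((g - 0.5) ** 2 for g in c.genes)
    best, history = evolve(toy_fitness, n_genes=4, seed=1)
    print(best, history[-1])
```

In a rule extraction setting, the toy fitness would be replaced by a measure such as the classification accuracy of the rules decoded from `genes`. Because the rates are stored in the chromosome, no operator probabilities have to be set beyond their initial ranges.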
The proposed model achieved an accuracy of 79.05% on the PIMA dataset using only two fuzzy rules, each containing just two independent variables. In addition, two single diagnostic rules, one for diabetic and one for non-diabetic individuals, were presented with accuracies of 70.83% and 81.48%, respectively. In the extracted rules, the most influential factors for the presence or absence of diabetes were the number of pregnancies, body mass index, diastolic blood pressure, diabetes pedigree function, plasma glucose concentration, and triceps skinfold thickness.
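To show what a rule base of this size looks like in practice, the following Python sketch evaluates two fuzzy rules, each with two input variables, and computes accuracy as the fraction of correctly classified records. The choice of glucose and body mass index as inputs, the membership breakpoints, the use of the minimum as the fuzzy AND, and the tie-breaking default are all hypothetical; they are not the rules extracted in the study.

```python
from typing import Iterable, Tuple


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership in a 'high' set: 0 at or below lo, 1 at or above hi, linear between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x: float, lo: float, hi: float) -> float:
    """Membership in a 'low' set: the complement of ramp_up."""
    return 1.0 - ramp_up(x, lo, hi)


# Hypothetical breakpoints, chosen only for illustration.
GLUCOSE_LO, GLUCOSE_HI = 100.0, 140.0
BMI_LO, BMI_HI = 25.0, 35.0


def classify(glucose: float, bmi: float) -> str:
    # Rule 1: IF glucose is high AND BMI is high THEN diabetic.
    diabetic = min(ramp_up(glucose, GLUCOSE_LO, GLUCOSE_HI),
                   ramp_up(bmi, BMI_LO, BMI_HI))
    # Rule 2: IF glucose is low AND BMI is low THEN non-diabetic.
    non_diabetic = min(ramp_down(glucose, GLUCOSE_LO, GLUCOSE_HI),
                       ramp_down(bmi, BMI_LO, BMI_HI))
    # The rule with the larger firing strength wins; ties default to non-diabetic.
    return "diabetic" if diabetic > non_diabetic else "non-diabetic"


def accuracy(records: Iterable[Tuple[float, float, str]]) -> float:
    """Fraction of (glucose, bmi, label) records that the rules classify correctly."""
    records = list(records)
    correct = sum(classify(g, b) == label for g, b, label in records)
    return correct / len(records)
```

Each rule reads as a plain IF-THEN statement over two clinical measurements, which is what makes such a rule base easy for a physician to inspect.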
The proposed model combines high accuracy with interpretability, making it well suited to producing both a compact rule set and single rules for diagnosing the presence or absence of diabetes. Owing to its self-organizing ability, it can also be applied to other binary classification problems.