Using Classification and K‑means Methods to Predict Breast Cancer Recurrence in Gene Expression Data
Breast cancer is a type of cancer that starts in the breast tissue and affects about 10% of women at some point in their lives. In this study, we applied a new method to predict breast cancer recurrence using biological networks constructed from gene expression data.
The method comprises four steps: data collection, clustering, identification of differentiating genes, and classification. Eight techniques were implemented on 12,172 genes and 200 samples: random forest, support vector machine, neural network, random forest + k‑means, hidden Markov model, joint mutual information, neural network + k‑means, and support vector machine + k‑means.
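The abstract does not state how k‑means is combined with each classifier. A minimal sketch, assuming one common hybrid scheme: cluster the samples with k‑means (Lloyd's algorithm), then append each sample's one-hot cluster membership to its features before classification. The helper names `kmeans` and `add_cluster_features`, the toy data, and the choice of k are hypothetical and not taken from the study.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns one cluster label per sample and the centroids."""
    rng = random.Random(seed)
    # Initialise the k centroids with k distinct samples drawn at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def add_cluster_features(points: Sequence[Sequence[float]], labels: Sequence[int],
                         k: int) -> List[List[float]]:
    """Hypothetical hybrid step: append a one-hot cluster encoding to each sample."""
    return [list(p) + [1.0 if lab == c else 0.0 for c in range(k)]
            for p, lab in zip(points, labels)]


# Toy example: four samples with two genes each, forming two obvious groups.
samples = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centroids = kmeans(samples, k=2)
augmented = add_cluster_features(samples, labels, k=2)
print(labels, augmented[0])
```

The augmented rows could then be passed to any classifier, for example scikit-learn's `RandomForestClassifier`. Whether the study used this feature-augmentation scheme or, say, trained a separate model per cluster is not stated in the abstract.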
Thirty genes were selected as differentiating genes and used for classification. The results showed that random forest + k‑means achieved better performance than the other techniques. Neural network + k‑means and random forest + k‑means also outperformed the other techniques in identifying high-risk cases.
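The criterion used to pick the 30 differentiating genes is not given here. As an illustrative stand-in, not necessarily the study's criterion, the sketch below ranks genes by the absolute Welch t-statistic between recurrence and non-recurrence samples and keeps the top `n_genes`. The function names, the 0/1 outcome encoding, and the toy matrix are hypothetical.

```python
import statistics
from typing import List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic for the difference in means of two groups."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0  # both groups constant: treat the gene as non-differentiating
    return (statistics.mean(a) - statistics.mean(b)) / se


def top_differentiating_genes(expr: Sequence[Sequence[float]], outcome: Sequence[int],
                              n_genes: int = 30) -> List[int]:
    """Return column indices of the n_genes genes with the largest |t|.

    expr is a samples x genes matrix; outcome is 1 for recurrence, 0 otherwise.
    This t-test ranking is an assumed stand-in for the study's selection step.
    """
    scores = []
    for g in range(len(expr[0])):
        column = [row[g] for row in expr]
        recurrence = [x for x, y in zip(column, outcome) if y == 1]
        no_recurrence = [x for x, y in zip(column, outcome) if y == 0]
        scores.append(abs(welch_t(recurrence, no_recurrence)))
    # Sort gene indices by score, highest first, and keep the top n_genes.
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:n_genes]


# Toy example: 6 samples x 4 genes; the first three samples recurred.
expr = [
    [1.0, 5.0, 10.0, 3.0],
    [1.2, 5.1, 10.5, 2.9],
    [0.9, 4.9, 9.8, 3.1],
    [0.5, 5.0, 2.0, 3.0],
    [0.4, 5.2, 2.3, 2.8],
    [0.6, 4.8, 1.9, 3.2],
]
outcome = [1, 1, 1, 0, 0, 0]
top = top_differentiating_genes(expr, outcome, n_genes=2)
print(top)
```

On the real data this call would use the 12,172-column expression matrix with `n_genes=30`; a filter like this is cheap, but it scores each gene independently and ignores interactions between genes.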
Only 30 of the 12,172 genes were used for classification, and the use of clustering improved the performance of the classification techniques.