Comparative Study of Classification Algorithms in Breast Cancer Prediction

Lead Author Type

MBI Masters Student


Dr. Guenter Tusch, tuschg@gvsu.edu

Objective(s): This study compares classification models for diagnosing breast cancer. The dataset has previously been analyzed with various classification techniques (naïve Bayes, decision tree, random forest, support vector machine, and AdaBoost) with the goal of classifying breast tumors as benign or malignant. Our goal is to determine how these classification models compare with one another on this task.

Materials and Methods: The study used the Wisconsin Breast Cancer Diagnostic dataset obtained from the UCI Machine Learning Repository. The classification models were evaluated using confusion matrix statistics, accuracy, and the kappa statistic. Feature selection methods such as Pearson correlation, recursive feature elimination, and feature selection based on the random forest algorithm were also employed to identify the features that best improve the accuracy of the respective models. The analysis was performed in R.
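
As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy and Cohen's kappa from a binary confusion matrix. It is written in plain Python rather than the R used in the study, and it does not reproduce the study's code or results. The labels "M" (malignant) and "B" (benign), the function names, and the ten toy predictions are all assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive="M"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and truth, corrected for chance."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of the marginal proportions, per class.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (hypothetical, not taken from the study's data).
actual = ["M", "M", "M", "M", "B", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "M", "B", "B", "B", "B", "B", "B", "M"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, fn, tn)
kappa = cohen_kappa(tp, fp, fn, tn)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance given the class proportions, which matters when benign and malignant cases are not equally frequent.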

Results: Among all the classification models, the support vector machine achieved the best accuracy in classifying the data. The accuracies of the other models were comparable, with the AdaBoost and random forest models coming closest.
