Comparison of Supervised Machine Algorithms by Classifying a Cardiotocography Data Set
Document Type
Capstone
Advisors
Dr. Guenter Tusch; tuschg@gvsu.edu
Embargo Period
8-23-2018
Abstract
PURPOSE: To compare the performance, and visualize the results, of five supervised machine learning algorithms by classifying a cardiotocography dataset.
SUBJECTS: Cardiotocography is a technique for recording the fetal heart rate and uterine contractions during pregnancy to assess maternal and fetal health. The UCI Machine Learning Repository Cardiotocography dataset contains 2126 automatically processed cardiotocograms with 21 attributes. Three expert obstetricians labeled each cardiotocogram in two ways: by morphological pattern (10 classes) and by fetal status (3 classes). This project attempted the 10-class classification.
METHODS AND MATERIALS: Five classification models were built, based on Recursive Partitioning, Random Forest, Conditional Inference Trees, Linear Discriminant Analysis, and Naïve Bayes. The data were split 70%/30% into training and testing sets. Model performance was compared in terms of accuracy and Kappa value. Confusion matrices were converted to heat maps for visual assessment of individual model performance. The models were compared visually by plotting class mismatch percentages for every model. The R statistical programming language was used for model building and Tableau for visualization.
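The comparison metrics named above can all be derived from a confusion matrix. The following is a minimal Python sketch rather than the project's R code. It assumes the unweighted Cohen's kappa, assumes that rows hold the reference class and columns the predicted class, and reads "class mismatch percentage" as the share of each reference class that was predicted as some other class. The function names and the 3-class `toy_cm` matrix are hypothetical and are not taken from the study.

```python
from typing import List

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Fraction of all cases that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


def mismatch_percentages(cm: Matrix) -> List[float]:
    """Per reference class, the percentage of cases assigned to another class."""
    result = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        wrong = row_total - row[i]
        result.append(100.0 * wrong / row_total if row_total else 0.0)
    return result


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(toy_cm), 3))
    print("kappa:", round(cohen_kappa(toy_cm), 3))
    print("mismatch %:", [round(p, 1) for p in mismatch_percentages(toy_cm)])
```

Plotting the per-class mismatch percentages of several models side by side gives the kind of class-wise comparison described above, and coloring the cells of the matrix itself gives the heat map.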
RESULTS: The Random Forest model showed the highest accuracy (86%) and Kappa (0.84), whereas the Naïve Bayes model showed the lowest accuracy (55%) and Kappa (0.49). Heat map visualization of the individual algorithms and class-wise mismatch percentages for every model aided the analysis.
CONCLUSION: The Random Forest algorithm has the potential to classify future cardiotocography datasets. Visualization techniques such as heat maps and mismatch plots should be considered when assessing the performance of a multi-class classifier.
ScholarWorks Citation
Paithankar, Shreya, "Comparison of Supervised Machine Algorithms by Classifying a Cardiotocography Data Set" (2018). Technical Library. 308.
https://scholarworks.gvsu.edu/cistechlib/308