Comparing Classification Algorithms to Predict Vertebral Column Disorder

Document Type

Capstone

Lead Author Type

MBI Masters Student

Advisors

Dr. Guenter Tusch, tuschg@gvsu.edu

Embargo Period

6-5-2017

Abstract

Spinal disorders are extremely common, affecting about two thirds of adults. In this project, we analyzed a dataset of biomechanical features of the vertebral column to classify patients as normal or abnormal (disk hernia or spondylolisthesis). The dataset has 12 physical parameter measurements. After normalizing the data, we performed t-tests to determine the significance of each variable. To improve prediction accuracy, it is also important to determine the correlations between the variables and their impact on the classification of spinal disorders. Principal component analysis, the correlation matrix, and feature selection libraries were used for feature selection, which left 6 key variables for analysis. These 6 variables explain about 95% of the variance in the data. We then fitted a logistic regression model and random forest and SVM classifiers, and compared their performance on new data based on the AUC (area under the ROC curve) and the confusion matrix. The analysis was performed in RStudio using R libraries that helped automate the procedures.

The results showed that the logistic regression model attained the best fit, classifying the data with an accuracy of 0.9513. However, we do not recommend this model because it may overfit the data. Instead, we recommend the random forest model, which predicted the data with an accuracy of 0.903486. The random forest also reports the importance of each variable, which is helpful for model selection, especially forward/backward stepwise selection.
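
The two comparison criteria named in the abstract, the confusion matrix and the AUC, can be illustrated with a minimal sketch. The capstone's analysis was done in R and its code is not shown here, so this is a plain Python illustration rather than the authors' method. The labels, scores, 0.5 threshold, and all function names are hypothetical and chosen only for the example.

```python
# Illustrative only: toy labels and scores, NOT the capstone's data or R code.
# Convention assumed here: 1 = abnormal, 0 = normal.

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of cases classified correctly."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(scores, threshold=0.5):
    """Turn predicted probabilities into class labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical hold-out labels and predicted probabilities from some classifier.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = predict(scores)

print("confusion (tp, fp, fn, tn):", confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds. This is one reason the two measures are often reported together when comparing classifiers.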

This document is currently not available here.
