Prediction Comparative Study on Cervical Cancer Analysis in Women using Machine Learning Algorithms



Lead Author Type

MBI Masters Student


Dr. Guenter Tusch; tuschg@gvsu.edu




PURPOSE: The purpose of my capstone project is to predict which age group of women is most likely to develop cervical cancer and to compare machine learning algorithms for cervical cancer analysis.

METHODS AND MATERIALS: Cervical cancer arises from the transformation of normal cells into tumor cells in a multistage process that generally progresses from a pre-cancerous lesion to a malignant tumor. The dataset was obtained from the UCI Machine Learning Repository. It consists of demographic information, habits, and historical medical records, with 858 observations and 36 variables. Firth logistic regression was used to predict cervical cancer, and linear discriminant analysis and k-means clustering were used as comparative models.
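As a rough illustration of how a supervised classifier (linear discriminant analysis) and an unsupervised one (k-means) can be compared on accuracy, the sketch below uses a single made-up feature and made-up labels. It is not the project's code and does not use the UCI data. The function names `lda_1d_fit`, `kmeans_1d`, and `accuracy` are hypothetical, and the one-dimensional setup is an assumption made to keep the example short.

```python
import math
import statistics

def lda_1d_fit(x, y):
    """Two-class LDA on one feature with a pooled variance; returns a predict function."""
    x0 = [v for v, label in zip(x, y) if label == 0]
    x1 = [v for v, label in zip(x, y) if label == 1]
    m0, m1 = statistics.fmean(x0), statistics.fmean(x1)
    # Pooled within-class variance (shared by both classes, as LDA assumes).
    ss = sum((v - m0) ** 2 for v in x0) + sum((v - m1) ** 2 for v in x1)
    var = ss / (len(x) - 2)
    prior0, prior1 = len(x0) / len(x), len(x1) / len(x)

    def score(v, mean, prior):
        # Linear discriminant function for one class.
        return v * mean / var - mean * mean / (2 * var) + math.log(prior)

    def predict(v):
        return 1 if score(v, m1, prior1) > score(v, m0, prior0) else 0

    return predict

def kmeans_1d(x, iters=100):
    """k-means with k=2 on one feature; returns a cluster index (0 or 1) per point."""
    c0, c1 = min(x), max(x)  # simple deterministic initialization
    assign = []
    for _ in range(iters):
        new_assign = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        if new_assign == assign:
            break
        assign = new_assign
        g0 = [v for v, a in zip(x, assign) if a == 0]
        g1 = [v for v, a in zip(x, assign) if a == 1]
        if g0:
            c0 = statistics.fmean(g0)
        if g1:
            c1 = statistics.fmean(g1)
    return assign

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Made-up toy data: one feature, binary outcome (illustrative only).
toy_x = [1.0, 1.5, 2.0, 2.5, 4.0, 3.0, 4.5, 5.0, 5.5, 6.0]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

predict = lda_1d_fit(toy_x, toy_y)
lda_acc = accuracy([predict(v) for v in toy_x], toy_y)

# Cluster indices are arbitrary, so score the better of the two labelings.
raw = accuracy(kmeans_1d(toy_x), toy_y)
kmeans_acc = max(raw, 1.0 - raw)

print("LDA accuracy:", lda_acc)
print("k-means accuracy:", kmeans_acc)
```

Because k-means never sees the labels, its "accuracy" here only measures how well the clusters happen to align with the outcome. A real comparison would also evaluate on held-out data, not on the training points.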

ANALYSIS: The data were cleaned, and Boruta analysis was used for variable selection. Firth logistic regression, which fits the model by penalized maximum likelihood estimation, was used for prediction; the penalty helps account for the large number of zeros in the data. For each one-unit (one-year) increase in age, a patient is 0.756 times as likely to have cervical cancer. Linear discriminant analysis, decision tree, and k-means clustering models were fitted to assess the accuracy of the models.
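To make the penalized maximum likelihood idea concrete, here is a minimal sketch of Firth logistic regression with an intercept and a single predictor, written with the Python standard library only. It is an illustration under stated assumptions, not the analysis code behind the results above: the function name `firth_logistic` is hypothetical, the data are made up, and the plain Newton-type update omits the step-halving safeguards that production implementations typically include.

```python
import math

def firth_logistic(x, y, iters=100, tol=1e-8):
    """Firth-penalized logistic regression: intercept plus one predictor.

    Returns (intercept, slope). Illustrative sketch only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X'WX for the design matrix with rows [1, x_i].
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse of I
        # Hat-matrix diagonals: h_i = w_i * [1, x_i] I^{-1} [1, x_i]'.
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score residuals: y_i - p_i + h_i * (1/2 - p_i).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Newton-type step using the Fisher information.
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Made-up, perfectly separated data: ordinary maximum likelihood would push
# the slope toward infinity, while the Firth penalty keeps it finite.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_y = [0, 0, 0, 1, 1, 1]

intercept, slope = firth_logistic(toy_x, toy_y)
odds_ratio = math.exp(slope)  # multiplicative change in odds per one-unit increase
print("intercept:", intercept, "slope:", slope, "odds ratio:", odds_ratio)
```

In any logistic model, exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor. If the 0.756 reported for age is read as such an odds ratio (an assumption, since the abstract does not say so explicitly), it corresponds to a negative coefficient of about ln(0.756) ≈ -0.28.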
