Diabetes Mellitus Prediction in Women

Lead Author Type

MBI Masters Student


Dr. Guenter Tusch, tuschg@gvsu.edu

Motivation: Diabetes mellitus is a chronic, lifelong condition that affects the body, and diagnosing it helps improve the health of affected individuals. Many researchers have begun using bioinformatics and knowledge discovery to improve diagnosis of this disease. The goal of this paper is to predict the occurrence of diabetes, taking various factors into account.

Materials and Methods: The dataset was taken from the UCI Machine Learning Repository (Pima Indians Diabetes dataset). The statistical and machine learning models used were logistic regression, random forest, and support vector machines (SVM). Before these models were constructed, several preparatory steps were performed, including checking the variability of the data and applying feature selection methods. The analysis was done using the R software.
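
As a rough illustration of the kind of workflow described above (fit a classifier on part of the data, then measure accuracy on held-out cases), the following minimal sketch trains a logistic regression by gradient descent. It rests on several assumptions: it is written in Python with the standard library only, whereas the study used R; the data are synthetic stand-ins generated by a hypothetical helper `make_synthetic_data`, not the Pima Indians Diabetes dataset; and the 140/60 split, learning rate, and epoch count are arbitrary choices. It is not the authors' procedure and does not reproduce their results.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_data(n, rng):
    """Hypothetical stand-in data: two numeric features and a binary label.

    Only the first feature carries signal; the second is pure noise.
    """
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label == 1 else -1.5
        features = [rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)]
        rows.append((features, label))
    return rows


def train_logistic_regression(rows, learning_rate=0.1, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, features):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(weights, bias, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for f, y in rows if predict(weights, bias, f) == y)
    return correct / len(rows)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = make_synthetic_data(200, rng)
train_rows, test_rows = data[:140], data[140:]  # arbitrary hold-out split
weights, bias = train_logistic_regression(train_rows)
test_accuracy = accuracy(weights, bias, test_rows)
print(f"Held-out accuracy: {test_accuracy:.2%}")
```

Accuracy here is simply the share of held-out cases classified correctly, which is the same kind of figure reported for the three models in the results. In R, comparable models are commonly fit with `glm` for logistic regression and with add-on packages for random forests and SVMs, though the abstract does not say which functions were used.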

Results: Three models were constructed to predict the occurrence of diabetes. The logistic regression model achieved an accuracy of 80.51%, the random forest model 81.82%, and the SVM model 80.51%.
