Identification of High-risk and Low-risk Groups among Men Who Chew Tobacco through Artificial Neural Networks (ANN) and Support Vector Machines (SVM)

Lead Author Type

MBI Masters Student


Dr. Guenter Tusch, tuschg@gvsu.edu

The National Health and Nutrition Examination Survey (NHANES) is a survey research program conducted by the National Center for Health Statistics (NCHS), part of the Centers for Disease Control and Prevention (CDC), an agency of the United States Department of Health and Human Services (HHS). It assesses the health and nutritional status of adults and children in the United States and tracks changes over time by combining interviews with physical examinations. The objective of this project was to select a peer-reviewed article that used NHANES data, recreate its data set, repeat the analysis, and then analyze the same data with a machine learning algorithm different from the one used in the article. We chose “Chewing Tobacco: Who Uses and Who Quits? Findings from NHANES III, 1988–1994” by Beth Howard-Pitney and Marilyn A. Winkleby (American Journal of Public Health 2002, Vol. 92, No. 2, 250-256), which uses a classification tree algorithm to identify high-risk and low-risk groups among men who chew tobacco. We retrieved data from NHANES III and filtered it according to the analysis in the article. The filtered data constitute a sample of 4969 men, 1866 non-Hispanic white, 1533 non-Hispanic black, and 1578 Mexican-American. The analysis was restricted to men aged 25 to 64. The classification tree's separation of high-risk and low-risk subgroups is compared with results from an Artificial Neural Network (ANN) and a Support Vector Machine (SVM).
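
The cohort filtering described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the field names (`sex`, `age`, `ethnicity`), their string values, and the sample rows are hypothetical and do not correspond to real NHANES III variable codes, and the age range is assumed to include both 25 and 64.

```python
from collections import Counter

# Hypothetical labels; real NHANES III files use coded variables instead.
ELIGIBLE_GROUPS = {
    "non-Hispanic white",
    "non-Hispanic black",
    "Mexican-American",
}


def is_eligible(record):
    """Return True for men aged 25 to 64 (assumed inclusive) in one of the three groups."""
    return (
        record["sex"] == "male"
        and 25 <= record["age"] <= 64
        and record["ethnicity"] in ELIGIBLE_GROUPS
    )


def filter_cohort(records):
    """Keep only the records that meet the eligibility criteria."""
    return [r for r in records if is_eligible(r)]


def group_counts(records):
    """Count the retained records per ethnic group."""
    return Counter(r["ethnicity"] for r in records)


# Made-up rows for illustration only; these are not survey data.
sample = [
    {"sex": "male", "age": 30, "ethnicity": "non-Hispanic white"},
    {"sex": "female", "age": 30, "ethnicity": "non-Hispanic white"},
    {"sex": "male", "age": 70, "ethnicity": "non-Hispanic black"},
    {"sex": "male", "age": 64, "ethnicity": "Mexican-American"},
    {"sex": "male", "age": 40, "ethnicity": "other"},
]

cohort = filter_cohort(sample)
counts = group_counts(cohort)
```

Tabulating the per-group counts after filtering gives a quick check that a recreated sample matches the subgroup sizes reported in the source article before any classifier is fitted.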
