HAPPy: Home Affordability Predictor in Python

Lead Author Type

CIS Master's Student


Dr. Greg Wolffe, wolffeg@gvsu.edu


From a credit and income perspective, the current home lending decision-making process is driven primarily by an assessment of a prospective borrower’s ability to repay a loan. Although effective for managing credit risk, this approach does little to help borrowers answer the more nuanced question of how much they can afford to spend on a house. It does not consider individual borrower preferences, such as savings and retirement goals and lifestyle choices. Lenders have an opportunity to develop a more guided, affordability-focused home lending experience by leveraging readily available data, including historical loan application data and deposit account transaction history.

This research project used the “Fannie Mae Single-Family Loan Performance Data” dataset to create a proof-of-concept home affordability prediction model. Four classifiers were implemented and assessed for their suitability as prediction models: a logistic regression classifier, a polynomial regression classifier, a deep neural network classifier and a random forest classifier. Several techniques were used to process the Fannie Mae data and optimize model performance, including synthetic minority oversampling, feature scaling / normalization, feature engineering, k-fold cross-validation and grid search.
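
As an illustration of the synthetic minority oversampling step named above, the following is a minimal sketch of the core idea: each synthetic row is interpolated between a minority-class sample and one of its nearest minority-class neighbours. The abstract does not say which implementation the project used, so the function name `smote_like_oversample`, its parameters, and the toy feature values are all assumptions made for illustration. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be preferred.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up, already-scaled feature vectors standing in for minority-class loans.
minority_rows = [(0.9, 0.8), (1.0, 0.7), (0.8, 0.9), (0.95, 0.85)]
new_rows = smote_like_oversample(minority_rows, n_new=6)
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the range the minority class already occupies. Oversampling of this kind is usually applied to the training folds only, so that synthetic rows do not leak into the data used for evaluation.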

Two primary approaches were explored: using loan default status as the predictor of affordability, and using monthly delinquency status to compute a custom affordability score that could serve as a predictor. With the custom affordability scores binned into four classes, the random forest classifier achieved an accuracy of 96.36%, with the lowest-scoring class reaching a prediction accuracy of 92%.
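
The abstract does not give the formula for the custom affordability score or the bin boundaries, so the sketch below only illustrates the general shape of the second approach: reduce a loan's monthly delinquency history to a single score, then bin that score into four classes. The scoring rule (share of months with no delinquency), the equal-width cut points in `BIN_EDGES`, and both function names are hypothetical and should not be read as the project's actual method.

```python
from bisect import bisect_right

# Assumed equal-width cut points that split [0, 1] into four classes (0..3).
BIN_EDGES = (0.25, 0.5, 0.75)


def affordability_score(monthly_status):
    """Hypothetical score: the fraction of months the loan was current.

    monthly_status is a sequence of integers, one per month, giving the
    number of months past due (0 means the payment was on time).
    """
    if not monthly_status:
        raise ValueError("at least one monthly observation is required")
    on_time = sum(1 for status in monthly_status if status == 0)
    return on_time / len(monthly_status)


def affordability_class(score):
    """Map a score in [0, 1] to a class index from 0 (worst) to 3 (best)."""
    return bisect_right(BIN_EDGES, score)


# Made-up eight-month history: current in six months, delinquent in two.
history = [0, 0, 1, 0, 2, 0, 0, 0]
score = affordability_score(history)
label = affordability_class(score)
```

The resulting class index could then serve as the target for a multi-class classifier such as a random forest, which is the role the binned scores play in the results reported above.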

This document is currently not available here.