Document Type

Project

Lead Author Type

CIS Masters Student

Advisors

Dr. Jerry Scripps, scrippsj@gvsu.edu

Embargo Period

5-21-2013

Abstract

With the vast amount of data now available, institutions are looking for increasingly accurate ways to use it. Companies like Amazon use their huge amounts of data to recommend items to users. Based on similarities among items, a system can predict a new item’s rating. Recommender systems use user, item, and rating information to predict how other users will like a particular item.

Recommender systems are now pervasive; they seek to generate profit from customers or to meet customers' needs. To reach this goal, however, a system needs to parse a lot of data, collect information (sometimes from different sources), and predict how the user will like a product or item. The computational power needed is considerable. Companies also try to avoid flooding customer mailboxes with hundreds of products each morning, so they look for the one email or text that will make the customer look and act.

The motivation for this project comes from my eagerness to learn website design and to gain a deep understanding of recommender systems. Applying machine learning dynamically was one of the goals I set for myself, and I wanted to go beyond that and verify my results. Thus, I had to use a large dataset to test the algorithms and compare the techniques in terms of error rate. My experience with applying collaborative filtering taught me that finding a solution is not enough; one must strive for a fast and optimal one. In my case, testing my algorithm on a large dataset required me to refine its coding strategy many times to speed up the process.

In this project, I designed a website that uses different techniques for recommendations: the user-based, item-based, and model-based approaches to collaborative filtering. Each technique has its own way of predicting a user's rating for a new item based on existing users’ data. To evaluate each method, I used MovieLens, an external dataset of users, items, and ratings, and calculated the error using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Finally, each method has strengths and weaknesses that depend on the domain in which it is applied.
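
As an illustration of the two error measures named above, the following is a minimal sketch of how MAE and RMSE can be computed from paired actual and predicted ratings. It is not the project's own code: the function names and the sample ratings are hypothetical, and the ratings are assumed to lie on a 1 to 5 scale, as in MovieLens.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical held-out ratings and the ratings a recommender predicted for them.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, and the gap between the two grows when a method makes a few large mistakes rather than many small ones.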
