Date Approved


Graduate Degree Type


Degree Name

Computer Information Systems (M.S.)

Degree Program

School of Computing and Information Systems

First Advisor

Jagadeesh Nandigam

Academic Year



Data has become central to modern decision making. Improvements in computing performance and storage over the past two decades have made it much easier to store large amounts of data. Analyzing that data and building data models from it helps organizations obtain insights that inform their decisions. Big data analytics has become an integral part of many fields, such as retail, real estate, education, and medicine. The goal of this project is to understand how Apache Spark and its different storage methods work, and to create a data warehouse for analyzing data. The data is obtained from the CMS (Centers for Medicare & Medicaid Services) website and includes information such as prescriber name, drug generic name, drug brand name, and drug price. Using Apache Spark and the data warehouse, the data can be queried and transformed to obtain valuable insights. Using Power BI, the data can be visualized, which makes it easier to understand and highlights its trends.