Building Big Data Platform for End-to-End Analytics Experience in Academic Environment
Location
Hager-Lubbers Exhibition Hall
Description
ABSTRACT: The goal of this study is to highlight the skills that students could gain from experiencing end-to-end analytics on big data platforms in an academic environment, in order to meet real-world expectations. Over almost two decades, many industries have realized that applying computer science, statistics, and machine learning to big data produced the competitive advantage and internal development they needed. Beyond business, data science is increasingly becoming the foundation of projects in a variety of domains, e.g., health, manufacturing, and policy. This creates a wide range of expectations for recent data science graduates. PURPOSE: To encourage departments to consider incorporating big data activities into their courses. PROCEDURES: A data lake infrastructure was built for large-scale analyses. Data from different sources was stored in a distributed storage system within the data lake. The data was then extracted from the distributed database, and the dataset was prepared for analysis. Exploratory data analysis was performed on the airline on-time performance data and the local climatological data. OUTCOME: The learning outcomes involve experiencing the process of handling big data within a data lake and understanding the pros and cons of big data infrastructure for analysis. IMPACT: This study will serve as the basis for course projects that cover the techniques involved in big data extraction and analytics in data science and analytics programs. Enabling such skills in future students will give them a stronger background for meeting industry expectations.
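The step in which the two datasets are brought together for exploratory analysis can be illustrated with a small sketch. It assumes, hypothetically, that flight records and daily weather observations have already been extracted from the data lake into Python dictionaries with made-up field names (`date`, `airport`, `arr_delay`, `precip_in`) and made-up sample values; the actual platform, schemas, and results of the study are not described here. The sketch joins the two sources on date and airport and compares mean arrival delay on days with and without recorded precipitation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-made sample rows for illustration only; the real
# airline on-time and local climatological datasets use their own schemas.
flights = [
    {"date": "2019-01-05", "airport": "GRR", "arr_delay": 12.0},
    {"date": "2019-01-05", "airport": "GRR", "arr_delay": 30.0},
    {"date": "2019-01-06", "airport": "GRR", "arr_delay": -4.0},
    {"date": "2019-01-06", "airport": "ORD", "arr_delay": 8.0},
]
weather = [
    {"date": "2019-01-05", "airport": "GRR", "precip_in": 0.4},
    {"date": "2019-01-06", "airport": "GRR", "precip_in": 0.0},
    {"date": "2019-01-06", "airport": "ORD", "precip_in": 0.0},
]


def mean_delay_by_precip(flights, weather):
    """Inner-join flights to daily weather on (date, airport), then return
    the mean arrival delay for 'wet' (precipitation > 0) and 'dry' days."""
    # Index weather by the join key for constant-time lookups.
    precip = {(w["date"], w["airport"]): w["precip_in"] for w in weather}
    groups = defaultdict(list)
    for f in flights:
        key = (f["date"], f["airport"])
        if key not in precip:
            continue  # inner join: skip flights with no matching weather row
        label = "wet" if precip[key] > 0 else "dry"
        groups[label].append(f["arr_delay"])
    return {label: mean(delays) for label, delays in groups.items()}


print(mean_delay_by_precip(flights, weather))  # {'wet': 21.0, 'dry': 2.0}
```

On a real big data platform the same join and group-by would typically run in a distributed engine rather than in plain Python; the in-memory version only shows the shape of the analysis.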