Event Title

Building Big Data Platform for End-to-End Analytics Experience in Academic Environment

Location

Hager-Lubbers Exhibition Hall

Description

ABSTRACT: The goal of this study is to highlight the skills that students can gain from experiencing end-to-end analytics on big data platforms in an academic environment, so that they can meet real-world expectations. After almost two decades, many industries have realized that applying computer science, statistics, and machine learning to big data produces the competitive advantage and internal development they need. Beyond business, data science is increasingly becoming the foundation of projects in a variety of domains, such as health, manufacturing, and policy. This creates a wide range of expectations for recent data science graduates. PURPOSE: To promote departmental consideration of incorporating big data activities into courses. PROCEDURES: A data lake infrastructure was built for large-scale analyses. Data from different sources was stored in a distributed storage system within the data lake. The data was then extracted from the distributed database and prepared as a dataset for analysis. Exploratory data analysis was performed on the airline on-time performance data and the local climatological data. OUTCOME: The learning outcomes involve experiencing the process of handling big data within a data lake and understanding the pros and cons of big data infrastructure in analysis. IMPACT: This study will serve as a course project that covers the techniques involved in big data extraction and analytics for courses in the data science and analytics programs. Building such skills in future students will give them a stronger background to meet the expectations of industry.


Apr 15th, 3:30 PM
