Climate Science – It’s all in the Cloud

Lead Author Type

CIS Masters Student


Dr. Greg Wolffe, wolffe@gvsu.edu

The vast amount of data being generated each year, especially in the sciences, demands new computing paradigms to facilitate its analysis. Given the sheer size of modern datasets, traditional platforms and methods are inadequate, and adapting these methods to scalable, multi-processing models has proven to be a non-trivial task.

Hadoop is one such tool designed to aid researchers in this endeavor, particularly for Big Data domains. However, the shortcomings of Hadoop, whose 1.0 release dates to December of 2011, are beginning to appear as more application experience accumulates. For example, Hadoop setup can be quite complicated, it relies solely on the MapReduce processing paradigm, it suffers performance degradation due to its extensive use of I/O system calls, and it struggles to accommodate streaming data. All of these issues are addressed by Apache Spark, an in-memory, multi-paradigm framework that can operate on both static and streaming data.

Our proof-of-concept research project used Apache Spark to solve a large problem in a scientific domain with contemporary social significance. The project proposed to create a new measure called the Climate Resilience Index (CRI), modeled after the Multi-dimensional Poverty Index (MPI). The index is intended to be a measure of a locality's ability to recover from a catastrophic climatological event. Like the MPI, it will consist of a number of factors ranging from climate metrics to educational status to living standards.
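As a rough illustration of how an MPI-style index can be assembled, the sketch below computes a weighted count of deprivations for a single locality: each indicator has a weight and a cutoff, and the weights of all indicators on which the locality falls past the cutoff are summed. The abstract does not specify the CRI's factors, weights, or cutoffs, so every indicator name and number here is a hypothetical placeholder, not the project's actual definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Indicator:
    """One factor of a hypothetical MPI-style index."""
    name: str
    weight: float          # share of the total index; all weights sum to 1
    cutoff: float          # threshold past which a locality counts as deprived
    higher_is_worse: bool  # True if larger values indicate more deprivation


def weighted_deprivation_score(values: Dict[str, float],
                               indicators: List[Indicator]) -> float:
    """Sum the weights of the indicators on which a locality is deprived."""
    total_weight = sum(ind.weight for ind in indicators)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("indicator weights must sum to 1")

    score = 0.0
    for ind in indicators:
        value = values[ind.name]
        if ind.higher_is_worse:
            deprived = value >= ind.cutoff
        else:
            deprived = value <= ind.cutoff
        if deprived:
            score += ind.weight
    return score


# Hypothetical indicators spanning climate, education, and living standards.
INDICATORS = [
    Indicator("drought_severity", 0.5, 0.6, higher_is_worse=True),
    Indicator("high_school_completion_rate", 0.25, 0.8, higher_is_worse=False),
    Indicator("median_household_income", 0.25, 40000.0, higher_is_worse=False),
]

# Hypothetical values for one locality (not real data).
example_locality = {
    "drought_severity": 0.7,
    "high_school_completion_rate": 0.9,
    "median_household_income": 35000.0,
}

example_score = weighted_deprivation_score(example_locality, INDICATORS)
```

Under these assumed cutoffs the example locality is deprived on drought severity and income but not on education. A higher score would suggest lower resilience; how the real index maps such a score to a resilience value is left open here.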

Specifically, Apache Spark was used to process large datasets of Moderate Resolution Imaging Spectroradiometer (MODIS) remote sensing data, correlated with available U.S. government economic data, to identify areas of the country that are at extreme risk of suffering from drought and of struggling to recover from it.

Results showed that Apache Spark is a robust, easy-to-use, and powerful tool for Big Data exploration and analysis. Future work will center on finalizing the Climate Resilience Index and validating its utility.
