Event Title

Increasing the Performance of R Package ‘Penalized’ Through Integration of C++ with RcppArmadillo

Location

Hager-Lubbers Exhibition Hall

Start Date

18-4-2017 3:30 PM

Description

BACKGROUND AND PURPOSE: Penalized is an R package that fits penalized regression models to high-dimensional data. It supports linear, logistic, Poisson, and Cox proportional hazards regression, and each of these models can be fitted with an ℓ1, ℓ2, or fused lasso penalty. Because Penalized is typically applied to large, high-dimensional problems, performance is essential. The purpose of this project was to improve the performance of Penalized by rewriting portions of its code in a faster programming language while maintaining its interface with R.

PROCEDURES: First, the code base was profiled with the R package GUIProfiler to determine which sections of code should be converted. These sections were then rewritten in C++, the high-performance language chosen for the conversion, and RcppArmadillo was used to interface the C++ code with R.

OUTCOME: Speedups ranging from 1.3 to 2.2 times were obtained, depending on the function, parameters, and input data. Models fitted with an ℓ2 penalty showed the largest performance gains; Cox regression with an ℓ2 penalty on the nki70 dataset achieved a median speedup of 2.05.

IMPACT: More efficient software tools allow statisticians, data scientists, and computational biologists to work with larger datasets and to build pipelines with shorter turnaround times.
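For reference, the penalized models described above can be sketched as maximizing a penalized log-likelihood, where ℓ(β) is the log-likelihood of the chosen model (the partial likelihood in the Cox case) and λ1, λ2 ≥ 0 are tuning parameters. This is a sketch based on our reading of the package documentation, not a definitive statement of its implementation: the scaling of the ℓ2 term and the way the fused lasso option is parameterized in Penalized may differ from the standard fused lasso form shown in the second line. Speedup is taken here to mean the ratio of the original R runtime to the C++ runtime, so a median speedup of 2.05 would correspond to the rewritten code running roughly twice as fast.

```latex
\hat{\beta} = \arg\max_{\beta} \; \ell(\beta) - \lambda_1 \sum_{j=1}^{p} |\beta_j| - \frac{\lambda_2}{2} \sum_{j=1}^{p} \beta_j^2

P_{\text{fused}}(\beta) = \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|

S = \frac{t_{\text{R}}}{t_{\text{C++}}}
```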
