# Increasing the Performance of R Package 'Penalized' through Integration of C++ with RcppArmadillo

## Location

Hager-Lubbers Exhibition Hall

## Description

**Background and Purpose:** Penalized is an R package for fitting penalized regression models to high-dimensional data. It supports linear, logistic, Poisson, and Cox proportional hazards regression, and each of these models can be fitted with an ℓ1 (lasso), ℓ2 (ridge), or fused lasso penalty. Because of the size and dimensionality of the problems Penalized is used for, performance is essential. The purpose of this project was to speed up Penalized by rewriting portions of its code in a faster programming language while keeping its interface with R.

**Procedures:** First, the code base was profiled with the GUIProfiler package to determine which sections of code should be converted. These sections were then rewritten in C++, the high-performance language chosen for the conversion, with RcppArmadillo providing the interface between the C++ and R code.

**Outcome:** Fitting models with an ℓ1 penalty produced significant speedups: when each type of regression was benchmarked against the original version of Penalized, median speedups ranged from 1.53× to 2.41×. Recoding the functions associated with the ℓ2 penalty did not yield a speedup, so that code was left in its original state. Among linear, logistic, Poisson, and Cox regression, Cox regression showed the largest speedups.

**Impact:** More efficient software tools allow statisticians, data scientists, and computational biologists to work with larger datasets and to build pipelines with shorter turnaround times.
