Optimizing de novo Assembly Algorithms for Next-Generation Sequencing Data
Presentation Type
Poster/Portfolio
Presenter Major(s)
Computer Science, Mathematics
Mentor Information
Christian Trefftz, Greg Wolffe
Department
School of Computing and Information Systems
Location
Henry Hall Atrium 68
Start Date
11-4-2012 9:00 AM
Keywords
Information, Innovation, and Technology, Life Science, Mathematical Science, Technology
Abstract
Next-generation sequencing (NGS) platforms have presented unique challenges to the computing community. The large number of short reads characteristic of NGS data has made it harder to assemble genomes without the use of a reference sequence, a task known as de novo sequence assembly. Further complicating the problem is the recent interest in metagenomics, the sequencing of genetic material from many organisms recovered directly from environmental samples. Specialized data structures, such as de Bruijn graphs and Bloom filters, have become the backbone of modern assembly software. But as the rapid growth in metagenomic data illustrates, the development of new data structures and algorithms must keep pace. The goal of this project is to analyze and optimize the performance of these assembly algorithms, focusing specifically on the pre-processing and graph partitioning stages. Both memory usage and run-time optimizations are considered, and a range of computing platforms is targeted.
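The abstract names de Bruijn graphs, Bloom filters, and graph partitioning without showing how they fit together. The following is a minimal sketch under stated assumptions, not the project's implementation. It stores every k-mer of the reads in a Bloom filter and treats that filter as an implicit de Bruijn graph, in which two k-mers are adjacent when they overlap by k-1 bases. It then partitions the reads by connected component using breadth-first search. The names `BloomFilter` and `partition_reads`, the k-mer length, and the filter size are hypothetical. Reverse complements are ignored for brevity, and Bloom filter false positives can in principle add spurious edges between components.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Minimal Bloom filter: a bit array with num_hashes bit positions per item."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, but never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def neighbors(kmer: str, graph: BloomFilter):
    """De Bruijn neighbors: k-mers overlapping kmer by k-1 bases that the filter reports present."""
    for base in "ACGT":
        for cand in (kmer[1:] + base, base + kmer[:-1]):
            if cand in graph:
                yield cand


def partition_reads(reads, k, num_bits=1 << 16, num_hashes=3):
    """Group reads into connected components of the implicit de Bruijn graph."""
    usable = [r for r in reads if len(r) >= k]  # reads shorter than k have no k-mers; skipped
    graph = BloomFilter(num_bits, num_hashes)
    for read in usable:
        for km in kmers(read, k):
            graph.add(km)

    component_of = {}  # k-mer -> component id, assigned by breadth-first search
    next_id = 0
    for read in usable:
        for start in kmers(read, k):
            if start in component_of:
                continue
            component_of[start] = next_id
            queue = deque([start])
            while queue:
                cur = queue.popleft()
                for nb in neighbors(cur, graph):
                    if nb not in component_of:
                        component_of[nb] = next_id
                        queue.append(nb)
            next_id += 1

    # All k-mers of a read are chained together, so its first k-mer identifies its component.
    partitions = {}
    for read in usable:
        partitions.setdefault(component_of[read[:k]], []).append(read)
    return list(partitions.values())
```

For example, with k = 5 the reads `ACGTACGGA` and `CGTACGGAT` share k-mers and fall into one partition, while `TTTTGGGCCA` forms its own. Partitioning in this way lets each component be assembled independently, which is one reason the partitioning stage is a natural target for memory and run-time optimization.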