Lead Author Type

MBI Masters Student


Guenter Tusch

This research project attempts to evaluate a fly genome from a clinical perspective. Drosophila melanogaster has been chosen as the model organism for studying the clinical translation of genetic risk. A similar study of the human genome has been reported in [1]. Technological advances have significantly decreased the cost of genome sequencing in recent years, making genetic information potentially accessible for clinical use. However, the explanatory power of risk estimates for common variants found in genome-wide association studies, as well as their clinical implementation and utilization, remains largely unclear [1]. D. melanogaster is an attractive model organism, as explained in [2]: many basic biological, physiological, and neurological properties are conserved between mammals and D. melanogaster, and nearly 75% of human disease-causing genes are believed to have a functional homolog in the fly. The data source is the National Center for Biotechnology Information’s (NCBI’s) Sequence Read Archive (SRA), which stores raw sequence data from next-generation sequencing platforms. To quantify the genetic risk, information available in different databases (NCBI SRA, FlyBase, etc.) has been integrated as shown in the diagram below.

The following pipeline is used for the analysis:

The model is tested on a selected set of human disease genes. To assess the clinical risk of a specific disease, the fly orthologs of the human disease genes are identified and extracted from the fly genome. The risk is calculated using a Hidden Markov Model (HMM) trained on known mutations of the corresponding gene.
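As a rough illustration of how an HMM can assign a likelihood to a sequence, the sketch below implements the generic forward algorithm for a small discrete HMM. It is not the project's model: the project relies on HMMER, and the state names, probabilities, and the function name forward_log_likelihood are invented here purely for illustration.

```python
import math


def forward_log_likelihood(seq, states, start_p, trans_p, emit_p):
    """Return log P(seq | model) using the forward algorithm with rescaling."""
    if not seq:
        raise ValueError("sequence must not be empty")
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    log_like = 0.0
    for i in range(len(seq)):
        if i > 0:
            alpha = {
                s: emit_p[s][seq[i]] * sum(alpha[r] * trans_p[r][s] for r in states)
                for s in states
            }
        # Rescale at every step to avoid numerical underflow on long sequences
        scale = sum(alpha.values())
        log_like += math.log(scale)
        alpha = {s: v / scale for s, v in alpha.items()}
    return log_like


# Hypothetical two-state toy model (all numbers are made up for illustration)
STATES = ["conserved", "variable"]
START = {"conserved": 0.8, "variable": 0.2}
TRANS = {
    "conserved": {"conserved": 0.9, "variable": 0.1},
    "variable": {"conserved": 0.3, "variable": 0.7},
}
EMIT = {
    "conserved": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "variable": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

if __name__ == "__main__":
    for candidate in ("AAAAAA", "ACGTAC"):
        score = forward_log_likelihood(candidate, STATES, START, TRANS, EMIT)
        print(candidate, score)
```

Comparing the log-likelihood of a sequence under a trained model with its log-likelihood under a background model gives a log-odds score, which is one plausible way to express how closely a fly sequence resembles the known mutations.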

The following software is currently used for the project:

a) Acquire genome (mapping and sorting): BWA, SAMtools

b) Genes associated with diseases: AUGUSTUS (

c) Calculation of likelihood of clinical risk (similarity of sequences): NCBI BLAST, HMMER

d) Gene-Environment Interaction Diagram: Programmed in HTML5.
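A minimal sketch of how the mapping, sorting, and HMM steps listed above might be chained on the command line. The function only assembles the commands and does not run them. The file names, the helper build_pipeline_commands, and the order in which the steps connect are assumptions; the gene prediction step with AUGUSTUS and the BLAST comparison are omitted.

```python
import shlex


def build_pipeline_commands(reference, reads, alignment, sequences, prefix="sample"):
    """Assemble (but do not execute) command lines for the tools named above.

    All file names are hypothetical placeholders chosen for this sketch.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    hmm = f"{prefix}.hmm"
    hits = f"{prefix}.hits.tbl"
    return [
        ["bwa", "index", reference],                     # index the reference genome
        ["bwa", "mem", "-o", sam, reference, reads],     # map reads to the reference
        ["samtools", "sort", "-o", bam, sam],            # sort alignments by coordinate
        ["samtools", "index", bam],                      # index the sorted BAM file
        ["hmmbuild", hmm, alignment],                    # build a profile HMM from an alignment
        ["hmmsearch", "--tblout", hits, hmm, sequences], # score sequences against the profile
    ]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "dmel_reference.fa", "sra_reads.fastq", "disease_gene.sto", "fly_proteins.fa"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

To execute the steps, each command list could be passed to subprocess.run with check=True, provided the tools are installed and the input files exist.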

This is a proof-of-concept study.