Event Title

An Efficient Method for Indexing Temporal Gene Expression Datasets

Location

Hager-Lubbers Exhibition Hall

Description

PURPOSE: To identify temporal high-throughput gene expression datasets for comparative transcriptome analysis. BACKGROUND: Comparative transcriptome analysis of temporal high-throughput gene expression experiments helps in understanding the complexities of living organisms and is of great value for the diagnosis, treatment, and prevention of human diseases. METHODS AND MATERIALS: To identify temporal patterns, temporal studies needed to be indexed appropriately, since much of the publicly available high-throughput expression data is non-temporal. The index needed to be based on the MIAME standard, as used, for example, in abstracting NCBI’s Gene Expression Omnibus (GEO) datasets. A simple keyword search of abstracts from NCBI GEO yields a large number of false positives. RESULTS: We treat this research as a text-mining task: finding the correct set of pertinent articles for the topic at hand. Using random samples of keyword search results, we repeatedly refined the search query to obtain better indexing of the datasets from temporal studies. We developed a dictionary of appropriate words for the algorithm to find in the search and were thus able to reduce both false positive and false negative search results. CONCLUSIONS: Our promising model can help researchers in bioinformatics, genetics, medicine, and other scientific fields find a more appropriate selection of scientific data online while saving resources (time, capital, talent). The refined indexing algorithm can be used in future studies to compare patterns of gene expression.
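
The refinement loop described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not list the dictionary terms, so the patterns in `REFINED_PATTERNS`, the naive keyword `time`, and the four labeled summaries in `SAMPLE` are all hypothetical and chosen only to show how a dictionary can be scored against a hand-labeled random sample.

```python
import re

# Hypothetical naive query: any summary mentioning the word "time".
NAIVE_PATTERNS = [r"\btime\b"]

# Hypothetical refined dictionary; the real word list is not given in the abstract.
REFINED_PATTERNS = [
    r"\btime[- ]?course\b",
    r"\btime[- ]?series\b",
    r"\btime[- ]?points?\b",
]

def matches(patterns, summary):
    """Return True if any dictionary pattern occurs in the summary text."""
    text = summary.lower()
    return any(re.search(pattern, text) for pattern in patterns)

def evaluate(patterns, labeled_sample):
    """Compute (precision, recall) of a dictionary on hand-labeled summaries."""
    tp = fp = fn = 0
    for summary, is_temporal in labeled_sample:
        predicted = matches(patterns, summary)
        if predicted and is_temporal:
            tp += 1
        elif predicted and not is_temporal:
            fp += 1
        elif is_temporal and not predicted:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented example summaries standing in for a labeled random sample.
SAMPLE = [
    ("Time course of gene expression at 0, 2, 4 and 8 hours after treatment", True),
    ("Expression validated by real-time PCR at a single time of harvest", False),
    ("Samples collected at six time points during differentiation", True),
    ("Comparison of tumor and normal tissue", False),
]

naive_scores = evaluate(NAIVE_PATTERNS, SAMPLE)
refined_scores = evaluate(REFINED_PATTERNS, SAMPLE)
```

On this toy sample the naive keyword flags the real-time PCR summary as temporal, while the refined dictionary does not. In practice each refinement round would draw a fresh random sample, label it by hand, and adjust the dictionary based on the false positives and false negatives observed.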

Apr 15th, 3:30 PM
