Date on Master's Thesis/Doctoral Dissertation

8-2008

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Computer Engineering and Computer Science

Committee Chair

Rouchka, Eric Christian

Author's Keywords

Pseudogene detection; Computational biology; Expressed sequence tags; WU BLAST; Phosphodiesterase 4

Subject

Gene mapping; Genetics--Technique; Computational biology

Abstract

High-throughput sequencing has provided a myriad of genetic data for thousands of organisms. Computational analysis of one data type, expressed sequence tags (ESTs), yields insight into gene expression, alternative splicing, tissue specificity, gene functionality, and the detection and differentiation of pseudogenes. Two computational methods have been developed to analyze alternative splicing events and to detect and characterize pseudogenes using ESTs. A case study of rat phosphodiesterase 4 (PDE4) genes yielded more than twenty-five previously unreported isoforms. These were experimentally verified through wet lab collaboration and found to be tissue specific. In addition, thirteen cytochrome-like gene and pseudogene sequences from the human genome were analyzed for pseudogene properties. Of the thirteen sequences, one was identified as the actual cytochrome gene, two were found to be non-cytochrome-related sequences, and eight were determined to be pseudogenes. The remaining two sequences were identified as duplicates. As a precursor to applying the two new methods, the efficiency of three BLAST algorithms (NCBI BLAST, WU BLAST, and mpiBLAST) was examined for comparing large numbers of short sequences (ESTs) to fewer large sequences (genomic regions). In general, WU BLAST was found to be the most efficient sequence comparison tool. These approaches illustrate the power of ESTs in understanding gene expression. Efficient computational analysis of ESTs (such as the two tools described) will be vital to understanding the complexity of gene expression as more high-throughput EST data is made available via advances in molecular sequencing technologies, such as the current next-generation approaches.
