Date on Master's Thesis/Doctoral Dissertation

5-2015

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Computer Engineering and Computer Science

Degree Program

Computer Science and Engineering, PhD

Committee Chair

Rouchka, Eric Christian

Committee Co-Chair (if applicable)

Chang, Dar-jen

Committee Member

Moseley, Hunter

Committee Member

Petruska, Jeffrey

Committee Member

Yampolskiy, Roman Vladimirovich

Subject

Nucleotide sequence; RNA--Analysis

Abstract

High-throughput mRNA sequencing (RNA-Seq) promises to be the technique of choice for studying transcriptome profiles, offering several advantages over older techniques such as microarrays. It supports precise methodologies for a variety of applications, including gene expression quantification, novel transcript and exon discovery, differential expression (DE) analysis, and splice variant detection. Detecting features (e.g., genes, transcript isoforms, exons) whose expression changes significantly across biological samples is a primary application of RNA-Seq, and identifying which features are significantly differentially expressed between samples can provide insight into their functions. One major limitation of most recently developed RNA-Seq differential expression methods is their dependence on annotated biological features to detect expression differences across samples. This restricts both the quantification of expression levels and the detection of significant changes to known genomic regions, so significant changes occurring in unannotated regions are not captured. To overcome this limitation, we developed a novel island-based segmentation approach, IBSeq, for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. IBSeq segmentation determines individual islands of expression based on windowed read counts; these islands can be compared across experimental conditions to determine differential island expression. To detect differentially expressed features, the significance values of the DE islands corresponding to each feature are combined using combined p-value methods. We evaluated the performance of our approach by comparing it to a number of existing gene DE methods on several benchmark MAQC RNA-Seq datasets. Using the area under the ROC curve (auROC) as a performance metric, the results show that IBSeq clearly outperforms all other methods compared.
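The two steps the abstract names, segmenting windowed read counts into islands and combining per-island p-values for a feature, can be illustrated with a minimal sketch. This is not the IBSeq implementation. The abstract does not say how islands are delimited or which combined p-value method is used, so the simple count threshold (`min_count`) and the use of Fisher's method here are assumptions chosen for illustration, and both function names are hypothetical.

```python
import math


def find_islands(window_counts, min_count=1):
    """Return (start, end) window-index pairs, end exclusive, for each
    contiguous run of windows whose read count is >= min_count.

    Assumption: an "island" is a maximal run of windows at or above a
    fixed threshold; the source does not give the actual criterion.
    """
    islands = []
    start = None
    for i, count in enumerate(window_counts):
        if count >= min_count:
            if start is None:
                start = i  # a new island opens at this window
        elif start is not None:
            islands.append((start, i))  # island closed by a low window
            start = None
    if start is not None:
        islands.append((start, len(window_counts)))
    return islands


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even
    degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library
    is needed. Every p-value must be in (0, 1].
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-window read counts along one genomic region.
counts = [0, 3, 5, 0, 0, 2, 0]
islands = find_islands(counts, min_count=1)

# Hypothetical per-island DE p-values for the islands of one feature.
feature_pvalue = fisher_combined_pvalue([0.01, 0.20])
```

Fisher's method assumes the per-island p-values are independent, which may not hold for adjacent islands of the same gene; other combination methods make different assumptions.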
