Date on Master's Thesis/Doctoral Dissertation
5-2016
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics, PhD
Committee Chair
Brock, Guy
Committee Co-Chair
Lorenz, Douglas
Committee Member
Kong, Maiying
Committee Member
Kulasekera, K. B.
Committee Member
Mukhopadhyay, Partha
Committee Member
Wu, Dongfeng
Author's Keywords
Bioinformatic; China; Xinjiang; Louisville; Biostatistics: Dake Yang
Abstract
MicroRNAs (miRNAs) are a large class of small endogenous non-coding RNA molecules (18-25 nucleotides in length) that regulate gene expression post-transcriptionally. A variety of algorithms exist for determining the targets of miRNAs, but they are generally based on sequence information and frequently produce lists of thousands of genes. Canonical correlation analysis (CCA) is a multivariate statistical method for finding linear relationships between two data sets. Here we apply CCA to find the linear combinations of differentially expressed miRNAs and of their corresponding target genes that have maximal negative correlation. Because the data are high-dimensional, sparse CCA is used to constrain the problem and obtain a solution. A novel gene set enrichment analysis (GSEA) statistic based on the sparse CCA results is proposed for estimating the significance of predefined gene sets. The methods are illustrated with both a simulation study and real miRNA-mRNA expression data.

DNA methylation, the addition of a methyl group to DNA by a group of enzymes collectively known as DNA methyltransferases, is an epigenetic modification critical to normal genome regulation and development. To understand the role of DNA methylation in gene differentiation, we analyze genome-scale DNA methylation patterns and gene expression data using sparse CCA to find linear combinations of the two data sets that have maximal negative correlation. In a similar spirit to the miRNA-mRNA study, we construct a GSEA statistic from the weight vectors of the sparse CCA method and assess the significance of predefined gene sets. The method is exemplified with real gene expression / DNA methylation data on the development of the embryonic murine palate.
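To make the idea of "sparse CCA with maximal negative correlation" concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes one common formulation of sparse CCA: a rank-one, soft-thresholded alternating update that treats the within-set covariance matrices as identity. The function name `sparse_cca_negative` and the tuning parameters `frac_u` and `frac_v` are hypothetical names introduced for illustration. Negating the cross-covariance matrix turns the usual maximization into a search for the most negatively correlated pair of linear combinations.

```python
import numpy as np


def soft_threshold(a, frac):
    """Shrink entries of `a` toward zero by `frac` times its largest magnitude."""
    lam = frac * np.max(np.abs(a))
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit(v):
    """Scale `v` to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca_negative(X, Y, frac_u=0.5, frac_v=0.5, n_iter=50):
    """Rank-one sparse CCA sketch that seeks maximal NEGATIVE correlation.

    X: (n, p) array, e.g. miRNA expression; Y: (n, q) array, e.g. mRNA expression.
    frac_u, frac_v: hypothetical sparsity knobs in [0, 1); larger means sparser.
    Assumes no column of X or Y is constant.
    """
    # Standardize every feature so the cross-product matrix holds correlations.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Negated cross-covariance: maximizing u' K v minimizes cov(Xu, Yv).
    K = -(X.T @ Y) / (X.shape[0] - 1)
    # Start from the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = unit(soft_threshold(K @ v, frac_u))
        v = unit(soft_threshold(K.T @ u, frac_v))
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): the first three features
    # of X rise with a latent factor z while the first three of Y fall with it.
    rng = np.random.default_rng(0)
    n = 100
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, 20))
    Y = rng.standard_normal((n, 30))
    X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
    Y[:, :3] = -z[:, None] + 0.5 * rng.standard_normal((n, 3))
    u, v, corr = sparse_cca_negative(X, Y)
    print("selected X features:", np.flatnonzero(u))
    print("selected Y features:", np.flatnonzero(v))
    print("canonical correlation:", corr)
```

The nonzero entries of the weight vectors `u` and `v` identify the features that drive the negative association. Weight vectors of this kind are what the abstract describes feeding into the gene set statistic. In practice the sparsity parameters would be tuned, for example by permutation or cross-validation, rather than fixed as they are here.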
Recommended Citation
Yang, Dake, "Integrated analysis of miRNA/mRNA expression and gene methylation using sparse canonical correlation analysis." (2016). Electronic Theses and Dissertations. Paper 2439.
https://doi.org/10.18297/etd/2439