Date on Master's Thesis/Doctoral Dissertation
5-2020
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics, PhD
Committee Chair
Gaskins, Jeremy
Committee Co-Chair
Datta, Susmita
Committee Member
Kulasekera, K.B.
Committee Member
Kong, Maiying
Committee Member
Gill, Ryan
Author's Keywords
Single-cell; RNA-seq; hurdle model; factor model; differential expression; gene networks
Abstract
With single-cell RNA sequencing (scRNA-seq) technology, researchers can gain a better understanding of health and disease by analyzing gene expression data at the cellular level. However, scRNA-seq data tend to have high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts, which create new statistical problems that need to be addressed. This dissertation includes three research projects that propose Bayesian methodology suitable for scRNA-seq analysis. In the first project, a hurdle model for identifying differentially expressed genes across cell types in scRNA-seq data is presented. This model incorporates a correlated random effects structure, based on an initial clustering of cells, to capture the cell-to-cell variability within treatment groups, but it can easily be adapted to an independent random effects structure if needed. A sparse Bayesian factor model is introduced in the second project to uncover network structures associated with genes in scRNA-seq data. Latent factors affect the gene expression values for each cell and provide the flexibility to account for the common features of scRNA-seq data. The third project extends this latent factor model to allow networks to be compared across different treatment groups.
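A hurdle model treats each count as the outcome of two parts: a binary "hurdle" deciding whether the count is zero, and a zero-truncated count distribution for the positive values. This is what lets it accommodate the high proportion of zeros described above. The sketch below shows only that basic two-part likelihood for a single gene, assuming a zero-truncated Poisson positive part with no covariates or random effects. The function names (`hurdle_poisson_logpmf`, `gene_loglik`) and the Poisson choice are illustrative assumptions, not the dissertation's specification, which adds cluster-based correlated random effects and is fit in a Bayesian framework.

```python
import math


def hurdle_poisson_logpmf(y: int, pi_zero: float, lam: float) -> float:
    """Log-probability of count y under a Poisson hurdle model (illustrative only).

    pi_zero: probability of a zero count (the "hurdle" part).
    lam: rate of the Poisson distribution that is truncated at zero
         and used for the positive counts.
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    if not (0.0 < pi_zero < 1.0) or lam <= 0.0:
        raise ValueError("need 0 < pi_zero < 1 and lam > 0")
    if y == 0:
        # All zeros come from the hurdle part.
        return math.log(pi_zero)
    # Zero-truncated Poisson: P(Y = y | Y > 0) = e^{-lam} lam^y / (y! (1 - e^{-lam}))
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    log_trunc = log_pois - math.log1p(-math.exp(-lam))
    # Positive counts: clear the hurdle, then draw from the truncated part.
    return math.log1p(-pi_zero) + log_trunc


def gene_loglik(counts, pi_zero: float, lam: float) -> float:
    """Sum the hurdle log-likelihood over one gene's counts across cells."""
    return sum(hurdle_poisson_logpmf(y, pi_zero, lam) for y in counts)


if __name__ == "__main__":
    # Hypothetical counts for one gene in six cells.
    counts = [0, 0, 2, 5, 0, 1]
    print(gene_loglik(counts, pi_zero=0.5, lam=2.0))
```

In this simplified framing, a gene would be called differentially expressed if the zero probability `pi_zero`, the positive-count rate `lam`, or both differ between groups of cells. A model aimed at overdispersed scRNA-seq counts would likely replace the Poisson with a heavier-tailed count distribution.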
Recommended Citation
Sekula, Michael, "Novel Bayesian methodology for the analysis of single-cell RNA sequencing data." (2020). Electronic Theses and Dissertations. Paper 3416.
https://doi.org/10.18297/etd/3416