Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Gaskins, Jeremy T.

Committee Co-Chair (if applicable)

Mitra, Ritendranath

Committee Member

Kong, Maiying

Committee Member

Pal, Subhadip

Committee Member

Little, Bert

Author's Keywords

Global-local shrinkage; variable selection; multivariate regression; sparsity; shrinkage; factor model


Abstract

This dissertation develops novel Bayesian methodology for multivariate problems. In particular, it focuses on two contexts: shrinkage-based variable selection in multivariate regression and simultaneous covariance estimation for multiple groups. Both projects are centered on fully Bayesian inference schemes that use hierarchical modeling to capture context-specific features of the data, together with computationally efficient estimation algorithms.

Variable selection over a potentially large set of covariates in a linear model is a widely studied problem. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector whose few non-zero components correspond to the most important covariates. The first project extends the global-local shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. We develop a variable selection method for a K-outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for each regression coefficient is a mean-zero normal with a coefficient-specific variance term that consists of a predictor-specific factor (shared local shrinkage parameter) and a model-specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example.

Covariance estimation for multiple groups is a key step in drawing inference from a heterogeneous population. One should seek to share information about common features in the dependence structures across the various groups. In the second project, we introduce a novel approach for estimating the covariance matrices for multiple groups using a hierarchical latent factor model that shrinks the factor loadings across groups toward a global value. Using a spike-and-slab model on these loading coefficients provides a level of sparsity in the global factor structure. Parameter estimation is accomplished through a Markov chain Monte Carlo scheme, and a model selection approach is used to determine the number of factors. Finally, a number of simulation studies and a data application demonstrate the performance of our methodology.
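
For orientation, the two model structures described in the abstract can be sketched in symbols. The notation below is illustrative only and is not taken from the dissertation: the symbols \beta_{jk}, \lambda_j, \tau_k, \Lambda_g and \Psi_g, the index ranges, and the product form of the prior variance are all assumptions made to give the verbal description a concrete shape. The hyperpriors on the shrinkage parameters and the exact hierarchical and spike-and-slab specification for the loadings are not stated in the abstract and are left out.

```latex
% Illustrative notation only; symbols and the product form are assumptions,
% not the dissertation's own specification.
\begin{align*}
% Project 1: coefficient of predictor j in the model for outcome k.
% \lambda_j: predictor-specific (local) factor shared across outcomes;
% \tau_k:    model-specific (global) factor for outcome k.
\beta_{jk} \mid \lambda_j, \tau_k
  &\sim \mathcal{N}\!\left(0,\ \lambda_j^{2}\,\tau_k^{2}\right),
  \qquad j = 1,\dots,p,\quad k = 1,\dots,K, \\
% Project 2: standard latent factor decomposition of the covariance of group g,
% with loadings \Lambda_g shrunk toward a common global loading matrix.
\Sigma_g
  &= \Lambda_g \Lambda_g^{\top} + \Psi_g,
  \qquad g = 1,\dots,G.
\end{align*}
```

Under this reading, a small \lambda_j pulls the coefficients of predictor j toward zero in every outcome model at once, which is how a shared local parameter could identify covariates that matter across all outcomes.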

Included in

Biostatistics Commons