Date on Master's Thesis/Doctoral Dissertation

5-2020

Document Type

Doctoral Dissertation

Degree Name

Ph.D.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Rai, Shesh

Committee Co-Chair (if applicable)

Pal, Subhadip

Committee Member

Mitra, Riten

Committee Member

Wu, Dongfeng

Committee Member

McClain, Craig

Author's Keywords

Bayesian shrinkage priors; machine learning; classification; data mining; clinical trials; personalized medicine

Abstract

Generalized linear models have broad applications in biostatistics and sociology. In a regression setup, the main target is to find a relevant set of predictors out of a large collection of covariates. Sparsity is the assumption that only a few of these covariates have a meaningful correlation with an outcome variable of interest. Sparsity is incorporated by regularizing the irrelevant slopes towards zero while leaving the relevant predictors unchanged, keeping the resulting inferences intact. Frequentist variable selection and sparsity are addressed by popular techniques such as the Lasso and the Elastic Net. Bayesian penalized regression can tackle the curse of dimensionality through prior distributions called shrinkage priors. This dissertation presents a unified Bayesian hierarchical framework that implements and compares global-local shrinkage priors in logistic regression and negative binomial regression. The key feature of the approach is a representation of the likelihood using Polya-Gamma data augmentation, which integrates naturally with several Bayesian priors, focusing specifically on the Horseshoe, Dirichlet-Laplace, and Double Pareto priors. Posterior inference schemes based on Gibbs sampling were developed for both low- and high-dimensional settings. Extensive simulation studies were conducted to assess the performance of these priors across different sample sizes, parameter values, and covariate dimensions. The results show excellent predictive performance in terms of accuracy in most of the simulation scenarios. The method was applied to real datasets drawn from a wide range of applications: colorectal cancer, Alzheimer's disease (ADNI and OASIS), diabetes (Pima Indians Diabetes), doctor visit counts (DebTrivedi), and Twitter data (Amazon). In particular, the performance of the method was successfully validated in the context of gene-treatment interaction models used to assess patient sensitivity in clinical trials.

Chapter 1 describes the historical development of Bayesian shrinkage priors. Chapter 2 compares different classification and machine learning tools, including shrinkage priors, applied after high-throughput gene expression analysis of colorectal cancer data. Chapter 3 provides a basic understanding of how simple classification models, such as logistic regression with shrinkage priors, can have a profound impact on drug development through biomarker-guided clinical trials that utilize gene-treatment interactions, and it illustrates their role in patient sensitivity analysis and precision medicine. Chapters 4 and 5 explain the detailed methodology of shrinkage priors and data augmentation in logistic regression and negative binomial regression, respectively, and provide the theory behind the above applications. Chapter 6 outlines future research directions in multinomial logistic regression and the development of the R package "ShrinkageBayesGlm."
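The full methodology is developed in the dissertation itself; as a rough illustration only, the sketch below shows one standard way to combine Polya-Gamma data augmentation with a Horseshoe prior in a Gibbs sampler for Bayesian logistic regression. It is not the dissertation's code or the "ShrinkageBayesGlm" package: the rpg() draw assumes the BayesLogit R package, the half-Cauchy Horseshoe scales are updated via the familiar inverse-gamma mixture representation, and the function name gibbs_pg_horseshoe and its defaults are hypothetical.

## Minimal sketch (assumptions noted above), not the dissertation's implementation.
library(BayesLogit)  # provides rpg() for Polya-Gamma draws

gibbs_pg_horseshoe <- function(y, X, n_iter = 2000) {
  n <- nrow(X); p <- ncol(X)
  beta    <- rep(0, p)
  lambda2 <- rep(1, p)            # local shrinkage scales (squared)
  tau2    <- 1                    # global shrinkage scale (squared)
  nu      <- rep(1, p); xi <- 1   # auxiliaries for the half-Cauchy priors
  kappa   <- y - 0.5
  draws   <- matrix(NA, n_iter, p)

  for (t in 1:n_iter) {
    ## 1. Polya-Gamma augmentation: omega_i ~ PG(1, x_i' beta)
    omega <- rpg(n, 1, as.vector(X %*% beta))

    ## 2. Conditionally Gaussian update for the slopes
    prior_prec <- diag(1 / (lambda2 * tau2), p)
    V <- chol2inv(chol(t(X) %*% (X * omega) + prior_prec))
    m <- V %*% (t(X) %*% kappa)
    beta <- as.vector(m + t(chol(V)) %*% rnorm(p))

    ## 3. Horseshoe hyperparameters via inverse-gamma mixtures
    lambda2 <- 1 / rgamma(p, 1, rate = 1 / nu + beta^2 / (2 * tau2))
    nu      <- 1 / rgamma(p, 1, rate = 1 + 1 / lambda2)
    tau2    <- 1 / rgamma(1, (p + 1) / 2,
                          rate = 1 / xi + sum(beta^2 / lambda2) / 2)
    xi      <- 1 / rgamma(1, 1, rate = 1 + 1 / tau2)

    draws[t, ] <- beta
  }
  draws
}

Column means of the returned draws (after discarding burn-in iterations) give posterior estimates of the slopes, with irrelevant coefficients shrunk toward zero by the local and global scales, which is the behavior the abstract describes for global-local shrinkage priors.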
