Date on Master's Thesis/Doctoral Dissertation
5-2020
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics, PhD
Committee Chair
Rai, Shesh
Committee Co-Chair
Pal, Subhadip
Committee Member
Mitra, Riten
Committee Member
Wu, Dongfeng
Committee Member
McClain, Craig
Author's Keywords
Bayesian shrinkage priors; machine learning; classification; data mining; clinical trials; personalized medicine
Abstract
Generalized linear models have broad applications in biostatistics and sociology. In a regression setup, the main goal is to identify a relevant set of predictors from a large collection of covariates. Sparsity is the assumption that only a few of these covariates have a meaningful correlation with the outcome variable of interest. Sparsity is incorporated by shrinking the irrelevant slopes towards zero while leaving the relevant coefficients largely unchanged, so that the resulting inferences remain intact. In the frequentist setting, variable selection and sparsity are commonly addressed by penalized techniques such as the Lasso and the Elastic Net. Bayesian penalized regression tackles the curse of dimensionality through prior distributions known as shrinkage priors. This dissertation presents a unified Bayesian hierarchical framework that implements and compares global-local shrinkage priors in logistic regression and negative binomial regression. The key feature of the approach is a representation of the likelihood through Polya-Gamma data augmentation, which integrates naturally with several Bayesian priors; the focus is on the Horseshoe, Dirichlet-Laplace, and Double Pareto priors. Gibbs sampling schemes for posterior inference were developed for both low- and high-dimensional settings. Extensive simulation studies were conducted to assess the performance of these priors under different sample sizes, parameter values, and covariate dimensions. The results show excellent predictive accuracy in most of the simulation scenarios. The method was applied to real datasets from a wide range of applications: colorectal cancer, Alzheimer's disease (ADNI and OASIS), diabetes (Pima Indians Diabetes), doctor visit counts (DebTrivedi), and Twitter data (Amazon).
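The Polya-Gamma augmentation summarized above makes the logistic likelihood conditionally Gaussian in the coefficients: with kappa_i = y_i - 1/2 and omega_i ~ PG(1, x_i'beta), the update for beta reduces to a Bayesian linear-regression draw. The Python sketch below illustrates this for the Horseshoe prior under stated assumptions; it is not the dissertation's implementation. Assumptions: Polya-Gamma draws are approximated by truncating the sum-of-gammas representation of Polson, Scott and Windle (2013); the Horseshoe half-Cauchy scales use the inverse-gamma auxiliary-variable scheme of Makalic and Schmidt (2016); there is no intercept; and the names `sample_pg`, `inv_gamma` and `horseshoe_logistic_gibbs` are hypothetical.

```python
import numpy as np


def sample_pg(b, c, rng, n_terms=100):
    """Approximate Polya-Gamma PG(b, c) draws (one per element of b, c).

    Uses the sum-of-gammas representation
        omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1), truncated after n_terms terms (slight downward bias).
    """
    b, c = np.broadcast_arrays(np.atleast_1d(np.asarray(b, dtype=float)),
                               np.atleast_1d(np.asarray(c, dtype=float)))
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + c[:, None] ** 2 / (4 * np.pi ** 2)
    g = rng.gamma(b[:, None], 1.0, size=denom.shape)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def inv_gamma(shape, rate, rng):
    """InvGamma(shape, rate) draw as the reciprocal of Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def horseshoe_logistic_gibbs(X, y, n_iter=1000, burn_in=300, seed=0):
    """Gibbs sampler for logistic regression with a Horseshoe prior
    beta_j ~ N(0, lam2_j * tau2); half-Cauchy scales via inverse-gamma auxiliaries.
    Assumption: no intercept column; PG draws are approximate."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                          # kappa_i = y_i - 1/2
    beta = np.zeros(p)
    lam2, nu = np.ones(p), np.ones(p)        # local scales and their auxiliaries
    tau2, xi = 1.0, 1.0                      # global scale and its auxiliary
    kept = []
    for it in range(n_iter):
        # omega_i | beta ~ PG(1, x_i' beta)
        omega = sample_pg(1.0, X @ beta, rng)
        # beta | omega, scales ~ N(V X' kappa, V), V^{-1} = X' Omega X + diag(1 / (tau2 lam2))
        prior_var = np.maximum(tau2 * lam2, 1e-12)   # floor guards against overflow
        prec = X.T @ (omega[:, None] * X) + np.diag(1.0 / prior_var)
        chol = np.linalg.cholesky(prec)              # prec = L L'
        mean = np.linalg.solve(prec, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))  # cov = prec^{-1}
        # Horseshoe scale updates (inverse-gamma full conditionals)
        lam2 = inv_gamma(1.0, 1.0 / nu + beta ** 2 / (2.0 * tau2), rng)
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, rng)
        tau2 = inv_gamma((p + 1) / 2.0, 1.0 / xi + np.sum(beta ** 2 / lam2) / 2.0, rng)
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2, rng)
        if it >= burn_in:
            kept.append(beta.copy())
    return np.array(kept)
```

Because the omega and beta updates see the prior only through the prior variances `tau2 * lam2`, swapping in another global-local prior such as the Dirichlet-Laplace or Double Pareto would change only the scale updates at the end of the loop. The p x p Cholesky factorization in each iteration is the main cost, which becomes the bottleneck in high-dimensional settings.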
Specifically, the performance of the method was validated in the context of gene-treatment interaction models that are used to assess patient sensitivity in clinical trials. Chapter 1 describes the historical development of Bayesian shrinkage priors. Chapter 2 compares different classification and machine learning tools, including shrinkage priors, following high-throughput gene expression analysis of a colorectal cancer dataset. Chapter 3 shows how simple classification models, such as logistic regression with shrinkage priors, can have a profound impact on drug development through biomarker-guided clinical trials that exploit gene-treatment interactions, and describes their role in patient sensitivity analysis and precision medicine. Chapters 4 and 5 explain the detailed methodology of shrinkage priors and data augmentation in logistic regression and negative binomial regression, respectively, and provide the theory behind the above applications. Chapter 6 outlines future research directions in multinomial logistic regression and the development of the R package "ShrinkageBayesGlm."
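The negative binomial model treated in Chapter 5 can use the same augmentation. Writing psi_i = x_i'beta and assuming a known dispersion r, so that the mean is r * exp(psi_i), the likelihood is proportional to exp(y_i psi_i) / (1 + exp(psi_i))^(y_i + r); the Polya-Gamma identity then gives omega_i ~ PG(y_i + r, psi_i) and kappa_i = (y_i - r) / 2. The sketch below pairs this with a Double Pareto prior in the generalized double Pareto form of Armagan, Dunson and Lee (2013), written as a normal scale mixture. It is a hypothetical illustration, not the dissertation's code: r is fixed rather than estimated, there is no intercept, the hyperparameters alpha = eta = 1 are assumed defaults, Polya-Gamma draws use a truncated approximation, and `gdp_negbin_gibbs` is an invented name.

```python
import numpy as np


def sample_pg(b, c, rng, n_terms=100):
    """Approximate PG(b, c) draws via the truncated sum-of-gammas representation."""
    b, c = np.broadcast_arrays(np.atleast_1d(np.asarray(b, dtype=float)),
                               np.atleast_1d(np.asarray(c, dtype=float)))
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + c[:, None] ** 2 / (4 * np.pi ** 2)
    g = rng.gamma(b[:, None], 1.0, size=denom.shape)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def gdp_negbin_gibbs(X, y, r, alpha=1.0, eta=1.0, n_iter=1000, burn_in=300, seed=0):
    """Gibbs sampler for negative binomial regression with mean r * exp(x_i' beta)
    and a generalized double Pareto prior:
    beta_j ~ N(0, tau_j), tau_j ~ Exp(rate = lam_j^2 / 2), lam_j ~ Gamma(alpha, rate = eta).
    Assumptions: dispersion r is known and fixed; no intercept column."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    y = np.asarray(y, dtype=float)
    kappa = (y - r) / 2.0                    # kappa_i = (y_i - r) / 2
    beta = np.zeros(p)
    tau = np.ones(p)
    kept = []
    for it in range(n_iter):
        # omega_i | beta ~ PG(y_i + r, x_i' beta)
        omega = sample_pg(y + r, X @ beta, rng)
        # beta | omega, tau ~ N(V X' kappa, V), V^{-1} = X' Omega X + diag(1 / tau)
        prec = X.T @ (omega[:, None] * X) + np.diag(1.0 / np.maximum(tau, 1e-12))
        chol = np.linalg.cholesky(prec)
        mean = np.linalg.solve(prec, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))
        # lam_j | beta_j ~ Gamma(alpha + 1, rate = |beta_j| + eta), with tau_j integrated out
        abs_b = np.maximum(np.abs(beta), 1e-10)
        lam = rng.gamma(alpha + 1.0, 1.0 / (abs_b + eta))
        # 1 / tau_j | beta_j, lam_j ~ InverseGaussian(mean = lam_j / |beta_j|, shape = lam_j^2)
        tau = 1.0 / rng.wald(lam / abs_b, lam ** 2)
        if it >= burn_in:
            kept.append(beta.copy())
    return np.array(kept)
```

The lam_j step integrates tau_j out (marginally, beta_j given lam_j is Laplace with rate lam_j), so drawing lam_j and then 1/tau_j from its inverse-Gaussian conditional is a valid blocked update. NumPy's `Generator.wald(mean, scale)` samples the inverse Gaussian with `scale` playing the role of the shape parameter.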
Recommended Citation
Bhattacharyya, Arinjita, "Novel inference methods for generalized linear models using shrinkage priors and data augmentation." (2020). Electronic Theses and Dissertations. Paper 4256.
https://doi.org/10.18297/etd/4256
Included in
Analysis Commons, Applied Statistics Commons, Biostatistics Commons, Categorical Data Analysis Commons, Clinical Trials Commons, Diseases Commons, Microarrays Commons, Multivariate Analysis Commons, Oncology Commons, Other Computer Sciences Commons, Preventive Medicine Commons, Statistical Methodology Commons, Statistical Models Commons