Bayesian variable selection strategies in longitudinal mixture models and categorical regression problems.
Bioinformatics and Biostatistics
Variable screening; mixture models; shrinkage; Bayesian analysis; variable selection
In this work, we develop a variable screening and selection method for Bayesian mixture models with longitudinal data. We develop this method using data from the Health and Retirement Study (HRS), conducted by the University of Michigan. Treating yearly out-of-pocket expenditures as the longitudinal response variable, we fit a Bayesian mixture model with $K$ components. The data include a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find the subset of these that affects cluster membership. An initial mixture model without any cluster-level predictors is fit to the data with an MCMC algorithm, and a variable screening step then identifies candidate predictors that may be associated with the cluster configurations found in the initial fit. For each predictor, we choose a discrepancy measure, such as a frequentist hypothesis test statistic, that quantifies how the predictor's values differ across clusters. A large discrepancy provides evidence that the clusters (and the corresponding response trajectories) differ with respect to that baseline characteristic, and these discrepancies are used to choose a small set of predictors to include in a multinomial logit model for cluster membership. A stepwise logit model, along with other choices, is also considered as a multivariate variable screening approach. The performance of this methodology is explored in both simulations and a real data application. Additionally, we consider the problem of variable selection in the baseline categorical logit model for categorical regression. While many studies address variable selection in regression with a numerical response, research on variable selection for a categorical response variable is limited.
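The univariate screening step in the mixture-model part of this work can be illustrated with a minimal Python sketch. It assumes the initial MCMC fit has already been summarized as a single cluster label per subject (for example, each subject's most frequent allocation across posterior draws) and uses the one-way ANOVA F statistic as the discrepancy measure for continuous predictors; a chi-square statistic would play the same role for categorical predictors. The function names `anova_f_statistic` and `screen_predictors` and the simulated data are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def anova_f_statistic(x, labels):
    """One-way ANOVA F statistic for differences in the mean of x across clusters."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = [x[labels == k] for k in np.unique(labels)]
    n, g = x.size, len(groups)
    grand_mean = x.mean()
    # Between-cluster and within-cluster sums of squares.
    ss_between = sum(grp.size * (grp.mean() - grand_mean) ** 2 for grp in groups)
    ss_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def screen_predictors(X, labels, n_keep):
    """Score each column of X by its discrepancy across clusters and return
    the indices of the n_keep largest scores, together with all scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([anova_f_statistic(X[:, j], labels) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]  # largest discrepancy first
    return order[:n_keep], scores


# Simulated stand-in for the HRS setting: 300 subjects, 3 clusters, 4 predictors.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)  # pretend these came from the initial MCMC fit
X = rng.normal(size=(300, 4))
X[:, 0] += 2.0 * labels  # strongly associated with cluster membership
X[:, 2] += 0.5 * labels  # weakly associated
keep, scores = screen_predictors(X, labels, n_keep=2)
print(keep, np.round(scores, 1))
```

The retained columns would then enter the multinomial logit model for cluster membership. Ranking by the raw F statistic is only one rule: because every predictor here has the same degrees of freedom, it orders predictors exactly as their p-values would, but with missing data or mixed predictor types a p-value or another standardized discrepancy would be needed for a fair comparison.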
For the categorical regression problem, the main goal is to leverage the global-local shrinkage framework to improve variable selection in baseline categorical logistic regression. We introduce new shrinkage priors that encourage the same predictors to be selected across the models for the different response levels. To that end, the proposed priors share information across the response models through local shrinkage parameters that favor similar levels of shrinkage for all coefficients (log odds ratios) of a given predictor. We consider shrinkage approaches based on the horseshoe and normal-gamma priors within this setting and compare them with a spike-and-slab formulation and with shrinkage priors that do not share information across models. The performance of our approach is evaluated in both simulations and a real data application.
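To make the idea of a shared local parameter concrete, consider a response with categories $1, \dots, C$ and category 1 as the baseline (index 0 in the code), so that $\log\{P(Y_i = c)/P(Y_i = 1)\} = \beta_{0c} + \sum_j x_{ij}\beta_{jc}$ for $c = 2, \dots, C$. One plausible horseshoe version of the information-sharing prior, assumed here for illustration rather than quoted from the dissertation, is $\beta_{jc} \mid \lambda_j, \tau \sim N(0, \lambda_j^2\tau^2)$ with $\lambda_j, \tau \sim C^+(0, 1)$, so that every log odds ratio of predictor $j$ is scaled by the same $\lambda_j$; the non-sharing alternative gives each coefficient its own $\lambda_{jc}$. The Python sketch below draws from both priors and measures how alike a predictor's coefficient magnitudes are across categories. All function names are hypothetical.

```python
import numpy as np


def draw_coefficients(p, n_cat, shared_local, rng):
    """Draw a p x (n_cat - 1) matrix of log odds ratios from a horseshoe prior.

    shared_local=True: predictor j has one local scale lambda_j reused for
    every non-baseline category (the information-sharing variant assumed here).
    shared_local=False: each coefficient gets its own local scale lambda_jc.
    """
    tau = np.abs(rng.standard_cauchy())  # global scale ~ half-Cauchy(0, 1)
    if shared_local:
        # Shape (p, 1) broadcasts the same lambda_j across all categories.
        lam = np.abs(rng.standard_cauchy(size=(p, 1)))
    else:
        lam = np.abs(rng.standard_cauchy(size=(p, n_cat - 1)))
    z = rng.standard_normal(size=(p, n_cat - 1))
    return z * lam * tau


def category_probs(x, intercepts, beta):
    """Baseline-category logit probabilities for one subject.

    Index 0 is the baseline category (its linear predictor is fixed at 0);
    intercepts has length n_cat - 1 and beta has shape (p, n_cat - 1).
    """
    eta = np.concatenate(([0.0], intercepts + x @ beta))
    eta = eta - eta.max()  # subtract the max for numerical stability
    w = np.exp(eta)
    return w / w.sum()


def within_predictor_spread(shared_local, rng, n_draws=2000, p=5, n_cat=4):
    """Average (over draws and predictors) of the standard deviation of
    log|beta_jc| across categories; smaller means more similar shrinkage."""
    spreads = []
    for _ in range(n_draws):
        b = draw_coefficients(p, n_cat, shared_local, rng)
        spreads.append(np.log(np.abs(b)).std(axis=1).mean())
    return float(np.mean(spreads))


rng = np.random.default_rng(1)
spread_shared = within_predictor_spread(True, rng)
spread_indep = within_predictor_spread(False, rng)
print(f"shared lambda_j: {spread_shared:.2f}, separate lambda_jc: {spread_indep:.2f}")
```

Under the shared prior, the within-predictor spread comes only from the normal draws $z_{jc}$, because $\log\lambda_j$ and $\log\tau$ are constant across categories; with separate local scales, the heavy-tailed $\log\lambda_{jc}$ terms add further variation, so `spread_indep` should exceed `spread_shared`. A normal-gamma analogue would replace the half-Cauchy local scales with gamma-distributed local variances while keeping the same per-predictor sharing.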
Uddin, Md Nazir, "Bayesian variable selection strategies in longitudinal mixture models and categorical regression problems." (2021). Electronic Theses and Dissertations. Paper 3701.