Date on Master's Thesis/Doctoral Dissertation
8-2024
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics, PhD
Committee Chair
Mitra, Riten
Committee Co-Chair (if applicable)
Kong, Maiying
Committee Member
Kulasekera, Karunarathna B.
Committee Member
Gaskins, Jeremy
Committee Member
Levinson, Cheri
Author's Keywords
Bayesian inference; multistate models; Markov trees; high dimensional; time-to-event data; nonparametric method
Abstract
This dissertation consists of two projects. The first involves nonparametric methods for Continuous Time Markov Chains (CTMCs). The second centers on Bayesian shrinkage models for detecting prognostic and predictive biomarkers in high-dimensional clinical data. Both projects build on methods from across the frequentist and Bayesian paradigms to offer novel solutions. In the first project, we aimed to model the nonlinear effects of continuous variables within a multistate framework nonparametrically by appealing to the rich mathematical framework of Reproducing Kernel Hilbert Spaces (RKHS). We then adapted the classical Representer Theorem to a penalized (squared-norm) log-likelihood, which guarantees that the optimizer of the objective function is a linear combination of Gaussian kernels. We then embedded this structure in a Bayesian setting using shrinkage priors and used Expectation-Maximization Variable Selection (EMVS) for inference. An extensive set of simulations attested strongly to the validity of our approach. Our performance metrics included a normalized difference between the estimated and true nonlinear transition functions, as well as the difference between the probability distributions induced by these transitions. The latter essentially captured the ability of our methods to predict long-term behavior. To our knowledge, this confluence of (i) Bayesian regression, (ii) CTMCs, and (iii) RKHS in a single unified framework is the first in the methodological literature. In the second project, we formulated a setup for variable selection in clinical time-to-event data in which patient groups are indexed by different levels of a treatment variable. We are interested in estimating how extraneous covariates (e.g., biomarkers, demographics) modify the effect of the treatment on survival outcomes, based on an interaction model between the treatment and the covariates.
For a high-dimensional covariate set, it is natural to expect only a few covariates to be significant treatment modifiers, which necessitates careful variable selection. From a frequentist perspective, we first adapted the smoothly clipped absolute deviation (SCAD) and group SCAD (grSCAD) methods to a survival modeling framework to avoid excessively penalizing large parameters. We also explored the Bayesian counterpart using shrinkage priors, implemented in Stan. We carried out extensive simulations across scenarios that varied the level of sparsity and the magnitude of the regression coefficients. In each scenario, we evaluated model performance by mean-squared error and by accuracy measures associated with variable selection. Finally, we applied the methods to both randomized controlled trials and observational studies.
Recommended Citation
Han, Yuchen, "Bayesian approaches in multi-state Markov models and high dimensional time-to-event data." (2024). Electronic Theses and Dissertations. Paper 4437.
Retrieved from https://ir.library.louisville.edu/etd/4437
Included in
Biostatistics Commons, Statistical Methodology Commons, Statistical Models Commons, Statistical Theory Commons, Survival Analysis Commons