Date on Master's Thesis/Doctoral Dissertation

8-2024

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Mitra, Riten

Committee Co-Chair (if applicable)

Kong, Maiying

Committee Member

Kulasekera, Karunarathna B.

Committee Member

Gaskins, Jeremy

Committee Member

Levinson, Cheri

Author's Keywords

Bayesian inference; multistate models; Markov trees; high dimensional; time-to-event data; nonparametric method

Abstract

This dissertation consists of two projects. The first involves nonparametric methods for Continuous Time Markov Chains (CTMCs). The second centers on Bayesian shrinkage models for detecting prognostic and predictive biomarkers in high-dimensional clinical data. Both projects build on methods from across the frequentist and Bayesian paradigms to offer novel solutions. In the first project, we aim to model the nonlinear effects of continuous variables within a multistate framework nonparametrically by appealing to the rich mathematical framework of Reproducing Kernel Hilbert Spaces (RKHS). We adapted the classical Representer Theorem to the penalized (squared norm) log-likelihood, which guarantees that the optimizer of the objective function is a linear combination of Gaussian kernels. We then embedded this structure in a Bayesian setting using shrinkage priors and successfully used Expectation Maximization Variable Selection (EMVS) for inference. An extensive set of simulations attested strongly to the validity of our approach. Our performance metrics included a normalized difference between the estimated and true nonlinear transition functions as well as the difference between the probability distributions induced by these transitions. The latter essentially captured the ability of our methods to predict long-term behavior. To our knowledge, this confluence of (i) Bayesian regression, (ii) CTMCs, and (iii) RKHS in a single unified framework is the first in the methodological literature. In the second project, we formulated a setup for variable selection in clinical time-to-event data. The groups in these data are indexed by different levels of a treatment variable. We are interested in estimating how extraneous covariates (e.g., biomarkers, demographics) modify the effect of the treatment on survival outcomes, based on an interaction model between the treatment and covariates.
For a high-dimensional covariate set, it is natural to expect only a few covariates to be significant treatment modifiers, thus necessitating careful variable selection. From a frequentist perspective, we first adapted the smoothly clipped absolute deviation (SCAD) and group SCAD (grSCAD) methods to a survival modeling framework to avoid excessively penalizing large parameters. We also explored the Bayesian counterpart using shrinkage priors, implemented in Stan. We carried out extensive simulations across scenarios with varying levels of sparsity and magnitudes of regression coefficients. In each scenario, we evaluated model performance by mean squared error and by accuracy measures associated with variable selection. Finally, we applied these methods to both randomized controlled trials and observational studies.
