Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Gaskins, Jeremy

Committee Co-Chair (if applicable)

Kong, Maiying

Committee Member

Kong, Maiying

Committee Member

Lorenz, Doug

Committee Member

Mitra, Riten

Committee Member

Gill, Ryan

Author's Keywords

Causal inference; propensity scores; confounder selection


Causal inference is used across many fields to draw causal conclusions from data. It relies on assumptions, study designs, and estimation strategies to minimize the impact of confounding variables. Propensity scores are used to estimate treatment effects through matching, stratification, weighting, and the Covariate Balancing Propensity Score method. However, propensity score estimates can be sensitive to the estimation technique and can lead to unstable findings. Researchers have proposed integrating weighting with regression adjustment in parametric models to improve the validity of causal inference. The first project focuses on Bayesian joint and two-stage methods for propensity score analysis. Propensity score modeling involves estimating each patient's probability of receiving the treatment and using the resulting values in a regression model for the outcome. Bayesian joint propensity score methods allow feedback between the two stages, meaning that information from one stage of the model is used to refine the other. We propose a likelihood framework that estimates the average treatment effect by mimicking the commonly used weighted likelihood. The proposed likelihood structure involves a normal distribution for the response variable given the treatment and the covariates in the outcome model, and a Bernoulli distribution for the treatment given the covariates in the propensity score model. The coefficients of the covariates in the propensity score model and the outcome model are estimated through a Bayesian approach. We demonstrate the proposed method with simulation studies and real-world data taken from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS).
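The likelihood structure described above (a normal outcome model and a Bernoulli treatment model) can be illustrated with a minimal sketch. The abstract does not give the functional forms, so the linear outcome mean, the logistic link for the propensity score, and every function and parameter name below are assumptions made for illustration. The sketch shows only the plain joint factorization, not the weighted form the dissertation proposes.

```python
import math


def log_expit(z):
    """Numerically stable log of the logistic function, log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_normal_pdf(y, mean, sigma):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)


def joint_log_likelihood(y, t, x, beta0, tau, beta, sigma, gamma0, gamma):
    """Sum of treatment-model and outcome-model log-likelihood contributions.

    y: outcomes, t: binary treatments (0 or 1), x: list of covariate lists.
    tau is the treatment coefficient in the assumed linear outcome model.
    gamma0 and gamma parameterize the assumed logistic propensity score model.
    """
    total = 0.0
    for yi, ti, xi in zip(y, t, x):
        # Bernoulli contribution: treatment given covariates.
        z = gamma0 + sum(g * v for g, v in zip(gamma, xi))
        total += ti * log_expit(z) + (1 - ti) * log_expit(-z)
        # Normal contribution: response given treatment and covariates.
        mean = beta0 + tau * ti + sum(b * v for b, v in zip(beta, xi))
        total += log_normal_pdf(yi, mean, sigma)
    return total
```

In a Bayesian analysis, a function of this kind would be combined with priors on the coefficients and passed to a sampler. A two-stage analysis would instead fit the treatment part first and hold the resulting propensity scores fixed when fitting the outcome part.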
The results of this project will contribute to the understanding of the effectiveness of Bayesian joint and two-stage methods for propensity score analysis and provide insights into the potential biases that may arise from feedback in joint Bayesian PS estimation. The second project focuses on developing new methods to estimate propensity scores using Bayesian variable selection techniques. By fitting Bayesian variable selection models for both the outcome and the exposure, the role of each predictor is identified. The goal is to combine the variable selection results from both models to yield a new propensity score model that uses only those predictors associated with both the outcome and the exposure. The proposed method aims to produce a strong and precise propensity score (PS) that balances the distribution of covariates across treatment groups, which simplifies the estimation of treatment effects. The method is based on the spike-and-slab model, which integrates two prior distributions: a "spike" component and a "slab" component. The spike component promotes sparsity, while the slab component allows non-zero coefficients. This prior is used to manage model selection and variable inclusion in regression analysis. The proposed method comprises two distinct models, the treatment model and the outcome model, each of which uses a spike-and-slab prior to learn the predictors associated with it. Extensive simulations across a range of settings show that our method is competitive. Finally, we applied our method to the "nhefs" dataset and the "lalonde" dataset to evaluate the efficacy of the approach.
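The spike-and-slab idea and the rule of keeping only predictors selected in both models can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: it uses a continuous two-normal mixture (a narrow spike and a wide slab), whereas the dissertation may use a different spike. The function names, the default standard deviations, the 0.5 inclusion threshold, and the covariate names in the usage note are all hypothetical.

```python
import math


def slab_probability(beta, incl_prob=0.5, spike_sd=0.01, slab_sd=10.0):
    """Conditional probability that a coefficient value came from the slab.

    Assumes the prior is a mixture of two zero-mean normals: a narrow spike
    with weight 1 - incl_prob and a wide slab with weight incl_prob.
    Computed on the log-odds scale to avoid underflow.
    """
    log_slab = -0.5 * (beta / slab_sd) ** 2 - math.log(slab_sd)
    log_spike = -0.5 * (beta / spike_sd) ** 2 - math.log(spike_sd)
    log_odds = math.log(incl_prob / (1 - incl_prob)) + log_slab - log_spike
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def shared_predictors(pip_treatment, pip_outcome, threshold=0.5):
    """Return predictors whose inclusion probability exceeds the threshold in both models.

    pip_treatment and pip_outcome map predictor names to posterior inclusion
    probabilities from the treatment and outcome models respectively.
    """
    return [
        name
        for name in pip_treatment
        if pip_treatment[name] > threshold and pip_outcome.get(name, 0.0) > threshold
    ]
```

For example, with hypothetical inclusion probabilities `{"age": 0.9, "sex": 0.2, "smoke": 0.8}` from the treatment model and `{"age": 0.7, "sex": 0.9, "smoke": 0.3}` from the outcome model, `shared_predictors` keeps only `age`, the one predictor selected in both.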