Date on Master's Thesis/Doctoral Dissertation
8-2020
Document Type
Doctoral Dissertation
Degree Name
Ph.D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics, PhD
Committee Chair
Lorenz, Doug
Committee Co-Chair (if applicable)
Datta, Somnath
Committee Member
Datta, Somnath
Committee Member
Gill, Ryan
Committee Member
Kong, Maiying
Committee Member
Kulasekera, KB
Author's Keywords
informative cluster size; informative within-cluster group size; clustered data; marginal inference
Abstract
Clustered data result when observations have some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary in a systematic way related to the cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters similarly carries information related to outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and inference when data have informativeness. A reweighting methodology has been developed that is resistant to ICS and IWCGS bias, and this method has been used to establish clustered data analogs of classical hypothesis tests related to ranks and correlation. In this work, we extend the reweighting methodology to develop a versatile collection of marginal hypothesis tests related to proportions, means, and variances in clustered data that are analogous to classical forms. We evaluate the performance of these tests compared to other cluster-appropriate methods through simulation and show that only reweighted tests maintain appropriate size when data have informativeness. We construct reweighted tests of clustered categorical data using several variance estimators, and demonstrate that the method of variance estimation can have a substantial effect on these tests. Additionally, we show that when testing simple hypotheses in data lacking informativeness, reweighted tests can outperform other standard cluster-appropriate methods in terms of both size and power. Combining our novel tests with the existing tests of ranks and correlations, we compile a comprehensive R software package that executes this collection of ICS/IWCGS-appropriate methods through a thoughtful and user-friendly design.
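The idea behind reweighting for informative cluster size can be illustrated with a minimal sketch of a marginal mean. It is not code from the dissertation or its R package. It assumes the common inverse-cluster-size form of reweighting, in which each observation in cluster i receives weight 1/n_i, so that every cluster contributes equally regardless of size. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import fmean

def pooled_mean(clusters):
    """Unweighted mean over all observations; large clusters dominate."""
    return fmean(y for cluster in clusters for y in cluster)

def cluster_weighted_mean(clusters):
    """Mean of within-cluster means, i.e. each observation weighted by 1/n_i."""
    return fmean(fmean(cluster) for cluster in clusters)

# Made-up toy data in which larger clusters tend to have larger outcomes,
# which is the situation described as informative cluster size.
clusters = [[1.0], [2.0, 2.0], [5.0, 5.0, 5.0, 5.0]]

print(pooled_mean(clusters))            # 25/7, pulled toward the largest cluster
print(cluster_weighted_mean(clusters))  # 8/3, each cluster counts once
```

In this toy example the two estimates differ because cluster size is related to the outcome. When cluster size carries no information about outcomes, the two target the same quantity.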
Recommended Citation
Gregg, Mary Elizabeth, "Marginal methods and software for clustered data with cluster- and group-size informativeness." (2020). Electronic Theses and Dissertations. Paper 3482.
https://doi.org/10.18297/etd/3482