Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Lorenz, Doug

Committee Co-Chair (if applicable)

Datta, Somnath

Committee Member

Datta, Somnath

Committee Member

Gill, Ryan

Committee Member

Kong, Maiying

Committee Member

Kulasekera, KB

Author's Keywords

informative cluster size; informative within-cluster group size; clustered data; marginal inference

Abstract

Clustered data arise when observations share some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary systematically with cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters likewise carries information about outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and invalid inference when the data are informative. A reweighting methodology that is resistant to ICS and IWCGS bias has been developed, and it has been used to establish clustered-data analogs of classical hypothesis tests based on ranks and correlation. In this work, we extend the reweighting methodology to develop a versatile collection of marginal hypothesis tests of proportions, means, and variances in clustered data, analogous to their classical counterparts. Through simulation, we compare the performance of these tests with that of other cluster-appropriate methods and show that only the reweighted tests maintain the nominal size when the data are informative. We construct reweighted tests for clustered categorical data using several variance estimators and demonstrate that the choice of variance estimator can have a substantial effect on these tests. Additionally, we show that when testing simple hypotheses in data without informativeness, reweighted tests can outperform other standard cluster-appropriate methods in terms of both size and power. Combining our novel tests with the existing tests of ranks and correlation, we compile a comprehensive R software package that implements this collection of ICS/IWCGS-appropriate methods through a thoughtful and user-friendly design.
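To make the reweighting idea concrete, here is a minimal Python sketch. It assumes the simplest form of ICS reweighting, in which each observation in cluster i receives weight 1/n_i, so that every cluster contributes equally to a marginal mean or proportion no matter how large it is. The group-level function is a hypothetical simplification of IWCGS reweighting (averaging within-cluster group means over the clusters that contain the group), and the variance estimator in `reweighted_z` (empirical variance of the cluster means, treating clusters as independent) is an assumed choice, one of several possible. None of these function names come from the dissertation's R package.

```python
import math


def naive_mean(clusters):
    """Pooled mean that ignores clustering, so large clusters dominate."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)


def cluster_means(clusters):
    """Within-cluster means; weighting each observation by 1/n_i and summing
    is equivalent to averaging these cluster-level means."""
    return [sum(c) / len(c) for c in clusters]


def reweighted_mean(clusters):
    """Cluster-size-reweighted marginal mean (each cluster counts equally)."""
    means = cluster_means(clusters)
    return sum(means) / len(means)


def reweighted_z(clusters, mu0):
    """Wald-type z statistic for H0: marginal mean == mu0.

    ASSUMPTION: the variance of the reweighted mean is estimated by the
    sample variance of the cluster means divided by the number of clusters.
    """
    means = cluster_means(clusters)
    k = len(means)
    m = sum(means) / k
    var_m = sum((x - m) ** 2 for x in means) / (k - 1) / k
    return (m - mu0) / math.sqrt(var_m)


def reweighted_group_mean(clusters, group):
    """Hypothetical IWCGS-style estimator: average, over clusters containing
    `group`, of the within-cluster mean for that group. Each cluster is a
    list of (group_label, outcome) pairs."""
    per_cluster = []
    for cluster in clusters:
        ys = [y for g, y in cluster if g == group]
        if ys:  # skip clusters in which this group is absent
            per_cluster.append(sum(ys) / len(ys))
    return sum(per_cluster) / len(per_cluster)


if __name__ == "__main__":
    # Binary outcomes where the largest cluster has the most successes (ICS).
    binary = [[1, 1, 1, 1, 1, 1], [0, 1], [0]]
    print(naive_mean(binary))       # 7/9, pulled up by the large cluster
    print(reweighted_mean(binary))  # 0.5, each cluster weighted equally
    print(reweighted_z(binary, 0.2))

    grouped = [
        [("A", 1), ("A", 1), ("A", 1), ("B", 0)],
        [("A", 0), ("B", 1), ("B", 1)],
        [("B", 0)],
    ]
    print(reweighted_group_mean(grouped, "A"))  # 0.5
    print(reweighted_group_mean(grouped, "B"))  # 1/3
```

In the binary example the pooled proportion (7/9) is driven by the single large cluster, while the reweighted proportion (0.5) treats the three clusters as equally informative. With only three clusters the z statistic is purely illustrative; in practice many clusters are needed for a normal approximation.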
