Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Datta, Somnath

Committee Co-Chair (if applicable)

Datta, Susmita

Committee Member

Datta, Susmita

Committee Member

Kong, Maiying

Committee Member

Gaskins, Jeremy

Committee Member

Gill, Ryan

Author's Keywords

dental data; next-generation-sequencing; marginal regression; mixed effects model; Bayesian

Abstract

This dissertation develops statistical methodology for count data based on the Conway-Maxwell-Poisson (CMP) distribution (Conway, R. W., and Maxwell, W. L., 1962). The count data considered exhibit three characteristics: clustering, zero inflation, and dispersion. Clustering means that observations within a cluster are correlated, and zero inflation occurs when the data contain an excess of zero counts. Dispersion means that the variance is larger or smaller than the mean, whereas a Poisson distribution forces the two to be equal. Chapter 1 introduces inference for zero-inflated clustered count data. Chapters 2-4 then present novel methodologies through three different statistical approaches. Chapter 2 takes a marginal regression approach: it begins with a description of a zero-inflated CMP model and then develops statistical methodology for estimating the marginal regression parameters. Various simulations are conducted to investigate whether the marginal regression approach leads to proper statistical inference. This chapter also provides an application to a dental dataset that is clustered, zero inflated, and dispersed. Chapter 3 develops a mixed effects model that includes a cluster-specific random effect term, and it addresses the numerical challenges of the mixed effects approach through extensive simulations. As an application of the zero-inflated mixed effects model, next-generation sequencing (NGS) data from a maize hybrids experiment are analyzed. While Chapter 3 fits the mixed effects model using a frequentist approach, Chapter 4 develops a Bayesian method to analyze such data under a mixed effects model structure. In that chapter, a hurdle model is used to handle zero inflation, rather than the zero-inflated model used in Chapters 2 and 3. Chapter 4 also provides an application to the same dental dataset used in Chapter 2; the application section introduces a new factor into the hurdle mixed effects model, which incorporates both a fixed effects term and a random effects term. Chapter 5, the concluding chapter, describes plans for future work.
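
As an illustration of the distributions named in the abstract, the following is a minimal sketch of the standard CMP probability mass function together with its zero-inflated and hurdle variants. It is not the dissertation's code: the function names, the parameter names (`lam` for the rate, `nu` for the dispersion parameter, `pi0` for the zero probability), and the truncation of the normalizing sum at `max_count` are all assumptions made here for illustration. No regression structure or random effects are included.

```python
import math

def cmp_pmf(y, lam, nu, max_count=200):
    """CMP probability P(Y = y), proportional to lam**y / (y!)**nu.

    The infinite normalizing sum is truncated at max_count (an assumption
    of this sketch; adequate when the terms decay quickly).
    """
    # Unnormalized log-weights for y = 0..max_count.
    logw = [j * math.log(lam) - nu * math.lgamma(j + 1)
            for j in range(max_count + 1)]
    m = max(logw)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(w - m) for w in logw))
    return math.exp(logw[y] - log_z)

def zicmp_pmf(y, lam, nu, pi0, max_count=200):
    """Zero-inflated CMP: extra zeros with probability pi0, else a CMP draw."""
    base = (1.0 - pi0) * cmp_pmf(y, lam, nu, max_count)
    return pi0 + base if y == 0 else base

def hurdle_cmp_pmf(y, lam, nu, pi0, max_count=200):
    """Hurdle CMP: zero with probability pi0, else a zero-truncated CMP draw."""
    if y == 0:
        return pi0
    p0 = cmp_pmf(0, lam, nu, max_count)
    return (1.0 - pi0) * cmp_pmf(y, lam, nu, max_count) / (1.0 - p0)

def cmp_mean_var(lam, nu, max_count=200):
    """Mean and variance of the (truncated) CMP distribution."""
    probs = [cmp_pmf(j, lam, nu, max_count) for j in range(max_count + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, var
```

With `nu = 1` the CMP reduces to the Poisson distribution; `nu < 1` gives variance above the mean and `nu > 1` gives variance below it. The two zero-handling variants differ in where zeros come from: in the zero-inflated form a zero can arise from either the inflation component or the CMP component, while in the hurdle form all zeros come from a single component and positive counts follow a zero-truncated CMP.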

Included in

Biostatistics Commons