Electronic Theses and Dissertations

Designing and sample size calculation in presence of heterogeneity in biological studies involving high-throughput data.

Sudhir Srivastava, University of Louisville

Date on Master's Thesis/Doctoral Dissertation

8-2019

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Interdisciplinary Studies

Degree Program

Interdisciplinary Studies with a specialization in Bioinformatics, PhD

Committee Chair

Rai, Shesh N.

Committee Member

Rouchka, Eric C.

Committee Member

Kalbfleisch, Ted

Committee Member

Merchant, Michael L.

Committee Member

Mitra, Riten

Author's Keywords

Tissue storage; tissue extraction; technical variability; imputation; negative binomial distribution; generalized linear model

Abstract

Experimental design and sample size determination are important for conducting high-throughput biological experiments, such as proteomics experiments and RNA-Seq expression studies, and thus for a better understanding of the complex mechanisms underlying various biological processes. Variation in the biological data, or in the technical approaches to data collection, leads to heterogeneity among the samples under study. We critically examined the issues of technical and biological heterogeneity. Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney biopsy material to investigate the technical effects of sample preparation and quantitative MS. We studied the effects of tissue storage methods (TSMs) and tissue extraction methods (TEMs) on data analysis. There are two TSMs, frozen (FR) and formalin-fixed paraffin-embedded (FFPE), and three TEMs: MAX, TX followed by MAX, and SDS followed by MAX. We assessed the impact of different data analysis strategies that account for heterogeneity and MVs. We found that FFPE is better than FR for tissue storage. We also found that the one-step TEM (MAX) is better than the two-step TEMs. Furthermore, we found that imputing MVs is a better approach than excluding proteins with MVs or using an unbalanced design. We introduce a web application, PWST (Proteomics Workflow Standardization Tool), to standardize the proteomics workflow. The tool is intended to help users choose the most suitable option for each step and to study the variability associated with technical steps, as well as the effects of continuous variables. We used two special cases of the general linear model, ANCOVA and fixed-effects ANOVA, to study the effects of the various sources of variability.
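A fixed-effects ANOVA with interaction matches the 2 × 3 layout of storage method (TSM) by extraction method (TEM) described above. The Python sketch below computes the sums of squares and F statistics for a single protein. It is not the dissertation's implementation: it assumes a balanced design with the same number r ≥ 2 of replicate measurements in every TSM-TEM cell (for example, after MVs have been imputed), and the function name `two_way_anova` and the nested-list data layout are hypothetical.

```python
from itertools import product
from statistics import fmean


def two_way_anova(y):
    """Balanced two-way fixed-effects ANOVA with interaction (hypothetical helper).

    y[i][j] holds the r replicate values (e.g. log intensities of one protein)
    for level i of factor A (e.g. a TSM) and level j of factor B (e.g. a TEM).
    Assumes every cell has the same number r >= 2 of observations.
    Returns {source: (sum of squares, degrees of freedom, F statistic or None)}.
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(a), range(b)))

    # Cell means and grand mean.
    cell = [[fmean(y[i][j]) for j in range(b)] for i in range(a)]
    grand = fmean([v for i, j in cells for v in y[i][j]])

    # Marginal means; averaging cell means is valid only because the design is balanced.
    mean_a = [fmean(cell[i]) for i in range(a)]
    mean_b = [fmean([cell[i][j] for i in range(a)]) for j in range(b)]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in mean_a),
        "B": a * r * sum((m - grand) ** 2 for m in mean_b),
        "AxB": r * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2 for i, j in cells),
        "error": sum((v - cell[i][j]) ** 2 for i, j in cells for v in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]

    # F = (SS_effect / df_effect) / MS_error for each effect; none for the error row.
    return {
        k: (ss[k], df[k], None if k == "error" else (ss[k] / df[k]) / ms_error)
        for k in ss
    }


# 2 TSMs x 3 TEMs, 2 replicates per cell (illustrative numbers only).
y = [[[1, 3], [1, 3], [1, 3]], [[5, 7], [5, 7], [5, 7]]]
print(two_way_anova(y)["A"])  # (48.0, 1, 24.0)
```

P-values follow from the F distribution with (df, df_error) degrees of freedom, for example via `scipy.stats.f.sf`. An unbalanced design (proteins with MVs left unimputed) needs a general linear model fit instead of these closed-form sums of squares, and an ANCOVA would add a continuous covariate to the same model.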
We introduce an interactive tool, “SATP: Statistical Analysis Tool for Proteomics”, for analyzing proteomics expression data; it scales to large clinical proteomic studies. Users can perform differential expression analysis at either the protein or the peptide level using multiple approaches. We have developed statistical approaches for calculating sample sizes for proteomics experiments under allocation and cost constraints, together with R programs and a Shiny app, “SSCP: Sample Size Calculator for Proteomics Experiment”, for computing them. We have also proposed statistical approaches for calculating sample sizes for RNA-Seq experiments that take allocation and cost into account, and we have developed R programs and Shiny apps that compute sample sizes for RNA-Seq experiments.
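Sample size calculations under allocation and cost constraints can be illustrated with generic textbook approximations. The Python sketch below is not the dissertation's method or the SSCP code. It assumes a two-sided two-sample test with a Bonferroni correction over m proteins or genes, a normal model for log intensities (proteomics), and a Wald-type normal approximation on the log scale for a negative binomial count with mean μ and dispersion φ (RNA-Seq), where Var(log of the group mean) ≈ (1/μ + φ)/n. Allocation enters through the ratio k = n2/n1; the cost constraint uses the classical result that, for a fixed budget with per-sample costs c1 and c2, the variance-minimizing ratio is k = (σ2/σ1)√(c1/c2). All function names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def _z_sum(alpha, power, m):
    """z_{1 - alpha/(2m)} + z_{power}: two-sided test, Bonferroni-adjusted over m features."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / (2 * m)) + nd.inv_cdf(power)


def cost_optimal_ratio(c1, c2, sigma1=1.0, sigma2=1.0):
    """Ratio k = n2/n1 that minimizes Var(mean difference) for a fixed total cost."""
    return (sigma2 / sigma1) * sqrt(c1 / c2)


def n_proteomics(delta, sigma, k=1.0, alpha=0.05, power=0.8, m=1):
    """(n1, n2), n2 = k * n1, to detect a mean log-intensity difference delta
    when both groups share standard deviation sigma (normal approximation)."""
    n1 = _z_sum(alpha, power, m) ** 2 * sigma ** 2 * (1 + 1 / k) / delta ** 2
    return ceil(n1), ceil(k * n1)


def n_rnaseq(fold_change, mu1, dispersion, k=1.0, alpha=0.05, power=0.8, m=1):
    """(n1, n2), n2 = k * n1, for a negative binomial count with mean mu1 in
    group 1 and mu1 * fold_change in group 2, common dispersion assumed."""
    mu2 = mu1 * fold_change
    # Per-sample variance contribution of each group's log mean (delta method).
    unit_var = (1 / mu1 + dispersion) + (1 / mu2 + dispersion) / k
    n1 = _z_sum(alpha, power, m) ** 2 * unit_var / log(fold_change) ** 2
    return ceil(n1), ceil(k * n1)


print(n_proteomics(delta=1.0, sigma=1.0))                 # (16, 16)
print(n_rnaseq(fold_change=2.0, mu1=10, dispersion=0.1))  # (6, 6)
```

For example, if a group-1 sample costs four times as much as a group-2 sample, `cost_optimal_ratio(4, 1)` gives k = 2, and `n_proteomics(1.0, 1.0, k=2.0)` returns (12, 24) in place of the balanced (16, 16).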

Recommended Citation

Srivastava, Sudhir, "Designing and sample size calculation in presence of heterogeneity in biological studies involving high-throughput data." (2019). Electronic Theses and Dissertations. Paper 3261.
https://doi.org/10.18297/etd/3261

Included in

Bioinformatics Commons, Computer Sciences Commons, Statistics and Probability Commons