Date on Master's Thesis/Doctoral Dissertation
12-2017
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Bioinformatics and Biostatistics
Degree Program
Biostatistics
Committee Chair
Zheng, Qi
Committee Co-Chair (if applicable)
Brock, Guy
Committee Member
Kulasekera, K.B.
Committee Member
Gaskins, Jeremy
Committee Member
Kong, Maiying
Committee Member
Garbett, Nichola
Author's Keywords
functional data analysis; thermograms; lupus; differential scanning calorimetry; disease status
Abstract
Introduction: Differential scanning calorimetry (DSC) is used to determine thermally induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are often treated as functional data. In this dissertation, we propose and apply functional data analysis (FDA) techniques to analyze DSC data from the Lupus Family Registry and Repository (LFRR). The aim is to develop FDA methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves.
Methods: In project 1, we examine how well standard functional regression captures the differences in curves between cases and controls and compare this to a multivariate approach. In project 2, we develop a semiparametric model: the Generalized Functional Partially Linear Single-Index Model (GFPL). This model is useful when the logit exhibits curvature or nonlinearity that the standard Functional Generalized Linear Model (FGLM) cannot capture. It also mitigates the curse of dimensionality, is more flexible, and yields interpretable results. In project 3, we propose a tree-based method: Local Basis Random Forests (LBRF) for Functional Data. This non-parametric method allows us to focus on significant parts of the functional covariates and reduce the noise level.
Results: The standard functional logistic regression model with FPCA scores as the predictors gives an 81.25% correct classification rate on the test data, comparable to results from the multivariate approach. The proposed GFPL gives prediction accuracies and standard errors that are better than those of the standard FGLM when nonlinearity is present. The LBRF for functional data yields high prediction accuracy (as high as 97% in simulations and 92% in the lupus data), especially when the true signal is localized, and is able to capture where the true signal lies.
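Illustration: The idea of functional logistic regression on FPCA scores can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the curves are synthetic stand-ins for thermograms, FPCA is approximated by ordinary PCA on curves sampled on a common temperature grid, and the function names (`fit_fpca`, `fpca_scores`, `fit_logistic`, `predict`), the grid, and all numeric settings are hypothetical.

```python
import numpy as np


def fit_fpca(curves, n_components):
    """Estimate the mean curve and leading principal components.

    Assumption: curves are sampled on a common grid, so PCA on the
    sampled values serves as a simple stand-in for FPCA.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized eigenfunctions of the sample covariance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_curve, vt[:n_components]


def fpca_scores(curves, mean_curve, components):
    """Project centered curves onto the estimated components."""
    return (curves - mean_curve) @ components.T


def fit_logistic(scores, labels, n_iter=2000, lr=0.1):
    """Logistic regression on the scores, fitted by plain gradient descent."""
    design = np.column_stack([np.ones(len(scores)), scores])
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ coef)))
        coef -= lr * design.T @ (prob - labels) / len(labels)
    return coef


def predict(coef, scores):
    """Classify as 1 when the fitted logit is positive."""
    design = np.column_stack([np.ones(len(scores)), scores])
    return (design @ coef > 0).astype(int)


# Synthetic data (hypothetical): cases carry an extra bump near 70 degrees.
rng = np.random.default_rng(0)
grid = np.linspace(45.0, 90.0, 60)
labels = np.repeat([0, 1], 40)
baseline = np.exp(-0.5 * ((grid - 63.0) / 4.0) ** 2)
bump = np.exp(-0.5 * ((grid - 70.0) / 3.0) ** 2)
noise = 0.05 * rng.standard_normal((labels.size, grid.size))
curves = baseline + 0.5 * labels[:, None] * bump + noise

# Even rows for training, odd rows for testing (both classes in each half).
train, test = slice(0, None, 2), slice(1, None, 2)
mean_curve, components = fit_fpca(curves[train], n_components=2)
coef = fit_logistic(fpca_scores(curves[train], mean_curve, components), labels[train])
test_pred = predict(coef, fpca_scores(curves[test], mean_curve, components))
accuracy = float((test_pred == labels[test]).mean())
print(f"test accuracy: {accuracy:.3f}")
```

The components are estimated from the training curves only and then reused to score the test curves, so the reported accuracy is not influenced by the held-out data. The accuracy printed here reflects the synthetic data and says nothing about the lupus results reported above.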
Recommended Citation
Kendrick, Sarah, "Functional data analysis methods for predicting disease status." (2017). Electronic Theses and Dissertations. Paper 2851.
https://doi.org/10.18297/etd/2851