Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Bioinformatics and Biostatistics

Degree Program


Committee Chair

Zheng, Qi

Committee Co-Chair (if applicable)

Brock, Guy

Committee Member

Kulasekera, K.B.

Committee Member

Gaskins, Jeremy

Committee Member

Kong, Maiying

Committee Member

Garbett, Nichola

Author's Keywords

functional data analysis; thermograms; lupus; differential scanning calorimetry; disease status


Introduction: Differential scanning calorimetry (DSC) is used to determine thermally induced conformational changes of biomolecules within a blood plasma sample. Recent research indicates that DSC curves (thermograms) may differ in their characteristics by disease status and thus may be useful as a monitoring and diagnostic tool for some diseases. Because thermograms are curves measured over a range of temperatures, they are naturally treated as functional data. In this dissertation, we propose and apply functional data analysis (FDA) techniques to DSC data from the Lupus Family Registry and Repository (LFRR). The aim is to develop FDA methods for classifying lupus cases versus controls on the basis of their thermogram curves.

Methods: In project 1, we examine how well standard functional regression captures the differences between the curves of cases and controls, and we compare it to a multivariate approach. In project 2, we develop a semiparametric model, the Generalized Functional Partially Linear Single-Index Model (GFPL). This model is useful when the logit exhibits curvature or nonlinearity that the standard Functional Generalized Linear Model (FGLM) cannot capture. It also mitigates the curse of dimensionality, is more flexible, and yields interpretable results. In project 3, we propose a tree-based method, Local Basis Random Forests (LBRF) for functional data. This nonparametric method allows us to focus on the significant parts of the functional covariates and to reduce the noise level.
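For orientation, the contrast between the two model classes in project 2 can be written out. The first line is the usual form of a functional logistic regression. The second is one common way to write a partially linear single-index extension. The symbols ($X_i(t)$, $Z_i$, $\beta(t)$, $g$, $\gamma$) and the exact form of the second line are assumptions for illustration, not the dissertation's stated specification.

```latex
\begin{align*}
\text{FGLM:}\quad
  & \operatorname{logit} P(Y_i = 1 \mid X_i)
    = \alpha + \int_{\mathcal{T}} X_i(t)\,\beta(t)\,dt \\
\text{Single-index sketch (assumed form):}\quad
  & \operatorname{logit} P(Y_i = 1 \mid X_i, Z_i)
    = g\!\left( \int_{\mathcal{T}} X_i(t)\,\beta(t)\,dt \right) + Z_i^{\top}\gamma
\end{align*}
```

Under this sketch, a linear $g$ reduces the second line to the first with added scalar covariates, so any curvature in $g$ is what the FGLM cannot represent. Because $g$ takes a single scalar index as its argument, it can be estimated nonparametrically without a high-dimensional smoothing problem.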

Results: The standard functional logistic regression model with functional principal component analysis (FPCA) scores as predictors gives an 81.25% correct classification rate on the test data, comparable to the results from the multivariate approach. The proposed GFPL yields better prediction accuracies and standard errors than the standard FGLM when nonlinearity is present. The LBRF for functional data yields high prediction accuracy (as high as 97% in simulations and 92% on the lupus data), especially when the true signal is localized, and is able to identify where the true signal lies.
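The baseline in project 1, logistic regression on FPCA scores, can be sketched in a few lines. This is a minimal illustration on simulated curves, not the dissertation's code or data: the temperature grid, the curve shapes, the number of components, and all function names are hypothetical, and FPCA is approximated by an SVD of the centered curves on a uniform grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical temperature grid (degrees C); not taken from the LFRR data.
grid = np.linspace(45.0, 90.0, 100)


def simulate_curves(n):
    """Simulate toy curves: a shared main peak plus a class-dependent shoulder."""
    labels = rng.integers(0, 2, size=n)
    main_peak = np.exp(-0.5 * ((grid - 65.0) / 4.0) ** 2)
    shoulder = np.exp(-0.5 * ((grid - 75.0) / 3.0) ** 2)
    amplitude = 1.0 + 0.3 * rng.standard_normal(n)  # subject-level variation
    curves = amplitude[:, None] * main_peak + 0.6 * labels[:, None] * shoulder
    curves += 0.05 * rng.standard_normal((n, grid.size))  # measurement noise
    return curves, labels


def fit_fpca(curves, n_components):
    """Discretized FPCA: mean curve and leading right singular vectors."""
    mean_curve = curves.mean(axis=0)
    _, _, vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
    return mean_curve, vt[:n_components]


def fit_logistic(features, labels, n_iter=2000, step=0.5):
    """Plain gradient descent on the mean logistic loss (intercept included)."""
    design = np.column_stack([np.ones(len(features)), features])
    weights = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ weights))
        weights -= step * design.T @ (prob - labels) / len(labels)
    return weights


def predict(features, weights):
    """Classify as 1 when the fitted logit is positive."""
    design = np.column_stack([np.ones(len(features)), features])
    return (design @ weights > 0).astype(int)


curves, labels = simulate_curves(200)
train_x, test_x = curves[:150], curves[150:]
train_y, test_y = labels[:150], labels[150:]

# Fit the FPCA basis on the training curves only, then project both sets.
mean_curve, components = fit_fpca(train_x, n_components=3)
train_scores = (train_x - mean_curve) @ components.T
test_scores = (test_x - mean_curve) @ components.T

weights = fit_logistic(train_scores, train_y)
accuracy = float(np.mean(predict(test_scores, weights) == test_y))
print(f"test accuracy: {accuracy:.3f}")
```

Estimating the basis on the training curves alone keeps the test-set classification rate an honest out-of-sample figure. The accuracy printed here reflects the toy simulation only and says nothing about the reported 81.25%.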