Date on Master's Thesis/Doctoral Dissertation
5-2013
Document Type
Master's Thesis
Degree Name
M.S.
Department
Bioinformatics and Biostatistics
Committee Chair
Kim, Seongho
Committee Co-Chair (if applicable)
Wu, Dongfeng
Committee Member
Wu, Dongfeng
Committee Member
Zhang, Xiang
Author's Keywords
Compound identification; MS data; Penalized regression; Dot product; Metabolomics
Subject
Chemical detectors; Regression analysis; Ridge regression (Statistics)
Abstract
In this study, we propose a new method for compound identification using penalized linear regression. Compound identification is often achieved by matching experimental mass spectra to the mass spectra stored in a reference library on the basis of mass spectral similarity. In the linear regression setting, the response variable is an experimental mass spectrum (i.e., the query) and the compounds in the reference library are the independent variables. However, the number of compounds in the reference library is much larger than the range of m/z values, so the data are high-dimensional and the regression suffers from singularity. For this reason, we use penalized linear regression methods such as ridge regression and the Lasso. We also propose two-step approaches that combine the dot product and Pearson’s correlation with penalized linear regression.
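As an illustration only, the two-step idea described in the abstract might be sketched as follows: a normalized dot product first shortlists library spectra, and a ridge regression of the query on the shortlisted spectra then scores them. This is not the thesis's implementation. The toy library, the compound names, the function names, and the values of `top_k` and `lam` are all assumptions made for the example, and picking the compound with the largest ridge coefficient is one plausible decision rule, not necessarily the one used in the thesis.

```python
import math

def cosine(u, v):
    """Normalized dot product between two intensity vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_coefficients(spectra, query, lam):
    """Ridge estimate (X'X + lam*I)^-1 X'y, with one column of X per spectrum."""
    p = len(spectra)
    gram = [
        [sum(a * b for a, b in zip(spectra[i], spectra[j])) + (lam if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(a * b for a, b in zip(s, query)) for s in spectra]
    return solve(gram, rhs)

def identify(library, query, top_k=3, lam=0.1):
    """Step 1: shortlist by dot product. Step 2: score the shortlist by ridge."""
    ranked = sorted(library, key=lambda name: cosine(library[name], query), reverse=True)
    candidates = ranked[:top_k]
    coefs = ridge_coefficients([library[n] for n in candidates], query, lam)
    best = max(zip(candidates, coefs), key=lambda t: t[1])[0]
    return best, dict(zip(candidates, coefs))

# Hypothetical toy data: 4 library compounds measured on only 3 m/z bins,
# so the unpenalized regression on the full library would be singular.
library = {
    "A": [100, 0, 20],
    "B": [10, 100, 50],
    "C": [0, 30, 100],
    "D": [60, 60, 0],
}
query = [12, 95, 55]  # a noisy version of compound "B"

best, coefs = identify(library, query)
print(best, coefs)
```

The penalty `lam` keeps the system solvable even when the shortlist contains more compounds than there are m/z bins, which is the situation the abstract describes. Swapping the cosine score for Pearson's correlation, or the ridge fit for the Lasso, would follow the same two-step structure.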
Recommended Citation
Liu, Ruiqi, "Compound identification using penalized linear regression." (2013). Electronic Theses and Dissertations. Paper 844.
https://doi.org/10.18297/etd/844