Bioinformatics and Biostatistics
Compound identification; MS data; Penalized regression; Dot product; Metabolomics
Chemical detectors; Regression analysis; Ridge regression (Statistics)
In this study, we propose a new method for compound identification using penalized linear regression. Compound identification is often achieved by matching an experimental mass spectrum against the mass spectra stored in a reference library on the basis of mass spectral similarity. In the linear regression setting, the response variable is the experimental mass spectrum (i.e., the query), and the mass spectra of all compounds in the reference library serve as the independent variables. However, the number of compounds in the reference library is much larger than the number of m/z values, so the data are high-dimensional and the least-squares problem is singular. For this reason, we use penalized linear regression methods such as ridge regression and the Lasso. We also propose two-step approaches that use the dot product and Pearson's correlation together with penalized linear regression.
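The regression formulation above can be illustrated with a small pure-Python sketch. It assumes that every spectrum (query and library) has already been binned onto a common m/z grid and stored as an equal-length list of intensities, so each library compound becomes one column of the design matrix X and the query is the response y. The sketch minimizes 0.5*||y - Xb||^2 + l1*||b||_1 + 0.5*l2*||b||^2 by coordinate descent, which gives the Lasso when only l1 is set and ridge regression when only l2 is set; this solver, the penalty parametrization, and the rule "identify the compound with the largest coefficient" are assumptions for illustration, not necessarily the thesis's procedure. The optional screening step (keep the top_k compounds by cosine or Pearson similarity, then regress on those candidates) is one hypothetical reading of the two-step approach. All function names (`penalized_fit`, `identify`, and so on) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Normalized dot product (cosine); no m/z or intensity weighting is applied."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson's correlation, computed as the cosine of the mean-centred spectra."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def penalized_fit(y, columns, l1=0.0, l2=0.0, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + l1*||b||_1 + 0.5*l2*||b||^2.

    `columns` holds one library spectrum per compound (the columns of X).
    With more compounds than m/z bins, X^T X is singular, so ordinary least
    squares has no unique solution; ridge (l2 > 0) restores uniqueness, and
    the Lasso (l1 > 0) sets many coefficients exactly to zero.
    """
    b = [0.0] * len(columns)
    r = list(y)  # residual y - X b, with b starting at zero
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            xx = dot(x, x)
            if xx == 0.0:
                continue
            # Inner product of x_j with the partial residual that excludes x_j.
            z = dot(x, r) + xx * b[j]
            new = soft_threshold(z, l1) / (xx + l2)
            if new != b[j]:
                delta = new - b[j]
                r = [ri - delta * xi for ri, xi in zip(r, x)]
                b[j] = new
    return b


def identify(query, library, l1=0.0, l2=0.0, top_k=None, screen=cosine):
    """Return (compound name, coefficient) for the largest fitted coefficient.

    If top_k is given, only the top_k compounds ranked by `screen` similarity
    (cosine or pearson) are kept before the regression (hypothetical two-step).
    """
    names = list(library)
    if top_k is not None:
        names = sorted(names, key=lambda n: screen(query, library[n]),
                       reverse=True)[:top_k]
    coefs = penalized_fit(query, [library[n] for n in names], l1=l1, l2=l2)
    best = max(range(len(names)), key=lambda i: coefs[i])
    return names[best], coefs[best]


if __name__ == "__main__":
    # Toy library: five compounds on only four m/z bins (more variables than observations).
    library = {
        "A": [10, 0, 5, 0],
        "B": [0, 8, 0, 2],
        "C": [9, 1, 6, 0],
        "D": [0, 0, 1, 7],
        "E": [3, 3, 3, 3],
    }
    query = [10, 0, 5, 0]
    print(identify(query, library, l1=1.0))                       # Lasso on the full library
    print(identify(query, library, l2=1.0, top_k=2, screen=pearson))  # screen, then ridge
```

In the toy example the query coincides with compound A, and the Lasso fit keeps A as the only nonzero coefficient even though the similar spectrum C is also in the library. Coordinate descent was chosen here only because it needs no matrix inversion and therefore runs without external libraries; a production implementation would more likely rely on an established solver.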
Liu, Ruiqi, "Compound identification using penalized linear regression." (2013). Electronic Theses and Dissertations. Paper 844.