Date on Master's Thesis/Doctoral Dissertation
5-2018
Document Type
Master's Thesis
Degree Name
M.S.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science, MS
Committee Chair
Frigui, Hichem
Committee Co-Chair (if applicable)
Zhang, Xiang
Committee Member
Zhang, Xiang
Committee Member
Park, Juw Won
Author's Keywords
machine learning; outlier detection; data normalization
Abstract
In proteomics and metabolomics, quantifying changes in the abundance levels of biomolecules in a biological system involves multiple sample analysis steps, including mass spectrum deconvolution and peak list alignment. Each analysis step introduces some degree of technical variation in the abundance levels (i.e., peak areas) of those molecules. Some steps introduce technical variations that affect the peak areas of all molecules equally, while others affect the peak areas of a subset of molecules to varying degrees. To correct these technical variations, some existing normalization methods simply scale the peak areas of all molecules detected in a sample by a single normalization factor, while others fit a regression model under different assumptions. As a result, local technical variations are ignored and may even be amplified in some cases. To overcome these limitations, we developed a molecule-specific normalization algorithm, called MSN, which adopts a robust surface fitting strategy to minimize the difference in the molecular profiles of a group of housekeeping molecules across samples. Housekeeping molecules are molecules whose abundance levels are not affected by the biological treatment. We also developed an outlier detection algorithm based on the Fisher criterion to detect and remove noisy data points from the experimental data. Applied to two different datasets, MSN proved to be a highly efficient normalization algorithm, yielding higher sensitivity and accuracy than five existing normalization algorithms. Applied to the same datasets, the outlier detection algorithm also proved efficient and robust.
Recommended Citation
Trabelsi, Ameni, "Machine learning for omics data analysis." (2018). Electronic Theses and Dissertations. Paper 2911.
https://doi.org/10.18297/etd/2911