Master's Thesis

Mathematics, MA

Gill, Ryan

Han, Dan

Gaskins, Jeremy

linear methods; variables; regression


In data sets with a small number of observations but a large number of variables measured on each observation, ordinary least squares estimation cannot be used for regression models, because the least squares solution is not unique when the number of variables exceeds the number of observations. There are many alternatives, including stepwise regression, penalized methods such as ridge regression and the LASSO, and methods based on derived inputs such as principal components regression and partial least squares regression. This thesis describes these five methods. K-fold cross-validation is also discussed as a way to determine the regularization parameter for each method. The performance of these methods in estimation and prediction is examined through simulation studies under a variety of scenarios. Finally, each method is applied to a real data set to build a model for the weights of mice based on microarray expression data for a large number of genes.
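As an illustration of how K-fold cross-validation can select a regularization parameter when there are more variables than observations, the following is a minimal sketch for ridge regression only. It is not the thesis's own code: the simulated data, the penalty grid `lambdas`, the choice of 5 folds, and all function names are assumptions made for the example. It fits without an intercept (the simulated data have mean zero) and uses the dual form of the ridge solution, which needs only an n-by-n linear system.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge coefficients via beta = X^T (X X^T + lam I)^{-1} y, lam > 0, no intercept."""
    n, p = len(X), len(X[0])
    G = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(G, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]

def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(X, y, lam, folds):
    """Mean squared prediction error over all held-out observations."""
    total = 0.0
    for test in folds:
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict([X[i] for i in test], beta)
        total += sum((y[i] - yh) ** 2 for i, yh in zip(test, preds))
    return total / len(X)

# Simulated example (hypothetical): n = 20 observations, p = 50 variables,
# only the first three variables have nonzero coefficients.
rng = random.Random(0)
n, p = 20, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.5) for row in X]

folds = kfold_indices(n, 5, rng)
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed candidate grid
cv_errors = [cv_error(X, y, lam, folds) for lam in lambdas]
best_lambda = lambdas[cv_errors.index(min(cv_errors))]
print("CV error by lambda:", list(zip(lambdas, cv_errors)))
print("selected lambda:", best_lambda)
```

The same loop applies to the other methods by swapping `ridge_fit` for a different fitting routine and replacing the penalty grid with that method's tuning parameter, for example the number of components in principal components regression.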