Alzheimer's disease--Diagnosis; Mathematical models
Dimensionality plays a major role in the modeling process. If a data set contains more elements than there are variables in each element, there are very few restrictions on the choice of algorithm, and bagging, or bootstrap aggregating (Breiman, 1994), may also be used to improve a model’s predictive capability. On the other hand, if there are more variables in each observation than observations in the data set, the number of usable algorithms is greatly reduced. Support vector machines, a recently developed algorithm, were designed for such situations, whereas algorithms such as logistic regression become unstable in high dimensions. Localizing or reducing the variables is an option if the loss of information is of little importance. This paper introduces a method called variable bagging (a term inspired by bagging), which lifts the barrier imposed by dimensionality. Instead of randomly selecting elements of the data set and using all the variables, variable bagging randomly selects variables and uses all the elements of the data set to develop an appropriate model chosen by the cross-validation selector. The procedure is repeated until a committee of models is formed, which then “votes” on the final outcome. Theoretical results justifying the use of the cross-validation selector are also discussed. In particular, this paper obtains and proves an improved upper bound for the risk of the cross-validation selector compared with similar upper bounds in the existing literature.
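The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`variable_bagging`, `cv_select`, `fit_centroid`, `fit_1nn`), the two candidate learners (nearest centroid and 1-nearest neighbour), the fold count, and the use of misclassification rate as the cross-validated risk are all assumptions chosen to keep the example self-contained. Each committee member is built on a random subset of variables using all observations, the candidate with the lowest cross-validated error is kept for that subset, and the committee predicts by majority vote.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fit_centroid(X, y):
    """Hypothetical candidate learner: nearest class centroid."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))

def fit_1nn(X, y):
    """Hypothetical candidate learner: 1-nearest neighbour."""
    return lambda row: y[min(range(len(X)), key=lambda j: sq_dist(row, X[j]))]

def cv_select(X, y, learners, k, rng):
    """Return the candidate with the fewest k-fold cross-validation errors."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    def cv_errors(fit):
        errors = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            predict = fit([X[i] for i in train], [y[i] for i in train])
            errors += sum(predict(X[i]) != y[i] for i in fold)
        return errors

    return min(learners, key=cv_errors)

def variable_bagging(X, y, n_members, n_vars, learners, k=5, seed=0):
    """Build a committee; each member sees a random subset of variables."""
    rng = random.Random(seed)
    n_total_vars = len(X[0])
    committee = []
    for _ in range(n_members):
        cols = rng.sample(range(n_total_vars), n_vars)  # random variables
        X_sub = [[row[c] for c in cols] for row in X]   # all observations
        fit = cv_select(X_sub, y, learners, k, rng)     # CV-chosen learner
        committee.append((cols, fit(X_sub, y)))         # refit on all rows

    def predict(row):
        votes = Counter(model([row[c] for c in cols]) for cols, model in committee)
        return votes.most_common(1)[0][0]               # majority vote

    return predict
```

A call such as `variable_bagging(X, y, n_members=7, n_vars=8, learners=[fit_centroid, fit_1nn])` returns a function that classifies a single observation. With two classes, an odd committee size avoids tied votes.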
Godbey, Michael Wayne, "The use of variable-bagging and the cross-validation selector in the prediction of alzheimer’s using the adni database." (2014). Electronic Theses and Dissertations. Paper 1708.