Date on Master's Thesis/Doctoral Dissertation

12-2014

Document Type

Doctoral Dissertation

Degree Name

Ph.D.

Department

Mathematics

Committee Chair

Gill, Ryan

Committee Co-Chair (if applicable)

Lee, Kiseop

Committee Member

Li, Jiaxu

Committee Member

Sahoo, Prasanna

Committee Member

Shafto, Patrick

Subject

Alzheimer's disease--Diagnosis; Mathematical models

Abstract

Dimensionality plays a major role in the modeling process. If a data set contains more observations than variables per observation, there are few restrictions on the choice of algorithm, and bagging, or bootstrap aggregating (Breiman, 1994), may also be used to improve a model's predictive capability. On the other hand, if each observation contains more variables than there are observations in the data set, the number of usable algorithms is greatly reduced. Support vector machines, a more recently developed algorithm, were designed for such situations, in contrast to algorithms such as logistic regression, which suffer instability caused by the dimensionality. Localizing or reducing the variables is an option when the resulting loss of information is of little importance. This paper introduces a method called variable bagging (a term inspired by bagging) which lifts the barrier imposed by dimensionality. Instead of randomly selecting observations and using all the variables, variable bagging randomly selects variables and uses all the observations of the data set to develop an appropriate model chosen by the cross-validation selector. The procedure is repeated several times until a committee is formed to "vote" on the final outcome. Theoretical results justifying use of the cross-validation selector are also discussed. In particular, this paper obtains and proves an improved upper bound for the risk of the cross-validation selector compared with similar upper bounds in the existing literature.
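The committee idea in the abstract can be illustrated with a minimal sketch. This is not the dissertation's actual procedure: the base learner here (a simple nearest-centroid classifier) and the plain majority vote are illustrative assumptions, and the cross-validation selector used in the dissertation to choose among candidate models is omitted for brevity. The sketch only shows the core mechanism: draw a random subset of variables, fit a model on all observations restricted to those variables, repeat, and let the committee vote.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Fit an illustrative base learner: the centroid of each class.
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    # Assign each row to the class whose centroid is nearest (squared distance).
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def variable_bagging(X, y, X_new, n_models=25, subset_size=5, rng=None):
    """Committee of models, each trained on a random subset of variables
    and on ALL observations; final label by majority vote."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    votes = []
    for _ in range(n_models):
        # Randomly select variables (columns), not observations (rows).
        cols = rng.choice(p, size=min(subset_size, p), replace=False)
        model = nearest_centroid_fit(X[:, cols], y)
        votes.append(nearest_centroid_predict(model, X_new[:, cols]))
    votes = np.array(votes)            # shape: (n_models, n_new)
    # Majority vote across the committee for each new observation.
    return np.array([np.bincount(v).argmax() for v in votes.T])
```

Because each committee member sees only `subset_size` variables, every fit stays low-dimensional even when the full data set has far more variables than observations, which is exactly the regime the abstract targets.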

