Date on Master's Thesis/Doctoral Dissertation
Computer Engineering and Computer Science
Elmaghraby, Adel S.
Data classification is a task found in many everyday activities. In general, the term can be applied to any activity that derives a decision or forecast from the currently available information. More precisely, a classification procedure is the construction of a method for making judgments on a continuing sequence of cases, where each new case must be assigned to one of a set of pre-defined classes. This type of construction is termed supervised learning, to distinguish it from unsupervised learning, or clustering, in which the classes are not pre-defined but are inferred from the available data. This thesis is divided into five chapters and analyzes three classification techniques, namely the nearest neighbor technique, the perceptron learning algorithm, and multi-layer perceptrons with backpropagation, with respect to performance and scalability. Chapter one introduces the research topic, states the problem at the core of the thesis, and defines the objective of the study: selecting the most efficient and scalable classification algorithm for a given classification task. Chapter two presents a historical review of the classification literature, focusing on the topics related to this study, and describes some of the newer classification approaches. Chapter three describes the design of the study and the technical methodology used to analyze and investigate the three classification algorithms. Different experiments are introduced to support the findings. The datasets used are considered real-life datasets and represent sports-player and car classification tasks. Chapters four and five form the core of the thesis, as they contain the data analysis, main findings, and conclusions derived from the experiments.
The nearest neighbor classification technique is a lazy learner: before classification can start, it must store all of the training samples. Although it takes more time to classify unknown samples, it is considered the most efficient technique among classification techniques. A natural next step is the single-layer perceptron algorithm, which does not need to store the data samples to reach an acceptable convergence rate. Instead, it speeds up recognition and learning because it learns and stores only the weights of the neural network used to implement the algorithm. This algorithm has a major deficiency: it works only for linearly separable data samples. This motivates moving to a more scalable and efficient technique, the multi-layer perceptron network with backpropagation, which can solve complex and non-linearly separable classification tasks.
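
The contrast drawn above between a lazy learner and a weight-based learner can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the toy AND/XOR datasets, the learning rate, and the epoch limit are all assumptions chosen for illustration. The nearest neighbor routine keeps every training sample and scans them at query time, while the perceptron keeps only its weights and bias, and it stops making errors only when the data are linearly separable.

```python
# Illustrative sketch only: names, toy data, and hyperparameters are
# assumptions, not the thesis's implementation or datasets.

def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN: all training samples are kept and scanned for every query."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[best]


def perceptron_predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(train_x, train_y, epochs=100, lr=1.0):
    """Perceptron learning rule; returns weights, bias, and the number of
    misclassifications seen in the last epoch that was run."""
    weights = [0.0] * len(train_x[0])
    bias = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(train_x, train_y):
            update = lr * (target - perceptron_predict(weights, bias, x))
            if update != 0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:  # every sample classified correctly: converged
            break
    return weights, bias, errors


POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_Y = [0, 0, 0, 1]  # linearly separable
XOR_Y = [0, 1, 1, 0]  # not linearly separable

and_w, and_b, and_errors = train_perceptron(POINTS, AND_Y)
xor_w, xor_b, xor_errors = train_perceptron(POINTS, XOR_Y)
```

On the AND labels the perceptron reaches an error-free epoch, whereas on the XOR labels every epoch contains at least one misclassification, because no single linear boundary separates the two classes. The nearest neighbor routine handles both label sets, at the cost of storing and scanning all samples.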
Mehanna, Fadi Samih Omar, "Towards a scalable and efficient data classification technique." (2005). Electronic Theses and Dissertations. Paper 961.