Date on Master's Thesis/Doctoral Dissertation
5-2019
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Frigui, Hichem
Committee Co-Chair (if applicable)
Amini, Amir
Committee Member
Amini, Amir
Committee Member
Nasraoui, Olfa
Committee Member
Kantardzic, Mehmed
Committee Member
Altiparmak, Nihat
Author's Keywords
machine learning; BEO detection; clustering; multiple instance learning; target concepts; diverse density
Abstract
Multiple Instance Learning (MIL) is an emerging area of machine learning research that aims to develop tools for analyzing data in which objects have multiple representations. In MIL, each object is represented by a bag containing a collection of feature vectors called instances. A bag is positive if it contains at least one positive instance, and negative if none of its instances are positive. One of the main objectives in MIL is to identify a region of the instance feature space that has high correlation with instances from positive bags and low correlation with instances from negative bags; this region is referred to as a target concept (TC). Existing methods identify only a single target concept, provide no mechanism for selecting the appropriate number of target concepts, or provide no flexible representation of target concept memberships. They are therefore not suitable for data with large intra-class variation. In this dissertation, we propose new algorithms that learn multiple target concepts simultaneously. The proposed algorithms combine concepts from data clustering and multiple instance learning. In particular, we propose crisp, fuzzy, and possibilistic variations of the Multi-target concept Diverse Density (MDD) metric, along with three algorithms to optimize them. Each algorithm relies on an alternating optimization strategy that iteratively refines concept assignments, locations, and scales until it converges to an optimal set of target concepts. We also demonstrate how the possibilistic MDD metric can be used to select the appropriate number of target concepts for a dataset. Lastly, we propose constructing classifiers, based on embedded feature space theory, that use our target concepts to predict the labels of prospective MIL data. The proposed algorithms are implemented, tested, and validated through the analysis of multiple synthetic and real-world datasets.
We first demonstrate that our algorithms can detect multiple target concepts reliably and are robust to many generative data parameters. We then demonstrate how our approach can be applied to Buried Explosive Object (BEO) detection to locate distinct target concepts corresponding to the signatures of varying BEO types. We also demonstrate that our classifier strategies perform competitively with other well-established embedded space approaches in the classification of benchmark MIL data.
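For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of the standard MIL bag-labeling rule and of the classic single-concept, noisy-or Diverse Density score that the MDD metric is named after. It is not the dissertation's MDD metric or any of its optimization algorithms. The function names, the Gaussian-like instance similarity, and the toy bags are assumptions chosen purely for illustration.

```python
import math


def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return any(instance_labels)


def instance_prob(instance, concept, scales):
    """Similarity of one instance to a candidate concept, in (0, 1].

    Assumed form: exp(-sum_k (s_k * (x_k - t_k))^2), where the per-feature
    scales s_k weight how much each dimension matters.
    """
    d2 = sum((s * (x - t)) ** 2 for x, t, s in zip(instance, concept, scales))
    return math.exp(-d2)


def diverse_density(concept, scales, positive_bags, negative_bags):
    """Noisy-or Diverse Density of a single candidate target concept.

    High when every positive bag has at least one instance near the concept
    and no instance from any negative bag is near it.
    """
    dd = 1.0
    for bag in positive_bags:
        # Probability that no instance in this positive bag matches the concept.
        miss = 1.0
        for inst in bag:
            miss *= 1.0 - instance_prob(inst, concept, scales)
        dd *= 1.0 - miss
    for bag in negative_bags:
        # Every instance in a negative bag should be far from the concept.
        for inst in bag:
            dd *= 1.0 - instance_prob(inst, concept, scales)
    return dd


# Toy data (hypothetical): both positive bags have an instance near the origin;
# the negative bag only has instances near (5, 5) and (-4, 3).
positive_bags = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, -0.1), (-4.0, 3.0)]]
negative_bags = [[(5.0, 5.1), (-4.0, 3.2)]]
scales = (1.0, 1.0)

dd_near = diverse_density((0.0, 0.0), scales, positive_bags, negative_bags)
dd_far = diverse_density((5.0, 5.0), scales, positive_bags, negative_bags)
print(dd_near > dd_far)
```

In this toy setting the candidate at the origin scores far higher than the candidate at (5, 5), because the latter is missed by the second positive bag and sits next to a negative instance. A single score of this kind captures only one concept at a time, which is the limitation the abstract describes for data with large intra-class variation.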
Recommended Citation
Karem, Andrew D., "Clustering of multiple instance data." (2019). Electronic Theses and Dissertations. Paper 3161.
https://doi.org/10.18297/etd/3161