Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Computer Engineering and Computer Science

Committee Chair

Frigui, Hichem

Author's Keywords

CBIR; Clustering; Content based image retrieval; Constrained clustering; Image processing; Adaptive clustering


Images, Photographic--Databases; Image processing--Digital techniques


The advent of larger storage spaces, affordable digital capturing devices, and an ever growing online community dedicated to sharing images has created a great need for efficient analysis methods. In fact, analyzing images for the purpose of automatic categorization and retrieval is quickly becoming an overwhelming task even for the casual user. Initially, systems designed for these applications relied on contextual information associated with images. However, it was realized that this approach does not scale to very large data sets and can be subjective. Then researchers proposed methods relying on the content of the images. This approach has also proved to be limited due to the semantic gap between the low-level representation of the image and the high-level user perception. In this dissertation, we introduce a novel clustering technique that is designed to combine multiple forms of information in order to overcome the disadvantages observed while using a single information domain. Our proposed approach, called Adaptive Constrained Clustering (ACC), is a robust, dynamic, and semi-supervised algorithm. It is based on minimizing a single objective function incorporating the abilities to: (i) use multiple feature subsets while learning cluster independent feature relevance weights; (ii) search for the optimal number of clusters; and (iii) incorporate partial supervision in the form of pairwise constraints. The content of the images is used to extract the features used in the clustering process. The context information is used in constructing a set of appropriate constraints. These constraints are used as partial supervision information to guide the clustering process. The ACC algorithm is dynamic in the sense that the number of categories are allowed to expand and contract depending on the distribution of the data and the available set of constraints. We show that the proposed ACC algorithm is able to partition a given data set into meaningful clusters using an adaptive, soft constraint satisfaction methodology for the purpose of automatically categorizing and summarizing an image database. We show that the ACC algorithm has the ability to incorporate various types of contextual information. This contextual information includes: spatial information provided by geo-referenced images that include GPS coordinates pinpointing their location, temporal information provided by each image's time stamp indicating the capture time, and textual information provided by a set of keywords describing the semantics of the associated images.