Date on Master's Thesis/Doctoral Dissertation

12-2013

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Computer Engineering and Computer Science

Committee Chair

Nasraoui, Olfa

Committee Co-Chair (if applicable)

Zurada, Jacek M.

Committee Member

Frigui, Hichem

Committee Member

Elmaghraby, Adel

Committee Member

Amini, Amir

Author's Keywords

Clustering; Heterogeneous; Mixed types; Collaborative

Subject

Cluster analysis

Abstract

We propose an Inter-Domain Supervision (IDS) clustering framework to discover clusters within diverse data formats, mixed-type attributes and different sources of data. This approach can be used for combined clustering of diverse representations of the data, in particular where data comes from different sources, some of which may be unreliable or uncertain, or for exploiting optional external concept set labels to guide the clustering of the main data set in its original domain. We additionally take into account possible incompatibilities in the data via an automated inter-domain compatibility analysis. Our results in clustering real data sets with mixed numerical, categorical, visual and text attributes show that the proposed IDS clustering framework gives improved clustering results compared to conventional methods, over a wide range of parameters. Thus, the knowledge automatically extracted from clustering one domain, in the form of seeds or constraints, can guide the clustering in another domain. Additional empirical evaluations further show that our approach, especially when using selective mutual guidance between domains, outperforms common baselines such as clustering either domain on its own or clustering all domains converted to a single target domain. Our approach also outperforms other specialized multiple clustering methods, such as the fully independent ensemble clustering and the tightly coupled multiview clustering, after they were adapted to the task of clustering mixed data. Finally, we present a real-life application of our IDS approach to the cluster-based automated image annotation problem and present evaluation results on a benchmark data set consisting of images described with their visual content along with noisy text descriptions generated by users on the social media sharing website Flickr.

Recommended Citation

Abdullin, Artur, "An inter-domain supervision framework for collaborative clustering of data with mixed types." (2013). Electronic Theses and Dissertations. Paper 6.
https://doi.org/10.18297/etd/6

Included in

Computer Engineering Commons

Electronic Theses and Dissertations

An inter-domain supervision framework for collaborative clustering of data with mixed types.
