Electronic Theses and Dissertations

OptCluster : an R package for determining the optimal clustering algorithm and optimal number of clusters.

Michael N. Sekula, University of Louisville

Date on Master's Thesis/Doctoral Dissertation

5-2015

Document Type

Master's Thesis

Degree Name

M.S.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics with a concentration in Decision Science, MS

Committee Chair

Datta, Susmita

Committee Co-Chair (if applicable)

Datta, Somnath

Committee Member

Gill, Ryan

Subject

Cluster analysis; Biology--Mathematical models; Algorithms; Biomathematics

Abstract

Determining the best clustering algorithm and the ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, data generated from Next Generation Sequencing technology and microarray gene expression data are becoming increasingly common, so new tools and resources are needed to group such high-dimensional data using clustering analysis. Different clustering algorithms can group the same data very differently. Therefore, there is a need to determine the best groupings in a given dataset using the most suitable clustering algorithm for that data. This paper presents the R package optCluster as an efficient way for users to evaluate up to ten clustering algorithms, ultimately determining the optimal algorithm and optimal number of clusters for a given set of data. The selected clustering algorithms are evaluated by as many as nine validation measures classified as “biological”, “internal”, or “stability”, and the final result is obtained through a weighted rank aggregation algorithm based on the calculated validation scores. Two examples using this package are presented, one with a microarray dataset and the other with an RNA-Seq dataset. These two examples highlight the capabilities of the optCluster package and demonstrate its usefulness as a tool in cluster analysis.
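
The idea of combining several validation measures into one ranking can be illustrated with a small sketch. The Python code below is not the optCluster implementation and does not use the package's API; it substitutes a simple weighted Borda count for the package's weighted rank aggregation algorithm. The candidate names, the three measures, the scores, and the equal weights are all hypothetical values chosen for illustration.

```python
# Illustrative only: a weighted Borda count stands in for the weighted
# rank aggregation described in the abstract. All data are made up.

def rank_candidates(scores, higher_is_better=True):
    """Order candidate names from best to worst for one validation measure."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)

def weighted_borda(ranked_lists, weights):
    """Combine several best-to-worst rankings into one consensus ranking."""
    totals = {}
    for ranking, weight in zip(ranked_lists, weights):
        n = len(ranking)
        for position, name in enumerate(ranking):
            # Best position earns n points, worst earns 1, scaled by the weight.
            totals[name] = totals.get(name, 0.0) + weight * (n - position)
    # Highest total first; ties are broken alphabetically by name.
    return sorted(totals, key=lambda name: (-totals[name], name))

# Hypothetical scores for (algorithm, number of clusters) candidates.
silhouette = {"kmeans-3": 0.52, "hierarchical-2": 0.61, "pam-4": 0.40}   # higher is better
connectivity = {"kmeans-3": 12.0, "hierarchical-2": 8.5, "pam-4": 20.1}  # lower is better
dunn = {"kmeans-3": 0.30, "hierarchical-2": 0.25, "pam-4": 0.10}         # higher is better

rankings = [
    rank_candidates(silhouette, higher_is_better=True),
    rank_candidates(connectivity, higher_is_better=False),
    rank_candidates(dunn, higher_is_better=True),
]
consensus = weighted_borda(rankings, weights=[1.0, 1.0, 1.0])
print(consensus)
```

In this sketch each candidate pairs an algorithm with a number of clusters, so the top entry of the consensus list names both at once. Changing the weights would let one measure count for more than another.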

Recommended Citation

Sekula, Michael N., "OptCluster : an R package for determining the optimal clustering algorithm and optimal number of clusters." (2015). Electronic Theses and Dissertations. Paper 2147.
https://doi.org/10.18297/etd/2147

Included in

Bioinformatics Commons, Biostatistics Commons

ThinkIR: The University of Louisville's Institutional Repository
