Electronic Theses and Dissertations

Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

Dazhuo Li, University of Louisville

Date on Master's Thesis/Doctoral Dissertation

5-2012

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Computer Engineering and Computer Science

Committee Chair

Rouchka, Eric Christian

Author's Keywords

Bayesian nonparametrics; Machine learning; Clustering; Bioinformatics; Relational and high dimensional

Subject

Bayesian statistical decision theory; Machine learning; Bioinformatics

Abstract

Recent advances in high-throughput methodologies allow researchers to study complex systems through high-dimensional and multi-relational data. Molecular biology is one example: disparate data, such as gene sequences, gene expression, and interaction information, are available for various snapshots of biological systems. High-dimensional, multi-relational data of this kind allow unprecedentedly detailed analysis, but they also make it difficult to account for all of the variability. High-dimensional data often contain many underlying relationships, each represented by a separate clustering structure, and the number of such structures is typically unknown a priori. To address the challenges that traditional clustering methods face on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) the infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC) based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to inference in BCM, where each view (a set of congruent loci) is intended to represent a consistent evolutionary process. We apply FIRM to categorizing mRNA and microRNA, using latent structures to encode expression patterns and Gene Ontology annotations.
We also apply FIRM to recover the categories of ligands and proteins and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interactions, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to improve predictive performance over traditional clustering methods, in terms of both cluster membership and missing-value prediction. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.
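The abstract notes that the number of clustering structures is unknown a priori and that cross-clustering lets each view (subset of features) carry its own clustering of the objects. The sketch below illustrates only that prior idea and is not the dissertation's BHCC, RBHCC, or FIRM inference. It assumes the common construction in which a Chinese restaurant process (CRP) partitions features into views and an independent CRP partitions objects within each view. The names `crp_partition`, `cross_clustering_prior`, `alpha_view`, and `alpha_cluster` are hypothetical.

```python
import random
from typing import List, Tuple


def crp_partition(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Sample a partition of n items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha), so the
    number of clusters is not fixed in advance. Labels are 0..K-1 in order
    of first appearance. Requires alpha > 0.
    """
    labels: List[int] = []
    counts: List[int] = []
    for _ in range(n):
        weights = counts + [alpha]  # existing clusters, then a new one
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)        # open a new cluster
        else:
            counts[k] += 1          # join an existing cluster
        labels.append(k)
    return labels


def cross_clustering_prior(
    n_objects: int,
    n_features: int,
    alpha_view: float,
    alpha_cluster: float,
    seed: int = 0,
) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical nested-CRP prior for cross-clustering (illustration only).

    Returns the view label of each feature and, for each view, an
    independent clustering of all objects.
    """
    rng = random.Random(seed)
    feature_view = crp_partition(n_features, alpha_view, rng)
    n_views = max(feature_view) + 1
    object_clusters = [
        crp_partition(n_objects, alpha_cluster, rng) for _ in range(n_views)
    ]
    return feature_view, object_clusters


if __name__ == "__main__":
    views, clusterings = cross_clustering_prior(10, 6, 1.0, 1.0, seed=42)
    print("feature -> view:", views)
    for v, c in enumerate(clusterings):
        print(f"view {v} object clusters:", c)
```

Each view gets its own object partition, so two groups of features can support different groupings of the same objects. The number of views and clusters is drawn rather than preset, with larger concentration parameters favoring more of them.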

Recommended Citation

Li, Dazhuo, "Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics." (2012). Electronic Theses and Dissertations. Paper 827.
https://doi.org/10.18297/etd/827

ThinkIR: The University of Louisville's Institutional Repository