Date on Master's Thesis/Doctoral Dissertation


Document Type

Master's Thesis

Degree Name



Computer Engineering and Computer Science

Degree Program

Computer Science, MS

Committee Chair

Nasraoui, Olfa

Committee Co-Chair (if applicable)

Frigui, Hichem

Committee Member

Frigui, Hichem

Committee Member

Barnes, Gregory

Author's Keywords

clustering; machine learning; autism


Autism spectrum disorder (ASD) is a developmental disorder that affects communication and behavior. In recent years, several studies have sought a better understanding of the disorder, and therefore better diagnosis and treatment, by analyzing diverse data sets consisting of behavioral surveys and tests, phenotype descriptions, and brain imagery. However, data analysis is challenged by the diversity, complexity and heterogeneity of patient cases, and by the need to integrate these diverse data sets. The aim of our study is to mine homogeneous groups of patients from a heterogeneous set of data consisting of both Autism Diagnostic Observation Schedule (ADOS) and behavioral data sets, and to interpret the discovered clusters within the medical context of the affected brain areas using functional Magnetic Resonance Imaging (fMRI) data. We developed an unsupervised machine learning pipeline to mine a heterogeneous data set consisting of three sources: standardized ADOS scores, which are metrics used to measure autism severity; phenotypical and behavioral data, which are used to identify behavioral problems in autistic patients; and fMRI, a technique for measuring and mapping brain activity. Our Big Data pipeline uses three clustering algorithms to partition the patients into homogeneous groups: hierarchical clustering, spectral clustering and spectral co-clustering. In addition, we design a general framework that adds explainability to clustering algorithms, helping end users make sense of the clustering outputs by answering their questions about the results relative to the input data itself as well as to available external evidence. Our clustering algorithms were able to discover homogeneous groups of patients that share similar behavioral and phenotypical characteristics.
Furthermore, we generate an accessible interpretation of the clustering results by mapping the discovered clusters onto the brain structure. Through its clustering and explanation modules, our unsupervised machine learning methodology enables domain experts to analyze homogeneous groups of cases in depth, for example by searching for hidden associations in the genetic data of patients belonging to the same cluster, in order to better understand ASD and to pave the way toward data-driven personalized medicine.
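
To make the clustering step more concrete, the following is a minimal sketch of agglomerative (hierarchical) clustering, the first of the three algorithms named above. It is an illustration only and not the thesis's implementation: the function name `average_linkage`, the choice of average linkage with Euclidean distance, and the score vectors are all assumptions made up for this example, not data or settings from the study.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Hypothetical helper for illustration. Cluster distance is the average
    pairwise Euclidean distance between members (average linkage).
    Returns one integer label per input point.
    """
    # Start with every point in its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # combinations() guarantees a < b, so deleting b leaves a in place.
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Made-up score vectors for six hypothetical patients (not study data).
patients = [
    (1.0, 2.0, 1.5),
    (1.2, 1.8, 1.4),
    (0.9, 2.1, 1.6),
    (7.0, 8.0, 7.5),
    (7.2, 7.9, 7.7),
    (6.8, 8.1, 7.4),
]
labels = average_linkage(patients, n_clusters=2)
print(labels)
```

On real data of the kind described in the abstract, a library implementation would likely be preferable to this quadratic-per-merge sketch, and the number of clusters and the linkage criterion would need to be chosen and validated rather than fixed in advance.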