Date on Master's Thesis/Doctoral Dissertation

12-2025

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Interdisciplinary Studies

Degree Program

Interdisciplinary Studies with a specialization in Bioinformatics, PhD

Committee Chair

Rouchka, Eric

Committee Member

Park, Juw Won

Committee Member

Petruska, Jeffrey

Committee Member

Mistry, Akshitkumar

Committee Member

Gill, Ryan

Author's Keywords

bioinformatics; scRNA-seq; topological data analysis; persistent homology; gene expression

Abstract

As the volume of single-cell RNA sequencing (scRNA-seq) data grows, robust methods for integrating diverse datasets become critical. This dissertation applies Persistent Homology (PH), a technique from Topological Data Analysis (TDA), to a collection of scRNA-seq datasets spanning eight tissue types to quantify how data integration affects topological features and biological interpretability. We assessed global topological structure using Betti curves, Euler characteristics, and persistence landscapes across raw, normalized, and integrated data representations. Our analysis revealed a performance inversion: conventional methods excelled on unintegrated data, whereas high-granularity topological methods, particularly those sensitive to global data structure, became superior after integration. This suggests a synergy in which standard integration algorithms, while potentially disrupting local data structures, clarify the large-scale global topology that landscape-based metrics are well suited to detect. In contrast, low-granularity statistical summaries, such as mean landscape curves, were most effective at distinguishing tissues in the unintegrated data, but this statistical separability was lost after harmonization. Despite this convergence of averages, landscape-based distance metrics successfully partitioned tissues after integration, demonstrating that the full pairwise distance matrix retains biologically relevant information missed by simpler statistical summaries. This work establishes a framework for using topological methods to assess integration quality and demonstrates that PH can serve as a complementary strategy for interpreting complex transcriptomic landscapes beyond conventional clustering.
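
For readers unfamiliar with the summaries named in the abstract, the following is a minimal sketch of how a Betti curve and an Euler characteristic curve can be derived from persistence intervals. It is not the dissertation's pipeline: the function names `betti_curve` and `euler_curve`, the toy intervals, and the threshold grid are all hypothetical, and the intervals are assumed to have been computed already by some persistent homology library.

```python
# Illustrative sketch only; names and toy data are assumptions, not the author's code.

def betti_curve(intervals, thresholds):
    """Count the persistence intervals [birth, death) alive at each threshold."""
    return [sum(1 for birth, death in intervals if birth <= t < death)
            for t in thresholds]


def euler_curve(diagrams, thresholds):
    """Alternating sum of Betti curves over homology dimensions."""
    curves = {dim: betti_curve(ivs, thresholds) for dim, ivs in diagrams.items()}
    return [sum((-1) ** dim * curve[i] for dim, curve in curves.items())
            for i in range(len(thresholds))]


# Hypothetical persistence diagrams, keyed by homology dimension.
diagrams = {
    0: [(0.0, float("inf")), (0.0, 0.5), (0.0, 1.0)],  # connected components
    1: [(1.2, 2.0)],                                   # one loop
}
thresholds = [0.0, 0.25, 0.75, 1.5, 2.5]

betti0 = betti_curve(diagrams[0], thresholds)
betti1 = betti_curve(diagrams[1], thresholds)
euler = euler_curve(diagrams, thresholds)
print(betti0, betti1, euler)
```

Each curve is a fixed-length vector once a threshold grid is chosen, which is what makes such summaries comparable across datasets; persistence landscapes serve a similar purpose but retain more detail about individual intervals.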
