Date on Master's Thesis/Doctoral Dissertation
12-2025
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Interdisciplinary Studies
Degree Program
Interdisciplinary Studies with a specialization in Bioinformatics, PhD
Committee Chair
Rouchka, Eric
Committee Member
Park, Juw Won
Committee Member
Petruska, Jeffrey
Committee Member
Mistry, Akshitkumar
Committee Member
Gill, Ryan
Author's Keywords
bioinformatics; scRNA-seq; topological data analysis; persistent homology; gene expression
Abstract
As single-cell RNA sequencing (scRNA-seq) data accumulate, robust methods for integrating diverse datasets are critical. This dissertation applies Persistent Homology (PH), a technique from Topological Data Analysis (TDA), to a collection of scRNA-seq datasets spanning eight tissue types to quantify how data integration affects topological features and biological interpretability. We assessed global topological structure using Betti curves, Euler characteristics, and persistence landscapes across raw, normalized, and integrated data representations. Our analysis revealed a performance inversion: conventional methods excelled on unintegrated data, whereas high-granularity topological methods, particularly those sensitive to global data structure, became superior after integration. This suggests a synergy in which standard integration algorithms, while potentially disrupting local data structure, clarify the large-scale topology that landscape-based metrics are well suited to detect. Conversely, we found that low-granularity statistical summaries, such as mean landscape curves, were most effective at distinguishing tissues in the unintegrated data, but this statistical separability was lost after harmonization. Despite this convergence of averages, landscape-based distance metrics still partitioned tissues successfully after integration, demonstrating that the full pairwise distance matrix retains biologically relevant information that simpler statistical summaries miss. This work establishes a framework for using topological methods to assess integration quality and demonstrates that PH can serve as a complementary strategy for interpreting complex transcriptomic landscapes beyond conventional clustering.
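The abstract names three global summaries (Betti curves, Euler characteristics, persistence landscapes) and a landscape-based pairwise distance matrix. The pure-Python code below is a minimal sketch of how each can be evaluated from persistence diagrams. It assumes the diagrams have already been computed (for example from a Vietoris-Rips filtration on an expression embedding) and that infinite bars have been truncated to finite deaths. It also assumes all summaries are evaluated on a shared uniform grid and that integrals are approximated with the trapezoidal rule. The landscape uses the standard tent-function definition. All function names, the grid, the number of landscape levels, and the toy diagrams are hypothetical illustrations, not the dissertation's implementation or data.

```python
"""Toy sketch: Betti curves, Euler characteristic curves, persistence
landscapes, and an L2 landscape distance matrix from persistence diagrams.

Assumptions (not taken from the dissertation): diagrams are lists of
(birth, death) pairs with finite deaths, summaries share a uniform grid,
and integrals use the trapezoidal rule.
"""
from typing import Dict, List, Sequence, Tuple

Diagram = List[Tuple[float, float]]


def uniform_grid(start: float, stop: float, n: int) -> List[float]:
    """Return n evenly spaced points from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def betti_curve(dgm: Diagram, grid: Sequence[float]) -> List[int]:
    """Betti number at each t: count of bars with birth <= t < death."""
    return [sum(1 for b, d in dgm if b <= t < d) for t in grid]


def euler_curve(dgms: Dict[int, Diagram], grid: Sequence[float]) -> List[int]:
    """Euler characteristic chi(t) = sum over dimensions k of (-1)^k * beta_k(t)."""
    curves = {k: betti_curve(dgm, grid) for k, dgm in dgms.items()}
    return [sum((-1) ** k * curves[k][i] for k in curves) for i in range(len(grid))]


def landscape(dgm: Diagram, grid: Sequence[float], num_levels: int) -> List[List[float]]:
    """First num_levels landscape functions lambda_1..lambda_K sampled on grid.

    lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)) over all bars.
    """
    levels = [[0.0] * len(grid) for _ in range(num_levels)]
    for i, t in enumerate(grid):
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in dgm), reverse=True)
        for k in range(min(num_levels, len(tents))):
            levels[k][i] = tents[k]
    return levels


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Trapezoidal-rule integral of values sampled on grid."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def landscape_distance(a: Diagram, b: Diagram, grid: Sequence[float], num_levels: int = 3) -> float:
    """Approximate L2 distance between the persistence landscapes of a and b."""
    la, lb = landscape(a, grid, num_levels), landscape(b, grid, num_levels)
    total = sum(
        trapezoid([(x - y) ** 2 for x, y in zip(la[k], lb[k])], grid)
        for k in range(num_levels)
    )
    return total ** 0.5


def distance_matrix(dgms: Sequence[Diagram], grid: Sequence[float], num_levels: int = 3) -> List[List[float]]:
    """Pairwise landscape distances, e.g. one H1 diagram per sample."""
    n = len(dgms)
    return [[landscape_distance(dgms[i], dgms[j], grid, num_levels) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    grid = uniform_grid(0.0, 4.0, 81)
    # Hypothetical toy H1 diagrams, not data from the dissertation.
    h1_a = [(0.5, 2.5), (1.0, 1.5)]
    h1_b = [(0.6, 2.4), (1.1, 1.4)]
    h1_c = [(0.2, 3.8)]
    for row in distance_matrix([h1_a, h1_b, h1_c], grid):
        print(" ".join(f"{x:.3f}" for x in row))
```

In this toy setup, the two similar diagrams sit closer to each other than to the third, and the resulting matrix could be handed to any clustering routine that accepts precomputed distances. Grid resolution and the number of landscape levels trade accuracy for cost. Dedicated TDA libraries such as GUDHI offer vectorized versions of these summaries, but the functions above are plain Python and do not mirror any particular package's API.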
Recommended Citation
Daneshmand, Jonah, "A persistent homology framework for scRNA-Seq: Assessing clustering robustness and quantifying preprocessing and integration effects on topological features." (2025). Electronic Theses and Dissertations. Paper 4649.
Retrieved from https://ir.library.louisville.edu/etd/4649
Included in
Bioinformatics Commons, Biostatistics Commons, Computational Biology Commons, Data Science Commons, Geometry and Topology Commons, Other Cell and Developmental Biology Commons, Other Genetics and Genomics Commons, Systems Biology Commons