Date on Master's Thesis/Doctoral Dissertation
5-2023
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Interdisciplinary and Graduate Studies
Degree Program
Interdisciplinary Studies with a specialization in Bioinformatics, PhD
Committee Chair
Rouchka, Eric
Committee Co-Chair (if applicable)
Mitra, Riten
Committee Member
Mitra, Riten
Committee Member
Petruska, Jeffrey
Committee Member
Park, Juw Won
Author's Keywords
HMM; BAHCC1; PLXN; variants in G4; thermodynamic stability
Abstract
G quadruplex (G4) structures are secondary structures located throughout the genomes of various organisms and are involved in regulatory functions including transcription, translation, genome stability, epigenetic regulation, and cell division. Although G4 structures are widely acknowledged in vivo, there are currently no search tools for G quadruplexes that build on already identified G quadruplexes and on families identified across different genomes based on sequence diversity. Constructing families of G4 sequences and identifying their polymorphisms in diseases and disorders will lead to a better understanding of their functional roles and will further research into the biophysical modeling of interactions with oligonucleotide treatments of disease. The first project aims to develop a framework for clustering G4 sequences into families based on sequence, structure, and thermodynamic properties. No current search tools exist to filter G4s based on their properties, and the diversity of G4 sequences across the genome is not fully understood. To address this gap, we used a combination of clustering and annotation methods to identify 95 families of G4 sequences within the human genome. Profiles for each family were created using hidden Markov models, and their thermodynamic properties, functional annotations, and transcription factor binding motifs were analyzed. The second project aims to investigate the effect of single nucleotide variations (SNVs) on G4 structures in disease contexts. Although the role of G4s in cancer and metabolic disorders is well established, the effect of SNVs on G4s has not been extensively studied. Using the COSMIC and ClinVar databases, we identified over 37,000 G4 SNVs and analyzed their effects on G4 secondary structures. We found that a significant proportion of SNVs result in G4 loss or gain, and we identified genes enriched for destabilizing SNVs in G4-forming regions.
We also analyzed mutational patterns in the G4 structure and found a higher selective pressure on the coding region of the template strand. Our findings provide insights into the effects of SNVs on G4 structures and highlight potential targets for therapeutic intervention in diseases associated with G4 dysregulation.
Recommended Citation
Neupane, Aryan, "Clustering and analysis of g quadruplex sequences." (2023). Electronic Theses and Dissertations. Paper 4058.
https://doi.org/10.18297/etd/4058