Date on Master's Thesis/Doctoral Dissertation

8-2025

Document Type

Master's Thesis

Degree Name

M.S.

Department

Bioinformatics and Biostatistics

Degree Program

Biostatistics, MS

Committee Chair

Sekula, Michael

Committee Co-Chair (if applicable)

Kong, Maiying

Committee Member

Cash, Elizabeth

Author's Keywords

Breast cancer; TCGA-BRCA; RNA-Seq; Cox proportional hazards model; LASSO feature selection; cancer genomics

Abstract

High-dimensional genomic data offer both promise and challenges for identifying clinically relevant biomarkers. This study developed a parallelized survival modeling pipeline to identify genes associated with overall survival in breast cancer, with a focus on gene-by-treatment interactions and patient heterogeneity. RNA-Seq data from female patients in the TCGA-BRCA cohort were analyzed. Univariate (single-gene) Cox proportional hazards models were used to screen genes, each adjusted for age, race/ethnicity, treatment status, and cancer stage. LASSO-penalized Cox regression was fit across 2,000 random seeds to assess feature stability. Genes were filtered by expression level, statistical significance, and hazard ratio (effect size) in either direction, then re-evaluated in a multivariable Cox model. Several genes with statistically significant treatment interactions were identified, including novel candidates not present in established prognostic panels. These findings highlight the value of interaction-aware survival modeling for improving personalized prognostic prediction in breast cancer and underscore the importance of accounting for treatment heterogeneity in high-dimensional genomic analyses.
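
The abstract does not say how feature stability across seeds was summarized. One common approach is to record which genes each seeded LASSO fit retains and report the fraction of runs that selected each gene. The sketch below illustrates only that tallying step, under stated assumptions: the function names, the gene labels, and the frequency threshold are hypothetical, the toy input is not data from the thesis, and the LASSO-penalized Cox fits themselves are not shown.

```python
from collections import Counter


def selection_frequency(selected_per_seed):
    """Return the fraction of runs in which each gene was selected.

    selected_per_seed: one collection of gene names per seeded fit
    (assumed to be the genes with nonzero coefficients in that fit).
    """
    n_runs = len(selected_per_seed)
    if n_runs == 0:
        raise ValueError("need at least one run")
    counts = Counter()
    for genes in selected_per_seed:
        counts.update(set(genes))  # count each gene at most once per run
    return {gene: count / n_runs for gene, count in counts.items()}


def stable_genes(selected_per_seed, min_frequency=0.8):
    """Return genes selected in at least min_frequency of runs, sorted by name.

    The 0.8 default is an illustrative assumption, not a value from the thesis.
    """
    freqs = selection_frequency(selected_per_seed)
    return sorted(g for g, f in freqs.items() if f >= min_frequency)


# Toy input with hypothetical gene labels; four runs stand in for 2,000 seeds.
toy_runs = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A"},
]
freqs = selection_frequency(toy_runs)
stable = stable_genes(toy_runs, min_frequency=0.75)
```

In a full pipeline each element of `toy_runs` would come from one seeded fit, and the resulting frequencies could be combined with the expression, significance, and hazard-ratio filters before the multivariable Cox model.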
