Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Biochemistry and Molecular Biology

Degree Program

Biochemistry and Molecular Biology, PhD

Committee Chair

Gregg, Ronald

Committee Co-Chair (if applicable)

Rouchka, Eric

Committee Member

Rouchka, Eric

Committee Member

Samuelson, David

Committee Member

Smith, Melissa

Committee Member

Yan, Jun

Committee Member

Watson, Corey

Author's Keywords

genetics; immunogenetics; immunology; population genetics; genomics; antibodies


The adaptive immune system relies on a diverse set of more than one hundred immunoglobulin (IG) genes, spread across three genomic loci, that are variably combined to form antibodies (Ab). The IG lambda (IGL) locus is one of two loci that encode the IG light chain. The complexity of the IGL locus severely limits the effective use of standard short-read sequencing, which has restricted our knowledge of population diversity at this locus. We leveraged single-molecule real-time (SMRT) long-read sequencing in conjunction with IGL-targeted DNA capture to develop IG-Cap, a method for accurate, high-throughput sequencing of the IGL locus. We benchmarked this method against six gold-standard assemblies of the IGL locus. Using IG-Cap and whole-genome long-read sequencing data, we resolved the IGL locus in 238 individuals of diverse population origins. In these individuals, we identified 207 novel IGL alleles and resolved multiple large structural variants, including a 60 kb deletion affecting 6 functional IGLV genes and population-variable duplications in the IGL constant region. Additionally, we identified signatures of balancing and purifying selection in and around functional genes across the IGL locus, including gene-specific patterns of heterozygosity and allelic richness. Finally, we found that IGLV alleles are enriched for nonsynonymous mutations resulting in disparate amino acid changes. Overall, this work reveals significant unexplored diversity in the IGL locus and provides an important set of genomic tools and resources to enable future functional studies, disease association studies, and targeted therapeutic development.