Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Bioinformatics and Biostatistics

Degree Program

Biostatistics, PhD

Committee Chair

Wu, Dongfeng

Committee Co-Chair (if applicable)

Gaskins, Jeremy

Committee Member

Gaskins, Jeremy

Committee Member

Zheng, Qi

Committee Member

Sekula, Michael

Committee Member

Seow, Albert

Author's Keywords

Lung cancer screening; sojourn time; transition time; sensitivity; probability of overdiagnosis; optimal scheduling time


This dissertation consists of three research projects on probability modeling of cancer screening. In these projects, the three key modeling parameters for cancer screening (sensitivity, sojourn time, and transition density) were estimated. Long-term outcomes (including overdiagnosis), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at the future screening time were then simulated to provide a statistical perspective on the effectiveness of cancer screening programs.

In the first part of this dissertation, statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A likelihood function was applied to the lung cancer screening data to obtain Bayesian estimates of the transition probability and the sojourn time distribution. The estimates showed that male smokers had a higher transition probability density than same-aged female smokers, indicating that they are more susceptible to lung cancer. Furthermore, female smokers had a slightly shorter mean sojourn time than male smokers, suggesting that they develop clinical symptoms of lung cancer faster.

In the second part, the probability model was applied to assess the long-term effects of cancer screening. Screening participants were categorized into four mutually exclusive groups: symptom-free-life, no-early-detection, true-early-detection, and overdiagnosis. To estimate the probability of overdiagnosis, simulation studies and Bayesian inference were conducted, considering factors such as a person's age at study entry, screening frequency, screening sensitivity, and other relevant parameters. The probability of overdiagnosis among screen-detected cases was relatively low but increased with the age at the initial screen. Males were more susceptible to overdiagnosis than females. The model can provide policymakers with essential information about the distribution of individuals in the overdiagnosis and true-early-detection groups, enabling them to minimize the long-term effects resulting from frequent screenings.

In the third part of this dissertation, a recently developed statistical method was applied to the NLST chest X-ray data to find the optimal time for initiating chest X-ray screening in asymptomatic individuals. Incidence probability was used to control the risk of clinical incidence before the first exam, constraining it to a small value given one's current age. The simulation study showed that the optimal screening age interval remained relatively consistent as the current age increased. Notably, male heavy smokers tended to have slightly later screening ages than females, which contrasted with the findings from the NLST CT data. Once the future screening time/age was found, the lead time distribution and the probability of overdiagnosis were estimated, conditional on the individual being diagnosed at that future time/age. The lead time was relatively consistent across values of the incidence probability and sensitivity, decreased slightly in mean as the current age increased, and was positively correlated with the sojourn time. The probability of overdiagnosis was positively correlated with the mean sojourn time, incidence probability, and current age, and changed only slightly with sensitivity. Overall, the probability of overdiagnosis was small and was not a concern at younger ages.
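
The four-group classification described in the second part can be illustrated with a small Monte Carlo sketch. This is not the dissertation's model or code: the function names, the uniform and exponential distributions, and every parameter value below are hypothetical placeholders chosen only to make the example runnable. It assumes a progressive disease model in which a person enters the preclinical state at `onset_age`, would develop symptoms after a sojourn time, and is screened at fixed ages with a constant sensitivity.

```python
import random

GROUPS = ("symptom_free_life", "no_early_detection",
          "true_early_detection", "overdiagnosis")

def classify(onset_age, sojourn, death_age, screen_ages, sensitivity, rng):
    """Assign one simulated person to one of the four groups.

    screen_ages must be sorted in increasing order.
    """
    clinical_age = onset_age + sojourn  # age at which symptoms would appear
    for age in screen_ages:
        if age >= death_age or age >= clinical_age:
            break  # screening stops at death or at clinical diagnosis
        in_preclinical = onset_age <= age
        if in_preclinical and rng.random() < sensitivity:
            # Screen-detected: overdiagnosis if death would precede symptoms.
            if death_age < clinical_age:
                return "overdiagnosis"
            return "true_early_detection"
    # Never screen-detected.
    if clinical_age < death_age:
        return "no_early_detection"
    return "symptom_free_life"

def simulate(n=10_000, seed=0, first_screen=55.0, interval=1.0,
             n_screens=10, sensitivity=0.8, mean_sojourn=2.0):
    """Count group membership for n simulated screening participants."""
    rng = random.Random(seed)
    screen_ages = [first_screen + k * interval for k in range(n_screens)]
    counts = dict.fromkeys(GROUPS, 0)
    for _ in range(n):
        # Hypothetical draws (assumptions, not estimates from NLST data).
        # Simplification: everyone is disease-free at the first screen.
        onset_age = rng.uniform(first_screen, 120.0)
        sojourn = rng.expovariate(1.0 / mean_sojourn)
        death_age = rng.uniform(first_screen, 100.0)
        group = classify(onset_age, sojourn, death_age,
                         screen_ages, sensitivity, rng)
        counts[group] += 1
    return counts

if __name__ == "__main__":
    counts = simulate()
    detected = counts["true_early_detection"] + counts["overdiagnosis"]
    print(counts)
    if detected:
        print("overdiagnosis share among screen-detected cases:",
              counts["overdiagnosis"] / detected)
```

In this sketch, the probability of overdiagnosis among screen-detected cases is the overdiagnosis count divided by the sum of the overdiagnosis and true-early-detection counts. Changing `first_screen`, `interval`, `sensitivity`, or `mean_sojourn` shows how that share responds to each input under these toy assumptions only.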