Date on Master's Thesis/Doctoral Dissertation


Document Type

Master's Thesis

Degree Name

M. Eng.


Electrical and Computer Engineering

Committee Chair

Graham, James H.

Committee Co-Chair (if applicable)

Farag, Aly A.




From a computerized image-analysis perspective, early diagnosis of lung cancer involves detecting suspicious nodules and classifying them into different pathologies. The detection stage consists of a detection step, usually template matching, followed by a verification step that reduces false positives, usually performed by a classifier of one form or another; statistical, fuzzy-logic, and support vector machine approaches have all been tried. The classification stage matches, according to a particular approach, the characteristics (e.g., shape, texture, and spatial distribution) of the detected nodules to the same characteristics of nodules with known pathologies (confirmed by biopsy). This thesis focuses on the first stage, i.e., nodule detection. Specifically, it addresses three issues: a) understanding the CT data of typical low-dose CT (LDCT) scanning of the chest, and devising an image processing approach to reduce the inherent artifacts in the scans; b) devising an image segmentation approach to isolate the lung tissues from the rest of the chest and thoracic regions in the CT scans; and c) devising a nodule modeling methodology that enhances the detection rate and benefits the ultimate step in computerized image analysis of LDCT of the lungs, namely associating a pathology with the detected nodule.

The methodology for reducing noise artifacts is based on noise analysis and examination of typical LDCT scans, which may be acquired repeatedly; because excessive radiation must be avoided, a reduction in resolution is inevitable. Two optimal filtering methods, the Wiener filter and the anisotropic diffusion filter, are tested on samples of the ELCAP screening data. Preference is given to the anisotropic diffusion filter, which can be implemented on 7×7 blocks/windows of the CT data.
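The abstract does not give the filter equations. As an illustration only, the sketch below implements the classic Perona-Malik anisotropic diffusion scheme on a 2-D slice with NumPy. The function name `anisotropic_diffusion` and the parameters `n_iter`, `kappa`, `gamma`, and `option` are hypothetical choices, not the settings used in the thesis, and the 7×7 block/window implementation mentioned above is not reproduced.

```python
import numpy as np


def anisotropic_diffusion(image, n_iter=10, kappa=30.0, gamma=0.2, option=1):
    """Perona-Malik anisotropic diffusion of a 2-D array (illustrative sketch).

    kappa:  edge-stopping constant, in the same units as the image.
    gamma:  time step; values <= 0.25 keep the 4-neighbour scheme stable.
    option: 1 -> c(g) = exp(-(g/kappa)^2), privileges high-contrast edges;
            2 -> c(g) = 1 / (1 + (g/kappa)^2), privileges wide regions.
    """
    if option == 1:
        def conduct(g):
            return np.exp(-(g / kappa) ** 2)
    elif option == 2:
        def conduct(g):
            return 1.0 / (1.0 + (g / kappa) ** 2)
    else:
        raise ValueError("option must be 1 or 2")

    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicate the border so the outermost pixels see a zero difference
        # across the image boundary and no intensity flows out of the image.
        p = np.pad(u, 1, mode="edge")
        d_n = p[:-2, 1:-1] - u  # north neighbour minus centre
        d_s = p[2:, 1:-1] - u   # south neighbour minus centre
        d_w = p[1:-1, :-2] - u  # west neighbour minus centre
        d_e = p[1:-1, 2:] - u   # east neighbour minus centre
        # Small differences (noise) diffuse almost freely; large ones (edges)
        # get a conductance near zero and are preserved.
        u += gamma * (conduct(d_n) * d_n + conduct(d_s) * d_s
                      + conduct(d_w) * d_w + conduct(d_e) * d_e)
    return u
```

Because the conductance depends only on the magnitude of each neighbour difference, the flux between two pixels is equal and opposite, so the mean intensity of the slice is preserved while noise is smoothed. In practice `kappa` would be tuned relative to the noise level of the LDCT data.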
The methodology for lung segmentation is based on an inherent characteristic of LDCT scans: their distinctly bimodal gray-scale histogram. A linear model describes the histogram (the joint probability density function of lung and non-lung tissues) as a linear combination of weighted kernels. Gaussian kernels were chosen, and the classic Expectation-Maximization (EM) algorithm was employed to estimate the marginal probability densities of the lung and non-lung tissues and to select an optimal segmentation threshold. The segmentation is further refined using standard shape analysis based on mathematical morphology, which improves the continuity of the outer and inner borders of the lung tissues. This approach (a preliminary version of which appeared in [14]) is found to be adequate for lung segmentation compared with more sophisticated approaches developed at the CVIP Lab (e.g., [15][16]) and elsewhere.

The methodology developed for nodule modeling is based on understanding the physical characteristics of nodules in LDCT scans, as identified by human experts. An empirical model is introduced for the probability density of the image intensity (or Hounsfield units) versus the radial distance from the centroid (center of mass) of typical nodules. This probability density showed that the spatial support of a nodule lies within a circle/square of size 10 pixels, i.e., limited to 5 mm in length, which is within the range that radiologists specify as being of concern. The density is then used to fill in the intensity (or Hounsfield units) of parametric nodule models. For these models (e.g., circles or semicircles) of a given radius, the intensity (or Hounsfield units) is calculated from an exponential expression in the radial distance, with parameters estimated from the histogram of an ensemble of typical nodules.
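For the lung-segmentation step, a minimal sketch of EM-based threshold selection is given below. It assumes the simplest case of one Gaussian kernel per tissue class (lung and non-lung), fits the mixture to raw voxel intensities rather than to a binned histogram, and places the threshold where the two weighted densities cross (the Bayes decision boundary). The function names `fit_two_gaussians` and `optimal_threshold` are hypothetical; the thesis's linear combination of kernels may use more components, and the morphological post-processing is omitted.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances) with components sorted by mean,
    so index 0 is the darker class (lung) and index 1 the brighter one.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    w = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75])       # crude split of the two modes
    var = np.full(2, x.var())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and posterior responsibilities.
        dens = w / np.sqrt(2 * np.pi * var) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        total = np.maximum(dens.sum(axis=1), 1e-300)
        resp = dens / total[:, None]
        # M-step: closed-form updates of weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
        # Stop once the log-likelihood no longer increases noticeably.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return w[order], mu[order], var[order]


def optimal_threshold(w, mu, var, num=10001):
    """Intensity between the two means where w1*N1 first exceeds w0*N0."""
    t = np.linspace(mu[0], mu[1], num)
    log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
             - (t[:, None] - mu) ** 2 / (2 * var))
    # If class 1 never dominates on the grid, argmax returns 0 (t = mu[0]).
    idx = np.argmax(log_p[:, 1] > log_p[:, 0])
    return t[idx]
```

Voxels below the returned threshold would be labeled as lung candidates; a morphological clean-up of the resulting binary mask would follow.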
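The abstract states only that the nodule-model intensity follows an exponential expression in the radial distance. The sketch below assumes one such form, q(r) = q_max · exp(-(r/ρ)²), with ρ chosen so that the intensity falls to q_min at the template radius; this expression, the positive gray-level scale, and the function name `nodule_template` are assumptions for illustration, not the thesis's fitted model. A radius of 5 pixels gives the 10-pixel support mentioned above.

```python
import numpy as np


def nodule_template(radius, q_max, q_min, shape="circle", background=0.0):
    """Hypothetical parametric nodule template (illustrative sketch).

    Inside the support, q(r) = q_max * exp(-(r / rho) ** 2), where rho is
    chosen so that q(radius) == q_min. Intensities are assumed to be
    positive gray levels with q_max > q_min > 0 (not raw Hounsfield units).
    shape: "circle" or "semicircle" (keeps the upper half of the disc).
    """
    if not q_max > q_min > 0:
        raise ValueError("expected q_max > q_min > 0")
    if shape not in ("circle", "semicircle"):
        raise ValueError("shape must be 'circle' or 'semicircle'")
    # exp(-(radius / rho)^2) = q_min / q_max  =>  rho = radius / sqrt(ln(q_max / q_min))
    rho = radius / np.sqrt(np.log(q_max / q_min))
    half = int(np.ceil(radius))
    size = 2 * half + 1
    yy, xx = np.mgrid[0:size, 0:size]
    r = np.hypot(yy - half, xx - half)  # radial distance from the centroid
    support = r <= radius
    if shape == "semicircle":
        support &= yy <= half
    tpl = np.full((size, size), background, dtype=np.float64)
    tpl[support] = q_max * np.exp(-(r[support] / rho) ** 2)
    return tpl
```

In a data-driven variant, q_max and q_min would be read off the intensity histogram of an ensemble of typical nodules rather than set by hand.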
This work is similar in spirit to the earlier work of Farag et al., 2004 and 2005 [18][19], except that the empirical density of the radial distance and the histogram of typical nodules provide a data-driven guide for estimating the intensity (or Hounsfield units) of the nodule models. We examined the sensitivity and specificity of parametric nodules in a template-matching framework for nodule detection. We show that false positives are an inevitable problem with typical machine-learning methods for automatic lung nodule detection, which invites further effort and perhaps fresh thinking on automatic nodule detection. A new approach to nodule modeling is introduced in Chapter 5 of this thesis, which holds considerable promise for both the detection and the classification of nodules. Using the ELCAP study, we created an ensemble of four types of nodules and generated a nodule model for each type based on optimal data reduction methods. The resulting nodule model for each type has led to drastic improvements in the sensitivity and specificity of nodule detection. This approach may also be used for classification.

In conclusion, the methodologies in this thesis are based on understanding the LDCT scans and what is to be expected in terms of image quality. The noise reduction and image segmentation steps use standard techniques. The thesis illustrates that proper nodule models are possible and that a computerized approach to image analysis for detecting and classifying lung nodules is indeed feasible. Extensions of the results in this thesis are immediate, and the CVIP Lab has devised plans to pursue subsequent steps using clinical data.
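To illustrate the template-matching framework for nodule detection, the sketch below scores every position of a slice by zero-mean normalized cross-correlation (NCC) against a nodule template, keeps local maxima above a threshold as candidates, and counts detected nodules and false positives against known nodule centers. NCC, the 3×3 local-maximum rule, the greedy matching tolerance, and the function names `ncc_map`, `detect_candidates`, and `score_detections` are assumptions for illustration; the abstract does not specify the similarity measure used. Because detection yields no well-defined set of true negatives, the sketch reports a false-positive count rather than specificity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image, template):
    """Zero-mean NCC at every position where the template fits ("valid" mode)."""
    img = np.asarray(image, dtype=np.float64)
    tpl = np.asarray(template, dtype=np.float64)
    tpl = tpl - tpl.mean()
    tpl_norm = np.sqrt((tpl ** 2).sum())
    # Every template-sized window: shape (H-h+1, W-w+1, h, w).
    win = sliding_window_view(img, tpl.shape)
    win = win - win.mean(axis=(-2, -1), keepdims=True)  # materializes a copy
    num = (win * tpl).sum(axis=(-2, -1))
    den = np.sqrt((win ** 2).sum(axis=(-2, -1))) * tpl_norm
    score = np.zeros_like(num)
    np.divide(num, den, out=score, where=den > 0)  # flat windows score 0
    return score


def detect_candidates(image, template, threshold=0.7):
    """Centres (row, col) of local NCC maxima at or above `threshold`."""
    score = ncc_map(image, template)
    h, w = np.shape(template)
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (3, 3)).max(axis=(-2, -1))
    peaks = (score >= threshold) & (score == local_max)
    ys, xs = np.nonzero(peaks)
    # Convert window top-left corners to template-centre coordinates.
    return [(int(y) + h // 2, int(x) + w // 2) for y, x in zip(ys, xs)]


def score_detections(detections, truth, tol=3.0):
    """Greedy matching within `tol` pixels; returns (sensitivity, false positives)."""
    matched = set()
    false_pos = 0
    for dy, dx in detections:
        hit = next((i for i, (ty, tx) in enumerate(truth)
                    if i not in matched and np.hypot(dy - ty, dx - tx) <= tol),
                   None)
        if hit is None:
            false_pos += 1
        else:
            matched.add(hit)
    sensitivity = len(matched) / len(truth) if truth else float("nan")
    return sensitivity, false_pos
```

Lowering `threshold` raises sensitivity at the cost of more false positives, which is the trade-off the abstract describes. The windowed copy in `ncc_map` uses memory proportional to the image size times the template size; an FFT-based correlation would scale better on full CT slices.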