A recently published article by Dr. Henry Thompson et al. (Cancer Prevention Laboratory) made the front cover of the June 2004 issue of Cancer Epidemiology Biomarkers & Prevention. The cover depicts pairwise correlation graphs and histograms of nuclear morphometric parameters superimposed over Feulgen-stained lung epithelial cells obtained from cell culture.

Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall EB, Thompson HJ. (2004) Using nuclear morphometry to discriminate the tumorigenic potential of cells: a comparison of statistical methods. Cancer Epidemiol Biomarkers Prev. Jun;13(6):976-88.

Despite interest in the use of nuclear morphometry for cancer diagnosis and prognosis as well as to monitor changes in cancer risk, no generally accepted statistical method has emerged for the analysis of these data. To evaluate different statistical approaches, Feulgen-stained nuclei from a human lung epithelial cell line, BEAS-2B, and a human lung adenocarcinoma (non-small cell) cancer cell line, NCI-H522, were subjected to morphometric analysis using a CAS-200 imaging system. The morphometric characteristics of these two cell lines differed significantly. Therefore, we proceeded to address the question of which statistical approach was most effective in classifying individual cells into the cell lines from which they were derived. The statistical techniques evaluated ranged from simple, traditional, parametric approaches to newer machine learning techniques. The multivariate techniques were compared based on a systematic cross-validation approach using 10 fixed partitions of the data to compute the misclassification rate for each method. For comparisons across cell lines at the level of each morphometric feature, we found little to distinguish nonparametric from parametric approaches. Among the linear models applied, logistic regression had the highest percentage of correct classifications; among the nonlinear and nonparametric methods applied, the Classification and Regression Trees model provided the highest percentage of correct classifications. Classification and Regression Trees has appealing characteristics: there are no assumptions about the distribution of the variables to be used, there is no need to specify which interactions to test, and there is no difficulty in handling complex, high-dimensional data sets containing mixed data types.
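The cross-validation scheme described in the abstract, in which every method is scored on the same 10 fixed partitions, can be sketched as follows. This is a minimal illustration and not the authors' code. The data are synthetic, the two feature columns (a nuclear area and an optical density) and the labels `line_A` and `line_B` are hypothetical, and the two classifiers are simple stand-ins: a nearest-centroid rule in place of a linear model such as logistic regression, and a one-split decision stump in place of a full Classification and Regression Trees model.

```python
import random


def make_fixed_partitions(n, k=10, seed=0):
    """Assign each of n sample indices to one of k folds, once and reproducibly."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def misclassification_rate(fit, X, y, folds):
    """Pooled held-out error: train on k-1 folds, test on the remaining fold."""
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(X[i]) != y[i] for i in test)
    return errors / len(X)


def fit_nearest_centroid(X, y):
    """Linear stand-in: assign a cell to the class with the closer mean vector."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def fit_stump(X, y):
    """Tree stand-in: the single feature/threshold split with fewest training errors."""
    labels = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(x[j] for x in X)):
            for lo, hi in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                err = sum((lo if x[j] <= thr else hi) != t for x, t in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, thr, lo, hi)
    _, j, thr, lo, hi = best
    return lambda x: lo if x[j] <= thr else hi


# Synthetic, made-up measurements (NOT data from the paper): 100 cells per line,
# each described by a hypothetical (nuclear area, optical density) pair.
rng = random.Random(1)
X, y = [], []
for label, (mean_area, mean_od) in (("line_A", (50.0, 1.0)), ("line_B", (80.0, 1.5))):
    for _ in range(100):
        X.append((rng.gauss(mean_area, 5.0), rng.gauss(mean_od, 0.1)))
        y.append(label)

# The partitions are built once and reused, so the methods are compared
# on identical training and test splits.
folds = make_fixed_partitions(len(X), k=10, seed=0)
rates = {
    name: misclassification_rate(fit, X, y, folds)
    for name, fit in (("nearest_centroid", fit_nearest_centroid), ("stump", fit_stump))
}
for name, rate in rates.items():
    print(f"{name}: misclassification rate = {rate:.3f}")
```

Fixing the partitions before any model is fit means that differences in the reported rates reflect the methods rather than the luck of a particular split.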
