Computer Applications
1 From the Department of Radiology, Kurt Rossmann Laboratories for Radiologic Image Research, University of Chicago, 5841 S Maryland Ave, Chicago, IL 60637. Received November 25, 1998; revision requested December 29; final revision received August 30, 1999; accepted September 2. Supported in part by United States Public Health Service grants CA24806 and CA62625. Address reprint requests to K.D. (e-mail: k-doi@uchicago.edu).
| Abstract |
|---|
MATERIALS AND METHODS: Fifty-six chest radiographs of 34 primary lung cancers and 22 benign nodules were digitized with a 0.175-mm pixel size and a 10-bit gray scale. Eight subjective image features were evaluated and recorded by radiologists in each case. A computerized method was developed to extract objective features that could be correlated with the subjective features. An ANN was used to distinguish benign from malignant nodules on the basis of subjective or objective features. The performance of the ANN was compared with that of the radiologists by means of receiver operating characteristic (ROC) analysis.
RESULTS: Performance of the ANN was considerably greater with objective features (area under the ROC curve, Az = 0.854) than with subjective features (Az = 0.761). Performance of the ANN was also greater than that of the radiologists (Az = 0.752).
CONCLUSION: The computerized scheme has the potential to improve the diagnostic accuracy of radiologists in the distinction of benign and malignant solitary pulmonary nodules.
Index terms: Computers, neural network; Computers, diagnostic aid; Diagnostic radiology, observer performance; Lung neoplasms, diagnosis, 60.11, 60.31, 60.321; Lung, nodule, 60.281; Receiver operating characteristic (ROC) curve
| Introduction |
|---|
A computerized scheme that is capable of providing objective information may aid radiologists in the classification of pulmonary nodules. Various computerized schemes have been investigated for the characterization of pulmonary nodules. In most of these studies, however, radiographic features were extracted manually, and the computer was used only to merge the image features by rule-based or discriminant analysis for the determination of the likelihood of malignancy.
Swensen et al (15) estimated the probability of malignancy in radiologically indeterminate solitary pulmonary nodules by use of multivariate logistic regression. They concluded that three clinical characteristics (age, cigarette-smoking status, and history of cancer) and three radiologic characteristics (diameter, spiculation, and location in the upper lobe) were independent predictors of malignancy. Cummings et al (16) estimated the probability of malignancy of pulmonary nodules by use of Bayesian analysis based on the diameter of a solitary pulmonary nodule, patient's age, history of cigarette smoking, and prevalence of malignancy in solitary pulmonary nodules. Gurney (17) and Gurney et al (18) also used Bayesian analysis to calculate the probability of malignancy, which was compared with the findings of the radiologists.
Other investigators have used computer-extracted features to differentiate malignant and benign pulmonary nodules. Sherrier et al (19) applied gradient analysis in the distinction of benign and malignant nodules, and they reported that benign calcified granulomas showed a gradient number that was greater than that of malignant nodules. Sasaoka et al (20) extracted nodule features by use of a computerized method. However, the extracted features, such as density gradient and density entropy, were not directly correlated with specific radiologic findings. This lack of correlation makes it difficult to understand the importance of their findings.
Recently, artificial neural networks (ANNs) have been used in diagnostic radiology as potentially powerful classification tools (21–27). Gurney and Swensen (28) reported that use of the Bayesian method was better than use of an ANN in the prediction of the probability of malignancy in pulmonary nodules in which radiographic features were extracted manually. Despite these considerable efforts, no practical computerized scheme has been developed.
Our purpose in this study was to develop a practical computer-aided diagnostic scheme to assist radiologists in the objective distinction of benign and malignant pulmonary nodules.
| MATERIALS AND METHODS |
|---|
The final diagnosis was established at pathologic examination, and, for some benign nodules, a presumed diagnosis was made on the basis of an absence of change or a decrease in nodule size during a 2-year period. The 34 primary bronchogenic carcinomas included adenocarcinomas (n = 26), squamous cell carcinomas (n = 3), small cell carcinoma (n = 1), carcinoid tumor (n = 1), and tumors of unknown subtype (n = 3). The 22 benign lesions were classified as pulmonary hamartomas (n = 2), granulomas (n = 12), inflammatory lesions (n = 7), and pulmonary infarction (n = 1). Fifty-six radiographs were obtained in 33 women and 23 men (age range, 24–86 years; mean age, 58.4 years). Chest radiographs were digitized by use of a laser scanner (Abe Sekkei, Tokyo, Japan) with a pixel size of 0.175 mm and a 10-bit gray scale (1,024 gray levels).
Subjective Feature Extraction by the Radiologists
Eight subjective radiologic features of the pulmonary nodules were quantitatively recorded. These included nodule size, nodule shape, marginal irregularity, spiculation, border definition, lobulation, nodule density (contrast), and homogeneity. The observers used a ruler to measure nodule size. The variation in measured nodule sizes was mainly due to variation in subjective judgments of the edges of the nodule. Nodule shapes ranged from round to elongated, marginal irregularity ranged from smooth to irregular, spiculation ranged from nonspiculated to spiculated, border definition ranged from well defined to poorly defined, lobulation ranged from nonlobulated to lobulated, nodule density ranged from low to high, and homogeneity ranged from homogeneous to inhomogeneous.
Seven radiologists (four attending radiologists [including K.A.] and three radiology residents) used a score sheet with a scale from 1 to 5 to independently extract the features of each nodule. The score sheet included pictorial diagrams of two extreme examples, such as a nonspiculated nodule and a spiculated nodule, to serve as a guide. Table 1 shows examples of a radiologist's ratings for the eight radiologic findings in two pulmonary nodules (Fig 1). To eliminate bias, all radiologists rated each nodule without knowledge of the correct diagnosis.
We believe that the definition of marginal irregularity contains two independent factors, namely, the magnitude and the coarseness (or fineness) of the irregular edge pattern. The irregular edge pattern was defined here as the distance from the nodule outline to the fitted ellipse (Fig 3) and was analyzed by means of Fourier transformation. The root-mean-square variation and the first moment of the power spectrum (35) were calculated as measures of the magnitude and the coarseness, respectively, of marginal irregularity. The degree of irregularity (32), which was used as another measure of marginal irregularity, was defined as 1 minus the ratio of the perimeter (circumference) of the ellipse to the length of the contour.
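The three measures of marginal irregularity described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the edge pattern is available as a list of distances from the outline to the fitted ellipse, sampled at equally spaced points around the contour, and it assumes that the first moment is the power-weighted mean frequency (the exact normalization is given in reference 35, not here). All function names and the synthetic profiles are hypothetical.

```python
import cmath
import math


def rms_variation(profile):
    """Root-mean-square of the edge profile about its mean (magnitude)."""
    mean = sum(profile) / len(profile)
    return math.sqrt(sum((d - mean) ** 2 for d in profile) / len(profile))


def power_spectrum(profile):
    """Power at frequencies 1..N//2 (cycles per contour) via a naive DFT."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [d - mean for d in profile]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def first_moment(profile):
    """Power-weighted mean frequency (assumed definition of coarseness)."""
    spectrum = power_spectrum(profile)
    total = sum(spectrum)
    if total == 0:
        return 0.0  # perfectly smooth edge: nothing to characterize
    return sum(k * p for k, p in enumerate(spectrum, start=1)) / total


def degree_of_irregularity(ellipse_perimeter, contour_length):
    """1 minus the ratio of the fitted-ellipse perimeter to the contour length."""
    return 1.0 - ellipse_perimeter / contour_length


# Synthetic edge profiles (hypothetical): same magnitude, different coarseness.
n = 64
coarse = [math.sin(2 * math.pi * 2 * i / n) for i in range(n)]
fine = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
```

In this sketch the two synthetic profiles have the same root-mean-square variation but different first moments, which is the sense in which magnitude and coarseness are treated as independent factors.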
The mean pixel value was a measure of the optical density of a nodule. The SD of the pixel values over the nodule was a measure of the homogeneity of the nodule.
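As a brief illustration of these two measures, the sketch below computes them from the pixel values inside a nodule outline. The input list and the function name are hypothetical, and the use of the population SD (rather than the sample SD) is an assumption, since the text does not say which was used.

```python
import statistics


def density_and_homogeneity(pixels):
    """Return (mean pixel value, SD of pixel values) for one nodule.

    `pixels` is assumed to hold the 10-bit gray levels (0..1023) of all
    pixels inside the nodule outline.
    """
    mean_value = statistics.fmean(pixels)
    sd_value = statistics.pstdev(pixels)  # population SD (assumption)
    return mean_value, sd_value


# Hypothetical nodules: a uniform one and a less homogeneous one.
uniform = density_and_homogeneity([500, 500, 500, 500])
mottled = density_and_homogeneity([400, 600, 400, 600])
```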
Artificial Neural Networks
Three-layer, feed-forward ANNs with back-propagation algorithms (38) were used. Two clinical parameters (patient's age and sex) and eight radiologic findings extracted by radiologists or physical measures obtained from the computer analysis were used as input data for the ANNs. The basic structure of the ANN included 10 input units, five hidden units, and one output unit. The number of hidden units was determined empirically, as is generally done in ANN applications. Input data obtained from clinical parameters, subjective ratings by radiologists, and physical measures obtained by use of the computer were normalized to range from 0 to 1. The output of the ANN represented the likelihood of malignancy (0 = benign, 1 = malignant).
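The network structure described above (10 input units, five hidden units, one output unit, inputs scaled to 0 to 1) can be sketched as follows. This is an illustration only: the sigmoid activation, squared-error loss, weight initialization range, learning rate, and all names are assumptions that the text does not specify.

```python
import math
import random


def min_max_normalize(column):
    """Rescale one feature across all cases to the range 0..1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerANN:
    def __init__(self, n_in=10, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # Each row: weights for one hidden unit, with a trailing bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Return (hidden activations, output likelihood of malignancy)."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update for squared error on a single case."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights from before this update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]
        return out


# Example: patient ages spanning the study range are mapped to 0..1.
ages = min_max_normalize([24, 55, 86])
```

Repeated calls to `train_step` with a target of 1 (malignant) or 0 (benign) move the output toward that target for the presented case.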
The training and testing of the ANN were performed by means of a round-robin (or leave-one-out) method (24). With this method, all of the cases in the database but one were used for training, and the case not used was applied in the testing of the trained ANNs. This procedure was repeated until every case in the database was used once for testing. The performance of the ANNs was evaluated on a per-patient basis (24) for individual radiologists and for all of the radiologists together by means of receiver operating characteristic (ROC) curves (39). The LABROC4 program (40) was used to fit the ROC curves, and the area under the ROC curve, Az, was used as an index of performance in the distinction of benign and malignant nodules.
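The round-robin procedure can be sketched as a simple loop in which each case is held out once. The sketch below is hedged in two ways. First, a toy nearest-mean classifier on a single hypothetical feature stands in for the ANN. Second, the area is computed empirically from pairwise comparisons of scores, which is not the same as the Az value obtained from the ROC curve fitted by LABROC4; the two generally differ somewhat. All names and data are hypothetical.

```python
def leave_one_out_scores(cases, labels, fit, predict):
    """Train on all cases but one, score the held-out case, and repeat."""
    scores = []
    for i in range(len(cases)):
        train_x = cases[:i] + cases[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        scores.append(predict(model, cases[i]))
    return scores


def empirical_auc(scores, labels):
    """Fraction of malignant/benign pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_nearest_mean(train_x, train_y):
    """Toy stand-in for ANN training: class means of a single feature."""
    benign = [x for x, y in zip(train_x, train_y) if y == 0]
    malignant = [x for x, y in zip(train_x, train_y) if y == 1]
    return sum(benign) / len(benign), sum(malignant) / len(malignant)


def predict_nearest_mean(model, x):
    """Higher score means closer to the malignant mean than the benign mean."""
    mean_benign, mean_malignant = model
    return abs(x - mean_benign) - abs(x - mean_malignant)


# Hypothetical one-feature data set (0 = benign, 1 = malignant).
cases = [1.0, 1.2, 0.9, 3.0, 3.3, 2.8]
labels = [0, 0, 0, 1, 1, 1]
scores = leave_one_out_scores(cases, labels,
                              fit_nearest_mean, predict_nearest_mean)
auc = empirical_auc(scores, labels)
```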
Observer Performance
The seven radiologists participated in the evaluation of their performance in the classification of pulmonary nodules. Each observer was presented with a chest radiograph and two clinical parameters (patient's age and sex) and was asked to mark his or her confidence level regarding the likelihood of malignancy by using an analog continuous rating scale with a line-checking method (29). Confidence ratings of definitely benign and definitely malignant were marked above the left and the right ends of the line, respectively. Radiographs were presented in random order. ROC analysis was used for the comparison of the performance of observers with that of the computerized methods in the distinction of benign and malignant nodules.
| RESULTS |
|---|
We selected six features for input into the ANN on the basis of the performance obtainable with each individual feature and of the radiologists' knowledge and experience. These features included patient's age, nodule size, marginal irregularity, border definition, spiculation, and homogeneity.
The mean Az value of 0.761 for the ANN with the six selected subjective features, averaged over all seven radiologists, was significantly greater than the Az value with all 10 features (Az = 0.710; P < .001). The mean Az value for the ANN with the six features selected by either the attending radiologists or the residents was greater than the Az value for all 10 features. Figure 4 shows the comparison of the performances of the radiologists and the ANN with the selected subjective features. The mean performance (Az = 0.790) of the ANN with selected features extracted by attending radiologists was slightly greater than the mean performance (Az = 0.774) of the attending radiologists. The mean performance (Az = 0.722) of the ANN with selected subjective features extracted by residents was lower than the mean performance (Az = 0.744) of the residents.
We also evaluated the performance of the ANN by using the features extracted by all of the radiologists as a group. The performance of the ANN with the features extracted by all radiologists is also shown in Table 2. With all 10 of the subjective features, the Az value of 0.747 for the ANN with the features extracted by all radiologists was comparable to the mean Az value of 0.742 for the ANN with the features extracted by each attending radiologist.
However, when the ANN was used with the six selected subjective features, the Az value of 0.754 for the ANN with the features extracted by all radiologists was lower than the mean Az value of 0.790 for the ANN with the features extracted by each attending radiologist. This result seems to indicate that, despite the larger number of input data for the ANN with all radiologists, the ANN did not learn patterns of input data well in the distinction of benign and malignant nodules; this may have been due to the variations among radiologists' subjective ratings.
Computerized Analysis of Objective Features
To evaluate the usefulness of physical measures obtained by means of the computerized analysis, we first compared the physical measures with the radiologists' subjective ratings. Figure 5 shows the relationship between the root-mean-square value and the subjective ratings of marginal irregularity and the relationship between the SD of pixel values and the subjective ratings of homogeneity. These results indicate that physical measures corresponded well to the radiologists' subjective ratings. Table 3 shows the correlation coefficients between the physical measures and subjective features; in general, the coefficients indicated that most of the physical measures correlated well with the corresponding subjective features.
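A correlation of this kind between a physical measure and the corresponding subjective rating could be computed as in the sketch below. The text does not state which correlation coefficient was used, so the Pearson coefficient is an assumption here, and the sample values are hypothetical rather than study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical example: root-mean-square values versus 1-to-5 ratings
# of marginal irregularity for five nodules.
rms_values = [0.2, 0.5, 0.9, 1.4, 2.0]
ratings = [1, 2, 3, 4, 5]
r = pearson_r(rms_values, ratings)
```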
It should be noted, however, that we always included the patient's age and nodule size because these two features are considered to be among the most important features in the differentiation of pulmonary nodules (15,16). Table 4 shows combinations of computer-extracted features that were used to achieve a high performance with the ANN (Az > 0.830). Figure 7 shows a comparison of the ROC curves for the performances of the ANN with the computer-extracted features, of the ANN with six subjective ratings by selected radiologists, and of the radiologists. The results indicated that the performance of the ANN with the selected computer-extracted features was better than that of the radiologists or the ANN with subjective features.
| DISCUSSION |
|---|
The ANN has a unique ability to learn specific patterns between input and output data if it is repeatedly trained with examples. However, this ability strongly depends on the quality of the input data. If the input data are randomly selected and if they have no correlation with the output data, the ANN cannot learn any specific pattern; this would result in poorer performance. In this study, the performance of the ANN with use of subjective ratings by each radiologist varied considerably. This seemed to reflect the large variation in the subjective ratings made by the radiologists. Although we provided pictorial diagrams to help the radiologists to improve their consistency in extracting subjective image features, the ratings were highly subjective and could have been strongly affected by an individual radiologist's knowledge and experience.
In general, it is desirable to train the ANN with a large database that contains a wide spectrum of data. In this study, therefore, we applied the round-robin test by combining all of the data provided by the seven radiologists. Although the performance of the ANN improved slightly with use of all of the data, compared with the mean performance of the ANN with each radiologist's data for all features, the performance decreased when selected features were used. This might have been caused by the variation among the subjective ratings by the radiologists. The data from some radiologists might have had a negative influence on the ANN in learning the pattern of the data from the other radiologists.
Another limitation of using the subjective ratings as input data for the ANN is that the quality of these subjective ratings depends on the ability of a radiologist to extract nodule features. In our study, the performance of the ANN with subjective features extracted by radiology residents was much lower than its performance with features extracted by attending radiologists. This indicated that radiologists with less experience could not extract the nodule features with sufficient accuracy. Consequently, the ANN could not learn the specific patterns between input data and output data well enough. Therefore, computer-aided diagnostic schemes that can be used to extract nodule features automatically, objectively, and reproducibly are highly desirable.
Our results showed that, with features extracted by the computer, the ANN performed better than the radiologists (mean value); it even performed better than the ANN with subjectively extracted features. Although physical measures were initially selected on the basis of their expected correlation with the subjective features, these physical measures may have contained additional and possibly useful information. This may have contributed to the distinction of the benign from the malignant pulmonary nodules.
If more useful parameters for the differentiation of pulmonary nodules (ie, history of smoking and tumor markers) are available as input data, a better performance with the ANN would be expected. However, in this study, we used a smaller number of essential features in our attempt to develop an ANN scheme for use in more practical clinical situations. We chose patient's age and sex because they are available in almost all clinical situations. However, smoking history and history of cancer are not routinely available and were, therefore, not incorporated into the analysis.
In this study, the computer analysis was performed on the basis of nodule outlines drawn by a radiologist. This introduced a subjective element that is a potential limitation of this study. Therefore, we examined the nodule outlines drawn by another radiologist and confirmed that the ANN, with another set of computer-extracted features, provided a comparable or better performance (Az = 0.920). However, the difference in the performance of the ANN with the nodule outlines drawn by each of the two radiologists was large. This could have been due to the variation between the two outlines, although the radiologists were not informed about the correct diagnosis. Therefore, the development of a computerized method for the automatic extraction of a nodule outline is desirable.
Because the definition of the morphology of pulmonary nodules is limited on chest radiographs, a CT image must be obtained in most patients who have noncalcified solitary pulmonary nodules. However, CT is expensive and exposes the patient to radiation. The aim of our computerized classification scheme for solitary pulmonary nodules was to reduce the number of patients with benign nodules who are referred for further diagnostic evaluation. This scheme achieved a higher performance than did the radiologists (mean value) in the study, although the computer was provided with a nodule outline drawn by a radiologist. It appears that the computerized classification method has the potential to be a useful aid to radiologists in the differentiation of benign and malignant pulmonary nodules; in the future, it may reduce the number of unnecessary CT examinations that are performed.
| APPENDIX |
|---|
One hundred thirty-three patients (83 men, 50 women; age range, 25–85 years; mean age, 62.9 years) were identified at the five institutions. The clinical diagnosis before CT consisted of suspected lung cancer (n = 43), pulmonary nodule and/or mass (n = 70), abnormal shadow (n = 10), suspected pulmonary metastasis (n = 6), and benign diseases (suspected pulmonary tuberculosis [n = 1], abscess [n = 2], or aspergillosis [n = 1]). Of the patients who, before CT, were suspected of having a pulmonary nodule and/or mass, some might have been suspected of having benign disease. However, we assumed that most of these patients were suspected of having malignant disease.
Table A1 summarizes the data from this survey. Fifty-five (41.4%) of the 133 patients had malignant nodules, which included primary lung cancer and pulmonary metastases. Sixty-four patients (48.1%) had benign conditions, which included benign diseases and negative findings with no apparent pulmonary abnormality at CT. Fourteen patients did not have conclusive final diagnoses. The results obtained in this survey showed that a large fraction of the patients who underwent chest CT were identified as having benign conditions. This finding appears to indicate that some of these studies could have been avoided if the benign cases had been confidently diagnosed on chest radiographs.
| Acknowledgments |
|---|
| Footnotes |
|---|
3 Department of Radiology, Nagasaki University School of Medicine, Japan.
H. M. and K. D. are shareholders of R2 Technology, Los Altos, Calif. It is the policy of the University of Chicago that investigators publicly disclose actual or potential substantial financial interests that may appear to be affected by the research activities.
Abbreviations: ANN = artificial neural network; ROC = receiver operating characteristic
Author contributions: Guarantors of integrity of entire study, study concepts and design, and definition of intellectual content, K.N., K.D.; literature research, K.N., H.M.; clinical studies, K.N., H.M., K.A.; experimental studies, T.I., S.K.; data acquisition, H.Y., R.E., S.K.; data analysis, H.Y., K.N., S.K.; manuscript preparation, K.N., K.D.; manuscript editing, H.M.; manuscript review, H.Y., R.E., S.K., T.I., K.A.
| References |
|---|