Thoracic Imaging
1 From the Department of Diagnostic Radiology (M.D., G.M., A.H.M., R.W.G., J.E.W.), Institute of Medical Statistics (S.S.), and Department of Occupational Health (T.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Pauwelsstrasse 30, D-52074 Aachen, Germany; and Department of Computed Tomography, Siemens Medical Solutions, Forchheim, Germany (T.G.F., L.G.). Received July 6, 2005; revision requested September 6; revision received December 15; accepted January 11, 2006; final version accepted February 26. Address correspondence to M.D. (e-mail: das{at}rad.rwth-aachen.de).
| ABSTRACT |
|---|
Materials and Methods: Institutional review board approval and informed consent were obtained. Multidetector row CT scans were randomly chosen and prospectively evaluated in 25 patients. Two dedicated CAD systems, ImageChecker CT (R2 Technologies, Sunnyvale, Calif) and Nodule Enhanced Viewing (NEV) (Siemens Medical Solutions, Forchheim, Germany), were used. Results were interpreted by three radiologists with 1, 3, and 6 years of experience. Images were evaluated without and with CAD software. The reference standard was assessed by a consensus panel consisting of all three radiologists and an adjudicator with 8 years of experience.
Results: A total of 116 pulmonary nodules (average diameter, 3.4 mm; average volume, 32.05 mm³) were found in all data sets during consensus interpretation, which included findings from the CAD software and all radiologists. Overall sensitivity was 73% with ImageChecker CT and 75% with NEV. Overall sensitivity without CAD was 68% for radiologist 1, 78% for radiologist 2, and 82% for radiologist 3. With ImageChecker CT, sensitivity increased to 79% for radiologist 1, 90% for radiologist 2, and 84% for radiologist 3. With NEV, sensitivity increased to 79% for radiologist 1, 90% for radiologist 2, and 86% for radiologist 3. The average number of false-positive findings was six (range, 0–14) with ImageChecker CT and eight (range, 0–22) with NEV.
Conclusion: Radiologist performance in the interpretation of multidetector row CT scans can be improved by using CAD systems, with a reduction in the number of false-negative diagnoses. No statistically significant difference in sensitivity was found between the two CAD systems.
© RSNA, 2006
| INTRODUCTION |
|---|
To reduce the number of false-negative diagnoses, double reading has been proposed. However, double reading does not usually occur in the clinical routine setting. First introduced for mammography, double reading helped significantly improve sensitivity (13–15). In chest CT, double reading showed similar results, with a significant improvement in the detection rate of pulmonary nodules (16). The use of double reading is not always possible in routine clinical settings, especially because of limited human resources and cost effectiveness. Thus, the use of a computer-aided detection (CAD) system as a second reader may provide a solution to this problem.
The sensitivity of CAD systems alone has been previously assessed (2,17–20), with reported sensitivities of between 38% and 84%. The effect of a single system on the performance of a radiologist with regard to the detection of pulmonary nodules has been described in several studies (17,19,21–24). Recently, Rubin et al (21) compared the sensitivity of radiologists alone with the sensitivity of double reading and with that of a radiologist using CAD software as a second reader. Rubin et al showed a significant improvement in sensitivity, from an average of 50% for radiologists alone to 76% for radiologists using CAD software. Marten et al (19) also tested the performance of radiologists with and without the use of a CAD system. In this study, they took into account the radiologist's experience and found that experienced radiologists benefited more from CAD schemes. Although inexperienced radiologists benefit from CAD systems as well, experience is crucial for the differentiation of true-positive findings from false-positive findings.
To our knowledge, no study has been performed to test two different commercially available systems head to head with regard to radiologist performance. Thus, the purpose of our study was to prospectively compare the effects of two CAD systems on the detection of small pulmonary nodules at multidetector row CT by using a consensus panel decision as the reference standard.
| MATERIALS AND METHODS |
|---|
Patient Selection and Imaging
Institutional review board approval and informed consent were obtained. Twenty-five patients (five women, 20 men; age range, 43–74 years; mean age, 54 years ± 8 [standard deviation]) who had undergone multidetector row CT of the chest (12 patients [five women, seven men]; age range, 43–67 years; mean age, 52 years) or who were included in a low-dose lung cancer screening study (13 men; mean age, 67 years ± 9; age range, 57–74 years) were randomly chosen. In the first group, eight patients were scanned for metastasis with underlying malignant disease (four kidney cancers, three prostate cancers, and one breast cancer), and four patients were scanned for tumor search. In the second group, the results of the screening examinations were obtained from a large lung cancer screening trial in which workers who had been exposed to asbestos underwent low-dose multidetector row CT at yearly intervals (inclusion criteria: minimum age, 45 years; maximum age, 75 years; asbestos exposure, 16–45 years). All patients were scanned with a 16-section CT scanner (Somatom Sensation 16; Siemens Medical Solutions, Forchheim, Germany).
Imaging parameters for the first patient group (routine protocol) were as follows: 120 kV, effective milliampere-second according to patient weight (eg, 60 kg = 60 mAseff), 0.5-second rotation time, 16 x 1.5-mm collimation, 24-mm table feed per rotation, 2-mm section thickness, 1.5-mm reconstruction interval, and sharp reconstruction kernel (B50f; Siemens Medical Solutions).
Imaging parameters for the second patient group (low-dose screening protocol) were as follows: 120 kV, effective milliampere-second of 10 for patients who weighed less than 80 kg and 20 for patients who weighed more than 80 kg, 0.5-second rotation, 16 x 0.75-mm collimation, 18-mm table feed per rotation, 1-mm section thickness, 0.5-mm reconstruction intervals, and sharp reconstruction kernel (B50f; Siemens Medical Solutions).
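As a quick check on the two protocols, pitch can be derived from the reported table feed and collimation. The following is a minimal sketch that assumes the common definition of pitch as table feed per rotation divided by total nominal collimation (number of detector rows times row width); the function name `pitch` is illustrative and does not come from the study.

```python
def pitch(table_feed_mm, detector_rows, row_width_mm):
    """Pitch = table feed per rotation / total nominal collimation.

    Assumption: this is the beam-pitch definition, not a value
    reported by the authors.
    """
    total_collimation_mm = detector_rows * row_width_mm
    return table_feed_mm / total_collimation_mm


# Routine protocol: 16 x 1.5-mm collimation, 24-mm table feed per rotation
routine_pitch = pitch(24, 16, 1.5)

# Low-dose screening protocol: 16 x 0.75-mm collimation, 18-mm table feed
screening_pitch = pitch(18, 16, 0.75)
```

Under this definition, the routine protocol corresponds to a pitch of 1.0 and the screening protocol to a pitch of 1.5.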
The images were stored on a local network drive and were distributed to the two CAD workstations for further analysis.
Image Interpretation
Three radiologists with different levels of experience were chosen from the department of radiology. Radiologist 1 (G.M.) had 1 year of experience in interpreting CT scans of the chest, radiologist 2 (M.D.) had 3 years of experience in interpreting CT scans of the chest, and radiologist 3 (A.H.M.) had 6 years of experience in interpreting CT scans of the chest. All three radiologists independently interpreted all images without the aid of either of the two CAD systems. Analysis was performed in a room where no distractions could occur, and radiologists were given as much reading time as they wanted. Reading time was measured by the radiologists themselves with a standard stopwatch. Time started when the patient data were opened and stopped when the patient data were closed.
During analysis, each nodule was marked, and the diameter, volume, attenuation, and location were reported. Images were displayed with a standard lung window (window level, −700 HU; window width, 2000 HU). Each radiologist used cine viewing with thinMips, which is a stack of images presented as a 5-mm-thick maximum intensity projection, for interpretation. In addition, multiplanar reconstructions or three-dimensional renderings were allowed.
In the second step, which was performed 6 weeks to 3 months later to avoid recall bias, the data sets were put in random order and were interpreted independently by the radiologists with use of ImageChecker CT (R2 Technologies, Sunnyvale, Calif). Radiologists were blinded to previous results. The ImageChecker CT software preprocesses the data set and immediately presents the potential nodules ("candidate nodules") to the radiologist. All results that were found by the CAD software were stored. The candidate nodules were then evaluated by the radiologist and were either added as true-positive findings or dismissed as false-positive findings.
In the third step, which was performed within the next 6 weeks to 3 months after the final reading, the data sets were interpreted independently by all three radiologists with use of the Nodule Enhanced Viewing (NEV) system (Siemens Medical Solutions). Again, the radiologists were blinded to previous results. The NEV software begins processing the data set while the radiologist reviews images from the entire examination. CAD results are presented within a few minutes (average, 3.1 minutes per examination ± 2.9). All results obtained with the CAD software were stored, and the candidate nodules were either added as true-positive findings or dismissed as false-positive findings. In addition, original images were available for comparison with either system.
The times needed for reviewing the images with the two second-reader approaches were also noted with a stopwatch by one author (M.D.) and were compared.
CAD and Reference Standard
To assess the sensitivity and false-positive rate of the two systems individually, both CAD software systems were used to analyze all images. Candidate lesions were stored and classified as true-positive or false-positive findings by the consensus panel, which included all three radiologists and, as needed, an adjudicator (J.E.W.) with 8 years of experience.
The reference standard was assessed by using all reported findings after the single reading was performed (radiologist alone, radiologist with CAD software, CAD software alone, and radiologists and adjudicator, as needed). All findings were accepted or dismissed by the consensus panel.
Statistical Analysis
Nodule diameter, nodule volume, and reading times are given as arithmetic means and corresponding standard deviations; the number of detected nodules is given as the range, median, and corresponding interquartile range (the difference between the upper and the lower quartile). Observed false-positive rates with the two CAD systems are given as medians and corresponding interquartile ranges and are represented in a bar graph.
Observed sensitivity values are given as arithmetic means, medians, and interquartile ranges separately for the different analysis protocols (CAD system alone, radiologist alone, and radiologist with each of the two CAD systems) and are displayed graphically by using parallel box plots.
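The descriptive summaries described above (arithmetic mean, standard deviation, median, and interquartile range) could be computed along the following lines. This is a sketch only: the helper name `summarize` is hypothetical, and the quartile convention (the "inclusive" method of the Python standard library) is an assumption, because statistical packages differ in how they interpolate quartiles and the study does not state which convention was used.

```python
import statistics


def summarize(values):
    """Return mean, standard deviation, median, and interquartile range.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other packages may give slightly different quartiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,                      # upper quartile minus lower quartile
    }


# Hypothetical per-examination sensitivities in percent (not study data)
example = summarize([0, 25, 50, 75, 100])
```

With the hypothetical values shown, the interquartile range is 75 − 25 = 50 percentage points.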
The nonparametric unpaired Wilcoxon test was used for overall comparison of sensitivity values obtained for all nodules with respect to the two acquisition protocols; comparisons were made separately for each of the two CAD systems. In case of a statistically significant test result with at least one of these two tests, further statistical analysis was conducted separately for the two acquisition protocols; otherwise, the 25 patients were treated as one study group.
First, sensitivity was assessed for all nodule sizes for the CAD software alone, for the three radiologists alone, and for the three radiologists with the CAD software.
The nonparametric paired Wilcoxon test was used for overall comparison of sensitivity values between the two software tools and for pairwise comparison of sensitivity values among the three different analysis protocols (radiologist alone, radiologist with ImageChecker CT, and radiologist with NEV); comparisons were made individually for each of the three radiologists.
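To illustrate the paired comparison, the sketch below implements a two-sided exact Wilcoxon signed-rank test in plain Python. It is not the SAS or R code used in the study. It assumes that zero differences are dropped, that tied absolute differences receive midranks, and that the P value is taken from the exact distribution over all sign assignments; the function name `signed_rank_test` is hypothetical.

```python
from collections import Counter


def signed_rank_test(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value), where w_plus is the sum of ranks of
    the positive differences x - y.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Midranks of |diff|, stored doubled so that they stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * mean of ranks i+1 .. j+1
        i = j + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # Count how many of the 2**n sign assignments give each rank sum.
    counts = Counter({0: 1})
    for r in ranks2:
        updated = Counter(counts)
        for s, c in counts.items():
            updated[s + r] += c
        counts = updated

    # Two-sided: sums at least as far from the null mean as observed.
    mean2 = sum(ranks2) / 2
    deviation = abs(w_plus2 - mean2)
    extreme = sum(c for s, c in counts.items() if abs(s - mean2) >= deviation)
    return w_plus2 / 2, extreme / 2 ** n


# Hypothetical per-examination sensitivities (percent) for one reader
with_cad = [11, 22, 33, 44, 55]
without_cad = [10, 20, 30, 40, 50]
w_plus, p_value = signed_rank_test(with_cad, without_cad)
```

With only five nonzero differences that all point the same way, the smallest attainable two-sided P value is 2/32 = .0625, which shows why small samples limit the power of this test.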
A κ coefficient was calculated to evaluate the degree of agreement between the three radiologists and between the three radiologists using the two CAD software tools. To simplify the computation of κ coefficients, the number of nodules diagnosed per patient was classified into three categories as follows: 0, no nodules found; 1, 1–3 nodules found; and 2, more than 3 nodules found.
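The categorization and agreement computation can be sketched as follows. The study does not state which κ variant was used; this example assumes an unweighted Cohen κ computed for one pair of readers, and the function names and nodule counts are hypothetical.

```python
def nodule_category(count):
    """Map a per-patient nodule count to the three study categories."""
    if count == 0:
        return 0          # no nodules found
    if count <= 3:
        return 1          # 1-3 nodules found
    return 2              # more than 3 nodules found


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two readers rating the same patients."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0        # both readers used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical nodule counts per patient for two readers (not study data)
reader_a = [nodule_category(c) for c in [0, 1, 2, 5, 4, 0]]
reader_b = [nodule_category(c) for c in [0, 3, 1, 7, 0, 0]]
kappa = cohen_kappa(reader_a, reader_b)
```

In this hypothetical example the readers agree on five of six patients (observed agreement, 5/6) against a chance agreement of 1/3, giving κ = 0.75.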
Moreover, the same statistical analysis scheme (ie, paired Wilcoxon tests and κ coefficients) was applied to the subgroup of nodules that were smaller than 5 mm, thus ignoring nodules with diameters of 5 mm or more.
An α level of .05 was chosen for all statistical test procedures. In our opinion, a confirmative statement seemed plausible only for overall comparison between the two software tools during the analysis of all nodules. Thus, all additional statistical tests were performed in a clearly explorative sense. Therefore, no α adjustment was made. Hence, P values of less than or equal to .05 could be interpreted as indicating a statistically significant result.
All statistical analyses were performed by using a statistical software package (SAS for Windows, release 8.02, TS level 02MO; SAS Institute, Cary, NC) and the R open source statistical analysis system (www.R-project.org).
| RESULTS |
|---|
Nodule Size
Of the 116 pulmonary nodules (Fig 1), 89 were smaller than 5 mm in diameter. The average diameter of these smaller nodules was 2.9 mm ± 1.2, and the average volume was 27.3 mm³ ± 25.4. No statistically significant difference in sensitivity was found between the two acquisition protocols by using ImageChecker CT (P = .323) or NEV (P = .129); therefore, further statistical analysis was performed by using all 25 patients, without differentiation between the two protocols.
CAD Comparisons
ImageChecker CT detected 85 of 116 nodules, leading to an overall sensitivity of 73% for nodules of all sizes. The average sensitivity per examination was 75% (median, 80%; interquartile range, 25%). The average nodule diameter was 2.9 mm ± 7.2, and the average volume was 27.23 mm³ ± 89.43. The mean number of false-positive findings was six (range, 0–14) (Fig 2). The ImageChecker CT system found 68 of 89 nodules that were smaller than 5 mm, for a mean sensitivity of 76% (median, 83%; interquartile range, 25%).
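The difference between overall sensitivity (pooled over all nodules) and average sensitivity per examination can be made concrete with a short sketch. The function names are hypothetical, and skipping examinations without reference nodules in the per-examination average is an assumption, since the study does not describe how such examinations were handled.

```python
def overall_sensitivity(detected, total):
    """Pooled sensitivity: detected nodules / all reference nodules."""
    return detected / total


def mean_sensitivity_per_exam(exams):
    """Average of per-examination sensitivities.

    `exams` is a list of (detected, total) pairs. Examinations with no
    reference nodules are skipped (assumption), because sensitivity is
    undefined for them.
    """
    per_exam = [d / t for d, t in exams if t > 0]
    return sum(per_exam) / len(per_exam)


# Reported counts for ImageChecker CT alone
all_sizes = overall_sensitivity(85, 116)      # about 0.73
smaller_than_5mm = overall_sensitivity(68, 89)  # about 0.76

# Hypothetical examinations showing why the two summaries can differ
exams = [(1, 1), (1, 4)]
pooled = overall_sensitivity(2, 5)            # 2 of 5 nodules found
averaged = mean_sensitivity_per_exam(exams)   # mean of 1.0 and 0.25
```

In the hypothetical two-examination case the pooled value is 40%, whereas the per-examination average is 62.5%, because each examination carries equal weight regardless of its nodule count.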
The paired Wilcoxon test for overall comparisons between the two CAD systems gave a P value of less than .129, which is not indicative of a statistically significant difference between the two CAD systems regarding sensitivity for all nodule sizes.
The average reading time, as measured with a stopwatch, was 5.03 minutes ± 3.43 without CAD, 5.25 minutes ± 6.45 with ImageChecker CT, and 6.35 minutes ± 4.35 with NEV.
All Nodules
Sensitivity data for the detection of all nodules are shown in Figure 3. For all nodule sizes, radiologist 1 detected 79 of 116 nodules (overall sensitivity, 68%; average sensitivity per examination, 72% [median, 75%; interquartile range, 34%]). With ImageChecker CT, the number of detected nodules increased to 92 (overall sensitivity, 79%; average sensitivity per examination, 80% [median, 100%; interquartile range, 34%]). With NEV, the number of detected nodules increased to 92 (overall sensitivity, 79%; average sensitivity per examination, 81% [median, 83%; interquartile range, 34%]). If we compare the sensitivity values of the different analysis strategies, no statistically significant difference is found between the two CAD systems (P = .100). ImageChecker CT (P = .005) helped to significantly increase sensitivity relative to diagnosis without the aid of a CAD system, but no statistically significant increase was found with NEV (P = .116).
For all nodule sizes, radiologist 3 detected 95 of 116 nodules (overall sensitivity, 82%; average sensitivity per examination, 84% [median, 83%; interquartile range, 25%]). With ImageChecker CT, the number of detected nodules increased to 98 (overall sensitivity, 84%; average sensitivity per examination, 84% [median, 100%; interquartile range, 25%]). With NEV, the number of detected nodules increased to 100 (overall sensitivity, 86%; average sensitivity per examination, 89% [median, 100%; interquartile range, 20%]). No statistically significant difference in sensitivity was seen between the two CAD systems (P = .893) nor was any statistically significant difference seen with ImageChecker CT (P = .123) and NEV (P = .161) relative to diagnosis without the use of a CAD system.
Calculation of κ coefficients (Table 1) showed an increased degree of agreement between any pair of radiologists with use of the CAD software.
Radiologist 2 detected 63 of 89 nodules that were smaller than 5 mm (overall sensitivity, 71%; average sensitivity per examination, 72% [median, 75%; interquartile range, 34%]). With ImageChecker CT, the number of detected nodules increased to 72 (sensitivity, 81%; average sensitivity per examination, 80% [median, 75%; interquartile range, 34%]). With NEV, the number of detected nodules increased to 75 (overall sensitivity, 84%; average sensitivity per examination, 83% [median, 83%; interquartile range, 25%]).
The results of pairwise comparison of the three analysis strategies with respect to sensitivity showed a statistically significant difference between the sensitivity values obtained for radiologist 2 alone and those obtained for radiologist 2 with ImageChecker CT (P = .027) and NEV (P = .011). There was, however, no statistically significant difference between the two CAD systems (P = .109).
Radiologist 3 detected 56 of 89 nodules that were smaller than 5 mm (overall sensitivity, 63%; average sensitivity per examination, 69% [median, 66%; interquartile range, 50%]). With ImageChecker CT, the number of detected nodules increased to 66 (overall sensitivity, 74%; average sensitivity per examination, 76% [median, 75%; interquartile range, 23%]). With NEV, the number of detected nodules increased to 71 (overall sensitivity, 80%; average sensitivity per examination, 80% [median, 75%; interquartile range, 34%]).
The results of pairwise comparison of the three analysis strategies with respect to sensitivity showed no statistically significant difference between sensitivity values obtained for radiologist 3 alone and those obtained for radiologist 3 with ImageChecker CT (P = .139). However, a statistically significant difference was found between sensitivity values obtained for radiologist 3 alone and those obtained for radiologist 3 with NEV (P = .041). There was no statistically significant difference between the two CAD systems (P = .109).
Calculation of κ coefficients (Table 2) showed an increased degree of agreement between any pair of radiologists with use of CAD software.
| DISCUSSION |
|---|
NEV versus ImageChecker CT
Although the two tested CAD systems function differently, no statistically significant difference in sensitivity could be detected between them. Although the overall sensitivity of NEV was slightly higher than that of ImageChecker CT, there were fewer false-positive findings with ImageChecker CT. Although the sensitivity rates obtained in this study were higher than those reported in the literature (range, 38%–86%) (2,17–19), neither software system is ideal because they both miss 20%–25% of all nodules. A further increase in sensitivity, however, can be expected with the ongoing development of algorithms. Thus, the sensitivity of the CAD systems is not perfect, and although CAD software should not be used as a first reader, it could have value as a second reader. Although it is not yet clear why some nodules are missed by the software system, the CAD software tends to work better for smaller nodules than for larger nodules.
Usually, the number of false-positive findings is a major drawback of CAD systems. In our study, both CAD systems had similar false-positive rates, with ImageChecker CT having fewer false-positive findings than NEV. Often, both software tools marked vessel bifurcations or small consolidations, which could be easily dismissed by the radiologists. All candidate nodules must be verified as true-positive or false-positive findings, and this can be challenging depending on the experience of the radiologist, as was demonstrated by Marten et al (19). In our study, the sensitivity of all three radiologists increased with use of the software tools; however, the least experienced radiologist still had the lowest sensitivity. This can be explained by this radiologist's lack of experience in the interpretation of CAD findings.
Nodule Detection
All radiologists had an increase in sensitivity with use of the software tools. Although there was not a statistically significant difference in the detection of all nodule sizes, the software tools were especially helpful in the detection of nodules that were smaller than 5 mm. There was a statistically significant increase in sensitivity with both software tools relative to the assessment of the radiologist alone. This result holds for all three radiologists, with the only exception being that the comparison between radiologist 3 with NEV and radiologist 3 alone did not result in a statistically significant difference in sensitivity. Rubin et al (21) showed that CAD software provides complementary nodule findings, making such findings superior to those of a second radiologist, who usually finds similar nodules as the first radiologist. In our study, an increased degree of agreement among the three radiologists was shown with CAD. Thus, CAD software tools have the potential to reduce variability among different radiologists and may lead to a less variable diagnosis.
Study Limitations
There were several limitations to this study. One might argue that there is a recall bias when looking at the examination results again. To avoid this bias, we waited 6 weeks to 3 months to reinterpret the images. Another bias could be the order of use for both systems; randomization of reading order would have reduced this bias. Third, the examinations were chosen at random and might not reflect a perfect average type of routine examination. Fourth, the number of examinations could have been increased. However, with three radiologists interpreting all examinations four times, the task was already time-consuming. In addition, more images without nodules could have been added to evaluate the performance of the CAD systems alone. Our study also contained a large number of very small nodules, for which clinical relevance is not yet clear.
In summary, the sensitivity of all three radiologists in the detection of pulmonary nodules increased with both CAD systems, and there was an increased degree of agreement among radiologists. These findings indicate the potential value of CAD, which leads to fewer false-negative diagnoses and a higher degree of interobserver agreement in clinical practice.
| FOOTNOTES |
|---|
Abbreviations: CAD = computer-aided detection; NEV = Nodule Enhanced Viewing
See Materials and Methods for pertinent disclosures.
Author contributions: Guarantor of integrity of entire study, M.D.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, M.D., A.H.M., J.E.W.; clinical studies, M.D., G.M., L.G., S.S., J.E.W.; experimental studies, all authors; statistical analysis, M.D., A.H.M., S.S., J.E.W.; and manuscript editing, M.D., G.M., T.G.F., L.G., S.S., T.K., R.W.G., J.E.W.