Breast Imaging
1 From the Departments of Radiology, University of Pittsburgh and Magee-Womens Hospital, Imaging Research, Suite 4200, 300 Halket St, Pittsburgh, PA 15213. Received April 29, 2002; revision requested June 21; final revision received September 23; accepted October 23. Supported in part by grants CA77850, CA85241, and CA80836 from the National Cancer Institute of the National Institutes of Health and by the U.S. Army Medical Research Acquisition Center, Fort Detrick, Md, under contract DAMD17-00-1-0410. Address correspondence to B.Z. (e-mail: zhengb@msx.upmc.edu).
| ABSTRACT |
|---|
MATERIALS AND METHODS: One hundred positive mammographic examinations (four views each), depicting 96 masses and 50 microcalcification clusters, were scanned and analyzed three times by the CAD system. Reproducibility of detection sensitivity and the individual CAD-generated cues in the three images were examined. Both abnormality- and region-based detection sensitivities were compared.
RESULTS: Forty-eight (96.0%) of 50 microcalcification clusters were marked on all three images in the abnormality-based analysis. Of the remaining two clusters, one was marked in two images and one was marked in only one. The abnormality-based sensitivity for mass detection ranged from 66.7% (64 of 96) to 70.8% (68 of 96). The system generated identical patterns (including images with and those without cues) for all three images in 53.3% (213 of 400) of images. For true-positive cluster regions, 88.9% (80 of 90) were marked at the same location in all images. For true-positive mass regions, 69.5% (82 of 118) were marked at the same locations in all images. In false-positive detections, only 44.0% (81 of 184) of false-positive mass regions and 31.9% (38 of 119) of false-positive cluster regions were marked at the same locations on all three images.
CONCLUSION: Reproducibility of marked regions generated by the CAD system is improved from that reported previously, largely as a result of the substantial reduction in the false-positive detection rates. Reproducibility of true-positive identification of masses remains an important issue that may have methodologic and clinical practice implications.
Index terms: Breast neoplasms, diagnosis, 00.119 Breast radiography, technology, 00.119 Computers, diagnostic aid, 00.119
| INTRODUCTION |
|---|
Because of the potential importance of CAD systems in the clinical environment, several studies (6-10) have been conducted recently to evaluate the performance of CAD systems alone and their possible effect on diagnostic performance of radiologists under a variety of clinical conditions. In one recent study involving 12,860 patients in a community breast center, use of CAD resulted in a 19.5% increase in the number of cancers detected without undue effect on the recall rate (from 6.5% to 7.7%) (6). In another large retrospective study, a false-negative rate of 21% was found when 14 radiologists interpreted mammograms, and the CAD system correctly marked 77% of these missed cases (7). Thus, researchers claim that CAD cueing could potentially reduce this false-negative rate by as much as 77% without an increase in the recall rate (8). On the other hand, findings in a different study showed that despite high (and clinically viable) sensitivity, the CAD system had no effect on radiologist performance (including sensitivity and specificity) (9). These researchers suggested that perhaps the many false-positive markings influenced the radiologists not to have sufficient confidence in the CAD results to alter their original interpretations (9). Results in another retrospective study demonstrated that the performance of a CAD system could affect the performance of radiologists in the detection of masses and microcalcification clusters. Highly performing CAD schemes with high sensitivity and a low false-positive rate could improve radiologists' performance significantly, while poorly performing CAD schemes could significantly (P < .01) decrease readers' performance (10).
An important issue related to the use of CAD is the reproducibility of results. In one study, an early version of ImageChecker (R2 Technology, Los Altos, Calif) was evaluated, and the authors suggested that its reproducibility may be insufficient for the routine clinical environment (11). Recently, a new version of the software was used, which improves the detection sensitivity and specificity (12). In the version used in the current study (ImageChecker, version 2.0), the stated detection sensitivity for the cancer cases was increased from 83.7% to 90.4% (including an increase in mass detection from 74.7% to 85.7% and an essentially unchanged performance for microcalcification detection of more than 98%). At the same time, the false-positive rate was reduced substantially from approximately 1.0 per image to 0.5 per image (or from 4.1 to 2.06 false-positive cues per four views in true-negative cases) (12). The purpose of our study was to examine the performance and reproducibility of a commercially available CAD system by using a set of mammograms acquired in 100 patients who had undergone biopsy after positive findings at mammography.
| MATERIALS AND METHODS |
|---|
Each case could involve one or more abnormalities (mass, microcalcification cluster, or both). In these 100 cases, 51 depicted only masses (43 depicted one mass and eight depicted two masses), 12 depicted only microcalcification clusters (11 depicted one cluster and one depicted two clusters), and 37 depicted both masses and clusters (one mass and one cluster). There were no cases with more than three abnormalities depicted. The data set involved 96 verified masses and 50 verified microcalcification clusters. Sixty-five of the 96 masses were malignant, and 31 were benign. Thirty-one of the 50 microcalcification clusters were associated with malignancy, and 19 were benign. By examining all source documents (including pathology reports), the locations of all abnormalities were specified by radiologists.
CAD Evaluation
These 400 images were scanned through the CAD system three times within a period of 3 weeks. After digitization and computation, suspicious masses and microcalcification clusters identified by the CAD system were marked on the output paper images by using the standard identification scheme. The CAD system does not outline the entire mass region or individual microcalcifications in a cluster; instead, a small star or a triangle is superimposed on the image to indicate the presence of a suspicious region for a mass or a cluster, respectively. The boundaries of masses and clusters were identified visually on the images by a researcher (B.Z.), who consulted with radiologists in cases of ambiguity. If the star was located anywhere inside a true-positive mass region in the image, this mass was considered to be identified correctly by the CAD system. Similarly, as long as a triangle overlapped any of the microcalcification areas, the mark was considered to represent a true-positive detection. Otherwise, the cue was considered to identify a false-positive region. The processing of each case resulted in three sets of output images.
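The scoring rule described above can be sketched in code. This is only an illustration under stated assumptions, not the software used in the study: the names `Region`, `Cue`, and `classify_cue` are hypothetical, the visually identified lesion boundaries are approximated by rectangular bounding boxes, each cue is reduced to a single point, and the requirement that the cue type match the lesion type is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical bounding box around a verified abnormality."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    kind: str  # "mass" or "cluster"

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class Cue:
    """Hypothetical CAD mark: a star ("mass") or a triangle ("cluster")."""
    x: float
    y: float
    kind: str


def classify_cue(cue: Cue, regions: list[Region]) -> str:
    """Return "TP" if the cue falls inside a verified region of the same
    kind (assumed matching rule), otherwise "FP"."""
    for region in regions:
        if region.kind == cue.kind and region.contains(cue.x, cue.y):
            return "TP"
    return "FP"


if __name__ == "__main__":
    verified = [Region(10, 10, 20, 20, "mass")]
    print(classify_cue(Cue(15, 15, "mass"), verified))  # inside the mass box
    print(classify_cue(Cue(25, 15, "mass"), verified))  # outside every box
```

A real implementation would need the actual lesion outlines rather than boxes; the box is used here only to keep the sketch short.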
Data Analysis
The sensitivity, false-positive rate, and reproducibility of the CAD system with these 100 cases (or 400 images) were analyzed for abnormality- and region-based values. In the abnormality-based analysis, the sensitivity is assessed on the basis of the correct marking of at least one true-positive region in either view (craniocaudal, mediolateral oblique, or both), which included 96 masses (65 malignant) and 50 calcifications (31 malignant) in the 100 cases. In cases with more than one abnormality, each was considered to be independent of the others. In the region-based analysis, the abnormality depicted in each view (either craniocaudal or mediolateral oblique) was considered an independent true-positive finding. Sensitivity was then computed on the basis of the number of correctly detected true-positive regions (rather than abnormalities). This approach included 292 positive findings, namely, 96 masses and 50 clusters, each visible on two views. To compare the differences in proportions of correctly detected abnormalities among replicated images, the pairwise McNemar test was applied to the data set.
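The two sensitivity definitions and the paired comparison can be illustrated with a short sketch. The function names and the toy input are hypothetical. The text does not say which form of the McNemar test was applied, so the exact two-sided binomial form shown here is an assumption chosen because it needs only the standard library.

```python
from math import comb


def sensitivities(findings):
    """findings: list of (cc_marked, mlo_marked) booleans, one pair per
    verified abnormality visible on both views.

    Returns (abnormality_based, region_based) sensitivity.
    """
    n = len(findings)
    # Abnormality-based: a hit in at least one of the two views counts.
    abnormality_hits = sum(1 for cc, mlo in findings if cc or mlo)
    # Region-based: each view is an independent true-positive finding.
    region_hits = sum(int(cc) + int(mlo) for cc, mlo in findings)
    return abnormality_hits / n, region_hits / (2 * n)


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p value (assumed variant).

    b: abnormalities detected in scan A but missed in scan B.
    c: abnormalities missed in scan A but detected in scan B.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Region count used in the text: (96 masses + 50 clusters) * 2 views.
    print((96 + 50) * 2)
    toy = [(True, True), (True, False), (False, False), (False, True)]
    print(sensitivities(toy))
    print(mcnemar_exact(0, 5))
```

Only discordant pairs enter the McNemar statistic, which is why abnormalities detected in both scans or missed in both do not affect the p value.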
| RESULTS |
|---|
Although Tables 1 and 2 show that the total number of regions detected in this set of images is relatively constant with all three scans, the locations of the regions detected (in particular, false-positive regions) could differ from scan to scan. In 213 of 400 images, the output results for all three scans were identical, which represents an overall reproducibility of 53.3%. Among these images, 37.6% (80 of 213) had no cues (including neither true-positive nor false-positive cues) in all three scans. For the remaining 320 images, the CAD system marked 511 regions (1.6 cues per image) in three scans (including true-positive cues). Of these 511 marked regions, 281 were identified on all three scans (55% region-based reproducibility).
Tables 3 and 4 summarize the number of true-positive and false-positive masses and microcalcification clusters (including both abnormalities and regions) that were identified in all three scans, two scans, or only one scan. The results show that the reproducibility for the true-positive regions (those identified in all three scans) is substantially higher than that for the false-positive regions. For the true-positive mass regions, the CAD system generated 118 cues in three scans, and 82 (69.5%) of them were marked at the same locations. For the true-positive cued cluster regions, 88.9% (80 of 90) of cues were in the same locations for all three scans. On the other hand, the reproducibility of the false-positive cues was much lower, with a higher fraction of different cues being generated in each scan. Only 44.0% (81 of 184) of the false-positive mass regions and 31.9% (38 of 119) of the false-positive microcalcification cluster regions were marked at the same locations in all three scans.
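The reproducibility percentages quoted in this section follow directly from the reported counts. The sketch below recomputes them as a consistency check; the helper `percent` and the list layout are illustrative choices, and half-up decimal rounding is assumed to be the rounding convention used in the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(numerator, denominator):
    """Percentage rounded half-up to one decimal place (assumed convention)."""
    value = Decimal(100 * numerator) / Decimal(denominator)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (description, numerator, denominator, percentage as reported in the text)
REPORTED = [
    ("images with identical output in all three scans", 213, 400, "53.3"),
    ("identical images that had no cues", 80, 213, "37.6"),
    ("marked regions identified on all three scans", 281, 511, "55.0"),
    ("true-positive mass regions at the same location", 82, 118, "69.5"),
    ("true-positive cluster regions at the same location", 80, 90, "88.9"),
    ("false-positive mass regions at the same location", 81, 184, "44.0"),
    ("false-positive cluster regions at the same location", 38, 119, "31.9"),
]

mismatches = [
    label
    for label, k, n, reported in REPORTED
    if percent(k, n) != Decimal(reported)
]

if __name__ == "__main__":
    for label, k, n, reported in REPORTED:
        print(f"{label}: {k}/{n} = {percent(k, n)}% (reported {reported}%)")
    print("mismatches:", mismatches)
```

Decimal arithmetic is used instead of the built-in `round` because binary floating-point rounding of a value such as 53.25 does not reliably round half up.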
| DISCUSSION |
|---|
It should be noted that we obtained somewhat different results in absolute terms for the benign and malignant cases, but the pattern for the two groups remained similar. All cases in our study were sufficiently suspicious to ultimately warrant a recommendation for biopsy. We believe that at this stage, CAD schemes should be designed and optimized to identify this group of cases, including those that ultimately prove to be benign. It is well known that repeated scanning of the same image results in a slightly different digital value matrix for a variety of technical reasons. In current CAD systems, a binary threshold is typically used to generate detection marks. Each marked region has a computed score that is above a predetermined threshold; hence, lesions with computed scores that are near the threshold are vulnerable to small changes and may be detected in one image and missed in another. Findings in the present study show that the reproducibility of false-positive cues was much lower than that of true-positive cues (Tables 3 and 4), possibly because the detection scores of false-positive regions are more often close to the threshold. We did not perform a complete long-term follow-up to confirm that all false-positive cues actually represented negative regions. Should any false-positive detection prove to be a true abnormality, the computed reproducibility level would be lower than that reported herein.
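The threshold argument can be made concrete with a simple model. Everything in the sketch below is an assumption introduced for illustration: the text gives no noise model, so scan-to-scan variation is represented as independent Gaussian noise added to a fixed underlying score, and the function names and parameter values are hypothetical.

```python
from math import erf, sqrt


def detection_probability(score, threshold, noise_sd):
    """Probability that score + Gaussian noise exceeds the threshold
    in a single scan (assumed noise model)."""
    z = (score - threshold) / noise_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def scan_pattern_probabilities(score, threshold, noise_sd, n_scans=3):
    """Probability that a region is marked in all scans, in none, or
    inconsistently, assuming independent scans."""
    p = detection_probability(score, threshold, noise_sd)
    marked_in_all = p ** n_scans
    marked_in_none = (1.0 - p) ** n_scans
    return {
        "all": marked_in_all,
        "none": marked_in_none,
        "inconsistent": 1.0 - marked_in_all - marked_in_none,
    }


if __name__ == "__main__":
    # Hypothetical scores: one on the threshold, one well above it.
    for score in (0.50, 0.60):
        result = scan_pattern_probabilities(score, threshold=0.50, noise_sd=0.02)
        print(score, result)
```

Under this model a region whose score sits exactly on the threshold is marked inconsistently across three scans three times out of four, whereas a region several noise standard deviations above the threshold is marked in essentially every scan.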
Note that the databases used in this and a previous (11) study were small; hence, the results may not represent the actual reproducibility of CAD systems in the screening environment. Despite this limitation, the two studies highlight an important finding: current CAD schemes are sensitive to small variations in the digital value matrices that result from repeated scanning of the same images. This may have methodologic and clinical practice implications that need to be addressed. The fact that all abnormalities depicted in the present study were visible on both views indicates that the cases were not particularly subtle and that the findings we report herein, including possible implications, may be magnified in cases that are more difficult to identify visually or when the abnormality is visible only on one view. We suspect that this sensitivity to minor changes in the matrices is not unique to the CAD system evaluated in the current study. Full-field digital mammography systems are rapidly becoming available (14,15). By definition, once an image is acquired, the CAD detection result will be 100% reproducible when the same CAD scheme is applied repeatedly to such an image. However, current CAD schemes may have to be reengineered and reoptimized by using digitally acquired images before they can be applied optimally to full-field digital mammography systems. An investigation on possible effects of repeated image acquisition of the same breast on CAD results is beyond the scope of the present study.
Findings in our preliminary study suggest that sensitivity for the detection of microcalcification clusters is high; as a result, reproducibility is also high. These results are achieved at a low false-positive detection rate; hence, the CAD system is a useful tool for this task during the diagnostic process. Our results raise the important question about the possible need to maintain records of CAD cues as available during the interpretation of the individual cases. This may become an even more important issue as cancer detection continues to progress toward an earlier stage (hence, a more subtle appearance) on the average. Detailed documentation of all available information at the time of diagnosis is not always done, particularly since information is often provided verbally. In the case of screening mammographic interpretation, however, the presence of a malignancy that was visible (in retrospect) on a previous mammogram, and for which a follow-up scan of the original images in a CAD system may produce a true-positive identification, could present a medicolegal problem. It will be difficult to argue that the abnormality in question was not identified as suspicious on the original image. Findings in our preliminary study suggest that this may be the case in a noticeable fraction of mass cases (approximately 20%, as shown in Table 3).
The current practice associated with the use of CAD in the mammographic environment is not clear on whether a record of the CAD results used during the case interpretation should be retained. Until mass detection is substantially improved, results in our study suggest that such a practice should be considered. Interestingly, although largely impractical, our study findings clearly suggest that at this level of performance, multiple repeated scans of each case could be acquired to improve the performance of CAD schemes.
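One way to picture the repeated-scan idea is to pool the cues from several scans of the same case. The text does not specify a combination rule, so the vote-count rule below, the function name `combine_scans`, and the cue labels are all hypothetical. Accepting a cue seen in at least one scan (a union) would be expected to raise sensitivity at the cost of more false-positive cues, while requiring agreement in several scans would have the opposite effect.

```python
from collections import Counter


def combine_scans(scans, min_votes=1):
    """Pool cue labels from repeated scans of one case.

    scans: iterable of collections of cue labels, one per scan, where
           the same label means the same location (assumed to be
           matched beforehand).
    min_votes: keep a cue only if it appears in at least this many scans.
    """
    votes = Counter(label for scan in scans for label in set(scan))
    return {label for label, count in votes.items() if count >= min_votes}


if __name__ == "__main__":
    # Hypothetical case: mass "m1" is missed in the second scan and a
    # false-positive cue "fp7" appears only in the third scan.
    scans = [{"m1", "c1"}, {"c1"}, {"m1", "c1", "fp7"}]
    print(sorted(combine_scans(scans, min_votes=1)))  # union of all cues
    print(sorted(combine_scans(scans, min_votes=2)))  # majority of scans
    print(sorted(combine_scans(scans, min_votes=3)))  # unanimous cues only
```

In this toy case the union recovers the mass that a single scan could miss but also keeps the unstable false-positive cue, which illustrates the trade-off that any such scheme would need to evaluate.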
| FOOTNOTES |
|---|
Author contributions: Guarantor of integrity of entire study, B.Z.; study concepts, B.Z.; study design, B.Z., J.H.S.; literature research, B.Z., S.G.; clinical and experimental studies, B.Z., W.R.P.; data acquisition, B.Z., L.A.H., S.G.; data analysis/interpretation, B.Z.; statistical analysis, B.Z.; manuscript preparation, B.Z., S.G., L.A.H.; manuscript definition of intellectual content, B.Z.; manuscript editing, B.Z., S.G., L.A.H.; manuscript revision/review and final version approval, all authors.
The content of the contained information does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred.