Breast Imaging
1 From the Department of Radiology, Ullevaal University Hospital, Breast Imaging Center, Kirkeveien 166, N-0407 Oslo, Norway (P.S., K.Y.); Institut Gustave Roussy, Villejuif, France (C.B.); Department of Diagnostic Radiology, University Charité, Berlin, Germany (F.D., S.D.); and Institut Imagerive, Geneva, Switzerland (J.C.P.); and private consultant, Milwaukee, Wis (L.T.N.). From the 2003 RSNA Annual Meeting. Received September 19, 2004; revision requested November 5; revision received January 24, 2005; accepted February 23. Address correspondence to P.S. (e-mail: per.skaane{at}ulleval.no).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Regional ethics committee approved the study; signed patient consents were obtained. Two-view mammograms were obtained with digital and screen-film systems at previous screening studies. Six readers interpreted images. Interpretation included Breast Imaging Reporting and Data System (BI-RADS) and five-level probability-of-malignancy scores. A case was one breast, with two standard views acquired with both screen-film mammography and digital mammography. The standard for an examination with normal findings was classification of normal (category 1) assigned by two independent readers; for cases with benign findings, the standard was benign results at diagnostic work-up in patients who were recalled. Cases with normal or benign findings that manifested as neither interval cancer nor as cancer at subsequent screening were considered the standard. All cancers were confirmed histologically. Images were interpreted by readers in two sessions 5 weeks apart; the same case was not seen twice in any session. Receiver operating characteristic (ROC) analysis and, for a given true-positive fraction, 2 x 2 table analysis and the McNemar test were used. For binary outcome, classification of BI-RADS category 3 or higher was defined as positive for cancer.
RESULTS: Cases with proved findings (n = 232) were displayed: 46 with cancers, 88 with benign findings, and 98 with normal findings. ROC analysis for all readers and all cases revealed a higher area under ROC curve (Az) for digital mammography (0.916) than for screen-film mammography (0.887) (P = .22). Five of six readers had a higher performance rating with digital mammography; one of five demonstrated a significant difference in favor of digital mammography with Az values; two showed a significant difference in favor of digital mammography with ROC analysis for a given false-positive fraction (P = .01 and .03, respectively). For cases with cancer, digital mammography resulted in correct classification of an average of three additional cancers per reader. For digital versus screen-film mammography, 2 x 2 table analysis for cancers revealed a higher true-positive rate; for benign masses, a higher true-negative rate. Neither of these differences nor any others from analysis of subgroups between the modalities were significant.
CONCLUSION: Digital mammography allowed correct classification of more breast cancers than did screen-film mammography. Az value was higher for digital mammography; this difference was not significant.
© RSNA, 2005
| INTRODUCTION |
|---|
Conventional screen-film mammography with high spatial resolution has so far been the modality of choice for screening programs. Screen-film mammography has some distinct advantages, including relatively inexpensive technology, high spatial resolution, convenient display of images with widely available illuminators, and the capability for simultaneous display of multiple images. New technologies for detection and characterization are being pursued because a relatively large number of cancers are missed in screening programs (1,2). Full-field digital mammography is a promising new technology. The main advantage of digital mammography is that the processes of image acquisition, image processing, image display, and image storage are decoupled. Consequently, with a digital mammographic imaging system, each of these processes is performed independently, and this independent performance allows each step to be optimized individually. In contrast-detail studies (3,4), the digital mammographic equipment that has been developed during the past few years has demonstrated superior depiction of low-contrast objects compared with that of screen-film mammographic equipment. It is likely that the benefits of digital technology may be best realized with soft-copy display and interpretation of images.
Investigators in experimental and retrospective studies (5–7) of comparisons between screen-film mammography and digital mammography with soft-copy reading have demonstrated comparable results for both modalities in regard to lesion detection and characterization. Researchers in these studies mainly have investigated the performance of these two modalities in phantoms or in symptomatic women. So far, there have been only three large-scale studies (8–10) of a comparison between screen-film mammography and digital mammography in asymptomatic women in a screening situation. Results in these three trials showed no statistically significant differences between screen-film mammography and digital mammography in cancer detection rate. There were, however, some noticeable differences in other results of these trials. Fewer cancers were detected with digital mammography compared with screen-film mammography in the Colorado-Massachusetts study (8,11) and the Oslo I study (9), whereas more cancers were detected with digital mammography in the Oslo II study (10). With digital mammography, a significantly lower recall rate was observed in the Colorado-Massachusetts study (8,11), but the recall rates were higher for digital mammography in the Oslo I and II studies (9,10). Thus, the aim of our study was to retrospectively compare screen-film mammography and digital mammography by using soft-copy interpretation for reader performance in the detection and classification of breast lesions in women in a screening program.
| MATERIALS AND METHODS |
|---|
A total of 250 cases, with two standard views acquired with both screen-film mammography and digital mammography, were selected from the database by a nonradiologist. The study initially included 100 cases with normal findings, 100 cases with abnormal but benign findings, and 50 cases with cancers. All 100 cases with normal findings were randomly collected from the Oslo I study, which included independent double reading for screen-film mammograms and for digital mammograms. Two of the authors (P.S. and K.Y.) had, together with six other radiologists, taken part in the image reading of the Oslo I study. The inclusion criterion as a reference standard for a normal examination was that all four independent readers in the screening program, the two readers for screen-film mammograms and the two readers for digital mammograms, had assigned a score of category 1 on the five-point rating scale of probability of cancer used in the Norwegian Breast Cancer Screening Program, which was considered normal. A further inclusion criterion for the cases with normal findings was that such a case did not manifest as an interval cancer and that the case also was assigned a score of normal (category 1) by both independent readers in the subsequent screening round 2 years later.
Of the 100 cases with abnormal findings, 85 cases were selected from the Oslo I study and 15 cases were selected from the Oslo II study. A case with benign findings was defined as such if the woman had been recalled for diagnostic work-up because of an abnormal mammographic interpretation (rating score of >1) by at least one of the two independent readers in the screening program and the results of diagnostic work-up (at ultrasonography [US] or fine-needle aspiration cytologic analysis) indicated a benign abnormality. Furthermore, the case with benign findings was defined as such if the case did not manifest as an interval cancer or as a cancer in the subsequent screening round 2 years later.
The 50 cancers, which included both cases of ductal carcinoma in situ and invasive cancers, were detected at screening in the Oslo I study (n = 27) or the Oslo II study (n = 23). All cancers were confirmed at histologic analysis. The histologic type of the cancers, the size of the cancers, and the density of the breast parenchyma were not known by the person randomly selecting the cases from the files of the Oslo Breast Cancer Screening Program.
The mean age and age range (minimum and maximum age) of the women with normal findings, abnormal findings, or cancers were calculated.
Imaging
Screen-film mammographic examinations were all performed with a unit (Mammomat 300; Siemens, Erlangen, Germany) with film (Min-R 2000; Kodak Health Imaging, Rochester, NY) and screens (Min-R 2190; Kodak Health Imaging). A molybdenum target and a molybdenum filter at 29 kV were always used. The hospital physicist chose 29 kV, in accordance with the recommendations of the Norwegian Breast Cancer Screening Program, to keep the dose low while acceptable image quality was maintained. Screen-film mammographic images had optical density in compliance with the Norwegian Breast Cancer Screening Program requirements (12). Digital mammograms were acquired with a digital system with a cesium iodide–amorphous silicon detector (Senographe 2000D; GE Medical Systems, Milwaukee, Wis). The unit is equipped with an automatic mode (automatic optimization of parameters) in which anode-filter combination, tube voltage, and tube current–time product are selected automatically after analysis of results at a short exposure before actual imaging was performed. The standard automatic optimization of parameters mode was used according to the manufacturer's recommendations. The area of the image detector was 19 x 23 cm. Mammograms obtained with both imaging modalities (screen-film mammography and digital mammography) included the two standard views (craniocaudal and mediolateral oblique) of each breast.
Image Interpretation
Six radiologists (readers A–F) from four European countries participated in the study. The readers' experience in screen-film mammography varied from 4 to 24 years, and their experience in digital mammography with soft-copy reading varied from 2 to 4 years. The number of screening examinations interpreted by each radiologist in his or her own practice varied from 2500 to 12 000 examinations per year. The two Norwegian radiologists had experience from a population-based (every member of the eligible target population being invited, with batch interpretation) screening program, and the other four readers had experience from service-based (self-referred or referred asymptomatic women, with consecutive individual interpretation) mammographic screening.
Soft-copy review was performed in a dedicated darkened room for digital mammography, and a darkened room with high-luminance view boxes was used for hard-copy display of screen-film mammographic images. No clinical information was available, and the readers had no knowledge of the screening results. Patient identification information was removed from the mammograms so that they were anonymous. The radiologists assigned scores to the images in two sessions that were 5 weeks apart such that the same case was not seen twice in any session. Each reading session included six interpretation rounds, and screen-film mammographic images and digital mammographic images were alternated, with a time limit of 60 minutes for about 40 cases in each round. A magnifying glass was always offered for screen-film mammographic interpretation. Images from the digital mammographic examinations were interpreted by using soft-copy reading on the review workstation (GE Medical Systems), which included two high-resolution 2000 x 2500 pixel monitors and a dedicated keypad. Postprocessing of the images, which included window-level adjustments, zooming, and inversion, was optional but strongly recommended.
For interpretation, a data form was included on which the reader marked the localization of an abnormality, if present. For cases in which more than one lesion was suspected, the lesion with the highest suspicion was considered. A Breast Imaging Reporting and Data System (BI-RADS) category of 1–5 was assigned for all cases as follows: category 1, findings negative for disease; category 2, findings that were benign, with no mammographic evidence of malignancy; category 3, findings that were probably benign, with short-term follow-up suggested in daily practice; category 4, suspicious abnormality, with biopsy considered; and category 5, findings that were highly suggestive of malignancy. BI-RADS category 0 was omitted. A cutoff between BI-RADS category 2 and category 3 was used for interpretations with positive versus negative results in the binary outcome for analysis with 2 x 2 tables; that is, for the cancer cases, assignment of a BI-RADS category of 3 or higher was considered as an interpretation with true-positive results. The readers also assigned a score for probability of cancer in all cases. To accomplish this task, they used the five-point rating scale applied by the Norwegian Breast Cancer Screening Program, according to the following: score 1, normal or definitely benign; score 2, probably benign; score 3, indeterminate finding; score 4, probably malignant; and score 5, malignant.
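The binary cutoff described above can be sketched in a few lines of Python. This is only an illustration of how paired interpretations of cancer cases fall into the four cells of a 2 x 2 table when BI-RADS category 3 or higher is treated as positive; the function names and the example scores are hypothetical and are not taken from the study data.

```python
from collections import Counter

POSITIVE_CUTOFF = 3  # BI-RADS category 3 or higher is read as positive (per the text)


def is_positive(birads: int) -> bool:
    """Return True when a BI-RADS category (1 to 5) counts as a positive call."""
    if birads not in (1, 2, 3, 4, 5):
        raise ValueError(f"BI-RADS category must be 1-5, got {birads}")
    return birads >= POSITIVE_CUTOFF


def paired_table(film, digital):
    """Tabulate paired film/digital calls on the same cancer cases."""
    if len(film) != len(digital):
        raise ValueError("film and digital score lists must have equal length")
    cells = Counter((is_positive(f), is_positive(d)) for f, d in zip(film, digital))
    return {
        "both_positive": cells[(True, True)],
        "both_negative": cells[(False, False)],
        "digital_only": cells[(False, True)],  # missed on film, found on digital
        "film_only": cells[(True, False)],     # found on film, missed on digital
    }


# Hypothetical scores for six interpretations of cancer cases.
film_scores = [4, 2, 5, 1, 3, 2]
digital_scores = [4, 3, 5, 2, 2, 4]
table = paired_table(film_scores, digital_scores)
print(table)
```

Only the two discordant cells ("digital_only" and "film_only") carry information about a difference between the modalities, which is why they are the input to the McNemar test.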
The density of breast parenchyma in each case was retrospectively determined by two radiologists (P.S. and K.Y.) by using the BI-RADS classification, which was as follows: category 1, fatty; category 2, scattered dense; category 3, heterogeneously dense; and category 4, extremely dense.
Statistical Analysis
Receiver operating characteristic (ROC) analysis was used for calculation of the diagnostic performance rating of each reader and for determination of overall results for the comparison of the average performance rating between screen-film mammography and digital mammography. The diagnostic performance rating was calculated for the BI-RADS categories, as well as for the probability of malignancy scores, and for the subgroups of masses and microcalcifications only. ROC analysis for the individual readers was performed by using a software program (ROCKIT, Macintosh PPC version 0.9.1 Beta; Charles E. Metz, University of Chicago, Chicago, Ill), whereas multireader analysis was performed by using another software program (LABMRMC, Macintosh PPC version 1.0b3; Charles E. Metz, University of Chicago). The software program for multireader analysis was used to compare the area under the ROC curve (Az) for the six readers and yielded a mean Az value. For comparison of the mean Az values of screen-film mammography and digital mammography, a difference with a P value of less than .05 was considered statistically significant.
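As a rough illustration of what an area under the ROC curve measures for ordinal ratings, the sketch below computes the nonparametric (Mann-Whitney) area. This is a deliberately simpler technique than the one used in the study: ROCKIT and LABMRMC fit a binormal model to obtain Az, so this sketch would not reproduce the published values. The function name and the example ratings are hypothetical.

```python
def empirical_auc(cancer_scores, noncancer_scores):
    """Nonparametric ROC area: the probability that a randomly chosen cancer
    case receives a higher rating than a randomly chosen noncancer case,
    with ties counted as one half."""
    if not cancer_scores or not noncancer_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical five-level ratings from one reader with one modality.
cancer = [5, 4, 3, 3, 2]
noncancer = [1, 1, 2, 2, 3, 1]
auc = empirical_auc(cancer, noncancer)
print(f"empirical AUC = {auc:.3f}")
```

With only five rating levels, many case pairs are tied, which is one reason a fitted binormal curve and the empirical area can differ.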
By using the software program for ROC analysis, comparison of diagnostic performance ratings was also performed for a fixed true-positive fraction at a given false-positive fraction on the ROC curve. This analysis was performed to compare performance ratings of screen-film mammography and digital mammography at a specific operating point on the ROC curve. Since a BI-RADS category of 3 or higher was defined as a true-positive classification for the cancer cases, the operating point was chosen as the false-positive fraction that corresponded to a BI-RADS category of 3. For comparison of scores of individual readers, the average false-positive fraction from the screen-film mammographic and digital mammographic scores was used. A paired t test was used to test for significance; a difference with a P value of less than .05 was considered statistically significant.
Analysis with a 2 x 2 table was applied for comparison of screen-film mammographic interpretations and digital mammographic interpretations that were based on the BI-RADS categories for subgroups of the study population. A BI-RADS category of 3 or higher was defined as a true-positive classification for the cancer cases. The McNemar test (a difference with a P value of less than .05 was considered statistically significant) was used to compare the discordant pairs in the analysis with the 2 x 2 tables (Epi Info, version 6; Centers for Disease Control and Prevention, Atlanta, Ga). For each case, the average value of the BI-RADS category over the six readers was computed, and a mean value of 3 or higher was considered positive for the probability that a cancer was present.
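The case-level step described above (averaging the BI-RADS category over the six readers, then calling a mean of 3 or higher positive) and the comparison of discordant pairs can be sketched as follows. The exact binomial form of the McNemar test is an assumption made for this illustration; the text does not state which form Epi Info applied. All scores and function names are hypothetical.

```python
from math import comb


def case_positive(reader_scores, cutoff=3.0):
    """A case is positive when the mean BI-RADS category over readers is >= cutoff."""
    return sum(reader_scores) / len(reader_scores) >= cutoff


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical BI-RADS scores: one row per cancer case, six readers per row.
film_cases = [[4, 4, 5, 3, 4, 4], [2, 2, 3, 2, 3, 2], [3, 2, 3, 3, 2, 2], [1, 2, 2, 1, 2, 2]]
digital_cases = [[4, 5, 5, 4, 4, 4], [3, 3, 4, 3, 3, 2], [3, 3, 4, 3, 3, 3], [2, 2, 3, 2, 2, 2]]

digital_only = film_only = 0
for film, digital in zip(film_cases, digital_cases):
    f, d = case_positive(film), case_positive(digital)
    if d and not f:
        digital_only += 1
    elif f and not d:
        film_only += 1

p_value = mcnemar_exact(digital_only, film_only)
print(digital_only, film_only, p_value)
```

Averaging over readers first yields one paired observation per case, so the test is run on at most as many pairs as there are cases rather than on the larger number of individual interpretations.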
| RESULTS |
|---|
Analysis with 2 x 2 tables that were based on the BI-RADS categories for the subgroups of masses and microcalcifications only and for masses and microcalcifications in women with dense breast parenchyma (density of breast parenchyma BI-RADS categories 3 and 4) showed no significant differences between the two modalities.
Analysis with 2 x 2 tables for all cancers (46 cases or 276 interpretations) showed that 235 of 276 interpretations by the six readers were true-positive with both modalities and six were false-negative with both modalities. Twenty-seven interpretations were true-positive with digital mammography but false-negative with screen-film mammography, compared with eight that were false-negative with digital mammography but true-positive with screen-film mammography, for a discordant pair of 27 and eight (Table 3). With application of the McNemar test, which was based on averaging of the BI-RADS categories, to the 46 cancers determined by the six readers, however, no statistically significant difference between screen-film mammography and digital mammography was shown. Classification of the cancer cases according to masses only (26 cases or 156 interpretations) revealed that 13 interpretations were true-positive at digital mammography but false-negative at screen-film mammography, compared with six interpretations that were true-positive with screen-film mammography but false-negative with digital mammography, whereas 132 interpretations were true-positive with both modalities and five were false-negative with both. Hence, digital mammography yielded seven (4.5%) more correct classifications of mass cancers in 156 interpretations than did screen-film mammography. For malignant microcalcifications only (14 cases or 84 interpretations), the corresponding discordant pair was five versus one in favor of digital mammography; that is, four (4.8%) more interpretations of malignant microcalcifications in 84 were correct with digital mammography than with screen-film mammography. Six malignancies that manifested with both microcalcifications and a mass or density were not included in this classificatory analysis of the cancers.
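The net differences quoted in this paragraph follow directly from the discordant pairs. The short Python check below reproduces that arithmetic from the counts given in the text; the helper name is hypothetical, and the per-reader figure simply divides the pooled net difference by the six readers.

```python
def net_gain(digital_only, film_only, total):
    """Net extra true-positive interpretations with digital mammography,
    as a count and as a percentage of all interpretations in the subgroup."""
    net = digital_only - film_only
    return net, round(100 * net / total, 1)


# The four cells for all cancers should add up to 276 interpretations.
cells_total = 235 + 6 + 27 + 8

all_cancers = net_gain(27, 8, 276)   # all 46 cancers, six readers
masses = net_gain(13, 6, 156)        # 26 mass cancers, six readers
microcalcs = net_gain(5, 1, 84)      # 14 microcalcification cancers, six readers
per_reader = all_cancers[0] / 6      # average net gain per reader

print(cells_total, all_cancers, masses, microcalcs, round(per_reader, 1))
```

The per-reader value of roughly three is consistent with the statement in the abstract that digital mammography resulted in correct classification of an average of three additional cancers per reader.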
In the subgroup of cancers in dense breast parenchyma (BI-RADS categories 3 and 4), 89 of 108 interpretations were true-positive with both modalities, five interpretations were false-negative with both modalities, and the discordant pair of 11 and three in favor of digital mammography was not statistically significant.
| DISCUSSION |
|---|
We present the results of ROC analyses that were based on the BI-RADS classification, since most breast imaging radiologists are familiar with this system. A principal requirement for performance of ROC analysis is that the measurements or interpretations are meaningfully ranked in magnitude (13). It is a matter of discussion, however, whether the BI-RADS categories of 1–5 should be considered as a linear, continuous scale suitable for ROC analysis (14). The overall mean Az values for all cases and all readers that were based on the BI-RADS categories were comparable with the Az values that were based on the somewhat more linear five-point rating scale for probability of cancer. We, therefore, decided to present the diagnostic performance rating on the basis of the widely familiar categories of 1–5 in the BI-RADS lexicon. The ROC curves for digital mammography and screen-film mammography for all readers and all cases on the basis of the BI-RADS classification did not intersect, whereas the corresponding curves that were based on the probability scale showed that the two curves crossed slightly.
In the present study, results of the ROC analyses showed a slightly higher overall reader diagnostic performance rating with digital mammography, compared with screen-film mammography, although the difference was not statistically significant. Comparison of the interpretations for the cancer cases revealed a higher true-positive rating for digital mammography, but the difference was not statistically significant. Five of the six readers had a higher diagnostic performance rating with digital mammography by using Az values for comparison, and the difference was statistically significant for one of the five readers. Two of the six readers showed a statistically higher overall performance rating with digital mammography, if comparison is based on a fixed single false-positive fraction that defines BI-RADS category 3 or higher as a true-positive classification for cancers. Thus, the results of our study confirm the results of previous studies with experimental settings with a digital mammographic unit, which have shown that digital mammography is at least comparable with screen-film mammography in the detectability and characterization of microcalcifications and low-contrast objects (5–7,15–18). The differences among the six readers reflect the challenges of interobserver variation in the interpretation of screening mammograms (19,20).
The substantially higher number of true-negative interpretations for benign masses with digital mammography compared with screen-film mammography in our study is noteworthy. This better characterization of benign masses with digital mammography seems to support the results of a previous study, which showed a statistically lower recall rate for women who underwent digital mammography, compared with those who underwent screen-film mammography, in a screening population (8). On the other hand, the higher number of false-positive classifications (BI-RADS category 3 or higher) of cases with normal findings in our study with digital mammography (11.9%) compared with screen-film mammography (8.2%) seems to support the results of the Oslo I and II studies, in which a higher recall rate was reported for women who underwent digital mammography (9,10). Interpretation of our results related to benign and malignant masses caused problems, partly because the ROC curves for masses intersected. The characterization of benign microcalcifications in our study showed no difference between the two modalities.
Digital mammography is expected to have a potential benefit over screen-film mammography for breast cancer detection in women with dense breast parenchyma because of increased contrast resolution and wider dynamic range. Classification of the population according to cancer cases in women with dense breast parenchyma (BI-RADS breast density categories 3 and 4) in our study, however, revealed no significant difference in the true-positive interpretations between the two modalities. Since there were only 18 cancers in this subgroup, a larger study would be required to verify improved detection and classification of cancers with digital mammography in women with dense breast parenchyma. There were only slightly, but not significantly, more correct true-negative interpretations with digital mammography for the benign masses in women with dense breast parenchyma.
The experimental design of this study led to some limitations. An important aspect of our study was the comparison of diagnostic performance ratings with respect to screen-film mammography versus digital mammography for the detection of small asymptomatic lesions, such as those usually found in women in a mammographic screening population. The large number of cancers included in the study population sample, however, does not reflect a screening situation, and this feature may explain the high number of false-positive interpretations, and especially the relatively high number of false-positive interpretations for cases with normal findings with both modalities. This clearly shows, once again, that results from retrospective experimental studies are difficult to compare with prospective interpretations performed in daily practice. Also, significant results could not be derived for different mammographic types of lesions, because of a relatively low number of cases in various subgroups and, consequently, a lack of power.
In conclusion, our results demonstrated that digital mammography with soft-copy reading was slightly superior to screen-film mammography in the detection and the characterization of lesions that are representative of those in women in a mammographic screening program. Overall, however, there was no statistically significant difference in diagnostic performance ratings between the two imaging modalities. The small number of cases in our study limits the ability to determine true differences between the two imaging modalities for various subgroups of lesions.
| FOOTNOTES |
|---|
Abbreviations: Az = area under ROC curve, BI-RADS = Breast Imaging Reporting and Data System, ROC = receiver operating characteristic
2 Current address: Hologic, Hillsborough, NC
L.T.N. was a consultant for GE Medical Systems, Milwaukee, Wis. See Materials and Methods for pertinent disclosures.
Author contributions: Guarantor of integrity of entire study, P.S.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; literature research, P.S., F.D., S.D., J.C.P., K.Y., L.T.N.; clinical studies, all authors; statistical analysis, P.S., L.T.N.; and manuscript editing, P.S., C.B., F.D., S.D., J.C.P., L.T.N.