Radiology
Published online before print August 11, 2005, 10.1148/radiol.2371041605
(Radiology 2005;237:37-44.)
© RSNA, 2005


Breast Imaging

Breast Lesion Detection and Classification: Comparison of Screen-Film Mammography and Full-Field Digital Mammography with Soft-copy Reading—Observer Performance Study1

Per Skaane, MD, PhD, Corinne Balleyguier, MD, Felix Diekmann, MD, Susanne Diekmann, MD, Jean-Charles Piguet, MD, Kari Young, MD and Loren T. Niklason, PhD2

1 From the Department of Radiology, Ullevaal University Hospital, Breast Imaging Center, Kirkeveien 166, N-0407 Oslo, Norway (P.S., K.Y.); Institut Gustave Roussy, Villejuif, France (C.B.); Department of Diagnostic Radiology, University Charité, Berlin, Germany (F.D., S.D.); and Institut Imagerive, Geneva, Switzerland (J.C.P.); and private consultant, Milwaukee, Wis (L.T.N.). From the 2003 RSNA Annual Meeting. Received September 19, 2004; revision requested November 5; revision received January 24, 2005; accepted February 23. Address correspondence to P.S. (e-mail: per.skaane{at}ulleval.no).


    ABSTRACT
 
PURPOSE: To retrospectively compare screen-film and full-field digital mammography with soft-copy interpretation for reader performance in detection and classification of breast lesions in women in a screening program.

MATERIALS AND METHODS: Regional ethics committee approved the study; signed patient consents were obtained. Two-view mammograms were obtained with digital and screen-film systems at previous screening studies. Six readers interpreted images. Interpretation included Breast Imaging Reporting and Data System (BI-RADS) and five-level probability-of-malignancy scores. A case was one breast, with two standard views acquired with both screen-film mammography and digital mammography. The standard for an examination with normal findings was classification of normal (category 1) assigned by two independent readers; for cases with benign findings, the standard was benign results at diagnostic work-up in patients who were recalled. Cases with normal or benign findings that manifested as neither interval cancer nor as cancer at subsequent screening were considered the standard. All cancers were confirmed histologically. Images were interpreted by readers in two sessions 5 weeks apart; the same case was not seen twice in any session. Receiver operating characteristic (ROC) analysis and, for a given true-positive fraction, 2 x 2 table analysis and the McNemar test were used. For binary outcome, classification of BI-RADS category 3 or higher was defined as positive for cancer.

RESULTS: Cases with proved findings (n = 232) were displayed: 46 with cancers, 88 with benign findings, and 98 with normal findings. ROC analysis for all readers and all cases revealed a higher area under ROC curve (Az) for digital mammography (0.916) than for screen-film mammography (0.887) (P = .22). Five of six readers had a higher performance rating with digital mammography; one of five demonstrated a significant difference in favor of digital mammography with Az values; two showed a significant difference in favor of digital mammography with ROC analysis for a given false-positive fraction (P = .01 and .03, respectively). For cases with cancer, digital mammography resulted in correct classification of an average of three additional cancers per reader. For digital versus screen-film mammography, 2 x 2 table analysis for cancers revealed a higher true-positive rate; for benign masses, a higher true-negative rate. Neither of these differences nor any others from analysis of subgroups between the modalities were significant.

CONCLUSION: Digital mammography allowed correct classification of more breast cancers than did screen-film mammography. Az value was higher for digital mammography; this difference was not significant.



    INTRODUCTION
 
Mammography is the established method for detection of unsuspected breast cancer at an early preclinical stage in asymptomatic women, and it has sufficient specificity to be used in screening. The success of screening mammography depends on the perception of small and sometimes subtle lesions. The depiction of fine microcalcifications and subtle soft-tissue masses at high-quality mammography is key to the detection of early breast cancer. The interval between screening rounds, the image quality, and the radiologist's skill and training in the perception and analysis of subtle features of the lesion are important factors that must be optimized to achieve the goal of early cancer detection.

Conventional screen-film mammography with high spatial resolution has so far been the modality of choice for screening programs. Screen-film mammography has some distinct advantages, including relatively inexpensive technology, high spatial resolution, convenient display of images with widely available illuminators, and the capability for simultaneous display of multiple images. New technologies for detection and characterization are being pursued because a relatively large number of cancers are missed in screening programs (1,2). Full-field digital mammography is a promising new technology. The main advantage of digital mammography is that the processes of image acquisition, image processing, image display, and image storage are decoupled. Consequently, with a digital mammographic imaging system, each of these processes is performed independently, and this independent performance allows each step to be optimized individually. In contrast-detail studies (3,4), the digital mammographic equipment that has been developed during the past few years has demonstrated superior depiction of low-contrast objects compared with that of screen-film mammographic equipment. It is likely that the benefits of digital technology may be best realized with soft-copy display and interpretation of images.

Investigators in experimental and retrospective studies (5–7) of comparisons between screen-film mammography and digital mammography with soft-copy reading have demonstrated comparable results for both modalities in regard to lesion detection and characterization. Researchers in these studies mainly have investigated the performance of these two modalities in phantoms or in symptomatic women. So far, there have been only three large-scale studies (8–10) of a comparison between screen-film mammography and digital mammography in asymptomatic women in a screening situation. Results in these three trials showed no statistically significant differences between screen-film mammography and digital mammography in cancer detection rate. There were, however, some noticeable differences in other results of these trials. Fewer cancers were detected with digital mammography compared with screen-film mammography in the Colorado-Massachusetts study (8,11) and the Oslo I study (9), whereas more cancers were detected with digital mammography in the Oslo II study (10). With digital mammography, a significantly lower recall rate was observed in the Colorado-Massachusetts study (8,11), but the recall rates were higher for digital mammography in the Oslo I and II studies (9,10). Thus, the aim of our study was to retrospectively compare screen-film mammography and digital mammography by using soft-copy interpretation for reader performance in the detection and classification of breast lesions in women in a screening program.


    MATERIALS AND METHODS
 
Study Population and Reference Standards
GE Healthcare assisted in the study by making available a suitable reading environment used by the radiologists to interpret the images obtained at screen-film and digital mammographic examinations and by providing logistic support for the reading sessions. The authors had full control of the data and the information submitted for publication. All cases included in the study (one case represented one breast) were selected from the population-based Breast Cancer Screening Program in Oslo, Norway, which is part of the Norwegian Breast Cancer Screening Program. This program was for women 50–69 years old and was started in 1996, and the interval between the screening rounds was 2 years. Two standard views (craniocaudal and mediolateral oblique) of each breast were acquired. Most cases selected for this study were obtained from the Oslo I study, which was performed between January 3, 2000, and June 22, 2000, and was a paired study in which the women underwent a two-view examination of each breast with screen-film mammography and digital mammography on the same day. Cases were also collected from the Oslo II study, which was performed between November 27, 2000, and December 31, 2001, and was a randomized trial that involved a comparison between screen-film mammography and digital mammography in women in a population-based screening program. For these cases, the interval between the examination with screen-film mammography and that with digital mammography was less than 3 weeks. All women included in the Oslo I and II studies were informed about the study beforehand, and their participation in the project was voluntary. Each woman who was enrolled in those two studies signed a written consent form, and both studies were approved by the regional ethics committee. The consent form specified that the data and images related to screening could be used for future research and scientific purposes.

A total of 250 cases, with two standard views acquired with both screen-film mammography and digital mammography, were selected from the database by a nonradiologist. The study initially included 100 cases with normal findings, 100 cases with abnormal but benign findings, and 50 cases with cancers. All 100 cases with normal findings were randomly collected from the Oslo I study, which included independent double reading for screen-film mammograms and for digital mammograms. Two of the authors (P.S. and K.Y.) had, together with six other radiologists, taken part in the image reading of the Oslo I study. The inclusion criterion as a reference standard for a normal examination was that all four independent readers in the screening program, the two readers for screen-film mammograms and the two readers for digital mammograms, had assigned a score of category 1 on the five-point rating scale of probability of cancer used in the Norwegian Breast Cancer Screening Program, which was considered normal. A further inclusion criterion for the cases with normal findings was that such a case did not manifest as an interval cancer and that the case also was assigned a score of normal (category 1) by both independent readers in the subsequent screening round 2 years later.

Of the 100 cases with abnormal findings, 85 cases were selected from the Oslo I study and 15 cases were selected from the Oslo II study. A case with benign findings was defined as such if the woman had been recalled for diagnostic work-up because of an abnormal mammographic interpretation (rating score of >1) by at least one of the two independent readers in the screening program and the results of diagnostic work-up (at ultrasonography [US] or fine-needle aspiration cytologic analysis) indicated a benign abnormality. Furthermore, the case with benign findings was defined as such if the case did not manifest as an interval cancer or as a cancer in the subsequent screening round 2 years later.

The 50 cancers, which included both cases of ductal carcinoma in situ and invasive cancers, were detected at screening in the Oslo I study (n = 27) or the Oslo II study (n = 23). All cancers were confirmed at histologic analysis. The histologic type of the cancers, the size of the cancers, and the density of the breast parenchyma were not known by the person randomly selecting the cases from the files of the Oslo Breast Cancer Screening Program.

The mean age and age range (minimum and maximum age) of the women with normal findings, abnormal findings, or cancers were calculated.

Imaging
Screen-film mammographic examinations were all performed with a unit (Mammomat 300; Siemens, Erlangen, Germany) with film (Min-R 2000; Kodak Health Imaging, Rochester, NY) and screens (Min-R 2190; Kodak Health Imaging). A molybdenum target and a molybdenum filter at 29 kV were always used. The hospital physicist chose 29 kV, in accordance with the recommendations of the Norwegian Breast Cancer Screening Program, to keep the dose low while acceptable image quality was maintained. Screen-film mammographic images had optical density in compliance with the Norwegian Breast Cancer Screening Program requirements (12). Digital mammograms were acquired with a digital system with a cesium iodide–amorphous silicon detector (Senographe 2000D; GE Medical Systems, Milwaukee, Wis). The unit is equipped with an automatic mode (automatic optimization of parameters) in which anode-filter combination, tube voltage, and tube current–time product are selected automatically after analysis of results at a short exposure before actual imaging was performed. The standard automatic optimization of parameters mode was used according to the manufacturer's recommendations. The area of the image detector was 19 x 23 cm. Mammograms obtained with both imaging modalities (screen-film mammography and digital mammography) included the two standard views (craniocaudal and mediolateral oblique) of each breast.

Image Interpretation
Six radiologists (readers A–F) from four European countries participated in the study. The readers' experience in screen-film mammography varied from 4 to 24 years, and their experience in digital mammography with soft-copy reading varied from 2 to 4 years. The number of screening examinations interpreted by each radiologist in his or her own practice varied from 2500 to 12 000 examinations per year. The two Norwegian radiologists had experience from a population-based (every member of the eligible target population being invited, with batch interpretation) screening program, and the other four readers had experience from service-based (self-referred or referred asymptomatic women, with consecutive individual interpretation) mammographic screening.

Soft-copy review was performed in a dedicated darkened room for digital mammography, and a darkened room with high-luminance view boxes was used for hard-copy display of screen-film mammographic images. No clinical information was available, and the readers had no knowledge of the screening results. Patient identification information was removed from the mammograms so that they were anonymous. The radiologists assigned scores to the images in two sessions that were 5 weeks apart such that the same case was not seen twice in any session. Each reading session included six interpretation rounds, and screen-film mammographic images and digital mammographic images were alternated, with a time limit of 60 minutes for about 40 cases in each round. A magnifying glass was always offered for screen-film mammographic interpretation. Images from the digital mammographic examinations were interpreted by using soft-copy reading on the review workstation (GE Medical Systems), which included two high-resolution 2000 x 2500 pixel monitors and a dedicated keypad. Postprocessing of the images, which included window-level adjustments, zooming, and inversion, was optional but strongly recommended.

For interpretation, a data form was included on which the reader marked the localization of an abnormality, if present. For cases in which more than one lesion was suspected, the lesion with the highest suspicion was considered. A Breast Imaging Reporting and Data System (BI-RADS) category of 1–5 was assigned for all cases as follows: category 1, findings negative for disease; category 2, findings that were benign, with no mammographic evidence of malignancy; category 3, findings that were probably benign, with short-term follow-up suggested in daily practice; category 4, suspicious abnormality, with biopsy considered; and category 5, findings that were highly suggestive of malignancy. BI-RADS category 0 was omitted. A cutoff between BI-RADS category 2 and category 3 was used for interpretations with positive versus negative results in the binary outcome for analysis with 2 x 2 tables; that is, for the cancer cases, assignment of a BI-RADS category of 3 or higher was considered as an interpretation with true-positive results. The readers also assigned a score for probability of cancer in all cases. To accomplish this task, they used the five-point rating scale applied by the Norwegian Breast Cancer Screening Program, according to the following: score 1, normal or definitely benign; score 2, probably benign; score 3, indeterminate finding; score 4, probably malignant; and score 5, malignant.
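The binary cutoff described above can be sketched in a few lines of Python. This is an illustrative sketch only, not software used in the study; the function name `classify_interpretation` and the sample readings are hypothetical.

```python
def classify_interpretation(birads: int, is_cancer: bool) -> str:
    """Label one reading with the cutoff described above: BI-RADS >= 3 is positive."""
    if birads not in (1, 2, 3, 4, 5):  # category 0 was omitted in the study
        raise ValueError("BI-RADS category must be between 1 and 5")
    positive = birads >= 3
    if is_cancer:
        return "TP" if positive else "FN"
    return "FP" if positive else "TN"


# Hypothetical readings, for illustration only.
for category, truth in [(3, True), (2, True), (4, False), (1, False)]:
    print(category, truth, classify_interpretation(category, truth))
```

With this rule, a cancer case rated category 3 counts as a true-positive interpretation, whereas the same rating for a case with normal or benign findings counts as a false-positive interpretation.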

The density of breast parenchyma in each case was retrospectively determined by two radiologists (P.S. and K.Y.) by using the BI-RADS classification, which was as follows: category 1, fatty; category 2, scattered dense; category 3, heterogeneously dense; and category 4, extremely dense.

Statistical Analysis
Receiver operating characteristic (ROC) analysis was used for calculation of the diagnostic performance rating of each reader and for determination of overall results for the comparison of the average performance rating between screen-film mammography and digital mammography. The diagnostic performance rating was calculated for the BI-RADS categories, as well as for the probability of malignancy scores, and for the subgroups of masses and microcalcifications only. ROC analysis for the individual readers was performed by using a software program (ROCKIT, Macintosh PPC version 0.9.1 Beta; Charles E. Metz, University of Chicago, Chicago, Ill), whereas multireader analysis was performed by using another software program (LABMRMC, Macintosh PPC version 1.0b3; Charles E. Metz, University of Chicago). The software program for multireader analysis was used to compare the area under the ROC curve (Az) for the six readers and yielded a mean Az value. For comparison of the mean Az values of screen-film mammography and digital mammography, a difference with a P value of less than .05 was considered statistically significant.
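To make the meaning of the area under the ROC curve concrete, the following sketch computes an empirical (nonparametric) area from ordinal ratings. It is not the fitted-curve analysis performed with ROCKIT or LABMRMC; the Mann-Whitney estimate is substituted here because it fits in a few lines. The function name and the ratings are hypothetical.

```python
def empirical_auc(cancer_scores, noncancer_scores):
    """Mann-Whitney estimate of the area under the ROC curve.

    Over all (cancer, non-cancer) pairs, counts how often the cancer case
    received the higher rating; tied ratings count as one half.
    """
    if not cancer_scores or not noncancer_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical five-point ratings, not data from the study.
cancers = [5, 4, 4, 3, 2]
noncancers = [1, 1, 2, 3, 1, 2]
print(round(empirical_auc(cancers, noncancers), 3))
```

A value of 1.0 means every cancer case was rated higher than every non-cancer case, and 0.5 corresponds to chance-level discrimination. Fitted and empirical areas generally differ somewhat on the same data, so this estimate should not be expected to reproduce the reported Az values.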

By using the software program for ROC analysis, diagnostic performance ratings were also compared in terms of the true-positive fraction at a fixed false-positive fraction on the ROC curve. This analysis was performed to compare performance ratings of screen-film mammography and digital mammography at a specific operating point on the ROC curve. Since a BI-RADS category of 3 or higher was defined as a true-positive classification for the cancer cases, the operating point was chosen as the false-positive fraction that corresponded to a BI-RADS category of 3. For comparison of scores of individual readers, the average false-positive fraction from the screen-film mammographic and digital mammographic scores was used. A paired t test was used to test for significance; a difference with a P value of less than .05 was considered statistically significant.

Analysis with a 2 x 2 table was applied for comparison of screen-film mammographic interpretations and digital mammographic interpretations that were based on the BI-RADS categories for subgroups of the study population. A BI-RADS category of 3 or higher was defined as a true-positive classification for the cancer cases. The McNemar test (a difference with a P value of less than .05 was considered statistically significant) was used to compare the discordant pairs in the analysis with the 2 x 2 tables (Epi Info, version 6; Centers for Disease Control and Prevention, Atlanta, Ga). For each case, the average value of the BI-RADS category over the six readers was computed, and a mean value of 3 or higher was considered positive for the probability that a cancer was present.
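The case-level comparison described above (average the BI-RADS category over the readers, dichotomize at 3, then compare discordant pairs) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses an exact binomial form of the McNemar test, which may differ from the variant computed by Epi Info, and the function names and ratings are hypothetical.

```python
from math import comb
from statistics import mean


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def discordant_counts(digital, screen_film, cutoff=3.0):
    """Count cases positive with only one modality after averaging readers.

    digital, screen_film: for each case, a list of BI-RADS categories
    (one per reader). Returns (digital_only, screen_film_only).
    """
    digital_only = screen_film_only = 0
    for d_scores, s_scores in zip(digital, screen_film):
        d_pos = mean(d_scores) >= cutoff
        s_pos = mean(s_scores) >= cutoff
        if d_pos and not s_pos:
            digital_only += 1
        elif s_pos and not d_pos:
            screen_film_only += 1
    return digital_only, screen_film_only


# Hypothetical ratings for three cases and six readers, not study data.
digital = [[4, 4, 3, 3, 5, 4], [3, 3, 3, 2, 3, 4], [2, 2, 1, 2, 3, 2]]
screen_film = [[4, 3, 3, 3, 4, 4], [2, 2, 3, 2, 3, 2], [2, 1, 1, 2, 2, 2]]
b, c = discordant_counts(digital, screen_film)
print(b, c, mcnemar_exact(b, c))
```

Averaging over readers first yields one paired outcome per case, so the test is applied to cases rather than to the pooled interpretations of all six readers.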


    RESULTS
 
Final Study Population
Among the 100 cases with normal findings that were selected by the nonradiologist, those in two women who had large breasts that could not fit on the standard film format of 18 x 24 cm were excluded from analysis. In these two women, large-format images (24 x 30 cm) were obtained at screen-film mammography and more than the standard two images (mosaic) were obtained at digital mammography. A total of 12 cases among the 100 randomly selected cases with benign lesions were excluded from further analysis, since no abnormality was confirmed at diagnostic work-up, which included magnification views, cone-down views, and breast US. The suspicious abnormality seen on the screening mammograms thus proved to be superimposed glandular tissue and not a true abnormality. A total of four cancers among the 50 malignant tumors randomly selected from the database were excluded from analysis: One cancer proved to be occult and, consequently, represented an incidental finding. One cancer proved to be outside the image on one view and at the margin on the other view at digital mammography; consequently, detection of this cancer was missed because of positioning failure. In two cancers, lateromedial views instead of mediolateral oblique views were obtained at diagnostic work-up, so comparison with the screening mammograms was inappropriate. Thus, the final study population consisted of 232 cases—98 cases with normal findings in women with a mean age of 56.2 years and an age range of 49–67 years, 88 cases with benign findings in women with a mean age of 56.4 years and an age range of 45–68 years, and 46 cancers in women with a mean age of 59.2 years and an age range of 51–70 years. In 18 cases, the density of the breast parenchyma was classified as BI-RADS category 1 (fatty), and in 12 cases, as BI-RADS category 4 (extremely dense). In the rest of the cases, the classification of the density was nearly equally distributed between categories 2 and 3. 
The mammographic features of the cases with benign and malignant findings and the distribution of breast parenchyma density according to the BI-RADS classification for the 232 cases included in the study are summarized in Table 1.


 
TABLE 1. Mammographic Features and Breast Parenchyma Density of Cases with Normal, Benign, and Malignant Findings

 
Reader Comparisons
All cases.—The Az values for the six individual readers with the BI-RADS categories for digital mammography and screen-film mammography, respectively, were as follows: reader A (0.846 and 0.878), reader B (0.963 and 0.913), reader C (0.921 and 0.895), reader D (0.933 and 0.891), reader E (0.931 and 0.860), and reader F (0.901 and 0.886). These values and the corresponding Az values determined by using the probability of malignancy score are shown in Table 2. One reader (reader A) had a higher performance rating with screen-film mammography, but the difference was not statistically significant, whereas five readers had a higher performance rating with digital mammography. Reader E demonstrated a statistically significant higher performance rating (P = .04) with digital mammography, whereas the differences in Az values between digital mammography and screen-film mammography that were based on BI-RADS categories, as well as on probability of malignancy scores, were not statistically significant for the other comparisons (Table 2).


 
TABLE 2. Az Values for Digital Mammography and Screen-Film Mammography for Six Readers with BI-RADS Classification and Five-Point Scale for Probability of Malignancy

 
The multireader analysis of BI-RADS categories for all cases and all readers showed a mean Az value of 0.916 for digital mammography, compared with a mean Az of 0.887 for screen-film mammography. This difference was not statistically significant (P = .22). The corresponding ROC analysis with the probability of malignancy scores provided similar results with a mean Az for all readers of 0.925 for digital mammography and 0.893 for screen-film mammography (P = .25). The ROC curves are shown in the Figure, parts a and b.



 
Figure. Graphs show fitted ROC curves for full-field digital mammography (FFDM) and screen-film mammography (SFM). (a) Mean ROC curves for all cases (n = 232) and all six readers on the basis of the BI-RADS classification: Az value for digital mammography was 0.916; that for screen-film mammography, 0.887 (P = .22). (b) Mean ROC curves for all cases (n = 232) and all six readers on the basis of the five-point rating scale for probability of malignancy: Az value for digital mammography was 0.925; that for screen-film mammography, 0.893 (P = .25). (c) Mean ROC curves for all benign and malignant densities and masses (n = 71) and all six readers on the basis of the BI-RADS classification: Az value for digital mammography was 0.860; that for screen-film mammography, 0.835 (P = .646). The six cancers that manifested as masses with associated microcalcifications were excluded from analysis. (d) Mean ROC curves for cases with benign findings that manifested as microcalcifications (n = 43) and cancer cases that manifested as microcalcifications (n = 14) and four readers on the basis of the BI-RADS classification. Az value for digital mammography was 0.841; that for screen-film mammography, 0.787. This difference was not significant. FPF = false-positive fraction, TPF = true-positive fraction.

 



 
Subgroups.—The ROC curves for the subgroups of benign and malignant masses and microcalcifications only are shown in the Figure, parts c and d. For densities and masses only (n = 71 cases), the Az value for digital mammography was 0.860, compared with an Az value of 0.835 for screen-film mammography. Comparison of these values, however, was problematic, since the ROC curves crossed each other (Figure, part c). For the subgroup of microcalcifications only (n = 57 cases), the data were degenerate (either clustered values, with which most data represented just a few scores without enough range, or tied values, with which the same score was assigned for many screen-film mammographic and digital mammographic interpretations) for two readers, and these data were excluded from analysis by the computer program. The average ROC curve that was based on the BI-RADS category for the other four readers is shown in the Figure, part d. The ROC curves show a higher diagnostic performance rating for digital mammography, compared with screen-film mammography, for all false-positive fractions; the Az for digital mammography was 0.841, compared with the Az of 0.787 for screen-film mammography, but this difference was not significant.




 



 
Other analyses.—In addition to the comparisons with Az, further analyses were performed at a single fixed false-positive fraction, which yields a corresponding true-positive fraction for each modality. The threshold for a true-positive score with the BI-RADS classification was 3, so a BI-RADS category of 3 or higher for cancers was a true-positive rating. When the average false-positive fractions for digital mammography and screen-film mammography at this threshold level were entered into the program for ROC analysis, two readers (readers B and E) showed a statistically significantly higher performance rating with digital mammography, compared with screen-film mammography (P = .01 and .03, respectively). When the false-positive fraction was averaged over all six readers (0.337) for all cases, digital mammography showed a higher performance rating (mean true-positive fraction of 0.935) than did screen-film mammography (mean true-positive fraction of 0.889), but this difference was not significant (P = .088).
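
To illustrate how a single operating point follows from the threshold described above, the following minimal Python sketch computes the true-positive and false-positive fractions when a BI-RADS category of 3 or higher is scored as positive. The ratings and the function name `operating_point` are hypothetical and are not study data; the ROC program used in the study is not reproduced here.

```python
from typing import Sequence, Tuple


def operating_point(
    cancer_ratings: Sequence[int],
    noncancer_ratings: Sequence[int],
    threshold: int = 3,
) -> Tuple[float, float]:
    """Return (TPF, FPF) when a rating >= threshold counts as positive."""
    # True-positive fraction: share of cancers rated at or above the threshold.
    tpf = sum(r >= threshold for r in cancer_ratings) / len(cancer_ratings)
    # False-positive fraction: share of noncancers rated at or above it.
    fpf = sum(r >= threshold for r in noncancer_ratings) / len(noncancer_ratings)
    return tpf, fpf


# Hypothetical BI-RADS ratings from one reader (illustration only).
example_cancers = [5, 4, 3, 2]
example_noncancers = [1, 1, 2, 3, 4, 1]
example_tpf, example_fpf = operating_point(example_cancers, example_noncancers)
```

Raising the threshold to 4 in this sketch would lower both fractions, which is why the two modalities are compared at the same false-positive fraction rather than at arbitrary operating points.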

Analysis with 2 x 2 tables that were based on the BI-RADS categories for the subgroups of masses and microcalcifications only and for masses and microcalcifications in women with dense breast parenchyma (density of breast parenchyma BI-RADS categories 3 and 4) showed no significant differences between the two modalities.

Analysis with 2 x 2 tables for all cancers (46 cases or 276 interpretations) showed that 235 of the 276 interpretations by the six readers were true-positive with both modalities and six were false-negative with both modalities. Twenty-seven interpretations were true-positive with digital mammography but false-negative with screen-film mammography, compared with eight that were false-negative with digital mammography but true-positive with screen-film mammography, for a discordant pair of 27 and eight (Table 3). With application of the McNemar test, which was based on averaging of the BI-RADS categories over the six readers, to the 46 cancers, however, no statistically significant difference between screen-film mammography and digital mammography was shown. Classification of the cancer cases according to masses only (26 cases or 156 interpretations) revealed that 13 interpretations were true-positive at digital mammography but false-negative at screen-film mammography, compared with six interpretations that were true-positive with screen-film mammography but false-negative with digital mammography, whereas 132 interpretations were true-positive with both modalities, and five were false-negative with both screen-film mammography and digital mammography. Hence, digital mammography depicted seven (4.5%) more mass cancers in 156 interpretations than did screen-film mammography. For malignant microcalcifications only (14 cases or 84 interpretations), the corresponding discordant pair was five versus one in favor of digital mammography; that is, four (4.8%) more malignant microcalcifications in 84 interpretations were correctly classified with digital mammography than with screen-film mammography. Six malignancies that manifested with both microcalcifications and a mass or density were not included in this classificatory analysis of the cancers. 
In the subgroup of cancers in dense breast parenchyma (BI-RADS categories 3 and 4), 89 of 108 interpretations were true-positive with both modalities, five interpretations were false-negative with both modalities, and the discordant pair of 11 and three in favor of digital mammography was not statistically significant.
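
The counts reported above can be checked with a short Python sketch that stores each paired 2 x 2 table for the cancer interpretations and derives the totals and the net gain for digital mammography. The class and field names are hypothetical; only the counts come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedCancerTable:
    """Paired interpretations of cancers by the two modalities."""
    both_tp: int          # true-positive with both modalities
    both_fn: int          # false-negative with both modalities
    digital_only_tp: int  # true-positive with digital, false-negative with film
    film_only_tp: int     # true-positive with film, false-negative with digital

    @property
    def total(self) -> int:
        return (self.both_tp + self.both_fn
                + self.digital_only_tp + self.film_only_tp)

    @property
    def net_gain(self) -> int:
        # Extra true-positive interpretations with digital mammography.
        return self.digital_only_tp - self.film_only_tp

    @property
    def net_gain_pct(self) -> float:
        return 100.0 * self.net_gain / self.total


all_cancers = PairedCancerTable(235, 6, 27, 8)  # 46 cases x 6 readers
masses_only = PairedCancerTable(132, 5, 13, 6)  # 26 cases x 6 readers
dense_breast = PairedCancerTable(89, 5, 11, 3)  # 18 cases x 6 readers

# For microcalcifications only the discordant pair (5 vs 1) and the total
# (84 interpretations) are reported, so only the net gain can be derived.
microcalc_net_gain_pct = 100.0 * (5 - 1) / 84

for name, table in [("all cancers", all_cancers),
                    ("masses only", masses_only),
                    ("dense parenchyma", dense_breast)]:
    print(f"{name}: n = {table.total}, net gain = {table.net_gain} "
          f"({table.net_gain_pct:.1f}%)")
```

Each table sums to six times the number of cases in its subgroup, which confirms that the unit of analysis in these tables is the interpretation rather than the case.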



 
TABLE 3. Analysis with 2 x 2 Table for Interpretations of Six Readers for Screen-Film and Digital Mammography in all 46 Cancers (276 Interpretations)

 
For the benign and normal cases (186 cases or a total of 1116 interpretations determined by the six readers), there was virtually no difference between the two modalities. A total of 604 interpretations were true-negative with both modalities, and 238 were false-positive with both modalities, with the discordant pair of 140 and 134 interpretations in favor of digital mammography (one more correct true-negative interpretation with digital mammography per reader). For the 45 benign masses only (270 interpretations), the discordant pair of 67 versus 41 true-negative interpretations in favor of digital mammography (Table 4) was not statistically significant (P = .33, McNemar test that was based on averaging of the BI-RADS categories over the six readers). Overall, digital mammography yielded 26 more correct true-negative interpretations (9.6% of 270) than did screen-film mammography. Only 14 benign masses (84 interpretations) were found in women with dense breast parenchyma; for these masses, 20 of 84 interpretations were correctly true-negative with digital mammography and false-positive with screen-film mammography, compared with 14 of 84 that were false-positive with digital mammography and true-negative with screen-film mammography (the difference was not statistically significant). For the benign microcalcifications only (43 cases or 258 interpretations), 50 interpretations were true-negative with both modalities and 136 interpretations were false-positive with both modalities, whereas the discordant pair was 38 versus 34 in favor of screen-film mammography. Thus, there was practically no difference between screen-film mammography and digital mammography in the characterization of benign microcalcifications.
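
The McNemar test uses only the discordant pair of a paired 2 x 2 table. The Python sketch below is a minimal exact (binomial) version applied to the pooled interpretation counts given above. It is an assumption-laden illustration, not the analysis used in this study: it treats every interpretation as independent, whereas the reported test was based on averaging of the BI-RADS categories over the six readers, so its P values are not expected to match the reported ones. The function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value for discordant counts b and c.

    Under the null hypothesis, each discordant pair favors either
    modality with probability 0.5, so min(b, c) follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Lower-tail probability P(X <= k), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Pooled discordant pairs reported in the text (interpretations, not cases).
p_all_benign_normal = mcnemar_exact(140, 134)
p_benign_masses = mcnemar_exact(67, 41)
p_benign_microcalc = mcnemar_exact(34, 38)
```

Because the six readers interpreted the same cases, pooled interpretations are correlated, and a pooled test can yield a much smaller P value than a case-level analysis; for the 67 versus 41 pair, the pooled sketch gives a value well below the reported P = .33.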



 
TABLE 4. Analysis with 2 x 2 Table for Interpretations of Six Readers for Screen-Film and Digital Mammography in 45 Benign Masses (270 Interpretations)

 
With screen-film mammography, a total of 48 (8.2%) of 588 interpretations for the cases with normal findings (n = 98) were classified as BI-RADS category 3 or higher, and the distribution was as follows: 31 interpretations, category 3; 14 interpretations, category 4; and three interpretations, category 5. For digital mammography, a total of 70 (11.9%) of 588 interpretations were classified as BI-RADS category 3 or higher, and the distribution was as follows: 48 interpretations, category 3; 22 interpretations, category 4; and zero interpretations, category 5.


    DISCUSSION
 
The digital mammographic equipment developed over the past few years has demonstrated superior detection of low-contrast objects in contrast-detail studies (3,4). This improvement, along with a wider dynamic range, is expected to enhance the diagnostic quality of images, particularly for women with dense breast parenchyma. The true flexibility, and benefit, of digital technology is primarily realized in a soft-copy display of the images and, consequently, in soft-copy reading. This potential benefit of digital mammography should result in enhanced cancer detection, especially in dense breast parenchyma, because of increased contrast resolution. The three large-scale trials in which screen-film mammography and digital mammography were compared in screening populations so far, however, did not show any statistically significant differences in the cancer detection rate between screen-film mammography and digital mammography (8–11). The differences between screen-film mammography and digital mammography for cancer detection rates found in the first two trials (8,9) could be explained by random statistical variation or lack of power. The third and larger-scale trial, however, showed a cancer detection rate for women in the age group of 50–69 years that was close to statistical significance in favor of digital mammography, compared with screen-film mammography (10).

We present the results of ROC analyses that were based on the BI-RADS classification, since most breast imaging radiologists are familiar with this system. A principal requirement for performance of ROC analysis is that the measurements or interpretations are meaningfully ranked in magnitude (13). It is a matter of discussion, however, whether the BI-RADS categories of 1–5 should be considered as a linear, continuous scale suitable for ROC analysis (14). The overall mean Az values for all cases and all readers that were based on the BI-RADS categories were comparable with the Az values that were based on the somewhat more linear five-point rating scale for probability of cancer. We, therefore, decided to present the diagnostic performance rating on the basis of the widely familiar categories of 1–5 in the BI-RADS lexicon. The ROC curves for digital mammography and screen-film mammography for all readers and all cases on the basis of the BI-RADS classification did not intersect, whereas the corresponding curves that were based on the probability scale showed that the two curves crossed slightly.
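
The requirement that ratings be meaningfully ranked can be made concrete with a small Python sketch that estimates the area under the ROC curve from ordinal ratings alone. Note that this is the empirical (nonparametric, Mann-Whitney) area, which is a different estimator from the fitted Az values reported in this study; the ratings and the function name `empirical_auc` are hypothetical.

```python
from typing import Sequence


def empirical_auc(cancer_ratings: Sequence[int],
                  noncancer_ratings: Sequence[int]) -> float:
    """Empirical area under the ROC curve from ordinal ratings.

    Equals the probability that a randomly chosen cancer receives a
    higher rating than a randomly chosen noncancer, with ties counted
    as one half. Only the rank order of the categories is used.
    """
    score = 0.0
    for c in cancer_ratings:
        for n in noncancer_ratings:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cancer_ratings) * len(noncancer_ratings))


# Hypothetical five-category ratings (illustration only, not study data).
example_auc = empirical_auc([5, 4, 3], [1, 2, 3])
```

Because only comparisons between ratings enter the calculation, any relabeling of the five categories that preserves their order leaves the empirical area unchanged, which is why ranking, rather than equal spacing, is the essential property of the scale for this estimator.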

In the present study, results of the ROC analyses showed a slightly higher overall reader diagnostic performance rating with digital mammography, compared with screen-film mammography, although the difference was not statistically significant. Comparison of the interpretations for the cancer cases revealed a higher true-positive rating for digital mammography, but the difference was not statistically significant. Five of the six readers had a higher diagnostic performance rating with digital mammography by using Az values for comparison, and the difference was statistically significant for one of the five readers. Two of the six readers showed a statistically significantly higher overall performance rating with digital mammography, if comparison is based on a fixed single false-positive fraction that defines BI-RADS category 3 or higher as a true-positive classification for cancers. Thus, the results of our study confirm the results of previous studies with experimental settings with a digital mammographic unit, which have shown that digital mammography is at least comparable with screen-film mammography in the detectability and characterization of microcalcifications and low-contrast objects (5–7,15–18). The differences among the six readers reflect the challenges of interobserver variation in the interpretation of screening mammograms (19,20).

The substantially higher number of true-negative interpretations for benign masses with digital mammography compared with screen-film mammography in our study is noteworthy. This better characterization of benign masses with digital mammography seems to support the results of a previous study, which showed a statistically significantly lower recall rate for women who underwent digital mammography, compared with those who underwent screen-film mammography, in a screening population (8). On the other hand, the higher number of false-positive classifications (BI-RADS category 3 or higher) of cases with normal findings in our study with digital mammography (11.9%) compared with screen-film mammography (8.2%) seems to support the results of the Oslo I and II studies, in which a higher recall rate was reported for women who underwent digital mammography (9,10). Interpretation of our results related to benign and malignant masses was difficult, partly because the ROC curves for masses intersected. The characterization of benign microcalcifications in our study showed no difference between the two modalities.

Digital mammography is expected to have a potential benefit over screen-film mammography for breast cancer detection in women with dense breast parenchyma because of increased contrast resolution and wider dynamic range. Classification of the population according to cancer cases in women with dense breast parenchyma (BI-RADS breast density categories 3 and 4) in our study, however, revealed no significant difference in the true-positive interpretations between the two modalities. Since there were only 18 cancers in this subgroup, a larger study would be required to verify improved detection and classification of cancers with digital mammography in women with dense breast parenchyma. For the benign masses in women with dense breast parenchyma, there were slightly, but not significantly, more correct true-negative interpretations with digital mammography.

The experimental design of this study led to some limitations. An important aspect of our study was the comparison of diagnostic performance ratings with respect to screen-film mammography versus digital mammography for the detection of small asymptomatic lesions, such as those usually found in women in a mammographic screening population. The large number of cancers included in the study population sample, however, does not reflect a screening situation, and this feature may explain the high number of false-positive interpretations, and especially the relatively high number of false-positive interpretations for cases with normal findings with both modalities. This clearly shows—once again—that results from retrospective experimental studies are difficult to compare with prospective interpretations performed in daily practice. Also, significant results could not be derived for different mammographic types of lesions, because of a relatively low number of cases in various subgroups and, consequently, a lack of power.

In conclusion, our results demonstrated that digital mammography with soft-copy reading was slightly superior to screen-film mammography in the detection and the characterization of lesions that are representative of those in women in a mammographic screening program. Overall, however, there was no statistically significant difference in diagnostic performance ratings between the two imaging modalities. The small number of cases in our study limits the ability to determine true differences between the two imaging modalities for various subgroups of lesions.


    FOOTNOTES
 

Abbreviations: Az = area under ROC curve • BI-RADS = Breast Imaging Reporting and Data System • ROC = receiver operating characteristic

2 Current address: Hologic, Hillsborough, NC

L.T.N. was a consultant for GE Medical Systems, Milwaukee, Wis. See Materials and Methods for pertinent disclosures.

Author contributions: Guarantor of integrity of entire study, P.S.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; literature research, P.S., F.D., S.D., J.C.P., K.Y., L.T.N.; clinical studies, all authors; statistical analysis, P.S., L.T.N.; and manuscript editing, P.S., C.B., F.D., S.D., J.C.P., L.T.N.


    References
 

  1. Saarenmaa I, Salminen T, Geiger U, et al. The visibility of cancer on earlier mammograms in a population-based screening programme. Eur J Cancer 1999;35:1118–1122.
  2. van Dijck JAAM, Verbeek ALM, Hendriks JHCL, Holland R. The current detectability of breast cancer in a mammographic screening program: a review of the previous mammograms of interval and screen-detected cancers. Cancer 1993;72:1933–1938.
  3. Berns EA, Hendrick RE, Cutter GR. Performance comparison of full-field digital mammography to screen-film mammography in clinical practice. Med Phys 2002;29:830–834.
  4. Suryanarayanan S, Karellas A, Vedantham S, Ved H, Baker SP, D'Orsi CJ. Flat-panel digital mammography system: contrast-detail comparison between screen-film radiographs and hard-copy images. Radiology 2002;225:801–807.
  5. Feig SA, Eskola-Feig C. Visualization of ductal carcinoma in situ and early invasive carcinoma: comparison of film-screen mammography and full-field digital mammography. In: Yaffe MJ, ed. IWDM 2000: 5th International Workshop on Digital Mammography. Madison, Wis: Medical Physics Publishing, 2001; 451–460.
  6. Fischer U, Baum F, Obenauer S, et al. Comparative study in patients with microcalcifications: full-field digital mammography vs screen-film mammography. Eur Radiol 2002;12:2679–2683.
  7. Obenauer S, Luftner-Nagel S, von Heyden D, Munzel U, Baum F, Grabbe E. Screen film vs full-field digital mammography: image quality, detectability and characterization of lesions. Eur Radiol 2002;12:1697–1702.
  8. Lewin JM, Hendrick RE, D'Orsi CJ, et al. Comparison of full-field digital mammography with screen-film mammography for cancer detection: results of 4,945 paired examinations. Radiology 2001;218:873–880.
  9. Skaane P, Young K, Skjennald A. Population-based mammography screening: comparison of screen-film and full-field digital mammography with soft-copy reading—Oslo I study. Radiology 2003;229:877–884.
  10. Skaane P, Skjennald A. Screen-film mammography versus full-field digital mammography with soft-copy reading: randomized trial in a population-based screening program—the Oslo II study. Radiology 2004;232:197–204.
  11. Lewin JM, D'Orsi CJ, Hendrick RE, et al. Clinical comparison of full-field digital mammography and screen-film mammography for detection of breast cancer. AJR Am J Roentgenol 2002;179:671–677.
  12. Pedersen K, Nordanger J. Quality control of the physical and technical aspects of mammography in the Norwegian breast-screening programme. Eur Radiol 2002;12:463–470.
  13. Obuchowski NA. Receiver operating characteristic curves and their use in radiology. Radiology 2003;229:3–8.
  14. Cole EB, Pisano ED, Kistner EO, et al. Diagnostic accuracy of digital mammography in patients with dense breasts who underwent problem-solving mammography: effects of image processing and lesion type. Radiology 2003;226:153–160.
  15. Bick U, Diekmann F, Marth F, Le Roux A, Juran R, Friedrich M. Comparison of a new full-field digital mammography system and conventional film-screen mammography: a phantom study (abstr). Radiology 1999; 213(P):152.
  16. Diekmann S, Bick U, von Heyden H, Diekmann F. Visualization of microcalcifications on mammographies obtained by digital full-field mammography in comparison to conventional film-screen mammography [in German]. Rofo 2003;175:775–779.
  17. Rosol MS, Niklason LT, Venkatakrishnan V, Silvenoinnen H, Kopans DB, Hamberg LM. Contrast-detail comparison of a full-field mammography system and a screen-film system (abstr). Radiology 1999; 213(P):151.
  18. Venta LA, Hendrick RE, Adler YT, et al. Rates and causes of disagreement in interpretation of full-field digital mammography and film-screen mammography in a diagnostic setting. AJR Am J Roentgenol 2001;176:1241–1248.
  19. Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 1996;156:209–213.
  20. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretation of mammograms. N Engl J Med 1994;331:1493–1499.


