Breast Imaging
1 From the Department of Radiology, Breast Imaging Center, Ullevaal University Hospital, Kirkeveien 166, N-0407 Oslo, Norway. From the 2002 RSNA scientific assembly. Received October 7, 2003; revision requested November 3; revision received December 7; accepted January 2, 2004. Address correspondence to P.S. (e-mail: per.skaane@ulleval.no).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Of 43,429 women invited, 25,263 women aged 45–69 years attended the screening program and were randomized, with adjustments for age and area of residence, to undergo SFM or FFDM. Two standard views of each breast were acquired. Independent double reading was performed with use of a five-point rating scale for probability of cancer. Recall rates, positive predictive values, and cancer detection rates were compared for two age groups (45–49 and 50–69 years) by using the χ² test.
RESULTS: Overall, 73 cancers in 17,911 women were detected at SFM (detection rate, 0.41%), compared with 41 cancers in 6,997 women at FFDM (detection rate, 0.59%; P = .06). In the group aged 50–69 years, 56 cancers in 10,304 women were detected at SFM (detection rate, 0.54%), compared with 33 cancers in 3,985 at FFDM (detection rate, 0.83%); the difference in cancer detection rates approached significance (P = .053). In the group aged 45–49 years, 17 cancers in 7,607 women were detected at SFM (detection rate, 0.22%), compared with eight cancers in 3,012 at FFDM (detection rate, 0.27%). Recall rates in both age groups were significantly higher at FFDM than at SFM (P < .05), but positive predictive value was not significantly different.
CONCLUSION: FFDM allowed a higher cancer detection rate than did SFM in the group aged 50–69 years, although the difference did not reach statistical significance. The detection rate was nearly equal for the two modalities in the group aged 45–49 years. SFM and FFDM with soft-copy reading are comparable techniques for population-based screening mammography programs.
© RSNA, 2004
Index terms: Breast neoplasms, radiography, 00.32, 00.81; Breast radiography, comparative studies, 00.112, 00.1215; Cancer screening; Radiography, digital, 00.1215
| INTRODUCTION |
|---|
So far, conventional screen-film mammography (SFM) with high spatial resolution has been the modality of choice for screening programs. Many believe digital mammography to be the next step in the evolution of mammography, as digital systems increasingly replace conventional film imaging systems. The digital mammography equipment developed in the past few years has demonstrated superior depiction of low-contrast objects in contrast-detail studies (3,4). This improvement, along with a wider dynamic range, is expected to increase the diagnostic quality of images, particularly those acquired in dense breasts. Therefore, digital mammography may provide better images than does SFM in women who are younger than 50 years, the age group usually associated with dense breast tissue. The greatest flexibility and benefit of digital technology are realized primarily in soft-copy image display and, consequently, in soft-copy reading.
Before digital technology can be recommended for population-based screening mammography programs, large-scale clinical trials must be performed to evaluate the new technology and assess its comparability with SFM for the detection of early breast cancer. So far, there have been only two major studies in which the performance of SFM has been compared with that of full-field digital mammography (FFDM) in asymptomatic women in a screening situation (5,6). The results of both of these paired studies showed no statistically significant differences in cancer detection rate between SFM and FFDM, although fewer cancers were detected in both studies with FFDM (5,6). The use of FFDM also resulted in a significantly lower recall rate in one of the two studies (5). Investigators in these previous studies, however, used prototype display equipment (5) or a suboptimal soft-copy display environment (6). The purpose of our study was to prospectively compare cancer detection rates, recall rates, and positive predictive values (PPV) at SFM with those at FFDM with soft-copy reading in a population-based screening program in Norway.
| MATERIALS AND METHODS |
|---|
During the study period, 25,263 (58.2%) of the 43,429 invited women attended the screening program. Of 23,442 invited women aged 50–69 years, 14,436 (61.6%) attended; of 19,987 invited women aged 45–49 years, 10,827 (54.2%) attended (Fig 1). Of the women aged 50–69 years, 10,391 (72%) were randomized to undergo SFM and 4,045 (28%) were randomized to undergo FFDM. Of the women aged 45–49 years, 7,663 (71%) were randomized to undergo SFM and 3,164 (29%) were randomized to undergo FFDM. A total of 352 women did not undergo mammographic examination with the modality to which they had been randomized (Fig 1): One woman in the group aged 50–69 years who was assigned to undergo SFM was so severely disabled that it was not possible to carry out the planned examination, and ultrasonography (US) was performed instead. Also in the group aged 50–69 years, 84 women randomized to undergo SFM underwent FFDM, and 59 women randomized to undergo FFDM underwent SFM. In the group aged 45–49 years, 56 women randomized to undergo SFM underwent FFDM, and 152 women randomized to undergo FFDM underwent SFM. The reasons for these changes in modality included equipment availability at our facilities, participants' unwillingness to undergo "ordinary" SFM instead of FFDM, or participants undergoing examination at a different facility (because of disability, breast implant, or personal preference). All of the women were informed about and consented to the use of the modality actually used for imaging. The Breast Imaging Center at Ullevaal University Hospital was equipped with only one SFM and one FFDM unit, and ongoing diagnostic imaging work-up and interventional procedures performed at the Breast Imaging Center occasionally prevented the use of the screening modality to which the women had been assigned. One hundred two (73%) of the 140 women who were randomized to undergo SFM but underwent FFDM did so because they had breast implants. Women with breast implants are generally offered FFDM in our department, since we consider the use of FFDM preferable to that of SFM in these women. In addition, although the technologists were instructed to strictly implement the randomized assignments during the study period, some women who had been randomized to undergo SFM underwent FFDM because of the busy daily workflow. Cancers were found in three women who were excluded from analysis: Invasive ductal carcinoma was found in the disabled woman who underwent US screening and in one of the two women with breast implants in the group aged 45–49 years who were randomized to undergo SFM but underwent FFDM; in the other woman, ductal carcinoma in situ was diagnosed.
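The attendance and randomization figures in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the counts reported above; the variable and function names are illustrative and are not part of the study.

```python
# Counts as reported in the text above (names are illustrative).
invited = {"50-69": 23_442, "45-49": 19_987}
attended = {"50-69": 14_436, "45-49": 10_827}
randomized = {
    "50-69": {"SFM": 10_391, "FFDM": 4_045},
    "45-49": {"SFM": 7_663, "FFDM": 3_164},
}
# Women who were not examined with the modality they were assigned to.
crossovers = {
    "50-69 SFM to US": 1,
    "50-69 SFM to FFDM": 84,
    "50-69 FFDM to SFM": 59,
    "45-49 SFM to FFDM": 56,
    "45-49 FFDM to SFM": 152,
}


def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


overall_attendance = percent(sum(attended.values()), sum(invited.values()))
attendance_by_age = {age: percent(attended[age], invited[age]) for age in invited}
sfm_share = {age: percent(randomized[age]["SFM"], attended[age], 0) for age in attended}
total_crossovers = sum(crossovers.values())

print("Overall attendance (%):", overall_attendance)
print("Attendance by age group (%):", attendance_by_age)
print("Share randomized to SFM (%):", sfm_share)
print("Women examined with a non-assigned modality:", total_crossovers)
```

The roughly 72:28 split between SFM and FFDM in both age groups is visible directly in `sfm_share`, and the five crossover categories add up to the 352 women mentioned in the text.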
The study was approved by the regional ethical committee. Informed consent was obtained as noted in the preceding paragraph.
Imaging
All SFM examinations were performed by using one of three mammography units (Mammomat 300; Siemens Medical Systems, Erlangen, Germany) with Min-R 2000 film and Min-R 2190 screens (Eastman Kodak, Rochester, NY) in both standard and large formats. A molybdenum anode, molybdenum filter, and 29 kV were used for all examinations. The hospital physicist chose 29 kV in accordance with the recommendations of the NBCSP, to keep the dose low while maintaining acceptable image quality. SFM images had a mean optical density of 1.4 during the study period and, thus, were in compliance with the NBCSP requirement of 1.2–1.8. FFDM images were acquired by using one of two available FFDM units (Senographe 2000D; GE Medical Systems, Milwaukee, Wis) equipped with an automatic mode (automatic optimization of parameters, or AOP) in which an anode track-filter combination and kV were selected automatically after analysis of premammographic data obtained with a brief exposure. Automatic optimization of parameters was used in the standard dose mode, according to the manufacturer's recommendations. The area of the image detector was 19 x 23 cm and the pixel size was 100 x 100 µm. Additional images, one or two in each breast, were obtained if the entire breast could not be imaged with the 19 x 23-cm digital detector. Mammograms from both modalities (FFDM and SFM) included the two standard views (craniocaudal and mediolateral oblique) of each breast.
Image Interpretation
The images were interpreted the day after acquisition at the Breast Imaging Center. Eight radiologists, each with more than 4 years of experience in screening mammography, participated in image interpretation during the study period. The SFM and FFDM images were interpreted independently by two of these eight radiologists. Prior mammograms were not offered for comparison during the initial SFM and FFDM image interpretation sessions because of logistic problems that might have caused an interpretation bias. Prior mammograms were, however, always offered at the consensus meetings, if prior images were available. The SFM images were read by using two standard motorized mammography alternators. A magnifying glass was offered for SFM reading, and its use was recommended. The FFDM images were interpreted by using soft-copy reading at a workstation that included two high-resolution 2,000 x 2,500-pixel monitors and a dedicated keypad. The recommended protocol for FFDM image display and reading included three steps. First, all four views were displayed: The two craniocaudal views were displayed back-to-back on one monitor and the two mediolateral oblique views were displayed back-to-back on the other monitor. Next, both craniocaudal views were presented at full size, one on each monitor. Last, both mediolateral oblique views were presented at full size, one on each monitor. This display protocol, as well as further postprocessing of the images (eg, adjustments of window level and zoom), was strongly recommended for all cases, at least for the two full-size mediolateral oblique views (because they depict most of the breast parenchyma).
A steering committee, which included representatives from the Institute of Population-based Cancer Research, Norwegian Radiation Protection Authority, and Ullevaal University Hospital, oversaw the operation of the program and ensured consistency. The committee decided, for ethical and juridical reasons, that four independent readers should interpret the FFDM images for the first 4 months or until preliminary results proved a cancer detection rate at FFDM equal to or higher than that at SFM. The reason for this decision was a concern that the cancer detection rate at FFDM might be lower, as had been found in the Oslo I study (6). If the preliminary analysis after 4 months had shown a statistically lower cancer detection rate at FFDM than at SFM, the study would have been stopped.
The interpretations of SFM and FFDM images were input directly into a central database (QBE Vision; Sysdeco Technology, Oslo) at the Norwegian Cancer Registry by the readers, who used either a light pen (bar code technology) or a mouse connected to a personal computer and placed next to the alternator for SFM examinations or next to the viewing station for FFDM examinations. To avoid potential bias as a result of four readers interpreting FFDM images and only two interpreting SFM images during the first 4 months of the program, we decided that the first two readers of the FFDM images who entered their reports into the database would be defined as the "official" readers and that the third and fourth readers would be defined as "unofficial" readers. Only the interpretations by the two official readers were used in the analysis for comparison of the two modalities. A patient's further diagnostic work-up was, of course, unaffected by whether the patient was recalled by official or unofficial readers.
A discrete five-point rating scale for probability of cancer was used for interpretation of both SFM and FFDM images as follows: A score of 1 indicated normal or definitely benign findings, 2 indicated findings that were probably benign, 3 indicated indeterminate findings, 4 indicated probable malignancy, and 5 indicated definite malignancy. If at least one of two readers (one of four readers, for FFDM images in the first 4 months of the study) assigned the mammographic finding a score of 2 or higher, the case was automatically selected for a consensus meeting (marked in the database). Other indicators for selection to consensus meetings included the presence of clinical symptoms, especially a palpable lump, or technical insufficiency of the examination. Any clinical findings were recorded by the technologists in the screening center on an information sheet that was always placed next to the alternator or the viewing station during the image interpretation sessions. There was no time limit for the interpretation of either SFM or FFDM images.
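The selection rule for consensus meetings described above can be written down compactly. The sketch below is an illustration only: the class and function names are hypothetical, and it is not the logic of the database actually used in the program. It assumes that a case is selected when any reader assigns a score of 2 or higher, or when clinical symptoms or technical insufficiency are recorded.

```python
from dataclasses import dataclass

# Five-point rating scale for probability of cancer, as described in the text.
SCALE = {
    1: "normal or definitely benign",
    2: "probably benign",
    3: "indeterminate",
    4: "probable malignancy",
    5: "definite malignancy",
}


@dataclass
class ScreeningCase:
    """Hypothetical record of one screening examination."""
    reader_scores: list[int]                # one score per independent reader
    clinical_symptoms: bool = False         # e.g. a palpable lump noted by the technologist
    technically_insufficient: bool = False  # examination of inadequate technical quality


def goes_to_consensus(case: ScreeningCase) -> bool:
    """Return True if the case would be selected for a consensus meeting."""
    if any(score not in SCALE for score in case.reader_scores):
        raise ValueError("scores must be integers from 1 to 5")
    flagged_by_reader = any(score >= 2 for score in case.reader_scores)
    return flagged_by_reader or case.clinical_symptoms or case.technically_insufficient


# Two readers (SFM, or FFDM after the first 4 months) and four readers (early FFDM).
print(goes_to_consensus(ScreeningCase([1, 1])))        # both readers score 1
print(goes_to_consensus(ScreeningCase([1, 2])))        # one reader scores 2
print(goes_to_consensus(ScreeningCase([1, 1, 1, 3])))  # one of four readers scores 3
```

Because the rule takes a list of scores, the same function covers both double reading and the quadruple reading used for FFDM during the first 4 months.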
Consensus meetings for the SFM and the FFDM image interpretations were held twice weekly. Hard-copy mammograms from previous screening examinations, if available, were offered for both the SFM and the FFDM consensus meetings. All radiologists were encouraged to participate in the consensus meetings, but, usually, only two radiologists (the prescribed minimum) were present. The radiologists who attended the consensus meetings were not necessarily the same radiologists who scored the SFM or FFDM images for probability of cancer. At the consensus meeting, the hard-copy images were used for SFM image interpretation, whereas soft-copy display was used with the various available options for FFDM image interpretation. The outcome of the consensus meeting was a decision about which women should continue in the screening program and which women should be recalled for diagnostic work-up according to the prescribed guidelines (see Diagnostic Work-up). A flowchart of the Oslo II study design is presented in Figure 2.
Diagnostic Work-up
The diagnostic work-up of women who were recalled was carried out at the Breast Imaging Center, usually within 1–2 weeks after the consensus meeting. Work-up included the acquisition of spot-compression and magnification views, US images, and magnetic resonance images if needed. Percutaneous fine-needle aspiration was the standard technique used for biopsies and was performed with US guidance if possible; otherwise, stereotactic guidance or a perforated compression plate was used. Cytologic and histologic analyses were carried out in the Department of Pathology. All cancers (including breast malignancies) in Norway are reported to the Norwegian Cancer Registry, which maintains a database linked to that of the NBCSP. This system enabled the surveillance of the population included in our screening mammography program.
Statistical Analysis
Medical audit parameters for screening mammography programs, including recall rate, cancer detection rate, and PPV, were calculated separately for the two modalities and two age groups. The recall rate was defined as the percentage of patients for whom further imaging work-up was recommended at the consensus meeting. PPV was the percentage of all screening examinations that resulted in a breast cancer diagnosis based on abnormal mammographic findings (ie, findings scored 2 or higher on the five-point rating scale by at least one of the two readers). Statistical software was used for data analysis (Epi Info, version 6; Centers for Disease Control and Prevention, Atlanta, Ga). The χ² test was used to compare recall rates, PPVs, and cancer detection rates between the two modalities in both age groups. A P value of less than .05 was considered to indicate a statistically significant difference.
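As an illustration of the comparison described above, the following Python sketch applies a Pearson χ² test to a 2 x 2 table of cancers detected versus women screened. It is not the Epi Info procedure used in the study: the function name is hypothetical, and the absence of a continuity correction is an assumption, since the text does not state which variant of the test was applied. The counts are those given in the abstract.

```python
import math


def chi_square_2x2(events_a, total_a, events_b, total_b):
    """Pearson chi-square test for two proportions (1 degree of freedom).

    Assumption: no continuity correction is applied.
    Returns (statistic, p_value).
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    n = total_a + total_b
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail probability of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Cancers detected / women screened (SFM first, then FFDM), from the abstract.
stat_older, p_older = chi_square_2x2(56, 10_304, 33, 3_985)    # group aged 50-69 years
stat_younger, p_younger = chi_square_2x2(17, 7_607, 8, 3_012)  # group aged 45-49 years

print(f"50-69 years: chi-square = {stat_older:.2f}, P = {p_older:.3f}")
print(f"45-49 years: chi-square = {stat_younger:.2f}, P = {p_younger:.3f}")
```

Under these assumptions the computed P values fall close to the reported values of .053 and .686, which suggests, but does not prove, that an uncorrected χ² test was used.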
| RESULTS |
|---|
Positive Predictive Value
PPV, based on the number of women who were recalled because of abnormal mammographic findings, was 56 (22.1%) of 253 for SFM and 33 (21.6%) of 153 for FFDM in the group aged 50–69 years. In the group aged 45–49 years, PPV was 17 (7.4%) of 231 for SFM and eight (7.1%) of 112 for FFDM (Table). The differences in PPV between modalities within each age group were nonsignificant (P > .05).
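The PPV percentages above follow directly from the reported counts. A minimal Python sketch of the arithmetic, with illustrative names only:

```python
# (cancers detected, women recalled because of abnormal mammographic findings),
# as reported in the paragraph above.
ppv_counts = {
    ("50-69", "SFM"): (56, 253),
    ("50-69", "FFDM"): (33, 153),
    ("45-49", "SFM"): (17, 231),
    ("45-49", "FFDM"): (8, 112),
}


def ppv_percent(cancers, recalled):
    """Positive predictive value of recall, as a percentage."""
    return 100 * cancers / recalled


ppv = {key: round(ppv_percent(*counts), 1) for key, counts in ppv_counts.items()}

for (age_group, modality), value in ppv.items():
    print(f"{age_group} years, {modality}: PPV = {value}%")
```

The calculation makes the pattern easy to see: PPV differs by about threefold between the age groups but by well under one percentage point between modalities within each group.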
Overall Comparison
A flowchart that summarizes the results of our comparison of SFM and FFDM for both age groups (50–69 and 45–49 years) is presented in Figure 4. In women aged 50–69 years, the difference between the cancer detection rates of 0.83% for FFDM and 0.54% for SFM approached statistical significance (P = .053). In women aged 45–49 years, the cancer detection rates of 0.22% for SFM and 0.27% for FFDM were nearly equal (P = .686). The number of breast cancers in the group aged 45–49 years, however, was low, and it reflected the prevalence in the general population at these ages (Fig 4).
| DISCUSSION |
|---|
There are important differences between the two previous studies and the Oslo II study. Lewin et al used mammographic equipment with a prototype soft-copy display system (7), and, although SFM and FFDM images were interpreted independently, there was no independent double reading for each modality. Investigators in the Oslo I study used a production detector and a production display system. Independent double reading was used for both the Oslo I and Oslo II studies. A dedicated room for soft-copy reading was not used during the Oslo I study, and a suboptimal reading environment for FFDM images may have influenced the results. In the Oslo II study, the soft-copy reading was carried out in dedicated, darkened, and quiet environments. Most of the eight readers in the Oslo I study had a limited experience with FFDM and soft-copy reading before the project started, whereas seven of the eight radiologists who took part in the Oslo II project had also interpreted digital mammograms in the Oslo I study and were consequently experienced in soft-copy reading. Unlike the study by Lewin et al and the Oslo I study, in which a paired design was used, the Oslo II study was a randomized screening trial and involved a large study population. The Oslo II trial included review of all positive mammograms, and mammograms were available for comparison only during the review of positive findings.
Previous experimental or retrospective studies in which SFM and FFDM were compared in smaller populations showed that FFDM is comparable with SFM with regard to the detectability, conspicuity, and characterization of microcalcifications and low-contrast objects (8–10). An important question, therefore, is why more cancers were missed at FFDM in the two previous large-scale trials (the study by Lewin et al and the Oslo I study). The prototype display system used in the study by Lewin et al may have influenced the results; Lewin and colleagues also considered that an improved workstation might have resulted in the detection of more cancers at FFDM (7). Current workstations are equipped with image processing algorithms that were not present on prototype systems, at least for part of the study reported by Lewin et al. These algorithms allow display of the wide range of attenuation values that are typically measured, from the skin line to the chest wall, without a loss of local contrast. The absence of these algorithms may partly explain the lower cancer detection rate in that study. Investigators in the Oslo I study, on the other hand, used a production display and still found a slightly lower cancer detection rate for FFDM compared with SFM (6). On the basis of a retrospective side-by-side feature analysis performed by external radiologists as part of the Oslo I study, it was concluded that the conspicuity of cancers was equal for SFM and FFDM (6). The higher number of missed cancers at FFDM in the two previous studies (Lewin et al and Oslo I) could be related to positioning variability and interpretation errors, and, thus, might indicate the huge challenge presented by interobserver variation in mammographic interpretation (6,7,11). The differences in cancer detection rate between the two modalities in the two previous studies were not statistically significant and may have been due only to the inadequate size of the study populations to demonstrate the absence of such a difference.
Another important question in this respect is the use of soft-copy reading in screening mammography. It has been suggested that interpretation with soft-copy display is unlikely to substantially change accuracy, but a number of parameters that may affect soft-copy accuracy must be considered (12). These parameters include image processing, reader experience, and viewing conditions. The important change from a lower cancer detection rate in the Oslo I study to a higher detection rate in the Oslo II study for FFDM compared with SFM can most likely be explained by two circumstances: First, there may have been a learning curve effect, as the readers had little experience in soft-copy reading in the Oslo I study but were experienced in this technology in the Oslo II study. Second, the viewing conditions in Oslo II were substantially improved over those in Oslo I. We underestimated the importance of reading environments during the Oslo I study, but we used a dedicated, darkened, quiet room for the Oslo II study. Optimal reading environments are important for detection of mammographic abnormalities on both hard-copy and soft-copy images (13,14).
Unlike the results of the study by Lewin et al, in which a significantly lower recall rate was found for FFDM, the results of our study show a significantly higher recall rate for FFDM (P < .05) in both age groups (45–49 and 50–69 years). The higher overall recall rate in the study by Lewin et al compared with the rates in the Oslo I and Oslo II studies is most likely explained by the fact that the threshold of suspicion leading to recall is lower in the United States than in Norway, because of the different medicolegal environments. In U.S. practice, recall rates between 4.9% and 5.5% are considered to indicate the best trade-off between sensitivity and PPV (18). According to the guidelines of the NBCSP, however, the recall rate for incident breast screening examinations should be less than 3.5%. The significantly higher recall rates for FFDM in both age groups in our study may at least partly explain the higher cancer detection rate for FFDM, although the difference in detection rates was not statistically significant. Two circumstances in the Oslo studies should be noted: First, prior screening mammograms were not offered for comparison at the initial reading sessions because logistic problems might have introduced an interpretation bias. The decision not to offer comparison mammograms was supported by reports that their use does not significantly increase the cancer detection rate at screening mammography (15,16). The review of prior mammograms may, however, result in a statistically significant increase in the specificity of the screening method and a consequent reduction in the recall rate (15,16). Prior mammograms, if available, were always offered for comparison at our consensus meetings, to increase the specificity of screening. Double reading with consensus or arbitration, as applied in the Oslo II study, is associated with an increased cancer detection rate together with a decrease in the number of women recalled for diagnostic work-up (17). Second, we do not recommend short-term follow-up of probably benign lesions.
It has been suggested that FFDM, because of its superior contrast resolution and dynamic range, may provide better images of dense breasts than can be obtained with SFM. An important aspect of our study was the comparison of cancer detection rates at SFM with those at FFDM in women younger than 50 years, the age group usually associated with dense breast tissue. The flow diagram of cancer detection according to modality and age group shows a higher cancer detection rate with FFDM than with SFM in the group aged 50–69 years, a difference that is close to statistical significance (P = .053), but nearly identical detection rates for the two modalities in the group aged 45–49 years (P = .686). The latter rates are comparable with those reported for women younger than 50 years in other screening programs (19,20). The number of cancers in the group aged 45–49 years in our study, however, was small. Therefore, our results do not permit any final conclusions regarding the comparison of SFM and FFDM in women younger than 50 years.
The lack of power is a problem common to the three large-scale trials conducted so far (the study by Lewin et al, the Oslo I study, and the Oslo II study of the group aged 45–49 years), and the nonsignificant differences found in these studies may be the results of random statistical variation rather than indicators of any real difference. A limitation of our study is the small number of cancers in the group aged 45–49 years because of the low prevalence of breast cancer in this younger age group and the limited number of women included in the study. Many women in this age group have a dense breast parenchyma that may easily lead to misperception of subtle mammographic abnormalities. The difference in the percentages of ductal carcinomas in situ detected with the two modalities (41% of ductal carcinomas in situ for SFM, versus 25% of ductal carcinomas in situ for FFDM) may at first glance seem noteworthy and may raise the question of whether microcalcifications are better detected with SFM than with FFDM in dense breast parenchyma. However, the number of cases is too small to support a definitive conclusion. The nearly equal percentages of ductal carcinomas in situ detected with the two modalities in the group aged 50–69 years (27% of ductal carcinomas in situ for SFM, versus 30% of ductal carcinomas in situ for FFDM) do not indicate any difference in effectiveness between the two modalities. Another limitation of our Oslo II study is that comparisons between SFM and FFDM were available only during review of positive mammograms. Low recall rates and no short-term follow-up for probably benign lesions in our screening program are circumstances that might have caused cancers represented by a positive score on images in either modality to be dismissed at the consensus meetings. Follow-up for 2 years would be necessary to detect such incorrectly dismissed cancers and to evaluate for interval cancers in a subsequent screening round.
In conclusion, FFDM with soft-copy reading demonstrated a higher cancer detection rate than SFM in the group aged 50–69 years, but the difference was not statistically significant. Our study provides evidence that FFDM with current production equipment is equivalent to SFM for breast cancer detection and consequently well suited for use in screening mammography.
| FOOTNOTES |
|---|
Author contributions: Guarantor of integrity of entire study, P.S.; study concepts and design, P.S., A.S.; literature research, P.S.; clinical studies, P.S., A.S.; data acquisition and analysis/interpretation, P.S., A.S.; statistical analysis, P.S.; manuscript preparation, definition of intellectual content, editing, and revision/review, P.S., A.S.; manuscript final version approval, P.S.