Breast Imaging
Marco Rosselli del Turco, MD,
Nils Bjurstam, MD,
Hans Junkermann, MD,
David Beijerinck, MD,
Brigitte Séradour, MD and
Carl J. G. Evertsz, PhD
1 From the Department of Radiology, Radboud University Nijmegen Medical Center, Geert Grooteplein 10, 667 Radiology, 6500 HB Nijmegen, the Netherlands. From the 2004 RSNA Annual Meeting. Received April 25, 2005; revision requested June 21; revision received November 21; accepted December 15; final version accepted April 10, 2006. Supported by a grant from the European Community in the 5th Framework Information Society Technologies program (IST-2001-33439, SCREEN-TRIAL). Address correspondence to A.A.J.R. (e-mail: T.Roelofs{at}rad.umcn.nl).
| ABSTRACT |
|---|
Materials and Methods: Institutional review board approval was not required. Participants gave written informed consent. Twelve experienced screening radiologists read 160 soft-copy screening mammograms twice, once with and once without prior mammograms. Eighty mammograms were obtained in women in whom breast cancer was diagnosed later; the other 80 mammograms had been reported as normal or benign. All cancers were visible in retrospect. Readers located potential abnormalities, estimated likelihood of malignancy for each finding, and indicated whether prior mammograms were considered necessary. The effect of prior mammograms on detection was determined by computing the mean lesion localized fraction in a range of low fractions of nonlesion locations corresponding to operating points in screening. Scores for both reading sessions were combined to assess the effect of making prior mammograms available only when requested. Data were analyzed by comparing the number of localized lesions between the two reading conditions with a paired two-tailed Student t test and applying a linear mixed model to test differences in average mean lesion localized fraction between reading conditions. P values less than .05 indicated statistical significance.
Results: Without prior mammograms, significantly more annotations were made. When only positive cases were considered, no difference was observed. Reading performance was significantly better when prior screening mammograms were available. At fixed lesion localized fraction, nonlesion localized fraction was reduced by 44% (P < .001) on average when prior mammograms were read. Performance was also increased for combined reading mode (ie, when prior mammograms were available on request only). However, this increase was smaller than that when prior mammograms were always available. Prior mammograms were requested in 24%–33% of all cases and were requested more often in positive cases.
Conclusion: Comparison with prior mammograms significantly improves overall performance and can reduce referrals due to nonlesion locations. Limiting the availability of prior mammograms to cases selected by the reader reduces the beneficial effect of prior mammograms.
© RSNA, 2007
| INTRODUCTION |
|---|
With the upcoming transition from screen-film mammography to full-field digital mammography, conventional film image viewers will be replaced with soft-copy image reading equipment. Organization of this transition will require a major effort. In particular, the use of prior mammograms poses a challenge, as reading digital images in combination with film images is difficult to organize and may lead to a loss of efficiency. One solution that has been considered is the digitization of prior screening mammograms. Clearly, this would require a considerable effort, which should be balanced by the medical benefits provided by the use of prior mammograms in the screening process. Another solution involves limiting the number of prior mammograms used; for instance, it could be left to the reader to decide whether prior mammograms should be retrieved from patient files. Yet another alternative is to use prior mammograms only during the second reading of suspicious mammograms; this procedure was used in a digital screening program described in the literature (1,2) and in some film-based screening programs (3). Thus, the purpose of our retrospective study was to determine the influence of comparison of current mammograms with prior mammograms on breast cancer detection in screening and to investigate a protocol in which prior mammograms are viewed only when the reader deems it necessary.
| MATERIALS AND METHODS |
|---|
Study Mammograms
Mammograms used in this study were taken from the population-based breast cancer screening program in the Netherlands. In this program, women aged 50–75 years are invited to undergo screening examinations every 2 years, and women with positive findings are referred to a hospital, where additional imaging and diagnostic procedures are performed. In this article, referral is related to this practice and has a meaning that is similar to the meaning of recall. Fracheboud et al (4) described the organization of this program in detail.
In our study, the study radiologist (J.H.C.L.H., with more than 30 years of experience with screening mammography) selected a series of 180 cases from this screening program. This series was divided as follows: 80 cases (patient age range, 52–73 years) had malignant findings, 60 (patient age range, 51–70 years) had normal findings, and 20 (patient age range, 53–80 years) had benign findings; an additional 20 cases (patient age range, 53–70 years) with malignant findings were included for training purposes. Cases were selected randomly; however, in malignant cases, the abnormality had to be visible on the prior mammogram in retrospect. All images were checked for proper positioning and image quality. All original screening mammograms reviewed in this study were obtained from the Dutch Screening Program. Women participating in this program were asked to give written informed consent that also included consent for their data to be used for future retrospective research. Our study was conducted according to Dutch privacy laws, which indicated that institutional review board approval was not required to use these mammograms in our study because they were anonymized.
For all cancer cases, the mammogram obtained at the time cancer was detected (diagnostic mammogram) and the mammograms obtained in two prior screening sessions were available. The diagnostic mammograms were either clinical mammograms of interval cancers (n = 42; mean time to detection, 12.1 months; range, 1–31 months) or mammograms of screen-detected cancers (n = 38). In this study, we used the last negative screening mammograms obtained before detection of cancer and corresponding prior mammograms; hereafter, these last negative screening mammograms will be referred to as current mammograms. The study radiologist reviewed all positive mammograms and used diagnostic mammograms and pathology reports to mark the locations of the cancers. This radiologist did not participate as an observer in the current study. The observers in this study remained unaware of the pathologic nature of the lesions until the whole study was completed.
Patients with normal findings were not referred, and their healthy status was confirmed with at least one negative follow-up screening mammogram. For each normal case, two consecutive screening mammograms were available, obtained from the same screening rounds and screening units as the positive cases to avoid biases due to variation of equipment and image quality. The 60 normal cases were selected from a larger series of normal cases that were used previously in an observer study (5). To enrich the series with more difficult cases, we excluded some of the obvious normal cases for which radiologists in the earlier study did not report relevant findings. To further enrich the case sample, we included 20 cases obtained in women referred with benign abnormalities that were confirmed with biopsy or negative follow-up findings.
The study radiologist estimated breast composition in each of the cases. Composition was fatty (n = 26), less than 25% dense (n = 93), or between 25% and 75% dense (n = 41). Lesion characteristics in the cancer cases are shown in Table 1. In the set of benign cases, eight cases contained one or more clusters of microcalcifications and 12 contained a mass.
A number of predefined display protocols for the mammograms provided the reader with various levels of spatial resolution. A new case was first presented in a low-spatial-resolution overview mode. In this 2 x 4 display mode, the current mammogram was displayed on the lower half of the monitor and the prior mammogram was displayed on the upper half. Mediolateral oblique views were displayed on the left half of the monitor, and craniocaudal views were displayed on the right half. Readers could use the keypad to subsequently switch to full-screen display of mediolateral oblique views and then to full-screen display of craniocaudal views. With the availability of prior mammograms, the user could toggle between the full-screen prior mammogram and the current mammogram on the same monitor. In a previous study (8), this was shown to be the best way to detect temporal changes in lesion size. Higher-spatial-resolution display of a section of a mammogram was made possible by use of a full-screen roaming function or an electronic magnifying glass. Users could change contrast and brightness by selecting a proper gray scale from a series of predefined look-up tables or by using a window-and-level interface.
All mammograms used in this study were digitized at 50 µm and 12 bits per pixel with a film scanner (Lumisys LS85; Lumisys, Sunnyvale, Calif). Digital images were anonymized and archived after averaging the spatial resolution down to 100 µm per pixel. Before display, a modified version of the basic unsharp masking algorithm (9) was used to compensate for image blurring during digitization and display. In an earlier study (5), we showed that reader performance was not diminished by the procedures used for digitization, processing, and display in comparison to conventional film reading.
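The resolution reduction described above can be illustrated with a minimal sketch. It assumes that "averaging down" from 50 µm to 100 µm means taking the mean of non-overlapping 2 x 2 pixel blocks; the function name `downsample_2x2` and the list-of-lists image representation are illustrative choices, not details taken from the study software.

```python
def downsample_2x2(image):
    """Halve spatial resolution by averaging non-overlapping 2 x 2 blocks.

    `image` is a list of equal-length rows of pixel values (e.g. 12-bit
    integers). An odd trailing row or column is dropped. This is an assumed
    reading of "averaging the spatial resolution down to 100 um per pixel".
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2 if image else 0
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            total = (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
                     + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1])
            new_row.append(total / 4)
        out.append(new_row)
    return out
```

Each output pixel then covers a 100 µm x 100 µm area of film, so a 4 x 2 input patch becomes a 2 x 1 output patch.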
Computer-aided Detection
Our initial experience with soft-copy image reading indicated that radiologists believed in the ability of computer-aided detection (CAD) to assist with the detection of microcalcification clusters. Some readers commented that CAD might be needed to efficiently detect microcalcifications with reading of soft-copy screening mammograms because of the limited spatial resolution of the displays. For this reason, we made CAD (version 2.0; R2 Technology) available for detection of microcalcifications. To avoid the risk of confounding our results, we did not use CAD to detect masses. When CAD is used to detect masses, it may easily confuse radiologists who are not properly trained. The user could access CAD results for microcalcifications via the keypad. Once CAD was activated, microcalcification clusters were marked by white circles in the overview mode or by highlighting detected individual microcalcifications at full brightness when full-screen images were displayed. The threshold level for display of CAD results was adjusted so that nonlesion marks for microcalcifications occurred at a rate of 0.4 nonlesion marks per image.
Observer Study
Twelve screening radiologists (including M.R.d.T, N.B., H.J., and D.B.) from six screening centers in five European countries participated in this study. All radiologists were experienced: 10 read at least 5000 screening mammograms per year, and two read at least 3000 mammograms per year. Participants read the mammograms twice in separate reading sessions: In one session, only current mammograms were included. In the other, both current and prior mammograms were available.
In preparation for the reading sessions, the 160 study cases were divided into four sets of 40 cases. Each set contained 20 malignant cases and 20 normal or benign cases. The order of the cases within a set read with prior mammograms differed from that of the cases within a set read without prior mammograms. The sets were presented in a balanced order regarding the availability of prior mammograms. Radiologists were informed about the approximate percentage of cancer cases in the total case selection. Time between the first and second readings of each set was at least 4 weeks. The duration of the reading sessions was not restricted.
In both reading modes, readers used dedicated workstation tools to mark locations of potential abnormalities, estimate the likelihood of malignancy, and classify abnormalities according to one or more of the following five categories: mass, microcalcifications, architectural distortion, asymmetry, or other. In addition, readers were asked to record for each breast whether referral was required and if prior mammograms were needed to make a decision. For this purpose, a five-point scale was used (Table 2). Recorded differences between categories 1 (ie, not needed) and 2 (ie, helpful) and categories 4 (ie, helpful) and 5 (ie, not needed) were not analyzed in this study. Readers were asked to use a low threshold when reporting their findings to ensure that most regions they considered as possible cancer locations were recorded for further analysis. In the study protocol, we suggested that readers report each abnormality to which their attention was drawn for more than 5 seconds or to report at least one finding per two cases, on average. To assess the likelihood of malignancy of each finding, a rating scale with discrete levels ranging from 0% to 100% was used. The workstation software supported linking of lesion locations in different views, which allowed them to be scored as one finding.
Statistical Analysis
The total number of findings by all readers with viewing prior mammograms and the total number of findings by all readers without viewing prior mammograms were calculated, and the difference was tested for significance with a paired two-tailed Student t test (A.A.J.R.). If initial detection of cancers was hampered by the absence of prior mammograms, we expected the total number of localized lesions to be smaller when prior mammograms were unavailable. Therefore, we compared the total number of localized lesions detected by each radiologist, irrespective of the malignancy rating, for each of the two reading conditions with a paired two-tailed Student t test. A P value of less than .05 was considered to indicate a statistically significant difference.
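As an illustration of the paired comparison, the following sketch computes the paired t statistic from per-reader counts in the two reading conditions. The function name and the example counts in the test are hypothetical; the P value would then be read from a two-tailed t distribution with n - 1 degrees of freedom, a step not shown here.

```python
import math
import statistics


def paired_t_statistic(with_priors, without_priors):
    """Return (t, degrees_of_freedom) for paired per-reader counts.

    Each list holds one value per reader, in the same reader order.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(with_priors) != len(without_priors):
        raise ValueError("both conditions need one value per reader")
    diffs = [a - b for a, b in zip(with_priors, without_priors)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1
```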
To compare detection results in the two reading modes, we computed lesion localized fraction as a function of the nonlesion localized fraction by using malignancy ratings of the reported regions and verifying the location. This method is known as the localized receiver operating characteristic (LROC) paradigm. We did not use standard LROC analysis (10) because it fits combined LROC and receiver operating characteristic data under independence assumptions but does not allow us to compare different curves for multiple modalities (n = 3) and multiple readers (n = 12). A lesion was considered localized when the mathematical center of mass of the lesion was closer than 2.5 cm to a true cancer location. The LROC data points were generated in the standard manner by cumulating the ratings. The lesion localized fraction was calculated by dividing the number of cases in which cancer was detected by the total number of cases with cancer. The distance criterion of 2.5 cm minimized the risk of counting accidental hits and was large enough to avoid the possibility of counting correct detections as nonlesion locations (11).
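The localization rule and the case-based lesion localized fraction can be sketched as follows. This is a simplified illustration, not the analysis software used in the study: the data layout (one dictionary per cancer case, coordinates in centimeters) is assumed, and a finding is counted when its rating strictly exceeds the decision threshold.

```python
import math

MAX_DISTANCE_CM = 2.5  # distance criterion stated in the text


def is_localized(finding_xy, lesion_xy, max_distance_cm=MAX_DISTANCE_CM):
    """True if the finding lies closer than the criterion to the true lesion."""
    return math.dist(finding_xy, lesion_xy) < max_distance_cm


def lesion_localized_fraction(cancer_cases, threshold):
    """Fraction of cancer cases with a correctly located finding above threshold.

    Each case is a dict (assumed layout):
      {"lesion": (x, y), "findings": [((x, y), rating), ...]}
    """
    hits = 0
    for case in cancer_cases:
        if any(rating > threshold and is_localized(xy, case["lesion"])
               for xy, rating in case["findings"]):
            hits += 1
    return hits / len(cancer_cases)
```

Evaluating this function over a descending series of thresholds yields the vertical coordinates of one reader's LROC curve.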
In our experimental design, readers were free to report multiple lesion locations in a case; however, case-based analysis was used. The fraction of cases with nonlesion locations assigned by a reader at a given value of the decision threshold was determined by dividing the number of normal and benign mammograms with a nonlesion location (ie, with a finding that had a malignancy rating that exceeded the threshold) by the total number of normal and benign mammograms. Thus, nonlesion reports were not used in cancer cases; this allowed us to avoid the problem of incorrectly counting nonlesion locations because of ambiguities in the exact location of the truth annotations. Multiple nonlesion responses per image were not considered. For every case, only the nonlesion location with the highest malignancy rating was used. This way, part of the obtained data was not used. However, to our knowledge, no adequate method currently exists to analyze these data appropriately and resolve this limitation.
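The case-based counting rule for nonlesion locations can be sketched in the same way. The layout is assumed: each normal or benign case is represented only by the list of malignancy ratings the reader assigned to its findings, and only the highest rating per case is compared with the threshold.

```python
def nonlesion_localized_fraction(negative_cases, threshold):
    """Fraction of normal and benign cases with a finding above threshold.

    `negative_cases` is a list with one entry per case; each entry is the
    list of malignancy ratings reported in that case (possibly empty).
    Only the highest-rated finding per case is considered.
    """
    flagged = 0
    for ratings in negative_cases:
        top = max(ratings, default=None)
        if top is not None and top > threshold:
            flagged += 1
    return flagged / len(negative_cases)
```

A case with several nonlesion findings therefore contributes at most once at any threshold, which is the information loss acknowledged in the text.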
By combining scores obtained with both reading modes, we investigated the effect of a protocol in which prior mammograms were made available only if the reader decided they were needed. For this purpose, we used the responses in the five-point scale shown in Table 2. To compute the detection performance of a reader, we used the results of the sessions in which only current mammograms were displayed. However, if the reader assigned a score of 3 to a case in either the left or the right breast, thus indicating that prior mammograms were needed, we replaced the reading results for this case with the results of the reading session in which prior mammograms were available. From the findings obtained, we computed overall detection performance, as described previously. Also, the frequency with which prior mammograms were requested was calculated for each reader individually and for all readers. This was done separately for normal, benign, and malignant cases.
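The substitution rule for the on-request protocol can be sketched as follows. The dictionaries keyed by case identifier and the names `combine_on_request` and `PRIORS_NEEDED` are hypothetical; the sketch also assumes that the per-breast scores are those recorded in the session without prior mammograms.

```python
PRIORS_NEEDED = 3  # score on the five-point scale indicating priors were needed


def combine_on_request(without_priors, with_priors, need_scores):
    """Build per-case results for the 'prior mammograms on request' mode.

    without_priors, with_priors: dicts mapping case id -> reading result.
    need_scores: dict mapping case id -> (left_breast_score, right_breast_score).
    A case takes its result from the session with priors if either breast
    received the 'priors needed' score; otherwise the result without priors
    is kept.
    """
    combined = {}
    for case_id, result in without_priors.items():
        left, right = need_scores[case_id]
        if left == PRIORS_NEEDED or right == PRIORS_NEEDED:
            combined[case_id] = with_priors[case_id]
        else:
            combined[case_id] = result
    return combined
```

The fraction of cases for which the substitution fires gives the request frequency for that reader.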
In breast cancer screening, the rate of nonlesion locations generally is lower than 10% (12–14). Therefore, we used mean lesion localized fraction at low rates of nonlesion locations as the overall measure of reader performance. We computed these mean values for nonlesion localized fractions of less than 25%. A relatively large interval was chosen because our study sample was enriched with difficult normal and benign cases. The mean lesion localized fraction of the readers was computed for reading with and reading without prior mammograms and for the combined reading mode of prior mammograms on request. Differences in the mean lesion localized fraction between these three reading conditions were tested (A.A.J.R.) for statistical significance with a linear mixed model (SAS software, version 8.2; SAS Institute, Cary, NC). The dependent variable was the mean lesion localized fraction, the independent random variable was the observer (n = 12), and the independent fixed variable was the reading condition (n = 3). The appropriate least square mean of the difference between two of the three reading conditions (with 95% confidence intervals) is presented.
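One way to obtain a mean lesion localized fraction over the interval of nonlesion localized fractions below 25% is to average a piecewise-linear LROC curve over that interval. The sketch below does this with the trapezoidal rule; the linear interpolation between operating points and the function names are assumptions, because the text does not specify how the mean was computed.

```python
def interpolate(points, x):
    """Linearly interpolate y at x on a curve given as sorted (x, y) points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range covered by the curve")


def mean_llf(points, max_nlf=0.25):
    """Mean lesion localized fraction for nonlesion fractions below max_nlf.

    `points` are (nonlesion_fraction, lesion_fraction) operating points and
    should span the interval from 0 to at least max_nlf.
    """
    pts = sorted(points)
    clipped = [(x, y) for x, y in pts if x < max_nlf]
    clipped.append((max_nlf, interpolate(pts, max_nlf)))
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(clipped, clipped[1:]))
    return area / (clipped[-1][0] - clipped[0][0])
```

The resulting per-reader, per-condition values are the kind of quantity that would serve as the dependent variable in the mixed model.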
The LROC results were used to calculate the mean reduction in nonlesion localized fraction by taking the average percentage by which nonlesion localizations decreased in the interval of nonlesion localized fractions less than 25%. In this way, the reading mode in which prior mammograms were unavailable was compared with the mode in which prior mammograms were available and with the mode in which prior mammograms were available on request.
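The mean reduction can be sketched as a comparison of two LROC curves at matched lesion localized fractions. The sketch assumes piecewise-linear, nondecreasing curves and an evenly spaced grid of nonlesion localized fractions up to 25%; the sampling scheme and function names are illustrative, since the text does not give the exact procedure.

```python
def interp(xs, ys, x):
    """Piecewise-linear interpolation on nondecreasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            if xs[i] == xs[i - 1]:
                return ys[i]
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    return ys[-1]


def mean_nlf_reduction(curve_without, curve_with, max_nlf=0.25, steps=25):
    """Average percentage reduction in nonlesion fraction at equal sensitivity.

    Each curve is a list of (nonlesion_fraction, lesion_fraction) points
    sorted by nonlesion fraction.
    """
    nlf_wo = [p[0] for p in curve_without]
    llf_wo = [p[1] for p in curve_without]
    nlf_w = [p[0] for p in curve_with]
    llf_w = [p[1] for p in curve_with]
    reductions = []
    for k in range(1, steps + 1):
        nlf = max_nlf * k / steps
        llf = interp(nlf_wo, llf_wo, nlf)    # sensitivity reached without priors
        nlf_new = interp(llf_w, nlf_w, llf)  # nonlesion fraction needed with priors
        reductions.append(1 - nlf_new / nlf)
    return 100 * sum(reductions) / len(reductions)
```

For example, if the curve with prior mammograms reaches every lesion localized fraction at half the nonlesion localized fraction of the curve without them, the function returns 50.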
By using the five-point categorical scores, the readers also indicated for each case and independently for the left and right breasts whether they detected a lesion they would normally refer. On the basis of these scores, the frequencies by which decisions regarding referral differed between the two reading conditions were calculated.
| RESULTS |
|---|
Average LROC results for the 12 readers when reading with prior mammograms, when reading without prior mammograms, and when reading with prior mammograms available on request in a selected number of cases (Figure) showed that the availability of prior mammograms led to a considerable improvement in detection performance. Mean lesion localized fractions at nonlesion localized fractions less than 25% for each of the readers for reading with prior mammograms, without prior mammograms, and with prior mammograms available on request are shown in Table 3. Analysis of variance revealed that there were significant differences between the results of the three reading conditions (P < .001). Mean lesion localized fraction (Table 4) was significantly higher for reading with prior mammograms than for reading with prior mammograms available on request (P = .001). Mean lesion localized fraction for reading with prior mammograms available on request was significantly higher than that for reading without prior mammograms (P = .007). Prior mammograms were requested in 24%–33% of cases, and they were requested more often for malignant and benign cases than for normal cases. The variation of frequencies in which prior mammograms were requested was large among the readers.
| DISCUSSION |
|---|
Assessing the level of suspiciousness in a region is an important aspect of screening. An operating point that balances the negative aspects of delayed detection with the cost of referrals due to nonlesion locations needs to be chosen. Viewing our results in this light, it is remarkable that at any given level of lesion localized fraction the corresponding number of referrals due to nonlesion locations was much smaller when current mammograms were read in association with prior mammograms. On average, additional information revealed by the prior mammograms led to better decisions. The reduction of the nonlesion localized fraction had a fairly constant value in the lower range of nonlesion rates and was 44% (P < .001), on average.
In our experimental design, we chose to bias case selection toward difficult cases to increase the power of this study. All positive cases were prior screening mammograms obtained in patients with cancer. Negative cases were enriched with mammograms obtained in patients referred with benign findings and normal mammograms with signs of an abnormality. For this reason, the mean lesion localized fraction was rather low.
The fact that the case sample was biased is not a study limitation because it is possible to extrapolate our results to an unbiased case sample. First, it can be argued that if more obviously normal cases had been included in the study they would not have led to an increase in nonlesion radiologic findings. As a result, LROC curves would merely scale along the horizontal axis, which does not affect the relative reduction of the nonlesion localized fraction obtained by using prior mammograms. Second, adding obviously positive cases that almost all radiologists would identify to the study sample, regardless of the availability of prior mammograms, would shift the LROC curves vertically (ie, to higher lesion localized fraction levels). This would also leave the reduction of nonlesion locations at a given lesion localized fraction unchanged. On the basis of these arguments, we believe that a 44% reduction of nonlesion locations due to the use of prior mammograms is a good estimate of the benefit of prior mammograms in screening practice. After all, the benefit of prior mammograms seems to lie in the improved interpretation of abnormalities rather than in the initial detection of abnormalities. It is unlikely that the reading performance of radiologists with regard to interpretation of detected abnormalities in the presence or absence of prior mammograms in clinical practice would be much different from that in our study.
Few studies in the literature provide data with which to compare our results. Thurfjell et al (15) performed a study in which three radiologists retrospectively reviewed 150 cases. Specificity increased and lesion localized fraction seemed to decrease when prior mammograms were available. Burnside et al (16) retrospectively analyzed results from mammographic examinations and compared detection rates between cases that were read with and cases that were read without prior mammograms. The recall rate decreased from 4.9% to 3.8% when prior mammograms were used, without a significant difference in the detection rate. We excluded the study performed by Callaway et al (17) because it included only nine visible cancer cases. It is noted that our results may have been influenced by the 2-year interval between acquisition of current and prior mammograms. In some programs, a 1-year interval is used or comparisons are made with older screening mammograms. For this reason, our ability to compare our findings with those of other studies is limited.
In our study, radiologists were asked whether they needed prior mammograms to interpret each individual case. By combining the scores of the reading sessions with and those of the reading sessions without prior mammograms, we were able to compute detection results for a protocol in which prior mammograms are viewed only when radiologists indicate they are needed. It was found that LROC performance increased significantly when this protocol was used. However, performance was significantly lower than that in sessions in which comparison with prior mammograms was always made. At a fixed lesion localized fraction, the nonlesion localized fraction was reduced by 24%, on average (P < .001), when prior session scores were used in 14.5 (24%) of the 60 normal cases, in 6.5 (33%) of the 20 benign cases, and in 22.5 (28%) of the 80 malignant cases. Initial detection was not worse when reading was performed without prior mammograms; therefore, this difference can only be explained by looking at the selection process. It seems that radiologists may not always make the best choice when they are asked which prior mammograms are needed.
It appears that decision making does not always improve when prior mammograms are available. This can be partly explained by intraobserver variance. However, it seems that there are cases in which the availability of prior mammograms leads to the wrong decision. This may be the case if a true malignancy does not grow during the screening interval or if a nonlesion location appears as a new mass. Thus, improved detection performance with prior mammograms is a net effect, in which negative effects in a few cases are outweighed by greater benefits in other cases.
In conclusion, we found that viewing current mammograms in association with prior mammograms significantly improves performance and may decrease the number of referrals due to nonlesion localizations by up to 44%. Our results suggest that prior mammograms help radiologists interpret suspicious abnormalities but have no effect on the initial detection of these abnormalities. In principle, this makes viable a strategy in which prior mammograms are reviewed only when radiologists deem it necessary. However, overall performance may become worse when prior mammograms are not always available, and it may be the case that prior mammograms are not requested frequently enough in clinical practice because of the additional workload involved.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Abbreviations: CAD = computer-aided detection; LROC = localized receiver operating characteristic
See Materials and Methods for pertinent disclosures.
Author contributions: Guarantors of integrity of entire study, A.A.J.R., N.K., J.H.C.L.H., N.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, A.A.J.R., N.K., N.W., S.v.W., N.B.; clinical studies, N.W., N.B., B.S., C.J.G.E.; statistical analysis, A.A.J.R.; and manuscript editing, A.A.J.R., N.K., N.W., C.B., S.v.W., P.R.S., J.H.C.L.H., M.R.d.T., N.B., D.B., C.J.G.E.
| References |
|---|