Radiology
DOI: 10.1148/radiol.2372041174
(Radiology 2005;237:437-443.)
© RSNA, 2005


Breast Imaging

Influence of Review Design on Percentages of Missed Interval Breast Cancers: Retrospective Study of Interval Cancers in a Population-based Screening Program¹

Solveig Hofvind, MSc, Per Skaane, MD, PhD, Bedrich Vitak, MD, PhD, Hege Wang, PhD, Steinar Thoresen, MD, PhD, Liv Eriksen, MD, Hilde Bjørndal, MD, Audun Braaten, MD and Nils Bjurstam, MD, PhD

1 From the Cancer Registry of Norway, Montebello, N-0310 Oslo, Norway (S.H., S.T.); Ullevål University Hospital, Oslo, Norway (P.S.); Linköping University Hospital, Linköping, Sweden (B.V.); Directorate for Health and Social Affairs, Oslo, Norway (H.W.); Central Hospital, Rogaland County, Stavanger, Norway (L.E.); Norwegian Radium Hospital, Oslo, Norway (H.B.); Haukeland University Hospital, Bergen, Norway (A.B.); and University of North-Norway, Tromsø, Norway (N.B.). Received July 26, 2004; revision requested October 1; revision received November 10; accepted December 17. Address correspondence to S.H. (e-mail: solveig.hofvind{at}kreftregisteret.no).


    ABSTRACT
PURPOSE: To retrospectively investigate whether different review designs have an influence on the estimate of missed interval cancer in a population-based breast cancer screening program.

MATERIALS AND METHODS: The Norwegian Breast Cancer Screening Program invites women aged 50–69 years to undergo biennial screening mammography. The current study was part of the evaluation and scientific aspects of the screening program and thus was covered by the general ethical approval of the screening program as a part of the Cancer Registry of Norway. All participants signed an informed consent that specified that data related to their screening visit could be used for evaluation and scientific purposes. Six radiologists (9–34 years of experience in mammography) reviewed previously obtained bilateral two-view screening and diagnostic mammograms of 231 interval cancers, 117 screening-detected cancers, and 373 normal cases. Four review designs were used: individual and paired blinded review and individual and consensus informed review. A five-point interpretation scale was used to reclassify the cancers into missed cancers, minimal signs, and true interval cancers. The number and proportion of subgroups were estimated with 95% confidence intervals.

RESULTS: Of 231 interval cancers, 46 (19.9%) were reclassified as missed cancers with the mixed blinded individual review and 54 (23.4%) were classified as missed cancers with the mixed blinded paired review. Eighty-three cancers (35.9%) were classified as missed cancers with individual informed review, and 78 (33.8%) were classified as missed cancers with consensus informed review. Thirty-nine cancers (16.9%) were reclassified as missed when four or more radiologists assigned a score of 2 or more (probably benign or more suspicious); three cancers (1.3%) were reclassified as missed when a score of 4 or more (probably malignant or more suspicious) was assigned.

CONCLUSION: The percentage of interval cancers classified as missed ranged from 1.3% to 35.9% according to review design. To encourage learning, a review protocol should include both blinded and informed designs.



    INTRODUCTION
Mammographic screening is credited with helping reduce breast cancer mortality rates in both randomized controlled trials (1,2) and service screening programs (3,4). The effects, however, can only be achieved in high-quality programs (1).

The rate of breast cancers detected between the screening examinations, so-called interval cancers, has an influence on the sensitivity of a screening program. The European Guidelines for Quality Assurance in Mammography Screening (5) define an interval cancer as "breast cancer arising after a negative screening episode (which may include assessment) and before the next scheduled screening round." Varying definitions of interval cancer (whether the cancer is detected after the woman's first [prevalent] or subsequent [incident] screening examination, whether single- or two-view mammography is performed, whether single or double reading by one or two radiologists is used, and the length of the interval between the screening rounds) complicate the issue and make comparison of interval cancer rates difficult (6–11).

Interval cancers form a heterogeneous group of tumors (5,7,9,11). Missed cancers have mammographic signs of malignancy that have been misinterpreted or overlooked on the preceding image. These cases can be considered as a measure of radiologic performance and quality. A visible lesion at mammography that is not suspicious enough to necessitate a recall of the patient is defined as a minimal sign. True interval cancers represent cancers that are defined as mammographically undetectable at screening. A minimal sign can be regarded as a true interval cancer because considerations of the tumor did not result in a recall of the patient, and true interval cancers represent tumors that can be mammographically occult on screening and diagnostic mammograms.

It has been recommended that radiologic review of screening mammograms in women who subsequently develop an interval cancer be a part of the quality assurance in a breast cancer screening program (5,12). A review can be performed by mixing screening mammograms from women who subsequently developed an interval cancer with mammograms of screening-detected cancer and those showing negative findings. A review in which the distribution formula and screening outcome are unknown to the readers is called a mixed blind review. Another review design is an informed review, in which reports from radiologic, surgical, and histologic examination of the interval cancer are available in addition to the diagnostic and screening mammograms.

The definition of an interval cancer, the review design, the definition of subgroups of interval cancer, the radiologist's and the reviewer's experience in mammographic screening, the age groups of the women included, and the criteria for inclusion in the study are all factors that vary and, thus, influence the results of review designs (7,9,12). Although reviews are apparently important for learning about and classifying interval cancers, to our knowledge, only a limited number of studies have included an analysis of the influence of review design on the estimates of subgroups (8,9,12,13).

This multireader review study of interval cancers was designed as a part of the quality assurance section of the Norwegian Breast Cancer Screening Program (NBCSP). The aim of the study was to retrospectively investigate whether different review designs influence the estimate of missed interval cancers in a population-based breast cancer screening program.


    MATERIALS AND METHODS
This study is based on the prevalent screening round and the following 2-year interval in the pilot study of the NBCSP. The program started as a 4-year pilot study in four of 19 counties at the end of 1995 and the beginning of 1996 and became nationwide in February 2004 (14,15). The NBCSP is a governmental program and is administered by the Cancer Registry of Norway.

Invitation and Screening Examination
This was a population-based study, and all women aged 50–69 years who were residents of one of four counties in Norway were personally invited by letter to participate. A unique 11-digit personal identification number assigned to all inhabitants of Norway was used to identify the women. All participants signed an informed consent that specified that data related to their screening visit could be used for evaluation and scientific purposes. The current study was part of the evaluation and scientific aspects of the screening program and thus was covered by the general ethical approval of the screening program as a part of the Cancer Registry of Norway.

The women were offered two-view mammography biennially. In the pilot study, all mammograms were obtained with a mammographic system (Mammomat 300; Siemens Medical Systems, Erlangen, Germany) with film (Min-RE; Eastman Kodak, Rochester, NY) and screens (Min-R; Eastman Kodak). Molybdenum-molybdenum anode and 27–29 kV were always used.

Double reading of all four mammograms obtained in each woman was performed by two independent radiologists in the usual screening setting. The screening radiologists had 4–20 years of experience in clinical mammography, and all read screening mammograms from more than 5000 women a year. A five-point rating scale for probability of breast cancer was used in the interpretation of the mammograms as follows: 1, normal; 2, probably benign; 3, indeterminate; 4, probably malignant; and 5, malignant. If at least one of the readers categorized the mammographic findings with a score of 2 or higher, the case was automatically selected for a consensus meeting, at which an agreement was reached about whether to recall the woman. Mammographic breast density and radiologic characteristics were registered for all women who underwent a diagnostic work-up. Density was classified by using three categories as follows: dense (>70% glandular tissue), intermediate (30%–70% glandular tissue), and lucent (<30% glandular tissue). Radiographically determined mammographic characteristics were classified as circular or oval lesions, unsharp or ill-defined lesions, asymmetric densities, stellate lesions, and calcification alone or in the lesion.
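The rating scale, the rule for selecting a case for the consensus meeting, and the density categories described above can be sketched in Python. This is an illustrative sketch only, not software used in the NBCSP; the names `Score`, `needs_consensus`, and `density_category` are hypothetical, and assigning the boundary values of exactly 30% and 70% to the intermediate category follows the ranges as printed.

```python
from enum import IntEnum


class Score(IntEnum):
    """Five-point rating scale for probability of breast cancer."""
    NORMAL = 1
    PROBABLY_BENIGN = 2
    INDETERMINATE = 3
    PROBABLY_MALIGNANT = 4
    MALIGNANT = 5


def needs_consensus(reader_a: Score, reader_b: Score) -> bool:
    """A case goes to the consensus meeting if either reader scores 2 or higher."""
    return max(reader_a, reader_b) >= Score.PROBABLY_BENIGN


def density_category(glandular_percent: float) -> str:
    """Map the percentage of glandular tissue to the three density categories."""
    if glandular_percent > 70:
        return "dense"
    if glandular_percent >= 30:
        return "intermediate"
    return "lucent"
```

For example, `needs_consensus(Score.NORMAL, Score.INDETERMINATE)` is true, whereas two scores of 1 leave the case as a negative screening test.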

In the pilot study of the NBCSP, short-term follow-up was not recommended. This meant that the women had a negative or a positive screening test. All women who had a positive screening test were referred to diagnostic work-up that had to be conclusive, which meant that the woman was either referred back to screening or was referred for cancer surgery. The result of the diagnostic work-up was required to be available within 3 months after the screening examination. If the results at diagnostic work-up were negative (false-positive finding), the women were referred back to screening. Women with positive (true-positive) results at diagnostic work-up were offered treatment.

The screening and diagnostic procedures and the corresponding results were registered in a nationwide screening database located at the Cancer Registry of Norway. Both the screening-detected and interval cancers were reported on cytologic and histologic analysis forms, clinical records, and death certificates. Tumor characteristics (size in millimeters, involvement of the axillary lymph node, and histologic type and grade) were included on histologic analysis forms and were registered according to the International Statistical Classification of Diseases, 10th Revision. This multiple reporting practice provided an accurate and complete set of data for each patient linked to the personal identification number.

Prevalent Screening Round
A total of 159 887 women were invited to the prevalent screening round in the pilot study, and 127 064 accepted (attendance rate, 79.5%). The recall rate caused by abnormal mammograms was 4.2% (5 370 of 127 064 women). Eight hundred fifty-six breast cancers were diagnosed; of them, 169 (19.7%) were cases of ductal carcinoma in situ and 687 (80.3%) were invasive cancers.

A total of 247 interval cancers were diagnosed after the prevalent screening round of the pilot study. Breast cancer was diagnosed during the first 6 months after screening in 11 (4.5%) women, during the second half-year after screening in 46 (18.6%) women, during the third half-year after screening in 90 (36.4%) women, and during the fourth half-year after screening in 100 (40.5%) women. These numbers provided a sensitivity of 77.6%, derived with the calculation of [856 ÷ (856 + 247)], when both cases of ductal carcinoma in situ and invasive cancers were included.

Interval Cancers, Corresponding Mammograms, and Final Study Group
Since 1952, it has been mandatory for all physicians and laboratories in Norway to report all cancers diagnosed to the Cancer Registry. When breast cancer in a woman in the target group of the pilot study was reported to the Cancer Registry, the patient's screening history was checked. If the criterion for an interval cancer (the woman had undergone screening during the preceding 2 years) was met, the case was defined as an interval cancer. Records of breast cancer were frequently exchanged between the breast clinics and the screening database of the Cancer Registry to ensure completeness of the material.

The interval cancer rate in the pilot study of the NBCSP included all breast cancers (invasive cancers and cases of ductal carcinoma in situ) diagnosed between the date that the results of mammographic screening or those of the diagnostic work-up were obtained and the scheduled date for the next screening examination. Women in the oldest birth cohort, aged 68 and 69 years in the prevalent screening round, were followed up for 2 years after the screening examination. The diagnosis of interval cancers was made on the basis of symptoms declared by the women themselves, the clinical findings of their physician, or as an asymptomatic cancer detected with screening mammography at a private clinic. The extent of screening at private clinics in the period between screening sessions is, unfortunately, unknown. All interval cancers were verified at histologic examination. Because of the logistics of the pilot study, it was possible to consider the cases of ductal carcinoma in situ and the invasive cancers separately. All results are woman-based, which means that one case is one woman, two breasts, and four mammograms (from two-view mammography of each breast). This study was based on 247 women diagnosed with an interval breast cancer (16). The screening mammograms were not available for 16 patients. Thus, the final material included 231 interval cancers—14 cases of ductal carcinoma in situ (6.1%) and 217 invasive cancers (93.9%)—diagnosed in women aged 51–73 years (mean age, 60.2 years ± 0.4 [standard error]; median age, 60 years). Radiologic, histologic, and surgical reports were available for all cancers, and all patients had previously undergone two-view bilateral mammography; thus, the reviewing material included images from two-view mammography of 462 breasts.

Review Design
The retrospective multireader review designs consisted of two parts (Fig 1). Six radiologists were dedicated to the review on two separate occasions. Five radiologists were screeners in Norway with 9–34 years of experience in mammography (A.B., 9 years; H.B., 12 years; N.B., 34 years; L.E., 17 years; P.S., 23 years). One reader worked in Sweden (B.V., 28 years of experience), and one of the screeners in Norway had many years of experience with mammographic screening in Sweden (N.B.).



Figure 1. Diagram illustrates the designs used for the multireader review of interval cancer in the pilot study of the NBCSP.

 
Mixed Blinded Review
In the mixed blinded review, previously obtained mammograms from patients with interval cancer (n = 231) were mixed with screening mammograms with negative findings (n = 373) and mammograms that showed cancers detected at screening (n = 117). Thus, mammograms from a total of 721 women were included. The screening mammograms with negative findings were randomly selected from women who participated in the prevalent screening round, matched by the age of the women with interval cancer, and drawn among women with negative findings also in the subsequent screening round. The cancers detected at screening were randomly selected and matched by the age of the woman and the tumor size of the interval cancer.

The mammograms were first reviewed independently and individually by all six radiologists (mixed blind individual review). Thereafter, the mammograms were reviewed by three independent pairs of radiologists (A.B. and L.E., H.B. and P.S., and N.B. and B.V.). Although all radiologists knew they were participating in a mixed blind review, they did not know the distribution formula of screening mammograms with negative findings, cancers that were detected at screening, and interval cancers.

In the individual design, all six radiologists reviewed two-view mammograms of both breasts from all 721 women. A person who did not take part in the review randomly divided the cases into six groups. No diagnostic information was available, and the mammograms were masked to keep the radiologists from knowing the identity of the patient and the breast clinic. The distribution formula of screening mammograms with negative findings, screening mammograms that showed detected cancers, and interval cancers was unknown to the radiologists. Equipment such as spotlights, magnifiers, and viewers was available. The readers were allowed 2 hours 30 minutes for the individual review of each of the six groups (mammograms from approximately 120 cases per group). The radiologists had to fill in a five-point interpretation scale, and they had to indicate the breast density and tumor location on forms made particularly for this study. The scales used for interpretation, mammographic density, and location were the same as those used for screening, which were described earlier.

In the mixed blind paired review, three pairs of two radiologists worked together on the day after the individual mixed blind review. The radiologists were paired randomly, and each radiologist was part of only one pair. In this review, only the cases about which the two radiologists disagreed in the individual review were reviewed. Disagreement meant that one of the radiologists assigned the mammogram a score of 1 (normal) and the other assigned the mammogram a score of 2 or higher (probably benign or more suspicious). An equal score by the two radiologists was considered a negative finding (a score of 1 assigned by both radiologists) or a positive finding (a score of 2 or higher assigned by both radiologists) if the given location was the same. The same data registered in the individual review (interpretation score, mammographic breast density, and location) were registered in the paired review.
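The disagreement rule that determined which cases went to paired review can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the string used for the location are hypothetical, and the source does not say how a case was handled when both radiologists assigned a score of 2 or higher but indicated different locations, so the sketch returns `None` there rather than guessing.

```python
from typing import Optional


def paired_review_needed(score_a: int, score_b: int) -> bool:
    """True when one reader scored 1 (normal) and the other scored 2 or higher."""
    return (score_a == 1) != (score_b == 1)


def agreed_finding(score_a: int, location_a: Optional[str],
                   score_b: int, location_b: Optional[str]) -> Optional[str]:
    """Outcome when the two individual readings agree, otherwise None."""
    if score_a == 1 and score_b == 1:
        return "negative"
    if score_a >= 2 and score_b >= 2 and location_a == location_b:
        return "positive"
    # Disagreement, or agreement on score but not on location
    # (the latter case is not specified in the source).
    return None
```

Under this rule a pair scoring a case 1 and 3 would review it together, whereas scores of 2 and 4 at the same location would already count as a positive finding.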

Informed Review
In the informed review, only mammograms from the interval cases (n = 231) were included. Two-view mammograms of both breasts from the screening session that preceded the diagnosis and two-view mammograms of both breasts obtained at diagnostic examination were used in the review. All mammograms (screening examination and diagnostic examination) from women with the 231 interval cancers were reviewed by all six radiologists on an individual basis (informed individual review) and in consensus (informed consensus review). These two designs were carried out 5 months after the mixed blind review. A form was used to collect information about the interpretation score, mammographic breast density, location, and radiographic characteristics from the screening mammograms. The radiographic characteristics were specified as in the screening examination, with five categories as described previously. Fifty minutes was allowed for the individual informed review on each rotator (approximately 40 interval cancers on each rotator), which meant that all six radiologists had six shifts on different rotators.

For the consensus review, all six radiologists worked together in one group (Fig 1) and discussed the findings for each of the 231 interval cancers. In addition to the previously described interpretation scale, an additional scale (described in the following section) was used to categorize interval cancers.

Interpretation Scales
A five-point interpretation scale (1 = normal, 2 = probably benign, 3 = indeterminate, 4 = probably malignant, or 5 = malignant) was used to reclassify the previously obtained mammograms of the interval cancer. This scale was used in all four reviews and is identical to that used in the pilot study. The score was regarded as correct (missed cancer) if the radiologist also had indicated a location of the tumor that corresponded to the location of the tumor of the diagnosed breast cancer. For the informed consensus review, an additional scale was used to classify the cancer into the following subgroups: missed cancer, minimal sign, true interval cancer, occult lesion, and technically unsatisfactory mammograms. A technically unsatisfactory mammogram is, for example, a mammogram without satisfactory imaging of the major pectoral muscle or a mammogram with too low a contrast. The decision about whether a mammogram was technically unsatisfactory was determined on a subjective basis by the radiologists.

Statistics
The numbers and percentages of missed cancers (score of 2–5) are presented with corresponding 95% confidence intervals (CIs) for each review design in the Table. The 95% CIs were calculated on the basis of binomial distribution with continuity correction, and the statistical tests were performed by using a two-sample test for equality of proportions with continuity correction. Corresponding figures are presented for the independent mixed blind paired review and the independent individual informed review. The exact number of missed cancers is provided for the informed consensus review. These calculations also were performed in subgroups of interval cancers diagnosed during the 1st (0–365 days) and 2nd (366–730 days) year after screening.
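For readers who want to experiment with interval estimates and comparisons of this kind, the following Python sketch may help. It rests on assumptions: the source analysis was run in SPSS and does not name the exact interval formula, so the sketch uses the exact (Clopper-Pearson) binomial interval and a pooled two-sample z test with Yates continuity correction as one common reading of the description. The function names are hypothetical, and the output is not expected to reproduce the published values to the last digit.

```python
from math import comb, erfc, sqrt
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float, hi: float) -> float:
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson confidence interval for x events among n cases."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, x / n)
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p), x / n, 1.0)
    return lower, upper


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided P value: pooled z test with Yates continuity correction."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both samples are all events or all non-events
    diff = abs(x1 / n1 - x2 / n2)
    corrected = max(diff - 0.5 * (1 / n1 + 1 / n2), 0.0)
    return erfc(corrected / se / sqrt(2))


# Example with counts from this study: 46 vs 83 missed cancers out of 231.
low, high = exact_ci(46, 231)
p_value = two_proportion_p(83, 231, 46, 231)
```

The bisection avoids any dependency on a statistics library; with SciPy or R available, their built-in interval and proportion-test routines would be the more usual choice.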


Numbers and Percentages of Missed Interval Cancers according to Review Design and Time after Screening

 
Missed cancer is also presented as the percentage of interval cancers assigned a score of 2 or higher (score of 2–5) by at least four radiologists. A minimal sign cancer was diagnosed if a score of 2 or higher was assigned by one to three radiologists, and a true interval cancer was diagnosed if none of the radiologists assigned the cancer a score of 2 or higher. An additional design was applied by requiring a score of 4 or higher (score of 4 or 5). A missed cancer was diagnosed if a score of 4 or higher was assigned by four or more radiologists, and a minimal sign cancer was diagnosed if a score of 4 or higher was assigned by one to three radiologists. If none of the radiologists assigned the cancer a score of 4 or higher, a true interval cancer was diagnosed.
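The reader-count rule above can be expressed as a small Python function. This is a sketch, not the study's software: the function name is hypothetical, and it assumes that each score passed in has already been checked for correct tumor location, with a mislocated reading entered as a score of 1.

```python
from typing import Sequence


def classify_interval_cancer(scores: Sequence[int], threshold: int = 2) -> str:
    """Classify one interval cancer from the six individual blinded scores.

    threshold=2 gives the main rule; threshold=4 gives the additional design.
    """
    positives = sum(1 for s in scores if s >= threshold)
    if positives >= 4:
        return "missed"
    if positives >= 1:
        return "minimal sign"
    return "true interval"
```

For the scores `[1, 2, 3, 2, 4, 1]`, four readers are at 2 or higher, so the case is a missed cancer at the threshold of 2, but only one reader is at 4 or higher, so it is a minimal sign cancer at the threshold of 4.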

Sensitivity was estimated by dividing the number of true-positive findings (cancers detected at screening) by the number of true-positive and false-negative cancers (interval cancers).
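As a worked illustration of this definition, the sketch below computes sensitivity from counts reported in this article. The function name is hypothetical; the counts are the 856 screening-detected cancers and the 247 interval cancers of the prevalent round.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity = TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one cancer is required")
    return true_positives / total


# All 247 interval cancers counted as false-negative findings.
print(f"{sensitivity(856, 247):.1%}")  # 77.6%
```

The same function applies when only cancers reclassified as missed are counted as false-negative findings, which raises the estimate because the denominator shrinks.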

Analyses were performed with software (SPSS for Windows, version 12.0.1; SPSS, Chicago, Ill).


    RESULTS
The mean and exact number and percentages of missed cancers (correctly selected cancers assigned a score of 2 or higher) with different review designs are shown in the Table. A mean of 46 of 231 interval cancers (19.9%; 95% CI: 15.0, 25.7) were reclassified as missed by the six radiologists in the retrospective independent mixed blind individual review, and a mean of 54 cancers (23.4%; 95% CI: 18.1, 29.4; P = .50) were reclassified as missed in the independent mixed blind paired review. The percentage of missed cancers was significantly higher in the independent informed individual review (83 of 231 cancers [35.9%]; 95% CI: 29.7, 42.5) and in the informed consensus review (78 of 231 cancers [33.8%]; 95% CI: 27.7, 40.3) than in the independent mixed blind individual review (P < .01 for both) (Table). When only the 53 interval cancers diagnosed during the 1st year of the study were analyzed, the percentages of missed interval cancer were 23% (12 of 53 cancers), 28% (15 of 53 cancers), 45% (24 of 53 cancers), and 40% (21 of 53 cancers) for the mixed blind individual, mixed blind paired, informed individual, and informed consensus reviews, respectively (Table). None of these estimates differed significantly from the corresponding estimate for the interval cancers in total (P > .05 for all four estimates). The percentages of missed interval cancers diagnosed with each review design during the 2nd year after screening (366–730 days) also did not differ significantly from those for the missed cancers diagnosed during the entire study period (P > .05 for all).

The percentages of missed cancers, minimal signs, and true interval cancers according to number of radiologists who assigned the interval cancer a score of 2 or higher and a score of 4 or higher in the independent individual mixed blind review are presented in Figure 2. When a score of 2 or higher (probably benign or more suspicious) was used as the threshold, a total of 39 cancers (16.9%; 95% CI: 12.3, 22.3) were reclassified as missed cancers, 43 cancers were reclassified as minimal sign cancers (18.6%; 95% CI: 13.8, 24.2), and 149 were reclassified as true interval cancers (64.5%; 95% CI: 58.0, 70.7). A threshold score of 4 or higher (probably malignant or malignant) resulted in three missed cancers (1.3%; 95% CI: 0.3, 3.7), 21 minimal sign cancers (9.1%; 95% CI: 5.7, 13.6), and 207 true interval cancers (89.6%; 95% CI: 84.9, 93.2).



Figure 2. Charts show subgroups of interval breast cancer in an independent mixed blinded individual review in the NBCSP according to score and number of radiologists. * = Probably benign, indeterminate, probably malignant, and malignant; {dagger} = probably malignant and malignant; dark gray area = missed cancer (selected by four, five, or six radiologists in an independent individual mixed blinded review); light gray area = minimal sign cancer (selected by one, two, or three radiologists in an independent individual mixed blinded review); and white area = true interval cancer (not selected by any radiologist in an independent individual mixed blinded review).

 
With the additional scale used only in the informed consensus review with all radiologists, 80 of 231 interval cancers (34.6%; 95% CI: 28.5, 41.2) were categorized as missed cancers, 53 (22.9%; 95% CI: 17.7, 28.9) were categorized as minimal sign cancers, 78 (33.8%; 95% CI: 27.7, 40.3) were categorized as true interval cancers, and 16 (6.9%; 95% CI: 4.0, 11.0) were categorized as occult cancers. In four cancers (1.7%; 95% CI: 0.5, 4.4), mammograms were determined to be technically unsatisfactory.

In the individual mixed blind review, a mean of 108 of 117 cancers detected at screening were assigned a score of 2 or higher by the six radiologists (true-positive findings). Nine cancers were not selected and, thus, were regarded as false-negative findings. This yields a sensitivity of the screening program of 92.3%, derived with the calculation of [108 ÷ (108 + 9)].

The sensitivity of the prevalent screening round in the NBCSP was 95.6%, derived with the calculation of [856 ÷ (856 + 39)] when cancers assigned a score of 2 or higher by four or more radiologists were regarded as missed cancers. The value was 91.5%, derived with the calculation of [856 ÷ (856 + 80)] when the number of missed interval cancers in the informed consensus review was regarded as a false-negative finding.


    DISCUSSION
In this retrospective multireader review of interval breast cancers in the pilot study of the NBCSP, the percentage of missed cancers varied from 1.3% to 35.9% according to review design.

The quality of the radiologist's interpretations of mammograms is important to minimize the rate of occurrence of interval cancer and, thus, achieve a tolerable sensitivity for the screening program. The necessity of viewing a large number of images to detect a relatively small number of cancers, the complex radiographic structure of the breast, the subtle nature of many mammographic characteristics of early breast cancer, and the radiologist's fatigue or distraction in a screening situation are all factors that influence interpretation (17). Experience and exercises for rereading of mammograms may contribute to a reduced rate of false-negative findings. A review process is a subjective procedure, and the radiologist's reactions to the review situation will probably influence the results of such a procedure (18).

The results of this study confirm that a higher percentage of missed interval cancers is obtained in the informed review than is obtained in the mixed blind review (10,12,13). The various mixing formulas used for mammograms in mixed reviews have been shown to be of clinical importance (11,12). Somewhat varying results in the studies are probably due to aspects of the definition of an interval cancer (inclusion criteria) and the review protocol. The screening interval and extent of screening at private clinics also affect the number and characteristics of the interval cancers and, subsequently, the percentage distribution of the subgroups; however, it is stated that the occurrence rate for interval cancer in Norway is somewhat high (15), according to the European guidelines (5). In this study, the percentages of missed interval cancers diagnosed during the 1st and 2nd year after screening did not differ significantly from that for the entire study period. This indicates that the results may be transferable to screening programs with different intervals between two screening examinations.

Different requirements for defining a cancer as correctly selected or missed complicate the issue further; some studies accept selection of the cancers independent of tumor site, some studies require the correct breast, and others require an accurate quadrant or exact lesion location (18). For this review, we required an exact specification of the breast and quadrant of the malignant lesion to be regarded as correctly selected. The consequence is probably that the percentage of missed cancers is reduced and the percentage of true interval cancers is increased, compared with the requirement of selecting the correct breast. This topic is challenging and requires radiographic understanding of the screening process and substantial knowledge about tumor growth.

The higher percentage of missed cancers in the informed review compared with the blind review is as expected because of the available information about the exact tumor location. Thus, the percentage of missed cancers in an informed setting is less informative. Nevertheless, an informed review is of great value for quality assurance of the radiologist's performance, for comparing the results from a blind review with those from an informed review, and as a part of the continuing education for screening radiologists (9,18).

The five-point interpretation scale used in this study and in the NBCSP is somewhat different from the Breast Imaging Reporting and Data System (BI-RADS) (19) used in the United States. BI-RADS has a category of "probably benign finding – initial short-interval follow-up" (category 3) that is not used in Norway. A screening examination interpreted as BI-RADS category 3 would probably be regarded as suspicious for cancer according to the Norwegian system (category 2 or 3). The threshold of suspicion that leads to recall of the patient is lower in the United States because of the different medicolegal environment; in addition, the overall recall rate is lower in Norway than it is in the United States (15,20). A possible reason for the lower recall rate could be that the consensus meeting held by the radiologists before a final decision for a recall is striving to reach the recommendations of the program (3%–5%). Nevertheless, the low recall rate and the restriction in follow-up time probably have an influence on the occurrence rate of interval cancer.

Interval cancers are expected to have less favorable prognostic and predictive tumor characteristics than screening-detected cancers. This has been shown for the interval cancers included in this study (16) and indicates the importance of keeping the interval cancer occurrence rate as low as possible in order to achieve a high sensitivity of the screening program and thus improve the possibilities for reducing mortality from the disease. In this study, we did not include analysis of the histologic or radiographic tumor characteristics according to subgroup of interval cancer.

Material from a population-based screening program and four different review designs used for performance of evaluations by the same six experienced radiologists at two dedicated sessions are the strengths of this study. The facts that all interval cancer cases were diagnosed in the same time period, subsequent to the prevalent screening round in a population-based screening program, and that we had a stable staff of radiologists, equal technical equipment, and similar reading conditions during the entire screening period contribute further to a useful basis for this study. Even though the number of interval cancers (n = 231) detected by all six radiologists seems to be satisfactory for estimating the entire group of interval cancers, it is somewhat low for estimations in subgroups.

Our study had some limitations. We tested only whether the women received a recommendation for work-up. A score of 2 or higher on the five-point interpretation scale was regarded as an indication that work-up was warranted in the review, but we do not know whether the work-up would have led to a biopsy and diagnosis of a breast cancer. If all women who were called back, or selected, had a cancer, the positive predictive value would have been 100%, compared with the rate of approximately 15% in the pilot study. If the review accuracy were applied in a screening setting, the recall rate would rise, with more false-positive findings than were observed and, hence, higher expenses.

In addition, the retrospective detection rate was likely affected because the review setting differed from that of usual screening (9,12). The mixing of mammograms in a review ideally should be similar to that in a screening setting; however, for reasons of efficiency, this was not feasible. In the mixed blind part of this study, the distribution ratio of the number of cancer cases to cases with negative findings was 1:1.1 (348 vs 373 cases); the ratio in the screening situation was approximately 1:167 (six of 1000 cases). If this review had been conducted as a breast-based study (each breast counts as one, ie, one woman usually counts as two), the number of cancers included in the blind review would have resulted in a distribution ratio of 1:3.1 (348 breasts with cancer vs 1094 breasts with negative findings at screening). The figures illustrate the difficulty of getting a screeninglike distribution of mammograms in a review if an acceptable number of interval cancers is included.
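The distribution ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study's analysis. The function name is hypothetical, and the breakdown of the 1094 negative breasts (two per negative case plus one contralateral breast per cancer case) is an assumption that happens to reproduce the stated figure.

```python
def negatives_per_cancer(cancers: int, negatives: float) -> float:
    """Return x in the ratio 1:x of cancer cases to negative cases."""
    return negatives / cancers


# Woman-based mixed blind review: 348 cancer cases and 373 negative cases.
woman_based = negatives_per_cancer(348, 373)

# Breast-based reading. Assumption: each cancer case has one affected breast,
# so its contralateral breast is added to the two breasts of each negative case.
negative_breasts = 2 * 373 + 348
breast_based = negatives_per_cancer(348, negative_breasts)

# Screening situation: about six cancers per 1000 cases, taken as 1000 / 6
# to match the approximate ratio given in the text.
screening = negatives_per_cancer(6, 1000)

print(f"woman-based review  1:{woman_based:.1f}")
print(f"breast-based review 1:{breast_based:.1f}")
print(f"screening           1:{screening:.0f}")
```

Under these assumptions the cancer cases in the review are roughly 150 times more concentrated than in screening, which is the mismatch the paragraph describes.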

The differences are likely to be reflected in the sensitivity of the program and the review. The sensitivity was 77.6% in the pilot study and 93.2% in the mixed blind review. The sensitivity of a screening program is influenced by the number of views, image quality, single reading versus independent double reading, availability of previously obtained mammograms, and recall rate (21–23). The pilot study was performed with a high quality standard, with two-view mammography and independent reading by two radiologists. Patients were recalled according to recommendations in the European guidelines; thus, our recall rate was lower than the usual rate in the United States (5,20). The 2-year screening interval is also somewhat different from the usual interval in the United States (21) and probably is the main reason for the high number of interval cancers. Annual screening would probably decrease the interval cancer occurrence rate to approximately half, thus increasing the sensitivity of the program. All these factors must be taken into account when one considers the results. There are various approaches, however, for calculating the sensitivity (24). Inclusion of only the number of interval cases selected with a score of 2 or higher by four or more radiologists in the mixed blind individual review as false-negative findings increased the sensitivity in the pilot study from 77.6% to 95.6%. If only the missed cancers in the informed consensus review were regarded as false-negative findings, the sensitivity was estimated to be 91.5%. The figures illustrate how difficult it is to compare the sensitivity in screening programs.
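A short calculation shows how the estimated sensitivity depends on which interval cancers are counted as false-negative findings. This sketch assumes that sensitivity is the number of screening-detected cancers divided by the sum of screening-detected cancers and false-negative findings. The function name and all counts are hypothetical, chosen only for illustration. They are not data from this study.

```python
def sensitivity(screen_detected: int, false_negatives: int) -> float:
    """Sensitivity = detected / (detected + false negatives)."""
    return screen_detected / (screen_detected + false_negatives)


# Hypothetical counts, for illustration only (not taken from the study).
screen_detected = 700   # cancers detected at screening
interval_cancers = 200  # all cancers diagnosed between screening rounds
missed_at_review = 40   # interval cancers classified as missed at review

# Definition A: every interval cancer counts as a false-negative finding.
all_interval = sensitivity(screen_detected, interval_cancers)

# Definition B: only interval cancers classified as missed at review count.
missed_only = sensitivity(screen_detected, missed_at_review)

print(f"all interval cancers as false negatives: {all_interval:.1%}")
print(f"only missed cancers as false negatives:  {missed_only:.1%}")
```

The number of screening-detected cancers is the same in both calculations; only the definition of a false-negative finding changes. This is why sensitivities reported by different programs are comparable only when that definition is stated.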

Another weakness of a mixed blind review is that a reclassified interval cancer can also be missed. Missing a cancer in a screening setting is an inherent part of a screening program and leads to an interval cancer or a delayed diagnosis of a cancer detected at screening. A cancer that is missed in a mixed blind review weakens the estimates and leads to underestimation of the percentages in the subgroups.

The percentage of missed cancers varied from 1.3% to 35.9% according to review design in this retrospective multireader study of interval breast cancers. The review of interval cancers is important for quality assurance in a population-based screening program, as well as for radiologic learning. Results of this study emphasize the substantial effect of different designs in a review protocol and encourage both mixed blind individual and informed consensus reviews to optimize the possibilities for radiologic learning.


    ACKNOWLEDGMENTS
 
We are indebted to the medical coders Eva-Lisa Piiksi Dahli and Anka Ertzaas, who both work at the Cancer Registry of Norway, for valuable practical assistance in collecting the mammograms and carrying out the two weekends of reviews.


    FOOTNOTES
 

Abbreviations: CI = confidence interval • NBCSP = Norwegian Breast Cancer Screening Program

Authors stated no financial relationship to disclose.

Author contributions: Guarantors of integrity of entire study, S.H., S.T., N.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, S.H., P.S., B.V., N.B.; clinical studies, L.E., N.B., S.H., B.V., H.B.; statistical analysis, S.H., N.B.; and manuscript editing, S.H., P.S., B.V., H.W., S.T., H.B., A.B., N.B.


    References

  1. Vainio H, Bianchini F, eds. IARC handbook of cancer prevention. Vol 7, Breast cancer screening. Lyon, France: International Agency for Research on Cancer, 2002.
  2. Nystrom L, Andersson I, Bjurstam N, Frisell J, Nordenskjold B, Rutqvist LE. Long-term effects of mammography screening: updated overview of the Swedish randomized trials. Lancet 2002;359:909–919.
  3. Tabar L, Yen MF, Vitak B, Chen HH, Smith RA, Duffy SW. Mammography service screening and mortality in breast cancer patients: 20-year follow-up before and after introduction of screening. Lancet 2003;361:1405–1410.
  4. Otto SJ, Fracheboud J, Looman CW, et al. Initiation of population-based mammography screening in Dutch municipalities and effect on breast-cancer mortality: a systematic review. Lancet 2003;361:1411–1417.
  5. Perry N, Broeders M, deWolf C, Tornberg S. European guidelines for quality assurance in mammography screening. Luxembourg Grand Duchy: Office for Official Publications of the European Communities, 2001.
  6. Tabar L, Fagerberg G, Day NE, Holmberg L. What is the optimum interval between mammographic screening examinations? an analysis based on the latest results of the Swedish two-county breast cancer screening trial. Br J Cancer 1987;55:547–551.
  7. Vitak B. Invasive interval cancers in the Ostergotland Mammographic Screening Programme: radiological analysis. Eur Radiol 1998;8:639–646.
  8. Day N, McCann J, Camilleri-Ferrante C, et al. Monitoring interval cancers in breast screening programmes: the East Anglian experience. Quality Assurance Management Group of the East Anglian Breast Screening Programme. J Med Screen 1995;2:180–185.
  9. Gower-Thomas K, Fielder HM, Branston L, Greening S, Beer H, Rogers C. Reviewing interval cancers: time well spent? Clin Radiol 2002;57:384–388.
  10. Fracheboud J, de Koning HJ, Beemsterboer PM, et al. Interval cancers in the Dutch breast cancer screening programme. Br J Cancer 1999;81:912–917.
  11. Duncan AA, Wallis MG. Classifying interval cancers. Clin Radiol 1995;50:774–777.
  12. Moberg K, Grundstrom H, Tornberg S, et al. Two models for radiological reviewing of interval cancers. J Med Screen 1999;6:35–39.
  13. de Rijke JM, Schouten LJ, Schreutelkamp JL, Jochem I, Verbeek AL. A blind review and an informed review of interval breast cases in the Limburg screening programme, the Netherlands. J Med Screen 2000;7:19–23.
  14. Wang H, Karesen R, Hervik A, Thoresen SO. Mammography screening in Norway: results from the first screening round in four counties and cost-effectiveness of a modeled nationwide screening. Cancer Causes Control 2001;12:39–45.
  15. Hofvind S, Wang H, Thoresen S. Do the results of the process indicators in the Norwegian Breast Cancer Screening Program predict future mortality reduction from breast cancer? Acta Oncol 2004;43:467–473.
  16. Wang H, Bjurstam N, Bjørndal H, et al. Interval cancers in the Norwegian breast cancer screening program: frequency, characteristics and use of HRT. Int J Cancer 2001;94:594–598.
  17. Warren Burhenne LJ, Wood SA, D'Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–562.
  18. Moberg K. Incidence and interval cancers in retrospective assessment [doctoral thesis]. Stockholm, Sweden: Stockholm Söder Hospital Karolinska Institutet, 2003.
  19. ACR BI-RADS Committee. BI-RADS Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 2003.
  20. Smith-Bindman R, Chu PW, Miglioretti DL, et al. Comparison of screening mammography in the United States and the United Kingdom. JAMA 2003;290:2129–2137.
  21. Thurfjell EL, Lernevall KA, Taube AA. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994;191:241–244.
  22. Yankaskas BC, Cleveland RJ, Schell MJ, Kozar R. Association of recall rates with sensitivity and positive predictive value of screening mammography. AJR Am J Roentgenol 2001;177:543–549.
  23. Taplin SH, Rutter CM, Finder C, Mandelson MT, Houn F, White E. Screening mammography: clinical image quality and the risk of interval breast cancer. AJR Am J Roentgenol 2002;178:797–803.
  24. Kopans DB. The positive predictive value of mammography. AJR Am J Roentgenol 1992;158:521–526.



