Breast Imaging
1 From the Pendergrass Laboratory (C.F.N., C.M.T., H.L.K.) and Department of Breast Imaging (S.P.W., E.F.C., R.E.H.S., S.E.R., J.A.B.), University of Pennsylvania, Philadelphia. From the 2000 RSNA scientific assembly. Received September 8, 2000; revision requested October 26; final revision received January 31, 2001; accepted February 15. C.F.N. supported in part by DAMD17-97-1-7130 grant. Address correspondence to C.F.N., Department of Radiology, 3600 Market St, Suite 370, Rm 103, Philadelphia, PA 19104-2647 (e-mail: nodine@oasis.rad.upenn.edu).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Four experienced mammographers performed a blinded review of a test set of 20 retrospective cases where the cancer was not detected until the next mammographic evaluation, 10 prospective cases where the cancer was initially detected, and 10 cancer-free cases. Two views were digitized and displayed on a workstation. The experiment consisted of an initial impression, during which eye position was monitored, and a final impression, during which viewers zoomed on regions of interest and localized suspicious lesions. Eye-position data were analyzed to determine whether retrospectively visible cancers attracted attention to the same degree as prospectively visible cancers. The initial impression used 1,000 msec as the eye-fixation dwell criterion for detecting a lesion.
RESULTS: Initially, 70% of retrospective cancers and 50% of prospective cancers did not attract prolonged visual attention. In prospective cases, detailed examination significantly improved the mean receiver operating characteristic area, from .73 to .88 (P < .01), but in retrospective cases, the mean receiver operating characteristic area barely increased, from .60 to .68, due to a high true-positive-to-false-positive ratio.
CONCLUSION: At blinded review, detection of retrospectively visible cancers was significantly inferior to that of prospective cancers. It cannot be assumed that retrospectively identified cancers are intrinsically detectable, because they do not draw prolonged visual attention during visual search for breast cancers.
Index terms: Breast neoplasms, diagnosis, 00.125, 00.32 Cancer screening, 00.125, 00.32 Diagnostic radiology, observer performance, 00.125 Radiology and radiologists
| INTRODUCTION |
|---|
The use of retrospective review to determine whether a cancer is missed is like "Monday morning quarterbacking" because of perceptual bias from a priori knowledge. When the prior study is reviewed, the viewer knows that a cancer is definitely present and where it is located. The cancer may have been missed at the prior examination because of faulty perception or inappropriate application of decision criteria. Alternatively, the cancer may have been missed because the features of the cancer were so ambiguous that it was not detectable without the use of a priori knowledge. The review bias can be overcome by performing a blinded reading in which the reader is uncertain about both the location and the presence of a cancer.
The experiment reported here was performed to determine what proportion of cancers that were missed on a prior mammogram reviewed after the cancer was detected were not intrinsically detectable or could have been detected initially. These are called retrospective cases. Cases with cancers that were detected initially, called prospective cases, were included to estimate the true-positive fraction of cases known to be intrinsically detectable. Cancer-free cases were included to estimate the false-positive fraction.
If the cancers in retrospective cases are detected in a blinded review, then we may assume that they were intrinsically detectable but were missed initially. If not, then they were visible retrospectively but not intrinsically detectable.
Whether retrospectively identified cancers can reliably be detected in a blinded review has been studied before in various guises (3–5) but never examined with the benefit of eye-position data to determine whether such cancers attract visual attention. Eye-position data are used to determine what on the radiograph attracts visual attention by measuring the visual dwell, which is the cumulative duration of fixations clustering within a 3.2-cm-diameter region of interest on the mammographic display. We have found that for both lung nodules and breast lesions, the visual dwell on false-negative decisions is almost as long as the dwell on true- and false-positive decisions (6,7). This suggests that unreported lesions are visually processed. The purpose of this study was to determine whether unreported retrospectively identified cancers on mammograms receive prolonged visual attention and can reliably be detected in a blinded review.
| MATERIALS AND METHODS |
|---|
An experienced mammographer, who did not participate as a viewer, selected the cases from digitized images. All of the normal cases were those of patients who were disease free for at least 2 years and were classified in BI-RADS (Breast Imaging Reporting and Data System) category 1 or 2. Each case consisted of a craniocaudal (CC) and mediolateral oblique (MLO) view. The 20 cancer cases that went unreported on the mammogram obtained immediately prior to the detection had a mean interval between mammographic examinations of 14.6 months (range, 6–29 months). The distribution of major findings is shown in Table 1.
Digitizing and Displaying the Mammograms
All of the cases were digitized to 50-µm pixel resolution with a digitizer (Lumiscan Model 100; Lumisys, Sunnyvale, Calif). The gray scale for each view was set automatically by using an algorithm in which the skin line was identified and the intensity range within it sampled. The sampled intensity values were used to define a look-up table for each breast image that made full use of the gray-scale range of the display monitor. The automatic gray-scale algorithm was used to save time and avoid interruptions in the experimental procedure. The two-view mammograms were displayed side by side on a 21-inch high-spatial-resolution (2,560 × 2,048) landscape monitor (DS5000L; Clinton Electronics, Rockford, Ill), as shown in Figure 1. The 50-µm images were displayed at 127 µm.
Viewers and Reading Procedure
Four full-time fellowship-trained mammographers blinded to the clinical diagnostic outcomes of the cases were the viewers. Their recorded average case-reading experience during the year prior to their participation was 3,035 mammographic cases (range, 1,109–4,041 cases). The viewers were familiar with the method of displaying the digitized mammograms and with interacting at the workstation. They were told that they would be shown a combination of subtle-lesion cases and normal cases. Informed consent was obtained from all patients in compliance with a protocol approved by the University of Pennsylvania institutional review board. The cases were read in random order. Viewers were given unlimited viewing time and instructed to scan the CC and MLO views for an initial impression and then, before making a final decision, examine the mammogram in detail by using a zoom-and-rove window. This is analogous to reading mammograms by first viewing them for an overall impression and then looking at image detail with a magnifying lens.
Eye position was monitored during the initial viewing. The viewer terminated this phase by activating a pull-down window and indicating the initial impression of the case as normal or abnormal. No confidence rating was requested because we did not want to interrupt the perception and decision-making processes by forcing the viewer into a detailed interpretation. Eye-position monitoring was terminated as soon as the viewer pulled down the menu.
During the final impression, the viewer could zoom on a region of interest by moving a mouse-controlled cursor to it and clicking on a point on the image. The x and y cursor locations were identified on the image and recorded in the viewer's data file with eye-position coordinates and decision responses and confidences. Each click opened a zoom window that displayed that part of the mammogram at full 50-µm image resolution. The digital zoom window was 5.8 × 5.8 cm and covered about 30% of one image (CC or MLO view).
A malignant lesion that was either seen during the initial impression or newly discovered during the final impression was reported by clicking on the image location. This called up a report menu in which the viewer indicated the type of lesion (that is, mass, microcalcification, or architectural distortion) and gave a rating of decision confidence as high, medium, or low. The x and y coordinates of the lesion location and the decision confidence were recorded and stored in the viewer's data files.
Eye-Position Recording
The eye position was recorded with an eye-head tracker (Model 4000SU; Applied Science Laboratories, Bedford, Mass). The viewer wore a headband containing the eye-head tracking system, as shown in Figure 1.
Eye-position data were acquired with a computer that coordinated eye and head movements, analyzed them into fixations, and related the x and y coordinates of each fixation to locations on the mammographic display. In our study, a fixation was a pause in the eye movement that typically lasted 100–200 msec. The eye movement scanning pattern over the mammogram was translated into a sequence of fixations connected by lines and referred to as a scan path. Fixations within 1.6 cm of each other within the scan path were grouped into fixation clusters that typically lasted 500 msec (median). These fixation clusters are long enough to be regarded as centers of focal attention (9). The duration of a fixation cluster provides a direct estimate of the time spent by a viewer processing a region of interest on a mammogram. We refer to the cumulative fixation duration of a cluster as dwell. Details of the eye-position recording and analysis have been published previously (10).
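The grouping of fixations into clusters and the accumulation of dwell can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the greedy centroid rule, the `Cluster` class, and the function name `cluster_fixations` are hypothetical and are not the published algorithm (10); only the 1.6-cm radius and the definition of dwell as cumulative fixation duration come from the text.

```python
from dataclasses import dataclass, field
from math import hypot

RADIUS_CM = 1.6  # grouping radius from the text (a 3.2-cm-diameter region)


@dataclass
class Cluster:
    xs: list = field(default_factory=list)
    ys: list = field(default_factory=list)
    dwell_msec: float = 0.0  # cumulative fixation duration of the cluster

    @property
    def center(self):
        # Running centroid of the fixations assigned so far.
        return (sum(self.xs) / len(self.xs), sum(self.ys) / len(self.ys))


def cluster_fixations(fixations, radius_cm=RADIUS_CM):
    """Group (x_cm, y_cm, duration_msec) fixations into clusters.

    Assumption: a fixation joins the first existing cluster whose running
    centroid lies within radius_cm; otherwise it starts a new cluster.
    """
    clusters = []
    for x, y, duration in fixations:
        for c in clusters:
            cx, cy = c.center
            if hypot(x - cx, y - cy) <= radius_cm:
                break  # c is the matching cluster
        else:
            c = Cluster()  # no cluster was close enough
            clusters.append(c)
        c.xs.append(x)
        c.ys.append(y)
        c.dwell_msec += duration
    return clusters
```

Because dwell is cumulative, a region that the viewer leaves and later revisits keeps adding to the same cluster in this sketch.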
Analysis of the Viewer Responses
For the initial impression, eye-position data were used to identify on the mammogram the regions of interest that elicited decisions. A true-positive case was defined as that having an eye-fixation cluster with a dwell greater than 1,000 msec falling within 1.6 cm of the center of a true lesion. A false-positive case was defined as an eye-fixation cluster with dwell greater than 1,000 msec falling on a cancer-free case. The false-positive proportion was calculated by dividing the number of fixation clusters with dwell greater than 1,000 msec by the total number of fixation clusters per case per viewer.
The dwell threshold of 1,000 msec has been used in a number of previous studies (11). A fixation cluster with a dwell greater than 1,000 msec was defined as having prolonged attention because it exceeded twice the absolute deviation (176 msec) of the median duration of a fixation cluster (500 msec). Hillstrom (12) found that the mean response time for searching and identifying a target defined by a conjunction of features (color and orientation of bars, which is a much simpler task than searching and identifying a breast lesion) ranged from 800 to 1,000 msec.
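The initial-impression scoring rules can be written out as a short Python sketch. The function names and the tuple layout for clusters are hypothetical; the 1,000-msec dwell criterion and the 1.6-cm distance criterion are taken from the text. Coordinates are assumed to be in centimeters in a single display frame, which is a simplification.

```python
from math import hypot

DWELL_MSEC = 1000.0  # prolonged-attention threshold from the text
RADIUS_CM = 1.6      # maximum distance from the true-lesion center


def is_true_positive(clusters, lesion_centers):
    """True if any cluster with dwell > 1,000 msec falls within 1.6 cm
    of the center of a true lesion.

    clusters: iterable of (x_cm, y_cm, dwell_msec)
    lesion_centers: iterable of (x_cm, y_cm)
    """
    return any(
        dwell > DWELL_MSEC and hypot(x - lx, y - ly) <= RADIUS_CM
        for x, y, dwell in clusters
        for lx, ly in lesion_centers
    )


def false_positive_proportion(clusters):
    """Share of clusters on a cancer-free case with dwell > 1,000 msec."""
    clusters = list(clusters)
    if not clusters:
        return 0.0
    prolonged = sum(1 for _, _, dwell in clusters if dwell > DWELL_MSEC)
    return prolonged / len(clusters)
```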
For the final impression, a true-positive case was scored when the viewer clicked the cursor within 1.6 cm of the true-lesion center, localized a lesion, classified it as malignant, and gave it a confidence rating. For a case to be scored as true-positive, viewers had to localize at least one true malignant lesion on either view of a study and give it a rating. The type of lesion did not enter into the scoring. A "wrong lesion" was scored when the viewer failed to localize a true malignant lesion for the case and clicked the cursor on a region that was beyond 1.6 cm of a true lesion and gave it a rating. Only the highest rating per case was scored. A false-positive case was scored when the viewer clicked the cursor localizing a lesion in a nonmalignant case, classified it falsely as malignant, and gave it a confidence rating.
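The final-impression scoring rules above can be summarized as a small Python sketch. The function `score_case`, the category labels, and the numeric coding of confidence are hypothetical choices for illustration; the 1.6-cm criterion, the wrong-lesion rule, and the use of the highest rating per case follow the text. Lesion type is ignored, as in the scoring described.

```python
from math import hypot

RADIUS_CM = 1.6
CONFIDENCE = {"low": 1, "medium": 2, "high": 3}  # 0 is the default "normal"


def score_case(lesion_centers, reports):
    """Score one case at final impression.

    lesion_centers: [(x_cm, y_cm)] of true malignant lesions on either view
                    (empty list for a cancer-free case).
    reports: [(x_cm, y_cm, confidence)] clicked by the viewer.
    Returns (category, rating).
    """
    if not reports:
        return ("false-negative" if lesion_centers else "true-negative", 0)
    if not lesion_centers:
        return ("false-positive", max(CONFIDENCE[c] for _, _, c in reports))
    # Ratings of reports that land within 1.6 cm of a true-lesion center.
    hits = [
        CONFIDENCE[c]
        for x, y, c in reports
        if any(hypot(x - lx, y - ly) <= RADIUS_CM for lx, ly in lesion_centers)
    ]
    if hits:
        return ("true-positive", max(hits))
    return ("wrong-lesion", max(CONFIDENCE[c] for _, _, c in reports))
```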
The responses were analyzed by using the location-response operating characteristic (LROC) curve (13). The number of highest-confidence positive reports per case divided by the total number of cases in the cancer group and in the cancer-free group provides the true-positive fraction and false-positive fraction, respectively. There are at least three different ways to define the total number of decisions in each group. One can count cases, or individual images (CC and MLO), or total lesions. Cases were chosen for the LROC analysis because the initial impression was given on a case basis.
The area under the receiver operating characteristic (ROC) curve, Az, was used to compare the performance on the initial impression and that at final decision. The Az value at the final decision was calculated from the 2 × 4 table (malignant vs nonmalignant by high-, medium-, or low-decision confidence plus default "normal") by using the ROCFIT computer program (14). The Az value of the initial impression was based on eye-position data that followed the rules described earlier to estimate the true-positive fraction (TPF) and the false-positive fraction (FPF) as follows:

$$A_z = \Phi\left(\frac{d_a}{\sqrt{2}}\right), \qquad d_a = \sqrt{\frac{2}{1 + s^2}}\,\bigl[z(\mathrm{TPF}) - s\,z(\mathrm{FPF})\bigr],$$

where $\Phi$ is the cumulative normal distribution function, $d_a$ is an index of detectability, $s$ is the slope of the ROC curve, and $z$ is the inverse of the cumulative normal distribution function (15). The actual slope is unknown but was estimated by using a slope of 1 for the final decision. The Az value is based on the assumption of binormality and is derived from the $\Phi$ transformation of $d_a$, as just shown.
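As a worked illustration of the single-point binormal estimate, the following Python sketch computes Az from one true-positive fraction and one false-positive fraction with an assumed slope of 1. The function name `az_from_point` is hypothetical, and the false-positive fraction of 0.19 used in the example is an assumed value chosen for illustration, not a figure quoted from the tables.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def az_from_point(tpf, fpf, slope=1.0):
    """Area under a binormal ROC curve through a single (FPF, TPF) point.

    d_a = sqrt(2 / (1 + s^2)) * (z(TPF) - s * z(FPF)),  Az = Phi(d_a / sqrt(2)).
    tpf and fpf must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    z_tpf = _STD_NORMAL.inv_cdf(tpf)
    z_fpf = _STD_NORMAL.inv_cdf(fpf)
    d_a = sqrt(2.0 / (1.0 + slope ** 2)) * (z_tpf - slope * z_fpf)
    return _STD_NORMAL.cdf(d_a / sqrt(2.0))


# Assumed common false-positive fraction of 0.19 (illustrative only).
print(round(az_from_point(0.30, 0.19), 2))  # true-positive fraction .30
print(round(az_from_point(0.50, 0.19), 2))  # true-positive fraction .50
```

With these assumed inputs the two estimates come out near .60 and .73, which shows how a shared false-positive fraction lets the difference in Az be driven entirely by the true-positive fraction.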
For the final impression, the highest level of confidence on a correctly localized and classified lesion on either the CC or MLO view was entered into the malignant row of a 3 x 4 table for the LROC analysis, and the highest confidence in either the CC or MLO view of a cancer-free case was entered into the nonmalignant row of the table. If a lesion was not correctly localized on either the CC or MLO view of a cancer case and incorrect lesions were localized, the highest confidence rating was entered into the row containing the wrong-lesion category (16).
The LROC procedure eliminates the benign category and allows the readers to decide only whether a malignant lesion is present in each case and rate the degree of confidence in the decision as low, medium, or high. This differs from the BI-RADS scoring, which, for our study, could be considered to be a four-level rating scale where BI-RADS categories 1 and 2 are used as one level when rating the decision confidence of a case as cancer-free and BI-RADS categories 3–5 are used as three levels when rating the decision confidence of a case as cancer containing. But this use of the BI-RADS rating scheme ignores the clinical distinction between normal, benign, and malignant lesions, which makes the BI-RADS a multidimensional classification scheme. This multidimensional scheme is incompatible with the proper use of LROC, which calls for decision-confidence ratings to be based on a single binary dimension such as cancer free versus cancer containing or malignant versus nonmalignant.
To achieve maximum statistical power and efficiency, we decided on a repeated-measures experimental design in which readers read both case types, retrospective and prospective, with the aim of testing differences in case types within readers. As part of our attempt to maximize efficiency in using readers' time, we used the same cancer-free cases to estimate the false-positive fraction for both case types. However, this modification violated the usual assumption of independence. Accordingly, we chose to be conservative in comparing areas and proportions of ROC and LROC performance by testing differences in case types between readers by using the unpaired t test. Further, because proportions are not normally distributed, we transformed them by using probits prior to the analysis.
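The probit transformation followed by an unpaired t test can be sketched as follows. This is a minimal illustration that assumes the pooled-variance (Student) form of the unpaired test, which the text does not specify; the function names are hypothetical, and only the t statistic and degrees of freedom are returned because the Python standard library has no t distribution for the P value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def probit(p):
    """Inverse standard-normal CDF; p must lie strictly between 0 and 1."""
    return NormalDist().inv_cdf(p)


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (an assumption).

    Returns (t, degrees_of_freedom). Each group needs at least two values
    and the pooled variance must be nonzero.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


def probit_unpaired_t(proportions_a, proportions_b):
    """Probit-transform per-reader proportions, then compare the groups."""
    return unpaired_t(
        [probit(p) for p in proportions_a],
        [probit(p) for p in proportions_b],
    )
```

With four readers per case type, a call such as `probit_unpaired_t(retrospective, prospective)` would return a statistic on 6 degrees of freedom, to be referred to a t table.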
Analysis of the Ability to Correctly Identify Lesions
In addition to decision performance at the case level, we determined how accurately the viewers localized and interpreted lesions within each case. A lesion analysis was used in the present study because it enabled us to identify and relate eye-position data associated with overall decisions during initial impression to correctly localized cancers in the final impression after zooming. The proportion of correctly localized cancers out of all the positive reports was used as a measure of lesion-based performance. A lesion analysis is relevant because mammographers are required to localize suspicious lesions to carry out follow-up studies such as magnification views, spot compression views, ultrasonography (US), and biopsy and thus determine whether a suspicious lesion is malignant or benign.
| RESULTS |
|---|
There were an average of 30 fixation clusters (range, 10–64 clusters) per case. In the present study, 52% of fixation clusters had dwell greater than 1,000 msec among all positive decisions (true-positive and false-positive), and only 19% of fixation clusters had dwell greater than 1,000 msec among all negative decisions (true-negative and false-negative).
Table 2 shows the yield of case decisions resulting from an initial-impression report of abnormal cases, defined as true-positive, in which viewers fixated on at least one true lesion for more than 1,000 msec within 1.6 cm of a true lesion, and report of the proportion of fixations greater than 1,000 msec on nonlesion areas of normal cases, which was defined as the false-positive rate. On average, for retrospective cases, only 30% of true lesions were fixated. Since the false-positive rates were the same, the Az value for the retrospective cases was .60, and for the prospective cases, .73.
The average number of zooms per case, regardless of initial decision, was seven (range, 0–15). There were no differences between retrospective, prospective, and cancer-free cases when tested with repeated-measures analyses of variance, although more zooms occurred in cancer-free cases.
Zooms were related to fixations generated during the initial phase. For retrospective cases, 71% (77 of 109) of zooms across all four readers fell within 1.6 cm of locations in common with eye-fixation clusters. For prospective cases, 76% (130 of 172) of zooms fell in common with eye-fixation clusters, and for cancer-free cases, 85% (11 of 13) of zooms fell in common with eye-fixation clusters. For retrospective cases, 55% (36 of 66) of first zooms resulted in true-positive cases; for prospective cases, 48% (24 of 50) of first zooms resulted in true-positive cases.
Detailed examination of mammograms with zooming led to the discovery of true-cancer cases, detected by reporting and correctly localizing them at final impression, that were not detected with the eye-position analysis of dwell, as shown in Table 2. For retrospective cases, eight of 24 (proportion, .33) new true-cancer cases were detected, and for prospective cases, 11 of 20 (proportion, .55) new true-cancer cases were detected. But the increase in the discovery of true cancer in retrospective cases was associated with an increase in localization and false classification of lesions in cancer-free cases.
Lesion Analysis
The effects of fixating and zooming become clearer when a lesion analysis, rather than a case analysis, is applied. Table 4 shows the proportion of true-positive cancers that were classified correctly out of the total number of positive decisions made by each viewer. The results per lesion in Table 4, when compared with the results per case in Table 3, show how accurately the mammographers classified lesions within the cases that they received credit for under the LROC and ROC analyses.
The following eye-position records will illustrate how an experienced mammographer scans the two-view mammographic display showing fixation scan paths, fixation clusters with visual dwells greater than 1,000 msec, and zooms. The first eye-position record, shown in Figure 2, is a mammogram of an abnormal case.
| DISCUSSION |
|---|
The performance of our sample of mammographers suggests that many of the retrospective-cancer cases that were judged visible in retrospect either lacked sufficient conspicuity to attract visual attention or lacked distinctive features of malignancy, making it difficult to differentiate malignant from nonmalignant breast-image perturbations. From the standpoint of attracting visual attention, only 30% of retrospective cases on average qualified at blinded review (Table 2). This casts doubt as to whether experts can objectively decide, after retrospective review, whether a cancer that was missed should have been reported. Missing a substantial finding on a radiograph is an event that escapes few radiologists during their professional careers, and when one of these errors is pointed out, most are astonished that they did not see what now (in retrospect) seems obvious to them. This study suggests that a priori knowledge exerts a powerful influence on perceptual judgment in retrospective review.
We acknowledge that the experimental conditions of the present study limited our conclusions to some extent. But we attempted to capture the clinical mammographic reading task as closely as possible and still scientifically measure a mammographer's perceptual and decision-making analysis of the radiographs. From clinical practice, we know that mammographers do not rely on visual analysis of two mammographic views alone to determine malignancy, but rather perform additional evaluations of suspicious findings, such as magnification views, spot compression views, additional projections, and US. Harvey et al (3) found that only 41% of retrospectively visible cancers were reported at blinded review when the viewers were given access to the most recent previous mammograms. The mammographers in the present study were given only two views of one breast, and about the same percentage of retrospectively visible cancers (40%) were reported at blinded review. Both studies found a significant difference when the retrospective-review bias was eliminated.
The present study findings provide some insight into how mammographers read mammograms. First, they show how important the initial impression is in guiding perceptual analysis and decision making (17,18). Seventy-six percent (61 of 80) of the retrospective cases and 80% (32 of 40) of the prospective cases were correctly reported as abnormal during the initial impression. But, as we know from subsequent analysis of the abnormal decisions, this initial performance was misleading, because what the viewers were fixating on, localizing, and classifying was not necessarily a malignant lesion. An in-depth analysis, based on eye-position data within an ROC framework, revealed that one-third to two-thirds of the initial case decisions were not related to fixating on the truly malignant lesion but rather to fixating on a false lesion. In the final impression phase, the effect of digital zooming was selective, improving performance for prospective cases but actually decreasing performance for retrospective cases, because more than half of the localized lesions were falsely classified as malignant.
Second, it appears that in our study, zooming was used primarily to confirm or reject suspicious findings that attracted visual attention rather than to discover new findings. Fixation clusters generated during the initial impression were largely responsible for guiding zooming during detailed examination. Support for this comes from the fact that viewers prioritized first zooms by almost always localizing them on first-seen and first-reported lesions. Then too, more than three-fourths of all zooms overlapped the fixation clusters with visual dwells greater than 1,000 msec. This suggests that zooms indicate regions of interest, and this is also how we have interpreted the role of fixation clusters. Perhaps in the future, we may be able to gain valuable information about decision making by measuring the scanning sequence and image content at zoomed locations and at prolonged visual-dwell locations.
Finally, the LROC analysis based on lesion detection confirmed that viewers' performance in recognizing and localizing malignant lesions was significantly inferior for retrospective cases compared with prospective cases (proportion of .46 vs .75, respectively). The mammographers' performance was low in this study because they were dealing with the most subtle lesion type and because they had to base their decisions solely on the visual properties of two mammographic views without the benefits of prior images and clinical history. In this study, we have shown that retrospectively detected breast cancers like those in our test set are not detected either because they escape visual attention during visual search or because they are detected and receive prolonged visual attention but cannot be reliably interpreted as cancers without additional imaging tests. Thus, most of our retrospective cancers were not intrinsically detectable.
In conclusion, when retrospective cancer cases that were not reported initially were subjected to blinded review, more false-positive than true-positive cases were reported.
For the sample of subtle cancer cases in the present study, the initial impression played a major role in guiding perceptual analysis during visual search and interpretation. At initial impression, two-thirds of all correctly interpreted true cancers were picked up, but at retrospective impression, proportionally more false- than true-positive cases were picked up.
A digital zoom was used in the present study primarily to confirm initial perception of abnormalities rather than to discover new abnormalities. For retrospective cancer cases, the new cancers that were discovered with zooming were offset by false-positive decisions. Thus, what the mammographer sees initially is practically all of what the mammographer finally reports when limited to two views.
The mammographers in the present study were capable of detecting malignant findings in clinical practice, having read a great many mammograms, but were unable to reliably interpret malignant lesions solely on the basis of perceptual analysis without additional imaging.
| FOOTNOTES |
|---|
Author contributions: Guarantor of integrity of entire study, C.F.N.; study concepts, C.F.N., C.M.T., S.P.W., H.L.K.; study design, C.F.N., C.M.T., S.P.W.; literature research, C.F.N.; clinical studies, R.E.H.S., J.A.B., S.P.W., E.F.C.; experimental studies, C.F.N., H.L.K.; data acquisition and analysis/interpretation, C.F.N., C.M.T.; statistical analysis, C.F.N.; manuscript preparation, C.F.N., H.L.K., C.M.T.; manuscript definition of intellectual content, C.F.N., H.L.K., C.M.T., S.P.W.; manuscript editing, C.F.N., H.L.K., C.M.T., S.P.W., E.F.C., J.A.B., R.E.H.S.; manuscript revision/review, C.F.N., H.L.K., S.P.W., C.M.T., E.F.C.; manuscript final version approval, all authors.