Breast Imaging
1 From the Radiology Department, St George's Hospital, Blackshaw Rd, London SW17 0QT, England (L.A.L.K., R.M.G.); and Royal Free and University College Medical School, London, England (P.T.). Received August 3, 2004; revision requested October 8; revision received November 22; accepted December 27. Supported by the National Health Service Breast Screening Programme. Address correspondence to R.M.G. (e-mail: rosalind.givenwilson{at}stgeorges.nhs.uk).
ABSTRACT
MATERIALS AND METHODS: The study had appropriate ethics committee approval. Informed consent was not required; however, patients were informed that their mammograms might be used in research efforts, and all patients agreed to participate. Mammograms obtained in 6111 women (mean age, 58.4 years) undergoing routine screening every 3 years were analyzed with a CAD system. Mammograms were independently double read. Twelve readers participated. Readers recorded an initial evaluation, viewed the CAD prompts, and recorded a final evaluation. Recall to assessment was decided after arbitration. Sensitivities were calculated for single reading, single reading with CAD, and double reading, as a proportion of the total number of cancers detected by using double reading with CAD.
RESULTS: A total of 62 cancers were detected in 61 women. CAD prompted 51 (84%) of 61 radiographically detected cancers. Of 12 cancers missed on single reading, nine were correctly prompted; however, seven prompts were overruled by the reader. Sensitivity of single reading was 90.2% (95% confidence interval [CI]: 83.0%, 95.0%), single reading with CAD was 91.5% (95% CI: 85.0%, 96.0%), and double reading without CAD was 98.4% (95% CI: 91.0%, 100%). Cancer detection rate was 1%. Recall to assessment rate was 6.1%, with an increase of 5.8% because of CAD. Average time required, per reader, to read a case was 25 seconds without CAD and 45 seconds with CAD.
CONCLUSION: CAD increases sensitivity of single reading by 1.3%, whereas double reading increases sensitivity by 8.2%.
© RSNA, 2005
INTRODUCTION
Computer-aided detection (CAD) systems can potentially be used to address staffing problems and reduce error rates in mammogram reading. The two main commercially available systems (ImageChecker, R2 Technology, Los Altos, Calif; Second Look, iCAD, Nashua, NH) have similar levels of sensitivity: up to 98.2% for microcalcifications, 88.7% for soft-tissue lesions, and 90.0% for malignant lesions overall (8,9). It has been argued that CAD, performing at this level of sensitivity, has the potential to replace the second reader (10).
To accurately assess the potential of CAD, realistic studies of its effect on reader decision-making are needed. Thus, the purpose of this study was to evaluate prospectively the recall and cancer detection rates with and without CAD in the United Kingdom National Health Service Breast Screening Programme.
MATERIALS AND METHODS
Women were enrolled through the South West London Breast Screening Service, which is a part of the United Kingdom National Health Service Breast Screening Programme. Women aged 50–64 years are invited to undergo mammographic screening every 3 years. Women older than 64 years may refer themselves. The examination consists of two views of each breast. Between March 21, 2003, and January 9, 2004, inclusive, mammograms obtained in 6111 women (age range, 45–94 years; mean age, 58.4 years) were read with CAD. During this period, another 13 391 mammograms were read on rollers; CAD was not available on these rollers.
Study Design
Mammograms obtained in these 6111 women were digitized and analyzed with the ImageChecker system, version 5.0 (R2 Technology). Women requiring more than the standard four views for coverage of the whole of both breasts could not be analyzed with this system, and they were excluded from the study. Mammograms were loaded in batches on a roller viewer and accompanied by mammograms from the previous screening round, when available. Twelve film readers participated in this study; seven readers were consultant radiologists (including R.M.G.), and five readers were radiographers trained in reading screening mammograms. All mammograms were independently double read by at least one consultant radiologist. All readers met the training requirements of the National Health Service Breast Screening Programme: They read at least 5000 screening mammograms per year; they were aged 34–56 years, with a mean age of 43 years; and their experience in mammography ranged from 4 to 23 years, with a mean experience of 11 years.
Each reader viewed current and available prior mammograms for each patient; at this time, an opinion about any visible abnormalities and whether to recall the patient for further assessment was recorded. CAD prompts for the current mammograms were then displayed on an adjacent liquid crystal display screen. The reader then reassessed the prompted areas before recording a revised assessment.
After double reading, women were considered healthy or assigned for (a) discussion, with a view to recall the patient for technical repeat of mammography; (b) comparison of current and prior mammograms, which were not available in the unit; or (c) arbitration, with a view to recall the patient for further clinical assessment.
Women in the first two categories were excluded from the study, since CAD prompts would not necessarily be available when images were reviewed. At arbitration, cases were discussed by an additional two consultant radiologists who reviewed current and previously obtained (if available) mammograms, CAD prompts, and proformas completed by the first two readers before they decided whether to recall patients for assessment. All of the radiologists (including R.M.G.) who participated in the study participated in arbitration meetings. Final outcomes were recorded after assessment.
A summary of all the women excluded from the study is presented in Table 1. There were two cancers in these women. One cancer was in a woman who required more than four views for coverage of her breast. The other cancer was in a woman who had an expedited appointment; therefore, although CAD prompts were available, there were no reader analysis data for CAD.
Relative sensitivity was calculated for each of three protocols (ie, single reading, single reading with CAD, and double reading) by dividing the number of cancers detected with each protocol by the total number of cancers detected. The data obtained are not a measure of the true sensitivity because until 3-year follow-up data are obtained, the total number of missed cancers cannot be ascertained. Thus, sensitivity is expressed as a proportion of the total number of cancers detected with double reading with CAD. Exact 95% confidence intervals (CIs) for the relative sensitivities were calculated. Where the CIs for two sensitivities overlapped, the difference was assumed to not be statistically significant.
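The relative sensitivity and its exact interval can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes that "exact" refers to the Clopper-Pearson binomial interval, and it assumes that the 98.4% figure for double reading corresponds to 60 of 61 radiographically detected cancers. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k detections out of n cancers."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts: 60 of 61 cancers detected at double reading.
detected, total = 60, 61
relative_sensitivity = detected / total
ci_low, ci_high = exact_ci(detected, total)
print(f"{relative_sensitivity:.1%} ({ci_low:.1%}, {ci_high:.1%})")
```

Under these assumed counts the interval lands close to the reported 91.0% to 100% for double reading; small differences would be expected from rounding or from a different exact method.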
For double reading, cancer was considered detected if either reader detected it at step 1; cancer was considered missed if neither reader detected it until step 2 (Figure). A cancer detected by a reader was considered detected at single reading if the reader detected it at step 1; otherwise, it was deemed missed. If a cancer was not detected by a reader at step 1 but was detected at step 2, it was deemed detected by that reader for single reading with CAD; otherwise, it was deemed missed.
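The detection rules above can be written out as a small sketch. The `Reading` type and function names are hypothetical, and the sketch assumes that a cancer detected at step 1 remains detected after the CAD prompts are viewed.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    step1: bool  # reader detected the cancer before viewing CAD prompts
    step2: bool  # reader detected the cancer after viewing CAD prompts


def detected_single(reading):
    """Single reading: only the step 1 (pre-CAD) evaluation counts."""
    return reading.step1


def detected_single_with_cad(reading):
    """Single reading with CAD: detected at step 1 or, failing that, at step 2."""
    return reading.step1 or reading.step2


def detected_double(first, second):
    """Double reading: either reader detected the cancer at step 1."""
    return first.step1 or second.step1


# A cancer found by one reader only after CAD prompting:
prompted = Reading(step1=False, step2=True)
unaided_miss = Reading(step1=False, step2=False)
print(detected_single(prompted))                 # False
print(detected_single_with_cad(prompted))        # True
print(detected_double(prompted, unaided_miss))   # False
```

The last case shows why a cancer can count toward single reading with CAD while still being classed as missed by double reading without CAD.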
Radiologic and gross pathologic details of detected cancers were recorded, as were prompt rates and whether cancers were correctly prompted. A prompt was deemed correct if, after review by two authors (R.M.G. and L.A.L.K.), the cancer was judged to be marked in one or both views with the appropriate mass or calcification marker. A sample of 613 noncancerous cases was selected at random and reviewed by an author (L.A.L.K.) to provide an estimate of the false prompt rate.
Recall and cancer detection rates were also measured for the period of the study on the mammograms read on rollers that were not equipped with CAD viewers. The procedure was the same as that used for the study mammograms, without the CAD step: double reading of current and available prior mammograms followed, when appropriate, by arbitration and recall. Twelve readers read these mammograms, including 10 readers who participated in the study.
We used the StatPages.net tool (accessed October 21, 2004), which is available at http://members.aol.com/johnp71/confint.html, for statistical analysis.
RESULTS
Timing
Mean time per reader per case was 25 seconds for reading without CAD (hence 50 seconds for double reading) and 45 seconds for reading with CAD. Ignoring the possible confounding factor of nonindependence between readers, a Mann-Whitney test was performed with these data (z = 4.46, P < .001) (18 rollers in each group). Each additional arbitration requires 2.2 minutes of radiologist time, and each additional assessment appointment requires 1 hour. The extrapolated overall time effect of single reading with CAD relative to double reading is presented in Table 4. For our study group of 6111 women, single reading with CAD took an extra 9.43 hours when compared with double reading.
DISCUSSION
In our study, there was an increase of recall both to arbitration and to assessment (5.7% and 6.0% increase, respectively) because of CAD in the study group. Studies have shown that the effects of CAD range from slight improvements in specificity to increases in recall rate of up to 19% (11–13,15,16). In the 36 single readings (31 women) recalled to assessment because of CAD, there were two cancers; thus, the positive predictive value for recall to assessment on the basis of a CAD prompt is 5.6%. This compares unfavorably with an overall positive predictive value of 19% for malignancy of recall cases in the unit.
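The positive predictive value quoted above is a simple ratio, sketched here with the counts given in the text. The function name is hypothetical; the unit of analysis is the single reading (36), not the woman (31).

```python
def positive_predictive_value(cancers_found, recalls):
    """Fraction of recalls that turned out to be cancer."""
    return cancers_found / recalls


# Counts from the text: two cancers among 36 CAD-driven single readings.
ppv_cad = positive_predictive_value(2, 36)
print(f"{ppv_cad:.1%}")  # 5.6%
```

Computed per woman instead (two cancers in 31 women), the same ratio would be about 6.5%, still well below the 19% overall figure for the unit.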
Although the number of additional cancers (beyond those found at single reading) detected with CAD was small (1.5%, two of 118 readings), the total is similar to that in other prospective studies. Bandokar et al (17) found two extra cancers that were prompted by CAD in 4089 screening cases (12%), and Young et al (11) found three extra cancers that were prompted by CAD in 12 082 screening mammograms (6.2%). These percentages are larger than those in our study because fewer total cancers were detected. This reflects differences between screening practice in the United States and that in the National Health Service Breast Screening Programme. In the United Kingdom, the women screened are older (mean age, 58.4 years in this study), the screening interval is longer (3 years), and double reading is routinely used. In the study of Freer and Ulissey (12), the median age was 49 years, the screening interval was 13 months, and single reading was practiced.
Why Correct CAD Prompts Are Dismissed
In retrospective studies, the sensitivity of CAD in women with a cancer that was missed has been measured, and it has been assumed that this would translate into increased cancer detection rates. Brem et al (18) found that CAD prompted 65% of 123 missed cancers. They assume that these cancers would have been identified if prompted; therefore, they estimate a 21.2% increase in radiologist sensitivity. Warren Burhenne et al (19) used a similar method and estimated that the false-negative rate could be cut by 77%. These studies, however, assume that readers will respond to every correct CAD prompt. The findings of this study, which are in line with the findings of other studies, show that readers reject some true prompts (15,20,21).
It is important to understand this phenomenon. The study design ensured that readers turned on the screen and looked at the CAD prompts. The seven correct prompts that were ignored or overruled were all for masses. There were few false prompts for these cases (three false prompts for seven sets of mammograms). Previous studies have shown that a high level of confidence that a mammogram is normal is associated with increased likelihood that a correct prompt will be ignored or overruled (21). It has been postulated that greater confidence can be placed on calcification prompts than on other prompts (22).
It is possible that CAD will have more influence on detection and search errors than on decision errors. In this study, despite the high cancer detection rate of 1%, 99% of women are healthy. With a false prompt rate of 1.59 per case for 6050 healthy women and a sensitivity of 84% for 62 cancers, readers will have to dismiss 180 false prompts for every true prompt. This low specificity may be the major factor explaining why readers are more likely to ignore correct prompts than respond to them. It has been postulated that failure to act on nonspecific but CAD-marked findings prospectively in patients with subtle cancer does not constitute negligence (23).
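The ratio of false to true prompts can be checked with a short back-of-envelope sketch using the figures quoted in the paragraph above. The variable names are illustrative, and the inputs are the rounded values from the text, so the result is approximate.

```python
# Figures quoted in the text (rounded).
false_prompts_per_case = 1.59
healthy_women = 6050
cad_sensitivity = 0.84
cancers = 62

# Expected number of false prompts shown on mammograms of healthy women.
false_prompts = false_prompts_per_case * healthy_women

# Expected number of cancers correctly prompted by CAD.
true_prompts = cad_sensitivity * cancers

ratio = false_prompts / true_prompts
print(f"about {ratio:.0f} false prompts per true prompt")
```

With these rounded inputs the ratio comes out slightly above the figure of 180 quoted in the text, which is the same order of magnitude; the difference presumably reflects rounding in the published inputs.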
Timings and Resources
Average time required for single reading of a set of mammograms was 25 seconds without CAD and 45 seconds with CAD. Thus, reviewing CAD prompts and the original mammograms nearly doubles the time a single reader takes to read a set of mammograms. Time saved per reading, by adopting CAD with single reading rather than double reading, is 5 seconds per case. However, when extra time spent on arbitration and assessment is considered, there is a net loss of 9.43 hours of radiologist time over the study period.
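The reading-time arithmetic can be laid out as a minimal sketch. The per-case times, the number of women, and the 9.43-hour net loss come from the text; the implied extra arbitration and assessment time is a back-calculation from those figures, not a value reported in the article.

```python
CASES = 6111                 # women in the study group
SINGLE_READ_S = 25           # seconds per case, one reader without CAD
CAD_READ_S = 45              # seconds per case, one reader with CAD
NET_LOSS_HOURS = 9.43        # reported net extra radiologist time

double_read_s = 2 * SINGLE_READ_S              # two independent readers
saved_per_case_s = double_read_s - CAD_READ_S  # reading time saved per case

# Total reading time saved over the study group, in hours.
reading_hours_saved = CASES * saved_per_case_s / 3600

# Back-calculated: extra arbitration and assessment time that would be
# needed to turn that saving into the reported net loss.
implied_extra_hours = NET_LOSS_HOURS + reading_hours_saved

print(f"saved per case: {saved_per_case_s} s")
print(f"reading time saved: {reading_hours_saved:.2f} h")
print(f"implied extra arbitration/assessment time: {implied_extra_hours:.2f} h")
```

The sketch makes the trade-off explicit: a saving of roughly 8.5 hours of reading time is outweighed by the additional arbitration and assessment workload generated by CAD-driven recalls.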
Use of CAD resulted in increased demand on resources, including a fee of $4800 (in U.S. dollars) per month for the lease of the ImageChecker system (R2 Technology). Also, a part-time member of the clerical staff sorted mammograms, put them through the digitizer, and matched them with a bar code reader to enable them to be loaded on the CAD roller. In the United States, the financial calculation would be different because of increased reimbursement for the use of CAD, which may range from $15 to $28 per examination (24).
Study Design
There are some potential shortcomings in the design of our study. The outcome of patients with negative mammograms is not confirmed. To establish the true false-negative rate, 3 years of follow-up will be needed. Although the outcome of such follow-up will be interesting, it does not affect the primary measure of cancers diagnosed with CAD.
A number of women were excluded from analysis because they were designated for technical recall, reading was delayed while old mammograms were obtained, or multiple mammograms were obtained at initial screening. Only two cancers occurred within this group of patients, and it does not appear that exclusion of these women significantly changed the measurement of cancers in the CAD group.
A further question is whether training and experience with the CAD system were adequate for readers during the study. The majority of readers in this study were trained to use this system by the manufacturer applications specialist; this training is given to new purchasers of the system. This training was cascaded by those who were present to the few readers who were not present at initial training. Readers participating in this study also took part in previous archive studies of CAD within our unit. During the study, they were using CAD in their routine daily screening work for more than 9 months. Participation in arbitration discussion, assessment, and gross pathologic review meetings provided readers with feedback on the outcome of women undergoing screening during this period. No evidence of an increasing or decreasing trend in terms of responsiveness to prompts was seen during this time.
The study is not a straightforward comparison of single reading with CAD against double reading, since the extra step of arbitration applied to all women in whom recall was considered. This should not affect the main outcome measure of sensitivity, as the function of arbitration is to improve specificity.
In a number of cases, readers changed their mind to recall to arbitration discussion on the basis of CAD prompting (89 single readings in 78 women). Only 31 women (36 single readings) had findings that were deemed suspicious at arbitration, and these women were recalled and assessed. Only two women had cancer. It will be interesting to follow up the rest of these women over 3 years to see whether any cancers have developed that might also have been potentially diagnosed with CAD.
Recall (6.1% vs 5.0%, P < .001) and cancer detection (1.0% vs 0.84%, P = .2) rates were higher in the study group than in patients with mammograms that were read during the same period on rollers not equipped with CAD. The overwhelming majority of the extra cancers were detected before viewing CAD prompts. Although there was no systematic difference between the women whose films were read with and without CAD, there was a difference between the readers reading CAD and non-CAD cases, with two readers reading only non-CAD cases.
Conclusion
Double reading with CAD served as the reference standard; the relative sensitivity of single reading is 90.2% (95% CI: 83.0%, 95.0%), the relative sensitivity of single reading with CAD is 91.5% (95% CI: 85.0%, 96.0%), and the relative sensitivity of double reading is 98.4% (95% CI: 91.0%, 100%). Single reading with CAD is significantly slower than single reading without CAD. Our estimates suggest that use of single reading with CAD as an alternative to double reading would increase the time required. CAD prompted 84% of cancers in this study; however, readers ignored or overruled 78% of correct prompts on missed cancers.
FOOTNOTES
Abbreviations: CAD = computer-aided detection, CI = confidence interval
The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the Department of Health or the NHS Breast Screening Programme.
Authors stated no financial relationship to disclose.
Author contributions: Guarantors of integrity of entire study, L.A.L.K., P.T., R.M.G.; study concepts/study design or data acquisition or data analysis/interpretation, L.A.L.K., P.T., R.M.G.; manuscript drafting or manuscript revision for important intellectual content, L.A.L.K., P.T., R.M.G.; approval of final version of submitted manuscript, L.A.L.K., P.T., R.M.G.; literature research, L.A.L.K., P.T.; clinical studies, L.A.L.K., R.M.G.; experimental studies, R.M.G.; statistical analysis, P.T.; and manuscript editing, L.A.L.K., P.T., R.M.G.