Breast Imaging
1 From the Departments of Radiology (D.G., C.S.C., C.M.H., M.A.G., R.L.P., R.S., J.H.S., L.P.W.) and Biostatistics (A.I.B., H.E.R.), University of Pittsburgh School of Medicine, 3362 Fifth Ave, Pittsburgh, Pa 15213-31803; University of Colorado Hospital Breast Center, Aurora, Colo (L.A.H.); and West Penn Allegheny Health System, Pittsburgh, Pa (W.R.P.). Received November 19, 2007; revision requested December 21, 2007; final revision received January 17, 2008; accepted March 28. Supported in part by grants EB001694 and EB003503 (to the University of Pittsburgh) from the National Institute for Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health. Address correspondence to D.G. (e-mail: gurd{at}upmc.edu).
Purpose: To compare radiologists' performance during interpretation of screening mammograms in the clinic with their performance when reading the same mammograms in a retrospective laboratory study.
Materials and Methods: This study was conducted under an institutional review board–approved, HIPAA-compliant protocol; the need for informed consent was waived. Nine experienced radiologists used a screening Breast Imaging Reporting and Data System (BI-RADS) rating scale to rate an enriched set of mammograms that they had personally read in the clinic (the "reader-specific" set), mixed with an enriched "common" set of mammograms that none of the participants had previously read in the clinic. The original clinical recommendations to recall the women for a diagnostic work-up, for both the reader-specific and common sets, were compared with the participants' recommendations during the retrospective experiment. The results are presented in terms of reader-specific and group-averaged sensitivity and specificity levels and the dispersion (spread) of reader-specific performance estimates.
Results: On average, the radiologists' performance was significantly better in the clinic than in the laboratory (P = .035). Interreader dispersion of the computed performance levels was significantly lower during the clinical interpretations (P < .01).
Conclusion: Retrospective laboratory experiments may not accurately represent either the expected performance levels or the interreader variability of clinical interpretations of the same set of mammograms.
© RSNA, 2008
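The performance measures named in Materials and Methods (reader-specific and group-averaged sensitivity and specificity, plus the dispersion of the reader-specific estimates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the reader labels, and the toy recall decisions are hypothetical and are not data from the study, and the sample standard deviation stands in for the dispersion measure, which the abstract does not specify.

```python
from statistics import mean, stdev


def sensitivity_specificity(recalled, has_cancer):
    """Return (sensitivity, specificity) for one reader.

    recalled[i]   -- True if the reader recommended recall for case i
    has_cancer[i] -- True if case i is a verified cancer (reference standard)
    """
    pairs = list(zip(recalled, has_cancer))
    tp = sum(r and c for r, c in pairs)              # cancers recalled
    fn = sum((not r) and c for r, c in pairs)        # cancers missed
    tn = sum((not r) and (not c) for r, c in pairs)  # negatives not recalled
    fp = sum(r and (not c) for r, c in pairs)        # negatives recalled
    return tp / (tp + fn), tn / (tn + fp)


def summarize(readings, has_cancer):
    """Per-reader estimates, group averages, and spread across readers."""
    per_reader = {
        name: sensitivity_specificity(recalled, has_cancer)
        for name, recalled in readings.items()
    }
    sens = [s for s, _ in per_reader.values()]
    spec = [p for _, p in per_reader.values()]
    return {
        "per_reader": per_reader,
        "mean_sensitivity": mean(sens),
        "mean_specificity": mean(spec),
        # Assumption: sample standard deviation as the dispersion measure.
        "sd_sensitivity": stdev(sens),
        "sd_specificity": stdev(spec),
    }


# Hypothetical toy data: 4 cancers followed by 4 negative cases, 3 readers.
truth = [True, True, True, True, False, False, False, False]
readings = {
    "reader_A": [True, True, True, False, False, False, False, True],
    "reader_B": [True, True, False, False, False, False, False, False],
    "reader_C": [True, True, True, True, True, True, False, False],
}
summary = summarize(readings, truth)
```

Running `summarize` once on the clinical recall decisions and once on the laboratory ratings for the same cases would give the two sets of averages and spreads that a comparison of this kind contrasts; the significance tests reported in Results are outside the scope of this sketch.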