Radiology
DOI: 10.1148/radiol.2432060387
(Radiology 2007;243:10-12.)
© RSNA, 2007


Editorials

New Methodological Tools for Multiple-Reader ROC Studies¹

Nancy A. Obuchowski, PhD

¹ From the Department of Quantitative Health Sciences and Division of Radiology, the Cleveland Clinic Foundation, Wb4, 9500 Euclid Ave, Cleveland, OH 44195. Received March 1, 2006; accepted March 8; final version accepted May 11. Address correspondence to the author (e-mail: ObuchoN@ccf.org).

The multiple-reader receiver operating characteristic (ROC) study design has become a frequently used tool in diagnostic radiology. With this tool, an investigator can characterize and compare the accuracy of diagnostic tests that rely on subjective interpretation. This is important because many commonly used diagnostic tests are interpreted subjectively, and there are known to be substantial differences in readers' ability to interpret these tests (1). Thus, a study design with one or two readers cannot appropriately measure the accuracy of these tests.

There are several published statistical methods for analyzing the complex data generated by a multiple-reader ROC study (2–8), as well as software that implements some of them (9–11). In this editorial, I would like to make investigators aware of some new tools for multiple-reader ROC studies.

Multiple-reader ROC studies are already difficult to analyze, and many are further complicated by multiple observations made in the same patient (eg, multiple nodules per patient). These data require a statistical analysis tailored to handle the correlations within a patient. Determining the required sizes of both the patient and reader samples is another critical issue for multiple-reader ROC studies; new approaches to sample size determination have recently been proposed and may lead to smaller sample sizes. Finally, a simple method of summarizing multiple readers' ROC curves, one that shows the readers' average performance along with the between-reader variability, can be a valuable tool in reporting study results. Here, I address these practical issues in the analysis, planning, and reporting of multiple-reader ROC studies.


    MULTIPLE OBSERVATIONS IN EACH PATIENT
Two breasts, multiple lung nodules, and multiple arterial segments are common examples of multiple observations in the same patient, or "clustered data." Readers are asked to assign a confidence score to each breast, lung nodule, or arterial segment, where the confidence score describes the reader's confidence that a particular disease or condition is present at that site. Thus, there are multiple related observations made from the same patient. Often the number of observations differs between patients. Multiple observations on the same patient tend to resemble each other, more so than if the observations came from different patients. This resemblance (ie, correlation), however small, must be accounted for when calculating P values or constructing confidence intervals (CIs). If the correlation is ignored, the P values and CIs will be wrong; usually, the P values will be too small and the CIs will be too narrow, which can mislead investigators, editors, and the readership.

There are several methods available for analyzing clustered radiologic data (12–17), but they apply to a single reader's data. In a multiple-reader ROC study, the clustering must be accounted for within each reader's data as well as between readers.

One approach for the multiple-reader study is to apply a resampling method, such as bootstrapping or jackknifing. Bootstrapping (14,18) involves drawing many samples, with replacement, from the original data set and repeating the analysis on each sample. Jackknifing repeats the analysis on subsets of the original data set, each with a different observation left out. When resampling clustered data, we must treat all of a patient's observations as a unit; that is, we resample patients rather than, for example, individual nodules. This approach requires the analyst to write his or her own software, but statistical packages such as S-PLUS and R can assist. Moreover, Beiden et al (6) use bootstrapping to analyze multiple-reader ROC data, while others use jackknifing (9,10); thus, it would be a straightforward extension of these methods to bootstrap or jackknife in a manner that respects the multiple observations for each patient.
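As a minimal sketch of resampling by patient, the Python below computes a reader-averaged empirical ROC area and estimates its standard error with a patient-level bootstrap and a patient-level jackknife. The data layout (a list of patients, each a list of (is_diseased, scores) units with one score per reader), the function names, and the toy data are hypothetical. Readers are held fixed here; generalizing to the population of readers, as the methods cited above do, would also require resampling or modeling the readers.

```python
import random
import statistics


def auc(units, reader):
    """Empirical (Mann-Whitney) ROC area for one reader, pooling all units.

    units: list of (is_diseased, scores) pairs; scores[reader] is that
    reader's confidence score. Ties count one half.
    """
    pos = [s[reader] for d, s in units if d]
    neg = [s[reader] for d, s in units if not d]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in pos for y in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(patients, n_readers):
    """Average of the readers' ROC areas; each patient is a list of units."""
    units = [u for patient in patients for u in patient]
    return sum(auc(units, r) for r in range(n_readers)) / n_readers


def patient_bootstrap_se(patients, n_readers, n_boot=2000, seed=0):
    """Bootstrap SE that resamples whole patients (clusters), not units."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(patients) for _ in patients]
        flags = [d for p in sample for d, _ in p]
        if all(flags) or not any(flags):
            continue  # skip resamples lacking diseased or nondiseased units
        estimates.append(mean_auc(sample, n_readers))
    return statistics.stdev(estimates)


def patient_jackknife_se(patients, n_readers):
    """Jackknife SE that leaves out one whole patient at a time.

    Assumes every leave-one-out sample still has both diseased and
    nondiseased units.
    """
    n = len(patients)
    loo = [mean_auc(patients[:i] + patients[i + 1:], n_readers) for i in range(n)]
    center = sum(loo) / n
    return ((n - 1) / n * sum((t - center) ** 2 for t in loo)) ** 0.5


# Hypothetical toy data: 6 patients, 2 readers, 1 or 2 units per patient.
patients = [
    [(True, (5, 4)), (False, (2, 1))],
    [(True, (4, 5))],
    [(False, (1, 2)), (False, (3, 2))],
    [(True, (3, 3)), (True, (5, 4))],
    [(False, (2, 3))],
    [(True, (4, 2)), (False, (1, 1))],
]
print(mean_auc(patients, 2))
print(patient_bootstrap_se(patients, 2), patient_jackknife_se(patients, 2))
```

Because each resample keeps a patient's units together, the within-patient correlation is carried into every bootstrap or jackknife replicate, which is what keeps the resulting standard error from being too small.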

Another alternative, which takes advantage of existing multiple-reader ROC software, is to preprocess the clustered data and then enter the preprocessed results into that software. Preprocessing involves estimating each reader's ROC area, together with the variances and covariances of these areas, by using existing methods for clustered data (15,16). These estimates can then be entered as input to the OBUMRM software (11), which performs the multiple-reader analysis.
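The preprocessing step can be sketched as follows. The function below is our reading of a nonparametric clustered estimator of the kind described in reference 15: it computes each reader's ROC area over all units and a reader-by-reader covariance matrix from per-patient structural components, so that each patient contributes one term regardless of how many units it has. The formulas should be checked against the original paper before use; the data layout, function names, and toy data are hypothetical, and the OBUMRM input format is not reproduced here.

```python
def psi(x, y):
    """Mann-Whitney kernel: 1 if the diseased unit scores higher, 1/2 for a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def clustered_auc_cov(patients, n_readers):
    """Per-reader ROC areas and their covariance matrix for clustered data.

    patients: list of patients, each a list of (is_diseased, scores) units,
    with scores[r] the confidence score of reader r. Requires at least two
    patients with diseased units and two with nondiseased units.
    """
    n_pat = len(patients)
    m = [sum(1 for d, _ in p if d) for p in patients]      # diseased units per patient
    n = [sum(1 for d, _ in p if not d) for p in patients]  # nondiseased units per patient
    M, N = sum(m), sum(n)
    i10 = sum(1 for k in m if k > 0)  # patients contributing diseased units
    i01 = sum(1 for k in n if k > 0)  # patients contributing nondiseased units
    pos = [(i, s) for i, p in enumerate(patients) for d, s in p if d]
    neg = [(i, s) for i, p in enumerate(patients) for d, s in p if not d]

    theta, d10, d01 = [], [], []
    for r in range(n_readers):
        v10 = [0.0] * n_pat  # per-patient sum of V10 over its diseased units
        v01 = [0.0] * n_pat  # per-patient sum of V01 over its nondiseased units
        for i, xs in pos:
            v10[i] += sum(psi(xs[r], ys[r]) for _, ys in neg) / N
        for i, ys in neg:
            v01[i] += sum(psi(xs[r], ys[r]) for _, xs in pos) / M
        t = sum(v10) / M  # equals the Mann-Whitney area over all unit pairs
        theta.append(t)
        # Deviations of each patient's components from their expected values.
        d10.append([v10[i] - m[i] * t for i in range(n_pat)])
        d01.append([v01[i] - n[i] * t for i in range(n_pat)])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    cov = [[0.0] * n_readers for _ in range(n_readers)]
    for r in range(n_readers):
        for s in range(n_readers):
            s10 = i10 / ((i10 - 1) * M) * dot(d10[r], d10[s])
            s01 = i01 / ((i01 - 1) * N) * dot(d01[r], d01[s])
            s11_rs = n_pat / (n_pat - 1) * dot(d10[r], d01[s])
            s11_sr = n_pat / (n_pat - 1) * dot(d10[s], d01[r])
            cov[r][s] = s10 / M + s01 / N + (s11_rs + s11_sr) / (M * N)
    return theta, cov


# Hypothetical toy data: 6 patients, 2 readers.
patients = [
    [(True, (5, 4)), (False, (2, 1))],
    [(True, (4, 5))],
    [(False, (1, 2)), (False, (3, 2))],
    [(True, (3, 3)), (True, (5, 4))],
    [(False, (2, 3))],
    [(True, (4, 2)), (False, (1, 1))],
]
theta, cov = clustered_auc_cov(patients, 2)
print(theta, cov)
```

With one unit per patient, every cross-product in the S11 term is zero and the variance reduces to the familiar unclustered structural-components estimate, which is a useful sanity check on any implementation.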


    SAMPLE SIZE DETERMINATION
Hillis and Berbaum (19) have recently published a method of computing sample size for multiple-reader ROC studies. Their method uses pilot data to estimate various parameters (eg, interreader variability) needed for determining sample size; then, these estimates are used to scale the proposed study.

This approach differs from the sample size method of Obuchowski (3,20), in which sample size is estimated from conjectured values of similar parameters. In a comparison of three sample size approaches (the Hillis and Berbaum approach [19], an unpublished method of Beiden et al, and the published method of Obuchowski [20]), both the Hillis and Berbaum method and the Beiden et al method projected that fewer patients and readers would be needed for a future study than the Obuchowski method required (8). This is encouraging: if multiple-reader ROC studies required fewer resources, perhaps more such studies could be performed.
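To make the idea of scaling variance estimates concrete, the sketch below searches for the smallest number of patients that gives a target power for detecting a difference in reader-averaged ROC areas. It is not the Hillis and Berbaum procedure, which is built on the Dorfman-Berbaum-Metz analysis-of-variance model, nor the Obuchowski tables. It assumes a hypothetical two-component model in which the variance of the difference is sigma2_reader/J + sigma2_case/K for J readers and K patients, and it uses a normal approximation to power. The variance components would come from pilot data or from conjectured values; the numbers in the usage line are invented.

```python
from statistics import NormalDist


def approx_power(delta, var_diff, alpha=0.05):
    """Two-sided z-test power for a true AUC difference delta (far tail ignored)."""
    nd = NormalDist()
    return nd.cdf(abs(delta) / var_diff ** 0.5 - nd.inv_cdf(1 - alpha / 2))


def var_diff(n_readers, n_cases, sigma2_reader, sigma2_case):
    """HYPOTHETICAL variance model for the difference in reader-averaged AUCs:
    a reader term shrinking as 1/J plus a case term shrinking as 1/K."""
    return sigma2_reader / n_readers + sigma2_case / n_cases


def smallest_case_sample(delta, n_readers, sigma2_reader, sigma2_case,
                         target_power=0.80, alpha=0.05, max_cases=100_000):
    """Smallest K (patients) reaching the target power, or None if none does."""
    for k in range(2, max_cases + 1):
        v = var_diff(n_readers, k, sigma2_reader, sigma2_case)
        if approx_power(delta, v, alpha) >= target_power:
            return k
    return None  # between-reader variability alone may block the target


# Invented variance components standing in for pilot-study estimates.
k = smallest_case_sample(delta=0.05, n_readers=10, sigma2_reader=0.002, sigma2_case=0.1)
print(k)
```

Even in this simplified model, the reader term does not shrink as patients are added, so with too few readers no number of patients reaches the target power; this is one reason both patient and reader sample sizes must be planned.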


    MULTIPLE-READER SUMMARY ROC CURVE
Figure 1 illustrates typical results of a multiple-reader ROC study. Seventeen readers interpreted the same set of images, and each reader's empirical ROC curve is shown. The figure shows considerable between-reader differences. In a multiple-reader ROC study, one typically estimates the area under the ROC curve for each reader and then averages these areas to describe the performance of the diagnostic test. However, no single ROC curve corresponds to this average ROC area, so it would be useful to display a summary ROC curve that represents the readers' average performance.
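For concreteness, the following sketch computes each reader's empirical ROC curve from confidence scores, takes the trapezoidal area under it (for an empirical curve this equals the Mann-Whitney statistic with ties counted as one half), and averages the areas across readers. The function names and the two readers' ratings are hypothetical.

```python
def empirical_roc(diseased, nondiseased):
    """Empirical ROC operating points (FPR, TPR) from (0, 0) to (1, 1).

    Each distinct confidence score serves as a threshold; a unit is called
    positive when its score is at or above the threshold.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(diseased) | set(nondiseased), reverse=True):
        tpr = sum(s >= t for s in diseased) / len(diseased)
        fpr = sum(s >= t for s in nondiseased) / len(nondiseased)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical 5-point ratings: (diseased scores, nondiseased scores) per reader.
readers = {
    "A": ([5, 4, 4, 3], [1, 2, 3, 2]),
    "B": ([3, 3, 4, 2], [1, 3, 2, 2]),
}
areas = {name: trapezoid_auc(empirical_roc(d, nd)) for name, (d, nd) in readers.items()}
average_area = sum(areas.values()) / len(areas)
print(areas, average_area)
```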


Figure 1: Graph illustrates 17 readers' ROC curves. The curves represent real data from a multiple-reader study.

 
Swets and Pickett (21) mention several possible ways to construct a summary ROC curve from multiple readers' ROC curves. Taking their ideas a step further, one could construct a summary display (Fig 2) in which one curve shows the average of the readers' true-positive rates at each false-positive rate, two further curves show the 25th and 75th percentiles of the readers' true-positive rates, and two outer curves show the 5th and 95th percentiles.
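One way to build such a display is vertical averaging: evaluate every reader's curve on a common grid of false-positive rates and summarize the readers' true-positive rates at each grid point. The sketch below assumes each reader's empirical ROC curve is given as a list of (FPR, TPR) points joined by straight lines, takes the highest TPR where a curve is vertical, and computes percentiles by linear interpolation between order statistics; these are conventions chosen for illustration, and the function names and example curves are hypothetical.

```python
def tpr_at(points, fpr):
    """A reader's TPR at a given FPR on the curve joining its ROC points.

    Where the curve is vertical at this FPR, the highest TPR is used (an
    assumed convention).
    """
    pts = sorted(points)
    exact = [y for x, y in pts if x == fpr]
    if exact:
        return max(exact)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 < fpr < x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("FPR outside the range of the curve")


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summary_curve(curves, grid_size=20):
    """Vertical averaging across readers.

    Returns rows (fpr, mean, p5, p25, p75, p95) of the readers' TPRs on an
    evenly spaced FPR grid from 0 to 1.
    """
    rows = []
    for k in range(grid_size + 1):
        f = k / grid_size
        tprs = [tpr_at(c, f) for c in curves]
        rows.append((f, sum(tprs) / len(tprs),
                     *(percentile(tprs, p) for p in (5, 25, 75, 95))))
    return rows


# Hypothetical empirical ROC points for two readers.
roc_a = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.75), (0.25, 1.0), (0.75, 1.0), (1.0, 1.0)]
roc_b = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.75), (0.75, 1.0), (1.0, 1.0)]
for row in summary_curve([roc_a, roc_b], grid_size=4):
    print(row)
```

Plotting the mean column gives the central summary curve, and the percentile columns give the bands; with 17 readers, as in Figure 1, the 5th and 95th percentiles are interpolated between the most extreme readers.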


Figure 2: Summary curve, in red, illustrates the average of the readers' true-positive rates at each false-positive rate. Blue curves represent the 25th and 75th percentiles of the readers' true-positive rates, whereas the outer black curves represent the 5th and 95th percentiles of these readers' true-positive rates.

 
Thus, the summary curve illustrates both the average performance of the readers and the variability between readers. Summary curves can also be used to display the accuracies of two competing diagnostic tests.

In summary, multiple-reader ROC study designs are a popular tool for radiologists. With continued partnership of radiologists and methodologists, better statistical methods and software will continue to develop for the planning, analysis, and reporting of these studies.


    FOOTNOTES
 
Author stated no financial relationship to disclose.


    References

  1. Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists: findings from a national sample. Arch Intern Med 1996;156:209–213.
  2. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27:723–731.
  3. Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Acad Radiol 1995;2(suppl 1):S22–S29.
  4. Toledano AY, Gatsonis C. Ordinal regression methodology for ROC curves derived from correlated data. Stat Med 1996;15:1807–1826.
  5. Song HH. Analysis of correlated ROC areas in diagnostic testing. Biometrics 1997;53:370–382.
  6. Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: an alternative method for random-effects, receiver operating characteristic analysis. Acad Radiol 2000;7:341–349.
  7. Ishwaran H, Gatsonis CA. A general class of hierarchical ordinal regression models with applications to correlated ROC analysis. Can J Stat 2000;28:731–750.
  8. Obuchowski NA, Beiden SV, Berbaum KS, et al. Multi-reader multi-case ROC analysis: an empirical comparison of five methods. Acad Radiol 2004;11:980–995.
  9. LABMRMC. Kurt Rossmann Laboratories for Image Research. University of Chicago Web site. http://xray.bsd.uchicago.edu/krl/KRL_ROC/software_index.htm. Accessed February 16, 2007.
  10. Schartz KM, Hillis SL, Berbaum KS, Dorfman DD. MRMC2.0. Medical Image Perception Laboratory. University of Iowa Web site. http://perception.radiology.uiowa.edu. Accessed February 16, 2007.
  11. OBUMRM. Cleveland Clinic Foundation Web site. http://www.bio.ri.ccf.org/html/obumrm.html. Accessed February 16, 2007.
  12. Gonen M, Panageas KS, Larson SM. Statistical issues in analysis of diagnostic imaging experiments with multiple observations per patient. Radiology 2001;221:763–767.
  13. Rutter CM. Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 2000;7:413–419.
  14. Beam CA. Analysis of clustered data in receiver operating characteristic studies. Stat Methods Med Res 1998;7:324–336.
  15. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics 1997;53:567–578.
  16. Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000;7:516–525.
  17. Zou KH, O'Malley AJ. A Bayesian hierarchical non-linear regression model in receiver operating characteristic analysis of clustered continuous diagnostic data. Biom J 2005;47:417–427.
  18. Efron B, Tibshirani RJ. An introduction to the bootstrap. In: Monographs on statistics and applied probability 57. New York, NY: Chapman & Hall, 1993.
  19. Hillis SL, Berbaum KS. Power estimation for the Dorfman-Berbaum-Metz method. Acad Radiol 2004;11:1260–1273.
  20. Obuchowski NA. Sample size tables for receiver operating characteristic studies. AJR Am J Roentgenol 2000;175:603–608.
  21. Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York, NY: Academic Press, 1982.



