Statistical Concepts Series |
1 From the Department of Biostatistics and Epidemiology/Wb4, Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH 44195. Received May 8, 2001; revision requested June 11; revision received August 1; accepted August 2. Address correspondence to the author (e-mail: nobuchow@bio.ri.ccf.org).
| ABSTRACT |
|---|
|
|
|---|
© RSNA, 2003
Index terms: Diagnostic radiology; Statistical analysis
| INTRODUCTION |
|---|
|
|
|---|
Consider as an example the test results of 100 patients who have undergone mammography (Table 1). According to biopsy results and/or 2-year follow-up results (ie, the reference standard procedures), 50 patients actually have a malignant lesion and 50 patients do not. If these 100 test results were from 100 asymptomatic women without a personal history of breast cancer, then we might define a positive test result as any that represents a "suspicious" or "malignant" finding and a negative test result as any that represents a "normal," "benign," or "probably benign" finding. We have used a cut point for defining positive and negative test results. The cut point is located between the suspicious and probably benign findings. The estimated sensitivity with this cut point is (18 + 20)/50 = 0.76, and the specificity is (15 + 3 + 18)/50 = 0.72.
|
Important point: Sensitivity and specificity depend on the cut point used to define positive and negative test results. As the cut point shifts, the sensitivity increases while the specificity decreases, or vice versa.
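The dependence on the cut point can be sketched in a few lines of Python. The category counts below are assumptions for illustration: only the sums quoted above (18 + 20 among patients with cancer, 15 + 3 + 18 among patients without) come from the text, and the remaining cells are hypothetical values chosen so that each group totals 50. The function name `sens_spec` is likewise hypothetical.

```python
# Ordered test result categories, least to most suspicious.
CATEGORIES = ["normal", "benign", "probably benign", "suspicious", "malignant"]

# Assumed counts per category (illustrative only; see the note above).
WITH_CANCER = [2, 0, 10, 18, 20]      # 50 patients with a malignant lesion
WITHOUT_CANCER = [15, 3, 18, 13, 1]   # 50 patients without


def sens_spec(diseased, healthy, first_positive):
    """Sensitivity and specificity when every category with index
    >= first_positive is called a positive test result."""
    sensitivity = sum(diseased[first_positive:]) / sum(diseased)
    specificity = sum(healthy[:first_positive]) / sum(healthy)
    return sensitivity, specificity


if __name__ == "__main__":
    # Slide the cut point from lenient to strict and watch the trade-off.
    for i in range(1, len(CATEGORIES)):
        se, sp = sens_spec(WITH_CANCER, WITHOUT_CANCER, i)
        print(f"positive if >= {CATEGORIES[i]!r}: "
              f"sensitivity={se:.2f}, specificity={sp:.2f}")
```

With the cut point between "probably benign" and "suspicious" (index 3), the sketch returns the 0.76 and 0.72 computed above; lowering the cut point raises sensitivity and lowers specificity.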
| COMBINED MEASURES OF SENSITIVITY AND SPECIFICITY |
|---|
|
|
|---|
|
| RECEIVER OPERATING CHARACTERISTIC CURVE |
|---|
|
|
|---|
Figure 1 illustrates the empirical ROC curve for the mammography example. Since in our example there are five categories for the test results, we can compute four cut points for the ROC curve. The two endpoints on the ROC curve are (0, 0) and (1, 1), written as (FPR, sensitivity). The points labeled 1 and 2 on the curve correspond to the first and second cut points, respectively, that are defined in the note to Table 1. Estimations of the other points are provided in Table 3.
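As a rough illustration of how the points of an empirical ROC curve arise from ordered categories, the Python sketch below accumulates counts from the most suspicious category downward. The counts are hypothetical stand-ins for Table 1 (which is not reproduced here), and `empirical_roc` is an illustrative name, not part of any package.

```python
def empirical_roc(diseased, healthy):
    """Return (FPR, sensitivity) points for ordered category counts,
    from the strictest cut point to the most lenient, including the
    (0, 0) and (1, 1) endpoints."""
    n_d, n_h = sum(diseased), sum(healthy)
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Walk from the most suspicious category down, lowering the cut point.
    for d, h in zip(reversed(diseased), reversed(healthy)):
        tp += d   # patients with disease now called positive
        fp += h   # patients without disease now called positive
        points.append((fp / n_h, tp / n_d))
    return points


if __name__ == "__main__":
    # Assumed counts: normal, benign, probably benign, suspicious, malignant.
    with_cancer = [2, 0, 10, 18, 20]
    without_cancer = [15, 3, 18, 13, 1]
    for fpr, sens in empirical_roc(with_cancer, without_cancer):
        print(f"FPR={fpr:.2f}  sensitivity={sens:.2f}")
```

Five categories yield six points: the four cut points plus the two endpoints.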
|
|
It is often convenient to make some assumptions about the distribution of the test results and then to draw the ROC curve on the basis of the assumed distribution (ie, assumed model). The resulting curve is called the fitted or smooth ROC curve. The fitted curve for the mammography study is plotted in Figure 1; it was constructed from a binormal distribution (ie, two normal distributions: one for the test results of patients without breast cancer and another for test results of patients with breast cancer) (Fig 2). The binormal distribution is the most commonly used distribution for estimating the smooth ROC curve. There are computer programs (for example, www-radiology.uchicago.edu/sections/roc/software.cgi) for estimating the smooth ROC curve on the basis of the binormal distribution; these programs make use of a statistical method called maximum likelihood estimation.
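For readers who want to see the shape of a binormal curve, here is a minimal Python sketch. It does not perform maximum likelihood estimation; it only draws the curve for parameters that are already known. The parameterization a = (mean with disease - mean without disease) / SD with disease and b = SD without disease / SD with disease is the conventional one, and the numeric values used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binormal_params(mu_neg, sd_neg, mu_pos, sd_pos):
    """Convert the two normal distributions into the usual (a, b) pair."""
    a = (mu_pos - mu_neg) / sd_pos
    b = sd_neg / sd_pos
    return a, b


def binormal_sensitivity(fpr, a, b):
    """Sensitivity on the smooth binormal ROC curve at a given FPR."""
    if fpr <= 0.0:
        return 0.0
    if fpr >= 1.0:
        return 1.0
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpr))


def binormal_auc(a, b):
    """Area under the binormal ROC curve."""
    return STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


if __name__ == "__main__":
    # Hypothetical distributions of test results (not the fitted values
    # from the mammography study).
    a, b = binormal_params(mu_neg=0.0, sd_neg=1.0, mu_pos=1.5, sd_pos=1.0)
    for fpr in (0.05, 0.10, 0.20, 0.50):
        print(f"FPR={fpr:.2f}  sensitivity={binormal_sensitivity(fpr, a, b):.3f}")
    print(f"area={binormal_auc(a, b):.3f}")
```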
|
The term receiver operating characteristic curve comes from the idea that, given the curve, we, the receivers of the information, can use (or operate at) any point on the curve by using the appropriate cut point. The clinical application determines which cut point is used. For example, for evaluating women with a personal history of breast cancer, we need a cut point with good sensitivity (eg, cut point 2 in Table 1), even if the FPR is high. For evaluating women without a personal history of breast cancer, we require a lower FPR. For each application, the optimal cut point can be determined (2,7) by finding the sensitivity and specificity pair that maximizes the function sensitivity - m(1 - specificity), where m is the slope of the ROC curve as follows:
|
|
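Treating m as a given number, the maximization of sensitivity - m(1 - specificity) over the available cut points can be sketched in Python as follows. The ROC points and the values of m are hypothetical, and `best_operating_point` is an illustrative name.

```python
# Assumed (FPR, sensitivity) points, one per candidate cut point plus the
# two endpoints; hypothetical values for illustration only.
ROC_POINTS = [(0.00, 0.00), (0.02, 0.40), (0.28, 0.76),
              (0.64, 0.96), (0.70, 0.96), (1.00, 1.00)]


def best_operating_point(points, m):
    """Return the (FPR, sensitivity) pair that maximizes
    sensitivity - m * FPR, where FPR = 1 - specificity."""
    return max(points, key=lambda p: p[1] - m * p[0])


if __name__ == "__main__":
    for m in (0.5, 1.0, 2.0):   # hypothetical slopes
        fpr, sens = best_operating_point(ROC_POINTS, m)
        print(f"m={m}: FPR={fpr:.2f}, sensitivity={sens:.2f}")
```

With these assumed points, m = 0.5 selects (0.64, 0.96), m = 1 selects (0.28, 0.76), and m = 2 selects (0.02, 0.40): a steeper slope favors a stricter cut point with a lower FPR.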
| MEASURES OF ACCURACY BASED ON THE ROC CURVE |
|---|
|
|
|---|
|
In Figure 1 the area under the empirical ROC curve for mammography is 0.82; that is, if we select two patients at random (one with breast cancer and one without), the probability is 0.82 that the patient with breast cancer will have a more suspicious mammographic result. The area under the fitted curve is slightly larger at 0.84. When the number of cut points is small, the area under the empirical ROC curve is usually smaller than the area under the fitted curve.
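One simple way to obtain the area under an empirical ROC curve is the trapezoidal rule: connect the points with straight lines and add up the trapezoids. The Python sketch below assumes a hypothetical set of (FPR, sensitivity) points; it illustrates the arithmetic and is not necessarily the procedure used to produce the figures quoted above.

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve through (x, y) points
    that are sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Assumed empirical ROC points (hypothetical values), endpoints included.
    roc_points = [(0.00, 0.00), (0.02, 0.40), (0.28, 0.76),
                  (0.64, 0.96), (0.70, 0.96), (1.00, 1.00)]
    print(f"empirical area = {trapezoid_area(roc_points):.3f}")
```

Because straight lines between a few points lie on or below a concave smooth curve, this calculation tends to give a smaller area than the fitted curve when the number of cut points is small.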
The ROC curve area is a good summary measure of test accuracy because it does not depend on the prevalence of disease or the cut points used to form the curve. However, once a test has been shown to distinguish patients with disease from those without disease well, the performance of the test for particular applications (eg, diagnosis, screening) must be evaluated. At this stage, we may be interested in only a small portion of the ROC curve. Furthermore, the ROC curve area may be misleading when one is comparing the accuracies of two tests. Figure 4 illustrates the ROC curves of two tests with equal area. At the clinically important FPR range (for example, 0.0–0.2), however, the curves are different: ROC curve A demonstrates higher sensitivity than does ROC curve B. Whenever the ROC curves of two tests cross (regardless of whether or not their areas are equal), it means that the test with superior accuracy (ie, higher sensitivity) depends on the FPR range; a global measure of accuracy, such as the ROC curve area, is not helpful here.
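The situation of two crossing curves with equal area can be reproduced with a small Python sketch. It assumes binormal curves, for which sensitivity = Φ(a + b·Φ⁻¹(FPR)) and area = Φ(a / √(1 + b²)). The parameter pairs for curves A and B are hypothetical and were chosen only so that the two areas coincide; they are not the curves of Figure 4.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def sensitivity(fpr, a, b):
    """Sensitivity of a binormal ROC curve at an FPR strictly between 0 and 1."""
    return PHI.cdf(a + b * PHI.inv_cdf(fpr))


def area(a, b):
    """Area under a binormal ROC curve."""
    return PHI.cdf(a / sqrt(1.0 + b * b))


# Hypothetical (a, b) pairs with identical areas.
CURVE_A = (1.0, 0.5)
CURVE_B = (sqrt(1.6), 1.0)   # sqrt(1.6)/sqrt(2) equals 1/sqrt(1.25)

if __name__ == "__main__":
    print(f"area A = {area(*CURVE_A):.3f}, area B = {area(*CURVE_B):.3f}")
    for fpr in (0.05, 0.10, 0.20, 0.50, 0.80):
        print(f"FPR={fpr:.2f}  A={sensitivity(fpr, *CURVE_A):.3f}  "
              f"B={sensitivity(fpr, *CURVE_B):.3f}")
```

In this sketch curve A has the higher sensitivity at low FPRs and curve B at high FPRs, even though the areas are the same.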
|
One alternative is to use the ROC curve to estimate sensitivity at a fixed FPR (or, as appropriate, we could use the FPR at a fixed sensitivity). As an example, in Figure 1 the sensitivity at a fixed FPR of 0.10 is 0.60. This measure of accuracy allows us to focus on the portion of the ROC curve that is of clinical relevance.
Another alternative measure of accuracy is the partial area under the ROC curve. It is defined as the area between two FPRs, e1 and e2 (or, as appropriate, the area between two false-negative rates). If e1 = 0 and e2 = 1, then the area under the entire ROC curve is specified. If e1 = e2, then the sensitivity at a fixed FPR is given. The partial area measure is thus a "compromise" between the entire ROC curve area and the sensitivity at a fixed FPR.
To interpret the partial area, we must consider its maximum possible value. The maximum area is equal to the width of the interval, that is, e2 - e1 (13). McClish (13) and Jiang et al (14) recommend standardizing the partial area by dividing it by its maximum value. Jiang et al (14) refer to this standardized partial area as the partial area index. The partial area index is interpreted as the average sensitivity for the range of FPRs examined (or the average FPR for the range of sensitivities examined). As an example, in Figure 1, the partial area in the FPR range of 0.00–0.20 is 0.112; the partial area index is 0.56. In other words, when the FPR is between 0.00 and 0.20, the average sensitivity is 0.56.
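The partial area and the partial area index can be sketched in Python for a piecewise-linear ROC curve. This is an illustration under stated assumptions: the ROC points are hypothetical, straight-line interpolation is used between them (a fitted curve would give somewhat different values), and the function names are made up for this example. Using the article's own numbers, the index is simply 0.112 / (0.20 - 0.00) = 0.56.

```python
def interpolate(points, fpr):
    """Linearly interpolated sensitivity at a given FPR on an ROC curve
    supplied as (FPR, sensitivity) points sorted by FPR."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("fpr lies outside the range of the curve")


def partial_area(points, e1, e2):
    """Area under the piecewise-linear ROC curve between FPRs e1 and e2."""
    inside = [(x, y) for x, y in points if e1 < x < e2]
    clipped = ([(e1, interpolate(points, e1))] + inside
               + [(e2, interpolate(points, e2))])
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(clipped, clipped[1:]))


def partial_area_index(points, e1, e2):
    """Partial area divided by its maximum possible value, e2 - e1."""
    if e1 == e2:
        # Degenerate interval: the sensitivity at a fixed FPR.
        return interpolate(points, e1)
    return partial_area(points, e1, e2) / (e2 - e1)


if __name__ == "__main__":
    # Assumed empirical ROC points (hypothetical values).
    roc_points = [(0.00, 0.00), (0.02, 0.40), (0.28, 0.76),
                  (0.64, 0.96), (0.70, 0.96), (1.00, 1.00)]
    print(f"partial area 0.00-0.20: {partial_area(roc_points, 0.0, 0.2):.3f}")
    print(f"partial area index:     {partial_area_index(roc_points, 0.0, 0.2):.3f}")
    print(f"full area (e1=0, e2=1): {partial_area(roc_points, 0.0, 1.0):.3f}")
```

Setting e1 = 0 and e2 = 1 recovers the whole area, and e1 = e2 returns the sensitivity at that FPR, which mirrors the two limiting cases described above.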
| EXAMPLES OF ROC CURVES IN RADIOLOGY |
|---|
|
|
|---|
The first example is the study of Mushlin et al (15) of the accuracy of magnetic resonance (MR) imaging for detecting multiple sclerosis (MS). Three hundred three patients suspected of having MS underwent MR imaging and CT of the head. The images were read separately by two neuroradiologists without knowledge of the clinical course of or final diagnosis given to the patients. The images were scored as definitely showing MS, probably showing MS, possibly showing MS, probably not showing MS, or definitely not showing MS. The reference standard consisted of results of a review of the clinical findings by a panel of MS experts, results of follow-up for at least 6 months, and results of other diagnostic tests; the results of CT and MR imaging were not included to avoid bias.
The estimated ROC curve area for MR imaging was 0.82, indicating a good, but not definitive, test. In contrast, the estimated ROC curve area of CT was only 0.52; this estimated area was not significantly different from 0.50, indicating that CT results were no more accurate than guessing for diagnosing MS. The authors concluded that a "definite MS" reading at MR imaging essentially establishes the diagnosis of MS (MR images in only two of 140 patients without MS were scored as definitely showing MS, for an FPR of 1%). However, a normal MR imaging result does not conclusively exclude the diagnosis of MS (MR images in 35 of 163 patients with MS were scored as definitely not showing MS, for a false-negative rate of 21%).
In the second example, Iinuma et al (16) compared the accuracy of conventional radiography and digital radiography for the diagnosis of gastric cancers. One hundred twelve patients suspected of having gastric cancer underwent conventional radiography, and 113 different patients with similar symptoms and characteristics underwent digital radiography. Six readers interpreted the images from all 225 patients; the readers were blinded to the clinical details of the patients. The images were scored with a six-category scale, in which a score of 1 indicated that cancer was definitely absent; a score of 2, cancer was probably absent; a score of 3, cancer was possibly absent; a score of 4, cancer was possibly present; a score of 5, cancer was probably present; and a score of 6, cancer was definitely present. The diagnostic standard consisted of the findings of a consensus panel of three radiologists (not the same individuals as the six readers) who examined the patients and were told of the findings of other tests, such as endoscopy and histopathologic examination after biopsy.
The ROC curve areas of the six readers were all higher with digital radiography than with conventional radiography; the average ROC curve areas with digital and conventional radiography were 0.93 and 0.80, respectively. By plotting the fitted ROC curves of each of the six readers, the authors determined that for five of the six readers, digital radiography resulted in higher sensitivity for all FPRs; for the sixth reader, digital radiography resulted in considerably higher sensitivity only at a low FPR.
In summary, the ROC curve has many advantages as a measure of the accuracy of a diagnostic test: (a) It includes all possible cut points, (b) it shows the relationship between the sensitivity of a test and its specificity, (c) it is not affected by the prevalence of disease, and (d) from it we can compute several useful summary measures of test accuracy (eg, ROC curve area, partial area). The ROC curve alone cannot provide us with the optimal cut point for a particular clinical application; however, given information about the pretest probability of disease and the relative costs of diagnostic test errors, we can find the optimal cut point on the ROC curve. There are many study design issues (eg, patient and reader selection, verification and diagnostic standard bias) that need to be considered when one is conducting and interpreting the results of a study of diagnostic test accuracy. Many of these issues will be covered in a future article.
| APPENDIX |
|---|
|
|
|---|
Figure A1 depicts a fictitious data set. The process described and illustrated in the figure can be written mathematically as follows (8): Let Xj denote the test score of the jth patient with disease and Yk denote the test score of the kth patient without disease. Then,
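$$\mathrm{score}(X_j, Y_k) = \begin{cases} 1 & \text{if } X_j > Y_k \\ 1/2 & \text{if } X_j = Y_k \\ 0 & \text{if } X_j < Y_k \end{cases}$$
That is, the score equals 1 if Xj is greater than Yk, equals 1/2 if Xj is equal to Yk, and equals 0 if Xj is less than Yk. The symbol Σ in the following formula denotes a sum over all pairs of one patient with disease and one patient without disease:
$$\text{ROC curve area} = \frac{1}{n_D\, n_N} \sum_{j=1}^{n_D} \sum_{k=1}^{n_N} \mathrm{score}(X_j, Y_k)$$
where nD and nN are the numbers of patients with and without disease, respectively.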
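The pairwise scoring process can be sketched in Python as follows. The sketch assumes the usual convention that a pair scores 1 when the patient with disease has the higher test score, 1/2 when the two scores are tied, and 0 otherwise, and that the area is the average score over all pairs. The test scores are fictitious and the function names are illustrative.

```python
def pair_score(x, y):
    """Score for one (patient with disease, patient without disease) pair."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def area_by_pairs(diseased_scores, healthy_scores):
    """Average pair score over every diseased/non-diseased pairing."""
    total = sum(pair_score(x, y)
                for x in diseased_scores
                for y in healthy_scores)
    return total / (len(diseased_scores) * len(healthy_scores))


if __name__ == "__main__":
    # Fictitious test scores (higher means more suspicious).
    with_disease = [3, 4, 5]
    without_disease = [1, 3, 4]
    print(f"area = {area_by_pairs(with_disease, without_disease):.3f}")
```

For ordinal or continuous scores this average equals the trapezoidal area under the empirical ROC curve.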
| FOOTNOTES |
|---|
| REFERENCES |
|---|
|
|
|---|