Statistical Concepts Series
1 From the Department of Radiology and Health Outcomes Policy and Economics (H.O.P.E.) Center, Children's Hospital, 3100 SW 62nd Ave, Miami, FL 33155; and Departments of Orthopaedic Surgery and Biostatistics, Children's Hospital, Harvard Medical School, Boston, Mass. Received September 17, 2001; revision requested November 12; revision received December 17; accepted January 21, 2002. Address correspondence to L.S.M. (e-mail: smedina@post.harvard.edu).
| ABSTRACT |
|---|
© RSNA, 2003
Index terms: Radiology and radiologists, research; Statistical analysis
| INTRODUCTION |
|---|
Precise knowledge of important statistical parameters, such as the SD, the standard error of the mean (SEM), and the CIs, will provide the radiologist with answers to the questions previously posed. Most of these parameters can be quickly and easily obtained with a small calculator. In addition, these parameters are useful while reading the literature. Appropriate understanding and use of these fundamental statistics, namely, the SD, the SEM, and the CI, will allow more reliable analysis, interpretation, and communication of clinical information among health care providers and between these providers and their patients.
| WHAT ARE RANDOM SAMPLING AND THE CENTRAL LIMIT THEOREM? |
|---|
Random sampling has other advantages. Because the sample is randomly selected, the methods of probability theory can be applied to the data obtained. This enables the clinician to estimate the likely size of the errors that may occur, for example, with the SD or CIs, and to present them as part of the results (2, pp 33–36).
In general, if one has any series of independent identically distributed random variables, then their sum tends to produce a normal distribution as the number of variables increases (2, pp 116–120) (Fig 1). This fundamental theorem in statistics is known as the central limit theorem (1, pp 248–277; 2, pp 116–120). Simply stated, as sample size increases, the means of samples from a population of any distribution will approach the normal (Gaussian) distribution. This is an important property because it allows clinicians to use the normal distribution to formulate inferences from the data about means of populations. In addition, the variability of means of samples obtained from a population decreases as the sample size increases. However, the sample size required to make use of the central limit theorem depends on the underlying distribution of the population, and skewed populations require larger samples.
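The behavior described by the central limit theorem can be illustrated with a short simulation. The following Python sketch is not part of the original article. It assumes an exponential population (mean 1, SD 1) as a stand-in for a skewed distribution; the function name `sample_means`, the number of repetitions, and the random seed are arbitrary choices made for illustration.

```python
import random
import statistics


def sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


for n in (5, 50):
    means = sample_means(n)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f}  "
        f"SD of sample means={statistics.stdev(means):.3f}  "
        f"theoretical SD/sqrt(n)={1 / n ** 0.5:.3f}"
    )
```

As the sample size grows from 5 to 50, the SD of the sample means should shrink toward SD/√n, which is the decrease in variability of sample means that the paragraph above describes.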
| WHAT IS THE DIFFERENCE BETWEEN THE SD AND THE SEM? |
|---|
The following example illustrates the difference between the SD and the SEM and why one should summarize data by using the SD. Suppose that, in a study sample of patients with atherosclerotic disease, an investigator reported that the mean peak systolic velocity (PSV) in the carotid artery was 220 cm/sec and the SD was 10 cm/sec. Since the PSV in about 95% of all population members is within roughly 2 SDs of the mean, the results would tell one that, assuming the distribution is approximately normal, it would be unusual to observe a PSV less than 200 cm/sec or greater than 240 cm/sec in moderate atherosclerotic disease of the carotid artery. The article would therefore provide both a summary of the population and a range with which the clinician can compare specific patients.
Unfortunately, the investigator is quite likely to say that the PSV of the common carotid artery was 220 cm/sec ± 1.6 (SEM). If one confused the SEM with the SD, one would believe that the range of most of the population was narrow, between 216.8 and 223.2 cm/sec. These values instead describe the range that about 95% of the time includes the mean PSV of the entire population from which the sample of patients was chosen. The SEM is simply a measure of how far the sample mean is likely to be from the actual population mean. In practice, however, one generally wants to compare an individual patient's PSV with the spread of the population distribution as a whole and not with the population mean (3). This information is provided by the SD and not by the SEM.
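The two ranges in this example can be reproduced with a few lines of Python. This is a minimal sketch that is not part of the original article. The variable names are illustrative, and the sample size is not stated in the text; the value computed below is only what the quoted SD and SEM would imply through SEM = SD/√n.

```python
mean_psv = 220.0  # cm/sec, sample mean from the example
sd = 10.0         # cm/sec, SD from the example
sem = 1.6         # cm/sec, SEM from the example

# About 95% of individual patients fall within roughly 2 SDs of the mean.
population_range = (mean_psv - 2 * sd, mean_psv + 2 * sd)

# About 95% of the time the population mean lies within roughly 2 SEMs
# of the sample mean.
mean_range = (mean_psv - 2 * sem, mean_psv + 2 * sem)

# SEM = SD / sqrt(n), so n = (SD / SEM) ** 2. This n is inferred, not quoted.
implied_n = (sd / sem) ** 2

print("range for individual patients:", population_range)
print("range for the population mean:", mean_range)
print("sample size implied by SD and SEM:", round(implied_n))
```

The first range is the one to use when judging whether an individual patient's PSV is unusual; the second describes only the uncertainty in the estimated mean.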
| WHAT ARE CIs? |
|---|
A CI is the range of values that is believed to encompass the actual ("true") population value (1, pp 55–63). This true population value or parameter of interest usually is not known, but it does exist and can be estimated from an appropriately selected sample. CIs around population estimates provide information about how precise the estimate is. Wider CIs indicate lesser precision, while narrower ones indicate greater precision (Figs 1, 2). CIs provide bounds to estimates.
If one repeatedly obtained samples from the population and constructed a CI for each sample, then one could expect a certain percentage of the CIs to include the true population value and a certain percentage of them not to include that value. For example, with a 95% CI, one expects 95% of such CIs obtained in repeated sampling to include the true parameter value and only 5% of the CIs not to include it.
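The repeated-sampling interpretation can be checked by simulation. The Python sketch below is an illustration only and is not taken from the article. It assumes a normal population with a mean of 220 and a known SD of 10, a sample size of 40, and a z value of 1.96; the function name `coverage` and all default values are hypothetical choices.

```python
import math
import random
import statistics


def coverage(true_mean=220.0, sd=10.0, n=40, reps=2000, z=1.96, seed=1):
    """Fraction of simulated CIs (mean +/- z * SD / sqrt(n)) that contain
    the true population mean."""
    rng = random.Random(seed)
    half_width = z * sd / math.sqrt(n)  # population SD treated as known
    hits = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(
            rng.gauss(true_mean, sd) for _ in range(n)
        )
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


print(f"fraction of 95% CIs that include the true mean: {coverage():.3f}")
```

With these assumptions the printed fraction should be close to 0.95, and it should fall if a smaller multiplier than 1.96 is used.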
| HOW ARE CIs FOR A MEAN CALCULATED? |
|---|
One can calculate an interval for any desired degree of confidence, although 95% CIs are by far the most commonly used. The following equation is the usual method for calculating a 95% CI that is based on a normally distributed sample with a known SD or one that is based on a sample from a population with an unknown SD but in which the population is known to be normally distributed and the sample itself is large (ie, n > 100):

$$95\%\ \mathrm{CI} = \text{mean} \pm z \times \frac{\mathrm{SD}}{\sqrt{n}} \tag{1}$$

where z = 1.96 is the standard normal value for a 95% CI and SD/√n is the SEM.
The scale of z scores is independent of the units of measurement. Therefore, for any measurement being investigated, one can calculate an individual's z score and compare it with that of other individuals. The z scores are calculated from the sample data as (X - mean)/SD, where X is the individual's actual value. For example, if an individual's value is 1 SD above the mean for the group, that individual's z score is 1.0; a value 1 SD below the mean corresponds to a z score of -1.0. Approximately 68% of the area under the normal curve includes z scores between -1.0 and 1.0, approximately 95% of the area includes z scores between -2.0 and 2.0, and 99.7% of the area under the normal curve includes z scores between -3.0 and 3.0.
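The z score formula and the areas quoted above can be verified with Python's standard library. This sketch is an illustration rather than part of the article; `z_score` and `area_within` are hypothetical helper names, and the PSV values passed to `z_score` are borrowed from the earlier example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, SD 1


def z_score(x, mean, sd):
    """Number of SDs by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


print(z_score(230, 220, 10))  # 1.0, ie, 1 SD above the mean
for z in (1.0, 1.96, 2.0, 3.0):
    print(f"area between -{z} and +{z}: {area_within(z):.4f}")
```

The area between -1.96 and 1.96 is almost exactly 0.95, which is why 1.96 is the z value used for a 95% CI, whereas ±2.0 covers slightly more than 95%.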
Equation (1) can be applied when the data conform to a normal (Gaussian) distribution and when the population SD is known. When the sample is small (n < 100) and the population SD is not known, one must rely on the sample SD, which requires setting CIs by using the t distribution. In this situation, the z value should be replaced with the appropriate critical value of the t distribution with n - 1 degrees of freedom, where n is the sample size.
The t distribution, or the Student t distribution, resembles the normal distribution, although its shape depends on the sample size. It is wider than the normal distribution to account for variability in estimating the mean and SD from the sample data (5). The t distribution differs from the normal distribution in that it assumes different shapes depending on the number of degrees of freedom. Therefore, when setting a CI around a mean, the appropriate critical value of the t distribution should be used in place of the z value in Equation (1). This t value can be found in a conventional t table included in most statistical textbooks. For example, in a study with a sample size of 25, the critical value for a t distribution that corresponds to a 95% CI, where 1 - α is the confidence level and n - 1 indicates the degrees of freedom, is 2.064.
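In place of a printed t table, the critical value can be computed numerically. The Python sketch below is one possible approach and is not taken from the article: it integrates the Student t density with Simpson's rule and finds the two-sided critical value by bisection. The function names, the number of integration steps, and the search bounds are assumptions made for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, confidence=0.95):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2  # 0.975 for a 95% CI
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(24), 3))  # 2.064 for n = 25, ie, 24 degrees of freedom
```

For 24 degrees of freedom this reproduces the tabulated value of 2.064, and the result approaches the z value of 1.96 as the degrees of freedom increase.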
CIs can be constructed for any desired level of confidence. There is nothing magical about 95%, although it is traditionally used. If greater confidence is needed, then the CIs have to be wider. Consequently, 99% CIs are wider than 95% CIs, and 90% CIs are narrower than 95% CIs. Wider CIs are associated with greater confidence but less precision. This is the trade-off.
If one assumes that a sample was randomly selected from a certain population (that follows a normal distribution), one can be 95% sure that the CI includes the population mean. More precisely, if one generates many 95% CIs from many data sets, one can expect that the CI will include the true population mean in 95% of the cases and that the CI will not include the true mean value in the other 5%. Therefore, the 95% CI is related to statistical significance at the .05 level, which means that the CI itself can be used to determine if an estimated change is statistically significant at the .05 level (1, pp 55–63).
Whereas the P value is often interpreted as an indication of a statistically significant difference, the CI, by providing a range of values, allows the reader to interpret the implications of the results at either end of the range (1, pp 55–63; 6). For example, if one end of the range includes clinically important results but the other does not, the results can be regarded as inconclusive, not simply as an indication of a statistically significant difference or not. In addition, whereas P values are not presented in units, CIs are presented in the units of the variable of interest, and this latter presentation helps readers to interpret the results. CIs are generally preferred to P values because CIs shift the interpretation from a qualitative judgment about the role of chance to a quantitative estimation of the biologic measure of effect (1, pp 55–63; 6). More importantly, the CI quantifies the precision of the mean.
For example, findings in two hypothetical articles about ultrasonography (US) of the carotid artery in elderly patients indicate that a mean PSV of 200 cm/sec is associated with a 70% stenosis of the vascular diameter. Both articles reported the same SD of 50 cm/sec. However, one article was about a study that included 50 subjects, whereas the other one was about a study that included 500 subjects. At first glance, both articles appear to provide the same information. The difference between them is delineated with the calculations here.
The calculations in the article with the smaller sample were as follows:
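$$95\%\ \mathrm{CI} = 200 \pm 1.96 \times \frac{50}{\sqrt{50}} = 200 \pm 14 = 186 \text{ to } 214\ \text{cm/sec}$$

The calculations in the article with the larger sample were as follows:

$$95\%\ \mathrm{CI} = 200 \pm 1.96 \times \frac{50}{\sqrt{500}} = 200 \pm 4 = 196 \text{ to } 204\ \text{cm/sec}$$

The larger sample yields a much narrower CI and, therefore, a more precise estimate of the mean PSV.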
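The comparison of the two hypothetical studies can be sketched in Python. This example is not part of the original article. It assumes the form mean ± z × SD/√n with z = 1.96 for both studies; strictly, the text would call for a t critical value (about 2.01) for the study with 50 subjects, which changes the limits only slightly. The function name `ci_mean` is illustrative.

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """Return (SEM, lower limit, upper limit) for mean +/- z * SD / sqrt(n)."""
    sem = sd / math.sqrt(n)
    half_width = z * sem
    return sem, mean - half_width, mean + half_width


for n in (50, 500):
    sem, lower, upper = ci_mean(200.0, 50.0, n)
    print(f"n={n:3d}  SEM={sem:.2f}  95% CI={lower:.0f} to {upper:.0f} cm/sec")
```

Both studies report the same mean and SD, but the tenfold larger sample gives an interval that is roughly one-third as wide, because the SEM shrinks with the square root of the sample size.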
| WHY ARE CIs FOR SENSITIVITY AND SPECIFICITY OF A TEST IMPORTANT? |
|---|
The simplest diagnostic test is dichotomous, in which the results are used to classify patients into two groups according to the presence or absence of disease. Magnetic resonance (MR) imaging and arthroscopic findings from a hypothetical example are delineated in Table 1. In this hypothetical study, arthroscopy is considered the standard of reference. The question that arises in the clinical setting is, "How good is knee MR imaging at helping to distinguish torn and intact anterior cruciate ligaments (ACLs)?" In other words, "To what degree can one rely on the interpretation of MR imaging in making judgments about the status of a patient's knee?"
Sensitivity is calculated as the proportion of torn ACLs that were correctly classified by using MR imaging. In this example, of the 421 knees with ACL tears, 394 were correctly evaluated with MR imaging (Table 1). The sensitivity of MR imaging in the detection of ACL tears is, therefore, 94% (ie, sensitivity = 394/421 = 0.94). In other words, 94% of ACL tears were correctly classified as torn by using MR imaging. The 95% CI for a proportion can be determined by the equation shown here:
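$$95\%\ \mathrm{CI} = p \pm z \sqrt{\frac{p(1 - p)}{n}} \tag{2}$$

where p is the proportion, n is the number of subjects on which the proportion is based, and z = 1.96 for a 95% CI. By using Equation (2), the 95% CI for sensitivity is 0.94 ± 0.02 or 0.92 to 0.96.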
Specificity is calculated as the proportion of intact ACLs that were correctly classified by using MR imaging. Of the 133 knees with an intact ACL, 101 were correctly classified. The specificity of MR imaging is, therefore, 76% (ie, specificity = 101/133 = 0.76). This means that 76% of intact ACLs were correctly classified as intact by using MR imaging. By using Equation (2), the 95% CI for specificity is 0.76 ± 0.07 or 0.69 to 0.83. Therefore, one expects MR imaging to have a specificity between 69% and 83%. It is also important to note that the CI was wider for specificity than it was for sensitivity because the sample groups were 133 (smaller) and 421 (larger), respectively.
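The sensitivity and specificity calculations can be reproduced with the counts quoted in the text. The Python sketch below is an illustration, not the authors' code. It assumes the normal-approximation interval p ± 1.96 × √(p(1 − p)/n), which matches the half-width of 0.07 reported for specificity; the function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (p, half-width, lower, upper) for a normal-approximation CI."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, half_width, p - half_width, p + half_width


# Counts from the example: 394 of 421 torn ACLs and 101 of 133 intact ACLs
# were correctly classified with MR imaging.
for label, correct, total in (("sensitivity", 394, 421),
                              ("specificity", 101, 133)):
    p, half, lower, upper = proportion_ci(correct, total)
    print(f"{label}: {p:.2f} +/- {half:.2f} ({lower:.3f} to {upper:.3f})")
```

The half-width for specificity is about three times that for sensitivity, reflecting both the smaller group (133 vs 421 knees) and a proportion that lies farther from 1.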
| CAN CIs FOR ODDS RATIOS BE CALCULATED? |
|---|
Data from a study by Kocher et al (8) are shown in Table 2 and can be summarized as follows: For those patients with radiographic effusion, the odds of having septic arthritis are 63/33 (septic arthritis/no septic arthritis) = 1.9. The odds of having septic arthritis for those with no radiographic effusion are 19/53 (septic arthritis/no septic arthritis) = 0.36. The OR is the ratio of these two odds: 1.9/0.36 = 5.3. This means that the odds of septic arthritis are approximately five times higher in children with a radiographic effusion than in those without a radiographic effusion. The OR is sometimes referred to as the cross product ratio because it can be calculated by multiplying the counts in the diagonal cells and dividing the products as follows (Table 2): OR = ad/bc = (63 × 53)/(33 × 19) = 3,339/627 = 5.3.
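The two routes to the OR can be checked with a short Python sketch. This example is not part of the original article; the cell labels a, b, c, and d follow the cross product formula quoted above, and the assignment of counts to cells is inferred from the odds given in the text.

```python
# 2 x 2 counts inferred from the text:
a = 63  # effusion, septic arthritis
b = 33  # effusion, no septic arthritis
c = 19  # no effusion, septic arthritis
d = 53  # no effusion, no septic arthritis

odds_effusion = a / b      # odds of septic arthritis with effusion
odds_no_effusion = c / d   # odds of septic arthritis without effusion

odds_ratio = odds_effusion / odds_no_effusion
cross_product_ratio = (a * d) / (b * c)

print(f"odds with effusion:    {odds_effusion:.2f}")
print(f"odds without effusion: {odds_no_effusion:.2f}")
print(f"OR from the two odds:  {odds_ratio:.1f}")
print(f"OR from ad/bc:         {cross_product_ratio:.1f}")
```

Both calculations give the same value because dividing a/b by c/d is algebraically identical to ad/bc.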
Several methods are commonly used to construct CIs around the OR. A simple method for constructing CIs (8) can be expressed as follows:
|
By using Equation (3), the 95% CI in our example is calculated as follows:
|
Therefore, among children who have acute hip pain at presentation, the odds of septic arthritis are, on average, 5.3 times higher for those with a radiographic effusion than for those with no radiographic effusion. The 95% CI lower limit of the OR is 2.7 and the upper limit is 10.5. When the 95% CI does not include 1.0 (as in this example), the results indicate a statistically significant difference at the .05 level (ie, P < .05).
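A CI of this kind can be approximated in Python. Because the article's Equation (3) is not preserved in this text, the sketch below assumes the commonly used logit (Woolf) method, in which the CI is calculated on the natural log scale as ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d) and then exponentiated. This may not be identical to the authors' method, and the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with an approximate CI from the logit (Woolf) method.
    Assumed method; the article's own Equation (3) is not available here."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_value, lower, upper = odds_ratio_ci(63, 33, 19, 53)
print(f"OR = {or_value:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
print("excludes 1.0:", lower > 1.0 or upper < 1.0)
```

Under this assumption the lower limit agrees with the reported 2.7, and the upper limit comes out near 10.4 rather than the reported 10.5, a difference that could arise from rounding of intermediate values. Either way the interval excludes 1.0, consistent with significance at the .05 level.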
| CONCLUSION |
|---|
CIs can be calculated for means as well as for proportions and ratios. Measures commonly used in medicine include sensitivity, specificity, and the OR. These measures should always be accompanied by 95% CIs. Proper understanding and use of fundamental statistics, such as the SD, the SEM, and the CI, and their calculations will allow more reliable analysis, interpretation, and communication of clinical data to patients and to referring physicians.