Statistical Concepts Series
1 From the Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (K.H.Z., J.R.F., S.G.S., C.M.C.T.); and Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (K.H.Z.). Received September 10, 2001; revision requested November 8; revision received December 12; accepted December 19. Supported in part by Public Health Service Grant NIH-U01 CA9398-03 awarded by the National Cancer Institute, Department of Health and Human Services. Address correspondence to K.H.Z. (e-mail: zou@bwh.harvard.edu).
| ABSTRACT |
|---|
© RSNA, 2003
Index terms: Statistical analysis
| INTRODUCTION |
|---|
Estimation can be carried out on the basis of sample values from a larger population (1). Point estimation involves the use of summary statistics, including the sample mean and SD. These values can be used to estimate intervals, such as the 95% confidence interval. For example, by using summary statistics, one can determine the sensitivity or specificity of the size and location of a ureteral stone for prediction of the clinical management required. In a study performed by Fielding et al (2), it was concluded that stones larger than 5 mm in the upper one-third of the ureter were very unlikely to pass spontaneously.
In contrast, hypothesis testing enables one to quantify the degree of uncertainty in sampling variation, which may account for the results that deviate from the hypothesized values in a particular study (3,4). For example, hypothesis testing would be necessary to determine if ovarian cancer is more prevalent in nulliparous women than in multiparous women.
It is important to distinguish between a research hypothesis and a statistical hypothesis. The research hypothesis is a general idea about the nature of the clinical question in the population of interest. The primary purpose of the statistical hypothesis is to establish the basis for tests of significance. Consequently, there is also a difference between a clinical conclusion based on a clinical hypothesis and a statistical conclusion of significance based on a statistical hypothesis. In this article, we will focus on statistical hypothesis testing only.
In this article we review and demonstrate the hypothesis tests for both a single proportion and a comparison of two independent proportions. The topics covered may provide a basic understanding of the quantitative approaches for analyzing radiologic data. Detailed information on these concepts may be found in both introductory (5,6) and advanced textbooks (7–9). Related links on the World Wide Web are listed in Appendix A.
| STATISTICAL HYPOTHESIS TESTING BASICS |
|---|
There are five steps necessary for conducting a statistical hypothesis test: (a) formulate the null (H0) and alternative (H1) hypotheses, (b) compute the test statistic for the given conditions, (c) calculate the resulting P value, (d) either reject or do not reject H0 (reject H0 if the P value is less than or equal to a prespecified significance level [typically .05]; do not reject H0 if the P value is greater than this significance level), and (e) interpret the results according to the clinical hypothesis relevant to H0 and H1. Each of these steps is discussed in the following text.
Null and Alternative Hypotheses
In general, H0 assumes that there is no association between the predictor and outcome variables in the study population. In such a case, a predictor (ie, explanatory or independent) variable is manipulated, and this may have an effect on another outcome or dependent variable. For example, to determine the effect of smoking on blood pressure, one could compare the blood pressure levels in nonsmokers, light smokers, and heavy smokers.
It is mathematically easier to frame hypotheses in null and alternative forms, with H0 being the basis for any statistical significance test. Given the H0 of no association between a predictor variable and an outcome variable, a statistical hypothesis test can be performed to estimate the probability of an association due to chance that is derived from the available data. Thus, one never accepts H0, but rather one rejects it with a certain level of significance.
In contrast, H1 makes a claim that there is an association between the predictor and outcome variables. One does not directly test H1, which is by default accepted when H0 is rejected on the basis of the statistical significance test results.
One- and Two-sided Tests
The investigator must also decide whether a one- or two-sided test is most suitable for the clinical question (4). A one-sided H1 test establishes the direction of the association between the predictor and the outcome; for example, that the prevalence of ovarian cancer is higher in nulliparous women than in parous women. In this example, the predictor is parity and the outcome is ovarian cancer. However, a two-sided H1 test establishes only that an association exists without specifying the direction; for example, the prevalence of ovarian cancer in nulliparous women is different (ie, either higher or lower) from that in parous women. In general, most hypothesis tests involve two-sided analyses.
Test Statistic
The test statistic is a function of summary statistics computed from the data. A general formula for many such test statistics is as follows: test statistic = (relevant statistic - hypothesized parameter value)/(standard error of the relevant statistic), where the relevant statistic and its standard error are calculated on the basis of the sample data. The standard error is the indicator of variability, and much of the complexity of the hypothesis test involves estimating the standard error correctly. H0 is rejected if the test statistic exceeds a certain level (ie, critical value).
For example, for continuous data, the Student t test is most often used to determine the statistical significance of an observed difference between mean values with unknown variances. On the basis of large samples with underlying normal distributions and known variances (5), the Z test of two population means is often conducted. Similar to the t test, the Z test involves the use of a numerator to compare the difference between the sample means of the two study groups with the difference that would be expected with H0, that is, zero difference. The denominator includes the sample size, as well as the variances, of each study group (5).
Once the Z value is calculated, it can be converted into a probability statistic by means of locating the P value in a standard reference table. The Figure illustrates a standard normal distribution (mean of 0, variance of 1) of a test statistic, Z, with two rejection regions that are either below -1.96 or above 1.96. Two hypothetical test statistic values, -0.5 and 2.5, which lie outside and inside the rejection regions, respectively, are also included. Consequently, one does not reject H0 when Z equals -0.5, but one does reject H0 when Z equals 2.5.
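As a minimal sketch of this conversion, the two-sided P value can also be computed directly from the standard normal distribution rather than read from a reference table. The function names below are our own illustrative choices and are not taken from any statistical package; the critical value of 1.96 corresponds to the two rejection regions described above.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided P value for a standard normal test statistic z."""
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def reject_h0(z: float, critical_value: float = 1.96) -> bool:
    """True when z falls in either rejection region (below -1.96 or above 1.96)."""
    return abs(z) > critical_value


# The two hypothetical test statistic values discussed in the text.
for z in (-0.5, 2.5):
    print(f"Z = {z:+.1f}: P = {two_sided_p_value(z):.3f}, reject H0: {reject_h0(z)}")
```

Comparing |Z| with 1.96 and comparing the two-sided P value with .05 lead to the same decision, because 1.96 is the value beyond which the two tails together contain about 5% of the distribution.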
Type I and II Errors
Two types of errors can occur in hypothesis testing: A type I error (significance level α) represents the probability that H0 was erroneously rejected when in fact it is true in the underlying population. Note that the P value is not the same as the α value, which represents the significance level in a type I error. The significance level α is prespecified (5% conventionally), whereas the P value is computed on the basis of the data and thus reflects the strength of the rejection of H0 on the test statistic. A type II error (probability β) represents the probability that H0 was erroneously retained when in fact H1 is true in the underlying population. There is always a trade-off between these two types of errors, and such a relationship is similar to that between sensitivity and specificity in the diagnostic literature (Table) (10). The probability 1 - β is the statistical power and is analogous to the sensitivity of a diagnostic test, whereas the probability 1 - α is analogous to the specificity of a diagnostic test.
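The relationship between α, β, and power can be illustrated with a short sketch for a two-sided Z test. It assumes that the test statistic is normally distributed with unit variance and that, when H1 is true, its mean is shifted away from zero by some number of standard errors. The shift of 2.5 used below is a hypothetical value chosen only for illustration, and the function names are our own.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def rejection_probability(shift: float, critical_value: float = 1.96) -> float:
    """Probability that a two-sided Z test rejects H0 when Z ~ Normal(shift, 1)."""
    lower_tail = normal_cdf(-critical_value - shift)
    upper_tail = 1.0 - normal_cdf(critical_value - shift)
    return lower_tail + upper_tail


# H0 true (no shift): the rejection probability is the type I error rate.
alpha = rejection_probability(0.0)

# H1 true with a hypothetical shift of 2.5 standard errors:
# the rejection probability is the power, and beta is its complement.
power = rejection_probability(2.5)
beta = 1.0 - power

print(f"alpha = {alpha:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

Raising the critical value in this sketch lowers α but also lowers the power, which is the trade-off described above.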
| STATISTICAL TESTS OF PROPORTIONS: THE Z TEST |
|---|
When sample sizes are large, the approximate normality assumptions hold for both the sample proportion and the test statistic. In the test of a single proportion (π) based on a sample of n independent trials at a hypothesized success probability of π0 (the hypothesized proportion), both nπ0 and n(1 - π0) need to be at least 5 (Appendix B). In the comparison of two proportions, π1 and π2, based on two independent sample sizes of n1 and n2 independent trials, respectively, both n1 and n2 need to be at least 30 (Appendix C) (5). The test statistic is labeled Z, and, hence, the analysis is referred to as the Z test of a proportion. Other exact hypothesis-testing methods are available if these minimum numbers are not met.
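These sample size conditions are simple to check before running a Z test. The sketch below encodes the two rules of thumb stated above; the function names are hypothetical, and the inputs are the counts from the two clinical examples in this article (n = 66 with π0 = 0.80, and n1 = 81 with n2 = 24).

```python
def single_proportion_ok(n: int, pi0: float, minimum: float = 5.0) -> bool:
    """Check that both n*pi0 and n*(1 - pi0) are at least 5."""
    return n * pi0 >= minimum and n * (1.0 - pi0) >= minimum


def two_proportions_ok(n1: int, n2: int, minimum: int = 30) -> bool:
    """Check that both independent sample sizes are at least 30."""
    return n1 >= minimum and n2 >= minimum


# Ureteral stone example: 66 * 0.80 = 52.8 and 66 * 0.20 = 13.2.
print(single_proportion_ok(n=66, pi0=0.80))

# Ovarian tumor example: the sample of 24 metastatic neoplasms is below 30.
print(two_proportions_ok(n1=81, n2=24))
```

When a check fails, an exact method may be preferable to the large-sample Z test.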
Furthermore, the Z and Student t tests both are parametric hypothesis tests; that is, they are based on data with an underlying normal distribution. There are many situations in radiology research in which the assumptions needed to use a parametric test do not hold. Therefore, nonparametric tests must be considered (9). These statistical tests will be discussed in a future article.
| TWO RADIOLOGIC EXAMPLES |
|---|
1. H0 is as follows: 80% of the ureteral stones smaller than 6 mm will pass spontaneously (π = 0.80). H1 is as follows: The proportion of the stones smaller than 6 mm that pass spontaneously does not equal 80%; that is, it is either less than or greater than 80% (π ≠ 0.80). This is therefore a two-sided hypothesis test.
2. The test statistic Z is calculated to be 1.29 on the basis of the results of the Z test of a single proportion (5).
3. The P value, .20, is the sum of the two tail probabilities of a standard normal distribution for which the Z values are beyond ±1.29 (Figure).
4. Because the P value, .20, is greater than the significance level α of 5%, H0 is not rejected.
5. Therefore, our data are consistent with the belief that 80% of the stones smaller than 6 mm in diameter will pass spontaneously, as reported in the literature. Thus, H0 is not rejected, given the data at hand. Consequently, it is possible that a type II error has occurred if the true proportion in the population does not equal 80%.
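The five steps above can be sketched in a few lines of code. This is a minimal illustration that assumes the usual large-sample form of the one-sample Z test, in which the standard error is computed from the hypothesized proportion π0; the counts n = 66 and x = 57 are those given for this example in Appendix B, and the function name is our own.

```python
import math


def one_sample_z_test(x: int, n: int, pi0: float) -> tuple[float, float]:
    """Return (Z, two-sided P value) for H0: pi = pi0, given x successes in n trials."""
    p = x / n  # sample proportion of successes
    # Standard error under H0 (assumed large-sample form).
    standard_error = math.sqrt(pi0 * (1.0 - pi0) / n)
    z = (p - pi0) / standard_error
    # Sum of the two tail probabilities beyond +/-|z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z, p_value = one_sample_z_test(x=57, n=66, pi0=0.80)
print(f"Z = {z:.2f}, P = {p_value:.2f}")  # Z = 1.29, P = 0.20

alpha = 0.05
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Under these assumptions the sketch reproduces the Z value of 1.29 and the P value of .20 quoted in steps 2 and 3.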
Two-Sample Z Test to Compare Two Independent Proportions
Brown et al (14) hypothesized that the imaging appearances (eg, multilocularity) of primary ovarian tumors and metastatic tumors to the ovary might be different. Data were obtained from 280 patients who had an ovarian mass and underwent US in the Radiologic Diagnostic Oncology Group (RDOG) ovarian cancer staging trial (15,16). The study results showed that 30 (37%) of 81 primary ovarian cancers, as compared with three (13%) of 24 metastatic neoplasms, were multilocular at US. To test if the respective underlying proportions are different, we conduct a statistical hypothesis test with five steps:
1. H0 is as follows: There is no difference between the proportions of multilocular primary ovarian tumors (π1) and multilocular metastatic tumors (π2) among the primary and secondary ovarian cancers; that is, π1 - π2 = 0. H1 is as follows: There is a difference in these proportions: One is either less than or greater than the other; that is, π1 - π2 ≠ 0. Thus, a two-sided hypothesis test is conducted.
2. The test statistic Z is calculated to be 2.27 on the basis of the results of the Z test to compare two independent proportions (5).
3. The P value, .02, is the sum of the two tail probabilities of a standard normal distribution for which the Z values are beyond ±2.27 (Figure).
4. Because the P value, .02, is less than the significance level α of 5%, H0 is rejected.
5. Therefore, there is a statistically significant difference between the proportion of multilocular masses in patients with primary tumors and that in patients with metastatic tumors.
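The same five steps can be sketched for the two-sample comparison. The sketch assumes the usual large-sample form of the test, in which the standard error is computed from the pooled proportion pc defined in Appendix C; the counts (30 of 81 primary tumors and three of 24 metastatic tumors) are those reported above, and the function name is our own.

```python
import math


def two_sample_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (Z, two-sided P value) for H0: pi1 - pi2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion of successes, pc
    # Standard error of p1 - p2 under H0 (assumed pooled large-sample form).
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / standard_error
    # Sum of the two tail probabilities beyond +/-|z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z, p_value = two_sample_z_test(x1=30, n1=81, x2=3, n2=24)
print(f"Z = {z:.2f}, P = {p_value:.2f}")  # Z = 2.27, P = 0.02

alpha = 0.05
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Under these assumptions the sketch reproduces the Z value of 2.27 and the P value of .02 quoted in steps 2 and 3.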
| SUMMARY AND REMARKS |
|---|
Alternative exact hypothesis-testing methods are available if the sample sizes are not sufficiently large. In the case of a single proportion, the exact binomial test can be conducted. In the case of two independent proportions, the proposed large-sample Z test is equivalent to a test based on contingency table (ie, χ²) analysis. When large samples are not available, however, the Fisher exact test based on contingency table analysis can be adopted (8,17–19). For instance, in the clinical example involving data from the RDOG study, the sample of 24 metastatic neoplasms is slightly smaller than the required sample of 30 neoplasms, and, thus, use of the exact Fisher test may be preferred.
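For readers who wish to try the exact alternative, the following is a minimal sketch of a two-sided Fisher exact test for a 2 x 2 table. It assumes one common definition of the two-sided P value, namely the total probability of all tables with the same margins that are no more probable than the observed table; other definitions exist, and statistical packages may differ. The function name is our own, and the table layout (rows for tumor type, columns for multilocular or not) is an illustrative choice.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k: int) -> int:
        # Unnormalized hypergeometric probability of a table whose
        # top-left cell equals k, with all margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    total = comb(row1 + row2, col1)  # sum of all weights (Vandermonde identity)
    # Integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / total


# RDOG example: 30 of 81 primary tumors and 3 of 24 metastatic tumors were multilocular.
p_value = fisher_exact_two_sided(a=30, b=51, c=3, d=21)
print(f"Fisher exact P = {p_value:.3f}")
```

Because the weights are exact integers, the sketch needs no continuity correction or large-sample approximation.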
The basic concepts and methods reviewed in this article may be applied to similar inferential and clinical trial design problems related to counts and proportions. More complicated statistical methods and study designs may be considered, but these are beyond the scope of this tutorial article (20–24). A list of available software packages can be found by accessing the Web links given in Appendix A.
| APPENDIX A |
|---|
lane/rvls.html, www.bmj.com:/collections/statsbk/index.shtml, and espse.ed.psu.edu/statistics/investigating.htm. In addition, statistical software packages are available at the following address: www.amstat.org/chapters/alaska/resources.htm.
| APPENDIX B |
|---|
Let π be a population proportion to be tested (Table B1). The null hypothesis is H0: π = π0, and the procedure for deciding whether or not to reject it is based on the results of a one-sided, one-sample Z test at the significance level of α with n independent trials (Table B1). The observed number of successes is x, and, thus, the sample proportion of successes is p = x/n. In our first clinical example, that in which the unenhanced helical CT features of 100 ureteral calculi were evaluated (2), π0 = 0.80, n = 66, x = 57, and p = 57/66 (0.86).
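As a worked illustration, and assuming that the statistic in question is the usual large-sample one-sample Z statistic with the standard error computed under H0, the values of the first clinical example give the following. The symbols are those defined in this appendix.

```latex
Z = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}
  = \frac{57/66 - 0.80}{\sqrt{0.80 \times 0.20 / 66}}
  \approx \frac{0.0636}{0.0492}
  \approx 1.29
```

This agrees with the Z value of 1.29 reported for the first radiologic example.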
| APPENDIX C |
|---|
Let π1 and π2 be the two independent population proportions to be compared (Table C1). The null hypothesis is H0: π1 - π2 = 0, and the procedure for deciding whether or not to reject it is based on the results of a one-sided, two-sample Z test at the significance level of α with two independent samples of sizes n1 and n2, respectively (Table C1). The observed numbers of successes in these two samples are x1 and x2, and the sample proportions of successes are p1 = x1/n1 and p2 = x2/n2, respectively. To denote the pooled proportion of successes over the two samples, use the following equation: pc = (x1 + x2)/(n1 + n2). In our second clinical example, that involving 280 patients with ovarian masses in the RDOG ovarian cancer staging trial (15,16), n1 = 81, x1 = 30, n2 = 24, x2 = 3, p1 = x1/n1 (30/81 [0.37]), p2 = x2/n2 (3/24 [0.13]), and pc = (x1 + x2)/(n1 + n2), or 33/105 (0.31).
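As a worked illustration, and assuming that the statistic in question is the usual large-sample two-sample Z statistic with the standard error computed from the pooled proportion pc, the values of the second clinical example give the following. The symbols are those defined in this appendix.

```latex
Z = \frac{p_1 - p_2}{\sqrt{p_c (1 - p_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
  = \frac{30/81 - 3/24}{\sqrt{\frac{33}{105} \cdot \frac{72}{105} \left( \frac{1}{81} + \frac{1}{24} \right)}}
  \approx \frac{0.2454}{0.1079}
  \approx 2.27
```

This agrees with the Z value of 2.27 reported for the second radiologic example.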
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Abbreviations: H0 = null hypothesis, H1 = alternative hypothesis, RDOG = Radiologic Diagnostic Oncology Group
| REFERENCES |
|---|