Special Report
From the Departments of Epidemiology and Biostatistics (M.G., K.S.P.) and Radiology (S.M.L.), Memorial Sloan-Kettering Cancer Center, 1275 York Ave, Box 44, New York, NY 10021. Received January 8, 2001; revision requested February 12; revision received April 23; accepted May 14. Supported by grant 1 P50 CA86438 01 from the National Institutes of Health/National Cancer Institute and by the Hascoe Fund. Address correspondence to M.G. (e-mail: gonenm@mskcc.org).
ABSTRACT
For diagnostic imaging studies with multiple observations per patient, simple adjustments to standard statistical tests, such as the $\chi^2$ and t tests, are introduced. One of these methods is illustrated with a brief analysis of data from a positron emission tomographic (PET) study, which demonstrates how conclusions may be incorrect if appropriate techniques are not applied. Alternative methods that can handle multiple observations and dependency within a subject in diagnostic imaging studies are discussed.
Index terms: Positron emission tomography (PET); Special reports; Statistical analysis
INTRODUCTION
Standard statistical methods, such as t and $\chi^2$ tests, are based on the assumption that all observations are independent of each other. When several observations are obtained from each patient, however, the patient forms a cluster, and observations within the same cluster are correlated. Conclusions drawn from an analysis that assumes all observations are independent will therefore not be valid (1–3). The extent of the problem depends heavily on the magnitude of the correlation and on the number of observations within a cluster. When the correlation between observations in the same patient is positive, ignoring it overstates the statistical significance of a test for association (P values that are too small), which may lead to an erroneous conclusion.

As an example, consider positron emission tomography (PET), a digital survey technique used in oncology to help detect tumor sites and to monitor the metabolic response of tumors to anticancer therapy; patients with advanced cancer may present with multiple tumor sites (4–6). Because PET offers the possibility of imaging the whole body, as many as 60 lesions may be observed in one patient. In sicker patients (patients exhibiting more symptoms), the presence of a lesion at one site increases the chance of observing a lesion at another site (positive correlation). Hence, data obtained from whole-body PET scans are likely to be clustered. Evaluations with magnetic resonance angiography, in which multiple artery segments are examined in each patient, are another example, and many others can be found in different radiologic settings. Although each example has its own radiologic aspects, the analysis of clustered data from all these experiments follows the same statistical principles. We use examples from PET studies in this article to keep the discussion focused, but the methods we discuss are equally applicable to a wide range of radiologic studies.
Clustered samples occur in other fields of study as well. For example, in periodontal studies, the subject is the cluster, but multiple sites (gums or teeth) in the subject's mouth are subsampled; in ophthalmologic research, the subject, again, is the cluster, but both eyes are measured; and in teratologic studies, the litter is the cluster, but measurements are taken from each animal in the litter.
The statistical literature contains a large number of methods for analyzing clustered data, ranging from simple corrections to complicated modeling techniques. Although some of these methods have been developed specifically for radiologic studies, others are frequently applied in other fields, such as periodontology. Our purpose is to provide an introduction to the problem of clustered data in radiology, to briefly evaluate some of the statistical methods, and to demonstrate an analysis with one of the methods discussed.
SOME METHODS OF ANALYSIS FOR CLUSTERED DATA
Binary Response Data ($\chi^2$ Test): PET Findings (Positive vs Negative) are an Example of Binary Outcome Data
If the variables studied are binary, a $\chi^2$ test can be used to test for an association (eg, to test whether PET findings [positive vs negative] are associated with disease stage [early vs late]). One appropriate way to adjust for the effect of clustering is to divide the conventional $\chi^2$ test statistic by a correction factor C, defined as

$$C = 1 + (m - 1)\rho,$$

where $\rho$ is a quantity called the intracluster correlation and m is the number of measurements per subject. Estimation of $\rho$ is explained in a later section. The adjusted statistic, $\chi^2/C$, is then compared with the $\chi^2$ distribution to obtain a P value. For a motivation of this correction and further references, see the article by Donner and Banting (7), which also gives the correction factor for clusters with different numbers of measurements; in that case, m is replaced by a weighted average.
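A minimal Python sketch of this correction follows. The function names are hypothetical, and the P value assumes a 2 × 2 table, that is, a $\chi^2$ distribution with 1 degree of freedom, for which the upper-tail probability reduces to erfc(√(x/2)); larger tables would need a general $\chi^2$ routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    With 1 df, chi-square is the square of a standard normal, so
    P(X > x) = erfc(sqrt(x / 2)); no statistics library is needed.
    """
    return math.erfc(math.sqrt(x / 2.0))


def adjusted_chi_square(chi2: float, m: float, rho: float) -> tuple[float, float, float]:
    """Apply the cluster correction C = 1 + (m - 1) * rho to a chi-square statistic.

    chi2: conventional statistic, computed as if every site were independent
    m:    measurements (sites) per subject
    rho:  estimated intracluster correlation
    Returns (C, chi2 / C, P value), assuming a 2 x 2 table (1 df).
    """
    c = 1.0 + (m - 1.0) * rho
    chi2_adj = chi2 / c
    return c, chi2_adj, chi2_sf_1df(chi2_adj)
```

With the values used in the illustrative example ($\chi^2$ = 30.4, m = 176, $\rho$ = 0.0621), `adjusted_chi_square(30.4, 176, 0.0621)` returns C ≈ 11.87, an adjusted statistic of about 2.56, and P ≈ .11.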
Depending on the hypothesis of interest, other $\chi^2$ tests might be applied. For example, to compare two proportions estimated from paired observations (eg, each patient is examined with two modalities), the McNemar $\chi^2$ test might be used; to test for an association between two variables (eg, PET findings and prostate-specific antigen [PSA] levels) while adjusting for a third factor (eg, age less than 50 years vs age greater than 50 years), the Mantel-Haenszel $\chi^2$ test might be used. Similar scalar adjustments have been presented for these tests as well (8–12).
Continuous Data (Two-Sample t Test): Standardized Uptake Value is an Example of Continuous Outcome Data
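Suppose one wanted to examine the difference between two sample means (eg, mean standardized uptake value [SUV], a commonly used quantitative PET measurement) to determine whether differences exist between two groups (eg, patients with a high PSA level and those with a normal PSA level). The usual t statistic is defined as follows:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $n_1$ and $n_2$ are the numbers of observations in the two groups, and $s_p$ is the pooled standard deviation. To adjust for clustering, t is divided by the square root of the correction factor, $\sqrt{C} = \sqrt{1 + (m - 1)\rho}$. This adjusted t statistic is then compared with a t distribution to obtain a P value. The method is attractive because it is simple to understand and is applied to a well-known statistic.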
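The sketch below computes the conventional pooled-variance two-sample t statistic over all sites and then divides it by √C. The function names and the SUV values in the usage note are hypothetical, and the P value (for example, from `scipy.stats.t.sf`) is left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def pooled_t(group1: list[float], group2: list[float]) -> float:
    """Conventional two-sample t statistic; every site is treated as independent."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: weighted average of the two sample variances (n - 1 denominators).
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def adjusted_t(t: float, m: float, rho: float) -> float:
    """Divide t by sqrt(C), where C = 1 + (m - 1) * rho."""
    return t / math.sqrt(1.0 + (m - 1.0) * rho)
```

For hypothetical SUVs of [4.0, 5.0, 6.0] and [2.0, 3.0, 4.0], `pooled_t` gives √6 ≈ 2.45; with m = 3 and $\rho$ = 0.5 (so C = 2), `adjusted_t` shrinks it to √3 ≈ 1.73.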
Ordinal Response Data (Receiver Operating Characteristic Curve): PET Findings on a Five-Point Scale are an Example of Ordinal Response Data
Ordinal data such as PET findings on a five-point scale are usually analyzed with receiver operating characteristic (ROC) curve analysis. For every cutoff on this ordinal five-point scale, the sensitivity (proportion of diseased sites classified as positive) and specificity (proportion of nondiseased sites classified as negative) are calculated. An ROC curve consists of the sensitivity values plotted against 1 − specificity (the false-positive fraction) for each cutoff.
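To make the construction concrete, the following sketch computes one (1 − specificity, sensitivity) point per cutoff, calling a finding positive when its rating is at or above the cutoff; this assumes higher ratings indicate greater suspicion of disease. It treats every site as independent, so it shows only how the curve is built and is not a clustering correction. The function name and data are hypothetical.

```python
def roc_points(ratings: list[int], diseased: list[bool],
               levels: range = range(1, 6)) -> list[tuple[float, float]]:
    """Empirical (1 - specificity, sensitivity) pairs, one per cutoff.

    A site is called positive when its rating is >= the cutoff.
    Every site is treated as independent (no clustering adjustment).
    """
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    points = []
    for cutoff in levels:
        # True positives: diseased sites rated at or above the cutoff.
        tp = sum(1 for r, d in zip(ratings, diseased) if d and r >= cutoff)
        # True negatives: nondiseased sites rated below the cutoff.
        tn = sum(1 for r, d in zip(ratings, diseased) if not d and r < cutoff)
        points.append((1 - tn / n_neg, tp / n_pos))
    return points
```

For six hypothetical sites rated [5, 4, 3, 2, 1, 1] with disease status [True, True, False, True, False, False], the cutoffs 1 through 5 give the points (1, 1), (1/3, 1), (1/3, 2/3), (0, 2/3), and (0, 1/3).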
Statistical corrections are also necessary in this setting when multiple sites per patient are rated, and several statistical methods have been developed specifically for this purpose. Most of these methods are quite elaborate and have already received some attention in the radiology literature (14–22). For these reasons, we do not discuss methods for ordinal data further in this article and refer interested readers to these references for reviews and detailed discussions.
ESTIMATION OF INTRACLUSTER CORRELATION
An estimate of the intracluster correlation, $\rho$, is needed to calculate the correction factor C. The quantity $\rho$ can be viewed as a measure of "resemblance" between any two sites within each individual (cluster); an adjustment for the cluster effect must therefore take this within-cluster dependence into account. $\rho$ can be defined as follows:

$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \tag{1}$$

where $\sigma_b^2$ is the variance between subjects and $\sigma_w^2$ is the variance between sites within subjects. One can see from this ratio that the higher the variability between subjects relative to the variability within subjects (clusters), the higher the value of $\rho$. At the extreme, if all sites within subjects provide the same result, then $\sigma_w^2 = 0$ and $\rho$ will be 1. Estimation of $\rho$ is of crucial importance in the adjustments that we describe. Many methods have been proposed to estimate $\rho$. The most commonly reported (23) uses analysis of variance to estimate the variance between subjects and the variance between sites within subjects in Equation 1.
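One way to carry out the analysis-of-variance approach is sketched below, assuming the same number of sites m for every subject (unequal cluster sizes need a modified m and are not handled). The between-subject and within-subject mean squares give $\hat\sigma_w^2$ = MSW and $\hat\sigma_b^2$ = (MSB − MSW)/m; substituting them into the ratio of between-subject variance to total variance gives $\hat\rho$ = (MSB − MSW)/(MSB + (m − 1)MSW). The function name is hypothetical, and, as with any ANOVA estimator, the result can be negative in small samples.

```python
from statistics import mean


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the intracluster correlation (equal cluster sizes).

    clusters: one list of site-level values per subject, all of length m.
    Returns (MSB - MSW) / (MSB + (m - 1) * MSW).
    """
    n = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes the same number of sites per subject")
    grand = mean(v for c in clusters for v in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-subject mean square: spread of subject means around the grand mean.
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (n - 1)
    # Within-subject mean square: spread of sites around their own subject's mean.
    msw = sum((v - cm) ** 2 for c, cm in zip(clusters, cluster_means) for v in c) / (n * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)
```

For example, `anova_icc([[1, 1], [3, 3]])` returns 1.0 because all sites within each subject agree, which mirrors the extreme case described above.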
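Estimation of $\rho$ for Binary Data

In the case of binary data, a simple estimate of $\rho$, which can be calculated by hand, can be obtained from the $\kappa$ statistic. The $\kappa$ statistic is a commonly used measure of agreement that can also be interpreted as the intracluster correlation. The following equation gives an estimate of $\kappa$ for more than two measurements per subject (24):

$$\hat{\kappa} = 1 - \frac{\sum_{i=1}^{n} x_i (m_i - x_i)/m_i}{n(\bar{m} - 1)\,\bar{p}(1 - \bar{p})}, \tag{2}$$

where n is the number of subjects, $m_i$ is the number of sites examined in subject i, $x_i$ is the number of positive sites in subject i, $\bar{m} = \sum_i m_i / n$ is the average number of sites per subject, and $\bar{p} = \sum_i x_i / \sum_i m_i$ is the overall proportion of positive sites.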
Table 1 provides the descriptions of the components of Equation 2. A calculation of the $\kappa$ statistic for PET data is given in the Illustrative Example section.
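A sketch of the $\kappa$-type estimate for binary site-level findings follows. It assumes the estimator has Fleiss's form for subjects with varying numbers of sites, $\hat\kappa = 1 - \sum_i [x_i(m_i - x_i)/m_i] \,/\, [n(\bar m - 1)\bar p(1 - \bar p)]$, where $x_i$ is the number of positive sites among the $m_i$ sites of subject i; the function name is hypothetical.

```python
def kappa_binary(positives: list[int], sites: list[int]) -> float:
    """Kappa-type estimate of rho for binary site-level findings.

    positives[i]: number of positive sites in subject i (x_i)
    sites[i]:     number of sites examined in subject i (m_i)
    """
    n = len(sites)
    m_bar = sum(sites) / n                  # average number of sites per subject
    p_bar = sum(positives) / sum(sites)     # overall proportion of positive sites
    q_bar = 1.0 - p_bar
    # Within-subject disagreement, scaled by each subject's number of sites.
    disagreement = sum(x * (m - x) / m for x, m in zip(positives, sites))
    return 1.0 - disagreement / (n * (m_bar - 1.0) * p_bar * q_bar)
```

With hypothetical counts of 3, 1, and 0 positive sites out of 4 sites per subject, `kappa_binary([3, 1, 0], [4, 4, 4])` returns 0.25; if every subject is all-positive or all-negative, the disagreement term vanishes and the estimate is 1.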
Estimation of $\rho$ for Continuous Data

The $\kappa$ statistic cannot be used to estimate $\rho$ when the outcome data are continuous. In this case, we can estimate the components of Equation 1, namely the variance between subjects and the variance between sites within subjects, by using analysis of variance (23), and then substitute these estimates into Equation 1. Although this method of estimating $\rho$ is also applicable when the data are binary, it is simpler to use the $\kappa$ statistic.

ILLUSTRATIVE EXAMPLE
This section illustrates the $\chi^2$ test and the correction procedure discussed earlier. Prostate cancer frequently metastasizes to bone: about 30% of patients have bone metastasis at diagnosis, and more than 80% have bone metastasis at the time of death (25). We retrospectively evaluated 12 patients with prostate cancer to explore the association between PSA level and the frequency of bone metastases identified with PET. Of the 12 patients in the study, four had normal PSA values and eight had high PSA values. In each patient, PET findings were examined for the same 176 bones. For example, the first patient had a high PSA level and 17 lesions identified at PET (and, hence, 159 bones with no lesions). The raw data are presented in Table 2 and are cross-classified according to PSA level and PET finding in Table 3.
The conventional $\chi^2$ statistic for a 2 × 2 table was used to test the hypothesis that there is no association between PSA level and PET findings. The $\chi^2$ statistic is 30.4, and the resulting P value is less than .0001. On the basis of this unadjusted analysis, we would conclude that a highly significant association exists between PSA level and bone metastases as determined with PET. It is important to note, however, that the unit of analysis for this evaluation is the site (bone); that is, the frequencies in Table 3 represent sites within patients. Although a total of 2,112 measurements (12 patients × 176 sites) were recorded, there were only 12 patients. Therefore, we must adjust for the correlation within subjects. To calculate the correction, we first need to estimate $\rho$.
Table 2 includes everything needed to compute the intracluster correlation for binary data. The total number of subjects (n) is 12, the number of sites per subject (m) is 176 (the same for all patients), the total proportion of positive sites, $\bar{p}$, is 157/2,112 = 0.0743, and $1 - \bar{p}$ = 0.9257. By using the $\kappa$ statistic method described in the previous section, we obtained an estimate of 0.0621 for $\rho$.
Hence, $C = 1 + (m - 1)\rho = 1 + 175 \times 0.0621 = 11.87$. Dividing the standard $\chi^2$ statistic by this value gives a corrected statistic of 2.56 (30.4/11.87), with a corrected P value of .110. After applying the correction factor that adjusts for intracluster correlation, we see that there is no longer evidence against the null hypothesis of no association at the 5% level.
It is important to note that if the sites of interest are predetermined, as in this example, there is no ambiguity about the value of $m_i$, the number of sites evaluated in patient i. This should be taken into account during the design stage of planned studies.
As the example shows, conclusions can be biased when clustering is ignored. Table 4 presents some plausible scenarios to show how results might be affected. The adjusted $\chi^2$ statistic ($\chi^2_a = \chi^2/C$) of 2.56 corresponded to a P value of .110 when we accounted for clustering. The results that would have been obtained had clustering been ignored are reported in Table 4 for various cluster sizes (m = 3, 10, 25, 100) and levels of correlation ($\rho$ = 0.05, 0.10, 0.25, 0.50). As m or $\rho$ increases, the discrepancy between the results of the adjusted and unadjusted analyses increases. In particular, the unadjusted P value was smaller than the adjusted P value in all cases; in 13 of the 16 scenarios, the unadjusted P value is less than .05, which would have led to an inappropriate rejection of the null hypothesis.
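Scenarios like those in Table 4 can be recomputed with the sketch below, assuming (as the text describes) that the statistic which ignores clustering equals the adjusted statistic multiplied by $C = 1 + (m - 1)\rho$. It keeps the adjusted statistic unrounded (30.4/11.8675) and uses the 1-degree-of-freedom tail probability erfc(√(x/2)); the names are hypothetical, and the output is a recomputation, not a copy of Table 4.

```python
import math

# Adjusted statistic from the illustrative example, kept unrounded:
# 30.4 / C with C = 1 + (176 - 1) * 0.0621 = 11.8675.
CHI2_ADJ = 30.4 / (1 + (176 - 1) * 0.0621)


def p_1df(x: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def unadjusted_grid(chi2_adj: float,
                    sizes=(3, 10, 25, 100),
                    rhos=(0.05, 0.10, 0.25, 0.50)):
    """Return (m, rho, C, unadjusted chi2, unadjusted P) for every scenario.

    Assumes the statistic that ignores clustering equals chi2_adj * C.
    C and chi2 are rounded for display only; P uses the unrounded statistic.
    """
    rows = []
    for m in sizes:
        for rho in rhos:
            c = 1 + (m - 1) * rho
            chi2 = chi2_adj * c
            rows.append((m, rho, round(c, 2), round(chi2, 2), p_1df(chi2)))
    return rows


if __name__ == "__main__":
    print(f"adjusted: chi2 = {CHI2_ADJ:.2f}, P = {p_1df(CHI2_ADJ):.3f}")
    for m, rho, c, chi2, p in unadjusted_grid(CHI2_ADJ):
        print(f"m = {m:3d}  rho = {rho:.2f}  C = {c:6.2f}  chi2 = {chi2:7.2f}  P = {p:.4f}")
```

Run this way, 13 of the 16 scenarios give an unadjusted P value below .05, matching the count in the text; the three exceptions are m = 3 with $\rho$ = 0.05 or 0.10 and m = 10 with $\rho$ = 0.05. One scenario (m = 3, $\rho$ = 0.25) lands almost exactly on the .05 boundary, so its classification depends on rounding: starting from 2.56 instead of the unrounded statistic moves it just above .05.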
DISCUSSION
Standard statistical methods such as t and $\chi^2$ tests ignore the effect of clustering and can produce misleading or incorrect conclusions. A common approach for analyzing clustered data is to calculate a summary measure for each patient (eg, the proportion of positive sites per patient). This is appropriate if the summary measure under consideration is clinically interesting. Summary methods should not be used just because of their simplicity, and alternative methods that make use of all data (without aggregation) should be considered. We presented simple alternatives that adjust standard test statistics in the presence of clustering.

More complicated regression modeling techniques for analyzing clustered data are available as well. These are based on standard regression models but use a covariance structure to represent dependence within patients. For continuous data, a mixed model based on normal theory can be appropriate (26,27), and for binary or ordinal data, generalized estimating equations can be applied (28). These were not considered in this article for several reasons: (a) the interpretation and communication of the results obtained with these models require a higher level of statistical sophistication; (b) special software not accessible to most radiologists is necessary to fit these models; and (c) applications of these models have already received some attention in radiology (14–17).

Our purpose was not to encourage investigators to perform their own analyses or to promote the adjustment methods discussed herein as the only way to analyze clustered data. Investigators are advised to consult a statistician when designing or analyzing a study that involves clustered data; the statistician might, in turn, choose to apply a method that we have not discussed herein.
Recall that the problem of clustering originates from obtaining multiple observations per patient. These multiple observations can take on different values for each site. In the example we considered, the outcome (PET finding) took on different values within the same subject. Variables that can change from one site to another are called site-specific variables. Hence, if the outcome is not site-specific, there is no clustering. In addition to the outcome, one may be interested in variables that do not vary across sites within a subject, such as PSA level, which remains the same for every site within a particular subject. Such variables are defined as subject-specific. If the variables considered (other than the outcome) are all subject-specific, then the adjusted statistics discussed herein are applicable. If the analysis includes site-specific variables, however, it may be necessary to apply complicated regression models or more advanced adjustments (10,29) because the assumptions underlying the adjusted methods are no longer met.
Although we used PET studies as a framework here, it should be clear that any experiment with patients contributing more than one observation will benefit from the application of these statistical methods.
FOOTNOTES
Author contributions: Guarantor of integrity of entire study, M.G.; study concepts and design, M.G., K.S.P., S.M.L.; literature research, M.G., K.S.P., S.M.L.; clinical and experimental studies, S.M.L.; data acquisition, S.M.L.; data analysis/interpretation, M.G., K.S.P., S.M.L.; statistical analysis, M.G., K.S.P.; manuscript preparation, definition of intellectual content, editing, revision/review, and final version approval, M.G., K.S.P., S.M.L.