Nuclear Medicine
1 From the Department of Internal Medicine, Division of Nuclear Medicine (B.A.D., J.O.A., R.L.W.), the Departments of Radiology (R.L.W.) and Surgery (S.S.S.), and the Consortium for Health Outcomes, Innovation, and Cost Effectiveness Studies (CHOICES) (S.S.S.), University of Michigan Medical Center, 1500 E Medical Center Dr, UH B1G 412, Ann Arbor, MI 48109-0028. From the 1997 RSNA scientific assembly. Received April 26, 1998; revision requested November 9; revision received February 4, 1999; accepted June 8. Address reprint requests to R.L.W.
| Abstract |
|---|
MATERIALS AND METHODS: English-language reports on the diagnostic performance of PET (14 studies, 514 patients) and/or CT (29 studies, 2,226 patients) for demonstration of mediastinal nodal metastases from NSCLC were selected by using the MEDLINE database. In eligible studies, an objective diagnostic standard was used, data were presented to allow recalculation of contingency tables, and established diagnostic criteria were used for abnormal test results. Summary receiver operating characteristic (ROC) curves were calculated.
RESULTS: Pooled point estimates of diagnostic performance and summary ROC curves indicated that PET was significantly more accurate than CT for demonstration of nodal metastases (P < .001). Mean sensitivity and specificity (± 95% CI) were 0.79 ± 0.03 and 0.91 ± 0.02, respectively, for PET and 0.60 ± 0.02 and 0.77 ± 0.02, respectively, for CT. The log odds ratios were 1.79 (95% CI: 1.49, 2.09) for CT and 3.77 (95% CI: 2.77, 4.77) for PET (P < .001). Subgroup analyses did not alter findings.
CONCLUSION: PET is superior to CT for mediastinal staging of non-small cell lung cancer, independent of performance index or clinical context of PET imaging.
Index terms: Computed tomography (CT), comparative studies, 996.1291; Positron emission tomography (PET), comparative studies, 996.12963; Lung neoplasms, metastases, 60.321, 67.33; Lymphatic system, neoplasms, 996.33; Lymphatic system, CT, 996.1291; Lymphatic system, radionuclide studies, 996.12963; Mediastinum, neoplasms, 67.33
| Introduction |
|---|
Currently, mediastinal lymph nodes shown at x-ray computed tomography (CT) or magnetic resonance (MR) imaging to be larger than or equal to 1 cm in short-axis diameter are considered to be "abnormal" and are subsequently evaluated either with bronchoscopy and transbronchial needle aspiration biopsy or with mediastinoscopy. Patients with mediastinal nodes smaller than 1 cm are presumed to be free of local-regional metastatic disease and, if there are no contraindications to thoracic surgery, are offered an opportunity to undergo surgical resection. McKenna and colleagues (5) found no correlation between the presence of mediastinal nodal metastases and nodal size. In fact, metastases may be found in 21% of normal nodes (6), and up to 40% of enlarged nodes in some series (7) are not cancerous. Thus, it has been suggested (8) that important advances in the noninvasive detection of metastases to the lymph nodes must await an approach that is fundamentally different from CT for determination of lymph node abnormality on the basis of size.
The authors of studies (9,10) in which the tumor-localizing properties of 2-[fluorine 18]fluoro-2-deoxy-D-glucose (FDG) were used have described the application of positron emission tomography (PET) to diagnostic evaluation for a variety of tumors, including breast cancer, brain tumors, lymphomas, and lung cancer. In contrast to CT, which is primarily dependent on anatomic imaging features, FDG PET is mainly dependent on the metabolic characteristics of a tissue for assistance in the diagnosis of disease.
Wahl and colleagues (11) prospectively evaluated PET in a head-to-head trial with CT for the evaluation of mediastinal lymph nodes in patients with NSCLC. In their study of 23 patients with newly diagnosed or possible NSCLC who were undergoing both CT and PET with pathologic correlation, PET demonstrated 82% sensitivity and 81% specificity, as compared with 64% sensitivity and 44% specificity for CT, for the staging of mediastinal nodal disease (11). Subsequently, other researchers (12–24) have conducted single-institution studies, with results even more favorable with regard to PET. However, these studies lacked the statistical power to help determine whether the differences between CT and PET were significant. In a meta-analysis of 42 studies with CT published before the 1990s, Dales and colleagues (8) reported sensitivity and specificity for mediastinal metastases of 0.79 and 0.78, respectively.
With the emergence of newer-generation CT scanners and better imaging algorithms, however, the contemporary diagnostic performance of CT in the staging of mediastinal NSCLC has not been well characterized. It also is not clear what differences may exist in diagnostic performance between PET used as a substitute for CT and PET used as a complementary modality to CT. We undertook a meta-analysis of the published literature to compare the discriminatory power of the two modalities and to place in perspective the role of FDG PET relative to that of CT in the staging of mediastinal NSCLC.
| MATERIALS AND METHODS |
|---|
Study Selection
Two of the authors (B.A.D., J.O.A.) independently reviewed the articles to determine eligibility for detailed analysis, with disagreements resolved by means of repeat review and discussion. Articles selected for inclusion and analysis met the following criteria: (a) evaluation of the diagnostic performance of FDG PET and/or CT for the detection of mediastinal nodal metastases from NSCLC, (b) comparison of imaging results with an objective diagnostic standard (ie, mediastinal nodal status established with results from histologic samples obtained at mediastinoscopy, thoracotomy, and/or autopsy), (c) reporting of results in sufficient detail to allow reconstruction of contingency tables of the raw data (ie, true-positive, true-negative, false-positive, and false-negative results), and (d) use of established diagnostic criteria for abnormal test results (eg, at CT, abnormal lymph node ≥10 mm in short-axis diameter; at PET, abnormal lymph node uptake exceeding that of mediastinal blood pool).
Data Extraction and Assessment of Methodological Quality
Two authors (B.A.D., J.O.A.) abstracted the following information from the eligible articles in a nonblinded fashion: author names; journal name; year of publication; number of patients; mode of analysis (patients or nodal stations); and true-positive, true-negative, false-positive, and false-negative rates for the presence of nodal metastases. We independently assessed the quality of each study according to the following prospectively developed criteria modified from well-accepted methodological standards for the evaluation of quality in diagnostic test research (25–30), with disagreements resolved by means of discussion and consensus.
Description and quality of imaging procedure. This standard required that the imaging protocol was adequately described and conformed to accepted standards for technical quality. For PET, this was fulfilled if (a) the type of scanner was third generation or later, (b) patients fasted at least 4 hours before scanning, (c) the dose of FDG was mentioned, and (d) transmission scanning with attenuation correction, emission imaging protocol, reconstruction algorithm, and criteria for interpretation were described in full. An adequate CT examination required (a) use of a third- or fourth-generation scanner, (b) section scanning time of 2 seconds or less, (c) maximum section thickness and interval of 10 mm, (d) scanning area included from above the apices through the adrenal glands, and (e) full description of criteria for interpretation.
Technical quality of reference test. Technically adequate reference testing required nodal tissue sampling with fine-needle aspiration biopsy or biopsy at mediastinoscopy, thoracotomy, and/or autopsy.
Uniform application of reference test. The purpose of this standard was to help prevent verification bias. The standard was met if verification was obtained by means of reference test(s), regardless of imaging results.
Independence of interpretation. This standard refers to blinding of interpretations of the results of index tests and reference standards. This criterion helped prevent review bias and required a statement about independence or blinding in interpretation of the results of both the imaging test(s) and the reference test.
Clinical description and spectrum composition. Description of the study population included at least three of the following descriptors: age distribution; sex distribution; summary of symptoms at presentation, disease stage, or both; and eligibility criteria for study subjects. Ability to generalize results was determined by means of adequacy of the spectrum composition.
Cohort assembly. Fulfillment of this standard required prospective enrollment of patients.
Sample size. This standard refers to the number of cases included in the study. This standard was met if the population with disease and the population without disease both had more than 35 subjects. A sample size of 35 is the minimum for which the lower bound of the 95% CI for a sensitivity or specificity of 1.0 would exceed 0.9.
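The sample-size threshold can be checked with a short calculation. The sketch below assumes an exact (Clopper-Pearson) two-sided interval, which the text does not specify; the function names are hypothetical. When all n of n cases are classified correctly, the exact lower limit reduces to (alpha/2)^(1/n).

```python
import math


def lower_bound_all_correct(n: int, confidence: float = 0.95) -> float:
    """Exact two-sided lower confidence limit for a proportion of 1.0 (n of n correct).

    Assumption: Clopper-Pearson interval; the source does not name its method.
    """
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


def smallest_n_exceeding(target: float, confidence: float = 0.95) -> int:
    """Smallest n whose lower limit is strictly greater than `target`."""
    n = 1
    while lower_bound_all_correct(n, confidence) <= target:
        n += 1
    return n


bound_35 = lower_bound_all_correct(35)  # about 0.89997
bound_36 = lower_bound_all_correct(36)  # about 0.9026
n_required = smallest_n_exceeding(0.9)
```

Under this assumption the lower limit is approximately 0.90 at n = 35 and strictly exceeds 0.90 from n = 36 onward, which is consistent with the "more than 35 subjects" wording of the criterion.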
Adequate reporting of results. This required both summary and subgroup indexes of accuracy, with precision estimates (such as CIs) and a summary measure of observer variability.
Data Analyses
Diagnostic performance indexes (sensitivity, specificity, accuracy, and predictive values) were recalculated for each report from the reconstructed contingency tables of true-positive, true-negative, false-positive, and false-negative results. The pooling of sensitivity and specificity, which has traditionally been used in the meta-analysis of diagnostic test data, ignores the fact that both performance indexes are dependent on the cutoff value used to define a positive test result; a stricter cutoff value will increase the specificity at the expense of the sensitivity. Thus, test performance was calculated by using summary receiver operating characteristic (ROC) curve analysis to distinguish variations in decision thresholds from actual differences in accuracy.
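As an illustration of how the performance indexes follow from a reconstructed contingency table, here is a minimal sketch. The class name and the counts are hypothetical and are not taken from any of the eligible studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for one study (counts of patients or nodal stations)."""
    tp: int  # true-positive results
    fp: int  # false-positive results
    fn: int  # false-negative results
    tn: int  # true-negative results

    def indexes(self) -> dict:
        """Return sensitivity, specificity, accuracy, and predictive values."""
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "accuracy": (self.tp + self.tn) / total,
            "ppv": self.tp / (self.tp + self.fp),  # positive predictive value
            "npv": self.tn / (self.tn + self.fn),  # negative predictive value
        }


# Hypothetical counts, for illustration only.
example = ContingencyTable(tp=8, fp=2, fn=2, tn=18)
result = example.indexes()
```

The sketch does not guard against empty rows or columns (for example, a study with no diseased subjects), which would need handling before pooling real data.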
We used the logistic transform method of Littenberg and Moses, the properties and details of which have been described elsewhere (31,32). In brief, construction of a summary ROC curve involves calculation of the sum and the difference of the logit transforms of the true-positive and false-positive rates for each study. Once the sum S and difference D have been calculated, D becomes the dependent variable, and S becomes the independent variable in an ordinary least-squares regression. The slope of the resultant regression line indicates the departure from symmetry of the summary ROC curve, and the intercept is correlated with overall diagnostic accuracy. D is the log odds ratio from the respective two-by-two table and is a measure of accuracy, whereas S corresponds to the leniency of the positivity criterion. The final step involves conversion of the regression line back from the transformed space (S, D) to a summary ROC curve in the original space (false-positive rate, true-positive rate). As is the case with conventional ROC curves, a summary ROC curve closer to the upper left-hand corner of the graph indicates better overall diagnostic performance of the technology summarized.
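The steps just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the study counts, and the 0.5 continuity correction (used so that no rate is exactly 0 or 1) are assumptions. The back-transform follows from solving D = a + bS for logit(TPR), which gives logit(TPR) = [a + (1 + b) logit(FPR)] / (1 - b).

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_sroc(tables, correction: float = 0.5):
    """Fit D = a + b*S by ordinary least squares.

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (intercept a, slope b).
    """
    s_vals, d_vals = [], []
    for tp, fp, fn, tn in tables:
        # Continuity correction (assumption) keeps both rates strictly inside (0, 1).
        tpr = (tp + correction) / (tp + fn + 2 * correction)
        fpr = (fp + correction) / (fp + tn + 2 * correction)
        s_vals.append(logit(tpr) + logit(fpr))  # S: leniency of positivity criterion
        d_vals.append(logit(tpr) - logit(fpr))  # D: log odds ratio
    n = len(s_vals)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    slope = sxy / sxx
    intercept = d_mean - slope * s_mean
    return intercept, slope


def sroc_tpr(fpr: float, intercept: float, slope: float) -> float:
    """Map a false-positive rate to the true-positive rate on the summary ROC curve."""
    logit_tpr = (intercept + (1.0 + slope) * logit(fpr)) / (1.0 - slope)
    return expit(logit_tpr)


# Hypothetical (tp, fp, fn, tn) counts for four studies, for illustration only.
studies = [(20, 5, 5, 30), (15, 3, 8, 40), (30, 10, 4, 25), (12, 2, 6, 35)]
a, b = fit_sroc(studies)
curve = [(f / 100, sroc_tpr(f / 100, a, b)) for f in range(4, 61, 4)]
```

The curve is evaluated here only over false-positive rates from 0.04 to 0.60. A slope near 0 yields a symmetric curve whose position is governed by the intercept alone.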
Differences between summary ROC curves with slopes between -0.5 and +0.5 were tested for significance by using the Student t test to compare the intercepts of the regression lines corresponding to the summary ROC curves. The mean difference value was used as a basic summary statistical measure of test performance. The false-positive rate was restricted to the range 0.04–0.60, and the true-positive rate was restricted to that actually reported from the eligible studies.
It was predetermined at the time of study design to perform subgroup analyses of CT versus FDG PET by using the following variables: (a) type of study (prospective vs retrospective), (b) method of subject analysis (patient vs nodal station), (c) sample size (≤35 patients vs >35 patients), (d) publication year or period (during or before 1995 vs after 1995), (e) geographic origin of study (North America vs other origin), and (f) clinical context of PET imaging (PET substituted for CT vs PET complementary to CT).
| RESULTS |
|---|
| DISCUSSION |
|---|
The systematic evaluation of diagnostic tests before they enter widespread use may help improve the quality of diagnostic test information, eliminate poor or useless tests before they are widely applied, reduce the costs of health care, and improve the care of patients. The assessment of PET is important because PET is an emerging clinical diagnostic tool that requires a potentially high capital investment, so its diagnostic usefulness should be established before it diffuses widely into clinical practice.
The comparison of PET with CT is justified by the fact that the latter, which currently is the noninvasive test of choice, has substantial limitations. For example, it is not possible to reliably differentiate malignant mediastinal nodes from benign nodes on the basis of size alone, and there is a high frequency of normal-sized N2 nodes in patients with an operable stage of lung cancer.
By using pooled point estimates of diagnostic performance and summary ROC curves derived from the available published evidence, we showed that FDG PET was significantly more accurate than CT for help in detection of mediastinal nodal metastases in patients with NSCLC. All differences in diagnostic performances were notably significant, with P values of less than .001. It is important to note that the superiority of FDG PET over CT was maintained across all subgroup analyses. The lower false-positive rate means that fewer patients will be denied the opportunity to undergo curative resection, and the high sensitivity of FDG PET implies that PET will help reduce the number of unnecessary thoracotomies, as compared with the number that would be performed on the basis of CT findings. The log odds ratios show that PET findings were much better for help in discriminating between benign and metastatic lymph node enlargement.
Traditionally, simple measures of effectiveness, such as the pooled sensitivity and the specificity, have been used in meta-analyses of diagnostic tests. These measures, although widely recognized and easy to understand, are subject to definitional arbitrariness, and positivity thresholds commonly differ across studies. The marked variability in sensitivity and specificity, especially for CT, illustrates why summary ROC curve analysis may offer a better comparison between alternative diagnostic tests (31,32).
In our critical analyses of the eligible primary studies, we detected important limitations in methodological quality that potentially introduced bias and weakened estimates of diagnostic performance. Whereas appropriate reference tests were applied uniformly in all patients across studies, only 34% of studies had a sample size of 35 or more patients, the minimum required for the lower bound of the 95% CI to exceed 0.90 for a sensitivity or specificity of 1.0. Because of the small sample sizes, especially in the PET studies, CIs were wide. However, there was no overlap of CIs of the summary log odds ratios for FDG PET and CT studies.
Other limitations were related to the poor reporting of test results, the methods of cohort assembly, and the independence of interpretation. Although the selected reports provided estimates of sensitivity and accuracy, only three reports provided such measures for subgroups, and only one provided a measure of interobserver variability. For successful application of either test in clinical practice, separate indexes of accuracy are needed for pertinent individual subgroups within the spectrum of patients evaluated. Assessment of reproducibility (interobserver variability) is critical, especially for these imaging techniques, which often require interpretation of results tempered with subjective judgements (28).
Apart from the limitations associated with the primary studies are those common to meta-analysis, such as publication bias, selection bias, and blinded selection of articles (50–53). Publication bias refers to the fact that studies with statistically significant results are more likely to be published than are those with negative results (52,53). Because meta-analyses such as ours are based on published reports, the potential for underrepresentation of negative studies in the literature is a major concern (53). However, the wisdom of the inclusion of unpublished data has been contested (54). Unpublished studies have not undergone formal peer review, and the fact that the study was not submitted for publication raises questions about the quality of the work. Although there are statistical methods for correcting for publication bias in meta-analysis of treatment trials (55,56), no validated procedures exist for meta-analysis of diagnostic test data.
Despite the retrospective nature and limitations of meta-analysis and a lack of methodological rigor in the primary studies, we believe our results provide the best picture currently available to inform clinicians, patients, and policy makers about the accuracy of PET relative to that of CT. This is because the expense, practical difficulties, and entrenchment of CT in clinical practice make it unlikely that a methodologically rigorous comparison of CT and PET, with a sample size sufficient to provide more precise data, will be performed.
We conclude from the results of our meta-analytic evaluation that FDG PET is significantly more accurate than CT for characterization of mediastinal lymph nodes in patients with NSCLC. This accuracy is independent of our selected subgroup analyses and method of data analysis. In the future, however, investigators must attempt to improve study design, especially in the areas of cohort assembly, blinded interpretation of both imaging and reference test results, and presentation of results.
| Footnotes |
|---|
Author contributions: Guarantor of integrity of entire study, R.L.W.; study concepts and design, B.A.D., S.S.S., R.L.W.; definition of intellectual content, B.A.D., S.S.S., R.L.W.; literature research, B.A.D., J.O.A., R.L.W.; data acquisition and analysis, B.A.D., J.O.A.; statistical analysis, B.A.D., S.S.S.; manuscript preparation, B.A.D., S.S.S.; manuscript editing, B.A.D., S.S.S., R.L.W.; manuscript review, all authors.
| References |
|---|