Evidence-based Practice
1 From the Department of General Medicine and Clinical Epidemiology, Kyoto University Graduate School of Medicine, Kyoto, Japan. Received June 26, 2003; revision requested September 9; final revision received May 21, 2004; accepted June 23. Address correspondence to T.F., Department of Medicine, St Luke's International Hospital, 91 Akashi-cho, Chuo-ku, Tokyo, 104-8560, Japan (e-mail: fkts@luke.or.jp).
| ABSTRACT |
|---|
MATERIALS AND METHODS: V-P scanning articles published from January 1985 to March 2003 and helical CT articles published from January 1990 to March 2003 in MEDLINE and EMBASE databases were included if (a) tests were performed for evaluation of acute PE, (b) conventional angiography was the reference standard, and (c) absolute numbers of true-positive, false-negative, true-negative, and false-positive results were available. Sensitivity analysis was conducted by excluding articles published before 1995.
RESULTS: A total of 12 articles discussing helical CT and/or V-P scanning were included. With a random-effects model, pooled sensitivity for helical CT was 86.0% (95% confidence interval [CI]: 80.2%, 92.1%), and specificity was 93.7% (95% CI: 91.1%, 96.3%). V-P scanning yielded low sensitivity of 39.0% (95% CI: 37.3%, 40.8%) but high specificity of 97.1% (95% CI: 96.0%, 98.3%) with high probability threshold. V-P scanning yielded high sensitivity of 98.3% (95% CI: 97.2%, 99.5%) and low specificity of 4.8% (95% CI: 4.7%, 4.9%) with normal threshold. Regression coefficients for helical CT angiography were 0.588 (95% CI: -1.55, 2.74) and 4.14 (95% CI: 0.002, 8.28) versus V-P scanning with high and normal thresholds, respectively.
CONCLUSION: Helical CT has greater discriminatory power than V-P scanning with normal and/or near-normal threshold to exclude PE, while helical CT and V-P scanning with high probability threshold have similar discriminatory power in the diagnosis of PE.
© RSNA, 2005
| INTRODUCTION |
|---|
Contrast agent-enhanced helical computed tomography (CT) of the pulmonary arteries has been proposed, and data are accumulating (8-17). The choice between V-P scanning and helical CT should be determined by using available data to compare diagnostic accuracy. Interpretation of the data as a whole is difficult because of the wide variation in the background of the patients. In prior reports that compare the test performance of helical CT and V-P scanning (18,19), the difference in test performance has not been systematically compared with a statistical method frequently used in meta-analysis of diagnostic tests.
Summary receiver operating characteristic (ROC) analysis is a method that enables quantitative combination of multiple studies with heterogeneous results; each point on the summary ROC curve represents a combination of sensitivity and specificity that could result from each study (20-22). Heterogeneity among different studies is caused either by differences between the way clinicians define a test as positive for PE or by wide variation in terms of the patients' background; summary ROC curve analysis could resolve those problems by means of stringent inclusion criteria and meta-regression analysis. The method is suitable for comparing the performance of different diagnostic tests, and reports involving the use of this analysis have been accumulating (23,24). Thus, the purpose of our study was to perform meta-analysis of the helical CT and V-P scanning literature by using the methodologic tool of summary ROC analysis.
| MATERIALS AND METHODS |
|---|
Inclusion Criteria
We included a study if (a) helical CT or V-P scanning was used as a diagnostic tool for acute PE; (b) absolute numbers of true-positive, false-positive, true-negative, and false-negative cases or their equivalent were given; (c) pulmonary angiography was used as the reference standard for diagnosis of PE; and (d) the time interval between the findings obtained from the test and reference standard was 48 hours or less, taking into account the fact that PE might disappear during the interval of two tests. We determined this time interval by reviewing literature of diagnostic tests in patients with PE (25,26) and discussing this issue with radiologists.
A study was excluded if (a) pulmonary angiography, in combination with any other modality, served as the reference standard; (b) helical CT was not performed for acute PE (eg, chronic PE or septic embolism); (c) noncomparable CT methods (eg, electron-beam CT) were used; (d) helical CT was performed after anticoagulant therapy or surgery for PE; or (e) the published information was incomplete. Because one author had published several reports, we checked whether different studies from the same institution included the same patients.
Data Collection
Two investigators (Y.H., M.G.) independently abstracted the data from all articles included in our analysis. The information abstracted included descriptive data (eg, authors, title, journal citation, and year of publication), study group characteristics (eg, sample size, mean age, proportion of women, and prevalence of PE), study design characteristics (involving criteria used to define a positive result and protocol information), extent of blinding between readers, information about the extent of the disease, and any evidence of verification bias and test interpretation bias.
For each study, the results were classified as true-positive, false-positive, true-negative, and false-negative. For V-P scanning, PIOPED criteria are generally used to classify findings according to the probability of PE (ie, high, intermediate, low, or near normal or normal) (5). We specified three thresholds for calculating sensitivity and specificity (Table 1). High-probability V-P scanning findings were positive, and others (ie, intermediate probability, low probability, or near normal and/or normal) were negative (threshold 1). High- and intermediate-probability V-P scanning findings were positive, and low-probability and normal and/or near-normal V-P scanning findings were negative (threshold 2). Normal and/or near-normal V-P scanning findings were negative, and others (ie, high probability, intermediate probability, and low probability) were positive (threshold 3). In one study, researchers developed and used original criteria that consisted of five categories; these five categories were reduced to four to match PIOPED criteria (27). As for helical CT, the presence or absence of PE, defined as an intraluminal filling defect or complete nonfilling of a pulmonary artery, was used as a criterion for determining positive or negative status.
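The three V-P scanning thresholds can be illustrated with a minimal sketch that collapses per-category counts into a 2 x 2 table. The function names and the patient counts below are hypothetical and chosen only for illustration; they are not data from any included study.

```python
# PIOPED-style V-P scan categories named in the text.
CATEGORIES = ("high", "intermediate", "low", "normal_or_near_normal")

# Categories counted as a positive test at each threshold (thresholds 1-3).
POSITIVE_AT_THRESHOLD = {
    1: {"high"},
    2: {"high", "intermediate"},
    3: {"high", "intermediate", "low"},
}


def two_by_two(with_pe, without_pe, threshold):
    """Collapse per-category counts into (tp, fp, fn, tn) at one threshold.

    with_pe / without_pe map each category to the number of patients whose
    reference angiogram did / did not show PE.
    """
    for category in list(with_pe) + list(without_pe):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
    positive = POSITIVE_AT_THRESHOLD[threshold]
    tp = sum(n for c, n in with_pe.items() if c in positive)
    fn = sum(n for c, n in with_pe.items() if c not in positive)
    fp = sum(n for c, n in without_pe.items() if c in positive)
    tn = sum(n for c, n in without_pe.items() if c not in positive)
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from a 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration only.
with_pe = {"high": 40, "intermediate": 30, "low": 20, "normal_or_near_normal": 10}
without_pe = {"high": 5, "intermediate": 25, "low": 40, "normal_or_near_normal": 30}

for t in (1, 2, 3):
    sens, spec = sensitivity_specificity(*two_by_two(with_pe, without_pe, t))
    print(f"threshold {t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Moving from threshold 1 to threshold 3 counts more categories as positive, so sensitivity rises and specificity falls, which is the trade-off the pooled estimates reflect.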
Analysis and Statistics
The overall suitability of the pooled and summary ROC curve analysis was evaluated by using the Spearman correlation coefficient (22). We then checked heterogeneity separately for sensitivity and specificity. Since sensitivities for helical CT and specificities for V-P scanning threshold 3 were not homogeneous (P = .006 and P < .001, respectively), pooled sensitivity and specificity estimates were calculated by using a random-effects model that weighted each report according to its sample size (28).
To estimate the summary ROC curve for helical CT and V-P scanning, we used a previously described method of variance-weighted least squares regression (20,22,23,28). On the basis of the 2 x 2 table constructed from each report, we made a logit transformation of the true-positive (ie, sensitivity) and false-positive (ie, 1 minus specificity) rates. Differences in the logit transformations (ie, a measure of the observed discriminatory power of helical CT and V-P scanning) were then regressed on the sums of the logit transformations (ie, a measure of the positivity threshold used to determine positive helical CT and V-P scanning results). Summary ROC curves for helical CT and V-P scanning were constructed with back transformation of the fitted line from the regression model. We weighted each study in the regression model according to its variance, which was estimated with the following equation: [1/(true-positive + 0.5)] + [1/(false-positive + 0.5)] + [1/(false-negative + 0.5)] + [1/(true-negative + 0.5)] (20). We restricted the final summary ROC curves to the range of observed true-positive and false-positive rates.
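The regression described above can be sketched in a few lines. This is a minimal illustration, not the authors' Stata code. It rests on two assumptions: 0.5 is added to every cell of every table, and each study is weighted by the inverse of the variance expression given in the text. All function names are hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def study_point(tp, fp, fn, tn):
    """Return (D, S, weight) for one 2 x 2 table.

    D is the difference and S the sum of the logit-transformed true-positive
    and false-positive rates. Assumption: 0.5 is added to every cell, and the
    weight is the inverse of the variance expression quoted in the text.
    """
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    d = logit(tpr) - logit(fpr)
    s = logit(tpr) + logit(fpr)
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return d, s, 1.0 / variance


def fit_line(points):
    """Weighted least squares fit of D = intercept + slope * S."""
    total_w = sum(w for _, _, w in points)
    s_mean = sum(w * s for _, s, w in points) / total_w
    d_mean = sum(w * d for d, _, w in points) / total_w
    sxx = sum(w * (s - s_mean) ** 2 for _, s, w in points)
    sxy = sum(w * (s - s_mean) * (d - d_mean) for d, s, w in points)
    slope = sxy / sxx
    return d_mean - slope * s_mean, slope


def sroc_tpr(intercept, slope, fpr):
    """Back-transform the fitted line to a true-positive rate at a given FPR."""
    x = (intercept + (1.0 + slope) * logit(fpr)) / (1.0 - slope)
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical 2 x 2 tables (tp, fp, fn, tn), for illustration only.
tables = [(45, 4, 5, 46), (30, 10, 12, 60), (80, 3, 25, 90), (20, 6, 2, 40)]
intercept, slope = fit_line([study_point(*t) for t in tables])
for fpr in (0.05, 0.10, 0.20):
    print(f"FPR={fpr:.2f} -> TPR={sroc_tpr(intercept, slope, fpr):.3f}")
```

The back transformation follows from solving D = intercept + slope x S for logit(TPR), with D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR). In practice the curve would be evaluated only over the observed range of false-positive rates, as the text notes.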
Adjustment for clinical variables was accomplished by including them in the regression model. Inclusion of a dummy variable in the regression analysis for the type of diagnostic examination performed (ie, 1 for helical CT and 0 for V-P scanning) allows comparison of tests. The regression coefficient of this dummy variable is a measure of the difference in discriminatory power between the examinations. A positive regression coefficient implies increased discriminatory power for helical CT compared with V-P scanning, and a negative regression coefficient implies reduced discriminatory power. To avoid undefined values for the diagnostic odds ratio, positivity criterion, and variance that arise from zeroes in the true-positive, false-negative, true-negative, or false-positive values, 0.5 was added to that value (20).
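As a rough illustration of how the dummy variable enters the model, the sketch below fits D = intercept + slope x S + test_coef x is_ct by weighted least squares, solving the normal equations directly. It is an assumption-laden sketch rather than the authors' procedure: the function names are hypothetical, the input rows are synthetic, and further clinical covariates would be added as extra columns in the same way.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_with_test_dummy(rows):
    """Weighted least squares with a test-type dummy variable.

    rows: iterable of (d, s, is_ct, weight), where is_ct is 1 for helical CT
    and 0 for V-P scanning. Returns [intercept, slope, test_coef]; a positive
    test_coef indicates greater discriminatory power for helical CT.
    """
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for d, s, is_ct, w in rows:
        x = (1.0, s, float(is_ct))
        for i in range(3):
            xtwy[i] += w * x[i] * d
            for j in range(3):
                xtwx[i][j] += w * x[i] * x[j]
    return solve_linear(xtwx, xtwy)


# Synthetic rows lying exactly on D = 1 + 0.5 * S + 2 * is_ct (illustration only).
rows = [
    (1.0, 0.0, 0, 1.0),
    (1.5, 1.0, 0, 2.0),
    (2.0, 2.0, 0, 1.5),
    (3.0, 0.0, 1, 1.0),
    (3.5, 1.0, 1, 0.5),
    (4.5, 3.0, 1, 2.0),
]
intercept, slope, test_coef = fit_with_test_dummy(rows)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, test_coef={test_coef:.3f}")
```

Confidence intervals for the test coefficient, which the comparison between tests depends on, would require the coefficient covariance matrix and are omitted from this sketch.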
We assessed the effect of publication year, mean age (55 years or younger vs older than 55 years), prevalence of PE, duration of tests (<24 hours vs <48 hours), study design (prospective vs retrospective), presence of interpretation bias, and presence of verification bias (eg, presence of verification bias vs no available information) in a combined model of helical CT and V-P scanning. We could not consider the effect of the extent of the disease in the model used to compare helical CT and V-P scanning, since information about the extent of the disease was not available in the literature that mainly dealt with V-P scanning (5,27,29).
We dichotomized some variables (eg, age, percentage of women, and duration between tests) at median. Because of the availability of data (eg, data on collimation were only available for helical CT) or missing data, the following variables were analyzed separately in each model: percentage of women included in the study (eg, ≤25% vs >25%), collimation (eg, 3 mm or thinner vs thicker than 3 mm), size of PE (eg, segmental vs subsegmental) for helical CT model, and type of radionuclide (eg, technetium 99m [99mTc] diethylenetriaminepentaacetic acid [DTPA] vs other types of radionuclides) used for V-P scanning. In the combined model, univariate analysis was performed to enable the effect of each clinical covariate to be assessed.
We added the factors that had a P value of less than .20 at univariate analysis into a multivariate regression model and used backward elimination to remove variables with a P value of more than .05. For the main aim of this study, a dummy variable for the type of diagnostic test (helical CT = 1) was always kept in this process. In separate and combined models, V-P scanning data were treated separately in different circumstances (eg, helical CT vs V-P scanning threshold 1, helical CT vs V-P scanning threshold 2, and helical CT vs V-P scanning threshold 3). Finally, we reanalyzed the final model with random-effects regression analysis (Technical bulletin no. 42; Stata Statistical Software, College Station, Tex), which took inter- and intrastudy variability into account.
After sensitivities and specificities were pooled, we assessed the posttest probability of PE on the basis of different pretest probabilities (low = .03, moderate = .27, high = .78). The arbitrary pretest probabilities of .03, .27, and .78 were based on the report by Wells et al (1), in which pretest probability was determined by using clinical signs and symptoms. First, pretest odds were converted into posttest odds by multiplying the pretest odds by the likelihood ratio. Likelihood ratio is defined as the probability of the test result in people with the disease divided by the probability of the test result in people without the disease. Posttest odds were converted back to posttest probabilities (30).
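The conversion from pretest to posttest probability can be written out as a short sketch. The function names are hypothetical. The sensitivity and specificity used in the example are the pooled helical CT estimates reported in the abstract (86.0% and 93.7%), and the pretest probabilities are the three values given above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a dichotomous test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert pretest probability to posttest probability through odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Pooled helical CT estimates from the abstract.
lr_pos, lr_neg = likelihood_ratios(0.860, 0.937)

# Pretest probabilities used in the text (low, moderate, high).
for label, pretest in (("low", 0.03), ("moderate", 0.27), ("high", 0.78)):
    after_positive = posttest_probability(pretest, lr_pos)
    after_negative = posttest_probability(pretest, lr_neg)
    print(f"{label}: positive test -> {after_positive:.3f}, "
          f"negative test -> {after_negative:.3f}")
```

The same two functions apply to each V-P scanning threshold by substituting the corresponding pooled sensitivity and specificity.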
Since the helical CT method used in the detection of PE has undergone rapid changes in the past decade, the test performance might have changed from the early 1990s to 2003. Thus, we performed a sensitivity analysis by excluding helical CT articles that were published before 1995 and compared the results with those in base-case analysis. All analyses were performed by using commercially available software (Intercooled Stata 7.0; StataCorp, College Station, Tex).
| RESULTS |
|---|
Variations in study protocols included thickness of the scanning section (range, 2.5-5.0 mm) and the amount of contrast agent (range, 70-150 mL). As for detector system, single-detector row helical CT was used in eight studies, and dual-detector row helical CT was used in one study (17). The reported sensitivity of helical CT ranged from 53% (13) to 100% (8,9), and specificity ranged from 75% (17) to 100% (9,11,14).
Ventilation studies used xenon 133 gas (133Xe), 99mTc-pyrophosphate (PYP), or 99mTc-DTPA as the nuclear isotope, and perfusion studies used 99mTc-macroaggregated albumin (MAA) as the nuclear isotope. PIOPED criteria were used in four studies, and original criteria were formulated in one study (27). The reported specificity of V-P scanning with threshold 1 ranged from 96.0% (5) to 100% (10,14). With threshold 2, sensitivity ranged from 54.5% (10) to 100% (14), and with threshold 3, sensitivity ranged from 98% (5) to 100% (10,14,27,29). Details of the articles included are summarized in Table 2.
| DISCUSSION |
|---|
In two studies (10,14), the diagnostic test performance of both helical CT and V-P scanning was reported in one article, but it was not plausible to directly compare these two modalities for several reasons. In one of these studies, not all patients underwent both tests, and data were insufficient for direct comparison (14). The only study we could use for direct comparison was one in which all patients underwent both helical CT and V-P scanning (10). It might be interesting to directly compare the performance of helical CT and V-P scanning; however, the main aim of that study was not to compare the performance of the two tests, and there was verification bias (ie, patients underwent helical CT on the basis of results of V-P scanning). A direct comparison influenced by verification bias might have distorted the apparent performance of the latter test; thus, it would be rather misleading to compare the two tests in this study (23).
Statistical Methods and Results
The summary ROC analysis allows us to compare results of different tests by summarizing sensitivity and specificity results from several studies into a single ROC curve (21). In the past decade, many articles have suggested that helical CT might be more effective than V-P scanning in the diagnosis of PE (32,33), but formal comparison of the two diagnostic tests has yet to be performed. Although previous reports have described helical CT as superior to V-P scanning because of its higher sensitivity and specificity, one cannot state that helical CT is more useful for this reason alone. In general, a negative result essentially rules out PE when a test with very high sensitivity is used, and a positive result effectively confirms diagnosis of PE when a test with very high specificity is used (30). In the clinical setting, V-P scanning with a high-probability threshold and high specificity has been used to confirm a diagnosis of PE, and V-P scanning with a normal and/or near-normal threshold with high sensitivity has been used to exclude a diagnosis of PE. It is therefore necessary to compare the sensitivity of helical CT with that of V-P scanning by using the normal and/or near-normal threshold to exclude the possibility of PE; it is also necessary to compare specificity to diagnose PE with summary ROC analysis.
There has been criticism of the sensitivities and specificities claimed in earlier helical CT reports (34,35). New diagnostic tests are often described in glowing terms when they are first introduced; however, these tests are often found to be wanting when more experience has been gained (36). This frequently results from limitations in the methods used to evaluate test characteristics. An example of this phenomenon is use of carcinoembryonic antigen (30). Carcinoembryonic antigen was originally considered a very promising tool in the diagnosis of colon cancer; however, carcinoembryonic antigen level was subsequently found to be increased in a wide variety of instances, including in smokers without cancer. The same seems true for helical CT used in the diagnosis of PE. When compared with earlier studies that showed high sensitivity (90%-100%) and specificity (96%-100%), the test performance was lower (sensitivity, 70%; specificity, 91%) in a study performed in 2001 (35). On the other hand, rapid advances in the CT method used in the diagnosis of PE have been made in the past decade. Thus, we performed sensitivity analysis, and the results of ROC analysis did not change when articles published before 1995 were excluded. Moreover, univariate analysis reveals that publication year is not statistically significant.
Another criticism of high accuracy rates, as pointed out in recently published review articles, is that methodologic problems are common in studies used to evaluate helical CT in the diagnosis of PE (eg, several reports were missing key data regarding the methods used to select patients). It is unclear, however, how these methodologic problems have influenced our results (26,37). The second PIOPED study, which is being funded by the U.S. National Institutes of Health, involves evaluation of the accuracy of helical CT in the diagnosis of PE in more than 1000 patients and should bring us closer to the solution of these methodologic problems (38).
Clinical Implications and Results
Our work shows that when the high-probability threshold is used, helical CT and V-P scanning have comparable discriminatory power; however, when the normal and/or near-normal threshold is used, helical CT has greater discriminatory power than V-P scanning. It is now more important to consider other aspects of the tests. First, the main problem with V-P scanning is that definitive diagnosis can be obtained in less than 30% of patients tested, and the remaining patients need to undergo further testing (5). The use of helical CT would therefore reduce the number of patients subjected to further diagnostic tests (33). In contrast, one should consider the presence of contraindications before performing helical CT. In one report, about 24% of patients suspected of having PE did not undergo helical CT because of contraindications, such as impaired renal function or allergy to contrast agent (35). This is a substantial proportion of patients and is similar to the proportion of patients with inconclusive results of V-P scanning.
Second, as for V-P scanning, large differences (25%-30%) in interpretation among expert readers have been reported, especially in the classification of low- or intermediate-probability scans. In contrast, helical CT has better interobserver agreement than does V-P scanning (κ value of 0.85 and 0.61, respectively) (12). Third, there are inconsistent results concerning relative cost-effectiveness, with the controversy continuing (32,34). The advantages and disadvantages of either test are crucial in application to patients in whom PE is suspected. It is important to judge the advantages and disadvantages of each test and select the most appropriate procedure for a favorable outcome. On the basis of the results of our analyses, we recommend the following strategies: To confirm PE in patients with moderate to high pretest probability, use either helical CT or V-P scanning with the high-probability threshold. To exclude PE in patients with low pretest probability, use helical CT, as it is a better test than V-P scanning; if helical CT is not available, V-P scanning with the normal and/or near-normal threshold could be an alternative technique. To exclude PE in patients with moderate or high pretest probability or to confirm PE in patients with low pretest probability, avoid V-P scanning with the low-probability threshold.
Limitations
Our review has several limitations. First, because of the nature of meta-analysis, the result is subject to publication bias. Only published reports were examined, and studies with poor results are less likely to be written, submitted, and accepted. Our results may therefore be biased toward the favorable direction. This tendency should affect helical CT and V-P scanning equally and should not alter our qualitative conclusion. Second, as in all meta-analyses of diagnostic testing, verification bias could be present, since about half of the studies included did not control or mention verification bias. Verification bias occurs when the result of the test influences the decision as to which patients receive the verification test. This can have dramatic results on the sensitivity and specificity of a test (39). We were unable to correct for this bias because the original studies did not provide the necessary information on the entire population tested; in our study, however, covariate analysis did not show a significant difference between studies that did and those that did not control for verification bias. Third, the large degree of variation between observers in reading V-P scans could limit the interpretation of our results when combining studies; this possibility was rejected with a test for homogeneity.
Numerous studies were excluded from analysis, and a smaller number of studies was finally included in meta-analysis compared with previous meta-analysis of helical CT (19). One reason is that some studies used combination reference standards to compare helical CT and V-P scanning (eg, normal results at V-P lung scanning were accepted as an alternative reference standard for the absence of PE) (12,33). The small number of studies included in the current analysis might have decreased our power to detect the true difference. This could not have been avoided, however, because it is usually assumed that the test is being compared with a sole reference standard when meta-analysis of diagnostic tests is conducted. We therefore chose the optimal strategy that pulmonary angiography should be the sole reference standard. For the same reason as mentioned previously, only one multidetector row helical CT report was included in our analysis. We might have underestimated the test performance of helical CT because the recent introduction of multidetector row helical CT is expected to offer a further increase in performance, particularly in the ability of physicians to scan larger anatomic volumes with high spatial resolution (40). The stringent inclusion criteria used in the current study, however, should have ensured the quality of our results.
Independently pooled estimates of sensitivity and specificity could be calculated easily; however, these frequently used methods have come under strong criticism because they do not take into account the fact that different studies may have used different test thresholds (41). Nevertheless, we pooled estimates of sensitivity and specificity separately because β coefficients are not always easy for readers to understand intuitively. In practice, reports of summary ROC analysis also present pooled sensitivities and specificities (42,43). Another reason is that authorities in the field of decision sciences recommend pooling sensitivities and specificities of diagnostic tests and using these data for cost-effectiveness analysis (44). These pooled estimates, however, should be interpreted carefully when used to compare these two diagnostic tests directly.
In conclusion, helical CT has greater discriminatory power than V-P scanning with the normal and/or near-normal threshold in the exclusion of PE, while helical CT and V-P scanning with the high-probability threshold have similar discriminatory power in the diagnosis of PE.
| FOOTNOTES |
|---|
Authors stated no financial relationship to disclose.
Author contributions: Guarantor of integrity of entire study, Y.H.; study concepts and design, all authors; literature research, Y.H., M.G.; data acquisition, Y.H., G.M.; data analysis/interpretation, all authors; statistical analysis, Y.H., Y.N.; manuscript preparation and definition of intellectual content, all authors; manuscript editing and revision/review, Y.H., Y.N., T.F.; manuscript final version approval, all authors