Evidence-based Practice
1 From the Department of Specialist Radiology (S.H., S.A.T.), University College Hospital, Euston Rd, London, NW1 2BU, England; Intestinal Imaging Centre (C.I.B.) and Cancer Research UK Colorectal Cancer Unit (W.A.), St Mark's Hospital, Northwick Park, London, England; and Cancer Research UK/NHS Centre for Statistics in Medicine, Old Road Campus, Oxford, England (D.G.A., S.M., J.J.D.). Received February 2, 2005; revision requested April 4; revision received May 23; accepted June 20. Supported by a grant from the European Association of Radiology, administered by the European Society of Gastrointestinal and Abdominal Radiology. Address correspondence to S.H.
| ABSTRACT |
|---|
MATERIALS AND METHODS: The MEDLINE database was searched for colonography reports published between 1994 and 2003, without language restriction, by using the terms colonography, colography, CT colonoscopy, CT pneumocolon, virtual colonoscopy, and virtual endoscopy. Studies were selected if their focus was the detection of colorectal polyps verified with within-subject reference colonoscopy and if they met key methodologic criteria based on information presented at the Fourth International Symposium on Virtual Colonoscopy (Boston, Mass). Two reviewers independently abstracted methodologic characteristics. Per-patient and per-polyp detection rates were extracted, and authors were contacted when necessary. Per-patient sensitivity and specificity were calculated for different lesion size categories, and forest plots were produced. Meta-analysis of paired sensitivity and specificity was conducted by using a hierarchical model that enabled estimation of summary receiver operating characteristic curves allowing for variation in diagnostic threshold, and the average operating point was calculated. Per-polyp sensitivity was also calculated.
RESULTS: Of 1398 studies considered for inclusion, 24 met our criteria. There were 4181 patients with a study prevalence of abnormality of 15%–72%. Meta-analysis of 2610 patients, 206 of whom had large polyps, showed high per-patient average sensitivity (93%; 95% confidence interval [CI]: 73%, 98%) and specificity (97%; 95% CI: 95%, 99%) for colonography; sensitivity and specificity decreased to 86% (95% CI: 75%, 93%) and 86% (95% CI: 76%, 93%), respectively, when the threshold was lowered to include medium polyps. When polyps of all sizes were included, studies were too heterogeneous in sensitivity (range, 45%–97%) and specificity (range, 26%–97%) to allow meaningful meta-analysis. Of 150 cancers, 144 were detected (sensitivity, 95.9%; 95% CI: 91.4%, 98.5%). Data reporting was frequently incomplete, with no generally accepted format.
CONCLUSION: CT colonography seems sufficiently sensitive and specific in the detection of large and medium polyps; it is especially sensitive in the detection of symptomatic cancer. Studies are poorly reported, however, and the authors propose a minimum data set for study reporting.
© RSNA, 2005
| INTRODUCTION |
|---|
For CT colonography, ambitious claims are mostly based on relatively few within-subject comparisons with colonoscopy from single centers. At the time of this writing, only two multicenter trials of CT colonography have been published (3,4). Furthermore, reported results vary considerably, with quoted sensitivities for large (>1 cm) adenomas of 8%–100% (2). The purpose of our study was to assess the methodologic quality of the available data in published reports of CT colonography by performing a systematic review and meta-analysis.
| MATERIALS AND METHODS |
|---|
Data Sources
A search of the biomedical literature was performed by two researchers (S.H., S.A.T.) working independently and using the MEDLINE database to identify studies involving human subjects. Each researcher covered the period from January 1994, when, to our knowledge, the technique was first described (5), to December 2003. They used the search terms colonography, colography, CT colonoscopy, CT pneumocolon, virtual colonoscopy, and virtual endoscopy. There was no language restriction. A preliminary search that covered the period from January 1994 to May 2003 included other electronic databases (Cochrane controlled trials register, EMBASE, Science Citation Index) and hand searching of key journals from the fields of radiology, gastroenterology, surgery, and general medicine; however, this search did not identify any study that had not already been identified with the MEDLINE database. The reference lists of the reports that were eventually selected were also searched.
Study Selection
Studies were eligible for inclusion if the focus was the detection of colorectal polyps and if the study used key methods for CT colonography that were based on the consensus document presented at the Fourth International Symposium on Virtual Colonoscopy (Boston, Mass) (6). In particular, these criteria stipulated that full bowel preparation should be administered, prone and supine images should be acquired, and helical scanners should be used. Selected studies were restricted to those with full reports, in which original data from in vivo research conducted in human subjects were presented. Studies that were not peer reviewed (eg, abstracts from meetings) were excluded. Any study with fewer than 30 patients was excluded in an attempt to diminish the effect of incorporating any learning curve for CT colonography.
Participants and Prior Tests
Studies were ineligible for inclusion if the prevalence of abnormality could be guessed by the CT observers to be excessively high because a priori patient selection criteria were used. For example, studies that required a prior positive test result for recruitment in the majority of patients were excluded because the observers would know in advance that a positive finding likely existed on CT images. Studies in which patients underwent CT because of incomplete colonoscopy due to an obstructing tumor were also excluded. Studies were still eligible for inclusion, however, if these patients represented the minority of patients examined (ie, less than 50% of patients) or if they formed an identifiable subset that could be excluded during data extraction.
Target Disorder
To be included in our study, the focus of the study had to be detection of colorectal polyps. Studies without details of polyps and their verification with a reference test (eg, those focusing on patient preferences) were excluded. Any study with artificially inserted polyps, digital or otherwise, was excluded.
CT Test Methods
On the basis of the consensus document presented at the Fourth International Symposium on Virtual Colonoscopy (Boston, Mass) (6), all patients had to undergo full bowel preparation before imaging, and both prone and supine CT images had to be obtained. It was acceptable if one of the two CT acquisitions was limited to the pelvis rather than also covering the upper abdomen, since the sigmoid colon is most susceptible to luminal collapse, and exclusion of the upper abdominal region was considered less likely to bias subsequent findings. Studies that presented mixed results from patients examined in single and dual orientations were potentially eligible if data relating to patients examined in dual orientations could be extracted from the report. We excluded studies in which intravenous iodinated contrast material was routinely administered to all patients, since this is unlikely to be offered in the context of a screening program for colorectal cancer (6). Studies in which intravenous contrast material was administered during subsequent CT in response to a possible abnormality detected with initial unenhanced CT were potentially eligible. Studies with mixed results from patients who routinely received intravenous contrast material and those who did not were potentially eligible if data relating to the patients who did not routinely receive intravenous contrast material could be extracted from the report. We required that interpretation of CT colonographic findings precede the reference test or that observers be unaware of reference findings at the time of interpretation of CT findings.
In addition, we required that the software used for interpretation of CT colonography findings be commercially available. Custom platforms were potentially eligible, however, if their features mimicked those available on commercial platforms. In particular, systems had to allow two-dimensional interpretation, with luminal three-dimensional rendering for problem solving (2). Any system that did not permit this was excluded, as was any noncommercial system in which the method of three-dimensional rendering was unconventional or the included diagnostic aids generally were unavailable (eg, computer-aided detection). We did not, however, stipulate that two-dimensional interpretation had to be used for analysis, with three-dimensional interpretation used for problem solving; a primary three-dimensional interpretation was equally acceptable.
Reference Test
All CT colonography findings had to be verified with a within-subject reference test; normally, conventional endoscopy was used, although we stipulated in advance that surgical findings were an acceptable alternative.
Data Extraction
The same two researchers (S.H., S.A.T.) independently assessed the abstract of each potential study and rejected it if it was clearly ineligible. The author names, journal, and year of publication were noted for the remaining reports, and these reports were retrieved, photocopied, and translated into English, if necessary. The same two researchers then independently searched the full version of each report to determine if it was eligible for inclusion. Disagreements were resolved by consensus after a face-to-face discussion (S.H., S.A.T.); however, persistent uncertainty was resolved with regular monthly meetings with other authors (D.G.A., S.M.), if necessary.
For each study, we noted the sample size and calculated the per-patient prevalence of neoplasia according to the reference test. We attempted to extract 2 x 2 contingency tables for test characteristics for both per-patient and per-polyp analyses. Because studies generally reported polyps grouped into three size categories, to reflect their biologic importance, we similarly stratified extracted data into the following three size categories, when possible: small (generally defined as polyps smaller than 6 mm), medium (generally defined as polyps between 6 and 9 mm), and large (generally defined as polyps 1 cm or larger). We attempted to identify established cancers separately from polyps. Readings from multiple observers, if reported, were averaged and rounded down to the nearest whole figure in case of positive CT findings. If it was stated explicitly that a result was from an inexperienced observer (eg, the performance of experienced and inexperienced readers was compared), this result was excluded. We defined an inexperienced observer as one who had interpreted fewer than 30 colonography studies in total.
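One reading of the observer-averaging rule above (average the positive CT counts across observers, round down, and derive the negative cells from the group sizes) is sketched below in Python. The function name and the counts are hypothetical, and deriving the negative cells from the group sizes is our assumption rather than a documented step of the review.

```python
import math

def combine_observers(tp_by_observer, fp_by_observer, n_diseased, n_healthy):
    """Merge per-observer, per-patient results into a single 2 x 2 table.

    Hypothetical helper: positive CT counts (true and false positives) are
    averaged across observers and rounded down; the negative cells are then
    derived from the numbers of diseased and healthy patients.
    """
    k = len(tp_by_observer)
    if k == 0 or k != len(fp_by_observer):
        raise ValueError("need one TP and one FP count per observer")
    tp = math.floor(sum(tp_by_observer) / k)  # mean true positives, rounded down
    fp = math.floor(sum(fp_by_observer) / k)  # mean false positives, rounded down
    return {"TP": tp, "FP": fp, "FN": n_diseased - tp, "TN": n_healthy - fp}

# Hypothetical study: three experienced observers, 40 diseased and 160 healthy patients.
table = combine_observers([34, 36, 35], [9, 12, 10], n_diseased=40, n_healthy=160)
print(table)  # {'TP': 35, 'FP': 10, 'FN': 5, 'TN': 150}
```

Readings from observers with fewer than 30 prior studies would simply be left out of the input lists, in line with the exclusion of inexperienced observers.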
We also extracted important methodologic information about each study that might relate to trial quality or potential bias. This was performed according to the Standards for Reporting of Diagnostic Accuracy (7) and Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews (8) guidelines, and it was based on the study population and technical aspects of CT and the reference test.
The following data were extracted from reports: (a) We noted whether asymptomatic patients were included and whether these patients had a history of colorectal polyps or cancer or had positive results at a recent screening test performed before CT colonography. (b) We noted the time interval between CT colonography and the reference test and whether the result of the reference test was modified because of CT findings (specifically, segmental unblinding of CT results as colonoscopy progressed). (c) We determined if it was possible for other researchers to replicate the technique used for CT colonography and the reference test from the technical and methodologic information presented in each report. (d) We noted whether technical failures of CT colonography and the reference test were reported (namely, when colonoscopy did not reach the cecum) or whether the absence of technical failure was explicitly stated for either test. We recorded the number of observers who reported the CT colonographic findings for each patient and whether their findings were documented individually or in consensus. (e) We examined how potential lesions identified at CT colonography were matched or rejected at subsequent colonoscopy. Specifically, we noted whether lesions were matched by colonic segment and determined how lesions were measured with both tests (eg, whether polyps were measured in situ during colonoscopy or ex vivo after polypectomy). (f) We noted whether any learning effect for CT colonography was presented as the trial progressed.
Authors with more than one study being considered for inclusion were contacted to determine if there was any overlap in patient populations; in the case of overlap, duplicate studies were excluded. An attempt was made to contact authors if data presentation was incomplete or if it was necessary to resolve an apparent conflict or inconsistency in the article. Our institutional review board does not require its approval for such contact; however, individuals who were contacted were informed of our study before they provided us with responses to our queries. Additional information was most frequently required because information relating to the per-patient analysis was missing or incomplete or because cancers were not separately identified from colorectal polyps.
Statistical Analysis
Per-patient analysis was related to the size of the largest polyp in each patient. All polyps were included, irrespective of histologic findings. We considered three categories of data: category 1, large polyps alone; category 2, medium and large polyps combined (ie, those studies for which results were available for patients with at least one polyp larger than 5 mm); and category 3, all polyps (ie, small, medium, and large categories combined, with no minimum size).
These categories reflect different thresholds for clinical decisions and mirror those used in the colonographic literature. The categories overlap (category 2 includes all large polyps, and category 3 includes all polyps), but we were unable to obtain data for every category from every study. We expected the findings across the categories to reflect the fact that larger polyps are easier to detect. For each report, per-patient data from the extracted 2 x 2 table were used to calculate sensitivity, specificity, and exact 95% confidence intervals (CIs). Only studies from which both sensitivity and specificity could be estimated were included in the per-patient analysis. Forest plots of sensitivities and specificities of included studies, grouped according to polyp size categories, were produced with statistical software (Stata, release 8.0; Stata, College Station, Tex) by using the meta command.
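The per-patient tabulation and exact CIs described above can be illustrated with a minimal Python sketch. The thresholds follow the generic size definitions used in this review, but the patient data and all function names are hypothetical, and this is not the Stata code used in the study. We assume "exact" refers to the Clopper-Pearson interval, which is computed here by bisection on the binomial distribution function.

```python
from math import exp, lgamma, log

# Hypothetical per-patient thresholds (mm) for the three categories:
# 1 = large polyps (>= 10 mm), 2 = medium or large (>= 6 mm), 3 = any polyp.
THRESHOLD_MM = {1: 10, 2: 6, 3: 0}

def is_positive(sizes_mm, category):
    """Patient-level result, driven by the largest polyp reported in that patient."""
    largest = max(sizes_mm, default=0)
    return largest > 0 and largest >= THRESHOLD_MM[category]

def per_patient_table(patients, category):
    """patients: list of (ct_sizes_mm, colonoscopy_sizes_mm) pairs, one per patient."""
    cells = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for ct_sizes, ref_sizes in patients:
        ct = is_positive(ct_sizes, category)
        ref = is_positive(ref_sizes, category)
        cells[("T" if ct == ref else "F") + ("P" if ct else "N")] += 1
    return cells

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    return sum(
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))
        for i in range(k + 1)
    )

def exact_ci(x, n, alpha=0.05, tol=1e-10):
    """Clopper-Pearson (exact) two-sided confidence interval for x / n."""
    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p increases; bisect for cdf == target
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)  # P(X >= x) = alpha/2
    upper = 1.0 if x == n else solve(x, alpha / 2)          # P(X <= x) = alpha/2
    return lower, upper

# Hypothetical patients: (polyp sizes on CT, polyp sizes at colonoscopy), in mm.
patients = [([12], [11]), ([7], [8]), ([], [4]), ([5], []), ([], []), ([], [10])]
for category in (1, 2, 3):
    t = per_patient_table(patients, category)
    sens_ci = exact_ci(t["TP"], t["TP"] + t["FN"])
    spec_ci = exact_ci(t["TN"], t["TN"] + t["FP"])
    print(category, t, "sensitivity CI", sens_ci, "specificity CI", spec_ci)
```

Because each category is defined by a threshold on the largest polyp, a patient with one large polyp is positive in all three categories, which is the overlap noted above.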
Meta-analysis of paired sensitivity and specificity was conducted by using a hierarchical model that enables estimation of a summary receiver operating characteristic (ROC) curve that allows for variation in threshold between studies. Explicit variation will arise when the minimum threshold for detection of polyps differs between studies, whereas additional variation will occur with differences in the spectra of the diseased and healthy groups included in the different samples. The summary ROC model was fitted with a nonlinear binary random-effects method that used the PROC NLMIXED command (9) of the SAS program (SAS Institute, Cary, NC). The model estimates the average threshold and diagnostic odds ratio, as well as their variability, and it allows summary ROC curves to have either a symmetrical or an asymmetrical shape. To obtain stable estimates for category 2 polyps, the model was simplified to a summary ROC curve with a symmetrical shape and no variability in the diagnostic odds ratio. From the model, it is possible to calculate the average operating point, which is the point on the summary ROC curve that represents the sensitivity and specificity at the average threshold, together with 95% CIs. When interpreting the results of these models, it is important to consider both these figures and the variability in sensitivity and specificity along this curve, as depicted in the ROC plot across the range of study values.
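The relation between the model parameters, the summary ROC curve, and the average operating point can be made concrete with a short Python sketch. It assumes the Rutter and Gatsonis parameterization of the hierarchical summary ROC model (threshold θ, accuracy α, shape β, with disease status coded +1/2 and −1/2); reference 9 is not reproduced here, so this parameterization is an assumption. The parameter values are hypothetical, and the sketch only evaluates an already fitted model rather than fitting one as PROC NLMIXED does.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def average_operating_point(theta, alpha, beta):
    """Sensitivity and specificity at the average threshold of an HSROC model.

    Assumed model (Rutter and Gatsonis form), with x = +1/2 for diseased and
    x = -1/2 for nondiseased patients:
        logit(P(positive test)) = (theta + alpha * x) * exp(-beta * x)
    """
    sensitivity = expit((theta + alpha / 2) * math.exp(-beta / 2))
    false_positive_rate = expit((theta - alpha / 2) * math.exp(beta / 2))
    return sensitivity, 1.0 - false_positive_rate

def sroc_sensitivity(false_positive_rate, alpha, beta):
    """Summary ROC curve: expected sensitivity at a given false-positive rate.

    beta = 0 gives a symmetrical curve with a constant diagnostic odds ratio
    exp(alpha), as in the simplified category 2 model described above.
    """
    return expit(alpha * math.exp(-beta / 2) + logit(false_positive_rate) * math.exp(-beta))

# Hypothetical fitted values, not estimates from this review.
sens, spec = average_operating_point(theta=-1.0, alpha=6.0, beta=0.3)
print(f"average operating point: sensitivity {sens:.3f}, specificity {spec:.3f}")
for fpr in (0.01, 0.05, 0.10, 0.20):
    print(f"FPR {fpr:.2f} -> sensitivity {sroc_sensitivity(fpr, 6.0, 0.3):.3f}")
```

By construction, the average operating point lies on the summary curve: substituting its false-positive rate into `sroc_sensitivity` returns its sensitivity.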
Per-polyp analysis yielded results for sensitivity but not specificity, because there was no denominator for these data (ie, no countable set of polyp-free sites that could yield true-negative findings). Analysis of per-polyp data was undertaken with a random-effects meta-analysis model on logit-transformed proportions by using the meta command of the statistical software. We used χ2 tests to assess heterogeneity between studies, and separate tests were used for sensitivity and specificity. Results are presented as the average sensitivity, with 95% CIs; study variation is indicated by the range of sensitivity values observed for the individual studies. For detection of cancer, the number of cancers per study was too small to allow meta-analysis. Sensitivity was therefore calculated by pooling the data as if they were from a single study; because this approach ignores between-study variation, the CIs calculated with this method are uncertain.
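A minimal sketch of random-effects pooling of logit-transformed proportions is given below. It uses the DerSimonian and Laird moment estimator with a 0.5 continuity correction for studies at 0% or 100%; we assume, but have not verified, that this corresponds to what the Stata meta command computes. Cochran's Q is the statistic that the χ2 heterogeneity test refers to a χ2 distribution with k − 1 degrees of freedom. The function name and counts are hypothetical.

```python
import math

def dl_pool_logit(events, totals, cc=0.5):
    """Random-effects (DerSimonian and Laird) pooled proportion on the logit scale.

    events/totals: detected and total polyps per study (hypothetical data).
    Returns the pooled proportion, its 95% CI, Cochran's Q, df, and tau^2.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies with matching lists")
    y, v = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:              # avoid infinite logits at 0% or 100%
            x, n = x + cc, n + 2 * cc
        y.append(math.log(x / (n - x)))   # logit of x / n
        v.append(1 / x + 1 / (n - x))     # approximate variance of the logit
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)         # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def expit(z):
        return 1 / (1 + math.exp(-z))

    return {
        "pooled": expit(y_re),
        "ci": (expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)),
        "Q": q, "df": df, "tau2": tau2,
    }

# Hypothetical per-polyp detections of large polyps in four studies.
print(dl_pool_logit(events=[18, 40, 9, 25], totals=[20, 55, 15, 30]))
```

When Q does not exceed its degrees of freedom, tau^2 is truncated at zero and the result reduces to a fixed-effect pooled estimate.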
| RESULTS |
|---|
We were able to extract a fully populated 2 x 2 contingency table for per-patient data for any polyp size category from the published information in only 12 (50%) of the 24 articles (12,15,16,18,19,21–24,27,29,32). Data were available for a further five (21%) articles after we contacted the corresponding author (4,10,11,13,31). In contrast, we were able to extract a 1 x 2 contingency table for per-polyp data for any polyp size category from all articles, although one article (4) reported this for adenomas only.
Per-Patient Analysis
For each of the three polyp size categories we present (a) a forest plot of sensitivity; (b) a forest plot of specificity; and (c) an ROC plot of sensitivity versus 1 minus specificity. For polyps in categories 1 and 2, we also show in the ROC plot the fitted summary ROC curve. This last analysis was not performed for category 3 polyps because of the considerable amount of heterogeneity between studies. Heterogeneity in the results for polyps in categories 1 and 2 was of a lower magnitude and was estimated with the random-effects meta-analysis included in our statistical model.
For category 1 polyps, most studies had high sensitivity (Fig 1a), and all studies had excellent specificity (Fig 1b). Figure 1c shows that the studies cluster near the top left corner in the ROC plot, and the fitted summary curve from meta-analysis is very close to the corner. Meta-analysis was based on data from 2610 patients in seven studies; in 206 of these patients, at least one large polyp was identified. From this model, the operating point based on the included studies has an average sensitivity of 93% (95% CI: 73%, 98%; range, 64%–100%) and an average specificity of 97% (95% CI: 95%, 99%; range, 95%–100%).
Per-Polyp Analysis
Figure 4 shows sensitivity of polyp detection for category 1–3 polyps. These results show how the performance of CT colonography deteriorates for smaller polyps. For category 1 polyps, the average sensitivity was 77% (95% CI: 70%, 83%). For category 2 polyps, the average sensitivity was 70% (95% CI: 63%, 76%). We did not pool the data for category 3 polyps (Fig 4c) because of the large amount of heterogeneity, as discussed for the per-patient analysis.
| DISCUSSION |
|---|
Our analysis reinforces the findings to date, which are mostly based on single-center studies: CT colonography used as a diagnostic tool on a per-patient basis has high average sensitivity (93%; 95% CI: 73%, 98%) and average specificity (97%; 95% CI: 95%, 99%) for large colorectal polyps, and these test characteristics diminish with the size of the target lesion. A striking finding was the high sensitivity for detection of cancer, which has hitherto been obscured by the relatively small numbers of patients per individual study and by the focus of medical and public attention on polyps and screening. In most studies, patients were recruited from symptomatic populations; this strongly suggests that CT colonography merits further investigation as a diagnostic tool for cancer in its own right.
Heterogeneity between individual study results was observed in this review, as is common with diagnostic accuracy studies. Heterogeneity can be due to random variation between studies, variation of study characteristics (eg, patient spectrum), or variation in the diagnostic threshold required for a positive test result. The latter can be either explicit or implicit (75). Explicit threshold effects were accounted for by analyzing studies according to polyp size: category 1 (ie, large polyps), category 2 (ie, medium and large polyps), and category 3 (all polyps). We found greater heterogeneity of study results within category 3 polyps than within either of the other categories of polyps; this was most likely due to a spectrum that included mixed polyp sizes. The ROC plots and regression analysis enabled us to examine additional sources of heterogeneity between studies. We predicted that incorporation bias might increase sensitivity and specificity and that reporting results by consensus assessment might also increase sensitivity; unfortunately, there were too few studies to allow us to assess heterogeneity due to these two study characteristics.
In broad terms, our summary estimates are similar to those found by Sosna et al (74), who analyzed 14 studies; however, these authors reported no difficulty extracting data for their analysis. In contrast, we consider the main outcome of our review to be the finding that there was no generally accepted format for data reporting. Methods were markedly heterogeneous, so that important data frequently were unavailable from the original articles. This situation is analogous to that in early studies of magnetic resonance (MR) imaging, in which Cooper et al (76) found poor levels of data reporting in 54 studies. Dachman and Zalis (77) have proposed standards for performing and reporting studies of CT colonography, with the aim of facilitating data synthesis. The objective findings from our systematic review support their observations (77). We wish to reinforce their suggestions by direct reference to the results of our review and to extend them by suggesting a minimum data set for study-level reporting of CT colonography (Tables 3, 4). This minimum data set is directly based on our difficulties with data extraction, and its rationale can be broken down into the following five categories.
The patient population recruited should be described fully. Asymptomatic patients should be separated into those with no known risk factor and those with above-population risk because they are under surveillance, have already tested positive on another test, or have a family history sufficient to suspect a diagnosis of hereditary nonpolyposis colorectal cancer. It should be remembered that colorectal cancer is common, and even population-risk groups will include subjects with a family history of disease. Whether the sample was acquired consecutively or by convenience, and prospectively or retrospectively, should be described. Withdrawal of subjects should be detailed, and inclusion of subjects who participated in previously reported studies should be acknowledged. Results should be presented according to the different risk subgroups included.
Replication of CT Colonography and Reference Test
While CT colonography could be satisfactorily reproduced from the descriptions given in all studies, technical details relating to reference colonoscopy were frequently inadequate.
Reference colonoscopy should be detailed sufficiently to allow it to be replicated by other researchers. The experience of the operator(s) should be defined. Technical failures and their cause should be detailed for both CT and colonoscopy.
Observers of CT Colonography
It was occasionally unclear how many observers looked at each CT study and exactly what was meant by a consensus decision. Consensus implies that a decision was made after a face-to-face discussion, but some researchers presented results summated from two or more independent observers.
The number of CT observers per research study and per patient should be detailed, along with the experience of these observers. Results should be presented fully for each observer or, if derived by consensus, this should follow face-to-face discussion rather than be an aggregate of individual observations. An investigation of learning effects should be presented for inexperienced observers.
Matching Lesions between CT and Reference Colonoscopy
Ostensibly, assessment of CT colonography should be simple to achieve with within-subject comparison with reference colonoscopy, a design used by all eligible studies. We found, however, that several common factors conspired to frustrate meaningful assessment. A polyp found with CT colonography but not with colonoscopy is regarded as a false-positive finding, but failure to identify the polyp at colonoscopy has several potential meanings: There may be no polyp; a polyp may be present but missed at colonoscopy; or a polyp may have been found at colonoscopy but judged to be different from the polyp seen on the CT image, because it was thought to be in a different location in the colon, to be a different size, or both. Meaningful assessment of CT colonographic findings therefore depends on accurate matching of polyps detected with CT with those found (or not found) with reference colonoscopy. During our analysis, it became clear that matching polyps was potentially the major source of error and uncertainty in these evaluations.
Most investigators recorded the segmental location of polyps detected during both CT and colonoscopy, and they considered a match to have been made if a polyp found on a CT image was in the same or the immediately adjacent anatomic segment at colonoscopy. However, the boundary between segments rarely was defined precisely; even if the boundary was defined precisely, it is unclear how reliable these definitions are. For example, electromagnetic imaging has shown that colonoscopists are frequently unable to correctly locate the anatomic position of the endoscope tip (78). Many investigators also used a size matching algorithm between CT and colonoscopy (eg, stating that polyps had to be within 50% of the colonoscopic measurement). In almost every report, the results presented did not distinguish which false-positive CT findings were due to no lesion being seen at colonoscopy and which were due to size mismatching, anatomic mismatching, or a combination of both. Also, no report presented a cross-tabulation of polyp sizes obtained with the two methods.
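The segment-and-size matching rule described above can be expressed as a short Python sketch. The segment list, the 50% tolerance applied to the colonoscopic size, and the greedy one-to-one matching are illustrative assumptions, not the rule of any particular study. The example shows how a size mismatch alone turns one lesion into both a false-positive and a false-negative finding.

```python
# Hypothetical segment order, rectum to cecum; studies defined segments differently.
SEGMENTS = ["rectum", "sigmoid", "descending", "transverse", "ascending", "cecum"]

def is_match(ct_polyp, ref_polyp, size_tolerance=0.5):
    """True if the CT polyp lies in the same or an adjacent segment and its size
    is within size_tolerance (as a fraction) of the colonoscopic measurement."""
    ct_segment, ct_size = ct_polyp
    ref_segment, ref_size = ref_polyp
    same_or_adjacent = abs(SEGMENTS.index(ct_segment) - SEGMENTS.index(ref_segment)) <= 1
    size_ok = abs(ct_size - ref_size) <= size_tolerance * ref_size
    return same_or_adjacent and size_ok

def score_ct_findings(ct_polyps, ref_polyps):
    """Greedy one-to-one matching; returns (true positives, false positives, false negatives)."""
    unmatched = list(ref_polyps)
    tp = fp = 0
    for ct in ct_polyps:
        hit = next((ref for ref in unmatched if is_match(ct, ref)), None)
        if hit is None:
            fp += 1                 # no acceptable colonoscopic counterpart
        else:
            unmatched.remove(hit)   # each colonoscopic polyp is used at most once
            tp += 1
    return tp, fp, len(unmatched)   # leftover colonoscopic polyps count as CT misses

# Hypothetical (segment, size in mm) findings for one patient.
ct = [("sigmoid", 12), ("transverse", 8)]
colonoscopy = [("descending", 9), ("ascending", 5)]
print(score_ct_findings(ct, colonoscopy))  # (1, 1, 1)
```

Greedy matching depends on the order in which CT findings are processed, so a study using such a rule would need to state how multiple candidate matches were resolved.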
The problem of size matching is further complicated by the fact that investigators usually chose to group polyps into three size categories (ie, small, medium, and large), as a reflection of their biologic importance. It was usually unclear how often polyps allocated to one category according to their size on CT images were reclassified when measured with colonoscopy. Most important, this could result in the illogical situation of a large polyp being identified at CT colonography and treated as a false-positive finding if subsequent colonoscopy revealed a medium polyp in the same segment, with an additional false-negative finding because the medium polyp was missed at CT. Moreover, the biologic importance of these arbitrary classifications is well known, and we could find no study in which the observer making the measurement (for both CT and colonoscopy) was blinded to the value of the measurement itself; a 1-cm polyp has an importance that a 9-mm polyp does not, and this knowledge may influence the value of the measurement being made.
Furthermore, polyps may be measured by using a variety of methods with CT and colonoscopy. Reference measurement was usually performed with adjacent open biopsy forceps, which are known to be inaccurate (79); a measuring probe was used explicitly in only two studies (4,18). In some studies, measurements were obtained in vitro after polypectomy, whereas other studies combined in vivo and in vitro measurements; usually, it was not stated which of these was the reference measurement. Some reports did not describe how the reference measurement was obtained. During CT colonography, polyps may be measured with either two-dimensional source images, two-dimensional multiplanar reformatted images, or intraluminal three-dimensional rendered perspective images, each of which has been shown to give differing results, as does the window setting used to view the images (80).
Colonoscopy is also known to be an imperfect reference standard. Consecutive (back-to-back) colonoscopy studies suggest that competent practitioners initially miss 24% of adenomas (81). A minority of studies used segmental unblinding to account for this, in an attempt to avoid true-positive CT findings being incorrectly classified as false-positive findings. Although this procedure yields a reference standard that is closer to the truth, it violates a fundamental feature of a fair comparison of CT and colonoscopy, and it will lead to overestimation of both the sensitivity and the specificity of CT colonography.
The exact measurement method should be stated for both CT and colonoscopy.
Both CT and initial colonoscopy should be performed with blinding to any preexisting results or history. Segmental unblinding should be used to modify the initial diagnostic colonoscopy to obtain an enhanced reference standard, but this should not be used as the basis for evaluating the performance of colonoscopy versus CT.
Data should be presented in the form of a contingency table that includes every polyp size category being considered, which to some extent will enable researchers to overcome the problems of distinguishing between false-positive CT findings either due to no polyp being detected at colonoscopy or due to size mismatching.
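A cross-tabulation of the kind recommended above might be produced as follows. Rows are CT size categories, columns are colonoscopic size categories, and a "none" row or column records lesions seen with only one test. The category cutoffs follow the generic definitions used in this review; the lesion pairs are hypothetical and assume that matching has already been performed.

```python
from collections import Counter

CATEGORIES = ["none", "small", "medium", "large"]

def size_category(size_mm):
    """Generic size categories: small < 6 mm, medium 6-9 mm, large >= 10 mm."""
    if size_mm is None:
        return "none"            # lesion not seen with this test
    if size_mm >= 10:
        return "large"
    if size_mm >= 6:
        return "medium"
    return "small"

def size_cross_tab(pairs):
    """pairs: (ct_size_mm, colonoscopy_size_mm) per matched lesion; None = not seen.

    Returns a 4 x 4 table: rows are CT categories, columns are colonoscopic
    categories, both in the order of CATEGORIES.
    """
    counts = Counter((size_category(ct), size_category(ref)) for ct, ref in pairs)
    return [[counts[(row, col)] for col in CATEGORIES] for row in CATEGORIES]

# Hypothetical matched lesions; (11, 9) is a "large on CT, medium at colonoscopy" case.
pairs = [(11, 9), (12, 13), (7, 7), (None, 8), (6, None), (3, 4)]
table = size_cross_tab(pairs)
print("CT \\ colonoscopy", *CATEGORIES, sep="\t")
for name, row in zip(CATEGORIES, table):
    print(name, *row, sep="\t")
```

The off-diagonal cell for the 11-mm/9-mm lesion makes the reclassification visible instead of splitting it silently into a false-positive large polyp and a false-negative medium polyp.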
Analysis and Data Presentation
All reports detailed per-polyp analysis, but only 50% of reports presented data sufficient to extract any 2 x 2 table relating to per-patient analysis. However, like other researchers (2), we would argue that per-patient analysis is the key analysis. CT colonography is used to identify individuals with polyps or cancers who need subsequent video colonoscopy for polypectomy or biopsy. The number and size of lesions are, therefore, irrelevant once a polyp large enough to trigger subsequent colonoscopy is identified. In one study, researchers chose to exclude nonadenomatous polyps from analysis (4). This seems illogical, since endoscopy is necessary to ultimately determine histologic nature. Similarly, many studies did not distinguish between adenomas and cancers. Some reports confounded per-patient analysis by presenting data in three size categories without indicating where individuals contributed to more than one category. We believe a sensible approach is to present data in terms of size thresholds (ie, all polyps above a specified size are included) (82); however, this does not deal with the problem of measurement error, which is especially relevant when polyps lie close to a threshold. Per-patient analysis also potentially eliminates problems due to location matching, although it does not properly deal with the situation in which CT and colonoscopy each identify a different polyp in the same patient, which is nonetheless scored as a true-positive result. A per-polyp analysis may also be desirable, but consideration should be given to how size matching is handled, since this appears to penalize CT.
We think that per-patient data should be presented in a contingency table. The results of CT and initial colonoscopy should be compared with the results of modified colonoscopy to appropriately compare CT with colonoscopy in daily practice.
Data for polyps of a given size should be presented, regardless of histologic findings, but per-patient data for adenomas and cancers should be detailed in subset analyses.
Results should be interpreted as positive if both CT and reference colonoscopy depict a polyp larger than a stated size threshold, with a secondary analysis performed to determine how well the two methods agree for size measurement.
Our study had limitations. We used methods being developed by members of the Cochrane Collaboration Diagnostic and Screening Test Methods Group (83); however, as indicated by the systematic review component of our research, data presentation was variable. We did not receive a reply from all authors we contacted, and some authors were unable to supply the additional data we needed to complete our analysis of their study. Thus, the meta-analytical component of our study must be interpreted with some caution. Also, it is well recognized that the statistical methods used for meta-analysis of diagnostic tests are not as established as those used for meta-analysis of therapeutic interventions (75). Diagnostic thresholds differ between trials and between observers in the same trial, and this can profoundly affect heterogeneity. This is especially so with CT colonography, because the diagnostic threshold depends not only on whether a polyp is identified but also on its measurement, which is itself likely to be observer and technique dependent and to differ within and across trials. Ultimately, it might be argued that meta-analysis of relatively small studies is irrelevant when results from large multicenter trials are available (3,4). However, we would argue that the conflicting results from such trials leave us no wiser; rather, they mandate detailed systematic analysis of the experimental methods used in studies of CT colonography. Some will argue that the rapid pace of technologic advance in radiology precludes inclusion of anything but the most recent studies; however, we could find no difference between single and multidetector row CT scanners.
Our analysis suggests that CT colonography has high average sensitivity and specificity for large and medium colorectal polyps and excellent sensitivity for cancer in symptomatic patients. More work is needed in asymptomatic subjects. Our analysis formally supports the impression of other researchers (77) that more detailed reporting is needed. The minimum data set proposed (Table 3) is based on the methodologic problems we encountered during this systematic review. We hope its adoption will lead to better quality reporting of future studies.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Abbreviations: CI = confidence interval, ROC = receiver operating characteristic
See Materials and Methods for pertinent disclosures.
Author contributions: Guarantor of integrity of entire study, S.H.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; literature research, S.H., S.A.T., S.M.; statistical analysis, D.G.A., S.M., J.J.D.; and manuscript editing, all authors.
| References |
|---|