Radiology
Published online before print February 21, 2008, 10.1148/radiol.2471070816
(Radiology 2008;247:133-140.)
© RSNA, 2008


Gastrointestinal Imaging

CT Colonography and Computer-aided Detection: Effect of False-Positive Results on Reader Specificity and Reading Efficiency in a Low-Prevalence Screening Population1

Stuart A. Taylor, MD, MRCP, FRCR, Rebecca Greenhalgh, FRCR, Rajapandian Ilangovan, MD, FRCR, Emily Tam, FRCR, Vikram A. Sahni, FRCR, David Burling, MD, MRCP, FRCR, Jie Zhang, MD, Paul Bassett, BSc, Perry J. Pickhardt, MD, and Steve Halligan, MD, FRCP, FRCR

1 From the Department of Specialist X-Ray, University College Hospital, 2F Podium, 235 Euston Rd, London NW1 2BU, England (S.A.T., R.G., S.H.); Department of Intestinal Imaging, St Mark's Hospital, Harrow, England (R.I., E.T., V.A.S., D.B., P.B.); Department of Radiology, Beijing Friendship Hospital, Beijing, China (J.Z.); and Abdominal Imaging Section, University of Wisconsin Medical School, Madison, Wis (P.J.P.). Received May 9, 2007; revision requested July 13; revision received August 17; accepted September 19; final version accepted October 12. Supported in part by the Department of Health's NIHR Biomedical Research Centres funding scheme. Address correspondence to S.A.T. (e-mail: csytaylor{at}yahoo.co.uk).


    ABSTRACT
 
Purpose: To retrospectively evaluate the effect of increasing numbers of computer-aided detection (CAD)-generated false-positive (FP) marks on reader specificity and reporting times by using computed tomographic (CT) colonography in a low-prevalence screening population.

Materials and Methods: Ethics committee approval and informed consent were obtained for this HIPAA-compliant study. Four readers each read 48 data sets (26 men, 22 women; mean age, 57 years) from a screening population (three containing polyps) without CAD application, followed by review of the CAD output and recorded findings and diagnostic confidence. The 45 data sets that were designated as normal were chosen such that 22 generated 15 or fewer FP CAD marks and 23 generated more than 15 FP CAD marks. Sensitivity, specificity, and receiver operating characteristic (ROC) curves were calculated with and without CAD. The relationships between the number of CAD FP marks and reader confidence, reporting times, and correct data set classification were analyzed by using linear and logistic regression.

Results: Across all readers, CAD resulted in four additional FP detections. Overall reader sensitivity and specificity (6-mm polyp threshold) before and after CAD application were 0.75 (95% confidence interval [CI]: 0.43, 0.95) versus 0.83 (95% CI: 0.52, 0.98) and 0.96 (95% CI: 0.91, 0.98) versus 0.93 (95% CI: 0.88, 0.96), respectively. The area under the ROC curve increased from 0.57 (95% CI: 0.34, 0.80) to 0.61 (95% CI: 0.42, 0.80). There was no correlation between an increasing number of CAD FP marks and reader confidence (P = .71) or correct study classification (P = .23), but there was a positive correlation with CAD-assisted reading times (0.06 [95% CI: 0.02, 0.10], P = .002).

Conclusion: Increasing numbers of CAD FP marks did not adversely influence correct reader study classification or diagnostic confidence, although reporting times did increase.

Supplemental material: http://radiology.rsnajnls.org/cgi/content/full/2471070816/DC1


    INTRODUCTION
 
Computer-aided detection (CAD) has proved effective in situations where radiologists must detect small lesions that occur infrequently, notably in screening mammography (1,2), and there are accumulating data detailing good stand-alone performance for several colon CAD systems (3–6). Preliminary studies suggest that the influence of colon CAD on reader sensitivity will be positive (7,8). Most have used heavily enriched data sets to improve statistical power, but it could be argued that the influence of CAD will be greatest in a screening setting where radiologist vigilance must remain high despite most examinations being normal (9).

For computed tomographic (CT) colonographic screening to be cost effective, it is important that unnecessary colonoscopy precipitated by false-positive (FP) CT colonographic interpretations be kept to a minimum (10,11). All colon CAD systems generate FP marks and there is potential for these to adversely influence reader specificity and/or efficiency in a low-prevalence screening setting, an observation well described for screening mammography (12). However, it is unknown whether the actual number of CAD FP marks matters or whether an increase in the number of marks influences the effectiveness of CAD. Thus, the purpose of our study was to retrospectively evaluate the effect of increasing numbers of CAD-generated FP marks on reader specificity and reporting times by using CT colonography in a low-prevalence screening population.


    MATERIALS AND METHODS
 
Medicsight (Hammersmith, London, England) and Vital Images (Minnetonka, Minn), respectively, provided the CAD equipment and viewing software used in this study. Some authors (D.B., S.H., S.A.T., P.J.P.) are research consultants to Medicsight. Nonconsultant authors had full control of the data and information submitted for publication.

Ethics committee approval and patient consent were obtained from the donor institution for the CT colonographic data sets used in this Health Insurance Portability and Accountability Act–compliant study.

Data Set Preparation
A colon CAD system (ColonCAD API, version 2.0; Medicsight) was applied to a database of CT colonography studies collected from an ongoing single-site screening program (University of Wisconsin Hospitals and Clinics, Madison, Wis) (Fig 1). Studies had the following characteristics: bowel preparation, 45 mL sodium phosphasoda (18 hours prior to CT colonography; Fleet Pharmaceuticals, Lynchburg, Va) with 2% barium suspension (250 mL, 15 hours prior to CT colonography; Scan C, Lafayette Pharmaceuticals, Lafayette, Ind) and diatrizoate meglumine and diatrizoate sodium (60 mL, 12 hours prior to CT colonography; Gastrografin, Bracco Diagnostics, Princeton, NJ). Scan parameters for the 16-section CT scanner (LightSpeed; GE Healthcare, Milwaukee, Wis) were 1.25-mm collimation, 1-mm reconstruction interval, 120 kVp, and 50–75 mAs. By using a software harness, the total number of CAD marks (supine and prone combined) was counted for each data set (patient) that was originally reported as normal (ie, containing no polyps ≥6 mm) by five experienced program radiologists (P.J.P. and four nonauthors, with experience with 500–1000 endoscopically verified data sets each).


Figure 1: Flow diagram of study data set collation.

 
Given a prestudy power calculation (see below), data sets in 45 patients (25 men, 20 women; mean age, 57 years; range, 36–85 years) were selected at random on the basis that 22 generated 15 or fewer FP CAD marks (median, 13.5; range, 8–15) per study and 23 generated more than 15 CAD FP marks per study (median, 21; range, 17–25). Three additional data sets in one man and two women from the same database, each containing one endoscopically verified adenomatous polyp of 10 mm or larger, were then chosen at random.

These 48 data sets were then reread by two additional radiologists (D.B., S.A.T., experienced with 800 and 900 endoscopically verified CT colonography studies, respectively) with and without CAD application (see below) to confirm complete radiologic normality and that all CAD marks were FPs in the 45 negative studies and to locate the known polyp by using colonic segment and section numbers in the three positive studies. The readers also assessed the quality of the bowel preparation and distention for each data set, noting whether areas of colonic mucosa remained unseen owing to fecal residue and/or untagged fluid or collapse, and classified each study by consensus as either diagnostically acceptable or not (ie, necessitating a repeat study owing to inability to exclude a ≥6-mm polyp).

Power calculation.—Given the potential statistical deficiencies of powering a study by using continuous data (in this case, CAD FP marks) with a binary outcome (reader study classification as FP or true-negative), the power calculation was performed by using a binary cutoff. Given previous data (7), it was assumed that observers using CAD would generate FP detections in 10% of normal studies with 15 or fewer FP marks. We hypothesized that this would increase to 30% with more than 15 FP marks per study. To detect this difference (at 5% significance and 80% power), 72 readings in each group (144 total) were required. This was increased to 180 readings (45 normal data sets read by four readers) to allow for some nonindependence of data by using more than one reader.
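The sample size quoted above can be approximated with a standard two-proportion calculation. The sketch below is illustrative only: the text does not say which formula or software was used, so the normal-approximation formula and the Fleiss continuity correction are assumptions, and the function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Readings needed per group to compare two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Fleiss continuity correction: an assumption, not stated in the text.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


# Assumed FP rates from the text: 10% (15 or fewer marks) vs 30% (more than 15).
print(n_per_group(0.10, 0.30))                    # with continuity correction
print(n_per_group(0.10, 0.30, continuity=False))  # without correction
print(45 * 4)                                     # readings available in the study
```

Under these assumptions the continuity-corrected formula gives 72 readings per group, which agrees with the figure reported above; the uncorrected formula gives a smaller number.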

Reader selection and CAD workstation integration.—Four radiologists (E.T., R.G., R.I., V.A.S.) took part in the study. All had attended a 2-day dedicated CT colonography workshop within the 6 months preceding the study, were familiar with the workstation (Vitrea 3.8; Vital Images), had read at least 75 endoscopically validated data sets (range, 75–120), and were reporting findings at unaided CT colonography in daily clinical practice.

Before the study, readers were provided with historical data relating to CAD performance (13). In brief, the data described external validation of the CAD software, providing expected sensitivity and FP rates in similar CT colonographic data sets, at the settings used for the present study. Readers were also given a 1-hour tutorial on the specific integration in the workstation software. The workstation integration used for the study has been described elsewhere (6,7). In brief, the software segments the colon included in the CT data set and determines the inherent sphericity of all objects projecting into the colonic lumen. Detections with a sphericity above a predetermined threshold level are then prompted visually to the observer by using small red dots superimposed over the region of interest on two-dimensional (2D) transverse and three-dimensional (3D) endoluminal views. The CAD iteration utilized analyzes each CT scan acquisition (supine or prone) independently, and, as such, CAD marks are not matched between the supine and prone positions.

Reading sessions.—To mimic clinical practice, readers were informed that studies were acquired from an asymptomatic screening population but were given no other information about the prevalence of abnormality or the aims of the study. Each reader was provided with a list of study numbers in randomized order and was instructed to independently read the studies (see below) over a period of 2 weeks (to mimic normal reporting volumes).

Reading paradigms.—Readers were free to use the full functionality of the workstation (ie, 2D transverse, multiplanar reformations, 3D cube, and full endoluminal fly-through images), mirroring normal clinical practice, and were instructed to first analyze each case (ie, prone and supine data sets) without CAD, as per their usual clinical practice (unassisted read). Readers were specifically told to follow CT Colonography Reporting and Data System (CRADS) guidelines (14) (ie, only studies containing a polyp measuring ≥6 mm were considered abnormal; polyps were measured as per CRADS guidelines). Readers were free to measure polyps by using either 2D multiplanar reformation or 3D endoluminal views, according to their usual practice. Readers noted interpretation time (defined as time taken to read the data set once opened on the workstation) on a study sheet along with each perceived abnormality noting colonic segment, 2D transverse section number, lesion size (in millimeters), and overall diagnostic confidence that the case was normal (scored from 1 [least confident] to 100 [most confident]). Readers were not told to use a particular confidence score to indicate if they would recommend colonoscopy in clinical practice.

Once this initial read was complete, readers immediately applied the preprocessed CAD and reviewed the case again. There was no software functionality to move automatically from CAD mark to CAD mark (eg, by hitting a specific keyboard key) and readers assessed each CAD mark by scrolling through the data set. Readers documented any additional findings seen with CAD and were permitted to discard any of their unassisted findings. Readers then revised overall case confidence in light of CAD and recorded the additional time taken. At the end of the study, the preferred primary reading method (2D or primary 3D endoluminal fly-through) was documented for each reader.

Case marking.—A radiologist (J.Z., experienced with 300 endoscopically validated CT colonographic data sets) who did not take part in the main study reviewed the reader report forms, documenting reader performance against known patient status. In consensus with two other radiologists (D.B., S.A.T., experienced with 800 and 900 endoscopically verified studies, respectively), the causes of all reader FP marks were classified in seven categories as follows: (a) bulbous fold, a prominent fold in an otherwise well distended segment; (b) segment under distension; (c) fecal residue and/or residual fluid; (d) normal colonic anatomy (eg, ileocecal valve, redundant mucosa, internal hemorrhoid); (e) extracolonic; (f) diminutive (≤5 mm) polyp; and (g) unexplained. Finally, all normal data sets were reviewed and all CAD FP marks classified in the same way. Because the particular iteration of the CAD system utilized does not provide sizes of lesions detected with CAD, instead marking a region of interest for radiologist review, it was not possible to rank CAD FP marks according to size.

Statistical Analysis
Effect of CAD.—Per-case sensitivity and specificity were calculated with and without CAD. The effect of CAD on reader confidence was assessed by using a paired t test. Receiver operating characteristic (ROC) curves were generated for each reader with and without CAD on the basis of the association between confidence level and correct case classification (normal vs abnormal).
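As a rough illustration of how an ROC area can be derived from a confidence score of this kind, the sketch below uses the rank-based (Mann-Whitney) definition of AUC. The scores are invented for illustration and are not study data; the function name `auc_from_confidence` is hypothetical, and the authors' actual ROC software is not stated.

```python
def auc_from_confidence(normal_scores, abnormal_scores):
    """AUC as the chance that a normal case receives a higher
    'confidence that the case is normal' score than an abnormal case.
    Tied pairs count as one half."""
    wins = 0.0
    for n in normal_scores:
        for a in abnormal_scores:
            if n > a:
                wins += 1.0
            elif n == a:
                wins += 0.5
    return wins / (len(normal_scores) * len(abnormal_scores))


# Hypothetical confidence scores on the 1-100 scale (not study data).
normal = [90, 80, 70]
abnormal = [60, 85]
print(auc_from_confidence(normal, abnormal))
```

An AUC of 0.5 corresponds to confidence scores that do not separate normal from abnormal cases at all; 1.0 corresponds to perfect separation.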

Effect of CAD FP marks.—The distribution of the causes of CAD FP marks was compared between those data sets generating 15 or fewer CAD FP marks and those generating more than 15 CAD FP marks by using a χ2 test, and the numbers of reader FP detections before and after CAD were calculated. Linear regression was used to examine the relationship between the number of CAD FP marks and both reader case confidence and reporting times. The effect of CAD FP numbers on correct reader classification of normal data sets was also examined by using logistic regression. Robust standard errors (by using the Huber, White, and sandwich estimators of variance) (15) were used for all regression analyses to account for the fact that each case was included in the analysis four times (once for each observer).
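The idea behind the robust standard errors described above can be sketched for a simple linear regression with one predictor, treating each case as a cluster of four reads. This is a minimal illustration under stated assumptions: it uses the basic cluster sandwich formula with no small-sample correction (statistical packages usually apply one), the function name `cluster_robust_slope` and the data are hypothetical, and it is not the authors' code.

```python
from collections import defaultdict
from math import sqrt


def cluster_robust_slope(x, y, cluster):
    """Fit y = a + b*x by least squares and return (b, robust SE of b).

    The SE uses the cluster sandwich estimator without finite-sample
    correction: residual scores are summed within each cluster before
    squaring, so correlated reads of one case are not treated as independent.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    scores = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        residual = yi - (a + b * xi)
        scores[g] += (xi - mean_x) * residual
    variance = sum(s ** 2 for s in scores.values()) / sxx ** 2
    return b, sqrt(variance)


# Hypothetical data: x = CAD FP marks, y = extra minutes, cluster = case ID.
marks = [0, 1, 2, 3]
minutes = [0, 1, 1, 3]
case_id = [0, 0, 1, 1]
print(cluster_robust_slope(marks, minutes, case_id))
```

In the study design, `cluster` would be the data set (case) identifier, since each case contributes one read per observer.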


    RESULTS
 
All 48 data sets were deemed diagnostically acceptable. There was a reasonably even distribution of FP marks across the 45 normal data sets (Fig E1, [http://radiology.rsnajnls.org/cgi/content/full/2471070816/DC1]). All four readers expressed a preference for primary 2D reading analysis.

CAD Performance
CAD correctly detected all three polyps in the abnormal studies, generating 13, 18, and 25 FP marks per whole data set. In the normal studies, the contribution of each cause of CAD FP marks (Table 1) was not significantly different between those data sets with 15 FP marks or fewer and those with more than 15 FP marks (P = .69).


Table 1. Grading and Distribution of CAD FP Marks

 
Overall Influence of CAD
Reader sensitivity and specificity.—For the three polyps in the data sets, one reader detected an additional 10-mm polyp with CAD, one reader detected all three with and without CAD, and two readers missed the same 11-mm polyp (coated by tagged fluid) despite a correct CAD prompt (Figs E2, [http://radiology.rsnajnls.org/cgi/content/full/2471070816/DC1], 2; Table 2). Overall reader sensitivity and 95% confidence interval (95% CI) before and after CAD application were 0.75 (95% CI: 0.43, 0.95) and 0.83 (95% CI: 0.52, 0.98), respectively. After CAD application, no reader discounted any observation made before CAD application.
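The pooled sensitivities above correspond to 9 of 12 and 10 of 12 reader-polyp detections (three polyps, four readers). The sketch below computes an exact (Clopper-Pearson) binomial interval for such counts. The choice of interval method is an assumption, since the text does not name one, and the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(func, lo=0.0, hi=1.0, iterations=100):
    """Bisection for func(p) = 0 where func increases with p on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for a proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= successes) equals alpha / 2.
        lower = _root_increasing(
            lambda p: (1 - binom_cdf(successes - 1, n, p)) - alpha / 2)
    if successes == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= successes) equals alpha / 2.
        upper = _root_increasing(
            lambda p: alpha / 2 - binom_cdf(successes, n, p))
    return lower, upper


# Counts derived from the text: 3 polyps x 4 readers = 12 opportunities.
print(9 / 12, clopper_pearson(9, 12))    # before CAD
print(10 / 12, clopper_pearson(10, 12))  # after CAD
```

With only 12 detection opportunities, the intervals are necessarily wide, which is why the difference between 0.75 and 0.83 cannot be interpreted as a demonstrated gain in sensitivity.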


Figure 2: CT colonographic (a) 2D transverse and (b) 3D endoluminal images show 11-mm sigmoid polyp (arrows) missed by two readers despite correct CAD prompt (not shown on a for clarity). Note that polyp is coated by tagged fluid (arrowhead).

 

Table 2. Reader Performance with and without CAD

 
During the unassisted analysis, among the four readers, there were eight FP detections in total (range, 6–15 mm) (Table 3). Five and three of these pre-CAD FP detections, respectively, occurred in studies generating 15 or fewer and more than 15 CAD FP marks when CAD was subsequently applied, and all eight were included by readers in their final report with CAD (ie, none were dismissed after CAD application).


Table 3. Nature of Reader FP Detections across All Four Readers

 
Across all readers and all data sets (4 × 48 = 192 reads), application of CAD resulted in a total of four (2.1%) additional FP detections in four normal studies (respective sizes, 7, 6, 15, and 6 mm; one for reader 1, two for reader 2, zero for reader 3, and one for reader 4) (Figs 3, 4; Table 3). Three of these four studies had 15 or more CAD FP marks and one had fewer than 15. Two of these detections (6-mm, fecal residue; 15-mm, bulbous fold) had also been incorrectly recorded by two of four readers during their read without CAD, but were only recorded as FP for other readers after CAD application. Overall reader specificity before and after CAD was 0.96 (95% CI: 0.91, 0.98) and 0.93 (95% CI: 0.88, 0.96), respectively, for a change of –0.03 (95% CI: 0.0, 0.05).


Figure 3: CT colonographic (a) 2D coronal and (b) 3D endoluminal images show normal but bulbous haustral fold (arrows) marked by CAD (not shown on a for clarity). Fold was incorrectly recorded as abnormal by two readers before and another reader after CAD application.

 

Figure 4: CT colonographic (a) coronal and (b) 3D endoluminal images show linear filling defect (arrows) in rectum that represents internal hemorrhoid marked by CAD. Lesion was correctly dismissed by all four readers before CAD, but only by three after CAD.

 
Reader confidence.—Across all four readers, there was a clinically small but significant increase in reader confidence with CAD application (mean improvement, 2.1 [95% CI: 1.3, 2.8], P < .001) (Table E1, [http://radiology.rsnajnls.org/cgi/content/full/2471070816/DC1]).

ROC Curves
The area under the ROC curve (AUC) increased with CAD for two of four readers, was unchanged for one, and marginally decreased for one (Table 4).


Table 4. AUC for Readers with and without CAD

 
Influence of Increasing Numbers of CAD FP Marks
Reader confidence.—The regression coefficient between the number of CAD FP marks and reader diagnostic confidence after CAD was –0.04 (95% CI: –0.25, 0.17) (P = .71), indicating no association between increasing numbers of CAD FP prompts and diminished reader confidence.

Reporting times.—The mean reporting time across all four readers was 8.6 minutes (standard deviation, 3.6) for the unassisted read and an additional 3.6 minutes (standard deviation, 1.5) for the CAD read (42% increase). The regression coefficient relating the number of CAD FP marks to CAD reporting time was 0.06 (95% CI: 0.02, 0.10) (P = .002), indicating a small but significant positive correlation between increasing CAD FP marks and reading time. The additional time for review of the CAD output for studies with 15 or fewer FP marks was 3.3 minutes (standard deviation, 1.4) and for studies with more than 15 FP marks was 3.9 minutes (standard deviation, 1.6).
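The timing figures above can be cross-checked with simple arithmetic. The sketch below assumes that the CAD read time is additional to the unassisted read (which is what the quoted 42% increase implies) and uses the group medians of 13.5 and 21 FP marks given in Materials and Methods; the variable names are illustrative.

```python
unassisted_min = 8.6        # mean unassisted read time, minutes
cad_extra_min = 3.6         # mean additional time for the CAD read, minutes
slope_min_per_mark = 0.06   # regression coefficient, minutes per CAD FP mark

# Relative increase in reading time attributable to CAD review.
pct_increase = 100 * cad_extra_min / unassisted_min

# Extra time per additional CAD FP mark, in seconds.
seconds_per_mark = slope_min_per_mark * 60

# Gap the slope predicts between the two groups' median mark counts,
# compared with the observed gap in mean CAD review time.
predicted_gap_min = slope_min_per_mark * (21 - 13.5)
observed_gap_min = 3.9 - 3.3

print(round(pct_increase), round(seconds_per_mark, 1))
print(round(predicted_gap_min, 2), round(observed_gap_min, 2))
```

The slope-based prediction and the observed difference between groups are of the same order (roughly half a minute), which is consistent with a small per-mark time cost.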

Correct case classification.—For each additional CAD FP mark, the odds ratio for readers correctly classifying a normal case was 1.14 (95% CI: 0.92, 1.40) (P = .23), indicating no significant detrimental effect of increasing numbers of CAD FP marks on correct reader case classification.


    DISCUSSION
 
We found no evidence that increasing numbers of FP marks adversely influenced reader specificity or decreased overall reader confidence. Given the range of CAD FP marks in the literature (3,6,16–18), we hypothesized that more than 15 detections per case would adversely influence reader performance. While it could be argued a cutoff of 15 FP marks was somewhat arbitrary, given the immaturity of the colonographic CAD literature, to our knowledge, there is no precedent for our study and we deemed this a reasonable figure. The level of reader FP marks before and after CAD was lower than our expectations before the study and certainly less than our previous experience by using similarly trained readers and data sets obtained with a similar CT colonographic protocol. By way of explanation, to mimic clinical practice, readers were aware that studies were from an average-risk screening population and were informed of the general performance characteristics of the CAD system utilized (both conditions would likely be in place in clinical practice).

By keeping CAD settings constant throughout, we ensured that the contribution of each cause of CAD FP marks was constant among all data sets. We did not adjust the CAD output in any way to artificially change the number or type of CAD marks. The main causes of FP marks were fecal and/or fluid residue and normal colon anatomy. The CAD system we used had been tested on data sets by using oral tagging agents (19), and our data again support the concept that, in general, CAD FP marks are easily dismissed by trained radiologists (20). The plausibility of CAD FP marks is perhaps more important than the actual number.

Clearly, plausible CAD marks require greater radiologist work-up than do easily dismissed detections and are more likely to produce an actual radiologist report with FP findings. Indeed, it could be argued that if CAD produces more than 15 marks per data set, many are likely to be dismissed with relative ease. However, it is difficult to define a plausible CAD mark and numbers will differ between individual data sets. For example, in our study, CAD FP marks resulting from fecal residue and bulbous folds were clearly deemed plausible by some readers (resulting in FP detections) and dismissed by others. All CAD FP marks were dismissed as such by our experienced radiologists, but this was not the case for the less experienced readers. Because of this subjectivity, we did not attempt to grade plausibility of CAD FP marks, although we indirectly assessed this by recording reader confidence levels after CAD; this concept clearly requires further study.

It could be argued that our data support the notion that as many as 25 CAD FP marks do not affect radiologist performance. While this seems to be the case for specificity (our main study aim), with only three positive data sets, we do not know whether this also holds true for reader sensitivity. The small number of positive studies is reflected by very large confidence limits in our sensitivity data. We also did not include 6- to 9-mm polyps, which was also an artificial stipulation for the study. However, our main aim was to test specificity in a low-prevalence population. We did include some abnormal studies so readers would not assume that the whole data set was normal, thus skewing their interpretation, but did not attempt to provide robust sensitivity data across a range of polyp sizes. Many data sets will be required to provide adequate power to test this while maintaining the low prevalence of abnormality in a screening population, which most CAD studies to date lack. It also follows that our data may not be directly applicable to other CAD systems with differing spectrums of FP marks.

We did demonstrate a positive correlation between increasing numbers of CAD FP marks and reporting times, although arguably this may have limited clinical significance: every additional CAD FP mark added 0.06 minutes (just under 4 seconds). This is probably inevitable, since time must be taken to analyze each CAD mark, although it seems likely that the benefit of CAD will outweigh the relatively minimal increase in reading time.

Even though all three polyps were correctly detected by using CAD, two of four readers failed to detect one 11-mm polyp, again emphasizing that observers may reject bona fide CAD prompts. The large polyp marked by CAD but missed by readers was not particularly subtle (although coated with tagged fluid). Why correct CAD detections are dismissed is unclear but is likely dependent in part on reader experience.

CAD also had a clinically unimportant (albeit significant) positive effect on reader confidence, suggesting readers were at least reassured by CAD that they had not missed anything. The use of a 100-point confidence scale (as opposed to defined categories) has been recommended for studies by using ROC curves and was indeed the initial recommendation of the Breast Imaging Reporting and Data System (21). By its nature, this scale does not include actionable threshold levels (eg, the level required to trigger colonoscopy), but is a measure of a reader's certainty that a case is normal. However, even by using this 100-point scale, readers chose a finite number of confidence levels, explaining the number of data points on the ROC curves. We accept that a category-based scale would have been just as effective.

Our study had limitations. Our normal studies did not have colonoscopic and/or histologic correlation, although all were deemed radiologically normal by three radiologists (with a combined experience of over 2000 validated data sets) by using CAD. Importantly, one of these radiologists had no previous contact with the readers (it could otherwise be argued that the study readers were influenced by prior instruction given by the experienced radiologists). Furthermore, all reader FP marks were reviewed and their cause confidently determined so that even if CT colonographic data sets did contain occult neoplasia, this would not have affected the results. Although endoscopically verified CT colonographic data sets are now altruistically provided on the Internet, we wanted to use representative data sets from an ongoing screening program rather than risk potentially using hand-picked data sets provided online. It should be noted that lack of an independent histologic reference standard is common in lung and mammographic studies. We deliberately rejected a paired study design whereby readers analyzed the same data sets with differing numbers of CAD FP marks. Such a design risks recall bias and would be difficult to perform without explicitly alerting the readers to the study purpose.

Furthermore, by not artificially increasing CAD FP marks (eg, by changing the CAD filter settings or manually adding new prompts), we ensured the causes of the FP detections were similar across high- and low-prevalence data sets. An alternative study design would have been a randomized study. It could be argued that studies with more than 15 CAD FP marks were somehow intrinsically different from those with fewer. However, data sets were acquired from the same source, generated equivalent numbers of unassisted reader FP marks regardless of the subsequent number of CAD marks, and had all been graded as clinically adequate by three experienced radiologists. Ultimately, if studies with 15 or more CAD FP marks were of inferior technical quality to the others, this would act in favor of our null hypothesis. One important consideration is our inability to ensure that readers questioned every CAD mark, and it is possible some may have been overlooked. Many CAD systems allow readers to move automatically from CAD mark to CAD mark via a mouse click or keyboard key, which may be a more robust method of ensuring all marks are reviewed. Finally, all four readers expressed a preference for primary 2D analysis, and it may not be possible to fully extrapolate the data to those preferring primary 3D endoluminal review.

In summary, we found no evidence that increasing numbers of CAD FP marks adversely influenced either correct reader case classification or diagnostic confidence, but they did prolong reporting times.


    ADVANCE IN KNOWLEDGE
 


    FOOTNOTES
 

Abbreviations: AUC = area under ROC curve • CAD = computer-aided detection • CI = confidence interval • FP = false-positive • ROC = receiver operating characteristic • 3D = three-dimensional • 2D = two-dimensional

See Materials and Methods for pertinent disclosures.

Author contributions: Guarantor of integrity of entire study, S.A.T.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, S.A.T., P.J.P.; clinical studies, S.A.T., R.G., R.I., E.T., V.A.S., D.B., J.Z., P.J.P., S.H.; statistical analysis, P.B.; and manuscript editing, all authors.


    References

  1. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781–786.
  2. Warren Burhenne LJ, Wood SA, D'Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–562.
  3. Summers RM, Yao J, Pickhardt PJ, et al. Computed tomographic virtual colonoscopy computer-aided polyp detection in a screening population. Gastroenterology 2005;129:1832–1844.
  4. Yoshida H, Nappi J, MacEneaney P, Rubin DT, Dachman AH. Computer-aided diagnosis scheme for detection of polyps at CT colonography. RadioGraphics 2002;22:963–979.
  5. Yoshida H, Masutani Y, MacEneaney P, Rubin DT, Dachman AH. Computerized detection of colonic polyps at CT colonography on the basis of volumetric features: pilot study. Radiology 2002;222:327–336.
  6. Taylor SA, Halligan S, Burling D, et al. Computer-assisted reader software versus expert reviewers for polyp detection on CT colonography. AJR Am J Roentgenol 2006;186:696–702.
  7. Halligan S, Altman DG, Mallett S, et al. Computed tomographic colonography: assessment of radiologist performance with and without computer-aided detection. Gastroenterology 2006;131:1690–1699.
  8. Summers RM, Jerebko AK, Franaszek M, Malley JD, Johnson CD. Colonic polyps: complementary role of computer-aided detection in CT colonography. Radiology 2002;225:391–399.
  9. Pickhardt PJ, Taylor AJ, Kim DH, Reichelderfer M, Gopal DV, Pfau PR. Screening for colorectal neoplasia with CT colonography: initial experience from the 1st year of coverage by third-party payers. Radiology 2006;241:417–425.
  10. Sonnenberg A, Delco F, Bauerfeind P. Is virtual colonoscopy a cost-effective option to screen for colorectal cancer? Am J Gastroenterol 1999;94:2268–2274.
  11. Vijan S, Hwang I, Inadomi J, et al. The cost-effectiveness of CT colonography in screening for colorectal neoplasia. Am J Gastroenterol 2007;102:380–390.
  12. Hukkinen K, Vehmas T, Pamilo M, Kivisaari L. Effect of computer-aided detection on mammographic performance: experimental study on readers with different levels of experience. Acta Radiol 2006;47:257–263.
  13. Halligan S, Taylor SA, Dehmeshki J, et al. Computer-assisted detection for CT colonography: external validation. Clin Radiol 2006;61:758–763.
  14. Zalis ME, Barish MA, Choi JR, et al. CT colonography reporting and data system: a consensus proposal. Radiology 2005;236:3–9.
  15. Williams RL. A note on robust variance estimation for cluster-correlated data. Biometrics 2000;56:645–646.
  16. Bogoni L, Cathier P, Dundar M, et al. Computer-aided detection (CAD) for CT colonography: a tool to address a growing need. Br J Radiol 2005;78(spec issue 1):S57–S62.
  17. Nappi J, Yoshida H. Feature-guided analysis for reduction of false positives in CAD of polyps for computed tomographic colonography. Med Phys 2003;30:1592–1601.
  18. Mani A, Napel S, Paik DS, et al. Computed tomography colonography: feasibility of computer-aided polyp detection in a "first reader" paradigm. J Comput Assist Tomogr 2004;28:318–326.
  19. Dehmeshki J, Halligan S, Taylor SA, et al. Computer assisted detection software for CT colonography: effect of sphericity filter on performance characteristics for patients with and without fecal tagging. Eur Radiol 2007;17:662–668.
  20. Taylor SA, Halligan S, Slater A, et al. Polyp detection with CT colonography: primary 3D endoluminal analysis versus primary 2D transverse analysis with computer-assisted reader software. Radiology 2006;239:759–767.
  21. Wagner RF. An overview of contemporary ROC methodology in medical imaging and computer assist modalities. http://www.fda.gov/ohrms/dockets/ac/04/slides/4024s1_01_ROC-TUTORIAL.PPT. Published March 6, 2001. Accessed January 3, 2007.



