Gastrointestinal Imaging
1 From the Department of Intestinal Imaging, St Mark's Hospital, Northwick Park, Harrow, United Kingdom (S.A.T., S.H., A.S., V.G., D.N.B.); Charing Cross Hospital, London, United Kingdom (M.E.R.); and Medicsight, London, United Kingdom (M.E.R., L.H., J.M., H.A., J.D.). Received March 22, 2005; revision requested May 17; revision received May 24; accepted June 21; final version accepted July 18. Supported by Medicsight, London, United Kingdom. Address correspondence to S.A.T., Department of Specialist X-Ray, Level 2 Podium, University College Hospital, 235 Euston Rd, London NW1 2BU, UK (e-mail: csytaylor{at}yahoo.co.uk).
| ABSTRACT |
|---|
Materials and Methods: Ethical permission and patient consent were obtained from all donor institutions for use of CT colonography data sets. Twenty CT colonography data sets from 14 men (median age, 61 years; age range, 52–78 years) with 48 endoscopically proved polyps were selected. Polyp coordinates were documented in consensus by three unblinded radiologists to create a reference standard. Two radiologists read the data sets, which were randomized between primary 3D endoluminal views with 2D problem solving and 2D views supplemented by CAR software. Reading times and diagnostic confidence were documented. The CAR software highlighted possible polyps by superimposing circles on the 2D transverse images. Data sets were reread after 1 month by using the opposing analysis method. Detection rates were compared by using the McNemar test. Reporting times and diagnostic confidence were compared by using the paired t test and Mann-Whitney U test, respectively.
Results: Mean sensitivity values for polyps measuring 1–5, 6–9, and 10 mm or larger were 14%, 53%, and 83%, respectively, for 2D CAR analysis and 16%, 53%, and 67%, respectively, for primary 3D analysis. Overall sensitivity values were 41% for 2D CAR analysis and 39% for primary 3D analysis (P = .77). Reader 1 detected more polyps than reader 2, particularly when using the 3D fly-through method (P = .002). Mean reading times were significantly longer with the 3D method (P = .001). Mean false-positive findings were 1.5 for 2D analysis and 5.5 for 3D analysis. Reader confidence was not significantly different between analysis methods (P = .42).
Conclusion: Two-dimensional CAR analysis is quicker and at least matches the sensitivity of primary 3D endoluminal analysis, with fewer false-positive findings.
© RSNA, 2006
| INTRODUCTION |
|---|
Recently developed computer-assisted reader (CAR) software may overcome some of the perceptual problems inherent to primary 2D analysis, without substantially adding to reporting times (16). CAR software highlights possible polyps for the radiologist at the time of the primary reading and has the potential to reduce perceptual error, thereby obviating some of the advantages of a more time-consuming primary 3D analysis.
Thus, the purpose of our study was to retrospectively compare primary 3D endoluminal analysis with primary 2D transverse analysis supplemented by CAR software for CT polyp detection and reader reporting times.
| MATERIALS AND METHODS |
|---|
Case Selection
Although the results of early studies suggest high sensitivity for polyp detection by using 2D analysis, findings from more recent studies have revealed considerably lower detection rates. On the basis of these findings (1,2,6,7,17) and the more recent literature concerning the use of a full 3D fly-through analysis (4), we hypothesized that a primary 3D endoluminal analysis would demonstrate approximately 20% more polyps than a corresponding 2D analysis (90% vs 70% detection, respectively). A power calculation at 80% indicated that 35 polyps would be required to detect this difference with an α of .05 on the basis of data pairing and analysis with the McNemar test.
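The power calculation above can be illustrated with a minimal sketch. It assumes a standard normal-approximation formula for paired proportions, which the text does not confirm as the method actually used, and the discordant proportions `p10` and `p01` are hypothetical inputs, because the text states only the marginal detection rates (90% vs 70%).

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_sample_size(p10, p01, alpha=0.05, power=0.80):
    """Approximate number of paired observations for a two-sided McNemar test.

    p10: assumed proportion of polyps detected by method A only (hypothetical)
    p01: assumed proportion of polyps detected by method B only (hypothetical)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    psi = p10 + p01    # total proportion of discordant pairs
    delta = p10 - p01  # difference in detection rates between methods
    n = (z_alpha * sqrt(psi) + z_beta * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return ceil(n)


# Both scenarios give the hypothesized 20% difference (90% vs 70% detection).
# Scenario 1: every polyp seen with 2D is also seen with 3D (least discordance).
n_min_discordance = paired_sample_size(p10=0.20, p01=0.00)
# Scenario 2: detections by the two methods are statistically independent.
n_independent = paired_sample_size(p10=0.27, p01=0.07)
```

The required number of polyps depends strongly on the assumed discordance between the two methods, so any such calculation should state that assumption explicitly.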
A total of 20 studies from 14 men (median age, 61 years; age range, 52–78 years) with 48 polyps were chosen at random from a local database. The database had been collated from five donor institutions, all of which operated research programs comparing CT colonography with same-day endoscopy. Ethical permission and patient consent were obtained from all donor institutions for the use of the CT colonography data sets, and all studies had been judged as clinically adequate by the donor institution.
Annotated CT Reference Standard
The specifics of the 20 selected studies were as follows: 120-kV collimation (all studies), 50 mA (six studies), 100 mA (12 studies), 200 mA (two studies), and fluid tagging (13 studies). All studies had supine and prone data acquisitions. The studies were loaded onto a computer workstation equipped with CT colonography software (ColonCAR 1.2; Medicsight) and were read in consensus by three experienced radiologists (S.A.T., S.H., and D.N.B., who had previously read 600, 400, and 500 endoscopically confirmed CT colonography studies, respectively). All radiologists had full knowledge of the same-day colonoscopy findings. Polyp diameter was used to individually match polyps if the colonoscopy results suggested more than one polyp in a particular colonic segment. All studies were assessed in consensus as adequate for distention, cleansing, and tagging efficiency. Clinical adequacy was defined as sufficient quality in all colonic segments such that a repeat study would not have been indicated in routine clinical practice.
The position of the detected polyps was indicated on the appropriate 2D transverse image by drawing a region of interest around the polyp surface with a freehand drawing tool embedded in the software (Fig 1). These regions of interest were then stored as binary image files so that their exact location could be recalled during reader marking. The section location and polyp diameter were also documented. Polyp diameter was measured by using electronic calipers, which were applied to the multiplanar reformatted (MPR) image that showed the greatest polyp dimension. Because the reference study annotation had been performed primarily by using a 2D transverse approach, one experienced nonobserver (S.A.T.) reviewed all 20 studies to ensure that all annotated polyps were visible on the 3D endoluminal view. There were no flat lesions on the 20 studies.
CT Colonography Software Platforms
For primary 2D reading, studies were loaded onto a workstation equipped with ColonCAR 1.2 software (Medicsight). The software permits side-by-side supine and prone 2D transverse scrolling in the upper half of the screen, with MPR images located in the lower half of the screen. A surface-shaded 3D cube was available for problem solving. The CAR software is designed to highlight potential polyps for the reporting radiologist. In brief, the software segments the colon from the CT data set and then determines the inherent sphericity of all raised objects within the colonic lumen. The user has control over the degree of flatness and sphericity of the detected objects by means of slider bars with an arbitrary scale between 0.5 and 1.0.
Once the settings for sphericity and flatness are set, the user applies this polyp enhancement filter to each of the supine and prone data sets. The detected objects are then highlighted with red and yellow rings, which are superimposed on the transverse 2D images (Fig 2). The corresponding unfiltered 2D image is displayed adjacent to the filtered image. For the purpose of this study, filters were kept at default settings (sphericity of 0.75 and flatness of 1) on the basis of previous results from studies performed during the development of the CAR software (16). The software had not been previously exposed to the study data sets.
Before beginning the study, both readers were given full instruction as to the functionality of each platform.
Reading Protocol
The 20 data sets were randomized into two groups of 10 studies each (group A and group B). All patient information was removed from the data sets to preserve patient anonymity and to reduce recall bias when studies were reanalyzed after 1 month. Reader 1 was instructed to read group A studies by using a primary 2D analysis and to read group B studies by using a primary 3D analysis. The actual reading order for the 20 cases was randomized. The same process was applied for reader 2, except that group A studies were read by using a primary 3D analysis and group B studies were read by using a primary 2D analysis.
One month later, readers reread the 20 scans by using the opposing method of image analysis. The scan order was randomized before the second reading in an attempt to reduce recall bias. No feedback was given to the readers after their first analysis.
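The crossover design described above can be sketched as follows. This is an illustrative reconstruction only: the function name, the method labels, and the use of Python's `random` module are assumptions, and the text does not describe how the randomization was actually carried out.

```python
import random


def build_reading_schedule(case_ids, seed=None):
    """Sketch of a two-reader, two-session crossover reading schedule."""
    rng = random.Random(seed)
    cases = list(case_ids)
    rng.shuffle(cases)
    half = len(cases) // 2
    group_a = set(cases[:half])  # remaining cases form group B

    # Reader 1 starts group A with 2D plus CAR; reader 2 starts group A with 3D.
    first_method = {
        "reader1": {"A": "2D+CAR", "B": "3D"},
        "reader2": {"A": "3D", "B": "2D+CAR"},
    }
    opposite = {"2D+CAR": "3D", "3D": "2D+CAR"}

    schedule = {}
    for reader, methods in first_method.items():
        session1 = [(c, methods["A" if c in group_a else "B"]) for c in cases]
        rng.shuffle(session1)  # randomized reading order, first session
        # Second session (1 month later): same cases, opposing method.
        session2 = [(c, opposite[m]) for c, m in session1]
        rng.shuffle(session2)  # order re-randomized to reduce recall bias
        schedule[reader] = {"session1": session1, "session2": session2}
    return schedule


schedule = build_reading_schedule(range(1, 21), seed=0)
```

Each reader therefore reads every case once with each method, and for any given case the two readers use opposite methods in the same session.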
Reading Strategy and Polyp Annotation
With primary 2D analysis, readers were instructed to apply the CAR software by using default filter settings and to scroll through the 2D transverse data sets as usual. They were free to use the 3D surface-shaded cube and MPR views as often as necessary. Readers were also free to choose the window for analysis and could change the window as required (eg, for tagged cases). No information about the expected performance of the CAR software was provided, although readers were instructed to analyze all areas highlighted by the software and to categorize each highlighted lesion as either a true polyp or a false-positive finding. Lesions were graded by readers as follows: grade E, lesion immediately dismissed as extracolonic; grade 1, lesion immediately dismissed as intracolonic (eg, highlighted area was obviously the rectal tube or ileocaecal valve); grade 2, lesion quickly dismissed after only a few seconds by using 2D transverse scrolling only (eg, highlighted area was obvious fecal residue, indentation by extracolonic structures, or a normal fold); or grade 3, lesion dismissed only after problem-solving maneuvers (eg, by using a supine-prone correlation, MPR view, or 3D cube view).
Reading times, which excluded the time needed to load the scan and to apply the CAR filters (about 1–2 minutes), were recorded. The readers indicated the position of the detected polyps on the appropriate 2D transverse image by drawing around the polyp surface with a freehand drawing tool that was included in the 2D software package.
With 3D analysis, readers were told to perform a complete endoluminal fly through from anus to cecum and back again for supine and prone data sets. The readers were free to use transverse 2D and MPR views for problem solving and for primary analysis of segments in which endoluminal navigation was not possible. Readers were instructed to indicate the position of detected polyps by reporting the section number on the corresponding transverse image. A screen shot was also taken of the polyp on transverse and endoluminal views to aid subsequent marking (freehand polyp annotation was not possible with the 3D software platform). Reading times were recorded from the commencement of actual analysis.
For both reading strategies (2D analysis supplemented by CAR software and primary 3D endoluminal analysis), readers recorded their level of confidence that the findings represented a true lesion on a scale of 1–100 (100 being the most confident) (18). By using the electronic calipers with each software platform, readers measured the maximum polyp dimension on the 2D MPR view that best showed the polyp.
Marking
A nonobserver radiologist (S.A.T.) marked all reader reports. When assessing the primary 2D analysis, the experienced radiologist recalled the annotated consensus reference for each data set on one workstation and compared this with the annotated data set produced by each reader on an adjacent workstation.
When marking the primary 3D reading, the experienced radiologist compared the section numbers for polyps that were detected by the readers with the section numbers that were derived from the reference standard. Polyps were assumed to match if the recorded section number was within three sections (about 3 mm) of the reference standard (same or adjacent colonic segment) and if the screen shots for the reader matched those for the consensus readers. For polyps that had a more than three section difference between the reader section number and the reference standard, two experienced radiologists (S.A.T., D.N.B.) viewed the screen shots and judged in consensus whether the reader had correctly identified the actual polyp but made a transcription error in section number.
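The section-number part of this marking rule can be sketched in a few lines. The sketch covers only the tolerance check; the screen-shot comparison and the consensus review of larger discrepancies were human judgments and are represented here simply as a list of findings flagged for review. The function name, data layout, and example values are hypothetical.

```python
def match_polyps(reader_sections, reference_sections, tolerance=3):
    """Match reader-reported section numbers to reference polyps.

    reader_sections: list of section numbers reported by a reader
    reference_sections: dict of polyp id -> reference section number
    tolerance: maximum allowed difference in sections (three in the text)
    """
    unmatched = dict(reference_sections)
    matches, needs_review = {}, []
    for section in reader_sections:
        candidates = [
            (abs(section - ref), polyp_id)
            for polyp_id, ref in unmatched.items()
            if abs(section - ref) <= tolerance
        ]
        if candidates:
            _, polyp_id = min(candidates)  # closest unmatched reference polyp
            matches[polyp_id] = section
            del unmatched[polyp_id]
        else:
            # Outside tolerance: would go to screen-shot review in consensus.
            needs_review.append(section)
    return matches, needs_review, sorted(unmatched)


# Hypothetical example values, not taken from the study data.
reference = {"p1": 120, "p2": 245, "p3": 300}
matches, needs_review, unmatched_refs = match_polyps([122, 250, 410], reference)
```

In this example the finding at section 122 matches polyp `p1`, whereas the findings at sections 250 and 410 fall outside the three-section tolerance and are flagged for review.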
The experienced radiologist (S.A.T.) also independently applied the CAR software to all 20 data sets and compared the circled objects with the annotated reference standard to ascertain how many polyps had been demonstrated with the CAR software. The same radiologist also retrospectively analyzed the characteristics (eg, morphologic features, relationship to folds, obscuration by tagged fluid, and segmental distention) of individual polyps measuring 5 mm or larger that were identified exclusively by using one of the two workstations.
Statistical Analysis
Statistical analysis was performed by using a commercially available software package (Statsdirect, version 2.4; Statsdirect, Cheshire, United Kingdom). Polyp detection rates for 2D analysis with CAR software and 3D endoluminal fly-through analysis were compared by using the McNemar test. The effects of fluid tagging on polyp detection were compared by using the Fisher exact test. To investigate any potential learning curve or recall bias, overall reader sensitivities for the first reading were compared with those for the second reading by using the Fisher exact test. Reading times for the workstations were compared by using a paired t test. Diagnostic confidence levels were skewed and were compared by using the nonparametric Mann-Whitney U test. Significance was assigned at a P value of less than .05.
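For readers unfamiliar with the McNemar test, the following minimal sketch shows its exact (binomial) form for paired detection data. The text does not state whether the exact or the chi-square version was used, and the discordant counts in the example are hypothetical, because per-polyp paired results are not reported.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b: polyps detected with method A only
    c: polyps detected with method B only
    Polyps detected (or missed) with both methods do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under the null hypothesis, discordant pairs split 50:50 between methods.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only.
p_value = mcnemar_exact(6, 4)
```

Only the discordant pairs carry information about a difference between methods, which is why the test suits a design in which the same polyps are read with both methods.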
| RESULTS |
|---|
Detection
There were no significant differences in overall polyp detection between 2D analysis with CAR software and primary 3D analysis for either reader (P = .75 for reader 1, P = .27 for reader 2) or for the combined results (P = .77).
Reader 1 detected significantly more polyps than reader 2 by using 2D analysis with CAR software (P = .04) and 3D endoluminal fly-through analysis (P = .002). Indeed, reader 2 detected only three of nine polyps measuring 10 mm or larger by using the 3D fly-through method compared with seven of nine polyps by using a 2D analysis with CAR software (Table 1).
Of the four polyps measuring 6–9 mm that were not highlighted by the CAR software, two were detected by reader 1 by using 2D analysis and three were detected by reader 1 by using 3D endoluminal analysis. Reader 2 did not detect any of these polyps by using 2D analysis but did detect three by using 3D endoluminal analysis.
There was no improvement in combined reader detection rates between the first (18 of 48 polyps) and second (22 of 48 polyps) readings for 2D analysis with CAR software (P = .53) or between the first (17 of 48 polyps) and second (20 of 48 polyps) readings for primary 3D endoluminal analysis (P = .67).
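The comparison of first and second readings can be explored with a small stand-alone implementation of the two-sided Fisher exact test. This is a sketch, not the Statsdirect routine used in the study; it uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the counts are those given in the paragraph above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Detected vs missed polyps, first vs second reading (48 polyps each).
p_2d_car = fisher_exact_two_sided(18, 30, 22, 26)  # 2D analysis with CAR
p_3d = fisher_exact_two_sided(17, 31, 20, 28)      # primary 3D analysis
```

Different software packages use slightly different two-sided conventions, so small discrepancies from published P values are possible.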
No significant difference was found in the mean polyp detection rate for data sets with and those without fluid tagging by using either 2D analysis with CAR software (P = .38) or primary 3D analysis (P = .24).
Polyp Characteristics
Regarding the characteristics of polyps measuring 5 mm or larger that were detected on just one workstation, seven of eight were in close proximity to a fold and five were highlighted by the CAR software (Figs 3–6) (Table 2).
False-Positive Results and Reader Confidence
The median number of CAR-highlighted regions that were designated as false-positive findings by readers 1 and 2 was 13 (range, 4–42) per case (prone and supine data sets combined). Findings were classified (per case) as follows: grade E, median of 2.5 (range, 0–26); grade 1, median of 3 (range, 0–14); grade 2, median of 4 (range, 0–21); and grade 3, median of 2 (range, 0–8).
In general, there tended to be more false-positive findings for 3D endoluminal analysis than for 2D analysis supplemented by CAR software (Table 3).
| DISCUSSION |
|---|
We found no advantage of using primary 3D endoluminal analysis versus 2D analysis supplemented by CAR software for either of our readers. Indeed, one reader detected fewer than half the number of polyps 10 mm or larger when using the 3D fly-through method. We deliberately used trained readers with some working knowledge of CT colonography in an attempt to mimic the experience of most radiologists outside academic centers. We found that reader 1, who was the most junior reader, did better than reader 2 on both workstations. Both readers, however, had a similar amount of training and an equal level of experience using software platforms equipped with both primary 2D and endoluminal fly-through capability. The reason for the performance discrepancies is unclear, but these data again emphasize the wide interobserver variation in CT colonography reporting, which has been well documented in previous studies (6,19). Our results also suggest that simply switching to a primary 3D endoluminal analysis will not be sufficient to compensate for inadequate 2D performance.
In isolation, the CAR software performed well, highlighting 76% of polyps 6–9 mm and all polyps larger than 10 mm. Both readers (particularly reader 2) dismissed many true polyps during primary 2D analysis. Therefore, while computer-aided detection software can potentially reduce perceptual error, it cannot fully account for interpretative error, the magnitude of which is dependent on the prior training, experience, and innate aptitude of the individual reader (20,21). We also blinded our readers to the performance characteristics of the software to avoid inducing bias, although this may have negatively affected the interpretation of CAR highlights.
The CAR software indicated a variable number of false-positive findings. The majority of these findings, however, were quickly and easily dismissed by the readers. On average, primary 3D analysis resulted in more than three times as many false-positive findings recorded by the readers, thereby suggesting that the erroneous CAR prompts did not unduly affect specificity. Readers had full access to 2D problem solving when reading the primary 3D views, which perhaps suggests that additional false-positive lesions were not detected on the primary 2D view. It should be stressed, however, that most of the false-positive findings for reader 2 were generated by using primary 3D analysis, and most of these were small. Neither reading method generated many false-positive results for lesions 6 mm or larger.
It is unclear whether fewer false-positive CAR prompts (achieved by way of improved specificity) would result in improved sensitivity because the readers would be less distracted. Clearly, one of the major potential advantages of using CAR software rather than computer-aided detection software is that readers can adjust the sensitivity and specificity profiles of the software filters according to their own personal and case-specific needs. The effect of this adjustment merits further study.
Although readers dismissed polyps highlighted by the CAR software, the converse was also reassuringly true; that is, reader 1 correctly identified polyps that were not highlighted by the software. At the present time, computer-aided detection software cannot safely replace a full reading by the reporting radiologist, and it is important that readers use the software as an adjunct to (rather than as a replacement for) their own careful analysis (22–25).
Concerning polyps that were identified by readers but were not prompted by the CAR software, there was the suggestion that performance was better with 3D analysis than with 2D analysis. Of the four 6–9-mm polyps that were not highlighted by CAR software, reader 1 detected two by using the 2D view and three by using the 3D endoluminal view. Reader 2 did not detect any polyps by using the 2D view but detected three by using the 3D endoluminal view. Although the numbers are too small to draw any meaningful conclusion, they do suggest that 2D analysis may be inferior to 3D analysis for polyps that are not prompted by the software.
Overall, reading times were significantly shorter for 2D analysis versus primary 3D analysis, a finding that reflects the results of previous studies (11–15). The requirement to view both data sets twice (forward and backward navigation) when performing an endoluminal fly-through analysis inevitably adds to reporting time, depending on the software platform used. Furthermore, filling defects that are easily dismissed as fecal residue on 2D views often need further interrogation on 3D views. Although 2D reporting times were quicker, they were still relatively long for both readers. Reading times for both methods were too long for clinical use. On balance, this most likely reflects the reader's experience, the use of both tagged and untagged cases, the need to annotate positive findings, and perhaps most importantly the laboratory effect of taking part in a study. We did not specifically address the effect of CAR software on reporting times, and clearly this is another area for further study.
Eight polyps 5 mm or larger were detected on only one of the two workstations. It is interesting to speculate as to why readers correctly identified a polyp at 3D endoluminal analysis yet dismissed it at 2D analysis, even though it had been highlighted by the CAR software (especially because a 3D cube was available for problem solving). It can only be assumed that either the reader immediately dismissed the polyp without reviewing the 3D cube or the surface-shaded 3D cube was less convincing than the volume-rendered endoluminal reconstruction (26). We did not, however, find any significant difference in reader confidence for the detection of polyps between analysis methods. Four polyps were detected solely by using 2D analysis with CAR software, and it could be argued that adding CAR software to a primary 3D view may be more productive than adding CAR software to complement a primary 2D view. There is no absolute reason why CAR software cannot be applied to any view, be it transverse, MPR, or 3D endoluminal. At the time of this study, however, such a product was not commercially available, but it will clearly be the focus of much future research.
Our study does have limitations. The workstation used for primary 3D analysis was different from the workstation used by Pickhardt et al (4). Thus, our readers did not have the advantage of subtraction software or the ability to highlight the unseen colonic lumen. However, both the workstation and 3D fly-through iteration that we employed are up to date, commercially available, and widely used by many radiologists throughout the world who perform CT colonography. Furthermore, we ensured that all polyps were visible without the subtraction of tagged fluid.
Although one reader detected just three of nine large polyps by using primary 3D analysis, the other reader detected all large polyps, which suggests that the software itself was at least adequate. The use of mixed data sets from several institutions may have added to the complexity of the 2D readings (eg, readers may have employed several windows in tagged cases, thereby increasing reporting times). It could also be argued that we should not have used a mixture of tagged and untagged data sets. The use of mixed data sets, however, provided a sterner test for the CAR software, thereby allowing us to draw wider conclusions from the study. Importantly, we found no difference in polyp detection between tagged and untagged cases.
For this study, we used a polyp-enriched data set, and it could be argued that the benefits of 3D analysis would be more evident if the prevalence of abnormality was reduced. However, to perform such a study would require many more data sets to reach acceptable statistical power. It should be noted that the readers were unaware of the prevalence of abnormality in this study and had no prior expectations related to this. In addition, our statistical methods assumed independence of polyp observations (ie, the probability of polyp detection was not influenced by the specifics of the individual case). Although all studies were clinically adequate (there were no poorly prepared cases), all were obtained with thin-collimation protocols, and readers were instructed to document every finding in the colon. We accept that nonindependence cannot be fully assumed, which may weaken our conclusions.
We did not specifically investigate the additional benefit of 2D analysis with CAR software versus 2D analysis alone, so we cannot be certain as to how much the CAR software contributed to reader performance. Any incremental benefit of CAR software was not the primary purpose of our study, and further research into this particular aspect is ongoing. We also did not include any flat polyps in our study, and clearly the effect of CAR software on the detection of such lesions requires further investigation.
In conclusion, we found that primary 2D analysis supplemented by CAR software is quicker than and as sensitive as the primary 3D fly-through method for polyp detection during CT colonography.
| FOOTNOTES |
|---|
Abbreviations: CAR = computer-assisted reader; MPR = multiplanar reformatted; 3D = three-dimensional; 2D = two-dimensional
2 Current address: Department of Specialist X-Ray, University College Hospital, London, United Kingdom
3 Current address: Paul Strickland Scanner Centre, Mount Vernon Hospital, Northwood, United Kingdom
See Materials and Methods for pertinent disclosures.
Author contributions: Guarantor of integrity of entire study, S.A.T.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, S.A.T.; clinical studies, S.A.T., V.G., D.N.B., M.E.R., L.H., J.M.; statistical analysis, S.A.T., V.G.; and manuscript editing, S.A.T., S.H., V.G., M.E.R., L.H., J.M., H.A., J.D.