Radiology
Published online before print June 26, 2006, 10.1148/radiol.2401042099
(Radiology 2006;240:343-356.)
© RSNA, 2006


Breast Imaging

Breast Masses: Computer-aided Diagnosis with Serial Mammograms¹

Lubomir Hadjiiski, PhD, Berkman Sahiner, PhD, Mark A. Helvie, MD, Heang-Ping Chan, PhD, Marilyn A. Roubidoux, MD, Chintana Paramagul, MD, Caroline Blane, MD, Nicholas Petrick, PhD, Janet Bailey, MD, Katherine Klein, MD, Michelle Foster, MD, Stephanie K. Patterson, MD, Dorit Adler, MD, Alexis V. Nees, MD and Joseph Shen, MD

¹ From the Department of Radiology, University of Michigan Medical Center, CGC B2102, 1500 E Medical Center Dr, Ann Arbor, MI 48109-0904 (L.H., B.S., M.A.H., H.P.C., M.A.R., C.P., C.B., J.B., K.K., M.F., S.K.P., D.A., A.V.N., J.S.); and Center for Devices and Radiological Health, U.S. Food and Drug Administration, Rockville, Md (N.P.). Received December 10, 2004; revision requested February 3, 2005; revision received May 5; accepted June 13; final version accepted September 12. Supported by USAMRMC grants DAMD17-98-1-8211, DAMD17-02-1-0489, and DAMD17-02-1-0214 and USPHS grant CA95153. Address correspondence to L.H. (e-mail: lhadjisk@umich.edu).


    ABSTRACT
 
Purpose: To retrospectively evaluate effects of computer-aided diagnosis (CAD) involving an interval change classifier (which uses interval change information extracted from prior and current mammograms and estimates a malignancy rating) on radiologists' accuracy in characterizing masses on two-view serial mammograms as malignant or benign.

Materials and Methods: The data collection protocol had institutional review board approval. Patient informed consent was waived for this HIPAA-compliant retrospective study. Ninety temporal pairs of two-view serial mammograms (depicting 47 malignant and 43 benign biopsy-proved masses) were obtained from 68 patient files and were digitized. Biopsy was the reference standard. Eight Mammography Quality Standards Act of 1992–accredited radiologists and two breast imaging fellows assessed digitized two-view temporal pairs (in preselected regions of interest only) by estimating likelihood of malignancy and Breast Imaging Reporting and Data System (BI-RADS) category without and with CAD. Observers' rating data were analyzed with Dorfman-Berbaum-Metz (DBM) multireader multicase method. Statistical significance of differences was estimated with the DBM method and Student two-tailed paired t test.

Results: Average area under the receiver operating characteristic curve for likelihood of malignancy across the 10 observers was 0.83 (range, 0.74–0.88) without CAD and improved to 0.87 (range, 0.80–0.92) with CAD (P < .05). The average partial area index above a sensitivity of 0.90 for likelihood of malignancy was 0.35 (range, 0.13–0.54) without CAD and 0.49 (range, 0.18–0.73) with CAD—a nonsignificant improvement (P = .11). For BI-RADS assessment, it was estimated that with CAD, six radiologists would correctly recommend additional biopsies for malignant masses (range, 4.3%–10.6%) and five would correctly recommend reduction of biopsy (ie, fewer biopsies) for benign masses (range, 2.3%–9.3%). However, five radiologists would incorrectly recommend additional biopsy for benign masses (range, 2.3%–14.0%), and one would incorrectly recommend reduction of biopsy (4.3%).

Conclusion: CAD involving interval change analysis of preselected regions of interest can significantly improve radiologists' accuracy in classifying masses on digitized screen-film mammograms as malignant or benign.



    INTRODUCTION
 
Breast cancer is the most frequently diagnosed nonskin cancer in women and is one of the leading causes of death among women in the United States between 40 and 55 years of age (1,2). In the United States, 211 240 women will be diagnosed with breast cancer in 2005, and there will be 40 410 deaths (1). Mammography is currently the only recommended imaging method for breast cancer screening (3,4). Mammography is especially valuable as an early detection tool because it can reveal breast cancer at an early stage, before physical symptoms develop (1). Early detection through mammographic screening and physical examination has been shown, in randomized controlled trials (3,4), to result in improved survival from breast cancer. However, the high sensitivity of mammography is accomplished at a cost of low specificity. To reduce the rate of false-negative diagnosis, lesions with a greater than 2% chance of being malignant will be recommended for biopsy (5). As a result, only 15%–30% of patients referred for biopsy are found to have malignancy (5–7). Unnecessary biopsies not only cause patient anxiety and morbidity but also increase health care costs. It is, therefore, important to improve the accuracy of interpreting mammographic lesions, thereby increasing the positive predictive value of mammography.

Multiple views and multiple prior studies are routinely reviewed by radiologists in forming a mammographic interpretation. The use of serial mammograms for evaluating interval changes has been found to increase the sensitivity of breast cancer detection (5,8). Recently, Burnside et al (9) and Sumkin et al (10) reported that in a diagnostic setting, comparison with results of a prior examination significantly (P < .001) increased the overall cancer detection rate and the correct recall of patients for additional procedures.

In recent years, a number of computer-aided diagnosis (CAD) techniques for characterization of mammographic lesions in a single mammographic examination were developed (5,11–18). Receiver operating characteristic (ROC) studies were performed to evaluate the effects of CAD on radiologists' accuracy in the characterization of malignant and benign masses (18,19) and microcalcification clusters (16) on single- and multiple-view mammograms obtained in a single examination. In all of these studies, the radiologists' performance in terms of the area under the ROC curve (Az) achieved statistically significant improvement (P < .05) when they read with computer aid versus when they read without aid.

We previously (20) developed a classification scheme based on mammograms from multiple examinations. The classifier combines prior and current information that is automatically extracted from masses on serial mammograms. It performed significantly better (P = .015) in terms of Az than did the classifier that used current information alone. We conducted an observer performance study in which radiologists estimated the likelihood of malignancy for masses on single-view serial mammograms (21). The accuracy of the radiologists in the characterization of malignant and benign temporal pairs was significantly improved (P = .005) with CAD versus without CAD.

An important question that we attempted to address in our current study is to what extent CAD influences radiologists' diagnostic recommendations when more mammographic information is available for a case. Thus, the purpose of our study was to retrospectively evaluate the effects of a CAD program that involves an interval change classifier (a classifier that uses interval change information extracted from prior and current mammograms and estimates a malignancy rating) on radiologists' accuracy in characterization of masses on two-view serial mammograms as malignant or benign.


    MATERIALS AND METHODS
 
Data Set
The data collection protocol was approved by the institutional review board of the University of Michigan. Patient informed consent was waived for this retrospective study, which was compliant with the Health Insurance Portability and Accountability Act. We collected mammograms from the files of 68 women. The criteria for patient selection were as follows: Two-view mammograms containing craniocaudal (CC) and mediolateral oblique (MLO) views were available for current and prior examinations, and a mammographically visible mass was present at the current examination. A "mass" was defined as a lesion with completely or partially visualized convex outward borders that was usually depicted on CC and MLO views (22); partially calcified masses were also study eligible. If a case had an area of asymmetry at a prior examination that later was determined to be a mass at the current examination, this case was also study eligible.

For 17 patients, the current mammograms were diagnostic images. There were two patients in whom both the current and the prior mammograms were acquired during diagnostic examinations. Six of the 17 patients with diagnostic images had a palpable mass. The cases were collected from our database of patients who had undergone breast biopsy in our department. We used all available cases that satisfied the above criteria. The current mammograms for the cases used in the study were collected (by L.H., H.P.C., B.S., and N.P.) from December 1995 to February 2001. The patients ranged in age from 37 to 86 years (mean age, 59.9 years).

The mammograms were digitized by using a LUMISCAN 85 laser scanner (Lumisys, Los Altos, Calif) at a pixel resolution of 50 × 50 µm and 4096 gray levels. Because masses are large and relatively noisy objects, they do not require high spatial resolution, and their characterization may be improved by reducing the noise. The images were smoothed by using a 2 × 2 box filter and were downsampled by a factor of two, resulting in images with a pixel size of 100 × 100 µm for further analysis. The pathologic nature of all masses was proved with biopsy. Biopsy results were considered the reference standard.
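As a rough illustration of the smoothing and downsampling step, the sketch below averages non-overlapping 2 × 2 pixel blocks, which is one way to combine a 2 × 2 box filter with downsampling by a factor of two. The function name and the handling of odd image dimensions are assumptions made for this example and are not taken from the study.

```python
def smooth_and_downsample(image):
    """Average non-overlapping 2x2 blocks of a 2-D list of gray levels.

    Assumption for this sketch: the box filter and the factor-of-two
    downsampling are aligned, so each output pixel is the mean of one
    2x2 block. A trailing odd row or column is dropped.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    result = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        result.append(out_row)
    return result


# Hypothetical 2 x 4 patch of gray levels at 50-um pixel size.
patch = [[0, 4, 8, 8],
         [4, 8, 8, 8]]
downsampled = smooth_and_downsample(patch)  # one row, two pixels at 100 um
```

Each output pixel then covers twice the linear extent of an input pixel, which matches the change from 50 µm to 100 µm described above.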

The true mass locations on all mammograms were identified by a Mammography Quality Standards Act of 1992–accredited radiologist (M.A.H.) with 17 years of experience in reading mammograms. When a mass was not discretely visible on the previous mammogram, the radiologist estimated the area where the mass would develop by comparing the previous mammogram with the current mammogram. Prior mammograms for all study cases had been interpreted prospectively in terms of Breast Imaging Reporting and Data System (BI-RADS) lexicon categories 1, 2, and 3 by the radiologists who initially interpreted the studies at the time of the patients' clinical examination. A region of interest centered at the identified mass location and containing the mass or estimated mass area (for masses that were not discretely visible on the prior mammogram) was extracted from each mammogram. The sizes of the regions of interest were variable to enclose masses and estimated mass areas of different sizes and were large enough to include a strip of breast parenchyma of at least 5 mm in width surrounding the mass. The region-of-interest images were used in the computerized analysis and observer study. They are referred to as the mammograms in the following discussion.

A total of 300 mammograms containing CC and MLO views from serial examinations were obtained from the data set; from these, 90 two-view temporal pairs were formed, of which 47 were malignant and 43 were benign. A two-view temporal pair consisted of four mammograms: the CC and MLO views from a prior examination and the CC and MLO views from a current examination in the same patient. For the purpose of computerized analysis, two temporal pairs were obtained from each two-view temporal pair—the CC temporal pair (combining the current and prior CC views) and, similarly, the MLO temporal pair.

All masses had been sampled with core needle or excisional biopsy to establish their histologic features during the patients' clinical care. The average size (ie, longest diameter) of the malignant masses, at retrospective review, was 7.7 mm (range, 3–22 mm) on the prior mammograms and 12.5 mm (range, 4–42 mm) on the current mammograms. The corresponding sizes were 9.7 mm (range, 4–23 mm) and 11.6 mm (range, 5–30 mm), respectively, for the benign masses. The average size of the current and prior masses was estimated by using the two-view current and prior mammograms, respectively, on which the mass was visible in at least one view (CC or MLO). The average size of the current masses was estimated by using all temporal pairs. The average size of the prior masses was estimated by using 81 temporal pairs (for nine of the malignant temporal pairs, the mass was not visible on any of the CC or MLO views from prior mammograms).

Seven additional two-view temporal pairs depicting normal dense breast tissue that was deemed to mimic mammographic masses by the experienced radiologist (M.A.H.) were mixed into the data set read by the radiologists in the observer study. The seven two-view temporal pairs depicting normal breast tissue were randomly selected from images of the contralateral breast of seven of the 68 patients whose data were included in this study. The observers were informed of the presence of normal tissues, but the proportion was not disclosed. In this way, a slightly more realistic clinical situation in which the observers would have to distinguish normal breast tissue from masses, as well as malignant from benign masses, was simulated. The BI-RADS category 1 (negative) could be chosen. However, the ratings for the normal tissue pairs were not included in the data analyses because this study focused on the characterization of benign versus malignant masses rather than on differentiation of true masses from false masses.

The temporal pairs had a time interval of 6–48 months (Fig 1). When the radiologist identified the location of the mass, he or she also rated the visibility of the masses on the mammograms relative to the visibility of masses encountered in clinical practice on a 10-point scale, with 1 representing the most obvious and 10 the most subtle masses.


 
Figure 1: Bar graph shows temporal interval between current and prior mammograms for 90 two-view temporal pairs, of which 47 were malignant and 43 were benign.

 
Computerized Classification of Temporal Masses
We designed a computerized method to analyze the current and prior information for a temporal pair and provide a malignancy rating for the mass. A detailed description of the computerized classification method can be found elsewhere (20). Briefly, in each region of interest containing the current or prior mass, the segmentation of the mass from the surrounding background tissue was performed by means of k-means clustering, followed by application of the active contour model. From each automatically segmented mass, a total of 35 features were extracted. The features included 20 run-length statistics texture features, 12 morphologic features, and three spiculation features. A detailed description of the features can be found in previous reports (21,23–26). Additionally, 35 difference features were derived by subtracting a feature of the prior mass from the corresponding feature of the current mass.
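The derivation of the difference features can be sketched as an element-wise subtraction of the prior feature vector from the current one. The function names and the layout of the combined feature pool are hypothetical; the text above states only that each difference feature is the current value minus the prior value.

```python
def difference_features(current, prior):
    """Return current-minus-prior differences for matching features."""
    if len(current) != len(prior):
        raise ValueError("current and prior feature vectors must match in length")
    return [c - p for c, p in zip(current, prior)]


def build_feature_pool(current, prior):
    """Collect current, prior, and difference features for one temporal pair.

    Assumption for this sketch: all three groups are offered to feature
    selection; the dictionary layout is illustrative only.
    """
    return {
        "current": list(current),
        "prior": list(prior),
        "difference": difference_features(current, prior),
    }


# Two made-up feature values stand in for the 35 features per mass.
pool = build_feature_pool(current=[3.0, 1.5], prior=[1.0, 2.0])
```

With 35 features per mass, this yields 35 difference features per temporal pair, as described above.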

The two-view classifier was designed on the basis of the single-view CC and MLO temporal pairs classification. A "leave-one-case-out" resampling method was used to obtain test scores for the temporal pairs. Sixty-eight training-test partitions were obtained. Stepwise feature selection was applied to the training subsets to reduce the size of the input feature space. A test classifier score was obtained for each single-view CC or MLO temporal pair. For the application to two-view analysis, we obtained a single score for each two-view temporal pair by merging the test classifier scores of the corresponding CC and MLO single-view temporal pairs. We compared three different ways to merge the CC and the MLO single-view temporal pair scores: selecting the minimum between the two, selecting the maximum between the two, and calculating the average of the two.
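The resampling and score-merging steps can be sketched as follows. The 68 training-test partitions match the 68 patients, which suggests that a "case" corresponds to a patient; that reading, together with all function names and patient identifiers below, is an assumption made for illustration.

```python
# The three merging rules compared in the text.
MERGE_RULES = {
    "min": min,
    "max": max,
    "mean": lambda cc, mlo: (cc + mlo) / 2.0,
}


def merge_scores(cc_score, mlo_score, rule="mean"):
    """Merge the CC and MLO single-view test scores into one two-view score."""
    return MERGE_RULES[rule](cc_score, mlo_score)


def leave_one_case_out(patient_ids):
    """Yield (train_indices, test_indices), holding out one patient at a time.

    patient_ids[i] is the (hypothetical) patient label of temporal pair i,
    so all pairs from the held-out patient land in the test subset together.
    """
    for held_out in dict.fromkeys(patient_ids):  # unique IDs, original order
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield train, test


partitions = list(leave_one_case_out(["p1", "p1", "p2"]))
two_view_score = merge_scores(-1.0, 2.0, rule="mean")
```

Holding out all pairs from one patient at once keeps a patient's images out of the training subset used to score that same patient.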

Relative Computer Malignancy Rating of Masses
In our observer study, the average test classifier scores were linearly transformed to a scale from 1 to 10. The scores were rounded to the closest integer before being presented to the radiologists. This scale was more practical and intuitive for the observers than the original classifier scores, which were real numbers ranging from –3.5 to +2.6. Gaussian functions were fitted to the distributions of the transformed scores of the malignant and benign masses to yield an estimation of the classifier performance. The accuracy of the fit was estimated by using the Kolmogorov-Smirnov test. The radiologists evaluated the temporal pairs by using a graphical user interface. When the radiologist evaluated the cases by using CAD, the fitted distribution was displayed on the interface as a reference.
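A minimal sketch of such a linear transformation is given below. It assumes that the ends of the reported score range (–3.5 and +2.6) map to ratings 1 and 10, respectively, and that ties are rounded upward; the exact mapping used in the study is not stated, so both choices are assumptions.

```python
import math

SCORE_MIN = -3.5  # lower end of the reported classifier score range
SCORE_MAX = 2.6   # upper end of the reported classifier score range


def to_relative_rating(score, low=SCORE_MIN, high=SCORE_MAX):
    """Map a classifier score linearly onto 1..10 and round to an integer.

    Assumptions: `low` maps to 1, `high` maps to 10, scores outside the
    range are clamped, and halves round upward.
    """
    clamped = min(max(score, low), high)
    scaled = 1.0 + 9.0 * ((clamped - low) / (high - low))
    return int(math.floor(scaled + 0.5))


ratings = [to_relative_rating(s) for s in (-3.5, 0.0, 2.6)]
```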

Observer Performance Study
In the observer performance study, both the CC and MLO view temporal pairs for each mass (Fig 2) were presented to the radiologist at the same time on the workstation. The 100 × 100-µm pixel size images were displayed. The radiologist evaluated the displayed two-view temporal pairs and provided an assessment by using two methods: First, an estimate of the likelihood of malignancy on a 100-point scale (where a score of 1 indicates a benign mass; a score of 100, a mass with a high likelihood of malignancy) and second, an assessment of the mass based on the BI-RADS malignancy ratings (where a score of 1 indicates a negative mass; a score of 2, a benign mass; a score of 3, a probably benign mass; a score of 4, a suspicious mass; and a score of 5, a mass highly suggestive of malignancy). The use of BI-RADS category 0 was not allowed, forcing the radiologist to make a "final" assessment.


 
Figure 2: Two-view (CC and MLO) temporal pairs of mammographic images displayed by graphical user interface in observer study. Left: Views of prior mass. Right: Views of current mass. Radiologist provided two ratings: estimate of the likelihood of malignancy (range, 1 = benign to 100 = high likelihood of malignancy) and BI-RADS category (range, 1 = negative to 5 = highly suggestive of malignancy), which are displayed at upper right of the interface. When reading in sequential mode with computer aid, computer rating (gray box with number 7) and fitted distributions (in which white curve represents likelihood that mass is benign and black curve represents likelihood that it is malignant) are both displayed in the lower right of the interface. In sequential mode, the radiologist first evaluated the mass without CAD and could change the likelihood of malignancy and/or BI-RADS category after considering the computer's rating.

 
Three reading conditions—the independent mode, the sequential mode without CAD, and the sequential mode with CAD—were used for the evaluation of the cases. In the independent mode, the two-view temporal pairs were read without computer aid. In the sequential modes, the observers evaluated the cases first without CAD, and then with CAD. The difference between the independent mode and the sequential mode without CAD was that in the independent mode the radiologist would not be influenced by the fact that the computer score would follow immediately after his or her own estimate (as in the sequential mode without CAD).

Eight Mammography Quality Standards Act of 1992–accredited radiologists (M.A.R., C.B., C.P., J.B., K.K., S.K.P., D.A., and A.V.N.) with experience in mammography that ranged from 3 to 24 years and two breast imaging fellows (M.F. and J.S.) participated as observers in this study. The radiologist (M.A.H.) who selected the cases and identified the masses did not participate in the observer experiment.

The 90 two-view temporal pairs of masses and the seven temporal pairs of normal tissue were divided into two case groups. Each observer read the 194 cases (97 cases times two reading conditions) in two reading sessions that were separated by at least 1 month. In each session, one case group was read with the independent mode and the other was read with the sequential mode. The order of the two reading conditions was switched between the reading sessions, and the order of the cases within each case group was randomized for each observer. The orders of the case groups and the reading conditions were arranged in a counterbalanced design such that no one case group or reading condition would be read or applied first more often than another when averaged over all observers.

Additional Comparisons
The observer performance results for the two-view temporal pairs of masses were compared with those for the single-view temporal pairs (L.H.). For this comparison, the radiologists' ratings for the 180 single-view temporal pairs (90 CC and 90 MLO views) obtained from a previous single-view experiment (21) were analyzed. These single-view temporal pairs corresponded to the same 90 two-view temporal pairs in the current experiment, and the 10 readers in the previous study were the same readers as in the current two-view experiment. The single-view observer study was performed separately 3 months before the two-view observer study so that the effects of memorization or learning would be minimal.

A further comparison was made (L.H.) by deriving a set of simulated two-view temporal pair observer ratings for the same 90 masses by artificially combining the two single-view observer ratings (by averaging and rounding) into a two-view rating for each mass. The classification performance achieved by using the 90 simulated two-view ratings was then compared with that achieved by using the 90 two-view ratings directly from the radiologists' reading in the current experiment. This comparison provided some insight as to whether the radiologists might use a "worst-case scenario" in estimating the likelihood of malignancy on the basis of the two views of the mass.

Statistical Analysis
The observer performances were analyzed (by L.H., B.S., H.P.C., and N.P.) in terms of the likelihood of malignancy, as well as in terms of the BI-RADS ratings, for the different modalities and the different radiologists. The Dorfman-Berbaum-Metz multireader multicase method (27) was applied to the radiologists' likelihood-of-malignancy ratings. The Dorfman-Berbaum-Metz method takes into account both the observer and the case sample variations by means of an analysis of variance approach so that the results of this analysis can be generalized to the population of observers, as well as to the population of case samples. The ROC curve was derived from a maximum likelihood estimation of the binormal distributions fitted to the observers' rating data, and the Az value and the partial area index (28) above a sensitivity threshold of 0.90 (Az(0.90)) were calculated. The statistical significance of the difference between the studied modalities was estimated by using the Dorfman-Berbaum-Metz method and the Student two-tailed paired t test for the observer-specific paired data. Additionally, we used the Obuchowski method (29) for analysis of clustered data. The Obuchowski method, which was also generalized by Lee and Rosner (30) for multireader, multimodality studies, accounts for the possible correlations between temporal pairs when multiple prior examinations are available and more than one temporal pair is formed from the multiple serial examinations of the same patient. The method is nonparametric and is robust to a variety of intracluster correlation patterns, as well as to non–normally distributed test results.
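For orientation, the two summary indexes can be computed from a fitted binormal ROC curve as sketched below. The sketch assumes the conventional binormal parameterization, TPF = Φ(a + b·Φ⁻¹(FPF)), and takes the partial area index to be the area under the ROC curve above the sensitivity threshold, normalized by (1 – threshold). It does not reproduce the maximum likelihood fitting or the Dorfman-Berbaum-Metz analysis, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def binormal_az(a, b):
    """Area under a binormal ROC curve with intercept a and slope b."""
    return STD_NORMAL.cdf(a / math.sqrt(1.0 + b * b))


def partial_area_index(a, b, tpf0=0.90, steps=2000):
    """Normalized area under the ROC curve above sensitivity tpf0.

    Integrates (1 - FPF) over TPF from tpf0 to 1 with the midpoint rule
    and divides by (1 - tpf0), so a perfect curve scores 1 and the chance
    diagonal scores (1 - tpf0) / 2.
    """
    width = (1.0 - tpf0) / steps
    area = 0.0
    for i in range(steps):
        tpf = tpf0 + (i + 0.5) * width
        # Invert TPF = Phi(a + b * z_fpf) to recover the matching FPF.
        fpf = STD_NORMAL.cdf((STD_NORMAL.inv_cdf(tpf) - a) / b)
        area += (1.0 - fpf) * width
    return area / (1.0 - tpf0)


# Illustrative parameters only; they are not values from this study.
az = binormal_az(a=1.5, b=0.9)
az_090 = partial_area_index(a=1.5, b=0.9, tpf0=0.90)
```

The chance diagonal (a = 0, b = 1) gives Az = 0.5 and a partial area index of 0.05 at a threshold of 0.90, which is a convenient sanity check for the numerical integration.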

The radiologists' recommended action for a given mass was determined by the BI-RADS rating provided in the observer experiment. We considered two different groupings, callback and biopsy, as follows. For the callback grouping, the cases with BI-RADS ratings of 1 or 2 were grouped as "normal," and cases with BI-RADS ratings of 3, 4, or 5 were grouped as callback. For the biopsy grouping, the cases with BI-RADS ratings of 1, 2, or 3 were grouped as "no biopsy," and cases with ratings of 4 and 5 were grouped as "biopsy recommended." After the radiologist evaluated the case with CAD, he or she could change the BI-RADS rating. For the callback grouping, if the BI-RADS rating for a case was changed from 1 or 2 to 3, 4, or 5, the change was considered to be from normal to callback. If the BI-RADS rating for a case was changed from 3, 4, or 5 to 1 or 2, the change was considered to be from callback to normal. Similarly, for the biopsy grouping, a change of the BI-RADS rating for a case from 1, 2, or 3 to 4 or 5 was considered to be a change from no biopsy to biopsy. If the rating for a case was changed from 4 or 5 to 1, 2, or 3, the change was considered to be from biopsy to no biopsy. The changes were counted over all of the cases and all observers and finally averaged by the number of observers. The McNemar test was used to evaluate the statistical significance of changes for the individual radiologists. For all analyses, a P value of less than .05 was considered to indicate a significant difference.
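The grouping and change-counting rules above can be sketched as follows. The labels for the four kinds of change and the use of an exact (binomial) form of the McNemar test are assumptions for this example; the text does not state which variant of the test was applied or how the discordant counts were tabulated.

```python
from math import comb

# Lowest BI-RADS rating that triggers the action in each grouping.
ACTION_THRESHOLD = {"callback": 3, "biopsy": 4}


def recommends_action(birads, grouping):
    """True if the BI-RADS rating implies callback (>= 3) or biopsy (>= 4)."""
    return birads >= ACTION_THRESHOLD[grouping]


def classify_change(before, after, is_malignant, grouping):
    """Label the effect of a rating change made after viewing the CAD score."""
    was = recommends_action(before, grouping)
    now = recommends_action(after, grouping)
    if was == now:
        return "no change"
    if now:  # action newly recommended
        return "correct increase" if is_malignant else "incorrect increase"
    return "incorrect reduction" if is_malignant else "correct reduction"


def mcnemar_exact_p(n_up, n_down):
    """Two-sided exact McNemar P value from the two discordant counts."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# A malignant mass moved from BI-RADS 3 to 4 after CAD:
label_biopsy = classify_change(3, 4, is_malignant=True, grouping="biopsy")
label_callback = classify_change(3, 4, is_malignant=True, grouping="callback")
```

The same rating change can therefore count as a correct additional biopsy recommendation while leaving the callback grouping unchanged, because the two groupings use different thresholds.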


    RESULTS
 
Generally, the malignant masses were less visible on the prior than on the current mammograms, while the visibility of the benign masses was found to be similar on the current and prior mammograms. The mean difference in visibility ratings between the prior and the current mammograms was 2.5 for the malignant masses, compared with 1.2 for the benign masses (P < .01, unpaired t test between malignant and benign masses). The correlation coefficient between the visibility of the current and that of the prior masses was 0.04 for malignant masses and 0.38 for benign masses. For the CC views, the mean difference in visibility between the current and the prior masses was 2.9 for the malignant and 1.1 for the benign masses (P < .001, unpaired t test). For the MLO views, the mean difference in visibility between the current and prior masses was 2.1 for the malignant and 1.3 for the benign masses (P = .12, unpaired t test).

On average, seven features were automatically selected by the stepwise feature selection for each training-test partition. The seven features included two difference run-length statistics features, three current run-length statistics features, one spiculation feature from the current image, and one spiculation feature from the prior image.

The classification accuracy of the two-view computer classifier in terms of Az was 0.90 for the test data set. We found that the classifier accuracy was the highest when the CC and MLO single-view scores were averaged. This was consistent with our previous experience with (19) and reports in the literature about (18) merging scores in the case of multiple views. The average score was thus used in this observer study. The classifier test scores were linearly transformed to scores between 1 and 10 from the original range (–3.5 to +2.6). The differences between the Gaussian fitted distributions and the transformed scores were not statistically significant for either malignant (P = .31, Kolmogorov-Smirnov test) or benign (P = .98, Kolmogorov-Smirnov test) masses.

Two-View Observer Performance
The two-view observer performance results are presented in Tables 1–4. The average ROC curves for the observers were obtained by averaging the fitted a and b parameters of the individual radiologist's ROC curve for each mode and then calculating an ROC curve from the average a and b parameters. The parameter a represents the vertical intercept, and b represents the slope of the fitted binormal ROC curve when it is plotted as a straight line on normal deviate axes. The average Az for radiologists was 0.83 for the independent mode, 0.82 for the sequential mode without CAD, and 0.87 for the sequential mode with CAD (Table 1, Fig 3). The observer performance for the reading with CAD was significantly improved compared with that for the independent reading mode (P = .03, Student paired t test; P < .05, Dorfman-Berbaum-Metz method; P = .01, Obuchowski method) and with that for the sequential mode without CAD (P < .01, Student paired t test; P < .01, Dorfman-Berbaum-Metz method; P = .01, Obuchowski method). There was a slight decrease in the performance for the sequential mode without CAD compared with the independent mode; however, the difference was not statistically significant (P = .10, Student paired t test; P = .89, Dorfman-Berbaum-Metz method; P = .52, Obuchowski method).
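The construction of an average ROC curve from reader-specific binormal parameters might look like the sketch below. The parameter values are invented for illustration, and the function name and the sampling of the false-positive fraction axis are assumptions.

```python
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution


def average_binormal_curve(reader_params, n_points=9):
    """Average (a, b) across readers and sample the resulting ROC curve.

    reader_params is a list of (a, b) pairs, one per reader. The function
    returns the averaged a and b plus (FPF, TPF) points computed from
    TPF = Phi(a + b * Phi^-1(FPF)) at evenly spaced interior FPF values.
    """
    a_avg = mean(a for a, _ in reader_params)
    b_avg = mean(b for _, b in reader_params)
    curve = []
    for i in range(1, n_points + 1):
        fpf = i / (n_points + 1)
        tpf = STD_NORMAL.cdf(a_avg + b_avg * STD_NORMAL.inv_cdf(fpf))
        curve.append((fpf, tpf))
    return a_avg, b_avg, curve


# Two hypothetical readers; these are not fitted values from the study.
a_avg, b_avg, curve = average_binormal_curve([(1.0, 1.0), (2.0, 0.5)])
```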



 
Table 1. Az Values for 10 Radiologists in Characterization of Masses on Two-View Mammograms at Three Reading Sessions

 


 
Table 2. Az(0.90) Values for 10 Radiologists in Characterization of Masses on Two-View Mammograms at Three Reading Sessions

 


 
Table 3. Change in BI-RADS Categories, and Therefore Callback Recommendations, Assigned by 10 Radiologists according to Change in Reading Mode

 


 
Table 4. Change in BI-RADS Categories, and Therefore Biopsy Recommendations, Assigned by 10 Radiologists according to Change in Reading Mode

 

 
Figure 3: Average ROC curves for two-view observer study for three reading modes: independent (No CAD-Ind) (Az = 0.83), sequential without CAD (No CAD-Seq) (Az = 0.82), and sequential with CAD (With CAD-Seq) (Az = 0.87). The difference was significant for sequential with CAD mode versus independent mode (P = .03, Student paired t test; P < .05, Dorfman-Berbaum-Metz method) and for sequential with CAD mode versus sequential without CAD mode (P < .01, Student paired t test; P < .01, Dorfman-Berbaum-Metz method). Dashed line at true-positive fraction of 0.90 shows threshold above which Az(0.90) was calculated. For the two-view observer study, Az(0.90) was 0.35 for independent mode, 0.30 for sequential without CAD mode, and 0.49 for sequential with CAD mode. The difference was significant for sequential with CAD mode versus sequential without CAD mode (P < .01, Student paired t test).

 
When we compared results for the independent reading mode with those for the sequential mode with CAD, we found that three radiologists did not change their Az performance with and without CAD. For radiologist 8, Az declined with the use of CAD; however, the difference was not significant (P = .65). The remaining radiologists improved their Az values by 0.01–0.13; for three of these radiologists, the improvement was significant (P < .04). On the other hand, when we compared results for the sequential mode without CAD with those for the sequential mode with CAD, we found that only radiologist 3 had the same performance with and without CAD (Az = 0.80). Az for radiologist 8 declined with the use of CAD, but this difference was not significant (P = .28). The performance of the other radiologists improved in the range from 0.02 to 0.09, and for five of these radiologists the improvement was significant (P < .03).

The average partial area index above a sensitivity of 0.90, Az(0.90), was 0.35 for the independent mode, 0.30 for the sequential mode without CAD, and 0.49 for the sequential mode with CAD (Table 2). There was an improvement in observer performance when reading with CAD compared with reading without CAD. However, only the improvement with the sequential mode with CAD versus the sequential mode without CAD was significant (P < .01, Student paired t test). When we compared results for the sequential mode without CAD with those for the sequential mode with CAD, we found that eight of the radiologists improved their performance with CAD in the high sensitivity range. For five of them, this improvement was significant (P < .05). For one radiologist, Az(0.90) did not change with CAD, and for another radiologist Az(0.90) declined but the difference was not significant (P = .7). When we compared results for the independent mode with those for the sequential mode with CAD, we found that seven radiologists achieved an improvement with CAD, and for three of them the improvement was significant (P < .001). For the remaining three radiologists, Az(0.90) declined with CAD, but not significantly (P > .58).
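For readers unfamiliar with the partial area index, the expression below is a sketch of how Az(0.90) can be written, following our understanding of the definition in reference 28. The symbols TPF (true-positive fraction), FPF (false-positive fraction), and TPF_0 (the sensitivity threshold) are notation introduced here for illustration and are not taken from the text above.

```latex
A_{z}(0.90) \;=\; \frac{1}{1 - \mathrm{TPF}_0}
\int_{\mathrm{TPF}_0}^{1} \bigl[\, 1 - \mathrm{FPF}(\mathrm{TPF}) \,\bigr] \, d\,\mathrm{TPF},
\qquad \mathrm{TPF}_0 = 0.90
```

Under this normalization a perfect classifier would give a value of 1, and a classifier performing at chance would give (1 - TPF_0)/2 = 0.05, which puts the reported values of 0.30 to 0.49 in context.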

The radiologists' BI-RADS assessments for the three reading modes are shown in Tables 3 and 4. When we compared results for the sequential mode with CAD with those for the independent mode (Table 3), we found that four radiologists showed an increase in correct recommendations for callback with CAD (range, 2.1%–6.4%). For four radiologists there was also a correct recommendation for reduction of callback (ie, fewer callbacks) (range, 2.3%–4.6%). At the same time, however, for three radiologists there was an increase in incorrect callback for benign masses (range, 4.6%–9.3%), and for two radiologists there was an incorrect reduction of callback for malignant masses (2.1%). When we compared results for the sequential mode without CAD with those for the sequential mode with CAD, we found that five radiologists showed an increase in correct recommendations for callback with CAD (range, 2.1%–4.3%) and three radiologists made correct recommendations for reduction of callback (range, 2.3%–4.6%). Two radiologists recommended additional incorrect callbacks for benign masses (range, 4.6%–14.0%).

The correct biopsy recommendations were increased when radiologists read with CAD (Table 4). When the sequential mode with CAD was compared with the independent mode, six radiologists recommended additional correct biopsies of malignant masses (range, 4.3%–10.6%), and five radiologists correctly recommended reduction of biopsy (ie, fewer biopsies) for benign masses (range, 2.3%–9.3%). However, at the same time, five radiologists incorrectly recommended additional biopsy for benign masses (range, 2.3%–14.0%), and, for one radiologist, there was an incorrect reduction of biopsy (4.3%). When the sequential mode without CAD was compared with the sequential mode with CAD, for seven radiologists there was an increase in correct recommendations for biopsy (range, 2.1%–12.8%) and for three radiologists there was an increase in correct recommendations for reduction of biopsy (range, 2.3%–7.0%). For four radiologists, there were additional incorrect biopsy recommendations for benign masses (range, 2.3%–18.6%).
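The four kinds of change reported above (correct increase, correct reduction, incorrect increase, incorrect reduction) can be tallied as in the following minimal sketch. The function name and the example data are hypothetical, and the sketch assumes each recommendation has already been reduced to a yes/no decision for callback or biopsy.

```python
def tally_changes(truth, before, after):
    """Count changes in recommendation between two reading modes.

    truth  -- True if the mass is malignant, False if benign
    before -- True if callback (or biopsy) was recommended in the first mode
    after  -- True if it was recommended in the second mode
    """
    counts = {
        "correct_increase": 0,     # malignant, newly recommended
        "correct_reduction": 0,    # benign, no longer recommended
        "incorrect_increase": 0,   # benign, newly recommended
        "incorrect_reduction": 0,  # malignant, no longer recommended
    }
    for malignant, b, a in zip(truth, before, after):
        if a and not b:
            key = "correct_increase" if malignant else "incorrect_increase"
            counts[key] += 1
        elif b and not a:
            key = "incorrect_reduction" if malignant else "correct_reduction"
            counts[key] += 1
    return counts


# Hypothetical decisions for five masses (not data from the study).
truth = [True, True, False, False, False]
before = [False, True, True, False, False]
after = [True, False, False, True, False]
changes = tally_changes(truth, before, after)
```

To express a count as a percentage comparable to the ranges quoted above, it would presumably be divided by the number of malignant masses (for changes involving malignant masses) or benign masses (for changes involving benign masses); that choice of denominator is an assumption here.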

The above BI-RADS assessment results were not significantly different for most of the radiologists (McNemar test was performed for the results of each radiologist). Exceptions were for radiologists 9 and 4 for both callback and biopsy when the sequential mode without CAD was compared with the sequential mode with CAD (Tables 3, 4). Radiologist 9 had a significant increase in correct biopsy recommendations (P = .041), and radiologist 4 had a significant increase in incorrect reductions of both callback (P = .041) and biopsy (P = .013) recommendations.
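As an illustration of the per-radiologist McNemar test, the sketch below computes the statistic from the two discordant counts: masses for which the recommendation changed in one direction (b) and in the other (c). The text does not state which variant of the test was used; this sketch assumes the chi-square form with continuity correction, and the function name and counts are hypothetical.

```python
from math import erfc, sqrt


def mcnemar_corrected(b, c):
    """McNemar chi-square test with continuity correction (1 degree of freedom).

    b, c -- numbers of discordant pairs in each direction.
    Returns (chi-square statistic, two-sided P value).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical discordant counts for one radiologist (not data from the study).
chi2, p = mcnemar_corrected(6, 0)
```

Concordant pairs (masses for which the recommendation did not change) do not enter the statistic, which is why only b and c are needed.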

Two-View Simulated Reading
We compared the use of minimum, maximum, and average to combine the observer ratings for the CC and MLO single-view temporal pairs into a two-view rating. Averaging the CC and MLO single-view ratings achieved higher classification accuracy (in terms of Az values) than taking the minimum or the maximum. This was consistent with our previous experience (19) and with the literature (18) about merging scores in the case of multiple views. The observer results for the simulated reading of 90 two-view temporal pairs obtained by averaging the single-view ratings and the matched 180 single-view temporal pairs are presented in Tables 5–7.
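The three merging rules compared above can be sketched as follows. The function name, the rule labels, and the example ratings are hypothetical, and the rating scale is assumed only to be numeric.

```python
def combine_views(cc_rating, mlo_rating, rule="average"):
    """Merge the CC and MLO single-view ratings into one two-view rating."""
    rules = {
        "minimum": min,
        "maximum": max,
        "average": lambda a, b: (a + b) / 2,
    }
    return rules[rule](cc_rating, mlo_rating)


# Hypothetical likelihood-of-malignancy ratings for three masses.
cc = [20, 75, 40]
mlo = [30, 85, 10]
averaged = [combine_views(c, m) for c, m in zip(cc, mlo)]
lowest = [combine_views(c, m, rule="minimum") for c, m in zip(cc, mlo)]
```

Each list of combined ratings would then be analyzed in the same way as the ratings from an actual two-view reading.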


Table 5. Az and Az(0.90) Values for 10 Radiologists in Characterization of Masses in 180 Single-View Temporal Pairs and 90 Simulated Two-View Temporal Pairs at Three Reading Sessions

 
When the likelihood of malignancy ratings for the simulated reading of 90 two-view temporal pairs were evaluated, the average Az values for the three reading modes were 0.83 for the independent mode, 0.86 for the sequential mode without CAD, and 0.88 for the sequential mode with CAD (Table 5). The improvement was statistically significant when reading with CAD was compared with independent mode (P < .04, Student paired t test; P = .03, Dorfman-Berbaum-Metz method) or with reading in sequential mode without CAD (P = .01, Student paired t test; P < .02, Dorfman-Berbaum-Metz method).
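The Student paired t test treats each radiologist's Az values in two reading modes as a matched pair. A minimal sketch of the statistic is given below; the function name and the Az values are hypothetical and are not the study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(az_without, az_with):
    """Paired t statistic for per-reader Az values in two reading modes.

    Returns (t statistic, degrees of freedom).
    """
    diffs = [w - wo for wo, w in zip(az_without, az_with)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical Az values for four readers (not data from the study).
az_without_cad = [0.80, 0.82, 0.84, 0.78]
az_with_cad = [0.83, 0.84, 0.88, 0.79]
t_stat, dof = paired_t(az_without_cad, az_with_cad)
```

The P value follows from the t distribution with the returned degrees of freedom; a library routine such as `scipy.stats.ttest_rel` could be used to obtain it directly. Unlike the Dorfman-Berbaum-Metz method, this test accounts only for reader variability and not for case variability.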

Az(0.90) values for the simulated reading of two-view temporal pairs in independent mode, the sequential mode without CAD, and the sequential mode with CAD were 0.31, 0.35, and 0.44, respectively (Table 5). There was an improvement in Az(0.90) for the reading with CAD versus the readings without CAD. However, only the improvement compared with the sequential mode without CAD was significant (P < .02, Student paired t test).

The analysis of the BI-RADS assessments (Table 6) for the simulated two-view reading showed that, when reading with CAD was compared with the independent mode, three radiologists would correctly increase callback (2.1%) for malignant cases and four radiologists would correctly reduce callback (range, 2.3%–9.3%) for benign cases. Two radiologists would incorrectly increase the callback (4.7%). When we compared results for the sequential mode with CAD with those for the sequential mode without CAD, we found that three radiologists would call back additional malignant cases correctly (2.1%) and two radiologists would correctly reduce the callback of benign cases (2.3%). One radiologist would incorrectly increase the callback (2.3%).


Table 6. Change in BI-RADS Categories, and Therefore Callback Recommendations, Assigned by 10 Radiologists according to Change in Reading Mode of 180 Single-View Temporal Pairs and 90 Simulated Two-View Temporal Pairs

 
In terms of biopsy recommendations (Table 7), for reading with CAD versus reading in independent mode, seven radiologists would correctly recommend for biopsy additional malignant cases (range, 2.1%–19.1%), and three radiologists would correctly recommend reduction of biopsy (range, 2.3%–18.6%). However, six radiologists would incorrectly increase the recommendation of biopsy for benign cases (range, 2.3%–20.9%), and one radiologist would incorrectly recommend a reduction of biopsy (6.4%). When reading with CAD was compared with the sequential mode without CAD, we found that six radiologists would have made additional correct biopsy recommendations for malignant cases (range, 2.1%–8.5%) and two radiologists would have correctly recommended reduction of biopsy for benign cases (2.3%). Seven radiologists would have incorrectly increased the callback (range, 2.3%–27.9%).


Table 7. Change in BI-RADS Categories, and Therefore Biopsy Recommendations, Assigned by 10 Radiologists according to Change in Reading Mode of 180 Single-View Temporal Pairs and 90 Simulated Two-View Temporal Pairs

 
Comparison with Single-View Observer Performance
In the case of the 180 single-view observer results, the average Az was 0.78 for the independent mode, 0.80 for the sequential mode without CAD, and 0.83 for the sequential mode with CAD (Table 5). The improvement with CAD was statistically significant for both the independent mode (P < .02, Student paired t test; P < .02, Dorfman-Berbaum-Metz method) and the sequential mode (P < .01, Student paired t test; P < .01, Dorfman-Berbaum-Metz method). The Az(0.90) values for the independent mode, the sequential mode without CAD, and the sequential mode with CAD were 0.21, 0.27, and 0.36, respectively (Table 5). The results showed that there was significant improvement with the sequential mode with CAD compared with the independent mode (P < .02) and the sequential mode without CAD (P < .01). The Az results for the 90 two-view temporal pairs (Table 1) were higher than those for the 180 single-view temporal pairs in all three modes (Fig 4). The improvement was significant for the independent mode (P < .05, Student paired t test) but did not achieve statistical significance for the sequential mode with CAD (P = .05, Student paired t test) and the sequential mode without CAD (P = .33, Student paired t test).


Figure 4: Average ROC curves for two-view and single-view observer studies for two reading modes. Az was 0.87 for sequential reading of two-view mammograms with CAD (With CAD-Seq 2V) and 0.83 for independent reading of two-view mammograms (No CAD-Ind 2V) (P = .03, Student paired t test; P < .05, Dorfman-Berbaum-Metz method). Az was 0.83 for sequential reading of single-view mammograms with CAD (With CAD-Seq 1V) and 0.78 for independent reading of single-view mammograms (No CAD-Ind 1V) (P < .02, Student paired t test; P < .02, Dorfman-Berbaum-Metz method). Dashed line at true-positive fraction of 0.90 shows threshold above which Az(0.90) was calculated. Az(0.90) was 0.49 for sequential reading of two-view mammograms with CAD, 0.35 for independent reading of two-view mammograms (P = .11, Student paired t test), 0.36 for sequential reading of single-view mammograms with CAD, and 0.21 for independent reading of single-view mammograms (P < .02, Student paired t test).

 
The BI-RADS assessments are presented in Tables 6 and 7. When the evaluation in sequential mode with CAD was compared with evaluation in the independent mode (Table 6), we found that six radiologists correctly recommended additional callback of malignant masses (range, 2.1%–7.4%) and three radiologists correctly reduced callback (range, 2.3%–12.8%) for benign masses. Seven radiologists incorrectly increased the callback (range, 1.2%–9.3%) and one radiologist incorrectly reduced the callback (5.3%). In the case of biopsy evaluation, when reading with CAD mode was compared with independent mode (Table 7), we found that eight radiologists correctly increased the recommendation for biopsy (range, 2.1%–12.8%) and four radiologists correctly reduced the recommendation for biopsy of benign cases (range, 3.5%–18.6%). At the same time, however, six radiologists incorrectly increased the recommendations for biopsy (range, 2.3%–11.6%) and two radiologists incorrectly recommended reduction of biopsy (range, 1.1%–5.3%).


    DISCUSSION
 
The performance of the radiologists in terms of Az was better when reading the two-view temporal pairs than when reading the single-view temporal pairs. This trend was consistent for all three reading modes. The improvement in the independent mode was statistically significant (P < .05), and that in the sequential mode with CAD was very close to significant (P = .05). The presentation of more information (two views instead of one view) increased the radiologists' accuracy in characterization of masses on serial mammograms. However, even with the increased accuracy, the radiologists still gained a statistically significant improvement in Az when CAD was available to them during the two-view reading. This further demonstrates the effectiveness of CAD in improving characterization of mammographic masses.

When we analyzed the radiologists' performance individually, we observed that some of them improved their Az performance with CAD, some of them showed no change in performance, and one showed a decline in performance. The largest portion of the radiologists improved their Az when reading with CAD. For at least half of them, the improvement was statistically significant (P < .04). A few radiologists did not change their Az performance with and without CAD, and for one radiologist the Az declined with CAD; however, the difference was not significant (P > .20).

The analyses of the changes in the BI-RADS assessments between the different modes did not reveal significant differences for most of the radiologists (the McNemar test was performed for the results of each radiologist). When we compared evaluation with CAD with evaluation without CAD, we found that a number of radiologists would correctly increase callback or biopsy recommendations for malignant masses and correctly reduce them for benign masses. At the same time, a number of radiologists would incorrectly increase callback or biopsy recommendations for benign masses and incorrectly reduce them for malignant masses. In general, however, the number of radiologists with correct recommendations was greater than the number of radiologists with incorrect recommendations.

The comparison of the results of the actual two-view reading with those of the simulated two-view reading showed that for the sequential mode with CAD, the Az values were close (0.87 and 0.88, respectively). Performance with the independent mode of two-view and simulated two-view readings was the same, with Az values of 0.83 for both. For the sequential mode without CAD, the Az value from the simulated two-view reading (ie, 0.86) was higher than that from the actual two-view reading (ie, 0.82). However, the difference was not significant (P = .08).

In the report of our previous ROC study with single-view temporal pairs (21), we discussed the fact that the observed change in the performance of the radiologists' reading in independent mode compared with reading in the sequential mode without CAD may reflect the possibility of subtle change in the behavior when that behavior is being studied. This phenomenon has also been observed and discussed by other researchers (31–34). In the current observer study of reading the two-view temporal pairs, this phenomenon is not as obvious. When the likelihood-of-malignancy ratings for the reading modes without CAD were analyzed, there was a slight decrease for the sequential mode without CAD (Az = 0.82) compared with the independent mode (Az = 0.83). The difference was not significant (P = .10, Student paired t test; P = .89, Dorfman-Berbaum-Metz method).

We did not observe a relationship between the radiologists' performance and their years of experience in breast imaging.

There were some limitations in our study. Ideally a classifier should be developed on the basis of a training data set and then be applied to an independent data set that is used to evaluate the radiologists' performance in the observer study. However, we were limited by the size of the data set of temporal pairs that were collected. A hold-out method of splitting the data set into training and testing subsets would have reduced the statistical power of the study. We therefore employed a leave-one-case-out resampling method to develop and test our classifier, and the resampled test set was used for the observer performance study. The leave-one-out resampling method is well established in the pattern-recognition literature as a statistically valid technique for estimating classifier performance in an unknown population. The test scores of the classifier were presented to the radiologists in the observer study. Furthermore, the purpose of this study was not to measure the absolute performance of the radiologists in comparison with the absolute performance of the classifier. Rather, our goal was to demonstrate that there was a relative improvement in the radiologists' performance when they used a computer classifier that had a reasonable performance as a second opinion. We believe that the use of a different data set will not change the conclusions, as long as the computer classifier has a reasonable performance.
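The leave-one-case-out procedure described above can be sketched as follows. All names and data are hypothetical, and the one-feature nearest-centroid classifier is only a stand-in chosen to keep the example short; it is not the classifier used in the study.

```python
from statistics import mean


def leave_one_case_out(case_ids, features, labels, train, score):
    """Score every sample with a model trained on all other cases."""
    test_scores = [None] * len(features)
    for held_out in sorted(set(case_ids)):
        # Train on every sample that does not belong to the held-out case.
        idx = [i for i, c in enumerate(case_ids) if c != held_out]
        model = train([features[i] for i in idx], [labels[i] for i in idx])
        # Score all samples (for example, both views) of the held-out case.
        for i, c in enumerate(case_ids):
            if c == held_out:
                test_scores[i] = score(model, features[i])
    return test_scores


def train_centroids(xs, ys):
    """Stand-in classifier: mean feature value of each class."""
    benign = mean(x for x, y in zip(xs, ys) if y == 0)
    malignant = mean(x for x, y in zip(xs, ys) if y == 1)
    return benign, malignant


def score_centroids(model, x):
    """Positive when x lies closer to the malignant centroid."""
    benign, malignant = model
    return abs(x - benign) - abs(x - malignant)


# Hypothetical data: four cases, two samples each; label 1 = malignant.
case_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
features = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = leave_one_case_out(case_ids, features, labels,
                            train_centroids, score_centroids)
```

Holding out all samples of a case together, rather than one sample at a time, keeps information from the same case out of the training set used to score it.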

Another limitation of the study was the fact that the radiologists evaluated regions of interest containing the masses, but not the entire breast. Although this was not a lesion-detection study, there is a possibility that radiologists' characterization accuracy without and with CAD might be different if the whole mammogram was evaluated rather than only a region of interest. On the other hand, if the whole mammogram is displayed, it is possible to have mixed effects from other confounding factors such as breast density, additional lesions, or the fact that different radiologists may use the breast parenchymal information to different extents, which would be difficult to account for.

A third limitation was the fact that we did not allow readers to use BI-RADS category 0. In clinical practice, the radiologists would require additional information such as that yielded by spot-compression magnification mammograms and ultrasonography before using BI-RADS category 3 or greater. However, in our ROC experiment, our purpose was to evaluate the effects of CAD on radiologists' assessment of masses on two-view temporal pair mammograms. The radiologists were asked to make a decision within the BI-RADS categories 1–5 on the basis of the information available on the serial mammograms. Although our study design did not take into account many possible factors in clinical practice, this is the first step in evaluating the effect of our CAD system within a focused goal. An ROC study with a limited goal also provides the advantage of gaining insight into the effects of individual factors without the presence of other confounding factors that mask the individual effects. It is noted that the results of a limited laboratory ROC study may not be directly applicable to clinical practice. Large prospective clinical trials will be needed to evaluate the effect of CAD on radiologists' diagnostic decisions in clinical settings.

In conclusion, we performed an observer study to evaluate the effects of CAD on radiologists' characterization of masses on two-view serial mammograms. We compared the performances of the radiologists with and without CAD when the available diagnostic information was increased—that is, for two-view temporal pairs versus single-view temporal pairs. Our results demonstrate that at both the two-view and the single-view readings there was an improvement in the radiologists' performance when they were assisted by a computer classifier that had a performance in the range that we had studied.




    ACKNOWLEDGMENTS
 
The authors are grateful to Charles E. Metz, PhD, for the LABMRMC program.


    FOOTNOTES
 

Abbreviations: Az = area under the ROC curve • Az(0.90) = partial area index above a sensitivity threshold of 0.90 • BI-RADS = Breast Imaging Reporting and Data System • CAD = computer-aided diagnosis • CC = craniocaudal • MLO = mediolateral oblique • ROC = receiver operating characteristic

Authors stated no financial relationship to disclose.

Author contributions: Guarantor of integrity of entire study, L.H.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; literature research, L.H., B.S., M.A.H., H.P.C.; experimental studies, M.A.H., M.A.R., C.P., C.B., J.B., K.K., M.F., S.K.P., D.A., A.V.N., J.S.; statistical analysis, L.H., B.S., H.P.C., N.P.; and manuscript editing, all authors.


    References

  1. Cancer facts & figures 2005. www.cancer.org. American Cancer Society Web site. Accessed April 21, 2005.
  2. Jemal A, Murray T, Ward E, et al. Cancer statistics. CA Cancer J Clin 2005;55:10–30.
  3. Feig SA, D'Orsi CJ, Hendrick RE, et al. American College of Radiology guidelines for breast cancer screening. AJR Am J Roentgenol 1998;171:29–33.
  4. Cady B, Michaelson JS. The life-sparing potential of mammographic screening. Cancer 2001;91:1699–1703.
  5. Sickles EA. Periodic mammographic follow-up of probably benign lesions: results in 3184 consecutive cases. Radiology 1991;179:463–468.
  6. Kopans DB. The positive predictive value of mammography. AJR Am J Roentgenol 1992;158:521–526.
  7. Adler DD, Helvie MA. Mammographic biopsy recommendations. Curr Opin Radiol 1992;4:123–129.
  8. Bassett LW, Shayestehfar B, Hirbawi I. Obtaining previous mammograms for comparison: usefulness and costs. AJR Am J Roentgenol 1994;163:1083–1086.
  9. Burnside ES, Sickles EA, Sohlich RE, Dee KE. Differential value of comparison with previous examinations in diagnostic versus screening mammography. AJR Am J Roentgenol 2002;179:1173–1177.
  10. Sumkin JH, Holbert BL, Herrmann JS, et al. Optimal reference mammography: a comparison of mammograms obtained 1 and 2 years before the present examination. AJR Am J Roentgenol 2003;180:343–346.
  11. Kilday J, Palmieri F, Fox MD. Classifying mammographic lesions using computer-aided image analysis. IEEE Trans Med Imaging 1993;12:664–669.
  12. Sahiner B, Chan HP, Petrick N, Helvie MA, Adler DD, Goodsitt MM. Classification of malignant and benign breast masses: development of a high-sensitivity classifier using a genetic algorithm [abstr]. Radiology 1996;201(P):256–257.
  13. Chan HP, Sahiner B, Petrick N, et al. Computerized classification of malignant and benign microcalcifications on mammograms: texture analysis using an artificial neural network. Phys Med Biol 1997;42:549–567.
  14. Sahiner B, Chan HP, Petrick N, Helvie MA, Goodsitt MM. Computerized characterization of masses on mammograms: the rubber band straightening transform and texture analysis. Med Phys 1998;25:516–526.
  15. Hadjiiski L, Sahiner B, Chan HP, Petrick N, Helvie M. Classification of malignant and benign masses based on hybrid ART2LDA approach. IEEE Trans Med Imaging 1999;18:1178–1187.
  16. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999;6:22–33.
  17. Tourassi GD, Markey MK, Lo JY, Floyd CE. A neural network approach to breast cancer diagnosis as a constraint satisfaction problem. Med Phys 2001;28:804–811.
  18. Huo Z, Giger ML, Vyborny CJ, Metz CE. Breast cancer: effectiveness of computer-aided diagnosis—observer study with independent database of mammograms. Radiology 2002;224:560–568.
  19. Chan HP, Sahiner B, Helvie MA, et al. Improvement of radiologists' characterization of mammographic masses by computer-aided diagnosis: an ROC study. Radiology 1999;212:817–827.
  20. Hadjiiski L, Sahiner B, Chan HP, Petrick N, Helvie MA, Gurcan M. Analysis of temporal change of mammographic features: computer-aided classification of malignant and benign breast masses. Med Phys 2001;28:2309–2317.
  21. Hadjiiski L, Chan HP, Sahiner B, et al. Improvement of radiologists' characterization of malignant and benign breast masses in serial mammograms by computer-aided diagnosis: an ROC study. Radiology 2004;233:255–265.
  22. American College of Radiology. Breast Imaging Reporting and Data System Atlas (BI-RADS Atlas). Reston, Va: American College of Radiology, 2003.
  23. Galloway MM. Texture classification using gray level run lengths. Comput Graph Image Proc 1975;4:172–179.
  24. Sahiner B, Chan HP, Petrick N, Helvie MA, Hadjiiski LM. Improvement of mammographic mass characterization using spiculation measures and morphological features. Med Phys 2001;28:1455–1465.
  25. Petrick N, Chan HP, Sahiner B, Helvie MA. Combined adaptive enhancement and region-growing segmentation of breast masses on digitized mammograms. Med Phys 1999;26:1642–1654.
  26. Sahiner B, Chan HP, Petrick N, Hadjiiski LM, Helvie MA, Paquerault S. Active contour models for segmentation and characterization of mammographic masses. In: The 5th International Workshop on Digital Mammography, Toronto, Canada. Madison, Wis: Medical Physics, 2001; 357–362.
  27. Dorfman DD, Berbaum KS, Metz CE. ROC rating analysis: generalization to the population of readers and cases with the jackknife method. Invest Radiol 1992;27:723–731.
  28. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996;201:745–750.
  29. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics 1997;53:567–578.
  30. Lee ML, Rosner BA. The average area under correlated receiver operating characteristic curves: a nonparametric approach based on generalized two-sample Wilcoxon statistics. J R Stat Soc Ser C Appl Stat 2001;50:337–344.
  31. Helvie MA, Hadjiiski LM, Makariou E, et al. Sensitivity of noncommercial computer-aided detection system for mammographic breast cancer detection: a pilot clinical trial. Radiology 2004;231:208–214.
  32. Kobayashi T, Xu XW, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists' performance in detection of lung nodules on radiographs. Radiology 1996;199:843–848.
  33. FDA Radiological Devices Advisory Panel meeting, March 5, 2001. Review of Deus RapidScreen. www.accessdata.fda.gov. Accessed June 7, 2002.
  34. Beiden SV, Wagner RF, Doi K, et al. Independent versus sequential reading in ROC studies of computer-assist modalities: analysis of component of variance. Acad Radiol 2002;9:1036–1043.



