Radiology
DOI: 10.1148/radiol.2382042066
(Radiology 2006;238:446-453.)
© RSNA, 2006


Breast Imaging

Correlation of Radiologist Rank as a Measure of Skill in Screening and Diagnostic Interpretation of Mammograms1

Craig A. Beam, PhD, Emily F. Conant, MD and Edward A. Sickles, MD

1 From the Biostatistics Core, H. Lee Moffitt Cancer Center & Research Institute, Tampa, Fla. From the 2005 RSNA Annual Meeting. Received December 7, 2004; revision requested February 3, 2005; revision received March 31; accepted May 2; final version accepted July 11. Address correspondence to C.A.B., Division of Epidemiology and Statistics, University of Illinois at Chicago, School of Public Health, 1603 W Taylor St, Chicago, IL 60612-4394 (e-mail: cbeam@uic.edu).


    ABSTRACT
Purpose: To determine whether skill in the interpretation of screening mammograms is correlated with skill in the interpretation of diagnostic mammograms.

Materials and Methods: The institutional review board of the University of South Florida approved this study. This study was determined to be exempt from informed consent requirements because of the retrospective use of images and was conducted before HIPAA requirements were implemented. A total of 59 radiologists interpreted screening and diagnostic performance test sets of mammograms, with the two readings separated by an interval of at least 1 year. Interpretations were recorded with modifications of the Breast Imaging Reporting and Data System. Radiologist skill was measured as the radiologist's ranking among his or her cohort in each of several measures of performance (ie, performance test receiver operating characteristic curve area, performance test screening sensitivity, performance test diagnostic sensitivity, and associated specificities). Correlations between radiologist rank in screening and rank in the diagnostic performance test measures were analyzed with the Spearman rank correlation statistical test.

Results: Radiologist rank in screening interpretations and in diagnostic interpretations was found to be significantly correlated in all measurements (P < .05). However, only two measurements (ie, receiver operating characteristic curve area rank correlation of 0.327 and sensitivity rank correlation of 0.402) remained significant after adjusting for multiple testing. The correlation between ranked screening specificity and ranked diagnostic specificity (0.296) was significant at only the .05 level.

Conclusion: The interpretive performance of radiologists among their peers is moderately correlated between screening and diagnostic interpretations. Thus, proficiency in one area does not guarantee proficiency in the other area for some radiologists.



    INTRODUCTION
Radiologists who interpret mammograms are faced with two distinct tasks. On one hand, when radiologists interpret screening mammograms, their focus is on the decision to call the patient back for further work-up (ie, additional imaging examinations). On the other hand, after the patient has been called back and additional examinations have been performed, the radiologist must then interpret the diagnostic mammographic studies; this task is focused on the decision to recommend biopsy examination of findings detected at screening.

Although these tasks are distinct, it seems natural to expect that radiologists who are highly skilled in screening interpretation should also be highly skilled in diagnostic interpretation. On the other hand, we might equally hypothesize that the tasks are distinct enough that expertise in one task requires development of specific skills that might not be beneficial for the other task. For example, perceptual skill in target detection and localization logically seems to be a primary skill needed in screening interpretation; however, detection and localization skills may not be as important as analytic skills in diagnostic interpretation.

The concept of expertise disequilibrium in mammography (ie, disparate levels of interpretive performance for screening examinations vs diagnostic examinations for a given radiologist) may be useful. The detection of a trend toward greater expertise disequilibrium across time could serve as an important bellwether of substantial shifts in professional attitudes and societal pressures regarding mammography. For example, a trend toward greater expertise in diagnostic interpretation could signal a shift in expertise coming about because of the widespread use of computer-aided detection in screening and loss of skill in detection. Some shifts away from mammography by radiologists in training have been documented (1). It would be beneficial to have other indicators to independently corroborate such findings. Thus, the purpose of our study was to determine whether and to what extent skill in the interpretation of screening mammograms is correlated with skill in the interpretation of diagnostic mammograms.


    MATERIALS AND METHODS
In this study, we distinguish screening mammogram interpretation from diagnostic mammogram interpretation as follows: Screening mammogram interpretation involves interpretation of an index screening mammogram obtained in an asymptomatic patient and includes comparison with previously obtained screening mammograms (when available) before additional examinations are ordered. Diagnostic mammogram interpretation involves interpretation of an index screening mammogram and comparison with previously obtained mammograms, as well as interpretation of additional diagnostic studies that were obtained on the basis of the interpretation of the index screening mammogram. Diagnostic mammogram interpretation can be based on various indications—such as short-interval follow-up, evaluation of a palpable lump, or any other symptom that might suggest the presence of breast cancer; however, we chose to focus on diagnostic examinations that were performed on the basis of abnormal screening mammographic findings to best compare screening accuracy with diagnostic accuracy.

The institutional review board of the University of South Florida approved our study. Our study was exempt from informed consent requirements because of the retrospective use of images. Our study was conducted before the Health Insurance Portability and Accountability Act requirements were implemented.

Radiologist Characteristics
Radiologists who participated in this study were recruited from a study of screening mammogram interpretation among U.S. radiologists (2). In the screening mammogram study, 110 radiologists interpreted mammograms obtained in the same set of 148 screening cases. These radiologists were randomly sampled from a sampling frame of U.S. radiologists that was constructed for the previous study (2). Of the 412 radiologists contacted, 292 (71%) were willing to participate if selected; 110 radiologists were selected. There were no statistically significant differences in the characteristics (ie, years of experience, reading volume, training, practice type) of the radiologists who participated in the screening interpretation study and the characteristics of those who did not. More details can be found in the report of Beam et al (2). After a minimum 1-year hiatus, these 110 radiologists were invited to participate in this study of diagnostic interpretation.

Case Sets
In the screening study (2), 148 index mammography cases were selected randomly from a large screening mammography program that covered the period from 1993 to 1997. All mammograms selected for this study were reviewed for image quality (ie, positioning, contrast, imaging artifacts) by one of the authors (E.F.C.). No cases were rejected because of poor technical quality. Index screening examinations were performed in 64 randomly selected biopsy-proved cancer cases. In addition, 84 index screening examinations performed in randomly selected cases with a minimum of 2 years of cancer-free follow-up or biopsy-proved benign breast disease were included. The index examination was defined as the examination that prompted the first biopsy or as the next-to-last available mammographic examination in women with at least 2 years of follow-up without biopsy. Original film mammograms were used in the reading study. Comparison film screening mammograms were provided, if available, to mimic usual clinical practice. Each set of screening mammograms was obtained with low-dose screen-film mammography performed with dedicated mammography units and use of single-emulsion film. Each set of mammograms consisted of mediolateral oblique and craniocaudal views of each breast. Thus, the screening examinations of 84 women (mean age, 55.6 years; age range, 40–77 years) with a minimum of 2-year cancer-free follow-up or biopsy-proved benign lesions were classified as "noncancerous cases" in the screening case set. As will be described later, a subset of screening cancer cases was used in this study.

In the diagnostic interpretation study, we attempted to acquire all breast imaging work-up data that had been obtained after interpretation of index screening examination findings for each of the cancer cases used in the screening study. Additional work-up data (ie, mammograms and/or ultrasonographic images) were available in 58 cases. Work-up data of six cases of breast cancer were not available at the time of random selection, and these cases were excluded from the study of diagnostic interpretation and from analysis of screening interpretation reported in this article. The comparison film images provided for diagnostic interpretation were the same as those provided for screening interpretation. Thus, the screening and diagnostic case sets included findings obtained in 58 women with biopsy-proved breast cancer (mean age, 61.6 years; age range, 40–85 years).

To provide the readers with a case set reflective of a typical diagnostic case set, we excluded patients without cancer from the screening study, since only one patient was recalled for additional imaging. We then sampled 76 biopsy-proved benign cases from the previously described screening program. The overall mix of cancerous and benign cases was selected to represent the mix of diagnostic cases from a screening program with a high yield of malignancy at biopsy (a calculation of positive predictive value known as PPV3 in the American College of Radiology Breast Imaging Reporting and Data System [BI-RADS]). Assuming all cases that are called back are referred for biopsy at diagnostic interpretation, the case mix of 58 cancers among 134 "screening positive" cases (ie, callbacks) translates to 43% of biopsy findings being positive for malignancy. This is comparable to the 38% rate, which is based on audit data, reported by Dee and Sickles (3). Thus, diagnostic examinations of 76 women with biopsy-proved benign breast lesions (mean age, 58.2 years; age range, 40–92 years) were included in this study.
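The 43% figure follows directly from the case mix. A minimal sketch of that arithmetic, assuming (as the text does) that every called-back case is referred for biopsy; the function name `biopsy_yield` is a hypothetical label for illustration, not a term from the study:

```python
def biopsy_yield(n_cancer: int, n_benign: int) -> float:
    """Fraction of biopsied cases that are malignant.

    Assumes every case in the diagnostic set is referred for biopsy,
    which is the simplifying assumption stated in the text.
    """
    return n_cancer / (n_cancer + n_benign)


# Case mix reported above: 58 cancers and 76 biopsy-proved benign cases.
ppv3 = biopsy_yield(n_cancer=58, n_benign=76)
print(f"{ppv3:.1%}")  # 43.3%
```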

Reading Study
In both the screening interpretation study and the diagnostic interpretation study, all radiologists interpreted mammograms in a controlled reading environment that was dedicated solely to the study and permitted investigators to control ambient light. More details about the reading study can be found in the report of Beam et al (2). Both the screening reading and the diagnostic reading were performed at a time in which the image quality of the case set reflected current technology.

For the screening study, images were mounted in random sequence on dedicated mammography alternators (RADX, Houston, Tex). Only the age of the patient was provided to the reader in the screening interpretation study. Prior to reading, radiologists were instructed that the case set did not have the mix expected in a typical screening population. Reader orientation consisted of supervised hands-on experience with a set of three practice cases before commencement of the study. Readers were asked to (a) identify findings, (b) make recommendations for further work-up, and (c) report what they believed would be the result of additional work-up. Responses to the third item, which were recorded on the BI-RADS (4) scale, are analyzed in this article.

For the diagnostic interpretation study, each index screening case was reviewed in combination with additional diagnostic studies that were ordered by the original interpreting radiologist and the report of the radiologist who interpreted the index screening mammogram, including relevant clinical history, as recorded in the report by the screening radiologist. Radiologists were asked to identify findings and recommend a management strategy, making use of BI-RADS classification (4). The management recommendations are used in the analyses reported in this article.

It is important to identify differences between the scales used for analysis. The scale used in the analysis of screening interpretation incorporates BI-RADS category 0 by allowing the radiologist to identify the outcome he or she expects to occur from further work-up. This scale essentially is a refinement of the standard BI-RADS categories in that the BI-RADS categories are now subdivided into two ordered subcategories. For example, classification of BI-RADS category 3 (probably benign finding) at screening interpretation is subclassified as either BI-RADS category 3 (probably benign finding, no further work-up needed) or BI-RADS category 3 (probably benign finding, further work-up needed to confirm). Similar interpretations of subcategories can be made for the other BI-RADS categories. The interpretation of the scale analyzed for diagnosis has no option for BI-RADS category 0. Thus, the diagnostic scale can be viewed as the screening scale without the subcategories to recommend further work-up for confirmation. To make the measurement of accuracy more comparable between screening and diagnosis, we collapsed the screening scale into the usual five categories used in diagnostic interpretation by combining the subcategories.
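The collapsing step can be sketched as follows. The data layout is an assumption made only for illustration: each screening response is represented as the BI-RADS category the reader expects after work-up, plus a flag for whether further work-up was requested to confirm it. The names `ScreeningRating` and `collapse` are hypothetical; the article does not describe how responses were coded.

```python
from typing import NamedTuple


class ScreeningRating(NamedTuple):
    category: int       # expected BI-RADS category after work-up (1 to 5)
    needs_workup: bool  # True if further work-up was requested to confirm


def collapse(rating: ScreeningRating) -> int:
    """Combine the two subcategories of a BI-RADS category into one.

    Dropping the work-up flag leaves the usual five-category scale
    used for diagnostic interpretation.
    """
    if not 1 <= rating.category <= 5:
        raise ValueError(f"unexpected BI-RADS category: {rating.category}")
    return rating.category


# Illustrative responses, not study data.
ratings = [
    ScreeningRating(3, False),  # probably benign, no further work-up needed
    ScreeningRating(3, True),   # probably benign, work-up needed to confirm
    ScreeningRating(5, True),
]
collapsed = [collapse(r) for r in ratings]
print(collapsed)  # [3, 3, 5]
```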

Statistical Analysis
Qualitative characteristics of the radiologists were summarized with percentages and compared between participating radiologists and nonparticipating radiologists with either the χ² test (when cell sample sizes were greater than or equal to five) or the Fisher exact test (when cell sample sizes were less than five).
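The selection rule for the two tests can be written as a small function. This is a sketch of the rule as stated in the text (based on observed cell counts); the function name `choose_test` and the example tables are hypothetical and are not study data.

```python
def choose_test(table: list[list[int]]) -> str:
    """Name the test the text prescribes for a contingency table of counts.

    The chi-square test is used when every cell count is at least five;
    otherwise the Fisher exact test is used.
    """
    smallest = min(count for row in table for count in row)
    return "chi-square" if smallest >= 5 else "Fisher exact"


# Illustrative 2 x 2 tables (rows: participants, nonparticipants).
print(choose_test([[12, 30], [9, 40]]))  # chi-square
print(choose_test([[12, 30], [3, 40]]))  # Fisher exact
```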

Mean (± standard deviation) radiologist age and experience (measured by using recent mammogram reading volume, years of experience, and years since residency) were calculated. The means of these quantitative variables were compared between participants and nonparticipants with the two-sample t test. Satterthwaite adjustment for unequal variance did not alter the P values substantially; thus, all reported tests use pooled variance. Log-transformed volume was analyzed because of the skewed distribution observed in the sample.
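A minimal sketch of the pooled-variance two-sample t statistic applied to log-transformed volume. The reading volumes below are invented for illustration, and `pooled_t` is a hypothetical helper name; the study itself used SAS, and the P value would be obtained from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


# Hypothetical annual reading volumes, log-transformed to reduce skew.
participants = [math.log(v) for v in (1200, 2500, 4800)]
nonparticipants = [math.log(v) for v in (900, 2000, 5200)]
t_stat, df = pooled_t(participants, nonparticipants)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```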

Testing of both qualitative and quantitative characteristics used a 5% significance level. No adjustment for multiple testing of these variables was performed to achieve the greatest statistical power for detection of differences. All tests were two sided.

Slightly more than half (n = 59) of the 110 radiologists agreed to participate in the diagnostic portion of the study. Since the sample of radiologists in the diagnostic portion of this study does not represent an unbiased random sample, our data might lead to a biased estimate of interreader correlation in the population of readers. This implies that typical "multiple reader–multiple cases" methods are not appropriate for our analysis.

To address this problem, we decided to treat our case sets as though they were fixed for analysis (ie, they were treated like test instruments). This solution was introduced by Hanley (5), and it has been used by others, including Esserman et al (6). By fixing the case set, there is no case sampling variability, and each reader has a single, fixed set of interpretations. By virtue of random sampling from the population of radiologists, we can then assume that the reading data are statistically independent (although possibly biased). This assumption also follows logically from the random-effects model for the ratings data (7). With this model, interreader correlation of the data comes about via the random effect arising from case sampling. Fixing the case set eliminates this random effect, hence eliminating correlation among readers.

Although this solution resolves the interreader correlation issue, it does alter the interpretation of conventional measures of diagnostic accuracy (ie, sensitivity, specificity, and receiver operating characteristic [ROC] curve area). This result follows from the fact that sensitivity and specificity are expected values in the population of individuals with disease and those without disease, respectively. Although one can compute, for a particular reader evaluating a test set, the proportion of diseased cases that are true-positive, this proportion is not sensitivity in the usual sense. It is based on a fixed performance test set rather than on a random sample, so its statistical expectation is not the population-based expectation that a true random sample would provide.

To remember these facts and to facilitate appropriate interpretation of our findings, we introduce and define the following terms: performance test, performance test sensitivity, performance test specificity, performance test false-positive probability, and performance test ROC curve area.

A performance test is a case set that is used as a test and not as a sample from a larger population. Performance test screening sensitivity refers to the percentage of women in the screening test set with breast cancer who were given a recommendation by the reader to undergo further work-up consisting of short-term follow-up (BI-RADS category 3), additional imaging (BI-RADS category 0), or biopsy consideration (BI-RADS category 4 or 5). Performance test diagnostic sensitivity refers to the percentage of women with breast cancer in the diagnostic test set who were given a recommendation to undergo biopsy by the interpreting radiologist (BI-RADS category 4 or 5). Performance test screening specificity refers to the percentage of women without breast cancer in the test set who were not given a recommendation to undergo further work-up (ie, BI-RADS category 1 or 2). Performance test diagnostic specificity refers to the percentage of women in the test set without breast cancer who were not given a recommendation to undergo biopsy by the interpreting radiologist. Performance test ROC curve areas (ie, screening and diagnostic) were estimated nonparametrically in our study by using the BI-RADS interpretations for screening interpretation (8). Radiologists were instructed to report what they believed would be the result of further work-up by using the BI-RADS interpretation scale. This interpretation was used to define the performance-test screening ROC curve and its area. For diagnosis, the BI-RADS management recommendations were used directly.
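The definitions above can be sketched in code for one reader. Two assumptions are made here and are not taken from the article: ratings are stored as integer BI-RADS categories, and the nonparametric ROC curve area is computed as the Mann-Whitney (trapezoidal) estimate, which is one common nonparametric estimator and may differ in detail from the method of reference 8. All function names and example ratings are hypothetical.

```python
SCREEN_POSITIVE = {0, 3, 4, 5}  # further work-up recommended at screening
BIOPSY = {4, 5}                 # biopsy recommended at diagnosis


def proportion(flags) -> float:
    flags = list(flags)
    return sum(flags) / len(flags)


def screening_sensitivity(cancer: list[int]) -> float:
    return proportion(c in SCREEN_POSITIVE for c in cancer)


def screening_specificity(noncancer: list[int]) -> float:
    return proportion(c not in SCREEN_POSITIVE for c in noncancer)


def diagnostic_sensitivity(cancer: list[int]) -> float:
    return proportion(c in BIOPSY for c in cancer)


def diagnostic_specificity(noncancer: list[int]) -> float:
    return proportion(c not in BIOPSY for c in noncancer)


def roc_area(cancer: list[int], noncancer: list[int]) -> float:
    """Mann-Whitney estimate: the chance that a cancer case is rated
    higher than a noncancer case, counting ties as one half."""
    wins = sum((c > n) + 0.5 * (c == n) for c in cancer for n in noncancer)
    return wins / (len(cancer) * len(noncancer))


# Illustrative ratings for one reader, not study data.
cancer_ratings = [5, 4, 3, 2]
noncancer_ratings = [1, 2, 3, 1]
print(diagnostic_sensitivity(cancer_ratings))       # 0.5
print(diagnostic_specificity(noncancer_ratings))    # 1.0
print(roc_area(cancer_ratings, noncancer_ratings))  # 0.875
```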

For each performance-test measure, the radiologists were ranked from low to high. This rank indicates the position of each radiologist's performance measurement in relation to the measurements of all other radiologists; rank is ordered from lowest to highest. Ranks can have integer values between 1 and 59. A rank of 1 is assigned to the radiologist with the lowest performance measurement among the 59 radiologists. Thus, the radiologist with the rank of 59 had the highest performance measurement.
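A minimal sketch of the ranking step. The article does not say how tied measurements were handled; assigning tied readers the average of the ranks they span is an assumption made here because it is the usual convention for rank correlation. The function name and the example values are hypothetical.

```python
def rank_low_to_high(values: list[float]) -> list[float]:
    """Rank measurements from lowest (rank 1) to highest (rank n).

    Tied values share the mean of the ranks they occupy (an assumption;
    the source does not describe tie handling).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Illustrative ROC curve areas for four readers, not study data.
print(rank_low_to_high([0.80, 0.65, 0.90, 0.65]))  # [3.0, 1.5, 4.0, 1.5]
```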

We used rank as the measure of skill in our study because rankings act as their own reference. If we were to use the values of radiologist performance test sensitivity, it would be hard to gauge the skill of a radiologist on the basis of only the knowledge that he or she had 80% sensitivity. Some other information is required to gauge skill—for example, we would have to select some level of sensitivity to use as a benchmark of skill. On the other hand, the use of ranks allows us to immediately gauge the skill of a radiologist relative to his or her cohort.

Correlations between screening and diagnostic ranks were assessed with the Spearman rank correlation (9). Statistical significance of each rank correlation was determined by using a Bonferroni-corrected significance level of .017. This adjustment ensures that the overall type I error rate will not exceed 5% at joint testing of significance of the three correlations (ie, correlation in rank between screening and diagnostic performance test sensitivity, specificity, and ROC curve area). All statistical analyses were performed with SAS, version 9, software (SAS Institute, Cary, NC).
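The Spearman rank correlation is the Pearson correlation computed on ranks, and the Bonferroni threshold is the overall 5% level divided by the three tests. The sketch below illustrates both in plain Python; it is not the SAS procedure used in the study, the ranks shown are invented, and the function names are hypothetical. P values are taken as inputs rather than computed.

```python
import math


def spearman_from_ranks(rank_x: list[float], rank_y: list[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(rank_x, rank_y))
    var_x = sum((x - mean_x) ** 2 for x in rank_x)
    var_y = sum((y - mean_y) ** 2 for y in rank_y)
    return cov / math.sqrt(var_x * var_y)


ALPHA = 0.05
N_TESTS = 3  # sensitivity, specificity, and ROC curve area
bonferroni_alpha = ALPHA / N_TESTS  # about .017


def is_significant(p_value: float, threshold: float = bonferroni_alpha) -> bool:
    return p_value < threshold


# Illustrative screening and diagnostic ranks for five readers.
screening_ranks = [1, 2, 3, 4, 5]
diagnostic_ranks = [2, 1, 4, 3, 5]
rho = spearman_from_ranks(screening_ranks, diagnostic_ranks)
print(round(rho, 3))               # 0.8
print(round(bonferroni_alpha, 3))  # 0.017
```

A correlation with P = .03, for example, would pass the unadjusted .05 level but fail `is_significant`, which is the situation the Results describe for specificity.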


    RESULTS
Radiologist Characteristics
Characteristics of the radiologists from the screening study who agreed to participate in the diagnostic interpretation study are compared with those who did not agree to participate (Tables 1, 2). Fifty-nine (54%) of the 110 radiologists agreed to participate and were classified as participants. Among 15 characteristics related to demographics, training, practice, and facility, three characteristics were significantly different (P < .05) between participants and nonparticipants (Table 1). Participating radiologists were more likely to report an affiliation with an academic practice (10 [17%] participants vs one [2%] nonparticipant), more likely to report a full-time work schedule in radiology (55 [95%] participants vs 41 [78%] nonparticipants), and more likely to report that their facility served a mostly nonwhite population (39 [67%] participants vs 20 [38%] nonparticipants). There were no significant differences between the participating and nonparticipating radiologists in mean age, recent reading volume, years of experience reading mammograms, or years since residency (Table 2).


 
Table 1. Comparison of Radiologist Characteristics

 

 
Table 2. Comparison of Radiologist Age and Experience

 
Interpretive Performance
Correlations between screening and diagnostic values are shown in Figure 1. The rank correlation (Table 3) between screening and diagnostic levels was modest for both performance test sensitivity (0.402) and performance test specificity (0.296). A similarly modest correlation (0.327) was observed for performance test ROC curve areas. Rank correlations were statistically significant for performance test sensitivity and performance test ROC curve area at the Bonferroni-corrected threshold (P < .017). Rank correlation in performance test specificity was significant at the P < .05 level but not at the adjusted level.


Figure 1: Scatterplots show relationship between diagnostic and screening performance test measurements. Each dot represents a radiologist. The diagonal line represents equal diagnostic and screening measurements. (a) Performance test sensitivity. (b) Performance test specificity. (c) Performance test ROC curve area.

 

 
Table 3. Interpretive Performance Test Characteristics of the 59 Radiologists

 
Patterns of Correlation
Some radiologists (Fig 2) maintained their rank in skill, whether for screening interpretation or for diagnostic interpretation. For example, one radiologist ranked 55th in screening performance and 58th in diagnostic performance. On the other hand, some radiologists had large differences in their ranking. For example, one radiologist ranked 53rd in screening interpretation and lowest in diagnostic interpretation.


Figure 2: Scatterplot shows relationship between radiologist ranking in the diagnostic and screening ROC curve areas. Each dot represents a radiologist. The radiologist with the largest area has a rank of 59. The radiologist with the smallest area has a rank of 1. The diagonal line represents equal ranking in both diagnostic and screening ROC curve areas. Dashed lines demarcate the median rank (ie, the rank of 30).

 

    DISCUSSION
Radiologists in our sample who tended to rank highly among their peers in the interpretation of screening mammograms tended to also rank highly among their peers in diagnostic interpretation. However, this relationship was not strict and it did not hold true for every radiologist in our sample. Our results show that there are individuals who have wide disparities in their screening and diagnostic interpretive skills. We refer to this phenomenon as expertise disequilibrium. The performance case set that we have introduced in this article can be used to measure expertise disequilibrium on an individual basis. Documenting expertise disequilibrium in groups of individuals might provide valuable insights into the determinants of expertise in mammogram interpretation, which is an area that needs further research (2).

Although our study used a self-selected sample of radiologists, we were able to establish that our sample favored representation of full-time radiologists who are affiliated with academic practices serving largely nonwhite populations. There appeared to be no other selection biases from the original screening study, which was unbiased with respect to the U.S. population of radiologists (2).

Our results show a downward shift in interpretation skill (as measured with performance tests) from the screening to the diagnostic setting. An important aspect of our study is that the same cancer cases were examined by the same radiologists in reading experiments separated by at least 1 year. Many of the radiologists in our sample had lower performance test diagnostic sensitivity than performance test screening sensitivity. It is important to note that the other measures analyzed in our study do not support the hypothesis that these changes solely reflect changing interpretive thresholds. For example, many radiologists who had reduced diagnostic versus screening performance test sensitivity also had a reduction in performance test specificity; this is the exact opposite of what would happen with a simple change in interpretive threshold.

Inspection of changes in the performance test ROC curve area confirms that the accuracy shift is not solely due to threshold shift. We observed that all but one radiologist had a lower performance test diagnostic ROC curve area than performance test screening ROC curve area. Since the estimation of ROC curve area is not dependent on threshold selection by the radiologist, the persistent reductions in skill from screening to diagnostic interpretation exhibited by our sample of radiologists are unlikely to be solely caused by alterations in thresholds.

The change in performance test measures related to specificity and ROC curve area could reflect real differences between the noncancerous cases read in screening and the noncancerous cases read in diagnosis. Since we attempted to present case sets that reflect clinical populations, we might infer that the screening process is less difficult than diagnostic interpretation because of the differing mix of cases. We might reasonably suspect that discrimination between cancerous cases and noncancerous cases is less difficult when the case mix is extreme (cancerous cases vs normal cases). The influence of case mix or "spectrum" on diagnostic performance is well known in other areas (10,11). On the other hand, radiologists interpreting diagnostic mammograms have a heightened expectation of the presence of cancer, which might make them reluctant to recommend biopsy. In either case, the observation that skill (relative standing among one's cohort) is not strongly correlated suggests that different skills are needed in each activity and that mastery of one skill does not necessarily imply mastery of the other skill.

To the extent that our self-selected sample reflects the larger population of radiologists, our findings can be interpreted to indicate that diagnostic interpretation might be more difficult than screening interpretation for many radiologists currently practicing in the United States. This conclusion implies that continuing medical education training must target diagnostic interpretation. It also implies that computer assistance is more needed in the diagnostic domain than in the screening domain.

Several limitations of our study need to be kept in mind when interpreting our findings. The foremost limitation is our use of a laboratory setting in the measurement of performance. This limitation is realized in the use of case sets that do not present typical proportions of diseased and nondiseased cases. However, our use of ROC curve areas addresses this limitation, since ROC curves are based on conditional probabilities and we randomly sampled cancerous and noncancerous cases. Hence, the ROC curve area estimates should be unbiased estimators. On the other hand, it is possible that the concentration or vigilance of the radiologists in our study does not reflect that in practice; this is another limitation of laboratory measurement. Hence, it might be the case that the ROC curve area of the radiologist in the laboratory exceeds the ROC curve area in practice. To aid interpretation of our findings in light of the latter limitation, we note that laboratory measurement of skill was performed under ideal conditions. Noise and other interruptions were minimized, and radiologists did not have to be concerned with lawsuits resulting from missed cancers. We suspect that this study limitation should result in estimates of skill that reflect the best performance possible.

The fact that some radiologists develop skill in one domain of mammogram interpretation and not in another will not be a surprise to some. Now that our study has documented this phenomenon, however, new questions emerge. On the basis of our findings, it should now be considered important to investigate whether radiologists are aware of their own expertise disequilibrium and whether disequilibrium occurs because of conscious professional choice or because of the influence of unknown factors—including those that are internal and external to the profession and individual. We hope our study has set the stage for additional research in this area.


    FOOTNOTES
 

Abbreviations: BI-RADS = Breast Imaging Reporting and Data System • ROC = receiver operating characteristic

Authors stated no financial relationship to disclose.

Author contributions: Guarantor of integrity of entire study, C.A.B.; study concepts/study design or data acquisition or data analysis/interpretation, C.A.B., E.F.C., E.A.S.; manuscript drafting or manuscript revision for important intellectual content, C.A.B., E.F.C., E.A.S.; approval of final version of submitted manuscript, C.A.B., E.F.C., E.A.S.; literature research, C.A.B.; clinical studies, E.F.C.; experimental studies, C.A.B., E.F.C.; statistical analysis, C.A.B.; and manuscript editing, C.A.B., E.F.C., E.A.S.


    References

  1. Bassett LW, Monsees BS, Smith RA, et al. Survey of radiology residents: breast imaging training and attitudes. Radiology 2003;227:862–869.
  2. Beam CA, Conant EF, Sickles EA. The association of volume and volume-independent factors with accuracy in screening mammogram interpretation. J Natl Cancer Inst 2003;95:282–290.
  3. Dee KE, Sickles EA. Medical audit of diagnostic mammography examinations: comparison with screening outcomes obtained concurrently. AJR Am J Roentgenol 2001;176:729–733.
  4. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS). 4th ed. Reston, Va: American College of Radiology, 2003.
  5. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989;29:307–335.
  6. Esserman L, Cowley H, Eberle C, et al. Improving the accuracy of mammography: volume and outcome relationships. J Natl Cancer Inst 2002;94:369–375.
  7. Beam CA. Random-effects models in the receiver operating characteristic curve-based assessment of the effectiveness of diagnostic imaging technology: concepts, approaches and issues. Acad Radiol 1995;2(suppl 1):S4–S13.
  8. Hanley JA, McNeil BJ. The meaning and use of the area under an ROC curve. Radiology 1982;143:29–35.
  9. Altman DG. Practical statistics for medical research. Boca Raton, Fla: Chapman & Hall/CRC, 1991; 286.
  10. Begg CB, McNeil BJ. Assessment of radiologic tests: control of bias and other design considerations. Radiology 1988;167:565–569.
  11. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–930.



This article has been cited by other articles:


H. Singh, S. Sethi, M. Raber, and L. A. Petersen
Errors in Cancer Diagnosis: Current Understanding and Future Directions
J. Clin. Oncol., November 1, 2007; 25(31): 5009 - 5018.

J. H. Cha, W. K. Moon, N. Cho, S. M. Kim, S. H. Park, B.-K. Han, Y. H. Choe, J. M. Park, and J.-G. Im
Characterization of Benign and Malignant Solid Breast Masses: Comparison of Conventional US and Tissue Harmonic Imaging
Radiology, December 1, 2006; 242(1): 63 - 69.

