Breast Imaging
1 From the Departments of Radiology (E.A.S., J.W.T.L., R.S.B.) and Epidemiology/Biostatistics (R.S.B.), University of California San Francisco School of Medicine, 1600 Divisadero St, Rm H-2801, San Francisco, CA 94115; Center for Health Studies, Group Health Cooperative, Seattle, Wash (D.L.M.); Applied Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Md (R.B.B.); Office of Health Promotion Research, University of Vermont, Burlington, Vt (B.M.G.); Department of Radiology, University of New Mexico-HSC, Albuquerque, NM (R.D.R.); and Department of Radiology, University of North Carolina, Chapel Hill, NC (B.C.Y.). Received April 23, 2004; revision requested July 2; revision received July 20; accepted August 18. Supported by grants U01CA63740 (E.A.S.), U01CA86076 (D.L.M.), U01CA70013 (B.M.G.), U0169976 (R.D.R.), U01CA70040 (B.C.Y.), U01CA63731 (Group Health Cooperative, Diana Buist principal investigator), and U01CA8608201 (New Hampshire Mammography Network, Patricia Carney principal investigator) from the National Cancer Institute; K07CA86032 (R.S.B.) from the NIH. Address correspondence to E.A.S. (e-mail: edward.sickles@ucsfmedctr.org).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Institutional review board approval was obtained, informed consent was not required, and this study was Health Insurance Portability and Accountability Act compliant. Six mammography registries contributed data to the Breast Cancer Surveillance Consortium (BCSC), providing patient demographic and clinical information, mammogram interpretation data, and biopsy results from defined population-based catchment areas. The study involved 151 mammography facilities and 646 interpreting radiologists. The study population included women 18 years of age or older who underwent at least one diagnostic mammography examination between 1996 and 2001. Collected data were used to derive mean performance parameter values, including abnormal interpretation rate, positive predictive value (for abnormal interpretation, biopsy recommended, and biopsy performed), cancer diagnosis rate, invasive cancer size, and the percentages of minimal cancers, axillary node-negative invasive cancers, and stage 0 and I cancers. Additional benchmarks were derived for these performance parameters, including 10th, 25th, 50th (median), 75th, and 90th percentile values.
RESULTS: The study involved 332 926 diagnostic mammography examinations. Mean performance parameter values were abnormal interpretation rate, 8.0%; positive predictive value for abnormal interpretation, 31.4%; positive predictive value for biopsy recommended, 31.5%; positive predictive value for biopsy performed, 39.5%; cancer diagnosis rate, 25.3 per 1000 examinations; invasive cancer size, 20.2 mm; percentage of minimal cancers, 42.0%; percentage of axillary node-negative invasive cancers, 73.6%; and percentage of stage 0 and I cancers, 62.4%.
CONCLUSION: The presented BCSC outcomes data and performance benchmarks may be used by mammography facilities and individual radiologists to evaluate their own performance for diagnostic mammography as determined by means of periodic comprehensive audits.
© RSNA, 2005
| INTRODUCTION |
|---|
Recent reports indicate significantly different clinical outcomes for diagnostic compared with screening mammography, the diagnostic examinations being defined as those performed for indications other than the periodic screening of asymptomatic women (9,10). However, these reports involve only a moderate number (approximately 10 000) of examinations, performed at a single institution, which may limit generalization of the observed findings. There also is evidence of considerable variability in performance parameters among interpreting radiologists; this is probably related to a complex interaction of experience and expertise (7,11–17). For diagnostic mammography, the published reports on performance variability are based on data from only 10 interpreting radiologists (9) and are described by investigators as likely being at the ends of the spectrum of performance rather than representing average performance (18,19). Clearly, there is need for more robust data on the clinical outcomes of diagnostic mammography examinations.
The Breast Cancer Surveillance Consortium (BCSC) is a group of mammography registries from geographically diverse areas in the United States, funded by the National Cancer Institute, that collects patient demographic and clinical information, mammogram interpretation data, and biopsy results in the defined catchment areas of its participating facilities (20). The primary purpose of the BCSC is to collect data from diverse population-based settings to examine the practice and performance of mammography throughout the United States. Six BCSC registries collect data on the full range of clinical outcomes pertinent to the comprehensive auditing of mammography performance parameters. Pooling of the data from these registries provides by far the largest reported experience involving diagnostic mammography practice, from which reasonable and realistic performance benchmarks may be derived. Thus, the purpose of our study was to evaluate a range of performance parameters pertinent to the comprehensive auditing of diagnostic mammography examinations, and to derive performance benchmarks therefrom, by pooling data collected from large numbers of patients and radiologists that are likely to be representative of mammography practice in the United States.
| MATERIALS AND METHODS |
|---|
Each registry and the BCSC Statistical Coordinating Center, or SCC, have developed data management and quality control procedures that result in high-quality data collection that is comparable across registries. Prior to sending data to the SCC, data quality checks are conducted at each registry by using their own procedures, such as manual validation of a random sample of records, double data entry, monitoring of facility volume over time, and comparing different sources (eg, cancer registry and pathology databases) for consistency. After each annual data submission from the individual registries, the SCC performs additional quality checks of the pooled data by flagging coding errors and by comparing information across registries and over time for consistency and outlying values. The SCC also conducts biennial site visits to each registry and annual meetings involving data managers from the registries to review data management and quality control procedures, as well as to check data quality.
Across the six BCSC registries, 151 mammography facilities contributed to the pooled data. This represents 1.5% of the approximately 10 000 Food and Drug Administration-certified mammography facilities in the United States in 2000. The pool of data contains diagnostic mammogram interpretations made by 646 radiologists. We have been unable to find reliable estimates of how many radiologists met Food and Drug Administration requirements to read mammograms in 2000.
Two authors (E.A.S. and D.L.M., by consensus) compared the demographic makeup (rural-urban mix, race, ethnicity, education level, and socioeconomic status) of the population living in the catchment areas of the six BCSC registries included in our study with that of the entire U.S. population by using 2000 census data. To describe the BCSC population, we included census data from all counties in which there was a participating mammography facility.
Subjects
The study population included women 18 years of age or older who had undergone at least one diagnostic mammography examination during the years 1996–2001. Mammography examinations performed after December 2001 were excluded to ensure that there was a period of at least 12 months following examination during which cancer could be diagnosed and a period of an additional 24 months for reporting cancer data to tumor registries. Cancer reporting was at least 95% complete.
Diagnostic examinations are designed to solve specific problems and almost always include as many mammograms as are necessary to make a Breast Imaging Reporting and Data System (BI-RADS) final assessment, as well as case management recommendations. However, under certain circumstances a diagnostic examination is occasionally assessed as "incomplete: needs additional imaging evaluation" (BI-RADS assessment category 0). In this study, 15 971 (4.6%) of 348 897 examinations were given a category 0 assessment. For this study, when one or more diagnostic examinations followed an initial diagnostic examination that was assessed as category 0, all examinations up to and including the first examination with a non-zero assessment (within 180 days) were treated as a single observation. The date of and indication for examination were considered to be those from the initial examination (the first one with a category 0 assessment). However, we used the assessment and management recommendations from the first non-zero assessment and attributed the observed clinical outcomes to the radiologist who made that first non-zero assessment. If there was no non-zero assessment within 180 days, all of the examinations were excluded (10 662 of 348 897 examinations, 3.1%).
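The category 0 rule described above can be illustrated with a minimal Python sketch. It is not the BCSC implementation: the function name `collapse_category0`, the record fields, and the assumption that each woman's diagnostic examinations arrive as a date-sorted list are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=180)  # resolution window stated in the text


def collapse_category0(exams):
    """Collapse category 0 episodes for one woman (hypothetical helper).

    exams: date-sorted list of dicts with keys 'date', 'indication',
    'assessment' (BI-RADS category, 0-5) and 'radiologist'.
    Returns (observations, number_of_excluded_examinations).
    """
    observations, excluded, i = [], 0, 0
    while i < len(exams):
        first = exams[i]
        if first["assessment"] != 0:
            observations.append(dict(first))  # ordinary examination
            i += 1
            continue
        # Skip further category 0 examinations inside the 180-day window.
        j = i + 1
        while (j < len(exams) and exams[j]["assessment"] == 0
               and exams[j]["date"] - first["date"] <= WINDOW):
            j += 1
        resolved = (j < len(exams) and exams[j]["assessment"] != 0
                    and exams[j]["date"] - first["date"] <= WINDOW)
        if resolved:
            final = exams[j]
            observations.append({
                "date": first["date"],              # from the initial exam
                "indication": first["indication"],  # from the initial exam
                "assessment": final["assessment"],  # first non-zero assessment
                "radiologist": final["radiologist"],
            })
            i = j + 1
        else:
            excluded += j - i  # unresolved category 0 exams are dropped
            i = j
    return observations, excluded
```

In this sketch an examination that falls outside the 180-day window is treated as the start of a new episode; the source does not state how such cases were handled, so that behavior is an assumption.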
Data Collection Procedures
Across all BCSC registries, mammography patients complete a questionnaire that requests medical history and demographic data (including date of most recent mammography examination, family history of breast cancer, previous percutaneous or surgical biopsies, personal history of breast cancer, and description of breast symptoms experienced within the past 3 months). Women were considered to have a family history of breast cancer if they reported having at least one female first-degree relative (mother, sister, or daughter) with breast cancer. Women were considered to have a personal history of breast cancer if they self-reported previous breast cancer or if there was evidence of previous breast cancer in the cancer registry or pathology database. Each woman was considered to have undergone a previous mammography examination if she self-reported a history of prior mammography or there were data from a prior mammography examination in the BCSC database.
Diagnostic mammography is performed for a variety of problem-solving indications, including work-up of abnormalities detected at screening mammography, evaluation of abnormalities found at clinical examination, and short-interval follow-up examinations both for probably benign lesions and for cancer patients recently treated by means of breast preservation surgery. Other special breast problems, such as the presence of implants or the evaluation of extent of disease for a known malignancy, may also represent indications for diagnostic mammography. Across all BCSC registries, the interpreting radiologist prospectively classifies each diagnostic mammography examination into one of three categories: additional work-up of an abnormality detected at screening examination, short-interval follow-up, or evaluation of a breast problem. We further subdivided the "evaluation of a breast problem" category according to whether the patient indicated the presence of a palpable lump on the medical history questionnaire that she completed at the time of her mammography examination, because results in previous published reports have shown substantially different clinical outcomes based on this approach (9,10). If the self-reported response concerning palpable lump was missing for a given examination, we used the first nonmissing response, if any, within the previous 90 days.
The mammography registry also collects data on image interpretation, including management recommendations and the BI-RADS assessment categories assigned by the interpreting radiologist for each mammography examination (22,23). A separate assessment is recorded for each breast. For purposes of this study, we have reported an overall assessment for the entire examination, if appropriate, by using the more abnormal BI-RADS assessment category according to the following hierarchy: negative (category 1), benign (category 2), probably benign (category 3), suspicious (category 4), or highly suggestive of malignancy (category 5). Published results of a previous investigation, as well as our own data, show only very small, nonsignificant differences between woman-specific and breast-specific outcomes data (24), indicating that woman-specific data are sufficiently accurate measures of interpretive performance.
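As a small illustration of the woman-level assessment described above, the following Python sketch takes the more abnormal of the two breast-specific categories. The function name and the use of `None` for a breast that was not imaged are assumptions for illustration; the sketch also assumes that category 0 assessments have already been resolved.

```python
def overall_assessment(left, right):
    """Return the more abnormal of two breast-specific BI-RADS categories.

    Categories 1-5 follow the hierarchy in the text (1 negative ... 5 highly
    suggestive of malignancy), so the larger number is the more abnormal.
    A breast that was not examined is passed as None (an assumption).
    """
    categories = [c for c in (left, right) if c is not None]
    if not categories:
        raise ValueError("at least one breast assessment is required")
    if any(c not in (1, 2, 3, 4, 5) for c in categories):
        raise ValueError("expected final assessment categories 1-5")
    return max(categories)
```

For example, a benign (category 2) left breast and a suspicious (category 4) right breast give an overall assessment of category 4.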
All BCSC registries record data on whether or not breast ultrasonography (US) is performed concurrently with diagnostic mammography. However, these data do not include a separate BI-RADS assessment category for US examinations.
In a report on diagnostic mammography from the BCSC, Geller et al (25) showed that in 10%–15% of examinations with positive (abnormal) findings, there is discordance between the BI-RADS assessment and subsequent management recommendations provided by the interpreting radiologist. An example of such discordance is a finding assessed as suspicious, accompanied by the recommendation for anything other than biopsy or surgical consultation. In this study, we have chosen to analyze mammography interpretation data by using both BI-RADS assessments and management recommendations to parallel the BI-RADS auditing approaches that will be discussed in the paragraph concerning positive predictive value (PPV) calculations.
Mammography patients were considered to have breast cancer if a state tumor registry, Surveillance Epidemiology and End Results program registry, or pathology database indicated the diagnosis of invasive carcinoma or ductal carcinoma in situ (DCIS) within 12 months after a diagnostic mammography examination. Additional data collected for breast cancer cases included tumor size (for invasive cancers), axillary lymph node status (for invasive cancers), and American Joint Committee on Cancer stage (26).
Outcome Measures
A positive (abnormal) assessment at diagnostic mammography was defined as an overall assessment of suspicious for or highly suggestive of malignancy. Cancer diagnosis rate was defined as the number of cancer cases identified at mammography (mammographically true-positive) divided by the total number of diagnostic mammography examinations. A true-positive case is one that is followed by the diagnosis of invasive breast cancer or DCIS within 12 months of a positive assessment at diagnostic mammography. Conversely, a case was considered to be false-positive if results at diagnostic mammography were interpreted as positive and no breast cancer was diagnosed within the next 12 months.
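The outcome definitions above can be sketched in Python as follows. The function names and the string labels are hypothetical; negative assessments are deliberately not split into true-negative and false-negative cases.

```python
def classify(assessment, cancer_within_12_months):
    """Label one diagnostic examination using the definitions in the text.

    assessment: overall BI-RADS category (1-5).
    cancer_within_12_months: True if invasive cancer or DCIS was diagnosed
    within 12 months of the examination.
    """
    positive = assessment in (4, 5)  # suspicious or highly suggestive
    if positive and cancer_within_12_months:
        return "true-positive"
    if positive:
        return "false-positive"
    return "negative"  # not subdivided further in this sketch


def cancer_diagnosis_rate_per_1000(labels):
    """True-positive cases per 1000 diagnostic examinations."""
    if not labels:
        raise ValueError("no examinations supplied")
    return 1000 * labels.count("true-positive") / len(labels)
```

As a check under the assumption that the 8411 cancers reported in this article are the true-positive cases, 1000 × 8411 / 332 926 is approximately 25.3 per 1000 examinations, which agrees with the reported cancer diagnosis rate.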
In this article, we do not report on measures of sensitivity or specificity because such measures require the enumeration of false-negative and true-negative cases, respectively, involving tumor registry linkage data that are not generally available to mammography facilities or individual practicing radiologists. These measures, as well as other data beyond the scope of this article, are available to interested readers on the BCSC Web site (breastscreening.cancer.gov/benchmarks/diagnostic).
Statistical Analysis
Calculations of PPV were made by dividing the number of true-positive cases by the sum of true-positive and false-positive cases. Three separate PPV calculations were performed by using BI-RADS methods: PPV1, probability of cancer after positive mammography interpretation; PPV2, probability of cancer after recommendation for biopsy or surgical consultation, following positive mammography interpretation; and PPV3, probability of cancer after biopsy, following positive mammography interpretation and a recommendation for biopsy or surgical consultation. "Biopsy" included the performance of any type of biopsy (fine-needle aspiration, core, or surgical biopsy), whether or not imaging guidance was used to perform the biopsy.
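A minimal sketch of the three PPV calculations follows. The record fields (`positive`, `biopsy_recommended`, `biopsy_performed`, `cancer`) are hypothetical names chosen for illustration, and treating "biopsy recommended" as covering both biopsy and surgical consultation is an assumption of the sketch.

```python
def ppv(cases):
    """True-positive cases divided by true-positive plus false-positive cases."""
    if not cases:
        return None  # undefined when there are no positive cases
    return sum(1 for c in cases if c["cancer"]) / len(cases)


def birads_ppvs(exams):
    """Return (PPV1, PPV2, PPV3) for a list of examination records."""
    positive = [e for e in exams if e["positive"]]               # PPV1 denominator
    recommended = [e for e in positive if e["biopsy_recommended"]]  # PPV2 denominator
    performed = [e for e in recommended if e["biopsy_performed"]]   # PPV3 denominator
    return ppv(positive), ppv(recommended), ppv(performed)
```

Each denominator is a subset of the one before it, which is why PPV3 can exceed PPV2 when some recommended biopsies are never performed.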
Because the principal aim of this study was to provide outcomes data to be used for the derivation of clinically relevant performance benchmarks, we have chosen to provide only descriptive statistics such as those enumerated previously. Because benchmarks are more meaningful if they indicate ranges of performance as well as arithmetic means, we also have calculated percentile values for selected outcomes. For example, the combination of 25th and 75th percentile values defines the range within which the middle 50% of performance is found, and the combination of 10th and 90th percentile values defines the range within which the middle 80% of performance is found. To reduce the number of radiologists with zero observed "events" (eg, no abnormal interpretations, no cancers diagnosed) in our percentile data, we report outcomes from only those radiologists who contributed at least a designated, subjectively determined minimum number of cases for each outcome, because radiologists with zero events do not contribute useful or informative data. We have used graphical presentations (frequency distributions overlaid with percentile values) to display these data in an easily understandable format. More complex analytic methods, designed to elucidate statistically significant interactions among the data variables collected, are beyond the scope of our study.
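The percentile benchmarks can be illustrated with a short Python sketch. The source does not state which percentile estimator was used or what the minimum case counts were, so the linear-interpolation rule and the `min_cases` parameter below are assumptions, and the function names are hypothetical.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics (assumed rule)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values supplied")
    k = (len(xs) - 1) * p / 100
    lo = int(k)                      # k is non-negative, so int() floors
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def rate_benchmarks(radiologists, min_cases, points=(10, 25, 50, 75, 90)):
    """Percentiles of a per-radiologist rate per 1000 examinations.

    radiologists: iterable of (n_examinations, n_events) pairs.
    min_cases: minimum number of examinations required for inclusion.
    """
    rates = [1000 * events / exams
             for exams, events in radiologists if exams >= min_cases]
    return {p: percentile(rates, p) for p in points}
```

Excluding low-volume readers before computing percentiles mirrors the stated rationale: radiologists with very few cases are likely to show zero events and would distort the lower end of the distribution.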
| RESULTS |
|---|
Demographic Factors
The demographic makeup of the population living in the catchment areas of the six BCSC sites in our study is compared with that for the entire U.S. population in Table 1. There were only slight differences, none greater than five percentage points, between our study population and the U.S. population. Our study population was slightly more rural, contained slightly fewer African American and Hispanic women, was slightly more educated, and had a slightly higher median family income than the entire U.S. population.
Some breast lesions are found to be palpable only in retrospect, after diagnostic mammography is performed (ie, once the presence of a lesion is verified and its three-dimensional location is precisely determined). During the study period (1996–2001), the performance of imaging guidance for tissue diagnosis was limited primarily to those lesions that were nonpalpable even in retrospect. Data on the use of imaging guidance for tissue diagnosis were unavailable for 4773 (56.7%) examinations that led to a breast cancer diagnosis. This very high percentage of missing data precludes reliable determination of the frequency with which breast cancer may be palpable in retrospect, after having been identified at diagnostic mammography.
In our overall study population, among the diagnostic mammography examinations with findings interpreted as abnormal, there were 1473 (17.5%) cases of DCIS and 6938 (82.5%) cases of invasive carcinoma. The highest percentages of DCIS were found for abnormal screening work-up and short-interval follow-up cases (26.9% and 30.7%, respectively); the lowest percentage of DCIS (5.5%) was found for cases of breast problem with a palpable lump reported (Table 4).
Data on tumor size were available for 5998 (86.5%) of the invasive cancers in this study. The mean and median sizes for these cancers were 20.2 mm and 15 mm, respectively. When stratified by indication for examination, as shown in Table 4, invasive cancer size was smallest (therefore prognosis was most favorable) for abnormal screening work-up cases (mean, 14.3 mm; median, 11 mm) and short-interval follow-up cases (mean, 14.4 mm; median, 11 mm). Invasive cancer size was largest for palpable lump evaluation cases (mean, 25.6 mm; median, 21 mm).
Another widely used outcome measure indicating favorable prognosis is the frequency of minimal cancer, which is defined as either DCIS or invasive carcinoma 10 mm or smaller. For the entire study population, there were 3140 minimal cancers, representing 42.0% of cancers when only DCIS and invasive cancers of known size are considered. The highest percentages of minimal cancer were found for abnormal screening work-up and short-interval follow-up cases (62.0% and 64.7%, respectively). The lowest percentage of minimal cancer (17.5%) was found for palpable lump cases (Table 4).
Conversely, a measure of poor prognosis is the frequency of invasive carcinoma larger than 20 mm in size. For the entire study population, there were 2040 such cases, representing 34.0% of invasive cancers of known size. The lowest percentages of these large cancers were found for cases of abnormal screening work-up and short-interval follow-up (14.8% and 15.9%, respectively), whereas the highest percentage (50.8%) was found for cases of palpable lump (Table 4).
Data on axillary lymph node status were available for 6324 (91.2%) of the invasive cancers. For the entire study population, the percentage of these cancers that were node-negative (favorable prognosis) was 73.6%. The highest percentages were found for abnormal screening work-up and short-interval follow-up cases (84.2% and 86.7%, respectively), whereas the lowest percentage (65.6%) was found for palpable lump cases (Table 4).
Data on cancer stage were available for 7381 (87.8%) of the cancers. For the entire study population, the percentage of these cancers that were stage 0 or stage I (favorable prognosis) was 62.4%. The highest percentages were found for abnormal screening work-up and short-interval follow-up cases (81.7% and 82.0%, respectively), whereas the lowest percentage (40.0%) was found for palpable lump cases (Table 4).
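Because the percentages in this section use different denominators (all cancers, invasive cancers, or cancers with known size), a short Python check may help readers reproduce them. The counts are taken from the text; the pairing of each count with its denominator is our reading of the text rather than a stated formula, and the helper name `pct` is hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


dcis = 1473
invasive = 6938
all_cancers = dcis + invasive   # 8411
known_size = 5998               # invasive cancers with known size

checks = {
    "DCIS among all cancers": pct(dcis, all_cancers),
    "invasive among all cancers": pct(invasive, all_cancers),
    "invasive with known size": pct(known_size, invasive),
    "minimal cancers": pct(3140, dcis + known_size),
    "invasive larger than 20 mm": pct(2040, known_size),
    "invasive with known nodal status": pct(6324, invasive),
    "cancers with known stage": pct(7381, all_cancers),
}

for label, value in checks.items():
    print(f"{label}: {value}%")
```

Under these pairings the computed values agree with the percentages reported in the text.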
Performance Benchmarks
The data presented in Tables 3 and 4 represent arithmetic mean values of clinical outcomes for all diagnostic mammography examinations in our study. However, because it is unlikely that outcomes for a given radiologist will closely approximate these average values, we also present ranges of performance, displayed in graphical format as smoothed plots of frequency distributions overlaid with vertical lines indicating the 10th, 25th, 50th (median), 75th, and 90th percentile values for those participating radiologists who contributed sufficient numbers of cases to provide useful data. The breadth of these ranges, shown in Figures 1–9, indicates the wide variability in individual performance among radiologists. For example, in Figure 5, E (cancer diagnosis rate, all diagnostic examinations), only 10% of eligible radiologists had a cancer diagnosis rate lower than or equal to 10.3 cancers per 1000 examinations, whereas 90% of radiologists had a rate lower than or equal to 38.0 cancers per 1000 examinations.
| DISCUSSION |
|---|
In general, the outcomes we observe for diagnostic mammography are considerably different from published performance benchmarks for screening mammography (5–8), as were reported recently from the University of California at San Francisco, or UCSF (9,10). The cancer diagnosis rate is substantially greater at diagnostic mammography, and the cancers identified at diagnostic mammography are larger, more frequently node-positive, and are found at a more advanced stage than are those detected at screening mammography. These similarities between BCSC and UCSF data are due partially to inclusion of some UCSF cases in the BCSC data set. However, cancers reported from the UCSF represent only 606 (7.2%) of the 8411 BCSC cancers. Furthermore, these same general observations are valid for both the UCSF and non-UCSF cases in our study.
The overall BCSC data also confirm the previously reported UCSF observation that diagnostic mammography outcomes vary substantially by indication for examination. All three PPVs are lower for examinations performed as work-up of screening-detected abnormalities and short-interval follow-up than for those performed to evaluate a breast problem and especially those performed to evaluate palpable lumps. Similar observations apply concerning the prognostic factors of cancers identified at diagnostic mammography. Cancers identified among examinations performed as work-up of screening-detected abnormalities and short-interval follow-up are smaller, are more frequently node-negative, and are earlier in stage than are those identified among examinations performed to evaluate a breast problem and especially among those examinations performed to evaluate palpable lumps. These observations have been reported previously (9,31) and are to be expected because the populations of patients undergoing diagnostic mammography for work-up of abnormal results detected at screening examinations and for short-interval follow-up involve asymptomatic women similar to the general population of healthy women undergoing routine screening mammography (women among whom advanced cancer outcomes are less likely). The subset of patients undergoing diagnostic mammography for work-up of screening-detected abnormalities differs from the general screening population only in that mammographic abnormalities are present in all cases, thereby accounting for increased abnormal interpretation (BI-RADS category 4 and 5) and cancer diagnosis rates. Our results also reinforce previously published observations that cancer is identified very infrequently (in less than 1% of cases) among patients undergoing diagnostic mammography for short-interval follow-up (32–34).
Traditionally, performance benchmarks are derived by panels of expert practitioners from critical analysis of scientific data published in the peer-reviewed literature. This approach has been used in the development of screening mammography benchmarks. The screening benchmarks currently most widely used in the United States are stated to represent "desirable goals" achieved by "highly skilled experts" in mammography (6).
The authors of this article collectively have the appropriate expertise in breast imaging practice, epidemiology, and biostatistics to evaluate the existing scientific data on clinical outcomes for diagnostic mammography, but we find a paucity of previously published scientific data on the subject. The BCSC data reported here involve by far the most extensive published experience with diagnostic mammography and are likely to be representative of results in general practice throughout the United States rather than results achieved by highly skilled specialists. We have chosen to use only these BCSC data in deriving performance benchmarks.
To achieve the goal of presenting representative and reliable performance benchmarks in a format that is easy to understand by practicing radiologists, we have chosen to present our data not only as arithmetic means but also in the form of frequency distribution graphs overlaid with selected calculated percentiles. Note that we have chosen to depart from the previous practice of reporting performance benchmarks as "desirable goals" based on outcomes achieved by "highly skilled experts." It is unclear at what level specialists really perform in the context of BCSC data, although the little scientific evidence already published on the subject suggests that their performance would be at the high end of the numeric scale for all performance parameters except for mean invasive cancer size, for which this would be at the low end of the numeric scale (16,18,19). Rather, the data we report are meant to indicate the range of current clinical outcomes in general practice, and percentile calculations serve as indicators of average and not-so-average performance. These data should not be used to define either standards of care or proscriptive regulatory thresholds for the clinical practice of diagnostic mammography; these issues are beyond the scope of this article. Instead, these data should be used by practicing radiologists to place into perspective the clinical outcomes observed from their own facility-wide and individual audits, for the purpose of continuing quality improvement.
How to Use Benchmark Data
How then should a mammography facility or individual radiologist use the benchmark data presented in this article? First, it will be important to collect data on most if not all of the outcomes reported in this article. One will gain very little insight into either mammography facility or individual radiologist performance if auditing is limited to the cancer-versus-no-cancer tracking of biopsy-recommended cases that is mandated in the United States by Food and Drug Administration regulation (35,36). This approach provides only PPV2 data, which are essentially meaningless unless analyzed in combination with data on cancer detection rate, size, nodal status, and stage. Furthermore, data (mammography outcomes) collection procedures should be either fairly complete or realistically judged to be representative in order to reduce the extent to which case selection bias confounds observed results.
Next, it will be necessary to perform a mammography audit that segregates diagnostic from screening examinations to analyze diagnostic outcomes separately. The methods used in this article parallel the BI-RADS auditing approaches developed by the American College of Radiology (22,23), so these should be followed as closely as possible. If feasible, audit data should be analyzed collectively and also separately by indication for diagnostic examination. Next, selected demographic factors of the diagnostic mammography patient population (age, family history of breast cancer, personal history of breast cancer, mammography performed previously) should be compared with those factors reported in Table 2 to determine whether and to what degree patient-related differences might confound the comparison of one's data with those of the BCSC. For example, if one interprets mammograms from a patient population at very high or very low risk for breast cancer, the interpretations, management recommendations, and clinical outcomes will be different than those reported for the BCSC (35).
Finally, appropriate outcomes should be compared with the benchmarks reported for the BCSC, by using both arithmetic mean data from Tables 3 and 4 and the graphical data shown in Figures 1–9. For each clinical outcome, then, one will be able to judge the level of performance in terms of being above or below mean and also in terms of an estimated percentile. In so doing, it is important to recognize that larger amounts of data will be collected at the mammography facility level, which will provide more statistical precision (and therefore be less subject to random statistical variation) than data collected at the level of the individual radiologist. For relatively low-volume facilities, and especially for individual radiologists who interpret relatively few diagnostic mammograms, it may be necessary to analyze audit data collected from a period longer than the past year. Despite this limitation, it is very important for radiologist-specific data to be analyzed because this is the only approach that will enable one to identify whether there are individual radiologists within a group practice who need to improve performance.
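One way to place an audited value between published percentile values is sketched below in Python. The function name `percentile_bracket` is hypothetical, and the only benchmark values used are the 10th and 90th percentile cancer diagnosis rates quoted in this article (10.3 and 38.0 per 1000 examinations); a fuller comparison would add the remaining percentiles read from the figures.

```python
def percentile_bracket(value, benchmarks):
    """Locate an audited value between benchmark percentiles.

    benchmarks: dict mapping percentile -> benchmark value; values are
    assumed to be non-decreasing as the percentile increases.
    Returns (lower, upper): the highest percentile whose value is at or below
    the audited value and the lowest percentile whose value is above it.
    Either element is None when the value falls outside the listed range.
    """
    ordered = sorted(benchmarks.items())
    at_or_below = [p for p, v in ordered if v <= value]
    above = [p for p, v in ordered if v > value]
    lower = at_or_below[-1] if at_or_below else None
    upper = above[0] if above else None
    return lower, upper


# 10th and 90th percentile cancer diagnosis rates per 1000 examinations
cancer_diagnosis_rate = {10: 10.3, 90: 38.0}
print(percentile_bracket(20.0, cancer_diagnosis_rate))
```

A rate of 20.0 per 1000 falls between the 10th and 90th percentiles in this sketch. With small case volumes such a placement is imprecise, which is the reason the text recommends pooling more than one year of audit data for low-volume readers.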
For those mammography facilities that are able to link their audit data with those in a regional tumor registry, thereby permitting reliable compilation of data on true-negative and false-negative results, calculations of sensitivity and specificity also should be obtained and those calculations should then be compared with parallel BCSC data posted on the BCSC Web site. For those mammography facilities capable of breaking down audit results as a function of important patient demographic factors (patient age, family history of breast cancer, personal history of breast cancer, mammography performed previously), these results also should be compared with parallel BCSC data posted on the BCSC Web site. For either the mammography facility or the individual radiologist who prefers to conduct an online self-versus-BCSC comparison of data, the BCSC is developing a Web site that has a secure user-driven module that employs computer prompts for data entry and validation, followed by interactive displays of performance data for diagnostic mammography for entered-versus-BCSC data. Finally, as the BCSC continues to collect mammography outcomes data over the subsequent years, we also plan to update the performance benchmarks posted on the BCSC Web site, perhaps once a year, so that repeat users will be able to compare their annual audit data with even more robust BCSC data obtained during similar periods of time.
Study Limitations
There are five principal limitations to the use of data from our study. First, insofar as clinical outcomes are expected to vary with changes in the demographic factors of a given patient population (3–5,26–29), those who anticipate such problems, particularly those from countries other than the United States, should make appropriate comparisons of their own demographic data with those of the BCSC before considering BCSC performance benchmarks to be representative of their practice.
The second limitation concerns the subset of patients undergoing diagnostic mammography for evaluation of a breast problem with no self-reported palpable lump or unknown lump status. This group of cases covers a wide variety of indications for diagnostic mammography ranging from indications similar to those for screening (patients with breast implants or breast pain) to evaluations actually ordered for a palpable lump in cases in which the patient did not self-report the presence of a lump. In the BCSC series, these cases are grouped together because no query for these specific indications was prospectively made. It is likely that the diversity of miscellaneous indications for diagnostic mammography (breast problem, no lump/lump status unknown) will vary somewhat, perhaps even widely, among different mammography facilities. Therefore, one should be cautious in comparing results from this specific subset of diagnostic examinations with results from the BCSC.
The third limitation concerns the concurrent interpretation of mammograms and US images as part of an integrated diagnostic breast imaging evaluation. Because some BCSC registries do not collect US-specific interpretation data, we cannot determine the extent to which US may have affected diagnostic mammography assessments or management recommendations. However, some mammography facilities and some radiologists probably did report integrated mammography-US assessments whereas others did not. Note that the November 2003 publication of a new edition of BI-RADS guidelines (23), in which the use of integrated mammography-US assessments is actively recommended for the first time, may confound comparison of clinical outcomes data collected in the future with the 1996–2001 data that we report in this article.
The fourth limitation concerns our calculation of benchmark percentiles based on outcomes only from those radiologists who contributed at least a designated minimum number of cases for each outcome. Although this approach reduces the number of radiologists who contribute no useful or informative data, it necessarily excludes outcomes from examinations interpreted by low-volume radiologists, ranging from exclusion of 15% of radiologists for abnormal interpretation rate benchmarks to 21% of radiologists for invasive cancer size benchmarks. Therefore, our reported data on performance benchmarks apply principally to those individual radiologists with moderate to high amounts of diagnostic mammography experience.
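The effect of a minimum-volume rule on percentile benchmarks can be illustrated as follows. This is a sketch under stated assumptions: the threshold `min_cases`, the per-radiologist records, and the inclusive linear-interpolation percentile method are all hypothetical choices made for the example, and the article does not say that the BCSC analysis used them.

```python
import math


def percentile(sorted_values, pct):
    """Percentile of a sorted list by linear interpolation (inclusive method)."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    rank = (len(sorted_values) - 1) * pct / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def benchmark_percentiles(records, min_cases, pcts=(10, 25, 50, 75, 90)):
    """Compute percentile benchmarks from per-radiologist (numerator, denominator) pairs.

    Radiologists with fewer than `min_cases` in the denominator are excluded,
    and the number excluded is returned alongside the benchmarks.
    """
    rates = sorted(num / den for num, den in records if den >= min_cases)
    excluded = len(records) - len(rates)
    return {p: percentile(rates, p) for p in pcts}, excluded


if __name__ == "__main__":
    # Hypothetical records: (abnormal interpretations, examinations read).
    records = [(1, 2), (10, 100), (20, 100), (30, 100), (40, 100), (50, 100)]
    benchmarks, excluded = benchmark_percentiles(records, min_cases=30)
    print(benchmarks, excluded)
```

In this toy input the radiologist with 2 examinations would otherwise contribute a rate of 50% based on almost no data, which is the kind of uninformative value the minimum-volume rule is meant to remove.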
The fifth limitation is that many (perhaps most) mammography facilities and individual radiologists in the United States do not now conduct the type of comprehensive auditing required to properly utilize the performance benchmark data presented in this article. There simply may not be anyone available to set up, conduct, or analyze comprehensive audits. For practices that use auditing software programs, the program in use may not be able to generate data in a format that permits appropriate comparison with our data. In still other practices, it may be difficult to justify the added cost and effort to conduct comprehensive audits, especially in view of the limited reimbursement now received for breast imaging examinations. However, publication of our performance benchmark data may encourage more mammography facilities and radiologists to conduct comprehensive audits now that clinically relevant comparison data are available.
We have presented an extensive set of data on diagnostic mammography outcomes and performance benchmarks, drawn from a patient population judged to be representative of the population examined in general radiology practice in the United States. These data are designed to be used by mammography facilities and individual radiologists to evaluate their own performance for diagnostic mammography, as determined by periodic comprehensive audits. A parallel effort with similar methodology is underway to utilize BCSC data to provide clinically realistic performance benchmarks for screening mammography. Results of this effort will be reported separately.
| FOOTNOTES |
|---|
Authors stated no financial relationship to disclose.
Author contributions: Guarantors of integrity of entire study, E.A.S., D.L.M.; study concepts and design, E.A.S.; literature research, E.A.S.; data acquisition, D.L.M.; data analysis/interpretation, all authors; statistical analysis, D.L.M.; manuscript preparation, E.A.S.; manuscript definition of intellectual content, editing, revision/review, and final version approval, all authors.