Breast Imaging
1 From the Office of Health Promotion Research, University of Vermont, 1 S Prospect St, Rm 4427D, Burlington, VT 05401-3444 (B.M.G.); Center for Health Studies, Group Health Cooperative, Seattle, Wash (L.E.I., D.S.M.B., W.B.); University of California, San Francisco, San Francisco, Calif (E.A.S., K.K.); Dartmouth Medical School, Norris Cotton Cancer Center, Lebanon, NH (P.A.C.); University of North Carolina at Chapel Hill, Chapel Hill, NC (B.C.Y.); Cooper Institute, Denver, Colo (M.D.); Applied Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD (K.R.Y.); and University of New Mexico, Albuquerque, NM (R.D.R.). Received August 16, 2005; revision requested October 21; revision received November 28; final version accepted January 2, 2006. Data collection supported by a National Cancer Institute-funded Breast Cancer Surveillance Consortium cooperative agreement (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040). Address correspondence to B.M.G. (e-mail: berta.geller{at}uvm.edu).
ABSTRACT
Materials and Methods: The study included mammograms obtained from 1996 to 2001 at the seven mammography registries of the Breast Cancer Surveillance Consortium (BCSC). The authors defined the pre-MQSA period as January 1, 1996, through April 27, 1999, and the post-MQSA period as April 28, 1999, through December 31, 2001 (2 470 151 screening and 194 199 diagnostic mammograms). Assessment was cross-classified according to management recommendation. Changes in concordance between assessment and recommendation were evaluated by year and by period (before and after MQSA) for computer-linked data and for all data; the Pearson χ² test was used to evaluate differences between periods, and the Mantel-Haenszel χ² test was used to measure change in concordance over time. Each registry and the BCSC Statistical Coordinating Center had a Federal Certificate of Confidentiality and approval from each institution's review board for protection of human subjects to collect and send data to the coordinating center and to conduct research with these data. Active consent was required at only one site in this HIPAA-compliant study.
Results: Concordance increased significantly in the post-MQSA period for Breast Imaging Reporting and Data System (BI-RADS) category 3, 4, and 5 assessments at both screening and diagnostic mammography. The most substantial improvements were in the use of the management recommendation for "additional imaging," which decreased from 41% in 1996 to 15% in 2001 for screening mammograms with an initial assessment of category 4 (P < .001). Recommendation for short-interval follow-up in women with screening mammograms with a category 3 final assessment increased from 51% in 1996 to 76% in 2001 (P < .001). Concordance for diagnostic mammograms assigned category 0 improved from 65% in the pre-MQSA period to 81% in the post-MQSA period (P < .001).
Conclusion: This analysis demonstrates that over a relatively short period of time, major improvement in radiology reporting has occurred.
© RSNA, 2006
INTRODUCTION
Thus, the purpose of our study was to retrospectively compare the concordance of initial and final assessment categories for mammograms with management recommendations made before and after the final rules of the MQSA were in effect for screening and diagnostic mammography (3,4).
MATERIALS AND METHODS
The BCSC collects patient and radiologist information prospectively at each visit for mammography. The radiologist or radiologic technologist provided information about the indication for examination, type of imaging performed, breast density, assessment, and management recommendations.
Inclusion Criteria
We included data from screening mammograms and from the subset of diagnostic mammograms obtained to evaluate a breast problem (hereafter called "diagnostic mammograms") from 1996 to 2001. We restricted screening mammograms to those obtained in women 40 years and older and diagnostic mammograms to those obtained in women 25 years and older. We defined the pre-MQSA period as January 1, 1996, through April 27, 1999, and the post-MQSA period as April 28, 1999, through December 31, 2001. The BCSC has defined screening mammograms as those for which the indication for examination was routine screening and bilateral views were obtained, in women with no breast imaging in the prior 9 months, no personal history of breast cancer, and no breast augmentation (n = 2 516 555). Diagnostic mammograms were defined as those for which the indication for examination was "evaluation of a breast problem," in women with no breast surgery or diagnostic examination in the previous 60 days (7), no personal history of breast cancer, and no breast augmentation (n = 234 154).
We excluded mammograms for the following reasons: examinations for both screening and evaluation of a breast problem occurred on the same day, which did not allow us to determine the true indication for the examination (screening mammograms, 1683; diagnostic mammograms, 2851); initial assessment was missing (screening mammograms, 9366; diagnostic mammograms, 2654); initial management recommendation was missing (screening mammograms, 29 345; diagnostic mammograms, 10 166); the mammogram was from a follow-up diagnostic examination performed on the same day as the screening examination (diagnostic mammograms, 23 334); final assessment was missing (screening mammograms, 4430; diagnostic mammograms, 785); or final management recommendation was missing (screening mammograms, 1580; diagnostic mammograms, 165). This left a final total of 2 470 151 screening (1 297 032 pre-MQSA; 1 173 119 post-MQSA) and 194 199 diagnostic (101 536 pre-MQSA; 92 663 post-MQSA) mammograms. Most missing assessments and management recommendations were from ultrasonographic (US) examinations performed at the same time as the mammogram or from examinations completed outside our catchment area, for which we did not collect these details.
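The exclusion counts can be reconciled with the final sample sizes by a short tally. This is a minimal sketch, assuming (as the totals imply) that the 23 334 same-day follow-up examinations were all diagnostic mammograms; the dictionary keys are descriptive labels, not BCSC variable names.

```python
# Exclusion counts from the Inclusion Criteria text.
# Assumption: the 23 334 same-day follow-up examinations are diagnostic
# mammograms; with that reading both final totals reconcile exactly.
ELIGIBLE = {"screening": 2_516_555, "diagnostic": 234_154}

EXCLUSIONS = {
    "screening": {
        "screening and problem evaluation on same day": 1_683,
        "initial assessment missing": 9_366,
        "initial recommendation missing": 29_345,
        "final assessment missing": 4_430,
        "final recommendation missing": 1_580,
    },
    "diagnostic": {
        "screening and problem evaluation on same day": 2_851,
        "initial assessment missing": 2_654,
        "initial recommendation missing": 10_166,
        "same-day follow-up of a screening examination": 23_334,
        "final assessment missing": 785,
        "final recommendation missing": 165,
    },
}

# Final analytic sample split by period (pre-MQSA, post-MQSA), as reported.
BY_PERIOD = {
    "screening": (1_297_032, 1_173_119),
    "diagnostic": (101_536, 92_663),
}


def analytic_sample(kind: str) -> int:
    """Eligible examinations minus all exclusions for one examination type."""
    return ELIGIBLE[kind] - sum(EXCLUSIONS[kind].values())


for kind in ELIGIBLE:
    pre, post = BY_PERIOD[kind]
    print(kind, analytic_sample(kind), pre + post)
```

Both columns of output agree for each examination type, which is what supports reading the 23 334 exclusions as diagnostic examinations.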
Radiologic Assessment
Community-based radiologists assessed the mammograms by using the six categories of BI-RADS (third edition), as follows: category 0, additional imaging evaluation needed; 1, negative finding; 2, benign finding; 3, probably benign finding; 4, suspicious abnormality; and 5, highly suggestive of malignancy (2). If different assessments were given for the left and right breasts, we assigned the assessment that ranked higher in the following hierarchy, which is ordered by increasing likelihood of a cancer diagnosis: 1, 2, 3, 0, 4, and 5.
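Because category 0 ranks above category 3 but below category 4 in this hierarchy, taking the numeric maximum of the two breast-level categories would give the wrong answer. A minimal sketch of the combination rule follows; the names `HIERARCHY` and `combine_breasts` are hypothetical.

```python
# BI-RADS categories ordered from lowest to highest likelihood of cancer,
# as used to combine left- and right-breast assessments.
HIERARCHY = [1, 2, 3, 0, 4, 5]
RANK = {category: rank for rank, category in enumerate(HIERARCHY)}


def combine_breasts(left: int, right: int) -> int:
    """Return whichever breast-level assessment ranks higher in HIERARCHY."""
    return max(left, right, key=RANK.__getitem__)


print(combine_breasts(3, 0))  # category 0 outranks 3, unlike max(3, 0)
print(combine_breasts(0, 4))  # category 4 outranks 0
```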
We reported the results of the screening examination based on the screening views only (initial assessment) and again after the completion of all work-up imaging, if any was performed (final assessment). If additional imaging was performed on the same day, the initial assessment was recoded as category 0 and the recommendation was recoded as a recommendation for additional imaging at the Statistical Coordinating Center (L.E.I.). All records with an assessment of category 0 were followed to assign a final assessment until the earliest of the following occurred: (a) 180 days elapsed after the index screening, (b) more than 90 days elapsed between consecutive imaging examinations, (c) breast surgery, or (d) cancer diagnosis. We defined the first assessment of category 1, 2, 3, 4, or 5 in the follow-up period as the "final" assessment. For records for which no follow-up examinations were identified (n = 30 417), for which the assessment was still 0 after a follow-up examination (n = 2777), or for which categories 1 through 5 were assigned as the initial assessment (n = 2 324 923), the final results were the same as the initial results.
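The follow-up rule for resolving a category 0 assessment can be sketched as a single pass over dated events. This is a minimal sketch under stated assumptions, not the Statistical Coordinating Center's code: events are given as day offsets from the index screening, day 180 is still inside the window, the first follow-up gap is measured from the index screening, and the event labels `"imaging"`, `"surgery"`, and `"cancer"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FINAL_CATEGORIES = {1, 2, 3, 4, 5}


@dataclass
class Event:
    day: int                          # days after the index screening
    kind: str                         # hypothetical labels: "imaging", "surgery", "cancer"
    assessment: Optional[int] = None  # BI-RADS category, imaging events only


def final_assessment(initial: int, events: Sequence[Event]) -> int:
    """Resolve a category 0 screening assessment to a final category.

    Follow-up stops at the earliest of: more than 180 days after the index
    screening, a gap of more than 90 days between consecutive imaging
    examinations, breast surgery, or cancer diagnosis. Records that are
    never resolved keep category 0.
    """
    if initial != 0:
        return initial                 # categories 1-5: final equals initial
    last_imaging_day = 0               # assumption: index screening is day 0
    for event in sorted(events, key=lambda e: e.day):
        if event.day > 180 or event.kind in ("surgery", "cancer"):
            break
        if event.kind != "imaging":
            continue
        if event.day - last_imaging_day > 90:
            break
        last_imaging_day = event.day
        if event.assessment in FINAL_CATEGORIES:
            return event.assessment    # first category 1-5 is the "final" one
    return 0


print(final_assessment(0, [Event(30, "imaging", 0), Event(100, "imaging", 2)]))
```

In the usage line, the second examination falls 70 days after the first, so the chain is unbroken and the category 2 assessment becomes final.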
For diagnostic mammograms, an initial assessment and a final assessment were determined, but we reported only results for the final assessment, because most (181 273 [92.9%]) of the mammograms were assigned a final assessment on the same day the initial diagnostic mammogram was obtained. We used the same procedure to assign the "final" assessment that we used for screening mammograms if there was a category 0 assessment or if there were multiple breast imaging examinations that occurred on the same day. We truncated follow-up data for diagnostic mammograms at the next diagnostic examination if it was performed before breast surgery or cancer diagnosis.
Radiologic Management Recommendation
Concordance between assessment and management recommendation was based on the BI-RADS recommendation for each assessment: category 0, additional imaging; 1 or 2, normal-interval follow-up; 3, short-interval follow-up; 4, biopsy should be considered; and 5, appropriate action should be taken (2). In addition to these standard categories, the BCSC collects additional management recommendation categories, which we collapsed into the BI-RADS categories as follows. Recommendations for diagnostic mammographic views, US, magnetic resonance imaging, and nuclear medicine imaging were all considered "additional imaging." Recommendations for clinical examination, surgical consultation, fine-needle aspiration, and biopsy were combined into a "biopsy/appropriate action" category, which was treated as concordant with BI-RADS categories 4 and 5. If more than one recommendation was given, the one with the highest level of concern was kept, in the following ascending order: normal-interval follow-up, short-interval follow-up, other, additional imaging, and biopsy/appropriate action.
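The collapsing and concordance rules above can be expressed as lookup tables plus a "highest concern wins" selection. This is a minimal sketch; the recommendation strings are illustrative labels rather than BCSC codes, and treating any unlisted recommendation (such as "immediate work-up not otherwise specified") as "other" is an assumption.

```python
# BI-RADS concordant recommendation group for each assessment category.
CONCORDANT = {
    0: "additional imaging",
    1: "normal-interval follow-up",
    2: "normal-interval follow-up",
    3: "short-interval follow-up",
    4: "biopsy/appropriate action",
    5: "biopsy/appropriate action",
}

# Collapsing of detailed recommendations into BI-RADS groups (labels illustrative).
COLLAPSE = {
    "diagnostic mammography views": "additional imaging",
    "ultrasonography": "additional imaging",
    "magnetic resonance imaging": "additional imaging",
    "nuclear medicine imaging": "additional imaging",
    "clinical examination": "biopsy/appropriate action",
    "surgical consultation": "biopsy/appropriate action",
    "fine-needle aspiration": "biopsy/appropriate action",
    "biopsy": "biopsy/appropriate action",
}

# Groups ordered from lowest to highest level of concern.
CONCERN = [
    "normal-interval follow-up",
    "short-interval follow-up",
    "other",
    "additional imaging",
    "biopsy/appropriate action",
]


def collapse(recommendation: str) -> str:
    """Map a detailed recommendation to its group; unlisted ones become "other"."""
    if recommendation in CONCERN:
        return recommendation
    return COLLAPSE.get(recommendation, "other")  # assumption for unlisted labels


def is_concordant(assessment: int, recommendations: list[str]) -> bool:
    """Keep the highest-concern recommendation and compare it with BI-RADS."""
    groups = [collapse(r) for r in recommendations]
    kept = max(groups, key=CONCERN.index)
    return kept == CONCORDANT[assessment]


print(is_concordant(3, ["short-interval follow-up"]))
print(is_concordant(1, ["normal-interval follow-up", "ultrasonography"]))
```

The second call illustrates how a category 1 report that also recommends US is counted as discordant: additional imaging carries the higher level of concern, so it is the recommendation kept.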
Data and Statistical Analysis
All analyses were performed separately for screening and diagnostic mammograms. We examined the distribution of patient characteristics (age, race, breast density, first mammogram) according to year to identify trends over time. The distribution of assessments according to 10-year age groups in the pre-MQSA period versus the post-MQSA period was also examined to identify any change over time. The management recommendation stratified by assessment was examined according to year, with 1999 split into pre-MQSA and post-MQSA periods.
Some computer-based mammography reporting systems discourage the radiologist from mismatching BI-RADS assessment categories with management recommendations by automatically linking these two parameters prior to report generation. We refer to this approach as producing "linked" data. Most systems that have automatic linkage also permit the radiologist to override the default setting, which can yield discordance between the assessment and management recommendation. The main analyses included all data, whether they were linked or not. We conducted a subanalysis of the concordance between assessment and recommendation by using linked data only and tested whether the concordance was different in the linked and unlinked data.
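How automatic linkage yields concordant defaults while still permitting discordant reports can be sketched in a few lines. The function and field names below are hypothetical and do not describe any particular commercial reporting system; the default mapping is the BI-RADS correspondence given under Radiologic Management Recommendation.

```python
from typing import Optional

# Default recommendation a linked system fills in for each BI-RADS category.
DEFAULT_RECOMMENDATION = {
    0: "additional imaging",
    1: "normal-interval follow-up",
    2: "normal-interval follow-up",
    3: "short-interval follow-up",
    4: "biopsy/appropriate action",
    5: "biopsy/appropriate action",
}


def linked_report(assessment: int, override: Optional[str] = None) -> dict:
    """Hypothetical linked report: the recommendation defaults from the
    assessment, but the radiologist may override it."""
    default = DEFAULT_RECOMMENDATION[assessment]
    recommendation = default if override is None else override
    return {
        "assessment": assessment,
        "recommendation": recommendation,
        "concordant": recommendation == default,
    }


print(linked_report(3))
print(linked_report(3, override="additional imaging"))
```

Only the override path can produce a discordant record, which is why linked data are expected to be more, but not perfectly, concordant.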
The Pearson χ² test was used to test for a difference in concordance between the pre-MQSA and post-MQSA periods. A test for linear trend was also performed by using the Mantel-Haenszel χ² test to evaluate change in concordance over time (in years). A logistic regression model with concordance (yes or no) as the response variable was used to test whether the change in concordance from the pre-MQSA period to the post-MQSA period differed between the linked and unlinked data. All tests were performed by using SAS/STAT software (version 8 for Windows; SAS Institute, Cary, NC), with P < .05 indicating a significant difference.
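The two χ² statistics named above can be sketched from their standard formulas; this is not the SAS implementation. The Mantel-Haenszel trend statistic is computed as M² = (N − 1)r², where r is the Pearson correlation between the period score and concordance (1 or 0) over all N examinations. The per-year tables are not reproduced in this section, so the example uses only the two-period counts for diagnostic category 0 from the Results, with period scores 0 and 1 (an assumption). With two periods, M² reduces to (N − 1)/N times the Pearson statistic.

```python
def pearson_chi2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mantel_haenszel_trend(rows, scores):
    """Mantel-Haenszel chi-square for linear trend, M^2 = (N - 1) * r^2.

    rows:   (concordant, discordant) counts for each ordered period
    scores: numeric score for each period (e.g. calendar year)
    """
    n = sx = sxx = sy = sxy = 0.0
    for (yes, no), x in zip(rows, scores):
        total = yes + no
        n += total
        sx += total * x
        sxx += total * x * x
        sy += yes          # y = 1 for concordant, so sum(y^2) == sum(y)
        sxy += yes * x
    r = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * sy - sy ** 2)) ** 0.5
    return (n - 1) * r ** 2


# Diagnostic mammograms, final category 0: pre-MQSA row, then post-MQSA row.
table = [[2741, 4201 - 2741], [2316, 2843 - 2316]]
x2 = pearson_chi2(table)
m2 = mantel_haenszel_trend(table, scores=[0, 1])
print(round(x2, 1), round(m2, 1))
```

With yearly rows and calendar-year scores, the same function gives the multi-period trend test described in the text.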
The results were independently reviewed and evaluated by all of the authors.
RESULTS
Among the 68% of screening mammograms (n = 1 679 789) for which breast density information was available, the distribution of breast density was as follows: for 9% of mammograms (n = 148 237), almost entirely fat; for 46% (n = 770 129), scattered fibroglandular densities; for 38% (n = 635 988), heterogeneously dense; and for 7% (n = 125 435), extremely dense. Approximately 5% of studies (130 629 of 2 415 079) were first mammograms (data missing for 2% of mammograms [n = 55 072]).
Pre-MQSA period versus post-MQSA period. Of the 196 113 screening mammograms with an initial assessment of category 0, 26% (51 642 of 196 113) had additional imaging performed on the same day and were therefore recoded as BI-RADS category 0 (Table 1). For initial screening assessments, the proportions of category 1 and 3 assessments decreased and the proportions of category 0 and 2 assessments increased from the pre-MQSA period to the post-MQSA period. Of the mammograms that were given incomplete assessments (not recoded) and for which women required additional imaging, 23% (33 194 of 145 228) remained unresolved and were classified as category 0 at final assessment (21% [n = 30 417], no follow-up records found; 2% [n = 2777], mammogram assessed as category 0 or not assigned an assessment code at follow-up). For final screening assessments, the proportions of mammograms with category 1 and 3 assessments decreased and the proportion with category 2 assessments increased in the post-MQSA period. The difference in the distribution of assessment categories between the pre-MQSA and post-MQSA periods was significant (P < .05) for both initial and final assessments.
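The category 0 percentages in the preceding paragraph follow directly from the reported counts. A short arithmetic check, using the reported 145 228 incomplete assessments as the denominator exactly as given in the text:

```python
# Screening mammograms with an initial assessment of category 0 (Results text).
initial_cat0 = 196_113
recoded_same_day = 51_642      # recoded to 0 because of same-day additional imaging
not_recoded = 145_228          # incomplete assessments, denominator as reported
no_follow_up = 30_417          # no follow-up records found
still_zero_at_follow_up = 2_777

unresolved = no_follow_up + still_zero_at_follow_up


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(pct(recoded_same_day, initial_cat0))       # share recoded on the same day
print(unresolved, pct(unresolved, not_recoded))   # unresolved count and share
print(pct(no_follow_up, not_recoded), pct(still_zero_at_follow_up, not_recoded))
```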
Among the 66% of diagnostic mammograms (128 129 of 194 199) for which breast density information was available, the distribution of breast density was as follows: for 5% of mammograms (n = 6824), almost entirely fat; for 37% (n = 47 078), scattered fibroglandular densities; for 44% (n = 56 275), heterogeneously dense; and for 14% (n = 17 952), extremely dense. Approximately 18% of the studies (33 391 of 188 852) were first mammograms (data missing for 3% of mammograms [n = 5347]).
Pre-MQSA period versus post-MQSA period. Among the 6.7% (n = 12 296) of diagnostic mammograms for which women required follow-up imaging, 54% (n = 7044) of mammogram assessments were unresolved and were classified as category 0 at final assessment (no follow-up records found). For 56% (109 610 of 194 199) of the diagnostic mammograms, additional diagnostic mammographic views, US images, or both were obtained. The distribution of final assessments did not differ across age categories, so results are given for all ages combined.
Concordance improved from the pre-MQSA period to the post-MQSA period, from 65% (2741 of 4201) to 81% (2316 of 2843) for category 0, from 46% (5489 of 12 061) to 55% (5677 of 10 321) for category 3, from 89% (4908 of 5488) to 95% (6638 of 6979) for category 4, and from 89% (1168 of 1313) to 96% (1380 of 1434) for category 5 (Fig 3); these improvements were all significant (P < .001). The test for linear trend of increased concordance over time was also significant (P < .001) for final assessments of categories 0, 3, 4, and 5 (Table 4). Concordance decreased from the pre-MQSA period to the post-MQSA period for categories 1 and 2, that is, from 80% (42 054 of 52 336) to 73% (27 609 of 38 066) for category 1 and from 76% (19 773 of 26 137) to 71% (23 381 of 33 020) for category 2 (data not shown). The differences in concordance between the pre- and post-MQSA periods for diagnostic mammograms with a final assessment of 1 or 2 were both significant (P < .001).
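The rounded percentages above can be recomputed from the reported counts for final diagnostic assessments. A short sketch; the dictionary layout is illustrative.

```python
# Final assessments of diagnostic mammograms (all data):
# category -> (pre concordant, pre total, post concordant, post total)
COUNTS = {
    0: (2741, 4201, 2316, 2843),
    1: (42_054, 52_336, 27_609, 38_066),
    2: (19_773, 26_137, 23_381, 33_020),
    3: (5489, 12_061, 5677, 10_321),
    4: (4908, 5488, 6638, 6979),
    5: (1168, 1313, 1380, 1434),
}


def concordance_change(category: int) -> tuple[int, int]:
    """Rounded percent concordance before and after MQSA for one category."""
    pre_yes, pre_n, post_yes, post_n = COUNTS[category]
    return round(100 * pre_yes / pre_n), round(100 * post_yes / post_n)


for category in sorted(COUNTS):
    pre, post = concordance_change(category)
    direction = "up" if post > pre else "down"
    print(f"category {category}: {pre}% -> {post}% ({direction})")
```

The output separates the categories whose concordance rose (0, 3, 4, and 5) from those whose concordance fell (1 and 2).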
Linked-Data Mammographic Assessment and Management Recommendations
Data were linked for 15% (382 645 of 2 470 151) of screening mammograms and 14% (26 832 of 194 199) of diagnostic mammograms. For initial assessments of screening mammograms with linked data, concordance from the pre-MQSA period to the post-MQSA period increased from 71% (1233 of 1733) to 76% (1180 of 1544) for category 3 (P < .001), from 69% (305 of 442) to 86% (148 of 173) for category 4 (P < .001), and from 93% (56 of 60) to 100% (44 of 44) for category 5 (P = .08) (Fig 1). For final assessment of screening mammograms, concordance increased from 87% (4238 of 4898) to 90% (4674 of 5181) for category 3 (P < .001), from 86% (2108 of 2453) to 98% (1464 of 1494) for category 4 (P < .001), and from 97% (190 of 195) to 99% (151 of 152) for category 5 (P = .18) (Fig 2). For final assessment of diagnostic mammograms, concordance increased from 80% (1148 of 1443) to 84% (1357 of 1625) for category 3 (P = .005), from 92% (873 of 945) to 99% (845 of 858) for category 4 (P < .001), and from 68% (19 of 28) to 99% (165 of 167) for category 5 (P < .001) (Fig 3).
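The methods name the Pearson χ² test but do not state whether a continuity correction was applied. As a consistency check, a minimal sketch assuming an uncorrected test with 1 degree of freedom reproduces the three P values above that are not below .001 (.08, .18, and .005) from the reported counts. The function name `pearson_p` is illustrative.

```python
import math


def pearson_p(pre_yes: int, pre_n: int, post_yes: int, post_n: int) -> float:
    """P value of an uncorrected Pearson chi-square test (1 df)
    comparing two proportions."""
    a, b = pre_yes, pre_n - pre_yes
    c, d = post_yes, post_n - post_yes
    n = pre_n + post_n
    chi2 = n * (a * d - b * c) ** 2 / (pre_n * post_n * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(round(pearson_p(56, 60, 44, 44), 2))          # initial screening, category 5 (reported P = .08)
print(round(pearson_p(190, 195, 151, 152), 2))      # final screening, category 5 (reported P = .18)
print(round(pearson_p(1148, 1443, 1357, 1625), 3))  # final diagnostic, category 3 (reported P = .005)
```

Because some expected cell counts in the category 5 tables are small (one is below 2), an exact test would be a common alternative for those comparisons.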
Concordance for both initial and final assessments of screening mammograms was higher for the linked data than for the unlinked data in both the pre- and post-MQSA periods (P < .001 for all tests). For final assessments of diagnostic mammograms, concordance across all assessment categories differed significantly between linked and unlinked data in the pre-MQSA period (P < .001) but not in the post-MQSA period (P = .81).
The increase in concordance from the pre-MQSA period to the post-MQSA period was greater (P < .001) for the unlinked data than for the linked data for both initial and final assessments of screening mammograms. For diagnostic mammograms, the change in concordance from the pre-MQSA period to the post-MQSA period also differed significantly (P < .001) between the linked and unlinked data: concordance remained constant between the pre-MQSA and post-MQSA periods for the unlinked data and decreased for the linked data.
DISCUSSION
In this study, we found improvement over time in the concordance between assessment categories and the management recommendations specified by the American College of Radiology. Concordance improved over time in most of the assessment categories for both screening and diagnostic mammography, and in general, the improvement was greater after the implementation of the final MQSA regulations. The most substantial improvements were seen in the use of the management recommendation "additional imaging," which should be assigned only when a mammogram receives a category 0 assessment. Most of the declines in the inappropriate recommendation for additional imaging were accompanied by an increase in the appropriate recommendation for each category. For example, in 1996, only 40% of screening mammograms with an initial assessment of category 3 were associated with a recommendation for short-interval follow-up, whereas in 2001, 71% were. We saw similar improvements in the final screening assessments and in the diagnostic assessments.
Each new BI-RADS edition has provided more details about the assessment categories and the associated recommendations. The third edition was published in 1998, during our study period (2). It is difficult to say how long it takes for guidance from a professional organization to diffuse into practice, or to document that translation. From 1997 to 1998, there were no remarkable changes. However, in 1999, after the release but before the implementation of the final MQSA regulations, we did see some clinically important changes in the association of assessments and recommendations for both initial and final screening examinations. There were 11% and 7% reductions in recommendations for additional imaging for category 3 and category 4 initial screening mammograms, respectively. In addition, there was an 8% decline in recommendations for additional imaging for both categories 3 and 4 at final screening examinations. These changes toward more appropriate use of management recommendations may indicate that more radiologists are familiar and comfortable with the concept of an incomplete assessment that requires further imaging before a final assessment can be made.
Federal regulations under the MQSA may have helped to improve the concordance of assessment and management recommendations. Concordance increased significantly from the pre-MQSA period to the post-MQSA period for most of the assessment categories for both screening and diagnostic mammography. The final MQSA regulations were published in November 1998 (1), and our results show a trend toward improved concordance in 1999, immediately before the regulations went into effect on April 28, 1999. Perhaps once the regulations were published, radiology practices started to implement them. The MQSA currently requires that the specific language of each BI-RADS assessment category be included in the final report of the mammography study to clinicians. It also asks that a management recommendation be included but does not specify the language for the management recommendation, nor does it require the management recommendation to be concordant with the assessment category as per BI-RADS.
Computer linkage of data appears to be an effective method for ensuring concordance between assessment and management recommendations. Most commercial mammography computer systems link the assessment to the BI-RADS suggested recommendation but allow the radiologist to override the system, which is why concordance is not perfect. Figure 3 shows a low proportion of concordance for category 5 diagnostic mammograms with linked data in the pre-MQSA period (68%, 19 of 28), probably because of the very small number of examinations. Category 3 assessments remain a problem even with computer linkage. During our study period, most participating radiology practices did not use computerized systems that link data, although such use may increase in the future.
There are several clinical implications of the results of this study. BI-RADS gives explicit direction on how to perform an outcome audit by using the MQSA-regulated assessment categories to calculate outcomes. However, to the extent that there is discordance between the assessment and the management recommendation, outcome audit data can become difficult to interpret and compare with benchmark data. In addition, reports of benign or negative assessments that require further work-up may cause confusion among clinicians and possibly inappropriate follow-up care.
There are several limitations to our study. We did not have the ability to review the complete text of mammography reports to clarify the purpose and intent of an individualized diagnostic evaluation plan. This would have great utility in interpreting data that appear inconsistent with standard recommendations. For example, there are management recommendations categorized as "immediate work-up not otherwise specified." These were not included as concordant recommendations with any of the assessment categories, and for most examinations this does not make a difference. However, for category 5 diagnostic mammograms, this recommendation category could explain the very low concordance within linked data in the pre-MQSA period (Fig 3), given the very small numbers.
Two other specific data limitations should be noted. First, our follow-up data are incomplete when women go outside our catchment area for further care; it is possible that more of the category 0 assessments received follow-up that our data systems did not capture. Second, some of the mammographic and US images obtained following a finding at the screening or diagnostic examination were listed only as completed, without an assessment category or management recommendation, and we could not include these in our analyses.
Our study shows that three mechanisms (the publication of BI-RADS, the MQSA regulations, and the use of computer systems with automatic linkage) are temporally related to improved concordance of mammography assessment and management recommendations. To promote further progress, radiologists have two options: (a) continue to improve concordance on their own by using the guidance provided in the newest version of BI-RADS (8) or (b) start to use computer systems with automatic linkage between the mammography assessment and management recommendation for reporting. Finally, this analysis demonstrates that, over a relatively short period of time, major improvements in radiology reporting have occurred.
FOOTNOTES
Abbreviations: BCSC = Breast Cancer Surveillance Consortium; BI-RADS = Breast Imaging Reporting and Data System; MQSA = Mammography Quality Standards Act
Authors stated no financial relationship to disclose.
All opinions and findings are the sole responsibility of the authors. The views expressed do not necessarily represent those of the U.S. government.
Author contributions: Guarantors of integrity of entire study, B.M.G., P.A.C.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, B.M.G., E.A.S., P.A.C.; clinical studies, K.K.; statistical analysis, L.E.I., W.B.; and manuscript editing, all authors.