Breast Imaging |
1 From Breast Imaging Consultant, American Radiology Services, Johns Hopkins Green Spring, 10755 Falls Rd, Suite 440, Lutherville, MD 21093 (W.A.B.); Center for Statistical Sciences, Brown University, Providence, RI (J.D.B., J.B.C.); and Department of Radiology, Northwestern University School of Medicine, Chicago, Ill (E.B.M.). Received October 19, 2005; revision requested December 9; revision received January 20, 2006; final version accepted February 8. Supported by a grant from the Avon Foundation. Address correspondence to W.A.B. (e-mail: wendieberg{at}hotmail.com).
| ABSTRACT |
|---|
Materials and Methods: Institutional review board approval was obtained for the HIPAA-compliant study. Ten women (aged 19–53 years; mean, 37.4 years; 20 breasts) with numerous known breast lesions consented to participate. Eleven breast radiologists, who passed experience and qualification requirements for a screening breast US trial and consented to participate, scanned both breasts in all participants and documented images of each detected lesion and its size, location, features, palpability, and Breast Imaging Reporting and Data System final assessment. Intraclass correlation coefficients (ICCs) were used to measure agreement on lesion size and location, and κ statistics were calculated for agreement on features and final assessments compared with consensus.
Results: Eighty-eight unique lesions were identified by at least two investigators (five to 13 lesions per participant). Mean diameter was 6.7 mm (standard error, 0.4; range, 2–22 mm), and eight lesions (9%) were palpable. Of 968 potential detections (88 lesions, 11 investigators), 536 (55%) detections were made. Individual investigators detected between 43 (49%) and 58 (66%) lesions. Larger lesions were more consistently detected: Detection rates were six of 33 lesions (18%) at 3 mm or smaller; 164 of 374 (43.9%) at 3.1–5 mm; 145 of 275 (52.7%) at 5.1–7 mm; 119 of 176 (67.6%) at 7.1–9 mm; 38 of 44 (86%) at 9.1–11 mm; and 64 of 66 (97%) lesions larger than 11 mm (P < .001). ICCs for clockface, distance from nipple, and individual lesion diameter all exceeded 0.7, indicating high reliability. For shape, margins, and final assessments of solid lesions, κ values were 0.62, 0.67 (substantial agreement), and 0.52 (moderate agreement), respectively. Of 110 detections of consensus cysts 8 mm and smaller, 15 (14%) detections were considered to be of solid lesions by at least one reader.
Conclusion: Larger lesions (>11 mm) are most consistently detected, with fewer than half of lesions 5 mm or smaller in mean diameter identified; substantial agreement was found for description of lesion size, location, and key features, and moderate agreement was found for lesion management.
© RSNA, 2006
| INTRODUCTION |
|---|
Whole-breast US may also have value. US can be used to identify additional invasive cancers and help assess the extent of disease in women with newly diagnosed cancer (9–12). On the basis of encouraging results in single-center studies (13–19), a multicenter trial of supplemental screening breast US in women at high risk with dense breast tissue is currently in progress (20).
Consistent recognition of lesions that can be dismissed as benign is critical in avoiding unnecessary anxiety and cost with screening or other whole-breast US applications. Also implicit in consideration of whole-breast screening US is the assumption that incidental probably benign lesions (21) can be successfully followed. This requires consistent reporting of lesion location, size, description, and depiction of features. Thus, the purpose of our study was to prospectively examine operator dependence of lesion detection, description, and interpretation when experienced breast imaging radiologists perform whole-breast US.
| MATERIALS AND METHODS |
|---|
Scanning Technique
Scanning technique was standardized. Each investigator scanned all participants in random order and documented images of each lesion detected. Investigators were instructed to begin scanning the right breast, in a clockwise fashion, by using transverse and sagittal orientation, scanning the inner breast in a supine position and the outer breast in a supine oblique position with the woman's arm raised above her head. An individual was assigned to each room to scribe, and each scribe/monitor was instructed to record the times at which scanning was initiated and completed and to prompt the investigator for a description of lesion location, features, and BI-RADS final assessments (22) per lesion and per breast while the investigator scanned the participant.
All manufacturers were invited to supply equipment for the study. The authors had full control of data and information submitted for publication. A variety of US equipment was used, all of which met the ACRIN 6666 protocol requirements. Spatial compounding and power Doppler were available on all units and could be used at the discretion of the investigators. Specifically, seven units used were HDI 5000 scanners (Philips Medical Systems, Bothell, Wash) equipped with 50-mm L12-5-MHz transducers; one was a Logiq 9 scanner (GE Medical Systems, Milwaukee, Wis) equipped with a 38-mm L14-9-MHz transducer; one was a Hi Vision 8500 scanner (Hitachi Medical Systems America, Twinsburg, Ohio) equipped with a 50-mm L13-6.5-MHz transducer; and one was an Aplio scanner (Toshiba America Medical Systems, Tustin, Calif) equipped with a 38-mm L14-7.2-MHz transducer. Two units (the Logiq 9 and Aplio units) provided harmonic imaging, which could be electively used. The women, equipment, and scribe remained constant for each room as investigators rotated rooms. Each of the 11 investigators performed US in each of the 10 participating women by using the 10 US units described.
Investigators were instructed to document each cystic and solid lesion with an image of its largest horizontal diameter (recording both horizontal and vertical diameters) and an image perpendicular to that with its respective diameter. For each lesion, investigators recorded the location according to breast, clockface to the nearest half-hour, and estimated distance from the nipple in centimeters. (Investigators were instructed to use the known width of the transducer, that is, 38 or 50 mm, as a guide in estimating distance from nipple.)
Features Recorded
Investigators were instructed to use interpretive criteria developed for the ACRIN trial protocol 6666, "Screening Breast Ultrasound in High-Risk Women" (www.acrin.org). The overall echotexture was described by each investigator for each participant: homogeneous, focally heterogeneous, or diffusely heterogeneous (22). BI-RADS US (22) features were recorded, beginning with whether the lesion was a special case (special cases defined as any of the following: cyst, complicated cyst, clustered microcysts, intraductal mass, lymph node, postsurgical scar, or calcifications without a mass). For each lesion, each investigator was required to assess the presence of calcifications (none, macrocalcifications, microcalcifications in mass, or microcalcifications outside of mass), palpability after scanning at directed examination by the investigator (yes or no), likelihood of malignancy (0%–100%), and BI-RADS final assessment. Each investigator was then asked to give a BI-RADS final assessment score per breast (score of 1, negative; 2, benign; 3, probably benign; 4A, low suspicion; 4B, intermediate suspicion; 4C, moderate suspicion; or 5, highly suggestive of malignancy).
For cases that were not deemed special cases, additional features were recorded, which included the following: shape (oval or two to three gentle lobulations, round, or irregular), orientation (parallel to the skin surface or not), margin (circumscribed or not), boundary (abrupt or echogenic halo), echo pattern (anechoic, hyperechoic, complex cystic, isoechoic, mixed hyperechoic-hypoechoic, or hypoechoic), and posterior features (none, enhancement, combined enhancement and shadowing, or shadowing).
Image and Statistical Analysis
The primary aim of this study was to estimate the reproducibility of various lesion characteristics such as lesion volume, identification of lesions, measurement of size, morphology, and recording of lesion location on US images across multiple observers in an enriched set of diagnostic cases. We projected that 10 readers and 10 subjects (each with at least three lesions) would be required in order to provide sufficient efficiency to estimate the intraclass correlation coefficient (or its likeness, the κ statistic, when the data are ordinal in nature). Note that a κ of 1.0 denotes perfect agreement; 0.81–0.99, almost perfect agreement; 0.61–0.80, substantial agreement; 0.41–0.60, moderate agreement; 0.21–0.40, fair agreement; and 0.20 or less, slight agreement (23).
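The verbal scale quoted from reference 23 can be expressed as a small lookup. This is a minimal sketch: the function name `agreement_label` is hypothetical, and values that fall between the published bands (for example, 0.805) are assigned to the higher band, which is an assumption not stated in the text.

```python
def agreement_label(kappa: float) -> str:
    """Map a kappa value to the verbal agreement scale quoted in the text."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa >= 1.0:
        return "perfect"
    # Assumption: values between published bands go to the higher band.
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight"


# Values reported in the abstract for shape, margins, and final assessments.
for name, value in [("shape", 0.62), ("margins", 0.67), ("final assessment", 0.52)]:
    print(f"{name}: kappa = {value:.2f} ({agreement_label(value)} agreement)")
```

The three labels printed here match the wording used in the abstract (substantial, substantial, moderate).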
The sample size projections were designed to achieve an acceptable one-sided lower confidence bound for estimating the intraclass correlation coefficient (ICC) that is above 0.6 under the following conditions: A one-sided 95% confidence interval would be used to estimate the ICC, the ICC would be approximately 0.75 (indicating excellent reliability), and the ratio of key variance components would be (much) less than 3. In the ensuing analyses, we actually provide a two-sided 95% confidence interval, which is slightly more conservative.
Statistical software SAS (version 8.0; SAS Institute, Cary, NC) and Stata (version 7.0; Stata, College Station, Tex) were used to process the data (J.D.B., J.B.C.) and facilitate statistical analyses. Initially, summary tables and simple frequencies were used to explore the data and check for outliers. The probability of detection was modeled as a function of lesion characteristics, such as size or special case, by using a generalized linear model with a log-link function and binomial errors. The resulting coefficients for this regression are log risk ratios. Our models allowed for clustering at the participant level to account for the natural correlation in these data. ICCs were calculated for continuous variables such as lesion size (comparing mean diameters) and location. κ statistics (23) were used to measure agreement on lesion features and final assessments compared with the consensus finding (defined as the mode across all readers). Missing values were excluded. Confidence intervals for ICCs were bootstrapped in an attempt to account for the correlated nature of these data. In our experience (J.D.B.), such bootstrapping is sufficient, although not ideal. We specifically analyzed the subset of lesions considered cysts by consensus to determine the reliability of this designation as a function of lesion size.
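For readers less familiar with the log-link binomial model, its general form can be written out as follows. The symbols are introduced here for illustration only and are not the authors' notation: $D_{ij}$ indicates detection of lesion $j$ by investigator $i$, $x_j$ is a lesion characteristic such as mean diameter in millimeters, and $\beta_0$ and $\beta_1$ are the regression coefficients.

```latex
\log \Pr(D_{ij} = 1 \mid x_j) = \beta_0 + \beta_1 x_j,
\qquad
\mathrm{RR}_{\text{per unit}}
  = \frac{\Pr(D_{ij} = 1 \mid x_j + 1)}{\Pr(D_{ij} = 1 \mid x_j)}
  = e^{\beta_1}
```

Because the link is logarithmic rather than logit, $e^{\beta_1}$ is a risk ratio rather than an odds ratio, and a difference of $\Delta$ units in $x_j$ multiplies the modeled detection probability by $e^{\beta_1 \Delta}$.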
For consensus information, hard-copy images were compared (W.A.B., with 14 years of experience in breast US), and lesions were matched and assigned lesion numbers across investigators. To reduce the effect of false-positive detections, for reproducibility and κ analyses we considered only those lesions seen by at least two investigators. The consensus description across all investigators identifying any given lesion was considered the reference standard for that lesion. For lesions seen by only two readers, we required a different algorithm: "Consensus" was defined as the rounded mean of the two interpretations (with ordinal values given to each feature in order of increasing suspicion).
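The consensus rule described above (the mode across readers, or the rounded mean when only two readers saw the lesion) can be sketched as follows. The function name `consensus` and the integer coding are hypothetical. Two details are assumptions because the text does not specify them: exact halves are rounded up, toward greater suspicion, and ties for the mode resolve to the more suspicious code.

```python
from collections import Counter


def consensus(ratings):
    """Consensus ordinal code for one feature of one lesion.

    ratings: integer codes ordered by increasing suspicion, one per
    investigator who detected the lesion.
    """
    if len(ratings) < 2:
        raise ValueError("lesions seen by fewer than two investigators are excluded")
    if len(ratings) == 2:
        # Rounded mean of the two codes; halves round up (assumption).
        return (ratings[0] + ratings[1] + 1) // 2
    counts = Counter(ratings)
    top = max(counts.values())
    # Mode across readers; ties go to the more suspicious code (assumption).
    return max(code for code, n in counts.items() if n == top)


print(consensus([2, 3]))           # two readers: rounded mean
print(consensus([1, 2, 2, 3, 2]))  # several readers: mode
```

Integer arithmetic is used for the two-reader case because Python's built-in `round` rounds halves to the nearest even number, which would make the result depend on the particular codes involved.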
The percentage of lesions detected was explored as a function of the time spent scanning the patient while adjusting for clustering due to investigator and participant. Investigators who failed to record their time (two participant evaluations) were excluded from this analysis. One investigator performed only a targeted US examination in one participant and, hence, his or her (missing) results could not be included in these analyses. To examine the potential for fatigue to affect detection, investigator-specific detection rates were examined as a function of time.
| RESULTS |
|---|
Both empirically and according to statistical modeling, lesion detection increased with increasing lesion size from 0 to 13 mm (Fig 2). Empiric detection rates at arbitrary cutoff points were as follows: Six (18%) of 33 potential detections were made at a lesion size of 3.0 mm or smaller; 164 (43.9%) of 374 detections were made at 3.1–5.0 mm; 145 (52.7%) of 275 detections were made at 5.1–7.0 mm; 119 (67.6%) of 176 detections were made at 7.1–9.0 mm; 38 (86%) of 44 detections were made at 9.1–11.0 mm; and 64 (97%) of 66 detections were made for lesions larger than 11.0 mm in diameter, although one 31-mm fibroadenoma was missed by one observer (Fig 3). According to the statistical model (Fig 2), the likelihood of detection increased by 12% (1.12; 95% confidence interval: 1.05, 1.3) per millimeter increase in size from 3 to 13 mm. Mean lesion size was 8.6 mm (standard error, 0.3) for lesions detected by at least seven investigators versus 5.2 mm (standard error, 0.2) for lesions detected by only two or three investigators (P < .001). Lesion detection rates did not differ for cysts (versus solid or complex lesions) or as a function of distance from the nipple.
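The empiric detection rates can be recomputed directly from the counts reported above. The sketch below uses only those counts and the number of investigators (11); the variable and function names are illustrative.

```python
N_INVESTIGATORS = 11

# (size bin, detections made, potential detections), as reported in the text
BINS = [
    ("<=3.0 mm", 6, 33),
    ("3.1-5.0 mm", 164, 374),
    ("5.1-7.0 mm", 145, 275),
    ("7.1-9.0 mm", 119, 176),
    ("9.1-11.0 mm", 38, 44),
    (">11.0 mm", 64, 66),
]


def detection_rates(bins):
    """Percentage of potential detections actually made, per size bin."""
    return {label: 100.0 * made / potential for label, made, potential in bins}


rates = detection_rates(BINS)
total_made = sum(made for _, made, _ in BINS)
total_potential = sum(potential for _, _, potential in BINS)
# Each lesion contributes one potential detection per investigator.
lesions_per_bin = {label: potential // N_INVESTIGATORS for label, _, potential in BINS}

for label, rate in rates.items():
    print(f"{label:>12}: {rate:5.1f}% of detections ({lesions_per_bin[label]} lesions)")
print(f"overall: {total_made}/{total_potential} = {100 * total_made / total_potential:.1f}%")
```

The bin totals reproduce the 536 of 968 potential detections given in the abstract, and dividing each bin's potential detections by 11 recovers the 88 unique lesions.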
For overall echotexture, κ was 0.30 (with κ of 0.30 for homogeneous, 0.19 for focally heterogeneous, and 0.41 for diffusely heterogeneous echotexture). No correlation was observed between heterogeneity of echotexture and breast density in this small number of patients. For palpability (yes or no), κ was 0.70 (92% agreement). For special cases (cyst, complicated cyst, clustered microcysts, intraductal mass, lymph node, scar, or not a special case), κ was 0.74. Specific designation of a lesion as a cyst or as a complicated cyst had a κ value of 0.59 or 0.52, respectively.
Special cases were excluded from subsequent feature analysis, which left 41 solid lesions and one potentially complex cystic lesion (by consensus) to be described (Table 2). Importantly, we found substantial agreement for margins with κ of 0.67, although only five (12%) of 42 lesions in our series were considered "not circumscribed" by consensus. For shape, κ was 0.62, and no lesions were described as "round" (Table 2). For echogenicity, orientation, and posterior features, κ values were 0.25, 0.72, and 0.38, respectively. Notably low κ values were seen for mixed hyperechoic-hypoechoic (κ = 0.02), isoechoic (κ = 0.22), hypoechoic (κ = 0.33), and complex cystic (κ = 0.35) echogenicities and for posterior enhancement or no posterior features (κ = 0.31 and 0.36, respectively).
For BI-RADS final assessments, κ was 0.52 (standard error, 0.03). When we grouped categories 1 and 2 and categories 4A, 4B, 4C, and 5 and considered category 3 as its own category, κ was 0.56 (standard error, 0.04), and when we dichotomized the assessments into BI-RADS categories 1, 2, or 3 versus categories 4A, 4B, 4C, or 5, κ was 0.48 (standard error, 0.04). Clinically important disagreement (eg, BI-RADS final assessment of category 4A, 4B, 4C, or 5 by consensus and 1, 2, or 3 by readers) was found in 8.5% of lesion interpretations.
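The effect of collapsing BI-RADS categories before computing agreement can be illustrated with a small sketch. It uses a plain two-rater Cohen κ between one reader and the consensus, which is a simplification: the text does not state which multireader estimator was used. The eight assessments are hypothetical and are not study data; all names are illustrative.

```python
from collections import Counter

# Three-way grouping described in the text: 1-2, 3, and 4A-5.
THREE_WAY = {"1": "1-2", "2": "1-2", "3": "3",
             "4A": "4-5", "4B": "4-5", "4C": "4-5", "5": "4-5"}
# Dichotomy described in the text: 1-3 versus 4A-5.
TWO_WAY = {cat: ("4-5" if group == "4-5" else "1-3")
           for cat, group in THREE_WAY.items()}


def cohen_kappa(a, b):
    """Unweighted Cohen kappa for two equal-length lists of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


def collapsed_kappa(reader, reference, mapping):
    """Kappa after mapping each BI-RADS category to a coarser group."""
    return cohen_kappa([mapping[c] for c in reader],
                       [mapping[c] for c in reference])


# Hypothetical assessments for eight lesions (illustration only).
reader = ["2", "3", "4A", "2", "3", "5", "2", "3"]
reference = ["2", "3", "3", "2", "2", "4B", "2", "3"]

k3 = collapsed_kappa(reader, reference, THREE_WAY)
k2 = collapsed_kappa(reader, reference, TWO_WAY)
print(f"three-way kappa = {k3:.2f}, dichotomized kappa = {k2:.2f}")
```

Collapsing categories removes disagreements within a group (for example, 4B versus 5) but also raises chance agreement, so κ can move in either direction.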
Cystic Lesions
Of 88 lesions, 23 (26%) were considered simple cysts by consensus, nine (10%) were complicated cysts, nine (10%) were clustered microcysts, three (3%) were lymph nodes, and 42 (48%) were not special cases (ie, not lymph nodes, scars, intraductal masses, or calcifications, which leaves these to be solid [n = 41] or complex cystic [n = 1] masses, hereafter referred to as "solid"). Of 142 detections of lesions classified as cysts by consensus (ie, "consensus cysts"), 108 (76.1%) detections were classified as cysts, 15 (11%) as complicated cysts, four (2.8%) as clustered microcysts, and 15 (11%) as solid lesions by individual investigators (Fig 5). There was greater agreement on classification of lesions as simple cysts when they were larger than 8.0 mm (Table 3). Of 34 detections of consensus cysts 4.0 mm or smaller, seven (21%) were considered solid by the detecting investigator, in comparison with five (13%) of 39 detections of cysts 4.1–6.0 mm, three (8%) of 37 detections of cysts 6.1–8.0 mm, and no detections of cysts larger than 8.0 mm (ie, overall, 15 [11%] of 142 detections of consensus cysts were classified as solid by an individual investigator, and all of those disagreements were for lesions 8.0 mm in mean diameter or smaller).
For the vast majority of lesions for which the consensus was a solid mass other than a lymph node or intraductal mass, individual interpretations concurred. Specifically, of 285 individual interpretations of such lesions, 257 (90.2%) agreed. Only 12 (4.2%) individual interpretations of such lesions classified such lesions as cysts, and another five (1.8%) interpretations classified them as complicated cysts (Table 3).
Despite variability in individual investigator characterization of lesions as simple cysts, management recommendations were generally consistent when analyzed according to individual lesion descriptions (Table 3). Of 137 individual interpretations of lesions as simple cysts, 127 (92.7%) were appropriately assessed as BI-RADS category 2, benign. Another six (4.4%) interpretations of lesions as simple cysts were assessed as BI-RADS category 3, probably benign, and one (0.7%) as BI-RADS category 4 (three interpretations not recorded). All individual interpretations of "cysts" that gave a final assessment of BI-RADS category 3 or 4 were of lesions classified as solid by consensus. Of 52 individual interpretations of lesions as complicated cysts, 29 (56%) interpretations were classified as BI-RADS category 2, another 22 (42%) were classified as BI-RADS category 3, and one (2%) was classified as BI-RADS category 4. Of 24 individual interpretations of lesions as clustered microcysts, 19 (79%) classified the lesion as BI-RADS category 2 and five (21%) classified the lesion as BI-RADS category 3.
Examination Time
The average time to complete a bilateral whole-breast US examination in this group of patients, which was enriched in lesions, was 31 minutes (median, 30 minutes; standard deviation, 11 minutes; range, 3–59 minutes). Time to scan did not generally correlate with the number of lesions per participant, the percentage of lesions detected, or the observer. The minimum average time to scan both breasts was 25 minutes for a woman with seven lesions, and the maximum average time was 42 minutes for a woman with 12 lesions. One outlying investigator averaged 46 minutes per patient (median, 45 minutes). By comparison, the range of mean times for all other investigators was 23–36 minutes (range of median times, 19–32 minutes). Not surprisingly, the one outlying observer also had the highest detection rates. The minimum total time spent scanning 10 women by 11 observers was 178 minutes, and the maximum total time was 514 minutes. There were insufficient data to characterize the potential effect of investigator fatigue on lesion detection.
| DISCUSSION |
|---|
Despite the added potential variability of US compared with other imaging modalities, Bosch et al (24) found high interexamination agreement in both detection and classification across three observers independently performing real-time whole-breast US in 58 patients and 113 breasts; 60% of breasts had a lesion, and 10% of lesions were malignant. The κ values were 0.72–0.75 between pairs of observers, indicating excellent reliability (24); values decreased slightly to a mean κ of 0.65 when normal breasts were excluded, and values further decreased to κ of 0.55 in the 32 dense breasts evaluated (compared with κ of 0.82 in nondense breasts). Of importance, these κ values exceeded those of mammography across the same observers in the same patients (24). In the study by Bosch et al (24), a resident with experience performing 500 breast US examinations performed on par with the two senior investigators, who had 35 years of experience each.
We found high reliability for the recording of lesion location and size in our series. Across seven single-center studies of screening breast US (reviewed in reference 25), 6.6% of women required short-interval follow-up of probably benign findings (21) seen only at US. Our results suggest that such follow-up is feasible as lesion size and location are consistently recorded. We considered the mean lesion diameter for each lesion, which was more consistent than individual diameters, and specified standardized technique to record the diameters (ie, the largest horizontal diameter, the vertical measure at that level, and the horizontal perpendicular diameter were documented). To facilitate lesion follow-up, it is strongly recommended that the lesion location be specified according to clockface, as in our series, rather than just according to quadrant. We would further suggest that lesion distance from the nipple be recorded in centimeters, as in our series, rather than by using the A-B-C classification system, which divides the breast into thirds from the nipple to the axilla. The transducer itself serves as an internal calibration tool for measurement of distance to the nipple from the center of the lesion.
While the primary goal of our study was assessment of reliability, we were also able to examine consistency of lesion detection because of the large number of lesions present in our participants. In our study, detection was seen to vary with lesion size, with only an approximately 53% detection rate for lesions 5.1–7.0 mm in size, a less than 70% detection rate for lesions 7.1–9.0 mm in size, and reliable detection (97%) only once mean lesion diameter was larger than 11.0 mm. In phantom scanning qualification tasks for the ACRIN 6666 protocol (26), lesion detection was reliable for lesions 5 mm or larger, which appears to overestimate clinical performance.
We did not evaluate the role of increasing breast size or lesion depth on detection in our series. The depth of the center of the lesion from the skin surface is being recorded for lesions identified in the ACRIN 6666 protocol (www.acrin.org). Variations in pressure used while scanning will affect not only depth and lesion diameters but also feature analysis (27). Having the lesion depth recorded and achieving the same depth at follow-up imaging will again facilitate comparison of lesion measurements over time. The surrogate measures of bra cup size and breast thickness are being used as indicators of breast size in the ACRIN 6666 protocol, and lesion depths are being recorded, although the effect of breast size on lesion detection at US has not yet been systematically studied in the clinical setting. Lesion detection decreased with increasing lesion depth in phantom studies (26).
When considering screening for breast cancer, the goal is detection of breast cancer when it has a good prognosis, that is, cancers 10 mm or smaller in size with negative lymph nodes (28,29). While the mean size of cancers detected in seven reported series of screening US to date is 9–11 mm (seven series reviewed in reference 25), results from our series suggest that detecting lesions much smaller than 9 mm in size may be more variable with whole-breast US with a handheld transducer, although we had only benign lesions in our series, and operators may have dismissed small cysts.
In screening and, to a lesser degree, diagnostic applications, it is critical to minimize unnecessary biopsies. Accurate lesion characterization is critical to successful use of breast US. Classically, characterization of a lesion as a simple cyst is considered to be nearly 100% reliable with strict adherence to defined criteria: anechoic, circumscribed mass (with emphasis on the presence of a well-defined back wall), with posterior enhancement (30). In phantom studies, Berg et al (26) reported that characterization of lesions designed (31) to be "simple cyst" equivalents was unreliable for lesions 3 mm in size or smaller. In our current series, we found inconsistent characterization of cysts smaller than 8 mm in mean diameter, with difficulty distinguishing a simple cyst from a solid lesion in 15 (14%) of 110 detections of consensus cysts 8 mm in size or smaller. Even for cysts 9 mm and larger, six (19%) of 32 observations described the lesion as a complicated cyst. At least some of the variability in distinguishing simple cysts from solid lesions likely relates to variable user-defined gain and possibly also to differences in scanning pressure and resultant lesion depth.
To be considered a complicated cyst, a lesion must be circumscribed with posterior enhancement and homogeneous low-level echoes (32). By protocol (www.acrin.org), complicated cysts can be dismissed as benign when there are multiple bilateral cysts and complicated cysts or when the complicated cyst demonstrates mobile internal echoes or a fluid-debris level. An isolated complicated cyst can be classified as probably benign (BI-RADS category 3), with recommendation for short interval follow-up. As such, it is not surprising that 56% of "complicated cysts" were classified as benign and 42% were classified as probably benign in our series. Investigators appropriately (27) classified clustered microcysts as either benign or probably benign as well.
Management was more consistent with the description of the lesion by any given observer than was management of a given lesion across observers. Specifically, for consensus cysts, complicated cysts, and clustered microcysts, individual interpretations of these lesions as "cysts" all carried a BI-RADS assessment of 2, benign. When a consensus cyst was considered solid by any observer, that observer nearly always classified the lesion as BI-RADS category 3, probably benign. As might be expected, when a lesion considered clustered microcysts by consensus was thought to be at least partially solid by another observer, that observer classified the lesion as BI-RADS category 4, suspicious, which is in keeping with a 23% risk of malignancy for complex cystic and solid masses (33).
We found moderate agreement on lesion management, with κ of 0.52 for BI-RADS final assessments, which is consistent with prior work in breast imaging and other areas of radiology (34–38). Authors of several other studies have reported moderate agreement in breast US hard-copy image interpretation (36,39). Skaane et al (39) reported slightly lower interobserver agreement for management based on US than for mammography or combined readings, with a mean κ value of 0.48 for hard-copy US images in comparison with κ values of 0.58 for mammography and 0.71 for combined readings. Baker et al (36) reported κ of 0.51 for management based on US images.
We found substantial agreement for description in all special case categories used, including cyst, complicated cyst, clustered microcysts, lymph node, and postsurgical scar. This is reassuring because each of these special cases has management implications (21). We did not have participants with intraductal masses, although biopsy is appropriate for that category of special case, with an assessment of BI-RADS category 4A (www.acrin.org). Of importance, assessment of margins as circumscribed or not circumscribed showed substantial agreement (κ of 0.67). Moderate to substantial agreement was noted for suspicious features, such as irregular shape and posterior shadowing. Only fair agreement was noted for posterior enhancement or no posterior features in this series, which was probably due to extensive use of spatial compounding, which reduces conspicuity of posterior features (40–42). Similarly, fair agreement was seen for description of echogenicity, which likely reflects the artificial exclusion of all anechoic cysts from this calculation (as they were considered special cases), as well as the large number of choices and subtlety of distinguishing masses that are isoechoic to fat from those that are slightly hypoechoic to fat in this population enriched in presumed fibroadenomas.
The palpability of a mass generally also plays a role in management recommendations. Criteria for surveillance were developed for nonpalpable lesions (21,43,44), although Graf et al (45) reported success in following palpable masses that appeared benign at both mammography and US. We found substantial agreement on the determination of the palpable status of a mass, with κ of 0.70.
The mean amount of time to perform a bilateral whole-breast US study, at 31 minutes in our study, far exceeds values previously reported (16,46) and does not include the time to generate a clinical report. While the increased time in our study is certainly related to the increased number of lesions present per participant, our results highlight some of the resource problems encountered when considering offering screening breast US. Reimbursement for breast US at present is standardized, regardless of whether the examination is targeted to a palpable cyst or requires documenting multiple bilateral solid lesions.
There are potential limitations to our study. We established a standardized scanning and documentation protocol (www.acrin.org), which all investigators were instructed to use and which room monitors helped enforce as they requested missing information. This certainly improved completeness of documentation. Survey scanning was performed in transverse and sagittal planes to facilitate thoroughly covering the entire breast. Radial and anti-radial scanning may be appropriate for lesion characterization (44), but such transducer orientations result in duplicative coverage of the periareolar tissues and reduced coverage of the peripheral breast. The scanning, documentation, and interpretation methods used in our series are believed to be generalizable and their use is encouraged, although the presence of a standard protocol does not ensure its implementation. Baker and Soo (47) reviewed static images from 152 US examinations from 86 institutions and found that over 60% failed to comply with at least one of the then-existing guidelines set by the American College of Radiology (2), with deficiencies most notable for equipment, focal zones, lesions documented in only one plane, or patient identifiers. Investigators in our study were required to use US machines from multiple different manufacturers: They may or may not have been familiar with the unit and its controls, which may have reduced detection or hampered lesion classification.
The population of participants and lesions in our series was highly enriched in multiple, bilateral circumscribed solid masses and cysts. Insofar as detection of suspicious lesions is the primary goal of any imaging test, investigators may have seen but dismissed many benign lesions, particularly cysts and intramammary lymph nodes. As such, our results likely underestimate rates of detection that would be observed with suspicious findings. If anything, the presence of multiple, bilateral benign findings makes bilateral whole-breast US more challenging: Our results may underestimate agreement on final assessments. Bosch et al (24) found higher agreement for negative cases than for positive cases. While the population in our series is not representative of lesions undergoing biopsy, it is representative of false-positive findings in screening US. Indeed, one participant underwent biopsy of a stereotactically placed biopsy collagen plug and clip on the basis of these scans.
In our series, all investigators participated in training in BI-RADS for US (22) and a US interpretive skills task for ACRIN 6666 (www.acrin.org) the day prior to reproducibility studies, which may have improved interpretive skills. Training in BI-RADS for mammography has been shown to improve reader performance (48), and the same is probably true for BI-RADS for US. The phantom and interpretive skills tasks can be made available to interested individuals: We expect these results to be generalizable to those individuals who complete the same qualification tasks and follow a similar protocol.
Success in lesion detection with US is likely influenced by operator fatigue, large breast size, multiplicity of findings, and lesion depth. Each of these factors may have, in fact, hampered success in our study, though they were not able to be specifically analyzed.
In summary, by using a standardized scanning protocol and interpretive criteria, we found that fewer than half of lesions 5.0 mm or smaller were detected, detection of lesions 5.1–11.0 mm in mean diameter was variable, and detection of lesions larger than 11.0 mm was reliable; high reliability was demonstrated for reporting lesion size and location, as was moderate agreement for lesion management with physician-performed whole-breast US.
| ADVANCES IN KNOWLEDGE |
|---|
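Substantial agreement was found for the shape and margins of solid lesions, with κ of 0.62 and 0.67, respectively.
Agreement on final assessments was moderate (κ of 0.52) and was comparable with that at mammography and breast MR imaging.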
| FOOTNOTES |
|---|
Abbreviations: ACRIN = American College of Radiology Imaging Network, BI-RADS = Breast Imaging Reporting and Data System, ICC = intraclass correlation coefficient
Author contributions: Guarantors of integrity of entire study, W.A.B., E.B.M.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, W.A.B., E.B.M.; clinical studies, W.A.B., E.B.M.; statistical analysis, W.A.B., J.D.B., J.B.C.; and manuscript editing, all authors
Authors stated no financial relationship to disclose.