Radiology
Published online before print April 26, 2006, 10.1148/radiol.2393051069
(Radiology 2006;239:693-702.)
© RSNA, 2006


Breast Imaging

Lesion Detection and Characterization in a Breast US Phantom: Results of the ACRIN 6666 Investigators1

Wendie A. Berg, MD, PhD, Jeffrey D. Blume, PhD, Jean B. Cormack, PhD, Ellen B. Mendelson, MD and Ernest L. Madsen, PhD

1 From American College of Radiology Imaging Network, American Radiology Services, Johns Hopkins Green Spring, 10755 Falls Rd, Lutherville, MD 21093 (W.A.B.); Center for Statistical Sciences, Brown University, Providence, RI (J.D.B., J.B.C.); Department of Radiology, Northwestern University School of Medicine, Chicago, Ill (E.B.M.); and Department of Medical Physics, University of Wisconsin, Madison, Wis (E.L.M.). Received June 25, 2005; revision requested August 16; revision received September 1; final version accepted September 21. Supported by grants from the National Cancer Institute (CA89008) and the Avon Foundation. Address correspondence to W.A.B. (e-mail: wendieberg{at}hotmail.com).


    ABSTRACT
Purpose: To prospectively evaluate ultrasonographic (US) lesion detection and characterization in a breast phantom by potential investigators in a screening US protocol, American College of Radiology Imaging Network (ACRIN) 6666.

Materials and Methods: National Cancer Institute Cancer Experimental Therapeutic Protocol review and ACRIN internal institutional review board approved the protocol; potential investigators were informed of the study purpose prior to participation. Six equivalent anthropomorphic phantoms were prepared with 17 masses (2–10 mm in mean diameter) in different locations at different depths. Sixty-six investigators, experienced in breast US, from 23 institutions scanned a phantom with high-frequency linear-array transducers (12–5 MHz). Lesion location, diameters, echogenicity, shape, and posterior features were recorded. Reader-specific phantom maps were generated and compared with known lesion locations and features. Results from 64 observers could be analyzed and were masked to investigator identity. Agreement on US features was measured with κ statistics. A generalized linear model generated log relative risks for detection rates as a function of lesion diameter, depth, and features.

Results: Of 17 lesions, a median of 14 (82%) were detected (range, 9–16), and 86% of observers detected at least 12 lesions. Of 1088 potential detections, 861 (79.1%) were made. Among 5–10-mm lesions, 499 (97.5%) of 512 detections were made (excluding a 6-mm "skin" lesion seen by only seven observers [11%]). One 4-mm mass was seen by 53 observers (83%). Among 3-mm lesions, 274 (71.4%) of 384 detections were made. One 2-mm lesion was seen by 28 (44%) observers. Relative risk of detection decreased to 0.55 (95% confidence interval: 0.51, 0.59) for each centimeter increase in lesion depth. Agreement was slight for lesion shape (κ = 0.14), substantial for echogenicity (κ = 0.61), and moderate for posterior features (κ = 0.45). Feature description errors were common for 2–4-mm lesions; only 33% of 3-mm anechoic masses were so characterized. Among eight 6–10-mm lesions, investigators erred in feature description of a median of 1 lesion (mean, 1.3; range, 0–4).

Conclusion: US detection and description of lesions in a breast phantom were highly consistent for lesions 5–10 mm in diameter; those smaller than 5 mm were less reliably identified or characterized by experienced investigators.



    INTRODUCTION
The reduction in breast cancer mortality attributable to widespread mammographic screening is almost entirely due to early detection (1). The goal of screening is the detection of breast cancer when it is stage 0 (ductal carcinoma in situ) or stage I (<2 cm in size, invasive, without spread to axillary lymph nodes). In particular, the best prognosis is seen with cancers smaller than 1 cm in size (2). While mammography has been shown to have 98% sensitivity for depicting cancer when the tissue is fatty (3,4), sensitivity is reduced to as low as 30%–48% (4–6) when the parenchyma is extremely dense (7).

Supplemental screening ultrasonography (US) has been shown to depict small invasive cancers not seen at mammography in dense breast tissue. Across multiple single-center trials totaling 42 838 examinations (4,8–12), 150 (0.35%) cancers were identified only at US in 126 women with average risk. The detection benefit was nearly all observed in heterogeneously dense or extremely dense breast tissue; over 90% of the women with cancers seen only at US had mammographic tissue density in those categories. Of the 150 cancers, 94% were invasive, with a mean size of 9–11 mm across the series (4,8–12). When staging was detailed, over 90% were node-negative. The generalizability of supplemental screening US is now the subject of a multicenter randomized trial, the American College of Radiology Imaging Network (ACRIN) protocol 6666 (13).

To participate as investigators in ACRIN protocol 6666, radiologist investigators were required to successfully complete several qualification tasks, including phantom scanning, training in the Breast Imaging Reporting and Data System (BI-RADS) for US (14), and interpretation of proved sets of US and mammographic images. Thus, the purpose of our study was to prospectively evaluate US lesion detection and characterization in a breast phantom by potential investigators in a screening US protocol, ACRIN 6666.


    MATERIALS AND METHODS
Phantom Preparation
Six anthropomorphic breast phantoms (15) were constructed by one of the authors (E.L.M.). In brief, a series of 17 simulated masses, which included eight simulated cysts, two hyperechoic lesions, and seven hypoechoic lesions (Table 1), were impaled on thin (0.3-mm) stainless steel wires in varying locations in each phantom. Fourteen lesions were spaced approximately every 2.5 cm; two lesions were embedded in a superficial "subareolar zone" centrally, and a "cyst" was placed in one corner within a superficial layer of fat-mimicking molten material (16). Molten parenchyma-mimicking material (17) was introduced and allowed to congeal, and then the wires were removed. Lesion depths ranged from 0.5 to 4.1 cm, lesion diameters ranged from 2 to 10 mm (Table 1), and the overall phantoms measured 10.0 x 15.5 x 6.0 cm. Alphanumeric coordinates were marked every 1 cm on the edges of the phantom. The six phantoms were identical except that the positions of 14 of the 17 lesions were randomized during construction (15).


Table 1. Simulated Masses in Breast Phantoms and Rates of Detection among 64 Radiologist Investigators

 
Investigator Participation
Each of the 66 potential investigators had a minimum experience of scanning and interpreting images from at least 500 breast US examinations in the prior 2 years. Investigators received a 1-hour didactic session in the BI-RADS for US (14). Investigators were shown representative images of the phantom and lesions and were instructed in how to record location, measurements, and features. They were then asked to scan one of the six phantoms by using a high-frequency linear-array transducer (12–5-MHz HDI 5000, Philips, Bothell, Wash or 13–7-MHz Sequoia, Acuson-Siemens, Malvern, Pa) and identify as many lesions as possible. Information about the total number and types of lesions in the phantom was available within the protocol, which had been sent to site principal investigators but was not specifically discussed. Use of spatial compounding was at the discretion of the investigators. Harmonic imaging was not employed.

Investigators were asked to report which phantom was scanned, lesion location (x and y alphanumeric coordinates and depth from the surface) to the nearest 0.5 cm, size (in all three perpendicular planes) to the nearest 0.1 mm, shape (either oval-round or irregular), echogenicity (anechoic, hyperechoic, complex cystic, isoechoic, hypoechoic), and posterior features (none, enhancement, shadowing, combined shadowing and enhancement). A nonradiologist assistant was allowed to record data while the investigator scanned the phantom. The data collection form provided space to detail 16 lesions.

Investigators were encouraged to identify as many lesions as possible. While there was no specific time constraint, the majority of investigators performed this task as part of a 2-day training session in the protocol at Northwestern University Medical Foundation (Chicago, Ill), and spending more than 1 hour performing this task was not realistic given the overall timing of the course.

Results were analyzed and masked to investigator identity according to a protocol approved by National Cancer Institute Cancer Experimental Therapeutics Protocol review and the ACRIN internal institutional review board. Potential investigators were informed of the purpose of the study prior to consenting to participate, and individuals had the option to not attempt to qualify as investigators for the ACRIN 6666 protocol. Phantom maps were created (J.B.C.) by using S-Plus 6.2 (Insightful, Seattle, Wash), with an overlay of the investigator's results and the known location, mean size, echogenicity, and posterior features of lesions in that phantom (Fig 1). Additional features were available in an Excel (Microsoft, Redmond, Wash) database and were occasionally needed to match lesions.


Figure 1: Sample map of phantom scanning results by investigator (red) and known locations of simulated masses (black). Numbers refer to the lesion recorded by the reader (red) or the actual lesion number (black, Table 1). Bold = hyperechoic lesion, □ = lesion with shadowing. Relative mean diameter of the lesions is drawn to scale, with the alphanumeric scale delimiting centimeter marks. In this example, lesions 14 and 17 were not identified, lesion 8 was identified twice (duplicate), and lesion 7 was underestimated in size by more than 1 mm.

 
The study chair (W.A.B.) reviewed all phantom maps and scored true-positive, false-positive, and duplicate detections, as well as errors in feature analysis. Scoring was as follows: one point for each unique true lesion identified, minus half a point for each false-positive lesion, and minus half a point for each lesion 6 mm in size or larger with incorrect feature analysis (echogenicity, shape, or mean size incorrect by more than 1 mm). Duplicate identification of the same lesion was recorded but was not included in the scoring. On the basis of preliminary studies of several phantoms by one of two breast imaging specialists (W.A.B. and E.B.M., with 10 and 25 years of experience, respectively), investigators were required to achieve a score of 11.5 or better out of a maximum of 17. Investigators were given the option to rescan another phantom (with the lesions in different locations) if needed. We present the results of the first attempt.
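The scoring rule described above can be written out as a short calculation. The following Python sketch is illustrative only: the function and variable names are hypothetical, and the example counts are invented for demonstration rather than taken from any investigator's results.

```python
PASSING_SCORE = 11.5  # qualification threshold stated in the text (maximum 17)


def qualification_score(true_positives, false_positives, feature_errors):
    """Return the phantom score for one reader.

    true_positives: unique true lesions identified (duplicates are not counted)
    false_positives: reported lesions that match no true lesion
    feature_errors: lesions 6 mm or larger with an incorrect feature analysis
    """
    return true_positives - 0.5 * false_positives - 0.5 * feature_errors


def qualifies(score):
    """A reader qualifies with a score of 11.5 or better."""
    return score >= PASSING_SCORE


# Hypothetical reader: 14 lesions found, 1 false positive, 2 feature errors.
example = qualification_score(true_positives=14, false_positives=1, feature_errors=2)
print(example, qualifies(example))  # 12.5 True
```

Because each false-positive finding or feature error costs half a point, a reader who detects 12 lesions can absorb at most one such error and still qualify.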

Two of 66 investigators submitted identical data sets; both were discarded. Thus, our results relate to 64 investigators. Those investigators who did not qualify at the initial attempt were asked to rescan the phantom. We examined errors in detection for all lesions as a function of size, depth, and features. We evaluated feature description by investigators (echogenicity, shape, or mean size incorrect by more than 1 mm) for lesions 6 mm in size or larger against the designed characteristics of each lesion. Inaccuracies in lesion location were also analyzed.

Statistical Methods
Data were analyzed at the Center for Statistical Sciences at Brown University (J.D.B. and J.B.C.), which serves as the biostatistics center for all ACRIN trials. Data were prospectively cleaned and monitored in a collaborative effort with ACRIN data management staff located at the American College of Radiology (Philadelphia, Pa). Statistical software SAS (version 8.0; SAS Institute, Cary, NC) and Stata (version 7.0; Stata, College Station, Tex) were used to process the data and facilitate statistical analyses. Initially, summary tables and simple frequencies were used to explore the data and check for outliers.

A primary objective was to estimate the reliability with which readers could identify lesion characteristics such as shape, size, depth, echogenicity, and posterior features. We tabulated the percentage of readers who correctly identified shape, echogenicity, and posterior features, and assessed their reliability by using the well-known κ statistic and generalized κ statistic (18), both of which account for agreement due to chance. We present an "average reader-specific κ" for agreement with the reference standard, which is simply the average of each of the individual reader's κ values against the reference standard for the feature of interest. We accounted for the natural clustering of outcomes by using Huber-White robust standard errors or by including a random effect in our regression models.
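As a minimal sketch of the "average reader-specific κ," the Python code below computes the ordinary (Cohen) κ for each reader against the reference standard and then averages the values. The function names and the example ratings are hypothetical. The sketch also assumes each reader's ratings are paired one-to-one with reference lesions; how undetected lesions were handled is not specified in the text.

```python
from collections import Counter


def cohen_kappa(reference, reader):
    """Cohen kappa between a reference standard and one reader's ratings."""
    if len(reference) != len(reader) or not reference:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reference)
    # Observed proportion of lesions on which reader and reference agree.
    observed = sum(r == s for r, s in zip(reference, reader)) / n
    # Agreement expected by chance from the marginal category frequencies.
    ref_counts = Counter(reference)
    reader_counts = Counter(reader)
    expected = sum(ref_counts[c] * reader_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def average_reader_kappa(reference, readers):
    """Mean of the individual reader kappa values against the reference."""
    return sum(cohen_kappa(reference, r) for r in readers) / len(readers)


# Hypothetical echogenicity ratings for six lesions.
truth = ["anechoic", "anechoic", "hypoechoic", "hypoechoic", "hyperechoic", "hyperechoic"]
reader_a = ["anechoic", "hypoechoic", "hypoechoic", "hypoechoic", "hyperechoic", "hyperechoic"]
reader_b = list(truth)  # a reader in perfect agreement with the reference

print(round(cohen_kappa(truth, reader_a), 3))
print(round(average_reader_kappa(truth, [reader_a, reader_b]), 3))
```

In this toy example reader A agrees on five of six lesions (observed agreement 0.83), but one-third agreement is expected by chance, so κ is lower than the raw agreement.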

For continuous features, such as size and depth, we report the average error in determination of lesion size and depth; a repeated measures regression model for size and depth was constructed to assess this measurement error. In addition, we modeled the detection rate as a function of these lesion characteristics by using a generalized linear model with binomial errors and log link function. The coefficients in this model are log relative risks, as opposed to the more standard log odds ratios obtained from logistic regressions. The advantage of our approach is that we can directly assess the relative risks, which are the true quantity of interest. (Odds ratios only approximate the relative risk.) This model was fit by using an iteratively reweighted least-squares algorithm designed to minimize the deviance function. Huber-White standard errors were used to account for clustering within a phantom.
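The distinction between the log link and the logistic (logit) link can be made explicit. The notation below is an assumption introduced for illustration and does not appear in the original: p_{ij} is the probability that reader i detects lesion j, and d_j and s_j are the depth and diameter of lesion j.

```latex
% Log-link binomial model: exponentiated coefficients are relative risks.
\[
  \log p_{ij} = \beta_0 + \beta_d\, d_j + \beta_s\, s_j + \cdots
  \quad\Longrightarrow\quad
  e^{\beta_d} = \frac{\Pr(\text{detect} \mid d_j + 1)}{\Pr(\text{detect} \mid d_j)} = \mathrm{RR}_d
\]
% Logistic model, for comparison: exponentiated coefficients are odds ratios.
\[
  \log \frac{p_{ij}}{1 - p_{ij}} = \gamma_0 + \gamma_d\, d_j + \gamma_s\, s_j + \cdots
  \quad\Longrightarrow\quad
  e^{\gamma_d} = \mathrm{OR}_d
\]
```

When detection is common, as it is here, the odds ratio can differ substantially from the relative risk, which is why the log link gives the more directly interpretable coefficient.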

We also sought to identify reader characteristics, such as experience, that were associated with poor or good performance. Readers were asked how many years they had practiced breast imaging (<2, 2–5, 6–10, or >10 years), what percentage of time they spent in breast imaging (by quartiles), who performs breast imaging in their practices (themselves, resident or fellow and then themselves, technologist, or technologist and then themselves), number of mammograms read per week, number of breast US examinations performed per week, and whether they were currently performing whole-breast US for screening, only for extent of disease, or neither. Regression models included these variables to assess their effect. Some analyses were also stratified by experience variables to check for trends.


    RESULTS
Results were analyzed for 64 investigators experienced in breast US from 23 institutions in the United States and Canada. Table 2 summarizes demographic information against reader performance in lesion detection. Of 64 radiologist investigators, 21 (33%) perform their own US in clinical practice, 11 (17%) have a trainee perform initial US and then repeat it, 31 (48%) first have a technologist perform the US, and one (2%) did not respond. Modeling suggested slight differences in performance on this task on the basis of who performs the scanning, but these differences were difficult to interpret. The two investigators who have technologists perform the scanning (who appeared to have improved performance compared with the other groups) were among the minority of investigators who performed the task separate from the training sessions without any time constraints. Those investigators reading at least 300 mammograms per week were more successful in detecting lesions in the US phantom than were those who read lower volumes of mammograms. Most investigators were performing whole-breast US prior to their planned participation in the ACRIN screening US trial: 36 (58%) of 62 investigators for both diagnostic and screening examinations and 15 (24%) for only women with newly diagnosed cancer in that breast; only 11 investigators (18%) were not previously performing or interpreting images from whole-breast US (sonography).


Table 2. Demographics of and Lesion Detection among 64 Radiologists

 
Lesion Detection
Of the 17 phantom lesions, a median of 14 (82%) were detected (mean, 13.6; range, 9–16), and 55 (86%) of 64 investigators detected at least 12 lesions. Of the 1088 potential lesion observations among all 64 readers, 861 observations (79.1%) were made.

The rate of lesion detection decreased as lesion diameter decreased and depth increased (Tables 1, 3). Of 512 potential observations of lesions 5–10 mm in diameter for all readers, 499 observations (97.5%) were made, excluding a 6-mm "skin" lesion that was seen by only seven (11%) of 64 observers (Fig 2). A 4-mm mass was seen by 53 (83%) of 64 observers. Of 384 potential observations of 3-mm lesions, 274 (71.4%) were made (Figs 3, 4), as were 28 (44%) of 64 potential observations of a 2-mm lesion (Fig 4). The rate of detection for the 4-mm lesion was significantly lower than rates for lesions of 5 mm or larger (P < .001). Those lesions that were 3 mm in diameter tended to be less likely to be detected than the lesion that was 4 mm in diameter (P = .053), and the 2-mm lesion was less likely to be detected than were the 3-mm lesions (P < .001). A relative risk regression model indicated that for each millimeter increase in lesion diameter from 2 to 5 mm, the relative risk of detection was 1.46 (95% confidence interval: 1.40, 1.54) (Table 3).
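The detection counts reported above can be checked against the overall total with a few lines of arithmetic. The Python sketch below uses only counts stated in the text; the grouping labels and variable names are illustrative.

```python
# (detected, potential) observations by lesion group, as reported in the text.
strata = {
    "5-10 mm parenchymal": (499, 512),
    "6 mm skin lesion": (7, 64),
    "4 mm": (53, 64),
    "3 mm": (274, 384),
    "2 mm": (28, 64),
}

detected = sum(d for d, _ in strata.values())
potential = sum(p for _, p in strata.values())

for label, (d, p) in strata.items():
    print(f"{label}: {d}/{p} = {100 * d / p:.1f}%")

# Totals should match 17 lesions x 64 readers = 1088 potential observations.
print(f"overall: {detected}/{potential} = {100 * detected / potential:.1f}%")
```

The group counts sum to 861 of 1088 potential observations (79.1%), which matches the overall detection rate given in the text.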


Table 3. Relative Risk of Detection by 64 Observers as a Function of Lesion Depth, Size, and Features for 17 Known Lesions in a Breast Phantom

 

Figure 2: Transverse US scan without spatial compounding shows superficial 6-mm "cyst," lesion 17 (arrowhead), reported by only seven (11%) of 64 readers. Its location in a corner of the phantom, as well as the reverberation artifacts, may have hampered lesion detection.

 

Figure 3: Sagittal panoramic spatially compounded US scan of phantom shows lesions 1, 11, 4, and 12. A 3-mm anechoic lesion (lesion 16) was not seen on this image, midway between lesions 11 and 4, but was identified by 59 (92%) of 64 readers. Lesion 1, a hyperechoic, round, 10-mm mass with posterior shadowing was identified by 62 readers (97%) and accurately characterized as hyperechoic by 58 (94%) and as showing posterior shadowing by 58 (94%). The 6-mm anechoic mass, lesion 11, was identified by 62 readers (97%) and characterized as anechoic by 42 (68%). Lesion 4, a 3-mm hyperechoic round mass, was reported by 50 readers (78%). Lesion 12, a hypoechoic mass with shadowing, was seen by 55 readers (86%) and accurately characterized as showing shadowing in 52 detections (95%).

 

Figure 4: Transverse US scan without spatial compounding shows 3-mm irregular hypoechoic lesion (lesion 5, arrowhead), which was detected by only 23 (36%) of 64 readers.

 
There was an inverse relationship between lesion depth and rate of detection: As depth increased, the rate of detection decreased. In our relative risk regression model, we found that each 1-cm increase in lesion depth reduced the relative risk of detection to 0.55 (95% confidence interval: 0.51, 0.59) (Table 3). Our regression model indicated that this relationship was further compounded for lesions smaller than 5 mm. Empirical evidence supports this: A 3-mm lesion at 1.5 cm depth was detected in 59 (92%) of 64 potential observations, while four 3-mm lesions at 2.6 cm depth were detected in 183 (71%) of 256 potential observations (P < .001). The one 3-mm lesion at 4.1 cm depth was detected by only 32 (50%) of 64 observers (P < .001).
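The empirical counts for the 3-mm lesions also illustrate why relative risks, rather than odds ratios, were modeled. The Python sketch below computes crude (unadjusted) ratios relative to the shallowest 3-mm lesion. These are simple two-group comparisons for illustration, not the model-based estimates in Table 3, and the function names are hypothetical.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Ratio of detection proportions, group A relative to group B."""
    return (events_a / total_a) / (events_b / total_b)


def odds_ratio(events_a, total_a, events_b, total_b):
    """Ratio of detection odds, group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Detections of 3-mm lesions by depth, as reported in the text.
shallow = (59, 64)   # 1.5 cm depth
middle = (183, 256)  # 2.6 cm depth (four lesions)
deep = (32, 64)      # 4.1 cm depth

for label, group in (("2.6 cm vs 1.5 cm", middle), ("4.1 cm vs 1.5 cm", deep)):
    rr = relative_risk(*group, *shallow)
    oratio = odds_ratio(*group, *shallow)
    print(f"{label}: RR = {rr:.2f}, OR = {oratio:.2f}")
```

Because detection of the shallow lesion was very common (92%), the crude odds ratios are far smaller than the corresponding relative risks, so reporting odds ratios would overstate the apparent effect of depth.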

Hyperechoic lesions were the easiest to detect, with a relative risk of detection of 7.13 (95% confidence interval: 3.0, 17.0) (Table 3). In modeling, hypoechoic lesions were easier to identify than were anechoic lesions, with a relative risk of detection of 2.14 (95% confidence interval: 1.04, 4.42) (Table 3, Fig 4), although empirically hypoechoic lesion 5 was less often identified than anechoic lesion 15, which was of the same diameter and at the same depth (Table 1). Lesions with posterior enhancement or, to a lesser degree, posterior shadowing, were more likely to be detected than were those with no posterior features (Table 3).

Most investigators reported neither false-positive findings (median, 0; mean, 0.4; range, 0–5) nor identification of the same lesion more than once ("duplicates," Fig 1; median, 0; mean, 0.6; range, 0–3). There were 47 (5%) duplicate lesion descriptions among 936 reported lesions.

Feature Analysis
Feature analysis errors were common for lesions 2–4 mm in diameter, with only 33% of 3-mm anechoic masses so characterized. For eight lesions 6–10 mm in diameter, investigators erred in the description of features for a median of 1 lesion (mean, 1.3; range, 0–4).

Though we dichotomized description of lesion shape (round-oval or irregular), agreement of observers with the true lesion shape was only slightly better than expected by chance, with an average κ of 0.14 (Table 4). Agreement was substantial for echogenicity, with an average κ of 0.61 (Table 4).


Table 4. Feature Analysis by 64 Observers for 17 Known Lesions in a Breast Phantom

 
The four anechoic lesions 6–10 mm in size (lesions 6, 9, 11, and 13) were recognized as anechoic in 215 (86.3%) of 249 observations and were described as hypoechoic in 28 observations (11.2%), complex cystic in five observations (2.0%), and hyperechoic in one observation (0.4%). Of the five anechoic lesions 6 mm in diameter or larger, lesion 11, at a depth of 4.1 cm, was the most difficult to so characterize (Figs 3, 5). The three 3-mm anechoic lesions (lesions 14, 15, and 16) were more likely to be reported as hypoechoic than as anechoic: They were reported as anechoic in 48 (33%) of 146 observations, as hyperechoic in nine (6%), as complex cystic in three (2%), and as hypoechoic in 86 (59%). The deepest 3-mm anechoic lesion (lesion 14) was so described by only one (3%) of 32 observers, while the most superficial (lesion 16) was so described by 30 (51%) of 59 observers.


Figure 5: Distinction of an anechoic "cyst" from a hypoechoic mass was problematic at times. Transverse US scan without spatial compounding shows two adjacent anechoic 6-mm lesions. The lesion to the left (arrow, lesion 11) appears hypoechoic on this image, possibly due to its depth in the phantom, and was considered anechoic by 42 (66%) of 64 observers, complex cystic by two (3%), and hypoechoic by 18 (28%). The lesion to the right (arrowhead, lesion 13) was correctly classified as anechoic by 56 (92%) of 61 observers and considered complex cystic by two (3%) and hypoechoic by three (5%).

 
Agreement was moderate for posterior features, with an average κ of 0.45 (Table 4). Use of spatial compounding was elective, and, when used, it decreased the conspicuity of posterior features (Fig 6).


Figure 6: Transverse US scans demonstrate that posterior features are more difficult to depict with spatial compounding. (a) Scan without spatial compounding shows hyperechoic lesion (left, lesion 1) with posterior shadowing (arrow) and hypoechoic lesion (right, lesion 2) with posterior shadowing (arrowhead). (b) Scan with spatial compounding shows the same lesions. Posterior shadowing is still evident for lesion 1 (arrow) but is much less apparent for lesion 2 (arrowhead). Of 62 readers who identified lesion 1, 58 (94%) recognized posterior shadowing, whereas only 34 (55%) described lesion 2 as having shadowing. We did not control for use of spatial compounding, which decreases conspicuity of posterior features.

 
On average, the x- and y-direction lesion location was described accurately to within 0.7 cm (range, 0–4.0 cm). Readers were asked to describe depth to within 0.5 cm and did so accurately to an average error of 0.3 cm, with a range of 0–2.2 cm.

Measurement of lesion diameters was generally highly accurate. The mean difference between actual largest diameter and measured largest diameter ranged from 0.3 to 0.7 mm for all "parenchymal" lesions, with absolute errors ranging from 1.0 to 4.1 mm and percentage errors ranging from 5% to 28%. Not surprisingly, the greatest percentage error in measurement of lesion diameter occurred for the smallest lesions (ie, those lesions 2–4 mm in diameter). For the 6-mm "skin" lesion, the mean difference in measured diameter compared with known diameter was 1.4 mm.

Overall Performance
Of a possible score of 17, the mean score was 12.8 (standard error, 0.3), with a range of 7–16. Of the 64 investigators, 12 (19%) failed to achieve a score of 11.5 or better on the first attempt.


    DISCUSSION
While screening breast US is not the standard of care at this time, more than half (58%) of the investigator observers interested in participating in the ACRIN 6666 trial of screening breast US were performing some screening US prior to the study. Similarly, Farria et al (19) reported that 35% of responding breast imagers surveyed in the Society of Breast Imaging in 2004 were offering screening breast US even though it remains investigational. Another 24% of radiologists in our study were performing whole-breast US to assess disease extent in women with newly diagnosed cancer.

Those radiologists who interpret at least 300 mammograms per week performed better, on average, than did those interpreting fewer mammograms. This is consistent with other observations of improved performance in breast imaging with greater specialization (20,21). The apparent improved performance of the two radiologists whose technologists perform the US scanning is not beyond what would be expected from random variability and is unlikely to be generalizable; indeed, those same investigators had significantly worse performance in interpreting breast US images in a separate qualifying task (22).

For screening US to be of practical value, cancers smaller than 10 mm in size must be reliably depicted. In the multicenter Radiologic Diagnostic Oncology Group V trial (23,24), which was conducted between 1994 and 1996, 561 (76.1%) of 737 nonpalpable breast masses undergoing biopsy were detectable at US by using 7.5-MHz transducers, including 33 (79%) of 42 masses smaller than 10 mm in size (Radiologic Diagnostic Oncology Group V, unpublished data, 2001). This is a considerable improvement over the 25% sensitivity of US for masses undergoing biopsy seen in a series in the early 1980s (25) and over the detection of only 8% of malignant lesions smaller than 10 mm in the early work of Sickles et al (26). In our series in a breast US phantom, by using higher-frequency transducers with an average center frequency of 11 MHz, 97.5% of "parenchymal" lesions 5–10 mm in diameter were detected across multiple observers.

Except for the extreme case of the "skin lesion," superficial lesions were more reliably detected and characterized than were deeper lesions. We did not control for field of view or focal zone settings used by investigators, and each of these can affect lesion and feature conspicuity. Broad bandwidth transducers use lower frequencies at greater depths, which results in decreased resolution for deeper lesions. The phantom is 5 cm thick when the retromammary fat pad is included, and the 3-mm "cyst" at 4 cm depth was particularly difficult for readers to identify. In practice, in the supine position two-thirds of breasts are less than 3 cm thick and 86% are less than 4 cm thick (ACRIN 6666, unpublished observations). The results from this phantom study suggest that performance of US may be diminished in large breasts, although clinical validation is warranted and is in progress (current protocols, ACRIN 6666; available at: www.acrin.org).

One superficial "cyst" (lesion 17) was placed in a corner of each phantom at the "skin" surface, and it was difficult to depict both because of its location near the edge of the phantom and because of the presence of reverberation artifact within it. Even with current 10–14-MHz transducers, the most superficial 7 mm of the breast is not optimally evaluated, as the beam cannot be focused more superficially. A glob of gel or standoff pad is still needed for optimal characterization of superficial masses.

In relative risk regression modeling, we found hypoechoic and isoechoic lesions were predicted to be easier to detect than anechoic lesions. In practice, such lesions can mimic fat lobules and go clinically undetected. As discussed below, the 3-mm lesions created as anechoic often appeared to contain minimal low-level echoes and were more often interpreted as hypoechoic; this likely falsely improved the apparent detection of lesions deemed "hypoechoic."

The presence of posterior shadowing or especially enhancement greatly facilitated lesion detection. In clinical practice, most cysts will demonstrate posterior enhancement, and this feature dominated our modeling for predicting successful detection compared with hypoechogenicity. Insofar as spatial compounding reduces visibility of posterior features (27,28), our results suggest that lesion detection may be improved if spatial compounding is "off" while surveying the breast, as in screening US. In this study, investigators could use spatial compounding if desired, but its use was not specifically recorded, so its effect could not be determined. When spatial compounding is processed line by line as frequency compounding, rather than frame by frame as true spatial compounding, posterior features may still be apparent, although such processing was not available at the time when our study was performed.

Lesion characterization was more accurate for lesions 5 mm and larger than for smaller lesions. Anechoic lesions mimicking simple cysts smaller than 5 mm were not accurately characterized by experienced investigators, and only 33% of 3-mm anechoic lesions were so recognized. In clinical practice, this may result in a need for excessive rates of follow-up or aspiration of incidental small cysts seen at US. Routine use of tissue harmonic imaging may help distinguish cystic from solid lesions (27,29), although this technique was not available on the equipment used in this study. Spatial compounding may facilitate improved characterization of margins and internal structure once a lesion is detected (27,28,30), although it is unlikely to facilitate proper determination of echogenicity.

Successful distinction of oval circumscribed masses from irregular masses and/or masses with margins that are not circumscribed is critical in appropriate patient care. Most incidental circumscribed oval or gently lobulated solid masses with no posterior features or minimal enhancement may be followed (31,32), whereas irregular masses require intervention. In the phantom, lesions were designed to be either round (n = 16) or irregular (n = 1). In practice, round lesions are more concerning than oval parallel lesions (14,31). As with the recognition of a lesion as anechoic, distinction of a mass as round from one that was irregular was not reliable for the one 4-mm irregular lesion in this phantom, and this again suggests inaccurate management may be more common for small lesions (<5 mm) in clinical practice.

The successful performance of US in both lesion detection and lesion characterization across the majority of the 64 observers in this phantom study is encouraging, though these idealized circumstances may overestimate reliability in clinical practice. Bosch et al (33) found high interexamination agreement in both detection and classification of lesions across three observers independently performing real-time whole-breast US in 58 patients and 113 breasts. The κ values were 0.72–0.75 between pairs of observers, which indicates excellent reliability (33). Of importance, κ values exceeded those for mammography across the same observers in the same patients (33).
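
For readers less familiar with the κ statistic, the following is a minimal sketch of Cohen's κ for a single pair of observers. The ratings are invented purely for illustration and are not data from Bosch et al (33) or from this study, and the function name cohen_kappa is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers rating the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each observer's marginal rates.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Fraction of cases on which the two observers gave the same label.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of the two observers' marginal proportions,
    # summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical classifications of ten lesions by two observers.
    observer_1 = ["cyst", "cyst", "solid", "solid", "cyst",
                  "solid", "cyst", "solid", "cyst", "solid"]
    observer_2 = ["cyst", "cyst", "solid", "solid", "cyst",
                  "solid", "solid", "solid", "cyst", "cyst"]
    print(f"kappa = {cohen_kappa(observer_1, observer_2):.2f}")
```

In this invented example the observers agree on 8 of 10 lesions, but half of that agreement would be expected by chance alone, so κ is lower than the raw agreement rate. This is why κ, rather than percentage agreement, is reported when comparing observers.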

The phantoms were constructed in a standard fashion, although we did identify a few false-positive hyperechoic lesions in one phantom, which may have confused participating investigators and contributed to premature satisfaction of search for a few observers. One superficial lesion (lesion 8) was designed to be hypoechoic, although it was closer to isoechoic in some of the phantoms, so that reader agreement on echogenicity was not included for that lesion.

One of the challenging aspects of this reader qualification task was the multiplicity of lesions. The systematic recording of at least 12 lesions, including location, size, and features, is laborious and may result in reader fatigue. We found a 5% rate of duplicate lesion reporting. The investigators were experienced in the performance of breast US and interpretation of breast US images, had knowledge that their performances were being evaluated based on lesion detection, and had no specific time constraint. Despite this, there were two readers who detected as few as nine (53%) of 17 lesions. In clinical practice, multiple bilateral lesions may be difficult to accurately report and follow with freehand US, although further study of this issue is warranted.

Despite variability in lesion detection, lesion diameter was reliably recorded for all lesions. This is reassuring when contemplating short-interval follow-up of probably benign lesions seen only at US. The minimal variability in phantom lesion measurements may underestimate variability in clinical practice, however, as neither the phantoms nor individual lesions were compressible. Of importance, and not unexpected, measurement of diameter was more accurate as a percentage of the total diameter for lesions 5 mm and larger. In ACRIN protocol 6666, an increase of 20% or more in lesion volume is considered a true increase (www.acrin.org). Such a calculation may be invalid with lesions 4 mm in diameter or smaller, if the error in each diameter exceeds 20% as seen in this study for such small lesions, though further validation is needed in clinical practice.
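
The concern about the 20% volume threshold can be illustrated with a short calculation. The sketch below assumes that lesion volume is estimated with the ellipsoid formula from three orthogonal diameters and that every diameter is overestimated by the same relative amount. Both are assumptions made for illustration, since the volume calculation is not specified in this section, and the function names are hypothetical.

```python
import math


def ellipsoid_volume(d1_mm, d2_mm, d3_mm):
    """Volume (mm^3) of an ellipsoid with three orthogonal diameters in mm."""
    return math.pi / 6.0 * d1_mm * d2_mm * d3_mm


def apparent_volume_change(relative_diameter_error):
    """Fractional volume change produced purely by measurement error when
    each of the three diameters is off by the same relative amount."""
    return (1.0 + relative_diameter_error) ** 3 - 1.0


if __name__ == "__main__":
    # Hypothetical 4-mm round lesion that has not actually changed in size.
    true_volume = ellipsoid_volume(4.0, 4.0, 4.0)
    # The same lesion with every diameter over-measured by 20%.
    measured_volume = ellipsoid_volume(4.8, 4.8, 4.8)
    change = measured_volume / true_volume - 1.0
    print(f"apparent volume change: {change:.1%}")
    print(f"exceeds 20% threshold: {change >= 0.20}")
```

Under these assumptions, a 20% error in each diameter inflates the calculated volume by about 73%, and an error of only about 7% in each diameter is already enough to cross the 20% volume threshold. Errors in different directions would partly cancel, so this is a worst-case illustration rather than an estimate of clinical error.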

Performance of experienced, trained investigators participating in this study may exceed that in routine practice. Sickles et al (20) reported that specialists in breast imaging detected 6 cancers per 1000 screening mammography examinations compared with 3.4 cancers per 1000 detected by general radiologists. This improved performance was seen together with a lower recall rate among the specialists (20). Linver et al (34), and, more recently, Berg et al (35), showed that mammographic interpretive skills could be improved through training. The training materials developed for the ACRIN 6666 investigators are available on request.

There are several other limitations to our study. Investigators could spend an hour scanning the phantom, which is unrealistic in usual clinical practice. Investigators knew that the phantoms had a number of lesions, and they could have accessed a key that described how many lesions and what types were present. Detection rates are therefore likely overestimated, as US detection is facilitated by knowledge that a lesion is present (as with second-look US following magnetic resonance imaging). Consistency of reporting lesion location, diameter, and depth is likely overestimated because the lesions in these phantoms are fixed, and the phantom itself is not compressible. In this phantom, only one irregular lesion was included. While not a goal of our series or this task, it may be desirable to develop a similar phantom that emphasizes subtle distinction of lesion margins as circumscribed or not or of shape as irregular or oval in order to measure reader performance in distinguishing lesions that require biopsy from those that can be followed.

In summary, 97.5% of parenchymal lesions 5–10 mm in diameter were detected in a breast US phantom; lesions smaller than 5 mm were less consistently identified and were not accurately characterized by experienced investigators. We anticipate similar results in clinical studies of breast US, and validation of these results is ongoing.




    ACKNOWLEDGMENTS
 
The authors acknowledge the outstanding efforts of Cynthia Olson, MBA, MHS, and Donna Hartfeil, BSN, and the staff of the Northwestern Medical Foundation Breast Center in coordinating investigator training sessions.


    FOOTNOTES
 

Abbreviations: ACRIN = American College of Radiology Imaging Network • BI-RADS = Breast Imaging Reporting and Data System

See also the article by Madsen et al in this issue.

Author contributions: Guarantors of integrity of entire study, W.A.B., J.D.B., E.B.M.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, W.A.B.; experimental studies, W.A.B., E.B.M., E.L.M.; statistical analysis, W.A.B., J.D.B., J.B.C.; and manuscript editing, all authors.

Authors stated no financial relationship to disclose.


    References

  1. Paci E, Duffy SW, Giorgi D, et al. Quantification of the effect of mammographic screening on fatal breast cancers: The Florence Programme 1990–96. Br J Cancer 2002;87:65–69.
  2. Tabar L, Vitak B, Chen HH, et al. The Swedish Two-County Trial twenty years later: updated mortality results and new insights from long-term follow-up. Radiol Clin North Am 2000;38:625–651.
  3. Kerlikowske K, Grady D, Barclay J, Sickles EA, Ernster V. Effect of age, breast density, and family history on the sensitivity of first screening mammography. JAMA 1996;276:33–38.
  4. Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 2002;225:165–175.
  5. Mandelson MT, Oestreicher N, Porter PL, et al. Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 2000;92:1081–1087.
  6. Berg WA, Gutierrez L, Nessaiver MS, et al. Diagnostic accuracy of mammography, clinical examination, US, and MR imaging in preoperative assessment of breast cancer. Radiology 2004;233:830–849.
  7. D'Orsi CJ, Bassett LW, Berg WA, et al. Illustrated Breast Imaging Reporting and Data System (BI-RADS): mammography. 4th ed. Reston, Va: American College of Radiology, 2003.
  8. Gordon PB, Goldenberg SL. Malignant breast masses detected only by ultrasound: a retrospective review. Cancer 1995;76:626–630.
  9. Buchberger W, Niehoff A, Obrist P, DeKoekkoek-Doll P, Dunser M. Clinically and mammographically occult breast lesions: detection and classification with high-resolution sonography. Semin Ultrasound CT MR 2000;21:325–336.
  10. Kaplan SS. Clinical utility of bilateral whole-breast US in the evaluation of women with dense breast tissue. Radiology 2001;221:641–649.
  11. Crystal P, Strano SD, Shcharynski S, Koretz MJ. Using sonography to screen women with mammographically dense breasts. AJR Am J Roentgenol 2003;181:177–182.
  12. Leconte I, Feger C, Galant C, et al. Mammography and subsequent whole-breast sonography of nonpalpable breast cancers: the importance of radiologic breast density. AJR Am J Roentgenol 2003;180:1675–1679.
  13. Berg WA. Rationale for a trial of screening breast ultrasound: American College of Radiology Imaging Network (ACRIN) 6666. AJR Am J Roentgenol 2003;180:1225–1228.
  14. Mendelson EB, Baum JK, Berg WA, Merritt CRB, Rubin E. Illustrated Breast Imaging Reporting and Data System (BI-RADS): ultrasound. Reston, Va: American College of Radiology, 2003.
  15. Madsen EL, Berg WA, Mendelson EB, Frank GR. Anthropomorphic breast phantoms for qualification of investigators for ACRIN protocol 6666. Radiology 2006;239:869–874.
  16. Madsen EL, Zagzebski JA, Frank GR. Oil-in-gelatin dispersions for use as ultrasonically tissue-mimicking materials. Ultrasound Med Biol 1982;8:277–287.
  17. Madsen EL, Kelly-Fry E, Frank GR. Anthropomorphic phantoms for assessing systems used in ultrasound imaging of the compressed breast. Ultrasound Med Biol 1988;14(suppl 1):183–201.
  18. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
  19. Farria DM, Schmidt ME, Monsees BS, et al. Professional and economic factors affecting access to mammography: a crisis today, or tomorrow?—results from a national survey. Cancer 2005;104:491–498.
  20. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002;224:861–869.
  21. Smith-Bindman R, Chu P, Miglioretti DL, et al. Physician predictors of mammographic accuracy. J Natl Cancer Inst 2005;97:358–367.
  22. Berg WA, Blume J, Cormack JB, Mendelson EB. Effect of experience variables on interpretive performance in breast ultrasound: results from ACRIN protocol 6666 (abstr). In: Radiological Society of North America Scientific Assembly and Annual Meeting Program. Oak Brook, Ill: Radiological Society of North America, 2004; 317.
  23. Fajardo LL, Pisano ED, Caudry DJ, et al. Stereotactic and sonographic large-core biopsy of nonpalpable breast lesions: results of the Radiologic Diagnostic Oncology Group V study. Acad Radiol 2004;11:293–308.
  24. Pisano ED, Fajardo LL, Caudry DJ, et al. Fine-needle aspiration biopsy of nonpalpable breast lesions in a multicenter clinical trial: results from the Radiologic Diagnostic Oncology Group V. Radiology 2001;219:785–792.
  25. Weber WN, Sickles EA, Callen PW, Filly RA. Nonpalpable breast lesion localization: limited efficacy of sonography. Radiology 1985;155:783–784.
  26. Sickles EA, Filly RA, Callen PW. Breast cancer detection with sonography and mammography: comparison using state-of-the-art equipment. AJR Am J Roentgenol 1983;140:843–845.
  27. Seo BK, Oh YW, Kim HR, et al. Sonographic evaluation of breast nodules: comparison of conventional, real-time compound, and pulse-inversion harmonic images. Korean J Radiol 2002;3:38–44.
  28. Huber S, Wagner M, Medl M, Czembirek H. Real-time spatial compound imaging in breast ultrasound. Ultrasound Med Biol 2002;28:155–163.
  29. Szopinski KT, Pajk AM, Wysocki M, Amy D, Szopinska M, Jakubowski W. Tissue harmonic imaging: utility in breast sonography. J Ultrasound Med 2003;22:479–487.
  30. Merritt C, Piccoli C, Forsberg F, Wilkes A, Cavanaugh B, Lee S. Real-time spatial compound imaging of the breast: clinical evaluation of masses (abstr). Radiology 2000;217(P):491–492.
  31. Stavros AT, Thickman D, Rapp CL, Dennis MA, Parker SH, Sisney GA. Solid breast nodules: use of sonography to distinguish between benign and malignant lesions. Radiology 1995;196:123–134.
  32. Berg WA. Breast ultrasonography: cystic lesions and probably benign findings. In: Berg WA, Javitt MC, eds. Women's imaging: strategies for clinical practice—categorical course syllabus. Leesburg, Va: American Roentgen Ray Society, 2004; 95–102.
  33. Bosch AM, Kessels AG, Beets GL, et al. Interexamination variation of whole breast ultrasound. Br J Radiol 2003;76:328–331.
  34. Linver MN, Paster SB, Rosenberg RD, Key CR, Stidley CA, King WV. Improvement in mammography interpretation skills in a community radiology practice after dedicated teaching courses: 2-year medical audit of 38,633 cases. Radiology 1992;184:39–43.
  35. Berg WA, D'Orsi CJ, Jackson VP, et al. Does training in the Breast Imaging Reporting and Data System (BI-RADS) improve biopsy recommendations or feature analysis agreement with experienced breast imagers at mammography? Radiology 2002;224:871–880.



