DOI: 10.1148/radiol.2372040996
(Radiology 2005;237:450-457.)
© RSNA, 2005
Free-Response Receiver Operating Characteristic Evaluation of Lossy JPEG2000 and Object-based Set Partitioning in Hierarchical Trees Compression of Digitized Mammograms1
Mónica Penedo, PhD,
Miguel Souto, MD,
Pablo G. Tahoces, PhD,
José M. Carreira, MD,
Justo Villalón, MD,
Gerardo Porto, MD,
Carmen Seoane, MD,
Juan J. Vidal, MD,
Kevin S. Berbaum, PhD,
Dev P. Chakraborty, PhD and
Laurie L. Fajardo, MD
1 From the Laboratorio de Imagen Médica, Hospital General Universitario Gregorio Marañón, C/Ibiza 43, 28009 Madrid, Spain (M.P.); the Laboratorio de Investigación Imagen Radiológica, Departamento de Radiología (M.S., J.M.C., J.J.V.), and Departamento de Electrónica y Computacion (P.G.T.), Universidade de Santiago de Compostela, Santiago de Compostela, Spain; the Complexo Hospitalario de Santiago de Compostela, Santiago de Compostela, Spain (M.S., J.M.C., J.V., G.P., C.S., J.J.V.); the Dept of Radiology, Univ of Iowa Hospitals and Clinics, Iowa City, Ia (K.S.B., L.L.F.); and the Dept of Radiology, Univ of Pittsburgh, Pittsburgh, Pa (D.P.C.). Received Jun 4, 2004; revision requested Aug 10; revision received Nov 4; accepted Dec 14.
Address correspondence to M.P. (e-mail: monica{at}mce.hggm.es).
ABSTRACT
PURPOSE: To assess the effects of two irreversible wavelet-based compression algorithms, Joint Photographic Experts Group (JPEG) 2000 and object-based set partitioning in hierarchical trees (SPIHT), on the detection of clusters of microcalcifications and masses on digitized mammograms.
MATERIALS AND METHODS: The use of the images in this retrospective image-collection study was approved by the institutional review board, and patient informed consent was not required. One hundred twelve mammographic images (28 with one or two clusters of microcalcifications, 19 with one mass, 17 with both abnormal findings, and 48 with normal findings) obtained in 60 women who ranged in age from 25 to 79 years were digitized and compressed at 40:1 and 80:1 by using the JPEG2000 and object-based SPIHT methods. Five experienced radiologists were asked to locate and rate clusters of microcalcifications and masses on the original and compressed images in a free-response receiver operating characteristic (FROC) data acquisition paradigm. Observer performance was evaluated with the jackknife FROC method.
RESULTS: The mean FROC figures of merit for detecting clusters of microcalcifications, masses, and both radiographic findings on uncompressed images were 0.80, 0.81, and 0.72, respectively. With object-based SPIHT 80:1 compression, the corresponding values were larger than the values for uncompressed images by 0.005, 0.009, and 0.005, respectively. The 95% confidence interval for the differences in figures of merit between compressed and uncompressed images was -0.039, 0.033 for the microcalcification finding; -0.055, 0.034 for the mass finding; and -0.039, 0.030 for both findings. Because each of these confidence intervals includes zero, no significant difference in detection accuracy between uncompressed and object-based SPIHT 80:1 compression was observed at the 5% significance level. The F test of the null hypothesis that all of the modes (uncompressed and four compressed modes) were equivalent yielded the following results: F = 0.255, P = .903 for the microcalcification finding; F = 0.340, P = .848 for the mass finding; and F = 0.122, P = .975 for both findings.
CONCLUSION: To within the accuracy of these measurements, lossy compression of digital mammographic data at 80:1 with JPEG2000 or the object-based SPIHT algorithm can be performed without decreasing the rate of detection of clusters of microcalcifications and masses.
INTRODUCTION
The widespread use of digital mammography devices has led to the need to evaluate whether they enable better diagnostic performance as compared with conventional screen-film devices. Comparative studies of both modalities have yielded encouraging results that promise the implementation of digital technology in mammography in the near future (1-7). However, high-spatial-resolution digital images in medicine tend to be large, which increases the processing and transmission times, the storage capacities needed, and the total costs associated with digital images. For instance, to preserve small or low-contrast lesions, such as clusters of microcalcifications or masses (the most frequent radiologic features of subtle breast cancer), a typical mammogram has to be digitized at a resolution of 50-µm spot size at 12 bits, resulting in approximately 40 MB of digital data (8-10). Given the current breast cancer screening guidelines of the American Cancer Society, the annual volume could exceed 5.6 petabytes. These demands create important challenges to the widespread adoption of digital mammography and have also hindered the use of computer-aided detection in breast cancer (11,12). Image compression provides a possible solution to this problem.
To substantially affect these storage and transmission costs, compression ratios higher than 10:1 are required, which implies the utilization of irreversible ("lossy") compression methods. There is a need to study the level of compression that can be used without resulting in clinically relevant diagnostic degradation. Previous studies in mammography have evaluated irreversible compression methods by determining their effects on computer-aided detection systems or radiologists' performance. Good et al (13) applied the original Joint Photographic Experts Group (JPEG) algorithm to 60 digitized mammograms and evaluated the performance of eight observers in the detection of masses and clusters of microcalcifications with receiver operating characteristic (ROC) analysis. Zheng et al (14) assessed the performance of a computer-aided detection system in the detection of breast cancer in 952 digitized mammograms after JPEG compression. Both groups of investigators have reported no statistical difference in results for detecting masses, but a significant difference was found for the detection of clusters of microcalcifications when images compressed at about 100:1 were used.
As wavelet transform emerged as a more effective image-compression technique than JPEG, reports of other evaluation studies that applied wavelet-based methods, including the new JPEG2000 standard, appeared in the literature. Kocsis et al (15) assessed the detection of clusters of microcalcifications in 68 digital mammograms compressed with a public-domain wavelet-based method. With an ROC study involving four observers, they demonstrated that a visually lossless threshold occurred at a 40:1 compression ratio. Perlmutter et al (16) used 57 digitized mammograms compressed with the set partitioning in hierarchical trees (SPIHT) algorithm, a widely known wavelet-based compression method. They assessed whether the use of lossy compression resulted in patient care decisions that were different from those that resulted from the use of uncompressed images. Their study revealed no statistically significant differences between analog or digitized original mammograms and mammograms compressed at an 80:1 compression ratio.
Sung et al (17) assessed JPEG2000 at several compression ratios in 20 low-spatial-resolution mammograms digitized at 8 bits per pixel. They concluded that there were no differences in lesion detectability with ratios up to 20:1. Suryanarayanan et al (18) investigated the effect of JPEG2000 with 10 contrast-detail phantoms containing circular gold disks of different diameters (0.1-3.2 mm) and thicknesses (0.05-1.6 µm). The phantoms were used in a clinical full-field digital mammography system, and the resulting images were compressed at different ratios (10:1, 20:1, and 30:1). Suryanarayanan et al included seven observers in the study and found no significant differences in perception up to 20:1, except for the disks that were 1 mm in diameter.
Recently, region-of-interest coding techniques have emerged as particularly suitable techniques for medical imaging. Such methods provide the possibility of adjusting the compression ratio to the clinical content of a region, so that regions with high diagnostic relevance are compressed at lower compression ratios (ie, with better quality) than the rest of the image. An evaluation of a region of interest-based modification of the SPIHT method that was adapted to digital mammography (called object-based SPIHT) revealed an improvement in compression efficiency as compared with original SPIHT or JPEG2000 (19).
The purpose of our study was to assess the effects of use of two irreversible wavelet-based compression algorithms (JPEG2000 and object-based SPIHT) at 40:1 and 80:1 compression ratios on the detection of clusters of microcalcifications and masses on digitized mammograms.
MATERIALS AND METHODS
Image Selection and Digitization
A total of 112 conventional mammograms that included craniocaudal and lateral views and had been obtained in 60 patients (age range, 25-79 years) were used in this study. The images were collected from among images obtained as part of the daily clinical caseload of the Department of Radiology of the Hospital Clínico Universitario de Galicia, University of Santiago de Compostela, Santiago de Compostela, Spain. The institutional review board of the hospital approved the use of the images in this retrospective image-collection study. Informed consent was not required. Of the 112 mammograms, 28 contained one or two clusters of microcalcifications, 19 contained one mass, 17 contained both abnormal findings, and 48 contained normal findings. In total, there were 54 clusters of microcalcifications (32 of which were found to be malignant) and 36 masses (24 of which were found to be malignant), all of which were biopsy proved.
The selection criterion for abnormal mammograms was that they showed clusters of microcalcifications (with more than five microcalcifications smaller than 1 mm within a region 1 cm² in area) and/or masses (with irregular shapes and ill-defined or spiculated margins) but excluded obvious findings. The subtlety of the radiographic findings was rated subjectively by a radiologist (M.S.) with 10 years of experience in mammography who was not included as a reader in the study by using a four-level scale (obvious, relatively obvious, subtle, and very subtle). Two of the clusters were ranked as obvious to detect, 21 as relatively obvious, 23 as subtle, and eight as very subtle; nine of the masses were ranked as obvious to detect, three as relatively obvious, 16 as subtle, and eight as very subtle. After reviewing each patient's radiology and pathology reports, this radiologist also established the location of the center of each finding and its extent (the latter was recorded as the coordinates of the smallest rectangle that contained the finding). These values were stored in a "ground truth file" and were used in the observer performance study. All images were digitized with a commercially available laser film digitizer (Lumiscan 85; Lumisys, Sunnyvale, Calif) at a resolution of 4096 horizontal × 5120 vertical pixels (50 µm/pixel). The optical density range of the digitizer was 0.03-4.1, and each pixel was digitized to 12-bit precision, yielding 4096 gray levels per pixel.
Image Compression, Decompression, and Printing Methods
Each digitized mammogram was compressed at 40:1 and 80:1 and then decompressed by using two wavelet-based methods: the standard JPEG2000 method applied to the full image and an object-based implementation of the SPIHT algorithm (called object-based SPIHT) applied to the mammogram region. JPEG2000 and object-based SPIHT provide progressive transmission of images, which makes these compression methods suitable for applications such as telemammography. Both methods use the Daubechies 9/7 biorthogonal wavelet filters.
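As a rough illustration of what these ratios mean for storage, the following sketch works out the file sizes implied by the digitization parameters given above (4096 × 5120 pixels, 12 bits per pixel). The article does not say how the 12-bit pixels were stored or which size the ratios were measured against; the 16-bit container used here is an assumption, chosen because it reproduces the figure of approximately 40 MB quoted in the Introduction.

```python
# Illustrative arithmetic only. The 16-bit storage container is an assumption,
# not something stated in the article.
WIDTH, HEIGHT = 4096, 5120   # digitizer matrix given in the text


def raw_size_bytes(width, height, bits_stored):
    """Uncompressed size in bytes for a given number of stored bits per pixel."""
    return width * height * bits_stored // 8


def compressed_size_bytes(raw_bytes, ratio):
    """Target size in bytes after compression at ratio:1."""
    return raw_bytes / ratio


packed = raw_size_bytes(WIDTH, HEIGHT, 12)   # 12 bits packed tightly
padded = raw_size_bytes(WIDTH, HEIGHT, 16)   # 12 bits held in 16-bit words (assumed)

print(f"packed 12-bit: {packed / 2**20:.1f} MiB, 16-bit words: {padded / 2**20:.1f} MiB")
for ratio in (40, 80):
    size = compressed_size_bytes(padded, ratio)
    print(f"{ratio}:1 -> {size / 2**20:.2f} MiB, {16 / ratio:.2f} bits per pixel")
```

If the ratios were instead defined against tightly packed 12-bit data, the compressed sizes would be three quarters of those printed here.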
Standard JPEG2000 Technique
JPEG2000, developed by the JPEG (20,21), is the emerging standard for still image compression. This standard was developed to provide features that the old JPEG standard (22) did not provide, such as control of the compression ratio, progressive lossy to lossless coding and decoding, multiresolution representation, error resilience, and region-of-interest encoding. In JPEG2000, region of interest-based compression is performed by prioritizing the coding of the information associated with the desired region. However, software implementations of the JPEG2000 region-of-interest operation mode allow the use of only rectangular or circular region shapes. The JasPer codec (coder/decoder) software implementation of JPEG2000 developed by Image Power and the University of British Columbia (both in Vancouver, British Columbia, Canada) was used in this study (20).
Object-based SPIHT Technique
SPIHT is a state-of-the-art image compression technique developed by Said and Pearlman (23). It has been previously applied in lossy compression of medical images (16,24). In our study, the object-based SPIHT coding method, an object-based extension of SPIHT, was used for compressing and decompressing the digitized mammograms. This region-of-interest encoding method is specifically adapted for digital mammography and has previously been described in detail (19). Briefly, a border-detection technique segments the mammogram into the tissue area and the nontissue image background. Then, as part of a texture coding method, a region-based discrete wavelet transform is applied only to the tissue area, decomposing it into wavelet coefficients. Finally, object-based SPIHT permits exclusion from the wavelet coefficients of information that is associated with pixels outside the breast region. This method also provides the possibility of encoding multiple arbitrarily shaped regions of interest at any desired compression level.
After reconstruction, a total set of 560 digital mammograms at five compression levels was obtained: 112 original (uncompressed) mammograms, 112 mammograms compressed at 40:1 with JPEG2000, 112 mammograms compressed at 80:1 with JPEG2000, 112 mammograms compressed at 40:1 with object-based SPIHT, and 112 mammograms compressed at 80:1 with object-based SPIHT. All images were printed to film with a high-spatial-resolution Digital Imaging and Communications in Medicine, or DICOM, printer (SCOPIX LR5200-P; Agfa-Gevaert, Mortsel, Belgium). A lookup table was applied to the images to compensate for existing differences in calibration between the digitizer and the DICOM printer. Matching between hard-copy digitized images and analog original images was validated both visually and through optical densitometric measurements by a radiologist (M.S.) and a physicist (M.P.).
Observer Performance Study
The 560 images were randomly assigned to 16 reading sets, avoiding repetition of any image within the same set. Five radiologists (J.M.C., J.V., G.P., C.S., J.J.V.) with mammography experience that ranged from 10 to 12 years participated in the study and reviewed all images. All images were viewed on a standard light box in conditions of low ambient lighting, and readers could use a magnifying glass. Image sets and images within each set were presented in a different random order for each observer. The observers were told that the mammograms were negative or showed at least one lesion (ie, a mass or a cluster of microcalcifications [defined as five or more specks within a region 1 cm² in area]) that was sampled for biopsy.
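The article does not describe how the random assignment to reading sets was carried out. The sketch below shows one way such an assignment could be made, under the interpretation that "avoiding repetition" means no reading set contains two versions of the same mammogram. The function name, the seed, and the sampling scheme are all hypothetical, and this scheme does not force the sets to be of equal size.

```python
import random


def assign_reading_sets(n_cases=112, n_versions=5, n_sets=16, seed=0):
    """Place each (case, version) pair in a reading set so that no set
    holds two versions of the same case. Hypothetical scheme, not the
    procedure used in the study."""
    rng = random.Random(seed)
    sets = [[] for _ in range(n_sets)]
    for case in range(n_cases):
        # Pick a distinct set for each of the versions of this case.
        chosen = rng.sample(range(n_sets), n_versions)
        for version, set_index in enumerate(chosen):
            sets[set_index].append((case, version))
    for reading_set in sets:
        rng.shuffle(reading_set)  # random presentation order within the set
    return sets


reading_sets = assign_reading_sets()
print([len(s) for s in reading_sets])
```

With 112 cases in five versions, the sets average 35 images each; a balanced design would need an extra constraint on set size.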
The free-response ROC (FROC) data collection and scoring paradigm has been described elsewhere (25-27). Briefly, FROC data consist of mark-rating pairs, where the number of marks per image is determined by the reader and could be zero. A mark is a location on an image where the observer has perceived a possible lesion, and the rating is the corresponding confidence level. In this study, observers were required to identify the location of perceived clusters of microcalcifications or masses by indicating the smallest rectangle containing the finding. For each perceived lesion they also recorded a confidence level for the finding by using a four-point rating scale (definite, probable, possible, and questionable). These terms were more fully defined, respectively, as confidence levels of 4 (radiographic finding present with more than 95% probability), 3 (finding present with more than 75% but less than or equal to 95% probability), 2 (finding present with more than 50% but less than or equal to 75% probability), and 1 (finding present with less than or equal to 50% probability).
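The boundaries of the four-point scale can be written out explicitly. The sketch below encodes only the probability bins stated above; the readers reported verbal categories, not numeric probabilities, so the function and its name are purely illustrative.

```python
LABELS = {4: "definite", 3: "probable", 2: "possible", 1: "questionable"}


def confidence_rating(probability):
    """Map a probability (0 to 1) that a finding is present to the
    four-point confidence scale defined in the text."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability > 0.95:
        return 4  # definite: more than 95%
    if probability > 0.75:
        return 3  # probable: more than 75%, at most 95%
    if probability > 0.50:
        return 2  # possible: more than 50%, at most 75%
    return 1      # questionable: at most 50%


for p in (0.99, 0.95, 0.60, 0.50):
    rating = confidence_rating(p)
    print(p, rating, LABELS[rating])
```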
Because an observer saw each image five times during the study (as the original uncompressed and four compressed versions), a time limit of 20 seconds per image was imposed to minimize learning effects. An assistant recorded each observer's responses and removed the images from the light box when the time limit was exceeded. To further reduce learning effects, at least 1 week had to have elapsed between each reading session.
To classify the observers' location responses into true- and false-positive observations, we used previously described criteria (28); namely, when the intersection of the observer-indicated lesion extent and the true lesion extent (as recorded in the "ground truth file") was greater than 50%, the mark was classified as a true-positive mark. Any mark that did not comply with this requirement was classified as a false-positive mark. For the purpose of analyzing the data, unmarked lesions and normal images with no false-positive marks both received 0 ratings.
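A minimal sketch of this scoring rule follows. The text does not state what the 50% is measured against; here the overlap is taken as a fraction of the true lesion rectangle, which is an assumption, and the `Rect` type and function names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def area(r):
    """Area of a rectangle; zero if the rectangle is empty or inverted."""
    return max(0.0, r.x1 - r.x0) * max(0.0, r.y1 - r.y0)


def overlap_fraction(mark, truth):
    """Intersection area as a fraction of the true lesion area (assumed definition)."""
    inter = Rect(max(mark.x0, truth.x0), max(mark.y0, truth.y0),
                 min(mark.x1, truth.x1), min(mark.y1, truth.y1))
    truth_area = area(truth)
    return area(inter) / truth_area if truth_area > 0 else 0.0


def score_marks(marks, truths, threshold=0.5):
    """Label each (Rect, rating) mark 'TP' if it overlaps any true lesion
    by more than the threshold, otherwise 'FP'."""
    scored = []
    for rect, rating in marks:
        hit = any(overlap_fraction(rect, t) > threshold for t in truths)
        scored.append(("TP" if hit else "FP", rating))
    return scored


truth = [Rect(0, 0, 10, 10)]
marks = [(Rect(2, 0, 12, 10), 4), (Rect(6, 0, 16, 10), 2)]
print(score_marks(marks, truth))
```

On a normal image the list of true lesions is empty, so every mark is scored as a false-positive.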
Statistical Analysis
The multireader multicase FROC data set, which consisted of true-positive or false-positive ratings that originated from 2800 observations (five types of images times five readers times 112 images), was analyzed with a recently introduced jackknife FROC method (29). The essential differences between the ROC and the jackknife FROC methods reside in the data collection paradigms and in how the figure of merit that quantifies observer performance is calculated. Multireader multicase ROC data are analyzed with the Dorfman-Berbaum-Metz method (30), which is implemented in the LABMRMC program (available at http://www-radiology.uchicago.edu/krl/KRL_ROC/software_index.htm#LABMRMC).
Like its multireader multicase ROC counterpart, a multireader multicase FROC data set is analyzed in two steps involving (a) the generation of pseudovalues and (b) analysis of the pseudovalues with an analysis of variance model. In an ROC study, each image interpretation yields a single rating, and the figure of merit, the area under the ROC curve, or Az, is the probability that an abnormal image rating exceeds a normal image rating. The calculation of pseudovalues involves removal of each image from the data set one at a time, recalculation of the figure of merit, and determination of the effect of the removal of the image on the figure of merit. In the case of jackknife FROC, each image interpretation yields varying numbers of mark-rating pairs, a localization criterion is applied to classify each mark as true- or false-positive, and the figure of merit, $\theta$, is defined as the probability that a lesion rating exceeds all false-positive ratings.
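The two steps can be illustrated for a single reader and modality. The sketch below is not the jackknife FROC software used in the study. It assumes that each lesion rating is compared with the highest false-positive rating on each normal image, that ties count as one half, and that the pseudovalue for image i is N times the full figure of merit minus (N - 1) times the figure of merit with image i removed. The data layout and function names are hypothetical.

```python
def kernel(fp_rating, lesion_rating):
    """1 if the lesion outranks the false positive, 0.5 for a tie, else 0."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def figure_of_merit(cases):
    """cases: list of dicts with 'lesions' (ratings, 0 = unmarked lesion)
    and 'fps' (false-positive ratings). Images with no lesions are normal.
    False positives on abnormal images are ignored."""
    normals = [max(c["fps"], default=0) for c in cases if not c["lesions"]]
    lesions = [r for c in cases for r in c["lesions"]]
    total = sum(kernel(fp, les) for fp in normals for les in lesions)
    return total / (len(normals) * len(lesions))


def pseudovalues(cases):
    """Jackknife pseudovalues: remove one image at a time and recompute."""
    n = len(cases)
    full = figure_of_merit(cases)
    return [n * full - (n - 1) * figure_of_merit(cases[:i] + cases[i + 1:])
            for i in range(n)]


cases = [
    {"lesions": [], "fps": []},       # normal image, no marks (rating 0)
    {"lesions": [], "fps": [2, 1]},   # normal image, highest false positive is 2
    {"lesions": [4], "fps": [1]},     # abnormal image; its false positive is ignored
    {"lesions": [2, 0], "fps": []},   # two lesions, one unmarked
]
print(figure_of_merit(cases))
print(pseudovalues(cases))
```

In the full analysis one such list of pseudovalues would be produced for every reader and modality and then passed to the analysis of variance model.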
Although false-positive ratings can occur on abnormal images, they are ignored in the present analysis because otherwise the method would have the wrong statistical behavior (29). Omitting the false-positive responses on the abnormal images corrects this problem but at the expense of some loss in statistical power. Because the multiple responses involved in each image interpretation are reduced to a single pseudovalue, jackknife FROC analysis avoids the need to make assumptions of independence between events on the same image, thereby addressing concerns (31,32) with earlier kinds of FROC analysis (26,27). An F test is used internal to the analysis of variance, yielding a P value for rejecting the null hypothesis of no difference between the modalities. If the observed P value is smaller than 5%, the null hypothesis is rejected. Appendix A describes a procedure for calculating the confidence intervals for the observed intermodality differences in the jackknife FROC figures of merit. A statement of the confidence interval is equivalent to a statement of the statistical power of the study (see Appendix A) (33). (The jackknife FROC software used in this study [current version number, 1.04] is available for download at http://www.devchakraborty.com.)
RESULTS
The reader-averaged values of the FROC figure of merit, $\theta$, for the five image types tested (uncompressed digitized images, images compressed at 40:1 and 80:1 with JPEG2000, and images compressed at 40:1 and 80:1 with object-based SPIHT) were 0.80, 0.81, 0.80, 0.81, and 0.80 for detecting clusters of microcalcifications; 0.81, 0.79, 0.81, 0.80, and 0.82 for detecting masses; and 0.72, 0.73, 0.73, 0.73, and 0.72 for detecting both radiographic findings (Tables 1-3).
TABLE 1. Values for the Detection of Clusters of Microcalcifications on Mammograms Processed with Each Compression Method and Ratio

TABLE 3. Values for the Detection of Clusters of Microcalcification and Masses on Mammograms Processed with Each Compression Method and Ratio
Jackknife FROC analysis indicated that differences between the five types of image were insignificant at the 5% level for the detection of clusters of microcalcifications (F = 0.255, P = .903), masses (F = 0.340, P = .848), and both radiographic findings (F = 0.122, P = .975). These results are consistent with results of visual examination of representative images, which reveal that radiographic findings such as clusters of microcalcifications and masses are well preserved with JPEG2000 and object-based SPIHT, even at compression ratios of 80:1 (Figs 1, 2). For any given level of compression, the FROC figure of merit was smaller when both findings were included than when they were analyzed separately. For example, for the uncompressed images, the average figure of merit for microcalcifications was 0.80, the average figure of merit for masses was 0.81, and the average figure of merit when both findings were included was 0.72.

Figure 1. Magnified region of interest containing a subtle cluster of microcalcifications on lateral-view mammogram. (a) Original uncompressed region and (b-e) same region compressed at (b) 40:1 with JPEG2000, (c) 40:1 with object-based SPIHT, (d) 80:1 with JPEG2000, and (e) 80:1 with object-based SPIHT. The "rice artifacts" introduced during the compression process did not affect observer performance in the detection of clusters of microcalcifications.

Figure 2. Region of interest containing a subtle mass on craniocaudal-view mammogram. (a) Original uncompressed region and (b-e) same region compressed at (b) 40:1 with JPEG2000, (c) 40:1 with object-based SPIHT, (d) 80:1 with JPEG2000, and (e) 80:1 with object-based SPIHT. No visual differences are observed, despite the compression of the digitized images by up to 80:1.
With use of the method described in Appendix A, for the microcalcification task the 95% confidence interval for $\Delta\theta$ (the difference in figures of merit between the uncompressed and the object-based SPIHT 80:1 compressed mammograms) was -0.039, 0.033. To achieve a statistical power of 80% at a nominal significance level of 5% with the same number of readers and cases as in this study, the $\Delta\theta$ of the modalities would have to be 0.050 (see Appendix A). Similarly, for the detection of masses, the 95% confidence interval for the $\Delta\theta$ between the uncompressed and the object-based SPIHT 80:1 compressed mammograms was -0.055, 0.034. To achieve a statistical power of 80% at a nominal significance level of 5%, the $\Delta\theta$ of the modalities would have to be 0.062. Finally, for the detection of both findings, the 95% confidence interval for the $\Delta\theta$ between the uncompressed and the object-based SPIHT 80:1 compressed mammograms was -0.039, 0.030. To achieve a statistical power of 80% at a nominal significance level of 5%, the $\Delta\theta$ of the modalities would have to be 0.0483. Note that all of the indicated confidence intervals include zero, which is consistent with the results of the F tests quoted earlier, which revealed no significant differences at the 5% level between the different compression methods (including "no compression") for all tasks.
DISCUSSION
To our knowledge, this work represents the first evaluation of observer performance in the detection of radiographic findings on high-spatial-resolution mammograms compressed with JPEG2000. The results obtained in this study indicate that, within the accuracy of our measurements, compression ratios up to 80:1 did not degrade the detection of clusters of microcalcifications or masses on digitized mammograms compressed with JPEG2000 or the region-based method object-based SPIHT as applied to the breast region within the mammogram. Either the JPEG2000 or the object-based SPIHT method can be used with digital technologies such as picture archiving and communication systems and telemammography to reduce storage and transmission costs and facilitate the growth of digital mammography as a medical imaging modality.
Our jackknife FROC confidence interval values were smaller than published data on the expected confidence intervals of similarly sized ROC studies (see Appendix B). We caution that our findings are based on just one data set (see also below) and that it is important to test them with more ROC and FROC studies performed by using the same images and readers. The apparently inconsistent result (Tables 1-3) that the FROC figure of merit was smaller when both radiographic findings were included in the quantification of observer performance than when the two findings were analyzed separately is explained in Appendix C.
Unlike our study, most previous observer studies of image compression have used ROC methods. Smaller confidence intervals allow one to make stronger conclusions about the equivalence of different compression methods. To our knowledge, our study also represents the first clinical application of the jackknife FROC method to the evaluation of mammographic compression methods. Recently, results of another clinical study (34) in which the jackknife FROC method was applied to computer-aided detection in mammography have been reported. As with results of Dorfman-Berbaum-Metz analysis (30), results of jackknife FROC analysis generalize to the cases as well as to the readers. Alternate methods for analyzing location data are the location ROC method (32,35) and the region-of-interest method (36). We could not apply either of these methods to our FROC data because the data acquisition paradigms are incompatible.
Some of the images used in this study contained multiple lesions. Clinically, multiple breast lesions are not that common. Another complicating factor was "satisfaction of search," in which detection of a lesion may cause the reader to prematurely terminate the search (37) and miss other lesions. Our data set was enriched with positive findings, compared with the approximately 0.3% incidence of malignancy that is typically seen clinically. There was no "clinical pressure" involved in the evaluation of the data set in that there was no consequence to an incorrect decision, and the time for reading was limited to 20 seconds. Although these limitations are all legitimate concerns, as long as they applied equally to the different compressed modalities we do not expect these factors to affect the conclusion of our study. The analysis used in the study assumed that all images are independent. Because two images were generally obtained per patient, this assumption may not be satisfied by the data and the stated confidence intervals may underestimate the true values. A conservative correction would be to multiply the stated 95% symmetric confidence intervals by $\sqrt{2}$.
Owing to the neglect of false-positive responses on abnormal images and all but the highest-rated false-positive responses on normal images, we did not achieve the maximum statistical power of the FROC paradigm in this study. Further advances in FROC methods that are expected to enable utilization of all the data and achievement of even greater statistical power are being developed. In this study, the task was detection, not discriminating between normal and benign lesions, and it may be argued that the requirements of a true discrimination task would place more stringent requirements on image compression.
APPENDIX A
For calculating the confidence interval, given the assumption that the distribution of $\theta$ is normal, the 95% symmetric confidence interval is calculated as $2(1.96)\,\sigma(\theta)$, where $\sigma(\theta)$ is the standard deviation of $\theta$ and the factor 2(1.96) is needed to achieve the 95% confidence interval. If the nominal significance value (ie, the type I error probability) of the test is 5%, then, for a two-sided test, the acceptance region for the null hypothesis extends from $-1.96\,\sigma(\theta)$ to $1.96\,\sigma(\theta)$. Assume that one is seeking a power of 0.8. Because the area under a unit-normal distribution above the value $-0.842$ is 0.8, the upper end of the acceptance region must be $0.842\,\sigma(\theta)$ units below the mean of the alternative hypothesis distribution. Because the upper end of the acceptance region is also $1.96\,\sigma(\theta)$ units above the mean of the null hypothesis distribution, the separation of the alternative and null hypothesis distributions must be $2.802\,\sigma(\theta)$ units. In other words, for a two-sided test at the 5% nominal significance level, the effect size needed for 80% power is $2.802\,\sigma(\theta)$ units. If a one-sided test is desired, then the effect size for 80% power would be $2.487\,\sigma(\theta)$ units.
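The two multipliers quoted above can be checked numerically. The following is a minimal sketch and is not part of the jackknife FROC software; it relies only on `statistics.NormalDist` from the Python standard library, and the function name `effect_size_multiplier` is a hypothetical name chosen for illustration. It assumes a normally distributed test statistic with the same standard deviation under the null and alternative hypotheses.

```python
from statistics import NormalDist


def effect_size_multiplier(alpha=0.05, power=0.80, two_sided=True):
    """Return k such that an effect of k standard deviations yields the requested power.

    Assumption (illustrative): the test statistic is normal with equal
    standard deviation under the null and alternative hypotheses.
    """
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    # Upper end of the acceptance region for the null hypothesis.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = unit_normal.inv_cdf(1 - tail)
    # Distance from that upper end to the mean of the alternative distribution.
    z_power = unit_normal.inv_cdf(power)
    return z_alpha + z_power


if __name__ == "__main__":
    print(round(effect_size_multiplier(two_sided=True), 3))
    print(round(effect_size_multiplier(two_sided=False), 3))
```

Without intermediate rounding, the two-sided multiplier evaluates to about 2.802 and the one-sided multiplier to about 2.486; the value 2.487 given in the text follows from summing the rounded quantiles 1.645 and 0.842.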
The current version (ie, 1.04) of the jackknife FROC code calculates 95% confidence intervals for all 10 pairings of the five modalities that were tested. Prospective power analysis requires knowledge of the variability of the ratings, specifically, how this variability is distributed between case, reader, and other components, collectively referred to as "variance components" (38). In principle, the variance components can be calculated from pilot data by using either bootstrapping (37–41) or analysis of variance techniques (42). In cases of FROC analysis, one also needs estimates of the intraimage correlation coefficients (29).
APPENDIX B
In comparing our FROC confidence intervals to those obtained in earlier ROC-based studies, we note that Beiden et al (41) described a hypothetical two-modality study involving 60 normal and 50 abnormal images interpreted by five radiologists. The mean area under the ROC curve was 0.76. For the variance structures describing the reader and the image variability and other terms, they considered values reported by Roe and Metz (43) to be typical of a body of available radiologist ROC data. Using bootstrap analysis, Beiden et al found that, for these ROC-based studies, the expected 95% symmetric confidence interval was ±0.10. For comparison, in the microcalcification detection task, our FROC study involved 58 normal mammograms, 54 abnormal mammograms, and five radiologists. It is seen that, in this particular case, the jackknife FROC method yielded a substantially smaller 95% symmetric confidence interval than that expected for a similar-sized study involving the ROC method.
APPENDIX C
As seen in Tables 1–3, the FROC figure of merit $\theta$ was smaller when both radiographic findings were included in the quantification of observer performance than when they were analyzed separately. The FROC sampling model (29) provides a natural explanation (perhaps not the only explanation) for this apparently inconsistent finding. Briefly, the definition of $\theta$ implies a "competition" between the ratings of marked lesions, the true-positives, and the ratings of marked nonlesion locations, the false-positives. The number of false-positives on any image can be denoted by $\mathrm{NFP}$. In this study, there were two types of false-positives: those that resembled microcalcifications and those that resembled masses; their numbers can be denoted by $\mathrm{NFP}_{\mathrm{micro}}$ and $\mathrm{NFP}_{\mathrm{mass}}$, respectively. From the definition of the figure of merit, $\theta_{\mathrm{micro}} = \mathrm{Prob}(\mathrm{RTP}_{\mathrm{micro}} > \mathrm{HRFP}_{\mathrm{micro}})$, where $\mathrm{RTP}_{\mathrm{micro}}$ is the rating assigned to the microcalcification true-positive and $\mathrm{HRFP}_{\mathrm{micro}}$ is the highest of the $\mathrm{NFP}_{\mathrm{micro}}$ microcalcification false-positive ratings on the image. (An average over all images is implicit in the definition of $\theta$.) Similarly, $\theta_{\mathrm{mass}} = \mathrm{Prob}(\mathrm{RTP}_{\mathrm{mass}} > \mathrm{HRFP}_{\mathrm{mass}})$, where $\mathrm{RTP}_{\mathrm{mass}}$ is the rating assigned to the mass true-positive and $\mathrm{HRFP}_{\mathrm{mass}}$ is the highest of the $\mathrm{NFP}_{\mathrm{mass}}$ mass false-positive ratings on the image. Finally, $\theta_{\mathrm{both}} = \mathrm{Prob}(\mathrm{RTP}_{\mathrm{both}} > \mathrm{HRFP}_{\mathrm{both}})$, where $\mathrm{RTP}_{\mathrm{both}}$ is the rating assigned to the mass or microcalcification true-positive and $\mathrm{HRFP}_{\mathrm{both}}$ is the highest of the $\mathrm{NFP}_{\mathrm{mass}} + \mathrm{NFP}_{\mathrm{micro}}$ mass or microcalcification false-positive ratings on the image. Because some of the mass false-positive ratings will exceed some of the microcalcification true-positive ratings and some of the microcalcification false-positive ratings will exceed some of the mass true-positive ratings, one expects $\theta_{\mathrm{both}} < (\theta_{\mathrm{micro}} \text{ or } \theta_{\mathrm{mass}})$. This explanation assumes that the microcalcification and mass false-positive locations are different and that the $\mathrm{NFP}_{\mathrm{mass}} + \mathrm{NFP}_{\mathrm{micro}}$ value is larger than the number of lesions on any image.
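The "competition" argument can be illustrated with a small Monte Carlo sketch. This is not the jackknife FROC analysis used in the study. It is a toy model under loudly stated assumptions: ratings are drawn from unit-variance Gaussians, each image has one lesion, and the function name `figure_of_merit` and the parameters `tp_mean`, `n_images`, and `seed` are hypothetical. It estimates Prob(RTP > HRFP) for a given number of false-positive marks per image.

```python
import random


def figure_of_merit(n_fp, n_images=20000, tp_mean=1.5, seed=0):
    """Estimate Prob(RTP > HRFP) with n_fp (>= 1) false-positive marks per image.

    Assumptions (illustrative only): false-positive ratings are N(0, 1),
    the single true-positive rating per image is N(tp_mean, 1), and all
    ratings are independent.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_images):
        rtp = rng.gauss(tp_mean, 1.0)  # rating of the marked lesion
        # Highest-rated false-positive mark on the same image.
        hrfp = max(rng.gauss(0.0, 1.0) for _ in range(n_fp))
        if rtp > hrfp:
            wins += 1
    return wins / n_images


if __name__ == "__main__":
    # One type of false-positive versus two pooled types of false-positives.
    print(figure_of_merit(n_fp=1))
    print(figure_of_merit(n_fp=2))
```

In this toy model, pooling a second set of false-positive marks raises the highest false-positive rating that the true-positive rating must beat, so the estimated probability decreases, which is the direction of the effect described above.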
ACKNOWLEDGMENTS
The authors are grateful to William A. Pearlman, PhD, director of the Center for Image Processing Research in the Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY, for his help and advice in data compression.
FOOTNOTES
Abbreviations: FROC = free-response ROC; JPEG = Joint Photographic Experts Group; ROC = receiver operating characteristic; SPIHT = set partitioning in hierarchical trees
Authors stated no financial relationship to disclose.
Author contributions: Guarantors of integrity of entire study, M.P., M.S., P.G.T.; study concepts and design, M.P., M.S., P.G.T.; literature research, M.P.; clinical studies, M.P., J.M.C., J.V., G.P., C.S., J.J.V.; data acquisition, M.P.; data analysis/interpretation, M.P., K.S.B., D.P.C.; statistical analysis, M.P., K.S.B., D.P.C.; manuscript preparation, M.P., D.P.C.; manuscript definition of intellectual content, M.P., M.S., P.G.T., D.P.C., L.L.F.; manuscript editing, revision/review, and final version approval, all authors.
References
1. Jarlman O, Samuelsson L, Braw M. Digital luminescence mammography: early clinical experience. Acta Radiol 1991;32:110–113.
2. Brettle DS, Ward SC, Parkin GJ, Cowen AR, Sumsion HJ. A clinical comparison between conventional and digital mammography utilizing computed radiography. Br J Radiol 1994;67:464–468.
3. Cowen AR, Parkin GJ, Hawkridge P. Direct digital mammography image acquisition. Eur Radiol 1997;7:918–930.
4. Venta LA, Hendrick RE, Adler YT, et al. Rates and causes of disagreement in interpretation of full-field digital mammography and film-screen mammography in a diagnostic setting. AJR Am J Roentgenol 2001;176:1241–1248.
5. Berns EA, Hendrick RE, Cutter GR. Performance comparison of full-field digital mammography to screen-film mammography in clinical practice. Med Phys 2002;29:830–834.
6. Suryanarayanan S, Karellas A, Vedantham S, Ved H, Baker SP, D'Orsi CJ. Flat-panel digital mammography system: contrast-detail comparison between screen-film radiographs and hardcopy images. Radiology 2002;225:801–807.
7. Pisano ED, Cole EB, Kistner EO, et al. Interpretation of digital mammograms: comparison of speed and accuracy of soft-copy versus printed-film display. Radiology 2002;223:483–488.
8. Chan HP, Vyborny CJ, Macmahon H, Metz CE, Doi K, Sickles EA. Digital mammography: ROC studies of the effects of pixel size and unsharp-mask filtering on the detection of subtle microcalcifications. Invest Radiol 1987;22:581–589.
9. Chan HP, Niklason LT, Ikeda DM, Lam KL, Adler DD. Digitization requirements in mammography: effects on computer-aided detection of microcalcifications. Med Phys 1994;21:1203–1211.
10. Chan HP, Lo SC, Niklason LT, Ikeda DM, Lam KL. Image compression in digital mammography: effects on computerized detection of subtle microcalcifications. Med Phys 1996;23:1325–1336.
11. Mendez AJ, Tahoces PG, Lado MJ, Souto M, Vidal JJ. Computer-aided diagnosis: automatic detection of malignant masses in digitized mammograms. Med Phys 1998;25:957–964.
12. Lado M, Tahoces PG, Mendez AJ, Souto M, Vidal JJ. Evaluation of an automated wavelet-based system dedicated to the detection of clustered microcalcifications in digital mammograms. Med Inform Internet Med 2001;26:149–163.
13. Good WF, Sumkin JH, Ganott M, et al. Detection of masses and clustered microcalcifications on data compressed mammograms: an observer performance study. AJR Am J Roentgenol 2000;175:1573–1576.
14. Zheng B, Sumkin JH, Good WF, Maitz GS, Chang YH, Gur D. Applying computer-assisted detection schemes to digitized mammograms after JPEG data compression: an assessment. Acad Radiol 2000;7:595–602.
15. Kocsis O, Costaridou L, Varaki L, et al. Visually lossless threshold determination for microcalcification detection in wavelet compressed mammograms. Eur Radiol 2003;13:2390–2396.
16. Perlmutter S, Cosman P, Gray R, et al. Image quality in lossy compressed digital mammograms. Signal Process 1997;59:189–210.
17. Sung MM, Kim HJ, Kim EK, Kwak JY, Yoo JK, Yoo HS. Clinical evaluation of JPEG2000 compression for digital mammography. IEEE Trans Nucl Sci 2002;49:827–832.
18. Suryanarayanan S, Karellas A, Vedantham S, Waldrop SM, D'Orsi CJ. A perceptual evaluation of JPEG 2000 image compression for digital mammography: contrast-detail characteristics. J Digit Imaging 2004;17:64–70.
19. Penedo M, Pearlman WA, Tahoces PG, Souto M, Vidal JJ. Region-based wavelet coding methods for digital mammography. IEEE Trans Med Imaging 2003;22:1288–1296.
20. Adams M, Kossentini F. JasPer: a software-based JPEG2000 codec implementation. In: Proceedings of IEEE International Conference on Image Processing. Vol 2. Vancouver, British Columbia, Canada: Institute of Electrical and Electronics Engineers, 2000; 53–56.
21. Rabbani M, Joshi R. An overview of the JPEG2000 still image compression standard. Signal Process Image Comm 2002;17:3–48.
22. ISO/IEC 10918-1:1994. Information technology - digital compression and coding of continuous-tone still images: requirements and guidelines. International Organization for Standardization Web site. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=18902&ICS1=35&ICS2=40&ICS3=):. Accessed September 1, 2005.
23. Said A, Pearlman W. A new fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans Circuits Syst Video Technol 1996;6:243–250.
24. Savcenko V, Erickson BJ, Palisson PM, et al. Detection of subtle abnormalities on chest radiographs after irreversible compression. Radiology 1998;206:609–616.
25. Chakraborty DP, Breatnach ES, Yester MV, Soto B, Barnes GT, Fraser RG. Digital and conventional chest imaging: a modified ROC study of observer performance using simulated nodules. Radiology 1986;158:35–39.
26. Chakraborty DP. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Med Phys 1989;16:561–568.
27. Chakraborty DP, Winter LHL. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology 1990;174:873–881.
28. Kallergi M, Carney GM, Gaviria J. Evaluating the performance of detection algorithms in digital mammography. Med Phys 1999;26:267–275.
29. Chakraborty DP, Berbaum KS. Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 2004;31:2313–2330.
30. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27:723–731.
31. Metz C. Evaluation of digital mammography by ROC analysis. In: Doi K, Giger ML, Nishikawa RM, Schmidt RA, eds. Digital mammography '96. Amsterdam, the Netherlands: Elsevier Science, 1996; 61–68.
32. Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Med Phys 1996;23:1709–1725.
33. Metz CE. Quantification of failure to demonstrate statistical significance: the usefulness of confidence intervals. Invest Radiol 1993;28:59–63.
34. Zheng B, Chakraborty DP, Rockette HE, Maitz GS, Gur D. A comparison of two data analyses from two observer performance studies using jackknife ROC and JAFROC. Med Phys 2005;32:1031–1034.
35. Starr SJ, Metz CE, Lusted LB, Goodenough DJ. Visual detection and localization of radiographic images. Radiology 1975;116:533–538.
36. Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000;7:516–525.
37. Berbaum KS, Franken EA, Dorfman DD, et al. Satisfaction of search in diagnostic radiology. Invest Radiol 1990;25:133–140.
38. Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Acad Radiol 1997;4:587–600.
39. Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: an alternative method for random-effects, receiver operating characteristic analysis. Acad Radiol 2000;7:341–349.
40. Beiden SV, Wagner RF, Campbell G, Chan HP. Analysis of uncertainties in estimates of components of variance in multivariate ROC analysis. Acad Radiol 2001;8:616–622.
41. Beiden SV, Wagner RF, Campbell G, Metz CE, Jiang YL. Components-of-variance models for random-effects ROC analysis: the case of unequal variance structures across modalities. Acad Radiol 2001;8:605–615.
42. Hillis S, Berbaum KS. Power estimation for the Dorfman-Berbaum-Metz method. Acad Radiol 2004;11:1260–1273.
43. Roe CA, Metz CE. Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation. Acad Radiol 1997;4:298–303.