Radiology
Published online before print August 18, 2004, 10.1148/radiol.2331030432
(Radiology 2004;233:255-265.)
© RSNA, 2004


Breast Imaging

Improvement in Radiologists’ Characterization of Malignant and Benign Breast Masses on Serial Mammograms with Computer-aided Diagnosis: An ROC Study¹

Lubomir Hadjiiski, PhD, Heang-Ping Chan, PhD, Berkman Sahiner, PhD, Mark A. Helvie, MD, Marilyn A. Roubidoux, MD, Caroline Blane, MD, Chintana Paramagul, MD, Nicholas Petrick, PhD, Janet Bailey, MD, Katherine Klein, MD, Michelle Foster, MD, Stephanie Patterson, MD, Dorit Adler, MD, Alexis Nees, MD and Joseph Shen, MD

¹ From the Department of Radiology, University of Michigan Medical Center, CGC B2102, 1500 E Medical Center Dr, Ann Arbor, MI 48109-0904 (L.H., H.P.C., B.S., M.A.H., M.A.R., C.B., C.P., J.B., K.K., M.F., S.P., D.A., A.N., J.S.); and Center for Devices and Radiological Health, U.S. Food and Drug Administration, Rockville, Md (N.P.). From the 2002 RSNA scientific assembly. Received March 17, 2003; revision requested June 13; final revision received January 9, 2004; accepted February 4. Supported by USAMRMC grants DAMD17-98-1-8211, DAMD17-02-1-0489, and DAMD17-02-1-0214. Address correspondence to L.H. (e-mail: lhadjisk@umich.edu).


    ABSTRACT
PURPOSE: To evaluate the effects of computer-aided diagnosis (CAD) on radiologists’ characterization of masses on serial mammograms.

MATERIALS AND METHODS: Two hundred fifty-three temporal image pairs (138 malignant and 115 benign) obtained from 96 patients who had masses on serial mammograms were evaluated. The temporal pairs were formed by matching masses of the same view from two different examinations. Eight radiologists and two breast imaging fellows assessed the temporal pairs with and without computer aid. Classification accuracy was quantified by using the area under the receiver operating characteristic curve (Az). The statistical significance of the difference in Az between the different reading conditions was estimated with the Dorfman-Berbaum-Metz method for analysis of multireader multicase data and with the Student paired t test for analysis of observer-specific paired data.

RESULTS: The average Az for radiologists’ estimates of the likelihood of malignancy was 0.79 without CAD and improved to 0.84 with CAD. The improvement was statistically significant (P = .005). The corresponding average partial area index was 0.25 without CAD and improved to 0.37 with CAD. The improvement was also statistically significant (P = .005). On the basis of Breast Imaging Reporting and Data System assessments, it was estimated that with CAD, each radiologist, on average, reduced 0.7% (0.8 of 115) of unnecessary biopsies and correctly recommended 5.7% (7.8 of 138) of additional biopsies.

CONCLUSION: CAD based on analysis of interval changes can significantly increase radiologists’ accuracy in classification of masses and thereby may be useful in improving correct biopsy recommendations.


Index terms: Breast neoplasms, diagnosis, 00.31, 00.32 • Computers, diagnostic aid • Diagnostic radiology, observer performance


    INTRODUCTION
Breast cancer is one of the leading causes of death in the United States among women between 40 and 55 years of age (1). Mammography is currently the most sensitive method for detecting early breast cancer, and it is also the most practical for screening (2,3). Although general rules for differentiation between malignant and benign lesions exist, in clinical practice only 15%–30% of patients referred for biopsy are found to have a malignancy (4–6). Unnecessary biopsies increase health care costs and may cause patient anxiety and morbidity. It is therefore important to improve the accuracy of interpreting mammographic lesions, thereby improving the positive predictive values of mammography.

Radiologists routinely compare the current mammograms of a patient with those obtained in previous years, if available, for identifying interval changes, detecting abnormalities, and evaluating breast lesions. It is widely accepted that interval changes in mammographic features are very useful for detection of breast cancer (7,8). In a recent study, Burnside et al (9) reported that in a diagnostic setting, comparison with the prior examination significantly (P < .001) increased the overall cancer detection rate.

A variety of computer-aided diagnosis (CAD) techniques have been developed to detect abnormalities and to distinguish malignant and benign lesions on mammograms. It has been shown that CAD systems could improve the radiologist’s accuracy in both detection and characterization of breast lesions in a single mammographic examination.

Chan et al (10) performed an observer study to evaluate the effects of CAD, which was designed for characterization of malignant and benign masses on mammograms obtained from a single examination (11), on the radiologist’s diagnostic accuracy. Two observer experiments were performed. In the first experiment, the radiologists evaluated a data set of masses on single-view mammograms. In the second experiment, they evaluated the masses on two-view mammograms. In both experiments, the radiologists’ performance in terms of the area under receiver operating characteristic (ROC) curve (Az) was significantly (P = .022 and .007, respectively) improved when reading with CAD was compared with reading without CAD.

Huo et al (12) developed a computer classifier for distinguishing between malignant and benign masses. Multiple views of the masses acquired in the same examination were used. An observer study with 12 radiologists was performed. The radiologists’ performance in terms of the Az was also significantly (P = .001) improved with computer aid.

Jiang et al (13) developed a computer classifier for classification of microcalcification clusters on multiple views of single-examination mammograms and also performed an observer study to evaluate its effectiveness. They found that with computer aid, the radiologists achieved a statistically significant (P < .001) improvement in the classification of microcalcifications. In addition, an increase in biopsy recommendations for malignant clusters, as well as a decrease in the recommendation of biopsy for benign lesions, was observed.

Authors of these previous studies of lesion classification with CAD used information from a single examination (11–17). When mammograms from multiple examinations are available, it can be expected that even higher accuracy may be achieved if the computer can utilize the information obtained from analysis of interval changes for the classification. We (18) have developed a classification scheme that combines prior and current information that is automatically extracted from masses on prior and current mammograms, respectively. We found that the classifier using the combined prior and current information performed significantly better (P = .015) in terms of the Az than did the classifier using current information alone. Thus, the purpose of our study was to evaluate the effects of CAD on radiologists’ characterization of masses on serial mammograms.


    MATERIALS AND METHODS
Data Set
A set of 253 temporal pairs of mammograms containing biopsy-proved masses on the current mammograms was selected consecutively from our mammogram database, and the images were digitized. The mammograms were obtained from patients who had undergone biopsy of breast masses at our department. The data collection protocol had been approved by our institutional review board. Patient informed consent was waived for this retrospective study. The selection criterion was that the patient had undergone serial examinations in which a corresponding mass could be identified. The masses on both the current and prior mammograms encompassed a range of sizes and conspicuity that would be seen in clinical practice. We also tried to approximately balance the number of patients with malignant and benign masses. The data set consisted of 406 mammograms from 96 patients. The mammograms were digitized with a laser scanner (LUMISCAN 85; Lumisys, Los Altos, Calif) at a pixel resolution of 50 x 50 µm and 4096 gray levels. The digitizer was calibrated so that gray-level values were linearly proportional to the optical density in the range of 0–4, with a slope of 0.001 per pixel value. The digitizer output was linearly converted so that a large pixel value corresponded to a low optical density. The image matrix size was reduced by averaging every 2 x 2 adjacent pixels and was down-sampled by a factor of 2, resulting in images with a pixel size of 100 x 100 µm for further analysis.
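The 2 x 2 averaging step described above can be illustrated with a minimal sketch. It assumes the image is held as a plain list of rows of pixel values; the function name `downsample_2x2` and the sample patch are hypothetical and are not taken from the study's software.

```python
def downsample_2x2(image):
    """Average each non-overlapping 2 x 2 block of a 2-D list of pixel values."""
    rows = len(image) - len(image) % 2        # ignore a trailing odd row, if any
    cols = len(image[0]) - len(image[0]) % 2  # ignore a trailing odd column, if any
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        out.append(out_row)
    return out


# Hypothetical 4 x 4 patch of 12-bit pixel values at 50-µm pixel size.
patch = [
    [100, 102, 200, 204],
    [104, 106, 208, 212],
    [4000, 4000, 0, 0],
    [4000, 4000, 0, 4],
]
reduced = downsample_2x2(patch)  # 2 x 2 result, i.e. 100-µm pixel size
```

Each output pixel is the mean of four input pixels, so the matrix shrinks by a factor of 2 in each direction while the gray-level range is preserved.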

There were 97 biopsy-proved masses (53 malignant and 44 benign) in 96 patients (age range, 37–86 years; mean, 59.6 years). One patient had a malignant mass in the left breast and a benign mass in the right breast. The 406 mammograms contained different mammographic views (193 craniocaudal, 177 mediolateral oblique, and 36 lateral) from multiple serial examinations of the masses, including those from the examination when the biopsy decision was made. By matching masses of the same view from two examinations, a total of 253 temporal pairs of images were formed, of which 138 had malignant and 115 had benign masses. In cases where there were only two examinations, a single pair was obtained for the given view. If there were three examinations, two or three temporal pairs were obtained. The distribution of the 253 temporal pairs among the 96 patients with 97 masses was as follows: 117 craniocaudal pairs originated from 87 masses, 115 mediolateral oblique pairs originated from 88 masses, and 21 lateral pairs originated from 17 masses. The same mass could have craniocaudal, mediolateral oblique, or lateral views. The prior mammogram was assessed as negative, benign, or probably benign in the prior year examination, and the majority remained so in retrospect. When a mass was not discretely visible on the prior mammogram, a Mammography Quality Standards Act–approved radiologist (M.A.H.), with 17 years of experience reading mammograms, defined the area where the mass would develop.
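As a rough illustration of how temporal pairs can be enumerated, the sketch below forms every prior/current combination of examinations within the same view. This is an assumption: the text states only that two or three pairs were obtained from three examinations, so the exact pairing rule used in the study is not specified here. The function name and the example dates are hypothetical.

```python
from itertools import combinations


def temporal_pairs(exams_by_view):
    """Return (view, prior, current) tuples for every same-view pair of examinations."""
    pairs = []
    for view, exam_dates in exams_by_view.items():
        # Sorting puts the earlier examination first, so it becomes the "prior".
        for prior, current in combinations(sorted(exam_dates), 2):
            pairs.append((view, prior, current))
    return pairs


# Hypothetical mass with three CC examinations and two MLO examinations.
# ISO-style date strings sort chronologically.
example = {"CC": ["2001-03", "1999-03", "2000-03"], "MLO": ["2000-03", "2001-03"]}
pairs = temporal_pairs(example)
```

With this rule, two examinations of a view give one pair and three examinations give three pairs, which is the upper end of the "two or three" stated above.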

Since all 97 masses in this data set had undergone biopsy, the benign masses in this set could not be prospectively distinguished clinically from the malignant masses based on current mammographic criteria. The radiologists might have observed changes in or suspicious features of the benign masses that prompted them to recommend biopsy.

For the malignant masses, the average mass size was 8.0 mm on the prior and 11.5 mm on the current mammogram. The corresponding sizes were 9.9 and 11.5 mm, respectively, for the benign masses.

To simulate a more realistic clinical situation in which a radiologist also has to distinguish mass-mimicking fibroglandular tissue from true masses, 34 additional temporal pairs containing corresponding normal structures on the serial mammograms were also included. These normal structures were selected by an experienced radiologist (M.A.H.) and were deemed to be difficult to distinguish from masses without further diagnostic work-up. The main reason for inclusion of temporal pairs containing normal structures was to reduce potential bias the radiologists might have when they evaluated the cases in an ROC experiment. If the data set contained only malignant and benign masses (without normal pairs), the radiologists might be biased and give more optimistic scores. However, the 34 temporal pairs containing normal structures were excluded from the data analysis. In the analysis of results, it is more important to study the improvement in radiologists’ performance when true masses are read. Therefore, all analyses were based on the 253 temporal pairs containing masses.

The radiologist also rated the visibility of the masses on the mammograms relative to those encountered in clinical practice by using a 10-point scale, with a score of 1 representing the most obvious and a score of 10 representing the most subtle masses. For the malignant and benign temporal pairs, the visibility of the masses on the current mammogram is plotted against that observed on the prior mammogram, as shown in Figure 1. Generally, the malignant masses were less visible on the prior than on the current mammogram, while the visibility of the benign masses was found to be more similar on the current and prior mammograms. The mean difference in the visibility ratings between prior and current mammograms for the malignant masses was 2.3 compared with 1.0 for the benign masses (P < .001 with an unpaired t test between the malignant and benign masses). The correlation coefficient was 0.02 for malignant masses (Fig 1a) and 0.31 for benign masses (Fig 1b). The temporal pairs had an interval of 6–48 months (Fig 2).



Figure 1. Graphs depict mass visibility on current and prior mammograms for (a) malignant and (b) benign temporal pairs of mammograms. Visibility was rated with a 10-point discrete scale (1, most obvious; 10, most subtle). Because many data points overlap, the number of points with the same rating is indicated by a number next to the symbol m or b. Diagonal line represents cases when the visibility ratings of current and prior masses are identical. The dashed linear regression line for the data is defined by (a) y = 0.121x + 7.599 and (b) y = 0.755x + 2.367. The correlation coefficient is 0.02 for malignant masses and 0.31 for benign masses.

 



 


Figure 2. Histogram illustrates the temporal interval between current and prior mammograms for the 253 temporal pairs of mammograms in the data set.

 
Computerized Classification of Temporal Masses
We have developed a classification technique that incorporates current and prior information to characterize the masses. The classification technique has been described in detail elsewhere (18). Figure 3 contains a flowchart of the method, which is summarized as follows: Initially, a region of interest (ROI) containing the mass was identified by a radiologist on both the current and prior mammograms. Automatic segmentation of the mass within each ROI was performed on the basis of a two-dimensional active contour model that was initialized with k-means clustering (19,20). Features related to texture, morphology, and spiculations were extracted from each mass (Appendix). A total of 35 features (20 based on run-length statistics, 12 morphological, and three spiculation) were extracted from each ROI. In addition, difference features were obtained by subtracting a prior feature from the corresponding current feature (Fig 3). Therefore, 35 difference features were derived from the 20 based on run-length statistics, 12 morphological features, and three spiculation features.
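The difference features described above amount to an element-wise subtraction of prior from current feature values. The following minimal sketch assumes each mass is summarized as a dictionary of named features; the feature names and values are hypothetical placeholders, not the study's actual feature set.

```python
def difference_features(current, prior):
    """Subtract each prior feature from the corresponding current feature."""
    if current.keys() != prior.keys():
        raise ValueError("current and prior must describe the same features")
    return {name: current[name] - prior[name] for name in current}


# Hypothetical feature values for one temporal pair (three of the 35 features).
current = {"short_run_emphasis": 0.75, "perimeter_mm": 42.0, "spiculation_measure": 1.5}
prior = {"short_run_emphasis": 0.50, "perimeter_mm": 30.0, "spiculation_measure": 1.0}
diff = difference_features(current, prior)
```

A positive difference means the feature value increased between the prior and the current examination.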



Figure 3. Flowchart of the classification method. LDA = linear discriminant classifier.

 
For the training and testing of the classifier, a "leave one case out" resampling scheme was used. To design a robust classifier, a subset of features was selected to reduce the dimensionality of the feature space. A stepwise feature selection was applied. For the 96 training subsets of temporal pairs used in this study, an average of seven features were selected for the classification. The most frequently selected features included two difference features based on run-length statistics (gray-level nonuniformity and short-run emphasis), three current features based on run-length statistics (short-run emphasis, run-length nonuniformity, and long-run emphasis), one spiculation feature from the current image, and one spiculation feature from the prior image. The distribution of the classifier test score is presented in Figure 4. Small values correspond to benign ratings and large values correspond to malignant ratings. By using the ROC methods, the overall performance of the classifier can be estimated on the basis of the classifier test scores.
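A leave-one-case-out split can be sketched as follows. The sketch assumes that a "case" corresponds to a patient, which is consistent with the 96 training subsets mentioned above, and that all temporal pairs from the held-out case are removed from training together. Feature selection and classifier training, which would be repeated inside each training subset, are not shown; all names are hypothetical.

```python
def leave_one_case_out(pair_case_ids):
    """Yield (held_out_case, train_indices, test_indices) for each distinct case."""
    for held_out in sorted(set(pair_case_ids)):
        train = [i for i, case in enumerate(pair_case_ids) if case != held_out]
        test = [i for i, case in enumerate(pair_case_ids) if case == held_out]
        yield held_out, train, test


# Hypothetical case label for each of six temporal pairs from three patients.
case_ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
splits = list(leave_one_case_out(case_ids))
```

Keeping every pair from one patient on the same side of the split avoids testing on pairs that are correlated with pairs seen during training.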



Figure 4. (a) Histogram of the classifier’s test scores. (b) Binormal distribution fitted to the histogram of the classifier’s test scores.

 



 
Relative Computer Malignancy Rating of the Masses
A relative computer malignancy rating ranging from 1 to 10 was provided to the radiologists for the reading with CAD. The relative malignancy rating was obtained by linearly scaling the classifier output within the interval between 1 and 10 and then rounding the result to the nearest integer. A rating of 1 corresponded to the highest score of the mass being benign, and a rating of 10 corresponded to the highest score of the mass being malignant. This transformation provided a more intuitive presentation of the scale to the observer than did the original classifier output. The linear transformation was not used to evaluate the classifier accuracy in terms of the class distributions or in terms of ROC analysis. Gaussian functions were fitted to the distributions of the malignant and benign samples to obtain a fitted binormal distribution with the classifier’s malignancy ratings of 1–10 (Fig 4b). The fitted distribution was displayed on a graphical user interface as a reference when the radiologist evaluated the temporal pairs using CAD.
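The linear scaling and rounding described above can be sketched as below. The sketch assumes that the smallest and largest classifier outputs map to ratings of 1 and 10, respectively; the text does not state which end points were used, and the function name and score values are hypothetical.

```python
import math


def relative_rating(score, score_min, score_max):
    """Linearly map a classifier score onto 1..10 and round to the nearest integer."""
    scaled = 1.0 + 9.0 * (score - score_min) / (score_max - score_min)
    # floor(x + 0.5) rounds halves upward; Python's round() rounds halves to even.
    return int(math.floor(scaled + 0.5))


# Hypothetical classifier outputs ranging from -2.0 (most benign) to 4.0 (most malignant).
ratings = [relative_rating(s, -2.0, 4.0) for s in (-2.0, 0.0, 1.0, 4.0)]
```

Because the mapping is monotonic, it changes only how the score is presented to the reader, not the ordering of the masses.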

Observer Performance Study
The observer study evaluated the radiologist’s performance in the classification of malignant and benign breast masses by interpreting a temporal pair of ROIs containing the mass on a display monitor. The radiologist was asked to provide an estimate of the likelihood of malignancy by using a 0%–100% scale and the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) assessment (21) of each mass. The study was designed in two reading conditions. The first reading condition is referred to as the independent mode, in which the radiologist read the masses without computer aid. The second reading condition is referred to as the sequential mode, in which the radiologist initially read a temporal pair without computer aid and then read the same pair with computer aid. First, the ratings without computer aid were recorded and then the computer rating of the mass was displayed on the monitor. The radiologist recorded the final rating after taking into consideration the computer rating. For simplicity of presentation, we will consider that there are a total of three modes from the aforementioned two readings—independent mode, sequential mode without CAD, and sequential mode with CAD. The sequential mode without CAD differs from the independent mode only in that the reader knew that the computer information would immediately follow. Eight radiologists (A.N., C.B., C.P., D.A., J.B., K.K., M.A.R., and S.P.) approved by the Mammography Quality Standards Act and two breast imaging fellows (M.F. and J.S.) participated as observers in this study. (There was no correspondence between order of the observers above and the observers’ numeric order in the Results section and the Tables.) The eight radiologists had experience in mammography that ranged from 3 to 24 years. The breast imaging fellows were certified by the American Board of Radiology and had at least 3 months of experience in breast imaging.

For the observer experiments, the 253 pairs of images containing masses were divided into four non-overlapping groups, with approximately one-quarter of the pairs in each group. Each radiologist participated in four reading sessions. In each session, the observer read the pairs of images of one group in independent mode and those of another group in sequential mode so that no pairs of images would be read in both modes in a single session. The reading order of the temporal pairs of images within one group was randomized for each observer. Each observer would read in the independent mode first and then in the sequential mode in two of the sessions and vice versa in the other two sessions. We systematically arranged the reading order of the groups and the order of the modes to balance the frequency of both in the reading sessions. This counterbalanced design was intended to minimize the potential effects such as learning, fatigue, and memorization on the outcomes of the observer experiments. For each radiologist, there was at least a 1-month interval between reading pairs of images of the first two groups and those of the second two groups to avoid recall bias. All 10 observers read the temporal pairs independently.

Each observer underwent a training session in which the purpose of the study, the experimental procedure, the rating scales, the performance of the computer classifier, and the computer’s rating scale were explained. The observer was also informed that the pairs of images included normal tissues in addition to malignant and benign masses. The prevalence of the malignant masses in the data set was not disclosed to the observer either in the training session or in the actual reading session. The observer then read 10 temporal pairs of images that were not used in the actual experiments to familiarize the observer with the reading processes and the user interface. The observer was informed of the true pathologic findings after rating each training case so that the findings could be compared to the observer’s own ratings and the computer rating. However, in the actual experiment, no information regarding the true findings was provided after the readings.

A graphical user interface was developed to present the temporal pairs of images containing ROIs to the radiologists (Figs 5, 6). The observer assessed the two ROIs of a temporal pair that were displayed side-by-side on a display workstation. The observers provided estimates of the likelihood of malignancy by using a scale of 1%–100% and by choosing one of the five standard BI-RADS categories: negative, benign, probably benign, suspicious, and highly suggestive of malignancy. When the computer rating was displayed in the sequential mode with CAD, the fitted binormal distribution of the relative computer malignancy rating was presented to the radiologists (Fig 4b) as a reference. The radiologists were allowed unlimited time for the evaluation of the temporal pairs. For each radiologist, we recorded the time for the evaluation of the temporal pairs in both independent and sequential modes.



Figure 5. Example of graphical user interface shows reading in the independent mode or sequential mode without CAD. ROI on the left is prior and that on the right is current. The radiologist provided two ratings: an estimate of the likelihood of malignancy and the BI-RADS assessment, shown at the upper right area of the screen.

 


Figure 6. Example of a graphical user interface shows reading in the sequential mode with computer aid. ROI on the left is prior and that on the right is current. The computer rating (8 in this example) was shown in the lower middle part of the screen. The performance of the computer classifier in terms of the distribution of the relative malignancy rating was shown in the lower right corner of the screen. In the sequential mode, the radiologist first evaluated the mass without CAD and then could change the likelihood of malignancy and/or BI-RADS assessment after taking into consideration the computer’s rating.

 
Statistical Analysis
The likelihood of malignancy ratings by the individual observers for the different reading conditions were analyzed by using ROC methods. The classification accuracy was quantified by using the total Az, as well as the partial area index (22) calculated above a sensitivity threshold of 0.90 (hereafter, 0.90A'z). The Az was estimated by using the Dorfman-Berbaum-Metz method for analysis of multireader multicase data (23), in which the maximum likelihood estimation of the binormal distributions was fitted to the observer ratings, deriving the ROC curve. The statistical significance of the difference in Az between the different reading conditions was also estimated by using the Dorfman-Berbaum-Metz method, the Student paired t test for analysis of observer-specific paired data, and the Obuchowski method (24). The Obuchowski method, which was also generalized by Lee and Rosner (25) for multireader, multimodality studies, accounts for the possible correlations that exist among the temporal pairs of images, such as craniocaudal and mediolateral oblique pairs in the same patient or pairs obtained from multiple years in the same patient.
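To make the two accuracy indexes concrete, the sketch below computes them for a binormal ROC curve with given parameters a and b. It is not the maximum likelihood fitting procedure used in the study; it assumes the parameters are already known. It uses the standard binormal relations TPF = Φ(a + b·Φ⁻¹(FPF)) and Az = Φ(a/√(1 + b²)), and it takes the partial area index to be the area between the ROC curve and the line TPF = 0.90, divided by (1 − 0.90), which is our reading of the index cited as reference 22. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def binormal_az(a, b):
    """Total area under a binormal ROC curve with parameters a and b."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


def partial_area_index(a, b, tpf0=0.90, steps=10000):
    """Normalized area above the sensitivity threshold tpf0 (midpoint rule)."""
    width = (1.0 - tpf0) / steps
    area = 0.0
    for i in range(steps):
        tpf = tpf0 + (i + 0.5) * width  # midpoints never reach exactly 1.0
        fpf = _STD_NORMAL.cdf((_STD_NORMAL.inv_cdf(tpf) - a) / b)
        area += (1.0 - fpf) * width     # horizontal strip to the right of the curve
    return area / (1.0 - tpf0)


# Hypothetical parameters: a chance-level curve and a better-than-chance curve.
chance_index = partial_area_index(0.0, 1.0)
better_index = partial_area_index(1.5, 1.0)
```

For a chance-level curve the total area is 0.5 but the partial area index above a sensitivity of 0.90 is only 0.05, which shows why the partial index takes much smaller values than Az.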

The radiologists’ diagnostic decision based on the BI-RADS assessment was analyzed in this study by partitioning the BI-RADS categories into two groups. Group 1 consisted of BI-RADS categories 1 and 2, and group 2 consisted of BI-RADS categories 3, 4, and 5. BI-RADS category 0 was not allowed. This partitioning was associated with the estimation of callbacks, referred to as the callback grouping. If a mass was assigned to group 1, then it was assumed that no callback would be recommended. If a mass was assigned to group 2, then it was assumed that at least a callback would be recommended. Each of the temporal pairs of images for an observer reading in a given mode was then classified to be a member of one of the two groups on the basis of the BI-RADS assessment. The changes in the group membership for the temporal pairs were then tallied for the different modes. A second partitioning was performed by combining BI-RADS categories 1, 2, and 3 into group 1 and BI-RADS categories 4 and 5 into group 2. This partitioning was associated with the estimation of biopsy recommendations, referred to as the biopsy recommendation grouping. If a mass was assigned to group 1, then it was assumed that no biopsy would be recommended. If a mass was assigned to group 2, then it was assumed that biopsy would be recommended.
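The two partitionings and the tally of group changes between reading modes can be sketched as follows. The group definitions come from the text above; the function names and the example BI-RADS assessments are hypothetical.

```python
CALLBACK_GROUP_2 = {3, 4, 5}  # callback grouping: categories assumed to lead to callback
BIOPSY_GROUP_2 = {4, 5}       # biopsy grouping: categories assumed to lead to biopsy


def group(birads, group_2_categories):
    """Return 1 or 2 for a BI-RADS category under the given partitioning."""
    if birads not in (1, 2, 3, 4, 5):
        raise ValueError("only BI-RADS categories 1-5 are allowed")
    return 2 if birads in group_2_categories else 1


def tally_changes(mode_a, mode_b, group_2_categories):
    """Count pairs moving from group 1 to 2 (added) and from 2 to 1 (removed)."""
    added = removed = 0
    for a, b in zip(mode_a, mode_b):
        before, after = group(a, group_2_categories), group(b, group_2_categories)
        if before == 1 and after == 2:
            added += 1
        elif before == 2 and after == 1:
            removed += 1
    return added, removed


# Hypothetical BI-RADS assessments by one reader for five temporal pairs.
independent = [3, 2, 4, 3, 5]
with_cad = [4, 2, 3, 4, 5]
callback_changes = tally_changes(independent, with_cad, CALLBACK_GROUP_2)
biopsy_changes = tally_changes(independent, with_cad, BIOPSY_GROUP_2)
```

In this example a change between categories 3 and 4 alters the biopsy grouping but not the callback grouping, because both categories fall in group 2 of the callback partition.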


    RESULTS
The Az values for the 10 radiologists participating in the study for the three reading modes are presented in Table 1. The average ROC curves for the three reading modes and the classifier are shown in Figure 7. The computer classifier’s Az value for the 253 temporal pairs was 0.87. The average ROC curves for the observers were obtained by averaging the fitted a and b parameters of the individual radiologist’s ROC curve for each mode and then calculating the ROC curve. The Az value was 0.79 for the independent mode, 0.81 for the sequential mode without CAD, and 0.84 for the sequential mode with CAD. The performance of the radiologist therefore improved, on average, when reading was made with computer aid. The improvement between the sequential mode with CAD and the independent mode was statistically significant (Table 2) (P = .005, Student paired t test; P = .005, Dorfman-Berbaum-Metz method; P = .01, Obuchowski method). In addition, the improvement in performance between the sequential mode with CAD and the sequential mode without CAD was also statistically significant (P = .001, Student paired t test; P = .001, Dorfman-Berbaum-Metz method; P < .001, Obuchowski method). An improvement was observed between the sequential mode without CAD and the independent mode, but it did not achieve statistical significance (P = .137, Student paired t test; P = .139, Dorfman-Berbaum-Metz method; P = .073, Obuchowski method).
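The construction of an average ROC curve from averaged a and b parameters can be sketched as below, using the binormal relation TPF = Φ(a + b·Φ⁻¹(FPF)). The reader parameters shown are hypothetical, since the fitted values for the individual radiologists are not listed in the text, and the function name is illustrative.

```python
from statistics import NormalDist, mean


def average_roc(reader_params, fpf_grid):
    """Average the binormal (a, b) pairs over readers and return (FPF, TPF) points."""
    a_mean = mean(a for a, _ in reader_params)
    b_mean = mean(b for _, b in reader_params)
    std_normal = NormalDist()
    # FPF values must lie strictly between 0 and 1 for the inverse CDF.
    return [(fpf, std_normal.cdf(a_mean + b_mean * std_normal.inv_cdf(fpf)))
            for fpf in fpf_grid]


# Hypothetical fitted parameters for two readers in one reading mode.
curve = average_roc([(1.0, 1.0), (2.0, 0.5)], [0.1, 0.5, 0.9])
```

Averaging the parameters rather than the curves themselves keeps the averaged curve within the binormal family, so a single Az value can be computed for it.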



 
TABLE 1. Az Values for the Characterization of Masses in the Three Modes

 


Figure 7. Average ROC curves for the three reading modes: independent (No CAD-Ind) (Az = 0.79), sequential without CAD (No CAD-Seq) (Az = 0.81), and sequential with CAD (With CAD-Seq) (Az = 0.84).

 


 
TABLE 2. Statistical Significance of the Difference in Az Values for the Three Modes

 
The computer classifier’s Az value was higher than the individual radiologists’ Az value obtained in the independent mode without CAD. In the sequential mode with CAD, radiologists 1 and 4 achieved higher Az than did the computer classifier. Radiologist 6, who read in the sequential mode with CAD, obtained an Az value of 0.87, which was the same as that of the computer classifier. The performance of radiologist 8 declined with the use of CAD; the Az value was 0.82 for the independent mode and decreased to 0.76 for the sequential mode with CAD. However, when the sequential mode without CAD was compared with the sequential mode with CAD for this radiologist, there was no change (Az = 0.76). For the rest of the radiologists, the improvement in Az value ranged between 0.02 and 0.10.

Similar trends can be observed in the 0.90A'z values for the three reading modes (Table 3). The computer classifier’s 0.90A'z value was 0.52. The statistical significance of the differences between every two of the three modes is presented in Table 4. The improvement in the radiologists’ classification accuracy for the sequential mode with CAD (0.90A'z = 0.37) compared with that for the independent mode (0.90A'z = 0.21) was statistically significant (P = .005, Student paired t test). Similarly, the improvement for the sequential mode with CAD (0.90A'z = 0.37) compared with that for the sequential mode without CAD (0.90A'z = 0.26) was also statistically significant (P = .001, Student paired t test). Again, an improvement was observed between the sequential mode without CAD and the independent mode, but it did not achieve statistical significance (P = .180, Student paired t test). For radiologist 8, there was an improvement in the 0.90A'z value for the readings in the sequential mode without CAD (0.90A'z = 0.14) and then with use of CAD (0.90A'z = 0.17).



 
TABLE 3. 0.90A'z Values for the Characterization of Masses in the Three Modes

 


 
TABLE 4. Statistical Significance of the Difference in 0.90A'z Values for the Three Modes

 
On the basis of the BI-RADS assessment, the computer classifier’s influence on the radiologist’s diagnostic decision was evaluated. Results of the callback grouping based on the BI-RADS assessment for the three modes are presented in Table 5. When the radiologists evaluated the temporal pairs in the sequential mode with CAD, an average (per radiologist) of 2.3% (3.2 of 138) of additional malignant masses were correctly recommended for callback and 0.6% (0.7 of 115) of additional benign masses were incorrectly recommended for callback compared with the evaluation in the independent mode. Reading in the sequential mode with CAD compared with reading in the sequential mode without CAD resulted in an average of 1.4% (1.9 of 138) of additional correct callbacks for malignant masses and 2.1% (2.4 of 115) of additional incorrect callbacks for benign masses. A comparison of the results obtained from readings in the independent mode and the sequential mode without CAD is also shown in Table 5. Although both readings were conducted without CAD, there was, on average, a correct reduction of 1.5% (1.7 of 115) of callbacks for benign masses and an increase of 0.9% (1.3 of 138) of callbacks for malignant masses when reading in the sequential mode without CAD.
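The percentages above appear to be the average per-radiologist change in the number of masses, divided by the number of masses in that class. A short sketch of that arithmetic follows, using the counts quoted in the text; the function and variable names are illustrative only.

```python
N_MALIGNANT = 138  # malignant masses (denominator quoted in the text)
N_BENIGN = 115     # benign masses (denominator quoted in the text)

def percent_change(mean_count_change, n_masses):
    """Average per-radiologist change in count, as a percentage of the class."""
    return 100.0 * mean_count_change / n_masses

# Sequential mode with CAD versus independent mode, values quoted in the text.
extra_malignant_callbacks = percent_change(3.2, N_MALIGNANT)
extra_benign_callbacks = percent_change(0.7, N_BENIGN)
print(f"{extra_malignant_callbacks:.1f}% {extra_benign_callbacks:.1f}%")  # 2.3% 0.6%
```

An increase for malignant masses corresponds to a gain in sensitivity, and an increase for benign masses corresponds to a loss in specificity.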



 
TABLE 5. Results of the Callback Grouping Based on BI-RADS Assessment for the Three Modes

 
Results of the biopsy recommendation grouping based on BI-RADS assessment for the three modes are presented in Table 6. When the radiologists evaluated the temporal pairs in the sequential mode with CAD compared with the independent mode, an improvement was obtained. There was an average reduction of 0.7% (0.8 of 115) of recommended biopsies for benign masses, and an additional 5.7% (7.8 of 138) of malignant masses were correctly recommended for biopsy. A comparison of the sequential mode with CAD with the sequential mode without CAD indicated that an average of 1.0% (1.1 of 115) of additional incorrect biopsy recommendations for benign masses were made, and an additional 4.0% (5.5 of 138) of correct biopsy recommendations for malignant masses were made. Table 6 also shows the results of the comparison of reading in the independent mode with reading in the sequential mode without CAD. On average, a correct reduction of 1.7% (1.9 of 115) in biopsy recommendations was observed for benign masses, and an additional 1.7% (2.3 of 138) of correct biopsy recommendations for malignant masses were made when reading was performed in the sequential mode without CAD.



 
TABLE 6. Results of the Biopsy Recommendation Grouping Based on BI-RADS Assessment for the Three Modes

 
The reading time per temporal pair of mammograms for the 10 radiologists was 8.3–26.0 seconds (mean, 14.5 seconds) in the independent mode and 13.8–35.8 seconds (mean, 18.8 seconds) in the sequential mode. The increase in the reading time in the sequential mode (reading without CAD followed by reading with CAD) compared with reading in the independent mode (reading only without CAD) was statistically significant (P < .001).


    DISCUSSION
 
In this ROC study, we observed an improvement in the radiologists’ performance in the estimation of the likelihood of malignancy of masses seen on temporal pairs of mammograms when the radiologists read with computer aid. An improvement was also observed when the performance was evaluated in terms of BI-RADS assessment. To our knowledge, this was the first ROC study in which masses were evaluated by the radiologists on temporal pairs of mammograms and the computer classifier also used information regarding the temporal change in the classification of masses.

To study whether the presence of a computer influences observer performance, we used two reading modes without CAD: the independent mode and the sequential mode without CAD. We observed an interesting phenomenon: Seven of the 10 radiologists improved their Az values when reading in the sequential mode without CAD. Although the improvement did not achieve statistical significance and intraobserver variability might have contributed to the differences, this appeared to be consistent with our observation in another clinical observer study (26) for breast cancer detection. In that study, the callback rate for the study group increased during the reading without CAD compared with that for the screening population not participating in the study, and the sensitivity of cancer detection was relatively high (91%) compared with the sensitivities reported in the literature. This reflects the possibility of a subtle change in behavior when that behavior is being studied.

In two other observer studies, in which the effect of CAD on radiologists’ performance in detection of lung nodules was evaluated (27,28), the independent and sequential modes were also compared. Kobayashi et al (27) found that 10 of 16 observers improved their Az values when reading in the sequential mode without CAD compared with reading in the independent mode. The average Az value for the 16 observers was 0.894 for the independent mode and 0.906 for the sequential mode without CAD. In another study (28), the average Az value for the independent mode was 0.829 and that for the sequential mode without CAD was 0.835. Therefore, in both studies, the same trend was observed as in our study, although the differences again did not achieve statistical significance.

Beiden et al (29) discussed the psychological phenomenon of reader vigilance, even though it did not produce a statistically significant change in the radiologists’ performance in the aforementioned studies. Many radiologists may operate at a higher sensitivity level if they are aware that their performance is being evaluated. This awareness is accentuated when the computer’s reading is displayed immediately after the radiologist’s reading of each temporal pair of mammograms. There are exceptions. In our study, the performance of two radiologists (radiologists 8 and 10) decreased from the independent reading to the sequential reading without CAD. However, when the readings in the sequential mode without CAD and with CAD were compared, radiologist 8 showed an improvement in the 0.90A'z value. With CAD, radiologist 10 showed improved results that exceeded those of the reading in the independent mode.

The performance in terms of Az and 0.90A'z values was better in sequential mode with CAD than in the other modes. The improvement between reading in the sequential mode with CAD (Az = 0.84, 0.90A'z = 0.37) and the independent mode (Az = 0.79, 0.90A'z = 0.21) was greater than the improvement between reading in the sequential mode with CAD (Az = 0.84, 0.90A'z = 0.37) and the sequential mode without CAD (Az = 0.81, 0.90A'z = 0.26). However, reading in the sequential mode with CAD versus the sequential mode without CAD had higher statistical significance (P = .001, Student paired t test; P = .001, Dorfman-Berbaum-Metz method; P < .001, Obuchowski method for Az difference; P = .001, Student paired t test for 0.90A'z difference) than reading in the sequential mode with CAD versus the independent mode (P = .005, Student paired t test; P = .005, Dorfman-Berbaum-Metz method; P = .01, Obuchowski method for Az difference; P = .005, Student paired t test for 0.90A'z difference). This finding may be attributed to the fact that the correlation between the scores in the sequential mode with and without CAD is higher than the correlation between the scores in the independent mode and the sequential mode with CAD. The higher correlation leads to a smaller variance for the difference between reading in the sequential mode with and without CAD and thus a higher statistical significance in their difference.
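The variance argument above can be written out as a brief sketch, assuming X and Y denote a reader's scores in the two modes being compared, with standard deviations \sigma_X and \sigma_Y and correlation \rho; these symbols are introduced here for illustration and do not appear in the original analysis.

```latex
% d_i = X_i - Y_i is the per-reader difference between two reading modes,
% \bar{d} its mean, s_d its sample standard deviation, n the number of readers.
\operatorname{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y,
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```

A larger \rho shrinks the variance of the difference and therefore s_d, so a smaller mean difference (Az of 0.84 vs 0.81) can yield a larger t statistic than a larger mean difference (Az of 0.84 vs 0.79) measured between less correlated readings.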

Beiden et al (29) analyzed the variance components of the ROC accuracy measures for comparing independent versus sequential reading and reached the conclusion that sequential reading is expected to achieve higher statistical significance. Our results appear to be consistent with this expectation. The estimation based on the Obuchowski analysis that accounted for the possible correlation among the pairs of images did not change the trend or statistical significance of the results in comparison with those obtained with 253 temporal pairs of images.

The BI-RADS assessments provided by the radiologists allowed an estimation of the specific action that the radiologists would take after evaluating the temporal pairs of images. Generally, when the radiologists used CAD, they correctly recommended additional callbacks for malignant masses but also increased the callbacks for benign masses. This indicates that the radiologists would increase their sensitivity but might also reduce their specificity when they used CAD as discussed earlier and by Helvie et al (26). However, when the independent mode is compared with the sequential mode without CAD in terms of callback, we again observe the phenomenon that the radiologists were influenced by the presence of the computer. In this case, the trend is different: On average, the radiologists had a slight decrease in callbacks for benign masses and a correct increase in callbacks for malignant masses when evaluating in the sequential mode without CAD.

Performance based on the estimation of biopsy recommendations was better for the sequential mode with CAD than for the other two modes. Compared with the independent mode, the sequential mode with CAD yielded, on average, a correct decrease in biopsy recommendations for benign masses (0.7%, 0.8 of 115) and an increase in biopsy recommendations for malignant masses (5.7%, 7.8 of 138). For the sequential mode with CAD compared with the sequential mode without CAD, the radiologists also achieved, on average, a correct increase in biopsy recommendations for malignant masses (4.0%, 5.5 of 138); however, they also incorrectly increased biopsy recommendations for benign masses (1.0%, 1.1 of 115). Again, this suggests that the radiologists operated in a higher sensitivity mode when they used CAD. In this case, they correctly increased, on average, the recommendations for biopsy of malignant masses and did not substantially increase the recommendations for biopsy of benign masses. Note that the ROC curve for the radiologists’ reading with CAD is higher than the ROC curves for reading without CAD. The increase in sensitivity is therefore not a result of changing the operating point along their ROC curve but an actual increase in their overall accuracy.

When the independent mode is compared with the sequential mode without CAD in terms of biopsy recommendation, it again appears that the radiologists were influenced by the presence of the computer. On average, the radiologists correctly reduced biopsy recommendations for benign masses and increased biopsy recommendations for malignant masses. If the individual radiologists’ decisions are reviewed, it can be seen that there are large variations regarding the effect of CAD. These variations may be caused by differences in the radiologists’ confidence levels in the CAD system. The positive effect may increase if the accuracy of the computer classifier is further improved or if the confidence of the radiologists increases after they accumulate more experience in working with CAD.

The increase in the reading time for the sequential mode compared with that for the independent mode occurred because two conditions were evaluated in the sequential mode (reading without CAD followed by reading with CAD), whereas only one condition was evaluated in the independent mode (reading without CAD). We did not observe a correlation between the reading time and the observer performance results.

We did not observe a specific trend distinguishing the performance of the breast imaging fellows from that of the radiologists. This may be explained by the fact that we included only two imaging fellows, which was insufficient to show a trend.

There are some limitations of our study. Ideally, the classifier should be developed on the basis of an independent data set and then applied to the data set used to evaluate the radiologist performance. However, we were limited in the size of the data set with temporal pairs collected for this study. A split of the data set would reduce the statistical power of the study. We used a "leave one case out" resampling method to develop and test our classifier with the same data set as that used for the observer performance study. The method is well established in the pattern recognition literature as a statistically valid technique for estimation of the classifier performance in an unknown population. The test scores of the classifier were presented to the radiologists in the observer study. Furthermore, the purpose of this study was not to measure the absolute performance of the radiologists in comparison with that of the classifier. Rather, our goal was to demonstrate that there is a relative improvement in radiologists’ performance when they use a computer classifier that has a reasonable performance as a second opinion. We believe that the use of a different data set will not change the conclusions as long as the computer classifier has a reasonable performance.
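The "leave one case out" resampling described above can be sketched as follows. This is a minimal illustration, assuming that every sample from the held-out case is excluded from training; the nearest-class-mean scorer is a simple stand-in and is not the classifier used in the study, and all names and data are hypothetical.

```python
def class_mean(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_mean_score(train, features):
    """Stand-in scorer: higher values mean more malignant-like.

    train is a list of (case_id, features, label) with label 1 = malignant.
    """
    malignant = class_mean([f for _, f, y in train if y == 1])
    benign = class_mean([f for _, f, y in train if y == 0])
    return squared_distance(features, benign) - squared_distance(features, malignant)

def leave_one_case_out(samples, scorer):
    """Score each sample with a model trained without that sample's case."""
    scores = []
    for case_id, features, _ in samples:
        # Drop every sample that shares the held-out case identifier.
        train = [s for s in samples if s[0] != case_id]
        scores.append(scorer(train, features))
    return scores

# Hypothetical toy data: (case_id, feature vector, label).
samples = [
    ("A", [0.9, 0.8], 1), ("A", [0.8, 0.9], 1),
    ("B", [0.1, 0.2], 0),
    ("C", [0.7, 0.9], 1),
    ("D", [0.2, 0.1], 0), ("D", [0.3, 0.2], 0),
]
test_scores = leave_one_case_out(samples, nearest_mean_score)
```

Because no sample contributes to the model that scores it, the resulting test scores approximate performance on unseen cases while still using the whole data set.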

In conclusion, we have performed an observer study to evaluate the effects of CAD on radiologists’ characterization of masses on serial mammograms. The radiologists’ performance improved significantly (P = .005) when they read with computer aid compared with reading without computer aid. Additional biopsies were correctly recommended for the malignant masses when reading with computer aid, and some biopsy recommendations for benign masses were avoided. These results suggest that CAD may be helpful in improving the accuracy of biopsy recommendations. Further studies are needed to determine whether these improvements can be realized in clinical settings, where the prevalence of malignancy is much lower than that in an observer study.


    APPENDIX
 
Features related to texture, morphology, and spiculation were extracted from each mass. The texture features were based on run-length statistics matrices (30). The run-length statistics matrices were computed from the images obtained with the rubber-band-straightening transform (11). The rubber-band-straightening transform maps a band of pixels surrounding the mass onto the Cartesian plane (a rectangular region). The texture features were extracted from the vertical and horizontal gradient-magnitude images (11). Five texture measures, namely, short-run emphasis, long-run emphasis, gray-level nonuniformity, run-length nonuniformity, and run percentage, were extracted from the vertical and horizontal gradient images in two directions, θ = 0° and θ = 90°. Therefore, for each ROI, a total of 20 features were calculated. The definition of the feature measures based on run-length statistics matrices can be found in the literature (30).
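The feature count above (five measures, two gradient images, two directions) can be illustrated with a short sketch. The formulas below are the commonly cited forms of the five run-length measures; the gray-level quantization and any normalization used in the study are not stated here, so the toy images and all names are assumptions.

```python
from collections import Counter

def run_lengths(image, theta):
    """Count runs of equal gray level along 0 degrees (rows) or 90 degrees (columns).

    Returns a Counter mapping (gray_level, run_length) to the number of runs.
    """
    if theta == 0:
        lines = [list(row) for row in image]
    elif theta == 90:
        lines = [list(col) for col in zip(*image)]
    else:
        raise ValueError("this sketch handles only 0 and 90 degrees")
    runs = Counter()
    for line in lines:
        start = 0
        for k in range(1, len(line) + 1):
            # Close the current run at the end of the line or at a gray-level change.
            if k == len(line) or line[k] != line[start]:
                runs[(line[start], k - start)] += 1
                start = k
    return runs

def run_length_features(image, theta):
    """Five run-length measures in their commonly cited form."""
    runs = run_lengths(image, theta)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    runs_per_level = Counter()
    runs_per_length = Counter()
    for (level, length), count in runs.items():
        runs_per_level[level] += count
        runs_per_length[length] += count
    return {
        "short_run_emphasis": sum(c / length ** 2 for (_, length), c in runs.items()) / n_runs,
        "long_run_emphasis": sum(c * length ** 2 for (_, length), c in runs.items()) / n_runs,
        "gray_level_nonuniformity": sum(c ** 2 for c in runs_per_level.values()) / n_runs,
        "run_length_nonuniformity": sum(c ** 2 for c in runs_per_length.values()) / n_runs,
        "run_percentage": n_runs / n_pixels,
    }

# Hypothetical quantized images standing in for the two gradient-magnitude images.
vertical_gradient = [[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 2, 2]]
horizontal_gradient = [[1, 0, 0, 2],
                       [1, 0, 0, 2],
                       [1, 1, 0, 2]]

# 2 images x 2 directions x 5 measures = 20 texture features per ROI.
feature_vector = [
    value
    for image in (vertical_gradient, horizontal_gradient)
    for theta in (0, 90)
    for value in run_length_features(image, theta).values()
]
```

In practice the gradient-magnitude images would first be quantized to a small number of gray levels so that runs of equal value are long enough to be informative.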

Morphological features were extracted from the automatically segmented mass shape. Five of the morphological features were based on the normalized radial length, that is, the Euclidean distance from the object’s centroid to each of its edge pixels (the radial length), normalized relative to the maximum radial length for the object (15). The following five features of the normalized radial length were extracted: mean, standard deviation, entropy, area ratio, and zero-crossing count. In addition, the perimeter, area, circularity, rectangularity, contrast, perimeter-to-area ratio, and Fourier descriptor were extracted. The definitions of the morphological features can be found in the literature (20,31). Three of the morphological features (perimeter, area, and perimeter-to-area ratio) are related to, and thus serve as descriptors of, the mass size.
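The normalized radial length can be sketched as follows. This is an illustration under stated assumptions: the centroid is supplied by the caller, the population standard deviation is used, and a zero crossing is counted whenever consecutive border points fall on opposite sides of the mean. Entropy and area ratio are omitted because their exact definitions are given only in the cited literature. All names and the toy contour are hypothetical.

```python
import math
import statistics

def normalized_radial_lengths(border, centroid):
    """Distance from the centroid to each border pixel, divided by the maximum."""
    cx, cy = centroid
    radial = [math.hypot(x - cx, y - cy) for x, y in border]
    longest = max(radial)
    return [r / longest for r in radial]

def nrl_features(nrl):
    """Mean, standard deviation, and zero-crossing count of the NRL sequence."""
    mean = statistics.mean(nrl)
    centered = [d - mean for d in nrl]
    # Treat the border as a closed contour: pair the last point with the first.
    neighbors = centered[1:] + centered[:1]
    crossings = sum(1 for a, b in zip(centered, neighbors) if a * b < 0)
    return {
        "mean": mean,
        "standard_deviation": statistics.pstdev(nrl),  # assumption: population SD
        "zero_crossing_count": crossings,
    }

# Hypothetical border: corners and edge midpoints of a square centered at the origin.
border = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
nrl = normalized_radial_lengths(border, centroid=(0, 0))
features = nrl_features(nrl)
```

A smooth, round contour gives a normalized radial length that stays close to its mean, whereas an irregular contour gives a larger standard deviation and more crossings of the mean.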

A spiculation measure was defined for each pixel on the mass border by using the statistics based on the directions of image gradients of pixels outside the mass border, relative to the normal direction to the mass border. The statistics were determined in a 90° sector centered about the normal at the border pixel and outside of the mass border (19,20). The spiculation measure for each border pixel was normalized to be between 0 and π/2, with π/4 indicating a random orientation of image gradients and larger values indicating a higher likelihood of spiculation. Three features were extracted from the spiculation measure. The first feature was the average of the spiculation measure for all pixels on the mass boundary. The second feature was the percentage of border pixels with a spiculation measure larger than π/4. The third feature was the average of the spiculation measure for pixels with a spiculation measure larger than π/4.
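Given a spiculation measure for each border pixel, the three features described above reduce to simple summaries. The sketch below assumes the per-pixel measures have already been computed and normalized to the range 0 to π/2; the function name, the dictionary keys, the input values, and the choice to return 0.0 when no pixel exceeds the threshold are all assumptions.

```python
import math

THRESHOLD = math.pi / 4  # value the text associates with randomly oriented gradients

def spiculation_features(measures):
    """Summarize per-border-pixel spiculation measures into three features."""
    if not measures:
        raise ValueError("at least one border pixel is required")
    above = [m for m in measures if m > THRESHOLD]
    return {
        "mean_all": sum(measures) / len(measures),
        "percent_above": 100.0 * len(above) / len(measures),
        # Assumption: report 0.0 when no border pixel exceeds the threshold.
        "mean_above": sum(above) / len(above) if above else 0.0,
    }

# Hypothetical measures for four border pixels, in radians.
features = spiculation_features([0.2, 0.9, 1.2, 0.5])
```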


    ACKNOWLEDGMENTS
 
The authors are grateful to Charles E. Metz, PhD, for the use of the LABMRMC program.


    FOOTNOTES
 
Abbreviations: Az = area under ROC curve, 0.90A'z = partial Az index, BI-RADS = Breast Imaging Reporting and Data System, CAD = computer-aided diagnosis, ROC = receiver operating characteristic, ROI = region of interest

Authors stated no financial relationship to disclose.

Author contributions: Guarantor of integrity of entire study, L.H.; study concepts and design, L.H., H.P.C., B.S., M.A.H.; literature research, L.H., H.P.C., B.S.; experimental studies, M.A.R., C.B., C.P., J.B., K.K., M.F., S.P., M.A.H., D.A., A.N., J.S.; data acquisition, all authors; data analysis/interpretation, L.H., H.P.C., B.S., N.P.; statistical analysis, L.H.; manuscript preparation, definition of intellectual content, editing, revision/review, and final version approval, all authors


    REFERENCES
 

  1. Greenlee RT, Hill-Harmon MB, Murray T, Thun M. Cancer statistics, 2001. CA Cancer J Clin 2001; 51:15-36.
  2. Zuckerman HC. The role of mammography in the diagnosis of breast cancer. In: Ariel IM, Cleary JB, eds. Breast cancer, diagnosis and treatment. New York, NY: McGraw-Hill, 1987; 152-172.
  3. Tabar L, Dean PB. The control of breast cancer through mammography screening: what is the evidence? Radiol Clin North Am 1987; 25:993-1005.
  4. Sickles EA. Mammographic features of 300 consecutive nonpalpable breast cancers. AJR Am J Roentgenol 1986; 146:661-663.
  5. Kopans DB. The positive predictive value of mammography. AJR Am J Roentgenol 1992; 158:521-526.
  6. Adler DD, Helvie MA. Mammographic biopsy recommendations. Curr Opin Radiol 1992; 4:123-129.
  7. Bassett L, Shayestehfar B, Hirbawi I. Obtaining previous mammograms for comparison: usefulness and costs. AJR Am J Roentgenol 1994; 163:1083-1086.
  8. Sickles EA. Periodic mammographic follow-up of probably benign lesions: results in 3183 consecutive cases. Radiology 1991; 179:463-468.
  9. Burnside E, Sickles E, Sohlich R, Dee K. Differential value of comparison with previous examinations in diagnostic versus screening mammography. AJR Am J Roentgenol 2002; 179:1173-1177.
  10. Chan HP, Sahiner B, Helvie MA, et al. Improvement of radiologists’ characterization of mammographic masses by computer-aided diagnosis: an ROC study. Radiology 1999; 212:817-827.
  11. Sahiner B, Chan H, Petrick N, Helvie M, Goodsitt M. Computerized characterization of masses on mammograms: the rubber band straightening transform and texture analysis. Med Phys 1998; 25:516-526.
  12. Huo Z, Giger M, Vyborny C, Metz C. Breast cancer: effectiveness of computer-aided diagnosis—observer study with independent database of mammograms. Radiology 2002; 224:560-568.
  13. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6:22-33.
  14. Chan H, Sahiner B, Petrick N, et al. Computerized classification of malignant and benign microcalcifications on mammograms: texture analysis using an artificial neural network. Phys Med Biol 1997; 42:549-567.
  15. Kilday J, Palmieri F, Fox MD. Classifying mammographic lesions using computer-aided image analysis. IEEE Trans Med Imaging 1993; 12:664-669.
  16. Hadjiiski L, Sahiner B, Chan HP, Petrick N, Helvie MA. Classification of malignant and benign masses based on hybrid ART2LDA approach. IEEE Trans Med Imaging 1999; 18:1178-1187.
  17. Tourassi G, Markey M, Lo J, Floyd C. A neural network approach to breast cancer diagnosis as a constraint satisfaction problem. Med Phys 2001; 28:804-811.
  18. Hadjiiski L, Sahiner B, Chan HP, Petrick N, Helvie MA, Gurcan M. Analysis of temporal change of mammographic features: computer-aided classification of malignant and benign breast masses. Med Phys 2001; 28:2309-2317.
  19. Sahiner B, Chan HP, Petrick N, Hadjiiski LM, Helvie MA, Paquerault S. Active contour models for segmentation and characterization of mammographic masses. In: Proceedings of the 5th International Workshop on Digital Mammography. Toronto, Canada. Madison, Wis: Medical Physics, 2001; 357-362.
  20. Sahiner B, Chan H, Petrick N, Helvie M, Hadjiiski L. Improvement of mammographic mass characterization using spiculation measures and morphological features. Med Phys 2001; 28:1455-1465.
  21. American College of Radiology. Breast imaging and data system atlas (BI-RADS atlas). Reston, Va: American College of Radiology, 2003.
  22. Jiang Y, Metz C, Nishikawa R. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201:745-750.
  23. Dorfman DD, Berbaum KS, Metz CE. ROC rating analysis: generalization to the population of readers and cases with the jackknife method. Invest Radiol 1992; 27:723-731.
  24. Obuchowski N. Nonparametric analysis of clustered ROC curve data. Biometrics 1997; 53:567-578.
  25. Lee ML, Rosner BA. The average area under correlated receiver operating characteristic curves: a nonparametric approach based on generalized two-sample Wilcoxon statistics. J R Stat Soc C Appl Stat 2001; 50:337-344.
  26. Helvie MA, Hadjiiski LM, Makariou E, Chan HP, Petrick N, Lo SB. A non-commercial CAD system for breast cancer detection on screening mammograms achieves high sensitivity: a pilot clinical trial (abstr). Radiology 2002; 225(P):459.
  27. Kobayashi T, Xu X, MacMahon H, Metz C, Doi K. Effect of a computer-aided diagnosis scheme on radiologists’ performance in detection of lung nodules on radiographs. Radiology 1996; 199:843-848.
  28. U.S. Food and Drug Administration, Center for Devices and Radiological Health. Radiological devices advisory panel meeting, March 5, 2001: review of Deus RapidScreen. Available at www.fda.gov/search/databases.html. Accessed June 7, 2002.
  29. Beiden S, Wagner R, Doi K, et al. Independent versus sequential reading in ROC studies of computer-assist modalities: analysis of component of variance. Acad Radiol 2002; 9:1036-1043.
  30. Galloway MM. Texture classification using gray level run lengths. Comput Graph Image Proc 1975; 4:172-179.
  31. Petrick N, Chan H, Sahiner B, Helvie M. Combined adaptive enhancement and region-growing segmentation of breast masses on digitized mammograms. Med Phys 1999; 26:1642-1654.


