Breast Imaging
1 From the Institute of Biomedical Engineering, National Taiwan University, 1, Section 1, Jen-Ai Rd, Taipei 100, Taiwan (C.M.C., K.C.H., G.S.H.); Department of Radiology, Division of Ultrasound, Taipei Veterans General Hospital and National Yang Ming University, Taiwan (Y.H.C., C.M.T., H.J.C.); and Department of Radiology, Division of Ultrasound, Taipei Veterans General Hospital (S.Y.C.). Received November 19, 2001; revision requested January 28, 2002; revision received May 17; accepted June 27. Supported by National Science Council grant NSC90-2213-E-002-103, Taiwan. Address correspondence to C.M.C. (e-mail: ming@lotus.mc.ntu.edu.tw).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Two sets of breast sonograms were evaluated. The first set contained 160 lesions and was stored directly on the magnetic optic disks from the ultrasonographic (US) system. Four different boundaries were delineated by four persons for each lesion in the first set. The second set comprised 111 lesions that were extracted from the hard-copy images. Seven morphologic features were used, five of which were newly developed. A multilayer feed-forward neural network was used as the classifier. Reliability, extendability, and robustness of the proposed CAD algorithm were evaluated. Results with the proposed algorithm were compared with those with two previous CAD algorithms. All performance comparisons were based on paired-samples t tests.
RESULTS: The area under the receiver operating characteristic curve (Az) was 0.952 ± 0.014 for the first set, 0.982 ± 0.004 for the first set as the training set and the second set as the prediction set, 0.954 ± 0.016 for the second set as the training set and the first set as the prediction set, and 0.950 ± 0.005 for all 271 lesions. At the 5% significance level, the performance of the proposed CAD algorithm was shown to be extendible from one set of US images to the other set and robust for both small and large sample sizes. Moreover, the proposed CAD algorithm was shown to outperform the two previous CAD algorithms in terms of the Az value.
CONCLUSION: The proposed CAD algorithm could effectively and reliably differentiate benign and malignant lesions. The proposed morphologic features were nearly setting independent and could tolerate reasonable variation in boundary delineation.
© RSNA, 2003
Index terms: Breast neoplasms, diagnosis, 00.30; Breast neoplasms, US, 00.1298; Computers, diagnostic aid; Computers, neural network; Images, analysis
| INTRODUCTION |
|---|
Breast sonography was shown to be an effective adjunct to mammography in reducing the number of negative biopsy results (4–8). For example, with deliberately devised sonographic features, Stavros et al (7) were able to attain an overall sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of 98.4%, 67.8%, 72.9%, 38%, and 99.5%, respectively. Potentially effective as it is, breast sonography remains controversial for screening because interpretation of the ultrasonographic (US) images is greatly influenced by the scanning techniques and the sonographic features of the suspected abnormality. Breast sonologists with different levels of experience might interpret the same sonograms differently. To minimize the effect of the operator-dependent nature inherent in US, many computerized approaches have been proposed to assist differentiation between benign and malignant breast lesions (9–15).
The general idea of computer-aided diagnosis (CAD) for breast sonography is to convert the visually extractable sonographic features into mathematic models and to characterize the lesions with the mathematic features based on the classification schemes. The mathematic features may be categorized into two classes, namely, the regional features and the morphologic features. The regional features characterize the image properties evolved from the intensity distribution (eg, echogenicity, echotexture), whereas the morphologic features describe the shape and contour of the lesion. As an example, with use of the mathematic features that quantify lesion margin, shape, homogeneity, and posterior acoustic attenuation pattern, Giger et al (12) achieved values for the area under the receiver operating characteristic (ROC) curve (Az) of 0.94 and 0.87 for the entire database and the equivocal database on the basis of linear discriminant analysis (LDA).
Although promising performances have been reported, CAD for breast sonography is still impractical for routine use because previous mathematic features depend on either the setting of the US systems or the contour extraction process. It is easy to show that most regional features vary nonlinearly with the system setting. For instance, the co-occurrence matrix used by Garra et al (10) may fluctuate with such system parameters as the time-gain compensation, total gain, and focal depth. To avoid this problem, many previous CAD algorithms require that all breast images be obtained with the same system parameter setting (13–15). This constraint is clinically undesirable. On the other hand, since the morphologic features (eg, the contour gradients [11]) are derived from the contour, they are more susceptible to the contour extraction process than are the regional features. Ideally, this problem may be solved by using automatic contour extraction schemes. However, automatic contour extraction on a US image is a difficult task in general, and no satisfactory approaches exist so far, to our knowledge.
The purpose of this study was to develop a CAD algorithm with setting-independent features and artificial neural networks to differentiate benign from malignant breast lesions. More specifically, this study aimed to design a set of morphologic features that were nearly independent of not only the system setting but also the contour extraction process.
| MATERIALS AND METHODS |
|---|
Study Subjects and Image Acquisition
Two sets of breast US images, selected randomly from the database of a medical center in Taiwan, were used in this study. The image data were collected during a period of 4 years. The institutional review boards agreed that the patient images and clinical information could be used for study without written consent if anonymity was maintained. This regulation was carefully followed in the present study.
The first set of sonograms was obtained from September 9, 1996, to June 6, 2000, in 160 female patients (age range, 16–85 years; mean age, 46 years). The sonograms depicted 160 breast lesions, including 42 cysts, 49 fibroadenomas, and 69 carcinomas, that were pathologically proven. They were stored directly (by using the system built-in function) on a US system (HDI 3000; Advanced Technological Laboratory, Bothell, Wash) equipped with a broadband 5–10-MHz linear electronically focused transducer and cine loop capability.
The second set of sonograms was obtained from January 1, 1997, to December 31, 1998, in 111 women (age range, 18–82 years; mean age, 42 years). The sonograms depicted 111 breast lesions that were pathologically proven, including 40 fibroadenomas and 71 carcinomas. They were obtained with the same US system that was used for the first set. Unlike the first set of US images, the second set comprised hard-copy images. The lesions were extracted from these sonograms by first digitizing these images with film scanners (HP6300C; Hewlett-Packard, Palo Alto, Calif).
No constraint was imposed on the system settings during acquisition of these images. The sonologists were free to adjust the system settings to obtain the best views. In both sets, the lesion boundaries were delineated manually. The first set of lesions served as the primary basis for performance evaluation and comparison of the complete morphologic and regional information preserved in the directly stored US images. To take into account the potential variation of delineation among different persons, the first set of lesions was delineated by four graduate students (K.C.H. and others), and each student was supervised by one of four attending physicians (Y.H.C., C.M.T., H.J.C., S.Y.C.) with 22, 7, 7, and 3 years of experience in breast sonography, respectively. For each lesion, size variation was defined as the ratio of the SD to the mean of the sizes of the four delineated lesion boundaries. The mean ± SD of variations of lesion size for all lesions in the first set was 9.1% ± 7.7.
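The size variation defined above is a coefficient of variation over the four delineations of one lesion. A minimal sketch follows; the function name and the example areas are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not state which SD estimator was used.

```python
from statistics import mean, stdev

def size_variation(sizes):
    """Ratio of the SD to the mean of the delineated sizes of one lesion.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    return stdev(sizes) / mean(sizes)

# Hypothetical areas (in pixels) of one lesion traced by four people.
areas = [1180.0, 1240.0, 1305.0, 1275.0]
print(f"size variation: {size_variation(areas):.1%}")
```

Averaging this ratio over all lesions would give a summary of the kind reported for the first set.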
The second set of lesions served as a larger number of samples with which to evaluate the extendability and robustness of the proposed CAD algorithm. The second set of lesions was delineated by only one graduate student (G.S.H.), under the supervision of an attending physician (Y.H.C.) with 22 years of experience in breast sonography. Two of the five graduate students (K.C.H., G.S.H.) were involved in development of the CAD algorithm.
Feature Extraction
Seven morphologic features were extracted from each lesion to account for such sonographic features as shape, contour, and size. Five of these morphologic features were newly developed, including the number of substantial protuberances and depressions (NSPD), lobulation index (LI), elliptic-normalized circumference (ENC), elliptic-normalized skeleton (ENS), and long axis to short axis (L:S) ratio. The other two features were clinically useful indicators (19): depth-to-width (D:W) ratio and size of the lesion.
NSPD. The spiculation (7) and irregular shape and contour (8) of a lesion are two important sonographic features that characterize a malignant breast lesion. The NSPD is an effective descriptor with which to quantify these two sonographic features of a lesion. With a geographic analogy, a protuberance and a depression are like a peninsula and a bay, respectively. As an example, Figure 1 shows typical protuberances and depressions in a malignant breast lesion. Since protuberances and depressions may easily result from a wobbly delineation process, as described in the Appendix, only the substantial protuberances and depressions defined by the representative convex and concave points, respectively, were used to characterize a breast lesion.
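Given an angle threshold $\theta_p$, let $\Pi = \{\pi_1, \pi_2, \ldots, \pi_p\}$ and $\Delta = \{\delta_1, \delta_2, \ldots, \delta_d\}$ be the sets of representative convex and concave points, respectively, of a lesion boundary, where $p$ and $d$ are the numbers of points in each set. The NSPD, denoted by $\mathrm{NSPD}(\theta_p)$, is defined as $p + d$, where $\theta_p \in \{20^\circ, 30^\circ, 40^\circ, 50^\circ, 60^\circ\}$. Ideally, a malignant breast lesion has a larger NSPD.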
LI. LI was devised to characterize the size distribution of the lobes in a lesion. As illustrated in Figure 2, a lobe is defined as the gray region enclosed by the lesion contour and the dashed line connecting two adjacent representative concave points. The size of the lobe is the area of the gray region. Suppose a breast lesion has $N_l$ lobes and the size of the $i$th lobe is $A_i$, $i = 1, \ldots, N_l$. Let $A_{\max}$ and $A_{\min}$ denote the sizes of the largest and the smallest lobes. The LI is then defined as
$$\mathrm{LI} = \frac{A_{\max} - A_{\min}}{\frac{1}{N_l}\sum_{i=1}^{N_l} A_i}$$
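As a minimal sketch of the LI computation, the function below assumes that the LI is the difference between the largest and the smallest lobe areas divided by the mean lobe area. This form is an assumption inferred from the definitions of the lobe sizes and from the statement in the Discussion that the normalization factor for LI is the mean area of the lobes. The function name and the example areas are hypothetical.

```python
def lobulation_index(lobe_areas):
    """LI under the assumed form (A_max - A_min) / mean(A_i)."""
    if not lobe_areas:
        raise ValueError("at least one lobe area is required")
    mean_area = sum(lobe_areas) / len(lobe_areas)
    return (max(lobe_areas) - min(lobe_areas)) / mean_area

# Hypothetical lobe areas (in pixels) of one lesion.
print(lobulation_index([120.0, 95.0, 310.0]))
```

Under this form, a lesion with lobes of equal size has an LI of 0, and the index grows as the lobe sizes become more disparate.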
ENC. Anfractuosity is a common morphologic characteristic of malignant lesion boundaries that provides at least two visually appreciable geometric features. One feature is the multiple protuberances and depressions that may be well described with the NSPD. The other feature is the lengthened circumference due to the circuitous boundaries that define the protuberances and depressions. Since the boundary of a smaller lesion would appear to be more winding than that of a larger lesion with the same circumference, the circumference itself is not a good descriptor with which to characterize the anfractuosity of the lesion boundary. A more reasonable approach is to quantify the anfractuosity as the percentage of circumference increment relative to a lesion-dependent baseline. An ideal baseline would be a smooth curve around which the lesion boundary appears to twine.
To quantify the anfractuosity of a lesion contour, the circumference ratio of the lesion and its equivalent ellipse is proposed, which is termed the ENC. The equivalent ellipse of a lesion (20,21) is an ellipse with the same area and center of mass as those of the lesion when the interiors of the lesion and its equivalent ellipse are both set to the same constant gray level. For instance, Figure 3 shows the equivalent ellipse and boundary of a malignant lesion. Perceptually, one can see that the equivalent ellipse roughly captures the shape of the lesion, and the lesion boundary meanders around the equivalent ellipse.
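The ENC can be sketched for a boundary given as polygon vertices. The text specifies only that the equivalent ellipse shares the area and center of mass of the lesion; the sketch below additionally assumes that the axis ratio of the ellipse follows from the second-order central moments of the lesion region, and it approximates the ellipse circumference with Ramanujan's formula. All function names are hypothetical.

```python
import math

def _signed_area(pts):
    # Shoelace formula; positive for counterclockwise vertex order.
    return 0.5 * sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))

def equivalent_ellipse_axes(pts):
    """Semi-axes (a >= b) of an ellipse with the polygon's area and, by
    assumption, the same ratio of principal second-order central moments."""
    pts = list(pts)
    if _signed_area(pts) < 0:  # enforce counterclockwise order
        pts.reverse()
    area = _signed_area(pts)
    edges = list(zip(pts, pts[1:] + pts[:1]))
    cx = sum((x0 + x1) * (x0 * y1 - x1 * y0)
             for (x0, y0), (x1, y1) in edges) / (6 * area)
    cy = sum((y0 + y1) * (x0 * y1 - x1 * y0)
             for (x0, y0), (x1, y1) in edges) / (6 * area)
    sxx = syy = sxy = 0.0
    for (x0, y0), (x1, y1) in edges:
        # Shift the centroid to the origin to obtain central moments.
        x0, y0, x1, y1 = x0 - cx, y0 - cy, x1 - cx, y1 - cy
        cross = x0 * y1 - x1 * y0
        sxx += cross * (x0 * x0 + x0 * x1 + x1 * x1)
        syy += cross * (y0 * y0 + y0 * y1 + y1 * y1)
        sxy += cross * (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0)
    sxx, syy, sxy = sxx / (12 * area), syy / (12 * area), sxy / (24 * area)
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    ratio = math.sqrt((half_trace + root) / (half_trace - root))  # a / b
    a = math.sqrt(area * ratio / math.pi)  # so that pi * a * b equals area
    return a, a / ratio

def elliptic_normalized_circumference(pts):
    """Lesion circumference divided by that of its equivalent ellipse."""
    pts = list(pts)
    perimeter = sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))
    a, b = equivalent_ellipse_axes(pts)
    # Ramanujan's approximation of the ellipse circumference.
    ellipse = math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
    return perimeter / ellipse
```

With this sketch, a nearly circular boundary yields a value close to 1, whereas a boundary with many protuberances and depressions yields a clearly larger value. The ratio of the two returned semi-axes also corresponds to the ratio of the long axis to the short axis of the equivalent ellipse.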
ENS. The skeleton of a lesion region $R$ with boundary $B_R$ is the set of all points $x$, where $x$ is within $R$ and there exist at least two boundary points, $p_i$ and $p_j$, in $B_R$ such that $d(x, p_i) = d(x, p_j) = \min\{d(x, p_k) \mid p_k \in B_R\}$, where $d(\cdot)$ is any preferred distance metric (eg, Euclidean, city block). With the malignant breast lesion shown in Figure 3 as an example, the skeleton of the lesion is given in Figure 4.
In addition to these four descriptors (NSPD, LI, ENC, and ENS), which capture the contour and shape characteristics, three more mathematic features are considered to incorporate two clinically useful indicators. The first feature is the D:W ratio of the lesion (11,12,19). The depth and the width of a lesion are the vertical and horizontal edge lengths, respectively, of the minimal circumscribed rectangle of the lesion. The larger the D:W ratio, the more likely the lesion is malignant. Since the D:W ratio may vary with the scanning angle and the compressing pressure, however, we suggest use of another quantity to describe the shape of the lesion, namely, the L:S ratio. The L:S ratio is the length ratio of the major (long) axis to the minor (short) axis of the equivalent ellipse of the lesion. Clearly, the L:S ratio is independent of the scanning angle but may be affected by the compressing pressure. The last feature is the size of the lesion (ie, the area within the lesion boundary). Clinically, the larger the breast lesion, the more likely the lesion is malignant.
Feature Selection and Classification
Feature selection is usually used to select a set of features that potentially yield the best performance with the given classifier. These selected features are referred to as the substantial features. The classifier is then trained with the substantial features to determine the mathematic model that describes the relation of these features.
The substantial features were selected for each training data set on the basis of the forward sequential search approach (23) with the logistic discrimination function (17). To minimize the estimation bias (16), the classification accuracy $\gamma(Y)$ for a feature set $Y$ was evaluated by means of the leave-one-out cross-validation strategy for each training data set. More specifically, the counter $n_c$ was set to 0. For every datum $l_i$ in the training data set $\Omega$ with $m$ data, the logistic discrimination function was constructed from the training data $\Omega_i = \Omega - \{l_i\}$ with the feature set $Y$. If $l_i$ was predicted correctly with the derived logistic discrimination function, $n_c$ was increased by 1. After all $l_i$ had been evaluated, the accuracy was computed as $\gamma(Y) = n_c/m$.
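The leave-one-out procedure described above can be sketched generically. In the sketch below, the classifier is passed in as a pair of hypothetical `fit` and `predict` callables; a simple nearest-mean classifier stands in for the logistic discrimination function, which is not reproduced here.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Fraction of data predicted correctly when each datum is held out once."""
    n_c = 0  # counter of correct predictions
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            n_c += 1
    return n_c / len(samples)

def fit_nearest_mean(xs, ys):
    # Stand-in classifier (an assumption): one mean feature vector per class.
    means = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def predict_nearest_mean(means, x):
    # Assign the class whose mean is closest in squared Euclidean distance.
    return min(means, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(x, means[label])))

# Hypothetical one-feature data: 0 = benign, 1 = malignant.
samples = [[1.0], [1.2], [0.9], [3.0], [3.1], [2.8]]
labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(samples, labels, fit_nearest_mean, predict_nearest_mean))
```

Passing the classifier as callables keeps the cross-validation loop independent of the particular discrimination function.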
Feature selection was performed in two stages for each training data set. In the first stage, the best NSPD value was selected from five candidate NSPD values that corresponded to five $\theta_p$ values (ie, $\theta_p \in \{20^\circ, 30^\circ, 40^\circ, 50^\circ, 60^\circ\}$). Then, in the second stage, the selected NSPD value along with the other six features was used to select the essential features that yielded the best classification accuracy for the underlying training data set.
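The second stage can be illustrated with a generic forward sequential search. The sketch below is not the exact procedure of reference 23: the stopping rule (stop when no added feature improves the score) is an assumption, and the `score` callable and the toy scoring function are hypothetical stand-ins for the leave-one-out classification accuracy.

```python
def forward_sequential_search(feature_names, score):
    """Greedily add the feature that most improves score(feature_list)."""
    selected, best = [], float("-inf")
    remaining = list(feature_names)
    while remaining:
        trial_score, trial_feature = max(
            (score(selected + [f]), f) for f in remaining)
        if trial_score <= best:  # assumed stopping rule: no improvement
            break
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best = trial_score
    return selected, best

# Toy score (hypothetical): per-feature gains minus a per-feature penalty.
gains = {"NSPD": 0.30, "ENC": 0.10, "size": 0.01}

def toy_score(features):
    return 0.5 + sum(gains[f] for f in features) - 0.03 * len(features)

print(forward_sequential_search(list(gains), toy_score))
```

In this toy example, the search keeps NSPD and ENC and stops before adding size, because the third feature lowers the score.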
The classifier used in the present study is a multilayer feed-forward neural network (MFNN). Once the essential features were selected by means of the logistic discrimination function for a set of training data, these data were used to train the MFNN to classify lesions into benign and malignant categories. As depicted in Figure 5, the MFNN used in this study was a two-layer feed-forward neural network with one hidden layer. The number of inputs for the MFNN was set to be the same as the number of essential features, and the number of neurons in the output layer was set to 1 for the underlying two-class classification. Some suggestions were made previously (eg, the Kolmogorov theorem [24]) to determine the number of neurons in the hidden layer, but none of them led to satisfactory performance. Instead, the number of neurons in the hidden layer was determined through exhaustive experiments over the range of two to 10 neurons. As a result, the number of neurons in the hidden layer was set to two because results with two neurons gave the best performance for almost all cases evaluated.
The discrepancy between the desired output and the estimated output was back propagated to modify the synaptic weights until the discrepancy was within the acceptable range. Since the final output $o$ was a number between 0 and 1, a threshold $T_{NN}$, where NN is neural network, was required to assign the datum to the benign or the malignant category. In the present study, $T_{NN}$ was either set to the value that yielded the dichotomization with the best classification accuracy for the training data or varied from 0 to 1 to generate the ROC curve.
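The two uses of the output threshold can be sketched as follows. The function names and example outputs are hypothetical, a datum is assumed to be called malignant when its output is at or above the threshold, and the area is computed with the trapezoidal rule, which may differ from the estimate produced by statistical software.

```python
def roc_curve(outputs, labels):
    """(FPR, TPR, threshold) triples obtained by sweeping the threshold."""
    thresholds = sorted(set(outputs)) + [float("inf")]
    pos = sum(labels)
    neg = len(labels) - pos
    curve = []
    for t in thresholds:
        tp = sum(1 for o, y in zip(outputs, labels) if o >= t and y == 1)
        fp = sum(1 for o, y in zip(outputs, labels) if o >= t and y == 0)
        curve.append((fp / neg, tp / pos, t))
    return curve

def area_under_curve(curve):
    # Trapezoidal rule over the empirical ROC points.
    pts = sorted((fpr, tpr) for fpr, tpr, _ in curve)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def best_accuracy_threshold(outputs, labels):
    """Lowest threshold that attains the best accuracy on these data."""
    def accuracy(t):
        return sum((o >= t) == bool(y)
                   for o, y in zip(outputs, labels)) / len(labels)
    return max(sorted(set(outputs)), key=accuracy)

# Hypothetical network outputs; 1 = malignant, 0 = benign.
outputs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(area_under_curve(roc_curve(outputs, labels)))
print(best_accuracy_threshold(outputs, labels))
```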
Comparative Performance Analysis
Five experiments were conducted in the present study for performance analysis. Two performance measures were reported for each analysis. One measure was the Az value, which was calculated by using commercially available statistical software (SPSS for Windows, version 10; SPSS, Chicago, Ill). The other measure was the best classification accuracy (TP + TN)/(TP + TN + FP + FN) along with the associated sensitivity (TP/[TP + FN]), specificity (TN/[TN + FP]), positive predictive value (TP/[TP + FP]), and negative predictive value (TN/[TN + FN]), where TP is the number of true-positive findings (ie, a malignant lesion is considered to be malignant); TN, true-negative; FP, false-positive; and FN, false-negative.
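The accuracy-related measures defined above follow directly from the four counts. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def performance_measures(tp, tn, fp, fn):
    """Accuracy and associated measures from the confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }

# Hypothetical counts for 160 lesions (69 malignant, 91 benign).
for name, value in performance_measures(tp=60, tn=80, fp=11, fn=9).items():
    print(f"{name}: {value:.3f}")
```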
For performance comparison, the Az values were used because the best classification accuracy is not necessarily the preferred criterion for classification. Sometimes, one would rather have a higher sensitivity or specificity than have the best accuracy. Except for the third experiment, all performance measures were derived on the basis of the leave-one-out cross-validation strategy.
To evaluate the reliability of the proposed morphologic features, in the first experiment, denoted as C160, the proposed CAD algorithm was evaluated with the four sets of boundaries drawn independently by four people for each of the 160 breast lesions in the first set of US images. To justify the necessity for feature selection, the second experiment, denoted as C160A, repeated the first experiment but without feature selection (ie, all seven features were used by the MFNN classifier). A paired-samples t test was used to test if incorporation of feature selection would yield a better performance (ie, if C160 was significantly better than C160A), with the significance level set at α = .05.
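The statistic behind a paired-samples t test can be sketched as follows, assuming that the paired observations are the performance values obtained with each of the four sets of boundaries. The function name and the example values are hypothetical, and the P value (not computed here) would be obtained from the Student t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical Az values of two implementations for four boundary sets.
with_selection = [0.95, 0.96, 0.94, 0.97]
without_selection = [0.93, 0.95, 0.92, 0.94]
print(paired_t_statistic(with_selection, without_selection))
```

Because the same lesions and boundary sets underlie both implementations, the test operates on the per-pair differences rather than on the two groups separately.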
Results of the third experiment validated the extendibility of the proposed CAD algorithm. Since the two sets of US images used in the present study originated from two archiving media, degradation for the boundary definition of the lesions that was caused by the acquisition procedure or the archiving medium was potentially different for each set. Therefore, these two sets of US images might be considered as samples from two different sample spaces. In this experiment, we attempted to investigate how well the classifier derived on the basis of images from one sample space could be extended to those from the other sample space. Two implementations were performed. One implementation was performed with the first set of US images as the training set and the second set as the prediction set, which was denoted as C271f. In reverse, the other implementation was performed with the second set of US images as the training set and the first set as the prediction set, which was denoted as C271r. The classifier was trained on the training set on the basis of the leave-one-out cross-validation strategy. Recall that only the first set of US images had four sets of boundaries. Paired-samples t tests were used to test if C271f and C271r had the same or better performance than did C160 and if C271f and C271r had the same performance, with the significance level set at α = .05.
The fourth experiment, denoted as C271LC, was performed to investigate the robustness of the proposed CAD algorithm. That is, we attempted to evaluate how well the performance achieved in the first experiment with a smaller sample size could be reproduced with a larger number of samples from heterogeneous sample spaces. All 271 breast lesions were involved in the fourth experiment, and the leave-one-out cross-validation strategy was used. To validate the robustness of the proposed CAD algorithm, a paired-samples t test was used to determine if C271LC had the same performance as C160, with the significance level set at α = .05.
For comparative study, in the fifth experiment, the proposed CAD algorithm was compared with two previous CAD algorithms with the first set of breast lesions. The first algorithm, proposed by Giger et al (12), was denoted as LDAGiger. The Giger algorithm included four mathematic features, namely, normalized radial gradient, D:W ratio, coarseness, and the mean gray-level difference between the region of interest within the lesion and that posterior to the lesion, denoted by µ1 - µ2. The classification scheme was the LDA. Since the MFNN is usually superior to the LDA, a modified implementation, denoted as MFNNGiger, in which the MFNN replaced the LDA, was also evaluated for a fair comparison with our approach. The number of neurons in the hidden layer was also determined by searching in the range of two to 10 neurons. As a result, 10 neurons were used in the hidden layer for the modified Giger algorithm.
The second CAD algorithm to be compared with the proposed CAD algorithm was proposed by Chen et al (13), which was denoted as MFNNChen. The Chen algorithm was based solely on a texture feature (ie, normalized autocorrelation coefficients obtained from a rectangular region of interest that enclosed the lesion). The size of the region of interest was 12 mm extended from the lesion margin in all directions. The feature vector contained 5 × 5 autocorrelation coefficients. The classifier used by Chen et al (13) was an MFNN (with 25 inputs, 10 hidden nodes, and one output node) that was also trained with the error back-propagation training algorithm.
To test if C160 had significantly better performance than that of LDAGiger, MFNNGiger, and MFNNChen, paired-samples t tests were applied with the significance level set at α = .05. Moreover, a paired-samples t test was used to determine the relative performance among these three implementations.
In addition to these five experiments, the performance of each individual feature, including the proposed seven features and the Giger features, was evaluated by means of logistic discrimination analysis (17) based on the leave-one-out cross-validation strategy. Paired-samples t tests were used to compare the performances of every pair of individual features. It should be emphasized that all algorithms were evaluated for statistical robustness with four collections of lesion boundaries. As a summary, Table 1 lists the notations for the implementations performed in this study.
| RESULTS |
|---|
The SDs of the Az value and the best classification accuracy for C160 were 0.014 and 1.9%, respectively. On the basis of the experimental results, the proposed CAD algorithm potentially may be capable of tolerating the variations in boundary definition due to manual delineation by different persons.
C160A, respectively, and the performance data at the best classification accuracy are summarized in Table 3. The test hypothesis and result of the paired-samples t test, labeled as T1, to compare C160 with C160A are given in Table 4. Since P = .020 < α = .05, the null hypothesis should be rejected (ie, C160 was superior to C160A at the 5% significance level), which justified the advantage of adopting feature selection in the proposed CAD algorithm. The means and SDs of the Az values and the performance data at the best classification accuracy for C271f, C271r, and C271LC are provided in Table 4. These three implementations attained reasonably high performances, which were greater than 0.954 and 91.4% for the mean Az and the mean best accuracy, respectively.
At α = .05, results of the paired-samples t test T2 suggested that the null hypothesis be rejected, which implied that the performance of C271f was significantly higher than that of C160 at the 5% significance level. Results of the paired-samples t tests T3–T5 suggest that these three null hypotheses be accepted, since all three P values were greater than α = .05. In other words, at the 5% significance level, the performance of C160 was the same as that of C271r and C271LC, and there was no significant difference between the performances of C271f and C271r. From the test results of T2–T4, it might be concluded that in comparison with use of only the first set of US images, the proposed classifier derived on the basis of the images from one sample space might be generalized to the images from another sample space without performance degradation at the 5% significance level. Moreover, the test result of T5 validated that the proposed CAD algorithm was robust in the sense that its performance would be the same for both small and large sample sizes at the 5% significance level.
With the first set of US images, the mean and SD of the Az value and the performance data at the best classification accuracy for LDAGiger, MFNNGiger, and MFNNChen are provided in Table 6. The performances attained with LDAGiger and MFNNChen were substantially lower than those reported in references 12 and 13. For pairwise performance comparisons among C160, LDAGiger, MFNNGiger, and MFNNChen, the test hypotheses and results of the paired-samples t tests are given in Table 7. With P values less than α = .05, all null hypotheses should be rejected. It might be concluded that the relative performances of these four algorithms were MFNNChen < LDAGiger < MFNNGiger < C160, where A < B denotes that A is worse than B at the 5% significance level.
In these tests, µA and µB stand for the mean Az values attained with features A and B, respectively. Since all these P values were less than α = .05, NSPD, LI, ENC, and ENS were better than lesion size at the 5% significance level. Furthermore, results of paired-samples t tests suggest that the performances of NSPD, LI, and ENC remained the same for different sample sizes at the 5% significance level. That is, NSPD, LI, and ENC were robust. The P values of these three tests were .204, .405, and .587, respectively. Notably, the mean Az values and the mean best accuracies of NSPD in both Figures 6 and 7 were greater than 0.94 and 0.91, respectively. In particular, use of NSPD or ENS alone could outperform LDAGiger, MFNNGiger, and MFNNChen at the 5% significance level.
| DISCUSSION |
|---|
On the basis of the high performance mean and low performance variation achieved with four sets of lesion boundaries in the first, third, and fourth experiments in the present study, the proposed CAD algorithm was shown to be an effective and robust approach to differentiation of benign from malignant breast lesions. Results of the third and fourth experiments further validated the extendibility and robustness of the proposed CAD algorithm. At the 5% significance level, the promising results obtained in these two experiments suggest that with the proposed CAD algorithm, the classifier trained by the images directly captured and stored in the electronic storage media may be applied to the hard-copy images and vice versa. Moreover, the proposed CAD algorithm was robust in the sense that performance remained the same for both small and large sample sizes.
The high performance of the proposed CAD algorithm resulted mainly from the effective and reliable morphologic features and incorporation of feature selection. Results of the evaluation of each individual feature showed that NSPD, ENC, and ENS were better than all the Giger features and that either of the first two alone would outperform LDAGiger, MFNNGiger, and MFNNChen at the 5% significance level. On the other hand, NSPD, LI, and ENC were shown to give reliable performance for different sample sizes. They were intrinsically reliable because a small local variation in contour delineation would not lead to a dramatic change in feature values. For example, the NSPD counts the substantial protuberances and depressions in a lesion boundary. Reasonable variation in contour delineation might alter the shape of the lesion boundary but would not cause a big change in the NSPD. For LI and ENC, any reasonable variation in local delineation would be diluted by their own normalization factors, which were the mean area of the lobes for LI and the circumference of the equivalent ellipse for ENC. Since these normalization factors were usually on the order of 100 or 1,000, the potential value changes in LI and ENC would be small relative to the dynamic ranges of these features.
On the basis of the effective features, feature selection was a necessary and beneficial step to further integrate the differential power of each individual feature, while accounting for the problem of "curse of dimensionality." The curse of dimensionality suggests that the sampling density of the training data is too low to promise a meaningful estimation of a high-dimensional classification function with all seven features with the available finite number of training data (16). As verified in the second experiment, performance with feature selection (ie, C160) was superior to that without feature selection (C160A). It should be emphasized that feature selection is basically a learning process, and the best features vary as the training data change. This finding means that for a practical CAD system, feature selection should be performed frequently to allow learning from the changing training data sets.
The proposed CAD algorithm was shown to be better than the algorithms of Giger et al (12) and Chen et al (13). Setting dependence is one of the major reasons that the previous CAD algorithms are impractical for clinical use in the differentiation of benign from malignant breast lesions. This problem is particularly serious for those algorithms based on the regional features. For example, the Chen algorithm (13) was able to attain Az values and classification accuracy as high as 0.956 and 95%, respectively, when the system setting was basically fixed. Results of the fifth experiment, however, showed that the Chen algorithm performed poorly with the first set of US images, which were acquired without any constraint imposed on the system setting. Any nonlinear change in the system setting may cause a nonnegligible variation in the normalized autocorrelation coefficients used with the Chen algorithm, even for the same lesion.
Similarly, the Giger algorithm (12) also had the setting-dependence problem because of the two regional features involved (ie, coarseness and mean gray-level difference between the region of interest within the lesion and that posterior to the lesion [µ1 - µ2]). Worse than the normalized autocorrelation coefficients, these two regional features may give different values, even with a linear change in the system setting. On the other hand, the morphologic feature of normalized radial gradient was sensitive to the local delineation as a result of the gradient type of information. That is, a small zigzag in the contour might result in a drastic change in the gradient. The experimental result showed that the performance of normalized radial gradient was substantially worse than that of NSPD, ENS, and ENC. Figure 8 reveals that none of these four features could provide sufficient differential power by itself.
The setting dependence of the regional features and the high sensitivity to the local delineation might account for the discrepancy between the high performance reported by Giger et al (12) and the low performance achieved in our fifth experiment. Although the performance improved with MFNN as the classifier, the best performance of the Giger mathematic features was still inferior to that of the proposed CAD algorithm. Moreover, the reliability of the Giger features remained questionable because of their setting and operator dependence.
The bar graphs in Figures 6 and 7 suggest that the mathematic features based on the aspect ratio of the lesion (ie, D:W ratio and L:S ratio) were not effective for differentiating a malignant from a benign lesion, although the D:W ratio is considered a clinically useful indicator (19). The L:S ratio, in particular, was devised to eliminate the dependence on the scanning angle that is inherent in the D:W ratio. The low classification accuracy of the L:S ratio therefore seemed to imply that the aspect ratio of a lesion is not a useful indicator of lesion malignancy.
In conclusion, setting independence is clearly a crucial property for a CAD algorithm to be used in practice. To assist differential diagnosis of benign and malignant breast lesions without imposing constraints on system settings, we propose, on the basis of findings in the present study, a new CAD algorithm with nearly setting-independent morphologic features and an artificial neural network as the classifier. The proposed morphologic features are by no means comprehensive, although the experimental results indicated that NSPD, LI, and ENC are effective and reliable. We believe that further exploration of setting-independent regional features that faithfully characterize echotexture, sound transmission, and angular margin will be required to form a complete set of mathematic features for CAD of breast lesions.
| APPENDIX |
|---|
$\theta_i(k) = \operatorname{sgn}(\ldots)$

A point $p_i$ is defined as a convex point if $\theta_i(k) \ge \theta_p$, where $\theta_p$ is a prespecified positive threshold value. Similarly, a point $p_i$ is defined as a concave point if $h_i \ge \sqrt{2}$ pixels and $h_i$ is the local maximum among the neighborhood of $p_i$ and $\theta_i(k) \le \theta_d$, where $\theta_d$ is a prespecified negative threshold value. The threshold $\sqrt{2}$ pixels, which is the distance between two diagonal pixels, is set to eliminate undesirable depression caused by the unsteady delineation process. If two consecutive convex points do not have any concave point in between, the one with the smaller k-curve angle is eliminated. Likewise, if two consecutive concave points do not have any convex point in between, the one with the smaller depression depth is removed. Let $\hat{P} = \{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_p\}$ and $\hat{D} = \{\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_d\}$ be the sets of points after redundant points have been removed, where p and d are the numbers of points in each set. Then, each point $\hat{p}_j$ in $\hat{P}$ is called a representative convex point that defines a substantial protuberance and each point $\hat{d}_j$ in $\hat{D}$ is called a representative concave point that defines a substantial depression.
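The removal of redundant points can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the `Candidate` record and the `prune` function are hypothetical names, the candidates are assumed to have already passed the threshold tests, and the contour is treated as an open sequence, so a run that wraps around the start of a closed contour is not merged.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    index: int    # position of the point along the contour
    kind: str     # "convex" or "concave"
    angle: float  # k-curve angle in degrees
    depth: float  # depression depth in pixels (used for concave points)


def strength(c):
    """Quantity compared within a run: angle for convex, depth for concave."""
    return c.angle if c.kind == "convex" else c.depth


def prune(candidates):
    """Keep one representative per run of consecutive same-kind candidates."""
    kept = []
    for c in sorted(candidates, key=lambda c: c.index):
        if kept and kept[-1].kind == c.kind:
            # No point of the other kind lies in between: drop the weaker one.
            if strength(c) > strength(kept[-1]):
                kept[-1] = c
        else:
            kept.append(c)
    return kept


# Hypothetical candidates in contour order.
candidates = [
    Candidate(3, "convex", 35.0, 0.0),
    Candidate(9, "convex", 50.0, 0.0),
    Candidate(15, "concave", -30.0, 2.0),
    Candidate(21, "concave", -25.0, 3.5),
    Candidate(30, "convex", 40.0, 0.0),
]
representatives = prune(candidates)
```

In this example the surviving points alternate between convex and concave, so each one marks a separate protuberance or depression.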
Empirically, $\theta_d$ was determined in consideration of two conflicting observations. On one hand, it is common to find a depression with a slowly varying contour (ie, the k-curve angle is small) so that $\theta_d$ should be kept as small as possible. On the other hand, it is easy to generate a depression with a small k-curve angle and a small depression depth simply owing to a wobbly delineation process, which may be considered noise. As a compromise, $\theta_d$ was set to -20°, which tolerated 10° of aberration from the ideal contour on each side of a concave point.
To determine k, consider a depression with the smallest possible depression depth (ie, $h_i = \sqrt{2}$ pixels). One may easily obtain that $k \approx 8$ pixels for $\theta_d = -20°$ by approximating the depression as a triangle and determining k with the Pythagorean theorem. When the discrete property of a digital image is taken into account, it is appropriate to use either k = 7 or 8 to evaluate a depression under the lower-bound condition (ie, given $h_i = \sqrt{2}$ pixels and $\theta_d = -20°$). In this study, k was set to 7 to allow the smaller depressions and protuberances. Since there was no reasonable constraint for $\theta_p$, we decided to determine $\theta_p$ through learning from data. More specifically, five $\theta_p$ values were considered in this study (ie, $\theta_p \in \{20°, 30°, 40°, 50°, 60°\}$), the best of which was determined by using feature selection.
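The estimate of k can be reproduced with a short calculation. The exact triangle construction is not spelled out here, so the sketch below assumes a right triangle whose short leg is the minimum depression depth and whose acute angle is the 10° aberration on one side of the concave point. Both readings of k (as the long leg or as the hypotenuse) are computed, and both give roughly 8 pixels (about 8.0 and 8.1).

```python
import math

h_min = math.sqrt(2.0)            # smallest admissible depression depth, in pixels
half_angle = math.radians(10.0)   # aberration per side for a -20 degree threshold

# Reading 1 (assumption): k is the leg lying along the ideal contour.
k_as_leg = h_min / math.tan(half_angle)

# Reading 2 (assumption): k is the hypotenuse, ie, the side of the depression.
k_as_hypotenuse = h_min / math.sin(half_angle)

# The two readings are tied together by the Pythagorean theorem.
leg_from_hypotenuse = math.sqrt(k_as_hypotenuse ** 2 - h_min ** 2)
```

Either reading is consistent with choosing k = 7 or 8 once the result is rounded to whole pixels.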
| FOOTNOTES |
|---|
Author contributions: Guarantors of integrity of entire study, C.M.C., Y.H.C.; study concepts and design, C.M.C., Y.H.C.; literature research, C.M.C., Y.H.C., K.C.H., G.S.H.; clinical studies, Y.H.C., C.M.T., H.J.C., S.Y.C.; experimental studies, C.M.C., K.C.H., C.M.T., H.J.C., S.Y.C.; data acquisition, K.C.H., G.S.H., C.M.T., H.J.C., S.Y.C.; data analysis/interpretation, C.M.C., Y.H.C., K.C.H., G.S.H.; statistical analysis, C.M.C.; manuscript preparation, definition of intellectual content, editing, revision/review, and final version approval, C.M.C., Y.H.C.
| REFERENCES |
|---|