Computer Applications
1 Division of Obstetrics and Gynecology, Santa Maria Annunziata Hospital (R.B., C.D., G.G.)
2 Departments of Obstetrics and Gynecology (R.B., E.V.)
3 Radiology (E.V.), Università di Firenze, Florence, Italy.
| Abstract |
|---|
MATERIALS AND METHODS: A total of 226 adnexal masses were examined before surgery: Fifty-one were malignant and 175 were benign. The data were divided into training and testing subsets by using a "leave n out method." The training subsets were used to compute the optimum MLR equations and to train the ANNs. The cross-validation subsets were used to estimate the performance of each of the two models in predicting ovarian malignancy.
RESULTS: At testing, three-layer back-propagation networks, based on the same input variables selected by using MLR (ie, women's ages, papillary projections, random echogenicity, peak systolic velocity, and resistance index), had a significantly higher sensitivity than did MLR (96% vs 84%; McNemar test, P = .04). The Brier scores for ANNs were significantly lower than those calculated for MLR (Student t test for paired samples, P = .004).
CONCLUSION: ANNs might have potential for categorizing adnexal masses as either malignant or benign on the basis of multiple variables related to demographic and US features.
Index terms: Computers, diagnostic aid; Computers, neural network; Ovary, neoplasms, 852.30; Ovary, US, 852.1298, 852.12983; Ultrasound (US), Doppler studies, 852.12983
| Introduction |
|---|
To predict a dichotomous outcome (eg, presence or absence of a disease) from a given set of variables that bear on that outcome, a multivariate logistic regression model can be used. Recently, this approach has been applied to estimate the probability of malignancy in an adnexal mass on the basis of multiple parameters related to demographic and US characteristics (10). Artificial neural networks (ANNs) represent an alternative to conventional multivariate statistical methods for classifying data. They consist of a variable number of processing elements (artificial neurons, or nodes) connected together in hierarchical layers: an input layer, one or more hidden layers, and an output layer. Each node, with the exception of the input neurons, receives multiple weighted inputs and produces an output that is usually a nonlinear function of those inputs. Unlike rule-based methods, ANNs are capable of learning from examples and generalizing, which makes them well suited to pattern recognition and discrimination tasks. The most widely used class of ANNs is the back-propagation network (11), which has been used in various clinical diagnostic problems (12,13). One area to which these networks have been extensively applied is image interpretation in radiology (14-18).
The aim of the present study was to compare the diagnostic accuracy of ANNs with that of multiple logistic regression (MLR) analysis for predicting ovarian malignancy in patients with adnexal masses by using B-mode and color Doppler transvaginal US.
| MATERIALS AND METHODS |
|---|
The distributions of morphologic and Doppler-related characteristics in the benign and malignant tumor groups were compared by using the Mann-Whitney U test for continuous variables and the χ² test for categoric variables.
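As an illustration of the two univariate tests named above, the following Python sketch computes the Mann-Whitney U statistic with a normal-approximation P value and the Pearson χ² statistic for a 2 × 2 table. It is not the software used in the study: the function names are hypothetical, and the sketch omits tie and continuity corrections, which the study does not mention either way.

```python
import math


def midranks(values):
    """Rank values 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided P value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For a categoric feature such as papillary projections, the rows of the 2 × 2 table would be malignant and benign masses and the columns feature present and absent; a table with an empty row or column makes the χ² statistic undefined.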
The data were divided into training and testing sets by using a "leave n out method" (21). In this method, 80% of the cases were randomly chosen to form a training set, and the remaining 20% were used for cross-validation. This procedure was repeated five times, so that each case appeared in exactly one testing subset. The training subsets were used to compute the optimum regression equations and to train the ANNs. The cross-validation subsets were used to estimate how well each of the two models predicted ovarian malignancy when applied to a sample different from the one on which it was trained.
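Because every case falls into exactly one testing subset, the "leave n out" scheme described above amounts to a five-fold partition of the 226 cases. A minimal Python sketch, assuming a simple (nonstratified) random shuffle with a fixed seed; the function name and the round-robin dealing of cases into folds are illustrative choices, not details reported in the study:

```python
import random


def leave_n_out_folds(n_cases, n_folds=5, seed=0):
    """Split case indices 0..n_cases-1 into n_folds (train, test) pairs.

    Each case appears in exactly one test subset; all remaining cases
    form the corresponding training subset.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    # Deal the shuffled cases round-robin so fold sizes differ by at most one.
    test_folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k, test in enumerate(test_folds):
        train = [i for j, fold in enumerate(test_folds) if j != k for i in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# 226 adnexal masses, five testing subsets of roughly 20% each.
for train, test in leave_n_out_folds(226):
    print(len(train), len(test))
```

With 226 cases the testing subsets contain 46, 45, 45, 45, and 45 cases. A stratified variant, shuffling benign and malignant cases separately, would spread the 51 malignancies evenly across the folds; whether the study did so is not stated.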
MLR Analysis
The seven predictor variables used for MLR analysis were women's ages, mean diameter of each mass, presence of multilocularity, presence of papillary projections, presence of random echogenicity, peak systolic velocity, and resistance index. Qualitative variables were coded as binary variables: a value of 0 if the feature was absent and 1 if it was present. When blood flow was not detectable, the peak systolic velocity and resistance index were set to 6 cm/sec and 0.90, respectively, for inclusion in multivariate analysis. These values were derived from the extremes of the distributions for benign tumors, because no malignancies were encountered among masses without detectable flow. Histologic diagnosis of malignancy in adnexal masses was entered as the dependent variable in the MLR model and was coded as 0 for absent and 1 for present. A forward stepwise procedure was used to select the independent variables to be included in the model. A variable was entered into the model if the P value of its score statistic was less than .05, and it was removed if its likelihood ratio statistic had a P value of .10 or greater. The removal threshold must be slightly higher than the entry threshold; otherwise, the procedure could enter and remove the same variable repeatedly in an infinite loop. The regression equation derived from each training subset was applied to the corresponding cross-validation subset to estimate each patient's probability of having a malignant ovarian tumor. This probability (Q) was calculated as follows (22):
$$Q = \frac{1}{1 + e^{-Z}}, \qquad Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n,$$
in which $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_n$ are coefficients estimated from the data, and $x_1, \ldots, x_n$ are the predictor variables included in the model. In particular, $e^{\beta_i}$ was the factor by which the odds of malignancy changed when the $i$th predictor variable increased by one unit. The probability Q estimated by using the logistic model always ranges between 0 and 1, regardless of the value of Z. In this study, a case was classified as malignant if Q was greater than 0.5 and as benign if Q was less than 0.5.
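The coding rules and the logistic model above can be sketched in Python as follows. The function names are hypothetical, the predictor order follows the five variables retained by the stepwise procedure (age, papillary projections, random echogenicity, peak systolic velocity, resistance index), and no fitted coefficients are reproduced because the study does not list them; any β values passed in are placeholders. A mass with Q exactly 0.5, which the text leaves unspecified, is treated as benign here.

```python
import math

# Substitute values used when no intratumoral flow was detectable.
NO_FLOW_PSV_CM_S = 6.0
NO_FLOW_RI = 0.90


def encode_mass(age, papillary, random_echo, psv=None, ri=None):
    """Build the predictor vector x for one mass.

    Qualitative findings are coded 1 (present) or 0 (absent). Passing None
    for a Doppler value means flow was not detectable.
    """
    return [
        float(age),
        1.0 if papillary else 0.0,
        1.0 if random_echo else 0.0,
        NO_FLOW_PSV_CM_S if psv is None else float(psv),
        NO_FLOW_RI if ri is None else float(ri),
    ]


def malignancy_probability(beta0, betas, x):
    """Q = 1 / (1 + e^(-Z)), with Z = beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # algebraically equal form that avoids overflow for very negative Z
    return e / (1.0 + e)


def classify(q, cutoff=0.5):
    """Call the mass malignant only when Q exceeds the cutoff."""
    return "malignant" if q > cutoff else "benign"
```

Because the model is linear in Z, raising one binary predictor from 0 to 1 multiplies the odds Q/(1 - Q) by exactly $e^{\beta_i}$, whatever the other predictors are.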
ANN Topology
ANNs were constructed by using the commercially available software BRAINMAKER (version 3.0; California Scientific Software, Nevada City, Calif). We used networks with three layers and a back-propagation algorithm. A logistic function was used as a transfer function for each of the nodes in the hidden layer and the output layer of the networks. This is a nonlinear function expressed as:
$$f\left(\sum_i w_i a_i\right) = \frac{1}{1 + e^{-\sum_i w_i a_i}},$$
in which $f(\sum_i w_i a_i)$ is the output activity of the neuron and $\sum_i w_i a_i$ is its net input, obtained by summing the products of each connection weight $w_i$ and the corresponding incoming activation $a_i$. The input layer of the networks consisted of the same variables selected by using stepwise logistic regression analysis. A hidden layer of three neurons was connected to the input layer. No more than three hidden neurons were used because, given the sample size of our study population, a larger hidden layer could have produced an artificially high classification rate (23). The hidden layer was connected to an output layer of a single neuron producing a normalized value in the range of 0 to 1, which could be viewed as representing the likelihood of ovarian malignancy. The learning rate and momentum were set at 0.25 and 0.90, respectively. A training error tolerance of 0.10 was used; that is, if a network's output was not within 0.1 of the target output, it was considered incorrect. This error was used by the back-propagation algorithm to adjust the network weights to minimize the output error during training. We took care not to overfit the training data, which happens when a network fits the training data so closely that it cannot generalize well to new data. In brief, three methods may be applied to prevent overfitting: first, limiting the number of hidden neurons; second, constraining the network not to use large weights; or, third, stopping the training at an earlier cycle than that corresponding to the minimum training error.
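To make the topology and training parameters concrete, here is a minimal pure-Python sketch of a 5-3-1 back-propagation network with logistic transfer functions, a learning rate of 0.25, a momentum of 0.90, and a training tolerance of 0.10. It is not BrainMaker's implementation. Assumptions of the sketch: each hidden and output node has a bias weight, initial weights are drawn uniformly from -0.5 to 0.5, weights are updated after every training case (on-line learning), and training stops once every output lies within the tolerance or after a hypothetical cap of 2000 epochs, which loosely corresponds to the third overfitting remedy listed above.

```python
import math
import random


def logistic(s):
    """Transfer function f(s) = 1 / (1 + e^(-s)), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class ThreeLayerNet:
    """Hypothetical 5-3-1 back-propagation network with momentum.

    Bias weights are an assumption of this sketch; the 18 connection weights
    reported in the study correspond to the 5*3 + 3*1 non-bias weights here.
    """

    def __init__(self, n_in=5, n_hidden=3, lr=0.25, momentum=0.90, seed=0):
        rng = random.Random(seed)
        # w_hidden[j] = weights into hidden node j; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # w_out = weights into the single output node; the last entry is its bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, reused by the momentum term.
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)
        self.lr = lr
        self.momentum = momentum

    def forward(self, x):
        """Return (inputs + bias, hidden activations + bias, output in [0, 1])."""
        xb = list(x) + [1.0]
        hidden = [logistic(sum(w * a for w, a in zip(ws, xb))) for ws in self.w_hidden]
        hb = hidden + [1.0]
        out = logistic(sum(w * a for w, a in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_example(self, x, target):
        """One on-line back-propagation step on the squared output error."""
        xb, hb, out = self.forward(x)
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            self.dw_out[j] = self.lr * delta_out * hb[j] + self.momentum * self.dw_out[j]
            self.w_out[j] += self.dw_out[j]
        for j, ws in enumerate(self.w_hidden):
            for i in range(len(ws)):
                self.dw_hidden[j][i] = (self.lr * delta_hidden[j] * xb[i]
                                        + self.momentum * self.dw_hidden[j][i])
                ws[i] += self.dw_hidden[j][i]

    def fit(self, xs, ys, tolerance=0.10, max_epochs=2000):
        """Train until every output is within `tolerance` of its target.

        `max_epochs` caps training as a crude form of early stopping.
        Returns the number of epochs run.
        """
        for epoch in range(1, max_epochs + 1):
            for x, y in zip(xs, ys):
                self.train_example(x, y)
            if all(abs(y - self.predict(x)) <= tolerance for x, y in zip(xs, ys)):
                return epoch
        return max_epochs
```

In practice the five predictors would be rescaled to comparable ranges (for example, 0 to 1) before training, since age and peak systolic velocity are much larger in magnitude than the binary findings; the study does not describe its input preprocessing.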
Evaluation of Performance
Overall sensitivity and specificity of the logistic regression models and the ANNs, with outputs greater than 0.50 classified as malignant, were calculated as the average of the individual cross-validation rates for the testing subsets. These rates were compared by means of the McNemar test (24), a nonparametric test for two related dichotomous variables. The performances of the two classification procedures were also compared statistically by using the Brier score (25), that is, the mean of the squared differences between the probabilities estimated by a probability model and the actual outcomes. The lower the score, the better the performance. The Brier score is independent of the classification threshold, and it has been reported (26,27) to be a better measure of the predictive accuracy of a classifier than receiver operating characteristic (ROC) curve analysis. The scores calculated in each testing subset with MLR and with ANNs were compared by means of the Student t test for paired samples.
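The evaluation measures described above can be sketched in Python as follows. The function names are hypothetical. The study does not state which form of the McNemar test it used; this sketch applies the continuity-corrected χ² version to the discordant pairs (cases classified correctly by one model but not by the other), an exact binomial version being a common alternative. For the paired t test, only the statistic and its degrees of freedom are returned, because the Python standard library has no t distribution; with five testing subsets there are 4 degrees of freedom.

```python
import math
import statistics


def sens_spec(y_true, probs, cutoff=0.5):
    """Sensitivity and specificity when outputs above `cutoff` are called malignant."""
    pairs = list(zip(y_true, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p > cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p <= cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p <= cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square (1 df) on paired correct/incorrect calls."""
    only_a = sum(1 for ra, rb in zip(correct_a, correct_b) if ra and not rb)
    only_b = sum(1 for ra, rb in zip(correct_a, correct_b) if rb and not ra)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    chi2 = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def brier(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for per-subset scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

The McNemar P value depends only on the discordant cases, so two models with very different overall accuracy can still differ nonsignificantly if they disagree on few cases.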
| RESULTS |
|---|
χ² test) of the null hypothesis of no difference between the two groups with respect to each parameter. The median diameter was 53.3 mm in benign lesions and 64.7 mm in malignant lesions (P < .001). Malignancy in the adnexal masses was significantly associated with the presence of papillary formations, random echogenicity, and loculations. Color flow examinations depicted intratumoral blood vessels in all malignant masses and in 134 of the 175 benign tumors (76.6%). There were significant differences in the Doppler values between the benign cases and the malignant cases.
The ANNs we used included five input neurons, each of which corresponded to a predictor variable selected by using MLR; three hidden neurons; and one output unit. The networks had 18 connection weights that consisted of 15 weights toward the hidden layer and three weights toward the output neuron.
When the logistic regression models were validated against the testing subsets, an overall sensitivity of 84% (43 of 51; 95% CI: 71.4%, 92.9%) and a specificity of 96.6% (169 of 175; 95% CI: 92.7%, 98.7%) were obtained with a probability cutoff of 50%. At the same threshold, the networks had a significantly higher sensitivity of 96% (49 of 51; 95% CI: 86.5%, 99.5%; McNemar test, P = .04) and a specificity of 97.7% (171 of 175; 95% CI: 94.2%, 99.4%; P = .48).
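The study does not state how its 95% confidence intervals were obtained. Exact (Clopper-Pearson) binomial intervals are one common choice for proportions such as 43 of 51; the following Python sketch, under that assumption, computes them by bisection on the binomial distribution. It is an illustration only and is not claimed to reproduce the reported intervals; the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, iters=100):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""

    def solve_decreasing(f, target):
        # f decreases in p on [0, 1]; bisect for the p at which f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(43, 51)` gives an exact interval for the MLR sensitivity and `clopper_pearson(49, 51)` one for the ANN sensitivity.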
The Brier scores for ANNs in the testing subsets were significantly lower than those calculated for MLR (mean values, 0.031 vs 0.051; two-tailed Student t test, P = .004), indicating that the ANNs classified the masses more accurately than did MLR.
| DISCUSSION |
|---|
The distinction between benign and malignant ovarian lesions is important in planning the correct treatment, because it identifies patients who require extensive surgery and those in whom conservative management or minimally invasive surgery can be safely performed. The use of US morphologic criteria alone to differentiate benign from malignant ovarian neoplasms is hindered by a high false-positive rate, with a consequently high number of invasive procedures required to confirm abnormal US findings (1-3). The more recent introduction of color Doppler imaging can contribute to the differential diagnosis of adnexal masses (4-9), particularly in postmenopausal women. The most widely used method of interpreting Doppler parameters (eg, resistance index or peak systolic velocity) relies on cutoff values, which are intended to discriminate ovarian neoplasms by assigning a fixed limit of abnormality to each parameter. Thus, continuous variables are reduced to binary variables, with a potential loss of the information contained in the data. This might explain the substantial variation in results reported by different authors who have evaluated the diagnostic accuracy of Doppler indexes alone in differentiating benign from malignant ovarian tumors (5,6,9,28). On the whole, it seems clear that an improvement in predictive accuracy can be achieved only by taking into account multiple parameters. MLR is a standard computational method for predicting a diagnosis from multiple parameters and for identifying the variables that are useful in making the prediction. Furthermore, when continuous variables such as age or resistance index are included in this model, any change in these variables is reflected in the final prediction, without the need for a fixed cutoff of abnormality.
By using this approach, we obtained an overall sensitivity of 84% and a specificity of 96.6% on the basis of multiple variables related to demographic and US characteristics. These findings are in line with those previously reported by Tailor et al (10). A back-propagation ANN offers a potential method for finding a relationship between different input variables and a binary output variable. In the current study, we used simple three-layer networks to predict the presence of malignancy in an adnexal mass by using the same input variables selected by means of MLR. At testing, the ANNs correctly diagnosed 96.1% of malignant and 97.7% of benign lesions. These results compare favorably with the corresponding rates obtained with the MLR models. However, these rates are only a crude indicator of the quality of a classifier, whose predictive accuracy can be better quantified by using a measure such as the Brier score, which summarizes how close each patient's predicted probability is to the true outcome (26,27). Evaluation of these scores in our testing subsets suggests that the networks were able to categorize adnexal masses more accurately than the logistic regression models. Thus, ANNs seem to be able to extract information from the data that is not apparent to logistic regression analysis. In general, ANNs have an inherent advantage over conventional statistical models when the relation between data items is determined by a complex, multidimensional, nonlinear function: they can generate decision boundaries between overlapping diagnostic categories. The data set studied here was relatively simple, so it is conceivable that the advantage of ANNs would become more apparent if other predictor variables were incorporated into the model. However, evaluating whether adding other variables improves the predictive accuracy of the network will require an adequate sample of data.
Indeed, data sets with a small sample size relative to the number of input categories increase the likelihood of obtaining a high classification rate by chance (29).
We acknowledge that ANNs are not a panacea for all complex data analysis, and several criticisms may be leveled at these techniques. These include the empirical nature of choosing network parameters and, above all, the inability of the networks to explain their outputs. However, in our opinion, an ANN, which is essentially a nonlinear regression model, can be used as a decision-support tool, provided that its predictive ability is clearly demonstrated, even without a thorough understanding of its inner workings.
The purpose of this preliminary study was not to obtain a trained ANN for clinical application but to evaluate the feasibility of this approach. Although our results seem encouraging, further studies are needed to prospectively assess the performance of an ANN in predicting malignancy in adnexal masses, as compared with that of traditional multivariate statistical methods. If such studies are successful, the application of this computational method to clinical practice could be explored.
| Footnotes |
|---|
Abbreviations: ANN = artificial neural network; MLR = multiple logistic regression
Author contributions: Guarantors of integrity of entire study, R.B., G.G.; study concepts, R.B., G.G.; study design, R.B., E.V.; definition of intellectual content, R.B., G.G.; literature research, E.V.; data acquisition and analysis, R.B., C.D.; statistical analysis, R.B.; manuscript preparation, R.B., E.V.; manuscript editing and review, E.V.
Received March 18, 1998; revision requested May 22, 1998; revision received July 6, 1998; accepted September 8, 1998.