Genitourinary Imaging
1 From the Departments of Radiology (K.K., Y.L., H.H., K.T., R.F.) and Biostatistics (Y.L.), University Hospital Geneva, rue Micheli-du-Crest 24, 1211 Geneva 14, Switzerland. Received July 22, 1999; revision requested September 20; final revision received April 13, 2000; accepted May 8. K.K. was supported in part by a grant from the French Radiology Society. Address correspondence to K.K. (e-mail: karen.kinkel@hcuge.ch).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Through a MEDLINE literature search, articles with imaging-histopathologic correlation and data that allowed calculation of contingency tables were identified. Results of morphologic assessment, Doppler US, color Doppler flow imaging, and combined techniques were compared.
RESULTS: Among 89 data sets from 46 included studies (5,159 subjects), 35 sets used morphologic information, 36 measured Doppler US indexes, 10 assessed tumor vascularity with color Doppler flow imaging, and eight used combined techniques. Summary receiver operating characteristic curves revealed significantly higher performance for combined techniques than for morphologic information (P = .003), Doppler US indexes (P = .003), or color Doppler flow imaging alone (P = .001). The Q* point (and 95% CI) for combined techniques was 0.92 (0.87, 0.96) versus 0.85 (0.83, 0.88) for morphology, 0.82 (0.78, 0.86) for Doppler US, and 0.73 (0.58, 0.87) for color Doppler flow imaging. Morphologic assessment showed a trend toward better performance than color Doppler flow imaging (P = .09) or Doppler US indexes (P = .07). Doppler US index results were better in earlier studies (P = .005).
CONCLUSION: Combined US techniques and a diagnostic algorithm perform significantly better than morphologic assessment, color Doppler flow imaging, or Doppler US indexes alone in characterizing ovarian masses.
Index terms: Ovary, neoplasms, 852.31, 852.32 Ovary, US, 852.12983, 852.12984 Ultrasound (US), Doppler studies, 852.12983, 852.12984 Ultrasound (US), comparative studies, 852.12983, 852.12984
| INTRODUCTION |
|---|
A majority of ovarian masses are nonneoplastic cysts. However, when a lesion is suspected of being a neoplasm, surgical intervention must be considered. Twenty-five percent of ovarian neoplasms are malignant (1). For this reason, surgical removal of a suspected ovarian neoplasm is the standard procedure. In most institutions, the type of surgery performed (laparoscopy vs laparotomy) depends on the probability of malignancy. The optimal US technique and diagnostic criteria to use when characterizing a suspected ovarian neoplasm remain controversial. The reported accuracy of US is 65%–94% (2,3) for gray-scale US, 35%–88% (4,5) for color Doppler flow imaging, and 48%–99% (2,6) for Doppler arterial resistance measurements. In addition, diagnostic algorithms and multiparameter scoring systems have been advocated to increase test performance (7–9). The question of which US technique and diagnostic criteria provide the best ovarian lesion characterization has not, to our knowledge, been answered.
Although a meta-analysis does not replace large prospective clinical trials, it has been shown that the results of a meta-analysis in therapeutic trials do not differ from those obtained in large trials (10). Although we do not know whether this result can be applied for trials in which diagnostic tests are evaluated, Irwig et al (11) suggested that a meta-analysis of diagnostic tests represents a potentially powerful tool to summarize the literature by taking into account and analyzing differences between studies. The purpose of this meta-analysis study was to compare the effectiveness of current US techniques in characterizing ovarian masses.
| MATERIALS AND METHODS |
|---|
Study Selection
The studies included in the meta-analysis met the following inclusion criteria: (a) Patients had an adnexal mass not discovered during screening for ovarian cancer; (b) the reference standard was histopathologic findings; (c) interpreters of US data were blinded to histopathologic findings; (d) presented data allowed calculation of true-positive, true-negative, false-positive, and false-negative imaging results; and (e) data or subsets of data were not published more than once.
Forty-six of 68 references fulfilled the inclusion criteria (Table 1). Reasons for exclusion were absent original data (n = 2), incomplete or inconclusive data (n = 7), absent lesion characterization by the sonologist (n = 8), nonblinded image interpretation (n = 1), lack of histopathologic findings (n = 1), exclusion of borderline tumors (n = 1), and presentation of patients in more than one publication (n = 2).
In some studies, only a subgroup of patients fulfilled the inclusion criteria. If an article presented data for more than one US technique, inclusion and exclusion criteria were applied to each technique.
Data Analysis
The meta-analytic method used in our study was based on summary receiver operating characteristic (ROC) curves (12,13). Sensitivity and specificity were recalculated for each reference study by using the conventional corrections for zero counts (14). Because of the lack of independence between sensitivity and specificity (Pearson correlation coefficients of 0.22), the standard method for quantitative integration of data, such as the mean sensitivity and specificity over the studies, was considered inappropriate. To compare US imaging modalities, we used summary ROC analysis, which accounts for the interdependence between sensitivity and specificity. Summary ROC, a mathematic transformation of sensitivity and specificity, has been described by Moses and colleagues (12). The transformed data of all studies were combined through a robust regression (Huber M-regression) analysis (15) in a regression line. The robust regression analysis reduces the effect of heterogeneity among studies by attributing appropriate weights to each study in accordance with its deviation from a normal distribution. The regression line is then transformed back into a summary ROC curve. By combining the data from all studies, summary ROC curves were independent from the diagnostic threshold used to separate benign from malignant ovarian neoplasms.
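The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the zero-count correction adds 0.5 to every cell of each contingency table (a common convention; the exact rule of reference 14 is not restated in the text), and it substitutes an ordinary least-squares fit for the Huber M-regression used in the study. The function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def corrected_rates(tp, fn, fp, tn):
    """True- and false-positive rates with 0.5 added to every cell (assumed correction)."""
    tpr = (tp + 0.5) / (tp + fn + 1.0)
    fpr = (fp + 0.5) / (fp + tn + 1.0)
    return tpr, fpr

def fit_sroc(tables):
    """Least-squares fit of D = a + b*S, where D and S are the difference
    and the sum of logit(TPR) and logit(FPR) for each study.
    Plain least squares stands in here for the robust regression in the text."""
    ds, ss = [], []
    for tp, fn, fp, tn in tables:
        tpr, fpr = corrected_rates(tp, fn, fp, tn)
        ds.append(logit(tpr) - logit(fpr))
        ss.append(logit(tpr) + logit(fpr))
    n = len(ds)
    mean_s, mean_d = sum(ss) / n, sum(ds) / n
    sxx = sum((s - mean_s) ** 2 for s in ss)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(ss, ds))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b

def sroc_sensitivity(a, b, fpr):
    """Back-transform the regression line into sensitivity at a given FPR.
    Solving D = a + b*S for logit(TPR) gives (a + (1 + b)*logit(FPR)) / (1 - b)."""
    x = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-x))
```

Each study contributes one (S, D) point, so the fitted line summarizes all studies regardless of the diagnostic threshold each one used; evaluating `sroc_sensitivity` over a grid of false-positive rates traces the summary ROC curve.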
Following the guidelines for fitting summary ROC curves, we obtained corresponding single number summaries (Q* values). Q* values correspond to the point on the summary ROC curve where sensitivity and specificity are equal. Like the area under the ROC curve, the Q* point indicates how closely a test approaches the desirable performance of 100% sensitivity and specificity. The higher the Q* value, the better the diagnostic test performance. Testing for differences between US techniques was based on Q* values and their associated standard errors.
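As an illustration of the Q* summary, the sketch below derives Q* from the intercept a of the regression line D = a + bS and compares two Q* values. At the point where sensitivity equals specificity, S = 0 and therefore logit(sensitivity) = a/2. The standard error uses the delta method, and the comparison is a two-sided z-test for independent estimates; the text states only that testing was based on Q* values and their standard errors, so the specific test shown here is an assumption, and the function names are hypothetical.

```python
import math

def q_star(a):
    """Point on the summary ROC curve where sensitivity equals specificity."""
    return 1.0 / (1.0 + math.exp(-a / 2.0))

def q_star_se(a, se_a):
    """Delta-method standard error of Q*, given the standard error of the intercept."""
    q = q_star(a)
    return 0.5 * q * (1.0 - q) * se_a

def compare_q_star(q1, se1, q2, se2):
    """Two-sided z-test for the difference between two independent Q* values
    (assumed test; the source does not name the exact procedure)."""
    z = (q1 - q2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For example, `compare_q_star(0.90, 0.03, 0.80, 0.04)` uses made-up inputs, not values from this study, and returns the z statistic together with its two-sided P value.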
Covariate Adjustment
To determine whether imaging results were significantly affected by heterogeneity in the studies, we extracted covariates including (a) year of publication; (b) patient characteristics, that is, percentages of pre- and postmenopausal women; percentage of mucinous tumors, endometriomas, and nonneoplastic cysts; prevalence of malignancy; and stage distribution of ovarian cancer by using International Federation of Gynecology and Obstetrics (FIGO) staging guidelines; (c) study design, that is, prospective versus retrospective data acquisition and description of diagnostic criteria for image interpretation; (d) technical factors, that is, transabdominal versus endovaginal versus combined transabdominal and endovaginal approach, and frequency of US transducer; (e) geographic origin of the study (eg, United States, Europe, or other countries); and (f) professional specialty of authors who interpreted images (eg, gynecology vs radiology).
Covariate adjustment analysis was performed by applying a series of statistical tests in accordance with Moses and colleagues (12) by using regression analysis. The dependent variable of the regression analysis was the difference of the logit of true- and false-positive rates; the independent variables of the regression analysis were the sum of the logit transforms of true- and false-positive rates, the covariate, and its interaction with the sum by following the appendix of de Vries and colleagues (16). The regression was weighted with the inverse of the variance of dependent variables. Each covariate was analyzed separately in each US technique. A multivariate analysis could not be performed because of the small number of studies with complete data on all covariates. A local regression model (17) was used to demonstrate the effect of a covariate on sensitivity and specificity. A covariate significantly affecting the diagnostic performance was plotted against the reported sensitivities and specificities. A P value of less than .05 was considered to indicate a significant difference.
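The weighted regression described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: D is the difference and S the sum of the logit-transformed true- and false-positive rates, Z is a single covariate, every cell is corrected by adding 0.5, and the variance of D is taken as the usual large-sample variance of a log odds ratio (the sum of the reciprocal cell counts). The function names are hypothetical, and the sketch returns only the coefficients; it does not compute the standard errors or P values needed to test a covariate.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def study_row(tp, fn, fp, tn, z):
    """Return (D, design row, weight) for one study with covariate value z."""
    tp, fn, fp, tn = (c + 0.5 for c in (tp, fn, fp, tn))  # zero-count correction (assumed)
    tpr, fpr = tp / (tp + fn), fp / (fp + tn)
    d = logit(tpr) - logit(fpr)
    s = logit(tpr) + logit(fpr)
    var_d = 1 / tp + 1 / fn + 1 / fp + 1 / tn  # large-sample variance of D
    return d, [1.0, s, z, z * s], 1.0 / var_d

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_covariate_model(studies):
    """Weighted least squares for D = b0 + b1*S + b2*Z + b3*(Z*S),
    with weights equal to the inverse variance of D."""
    rows = [study_row(*s) for s in studies]
    k = 4
    xtwx = [[sum(w * x[i] * x[j] for _, x, w in rows) for j in range(k)] for i in range(k)]
    xtwy = [sum(w * x[i] * d for d, x, w in rows) for i in range(k)]
    return solve(xtwx, xtwy)
```

In this formulation, a nonzero coefficient on Z shifts the summary ROC curve as a whole, whereas a nonzero coefficient on the Z×S interaction changes its shape.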
| RESULTS |
|---|
In the 46 included studies, a combined endovaginal and transabdominal approach was used in 24, an endovaginal approach only was used in 18, an abdominal approach only was used in two, and no approach was specified in two. The frequency of the endovaginal transducer was 5.0 MHz in 24 studies, 6.5 MHz in eight studies, 7.0 MHz in five studies, 7.5 MHz in three studies, 5–7 MHz in one study, and unspecified in another study. The frequency of the abdominal transducer was not specified.
Ten studies were performed in the United States; 24, in Europe; and 12, in other countries (five, in Japan; two, in Israel; two, in Taiwan; one, in Brazil; one, in India; and one, in Singapore). The specialty of the author who interpreted the images was radiology in 10 and gynecology in 36 studies. The type of data acquisition was specified in 20 studies, with 16 prospective and four retrospective studies.
Seventy-four subsets indicated diagnostic criteria. Among the diagnostic criteria used for morphologic US, the score published by Sassone and colleagues (56) was applied in 13 subsets; the score published by DePriest et al (57), in two subsets; the criteria published by Granberg and colleagues (58), in two subsets; and the score published by Benacerraf and colleagues (20), in two subsets. Although other subsets listed morphologic diagnostic criteria without a score, in a majority of studies, criteria similar to those of Sassone and colleagues (56) were used. The score included inner-wall structure and thickness, septal presence and thickness, and echogenicity of the mass (56). A score of 9 or more, per Sassone and colleagues (56), was used as the threshold for malignancy. The diagnostic criteria for color Doppler flow imaging considered "flow detection within a mass" as indicative of malignancy. In Doppler US studies in which the pulsatility index was used, a threshold value of less than 1.0 was used in 67% of the studies; other threshold values were 0.62–1.50. A large range was also observed for threshold values of the resistive index, 0.4–0.8.
US Characterization of Ovarian Masses
Because of the lack of independence between sensitivity and specificity, we used summary ROC analysis to compare US techniques. The Q* points of the different US techniques are presented in Table 3 and correspond with the point on the summary ROC curve where sensitivity and specificity are equal. The comparison at Q* points revealed significantly higher performance for combined US techniques than for morphologic assessment (P = .003), Doppler US indexes (P = .003), or color Doppler flow imaging analysis (P = .001) alone (Fig 1). Morphologic assessment showed a trend toward better performance than did color Doppler flow imaging (P = .09) or Doppler US indexes alone (P = .07). The Doppler US indexes employed to estimate arterial resistance performed similarly.
Subgroup Analysis
To test the validity of the results, a subgroup analysis that took into account potentially significant covariates was performed (Table 4). The analysis showed significantly higher performance in studies with fewer cases of mucinous tumors (P = .027) and in studies in which diagnostic criteria were described (P = .02). This result was valid for both covariates in the overall group (all US techniques) and in the subgroup of arterial Doppler index measurements. Insufficient sample size might be a reason why the other US techniques did not show a significant relationship between those covariates and the study performance. As shown in Figure 2, sensitivity and specificity decreased with the increase of the percentage of mucinous tumors in the study population. Figure 3 demonstrates that the specificity of 74 data sets that described diagnostic criteria was significantly better, at equal sensitivity, than the specificity of 15 data sets without specified diagnostic criteria. This result was obtained independently of the study technique. A trend toward better results was seen in study populations with a lower prevalence of malignancy (P = .15).
| DISCUSSION |
|---|
Our study results demonstrate the superiority of diagnostic US performance when a combination of morphologic and color Doppler flow imaging information (8,38) or morphologic information, color Doppler flow imaging, and Doppler arterial resistance measurements (2,39) are employed. However, studies in which morphologic findings alone are assessed appear superior to studies in which Doppler arterial resistance measurements are used as the sole diagnostic criteria for malignancy. This was particularly true for studies published after 1995. Our subgroup analysis showed a significant decrease in the reported performance of Doppler arterial resistance measurements in studies conducted from 1992 to 1998. An analysis of individual data sets potentially explains this observation. For example, in the article by Kurjak and Predanic (2), 812 patients were included in the study, but results were reported for only 155 patients. Therefore, they and other investigators may have reported on only patients whose Doppler arterial resistance was measured successfully. This selection bias may have been introduced by the noninclusion of subjects who had unsuccessful Doppler arterial resistance measurements. This possibly led to an initial overestimation of the performance capabilities of Doppler arterial resistance measurements.
Although the menopausal status of a woman influences her pretest probability of ovarian cancer, the analysis of studies in which menopausal status is reported showed that the performance of US with or without color Doppler flow imaging was not significantly influenced by the percentage of premenopausal women. US can therefore be applied in pre- and postmenopausal women who are referred for the characterization of an ovarian mass. The preliminary results of previous studies that reported low sensitivity (27) and specificity (51) of US in pre- versus postmenopausal women were not confirmed by our meta-analysis. Investigators in two studies (8,9) specifically evaluated the importance of "menopausal status" in diagnostic systems by using logistic regression. The results of neither study demonstrated a contribution of the menopausal status to diagnosis.
The characteristics of a study population are important in assessing the external validity of research results (59). Our subgroup analysis of patient characteristics showed that the percentage of mucinous tumors is an important parameter that explains differences in test performance between studies. Mucinous tumors, a subtype of benign and malignant epithelial tumors, represent the third most common benign neoplasm (1) and account for approximately 20% of all malignant epithelial tumors (60). The typical US appearance of a mucinous cystadenoma is a multilocular cystic mass, with the locules commonly containing liquids of different echogenicity (61). In general, morphologic criteria define lesions with greater than three septations as either indeterminate for malignancy or malignant. Portions of the cyst that are echogenic, if mistaken as solid elements, falsely indicate malignancy in these predominantly benign neoplasms. The absence of color Doppler flow in an echogenic portion helps to confirm the cystic nature of the tumor and avoid a potential false-positive diagnosis. This is a common mechanism by which color Doppler flow imaging can be used to interpret gray-scale morphologic features. Nonetheless, benign mucinous tumors remain difficult to diagnose by using all possible US techniques (42,61).
It is not unexpected that a higher prevalence of malignant neoplasm in the patient cohort adversely affected the performance of US techniques in which morphology alone was used. Historically, US has had its greatest success in use in accurately predicting benign status (18,21). The technique is less accurate when used to predict malignancy. Therefore, a greater prevalence of ovarian cancer in the study cohort would likely have a negative effect on diagnostic efficiency.
Results of the meta-analysis demonstrate that the success of US does not appear to be influenced by the specialty training of the sonologist, such as radiology versus gynecology, but rather by the use of meticulous methodology. Using the combination of gray-scale and color Doppler flow imaging findings in a diagnostic system was superior to using morphologic information or optimized thresholds for Doppler arterial resistance measurements alone in scoring systems. The heterogeneity of diagnostic systems used in the combined US techniques did not allow appropriate statistical comparison. However, if we compare studies with similar sensitivities (7,8,42), those combining morphology and Doppler arterial resistance measurements demonstrate lower specificities (40%–52%) (7,42) than do diagnostic systems using morphology and color Doppler flow imaging (specificity, 93%) (8) (Fig 3).
Diagnostic systems requiring the combination of morphology, color Doppler flow imaging, and Doppler arterial resistance measurements (2,39) may be less feasible and more time-consuming, since they require adequate information from all three techniques. The time spent to obtain adequate Doppler arterial resistance measurements may be a factor that limits their larger use (49,53,62). Evidence suggests that multiple resistance indexes must be obtained in each lesion because of the wide variability in arterial resistances that are measured in different areas of any given lesion (28). The lowest resistance index is then used in lesion analysis.
It is unfortunate that there is no mechanism that allows the sonologist to ascertain that the lowest resistance index has been measured. The morphologic appearance of the mass influences the number of Doppler arterial resistance measurements obtained. If two or three "benign" resistance indexes are recorded in a morphologically benign-appearing mass, then the search is concluded. However, if two or three benign resistance indexes are recorded in a morphologically malignant-appearing mass, then the search is continued. Moreover, to our knowledge, the reproducibility of Doppler arterial resistance measurements in ovarian cysts has not yet been verified. However, studies in which Doppler arterial resistance was measured in larger pelvic vessels (63), such as in the ovarian and uterine artery, showed poor agreement. Indeed, the variability in arterial resistances measured in different areas of an ovarian mass (28) limits the feasibility of conducting a study of the reproducibility of Doppler arterial resistance measurements in ovarian lesions. In a study in which morphology alone was used (57), the interobserver agreement of the characterization of ovarian masses was moderate for frequently used criteria such as wall structure (κ = 0.41) and septal structure (κ = 0.47). Therefore, morphology likely contributes to the variability of US results to a lesser degree than does arterial waveform analysis.
Furthermore, the reports of diagnostic systems requiring the combination of morphology, color Doppler flow imaging, and Doppler arterial resistance measurements (2,39), although the performance of these systems appears excellent, do not explain how the scoring system was devised. If the diagnostic criteria were first developed in and then applied in the same patient population, then a better fit would be expected; this method results in an overestimation of the technique's diagnostic performance. This also may help to explain the decline in efficiency noted in studies in more recent years in which arterial resistance indexes were measured (4,6,38,42,51).
Therefore, optimal ovarian lesion characterization appears to be obtained through the combination of gray-scale US morphology and color Doppler flow imaging information. Such a strategy is described in two studies. The system proposed by Buy et al (38) has the advantage of being verified in a large prospective study and does not require calculation. The scoring system in an article by Brown et al (8), obtained through logistic regression, requires the assessment of four parameters: a solid component, the location of color flow in the lesion, the amount of free intraperitoneal fluid, and the presence of septations, combined through a simple addition of subscore values. The presence of a nonhyperechoic solid portion of an ovarian neoplasm, a gray-scale US morphologic feature, had the greatest influence on the diagnosis. Although the scoring system in the article by Brown et al (8) demonstrated a good compromise between sensitivity and specificity, in our opinion, the system needs validation in a prospective study.
The characterization of an ovarian mass as benign or malignant can be achieved by using a list of diagnostic criteria that attribute equal weight to each criterion or by using a scoring system that attributes numbers of increasing value (weights) to each criterion, the sum of which results in a score. If the score of an ovarian mass reaches the previously established threshold for malignancy, the mass is considered malignant. The threshold value for a scoring system should be obtained by performing ROC curve analysis in a large study population that is representative of the target population. When multiple and sometimes conflicting diagnostic features are available, it is reasonable to assume that optimized weights attributed to each diagnostic criterion or the combination of criteria into a score have a greater chance to achieve a correct and reproducible diagnosis than do simple diagnostic rules that require experience. Scoring systems can be obtained through multilogistic regression and ROC curves or artificial neural networks that use a nonlinear regression model (64).
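The idea of a weighted scoring system with a threshold chosen by ROC analysis can be illustrated with a short sketch. The criterion names and weights below are hypothetical and are not taken from any published score; they loosely echo the kinds of parameters mentioned in the text. Selecting the cutoff by maximizing Youden's index (sensitivity + specificity − 1) is also an assumption, one of several ways to pick an operating point on an ROC curve.

```python
# Hypothetical weights for illustration only; not taken from any published score.
WEIGHTS = {
    "solid_component": 3,
    "central_color_flow": 2,
    "thick_septations": 1,
    "free_fluid": 1,
}

def score(findings):
    """Sum the weights of the criteria present in one mass."""
    return sum(WEIGHTS[name] for name in findings)

def is_malignant(findings, threshold):
    """Call the mass malignant when its score reaches the preset threshold."""
    return score(findings) >= threshold

def best_threshold(scores, labels):
    """Return the cutoff maximizing Youden's J = sensitivity + specificity - 1.
    labels: 1 for histologically malignant, 0 for benign."""
    pos = sum(labels)
    neg = len(labels) - pos
    best, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best, best_j = t, j
    return best
```

A threshold derived this way should be validated in a population other than the one used to derive it, for the reason given in the text: criteria developed and tested in the same patients tend to look better than they are.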
Because of the small number of studies that used similar diagnostic criteria, this meta-analysis could not answer the question regarding which criteria worked best, as compared with others. However, results of individual studies suggest that the inclusion of tumor size proposed by DePriest et al (57) did not improve the results obtained by using the score (31) of Sassone and colleagues. Logistic regression analysis in the combined study by Brown et al (8) demonstrated that an abnormal amount of ascites is indicative of malignancy. Although logistic regression analysis suggested an optimal threshold value close to 0.75 for the resistive and pulsatility index, in a multivariate approach, the incremental value of arterial Doppler waveform analysis compared with that of color Doppler flow imaging and morphology was not powerful enough for it to be included among the variables that best predicted malignancy (8).
In conclusion, the results of this meta-analysis provide scientific evidence that US techniques that combine gray-scale US morphologic assessment with tumor vascularity imaging information (color Doppler flow imaging) in a diagnostic system are significantly better in ovarian lesion characterization than Doppler arterial resistance measurements, color Doppler flow imaging, or gray-scale US morphologic information alone. The specialty training of the sonologist does not influence results, provided that the sonologist has used meticulous methodology. Furthermore, specific diagnostic criteria for each US technique must be applied. The patient's menopausal status does not influence results, but the character of the lesion may; mucinous cystadenomas in particular are difficult to accurately characterize by using any US technique or combination thereof.
| FOOTNOTES |
|---|
Author contributions: Guarantors of integrity of entire study, K.K., Y.L., H.H.; study concepts, K.K., H.H., R.A.F.; study design, K.K.; definition of intellectual content, K.K., H.H., R.A.F.; literature research, K.K., K.T.; data acquisition, K.K., K.T.; data analysis, K.K., Y.L.; statistical analysis, Y.L.; manuscript preparation, K.K.; manuscript editing, R.A.F., H.H.; manuscript review, K.K., H.H., Y.L., R.A.F.