Radiology
(Radiology. 1999;211:605-607.)
© RSNA, 1999


Editorial

Statistical Methods for Comparative Qualitative Analysis¹

Richard Tello, MD and Thomas Ptak, MD, PhD

1 From the Departments of Radiology (R.T., T.P.) and Epidemiology and Biostatistics (R.T.), Boston University School of Medicine, Boston Medical Center, 88 E Newton St, Boston, MA 02118. Received January 5, 1999; accepted January 8. Address reprint requests to R.T.

Index terms: Editorials • Magnetic resonance (MR), three-dimensional, 961.12942, 961.149, 961.721 • Magnetic resonance (MR), vascular studies, 961.12942, 961.149, 961.721 • Renal arteries, MR, 961.12942, 961.149, 961.721

In this issue of Radiology, Schoenberg et al (1) tackled the difficult task of comparing standard three-dimensional (3D) gadolinium-enhanced magnetic resonance (MR) angiography with bolus timing to a multiphase single-breath-hold MR angiographic acquisition. Although the magnitude of their project is comparable to that tackled frequently in the radiology literature, the authors have addressed some very difficult radiologic and statistical obstacles. Their article, in itself, stands to be a major reference work not only for radiologists interested in the current state of MR angiography but also for those interested in radiologic research and its appropriate methods. They began with the basic premise that direct bolus timing can be replaced with sequential full 3D acquisition if it is performed rapidly enough. Bolus timing is currently used in many methods, but the additional complexity and timing necessary to perform appropriate bolus-timing sequences requires the ability to obtain images in a way similar to that for digital subtraction angiography (2). After multiple sequential images are obtained, the optimal block of data obtained during peak arterial phase can be subselected and used for analysis and imaging diagnosis, which minimizes the overall time necessary to perform MR angiography. Preliminary work with two-dimensional sequential techniques provided the groundwork for sequential dynamic MR angiography (3). With their current study, Schoenberg and colleagues (1) have clearly improved diagnostic ability with multiphase 3D MR angiography.

Their study raises many questions, and the answers require careful and diligent analysis to avoid statistical pitfalls and arrive at valid conclusions. The population groups imaged with the two sequences are of disparate sizes (37 vs 26 patients). This raises the temptation to subselect only the populations with disease diagnosed with standard of reference methods such as digital subtraction angiography or surgery and then to compare findings in these two subgroups on the basis of their respective specificity and sensitivity, which is typical in the radiology literature (4). Optimal comparison of two different techniques with the standard of reference could best be accomplished by means of paired analysis in the same patient populations, which, unfortunately, is not a realistic clinical scenario. Hence, the need arises to compare two different techniques in two different populations, which is a common scenario in radiologic research (5).

Findings with quantitative evaluation appear to demonstrate comparable performance between standard and multiphase 3D MR angiography in stenosis grading and evaluation. The populations are small, and clearly, the investigational nature of their article indicates that study of a larger number of patients and angiographic correlation are needed for more accurate statistical validation. The important points in their article do not address diagnostic accuracy but rather the ability to maintain diagnostic certainty. By performing statistical analyses with inter- and intraobserver evaluations, which are effectively obtained by means of the κ statistic (6), they demonstrate excellent, almost perfect inter- and intraobserver agreement in stenosis grading and vessel detection. Thus, the reliability of multiphase 3D MR angiography is firmly established, as is diagnostic certainty.
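As a rough illustration of how agreement between two blinded readers can be quantified, the following Python sketch computes the unweighted Cohen κ. The function name and the ratings are invented for illustration, and whether Schoenberg et al used a weighted or an unweighted κ is not stated here, so the unweighted form is an assumption.

```python
from collections import Counter


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen kappa for two readers rating the same cases."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("both readers must rate the same, non-empty set of cases")
    n = len(ratings_1)
    # Observed proportion of cases on which the two readers agree.
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    counts_1 = Counter(ratings_1)
    counts_2 = Counter(ratings_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical stenosis grades (ordinal scale of 1-5) from two blinded readers.
reader_1 = [1, 1, 2, 2, 3, 3, 4, 5, 5, 5]
reader_2 = [1, 1, 2, 3, 3, 3, 4, 5, 5, 4]
print(cohen_kappa(reader_1, reader_2))
```

The grades enter only through equality comparisons, so no arithmetic is performed on the ordinal scores themselves.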

The questions that the authors address require complex statistical analysis to be answered rigorously and properly. Since their study depends heavily on the process of comparing subjective scores of study features such as severity of noise and quality of studies, use of the appropriate statistical tools is essential to achieve valid results. It is this feature that makes their article a landmark work in the MR angiography literature. The use of scoring schemes in radiologic studies is not new (7), but the recognition that the averaging of scores is not valid is rarely realized in the imaging literature (8).

Scoring Methods in Research
When scoring schemes are used, it is important to be able to differentiate whether a scoring scheme represents ordinal data or a nominal scale. Ordinal data can be ordered, but they do not have specific numeric values. Common arithmetic cannot be performed with ordinal data in a meaningful way because the data in a natural ordering scheme are usually related but not in a linear manner. For example, the skill level of an angiographer to cannulate the main renal artery for renal digital subtraction angiography can be scored on the basis of relatively subjective criteria. With use of an ordinal scale (eg, 1, not competent; 2, competent; 3, exceptional), it becomes evident that two "not competent" angiographers cannot be put together to form a competent angiographer. In particular, we cannot truly say that in a study in which one angiographer is rated as not competent and another is rated as exceptional, their mean performance is comparable to that of two competent angiographers. Because ordinal variables cannot be given a numeric scale that makes sense, the computation of means and SDs for such data is not valid. None of the methods of estimation and hypothesis testing commonly used in any quantitative environment, such as a t test, can be applied (9). However, we may still be interested in making comparisons between groups of variables in which an ordinal data scheme is used; thus, nonparametric statistical methods must be used.
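The angiographer example can be made concrete with a small Python sketch. The two numeric codings below are hypothetical; both preserve the order of the categories, so neither is more "correct" than the other, yet they lead to different conclusions when means are compared.

```python
from statistics import mean


def recoded_mean(scores, coding):
    """Mean of ordinal scores after mapping each category to a numeric code."""
    return mean(coding[s] for s in scores)


# Categories: 1 = not competent, 2 = competent, 3 = exceptional.
mixed_pair = [1, 3]        # one not competent, one exceptional
competent_pair = [2, 2]    # two competent angiographers

# Two order-preserving codings of the same three categories (both assumed).
coding_a = {1: 1, 2: 2, 3: 3}
coding_b = {1: 1, 2: 2, 3: 10}

for label, coding in (("coding 1/2/3", coding_a), ("coding 1/2/10", coding_b)):
    print(label, recoded_mean(mixed_pair, coding), recoded_mean(competent_pair, coding))
```

Under the first coding the two pairs appear to have equal mean performance; under the second they do not. Because the apparent equality depends on arbitrary numeric labels, the mean carries no meaning for such data, whereas rank-based methods depend only on the ordering.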

Although this is not a common situation in radiology, the groundwork must be laid to understand that if we are dealing with a scoring scheme in which we are classifying categories that have no inherent ordering, then the use of a nominal scale is required to allow statistical analysis of the relationships among these categories. Data values in a nominal scale can be classified into categories, but the categories have no specific ordering. For example, in classifying the cause of death with a nominal scale (eg, 1, cancer; 2, cardiovascular disease; 3, renal disease; 4, all other causes), there is no ability to impose a particular ordering that relates to an inherent meaning. Generally, nominal categorization is not used in radiologic analysis, but ordinal data are often used (10). Hence, Schoenberg et al (1) exemplify the use of an ordinal data scale in the evaluation of various parameters compared between two imaging studies.

In a review of their methods, we found many points to emphasize. For example, in the semiquantitative analysis of image quality and vessel stenosis grading, an ordinal scale of 1–5 was used. Artifacts were rated with an ordinal scale of 1–3. Visual conspicuity and the ability to evaluate the extrarenal vascular tree were rated with an ordinal scale of 1–5. Finally, stenosis was graded with an ordinal scale of 1–5, which is not atypical in most angiographic research to date (11).

Analysis
How do we analyze these data? With interval data that arise from signal intensity measurements, the Wilcoxon rank sum test, also known as the Mann-Whitney U test, can be used (12). With ordinal data, use of the Mantel-Haenszel test is appropriate to analyze for trend. Note that these tests do not involve performance of any arithmetic with the ordinal scoring variables. To test the hypothesis that there is no difference in the visual conspicuity with the different techniques, we need to be able to take into account the magnitude and differences in the ability to evaluate vessel stenosis. Thus, use of a nonparametric test is required, and, therefore, whether data are paired or unpaired becomes important. A nonparametric analog to the unpaired (two-sample) t test is the Wilcoxon rank sum test, which can be used for two independent samples. For paired samples, the Wilcoxon signed rank test, the nonparametric analog to the paired t test, would be used. Therefore, because the two techniques were evaluated in two different patient groups, the Wilcoxon rank sum test would be used to test the hypothesis that the median vessel conspicuity is comparable between images obtained with the two different techniques (13).

With the Wilcoxon rank sum test, the data from the two groups are combined and ordered from lowest to highest. Ranks are assigned to the individual values from the best vessel conspicuity to the worst. If a group of observations has the same value, then the range of ranks for the group would be computed, and the average rank for each observation in the group would be assigned. The test statistic is the sum of the ranks in each sample group, and the distribution of the rank sum is then assumed to be approximately normal. This normal approximation is valid as long as the sample size is greater than 10 (13). In smaller samples, the actual statistical significance level can be obtained from the exact tables available in statistical books (13). Remember that the Wilcoxon rank sum test is often referred to as the Mann-Whitney U test, so there should be no ambiguity in the understanding of the use of this test. How is this important in this case? The results of the U test demonstrated significantly different and improved vessel conspicuity and diagnosis with multiphase 3D MR angiography (1).
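The steps just described (pooling, ranking with average ranks for ties, summing the ranks of one group, and applying the normal approximation) can be sketched in Python as follows. This is a minimal illustration and not the authors' computation: the function names and the conspicuity scores are hypothetical, the tie-corrected variance is the standard textbook form, and the continuity correction is omitted for brevity.

```python
import math
from collections import Counter


def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    counts = Counter(values)
    ranks = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        # Average of the ranks next_rank .. next_rank + ties - 1.
        ranks[value] = next_rank + (ties - 1) / 2
        next_rank += ties
    return ranks, counts


def rank_sum_test(sample_1, sample_2):
    """Wilcoxon rank sum test, normal approximation with tie correction."""
    n1, n2 = len(sample_1), len(sample_2)
    n = n1 + n2
    ranks, counts = midranks(list(sample_1) + list(sample_2))
    rank_sum_1 = sum(ranks[v] for v in sample_1)
    expected = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (rank_sum_1 - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum_1, z, p_two_sided


# Hypothetical conspicuity scores (ordinal scale of 1-5) in two patient groups.
standard = [2, 3, 3, 2, 4, 3, 2, 3, 4, 3, 2, 3]
multiphase = [4, 4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 5]
print(rank_sum_test(standard, multiphase))
```

Only the ordering of the scores is used; ranking from best to worst instead of lowest to highest changes the sign of z but not the two-sided significance level.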

The Mantel-Haenszel test, which is a test for trend, allows statistical testing for the significance of differences between the two groups, as seen in their Table 2, which lists the distribution of image quality scores. Though the temptation would be to provide a mean quality score for each of the readers and each of the techniques, this is again not possible with an ordinal scale. Rather, by observing the general distribution, we can see that there is agreement in the distribution of the quality scores between the observers, as well as a general improvement in quality with multiphase 3D MR angiography compared with the standard bolus-timing technique. Their Table 3 demonstrates this somewhat more succinctly by presenting the mode scores, which are more appropriate to communicate this information efficiently because they convey where the vast majority of scores aggregated. Typically in the literature (8), a false impression is created by the presentation of means or SDs. Such calculations are completely inappropriate if ordinal data are analyzed, as in the study by Schoenberg et al (1). Finally, calculation of the κ statistic for both readers allows a quantitative measure of observer agreement. With the κ statistic, however, an a priori consensus opinion cannot be formed. Therefore, this technique can be used only with separate blinded readers who do not know each other's interpretations.
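For readers who want to see how a trend test operates on a table of score counts, the Python sketch below computes the Mantel-Haenszel chi-square for linear association, M² = (n − 1)r², on a two-row table. Several points are assumptions of this sketch rather than details taken from Schoenberg et al: the counts are invented, the function name is hypothetical, and midranks are used as the category scores so that only the ordering of the categories enters the calculation.

```python
import math


def mantel_haenszel_trend(row_0, row_1):
    """Mantel-Haenszel trend statistic for a 2 x k table of ordered categories.

    row_0 and row_1 hold the counts per category for the two groups.
    Returns (M2, p), where M2 is referred to chi-square with 1 df.
    """
    if len(row_0) != len(row_1):
        raise ValueError("both rows need the same number of categories")
    col_totals = [a + b for a, b in zip(row_0, row_1)]
    n = sum(col_totals)
    # Midrank score for each category: average rank of its observations.
    scores, start = [], 1
    for total in col_totals:
        scores.append(start + (total - 1) / 2)
        start += total
    n1 = sum(row_1)
    mean_x = n1 / n  # group indicator: 0 for row_0, 1 for row_1
    mean_y = sum(s * t for s, t in zip(scores, col_totals)) / n
    s_xx = n1 * (1 - mean_x) ** 2 + (n - n1) * mean_x ** 2
    s_yy = sum(t * (s - mean_y) ** 2 for s, t in zip(scores, col_totals))
    s_xy = sum(
        c1 * (1 - mean_x) * (s - mean_y) + c0 * (0 - mean_x) * (s - mean_y)
        for c0, c1, s in zip(row_0, row_1, scores)
    )
    if s_xx == 0 or s_yy == 0:
        raise ValueError("no variation in group or in score; test is undefined")
    r = s_xy / math.sqrt(s_xx * s_yy)
    m2 = (n - 1) * r ** 2
    p = math.erfc(math.sqrt(m2 / 2))  # upper tail of chi-square with 1 df
    return m2, p


# Hypothetical counts of image quality scores 1-5 for each technique.
standard_counts = [3, 8, 10, 4, 1]
multiphase_counts = [0, 2, 6, 12, 6]
print(mantel_haenszel_trend(standard_counts, multiphase_counts))
```

With two groups and midrank scores, this statistic should agree with the square of the tie-corrected rank sum z statistic computed without a continuity correction, which is one way to check an implementation.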

In conclusion, Schoenberg et al (1) deal appropriately with some of the most complex and important statistical issues, in terms of the statistical validity of qualitative data, that must be dealt with in radiology every day. Their use of the appropriate statistical tools validates our ability to use statistical analyses in objective discussions. Our professional literature has been littered with subjective evaluations in an attempt to justify one method over another. The inability to adequately document and formalize these statistical observations has merely weakened our position in the general medical community. The article by Schoenberg et al (1) lays the groundwork to justify objectively and quantitatively, with intricate and excruciating detail, that multiphase 3D MR angiograms are easier, more reliable, and possibly more accurate to interpret than are standard 3D MR angiograms. They appropriately used the sophisticated statistical analyses required for nonparametric ordinal data from such qualitative evaluations. As such, theirs is a landmark article in the radiology literature that should serve as a standard for any future studies in which authors want to quantify subjective ratings and rankings with a new method.

Footnotes

Abbreviation: 3D = three-dimensional

See also the article by Schoenberg et al (pp 667–679) in this issue.

References

  1. Schoenberg SO, Bock M, Knopp MV, et al. Renal arteries: optimization of three-dimensional gadolinium-enhanced MR angiography with bolus-timing–independent fast multiphase acquisition in a single breath hold. Radiology 1999; 211:667-679.
  2. Saadoon K. Atlas of normal and variant angiographic anatomy. Philadelphia, Pa: Saunders, 1991.
  3. Tello R, Thomson KR, Witte D, Becker GJ, Tress BM. Standard dose Gd-DTPA dynamic MR of renal arteries. JMRI 1998; 8:421-426.
  4. de Haan MW, Kouwenhoven M, Thelissen RP, et al. Renovascular disease in patients with hypertension: detection with systolic and diastolic gating in three-dimensional, phase-contrast MR angiography. Radiology 1996; 198:449-456.
  5. Sheafor DH, Keogan MT, Delong DM, Nelson RC. Dynamic helical CT of the abdomen: prospective comparison of pre- and postprandial contrast enhancement. Radiology 1998; 206:359-363.
  6. Rosner B. Fundamentals of biostatistics. 4th ed. Boston, Mass: Duxbury, 1995; 424-426.
  7. Costello P, Dupuy D, Ecker C, Tello R. Spiral CT of the thorax with small volumes of contrast material: a comparative study. Radiology 1992; 183:663-666.
  8. Kim T, Murakami T, Takahashi S, et al. Effects of injection rates of contrast material on arterial phase hepatic CT. AJR 1998; 171:429-432.
  9. Rosner B. Fundamentals of biostatistics. 4th ed. Boston, Mass: Duxbury, 1995; 208-272.
  10. Tello R, Seltzer SE, Pelger M, Spaulding S, Sacci G. Design and development of an anthropomorphic nomogram for optimization of intravenous contrast delivery for hepatic volumetric CT. J Comput Assist Tomogr 1997; 21:236-245.
  11. Spence LD, Tello R, Yucel EK, Kouwenhoven M, Groen J. Technical note: spiral MR angiography of the popliteal trifurcation. AJR 1998; 171:115-117.
  12. Rosner B. Fundamentals of biostatistics. 4th ed. Boston, Mass: Duxbury, 1995; 560-562.
  13. Rosner B. Fundamentals of biostatistics. 4th ed. Boston, Mass: Duxbury, 1995; 564-565.

Related Article

Renal Arteries: Optimization of Three-dimensional Gadolinium-enhanced MR Angiography with Bolus-timing–independent Fast Multiphase Acquisition in a Single Breath Hold
Stefan O. Schoenberg, Michael Bock, Michael V. Knopp, Marco Essig, Gerhard Laub, Hans Hawighorst, Ivan Zuna, Friedrich Kallinowski, and Gerhard van Kaick
Radiology 1999; 211:667-679.



This article has been cited by other articles:

H.J. Cloft
The Value of a P Value
AJNR Am. J. Neuroradiol., August 1, 2006; 27(7): 1389-1390.

E. Castillo, H. Tandri, E. R. Rodriguez, K. Nasir, J. Rutberg, H. Calkins, J. A. C. Lima, and D. A. Bluemke
Arrhythmogenic Right Ventricular Dysplasia: Ex Vivo and in Vivo Fat Detection with Black-Blood MR Imaging
Radiology, July 1, 2004; 232(1): 38-48.

J. S. Swan, T. J. Carroll, T. W. Kennell, D. M. Heisey, F. R. Korosec, R. Frayne, C. A. Mistretta, and T. M. Grist
Time-resolved Three-dimensional Contrast-enhanced MR Angiography of the Peripheral Vessels
Radiology, October 1, 2002; 225(1): 43-52.

T. J. Carroll, F. R. Korosec, G. M. Petermann, T. M. Grist, and P. A. Turski
Carotid Bifurcation: Evaluation of Time-resolved Three-dimensional Contrast-enhanced MR Angiography
Radiology, August 1, 2001; 220(2): 525-532.

