Radiology
Published online before print October 19, 2006, 10.1148/radiol.2413051358
(Radiology 2006;241:854-860.)
© RSNA, 2006


Special Report

Developmental Dysplasia of the Hip: Quality of Reporting of Diagnostic Accuracy for US¹

Andreas Roposch, MD, MSc, Nicole M. Moreau, BHSc, Elizabeth Uleryk, BA, MLS and Andrea S. Doria, MD, MSc, PhD

1 From the Department of Orthopaedic Surgery, Great Ormond Street Hospital for Children, Institute of Child Health, University College London, Great Ormond St, London WC1N 3JH, England (A.R.); Population Health Sciences Research Institute (N.M.M., A.S.D.) and Department of Diagnostic Imaging (A.S.D.), the Hospital for Sick Children, Toronto, Ontario, Canada; the Hospital for Sick Children Library, Toronto, Ontario, Canada (E.U.); and Department of Medical Imaging, University of Toronto, Ontario, Canada (A.S.D.). Received August 15, 2005; revision requested October 19; revision received November 2; accepted December 1; final version accepted February 1, 2006. Address correspondence to A.R. (e-mail: a.roposch@ich.ucl.ac.uk).


    ABSTRACT
Purpose: To systematically review the quality of diagnostic accuracy reporting in studies on the use of ultrasonography (US) for the diagnosis of developmental dysplasia of the hip (DDH).

Materials and Methods: A systematic review of the MEDLINE, EMBASE, DARE, and Cochrane Library databases was performed by using a validated search strategy. Two independent reviewers evaluated articles by using the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews (QUADAS) statements. Items were reported individually for STARD and QUADAS because these instruments do not incorporate a summary score. A simple κ statistic with 95% confidence intervals was used to measure the level of agreement between the two reviewers.

Results: Ten studies were included. In three studies, reliability was investigated, and in seven studies elements of both validity and reliability were investigated. In no study did the authors adequately report more than 40% of the STARD items. The quality of methods that were used in the studies was poor. Only one (14%) of seven studies provided information on more than 50% of the QUADAS items. All studies included a good description of image acquisition, but data analysis was imperfect and lacked estimates of diagnostic accuracy and precision. Authors tended to overinterpret their results.

Conclusion: Overall, there was imperfect reporting of diagnostic accuracy in studies on the use of US for diagnosis of DDH.

Supplemental material: radiology.rsnajnls.org/cgi/content/full/2413051358/DC1



    INTRODUCTION
Developmental dysplasia of the hip (DDH) is the most common congenital musculoskeletal disorder in childhood (1) and is characterized by varying displacement of the proximal femoral head from the acetabulum, with poor development of the acetabulum. Ultrasonography (US) is considered the imaging technique of choice for the diagnosis of DDH in the neonate and infant (1). Therefore, diagnostic decisions are often made on the basis of the results of this imaging test (1). To a large extent, the results of the US examination determine the need for treatment, further diagnostic testing (including radiography), or referral to other health care providers.

To produce results that have validity (the degree to which a US scan corresponds to the true state of the hip) and reliability (the degree to which the same result is found when the hip is scanned on two different occasions), it is important that US methods for examination of the hip fulfill the basic diagnostic test standards (Fig 1). In recent decades, quality assessment of diagnostic tests has gained increasing interest. Complete and accurate reporting is essential to judge the generalizability of the results and the potential for bias. It has been suggested that the general quality of reporting in diagnostic test studies is poor (2–5), with an overestimation of diagnostic accuracy (2). In a recent systematic review, Woolacott et al (6) assessed the role of US for screening policies in patients with DDH; however, to our knowledge no previous systematic review has been conducted to investigate the current status of knowledge on measurement properties (ie, reproducibility and validity) for US of the hip. Thus, the purpose of our study was to systematically review the quality of diagnostic accuracy reporting in studies on the use of US for the diagnosis of DDH.


Figure 1: Diagram demonstrates how to determine if a test is useful. The first step, reproducibility (reliability), is an absolute prerequisite. Accuracy (reliability and validity) is the next important step. Only when these two elements, which are also referred to as measurement properties, are established are further investigations (diagnostic yield studies) useful. These include investigations on the effect of the test result on clinical decision making, the feasibility and cost-effectiveness of the test, and the risks associated with the test. Finally, the potential effects of the test on clinical outcomes are investigated. The present systematic review focused on the first two steps (reproducibility plus validity equals accuracy).

 

    MATERIALS AND METHODS
Data Sources and Search
An electronic search of the literature was performed by two reviewers (E.U., A.R.) who identified studies in which authors reported on the measurement properties of US methods for the diagnosis of DDH. The MEDLINE (January 1966 to April 2005) and EMBASE (January 1980 to April 2005) databases were searched by using a sensitive search strategy (3) that combined medical subject headings and EMBASE terms with free text words in Ovid; these terms included "sensitivity," "specificity," "false negative," "accuracy," "screening," "US," "sonography," "congenital hip dislocation," "hip dysplasia," "predictive values," and "likelihood ratios." In addition, the DARE database of the National Health Service Center for Reviews and Dissemination and the Cochrane Library database (3rd quarter 2004) were examined for relevant abstracts.

Two reviewers (A.R., A.S.D.) independently assessed the titles and key words of all eligible citations to determine if the studies met our inclusion criteria. If the content of a study was not obvious from the title and key words, the abstract was retrieved and evaluated by both reviewers for eligibility. In the second step, all abstracts of articles that were found to be eligible for inclusion were reviewed independently by the same two reviewers. Finally, the original studies of the selected articles were evaluated independently (A.R., A.S.D.). At any stage, disagreements were discussed and resolved in a consensus meeting before the next step could be performed.

Inclusion Criteria
This systematic review included studies on the measurement properties (diagnostic accuracy) of any US methods that were reported for the diagnosis of DDH in neonates (age 0–4 weeks), infants (1–12 months), or older children (13–24 months). Specifically, we included studies in which at least one of the following criteria for single methods were described: process criteria (ie, how to perform the US examination) and, if applicable, conversion criteria (ie, how to interpret a US scan). Also included were studies in which any form of reliability (reproducibility) or validity was investigated. Reliability is defined as obtaining the same result when a phenomenon is measured by the same or different clinicians on the same or different occasions (7). Validity is the degree to which the result of a measurement corresponds to the true state of the phenomenon being measured (7).

Excluded were studies with inappropriate reference standards, such as studies on the correlation between clinical examination and US or diagnostic yield studies (eg, studies on US screening policies for DDH). Clinical examination cannot be considered a reference standard to determine the diagnostic accuracy of US because it does not provide an equivalent or superior amount of information compared with US. Articles written in languages other than English, French, German, Italian, Spanish, and Portuguese were excluded.

Data Extraction and Outcome Measures
The Standards for Reporting of Diagnostic Accuracy (STARD) statement (8), which was developed to improve the reporting of studies on diagnostic accuracy, was used to assess the quality of reporting. This evaluative tool contains 25 items. The reviewers independently evaluated each study to determine whether each item was adequately described according to the STARD guidelines (8); items were rated as adequately described, not described, or partially described. Disagreements were resolved by consensus. The STARD statement does not incorporate a quality score, and items are reported individually (8).

For each study, the quality of methods was assessed by using the Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews (QUADAS) criteria, which is a 14-item instrument (9). For each item, the two reviewers independently assessed whether the elements that were mentioned in that item were adequately described (yes or no). If it was unclear from the information provided in the article as to whether an item had been addressed in a particular study, the item was rated as "unclear." Disagreements were discussed and resolved by consensus. Reliability studies were not assessed with the QUADAS criteria because only two items were applicable. Similar to the STARD tool, QUADAS does not incorporate a total quality score (10).

Statistical Analysis
The level of agreement between the two reviewers in scoring the STARD and QUADAS criteria was assessed by using a simple κ statistic with 95% confidence intervals (SAS, version 9.1; SAS Institute, Cary, NC).
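As an illustration only, the following minimal Python sketch shows how a simple (unweighted) κ and an approximate 95% confidence interval can be computed for two reviewers who rate the same items. It is not the SAS procedure used in the study. The function name `cohen_kappa`, the example ratings, and the large-sample standard error formula are assumptions made for this sketch; statistical packages such as SAS may use a more elaborate variance formula, so the limits can differ slightly.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b, z=1.96):
    """Return (kappa, lower, upper) for two raters' categorical ratings.

    Uses a simple large-sample standard error (an assumption of this
    sketch), so the interval is approximate.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over all categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se


if __name__ == "__main__":
    # Hypothetical ratings of 50 checklist items by two reviewers.
    reviewer_1 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
    reviewer_2 = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
    k, lo, hi = cohen_kappa(reviewer_1, reviewer_2)
    print(f"kappa = {k:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```

Because κ subtracts the agreement expected by chance from the observed agreement, it is lower than the raw percentage of agreement whenever the raters' marginal distributions make chance agreement likely.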


    RESULTS
Search and Selection
The electronic database search retrieved 397 citations (Fig 2). On the basis of title, key words, and publication type, 66 references were selected for further evaluation. A review of the abstracts of these 66 references resulted in 22 abstracts meeting the inclusion criteria. The original articles of these 22 abstracts were then retrieved. An evaluation of these articles led to the exclusion of 11 studies, including three review articles or expert opinions, one study on performer characteristics, one cadaver study, four longitudinal studies, and two cross-sectional studies on the correlation between clinical examination and US. Thus, we included four studies in which the reliability of US methods for the diagnosis of DDH was assessed (11–13) and seven studies in which the elements of validity and reliability were assessed (14–19).


Figure 2: Flow diagram demonstrates search and selection process.

 
Quality of Reporting
The time needed for quality assessment was approximately 1 hour for each article. Interrater agreement on the items of the STARD statement was very good (κ = 0.83 [95% confidence interval: 0.77, 0.89]). Disagreements (n = 21) between the two reviewers were resolved in all cases and were the result of poor reporting of the study design and a vague description of key issues, such as the number of participants that satisfied the inclusion criteria (item 16) or the time interval between index and reference tests (item 17).

Overall, all items in the STARD statement were poorly reported (Appendix E1 [http://radiology.rsnajnls.org/cgi/content/full/2413051358/DC1]). Only one (9%) of 11 studies (14) included the medical subject heading term "sensitivity and specificity" (item 1). The objective of the study was reported in all but one article. Study objectives were heterogeneous and included a description of US methods for examination of the hip, evaluation of the correlation between US of the hip and radiography, and assessment of reproducibility (item 2).

Participants.—Only four (36%) of 11 articles included information on the study population; however, no article specifically provided the inclusion and exclusion criteria of the study. Only three (27%) of 11 studies included some information on the recruitment process. In three studies (27%), apparently consecutive cases were sampled, and in two studies (18%) a random sample was used. Data collection was performed prospectively in two studies (18%) and retrospectively in three studies (27%); data collection was not specified for the remaining studies.

Reference standard.—In all seven validity studies, radiography was used as the reference standard. However, it was not specifically stated what kind of radiographic techniques were used or when the radiographs were obtained in relation to the US examination. Among several radiographic measures, the acetabular index was the most consistently used (four studies). However, the rationale for using this particular measure as a reference standard was not stated in any study.

Test methods (items 8–11).—Process criteria were consistently well reported for all studies. However, the rationale for using specific units, cutoff values, or categories of US results was not stated in any study. Time intervals between US and radiography were not reported in any study, and the distribution of the severity of disease in patients with DDH was reported in only two (18%) of 11 studies. Only one (9%) of 11 studies provided explicit information on the number and expertise of the persons who executed and read the index and reference standard tests (12).

Statistics (items 12–13).—In two (18%) of 11 studies, Pearson correlation coefficients for US to radiography were calculated as a measure of concurrent validity. In four (36%) of 11 studies, reproducibility was described in terms of the mean difference between single measurements, whereas in three studies (27%) standard deviations or confidence intervals were included. Sound statistical methods for the assessment of reproducibility were used in only three studies (27%) (11,12).

Time frame of study (item 14).—Only four (36%) of 11 studies included information on when the study had been performed.

Characteristics of participants (item 15).—Overall, there was insufficient reporting of the clinical and demographic characteristics of participants, with only four (36%) of 11 studies providing sufficient information (14–16,19).

Test results (items 17–20).—In studies on diagnostic accuracy, none of the authors reported their results in a 2 × 2 table for the entire sample, nor did they define diseased versus nondiseased cases or true-positive versus true-negative findings. In one (14%) of seven studies, absolute numbers were not reported (15); in a second study (14%), only the results of a subsample were reported (16); and in a third study (14%), the definitions for single disease states (eg, residual dysplasia, dysplasia, and subluxation) were not specified, thereby making it unclear what the true-positive and false-positive rates were.

Estimates (items 21–24).—Precision values, such as 95% confidence intervals, were not reported for the estimates of diagnostic accuracy in any study. Possible sources of heterogeneity in the results were not explored in any study. Interestingly, indeterminate results, such as hips that were dysplastic but not dislocated, were not reported in any study that assessed the validity of US. As for test reproducibility, Bar-On et al (11) reported an interrater reliability (mean κ) of 0.50 (95% confidence interval: 0.45, 0.55) and an intrarater reliability of 0.61 (95% confidence interval: 0.49, 0.69) for the Graf method of US of the hip. These estimates were based on the rating of US scans by three pediatric orthopedic surgeons. In the study by Ömeroglu et al (12), estimates were not significantly different among the four groups of raters with different levels of expertise according to the Graf method; the best interrater and intrarater reliability results (κ coefficient) were 0.36 ± 0.06 (standard deviation) and 0.62 ± 0.18, respectively.

Of the remaining seven studies, four (57%) contained inter- and intrarater errors for US methods, including means and standard deviations, and four (57%) did not include an evaluation of the reproducibility of methods. Dias et al (20) investigated the reliability of several US parameters and showed a wide range of intra- and interrater reliability, with κ coefficients ranging from 0.46 ± 0.24 to 0.68 ± 0.19 and from 0.09 ± 0.38 to 0.27 ± 0.25, respectively. They also reported intraclass correlation coefficients for α and β angles, which were better for intrarater reliability (0.69 and 0.78, respectively) than for interrater reliability (0.65 and 0.11, respectively).

Quality of Methods
Seven studies were assessed by using the QUADAS tool (Appendix E2, [http://radiology.rsnajnls.org/cgi/content/full/2413051358/DC1]). Approximately 45 minutes were required to evaluate each article with the QUADAS instrument. The QUADAS criteria were investigated after all studies had been evaluated with the STARD statement. Interrater agreement on the items of the QUADAS instrument was very good (κ coefficient, 0.87 [95% confidence interval: 0.78, 0.96]). Disagreements between the two reviewers could be resolved in all cases. Disagreements resulted from a vague description of key information regarding the sampling of the patients (item 1), the selection of patients who underwent verification with radiography (item 5), and the timing of US and radiography (item 7).

Overall, the quality of methods used in these studies was poor, although information on image acquisition (item 8) was reported consistently among all studies. Radiography was used as the reference standard in all studies (item 3) but was not consistently performed in all study subjects and did not result in the correct classification of the target condition in all studies.

In six studies, clinical data were available during the interpretation of the US examination because such data would also be available in clinical practice (item 12). In four studies (57%), the spectrum of patients was found to be representative of patients who would undergo the test in practice (item 1). In four studies (57%), it was clearly reported whether the whole sample or only a part of the sample underwent radiography (item 5). The major methodologic flaws concerned the selection criteria (item 2), the time period between US and radiography (item 4), missing information on radiograph acquisition (item 8), and the absence of reporting on potential problems with the interpretation of US findings (item 13) (Appendix E2, [http://radiology.rsnajnls.org/cgi/content/full/2413051358/DC1]).


    DISCUSSION
The use of US in the context of DDH is controversial. Reliability (1,21,22), as well as validity (23–25), has been called into question. The STARD statement was used to assess the quality of reporting. Complete and accurate reporting allows the reader to detect the potential for bias and to judge the generalizability of the results. QUADAS, which is used to assess the quality of methods, allows the reader to examine whether certain design features have potentially led to an underestimation or overestimation of accuracy. Neither the QUADAS nor the STARD method incorporates a total quality score. Quality scores are not useful in the systematic review of diagnostic studies because the association between quality score and diagnostic accuracy tends to be poor (10). We therefore elaborated on individual items of the STARD statement, as recommended by the STARD initiative (8).

Overall, the results of the present study indicate that diagnostic test standards for US of the hip in the context of DDH are poorly established. We identified only seven studies in which the authors addressed any aspect of validity for three different US methods and only four studies in which the authors assessed reliability. In none of the accuracy studies did the authors adequately report more than 40% of the STARD items. Also, the quality of methods in studies on the validity of US of the hip was poor. Only one (14%) of seven studies included information on more than 50% of the QUADAS items (14).

Studies on the validity of US of the hip had one positive commonality in that they provided an appropriate description of image acquisition. However, the formulation of the research question and the rationale for using radiography as a reference standard (criterion validity) were poor because the expected relationship between US and radiography (construct validity) was not formulated at all. Overall, the validity of single methods remains unclear. There was poor definition of diseased and nondiseased cases throughout the studies, and 2 × 2 tables were not reported. A consistent finding among all studies was that dislocation on US scans correlated well with radiographic results, as did normal hips. The area between these two extremes was considered dysplasia but was not well defined. As a result, measures of test accuracy, such as sensitivity and specificity, remained unclear. Interestingly, of the seven validation studies, only one (14%) was performed prospectively, which may partially explain the poor quality of methods for studies included in this review.

Better design and reporting were found in the reliability studies. In three studies, sound methods were applied to establish the reliability of US of the hip according to the Graf method (11,12). Reliability was found to be poor (20) to moderate (11,12), and further studies were recommended to improve it.

Reliability has been assessed for other US methods as well. However, substantial flaws in design and analysis were noted. There was a lack of information regarding the profession, training, and expertise of the raters; the time frame of the study; and the sample size calculation. Estimates of reproducibility were given as means and standard deviations for continuous variables, without accounting for chance agreement between the two ratings. Without the use of sound methods, the generalizability and interpretation of results remain unclear. For instance, Andersson (13) based his reliability assessment on only five cases per rater and recommended a "visual analysis with a more global approach" for US scans.

The STARD initiative recommended that authors of diagnostic studies should consistently use the medical subject heading term "sensitivity and specificity" to facilitate retrieval of their studies. We support this concern because our review confirmed that the exclusive use of these medical subject heading terms limits the search for studies (3,26). Only 47 (12%) of the 396 citations that were identified with our search strategy were obtained by using this medical subject heading term.

We agree with others (4,8) and recommend the use of flow diagrams to illustrate the methods used in diagnostic accuracy studies. In agreement with the findings of Smidt et al (4), the reviewers in the present study had to spend a considerable amount of time identifying the sampling frame, determining the sequence in which the tests were performed, and identifying the number of individuals who underwent these tests. Flow diagrams would have been useful to identify these issues more efficiently.

Several major methodologic flaws were identified by using the QUADAS instrument, including sampling bias, verification bias, and imperfect analysis, all of which were related to the use of inappropriate statistical methods. There was a tendency to overinterpret results. For example, the author of one study concluded that image acquisition and interpretation were easy to learn but did not provide any data to support this conclusion (13). In another study (16), normal and dislocated hips correlated well at US and radiography but dysplastic hips did not. Still, the authors concluded, "US was reliable in detecting hip pathology."

This study was not conducted to investigate the role of US for DDH screening. This issue falls within diagnostic yield studies, which were not the focus of the present systematic review. In a recently published systematic review, Woolacott et al (6) concluded that there is a lack of evidence either for or against US screening of newborns for DDH. Although we did not investigate the issue of screening, the results of our study correspond with those of Woolacott et al, which is plausible because diagnostic accuracy is a basic standard and prerequisite of any diagnostic test. If basic standards are not met, then the application of the test for screening purposes will likely demonstrate poor results as well.

A limitation of our study was the difficulty in identifying diagnostic accuracy studies of US methods for examination of the hip. We used a validated search strategy (3), which has a sensitivity of 80% and a specificity of 97%. Thus, we may have missed studies. However, we can be certain that for the three methods of US examination of the hip that were included (14,16,27), all of the eligible studies were included.

The results of our study raise interesting issues. Although basic diagnostic test standards are not met by commonly used methods of US of the hip, these methods have become standard in clinical care and have even been applied in population screening programs (28). Regarding US for the diagnosis of DDH, there is poor evidence for the diagnostic accuracy and benefit of the diagnostic test in terms of outcome (6).

Considering that there has been a shift toward evidence-based and cost-effective health care and that clinicians are ordering many more diagnostic tests now than in the past, physicians may be required to more rigorously justify the benefit of tests for individual patients. This may involve justifying how the test result may change clinical decision making, how it may improve the likelihood of a correct diagnosis, and how it may improve clinical outcomes.

An overuse of diagnostic investigations in general has been suggested (2,29), and a more efficient use was recommended. Thus, on the basis of the results of our study we see a clear need for further investigation of the diagnostic accuracy of US of the hip. Establishing sufficient diagnostic accuracy is the prerequisite for diagnostic yield studies, such as studies on the role of US for DDH screening.

We suggest that three main elements should be considered in future studies. First, a clear description of the participants and the sampling frame is essential. The lack of information on demographic characteristics and inclusion and exclusion criteria compromises the interpretation of the results.

Second, radiography seems to be an adequate reference standard. However, the rationale for choosing the reference standard has to be stated, as well as the kind of relationship the investigators expect a priori between US and the reference standard.

Third, the test results should be reported with regard to the a priori assumptions. Cutoff values to distinguish between diseased and nondiseased cases are essential to cross-tabulate the results and to calculate estimates of accuracy with confidence limits, where applicable.
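The cross-tabulation and accuracy estimates described in this third recommendation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a method taken from any of the reviewed studies. The function names (`cross_tabulate`, `wilson_interval`, `accuracy_from_2x2`), the choice of the Wilson score interval for the confidence limits, and the example counts are all hypothetical. The sketch assumes that each hip has already been classified as positive or negative on US (index test) and on radiography (reference standard) by using cutoff values defined a priori.

```python
import math


def cross_tabulate(index_positive, reference_positive):
    """Count (TP, FP, FN, TN) from paired boolean test results."""
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_positive, reference_positive):
        if idx and ref:
            tp += 1
        elif idx:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (approximate 95% limits)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return sensitivity and specificity, each as (estimate, lower, upper)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, *wilson_interval(tp, tp + fn)),
        "specificity": (specificity, *wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts: 10 hips abnormal and 20 hips normal at radiography.
    for name, (est, lo, hi) in accuracy_from_2x2(tp=8, fp=2, fn=2, tn=18).items():
        print(f"{name}: {est:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```

Reporting the four cell counts alongside the estimates allows readers to recompute sensitivity, specificity, and their confidence limits, which was not possible for the studies included in this review.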

We found the STARD criteria useful in the context of this systematic review. They provided a good framework to assess studies on diagnostic accuracy. We recommend use of the STARD criteria not only for evaluating but also for preparing and drafting studies on diagnostic accuracy.


    ADVANCE IN KNOWLEDGE


    FOOTNOTES
 

Abbreviations: DDH = developmental dysplasia of the hip • QUADAS = Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews • STARD = Standards for Reporting of Diagnostic Accuracy

Authors stated no financial relationship to disclose.

Author contributions: Guarantor of integrity of entire study, A.R.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, all authors; experimental studies, A.R., A.S.D.; statistical analysis, A.R.; and manuscript editing, all authors


    References

  1. Herring JA. Developmental dysplasia of the hip. In: Herring JA, ed. Tachdjian's pediatric orthopaedics. Philadelphia, Pa: Saunders, 2002; 527–530.
  2. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–1066.[Abstract/Free Full Text]
  3. Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65–69.[CrossRef][Medline]
  4. Smidt N, Rutjes AW, van der Windt DA, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005;235:347–353.[Abstract/Free Full Text]
  5. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995;274:645–651.[Abstract]
  6. Woolacott NF, Puhan MA, Steurer J, Kleijnen J. Ultrasonography in screening for developmental dysplasia of the hip in newborns: systematic review. BMJ 2005;330:1413–1418.[Abstract/Free Full Text]
  7. Feinstein A. Clinimetrics. New Haven, Conn: Yale University Press, 1987.
  8. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.[Abstract/Free Full Text]
  9. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25–37.[CrossRef][Medline]
  10. Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol 2005;5:19–28.[CrossRef][Medline]
  11. Bar-On E, Meyer S, Harari G, Porat S. Ultrasonography of the hip in developmental hip dysplasia. J Bone Joint Surg Br 1998;80:321–324.
  12. Ömeroglu H, Bicimoglu A, Koparal S, Seber S. Assessment of variations in the measurement of hip ultrasonography by the Graf method in developmental dysplasia of the hip. J Pediatr Orthop B 2001;10:89–95.
  13. Andersson JE. Neonatal hip instability: normal values for physiological movement of the femoral head determined by an anterior-dynamic ultrasound method. J Pediatr Orthop 1995;15:736–740.
  14. Morin C, Harcke HT, MacEwen GD. The infant hip: real-time US assessment of acetabular development. Radiology 1985;157:673–677.
  15. Morin C, Zouaoui S, Delvalle-Fayada A, Delforge PM, Leclet H. Ultrasound assessment of the acetabulum in the infant hip. Acta Orthop Belg 1999;65:261–265.
  16. Terjesen T, Runden TO, Tangerud A. Ultrasonography and radiography of the hip in infants. Acta Orthop Scand 1989;60:651–660.
  17. Terjesen T. Ultrasound as the primary imaging method in the diagnosis of hip dysplasia in children aged <2 years. J Pediatr Orthop B 1996;5:123–128.
  18. Zieger M. Ultrasound of the infant hip. II. Validity of the method. Pediatr Radiol 1986;16:488–492.
  19. Boal DK, Schwentker EP. The infant hip: assessment with real-time US. Radiology 1985;157:667–672.
  20. Dias JJ, Thomas IH, Lamont AC, Mody BS, Thompson JR. The reliability of ultrasonographic assessment of neonatal hips. J Bone Joint Surg Br 1993;75:479–482.
  21. Exner GU. Ultrasound screening for hip dysplasia in neonates. J Pediatr Orthop 1988;8:656–660.
  22. Weinstein SL, Mubarak SJ, Wenger DR. Developmental hip dysplasia and dislocation. I. Instr Course Lect 2004;53:523–530.
  23. Sucato DJ, Johnston CE, Birch JG, Herring JA, Mack P. Outcome of ultrasonographic hip abnormalities in clinically stable hips. J Pediatr Orthop 1999;19:754–759.
  24. Engesaeter LB, Wilson DJ, Nag D, Benson MK. Ultrasound and congenital dislocation of the hip: the importance of dynamic assessment. J Bone Joint Surg Br 1990;72:197–201.
  25. Castelein RM, Sauter AJ. Ultrasound screening for congenital dysplasia of the hip in newborns: its value. J Pediatr Orthop 1988;8:666–670.
  26. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ 1994;309:1286–1291.
  27. Graf R. Fundamentals of sonographic diagnosis of infant hip dysplasia. J Pediatr Orthop 1984;4:735–740.
  28. Roposch A. Twenty years of hip ultrasonography: are we doing better today? J Pediatr Orthop 2003;23:691–692.
  29. Winkens R, Dinant GJ. Evidence base of clinical diagnosis: rational, cost effective use of investigations in clinical practice. BMJ 2002;324:783–785.




