Musculoskeletal Imaging |
1 From Depts of Radiology (F.J.G., M.G.C.G.), Health Services Research Unit (A.M.G., M.G.C.G., L.D.V., M.K.C.), Health Economics Research Unit (L.D.V.), Public Health (N.W.S.), and Orthopaedic Surgery (D.J.K., D.W.), University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, Scotland. Members of the Scottish Back Trial Group and their affiliations are listed at the end of this article. Received Jun 12, 2003; revision requested Aug 13; final revision received Oct 31; accepted Nov 25. Supported by NHS Research & Development Health Technology Assessment Programme. Health Services Research Unit and Health Economics Research Unit supported by Chief Scientist Office, Scottish Executive Health Dept. Address correspondence to F.J.G. (e-mail: f.j.gilbert@abdn.ac.uk).
| ABSTRACT |
|---|
MATERIALS AND METHODS: In a multicenter randomized study, two imaging policies for LBP were compared in 782 participants with symptomatic lumbar spine disorders who were referred to orthopedists or neurosurgeons. Participants were randomly allocated to early (393 participants; mean age, 43.9 years; range, 16–82 years) or delayed selective (389 participants; mean age, 42.8 years; range, 14–82 years) imaging groups. Delayed selective imaging referred to imaging restricted to patients in whom a clear clinical need subsequently developed. Main outcome measures were Aberdeen Low Back Pain (ALBP) score, Short Form 36 (SF-36) score (for multidimensional health status), EuroQol (EQ-5D) score (for quality-adjusted life-year [QALY] estimates), and healthcare resource use at 8 and 24 months after randomization. Data were evaluated with analysis of covariance, ordinal logistic regression analysis, and χ² and Mann-Whitney tests.
RESULTS: Both groups showed improvement in ALBP score, but this was greater in the early group (adjusted mean difference between groups, −3.05 points [95% CI: −5.16, −0.95; P = .005] and −3.62 points [95% CI: −5.92, −1.32; P = .002] at 8 and 24 months, respectively). Scores for SF-36 (bodily pain domain) and EQ-5D were also significantly better at 24 months. Clinical treatment was similar in both groups. Differences in total costs reflected cost of imaging. Imaging provided an adjusted mean additional QALY of 0.041 during 24 months at a mean incremental cost per QALY of $2,124.
CONCLUSION: Early use of imaging does not appear to affect treatment overall. Decisions about the use of imaging depend on judgments concerning whether the small observed improvement in outcome justifies additional cost.
© RSNA, 2004
Index terms: Cost-effectiveness; Efficacy study; Spine, CT, 33.1211; Spine, MR, 33.1214
| INTRODUCTION |
|---|
It is unclear which of the diagnostic imaging pathways is most effective and cost-effective and how imaging affects patient treatment. Thus, in the mid-1990s the place of sophisticated imaging in the treatment of LBP was identified as a research priority by the UK National Health Service (NHS) Research and Development Health Technology Assessment Programme (6).
The purpose of our study was to establish whether early use of magnetic resonance (MR) imaging or computed tomography (CT) influences treatment and outcome of patients with LBP and whether it is cost-effective. A pragmatic (7) multicenter study design was used to evaluate two imaging policies for LBP as might be applied in routine clinical practice settings in the UK NHS. Early imaging (implying liberal use of imaging) was compared with delayed selective imaging (implying use restricted to patients in whom a clear clinical need subsequently developed).
| MATERIALS AND METHODS |
|---|
All new patients who had symptomatic lumbar spine disorders at presentation were eligible for the trial if there was clinical uncertainty about the need for imaging (MR imaging or CT). Patients were excluded who required immediate referral for imaging (eg, those who had signs suggestive of serious abnormalities or disease ["red flags"] or who required surgical intervention), who had undergone MR imaging or CT of the spine in the previous 12 months, who did not need imaging (eg, those who were discharged to primary care), and who had pain of a nonspinal origin.
Principal Outcome Measures
A variety of outcome measures are now available to measure health outcomes for patients with LBP, and these include generic instruments that provide a summary of overall health and more sensitive condition- or disease-specific instruments that focus on LBP (8,9). The primary outcome measure was the Aberdeen Low Back Pain (ALBP) score. This condition-specific questionnaire allows assessment of LBP across several dimensions, including pain, physical impairment, and functional disability. It has been rigorously validated and shown to be more responsive to clinical change than is the Short Form 36 (SF-36) general health status questionnaire (10,11). Responses to the 19-item questionnaire are summed and converted to a percentage score (scores range from 0 for least disabled to 100 for most disabled). Other principal outcome measures were the SF-36, a generic instrument that is widely used and has been shown to be a reliable and valid instrument for the assessment of functional status (12), and the EuroQol (EQ-5D) (13), a generic instrument for measurement of health status that was specifically developed for the derivation of quality-adjusted life-years (QALYs). The SF-36 has eight subscales (physical and social functioning, physical and emotional role limitation, mental health, vitality, bodily pain, general health) that are each scored from 0 (poorest health) to 100 (best health). The EQ-5D includes five items (mobility, self-care, usual activities, pain or discomfort, and anxiety and depression) from which a single utility score may be derived (scores range from −0.59 for worst possible health state to +1 for best possible health state). In addition, measures of health service and participant resource use were evaluated.
Data Collection
After we obtained informed consent, research nurses collected baseline clinical and demographic details, and participants completed health status questionnaires prior to random assignment to groups. At 8 and 24 months, health status measures and information about primary care consultation, purchase of prescription and nonprescription medicines, non-NHS treatment (eg, private physiotherapy and osteopathy), and time off work and discontinuation or interruption of usual activities because of LBP were collected with postal self-completion questionnaires. Telephone calls and reminder letters were used to increase the return of questionnaires.
The validity of information in regard to NHS resource use obtained from questionnaires was assessed by comparing patient recall with data abstracted from case notes for the first 100 patients recruited to the trial. The results of this comparison showed that although data may have been available from fewer people, questionnaires provided data that were broadly similar to information in hospital notes but covered items that were not recorded in hospital case notes, such as primary care physician consultations. Data about description of events that were available in case notes (eg, epidural injections) during the 24 months after trial entry were collected retrospectively from hospital case notes by trained researchers. These researchers were blinded to the original randomization allocation. Data were supplemented with information from the postal questionnaires. One researcher was responsible for data abstraction from all but one of the centers, and this researcher cross-checked the validity of the data abstracted by the researcher at the remaining center.
Randomization
Throughout the trial, we used a system of "distant" randomization with which we ensured complete separation of the randomization process from those providing care. For the first 66 (8%) patients, a system of simple randomization was used, and this was superseded by our automated system in which we incorporated minimization to ensure a balance in key prognostic variables. Assignment to the early imaging group (in which MR imaging or CT was performed as soon as was practicable) or the delayed selective imaging group (in which no MR imaging or CT was performed unless a clear clinical indication subsequently developed [eg, a decision to perform surgery]) was accomplished with the method of minimization (14). Hereafter, the "early imaging" and the "delayed selective imaging" groups will be termed "early" and "delayed selective" groups, respectively. Groups were stratified according to individual clinician with balancing for age (four age bands: <21, 21–40, 41–60, >60 years), sex, and diagnostic category. On the basis of clinical criteria, patients were assigned to one of five diagnostic categories: (a) symptomatic lumbar disk herniation, (b) root entrapment secondary to degenerative disease, (c) neurogenic claudication, (d) chronic LBP not caused by the previous three conditions, and (e) other LBP (spondylosis, spondylolisthesis, sacroiliitis, pathologic fracture, and osteoporosis).
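The allocation procedure described above can be illustrated with a minimal Python sketch of two-arm minimization with a random element. It is not the trial's software: the class name `Minimizer`, the helper `age_band`, and the probability `p_favoured` (set here to 0.8) are assumptions made for illustration, and the clinician is treated simply as one more balancing factor, which may differ from the stratification actually used.

```python
import random
from collections import defaultdict

ARMS = ("early", "delayed selective")


def age_band(age):
    """Map age in years to the four age bands named in the text."""
    if age < 21:
        return "<21"
    if age <= 40:
        return "21-40"
    if age <= 60:
        return "41-60"
    return ">60"


class Minimizer:
    """Two-arm minimization with a random element (illustrative only)."""

    def __init__(self, p_favoured=0.8, seed=None):
        # p_favoured is an assumed value; the article does not report one.
        self.p_favoured = p_favoured
        self.rng = random.Random(seed)
        # counts[(factor, level)][arm] = participants already allocated
        self.counts = defaultdict(lambda: {arm: 0 for arm in ARMS})

    def imbalance_if(self, arm, patient):
        """Total between-arm count gap across factors if `arm` were chosen."""
        total = 0
        for factor, level in patient.items():
            trial_counts = dict(self.counts[(factor, level)])
            trial_counts[arm] += 1
            total += abs(trial_counts[ARMS[0]] - trial_counts[ARMS[1]])
        return total

    def allocate(self, patient):
        """Allocate one patient (a dict of factor -> level) and update counts."""
        scores = {arm: self.imbalance_if(arm, patient) for arm in ARMS}
        if scores[ARMS[0]] == scores[ARMS[1]]:
            arm = self.rng.choice(ARMS)  # tie: pure chance
        else:
            favoured = min(ARMS, key=scores.get)
            other = ARMS[1] if favoured == ARMS[0] else ARMS[0]
            # Random element: the favoured arm is likely, never certain.
            arm = favoured if self.rng.random() < self.p_favoured else other
        for factor, level in patient.items():
            self.counts[(factor, level)][arm] += 1
        return arm
```

A patient would be passed as, for example, `{"clinician": "A", "age_band": age_band(43), "sex": "F", "diagnosis": "disk herniation"}`. Keeping `p_favoured` below 1 means that the next allocation cannot be predicted with certainty from the running totals.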
Imaging modality and patient treatment plan were chosen at the discretion of the referring clinician.
Sample Size
The original aim was to recruit 1,200 participants to provide 90% power to identify a difference of 3.0 points in the ALBP score and 80% power to detect a difference of 2.5 points (P < .05, two-tailed test). Following lower than expected recruitment and reconsideration of the size of difference that would be clinically significant, the recruitment number was revised to 800 at the time of the data monitoring committee meeting. This provided 90% power to detect a difference of 3.7 points in the ALBP score and 80% power to detect a difference of 3.2 points (P < .05, two-tailed test).
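As a rough check on how the detectable difference changes with the recruitment target, the following Python sketch uses the usual normal approximation for comparing two equal-sized groups. The function names and the standard deviation argument `sd` are assumptions; the article does not report the standard deviation used in its power calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_sum(power, alpha=0.05):
    """z for a two-tailed alpha plus z for the requested power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def detectable_difference(sd, n_total, power, alpha=0.05):
    """Smallest detectable mean difference for two equal groups."""
    n_per_group = n_total / 2
    return z_sum(power, alpha) * sd * sqrt(2 / n_per_group)


def rescale_detectable_difference(delta, n_old, n_new):
    """Detectable difference scales with 1/sqrt(n) at fixed power and SD."""
    return delta * sqrt(n_old / n_new)


# 3.0 points at 1,200 participants rescales to roughly 3.7 at 800.
print(round(rescale_detectable_difference(3.0, 1200, 800), 1))
```

Under this approximation, reducing the target from 1,200 to 800 participants inflates the detectable difference by a factor of √1.5, which is in line with the change from 3.0 to 3.7 points at 90% power.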
Statistical Analysis of Noneconomic Data
Categoric variables were analyzed with the χ² test, and continuous outcomes were evaluated with the Mann-Whitney test (where data were not normally distributed). For the ALBP score, the EQ-5D score, and scores for six of the subscales of SF-36, the primary method was analysis of covariance with adjustment for the factors used in the minimization method (age, sex, diagnostic category, and clinician) and the score at baseline. Except for the ordinal analyses, we also adjusted for the consultant and treated that adjustment as a random factor in the model. The data are reported as the adjusted mean scores and differences in means with 95% CIs. The three subscales of SF-36 with six or fewer possible responses were treated as ordinal outcomes, and ordinal logistic regression analysis with adjustment for the minimization factors and the score at baseline was used. All data were analyzed on an intention-to-treat basis. Preplanned secondary analyses were stratified according to diagnostic category, buttock or leg pain, duration of current episode, and trial center.
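The idea of a baseline-adjusted difference in means can be sketched in Python for the simplest case of a single covariate (the baseline score). This is a deliberately reduced illustration: the function name is hypothetical, and the trial's model also adjusted for age, sex, diagnostic category, and clinician (the last as a random factor), none of which is reproduced here.

```python
from statistics import mean


def ancova_adjusted_difference(base_a, follow_a, base_b, follow_b):
    """Follow-up mean difference (a minus b) adjusted for baseline score.

    Uses the pooled within-group regression slope of follow-up on
    baseline, which is the one-covariate analysis of covariance estimate.
    """
    def centred_sums(x, y):
        mx, my = mean(x), mean(y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx = sum((xi - mx) ** 2 for xi in x)
        return sxy, sxx

    sxy_a, sxx_a = centred_sums(base_a, follow_a)
    sxy_b, sxx_b = centred_sums(base_b, follow_b)
    slope = (sxy_a + sxy_b) / (sxx_a + sxx_b)  # pooled within-group slope

    raw_difference = mean(follow_a) - mean(follow_b)
    baseline_gap = mean(base_a) - mean(base_b)
    return raw_difference - slope * baseline_gap
```

The adjustment removes the part of the raw difference that is explained by one group starting with better baseline scores, which is why a baseline imbalance between groups does not by itself produce an adjusted difference.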
Economic Data and Statistical Analysis
Derivation of NHS costs. Total average costs for the two trial policies were derived from estimated costs that were based on changes in treatment. The areas of treatment considered were related to hospital-based services (outpatient consultation; imaging; physiotherapy; hospital admission; surgery; injection; provision of back supports, corsets, or braces), primary care services (general practitioner visits, use of prescription and nonprescription medicines), and other tests (blood and urine tests) and devices. Unit costs were obtained from published sources (15–17) or from a costing exercise conducted as part of the trial. Where existing sources of cost data did not provide sufficient detail to allow an event, such as a surgical operation, to be costed, a "bottom-up" costing exercise was conducted in the six hospitals from which the majority of participants were recruited. Bottom-up costing involved identification of the staff, materials (both disposable and reusable), and relevant overhead costs (heat, power for imaging units, light, building costs) required to provide a procedure or test.
Staff costs were based on national salary scales plus additions for national insurance and superannuation (method of contribution to pension scheme). Missing questionnaire data were imputed primarily by using mean imputation, as the quantity of missing data for any given variable was small. Regression imputation was used for two variables identified as heavy cost drivers (number of primary care physician visits and of physiotherapy sessions). The cost per patient was the cost per event multiplied by the number of events. The UK treasury discount rate of 6% was applied to costs incurred in the 2nd year of follow-up and the amortized costs of reusable equipment. A cost per patient was estimated by dividing the equivalent annual cost by the number of people who would be expected to use the equipment in a year. Costs were derived in 2001–2002 UK pounds and converted into U.S. dollars with the exchange rate of U.S. $1.44 to UK £1.
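The per-patient costing arithmetic can be sketched in Python as follows. The discount rate and exchange rate are taken from the text; the function names, the data layout, and any unit costs passed in are hypothetical and are not the trial's actual figures.

```python
GBP_TO_USD = 1.44      # exchange rate stated in the text
DISCOUNT_RATE = 0.06   # UK treasury discount rate stated in the text


def patient_cost_gbp(events_by_year, unit_costs):
    """Cost per patient: unit cost x number of events, discounted by year.

    events_by_year: list of {item: count} dicts, index 0 = first year.
    unit_costs: {item: cost in GBP} (hypothetical values in this sketch).
    """
    total = 0.0
    for year_index, events in enumerate(events_by_year):
        # First-year costs are undiscounted; second-year costs use 1/1.06.
        factor = 1 / (1 + DISCOUNT_RATE) ** year_index
        for item, count in events.items():
            total += factor * unit_costs[item] * count
    return total


def to_usd(cost_gbp):
    """Convert UK pounds to U.S. dollars at the stated rate."""
    return cost_gbp * GBP_TO_USD


# The conversion at 1.44 gives about 720 dollars for 500 pounds.
print(round(to_usd(500)))
```

This sketch covers only event-based costs; the equivalent annual cost of reusable equipment described above would need a separate calculation.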
Derivation of QALYs. The QALYs were estimated with the standard EQ-5D UK tariff. The response health state tariff was calculated for each participant by using standard syntax (SPSS; EuroQol, York, England). Adjusted mean additional QALYs were calculated by using analysis of covariance and by adjusting for the factors used in minimization and determination of baseline EQ-5D scores (18).
Assessment of cost using utility. With established economic methods, the incremental cost per QALY was estimated from the mean costs and effectiveness differences between the groups (19). Distribution-free methods were used for statistical inference with respect to costs, QALYs, and cost per QALY because of skewed distributions (20). This approach incorporates the uncertainty surrounding estimates of cost per patient caused by differences in health care resource use and the uncertainty surrounding QALY estimates. Bootstrap bias-corrected methods were used to estimate CIs for the difference in QALYs and the incremental cost per QALY.
The estimates of cost per QALY derived with the bootstrap method are presented in terms of a cost-effectiveness acceptability curve. The cost-effectiveness acceptability curve allows a decision-maker to consider whether the intervention is cost-effective in relation to some value he or she thinks is the maximum cost worth paying for a QALY. At each ceiling value for the willingness of society to pay for a QALY, the cost-effectiveness acceptability curve shows the probability that the treatment would be cost-effective.
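The relationship between the incremental cost per QALY, bootstrap replicates, and the cost-effectiveness acceptability curve can be illustrated with a short Python sketch. It uses a plain nonparametric bootstrap rather than the bias-corrected variant described above, and all function names and the data layout (one `(cost, qaly)` pair per participant) are assumptions made for illustration.

```python
import random
from statistics import mean


def icer(delta_cost, delta_qaly):
    """Incremental cost per QALY: cost difference / QALY difference."""
    return delta_cost / delta_qaly


def bootstrap_differences(early, delayed, n_boot=2000, seed=0):
    """Resample each group with replacement.

    early, delayed: lists of (cost, qaly) pairs, one per participant.
    Returns a list of (delta_cost, delta_qaly) replicates, early minus delayed.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        e = rng.choices(early, k=len(early))
        d = rng.choices(delayed, k=len(delayed))
        delta_cost = mean(c for c, q in e) - mean(c for c, q in d)
        delta_qaly = mean(q for c, q in e) - mean(q for c, q in d)
        replicates.append((delta_cost, delta_qaly))
    return replicates


def ceac(replicates, ceilings):
    """Share of replicates with positive net monetary benefit per ceiling."""
    return {
        ceiling: sum(ceiling * dq - dc > 0 for dc, dq in replicates)
        / len(replicates)
        for ceiling in ceilings
    }
```

Working on net monetary benefit (ceiling × QALY difference − cost difference) avoids the sign ambiguity that arises when ratios are computed from replicates with negative QALY differences. As a consistency check on the reported figures, an additional 0.041 QALY at $2,124 per QALY corresponds to an incremental cost of roughly $87 per participant; this value is back-calculated here and is not stated in the text.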
Sensitivity analysis. The main determinant of the differences in cost between early and delayed selective imaging was the cost of an image obtained with an MR imaging unit or CT scanner. The cost of imaging varied between the average observed cost of $129 and $720 or £500, which represents the upper estimate of the average UK cost of an MR image when all building and overhead costs are included. For illustrative purposes, the analysis was also repeated for an average cost of $432 or £300.
| RESULTS |
|---|
For the SF-36 score, the clearest difference was in the bodily pain subscale score, for which the adjusted mean difference was 4.54 (95% CI: 1.23, 7.86; P = .007) at 8 months and 5.14 (95% CI: 1.61, 8.67; P = .004) at 24 months. For other subscales, the adjusted scores were generally better in the early group, and the differences were statistically significant at 8 months for social functioning, vitality, and reported health transition.
The EQ-5D score also showed improvement in the early group, compared with the delayed selective group, significantly so at 24 months (adjusted difference, 0.057; 95% CI: 0.013, 0.101; P = .01 [Table 2]). Secondary analyses stratified according to diagnostic category, the presence of buttock or leg pain at trial entry, duration of current episode, or the clinical center for recruitment of patients showed adjusted differences in means for the ALBP score at 24 months that were all statistically compatible with the overall result (Fig 2).
| DISCUSSION |
|---|
There were no significant differences between the two groups in overall clinical treatment, except for the use of imaging and the timing of outpatient appointments. Also, the pattern of changes in ALBP scores was more consistent with small improvements in a large number of participants than with a large improvement in a few participants. These findings suggest that imaging may have a small direct effect, perhaps through reassurance, rather than an indirect effect, through changes in treatment in a minority of patients. This observation is consistent with the findings of a study (21) about the influence of imaging on clinical decision making that showed greater clinician confidence in the diagnosis in the early group; this confidence might have been transferred to the patient. Improvement in patient psychologic well-being and satisfaction has been reported in two small trials with radiography for patients with LBP, but no differences in patient outcome were observed (22,23). In addition, investigators in a study (24) of a randomized comparison of radiography and MR imaging for patients with LBP reported similar patient outcomes in both groups. Results of that study indicated that there was no measurable improvement in functional status or health-related quality of life; however, the MR imaging group had greater self-reported reassurance and satisfaction.
In this study, participants were allocated to groups by using the method of minimization. Although this is essentially a nonrandom method, we included a random element in the allocation process to ensure that those who were recruiting participants could never have predicted the allocation with certainty. We also adjusted for the minimization factors in the analysis, as failure to do so could have resulted in overconservative results (14). Although the groups were similar in regard to the minimization factors of age, sex, specialist, and diagnostic category, there were more participants with a shorter duration of episode in the early group. Furthermore, there were also differences in the baseline score of all three outcome measures, with the early group reporting better scores. Nevertheless, the differences in outcome persisted after adjustment for baseline differences. In the interpretation of results, one should also consider possible imprecision in the estimates. On the basis of the 95% CIs, the true difference (after baseline adjustment) in ALBP score at 24 months is likely to be between 1.32 and 5.92. The decision about whether or not any effect is clinically and economically important may be substantially influenced by whether the true effect is at the upper or lower end of this range.
Although we did not expect any effects of early imaging to be large because of the chain of events between imaging and the time of follow-up assessment of health status (25), the clinical relevance of the small observed change that benefited the early group is difficult to assess. Garratt and colleagues (11) compared the responsiveness of health status questionnaires and reported a change of approximately 7.5 points in the ALBP score for patients in the control group who said they were better at 1 year. Similarly, these patients recorded a score change of 0.14 points with the EQ-5D score. In this context, our observed score difference of 3.62 points in the ALBP score and 0.057 in the EQ-5D score may not represent meaningful changes to patients. In a comparison of the estimates of effect derived from different instruments, such as the UK Back Pain Exercise and Manipulation Trial (Garratt AM, personal communication, 2001), patients who reported "no change," "slightly improved," or "much improved" had differences in ALBP scores between the ratings of 3.25 and 5.29.
With our own data, we compared the score for the reported health transition subscale of the SF-36 with the ALBP score at 24 months. The majority of patients in our study reported the status as "somewhat worse," "about the same," and "somewhat better," and the differences in ALBP scores between these strata were 4.24 and 10.28. However, it should be noted that the reported health transition subscale of the SF-36 relates to general health and that patient responses to the question for this subscale will be influenced by changes in not only back pain but also other health problems. Results of a comparison of SF-36 score changes with subjective reports of improvement in a group of patients with sciatica suggested that a seven-point difference in the SF-36 physical functioning and bodily pain subscales was consistent with a clinically important difference (26). Although this finding suggests that our observed difference of 4.54 points (95% CI: 1.23, 7.86) in the bodily pain subscale score may not be clinically important, caution is required in generalizing results to all categories of LBP for which the prognosis and natural history may differ (27).
Since the patients in this study had baseline scores for SF-36 and EQ-5D that were lower (ie, poorer health) than scores reported in two studies (22,23) of patients with LBP referred by primary care physicians for plain radiography, our data may only be generalizable to the patients with more severe disability who are referred to secondary care specialists.
In the secondary stratified analyses, estimates of effect were similar in the various clinical categories and consistent with the overall trial results, and there was no clear evidence of a larger or smaller effect in any subgroup of participants. In relation to duration of symptoms, the pattern of changes in outcome scores did not follow the expectation that the improvement would be greatest among those with pain for less than 3 months. Instead, the largest difference in the mean score was in the group of participants who had pain between 3 and 12 months.
Although there was a statistically significant difference in EQ-5D scores at 24 months, this did not equate to a difference in QALYs that was statistically significant at 24 months. The reason for this is that the QALY represents cumulative quality-adjusted survival at 24 months and as such incorporates the 8-month EQ-5D data. Nevertheless, the evidence suggests that there is more than a 95% probability that early imaging is associated with greater QALYs at 24-month follow-up. Furthermore, the economic evaluation indicated that there is a 9% probability that imaging is both less costly and more effective and approximately a 95% chance that the incremental cost per QALY is less than $50,000. These estimates are, however, sensitive to changes in the cost of imaging, the main cost driver.
With respect to study limitations, there was a baseline imbalance in the ALBP, SF-36, and EQ-5D scores, with the early group reporting better health status. In addition, at presentation, the episode of back pain tended to be longer in the delayed selective group. Furthermore, patients with a shorter duration of back pain may be expected to have better outcome (28,29). Nevertheless, differences in outcome persisted after adjustment for baseline values, and the values included in the study were obtained after adjustment. At trial entry, it was intended that patients complete the baseline questionnaire prior to notification of their randomization allocation. However, if patients were notified of their allocation while they were in the process of completing the questionnaire, the knowledge of being referred for imaging might have affected their self-reporting of health status (30).
Since baseline scores for the SF-36 and EQ-5D were lower (ie, they indicated poorer health) than scores reported in studies (22–24) involving patients with LBP in primary care settings, our data may only be generalizable to patients with more severe disability who are referred to secondary care specialists. Furthermore, differences in waiting times and access to secondary care and imaging services could influence the generalizability of the study results. In the economic evaluation, biases may have arisen in the participants' recall of health service use and in regard to missing data. However, the concordance between the alternative data sources was high; because of the efforts made to ensure that questionnaires were returned and to assess all hospital records, the total quantity of missing data was relatively low.
Decisions about the use of sophisticated imaging will depend to an important extent on judgments about the value of the observed differences in outcome and whether they are worth the extra costs of early imaging. The use of MR imaging does not appear to affect treatment overall, and the small observed improvement in outcome is of questionable clinical importance. Although some researchers may argue that any improvement is worthwhile, given that the other costs of treatment do not appear to be increased, others may say that the cost of providing a small improvement in patients' overall well-being is not justifiable, especially when there are competing demands for MR imaging resources.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
The views expressed in this article are those of the Scottish Back Trial Group.
Abbreviations: ALBP = Aberdeen Low Back Pain, EQ-5D = EuroQol, LBP = low back pain, NHS = National Health Service, QALY = quality-adjusted life-year, SF-36 = Short Form 36
Author contributions: Guarantor of integrity of entire study, F.J.G.; study concepts, F.J.G., A.M.G., D.W.; study design, F.J.G., A.M.G., M.G.C.G., M.K.C., D.W., D.J.K.; literature research, F.J.G., M.G.C.G.; clinical studies, F.J.G., D.W., D.J.K.; data acquisition, M.G.C.G., N.W.S., L.D.V.; data analysis/interpretation, L.D.V., N.W.S.; statistical analysis, N.W.S., L.D.V.; manuscript preparation, M.G.C.G., L.D.V., F.J.G., A.M.G., N.W.S., D.W., D.J.K.; manuscript definition of intellectual content, M.G.C.G., L.D.V., F.J.G., A.M.G.; manuscript editing, F.J.G., A.M.G., M.K.C.; manuscript revision/review, F.J.G., A.M.G., L.D.V., N.W.S., M.K.C.; manuscript final version approval, all authors
| REFERENCES |
|---|