Thoracic Imaging
1 From the Institute for Technology Assessment, Massachusetts General Hospital, 101 Merrimac St, 10th Floor, Boston, MA 02114 (P.M.M., C.Y.K., G.S.G.); Departments of Radiology (P.M.M., C.Y.K., J.O.S., G.S.G.) and Medicine (B.E.J., M.C.W., J.C.W.), Harvard Medical School, Boston, Mass; Lowe Center for Thoracic Oncology (B.E.J.) and Department of Medical Oncology/Population Sciences (J.C.W.), Dana-Farber Cancer Institute, Boston, Mass; Department of Health Policy and Management, Harvard School of Public Health, Boston, Mass (M.C.W., K.M.K., G.S.G.); Department of Health Policy and Management, School of Public Health, University of Minnesota, Minneapolis, Minn (K.M.K.); and Department of Radiology, Mayo Clinic, Rochester, Minn (S.J.S.). Received August 13, 2007; revision requested October 10; revision received November 9; accepted January 2, 2008; final version accepted January 30. Supported in part by the National Cancer Institute (R01 CA97337, G.S.G.; K99 CA126147, P.M.M.). Address correspondence to P.M.M. (e-mail: pamela{at}mgh-ita.org).
| ABSTRACT |
|---|
Materials and Methods: The study was approved by institutional review boards and was HIPAA compliant. Deidentified individual-level data from participants (1520 current or former smokers aged 50–85 years) in the Mayo Clinic helical CT screening study were used to populate the Lung Cancer Policy Model, a comprehensive microsimulation model of lung cancer development, screening findings, treatment results, and long-term outcomes. The model predicted diagnosed cases of lung cancer and deaths per simulated study arm (five annual screening examinations vs no screening). Main outcome measures were predicted changes in lung cancer–specific and all-cause mortality as functions of follow-up time after simulated enrollment and randomization.
Results: At 6-year follow-up, the screening arm had an estimated 37% relative increase in lung cancer detection, compared with the control arm. At 15-year follow-up, five annual screening examinations yielded a 9% relative increase in lung cancer detection. The relative reduction in cumulative lung cancer–specific mortality from five annual screening examinations was 28% at 6-year follow-up (15% at 15 years). The relative reduction in cumulative all-cause mortality from five annual screening examinations was 4% at 6-year follow-up (2% at 15 years).
Conclusion: Screening may reduce lung cancer–specific mortality but may offer a smaller reduction in overall mortality because of increased competing mortality risks associated with smoking.
© RSNA, 2008
| INTRODUCTION |
|---|
Together, all of these factors suggest that lung cancer may be a good candidate for mass screening. Large randomized trials of screening with chest radiography, however, did not demonstrate a reduction in lung cancer mortality (10–13). Recent single-arm studies of lung cancer screening with CT have shown that screening helps detect more than twice as many early-stage lung cancers as would be expected without screening (9,13–17). Single-arm study designs that compare a screened population with an external unscreened population, however, cannot definitively demonstrate reductions in either lung cancer–specific or all-cause mortality rates. The possibility of observing higher interval (eg, 5-year) survival after diagnosis in the absence of a mortality reduction is explained by several well-known biases present in screening trial data: lead-time, length-time, and overdiagnosis biases (18–21). Because all three biases can contribute to longer survival of patients with screening-detected cancers, a control arm is critical for isolating any true effect of screening on mortality.
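Lead-time bias can be illustrated with a small worked example. The sketch below uses hypothetical ages (not data from any study) for a single individual whose age at death is the same with and without screening; only the age at diagnosis differs, yet the individual counts as a 5-year survivor only in the screened scenario.

```python
# Hypothetical ages for one individual; illustrative only, not study data.
AGE_AT_DEATH = 70.0              # assumed unchanged by screening
AGE_AT_SYMPTOM_DIAGNOSIS = 68.0  # diagnosis without screening
AGE_AT_SCREEN_DIAGNOSIS = 64.0   # earlier diagnosis with screening


def survival_after_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after diagnosis."""
    return age_at_death - age_at_diagnosis


def is_five_year_survivor(age_at_diagnosis, age_at_death):
    """True if the person lives at least 5 years after diagnosis."""
    return survival_after_diagnosis(age_at_diagnosis, age_at_death) >= 5.0


# Same age at death in both scenarios, different 5-year survival status.
unscreened = is_five_year_survivor(AGE_AT_SYMPTOM_DIAGNOSIS, AGE_AT_DEATH)
screened = is_five_year_survivor(AGE_AT_SCREEN_DIAGNOSIS, AGE_AT_DEATH)
```

Here survival after diagnosis rises from 2 to 6 years although mortality is unchanged, which is why survival gains in a single-arm study do not by themselves establish a mortality reduction.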
In conjunction with trials, simulation modeling may be used to integrate available data (22), to evaluate screening programs, and to inform those making screening decisions in the years prior to publication of long-term results from randomized trials. Investigators in two large ongoing randomized trials (23,24) will report short-term results on the effectiveness of lung cancer screening during the next few years, but publication of trial results does not always eliminate uncertainty about the effectiveness of cancer screening (13,25,26). By varying screening protocols, patient populations, or adherence rates, models can be used to interpret and reconcile apparently inconsistent trial results. Modeling also may be used to estimate sample sizes.
The objective of our evaluation was to use individual-level data provided from the single-arm study of helical CT screening at the Mayo Clinic (Rochester, Minn) (16,27) to estimate the long-term effectiveness of screening in Mayo study participants and to compare estimates from an existing lung cancer simulation model with estimates from a different modeling approach that used the same data (28). Individual-level data on smoking histories are essential for modeling the dynamic relationship between smoking behavior and lung cancer risk because summary tables of smoking history and the common pack-year metric collapse critical information about timing and dose.
| MATERIALS AND METHODS |
|---|
Lung Cancer Policy Model
The Lung Cancer Policy Model (LCPM) is a comprehensive microsimulation model of lung cancer development, disease progression, lung cancer detection, treatment results, and survival (29). The LCPM addresses three important limitations of published models used to evaluate lung cancer screening (30–36). First, the LCPM simulates survival after screening detection as a function of "true" (known in the model) disease characteristics (eg, stage and growth rates). The explicit modeling of survival avoids the problematic assumption—inherent in stage-shift models—that screening-detected cancers behave like non–screening-detected cancers; the screening biases listed previously undermine this assumption (18–21,37,38). Second, the LCPM explicitly models benign nodules. Compared with a range of lung cancer prevalence of 1%–2.7%, 23%–51% of screened smokers have detectable nodules on baseline CT screening scans (14,16,39). High false-positive rates are potentially worrisome because of the burdens that subsequent evaluations place on patients and the health care system and because people with false-positive screening scans may be less likely to participate in subsequent screening examinations (40). Third, the LCPM incorporates the high competing mortality risks faced by cigarette smokers (41); failure to incorporate those risks would yield a biased (inflated) estimate of the effect of screening on mortality.
The LCPM is a state-transition model analyzed as a patient-level Monte Carlo simulation to allow for individual heterogeneity in risk factors (eg, smoking history) and event rates. A 1-month cycle length captures the short survival times of late-stage lung cancer and allows for a variety of event frequencies. Outputs include estimates of incident cancers stratified according to age, cell type, and stage, estimates of survival according to stage at detection, and estimates of non–lung cancer deaths.
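The general idea of a state-transition model run as a patient-level Monte Carlo simulation with a 1-month cycle can be sketched as follows. This is a minimal illustration only: the state names, the monthly transition probabilities, and the function names are hypothetical and are not taken from the LCPM, which models far more detail (smoking history, cell type, stage, nodules, and treatment).

```python
import random
from collections import Counter

# Hypothetical monthly transition probabilities; NOT LCPM parameters.
# States missing from this table ("dead_lung", "dead_other") are absorbing.
MONTHLY = {
    "well": {"preclinical": 0.0005, "dead_other": 0.0015},
    "preclinical": {"diagnosed": 0.03, "dead_other": 0.0015},
    "diagnosed": {"dead_lung": 0.04, "dead_other": 0.0015},
}


def simulate_person(rng, months=180):
    """Follow one simulated individual in 1-month cycles; return the final state."""
    state = "well"
    for _ in range(months):
        moves = MONTHLY.get(state)
        if moves is None:  # absorbing state reached
            break
        u = rng.random()
        cumulative = 0.0
        for target, probability in moves.items():
            cumulative += probability
            if u < cumulative:  # this transition fires during the current month
                state = target
                break
    return state


def simulate_cohort(n, seed=0):
    """Tally final states over n independently simulated individuals."""
    rng = random.Random(seed)
    return Counter(simulate_person(rng) for _ in range(n))
```

For example, `simulate_cohort(10000)` returns counts of individuals in each state after 15 years (180 monthly cycles) under these made-up probabilities.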
Natural history parameters were estimated by populating the LCPM with individuals assigned a smoking history that was representative of a specified age-sex-race–calendar year cohort of the U.S. population (42,43) and by calibrating the model to tumor registry data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (8). Data from past cohort studies (44) and other literature sources (4,45–47) that described clinical experience were used as secondary calibration targets.
Data from the Mayo CT screening study (16) were used to estimate age-specific probabilities of benign nodules. To estimate the proportion of adenocarcinoma that was bronchioloalveolar carcinoma and the distribution of the growth parameter for adenocarcinoma, we calibrated the model to the binomial 95% confidence interval (CI) around the prevalence (all cell types combined, excluding sputum-detected and interval cancers) reported from the Mayo CT screening study (16). Remaining end points reported from the study were used as validation targets. Further model details and parameter values may be found in a model profile under "Lung-Massachusetts General Hospital" at the National Cancer Institute Web site (http://cisnet.cancer.gov/profiles/).
Mayo CT Screening Study Data
The Mayo CT Screening Study was a single-arm evaluation of helical CT screening for lung cancer in current and former smokers (16,27,48). Participants were enrolled during 12 months (January to December 1999). At enrollment, the study participants (n = 1520) had a mean age of 59 years (range, 50–85 years); 52% were men and 48% were women; 99% were white, and 1% were African American, Native American, or Hispanic. At enrollment, most (61%) participants were current smokers, and the median smoking history was 45 pack-years (range, 20–230 pack-years). All participants underwent baseline (prevalence) screening and were assigned to three annual CT examinations. Later, the number was changed to four examinations, for a total of five screening examinations.
Simulating Outcomes in the Screening Arm
The LCPM was populated with the Mayo CT study participants by drawing (with replacement) from the individual records. We simulated the published screening protocol (five annual helical CT screening examinations) and assumed perfect adherence for individuals who did not receive a diagnosis of lung cancer. Participants with a confirmed diagnosis of lung cancer were not eligible for screening but could instead undergo surveillance (National Cancer Institute Web site at http://cisnet.cancer.gov/profiles/). We modeled participants with screening-detected nodules as undergoing the follow-up protocol suggested in the study (16), which varies according to the size of the largest nodule. We assumed that helical CT scans could depict nodules as small as 2 mm in diameter (consistent with the 5-mm section thickness and 3.75-mm reconstruction thickness in screening examinations and 1–3-mm section thickness on follow-up thin-section CT scans in the study) and derived estimates of test sensitivity according to size and location from the study (16). We defined an imaging examination with false-positive findings as one in which an ultimately benign nodule of any size was detected. Note that this definition allows us to disregard those few screening examinations for which findings were positive but no nodule existed (per-person specificity of helical CT was assumed to be 98%); follow-up thin-section CT was assumed to be capable of resolving all such screening CT findings (ie, perfect specificity for absence of any pulmonary nodule).
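Drawing with replacement from individual records can be sketched as below. The records, field names, and the function `draw_simulated_cohort` are hypothetical stand-ins for illustration; they do not reproduce the deidentified Mayo data or the LCPM's input format.

```python
import random

# Hypothetical participant records; field names and values are illustrative only.
RECORDS = [
    {"age": 55, "sex": "F", "pack_years": 30, "current_smoker": True},
    {"age": 62, "sex": "M", "pack_years": 45, "current_smoker": False},
    {"age": 71, "sex": "M", "pack_years": 80, "current_smoker": True},
]


def draw_simulated_cohort(records, size, seed=0):
    """Sample `size` records with replacement, so a record may appear repeatedly."""
    rng = random.Random(seed)
    return rng.choices(records, k=size)


cohort = draw_simulated_cohort(RECORDS, size=10)
```

Because sampling is with replacement, the simulated cohort can be much larger than the source data set (for example, 500 000 simulated individuals drawn from 1520 records) while preserving the joint distribution of characteristics within each record.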
Simulating Outcomes in the Control Arm
To simulate a hypothetical control arm for the Mayo CT screening study, we populated the LCPM with the same (deidentified) individual smoking histories from the Mayo CT study and disabled the screening component, although asymptomatic lung cancers could be detected incidentally (National Cancer Institute Web site at http://cisnet.cancer.gov/profiles/).
Base-Case and Sensitivity Analyses
Base-case outputs from the screening arm (simulated as described previously) were compared with published study end points (16,27) to validate the model. Outputs from both simulated study arms were used to estimate the stage shift caused by screening; detection, mortality, and survival rates were stratified according to study arm. Changes in lung cancer–specific and all-cause mortality were predicted. Counts of cases of lung cancer and of deaths according to cause were normalized to a study size of 1520 per study arm for consistency with the Mayo CT study. With sensitivity analyses, we examined the effects of changes in critical but uncertain model inputs. We modeled scenarios of 10 annual screening examinations; of reduced operative mortality rates for lobectomy, mediastinoscopy, and excisional (wedge) biopsies (National Cancer Institute Web site at http://cisnet.cancer.gov/profiles/); and of a protocol with no follow-up for nodules smaller than 4 mm in diameter. Also modeled was a protocol with more sampling biopsies (fine needle, bronchoscopy, mediastinoscopy) and correspondingly fewer excisional biopsies for indeterminate growing nodules (20% vs 50% base-case rate) and large but stable nodules (2% vs 11% base-case rate). An additional sensitivity analysis simulated a 20% annual smoking cessation rate in the screening arm, beginning at the baseline screening and continuing for the life of the cohort.
Statistical Analysis
We simulated 500 000 individuals per study arm to generate more precise estimates of effectiveness than would be possible with a smaller sample size; accordingly, P values for comparisons between study arms are not informative. By comparing outcomes in the simulated control and screening study arms, we were able to simulate the likely results of a two-arm clinical trial of the same size as, or larger than, the Mayo CT study. Estimated changes in 15-year lung cancer–specific mortality were calculated from 10 replicates of 1520 simulations per study arm to reflect the potential variance in effectiveness across multiple trials in which the same number of participants were enrolled. To investigate the effect of sample size, we increased the number of simulations per study arm in increments of 1000 until a significant (P < .05, log-rank test) difference in 15-year lung cancer mortality was evident between the study arms in nine of 10 replicates.
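The replicate-based assessment of trial size can be sketched in simplified form. The example below is an assumption-laden stand-in, not the analysis performed here: it replaces the log-rank test with a two-proportion z test on cumulative deaths, draws deaths as independent Bernoulli events rather than from the LCPM, and takes the per-arm death probabilities as hypothetical inputs.

```python
import math
import random


def two_proportion_p_value(deaths_a, deaths_b, n):
    """Two-sided P value for a difference in proportions between equal-size arms."""
    pooled = (deaths_a + deaths_b) / (2 * n)
    if pooled in (0.0, 1.0):  # no variation, so no evidence of a difference
        return 1.0
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (deaths_a / n - deaths_b / n) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def significant_fraction(n, p_control, p_screen, replicates=10, alpha=0.05, seed=0):
    """Fraction of replicate two-arm trials (n per arm) with P < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        deaths_control = sum(rng.random() < p_control for _ in range(n))
        deaths_screen = sum(rng.random() < p_screen for _ in range(n))
        if two_proportion_p_value(deaths_control, deaths_screen, n) < alpha:
            hits += 1
    return hits / replicates
```

Calling `significant_fraction` for n = 1520, 2520, 3520, and so on, and stopping when the returned fraction first reaches 0.9, would mimic the stopping rule described above under these simplifying assumptions.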
| RESULTS |
|---|
At the second screening in the study (ie, first-incidence screening), three participants with lung cancer were identified (observed rate, 0.2%; predicted rate, 0.29%). After 4 years of screening, at least one nodule was identified in a reported 74% of participants, versus a predicted rate of 69.5%. By 2 years of follow-up, 0.46% (n = 7) of study participants had undergone wedge excision of benign pulmonary nodules, versus the base-case predicted rate of 2%.
Comparison with Control Arm
Lung cancer cases and deaths.—Predicted outcomes according to years of follow-up for the hypothetical control arm and the simulated screening arm (500 000 simulations each, normalized to the study size of 1520) are presented in Table 2. By 6 years after enrollment, there were an estimated 14 additional diagnosed cases of lung cancer in the screening arm versus the control arm (52 vs 38 cases). The number of additional lung cancer cases (n = 14) corresponds to an absolute increase in lung cancer detection of 0.9%, calculated as [(52 – 38)/1520 · 100], or a 37% relative increase, calculated as {[(52/1520) – (38/1520)]/[38/1520] · 100}. Extrapolated over the lifetime of the trial cohort, the two simulated study arms differed by only eight cases of lung cancer (screening-related increase in lung cancer detection, 0.5% [absolute] and 4% [relative]).
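The absolute and relative increases quoted above follow directly from the normalized case counts (52 and 38 cases per 1520 participants). A short sketch of the arithmetic, with hypothetical function names:

```python
ARM_SIZE = 1520        # normalized study size per arm
CASES_SCREENING = 52   # diagnosed cases at 6 years, screening arm
CASES_CONTROL = 38     # diagnosed cases at 6 years, control arm


def absolute_increase_pct(cases_screen, cases_control, arm_size):
    """Difference in detection between arms, in percentage points."""
    return (cases_screen - cases_control) / arm_size * 100


def relative_increase_pct(cases_screen, cases_control):
    """Difference in detection relative to the control arm, in percent.

    The arm size cancels because both arms are the same size.
    """
    return (cases_screen - cases_control) / cases_control * 100


absolute = absolute_increase_pct(CASES_SCREENING, CASES_CONTROL, ARM_SIZE)
relative = relative_increase_pct(CASES_SCREENING, CASES_CONTROL)
```

Rounded, these give 0.9% (absolute) and 37% (relative), matching the values in the text.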
One of the 11 hypothetical individuals who would have died from lung cancer in the absence of screening died of other causes within the same 15 years after enrollment. The reductions in 15-year cumulative all-cause mortality were 0.6% (absolute) and 2% (relative).
Iatrogenic deaths (defined as those that resulted from invasive staging or therapy for benign disease) were rare but 37% more likely in the screening arm (30 deaths per 100 000 population) compared with the control arm (22 per 100 000).
Detection rates, stage shift, and survival.—NSCLC detection as a function of follow-up time (ie, years since randomization) was higher in the simulated screening arm than in the hypothetical control arm while screening was in place (years 0 through 4 since randomization), but at the end of routine screening, the detection rate in the screening arm decreased below that of the control arm (Fig 1). SCLC detection was the same in both simulated study arms (Fig 1).
Evaluating trial size.—Of 10 replicates of 1520 simulations per trial arm (the size of the study), screening yielded a significant (P < .05, log-rank test) reduction in 15-year lung cancer–specific mortality in one replicate. A significant (P < .05) result was first observed in nine of 10 replicates when the simulated number of patients per trial arm reached 8000.
Sensitivity Analyses
Extrapolated over the lifetime of the cohort, 10 annual screening examinations, compared with the scenario of no screening, yielded a 14% relative reduction in cumulative lung cancer–specific mortality. The predicted reductions in 15-year cumulative all-cause mortality were 1% (absolute) and 3% (relative).
In the analysis for which we assumed that current smokers in the screening arm had a smoking cessation rate of 20% per year (held constant beginning at enrollment), an estimated 75% of current smokers at enrollment had quit smoking by the end of the screening period, with nearly complete cessation by 20 years. The 20% annual cessation rate is similar to the 23% rate reported by researchers in the Early Lung Cancer Action Program (49), versus a lower reported estimate of 14% from the Mayo Clinic (50). Relative to a 3% cessation rate in the simulated control arm, the high cessation rate combined with five annual screening examinations yielded relative reductions of 16% in 15-year lung cancer–specific mortality and 11% in 15-year all-cause mortality.
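The cumulative effect of a constant annual cessation rate can be approximated with a simple geometric calculation. The sketch below assumes that the rate is applied once per year to those still smoking and that no one relapses; the LCPM's implementation may differ (for example, in cycle length and timing), so the values need not reproduce the model's reported figures exactly.

```python
def fraction_quit(annual_rate, years):
    """Fraction of baseline smokers who have quit after `years`.

    Assumes a constant annual cessation rate and no relapse.
    """
    return 1.0 - (1.0 - annual_rate) ** years


# Cumulative cessation by year under the 20% annual rate used in the sensitivity analysis.
trajectory = {year: fraction_quit(0.20, year) for year in range(0, 21)}
```

Under these assumptions, about 99% of baseline smokers have quit by year 20, consistent with the nearly complete cessation described above.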
Reducing the operative mortality rates from lobectomy, mediastinoscopy, and video-assisted thoracoscopic surgery reduced the rate of iatrogenic deaths at 6 years to 7.6 predicted deaths per 100 000 participants in the control arm and 12.4 predicted deaths per 100 000 in the screening arm. Absolute death rates decreased by approximately two-thirds compared with the base-case rate, but the higher rate of follow-up in the screening arm relative to the control arm yielded a relative increase in the risk of death of 63%.
In the scenario of no follow-up for lesions smaller than 4 mm in diameter, fewer iatrogenic deaths occurred in the control arm (12 deaths per 100 000), but the number of deaths in the screening arm was unchanged from the number in the base-case analysis. The percentage of screening participants who underwent excision of benign nodules within 2 years was also unchanged from the base-case rate of 2%. In the scenario of fewer excisional biopsies, 1.4% of participants in the screening arm, versus 0.6% of participants in the control arm, underwent excision of benign nodules within 2 years of simulated randomization.
| DISCUSSION |
|---|
Our overall conclusion—that CT screening may offer a moderate lung cancer mortality reduction—lies between the conclusions of two recent high-profile studies, one of which concluded that screening offered a large benefit (9) and the other of which concluded that it offered no benefit (28). In the next paragraphs, we place our findings in the context of these two contradictory claims.
The International Early Lung Cancer Action Program published results from a single-arm collaborative screening study of more than 31 000 individuals (9). The reported 88% 10-year survival of participants with stage I lung cancers is consistent with the LCPM-predicted stage shift and 10-year survival rate of 87% for participants with screening-detected stage I NSCLCs. Although survival is prolonged, the LCPM provides no corroboration for a reduction as large as 80%, as estimated in the discussion section of the article (9), in lung cancer deaths with CT screening. For this cohort, the LCPM predicted that the relative reduction in lifetime lung cancer mortality would increase from 8% with five screening examinations to 14% with 10 screening examinations. With single-arm study designs, changes in mortality cannot be quantified (51).
By using individual data from three single-arm screening studies, the largest of which was the Mayo CT study (16), Bach et al (28) used an existing prediction model to estimate the participants' hypothetical outcomes in the absence of screening. Their main finding was that CT screening in the combined cohorts of a total of 3210 participants offered no significant reduction in lung cancer mortality (relative risk, 1.0; 95% CI: 0.7, 1.3; P = .9) (28), but the conclusion of Bach et al that there is no benefit was based on exclusion of all lung cancer deaths in the 1st year of the hypothetical control arm from the mortality calculation. For the Mayo study in particular, if the 26.68 1st-year deaths were not excluded (table 2 in Bach et al [28]), the Bach model would have predicted a 29% relative reduction in lung cancer–specific mortality at 6 years of follow-up, versus 28% from the LCPM (which predicted 26.4 1st-year deaths). The justifications cited by Bach et al (28) for excluding the 1st-year deaths are (a) the requirement that study individuals be asymptomatic at enrollment and (b) the implausibility of an immediate decrease in lung cancer mortality from screening. We did not exclude lung cancer deaths from the 1st year of the hypothetical control arm follow-up, because the LCPM explicitly models symptom detection and therefore the two simulated trial arms would be identical at enrollment, as in a real trial. Similarly, in a microsimulation model such as the LCPM, there is no need to make the rather strong assumption that screening could never detect a fast-growing cancer and prevent a lung cancer death within the next year. No lung cancer deaths were observed in the 1st year of the Mayo CT study, but the 95% CI would be wide, given the number of study participants. 
Bach et al (28) excluded participants older than 80 years of age and lighter smokers (more than 5% of the Mayo CT study participants), so the higher cumulative mortality rate (1.85% vs 1.74% in the LCPM) is not unexpected.
The LCPM predicted a lower relative risk for lung cancer diagnosis in the screening arm versus the control arm (1.3 at 1.5 years of follow-up) than did the model in the study of Bach et al (28) (1.99 at 2 years), but our predicted increase in iatrogenic deaths mirrored the predicted increase in resections that resulted from screening with the model of Bach et al. Unlike the model of Bach et al, the LCPM predicted a decrease in advanced (stage IV) NSCLC because of screening, but the methods for estimating stage distributions are different. To estimate the proportion of advanced cancers, the prediction model of Bach et al was adjusted by using observed stage distributions in the SEER registries (matched for age and sex but not smoking status, as smoking status is not recorded in the SEER registries). Nonsmokers are known to have less aggressive histologic types of cancer (eg, adenocarcinoma) and therefore may be more likely than smokers to receive a diagnosis of early-stage cancers. As part of the development of the LCPM, the model was populated with specific age, sex, race, and calendar-year cohorts that are representative of the U.S. population in terms of smoking history (ie, including nonsmokers) and was calibrated to stage distributions in corresponding cohorts in the SEER registries.
It is logical that screening will allow diagnosis of some cancers that would not have been apparent during the individual's lifetime: Without screening, the individual dies of unrelated causes, unaware of the asymptomatic cancer. Thus, the question of how much overdiagnosis is too much is difficult to answer on the basis of numbers of cancers in each study arm. Policy makers will benefit from analyses that enumerate the trade-offs between the lung cancer deaths that are prevented and the iatrogenic deaths that are caused by invasive staging and treatment that arise from screening. Cost-effectiveness analyses will be necessary to evaluate whether imaging-based screening is a good use of resources, relative to other interventions, including effective smoking cessation programs or improved treatments.
Volunteer populations are highly selected and probably differ from the general population in unidentified ways. In addition to exclusion of persons who were symptomatic for lung cancer and the possibilities of healthy volunteer bias (28) and endemic histoplasmosis, participants in the Mayo CT screening study were recruited by using television news coverage and 99% were white. Therefore, the results of our analysis may not be predictive of the effect of screening in a broader population of current and former U.S. smokers. Compared with the Mayo CT study, the National Lung Screening Trial (NLST) (24) enrolled more participants (n = 50 000) with stricter requirements for age (55–74 years) and smoking history (≥30 pack-years). Also, in the NLST, screening with CT was compared with screening with chest radiography, rather than with no screening as in our simulation. Because of the differences in participants and trial design, our findings from this study should not be interpreted as evidence that the NLST will be underpowered to detect a significant result. However, the NLST is 90% powered to detect a 21% mortality reduction at 6 years of follow-up between screening with CT and screening with chest radiography (52), versus our estimate of a 28% mortality reduction at 6 years of follow-up between screening with CT and no screening. In the event that a significant result is not observed from the NLST, simulation modeling may help in interpretation of those findings.
The analysis in this study had several limitations in addition to those common to all modeling studies (53). Randomized controlled screening trials have not yet demonstrated a reduction in lung cancer–specific mortality and therefore cannot inform model inputs. The LCPM was calibrated (in the absence of screening) to lung cancer incidence rates in the nine core SEER registries, which exclude much of the South; the calibrated model thus may not be generalizable to the entire U.S. population. Information about doubling times and aggressiveness of lung cancers other than the four main histologic types, which together comprise approximately 90% of lung cancers (54), is limited. The LCPM does not reproduce exactly the prevalence observed in the Mayo CT study; rather, accepting a prevalence within the 95% CI avoids overfitting to the small numbers of study participants and cancers. We predicted a higher rate of excision of benign nodules than was observed in the study, because of our modeling assumption that individuals undergo guideline staging and treatment. However, the screening arm had a 2.3-fold increased risk of excisional biopsy for benign disease, even in the scenario with fewer excisional biopsies. A critical output, the increased risk of iatrogenic death caused by screening, was variable and dependent on assumptions about details of the follow-up protocol.
Our assumption that all lung cancer patients (in both simulated study arms) received guideline treatment may not be generalizable to the U.S. population and may lead to overestimation of the survival gains that are possible from screening. In the Mayo study, 14% of participants had incidental findings other than lung cancer at screening examinations. Including any benefit and harm accruing from such incidental findings would affect the predicted overall mortality reduction, although these effects would probably be minimal (55). The LCPM explicitly simulates only lung cancer, so we cannot address questions about whether participation in regular lung cancer screening would influence an individual's overall cancer screening behavior or health care management for other diseases. We omitted possible effects of screening CT on future lung cancer risk (56) and screening behavior (57). The results of our analyses suggest that the rates of smoking cessation achievable in combination with a screening program may greatly influence the overall effectiveness of screening.
Modeling offers a way to use available data—including that from observational or single-arm studies—to inform those making current screening decisions, while they await long-term mortality data from randomized trials. Our results suggest that adding a control arm to the Mayo CT study would have provided some, but not significant (P > .05), evidence of a moderate reduction in lung cancer–specific mortality caused by screening.
| ADVANCES IN KNOWLEDGE |
|---|
| IMPLICATION FOR PATIENT CARE |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Abbreviations: CI = confidence interval; LCPM = Lung Cancer Policy Model; NLST = National Lung Screening Trial; NSCLC = non-SCLC; SCLC = small cell lung cancer; SEER = Surveillance, Epidemiology, and End Results
Author contributions: Guarantors of integrity of entire study, P.M.M., C.Y.K.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, P.M.M.; clinical studies, S.J.S.; statistical analysis, P.M.M., C.Y.K., M.C.W., K.M.K.; and manuscript editing, all authors
Authors stated no financial relationship to disclose.
| References |
|---|