Opinion
1 From the Department of Radiology, Harvard Medical School, and Department of Radiology, Massachusetts General Hospital, Avon Foundation Comprehensive Breast Evaluation Center, Wang Ambulatory Care Center, Suite 240, 15 Parkman Street, Boston, MA 02114 (D.B.K.); Department of Radiology, Washington University Medical Center, St Louis, Mo (B.M.); and Department of Radiology, Mount Sinai School of Medicine, New York, NY (S.A.F.). Received October 1, 2002; revision requested December 12; revision received March 19, 2003; accepted April 28. Address correspondence to D.B.K. (e-mail: kopans.daniel@mgh.harvard.edu).
| ABSTRACT |
|---|
© RSNA, 2003
Index terms: Breast radiography, utilization, 00.11; Cancer screening; Opinions
| INTRODUCTION |
|---|
In the past, much of health care was based on anecdotal experience. In the absence of science this is not unreasonable. However, evidence-based guidelines have replaced anecdotes in modern medicine. The requirements for evidence of benefit from an intervention are most critical for screening tests that affect otherwise healthy individuals.
Most imaging tests have been developed and are used to diagnose diseases among individuals who are ill. With the development of faster computed tomography (CT) scanners, magnetic resonance imaging systems, and positron emission tomography scanners, a great deal of interest in using these technologies to screen for various diseases has developed. Screening tests differ from diagnostic studies in that they are usually applied in the evaluation of healthy individuals. More and more healthy individuals are being subjected to these tests to try to find disease before the individuals become clinically ill. Furthermore, the vast majority of those who are screened do not have the disease being sought. This introduces a new way of looking at diseases and requires a different level of evidence for determining the efficacy of the screening test.
In his 2001 American Roentgen Ray Society presidential address (3), Robert Stanley, MD, discussed some of the issues associated with new screening tests for lung cancer, colon cancer, and coronary artery disease. Obuchowski et al (4) reviewed what they believed to be the "Ten Criteria for Effective Screening." It is somewhat surprising that the one imaging test that has undergone rigorous analysis as a screening technique, mammography, was barely mentioned in these reviews. Obuchowski and colleagues stated that "mammography is not an ideal screening model" (4).
In fact, although breast cancer screening may not be the ideal test, the issues associated with mammographic screening make it an excellent model for understanding the requirements for screening. Those involved in breast cancer screening have had firsthand experience for several decades with the pitfalls that lie in the way of demonstrating the efficacy of a screening test. X-ray mammography for detecting breast cancer has been studied in greater detail than any other test, and the problems that have been encountered in demonstrating its screening efficacy are basic in the demonstration of efficacy for any newer screening tests.
The complexities involved in demonstrating the efficacy of a screening test were highlighted by a controversy that recently arose almost 40 years after the first randomized controlled trial (RCT) of breast cancer screening. Two analysts argued that five of the seven RCTs of screening were not properly performed, and, therefore, their data and conclusions were not valid (5,6). Since the two remaining trials, they claimed, did not show a benefit, they concluded that mammographic screening was not justified. Although their concerns ultimately proved to be either unfounded or inconsequential (7,8), their review caused a great deal of confusion and consternation (9,10). The controversy underscored the importance of carefully designed RCTs and the proper execution and strict monitoring of these trials. It is hoped that the evaluation of new technologies that are being proposed for screening can benefit from the experience gained in validating mammographic screening for breast cancer.
Screening is not confined to the detection of cancer. The concepts involved in the search for occult breast cancer can be applied to the search for other processes such as tuberculosis or hypertension. There are general concepts that apply to all screening tests and some that are more specifically applicable to screening for cancer. Since many of the recent efforts have been to develop imaging tests to screen for colon cancer and lung cancer, we will confine our comments to the issues involved with cancer screening. However, most, if not all, of these concepts also apply to screening that is performed to assess for other life-threatening processes (such as imaging the coronary arteries to detect coronary artery disease) and even to nonimaging tests, such as prostate-specific antigen (PSA) testing for detecting prostate cancer.
| SOME BASIC CONCEPTS |
|---|
A screening test for cancer is similar to a mechanical screen over a window. The principle is to allow what is desirable (air and light) through the screen while filtering out what is not desirable (insects). The degree of filtration (ie, detection) depends on the size of the openings in the screen. To prevent very small insects from passing through, the openings must be very small, but this will also reduce the flow of air and the amount of light that enters the room. Larger openings will let more air and light through, but they will also let some of the insects through. This concept of screen size is analogous to the threshold for intervention in a cancer screening test. If the threshold for intervention is low (ie, small abnormalities are aggressively evaluated), then many noncancerous lesions will be caught in the screen. If the threshold is elevated to reduce the false-positive results, then small cancers will pass through and be missed. Efficacious screening has to balance the benefits and the risks.
Among the many requirements that have been suggested for ensuring that a cancer screening test is efficacious (11) are the following:
1. The cancer must be fairly common, so that the benefit to those with cancer will offset the inconvenience, cost, and harms that will be incurred by the many individuals who do not have the cancer.
2. The new test must be able to reveal the disease earlier in its growth than does the customary way in which the disease is found.
3. The cancer must have an effective treatment (either it is curable when found earlier, or there is treatment that can result in delayed death for a reasonable number of individuals).
4. The value of detecting the cancer at this earlier time outweighs the risks and costs generated by screening.
| THE IDEAL SCREENING TEST FOR CANCER |
|---|
| DOES EARLIER DETECTION MEAN THAT THE COURSE OF THE DISEASE IS ALTERED? |
|---|
Some cancers are indolent and may not affect an individual during his or her lifetime. Tumors have been found at autopsy in individuals who died from some other cause. Because these tumors never affected the individual during life, finding them would actually have been detrimental to the individual (resulting in unnecessary treatment, anxiety, etc). Furthermore, "competing" causes of death must also be taken into account. Screening someone for breast cancer who has chronic congestive heart failure and a life expectancy of less than 5 years is unlikely to result in a prolongation of the individual's life. Thus, for a screening test to be efficacious, it must be shown that its use actually alters the natural history of the cancer in a way that is beneficial to the patient.
Furthermore, because most individuals who are screened will not actually derive any benefit from the test (most will not develop the cancer), the benefit for the few whose disease course is influenced by earlier detection must outweigh the harms that the test might cause to the others.
| INCREASED SURVIVAL DOES NOT PROVE THE EFFICACY OF SCREENING |
|---|
Assume there are two individuals who are identical twins (A and B). They are both going to develop a cancer at the same time in the same organ on the same date, and these cancers are going to grow at the same rate. If left untreated, they would kill the individuals 15 years after the first cell began to proliferate. Five years after the cancer begins to grow, twin A undergoes a screening test. The cancer is detected before the twin has any symptoms, and the twin undergoes treatment, lives another 10 years, and then succumbs to the cancer. Since we usually only know the time from detection to the time of death (the survival time), the data would show that twin A "survived" 10 years.
Twin B does not want to undergo the cancer test, waits until symptoms develop, and then is diagnosed with cancer 8 years after the first cell began to proliferate. The cancer is treated, but the twin still dies 7 years later. Since twin B's survival from diagnosis to death was 7 years, while twin A's was 10 years, survival results suggest that twin A's survival was 3 years longer than twin B's, and twin A would appear to have derived a benefit from the screening test. However, adding up all the years, it is clear that there was no benefit from the test. Both actually died at the same time, 15 years after the cancer began. The screening test only made twin A aware of the cancer 3 years earlier, and finding the cancer earlier had no effect on the twin's mortality. This has been termed lead time bias and is one of the reasons why comparing survival data (the time from diagnosis to the time of death) can be misleading. Lead time bias is the main reason why survival data are usually insufficient for proving a benefit from a screening test. It is not enough to show that cancers can be detected earlier. It must be shown that detecting them earlier prolongs life.
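The arithmetic of the twin example can be laid out as a short sketch. The numbers are those given in the text; the class and variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Twin:
    name: str
    detected_at: int  # years after the first cancer cell when cancer was diagnosed
    died_at: int      # years after the first cancer cell when death occurred

    def survival_from_diagnosis(self) -> int:
        """Survival as it is usually reported: time from diagnosis to death."""
        return self.died_at - self.detected_at


twin_a = Twin("A (screened)", detected_at=5, died_at=15)
twin_b = Twin("B (unscreened)", detected_at=8, died_at=15)

# Difference in reported survival (what the survival data show).
apparent_gain = twin_a.survival_from_diagnosis() - twin_b.survival_from_diagnosis()
# How much earlier screening found the cancer.
lead_time = twin_b.detected_at - twin_a.detected_at
# Difference in the actual time of death.
true_gain = twin_a.died_at - twin_b.died_at

print(f"Apparent survival gain: {apparent_gain} years")
print(f"Lead time:              {lead_time} years")
print(f"Actual life gained:     {true_gain} years")
```

The apparent gain equals the lead time exactly, while the actual life gained is zero, which is why mortality rather than survival from diagnosis is the appropriate end point.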
| OVERDIAGNOSIS (OR PSEUDODISEASE) BIAS |
|---|
Mammographic screening, for example, reveals cancer while it is still in the milk ducts. This is known as ductal carcinoma in situ (DCIS). Because the majority of breast cancers almost certainly begin in the epithelial cells of the ducts (12), most breast cancers begin as intraductal lesions. However, it is unclear which intraductal cancers will progress to become invasive lesions. Because breast cancer can only become lethal by infiltrating into the tissue around the ducts so that it can gain access to the lymphatic vessels and vascular supply that will allow it to spread to other organs, DCIS is a nonlethal lesion. Because the cells of these in situ lesions look like the cells of invasive cancer, DCIS, until recently, has been treated like invasive cancer (13). For many years mastectomy was used to treat these very early lesions. On the basis of the uncertainty as to what percentage of DCIS lesions will progress to invasive cancer and the fact that it is likely that not all DCIS lesions will progress, many have criticized mammographic screening because it led to overdiagnosis and "overtreatment" (14).
There is still great uncertainty over the importance of these lesions. Given enough time, a large percentage may progress to invasive cancers, but it is likely that many will not (15). If DCIS is considered a "real" cancer, but few cases actually lead to lethal lesions, then finding DCIS with the screening test will bias the results. The test will appear to result in saving more lives by revealing "cancers" that would never have taken a life even if they were not discovered by the test. This has been called overdiagnosis bias.
It can confuse the interpretation of data in the following way: Suppose that there are 100 cancers diagnosed each year before the introduction of a new screening test, and 50% of the individuals with this diagnosis die from their cancer. Then the screening test is used. The number of cancers diagnosed each year doubles, and it is found that the death rate has been cut in half. On the surface, it would seem as if the screening test had provided a major benefit: a 50% reduction in deaths. Closer inspection could, however, reveal overdiagnosis bias. If none of the additionally diagnosed cancers had lethal potential, then they would contribute to the denominator of the ratio of deaths to the number of individuals with cancer, but they would not contribute to the numerator. Thus, the rate of death before screening would be 50/100, or 50%, while the rate of death after screening would be 50/200, or 25%. Superficially, it would seem that the use of screening had halved the number of deaths, when in fact screening had only revealed more cancers that had no lethal potential, and the absolute number of deaths remained unchanged.
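The same numbers can be checked in a few lines. This is a minimal sketch; the population size of 100,000 is a hypothetical value added only to show that the population death rate does not move.

```python
def case_fatality(deaths: int, diagnosed: int) -> float:
    """Deaths divided by the number of individuals diagnosed with the cancer."""
    return deaths / diagnosed


deaths_per_year = 50       # unchanged by screening in this example
diagnosed_before = 100     # cancers diagnosed per year before screening
diagnosed_after = 200      # assumes the 100 additional cancers are all nonlethal
population = 100_000       # hypothetical population size, not from the text

fatality_before = case_fatality(deaths_per_year, diagnosed_before)
fatality_after = case_fatality(deaths_per_year, diagnosed_after)

# Deaths per 100,000 people do not depend on how many cancers are diagnosed.
mortality_before = deaths_per_year / population * 100_000
mortality_after = deaths_per_year / population * 100_000

print(f"Case fatality before screening: {fatality_before:.0%}")
print(f"Case fatality after screening:  {fatality_after:.0%}")
print(f"Deaths per 100,000 before/after: {mortality_before:.0f} / {mortality_after:.0f}")
```

Only the denominator of the case-fatality ratio changes, so a measure based on deaths in the whole population is not affected by overdiagnosis.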
| LENGTH BIAS SAMPLING |
|---|
| SELECTION BIAS |
|---|
| RCT IS THE ONLY ACCEPTED METHOD FOR VALIDATING THE EFFICACY OF A SCREENING TEST |
|---|
So that we can understand an RCT, let us return to the example of the twins with cancer. The only way we would have known that twin A actually benefited from screening was if this twin had actually outlived twin B. Since in that situation the only difference between the two individuals was that one had been screened and the other had not, then we could attribute the longevity of the first twin to screening.
Because twins like those described above do not exist in real life, the relationship must be simulated by using the laws of probability. If a large enough number of individuals are randomly separated into two groups, then every individual in one group will have a "twin" in the other. In other words, for every individual who is destined (assuming no intervention is performed) to die at a certain time from a certain type of cancer in one group, there will be a twin in the other group who is destined to die from the same type of cancer at the same time in the future. If a new intervention (like screening) works, then, as the two groups are followed up over time, there will be fewer deaths among the study (screened) group than among the control (unscreened) group. If the numbers are large enough, or if the reduction in death is large enough, that the difference cannot easily be attributed to chance, then the difference is said to be statistically significant, and the efficacy of the test has been demonstrated.
It can easily be appreciated that, even if the cancer is fairly common, the numbers of individuals that need to be included in a trial must be enormous. There have to be enough individuals so that, over the course of the trial, there will be enough participants who develop the cancer that is being studied and enough individuals who will die of the cancer such that the reduction in deaths from the intervention will be statistically significant. The greater the benefit from screening, the smaller the number of study participants needed to prove the benefit. If the actual benefit is small, then a much larger study is needed.
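The relationship between the size of the benefit and the size of the trial can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough illustration, not the method used in any of the trials discussed here; the 0.4% risk of death from the cancer in the control group, the 5% significance level, and the 80% power are all assumed values.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(control_death_risk: float,
                         relative_reduction: float,
                         alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Approximate number of participants needed in each group.

    Uses the usual two-proportion normal approximation (two-sided test).
    """
    p1 = control_death_risk
    p2 = control_death_risk * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical: 0.4% of unscreened participants die of the cancer during the trial.
for reduction in (0.40, 0.25, 0.10):
    n = participants_per_arm(0.004, reduction)
    print(f"{reduction:.0%} reduction in deaths: about {n:,} participants per group")
```

Under these assumptions the required number of participants rises steeply as the expected benefit shrinks, because the denominator of the formula is the square of the difference in death risk.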
| BLINDED RANDOMIZATION IS CRITICAL |
|---|
Nonblinded randomization compromised the results of the Canadian National Breast Screening Study (8,17). Women with advanced breast cancers were placed in greater numbers into the screened group than into the control group. The trial was compromised because women who already had advanced breast cancers and thus could not benefit from screening were allowed to participate in a screening trial. Each of the women first underwent a clinical breast examination (so that those with lumps were identifiable), and then the allocation to one group or the other was performed with open lists, making it possible to just skip a line on the open lists to place a woman with a lump in the screened group. Obviously, this compromises the results of the trial.
One measure of how well the random allocation process went is to see if the demographic features (eg, age distribution, family income, number of children) of the two groups are the same. However, with a compromise such as that in the Canadian National Breast Screening Study, shifting a few women with advanced cancer from one group to the other will have a dramatic impact on the mortality rates of the groups but will have no influence on the overall demographics because the other thousands of women involved will dilute the effect. Randomization must be completely blinded.
| STATISTICAL POWER |
|---|
The statistical power of a study is critical. Failure to appreciate its importance has confused many analysts. If a study includes just enough individuals to show a benefit for the entire population that participated in the trial, then a retrospective analysis of data from a subgroup within the trial may not show benefit merely because there were not enough individuals in that group to permit a statistically significant result. This is precisely what happened in the breast cancer screening trials in which data in women aged 40–49 years were analyzed, retrospectively, as a separate group (20). There were far too few women in that age group to demonstrate significant results in the early years of follow-up (20).
The fact that the benefit did not achieve statistical significance (because the trials did not include sufficient numbers of women to be able to show a significant benefit in this subgroup in the early years of follow-up) was misinterpreted by some analysts as meaning that there was no benefit. Following up the women for a longer period of time meant that there were more deaths from breast cancer and more patient years such that the benefit became significant (21).
Obviously, if numbers were not critical, then only very small trials would be needed. Retrospective subgroup analysis of data that lack the statistical power to permit accurate analysis should only be used to raise the next research question; it is scientifically unjustified to use these analyses to make medical recommendations. Investigators who wish to analyze data by subgroups (eg, age, sex) must plan at the outset to be sure that there will be sufficient numbers of participants in these groups to permit legitimate analysis.
| STATISTICAL SIGNIFICANCE |
|---|
There are mathematical formulas that are used to estimate the likelihood that a result is due to chance. These "tests of significance" are based on varying assumptions, but the basic principle behind them can be seen in the following example: If we are told that there were seven cancer deaths in the screened group and 10 cancer deaths in the unscreened control group, most of us would say that seven versus 10 was not a big difference and could easily be due to chance. However, if there were 70 deaths in the screened group and 100 deaths in the control group, even though the ratio is exactly the same, we would be more inclined to believe that there was a real benefit on the basis of the larger numbers. "Statistical significance" is merely a way of showing this mathematically.
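One simple way to put numbers on this example is an exact binomial test. The sketch below assumes that the two groups are the same size, so that if screening had no effect each death would be equally likely to fall in either group; that assumption, and the choice of test, are ours and are not taken from any of the trials.

```python
from math import comb


def two_sided_p(deaths_screened: int, deaths_control: int) -> float:
    """Exact binomial P value, conditional on the total number of deaths.

    Assumes equal-sized groups, so under "no benefit" each death has a
    50% chance of occurring in the screened group.
    """
    total = deaths_screened + deaths_control
    fewer = min(deaths_screened, deaths_control)
    # Probability of a split at least this lopsided in one direction.
    one_sided = sum(comb(total, k) for k in range(fewer + 1)) / 2 ** total
    return min(1.0, 2 * one_sided)


print(f"7 vs 10 deaths:   P = {two_sided_p(7, 10):.3f}")
print(f"70 vs 100 deaths: P = {two_sided_p(70, 100):.3f}")
```

With 7 versus 10 deaths the split is well within what chance alone produces, whereas the same ratio with 70 versus 100 deaths falls below the conventional 0.05 threshold.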
| POINT ESTIMATES AND THE TIME TO SHOW A BENEFIT |
|---|
If we return to the basis of RCTs and the analogy of the twins, the problem is clear. For a benefit to be shown for a screened individual who would have died without screening, that individual's "twin" in the unscreened group must die from the cancer. If the screening test enables the interruption of only moderately growing and slow-growing but lethal cancers, it may take many years for the control twin to die and the benefit to be revealed. Thus, the follow-up time is very important. A benefit may be overlooked if the follow-up is too short. In fact, an "early benefit" would be difficult to explain. For an early benefit to appear in a cancer screening trial, the cancer in the screened "twin" would have to be detected and treated just before it metastasized, while the cancer in the unscreened "twin" would have to metastasize soon after and kill the unscreened twin fairly quickly. Because length bias sampling means that periodic screening is more likely to interrupt moderately growing and slow-growing cancers, an early benefit soon after the start of screening would actually be unlikely.
| NONCOMPLIANCE AND CONTAMINATION |
|---|
Although counting noncompliers with the screened group and counting contaminators as if they were unscreened controls seems nonsensical, it is necessary because not doing so could introduce major biases. If, for example, individuals who were destined to have poor-prognosis cancers refused screening and were consequently not included in the study group data, this could make screening appear successful when it was merely the fact that these individuals with bad cancers had refused to participate.
Because women in the mammographic screening trials were not forced to undergo screening if they were allocated to the screened group and because those allocated to the control group were not prevented from being screened, the mammography trial results are reported as showing a benefit for women who were "invited" to be screened as opposed to those who actually were screened. Many women who died from breast cancer in the screened group had actually refused screening, while some in the unscreened control group may have been saved because of mammograms obtained outside the screening trial. Because proper analysis requires that such women still be counted with the group to which they had been allocated, it is fairly certain that the mammographic screening trial results represent an underestimation of the true benefit from screening.
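The direction of this underestimation can be illustrated with a simple dilution calculation. The sketch assumes, for simplicity, that noncompliers and contaminators have the same underlying risk of death as everyone else (which, as noted above, may not hold in practice); the 40% true benefit, 70% compliance, and 20% contamination are hypothetical values.

```python
def observed_reduction(true_reduction: float,
                       compliance: float,
                       contamination: float) -> float:
    """Mortality reduction seen when groups are analyzed as allocated.

    true_reduction: reduction in deaths among those actually screened
    compliance:     fraction of the invited group that is actually screened
    contamination:  fraction of the control group screened outside the trial
    """
    invited_mortality = 1 - compliance * true_reduction
    control_mortality = 1 - contamination * true_reduction
    return 1 - invited_mortality / control_mortality


true_benefit = 0.40  # hypothetical
diluted = observed_reduction(true_benefit, compliance=0.70, contamination=0.20)
print(f"True reduction among screened women: {true_benefit:.0%}")
print(f"Reduction observed by allocation:    {diluted:.1%}")
```

Under these assumptions the trial reports a smaller reduction than screened women actually experience, and the gap widens as compliance falls or contamination rises.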
| SCREENING INTERVAL |
|---|
In general, increasing the time between screening tests increases the number of cancers that become evident between screening tests (interval cancers). This means that the value of the screening test is diminished because an increasing number of individuals gain no benefit from the test. Because the cost of screening and the secondary costs that it creates (eg, for additional testing, biopsies) represent major challenges to its use, the goal is to try to determine the optimal screening interval (ie, lives saved vs time between screening examinations). At some point there is insufficient incremental value to justify decreasing the time between screening examinations (few if any additional lives are saved beyond those saved with the longer interval). It is often the cost of the test (including harms to the patients as well as economic cost) that dictates a compromise in which health planners support a longer interval than would be medically ideal (an ideal interval being one that results in the greatest number of lives saved).
When a new test is introduced that reveals cancers earlier than they would be revealed without the test, the first round of screening will not only reveal the cancers that have just reached the threshold where the test can demonstrate them but will also reveal cancers that have been accumulating in the population because the old method could not reveal the cancers until they reached an even larger size. Thus, in the first round of screening (prevalence screening), many more cancers will be detected than will be found at subsequent screening examinations if the time between screening examinations is not too long. If screening is repeated, and the time between screening examinations is appropriate, then most of the next group of cancers detected will have just entered the detectable phase. These new cancers are the incident cases that are newly discovered after each screening interval.
The period of time during which a cancer is detectable with a test before it is clinically evident is called the sojourn time. If, for example, a cancer becomes palpable at 1.5 cm, is detectable with mammography at 0.5 cm, and takes 2 years to grow from 0.5 to 1.5 cm, then the sojourn time is 2 years. If screening takes place every 3 years, then many cancers that were not detected at one screening examination will already have become palpable before the next screening examination, and the screening test will be less effective than it would have been if the screening interval were shorter. If there are too many cancers that become evident in the interval between screening examinations (ie, the screening interval is too long), then the screening test will have little value. Performing more frequent screening is generally better than having longer intervals between screening examinations, but the practical "best" time between screening examinations is a balance between the most effective period for reducing deaths through early detection and the cost of performing more frequent testing. To intercept cancers earlier, the screening interval should be less than half the sojourn time (26).
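The effect of the screening interval in this example can be sketched under strong simplifying assumptions: every cancer has exactly the same sojourn time, cancers become detectable at a uniform rate over time, and the test never misses a detectable cancer. None of these holds in practice, so the output is illustrative only.

```python
def interval_cancer_fraction(sojourn_years: float, interval_years: float) -> float:
    """Fraction of cancers that surface clinically before the next screen.

    A cancer that becomes detectable at time t after a screen is caught at
    the next screen only if t + sojourn_years > interval_years.
    """
    if interval_years <= sojourn_years:
        return 0.0
    return (interval_years - sojourn_years) / interval_years


sojourn = 2.0  # years, from the example in the text
for interval in (3.0, 2.0, 1.0):
    fraction = interval_cancer_fraction(sojourn, interval)
    print(f"Screening every {interval:.0f} years: {fraction:.0%} interval cancers")

# Rule of thumb cited in the text: interval shorter than half the sojourn time.
max_interval = sojourn / 2
print(f"Suggested upper bound on the interval: {max_interval:.1f} years")
```

With a 3-year interval and a 2-year sojourn time, one third of cancers in this idealized model become palpable between screens. An interval no longer than the sojourn time removes interval cancers in the model, but the shorter interval recommended in the text is what allows cancers to be intercepted earlier within the detectable phase.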
| THRESHOLDS FOR INTERVENTION |
|---|
We are unaware of a screening test that does not yield false-positive and false-negative results. Probably the best-known cancer screening test is the Papanicolaou (Pap) test for cancer of the uterine cervix. Depending on how the test is interpreted, as many as 10% of women will have falsely positive smears and will be recalled for additional testing that turns out to yield negative results. This is usually due to the fact that for all screening tests there is often an overlap in characteristics between benign and malignant lesions. Some lesions (eg, spiculated masses at mammography) have a very high probability of being cancer, while others (eg, well-circumscribed masses at mammography) have a low probability of being cancer, but a small risk nonetheless. If the goal is to find all cancers, then the threshold for intervention must be low and the false-positive call rate will be high. If the goal is to minimize the false-positive call rate, then the threshold for intervention is high and cancers will be allowed to slip through the screening test (27). Until a perfect screening test is developed, there is usually no way to avoid this relationship. One of the goals in developing a screening test should be to develop methods that increase the sensitivity of the test for small cancers (ie, the number of cancers detected with the test divided by the actual number of cancers present in the screened population) while keeping the specificity (ie, the number of individuals without cancer whose test results are interpreted as negative divided by the total number of individuals who do not have cancer) as high as possible.
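The trade-off between the two measures can be made concrete with a small calculation. All of the counts below are hypothetical (10,000 individuals screened, 50 of whom have cancer) and are chosen only to show how moving the threshold shifts both numbers.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Cancers detected divided by all cancers present."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Negative results among the cancer-free divided by all who are cancer-free."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results at two thresholds for intervention.
thresholds = {
    "low threshold":  dict(true_pos=45, false_neg=5,  false_pos=995, true_neg=8955),
    "high threshold": dict(true_pos=35, false_neg=15, false_pos=199, true_neg=9751),
}

for label, c in thresholds.items():
    sens = sensitivity(c["true_pos"], c["false_neg"])
    spec = specificity(c["true_neg"], c["false_pos"])
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}, "
          f"false-positive recalls {c['false_pos']}")
```

In this made-up example, raising the threshold removes most of the false-positive recalls but allows 10 additional cancers to pass through undetected.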
| HOW TO MEASURE THE BENEFIT: DISEASE-SPECIFIC OR ALL-CAUSE MORTALITY? |
|---|
A drug might reduce the death rate for the targeted disease but actually lead to more deaths from what would appear to be an unrelated cause. For example, use of a particular chemotherapeutic agent reduces the death rate from breast cancer, but if the doses are too high, it causes heart damage that can lead to death. A trial that only evaluated deaths from breast cancer would miss these deaths from cardiac causes, whereas a trial that measured deaths from all causes would demonstrate these deaths. The screening trials were faulted for not comparing all-cause mortality but instead comparing deaths caused by the specific disease being studied (disease-specific mortality).
The reason for this is quite straightforward. Screening trials differ from therapeutic trials in that in the latter, all of the participants have the disease, and the disease will probably cause most of the deaths in the trial. The use of all-cause mortality to measure benefit is practical, and the trials do not require enormous numbers. In screening trials, however, most of the participants will not die. Most will not develop the cancer being studied, and most of the deaths among participants will not be due to the disease being studied but will instead be due to the multiple other causes of death that occur in a general population each year. Since, for example, breast cancer accounts for only a small percentage of deaths each year, it would take huge numbers of participants to show that a reduction in deaths from breast cancer results in a decrease in all-cause mortality. Even if breast cancer accounted for 10% of all the deaths in the general population each year (the actual number is approximately 3%), it has been estimated that a trial would require more than 3 million participants so that a 25% decrease in breast cancer deaths would be reflected as a statistically significant decrease in all-cause mortality (29).
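A back-of-the-envelope calculation shows why the effect on all-cause mortality is so small. The sketch assumes that screening changes only deaths from the disease being studied; the percentages are those quoted in the text, and the function name is illustrative.

```python
def all_cause_reduction(share_of_all_deaths: float,
                        disease_specific_reduction: float) -> float:
    """Relative drop in all-cause mortality if only one cause of death changes."""
    return share_of_all_deaths * disease_specific_reduction


reduction_in_breast_cancer_deaths = 0.25

for share in (0.03, 0.10):
    effect = all_cause_reduction(share, reduction_in_breast_cancer_deaths)
    print(f"Breast cancer = {share:.0%} of deaths: "
          f"all-cause mortality falls by {effect:.2%}")
```

A 25% reduction in breast cancer deaths therefore corresponds to a change of well under 3% in all-cause mortality in either scenario, a difference that only a very large trial could distinguish from chance.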
If all-cause mortality is the required measure, then screening trials would become prohibitively large, expensive, and impossible to perform. Consequently, screening trials are designed to measure deaths from the disease being studied. This reduces the size of the trial that is needed to show a benefit, but it places an extra burden on researchers conducting screening trials to ensure that the determination of the cause of death is not biased. If, for example, a participant in a breast cancer screening trial dies with metastatic disease but also heart disease, it might not be clear what actually caused her death. If she had been allocated to the screened group and a reviewer was biased toward screening as being beneficial, her death might be attributed to heart disease so that it would not be counted as a breast cancer death. The opposite might occur if the reviewer was biased against screening. The only way to avoid the potential for bias is to be sure that the review to determine the cause of death is completely blinded as to the allocation of the patient. This is critical if the results of the trial are to be credible. Those who conduct trials must also be alert to unexpected causes of death that might occur as a result of the testing but might not be due to the disease for which the screening is being performed.
| UNEXPECTED CONSEQUENCES OF SCREENING |
|---|
Nevertheless, this example points out that unanticipated factors might significantly influence the results of a trial. Even if all-cause mortality is not used to measure benefit, it should be evaluated in screening trials, and deaths from other causes should be compared between the two groups to try to minimize the possibility of overlooking unexpected negative consequences of screening.
| GENERALIZABILITY |
|---|
Once a test has been shown to be efficacious in an RCT, then the final confirmation of its value is established by monitoring the disease in the population when the test is introduced into general practice. The ultimate test of efficacy is satisfied if, when the screening examination is introduced into the general population, deaths from the cancer decrease as a result. The widespread use of cervical cancer screening is an extreme example of this. The reason that cervical cancer screening has been accepted, despite the fact that there has never been an RCT to prove its efficacy, is that when it has been applied in populations, the death rate from cervical cancer has decreased in those populations, and when it was withdrawn, the death rate increased (32,33).
Not only has the benefit from mammographic screening been shown in RCTs, but the benefit from mammographic screening for breast cancer has now been shown in large populations outside of trials. In the two counties that participated in the Swedish Two County Trial (34), breast cancer death rates decreased by over 50% among the women living in the two counties when breast cancer death rates before screening was offered were compared with the rates after screening was offered. In the review of data from the two counties, it was observed that women who were not offered screening during the Two County Trial and women who refused the general offer of screening in the most recent years had the same breast cancer death rate as women prior to any screening. Although not conclusive proof, this suggests that most of the decrease in deaths was due to screening and not changes in therapy.
In another study of seven counties in Sweden (comprising 30% of the population of the country), a 30%–44% decrease in breast cancer deaths was attributed to mammographic screening (35). In the United States, the death rate from breast cancer suddenly began to decrease, for the first time in 50 years, in 1989. This was 5–7 years after screening began for large numbers of American women, as evidenced by the sudden increase in cancers being detected beginning around 1983–1985. The relationship between the start of screening and mortality reduction (as well as the suddenness of both) is a marker that strongly suggests that screening is responsible for much of the decrease in breast cancer deaths in the United States (36).
For all of these population reviews, it is important to realize that the analyses are based on the entire population and the rate of deaths from breast cancer (ie, the number of deaths per 100,000 women in the population) so that they are independent of the amount of breast cancer being diagnosed, thus avoiding pseudodisease bias.
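The arithmetic behind this point can be sketched with a small example. All of the numbers below are hypothetical and chosen only for illustration; none are taken from the studies cited. The sketch assumes that screening adds 100 diagnoses of pseudodisease (lesions that would never have caused death) to a population of 100,000 women, and it compares a measure based on diagnosed cases with the population death rate.

```python
# Hypothetical numbers for illustration only; not data from any cited study.

def summarize(population, diagnosed, deaths):
    """Return the population death rate and the case-fatality fraction."""
    return {
        # Deaths per 100,000 women in the whole population.
        "deaths_per_100k": deaths * 100_000 / population,
        # Fraction of diagnosed women who die of the disease.
        "case_fatality": deaths / diagnosed,
    }

# Without pseudodisease: 200 cancers diagnosed, 60 deaths.
before = summarize(population=100_000, diagnosed=200, deaths=60)

# With 100 assumed extra diagnoses of pseudodisease and the same 60 deaths.
after = summarize(population=100_000, diagnosed=300, deaths=60)

print(before)
print(after)
```

In this sketch the case-fatality fraction appears to improve (from 0.30 to 0.20) although no death has been prevented, whereas the death rate per 100,000 women is unchanged because its denominator does not depend on how many cancers are diagnosed.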
| BEWARE OF THE ARTIFICIAL GROUPING OF DATA |
|---|
|
|
|---|
An example of age grouping bias can be seen in evaluating the percentage of individuals with gray hair. There is a steady increase in the percentage of individuals with gray hair with increasing age, but if those numbers were to be analyzed by comparing those younger than 42 years with those 42 years and older, it would appear that there is a sudden increase in the number of individuals with gray hair at the age of 42, making it appear that something must be happening at that age. Analysts must be very careful that their methods of analysis do not bias the results of a trial.
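The gray hair example can be reproduced numerically. The sketch below assumes a purely hypothetical trend in which the percentage of individuals with gray hair rises by two percentage points per year from age 20, and it assumes one individual at each year of age from 20 to 69; neither assumption comes from real data.

```python
# Hypothetical smooth trend; the slope and age range are assumptions.

def gray_fraction(age):
    """Fraction with gray hair: 0 at age 20, rising 0.02 per year, capped at 1."""
    return min(max((age - 20) * 0.02, 0.0), 1.0)

ages = range(20, 70)  # one individual per year of age, 20 through 69

# Largest change between any two consecutive ages: the trend has no jump.
max_yearly_step = max(gray_fraction(a + 1) - gray_fraction(a) for a in ages)

def group_mean(group):
    """Average gray-hair fraction across the ages in a group."""
    group = list(group)
    return sum(gray_fraction(a) for a in group) / len(group)

younger = group_mean(a for a in ages if a < 42)
older = group_mean(a for a in ages if a >= 42)

# Difference created purely by splitting the data at age 42.
apparent_jump = older - younger

print(round(max_yearly_step, 2), round(younger, 2), round(older, 2))
```

Under these assumptions the year-to-year change never exceeds two percentage points, yet the two groups differ by about 50 percentage points. The apparent jump at age 42 is produced by the grouping, not by anything that happens at that age.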
| "AGE CREEP" |
|---|
|
|
|---|
Superficially, it would seem obvious that the data should be analyzed on the basis of the age at which the cancers are detected. However, this can be misleading. For example, if the screening test reveals cancers, on average, 3 years before they become clinically evident (ie, the "lead time" of the test is 3 years), and an individual in the screened group has a cancer detected at age 48, the individual's "twin" in the control group will not have the cancer detected until age 51. If the data (as happened in the mammographic screening trials) are analyzed by age at diagnosis, and data in individuals younger than 50 years are analyzed as a group while data in those 50 years and older are analyzed as another group, the death of the "control" individual will be counted with the over-50 group, and the benefit for the 48-year-old screened woman will not be counted. Prorok et al (44) have pointed out that the best way to analyze the data in these trials is by the age of the individuals at the time that they are allocated to the screened or control group and not by the age at diagnosis.
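The effect of the two ways of grouping can be traced for the matched pair described above. The sketch uses the 3-year lead time and the age-50 cutoff from the text; the allocation age of 45 is an assumption added only to complete the illustration.

```python
LEAD_TIME_YEARS = 3      # lead time from the example in the text
ALLOCATION_AGE = 45      # assumed age at randomization (hypothetical)

def age_group(age):
    """Assign an age to one of the two analysis groups split at 50."""
    return "under 50" if age < 50 else "50 and over"

screened_dx_age = 48                                  # detected by screening
control_dx_age = screened_dx_age + LEAD_TIME_YEARS    # clinically evident at 51

# Grouping by age at diagnosis puts the matched pair in different groups.
by_diagnosis = {
    "screened": age_group(screened_dx_age),
    "control": age_group(control_dx_age),
}

# Grouping by age at allocation keeps the pair in the same group.
by_allocation = {
    "screened": age_group(ALLOCATION_AGE),
    "control": age_group(ALLOCATION_AGE),
}

print(by_diagnosis)
print(by_allocation)
```

When the pair is grouped by age at diagnosis, the control death is compared against a different set of screened individuals, so the benefit in the younger group is understated. Grouping by age at allocation keeps each screened individual and her control in the same comparison.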
| SURROGATE END POINTS |
|---|
|
|
|---|
Unfortunately, there is as yet no universal agreement on the use of surrogate end points for any cancer screening test, and many health planners require evidence of a statistically significant mortality reduction in an RCT as proof of efficacy.
In summary, a screening test, unfortunately, is not efficacious simply because it can reveal cancers earlier. It must reveal cancers at a time when they can be successfully treated to either delay death or prevent death from the cancer. This generally can only be shown through the use of RCTs that must be large enough to have the statistical power to be able to "prove" a mortality reduction. Our focus has been on the medical and scientific issues underlying cancer screening. Ultimately, any benefit that can be derived will be balanced against the economic costs, and these will determine whether or not the screening test can be adopted for population-based screening.
We have tried to address the major issues involved in determining the efficacy of a screening test. If investigators wish to avoid many of the controversies that made it so difficult to "prove" the benefit of mammographic screening, they should heed the experience gained from the breast cancer screening controversies. One of the major lessons is that, although costly, RCTs must be sufficiently large, and allocation must be blinded and random so that results will have power and credibility.
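To give a sense of what "sufficiently large" can mean, the following sketch applies the standard normal-approximation formula for comparing two proportions. The death rates, significance level, and power below are hypothetical planning values, not figures from any trial discussed here, and the calculation ignores noncompliance and contamination, which would increase the required size.

```python
import math
from statistics import NormalDist

def per_arm_sample_size(p_control, p_screened, alpha=0.05, power=0.80):
    """Approximate number of participants per arm for a two-sided test
    comparing two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_screened * (1 - p_screened)
    effect = p_control - p_screened
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Assumed: 4 deaths per 1,000 controls over follow-up, and a 25% reduction
# (3 per 1,000) in the group offered screening.
n_small_effect = per_arm_sample_size(0.004, 0.003)

# Assumed: the same control rate with a 50% reduction (2 per 1,000).
n_large_effect = per_arm_sample_size(0.004, 0.002)

print(n_small_effect, n_large_effect)
```

Under these assumptions, detecting the smaller reduction requires tens of thousands of participants in each arm, and halving the expected benefit raises the requirement several-fold. This is why trials of screening for an uncommon cause of death must be very large to have adequate power.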