Radiology
DOI: 10.1148/radiol.2292021272
(Radiology 2003;229:319-327.)
© RSNA, 2003


Opinion

Screening for "Cancer": When is it Valid?—Lessons from the Mammography Experience1

Daniel B. Kopans, MD, FACR, Barbara Monsees, MD and Stephen A. Feig, MD

1 From the Department of Radiology, Harvard Medical School, and Department of Radiology, Massachusetts General Hospital, Avon Foundation Comprehensive Breast Evaluation Center, Wang Ambulatory Care Center, Suite 240, 15 Parkman Street, Boston, MA 02114 (D.B.K.); Department of Radiology, Washington University Medical Center, St Louis, Mo (B.M.); and Department of Radiology, Mount Sinai School of Medicine, New York, NY (S.A.F.). Received October 1, 2002; revision requested December 12; revision received March 19, 2003; accepted April 28. Address correspondence to D.B.K. (e-mail: kopans.daniel@mgh.harvard.edu).


    ABSTRACT
There is increasing interest in the development of imaging tests to screen for diseases such as cancer. Mammographic screening for breast cancer has undergone greater scrutiny than any other test. Many important lessons have been learned from the issues that have been raised with regard to mammographic screening. Those interested in developing new screening tests can learn from the mammography experience.


Index terms: Breast radiography, utilization, 00.11 • Cancer screening • Opinions


    INTRODUCTION
Santayana’s well-worn suggestion "Those who cannot remember the past are condemned to repeat it" (1) is often forgotten. There has been a great deal of recent interest in the use of imaging tests to screen for undetected disease. The rationale for this is the intuitively obvious belief that it must be better to detect cancer earlier. The most recent of several articles by Kolb et al (2) suggesting that ultrasonography (US) can reveal unsuspected cancers in women with radiographically dense breasts is an example. The findings of this study resulted in the impression that whole-breast US screening was beneficial when in fact the study design did not enable the researchers to prove that screening with US has any value.

In the past, much of health care was based on anecdotal experience. In the absence of science this is not unreasonable. However, evidence-based guidelines have replaced anecdotes in modern medicine. The requirements for evidence of benefit from an intervention are most critical for screening tests that affect otherwise healthy individuals.

Most imaging tests have been developed and are used to diagnose diseases among individuals who are ill. With the development of faster computed tomography (CT) scanners, magnetic resonance imaging systems, and positron emission tomography scanners, a great deal of interest in using these technologies to screen for various diseases has developed. Screening tests differ from diagnostic studies in that they are usually applied in the evaluation of healthy individuals. More and more healthy individuals are being subjected to these tests to try to find disease before the individuals become clinically ill. Furthermore, the vast majority of those who are screened do not have the disease being sought. This introduces a new way of looking at diseases and requires a different level of evidence for determining the efficacy of the screening test.

In his 2001 American Roentgen Ray Society presidential address (3), Robert Stanley, MD, discussed some of the issues associated with new screening tests for lung cancer, colon cancer, and coronary artery disease. Obuchowski et al (4) reviewed what they believed to be the "Ten Criteria for Effective Screening." It is somewhat surprising that the one imaging test that has undergone rigorous analysis as a screening technique—mammography—was barely mentioned in these reviews. Obuchowski and colleagues stated that "mammography is not an ideal screening model" (4).

In fact, although breast cancer screening may not be the ideal test, the issues associated with mammographic screening make it an excellent model for understanding the requirements for screening. Those involved in breast cancer screening have had firsthand experience for several decades with the pitfalls that lie in the way of demonstrating the efficacy of a screening test. X-ray mammography for detecting breast cancer has been studied in greater detail than any other test, and the problems that have been encountered in demonstrating its screening efficacy are basic in the demonstration of efficacy for any newer screening tests.

The complexities involved in demonstrating the efficacy of a screening test were highlighted by a controversy that recently arose almost 40 years after the first randomized controlled trial (RCT) of breast cancer screening. Two analysts argued that five of the seven RCTs of screening were not properly performed, and, therefore, their data and conclusions were not valid (5,6). Since the two remaining trials, they claimed, did not show a benefit, they concluded that mammographic screening was not justified. Although their concerns ultimately proved to be either unfounded or inconsequential (7,8), their review caused a great deal of confusion and consternation (9,10). The controversy underscored the importance of carefully designed RCTs and the proper execution and strict monitoring of these trials. It is hoped that the evaluation of new technologies that are being proposed for screening can benefit from the experience gained in validating mammographic screening for breast cancer.

Screening is not confined to the detection of cancer. The concepts involved in the search for occult breast cancer can be applied to the search for other processes such as tuberculosis or hypertension. There are general concepts that apply to all screening tests and some that are more specifically applicable to screening for cancer. Since many of the recent efforts have been to develop imaging tests to screen for colon cancer and lung cancer, we will confine our comments to the issues involved with cancer screening. However, most, if not all, of these concepts also apply to screening that is performed to assess for other life-threatening processes—such as imaging the coronary arteries to detect coronary artery disease—and even to nonimaging tests, such as prostate specific antigen (PSA) testing for detecting prostate cancer.


    SOME BASIC CONCEPTS
The use of imaging to screen for cancer differs from most other uses of imaging in that ostensibly healthy people are evaluated in an effort to find a disease early in its course. The requirements for a screening test differ from those used to evaluate individuals who are already ill. In the latter situation, the intervention is limited to those with a definite problem in an effort to help them recover. Most individuals who undergo a screening test, however, are not sick. Therefore, a false-positive test (a test that suggests disease when the patient does not actually have the disease) can cause "harms" such as anxiety or even morbidity from a biopsy or diagnostic procedure needed to establish whether or not the disease is truly present. These harms never would have occurred if the screening test had not been administered. The degree of risk that is acceptable for patients who are already ill is usually greater than what would be acceptable for a healthy population that is being screened.

A screening test for cancer is similar to a mechanical screen over a window. The principle is to allow what is desirable (air and light) through the screen while filtering out what is not desirable (insects). The degree of filtration (ie, detection) depends on the size of the openings in the screen. To prevent very small insects from passing through, the openings must be very small, but this will also reduce the flow of air and the amount of light that enters the room. Larger openings will let more air and light through, but they will also let some of the insects through. This concept of screen size is analogous to the threshold for intervention in a cancer screening test. If the threshold for intervention is low (ie, small abnormalities are aggressively evaluated), then many noncancerous lesions will be caught in the screen. If the threshold is elevated to reduce the false-positive results, then small cancers will pass through and be missed. Efficacious screening has to balance the benefits and the risks.
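
The trade-off described above can be made concrete with a small sketch. The ten lesions and their suspicion scores below are invented purely for illustration (they are not data from any trial), and the function name `screen` is hypothetical. Lowering the threshold catches more cancers at the cost of more false-positive workups; raising it does the reverse.

```python
# Hypothetical suspicion scores (0-10) for ten lesions; not real data.
# Each tuple is (score, is_cancer).
LESIONS = [
    (9, True), (7, True), (5, True), (3, True),
    (8, False), (6, False), (4, False), (4, False), (2, False), (1, False),
]


def screen(lesions, threshold):
    """Count outcomes when every lesion scoring >= threshold is worked up."""
    detected = sum(1 for score, cancer in lesions if cancer and score >= threshold)
    missed = sum(1 for score, cancer in lesions if cancer and score < threshold)
    false_pos = sum(1 for score, cancer in lesions if not cancer and score >= threshold)
    return {"detected": detected, "missed": missed, "false_positives": false_pos}


# A low threshold is a fine mesh; a high threshold is a coarse mesh.
for threshold in (3, 5, 7):
    print(threshold, screen(LESIONS, threshold))
```

With these made-up scores, a threshold of 3 finds all four cancers but also triggers four false-positive workups, whereas a threshold of 7 triggers only one false-positive workup but misses two cancers.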

Among the many requirements that have been suggested for ensuring that a cancer screening test is efficacious (11) are the following:

1. The cancer must be fairly common, so that the benefit to those with cancer will offset the inconvenience, cost, and harms that will be incurred by the many individuals who do not have the cancer.

2. The new test must be able to reveal the disease earlier in its growth than does the customary way in which the disease is found.

3. The cancer must have an effective treatment (either it is curable when found earlier, or there is treatment that can result in delayed death for a reasonable number of individuals).

4. The value of detecting the cancer at this earlier time outweighs the risks and costs generated by screening.
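
Requirement 1 can be illustrated with simple arithmetic. The sketch below assumes a hypothetical test with 90% sensitivity and 95% specificity applied to 10,000 people; these figures, the two prevalence values, and the function name are assumptions chosen for illustration, not values from this article.

```python
def expected_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected true-positive and false-positive counts in one screening round."""
    with_cancer = n_screened * prevalence
    without_cancer = n_screened - with_cancer
    true_pos = with_cancer * sensitivity
    false_pos = without_cancer * (1.0 - specificity)
    return true_pos, false_pos


# Hypothetical test characteristics (assumed, not measured).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

for label, prevalence in (("common", 0.005), ("rare", 0.0005)):
    tp, fp = expected_outcomes(10_000, prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: {tp:.1f} cancers found, {fp:.1f} false positives, "
          f"{fp / tp:.0f} false positives per cancer found")
```

Under these assumptions the number of false positives barely changes between the two scenarios, but a tenfold rarer cancer means roughly ten times as many false positives for every cancer found.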


    THE IDEAL SCREENING TEST FOR CANCER
Although there is no screening test that can achieve the following ideal criteria, these are, nevertheless, the goals that should be sought for any cancer screening test. An ideal screening test will (a) reveal all cancers at a time when they are curable, (b) yield no false-positive results, (c) yield no false-negative results, and (d) be harmless.


    DOES EARLIER DETECTION MEAN THAT THE COURSE OF THE DISEASE IS ALTERED?
One of the most difficult concepts to understand and accept is that merely finding a cancer earlier does not mean that the patient will benefit. Finding it earlier may not be early enough. Most solid cancers are thought to arise from a single cell. It is likely that, through damage or mutation, the DNA of the cell is altered, allowing unrestrained proliferation. Most cancers do not kill by merely destroying the organ in which they arise; they kill through metastatic spread that destroys other organs. Simply finding cancers earlier or at a smaller size may not alter the course of the disease. If a cancer has metastasized even before the new screening test can reveal it, then detecting the cancer earlier will have no life-saving benefit unless treatment can destroy early metastases. Unless a screening test results in a beneficial alteration in the natural history of the disease, it may not only be of no benefit but may also (if it causes harm to some of the healthy individuals being tested) clearly cause more harm than good.

Some cancers are indolent and may not affect an individual during his or her lifetime. Tumors have been found at autopsy in individuals who died from some other cause. Because these tumors never affected the individual during life, finding them would actually have been detrimental to the individual (resulting in unnecessary treatment, anxiety, etc). Furthermore, "competing" causes of death must also be taken into account. Screening someone for breast cancer who has chronic congestive heart failure and a life expectancy of less than 5 years is unlikely to result in a prolongation of the individual’s life. Thus, for a screening test to be efficacious, it must be shown that its use actually alters the natural history of the cancer in a way that is beneficial to the patient.

Furthermore, because most individuals who are screened will not actually derive any benefit from the test (most will not develop the cancer), the benefit for the few whose disease course is influenced by earlier detection must outweigh the harms that the test might cause to the others.


    INCREASED SURVIVAL DOES NOT PROVE THE EFFICACY OF SCREENING
Superficially, it would seem that there should be a benefit from a new test if it reveals cancers earlier than the usual means of detection and this results in longer survival. Survival (the survival period) is the time from the detection and diagnosis of a cancer to the individual’s death from the cancer. One patient with a cancer who survives for a longer period of time than another might be thought to have had more successful care (eg, earlier detection or better treatment). This may not be true, because several biases can confound the analysis of the results of survival studies. One of the major biases is known as lead time bias. The following example not only explains lead time bias but also is critical for understanding the use of the RCTs that are needed to overcome the various biases that confuse the interpretation of screening data.

Assume there are two individuals who are identical twins (A and B). Both are going to develop a cancer in the same organ on the same date, and these cancers are going to grow at the same rate. If left untreated, the cancers would kill the twins 15 years after the first cell began to proliferate. Five years after the cancer begins to grow, twin A undergoes a screening test. The cancer is detected before the twin has any symptoms, and the twin undergoes treatment, lives another 10 years, and then succumbs to the cancer. Since we usually only know the time from detection to the time of death (the survival time), the data would show that twin A "survived" 10 years.

Twin B does not want to undergo the cancer test, waits until symptoms develop, and then is diagnosed with cancer 8 years after the first cell began to proliferate. The cancer is treated, but the twin still dies 7 years later. Since twin B’s survival from diagnosis to death was 7 years, while twin A’s was 10 years, survival results suggest that twin A’s survival was 3 years longer than twin B’s, and twin A would appear to have derived a benefit from the screening test. However, adding up all the years, it is clear that there was no benefit from the test. Both actually died at the same time, 15 years after the cancer began. The screening test only made twin A aware of the cancer 3 years earlier, and finding the cancer earlier had no effect on the twin’s mortality. This has been termed lead time bias and is one of the reasons why comparing survival data—the time from diagnosis to the time of death—can be misleading. Lead time bias is the main reason why survival data are usually insufficient for proving a benefit from a screening test. It is not enough to show that cancers can be detected earlier. It must be shown that detecting them earlier prolongs life.
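
The arithmetic of the twin example can be written out as a short sketch. The numbers are those given in the text; the variable and function names are only illustrative.

```python
def survival_years(detected_at, died_at):
    """Survival as usually recorded: time from detection to death."""
    return died_at - detected_at


# Years are counted from the moment the first cell began to proliferate.
twin_a = {"detected_at": 5, "died_at": 15}   # screened
twin_b = {"detected_at": 8, "died_at": 15}   # diagnosed after symptoms

survival_a = survival_years(**twin_a)
survival_b = survival_years(**twin_b)

apparent_benefit = survival_a - survival_b                 # what survival data suggest
lead_time = twin_b["detected_at"] - twin_a["detected_at"]  # earlier awareness only
life_gained = twin_a["died_at"] - twin_b["died_at"]        # what actually matters

print(f"Survival: A = {survival_a} y, B = {survival_b} y")
print(f"Apparent benefit = {apparent_benefit} y, lead time = {lead_time} y")
print(f"Actual life gained = {life_gained} y")
```

The apparent 3-year benefit equals the 3-year lead time exactly, and the life actually gained is zero.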


    OVERDIAGNOSIS (OR PSEUDODISEASE) BIAS
There are other screening phenomena that can be misleading. For example, not all cancers are lethal. Some individuals who die from other causes are discovered, at autopsy, to have a cancer that never influenced them while they were alive. If a screening test had revealed these nonlethal cancers, it would have caused unnecessary anxiety. The individuals might have been subjected to treatments that they did not need, and the treatments might have been costly and harmful, without producing any benefit. This is a source of bias because, if these nonlethal cancers would never have been detected without screening, counting them makes the test appear to save lives when it is instead revealing cancers that would never have taken a life.

Mammographic screening, for example, reveals cancer while it is still in the milk ducts. This is known as ductal carcinoma in situ (DCIS). Because the majority of breast cancers almost certainly begin in the epithelial cells of the ducts (12), most breast cancers begin as intraductal lesions. However, it is unclear which intraductal cancers will progress to become invasive lesions. Because breast cancer can only become lethal by infiltrating into the tissue around the ducts so that it can gain access to the lymphatic vessels and vascular supply that will allow it to spread to other organs, DCIS is a nonlethal lesion. Because the cells of these in situ lesions look like the cells of invasive cancer, DCIS, until recently, has been treated like invasive cancer (13). For many years mastectomy was used to treat these very early lesions. On the basis of the uncertainty as to what percentage of DCIS lesions will progress to invasive cancer and the fact that it is likely that not all DCIS lesions will progress, many have criticized mammographic screening because it led to overdiagnosis and "overtreatment" (14).

There is still great uncertainty over the importance of these lesions. Given enough time, a large percentage may progress to invasive cancers, but it is likely that many will not (15). If DCIS is considered a "real" cancer, but few cases actually lead to lethal lesions, then finding DCIS with the screening test will bias the results. The test will appear to result in saving more lives by revealing "cancers" that would never have taken a life even if they were not discovered by the test. This has been called overdiagnosis bias.

Overdiagnosis can confuse the interpretation of data in the following way: Suppose that there are 100 cancers diagnosed each year before the introduction of a new screening test, and 50% of the individuals with this diagnosis die from their cancer. Then the screening test is used. The number of cancers diagnosed each year doubles, and it is found that the death rate among those diagnosed has been cut in half. On the surface, it would seem as if the screening test had provided a major benefit: a 50% reduction in deaths. Closer inspection could, however, reveal overdiagnosis bias. If none of the additionally diagnosed cancers had lethal potential, then they would contribute to the denominator of the ratio of deaths to the number of individuals with cancer, but they would not contribute to the numerator. Thus, the rate of death before screening would be 50/100, or 50%, while the rate of death after screening would be 50/200, or 25%. Superficially, it would seem that the use of screening had halved the risk of death, when in fact screening had only revealed more cancers that had no lethal potential, and the absolute number of deaths remained unchanged.
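
The same example can be checked in a few lines. The counts are those used in the text; the names are only illustrative.

```python
def death_rate_among_diagnosed(deaths, diagnosed):
    """Deaths divided by the number of individuals diagnosed with cancer."""
    return deaths / diagnosed


diagnosed_before = 100
deaths_before = 50

extra_nonlethal = 100                      # cancers found only because of screening
diagnosed_after = diagnosed_before + extra_nonlethal
deaths_after = 50                          # none of the extra cancers is lethal

rate_before = death_rate_among_diagnosed(deaths_before, diagnosed_before)
rate_after = death_rate_among_diagnosed(deaths_after, diagnosed_after)

apparent_reduction = 1 - rate_after / rate_before
deaths_prevented = deaths_before - deaths_after

print(f"Death rate before: {rate_before:.0%}, after: {rate_after:.0%}")
print(f"Apparent reduction: {apparent_reduction:.0%}")
print(f"Deaths actually prevented: {deaths_prevented}")
```

Only the denominator changed, so the rate falls by half while the number of deaths prevented is zero.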


    LENGTH BIAS SAMPLING
Another problem with screening is that it is usually performed on a periodic basis. An understanding of how screening works makes it apparent that periodic testing has a greater chance of detecting slower-growing cancers than faster-growing cancers. Assume that a screening test is performed today and there are two individuals with cancers that are just below the threshold of detection so that neither cancer is detected. One is a fast-growing, more lethal cancer and the other is slower growing and indolent. The first cancer grows so quickly that it becomes clinically apparent in the interval before the next screening test (ie, the individual will develop a sign or symptom that indicates the presence of the cancer). The slower-growing cancer does not become clinically apparent before the next screening test, at which time it is detected with the test. Thus, when individuals whose cancers were detected with the screening test are compared with those whose cancers were not detected with the screening test, the individuals with screening-detected cancers have better survival simply because they have more indolent cancers. The fact that periodic screening is more likely to detect slower-growing, more indolent cancers is known as length bias sampling.
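
A deliberately simplified sketch can show why periodic screening preferentially samples slow-growing cancers. It assumes, hypothetically, a 12-month screening interval, a fast tumor that is detectable for 6 months before it causes symptoms, a slow tumor that is detectable for 48 months, and equal numbers of each type arising evenly through the interval. None of these values comes from the article.

```python
SCREEN_INTERVAL = 12  # months between screens (assumed)

# Assumed number of months a tumor is detectable by the test
# before it would produce signs or symptoms.
SOJOURN = {"fast": 6, "slow": 48}


def tally(interval, sojourn):
    """Classify tumors by how they come to attention.

    One tumor of each type crosses the detection threshold in every month
    between two screens (months 0 .. interval - 1 after the first screen).
    """
    counts = {"screen": {"fast": 0, "slow": 0},
              "interval": {"fast": 0, "slow": 0}}
    for kind, months_detectable in sojourn.items():
        for onset in range(interval):
            # Still silent when the next screen arrives: screen-detected.
            if onset + months_detectable > interval:
                counts["screen"][kind] += 1
            else:
                counts["interval"][kind] += 1
    return counts


counts = tally(SCREEN_INTERVAL, SOJOURN)
screen_detected = counts["screen"]
slow_share = screen_detected["slow"] / (screen_detected["fast"] + screen_detected["slow"])

print(counts)
print(f"Slow-growing share of screen-detected tumors: {slow_share:.0%}")
```

Although fast and slow tumors arise equally often in this sketch, 12 of the 17 screen-detected tumors (about 71%) are slow growing, and every interval cancer is fast growing.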


    SELECTION BIAS
Another factor that can compromise the validity of a study of screening is selection bias. For example, volunteers can be asked to participate in a study. People who take special interest in their health (and volunteer for studies) tend to be healthier than the average individual. The results of the new test may appear to show a benefit, but the results may be due to the fact that the volunteers are healthier to begin with and might have better results for this reason. Conversely, a study may reveal no benefit if only individuals who already have a clinically evident problem are in the study. Women with advanced cancers, for example, will not benefit from screening. Allowing these women to participate in a screening study, as happened in the Canadian National Breast Screening Study, can bias the results (16).


    RCT IS THE ONLY ACCEPTED METHOD FOR VALIDATING THE EFFICACY OF A SCREENING TEST
For all of the reasons described above, it is generally not possible to rely on survival data or even mortality data from screening trials that lack an unscreened control group. It may be impossible to eliminate all forms of bias, but the only way to eliminate those listed above is through the use of RCTs.

So that we can understand an RCT, let us return to the example of the twins with cancer. The only way we would have known that twin A actually benefited from screening was if this twin had actually outlived twin B. Since in that situation the only difference between the two individuals was that one had been screened and the other had not, then we could attribute the longevity of the first twin to screening.

Because twins like those described above do not exist in real life, the relationship must be simulated by using the laws of probability. If a large enough number of individuals are randomly separated into two groups, then every individual in one group will have a "twin" in the other. In other words, for every individual who is destined (assuming no intervention is performed) to die at a certain time from a certain type of cancer in one group, there will be a twin in the other group who is destined to die from the same type of cancer at the same time in the future. If a new intervention (like screening) works, then, as the two groups are followed up over time, there will be fewer deaths among the study (screened) group than among the control (unscreened) group. If the numbers are large enough, or if the reduction in death is large enough, that the difference cannot easily be attributed to chance, then the difference is said to be statistically significant, and the efficacy of the test has been demonstrated.
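
The claim that random allocation distributes the "twins" evenly can be illustrated with a small simulation. The population size, the number of people destined to die, and the seed are hypothetical choices made only for this sketch.

```python
import random


def randomize(population, rng):
    """Randomly split a population into study and control groups of equal size."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical population: True marks a person destined (without
# intervention) to die of the cancer during follow-up.
N_PEOPLE = 100_000
N_DESTINED = 1_000
population = [True] * N_DESTINED + [False] * (N_PEOPLE - N_DESTINED)

rng = random.Random(2003)  # fixed seed so the sketch is repeatable
study, control = randomize(population, rng)

print("destined deaths in study group:  ", sum(study))
print("destined deaths in control group:", sum(control))
```

Both counts should land close to 500, so a clearly lower number of deaths later observed in the study group could reasonably be attributed to screening rather than to an imbalance at the start.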

It can easily be appreciated that, even if the cancer is fairly common, the numbers of individuals that need to be included in a trial must be enormous. There have to be enough individuals so that, over the course of the trial, there will be enough participants who develop the cancer that is being studied and enough individuals who will die of the cancer such that the reduction in deaths from the intervention will be statistically significant. The greater the benefit from screening, the smaller the number of study participants needed to prove the benefit. If the actual benefit is small, then a much larger study is needed.
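
How strongly the required trial size depends on the size of the benefit can be sketched with the standard normal-approximation formula for comparing two proportions. This is a textbook approximation, not the method of any particular trial. The 0.4% risk of death in the control group, the two-sided 5% significance level, and the 80% power are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(control_death_risk, relative_reduction,
                           alpha=0.05, power=0.80):
    """Approximate group size needed to detect a given mortality reduction."""
    p_control = control_death_risk
    p_study = control_death_risk * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_study * (1.0 - p_study)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_study) ** 2)


CONTROL_DEATH_RISK = 0.004  # assumed risk of dying of the cancer during the trial

for reduction in (0.30, 0.15):
    n = participants_per_group(CONTROL_DEATH_RISK, reduction)
    print(f"{reduction:.0%} reduction: about {n:,} participants per group")
```

Under these assumptions, halving the expected benefit from 30% to 15% raises the required group size from the tens of thousands to well over one hundred thousand.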


    BLINDED RANDOMIZATION IS CRITICAL
Although RCTs are the best way to determine whether or not there is a benefit from an intervention such as a screening test, RCTs are not always perfect. There are very strict rules that need to be applied in their design, execution, and analysis, or they run the risk of being compromised. One of the critical elements of a trial is that the allocation of individuals to either the study group or the control group must be truly random. Otherwise, biases can enter into the trial that compromise its results. Random allocation is critical to ensure the equal distribution of "twins" between the two groups. To achieve a random distribution of participants, allocation must be completely blinded. Those performing the allocation must know nothing about the participants so that there is no way to consciously or otherwise compromise the random allocation. Obviously, assigning individuals with small cancers to the screened group and/or individuals with large, more advanced cancers to the control group will bias the results in favor of the screening test, while the reverse situation will show that the screening test has no benefit or even has a detrimental effect.

Nonblinded randomization compromised the results of the Canadian National Breast Screening Study (8,17). Women with advanced breast cancers were placed in greater numbers into the screened group than into the control group. The trial was compromised because women who already had advanced breast cancers and thus could not benefit from screening were allowed to participate in a screening trial. Each of the women first underwent a clinical breast examination (so that those with lumps were identifiable), and then the allocation to one group or the other was performed with open lists, making it possible to just skip a line on the open lists to place a woman with a lump in the screened group. Obviously, this compromises the results of the trial.

One measure of how well the random allocation process went is to see if the demographic features (eg, age distribution, family income, number of children) of the two groups are the same. However, with a compromise such as that in the Canadian National Breast Screening Study, shifting a few women with advanced cancer from one group to the other will have a dramatic impact on the mortality rates of the groups but will have no influence on the overall demographics because the other thousands of women involved will dilute the effect. Randomization must be completely blinded.
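The dilution argument above can be made concrete with a small calculation. The sketch below uses entirely hypothetical numbers (the group sizes, ages, and death counts are assumptions for illustration and are not data from any trial). It shows how moving a handful of women who are destined to die from the control group into the screened group changes the death-rate ratio substantially while leaving a demographic summary such as mean age almost untouched.

```python
# Hypothetical illustration: none of these numbers come from a real trial.

def shift_women(n_per_group, mean_age, deaths_per_group, n_shifted, shifted_mean_age):
    """Move n_shifted women with advanced cancer (all assumed to die of it)
    from the control group into the screened group and report the effect."""
    screened_n = n_per_group + n_shifted
    control_n = n_per_group - n_shifted
    screened_deaths = deaths_per_group + n_shifted
    control_deaths = deaths_per_group - n_shifted

    # Mean age of each group after the shift.
    screened_age = (n_per_group * mean_age + n_shifted * shifted_mean_age) / screened_n
    control_age = (n_per_group * mean_age - n_shifted * shifted_mean_age) / control_n

    # Ratio of death rates (screened divided by control); 1.0 means no difference.
    rate_ratio = (screened_deaths / screened_n) / (control_deaths / control_n)
    return rate_ratio, screened_age - control_age


rate_ratio, age_gap = shift_women(
    n_per_group=25_000, mean_age=45.0, deaths_per_group=40,
    n_shifted=10, shifted_mean_age=49.0,
)
print(f"death-rate ratio (screened/control): {rate_ratio:.2f}")
print(f"difference in mean age: {age_gap:.4f} years")
```

In this toy setting, a check of the demographic balance would not flag a problem even though the mortality comparison has been badly distorted.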


    STATISTICAL POWER
 
Before a study begins, estimates must be made as to how many participants will be needed in the trial. The statistical power of a projected study is determined ahead of time so that sufficient numbers of participants will be included. This is done by predicting the expected benefit (ie, percentage reduction in deaths) and then calculating how many participants will be needed to endow the study with a high probability of showing benefit with given estimates of cancer incidence and expected death rates. Since RCTs are very expensive, these estimates are often used to choose the lowest number of participants that is likely to show the expected benefit. The "power" of data to "prove" a benefit is also used to determine the success of a trial after the study has been concluded. If the estimated benefit proves to have been too high and the actual benefit is lower, the trial will not be able to show statistically significant results and a real benefit may be overlooked (18,19).

The statistical power of a study is critical. Failure to appreciate its importance has confused many analysts. If a study includes just enough individuals to show a benefit for the entire population that participated in the trial, then a retrospective analysis of data from a subgroup within the trial may not show benefit merely because there were not enough individuals in that group to permit a statistically significant result. This is precisely what happened in the breast cancer screening trials in which data in women aged 40–49 years were analyzed, retrospectively, as a separate group (20). There were far too few women in that age group to demonstrate significant results in the early years of follow-up (20).

The fact that the benefit did not achieve statistical significance (because the trials did not include sufficient numbers of women to be able to show a significant benefit in this subgroup in the early years of follow-up) was misinterpreted by some analysts as meaning that there was no benefit. Following up the women for a longer period of time meant that there were more deaths from breast cancer and more patient years such that the benefit became significant (21).

Obviously, if numbers were not critical, then only very small trials would be needed. Retrospective subgroup analysis of data that lack the statistical power to permit accurate analysis should only be used to raise the next research question; it is scientifically unjustified to use these analyses to make medical recommendations. Investigators who wish to analyze data by subgroups (eg, age, sex) must plan at the outset to be sure that there will be sufficient numbers of participants in these groups to permit legitimate analysis.
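The dependence of trial size on the expected benefit can be sketched with a standard normal-approximation formula for comparing two proportions. This is one common textbook method and is not necessarily the calculation used in any of the screening trials; the control-group death risk of 0.4% and the function name are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(control_risk, relative_reduction, alpha=0.05, power=0.80):
    """Approximate number of participants needed in EACH group to detect a
    given relative reduction in deaths (two-sided test, normal approximation)."""
    p1 = control_risk                               # death risk without screening
    p2 = control_risk * (1.0 - relative_reduction)  # death risk with screening
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical planning numbers: 0.4% of control participants die of the cancer.
planned = participants_per_group(control_risk=0.004, relative_reduction=0.30)
actual = participants_per_group(control_risk=0.004, relative_reduction=0.15)
print(f"needed per group if the benefit is 30%: {planned}")
print(f"needed per group if the benefit is 15%: {actual}")
```

Under these assumptions, halving the true benefit more than quadruples the number of participants required, which is why a trial sized for an optimistic estimate, or a subgroup carved out of a trial sized for the whole population, can easily fail to reach significance even when a real benefit exists.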


    STATISTICAL SIGNIFICANCE
 
As is well known, many events are due to chance alone. A coin can be flipped and turn up heads repeatedly, but the chance of this continuing to happen diminishes with an increasing number of flips. Similarly, in a screening trial, there may be fewer deaths in one group or the other on the basis of chance alone. When the balance shifts back and forth, it is termed statistical fluctuation and is usually due to the small numbers involved. It is particularly evident in the early years of follow-up in screening trials, when the number of deaths is small. Early on, there can even appear to be more deaths among the screened women due to statistical fluctuation. This occurred in some of the mammography trials (22,23), but if there is a real benefit and the numbers are sufficiently large (usually with longer follow-up), the "truth" will appear beyond chance.

There are mathematical formulas that are used to estimate the likelihood that a result is due to chance. These "tests of significance" are based on varying assumptions, but the basic principle behind them can be seen in the following example: If we are told that there were seven cancer deaths in the screened group and 10 cancer deaths in the unscreened control group, most of us would say that seven versus 10 was not a big difference and could easily be due to chance. However, if there were 70 deaths in the screened group and 100 deaths in the control group, even though the ratio is exactly the same, we would be more inclined to believe that there was a real benefit on the basis of the larger numbers. "Statistical significance" is merely a way of showing this mathematically.
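The seven-versus-10 and 70-versus-100 comparison can be checked with one simple test of significance. The sketch below assumes that the two groups are the same size and are followed for the same length of time (the text does not give group sizes), so that under the "no benefit" hypothesis each death is equally likely to fall in either group. It then uses an exact binomial test, which is only one of several tests that could be applied.

```python
from math import comb


def two_sided_p(deaths_screened, deaths_control):
    """Exact two-sided binomial p value for the split of deaths between two
    equally sized groups, assuming each death is equally likely to occur in
    either group when screening has no effect."""
    total = deaths_screened + deaths_control
    smaller = min(deaths_screened, deaths_control)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(total, k) for k in range(smaller + 1)) / 2 ** total
    return min(1.0, 2.0 * tail)


p_small = two_sided_p(7, 10)
p_large = two_sided_p(70, 100)
print(f"7 vs 10 deaths:   p = {p_small:.3f}")
print(f"70 vs 100 deaths: p = {p_large:.3f}")
```

With the same ratio of deaths, the smaller comparison is well within what chance alone would produce, while the larger one falls below the conventional 0.05 threshold.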


    POINT ESTIMATES AND THE TIME TO SHOW A BENEFIT
 
Although the numbers keep changing as the populations being studied are followed over time (deaths from most cancers continue to occur over time and not all at once), the data must be analyzed at some point (or points) in time. These are called the point estimates. If the data are analyzed too soon, the number of deaths will be small, and the differences between the two groups may be small. Consequently, the early point estimates may not be significant, producing misleading analyses. Even experts have overlooked this important fact (24).

If we return to the basis of RCTs and the analogy of the twins, the problem is clear. For a benefit to be shown for a screened individual who would have died without screening, that individual’s "twin" in the unscreened group must die from the cancer. If the screening test enables the interruption of only moderately growing and slow-growing but lethal cancers, it may take many years for the control twin to die and the benefit to be revealed. Thus, the follow-up time is very important. A benefit may be overlooked if the follow-up is too short. In fact, an "early benefit" would be difficult to explain. For an early benefit to appear in a cancer screening trial, the cancer in the screened "twin" would have to be detected and treated just before it metastasized, while the cancer in the unscreened "twin" would have to metastasize soon after and kill the unscreened twin fairly quickly. On the basis of the fact that length bias sampling means that periodic screening is more likely to interrupt moderately growing and slow-growing cancers, an early benefit soon after the start of screening would actually be unlikely.


    NONCOMPLIANCE AND CONTAMINATION
 
Factors that can confuse results are called confounding factors. These can greatly weaken the power of a trial. For example, there is usually no way to prevent individuals who are supposed to be in the unscreened control group from going out on their own and having the screening test. Even if an individual’s life is saved by this, the proper analysis of RCT results requires that this person be counted with the unscreened group. This "contamination" weakens the ability of the trial to show a benefit, and larger numbers of participants are needed to compensate (25). Similarly, some individuals in the screening group refuse to be screened. Nevertheless, to avoid introducing a bias, these individuals must be counted as having been screened even if they die from the cancer. This is called noncompliance, and increased numbers of participants are needed to overcome the dilutional effects of this confounding factor.

Although counting noncompliers with the screened group and counting contaminators as if they were unscreened controls seems nonsensical, it is necessary because not doing so could introduce major biases. If, for example, individuals who were destined to have poor-prognosis cancers refused screening and were consequently not included in the study group data, this could make screening appear successful when it was merely the fact that these individuals with bad cancers had refused to participate.

Because women in the mammographic screening trials were not forced to undergo screening if they were allocated to the screened group and because those allocated to the control group were not prevented from being screened, the mammography trial results are reported as showing a benefit for women who were "invited" to be screened as opposed to those who actually were screened. Many women who died from breast cancer in the screened group had actually refused screening, while some in the unscreened control group may have been saved because of mammograms obtained outside the screening trial. Because proper analysis requires that such women still be counted with the group to which they had been allocated, it is fairly certain that the mammographic screening trial results represent an underestimation of the true benefit from screening.
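The way in which noncompliance and contamination dilute the measured benefit can be sketched with simple arithmetic. The model below is a deliberate simplification: it assumes that women who refuse screening and women who seek it outside the trial have the same underlying risk as everyone else, which, as noted above, may not hold in practice. The compliance and contamination percentages are hypothetical.

```python
def observed_reduction(true_reduction, compliance, contamination):
    """Relative reduction in deaths seen when groups are compared by allocation.

    true_reduction: reduction in deaths among women who are actually screened
    compliance:     fraction of the invited group that is actually screened
    contamination:  fraction of the control group screened outside the trial
    """
    # Death risk of each group relative to a completely unscreened population.
    invited_risk = 1.0 - compliance * true_reduction
    control_risk = 1.0 - contamination * true_reduction
    return 1.0 - invited_risk / control_risk


# Hypothetical: screening prevents 40% of deaths among women actually screened,
# 70% of invited women attend, and 20% of controls are screened elsewhere.
diluted = observed_reduction(true_reduction=0.40, compliance=0.70, contamination=0.20)
print(f"benefit among women actually screened: 40%")
print(f"benefit measured for invitation to screening: {diluted:.0%}")
```

Under these assumptions the trial reports a benefit of roughly half the true size, which is consistent with the point that results based on invitation underestimate the benefit for women who are actually screened.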


    SCREENING INTERVAL
 
As noted earlier, length bias sampling is the concept that periodic screening is more likely to reveal moderately growing and slower-growing cancers than it is to reveal fast-growing cancers. Simply put, if the time between screening tests (screening interval) is too long, then a faster-growing cancer may be too small to be detected at one screening and will grow to a clinically detectable size before the next screening test.

In general, increasing the time between screening tests increases the number of cancers that become evident between screening tests (interval cancers). This means that the value of the screening test is diminished because an increasing number of individuals gain no benefit from the test. Because the cost of screening and the secondary costs that it creates (eg, for additional testing, biopsies) represent major challenges to its use, the goal is to try to determine the optimal screening interval (ie, lives saved vs time between screening examinations). At some point there is insufficient incremental value to justify decreasing the time between screening examinations (few if any additional lives are saved beyond those saved with the longer interval). It is often the cost of the test (including harms to the patients as well as economic cost) that dictates a compromise in which health planners support a longer interval than would be medically ideal (an ideal interval being one that results in the greatest number of lives saved).

When a new test is introduced that reveals cancers earlier than they would be revealed without the test, the first round of screening will not only reveal the cancers that have just reached the threshold where the test can demonstrate them but will also reveal cancers that have been accumulating in the population because the old method could not reveal the cancers until they reached an even larger size. Thus, in the first round of screening (prevalence screening), many more cancers will be detected than will be found at subsequent screening examinations if the time between screening examinations is not too long. If screening is repeated, and the time between screening examinations is appropriate, then most of the next group of cancers detected will have just entered the detectable phase. These new cancers are the incident cases that are newly discovered after each screening interval.

The period of time during which a cancer is detectable with a test before it is clinically evident is called the sojourn time. If, for example, a cancer becomes palpable at 1.5 cm, is detectable with mammography at 0.5 cm, and takes 2 years to grow from 0.5 to 1.5 cm, then the sojourn time is 2 years. If screening takes place every 3 years, then many cancers that were not detected at one screening examination will already have become palpable before the next screening examination, and the screening test will be less effective than it would have been if the screening interval were shorter. If there are too many cancers that become evident in the interval between screening examinations (ie, the screening interval is too long), then the screening test will have little value. Performing more frequent screening is generally better than having longer intervals between screening examinations, but the practical "best" time between screening examinations is a balance between the most effective period for reducing deaths through early detection and the cost of performing more frequent testing. To intercept cancers earlier, the screening interval should be less than half the sojourn time (26).
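The relationship between sojourn time and screening interval can be sketched with a deliberately idealized model. It assumes that every cancer has exactly the same sojourn time, that the test detects every cancer that is in its detectable phase, and that cancers enter the detectable phase at a uniform rate over time. Real cancers vary in growth rate and no test is perfectly sensitive, so the numbers are illustrative only.

```python
def interval_cancer_fraction(sojourn_years, interval_years):
    """Fraction of cancers that become clinically evident between screens."""
    if interval_years <= sojourn_years:
        return 0.0  # every cancer meets a screen while still preclinical
    return 1.0 - sojourn_years / interval_years


def mean_lead_time(sojourn_years, interval_years):
    """Average time by which detection is advanced, per cancer.
    Interval cancers contribute a lead time of zero."""
    if interval_years <= sojourn_years:
        return sojourn_years - interval_years / 2.0
    return (sojourn_years / interval_years) * (sojourn_years / 2.0)


sojourn = 2.0  # years, as in the example in the text
for interval in (1.0, 2.0, 3.0):
    missed = interval_cancer_fraction(sojourn, interval)
    lead = mean_lead_time(sojourn, interval)
    print(f"interval {interval:.0f} y: interval cancers {missed:.0%}, "
          f"mean lead time {lead:.2f} y")
```

With a 2-year sojourn time and screening every 3 years, one-third of the cancers in this model surface between screens, and the average lead time falls well below what a shorter interval would achieve.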


    THRESHOLDS FOR INTERVENTION
 
As pointed out earlier, the ideal cancer screening test would yield no false-positive results (when the test suggests a possible cancer but the finding proves to be benign) and no false-negative results (when the test results suggest that there is no cancer present when there actually is cancer present that is undetected with the test). A high false-positive rate due to an aggressive interpretation of the test results may be undesirable because it increases the costs that result from screening (because these lesions require additional evaluation). This also increases the harms of screening because some (if not many) of the false-positive cases will require a biopsy to show that they are not cancer. However, waiting for a cancer to develop more suspicious characteristics may allow it to spread before it is detected. An aggressive approach may be the only way to detect cancers at an early enough time to alter the natural history of the tumor. Just as the screening interval may influence the value of the test, the thresholds that are set for intervention will influence the percentage of cancers detected (ie, the sensitivity of the test) and the lives saved.

We are unaware of a screening test that does not yield false-positive and false-negative results. Probably the best-known cancer screening test is the Papanicolaou (Pap) test for cancer of the uterine cervix. Depending on how the test is interpreted, as many as 10% of women will have falsely positive smears and will be recalled for additional testing that turns out to yield negative results. This is usually due to the fact that for all screening tests there is often an overlap in characteristics between benign and malignant lesions. Some lesions (eg, spiculated masses at mammography) have a very high probability of being cancer, while others (eg, well-circumscribed masses at mammography) have a low probability of being cancer, but a small risk nonetheless. If the goal is to find all cancers, then the threshold for intervention must be low and the false-positive call rate will be high. If the goal is to minimize the false-positive call rate, then the threshold for intervention is high and cancers will be allowed to slip through the screening test (27). Until a perfect screening test is developed, there is usually no way to avoid this relationship. One of the goals in developing a screening test should be to develop methods that increase the sensitivity (ie, cancers detected with the test divided by the actual number of cancers that prove to be in the population) of the test for small cancers while keeping the specificity (ie, the number of individuals without cancer whose test results are interpreted as negative divided by the total number of individuals who actually do not have cancer) as high as possible.
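The trade-off between sensitivity and specificity can be illustrated by applying two different intervention thresholds to the same set of findings. In the sketch below, each finding carries a hypothetical "suspicion score" between 0 and 1; the scores, the thresholds, and the function name are invented for illustration and do not correspond to any real scoring system.

```python
def sensitivity_specificity(findings, threshold):
    """findings: list of (suspicion_score, is_cancer) pairs.
    A finding is called positive when its score is at or above the threshold."""
    true_pos = sum(1 for score, cancer in findings if cancer and score >= threshold)
    false_neg = sum(1 for score, cancer in findings if cancer and score < threshold)
    true_neg = sum(1 for score, cancer in findings if not cancer and score < threshold)
    false_pos = sum(1 for score, cancer in findings if not cancer and score >= threshold)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical findings: benign and malignant lesions overlap in appearance.
cancers = [(0.90, True), (0.80, True), (0.60, True), (0.35, True)]
benign = [(s, False) for s in (0.70, 0.50, 0.40, 0.30, 0.20, 0.10, 0.10, 0.05)]
findings = cancers + benign

for threshold in (0.30, 0.65):
    sens, spec = sensitivity_specificity(findings, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.0%}, specificity {spec:.1%}")
```

With these made-up scores, the low threshold finds every cancer but recalls half of the benign findings, while the high threshold recalls few benign findings but lets half of the cancers slip through.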


    HOW TO MEASURE THE BENEFIT: DISEASE-SPECIFIC OR ALL-CAUSE MORTALITY?
 
One of the criticisms raised by Gotzsche and Olsen (28) concerned the method used in the mammographic screening trials to measure benefit. In trials that are performed to evaluate new chemotherapeutic agents, the absolute number of deaths from any cause among those in the study group is compared with the number of deaths from any cause in the control group. Measuring deaths from all causes (all-cause mortality) enables one to avoid any inadvertent biases that might arise in determining the cause of death.

A drug might reduce the death rate for the targeted disease but actually lead to more deaths from what would appear to be an unrelated cause. For example, use of a particular chemotherapeutic agent reduces the death rate from breast cancer, but if the doses are too high, it causes heart damage that can lead to death. A trial that only evaluated deaths from breast cancer would miss these deaths from cardiac causes, whereas a trial that measured deaths from all causes would demonstrate these deaths. The screening trials were faulted for not comparing all-cause mortality but instead comparing deaths caused by the specific disease being studied (disease-specific mortality).

The reason for this is quite straightforward. Screening trials differ from therapeutic trials in that in the latter, all of the participants have the disease, and the disease will probably cause most of the deaths in the trial. The use of all-cause mortality to measure benefit is practical, and the trials do not require enormous numbers. In screening trials, however, most of the participants will not die. Most will not develop the cancer being studied, and most of the deaths among participants will not be due to the disease being studied but will instead be due to the multiple other causes of death that occur in a general population each year. Since, for example, breast cancer accounts for only a small percentage of deaths each year, it would take huge numbers of participants to show that a reduction in deaths from breast cancer results in a decrease in all-cause mortality. Even if breast cancer accounted for 10% of all the deaths in the general population each year (the actual number is approximately 3%), it has been estimated that a trial would require more than 3 million participants so that a 25% decrease in breast cancer deaths would be reflected as a statistically significant decrease in all-cause mortality (29).
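The dilution described above can be sketched numerically. The first function is simple arithmetic from the percentages in the text. The second uses Schoenfeld's approximation for the number of deaths needed in a two-group comparison with equal allocation, which is a standard formula swapped in here for illustration. It is not the method behind the estimate cited as reference 29 and is not expected to reproduce that figure, because it counts deaths rather than participants.

```python
from math import ceil, log
from statistics import NormalDist


def all_cause_reduction(disease_share_of_deaths, disease_death_reduction):
    """Relative reduction in all-cause mortality when only deaths from one
    disease are reduced and all other causes of death are unchanged."""
    return disease_share_of_deaths * disease_death_reduction


def deaths_needed(relative_reduction, alpha=0.05, power=0.80):
    """Approximate total number of deaths (both groups combined) needed to
    detect a relative reduction in the death rate (Schoenfeld approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    return ceil(4.0 * z ** 2 / log(1.0 - relative_reduction) ** 2)


for share in (0.03, 0.10):
    reduction = all_cause_reduction(share, 0.25)
    print(f"breast cancer = {share:.0%} of deaths: all-cause mortality falls by "
          f"{reduction:.2%}; deaths of any cause needed: {deaths_needed(reduction)}")
print(f"breast cancer deaths needed for the disease-specific end point: "
      f"{deaths_needed(0.25)}")
```

Under these assumptions, a 25% reduction in breast cancer deaths shrinks to a change of less than 1% in all-cause mortality, and the number of deaths that must accrue before such a small change can be distinguished from chance is several orders of magnitude larger than for the disease-specific end point.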

If all-cause mortality is the required measure, then screening trials would become prohibitively large, expensive, and impossible to perform. Consequently, screening trials are designed to measure deaths from the disease being studied. This reduces the size of the trial that is needed to show a benefit, but it places an extra burden on researchers conducting screening trials to ensure that the determination of the cause of death is not biased. If, for example, a participant in a breast cancer screening trial dies with metastatic disease but also heart disease, it might not be clear what actually caused her death. If she had been allocated to the screened group and a reviewer was biased toward screening as being beneficial, her death might be attributed to heart disease so that it would not be counted as a breast cancer death. The opposite might occur if the reviewer was biased against screening. The only way to avoid the potential for bias is to be sure that the review to determine the cause of death is completely blinded as to the allocation of the patient. This is critical if the results of the trial are to be credible. Those who conduct trials must also be alert to unexpected causes of death that might occur as a result of the testing but might not be due to the disease for which the screening is being performed.


    UNEXPECTED CONSEQUENCES OF SCREENING
 
Another problem that must be addressed by screening trials is the possibility that the test itself, or the consequences of a positive test result, might lead to deaths from an unanticipated, unexpected, and even unrecognized consequence of the screening. For example, at the time that many of the breast cancer screening trials were underway, many patients who were treated for breast cancer had substantial cardiac radiation exposure from the therapy. This led to coronary artery disease and even death in a small percentage of women (30,31). This problem was not recognized for many years, so these deaths were not attributed to screening. Although the deaths from breast cancer were decreased in the screening trials, concern was raised that there might have been an increase in deaths from this form of coronary artery disease that would negate the benefit. It was feared that the decrease in breast cancer deaths might even be offset by cardiac deaths but that this would not be reflected in the data analysis. Fortunately, only about 5% of the women who were irradiated were affected (30,31), and since women with breast cancer were exposed to the same treatments regardless of their screening allocation, this probably played little role in the results. (Modern radiation therapy avoids this complication.)

Nevertheless, this example points out that unanticipated factors might significantly influence the results of a trial. Even if all-cause mortality is not used to measure benefit, it should be evaluated in screening trials, and deaths from other causes should be compared between the two groups to try to minimize the possibility of overlooking unexpected negative consequences of screening.


    GENERALIZABILITY
 
One of the basic questions that also must be answered is whether or not the results of any study apply only to the population that participated in the study or if they can be "generalized" to the population as a whole. For example, if a screening test is shown to be efficacious for a high-risk group, is it efficacious for the rest of the population that is not at high risk? If the population studied consisted entirely of volunteers and not a random cross section of the population, then the results may not apply to the population as a whole. In general, it is better to test within the population for which the screening test is planned and to perform an RCT with randomly assigned members of this population rather than with a group that is not representative of the population that will ultimately be offered the screening test.

Once a test has been shown to be efficacious in an RCT, then the final confirmation of its value is established by monitoring the disease in the population when the test is introduced into general practice. The ultimate test of efficacy is satisfied if, when the screening examination is introduced into the general population, deaths from the cancer decrease as a result. The widespread use of cervical cancer screening is an extreme example of this. The reason that cervical cancer screening has been accepted, despite the fact that there has never been an RCT to prove its efficacy, is that when it has been applied in populations, the death rate from cervical cancer has decreased in those populations, and when it was withdrawn, the death rate increased (32,33).

The benefit from mammographic screening has been shown not only in RCTs but also in large populations outside of trials. In the two counties that participated in the Swedish Two County Trial (34), the breast cancer death rate among resident women was more than 50% lower after screening was offered than before it was offered. The review of data from the two counties also showed that women who were not offered screening during the Two County Trial and women who refused the general offer of screening in the most recent years had the same breast cancer death rate as women prior to any screening. Although not conclusive proof, this suggests that most of the decrease in deaths was due to screening and not to changes in therapy.

In another study of seven counties in Sweden (comprising 30% of the population of the country), a 30%–44% decrease in breast cancer deaths was attributed to mammographic screening (35). In the United States, the death rate from breast cancer suddenly began to decrease, for the first time in 50 years, in 1989. This was 5–7 years after screening began for large numbers of American women, as evidenced by the sudden increase in cancers being detected beginning around 1983–1985. The relationship between the start of screening and mortality reduction (as well as the suddenness of both) is a marker that strongly suggests that screening is responsible for much of the decrease in breast cancer deaths in the United States (36).

For all of these population reviews, it is important to realize that the analyses are based on the entire population and the rate of deaths from breast cancer (ie, the number of deaths per 100,000 women in the population) so that they are independent of the amount of breast cancer being diagnosed, thus avoiding pseudodisease bias.


    BEWARE OF THE ARTIFICIAL GROUPING OF DATA
 
One of the major controversies that arose in the breast cancer screening trials occurred when investigators wanted to see if menopause had any influence on the benefit that appeared in the trials. Because the age of menopause is difficult to determine, the analysts decided to use the age of 50 years as the average age of menopause and evaluated women aged 40–49 years and compared them with women aged 50 years and older. It turns out that there are no screening results that change abruptly at any age, and certainly not at the age of 50 (37). Most parameters of screening (call-back rates, biopsy rates, and yield of cancer) tend to change steadily with increasing age (38). Through age grouping, the data were made to appear as if there was a sudden change at the age of 50 years, when it was merely an artifact of the dichotomous analysis (39). As a consequence, the age of 50 years was promoted as having some biologic importance that it did not have, and investigators argued that screening should focus on women 50 years and older and not include those younger than 50 years (40), when this argument actually could be made for any age at which the analysis was repeated.

An example of age grouping bias can be seen in evaluating the percentage of individuals with gray hair. There is a steady increase in the percentage of individuals with gray hair with increasing age, but if those numbers were to be analyzed by comparing those younger than 42 years with those 42 years and older, it would appear that there is a sudden increase in the number of individuals with gray hair at the age of 42, making it appear that something must be happening at that age. Analysts must be very careful that their methods of analysis do not bias the results of a trial.
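The gray hair example can be reproduced with a few lines of code. The sketch below assumes a purely hypothetical population with equal numbers of people at every age from 20 to 69 years and a percentage with gray hair that rises by exactly 2 points per year of age. Nothing changes abruptly at any age, yet splitting the population into two groups produces a large apparent jump wherever the cut is placed.

```python
AGES = range(20, 70)  # hypothetical population, same number of people at each age


def percent_gray(age):
    """Hypothetical steady trend: 2 percentage points more gray hair per year."""
    return 2.0 * (age - 20)


def grouped_gap(cutoff):
    """Mean percentage with gray hair at or above the cutoff age minus the
    mean percentage below it."""
    younger = [percent_gray(a) for a in AGES if a < cutoff]
    older = [percent_gray(a) for a in AGES if a >= cutoff]
    return sum(older) / len(older) - sum(younger) / len(younger)


largest_yearly_step = max(percent_gray(a + 1) - percent_gray(a) for a in AGES)
print(f"largest change from one year of age to the next: {largest_yearly_step} points")
for cutoff in (42, 50):
    print(f"cut at age {cutoff}: apparent jump of {grouped_gap(cutoff)} points")
```

The apparent jump is the same size whether the cut is made at 42 or at 50, which shows that it is produced by the grouping and not by anything that happens at either age.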


    "AGE CREEP"
 
Another controversial issue in the analysis of screening trials lies in determining the age of the patients for whom the screening test is efficacious. For the breast cancer screening trials, some argued that the benefit from screening women aged 40–49 years arose only because these women reached the age of 50 during the trial, at which point screening suddenly became effective (41–43).

Superficially, it would seem obvious that the data should be analyzed on the basis of the age at which the cancers are detected. However, this can be misleading. For example, if the screening test reveals cancers, on average, 3 years before they become clinically evident (ie, the "lead time" of the test is 3 years), and an individual in the screened group has a cancer detected at age 48, the individual’s "twin" in the control group will not have the cancer detected until age 51. If the data are analyzed by age at diagnosis (as happened in the mammographic screening trials), with individuals younger than 50 years analyzed as one group and those 50 years and older as another, the death of the "control" individual will be counted with the over-50 group, and the benefit for the 48-year-old screened woman will not be counted. Prorok et al (44) have pointed out that the best way to analyze the data in these trials is by the age of the individuals at the time that they are allocated to the screened or control group and not by the age at diagnosis.
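The "twin" example can be traced in a few lines of code. This is a minimal sketch, not an analysis method from any trial: the 3-year lead time and the ages 48 and 51 come from the example above, while the `Participant` class, the `stratum` helper, and the allocation age of 45 are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass

LEAD_TIME = 3  # years by which screening advances diagnosis (from the example)
CUTOFF = 50    # age used to split the analysis into two groups


@dataclass
class Participant:
    arm: str                     # "screened" or "control"
    age_at_allocation: int       # hypothetical: age when assigned to an arm
    age_at_clinical_onset: int   # age the cancer would surface without screening

    @property
    def age_at_diagnosis(self):
        # Screening moves diagnosis earlier by the lead time; controls are
        # diagnosed only when the cancer becomes clinically evident.
        if self.arm == "screened":
            return self.age_at_clinical_onset - LEAD_TIME
        return self.age_at_clinical_onset


def stratum(age):
    """Assign an age to one of the two analysis groups."""
    return "under 50" if age < CUTOFF else "50 and older"


# Two "twins" with identical disease, allocated at the same (assumed) age.
twins = [
    Participant("screened", age_at_allocation=45, age_at_clinical_onset=51),
    Participant("control", age_at_allocation=45, age_at_clinical_onset=51),
]

by_diagnosis = {p.arm: stratum(p.age_at_diagnosis) for p in twins}
by_allocation = {p.arm: stratum(p.age_at_allocation) for p in twins}

print("grouped by age at diagnosis: ", by_diagnosis)
print("grouped by age at allocation:", by_allocation)
```

Grouping by age at diagnosis places the screened twin (diagnosed at 48) and the control twin (diagnosed at 51) in different groups, so the pair is never compared. Grouping by age at allocation keeps them in the same group, which is the approach Prorok et al recommend.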


    SURROGATE END POINTS
 
Large RCTs are very expensive and difficult to perform properly. Furthermore, if death is the measured end point, it may take years, even decades, to show that a screening test is efficacious. By the time proof is shown, the test may have been replaced by a more effective intervention. In addition, once a screening test becomes widespread, it becomes more and more difficult to have a control group that does not have substantial contamination. One way around this is to conduct the trial in countries where the test has not been widely adopted, but then the technical and interpretive skills of the investigators may be suboptimal and the population demographics may be different. There is no complete agreement as to how to circumvent these problems. Ideally, a factor or group of factors that accurately predict outcome can be substituted for waiting for death, and these surrogate end points can be used to determine efficacy more rapidly. For example, one might expect to see a reduction in the number of late-stage cases if a screening intervention was effective. Surrogate end points could also be used to compare the effect of different screening parameters such as the frequency of screening or the thresholds for intervention. They could be used to avoid having to perform a new RCT for every change in protocol (45).

Unfortunately, there is as yet no universal agreement on the use of any surrogate end point for any cancer screening test, and many health planners require evidence of a statistically significant mortality reduction in an RCT as proof of efficacy.

In summary, a screening test, unfortunately, is not efficacious simply because it can reveal cancers earlier. It must reveal cancers at a time when they can be successfully treated to either delay death or prevent death from the cancer. This generally can only be shown through the use of RCTs that must be large enough to have the statistical power to be able to "prove" a mortality reduction. Our focus has been on the medical and scientific issues underlying cancer screening. Ultimately, any benefit that can be derived will be balanced against the economic costs, and these will determine whether or not the screening test can be adopted for population-based screening.

We have tried to address the major issues involved in determining the efficacy of a screening test. If investigators wish to avoid many of the controversies that made it so difficult to "prove" the benefit of mammographic screening, they should heed the experience gained from the breast cancer screening controversies. One of the major lessons is that, although costly, RCTs must be sufficiently large, and allocation must be blinded and random so that results will have power and credibility.


    FOOTNOTES
 
Abbreviations: DCIS = ductal carcinoma in situ, RCT = randomized controlled trial


    REFERENCES
 

  1. Santayana G. Reason in common sense. The life of reason. New York: Scribner’s, 1905; 284.
  2. Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 2002; 225:165-175.
  3. Stanley RJ. 2001 ARRS presidential address: inherent dangers in radiologic screening. AJR Am J Roentgenol 2001; 177:989-992.
  4. Obuchowski NA, Ruffin RJ, Baker ME, Powell KA. Ten criteria for effective screening: their application to multislice CT screening for pulmonary and colorectal cancers. AJR Am J Roentgenol 2001; 176:1357-1362.
  5. Gotzsche PC, Olsen O. Is screening for breast cancer with mammography justifiable? Lancet 2000; 355:129-134.
  6. Olsen O, Gotzsche PC. Cochrane review on screening for breast cancer with mammography. Lancet 2001; 358:1340-1342.
  7. Kopans DB. The most recent breast cancer screening controversy about whether mammographic screening benefits women at any age: nonsense and nonscience. AJR Am J Roentgenol 2003; 180:21-26.
  8. Jackson VP. Screening mammography: controversies and headlines. Radiology 2002; 225:323-326.
  9. Kolata G. Study sets off debate over mammograms’ value. The New York Times 2001; Dec 9:sect 1A:1(col 4).
  10. Kolata G. Questions grow over usefulness of some routine cancer tests. The New York Times 2001; Dec 16.
  11. Gordis L. Epidemiology. Philadelphia: Saunders, 1996; 229-246.
  12. Wellings SR, Jensen HM. On the origin and progression of ductal carcinoma in the human breast. J Natl Cancer Inst 1973; 50:1111-1118.
  13. Silverstein MJ, Lagios MD. Use of predictors of recurrence to plan therapy for DCIS of the breast. Oncology 1997; 11:393-410.
  14. Ernster VL, Barclay J, Kerlikowske K, Grady D, Henderson C. Incidence of and treatment for ductal carcinoma in situ of the breast. JAMA 1996; 275:913-918.
  15. Feig SA. Ductal carcinoma in situ: implications for screening. Radiol Clin North Am 2000; 38:653-668.
  16. Kopans DB, Feig SA. The Canadian National Breast Screening Study: a critical review. AJR Am J Roentgenol 1993; 161:755-760.
  17. Tarone RE. The excess of patients with advanced breast cancers in young women screened with mammography in the Canadian National Breast Screening Study. Cancer 1995; 75:997-1003.
  18. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Controlled Clin Trials 1981; 2:93-113.
  19. Moher D, Dulberg C, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994; 272:122-124.
  20. Kopans DB, Halpern E, Hulka CA. Statistical power in breast cancer screening trials and mortality reduction among women 40–49 with particular emphasis on the National Breast Screening Study of Canada. Cancer 1994; 74:1196-1203.
  21. Hendrick RE, Smith RA, Rutledge JH, Smart CR. Benefit of screening mammography in women ages 40–49: a new meta-analysis of randomized controlled trials. Monogr Natl Cancer Inst 1997; 22:87-92.
  22. Shapiro S, Venet W, Strax P, Venet L. Periodic screening for breast cancer: the Health Insurance Plan Project and its sequelae, 1963–1986. Baltimore: The Johns Hopkins University Press, 1988.
  23. Andersson I, Aspegren K, Janzon L, et al. Mammographic screening and mortality from breast cancer: the Malmo mammographic screening trial. BMJ 1988; 297:943-949.
  24. Fletcher SW, Black W, Harris R, Rimer BK, Shapiro S. Report of the International Workshop on Screening for Breast Cancer. J Natl Cancer Inst 1993; 85:1644-1656.
  25. Eyre H, Sondik E, Smith RA, Kessler L. Joint meeting on the feasibility of a study of screening premenopausal women (40–49 years) for breast cancer: April 20–21, 1994. Cancer 1995; 75:1391-1403.
  26. Kopans DB. Breast imaging. 2nd ed. Philadelphia: Lippincott Williams & Wilkins, 1997; 58-61.
  27. D’Orsi C, Swets JA. Variability in the interpretation of mammograms (letter). N Engl J Med 1994; 332:1172.
  28. Gotzsche PC, Olsen O. Is screening for breast cancer with mammography justifiable? Lancet 2000; 355:129-134.
  29. Kopans DB, Halpern E. Re: All-cause mortality in randomized trials of cancer screening (letter). J Natl Cancer Inst 2002; 94:863.
  30. Cuzick . Cause-specific mortality in long term survivors of breast cancer who participated in trials of radiation therapy. J Clin Oncol 1994; 12:447-453.
  31. Early Breast Cancer Trialists’ Collaborative Group. Favourable and unfavourable effects on long-term survival of radiotherapy for early breast cancer: an overview of the randomised trials. Lancet 2000; 355:1757-1770.
  32. Miller AB, Lindsay J, Hill GB. Mortality from cancer of the uterus in Canada and its relationship to screening for cancer of the cervix. Int J Cancer 1976; 17:602-612.
  33. Johannesson G, Geirsson G, Day N. The effect of mass screening in Iceland, 1965–1974, on the incidence and mortality of cervical carcinoma. Int J Cancer 1978; 21:418-425.
  34. Tabar LK, Vitak B, Chen HHT, Yen MF, Duffy SW, Smith RA. Beyond randomized controlled trials: organized mammographic screening substantially reduces breast cancer mortality. Cancer 2001; 91:1724-1731.
  35. Duffy SW, Tabar L, Chen H, et al. The impact of organized mammography service screening on breast carcinoma mortality in seven Swedish counties. Cancer 2002; 95:458-469.
  36. Kopans DB. Beyond randomized controlled trials: organized mammographic screening substantially reduces breast carcinoma mortality (letter). Cancer 2002; 94:580-581.
  37. Kopans DB, Moore RH, McCarthy KA, et al. The positive predictive value of mammographically initiated breast biopsy: there is no abrupt change at age 50 years. Radiology 1996; 200:357-360.
  38. Feig SA. Age-related accuracy of screening mammography: how should it be measured? Radiology 2000; 214:633-640.
  39. Kopans DB. The breast cancer screening controversy: lessons to be learned. J Surg Onc 1998; 67:143-150.
  40. Kerlikowske K, Grady D, Barclay J, Sickles EA, Eaton A, Ernster V. Positive predictive value of screening mammography by age and family history of breast cancer. JAMA 1993; 270:2444-2450.
  41. Fletcher SW, Black W, Harris R, Rimer BK, Shapiro S. Report of the International Workshop on Screening for Breast Cancer. J Natl Cancer Inst 1993; 85:1644-1656.
  42. de Koning HJ, Boer R, Warmerdam PG, Beemsterboer PMM, van der Maas PJ. Quantitative interpretation of age-specific mortality reductions from the Swedish breast cancer-screening trials. J Natl Cancer Inst 1995; 87:1217-1223.
  43. Tabar L, Duffy SW, Chen HH. Re: Quantitative interpretation of age-specific mortality reductions from the Swedish Breast Cancer-Screening Trials (letter). J Natl Cancer Inst 1996; 88:52-53.
  44. Prorok PC, Hankey BF, Bundy BN. Concepts and problems in the evaluation of screening programs. J Chron Dis 1981; 34:159-171.
  45. Feig SA. Determination of mammographic screening intervals with surrogate measures for women aged 40–49 years. Radiology 1994; 193:311-314.



