Cancer screening (hereafter referred to as screening) is synonymous with secondary prevention: an asymptomatic population is screened to identify cancer at an earlier stage than it would have been diagnosed in the absence of screening, making earlier therapeutic intervention possible. Screening is distinguished from case detection or case finding, which occurs when
the patient presents to a physician with symptoms or suspicion of a condition. The goal of screening is to reduce mortality from the disease and/or reduce the severity of the disease through early diagnosis and treatment.
Screening is the application of a test to an asymptomatic population to determine who is likely to have the disease and who is not likely to have the disease. Thus, screening tests are not generally diagnostic tests. The detection of occult disease through screening is a two-phase process because those who have positive results from a screening test must undergo diagnostic procedures
to determine if they actually have the disease.
There are circumstances in which screening can contribute to primary prevention: for example, when colorectal cancer screening leads to the identification and removal of an adenoma, which subsequently reduces the incidence of colorectal cancer.
Screening for cancer began with the Pap smear, a test developed by George Papanicolaou, who in 1941 published his seminal article: “The diagnostic value of vaginal smears in carcinoma of the uterus” in the American Journal of Obstetrics and Gynecology. He initially presented his research at the Third Race Betterment Conference in Battle Creek, Michigan, in 1928, but his
colleagues were sufficiently discouraging of his ideas about detecting cancer through exfoliative cytology that Papanicolaou abandoned the work for many years. He returned to it more than a decade later and eventually established the correlation between cells scraped from the surface of the cervix and the detection of cervical carcinoma, which he published in 1941. In 1942, he published “Diagnosis of uterine cancer by the vaginal smear.” The first widespread use of this technology may have been as early as 1937
when Dr. Elise L'Esperance established a cancer detection center in New York and began using the Pap smear to test women for cervical cancer. Cervical cytology followed by biopsy of a positive test is still the principal method for cervical cancer screening in the world.
Breast cancer screening started in the 1960s when mammography became available. To determine if mammography screening would reduce breast cancer mortality, the Health Insurance Plan of New York study was initiated in 1963. This landmark study of 62,000 women aged 40 to 64 years lasted 25 years and provided the first experimental evidence of the efficacy of breast cancer screening.
This study also increased awareness of two important concepts in cancer screening: lead-time bias and length-bias sampling, which are discussed later.
A number of other cancer screening trials followed the Health Insurance Plan study, including additional breast cancer screening trials in Europe and Canada. Lung cancer screening trials using chest radiograph and sputum cytology were conducted in the 1970s, and the first colorectal cancer screening trial was initiated in 1975. Three other colorectal cancer trials followed.
Subsequently, trials evaluating tests for prostate and ovarian cancers were initiated in the 1990s. Today the only proven cancer screening tests are for cervical, breast, and colorectal cancers, although trials have been underway for prostate, lung, and ovarian cancers. Results recently published for prostate cancer screening using the prostate-specific antigen (PSA) test were not consistent.
Principles of screening
Although most cancers have a better prognosis if diagnosed earlier in their natural history, this basic observation is not in itself sufficient to justify screening an asymptomatic population for cancer. A number of criteria, first outlined by Wilson and Jungner in 1968 for the World Health Organization, should be met before initiating cancer screening. These
principles of cancer screening along with some additional considerations are as follows:
The disease should be an important public health problem in terms of its frequency and/or severity. Historically, the development of this principle was in the general context of screening for infectious and chronic diseases and not related specifically to cancer. Today some of the cancer sites considered for screening are not particularly common diseases; nevertheless, early detection
and subsequent reduction of mortality can result in a significant benefit in life-years saved.
The natural history of the disease presents a window of opportunity for early detection. For cancer this generally refers to a detectable preclinical phase (DPCP), and it represents the interface between characteristics of the disease and the screening technology. It is during this period that screening is considered optimal to detect the disease early and prior to the development of
symptoms. For screening to be effective, the recommended screening interval must be shorter than the estimate of the DPCP.
An effective treatment should be available that favorably alters the natural history of the disease. Usually for cancer this means a reduction in cause-specific mortality.
The treatment should be more effective if initiated during the presymptomatic (or earlier) stage than during the symptomatic (or later) stage; that is, if treating early (presymptomatic) stage has no advantage over treating late (symptomatic) stage, then the cost and the risk of screening cannot be justified.
A suitable screening test should be available, that is, one that is accurate, acceptable to the population, fairly easy to administer, safe, and relatively inexpensive.
There should be an appropriate screening strategy for the target population (i.e., an age to begin screening and a screening interval).
The screening guidelines should be based on good scientific evidence (usually based on results of a randomized controlled clinical trial) and economically feasible:
Screening programs should have high rates of participation from the eligible population.
Screening programs for a particular geographic area should take into account specific resources available for screening, diagnosis, and treatment so that countries can focus on optimal recommendations based on available resources.
Screening programs should be sensitive to patient and provider concerns.
Screening programs should ensure prompt follow-up of positive tests with a diagnostic examination and prompt treatment of cases.
Screening programs should be cost-effective.
Screening programs should be monitored and regularly evaluated.
Most of these principles are fairly straightforward, but a few warrant further discussion.
Evaluating Screening Tests
The accuracy or validity of a screening test, that is, its ability to distinguish between diseased and nondiseased people, is measured by sensitivity and specificity (Table 1). Sensitivity refers to the ability of the screening test to correctly identify people with the disease among the screened population and is defined as the number of people with the disease who test positive divided by the total number who actually have the disease. Specificity refers to the ability of the test to correctly identify people without the disease among the screened population and is defined as the number of people without the disease who test negative divided by the total number who do not have the disease. Ideally, sensitivity and specificity would both be 100%, but unfortunately no cancer screening test performs this well. Hence, although a majority of people undergoing screening will have accurate test results, some will be labeled by the screening test as positive but eventually will be found not to have the disease following the diagnostic workup (false-positives), and some with the disease will be labeled negative by the screening test and thus are missed cases, or false-negatives. Sensitivity and specificity are inversely related: once a certain threshold of accuracy is reached, increasing one results in a decrease in the other.
For a quantitative test (e.g., a quantitative immunochemical test for colorectal cancer), the cutoff level to designate a test positive can be adjusted to the extent that all potential cases can be identified (sensitivity equals 100%). However, to do so would sacrifice specificity, resulting in a large number of individuals who would be unnecessarily subjected to a costly and
sometimes risky diagnostic procedure. Thus, balancing sensitivity and specificity (when possible) is important in determining the outcome of a screening program.
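The cutoff trade-off described above can be sketched numerically. The distributions and cutoff values below are entirely hypothetical, chosen only to show the direction of the effect:

```python
import random

# Hypothetical quantitative test values: diseased individuals tend to score
# higher than nondiseased ones (means and spreads are invented).
random.seed(0)
diseased = [random.gauss(120, 40) for _ in range(200)]
nondiseased = [random.gauss(40, 25) for _ in range(2000)]

results = []
for cutoff in (150, 100, 50, 0):
    sens = sum(x >= cutoff for x in diseased) / len(diseased)
    spec = sum(x < cutoff for x in nondiseased) / len(nondiseased)
    results.append((cutoff, sens, spec))
    print(f"cutoff={cutoff:3d}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

As the cutoff is lowered toward zero, sensitivity climbs toward 100% while specificity falls, which is exactly the trade-off that drives unnecessary diagnostic workups.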
Measuring Test Performance
Evaluating the sensitivity and specificity of cancer screening tests in community practice poses unique challenges. Specificity is more easily measured as false-positive outcomes are identified in the near term because of the workup of individuals with positive tests. Measuring sensitivity is a greater challenge. Although identifying true positives, like false-positives, occurs
at the conclusion of the diagnostic evaluation, those with cancer who test negative are not subjected to a diagnostic evaluation. Hence, immediate ascertainment of the false-negative rate generally is not possible. In practice, estimates of test sensitivity rely on long-term follow-up through cancer registries to determine which individuals were diagnosed with cancer within a fixed interval after a negative screening test. This is the most common method, and it measures cases diagnosed between screenings (interval
cancers) as the criterion for a false-negative test result. This method assumes that these cases were detectable at the screening but were missed.
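As a sketch of the interval-cancer method just described, with entirely hypothetical counts, test sensitivity is estimated as the proportion of all cancers arising in the screened cohort (screen-detected plus interval) that were found at screening:

```python
# Hypothetical counts from registry follow-up of a screened cohort.
screen_detected = 180    # cancers found at screening (counted as true positives)
interval_cancers = 45    # cancers diagnosed clinically before the next screen
                         # (counted as false negatives)

estimated_sensitivity = screen_detected / (screen_detected + interval_cancers)
print(f"Estimated test sensitivity: {estimated_sensitivity:.2f}")  # 0.80
```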
Table 1. Outcomes of a Screening and Diagnostic Program

                            Disease present        Disease absent         Total
Positive screening test     True positives (a)     False positives (b)    Test positives (a + b)
Negative screening test     False negatives (c)    True negatives (d)     Test negatives (c + d)
Total                       With disease (a + c)   Without disease (b + d) Total screened (a + b + c + d)

Sensitivity = a/(a + c).
Specificity = d/(b + d).
Positive predictive value = a/(a + b).
Negative predictive value = d/(c + d).
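The four measures defined beneath Table 1 can be computed directly from the cell counts. The counts here are invented purely to exercise the formulas:

```python
# Hypothetical 2x2 screening outcomes (cells a, b, c, d from Table 1).
a = 90     # true positives
b = 910    # false positives
c = 10     # false negatives
d = 9000   # true negatives

sensitivity = a / (a + c)
specificity = d / (b + d)
ppv = a / (a + b)   # positive predictive value
npv = d / (c + d)   # negative predictive value

print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")
print(f"PPV={ppv:.3f}  NPV={npv:.3f}")
```

Note that even with 90% sensitivity and roughly 91% specificity, the PPV here is only 9%, because most of the screened population is disease-free.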
In research settings, test performance may be measured by applying definitive diagnostic tests to all individuals with normal and abnormal screening test results, although this methodology is uncommon because the prevalence of detectable cancer is low and the costs associated with this study design are very high. Alternatively, test performance may be measured by applying the
screening test to a group of symptomatic patients. However, this method does not measure screening performance accurately as the results from this approach are not applicable to an asymptomatic population that ultimately will be screened. Regardless of the complexity of screening technology under evaluation, the test may have much better performance in a symptomatic population than in an asymptomatic population.
The proportion of the population with detectable preclinical disease is an important factor in determining the success of screening. If the proportion is low, then relatively few cases will be detected and the yield may be considered too low relative to the cost. The length of the DPCP will also influence the frequency of screening. A screening test that can detect a lesion
very early in its development will generally be associated with a longer DPCP than a test that more commonly detects a lesion that is more advanced. However, if a screening program is detecting a high percentage of advanced cases, this suggests either that the screening interval is too long, allowing considerable tumor progression during the DPCP, or that the test has low sensitivity, perhaps resulting from poor quality assurance. More commonly, host characteristics limit the optimal performance of screening tests.
Positive Predictive Value
An important parameter in evaluating a screening program is positive predictive value (PPV), which is the proportion of individuals with a positive screening test who actually have the disease. The PPV can be computed only after the diagnostic examinations of those who test positive have been completed. A PPV of 10% means that only one in ten of the patients with positive test
results truly had the disease. The other nine received the diagnostic examination and incurred cost and risks that are commonly described as “unnecessary.” Actually, it is not reasonable to label all false-positives as unnecessary because additional tests and invasive procedures often are necessary in the presence of a positive screening test in order to confirm the presence or absence of cancer. As previously noted, screening tests are not diagnostic tests. Thus, it is important to distinguish, at least conceptually,
between what might be labeled as unavoidable versus avoidable follow-up tests. If poor quality results in an excess rate of false-positives, the workups prompted by this fraction of positive test results truly can be labeled as unnecessary and avoidable.
The PPV is influenced by three factors: sensitivity and specificity of the test and the prevalence of disease. Specificity has a bigger effect on PPV because most people who are screened for cancer do not have the disease. If the specificity is increased to improve PPV, at some point the sensitivity will likely decrease (as sensitivity and specificity ultimately are inversely
related) and the number of false-negative findings will increase. If the rate of disease increases with increasing age, the PPV also will improve in the higher age groups, even if sensitivity and specificity do not change at all. Alternatively, focusing the screening on a population with a higher prevalence of disease can be accomplished by restricting the screening program to higher-risk individuals who are more likely to have the disease.
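The dependence of PPV on prevalence can be seen from Bayes' rule with fixed sensitivity and specificity; the 0.90/0.95 operating point below is illustrative, not drawn from any particular test:

```python
# PPV = sens*prev / (sens*prev + (1 - spec)*(1 - prev))
sens, spec = 0.90, 0.95   # hypothetical, held fixed across prevalence levels

ppvs = []
for prev in (0.001, 0.01, 0.05, 0.10):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    ppvs.append(ppv)
    print(f"prevalence={prev:.3f}  PPV={ppv:.3f}")
```

With the same test, PPV rises from under 2% at a prevalence of 1 per 1,000 to about two-thirds at a prevalence of 10%, which is why restricting screening to higher-risk groups improves PPV without changing the test at all.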
Although the PPV is regarded as a measure of effectiveness, a proper interpretation of it requires data on the tumor characteristics of the cancers detected and the cancer detection rate. A program with a very high PPV, but mostly very large tumors, will achieve less than a program with a lower PPV, but a greater cancer detection rate, and a more favorable distribution of tumor
Test Sensitivity Versus Program Sensitivity
There is a difference between the performance of a screening test applied once (test sensitivity) and the performance of a screening test applied multiple times to the same population (program sensitivity). If a population is highly adherent with screening recommendations, program sensitivity will generally be higher than test sensitivity because of the greater chance of detecting
a cancer on the second round of screening that was missed on the first round of screening. If the screening interval is considerably shorter than the DPCP, then the limitations of a test with lower sensitivity can be overcome with dependable, successive opportunities for detection. However, under circumstances with the same test sensitivity, if adherence with the screening interval is poor, then program sensitivity will be lower because the screening program may be dominated by both missed cancers and cancers
detected symptomatically out of interval.
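Under a deliberately simple model, assuming misses are independent across rounds and the cancer remains within its DPCP, repeated screening raises program sensitivity above single-test sensitivity:

```python
# Illustrative single-application test sensitivity (not from any real test).
test_sens = 0.70

for rounds in (1, 2, 3):
    # Probability the cancer is caught in at least one of `rounds` screens.
    program_sens = 1 - (1 - test_sens) ** rounds
    print(f"{rounds} round(s): program sensitivity = {program_sens:.3f}")
```

A test with 70% sensitivity reaches roughly 91% program sensitivity after two adherent rounds and 97% after three, which is why adherence to the screening interval matters as much as the test itself.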
Test sensitivity and program sensitivity also will vary based on the proportion of the population that has undergone screening. When evaluating a test applied to a population, there generally will be different outcomes from the initial screening that detects prevalent cases than from the subsequent screenings that detect incident cases (usually prevalence is greater than incidence).
This observation applies to the program overall, but also to the subpopulations that enter the ongoing screening program for the first time. The initial screening will generally detect more cases, and more advanced cases, than subsequent screenings because the pool of prevalent cancers has been reduced after the initial round of screening.
In evaluating program performance over time, the underlying prevalence of disease overall and in subpopulations will influence PPV. Thus, a screening program administered to a lower-risk population will have a different outcome than one administered to a higher-risk population. Variability in PPV can be seen in screening for the same cancer where the underlying prevalence of
disease is lower in the younger population undergoing screening than the older population, even if test sensitivity is the same. The quality of the screening and the quality of the diagnostic workup will also influence the outcome of the program.
Threats to Validity
Because of various biases that can affect survival in screening-detected versus nonscreening-detected cancers, such as lead time and length bias sampling (described later), survival alone (case fatality) cannot be used to determine the efficacy of screening. The most informative evaluation of a screening test is a randomized controlled clinical trial (RCT) comparing cause-specific
mortality among individuals randomly allocated to screening and individuals randomly allocated to usual care. Overall mortality (i.e., all-cause mortality) is not a sensitive indicator if the number of deaths from the disease of interest represents a relatively small proportion of the total number of deaths, as is the case with most cancers. Further, the goal of screening for a particular cancer is not to prevent an individual from dying of any cause, but rather to avoid a premature death or significant morbidity
for a particular cancer.
Generally RCTs for cancer screening tests are expensive and take considerable time to complete. A large number of participants are usually needed, and they must be engaged in the study for many years to be able to detect a difference in cause-specific mortality if in fact the test is effective. For example, in evaluating fecal occult blood testing for colorectal cancer there
have been four major RCTs involving a total of more than 300,000 people. The duration of the trials was at least 15 years. Following the successful completion of trials for one screening test such as Hemoccult (Beckman Coulter Inc., Fullerton, California) for colorectal cancer, is it necessary to conduct a full-scale RCT for another similar test such as an immunochemical fecal occult blood test? Because there are a number of similar fecal occult blood tests, each would have to be evaluated using thousands of
people studied over a 15-year period. Clearly, such trials are not needed for every test, and a study could be designed to evaluate the performance of a new fecal occult blood test against the proven one using many fewer people over a shorter period of time.
Case-control studies, which can be conducted with fewer people and in a relatively short period of time, have been used to evaluate screening tests, but because of potential selection bias these studies are not sufficient to provide definitive evidence of effectiveness. The U.S. Preventive Services Task Force, which develops recommendations for screening tests, including cancer
screening tests, relies primarily on RCTs for proof that a test is effective in reducing deaths.
Lead Time Bias
In the evaluation of cancer screening tests, it is important to distinguish between the survival rate and the mortality rate; this distinction is the basis for separating lead time from lead-time bias. Because screening advances the time of diagnosis, the duration of time between when a cancer is detected by screening and when it would have been detected because
of symptoms is referred to as the lead time and achieving lead time is a fundamental goal of screening. The lead time gained nearly always is less than the DPCP, as there will be few individuals who have perfect concordance between the date of screening and the date that their cancer entered the DPCP. As previously noted, the recommended screening interval must be shorter than the estimated DPCP in order to have the greatest probability of detecting most cancers through regular screening.
The survival rate is the percentage of people diagnosed with a particular cancer who are alive after a specific duration of time. In the absence of screening, survival is measured from the time of diagnosis associated with symptomatic disease and the proportion dying or surviving over a particular duration. In a screening program, survival time is measured by the average time
between the date of diagnosis as a result of a screening that detected occult tumor and the proportion surviving or dying over a particular duration. However, if screening results in earlier detection of disease, but death occurs at the same time as it would in the absence of screening, then there will appear to be an increase in mean survival associated with screening when in fact there is not. This is referred to as lead-time bias and is represented by the interval of time between detection by screening and
the time when the diagnosis would have been made in the absence of screening, that is, generally when the patient is symptomatic. It represents the amount of time by which treatment is advanced because of earlier detection, but there is no survival advantage for the patient. However, the proportion of cases that survive for some time after diagnosis will be higher, thus giving the impression that screening is effective.
The goal of screening is to reduce the incidence rate of advanced disease, and that is achieved by advancing the lead time before a cancer becomes symptomatic. Prospective RCTs avoid lead-time bias because the end point of interest is mortality. Studies that show higher survival in screening-detected cancers compared with symptom-detected cancers, however encouraging, do not
provide sufficient evidence to endorse a policy of offering screening to the population because lead-time bias cannot be ruled out.
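Lead-time bias can be illustrated with a deliberately artificial single case: screening advances the diagnosis, death occurs at the same age either way, and measured survival grows with no mortality benefit at all. All ages below are invented:

```python
symptomatic_dx_age = 62   # age at diagnosis without screening (hypothetical)
death_age = 65            # age at death, unchanged by screening
lead_time = 3             # years by which screening advances diagnosis

survival_without_screening = death_age - symptomatic_dx_age
survival_with_screening = death_age - (symptomatic_dx_age - lead_time)

print(f"survival without screening: {survival_without_screening} years")
print(f"survival with screening:    {survival_with_screening} years")
print(f"age at death either way:    {death_age} years")
```

Survival from diagnosis doubles from 3 to 6 years, yet the patient dies at the same age; only a mortality end point, as in an RCT, is immune to this artifact.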
Although lead-time bias can increase survival in screening-detected cases, its true influence is limited by the duration of the detectable preclinical phase. Thus, the effect of lead-time bias on survival occurs in the near term, whereas long-term differences in survival are more likely to be influenced by length bias, which is discussed next.
Length Bias Sampling

Length-bias sampling refers to the tendency for screening to be more successful at detecting slow-growing, less aggressive disease and less successful at detecting more aggressive, faster-growing disease. Length bias reflects the greater likelihood that screening-detected cancers have a longer DPCP and hence a greater probability of being detected.
Length-bias sampling is a function of the variability in cancer progression rates. For any given screening interval, there is a greater probability of detecting a slower-growing cancer than a faster-growing cancer. If the slower-growing cancers have better prognosis, the screening will selectively identify cases at a lower risk of death, and this “length-bias sampling” will
create the impression that screening is more effective than it actually is, when in fact the increased survival is simply the result of the detection of slower-growing cancers with a more favorable prognosis. Overdiagnosis is an extreme case of length-bias sampling, but it is difficult to estimate the rate of overdiagnosis as there is no way to prove that a given tumor would remain subclinical indefinitely.
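A small simulation (with invented parameters) shows the sampling mechanism: the chance that a single screen falls inside a tumor's DPCP is proportional to the DPCP's length, so screen-detected cases are skewed toward slow-growing tumors:

```python
import random

random.seed(1)
N = 100_000
screen_time = 50.0   # one screen at time 50 (arbitrary time units)

dpcp_all, dpcp_detected = [], []
for _ in range(N):
    onset = random.uniform(0, 100)       # start of the preclinical phase
    dpcp = random.choice([1.0, 8.0])     # fast- vs slow-growing tumor (hypothetical)
    dpcp_all.append(dpcp)
    if onset <= screen_time <= onset + dpcp:  # DPCP overlaps the screen
        dpcp_detected.append(dpcp)

mean_all = sum(dpcp_all) / len(dpcp_all)
mean_detected = sum(dpcp_detected) / len(dpcp_detected)
print(f"mean DPCP, all cancers:             {mean_all:.2f}")
print(f"mean DPCP, screen-detected cancers: {mean_detected:.2f}")
```

Although fast- and slow-growing tumors are equally common in the population (mean DPCP about 4.5), the screen-detected subset is dominated by the slow-growing type (mean DPCP above 7), with no assumption that screening changes anyone's outcome.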
A consequence of screening can be overdiagnosis, which is the detection of a cancer that would not have progressed to become symptomatic in the person's lifetime. Such lesions, when detected, are currently indistinguishable from lesions that are, or evolve, to become clinically significant. The example generally cited to illustrate overdiagnosis bias resulting from screening
is the detection of prostate cancer using the PSA test. Because of the high prevalence of latent prostatic cancer in older men, a screening test such as PSA will detect some cancers that, in the absence of screening, would have remained asymptomatic during the remainder of the person's lifetime. As a result of the testing they are identified and subsequently treated in the same manner and perhaps with the same urgency as all other prostate cancers. However, there may be the same serious consequences for patients
undergoing treatment of these latent cancers, including short-term disability, unnecessary costs, and long-term treatment-related side effects as experienced by those who needed to be treated to avoid premature death. Further, in reality the screening has produced no true benefit for these patients because they did not have a life-threatening cancer, although there may appear to be a benefit in that the patient's outcome in terms of survival is favorable.
Overdiagnosis is largely a theoretical and statistical concept. It is very difficult to measure, and where experimental evidence has shown that screening is associated with a reduction in deaths from cancer, overdiagnosis likely represents a small contribution to the harms in comparison to a larger benefit from screening.
Individuals who participate in cancer screening are usually different from those who do not participate, and these differences could have an effect on disease outcomes. For example, compared with the population who does not undergo screening, those who do generally will be more health conscious and healthier, more aware of the signs and symptoms of disease, have access to better
health care, and be more adherent to treatment.
Developing and Evaluating a Cancer Screening Program
Cancer screening programs are generally designed to administer tests multiple times on a recommended schedule (e.g., annual mammograms). Current cancer screening tests are generally not designed for a one-time application, although one test, colonoscopy for colorectal cancer (the diagnostic examination), has been considered as a screening test that could be applied periodically
(every 10 years) or even once in a lifetime. Continued screening of a population will lessen the consequences of false-negative results because these cancers may be detected in subsequent screenings. If disease progression is not too rapid, as for example in most colorectal cancers, then a false-negative finding on a single screen may not have serious long-term consequences if the cancer is detected on a follow-up screen. An effective screening test may not significantly reduce disease-specific morbidity and
mortality in the population if the participation rate with the screening and/or the diagnostic evaluation in the program is low. Thus, adherence to the screening and subsequent diagnostic program is essential to ensure that the benefit accrues to the population. A good health education program is important to ensure that the population understands the disease and the importance of screening, the screening method to be used, the diagnostic procedures, potential risks, and treatment options. Some countries are
able to rely on population registers to remind individuals that screening tests are due, whereas in the United States, the referring physician plays a key role.
Once screening has been introduced in a population, how soon should benefits be evident in population trends of disease rates? This question is not easily answered, and the answer depends on the duration of the rollout period, the rate of uptake, nonscreening influences on incidence and mortality (e.g., behavioral changes, improvements in therapy), the lag in surveillance
data, and the ability to isolate screened and nonscreened cohorts in an analysis. In 2009, several publications on cancer screening argued that the benefits of screening were lower and harms higher than commonly perceived and challenged the value of screening for breast and prostate cancer as trends in incidence and mortality were not more favorable in the presence of significant screening rates. They stated that optimal screening should have produced a rise in incidence rates, followed by a fall in rates, and
then a return to prescreening rates, which should have a more favorable stage distribution. This theoretical scenario would result from a rise in incidence during screening related to lead time, followed by the decline in incidence from cancers already having been detected, and then a return to prescreening incidence rates. They observed that in the United States, breast and prostate cancer screening has not produced that trend, but instead has led to an increase in localized disease, without a decline in advanced
disease. Despite contrary results from RCTs, these studies concluded that screening is not very effective at altering the natural history of aggressive disease, and mostly detects less aggressive and indolent (i.e., overdiagnosed) cases.
However, this scenario is not very realistic because the entire population is different from the potentially screened population, the actually screened population, and the occasionally and regularly screened populations. Incidence rates include cancers detected in adults who are not eligible for screening, have no access to screening, are eligible but refuse screening, are irregularly
screened, screened but did not have their early-stage disease detected, and those who entered the screening cohort for the first time, of which the latter group will generally manifest the characteristics of a prevalent screening round. The conclusion that much of the excess of disease represented significant overdiagnosis may be explained by the short period of observation, a trend in rising incidence rates, and the expected effect of lead time. Even though overdiagnosis is a significant problem in prostate
cancer screening, it is likely a small problem in breast cancer screening, and mostly limited to ductal carcinoma in situ. Short-term evaluations of population surveillance data are not a sound basis for judging the effectiveness of screening.
Beyond population adherence, it is important to continuously evaluate screening programs to measure performance and apply corrective action where appropriate. Screening must be appreciated as a continuum of interrelated steps. Compromises in the quality at any one step can significantly reduce the benefit of a screening program.