Outcomes studies, long common on the therapeutic side of medicine, are appearing in the diagnostic arena. Outcomes can be defined as results of medical interventions (therapies or tests) in terms of health or cost. The studies of outcomes are important because funding for medical interventions increasingly depends on them; a major accrediting agency even defines “quality” entirely in terms of outcomes.
The study of laboratory-related outcomes is complex. Multiple steps occur between testing and outcomes, physicians act unpredictably on test results, and outcomes studies have high costs relative to potential profit from the test. Study design often must specify the action that is to follow a test result.
The model outcomes study is a randomized controlled trial (RCT). The CONSORT statement, which is used as a guideline for RCTs of therapies, is largely applicable to studies of diagnostic interventions. Recent laboratory-related RCTs have addressed questions such as: “Does routine testing before cataract surgery decrease morbidity or mortality?” and “Does fecal occult blood testing decrease the incidence of colorectal cancer?”
RCTs of tests are sometimes impractical. Other approaches include simulation modeling and the use of intervention and control periods of testing. As with RCTs, these approaches require careful attention to study design, data analysis, and interpretation and reporting of results.
Medical tests are central to modern medicine and require better characterization than has been the norm in the past. Tests must be characterized in several important ways (Table 1). Conceptually, the first step is to determine the analytical features of a new test. Clinical chemists have focused on (and are the traditional experts on) analytical features: precision, accuracy (or “trueness”), detection limits, linearity, interferences, and so forth. This focus is clearly evident from the “Information for Authors” of Clinical Chemistry (1), and articles in this Journal and elsewhere have exemplified the high standards of clinical chemists in defining the analytical performance of tests.
Clinical chemists have also been active in other areas of test evaluation. They are among the foremost pioneers in the characterization of nonanalytical factors related to tests, including within- and between-person biological variation. In recent years, clinical chemists and clinical epidemiologists have stressed the need to improve the characterization of the diagnostic accuracies of tests (2)(3). Indeed, the “Information for Authors” of Clinical Chemistry (1) now contains a checklist (3) of items to include in studies of diagnostic accuracy. By contrast, neither clinical chemists nor others have focused as intently on studies of the clinical usefulness of tests.
Study of the clinical usefulness of tests is arguably the most important of the types of test characterization in Table 1. An analytically sound test with high diagnostic accuracy may or may not be viewed as highly useful by the tested person. The test may have risks (e.g., radiation exposure or reaction to injected dye), discomfort, and financial costs. It may lead to anxiety or to risky therapy of unproven benefit. Its performance and interpretation may lead to delay in needed treatment. In sum, an accurate test may lead to poor outcomes.
This report considers the importance of outcomes studies for laboratory medicine; addresses problems in performing such studies; highlights a guideline for performance of randomized, controlled outcomes studies; and describes selected studies of laboratory-related outcomes that may serve to stimulate thought regarding ways to perform these studies.
What Are Outcomes and Outcomes Studies?
Outcomes are results of medical interventions in terms of health or cost (4). Patient outcomes are results that are perceptible to the patient (4). Typical measures in outcomes studies include mortality, morbidity, and cost of care, but also others such as nosocomial infection rate, length of stay (LOS),1 readmission rate, and satisfaction with care.
Outcomes studies are sometimes confused with studies of prognostic accuracy of a test. Prognostic accuracy is a special case of diagnostic accuracy. Studies of prognostic accuracy of a test assess the ability of a test to predict mortality, morbidity, and so forth (e.g., risk stratification). They address questions of the form: Is result A of test X associated with event Y? Guidance on the conduct of such studies is provided in guidelines for studies of diagnostic accuracy (2). By contrast, outcomes studies (of laboratory testing) address questions of the form: Is use of test X associated with outcome Y? Thus a study may ask: Does introduction of point-of-care testing (POCT) decrease LOS? Does use of fecal occult blood testing decrease the incidence of colorectal cancer? The test attributes that can be studied include not just availability of the test, but also the use of different versions of a test (e.g., with higher or lower imprecision), test turnaround time, and the method of relating results of the test to the patient or caregiver. Examples of test attributes that can be assessed in outcomes studies are given in Table 2, along with selected outcomes that can be measured.
An ideal outcomes study can address the question of whether use of the studied intervention leads to an anticipated outcome. The randomized controlled trial (RCT) is a powerful tool for such studies (see below). By contrast, observational studies may demonstrate an association of a desired outcome with the use of an intervention, but the association may reflect one or more confounding factors. For example, a higher socioeconomic status can confound a study because it may be associated with the use of a test and also with favorable outcomes that occur for reasons unrelated to the use of the test.
The RCT avoids many pitfalls of other study designs by randomly assigning subjects to intervention or control groups (5). It has been used for decades for studies of therapeutic interventions. Proper performance of an RCT is often a requirement for regulatory approval of drugs and for governmental decisions to pay for them. Recently, RCTs of biochemical tests have begun to appear in appreciable numbers (see below) (5). It seems reasonable to expect that many more such studies will be needed to inform decisions concerning whether and when physicians and patients will use biochemical tests, and whether governmental and corporate insurers will pay for them. The studies will need not only to assess whether the test leads to a favorable outcome, but also to quantify effects on outcomes (e.g., LOS, days lost from work, days of life gained), and to do so with reasonable confidence intervals.
Problems in Outcomes Studies of In Vitro Diagnostic Tests
Some salient problems in the performance of outcomes studies are listed in Table 3. Outcomes studies can be expensive, especially when long-term patient follow-up is required. The cost of a given study may exceed the test manufacturer’s anticipated financial benefit from the sale of the test or the potential increase in sales of the test that may result from the study, particularly when the test is already widely used. Similarly, payers (e.g., insurers and governmental agencies) may conclude that the cost of a proposed study will exceed potential savings from decreased use of the test. This conclusion is likely to be erroneous, however, when it is based only on the charges for the test and ignores downstream expenses such as follow-up testing of false-positive test results and the costs of unnecessary therapy and its untoward effects.
Related to the cost of the study is the sample size required to detect, with reasonable certainty, an outcome of an anticipated size. Smaller studies may have value if their results are reported adequately to allow metaanalyses (see CONSORT below). Attention to study design can improve efficiency of trials (5) and thus reduce costs dramatically.
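The dependence of cost on anticipated effect size can be illustrated with the standard two-proportion sample-size formula. The sketch below is generic, and the outcome rates used in the example (a drop in readmission rate from 10% to 8%) are hypothetical, chosen only to show the scale of enrollment that modest effects on common outcomes demand.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a difference
    between outcome proportions p1 and p2 with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical example: detecting a drop in readmission rate
# from 10% to 8% requires several thousand participants per arm.
n = n_per_group(0.10, 0.08)
```

Note how quickly the required enrollment shrinks as the anticipated effect grows: halving the outcome rate (10% to 5%) in this sketch needs fewer than 500 participants per arm, whereas the 2-percentage-point effect needs thousands.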
When the intervention (biochemical test, surgical procedure) under study is considered to be the standard of care, it may be unethical to withhold the intervention from patients and a randomized trial may not be feasible. In such cases, other study designs must be used with a full understanding and discussion of any increased potential for confounding.
For studies of outcomes related to use of biochemical tests, a key concern is the remoteness of many potential outcomes from the testing. The remoteness is both temporal and conceptual. Consider a test (fecal occult blood) designed to prevent colon cancer by detecting polyps. Such a test can be evaluated only after many years and after thousands of follow-up diagnostic procedures and removal of hundreds or thousands of colonic polyps.
Related to the problem of remoteness of outcomes is the inconsistent medical response to the results of medical tests. For occult blood testing, patients and physicians may ignore positive results. Conversely, they may proceed, despite a negative test, to colonoscopy and removal of observed polyps. Either of these responses leads to underestimation of the potential benefit from the test. To address this issue, the study design may require an agreement on the part of participants and physicians to follow a specified course of action after any given test result (5). This approach has the advantage of evaluating a more complete pathway of care (initiated by the test), but it may suffer from being a less realistic reflection of the real-world behavior of physicians and patients, and may involve therapeutic steps that are known to be suboptimal by the time the study is reported. In the latter case, if a markedly better therapy has become available, the value of the test may have been grossly underestimated by the study.
Masking (sometimes referred to by the unfortunate name, “blinding”, as in “double-blind experiment”) is an important aspect of most trials of medical interventions. Thus, placebo pills ideally are indistinguishable from the active ones, and participants in control groups may undergo sham procedures. With medical testing, it often is impossible to completely mask all participants from knowledge of the use of the test or of the test’s results. Study designs may call for the physician or patient to choose treatments based on the results of the test.2 When the intervention is a point-of-care test performed by the patient or physician, the problem is compounded.
Although it may be difficult to completely mask all involved in the study, it often is possible to mask the investigators who evaluate the outcomes. For example, when the outcome under investigation is occurrence of colorectal cancer, those searching databases need not know the allocations of the study participants. Similarly, those evaluating outcomes, such as days lost from work and hospital readmission rates, can be masked.
Failure of “allocation concealment” can pose problems in studies of outcomes. Allocation concealment implies that, at the time of patient enrollment, the investigators, physicians, and possible participants do not know the group (intervention or control) to which the (next) participant will be assigned. For studies of diagnostic tests, failure of allocation concealment may lead to performance of the test (or the newer test) preferentially in the most challenging cases because these are the ones in which physicians most want to have additional diagnostic test results. Because these patients may be destined for poorer outcomes, failure of allocation concealment could lead to underestimation of the value of the diagnostic test that is being evaluated.
The CONSORT Statement and RCTs
The RCT is a potent tool, the results of which dramatically affect the fortunes of patients, drug manufacturers, physicians, and insurers. The impact of the results of RCTs has led to a realization of the special importance of the quality of these studies and the need for tools to assess that quality. Two groups proposed guidelines (6)(7) for the reporting of RCTs, and a collaboration of the two groups led to the CONSORT statement (8). The guidelines are summarized in a checklist, the items of which are reproduced in Table 4 (see below). The guidelines have been adopted by numerous journals and can serve as a guide not only for reporting trials, but also for designing them according to modern principles of clinical epidemiology. Among the virtues of the guidelines is the promise that adherence to them will allow investigators to carry out meaningful systematic reviews and metaanalyses of the primary studies that follow the CONSORT reporting guidelines (which can be accessed at www.consort-guidelines.org). By the time this report is published, the guidelines will have been revised based on evaluation of the experience with the original CONSORT statement.
Although the CONSORT statement has been quoted most frequently in studies of therapeutic interventions, most, if not all, of it can be applied beneficially to studies of diagnostic tests. The CONSORT checklist (Table 4) includes several items that were discussed above. The guidelines include a flow diagram (not reproduced here) to describe the flow of patients through the study and to indicate loss of patients at different points.
To test the applicability of the CONSORT statement to RCTs of diagnostic tests, I examined the inclusion of CONSORT items in four recently published RCTs of diagnostic tests (9)(10)(11)(12). As shown in Table 4 (right-hand column), most of the items were included in at least three of the four studies. The one CONSORT item that no studies covered was the item related to masking. Although masking was described in some papers, CONSORT requires evidence that the masking was effective. This appears to be an area that requires particular attention in outcomes studies of diagnostic tests.
Four other items were covered in only one paper (each). Only one paper used the term “randomized trial” in its title. In two other papers, however, “randomized” appeared in the abstract so that a Medline search would have identified the article as such. A single study addressed the required sample size explicitly. Only one paper addressed prospectively defined stopping rules, but the investigators involved in the other studies may have felt that stopping rules were not “warranted” (which is when CONSORT calls for inclusion of the information), or they simply did not describe the rules in the paper, as the studies were not stopped early. Only one paper addressed separation of the generator from the executor of assignment. Again, it is not clear whether the authors of the other studies failed to address this issue in the study design or simply failed to describe the separation.
Studies of Laboratory-related Outcomes: Examples
Studies of outcomes related to laboratory testing have been of interest to investigators, clinicians, and payers for many years. Regrettably, however, the number of such studies has seemed trivial compared with the number of analytical studies of new diagnostic tests or with the number of outcomes studies of therapeutic agents. Nonetheless, early examples can be found. In 1976, for example, Durbridge et al. (13) described in this Journal a controlled study of outcomes of screening tests performed at the time of hospital admission of 100 patients. The LOS was unchanged by testing, but costs increased (13). Similarly, a 1981 randomized trial of multiphasic screening in >7000 outpatients (14) showed no effect on morbidity or mortality.
Several nonrandomized studies are instructive in the outcomes studied and in the way they have reached conclusions. Parvin et al. (15) investigated the possibility that POCT would decrease LOS in the emergency department. (It did not.) The study, which involved 4985 patients, measured baseline LOS before introduction of the testing, LOS during an intervention period with POCT, and then in a second period when POCT was removed. The lack of effect on LOS was found also by others in a later RCT (16) (see below).
Nichols et al. (17) studied the ability of POCT to provide required preprocedure test results rapidly and thereby prevent delays in the start of procedures in interventional radiology and invasive cardiology. As in the study of Parvin et al. (15), data were collected in a baseline period and compared with results during a period with POCT (17). The POCT was not removed after the period with POCT, leaving the study somewhat more subject to error if an effect was seen, because an “effect” could reflect other, contemporaneous changes in the health system. In this study, POCT had limited impact when used alone. Waiting time decreased, however, in a further testing period when workflow was changed to take advantage of POCT. This study, although perhaps viewed primarily as a management study, is a type of outcomes study.
Examples of recent RCTs of biochemical testing (9)(10)(11)(12)(16) are shown in Table 5. Two of the studies addressed the use of POCT in place of testing in a central laboratory (9)(16). Both addressed similar outcomes, including health-related outcomes, costs, and acceptance by patients and physicians. Both contained most of the elements desired by the authors of the CONSORT statement.
Each of the other three studies in Table 5 asked whether availability of a test affected outcomes. The number of patients in each study was large: 6352 patients in the study of early cardiac markers (including creatine kinase MB and myoglobin) (10); 18 189 patients in the study of routine preoperative testing (including electrolytes, urea, creatinine, and glucose) (12); and 46 551 participants in the study of fecal occult blood testing (11).
Consistent with the difficulty of funding of outcomes studies (Table 3), most large RCTs of biochemical tests address tests that are used in the diagnosis or management of common disorders. The tests in Table 5 concern diabetes, coronary artery disease, colon cancer, cataracts, and conditions that prompt people to attend the emergency department. These common conditions demand attention because of the expense involved: the annual cost to Medicare of routine testing before cataract surgery, for example, is approximately $150 million (12). It is not clear how RCTs of tests for rare disorders will be funded. This question seems particularly important as testing is increasingly available for rare genetic conditions.
Modeling of Outcomes
A variant of outcomes studies that may be useful is computer modeling. Golan et al. (18) considered the evidence that angiotensin-converting enzyme inhibitors were beneficial for people with diabetes regardless of the presence or absence of microalbuminuria. They questioned whether microalbumin testing was needed, given the cost and the need for patient compliance with testing. To address this question, they developed a Markov model of progression of diabetes. The model showed an advantage in life expectancy (at a modest cost) of treating all patients without testing for microalbuminuria.
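The general structure of such a Markov cohort model can be sketched briefly. In the sketch below, the health states, the 50-year horizon, and all transition probabilities are invented for illustration; they are not the states or probabilities used by Golan et al.

```python
# Illustrative three-state Markov cohort model. States and yearly
# transition probabilities are hypothetical, for demonstration only.
STATES = ["no_nephropathy", "microalbuminuria", "dead"]

def life_expectancy(transition, years=50):
    """Advance a cohort through the yearly transition matrix and sum
    the fraction alive each cycle (a crude life-expectancy estimate)."""
    cohort = [1.0, 0.0, 0.0]  # everyone starts without nephropathy
    total = 0.0
    for _ in range(years):
        cohort = [sum(cohort[i] * transition[i][j] for i in range(3))
                  for j in range(3)]
        total += cohort[0] + cohort[1]  # fraction still alive
    return total

# Hypothetical matrices: treating everyone slows progression slightly
# compared with a test-first strategy. Rows must sum to 1.
treat_all  = [[0.95, 0.03, 0.02], [0.0, 0.96, 0.04], [0.0, 0.0, 1.0]]
test_first = [[0.94, 0.04, 0.02], [0.0, 0.95, 0.05], [0.0, 0.0, 1.0]]

gain = life_expectancy(treat_all) - life_expectancy(test_first)
```

A real model of this kind would add many more states (overt nephropathy, end-stage renal disease), costs per state, and discounting, but the mechanics are the same: the comparison between strategies reduces to running two transition matrices over the same cohort.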
Simulation modeling has been used to investigate analytical goals (quality specifications) for glucose meters that are used as aids in the selection of insulin doses (19). This study (19) aimed to quantify the relationship between meter errors and errors in insulin dose. This was done by creating a “virtual” meter, with specified imprecision and bias, that “analyzed” 10 000 (or 20 000) “samples” with known glucose concentrations. The known and “measured” concentrations were each used to identify the corresponding dose of insulin specified by a commonly used rule. The frequency of discrepant doses was then tabulated, and the process was repeated for a meter with another combination of imprecision and bias. The resulting data allowed generation of plots of insulin error rates related to imprecision or bias or both. Further modeling conceivably could relate insulin dose errors to changes in plasma glucose and thus to rates of diabetic complications. The key point is that modeling of outcomes may provide a way to address quality specifications for biochemical methods.
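The virtual-meter procedure described above can be sketched as follows. The dosing rule, glucose range, and sample count here are stand-ins chosen for illustration, not the rule or parameters of the cited study (19).

```python
import random

def insulin_dose(glucose_mg_dl):
    """A simple sliding-scale rule (illustrative only, not the rule
    used in ref. 19): 2 extra units per 50 mg/dL above 150."""
    if glucose_mg_dl < 150:
        return 0
    return min(10, 2 * ((glucose_mg_dl - 150) // 50 + 1))

def discrepant_dose_rate(bias_pct, cv_pct, n=10_000, seed=1):
    """Simulate a 'virtual meter' with the given bias and imprecision
    (CV) and count how often the measured value, fed through the
    dosing rule, yields a different dose than the true value would."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        true_glucose = rng.uniform(50, 400)   # assumed sample range
        measured = true_glucose * (1 + bias_pct / 100)
        measured += rng.gauss(0, measured * cv_pct / 100)
        if insulin_dose(measured) != insulin_dose(true_glucose):
            errors += 1
    return errors / n
```

Repeating the simulation over a grid of bias and CV values yields the error-rate surface described in the text, from which a quality specification (e.g., the largest bias/CV combination keeping dose errors below a chosen rate) can be read off.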
This selective review suggests that outcomes studies are becoming more common in laboratory medicine, that they will likely determine which tests are used and whether the tests are paid for, that such studies may be used to assess analytical quality requirements for tests, and that the design and reporting of outcomes studies requires careful attention to the principles of clinical epidemiology. The difficulty of performing these studies represents an important challenge to workers in all areas of laboratory medicine.
1 Nonstandard abbreviations: LOS, length of stay; POCT, point-of-care testing; and RCT, randomized controlled trial.
2 The problem of masking can be addressed even in this situation by having the treatment option dictated by a third party, according to an agreed algorithm. The third party need not disclose the test result to the physician, only the therapy. For example, blood samples may be obtained from all participants in the study, but the test results for the control group may be ignored by the third party. For the controls, the choice between two treatment options is then made by the (remote) third party without use of the test result, e.g., randomly or based solely on other features of the patient. The physician can be told the treatment path to follow and will not know whether the decision was or was not based on testing. Such a study design could be applied (for example) to compare patient history (alone) with history plus HIV sequence data as guides to the choice of initial antiretroviral therapy.
© 2001 The American Association for Clinical Chemistry