An interesting and potentially useful approach to biomarker identification was recently reported by Reddy et al. in the journal Cell (1). Using a combinatorial library of synthetic “shape” molecules (peptoids), they screened for ligands that bind antibodies in the serum of individuals with the target condition but that do not bind antibodies in the serum of individuals without the target condition. When this approach was investigated with 2 mice immunized with myelin oligodendrocyte glycoprotein (MOG) (which then developed a syndrome resembling human multiple sclerosis) and 2 mice not immunized with MOG, the investigators identified 3 peptoids that subsequently discriminated perfectly between new sets of mice: 7 immunized with MOG and 7 not immunized with MOG. This result was an elegant and convincing proof-of-principle experiment.
The authors then used the same protocol to determine if the approach “is capable of identifying potentially useful diagnostic antibody–peptoid pairs for a human disease state.” They compared the antibody responses to 15 000 peptoids of 6 patients with Alzheimer disease (AD) to the antibody responses of 6 age-matched nondemented control individuals and 6 patients with Parkinson disease. The authors report sensitivities ≥93.7%, specificities ≥93.7%, and areas under the ROC curve of 0.99 for the 3 most discriminatory peptoids when those peptoids were tested with serum samples from 16 different patients with AD and 16 nondemented control individuals.
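The arithmetic behind estimates of this kind can be sketched briefly. The counts used below are an assumption and are not taken from the Cell report: 15 correct classifications out of 16 in a group gives 93.75%, which is one split consistent with the reported values. The Wilson score interval is our choice of illustration, not a method used by the authors; it shows how much statistical uncertainty remains with 16 samples per group, quite apart from any bias in how the groups were assembled.

```python
import math


def proportion(correct: int, total: int) -> float:
    """Fraction correctly classified (sensitivity for cases, specificity for controls)."""
    return correct / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 15 of 16 cases test positive, 15 of 16 controls test negative.
    sensitivity = proportion(15, 16)
    specificity = proportion(15, 16)
    low, high = wilson_interval(15, 16)
    print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
    print(f"approximate 95% interval for either estimate: {low:.3f} to {high:.3f}")
```

Under these assumed counts the interval is wide, so even an unbiased study of this size would leave the true sensitivity and specificity quite uncertain.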
The purpose of this Perspective is to highlight the strengths of this study and, by contrast, the weaknesses of the study design used to develop and evaluate the biomarkers for AD. To the authors' credit, some of these weaknesses were acknowledged in their Cell report. Weak study designs are pervasive in the field of biomarker identification, however, and this flaw may be partly responsible for the slow pace of real progress in the development of clinically useful biomarkers. A common consequence of poor study design is that biomarkers with seemingly superb performance in early-phase studies subsequently show mediocre performance in rigorous validation studies. Beyond pointing out weaknesses that lead to such false-positive findings, we suggest some better general strategies for designing early-phase studies aimed at identifying candidate biomarkers for clinical use.
Consider first the clinical application for which a biomarker of AD is sought. The purpose is to test individuals who have mild cognitive impairment and identify those who are likely to develop AD, at least in the absence of intervention. Reddy et al. tested individuals with advanced AD and compared them with nondemented individuals; however, a biomarker that distinguishes between the extremes of advanced AD and normal cognitive function may not distinguish well between individuals with mild cognitive impairment destined to develop AD in the future vs those who will not develop AD. A biomarker common to individuals with cognitive impairment, for example, may work for comparing patients with and without AD (as described in the Cell report) but not for the clinical application. Likewise, a biomarker present only when AD is sufficiently advanced may work for the comparison examined in the Cell report but not for the clinical application. A better strategy, from both the discovery and evaluation points of view, would have been to test serum samples from individuals with mild cognitive impairment who subsequently were and were not diagnosed with AD. Large prospective cohort studies of aging individuals, such as the Cardiovascular Health Study (http://www.chs-nhlbi.org/), the Women's Health Initiative (http://www.nhlbi.nih.gov/whi/), or the Ginkgo Evaluation of Memory Study (http://www.nccam-ginkgo.org/), could provide the samples for this sort of design and therefore might provide a better basis for identifying and evaluating a biomarker for the intended clinical use.
Reddy et al. provided no detail on the enrollment of the individuals included in their study. Such detail matters. For example, individuals with AD often are under institutional care, whereas nondemented individuals typically live independently, and institutionalized individuals are likely to differ in many respects from those still living independently: levels of depression, anxiety, inactivity, or medication use may well be higher in institutionalized individuals. Such factors could give rise to molecular markers, such as those described in the Cell report, that distinguish between individuals with AD and nondemented individuals but that would not be useful in testing individuals for their future risk of developing AD. Biased comparison groups are a notorious source of false-positive findings in early-phase biomarker studies. A better strategy would be to identify the target population and to select the cases and controls randomly from that population (Fig. 1). In other words, individuals who subsequently developed AD (cases) and those who did not (controls) should be representative of those groups in the clinical population of interest, a quality made possible by selecting them randomly after classifying all individuals in the relevant population according to the outcome. Again, this procedure is best done in the context of a prospective cohort study of aging individuals.
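The selection procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any existing cohort's procedures: the `Participant` record, its field names, and the function `select_cases_and_controls` are hypothetical, and the outcome is assumed to have been ascertained for every member of the cohort before selection.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    participant_id: str   # hypothetical cohort identifier
    developed_ad: bool    # outcome ascertained during follow-up


def select_cases_and_controls(cohort, n_cases, n_controls, seed):
    """Classify the whole cohort by outcome, then sample each group at random."""
    rng = random.Random(seed)  # fixed seed so the selection can be audited and reproduced
    cases = [p for p in cohort if p.developed_ad]
    controls = [p for p in cohort if not p.developed_ad]
    if n_cases > len(cases) or n_controls > len(controls):
        raise ValueError("requested more samples than the cohort contains")
    # Sampling without replacement within each outcome group.
    return rng.sample(cases, n_cases), rng.sample(controls, n_controls)


if __name__ == "__main__":
    # Synthetic cohort: 100 participants, of whom 20 later developed AD.
    cohort = [Participant(f"P{i:03d}", developed_ad=(i < 20)) for i in range(100)]
    cases, controls = select_cases_and_controls(cohort, n_cases=5, n_controls=10, seed=2011)
    print([p.participant_id for p in cases])
    print([p.participant_id for p in controls])
```

Because every eligible case and control has the same chance of selection, the chosen samples represent their outcome groups in the cohort rather than whoever happened to be convenient to enroll.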
Another advantage of a well-designed prospective collection of samples is that all samples are collected, processed, and stored in a uniform manner. Reddy et al. provide no details about the collection of samples from the human study participants. For example, given the statement that several AD cases were autopsy-confirmed, collections from nondemented individuals might have been relatively recent, whereas those from individuals with AD might have occurred earlier. Such a difference in the timing of collections could give rise to spurious differences between the groups, for example, for biomarkers that degrade over time. False-positive findings could also arise from differences between the groups in the protocols for sample collection, processing, or storage. Some differences are likely if collections were done by different clinical staff or for different purposes.
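A small simulation can make the storage-time concern concrete. Everything in it is synthetic and hypothetical: the baseline level, the decay rate, the noise, and the storage durations are invented for illustration and describe no real biomarker or study. By construction the marker has no relationship to disease; the only difference between the groups is how long the samples were stored.

```python
import math
import random


def simulate_levels(n, storage_years_range, rng,
                    baseline=100.0, decay_per_year=0.1, noise_sd=5.0):
    """Simulate marker levels that depend only on storage time, never on disease status."""
    shortest, longest = storage_years_range
    levels = []
    for _ in range(n):
        years = rng.uniform(shortest, longest)
        # Exponential degradation during storage plus Gaussian measurement noise.
        levels.append(baseline * math.exp(-decay_per_year * years) + rng.gauss(0.0, noise_sd))
    return levels


def auc(group_a, group_b):
    """Probability that a random value from group_a exceeds one from group_b (ties count half)."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


if __name__ == "__main__":
    rng = random.Random(0)
    controls = simulate_levels(500, (0.0, 2.0), rng)        # recently collected
    cases_older = simulate_levels(500, (8.0, 12.0), rng)    # collected years earlier
    cases_matched = simulate_levels(500, (0.0, 2.0), rng)   # same storage times as controls
    print(f"AUC with unequal storage times: {auc(controls, cases_older):.3f}")
    print(f"AUC with matched storage times: {auc(controls, cases_matched):.3f}")
```

In this sketch the marker appears to separate the groups almost perfectly when storage times differ, and the separation disappears when storage times are matched, which is the pattern expected of a false-positive finding driven by sample handling.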
Although the Cell report presents substantial detail with regard to the technology used for measuring the biomarkers and the mice used in the experiments, scant information is provided regarding the human studies. The lack of information about the sources of the individuals and the samples for the human studies described in the Reddy et al. report is not uncommon in the literature on biomarker identification. Addressing this general concern requires improvements in the reporting standards for biomarker studies, both to improve the science and to move it forward to clinical application. Interestingly, journals currently seem to apply much more rigorous standards of reporting to therapeutic studies than to biomarker studies. In particular, more attention to thorough reporting of the clinical aspects of study design is needed for biomarker studies. For example, authors should provide detailed descriptions of enrollment procedures, eligibility criteria, outcome assessments, selection of cases and controls, and protocols for sample ascertainment, as well as techniques for biomarker measurement. Guidelines for the reporting of diagnostic test and prognostic marker studies already exist (2, 3) and could be adapted for the reporting of biomarker discovery/evaluation studies.
The design of biomarker studies would be much improved by using long-standing principles of good study design borrowed from population science. The basic concept is straightforward: prospective collection of samples and outcome ascertainment in the clinical context of interest with biomarker assays of random subsets of cases and controls (Fig. 1). We have described the details of this PRoBE (prospective sample collection, retrospective blinded evaluation) design (4) for cancer biomarker studies, but the same design is appropriate for early-detection, diagnostic, and prognostic studies of biomarkers for other diseases. For prognostic biomarkers, the clinical study population is patients with disease, whereas for early-detection biomarkers, the study population is healthy individuals. Basic-science investigators rely on clinical collaborators to provide samples. Therefore, clinical collaborators must be encouraged to construct sample repositories according to PRoBE principles and to make sample sets available for testing. The Early Detection Research Network (http://edrn.nci.nih.gov/) is an organization that facilitates the construction of biorepositories and collaboration between basic and clinical scientists. In addition, this network takes advantage of some excellent existing biorepositories, such as those of the Women's Health Initiative and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (http://prevention.cancer.gov/plco) for early-detection biomarkers.
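The "retrospective blinded evaluation" half of the design can be sketched as a simple coding step. This is an illustration under our own assumptions rather than a procedure prescribed in reference (4): the function names, the code format `S0001`, and the idea of holding the key in a dictionary are hypothetical. The point is only that the laboratory measuring the biomarker sees coded samples in random order, and that case or control status is attached after the assay results are locked.

```python
import random


def blind_samples(sample_ids, seed):
    """Shuffle the selected samples and assign neutral codes; return the codes and the key."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)  # random order so that code numbers carry no information
    key = {f"S{index:04d}": sample_id for index, sample_id in enumerate(shuffled, start=1)}
    # The laboratory receives only the codes; the key stays with the repository.
    return list(key), key


def unblind(assay_results, key, outcomes):
    """Attach outcome status to assay values once measurement is complete."""
    rows = []
    for code, value in assay_results.items():
        sample_id = key[code]
        rows.append((sample_id, outcomes[sample_id], value))
    return rows


if __name__ == "__main__":
    # Hypothetical outcome table: True marks a case, False a control.
    outcomes = {"P003": True, "P041": False, "P017": True, "P088": False}
    codes, key = blind_samples(outcomes.keys(), seed=7)
    # Placeholder assay values keyed by blinded code (invented numbers).
    assay_results = {code: 1.0 + 0.1 * position for position, code in enumerate(codes)}
    for row in unblind(assay_results, key, outcomes):
        print(row)
```

Keeping the key outside the laboratory means that measurement and any threshold choices cannot be influenced, even unintentionally, by knowledge of which samples came from cases.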
Ideally, quality sets of samples should be made available for biomarker identification (discovery) (5). Unfortunately, gatekeepers of biosample banks often are inclined to save quality sample sets for the validation of biomarkers and are reluctant to allow their use in discovery research. If we continue to base discovery on poorly designed sample sets that give rise to biased comparison groups, however, we risk continuing to produce many false leads from discovery studies and remaining stuck in the frustrating, wasteful cycle of discovering biomarkers that do not validate when subjected to more rigorous PRoBE evaluation.
Of course, in circumstances in which sample availability is extremely limited, such as for rare diseases, practical considerations dictate that discovery research be done with study designs that do not follow the PRoBE criteria. In such instances, we must wait for subsequent PRoBE-designed validation studies before drawing conclusions about biomarker performance and acknowledge the limitations of conclusions based on studies not designed according to PRoBE principles. The most serious concern about the Reddy et al. report is the implication that their study evaluations “constitute a fair and critical test of the peptoid–antibody complexes as biomarkers.” Because AD is not rare, it is unfortunate that an unbiased PRoBE set of samples was not used.
In closing, we congratulate Reddy and colleagues on some elegant experiments that prove the principle that a molecular “shape library” can be useful in finding markers to distinguish between 2 groups. We will not be surprised, however, if the markers identified in this study of AD patients do not validate well in future studies. If that occurs, we would encourage the investigators to redo their discovery and evaluation work with a better clinical study design so that the true potential of their technology for identifying useful diagnostic antibody–peptoid pairs for a human disease state can be realized.
Nonstandard abbreviations: MOG, myelin oligodendrocyte glycoprotein; AD, Alzheimer disease; PRoBE, prospective sample collection, retrospective blinded evaluation.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: No authors declared any potential conflicts of interest.
- Received for publication April 15, 2011.
- Accepted for publication May 18, 2011.
- © 2011 The American Association for Clinical Chemistry