Microarrays were first applied to experimental studies of gene expression, but newer microarray applications, such as comparative genomic hybridization, have evolved and reached clinical use first. Challenges to clinical use of expression arrays include potentially dramatic biological changes in gene expression resulting from preanalytic variation in patient and sample handling, analytical variation arising from the complex assay process, and daunting analytical and statistical problems in the simultaneous measurement of thousands of signals ranging over several orders of magnitude.
Arrays come in many formats, including chips, slides, microfluidic devices, and beads. They differ in the sets of genes assayed and in the probe sequences used to assay for a given gene. Some systems hybridize the sample and control to the same array, using two-color labeling, whereas others hybridize the sample and control to separate arrays. As a result, formats have incommensurable merits and often incommensurable data. Published information comparing formats is too limited for conclusions other than that data interchange might be feasible (1)(2).
This flowering of ingenuity is acceptable for hunting candidate genes with one format and verifying results by more common techniques, but for clinical applications, this variety is an obstacle. The typical application envisions a multiparametric “gene-expression signature” in which the expression patterns for many genes are combined to generate a “classifier” for diagnosis or prognosis. Laboratory 1, which uses array system brand A, publishes a well-designed study showing that a gene-expression signature distinguishes benign from malignant omphalomas. How should Laboratory 2, which uses brand X, adapt the “signature”? Clinical studies are also hampered by lack of well-defined controls. This problem is analogous to deciding, without benefit of reference materials, which of two immunoassays is better—when they use independently derived antibodies and different calibrators and controls—and then making this decision for 10 000 immunoassays at once. Some suppliers already provide tools to control for variation, including replicate probes, “spike-in” RNA controls, normalization algorithms, and image-quality metrics, but these also differ among formats. For comparing results among methods, it would be decidedly helpful to have widely available, standardized, renewable pools of RNA species that could monitor RNA purification, monitor cDNA labeling, verify sensitivity, and serve as controls.
A NIST workshop in March 2003, “Metrology and Standards Needs for Gene Expression Technologies: Universal RNA Standards” (3), initiated a consortium effort to design, produce, and validate two types of RNA reference materials applicable to both microarrays and quantitative reverse transcription-PCR (QPCR). A more recent consortium workshop focused on one type of RNA reference material (4). Cronin et al. (5) ably summarize the first meeting. Each RNA reference material will be a cRNA pool used to generate labeled cDNA, which is then hybridized to a (correspondingly modified) array:
Assay Process Standard (APS): A pool of 96 human cRNAs, as yet unspecified and presumably covering a range of concentrations, to be used for quality control in manufacture and to compare performance among array formats.
Universal Hybridization Standard (UHS): A pool of 12 “alien” cRNA sequences showing no sequence similarity with the human genome and spanning 12 orders of magnitude in concentration. This would be added to each RNA sample before labeling.
During an actual assay, the APS could be cohybridized with a complex sample in two-color format. In a single-color format, the APS and the complex sample would have to be hybridized to separate arrays. The performance of a simple cRNA pool will not necessarily reflect the performance of an array system with a more complex mixture. For such a format, perhaps a complex sample could be labeled in parallel with and without addition of an APS, and then hybridized to separate arrays (a recovery experiment).
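The recovery experiment amounts to a simple calculation: the difference in signal between the spiked and unspiked preparations, divided by the signal the added APS material should have contributed. A minimal sketch (the function name, signal units, and numbers are hypothetical illustrations, not values from the workshops):

```python
# Sketch of a spike-in recovery calculation for a single-color format:
# the same complex sample is labeled with and without an added APS transcript
# and hybridized to separate arrays. All numbers are illustrative, not assay data.

def percent_recovery(signal_spiked: float, signal_unspiked: float,
                     expected_added_signal: float) -> float:
    """Fraction of the added APS signal actually recovered, as a percentage."""
    return 100.0 * (signal_spiked - signal_unspiked) / expected_added_signal

# Example: endogenous signal reads 1200 units, the spiked sample reads 2100,
# and the added cRNA should contribute 1000 units if fully recovered.
print(round(percent_recovery(2100.0, 1200.0, 1000.0), 1))  # 90.0
```

A recovery well below 100% in such an experiment would suggest that the complex background suppresses labeling or hybridization of the APS transcripts, which is precisely what cohybridization in the two-color format would reveal directly.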
The UHS (“spike-in” controls) would provide information on efficiency of cDNA synthesis/labeling, uniformity of hybridization, and sensitivity of detection. A pool added to samples before RNA purification could monitor that process. The utility of APS and UHS for QPCR is clear. The proposed number of materials exceeds needs but will not resolve the fundamental question of how best to normalize QPCR.
The External RNA Control Consortium proposes a detailed plan for 100 cRNA “spike-in controls”, each alien to the combined genomes of humans and seven research organisms (4). If it expedited production, clinical users would accept controls alien to humans but not to Arabidopsis. The notion in both workshops, that “spike-in controls” permit comparison among arrays, should be treated with caution: sample RNA risks preanalytical variation long before a “spike-in” is added, and a pretty good RNA electrophoresis profile does not assure overall performance. Even a large number of “spike-in” controls (UHS) gives no information about the performance of array probes for genes of interest.
The APS proposal is both too ambitious and too modest. It is ambitious because a measure that permits “apparently” simple comparison among formats will meet commercial resistance. If the goal is to provide general quality control for manufacture, “alien” genes could serve. If the goal is to provide quality control for genes used in gene-expression signatures, the proposal is too modest. The number of 96 cRNAs was chosen to match microtiter-plate capacity. “Universal RNA standards” are currently available, typically pools of RNA from many sources (6). Such a “standard” can include mRNA for most genes on a large array, but at concentrations that are not individually controlled, and such material is neither easily replenished nor standardized. Given the extent of interest and the expectation of high expense, why not plan adjustable pools of 1000 cRNAs that could form a cohybridization control (negative or positive)? Issues of cost, intellectual property rights, and clone-library maintenance would have to be overcome.
In the short run, a well-chosen 96-gene APS would be helpful. As A.N. Whitehead observed, “Civilization advances by extending the number of important operations which we can perform without thinking about them” (7). There are related but thornier problems to address (or to muddle through) before clinical gene-expression arrays succeed:
Can a gene-expression signature be transferred among array formats (assuming gene signatures are not patented)? The data are not encouraging, but they are very limited (8). Regardless of methodology, arrays will need more overlap in the sets of genes tested. An RNA standard material might promote this.
Are gene-expression signatures robust? Differences in gene expression occur as a function of age, gender, medication, nutrition, stress, preoperative regimens (consider needle biopsy vs open biopsy), and RNA preparative methods.
How will the amount of acceptable sample heterogeneity be defined? The profile of a blood sample with 5% leukemic blasts will look different from the profile of a sample with 95% blasts.
Should clinical arrays probe a small number of genes or most genes? If the number is small, QPCR might be more effective. If the number is in the hundreds, the results would be met, rightly, with skepticism. Will a focused array need 200 control genes? Would an omnigene array be best because look-back data will probably become useful?
Which statistical quality-control measures are appropriate for the “gene-expression signature” of the controls? Quality control is already complex for indices that combine only three (or four) tests, namely the triple (or quadruple) birth defect screen, the highest-order multitest index in clinical chemistry.
Which statistical quality-control measures are appropriate for the signal of each array element? With thousands of elements, a few are likely to be “out” of control.
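The scale of that problem is easy to estimate. A back-of-the-envelope sketch (the array size, the ±3 SD control limits, and the assumption of independent elements are illustrative, not workshop specifications):

```python
# With independent +/-3 SD control limits applied per element, the chance that a
# single in-control element falls "out" is about 0.27%. Across thousands of
# elements, some false flags are nearly guaranteed. Figures are illustrative.

def expected_false_flags(n_elements: int, per_test_alpha: float = 0.0027) -> float:
    """Expected number of in-control elements flagged by chance alone."""
    return n_elements * per_test_alpha

def prob_at_least_one_flag(n_elements: int, per_test_alpha: float = 0.0027) -> float:
    """Probability that at least one in-control element is flagged."""
    return 1.0 - (1.0 - per_test_alpha) ** n_elements

print(round(expected_false_flags(10_000), 1))     # 27.0 elements "out" by chance
print(round(prob_at_least_one_flag(96), 3))       # even a 96-gene APS: 0.229
```

Naive per-element limits on a 10 000-element array would thus flag roughly 27 elements on every in-control run, which is why multivariate or false-discovery-style quality-control rules, rather than test-by-test limits, will be needed.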
This is not a litany of despair. Consider the state of immunohistochemistry and immunoassay, now several decades past their introduction. For some antibodies, immunohistochemistry results are controversial, but for others there is clear utility. Prostate-specific antigen immunoassays have clear utility, but the details are still controversial. NIST efforts at microarray standardization should be applauded and early fruition hoped for, especially because proteomics is beginning to grab attention. Still, the data-analysis challenges in proteomics might be just as tough—with mass spectrometry you cannot even see the spots.
© 2004 The American Association for Clinical Chemistry