Background: In most measurements of gene expression, mRNA is first reverse-transcribed into cDNA. We studied the reverse transcription reaction and its consequences for quantitative measurements of gene expression.
Methods: We used SYBR Green I-based quantitative real-time PCR (QPCR) to measure the properties of the reverse transcription reaction for the β-tubulin, glyceraldehyde-3-phosphate dehydrogenase, Glut2, CaV1D, and insulin II genes, using random hexamers, oligo(dT), and gene-specific reverse transcription primers.
Results: Experimental variation in reverse transcription-QPCR (RT-QPCR) was mainly attributable to the reverse transcription step. Reverse transcription efficiency depended on priming strategy, and the dependence was different for the five genes studied. Reverse transcription yields also depended on total RNA concentration.
Conclusions: RT-QPCR gene expression measurements are comparable only when the same priming strategy and reaction conditions are used in all experiments and the samples contain the same total amount of RNA. Experimental accuracy is improved by running samples in (at least) duplicate starting with the reverse transcription reaction.
Gene expression reflects both the genetic predisposition and the physiologic condition of the individual. From measurements of gene expression, it is possible to diagnose an individual’s state of health and also to monitor how an individual responds to medication, treatment, and altered living conditions. The expression of virtually all genes in a sample can be roughly assessed by cDNA microarray techniques, and the expression of selected genes can be measured by real-time PCR with very high accuracy (1)(2). In studies of new systems or in search for drug targets, key marker genes are typically identified by cDNA microarray screening and then studied in greater detail by more sensitive real-time PCR.
Both real-time PCR and cDNA microarray measurements are highly reproducible (2)(3)(4), but before the expression of any gene can be measured, the mRNA in the sample must be copied to cDNA by reverse transcription. The reverse transcription reaction is not very well understood, and it is expected to be the uncertain step in gene expression analysis. It can introduce biases as a result of effects of the secondary and tertiary structure of mRNA, variation in priming efficiency, and properties of the reverse transcriptase. The yield of the reverse transcription reaction can also be affected by reaction inhibitors present in biological samples (5)(6)(7)(8)(9)(10). To date, no published study has considered the accuracy and precision of the reverse transcription reaction.
Our aim was to study the properties of the reverse transcription reaction, using quantitative real-time PCR (QPCR)¹ as an analytical tool. We investigated the reproducibility, yield, dynamic range, sensitivity, and specificity of the reverse transcription reaction, using the β-tubulin, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (11)(12), Glut2, CaV1D, and insulin II genes, which are expressed differently in a pancreatic β-tumor mouse cell line (13). We also studied the effect of total RNA concentration on reverse transcription efficiency and compared priming with random hexamers, oligo(dT), and gene-specific primers.
Materials and Methods
cell culture, RNA isolation, and DNase treatment
A pancreatic β-tumor cell line (a generous gift from Dr. Gerhard Christofori, Vienna, Austria) derived from primary tumors of the Rip1Tag2 mouse was grown to confluence in DMEM (Sigma-Aldrich) containing 200 mL/L fetal calf serum (PAN Systems), 100 kilounits/L penicillin (Invitrogen), 100 mg/L streptomycin sulfate (Invitrogen), and 2 mmol/L L-glutamine (Invitrogen). Total RNA was prepared from harvested cells with the RNeasy Midi Kit (Qiagen) and treated with RNase-free DNase (Qiagen) according to the manufacturer’s instructions. The RNA concentration was measured by fluorescence (TD-360; Turner Designs) with the RiboGreen Quantitation Reagent (Molecular Probes) according to the manufacturer’s instructions. RNA integrity was verified by electrophoresis in a 1% agarose gel containing 54 g/L formaldehyde.
The SuperScript II (Invitrogen) reagent set was used for the reverse transcription reaction. Each 13-μL sample contained total RNA, ultrapure deoxynucleotide triphosphates (dNTPs; Amersham Pharmacia Biotech), and one of the following primers: random hexamers (Promega), oligo(dT) (Amersham Pharmacia Biotech), or a gene-specific primer (MWG-Biotech) designed to anneal to the target mRNA ∼80 bp before the start of the PCR product. The samples were heated at 65 °C for 5 min to denature the RNA and then chilled on ice for 5 min. We then added Tris-HCl (pH 8.3), KCl, MgCl2, and dithiothreitol (Invitrogen) to a total volume of 19 μL. The random hexamer-primed samples were incubated for 10 min at 25 °C. All samples were then heated to 42 °C for 2 min, and 1 μL of SuperScript II was added to give a final volume of 20 μL containing 500 μM dNTPs, 50 mM Tris-HCl, 75 mM KCl, 3 mM MgCl2, 5 mM dithiothreitol, 200 U of SuperScript II, and 0.1 μg/μL random hexamers, 0.05 μg/μL oligo(dT), or 1 μM gene-specific primer (Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol50/issue3/). The reverse transcription reaction was performed at 42 °C for 50 min and was stopped by heating to 70 °C for 15 min.
All real-time PCR assays contained 10 mM Tris (pH 8.3), 50 mM KCl, 1 U of Taq polymerase (Sigma-Aldrich), 200 ng/μL bovine serum albumin (MBI Fermentas), 3 mM MgCl2, 0.3 mM dNTPs (Sigma-Aldrich), 1:100 000× SYBR Green I (Molecular Probes), and 400 nM each PCR primer (MWG-Biotech; Table 1 in the online Data Supplement) in 20 μL. The reverse transcription and real-time PCR primers were designed using Primer3 (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi).
Real-time PCR was performed in a LightCycler (Roche Diagnostics) starting with 3 min of preincubation at 95 °C followed by 50 amplification cycles (Table 1 in the online Data Supplement). The threshold cycle (Ct) was determined by use of the maximum-second-derivative function of the LightCycler software. Formation of expected PCR product was confirmed by agarose gel electrophoresis (2%) and melting curve analysis (14).
Different reverse transcription priming strategies were compared as follows. cDNA was synthesized by use of random hexamer primers; oligo(dT) primers; one of the primers specific for the genes β-tubulin, GAPDH, CaV1D, insulin II, and Glut2; or a mixture of the five gene-specific primers. Reverse transcription without any primer was used as negative control. All reverse transcription reactions were performed in replicates of five on material from the same RNA pool. This eliminated sample-to-sample variation attributable to inhibition that may affect reaction efficiencies (2) as well as any deviations in efficiency attributable to variations in copy number, as has been reported in highly diluted samples (15). A schematic of the experimental setup is shown in Fig. 1 in the online Data Supplement.
The yield and reproducibility of cDNA synthesis of the β-tubulin, CaV1D, GAPDH, insulin II, and Glut2 genes were measured by QPCR with SYBR Green I detection (detailed protocols are given in Table 1 in the online Data Supplement). PCR efficiencies (E) and the median and SD (SDQPCR) of the Ct values were calculated for the five genes (Table 1) (16). For the β-tubulin, CaV1D, GAPDH, and insulin II assays, SDQPCR was <0.12 cycles (Table 1). This corresponds to less than $(1 + E)^{0.12} \approx (1 + 0.85)^{0.12} = 1.08$ (assuming an 85% PCR efficiency; Table 1), or <8% variation in the estimated number of cDNA molecules. For Glut2, which is expressed to a lesser degree than the other genes, SDQPCR = 0.36. This corresponds to a variation of 26% ($1.91^{0.36} = 1.26$) in the estimated number of cDNA molecules.
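The conversion from an SD in cycles to a relative variation in cDNA copies can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `relative_variation` and the replicate Ct values are hypothetical and are not data from the study; only the two (SD, E) pairs at the end are taken from the text.

```python
import statistics


def relative_variation(sd_ct: float, efficiency: float) -> float:
    """Fold variation in cDNA copies implied by an SD of sd_ct cycles.

    Uses (1 + E) ** SD, where E is the PCR efficiency (0.85 means 85%).
    """
    return (1.0 + efficiency) ** sd_ct


# Hypothetical replicate Ct values for one assay (NOT data from the study).
ct_replicates = [18.05, 18.12, 18.20, 18.10, 18.31]
median_ct = statistics.median(ct_replicates)
sd_ct = statistics.stdev(ct_replicates)  # sample SD (n - 1 denominator)

print(f"median Ct = {median_ct:.2f}, SD = {sd_ct:.3f} cycles")
print(f"fold variation = {relative_variation(sd_ct, 0.85):.3f}")

# The two cases quoted in the text.
print(round(relative_variation(0.12, 0.85), 2))  # 1.08, i.e. about 8%
print(round(relative_variation(0.36, 0.91), 2))  # 1.26, i.e. about 26%
```

Whether the sample SD or the population SD is appropriate depends on how the replicates were collected; the sample SD is used here as an assumption.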
efficiency and reproducibility of reverse transcription
Shown in Table 2 are the mean Ct values measured by QPCR for the five genes when the different reverse transcription priming strategies were used. In general, a low Ct value corresponds to high gene expression in the biological sample. However, in our comparative study, the same starting material was used for all reverse transcription reactions; therefore, in this study, a low Ct value indicates a more efficient reverse transcription reaction. A difference of one cycle in Ct between two reverse transcription priming strategies for a particular gene corresponds to a (1 + E)-fold difference in reverse transcription yield. As the data in Table 2 indicate, no reverse transcription priming strategy was best for all five genes. For example, for β-tubulin, the highest reverse transcription yield was obtained with oligo(dT) primer (Ct = 18.1), and random hexamers gave the lowest yield (Ct = 19.5). For CaV1D, the opposite was true: random hexamers gave the most efficient priming (Ct = 26.5), whereas oligo(dT) performed worst (Ct = 28.8). Also shown in Table 2 are the largest differences in Ct among the various priming strategies for the five genes. The difference ranged from 0.8 cycles for GAPDH to 4.4 cycles for Glut2. Expressed in terms of cDNA molecules, this corresponds to a 59% ($1.78^{0.8} = 1.59$) difference in cDNA synthesis yield between the best and worst priming strategies for GAPDH and a 17-fold variation ($1.91^{4.4} = 17.2$) for Glut2. Clearly, the choice of priming strategy can have profound effects on the yield of cDNA synthesis. The yields are evidently also gene dependent.
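As a small illustration of this reasoning, the following Python sketch picks the priming strategy with the lowest Ct for a gene and converts a Ct difference into a fold difference in yield. The function names are hypothetical; the Ct values, Ct differences, and efficiencies are the ones quoted in the text, and only two strategies per gene are included because those are the only values given there.

```python
def best_strategy(ct_by_primer: dict) -> str:
    """Return the priming strategy with the lowest Ct (highest RT yield)."""
    return min(ct_by_primer, key=ct_by_primer.get)


def yield_fold_difference(delta_ct: float, efficiency: float) -> float:
    """Fold difference in RT yield implied by a Ct difference of delta_ct."""
    return (1.0 + efficiency) ** delta_ct


# Ct values quoted in the text (same RNA pool for every reaction).
ct = {
    "beta-tubulin": {"oligo(dT)": 18.1, "random hexamers": 19.5},
    "CaV1D": {"oligo(dT)": 28.8, "random hexamers": 26.5},
}

for gene, ct_by_primer in ct.items():
    print(gene, "->", best_strategy(ct_by_primer))

# Largest Ct differences quoted in the text, with the matching efficiencies.
print(round(yield_fold_difference(0.8, 0.78), 2))  # GAPDH: 1.59
print(round(yield_fold_difference(4.4, 0.91), 1))  # Glut2: 17.2
```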
Reverse transcription yields when we used nonmatching or false primers were in all cases low (Table 2 in the online Data Supplement). For the highly expressed genes (β-tubulin, GAPDH, insulin II, and CaV1D), the reverse transcription yields with false primers were always much lower than with random hexamers, oligo(dT), individual gene-specific primers, or the mixture of the five gene-specific primers, but they were not negligible. In all cases, Ct with false primers was lower than the Ct of the negative control with no primers, indicating that priming of mRNA for reverse transcription is not a particularly stringent reaction. For the Glut2 gene, reverse transcription primers designed to be specific for the other genes gave lower Ct values than the Glut2-specific reverse transcription primer (Table 2 in the online Data Supplement). Clearly, the Glut2 reverse transcription primer hybridizes poorly to Glut2 mRNA under our conditions.
The reproducibility of the reverse transcription priming strategies can be estimated from the five replicates performed for each reverse transcription reaction (Fig. 1 in the online Data Supplement). The SD of a gene expression measurement, i.e., the SD of the determination of the amount of a particular mRNA (SDmRNA), is a weighted sum of the SDs of the QPCR (SDQPCR) and the reverse transcription (SDRT) reactions (16):

$$\mathrm{SD}_{\mathrm{mRNA}}^{2} = \mathrm{SD}_{\mathrm{RT}}^{2} + \mathrm{SD}_{\mathrm{QPCR}}^{2} \qquad (1)$$

With use of the SDQPCR calculated above, SDRT for the different assays could be calculated from SDmRNA (Table 1). SDRT was in the range 0.05–0.60, which is 0.7- to 28-fold higher than the typical SD of optimized QPCR assays. When we compared SDRT and SDQPCR, most experimental variation in the determination of mRNA for the β-tubulin, GAPDH, and insulin II assays was in the reverse transcription step. For the CaV1D and Glut2 assays, SDQPCR and SDRT were comparable. When we evaluated the reproducibility of the priming strategies, we found that different strategies were best for different genes. No single priming strategy outperformed the others. For CaV1D, oligo(dT) priming gave highest reproducibility, whereas for β-tubulin and GAPDH, gene-specific priming was optimal. For Glut2, the mixture of the five gene-specific primers gave the highest reproducibility, and for insulin II, random hexamers performed best. To test the significance of the determined SD, we calculated the covariance between samples and gene expression and found it to be negligible.
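The separation of the two variance contributions can be sketched as follows. This sketch assumes that the variances of the two steps add, i.e., SDmRNA² = SDRT² + SDQPCR²; that form is an assumption made for illustration, as are the function name `sd_rt` and the input values, which are not taken from the study's tables.

```python
import math
from typing import Optional


def sd_rt(sd_mrna: float, sd_qpcr: float) -> Optional[float]:
    """Estimate the SD of the reverse transcription step.

    Assumes independent, additive variances:
        sd_mrna**2 == sd_rt**2 + sd_qpcr**2
    Returns None when sd_mrna < sd_qpcr, in which case the contribution
    of the reverse transcription step cannot be estimated.
    """
    if sd_mrna < sd_qpcr:
        return None
    return math.sqrt(sd_mrna**2 - sd_qpcr**2)


# Hypothetical SDs in cycles (NOT values from the study).
print(sd_rt(0.13, 0.05))  # about 0.12
print(sd_rt(0.03, 0.05))  # None: combined SD is below the QPCR SD
```

Returning `None` rather than zero keeps the "could not be estimated" case distinguishable from a genuinely negligible reverse transcription contribution.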
accuracy of mRNA quantification
The accuracy of the estimation of gene expression can be improved by running samples in replicate and averaging the measurement results. Because mRNA quantification is performed in two steps, reverse transcription and QPCR, repeats can be done at either one or both steps. When designing experiments, one should consider the experimental accuracy of the two steps. Assuming that samples drawn from the target population are gaussian distributed, the true mRNA concentration (μ) is:

$$\mu = \bar{x} \pm t\,\frac{\mathrm{SD}_{\mathrm{mRNA}}}{\sqrt{n}} \qquad (2)$$

where x̄ is the estimated mRNA concentration calculated as the mean of n measurements with standard deviation (SDmRNA), for a particular confidence (t) (16). Rearranging Eq. 2 and substituting with the sampling error (ε = μ − x̄) gives:

$$n = \frac{t^{2}\,\mathrm{SD}_{\mathrm{mRNA}}^{2}}{\varepsilon^{2}} \qquad (3)$$

where SDmRNA² and ε² are both expressed as either absolute uncertainties or relative uncertainties. Because both the reverse transcription and QPCR steps contribute to experimental variation, Eq. 3 may be rewritten:

$$\varepsilon^{2} = t^{2}\left(\frac{\mathrm{SD}_{\mathrm{RT}}^{2}}{n_{\mathrm{RT}}} + \frac{\mathrm{SD}_{\mathrm{QPCR}}^{2}}{n_{\mathrm{QPCR}}}\right) \qquad (4)$$

where nRT and nQPCR are the total number of replicates in the reverse transcription and QPCR steps, respectively. The samples can be divided into DRT (DRT = nRT) aliquots before the reverse transcription step to improve the estimation of the cDNA synthesis yield and into DQPCR aliquots (nQPCR = DQPCR · DRT) before the PCR step to improve the estimation of cDNA amplification efficiency. The variation between identical runs in QPCR (intraassay variation) was <0.12 Ct for all genes but Glut2.
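The relation between the number of replicates, the SD, and the tolerated sampling error can be illustrated with a short Python sketch. It assumes the standard confidence-interval form n = t²·SD²/ε². The function name, the fixed t = 1.96 (a normal approximation; strictly, t depends on the number of replicates), and the example values are all assumptions for illustration.

```python
import math


def replicates_needed(sd: float, epsilon: float, t: float = 1.96) -> int:
    """Smallest n such that t * sd / sqrt(n) <= epsilon.

    sd and epsilon must be in the same units (both absolute or both
    relative). t = 1.96 is a normal approximation and an assumption here.
    """
    return math.ceil((t * sd / epsilon) ** 2)


# Hypothetical example: combined SD of 0.13 cycles, target error 0.1 cycles.
print(replicates_needed(0.13, 0.1))  # 7
# A more reproducible assay needs fewer replicates for the same target.
print(replicates_needed(0.05, 0.1))  # 1
```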
Although PCR is a cyclic reaction that accumulates errors, its reproducibility is significantly higher than that of the single-step reverse transcription reaction (Table 1). Using Eq. 4, we could estimate the sampling error for different experimental setups. As an example, for GAPDH with random hexamer priming (SDQPCR = 0.02; SDRT = 0.13), the sampling error when we divided the test sample into aliquots before QPCR (DRT = 1 and DQPCR = 4) was 0.30. If instead the samples were divided into aliquots before reverse transcription (DRT = 4 and DQPCR = 1), the sampling error was only 0.17. Hence, experimental accuracy was two times higher when the test sample was split into aliquots before the reverse transcription reaction than when it was split before the QPCR. The examples in Fig. 1 show how sampling error depends on experimental design for the different cases of four QPCR aliquots. In general, experimental accuracy is higher when samples are split into aliquots before rather than after the reverse transcription step. The only circumstance in which splitting samples after reverse transcription appears to be advantageous is when SDQPCR is greater than SDRT and cost is an issue.
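The comparison of the two designs can be sketched in Python. The sketch assumes the sampling error has the form ε = t·sqrt(SDRT²/nRT + SDQPCR²/nQPCR), with nRT = DRT and nQPCR = DQPCR · DRT. The t value used in the study is not stated in the text, so t = 1.96 below is a placeholder and the absolute errors printed are illustrative only; the ratio between the two designs does not depend on t.

```python
import math


def sampling_error(sd_rt: float, sd_qpcr: float,
                   d_rt: int, d_qpcr: int, t: float = 1.96) -> float:
    """Sampling error for a design with d_rt RT aliquots, each split
    into d_qpcr QPCR aliquots (assumed form; t is a placeholder)."""
    n_rt = d_rt
    n_qpcr = d_rt * d_qpcr
    return t * math.sqrt(sd_rt**2 / n_rt + sd_qpcr**2 / n_qpcr)


# SDs quoted in the text for GAPDH with random hexamer priming.
split_before_qpcr = sampling_error(0.13, 0.02, d_rt=1, d_qpcr=4)
split_before_rt = sampling_error(0.13, 0.02, d_rt=4, d_qpcr=1)

print(f"split before QPCR: {split_before_qpcr:.3f}")
print(f"split before RT:   {split_before_rt:.3f}")
print(f"ratio:             {split_before_qpcr / split_before_rt:.2f}")
```

Under these assumptions the ratio comes out close to 2, in line with the statement that accuracy is about two times higher when the sample is split before reverse transcription.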
dynamic range of reverse transcription
QPCR analysis has in several studies been shown to have a large dynamic range (17)(18). For quantitative gene expression analysis, the yield of reverse transcription must also be independent of template amount. This was the focus of the present study.
The amount of mRNA as determined by reverse transcription-QPCR analyses is given by:

$$n_{Ct} = \eta\, n_{0}\,(1 + E)^{Ct - 1} \qquad (5)$$

where nCt is the number of cDNA molecules after Ct amplification cycles, E is the PCR efficiency, n0 is the number of target mRNA molecules, and η is the reverse transcription efficiency defined as the fraction of mRNA molecules that are converted to cDNA in the reverse transcription reaction. The exponent in Eq. 5 is Ct − 1 and not Ct, as in the case of regular QPCR (2)(3), because reverse transcription generates single-stranded cDNA that is copied to double-stranded template in the first PCR cycle. For nCt to correctly reflect the amount of mRNA, η must be independent of both the total RNA and target mRNA concentrations. This is generally assumed but rarely verified. We studied the dynamic range of reverse transcription using the setup shown in Fig. 2 in the online Data Supplement. An RNA stock solution (1024 ng of total RNA) was diluted in steps of 4 with either water or yeast tRNA, which kept the total RNA concentration constant. cDNA was synthesized with use of either random hexamers or oligo(dT) priming, and the samples were PCR-amplified. The Ct values of the amplification curves of the samples serially diluted with tRNA varied linearly with the logarithm of the dilution factor (Fig. 2), as expected for a constant η (Eq. 5). This was not the case for the samples diluted in water; for these samples, plots of Ct vs total RNA concentration were curved, and the most dilute samples gave no specific signal at all.
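The relation between Ct, dilution, and reverse transcription efficiency can be sketched in Python. The sketch assumes the form nCt = η · n0 · (1 + E)^(Ct − 1) described in the text; the function names and all numeric inputs (threshold copy number, η, E) are hypothetical and chosen only to show the arithmetic.

```python
import math


def expected_ct_shift(dilution_factor: float, efficiency: float) -> float:
    """Ct increase expected per dilution step if eta stays constant."""
    return math.log(dilution_factor) / math.log(1.0 + efficiency)


def mrna_copies(n_ct: float, ct: float, efficiency: float, eta: float) -> float:
    """Solve n_ct = eta * n0 * (1 + E) ** (ct - 1) for n0."""
    return n_ct / (eta * (1.0 + efficiency) ** (ct - 1))


# With perfect doubling (E = 1), a 4-fold dilution shifts Ct by 2 cycles.
print(expected_ct_shift(4, 1.0))
# With E = 0.85 the shift per 4-fold step is larger.
print(round(expected_ct_shift(4, 0.85), 2))

# Round trip with hypothetical values: 1000 mRNA copies, eta = 0.5, E = 0.9.
n_ct = 0.5 * 1000 * 1.9 ** (20 - 1)
print(round(mrna_copies(n_ct, ct=20, efficiency=0.9, eta=0.5)))
```

A dilution series whose Ct steps deviate systematically from `expected_ct_shift` would suggest that η is changing with RNA concentration, which is the behavior described for the water-diluted samples.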
In recent years real-time PCR- and cDNA microarray-based assays have been developed for molecular diagnostics (19)(20)(21)(22) and for transcriptional profiling (23)(24). Some tests are already used in clinical practice, but before these methods can reach their full capacity, all technical steps, including sampling, RNA isolation, reverse transcription, cDNA quantification, and data analysis, must be carefully optimized and validated (25). We therefore studied the reproducibility, dynamic range, specificity, and sensitivity of the reverse transcription reaction.
The experimental reproducibility of QPCR, as manifested by the low SD of Ct values, is usually very high for common detection chemistries such as SYBR Green I (3), TaqMan probes (12), LightUp probes (2), and Molecular Beacons (26). Only when the number of cDNA molecules is low does SDQPCR increase as a result of interfering primer–dimer formation and statistical effects (15)(27). In the present study, primer–dimer formation was observed only in samples with Ct values >32. A Ct of 32 corresponds, under our conditions (SYBR Green I detection in LightCycler), to ∼100 cDNA molecules, which is where statistical effects start to become significant. Of the five genes studied, Glut2 had the highest Ct values with the lowest reproducibility (Tables 1 and 2). In a control experiment, larger amounts of Glut2 template were found to give Ct values as reproducible as the other QPCR assays [SDQPCR(Glut2) = 0.065]. Evidently, the poor quality of Glut2 data can be ascribed to low amounts of Glut2 cDNA. This in turn may be attributable to a low abundance of Glut2 mRNA or low reverse transcription yields for Glut2. Because all priming strategies gave rather high Ct values for Glut2 (Table 2), it is likely that the samples contained little Glut2 mRNA.
In general, when false primers were used, reverse transcription yields were higher than for the negative controls in which no primer was used (Table 2 in the online Data Supplement). This means that the false primers nonspecifically prime the reverse transcription reaction. Interestingly, the negative controls, which contained all reverse transcription components but no primers, gave some reverse transcription products. Evidently, reverse transcription can be primed by other RNA molecules present in the sample or perhaps by dNTPs (28). The lower assay temperature for reverse transcription compared with PCR is likely to contribute to the low degree of sequence specificity in the priming event. This problem may be overcome by use of thermostable reverse transcriptases. Primer hybridization relies on access to the appropriate target site in the mRNA and may vary substantially because of mRNA folding (29)(30). In general, a higher reverse transcription annealing temperature improves reverse transcription yields by reducing formation of mRNA secondary structures (7)(8).
Our results clearly show that the Ct of a reverse transcription-QPCR assay depends not only on the amount of target mRNA but also on the total RNA concentration (Fig. 2). We speculate this is attributable to adsorption artifacts that can be eliminated, or at least reduced, by the addition of carrier nucleic acids. We showed that yeast tRNA can be used as carrier, but other polymers, such as linear polyacrylamide, also work well (data not shown).
Gene expression measurements are usually performed as relative measurements (3)(11)(12)(31). Most experimental strategies compare the expression of target genes with the expression of nonregulated reference genes (32). In some cases it has been possible to use the relative expression of two reporter genes as an indicator (2)(33). Calculating the expression ratios, estimated as $(1 + E)^{Ct_{\mathrm{gene\,1}}}/(1 + E)^{Ct_{\mathrm{gene\,2}}}$ (3) for any two of the five genes studied here, we found that the ratio depends on the priming strategy. For example, the expression of β-tubulin relative to Glut2 is $1.91^{27.5}/1.79^{19.5} = 628$ when measured with use of random hexamers to prime reverse transcription and $1.91^{31.8}/1.79^{18.8} = 15\,200$ when gene-specific reverse transcription primers were used. Hence, the expression ratio of the two genes differs 15 200/628 = 24-fold when measured with use of two different priming strategies. Clearly, one cannot compare relative gene expression measurements performed under different priming conditions. In fact, it may even be uncertain to compare relative gene expression measurements performed with different batches of random hexamers or oligo(dT) because of batch-to-batch variation. Relative measurements of expression ratios, i.e., comparing the expression of genes in different samples, are possible by compensating for sample-to-sample variation in PCR efficiency by, for example, in situ calibration (2) or kinetic PCR (34)(35). Absolute measurements of expression ratios, i.e., comparing the expression of two genes in a single sample, are not meaningful because of variation in reverse transcription yield unless the experimental system is properly calibrated by use of external standards.
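The arithmetic behind the 24-fold discrepancy can be reproduced with a short Python sketch. The function name `expression_ratio` is hypothetical; the Ct values and efficiencies (0.91 for Glut2, 0.79 for β-tubulin) are those quoted in the text, with each gene using its own efficiency as in the worked numbers.

```python
def expression_ratio(ct_1: float, e_1: float, ct_2: float, e_2: float) -> float:
    """Ratio (1 + e_1) ** ct_1 / (1 + e_2) ** ct_2 for two genes."""
    return (1.0 + e_1) ** ct_1 / (1.0 + e_2) ** ct_2


# Glut2 (E = 0.91) against beta-tubulin (E = 0.79), values from the text.
random_hexamers = expression_ratio(27.5, 0.91, 19.5, 0.79)
gene_specific = expression_ratio(31.8, 0.91, 18.8, 0.79)

print(f"random hexamers:       {random_hexamers:.0f}")  # about 628
print(f"gene-specific primers: {gene_specific:.0f}")    # about 15 200
print(f"fold discrepancy:      {gene_specific / random_hexamers:.1f}")  # about 24
```

The same RNA pool gives two very different apparent ratios, so the discrepancy reflects the priming strategy rather than any biological difference.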
In conclusion, we show that experimental variation in reverse transcription-QPCR is mainly attributable to the reverse transcription step. The efficiency of the reverse transcription reaction depends on the priming strategy and also varies among different genes. The efficiency also depends on total RNA concentrations. When performing gene expression measurements by reverse transcription-QPCR we recommend (a) extensive optimization of the reverse transcription reaction, (b) running the experiment in at least duplicate starting with the reverse transcription step, (c) adjusting the total RNA concentrations to be the same in all samples by adding carrier, and (d) always using the same reverse transcription priming strategy and reaction conditions in experiments that are to be compared.
We thank S. Sigge (TATAA Biocenter, Gothenburg, Sweden) and his colleagues for their support. This work was supported financially by SWEGENE, the Crafoord Foundation, and the Wilhelm and Martina Lundgrens Science Foundation.
1 Total number of samples was 20.
2 SDRT is the SD of the reverse transcription step, and SDmRNA is the combined SD of the reverse transcription and QPCR steps (Eq. 1). The priming strategy that yielded the highest reproducibility (lowest SDRT) for each gene is underlined. The number of reverse transcription replicates was 5.
3 SDmRNA was less than mean SDQPCR, and the contribution from SDRT could not be estimated.
4 Mixture of the five gene-specific primers.
1 The priming strategy that gave highest reverse transcription yield for each gene is underlined. For β-tubulin, CaV1D, and insulin II, the optimum priming strategy is better than its alternatives with 99% confidence.
2 Mixture of the five gene-specific primers.
3 Maximum difference in reverse transcription efficiency among the four priming strategies.
¹ Nonstandard abbreviations: QPCR, quantitative real-time PCR; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; dNTP, deoxynucleotide triphosphate; and Ct, threshold cycle.
- © 2004 The American Association for Clinical Chemistry