BACKGROUND: A genomewide genetic and mutational profile of a fetus was recently determined via deep sequencing of maternal plasma DNA. This technology could have important applications for noninvasive prenatal diagnosis (NIPD) of many monogenic diseases. Relative haplotype dosage (RHDO) analysis, a core step of this procedure, would allow one to elucidate the maternally inherited half of the fetal genome. For clinical applications, the cost and complexity of data analysis might be reduced via targeted application of this approach to selected genomic regions containing disease-causing genes. There is thus a need to explore the feasibility of performing RHDO analysis in a targeted manner.
METHODS: We performed target enrichment by using solution-phase hybridization followed by massively parallel sequencing of the β-globin gene region in 2 families undergoing prenatal diagnosis for β-thalassemia. We used digital PCR strategies to physically deduce parental haplotypes. Finally, we performed RHDO analysis with target-enriched sequencing data and parental haplotypes to reveal the β-thalassemic status for the fetuses.
RESULTS: A mean sequencing depth of 206-fold was achieved in the β-globin gene region by targeted sequencing of maternal plasma DNA. RHDO analysis was successful for the sequencing data obtained from the target-enriched samples, including a region in one of the families in which the parents had similar haplotype structures. Data analysis revealed that both fetuses were heterozygous carriers of β-thalassemia.
CONCLUSIONS: Targeted sequencing of maternal plasma DNA for NIPD of monogenic diseases is feasible.
The discovery of cell-free fetal nucleic acids in maternal plasma during pregnancy has opened up new possibilities for noninvasive prenatal diagnosis (NIPD)⁴ (1). Maternal plasma consists of a mixture of DNA from the fetus and the mother and thus presents considerable challenges for many of its applications. The recent advent of massively parallel sequencing has catalyzed the development of clinical applications of circulating fetal DNA for NIPD (2). For example, use of this approach with maternal plasma has allowed the robust detection of fetal trisomies 21, 18, and 13 (3–8).
In addition, deep sequencing of maternal plasma DNA has recently been demonstrated to be useful for constructing a genomewide genetic and mutational map of the fetus (9). In this approach, the paternally derived half of the fetal genome is deduced through the detection of paternally inherited fetal single-nucleotide polymorphism (SNP) alleles that are present in maternal plasma but absent from the maternal genome. The maternally derived half of the fetal genome, on the other hand, is deduced from genomic regions for which the father is homozygous and the mother is heterozygous for the analyzed SNP alleles. The fetal inheritance of blocks of such alleles (i.e., a haplotype) is deduced by comparing the relative concentrations of such haplotypes in maternal plasma, a process called “relative haplotype dosage analysis” (RHDO analysis) (9). This approach has demonstrated that the entire fetal genome is present in maternal plasma. For clinical applications, however, such genomewide sequencing might not be the most cost-effective approach, because the clinically relevant genomic regions represent only a minor fraction of the sequencing data. Hence, it would be desirable to explore the possibility of using maternal plasma DNA to perform genetic and mutational profiling of selected genomic regions.
We have previously demonstrated that solution-phase hybridization allows the capture of specific genomic regions from maternal plasma while maintaining the fractional concentration of fetal DNA (10). It is not known, however, whether the uniformity of the capture process across multiple SNPs would allow the RHDO procedure to be performed with target-enriched maternal plasma DNA. If such a “targeted RHDO” process is possible, it would greatly enhance the use of massively parallel sequencing for the NIPD of monogenic diseases, because multiple genomic regions for different monogenic diseases prevalent in a particular population could be analyzed simultaneously.
In addition, the previous study (9) used only genomic regions in which the father's SNPs were homozygous and the mother's SNPs were heterozygous. On the other hand, the haplotype structures of the father and mother are expected to be very similar in the disease-causing genomic regions for monogenic diseases with a strong founder effect and for couples from consanguineous marriages. In other words, if the mother is heterozygous for SNPs in these genomic regions, the father is expected to be heterozygous as well. One would therefore need to develop the necessary algorithms if RHDO analysis is to be performed under this scenario.
Furthermore, the previous study deduced the maternal haplotype from fetal genotype information obtained via chorionic villus sampling (CVS) (9). In actual clinical applications, such information would not be available. Thus, there is a need to formally demonstrate the operation of the RHDO process in families in which the maternal haplotypes have been deduced without prior knowledge of the fetal genotype.
In this study, we attempted to address the 3 issues mentioned above. We sought to (a) demonstrate the feasibility of the targeted RHDO process, (b) develop the algorithms for applying the RHDO process to cases in which the father and mother are both heterozygous for the analyzed SNPs in a genomic region, and (c) achieve these goals in families in which the parental haplotype structures had been elucidated directly with parental genomic DNA via digital PCR (11). Owing to the importance of β-thalassemia in many parts of the world, including China (12), we have used β-thalassemia as a model system to illustrate these concepts.
Materials and Methods
SAMPLE COLLECTION AND DNA EXTRACTION
We recruited study participants with informed consent from the Department of Obstetrics and Gynaecology, Prince of Wales Hospital, Hong Kong. Ethics approval was obtained from the Joint Chinese University of Hong Kong–Hospital Authority New Territories East Cluster Clinical Research Ethics Committee. Blood samples from 2 pregnant women and their husbands were collected into EDTA-containing tubes before CVS. DNA was extracted from plasma and the buffy coat with the QIAamp DSP DNA Blood Mini Kit (Qiagen), as previously described (13). DNA was extracted from the CVS samples with the QIAamp Tissue Kit (Qiagen).
Parental buffy coat DNA and fetal DNA extracted from the CVS samples were genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 system, as described previously (9).
DESIGN OF BIOTINYLATED RNA CAPTURE PROBES
We designed 54 745 nonduplicated probes (120 nucleotides long) covering 5.11 Mb of the targeted genomic regions. A set of 972 (1.8%) of these probes covered 482 and 478 Affymetrix SNPs on the Affymetrix GeneChip Genome-Wide SNP 6.0 arrays for chromosome 11, which are located in genomic regions 5′ and 3′, respectively, of the HBB⁵ (hemoglobin, beta) gene. Another set of 2401 probes (4.4%) targeted other SNPs within a 288-kb region (Fig. 1A). These SNPs included 157 Affymetrix SNPs, 3425 SNPs archived in the UCSC database, and 20 SNPs identified in the 2 studied families by Sanger sequencing. The genome coordinates of the region were chr 11: 5060576–5348695 with reference to the human genome (Hg18 NCBI.36). The remaining probes (93.8%) were designed to target chromosomes 7, 13, 18, 21, and X for other studies. These probes were used in this study to calculate the fetal DNA concentration in maternal plasma. All of these RNA probes were biotinylated and were obtained from Agilent Technologies.
LIBRARY PREPARATION FOR TARGET ENRICHMENT AND MASSIVELY PARALLEL SEQUENCING
Libraries were prepared from the extracted plasma DNA with the Paired-End Sequencing Sample Preparation Kit (Illumina), as described previously (9). Targets were enriched with the SureSelect Paired-End Target Enrichment System (Agilent Technologies), as described previously (10).
SEQUENCING AND ALIGNMENT
Two capture libraries from the maternal plasma samples were each sequenced on a single lane of a flow cell on a HiSeq 2000 sequencer (Illumina) with a paired-end (PE) format of 50 bp × 2. The Short Oligonucleotide Alignment Program 2 (SOAP2) (http://soap.genomics.org.cn) was used to align 50-bp PE sequenced reads to the non–repeat-masked reference human genome (Hg18 NCBI.36); 2 nucleotide mismatches were allowed. The downstream analysis was performed with programs written specifically for this research.
GENOTYPING BY SANGER SEQUENCING
A 103-kb gene region extending 46 kb upstream and 57 kb downstream of the HBB gene was divided into 16 fragments and amplified via the PCR with secondary nested PCR primers (see Tables 1–3 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol58/issue10). We used 10 ng of the parental buffy coat DNA for each reaction. Thermal cycling was as follows: 92 °C for 2 min; 10 cycles of 92 °C for 20 s, 47 °C to 53 °C for 15 s, and 68 °C for 10 min; 24 cycles of 92 °C for 20 s, 47 °C to 53 °C for 15 s, and 68 °C for 10 min plus 20 s for each successive cycle; and 68 °C for 7 min. PCR products were analyzed by agarose gel electrophoresis, stained with GelRed (Biotium), and purified with illustra MicroSpin Columns (GE Healthcare). The targeted SNPs were sequenced with the BigDye Terminator v1.1 Cycle Sequencing Kit in an ABI 3100 Genetic Analyzer (Applied Biosystems).
HAPLOTYPE DETERMINATION BY DIGITAL PCR
Haplotyping was performed by digital PCR (11). Primary PCRs were seeded with 3–10 pg buffy coat DNA for each 10-μL reaction. Secondary nested PCRs were subsequently performed with an input of primary-PCR products that had been diluted 200-fold. PCRs were performed in 0.2-mL tubes or 96-well plates with the Expand Long Range, dNTPack kit (Roche Applied Science) and an iCycler (Bio-Rad Laboratories). PCRs were set up with 0.3 μmol/L PCR primers, 0.07 U/μL Expand Long Range Enzyme mix, 1× Expand Long Range buffer with MgCl2, and 0.5 mmol/L of each deoxynucleoside triphosphate. The thermal cycling protocol was the same as described above for genotyping, but with 18 cycles instead of 24 cycles for extension in the primary PCRs (see Tables 1–3 in the online Data Supplement for primer sequences and annealing temperatures).
Secondary nested PCR products were subjected to agarose gel electrophoresis, and the products of positive reactions were purified as described above. Each targeted SNP site on PCR amplicons was sequenced with an ABI 3100 Genetic Analyzer (Applied Biosystems) to determine whether the SNP sites were hemizygous. Amplicons with identical alleles at the same SNP site were bioinformatically combined into a single haplotype. One allele of each heterozygous SNP site was checked by sequencing and assigned to one haplotype, and the other allele was assigned to a second haplotype on the basis of the genotype information.
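The bioinformatic combination of single-molecule amplicons into haplotypes can be illustrated with a minimal sketch. It assumes that each amplicon is represented as a mapping from SNP identifier to the observed allele, that only maternally (or paternally) heterozygous SNPs are included, and that amplicons are supplied in an order in which each one overlaps a previously placed amplicon. The function and variable names are hypothetical and are not taken from the programs used in this study.

```python
def _agreement(fragment, haplotype):
    """Return True/False if shared SNPs agree/disagree, or None if no SNP is shared."""
    shared = [snp for snp in fragment if snp in haplotype]
    if not shared:
        return None
    return all(fragment[snp] == haplotype[snp] for snp in shared)


def assemble_haplotypes(fragments):
    """Merge single-molecule amplicon genotypes into two haplotypes.

    fragments: list of dicts {snp_id: allele}, heterozygous SNPs only.
    Returns (hap_a, hap_b) as dicts {snp_id: allele}.
    """
    hap_a, hap_b = {}, {}
    for fragment in fragments:
        a = _agreement(fragment, hap_a)
        b = _agreement(fragment, hap_b)
        if a is True and b is not True:
            hap_a.update(fragment)      # matches A at the overlapping SNPs
        elif b is True and a is not True:
            hap_b.update(fragment)      # matches B at the overlapping SNPs
        elif a is False and b is None:
            hap_b.update(fragment)      # conflicts with A, so it must lie on B
        elif b is False and a is None:
            hap_a.update(fragment)      # conflicts with B, so it must lie on A
        elif a is None and b is None and not hap_a:
            hap_a.update(fragment)      # first fragment seeds haplotype A
        else:
            raise ValueError("fragment cannot be placed unambiguously")
    return hap_a, hap_b


fragments = [
    {"s1": "A", "s2": "C"},
    {"s1": "G", "s2": "T"},
    {"s2": "C", "s3": "G"},
    {"s2": "T", "s3": "A"},
]
hap_a, hap_b = assemble_haplotypes(fragments)
print(hap_a)
print(hap_b)
```

The sketch raises an error rather than guessing when an amplicon shares no SNP with either haplotype, which mirrors the requirement for at least one overlapping SNP between adjacent amplicons.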
MEASUREMENT OF FRACTIONAL FETAL DNA CONCENTRATION
We calculated the fractional fetal DNA concentration in maternal plasma (f) from the sequencing data for SNPs that were homozygous in both parents but for different alleles by using the equation: f = Σ2p/Σ(p + q), where p is the read count of the fetus-specific allele (paternal origin) and q is the read count of the allele shared by the maternal and fetal genomes. Such SNPs were captured from regions on the 6 chromosomes selected for target enrichment.
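The equation above can be expressed as a short calculation. The following sketch assumes that the read counts for each informative SNP are available as (p, q) pairs; the function name and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def fetal_fraction(snp_counts):
    """Fractional fetal DNA concentration, f = sum(2p) / sum(p + q).

    snp_counts: iterable of (p, q) pairs for SNPs at which the parents are
    homozygous for different alleles, where p is the read count of the
    fetus-specific (paternal) allele and q is the read count of the allele
    shared by the maternal and fetal genomes.
    """
    counts = list(snp_counts)
    total = sum(p + q for p, q in counts)
    if total == 0:
        raise ValueError("no reads available for the informative SNPs")
    return 2 * sum(p for p, _ in counts) / total


# Hypothetical counts for three informative SNPs.
example = [(6, 94), (7, 93), (5, 95)]
print(f"f = {fetal_fraction(example):.2%}")
```

The factor of 2 reflects that the fetus is heterozygous at these SNPs, so the fetus-specific allele accounts for only half of the fetal DNA.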
RHDO ANALYSIS
The RHDO process was carried out for a series of SNPs within the same genomic region in which the mother was heterozygous. The combinations of SNPs would therefore form 2 haplotypes in the mother's genome, namely haplotypes I and II (HapI and HapII) (Fig. 2). The RHDO process involved the testing of statistical hypotheses that would allow us to determine whether HapI or HapII was more likely to have been inherited by the fetus.
Each SNP could be classified into type α or type β. A type α SNP was one in which the allele on the maternal HapI was identical to the paternal allele at the same SNP inherited by the fetus. In contrast, a type β SNP was one in which the inherited paternal allele was identical to the maternal allele on HapII. For type α SNPs, overrepresentation of HapI would be observed in maternal plasma if the fetus had inherited HapI from its mother. On the other hand, alleles on HapI and HapII would be equally represented if the fetus had instead inherited HapII from the mother. Similarly, for type β SNPs, overrepresentation of HapII would be observed in maternal plasma if the fetus had inherited HapII from the mother, whereas alleles on HapI and HapII would be equally represented if the fetus had instead inherited HapI from the mother (Fig. 2).
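The classification rule and the expected dosage patterns described above can be summarized in a brief sketch. The function names and return labels are hypothetical; the logic follows only the definitions given in the preceding paragraph.

```python
def classify_snp(hap1_allele, hap2_allele, inherited_paternal_allele):
    """Classify a maternally heterozygous SNP as type 'alpha' or 'beta'."""
    if hap1_allele == hap2_allele:
        raise ValueError("the mother must be heterozygous at this SNP")
    if inherited_paternal_allele == hap1_allele:
        return "alpha"   # paternal allele identical to the allele on HapI
    if inherited_paternal_allele == hap2_allele:
        return "beta"    # paternal allele identical to the allele on HapII
    return None          # paternal allele matches neither maternal allele


def expected_dosage(snp_type, maternal_hap_inherited):
    """Expected plasma pattern for a SNP type and inherited maternal haplotype."""
    if snp_type == "alpha":
        return "HapI overrepresented" if maternal_hap_inherited == "HapI" else "balanced"
    if snp_type == "beta":
        return "HapII overrepresented" if maternal_hap_inherited == "HapII" else "balanced"
    raise ValueError("snp_type must be 'alpha' or 'beta'")


print(classify_snp("A", "G", "A"), expected_dosage("alpha", "HapI"))
print(classify_snp("A", "G", "G"), expected_dosage("beta", "HapI"))
```

Because each SNP type is informative for only one of the two possible outcomes, type α and type β SNPs are counted separately.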
RHDO analysis was performed by using a sequential probability ratio test (SPRT)-based classification (14, 15) of cumulative counts of selected type α and type β SNPs. We excluded type α and type β SNPs that were separated from their neighboring SNPs by a physical distance of <200 bp. This filtering criterion minimized any inaccurate SPRT calls due to the overcounting of SNPs clustering in small regions hybridized on the same probe during enrichment. The null hypothesis for each SPRT analysis was that no dosage imbalance existed between the read counts for HapI and HapII. An odds ratio of 1200 was used to calculate the threshold for accepting or rejecting the null hypothesis. The equations for calculating the upper and lower classification thresholds of the SPRT curves have been described previously (9).
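The cumulative SPRT classification can be sketched as follows. This is a generic Wald SPRT written under stated assumptions and is not the exact set of equations from reference 9: the alternative hypothesis is taken to be a haplotype proportion of 0.5 + f/2 (all fetal DNA carrying the tested haplotype allele), and the odds ratio of 1200 is treated as a symmetric bound on the likelihood ratio. The function names and the example counts are hypothetical.

```python
import math


def sprt_log_likelihood_ratio(tested_reads, total_reads, f):
    """Log likelihood ratio of H1 (imbalance) versus H0 (no imbalance)."""
    p0 = 0.5             # H0: HapI and HapII equally represented
    p1 = 0.5 + f / 2     # H1 (assumed): maternal half plus all fetal DNA
    other_reads = total_reads - tested_reads
    return (tested_reads * math.log(p1 / p0)
            + other_reads * math.log((1 - p1) / (1 - p0)))


def rhdo_calls(snp_counts, f, odds_ratio=1200):
    """Accumulate reads SNP by SNP and emit a call whenever a bound is crossed.

    snp_counts: list of (reads_on_tested_haplotype, reads_on_other_haplotype)
    for SNPs of one type, ordered along the chromosome.
    """
    bound = math.log(odds_ratio)
    calls = []
    tested = total = 0
    for on_tested, on_other in snp_counts:
        tested += on_tested
        total += on_tested + on_other
        llr = sprt_log_likelihood_ratio(tested, total, f)
        if llr >= bound:
            calls.append("overrepresented")
            tested = total = 0   # start a new classification block
        elif llr <= -bound:
            calls.append("balanced")
            tested = total = 0
    return calls


# Hypothetical counts at a fractional fetal DNA concentration of 12%.
print(rhdo_calls([(60, 40)] * 10, f=0.12))
print(rhdo_calls([(50, 50)] * 10, f=0.12))
```

Counts are reset after each classification, so every call rests on an independent run of consecutive SNPs; any SNPs left over at the end of the region remain unclassified.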
Results
We analyzed 2 families in this study. The father in the first family was a carrier of the −CTTT 4-bp deletion at codons 41/42 of HBB, and the pregnant mother was a carrier of an A→G mutation at nucleotide −28 of HBB. A maternal blood sample was collected into an EDTA-containing blood tube at the 12th week of gestation. This family had been analyzed previously, with maternal plasma DNA having been sequenced in a genomewide fashion (9). The father in the second family was a carrier of an A→T mutation at codon 17 of HBB. The pregnant mother was a carrier of the −CTTT 4-bp deletion at codons 41/42 of HBB; a maternal blood sample was collected at the 11th week of gestation. Fetuses in both families were heterozygous carriers of β-thalassemia (Fig. 1, B and C). The fetus in the first family had inherited the mutation from the father, whereas the fetus in the second family had inherited the mutation from the mother. These findings were confirmed by genotyping the chorionic villus samples.
THE EFFECTIVENESS OF TARGETED SEQUENCING
The β-globin gene cluster comprises 4 genes: HBB, HBD (hemoglobin, delta), HBG1 (hemoglobin, gamma A), and HBG2 (hemoglobin, gamma G) (Fig. 1A). Probes were designed to cover DNA fragments from a 288-kb region of the cluster. After target enrichment, a mean of 0.6 × 10⁶ nonduplicated PE reads were mapped to the 288-kb region (Table 1). These PE reads, with a total length of 60 × 10⁶ nucleotides, covered almost the entire target, generating a mean sequencing depth of 206-fold per base.
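As a rough consistency check, the reported depth can be approximated from the rounded figures given above. The sketch assumes that every PE read contributes 2 × 50 sequenced bases and that the region spans the stated coordinates inclusively; because it uses the rounded mean read count, it yields about 208-fold, close to but not identical to the reported 206-fold value. The names are hypothetical.

```python
REGION_START = 5_060_576   # chr 11 coordinates of the targeted region (Hg18)
REGION_END = 5_348_695
READ_LENGTH = 50           # bases per read; each PE read has 2 ends


def mean_depth(pe_reads, read_length=READ_LENGTH,
               start=REGION_START, end=REGION_END):
    """Approximate mean per-base depth from the number of mapped PE reads."""
    region_length = end - start + 1            # inclusive coordinates (assumed)
    sequenced_bases = pe_reads * 2 * read_length
    return sequenced_bases / region_length


print(f"{mean_depth(600_000):.0f}-fold")
```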
For the first family, the fractional fetal DNA concentration measured in a nontargeted, genomewide fashion and the fractional fetal DNA concentration measured in a targeted fashion were very similar [11.43% (9) and 12.26%, respectively]. This result indicated that target enrichment had introduced no significant quantitative bias (10).
DETERMINATION OF FETAL HBB GENOTYPE IN THE FIRST FAMILY
We combined targeted sequencing and RHDO analysis to deduce whether the paternal and/or maternal β-thalassemic mutations had been inherited by the fetus. Given that both parents in the first family were heterozygous carriers of different mutations, we were able to seek the paternal mutation specifically in the targeted-sequencing data. Targeted sequencing of the maternal plasma DNA revealed that 60 of 741 sequence reads encompassing the paternal mutation site had the 4-bp deletion from the father (see Table 4 in the online Data Supplement). This result, indicating that the fetus had inherited the paternal mutation, was consistent with the genotyping data (Fig. 1B).
We then used RHDO analysis to deduce whether the fetus had inherited the maternal mutation. We performed digital PCR–based haplotyping across a 103-kb region within the 288-kb region captured for targeted sequencing (Fig. 1A and Fig. 3). In this family, we focused on SNPs heterozygous in the mother but homozygous in the father. Because these SNPs were homozygous in the father, we needed to deduce only the maternal haplotypes experimentally. We amplified DNA molecules at an extreme dilution, roughly 1 haploid genome for each reaction (11), and bioinformatically combined all PCR amplicons carrying hemizygous SNPs together into a single haplotype (Fig. 3). Fragments ranging in size from 2.4 kb to 10.4 kb and carrying 2–23 SNPs were amplified by the PCR to include at least one overlapping SNP at both ends. We identified 161 SNPs across 16 amplicons and used these data to deduce the 2 maternal haplotypes (Fig. 3A; see Table 4 in the online Data Supplement).
We analyzed 51 SNPs that fulfilled the filtering criterion for the SPRT (see Materials and Methods). These SNPs were classified into 5 type α and 46 type β SNPs, and they were counted separately in the SPRT (Fig. 2A). There was no statistically significant SPRT call for type α SNPs. For type β SNPs, we made 5 SPRT classifications, all of which indicated overrepresentation of HapII compared with HapI (Fig. 4A). These results suggested that the fetus had inherited HapII (where the wild-type HBB gene was located) from the mother (see Table 4 in the online Data Supplement). Therefore, the fetus was a heterozygous carrier of β-thalassemia, with the mutation having been inherited from the father. This conclusion was confirmed by CVS-based fetal genotyping, thus demonstrating the feasibility of the targeted RHDO process.
DETERMINATION OF FETAL HBB GENOTYPE IN THE SECOND FAMILY, IN WHICH BOTH PARENTS HAD VERY SIMILAR HAPLOTYPE STRUCTURES
None of the 826 sequence reads covering codon 17 of HBB contained the paternal β-thalassemic allele, suggesting that the paternal mutation had not been inherited by the fetus (see Table 5 in the online Data Supplement).
The haplotype structures for both parents in the second family were very similar: 94% of all analyzed SNPs that were heterozygous in the mother were also heterozygous in the father. The SNPs that were maternally heterozygous but paternally homozygous across the region were too few to permit RHDO analysis. Therefore, we needed to extend the RHDO analysis to SNPs that were heterozygous in both parents.
For such an analysis, we had to physically deduce all 4 paternal and maternal haplotypes. A number of SNPs that were heterozygous in both parents and were downstream of the HBB gene were relatively distant from each other (approximately 10 kb). Such distances had limited our ability to perform efficient amplification via digital PCR. Therefore, we performed haplotyping on a smaller region (48 kb) that encompassed 58 informative SNPs in a series of 6 amplicons (Fig. 3A; see Table 5 in the online Data Supplement). We identified 2 tagging SNPs that were homozygous in the mother but heterozygous in the father (Figs. 2B and 3A). By tagging the alleles of these SNPs to the physically deduced paternal haplotypes, we were therefore able to determine the inherited paternal haplotype from the targeted-sequencing data.
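The use of tagging SNPs to identify the inherited paternal haplotype can be illustrated with a minimal sketch. It assumes that each tagging SNP is homozygous in the mother and heterozygous in the father, that the phased paternal alleles are known, and that a small read-count cutoff (`min_reads`, a hypothetical parameter) separates a genuine paternally inherited allele from sequencing noise. The data layout and names are likewise hypothetical.

```python
def inherited_paternal_haplotype(tag_snps, min_reads=3):
    """Infer which paternal haplotype the fetus inherited from tagging SNPs.

    Each SNP is a dict with keys 'maternal' (the mother's homozygous allele),
    'pat_hap1' and 'pat_hap2' (phased paternal alleles), and 'plasma'
    (dict of allele -> read count in maternal plasma).
    Returns 'PatHap1', 'PatHap2', or None if the SNPs disagree or none is given.
    """
    votes = set()
    for snp in tag_snps:
        maternal = snp["maternal"]
        differs1 = snp["pat_hap1"] != maternal
        differs2 = snp["pat_hap2"] != maternal
        if differs1 == differs2:
            raise ValueError("exactly one paternal allele must differ from the mother's")
        # The paternal haplotype carrying the non-maternal allele is the 'carrier'.
        if differs1:
            carrier, other, allele = "PatHap1", "PatHap2", snp["pat_hap1"]
        else:
            carrier, other, allele = "PatHap2", "PatHap1", snp["pat_hap2"]
        # Detection of the non-maternal allele in plasma implies inheritance of the carrier.
        seen = snp["plasma"].get(allele, 0) >= min_reads
        votes.add(carrier if seen else other)
    return votes.pop() if len(votes) == 1 else None


tag_snps = [
    {"maternal": "A", "pat_hap1": "G", "pat_hap2": "A", "plasma": {"A": 180, "G": 12}},
    {"maternal": "C", "pat_hap1": "C", "pat_hap2": "T", "plasma": {"C": 200}},
]
print(inherited_paternal_haplotype(tag_snps))
```

Inferring inheritance from the absence of an allele is reliable only when the sequencing depth at the tagging SNP is adequate, so a practical implementation would also check the total read count.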
All SNPs on HapI were assigned as type α SNPs, because the allelic phasing of these SNPs in the 2 parents was the same and because all alleles on HapI were the same as those on the inherited paternal haplotype. We used a total of 24 SNPs for the RHDO analysis. We successfully made 3 SPRT classifications, all of which indicated inheritance of HapI by the fetus (Fig. 4B; see Table 5 in the online Data Supplement). Because the −CTTT 4-bp deletion at codons 41/42 of HBB was present on the maternal HapI, our data suggested that the fetus had inherited the maternal mutation. Thus, the fetus was a heterozygous β-thalassemia carrier, with the mutation having been inherited from the mother (Fig. 1C).
Discussion
We performed targeted sequencing and RHDO analysis on the β-globin gene cluster to reveal the fetal β-thalassemic profiles in 2 families. Compared with the findings of our previous study (9), the new results revealed that the regions of interest could be analyzed more cost-effectively after target enrichment.
The maternal haplotypes in our previous study (9) were deduced from the fetal genotypes, which would not be available in clinical scenarios. In this study, we deduced haplotypes across the β-globin gene cluster experimentally via digital PCR. Although haplotypes were successfully determined, this method is laborious, and its success is limited by such factors as the GC content and the distribution of SNPs across the analyzed region. These technical factors, as well as the specific haplotype structures of the 2 parents, have limited RHDO classification across regions both upstream and downstream of the mutations (see Tables 4 and 5 in the online Data Supplement). The possibility of recombination having occurred between an RHDO block and the mutation could not be excluded with the current data set, although the expected probability of such an event is low. These technical issues could be resolved by using new genomewide haplotyping approaches (16–19). In principle, haplotype blocks from any locus can be defined with these methods, thereby allowing analyses across regions both upstream and downstream of the mutations and analyses of other monogenic disorders, such as cystic fibrosis and congenital adrenal hyperplasia (20–22).
The classification of type α and type β SNPs for the SPRT in RHDO analysis depends on the identities of the inherited paternal alleles, compared with those on an assigned maternal haplotype. Thus, knowledge of all 4 parental haplotypes, particularly the inheritance of paternal alleles, is essential for assigning type α and type β SNPs. Previous work on the RHDO process (9) had studied only SNPs that were heterozygous in the mother but homozygous in the father. Because the paternal alleles were homozygous, the inherited paternal haplotype was known, and type α or type β SNPs could therefore be assigned for RHDO analysis with reference to the fetal genotypes in a trio family.
In this study, we used digital PCR to deduce all 4 parental haplotypes in a family and extended the use of polymorphic markers to all maternally heterozygous SNPs, including those that were heterozygous in both parents, thereby enhancing the spectrum of informative markers usable for NIPD. The use of a tagging SNP allowed all alleles on the inherited paternal haplotype to be observed in the targeted-sequencing data. In fact, assays for SNPs that are heterozygous in both parents are especially useful in NIPD for consanguineous families in which the parents share very similar haplotype structures and for genetic diseases having a strong founder effect.
The relative mutation dosage approach (23) is an alternative for noninvasively detecting a slight overrepresentation of a mutation in maternal plasma. When one is targeting a specific mutation, the relative mutation dosage process is simpler (e.g., it can be performed entirely via digital PCR and requires no haplotyping information) and is more cost-effective than the targeted RHDO process. Each additional mutation targeted would require the entire optimization process to be repeated, however. In contrast, the targeted RHDO process allows the same sequencing and bioinformatics work flow to be used for multiple genomic regions. Thus, the availability of both relative mutation dosage and targeted RHDO approaches allows clinicians and investigators to choose the process most appropriate for their applications.
In conclusion, we have demonstrated that targeted sequencing and RHDO analysis can be used for the NIPD of β-thalassemia. The concept we have illustrated could be generalized for other genetic disorders, thus expanding the application of plasma DNA–based NIPD.
We thank H. Sun for helpful discussion. We thank K.Y.W. Chan, L.Y.S. Chan, K.C.K. Chow, and Y. Jin for performing the sequencing. We thank the staff of the Department of Obstetrics and Gynaecology of the Prince of Wales Hospital for their kind help during the project.
⁴ Nonstandard abbreviations:
- NIPD, noninvasive prenatal diagnosis;
- SNP, single-nucleotide polymorphism;
- RHDO, relative haplotype dosage;
- CVS, chorionic villus sampling;
- SOAP2, Short Oligonucleotide Alignment Program 2;
- HapI, haplotype I;
- SPRT, sequential probability ratio test.
⁵ Human genes:
- HBB, hemoglobin, beta;
- HBD, hemoglobin, delta;
- HBG1, hemoglobin, gamma A;
- HBG2, hemoglobin, gamma G.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: R.W.K. Chiu, Sequenom; Y.M.D. Lo, Sequenom.
Stock Ownership: R.W.K. Chiu, Sequenom; Y.M.D. Lo, Sequenom; K.C.A. Chan, Sequenom.
Honoraria: None declared.
Research Funding: University Grants Committee of the Government of the Hong Kong Special Administrative Region, China, under the Areas of Excellence Scheme (AoE/M-04/06), Sequenom, and General Research Fund Scheme of the Hong Kong Research Grants Council (CUHK463109); Y.M.D. Lo, University Grants Committee Areas of Excellence Scheme and Sequenom.
Expert Testimony: None declared.
Patents: K.C.A. Chan, United States patent application 2011/0105353 and holding of or application for patents on noninvasive prenatal diagnosis using fetal nucleic acids in maternal plasma; Y.M.D. Lo, United States patent application 2011/0105353.
Other Remuneration: Y.M.D. Lo, Sequenom and Illumina.
Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.
- Received for publication May 15, 2012.
- Accepted for publication July 24, 2012.
- © 2012 The American Association for Clinical Chemistry