Abstract
BACKGROUND: Humans suffer from infections caused by single species or more complex polymicrobial communities. Identification of infectious bacteria commonly employs microbiological culture, which depends upon the in vitro propagation and isolation of viable organisms. In contrast, detection of bacterial DNA using next generation sequencing (NGS) allows culture-independent microbial profiling, potentially providing important new insights into the microbiota in clinical specimens.
METHODS: NGS 16S rRNA gene sequencing (NGS16S) was compared with culture using (a) synthetic polymicrobial samples for which the identity and abundance of organisms present were precisely defined and (b) primary clinical specimens.
RESULTS: Complex mixtures of at least 20 organisms were well resolved by NGS16S with excellent reproducibility. In mixed bacterial suspensions (10⁷ total genomes), we observed linear detection of a target organism over a 4-log concentration range (500 to 3 × 10⁶ genomes). NGS16S analysis more accurately recapitulated the known composition of synthetic samples than standard microbiological culture using nonselective media, which distorted the relative abundance of organisms and frequently failed to identify low-abundance pathogens. However, extended quantitative culture using selective media for each of the component species recovered the expected organisms at the proper abundance, validating NGS16S results. In an analysis of sputa from cystic fibrosis patients, NGS16S identified more clinically relevant pathogens than standard culture.
CONCLUSIONS: Biases in standard, nonselective microbiological culture lead to a distorted characterization of polymicrobial mixtures. NGS16S demonstrates enhanced reproducibility, quantification, and classification accuracy compared with standard culture, providing a more comprehensive, accurate, and culture-free analysis of clinical specimens.
Clinical management of infection relies on early detection and identification of the pathogen(s) involved to facilitate appropriate, targeted therapy. The recovery of individual organisms in culture allows for biochemical and phenotypic identification as well as antimicrobial susceptibility testing; culture therefore remains the mainstay of the current clinical microbiology laboratory. However, clinically relevant organisms may be difficult to culture, including fastidious and/or slow-growing bacteria that are vulnerable to improper sample storage, transport, or culture conditions (1–4). Additionally, culture-based identification may take multiple days or weeks, and there are practical limits to the number of different species within a complex mixture that can be isolated and identified using culture-based techniques (5).
Standard molecular methods, in which target genes are PCR-amplified from patient specimens for conventional Sanger sequencing, do not depend on in vitro growth for bacterial detection and have become routine in many clinical microbiology laboratories (6). However, this approach is not readily applied to polymicrobial specimens, where the presence of multiple templates results in superimposed Sanger electropherograms that are generally uninterpretable (5). More recently, next generation sequencing (NGS)3 has been used to overcome this limitation: because millions of DNA molecules are sequenced independently and in parallel, each can be classified individually. With the advent of rapid and affordable bench-top DNA sequencers, NGS is becoming an attractive extension of Sanger sequencing methods for culture-independent microbial profiling in clinical microbiology laboratories.
Although NGS analysis offers theoretical advantages over standard microbiological culture, rigorous comparisons of the 2 methods have not been performed. Defined mixtures of genomic DNA are commercially available, but cannot be used to evaluate culture performance. Comparisons using patient samples are complicated by the fact that the true identity and relative abundance of bacteria present are unknown. In this study, we evaluated NGS and microbiological culture using both clinical specimens and synthetic polymicrobial test samples created by combining bacteria in precisely defined ratios.
Materials and Methods
SAMPLES
Use of clinical microbiological specimens was approved by the University of Washington Human Subjects Review Board (approval number 42541) and was conducted in accordance with the Declaration of Helsinki. As a minimal risk study utilizing surplus microbiological isolates, consent was not required. Control amplicon mixtures (see Supplemental Fig. 1A in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol62/issue11) were generated by pooling equal amounts of individual amplicon sequencing libraries. Commercial control mixtures (see online Supplemental Fig. 1B) containing genomic DNA from 20 bacterial species at defined 16S rRNA operon counts (10⁵ copies/μL each, or 10³–10⁶ copies/μL) were obtained from BEI Resources (Mock Community B). Synthetic polymicrobial test samples (see online Supplemental Fig. 1C) were created by growing bacteria separately in BHI broth with vitamin K and hemin (Remel R07019), precisely measuring the bacterial concentration for each culture using a Multisizer 4 cell counter (Beckman Coulter), and combining in defined ratios. DNA was extracted from synthetic test samples using the UltraClean microbial DNA isolation kit (MOBIO) and from human-derived specimens using a High Pure PCR template preparation kit (Roche) with mechanical disruption of samples with 1.4-mm ceramic beads followed by enzymatic lysis via proteinase K.
LIBRARY PREPARATION, SEQUENCING, AND DATA ANALYSIS
Sequencing libraries were prepared as previously described (7); a detailed description is provided in the online Supplemental Methods file. Briefly, the 16S rRNA V1–V2 region was amplified using custom primers incorporating Illumina-compatible sequencing adaptors and a sample-specific 8-bp barcode sequence. Paired-end sequencing was performed on an Illumina MiSeq using a 500-cycle sequencing kit (version 2) with custom primers (7) and in the presence of bacterial whole-genome shotgun sequencing libraries or 7% PhiX control (Illumina). Raw data processing and run demultiplexing were performed using on-instrument software. Sequences were processed and classified as previously described (5, 8). Where appropriate, the relative abundance of each classification was corrected for 16S ribosomal RNA operon copy numbers (9) using the NCBI taxonomy database (10) and the Ribosomal RNA Database project (RDP; release 10, update 32), and was expressed as percent corrected reads. For each taxonomic node where 16S rRNA copy numbers were not available, copy numbers were inferred in 1 of 2 ways. First, for taxonomic nodes where 16S copy numbers were available in at least 1 descendant taxonomic classification of lower rank (e.g., if genus “Escherichia” had an unknown copy number but species “Escherichia coli” had a known copy number), that node was assigned the median copy number of all immediate descendant classifications. This process was repeated recursively at each taxonomic level up to the root of the taxonomy. Second, and conversely, for taxonomic classifications where 16S rRNA copy numbers of a lower-order rank were not available (e.g., if species “Escherichia coli” had an unknown copy number but genus “Escherichia” had a known copy number), each descendant node was assigned the 16S rRNA copy number of its immediate parent node. This process was repeated recursively, working out from the root of the taxonomy, until every taxonomic node was assigned a copy number.
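The two-pass copy number inference described above can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy taxonomy, and the copy numbers are hypothetical. It also assumes that the median is taken over those immediate children that have a value, and that correction means dividing read counts by copy number before renormalizing to percent.

```python
from statistics import median


def infer_copy_numbers(children, known, root):
    """Fill in a 16S copy number for every node of a taxonomy tree.

    children: dict mapping a node name to a list of its immediate children
    known:    dict mapping node names to available copy numbers
    root:     name of the root node
    """
    copies = dict(known)

    def fill_up(node):
        # Pass 1 (leaves toward root): resolve children first, then give an
        # unknown node the median of its immediate children that have a value.
        values = []
        for child in children.get(node, []):
            fill_up(child)
            if child in copies:
                values.append(copies[child])
        if node not in copies and values:
            copies[node] = median(values)

    def fill_down(node):
        # Pass 2 (root toward leaves): a node still lacking a value
        # inherits the copy number of its immediate parent.
        for child in children.get(node, []):
            if child not in copies and node in copies:
                copies[child] = copies[node]
            fill_down(child)

    fill_up(root)
    fill_down(root)
    return copies


def corrected_percent(read_counts, copies):
    """Divide reads by copy number, then renormalize to percent (assumed)."""
    weighted = {taxon: n / copies[taxon] for taxon, n in read_counts.items()}
    total = sum(weighted.values())
    return {taxon: 100.0 * w / total for taxon, w in weighted.items()}


# Hypothetical toy taxonomy; the copy numbers are illustrative only.
children = {
    "root": ["Escherichia", "Staphylococcus"],
    "Escherichia": ["Escherichia coli", "Escherichia albertii"],
    "Staphylococcus": ["Staphylococcus aureus", "Staphylococcus sp. X"],
}
known = {"Escherichia coli": 7, "Escherichia albertii": 7, "Staphylococcus": 5}
copies = infer_copy_numbers(children, known, "root")
percent = corrected_percent(
    {"Escherichia coli": 700, "Staphylococcus aureus": 500}, copies
)
```

In this toy case the genus “Escherichia” receives the median of its species, while the two Staphylococcus species inherit the genus value, so raw counts of 700 and 500 reads correct to equal abundance.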
MICROBIOLOGICAL CULTURE
Microbiological culture was performed by the University of Washington Clinical Microbiology Laboratory, according to standard clinical procedures. Sputum samples were plated on sheep blood, MacConkey, chocolate, mannitol salt, and cepacia agar plates as previously described (5). Synthetic polymicrobial test samples were plated on sheep blood, MacConkey, and chocolate agar plates. For extended quantitative culture of synthetic test samples, 100 μL of various dilutions in PBS were plated onto Hektoen agar (Remel 453572, to isolate Salmonella), mannitol salt agar (Remel 453902, to isolate Staphylococcus), and LB agar containing either 50 μg/mL chloramphenicol (to isolate Pseudomonas) or 40 μg/mL gentamicin (to isolate Achromobacter). Colonies on dilution plates containing 30–300 colonies were counted and used to calculate colony-forming units per milliliter.
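The colony-forming unit calculation can be illustrated with a short Python sketch. The 100 μL plated volume and the 30–300 countable range come from the text; the function names, the example plate counts, and the choice to average across all countable plates are assumptions made for illustration.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml=0.1):
    """CFU/mL of the undiluted sample, estimated from a single plate.

    dilution_factor: fraction of the original sample in the plated
                     dilution, e.g. 1e-4 for a 1:10,000 dilution
    plated_volume_ml: volume spread on the plate (100 uL = 0.1 mL)
    """
    return colonies / (dilution_factor * plated_volume_ml)


def estimate_cfu_per_ml(plates, low=30, high=300):
    """Average the estimates from plates within the countable range.

    plates: list of (colony_count, dilution_factor) tuples
    """
    countable = [(c, d) for c, d in plates if low <= c <= high]
    if not countable:
        raise ValueError("no plate falls within the countable range")
    estimates = [cfu_per_ml(c, d) for c, d in countable]
    return sum(estimates) / len(estimates)


# Hypothetical dilution series: only the 1:10,000 plate is countable.
plates = [(412, 1e-3), (45, 1e-4), (3, 1e-5)]
estimate = estimate_cfu_per_ml(plates)  # about 4.5e6 CFU/mL
```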
Results
ACCURACY OF NGS16S FOR INTERROGATING POLYMICROBIAL SAMPLES
To assess the degree of bias inherent to 16S rRNA NGS analysis (NGS16S) of complex mixtures, we analyzed control mixtures in which the identity and relative abundance of all organism templates were known. We began by sequencing a sample containing equimolar amounts of individual amplicon libraries (see online Supplemental Fig. 1A). In this sample, DNA extraction or PCR biases were eliminated by quantifying and pooling separate products downstream from these steps. Reads corresponding to all organisms were recovered near the expected abundance of 5.5% per organism (Fig. 1A, mean 5.5%, range 4.4%–6.4%), indicating minimal sequencing bias. We next sequenced 2 control mixtures composed of genomic DNA individually extracted from 20 different organisms and combined in defined 16S rRNA operon counts for each organism (see online Supplemental Fig. 1B). For these samples, extraction biases were eliminated by the pooling strategy, and any differences from the expected values were largely the result of PCR biases introduced during library preparation. When 16S templates for each organism were present in equal concentration, all organisms were identified by NGS16S analysis (Fig. 1B). However, the relative abundances of many organisms differed up to 2-fold from the expected value of 5%, indicating organism-specific PCR biases. Despite this observation, all organisms were detected in the sample where the highest and lowest template concentration differed by 4 orders of magnitude (Fig. 1C). This demonstrated that the presence of abundant templates did not obscure the detection of rare templates in the same sample. Importantly, the relative abundance of organisms inferred by NGS16S mirrored the known abundance of templates in this sample (Fig. 1C), indicating its utility as a semiquantitative assay.
A control mixture of pooled amplicons (A) or commercial control genomic DNA mixtures containing defined 16S rRNA operon counts [10⁵ copies/μL each (B), or 10³–10⁶ copies/μL (C)] were sequenced. Mean read counts are provided for organisms of abundance too low to be visualized. Refer to online Supplemental Fig. 1, A and B for illustrations of each sample type.
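As a small worked example of how deviation from an even mixture can be expressed, the Python sketch below computes the expected percentage for an equal-template pool and the fold deviation of an observed value. The function names and the observed percentages are hypothetical; only the 20-organism pool and its 5% expectation come from the text.

```python
def expected_percent(n_organisms):
    """Expected relative abundance when all templates are equally abundant."""
    return 100.0 / n_organisms


def fold_deviation(observed_percent, expected):
    """Symmetric fold difference: 2.0 means half or double the expectation."""
    ratio = observed_percent / expected
    return max(ratio, 1.0 / ratio)


expected = expected_percent(20)  # 5.0 for the 20-organism genomic DNA pool

# Hypothetical observed percentages for three organisms.
observed = {"organism A": 2.5, "organism B": 5.0, "organism C": 10.0}
deviations = {
    name: fold_deviation(value, expected) for name, value in observed.items()
}
```

Under- and over-representation are reported on the same scale here, so an organism at 2.5% and one at 10% both score a 2-fold deviation from the 5% expectation.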
REPRODUCIBILITY OF NGS16S ANALYSIS
To test NGS16S reproducibility, we sequenced DNA extracted from a single clinical brain abscess specimen for which multiple templates were detected by conventional Sanger sequencing. Serial dilutions of this DNA were used as template for library preparation, and the greatest dilution of DNA that yielded a visible product (1:1000, data not shown) was identified as the detection limit for this specimen. Separate dilutions above (1:500) and below (1:5000) this limit were subsequently used as templates for duplicate library preparation on separate days by different laboratory personnel. Excellent inter- and intraassay reproducibility was observed (Fig. 2); statistically indistinguishable relative abundance assignments (1-way ANOVA, P <0.05) were generated for each classification reported. We concluded that NGS16S analysis was highly reproducible over a wide range of sample concentrations.
NGS16S libraries were generated from dilutions of the same clinical sample on 3 separate days by different laboratory personnel. Replicates from each day are presented side by side. Classifications at genus level or higher are listed as “Other.” Refer to online Supplemental Table 1 for summary statistics.
DETECTION OF A TARGET ORGANISM IN A DEFINED POLYMICROBIAL SAMPLE
To better define the limits of NGS16S analysis, we created and analyzed defined polymicrobial test mixtures of intact cells. We precisely quantified bacteria in separate cultures of 4 different organisms using a Coulter counter (which accurately counts individual particles in solution), combined 3 of those organisms in defined proportions, and then serially diluted the final organism, Salmonella typhimurium, into the mixture (see online Supplemental Fig. 1C). Thus, the relative abundance of 3 organisms was held constant, while a fourth varied over a wide range. DNA was extracted from each dilution and subjected to NGS16S analysis using 10⁷ total genomes as template. As few as 500 Salmonella (0.005% of the total bacteria) could be detected by NGS16S (Fig. 3A). The recovery of Salmonella sequences was linear over 4 orders of magnitude (Fig. 3B, r² = 0.86), although the relative abundance of Salmonella by NGS was slightly lower than the known input value. This reflects cumulative bias in DNA extraction, PCR amplification and NGS sequencing (11–18).
(A), S. typhimurium (SALM) was serially diluted into a mixture containing 75% A. xylosoxidans (AXYL), 24% S. aureus (SAUR), and 1% P. aeruginosa (PSAR). (B), The observed abundance of Salmonella detected by NGS (% corrected reads) compared to the known input (% input bacteria). Refer to online Supplemental Fig. 1C for illustration of sample type.
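The arithmetic behind the detection limit and the linearity assessment can be sketched in Python. The input of 10⁷ total genomes and the 500-genome lower bound are taken from the text; the observed values below are hypothetical, and fitting on log-transformed axes is an assumption about how linearity across 4 orders of magnitude would typically be evaluated, not a statement of the authors' method.

```python
import math

TOTAL_GENOMES = 1e7  # template input per library, from the text


def percent_of_input(target_genomes, total=TOTAL_GENOMES):
    """Relative abundance of the target organism as percent of input."""
    return 100.0 * target_genomes / total


def log_log_fit(x, y):
    """Ordinary least squares on log10-transformed data.

    Returns (slope, intercept, r_squared). All values must be positive,
    since zero or negative abundances cannot be log-transformed.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


lowest = percent_of_input(500)  # about 0.005 percent of the total bacteria

# Hypothetical series: observed reads at a constant 0.6x of the input.
input_percent = [0.005, 0.05, 0.5, 5.0, 30.0]
observed_percent = [0.6 * p for p in input_percent]
slope, intercept, r_squared = log_log_fit(input_percent, observed_percent)
```

A constant proportional under-recovery, as in this toy series, leaves the log-log slope at 1 and shifts only the intercept, which is one way a systematic bias can coexist with linear detection.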
COMPARISON OF NGS16S AND CULTURE FOR ANALYSIS OF POLYMICROBIAL SPECIMENS
To directly compare NGS16S and standard culture, we again used defined polymicrobial test mixtures in which constituent organisms were precisely quantified (see online Supplemental Fig. 1C). Mixtures were divided into multiple aliquots and subjected to (a) DNA extraction and NGS16S analysis, (b) standard clinical culture and scored according to conventional 1+ through 4+ quantification, and (c) extended quantitative culture in which multiple sample dilutions were plated onto various selective media to allow isolation and colony-forming unit enumeration of each of the 4 constituent organisms independently. Studies were performed in quadruplicate and benchmarked against known organism abundance as established using Coulter counting (Fig. 4, A and B). For each replicate, NGS16S properly recapitulated the known composition of the mixtures, including organisms present at 1% relative abundance (Fig. 4, C and D). In contrast, standard culture, where organisms are plated at high density on rich media and must directly compete with one another, was less accurate (Fig. 4, E and F). Staphylococcus aureus and S. typhimurium were universally reported as 4+ (in some cases a higher quantification than the most abundant organism), although they represented no more than 20% of the overall mixture. Organisms at 1% relative abundance were either not detected (in 5 of 8 replicates), or were reported at amounts equal to or higher than the most abundant organism (3 of 8 replicates).
Bacteria were combined in various proportions to generate polymicrobial test samples of known input composition (A, B), which were subjected to NGS16S analysis (C, D), standard clinical culture (E, F) and quantitative culture on selective medium (G, H). Relative abundance of recovered component organisms (output) is expressed as true percent of input cells by Coulter counting (A, B), percent corrected reads (C, D), 1+, 2+, 3+, or 4+ (E, F), and percent of total recovered colonies by quantitative culture on selective medium (G, H). Mean output (% of total) is provided for organisms of abundance too low to be visualized. PSAR, P. aeruginosa; SALM, S. typhimurium; SAUR, S. aureus; AXYL, A. xylosoxidans; NR, not recovered. Refer to online Supplemental Fig. 1C for illustration of sample type.
Extended quantitative culture, using various selective media to support the growth of a single constituent and to eliminate direct competition from the other constituents, more accurately reflected actual organism abundance than did standard culture (Fig. 4, G and H). The most and least abundant organisms were correctly and reliably identified, and the relative abundance of all organisms mirrored known proportions in the input mixture. These results validated the accuracy of NGS16S analysis and highlight previously underappreciated biases of standard culture practices.
CHARACTERIZATION OF PRIMARY CLINICAL SPECIMENS
To compare the methods in clinical context, we evaluated the ability of NGS16S and culture to identify select bacterial pathogens of clinical significance from cystic fibrosis (CF) sputa samples (see online Supplemental Table 2). A cohort of 15 patient sputa were processed by the clinical laboratory using standard workflows, and DNA was extracted from residual material and subjected to NGS16S analysis (online Supplemental Table 3 provides a comprehensive list of all organisms identified in each sample).
Six of the pathogens of interest were detected in 1 or more cases (Table 1), and all CF pathogens identified by culture were also identified by NGS16S. However, in 3 samples, CF pathogens were identified only by NGS16S (Tables 1 and 2): Streptococcus agalactiae (specimen CF2, 9.55% of corrected reads), Achromobacter xylosoxidans (specimen CF4, 10.26% of corrected reads), and Burkholderia cepacia complex (specimen CF15, 4.48% of corrected reads). The percentage of reads for each of these organisms was well above the detection limit identified in previous experiments with defined samples (0.005%, Fig. 3). It is unlikely that these organisms were the result of reagent or sample contamination since corresponding reads were not observed in any paired negative control sample.
Table 1. Select CF pathogens identified by microbiological culture and NGS16S.
Table 2. Comparison of culture and NGS16S results for select samples.
We also compared the quantification of the 2 assays in this context (Table 2). Pseudomonas aeruginosa was the most commonly identified pathogen in the cohort (12 of 15 samples), and its relative abundance by culture was consistent with the quantification by NGS16S in most cases (Table 2). However, in 3 samples where P. aeruginosa was reported as 2+ or less by culture, it ranged from 0.47% (specimen CF3) to 90% (specimens CF12, CF13) relative abundance by NGS16S read count. S. aureus, another frequently identified CF pathogen (6 of 16 samples), was consistently overestimated by culture compared to NGS16S; although never more than 3.53% relative abundance by NGS, it was quantified in culture over a range of 1 colony to 4+ abundance (Table 2; also see online Supplemental Fig. 2).
In addition to the identification of known pathogens, NGS16S also provided more information than culture about other organisms in each sample. There were 33 species-level identifications by NGS16S compared to only 10 for standard microbiological culture (see online Supplemental Table 3). Whereas clinical reporting described “contaminating oral-pharyngeal microbiota,” NGS16S was able to resolve the identities and relative abundance of these organisms (see online Supplemental Table 3). Importantly, anaerobes were found in a majority (13 of 15) of these samples by NGS16S, but not identified using standard clinical culture. The most common organisms were Veillonella (8 of 15 samples) and Prevotella (5 of 15 samples).
Finally, culture identified some organisms that were missed by NGS16S (Table 2). 16S rRNA gene sequence analysis is restricted to detection and identification of bacteria, and the majority of discrepancies between the 2 methods (7 of 8) involved yeast or fungi. In only a single case was a bacterium identified by culture but not NGS16S: in CF14, Ewingella americana was reported in low abundance. The clinical significance of this organism is unclear; it is not known to be part of the normal respiratory microbiota, and can survive in water (19), suggesting that it could be an environmental contaminant.
Discussion
The ease, speed, and decreasing cost of NGS sequencing make the application of this technology increasingly attractive for routine clinical use. Here, we demonstrate that NGS sequencing generates a reproducible (Fig. 2), analytically sensitive (Fig. 3), and accurate assessment of the identity and relative abundance of organisms present in polymicrobial samples, outperforming standard culture (Fig. 4). Our work indicates that the interpretation of standard culture results, where multiple organisms with differing growth requirements compete for resources on rich, nonselective media, is substantially more complicated and biased than commonly presumed.
Culture-based methods are the mainstay of specimen analysis in the clinical laboratory, and the isolation of individual organisms is required for antimicrobial susceptibility testing, strain typing and virulence studies. For an organism to be identified in culture, it must be both viable in situ and culturable in vitro. Individual colonies must also be physically isolated and differentiated from others on a culture plate. Yet, viable organisms may fail to grow in vitro due to prior broad-spectrum antibiotic treatment, improper sample storage, transport, or culture conditions. During culture, there is underappreciated amplification bias: a single organism must replicate to approximately 10⁹ organisms to form a visible colony on a plate. Because growth of a particular organism may also be affected by the presence of other organisms in the sample (1, 20–26), known pathogens in a polymicrobial specimen can be obscured or over-estimated (Fig. 4, E and F). Therefore, standard culture is unlikely to reproducibly provide a complete understanding of the microbial composition of a sample containing a complex bacterial community (27).
NGS technology eliminates the requirements of viability, cultivability and physical isolation for bacterial detection and identification from patient specimens. Despite the potential biases of NGS-based methods, introduced by differential DNA extraction and/or PCR amplification as a result of template concentration and complexity (11–18), our results from defined specimens (Fig. 4), and complex DNA mixtures (Fig. 1), demonstrate that NGS16S is superior to standard culture for accurate detection and relative abundance assignments of specimen constituents in complex mixtures. In applications where bacterial load may be of importance NGS16S could be supplemented with qPCR (28). Extended quantitative culture on selective media, each of which supports the growth of only 1 of the constituent organisms present in a polymicrobial test sample (Fig. 4, G and H), both validated our NGS16S results and confirmed the existence of bias introduced in standard culture practices where multiple organisms may be in direct competition. In our analysis of CF sputa, standard culture consistently overestimated the abundance of S. aureus, suggesting a recovery bias in favor of this fast-growing organism with a clearly distinguishable colony phenotype (Fig. 4, E and F, online Supplemental Fig. 2). Although standard culture may be effective for identification of rapidly growing organisms, it may fail to identify other clinically important community members without extensive effort, and we have found that it is unreliable for determining relative abundance.
This study also highlights other aspects of NGS16S that improve upon existing practice. First, although commonly thought of as single agents, pathogens often exist in a polymicrobial context in vivo. Clinically relevant organisms may not directly cause disease, but nevertheless contribute to pathogenesis by altering the microenvironment to facilitate colonization or virulence gene expression of other bacteria (1, 8, 24, 25, 29–31). Therefore, knowledge of the identity and relative abundance of organisms detected in a clinical sample may have important implications for patient care. For example, accurate and complete identification of all organisms present in a sample (particularly unexpected organisms) will facilitate targeted antimicrobial therapy, reducing broad-spectrum antibiotic use (32, 33). Similarly, the metabiome profiles associated with various disease states may provide diagnostic or prognostic information (1, 4, 5, 34–37). In our evaluation of CF sputa, NGS16S analysis provided a more comprehensive assessment of sample microbiota than did standard culture. As a case in point, 3 pathogens of clinical interest, S. agalactiae, A. xylosoxidans, and B. cepacia complex, were identified at significant levels from NGS16S analysis but not recovered in culture (Table 2). Secondly, NGS16S analysis resolved the identities of organisms commonly reported by clinical laboratories as “contaminating oral-pharyngeal microbiota” (see online Supplemental Table 3). Interestingly, anaerobes were found in a majority of samples, the most common being Veillonella and Prevotella, in agreement with published data (4, 35, 37–40). Although not traditionally part of a standard CF culture workup, anaerobes may play a more important role in CF pathology than has been previously appreciated. 
Anaerobes can be recovered from CF sputa at equivalent numbers to aerobes (37), and recent studies indicate that production of inflammatory short-chain fatty acids (30) or extended-spectrum β-lactamases (31) by anaerobes may contribute to CF disease.
These data demonstrate that NGS-based analysis is redefining the catalog of potential pathogens in human disease states and expanding our view of polymicrobial contributions to the pathogenesis of infection. Moreover, these studies provide a context for the appropriate evaluation of the accuracy of NGS analysis of patient samples when compared to culture based methods. By performing carefully controlled experiments such as those presented here, we can better appreciate the limitations of both NGS16S and culture-based approaches and understand how these complementary methods can best be used in clinical practice.
Footnotes
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
3 Nonstandard abbreviations: NGS, next generation sequencing; NGS16S, NGS of the 16S rRNA gene; CF, cystic fibrosis.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: National Center for Advancing Translational Sciences of the National Institutes of Health (grant number UL1TR000423).
Expert Testimony: None declared.
Patents: None declared.
Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, and final approval of manuscript. See feedback in article file for further information.
- Received for publication April 7, 2016.
- Accepted for publication August 17, 2016.
- © 2016 American Association for Clinical Chemistry