Metagenome phylogenetic profiling of microbial community evolution in a tetrachloroethene-contaminated aquifer responding to enhanced reductive dechlorination protocols

Chlorinated solvent contamination of potable water supplies is a serious problem worldwide. Biostimulation protocols can successfully remediate chlorinated solvent contamination through enhanced reductive dechlorination pathways, however the process is poorly understood and sometimes stalls creating a more serious problem. Whole metagenome techniques have the potential to reveal details of microbial community changes induced by biostimulation. Here we compare the metagenome of a tetrachloroethene contaminated Environmental Protection Agency Superfund Site before and after the application of biostimulation protocols. Environmental DNA was extracted from uncultured microbes that were harvested by on-site filtration of groundwater one month prior to and five months after the injection of emulsified vegetable oil, nutrients, and hydrogen gas bioamendments. Pair-end libraries were prepared for high-throughput DNA sequencing and 90 basepairs from both ends of randomly fragmented 400 basepair DNA fragments were sequenced. Over 31 millions reads were annotated with Metagenome Rapid Annotation using Subsystem Technology representing 32 prokaryotic phyla, 869 genera, and 3,181 species. A 3.6 log2 fold increase in biomass as measured by DNA yield per mL water was measured, but there was a 9% decrease in the number of genera detected post-remediation. We apply Bayesian statistical methods to assign false discovery rates to fold-change abundance data and use Zipf’s power law to filter genera with low read counts. Plotting the log-rank against the log-fold-change facilitates the visualization of the changes in the community in response to the enhanced reductive dechlorination protocol. Members of the Archaea domain increased 4.7 log2 fold, dominated by methanogens. Prior to remediation, classes Alphaproteobacteria and Betaproteobacteria dominated the community but exhibit significant decreases five months after biostimulation. Geobacter and Sulfurospirillum replace “Sideroxydans” and Burkholderia as the most abundant genera. As a result of biostimulation, Deltaproteobacteria and Epsilonproteobacteria capable of dehalogenation, iron and sulfate reduction, and sulfur oxidation increase. Matches to thermophilic, haloalkane respiring archaea is evidence for additional species involved in biodegradation of chlorinated solvents. Additionally, potentially pathogenic bacteria increase, indicating that there may be unintended consequences of bioremediation.


Introduction
Contamination of groundwater with chlorinated solvents is a major threat to potable water supplies, but the challenge of remediating contamination can be addressed by exploiting the metabolism of microbial communities. While qPCR and microarrays are effective tools for designing and monitoring remediation strategies [1][2][3], whole metagenome phylogenetic profiling provides unparalleled opportunities to understand the genetic response of microbial communities to remediation strategies. A proof-of-concept project was undertaken to determine the efficacy of whole metagenome profiling of aquifer-borne microbial consortia in a tetrachloroethenecontaminated EPA superfund site to increase our understanding community response to biostimulation.
Tetrachloroethene (also known as perchloroethylene) is a solvent used for dry cleaning since the 1930s. PCE is now a common contaminant of groundwater that is a likely carcinogen [4]. The daughter products of PCE biodegradation include TCE, cis and trans isomers of DCE, VC, and ethene. VC is the most toxic of these compounds and a major problem is the stalling of biodegradation. Enhanced in-situ biodegradation involves the addition of electron donor or food source (substrate), electron acceptors, nutrients, and/or selective cultures of beneficial microbes into the subsurface to accelerate the rate of biodegradation. RD is a biodegradation process that is restricted to anaerobic conditions capable of eliminating PCE and its daughter products. The selection of appropriate electron donors is an important site-specific component for ERD. Although our current knowledge of the microbes and the biochemical pathways involved RD is largely limited to species that can be cultured, reductive anaerobic biological in-situ treatment technology augmentation protocols create favorable environmental conditions for RD microbial consortia [5]. This study compares two whole metagenomes from the same well within a PCE-contaminated aquifer one month before and five months after biostimulation with electron donors and nutrients. This is part of a larger study that includes two additional time points (23 and 43 months) for this well, and similar points for an additional well on the NRAP site.

Site information
In 1989, PCE and TCE were detected in municipal water supply wells operated by the City of Española, New Mexico. The source of the contamination was a 58-acre (0.23 km 2 ), 260-foot (79.2 m)-deep plume of PCE from a now closed dry cleaner and laundromat located on North Railroad Avenue. Because this aquifer is the sole source drinking water aquifer for Española, the Santa Clara Pueblo, and nearby populations, the NRAP site was designated an EPA Superfund site in 1999 (National Priorities List #NMD986670156). Some minor soil contamination was found near the PCE surface release at NRAP; however, most of the contaminant mass was in the shallow saturated zone and occurred as high-level, adsorbed-and dissolved-phase PCE in the aquifer beneath the release area (source area). Prior to remedy implementation the dissolved-phase plume migrated to within the boundaries of the Santa Clara Pueblo trust lands above maximum allowable contaminant levels.
The selection of a remediation strategy relied on the analysis of groundwater using established regulatory guidelines for geochemical and contaminant chemistry [6], and commercially available molecular biological techniques [7]. As per ITRC protocols water quality parameters such as dissolved oxygen, temperature, pH, oxidation-reduction potential, and specific conductance were monitored in the field. Laboratory analyses included a suite of geochemical parameters shown in Table 10. Data from DGGE of 16S rRNA genes, PLFA analysis, and a qPCR screen for dechlorinating bacteria, was obtained from Microbial Insights, Inc. (Rockford, TN). Genera detected by DGGE included Dechloromonas, Sulfurimonas, Thiomicrospira, Sulfurovum, Gallionella, and Zoogloea. PLFA analysis indicated the presence of Firmicutes and anaerobic metal reducing bacteria. The presence of indigenous dechlorinating genera detected by qPCR included Dehalococcoides, Desulfuromonas, and Dehalobacter, indicating that the addition of nonindigenous microbes (known as bioaugmentation) was unnecessary. This data indicated the appropriate remedial action for NRAP was biostimulation, achieved by injecting the bioamendments EVO, a nutrient mix, and the addition of hydrogen gas as an electron donor [7]. Figure 1 shows the progress of VOC conversions during pilot scale remediation operations at NRAP. Prior to the addition of bioamendments, the majority (~97%) of the chloroethenes were in the form of PCE. TCE (~1.0%) and DCE isomers (~1.5%) accounted for the remaining mass; and VC was not detected in any of the wells sampled. By comparison, in October 2007, a large majority of the contaminant mass was transformed from PCE to cis-DCE. More than three-quarters (~77%) was cis-DCE while 20% remained as PCE. TCE and VC accounted for the remaining portion at 1.5% each. Ethene production was evident by January 2008.

Metagenome sequencing information
Metagenome project history NRAP was sampled four times for nucleic acid analysis starting in 2007; once one month prior to, then at five, 23, and 43 months following the addition of bioamendments. The two time points selected for this preliminary study were during the pilot-scale operations at NRAP, one collected prior to the initiation of bioremediation and one collected five months after the initial injections of EVO, nutrient mix, and H 2 (Fig. 1). Study information is shown in Table 1.

Sample information
The sampling events analyzed as part of the pilot project occurred on 15 June 2007 (SAE3-0) and on 27 November 2007 (SAE3-5) at the NRAP site in Española, NM (latitude: 35.992053, longitude: −106.079752, and a depth of 2.9 m.) Sample information is provided in Table 2 as per minimal information standards [8]. The exact time of sampling was not recorded, but both samples were extracted from the well in the afternoon, at approximately 14 h local time.

Sample preparation
Suspended solids, including microbes were collected onsite by filtering the water through a Gelman 0.2 micron filter capsule (Cat. #12117, Pall Gelman, Ann Harbor, MI) attached to the discharge of a dedicated, submersible pump for SAE3. The volume of water filtered was monitored and filtration was allowed to proceed until the flow was reduced due to clogging of the filter. Filters were sealed with entrained water and residue and transported to the laboratory on ice. Filter residues, inclusive of captured microbes and particulates, were recovered by shaking the filters overnight on a Berrell wrist-action shaker (Model 75, Philadelphia, PA) followed by back flushing the filter with sterile 10 mM Tris-SO 4 at a pH 7.8. Samples were concentrated by centrifugation at 25,000 x g for 30 minutes. The resulting pellet was resuspended in Tris-SO 4 and frozen at −20°C without cryopreservation for nucleic acid extraction.

DNA extraction
Microbial eDNA was extracted from the uncultured but concentrated microbial samples SAE3-0 and SAE3-5 using protocols for microbes pursuant to the G-nome® DNA isolation kit (Qbiogene Cat # 2010-200, Carlsbad, CA), consisting of treatment with RNase, lysis of the cells, protease treatment, and precipitation of the protein and lipids with high salt. The eDNA in the resulting supernatant was collected and precipitated by the addition of 1/10 volume of 3 M sodium acetate and 2.5 volumes of ethanol. The eDNA was further purified with a Geneclean® Turbo-Kit (Qbiogene cat #1102-200) following the manufacturer's protocol, which involves

Library generation
The eDNA from SAE3-0 and SAE3-5 were subjected to Illumina paired-end sequencing protocols at the National Center for Genome Resources (NCGR) in Santa Fe, New Mexico. The purified eDNA was subjected to mechanical fragmentation by nebulization and the resulting double-stranded overhang fragments were end-repaired, phosphorylated, and ligated to proprietary adapter oligonucleotides (Illumina Cat # PE-102-1001). Ligation products were size-selected by gel electrophoresis and the 400 bp fraction excised. Purified DNA libraries were subjected to a final PCR amplification step (10 cycles). The index sequences, oligos six bp in length, were added during the PCR enrichment step in order to allow for the pooling of multiple samples in a single lane of an Illumina IIx flowcell. All amplified libraries are quantitatively and qualitatively assessed by Nanodrop ND-1000 UV/Vis spectroscopy and in a DNA Bioanalyzer 2100 (Agilent, CA, USA).

Sequencing technology
Ninety bp from each end of the 400 bp fragments were sequenced on the Illumina Genome Analyzer IIx. Sequencing analysis resulted in two files per library, each representing the results from one of each end of the fragment. This sequencing protocol resulted in approximately 7.8 million 90 bp reads of data per file, for a total of 2.8 Gbp of data in a single lane (Table 3). Two files were generated for each sample, one for each end of the molecule.

Sequence processing, annotation, and data analysis
Sequence processing Sequence reads that passed the quality control measures established by Illumina were uploaded to the MG-RAST version 3.0 server [9,10]. MG-RAST applies additional quality control parameters, determines the guanine and cytosine content (% G + C) for each read, and provides an estimate of sequencing errors using the DRISEE algorithm [11].

Metagenome processing
MG-RAST compares every read to the non-redundant database M5NR, which includes inferred protein sequences [10]. There are two MG-RAST files for SAE3-0 library that represent each end of the insert; these are EW3Pre1 and EW3Pre2 while the SAE3-5 library files for each end are referred to as EW3Post1 and EW3Post2. The results of sequence processing are shown in Table 4. The metagenome data are available as MG-RAST project number 11259, the North Railroad Avenue Plume EPA Superfund Site.

Metagenome annotation
The results of MG-RAST phylogenetic analysis include the number reads that match each OTU, and three other attributes: the average percent identity, the average alignment length, and the average expectation (e-value). The cutoff value for the minimum percent identity was 60%, the maximum e-value was 10 −5 , and the minimum average alignment length was 15 Bp. These data were

Post-processing
The read counts from each end of the DNA fragment identified as a genus-level OTU were used as proxies for replicates providing a measure of the noise in the HTS process. The baySeq algorithm assumes a negative bionomical distribution and uses a Bayesian approach to determine significant differences between high-throughput sequencing data sets and was used to assign FDRs to each genus [12]. Zipf's law (a.k.a. the power law) [13] was used to determine a cut-off for genera with low frequencies.

Metagenome properties
The microbial slurries extracted from the filters exhibited a dramatic color shift, from tan in SAE3-0 to black in SAE3-5 (Fig. 2), indicating a shift from an aerobic to anaerobic condition following the addition of the bioamendments. The DNA yield increased from 1.3 to 16.1 μg DNA per L water, indicating a 3.6 fold increase in biomass. A 9% decrease (79 out of 869) in the number of genera annotated was observed after remediation.
The overall properties of the SAE3-0 and SAE3-5 metagenomes are provided in Tables 5, 6, and 7. The level of sequencing errors (the DRISEE score) in the dataset that passed both instrumental and MG-RAST QC metrics falls well within one standard deviation of the 6.4 mean for all the MG-RAST data. Between 86.0% and 97.3% of reads were assigned to an OTU.
A distinctive shift toward decreased %G + C content is evident following biostimulation (Fig. 3). One explanation is that this change indicates shift toward species with lower %G + C, but it may also reflect recombinational processes during microbial evolution that facilitates horizontal transmission of genes. For example, Dehalococcoides codons tend toward high %G + C content, but the genes specific for VC respiration are composed of codons with low %G + C content that are flanked by mobile genetic elements, indicating that these genes are a relatively recent acquisition of the Dehalococcoides genome [22]. The

Taxonomic diversity
The phyla level taxonomic diversity at NRAP is shown in Dehalobacter and Dehalococcoides are known to compete with each other [24], making it possible that population fluctuations rendered Dehalobacter barely detectable when the metagenomic samples were collected. In addition, Dehalobacter may not be required for successful remediation, as it was not detected in DehaloR^2; a TCE dechlorinating consortium isolated from sediment and analyzed by 16S rRNA and qPCR [25]. Dehalobacter abundance and its role in this successful remediation project are under investigation. Members of the genus Geobacter exhibit a dramatic increase in abundance, increasing from 0.8 to 21% of all prokaryotes. This group was not measured as part of qPCR, DGGE, or PLFA experiments. G. lovleyi is an iron-reducing dechlorinator [26] that becomes the dominate species in response to ERD (Fig. 5). G. lovleyi and other members of this genus, including G. sulfurreducens and G. uraniireducens, are capable of transmitting electrons over special pilli known as microbial nanowires [27]. These species are capable of direct interspecies energy transfer, a recently recognized characteristic of biofilms capable of bioremediation [28,29]. Geobacter as well as SRB are detected during uranium bioremediation [30]. G. lovleyi can transfer cobamide to Dehalococcoides in culture, indicating that these species form an important metabolic link during ERD [31].
Dehalococcoides mccartyi are the only microbes currently known to fully respire PCE to ethene [32]; this species exhibit a 0.5 log 2 fold change increase in abundance at NRAP, an outcome consistent with effective ERD [25]. Matches to D. mccartyi strains 195, GT and VS, CBDB1, and BAV are detected at NRAP. This finding is consistent with pangenomic microarray results for Dehalococcoides enrichment cultures, in which heterogeneous mixtures of reductive dehalogenase genes are found, indicating the presence of a mixture of species with dehalorespiration abilities [33,34]. A dechlorinating enrichment cultures is known to contain a variety other phylogenetic groups    Fig. 3 Percent G + C shift at NRAP. A major peak at 60% is observed prior to remediation, but a bimodal distribution is observed with peaks at 45% and 60% G + C after bioamendment application [35], all of which increase at NRAP. Following six months of ERD at the NRAP site vinyl chloride was just beginning to increase (Fig. 1), so the small increase in Dehalococcoides is consistent with contaminant levels and it is expected that Dehalococcoides will increase in subsequent sampling events. Dehalogenimonas lykanthroporepellens is another DHB of the phylum Chloroflexi that exhibits a similar level of increase as other members of Dehalococcoidia at NRAP. D. lykanthropore was first isolated from a chlorinated solvent Superfund site in Louisiana [36,37] and has been detected in contaminated aquifers in Europe [38]. Other dehalorespiring species include the Firmicutes Desulfitobacterium and the Deltaproteobacteria Desulfovibrio; both increase in abundance at NRAP, consistent with findings from RD enrichment cultures [39]. The increase in Epsilonproteobacteria is lead by the genus Sulfurospirillum, which exhibits the largest increase (8.2 log 2 fold change) in abundance in response to ERD. Of the two species OTUs found at NRAP, one is capable of halorespiration ("S. multivorans") and is found at other chlorinated solvent contaminated sites [40]. Other chemolithotropic Epsilonproteobacteria genera that increase in response to the bioamendment are Sulfuricurvum, a sulfur-oxidizing facultative anaerobe first isolated from oil-storage containers [41] as well as Caminibacter [42], Nitratifractor [43], Nautilia [44], and Sulfurovum [45]; each of which were first isolated from deep-sea hydrothermal vents. The pathogenic branch of Epsilonproteobacteria is also represented at NRAP and provides additional evidence for groundwater as a reservoir for emerging pathogens [45]. Moreover, and of concern in the design, operation, and monitoring of an ERD remediation process, these potential pathogens were seen to increase in abundance following the addition of bioamendments (Fig. 5d). Two similar species that were observed to increase in abundance are Arcobacter nitrofigilis [46], which is a non-pathogenic nitrogen-fixing species and Arcobacter butzleri, known to cause diarrhea in humans [47]. Other examples of pathogenic OTUs that increased in abundance following the addition of bioamendments include Campylobacter jejuni, a well-characterized food and water-borne pathogen [48] and Helicobacter pylori, which is linked to gastric ulcers and other gastrointestinal syndromes [49]. Although Epsilonproteobacteria play an important role in the bioremediation process, these data indicate that testing for pathogenic bacteria is warranted when certifying that a formerly contaminated water source that has undergone treatment by ERD is safe.
The genus Desulfuromonas includes dehalorespiring species, although none of the species detected at NRAP are known to have this ability. De novo alignment and annotation is necessary to determine if a new dehalorespiring strain can be identified.
"Anaeromyxobacter dehalogenans" is one of three species of this Deltaproteobacteria genus detected at NRAP and it is capable of aryl-halorespiration. Matches to the L-haloacid dehalogenases present in the genomes of Pyrococcus horikoshii [50] and Sulfolobus tokodaii [51] are detected at NRAP. As members of thermophilic Archaea, these are unlikely matches since the average groundwater temperature at NRAP during the sampling period was 18.6°C, far below the range for thermophiles. Additional data collection is underway to facilitate de novo alignment of NRAP genomes and to identify alternative dehalogenation pathways.
In addition to the Deltaproteobacteria and phyla Chloroflexi and Firmicutes, methanogenic Archaea carry out key steps in the metabolic pathways during RD [33] and all three phylogenetic groups increase during the first five months of remediation. The dramatic increase in methanogenic Archaea (Fig. 5c) also explains the increase of on-site methane concentrations in the shallow vadose zone, an important consideration for safe ERD design and operation. Table 9 lists the functional diversity of the metagenomes from a protein-centric perspective using COG annotations. The availability of qPCR, PLFA, and DGGE data collected to determine the optimal remediation protocol provides evidence for the functional diversity of microbes, not just the genes. This facilitates a focus on shifts in taxonomy that occur in response to bioremediation that led to the degradation of PCE.

Statistical and visualization results
It can be difficult to determine an appropriate level of coverage for further examination. Here we turn to the Zipf's law and plot the log 10 reads per genus against the log 2 of the genus rank and annotate the with the -log 10 FDRs (Fig. 4). A clear inflection marks delineates where a logical cut-off can be made. For our preliminary analyses, genera with a total of 1000 counts or greater chosen for further annotation and visualization. This filtering step results in 579 genus-level OTUs retained of the 869 total (67%).
To visualize the ecological shifts in the community, we employ a modified rank abundance curve, in which the abundance variable is replaced by log 2 fold-change (Fig. 5) for each prokaryotic genera. Using this convention, the genera that increase as a result of biostimulation are located to the right of 0, those that decrease graph to the left. All genera are detectable before remediation, indicating that any non-indigenous bacteria introduced into the aquifer by injection of bioamendments have no affect on the ecology. Seventy-nine genera are not detected after remediation protocols, this decrease in diversity is expected as the community adapts to the availability of EVO as carbon source. The shape of the markers indicates the taxonomic affiliation of selected genera (Fig. 5a) highlights the phylogenetic shift in the community in response to remediation. Although the proteobacteria exhibit little change overall (Table 8), 81 out of 83 of Alphaproteobacteria, all 50 Betaproteobacteria, and all 89 Gammaproteobacteria genera decrease in abundance. Twenty-one out of 27 Deltaproteobacteria and all 13 Epsilonproteobacteria are more abundant after remediation. The expansion of Archaea is dominated by increases in methanomicrobia (12/12). The phenotypic characteristics of interest in bioremediation are polyphyletic, so it is necessary to consider relevant metabolic classifications independent of taxon. The rank fold-change plot can be further annotated to visualize multiple metabolic attributes of the data. In Figs. 5b-5d, selected metabolic categories are highlighted by color and facilitate the correlation of changes in groundwater chemistry to the taxonomic changes in the microbial community. The increase in ferrous iron Fig. 5 Rank abundance fold-change plot of genera detected at the NRAP site. Each of the 569 genera has at least 1000 reads and are ranked by their abundance on the y-axis. On the x-axis, those to the left of zero decrease in abundance after six months, those to the right increase. The number of genera in each category is indicated in parentheses. The labeled genera are discussed in the text. a Selected classes are indicated by marker shape. b The metabolic classification of genera is indicated by color. c Methanogenic and dehalogenic bacteria are highlighted by color. d Oxygen requirements for selected genera are indicated by color and demonstrates that the remediation protocol selects for anaerobic species. The labeled species are potential pathogens Plotting the log reads per genus against the genus rank reveals an inflection point that separates genera with coverage within a linear from those with limited coverage. Genera with less than 1000 reads were excluded from subsequent analyses detected in source area groundwater following the addition of EVO (Table 10) is consistent with the decrease in FeOB and increase in the FeRB (Fig. 5b). The surge in SRB reflects the decrease in sulfate and explains the unpleasant odor encountered during the November sampling. Genera that include species capable of dehalogenation and methanogenesis are notated by color in Fig. 5c; both increase. The shift toward anaerobic conditions in the aquifer is confirmed in Fig. 5d, in which the oxygen requirement of genera for which data are available are indicated by color.

Conclusions
We demonstrate the efficacy of whole metagenome sequencing as a tool to understand ERD and developed a statistical approach and a visualization method that aids this goal. The major difficultly with annotating plots with metabolic data is the lack of a single, authoritative source for metabolic characteristics of bacterial species. Most of the annotation was done manually by consulting the literature. Although the Joint Genome Institute provided a method of annotating genomes with metabolic classifications, few of the genomes detected are annotated by metabolic data. Despite this issue, it is clear that geochemical changes are correlated to the shift in the microbial community. Taxa capable of dehalorespiration and methanogenesis increase, as do species that thrive in anaerobic conditions, supporting the established principles of ERD. The importance of Geobacter in bioremediation is confirmed by NRAP data. Additionally, there are findings from these metagenomes that suggest topics for future discoveries. The NRAP data suggests that uncharacterized Archaea may also play a role in successful bioremediation of chlorinated solvents. The presence of pathogenic OTUs indicates that testing prior to recertification of aquifer as potable is a concern. As additional data is obtained from NRAP, de novo alignment and annotation of NRAP genomes will increase our understanding of ERD and its role in biodegradation.