Metagenomes and metatranscriptomes from the L4 long-term coastal monitoring station in the Western English Channel

Abstract

Both metagenomic data and metatranscriptomic data were collected from surface water (0–2 m) of the L4 sampling station (50.2518 N, 4.2089 W), which is part of the Western Channel Observatory long-term coastal-marine monitoring station. We previously generated from this area a six-year time series of 16S rRNA V6 data, which demonstrated robust seasonal structure for the bacterial community, with diversity correlated with day length. Here we describe the features of these metagenomes and metatranscriptomes. We generated 8 metagenomes (4.5 million sequences, 1.9 Gbp, average read length 350 bp) and 7 metatranscriptomes (392,632 putative mRNA-derived sequences, 159 Mbp, average read length 272 bp) for eight time points sampled in 2008. These time points represent three seasons (winter, spring, and summer) and include both day and night samples. These data demonstrate the major differences between genetic potential and actuality: the metagenomes follow general seasonal trends, yet with surprisingly little change in functional potential over time, whereas the transcripts tended to be far more structured by changes occurring between day and night.

Introduction

The Western Channel Observatory station L4, located off the Plymouth coast in the UK, has been collecting environmental data for almost a century [1]. This includes published 16S rRNA V6 amplicon pyrosequencing data cataloging monthly patterns in microbial diversity [2,3]. The area is important because it is a transition zone between many northern and southern planktonic species [1] and because, as a major confluence between the North Atlantic Ocean and the North Sea, its water masses exhibit extremely short residence times (<2 months [4]). In the study reported here, we use shotgun metagenomics and metatranscriptomics to identify the relationship between genetic and functional diversity at station L4.

Classification and features

Relationship of reported datasets

We generated 8 metagenomes and 7 metatranscriptomes for eight time points. Figure 1 shows the relationships among these metagenomes and metatranscriptomes as a group-average clustering dendrogram, based on comparison of 66,529 amino acid sequences of greater than 40 amino acids predicted from each dataset (for details of the process, see Metagenome annotation). The metagenomic and metatranscriptomic data clearly cluster separately. The metagenomic data show an average similarity of less than 2% and cluster by season, from which one can infer that the seasonal differences are stronger than the diel differences. On the other hand, the metatranscriptomes show more similarity and a tendency to cluster by diel time point; specifically, the April night data and January night data are more similar to each other than either is to the April day data and January day data. The August metatranscriptomes cluster by themselves, but this clustering is also structured by day and night. Table 1 details the classification and general features of the metagenomic datasets for this study in MIMS format.

Figure 1.

Group-average dendrogram showing the relationships among all metagenomes and metatranscriptomes, based on comparison of annotated protein fragments via BLASTX using the SEED database in MG-RAST for each dataset. MTS – metatranscriptome. MGS – metagenome.
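
The group-average clustering behind a dendrogram of this kind can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used to draw Figure 1: the similarity measure is not given in the text, so Bray-Curtis similarity is assumed here, and the function names, sample labels, and toy counts are hypothetical.

```python
from itertools import combinations


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 1) between two abundance vectors (assumed measure)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * shared / total if total else 1.0


def group_average_clustering(profiles):
    """Agglomerate samples by group-average linkage (UPGMA).

    profiles: dict mapping a unique sample name to a list of counts.
    Returns the merges in order as (members_left, members_right, similarity).
    """
    names = list(profiles)
    sim = {
        frozenset(pair): bray_curtis_similarity(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def average_similarity(c1, c2):
        # Group average: mean of all pairwise similarities between the two clusters.
        return sum(sim[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2),
                          key=lambda pair: average_similarity(*pair))
        merges.append((left, right, average_similarity(left, right)))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges


# Hypothetical toy profiles (three functional categories per sample).
toy = {
    "MGS_Jan_day": [10, 0, 5],
    "MGS_Jan_night": [9, 1, 5],
    "MTS_Jan_day": [0, 8, 1],
    "MTS_Jan_night": [1, 9, 0],
}

for left, right, similarity in group_average_clustering(toy):
    print(left, "+", right, f"similarity={similarity:.3f}")
```

Each merge joins the two clusters with the highest mean pairwise similarity, so the order of merges corresponds to reading a dendrogram from its tips toward its root.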

Table 1. Classification and general features of 8 metagenome datasets according to the MIMS recommendations [5].

Environmental characteristics and descriptions

Environmental data were collected for temperature, density, salinity, chlorophyll a, total concentration of organic nitrogen and carbon, nitrate, ammonia, silicate, and phosphate [Table 2]. The methods used are described on the Western Channel Observatory website.

Table 2. Environmental variables for each sampling occasion

Figure 2 plots the environmental trends at L4 averaged for the years 2003–2008; the graph clearly shows the differences among the three sampling months. Figure 3 shows a principal component analysis of the environmental parameters recorded during this study. The analysis indicates that the January samples have higher nutrient concentrations, the April samples show changes in the water salinity as a consequence of density, and the August samples show changes in temperature and chlorophyll a concentration.

Figure 2.

Monthly annual averages for all environmental parameters and species richness (S). TO: total organic; SRP: soluble reactive phosphorus; PAR: photosynthetically active radiation; NAO: North Atlantic Oscillation. Data taken from Gilbert et al., 2010.

Figure 3.

Principal component analysis of environmental variables showing the seasonal differences in variables outlined in Table 2. Classification and general features of the 15 datasets in accordance with the MIMS recommendations [5]

Metagenome sequencing and annotation

Metagenome project history

Two factors motivated the choice of station L4: its century-long history of environmental data [8] and the six years of 16S rRNA V6 amplicon pyrosequencing information detailing microbial diversity patterns [2,3], from which interannual variability could be inferred for our single-year study. All 16S rRNA V6 amplicon pyrosequencing data have been submitted to the NCBI Short Read Archive under SRA009436 and registered with the GOLD database (Gm00104). The data can also be accessed from the VAMPS servers. The metagenomic data and metatranscriptomic data are available on the CAMERA website under Western Channel Observatory Microbial Metagenomic Study and on the Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) system under 4443360-63, 4443365-68 and 4444077, 4445065-68, 4445070, 4445081, and 4444083, as well as through the INSDC short-reads archive under ERP000118. Table 1, Table 2, Table 3, and Table 4 detail the sequencing project information for this study in MIMS format.

Table 3. Metagenome sequencing project information (MIMS compliance)
Table 4. Metatranscriptome sequencing project information (MIMS compliance)

Sampling and DNA isolation

For the sampling, a minimal-impact surface buoy was deployed with a 7 m current drogue following a Lagrangian drift. Samples were taken at station L4 to represent three seasons and both day and night readings, as follows:

  • Winter: January 28, at 3 pm and again at 7 pm (two hours after sundown) at 50.2611 N, 4.2435 W

  • Spring: April 22, at 4 pm and again at 10 pm (one and a half hours after sundown) at 50.253 N, 4.1875 W

  • Summer: August 27, at 4 pm and again at 10 pm (two hours after sundown) at 50.2545 N, 4.199 W

  • Summer: August 28, at 4 am (two hours before sunrise) at 50.2678 N, 4.1723 W and at 10 am at 50.2665 N, 4.1486 W

The sampling technique involved the following steps: (1) collection of 20 L of seawater from the surface (0–2 m); (2) prefiltering through a 1.6 µm GF/A filter (Whatman); (3) passage of the filtrate through a 0.22 µm Sterivex cartridge (Millipore) for a maximum of 30 minutes (approximately 10 L per Sterivex cartridge); (4) pump-drying and snap-freezing of the cartridges in liquid nitrogen; (5) barcoding [9] of the samples at the laboratory; and (6) storage at −80 °C.

Both DNA and RNA were then isolated from each sample [2,9], barcoded, and stored at −80 °C. DNA and mRNA-enriched cDNA were purified from the samples; for details, see [9].

Metagenome sequencing and assembly

The isolated DNA was used for metagenomic analysis, and the mRNA-enriched cDNA was used for metatranscriptomic pyrosequencing analysis. All DNA and cDNA were pyrosequenced on the GS-FLX Titanium platform. No DNA assembly was carried out.

Metagenome annotation

The MG-RAST bioinformatics server [10] was used for annotating the metagenomic samples [113]. The data were also processed by using custom-written scripts on the Bio-Linux system [6] at the NERC Environmental Bioinformatics Centre unless otherwise indicated. To ensure high quality, the following sequences were removed from the pyrosequenced data: transcript fragments with >10% non-determined base pairs (Ns), fragments <75 bp in length, fragments with >60% of any single base, and exact duplicates (resulting from aberrant dual reads during sequence analysis). So-called artificial duplicates in the metagenomic data (i.e., multiple reads that start at the same position; see, e.g., Gomez-Alvarez et al., 2009 [11]) were not removed, however, because some of them may be natural and because their removal would have precluded comparison with the metatranscriptomic data [12].
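
The four removal criteria above can be expressed as a small filter. The sketch below is illustrative only and is not the custom scripts used in this study; the function names and the (read_id, sequence) input layout are hypothetical, and keeping the first copy of each exact duplicate is an assumption.

```python
from collections import Counter


def passes_quality(seq, max_n_frac=0.10, min_len=75, max_base_frac=0.60):
    """Apply the length, N-content, and single-base-composition criteria."""
    seq = seq.upper()
    if not seq or len(seq) < min_len:
        return False  # fragments shorter than 75 bp are removed
    counts = Counter(seq)
    if counts["N"] / len(seq) > max_n_frac:
        return False  # more than 10% non-determined bases
    if max(counts[base] for base in "ACGT") / len(seq) > max_base_frac:
        return False  # more than 60% of any single base
    return True


def quality_filter(reads):
    """Yield (read_id, sequence) pairs that pass all criteria.

    Exact duplicates are dropped; the first copy seen is kept (assumption).
    """
    seen = set()
    for read_id, seq in reads:
        key = seq.upper()
        if key in seen or not passes_quality(key):
            continue
        seen.add(key)
        yield read_id, seq


# Hypothetical example reads.
good = "ACGT" * 20  # 80 bp, balanced composition
example_reads = [
    ("r1", good),
    ("r2", good),                      # exact duplicate of r1
    ("r3", "ACGT" * 10),               # 40 bp, too short
    ("r4", "A" * 80),                  # single base exceeds 60%
    ("r5", "N" * 10 + "ACGT" * 20),    # 10 of 90 bases are N (over 10%)
]
kept = list(quality_filter(example_reads))
print([read_id for read_id, _ in kept])
```

Note that this sketch removes only exact duplicates; reads that merely start at the same position are left in place, in line with the decision described above.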

The nucleic acid sequences were then compared with three major ribosomal RNA databases (SILVA, RDP II, and Greengenes) using the bacterial and archaeal 5S, 16S, and 23S and the eukaryotic 18S and 25S sequence annotator function of MG-RAST (e-value < 1 × 10⁻⁵; minimum length of alignment of 50 bp; minimum sequence nucleotide identity of 50%). Reads annotated as rRNA were excluded. All remaining reads were considered to be valid DNA or valid putative mRNA-derived sequences and were annotated against the SEED database using MG-RAST (e-value < 1 × 10⁻³; minimum length of alignment of 50 bp; minimum sequence nucleotide identity of 50%; Meyer et al., 2008 [10]). The result was an abundance matrix of functional genes and protein-derived predicted taxonomies across the DNA and mRNA samples.
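
The thresholds quoted above amount to a three-part test on each similarity hit. The following sketch shows that test applied to a generic list of hits; it does not reproduce MG-RAST internals or its output format, and the Hit record, its field names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One similarity hit for a read (hypothetical record layout)."""
    read_id: str
    evalue: float
    align_len: int    # alignment length in bp
    identity: float   # percent identity


RRNA_MAX_EVALUE = 1e-5   # threshold quoted for the rRNA databases
SEED_MAX_EVALUE = 1e-3   # threshold quoted for the SEED database


def is_accepted(hit, max_evalue, min_align_len=50, min_identity=50.0):
    """True if the hit meets the e-value, alignment-length, and identity cutoffs."""
    return (hit.evalue < max_evalue
            and hit.align_len >= min_align_len
            and hit.identity >= min_identity)


def remove_rrna_reads(read_ids, rrna_hits):
    """Drop every read that has at least one accepted rRNA hit."""
    flagged = {h.read_id for h in rrna_hits if is_accepted(h, RRNA_MAX_EVALUE)}
    return [r for r in read_ids if r not in flagged]


# Hypothetical example: only read "a" has a hit that satisfies all three cutoffs.
example_hits = [
    Hit("a", 1e-10, 80, 95.0),
    Hit("b", 1e-4, 80, 95.0),   # e-value too high for the rRNA screen
    Hit("c", 1e-10, 40, 95.0),  # alignment shorter than 50 bp
]
remaining = remove_rrna_reads(["a", "b", "c"], example_hits)
print(remaining)
```

The same is_accepted test, called with SEED_MAX_EVALUE, would express the looser e-value cutoff used for the SEED annotation step.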

The sequences were also translated using the techniques described by Gilbert et al. (2008) and Rusch et al. (2007) [9,13]. Predicted open reading frames (pORFs) having >40 amino acids were produced in all six reading frames. The CD-HIT program [15] was used to cluster the proteins from the datasets at 95% amino acid identity over 80% of the length of the longest sequence in a cluster. The longest representative from each cluster was then clustered at 60% amino acid identity over 80% of the length of the longest sequence to group these sequences by protein families. An abundance matrix was then created from the relative abundance of each sample in each cluster, using the CD-HIT output cluster files, which contained the original fasta sequences and headers for each sample (abundanceMatrix-twoStep.pl). Subsequently, protein clusters with ≤2 representative pORFs were removed from the pORF matrix (MatrixParser.pl). To equalize the sequencing effort, all samples were randomly resampled (Daisychopper.pl) to the same number of pORFs or sequences across the clusters or functional/taxonomic SEED annotations.
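
The last two steps, discarding small clusters and resampling every sample to a common depth, can be illustrated as follows. This is a sketch under stated assumptions and not a port of MatrixParser or Daisychopper: it assumes that the ≤2 cutoff applies to a cluster's total count across samples, that the common depth is the size of the smallest sample, and that resampling is done without replacement. All function names and the nested-dictionary matrix layout are hypothetical.

```python
import random
from collections import Counter


def drop_small_clusters(matrix, min_total=3):
    """Keep clusters with more than 2 pORFs in total across samples (assumption).

    matrix: dict mapping cluster id -> dict mapping sample name -> count.
    """
    return {cid: row for cid, row in matrix.items()
            if sum(row.values()) >= min_total}


def resample_to_equal_depth(matrix, seed=0):
    """Randomly subsample each sample, without replacement, to the smallest depth."""
    rng = random.Random(seed)
    samples = sorted({s for row in matrix.values() for s in row})
    # Expand each sample's counts into a pool with one entry per pORF.
    pools = {
        s: [cid for cid, row in matrix.items() for _ in range(row.get(s, 0))]
        for s in samples
    }
    depth = min((len(pool) for pool in pools.values()), default=0)
    resampled = {cid: {} for cid in matrix}
    for s in samples:
        drawn = Counter(rng.sample(pools[s], depth))
        for cid in matrix:
            resampled[cid][s] = drawn.get(cid, 0)
    return resampled, depth


# Hypothetical toy matrix: three clusters, two samples.
toy_matrix = {
    "c1": {"A": 5, "B": 2},
    "c2": {"A": 1, "B": 1},   # total of 2 pORFs, removed by the cutoff
    "c3": {"A": 4, "B": 3},
}
filtered = drop_small_clusters(toy_matrix)
balanced, depth = resample_to_equal_depth(filtered)
print(depth, balanced)
```

After resampling, every column of the matrix sums to the same depth, so differences between samples reflect composition rather than sequencing effort.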

Metagenome properties

Approximately 4.5 million combined microbial metagenomic reads were produced, comprising 1.9 billion bp, with an average read length of 350 bp across the eight samples, ranging from 326,475 to 784,823 sequences [Table 5]. Seven metatranscriptomic datasets were also produced (the sample taken on August 28 at 10 am was lost in transit), totaling 1 million sequences. After cleanup, 392,632 putative mRNA-derived sequences remained, totaling 159 million bp, with an average of 272 bp per sequence. The effort per sample varied from 33,149 to 96,026 sequences [Table 6]. SEED annotations produced via MG-RAST (Table 7 and Table 8) ranged from 20% to 46% of each metagenomic dataset and from 11% to 35% of the metatranscriptomic datasets.

Table 5. Metagenome statistics
Table 6. Metatranscriptome statistics
Table 7. Number of genes associated with the general SEED functional categories
Table 8. Number of transcripts associated with the general SEED functional categories

Highlights from the metagenome sequences

In general, the metagenomes were more similar to one another than the metatranscriptomes were. Photosynthesis genes showed both seasonal and diel changes: specifically, 10 times greater photosynthetic potential in winter than in summer and greater abundance at night in January and April. Gene fragments annotated to proteorhodopsin, however, showed virtually no seasonal or diel fluctuations, accounting for only approximately 0.07% of the annotated functional profile from each sample. Other seasonal differences in metagenomic profiles included a considerably higher winter abundance (compared to spring or summer) of archaeal genes associated with lipid synthesis, thermosome chaperonins, RNA polymerase, small subunit ribosomal proteins, DNA replication, and rRNA modification. Diel differences were apparent among genes involved in respiratory metabolism, which were more abundant at night.

The metatranscriptomic photosynthetic profiles were similar to those of the metagenomes in that photosynthesis genes were most abundant in January and virtually absent in August. Photosynthetic transcripts were also most abundant during the winter. Unlike the genes in the metagenomes, however, these transcripts were most abundant in the daytime in all months. Other seasonal differences in metatranscriptomic profiles included a greater abundance of transcripts related to membrane transport, especially amino acid transport, in summer, when nutrients and dissolved organic material (DOM) are least abundant. The diel metatranscriptional profiles for January showed considerable differences in functions (in addition to photosynthesis); for example, transcripts relating to nitrogen cycling were most abundant during the day and were associated mainly with ammonification. Transcripts for cell wall and capsule and for cell division and cell cycle were upregulated at night, suggesting a nocturnal increase in cell division, potentially associated with the Cyanobacteria. Similarly, April samples showed a considerable upregulation in RNA metabolism during the day, resulting primarily from an increase in group I intron and RNA polymerase transcripts. In August, transcripts with homology to membrane transport were upregulated during the day, while transcripts associated with motility and chemotaxis and with the synthesis of cofactors, vitamins, prosthetic groups, and pigments were considerably upregulated at night, suggesting that nocturnal motility and cellular activity (nucleotide and amino acid synthesis) were also upregulated.

References

  1. Southward AJ, Langmead O, Hardman-Mountford NJ, Aiken J, Boalch GT, Dando PR, Genner MJ, Joint I, Kendall MA, Halliday NC, et al. Long-term oceanographic and ecological research in the Western English Channel. Adv Mar Biol 2004; 47:1–105. doi:10.1016/S0065-2881(04)47001-1

  2. Gilbert JA, Field D, Swift P, Newbold L, Oliver A, Smyth T, Somerfield P, Huse S, Joint I. Seasonal succession of microbial communities in the Western English Channel using 16S rDNA-tag pyrosequencing. Environ Microbiol 2009; 11:3132–3139. doi:10.1111/j.1462-2920.2009.02017.x

  3. Gilbert JA, Swift P, Somerfield P, Temperton B, Huse S, Smyth T, Field D. Seasonal succession and impact of environmental change on bacterial populations in the Western English Channel: A six-year study. ISME J 2010; (In Review).

  4. Siddorn JR, Allen JI, Uncles RJ. Heat, salt and tracer transport in the Plymouth Sound coastal region: a 3-D modeling study. J Mar Biol Assoc UK 2003; 83:673–682. doi:10.1017/S002531540300763Xh

  5. Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 2008; 26:541–547. doi:10.1038/nbt1360

  6. Field D, Tiwari B, Booth T, Houten S, Swan D, Bertrand N, Thurston M. Open software for biologists: from famine to feast. Nat Biotechnol 2006; 24:801–803. doi:10.1038/nbt0706-801

  7. Harris R. The L4 time-series: the first 20 years. J Plankton Res 2010; 32:577–583. doi:10.1093/plankt/fbq021

  8. Booth T, Gilbert JA, Neufeld JD, Ball J, Thurston M, Chipman K, Joint I, Field D. Handlebar: a flexible, web-based inventory manager for handling barcoded samples. Biotechniques 2007; 42:300–302. doi:10.2144/000112385

  9. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 2008; 3:e3042. doi:10.1371/journal.pone.0003042

  10. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Stevens R, Wilke A, Wilkening J, Edwards RA. The Metagenomics RAST Server: a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008; 9:386. doi:10.1186/1471-2105-9-386

  11. Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. ISME J 2009; 3:1314–1317. doi:10.1038/ismej.2009.72

  12. Niu B, Fu L, Sun S, Li W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 2010; 11:187. doi:10.1186/1471-2105-11-187

  13. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, et al. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 2007; 5:e77. doi:10.1371/journal.pbio.0050077

  14. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene Ontology: tool for the unification of biology. Nat Genet 2000; 25:25–29. doi:10.1038/75556

  15. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006; 22:1658–1659. doi:10.1093/bioinformatics/btl158

Acknowledgements

This work was funded by a grant from the Natural Environment Research Council (NERC - NE/F00138X/1). The authors thank Neil Hall from the NERC / University of Liverpool Advanced Genomics Facility. This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

Author information

Corresponding author

Correspondence to Jack A. Gilbert.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Gilbert, J.A., Meyer, F., Schriml, L. et al. Metagenomes and metatranscriptomes from the L4 long-term coastal monitoring station in the Western English Channel. Stand in Genomic Sci 3, 183–193 (2010). https://doi.org/10.4056/sigs.1202536

Keywords

  • Marine
  • aerobic
  • surface water
  • coastal
  • temperate
  • metagenome
  • metatranscriptome
  • pyrosequencing
  • time-series
  • diel
  • seasonal