Permanent draft genome of strain ESFC-1: ecological genomics of a newly discovered lineage of filamentous diazotrophic cyanobacteria

The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S rRNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. Additionally, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.


Introduction
Microbial mats played a key role in the evolution of the early Earth and today provide a model system for exploring relationships between evolution, ecology, and biogeochemical cycles. In many mats, nitrogen-fixing filamentous Cyanobacteria are often central components with important roles in carbon, nitrogen and sulfur cycling [1,2]. Recently, a previously unknown lineage of filamentous nitrogen-fixing Cyanobacteria was described in intertidal microbial mats from Elkhorn Slough, Moss Landing, California [3]. The type strain of this organism, ESFC-1, lacks both heterocysts and an extracellular sheath and has been shown to be an important cyanobacterial diazotroph in the Elkhorn Slough system [3].
At Elkhorn Slough this strain is often a dominant cyanobacterial member of the community (along with Cyanobacteria closely related to Coleofasciculus chthonoplastes PCC 7420); the sequence abundance of ESFC-1 in 16S rRNA libraries based on DNA and cDNA has been observed to reach up to 5 % (based on pyrosequencing) and 33-36 % (based on clone libraries and pyrosequencing), respectively [3,4]. Although it is not always dominant, ESFC-1 is highly active, based on nifH transcript abundance and rRNA transcript to rRNA gene ratios [3,5]. Recent work has shown that ESFC-1 produces a considerable external carbon pool as an EPS; this EPS is managed by means of an active exoproteome, and provides a source of organic carbon for the cyanobacterium and other community members [6]. Previous phenetic analyses using full-length 16S rRNA gene sequences have indicated this organism shares only a moderate identity with other identified Cyanobacteria; its best cultured BLAST hit is the marine Aphanocapsa sp. HBC6 at 93.6 % similarity [7,8]. Given the importance of ESFC-1 in the Elkhorn Slough mat system and its evolutionarily divergent 16S rRNA, the genomic sequence was determined [3,8]. Here we report a detailed description of the genome of ESFC-1 as it relates to the ecology of this important mat community organism.

Classification and features
Strain ESFC-1 was isolated by L. Prufert-Bebout at NASA Ames Research Center in Moffett Field, California from the top 2 mm of intertidal microbial mat samples collected at Elkhorn Slough, California, USA. Fresh microbial mat was repeatedly streaked onto plates of a modified version of ASN artificial seawater medium until a unialgal culture was obtained [3,9]. Strain ESFC-1 is a motile, Gram-negative, non-heterocystous filament (Fig. 1). Trichomes are cylindrical in shape and straight to slightly curved, with rounded to slightly conical ends. Individual cells are approximately 1.8 μm across, and cells are typically longer than wide, up to 3.5 μm in length, slightly longer than reported previously [3]. Constrictions between cells are shallow but clearly visible. Hormogonia and akinetes have not been observed. Heterocysts have not been observed, even in cultures actively fixing N2. Morphologically, ESFC-1 appears most similar to isolates of the form-genus Geitlerinema, but with a cell size more typical of the form-genus Leptolyngbya [10].
General features of ESFC-1 and project information are presented in Tables 1 and 2. In previous similarity and phylogenetic analyses based on the 16S rRNA locus, strain ESFC-1 did not show a close similarity (<94 %) with any other cyanobacterial sequence, and its phylogenetic placement within the cyanobacterial radiation was ambiguous [3,8]. A 31-marker gene phylogenomic analysis of the cyanobacterial radiation, including ESFC-1, is presented in Fig. 2. This analysis places ESFC-1 with strong support in a clade with two nondiazotrophic Spirulina strains, PCC 6313 and PCC 9445 [10].

Genome sequencing information
Genome project history
Strain ESFC-1 was selected for sequencing because of its recent discovery as a major diazotroph in intertidal mat communities and its unique taxonomic position within the cyanobacteria. The genome project is deposited in the Genome On Line Database (GOLD Legacy ID Gi14129) and the draft genome sequence is deposited in GenBank (accession ARCP00000000). Sequencing, finishing and annotation were performed by the DOE-JGI. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
ESFC-1 was maintained in culture in liquid modified ASN at 25 °C on a 14:10 L:D cycle under cool fluorescent lamps at approximately 50 μmol photons · m⁻² · s⁻¹. High molecular weight genomic DNA was isolated based on the "JGI Bacterial DNA isolation CTAB" protocol from JGI [11], including an RNA digestion step according to the protocol. 50 μg of gDNA was provided to the JGI for sequencing.

Genome sequencing and assembly
The high-quality draft genome of strain ESFC-1 was generated by the DOE-JGI using the Illumina GAIIx platform [12]. An Illumina standard short-insert paired-end library with an average insert size of 222 bp +/− 50 bp generated 15,283,374 reads. An Illumina CLIP-PE long-insert paired-end library with an average insert size of 7791 +/− 660 bp generated 18,062,354 reads [13]. In total, 4099 Mbp of Illumina data were generated.

Fig. 1 (partial caption): Image is a calculated maximum intensity projection of a 50 μm z-stack. Red is autofluorescent ESFC-1 trichomes. Cells were fixed with 10 % formaldehyde prior to imaging. c Scanning electron microscopy image of ESFC-1 trichomes. ESFC-1 samples were fixed with 10 % formaldehyde, rinsed with sterile water, spotted onto a silicon wafer, air-dried and coated with ~5 nm of gold. Imaged with an FEI Inspect F SEM (Hillsboro, OR). For all panels, scale bar represents 10 μm.
The Illumina draft data was assembled with Allpaths, version r38445 [14], and contained 117 contigs in 14 scaffolds. The consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The draft data was also assembled with Velvet, version 1.1.05 [15], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping shreds. The Illumina draft data was reassembled with Velvet using the shreds from the first Velvet assembly to guide the reassembly. The consensus from this second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. Fake reads from the Allpaths and both Velvet assemblies were assembled using parallel phrap, version 4.24 (High Performance Software, LLC) with a subset of the Illumina CLIP-PE reads [16,17]. Possible misassemblies were checked and manually corrected in Consed [18]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished). The final assembly is based on 4099 Mbp of Illumina draft data, with an average genome coverage of 719×.
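The shredding step described above can be illustrated with a short Python sketch. It is not the JGI pipeline's code: the function name `shred`, the `step` parameter and the decision to add a final shred covering the tail of the sequence are assumptions made for illustration, and the text does not state how much adjacent shreds overlapped.

```python
def shred(consensus: str, length: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    length: shred size in bases (the text reports 10 kbp and 1.5 kbp).
    step:   distance between shred starts; must be smaller than length so
            that neighbouring shreds overlap (value assumed, not from the text).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 < step < length:
        raise ValueError("step must satisfy 0 < step < length")
    # A consensus no longer than one shred is returned whole.
    if len(consensus) <= length:
        return [consensus]
    starts = list(range(0, len(consensus) - length + 1, step))
    # Add one last full-length shred if the regular grid misses the tail.
    if starts[-1] + length < len(consensus):
        starts.append(len(consensus) - length)
    return [consensus[s:s + length] for s in starts]


# Toy example: a 24-base "consensus" cut into 10-base shreds every 4 bases.
toy = "ACGT" * 6
toy_shreds = shred(toy, length=10, step=4)
```

For a real scaffold one might call `shred(scaffold, length=10_000, step=...)` for the Allpaths consensus and `length=1_500` for the Velvet consensus, with the step chosen by the user.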

Genome annotation
The genome was annotated automatically with Prodigal 2.5 [19] by IMG [20], locally using the RAST server [21], and by GeneMarkS+ [22] by the NCBI annotation pipeline. Pathways of interest were mapped to the KEGG maps through both IMG and RAST.

Genome properties
The high-quality draft genome of cyanobacterium ESFC-1 was resolved to 3 scaffolds consisting of 5,431,811, 135,349 and 64,875 bp, for a total of 5,632,035 bp. GC content was 46.47 %. The genome sequence is predicted to encode 5006 total genes, with 92 RNA genes, and 4914 protein-encoding genes. A majority (79.0 %) of genes were assigned putative functions, and the remainder were annotated as hypothetical proteins. The properties of the ESFC-1 genome, and the distribution of genes into COG functional groups are presented in Tables 3, 4, and Fig. 3. 16S rRNA gene sequence similarity to closely related cultured cyanobacteria, as determined by the phylogeny in Fig. 2, is summarized in Table 5 for the two 16S rRNA genes found in this genome.

a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [41].

Insights from the genome sequence
ESFC-1 is a nitrogen-fixing cyanobacterium [3]; all structural genes for nitrogenase were detected (nifHDK operon; A3MYDRAFT_2398-2400), as were genes required for uptake of nitrate and reduction to ammonia (narB and nirA; A3MYDRAFT_3316 and A3MYDRAFT_3311, respectively). However, ESFC-1's sister taxa, Spirulina PCC 6313 and PCC 9445, as determined by the phylogenomic analysis presented in Fig. 2, both lack the nif operon. Of the cyanobacterial taxa found in the phylogenomic tree, the highest scoring BLAST hit to the translated nifH sequence of ESFC-1 belongs to Halothece sp. PCC 7418 (second best score overall, 87 % amino acid identity, E-value 0.0).
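The totals reported under Genome properties can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the constant and function names are illustrative choices, not part of any annotation pipeline.

```python
# Figures as reported in the text.
SCAFFOLD_LENGTHS_BP = [5_431_811, 135_349, 64_875]
REPORTED_TOTAL_BP = 5_632_035
PROTEIN_CODING_GENES = 4914
RNA_GENES = 92
REPORTED_TOTAL_GENES = 5006


def check_totals() -> dict[str, bool]:
    """Confirm that the reported parts add up to the reported totals."""
    return {
        "scaffolds_sum_to_total": sum(SCAFFOLD_LENGTHS_BP) == REPORTED_TOTAL_BP,
        "genes_sum_to_total": PROTEIN_CODING_GENES + RNA_GENES == REPORTED_TOTAL_GENES,
    }


def largest_scaffold_fraction() -> float:
    """Fraction of the assembly contained in the largest scaffold."""
    return max(SCAFFOLD_LENGTHS_BP) / sum(SCAFFOLD_LENGTHS_BP)


if __name__ == "__main__":
    print(check_totals())
    print(f"largest scaffold: {largest_scaffold_fraction():.1%} of the assembly")
```

Both checks pass, and the largest scaffold accounts for roughly 96 % of the assembled sequence.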
ESFC-1 has homologs required for assimilatory sulfur reduction, and a homolog for a sulfide:quinone oxidoreductase, suggesting it can utilize hydrogen sulfide as an electron donor to photosystem I, a useful trait in mat environments that may periodically become anoxic/sulfidic [23,24]. ESFC-1 has a full complement of homologs for photosystems I and II, the cytochrome b6f complex, photosynthetic electron transport and ATP synthesis. Detected phycobilisome gene homologs indicate a phycocyanin-rich genotype, with phycocyanin, allophycocyanin core, and linker peptide homologs present. ESFC-1 appears to lack the ability to chromatically adapt; phycoerythrin and phycoerythrocyanin genes appear to be absent. A single set of phycocyanin homologs is present (cpcBA; A3MYDRAFT_2965-2964).
Complete sets of genes were detected for the Calvin-Benson cycle, the pentose phosphate pathway, Entner-Doudoroff pathway and glycolysis/gluconeogenesis. TCA cycle and the carbon dioxide concentrating mechanism [...]

Fig. 2 (partial caption): [...] [35,42] showing the phylogenomic affiliation of ESFC-1 with two species of Spirulina (PCC 6313 and PCC 9445; clade in red). Only the portion of the larger tree corresponding to lineage B2 (sensu [42]) is shown. The full 126-taxon ML tree was built using PHYML using the LG protein substitution matrix, and was rooted with Chloroflexus aurantiacus J-10, Rhodobacter sphaeroides 2.4.1, Heliobacterium modesticaldum Ice1, and Chlorobium tepidum TLS [43-45]. ML bootstrap values for nodes >50 are shown; black boxes at the nodes denote bootstrap values of 100. Strain ESFC-1 and the nearest neighbors are highlighted in red.

Extended insights
Cyanobacterium ESFC-1 appears to lack both a functional uptake hydrogenase and a functional bi-directional hydrogenase. Neither the JGI nor RAST annotations detected these sequences. Extensive manual searches for the hox cluster genes (hoxEFUYH), encoding the bi-directional hydrogenase, and the hupSL genes encoding the uptake hydrogenase commonly found in N-fixing cyanobacteria, were unsuccessful. Similarly, the hydrogenase-maturation enzymes hypFCDE were not found, although the hypAB locus was detected (A3MYDRAFT_0781-0782). Comparative analyses of strain ESFC-1 with the two closely related Spirulina strains revealed they both lack nif and hupL, but unlike ESFC-1, both possess hupS homologs and the hox operon. The hypFCDEAB homologs were found dispersed throughout their genomes; a comparative blastp analysis between the Spirulina PCC 6313 hox operon and best hits within strain ESFC-1 did not reveal any evidence of synteny for nearby loci.

The genome of ESFC-1 contains an approximately 56 kbp region of low GC content, with several putatively horizontally transferred ORFs (Fig. 3). Based on the IMG annotation, 51 ORFs were identified (A3MYDRAFT_4511-A3MYDRAFT_4561), with a global GC content of 39.67 % and individual GC content for loci ranging from 29 to 47 %, compared with the global genome GC content of 46.47 %. At the phylum level and higher, IMG designated 112 ORFs as putatively horizontally transferred within the entire genome; of these, 16 (14.2 %) were found within this island (11 from Proteobacteria, and one each from Bacteroidetes, Chloroflexi, Deferribacteres, Firmicutes and Nitrospirae). Additional blastp analysis against the non-redundant protein database (excluding environmental sequences) for these ORFs revealed that 19 have best hits to non-cyanobacterial sequences. Several additional sequences in this island did have best hits to cyanobacterial sequences, but these cyanobacterial homologs appeared to be restricted to ESFC-1 and a few closely related Cyanobacteria. The remaining high-scoring hits for these genes belonged to other bacterial phyla, suggesting that a gene transfer event for these loci into the Cyanobacteria occurred in a common ancestor shared by ESFC-1 and the other closely related Cyanobacteria.
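The island statistics quoted above rest on a few simple calculations, sketched here in Python. The helper names are illustrative, and `locus_tags_in_range` assumes that locus tags are numbered consecutively without gaps, which the text does not state explicitly; the numeric inputs are the figures reported above.

```python
import re


def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def locus_tags_in_range(first: str, last: str) -> int:
    """Count locus tags from first to last, inclusive.

    Assumes both tags share a prefix and are numbered consecutively.
    """
    m_first = re.fullmatch(r"(.+_)(\d+)", first)
    m_last = re.fullmatch(r"(.+_)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("tags must share a prefix and end in a number")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


# Figures reported in the text.
ISLAND_ORFS = locus_tags_in_range("A3MYDRAFT_4511", "A3MYDRAFT_4561")
HGT_SHARE_PCT = 100.0 * 16 / 112   # island share of IMG-flagged ORFs
GC_GAP = 46.47 - 39.67             # genome GC minus island GC, in percentage points
```

Under the consecutive-numbering assumption the tag range spans 51 ORFs, matching the count given in the text; the island holds roughly 14 % of the putatively transferred ORFs and sits about 6.8 percentage points below the genome-wide GC content.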
In total, as many as 32 loci in the island region may have been horizontally transferred, either recently or into an ancestor of ESFC-1. Predicted gene functions for this region are primarily involved with lipopolysaccharide and outer membrane synthesis, including several methyltransferase-like and glycosyltransferase-like enzymes, such as homologs to the rfaB, rfaG and rfaS loci and to the rfaL and rfbX loci related to O-antigen synthesis. Seven of these proteins have been detected via proteomics of an ESFC-1 culture, four of which were extracellular [6], suggesting genes in this island may play a role in extracellular polysaccharide and cell wall synthesis and maintenance. Many Cyanobacteria secrete exopolysaccharides with distinct structures and roles, both in protection from stress (ultraviolet radiation, osmotic and metals) and possibly for carbon storage [25]. This genomic island may provide an advantage to ESFC-1 in stress protection, as has been shown with other cyanobacterial genomic islands [26].

Conclusions
Although ESFC-1 represents a genus-level divergence within the Cyanobacteria, based on both 16S rRNA and phylogenomic analyses, its genome appears typical of a filamentous cyanobacterium. However, the ESFC-1 genome is striking in its apparent lack of the uptake and bi-directional hydrogenases expected within a diazotrophic cyanobacterium. Although the uptake hydrogenase hupSL is found dispersed through the cyanobacterial radiation, to our knowledge, strain ESFC-1 is one of very few N-fixing cyanobacteria to lack this gene [27]. The uptake hydrogenase is generally considered an integral part of the energetically expensive process of N-fixation, allowing the cyanobacterium to recapture hydrogen produced by nitrogenase activity [28]. However, a deficient mutant of Anabaena sp. PCC 7120, lacking the large subunit of the uptake hydrogenase, hupL, demonstrated similar growth and N-fixation rates compared to the wildtype, but with enhanced hydrogen production under N-fixation conditions [29]. Given this, and the fact that ESFC-1 is known to be an active nitrogen-fixer in situ in the Elkhorn Slough intertidal mat community, it appears that some Cyanobacteria do not require a classical uptake hydrogenase, yet still perform this critical ecological role.
The bi-directional hydrogenase hox is also common within Cyanobacteria. It is thought to play roles in fermentation and potentially as an electron valve during photosynthesis to maintain proper redox conditions [30-32]. However, as suggested by Tamagnini et al. [33], the physiological role of the bi-directional hydrogenase in Cyanobacteria is unsettled. A recent analysis of 36 cyanobacterial strains indicated that a bi-directional hydrogenase was necessary for hydrogen production via fermentation in these cyanobacteria [34]. Consistent with its apparent lack of a bi-directional hydrogenase, strain ESFC-1 has been shown in the laboratory to produce hydrogen under N-fixation conditions, but not under fermentation conditions (data not shown). One possible explanation is that strain ESFC-1 ferments under anoxic conditions via lactate or homolactate fermentation pathways, as found within the genome. Such fermentation is known from filamentous Cyanobacteria, and allows for maintenance of redox without concomitant production of hydrogen gas [30]. Both Spirulina spp. PCC 6313 and PCC 9445 possess homologs for the hox genes, so this absence in strain ESFC-1 is best explained by loss, consistent with the uneven distribution of the hup and hox genes in the cyanobacterial radiation [33].
Finally, since the ESFC-1 genome is not closed, the absent hox, hup and hyp genes are possibly in missing regions, or simply were not detected in the automated annotation process. However, extensive manual searches of the genome failed to find any putative hydrogenases. A search of the draft genome for the 107 marker genes commonly used to estimate completeness in metagenomic analyses found all 107 [35], suggesting the absence of these three gene groups is genuine.
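The completeness argument above reduces to comparing an expected marker set with the markers actually detected. The sketch below shows only that bookkeeping; the marker identifiers are placeholders (the 107 real marker names are not listed in the text), and the detection step itself, typically a profile search against the predicted proteins, is assumed to have been done elsewhere.

```python
def marker_completeness(expected: set[str], detected: set[str]) -> tuple[float, set[str]]:
    """Return percent of expected markers detected and the set of missing ones."""
    if not expected:
        raise ValueError("expected marker set is empty")
    missing = expected - detected
    percent = 100.0 * (len(expected) - len(missing)) / len(expected)
    return percent, missing


# Placeholder identifiers standing in for the 107 marker genes.
expected_markers = {f"marker_{i:03d}" for i in range(1, 108)}

# Case reported in the text: every marker found.
full_pct, full_missing = marker_completeness(expected_markers, set(expected_markers))

# Hypothetical contrast: one marker absent from the detected set.
partial_pct, partial_missing = marker_completeness(
    expected_markers, expected_markers - {"marker_042"}
)
```

With all 107 markers detected the estimate is 100 %, which supports (but does not prove) the interpretation that the hydrogenase genes are absent rather than unassembled.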
Despite the apparent lack of a functional hydrogenase, strain ESFC-1 has been shown to be a dominant and active member of the Elkhorn Slough community. Further, it appears to be globally distributed. Although this distribution appears more limited compared to the cosmopolitan C. chthonoplastes, both nifH and 16S rRNA gene environmental sequences similar to ESFC-1 (>95 %) have been observed in the intertidal mats at Guerrero Negro, Mexico [36], and in lake sediments in Daqing, China (unpublished data, accession KJ176902). An isolate, Leptolyngbya sp. LEGE 07176, from the intertidal zone in Portugal [37] may represent a second isolate of this lineage. As one of the only known N-fixing cyanobacteria natively lacking an uptake hydrogenase, this organism may be a suitable target for hydrogen production research. Future studies of ESFC-1 should experimentally confirm the lack of functioning hydrogenase proteins, and explore the nature and energetics of fermentation and N-fixation, and the ecological consequences for an organism that lacks these key enzymes.