High-quality draft genome sequence of Ensifer meliloti Mlalz-1, a microsymbiont of Medicago laciniata (L.) Miller collected in Lanzarote, Canary Islands, Spain

Ensifer meliloti Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here the features of E. meliloti Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to E. meliloti IAM 12611 T, E. medicae A 321 T and E. numidicus ORS 1407 T, based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as E. meliloti. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata-nodulating Ensifer strains, but ≤93% with nodC of Ensifer strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced E. meliloti strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In E. medicae strain WSM 419, lpiA is essential for enhancing survival in lethal acid conditions. The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of E. medicae strains, which suggests genetic recombination between strain Mlalz-1 and E. medicae and the horizontal gene transfer of lpiA-acvB.
Electronic supplementary material The online version of this article (10.1186/s40793-017-0270-2) contains supplementary material, which is available to authorized users.


Introduction
Symbiotic nitrogen fixation by pasture legumes and their associated root nodule bacteria provides a critical contribution to sustainable animal and plant production, and the maintenance of soil fertility in agricultural systems [1][2][3]. As such, it is of direct relevance to maintaining environmentally sustainable high agricultural yields, which significantly contributes to the Sustainable Development Goals adopted in September 2015 as part of the UN's development agenda 'Transforming our world: the 2030 Agenda for Sustainable Development' [4]. Medics (Medicago spp.) are some of the most important and extensively grown pasture legumes and their specific symbiosis with strains of rhizobia belonging to either Ensifer (synonym Sinorhizobium) meliloti or the closely related species E. medicae [5,6] has been the subject of extensive research efforts [7].
Medicago laciniata (L.) Miller (cut leaf medic), an annual native of southern and eastern Mediterranean and Saharo-Sindian countries, is of importance because of its ability to grow in comparatively arid habitats and marginal cropping areas [8][9][10][11]. It is highly specific in its rhizobial requirements, forming a symbiosis only with a restricted subset of E. meliloti and not with strains that nodulate Medicago sativa L. (alfalfa) or Medicago truncatula Gaertn. [12,13]. This symbiotic specificity has been linked to the rhizobial nod genes, in particular a specific nodC allele [14]. For example, van Berkum and colleagues found that most rhizobial strains isolated from Tunisian M. truncatula and M. laciniata shared chromosomal identity, but differed in their nodC alleles [15]. Based on these and other differing symbiotic traits, Villegas et al. [13] proposed two biovars within E. meliloti: bv. medicaginis for Ensifer strains that are symbiotically efficient on M. laciniata and bv. meliloti for the classical E. meliloti group that efficiently nodulates M. sativa. However, in subsequent studies the diversity observed within bv. medicaginis strains indicates that this group is heterogeneous [16].
M. laciniata is native to the Canary Islands and is present on all of the islands of this archipelago, growing in environments that range from arid to subhumid. Ensifer meliloti strain Mlalz-1 was isolated from a N2-fixing nodule of M. laciniata grown in alkaline soil (pH 9.0) collected in Guatiza, in the arid northeast of Lanzarote Island, in 2007. This strain was one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 GEBA-RNB project proposal [17,18]. Here an analysis of the high-quality draft genome sequence of E. meliloti Mlalz-1 is provided.

Organism information
Classification and features
E. meliloti Mlalz-1 is a motile, non-sporulating, nonencapsulated, Gram-negative strain in the class Alphaproteobacteria. The rod-shaped form has dimensions of approximately 0.5 μm in width and 1.0-2.0 μm in length (Fig. 1 Left and Center). It is fast growing, forming colonies after 3-5 days when grown on ½LA, TY, or a modified yeast-mannitol agar [19] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right). Minimum Information about the Genome Sequence (MIGS) for strain Mlalz-1 is provided in Table 1 and Additional file 1: Table S1.

Symbiotaxonomy
M. laciniata is a highly specific host and its microsymbionts also appear to be highly specific since studies of Medicago isolates have shown that M. laciniata strains fail to nodulate a range of Medicago species [5,12]. Bailly et al. [20] reported that isolates of M. laciniata nodulated and fixed nitrogen with M. truncatula, but also provided evidence that these were the progeny of horizontal transfer of the nodulation genes. Strain Mlalz-1 nodulates and is effective for nitrogen fixation with M. laciniata. We report here that strain Mlalz-1 is unable to nodulate Medicago polymorpha L., the definitive host for E. medicae strains [6].

Extended feature descriptions
Previous studies using multilocus sequence typing showed that M. laciniata rhizobia did not form a distinct chromosomal group [15]. Phylogenetic analysis of strain Mlalz-1 was performed by aligning the 16S rRNA sequence (1389 bp from scaffold 84.85) to the 16S rRNA gene sequences of Ensifer type strains (Fig. 2). Based on four variable sites within this 16S rRNA gene sequence alignment, strain Mlalz-1 is closely related to E. meliloti IAM 12611 T (= LMG 6133 T) [21], E. medicae A 321 T (= LMG 19920 T) [6] and E. numidicus ORS 1407 T [22]. The available IMG 16S rRNA sequence of strain Mlalz-1 gave alignment identities of 100% to E. meliloti IAM 12611 T, 99.7% to E. medicae A 321 T and 99.5% to E. numidicus ORS 1407 T. In contrast, E. meliloti IAM 12611 T and Ensifer terangae LMG 7834 T [23] were only 97.3% similar.
Fig. 1 Images of Ensifer meliloti Mlalz-1 using scanning (Left (a)) and transmission (Center (b)) electron microscopy as well as light microscopy to visualize colony morphology on solid media (Right (c))
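The alignment identities quoted for the 16S rRNA gene are percentages of matching positions between two aligned sequences. The following is a minimal sketch of that calculation, assuming identity is counted only over columns where neither sequence has a gap; the function name and the toy sequences are hypothetical and are not taken from the IMG data or from the tools used in this study.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment and therefore
    have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data):
# 10 columns, one mismatch at the fifth position.
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAC"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

Other conventions (for example, counting gap columns as mismatches) give slightly different percentages, so the convention should be stated alongside any reported value.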

Genome sequencing information
Genome project history
E. meliloti Mlalz-1 was selected for sequencing at the U.S. Department of Energy funded Joint Genome Institute as part of the GEBA-RNB project [17,18]. The root nodule bacteria in this project were selected based on environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance. In particular, strain Mlalz-1 was chosen since it has strict host specificity for M. laciniata, which is suited for cultivation in arid environments [11]. The E. meliloti Mlalz-1 genome project is deposited in the Genomes Online Database [24] and a high-quality permanent draft genome sequence (IMG Genome ID 2513237143) is deposited in IMG [25]. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.
Growth conditions and genomic DNA preparation
E. meliloti Mlalz-1 (= USDA 1984) was cultured on MAG solid media [26] for three days at 28°C to obtain well grown, well separated colonies, then a single colony was selected from the plate and inoculated into 5 ml MAG broth media. The culture was grown for 48 h on a gyratory shaker (200 rpm) at 28°C. Subsequently 1 ml was used to inoculate 50 ml of MAG and the cells were incubated on a gyratory shaker (200 rpm) at 28°C until an OD 600nm of 0.6 was reached. DNA was isolated from 50 ml of cells by Peter van Berkum according to the method described by van Berkum [26]. The final concentration of the DNA was set to 0.5 mg ml−1.
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [76,77]

Genome sequencing and assembly
The draft genome of E. meliloti Mlalz-1 was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [27]. An Illumina standard PE library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 35,720,836 reads totalling 4983 Mbp. All general aspects of library construction and sequencing were done at the JGI and details can be found on the JGI website [28]. All raw Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artefacts (Mingkun L, Copeland A, Han J; unpublished). Assembly proceeded in the following steps: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [29]; (2) 1-3 Kbp simulated paired end reads were created from Velvet contigs using wgsim (version 0.3.0) [30]; (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r39750) [31]. Parameters for the assembly steps were 1) Velvet: -v -s 51 -e 71 -i 2 -t 1 -f "-shortPaired -fastq $FASTQ" -o "-ins_length 250 -min_contig_lgth 500" and 2) wgsim: -e 0 -1 76 -2 76 -r 0 -R 0 -X 0.
Fig. 2 Mesorhizobium ciceri bv biserrulae WSM1271 was included in the analysis. Phylogenetic analysis was done using MEGA, version 6.0 [61] after manually assembling the alignment by using GeneDoc version 2.6.001 [62]. M. ciceri bv biserrulae WSM1271 was used as an outgroup and the tree was assembled using the UPGMA algorithm based on the number of nucleotide differences. This approach was used since the potential for genetic recombination among the different 16S rRNA genes as reported by van Berkum [63] cannot be ignored. Bootstrap analysis [64] with 2000 permutations of the data set was done to assess support for the branch points. Strains with a genome sequencing project registered in GOLD [24] are Ensifer adhaerens Casida A T, M. ciceri bv. biserrulae WSM1271 and Mlalz-1 and the GOLD ID is provided in place of the GenBank accession number
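The wgsim settings reported for the read-simulation step can be written out as an explicit argument list. This is a sketch only: the option values are those given in the text, wgsim takes a reference FASTA followed by two output FASTQ files, and the file names used here are hypothetical placeholders. The options controlling insert size and the number of read pairs are not reported in the text, so they are omitted rather than guessed.

```python
import shlex

# Option values as reported for wgsim (version 0.3.0) in the assembly parameters.
WGSIM_OPTIONS = {
    "-e": "0",   # base error rate
    "-1": "76",  # length of the first read
    "-2": "76",  # length of the second read
    "-r": "0",   # mutation rate
    "-R": "0",   # fraction of indels
    "-X": "0",   # probability that an indel is extended
}


def wgsim_command(reference_fasta: str, out_read1: str, out_read2: str) -> list[str]:
    """Build the wgsim argument list; the paths are supplied by the caller."""
    args = ["wgsim"]
    for flag, value in WGSIM_OPTIONS.items():
        args += [flag, value]
    args += [reference_fasta, out_read1, out_read2]
    return args


# Hypothetical file names, for illustration only.
cmd = wgsim_command("velvet_contigs.fa", "sim_1.fq", "sim_2.fq")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` without shell quoting concerns; the printed string is only for inspection.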
The final draft assembly contained 100 contigs in 99 scaffolds. The total size of the genome is 6.7 Mbp and the final assembly is based on 4983 Mbp of Illumina data, which provides an average of 748× coverage of the genome.
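The reported coverage follows directly from the figures given above: total sequenced bases divided by the assembled genome size. A short check, using only numbers stated in the text (the variable names are illustrative):

```python
# Figures reported in the text.
illumina_bases = 4983e6        # 4983 Mbp of Illumina data
genome_size_bp = 6_664_116     # assembled draft genome size
read_count = 35_720_836        # number of Illumina reads

# Average depth of coverage = sequenced bases / genome size.
coverage = illumina_bases / genome_size_bp

# Mean bases per read implied by the reported totals.
mean_read_length = illumina_bases / read_count

print(f"average coverage: {coverage:.0f}x")
print(f"mean read length: {mean_read_length:.1f} bp")
```

The result rounds to the 748× stated for the assembly.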

Genome annotation
Genes were identified using Prodigal [32], as part of the DOE-JGI genome annotation pipeline [33,34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAScanSE tool [35] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [36]. Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [37]. Additional gene prediction analysis and manual functional annotation was done within the Integrated Microbial Genomes-Expert Review platform [38] developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome properties
The genome is 6,664,116 bp with 62.16% GC content (Table 3) and comprises 99 scaffolds. From a total of 6388 genes, 6314 were protein-encoding and 74 were RNA-only encoding genes. Most genes (79.52%) were assigned a putative function whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.
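The gene tallies above can be cross-checked with simple arithmetic. The sketch below uses only the counts and the percentage stated in the text; the number of genes with a putative function is derived from the rounded percentage and is therefore approximate.

```python
# Counts reported in the text.
total_genes = 6388
protein_coding = 6314
rna_only = 74

# The two categories should account for every gene.
categories_sum = protein_coding + rna_only

# Share of each category, in percent.
protein_share = 100 * protein_coding / total_genes
rna_share = 100 * rna_only / total_genes

# Approximate gene count behind the reported 79.52% with a putative function.
approx_with_function = round(total_genes * 79.52 / 100)

print(f"protein-coding: {protein_share:.2f}%, RNA-only: {rna_share:.2f}%")
print(f"genes with a putative function: about {approx_with_function}")
```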

Insights from the genome sequence
E. meliloti Mlalz-1 is one of seven strains of E. meliloti that have been sequenced from the GEBA-RNB genome sequencing projects [17]. On the basis of 16S rRNA sequence identity, strain Mlalz-1 is closely related to E. meliloti IAM 12611 T (= LMG 6133 T), E. medicae A 321 T (= LMG 19920 T) and E. numidicus ORS 1407 T. As the genomes of these type strains have not been sequenced or are not publicly available, gANI values [39] [39]. The total genome size of strain Mlalz-1 is 6.6 Mbp, which falls within the expected size range of 6.6-8.9 Mbp for E. meliloti. The genome architecture of E. meliloti consists of a chromosome and the two symbiotic megaplasmids pSymA and pSymB [20]. Replication of a plasmid is initiated by the replication protein encoded by repC, which is present as a single copy on E. meliloti pSymA and pSymB. The E.

Extended insights
All 29 E. meliloti strains within the gANI clique share a core set of 4948 orthologous genes, using cut-off values of 1e-5 and 30% minimum protein identity. E. meliloti Mlalz-1 contains 176 unique genes, 96 (54.5%) of which encode hypothetical proteins. The unique genes include those encoding the components of a T2SS, located on scaffold A3CADRAFT_scaffold_5.6 (Fig. 3a), as well as genes that encode a DNA methyltransferase and a NitT/TauT family transport system. These T2SS components form part of a unique COG profile generated for Mlalz-1 (Table 6). The T2SS is used to translocate a wide range of proteins from the periplasm across the outer membrane [40]. Although T2SS genes are not found in other E. meliloti strains or in the Ensifer fredii strains GR64 and USDA 257, they are present in the genomes of the E. fredii strains HH103 and NGR234, in a similar gene arrangement to that observed in E. meliloti Mlalz-1 [41,42] (Fig. 3b). Generally, the T2SS gene cluster comprises 12-15 genes, and strain Mlalz-1 contains the 12 genes (gspDOGLMCKEFHIJ) required for a functional T2SS, but lacks the gspS gene found only in certain genera [43] (Fig. 3c).
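The core gene set depends on two thresholds: an e-value cut-off of 1e-5 and a minimum protein identity of 30%. The sketch below illustrates how such thresholds can be applied to similarity hits and how a core set follows as the genes with a qualifying hit in every compared genome. It is an illustration under stated assumptions, not the IMG procedure: the `Hit` record, the function names and the toy data are all hypothetical, and real ortholog detection typically adds further criteria such as reciprocal best hits.

```python
from typing import Iterable, NamedTuple

MAX_EVALUE = 1e-5     # e-value cut-off stated in the text
MIN_IDENTITY = 30.0   # minimum protein identity (percent) stated in the text


class Hit(NamedTuple):
    query_gene: str      # gene in the reference strain
    subject_genome: str  # genome in which a homolog was found
    evalue: float
    identity: float      # percent protein identity


def passes(hit: Hit) -> bool:
    """True if the hit satisfies both thresholds."""
    return hit.evalue <= MAX_EVALUE and hit.identity >= MIN_IDENTITY


def core_genes(hits: Iterable[Hit], genomes: Iterable[str]) -> set[str]:
    """Genes with a qualifying hit in every listed genome."""
    required = set(genomes)
    found: dict[str, set[str]] = {}
    for hit in hits:
        if passes(hit):
            found.setdefault(hit.query_gene, set()).add(hit.subject_genome)
    return {gene for gene, seen in found.items() if seen >= required}


# Toy data (hypothetical): only geneA has qualifying hits in both genomes.
toy_hits = [
    Hit("geneA", "g1", 1e-50, 95.0),
    Hit("geneA", "g2", 1e-40, 90.0),
    Hit("geneB", "g1", 1e-3, 80.0),   # fails the e-value cut-off
    Hit("geneB", "g2", 1e-20, 85.0),
    Hit("geneC", "g1", 1e-10, 25.0),  # fails the identity cut-off
]
toy_core = core_genes(toy_hits, ["g1", "g2"])

# Share of unique genes that encode hypothetical proteins, from the text.
hypothetical_share = round(100 * 96 / 176, 1)
```

Under the same logic, genes with no qualifying hit in any other genome would be candidates for the strain-specific set.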
In common with some other E. meliloti strains, strain Mlalz-1 contains several genes encoding phage components. The PHASTER algorithm [44] was used to identify two resident prophages, present on scaffold A3CADRAFT_scaffold_4.5: one that was incomplete (Prophage Region 1) and one that was intact (Prophage Region 2) (Fig. 4). The proteins encoded by Prophage Region 1 (11.4 kb) and Prophage Region 2 (55 kb) were most closely related to the phage proteins of PHAGE_Mycoba_Catalina_NC_031238 and PHAGE_Sinorh_phiLM21_NC_029046, respectively.
Strain Mlalz-1 would appear to be typical of Ensifer strains that nodulate Medicago species since the nodEF, nodL and nodHPQ genes that are required for these specific decorations of the Nod factor are present in the genome. E. meliloti Mlalz-1 also possesses the three nodD genes that mediate host-specific activation of nodABC in the symbiotic interactions of E. meliloti with Medicago [60].

Conclusions
E. meliloti Mlalz-1 is a rhizobial strain that is able to nodulate and fix nitrogen with the highly specific host M. laciniata. Although the 16S rRNA gene sequence divergence was insufficient to differentiate strain Mlalz-1 from E. meliloti, E. medicae or E. numidicus, a gANI value of 98.8% with the genome of E. meliloti 1021, compared with 87.9% with the genome of E. medicae WSM419, identifies strain Mlalz-1 as E. meliloti. Nodulation of M. laciniata has been shown to be dependent on the presence of a specific nodC allele, which is also present in the genome of E. meliloti Mlalz-1, based on a 98% sequence identity with the nodC of other M. laciniata-nodulating Ensifer strains [14]. However, strain Mlalz-1 is unique among sequenced E. meliloti strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. The second copy of the E. meliloti Mlalz-1 lpiA-acvB operon has highest sequence identity (>96%) with that of sequenced E. medicae strains, which implies horizontal gene transfer of this region from E. medicae.

Additional files
Additional file 1: