High-quality permanent draft genome sequence of the Bradyrhizobium elkanii type strain USDA 76T, isolated from Glycine max (L.) Merr

Bradyrhizobium elkanii USDA 76T (INSCD = ARAG00000000), the type strain for Bradyrhizobium elkanii, is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Glycine max (L.) Merr. grown in the USA. Because of its significance as a microsymbiont of this economically important legume, B. elkanii USDA 76T was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria sequencing project. Here the symbiotic abilities of B. elkanii USDA 76T are described, together with its genome sequence information and annotation. The 9,484,767 bp high-quality draft genome is arranged in 2 scaffolds of 25 contigs, containing 9060 protein-coding genes and 91 RNA-only encoding genes. The B. elkanii USDA 76T genome contains a low GC content region with symbiotic nod and fix genes, indicating the presence of a symbiotic island integration. A comparison of five B. elkanii genomes that formed a clique revealed that 356 of the 9060 protein-coding genes of USDA 76T were unique, including 22 genes of an intact resident prophage. A conserved set of 7556 genes was also identified for this species, including genes encoding a general secretion pathway as well as type II, III, IV and VI secretion system proteins. The type III secretion system has previously been characterized as a host determinant for Rj and/or rj soybean cultivars. Here we show that the USDA 76T genome contains genes encoding all the type III secretion system components, including a translocon complex protein NopX required for the introduction of effector proteins into host cells. While many bradyrhizobial strains are unable to nodulate the soybean cultivar Clark (rj1), USDA 76T was able to elicit nodules on Clark (rj1), although in reduced numbers, when plants were grown in Leonard jars containing sand or vermiculite. In these conditions, we postulate that the presence of NopX allows USDA 76T to introduce various effector molecules into this host to enable nodulation.

Electronic supplementary material
The online version of this article (doi:10.1186/s40793-017-0238-2) contains supplementary material, which is available to authorized users.


Introduction
Soybean (Glycine max (L.) Merr.) is the dominant and most important commercial legume crop species, yielding food oil and animal meal as well as nutritious vegetable protein [1][2][3]. The plant was first introduced into USA agriculture during the mid-18th century and was mainly used as a forage crop until the 1920s [4].
The development of new cultivars, along with technological advances in soybean processing and increased demand for soybean products, has led to major increases in production during the 20th century [4].
As with most papilionoid legumes, soybean engages in a symbiotic relationship with dinitrogen-fixing soil bacteria known as rhizobia and is able to obtain on average 50-60% of its required nitrogen through symbiotic nitrogen fixation [5]. A greater understanding of the symbiosis between soybean and its cognate rhizobia is of direct relevance for maintaining environmentally sustainable high crop yields, which significantly contributes to the Sustainable Development Goals adopted in September 2015 as part of the UN's development agenda 'Transforming our world: the 2030 Agenda for Sustainable Development' [6].
The soybean-nodulating bacteria, known as Rhizobium japonicum according to a 1929 classification scheme [7], were reclassified as Bradyrhizobium japonicum in 1982 because of several fundamental morphological and physiological differences from the genus Rhizobium [8]. The bacteria isolated from nodules of soybean had previously been shown to be phenotypically diverse, even though they were grouped together in the species Bradyrhizobium japonicum. One of the major methods that demonstrated this diversity was serology, which was used to classify individual isolates into 17 distinct serogroups [9]. This was accomplished by generating antisera to specific strains in the USDA collection in Beltsville and then using the sera to generate a serological scheme. One of the strains used to generate antisera was USDA 76 T , and all isolates that cross-reacted with the antiserum generated with this serotype strain were grouped in the 76 serogroup. The strain USDA 76 T deposited in the Beltsville collection was a re-isolate from a greenhouse-grown plant inoculated with USDA 74 in Maryland. In turn, USDA 74 was a re-isolate of USDA 8 from a plant passage field test in California in 1956. The original parent culture of USDA 76 T is USDA 8, which was isolated from soybean grown at the Arlington Farm, Virginia in 1915.
Differences among the soybean root nodule bacteria classified as B. japonicum were also demonstrated using molecular methods. Hollis et al. [10] reported the presence of three DNA homology groupings by analysis of 28 strains within the soybean rhizobia. Using this approach, nine of the 17 serogroups were assigned to three DNA homology groupings: group I, the closely related group Ia and the more divergent group II. Supporting evidence for these three groupings was obtained by Kuykendall et al. [11]. By sequence analysis of the 16S rRNA genes, each of the 17 serotype strains representing the serogroups was also placed into one of three closely related groups [12] that matched their separation by DNA homology. Since soybean strains could be distinguished phenotypically and by several approaches in molecular biology, Kuykendall et al. [13] proposed that DNA homology group II strains be separated from B. japonicum as the species Bradyrhizobium elkanii, with USDA 76 T as the type strain.
Because of these distinguishing characteristics and its significance as a microsymbiont of the economically important legume soybean, B. elkanii USDA 76 T was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria sequencing project [14,15]. Here we present a summary classification and a set of general features for B. elkanii strain USDA 76 T , together with a description of its genome sequence and annotation.

Classification and features
Bradyrhizobium elkanii USDA 76 T is a motile, nonsporulating, non-encapsulated, Gram-negative strain in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form has dimensions of approximately 0.5 μm in width and 1.0-2.0 μm in length (Fig. 1 Left and Center). It is relatively slow growing, forming colonies after 6-7 days when grown on ½ Lupin Agar (½ LA) [16], Modified Arabinose Gluconate (MAG) [17] and modified Yeast Mannitol Agar [18] at 28°C. Colonies on ½ LA are opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right).
Sequence divergence among the 16S rRNA genes of the 33 type strains within the genus Bradyrhizobium was limited and ranged from no differences in many cases to a similarity of 98% between B. elkanii USDA 76 T and B. neotropicale (Fig. 2) after accounting for 40 bp in gaps along the alignment length. Such high similarity values would question the reliability of defining species limits within the genus based on divergence of the 16S rRNA genes [19]. Bootstrap values for each of the nodes of the branches were low and none of the confidence values reached or exceeded 95%. Therefore, the placement of each of the taxa relative to the others in the tree is inconclusive.
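As an illustration of how a pairwise similarity value of this kind can be derived from two aligned sequences, the following minimal Python sketch computes percent identity over the columns in which neither sequence has a gap. This is only one common convention; the exact way gaps were handled in the analysis above is not stated in detail, so the gap treatment, the function name `percent_similarity` and the toy sequences are assumptions made for illustration.

```python
def percent_similarity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are skipped entirely (one common convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # exclude columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data)
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACGAA"
print(round(percent_similarity(toy_a, toy_b), 2))
```

In the toy example, one column is skipped because of the gap and one of the remaining nine columns differs, so the value printed is 8/9 expressed as a percentage.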
Genetic recombination resulting in a reticulate evolutionary history of the 16S rRNA gene may explain the low bootstrap values. Therefore, an analysis for recombination was done with the aligned 33 Bradyrhizobium 16S rRNA genes using the pairwise homoplasy index test [20]. By using this test, statistically significant evidence for recombination among the 33 16S rRNA genes was detected (P = 0.003). The detection of genetic recombination within the rrn loci of rhizobia is not unprecedented, since reticulate evolutionary histories of the 16S rRNA genes and of the Internal Transcribed Spacer between the 16S and 23S rRNA genes have been described before [21,22]. The 16S rRNA sequence of B. pachyrhizi was identical with those of the B. elkanii serogroup strains USDA 31, USDA 94 and USDA 130, which differed from B. elkanii USDA 76 T by one bp (99.999% similar). The most divergent 16S rRNA gene within B. elkanii was that of the serogroup strain USDA 46 (99.996% similar), while the greatest divergence among the soybean serogroup strains was that between USDA 46 and USDA 110, which were 98.4% similar.

Fig. 2 [...] [67]. Subsequently the alignment was manually inspected for errors and necessary corrections were made by using GeneDoc version 2.6.001 [68]. The outgroups Mesorhizobium loti LMG6125 T and M. ciceri UPM-Ca7 T were chosen because of the reported recombination events between the 16S rRNA genes of B. elkanii and Mesorhizobium [22]. Of the 1313 active sites of the alignment there were 40 gaps among the Bradyrhizobium taxa. The number of different base pairs among all the 35 aligned sequences (including the two Mesorhizobium species) was determined by using MEGA, version 5 [67] to generate a tree using the UPGMA algorithm. Bootstrap analysis [69] with 2000 permutations of the data set was used to determine support for each of the branches.
Since the divergence of the 16S rRNA genes of the genus Bradyrhizobium is narrow, with evidence for the presence of a history of genetic recombination, it may be necessary to more precisely establish their phylogeny by comparing their entire genomes rather than individual genes. Such an approach may provide more fundamental insight into the evolutionary history of this class of symbiotic bacteria as well as impacting potential changes in their current proposed taxonomy. Minimum Information about the Genome Sequence of USDA 76 T is provided in Table 1 and Additional file 1: Table S1.

Symbiotaxonomy
An investigation of the symbiotic properties of soybean began with the work of Brooks [23] in the late 19th century, when he observed that soybean grown in the fields of his experiment station in Massachusetts only nodulated when supplied with dust he had brought with him from Japan. This led to the theory that soybean-nodulating bacteria in the soils of the USA were imported from the Far East. Cotrell et al. [24] and Hopkins [25] reported supporting evidence that soybean in Kansas nodulated with soil taken from the Massachusetts Experiment station, or in Illinois with soil collected from fields with a history of soybean cultivation. However, several decades later it became evident that rhizobia that nodulated native American legumes within the genera Apios, Amphicarpa, Crotalaria, Desmodium, Lespedeza, Baptisia, Cassia, Genista and Wisteria also nodulated soybean [26][27][28]. With the exception of USDA 6 and USDA 38, which are from Japan, all the remaining soybean serotype strains were recovered from nodules of soybeans grown in the USA, including USDA 8 (the original parent of USDA 76 T ). Consequently, it is unclear whether these rhizobia obtained from nodules of USA-grown soybean originate from the Far East or are in fact native to the soils of America. Therefore, the possibility exists that USDA 76 T may be able to nodulate and form a symbiosis with a wide variety of legumes, but this has not been thoroughly investigated. Unfortunately, the communication that included the proposal of USDA 76 T as the type strain for B. elkanii did not include results of plant tests to describe its symbiotic range, but instead relied on distinction by phenotype and genotype [11]. An indication of the possible American origin of USDA 76 T is its reported effectiveness in symbiosis with the native Apios americana Medik. and use as an inoculum for this potential leguminous crop [29].
Further evidence for this theory is the ability of USDA 76 T to nodulate and fix nitrogen with the native American Amphicarpaea bracteata (L.) Fernald [30]. USDA 76 T effectively nodulates the promiscuous Vigna unguiculata (L.) Walp. (cowpea), but is unable to nodulate the tropical American legume Phaseolus lunatus L. (Lima bean), which forms nodules with various other strains of bradyrhizobia [31]. To our knowledge, the only other reported information is that USDA 74 (parent of USDA 76 T ) forms an effective symbiosis with Macroptilium atropurpureum (DC.) Urb. (Siratro) and Vigna unguiculata (L.) Walp. [32]. In soybean, the Rj(s) or rj(s) genetic loci have been identified as controlling the ability of compatible rhizobia to nodulate with a particular cultivar (reviewed by Hayashi et al. [33]). USDA 76 T is reported to form nodules (albeit in reduced numbers) on the cultivar Clark (rj1) and to nodulate and fix N 2 with the isogenic lines BARC-2 and BARC-3, harboring the Rj4 and rj4 alleles, respectively, when tested in Leonard jars with sterile vermiculite or sand [30]. The symbiotic characteristics of B. elkanii USDA 76 T on a range of selected hosts are summarized in Additional file 2: Table S2.

Genome project history
This organism was selected for sequencing at the U.S. Department of Energy funded Joint Genome Institute as part of the Genomic Encyclopedia of Bacteria and Archaea-Root Nodule Bacteria project [14,15]. The root nodule bacteria in this project were selected on the basis of environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance. In particular, strain USDA 76 T was chosen since it is a microsymbiont of the economically important legume soybean, but can also form symbioses with several legumes native to the USA. The USDA 76 T genome project is deposited in the Genomes Online Database [34] and a high-quality permanent draft genome sequence is deposited in IMG [35]. Sequencing, finishing and annotation were performed by the JGI [36] and a summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
After recovery from permanent storage, B. elkanii USDA 76 T was streaked onto MAG solid medium and grown at 28°C for 6 days to obtain well-grown, well-separated colonies. A single colony was then selected and used to inoculate 5 ml MAG broth. The culture was grown on a gyratory shaker (200 rpm) at 28°C for 6 days. Subsequently 1 ml was used to inoculate 50 ml MAG broth and grown on a gyratory shaker (200 rpm) at 28°C until an OD 600nm of 0.6 was reached. DNA was isolated from the cells according to van Berkum [17]. Final concentration of the DNA was set to 0.5 mg ml −1 . Culture identity was confirmed by partial sequence analysis of several housekeeping genes and the 16S rRNA gene using the prepared DNA as template for PCR.

Genome sequencing and assembly
The draft genome of B. elkanii USDA 76 T was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [37]. An Illumina short-insert paired-end library was constructed with an average insert size of 200 bp that, when sequenced, generated 312,796,730 reads. An Illumina long-insert paired-end library with an average insert size of 6505.78 +/− 3679.88 bp was also constructed that, when sequenced, generated 19,315,434 reads. The total amount of Illumina sequence data obtained was 34,177 Mbp. Library construction and sequence analysis were done at the JGI according to the protocols outlined on their website [38]. The first of two initial drafts, assembled with Allpaths version r38445 [39], contained 81 contigs in 17 scaffolds, and its consensus was subsequently computationally shredded into 10 Kbp overlapping fake reads (shreds). The second draft, assembled with Velvet, version 1.1.05 [40], resulted in consensus sequences that were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The data were assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from this second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths and both Velvet assemblies together with a subset of the Illumina CLIP paired-end reads were assembled using parallel Phrap, version 4.24 (High Performance Software, LLC). Potential errors in the assemblies were corrected by manual editing with Consed [41][42][43]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and sequence analysis of bridging PCR fragments with PacBio technology (Cliff Han, unpublished). Gaps were closed and the quality of the final sequence was
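The computational shredding step described above can be pictured with a minimal Python sketch that cuts a consensus sequence into fixed-length, overlapping fragments. The function name `shred`, the overlap value and the toy sequence are hypothetical; the overlap actually used in the JGI pipeline is not given in the text, so this is a sketch of the general idea and not the pipeline's implementation.

```python
def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length 'fake reads'.

    Assumption: consecutive shreds start (shred_len - overlap) bases apart and
    the final shred may be shorter than shred_len.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("need shred_len > 0 and 0 <= overlap < shred_len")
    step = shred_len - overlap
    shreds = []
    start = 0
    while start < len(consensus):
        shreds.append(consensus[start:start + shred_len])
        if start + shred_len >= len(consensus):
            break  # this shred already reaches the end of the consensus
        start += step
    return shreds


# Toy example (hypothetical values; the text uses 10 Kbp and 1.5 Kbp shreds)
toy_consensus = "ACGTACGTAC"
toy_shreds = shred(toy_consensus, shred_len=4, overlap=2)
print(toy_shreds)
```

Because neighbouring shreds share sequence, a downstream assembler can re-join them, which is what allows consensus sequences from different assemblers to be merged as if they were reads.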

Genome annotation
Genes were identified using Prodigal [44], followed by a round of manual curation using GenePRIMP [45] as part of the DOE-JGI genome annotation pipeline [46,47]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [48] was used to find tRNA genes. Ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [49]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and the RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [50]. Additional gene prediction analysis and manual functional annotation were done within the Integrated Microbial Genomes-Expert Review system [51] developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome properties
The genome of B. elkanii USDA 76 T is 9,484,767 nucleotides long with a GC content of 63.70% (Table 3) and has been assembled into two scaffolds. Of the 9151 genes identified, 9060 are protein-encoding and 91 are RNA-only encoding genes. The majority of the 9151 genes (73.28%) were assigned a putative function and the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.

Insights from the genome sequence
Scaffold 1.1 of B. elkanii USDA 76 T has a low GC content in the region spanning approximately positions 3,000,000-3,800,000, and the presence of symbiotic nod, nif and fix genes in this region indicates a symbiotic island integration (Fig. 3).
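A region of reduced GC content such as the one described above can be located by scanning a sequence with a sliding window. The following Python sketch is a minimal illustration of that idea, not the method used by IMG to produce Fig. 3; the function names, window size, step and threshold are all hypothetical choices, and the toy sequence is not real genome data.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def low_gc_windows(seq: str, window: int, step: int, threshold: float) -> list[int]:
    """Start positions (0-based) of windows whose GC fraction is below threshold."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window + 1, 1)
    return [
        start
        for start in range(0, last_start, step)
        if gc_fraction(seq[start:start + window]) < threshold
    ]


# Toy sequence (hypothetical): a GC-rich background with an AT-rich insert
toy_seq = "GCGCGCGC" + "ATATATAT" + "GCGCGCGC"
print(low_gc_windows(toy_seq, window=8, step=8, threshold=0.5))
```

For a real scaffold, the threshold would typically be chosen relative to the genome-wide GC content (63.70% for USDA 76 T ), and a much larger window would be used to smooth local fluctuations.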
Using the Phylogenetic Profiler tool within IMG, 356 genes were found to be unique to USDA 76 T in a [...] were annotated as encoding hypothetical proteins. Out of the remainder, a significant number were phage related. Using the PHASTER algorithm [54], 22 of these genes were found to be co-located genes of an intact resident prophage (Fig. 4). Using this algorithm, another incomplete phage gene set on the same scaffold was also identified.
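The logic behind identifying genes that are unique to one strain, or conserved across a set of strains, can be sketched with set operations once genes have been grouped into homologous families. The Python example below is only an illustration of that logic under stated assumptions: the function name `core_and_unique` and the family identifiers are hypothetical, and the IMG Phylogenetic Profiler itself works from homology cutoffs on the annotated genomes rather than from pre-computed sets like these.

```python
def core_and_unique(
    gene_families: dict[str, set[str]], focal: str
) -> tuple[set[str], set[str]]:
    """Return (core, unique) gene families.

    core   = families present in every strain
    unique = families present in the focal strain and in no other strain
    """
    if focal not in gene_families:
        raise KeyError(f"unknown strain: {focal}")
    others = [fams for name, fams in gene_families.items() if name != focal]
    core = set.intersection(*gene_families.values())
    unique = gene_families[focal].difference(*others)
    return core, unique


# Toy presence/absence data (hypothetical family identifiers)
toy_families = {
    "USDA 76": {"famA", "famB", "famC", "famX"},
    "USDA 94": {"famA", "famB", "famC"},
    "587": {"famA", "famB", "famD"},
}
toy_core, toy_unique = core_and_unique(toy_families, focal="USDA 76")
print(sorted(toy_core), sorted(toy_unique))
```

In this toy data set, famA and famB form the conserved set, while famX is the only family found exclusively in the focal strain.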

Extended insights
Using the Phylogenetic Profiler tool, 7556 genes were found to be conserved in five B. elkanii strains (587, CCBAU43297, CCBAU05737, USDA 76 T , USDA 94), including genes encoding a general secretion pathway and type II, III, IV and VI secretion system proteins. The Type III secretion system (T3SS) [55] can either promote or impair the establishment of symbiosis, depending on the legume host [56], and has been characterized as a host determinant for rj1, Rfg1, Rj2 and Rj4 soybean cultivars [33,57,58]. The dominant soybean genes Rj2 and Rj4 restrict nodulation with specific strains of Bradyrhizobium [33]. Most investigations of soybean host genes controlling the symbiosis have focused on the Rj4 soybean line that was originally identified by its inability to nodulate with USDA 61 (B. elkanii, serogroup 31) [59]. The predicted Rj4 thaumatin-like protein is thought to be involved in conferring resistance to Bradyrhizobium strains producing specific T3SS effector proteins [60]. However, USDA 76 T was reported to nodulate and form an effective nitrogen-fixing symbiosis with the isogenic lines BARC-2 (Rj4) and BARC-3 (rj4) [30,61], suggesting that this strain does not produce the interacting T3SS effector protein(s). Conversely, the recessive soybean gene rj1rj1 [62], encoding a putative truncated Nod factor receptor protein [63], restricts nodulation by many Bradyrhizobium and Ensifer strains, although specific strains of B. elkanii, including USDA 76 T , can form a limited number of nodules when tested with plants in Leonard jars containing sterilized vermiculite or sand [30,59,61]. USDA 76 T genes encoding components required for a functional T3SS were identified within the integrated symbiotic island (Figs. 5 and 6). 
Although the nopA and nopC genes were not annotated in the USDA 76 T genome, by using TBLASTN these genes were identified in the intergenic region between BraelDRAFT_3047 (sctD) and BraelDRAFT_3048 (hypothetical); they share 100% sequence similarity with nopA and nopC of the characterized Bradyrhizobium elkanii strain USDA 61 [57]. Although T3SS components can also be found in Bradyrhizobium strain USDA 110, this strain lacks the nopX gene encoding the translocon required to introduce effector molecules into host cells [56,64]. This is in contrast to the presence of nopX in USDA 76 T , which could extend its host range to otherwise incompatible hosts.

Fig. 3 Graphical map of the largest scaffold (9,116,505 bp) of USDA 76 T (a) showing the location of common nodulation genes within the symbiotic island of this strain (b). From bottom to the top of the scaffold map: genes on forward strand (color by COG categories as denoted by the IMG platform), genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew

Fig. 4 [...] [54]. Prophage maps are not drawn to scale. Reference locus tag for Prophage Region 1 is BraelDRAFT_5594 (terminase; ter); reference locus tag for Prophage Region 2 is BraelDRAFT_6751 (terminase; ter). Coat protein (coa), fiber protein (fib), phage-like protein (plp), portal protein (por), tail shaft protein (sha), and terminase (ter). All other genes encode hypothetical proteins

Fig. 6 Schematic representation of the components constituting the T3SS present in Bradyrhizobium elkanii USDA 76 T . The IMG product name is provided with the Yersinia Ysc-Yop T3SS ortholog shown in brackets. The relative secretion components were identified based on information provided by Galán et al. [55]

[...] under contract No. DE-AC02-05CH11231.
We gratefully acknowledge the funding received from the Curtin University Sustainability Policy Institute, and the funding received from Murdoch University Small Research Grants Scheme in 2016.