Genome sequence of Ensifer arboris strain LMG 14919T; a microsymbiont of the legume Prosopis chilensis growing in Kosti, Sudan


Ensifer arboris LMG 14919T is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a microsymbiont of several species of legume trees. LMG 14919T was isolated in 1987 from a nodule recovered from the roots of the tree Prosopis chilensis growing in Kosti, Sudan. LMG 14919T is highly effective at fixing nitrogen with P. chilensis (Chilean mesquite) and Acacia senegal (gum Arabic tree or gum acacia). LMG 14919T does not nodulate the tree Leucaena leucocephala, nor the herbaceous species Macroptilium atropurpureum, Trifolium pratense, Medicago sativa, Lotus corniculatus and Galega orientalis. Here we describe the features of E. arboris LMG 14919T, together with genome sequence information and its annotation. The 6,850,303 bp high-quality-draft genome is arranged into 7 scaffolds of 12 contigs containing 6,461 protein-coding genes and 84 RNA-only encoding genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Legume plants form nitrogen-fixing symbioses with root nodule bacteria, collectively called rhizobia. These legumes are particularly useful crop plants that do not require exogenous nitrogenous fertilizer to support growth in less fertile, nitrogen-deficient conditions. They include some of our staple food and feed plants such as beans, peas, soybeans, lentils, clover, peanuts and alfalfa, and are mostly annual crops. In many arid and savannah regions, leguminous trees represent a particularly valuable resource as they are often deep-rooted and drought resistant. They have been used traditionally in the Sahel region as sources of timber and fodder and for soil improvement [1]. Prosopis chilensis, also known as Chilean mesquite, is a tree native to South America that has many uses: its nutritious pods can be ground to produce flour and are also eaten by livestock, and its wood is used for construction and furniture. Chilean mesquite is also used for intercropping with other plants, for which it provides shelter and nutrients (leaf compost, nitrogen). Acacia senegal (recently renamed as Senegalia senegal) is a plant of particular importance in the production of gum arabic in the Sahel region and the Middle East. Its seeds are dried for human consumption, and its leaves and pods serve as feed for sheep, goats and camels. The plant is also used in agroforestry in intercropping with watermelon and grasses, and in rotation systems with other crops (Agroforestree Database [2]).

The microsymbiont of these legume trees from Sudan and Kenya [3] has been renamed as Ensifer arboris [4], of which LMG 14919T (= HAMBI 1552, ORS 1755, TTR38) is the type strain. This strain was isolated from root nodules of Prosopis chilensis from Kosti, Sudan, and shown to effectively nodulate its original host as well as Acacia senegal [5].

Given the drought tolerance of the host trees, it seems fitting that their symbionts are also stress resistant: Ensifer arboris was described as tolerant to temperatures up to 41–43 °C, 3% NaCl, several heavy metals (including Pb, Cd, Hg, Cu) and a wide range of antibiotics [3,5], characteristics that contribute to the success of the rhizobial-legume tree association in challenging environmental conditions [6]. Here we present a summary classification and a set of features for E. arboris strain LMG 14919T (Table 1), together with the description of the complete genome sequence and its annotation.

Table 1. Classification and general features of Ensifer arboris LMG 14919T according to the MIGS recommendations [7]

Classification and features

E. arboris LMG 14919T is a motile, non-sporulating, non-encapsulated, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form varies in size with dimensions of approximately 0.25 µm in width and 1.0–1.5 µm in length (Figure 1, Left and Center). The strain is fast-growing, forming colonies within 3–4 days when grown on half strength Lupin Agar (½LA) [19], tryptone-yeast extract agar (TY) [20] or a modified yeast-mannitol agar (YMA) [21] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right).

Figure 1.

Images of Ensifer arboris LMG 14919T using scanning (Left) and transmission (Center) electron microscopy and the appearance of colony morphology on a solid medium (Right).

E. arboris LMG 14919T is capable of using several amino acids, including L-proline, L-arginine, sodium glutamate and L-histidine as sole nitrogen sources and can use a wide range of different carbon sources including L-arabinose, D-galactose, raffinose, L-rhamnose, maltose, lactose, D-fructose, D-mannose, trehalose, D-ribose, xylene, methyl-D-mannoside, sorbitol, dulcitol, meso-inositol, inulin, dextrin, amygdalin, arbutin, sodium citrate, itaconate, α-ketoglutarate, sodium maltose, 1,2-propylene glycol, and 1,2-butylene glycol [5].

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of E. arboris LMG 14919T in a 16S rRNA sequence based tree. This strain shares 99% (1361/1366 bp) and 99% (1361/1366 bp) sequence identity to the 16S rRNA of the fully sequenced E. meliloti Sm1021 [26] and E. medicae WSM419 [27] strains, respectively.
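The identity figures quoted above are simple ratios of matching to aligned positions. As a minimal sketch (the function name `percent_identity` is an illustrative assumption, not part of any tool used in this study), the reported 1361/1366 bp can be converted to a percentage as follows:

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Return pairwise sequence identity as a percentage."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


# Values reported in the text for the 16S rRNA comparison.
identity = percent_identity(1361, 1366)
print(f"{identity:.2f}%")  # prints 99.63%
```

Rounded to whole numbers this gives the 99% quoted in the text; the unrounded value shows that the two sequences differ at only 5 of 1,366 aligned positions.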

Figure 2.

Phylogenetic tree showing the relationship of Ensifer arboris LMG 14919T (shown in bold print) to other Ensifer spp. in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [22]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [23]. Bootstrap analysis [24] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [25]. Published genomes are indicated with an asterisk.


E. arboris LMG 14919T was initially shown to form nodules (Nod+) and fix nitrogen (Fix+) with two leguminous tree species, P. chilensis and A. senegal. It was unable to elicit nodules on the herbaceous perennials Macroptilium atropurpureum, Trifolium pratense, Medicago sativa, Lotus corniculatus and Galega orientalis [5]. The symbiotic properties of this strain in seedlings of Acacia and Prosopis spp. in Sudan and Senegal have been reported in detail [6]. Indeterminate nodules are induced, mainly on the lateral roots either in clusters or individually. Young nodules are spherical and later become elongated and are commonly branched. LMG 14919T (=HAMBI 1552) was shown to nodulate and fix nitrogen in seedlings of African A. mellifera, A. nilotica, A. oerfota (synonym A. nubica), A. senegal, A. seyal, A. sieberiana, A. tortilis subsp. raddiana, Latin American A. angustissima, P. chilensis and P. pallida, and Afro-Asian P. cineraria. It also effectively nodulates with Latin-American introductions of P. chilensis and P. juliflora in Africa [6]. It induced small ineffective nodules on Australian A. holosericea and African P. africana [6].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [25] and an improved-high-quality-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information for E. arboris LMG 14919T.

Growth conditions and DNA isolation

E. arboris LMG 14919T was cultured to mid logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [28]. DNA was isolated from the cells using a CTAB (Cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [29].

Genome sequencing and assembly

The genome of Ensifer arboris LMG 14919T was sequenced at the Joint Genome Institute (JGI) using Illumina technology [30]. An Illumina short-insert paired-end library with an average insert size of 270 bp generated 19,256,666 reads and an Illumina long-insert paired-end library with an average insert size of 9,232.94 ± 2,530.88 bp generated 1,365,298 reads, together totaling 3,093.3 Mbp of Illumina data. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user home.
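The read counts and total yield above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the read length is not stated in the source, so the value computed here is an implied average under the assumption that the 3,093.3 Mbp total covers both libraries.

```python
# Figures reported in the text.
SHORT_INSERT_READS = 19_256_666   # short-insert paired-end library
LONG_INSERT_READS = 1_365_298     # long-insert paired-end library
TOTAL_YIELD_BP = 3_093.3e6        # 3,093.3 Mbp of Illumina data

total_reads = SHORT_INSERT_READS + LONG_INSERT_READS

# Implied mean read length (an inference, not a value stated in the source).
implied_read_length = TOTAL_YIELD_BP / total_reads

print(f"total reads: {total_reads:,}")
print(f"implied mean read length: {implied_read_length:.1f} bp")
```

The implied average of about 150 bp per read suggests the reported totals are internally consistent.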

The initial draft assembly contained 27 contigs in 9 scaffolds. The initial draft data was assembled with Allpaths, version r38445, and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data was also assembled with Velvet, version 1.1.05 [31], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The Illumina draft data was assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies and a subset of the Illumina CLIP paired-end reads were assembled using parallel phrap, version SPS 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected with manual editing in Consed [32–34]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished) and sequencing of bridging PCR fragments using Sanger technology (Cliff Han, unpublished). For the improved high-quality draft, one round of manual/wet lab finishing was completed. A total of 46 additional sequencing reactions were completed to close gaps and to raise the quality of the final sequence. The estimated total size of the genome is 6.9 Mbp and the final assembly is based on 3,093.3 Mbp of Illumina draft data, which provides an average of 448× coverage of the genome.
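Two steps in this paragraph can be illustrated with a short sketch: shredding a consensus sequence into overlapping fake reads, and the coverage arithmetic. This is not the JGI pipeline code. The function names are hypothetical, and the amount of overlap between shreds is not given in the source, so it is left as a parameter here.

```python
from typing import List


def shred(consensus: str, shred_length: int, overlap: int) -> List[str]:
    """Cut a consensus sequence into overlapping fixed-length pieces.

    The overlap is an assumed parameter; the source states only the
    shred lengths (10 Kbp and 1.5 Kbp).
    """
    if not 0 <= overlap < shred_length:
        raise ValueError("overlap must be >= 0 and smaller than shred_length")
    step = shred_length - overlap
    shreds = []
    for start in range(0, len(consensus), step):
        shreds.append(consensus[start:start + shred_length])
        if start + shred_length >= len(consensus):
            break  # the last shred reached the end of the consensus
    return shreds


def mean_coverage(total_data_mbp: float, genome_size_mbp: float) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_data_mbp / genome_size_mbp


# Toy example: a 20 bp "consensus" cut into 8 bp shreds overlapping by 4 bp.
toy_shreds = shred("ACGTACGTACGTACGTACGT", shred_length=8, overlap=4)
print(len(toy_shreds), toy_shreds[0])

# Coverage from the figures in the text: 3,093.3 Mbp over an estimated 6.9 Mbp.
print(f"{mean_coverage(3093.3, 6.9):.0f}x")  # prints 448x
```

Because consecutive shreds share sequence, a downstream assembler such as phrap can merge them back together while also combining them with reads from the other assemblies.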

Genome annotation

Genes were identified using Prodigal [35] as part of the DOE-JGI annotation pipeline [36], followed by a round of manual curation using the JGI GenePRIMP pipeline [37]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-protein coding genes and miscellaneous features were predicted using tRNAscan-SE [38], RNAmmer [39], searches against models of the ribosomal RNA genes built from SILVA [40], Rfam [41], TMHMM [42], and SignalP [43]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [44].
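The translation step mentioned above (predicted CDSs converted to protein sequences before database searches) can be sketched in a few lines. This is an illustrative simplification, not the JGI pipeline: it uses the standard codon assignments, the names `CODON_TABLE` and `translate_cds` are hypothetical, and it does not handle the alternative start codons (such as GTG or TTG) that bacterial gene callers treat as methionine at initiation.

```python
from itertools import product

# Standard codon table built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases are reported as 'X'.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))  # prints MA
```

The resulting protein strings are what would be passed to similarity searches against resources such as Pfam or COG.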

Genome properties

The genome is 6,850,303 nucleotides long with 62.02% GC content (Table 3) and comprises 7 scaffolds (Figure 3) of 12 contigs. From a total of 6,545 genes, 6,461 were protein encoding and 84 were RNA-only encoding genes. The majority of genes (80.78%) were assigned a putative function whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.
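The gene tallies above can be checked with simple arithmetic. In the sketch below, the absolute number of genes with a putative function is not given in the text; it is derived from the reported 80.78% under the assumption that the percentage is taken over all 6,545 genes.

```python
# Counts reported in the text.
PROTEIN_CODING = 6_461
RNA_ONLY = 84
TOTAL_GENES = 6_545
FUNCTION_ASSIGNED_PCT = 80.78  # assumed to be a share of all genes

# The two gene classes should add up to the reported total.
genes_sum = PROTEIN_CODING + RNA_ONLY

# Share of genes that are protein coding.
protein_coding_pct = 100.0 * PROTEIN_CODING / TOTAL_GENES

# Approximate number of genes with a putative function (derived, not reported).
approx_with_function = round(TOTAL_GENES * FUNCTION_ASSIGNED_PCT / 100.0)

print(genes_sum == TOTAL_GENES)
print(f"{protein_coding_pct:.2f}% protein coding")
print(f"~{approx_with_function} genes with a putative function")
```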

Figure 3.

Graphical map of the genome of Ensifer arboris LMG 14919T showing the seven largest scaffolds. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Table 3. Genome Statistics for Ensifer arboris LMG 14919T
Table 4. Number of protein coding genes of Ensifer arboris LMG 14919T associated with the general COG functional categories.


  1. Deans JD, Diagne O, Nizinski J, Lindley DK, Seck M, Ingleby K, Munro RC. Comparative growth, biomass production, nutrient use and soil amelioration by nitrogen-fixing tree species in semi-arid Senegal. For Ecol Manage 2003; 176:253–264.

  2. Agroforestree Database.

  3. Nick G, de Lajudie P, Eardly BD, Suomalainen S, Paulin L, Zhang X, Gillis M, Lindstrom K. Sinorhizobium arboris sp. nov. and Sinorhizobium kostiense sp. nov., isolated from leguminous trees in Sudan and Kenya. Int J Syst Bacteriol 1999; 49:1359–1368.

  4. Young JM. The genus name Ensifer Casida 1982 takes priority over Sinorhizobium Chen et al. 1988, and Sinorhizobium morelense Wang et al. 2002 is a later synonym of Ensifer adhaerens Casida 1982. Is the combination “Sinorhizobium adhaerens” (Casida 1982) Willems et al. 2003 legitimate? Request for an Opinion. Int J Syst Evol Microbiol 2003; 53:2107–2110.

  5. Zhang X, Harper R, Karsisto M, Lindstrom K. Diversity of Rhizobium bacteria isolated from the root nodules of leguminous trees. Int J Syst Evol Microbiol 1991; 41:104–113.

  6. Räsänen LA, Lindström K. Effects of biotic and abiotic constraints on the symbiosis between rhizobia and the tropical leguminous trees Acacia and Prosopis. Indian J Exp Biol 2003; 41:1142–1159.

  7. Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen M, Angiuoli SV, et al. Towards a richer description of our complete collection of genomes and metagenomes “Minimum Information about a Genome Sequence” (MIGS) specification. Nat Biotechnol 2008; 26:541–547.

  8. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 1990; 87:4576–4579.

  9. Garrity GM, Bell JA, Lilburn T. Phylum XIV. Proteobacteria phyl. nov. In: Garrity GM, Brenner DJ, Krieg NR, Staley JT (eds), Bergey’s Manual of Systematic Bacteriology, Second Edition, Volume 2, Part B, Springer, New York, 2005, p. 1.

  10. Garrity GM, Bell JA, Lilburn T. Class I. Alphaproteobacteria class. nov. In: Garrity GM, Brenner DJ, Krieg NR, Staley JT (eds), Bergey’s Manual of Systematic Bacteriology, Second Edition, Volume 2, Part C, Springer, New York, 2005, p. 1.

  11. Validation List No. 107. List of new names and new combinations previously effectively, but not validly, published. Int J Syst Evol Microbiol 2006; 56:1–6.

  12. Kuykendall LD. Order VI. Rhizobiales ord. nov. In: Garrity GM, Brenner DJ, Krieg NR, Staley JT, editors. Bergey’s Manual of Systematic Bacteriology. Second ed: New York: Springer-Verlag; 2005. p 324.

  13. Skerman VBD, McGowan V, Sneath PHA. Approved Lists of Bacterial Names. Int J Syst Bacteriol 1980; 30:225–420.

  14. Conn HJ. Taxonomic relationships of certain non-sporeforming rods in soil. J Bacteriol 1938; 36:320–321.

  15. Casida LE. Ensifer adhaerens gen. nov., sp. nov.: a bacterial predator of bacteria in soil. Int J Syst Bacteriol 1982; 32:339–345.

  16. Judicial Commission of the International Committee on Systematics of Prokaryotes. The genus name Sinorhizobium Chen et al. 1988 is a later synonym of Ensifer Casida 1982 and is not conserved over the latter genus name, and the species name ‘Sinorhizobium adhaerens’ is not validly published. Opinion 84. Int J Syst Evol Microbiol 2008; 58:1973.

  17. Agents B. Technical rules for biological agents. TRBA (

  18. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25:25–29.

  19. Howieson JG, Ewing MA, D’antuono MF. Selection for acid tolerance in Rhizobium meliloti. Plant Soil 1988; 105:179–188.

  20. Beringer JE. R factor transfer in Rhizobium leguminosarum. J Gen Microbiol 1974; 84:188–198.

  21. Terpolilli JJ. Why are the symbioses between some genotypes of Sinorhizobium and Medicago suboptimal for N2 fixation? Perth: Murdoch University; 2009. 223 p.

  22. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol 2011; 28:2731–2739.

  23. Nei M, Kumar S. Molecular Evolution and Phylogenetics. New York: Oxford University Press; 2000.

  24. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 1985; 39:783–791.

  25. Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2008; 36:D475–D479.

  26. Galibert F, Finan TM, Long SR, Puhler A, Abola P, Ampe F, Barloy-Hubler F, Barnett MJ, Becker A, Boistard P, et al. The composite genome of the legume symbiont Sinorhizobium meliloti. Science 2001; 293:668–672.

  27. Reeve W, Chain P, O’Hara G, Ardley J, Nandesena K, Brau L, Tiwari R, Malfatti S, Kiss H, Lapidus A, et al. Complete genome sequence of the Medicago microsymbiont Ensifer (Sinorhizobium) medicae strain WSM419. Stand Genomic Sci 2010; 2:77–86.

  28. Reeve WG, Tiwari RP, Worsley PS, Dilworth MJ, Glenn AR, Howieson JG. Constructs for insertional mutagenesis, transcriptional signal localization and gene regulation studies in root nodule and other bacteria. Microbiology 1999; 145:1307–1316.

  29. DOE Joint Genome Institute.

  30. Bennett S. Solexa Ltd. Pharmacogenomics 2004; 5:433–438.

  31. Zerbino DR. Using the Velvet de novo assembler for short-read sequencing technologies. Current Protocols in Bioinformatics 2010; Chapter 11:Unit 11.5.

  32. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998; 8:186–194.

  33. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998; 8:175–185.

  34. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res 1998; 8:195–202.

  35. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010; 11:119.

  36. Mavromatis K, Ivanova NN, Chen IM, Szeto E, Markowitz VM, Kyrpides NC. The DOE-JGI Standard operating procedure for the annotations of microbial genomes. Stand Genomic Sci 2009; 1:63–67.

  37. Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A, Kyrpides NC. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 2010; 7:455–457.

  38. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997; 25:955–964.

  39. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007; 35:3100–3108.

  40. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 2007; 35:7188–7196.

  41. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res 2003; 31:439–441.

  42. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001; 305:567–580.

  43. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004; 340:783–795.

  44. Markowitz VM, Mavromatis K, Ivanova NN, Chen IM, Chu K, Kyrpides NC. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 2009; 25:2271–2278.


This work was performed under the auspices of the US Department of Energy’s Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract No. DE-AC02-06NA25396.

Author information



Corresponding author

Correspondence to Wayne Reeve.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Reeve, W., Tian, R., Bräu, L. et al. Genome sequence of Ensifer arboris strain LMG 14919T; a microsymbiont of the legume Prosopis chilensis growing in Kosti, Sudan. Stand in Genomic Sci 9, 473–483 (2014).


  • root-nodule bacteria
  • nitrogen fixation
  • rhizobia
  • Alphaproteobacteria