Non contiguous-finished genome sequence and description of Microbacterium gorillae sp. nov.

Strain G3T (CSUR P207 = DSM 26203) was isolated from the fecal sample of a wild gorilla (Gorilla gorilla subsp. gorilla) from Cameroon. It is a Gram-positive, facultatively anaerobic short rod. This strain exhibits a 16S rRNA sequence similarity of 98.2 % with Microbacterium thalassium, the closest validly published Microbacterium species and member of the family Microbacteriaceae. Moreover, it shows a low MALDI-TOF-MS score (1.1 to 1.3) that does not allow any identification. Thus, it is likely that this strain represents a new species. Here we describe the phenotypic features of this organism, together with the complete genome sequence and annotation. The 3,692,770 bp long genome (one chromosome but no plasmid) contains 3,505 protein-coding and 61 RNA genes, including 4 rRNA genes. In addition, digital DNA-DNA hybridization values for the genome of strain G3T against the closest Microbacterium genomes range from 19.7 to 20.5, again supporting its status as a new species. On the basis of these polyphasic data, consisting of phenotypic and genomic analyses, we propose the creation of Microbacterium gorillae sp. nov. that contains the strain G3T. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0152-z) contains supplementary material, which is available to authorized users.


Introduction
Strain G3T (= CSUR P207 = DSM 26203) is the type strain of Microbacterium gorillae sp. nov. This bacterium is a Gram-positive, non-spore-forming, indole-negative, facultatively anaerobic, rod-shaped bacterium. It was isolated from the feces of a western lowland gorilla in Cameroon as part of a culturomics study to describe the bacterial communities of the gorilla gut [1]. By applying a large variety of culture conditions, culturomics previously allowed the isolation of numerous new bacterial species from gorilla fecal samples [1].
In this report, we present a summary classification and the phenotypic features of M. gorillae sp. nov. strain G3T, together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species M. gorillae [8].

Classification and features
The fecal sample collection and conservation have been described previously [1]. Strain G3T (Table 1) was isolated in January 2012 as part of a culturomics study [1] by cultivation on Columbia agar supplemented with sheep blood (BioMérieux, Craponne, France).
When compared to sequences available in GenBank, the 16S rRNA gene sequence of M. gorillae strain G3T (GenBank accession number JX650056) exhibited an identity of 98.2 % with Microbacterium thalassium, the closest validly published Microbacterium species. This value was equal to the 16S rRNA gene sequence identity threshold recommended by Meier-Kolthoff et al. for the class Actinobacteria to delineate a new species without carrying out DNA-DNA hybridization, with a maximum error probability of 0.1 % [9]. Figure 1 presents the 16S rRNA-based tree for strain G3T and other Microbacterium species.
Different growth temperatures (20, 25, 30, 37 and 45 °C) were tested. Growth occurred between 25 °C and 37 °C, and optimal growth was observed at 25 °C, 24 h after inoculation. No growth occurred at 20 or 45 °C. Colonies were 0.8 mm in diameter and gray on Columbia agar supplemented with sheep blood. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and under aerobic conditions, with or without 5 % CO2. Growth was achieved under aerobic (with and without CO2), microaerophilic and anaerobic conditions. Gram staining showed Gram-positive short bacilli (Fig. 2, left panel). A motility test with API M medium (BioMérieux) produced a negative result. Cells grown on agar do not sporulate, and the rods have a mean length of 1 μm and a mean width of 0.5 μm. Both the length and the width were determined by negative staining transmission electron microscopy (Fig. 2, right panel).
When compared to other Microbacterium species [10–16], M. gorillae sp. nov. strain G3T exhibited the phenotypic differences detailed in Additional file 1: Table S1.
Table 1 footnote (evidence codes): NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Extended feature descriptions
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [17] using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made for strain G3T from 12 isolated colonies. Two microliters of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50 % acetonitrile and 2.5 % trifluoroacetic acid) were distributed on each smear and allowed to air-dry for five minutes. Then, the spectra from the 12 different colonies were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against 5,626 bacterial spectra, including 43 spectra from 33 Microbacterium species, used as reference data in the BioTyper database. Briefly, a score ≥ 2 with a species with a validly published name allows identification at the species level; a score ≥ 1.7 but < 2 allows identification at the genus level; and a score < 1.7 does not allow any identification. For strain G3T, no score reached these thresholds, suggesting that our isolate was not a member of any known species. We incremented our database with the spectrum from strain G3T (Additional file 2: Figure S1). The gel view highlighted spectrum differences with other Microbacterium species (Additional file 3: Figure S2).
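The score interpretation rules given above can be written as a short function. This is only an illustrative sketch: the function name and the returned labels are our own choices and are not part of the BioTyper software; only the cut-offs (2 and 1.7) and the score range reported for strain G3T (1.1 to 1.3) come from the text.

```python
def interpret_biotyper_score(score: float) -> str:
    """Return the identification level allowed by a MALDI-TOF score.

    Cut-offs follow the rules stated in the text; the labels are illustrative.
    """
    if score >= 2.0:
        return "species"  # score >= 2: identification at the species level
    if score >= 1.7:
        return "genus"    # 1.7 <= score < 2: identification at the genus level
    return "none"         # score < 1.7: no identification


# Score range reported in the abstract for strain G3T.
g3t_scores = (1.1, 1.3)
g3t_levels = {interpret_biotyper_score(s) for s in g3t_scores}
print(g3t_levels)
```

Both ends of the reported range fall below 1.7, so neither allows identification at the genus or species level.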

Fig. 1
Phylogenetic tree highlighting the position of Microbacterium gorillae strain G3T relative to other type strains within the Microbacterium genus using the 16S rRNA gene. GenBank accession numbers are indicated in parentheses. Sequences were aligned using MUSCLE. Alignments were then cleaned from highly divergent blocks using Gblocks version 0.91b [38]. The maximum likelihood (ML) phylogenetic tree was generated using RAxML [39], employing the GTR GAMMA substitution model with 500 bootstraps. Numbers at the nodes are percentages of bootstrap values obtained by repeating the analysis 500 times to generate a majority consensus tree. Corynebacterium diphtheriae was used as outgroup. The scale bar represents a rate of substitution per nucleotide position of 0.02. (T) indicates that the sequence used in the tree is from the type strain of the species. * indicates that the strain used in the tree has a sequenced genome. # indicates that a sequenced genome is available for this species but not for the strain used to build the tree.

Genome sequencing information
Genome project history
On the basis of its phenotypic characteristics, its MALDI-TOF result and its low 16S rRNA similarity to other members of the genus Microbacterium, the strain is likely to represent a new species and was therefore chosen for genome sequencing. It was the 20th genome of a Microbacterium species (Genomes Online Database) and the first genome of Microbacterium gorillae sp. nov. The genome is deposited in GenBank under accession number CDAR00000000 and consists of 14 contigs. Table 2 shows the project information and its association with MIGS version 2.0 compliance [18].
Bacteria grown on four Petri dishes were resuspended in 3 × 500 μl of TE buffer and stored at −80 °C. Then, 500 μl of this suspension were thawed, centrifuged for 3 min at 10,000 rpm and resuspended in 3 × 100 μl of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the Fastprep-24 device (Sample Preparation system, MP Biomedicals, USA) using 2 × 20 s cycles. DNA was then treated with 2.5 μg/μl lysozyme (30 min at 37 °C) and extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using the QIAamp kit (Qiagen). The concentration was measured with the Quant-iT PicoGreen kit (Invitrogen) on the Genios Tecan fluorometer at 50 ng/μl.

Genome sequencing and assembly
Genomic DNA (gDNA) of M. gorillae was sequenced on the MiSeq Technology (Illumina Inc, San Diego, CA, USA) using two applications: paired-end and mate-pair. The gDNA was barcoded in order to be mixed with 11 other projects with the Nextera Mate Pair sample prep kit (Illumina) and with 17 other projects with the Nextera XT DNA sample prep kit (Illumina). gDNA was quantified by a Qubit assay with the high sensitivity kit (Life Technologies, Carlsbad, CA, USA) at 46.7 ng/μl. To prepare the paired-end library, the gDNA was diluted to provide the required 1 ng of each genome as input. The tagmentation step fragmented and tagged the DNA. Then limited-cycle PCR amplification (12 cycles) completed the tag adapters and introduced dual-index barcodes. After purification on AMPure XP beads (Beckman Coulter Inc, Fullerton, CA, USA), the libraries were normalized on specific beads according to the Nextera XT protocol (Illumina). Normalized libraries were pooled for sequencing on the MiSeq. The pooled single-strand library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were then performed. A total of 7.6 Gb of data was obtained from a cluster density of 931 K/mm2, with 82.8 % of clusters passing quality control filters (17,658,000 clusters). Within this run, the index representation for M. gorillae was determined to be 5.11 %. The 732,922 paired-end reads were trimmed and filtered with the Trimmomatic tool using the recommended parameters for Illumina sequence data [19]. Two mate-pair libraries were prepared with 1 and 1.5 μg of genomic DNA using the Nextera mate pair Illumina guide. The genomic DNA sample was simultaneously fragmented and tagged with a mate-pair junction adapter. The pattern of the fragmentation was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 labchip.
The DNA fragments ranged from 1 kb to 11 kb in size, with the majority of fragments at 8.8 and 9.4 kb. No size selection was performed, and 45 ng of tagmented fragments for the first library and 600 ng for the second library were circularized. The circularized DNA was mechanically sheared to small fragments, with the majority at 400 and 380 bp, on the Covaris device S2 in microtubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA) and the final library concentrations were measured at 0.65 and 0.59 nmol/l, respectively. The libraries were normalized at 2 nM and pooled. After a denaturation step and dilution at 15 pM, the pool of libraries was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were performed in a single 39-h run in 2 × 251 bp mode. The first library was loaded three times on a flow cell and the second once. Within these runs, the index representation for M. gorillae averaged 3.51 %. The 1,881,286 paired reads were filtered according to read quality. Together, the paired-end and mate-pair libraries yielded 2,614,208 paired reads, which were trimmed with Trimmomatic [19] and then assembled with the SPAdes software using the recommended options "--careful" and "-k 127" to fix the k-mer size to 127 [20]. The final assembly identified 14 scaffolds, generating a genome size of 3.69 Mb, which corresponds to a genome coverage of 213×.
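The read counts and coverage reported above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are ours, and the number of bases is derived from the stated 213-fold coverage and genome size rather than reported by the assembler.

```python
GENOME_SIZE_BP = 3_692_770      # genome length reported for strain G3T
PAIRED_END_READS = 732_922      # reads from the paired-end library
MATE_PAIR_READS = 1_881_286     # reads from the two mate-pair libraries
REPORTED_COVERAGE = 213         # fold coverage stated for the final assembly


def depth_of_coverage(total_bases: int, genome_size: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


# The two read sets add up to the combined read count given in the text.
total_reads = PAIRED_END_READS + MATE_PAIR_READS

# Bases needed to reach the reported coverage (derived, not reported).
implied_bases = REPORTED_COVERAGE * GENOME_SIZE_BP

print(total_reads)
print(implied_bases)
print(depth_of_coverage(implied_bases, GENOME_SIZE_BP))
```

The sum reproduces the 2,614,208 reads stated in the text, and a coverage of 213-fold on a 3.69 Mb genome corresponds to roughly 0.79 Gb of sequence retained after trimming.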

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [21] with default parameters, but the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [22] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScanSE tool [23] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [24] and BLASTn against the GenBank database. Lipoprotein signal peptides and the number of transmembrane helices were predicted using SignalP [25] and TMHMM [26], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [27] was used for data management and DNA Plotter [28] for visualization of genomic features. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [29]. To estimate the mean level of nucleotide sequence similarity at the genome level between M. gorillae sp. nov. strain G3T and other members of the genus Microbacterium, we used the in-house MAGI software to calculate the average genomic identity of gene sequences (AGIOS) among compared genomes [30]. Briefly, this software uses the Proteinortho software [31] to detect orthologous proteins in pairwise genomic comparisons, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. Finally, we used the Genome-to-Genome Distance Calculator (GGDC) web server (http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes and to replace wet-lab DNA-DNA hybridization (DDH) by a digital DDH (dDDH) [32,33]. GGDC 2.0 BLAST+ was chosen as the alignment method, and the recommended formula 2 was used to interpret the results.
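The AGIOS calculation described above (mean nucleotide identity over orthologous gene pairs aligned with the Needleman-Wunsch algorithm) can be sketched as follows. This is not the MAGI software: the scoring values (match +1, mismatch −1, gap −1), the choice of alignment length as the identity denominator, the function names and the toy sequences are all assumptions made for illustration.

```python
from statistics import mean


def needleman_wunsch_identity(a: str, b: str,
                              match: int = 1, mismatch: int = -1,
                              gap: int = -1) -> float:
    """Percent identity of one optimal global alignment of a and b.

    Scoring values are illustrative assumptions; identity is the number of
    matching columns divided by the total number of alignment columns.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and all columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                i, j, columns = i - 1, j - 1, columns + 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def agios(ortholog_pairs) -> float:
    """Mean percent identity over pairs of orthologous gene sequences."""
    return mean(needleman_wunsch_identity(x, y) for x, y in ortholog_pairs)


# Toy ortholog pairs (hypothetical sequences, not taken from any genome).
toy_pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
print(agios(toy_pairs))
```

In a real comparison the ortholog pairs would come from an orthology detection step such as Proteinortho, and the quadratic-memory matrix used here would be too slow for whole gene sets; the sketch only shows how a global alignment yields a per-pair identity that is then averaged.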

Genome properties
The genome of M. gorillae strain G3T is 3,692,770 bp long with a 69.3 % G+C content (Table 3, Fig. 3). Of the 3,566 predicted genes, 3,505 were protein-coding genes and 61 were RNA genes, including 4 complete rRNA operons (Additional file 4). A total of 2,412 genes (68.82 %) were assigned a putative function. A total of 6.33 % were identified as pseudogenes. The remaining genes were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Table 3. The distribution of genes into COGs functional categories is presented in Table 4 and Additional file 4. (Table 5). However, the distribution of genes into COG categories was similar
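The gene counts given in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable and function names are ours, and the choice of protein-coding genes as the denominator for the 68.82 % figure is an inference from the arithmetic, not a statement from the authors.

```python
PROTEIN_CODING_GENES = 3_505
RNA_GENES = 61
GENES_WITH_PUTATIVE_FUNCTION = 2_412


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Protein-coding and RNA genes add up to the total number of predicted genes.
total_genes = PROTEIN_CODING_GENES + RNA_GENES

# Share of genes with a putative function, relative to protein-coding genes.
function_share = percentage(GENES_WITH_PUTATIVE_FUNCTION, PROTEIN_CODING_GENES)

print(total_genes)
print(function_share)
```

The total matches the 3,566 predicted genes, and the reported 68.82 % is reproduced when the 2,412 genes are expressed as a fraction of the 3,505 protein-coding genes.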

Conclusions
On the basis of phenotypic characteristics, phylogenetic position, genomic analyses (taxonogenomics) and GGDC results, we formally propose the creation of Microbacterium gorillae sp. nov. that contains the strain G3T. This strain was isolated from a gorilla stool sample collected in Cameroon.

Taxonomic and nomenclatural proposals
Description of Microbacterium gorillae sp. nov.
Microbacterium gorillae (go.ril'lae. NL neut. gen gorilla, pertaining to a gorilla from which the stool sample was obtained). Cells stain Gram-positive, are small rods, are non-endospore-forming and non-motile, and have a diameter of 0.5 μm and a length of 1 μm. Colonies are gray and 2 mm in diameter on blood-enriched Columbia agar. Growth occurs between 25 and 37 °C, with optimal growth observed at 25 °C.
The G+C content of the genome is 69.3 %. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JX650056 and CDAR00000000, respectively. The type strain G3T (= CSUR P207 = DSM 26203) was isolated from the fecal sample of a western lowland gorilla from Cameroon.