Draft genome sequence of Fermentimonas caenicola strain SIT8, isolated from the human gut

We report the properties of a draft genome sequence of the bacterium Fermentimonas caenicola strain SIT8 (= CSUR P1560). This strain was isolated from the fecal flora of a healthy 28-month-old Senegalese boy. Strain SIT8 is a facultatively anaerobic Gram-negative bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,824,451-bp long genome (1 chromosome but no plasmid) contains 2354 protein-coding and 46 RNA genes, including four rRNA genes. Electronic supplementary material: The online version of this article (10.1186/s40793-018-0310-6) contains supplementary material, which is available to authorized users.


Introduction
Fermentimonas caenicola strain SIT8 (= CSUR P1560) was isolated from the stool of a healthy 28-month-old Senegalese boy as part of a study aiming at cultivating all species within the human gastro-intestinal microbiota. It is a Gram-negative, facultatively anaerobic, indole-negative bacillus. Initially, we had named this bacterium "Lascolabacillus massiliensis" as it exhibited unique features among members of the family Porphyromonadaceae [1]. However, concurrently with our work, Hahnke et al. formally described the genus Fermentimonas in 2016 [2]. To date, this genus contains only one species, F. caenicola [2], the type strain of which, ING2-E5BT, exhibits a 100% 16S rRNA sequence identity with strain SIT8. As a consequence, strain SIT8 belongs to the species F. caenicola. Strain ING2-E5BT was isolated from a mesophilic laboratory-scale biogas reactor [2]. To the best of our knowledge, we report here the first isolation of F. caenicola from the fecal flora of a human being [3].
Herein, we present a set of features for F. caenicola strain SIT8 together with the description of the complete genomic sequence and annotation.

Classification and features
Fermentimonas caenicola strain SIT8 was isolated from the stool of a healthy 28-month-old Senegalese boy (Table 1). The patient's parents gave informed signed consent, and the agreement of the National Ethics Committee of Senegal and the ethics committee of the IFR48 (Marseille, France, agreement numbers 11-017 and 09-022) was obtained. Strain SIT8 was initially grown after 10 days of culture in a medium enriched with 5% sheep blood and sterile-filtered sheep rumen, in an aerobic atmosphere at 37°C. The bacterium was sub-cultured on 5% sheep blood-enriched Columbia agar (bioMérieux, Marcy l'Etoile, France) and grew in 24 h at 37°C in both aerobic and anaerobic conditions. Using our systematic matrix-assisted laser desorption/ionization time-of-flight screening on a MicroFlex spectrometer (Bruker Daltonics, Bremen, Germany) [4], strain SIT8 exhibited no significant score, suggesting that it was not a member of any known species (Fig. 1). We added the spectrum from strain SIT8 to our database (Fig. 1). Strain SIT8 exhibited a 100% 16S rRNA sequence identity with Fermentimonas caenicola strain ING2-E5BT (GenBank accession KP233810), the phylogenetically closest species with a validly published name (Fig. 2). The 16S rRNA sequence of strain SIT8 has been deposited in GenBank under number LN827535.
Growth at different temperatures (29, 37 and 55°C) was tested. Growth of the strain was tested on 5% sheep blood-enriched Columbia agar (bioMérieux) and Tryptic Soy agar (Becton-Dickinson, Le Pont-de-Claix, France) under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (bioMérieux), and under aerobic conditions, with or without 5% CO2. Growth was tested for salt tolerance, with 0-5, 50 and 100% (w/v) NaCl. The pH range for growth was tested at pH 6.5 and 8.5 using Tryptic Soy agar. Phenotypic tests were performed using API ZYM, API 20NE and API 50CH strips (bioMérieux). In vitro susceptibility to antibiotics was determined using the disk-diffusion method on 5% sheep blood-enriched Mueller-Hinton agar (bioMérieux).
For electron microscopy, Formvar-coated grids were deposited on a 40 μL drop of bacterial suspension and incubated at 37°C for 30 min, followed by a 10 s incubation on 1% ammonium molybdate. Grids were then observed using a Morgagni 268D transmission electron microscope (Philips) at an operating voltage of 60 kV.
Growth was assessed at different temperatures (29, 37 and 55°C), pH values and salinities. Growth was obtained at 29 and 37°C, with optimal growth at 37°C, at pH 6.5-8.5 and at NaCl concentrations of 0 to 5 g/L. Growth was observed in both aerobic and anaerobic conditions, with or without 5% CO2. Colonies were pale grey and 1.5 mm in diameter on 5% sheep blood-enriched Columbia agar (bioMérieux). A motility test was negative. Cells were Gram-negative, rod-shaped, polymorphic (Fig. 3), unable to form spores and exhibited a mean diameter of 0.35 μm (range 0.3-0.4 μm) and a mean length of 3.8 μm (range 1-8.8 μm) (Fig. 4).
Using an API 50 CH strip (bioMérieux), positive reactions were observed after 48 h of incubation for the fermentation of D-arabinose, D-galactose, D-glucose, D-mannose, N-acetylglucosamine, amygdalin, arbutin, salicin, D-cellobiose, D-maltose, D-lactose, D-trehalose, D-melezitose, starch, glycogen, gentiobiose, D-turanose, and potassium 5-ketogluconate. Negative reactions were observed for the fermentation of glycerol, erythritol, L-arabinose, D-ribose, D-xylose, L-xylose, D-adonitol, methyl-β-D-xylopyranoside, D-fructose, L-sorbose, L-rhamnose, dulcitol, inositol, D-mannitol, D-sorbitol, methyl-αD-xylopyranoside, methyl-αD-glucopyranoside, D-melibiose, D-saccharose, inulin, D-raffinose, xylitol, D-lyxose, D-tagatose, D-fucose, L-fucose, D-arabitol, L-arabitol, potassium gluconate, and potassium 2-ketogluconate.
[Table 1 footnote] ..., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [26]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
[Fig. 2 caption] GenBank accession numbers are indicated in parentheses. Sequences were aligned using MUSCLE, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA software [20]. Numbers at the nodes are percentages of bootstrap values obtained by repeating the analysis 1000 times to generate a majority consensus tree. Only values ≥ 70% were displayed. The scale bar indicates a 2% nucleotide sequence divergence.

Chemotaxonomic data
Cellular fatty acid methyl ester analysis was performed by GC/MS. Two samples were prepared with approximately 30 mg of bacterial biomass per tube harvested from several culture plates. Fatty acid methyl esters were prepared as described by Sasser [5]. GC/MS analyses were carried out as described before [6]. Briefly, fatty acid methyl esters were separated using an Elite 5-MS column and monitored by mass spectrometry (Clarus 500 - SQ 8 S, Perkin Elmer, Courtaboeuf, France). A spectral database search was performed using MS Search 2.0 operated with the Standard Reference Database 1A (NIST, Gaithersburg, USA) and the FAMEs mass spectral database (Wiley, Chichester, UK).

Genome project history
The strain was selected for sequencing on the basis of its 16S rRNA similarity, phylogenetic position, and phenotypic differences with the other members of the family Porphyromonadaceae, and is part of a culturomics study of the human microbiome. It is the second published genome from the F. caenicola species. Table 2 shows the project information and its association with MIGS version 2.0 compliance [7]. The GenBank accession number of the genome is CTEJ01000000. The genome consists of 2 scaffolds.

Growth conditions and DNA preparation
Strain SIT8 (CSUR P1560) was sub-cultured on 5% sheep blood-enriched Columbia agar (bioMérieux) and grew in 24 h at 37°C in an anaerobic atmosphere. Eight Petri dishes were harvested and resuspended in 4 × 100 μL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the Fastprep-24 device (MP Biomedicals, Santa Ana, California, USA) using 2 × 20-s cycles. DNA was then treated with 2.5 μg/μL lysozyme (30 min at 37°C) and extracted using the BioRobot EZ1 Advanced XL (Qiagen). DNA was then concentrated and purified with the QIAamp kit (Qiagen). DNA concentration was 70.7 ng/μL as determined by the Genios Tecan fluorometer, using the Quant-it Picogreen kit (Invitrogen).

Genome sequencing and assembly
The genomic DNA of F. caenicola strain SIT8 was sequenced on a MiSeq sequencer (Illumina Inc., San Diego, CA, USA) with the Mate-Pair strategy. The gDNA was barcoded in order to be mixed with 9 other projects with the Nextera Mate-Pair sample prep kit (Illumina). The gDNA was quantified by a Qubit assay with the high sensitivity kit (Life Technologies, Carlsbad, CA, USA) at 82.6 ng/μL. The Mate-Pair library was prepared with 1.5 μg of gDNA using the Nextera Mate-Pair Illumina guide. The gDNA was simultaneously fragmented and tagged with a Mate-Pair junction adapter. The fragmentation pattern was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA) with a DNA 7500 labchip. DNA fragments ranged in size from 1.5 kb up to 11 kb with an optimal size at 4.33 kb. No size selection was performed and 662 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments with an optimum at 1200 bp on the Covaris device S2 in T6 tubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies) and the final library concentration was measured at 61.4 nmol/L. The libraries were normalized at 2 nM and pooled. After a denaturation step and dilution at 15 pM, the pool of libraries was loaded. Automated cluster generation and sequencing were performed in a single 39-h run in a 2 × 251-bp format.
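The normalization and loading steps above follow the usual C1 × V1 = C2 × V2 dilution relationship. As a minimal sketch, the fold-dilutions implied by the reported concentrations can be computed as follows. The function names and the 10 μL final volume are illustrative assumptions and are not taken from the protocol.

```python
def dilution_factor(stock_conc: float, target_conc: float) -> float:
    """Fold-dilution needed to go from stock_conc to target_conc (same units)."""
    if stock_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    return stock_conc / target_conc


def stock_volume(stock_conc: float, target_conc: float, final_volume: float) -> float:
    """Volume of stock to bring up to final_volume, from C1*V1 = C2*V2."""
    return final_volume / dilution_factor(stock_conc, target_conc)


# Library measured at 61.4 nmol/L and normalized to 2 nM (values from the text).
normalization_fold = dilution_factor(61.4, 2.0)

# Pool at 2 nM (= 2000 pM) diluted to 15 pM for loading (values from the text).
loading_fold = dilution_factor(2000.0, 15.0)

# Hypothetical 10 uL final volume, chosen only for illustration.
volume_needed_ul = stock_volume(61.4, 2.0, 10.0)

print(f"normalization: {normalization_fold:.1f}-fold")
print(f"loading: {loading_fold:.1f}-fold")
print(f"stock needed for 10 uL at 2 nM: {volume_needed_ul:.3f} uL")
```

The helper rejects a target concentration above the stock concentration, since a dilution cannot concentrate a sample.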
A total of 7.84 Gb of sequence information was obtained from a cluster density of 884 K/mm2, with 92.7% of clusters passing quality control filters (15,478,025 passing-filter paired reads). Within this run, the index representation for F. caenicola strain SIT8 was determined to be 13.25%. The 2,050,529 paired reads were trimmed and then assembled into 2 scaffolds.
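The reported index representation can be cross-checked from the read counts given above. The sketch below assumes that "index representation" means the fraction of passing-filter paired reads assigned to the strain's barcode; the function name is hypothetical.

```python
PASSING_FILTER_PAIRS = 15_478_025  # passing-filter paired reads in the run (from the text)
STRAIN_PAIRS = 2_050_529           # paired reads for strain SIT8 (from the text)


def index_share_percent(sample_pairs: int, total_pairs: int) -> float:
    """Percentage of the run's passing-filter read pairs assigned to one sample."""
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    return 100.0 * sample_pairs / total_pairs


share = index_share_percent(STRAIN_PAIRS, PASSING_FILTER_PAIRS)
print(f"index representation: {share:.2f}%")  # prints "index representation: 13.25%"
```

Under that assumption, the two read counts are consistent with the 13.25% figure reported for the run.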

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal [8] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [9] and the Clusters of Orthologous Groups (COGs) databases [10] using BLASTP. The tRNAScan-SE tool [11] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [12] and BLASTn against the GenBank database. Signal peptides and numbers of transmembrane helices were predicted using SignalP [13] and TMHMM [14], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [15] was used for data management, and DNA Plotter [16] was used for visualization of genomic features. The Mauve alignment tool was used for multiple genomic sequence alignment [17]. To identify putative orthologues and estimate the pan/core-genome composition, comparative genomic analysis was carried out between the two F. caenicola strains SIT8 and ING2-E5BT using bidirectional best BLAST hits from the BLASTClust algorithm [18], and then specific genes were checked by tBLASTN. We estimated the mean level of nucleotide sequence similarity at the genome level using digital DNA-DNA hybridization and the genome-to-genome distance calculator Web server as previously reported [19].
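The length-dependent E-value cutoff described above can be written as a small helper. This is only a sketch of the threshold rule, not the annotation pipeline itself: the function names are hypothetical, and because the text does not say how an alignment of exactly 80 amino acids is treated, the sketch assumes it falls under the stricter 1e-05 cutoff.

```python
LENGTH_CUTOFF_AA = 80  # alignment length boundary from the text


def evalue_cutoff(alignment_length_aa: int) -> float:
    """Return the BLASTP E-value cutoff for a given alignment length.

    Longer than 80 aa: 1e-03; shorter than 80 aa: 1e-05 (both from the text).
    Exactly 80 aa is unspecified in the text; assumed here to use 1e-05.
    """
    if alignment_length_aa <= 0:
        raise ValueError("alignment length must be positive")
    return 1e-3 if alignment_length_aa > LENGTH_CUTOFF_AA else 1e-5


def below_cutoff(evalue: float, alignment_length_aa: int) -> bool:
    """True if the hit's E-value is lower than the length-dependent cutoff."""
    return evalue < evalue_cutoff(alignment_length_aa)


# The same E-value can fall on either side of the rule depending on length.
print(below_cutoff(1e-4, 120))  # long alignment, 1e-03 cutoff applies
print(below_cutoff(1e-4, 60))   # short alignment, 1e-05 cutoff applies
```

Keeping the cutoff in one function makes the boundary assumption explicit and easy to change.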

Genome properties
The genome of strain SIT8 is 2,824,451-bp long with a 37% G + C content (Table 3; Fig. 5). Of the 2400 predicted genes, 2354 are protein-coding genes, and 46 encode RNAs. Four rRNA genes (one 16S rRNA, one 23S rRNA and two 5S rRNA) and 42 predicted tRNA genes were identified in the genome. A total of 1668 genes (69.5%) were assigned a putative function. Twenty-eight genes were identified as ORFans (1.7%). The remaining genes were annotated as hypothetical proteins (732 genes, 30.5%). The properties and the statistics of the genome are summarized in Table 3.
The distribution of genes into COGs functional categories is presented in Table 4.
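The gene counts and percentages reported in this section can be checked with a few lines of arithmetic. The sketch below assumes that the percentages for functionally assigned and hypothetical genes are expressed relative to all 2400 predicted genes, which is the denominator that reproduces the reported values; the variable and function names are illustrative.

```python
# Counts taken from the text.
protein_coding = 2354
rna_genes = 46
rrna_genes = 4
trna_genes = 42
with_function = 1668
hypothetical = 732

total_genes = protein_coding + rna_genes


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Internal consistency of the counts.
assert total_genes == 2400
assert rrna_genes + trna_genes == rna_genes
assert with_function + hypothetical == total_genes

print(percent(with_function, total_genes))  # 69.5
print(percent(hypothetical, total_genes))   # 30.5
```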

Insights from the genome sequence
To date, one genome from the Fermentimonas genus has been published. Here, we compared the genome sequences of F. caenicola strains SIT8 (GenBank accession number CTEJ01000000) and ING2-E5BT (GenBank accession number NZ_LN515532).
The draft genome of strain SIT8 (2.87 Mb) has a larger size than that of strain ING2-E5BT (2.85 Mb). The G + C contents of strains SIT8 and ING2-E5BT are comparable (37% vs 37.3%, respectively). The gene content of strain   (Table 4). The genomic comparison identified a pangenome of 2681 genes and a core genome of 2096 genes. Strains SIT8 and ING2-E5BT harboured 273 and 312 specific genes, respectively. Functional annotation of the unique genes from strain SIT8 revealed that 48.35% were assigned to COG functional categories, versus 52.56% for strain ING2-E5BT (Additional file 1: Table S2). The COG functional classification of the specific genes from strain SIT8 showed that 10.62% play a role in cell wall and membrane biogenesis and 6.59% in inorganic ion transport and metabolism (Additional file 1: Table S2). In contrast, 16.99% of specific genes from strain ING2-E5BT are involved in replication, recombination and repair and 6.73% in carbohydrate transport and metabolism (Additional file 1: Table S2).
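For a two-genome comparison, the pangenome size should equal the core genome plus the genes specific to each strain, and the figures above satisfy this relationship. The sketch below checks it and converts the reported percentages back into approximate gene counts. Those counts are back-calculated for illustration and are not stated in the text; the names used are hypothetical.

```python
# Values from the text.
core_genes = 2096
pangenome_reported = 2681
specific_genes = {"SIT8": 273, "ING2-E5B": 312}

# Pangenome = core genome + strain-specific genes.
pangenome_computed = core_genes + sum(specific_genes.values())
assert pangenome_computed == pangenome_reported


def genes_from_percent(pct: float, total: int) -> int:
    """Approximate gene count corresponding to a reported percentage."""
    return round(pct * total / 100)


# Share of strain-specific genes assigned to COG categories (percentages from the text).
cog_assigned = {
    "SIT8": genes_from_percent(48.35, specific_genes["SIT8"]),
    "ING2-E5B": genes_from_percent(52.56, specific_genes["ING2-E5B"]),
}

print(pangenome_computed)
print(cog_assigned)
```

Each reported percentage maps to a near-integer count (for example, 48.35% of 273 is about 132 genes), which suggests the percentages were computed over the strain-specific gene sets.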

Conclusions
We describe the phenotypic, phylogenetic and genomic characteristics of Fermentimonas caenicola strain SIT8. This bacterial strain was isolated from a stool specimen of a healthy 28-month-old Senegalese boy. Strain SIT8 (= CSUR P1560) is the first F. caenicola strain isolated from humans.