Non-contiguous finished genome sequence and description of Bacillus jeddahensis sp. nov.

Strain JCE T was isolated from the fecal sample of a 24-year-old obese man living in Jeddah, Saudi Arabia. It is an aerobic, Gram-positive, rod-shaped bacterium. This strain exhibits a 16S rRNA nucleotide sequence similarity of 97.5 % with Bacillus niacini, the phylogenetically closest species with standing in nomenclature. Moreover, strain JCE T shows many phenotypic differences when compared with other Bacillus species, and a low MALDI-TOF mass spectrometry score that does not allow any identification. It is therefore likely that this strain represents a new species. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,762,944-bp genome (1 chromosome, no plasmid) contains 4,654 protein-coding and 98 RNA genes, including 92 tRNA genes. Strain JCE T differs from most other closely related Bacillus species by more than 1 % in G + C content. In addition, digital DNA-DNA hybridization values between the genome of strain JCE T and the closest Bacillus genomes range from 19.5 to 28.1 %, further confirming its status as a new species. On the basis of these polyphasic phenotypic and genomic data, we propose the creation of Bacillus jeddahensis sp. nov., with strain JCE T as the type strain. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0024-y) contains supplementary material, which is available to authorized users.

The genus Bacillus, described by Cohn [14] more than 140 years ago, currently includes 310 species names (296 validly and 14 not validly published) [15]. Species of this genus are Gram-positive or Gram-variable, mostly motile, spore-forming bacteria. Bacillus spp. are ubiquitous and have been isolated from various environmental sources, and some species can be involved in human infections [16].
Strain JCE T (= CSUR P732 = DSM 28281) is the type strain of Bacillus jeddahensis sp. nov. This bacterium is a Gram-positive, flagellated, facultatively anaerobic, indole-negative bacillus with rounded ends. It was isolated from the stool sample of a 24-year-old obese man living in Jeddah, Saudi Arabia, as part of a culturomics study aiming to cultivate the bacterial species present in human feces. By applying a large range of culture conditions, culturomics has previously enabled the isolation of many new bacterial species from human stool samples [17][18][19].
Here we present a summary classification and a set of features for B. jeddahensis sp. nov. strain JCE T together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species B. jeddahensis [20].
The specimen was preserved at −80 °C after collection and sent to Marseille. Strain JCE T (Table 1) was isolated in July 2013 by cultivation in a blood culture bottle (Becton Dickinson, Temse, Belgium) supplemented with rumen fluid and sheep blood. When compared against the NCBI database and the Ribosomal Database Project, this strain exhibited a 97.5 % 16S rRNA nucleotide sequence similarity with Bacillus niacini, the phylogenetically closest validly published Bacillus species (Fig. 1). This value equals the 16S rRNA gene sequence similarity threshold recommended by Meier-Kolthoff et al. for Firmicutes to delineate a new species without carrying out DNA-DNA hybridization, with a maximum error probability of 0.01 % [21].
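The 97.5 % figure is a percent identity computed from a pairwise alignment of two 16S rRNA sequences. The minimal Python sketch below shows one common way to compute it from an already aligned pair. It is not the NCBI BLAST or RDP procedure: the gap convention (columns where both sequences carry a gap are ignored, a gap facing a base counts as a mismatch) and the function name `percent_identity` are our own assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' = gap).

    Assumed convention: columns where both sequences have a gap are
    skipped; every other column counts towards the alignment length,
    so a gap facing a base is scored as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    length = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column absent from both sequences
        length += 1
        if a == b:
            matches += 1
    if length == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / length


if __name__ == "__main__":
    # Toy aligned fragments, not real 16S rRNA data.
    print(percent_identity("ACG-T", "ACGAT"))
```

Different tools count gaps and terminal overhangs differently, so identities computed with this sketch may not match database-reported values exactly.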
Different growth temperatures (28, 30, 37, 45 and 56 °C) were tested. Growth occurred between 28 and 45 °C, with optimal growth at 37 °C. Colonies grown on Columbia agar at 37 °C were smooth and grey, with a diameter of 0.4-0.5 mm. Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and under aerobic conditions, with or without 5 % CO2. Growth was achieved under aerobic (with and without CO2), microaerophilic and anaerobic conditions.
Gram staining showed Gram-positive bacilli (Fig. 2). A motility test was negative. Cells grown on agar sporulate, and the rods have a length ranging from 3.83 to 4.71 μm (mean 4.14 μm) and a diameter ranging from 0.75 to 0.95 μm (mean 0.87 μm), as determined by negative-staining transmission electron microscopy (Fig. 3). Strain JCE T exhibited oxidase activity but not catalase activity. Using the API 50 CH system (BioMérieux), positive reactions were observed for D-arabinose, L-arabinose, D-xylose, D-glucose, D-fructose, D-mannose, N-acetylglucosamine, esculin, D-maltose and D-trehalose, and a weak reaction for D-melezitose. Negative reactions were observed for the remaining carbohydrate tests (i.e., glycerol, erythritol, D-ribose, L-xylose, D-adonitol, methyl-β-D-xylopyranoside, D-galactose, L-sorbose, L-rhamnose, dulcitol, inositol, D-mannitol, D-sorbitol, methyl-α-D-mannopyranoside, methyl-α-D-glucopyranoside, amygdalin, arbutin, salicin, D-cellobiose, D-lactose, D-melibiose, D-saccharose, inulin, D-raffinose, starch, glycogen, xylitol, gentiobiose, D-turanose, D-lyxose, D-tagatose, D-fucose, L-fucose, D-arabitol, L-arabitol, potassium gluconate, potassium 2-ketogluconate and potassium 5-ketogluconate). Using API ZYM, positive reactions were observed for esterase (C 4), esterase lipase (C 8), acid phosphatase, naphthol-AS-BI-phosphohydrolase and β-glucosidase. Negative reactions were observed for alkaline phosphatase, lipase (C 14), leucine arylamidase, valine arylamidase, cystine arylamidase, trypsin, α-chymotrypsin, α-galactosidase, β-galactosidase, β-glucuronidase, α-glucosidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase. Using the API 20 NE system, nitrates were reduced to nitrites; urease, indole production, arginine dihydrolase and gelatin hydrolysis were negative; D-glucose, D-mannose, N-acetylglucosamine and D-maltose were assimilated, whereas L-arabinose, D-mannitol, potassium gluconate, capric acid, adipic acid, malic acid, trisodium citrate and phenylacetic acid were not.
B. jeddahensis is susceptible to imipenem, doxycycline, amoxicillin, amoxicillin-clavulanate and gentamicin, but resistant to metronidazole, trimethoprim/sulfamethoxazole, rifampicin, vancomycin, erythromycin, ceftriaxone, ciprofloxacin and benzylpenicillin.
Table 1 (excerpt): MIGS-4.4 Altitude: unknown.
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [49]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Fig. 1 Phylogenetic tree highlighting the position of Bacillus jeddahensis strain JCE T relative to other type strains within the genus Bacillus. GenBank accession numbers are indicated in parentheses. Sequences were aligned using MUSCLE, and phylogenetic inferences were obtained using the maximum-likelihood method and the Kimura 2-parameter model in MEGA 6 [50]. Numbers at the nodes are bootstrap percentages obtained by repeating the analysis 1,000 times to generate a majority consensus tree. Clostridium botulinum was used as the outgroup. The scale bar represents 0.01 substitutions per site. An asterisk (*) indicates that the strain used in the tree has a sequenced genome; # indicates that a sequenced genome is available for the species but not for the strain used to build the tree.
Fig. 2 Gram staining of B. jeddahensis strain JCE T
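The Fig. 1 legend names the Kimura 2-parameter (K2P) model. As an illustration of what that model measures, the sketch below computes the pairwise K2P distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition (A↔G, C↔T) and transversion differences between two aligned sequences. It is only an illustration: MEGA 6 uses the model inside maximum-likelihood tree inference rather than as a stand-alone distance, and skipping gaps and ambiguous bases site by site (pairwise deletion) is our assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or a non-ACGT symbol in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        sites += 1
        if a == b:
            continue
        # Transition: both bases are purines or both are pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

The logarithmic terms correct the raw proportion of differences for multiple substitutions at the same site, which is why K2P distances exceed the simple mismatch fraction.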
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [2] using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made for strain JCE T from 12 isolated colonies. The 12 JCE T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the 6,335 bacterial reference spectra of the BioTyper database, including 210 spectra from 110 Bacillus species. Scores were interpreted according to the criteria established by the manufacturer (Bruker Daltonics): a score ≥ 2 enabled identification at the species level, a score ≥ 1.7 but < 2 enabled identification at the genus level, and a score < 1.7 did not enable any identification. For strain JCE T, the scores ranged from 1.4 to 1.6, suggesting that our isolate was not a member of a known species. We added the spectrum of strain JCE T to our database (Fig. 4). Spectral differences from other Bacillus species are shown in Fig. 5.
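The score thresholds quoted above translate directly into a small decision function. The Python sketch below encodes them; the second helper, which interprets a set of replicate spectra by their best score, and both function names are our own illustrative assumptions rather than BioTyper behavior.

```python
from typing import Iterable


def interpret_biotyper_score(score: float) -> str:
    """Map a BioTyper score to an identification level (thresholds from the text)."""
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "no identification"


def interpret_replicates(scores: Iterable[float]) -> str:
    """Interpret replicate spectra by their highest score (an assumption of this sketch)."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    return interpret_biotyper_score(max(scores))
```

Applied to the 1.4-1.6 range reported for strain JCE T, every replicate falls below 1.7, so even the best spectrum yields "no identification".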

Genome project history
On the basis of the phenotypic characteristics of this strain and its low 16S rRNA similarity to other members of the genus Bacillus, the strain was likely to represent a new species, and it was therefore chosen for genome sequencing. It was the 348th genome of a Bacillus species (Genomes OnLine Database) and the first genome of Bacillus jeddahensis sp. nov. to be sequenced. The GenBank accession number is CCAS00000000, and the genome consists of 149 contigs. Table 2 summarizes the project information and its association with MIGS version 2.0 compliance [25].

Growth conditions and genomic DNA preparation
B. jeddahensis sp. nov. strain JCE T (CSUR P732, DSM 28281) was grown aerobically on 5 % sheep blood-enriched Columbia agar at 37 °C. Bacteria from four Petri dishes were collected, resuspended in 3 × 500 μl of TE buffer and stored at −80 °C. Then, 500 μl of this suspension was thawed, centrifuged for 3 min at 10,000 rpm and resuspended in 3 × 100 μl of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the FastPrep-24 device (Sample Preparation System, MP Biomedicals, USA) using 2 × 20 s cycles. DNA was then treated with 2.5 μg/μl lysozyme (30 min at 37 °C) and extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using the QIAamp kit (Qiagen). The DNA concentration, measured with the Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 50 ng/μl.

Genome sequencing and assembly
Genomic DNA of B. jeddahensis was sequenced with MiSeq technology (Illumina Inc, San Diego, CA, USA) using two applications: paired-end and mate-pair. The paired-end and mate-pair libraries were barcoded so that they could be pooled with 14 other genomic projects prepared with the Nextera XT DNA sample prep kit (Illumina) and 11 other projects prepared with the Nextera Mate Pair sample prep kit (Illumina), respectively. The genomic DNA was quantified at 16 ng/μl by a Qubit assay with the high-sensitivity kit (Life Technologies, Carlsbad, CA, USA) and diluted so that 1 ng of each genome was used as input to prepare the paired-end library. The "tagmentation" step fragmented and tagged the DNA. Limited-cycle PCR amplification (12 cycles) then completed the tag adapters and introduced dual-index barcodes. After purification on AMPure XP beads (Beckman Coulter Inc, Fullerton, CA, USA), the libraries were normalized on specific beads according to the Nextera XT protocol (Illumina). Normalized libraries were pooled into a single library for sequencing on the MiSeq. The pooled single-strand library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and paired-end sequencing with dual-index reads were performed in a single 39-h run of 2 × 250 bp. A total of 5.3 Gb was obtained from a cluster density of 574 K/mm2, with 95.4 % of clusters passing quality control filters (11,188,000 clusters). Within this run, the index representation for B. jeddahensis was 10.3 %. The 1,062,432 reads were filtered according to read quality.
The mate-pair library was prepared with 1 μg of genomic DNA using the Nextera Mate Pair Illumina guide. The genomic DNA sample was simultaneously fragmented and tagged with a mate-pair junction adapter. The fragmentation profile was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 LabChip.
The DNA fragments ranged in size from 1 kb to 11 kb, with an optimum at 5 kb. No size selection was performed, and 600 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared into small fragments, with an optimum at 692 bp, on a Covaris S2 device in microtubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA). The libraries were normalized at 2 nM and pooled. After denaturation and dilution to 10 pM, the pool of libraries was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were performed in a single 42-h run of 2 × 250 bp. A total of 3.9 Gb was obtained from a cluster density of 399 K/mm2 with a …
[Figure: gel view of MALDI-TOF spectra] The Gel View displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like format. The x-axis records the m/z value. The left y-axis displays the running spectrum number, which follows the order in which spectra were loaded. Peak intensity is expressed by a gray-scale code. The color bar and the right y-axis indicate the relation between the color in which a peak is displayed and the peak intensity, in arbitrary units.
… [29] and BLASTn against the GenBank database. Signal peptides and numbers of transmembrane helices were predicted using SignalP [30] and TMHMM [31], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids; for alignment lengths shorter than 80 amino acids, an E-value threshold of 1e-05 was used. Such parameter thresholds have been used in previous studies to define ORFans. Artemis [32] and DNAPlotter [33] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [34].
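The length-dependent E-value rule for ORFans can be written as a short Python sketch. We read the two thresholds as defining a significant BLASTP hit and treat a protein with no significant hit as an ORFan; this reading, the hypothetical `BlastpHit` record, and the grouping of alignments of exactly 80 amino acids with the shorter ones (the text does not say where 80 falls) are all our assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BlastpHit:
    """Hypothetical minimal record for one BLASTP hit."""
    evalue: float
    alignment_length: int  # in amino acids


def is_significant(hit: BlastpHit) -> bool:
    """Apply the length-dependent E-value thresholds from the text."""
    if hit.alignment_length > 80:
        return hit.evalue < 1e-3
    # Alignments of 80 aa or shorter: stricter threshold (80 itself is an assumption).
    return hit.evalue < 1e-5


def is_orfan(hits: Iterable[BlastpHit]) -> bool:
    """A protein is treated as an ORFan when none of its hits is significant."""
    return not any(is_significant(h) for h in hits)
```

The stricter threshold for short alignments reflects the fact that short matches can reach low E-values by chance more easily than long, consistent ones.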
To estimate the mean level of nucleotide sequence similarity at the genome level between B. jeddahensis sp. nov. strain JCE T and nine other members of the genus Bacillus, we used the Average Genomic Identity of orthologous gene Sequences (AGIOS) program. Briefly, this software uses Proteinortho [35] to detect orthologous proteins between pairs of genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. Moreover, we used the Genome-to-Genome Distance Calculator (GGDC) web server (http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes and to replace wet-lab DNA-DNA hybridization (DDH) with digital DDH (dDDH) [36,37]. GGDC 2.0 with BLAST+ was chosen as the alignment method, and the recommended formula 2 was used to interpret the results.
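The AGIOS step that averages nucleotide identity over orthologous ORFs can be illustrated with a textbook Needleman-Wunsch global alignment. This is a minimal sketch, not the AGIOS code: the scoring scheme (match +1, mismatch -1, linear gap -2), the identity definition (identical columns divided by alignment length, gaps included) and all function names are assumptions, and the ortholog pairs are assumed to have been found beforehand (by Proteinortho in the pipeline described above).

```python
from statistics import mean
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global alignment of two sequences with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns over the full alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def mean_orthologous_identity(pairs: List[Tuple[str, str]]) -> float:
    """Mean identity over (gene_in_genome_1, gene_in_genome_2) ortholog pairs."""
    return mean(alignment_identity(*needleman_wunsch(x, y)) for x, y in pairs)
```

The quadratic-time dynamic programming table is fine for single genes; a production pipeline would use an optimized aligner.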

Genome properties
The genome is 4,762,944 bp long (1 chromosome, no plasmid), with a G + C content of 39.42 % (Fig. 6 and Table 3). It is composed of 149 contigs. Of the 4,741 predicted genes, 4,654 were protein-coding genes and 98 were RNA genes, including 6 rRNA genes (1 gene is 16S rRNA, 1 gene is 23S rRNA and 5 genes are 5S rRNA). A total of 3,410 genes (71.92 %) were assigned a putative function (by COGs or by NR BLAST), and 147 genes were identified as ORFans (3.17 %). The distribution of genes into COG functional categories is presented in Table 4, and the properties and statistics of the genome are summarized in Tables 3 and 4.
… (Table 5). Because it has recently been reported that G + C content varies by no more than 1 % within a species [38], and because strain JCE T differs from most of the other closely related strains by more than 1 % in G + C content, this difference may provide an additional argument for the new taxon described herein. The number of proteins encoded by B. jeddahensis is higher than that of "B. massilioanorexius", "B. timonensis", B. licheniformis and B. niacini DSM 2923 T (4,654 vs 4,436, 4,647, 4,173 and 2,184, respectively) but lower than that of "B. massiliosenegalensis", B. cereus, B. megaterium, B. bataviensis, B. vireti and B. niacini JAM F8 (4,654 vs 4,935, 5,231, 5,100, 5,207, 5,092 and 6,103, respectively) (Table 6). The distribution of genes into COG categories was not entirely similar across the nine compared genomes (Fig. 7). In addition, B. jeddahensis …
Table 5 summarizes the number of orthologous genes and the average percentage of nucleotide sequence identity between the different genomes studied.
(Table footnote) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. ND, not determined.
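The G + C argument above rests on a simple percentage and a 1 % cut-off. The sketch below computes G + C content from a nucleotide string and checks whether two values differ by more than one percentage point. Excluding non-ACGT symbols (such as N) from the denominator is our assumption, and any comparison value other than the 39.42 % reported here is hypothetical.

```python
def gc_content(seq: str) -> float:
    """G + C content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


def differs_by_more_than(gc_a: float, gc_b: float, limit: float = 1.0) -> bool:
    """True if two G + C percentages differ by more than `limit` points."""
    return abs(gc_a - gc_b) > limit
```

For example, `differs_by_more_than(39.42, 41.0)` returns True for a hypothetical relative at 41.0 %, whereas a hypothetical relative at 40.0 % would fall within the 1 % intraspecies range.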

Conclusions
On the basis of phenotypic characteristics (Additional file 1: Table S1), phylogenetic position (Fig. 1), genomic analyses (taxonogenomics) (Table 5) and GGDC results, we formally propose the creation of Bacillus jeddahensis sp. nov., which contains strain JCE T. This strain was isolated from the feces of an obese man living in Jeddah, Saudi Arabia.
The G + C content of the genome is 39.42 %. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers HG931339 and CCAS00000000, respectively. The type strain JCE T (= CSUR P732 = DSM 28281) was isolated from the fecal flora of an obese man from Jeddah in Saudi Arabia.

Additional file
Additional file 1: Table S1. Differential phenotypic characteristics between B. jeddahensis sp. nov. strain JCE T and phylogenetically close Bacillus species.