Genome sequence of Shimia str. SK013, a representative of the Roseobacter group isolated from marine sediment

Shimia strain SK013 is an aerobic, Gram-negative, rod-shaped alphaproteobacterium affiliated with the Roseobacter group within the family Rhodobacteraceae. The strain was isolated from surface sediment (0–1 cm) of the Skagerrak at 114 m below sea level. The 4,049,808 bp genome of Shimia str. SK013 comprises 3,981 protein-coding genes and 47 RNA genes. It contains one chromosome and no extrachromosomal elements. The genome analysis revealed genes for a dimethylsulfoniopropionate lyase, a demethylase and the trimethylamine methyltransferase (mttB), as well as genes for nitrate, nitrite and dimethyl sulfoxide reduction. This indicates that Shimia str. SK013 is able to switch from aerobic to anaerobic metabolism and is thus capable of aerobic and anaerobic sulfur cycling at the seafloor. In addition to the ability to convert other sulfur compounds, it has the genetic capacity to produce climatically active dimethyl sulfide. Growth on glutamate as sole carbon source results in the formation of cell-connecting filaments, a putative phenotypic adaptation of the surface-associated strain to the environmental conditions at the seafloor. Genome analysis revealed genes for a flagellum (fla1) and for type IV pilus biogenesis, which are speculated to be prerequisites for biofilm formation. Also related to biofilm formation are genes responsible for signalling, such as those for N-acyl homoserine lactones, as well as genes for quorum quenching (quiP) and antibiotic biosynthesis. Pairwise similarities of 16S rRNA genes (98.56 % sequence similarity to the closest relative, S. haliotis) and in silico DNA-DNA hybridization (21.20 % similarity to S. haliotis) indicate that Shimia str. SK013 should be considered a new species. The genome analysis of Shimia str. SK013 offers first insights into specific physiological and phenotypic adaptation mechanisms of Roseobacter-affiliated bacteria to the benthic environment.


Introduction
The Roseobacter group is known for its worldwide distribution and its broad metabolic versatility in a great variety of marine habitats [1][2][3]. About 25 % of all Roseobacter species with validly published names (42 out of 168) have a benthic origin [4]. In marine sediments, they can contribute up to 11 % of all 16S rRNA genes and up to 10 % of total cell counts [5,6], but little is still known about the specific distribution and physiology of roseobacters in this habitat.
Shimia str. SK013, analysed in the present study, was isolated from the top centimeter of Skagerrak sediments at a water depth of 114 m below sea level (mbsl) [7]. The strain is affiliated with the genus Shimia which was first proposed by Choi and Cho in 2006 [8] in honor of Dr. Jae H. Shim, for his contributions to marine plankton ecology in Korea. According to Pujalte et al. [4], the genus Shimia consists of four species, with a fifth species Shimia sagamensis recently included. Members of the genus Shimia were isolated from different marine habitats: e.g. S. haliotis was isolated from the intestinal tract of the abalone Haliotis discus hannai [9], S. biformata from surface sea water [10], S. isoporae from reef building corals [11] and S. marina from a fish farm biofilm [8]. The new species affiliated to the genus Shimia (Shimia sagamensis) was isolated from cold seep sediment [12]. The sequenced genome of Shimia str. SK013 will allow for genetic comparison between the strain and other organisms of benthic origin, additional sediment-derived roseobacters and close relatives isolated from different habitats.
Here, we present the genome of Shimia str. SK013 with special emphasis on the genes involved in sulfur cycling, such as dimethylsulfoniopropionate (DMSP) degradation and dimethyl sulfoxide reduction, as well as other anaerobic pathways such as nitrate reduction. The second focus is on genes that may be indicative of biofilm formation (pili, flagella and quorum sensing) as an adaptation to a surface-associated lifestyle.

Classification and features
Sediment samples were collected in July 2011 during a cruise with the RV 'Heincke' (expedition HE361) to the eastern North Sea. The strain was isolated from surface sediment (0-1 cm) of the Skagerrak (Site 27, 57°61.28′N, 8°58.18′E) at 114 mbsl from an aerobic enrichment culture. Shimia str. SK013 is a Gram-negative, motile, rod-shaped bacterium with a length of 1.8 to 2.0 μm and a width of approximately 0.5 μm (Table 1; Fig. 1). Colonies are small, slightly domed and white to transparent on artificial sea water (ASW) medium agar plates, but cream-coloured or beige on marine broth medium agar plates. The strain is mesophilic (range: 10-35°C, Topt = 30°C), neutrophilic (optimum pH: 6-7) and halophilic (optimum: 2-3 % w/v). Shimia str. SK013 grows well in liquid medium but relatively slowly on agar-solidified marine broth and ASW medium. The strain is able to utilize various substrates such as glucose, lactose, glutamate, mannose, xylose, acetate and citrate. When Shimia str. SK013 grows in ASW medium with glutamate as sole carbon source, cell-connecting filaments that might represent bundle-forming pili or specialized flagella are induced (Fig. 1). These structures were not observed in cultures amended with any of the other tested substrates (see above). The 16S rRNA gene sequence of Shimia str. SK013 (1453 bp) was analysed using ARB [13] and revealed 98.56 % sequence similarity to the closest relative, Shimia haliotis. Furthermore, in the phylogenetic tree, Shimia str. SK013 branches together with the other Shimia species except Shimia biformata (Fig. 2).
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [52].

Genome sequencing information
Genome project history
Shimia str. SK013 was selected for draft genome sequencing based on its physiological and phenotypical features and its benthic origin. The information related to this project is summarized in Table 2. The draft genome is deposited in the Genomes On Line Database [14] and in the Integrated Microbial Genome database [15]. The Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number LAJH00000000.1.

Growth conditions and genomic DNA preparation
Shimia str. SK013 was enriched and isolated on agar plates containing artificial sea water medium [16] with DMS (100 μM) and lactate (5 mM) as substrates, incubated at 15°C. Genomic DNA was extracted using a DNA isolation kit (MO BIO, Carlsbad, CA, USA), following the manufacturer's instructions.

Genome sequencing and assembly
Whole-genome sequencing was performed using Illumina technology. The paired-end sequencing library was prepared with the Illumina Nextera XT library preparation kit and sequenced on the Genome Analyzer IIx as described by the manufacturer (Illumina, San Diego, CA, USA). A total of 11,098,582 paired-end reads were obtained and trimmed using Trimmomatic version 0.32 [17]. De novo assembly of all trimmed reads with SPAdes version 3.5.0 [18] resulted in 28 contigs and 137.9-fold coverage. A summary of project information is shown in Table 2.
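The reported fold coverage can be related to the other figures in the text through the usual definition, coverage = total sequenced bases / genome length. The following is a minimal sketch of that arithmetic, not the authors' procedure. The function names are illustrative, and the text does not state the read length after trimming or whether the read count refers to single reads or read pairs, so the per-read figure is only a rough plausibility check.

```python
# Figures reported in the text
GENOME_LENGTH_BP = 4_049_808   # total assembly length
READ_COUNT = 11_098_582        # reported number of paired-end reads
REPORTED_COVERAGE = 137.9      # reported fold coverage


def fold_coverage(total_bases: float, genome_length: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_length


def implied_total_bases(coverage: float, genome_length: int) -> float:
    """Total bases needed to reach a given fold coverage."""
    return coverage * genome_length


total_bases = implied_total_bases(REPORTED_COVERAGE, GENOME_LENGTH_BP)

# Assumption: READ_COUNT counts individual reads; if it counts pairs,
# the per-read value would be half of this.
mean_bases_per_read = total_bases / READ_COUNT

print(f"Implied sequenced bases: {total_bases / 1e6:.1f} Mb")
print(f"Implied mean bases per read: {mean_bases_per_read:.1f}")
```

Under these assumptions, the reported coverage corresponds to roughly 558.5 Mb of assembled sequence data.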

Genome annotation
Protein-coding genes were identified as part of the genome annotation pipeline of the Integrated Microbial Genomes Expert Review platform using Prodigal v2.50. The predicted CDS were translated and used to search the CDD, KEGG, UniProt, TIGRFam, Pfam and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [19], RNAmmer [20], Rfam [21], TMHMM [22] and SignalP [23]. Additional gene prediction analyses and functional annotation were performed within the IMG-Expert Review platform [24].

Genome properties
The genome analysis showed the presence of 28 scaffolds corresponding to one large chromosome with a total length of 4,049,808 bp and a G + C content of 57.22 % (Table 3). The absence of additional   [25]. In total, 4,028 genes were predicted, of which 3,981 were protein-coding genes and 47 were RNA genes. About 82.35 % were protein-coding genes with a putative function, while the remaining protein-coding genes were annotated as hypothetical proteins. The genome statistics are further provided in Table 3 and in Fig. 3. The distribution of genes into functional categories (clusters of orthologous groups) is shown in Table 4.
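The gene counts above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that the 82.35 % figure is expressed relative to the total number of predicted genes (the text does not say so explicitly), and the variable names are not taken from the original study.

```python
# Figures reported in the text
PROTEIN_CODING_GENES = 3_981
RNA_GENES = 47
TOTAL_GENES = 4_028
WITH_FUNCTION_PCT = 82.35  # protein-coding genes with a putative function

# The two gene classes should add up to the reported total.
total_check = PROTEIN_CODING_GENES + RNA_GENES

# Share of protein-coding genes among all predicted genes.
protein_coding_pct = 100 * PROTEIN_CODING_GENES / TOTAL_GENES

# Assumption: the percentage refers to all predicted genes.
genes_with_function = round(TOTAL_GENES * WITH_FUNCTION_PCT / 100)
hypothetical_proteins = PROTEIN_CODING_GENES - genes_with_function

print(f"Total genes: {total_check}")
print(f"Protein-coding share: {protein_coding_pct:.2f} %")
print(f"Genes with putative function: {genes_with_function}")
print(f"Hypothetical proteins: {hypothetical_proteins}")
```

Under this assumption, about 3,317 protein-coding genes carry a functional prediction and about 664 are hypothetical proteins; a count of 3,317 out of 4,028 genes reproduces the reported 82.35 %.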

The total is based on the total number of protein-coding genes in the genome.
(Shim_09600, 31260). The conspicuous morphological trait of cell-connecting filaments in Shimia str. SK013 (Fig. 1) prompted a search for genes involved in the formation of pili and flagella. The bacterial flagellum is a complex multi-protein system, linked to signal transduction, that enables bacterial reorientation and motility [32]. So far, three different types of flagella gene clusters (FGCs), designated fla1, fla2 and fla3, have been described in Rhodobacteraceae; they originated from FGC duplications [33]. Genome analysis revealed a single compact flagella gene cluster of the fla1 type on the chromosome (contig_000021; Shim_33080 to Shim_33420) that contains all genes necessary for the assembly of a functional flagellum. Recently, Frank et al. [33] described, for the plasmid-cured mutant of Marinovum algicola DG898, which lacks the 143-kb plasmid pMaD5 carrying a fla2-type FGC, structures that are conspicuously similar in morphology to the filamentous structures observed in the current study for Shimia str. SK013 (Fig. 1). The bundles of filaments were explained by the presence of an additional chromosome-encoded fla1-type flagellum in Marinovum. However, genes for type IV pilus biogenesis, which were found in Shimia str. SK013 (Shim_13020, Shim_37620), are also present in the genome of M. algicola DG898 (MALG_02262), and thus it remains unclear whether the conspicuous bundles at the cell pole are caused by pilus and/or flagellum formation.
As the described morphological traits are often related to a surface-associated lifestyle, we also searched the genome of Shimia str. SK013 for genes involved in the production of signalling molecules and quorum sensing as indicators of communication within biofilms. Earlier studies showed that quorum sensing signals are mainly associated with virulence [34,35], but recent investigations revealed that these signalling molecules also play a significant role in basic metabolic processes [36,37]. The presence of genes for the production of N-acylhomoserine lactones (AHLs) (Shim_31370) and homoserine lactones (Shim_16180), which are part of the quorum sensing system, indicates that Shimia str. SK013 uses this form of bacterial communication. In contrast, the newly established genome contains only a few additional genes that interfere with quorum sensing, such as genes related to quorum quenching or antibiotic biosynthesis (AHL acylase QuiP precursor; Shim_09300) [38][39][40]. When compared to other selected roseobacters, these three signal molecule genes were also found in Roseobacter litoralis (RLO149_c018030, c029420, c006500) and Sediminimonas qiahouensis (G568DRAFT_00799, 01106, 03483). This finding was supported by an antiSMASH analysis [41] of the Shimia str. SK013 genome, which indicated the presence of a type I polyketide synthase (PKS) cluster, a homoserine lactone cluster and a bacteriocin gene cluster.
The pairwise similarity of the 16S rRNA genes of Shimia str. SK013 and its closest relative, Shimia haliotis, was 98.56 %. A genome comparison of Shimia str. SK013 with the draft genomes of Shimia haliotis DSM 28453 (IMG ID: 2619619046) and Shimia marina DSM 26895 (IMG ID: 2619618961), available from the KMG-2 project (Genomic Encyclopedia of Bacteria and Archaea, GEBA) [42,43], was conducted using the online analysis tool "Genome-Genome-Distance Calculator" 2.0 (GGDC). The results of the in silico calculated DNA-DNA hybridization (DDH) suggest that Shimia str. SK013 might belong to a new species, based on the low percentages obtained (Table 7). The GGDC tool recommends formula 2 for comparisons between draft genomes, as it provides higher DDH correlations than Average Nucleotide Identity (ANI) implementations [44,45]. The analysis showed that Shimia str. SK013 shared a genome sequence similarity of only 21 % with Shimia haliotis DSM 28453 and 20 % with Shimia marina DSM 26895 and thus represents a new isolate of neither S. haliotis nor S. marina. A direct comparison with the available Shimia genomes revealed further differences, such as in the IMG pathway counts (representing the number of metabolites and macromolecular complexes) and in the horizontally transferred gene counts (Table 5). To date, genome sequences of S. biformata, S. isoporae and S. sagamensis are not available for additional in silico DDH calculations or direct genome comparisons. However, as S. haliotis was identified as the closest relative by 16S rRNA gene analysis with bootstrap support of 66/60 %, the DDH data provide strong evidence that Shimia str. SK013 represents a new species within the genus Shimia.
The standard deviations indicate the inherent uncertainty in estimating DDH values from intergenomic distances based on models derived from empirical test data sets (which are always limited in size); see [45] for details.
The distance formulas are explained in [44]. Formula 2 is recommended, particularly for draft genomes (like those of the species above).
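The species-level reasoning applied to the DDH estimates can be sketched as a simple threshold test. The example below is illustrative only: it assumes the conventional 70 % DDH cut-off for species delineation, which is not stated in the text, and uses the rounded DDH estimates quoted above. The function and variable names are hypothetical.

```python
# Assumption: conventional DDH species boundary (not stated in the text).
DDH_SPECIES_THRESHOLD = 70.0

# In silico DDH estimates (%) for Shimia str. SK013, as quoted in the text.
ddh_estimates = {
    "Shimia haliotis DSM 28453": 21.0,
    "Shimia marina DSM 26895": 20.0,
}


def same_species(ddh_percent: float,
                 threshold: float = DDH_SPECIES_THRESHOLD) -> bool:
    """Return True if the DDH estimate meets the species threshold."""
    return ddh_percent >= threshold


# The strain is a candidate for a new species only if it falls below
# the threshold against every compared reference genome.
is_new_species_candidate = not any(
    same_species(value) for value in ddh_estimates.values()
)

for reference, value in ddh_estimates.items():
    verdict = "same species" if same_species(value) else "different species"
    print(f"{reference}: {value:.0f} % DDH -> {verdict}")

print(f"New species candidate: {is_new_species_candidate}")
```

Both estimates fall far below the assumed cut-off, which is consistent with the conclusion drawn in the text. The test only covers the reference genomes that were compared, so it cannot exclude identity with Shimia species whose genomes are not yet available.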