Genome sequence of Acuticoccus yangtzensis JL1095T (DSM 28604T) isolated from the Yangtze Estuary

Acuticoccus yangtzensis JL1095T is a proteobacterium from a genus belonging to the family Rhodobacteraceae; it was isolated from surface waters of the Yangtze Estuary, China. This strain can utilize aromatic and simple carbon compounds. Here, we present the genome sequence, annotations, and features of A. yangtzensis JL1095T. The genome is 5,043,263 bp in size with a G + C content of 68.63%. It contains 4286 protein-coding genes, 56 RNA genes, and 83 pseudogenes. Many of the protein-coding genes were predicted to encode proteins involved in carbon metabolism pathways, such as aromatic degradation and methane metabolism. Notably, a total of 31 genes were predicted to encode form II carbon monoxide dehydrogenases, suggesting potential for carbon monoxide oxidation. This genome analysis improves our understanding of the major carbon metabolic pathways of this strain and its role in carbon cycling in coastal marine ecosystems.

Electronic supplementary material: The online version of this article (10.1186/s40793-017-0295-6) contains supplementary material, which is available to authorized users.


Introduction
We isolated a member of the family Rhodobacteraceae, Acuticoccus yangtzensis JL1095T (= CGMCC 1.12795 = DSM 28604), from surface waters of the Yangtze Estuary, China (31°N, 122°E) [1,2]. The physiological properties of members of the family Rhodobacteraceae suggest that they may be important in regulating the carbon cycle in terrestrial and marine ecosystems. For instance, many members of this family can degrade aromatic compounds [3] and metabolize one-carbon compounds [4]. Physiological tests have shown that strain JL1095T can degrade naphthol-AS-BI-phosphate and utilize acetic acid and glycerol [1]. In addition, many members of the family Rhodobacteraceae examined to date can oxidize carbon monoxide (CO).
CO is an important atmospheric trace gas that contributes to climate change despite its low concentrations (0.05-0.12 ppm) in air [5]. Although CO is toxic for many organisms, a number of microbes can consume CO. Marine microbial CO oxidation represents an important CO sink in the oceans. Carbon monoxide dehydrogenases (CODHs), the key enzymes for CO oxidation, have been classified into two major types based on their cofactor composition, structure, and stability in the presence of dioxygen [6]. Ni- and Fe-containing CODHs are found in anaerobic bacteria and archaea, while Cu- and Mo-containing CODHs are found in aerobic bacteria [7]. Whereas the early Earth environment was relatively hypoxic with high CO concentrations [8], modern environments are relatively oxic with low CO concentrations, and aerobic CO oxidation has accordingly become increasingly significant ecologically. Aerobic CO oxidation is carried out by phylogenetically and physiologically diverse aerobic bacteria and certain newly identified archaea that are distributed in a variety of habitats, including terrestrial, sedimentary, freshwater, and marine ecosystems [9]. The most active CO oxidizers belong to various genera, such as Ruegeria, Roseobacter, Stappia and Silicibacter, mostly from the family Rhodobacteraceae [10,11]. Based on phylogenetic analysis of 16S rRNA gene sequences and physiological characteristics, A. yangtzensis JL1095T is most closely related to the genus Stappia [1], all examined members of which can oxidize CO and contain form I and II cox gene operons [12-14].
Table footnote: Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [22].
In this study, we describe the classification and features of A. yangtzensis JL1095T, report its first draft genome sequence, and explore its major carbon metabolic pathways and potential capability to oxidize CO.
The draft genome sequence of A. yangtzensis JL1095T contains one full-length 16S rRNA gene sequence (1450 bp; BIX52_RS22260) that is consistent with the partial 16S rRNA gene sequence from the original species description (1397 bp; KF741873) [1]. Strain JL1095T showed the highest 16S rRNA gene sequence similarity with Stappia indica B106T (92.7%), followed by Stappia stellata IAM 12621T (92.6%) and Labrenzia suaedae DSM 22153T (92.3%). To assess the evolutionary relationships between strain JL1095T and related strains, a phylogenetic tree was constructed with MEGA 5.05 using the neighbor-joining algorithm and the Jukes-Cantor model. The phylogeny showed that strain JL1095T forms a monophyletic branch at the periphery of the evolutionary radiation occupied by the various genera in the family Rhodobacteraceae (Fig. 2).
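As a rough illustration of the Jukes-Cantor model named above, the sketch below converts an observed proportion of differing sites p into a corrected distance d = -(3/4) ln(1 - 4p/3). Treating (1 - similarity) as p is an assumption made here for illustration only; the similarity values reported in the text may have been computed differently, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites.

    The correction is only defined for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Assumption: 92.7% similarity is read as p = 1 - 0.927 differing sites.
p_observed = 1.0 - 0.927
distance = jukes_cantor_distance(p_observed)
print(f"p = {p_observed:.3f}, Jukes-Cantor distance = {distance:.4f}")
```

The corrected distance is slightly larger than p because the model accounts for multiple substitutions at the same site.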

Genome project history
This strain was selected for sequencing on the basis of its important evolutionary position, its ability to degrade aromatic and simple hydrocarbon compounds [1], and its potential CO oxidation ability. The sequencing of the A. yangtzensis JL1095T genome was carried out at Beijing Novogene Bioinformatics Technology Co., Ltd. The genome sequence of A. yangtzensis JL1095T has been deposited in GOLD [15] and in DDBJ/EMBL/GenBank under accession number MJUX00000000. A summary of the genome sequencing information for A. yangtzensis JL1095T is given in Table 2, in compliance with MIGS version 2.0 [16].

Growth conditions and genomic DNA preparation
A. yangtzensis JL1095T (= CGMCC 1.12795 = DSM 28604) was cultivated aerobically in MB (Difco) medium in a shaker at 35°C for 2-3 days. Genomic DNA of strain JL1095T was extracted using the Tguide Bacteria Genomic DNA Kit (OSR-M502, TIANGEN Biotech Co. Ltd., Beijing, China) in accordance with the instruction manual. The total DNA obtained was subjected to quality control by agarose gel electrophoresis and quantified with a Qubit 2.0 fluorometer (Life Technologies, MA, USA).

Fig. 2 The tree was constructed with MEGA 5.05 software by using the neighbor-joining (NJ) method for 16S rRNA gene sequences. Accession numbers in the GenBank database are shown in parentheses. Reference sequences from related strains with publicly available genome sequences are in blue font, while the JL1095T sequence is in blue bold font. The numbers at the nodes indicate bootstrap percentages based on 1000 replicates; only values higher than 50% are shown. Bar, 0.02 substitutions per nucleotide position. Thauera aminoaromatica S2T was used to root the tree.

Genome sequencing and assembly
The genome of this strain was sequenced using Illumina HiSeq 2500 paired-end technology under the PE 150 strategy. A total of 1674 Mbp of filtered reads was obtained. The filtered reads were assembled with SOAPdenovo version 2.04, generating 29 contigs [17,18]. Gene prediction was performed on the genome assembly using GeneMarkS version 4.17 [19].
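For orientation, the filtered read volume and the reported genome size imply an approximate sequencing depth. The sketch below is a back-of-the-envelope estimate only: it assumes 1 Mbp = 10^6 bp and that all filtered bases contribute to the assembly, and the resulting figure is derived here rather than quoted from the text.

```python
def fold_coverage(total_read_bases: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


FILTERED_READ_BASES = 1674 * 10**6   # 1674 Mbp of filtered reads (assumes 1 Mbp = 10^6 bp)
GENOME_SIZE_BP = 5_043_263           # reported genome size

coverage = fold_coverage(FILTERED_READ_BASES, GENOME_SIZE_BP)
print(f"Estimated average depth: about {coverage:.0f}x")
```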

Genome annotation
Functional annotation of the coding sequences was performed by searching the KEGG [20], NR, COG [21], and GO [22] databases. The rRNA genes of strain JL1095T were predicted using RNAmmer [23], tRNA genes were identified using tRNAscan-SE [24], and sRNAs were predicted by BLAST searches against the Rfam database [25]. The online CRISPRFinder program was used for CRISPR identification [26].

Genome properties
The A. yangtzensis JL1095T genome was composed of 5,043,263 bp with a G + C content of 68.63%. A total of 4286 protein-coding genes were predicted with an average length of 994 bp, occupying 87.01% of the genome. The genome also contained 56 RNA genes and 83 pseudogenes. Detailed genome statistics are shown in Table 3. COG categories were assigned to 2522 of the protein-coding genes, which were classified into 21 functional groups. The most abundant COG category was "amino acid transport and metabolism", followed by "general function prediction only", "function unknown", and "energy production and conversion". Detailed gene numbers and percentages for the COG categories are shown in Table 4. In total, 2470 protein-coding genes were assigned to 153 KEGG metabolic pathways, including key genes involved in carbon metabolism processes such as gluconeogenesis, polycyclic aromatic hydrocarbon degradation, and methane metabolism. In addition, based on the GO database, 1992 protein-coding genes were assigned to molecular function, 1394 genes were assigned to cellular components, and 2646 genes were assigned to biological processes.
Table footnote: The total is based on the total number of protein-coding genes in the genome.
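The annotation counts above can be expressed as shares of the predicted gene set. The short sketch below takes the 4286 protein-coding genes as the denominator; the percentages are derived here for illustration and are not quoted from Table 3 or Table 4.

```python
TOTAL_PROTEIN_CODING_GENES = 4286

# Counts reported in the text for two annotation databases.
ANNOTATED_GENE_COUNTS = {
    "COG": 2522,
    "KEGG": 2470,
}


def percent_of_total(count: int, total: int = TOTAL_PROTEIN_CODING_GENES) -> float:
    """Share of the protein-coding genes, as a percentage."""
    return 100.0 * count / total


annotated_shares = {
    database: percent_of_total(count)
    for database, count in ANNOTATED_GENE_COUNTS.items()
}

for database, share in annotated_shares.items():
    print(f"{database}: {share:.2f}% of protein-coding genes")
```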

Insights from the genome sequence
We performed a systematic analysis of the protein-coding genes with functional predictions by BLAST searches against the four databases (KEGG, NR, COG, and GO), with an E-value < 1e-5 and a minimum alignment length of > 40%. Strain JL1095T was predicted to contain most of the genes central to carbon metabolism, including those related to glycolysis/gluconeogenesis, the tricarboxylic acid cycle, and the pentose phosphate pathway. Furthermore, about 198 genes were assigned to COG categories related to carbohydrate transport and metabolism, including fructose, mannose, and galactose metabolism. These carbohydrate metabolic characteristics are generally consistent with those obtained from a sole-carbon-source utilization experiment [1]. The capacity of this strain to degrade aromatic compounds such as naphthol-AS-BI-phosphate has been identified. Approximately 236 genes were involved in 13 KEGG metabolic pathways related to the degradation of aromatic compounds, such as polycyclic aromatic hydrocarbons, bisphenol, and naphthalene. Aromatic compounds are important environmental organic pollutants because of their persistence in the environment, toxicity, and carcinogenicity [27]. Furthermore, strain JL1095T was annotated to contain 48 genes related to methane metabolism.
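The two BLAST thresholds mentioned above (E-value < 1e-5 and alignment length > 40%) can be applied as a simple post-filter. The following is a minimal sketch, not the pipeline used in the study. It assumes a tab-separated hit table with the hypothetical column order query, subject, E-value, alignment length, query length, and it assumes the 40% cutoff is measured relative to the query length, which the text does not specify. All identifiers in the example rows are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    aln_len: int
    query_len: int


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated rows: query, subject, evalue, aln_len, query_len."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query, subject, evalue, aln_len, query_len = line.split("\t")
        yield Hit(query, subject, float(evalue), int(aln_len), int(query_len))


def passes_thresholds(hit: Hit, max_evalue: float = 1e-5,
                      min_aligned_fraction: float = 0.40) -> bool:
    """Keep hits with E-value below the cutoff and aligned fraction above it."""
    aligned_fraction = hit.aln_len / hit.query_len
    return hit.evalue < max_evalue and aligned_fraction > min_aligned_fraction


# Hypothetical example rows (identifiers and values are illustrative only).
example_rows = [
    "gene_a\tref_1\t1e-20\t300\t331",   # strong, long alignment: kept
    "gene_b\tref_2\t1e-3\t300\t331",    # E-value too high: dropped
    "gene_c\tref_3\t1e-30\t100\t331",   # alignment too short: dropped
]
kept = [hit.query for hit in parse_hits(example_rows) if passes_thresholds(hit)]
print(kept)
```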
Based on results from the four functional annotation databases, the A. yangtzensis JL1095T genome contained a total of 31 genes predicted to encode aerobic-type CODHs (Additional file 1: Table S1). The cox gene clusters that encode aerobic CODHs have been classified into two major forms based on genome analysis [9]. Form I genes are mainly from Oligotropha, Mycobacterium and Pseudomonas, and form II putative genes are mainly from Bradyrhizobium, Mesorhizobium, and Sinorhizobium [13]. Form I and II cox gene operons consist of three conserved structural genes that are transcribed as coxMSL and coxSLM, respectively [28,29]. For strain JL1095T, all three structural genes, coxS (small subunit), coxM (medium subunit), and coxL (large subunit), were sequenced. Form I coxS and coxM gene sequences are similar to form II coxS and coxM gene sequences, whereas the form II putative coxL gene sequence is only approximately 40-50% similar to the form I coxL gene sequence [9]. Therefore, the coxL gene has been used as a molecular biomarker to explore the distribution of aerobic CO-oxidizing bacteria in ecosystems [29].

Fig. 3 Unrooted phylogenetic tree showing the coxL genotype of Acuticoccus yangtzensis JL1095T. The tree was constructed with MEGA 5.05 software by using the neighbor-joining (NJ) method based on the form I coxL and form II putative coxL genes from CO-oxidizing microbes. Accession numbers in the GenBank database are shown in parentheses. The coxL genes encoded in the Acuticoccus yangtzensis JL1095T genome are shown in bold. Sequences in orange and blue shades represent form I and II coxL genes, respectively. The numbers at the nodes indicate bootstrap percentages based on 1000 replicates; only values higher than 50% are shown. Bar, 0.05 substitutions per nucleotide position.
Fig. 4 Unrooted phylogenetic tree displaying the relationship between Acuticoccus yangtzensis JL1095T and Roseobacter clade bacteria. The tree was constructed with MEGA 5.05 software by using the neighbor-joining (NJ) method based on 16S rRNA gene sequences. Accession numbers in the GenBank database are shown in parentheses. The 16S rRNA gene encoded in the Acuticoccus yangtzensis JL1095T genome is shown in bold. The numbers at the nodes indicate bootstrap percentages based on 1000 replicates; only values higher than 50% are shown. Bar, 0.01 substitutions per nucleotide position.

We constructed a coxL phylogenetic tree for strain JL1095T and confirmed that four predicted coxL genes (locus tags BIX52_RS02480, BIX52_RS05715, BIX52_RS17810 and BIX52_RS18370) were form II coxL genes (Fig. 3). Accessory genes are also essential for CO oxidation. The accessory genes in forms I and II vary substantially, and even within the same form, the order and subunit types vary among isolates [9]. Form I cox accessory genes, including coxB, C, G, H, I, and K, are distributed flexibly around the structural genes. Among the form II cox accessory genes, coxG is usually indispensable, unlike other accessory genes such as coxD, E, and F [28]. For this strain, the accessory gene coxG was detected. Form I CODH has been specifically characterized for its ability to oxidize CO, while form II is a putative CODH and its ability to oxidize CO remains uncertain. In the Roseobacter clade, both coxL forms are present, which enables its members to oxidize CO [11]. Phylogenetic analysis using the 16S rRNA gene sequences of A. yangtzensis JL1095T and Roseobacter clade bacteria indicates that JL1095T does not belong to the Roseobacter clade (Fig. 4). However, many other bacteria containing only form II cox genes have been shown by molecular and culture-based methods to oxidize CO, including Mesorhizobium sp. strain NMB1, Mesorhizobium loti, Aminobacter sp. strain COX, Xanthobacter sp. strain COX, and Burkholderia sp. strain LUP [13]. According to the phylogenetic tree (Fig. 3), the coxL genes of JL1095T