High-quality-draft genome sequence of the heavy metal resistant and exopolysaccharides producing bacterium Mucilaginibacter pedocola TBZ30T

Mucilaginibacter pedocola TBZ30T (= CCTCC AB 2015301T = KCTC 42833T) is a Gram-negative, rod-shaped, non-motile and non-spore-forming bacterium isolated from a heavy metal contaminated paddy field. It shows resistance to multiple heavy metals and can adsorb/remove Zn2+ and Cd2+ during cultivation. In addition, strain TBZ30T produces exopolysaccharides (EPS). These features give it great potential for the bioremediation of heavy metal contamination and for biotechnological applications. Here we describe the genome sequence and annotation of strain TBZ30T. The genome is 7,035,113 bp in size and contains 3132 protein-coding genes (2736 with predicted functions), 50 tRNA encoding genes and 14 rRNA encoding genes. Putative heavy metal resistance genes and EPS associated genes are found in the genome.


Introduction
The genus Mucilaginibacter was first established by Pankratov et al. in 2007, with Mucilaginibacter paludis as the type species [1]. Members of this genus are Gram-negative, non-spore-forming, non-motile and rod-shaped, and they produce exopolysaccharides (EPS) [1,2]. EPS are long-chain polysaccharides that consist of branched, repeating units of sugars or sugar derivatives [3]. EPS producing bacteria play an important role in environmental bioremediation such as water treatment, sludge dewatering and metal removal [4]. So far, the genomic features of Mucilaginibacter strains have been little studied.
Mucilaginibacter pedocola TBZ30T (= CCTCC AB 2015301T = KCTC 42833T) was isolated from a heavy metal contaminated paddy field in Hunan Province, P. R. China [5]. Here we show that strain TBZ30T is resistant to multiple heavy metals and removes Zn2+ and Cd2+. In addition, strain TBZ30T is able to produce EPS. The genomic information of strain TBZ30T is provided.

Genome information
Genome project history
M. pedocola TBZ30T was selected for sequencing because of its heavy metal resistance and removal abilities, which give it great potential for bioremediation. The draft genome was sequenced by Wuhan Bio-Broad Co., Ltd., Wuhan, China. The high-quality-draft genome sequence has been deposited at DDBJ/EMBL/GenBank under the accession number MBTF00000000.1. The project information is shown in Table 2.

Growth condition and DNA isolation
M. pedocola TBZ30T was grown in R2A medium at 28°C for 36 h with continuous shaking at 120 rpm. Bacterial cells were harvested by centrifugation (13,400×g for 5 min at 4°C) and the total genomic DNA was extracted using the QIAamp kit (Qiagen, Germany). The quality and quantity of the DNA were determined using a spectrophotometer (NanoDrop 2000, Thermo).

Genome sequencing and assembly
Whole-genome DNA sequencing was performed at Bio-Broad Co., Ltd., Wuhan, China using an Illumina standard shotgun library and the HiSeq 2000 paired-end sequencing strategy [12]. To improve the accuracy of the assembly, low-quality reads were removed from the original sequence data. The … [13]. Some of the gaps in the assembly were filled and erroneous bases were corrected using GapCloser v1.12 [14].
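The read-filtering step can be illustrated with a minimal Python sketch. The source does not state which criterion or tool was used to remove low-quality reads, so the mean-quality rule, the threshold of 20, the Phred+33 encoding and all function names below are assumptions chosen only for illustration.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or an empty line) stops the parser
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score of a quality string; 0.0 for an empty string."""
    if not qual:
        return 0.0
    return sum(ord(ch) - PHRED_OFFSET for ch in qual) / len(qual)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0):
    """Keep reads whose mean Phred quality is at least min_mean_q.

    The threshold is hypothetical; the paper does not report one.
    """
    for header, seq, qual in read_fastq(handle):
        if mean_quality(qual) >= min_mean_q:
            yield header, seq, qual


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
    toy = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
    for header, seq, _ in filter_reads(toy):
        print(header, seq)
```

For paired-end data, a real pipeline would normally drop both mates when either one fails the filter; that bookkeeping is omitted here to keep the sketch short.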

Genome annotation
The genome of strain TBZ30T was annotated through the NCBI PGAP, which combines the gene caller GeneMarkS+ with a similarity-based gene detection approach [15]. Pseudogenes were predicted using the NCBI PGAP. Internal gene clustering was performed with the OrthoMCL program using a match cutoff of 50% and an E-value cutoff of 1e-5 [16,17]. The COG functional categories were assigned by the WebMGA server with an E-value cutoff of 1e-10 [18]. The translations of the predicted CDSs were used to search against the Pfam protein family database and the KEGG database [19,20]. Transmembrane helices and signal peptides were predicted by TMHMM v. 2.0 and SignalP 4.1, respectively [21,22].
Fig. 5 A graphical circular map of Mucilaginibacter pedocola TBZ30T. From outside to center, rings 1, 4 show protein-coding genes colored by COG categories on forward/reverse strand; rings 2, 3 denote genes on forward/reverse strand; ring 5 shows G + C % content; ring 6 shows G + C % content plot and the innermost ring shows GC skew
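The G + C content and GC skew tracks of a circular genome map are usually computed over windows along the sequence, with GC skew defined as (G - C)/(G + C). The following Python sketch shows one way to do this. The window and step sizes are hypothetical, since the source does not state the values used for Fig. 5, and the function names are illustrative only.

```python
from typing import Iterator, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return (g + c) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def gc_profile(seq: str, window: int = 10_000, step: int = 10_000) -> List[Tuple[int, float, float]]:
    """Return (start, G+C fraction, GC skew) per window.

    The default window and step are assumptions, not values from the paper.
    """
    return [(start, gc_content(sub), gc_skew(sub))
            for start, sub in sliding_windows(seq, window, step)]


if __name__ == "__main__":
    for row in gc_profile("GGGCATAT", window=4, step=4):
        print(row)
```

Because the genome is a draft assembly made up of contigs, such a profile would be computed per contig (or on contigs concatenated in a fixed order) rather than on a single closed chromosome.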

Genome properties
The genome of strain TBZ30T is 7,035,113 bp in size with an average G + C content of 46.1% (Table 3). It contains 6072 genes, including 5935 protein-coding genes, 70 pseudogenes, 14 rRNA genes, 50 tRNA genes and 3 ncRNA genes. The genome statistics are shown in Table 3 and the classification of genes into COG functional categories is summarized in Table 4. The graphical genome map is provided in Fig. 5.
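The gene counts reported in this section can be cross-checked with simple arithmetic: 5935 + 70 + 14 + 50 + 3 = 6072. The short Python sketch below performs that check and derives two summary figures (genes per Mb and the share of each gene class). The numbers are taken from the text above; the variable and function names are illustrative assumptions.

```python
from typing import Dict

# Values as reported in the Genome properties section.
GENOME_SIZE_BP = 7_035_113
TOTAL_GENES = 6072
GENE_COUNTS: Dict[str, int] = {
    "protein-coding": 5935,
    "pseudo": 70,
    "rRNA": 14,
    "tRNA": 50,
    "ncRNA": 3,
}


def counts_are_consistent(counts: Dict[str, int], total: int) -> bool:
    """True when the per-class counts add up to the reported total."""
    return sum(counts.values()) == total


def genes_per_mb(total_genes: int, genome_size_bp: int) -> float:
    """Gene density expressed as genes per megabase."""
    return total_genes / (genome_size_bp / 1_000_000)


def class_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each gene class relative to the sum of all classes."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("consistent:", counts_are_consistent(GENE_COUNTS, TOTAL_GENES))
    print("genes per Mb: %.1f" % genes_per_mb(TOTAL_GENES, GENOME_SIZE_BP))
    for name, frac in class_fractions(GENE_COUNTS).items():
        print("%s: %.2f%%" % (name, 100 * frac))
```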

Insights from the genome sequence
Strain TBZ30T is resistant to multiple heavy metals (Zn2+, Cd2+, Pb2+, Cu2+ and As3+) and can adsorb/remove Zn2+ and Cd2+ during cultivation. Analysis of its genome revealed various putative proteins related to multiple heavy metal resistance (Table 5). RND efflux systems (CzcABC), CDF efflux systems (CzcD and YieF) and P-type ATPases (HMA and ZntA) are responsible for the efflux of Zn2+, Cd2+ and Pb2+ [23–27]. … [28–30], and CutC is involved in Cu2+ homeostasis [30–32]. Moreover, As3+ resistance proteins including the arsenite efflux pump ACR3, arsenate reductase ArsC, arsenite S-adenosylmethyltransferase ArsM and arsenic resistance repressor ArsR are also found [33–35] (Table 5).
Strain TBZ30T produces EPS during cultivation. According to KEGG analysis, complete biosynthesis pathways for the nucleotide sugars that form the repeating units are identified in the genome, including the biosynthesis of CDP-Glc, ADP-Glc and GDP-D-Man (Table 5). Genes related to long-chain polysaccharide assembly are also found (Table 5). The EPS production pathway in strain TBZ30T appears to belong to the ABC transporter dependent pathway [36]. First, the 3-deoxy-D-manno-octulosonic-acid transferase (KdtA) synthesizes the poly-Kdo linker using either diacyl or monoacyl phosphatidylglycerol as the substrate [36]. Then the priming glycosyltransferase (CpsE) catalyzes the transfer of the first repeating unit to the poly-Kdo linker. Next, glycosyltransferases catalyze the synthesis of the EPS repeat units. Finally, the polymerized repeat units are exported through an envelope-spanning complex consisting of an ABC transporter (KpsMT), a polysaccharide co-polymerase protein (PCP) and an outer membrane polysaccharide export protein (OPX) [37,38]. In addition, the genome of strain TBZ30T encodes a flippase (Wzx), which catalyzes the translocation of repeat units across the cytoplasmic membrane. EPS have been reported to play an important role in metal removal [3]. Therefore, it is possible that the EPS of strain TBZ30T participate in Zn2+ and Cd2+ removal by adsorption.

Conclusions
To the best of our knowledge, this study presents the first genomic information of a Mucilaginibacter type strain. The data reveal a good correlation between genotype and phenotype. The genome information and features provide insights for further theoretical and applied analysis of M. pedocola TBZ30T and related Mucilaginibacter members.