High-quality-draft genomic sequence of Paenibacillus ferrarius CY1T with the potential to bioremediate Cd, Cr and Se contamination

Paenibacillus ferrarius CY1T (= KCTC 33419T = CCTCC AB2013369T) is a Gram-positive, aerobic, endospore-forming, motile and rod-shaped bacterium isolated from iron mineral soil. This bacterium reduces sulfate (SO₄²⁻) to S²⁻, which reacts with Cd(II) to generate precipitated CdS. It also reduces the toxic chromate [Cr(VI)] and selenite [Se(IV)] to the less bioavailable chromite [Cr(III)] and elemental selenium (Se⁰), respectively. Thus, strain CY1T has the potential to bioremediate Cd, Cr and Se contamination, which is the main reason for the interest in sequencing its genome. Here we describe the features of strain CY1T, together with the draft genome sequence and its annotation. The 9,184,169 bp genome has a G + C content of 45.6% and contains 7909 protein-coding genes and 81 RNA genes. Nine putative Se(IV)-reducing genes, five putative Cr(VI) reductase genes and nine putative sulfate-reducing genes were identified in the genome.


Introduction
The genus Paenibacillus was established in 1993 with Paenibacillus polymyxa as the type species [1,2]. Members of the genus Paenibacillus are typically aerobic, Gram-positive, rod-shaped and endospore-forming [3]. Some Paenibacillus strains are used for plant growth promotion, biocontrol, manufacturing processes and bioremediation, which makes them very important in agricultural, industrial and medical applications [4]. A variety of industrial wastes, including crude oil, diesel fuel, textile dyes, and aliphatic and aromatic organic pollutants, can be degraded by Paenibacillus strains [5][6][7][8][9][10][11]. However, the bioremediation of heavy metal(loids) contamination by Paenibacillus strains is rarely reported.
Paenibacillus ferrarius CY1T is a multi-metal(loids) resistant bacterium isolated from iron mineral soil in Hunan Province, China [12]. During cultivation, it efficiently reduces sulfate (SO₄²⁻) to S²⁻, which can precipitate with cadmium [Cd(II)] to generate CdS [13]. In addition, it reduces the more toxic chromate [Cr(VI)] and selenite [Se(IV)] to the much less toxic chromite [Cr(III)] and elemental selenium (Se⁰), respectively. Based on these features, we propose that strain CY1T is a promising candidate for bioremediation of Cd, Cr and Se contamination. To gain insight into the molecular mechanisms involved in sulfate/chromate/selenite reduction and metal(loids) resistance, and to enhance its biotechnological applications, we analyzed the high-quality draft genome of this bacterium.

Organism information
Classification and features
P. ferrarius CY1T is a Gram-positive, endospore-forming, motile and aerobic bacterium. The rod-shaped cells are 0.5-0.8 μm in width and 4.2-5.7 μm in length with peritrichous flagella (Fig. 1). Colonies are yellowish to creamy-white, smooth and circular on NA agar plates [12]. Growth occurs at 4-37°C and pH 5.0-8.0 [12]. Optimal growth occurs at 28°C and pH 6.0-7.0 (Table 1). Strain CY1T grows on NA, R2A, LB and TSA media, but cannot grow on MacConkey agar [12]. The phylogenetic relationship of P. ferrarius CY1T with other members of the genus Paenibacillus is shown in a 16S rRNA gene-based neighbor-joining tree, in which strain CY1T is closely related to Paenibacillus marchantiophytorum R55T (KP056549) (Fig. 2).
Physiological and biochemical analyses were performed using the API 20NE test (bioMérieux, France), the ID 32GN test (bioMérieux, France) and traditional classification methods. Strain CY1T is positive for oxidase and catalase activities, hydrolysis of Tween 80 and aesculin, and production of NH₃ and H₂S, but is negative for nitrate reduction, citrate utilization, egg yolk reaction, production of indole, and hydrolysis of starch, gelatin, casein, urea, L-tyrosine, arginine, Tween 20, DNA and CM-cellulose [12]. The carbon sources that can be used by strain CY1T are shown in Table 1.
The resistance of P. ferrarius CY1T to multi-metal(loids) was tested by determining the minimal inhibitory concentrations (MICs) on NA agar plates using Na₃AsO₃, K₂Sb₂(C₄H₂O₆)₂, Na₂SeO₃, K₂CrO₄, CdCl₂, PbCl₂, CuCl₂ and MnCl₂. The MICs for As(III), Sb(III), Se(IV), Cr(VI), Cd(II), Pb(II), Cu(II) and Mn(II) were 2, 1, 8, 4, 0.08, 1, 0.5 and 100 mmol/L, respectively. In addition, the abilities of strain CY1T to remove Cd(II) and to reduce Cr(VI) and Se(IV) were tested. Strain CY1T was incubated in LB medium for Cd(II) removal and in NA medium for Cr(VI) and Se(IV) reduction, since NA medium can absorb some of the Cd(II). When the OD₆₀₀ reached 0.6-0.7, CdCl₂ (50 μmol/L), K₂CrO₄ (200 μmol/L) and Na₂SeO₃ (200 μmol/L) were each added to the culture. At designated times, culture samples were taken to measure the residual concentrations of Cd(II), Cr(VI) and Se(IV). The concentration of Cd(II) was measured by atomic absorption spectrometry [14]. The concentration of Cr(VI) was measured with a UV spectrophotometer (DU800, Beckman, CA, USA) using the colorimetric diphenylcarbazide method [15], and the concentration of Se(IV) was determined by HPLC-HG-AFS (Beijing Titan Instruments Co., Ltd., China) [16]. The results showed that strain CY1T could remove nearly 50 μmol/L Cd(II) in 72 h (Fig. 3a) and reduce 200 μmol/L Cr(VI) and Se(IV) in 5 h and 6 h, respectively (Fig. 3b, c). The removed Cd(II) was present as pellets, most probably CdS precipitated by the reaction of Cd(II) with H₂S.
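The "respectively" pairing of MICs above, and the relation between the assay doses and those MICs, can be made explicit with a short Python sketch. The values are the ones stated in the text; the variable and function names are illustrative only and do not come from the original study.

```python
# MICs in mmol/L, paired in the order given in the text.
species = ["As(III)", "Sb(III)", "Se(IV)", "Cr(VI)",
           "Cd(II)", "Pb(II)", "Cu(II)", "Mn(II)"]
mic_values = [2, 1, 8, 4, 0.08, 1, 0.5, 100]
mic_mmol_per_l = dict(zip(species, mic_values))

# Doses added in the removal/reduction assays, in umol/L.
dose_umol_per_l = {"Cd(II)": 50, "Cr(VI)": 200, "Se(IV)": 200}


def dose_fraction_of_mic(ion):
    """Return the assay dose as a fraction of the MIC (both converted to mmol/L)."""
    return (dose_umol_per_l[ion] / 1000.0) / mic_mmol_per_l[ion]


# Rank the metal(loid)s from most to least tolerated.
ranked = sorted(mic_mmol_per_l, key=mic_mmol_per_l.get, reverse=True)

for ion in dose_umol_per_l:
    print(ion, "dose / MIC =", dose_fraction_of_mic(ion))
```

Under these numbers, every assay dose lies below the corresponding MIC, so the removal and reduction experiments were run at sub-inhibitory concentrations.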

Genome sequencing information
Genome project history
Strain CY1T was selected for genome sequencing on the basis of its ability to remove Cd(II) and to reduce Cr(VI) and Se(IV). These characteristics make strain CY1T valuable for genetic study and for the bioremediation of Cd, Cr and Se contamination. The draft genome sequence is deposited at DDBJ/EMBL/GenBank under the accession number MBTG00000000. The final genome consists of 73 scaffolds with 289.77× coverage. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
An overnight culture of strain CY1T was inoculated into 50 mL of NA medium and incubated at 28°C with shaking at 120 rpm. After incubation for 36 h, the bacterial cells were harvested by centrifugation (13,400×g for 5 min at 4°C). Genomic DNA was extracted using the QIAamp kit (Qiagen, Germany). The quality and quantity of the DNA were determined with a spectrophotometer (NanoDrop 2000, Thermo). Then, 10 μg of DNA was sent to Bio-broad Technology Co., Ltd., Wuhan, China for sequencing.

Genome sequencing and assembly
Genome sequencing and assembly were performed by Bio-broad Technology Co., Ltd., Wuhan, China, and all original sequence data can be found at the NCBI Sequence Read Archive. An Illumina standard shotgun library was constructed and sequenced on an Illumina HiSeq 2000 platform with a paired-end sequencing strategy (300 bp insert size) [17]. The following quality control steps were performed to remove low-quality reads: 1) adapter sequences were removed from the reads; 2) ambiguous bases (N) at the 5′ end and reads with a quality score lower than 20 were trimmed; and 3) reads containing more than 10% N or shorter than 50 bp (after removal of adapters and 5′-end N) were filtered out. The assembly of the CY1T genome is based on 20,189,278 quality reads totaling 3,000,798,615 bp, which provides a coverage of 289.77×. The reads were assembled into 75 contigs (> 200 bp) using SOAPdenovo v2.04 [18], and the gaps between the contigs were closed with GapCloser v1.12 [19].
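The read-filtering rules described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used by the sequencing provider. It assumes adapters have already been removed, that bases are upper-case, that quality scores are given as one integer per base, and that the "quality score lower than 20" rule applies to the mean quality of a read; the source specifies none of these details. All function names are hypothetical.

```python
def trim_leading_n(seq, quals):
    """Strip ambiguous bases (N) from the 5' end, keeping qualities aligned."""
    trimmed = seq.lstrip("N")
    offset = len(seq) - len(trimmed)
    return trimmed, quals[offset:]


def passes_qc(seq, quals, min_len=50, max_n_frac=0.10, min_mean_q=20):
    """Apply the length, N-content and (assumed) mean-quality filters to one read."""
    seq, quals = trim_leading_n(seq, quals)
    if len(seq) < min_len:
        return False  # shorter than 50 bp after trimming
    if seq.count("N") / len(seq) > max_n_frac:
        return False  # more than 10% ambiguous bases
    # Assumption: the threshold of 20 is applied to the mean per-base quality.
    return sum(quals) / len(quals) >= min_mean_q


def naive_coverage(total_bases, genome_size):
    """Sequencing depth as total read bases divided by genome size."""
    return total_bases / genome_size


print(passes_qc("NNN" + "ACGT" * 15, [2] * 3 + [30] * 60))
print(round(naive_coverage(3_000_798_615, 9_184_169), 2))
```

The simple ratio of the stated read total to the stated genome size need not equal the reported 289.77×, since the source does not say how that figure was derived (for example, which read set or genome size estimate was used).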

Genome annotation
The draft genome of strain CY1T was annotated through the RAST server version 2.0 and the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). Genes were identified using the gene caller GeneMarkS+ with the similarity-based gene detection approach [20]. Pseudogenes were also predicted using the NCBI PGAP. Internal gene clustering was performed by OrthoMCL using a match cutoff of 50% and an E-value cutoff of 1e-5 [21,22]. The COG functional categories were assigned by the WebMGA server [23] with an E-value cutoff of 1e-10. The translations of the predicted CDSs were used to search against the Pfam protein family database [24] and the KEGG database [25]. The transmembrane helices and signal peptides were predicted by TMHMM v. 2.0 [26] and SignalP 4.1 [27], respectively.
Fig. 4 A graphical circular map of P. ferrarius CY1T. From outside to center, rings 1 and 2 denote the predicted coding sequences on the forward and reverse strands, with each gene colored by its assigned COG category; ring 3 shows the G + C content plot and ring 4 shows the GC skew
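To illustrate how the two clustering cutoffs act together, the following Python sketch filters a list of pairwise protein hits. It is not OrthoMCL itself: the `Hit` record, the example hits and the function name are hypothetical, and "percent match" is treated here as a single precomputed number per hit.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_match: float  # assumed to be precomputed for each hit
    evalue: float


def keep_hit(hit, min_percent_match=50.0, max_evalue=1e-5):
    """Keep a hit only if it satisfies both the match and E-value cutoffs."""
    return hit.percent_match >= min_percent_match and hit.evalue <= max_evalue


# Hypothetical hits, for illustration only.
hits = [
    Hit("geneA", "geneB", 82.0, 1e-40),  # passes both cutoffs
    Hit("geneA", "geneC", 35.0, 1e-20),  # fails the 50% match cutoff
    Hit("geneD", "geneE", 70.0, 1e-3),   # fails the 1e-5 E-value cutoff
]

kept = [h for h in hits if keep_hit(h)]
print([(h.query, h.subject) for h in kept])
```

The same function with `max_evalue=1e-10` and no match requirement (`min_percent_match=0.0`) mirrors the stricter E-value cutoff quoted for the COG assignment.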

Genome properties
The genome of strain CY1T has a size of 9,184,169 bp and a G + C content of 45.6% (Table 3). The genome contains 8260 coding sequences, 19 rRNA, 58 tRNA, and 4 ncRNA genes. Among the 7909 protein-coding genes, 4231 were assigned putative functions, while the other 3678 were designated as hypothetical proteins. In addition, 6632 genes were categorized into COG functional groups. The genome statistics are shown in Table 3, and the classification of genes into COG functional categories is summarized in Table 4.
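The proportions implied by these counts can be checked with a few lines of Python. The counts are taken from the text; the helper names are illustrative, and `gc_percent` is a generic sketch of how a G + C percentage is computed from a sequence, not the tool used for Table 3.

```python
def gc_percent(seq):
    """G + C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Counts stated in the text.
protein_coding = 7909
with_function = 4231
hypothetical = 3678
cog_assigned = 6632
rna_genes = {"rRNA": 19, "tRNA": 58, "ncRNA": 4}

print("RNA genes:", sum(rna_genes.values()))
print("hypothetical (%):", round(percent(hypothetical, protein_coding), 1))
print("COG-assigned (%):", round(percent(cog_assigned, protein_coding), 1))
```

The functional and hypothetical counts sum to the 7909 protein-coding genes, and the three RNA classes sum to the 81 RNA genes quoted in the abstract.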

Insights from the genome sequence
P. ferrarius CY1T is a multi-metal(loids) resistant bacterium capable of SO₄²⁻, Cr(VI) and Se(IV) reduction, suggesting that it has evolved a number of strategies to adapt to environments contaminated with heavy metals (or metalloids). To identify pathways and enzymes involved in SO₄²⁻, Cr(VI) and Se(IV) reduction, a high-quality draft genome sequence of strain CY1T was generated. The map of the P. ferrarius CY1T genome is shown in Fig. 4.
KEGG analysis showed that strain CY1T contains a complete SO₄²⁻ reduction pathway, which is consistent with its H₂S-producing phenotype. The genes responsible for SO₄²⁻ reduction encode the sulfate ABC transporter CysPWA, sulfate adenylyltransferase CysD, adenylylsulfate kinase CysC, adenylylsulfate reductase CysH and sulfite reductase CysJI (Table 5). The S²⁻ generated from SO₄²⁻ reduction could react with Cd(II) to form precipitated CdS [13], which may contribute to the […] [28], were identified in the genome of strain CY1T (Table 5). It has been reported that thioredoxin reductase ThxR and NADH:flavin oxidoreductase can reduce Se(IV) in Pseudomonas seleniipraecipitans and Rhizobium selenitireducens, respectively [29][30][31]. According to the NCBI and RAST annotations, seven thioredoxin reductases and two NADH-dependent flavin oxidoreductases were found in the genome of strain CY1T (Table 5), and some of these proteins may be responsible for Se(IV) reduction in strain CY1T.
Strain CY1T can tolerate multi-metal(loids), such as As(III), Sb(III), Cr(VI), Cd(II), Pb(II), Cu(II) and Mn(II). As expected, various metal resistance genes were identified in its genome (Table 6). Several transporters were predicted to be responsible for the efflux of these metal(loids). In addition, genes encoding the transcriptional regulator ArsR and arsenate reductase ArsC, which are involved in As(III)/Sb(III) resistance, were also found (Table 6) [32][33][34]. Recently, it has been reported that an oxidoreductase, AnoA, which belongs to the short-chain dehydrogenase/reductase family, and catalase KatA, which is responsible for H₂O₂ degradation, are both involved in bacterial Sb(III) oxidation/resistance in Agrobacterium tumefaciens GW4 [35][36][37][38]. One AnoA homologue oxidoreductase gene and five catalase genes were identified in the genome of strain CY1T (Table 6), which may be associated with Sb(III) oxidation/resistance.

Conclusions
The genome of P. ferrarius CY1T harbors various genes responsible for sulfate transport and reduction, chromate and selenite reduction, and resistance to multi-metal(loids), which is consistent with its phenotypes. To date, the use of Paenibacillus species for the immobilization of heavy metals (or metalloids) is still limited, and the genes and enzymes involved in Cr(VI) and Se(IV) reduction are poorly understood in Paenibacillus members. The genomic sequence of strain CY1T enriches the genome information available for Paenibacillus strains. More importantly, the genome information provides a basis for understanding the molecular mechanisms of microbial redox transformations of metal(loids).