High-quality draft genome sequence of Aquidulcibacter paucihalophilus TH1–2T isolated from cyanobacterial aggregates in a eutrophic lake

Aquidulcibacter paucihalophilus TH1–2T is a member of the family Caulobacteraceae within the Alphaproteobacteria, isolated from cyanobacterial aggregates in a eutrophic lake. The draft genome comprises 3,711,627 bp and 3489 predicted protein-coding genes. The genome of strain TH1–2T has 270 genes encoding peptidases, among which metallo and serine peptidases were found most frequently. A high number of genes encoding carbohydrate active enzymes (141 CAZymes) is also present in the genome of strain TH1–2T. Among the CAZymes, 47 glycoside hydrolases, 37 glycosyl transferases, 38 carbohydrate esterases, nine auxiliary activities, seven carbohydrate-binding modules, and three polysaccharide lyases were identified. Accordingly, strain TH1–2T has a high number of transporters (91), dominated by ATP-binding cassette transporters (61) and TonB-dependent transporters (TBDTs; 28). Most TBDTs belong to Group I, which consists of transporters for various types of dissolved organic matter. These genome features indicate adaptation to the microenvironments of cyanobacterial aggregates.


Introduction
Lake Taihu is the third largest freshwater lake in China, located in the rapidly developing, economically important Changjiang (Yangtze) River Delta. Microcystis spp. often form large mucilaginous blooms in the lake due to anthropogenic nutrient over-enrichment. These bloom aggregates are composed of extracellular polymeric substances, which are produced through excretion, secretion, sorption and cell lysis and form a heterogeneous polymer consisting mainly of polysaccharides, proteins, lipids and humic substances [1]. Within the bloom, a variety of niches are created in a dense scum that can be 10-30 cm thick [2]. Diel shifts change the dissolved oxygen levels, with oxygen enrichment during the day and depletion at night, and microaerobic zones are present at all times within the Microcystis spp. blooms [3]. Many heterotrophic bacteria are known to live in association with cyanobacteria [4,5]. Bacterial taxa within the cyanobacterial aggregates possibly catalyze the turnover of complex organic matter released by cyanobacteria, recycling previously loaded nutrient sources and thereby helping to maintain the dominance of the cyanobacterial bloom [5].
Aquidulcibacter paucihalophilus type strain TH1-2T (= CGMCC 1.12979T = LMG 28362T) is a member of the family Caulobacteraceae within the Alphaproteobacteria, isolated from cyanobacterial aggregates in Lake Taihu, China [6]. The genus Aquidulcibacter currently includes only one cultivated strain. The sequenced genome of A. paucihalophilus TH1-2T will provide the genetic basis for a better understanding of its adaptation to cyanobacterial aggregates and its ecological function during the cyanobacterial bloom.
Here, we present the genome of A. paucihalophilus TH1-2T with special emphasis on the genes coding for carbohydrate active enzymes and peptidases. The second focus is on genes coding for dedicated transport systems for the uptake of macromolecule decomposition products released by the cyanobacteria Microcystis spp., such as ATP-binding cassette transporters and the TonB-dependent transporter system.

Classification and features
Cyanobacterial bloom samples were taken from Lake Taihu. Samples were transferred to 500 mL beakers and left at room temperature for 2 h, which allowed the cyanobacterial aggregates to float to the top of the beaker. Several of the largest aggregates were selected and washed three times in sterile lake water. A. paucihalophilus strain TH1-2T was isolated from these aggregates [6]. The 16S rRNA gene sequence similarities between strain TH1-2T and other strains were <91%. The position of strain TH1-2T relative to its phylogenetic neighbors is shown in Fig. 1. Strain TH1-2T formed a deeply separated branch together with the genera Asticcacaulis, Brevundimonas, Caulobacter and Phenylobacterium, which belong to the family Caulobacteraceae, and was separate from the cluster of genera of the family Hyphomonadaceae (Fig. 1).

Genome sequencing information
Genome project history
A. paucihalophilus strain TH1-2T was selected for sequencing in 2017 based on its phylogenetic position and its isolation environment [6]. The quality draft assembly and annotation were made available for public access on Apr 24, 2017. The genome project is deposited in the Genomes OnLine Database as project Gp0225845. This Whole Genome Shotgun project has been deposited at GenBank under the accession NCSQ00000000.1. The NCBI accession number for the Bioproject is PRJNA382246. Table 2 presents the project information and its association with MIGS version 2.0 compliance [7].

Growth conditions and genomic DNA preparation
A. paucihalophilus strain TH1-2T was grown on R2A agar medium at 30 °C, as previously described [6]. Genomic DNA was isolated from 0.5 g of cell paste using the Gentra Puregene Yeast/Bact. Kit (Qiagen) as recommended by the manufacturer.

Genome sequencing and assembly
Whole-genome sequencing was performed using Illumina technology. A paired-end sequencing library was prepared with the Illumina Nextera XT library preparation kit and sequenced on the Illumina HiSeq PE150 platform as described by the manufacturer (Illumina, San Diego, CA, USA). A total of 17,033,314 paired-end reads totaling 5109.9 Mbp remained after quality trimming and adapter removal with Trimmomatic-0.33 [8]. The trimmed reads represented an average genome coverage of ~1380-fold based on the size of the assembled draft genome of strain TH1-2T. De novo assembly of all trimmed reads with SOAPdenovo v2.0 [9] resulted in 174 contigs. A summary of project information is shown in Table 2.
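The coverage figure can be checked from the numbers quoted above. The following is a minimal sketch, assuming that coverage is simply the total trimmed sequence divided by the size of the draft assembly; the function and variable names are illustrative and are not part of the original workflow.

```python
# Figures quoted in the sequencing paragraph above.
TRIMMED_READ_PAIRS = 17_033_314   # paired-end reads remaining after trimming
TRIMMED_BASES_MBP = 5109.9        # total trimmed sequence, in Mbp
ASSEMBLY_SIZE_BP = 3_711_627      # size of the draft assembly, in bp


def fold_coverage(total_bases_bp: float, genome_size_bp: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases_bp / genome_size_bp


total_bases_bp = TRIMMED_BASES_MBP * 1_000_000
coverage = fold_coverage(total_bases_bp, ASSEMBLY_SIZE_BP)

# Sanity check (illustrative): bases contributed by each read pair.
bases_per_pair = total_bases_bp / TRIMMED_READ_PAIRS

print(f"Average coverage: {coverage:.0f}-fold")
print(f"Mean bases per read pair: {bases_per_pair:.0f}")
```

Under this assumption the estimate comes out slightly below 1380-fold, which agrees with the rounded value in the text, and the mean of roughly 300 bases per read pair is what a 2 x 150 bp run would give.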

Genome annotation
Protein-coding genes were identified with Prodigal v2.50 as part of the genome annotation pipeline of the Integrated Microbial Genomes Expert Review platform. The predicted CDSs were translated and used to search the National Center for Biotechnology Information nonredundant database and the UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [10], RNAmmer [11], Rfam [12], TMHMM [13] and SignalP [14]. Additional gene prediction analyses and functional annotation were performed within the IMG-Expert Review platform [15].

Genome properties
The assembly of the draft genome sequence consists of 174 contigs amounting to 3,711,627 bp. The G+C content is 55.7 mol% (Table 3). A total of 3544 genes were predicted, of which 3489 are protein-coding; 2758 protein-coding genes (77.82% of total genes) were associated with predicted functions. Of the RNA genes, 42 are tRNAs and 3 are rRNAs. The genome statistics are further provided in Table 3. The distribution of genes into functional categories (clusters of orthologous groups) is shown in Table 4.
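The percentages in this paragraph follow directly from the gene counts. The sketch below recomputes them; it assumes that every gene that is not protein-coding is an RNA gene, which the text does not state explicitly, and all names are illustrative.

```python
# Gene counts quoted in the genome properties paragraph above.
TOTAL_GENES = 3544
PROTEIN_CODING_GENES = 3489
GENES_WITH_FUNCTION = 2758
TRNA_GENES = 42
RRNA_GENES = 3


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


function_pct = percent(GENES_WITH_FUNCTION, TOTAL_GENES)
coding_pct = percent(PROTEIN_CODING_GENES, TOTAL_GENES)

# Assumption: all genes that are not protein-coding are RNA genes.
non_coding_genes = TOTAL_GENES - PROTEIN_CODING_GENES
other_rna_genes = non_coding_genes - TRNA_GENES - RRNA_GENES

print(f"Genes with predicted function: {function_pct:.2f}% of total genes")
print(f"Protein-coding genes: {coding_pct:.2f}% of total genes")
print(f"Non-protein-coding genes: {non_coding_genes} "
      f"({other_rna_genes} besides tRNAs and rRNAs)")
```

The first value reproduces the 77.82% given in the text, which confirms that the percentage is expressed relative to all 3544 genes rather than to the protein-coding genes alone.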

Energy metabolism
A. paucihalophilus TH1-2T has the complete Embden-Meyerhof-Parnas pathway, pentose phosphate pathway and Entner-Doudoroff pathway. For pyruvate oxidation to acetyl-coenzyme A, TH1-2T contains a three-component pyruvate dehydrogenase complex. TH1-2T has a complete tricarboxylic acid cycle with the glyoxylate shunt and a redox chain for oxygen respiration, including a sodium-transporting NAD(H):quinone oxidoreductase (complex I), succinate dehydrogenase (complex II), cytochrome c type (complex IV) terminal oxidases, and an F0F1-type ATPase. Complex III (cytochrome bc1) is absent. Under anoxic conditions, TH1-2T has the potential for a mixed acid fermentation, such as acetyl-CoA fermentation to butyrate, as indicated by the presence of a 3-hydroxybutyryl-CoA dehydrogenase. TH1-2T likely stores energy and phosphorus in the form of polyphosphate, since the genome encodes an exopolyphosphatase and a polyphosphate kinase.
A. paucihalophilus TH1-2T is able to grow on organic acids, amino acids, and various sugars [6]. Based on COG functional categories (Table 4), the majority of A. paucihalophilus genes are associated with translation, ribosomal structure and biogenesis, amino acid transport and metabolism, lipid transport and metabolism, transcription, cell wall/membrane/envelope biogenesis, coenzyme transport and metabolism, energy production and conversion, and carbohydrate transport and metabolism, each of which accounts for more than 5% of the genes. The high number of proteins in these classes indicates that A. paucihalophilus TH1-2T possesses a delicate regulation system as well as a requirement for sufficient organic matter in its lifestyle.
Comparison of the functional categories with those of other model bacteria (Escherichia coli K12 [16], Pseudomonas putida KT2440 [17], Shewanella oneidensis MR-1 [18]) revealed remarkable differences in the distribution of functional categories of predicted proteins (Additional file 1: Table S1). A. paucihalophilus TH1-2T had the highest proportion of genes devoted to lipid metabolism, which was even higher than that of P. putida KT2440 (4.01%), an important environmental bacterium involved in biodegradation. Of the genes assigned to lipid metabolism, 33 were related to fatty acid degradation based on the KEGG database. A. paucihalophilus TH1-2T also had an increased proportion of genes for coenzyme transport and metabolism, carbohydrate transport and metabolism, and protein turnover. The distinctive percentages of genes for these metabolic functions indicate that A. paucihalophilus TH1-2T has sophisticated systems to take up and metabolize lipids, carbohydrates, and proteins. This provides clues to the different roles of A. paucihalophilus strain TH1-2T in cyanobacterial aggregate environments.

Carbohydrate active enzymes
A. paucihalophilus TH1-2T was isolated from cyanobacterial aggregates and hydrolyzes casein, starch and hemicellulose [6]. Therefore, we compared the predicted CDSs against the CAZyme and dbCAN [19] databases. The genome of strain TH1-2T comprises a high number and high diversity of carbohydrate active enzymes, including a total of 47 glycoside hydrolases (GHs), 37 glycosyl transferases (GTs), 38 carbohydrate esterases (CEs), 9 auxiliary activities (AAs), 7 carbohydrate-binding modules (CBMs), and 3 polysaccharide lyases (PLs) (Table 5). The A. paucihalophilus TH1-2T genome encodes CAZymes with expected functions such as peptidoglycan synthesis (GT28 and GT51 families) and peptidoglycan remodelling/degradation (GH3, GH23, GH24, GH102 and GH103 families), as well as the lipopolysaccharide biosynthesis pathway (GT9, GT19, GT30 and GT83 families). Furthermore, A. paucihalophilus TH1-2T has the potential to produce glucose from glycogen by candidate α-amylases belonging to the GH13 family (eight in total). In addition, two further cellulase classes involved in the complete degradation of hemicellulose are present: endo-1,4-β-mannosidases of family GH5 (2 copies) and β-glucosidases of family GH3 (4 copies).
Members of families CE1 and CE10, which represent a significant proportion (71%) of the total CEs, share the common activities of carboxylesterase and endo-1,4-β-xylanase [20]. However, they show great diversity in substrate specificity; for example, the vast majority of CE10 enzymes act on non-carbohydrate substrates [21]. Of the 12 GT families identified in the TH1-2T genome, enzymes belonging to families GT2 and GT4 (cellulose synthase, chitin synthase, α-glucosyltransferase, etc.) represent a significant proportion (64%) of the total GTs.

Among the lignin-degrading enzymes, CAZyme families AA3 (glucose/methanol/choline oxidoreductases) and AA7 (glucooligosaccharide oxidase) appear to be present in the genome of strain TH1-2T (Table 5). The family AA3 enzymes provide hydrogen peroxide required by the family AA2 enzymes (class II peroxidases) for catalytic activity, whereas family AA7 enzymes are known to be involved in the biotransformation or detoxification of lignocellulosic biomass [22]. Generally, the family AA1 enzymes (multicopper oxidases) and AA2 enzymes (class II peroxidases) are the main oxidative enzymes that degrade phenolic and non-phenolic structures of lignin.
The presence of pectate lyases of family PL1 (2 copies) suggests that this strain could degrade pectin associated with cyanobacteria. Carbohydrate-binding modules (CBMs) have no reported enzymatic activity of their own, but can potentiate the activities of all other CAZymes (GHs, CEs, and auxiliary enzymes) or act as an appended module of CAZymes [23,24].
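The class totals reported in this section can be tallied as a quick consistency check. The sketch below uses only the counts and percentages quoted in the text; the gene counts derived from the 71% and 64% figures are rounded estimates rather than values taken from Table 5, and all variable names are illustrative.

```python
# CAZyme gene counts per class, as quoted in this section.
cazyme_counts = {
    "GH": 47,   # glycoside hydrolases
    "GT": 37,   # glycosyl transferases
    "CE": 38,   # carbohydrate esterases
    "AA": 9,    # auxiliary activities
    "CBM": 7,   # carbohydrate-binding modules
    "PL": 3,    # polysaccharide lyases
}

total_cazymes = sum(cazyme_counts.values())
class_shares = {
    name: 100.0 * count / total_cazymes
    for name, count in cazyme_counts.items()
}

for name, count in sorted(cazyme_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>3}: {count:3d} genes ({class_shares[name]:.1f}%)")
print(f"Total CAZymes: {total_cazymes}")

# Estimates only: gene counts implied by the quoted percentages.
ce1_ce10_estimate = round(0.71 * cazyme_counts["CE"])  # CE1 + CE10 among CEs
gt2_gt4_estimate = round(0.64 * cazyme_counts["GT"])   # GT2 + GT4 among GTs
print(f"Estimated CE1 + CE10 genes: {ce1_ce10_estimate}")
print(f"Estimated GT2 + GT4 genes: {gt2_gt4_estimate}")
```

The class counts sum to 141, which matches the total number of CAZymes stated in the abstract.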

Peptidases
The MEROPS annotation was carried out by searching the sequences against the MEROPS 12.0 database [25] (access date: 2017.10.16, version: pepunit.lib) as described in Hahnke et al. [26]. The genome of A. paucihalophilus strain TH1-2T comprises 270 identified peptidase genes (or homologues), mostly serine peptidases (S, 133), metallo peptidases (M, 56) and cysteine peptidases (C, 27) (Table 6). Among serine peptidases, members of the families S09 and S33, both of which cleave mainly prolyl bonds [27], are most prevalent in A. paucihalophilus TH1-2T. S09 members act mostly on oligopeptides, probably due to the confined space in the N-terminus of their β-propeller tunnel [28,29], and S33 members release an N-terminal residue from a peptide, preferably (but not exclusively) a proline [28]. So far, S09 and S33 peptidases have been connected to the degradation of proline-rich proteins from animals [30-32] and are not known to have a role in the biodegradation of algal biomass.
Among the metallopeptidases present, members of family M23 are among the most frequent. M23 family members have been shown to take part in the extracellular degradation of bacterial peptidoglycan, either as a defense or as a feeding mechanism [33]. The complete extracellular decomposition of peptides to amino acids requires M20 and M28 family exopeptidases [27], both of which are also abundant in the A. paucihalophilus TH1-2T genome.

Transport systems
Sixty-one ATP-binding cassette (ABC) transporters, one tripartite ATP-independent periplasmic transporter, one phosphotransferase system transporter, and 28 TonB-dependent transporters (TBDTs) were identified in the TH1-2T genome. ABC transporters are ubiquitous in bacteria and function in the import of growth substrates or factors, including carbohydrates, amino acids, polypeptides, vitamins, and metal-chelate complexes [34]. TBDTs in the bacterial outer membrane often promote the transport of rare nutrients and are known for their high-affinity uptake of iron complexes. Experimental data reveal that carbohydrates, amino acids, and organic acids are TonB-dependent substrates [35,36]. The 28 TBDTs detected in the TH1-2T genome were classified by aligning these genes with genes from the different clusters defined by Tang et al. [37]. Group I TBDTs, which dominate in the TH1-2T genome, consist of transporters for various types of dissolved organic matter, including carbohydrates, amino acids, lipids, organic acids, and protein degradation products (Table 7). Nine genes were identified as group III TBDTs, which transport iron from heme or iron proteins with high affinity (Table 7). Thirty-seven genes were related to porphyrin and chlorophyll metabolism based on the KEGG database.
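The transporter counts can be tallied in the same way. This is a minimal sketch that uses only the numbers given in the paragraph above; the short labels TRAP and PTS are the usual abbreviations for the tripartite ATP-independent periplasmic and phosphotransferase systems, and the variable names are illustrative. The number of Group I TBDTs is not stated in the text, so it is not computed here.

```python
# Transporter counts quoted in the paragraph above.
transporter_counts = {
    "ABC": 61,    # ATP-binding cassette transporters
    "TRAP": 1,    # tripartite ATP-independent periplasmic transporter
    "PTS": 1,     # phosphotransferase system transporter
    "TBDT": 28,   # TonB-dependent transporters
}
GROUP_III_TBDTS = 9  # TBDTs assigned to group III (iron from heme or iron proteins)

total_transporters = sum(transporter_counts.values())
abc_share = 100.0 * transporter_counts["ABC"] / total_transporters
tbdt_share = 100.0 * transporter_counts["TBDT"] / total_transporters
group_iii_share = 100.0 * GROUP_III_TBDTS / transporter_counts["TBDT"]

print(f"Total transporters: {total_transporters}")
print(f"ABC transporters: {abc_share:.1f}% of all transporters")
print(f"TBDTs: {tbdt_share:.1f}% of all transporters")
print(f"Group III: {group_iii_share:.1f}% of TBDTs")
```

The total of 91 agrees with the number of transporters given in the abstract.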

Conclusions
The genome of A. paucihalophilus TH1-2T contains a relatively high number of genes coding for fatty acid degradation, carbohydrate active enzymes, peptidases, and transporters. The availability of the A. paucihalophilus TH1-2T draft genome sequence may provide better insights into its primary metabolism and other phenotypic characteristics of interest. Further studies characterizing its carbon cycling genes would clarify its importance in biogeochemical cycling, particularly for ecological restoration of the eutrophic lake.