Draft genome sequence of the type strain of the sulfur-oxidizing acidophile, Acidithiobacillus albertensis (DSM 14366)

Acidithiobacillus albertensis, first described in the 1980s, is an extremely acidophilic, mesophilic, obligately autotrophic sulfur oxidizer with potential importance in the bioleaching of sulfidic metal ores. Here we present the draft genome sequence of Acidithiobacillus albertensis DSM 14366T, thereby filling a long-standing gap in the genomics of the acidithiobacilli and providing further insight into the biology of the non-iron-oxidizing members of the genus Acidithiobacillus. The assembled genome is 3.1 Mb and contains 47 tRNA genes, one tmRNA gene and 2 rRNA operons, along with 3149 predicted protein-coding genes. The Whole Genome Shotgun project was deposited in DDBJ/EMBL/GenBank under the accession MOAD00000000.


Introduction
The genus Acidithiobacillus [1] comprises a group of obligately acidophilic, chemolithotrophic bacteria that derive energy from the oxidation of reduced sulfur compounds, thereby contributing both to the bioleaching of ores and to the formation of polluting mine drainage waters. Although until relatively recently they were considered members of the Gammaproteobacteria, phylogenetic analysis of concatenated ribosomal proteins led to the reclassification of the order Acidithiobacillales into a new class of Proteobacteria, the Acidithiobacillia [2]. Seven species are currently recognized: Acidithiobacillus thiooxidans [3], A. ferrooxidans [4], A. albertensis [5], A. caldus [6], A. ferrivorans [7], A. ferridurans [8] and A. ferriphilus [9]. Four of these also catalyze the dissimilatory oxidation of ferrous iron, while three (A. thiooxidans, A. albertensis and A. caldus) do not.
Because they produce acid biogenically and oxidize reduced sulfur compounds, most species of the genus have been exploited industrially to recover valuable metals such as copper and gold, and other relevant elements, from ores and wastes ([10] and references therein). They are not only frequent members of most analyzed bioleaching consortia but also tend to be numerically significant ([11] and references therein). Owing to their biotechnological relevance, most species of the genus have been the object of intensive research since the early 1900s [12]. Yet, despite compelling evidence for the widespread occurrence of A. albertensis [13][14][15][16] and for its potential in chalcopyrite and sphalerite bioleaching [13,17], A. albertensis remains the least studied species of all acidithiobacilli.
Whole genome sequences of a number of representative strains of four Acidithiobacillus species (A. thiooxidans, A. ferrooxidans, A. caldus and A. ferrivorans) have been reported to date [18], and genome comparisons have been performed both between and within species [19][20][21][22][23]. However, no genome sequence is yet available for A. albertensis. Given that A. albertensis resembles A. thiooxidans in several aspects of its biology and physiology [5,24], and that the two species tend to be confounded in natural and industrial environments owing to their high 16S rRNA gene sequence similarity [25], further characterization of A. albertensis is required to shed light on species-specific processes. The availability of the whole-genome sequence of the A. albertensis type strain is a necessary first step in this direction.
Here we describe the first draft genome sequence and annotation of the type strain of A. albertensis (DSM 14366T), along with relevant genomic indices of the taxon. The data presented fill a long-standing gap in the understanding of the genomic landscape of the acidithiobacilli and of the biology of A. albertensis, and pave the way for more encompassing phylogenomic analyses of the species complex of these fascinating model acidophiles.

Classification and features
Originally described by Bryant and colleagues [5], A. albertensis (formerly Thiobacillus albertis) was recognized as a new species in 1988 [26]. The epithet albertensis (al.ber.ten'sis. M.L. adj. albertensis, Albertan) means pertaining to Alberta, the Canadian province from which the species was first isolated. The type strain is DSM 14366/ATCC 35403. A. albertensis was described as a mesophilic, obligately autotrophic sulfur oxidizer unable to oxidize iron. Characteristics differentiating it from other acidithiobacilli include yellowish colonies on solid sulfur-containing media, slightly larger cells, a tuft of polar flagella, a glycocalyx and a number of large intracellular sulfur globules [5,17]. A. albertensis was reported to have a narrower pH range for growth (2-4.5) and a slightly higher optimum growth temperature than other members of the genus [1], although these features may vary between strains [17]. Additional properties of A. albertensis are listed in Table 1.
Phylogenetic analysis of the 16S rRNA gene sequence of A. albertensis DSM 14366T places the type strain close to a few other cultivated members of the species and several uncultured clones deposited in GenBank, all of which are 100% identical at the 16S rRNA gene level (Fig. 1). The A. albertensis type strain and its closest relatives branch apart from the A. thiooxidans type strain.

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA gene similarity to other members of the genus Acidithiobacillus. This is the first draft genome sequence of an A. albertensis strain. The Whole Genome Shotgun project has been deposited at GenBank under the accession MOAD00000000. The version described in this paper is the first version, MOAD01000000, and consists of one scaffold (approximately 2.7 Mbp) and 140 smaller contigs. Table 2 presents the project information and its association with MIGS version 2.0 compliance [27].

Growth conditions and genomic DNA preparation
A. albertensis strain DSM 14366T was obtained from the DSMZ collection and grown in DSMZ medium 71 at 30 °C. DNA isolation and routine manipulations were carried out following standard protocols [28].

Table 1 evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [50].

Genome sequencing and assembly
The genome of A. albertensis DSM 14366T was sequenced with Illumina technology (MiSeq platform) using paired-end libraries. Duplicate high-quality libraries with insert sizes of ~460 bp were prepared using the Nextera™ DNA Sample Preparation kit (Illumina, USA). Raw sequencing reads were preprocessed using Trimmomatic v0.32 [29]. Only reads with a quality score above Q30 (an error probability below 1 in 1000 per base call) and a length greater than 35 nt were retained. High-quality reads were assembled de novo using Velvet v1.2.10 [30] with a k-mer length of 151, yielding an N50 of 39,225 bp. Contigs with at least 37-fold coverage were further scaffolded. The final draft assembly contained one scaffold (approximately 2.7 Mbp) and 140 smaller contigs. The total size of the draft genome is ~3.1 Mbp, and the final assembly is based on 3.1 Gbp of Illumina data.
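The quality and assembly metrics quoted above follow standard definitions, which can be sketched in Python under stated assumptions: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means 1 error in 1000 base calls; N50 is the contig length at which contigs of that length or longer cover at least half of the assembly; and average depth is total sequenced bases divided by genome size. The read filter shown is a simplified mean-quality rule, not Trimmomatic's actual trimming steps, and all function names are hypothetical.

```python
# Minimal sketch of the read-filtering and assembly metrics named in the text.
# Assumptions: a simplified mean-quality filter (not Trimmomatic's algorithm)
# and the conventional N50 definition; function names are hypothetical.


def phred_to_error(q: float) -> float:
    """Error probability of a base call with Phred quality q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def keep_read(quals: list[int], min_mean_q: float = 30.0, min_len: int = 35) -> bool:
    """Keep a read if it is longer than min_len nt and its mean quality reaches min_mean_q."""
    if len(quals) <= min_len:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


print(phred_to_error(30))                   # error probability at Q30, about 0.001
print(n50([100, 80, 50, 30, 20]))           # 80
print(round(fold_coverage(3.1e9, 3.1e6)))   # 1000, using the rounded figures in the text
```

With the rounded figures given in the text (3.1 Gbp of reads, ~3.1 Mbp genome), the implied average depth is roughly 1000-fold; the exact value depends on the unrounded numbers.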

Genome annotations
Genes were identified using Glimmer 3.02 [31] as part of the RAST annotation pipeline [32]. tRNA and tmRNA genes were predicted using ARAGORN v1.2.36 [33], and rRNA genes were predicted with HMMER3 [34]. Additional gene prediction analysis and manual curation of functional annotations were performed using in-house resources. The predicted CDSs were used to search the National Center for Biotechnology Information non-redundant database, UniProt, TIGRFam, …

Fig. 1 (caption, partial): … A. ferridurans ATCC 33020T, NR_117036; A. caldus ATCC 51756T, CP005986. The tree was inferred using the Neighbor-Joining method [51]. The optimal tree, with a sum of branch lengths of 0.08720008, is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. Evolutionary distances were computed using the Maximum Composite Likelihood method [52] and are expressed as the number of base substitutions per site. The analysis involved 34 nucleotide sequences, with a total of 1314 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 [53].

Non-synonymous substitution rates were calculated as the number of observed non-synonymous substitutions in pairwise gene alignments divided by the size of each alignment, and are expressed as percentages. Genome comparisons were performed using the GET_HOMOLOGUES software package (version 07112016). Orthology was determined from all-versus-all BLASTP searches using the bidirectional best-hit (BDBH) and COGtriangles v2.1 clustering algorithms. Pairwise alignment cutoffs were set at 75% coverage and an E-value of 1e-5. The phylogenomic relationships between A. albertensis T and other Acidithiobacillus strains were inferred from average nucleotide identity (ANI) values computed with BLASTn [38] and from in silico DNA-DNA hybridization (DDH) values computed with the Genome-to-Genome Distance Calculator (GGDC) using the recommended formula 2 [39]. Species cutoff limits were those defined by Meier-Kolthoff and colleagues [40].
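The orthology step can be illustrated with a minimal sketch of bidirectional best-hit detection, assuming BLASTP results have already been parsed into records holding query, subject, E-value, bit score and query coverage. The cutoffs mirror those stated above (75% coverage, E-value 1e-5). Applying the coverage cutoff to the query only, the Hit record and the function names are simplifying, hypothetical choices; the sketch does not reproduce GET_HOMOLOGUES or COGtriangles.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed BLASTP hit (hypothetical record, e.g. built from tabular output)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    qcov: float  # percent of the query covered by the alignment


def best_hits(hits: list[Hit], max_evalue: float = 1e-5, min_cov: float = 75.0) -> dict[str, str]:
    """Map each query to its highest-scoring subject among hits passing both cutoffs."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue or h.qcov < min_cov:
            continue  # discard hits failing the E-value or coverage cutoff
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def bidirectional_best_hits(a_vs_b: list[Hit], b_vs_a: list[Hit], **cutoffs) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab = best_hits(a_vs_b, **cutoffs)
    ba = best_hits(b_vs_a, **cutoffs)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Toy example with hypothetical gene identifiers.
a_vs_b = [Hit("a1", "b1", 1e-50, 200.0, 95.0),
          Hit("a1", "b2", 1e-40, 150.0, 90.0),
          Hit("a2", "b2", 1e-3, 100.0, 99.0)]   # fails the E-value cutoff
b_vs_a = [Hit("b1", "a1", 1e-50, 200.0, 95.0),
          Hit("b2", "a2", 1e-30, 120.0, 80.0)]
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Relaxing the E-value cutoff (for example `max_evalue=1e-2`) would also admit the pair ('a2', 'b2'), which shows how strongly the cutoffs shape the resulting ortholog set.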

Genome properties
The ~3.1 Mbp draft genome of A. albertensis T is currently arranged into one high-quality scaffold (Fig. 2) and 140 smaller contigs, most of which correspond to fragments of plasmids and other mobile genetic elements. Based on the conservation of universal housekeeping genes [41], the genome is predicted to be 99.9% complete. Its average G + C content is 52.5% (Table 3). Of a total of 3202 predicted genes, 3149 are protein-coding and 53 are RNA genes. A putative function was assigned to 63.4% of the CDSs, while the remainder were annotated as hypothetical proteins. The 53 RNA genes comprise 47 tRNAs, 1 tmRNA and 2 rRNA operons (Table 3). The presence of two rRNA operons has recently been validated experimentally [25], and according to the genome sequence the two operons are 100% identical. The distribution of genes into COG functional categories is presented in Table 4.
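Genome-wide G + C content is a simple ratio, but for a fragmented draft assembly it should be pooled over all assembled bases rather than averaged per contig, since contigs differ greatly in length. A minimal sketch, assuming ambiguous bases (e.g., N) are excluded from the denominator; the function name and toy contigs are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


# Pool all contigs before computing the ratio (hypothetical toy contigs).
contigs = ["ATGCGC", "GGNNTA"]
print(gc_content("".join(contigs)))  # 60.0
```

Pooling weights each contig by its length, so a short, GC-rich plasmid fragment cannot skew the genome-wide value the way a per-contig average would.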

Insights from the genome sequence
Metabolic reconstruction revealed a complete suite of sulfur oxidation genes, including those encoding the SOX complex (soxYZB-AX and soxYZA-B, soxH), tetrathionate hydrolase (tetH, doxD) and heterodisulfide reductase (hdrBC and hdrABC), previously found in A. thiooxidans T and A. caldus T [42,43]. Multiple copies of the genes for the cytochrome d (cydAB) and cytochrome o (cyoACBD) terminal oxidases found in professional sulfur-oxidizing acidithiobacilli [19] also occur in A. albertensis T. Genes for carbon dioxide fixation are well conserved, but no nitrogen fixation genes were detected in the draft genome. Instead, genes for nitrate/nitrite assimilation and urea hydrolysis, both of which yield ammonia, were found in the A. albertensis T genome, along with a number of ammonia transporters.
The gene clusters for flagellar biosynthesis and assembly (flagellation being a morphological trait that differentiates this species from A. thiooxidans) are conserved with respect to those of A. thiooxidans in both general architecture and gene content. The pairwise identity between the predicted protein products of the flagellar genes of the two type strains ranges from 87 to 100%, also suggesting a common ancestry of the operons.
Nevertheless, a considerable number of SNPs (single nucleotide polymorphisms) producing non-synonymous amino acid substitutions of presently unclear relevance were found in nine genes of the A. albertensis T flagellar cluster (Fig. 3): flaB2, flhF, flhG, fliH, fliK, fliR, fliS2, fleS and fleQ1. All of these genes are well conserved among A. thiooxidans strains (Fig. 3). The gene variants identified in A. albertensis were validated by one-to-one read recruitment and are supported by an average sequencing depth of more than 75-fold. These genes encode the flagellin FlaB2, the hook-length control protein FliK, the biosynthesis proteins FlhF, FliR and FliS, the biosynthesis regulator FlhG (also known as FleN), the assembly protein FliH, the sensor histidine kinase FleS and the regulator FleQ. Among these, FlhF and FlhG/FleN have been shown to control flagellation patterns in other model bacteria [44], suggesting that the differences in flagellation between A. albertensis (lophotrichous) and A. thiooxidans (monotrichous) shown in Fig. 3 might be partially attributable to divergence in these genes (6-14%). For the remaining flagellar genes, the rate of SNPs leading to amino acid substitutions between A. albertensis and the other sequenced A. thiooxidans strains is low (<3%) and similar to that observed in well-conserved housekeeping genes. Further studies are needed to clarify the relevance of these substitutions for the flagellation patterns of the acidithiobacilli. In addition, more chemotaxis genes were predicted in the A. albertensis T genome than in A. thiooxidans.
This latter set of genes is organized in a cluster (mcp1-cheYSA-mcp2-cheWRDB) encoding proteins that participate in sensory adaptation to changing environmental signals rather than in flagellar motor control [45]. The differences between the A. albertensis T genome and the pangenome of 10 other sequenced A. thiooxidans strains (recently reported in [22]) can be attributed to just over 1000 genes (1066). Nearly half of these genes belong to at least 10 integrated mobile genetic elements and an as yet undetermined number of plasmids, representing up to 16.2% of the A. albertensis T genome. In these genomic segments, 54.5% of the genes encode hypothetical proteins, but a number of relevant functions were also detected, including: a) four copies of the sulfur oxygenase reductase gene (sor1-4), b) the gene cluster encoding the assimilatory nitrate and nitrite reductases, c) the genes encoding urea carboxylase/allophanate hydrolase and the urea ABC transporter, d) the spermidine/putrescine ABC transporter potABC and e) the three-gene operon associated with rubrerythrin recently described by Cárdenas et al. [46]. All of these functions could confer adaptive advantages on A. albertensis T over A. thiooxidans strains under nitrogen and oxygen limitation and/or at extremely low pH.
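Once genes have been grouped into orthologous clusters, identifying the genes that set one genome apart from a pangenome reduces to a set operation. A minimal sketch, assuming clusters are given as lists of (genome, gene) pairs and that unclustered genes appear as single-member clusters; the genome labels (AAL for A. albertensis T, ATH_n for A. thiooxidans strains), the toy data and the function name are hypothetical.

```python
def genome_specific_genes(clusters: dict[str, list[tuple[str, str]]], focal: str) -> list[str]:
    """Genes of `focal` whose orthologous cluster contains no gene from any other genome."""
    specific: list[str] = []
    for members in clusters.values():
        genomes = {genome for genome, _ in members}
        if genomes == {focal}:
            specific.extend(gene for genome, gene in members if genome == focal)
    return sorted(specific)


# Hypothetical toy clusters: (genome, gene) members per cluster ID.
clusters = {
    "c1": [("AAL", "g1"), ("ATH_1", "x1"), ("ATH_2", "y1")],  # shared gene
    "c2": [("AAL", "g2"), ("AAL", "g3")],                      # AAL-only, two copies
    "c3": [("ATH_2", "y2")],                                   # absent from AAL
}
print(genome_specific_genes(clusters, "AAL"))  # ['g2', 'g3']
```

Counting genes rather than clusters keeps multi-copy families (such as the four sor copies) visible in the tally of genome-specific genes.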
Differences in gene dosage were also observed between the two mesophilic, sulfur-oxidizing, non-iron-oxidizing species when comparing their type strains. A. albertensis T has more copies or gene variants (from 2 to more than 30) of the following: a) transposases and inactivated derivatives, b) the thiol:disulfide interchange protein DsbG precursor, c) methyl-accepting chemotaxis receptor proteins, d) Crp/Fnr, LysR and MerR family transcriptional regulators, e) cytochrome d ubiquinol oxidases and f) SOR sulfur oxygenase reductases. The latter occur in four copies in the A. albertensis T genome and are completely absent from A. thiooxidans T. In addition, the A. albertensis T genome encodes more than 30 predicted proteins with GGDEF/EAL domains, likely involved in nucleotide-driven signaling pathways that control and modulate gene expression and/or activity, 40% of which appear to be exclusive to this species. Significant quantitative and qualitative differences in gene content have been reported before between strains of A. thiooxidans obtained from industrial processes [21,22].

Fig. 3 (caption, panels c and d): c Comparison of the flagellar gene cluster between A. albertensis T (AAL) and A. thiooxidans T (ATH) derived from the corresponding genomic sequences. Flagellar genes and gene clusters are indicated accordingly. Percentage of amino acid similarity is indicated only when below 98%. Color coding: motor proteins (blue), basal body (turquoise), hook (green), flagellin (light blue), biosynthesis and assembly functions (orange), regulation (red). d Heatmap of the non-synonymous substitution rates (percent) of the protein products of each flagellar gene and of seven housekeeping genes from A. albertensis T and 11 sequenced A. thiooxidans strains (AFOH01, LGYM01, JMEB01, AZMO01, LWSA01, LWRZ01, LWSB01, LWRY01, LWSC01, LWSD01, LZYI01). Housekeeping genes were chosen after Nuñez et al. [25].
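The non-synonymous substitution rates summarized in the Fig. 3 heatmap are defined in the Methods as the number of observed non-synonymous substitutions divided by alignment size, in percent. A minimal sketch of that calculation for one pair of in-frame, codon-aligned nucleotide sequences, assuming the standard genetic code (NCBI table 1) and treating alignment size as the number of gap-free codons compared; this denominator choice and the function name are assumptions, and this is not the authors' script.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def nonsynonymous_rate(seq1: str, seq2: str) -> float:
    """Percent of compared codons whose encoded amino acids differ between two aligned genes."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, of equal length and in frame")
    compared = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons containing gaps or ambiguous bases
        compared += 1
        if CODE[c1] != CODE[c2]:
            nonsyn += 1  # amino acid changed: non-synonymous substitution
    return 100.0 * nonsyn / compared if compared else 0.0


# AAA->AAG is synonymous (Lys->Lys); TTT->TTA is non-synonymous (Phe->Leu).
print(nonsynonymous_rate("ATGAAATTT", "ATGAAGTTA"))  # 1 of 3 codons, about 33.3
```

Applied gene by gene across the A. albertensis T and A. thiooxidans sequences, values of this kind would populate a heatmap such as the one described in the Fig. 3 caption.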
Despite the differences described above between the type strains of A. albertensis and A. thiooxidans, the average nucleotide identity assessed by BLASTn (97.4%) and the in silico DNA-DNA hybridization value assessed by GGDC (82.9%) are above the currently recognized species cutoff limits [39], implying that A. albertensis and A. thiooxidans probably constitute a single genospecies.
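The species-level reasoning can be sketched numerically. The sketch assumes the widely used ANIb convention (the query genome cut into 1020-nt fragments, each fragment's best BLASTn hit counted only above 30% identity over more than 70% of its length) and commonly cited boundaries of about 95% ANI and 70% dDDH. These thresholds, the FragmentHit record and the function names are assumptions rather than the exact criteria of [39,40], and GGDC's conversion of formula 2 distances into dDDH values is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class FragmentHit:
    """Best BLASTn hit of one query-genome fragment against the reference genome."""
    identity: float          # percent identity of the hit
    aligned_len: int         # aligned length on the fragment (nt)
    fragment_len: int = 1020


def anib(hits: list[FragmentHit], min_identity: float = 30.0, min_aligned_frac: float = 0.70) -> float:
    """One-directional ANIb: mean identity of fragments passing the identity and length filters."""
    kept = [h.identity for h in hits
            if h.identity > min_identity and h.aligned_len / h.fragment_len > min_aligned_frac]
    if not kept:
        raise ValueError("no fragment passed the filters")
    return sum(kept) / len(kept)


def same_genospecies(ani: float, ddh: float, ani_cutoff: float = 95.0, ddh_cutoff: float = 70.0) -> bool:
    """True when both indices reach the assumed species boundaries."""
    return ani >= ani_cutoff and ddh >= ddh_cutoff


# Toy fragment hits (hypothetical): the 500-nt alignment covers <70% and is dropped.
hits = [FragmentHit(98.0, 1000), FragmentHit(96.0, 1020), FragmentHit(99.0, 500)]
print(anib(hits))                     # 97.0
print(same_genospecies(97.4, 82.9))   # True for the values reported above
```

With the reported values (ANI 97.4%, dDDH 82.9%), both indices sit above these assumed boundaries, which is the basis for the single-genospecies interpretation.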

Conclusions
Altogether, the evidence presented herein suggests that the validity of A. albertensis as an independent species should be reconsidered. In this respect, genomic approaches are crucial for understanding evolutionary processes and the origins of microbial biodiversity. The availability of the first high-quality draft genome sequence of an A. albertensis strain should enable more comprehensive comparative genomic studies and contribute to resolving the taxonomy and phylogeny of the genus. From a genomic standpoint, further analyses should assess whether the differences between the two type strains extend to other strains of each 'presumed species'.