Isolation and complete genome sequence of the thermophilic Geobacillus sp. 12AMOR1 from an Arctic deep-sea hydrothermal vent site

Members of the genus Geobacillus have been isolated from a wide variety of habitats worldwide and are targeted as sources of enzymes for various industrial applications. Here we report the isolation and complete genome sequence of the thermophilic starch-degrading Geobacillus sp. 12AMOR1. Strain 12AMOR1 was isolated from hot deep-sea sediment at the Jan Mayen hydrothermal vent site. The genome of Geobacillus sp. 12AMOR1 consists of a 3,410,035 bp circular chromosome and a 32,689 bp plasmid with G + C contents of 52 % and 47 %, respectively. The genome comprises 3323 protein-coding genes, 88 tRNA species and 10 rRNA operons. The isolate grows on a suite of sugars, complex polysaccharides and proteinaceous carbon sources. Accordingly, a variety of genes encoding carbohydrate-active enzymes (CAZy) and peptidases were identified in the genome. Expression, purification and characterization of an enzyme of the glycoside hydrolase family 13 revealed a starch-degrading capacity and high thermal stability with a melting temperature of 76.4 °C. Altogether, the data obtained point to a new isolate from a marine hydrothermal vent with a large bioprospecting potential. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0137-y) contains supplementary material, which is available to authorized users.


Introduction
In 2001, the genus Geobacillus was proposed by Nazina et al. [1] to distinguish it from the genus Bacillus. Bacteria of the genus Geobacillus have been isolated from diverse marine and terrestrial habitats such as oil wells [2], cool soils such as those of the Bolivian Andes [3], sediments from the Mariana Trench [4] and deep-sea hydrothermal vents [5]. Surprisingly, these thermophiles can be isolated from cold environments in different geographical regions in such large quantities that "contamination" from hot environments is an unlikely explanation, a finding that has been described as a paradox [6]. Direct heating of the upper soil layers by the sun and heat generated by putrefactive and fermentative processes of mesophiles could explain their abundance [7,8]. To our knowledge, Geobacillus has not previously been isolated from an Arctic marine habitat. As of June 2015, 37 Geobacillus genomes had been deposited in GenBank. Owing to the development of next-generation sequencing, the number of sequenced genomes has almost doubled in the last one and a half years, with 17 new genomes added. Of all Geobacillus genomes, 13 have been described as complete, whilst the other 24 have been deposited as drafts. The genus exhibits a broad repertoire of hydrolytic and modifying enzymes and is therefore a valuable resource for biocatalysts used in biotechnological processes at elevated temperatures [9,10]. The application of thermophilic microorganisms or enzymes enhances biomass conversion in a variety of biotechnological applications, minimizes contamination and can reduce process costs [11]. Diverse Geobacillus strains possess an arsenal of enzymes that degrade complex polysaccharides such as lignocellulose [12]. Other Geobacillus strains are able to degrade a broad range of alkanes [13,14].
To date, numerous patents derived from the genus cover restriction nucleases, DNA polymerases, α-amylases, xylanases, catalases, lipases and neutral proteases, among others (EP 2392651, US2011020897, EP2623591, US2012309063, KR100807275 [15,16]). The glycoside hydrolase family 13 (GH13) α-amylases are well-studied enzymes with broad biotechnological applications, for example in bioethanol production, food processing and the textile and paper industries [17]. Because of this broad applicability, there is considerable interest in identifying novel α-amylases for new and improved applications in biotechnology. In addition to functional screening for enzyme activity, genome investigation is a valuable tool to identify potential biocatalysts. Here we present the isolation and metabolic features of Geobacillus sp. 12AMOR1 (DSM 100439) together with the description of the complete genome and its annotation.

Classification and features
Geobacillus sp. strain 12AMOR1 was isolated from a 90°C deep-sea sediment sample collected in July 2012 from the Arctic Jan Mayen Vent Field (JMVF). The sample was collected using a shovel box connected to a Remotely Operated Vehicle (ROV) at a water depth of 470 m. A detailed description of the JMVF site is given elsewhere [18,19].
The bacterium was isolated at 60°C on Archaeoglobus medium agar plates [20] at pH 6.3 containing 1 % starch (Sigma Aldrich) in an attempt to screen for starch degraders. Genomic DNA of isolates was extracted using the FastDNA® Spin Kit for Soil (MP). The partial 16S rRNA gene was amplified by PCR using Hot Star Plus (QIAGEN) and the following universal primers: B8f (5' AGAGTTTGATCCTGGCTCAG) [21] and Un1391r (5' GACGGGCGGTG) [22]. The preliminary partial 16S rRNA gene fragment of strain 12AMOR1 was analyzed using the megablast algorithm in standalone blastn [23] against the 16S ribosomal RNA (Bacteria and Archaea) database. The partial 16S rRNA gene shared 98 % sequence identity with the strains G. stearothermophilus DSM 22 T (NR_114762.1) and R-35646 (NR_116987.1), as well as with other Geobacillus species: Geobacillus subterraneus strain 34 T (NR_025109.1), Geobacillus zalihae strain NBRC 101842 T (NR_114014.1), Geobacillus thermoleovorans strain BGSC 96A1 T (NR_115286.1), Geobacillus thermocatenulatus strain BGSC 93A1 T (NR_043020.1), Geobacillus vulcani strain 3S-1 T (NR_025426.1) and Geobacillus kaustophilus strain BGSC 90A1 T (NR_115285.1) (Additional file 1). The genome of Geobacillus sp. 12AMOR1 encodes 10 16S rRNA genes, for which blastn analysis [23] revealed small differences in top hits among multiple Geobacillus strains. The 16S rRNA gene GARCT_01776 was identical to the partial sequence obtained by PCR as described above, and thus the whole 16S rRNA gene GARCT_01776 was used for the phylogenetic analysis. A phylogenetic tree was constructed by aligning the 16S rRNA gene GARCT_01776 with 16S rRNA genes from selected strains and species of the same genus using MUSCLE [24,25] and the Neighbor-Joining algorithm incorporated in MEGA 6.06 [26]. The 16S rRNA gene from Geobacillus sp. 12AMOR1 grouped together with Geobacillus sp. ZY-10 and G. stearothermophilus strains 32A, Z3-14a and mt-24 (Fig. 1). Interestingly, within the subcluster of G. stearothermophilus, isolate 12AMOR1 and the aforementioned strains grouped apart from the type strain G. stearothermophilus DSM 22 T. To further evaluate how closely related the new isolate was to existing species of Geobacillus, a digital DNA-DNA hybridization (DDH) [27] was performed using the complete genomes of 13 Geobacillus species listed in Additional file 2. The level of relatedness by DDH estimation using formula 2 (identities/HSP length) ranged from 21.5 to 41.5 % between the isolate and the different Geobacillus species. These DDH estimates, all below 70 %, suggested that Geobacillus sp. 12AMOR1 belongs to a new species.
Geobacillus sp. 12AMOR1 is a Gram-positive [28], spore-forming, motile, facultatively anaerobic rod. The cells are on average 0.5-0.7 μm wide and between 1.8 and 4.5 μm long. In addition, cells forming long filamentous chains were observed by microscopy. The cells were peritrichously flagellated (Fig. 2), consistent with previous observations of geobacilli [1,29]. Terminal ellipsoidal spores were observed.
The isolate was able to grow in a temperature range of 40 to 70°C and a pH range of 5.5 to 9.0, with a temperature optimum of 60°C and a broad pH optimum between 6.5 and 8.0. Growth was observed at NaCl concentrations between 0 and 5 %. Besides aerobic growth, Geobacillus sp. 12AMOR1 was able to grow on yeast extract in anaerobic NRB medium containing nitrate [30].
Besides utilizing starch, Geobacillus sp. 12AMOR1 was able to grow on complex polysaccharides such as xylan, chitin and α-cellulose (Table 1). Fast growth was achieved when the isolate was cultivated on yeast extract and gelatin. In addition, the isolate utilizes lactose, galactose and organic acids such as lactate and acetate. No growth was observed with pectin, xylose, Tween 20 or Tween 80 as carbon source. Geobacillus sp. 12AMOR1 degrades DNA supplemented in agar (Fig. 4d).
Geobacillus sp. 12AMOR1 was catalase positive using 3 % hydrogen peroxide. Tests using diatabs (Rosco Diagnostics) identified the isolate as oxidase positive and urease negative.

Genome project history
The complete genome sequence and annotation data of Geobacillus sp. 12AMOR1 have been deposited in DDBJ/EMBL/GenBank under the accession number CP011832.1. Sequencing was performed at the Norwegian Sequencing Centre in Oslo, Norway [31]. Assembly and finishing steps were performed at the Centre for Geobiology, University of Bergen, Norway. Annotation was performed using the Prokka automatic annotation tool [32] and manually edited to fulfill NCBI standards. Table 2 summarizes the project information and its association with MIGS version 2.0 compliance [33].

Growth conditions and genomic DNA preparation
A pure culture of the isolated Geobacillus sp. 12AMOR1 was cultivated in 50 ml LB medium for 18 h at 60°C. After harvesting the cells by centrifugation at 8,000 x g for 10 min, high-molecular-weight DNA for sequencing was obtained using a modified method of Marmur [34]. In short, the pellet was suspended in a solution of 1 mg/ml lysozyme (Sigma 62971) in 10 mM TE buffer (pH 8) and incubated at 37°C for 15 min. After a Proteinase K treatment (40 mg/ml final concentration, Sigma P6556) at 37°C for 15 min, SDS was added to a final concentration of 1 % and the solution was incubated at 60°C for 5 min until it cleared. Sodium perchlorate (Sigma-Aldrich 410241) was added to a final concentration of 1 M and the solution was mixed well, before an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added and the solution gently

Genome sequencing and assembly
Approximately 200 μg of genomic DNA was submitted for sequencing. In short, a library was prepared using the Pacific Biosciences 10 kb library preparation protocol. Size selection of the final library was performed using BluePippin (Sage Science). The library was sequenced on a Pacific Biosciences RS II instrument using P4-C2 chemistry. In total, two SMRT cells were used for sequencing. Raw reads were filtered and de novo assembled using SMRT Analysis v. 2.1 and the protocol HGAP v2 (Pacific Biosciences) [35]. The consensus polishing process resulted in a highly accurate self-overlapping contig, as observed using a Gepard dotplot [36], with a length of 3,426,502 bp, in addition to a self-overlapping 45,474 bp plasmid. Circularization and trimming were performed using Minimus2 included in the AMOS software package [37]. The circular chromosomal contig and plasmid were polished and consensus corrected twice using the RS_Resequencing protocol in SMRT Analysis v. 2.1. The final polishing resulted in a 3,410,035 bp finished circular chromosome and a 32,689 bp circular plasmid, with a consensus concordance of 99.9 %. The chromosome was manually reoriented to begin at the location of the dnaA gene.
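As an illustrative check on the circularization and reorientation steps, the following minimal Python sketch computes the length difference between the raw self-overlapping contigs and the finished replicons, using the lengths reported above, and shows how a circular sequence can be rotated to begin at a chosen position. The difference combines overlap trimming with any indel corrections made during polishing. The function names `length_difference` and `rotate_circular`, the toy sequence and the 0-based start index are assumptions made for illustration; they are not part of Minimus2 or SMRT Analysis.

```python
def length_difference(raw_length: int, finished_length: int) -> int:
    """Bases removed between the raw self-overlapping contig and the
    finished circular replicon (trimming plus polishing corrections)."""
    if finished_length > raw_length:
        raise ValueError("finished length cannot exceed raw contig length")
    return raw_length - finished_length


def rotate_circular(sequence: str, start: int) -> str:
    """Rotate a circular sequence so that it begins at 0-based index `start`."""
    if not sequence:
        return sequence
    start %= len(sequence)
    return sequence[start:] + sequence[:start]


# Lengths reported in the text (bp).
chromosome_diff = length_difference(3_426_502, 3_410_035)
plasmid_diff = length_difference(45_474, 32_689)

# Toy example only: pretend the gene of interest starts at index 4.
toy_rotated = rotate_circular("GGCCATGAAA", 4)

print(chromosome_diff, plasmid_diff, toy_rotated)
```

Because the replicon is circular, the rotation changes only the coordinate origin and not the sequence content, which is why reorienting to dnaA does not alter the reported chromosome length.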

Genome properties
The genome of Geobacillus sp. 12AMOR1 consists of one circular chromosome of 3,410,035 bp (52 % G + C content) and one plasmid of 32,689 bp (47 % G + C content).
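For readers who wish to reproduce such figures from a sequence file, the following Python sketch shows one common way to compute G + C content and a length-weighted genome-wide value. The function names `gc_percent` and `weighted_gc` are hypothetical, ambiguous bases are simply ignored (an assumption), and the genome-wide value is only approximate because it is derived from the rounded percentages reported above rather than from the sequences themselves.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def weighted_gc(replicons: list[tuple[int, float]]) -> float:
    """Length-weighted mean G + C percentage over (length_bp, gc_percent) pairs."""
    total_length = sum(length for length, _ in replicons)
    return sum(length * gc for length, gc in replicons) / total_length


# Reported (rounded) values for the chromosome and the plasmid.
genome_gc = weighted_gc([(3_410_035, 52.0), (32_689, 47.0)])
print(f"approximate genome-wide G + C content: {genome_gc:.2f} %")
```

The plasmid accounts for less than 1 % of the total sequence length, so its lower G + C content shifts the genome-wide value only marginally.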
The main chromosome contained 10 rRNA operons and 88 tRNAs and was predicted to encode 3323 protein-coding genes (Table 3 and Fig. 3). Of the protein-coding genes, 2454 were assigned a putative function. Identification of peptidases and carbohydrate-degrading enzymes was performed using the MEROPS peptidase database [41] and dbCAN [42], respectively. Using the PHAST web server for the detection of prophages [43], two prophage regions were detected, one intact (56.1 kb: 2476493-2532633) and one incomplete (7.7 kb: 2811872-2819623). Of the protein-coding genes in the intact prophage, 46 % were related to the deep-sea thermophilic bacteriophage GVE2 (NC_009552). The 32.7 kb plasmid encoded 34 protein-coding genes.
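The figures in this paragraph can be cross-checked with simple arithmetic, sketched below in Python. The helper names `region_length` and `percent` are hypothetical, and the prophage coordinates are assumed to be 1-based and inclusive, which the text does not state explicitly.

```python
def region_length(start: int, end: int) -> int:
    """Length in bp of a region, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Prophage regions reported by PHAST (coordinates from the text).
intact_bp = region_length(2_476_493, 2_532_633)
incomplete_bp = region_length(2_811_872, 2_819_623)

# Protein-coding genes with and without a putative function.
annotated_pct = percent(2454, 3323)
hypothetical_pct = round(100.0 - annotated_pct, 2)

print(intact_bp, incomplete_bp, annotated_pct, hypothetical_pct)
```

Under these assumptions the regions span 56,141 bp and 7,752 bp, in line with the reported 56.1 kb and 7.7 kb, and the share of genes without a putative function agrees with the 26.15 % of hypothetical proteins given in the section on insights from the genome sequence.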

Insights from the genome sequence
The genome of Geobacillus sp. 12AMOR1 encodes 3323 protein-coding genes (Table 4). Of those proteins, 26.15 % could not be assigned a specific function and remain hypothetical. In total, 92.66 % of the proteins could be assigned to a COG functional category. The COG functional categories included replication, recombination and repair (9.4 %); amino acid transport and metabolism (6.9 %); inorganic ion transport and metabolism (3.9 %); energy production and conversion (4.17 %); cell wall/membrane/envelope biogenesis (3.7 %) and carbohydrate transport and metabolism (3.8 %), amongst others (Table 5). In the dbCAN analysis, 108 proteins were assigned one or more functional activities within the CAZy families, which catalyze the breakdown, biosynthesis or modification of carbohydrates and glycoconjugates [44,45]. Geobacillus sp. 12AMOR1 hydrolyzes starch, dextrin, gelatin, casein and DNA, and utilizes sugars such as D-glucose, D-galactose, D-mannose, D-maltose, D-lactose, D-melibiose, D-saccharose, D-trehalose, D-raffinose and glycogen. CDSs encoding enzymes that metabolize the above-mentioned substrates were identified by genome prediction, homology search or mapping onto pathways using the KEGG Automatic Annotation Server [46]. Furthermore, the isolate was able to grow on the complex carbon polymers xylan, chitin and α-cellulose; however, the pathways for the degradation of these polymers were not identified in the genome. In contrast, pathways for utilization of D-mannitol, arbutin and salicin were identified, although utilization involving acid production was not observed. In comparison with other Geobacillus strains, 12AMOR1 harbors fewer gene modules involved in hydrolysis and utilization of complex carbohydrates [8,12].
Enzymes involved in protein degradation were analyzed using MEROPS. In total, 126 proteinases were identified. Of those, 18 carried a signal peptide identified by SignalP [47] and could be responsible for the extracellular degradation of proteins. Geobacillus sp. 12AMOR1 showed strong enzymatic activities for esterase (C4), esterase/lipase (C8), leucine arylamidase, α-chymotrypsin, α-glucosidase, alkaline and acidic phosphatase, and weak activities for lipase (C14), valine arylamidase, cysteine arylamidase, β-glucosidase, β-glucuronidase and naphthol-AS-BI-phosphohydrolase.
Owing to their broad biotechnological applications, such as in food processing, detergents and bioethanol production [17], novel α-amylases remain of biotechnological interest. Five genes encoding α-amylases of the GH13 family (Table 6) were identified by dbCAN analysis. The neopullulanase (GARCT_00679; AKM17981) was cloned using the following primers: F: AGG AGA TAT ACC ATG CAA AAA GAA GCC ATT CAC CAC CGC, R: GTG ATG GTG ATG TTT CCA GCT TTC AAC TTT ATA GAG CAC AAA CCC, and expressed in E. coli BL21 (DE3). The protein GARCT_00679 was purified in high amounts from E. coli and showed a melting temperature of 76.4°C in differential scanning calorimetry (DSC) analysis. As expected, this value was higher than the optimal growth temperature of the isolate. When purified protein solution was applied to 1 % starch-agar plates, only GARCT_00679 showed a starch degradation capacity comparable with that of the reference α-amylase from B. licheniformis (Sigma-Aldrich) (Fig. 5).
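The cloning primers listed above can be inspected with a short Python sketch that locates the first ATG in the forward primer and translates both primers using the standard genetic code. This is an illustration under stated assumptions and not the authors' primer design procedure: the helper names `reverse_complement` and `translate` are hypothetical, the first ATG is assumed to be the start codon, and the reverse primer is assumed to be in frame when read as the triplets printed in the text.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def translate(sequence: str) -> str:
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    seq = sequence.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Primer sequences as given in the text (spaces removed).
FORWARD = "AGG AGA TAT ACC ATG CAA AAA GAA GCC ATT CAC CAC CGC".replace(" ", "")
REVERSE = "GTG ATG GTG ATG TTT CCA GCT TTC AAC TTT ATA GAG CAC AAA CCC".replace(" ", "")

# Assumption: the first ATG in the forward primer marks the start codon.
start_index = FORWARD.find("ATG")
forward_peptide = translate(FORWARD[start_index:])

# Assumption: the reverse primer anneals in frame, so its reverse complement
# can be read directly as codons of the coding strand.
reverse_peptide = translate(reverse_complement(REVERSE))

print(start_index, forward_peptide, reverse_peptide)
```

Translating primers in this way is a quick sanity check that any added 5' overhangs keep the insert in frame with the expression vector; which vector and tag were used is not stated in the text and is not inferred here.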