High quality draft genome sequence of an extremely halophilic archaeon Natrinema altunense strain AJ2T

Natrinema altunense strain AJ2T, a halophilic archaeal strain, was isolated from a high-altitude (3884 m) salt lake in Xinjiang, China. This strain requires at least 1.7 M NaCl to grow and can grow anaerobically in the presence of nitrate. To understand the genetics underlying its extreme phenotype, we assembled the entire genome sequence of AJ2T (=CGMCC 1.3731T=JCM 12890T) de novo. We assembled 3,774,135 bp of a total 4.4 Mb genome in only 20 contigs and noted its high GC content (64.6%). Subsequently, we predicted the gene content and annotated the genome to identify the relationship between the epigenetic characteristics and genomic features. The genome sequence contains 52 tRNA genes, 3 rRNA genes and 4,462 protein-coding genes, of which 3,792 were assigned as functional or hypothetical proteins in the nr database. This Whole Genome Shotgun project was deposited in DDBJ/EMBL/GenBank under the accession JNCS00000000. We performed a Bayesian (Maximum-Likelihood) phylogenetic analysis using the 16S rRNA gene sequence and determined its relationship to other strains in the genera Natrinema and Haloterrigena. We also calculated the ANI value between every two species of the genera Natrinema and Haloterrigena. In conclusion, our analysis furthers the understanding of the extreme-environment-adapted strain AJ2T by characterizing its genome structure, gene content and phylogenetic placement. This detailed case study will contribute to the overall understanding of why Natrinema strains can survive in such a high-altitude salt lake.


Introduction
When the genus Natrinema was first described in 1998, it contained two species, Natrinema pellirubrum and Natrinema pallidum [1]. The genus Natrinema belongs to the family Halobacteriaceae, phylum Euryarchaeota. Five more species of this genus have been isolated and characterized since then: N. versiforme [2], N. altunense [3], N. gari [4], N. ejinorense [5] and N. salaciae [6]. At present, the genome sequences of all species in the genus Natrinema except N. ejinorense and N. salaciae are publicly available in the Genomes Online Database [7] and/or NCBI GenBank. Our lab first identified N. altunense strain AJ2T in 2005 in a salt lake [3]. Cells living in salt lakes have made numerous adaptations to this special ecosystem, allowing them to flourish in a very harsh environment. To determine whether the AJ2T genome contains genes for adaptation to this particular set of environmental restrictions, and to supply a genome assembly to the database, we sequenced its whole genome in 2011 and released the sequence in the WGS database in May 2014 as the first reported whole genome sequence of this species.

Organism information
We isolated strain AJ2T from a water sample collected from the edge of Ayakekum salt lake (37°37′ N, 89°29′ E) in Altun Mountain (Altyn-Tagh) National Nature Reserve in Xinjiang, China (Table 1). This salt lake is cold and exposed to strong ultraviolet radiation throughout the year due to its high altitude. It also has high salinity and lacks the common organic nutrients for microorganisms [3].

Classification and features
N. altunense strain AJ2T is an extremely halophilic archaeon that grows at 1.7-4.3 M NaCl and 0.005-1.0 M MgCl2. Colonies on agar plates have a vivid orange or red colour. Cells are rod-shaped, but can become pleomorphic under unfavourable conditions, as reported in 2005 [3]. The 16S rRNA gene sequence was submitted to the EzTaxon-e service [8], which revealed 95.77-98.50% sequence similarity to members of the genus Natrinema. Strain AJ2T exhibited the highest 16S rRNA gene sequence similarity with N. gari HIS40-3T (98.50%). Phylogenetic analysis based on 16S rRNA gene sequences showed that strain AJ2T clustered with most type strains of the genus Natrinema with a high bootstrap value (Fig. 1). The other three type strains, N. pellirubrum DSM 15624T, N. salaciae MDB25T and N. ejinorense EJ-57T, clustered with the genus Haloterrigena. In the 16S rRNA gene trees (Fig. 1) and rpoB' (RNA polymerase subunit B′) gene trees [9], these three type strains of the genus Natrinema showed unclear taxonomic positions [10]. The mixed phylogenetic relationship between these strains in the genera Natrinema and Haloterrigena was reported in 2003 [9]. This suggests that Haloterrigena may be a later heterotypic synonym of the genus Natrinema. The cell morphology and flagellum of N. altunense strain AJ2T were examined using transmission electron microscopy (JEM-1230, JEOL). The cells of strain AJ2T are straight rods with a diameter of 0.3-0.8 μm and a length of 0.9-4.0 μm (Fig. 2). The cells are motile and their growth requires at least 1.7 M NaCl and 0.005-1 M MgCl2 (optimal 3.0-4.3 M NaCl and 0.05-0.2 M MgCl2). This strain is chemo-organotrophic and can grow anaerobically in the presence of nitrate. It has oxidase and catalase activity. It can reduce nitrate and nitrite and produce N2 gas. It can also hydrolyse gelatine and Tweens 20, 40 and 80, and produce H2S from thiosulfate [3].
Table footnote a: Evidence codes - IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [39].

Genome project history
We selected N. altunense AJ2T for sequencing because its halophilic properties and high-altitude habitat may have caused interesting changes in its genome. Additionally, five other members of the genus Natrinema have been sequenced and can be compared with our sequence (Table 2). This Whole Genome Shotgun project has been deposited in DDBJ/EMBL/GenBank under the accession JNCS00000000. The version described in this paper is version JNCS00000000.1. Table 3 presents the project information and its association with MIGS version 2.0 compliance [11].

Growth conditions and genomic DNA preparation
N. altunense strain AJ2T was aerobically cultivated at 37°C for 3 days in modified CM medium, which contained the following (per liter of distilled water): 7.5 g Casamino acids (Bacto), 10 g yeast extract (OXOID), 3 g trisodium citrate, 2 g KCl, 20 g MgSO4·7H2O and 200 g NaCl (pH 7.2). Genomic DNA was extracted according to the method described by Marmur & Doty [12]. The cells were collected from 250 ml of CM medium and washed once with 20% (w/v) NaCl solution. After extraction, the genomic DNA was dissolved in 1 ml of TE buffer.
Fig. 1 Phylogenetic tree highlighting the position of Natrinema altunense strain AJ2T relative to phylogenetically closely related type strains within the family Halobacteriaceae. The sequences were aligned with the SINA Online service [40] based on the SILVA SSU/LSU databases. The best-fitting nucleotide substitution model was determined with the maximum-likelihood method in MEGA6 [41]; accordingly, the Jukes-Cantor model [42] was used to calculate the evolutionary distances for the neighbour-joining (NJ) method. Numbers at branch nodes refer to bootstrap values ≥ 50% (based on 1000 replicates). Halobacterium salinarum DSM 3754T (AJ496185) was used as an out-group. Bar, 0.01 substitutions per nucleotide position
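The Jukes-Cantor correction named in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not the MEGA6 implementation: the function names and the toy alignment are hypothetical, and gap columns are simply skipped, which is an assumption of this sketch. The correction converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3).

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (sketch assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); defined for 0 <= p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment (invented data): 5 comparable sites, 1 mismatch.
toy_p = p_distance("ACGT-A", "ACGA-A")
print(f"p = {toy_p:.3f}, JC distance = {jukes_cantor(toy_p):.4f}")
```

For instance, a 16S rRNA gene similarity of 98.50% corresponds to p = 0.015 and a corrected distance of about 0.0152 substitutions per site, so at this level of divergence the correction is small.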

Genome sequencing and assembly
The next-generation genome sequencing of N. altunense strain AJ2T and its quality control were performed using pyrosequencing technology on a GS FLX+ system (454 Life Sciences, Roche). One library with an insert size of 2,000 bp was constructed, and a total of 380 Mb of clean data was obtained after filtering adapter, artificial and low-quality sequences. This corresponds to a genome-wide average coverage of 87-fold. A total of 630,866 reads were used for assembly with Newbler v.2.5 (454 Life Sciences, Roche), which produced 20 contigs. The average contig size was 188,706 bp and the largest contig was 837,556 bp, with an N50 of 425,349 bp.
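As a rough check of the figures above, the following Python sketch computes mean coverage and N50. The helper names are hypothetical, and the short contig list is invented toy data, because the individual contig lengths are not listed here. Dividing 380 Mb by the approximately 4.4 Mb total genome size given in the abstract yields about 86-fold, close to the reported 87; that this denominator was the one used is our assumption.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Length of the contig at which the running sum of contigs, sorted from
    longest to shortest, first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("empty contig list")


# Values reported in the text.
reads = 630_866
mean_read_length = 603          # bp, from the Conclusions
assembly_length = 3_774_135     # bp, sum of the 20 contigs

print(reads * mean_read_length)                      # about 380 Mb of read data
print(round(mean_coverage(380e6, 4.4e6), 1))         # depth relative to a ~4.4 Mb genome
print(round(mean_coverage(380e6, assembly_length)))  # depth relative to the assembled length
print(assembly_length // 20)                         # average contig size

# Toy contig lengths (invented) to show the N50 calculation.
print(n50([10, 8, 5, 3, 2]))
```

The read count multiplied by the average read length gives roughly 380 Mb, which matches the stated amount of clean data.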

Genome annotation
The tRNA genes of strain AJ2T were identified using tRNAscan-SE 1.21 [13] with an archaeal model, and its rRNA genes were found via the RNAmmer 1.2 Server [14]. Other ORFs were predicted using Glimmer3 [15]. The predicted ORFs were translated and analysed using the BLASTp program (BLAST 2.2.26+) against the non-redundant (nr), Swiss-Prot [16], Pfam [17] and COG [18] databases. Only results with an e-value smaller than 1 × 10^-5 were kept. For cross-validation purposes, we also annotated the genome with the online RAST server [19]. KAAS [20] was used to assign the predicted protein sequences to KEGG pathways [21] with the BBH (bi-directional best hit) method. Genes with transmembrane helices were predicted using TMHMM Server v.2.0 [22]. We attempted to predict signal peptides using the SignalP 4.1 Server [23], but because there were not enough experimentally confirmed signal peptides in the UniProt database [23], the online server did not provide an archaeal group model. The circular map of the genome was obtained using a local CGView application [24] with adjusted parameters (-size medium -title 'AJ2T' -draw_divider_rings T -gene_decoration arc -linear circular). We uploaded the whole genome sequences as FASTA files and calculated the ANI value between every two genome sequences within the genera Natrinema and Haloterrigena on the EzGenome online server [25,26].
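The e-value filter described above can be sketched as follows, assuming the BLASTp results were written in the 12-column tabular format (BLAST+ `-outfmt 6`), in which the e-value is the eleventh column. The output format actually used is not stated here, and the query and subject identifiers below are invented for illustration.

```python
EVALUE_CUTOFF = 1e-5  # keep only hits with an e-value strictly smaller than this


def filter_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, evalue) for tabular BLAST hits below the cutoff.

    Assumes tab-separated columns in the default -outfmt 6 order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Invented example hits (hypothetical identifiers and scores).
SAMPLE = [
    "\t".join(["orf_0001", "subject_A", "85.0", "200", "30", "0",
               "1", "200", "5", "204", "1e-30", "350"]),
    "\t".join(["orf_0002", "subject_B", "31.0", "90", "62", "3",
               "10", "99", "40", "129", "0.002", "38.5"]),
    "\t".join(["orf_0003", "subject_C", "40.0", "120", "72", "2",
               "1", "120", "1", "120", "1e-05", "55.1"]),
]

for query, subject, evalue in filter_hits(SAMPLE):
    print(query, subject, evalue)
```

Because the criterion is "smaller than", a hit with an e-value exactly equal to the cutoff is discarded in this sketch.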

Genome properties
This high-quality draft genome sequence of N. altunense AJ2T revealed a genome size of 3,774,135 bp (total length of all 20 contigs; 64.56% GC content). We predicted 4,517 genes, of which 4,462 are protein-coding sequences. A total of 3,792 protein-coding genes (83.95% of all predicted genes) were assigned a putative function or annotated as hypothetical proteins. We also found 52 tRNA genes (after removing 1 pseudo-tRNA) and 3 rRNA genes (one 23S rRNA, one 16S rRNA and one 5S rRNA). We assigned 1,929 protein-coding genes (42.71%) to Pfam domains and categorized 2,255 protein-coding genes (49.92%) into COG functional groups (Table 4 and Fig. 3). This genome has a gene content redundancy of 36.11%, with 1,631 protein-coding genes belonging to 540 paralog clusters. The genomic ANI values within the genera Natrinema and Haloterrigena are listed in Table 5. Richter & Rosselló-Móra proposed an ANI cut-off of 95-96% for the species boundary [25]. According to our calculations, the ANI values between any two species of Natrinema with published genome sequences did not exceed 93.2%; this highest value was observed between strain AJ2T and Natrinema pallidum DSM 3751T. In contrast, N. pellirubrum shows higher ANI values (>95%) with H. thermotolerans DSM 11522T (95.4%) and H. jeotgali A29T (95.2%). These data are consistent with the phylogenetic distances in the 16S rRNA gene tree (Fig. 1). The other two strains that fall in the same clade as the genus Haloterrigena in the tree, N. salaciae MDB25T and N. ejinorense EJ-57T, lack genome sequences, so their ANI values could not be considered in this study.
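The percentages reported in this section can be recomputed from the gene counts with a short Python sketch. The variable and function names are hypothetical; the counts are those given in the text. The calculation suggests that the stated percentages use the total number of predicted genes (protein-coding plus RNA genes) as the denominator, which is our reading of the numbers rather than something stated explicitly.

```python
PROTEIN_CODING = 4462
TRNA = 52
RRNA = 3
TOTAL_GENES = PROTEIN_CODING + TRNA + RRNA  # 4517 predicted genes


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Gene counts paired with the percentages stated in the text.
reported = {
    "function or hypothetical protein": (3792, 83.95),
    "Pfam domains": (1929, 42.71),
    "COG functional groups": (2255, 49.92),
    "paralog clusters (redundancy)": (1631, 36.11),
}

for label, (count, stated) in reported.items():
    of_total = percent(count, TOTAL_GENES)
    of_cds = percent(count, PROTEIN_CODING)
    print(f"{label}: stated {stated}%, of all genes {of_total}%, of CDSs {of_cds}%")
```

Relative to the 4,462 protein-coding genes alone, the shares would be slightly higher (for example, about 84.98% for the functionally assigned or hypothetical proteins).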

Insights from the genome sequence
We compared all sequenced strains in the genus Natrinema with strain AJ2T according to contig number, G + C content, number of predicted proteins, total length and N50, which are listed below (Table 6). Other relevant genomic features are listed in Table 7. Consistent with the chemotaxonomic information and characteristic features of strain AJ2T described above, the genome encodes a flagellin domain protein that supports cell motility. It also encodes DNA repair systems that protect the stability of the genome from potential damage caused by UV radiation. Additionally, the energy-converting system and light-driven pumps are introduced below.
Fig. 3 Graphical circular map of the genome of N. altunense AJ2T. Labelling from the outside to the center: circle 1, CDSs on the forward strand (coloured by COG categories); circle 2, CDSs on the reverse strand (coloured by COG categories); circle 3, RNA genes (tRNAs red and rRNAs blue); circle 4, G + C content (peaks out/inside the circle indicate values higher or lower than the average G + C content 64.65%, respectively); circle 5, GC skew (calculated as (G-C)/(G + C) using a window size of 10,000 and a step of 100; green/purple peaks out/inside the circle indicate values higher or lower than the average GC skew value (−0.0047), respectively); and circle 6, genome size (Mbp)
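The GC skew track described in the Fig. 3 caption, (G-C)/(G + C) over a sliding window, can be sketched in a few lines of Python. This is an illustration only, not the CGView implementation: the function names are hypothetical, the toy sequence is invented, and returning 0.0 for windows without G or C is an assumption of this sketch.

```python
def gc_skew(window: str) -> float:
    """GC skew of one window: (G - C) / (G + C); 0.0 if the window has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_gc_skew(seq: str, window: int = 10_000, step: int = 100):
    """Return (start, skew) for each full window; defaults follow the Fig. 3 caption."""
    seq = seq.upper()
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (invented): a G-rich half followed by a C-rich half.
toy = "GGGC" * 10 + "GCCC" * 10
for start, skew in sliding_gc_skew(toy, window=20, step=20):
    print(start, round(skew, 2))
```

On a real contig, values above or below the genome-wide average skew would be drawn as peaks outside or inside the circle, as described in the caption.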

Light-driven pumps
The harsh living environment and the lack of nutritious carbon and nitrogen sources drive the diversification of metabolic pathways in strain AJ2T and similar halophilic archaea, as well as in haloarchaea with more resources. Strain AJ2T might use sunlight to produce ATP. We predicted the existence of two light-energy-converting system genes in the AJ2T genome, namely bop and hop. The two genes encode the homologous proteins bacteriorhodopsin and halorhodopsin, respectively. Bacteriorhodopsin and halorhodopsin share 36% of the amino acid residues in the transmembrane part and 19% in the surface connecting loops [27]. Bacteriorhodopsin is an integral membrane protein that forms the so-called purple membrane in the archaeal cell membrane, where it acts as a light-driven proton pump. It is mainly found in the family Halobacteriaceae [28,29]. It captures light energy and uses it to move protons across the membrane out of the cell, resulting in a proton electrochemical gradient. Subsequently, the gradient is converted into chemical energy through ATP synthesis or is used to fuel flagellar motility and other energy-requiring processes [30]. We obtained the complete bop gene (AY279548, JQ406920, and AFB77278) of strain AJ2T by the LPA method. We then successfully expressed the AJ2T bacteriorhodopsin protein in E. coli BL21 with a recombinant pET28a plasmid. This result indicates that the prediction of the bop gene is correct. Halorhodopsin is a light-activated chloride pump that is also found in archaea. It uses light to transfer chloride ions into the cytoplasm and increase the electrochemical potential of the proton gradient [31]. This gene is extremely important for tolerance of salty environments; by reporting the existence of a hop gene in N. altunense strain AJ2T, we shed light on the potential mechanism of its adaptation to high salinity.
Bacteriorhodopsin, halorhodopsin and several related bacterio-opsin activator HTH domain proteins were also found in the other sequenced type strains N. pellirubrum, N. pallidum, N. gari and strain Natrinema sp. J7-2 (listed in Table 8). As the haloarchaea species of the genus Natrinema typically live in similar environment,

Conclusions
The genome of strain AJ2T was not the longest among the sequenced strains of Natrinema, but it had the most predicted proteins. Meanwhile, the assembly of strain AJ2T had the lowest contig number and the largest N50 length. This indicates that the larger library insert size (2,000 bp) and the longer read length (up to 1,000 bp, with an average read length of 603 bp) may significantly improve assembly quality. Our genomic analysis of strain AJ2T sheds light on its ability to survive in the Ayakekum salt lake of Altun Mountain National Nature Reserve in Xinjiang, China. This lake is regarded as a relatively extreme environment with low nutrient levels, a cool temperature, strong sunlight and high altitude. We found evidence for an alternative energy-converting system that provides a supplementary energy source. Comparison with all other sequenced strains in the genus Natrinema showed that the components of this system, bacteriorhodopsin, halorhodopsin and HTH domain proteins, are also present in those strains, which mostly share this energy-producing pathway.
More intensive study and data mining of the genomes of the genus Natrinema and other halophilic archaea are needed. Such work may reveal why these ancient archaea show so much vitality and prosperity in extreme environments on Earth.
Table footnote: This data line represents the closest output obtained using the BLASTp program against the nr database. These two genes are on contig 1 (position: 629096-629767, forward strand) and contig 3 (position: 389528-390385, forward strand) of the genome of strain AJ2T, respectively.