- Extended genome report
- Open Access
Complete genome sequence of the haloalkaliphilic, obligately chemolithoautotrophic thiosulfate and sulfide-oxidizing γ-proteobacterium Thioalkalimicrobium cyclicum type strain ALM 1 (DSM 14477T)
Standards in Genomic Sciences volume 11, Article number: 38 (2016)
Thioalkalimicrobium cyclicum Sorokin et al. 2002 is a member of the family Piscirickettsiaceae in the order Thiotrichales. The γ-proteobacterium belongs to the colourless sulfur-oxidizing bacteria isolated from saline soda lakes with stable alkaline pH, such as Mono Lake (California) and Soap Lake (Washington State). Strain ALM 1T is characterized by its adaptation to life in the oxic/anoxic interface towards the less saline aerobic waters (mixolimnion) of the stably stratified alkaline salt lakes. Strain ALM 1T is the first representative of the genus Thioalkalimicrobium whose genome sequence has been deciphered and the fourth genome sequence of a type strain of the Piscirickettsiaceae to be published. The 1,932,455 bp long chromosome with its 1,684 protein-coding and 50 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2008.
Strain ALM 1T (= DSM 14477 = JCM 11371) is the type strain of the species Thioalkalimicrobium cyclicum, one of four species in the genus Thioalkalimicrobium. The most prominent feature of T. cyclicum is its ability to live chemolithoautotrophically in the aerobic surface waters of a mixolimnion lake. Cultures of strain ALM 1T were first isolated from Mono Lake water samples taken from the sulfide-oxygen interface layer at a depth of 19–25 m. The species epithet for the organism was derived from the Latin adjective cyc.li’cum, cyclus, pertaining to the circle-like shape of the cells. For a short time after the initial description of the organism it was known as “Thialkalimicrobium cyclicum” until the Judicial Commission of the International Committee on Systematics of Prokaryotes restored the correct genus name at the Xth International IUMS Congress of Bacteriology and Applied Microbiology in Paris (France). Here we present a summary classification and a set of features for T. cyclicum ALM 1T (DSM 14477T), together with the description of the genomic sequencing and annotation of the genome. Sequencing was done within the DOE JGI CSP 2008 for analysis of three type strains of alkaliphilic sulfur oxidizers.
Classification and features
A representative genomic 16S rDNA sequence of T. cyclicum ALM 1T was compared using NCBI BLAST under default settings (e.g., considering only the HSPs from the best 250 hits) with the most recent release of the Greengenes database, and the relative frequencies of taxa and keywords (reduced to their stem) were determined, weighted by BLAST scores. The most frequently occurring genera were Thiomicrospira (74.7 %), Thioalkalimicrobium (11.2 %), ‘Thialkalimicrobium’ (8.4 %), Hydrogenovibrio (3.8 %) and ‘Thiovibrio’ (1.9 %) (49 hits in total). Regarding the single hit to sequences from members of the species, the average identity within HSPs was 98.7 %, whereas the average coverage by HSPs was 96.4 %. Regarding the single hit to sequences from other members of the genus, the average identity within HSPs was 98.5 %, whereas the average coverage by HSPs was 92.6 %. Among all other species, the one yielding the highest score was ‘Thialkalimicrobium sibericum’ (AF126549), which corresponded to an identity of 98.6 % and an HSP coverage of 96.3 %. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification; inverted commas indicate species names that are not approved.) The highest-scoring environmental sequence was DQ900619 (Greengenes short name ‘Sulfur-oxidizing Soap Lake (Washington State) meromictic haloalkaline unprecedented sulfide content lake water isolate ASL1ASL1 str. ASL1’, where ‘meromictic’ denotes a lake with separate oxic and anoxic water zones that do not intermix), which showed an identity of 99.7 % and an HSP coverage of 89.3 %. Environmental samples which yielded hits of a higher score than the highest scoring species were not found.
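The score-weighted tally of genera described above can be sketched in a few lines. This is only an illustration of the idea, not the pipeline used for this report: the hit list and its bit scores are invented placeholders, and `weighted_genus_frequencies` is a hypothetical helper name.

```python
from collections import defaultdict


def weighted_genus_frequencies(hits):
    """Return {genus: percent of the summed BLAST score} for (genus, score) pairs."""
    totals = defaultdict(float)
    for genus, score in hits:
        totals[genus] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {genus: 100.0 * s / grand_total for genus, s in totals.items()}


# Placeholder hits (genus, bit score); NOT taken from the actual search.
example_hits = [
    ("Thiomicrospira", 2400.0),
    ("Thiomicrospira", 2300.0),
    ("Thioalkalimicrobium", 2600.0),
    ("Hydrogenovibrio", 2200.0),
]

if __name__ == "__main__":
    freqs = weighted_genus_frequencies(example_hits)
    for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(f"{genus}: {pct:.1f} %")
```

Weighting by score rather than counting hits means that a genus represented by a few strong matches can outrank one represented by many weak matches.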
Figure 1 shows the phylogenetic neighborhood of T. cyclicum in a 16S rRNA based tree. The sequences of the two identical 16S rRNA gene copies in the genome differ by one nucleotide and a nine bp long gap from the previously published 16S rRNA sequence (AF329082), which contained nine ambiguous base calls.
The paraphyletic structure of the genus Thiomicrospira in Fig. 1 and the location of Hydrogenovibrio marinus and Galenea microaerophila within Thiomicrospira might indicate the need for genome sequence-based reclassifications once enough reference sequences become available.
Cells of T. cyclicum ALM 1T are non-motile, Gram-negative staining, irregular spheres often in the form of open rings with a diameter of 0.5–0.8 μm and a cell width of 0.3–0.4 μm (Table 1 and Fig. 2). Carboxysome-like structures were frequently observed. Colonies of strain ALM 1T are reddish and transparent, with a diameter of up to 3 mm. Cells oxidize thiosulfate and sulfide but grow less actively on polysulfide and tetrathionate. The pH range for growth is 6.5 to 11 (optimum 9.5) with a moderate salt concentration (about 0.6 M NaCl).
The original description of strain ALM 1T  did not provide any chemotaxonomic information. No new chemotaxonomical data were generated for this report.
Genome sequencing information
Genome project history
This organism was selected for sequencing as part of the DOE JGI CSP 2008. The genome project is deposited in the Genomes OnLine Database (GOLD) and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE JGI using state-of-the-art sequencing technology. A summary of the project information is shown in Table 2.
Growth conditions and genomic DNA preparation
Strain ALM 1T was grown from a culture of DSM 14477T in DSMZ medium 925 at 28 °C. gDNA was purified using the Genomic-tip 100 System (Qiagen) following the directions provided by the supplier. The purity, quality and size of the bulk gDNA preparation were assessed by JGI according to DOE-JGI guidelines which included electrophoretic separation of samples and comparison against standards of known molecular masses, analysis of UV absorption spectra and sequencing of the 16S rDNA.
Genome sequencing and assembly
The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website. Pyrosequencing reads were assembled using the Newbler assembler. The initial Newbler assembly consisted of 15 contigs in one scaffold, and the consensus contigs were computationally shredded to form 2 kb overlapping reads. Illumina GAii sequencing data (3,091 Mb) were assembled with Velvet and the consensus sequences were computationally shredded into 1.5 kb overlapping reads. The computational shreds from both assemblies were assembled together with the 454 long-insert paired end reads using phrap [11, 12]. The 454 draft assembly was based on 171.4 Mb 454 draft data and all of the 454 paired end data. The Phred/Phrap/Consed software package [11–13] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution [14, 65], Dupfinisher, or sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by bubble PCR primer walks (J.-F. Chang, unpublished). A total of 74 additional reactions and one shatter library were necessary to close gaps and to raise the quality of the final sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using the Polisher software developed at JGI. The error rate of the final genome sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454 sequencing platforms provided 1559.4 × coverage of the genome. The final assembly contained 216,642 pyrosequence and 38,029,488 Illumina reads.
Genes were identified using Prodigal as part of the DOE-JGI genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline. The predicted CDSs were translated and used to search the NCBI non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Additional gene prediction analysis and functional annotation were performed within the IMG-ER platform.
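The computational shredding step mentioned above, in which consensus contigs are cut into overlapping pseudo-reads, can be illustrated with a short sketch. The text gives only the shred lengths (2 kb and 1.5 kb); the step size between consecutive shreds, the function name `shred` and the toy contig are assumptions made for illustration, not a description of the JGI tooling.

```python
def shred(sequence, read_length, step):
    """Cut `sequence` into overlapping pseudo-reads.

    Consecutive reads start `step` bases apart, so neighbours overlap by
    read_length - step bases. A final read anchored to the end of the
    sequence is added when needed so that no bases are dropped.
    """
    if not 0 < step <= read_length:
        raise ValueError("step must be between 1 and read_length")
    if len(sequence) <= read_length:
        return [sequence] if sequence else []
    last_start = len(sequence) - read_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + read_length] for s in starts]


if __name__ == "__main__":
    # Toy 10 kb contig; the 1 kb step (50 % overlap) is an assumed value.
    contig = "ACGT" * 2500
    reads = shred(contig, read_length=2000, step=1000)
    print(len(reads), "shreds of", len(reads[0]), "bp")
```

Shredding lets assemblies produced by different assemblers be fed into a single downstream assembler as if they were ordinary reads.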
The genome consists of a circular 1,932,455 bp chromosome with 47 % G + C content (Table 3 and Fig. 3). Of the 1734 genes predicted, 1684 were protein-coding genes, and 50 RNAs; 19 pseudogenes were also identified. The majority of the protein-coding genes (78.5 %) were assigned a putative function while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
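The headline figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the derived counts are approximate because the 78.5 % figure is itself rounded, and the function and key names are illustrative choices.

```python
GENOME_LENGTH_BP = 1_932_455
TOTAL_GENES = 1734
PROTEIN_CODING = 1684
RNA_GENES = 50
FRACTION_WITH_FUNCTION = 0.785  # share of protein-coding genes, as reported


def genome_summary():
    """Derive a few summary statistics from the reported gene counts."""
    with_function = round(FRACTION_WITH_FUNCTION * PROTEIN_CODING)
    return {
        "counts_add_up": PROTEIN_CODING + RNA_GENES == TOTAL_GENES,
        "protein_coding_percent": 100.0 * PROTEIN_CODING / TOTAL_GENES,
        "genes_per_kb": TOTAL_GENES / (GENOME_LENGTH_BP / 1000.0),
        "approx_with_function": with_function,
        "approx_hypothetical": PROTEIN_CODING - with_function,
    }


if __name__ == "__main__":
    for key, value in genome_summary().items():
        print(key, value)
```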
Insights from the genome sequence
T. cyclicum has been described as an obligate chemolithoautotroph, and its genome contains only 1684 protein-encoding genes, indicating a reduction in gene content, possibly in adaptation to this lifestyle. Reductions in genome size are a common feature in bacteria from many specialized ecological niches where relatively stable growth conditions are encountered. Examples include other free-living bacteria in the family Piscirickettsiaceae, Thioalkalimicrobium species, Thiomicrospira crunogena and related species, although in most cases the genome reduction is not as extreme as in this case and ~2000 proteins are present.
During chemolithoautotrophic growth, strain ALM 1T oxidizes reduced sulfur compounds such as thiosulfate or sulfide without the formation of sulfur as an intermediate, which suggests that it uses a Sox-type sulfur oxidation pathway rather than a combination of DSR and Sox proteins, which leads to the formation of elemental sulfur as an intermediate and seems to be common in other Gammaproteobacteria, such as the phototrophic purple sulfur bacteria. The Sox sulfur oxidation pathway relies on four essential enzyme complexes, SoxAX, SoxYZ, SoxB, and SoxCD, to oxidize reduced sulfur compounds to sulfate without the formation of free intermediates [28, 30, 31], and all of these proteins are encoded in the T. cyclicum genome. In the reaction cycle of the Sox multienzyme complex, the SoxAX cytochrome (encoded by Thicy_0216 and Thicy_0219) catalyzes covalent attachment of reduced sulfur compounds such as thiosulfate to the SoxYZ carrier protein (Thicy_0217, Thicy_0218). The manganese-containing SoxB protein (Thicy_0833) then removes the fully oxidized sulfur residues from SoxYZ through hydrolysis, while the SoxCD sulfane dehydrogenase (Thicy_0050, Thicy_0051), a heterotetrameric complex of a molybdenum protein and a cytochrome, catalyzes a six-electron oxidation of reduced sulfur residues bound to SoxYZ [32–36] (Fig. 4). All T. cyclicum Sox proteins are most closely related to homologues found in Thioalkalimicrobium aerophilum (94–95 % amino acid identity) as well as Thiomicrospira sp. and Hydrogenovibrio marinus, which is in keeping with the phylogenetic position of this bacterium.
In T. cyclicum the sox genes encoding the essential components of the Sox multienzyme complex are distributed across three separate genomic loci (Fig. 4). This is similar to what has been seen in the related Thiomicrospira species, but differs from the situation in other Proteobacteria, e.g. the α-Proteobacterium Paracoccus pantotrophus, where the genes encoding the core Sox enzymes as well as genes encoding accessory Sox proteins such as SoxV, W, H, S or the SoxF sulfide dehydrogenase are located in the same gene region and often even within only one or two major operons [28, 38].
In T. cyclicum ALM 1T, most of the genes encoding accessory Sox proteins appear to be absent; only a homologue of the SoxF flavocytochrome (Thicy_0003, Thicy_0004) (Fig. 4) and a gene encoding a homologue of the SoxH protein (Thicy_0412), the exact function of which is unknown, were detected during our analyses of the genome. This leads to the question of how essential these accessory proteins are for the function of the Sox complex. T. cyclicum, unlike many of the sulfur-oxidizing α-Proteobacteria, is an obligate chemolithoautotroph and thus relies on the optimal function of this pathway for energy generation, and yet does not appear to rely on the accessory proteins to keep the core Sox enzymes functional.
Studies of the SoxAX cytochromes in various Proteobacteria have led to the realization that these proteins are extremely diverse. There are currently three recognized types of proteins that vary significantly in terms of the redox cofactors present as well as their subunit structure and specifically the sequences of the SoxX proteins involved in the SoxAX complexes [30, 31]. In comparison to biochemically characterized SoxA proteins, the protein encoded by the T. cyclicum soxA gene is most closely related to the Type III SoxA protein from Allochromatium vinosum (33.9 % amino acid sequence identity, as opposed to 21.8 and 17.4 % identity with the Type I and Type II SoxA proteins from P. pantotrophus and S. novella) (Fig. 5). Type III SoxAX proteins normally contain three subunits, SoxA, SoxX and SoxK. The low molecular weight SoxK protein is required to stabilize the complex of SoxA and SoxX. In T. cyclicum, however, no gene encoding a protein homologous to SoxK appears to be present, which indicates that there is even more diversity of SoxAX proteins than previously assumed. A similar situation was already described by Ogawa et al. for the SoxAX protein from the related bacterium, Thiomicrospira crunogena, and has also been discussed in depth, including a phylogenetic analysis across all groups of SoxA related proteins, in two recent reviews [30, 31] (Fig. 5).
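The three-locus arrangement can be reproduced from the locus tags given in the text. The sketch below makes two assumptions that should be read as such: tags are written in a uniform four-digit form, and consecutive tag numbers are used as a proxy for physical adjacency on the chromosome. `group_into_loci` is a hypothetical helper, not part of any annotation tool.

```python
import re

# Core Sox components and their locus tags as listed in the text.
SOX_CORE = {
    "SoxAX": ["Thicy_0216", "Thicy_0219"],
    "SoxYZ": ["Thicy_0217", "Thicy_0218"],
    "SoxB": ["Thicy_0833"],
    "SoxCD": ["Thicy_0050", "Thicy_0051"],
}


def tag_number(locus_tag):
    """Extract the numeric part of a locus tag, e.g. 'Thicy_0833' -> 833."""
    match = re.fullmatch(r"Thicy_(\d+)", locus_tag)
    if match is None:
        raise ValueError(f"unexpected locus tag: {locus_tag}")
    return int(match.group(1))


def group_into_loci(components, max_gap=1):
    """Group genes whose tag numbers differ by at most `max_gap` (assumed adjacency)."""
    tagged = sorted(
        (tag_number(tag), name)
        for name, tags in components.items()
        for tag in tags
    )
    loci = []
    for number, name in tagged:
        if loci and number - loci[-1][-1][0] <= max_gap:
            loci[-1].append((number, name))
        else:
            loci.append([(number, name)])
    return loci


if __name__ == "__main__":
    for locus in group_into_loci(SOX_CORE):
        print([f"{name}:{number:04d}" for number, name in locus])
```

Under these assumptions the genes fall into a soxCD pair, a four-gene cluster encoding SoxAX and SoxYZ, and a stand-alone soxB gene.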
Although the Sox sulfur oxidation pathway has been recognized as a key pathway in microbial sulfur chemolithotrophy, issues still exist with the annotation of the various genes in automated annotation pipelines. For example, soxB genes are often annotated as encoding a ‘5′-nucleotidase’, which is correct in that SoxB does belong to this larger group of enzymes, but at the same time creates confusion as to the actual nature of the encoded protein. Dedicated SoxB protein domains exist (e.g. cd07411, abbreviated as “MPP_soxB_N” or SoxB proteins with an N-terminal metallophosphatase domain), and recently a dedicated full-length domain, TIGR04486 (thiosulf_soxB), has been defined.
Another curiosity is the annotation of the SoxCD sulfane dehydrogenase (Thicy_0050/0051) as a ‘SO (sulfite oxidase) family protein’ (which is correct), and then as a ‘nitrate reductase (NADH)’. As far as is currently known, the sulfite oxidase enzyme family only contains nitrate reductases from plants, and no prokaryotic nitrate reductases have ever been found in this enzyme family. Clearly, there is scope for improving the specificity of current COGs/cd patterns to avoid such obvious errors in the future, although the conserved domain cd_02113 is diagnostic for SoxC proteins, regardless of their annotation.
We also analyzed the T. cyclicum genome for other proteins known to be involved in sulfur oxidation that are not part of the Sox multienzyme complex. Such proteins are frequently found in sulfur-oxidizing bacteria and enhance their ability to use different sulfur compounds, including those that are not generally recognized as substrates of the Sox pathways (e.g. tetrathionate) or to cope with toxic sulfur compounds that can be byproducts of abiotic sulfur conversions (e.g. sulfite and sulfide converting enzymes). No genes encoding homologues of DSR, APS reductase (aprABM), tetrathionate hydrolase (tth gene) or sulfite dehydrogenases (sorAB) were identified. However, we did identify two genes (Thicy_0064 and Thicy_1132) that encode proteins with strong similarities to proteins annotated as SQRs in T. crunogena (Tcr_1170, Tcr_1381).
The protein encoded by the putative SQR gene Thicy_1132 is actually related to Ndh NADH dehydrogenase-type proteins, while Thicy_0064 shows homology to ‘HcaD uncharacterized FAD dependent dehydrogenases, COG0446’. Using the SQR classification system of Gregersen et al., the two T. cyclicum SQRs could be classified as a periplasmic (34 aa Tat-leader peptide; 52 % conserved aa) SqrB type protein (Thicy_0064), and a soluble, likely cytoplasmic SqrF-like protein (Thicy_1132, 49 % conserved aa). Interestingly, the Thicy_1132 encoded protein only has homology to the SqrF-like proteins, while the Thicy_0064 encoded protein exhibited significant homologies to all SQR types except SqrE and SqrF. It would thus appear that T. cyclicum contains two SQRs of different types, as representatives from both SqrB and SqrF groups have been enzymatically characterized.
Overall, despite the fact that sulfur oxidation is a key element of T. cyclicum metabolism, the actual number of genes supporting this process is very small and shows very little redundancy or diversity. All genes encoding essential proteins of the Sox pathway are present as single copies, and genes encoding other enzymes known to support chemolithotrophic growth on sulfur compounds are absent. This is in contrast to other sulfur-oxidizing bacteria such as the haloalkaliphilic Thioalkalivibrio sulfidophilus, which contains several copies of genes encoding SoxAX proteins, and Starkeya novella, where two copies of SoxAX and SoxYZ encoding genes are present, as well as additional genes encoding sulfur converting enzymes that are not part of the Sox complex.
With chemolithoautotrophy being the major growth mode for T. cyclicum, we also investigated the carbon dioxide fixation pathways present in this bacterium. Of the known microbial pathways for carbon dioxide fixation only the Calvin-Benson-Bassham cycle was present, and carbon dioxide incorporation into phosphoenolpyruvate to form oxaloacetate, a required intermediate of the TCA cycle, was also identified using the KEGG pathway database.
Central carbon metabolism in T. cyclicum includes a complete set of genes encoding glycolysis and the pentose phosphate pathway as well as a pyruvate dehydrogenase enzyme complex and several routes by which pyruvate can be converted into oxaloacetate (PEP synthase, Thicy_1283, EC 2.7.9.2; PEP carboxylase, Thicy_1240, EC 4.1.1.31) or lactate (D-lactate dehydrogenase, Thicy_1457, EC 1.1.1.28).
The TCA cycle of T. cyclicum is incomplete, with genes encoding the 2-oxoglutarate dehydrogenase or homologous enzymes (e.g. 2-oxoglutarate:ferredoxin oxidoreductase, KorAB) not having been identified in the genome. This indicates that in T. cyclicum the TCA cycle mainly serves biosynthetic purposes rather than being part of general energy generation, which is in keeping with the chemolithoautotrophic lifestyle of this bacterium, as sulfur oxidation by the Sox pathway or via SQRs will feed electrons directly into the respiratory chain for energy generation.
The respiratory chain of T. cyclicum is of a very linear architecture, with only complex I being represented by three different types of NADH dehydrogenases. A multisubunit (‘mitochondrial type’) NADH dehydrogenase is encoded by the nuo gene cluster (Thicy_0637–0650), while the other two are encoded by two genes (Thicy_1224–1225) and a single gene (Thicy_0083), respectively. Complex II/succinate dehydrogenase is encoded by genes Thicy_0875–0878, while Complex III/cytochrome bc1-complex is encoded by Thicy_0482–0484. Only a single gene cluster encoding a cytochrome c oxidase appears to be present (Thicy_1535–1529), which encodes a cbb3-type cytochrome oxidase. This type of cytochrome oxidase is known to have a high affinity for oxygen and thus has been associated with microaerophilic growth conditions [43, 44], suggesting that in its natural environment T. cyclicum encounters medium to low oxygen tensions. In addition to this function, cbb3-type cytochrome oxidases have been implicated in affecting various regulatory processes in bacterial cells [45–47], including redox regulation and responses to environmental conditions, and it is possible that the enzyme from T. cyclicum also fulfills additional, regulatory functions. An F-type ATPase (Thicy_1606–1612) completes the respiratory chain.
With only ~1700 encoded genes, the genome of T. cyclicum ALM 1T is relatively small compared to genomes from related sulfur-oxidizing bacteria such as Thiomicrospira sp. or Thioalkalivibrio sp., which generally contain ~2000 or more protein-encoding genes. The reduction in genome size becomes even more obvious in comparison to other sulfur chemolithoautotrophic bacteria (e.g. Starkeya novella, Paracoccus sp., Rhizobiales sp.) that often have more than 4000 encoded genes and also tend to encode redundant pathways. This again is likely to reflect the limited availability of substrates for energy generation in the organism’s natural habitat, which is an extreme environment with high alkalinity and salinity.
Despite the reliance of T. cyclicum on autotrophy for acquiring cell carbon, only a single pathway for carbon dioxide fixation was found, and only the Sox pathway for sulfur oxidation and a few additional proteins that enable efficient use of sulfide as an energy source (SQRs and flavocytochromes) were identified. This is in keeping with a direct oxidation of sulfur substrates such as thiosulfate and sulfide to sulfate without intermediate formation of elemental sulfur, which is a trait of the other major sulfur oxidation pathway that uses the DsrAB dissimilatory sulfite reductase. It is also supported by our observations on aerobic cultures of T. cyclicum supplemented with thiosulfate as an energy source, which showed no sign of sulfur formation; sulfur formation would have led to increased, optically apparent turbidity of the culture during growth.
However, with about 20 % of genes having either unknown functions or not being assigned a COG category, there are clearly many things that can still be discovered regarding this organism. The apparent absence of accessory genes aiding in the maturation of the essential Sox sulfur oxidation enzymes is unusual, and should be further investigated, as should the effect of a high pH environment on the physical and catalytic properties of the periplasmic Sox proteins. T. cyclicum also prefers moderate salt concentrations, and it would be interesting to carry out comparative studies on compatible solutes and other adaptations between species of haloalkaliphilic sulfur oxidizers.
Taxonomic and nomenclatural proposals
The difference between the reported G + C content of T. cyclicum (49.6 %) and the one calculated from the genome sequence (47.0 %) calls for an emendation of the species description. The genome sequence-derived G + C content is also outside of the 48 to 51.2 % G + C range reported for the genus Thioalkalimicrobium.
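For readers who want to repeat this kind of check, G + C content can be computed directly from a nucleotide sequence. The sketch below is illustrative only: the toy sequence is a placeholder rather than the T. cyclicum chromosome, ambiguous bases are simply ignored (an assumed convention), and the function names are hypothetical.

```python
def gc_percent(sequence):
    """Return G + C content in percent, counting only unambiguous A/C/G/T bases."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def within_range(value, low, high):
    """True if `value` lies in the closed interval [low, high]."""
    return low <= value <= high


if __name__ == "__main__":
    toy_sequence = "ATGCGCATTAGCNNATGC"  # placeholder, not real genome data
    print(f"toy G + C: {gc_percent(toy_sequence):.1f} %")
    # Values quoted in the text: genome-derived 47.0 %, genus range 48 to 51.2 %.
    print("47.0 % inside genus range:", within_range(47.0, 48.0, 51.2))
```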
Abbreviations
CSP: Community Sequencing Program
DSM: Deutsche Sammlung von Mikroorganismen
DSMZ: Deutsche Sammlung von Mikroorganismen und Zellkulturen (German Collection of Microorganisms and Cell Cultures)
DSR: dissimilatory sulfite reductase
HSP: high-scoring segment pair
IMG-ER: Integrated Microbial Genomes - Expert Review
IUMS: International Union of Microbiological Societies
JCM: Japan Collection of Microorganisms
SQR: sulfide:quinone reductase
References
Sorokin DY, Gorlenko VM, Tourova TP, Tsapin A, Nealson KH, Kuenen GJ. Thioalkalimicrobium cyclicum sp. nov. and Thioalkalivibrio jannaschii sp. nov., novel species of haloalkaliphilic, obligately chemolithoautotrophic sulfur-oxidizing bacteria from hypersaline alkaline Mono Lake (California). Int J Syst Evol Microbiol. 2002;52:913–20.
Sorokin DY, Lysenko AM, Mityushina LL, Tourova TP, Jones BE, Rainey FA, Robertson LA, Kuenen GJ. Thioalkalimicrobium aerophilum gen. nov., sp. nov. and Thioalkalimicrobium sibericum sp. nov., and Thioalkalivibrio versutus gen. nov., sp. nov., Thioalkalivibrio nitratis sp. nov. and Thioalkalivibrio denitrificans sp. nov., novel obligately alkaliphilic and obligately chemolithoautotrophic sulfur- oxidizing bacteria from soda lakes. Int J Syst Evol Microbiol. 2001;51:565–80.
De Vos P, Truper HG, Tindall BJ. Judicial Commission of the International Committee on Systematics of Prokaryotes, Xth International (IUMS) Congress of Bacteriology and Applied Microbiology. Minutes of the meetings, 28, 29 and 31 July and 1 August 2002, Paris, France. Int J Syst Evol Microbiol. 2005;55:525–35.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006;72(7):5069–72.
Porter MF. An algorithm for suffix stripping. Program: electronic library and information systems. 1980;14:130–7.
Pagani I, Liolios K, Jansson J, Chen I-MA, Smirnova T, Nosrat B, Markowitz VM, Kyrpides NC. The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2012;40(D1):D571–9.
Mavromatis K, Land ML, Brettin TS, Quest DJ, Copeland A, Clum A, Goodwin L, Woyke T, Lapidus A, Klenk HP and others. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation. PLoS One. 2012;7(12):e48837.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen Y-J, Chen Z and others. Genome sequencing in open microfabricated high density picoliter reactors. Nature. 2005;437(7057):376–80.
Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9.
Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8(3):175–85.
Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8(3):186–94.
Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. 1998;8(3):195–202.
LaButti K, Foster B, Han CG, Brettin TS, Lapidus A. Gap Resolution: A software package for improving Newbler genome assemblies. Report Number: LBNL-1899E Abs. 2009.
Han C, Chain P. Finishing repeat regions automatically with Dupfinisher. In: Arabnia HR, Valafar H, editors. Proceedings of the 2006 International Conference on Bioinformatics & Computational Biology. CSREA Press; 2006. p. 141–6.
Smith DR. Ligation-mediated PCR of restriction fragments from large DNA molecules. Genome Res. 1992;2:21–7.
Lapidus A, LaButti K, Foster BSL, Trong SEG. POLISHER: an effective tool for using ultra short reads in microbial genome assembly and finishing. AGBT, Marco Island, FL; 2008.
Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A, Kyrpides NC. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods. 2010;7(6):455–7.
Consortium TU. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(D1):D204–12.
Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic Acids Res. 2003;31(1):371–3.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J and others. Pfam: the protein families database. Nucleic Acids Res. 2014;42(D1):D222–30.
Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 2003;31(22):6633–9.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012;40(D1):D109–14.
Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33–6.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L and others. InterPro: the integrative protein signature database. Nucleic Acids Res. 2009;37(Database issue):D211–5.
Markowitz VM, Mavromatis K, Ivanova NN, Chen I-MA, Chu K, Kyrpides NC. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics. 2009;25(17):2271–8.
Friedrich CG, Bardischewsky F, Rother D, Quentmeier A, Fischer J. Prokaryotic sulfur oxidation. Current Opinion in Microbiol. 2005;8(3):253–9.
Frigaard NU, Dahl C. Sulfur metabolism in phototrophic sulfur bacteria. In: Poole RK, editor. Advances in Microbial Physiology. Volume 54. Academic Press; 2008. p. 103–200.
Kappler U, Maher M. The bacterial SoxAX cytochromes. Cell Mol Life Sci. 2013;70(6):977–92.
Kappler U, Maher MJ. SoxAX cytochromes. In: Scott RA, editor. Encyclopedia of Inorganic and Bioinorganic Chemistry. Chichester: John Wiley; 2013. doi:10.1002/9781119951438.eibc2169.
Sauve V, Roversi P, Leath KJ, Garman EF, Antrobus R, Lea SM, Berks BC. Mechanism for the hydrolysis of a sulfur-sulfur bond based on the crystal structure of the thiosulfohydrolase SoxB. J Biol Chem. 2009;284(32):21707–18.
Sauve V, Bruno S, Berks BC, Hemmings AM. The SoxYZ complex carries sulfur cycle intermediates on a peptide swinging arm. J Biol Chem. 2007;282(32):23194–204.
Bamford VA, Bruno S, Rasmussen T, Appia-Ayme C, Cheesman MR, Berks BC, Hemmings AM. Structural basis for the oxidation of thiosulfate by a sulfur cycle enzyme. EMBO J. 2002;21(21):5599–610.
Zander U, Faust A, Klink BU, de Sanctis D, Panjikar S, Quentmeier A, Bardischewsky F, Friedrich CG, Scheidig AJ. Structural basis for the oxidation of protein-bound sulfur by the sulfur cycle molybdohemo-enzyme sulfane dehydrogenase SoxCD. J Biol Chem. 2011;286(10):8349–60.
Kilmartin JR, Maher MJ, Krusong K, Noble CJ, Hanson GR, Bernhardt PV, Riley MJ, Kappler U. Insights into structure and function of the active site of SoxAX cytochromes. J Biol Chem. 2011;286(28):24872–81.
Scott KM, Sievert SM, Abril FN, Ball LA, Barrett CJ, Blake RA, Boller AJ, Chain PSG, Clark JA, Davis CR and others. The genome of deep-sea vent chemolithoautotroph Thiomicrospira crunogena XCL-2. PLoS Biol. 2006;4(12):2196–212.
Kappler U, Aguey-Zinsou KF, Hanson GR, Bernhardt PV, McEwan AG. Cytochrome c551 from Starkeya novella: characterization, spectroscopic properties, and phylogeny of a diheme protein of the SoxAX family. J Biol Chem. 2004;279(8):6252–60.
Ogawa T, Furusawa T, Nomura R, Seo D, Hosoya-Matsuda N, Sakurai H, Inoue K. SoxAX binding protein, a novel component of the thiosulfate-oxidizing multienzyme system in the green sulfur bacterium Chlorobium tepidum. J Bact. 2008;190(18):6097–110.
Gregersen LH, Bryant DA, Frigaard N-U. Mechanisms and evolution of oxidative sulfur metabolism in green sulfur bacteria. Front Microbiol. 2011;2. doi: 10.3389/fmicb.2011.00116.
Muyzer G, Sorokin DY, Mavromatis K, Lapidus A, Clum A, Ivanova N, Pati A, d’Haeseleer P, Woyke T, Kyrpides NC. Complete genome sequence of “Thioalkalivibrio sulfidophilus” HL-EbGr7. Stand Genomic Sci. 2011;4(1):23–35.
Kappler U, Davenport K, Beatson S, Lucas S, Lapidus A, Copeland A, Berry KW, Glavina Del Rio T, Hammon N, Dalin E and others. Complete genome sequence of the facultatively chemolithoautotrophic and methylotrophic alpha-Proteobacterium Starkeya novella type strain (ATCC 8093 T). Stand Genomic Sci. 2012;7(1):44–58.
Cosseau C, Batut J. Genomics of the ccoNOQP-encoded cbb3 oxidase complex in bacteria. Arch Microbiol. 2004;181(2):89–96.
Marchal K, Sun J, Keijers V, Haaker H, Vanderleyden J. A Cytochrome cbb3 (Cytochrome c) terminal oxidase in Azospirillum brasilense Sp7 supports microaerobic growth. J Bact. 1998;180(21):5689–96.
Hamada M, Toyofuku M, Miyano T, Nomura N. cbb3-type cytochrome c oxidases, aerobic respiratory enzymes, impact the anaerobic life of Pseudomonas aeruginosa PAO1. J Bact. 2014;196(22):3881–9.
Colburn-Clifford J, Allen C. A cbb3-Type Cytochrome C oxidase contributes to Ralstonia solanacearum R3bv2 growth in microaerobic environments and to bacterial wilt disease development in tomato. Mol Plant-Microbe Interactions. 2010;23(8):1042–52.
Kim YJ, Ko IJ, Lee JM, Kang HY, Kim YM, Kaplan S, Oh JI. Dominant role of the cbb3 oxidase in regulation of photosynthesis gene expression through the PrrBA system in Rhodobacter sphaeroides 2.4.1. J Bact. 2007;189(15):5617–25.
Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV and others. The minimum information about a genome sequence (MIGS) specification. Nat Biotech. 2008;26(5):541–7.
Deck J, Barker K, Beaman R, Buttigieg PL, Dröge G, Guralnick R, Miller C, Tuama ÉÓ, Murrell Z, Parr C and others. Clarifying concepts and terms in biodiversity informatics. Stand Genomic Sci. 2013;8(2):352–9.
Garrity G. NamesforLife BrowserTool takes expertise out of the database and puts it right in the browser. Microbiology Today. 2010;37:9.
Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A. 1990;87(12):4576–9.
Garrity GM, Bell JA, Lilburn T. Phylum XIV. Proteobacteria phyl. nov. In: Brenner DJ, Krieg NR, Staley JT, Garrity GM, editors. Bergey’s Manual of Systematic Bacteriology. New York: Springer; 2005. p. 1. Volume 2 pt B.
Garrity GM, Bell JA, Lilburn T. Class III. Gammaproteobacteria class. nov. In: Brenner DJ, Krieg NR, Staley JT, Garrity GM, editors. Bergey’s Manual of Systematic Bacteriology. 2nd ed. New York: Springer; 2005. p. 1–59. Volume 2 pt B.
Validation List No. 106. List of new names and new combinations previously effectively, but not validly, published. Int J Syst Evol Micro. 2005;55:2235–8.
Garrity GM, Bell JA, Lilburn T. Order V. Thiotrichales ord. nov. In: Brenner DJ, Krieg NR, Staley JT, Garrity GM, editors. Bergey’s Manual of Systematic Bacteriology. 2nd ed. New York: Springer; 2005. p. 131–210. Volume 2 pt B.
Fryer JL, Lannan CN. Family II. Piscirickettsiaceae fam. nov. In: Brenner DJ, Krieg NR, Staley JT, Garrity GM, editors. Bergey’s Manual of Systematic Bacteriology. 2nd ed. New York: Springer; 2005. p. 180. Volume 2 pt B.
The Gene Ontology Consortium, Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS and others. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
Lee C, Grasso C, Sharlow MF. Multiple sequence alignment using partial order graphs. Bioinformatics. 2002;18(3):452–64.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17(4):540–52.
Stamatakis A, Hoover P, Rougemont J. A rapid bootstrap algorithm for the RAxML web servers. Syst Biol. 2008;57(5):758–71.
Pattengale ND, Alipour M, Bininda-Emonds ORP, Moret BME, Stamatakis A. How many bootstrap replicates are necessary? J Comp Biol. 2010;17(3):337–54.
Swofford DL. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), version 4.0b10. Sunderland: Sinauer Associates; 2002.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28:2731–9.
JGI website: www.jgi.doe.gov.
Website for gapResolution: jgi.doe.gov/data-and-tools/gap-resolution/
The work conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and a Fellowship and grant to UK (DP 0878525).
UK, SB, MdCM-C, MG, HPK, and NCK drafted the manuscript. KD, AL, CP, CH, ML, LH, NI and TW sequenced, assembled and annotated the genome. MR provided the electron micrograph. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.