Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds

Jiangella gansuensis strain YIM 002T is the type strain of the type species of the genus Jiangella, which currently comprises five species. It was isolated from a desert soil sample in Gansu Province (China). The type strains of the five species form a monophyletic group in a 16S rRNA gene sequence phylogeny inferred with closely related actinobacterial genera. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strain YIM 002T consists of a single scaffold of 5,585,780 bp. It contains 4905 protein-coding genes (2520 of them annotated as hypothetical proteins), 50 RNA genes (including 4 rRNA genes) and 149 pseudogenes. A comparison of genome sizes within the genus shows that J. gansuensis has a smaller genome than the other sequenced Jiangella species. This suggests that the strain may have lost a substantial amount of genetic information while adapting to the desert environment. Seven new compounds from this bacterium have recently been described. However, its biosynthetic potential is probably greater: secondary metabolite gene cluster analysis predicted 60 gene clusters, including clusters indicating the potential to produce pristinamycin.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-017-0226-6) contains supplementary material, which is available to authorized users.

The genus Jiangella was first described by Song et al. in 2005 and currently includes five halotolerant species listed in LPSN [2]. Members of this genus have been isolated from different habitats. Beyond the polyphasic taxonomic descriptions that established them, based on a combination of phenotypic and genotypic characteristics, they have rarely been studied [1,3–6]. Jiangella was originally proposed as a new genus of the family Nocardioidaceae within the suborder Propionibacterineae [1] on the basis of phenotypic and genotypic criteria. Later, however, the phylogenetic relationships of the Actinobacteria at higher taxonomic ranks were reconstructed from 16S rRNA gene sequences and related evidence, such as taxon-specific 16S rRNA gene signature nucleotides [7,8]. After Tang et al. described the genus Haloactinopolyspora, Jiangella and Haloactinopolyspora were placed together in the novel family Jiangellaceae within Jiangellineae subord. nov. [9]. This placement rested mainly on their signature nucleotide patterns, 16S rRNA gene similarity and phylogenetic criteria. J. gansuensis is currently placed in the family Jiangellaceae of the order Jiangellales within the class Actinobacteria [10].
The capacity of J. gansuensis YIM 002T to produce seven new compounds has previously been shown [11]. These comprise five pyrrol-2-aldehyde compounds (jiangrines A-E), one indolizine derivative (jiangrine F) and one glycolipid (jiangolide), and they highlight the importance of this bacterium as a novel source of secondary metabolites. As part of the GEBA project, and considering its phylogenetic position and biological significance, we sequenced the genome of the type strain of J. gansuensis. Here we present a summary classification and a set of features for J. gansuensis YIM 002T. We also describe the genome sequencing and annotation and give a brief overview of the genome.

Classification and features
Strain YIM 002T is a free-living isolate. It was collected from a desert soil sample in Gansu Province during an investigation of the microbial diversity of extreme environments. This actinobacterium forms well-differentiated, non-sporulating aerial and substrate mycelia. On nutrient agar (NA), its aerial hyphae are initially yellowish-white and turn orange-yellow after a few days. Its substrate mycelia fragment into short or elongated rods in the early phase of growth (Fig. 1). Growth was observed on ISP 2, ISP 3, ISP 4, ISP 5, nutrient agar and Czapek's agar [1,12]. The type strain tolerates pH 5.0 to 10.0 and grows at NaCl concentrations of 0 to 10% (w/v), with no growth at 12.5%. Optimal growth of strain YIM 002T occurs at pH 7.0-8.0, 1-5% (w/v) NaCl and 28 °C. The diamino acid in the peptidoglycan is LL-2,6-diaminopimelate, and MK-9(H4) is the predominant menaquinone. The phospholipid profile of strain DSM 44835T consists mainly of phosphatidylinositol mannosides, phosphatidylinositol and diphosphatidylglycerol. The major cellular fatty acids (>10%) are anteiso-C15:0, anteiso-C17:0 and iso-C15:0. Whole-cell sugars include glucose and ribose. The amino acids of the peptidoglycan are LL-A2pm, alanine, glycine and glutamic acid [1]. The DNA G + C content of the type strain was previously determined as 70%, whereas genome analysis gave a slightly higher value of 70.91%.
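The genome-derived value of 70.91% is the fraction of G and C among all called bases in the assembled sequence. The following is a minimal sketch of that calculation. It assumes the sequence is available as a plain string. The function name `gc_content` is illustrative and not part of any pipeline used in this study. Ignoring ambiguity codes such as N is likewise an assumption of the sketch.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage.

    Only unambiguous A, C, G and T are counted; gaps and ambiguity
    codes such as N are ignored so they do not dilute the value
    (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Toy example; for the real genome, pass the assembled scaffold sequence.
print(f"{gc_content('GGCCATNN'):.2f}%")  # prints 66.67% (4 G/C of 6 called bases)
```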
The genome of J. gansuensis YIM 002T contains one almost full-length 16S rRNA gene sequence. It is identical to the sequence deposited with the original species description (AY631071). This 16S rRNA sequence was compared using the EzTaxon-e server [13]. The highest similarity was to Jiangella alba YIM 61503T (98.93%), followed by the other species of the genus: Jiangella muralis 15-Je-017T (98.88%), Jiangella mangrovi 3SM4-07T (98.49%) and Jiangella alkaliphila D8-87T (98.10%). The closest other genera are Haloactinopolyspora [9] and Phytoactinopolyspora [14]. Strains of the genus Jiangella possess many 16S rRNA gene signature nucleotides compared with most other described actinomycetes. These allow them to be readily distinguished from other actinobacteria, especially in 11 … Phylogenetic analyses were performed using both the neighbor-joining (NJ) and maximum-likelihood (ML) algorithms. The NJ tree based on 16S rRNA genes supports Jiangella, together with the genera Haloactinopolyspora and Phytoactinopolyspora, as an independent lineage (Fig. 2 and Additional file 1: Figure S1). This prompts reconsideration of the relationships among the three families Jiangellaceae, Nocardioidaceae and Pseudonocardiaceae. The ML tree (Additional file 1: Figure S1) recovers the same positions within Jiangellaceae as the NJ tree. Minimum Information about the Genome Sequence is provided in Table 1.
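Similarity values such as 98.93% are percent identities between two aligned 16S rRNA sequences. The following is a minimal sketch of that idea, assuming the two sequences have already been aligned to equal length. The text does not say how EzTaxon-e builds its alignments or treats gaps. Skipping gap columns here is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (an
    assumption of this sketch), so the result is identical positions
    divided by compared positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no overlapping aligned positions")
    return 100.0 * identical / compared


# 9 compared columns (one gap skipped), 8 identical
print(f"{percent_identity('ACGT-ACGTA', 'ACGTTACGAA'):.2f}")  # prints 88.89
```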

Genome project history
This organism was selected for sequencing because of its important phylogenetic position and biological significance [15,16], and to improve understanding of the school of 'evolutionary taxonomy' [17]. Sequencing of J. gansuensis YIM 002T is part of the Genomic Encyclopedia of Bacteria and Archaea pilot project [18], which aims to generate high-quality draft genomes for bacterial and archaeal strains. The genome project is deposited in the Genomes OnLine Database (GOLD) [19], and the finished genome sequence was deposited in GenBank. Genome sequencing, finishing and annotation were performed by the Department of Energy, Joint Genome Institute (JGI) using state-of-the-art sequencing technology [20]. A summary of the project information, compliant with MIGS version 2.0 [21], is shown in Table 2.
Fig. 2 Neighbour-joining tree built using MEGA 5 [39] with the Kimura 2-parameter model. Bootstrap values (percentages of 1000 replicates) are shown at branch points. Asterisks denote nodes in the Jiangellaceae branch that were also recovered using the maximum-likelihood method. Haloglycomyces albus was used as the outgroup.
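The NJ tree was built from pairwise distances under the Kimura 2-parameter (K2P) model. This model separates transitions (P, purine to purine or pyrimidine to pyrimidine) from transversions (Q) and corrects for multiple substitutions as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The following is a minimal sketch of that pairwise distance for two aligned sequences. It is not MEGA's implementation. Skipping gaps and ambiguous bases pairwise is an assumption, and the neighbor-joining clustering step itself is not shown.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption)
        sites += 1
        if a == b:
            continue
        same_class = (a in PURINES and b in PURINES) or (
            a in PYRIMIDINES and b in PYRIMIDINES
        )
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions
    q = transversions / sites  # proportion of transversions
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One A->G transition over 10 sites: d = -0.5 * ln(0.8)
print(round(kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```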

Genome sequencing and assembly
General aspects of library construction and sequencing can be found on the JGI website. The genome was assembled with ALLPATHS v. R37654 into a single scaffold comprising 9 contigs. The assembly has a total size of 5.5 Mbp and was built from a total of 4 Gb of sequence data (Fig. 3).
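The figures above give a rough sense of sequencing depth. Dividing the total sequenced bases by the genome length yields a nominal mean coverage of about 716×. The following is a minimal sketch of that arithmetic. The 4 Gb total is rounded in the text, and not every raw base necessarily went into the assembly. The result is therefore only an approximate nominal depth, not a coverage figure reported by JGI. The function name is hypothetical.

```python
def mean_coverage(total_bases: float, genome_length: int) -> float:
    """Nominal sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_bases / genome_length


# ~4 Gb of data (rounded, as stated in the text) over a 5,585,780 bp genome
depth = mean_coverage(4e9, 5_585_780)
print(f"{depth:.0f}x")  # prints 716x
```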

Genome annotation
Genes were identified with Prodigal [22] as part of the JGI genome annotation pipeline [23,24], followed by a round of manual curation using the JGI GenePRIMP pipeline [25]. The predicted CDSs were translated and analysed against the National Center for Biotechnology Information non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG and InterPro. rRNA genes were identified using HMMER 3.0 [26], and tRNA genes using tRNAscan-SE 1.23 [27]. INFERNAL 1.0.2 [28] was used to predict other non-coding RNA genes. Additional gene prediction analysis and functional annotation were carried out within the Integrated Microbial Genomes Expert Review platform [29]. CRISPR elements were detected with CRT [30] and PILER-CR [31]. General statistics are shown in Table 3.
Fig. 3 Graphical map of the J. gansuensis strain YIM 002T chromosome. The circular genome map was generated with the CGView Server [46]. From the outside to the center: genes on the forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), GC content and GC skew, where green indicates positive values and magenta indicates negative values.
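Prodigal scores candidate genes using features such as a GC frame plot and ribosome-binding-site motifs, which is far more than a simple open reading frame (ORF) scan. The following deliberately naive sketch illustrates only the underlying notion of an ORF: an ATG start followed in frame by a stop codon. It is not Prodigal's algorithm. It scans the forward strand only and ignores alternative start codons such as GTG and TTG, which are common in high-GC actinobacteria. The `min_codons` default is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    start is the 0-based index of the ATG; end is exclusive and includes
    the stop codon. Length (in codons, stop included) must reach
    min_codons. Nested in-frame ATGs are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # step through complete codons only (i + 3 <= len(seq))
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CATGAAATAG", min_codons=1))  # prints [(1, 1, 10)]
```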

Genome properties
The genome assembly of strain YIM 002T consists of one scaffold of 5,585,780 nucleotides with a GC content of 70.9% (Fig. 3; Table 3). Of a total of 5104 genes, 4905 are protein-coding genes, 149 are pseudogenes and 50 are RNA genes. A putative function was assigned to 48.86% of the genes, while the remaining protein-coding genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
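The gene counts above add up to the reported total if pseudogenes are counted separately from protein-coding genes, as the numbers imply (4905 + 149 + 50 = 5104). The following is a small sketch that checks this sum and expresses each category as a share of all genes. The function name is illustrative.

```python
def gene_summary(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total gene count and each category's share in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes counted")
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


# Counts reported in the text; pseudogenes are assumed to be a separate
# category from protein-coding genes, as the totals imply.
counts = {"protein-coding": 4905, "pseudogenes": 149, "RNA genes": 50}
total, shares = gene_summary(counts)
print(total)  # prints 5104
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")  # 96.10%, 2.92%, 0.98%
```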

Insights from the genome sequence
The YIM 002T genome has a high G + C content and is the smallest among the Jiangella genomes (Table 3). Both features may result from selection and mutation [32], which can involve several factors such as environment and aerobiosis [33]. In general, larger genomes tend to be associated with more complex habitats because they encode a greater metabolic and stress-tolerance potential [34]. According to genome data from NCBI, the genomes of the other three sequenced type strains of the genus, J. alkaliphila, J. alba and J. muralis, are all larger than 7 Mbp. This suggests that the compact, smaller genome of J. gansuensis may be an adaptation for reproductive efficiency or competitiveness [35]. Consistent with its halotolerance, solute and ion transporters were predicted in its genome. The genome also contains genes for nitrate and sulfonate transport systems, as well as genes for nitrite reductase and the nitrogen fixation protein NifU. The capacity of this microorganism to produce antibiotics has recently been demonstrated with the description of seven new compounds [11]: five pyrrol-2-aldehyde compounds (jiangrines A-E), one indolizine derivative (jiangrine F) and one glycolipid (jiangolide). However, its biosynthetic potential is probably greater, given the 45 biosynthetic clusters, comprising 497 genes, found with the JGI tool [36]. Most of these clusters were only putative in that analysis. A second approach was therefore used to assess the variety of biosynthetic types and to improve the manual annotation of secondary metabolite biosynthesis genes: the antiSMASH pipeline for secondary metabolite gene cluster identification, annotation and analysis [37,38].
This analysis identified 60 gene clusters, 20 of which had no similar known cluster (Additional file 2: Table S1). The results indicate the potential of J. gansuensis to produce pristinamycin, as well as other antibiotics. Pristinamycin is an antibiotic produced by Streptomyces pristinaespiralis that is effective against staphylococcal infections.
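The JGI-based estimate of 45 biosynthetic clusters comprising 497 genes can be put into proportion with simple arithmetic. It corresponds to roughly 11 genes per cluster, or about a tenth of the 4905 protein-coding genes. The following is a minimal sketch of that calculation. It assumes that all 497 genes are protein-coding and that no gene is shared between clusters; the text confirms neither. The function name is hypothetical.

```python
def cluster_gene_stats(n_clusters: int, cluster_genes: int,
                       protein_coding_genes: int) -> tuple[float, float]:
    """Mean genes per cluster and cluster genes as % of protein-coding genes.

    Assumes every cluster gene is protein-coding and none is counted in
    more than one cluster.
    """
    if n_clusters <= 0 or protein_coding_genes <= 0:
        raise ValueError("counts must be positive")
    mean_size = cluster_genes / n_clusters
    genome_share = 100.0 * cluster_genes / protein_coding_genes
    return mean_size, genome_share


mean_size, share = cluster_gene_stats(45, 497, 4905)
print(f"{mean_size:.1f} genes per cluster, {share:.1f}% of protein-coding genes")
# prints: 11.0 genes per cluster, 10.1% of protein-coding genes
```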