Draft genome sequence of Micromonospora sp. DSW705 and distribution of biosynthetic gene clusters for depsipeptides bearing 4-amino-2,4-pentadienoate in actinomycetes

Here, we report the draft genome sequence of Micromonospora sp. DSW705 (=NBRC 110037), a producer of the antitumor cyclic depsipeptides rakicidins A and B, together with the features of this strain and the generation, annotation, and analysis of the genome sequence. The 6.8 Mb genome of Micromonospora sp. DSW705 encodes 6,219 putative ORFs, of which 4,846 are assigned to COG categories. The genome harbors at least three type I polyketide synthase (PKS) gene clusters, one nonribosomal peptide synthetase (NRPS) gene cluster, and three hybrid PKS/NRPS gene clusters. A hybrid PKS/NRPS gene cluster encoded in scaffold 2 is responsible for rakicidin synthesis. DNA database searches indicated that biosynthetic gene clusters for depsipeptides bearing 4-amino-2,4-pentadienoate are widely distributed among taxonomically diverse actinomycetes.


Introduction
In our screening of antitumor compounds from rare actinomycetes, Micromonospora sp. DSW705, isolated from deep seawater, was found to produce rakicidins A and B. Rakicidins are fifteen-membered cyclic depsipeptides comprising three amino acids and a modified fatty acid. Their most intriguing feature is the presence of an unusual amino acid, 4-amino-2,4-pentadienoate (APDA), in the cyclic structure; APDA occurs in only a limited range of actinomycete secondary metabolites [1][2][3]. To date, five rakicidin congeners have been reported: rakicidins A, B, and E were isolated from Micromonospora, and rakicidins C and D from Streptomyces [4][5][6][7]. Recently, we identified the biosynthetic gene (rak) cluster for rakicidin D through genome analysis of Streptomyces sp. MWW064 and proposed its biosynthetic pathway (Komaki H, Ishikawa A, Ichikawa N, Hosoyama A, Hamada M, Harunari E, Nihira T, Panbangred W, Igarashi Y. Draft genome sequence of Streptomyces sp. MWW064 for elucidating the rakicidin biosynthetic pathway. Stand Genomic Sci.). In this study, whole genome shotgun sequencing of Micromonospora sp. DSW705 was conducted to assess its secondary metabolic potential, to identify the biosynthetic genes for rakicidins A and B, and to compare them with the rakicidin D gene cluster of Streptomyces sp. MWW064. Here we report the draft genome sequence of Micromonospora sp. DSW705, together with the taxonomic identification of the strain, a description of its genome properties, and annotation of the rakicidin gene cluster. Furthermore, we investigated the distribution of rak-like clusters among other bacterial strains to evaluate how widely they occur in taxonomically diverse actinomycetes.

Classification and features
In the screening of antitumor compounds from rare actinomycetes, Micromonospora sp. DSW705 was isolated from deep seawater collected in Toyama Bay, Japan, and found to produce BU-4664 L and rakicidins A and B (unpublished). The general features of this strain are shown in Table 1. The strain grew well on ISP 2 and ISP 4 agars, grew poorly on ISP 7 agar, and did not grow on ISP 5 agar. No aerial mycelium was observed. On ISP 2 agar, the substrate mycelium was orange, turning dark brown on sporulation. No diffusible pigment was observed on ISP 2, ISP 3, ISP 4, ISP 5, ISP 6, or ISP 7 agar media. The strain bore single spores on short sporophores. The spores were spherical (0.7-0.8 μm in diameter) with a wrinkled surface. A scanning electron micrograph of the strain is shown in Fig. 1. Growth occurred at 20-45 °C (optimum 37 °C) and pH 5-8 (optimum pH 7). Strain DSW705 grew in the presence of 0-3 % (w/v) NaCl (optimum 0 % NaCl) and utilized arabinose, fructose, glucose, raffinose, sucrose, and xylose for growth. The strain was deposited in the NBRC culture collection under the registration number NBRC 110037. The 16S rRNA gene was amplified by PCR using two universal primers, 9F and 1541R. After purification of the PCR product with AMPure (Beckman Coulter), sequencing was carried out according to an established method [8]. A homology search of the sequence with EzTaxon-e [9] showed the highest similarity (99.66 %, 1448/1453) to Micromonospora chalcea DSM 43026T (X92594) as the closest type strain. A phylogenetic tree was reconstructed using ClustalX2 [10] and NJPlot [11] on the basis of the 16S rRNA gene sequence together with those of taxonomically close type strains showing over 98.5 % similarity. Evolutionary distances were calculated using Kimura's two-parameter model [12]. The tree has been deposited in TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S19405). In the phylogenetic tree, strain DSW705 and M. chalcea

Chemotaxonomic data
The isomer of diaminopimelic acid in the whole-cell hydrolysate was analyzed according to the method described by Hasegawa et al. [13]. Isoprenoid quinones and cellular fatty acids were analyzed as described previously [14]. The whole-cell hydrolysate of strain DSW705 contained meso-diaminopimelic acid as its diagnostic peptidoglycan diamino acid. The predominant menaquinone was identified as MK-10(H4); MK-9(H4), MK-10(H2), and MK-10(H6) were also detected as minor components. The major cellular fatty acids were found to be iso-C16:0, iso-C15:0, and anteiso-C17:0.
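The menaquinone shorthand above packs two numbers into each label: MK-n(Hx) denotes a menaquinone whose isoprenoid side chain has n isoprene units and to which x hydrogen atoms have been added, saturating x/2 double bonds. A minimal Python sketch of reading that notation; `parse_menaquinone` is a hypothetical helper written for illustration, not part of any chemotaxonomy package.

```python
import re

# Hypothetical helper: MK-n(Hx) -> n isoprene units, x added hydrogens.
# The "(Hx)" part is optional; a bare "MK-9" means a fully unsaturated chain.
MK_PATTERN = re.compile(r"^MK-(\d+)(?:\(H(\d+)\))?$")


def parse_menaquinone(label: str) -> dict:
    match = MK_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a menaquinone label: {label!r}")
    units = int(match.group(1))
    hydrogens = int(match.group(2) or 0)
    return {
        "isoprene_units": units,
        "added_hydrogens": hydrogens,
        # each saturated double bond accounts for two added hydrogens
        "saturated_double_bonds": hydrogens // 2,
    }


# Menaquinone profile reported for strain DSW705 (major component first).
for label in ["MK-10(H4)", "MK-9(H4)", "MK-10(H2)", "MK-10(H6)"]:
    print(label, parse_menaquinone(label))
```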

Genome sequencing information
Genome project history
As a collaboration between Toyama Prefectural University and NBRC, the organism was selected for genome sequencing to elucidate the rakicidin biosynthetic pathway. The draft genome sequences have been deposited in the INSDC databases under accession numbers BBVA01000001-BBVA01000024. The project information and its compliance with MIGS version 2.0 are summarized in Table 2 [15].
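The accession range above follows the INSDC whole-genome shotgun (WGS) layout, which for this format we assume to be a four-letter project prefix, a two-digit assembly version, and a six-digit record serial, with the project master record carrying version 00 and serial 000000. The sketch below (the function name is hypothetical) expands such a range; for BBVA01000001-BBVA01000024 it yields 24 records, matching the 24 scaffolds reported for the assembly.

```python
import re

# Assumed WGS layout: 4-letter prefix + 2-digit version + 6-digit serial.
WGS_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d{6})$")


def expand_wgs_range(first: str, last: str) -> list[str]:
    """List every accession from first to last, inclusive (hypothetical helper)."""
    a, b = WGS_ACCESSION.match(first), WGS_ACCESSION.match(last)
    if a is None or b is None:
        raise ValueError("unrecognized WGS accession")
    if a.group(1, 2) != b.group(1, 2):
        raise ValueError("range spans different projects or assembly versions")
    prefix = a.group(1) + a.group(2)
    start, end = int(a.group(3)), int(b.group(3))
    return [f"{prefix}{serial:06d}" for serial in range(start, end + 1)]


records = expand_wgs_range("BBVA01000001", "BBVA01000024")
print(len(records), records[0], records[-1])
# Master record for the project: prefix, version "00", serial "000000".
print(records[0][:4] + "00" + "000000")
```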

Growth conditions and genomic DNA preparation
Micromonospora sp. DSW705 was deposited in the NBRC culture collection under the registration number NBRC 110037.

Fig. 2 Phylogenetic tree of Micromonospora sp. DSW705 and phylogenetically close type strains showing over 98.5 % similarity to strain DSW705 based on 16S rRNA gene sequences. The accession numbers of the 16S rRNA genes are shown in parentheses. The tree was reconstructed by the neighbor-joining method [33] using sequences aligned with ClustalX2 [10]. All positions containing gaps were eliminated. Bootstrap analysis with 1,000 replicates was used to generate a majority-rule consensus tree, and only bootstrap values above 50 % are shown at branching points. Actinoplanes teichomyceticus NBRC 13999T was used as an outgroup. Bar, 0.005 Knuc substitutions per nucleotide position.
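The evolutionary distances behind Fig. 2 use Kimura's two-parameter (K2P) model [12]. For two aligned sequences with a proportion P of transition differences and Q of transversion differences, K2P estimates d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below implements that formula for one pair of sequences; it is an illustration under these assumptions, not the ClustalX2/NJPlot code used for the tree, and the function name is our own. The last line recomputes the 99.66 % identity quoted for M. chalcea DSM 43026T from 1448/1453.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        # Skip gaps and ambiguity codes; for a pair of sequences this equals
        # deleting every gap-containing column.
        if x not in BASES or y not in BASES:
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Percent identity to the closest type strain: 1448 identical of 1453 positions.
print(round(100 * 1448 / 1453, 2))  # 99.66
```

For a full multiple alignment, "all positions containing gaps were eliminated" means dropping a column if any sequence has a gap there, before the pairwise distances are computed.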

Genome sequencing and assembly
Shotgun and paired-end libraries were prepared and sequenced using 454 pyrosequencing technology and HiSeq1000 (Illumina) paired-end technology, respectively (Table 2). The 36 Mb of shotgun sequences and 682 Mb of paired-end sequences were assembled using Newbler v2.6 and subsequently finished using GenoFinisher [16] to yield 24 scaffolds larger than 500 bp. The N50 was 629,027 bp.
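N50, the assembly statistic quoted above, is the length L such that sequences of length L or longer together cover at least half of the total assembly length. A minimal Python sketch of that definition, assuming (hypothetically) that the 500 bp scaffold cutoff is applied before the calculation; the individual scaffold lengths are not listed here, so the example uses toy values and does not reproduce 629,027 bp.

```python
def n50(lengths: list[int], min_length: int = 0) -> int:
    """Return the N50 of sequences longer than min_length.

    Sequences are sorted longest first and summed until the running
    total reaches half of the filtered assembly length.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences pass the length filter")
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n
    raise AssertionError("unreachable: the running sum reaches the total")


# Toy example (not the DSW705 scaffolds): total 20; 6 + 5 = 11 >= 10.
print(n50([2, 3, 4, 5, 6]))  # 5
```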

Genome annotation
Coding sequences were predicted by Prodigal [17], and tRNA genes by tRNAscan-SE [18]. Gene functions were annotated with an in-house genome annotation pipeline, and domains related to polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) were searched for using the SMART and Pfam domain databases. PKS and NRPS gene clusters and their domain organizations were determined as reported previously [8] and with antiSMASH [19]. Substrates of adenylation (A) and acyltransferase (AT) domains were predicted using antiSMASH. BLASTP searches against the NCBI nr database were also used to predict the functions of proteins encoded in the rak cluster.

Table 4 note: The total is based on the total number of protein-coding genes in the genome.

Table 5 legend: R, starter molecule; C3, C3 unit derived from methylmalonyl-CoA; C2, C2 unit derived from malonyl-CoA; X, amino acid not predicted; ?, lack of an A domain in the NRPS module; C4, C4 unit derived from ethylmalonyl-CoA or methoxymalonyl-CoA; C?, substrate of the AT domain not predicted. a Although antiSMASH predicted that the AT domain incorporates malonyl-CoA as the substrate, the signature sequence for substrate determination is not HAFHS, as expected for malonyl-CoA, but TSSHS, which likely specifies methylmalonyl-CoA [32].
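The AT-domain footnote above overrides an antiSMASH substrate call on the basis of a short signature sequence. The Python sketch below encodes only the two signatures named in that footnote (HAFHS for malonyl-CoA; TSSHS, likely methylmalonyl-CoA) and flags disagreement with an antiSMASH prediction. The lookup table and function names are hypothetical, and real AT specificity prediction draws on more residues than this, so treat it as an illustration of the reasoning rather than a predictor.

```python
from typing import Optional, Tuple

# Hypothetical lookup built only from the two signatures cited in the footnote.
AT_SIGNATURES = {
    "HAFHS": ("malonyl-CoA", "expected"),
    "TSSHS": ("methylmalonyl-CoA", "likely"),
}


def predict_at_substrate(signature: str) -> Optional[Tuple[str, str]]:
    """Return (substrate, confidence) for a known signature, else None."""
    return AT_SIGNATURES.get(signature.strip().upper())


def conflicts_with(antismash_call: str, signature: str) -> bool:
    """True when the signature gives a call that differs from antiSMASH's."""
    motif_call = predict_at_substrate(signature)
    return motif_call is not None and motif_call[0] != antismash_call


# The situation described in footnote a: antiSMASH says malonyl-CoA,
# but the signature is TSSHS.
print(conflicts_with("malonyl-CoA", "TSSHS"))  # True
```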

Genome properties
The total size of the genome is 6,795,311 bp, and the GC content is 72.9 % (Table 3), similar to those of other genome-sequenced Micromonospora members. Of the total 6,273 genes, 6,219 are protein-coding genes and 54 are RNA genes. The classification of genes into COG functional categories is shown in Table 4. As for secondary metabolite pathways involving modular PKSs and NRPSs, the genome harbors at least three type I PKS gene clusters, one NRPS gene cluster, and three hybrid PKS/NRPS gene clusters. On the basis of their domain organizations [20], we predicted the chemical structures that each cluster would synthesize (Table 5), suggesting the potential of Micromonospora sp. DSW705 to produce diverse polyketide and nonribosomal peptide compounds as secondary metabolites.
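The genome statistics above reduce to simple ratios. A minimal Python sketch, assuming GC content is computed over unambiguous bases only (how ambiguity codes were handled is not stated here); it also checks the gene total and the share of protein-coding genes assigned to COG categories (4,846 of 6,219, per the abstract), using protein-coding genes as the denominator as the table footnote above states. The variable and function names are our own.

```python
def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


protein_coding, rna_genes = 6219, 54
cog_assigned = 4846

total_genes = protein_coding + rna_genes
print(total_genes)  # 6273
print(round(100 * cog_assigned / protein_coding, 1))  # 77.9
```

Applied to the concatenated scaffold sequences, `gc_content` would give the genome-wide GC percentage; it is shown here without the DSW705 sequence itself.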

Insights from the genome sequence
Rakicidin biosynthetic gene cluster in Micromonospora sp. DSW705
Our previous study revealed that rakicidin is synthesized by a hybrid PKS/NRPS gene cluster, whose domain organization is shown in Fig. 3a (SIGS-D-16-00018.2). Among the three hybrid PKS/NRPS gene clusters in the Micromonospora sp. DSW705 genome (Table 5), the cluster encoded in scaffold 2 showed the same domain organization as the rak cluster of Streptomyces sp. MWW064 (Fig. 3b). Because this gene cluster encodes all the enzymes necessary for assembling the rakicidin core structure, it was identified as the rak cluster (Table 6). The gene organizations of the clusters for rakicidin D in Streptomyces sp. MWW064 (Fig. 3a) and for rakicidins A and B in Micromonospora sp. DSW705 (Fig. 3b) are essentially identical. The proposed biosynthetic pathway for rakicidins in Micromonospora sp. DSW705 is illustrated in Fig. 3b.

Biosynthetic gene clusters for rakicidins and the related compounds in other strains
Because the BLAST analysis shown in Table 6 suggests that other Micromonospora strains, such as M. purpureochromogenes and Micromonospora sp. M42, may possess rak clusters, we searched for hybrid PKS/NRPS gene clusters similar to rak clusters among bacterial strains whose genome sequences and ORF information are available in the GenBank database. BLAST searches were carried out using the RakEF sequences of Micromonospora sp. DSW705 and Streptomyces sp. MWW064 as queries, and each gene cluster encoding RakEF orthologues was then analyzed with antiSMASH [19], and manually where necessary. As shown in Fig. 4, three Micromonospora, 19 Streptomyces, three Frankia, one Nocardiopsis, one Salinispora, and two Kitasatospora strains were found to possess hybrid PKS/NRPS gene clusters encoding RakEF orthologues. On the basis of their domain organizations and the amino acid substrates of their A domains, these gene clusters can be classified into four groups (Fig. 4). M. purpureochromogenes NRRL B-2672 harbors a rak cluster identical to those of Micromonospora sp. DSW705 and Streptomyces sp. MWW064. Micromonospora sp. M42 also possesses almost the same cluster, but the methyltransferase (MT) domain in module 5 (m5) is absent and some ORFs are fragmented (Fig. 4a).
Eighteen gene clusters, grouped in Fig. 4b, have domain organizations similar to those of rak clusters, but the substrate of the A domain in m6 was predicted to be L-valine. Because vinylamycin and microtermolide contain a valine residue in their depsipeptide structures [1,2], the four gene clusters of "Streptomyces rubellomurinus" ATCC 31215 and three Frankia strains are proposed to be responsible for vinylamycin biosynthesis. A plausible biosynthetic pathway for vinylamycin is illustrated in Fig. 5a. If the loading module (LM) incorporates a C3 unit, or encodes an AT domain for a C3 starter instead of the CoA-ligase domain, the cluster is likely responsible for microtermolide biosynthesis. The remaining 14 strains in Fig. 4b lack a KR domain in m2. In the clusters of eight of the 18 strains, the NRPSs for m5 and m6 are encoded on the complementary strand, although the cluster of Streptomyces durhamensis NRRL ISP-5539T was not completely sequenced. Streptomyces sp. 769 lacks the PKS for the LM and m1. In the cluster of Streptomyces sp. MspMP-M5, the PKS likely responsible for the LM and m1 is encoded downstream of the PKS gene for m4, although this gene cluster was not completely sequenced. The cluster of Nocardiopsis sp. CNS639 likely lacks an LM, and some of its domains differ from those of the other strains. The gene clusters of Salinispora arenicola CNR107 and Micromonospora sp. RV43 contain three NRPS modules, at m3, m4, and m6, which were predicted to incorporate glycine, serine, and glycine, respectively. BE-43547 is the only known depsipeptide containing two glycines and an APDA moiety. On the basis of their domain organizations, these two clusters are proposed to be involved in BE-43547 production, as illustrated in Fig. 5b. Figure 4d shows gene clusters in which the last NRPS module incorporates amino acids different from those of the other three groups described above. The five gene clusters shown in green were predicted to incorporate L-tyrosine into the polyketide/nonribosomal peptide chains via m6. Since no depsipeptides bearing both tyrosine and APDA residues are known, the products of these clusters may be structurally novel. The two gene clusters of Streptomyces celluloflavus NRRL B-2493T and Streptomyces albus subsp. albus NRRL B-2513 show the same domain organization as rak clusters, but NRPS substrate prediction suggests incorporation of L-glutamate and L-tryptophan/β-hydroxytyrosine (bht), respectively, by m6. Because rakicidin analogues containing these amino acids in place of the asparagine residue have not been reported, these strains may produce novel APDA-containing peptides.

Fig. 6 (caption fragment): Fig. 4a, white; Fig. 4b, blue; Fig. 4c, yellow; Fig. 4d, green and red. Strains whose 16S rRNA gene sequences are not registered or not nearly complete are excluded from this analysis.
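The grouping described above rests partly on the predicted A-domain substrates. As a rough sketch, the substrate rules stated in the text can be written as a classifier: glycine, serine, and glycine at m3, m4, and m6 for BE-43547-type clusters; asparagine at m6 for rak clusters; valine at m6 for vinylamycin/microtermolide-type clusters; and any other m6 substrate for the remaining group. The `Cluster` type and `classify` function are hypothetical, residues use three-letter codes, the example substrate calls are taken from the text, and the sketch ignores the domain-organization criteria (missing MT, KR, or LM domains) that the actual analysis also used.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    strain: str
    # Predicted A-domain substrate per NRPS module, keyed by module name.
    a_domains: dict[str, str] = field(default_factory=dict)


def classify(cluster: Cluster) -> str:
    """Assign a cluster to a product type from its A-domain substrate calls."""
    a = cluster.a_domains
    m6 = a.get("m6")
    if (a.get("m3"), a.get("m4"), m6) == ("Gly", "Ser", "Gly"):
        return "BE-43547-type"
    if m6 == "Asn":
        return "rakicidin-type"
    if m6 == "Val":
        return "vinylamycin/microtermolide-type"
    if m6 is None:
        return "unassigned (no m6 A-domain call)"
    return "other m6 substrate (possibly novel)"


examples = [
    Cluster("Micromonospora sp. DSW705", {"m6": "Asn"}),
    Cluster('"Streptomyces rubellomurinus" ATCC 31215', {"m6": "Val"}),
    Cluster("Salinispora arenicola CNR107", {"m3": "Gly", "m4": "Ser", "m6": "Gly"}),
    Cluster("Streptomyces celluloflavus NRRL B-2493T", {"m6": "Glu"}),
]
for c in examples:
    print(f"{c.strain}: {classify(c)}")
```

The BE-43547 rule is checked first because it constrains three modules, so a cluster matching it is never misread by the single-module m6 rules.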

Distribution of the gene clusters among genome-sequenced strains
Whole genome sequencing has been performed for a large number of actinomycete strains. At present, genome sequences of over 227 Streptomyces species, eight species and six strains of Kitasatospora, eight species and seven strains of Micromonospora, three Salinispora species, one species and 97 strains of Frankia, and 18 species and six strains of Nocardiopsis are available in the GenBank database. Among them, 29 strains possess rak-like gene clusters. To investigate the correlation between evolution and the distribution of secondary metabolite genes, strains harboring rak-like gene clusters (shaded in black) were mapped onto a phylogenetic tree of genome-sequenced strains based on 16S rRNA gene sequences (Fig. 6). Micromonospora strains are divided into two clades, one of which includes three rakicidin producers and one BE-43547 producer. Outside Micromonospora, strain MWW064 is the only Streptomyces strain that possesses the rak cluster. In contrast, vinylamycin-related gene clusters, shown in blue, are distributed among taxonomically diverse Streptomyces strains. Notably, two Frankia strains have the same gene cluster, although only four compounds have been described from Frankia species [21]; this genus merits further examination for secondary metabolite production. In this analysis, BE-43547 gene clusters are present only in two strains of two genera belonging to the family Micromonosporaceae. However, because this compound was originally found in Streptomyces [3], the gene cluster is presumably also present in the genus Streptomyces. Gene clusters for depsipeptides containing a tyrosine residue are found only in the genus Kitasatospora and phylogenetically close Streptomyces members. The S. celluloflavus NRRL B-2493T gene cluster shows a domain organization similar to those of the rak clusters described above, but this strain is not taxonomically close to the rakicidin producers.
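As a quick consistency check, the per-genus counts from the BLAST survey of RakEF orthologues (three Micromonospora, 19 Streptomyces, three Frankia, one Nocardiopsis, one Salinispora, and two Kitasatospora strains) sum to the 29 strains stated above. A minimal Python tally of those counts; the dictionary is simply a transcription of the numbers in the text.

```python
from collections import Counter

# Strains with RakEF-orthologue clusters per genus, as listed for Fig. 4.
survey = Counter({
    "Micromonospora": 3,
    "Streptomyces": 19,
    "Frankia": 3,
    "Nocardiopsis": 1,
    "Salinispora": 1,
    "Kitasatospora": 2,
})

total = sum(survey.values())
print(total)  # 29
for genus, n in survey.most_common():
    print(f"{genus}: {n} ({100 * n / total:.1f} %)")
```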

Conclusions
The 6.8 Mb draft genome of Micromonospora sp. DSW705, a producer of rakicidins A and B isolated from deep seawater, has been deposited at GenBank/ENA/DDBJ under the accession number BBVA00000000. This strain harbors seven PKS and NRPS gene clusters, among which the rakicidin biosynthetic gene cluster was identified. Gene clusters for the synthesis of rakicidins or related compounds are present in taxonomically diverse actinomycete strains belonging to Micromonospora, Salinispora, Frankia, Nocardiopsis, Kitasatospora, and Streptomyces. These findings provide useful information for discovering new and diverse depsipeptides bearing the APDA unit, should advance understanding of the relationship between taxonomy and the distribution of secondary metabolite genes, and may provide insight into the evolution of secondary metabolite genes.