Draft genome sequence of Streptomyces sp. MWW064 for elucidating the rakicidin biosynthetic pathway

Streptomyces sp. MWW064 (=NBRC 110611) produces the antitumor cyclic depsipeptide rakicidin D. Here, we report the draft genome sequence of this strain, together with features of the organism and the generation, annotation, and analysis of the genome sequence. The 7.9 Mb genome of Streptomyces sp. MWW064 encodes 7,135 putative ORFs, of which 6,044 were assigned to COG categories. The genome harbors at least three type I polyketide synthase (PKS) gene clusters, seven nonribosomal peptide synthetase (NRPS) gene clusters, and four hybrid PKS/NRPS gene clusters, among which the hybrid PKS/NRPS gene cluster responsible for rakicidin synthesis was identified. We propose a biosynthetic pathway based on bioinformatic analysis and show experimentally that the pentadienoyl unit in rakicidins is derived from serine and malonate.


Introduction
Rakicidin D is an inhibitor of tumor cell invasion isolated from the culture broth of an actinomycete, strain MWW064 of the genus Streptomyces [1]. To date, five congeners have been reported: rakicidins A, B, and E from Micromonospora, and rakicidins C and D from Streptomyces [1][2][3][4]. Rakicidins share a 15-membered cyclic depsipeptide structure comprising three amino acids and a fatty acid modified with hydroxy and methyl substituents. The most intriguing part of the rakicidins is the unusual amino acid 4-amino-2,4-pentadienoate (APDA), which is present in only a limited range of actinomycete secondary metabolites, such as BE-43547 [5] and microtermolide [6,7]. Despite the scarcity of the APDA unit in nature, nothing is known about its biosynthesis. Putative biosynthetic genes for rakicidin D were recently reported [8], but the data are incomplete, no detailed information is given in that paper, and the DNA sequences have not been registered in public databases. Hence, the biosynthesis of rakicidins has remained unclear. In this study, we performed whole-genome shotgun sequencing of strain MWW064 to elucidate the biosynthetic mechanism of rakicidin D. We herein present the draft genome sequence of Streptomyces sp. MWW064, together with the taxonomic identification of the strain, a description of its genome properties, and annotation of the gene cluster for rakicidin synthesis. We propose a rakicidin biosynthetic mechanism predicted by bioinformatic analysis and supported by precursor-incorporation experiments.

Classification and features
In the course of screening for antitumor compounds from actinomycetes, Streptomyces sp. MWW064 was isolated from a marine sediment sample collected in Samut Sakhon province, Thailand, and was found to produce rakicidin D [1]. The general features of this strain are shown in Table 1. The strain grew well on ISP 2 and ISP 4 agars, whereas growth was poor on ISP 5 and ISP 7 agars. On ISP 2 agar, the aerial mycelia were white, the reverse side was pale red, and the diffusible pigments were dark orange. Strain MWW064 formed extensively branched substrate and aerial mycelia. The aerial mycelium formed flexuous spore chains at maturity. The spores were cylindrical with a smooth surface. A scanning electron micrograph of this strain is shown in Fig. 1. Growth occurred at 15-37°C (optimum 28°C) and pH 5-9 (optimum pH 7). Strain MWW064 grew in the presence of 0-3 % (w/v) NaCl (optimum 0 % NaCl) and utilized glucose and inositol for growth. The 16S rRNA gene sequence was obtained from the GenBank/EMBL/DDBJ databases (accession no. GU295447). A phylogenetic tree was reconstructed from the 16S rRNA gene sequences of the strain and taxonomically close Streptomyces type strains using ClustalX2 [9] and NJPlot [10]. The phylogenetic analysis confirmed that strain MWW064 belongs to the genus Streptomyces (Fig. 2).
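The distance step that precedes neighbor-joining can be illustrated with a minimal Python sketch. It assumes complete deletion of gap-containing columns (as stated in the Fig. 2 caption) followed by an uncorrected p-distance; the sequences and names are invented toy data, and ClustalX2 may apply a different distance correction, so this is not a reproduction of the published analysis.

```python
from typing import Dict


def drop_gap_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Remove every alignment column in which any sequence has a gap ('-')."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy alignment (hypothetical; not real 16S rRNA data).
toy_alignment = {
    "strain_A": "ACGT-A",
    "strain_B": "ACCTTA",
    "strain_C": "A-GTTA",
}

trimmed = drop_gap_columns(toy_alignment)
dist_ab = p_distance(trimmed["strain_A"], trimmed["strain_B"])
dist_ac = p_distance(trimmed["strain_A"], trimmed["strain_C"])
print(trimmed, dist_ab, dist_ac)
```

A matrix of such pairwise distances is the input to a neighbor-joining implementation; bootstrapping repeats the calculation on resampled alignment columns.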

Genome project history
In collaboration between Toyama Prefectural University and NBRC, the organism was selected for genome sequencing to elucidate the rakicidin biosynthetic pathway. We successfully accomplished the genome project of Streptomyces sp. MWW064 as reported in this paper. The draft genome sequences have been deposited in the INSDC database under the accession numbers BBUY01000001-BBUY01000099. The project information and its association with MIGS version 2.0 compliance are summarized in Table 2 [13].

Table 1 (fragment; classification rows only). Phylum Actinobacteria, TAS [25]; Class Actinobacteria, TAS [26]; Order Actinomycetales, TAS [26][27][28][29]; Suborder Streptomycineae, TAS [26,29]; Family Streptomycetaceae.

Fig. 2 (caption). The tree uses sequences aligned by ClustalX2 [9] and was constructed by the neighbor-joining method [35]. All positions containing gaps were eliminated. The building of the tree also involved a bootstrapping process repeated 1,000 times to generate a majority consensus tree, and only bootstrap values above 50 % are shown at branching points. Streptomyces albus NBRC 13014T was used as an outgroup.

Table 4 (footnote). The total is based on the total number of protein-coding genes in the genome.

Table 5 (fragment; surviving column headers: No. of modules, Backbone of predicted product). pks/nrps-1 (rak): scaffold 9; 6; 7; R-C3-C3-Ser-C2-Gly-X. pks/nrps-2: scaffold 5; 6; 14.

Genome sequencing and assembly
Shotgun and paired-end libraries were prepared and subsequently sequenced using 454 pyrosequencing technology and HiSeq1000 (Illumina) paired-end technology, respectively (Table 2). The 70 Mb of shotgun sequences and 739 Mb of paired-end sequences were assembled using Newbler v2.8 and subsequently finished using GenoFinisher [14] to yield 99 scaffolds larger than 500 bp.
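As a rough illustration of what these read volumes imply, the following Python sketch estimates fold coverage from the figures quoted in this report and applies the 500 bp scaffold-length cutoff to a toy list. It assumes that "Mb" means 10^6 bases and uses the genome size reported under Genome properties (7,870,697 bp); the scaffold lengths are invented for the example.

```python
GENOME_SIZE_BP = 7_870_697   # total genome size reported in this paper
SHOTGUN_BP = 70e6            # 454 shotgun reads, assuming 1 Mb = 1e6 bases
PAIRED_END_BP = 739e6        # Illumina paired-end reads, same assumption
MIN_SCAFFOLD_BP = 500        # scaffolds must be larger than this cutoff


def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


def keep_scaffolds(lengths, min_len=MIN_SCAFFOLD_BP):
    """Keep only scaffolds strictly larger than min_len."""
    return [n for n in lengths if n > min_len]


shotgun_depth = fold_coverage(SHOTGUN_BP, GENOME_SIZE_BP)
paired_depth = fold_coverage(PAIRED_END_BP, GENOME_SIZE_BP)
total_depth = shotgun_depth + paired_depth

# Hypothetical scaffold lengths, for illustration only.
toy_lengths = [120, 500, 501, 25_000]
kept = keep_scaffolds(toy_lengths)

print(f"454: {shotgun_depth:.1f}x, Illumina: {paired_depth:.1f}x, "
      f"total: {total_depth:.1f}x, kept scaffolds: {kept}")
```

Under these assumptions the combined depth is on the order of 100-fold, most of it from the Illumina paired-end library.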

Genome annotation
Coding sequences were predicted by Prodigal [15], and tRNA genes by tRNAscan-SE [16]. Gene functions were annotated using an in-house genome annotation pipeline, and PKS- and NRPS-related domains were searched using the SMART and PFAM domain databases. PKS and NRPS gene clusters and their domain organizations were determined as reported previously [17]. Substrates of adenylation (A) and acyltransferase (AT) domains were predicted using antiSMASH [18]. BLASTP searches against the NCBI nr database were also used to predict the functions of proteins encoded in the rakicidin biosynthetic gene cluster.

Genome properties
The total size of the genome is 7,870,697 bp and the GC content is 71.1 % (Table 3), similar to those of other genome-sequenced Streptomyces members. Of the total 7,206 genes, 7,135 are protein-coding genes and 71 are RNA genes. The classification of genes into COG functional categories is shown in Table 4. With regard to secondary metabolite pathways involving modular PKSs and NRPSs, Streptomyces sp. MWW064 has at least four hybrid PKS/NRPS gene clusters, three type I PKS gene clusters, and seven NRPS gene clusters. According to the assembly line mechanism [19], we predicted the chemical backbone that each cluster would synthesize (Table 5). These predictions suggest that Streptomyces sp. MWW064 has the potential to produce diverse polyketide and nonribosomal peptide compounds as secondary metabolites.
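The summary statistics above can be cross-checked with a few lines of Python. This sketch uses only numbers quoted in this report (the COG count comes from the abstract); the `gc_content` helper is demonstrated on an invented toy sequence, because the 71.1 % value is the reported figure and is not recomputed here.

```python
GENOME_SIZE_BP = 7_870_697
TOTAL_GENES = 7_206
PROTEIN_CODING = 7_135
RNA_GENES = 71
COG_ASSIGNED = 6_044  # ORFs assigned to COG categories (from the abstract)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# The gene counts should add up to the reported total.
counts_consistent = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Share of protein-coding genes with a COG assignment.
cog_fraction = COG_ASSIGNED / PROTEIN_CODING

# Gene density in genes per kilobase of genome.
genes_per_kb = TOTAL_GENES / (GENOME_SIZE_BP / 1000)

toy_gc = gc_content("GCGCAT")  # toy sequence, not from the genome

print(counts_consistent, f"{cog_fraction:.1%}", f"{genes_per_kb:.2f}", toy_gc)
```

With these inputs, roughly 85 % of the protein-coding genes carry a COG assignment, and the density is a little under one gene per kilobase.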

Insights from the genome sequence
Rakicidin biosynthetic pathway in Streptomyces sp. MWW064
The chemical structure of rakicidin D suggested that it is synthesized by a hybrid PKS/NRPS pathway. Among the four hybrid PKS/NRPS gene clusters present in Streptomyces sp. MWW064 (Table 5), pks/nrps-1 is most likely responsible for rakicidin synthesis because the carbon backbone of the predicted product (R-C3-C3-Ser-C2-Gly-X) is in good accordance with that of rakicidin D. Genes in pks/nrps-1 (Table 6) encode enzymes necessary for rakicidin biosynthesis (Fig. 3). This cluster contains three PKS genes (SSP35_09_01910, SSP35_09_01900, SSP35_09_01880) and three NRPS genes (SSP35_09_01890, SSP35_09_01870, SSP35_09_01860), corresponding to rakAB, rakC, rakEF, rakD, rakG, and rakH [8], respectively. Based on the collinearity rule of modular PKS/NRPS pathways, it is deduced that RakAB loads a starter molecule ('R' in Fig. 3), and that RakAB and RakC subsequently add a diketide chain to the starter by condensation of two methylmalonyl-CoA molecules, since the substrates of their AT domains are likely methylmalonyl-CoA ('ATm' in Fig. 3). The NRPS RakD and the remaining PKS RakEF are most likely involved in supplying APDA: the A domain of RakD has the signature amino acid residues for serine, and RakEF contains a set of domains (AT, KR, DH) for malonate incorporation, ketoreduction, and dehydration to provide a double bond between C9 and C10. In addition, the DH domain in RakEF is proposed to be responsible also for the dehydration of the primary hydroxy group of the incorporated serine molecule, for the following two reasons, although experimental evidence is required.
First, no dehydratase gene is present near the rakicidin cluster, whereas in the biosynthesis of dehydroalanine in bacterial peptides such as lantibiotics, a dehydratase catalyzes exo-methylene formation from serine [20,21]. Second, the order of the KR and DH domains in RakEF is unusual: among three hundred type I PKS genes for eighty actinomycete polyketides, the order of the two domains is exclusively DH-KR [22]. The only exception is seen in the PKS genes for enediynes, in which chain elongation is catalyzed iteratively, in a manner similar to type II PKSs [23]. The unusual KR-DH order may confer an undescribed function on the DH domain of RakEF. After formation of the APDA moiety, RakG is likely responsible for the condensation of glycine and the subsequent N-methylation, and RakH for asparagine condensation. Hydroxylation of asparagine would be catalyzed by the asparagine hydroxylase encoded by rakO, downstream of the cluster, to yield rakicidin D. On the basis of the above bioinformatic evidence, we propose the biosynthetic pathway of rakicidin D shown in Fig. 3.

Table 7 (footnote). 13C signal intensity of each peak in the labeled 1 divided by that of the corresponding signal in the unlabeled 1, normalized to give an enrichment ratio of 1 for the unenriched peak of C7. The numbers in bold type indicate 13C-enriched atoms from 13C-labeled precursors.
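The collinearity-based backbone prediction can be sketched in Python as a simple lookup over an ordered list of modules. The `Module` class, the substrate-to-unit table, and the split of the loading and first extension module within RakAB are illustrative assumptions rather than the annotation pipeline used in this study. The last module is left unassigned so that it prints as 'X', as in Table 5, although the text proposes asparagine for RakH.

```python
from dataclasses import dataclass
from typing import List, Optional

# Backbone unit contributed by each predicted substrate (illustrative mapping).
UNIT_FOR_SUBSTRATE = {
    "starter": "R",
    "methylmalonyl-CoA": "C3",
    "malonyl-CoA": "C2",
    "serine": "Ser",
    "glycine": "Gly",
}


@dataclass(frozen=True)
class Module:
    protein: str              # protein carrying the module
    kind: str                 # "loading", "PKS", or "NRPS"
    substrate: Optional[str]  # predicted AT/A-domain substrate, None if unknown


def predict_backbone(modules: List[Module]) -> str:
    """Join one backbone unit per module in assembly-line order ('X' if unknown)."""
    return "-".join(UNIT_FOR_SUBSTRATE.get(m.substrate, "X") for m in modules)


# Module order assumed from the pathway description in the text.
RAK_MODULES = [
    Module("RakAB", "loading", "starter"),
    Module("RakAB", "PKS", "methylmalonyl-CoA"),
    Module("RakC", "PKS", "methylmalonyl-CoA"),
    Module("RakD", "NRPS", "serine"),
    Module("RakEF", "PKS", "malonyl-CoA"),
    Module("RakG", "NRPS", "glycine"),
    Module("RakH", "NRPS", None),  # listed as 'X' in Table 5
]

backbone = predict_backbone(RAK_MODULES)
print(backbone)
```

Because each module contributes exactly one unit, the seven modules reported for pks/nrps-1 yield a seven-unit backbone string that can be compared directly with the structure of rakicidin D.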

Identification of biosynthetic precursors of the APDA moiety
To verify the predicted biosynthetic origin of the APDA unit, feeding experiments using 13C-labeled precursors were carried out. Inoculation, cultivation, extraction, and purification were performed in the same manner as previously reported [1]. Addition of sodium [2-13C]acetate or [1-13C]-L-serine (20 mg/100 ml medium/flask; 10 flasks for [2-13C]acetate, 3 flasks for [1-13C]-L-serine) was initiated 48 h after inoculation and repeated every 24 h, four times in total. After further incubation for 24 h, the whole culture broths were extracted with 1-butanol, and several purification steps yielded 2.5 mg and 1.7 mg of 13C-labeled rakicidin D, respectively. The 13C NMR data for the labeled rakicidin D samples are shown in Table 7. Feeding of sodium [2-13C]acetate gave enrichments at C9 of the APDA unit and at three carbons, C18, C20, and C22, in the aliphatic chain of the fatty acid moiety. Feeding of [1-13C]-L-serine enriched C10 of the APDA unit and the carbonyl carbon of Gly (C5). These results indicate that the APDA unit is derived from one acetate unit, consistent with incorporation via malonate, and one serine (Fig. 4). The labeling of C5 upon serine feeding can be explained by the interconversion between glycine and serine by transformylase in primary metabolism for amino acid supply.
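The enrichment ratio defined in the Table 7 footnote (labeled intensity divided by unlabeled intensity, normalized so that the unenriched C7 peak equals 1) can be written as a short Python function. The intensities below are invented for illustration and are not the measured values; the threshold used to flag enriched carbons is likewise an arbitrary assumption.

```python
from typing import Dict, List


def enrichment_ratios(labeled: Dict[str, float],
                      unlabeled: Dict[str, float],
                      reference: str = "C7") -> Dict[str, float]:
    """Per-carbon labeled/unlabeled intensity ratio, normalized to the reference carbon."""
    ref_ratio = labeled[reference] / unlabeled[reference]
    return {c: (labeled[c] / unlabeled[c]) / ref_ratio for c in labeled}


def enriched_carbons(ratios: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Carbons whose normalized ratio meets a user-chosen threshold (assumed value)."""
    return [c for c, r in ratios.items() if r >= threshold]


# Hypothetical peak intensities (arbitrary units), not data from Table 7.
unlabeled = {"C7": 10.0, "C9": 8.0, "C10": 12.0}
labeled = {"C7": 20.0, "C9": 64.0, "C10": 24.0}

ratios = enrichment_ratios(labeled, unlabeled)
hits = enriched_carbons(ratios)
print(ratios, hits)
```

Normalizing to an unenriched reference carbon cancels differences in sample amount and acquisition conditions between the labeled and unlabeled spectra, so only site-specific 13C incorporation raises a ratio above 1.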

Conclusions
The 7.9 Mb draft genome of Streptomyces sp. MWW064, a producer of rakicidin D isolated from marine sediment, has been deposited at GenBank/ENA/DDBJ under the accession number BBUY00000000. We identified the hybrid PKS/NRPS gene cluster for rakicidin synthesis and proposed a plausible biosynthetic pathway. Labeled-precursor incorporation experiments showed that the APDA moiety is synthesized from serine and malonate. These findings open up possibilities for genetic engineering to synthesize more potent rakicidin-based antitumor compounds and for discovering new bioactive compounds possessing APDA units.