Genome sequencing and analysis of Ralstonia solanacearum phylotype I strains FJAT-91, FJAT-452 and FJAT-462 isolated from tomato, eggplant, and chili pepper in China

Ralstonia solanacearum is an extremely destructive pathogen able to cause disease in a wide range of host plants. Here we report the draft genome sequences of the strains FJAT-91, FJAT-452 and FJAT-462, isolated from tomato, eggplant, and chili pepper, respectively, in China. In addition to the genome annotation, we performed a search for type-III secreted effectors in these strains, providing a detailed annotation of their presence and distinctive features compared to the effector repertoire of the reference phylotype I strain (GMI1000). In this analysis, we found that each strain has a unique effector repertoire, encoding both strain-specific effector variants and variations shared among all three strains. Our study, based on strains isolated from different hosts within the same geographical location, provides insight into effector repertoires sufficient to cause disease in different hosts, and may contribute to the identification of host specificity determinants for R. solanacearum.


Introduction
Ralstonia solanacearum is often considered one of the most destructive bacterial pathogens, causing bacterial wilt disease in more than 250 plant species worldwide [1]. The pathogenicity of R. solanacearum relies heavily on the injection of proteins into plant cells through a type-III secretion system (T3SS). The versatility of R. solanacearum strains correlates with the larger number of T3SS substrates, called type-III effectors (T3Es), encoded in their genomes in comparison to other bacterial pathogens [2]. T3Es are important virulence factors required by most Gram-negative pathogens to manipulate plant cells and cause disease [3,4]. Bacteria from a single R. solanacearum strain can inject more than 70 T3Es (termed Rips, for Ralstonia injected proteins) into plant cells [2,5]. Studies conducted in Pseudomonas syringae and Xanthomonas axonopodis strains indicate that T3E repertoires are highly variable among strains of these species, and led to the hypothesis that T3E composition may shape the host range of bacterial pathogens [6,7]. Although the genome sequences and T3E repertoires have been defined for several R. solanacearum strains, repertoire comparisons have so far failed to identify host specificity determinants [2], which may suggest that genome sequences from additional strains infecting different hosts are required for this analysis. Additionally, the diversity in the geographical origins of sequenced strains hinders this comparative analysis, since environmental factors such as temperature, light, and humidity may have a significant impact on the effectors required for a successful infection. In this project, we sequenced and annotated the genomes of the R. solanacearum strains FJAT-91, FJAT-452 and FJAT-462, isolated from tomato (Solanum lycopersicum), eggplant (Solanum melongena), and chili pepper (Capsicum annuum), respectively, in the Fujian province (China) [8].
In addition, we performed a search for T3Es in these strains, providing a detailed annotation of their presence and distinctive features compared to the effector repertoire of the reference phylotype I strain (GMI1000). To our knowledge, this is the first report of genome sequences combined with T3E repertoire analysis performed in strains isolated from different hosts with the same geographical origin.

Classification and features
Ralstonia solanacearum belongs to the order Burkholderiales of the class Betaproteobacteria. It is an aerobic, Gram-negative bacterium, naturally present in soil, water, infected plants or plant debris. It has a worldwide distribution, with higher incidence in tropical and subtropical regions, but is also present in temperate areas [9]. R. solanacearum is the causal agent of bacterial wilt disease in multiple host plants, characterized by a sudden wilt of the whole plant. The strains sequenced in this study, FJAT-91, FJAT-452 and FJAT-462, were isolated from naturally infected tomato (Solanum lycopersicum), eggplant (Solanum melongena), and chili pepper (Capsicum annuum) plants, respectively, in the Fujian province (China). Plants showing typical wilting symptoms were collected and surface-sterilized, and the tissue was homogenized with sterile water before plating serial dilutions to determine the causal agent [8,9]. Sequence analysis determined that they belong to the R. solanacearum species complex [8]. The pathogenicity of FJAT-91 has been confirmed, and this strain has been used as a positive control for pathogenicity assays in tomato plants in previous studies [10]. All three isolated strains displayed the typical physiological features of strains from the R. solanacearum species complex, showing aerobic growth in laboratory conditions, and were able to form 3-4 mm colonies within 2 days at 28°C when grown on a rich laboratory medium containing tetrazolium chloride and high glucose content. For all three strains, colonies were irregular in shape and mucoid, with a pink area in the middle of the colony and a large white edge (Fig. 1). Sequence analysis of the PCR-amplified fliC, hrpB and pehA genes indicated that these strains belong to phylotype I (represented by the reference strain GMI1000; Fig. 2), mostly formed by Asian strains [2]. The classification and general features of the three strains are summarized in Tables 1, 2

Genome project history
This sequencing project was started in 2015; assembly and annotation were performed in 2016. Assembled draft genome sequences for the strains FJAT-91, FJAT-452 and FJAT-462 have been deposited in GenBank (Table 4). Raw genomic reads have been deposited in the Sequence Read Archive with accession numbers SRP091690, SRR4431158, SRR4431159, SRR4428740.
Fig. 1 Images of the Ralstonia solanacearum strains using a stereo microscope to visualize colony morphology on solid medium. The strains were grown on rich medium at 28°C for 2 days. Scale bars (1 mm) and the size of representative colonies are provided for reference.

Growth conditions and genomic DNA preparation
R. solanacearum strains were grown in rich medium (10 g/l bactopeptone, 1 g/l yeast extract and 1 g/l casamino acids). Genomic DNA was extracted from bacterial cultures grown to stationary phase for 18 h at 28°C with shaking at 220 rpm (OD600 = 1) using the Blood & Cell Culture DNA Mini kit (Qiagen), following the manufacturer's instructions for Gram-negative bacteria. DNA concentration and quality were measured using a Qubit 2.0 Fluorometer (Invitrogen).

Genome sequencing and assembly
For each genome, we prepared a paired-end library with an average insert size of 470 bp and sequenced the library for 250 bp from both ends using Illumina HiSeq 2500. The number of raw read bases was greater than 300 million (>50x genome coverage) for each sequenced strain. The raw sequencing data were first preprocessed to remove adapter sequences, low-quality regions, and short sequences (less than 20 nucleotides) with Cutadapt [11] and SolexaQA [12]. The remaining clean reads were de novo assembled into contigs and scaffolds by using SOAPdenovo2 and GapCloser v1.12 [13]. Contigs and scaffolds were further assembled into chromosome, plasmid and scaffolds with CONTIGuator, using the GMI1000 genome as the reference. The resulting FJAT-91, FJAT-452 and FJAT-462 genomes are 4,620,128 bp, 5,334,434 bp and 5,083,617 bp, respectively (Table 5), close to the genome length of the R. solanacearum reference strain GMI1000 (5,810,922 bp) [14].
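The coverage and genome-size figures quoted above can be checked with simple arithmetic. The sketch below is only an illustration: it treats the reported 300 million raw bases as a lower bound and uses the GMI1000 genome length as the denominator, which is an assumption made here for the estimate rather than a statement of how the coverage was originally computed. The function and variable names are hypothetical.

```python
# Back-of-the-envelope depth and size checks using figures quoted in the text.
# Assumption: the GMI1000 genome length is used as the denominator throughout.


def fold_coverage(total_bases: int, genome_length: int) -> float:
    """Average fold coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


RAW_BASES_LOWER_BOUND = 300_000_000  # "greater than 300 million" raw read bases
GMI1000_LENGTH = 5_810_922           # reference genome length (bp)

ASSEMBLY_LENGTHS = {                 # draft assembly sizes (bp)
    "FJAT-91": 4_620_128,
    "FJAT-452": 5_334_434,
    "FJAT-462": 5_083_617,
}

# Lower bound on depth if the genome were as large as the reference.
min_coverage_vs_reference = fold_coverage(RAW_BASES_LOWER_BOUND, GMI1000_LENGTH)

# Size of each draft assembly as a fraction of the reference genome.
assembly_fraction = {
    strain: length / GMI1000_LENGTH for strain, length in ASSEMBLY_LENGTHS.items()
}

if __name__ == "__main__":
    print(f"minimum depth vs GMI1000: {min_coverage_vs_reference:.1f}x")
    for strain, fraction in assembly_fraction.items():
        print(f"{strain}: {fraction:.1%} of the GMI1000 genome length")
```

Because the draft assemblies are shorter than the reference, the depth relative to each assembly is at least as high as this lower bound.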

Genome annotation
Genome annotation was performed using Prokka (v1.11) [15] with the option for non-coding RNA (ncRNA) search. The COG database [16] and Pfam v30.0 [17] were used for functional annotation of genes. T3Es in the three newly sequenced strains were identified and annotated in two steps: first, 52, 62 and 60 of the T3Es from the R. solanacearum species complex [2] were identified in FJAT-91, FJAT-452 and FJAT-462, respectively, based on Prokka annotations; second, known T3E protein sequences [2] were used as queries to search the assembled genome sequences of the three strains using BLAST [18] with a stringent significance cutoff of e-value < 1e-30, identity > 60%, and coverage on the query T3E protein sequence being over 50% or at least 100 aa in length. As a result, 72, 78 and 75 T3Es were identified in FJAT-91, FJAT-452 and FJAT-462, respectively. These two sets of T3E genes were merged together to generate the final lists of T3E genes in the three genomes. To identify the sequence variations within T3E genes between the three strains and the reference strain, the clean reads from the three newly sequenced strains were mapped to the reference genome GMI1000 using BWA (v0.7.12) [19]. SNPs and INDELs were identified using Samtools (v0.1.19) [20] and vcftools (v0.1.12) [21] and were further annotated using SnpEff (v4.0) [22].
Fig. 2 Phylogenetic tree of the strains sequenced in this study, relative to other sequenced strains from the same species. The phylogenetic tree was constructed using concatenated alignments of the marker genes fliC, hrpB and pehA. The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model [27]. Evolutionary analyses were conducted in MEGA7 [28]. GenBank accession numbers are displayed within brackets. Ralstonia pickettii 12J was used as an outgroup.
Table note: evidence codes (for example, for statements not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence) are from the Gene Ontology project [37].
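The two-step T3E identification described in this section (annotation-based calls, followed by a BLAST search with fixed cutoffs and a merge of both sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the `BlastHit` record, the function names and the example hit values are hypothetical, and the coverage rule is read here as "more than 50% of the query covered, or at least 100 aligned amino acids".

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class BlastHit:
    """One hit of a known T3E protein (query) against an assembled genome."""
    query: str        # name of the query T3E
    evalue: float     # BLAST e-value
    identity: float   # percent identity of the alignment
    aligned_aa: int   # aligned length on the query, in amino acids
    query_len: int    # full length of the query protein, in amino acids


def passes_cutoffs(hit: BlastHit) -> bool:
    """Apply the cutoffs quoted in the text (interpretation is an assumption)."""
    covered = (hit.aligned_aa / hit.query_len > 0.5) or (hit.aligned_aa >= 100)
    return hit.evalue < 1e-30 and hit.identity > 60.0 and covered


def merge_t3e_sets(prokka_t3es: Iterable[str], blast_hits: Iterable[BlastHit]) -> Set[str]:
    """Union of annotation-based T3Es and T3Es supported by a passing BLAST hit."""
    blast_t3es = {hit.query for hit in blast_hits if passes_cutoffs(hit)}
    return set(prokka_t3es) | blast_t3es


# Hypothetical example values, chosen only to exercise each cutoff.
example_hits = [
    BlastHit("ripAL", 1e-80, 95.0, 300, 350),  # passes every cutoff
    BlastHit("ripF2", 1e-40, 70.0, 120, 600),  # low coverage, but >= 100 aa aligned
    BlastHit("ripX", 1e-10, 90.0, 200, 250),   # fails the e-value cutoff
    BlastHit("ripM", 1e-50, 55.0, 300, 320),   # fails the identity cutoff
]
example_prokka = {"ripA1", "ripAL"}
final_t3es = merge_t3e_sets(example_prokka, example_hits)
```

Taking the union rather than the intersection mirrors the statement that the two sets were merged, so an effector missed by one step can still be recovered by the other.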

Genome properties
The genome of R. solanacearum strain FJAT-91 has 329 scaffolds and the average GC content of the genome is 60.6% (Table 5). A total of 6,522 genes (6,457 CDSs and 65 ncRNAs) were predicted. Of the protein-coding genes, 2,544 (39.4%) had functions assigned while 3,913 were considered hypothetical (

Insights from the genome sequence
Comparative analysis of virulence-related genes
T3E proteins are essential virulence factors in most Gram-negative bacterial pathogens, such as R. solanacearum [2,5], although they can also be perceived by resistant hosts as invasion signals, leading to the development of plant defense responses [23]. The expression of genes encoding T3Es and structural components of the T3SS is activated after the perception of plant signals, and coordinated by a well-studied signaling pathway [24]. We analyzed the presence of genes involved in plant sensing and virulence regulation in the newly sequenced strains, and found that all the major regulators are present in the three strains (Table 7). These genes displayed a high percentage of similarity when compared to their homologs in the GMI1000 reference strain, ranging from 98.97 to 100% at the DNA level and from 99.19 to 100% at the amino acid level (Table 7). The composition of T3E repertoires has been proposed to shape the host range of specific strains [6,7]. In this regard, we identified over 70 T3Es in each strain based on comparisons with effector sequences in public databases (Table 8). Comparisons with the reference GMI1000 strain suggest that the FJAT-91 strain lacks the T3E genes ripAG, ripS4, ripM, ripP3, hyp16, ripAI and ripY; the FJAT-452 strain lacks the T3E genes ripP3, hyp16 and ripM; and the FJAT-462 strain lacks the T3E genes ripAI, ripS4, ripP3, hyp16, ripM and ripAM. On the other hand, several T3E genes that are not present in GMI1000 were found in the three newly sequenced strains, including ripBE (in FJAT-462), ripS7 (in FJAT-452), hyp7 (in FJAT-452 and FJAT-462), and ripAL and ripF2 (in all three strains).
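The presence/absence comparison above reduces to set operations on gene names. The sketch below encodes only the genes named in the text (it is not the full effector repertoire of any strain) and shows how shared and strain-specific differences relative to GMI1000 can be derived; the helper names are hypothetical.

```python
from typing import Dict, Set

# T3E genes present in GMI1000 but reported as lacking in each strain.
LACKING: Dict[str, Set[str]] = {
    "FJAT-91": {"ripAG", "ripS4", "ripM", "ripP3", "hyp16", "ripAI", "ripY"},
    "FJAT-452": {"ripP3", "hyp16", "ripM"},
    "FJAT-462": {"ripAI", "ripS4", "ripP3", "hyp16", "ripM", "ripAM"},
}

# T3E genes absent from GMI1000 but reported in each strain.
GAINED: Dict[str, Set[str]] = {
    "FJAT-91": {"ripAL", "ripF2"},
    "FJAT-452": {"ripS7", "hyp7", "ripAL", "ripF2"},
    "FJAT-462": {"ripBE", "hyp7", "ripAL", "ripF2"},
}


def shared_by_all(per_strain: Dict[str, Set[str]]) -> Set[str]:
    """Genes listed for every strain."""
    return set.intersection(*per_strain.values())


def unique_to(strain: str, per_strain: Dict[str, Set[str]]) -> Set[str]:
    """Genes listed for one strain and for no other strain."""
    others = set().union(
        *(genes for name, genes in per_strain.items() if name != strain)
    )
    return per_strain[strain] - others


if __name__ == "__main__":
    print("lacking in all three strains:", sorted(shared_by_all(LACKING)))
    print("gained in all three strains:", sorted(shared_by_all(GAINED)))
    for strain in LACKING:
        print(strain, "only lacks:", sorted(unique_to(strain, LACKING)))
        print(strain, "only gained:", sorted(unique_to(strain, GAINED)))
```

Separating shared from strain-specific differences is useful here because differences common to all three strains cannot by themselves explain differences in host, whereas strain-specific ones are candidates for further study.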
The presence of most new T3E genes was confirmed by sequence analysis of PCR-amplified fragments from the three strains; the sequences were 100% identical among the strains and similar or identical (78.84-100%) to their closest orthologs from other sequenced strains (Fig. 3). However, the hyp7 gene from FJAT-462 has a 1206 bp insertion, annotated as a transposase, 180 bp downstream of the start codon (Fig. 3). By comparing the sequences of the T3E genes that are shared by the three newly sequenced strains and the reference strain GMI1000, we identified 652, 798 and 692 variant sites in the T3E sequences of FJAT-91, FJAT-452 and FJAT-462, respectively (Table 9). These variations were classified into 7 types: missense variant, synonymous variant, frameshift variant, inframe deletion, inframe insertion, stop codon gain, and stop codon loss. Among them, 351 variations are shared by the three newly sequenced strains (Fig. 4). For example, the effector ripA1 has both missense and synonymous variants, ripAZ1 has a frameshift variant, and ripX has an inframe deletion in all three strains (Fig. 5).
Besides T3Es, R. solanacearum employs several additional virulence factors to achieve infection, such as exopolysaccharide (EPS). The signaling cascade leading to the production of EPS involves several different regulatory components [25]. We analyzed the presence of genes involved in the regulation of EPS production, and found that all the major regulators are present in the three strains (Table 7). These genes displayed a high percentage of similarity when compared to their homologs in the GMI1000 reference strain, with most genes ranging from 98.35 to 100% at the DNA level (98.26 to 100% at the amino acid level), with the exception of phcB, which shows a lower similarity in the FJAT-91 and FJAT-452 strains (86.41% at the DNA level in both strains) (Table 7).
Other genes encoding putative virulence factors, such as egl (encoding an endoglucanase) and pehB (encoding an exo-poly-α-D-galacturonosidase), were also present in the three strains, with >99% similarity at the DNA and amino acid level compared to GMI1000 (Table 7).
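The variant comparison described in this section (classifying variant sites into seven types and finding those shared by all three strains) can be illustrated with a small sketch. The variant records below are hypothetical: the gene names and variant types follow the examples given in the text, but the positions are invented for illustration, and the type labels are shorthand chosen here rather than the exact terms emitted by any annotation tool.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant is identified by (gene, position, variant type).
Variant = Tuple[str, int, str]

VARIANT_TYPES = (
    "missense", "synonymous", "frameshift",
    "inframe_deletion", "inframe_insertion",
    "stop_gained", "stop_lost",
)


def count_by_type(variants: Set[Variant]) -> Dict[str, int]:
    """Count variants per type, reporting zero for types that do not occur."""
    counts = Counter(variant_type for _, _, variant_type in variants)
    unknown = set(counts) - set(VARIANT_TYPES)
    if unknown:
        raise ValueError(f"unexpected variant types: {sorted(unknown)}")
    return {variant_type: counts.get(variant_type, 0) for variant_type in VARIANT_TYPES}


def shared_variants(per_strain: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants found at the same site, with the same type, in every strain."""
    return set.intersection(*per_strain.values())


# Hypothetical positions; genes and types follow the examples in the text.
COMMON: Set[Variant] = {
    ("ripA1", 101, "missense"),
    ("ripA1", 240, "synonymous"),
    ("ripAZ1", 55, "frameshift"),
    ("ripX", 312, "inframe_deletion"),
}
example_variants: Dict[str, Set[Variant]] = {
    "FJAT-91": COMMON | {("ripA1", 17, "missense")},
    "FJAT-452": COMMON | {("ripX", 400, "stop_gained"), ("ripAZ1", 9, "synonymous")},
    "FJAT-462": COMMON | {("ripA1", 17, "missense")},
}

shared = shared_variants(example_variants)
shared_counts = count_by_type(shared)
```

Keying each variant on gene, position and type means that two strains only share a variant when the same change occurs at the same site, which is the stricter of the possible definitions of "shared".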

Conclusions
Earlier studies on the T3E repertoires of different plant pathogens suggested that T3E composition might shape the host range [6,7]. In this study, we sequenced and analysed the genomes of three R. solanacearum strains isolated from different host plants with a similar geographical origin (Fujian province, China). Our analysis indicates that each of these strains has a unique effector repertoire (Table 8). In contrast to what we observed for T3E genes, all the analysed genes involved in the perception of plant signals and the regulation of virulence factors were present in all strains, and displayed a high degree of similarity between the newly sequenced strains and the GMI1000 reference strain (Table 7), suggesting that the mechanism of perception of plant signals does not differ significantly among bacteria infecting different plant species.
In addition to their presence or absence in specific strains, T3E genes may undergo several types of mutations that change or disrupt their coding sequence. As a consequence, the encoded proteins may lose the original function, become unstable, or gain a new function. This allelic diversification may be imposed by the host defense system, and allows [...] [26]. We identified alterations in effector sequences that were conserved in the three sequenced strains (Fig. 4). These sequence modifications may be due to the geographical distribution of these strains in comparison with the GMI1000 reference strain, originally isolated from French Guiana (South America), and may have functional relevance in the subversion of host functions in specific environmental conditions. Similarly, it is noteworthy that the FJAT-91 strain lacks 7 T3Es compared to GMI1000, while both are able to cause disease in tomato plants. Comparative analyses using the same tomato cultivars in controlled conditions will determine whether (i) these effectors are really dispensable to infect tomato, (ii) these effectors are dispensable in specific tomato cultivars, (iii) these effectors trigger immunity in specific tomato cultivars, or (iv) the environmental conditions in the FJAT-91 isolation site are more favourable to R. solanacearum infection, rendering their virulence activities unnecessary. The strain-specific absence of T3E genes or strain-specific loss-of-function variants (Fig. 4) may be caused by adaptation of these strains to specific hosts. Similarly, the transposase insertion in hyp7 (specific to FJAT-462; Fig. 3) is likely to alter or abolish the function of the encoded T3E in this strain, and may suggest that this T3E is not needed (or its alteration is actually required) to cause disease in chili pepper plants. Additional functional characterization will be required to determine whether these effectors induce immune responses in eggplant or chili pepper, and may allow the identification of novel sources of resistance against R. solanacearum. Our analysis shows that these unique effector repertoires are sufficient to cause disease in different hosts within a similar geographical location, allowing us to reduce the impact of environmental conditions in the analysis of the requirement of T3Es to cause infection. This information, together with the increasing number of sequenced R. solanacearum strains, constitutes one more step towards the identification of host specificity determinants for R. solanacearum.
Fig. 3 [...] and ripAL (c) sequence alignment. The nucleotide sequence of these genes in the strains sequenced in this study is 100% identical to each other, except for hyp7, which has an insertion annotated as a transposase in FJAT-462 (numbers indicate the insertion site). The percentage of identity compared to the orthologs in other sequenced strains is indicated in the figure.