Complete genome anatomy of the emerging potato pathogen Dickeya solani type strain IPO 2222T

Several species of the genus Dickeya cause soft rot and blackleg diseases on a wide range of plants and crops. Dickeya solani has been identified as the causative agent of disease outbreaks in potato crops in Europe over the last decade. Here, we report the complete genome of the D. solani type strain IPO 2222T. Using PacBio and Illumina technologies, a single circular chromosome of 4,919,833 bp was assembled. The G + C content is 56% and the genome sequence contains 4,059 predicted protein-coding sequences. The ANI values calculated for D. solani IPO 2222T vs. other available D. solani genomes were over 99.9%, indicating high genetic homogeneity within the D. solani species. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0208-0) contains supplementary material, which is available to authorized users.


Introduction
Dickeya are pectinolytic enterobacteria that cause soft rot and blackleg diseases on a wide range of crops worldwide, including potato plants (Solanum tuberosum) [1,2]. They are equipped with an arsenal of plant cell wall-degrading enzymes that macerate tuber and stem tissues, provoking disease symptoms [3]. In the early 2000s, D. solani emerged as a novel species causing blackleg and soft rot diseases on potato in Europe and the Mediterranean Basin [4]. Initially, several pectinolytic strains isolated from potatoes grown in Europe and Israel were identified as members of the Dickeya genus but were shown to exhibit distinctive genetic and physiological traits (biovar 3). Thereafter, additional phylogenetic and biochemical analyses placed these isolates in a distinct clade called D. solani [5,6,7,8]. The D. solani strain IPO 2222T was isolated from infected potato plants in The Netherlands in 2007 [9].
To date, 12 draft genomes of D. solani are available in the GenBank database. Among them, the genome of strain IPO 2222T was sequenced using 454 pyrosequencing with a low average genome coverage (14×). The resulting draft genome is composed of 91 contigs that were assembled into a single scaffold [9]. In this report, we combined Illumina and Pacific Biosciences technologies to provide a complete genome sequence of strain IPO 2222T. We also highlight some key phylogenetic and phenotypic features of the D. solani species.

Organism information
Classification and features
D. solani IPO 2222T belongs to the order "Enterobacteriales" and the class Gammaproteobacteria. The gapA-based phylogenetic tree (Fig. 1) was congruent with the previously reported trees inferred from MLSA [8,10], gathering all D. solani strains in a distinct clade within the Dickeya genus. The gapA housekeeping gene was chosen instead of the 16S rRNA gene because sequence analysis of gapA permits a highly resolved distinction between members of the Dickeya genus [8,10].
D. solani IPO 2222T is a Gram-negative, non-spore-forming, motile and facultatively anaerobic bacterium with rod-shaped cells (0.9 × 2.0 μm) (Fig. 2) [8]. Strain IPO 2222T grows on TY medium (tryptone 5 g/L, yeast extract 3 g/L and agar 1.5%) at 28°C, forming 1-2 mm colonies within 24 h. It produces phosphatase and indole and belongs to Dickeya biovar 3 as described previously [10]. Distinctive metabolic abilities of the D. solani species were described using the BIOLOG system [11]; among them, D. solani IPO 2222T uses urea as sole nitrogen source (Additional file 1: Figure S1). D. solani IPO 2222T was recovered from naturally infected potato plants showing blackleg and soft rot symptoms. Its aggressiveness was confirmed by infecting potato tubers and plants in greenhouse assays (Additional file 2: Figure S2). In addition, its ability to colonize root and stem tissues and to provoke disease symptoms has been reported using a green fluorescent protein-tagged strain [12].
Strain IPO 2222T has been registered at the Belgian Co-ordinated Collections of Micro-organisms (LMG 25993T), the National Collection of Plant Pathogenic Bacteria in UK (NCPPB 4479T), and the International Center for Microbial Resources - French collection of [...] Table 1.
Fig. 1 Phylogenetic tree highlighting the relative position of D. solani IPO 2222T within other Dickeya and Pectobacterium species. The unique gapA gene was retrieved from each of the complete and draft genomes that are available in the NCBI database; the alignment was generated using MUSCLE [23]; the evolutionary history was inferred using the Neighbor-Joining method [24] and the evolutionary distances were computed using the Maximum Composite Likelihood method [25]. Phylogenetic analyses were conducted using MEGA7 software [26]

Genome project history
The genome of D. solani strain IPO 2222T was sequenced using two technologies, PacBio RSII and Illumina NextSeq 500. This organism was selected for its agricultural relevance as an emerging pathogen with a significant impact on potato production and trade in Europe and around the world. Project information is available from the Genomes OnLine Database under number Gp0138842 and GOLD study number Gs0118682 at the Joint Genome Institute. The complete genome sequence is also deposited in GenBank under the accession number CP015137. Table 2 provides a summary of the project information and its association with MIGS [13].

Growth conditions and genomic DNA preparation
D. solani IPO 2222T was routinely cultured in TY medium at 28°C. Genomic DNA was extracted from 5 mL of overnight culture using a phenol-chloroform purification method followed by ethanol precipitation as described by Wilson [14]. Quantification and quality control of the DNA were completed using a NanoDrop (ND 1000) device, a Qubit® 2.0 fluorometer and agarose (1.0%) gel electrophoresis.

Genome sequencing and assembly
Second generation sequencing was performed using NextSeq 500 (Illumina, CA, USA) at the I2BC platform (Gif-sur-Yvette, France). A paired-end library was constructed with an insert size of 390 bp and sequencing was carried out using the 2 × 151 bp paired-end read module. The de novo assembly (length fraction, 0.5; similarity, 0.8) was performed using CLC Genomics Workbench (v8.0) software (CLC Inc, Aarhus, Denmark). After quality (quality score threshold 0.05) and length (above 40 nucleotides) trimming of the sequences, 33 contigs (N50 = 266,602 bp) were generated (CLC parameters: automatic determination of the word and bubble sizes with no scaffolding) with a 450× average genome coverage. The largest contig was 617,431 bp long.
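The N50 statistic quoted for the Illumina contigs can be computed from any list of contig lengths. The Python sketch below is illustrative only: the function name `n50` and the toy lengths are assumptions introduced here, and the code is not part of the CLC Genomics Workbench workflow used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths chosen for illustration; these are NOT the 33 contigs of the study.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

With the real assembly, the input would be the 33 contig lengths exported from the assembler, which are not listed in this report.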
Classification (table excerpt): Class Gammaproteobacteria, TAS [28,29]; Order "Enterobacteriales", TAS [28,29]; Family Enterobacteriaceae, TAS [30]; Genus [...]
Third generation sequencing was performed using PacBio RSII (Pacific Biosciences, CA, USA) at the University of [...] Prior to assembly, short reads (less than 500 bp) were filtered out and the minimum polymerase read quality used for mapping of sub-reads from a single zero-mode waveguide was set at 0.75. In total, 146,263 reads were obtained (N50 of 9,161 bp) and the total number of bases was 1,070,191,526, resulting in a 217× average genome coverage. Reads were assembled using RS_HGAP_Assembly software (V2.0). The cut-off length of seeding reads was set at 13,304 bp in order to serve as a reference for the recruitment of shorter reads for preassembly. The resulting consensus accuracy, based on multiple sequence alignment of the sub-reads, was 99.99%. The de novo Illumina contigs were used to verify the RS_HGAP assembly by blasting them against the PacBio sequence. In addition, the trimmed Illumina reads were mapped (length fraction, 0.5; similarity, 0.8) against the PacBio sequence, and errors (SNPs and InDels) that might have been generated in homopolymer regions during PacBio sequencing were searched for and corrected using the basic variant calling tool from CLC Genomics Workbench. Using these two sets of sequences, the complete genome sequence was validated and circularized.
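The average coverage figure follows directly from the total number of sequenced bases and the genome size. As a minimal check, the Python sketch below recomputes it from the values given in the text; the function and constant names are assumptions chosen for illustration.

```python
def average_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the text of this report.
PACBIO_TOTAL_BASES = 1_070_191_526  # total bases in the filtered PacBio reads
GENOME_SIZE_BP = 4_919_833          # size of the circular chromosome

pacbio_coverage = average_coverage(PACBIO_TOTAL_BASES, GENOME_SIZE_BP)
print(f"PacBio average coverage: {pacbio_coverage:.1f}x")
```

The result is between 217 and 218, in line with the 217× average coverage reported for the PacBio data.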

Genome annotation
The complete genome of D. solani IPO 2222T was annotated using the NCBI prokaryotic genome annotation pipeline [15]. The protein-coding gene prediction process begins with an alignment using ProSplign [16], in which only complete alignments with 100% identity to a reference protein are kept for final annotation. The remaining frameshifted or partial alignments are then further analyzed by GeneMarkS+ [17]. To identify structural rRNAs, the pipeline uses a BLASTn search against a curated reference set. tRNAscan-SE is used to identify the tRNAs [18]. CRISPRs are identified using the CRISPR database [15].

Genome properties
Detailed information about the Dickeya solani IPO 2222T genome is provided in Table 3. The genome consists of one circular chromosome, 4,919,833 bp in size. The annotation predicted 4,208 genes including 4,059 CDSs (Table 4), 104 RNA genes (75 tRNA, 22 rRNA and 7 ncRNA genes) and 45 pseudogenes. The G + C content reached 56%. The graphical genome map is provided in Fig. 3.
Table footnote: The total does not correspond to 4,208 CDS because some genes are associated with more than one COG functional category.
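The gene counts above are internally consistent, and the G + C content is a simple ratio over the chromosome sequence. The Python sketch below illustrates both calculations; `gc_percent` and the toy sequence are assumptions introduced here and do not reproduce the annotation pipeline.

```python
from collections import Counter


def gc_percent(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(sequence.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100 * gc / total


# Feature counts taken from the text of this report.
CDS_COUNT = 4_059
RNA_COUNTS = {"tRNA": 75, "rRNA": 22, "ncRNA": 7}
PSEUDOGENE_COUNT = 45

total_genes = CDS_COUNT + sum(RNA_COUNTS.values()) + PSEUDOGENE_COUNT
print(total_genes)  # 4208, the total number of predicted genes

# Toy sequence for illustration only; the real input would be the
# 4,919,833 bp chromosome (GenBank accession CP015137).
print(round(gc_percent("GGCCAT"), 1))
```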

Insights from the genome sequence
The D. solani species is genetically highly homogeneous, with genomic similarity (ANI value) of 99.9% [19,20]. Between any two D. solani genomes, the number of variations (SNPs/InDels) is generally below one hundred. For example, when D. solani strain 3337 and D. solani strain IPO 2222T were compared, 49 variations were observed: 15 were located outside CDSs and 34 within CDSs [19]. Only a few D. solani genomes (strains RNS 07.7.3B, PPO 9019 and PPO 9134) exhibited a higher number of variations (>1000) because they acquired D. dianthicola genes by horizontal gene transfer [19]. No horizontal gene transfer from D. dianthicola was observed in strain IPO 2222T. Plant cell wall-degrading enzymes, comprising pectinases, proteinases and cellulases, play a major role in the plant tissue maceration process [21]. Indeed, 10 pectate lyases (genes pelABCDEILXWZ) were predicted in the strain IPO 2222T genome; they showed a 93.3% average nucleotide identity when compared to the orthologous genes of D. dadantii 3937.
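Sorting variations into those located within and outside CDSs, as in the comparison with strain 3337, amounts to an interval lookup on the annotated coordinates. The Python sketch below shows one possible way to do this; it is not the method used in [19], it assumes sorted, non-overlapping CDS intervals with 1-based inclusive coordinates, and all coordinates are hypothetical.

```python
from bisect import bisect_right


def split_by_cds(variant_positions, cds_intervals):
    """Count variants inside and outside CDSs.

    cds_intervals must be sorted, non-overlapping (start, end) pairs with
    1-based inclusive coordinates (a simplifying assumption).
    Returns a tuple (inside, outside).
    """
    starts = [start for start, _ in cds_intervals]
    inside = 0
    for pos in variant_positions:
        # Index of the last interval starting at or before this position.
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos <= cds_intervals[idx][1]:
            inside += 1
    return inside, len(variant_positions) - inside


# Hypothetical coordinates, for illustration only.
toy_cds = [(100, 200), (300, 450)]
toy_variants = [50, 150, 200, 250, 300, 500]
print(split_by_cds(toy_variants, toy_cds))  # (3, 3)
```

Applied to real data, the inputs would be the variant positions from a whole-genome comparison and the CDS coordinates from the annotation; overlapping genes would need an interval-tree or a merged-interval step first.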
Recent comparative analyses underlined the major genetic and metabolic divergences between the Dickeya solani species and the nearest clades, D. dadantii (ANI 94%) and D. dianthicola (ANI 92%) [11,19]. D. solani is characterized by a low content of phage elements and CRISPR systems: in the strain IPO 2222T genome, only one CRISPR cluster (208 bp) was identified. According to the PHAST tool [22], strain IPO 2222T harbors one questionable prophage (11 CDSs) in a 10,687 bp region. In addition, some genomic regions were shown to be specific to the D. solani species and contain some metabolic and NRPS/PKS-encoding genes [11].

Conclusions
The sequence of the type strain D. solani IPO 2222T is the first complete genome of a member of this species. This work provides a substantial resource for knowledge of the genetic material of this bacterium. It may help to understand the success of D. solani in invading potato fields, opening the way to new control strategies against this phytopathogen.