Draft genomes of “Pectobacterium peruviense” strains isolated from fresh water in France

Bacteria belonging to the genus Pectobacterium are responsible for soft rot disease on a wide range of cultivated crops. The “Pectobacterium peruviense” species, recently proposed within the genus Pectobacterium, gathers strains isolated from potato tubers cultivated at high altitude in Peru. Here we report the draft genome sequences of two “P. peruviense” strains isolated from river water in France, indicating that the geographic distribution of this species is likely to be wider than previously anticipated. We compared these genomes with the published genome of the proposed “P. peruviense” type strain, isolated in Peru.
Electronic supplementary material: The online version of this article (10.1186/s40793-018-0332-0) contains supplementary material, which is available to authorized users.


Introduction
The Pectobacterium genus [1] gathers important plant pathogens that cause soft rot disease on a large variety of plant species [2]. Given their ability to cause disease on major crops such as potato, Pectobacterium spp. have mainly been isolated from diseased plants during initial outbreaks or sustained epidemics, and descriptions of these bacteria outside agricultural contexts are scarce [3].

Classification and features
Strain A97-S13-F16 was isolated in February 2016 from fresh water sampled in the river Durance, while strain A350-S18-N16 was isolated in November 2016 from fresh water sampled in the river Bléone, close to its confluence with the Durance. The water parameters measured at sampling were, for the A97-S13-F16 and A350-S18-N16 samples respectively: temperature 6.4 °C and 10.4 °C; turbidity 2.69 NTU and 145 NTU; conductivity 629 μS/cm and 629 μS/cm. After sampling, 500 ml of water was filtered through 0.2 μm pore filters (Sartorius cellulose acetate filters), the bacteria retained on the filters were resuspended in 1 ml of sterile distilled water, and 100 μl of the suspension was spread onto semi-selective, modified single-layer CVP AG366 plates (same medium as described in [14] except that tryptone was omitted; hereafter referred to as CVP). After 2 days of growth at 28 °C, two strains forming pits on CVP medium were purified, named A97-S13-F16 and A350-S18-N16, and stored in 40%/60% glycerol/LB liquid medium (10 g tryptone, 5 g yeast extract, 10 g NaCl per liter of medium) at −80 °C.
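The filtration and plating volumes above imply that each CVP plate carries the bacteria from a fixed volume of river water: 500 ml concentrated into 1 ml is a 500-fold concentration, so 100 μl plated corresponds to 50 ml of river water. The minimal sketch below makes that arithmetic explicit and converts a colony count into CFU per litre. The function names and the colony count are hypothetical, and it assumes complete recovery from the filter and one colony per viable cell; the study itself does not report counts.

```python
# Minimal sketch (not the authors' protocol): relate colonies counted on one
# CVP plate back to the volume of river water they came from, using the
# volumes stated in the text.
FILTERED_WATER_ML = 500.0  # river water passed through the 0.2 um filter
RESUSPENSION_ML = 1.0      # sterile distilled water used to resuspend the filter
PLATED_ML = 0.1            # 100 ul of the suspension spread on one plate


def water_volume_per_plate_ml(filtered_ml: float = FILTERED_WATER_ML,
                              resuspension_ml: float = RESUSPENSION_ML,
                              plated_ml: float = PLATED_ML) -> float:
    """Volume of original river water represented by one plate, in ml."""
    concentration_factor = filtered_ml / resuspension_ml  # 500-fold here
    return plated_ml * concentration_factor


def cfu_per_litre(colonies: int, **volumes: float) -> float:
    """Convert a colony count on one plate to CFU per litre of river water.

    Assumes each colony arose from one viable cell and that recovery from
    the filter was complete (hypothetical simplifications).
    """
    return colonies / water_volume_per_plate_ml(**volumes) * 1000.0


print(water_volume_per_plate_ml())  # 50.0
print(cfu_per_litre(3))             # 3 is a hypothetical colony count
```

Keeping the three volumes as keyword parameters lets the same helper be reused if a different volume is filtered or plated.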
Cells of both strains are rod-shaped, approximately 2 μm long in the exponential growth phase on LB medium (Fig. 1), and both strains macerate potato tubers (Additional file 1: Figure S1). They form isolated colonies after 24 h at 28 °C on LB medium containing 15 g agar per liter and after 48 h at 28 °C on TSA 10% medium (1.5 g tryptone, 0.5 g soy peptone, 0.5 g NaCl, 15 g agar per liter of medium), and they induce pits on CVP medium after 48 h at 28 °C.
Amplification and sequencing of the gapA housekeeping gene was recently described as a rapid way to characterize the different Pectobacterium species [15]. The gapA sequences of strains A97-S13-F16 and A350-S18-N16 clustered with that of the proposed "P. peruviense" type strain (Fig. 2a), and the clustering of both strains with "P. peruviense" was confirmed by a multilocus sequence analysis (MLSA) based on full genomes (Fig. 2b).
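A genome-scale MLSA such as the one behind Fig. 2b aligns each homologous gene separately and then joins the trimmed alignments end to end into a single "supermatrix" row per strain before tree inference. The sketch below shows only that concatenation step, assuming the per-gene alignments have already been produced and filtered (the article uses MUSCLE and GBLOCKS for this). The function names, the strain labels used as keys and the toy sequences are hypothetical.

```python
from typing import Dict, List

# One per-gene alignment: strain name -> aligned (gapped) protein sequence.
Alignment = Dict[str, str]


def concatenate_alignments(alignments: List[Alignment]) -> Dict[str, str]:
    """Join per-gene alignments end to end into one supermatrix row per strain.

    Assumes every alignment contains exactly the same strains (e.g. homologues
    present in all genomes) and has already been trimmed (e.g. with GBLOCKS).
    """
    strains = sorted(alignments[0])
    rows: Dict[str, List[str]] = {strain: [] for strain in strains}
    for i, aln in enumerate(alignments):
        if sorted(aln) != strains:
            raise ValueError(f"alignment {i} does not contain the same strains")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"alignment {i} has sequences of unequal length")
        for strain in strains:
            rows[strain].append(aln[strain])
    return {strain: "".join(parts) for strain, parts in rows.items()}


def to_phylip(supermatrix: Dict[str, str]) -> str:
    """Render the supermatrix as sequential PHYLIP-style text."""
    n_sites = len(next(iter(supermatrix.values())))
    lines = [f"{len(supermatrix)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in supermatrix.items()]
    return "\n".join(lines) + "\n"


# Two toy gene alignments (hypothetical sequences, for illustration only).
genes = [
    {"A97": "MK-L", "A350": "MKAL", "UGC32": "MR-L"},
    {"A97": "QW", "A350": "QW", "UGC32": "HW"},
]
print(to_phylip(concatenate_alignments(genes)), end="")
```

Checking that every alignment has the same strain set and a uniform length catches the most common concatenation error, a missing or misaligned taxon, before it silently shifts columns in the supermatrix.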

Genome project history
The aim of the project was to describe Pectobacterium spp. isolated from environmental samples outside agricultural contexts. Fresh water was sampled in the river Durance and its tributaries in 2016. Among the isolated strains, A97-S13-F16 and A350-S18-N16, isolated at different locations and in different months along the river, were selected for genome sequencing because phylogenetic analysis of their gapA housekeeping gene sequences placed them close to the gapA sequence of the recently proposed "P. peruviense" type strain UGC32 [8,13,15].

Growth conditions and DNA isolation
After isolation from fresh water in 2016, strains A97-S13-F16 and A350-S18-N16 were stored in 40%/60% glycerol/LB medium at −80 °C. To prepare genomic DNA, the strains were first grown overnight at 28 °C on solid LB medium. A single colony was then picked and grown overnight in 2 ml of liquid LB medium at 28 °C with shaking at 120 rpm. Bacterial cells were harvested by centrifugation (5 min at 12,000 rpm) and DNA was extracted with the Wizard® genomic DNA extraction kit (Promega) following the manufacturer's instructions. DNA was resuspended in 100 μl of sterile distilled water, and its quantity and quality were assessed by NanoDrop measurement, spectrophotometry and gel analysis.
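Spectrophotometric DNA assessment usually rests on two standard relations: for double-stranded DNA an absorbance of 1.0 at 260 nm (10 mm path) corresponds to roughly 50 ng/μl, and an A260/A280 ratio near 1.8 is generally taken to indicate DNA free of protein contamination. The sketch below applies them and works out the volume of extract holding the 50 ng used for Nextera library preparation. The absorbance readings are hypothetical, not values from this study.

```python
# Sketch of standard UV-absorbance arithmetic for double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0  # A260 = 1.0 ~ 50 ng/ul dsDNA (10 mm path)


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def a260_a280(a260: float, a280: float) -> float:
    """Purity ratio; values around 1.8 are usually taken as pure DNA."""
    return a260 / a280


def volume_for_input_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of extract containing the desired DNA mass."""
    return target_ng / conc_ng_per_ul


# Hypothetical readings for one extract (not values from this study).
conc = dsdna_ng_per_ul(a260=2.0)
print(conc)                             # 100.0
print(round(a260_a280(2.0, 1.1), 2))    # 1.82
print(volume_for_input_ul(50.0, conc))  # 0.5
```

In practice such a small volume would be taken from a dilution of the extract; the arithmetic is the same.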

Genome sequencing and assembly
Genome sequencing was performed at the next-generation sequencing core facility of the Institute for Integrative Biology of the Cell, Bât. 21, Avenue de la Terrasse, 91190 Gif-sur-Yvette Cedex, France. Nextera DNA libraries were prepared from 50 ng of high-quality genomic DNA. Paired-end 2 × 75 bp sequencing was performed on an Illumina NextSeq500 instrument with a High Output 150-cycle kit.
Fig. 1 Photomicrographs of Gram-stained, exponentially growing "P. peruviense" cells. (a) Strain A97-S13-F16, (b) strain A350-S18-N16. A light microscope with 100× magnification was used. These photomicrographs show the rod shape of both strains. The scale bar represents 5 μm.
CLC Genomics Workbench (version 9.5.2, Qiagen Bioinformatics) was used to assemble 30,066,500 reads (mean length 53 bp) and 8,174,334 reads (mean length 52 bp) for strains A97-S13-F16 and A350-S18-N16, respectively. Final sequencing coverages were 331× and 86×, with 61 and 73 scaffolds, for strains A97-S13-F16 and A350-S18-N16, respectively (Table 2).
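Fold coverage can be sanity-checked with the usual estimate C = N × L / G, where N is the number of reads, L the mean read length and G the genome size. Using the read counts and mean lengths above, and the draft assembly lengths given under Genome properties as G, this gives about 334× and 87×, close to the reported 331× and 86×. Small differences are expected because the estimate uses mean read length and draft assembly size, whereas the reported values come from CLC Genomics Workbench, whose exact computation is not described here.

```python
def estimated_coverage(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Expected fold coverage C = N * L / G (total sequenced bases / genome size)."""
    return n_reads * mean_read_length / genome_size


# Read counts and mean lengths from the text; genome sizes are the draft assembly lengths.
strains = {
    "A97-S13-F16": (30_066_500, 53, 4_775_191),
    "A350-S18-N16": (8_174_334, 52, 4_871_019),
}
for name, (n, length, size) in strains.items():
    print(f"{name}: {estimated_coverage(n, length, size):.0f}x")
# A97-S13-F16: 334x
# A350-S18-N16: 87x
```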

Genome annotation
Coding sequences were predicted using the RAST server [16] with the Glimmer 3 prediction tool [17]. COG assignments and Pfam domain predictions were made with the Web CD-Search Tool [18]. CRISPRFinder [19] was used to detect CRISPRs. Signal peptides were predicted with the SignalP 4.1 Server [20], and transmembrane helices with TMHMM [21].
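Counts such as "proteins containing a predicted transmembrane helix" are typically obtained by tallying per-protein predictions. The sketch below assumes the tab-separated "short" output of TMHMM 2.0, in which each line starts with the protein identifier followed by key=value fields including `PredHel=` (the number of predicted helices); if another output format was used, the parser would need adapting. Function names, protein identifiers and values are hypothetical.

```python
from typing import Dict, Iterable


def parse_tmhmm_short(lines: Iterable[str]) -> Dict[str, int]:
    """Map protein id -> number of predicted helices (PredHel field)."""
    helices: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        protein_id, *fields = line.split("\t")
        for field in fields:
            key, _, value = field.partition("=")
            if key == "PredHel":
                helices[protein_id] = int(value)
                break
    return helices


def count_membrane_proteins(helices: Dict[str, int], min_helices: int = 1) -> int:
    """Count proteins with at least `min_helices` predicted helices."""
    return sum(1 for n in helices.values() if n >= min_helices)


# Hypothetical output lines (protein ids and values invented for illustration).
example = [
    "prot_0001\tlen=312\tExpAA=0.02\tFirst60=0.01\tPredHel=0\tTopology=o",
    "prot_0002\tlen=451\tExpAA=44.10\tFirst60=21.30\tPredHel=2\tTopology=i7-29o49-71i",
    "prot_0003\tlen=98\tExpAA=22.50\tFirst60=22.40\tPredHel=1\tTopology=i12-34o",
]
helices = parse_tmhmm_short(example)
print(count_membrane_proteins(helices))  # 2
```

A single hydrophobic stretch near the N-terminus can be either a signal peptide or a transmembrane helix, which is one reason signal peptide predictions are usually examined alongside helix counts.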

Genome properties
The "P. peruviense" A97-S13-F16 draft genome contains 4,775,191 bp with a GC content of 51%. It has 4503 predicted genes in total, including 4459 protein-coding genes and 44 RNA genes. The final assembly comprised 61 scaffolds. Among the predicted genes, 72.21% have a predicted function, 79.91% were assigned to COGs and 85.40% have a predicted Pfam domain. Among the predicted proteins, 392 have a predicted signal peptide while 1090 contain a predicted transmembrane helix. Three CRISPR repeat arrays were detected in this genome.
Fig. 2 Phylogenetic trees of "P. peruviense" strains and strains of other Pectobacterium species and subspecies. (a) Phylogenetic tree constructed from gapA nucleotide sequences. Sequences were aligned using MUSCLE [24] and the alignments were filtered using GBLOCKS [25]. The tree was computed using PHYML [26]. One hundred bootstrap replicates were performed to assess the statistical support of each node. Bootstrap support values (percentages) are indicated when above 95%. gapA sequences were retrieved from the full genomes of type strains (accession numbers are indicated in Fig. 2b) or obtained by sequencing the gapA amplicon for strains A97-S13-F16 and A350-S18-N16. (b) Phylogenetic tree constructed from concatenated sequences of 1266 homologous proteins. Before concatenation, the homologous sequences of each gene were aligned using MUSCLE [24] and the alignments were filtered using GBLOCKS [25]. The tree was computed using PHYML [26]. One hundred bootstrap replicates were performed to assess the statistical support of each node. Bootstrap support values (percentages) are shown when below 100%. The accession number for each genome is indicated in brackets after the strain name. Dickeya solani RNS08.23.3.1.A was used as outgroup. Type strains are marked with T after the strain name.
The "P. peruviense" A350-S18-N16 draft genome contains 4,871,019 bp with a GC content of 51.1%. It has 4635 predicted genes in total, including 4487 protein-coding genes and 48 RNA genes. The final assembly comprised 73 scaffolds. Among the predicted genes, 72.01% have a predicted function, 78.77% were assigned to COGs and 85.09% have a predicted Pfam domain. Among the predicted proteins, 395 have a predicted signal peptide while 1095 contain a predicted transmembrane helix. Two CRISPR repeat arrays were detected in this genome.
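GC content is the fraction of G and C among all called bases of the assembly. In a scaffolded draft genome, gaps between contigs are filled with N, so one common approach, sketched here, counts only A, C, G and T. How the reported values of 51% and 51.1% were computed (for instance, by which tool) is not stated; the function names and toy sequences below are hypothetical.

```python
from typing import Iterable, List


def read_fasta_sequences(text: str) -> List[str]:
    """Return the sequences of a multi-FASTA string (headers are dropped)."""
    sequences: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.strip())
    if current:
        sequences.append("".join(current))
    return sequences


def gc_percent(sequences: Iterable[str]) -> float:
    """GC content (%) over A/C/G/T only; N runs in scaffold gaps are ignored."""
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / acgt


# Toy two-scaffold assembly (hypothetical, not real data).
toy = ">scaffold_1\nATGCNNNN\nGC\n>scaffold_2\nggcc\n"
print(gc_percent(read_fasta_sequences(toy)))  # 80.0
```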

The properties and the statistics of the two draft genomes are summarized in Tables 3 and 4.
Genome comparison between the "P. peruviense" strains
The phylogenetic trees (Fig. 2) indicate that strains A97-S13-F16 and A350-S18-N16 are more closely related to each other than either is to the "P. peruviense" type strain UGC32. To gain further insight into the distances between the three "P. peruviense" strains, we looked for shared and unique genes among the genomes of strains A97-S13-F16, A350-S18-N16 and the type strain UGC32 (Fig. 3). Strains A97-S13-F16, A350-S18-N16 and UGC32 contain 292, 414 and 346 strain-specific genes, respectively. The larger pool of specific genes in strain A350-S18-N16 could be partly related to its higher content of mobile genetic elements (Table 4). Indeed, we observed three clusters of phage-related genes in strain A350-S18-N16, only one of which was also detected in strain A97-S13-F16. The Venn diagram shows that 4129 genes are shared between strains A97-S13-F16 and A350-S18-N16, whereas only 3757 and 3765 genes are shared between the type strain UGC32 and A97-S13-F16 and A350-S18-N16, respectively. This confirms that the A97-S13-F16 and A350-S18-N16 genomes are more closely related to each other than to the genome of the proposed type strain UGC32.
Table note: the total percentage is based on the total number of protein-coding genes in the genome.
Fig. 3 Venn diagram of shared and unique genes between the genomes of "P. peruviense" A97-S13-F16 and A350-S18-N16 and the proposed "P. peruviense" type strain UGC32. Orthology was assumed using a threshold of 80% identity over at least 80% of the protein length.
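The gene-sharing comparison combines two steps: an orthology criterion (80% identity over at least 80% of the protein length) and a count of ortholog groups in each region of a three-way Venn diagram. The sketch below illustrates both under stated assumptions: the tool that produced the alignments is not named, so the criterion takes identity and lengths as plain numbers; coverage is required on the longer protein of each pair (the text does not specify which length is meant); and each gene is assumed to belong to a single ortholog group, which ignores paralogs. All group identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def is_ortholog_hit(identity_pct: float, aligned_len: int, query_len: int,
                    subject_len: int, min_identity: float = 80.0,
                    min_cov: float = 0.80) -> bool:
    """80% identity over at least 80% of the protein length.

    Coverage is required on the longer of the two proteins (an assumption;
    the text does not say which protein length is used).
    """
    return (identity_pct >= min_identity
            and aligned_len >= min_cov * max(query_len, subject_len))


def venn_regions(genomes: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], int]:
    """Count ortholog groups in each exclusive region of the Venn diagram.

    `genomes` maps a genome name to the set of ortholog-group ids it contains.
    """
    names = list(genomes)
    regions: Dict[Tuple[str, ...], int] = {}
    for group in set().union(*genomes.values()):
        key = tuple(name for name in names if group in genomes[name])
        regions[key] = regions.get(key, 0) + 1
    return regions


def shared(genomes: Dict[str, Set[str]], a: str, b: str) -> int:
    """Groups present in both genomes a and b (including those also in a third)."""
    return len(genomes[a] & genomes[b])


# Toy ortholog groups (hypothetical ids, for illustration only).
toy = {
    "A97-S13-F16": {"g1", "g2", "g3", "g4"},
    "A350-S18-N16": {"g1", "g2", "g3", "g5", "g6"},
    "UGC32": {"g1", "g2", "g7"},
}
print(venn_regions(toy))
print(shared(toy, "A97-S13-F16", "A350-S18-N16"))  # 3
```

Separating exclusive regions from pairwise intersections matters when reading a Venn diagram: a pairwise "shared" count may or may not include genes also present in the third genome, depending on how it is reported.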