Draft genomes of Cronobacter sakazakii strains isolated from dried spices bring unique insights into the diversity of plant-associated strains

Cronobacter sakazakii is a Gram-negative opportunistic pathogen that causes life- threatening infantile infections, such as meningitis, septicemia, and necrotizing enterocolitis, as well as pneumonia, septicemia, and urinary tract and wound infections in adults. Here, we report 26 draft genome sequences of C. sakazakii, which were obtained from dried spices from the USA, the Middle East, China, and the Republic of Korea. The average genome size of the C. sakazakii genomes was 4393 kb, with an average of 4055 protein coding genes, and an average genome G + C content of 56.9%. The genomes contained genes related to carbohydrate transport and metabolism, amino acid transport and metabolism, and cell wall/membrane biogenesis. In addition, we identified genes encoding proteins involved in osmotic responses such as DnaJ, Aquaproin Z, ProQ, and TreF, as well as virulence-related and heat shock-related proteins. Interestingly, a metabolic island comprised of a variably-sized xylose utilization operon was found within the spice-associated C. sakazakii genomes, which supports the hypothesis that plants may serve as transmission vectors or alternative hosts for Cronobacter species. The presence of the genes identified in this study can support the remarkable phenotypic traits of C. sakazakii such as the organism’s capabilities of adaptation and survival in response to adverse growth environmental conditions (e.g. osmotic and desiccative stresses). Accordingly, the genome analyses provided insights into many aspects of physiology and evolutionary history of this important foodborne pathogen. Electronic supplementary material The online version of this article (10.1186/s40793-018-0339-6) contains supplementary material, which is available to authorized users.


Introduction
Cronobacter species, formerly known as Enterobacter sakazakii, are a group of opportunistic foodborne bacterial pathogens [1,2]. The genus Cronobacter is comprised of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. muytjensii, C. dublinensis, C. universalis, and C. condimenti [2,3]. These re-emerged pathogens cause severe meningitis, septicemia, or necrotizing enterocolitis in neonates and infants and pneumonia, septicemia, and urinary tract and wound infections in adults [4][5][6][7]. Of the seven species, the primary pathogen is C. sakazakii; the status of Cronobacter, as a pathogen, was elevated to an international public health concern when contaminated samples of powdered infant formula (PIF) or follow-up formula (FUF) were recognized by the food safety community, after linking its presence to several neonatal meningitis outbreaks [8,9,10]. It is well-defined now that contamination of reconstituted, temperature-abused PIF occurs both intrinsically and extrinsically; the main reservoir(s) and routes(s) of contamination have yet to be established, however [11]. Furthermore, reports from numerous surveillance studies have shown that Cronobacter species are found in a variety of foods including dried foods (spices, herbs, flour, and cereals) and fresh ready-to-eat vegetables [12][13][14][15]. This increasing body of evidence suggests that plants may serve as a reservoir [16,17]. Moreover, linking the epidemiology of adult cases to consumption of PIF is difficult to explain [5][6][7], suggesting that there are still unknown sources, such as other foods which may be involved in causing adult infections. Although occurrences of Cronobacter species in plant-origin foods are increasingly being reported, relatively less genomic information is available [18,19]. Here, we describe the draft genome sequences of 26 C. sakazakii strains isolated from dried spices which were obtained from the USA, the Middle East, China, and the Republic of Korea.

Organism information
Classification and feature The strains described in this report were obtained through various surveillance studies reported by Gopinath et al. [18], Jaradat et al. [20], and Chon et al. [21]. C. sakazakii is a Gram-negative, non-sporulating, and mesophilic, facultatively anaerobic bacterium (Kingdom Domain: Bacteria) that belongs to the phylum Proteobacteria, class Gammaproteobacteria, order Enterobacterales, within the family Enterobacteriaceae. C. sakazakii cells are rod-shaped measuring approximately 3 by 1 μm when the cells are in the exponential growth phase; the cells are motile by peritrichously-expressed flagella (Fig. 1). The species type strain is ATCC 29544 T (strain synonyms: CDC 4562-70; DSM 4485; NCTC 11467, and WDCM 00214), which was isolated from a child's throat with whooping cough in 1970 by the Tennessee State Health Department, Nashville, TN, USA. Originally described as a yellow pigmented E. cloacae by Urmenyi and Franklin [22], the bacterium was later reclassified by Farmer et al. as Enterobacter sakazakii in 1980 [23], and then redefined as Cronobacter by Iversen et al. [2] after aligning the different biogroups described by Farmer et al. [23] into separate species epithets. Iversen et al. [2] characterized the new genus into six species groups based on a polyphasic approach utilizing both DNA-DNA hybridization and phenotypic analyses. Joseph et al. [3], then described C. condimenti and realigned the previously recognized Cronobacter genomospecies 1 with the new species epithet, C. universalis.
Phenotypically, it is very challenging to assign species identities to Cronobacter species based on classic biochemical reactions routinely used to characterize members of the family Enterobacteriaceae; Iversen et al. [2] have summarized these concerns. They assigned biogroups 1-4, 7, 8, 11, and 13 to the C. sakazakii epithet [2]. Typically, C. sakazakii strains will give a positive result in tests for the utilization of putrescine, turanose, maltitol, lactulose, 1-0-methyl a-D-glucopyranoside, palatinose, cisaconitate and 4-aminobutyrate. The utilization of myo-inositol is variable among strains and a small number of strains (less than 5%) can utilize malonate [2].
Cronobacter species also represent a group of bacteria that are highly resistant to desiccation [24-28, 29, 30].
Cronobacter species are ubiquitous in nature, and molecular typing schemes have been very helpful in both epidemiological and surveillance investigations. One of the most useful schemes is based on a DNA-sequence-typing (ST) method using a seven-locus MLST scheme which is maintained at http://www.pubmlst.org/cronobacter [31,32,33]. Recently Gopinath et al. [18] demonstrated that C. sakazakii strains possessing the ST64 allelic profile also contain a nine gene, 7.7 kb malonate utilization operon which shares sequence homology with operons possessed by C. turicensis and C. universalis. These results support the original findings of Iversen et al. [2] that projected that 5% of C. sakazakii strains can utilize malonate, a trait well recognized to be present in the other six Cronobacter species. There have been over 230 C. sakazakii STs identified and 11% of~1606 C. sakazakii strains stored within the Cronobacter PubMLST site are from clinical samples [31]. C. sakazakii ST64 strains are phylogenetically related to strain C. sakazakii strain GP1999, a ST145 strain which Fig. 1 Transmission electron photomicrograph of a typical Cronobacter sakazakii strain (ES632) grown on Trypticase soy agar supplemented with 1% sodium chloride, and incubated at 37°C for 22 h. The cells were negatively stained with 0.5% sodium phosphotungstate (pH 6.8).
Note the presence of numerous peritrichously expressed flagella (arrow). Bar represents 1 μm was isolated from a tomato plant's rhizoplane/rhizosphere continuum [16,17], as well as, to other strains obtained during surveillance studies of dried plant foods, PIF and dairy powder production facility environments, spice, milk powder, and mushroom samples located throughout the USA, Europe, the Middle East, the Republic of Korea, and China [18][19][20][21]. The general features of the strains reported in the present study are shown in Table 1 which includes five ST64 strains: AS (Allspice) 2, AS4, AS13, AS15, and Jor172 which were obtained from spice samples from the USA, the Republic of Korea, China, and Jordan. Strains representing 12 other STs are also incorporated into this report, including strains representing STs like the meningitis ST4 clone and other clinically relevant STs: ST1, ST8, ST3, ST13, ST21, ST31, ST40, ST99, ST219, ST226, and a recent new ST: ST643 [19].

Genome project history
This extended genome report describes draft genomes of twenty-six C. sakazakii strains which were obtained from various spice samples. This work is part of a larger study focused on exploring the microbial diversity of C. sakazakii strains which are associated with foods of plant-origin such as spices; Table 2 describes the project information and its association with minimum information about a genome sequence (MIGS) utilizing its version 2.0 compliance criteria [34].

Growth conditions and genomic DNA preparation
Frozen bacterial cultures were stored at − 80°C in Trypticase soy broth (BBL, Cockeysville, MD) supplemented with 1% NaCl (TSBS) and 50% glycerol, and were streaked onto agar plates containing Enterobacter sakazakii Chromogenic Plating Medium (ESPM, R&F Products; Downers Grove, IL) followed by incubation overnight at 37°C. Typical Cronobacter-like colonies (blue-black to blue-gray colored, raised colonies) were chosen to inoculate TSBS broth cultures (5 ml) which were incubated at 37°C, shaking at 150 rpm for 18 h. Bacterial DNA was extracted and purified using a Qiagen Qiacube instrument and its automated technology (QIAGEN Sciences; Germantown, MD) as described previously and according to the manufacturer's instructions [16,18,19,35,36].

Genome sequencing and assembly
For WGS analysis of the strains, the concentration of each strain's DNA was then determined using a Qubit Fluorometric spectrophotometer (Life Technologies, Thermo Fisher Scientific; Wilmington, DE). DNA samples were diluted with sterile nuclease-free deionized water (molecular biology grade, Thermo Fisher Scientific, Waltham, MA) to a final concentration of 0.2 ng/μl. Whole-genome sequencing was performed using a MiSeq benchtop sequencer (Illumina, San Diego, CA, USA), utilizing either 500 or 600 cycles of paired-end reads (Illumina). FASTQ datasets were de novo assembled with CLC Genomics Workbench version 9.0 (CLC bio, Aarhus, Denmark). The paired end libraries were generated and sequenced in conjunction with the Nextera XT DNA sample preparation guide on the Illumina Miseq instrument (Illumina; San Diego, CA) [16,18,19].

Genome annotation
Sequence data for each strain was uploaded onto the Rapid Annotation Subsystems Technology (RAST) server for annotation [37]. The genomes were also submitted to the Department of Energy Joint Genome Institute (Walnut Creek, CA) through the annotation submission portal of the NCBI prokaryotic genome annotation pipeline (PGAP) with its best-placed reference protein set GeneMarkS+ application. Table 3 shows each strain's source, geographic locale, genome size, topology, %G + C content, number of CDS, sequence type (ST), NCBI accession number, GOLD analysis project identification number, and locus tag which are captured for each spice-associated strain under the umbrella NCBI Gen-Bank BioProject PRJNA258403 which is a FDA-CFSAN Cronobacter GenomeTrakr project [38,39]. EggNOG analysis was also used to verify functional gene annotations and to help identify clusters of orthologous groups (COGs) categories [40].

Genome properties
A summary of the genome statistics for the 26 plant-origin C. sakazakii strains is provided in Table 4 and information on each individual strain is given in Additional file 1: Table S1. De novo assembly of the genomes resulted in an average total genome length of 4393 kb with a range of 4052 to 4716 kb observed among the genomes. The average total number of coding regions (CDS) was determined to be 3898 kb with a CDS range of 3779 to 4160 kb observed among the genomes (take note: that the JGI IMG annotation pipeline identified 3151 genes which were assigned to COGs). The average G + C content of strains was 56.9% with a range of 56.4 to 57.1% observed among the genomes. These values are similar to those reported for other strains of plant-origins curated at NCBI [16,18,19,35,36]. Using the JGI IMG annotation pipeline, it was possible to identify an average of 4207 predicted genes (range: 4090-4541) among the 26 genomes of which 4055 (3937 to 4383) genes putatively encoded for proteins (which constituted~96% of all genes). Onehundred pseudogenes (range: 73-157 genes), and 151 RNA genes (range: 142-162 genes) were also identified; 3877 genes possessed identifiable Pfam domains, while ~413 genes encoded proteins possessing predicted signal peptides. Lastly, approximately 994 genes encoded for predicted proteins with a function that could be assigned to a transmembrane protein.

Insights from the genome sequence
Plasmids Comparative RAST analysis of the draft assemblies with that of the virulence plasmid, pESA3 (131,196 bp in size [37]), shown in Additional file 4: Table S4, revealed the presence of coding sequences for the predicted alleles of the pESA3-like, RepFIB virulence plasmid originally described by Franco et al. [43]. pESA3-like plasmids contain a common backbone set of alleles represented by the plasmid origin of replication gene, repA, an ABC iron transporter gene cluster (identified by the presence of eitA) and a Cronobactin (an aerobactin-like siderophore) gene cluster (identified by the presence of iucC). Prototypical C. sakazakii strain BAA-894 also possesses plasmidborne gene sequences for a Cronobacter plasminogen activator gene (cpa), genes encoding an~17-kbp type six secretion system (T6SS) and, in approx. 20% of C. sakazakii strains (however, not found in BAA-894), possess genes of the2 7-kbp gene filamentous hemagglutinin (FHA) gene cluster represented by the presence of fhaB [44,43]. Interestingly, results of PCR analysis of the strains reported in the present study, shown in Table 6, revealed that all of the strains were PCR-positive for repA, cpa, eitA, and iucC. All of the strains were also PCR-positive for the T6SS's IntLeft (IntL) gene locus, but only seven, 11, and three of the strains were PCR-positive for the other three T6SS alleles (vgrG, R end, IntR). These results suggest that the T6SS gene cluster is highly variable in these strains, similar to what Franco et al. [43] and Yan et al. [45,46] had previously reported. In addition, six of the strains were PCR-positive for fhaB, signifying that these strains possess the FHA gene cluster. Only one of the strains was PCR-positive for pESA2-like plasmids, while five of the strains were PCR-positive for the C. turicensis-like pCTU3 plasmid which was identified by Stephan et al. [47]. RAST analysis was used to determine if any of the 26 plant-origin strains harbored the small cryptic CSK29 544_2p-like plasmid which has been found in other C. sakazakii strains such as C. sakazakii strain SP291 (CSK29544_2p is homologous to pSP291-3), a highly persistent environmental strain found associated with an Irish PIF manufacturing facility [45,46]. According to the C. sakazakii NCBI website (https://www.ncbi.n lm.nih.gov/genome/genomes/1170?), the species type strain, C. sakazakii 29544 T harbors three plasmids CSK29544_1p (pESA3-like virulence plasmid, 93,905 bp in size), CSK29544_2p (a small cryptic plasmid, 4938 bp in size), and CSK29544_3p (a pESA2-like conjugative plasmid, 53,457 bp in size). CSK29544_2p contains five genes encoding for a methyl-accepting chemotaxis protein, a hypothetical protein and a plasmid mobilization relaxosome protein cluster, MobCABD. Our analysis showed that none of the strains harbored this plasmid (data not shown).  Table 3 Genbank ID See Table 3 GenBank Date of Release 2018/03/07 GOLD ID SEE Table 3 BIOPROJECT PRJNA258403 (Cronobacter GenomeTrakr Project, FDA-CFSAN) Project relevance Food Safety, source attribution

Chromosomal traits
Next generation genome sequencing of the different Cronobacter species revealed a species-level bidirectional divergence which is hypothesized to be driven by niche adaptation [35]. Figure 2 illustrates this phylogenetic divergence, using the kSNP3 tool [48], of the strains reported in this study with representative strains of each species. The phylogeny among these strains followed similar sequence type evolutionary lineages which were reported by Chase et al. [36] and Gopinath et al. [18]. Furthermore, Cronobacter possess a diversity of remarkable features which support the organism's capability to survive under severe environmental growth conditions such as xerotolerant econiches confined to the production of dried foods, such as PIF [35,29,30]. The physiological mechanisms of desiccation survival are thought to involve both primary and secondary desiccation responses; and involve the efflux of various sugars such as trehalose and other osmoprotectants [29,30]. Genomically, several genes involved in osmotic responses were found within these spice-associated strains; furthermore, these genes were shown by Srikumar et al. [30] to be transcriptionally highly up-regulated in C. sakazakii cells grown under xerotolerant growth conditions. For example, DnaJ and DnaK, (Additional file 3: Table S3) in strain MOD1_O23mB, represented by locus tags: C5975_08705 and C5975_08710 are two co-expressed chaperone proteins which are classified in COG O and were found in all of the strains analyzed in this study. DnaJ participates actively in the response to hyperosmotic and heat shock by preventing the aggregation of stress-denatured proteins and acts in association with DnaK and GrpE (locus tag C5975_09365). DnaJ is considered to be the nucleotide exchange factor for DnaK and may function as a thermosensor. Unfolded proteins bind initially to DnaJ. It is also hypothesized that DnaJ, DnaK, and GrpE act together in the replication of plasmids through activation of initiation proteins. Another protein, Aquaporin Z (classified in COG M, represented here as an example in strain MOD1_O23mB (locus tag: C5975_14540) Additional file 3: Table S3), was found in all strains and is a porin-like channel protein that permits osmotically driven movement of water in both directions. It is thought to be involved in osmoregulation and in the maintenance of cell turgor pressure during volume expansion in rapidly growing cells. It is thought that Aquaporin Z opens in response to the stretch forces in the membrane lipid bilayer and that it may also participate in the regulation of osmotic pressure changes within the cell during osmotic stress. Thus, Aquaporin Z mediates rapid entry or exit of water in response to abrupt changes in osmolarity. Aquaporin Z is also a member of the major intrinsic protein (MIP) superfamily which functions primarily as water-selective membrane channels that transport water, small neutral molecules, and ions out of and between cells. Still another protein, ProQ (as example, locus C5975_18900 in strain MOD1_O23mB in Additional file 3: Table S3), is classified in COG T; and is a protein that is a structural element that influences the osmotic activation of the proline/ betaine transporter ProP at a post-translational level. It also acts as a proton symporter that senses osmotic shifts and responds by importing osmolytes such as proline, glycine betaine, stachydrine, pipecolic acid, ectoine and taurine into the cell. ProP is thought to have a dual role in that it serves the cell as both an osmosensor and an osmoregulator which is available to participate in the bacterial osmoregulatory response [29,30]. The channel opens in response to the stretch forces in the membrane lipid bilayer and may also participate in the regulation of osmotic pressure changes within the cell. Other proteins such a TreF (an alpha, alpha-trehalase, MOD1_O23mB locus C5975_10755, COG G, Additional file 3: Table S3) was found and is thought to provide cells with the ability to utilize trehalose under high osmolarity growth conditions by splitting it into glucose molecules that can subsequently be taken up by the phosphotransferase-mediated uptake system. Another set of proteins encoded by the mdoHGC operon (COG P, MOD1_O23mB locus C5975 _17925, C5975_17930, C5975_17940 in Additional file 3: Table S3), which is involved in the biosynthesis of osmoregulated periplasmic glucans (OPGs), was found to be highly up-regulated in C. sakazakii grown under xerotolerant growth conditions [30]. The roles of the OPGs are complex and vary considerably among bacteria, but OPGs are thought to be a part of a signal transduction pathway(s) and are thought to indirectly regulate genes involved in virulence. The total number of OPGs increases when the osmolarity growth conditions decreases [49]. In general, EggNOG analysis identified 10 proteins per strain that were involved in the osmotolerance response. Another group of chaperone-like proteins which these C. sakazakii strains possessed are also annotated as heat shock proteins, and consist of IbpA (C5975_06750), DiaA (C5975_07735), and HtpX (C5975_18890), and Hsp15 (C5975_00700, COG M). There were in general between 11 and 17 heat shock-related proteins found by EggNOG analysis. Other sets of proteins found associated with these strains include 22-27 fimbriae proteins, however no curli proteins were found. There were 23-28 different efflux pump-associated proteins including proteins involved with the efflux or transport of threonine, homoserine lactone (locus tag C5975_00275), p-hydroxybenzoic acid (locus tag C5975_07280), glutathione-regulated potassium (locus tag C5975_00475, C5975_00480, C5975_08855, C5975_08860, KefGFCB), RND efflux (C5975_02520, Transporter), proteins associated with heavy metal efflux of nickel/cobalt (C5975_13445, RcnB), cobalt/magnesium

.0 Not in COGs
The total is based on the total average number of protein coding genes (3902) for the genome. a Note: A summary of the total number of COG alleles per strain is shown in Additional file 2: Table S2. Individual strain's genome statistics is shown in Additional file 3:  Only 24 strains were analyzed by PCR for presence of pESA2 and pCTU3 (MOD1_788569 and MOD1_O123mB strains were not analyzed). Therefore, the percent positive for pESA2 and pCTU3 were calculated using a total number of 24 strains (C5975_08880, ApaG), and manganese ions (C5975 _18840, MntP), sugar efflux (C5975_13720, SetB), and multidrug resistance (MdtA, MdtH, MdtD). There were on average 5-13, 1-10, 15-20 proteins that were annotated as integrases, transposases, and recombinase-like proteins, respectively. All of these genes have been observed in other C. sakazakii genomes [16,18,19]. Interestingly, there was a large difference  in the number of phage-associated proteins among the strains. For example C. sakazakii strain Jor96 possessed phage proteins annotated for lambda, GP49-like, P2, Mu, and cp-933 k phages. Lastly there was also a wide difference in the number of both toxin-antitoxin type I and type II toxin-antitoxin family proteins found among the genomes; examples include type I toxin-antitoxin system hok family toxin and type II toxin-antitoxin systems such as RelE/ ParE, RelE/DinJ, and HipA families.
Among the spice-associated C. sakazakii strains, 4 to 7 hemolysin-related proteins were identified. For example C. sakazakii strain MOD1_Jor93 possessed six alleles encoding for hemolysin-related proteins, such as four COG category U (intracellular trafficking and secretion) genes. A hemolysin secretion/activation protein homologous to the ShlB/FhaC/HecB family of alleles was found in Fig. 2 Phylogenetic analysis of Cronobacter sakazakii strains isolated from spices, compared with eight representative Cronobacter species strains (marked with superscripted 'T' after each strain's name). NCBI GenBank Accession numbers of type strains: C. malonaticus LMG 23826 T (NZ_CP013940), C. turicensis LMG 23827 T (NC_013282), C. universalis NCTC 9529 T (NZ_CP012257), C. muytjensii ATCC 51329 T (NZ_CP012268), C. dublinensis subsp. dublinensis LMG 23823 T (NZ_CP012266), C. dublinensis subsp. lactaridi LMG 23825 T (NZ_AJKX00000000), C. dublinensis subsp. lausannensis LMG 23824 T (NZ_AJKY00000000), and C. condimenti LMG 26250 T (NZ_CP012264). Whole genome SNP analysis was carried out using kSNP3 software [48]. The phylogenetic tree was built using neighbor-joining method [65] and the evolutionary distances were computed using the Maximum Composite Likelihood method [66] available on MEGA7 phylogenetic suite [67]. The bootstrap values obtained from 500 bootstrap replicates are reported as percentages at the nodes [68]. Sequence type (ST) information was obtained by uploading each strain's genome assembly to the Cronobacter MLST website (http://pubmlst.org/cronobacter/) after which the ST information was manually overlaid onto the tree with different color. Note that the phylogeny among the strains followed ST evolutionary lineages. The scale bar indicates 0.10 substitutions per nucleotide position MOD1_Jor93 (C5940_08565, Additional file 3: Table S3). This Pfam annotated allele shares homology with a group of sequences that are related to ShlB from Serratia marcescens [50]. It is hypothesized that ShlB is an outer membrane protein possibly involved in either a Type V or a two-partner secretion system where it functions to secrete and activate a ShlA type hemolysin. The activation of ShlA is thought to occur during secretion when ShlB imposes a conformational change in the inactive hemolysin to form the active protein. Though ShlA was not found in MOD1_Jor93, this protein was found in MOD1 _Jor20 (C5932_21600).
There were three proteins defined as COG category S (function unknown) which included a hemolysin expression modulating protein, a putative hemolysin, and COG1272, a predicted membrane hemolysin III which Cruz et al. previously described [51].
Other virulence-related proteins included MsgA (analogous with a DNA damage-inducible protein, DinI family protein). Every genome possessed genes for this protein.
The same protein is found in Salmonella enterica subsp. enterica. It is thought that MsgA in Salmonella is required for intramacrophage survival and seems to be independent of the PhoP regulon [52]. Other virulence factor-like proteins found were ImpE and SrfB [46].
Xylose and arabinose account for more than 30% of the total sugars in agricultural residues and in fact, Xylose is the second most abundant sugar in nature besides glucose and primarily exists as D-xylose [53]. However, it is usually found as a polymeric component of plant cell wall matrix polysaccharides such as xylans, e.g., arabinoxylans, hemicellulose (xylan, glucuronoxylan), and xyloglucan [53]. Complex interactions are thought to exist between human pathogens and a plant's indigenous microflora, including phytopathogens, which are associated with fresh produce [53]. Xanthomonas pathogens such as X. campestris pathovars cause diseases of agronomic importance throughout the world; examples include black rot disease in crucifers such as cauliflower, cabbage, garden cress, bok choy, broccoli, and brussel sprouts; and in fact these pathovars can affect all cultivated brassicas. Also, X. campestris pv. vesicatoria (now reclassified as X. euvesicatoria), causes bacterial spot disease on pepper and tomato plants, and X. campestris pv. malvacearum (now X. axonopodis pv. malvacearum), causes angular leaf spot of cotton [54,55]. These phytopathogens possess a number of plant cell wall-degrading enzymes (as part of the carbohydrate utilization with TonB-dependent outer membrane transporter system regulon, CUT), which are secreted by a type II secretion system (T2SS) and are required for virulence and pathogenesis. These pathogens also possess two major xylanase-related genes, xynA and xynB, which could influence biofilm formation and virulence by weakening the plant cell wall structure through degradation causing the release of nutrients during plant colonization [54]. A xylanolytic-like system, ubiquitous in lignocellulose-degrading bacteria, is also found in E. coli [56], and thought to play important roles in biofilm formation, nutrient uptake and adaptation of these Proteobacteria to the plant phyllosphere [56]. Functional metagenomic findings reported by Carter et al. [57] and transcriptional analyses suggest that E. coli O157:H7 competes with spinach indigenous microflora for essential macronutrients which is thought to lead to its ability to contaminate spinach [57,58].
A xylose utilization operon (average size of~16,771 bp; 11 genes) which possessed a G + C content of 54.9%, was found among the spice-associated C. sakazakii strains. A map of the operon for C. sakazakii strain MOD1_AS15 is shown in Fig. 3a. The operon consists of the following genes: xylA (xylose isomerase, locus tag C5965_02230), xylB (xylulose kinase, locus tag C5965_02235), xylF (D-xylose ABC transporter substrate binding protein, locus tag C5965_02225), xylG (xylose ABC transporter ATP binding protein, locus tag C5965_02220), xylH (a sugar ABC transporter permease, locus tag C5965_02215), which is part of the ABC transporter complex XylFGH. This latter complex is involved in D-xylose uptake, xylR (an AraClike xylose operon transcription regulator, locus tag C5965_02210), bax (an ATP-ribonucleoside binding protein, locus tag C5965_02205), an α-amylase gene (amy1, locus tag C5965_02200), a valine-pyruvate transaminase gene (avtA, locus tag C5965_02195), xylS (an αxylosidase gene, locus tag C5965_02190), and a proposed α-xynT (glycoside-pentoside-heuronide family transporter, locus tag C5965_02185). Outside of the xylose utilization operon are other xyloside uptake genes and genes encoding degradation enzymes, such as a second xynT (a proposed β-xynT, locus tag C5965_04340), xynB (a β-xylosidase, locus tag C5965_04335), and xylE (a proton-sugar symporter (locus tag C5965_09300). This shares significant homology with xylE of E.coil, which is a member of the major facilitator superfamily (MFS) of transporters) possessed by E. coli and other bacteria [56]. The genomic structure of the Cronobacter xylose utilization operon was similar to that found in E. coli strain K-12 (strain MG1655; GenBank assembly accession: GCA_000005845; RefSeq assembly accession:GCF_0000 05845) except that two genes present in the Cronobacter xylose operon, xylS and α-xynT are missing from within the operon in E. coli strain MG1655 which resulted in1 3,041 bp sized operon. Additionally, there was a size difference (ranging from 16,340 to 16,790 bp) observed among the operons possessed by the twenty-six C. sakazakii strains, and there were four strains which differed in that bax and the α-xynT were either truncated or duplicated.
Previously we reported the presence of a xylose utilization operon in C. sakazakii strain GP1999, which was isolated from a tomato's rhizoplane/rhizosphere continuum [16]. Furthermore the xylose utilization operon was found in 29 other C. sakazakii strains [19] which were obtained from foods of plant origin and dried-food manufacturing environments, supporting the hypothesis that plants may be the ancestral econiche for Cronobacter spp., as posited by Schmid et al. [17] and Joseph et al. [32]. Among these strains, we also observed differences in size of the operon [19]. In comparison, the CUT-like xylose utilization operon possessed by X. axonopodis pv. citri strain AW12879 (NCBI GenBank assembly accession number: GCA_000349225; RefSeq assembly accession: GCF_000349225) comprises a total of 13 genes and was 25,382 bp in size. Noteworthy, within this operon, an IS3 family transposase was located next to an αglucosidase gene. Additional differences found were the presence of a TonB-dependent receptor gene and a LacI family transcriptional regulator gene (data not shown).
In the current report, we show the G + C content of a 17, 970 bp region upstream and a 17,422 bp region downstream of the C. sakazakii xylose utilization operon possessed G + C contents of 58.1 and 59.6%, respectively (data not shown). This change in G + C content suggests that the Cronobacter xylose utilization operon may be a predicted genomic (GI) or metabolic island [59]. Because bacterial genomes evolve through re-combinational events such as mutations, rearrangements, or horizontal gene transfer, we looked for clusters of genes of known or predicted GIs. Genomic islands were historically classified into distinct subtypes depending on the functions they encoded: e.g., symbiotic islands, metabolic islands, fitness islands, pathogenicity islands, or antibiotic resistance islands [60]. However, such G + C content change was not seen in the genomes of the E. coli and the X. axonopodis pv. citri strains. As shown in Fig.  3b, and similar to the xylose operon of E. coli strain MG1655 a number of sequence repeats (two in the case of MG1655) were located throughout the Cronobacter  Table S5. c Nucleotide sequence alignment captured in Geneious for repeat region five in xylB for MOD1_AS-15 and MOD-1_Jor22 showing the presence of the repeat region in MOD1_Jor22 (56,266 to 56,280, see red box). Note that bax can contain two to three identical repeat regions which suggest that this is an important highly regulated gene. bax has been shown to induce cell apoptosis of Arabidopsis protoplast cells through reactive oxygen independent and dependent processes namely DNA fragmentation, increased vacuolation, and loss of plasma membrane integrity [61] xylose operon (up to six sequence repeat regions were observed in some strains) suggesting that these are binding sites for regulatory proteins or that they may be evidence of past transpositions. For any one strain, there were multiple sequence repeats found. Table 7 shows examples of the various inverted repeats, palindromes and direct repeats observed in two C. sakazakii strains MOD1_Jor151 and MOD1_Jor173. Inverted and direct repeats were sometimes found in two different genes within the same strain (MOD1_Jor151 amy1 and xylS or xylG and xynT); while palindromic sequence was found in bax of MOD1_Jor151. Occasionally, the size of the sequence repeat varied between 15 or 16 bases (which are the default parameters for the sequence repeats finder algorithm within Geneious). Finally, the location of the sequence repeats and type of sequence repeats found among the strains generally followed sequence type evolutionary lines with the exception of ST4 strains (MOD1_Jor148, MOD1_Jor154) and ST643 strain (MOD1_Jor103) which possessed different palindromic sequences which were associated with hypothetical protein or bax. Additional file 5: Table S5 shows the location of each the identical repeat regions within each strain's xylose utilization operon. It should be noted that other palindromic inverted repeats (IR) of 10 to 13 nucleotides, separated by a 10-bp spacer, forming a stem-loop structure, are found on the virulence plasmids, pESA3 and pCTU1. Furthermore, Franco et al. [43] showed that a conserved pCTU1 region was located upstream of this IR, while the Cronobacter plasminogen activator locus on pESA3 was located downstream from this sequence repeat. Also, the upstream flanking gene seen in the Cronobacter xylose utilization operon was identified as a hydrolase and the downstream flanking gene was identified as DUF-2778. These two genes and their locations were conserved throughout the 26 spiceassociated C. sakazakii genomes. Figure 3c shows an alignment of a xylB gene that has the IR repeat region from strain MOD1_Jor22 compared to strain MOD1 _AS15 which lacks this repeat region. Note that bax can contain two to three identical repeat regions suggesting that this is a highly regulated gene. Bax has been shown to induce cell apoptosis of Arabidopsis protoplast cells through reactive oxygen independent and dependent processes namely DNA fragmentation, increased vacuolation, and loss of plasma membrane integrity [61]. Together, these results suggest that there is a virulence factor function to Bax and that the Cronobacter xylose utilization operon may be a predicted metabolic island. Figure 4 illustrates the proposed molecular basis of how C. sakazakii (strain MOD1_Jor22 as an example) may utilize D-xylose, xylose-containing plant cell wall polymers (xylans, hemicellulose-like, and cellulose) or αand β-xylosides. D-xylose enters the cytoplasm of a cell either by diffusion or by transport and binds to the AraC-like positive xylose operon transcription regulator, XylR. XylR is, identical to AraC which activates the transcription of the analogous arabinose utilization operon, araBAD, araE and araFGH operons, but represses the transcription of the araC operon. Once bound, XylR Genome assemblies were analyzed using the sequence repeat finder algorithm within Geneious. These two examples represent the various sequence repeat permutations found among the 26 spice-associated strains. For specific locations of the sequence repeats for each stain please refer to Additional file 5: Table S5 b Abbreviations: IR Inverted repeat, P Palindrome, DR Direct repeat c Numbers within the parenthesis refer to the start and end base position of sequence repeats within Geneous actuates the xylose regulon by activating the transcription of the xylFGH, xylR, xylAB, and xylE genes. In fact, in E. coli, the xylose transporters XylE and XylFGH can transport both arabinose and xylose; conversely the arabinose transporters AraE and AraFGH can take up xylose, even in the absence of arabinose [56]. As with arabinose, expression of the XylE and XylFGH transporters increases the rate of xylose uptake and further enhances activation of the regulon. Another set of genes, which are also outside the operon, may be triggered through the proposed activation of the xylose regulon: xynA encoding for Xylanase A (xynA, locus tag C5934 _19110) which is an Endo-1,4-β-xylanase and may be secreted by a proposed type 2 secretion system. A third pathway of xylose utilization, also seen in E. coli, was found in these Cronobacter spice strain's genomes and includes a xylulose reductase, an oxidoreductase (locus tag C5934_08370), and a NAD(P)-dependent alcohol dehydrogenase (locus tag C5934_08415) which are thought to be activated under anaerobic growth conditions [56]. D-xylose, or transported α/β-xylosides (via α/β-XynTs) are converted to D-xylose by α/β-xylosidases (XylS/ XynB) within the cell. It is not certain, at this time, how xylans are converted to α-xylosides in the extracellular milieu. However, the fact Cronobacter possess an α-xylosidases (xylS) and an adjacent xynT gene, suggests that that αxylosides may be transported into the cell and then converted to D-Xylose, which is then converted to D-xylulose by xylose isomerase (XylA) and then phosphorylated by Xylulose kinase (XylB). Then, xylulose 5-phosphate is metabolized by the enzymes of the pentose phosphate pathway [56]. Together these results support those reported by Srikumar et al. [30], which suggest that 5-carbon sugar physiological mechanisms utilized by Cronobacter plays important roles in its overall survival strategy.

Conclusions
Several lines of evidence posited by Schmid et al. [17] and Joseph et al. [32] suggest that the ancestral econiche for Cronobacter species may have been eukaryotic plants. It is interesting to speculate that both the survival mechanisms, which we now recognize through the use of NGS and the study of efflux of important molecules Fig. 4 Schematic representation of xylose utilization by C. sakazakii strain MOD1_Jor22. The proposed model for xylose utilization involves activation of the xylose regulon by the binding of D-xylose with XylR. It is thought that D-xylose enters the cell either through diffusion or transport via XylE or XylFGH. In addition, xylanase A (XynA) is secreted to the extracellular milieu through an unknown type 2 secretion component where it can digest xylan to β-xyloside which is then brought into the cell via a xyloside transporter (XynT, a putative β-xyloside transporter) where XynB (β-xylosidase) converts it to D-xylose. Though unconfirmed, αxyloside is thought to be transported into the cell where XylS (α-xylosidase) converts to D-xylose. Dxylose then is converted to D-xylulose by XylA (xylose isomerase) and then converted to D-xylulose-5P by XylB (xylulokinase). This physiological pathway is identical to that of E. coli. Similar to that of E. coli, Cronobacter also have anaerobic metabolic pathway where D-xylose is converted to xylitol by oxidoreductase and then converted to D-xyloulose using NAD(P)-dependent alcohol dehydrogenase. D-xylulose-5P is then shunted into pentose-phosphate pathway