First genome sequencing and comparative analyses of Corynebacterium pseudotuberculosis strains from Mexico

Corynebacterium pseudotuberculosis is a pathogenic bacterium that has been spreading rapidly worldwide, causing economic losses in the agricultural sector and sporadically infecting humans. Six C. pseudotuberculosis strains were isolated from goats, sheep, and horses with distinct abscess locations. For the first time, Mexican genomes of this bacterium were sequenced and studied in silico. All strains were sequenced using the Ion Personal Genome Machine sequencer and assembled using the Newbler and SPAdes software. Automatic genome annotation was done using RAST and in-house scripts for annotation transfer, followed by manual curation using the Artemis software and BLAST searches against the NCBI and UniProt databases. The six genomes are publicly available in the NCBI database. The analysis of nucleotide sequence similarity and the generated phylogenetic tree showed that Mexican strains from the same host are more similar to each other, but the genetic structure is probably influenced more by the transportation of animals between farms than by host preference. In addition, a putative drug target was predicted, and an in silico analysis of 46 strains revealed two gene clusters capable of differentiating the biovars equi and ovis: a restriction-modification system and a CRISPR-Cas cluster. Electronic supplementary material: The online version of this article (10.1186/s40793-018-0325-z) contains supplementary material, which is available to authorized users.


Introduction
Corynebacterium pseudotuberculosis is a Gram-positive bacterium that infects several species of mammals. Strains of the biovar ovis infect sheep and goats, and strains of the biovar equi infect larger mammals such as horses, camels, and buffaloes. The manifestation of the infection depends on the host [1][2][3][4]. This bacterium causes significant economic losses to animal production worldwide due to reduced production of wool, milk, and meat, carcass condemnation, and the death of infected animals [4][5][6]. C. pseudotuberculosis can also affect humans, causing distinct kinds of lymphadenitis. Transmission occurs through contact with infected animals and consumption of contaminated food [4,5,7]. This organism has been reported in several countries, such as Australia, Brazil, Canada, Egypt, Israel, New Zealand, South Africa, the United Kingdom, and the United States [4,8-17]. Cases in other countries, such as Portugal [18], Mexico [19], and Equatorial Guinea [20], have been reported in recent years. In the United States, C. pseudotuberculosis infections are reemerging and considered endemic [19], and the state with the highest number of cases was Texas, which borders Mexico [21]. The spread of C. pseudotuberculosis to other countries underscores the importance of improving the understanding of this bacterium. In the present study, six Mexican C. pseudotuberculosis strains were investigated, two from the biovar equi and four from the biovar ovis. This is the first time that strains of this bacterium isolated in Mexico have been completely sequenced. Two of them are also the first isolates of the biovar equi from this country [19]. The characterization of these strains is important for a better understanding of this species, since they may present relevant features not yet identified in other strains.

Organism information
C. pseudotuberculosis is a pathogenic bacterium that belongs to the CMNR (Corynebacterium, Mycobacterium, Nocardia, and Rhodococcus) group. This group is characterized by a high GC content (46-74%) and by a cell wall composed mainly of peptidoglycan, arabinogalactan, and mycolic acids [4,22]. C. pseudotuberculosis is placed in the phylum Actinobacteria, class Actinobacteria, order Actinomycetales, suborder Corynebacterineae, and genus Corynebacterium [23][24][25][26][27][28][29][30]. The species is considered a facultative intracellular pathogen [4,31] that is Gram-positive, pleomorphic, non-motile, non-sporulating, and mesophilic, and that can survive both in the host and in the soil [25,31-35]. Its strains are classified into two biovars, ovis and equi, according to their host preference and nitrate reduction capacity, which is identified through the presence or absence of the narG gene in a multiplex PCR test [36]. The biovar equi can reduce nitrate and affects mostly larger mammals. The biovar ovis is not able to reduce nitrate and affects mostly small ruminants [4]. More information about the classification and general features of this species, and some details about the target strains, is shown in Table 1 (Additional file 1).
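As a small illustration of the GC-content criterion mentioned above, the following sketch computes the GC percentage of a nucleotide sequence and checks it against the 46-74% range quoted for the CMNR group. The function names and the choice to ignore ambiguous bases are assumptions made for this example, not part of the study's workflow.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


def in_cmnr_gc_range(percent: float, low: float = 46.0, high: float = 74.0) -> bool:
    """Check a GC percentage against the 46-74% range quoted for the CMNR group."""
    return low <= percent <= high
```

For a real genome, the sequence string would be read from a FASTA file first; ambiguous bases such as N are simply excluded from the denominator here.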
Six C. pseudotuberculosis strains of both biovars were isolated in Mexico from different hosts. The strain MEX1 was isolated from a retropharyngeal abscess in a goat. The strain MEX9 was isolated from a prescapular abscess in a goat. The strain MEX25 was isolated from a parotid abscess in a sheep. The strain MEX29 was isolated from a retropharyngeal abscess in a sheep. These four strains tested negative for the narG gene in the multiplex PCR test and were classified as belonging to the biovar ovis. All ovis strains were obtained from outbreaks that occurred relatively close to Mexico City. MEX30 and MEX31 were isolated from abscesses in the pectoral muscles of two horses [19]. These two strains tested positive for the narG gene in the multiplex PCR test and were consequently classified as belonging to the biovar equi. Although both equi strains were obtained in the same city, they can be considered isolated cases.
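The classification rule described above (narG detected means biovar equi, narG absent means biovar ovis) can be written down as a minimal sketch. The strain results are taken from the text; the dictionary and function names are illustrative only.

```python
# narG multiplex PCR results as reported in the text (True = narG detected).
NARG_PCR_RESULT = {
    "MEX1": False,
    "MEX9": False,
    "MEX25": False,
    "MEX29": False,
    "MEX30": True,
    "MEX31": True,
}


def biovar_from_narg(narg_detected: bool) -> str:
    """Assign the biovar from the presence (equi) or absence (ovis) of narG."""
    return "equi" if narg_detected else "ovis"


# Map every strain to its inferred biovar.
biovars = {strain: biovar_from_narg(hit) for strain, hit in NARG_PCR_RESULT.items()}
```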
To verify the phylogenetic relationship of these strains to other strains of C. pseudotuberculosis, we generated a phylogenetic tree (Fig. 1) based on the core proteome and progressive refinement, using a bootstrap value of 100. The tree was generated using the PEPR software (https://github.com/enordber/pepr.git) with the Maximum-Likelihood method. The Mexican strains clustered according to their respective biovars and host preferences, as shown in previous works [1,37].
MEX30 and MEX31 were isolated in Valparaiso, in the first reported case of infection of horses in Mexico [19]. They probably clustered together because they came from the same source, possibly infected animals that had been transported. Affected horses have been identified in all regions of the US, and the state of Texas, which borders Mexico, has the highest number of cases [9,21].
Ovis strains were isolated in Tlaxcala (MEX1) and Rio Frio de Juárez (MEX29), about 50 km apart, and in Guanajuato (MEX9 and MEX25), 400-450 km from the two other isolation localities. However, the strains cluster by host rather than by locality of isolation. MEX1 and MEX9 were isolated from goats, and MEX25 and MEX29 were isolated from sheep. However, MEX25 and MEX29 (sheep) clustered with isolates from llama (USA) and cow (Israel), while MEX1 and MEX9 (goat) clustered with isolates from goat and sheep (Brazil), all with 100% bootstrap support. Strains of the biovar ovis are more clonal but do not show the same degree of clustering by host as strains of the biovar equi [1,37]. Considering a maximum distance of 450 km between the localities of isolation, this genetic structure could be better explained by farming history than by host preference. The goat and sheep farms could have different sources of ovis strains. Transportation of infected animals, followed by contact and transmission of the disease, probably occurred between farms of the same host species [38][39][40].

Genome project history
The present project is a collaboration between the National Autonomous University of Mexico (UNAM), Mexico City, Mexico, and the Federal University of Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil. The six C. pseudotuberculosis strains were isolated by UNAM researchers. Sequencing was performed at the National Reference Laboratory for Aquatic Animal Diseases (AQUACEN), and assembly and annotation were performed at the Laboratory of Cellular and Molecular Genetics (LGCM), both laboratories located at UFMG. All genomes are complete and available at the National Center for Biotechnology Information (NCBI). This information is shown in Table 2 and conforms with MIGS recommendations [41]. As mentioned above, the present study reports the first sequencing of C. pseudotuberculosis, and the first isolation of the biovar equi, from Mexico. These data can provide new insights into the diagnosis and treatment of diseases caused by this organism.

Growth conditions and genomic DNA preparation
The samples used in the present study are held in the sample collection of LGCM. All six strains were grown in brain-heart infusion medium (BHI; HiMedia Laboratories Pvt. Ltd., India) with 1.5% bacteriological agar, supplemented with 0.5% Tween 80, at 37°C for 72 h under rotation. Genomic DNA was extracted following the protocol of Pacheco et al. [36].

Genome sequencing and assembly
The first step in sequencing each genome was library construction, following the manufacturer's recommendations (IonXpress™ Plus gDNA Fragment Library Preparation). This was performed in three steps: (i) DNA fragmentation using the Ion Shear™ Plus Reagents Kit, (ii) addition of adapters using Ion Xpress™ Bar

The assembly process was managed using the SIMBA software [42]. The quality of the reads was assessed using the FastQC software [43]. The assemblies were performed using SPAdes version 3.6 [44], and the quality of the assemblies for all strains was evaluated [45]. The scaffolds were constructed using the CONTIGuator software version 2.0 [46] with C. pseudotuberculosis strain 29156 (CP010795.1) as a reference for MEX9, MEX25, and MEX29, C. pseudotuberculosis strain MEX9 as a reference for MEX1, C. pseudotuberculosis strain 316 (CP003077.1) as a reference for MEX30, and C. pseudotuberculosis strain E19 (CP012136.1) as a reference for MEX31. Gap closure was performed using CLC Genomics Workbench 7 (Qiagen, USA). This process resulted in six complete genome sequences.

Table 1. Classification of Corynebacterium pseudotuberculosis, with evidence codes
Phylum: Actinobacteria, TAS [24]
Class: Actinobacteria, TAS [25]
Order: Actinomycetales
Suborder: Corynebacterineae, TAS [25][26][27][28]
Family: Corynebacteriaceae, TAS [25,28]
Genus: Corynebacterium, TAS [29,30]
Species: Corynebacterium pseudotuberculosis, TAS [26,29]
Evidence codes. IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [75].
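The text of this section does not state which statistics were used to evaluate the assemblies. As an illustration only, the sketch below computes N50, a contiguity statistic commonly reported for draft assemblies; treating it as the metric used here is an assumption, and the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
```

A complete, single-contig genome such as the six reported here trivially has an N50 equal to the genome length, so the statistic is mainly informative before scaffolding and gap closure.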

Genome annotation
Genome annotation was performed in two steps: automatic annotation and manual curation. The RAST [47] and tRNAscan-SE [48] software were used in the automated annotation. An in-house script was also employed to transfer the annotation from a reference genome. The Artemis software version 16.0.0 [49], the UniProt [50] and the National Center for Biotechnology Information (NCBI) databases [51] were used in the manual curation.
Putative frameshifts were analyzed using CLC Genomics Workbench 7 (Qiagen, USA) and fixed whenever possible. Table 3 shows detailed information about the properties and statistics of these genomes. The number of genes associated with each general COG functional category [53,54] was generated with the in-house script Blast Cog (https://github.com/aquacen/blast_cog) and is summarized in Table 4. The circular maps of the C. pseudotuberculosis MEX1 and MEX30 strains, in comparison with the other strains of the present study, are shown in Figs. 2 and 3, respectively.
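The kind of tally summarized in a COG table can be sketched as follows. This is not the Blast Cog script itself; the input format (a mapping from locus tag to a string of one-letter COG category codes) and the function names are assumptions made for illustration. Percentages are computed against the total number of protein-coding genes.

```python
from collections import Counter


def count_cog_categories(gene_to_cog):
    """Count genes per one-letter COG category.

    gene_to_cog maps a locus tag to a string of category letters
    (e.g. "J" or "KL"); an empty string or None means no assignment,
    which is tallied under "-".
    """
    counts = Counter()
    for categories in gene_to_cog.values():
        if not categories:
            counts["-"] += 1
            continue
        for letter in categories:
            counts[letter] += 1
    return counts


def as_percentages(counts, total_protein_coding_genes):
    """Express each category count as a percentage of all protein-coding genes."""
    return {
        category: round(100.0 * n / total_protein_coding_genes, 2)
        for category, n in counts.items()
    }
```

Because a gene assigned to two categories is counted in both, the percentages can sum to more than 100.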

Insights from the genome sequence
The nucleotide sequences, analyzed using the Gegenees software version 2.1 [55], show high similarity (> 92%) between the strains. Higher similarity (≥ 99.7%) was found among strains belonging to the same biovar (Fig. 4). This is consistent with a previous study [1] of 15 strains of C. pseudotuberculosis, which showed similarity greater than 99% among the biovar ovis strains and at least 95% sequence similarity among the biovar equi strains. Moreover, the sequence similarity among strains isolated from the same host is higher than the similarity among strains isolated from different hosts (Figs. 1 and 4). Traditionally, the two biovars are differentiated using a nitrate reduction test, in which equi is positive and ovis is negative [56]. Figure 3 highlights, with a black rectangle, the cluster of genes related to nitrate reduction in the Mexican equi strains. The Protein Family Sorter tool [57] was used to search for genes or clusters of genes that may be used to differentiate the biovars. Within the six genomes of the present study, we found a cluster of genes related to proteins of type III restriction-modification (RM) systems [58,59] exclusively in the biovar ovis (highlighted in blue in Fig. 2). A cluster of genes related to the proteins of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR-Cas) systems, probably belonging to type I-E [60], was found exclusively in the biovar equi (highlighted in blue in Fig. 3). Both systems work as protection systems, defending the bacteria against exogenous DNA. We analyzed 40 other sequenced strains of C. pseudotuberculosis to check whether these results hold more broadly, and the same pattern was observed. RM systems have two main components, a DNA methyltransferase and a restriction endonuclease. The first methylates the DNA at possible cleavage sites; the second is responsible for the cleavage of DNA from external sources [61]. A good review of RM systems can be found in [62]. CRISPR-Cas systems are adaptive immune systems in bacteria and archaea. They use a complex of proteins known as Cas that are responsible for acquiring new, short sequences from external sources (exogenous genetic elements). These short sequences are incorporated into the bacterial chromosome and are called CRISPRs. The CRISPRs are transcribed into small RNAs that guide the Cas proteins to recognize and cleave foreign DNA, protecting the bacterial genome [63]. Reviews of CRISPR-Cas systems can be found in [63][64][65].
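The search for biovar-specific gene clusters described above amounts to a presence/absence comparison of protein families across strains. The sketch below shows that set logic in isolation; it does not reproduce the Protein Family Sorter tool, and the function name, input structure, and family identifiers are hypothetical.

```python
def biovar_specific_families(presence, biovar_of):
    """Find protein families present in every strain of one biovar and in no other strain.

    presence:  dict mapping strain name -> set of protein family identifiers
    biovar_of: dict mapping strain name -> biovar label
    Returns a dict mapping each biovar to its set of exclusive families.
    """
    result = {}
    for biovar in set(biovar_of.values()):
        inside = [presence[s] for s in presence if biovar_of[s] == biovar]
        outside = [presence[s] for s in presence if biovar_of[s] != biovar]
        # Families shared by all strains of this biovar.
        shared = set.intersection(*inside) if inside else set()
        # Families seen in any strain of another biovar.
        elsewhere = set.union(*outside) if outside else set()
        result[biovar] = shared - elsewhere
    return result
```

With real data, the RM-related families would be expected in the ovis result and the CRISPR-Cas-related families in the equi result, in line with the pattern reported for the 46 strains.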
Possible new drug targets were predicted using the Specialty Genes Search from the Pathosystems Resource Integration Center (PATRIC) bioinformatics resource center [66]. The result shows a new putative target, the gene nrdF2, for five of the six strains used in the present study. In the C. pseudotuberculosis MEX30 strain, this gene is annotated as a pseudogene, which can explain why it was not considered a putative target. The product of this gene is the small subunit of ribonucleotide reductase (RNR), an enzyme involved in dNTP (deoxynucleoside triphosphate) synthesis that reduces ribonucleotides to deoxyribonucleotides. RNRs can be classified into three classes (I, II, and III). Class I is oxygen dependent and has two subclasses (Ia and Ib). Class Ia is encoded by the nrdA and nrdB genes; class Ib is encoded by nrdE and nrdF. Therefore, the RNR found in the biovar ovis strains belongs to class Ib [67]. Previous studies [68][69][70] show the importance of this gene for growth under normal conditions (in vitro) in Mycobacterium tuberculosis, Corynebacterium ammoniagenes, and Corynebacterium glutamicum. Additionally, other studies have pointed to this gene as a potential target for an M. tuberculosis vaccine [70][71][72].
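The reasoning that places nrdF2 in class Ib can be expressed as a simple lookup. The mapping below covers only the class I genes named in the text; the function name and the rule of stripping a trailing copy number (so that nrdF2 is treated as nrdF) are assumptions made for this example.

```python
import re

# Class I RNR subclasses and their genes, as stated in the text.
RNR_CLASS_BY_GENE = {
    "nrdA": "Ia",
    "nrdB": "Ia",
    "nrdE": "Ib",
    "nrdF": "Ib",
}


def rnr_class(gene_name):
    """Return the class I RNR subclass for a gene name, or None if it is not listed.

    A trailing copy number is ignored, so "nrdF2" is looked up as "nrdF".
    """
    base = re.sub(r"\d+$", "", gene_name.strip())
    return RNR_CLASS_BY_GENE.get(base)
```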

Conclusions
In the present study, we investigated six strains of C. pseudotuberculosis from different hosts and their sequenced genomes, in the first whole-genome investigation of this organism from Mexico. The phylogenomic analysis suggested that the genetic structure of the biovar ovis is influenced more by animal transportation than by host preference. An in silico analysis of protein families showed two important clusters that may differentiate the biovars equi and ovis. The present work also identified a new putative drug target against C. pseudotuberculosis, the gene nrdF2, which has been previously described as a potential vaccine target [70][71][72]. Further in silico and in vitro analyses are required to validate these findings. These results could provide a better understanding of this organism and its mechanisms of virulence and pathogenesis, and could support the development of new diagnostics, vaccines, and treatments.