The genome anatomy of Corynebacterium pseudotuberculosis VD57, a highly virulent strain causing caseous lymphadenitis

Corynebacterium pseudotuberculosis strain VD57 (Cp_VD57), a highly virulent, non-motile, non-sporulating, mesophilic bacterium, was isolated from a granulomatous lesion of a goat in the municipality of Juazeiro, Bahia State, Brazil. Here, we describe a set of features of the strain, together with the details of its complete genome sequence and annotation. The genome consists of a single circular chromosome of 2.5 Mbp with 2,101 protein-coding genes, 12 rRNA genes, 49 tRNA genes and 47 pseudogenes, and has a G + C content of 52.85 %. Genetic variation in Cp_VD57 was assessed using C. pseudotuberculosis strain 1002 as the reference, and small genomic insertions and deletions were identified. Comparative analysis of the genome sequence provides a means to better understand the host-pathogen interactions of this strain and can also help elucidate the molecular and genetic basis of virulence in this bacterium.


Introduction
Corynebacterium pseudotuberculosis is the etiologic agent of caseous lymphadenitis (CL) in sheep and goats; the organism has also been associated with mastitis [1][2][3] and can cause ulcerative lymphangitis in horses and cattle [4]. CL is a chronic disease characterized by the formation of granulomas in lymph nodes and internal organs, as a response of the host's immune system against this bacterium, which resists the bactericidal action of phagocytic cells [3].
CL is considered an economically important disease of small ruminants, with losses attributed to reduced wool and hide yields, carcass condemnation, morbidity and, rarely, mortality [5,6]. CL has been reported worldwide, including in South Africa, Brazil, the USA, Canada, Australia, New Zealand, the United Kingdom and Egypt [7].
A pangenome analysis of 15 strains of the pathogen was recently completed [8]. However, because C. pseudotuberculosis is a relatively clonal organism [9][10][11][12][13], the virulence mechanisms or nucleotide changes that make one strain more virulent than another have not yet been identified.
Sequencing new genomes, combined with deeper comparative analyses that are linked to host-pathogen interactions, can help identify the differences between genomes and the virulence factors they encode. In this context, the present study reports the genome sequence of the highly virulent strain VD57 as a step toward understanding its virulence factors.

Organism information
Classification and features
C. pseudotuberculosis is a Gram-positive bacterium belonging to the CMNR (Corynebacterium, Mycobacterium, Nocardia and Rhodococcus) group, whose members share characteristics including an outer lipid layer and mycolic acids in the cell wall, as well as phospholipids and lipomannans [7]. C. pseudotuberculosis is a facultative intracellular pathogen that is non-motile and non-sporulating, possesses fimbriae, and shows pleomorphic forms ranging from coccoids to filamentous rods, measuring 0.5-0.6 μm by 1.0-3.0 μm [7].
C. pseudotuberculosis strain VD57 (Cp_VD57) was isolated from a goat's granulomatous lesion in the municipality of Juazeiro, Bahia State, Brazil. The bacterium was identified by Gram staining, colony morphology, synergistic hemolysis with Rhodococcus equi on Brain Heart Infusion (BHI) blood agar medium, and biochemical assays using the API Coryne system (bioMérieux). The strain is maintained in BHI broth at the Microbiology Laboratory of the Federal University of Bahia [14,15].
C. pseudotuberculosis strain VD57 has been shown to be highly pathogenic to goats and mice [14]. In goats, Cp_VD57 induced IFN-gamma production on day 5 after infection. It also induced a positive antibody titer between 6 and 11 days after infection [16]. In a murine experimental model, the strain induced high mortality compared with the attenuated T1 strain, confirming its virulent profile [15]. Moura-Costa et al. used Cp_VD57 to challenge goats immunized with the attenuated T1 strain; immunization conferred 33.3 % protection and elicited a strong humoral response, but it did not prevent the spread of the virulent bacterium in most of the vaccinated animals [14].
One of the most important areas of C. pseudotuberculosis research is the identification of genes that are differentially expressed in bacterial cultures and inside granulomatous lesions. In this regard, strain VD57 was used in a study aimed at determining reference genes for quantitative real-time PCR (qPCR). Eight of the genes tested (atpA, dnaG, efp, fusA, gyrA, gyrB, rpoB, and rpoC), most of which participate in DNA replication and transcription, were found to be useful as candidate reference genes, while DNA gyrase subunit A (gyrA) and elongation factor G (fusA) showed the most suitable profiles for qPCR studies [17]. Figure 1 shows a phylogenetic tree of Corynebacterium pseudotuberculosis strain VD57 based on the rpoB gene (β subunit of RNA polymerase). The classification and general features of C. pseudotuberculosis strain VD57 are summarized in Table 1.
De Souza et al. used strain VD57 to investigate the activation of intracellular signaling cascades during infection of splenocytes and the importance of these signaling pathways for the production of different cytokines. VD57 induced TNF-alpha production through the p38 MAPK pathway and IL-10 production via the ERK-1/2 pathways. Complete genome sequencing and analysis will help identify the genetic background of the strain and the genes that may be involved in infection [18].
Fig. 1 Phylogenetic tree of C. pseudotuberculosis strain VD57 showing its position relative to type strains of the Corynebacteriaceae and some other type strains of the CMNR group. The tree was inferred from 3,537 aligned characters of the rpoB gene sequence using the maximum likelihood method and then checked for agreement with the current classification in Table 1. Branch lengths represent the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 bootstrap replicates, shown when larger than 60 %. Phylogenetic distances were calculated with MEGA v6 [40]. GenBank accession numbers are shown in parentheses.
Table 1 Classification and general features of C. pseudotuberculosis strain VD57 (evidence code TAS: traceable author statement)
Phylum: Actinobacteria (TAS [31])
Class: Actinobacteria (TAS [32])
Order: Actinomycetales
Suborder: Corynebacterineae (TAS [32,33])
Family: Corynebacteriaceae (TAS [32][33][34][35])
Genus: Corynebacterium (TAS [36][37][38])
Species: Corynebacterium pseudotuberculosis (TAS [37,39])
Gram stain: Positive (TAS [14])
Cell shape: Bacilli (TAS [14])
Motility: Non-motile (TAS [14])
Sporulation: Non-sporulating (TAS [14])
Temperature: …
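The support values in Fig. 1 come from 1,000 bootstrap replicates of the 3,537-site rpoB alignment. As an illustration of what a replicate is, the sketch below implements the standard nonparametric bootstrap: alignment columns are resampled with replacement, and every sequence receives the same column order. Each pseudo-replicate would then be analyzed by maximum likelihood and the frequency of each clade tallied; that tree-building step is not shown, this is not MEGA's internal code, and the function names and toy alignment are hypothetical.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one nonparametric bootstrap pseudo-replicate of an alignment.

    Columns (sites) are drawn with replacement until the replicate has as many
    columns as the original; all rows use the same sampled column indices, so
    positional homology between sequences is preserved.
    """
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(row[c] for c in cols) for row in alignment]


def bootstrap_replicates(alignment: List[str], n_replicates: int = 1000,
                         seed: int = 1) -> List[List[str]]:
    """Generate n_replicates pseudo-replicates with a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A fixed seed makes the set of replicates reproducible, which is useful when support values have to be recomputed.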

Genome project history
In the present study, we determined the genome sequence of C. pseudotuberculosis strain VD57 (Cp_VD57), which was isolated from a goat granulomatous lesion. Sequencing, assembly, and annotation were performed at the Laboratory of Cellular and Molecular Genetics (LGCM), Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil, and at Aquacen - National Reference Laboratory for Aquatic Animal Diseases, Federal University of Minas Gerais, Brazil. The complete genome sequence and annotation of Cp_VD57 were deposited in GenBank under accession number CP009927. Table 2 presents the project information in accordance with the Minimum Information about a Genome Sequence (MIGS) standard [19].
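Because the sequence is public under accession CP009927, it can be retrieved programmatically. A minimal sketch, assuming access through NCBI's E-utilities efetch endpoint with db=nuccore: rettype="fasta" returns the nucleotide sequence and rettype="gbwithparts" the GenBank flat file with features and sequence. The helper name is hypothetical; NCBI also recommends adding email and api_key parameters for regular use.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for a nucleotide (nuccore) accession.

    rettype="fasta" requests the sequence; rettype="gbwithparts" requests the
    full GenBank flat file including annotated features.
    """
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"
```

The URL can then be opened with urllib.request.urlopen(efetch_url("CP009927")) and the response body saved as a FASTA file.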

Growth conditions and genomic DNA preparation
Cp_VD57 was grown in brain heart infusion medium (BHI; HiMedia Laboratories Pvt. Ltd, India) with rotation at 37 °C. Chromosomal DNA was extracted from 30 mL of a 48-72 h bacterial culture, which was centrifuged at 4000 rpm for 15 min at 4 °C. Cell pellets were resuspended in 600 μL of Tris/EDTA/NaCl [10 mM Tris/HCl (pH 7.0), 10 mM EDTA (pH 8.0), and 300 mM NaCl] and transferred to bead tubes for cell lysis in a Precellys®24-Dual homogenizer (two 15 s cycles at 6500 rpm with a 30 s interval). DNA was then purified with phenol/chloroform/isoamyl alcohol (25:24:1) and precipitated with ethanol/NaCl/glycogen (2.5 volumes of ethanol, 10 % NaCl and 1 % glycogen). The DNA was resuspended in 30 μL of Milli-Q® water. Its concentration was determined spectrophotometrically, and the DNA was visualized on an ethidium bromide-stained 0.7 % agarose gel.
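The centrifugation step is specified in rpm, which does not by itself determine the force applied to the cells; that also depends on the rotor radius, which is not reported. A minimal sketch of the standard conversion RCF = 1.118 × 10^-5 × r × rpm², assuming r is the rotor radius in centimetres. The function name and the 15 cm radius mentioned below are hypothetical, not the rotor used in this work.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Convert rotor speed (rpm) to relative centrifugal force (x g).

    Standard formula RCF = 1.118e-5 * r * rpm**2 with r in centimetres.
    """
    if radius_cm <= 0:
        raise ValueError("rotor radius must be positive")
    return 1.118e-5 * radius_cm * rpm ** 2
```

With a hypothetical 15 cm rotor, rpm_to_rcf(4000, 15) gives about 2,683 × g; reporting the force in × g makes the step transferable between centrifuges.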

Genome sequencing and assembly
The genome was sequenced on the Ion Personal Genome Machine® System (Life Technologies) using a fragment library. High-quality reads were assembled de novo using MIRA 4.0 [20]. The assembly produced a total of 15 contigs, with a coverage of 78.22x and an N50 contig length of 405,436 bp. A scaffold was then created using CONTIGuator 2 [21], with the genome sequence of C. pseudotuberculosis strain 1002 (NC_017300.1) as reference. The gaps were closed manually using CLC Genomics Workbench 7 [22].
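The assembly statistics above (N50 and coverage) follow common definitions, sketched below. N50 is the contig length L such that contigs of length L or longer together hold at least half of the assembled bases. Mean coverage is estimated here as total sequenced bases divided by genome length; this is an assumption, since MIRA may report depth computed differently (for example from reads actually placed in the assembly). The contig lengths in the tests are toy values, not the 15 VD57 contigs.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached: the loop returns once half is covered


def mean_coverage(total_read_bases: int, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_read_bases / genome_length
```

For example, contigs of 100, 200, 300 and 400 bp hold 1,000 bp in total; the two longest already cover 700 bp, so the N50 is 300 bp.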

Genome annotation
Gene annotation was transferred with our in-house scripts, using C. pseudotuberculosis strains 1002, 258 … as references. Manual annotation was performed using Artemis [23]. Other elements, namely rRNAs, tRNAs, and repetitive regions, were predicted using RNAmmer [24], tRNAscan-SE [25], and Tandem Repeats Finder [26], respectively. Enzyme Commission (EC) numbers were predicted using the RAST server [27].
Fig. 2 Graphical circular map of the genome [41]. From the center outward: biovar Ovis strains in wine, biovar Equi strains in blue, RNA genes (tRNAs green, rRNAs orange, tRNAs red), GC content in black, and GC skew.
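The GC content and GC skew tracks of the circular map follow standard definitions: GC content is the share of G and C among the bases, and the GC skew of a window is (G - C)/(G + C). A minimal sketch of both, assuming a plain nucleotide string; window and step are hypothetical parameters, windows do not wrap around the origin of the circular chromosome, and this is not the code of the mapping software cited for Fig. 2 [41].

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases (0.0 if none)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def gc_skew_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC skew (G - C) / (G + C) in sliding windows.

    Returns (1-based window start, skew) pairs; a window without G or C has
    skew 0.0. Windows are linear and do not wrap around the sequence end.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    s = seq.upper()
    out: List[Tuple[int, float]] = []
    for start in range(0, max(len(s) - window, 0) + 1, step):
        w = s[start:start + window]
        g, c = w.count("G"), w.count("C")
        out.append((start + 1, (g - c) / (g + c) if g + c else 0.0))
    return out
```

In bacterial chromosomes the cumulative GC skew typically changes sign near the origin and terminus of replication, which is why it is plotted as a separate track.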

Genome properties
The genome is 2,337,177 bp long and consists of a single circular chromosome with a GC content of 52.19 %. A total of 2,148 genes were predicted, of which 2,101 are protein-coding genes and 61 are RNA genes; 47 pseudogenes were also identified. The properties and statistics of the Cp_VD57 genome are listed in Table 3. The distribution of genes into COG functional categories is presented in Table 4, followed by a cellular overview diagram in Fig. 2 and a summary of metabolic network statistics in Table 5.
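Genome statistics tables usually give each gene class as a percentage of a total. A minimal sketch of that arithmetic, assuming the percentages are expressed relative to the 2,148 predicted genes reported above; the dictionary and function names are hypothetical.

```python
TOTAL_GENES = 2148  # total predicted genes, as reported in the text

GENE_COUNTS = {
    "protein-coding genes": 2101,
    "RNA genes": 61,
    "pseudogenes": 47,
}


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Percentage of `count` relative to `total`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 2)


if __name__ == "__main__":
    for label, count in GENE_COUNTS.items():
        print(f"{label}: {count} ({percent_of_total(count)} %)")
```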

Insights from the genome sequence
Genetic variation appears to be limited in C. pseudotuberculosis, which has previously been shown to be genetically homogeneous [9][10][11][12][13]. Multi-locus sequence typing (MLST) of 64 biovar Ovis strains identified seven sequence types (STs), all of which were clonally derived according to eBURST analysis when a clonal complex was defined as sharing 7 of 8 loci; strain Cp_VD57 was included in that analysis [28]. Although the genetic variation is clearly very small, we analyzed the complete Cp_VD57 genome to detect single nucleotide polymorphisms (SNPs). The detected SNPs are listed in Table 6.
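The grouping rule used in the MLST analysis can be illustrated with a small sketch. Assuming hypothetical 8-locus allelic profiles, STs are linked when they share alleles at 7 or more loci, and linked STs are merged into groups by single linkage (an ST joins a group if it is linked to any member). This mirrors only the group-definition step of eBURST; founder identification and the other eBURST outputs are not reproduced, and all names and profiles are illustrative.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # allele number at each MLST locus


def shared_loci(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))


def group_sts(profiles: Dict[str, Profile], min_shared: int) -> List[List[str]]:
    """Single-linkage grouping of STs sharing at least `min_shared` alleles."""
    names = list(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shared_loci(profiles[a], profiles[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Because linkage is transitive, two STs that differ at two loci can still fall into the same group when an intermediate ST differs from each of them at only one locus.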
SNPs were detected with MUMmer [29] using default parameters. The SNP results agree with the literature: although these strains were isolated from several hosts in different countries, C. pseudotuberculosis strains show limited genetic differences worldwide.
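MUMmer works at genome scale: nucmer aligns the two genomes, and show-snps then reports SNPs and indels from that alignment. The sketch below illustrates only the final classification step on a toy, already-aligned pair of sequences, assuming "-" marks gaps. It is not MUMmer's algorithm, and the function name and sequences are hypothetical. Matched columns that differ are SNPs, and runs of gaps are merged into single insertion or deletion events positioned on the reference.

```python
from typing import List, Tuple

# (event type, 1-based reference position, reference allele, query allele)
Variant = Tuple[str, int, str, str]


def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """Classify columns of a gapped pairwise alignment as SNPs or indels.

    Consecutive gap columns of the same kind form one event. Insertions are
    reported at the reference base preceding them; deletions at their first
    deleted reference base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants: List[Variant] = []
    ref_pos = 0  # reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":
            i += 1  # gap in both rows carries no information
        elif r != "-" and q != "-":
            ref_pos += 1
            if r != q:
                variants.append(("SNP", ref_pos, r, q))
            i += 1
        elif r == "-":
            # insertion: query bases absent from the reference
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            variants.append(("INS", ref_pos, "", qry_aln[i:j]))
            i = j
        else:
            # deletion: reference bases absent from the query
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("DEL", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
    return variants
```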
Small genomic insertions and deletions were identified using strain 1002, which is closely related to Cp_VD57, as the reference. MUMmer [29] identified 425 indels in Cp_VD57, 18 of which were in coding regions. In addition, three major indel regions were identified between strains 1002 and VD57: two insertions and one deletion. The first insertion is located at coordinates 966430 to 968875 and comprises 2445 bp; this region contains four genes and is also present in biovar Equi strains. The second insertion is located at coordinates 1182765 to 1182855 (90 bp), within a gene encoding a hypothetical protein. Finally, the deleted region corresponds to coordinates 1575360-1576000 in strain 1002 and comprises 640 bp of aceF pseudogenes.
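The region sizes quoted above can be checked against their coordinates. The stated lengths (2445, 90 and 640 bp) equal end minus start, which matches a half-open interval convention; with 1-based closed intervals each region would be one base longer. A minimal sketch, where the convention is an assumption and the names are hypothetical:

```python
# Coordinates as reported in the text; the interval convention is an assumption.
REGIONS = {
    "insertion 1 (VD57)": (966430, 968875),
    "insertion 2 (VD57)": (1182765, 1182855),
    "deletion (strain 1002)": (1575360, 1576000),
}


def region_length(start: int, end: int, inclusive: bool = False) -> int:
    """Length of a region from its start/end coordinates.

    inclusive=False treats the interval as half-open (end - start);
    inclusive=True counts both endpoints (end - start + 1).
    """
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + (1 if inclusive else 0)


if __name__ == "__main__":
    for name, (s, e) in REGIONS.items():
        print(f"{name}: {region_length(s, e)} bp")
```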

Conclusions
Isolates of C. pseudotuberculosis are genetically homogeneous. Multi-locus sequence typing and comparative genomic analysis show that biovar Ovis isolates appear to fall into the same clades. Despite the general similarity between C. pseudotuberculosis strains, some are more virulent than others, such as strain VD57 presented in this paper. Comparative analyses of Cp_VD57 and the genome sequences of other C. pseudotuberculosis strains may be useful for identifying genome variations.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
SA: wrote the manuscript, performed manual curation, and analyzed the data. ST: drafted the manuscript and analyzed the data. DM: genome assembly and analysis of raw data. FS and FAD performed laboratory experiments. SBJ, RTR and NC: annotated the genome. LFM, RP and RM: performed the microbiology and molecular biology studies. FLP: development of scripts for analysis of raw data. SCS, CAGL and AFC: analysis of raw data. VA: wrote the manuscript. VA, AS, DB, PG and HF: contributed reagents/materials/analysis tools. All authors read and approved the final manuscript.