The draft genome of Brucella abortus strain Ba col-B012, isolated from a dairy farm in Nariño, Colombia, bring new insights into the epidemiology of biovar 4 strains


 Brucellosis is a commonly diagnosed zoonosis that causes infertility and abortion in cattle, it is acquired from handling of infected animals or consuming contaminated milk or milk products. In Colombia, it belongs to the official notifiable disease list, despite its relevance little is known about the origin, epidemiology and the genetic constituents of the strains circulating in dairy farms. Here we present the draft genome of B. abortus Ba Col-B012, an isolate obtained from a female Holstein belonging to a dairy farm in Nariño, Colombia. This genome comprises 3,234,714 bp and 3018 predicted protein-encoding genes. Using comparative genomics and phylogenetic analysis, we found that the strain Ba Col-B012 clustered with known biovar 4 variants. The analysis of the core genes allowed the identification of polymorphisms only present in biovar 4 genomes, these regions are proposed as possible targets for identification by PCR. The sequencing of B. abortus Ba Col-B012 genome provides important insights to improve the diagnosis and the epidemiology of this disease and represents the first report of the biovar 4 in Colombia.



Introduction
The brucellosis is one of the most important zoonotic diseases that causes infertility and abortion in cattle. In livestock, brucellosis is mainly caused by Brucella abortus, a Gram-negative coccobacillus that behaves as a facultative intracellular pathogen. There are up to eight variants of this species that differ on their physiological characteristics and are classified as biovars. However, some of these biovars differ only slightly and their status as true variants is unresolved. Some biovars have a wide geographic distribution; B. abortus biovar1 and biovar2 are found around the world, while others as the biovar 5 are mainly distributed in Europe [1]. In South America, recent studies have identified several biovars, for instance, a survey of a 30-year B. abortus collection from Brazil, found biovars 1, 2, and 3 [2], while in Ecuador, biovar 1 and 4 have been reported [3]. However, there still is a lack of sufficient information to establish biovar presence and distribution in other countries of the continent. In Colombia, even though there are regions with high prevalence and isolation of B. abortus [4,5], there are no reports on the identification of their corresponding biovars.
The genome presented here belongs to a larger collection of pathogens isolated as part of a monitoring program to identify the principal infectious agents related to infertility and abortion in cattle present in the southern part of Colombia [6]. During this survey, 12 B. abortus strains were isolated from dairy farms (Nariño, Colombia). Recently some of these strains were typified using AMOS-ERY-PCR [7] and MLVA methods [8], and a representative isolate was chosen for sequencing. Here we present the draft quality genome of the strain, B. abortus Ba Col-B012, this genome contributes to a better understanding of the genomic constituents of local isolates and to the identification of virulence factors and conserved genes that code for immunogenic proteins that can eventually be used in the development of vaccines and new serological tests.

Classification and features
Brucella abortus is a non-motil, Gram-negative short bacillus measuring about 0.6 to 1.5 μm by 0.5-0.7 μm (Fig. 1). The B. abortus species belong to the family Brucellaceae, class Alphaproteobacteria and phylum Proteobacteria. Colonies are smooth, small, round, convex, and non-pigmented, on Brucella agar small colorless punctate colonies, appear within 48 to 72 h at 37°C. Even though they are aerobes, providing a CO 2 atmosphere may enhance growth.
The Brucella abortus Ba Col-B012 strain was obtained from a female Holstein with an episode of abortion. The sample was taken from vaginal fluids with a swab and isolation was done on trypticase soy agar and brain infusion agar supplemented with 5% Horse serum, this media was incubated at 37°C for 72 to 96 h, with a 5% CO 2 atmosphere. Small transparent colonies were obtained with regular edges. Isolates were characterized by being non-motile and positive for the urease and oxidase tests and for the agglutination test by using polyclonal anti-Brucella abortus antibody (Difco). A summary of the classification and general features of B. abortus strain Ba Col-B012 is presented in Table 1.

Genome sequencing and information
Genome project history B. abortus strain Ba Col-B012 was isolated as part of a monitoring program to identify the principal infectious agents related to infertility and abortion in cattle present in the southern part of Colombia [6]. The main objective for sequencing B. abortus genomes is to explore the genomic constituents of the local isolates and to identify virulence factors, polymorphic regions, and immunogenic proteins that can be eventually be used in the development of vaccines and new serological and molecular tests. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
Brucella abortus strain Ba Col-B012 strain was grown on trypticase soy agar and brain infusion agar supplemented with 5% horse serum, this media was incubated at 37°C for 72 h. Genomic DNA extraction was done with the CTBA-Phenol Chloroform method couple to ethanol precipitation [9]. DNA was quantified using the dsDNA HS (High Sensitivity) kit on a Qubit™ (Life Technologies), a greater than 30 ng/μl DNA concentration was obtained. Quality and purity of DNA was determined by spectrophotometry (Nanodrop® 2000 Thermo Fisher Scientific) obtaining a 260/280 and 260/230 ratio equal to 2.

Genome sequencing and assembly
Whole-genome sequencing of the B. abortus strain Ba Col-B012 strain was performed by employing the Illumina HiScan SQ (Molecular Biology Lab, Corpoica). Libraries were generated using the Sure Select Strand Agilent Sample Preparation, once the DNA concentration was determined library amplification was done with the TruSeq PE Cluster Kit v3, (Illumina), using Cbot (Illumina). For de novo assembly, we used 3,956,238 paired-end Illumina reads (150 bp) and the Newbler v 2.0.01.14 software. The assembly resulted in 233 contigs with total genome length of 3227,565 bp and with 50× average coverage.

Genome annotation
Gene prediction was conducted with GeneMarkS+ [10], and PRODIGAL [11] and annotation was done automatically using the NCBI Prokaryotic Genome Annotation Pipeline. The annotation was corrected manually using the data from different databases (Swiss-Prot [12] and RAST [13]). We use LipoP v 1.0 [14] for finding genes with signal peptides and with transmembrane helices.

Genome properties
The genome statistics are provided in Table 3. The assembly resulted in 233 contigs with total genome length of 3227,565 bp and with 50× average coverage. The N 50 contig size is 22,624 and a maximum contig size of 106,301 bp and a G + C content of 57.28 mol%. These values are similar to those reported for the genomes NC_006932.1, NZ_CP007709.1 and NZ_CP007705.1 of B. abortus at NCBI. Using our annotation pipeline, it was possible to identify 3227 predicted genes of which 3018 were putatively protein-encoding, 166 pseudogenes, 42 Cells were grown on trypticase soy agar and brain infusion agar supplemented with 5% horse serum, this media was incubated at 37°C for 48 h tRNAs and 1 ncRNA. For the majority of the proteinencoding genes (78.12%) a function could be assigned. The distribution of these genes into COG functional categories [15] is shown in Table 4. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/Gen-Bank under the accession LODQ00000000. The version described in this paper is version LODQ01000000.

Genomes used in this study
A total of 28 B. abortus genomes were downloaded from the NCBI database of complete and draft bacterial genomes, even though there are many more genomes in the database, only those with identified biovar were used for further analyses. The genomes and their GeneBank accession numbers are listed in Table 5. The genes used Phylum Proteobacteria TAS [25,26] Class Proteobacteria alfa TAS [27] Order "Rhizobiales" TAS [28] Family Brucellaceae TAS [29,30] Genus Brucella TAS [30,31] Species Evidence codes -IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [12/17]  in the analysis were predicted from the genomes using PRODIGAL with the default settings [11].
Genomic differences between B. abortus BA col-B012 and the type strain B. abortus 2308 The comparative genomic analysis between B. abortus strain BA Col-B012 and the type strain, B. abortus 2308, shows that both genomes shared 3015 genes, most of these genes are identical (2862 genes with 100%). Within this set of genes there are around 12 genes that are divergent with a nucleotide identity ranging from 77 to 94%, (Additional file 1: Table S1) among the genes are an ABC transporter permease, benzoate transporter, an alpha/beta hydrolase, a 5-hydroxymethyluracil DNA glycosylase, several hypothetical proteins and a hemolysin D gene (HlyD). Hemolysin D is part of the membrane transporter of the HlyA, a pore-forming toxin that affects the membrane of the host [16]. We also identified 16 genes present in strain BA Col-B012 that were not found in the type strain (Additional file 1: Table S2).
Most genes in this set are hypothetical proteins, transporters and transcriptional regulators. These differences show that strain BA Col-B012 differs from the type  The total is based on the total number of protein coding genes (3018) in the genome strain 2308. In order to elucidate if these differences are related to the biovar classification a comparative genomic analysis with more strain was done in the next section.
The evolutionary distance and phylogenetic relationship of B. abortus strain Ba col-B012 A phylogenomic approach was done to establish the evolutionary relationship of B. abortus strain Ba Col-B012 and to evaluate whether biovars are congruent with true genetic groupings. The phylogenetic analysis was done by concatenating the alignment of orthologues genes shared by all strains. In order to identify a set of orthologous genes, an in-house PERL script that incorporates the reciprocal best match approach was used [17]. In brief, the predicted genes of strain Ba Col-B012 were searched using the blastn algorithm [18] against the genomic sequences of each of the remaining genomes. The best match for each query gene (genes with higher than 70% identity and alignment coverage) was extracted and searched against the complete gene complement of the Ba Col-B012 strain to identify reciprocal best matches. The reciprocal best match genes were denoted as orthologues, 3139 orthologous genes were shared among all strains, from these 2169 were identical among all strains (100% nucleotide identity). Average nucleotide identity (ANI) was quantified using the nucleotide identity of orthologues between the strain Ba Col-B012 and the other genomes, this is a measurement of genomic divergence that is used in modern taxonomy as the gold standard to delimitate new species [19,20]. The ANI values between the Ba Col-B012 and the rest of the strains were higher than 99.6 % (Additional file 1), these high identity reflect the close evolutionary relationship between the B. abortus strains that make difficult the identification of biovar variants. Despite the close relationship between all genomes, strain Ba Col-B012 showed a closest affiliation with biovar 4 strains (99.88%).
In order to corroborate the affiliation of Ba Col-B012 to biovar 4, the phylogenetic relationship of shared polymorphic genes, around 2961 genes, was inferred using the Neighbor Joining algorithm with the Jukes-Cantor distance and 1000 bootstraps (Fig. 2). As shown before by the ANI analyses, strain Ba Col-B012 was more closely related to the biovar 4 strains clustering in the same clade with a 100 bootstrap value. This represents the first confirmed report of a biovar 4 strain in Colombia, and may suggest a possible transfer from Ecuador which is the country that delimits with the Nariño region and where biovar 4 has been reported [3].

Used of polymorphic regions in the identification of B. abortus Biovar 4 and its potential for diagnosis and vaccination
Current identification of biovars is based on standard microbiological methods and molecular approaches like MLVA analysis. MLVA is particularly a high discriminatory method useful in epidemiological studies and in the identification of genetic variability of strains [21]. However, this methodology is not always conclusive. In order to complement the current methods of diagnosis with PCR-based amplification and sequencing, orthologous regions that could be used to differentiate biovar 4 genomes from others were identified. We found around 42 genes with polymorphism that differentiate biovar 4 genomes from the rest. Most genes have only one single nucleotide polymorphism (SNP), from this set almost half of the SNPs are non-synonymous. From all evaluated genes, only one hypothetical gene has two polymorphisms that are synonymous (set 12). We also found two genes that have insertion-deletions and three genes that are shorter than the biovar 1 counterpart due to the presence of an early stop codon (See Table 6 for a description of genes and differences). All gene set described in the analysis are provided in the Additional files section. In order to design primers for genetic markers for biovar 4, we focused on orthologues amplifiable by PCR (<400 bp) that have large INDELs or genes with synonymous polymorphisms, this guarantees that the observed changes are not under selection. We identified six genes that met this criteria, these were: hypothetical protein similar with BA14K family domain (gene set 8), hypothetical protein (gene set 12), DNA-3-methyladenine glycosylase (gene set 13), tyrosine-tRNA ligase (gene set 19), glutamine synthetase (gene set 30), and ABC transporter permease (gene set 40). Based on these genes, we designed sets of primers that amplify the polymorphic regions and therefore can be used for the identification. Table 7 summarizes the designed primers and their predicted PCR conditions. Comparative genome analysis of B. abortus strains is a powerful tool for the identification of allele variants/ polymorphism that modulate virulence. Interestingly, among the identified polymorphic genes, two genes have been associated with pathogenicity and immune response, a hypothetical protein similar with BA14K family domain ( Table 6, gene set 8) and a gene coding for the subunit B of the exonuclease ABC (Table 6, gene set 7). The domain BAL14K had been demonstrated to induce a strong immunoreactivity in mice, with a Th1 response and induction of IL-12 secretion [22]. While changes in the subunit B of exonuclease ABC have been associated with minor virulence changes between attenuated and virulent Brucella strains [23]. It is also worth mentioning that several other sets of genes identified as polymorphic might also display immunogenic reactivity, as their coding proteins are located in the membrane at the interphase with the environment, for instance, several transporters in B. abortus have been used to produce in vivo-induced antigens [24]. These genes are potential targets for future vaccination and diagnosis. Fig. 2 Evolutionary relationships of B. abortus strain Ba Col-B012. The evolutionary history was inferred using the Neighbor-Joining method [33]. The bootstrap consensus tree inferred from 1000 replicates [34] is taken to represent the evolutionary history of the taxa analyzed [35]. All positions containing gaps and missing data were eliminated. There were a total of 2,632,124 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 [36]   Position of primers is relative to the gene set alignment