Complete genome sequences of Francisella noatunensis subsp. orientalis strains FNO12, FNO24 and FNO190: a fish pathogen with genomic clonal behavior

The genus Francisella is composed of Gram-negative, pleomorphic, strictly aerobic and non-motile bacteria that are capable of infecting a variety of terrestrial and aquatic animals. Among them, Francisella noatunensis subsp. orientalis stands out as the causative agent of pyogranulomatous and granulomatous infections in fish, and it is responsible for high mortality rates in freshwater fish, especially Nile tilapia. In the current study, we present the genome sequences of F. noatunensis subsp. orientalis strains FNO12, FNO24 and FNO190. Each genome consists of one circular chromosome: 1,859,720 bp, with 32 % GC content, 1538 protein-coding genes and 363 pseudogenes for FNO12; 1,862,322 bp, with 32 % GC content, 1537 protein-coding genes and 365 pseudogenes for FNO24; and 1,859,595 bp, with 32 % GC content, 1539 protein-coding genes and 362 pseudogenes for FNO190. All genomes have similar genetic content, suggesting clonal-like behavior for this species.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0151-0) contains supplementary material, which is available to authorized users.


Introduction
In 1922, Edward Francis, an American bacteriologist, described the bacterium that causes tularemia in humans, Francisella tularensis. This bacterium is the most studied of its genus [1,2]. Until recently, the genus Francisella consisted of only two species, F. tularensis and F. philomiragia; however, new species and strains have since been isolated, such as F. noatunensis and its subspecies F. noatunensis subsp. orientalis [1], the latter being recognized as one of the most important pathogens of cultured tilapia (Oreochromis spp.) [3].
F. noatunensis subsp. orientalis is the etiologic agent of pyogranulomatous and granulomatous infections in fish. In the last few years, F. noatunensis subsp. orientalis has been responsible for a large number of deaths of tilapia and other freshwater species cultured in the United States, the United Kingdom, Japan, Taiwan, Jamaica, Costa Rica, Brazil and some other Latin American regions [4][5][6]. Besides infecting important cultured species such as tilapia, threeline grunt (Parapristipoma trilineatum) and hybrid striped bass (Morone chrysops × Morone saxatilis), this bacterium is also capable of infecting wild fish such as guapote tigre (Parachromis managuensis) [4,5].
Although the disease caused by this species has a high mortality rate during outbreaks and has been reported in several countries, the phylogenomic relationships among isolates from different countries and the evolutionary history of this pathogen are still poorly characterized. Therefore, F. noatunensis subsp. orientalis strains FNO12, FNO24 and FNO190, isolated from three different regions and outbreaks, were sequenced to characterize the genetic diversity of this microorganism.

Genome project history
In the present study, the complete genome sequences of F. noatunensis subsp. orientalis strains FNO12, FNO24 and FNO190 were determined. Sequencing and assembly were performed by the National Reference Laboratory for Aquatic Animal Diseases, and annotation was performed by the Laboratory of Cellular and Molecular Genetics. Both laboratories are located at the Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil. Source DNA of the three strains is available from the AQUACEN culture collection. Table 2 presents the project information and its association with MIGS version 2.0 compliance [9].

Growth conditions and genomic DNA preparation
F. noatunensis subsp. orientalis strains FNO12, FNO24 and FNO190 were isolated from three different outbreaks on Nile tilapia fish farms. Swabs of kidney (FNO12) and spleen (FNO24 and FNO190) tissues from each fish were sampled aseptically, streaked onto cysteine heart agar supplemented with 2 % bovine hemoglobin (BD Biosciences, USA) and incubated at 28 °C for 4-7 days [7]. The isolates were stored at -80 °C in Mueller-Hinton cation-adjusted broth supplemented with 2 % VX supplement (Laborclin, Brazil), 0.
The quality of the raw data was analyzed using FastQC [10], and ab initio assembly was performed using Edena 2.9 [11], Mira 3.9 [12] and Newbler 2.9 (Roche, USA). The assemblies of FNO12, FNO24 and FNO190 produced a total of 15, 57 and 16 contigs, respectively. The first strain had ~1382-fold coverage, the second ~79-fold coverage, and the third ~203-fold coverage. Additionally, strains FNO12, FNO24 and FNO190 had N50 values of 275,043 bp, 87,100 bp, and 237,022 bp, respectively. A super scaffold for FNO12 was produced in MapSolver software (OpGen Technologies, USA), using an optical map generated with the restriction enzyme NheI as a reference. The remaining gaps were filled in CLC Genomics Workbench 7 (Qiagen, USA) by repeatedly mapping the raw reads onto the gap flanks until the flanking sequences overlapped.
For FNO24 and FNO190, the complete genome of FNO12 was used as a reference to construct the super scaffolds in CONTIGuator 2.0 software [13], and gap filling was conducted as described for strain FNO12. All the raw sequencing data were mapped onto each final genome, and the absence of contamination from other genomes was confirmed by the coverage and the low number of unmapped reads.
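The N50 and fold-coverage figures quoted for the assemblies follow standard definitions, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the assemblers named above: the function names are hypothetical, and the contig lengths are toy values rather than the real contig sets, which are not listed in the text.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_sequenced_bases, genome_size_bp):
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bases / genome_size_bp


# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [500, 400, 300, 200, 100]
print(n50(toy_contigs))          # 400
print(fold_coverage(200, 100))   # 2.0
```

A lower N50 combined with a higher contig count, as reported for FNO24, is the expected pattern for an assembly built from shallower coverage.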

Genome annotation
Automatic annotation was performed using the RAST software [14]; tRNA and rRNA predictions were conducted using the tRNAscan-SE Search Server [15] and RNAmmer [16], respectively. Manual curation of the annotation was done using Artemis software [17] and the UniProt database [18]. All putative frameshifts were manually curated based on the raw data coverage in CLC Genomics Workbench 7 software (Qiagen, USA), which was used to correct indel errors in homopolymer regions.
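Because the frameshift curation focused on indel errors in homopolymer regions, it can help to see how such regions are located in a sequence. The sketch below is only an illustration: the function name, the minimum run length and the toy sequence are assumptions, and it does not reproduce how CLC Genomics Workbench evaluates coverage or corrects indels.

```python
import re


def homopolymer_runs(sequence, min_length=4):
    """Return (start, base, run_length) for every run of one repeated base
    that is at least min_length long. Start positions are 0-based."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One base followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


# Toy sequence, for illustration only.
print(homopolymer_runs("ACGTTTTTGCAAAAC"))  # [(3, 'T', 5), (10, 'A', 4)]
```

Putative frameshifts that fall inside such runs are the natural candidates for manual inspection against the mapped reads.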

Genome properties
Each genome comprises a single circular chromosome, with sizes of 1,859,720 bp, 1,862,322 bp, and 1,859,595 bp for FNO12, FNO24, and FNO190, respectively (Table 3). The GC content of the three strains is 32 %, and the number of pseudogenes is relatively high (363 on average). Strain FNO24 had more protein-coding genes and one fewer RNA-coding gene than the other two strains. For strains FNO12 and FNO190, 1280 genes were annotated with a functional prediction, whereas for strain FNO24, 1282 genes were annotated. Each genome contained 621 CDSs classified as hypothetical proteins by the COG database [19]. Table 4 summarizes the number of genes associated with the general COG functional categories. Figure 3 compares FNO12 with FNO24 and FNO190 (presented in this study) and with two other strains deposited in GenBank (F. noatunensis subsp. orientalis strains LADL-07-285A and Toba04, accession numbers CP006875 and CP003402, respectively).
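The summary figures in this section can be checked with a short Python sketch. The chromosome sizes and pseudogene counts are the values reported in the text; the `gc_content` helper and the toy sequence are illustrative assumptions, since the genome sequences themselves are not reproduced here.

```python
# Chromosome sizes and pseudogene counts as reported in the text.
genomes = {
    "FNO12": {"size_bp": 1_859_720, "pseudogenes": 363},
    "FNO24": {"size_bp": 1_862_322, "pseudogenes": 365},
    "FNO190": {"size_bp": 1_859_595, "pseudogenes": 362},
}


def gc_content(sequence):
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


sizes = [g["size_bp"] for g in genomes.values()]
mean_pseudogenes = sum(g["pseudogenes"] for g in genomes.values()) / len(genomes)
size_range_bp = max(sizes) - min(sizes)

print(round(mean_pseudogenes, 1))  # 363.3
print(size_range_bp)               # 2727
print(gc_content("ATGCATTA"))      # 25.0 (toy sequence)
```

The three chromosomes differ in length by fewer than 3 kb, which is consistent with the near-identical gene content described for these strains.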

Insights from the genome sequence
The high similarity in the genetic content of these genomes is evident in Fig. 3. Additional file 1 lists the only eight protein-coding sequences with less than 99 % identity among the three sequenced genomes (six hypothetical proteins, one type IV pili protein, and one secreted protein). This high intraspecies similarity (100.00 ± 0 %) is also shown in Additional file 2 and Additional file 3, generated using Gegenees [20] with a threshold of 30 % and Mauve [21] with the progressiveMauve algorithm, respectively. These analyses include the three strains from this work and three others deposited in GenBank (FNO01, Toba04, and LADL-07-285A; GenBank nos. CP012153, CP003402, and CP006875, respectively) belonging to the same species. In contrast, the similarity with F. noatunensis subsp. noatunensis is reduced to 84.09 ± 0.40 % (Additional file 2). Furthermore, the orthoMCL software [22] was used to predict clusters of orthologous genes. CDSs shared by all species were considered part of the core genome, whereas CDSs harbored by only one species were considered species-specific genes. There are 891 CDSs shared by all Francisella species (Fig. 4) XCL-2, GenBank nos. CP000937, CP000439, CP000109, respectively). Ten genomic islands were predicted by GIPSy, including two putative pathogenicity islands (PAI1 and PAI2) and one putative resistance island (REI1), and plotted using BRIG software [24] (Additional file 5). GEI3 is apparently exclusive to F. noatunensis subsp. orientalis, and GEI4 is shared only with F. noatunensis subsp. noatunensis, another species from the marine environment. REI1 and PAI1 are partially shared by all species of the genus Francisella. PAI2 is partially shared with all species of the genus Francisella and totally shared with F. philomiragia and F. philomiragia subsp. philomiragia.
GEI6, predicted by GIPSy only as a genomic island, contains the genes mltA, rplM, rpsI, mglA, mglB, rnhB, yfhQ, ptsN, mnmE, cysK, pdpA, pdpB, iglD, iglC, iglB, iglA, pdpD and anmK, which are related to the Francisella Pathogenicity Island, a pathogenicity island previously described for the genus Francisella [25]. Further studies are required to characterize these genomic islands, since the GIPSy analysis suggests more horizontal gene transfer events than previously described for this species.
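The core-genome and species-specific definitions used in this section amount to set operations on ortholog cluster membership. The following Python sketch illustrates that logic under stated assumptions: the function name, genome labels and cluster IDs are hypothetical, and the sketch is not a reimplementation of orthoMCL, which produces the clusters in the first place.

```python
def core_and_specific(clusters_by_genome):
    """Split ortholog cluster IDs into the core set (present in every genome)
    and, for each genome, the cluster IDs found in that genome only."""
    if not clusters_by_genome:
        raise ValueError("no genomes given")
    names = list(clusters_by_genome)
    core = set.intersection(*(set(clusters_by_genome[n]) for n in names))
    specific = {}
    for name in names:
        # Union of clusters seen in all other genomes.
        others = set().union(
            *(set(clusters_by_genome[o]) for o in names if o != name)
        )
        specific[name] = set(clusters_by_genome[name]) - others
    return core, specific


# Hypothetical cluster IDs per genome, for illustration only.
toy = {
    "genome_A": {"c1", "c2", "c3"},
    "genome_B": {"c1", "c2", "c4"},
    "genome_C": {"c1", "c5"},
}
core, specific = core_and_specific(toy)
print(sorted(core))                                  # ['c1']
print({g: sorted(s) for g, s in specific.items()})   # A: c3, B: c4, C: c5
```

Clusters such as `c2`, shared by some but not all genomes, fall into neither category; they correspond to the partially shared regions shown in a Venn-style comparison.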

Conclusions
Three genomes of an important fish pathogen are presented in this work. Despite being isolated from different outbreaks and from different host organs, they are very similar according to the brief analysis presented here. All analyses suggest that the strains are clonal, with minor differences in the number of pseudogenes, CDSs and RNAs. Furthermore, the high number of pseudogenes present in all sequenced strains corroborates that this species is undergoing genome decay [1].