Complete genome sequence of Streptococcus agalactiae strain GBS85147 serotype of type Ia isolated from human oropharynx

Streptococcus agalactiae, also referred to as Group B Streptococcus, is a frequent resident of the rectovaginal tract in humans and a major cause of neonatal infection. The pathogen can also infect adults with underlying disease, particularly the elderly and the immunocompromised. In addition, S. agalactiae is a known fish pathogen, which compromises food safety and represents a zoonotic hazard. This study provides structural, functional and evolutionary genomic information on a human S. agalactiae serotype Ia (ST-103) strain, GBS85147, isolated from the oropharynx of an adult patient in Rio de Janeiro, thereby representing the first human isolate in Brazil. We used the Ion Torrent PGM platform with the 200 bp fragment library sequencing kit. Sequencing generated 578,082,183 bp distributed among 2,973,022 reads, giving an approximately 246-fold mean coverage depth, and the reads were assembled using the Mira Assembler v3.9.18. The genome of S. agalactiae strain GBS85147 comprises a circular chromosome with a final length of 1,996,151 bp containing 1,915 protein-coding genes, 18 rRNAs, 63 tRNAs and 2 pseudogenes, with a G + C content of 35.48 %.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0158-6) contains supplementary material, which is available to authorized users.


Introduction
Streptococcus agalactiae is a bacterial pathogen, distributed worldwide, that causes disease in humans and animals [1]. In humans, it is frequently associated with meningitis and neonatal sepsis, and may also affect immunocompromised adults and the elderly [2]. S. agalactiae is responsible for the most fatal bacterial infections in human newborns [3]. In fish, the pathogen causes meningoencephalitis and septicemia worldwide, in both freshwater and saltwater species [4,5]. Consumption of fish has been associated with an increased risk of colonization by S. agalactiae serotypes Ia and Ib in people [6]. S. agalactiae also remains a major cause of subclinical mastitis in dairy cattle, which is the dominant health disorder affecting milk production and is responsible for substantial financial losses in the dairy industry worldwide [7].
S. agalactiae is of great medical and veterinary importance owing to its high social and economic impact [8] and to the incidence of disease in different hosts [9]. The incidence of invasive infections unrelated to pregnancy in human adults and animals is increasing worldwide [10]. Therefore, further studies in the area remain necessary. Since the 1990s, serotype V has emerged in the United States as the most frequent S. agalactiae serotype causing invasive disease in nonpregnant adults [11]. Other serotypes, including Ia and III, have since been recognized in different countries as significant causes of invasive disease [12]. Comparative genomic studies among S. agalactiae strains of different serotypes will contribute to a better understanding of the biological complexity of the species. This motivated the genome sequencing, assembly and annotation of S. agalactiae strain GBS85147, serotype Ia and sequence type 103 (ST-103). The pathogenic potential of this human isolate, obtained from the oropharynx of an asymptomatic female patient suffering from various recurrent pharyngitis episodes, has been increasingly observed in different investigations [13–16]. Among six S. agalactiae strains of serotypes Ia, III and V, only serotype Ia, including strain GBS85147, was capable of triggering a respiratory oxidative burst during adherence to the surface of activated macrophages. This activity was demonstrated by NADPH-oxidase activation within phagocytic vacuoles, indicating a high ability of strain GBS85147, isolated from an asymptomatic patient, to survive under aerobic stress conditions. Moreover, the invasive potential of strain GBS85147 was also demonstrated by bacterial adherence, invasion and survival (24 h) in the intracytoplasmic environment of endothelial cells. In addition, the detection of sialic acid in bacteria is limited to a few examples, which, strikingly, are all pathogenic, including S. agalactiae. As in serotypes III and V, sialic acid residues were also detected on the surface of the serotype Ia strain GBS85147. These findings reinforce the pathogenic potential of S. agalactiae strain GBS85147 through its ability to interfere specifically with opsonic components due to inhibition of the alternative complement pathway by serum deficient in a specific antibody [16,17].

Fig. 2 Phylogenetic tree of S. agalactiae strain GBS85147 representing its position relative to other type strains. The tree was generated using S. agalactiae strain GBS85147, 21 strains of Streptococcus agalactiae, and 3 strains from the genus Streptococcus as outgroup strains, all available at GenBank. The alignment and tree were constructed with CLC Genomics Workbench using the Neighbor Joining method and the Jukes-Cantor measure of nucleotide distance with 1,000 bootstrap replications.

Organism information
Classification and features
S. agalactiae is a Gram-positive, non-sporulating bacterium with a spherical shape, ranging from 0.2 to 1.0 microns in diameter [10]. On solid medium, S. agalactiae may form short chains or groups of double cocci. In liquid medium, the microorganism can form long chains (Fig. 1). The bacterium is a facultative anaerobe, catalase and oxidase negative, and is capable of lactic acid fermentation [18]. Lancefield identified the group B antigen, a peptidoglycan-anchored antigen (rhamnose, galactose, N-acetylglucosamine, and glucitol) that defines the S. agalactiae species [19,20].
A capsular polysaccharide (CPS) antigen is used to classify S. agalactiae strains into serotypes [21]. The structure of the CPS is determined by genes encoding the enzymes responsible for its synthesis [22]. Serotype classification is based on capsular antigen differences detected by PCR or by immunodiffusion techniques [23]. Currently, ten serotypes have been described (Ia, Ib, II, III, IV, V, VI, VII, VIII, IX); serotype IX was identified in 2007 [24]. In some strains, serotype identification is not possible because the polysaccharide is absent as a result of a mutation in the capsular genes [25]. The high degree of variation in the capsular structure is related to the virulence of different strains of S. agalactiae [26]. These variations in the capsular structure may also explain infection of unusual hosts such as camels, dogs, horses, seals, chickens, dolphins, cats, hamsters, frogs, and monkeys [9].
A phylogenetic analysis was performed using a total of 25 strains: S. agalactiae strain GBS85147, 21 other strains of Streptococcus agalactiae, and 3 strains from the genus Streptococcus as outgroups, all available at GenBank. The 16S rRNA genes, with a mean length of 1,526 ± 50 bp, were aligned with CLC Genomics Workbench (Qiagen, USA). The phylogenetic tree was generated in the same software with the Neighbor Joining method and the Jukes-Cantor measure of nucleotide distance, with 1,000 bootstrap replications. The tree places strain GBS85147 with other closely related strains of the same species, forming a specific clade in 100 % of replications, while it remains distant from the Streptococcus species equi, suis, and pyogenes (Fig. 2). All 16S rRNA genes found on the assembled contigs were identical. These data show no evidence of contamination and support the correct identification of strain GBS85147. Other features of the strain are listed in Table 1.
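The Jukes-Cantor measure named above corrects the observed proportion of differing sites, p, for multiple substitutions at the same site, giving d = -(3/4) ln(1 - (4/3) p). The following is a minimal sketch of that correction for one pair of aligned sequences. It is not the CLC Genomics Workbench implementation. The function names and the choice to skip gap columns are assumptions made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped
    (an illustrative choice, not taken from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # Toy 10-site alignment with one mismatch (hypothetical data).
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

A distance matrix built from such pairwise values is the input a neighbor-joining method needs. For nearly identical 16S rRNA genes the corrected distance stays very close to the raw p-distance.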

Genome sequencing information
Genome project history
S. agalactiae strain GBS85147, taken from a human oropharynx, was isolated in the Laboratory of Molecular […] The genome project was deposited in a public database, and the complete genome sequence is available in GenBank under accession number CP010319. Further project information and its association with MIGS version 2.0 compliance [27] are summarized in Table 2.

Genome sequencing and assembly
Genome sequencing was performed using a fragment library on the Ion Torrent™ Personal Genome Machine System with the 200 bp sequencing kit. Sequencing produced a total of 578,082,183 bp, distributed among 2,973,022 reads, with an average genome coverage depth of 246-fold and a Phred quality greater than or equal to 20 in 91.25 % of bases. De novo assembly was performed using Mira v3.9.18 [29]. The assembly resulted in 104 contigs, accounting for 2,032,890 bp, with an N50 of 104,996 bp.
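For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 is commonly computed and derives the mean read length from the reported run totals. The `n50` function and the toy contig lengths are illustrative assumptions. The actual contig lengths of this assembly are not given in the text, so the reported N50 cannot be recomputed here.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Run totals as reported in the text.
total_bases = 578_082_183
total_reads = 2_973_022
mean_read_length = total_bases / total_reads  # bases per read

# Hypothetical contig lengths, used only to illustrate the N50 definition.
toy_contigs = [100, 80, 60, 40, 20]

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.1f} bp")
    print(f"toy N50: {n50(toy_contigs)}")
```

The mean read length that follows from the reported totals is just under 200 bp, in line with the 200 bp sequencing kit.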
Twenty randomly selected contigs were used as queries in BlastN+ [30] against the NR database to identify the most similar complete S. agalactiae genome deposited in GenBank. The contigs were then ordered and oriented using the software CONTIGuator v2 [31] with S. agalactiae GD201008-001 [32] as the reference genome, generating a pseudochromosome with 31 scaffolds. The remaining gaps were closed by removing overlaps between neighboring contigs and by using consensus sequences obtained by mapping the raw data against the reference genome with CLC Genomics Workbench 7.0 (Qiagen, USA) [33] and BlastN. Furthermore, only the consensus data were used to close gaps in the rRNA regions.

COG table footnote: The total is based on the total number of protein-coding genes in the genome.

Genome annotation
Structural gene prediction was performed using FGENESB [34]. To choose a reference, twenty random parts of our genome were used as BlastN queries against the four S. agalactiae genomes available in FGENESB. Using S. agalactiae 09mas018883 [35] as the reference, the prediction resulted in 1,616 genes. The genome was then annotated manually with Artemis [36], the UniProt databases [37] and InterProScan 5 [38]. During manual annotation, 299 additional genes were added. The rRNA and tRNA genes were predicted with RNAmmer v1.2 [39] and tRNAscan-SE [40], respectively.

Genome properties
The genome has one circular chromosome of 1,999,151 bp with a G + C content of 35.48 % and a total of 1,998 genes, including 1,915 protein-coding genes, 18 rRNAs, 63 tRNAs and 2 pseudogenes. A circular map of the genome was generated using the CGView Comparison Tool [41] and is shown in Fig. 3. Genome statistics are summarized in Tables 3 and 4. Functional analysis using the COG database showed that approximately 27.7 % of the genes have no described function, which is the sum of genes with unknown function (7.69 %) and genes not found in the database (19.97 %).
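The figures in this section can be cross-checked with simple arithmetic. The sketch below tallies the reported feature counts and percentages, and shows how a G + C percentage is typically computed from a sequence. The `gc_content` function, the variable names and the toy sequence are illustrative assumptions. The genome sequence itself is not reproduced here, so the reported 35.48 % is not recomputed.

```python
def gc_content(sequence: str) -> float:
    """G + C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Feature counts as reported in the text.
feature_counts = {"protein_coding": 1915, "rRNA": 18, "tRNA": 63, "pseudogene": 2}
total_features = sum(feature_counts.values())

# Annotation tally reported in the Genome annotation section.
predicted_genes = 1616
manually_added_genes = 299

# Share of genes without a described function (unknown + not in COGs).
no_function_pct = 7.69 + 19.97

if __name__ == "__main__":
    print(total_features)                          # sum of the four feature classes
    print(predicted_genes + manually_added_genes)  # protein-coding genes after curation
    print(round(no_function_pct, 2))
    print(gc_content("GGCCAATT"))                  # toy sequence, half G/C
```

The four feature classes sum to the reported total of 1,998, and the automatically predicted plus manually added genes sum to the 1,915 protein-coding genes.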

Insight from the genome sequence
To predict pathogenicity islands, the GIPSy software [42] was used. Strain GBS85147 was compared against 16 complete genomes of strains of the same species available at GenBank. The BRIG software [43] was used to view the circular structure of the pathogenicity islands and the genomes of these strains. Figure 4a shows the seven predicted pathogenicity islands. Of particular interest is pathogenicity island 4, which consists of six genes: four encode conserved hypothetical proteins, whereas the other two are not conserved in all strains. The first of these two is "Streptokinase", an enzyme usually secreted by Streptococcus species that has high therapeutic potential as a thrombolytic agent and is currently used to treat heart attack and pulmonary embolism [44]. The second, "Glycine betaine/proline transport system", is part of the glycine betaine transport complex [45]. Glycine is involved in the formation of the peptidoglycan cell wall of Gram-positive bacteria and also helps in securing external cell structures [46], indicating that the bacteria have evolved abilities to survive stress within host cells, becoming more resistant to the intracellular environment. Figure 4b (Additional file 1) shows eight genomic islands of unknown classification. This result indicates that GIPSy recognized these regions as probable genomic islands but could not classify them. An in-depth analysis of the genes present in these islands revealed that many of the gene products are hypothetical proteins, highlighting the importance of further studies of the genes in these regions in order to better characterize their functions.

Fig. 4 b. Representation of the eight genomic islands predicted using the same software by comparing S. agalactiae strain GBS85147 against the 16 complete genomes of the S. agalactiae species obtained from the NCBI database. From the inner to the outer ring: the genome of S. agalactiae strain GBS85147 used as reference (black), followed by GC − (purple) and GC + (green) content, and the S. agalactiae strains 09mas018883 [35], 138P [56], 138spar [57], 2603 V/R [58], A909 [59], CNCTC10/84 [60], COH1 [61], GBS1-NY [62], GBS2-NM [62], GBS6 [62], GD201008-001 [32], ILRI005 [63], ILRI112 [63], NGBS061 [64], NGBS572 [64] and SA20-06 [65], respectively. The outermost ring in 4a displays the pathogenicity islands, while the outermost ring in 4b displays the genomic islands.

Conclusion
The genome sequence of S. agalactiae GBS85147, obtained using the Ion Torrent PGM platform with approximately 246-fold coverage, was completely finished and manually annotated, its putative pseudogenes were manually curated, and the resulting genome file was deposited in NCBI. After manual annotation of CDSs, the function of 1,713 (85.73 %) genes was identified and, after manual curation of frameshifts, only two pseudogenes remained. The final size of the genome is 2 Mb with a G + C content of 35.48 %, consistent with the genomes of other strains of the S. agalactiae species.
The complete genome of GBS85147, the first isolate from the oropharynx of an adult patient in Brazil, can help in further understanding the dissemination of this disease and improve the identification of genes that allow S. agalactiae serotype Ia to trigger the respiratory oxidative burst during adherence to the surface of activated macrophages. Furthermore, our data may become valuable for future comparative studies with other S. agalactiae strains of different serotypes, in order to explore their virulence determinants, evolutionary relationships and the genetic basis of host tropism in S. agalactiae.

Additional file
Additional file 1: Strain ID Summary. (DOC 26 kb)