Complete genome sequence of an agr-dysfunctional variant of the ST239 lineage of the methicillin-resistant Staphylococcus aureus strain GV69 from Brazil

Staphylococcus aureus is a versatile Gram-positive coccus frequently found colonizing the skin and nasal mucosa of humans. The acquisition of the staphylococcal cassette chromosome mec (SCCmec) was a major milestone in the evolutionary path of methicillin-resistant S. aureus (MRSA). This genetic element carries the mecA gene, the main determinant of methicillin resistance. MRSA is involved in a plethora of opportunistic infectious diseases. The accessory gene regulator (agr) is the major S. aureus quorum-sensing system and plays an important role in staphylococcal virulence, including the development of biofilms. We report the complete genome sequence (NCBI BioProject ID: PRJNA264181) of the methicillin-resistant S. aureus strain GV69 (= CMVRS P4521), a variant of the ST239 lineage that presents a natural attenuation of agr-RNAIII transcription and a moderate accumulation of biofilm.


Introduction
Staphylococcus aureus is an adaptable pathogen capable of infecting nearly all tissues and organs of the human body. Methicillin-resistant S. aureus (MRSA) is a major bacterial pathogen in terms of both its incidence and the severity of the associated illnesses. MRSA infections can affect either hospitalized patients or healthy individuals in the community [1]. Hospital-associated MRSA (HA-MRSA) have a highly clonal population structure, and clonality has usually been characterized by pulsed-field gel electrophoresis (PFGE), SCCmec typing and multilocus sequence typing (MLST). One of the most globally disseminated HA-MRSA lineages is ST239-SCCmecIII [1].
We previously reported the complete genome sequence of the ST239 strain BMB9393 from Brazil, which expresses high levels of agr-RNAIII transcripts and has a superior ability to accumulate ica-independent biofilm [2]. The accessory gene regulator (agr) operon is the main quorum-sensing system of S. aureus. It is well known that agr regulates a plethora of virulence factors and key mechanisms associated with the pathogenesis of S. aureus infections, including the development of biofilm [3]. The agr-RNAIII transcripts and the AgrA protein are the regulatory molecules (effectors) of the agr operon [4].
We report here the complete genome sequence of an ST239 variant, strain GV69, which shows a natural attenuation of agr-rnaIII gene expression and forms a thinner biofilm layer than BMB9393.

Classification and features
We sequenced the complete genome of a variant of the ST239 MRSA lineage called GV69. This strain was isolated in 1996 from a skin wound infection in a patient admitted to a burn unit of a general public hospital in Teresina, in the northeast region of Brazil [5]. In Brazil, ST239 isolates are associated only with hospital infections; they are broadly disseminated and multiresistant, and they are frequently grouped in the Brazilian epidemic clone on the basis of PFGE, MLST, and mec typing [6][7][8]. Strain GV69 has a natural agr dysfunction and a moderate biofilm phenotype. The ability of many bacteria to develop biofilm is considered an important colonization mechanism, primarily in infections associated with the use of indwelling medical devices [6]. Strain GV69 is a non-motile, non-spore-forming, non-hemolytic Gram-positive coccus in the family Staphylococcaceae, order Bacillales, class Bacilli. Figure 1 shows the phylogenetic position of GV69 in relation to other Staphylococcus spp. GV69 is a facultatively anaerobic, mesophilic bacterium that can grow at temperatures of 30-37°C. S. aureus isolates exhibit a preference for glycolytic carbon sources. Acid is produced aerobically and anaerobically from glucose, lactose, maltose and mannitol, and aerobically from fructose, galactose, mannose, ribose, sucrose, trehalose, turanose and glycerol [9]. Figure 2 shows a photomicrograph of S. aureus GV69 after Gram staining.
GV69 cultures were grown at 37°C with aeration (250 rpm) in a rich medium (trypticase soy broth, TSB) for 18 h, and the strain was initially identified by routine diagnostics based on Gram staining, mannitol fermentation, catalase testing and tube coagulase testing. A summary of the general information gathered for GV69 is listed in Table 1. Antimicrobial disc susceptibility testing demonstrated that, in addition to methicillin and other β-lactam drugs, this strain is resistant to several other groups of antimicrobial drugs, with the exceptions of vancomycin and the more recently commercialized antibiotics. In addition, strain GV69 shows an average biofilm unit (BU) of 0.86 (moderate biofilm phenotype), whereas BMB9393 has an average BU of 3.7 (strong biofilm phenotype) [6]. This strain was deposited in the public collection "Coleção de Micro-organismos de Referência em Vigilância Sanitária" of the Fundação Oswaldo Cruz under the reference name P4521 [10].
Fig. 1 Phylogenetic position of S. aureus GV69 in relation to other Staphylococcus spp. To construct the tree, the sequences were aligned with the RDP aligner, and the Jukes-Cantor corrected-distance model was used to assemble a distance matrix based on alignment model positions, without alignment inserts and with a minimum of 200 comparable positions. The tree was built with RDP Tree Builder, which uses Weighbor with an alphabet size of 4 and a length size of 1000 [31]. Bootstrapping was repeated 100 times to generate a strict consensus tree [32].
Fig. 2 Photomicrograph of S. aureus strain GV69 after Gram staining. Bar = 10 μm.
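The Fig. 1 legend mentions a Jukes-Cantor corrected distance matrix with a minimum of 200 comparable positions. As a minimal sketch of that correction (not the RDP Tree Builder code), the distance between two aligned sequences with observed proportion p of differing sites is d = −(3/4) ln(1 − 4p/3). Skipping gap and ambiguity columns, and returning None below the 200-position threshold, are assumptions about how "comparable positions" are counted.

```python
from __future__ import annotations

import math
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def jc_distance(seq_a: str, seq_b: str, min_comparable: int = 200) -> float | None:
    """Jukes-Cantor corrected distance between two pre-aligned sequences.

    Only columns where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gap and ambiguity columns are skipped (an assumption).
    Returns None when fewer than `min_comparable` columns remain.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            compared += 1
            if a != b:
                mismatches += 1
    if compared < min_comparable:
        return None  # too few comparable positions for an estimate
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned: dict[str, str], min_comparable: int = 200) -> dict[tuple[str, str], float | None]:
    """Symmetric pairwise Jukes-Cantor distances keyed by (name, name)."""
    dist: dict[tuple[str, str], float | None] = {}
    for name in aligned:
        dist[(name, name)] = 0.0
    for x, y in combinations(sorted(aligned), 2):
        d = jc_distance(aligned[x], aligned[y], min_comparable)
        dist[(x, y)] = dist[(y, x)] = d
    return dist
```

A tree builder such as Weighbor would then consume this matrix; the tree-building and bootstrapping steps are not reproduced here.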

Genome project history
A collaboration between the Laboratório Nacional de Computação Científica, operated by the Ministério de Ciência e Tecnologia e Inovação of the Brazilian government, and the Universidade Federal do Rio de Janeiro sequenced, assembled, and annotated the complete GV69 genome as part of the ST239 Genome Program. This organism was selected for sequencing as a representative of the approximately 30 % of Brazilian ST239 isolates that display agr dysfunction. The raw sequence data were deposited in NCBI's Sequence Read Archive (experiment accession number SRX1322312; GV69 run accession number SRR2601051). The complete genome sequence of strain GV69 was deposited in GenBank (accession number CP009681). Table 2 presents the project information and its compliance with MIGS version 2.0 [11].

Growth conditions and genomic DNA preparation
A 0.5-mL aliquot of a GV69 culture (37°C/18 h) was inoculated into a 250-mL Erlenmeyer flask containing 50 mL of pre-sterilized TSB. The culture was grown at 37°C for 18 h under normal atmospheric conditions with shaking at 250 rpm. The bacteria were harvested by centrifugation (1500 × g at 4°C) and washed twice in cold sterile water, and the whole pellet was used for DNA preparation. Cells were lysed with 20 U/mL lysostaphin, and DNA was obtained by phenol extraction and ethanol precipitation [12]. The concentration and purity of the resulting DNA were assessed using a Qubit® 2.0 fluorometer (Invitrogen; Eugene, Oregon, USA). This genomic DNA (5 μg) was used to prepare a paired-end library.

Genome sequencing and assembly
Genome sequencing was performed on a 454 GS FLX Titanium platform with a 3-kb paired-end library (Roche Diagnostics Corporation, Indianapolis, IN, USA). The assembly, based on 362,284 reads corresponding to 62,981,906 bp (23-fold coverage), was performed using Newbler v2.6 (Roche) and Celera Assembler v6.1 [13]. Gaps within scaffolds resulting from repetitive sequences were resolved by in silico gap filling. To identify small insertions and deletions (InDels) in homopolymer regions (runs of at least three identical bases), the complete genome sequence of GV69 was compared with that of the ST239 isolate TW20 from the United Kingdom, whose complete sequence is deposited in GenBank (accession number FN433596). For this comparison we used Crossmatch (version 0.990329) with a more stringent setting than the default (mismatch = 14). The alignment revealed 541 insertions (174 in homopolymeric regions) and 575 deletions (244 in homopolymeric regions). In total, the GV69 genome sequence harbors 418 InDels in homopolymer regions relative to the TW20 genome sequence (Additional file 1: Table S1).
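The homopolymer criterion above (a run of at least three identical bases) can be illustrated with a short sketch that tallies InDels and flags those falling inside such runs. Parsing Crossmatch output is not shown; the `(kind, pos)` event list and the rule that an event is homopolymeric when the reference base it is anchored to lies in a run of length ≥ 3 are hypothetical simplifications, not the authors' procedure.

```python
from collections import Counter


def homopolymer_run_length(ref: str, pos: int) -> int:
    """Length of the run of identical bases in `ref` that covers 0-based index `pos`."""
    base = ref[pos]
    start = end = pos
    while start > 0 and ref[start - 1] == base:
        start -= 1
    while end + 1 < len(ref) and ref[end + 1] == base:
        end += 1
    return end - start + 1


def tally_indels(ref: str, events, min_run: int = 3) -> Counter:
    """Count insertions/deletions and how many fall in homopolymer runs.

    `events` is a hypothetical list of (kind, pos) pairs, where kind is
    "insertion" or "deletion" and pos is the 0-based reference base the
    event is anchored to.
    """
    counts = Counter()
    for kind, pos in events:
        counts[kind] += 1
        if homopolymer_run_length(ref, pos) >= min_run:
            counts[kind + "_homopolymer"] += 1
    return counts
```

In the comparison with TW20, the homopolymeric subsets (174 insertions and 244 deletions) sum to the 418 homopolymer InDels reported.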

Genome annotation
The genome annotation was performed using the System for Automated Bacterial Integrated Annotation (SABIA) [14]. This software uses an automated annotation pipeline in which each open reading frame (ORF) is compared with several databases (NCBI-nr, KEGG, InterPro and UniProtKB/Swiss-Prot), and the results are displayed on screen for assessment by expert users. All possible ORFs are predicted by Glimmer [15] and GeneMark [16], and tRNAs are detected by tRNAscan-SE [17]. The identification of bona fide ORFs and their probable functions takes into account the results of BLAST similarity searches, using both nucleotide and amino acid sequences, against the KEGG, NCBI-nr and UniProtKB/Swiss-Prot databases, as well as the prediction of protein domains and important sites using InterPro [18]. ORFs with good BLAST coverage in the NCBI-nr database (a minimum of 60 % positive identity, 80 % query coverage, 80 % subject coverage, and an E-value cutoff of 10⁻⁵) were classified as "valid", with either a known function or a hypothetical one. Conversely, when a truncated version of a gene was identified because of nonsense or frameshift mutations in the coding sequence, the corresponding ORF was annotated as a pseudogene. Other analyses using the SABIA pipeline included the classification of gene products according to biological processes, cellular components and molecular functions based on Gene Ontology [19,20]. Functional classification according to biological systems was based on the KEGG and COG databases. Membrane transport proteins were identified and classified using the Transporter Classification system available in the TCDB database, and the subcellular localization of proteins was predicted using the PSORT tool [21].
CRISPRFinder was used to identify clustered regularly interspaced short palindromic repeats (CRISPRs) [22]; no confirmed CRISPRs were found.
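The "valid" ORF rule described above combines four thresholds on a BLAST hit against NCBI-nr. A minimal sketch of that filter follows; it is not SABIA's code, the `BlastHit` field names are hypothetical, and treating each threshold as inclusive (≥ or ≤) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical container for one BLAST hit against NCBI-nr."""
    positives_pct: float    # % positive-scoring residues in the alignment
    query_cov_pct: float    # % of the ORF (query) covered by the alignment
    subject_cov_pct: float  # % of the database protein (subject) covered
    evalue: float


def coverage_pct(aln_start: int, aln_end: int, seq_len: int) -> float:
    """Percent of a sequence spanned by a 1-based, inclusive alignment interval."""
    return 100.0 * (aln_end - aln_start + 1) / seq_len


def is_valid_orf(hit: BlastHit, min_positives: float = 60.0, min_qcov: float = 80.0,
                 min_scov: float = 80.0, max_evalue: float = 1e-5) -> bool:
    """Apply the thresholds quoted in the text (inclusive bounds assumed)."""
    return (hit.positives_pct >= min_positives
            and hit.query_cov_pct >= min_qcov
            and hit.subject_cov_pct >= min_scov
            and hit.evalue <= max_evalue)
```

An ORF failing this filter is not automatically a pseudogene; in the pipeline described above, that label depends on evidence of truncation by nonsense or frameshift mutations.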

Genome properties
The GV69 genome consists of one circular chromosome of 3,046,210 bp with a G + C content of 32.94 % (Fig. 3).
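G + C content is the percentage of G and C among the bases of the sequence. The sketch below computes it from a single-record FASTA; ignoring ambiguity codes such as N in the denominator is an assumption, and the helper names are hypothetical.

```python
def read_fasta_sequence(lines) -> str:
    """Concatenate the sequence lines of a single-record FASTA (header lines skipped)."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt
```

Passing an open handle on a locally saved FASTA copy of GenBank record CP009681 to `read_fasta_sequence`, and the result to `gc_content`, gives a value that can be compared with the 32.94 % reported above.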
Using the SABIA pipeline [14], we functionally annotated 2,758 protein-coding sequences (CDS), of which 2,285 were assigned to known functions and the remaining 473 to unknown categories. Seventy-six were annotated as putative pseudogenes. The genome harbors 5 rRNA operons (5 copies of 16S rRNA, 5 of 23S rRNA, and 6 of 5S rRNA) and 60 tRNA genes, which were identified with RNAmmer [23] and tRNAscan-SE [17], respectively. This information is summarized in Table 3. A total of 2,098 genes were assigned to COG functional categories; a breakdown of their functional assignments is shown in Table 4.

Conclusions
Comparative analyses were performed using the SABIA pipeline [14]. A bidirectional best hit comparison (90 % amino acid identity and 90 % alignment coverage) with six other published ST239 S. aureus genomes revealed that GV69 shares 2,415 CDS with BMB9393, another Brazilian ST239 variant; 2,328 CDS with strain JKD6008; 2,357 with strain TW20; 2,342 with strain T0131; 2,380 with strain Z172; and 2,290 with strain XN108. Nevertheless, GV69 has 170 CDS that are unique relative to the other six genomes, including an extra copy of a gene encoding a putative N-acetylmuramoyl-L-alanine amidase, an enzyme related to bacterial cell autolysis. This gene is located in a phage-associated mobile genetic element (MGE) inserted in the chromosome. Although they belong to the same lineage and clonal type, strains GV69 and BMB9393 differ in their flexible genomes. In addition to 343 CDS found in GV69 but not in BMB9393 (150 of unknown function, including several related to MGEs), GV69 lacks a small 2,908-bp plasmid, present in BMB9393, that carries the cat gene, a determinant of chloramphenicol resistance.
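The bidirectional best hit criterion pairs a CDS in one genome with a CDS in another only when each is the other's best hit and both directions pass the identity and coverage thresholds (90 % and 90 % here). The sketch below illustrates that logic and the derived "unique CDS" set; it is not SABIA's implementation, and the `Hit` fields and ranking by bit score are assumptions.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    subject: str      # protein ID in the other genome
    identity: float   # % amino acid identity
    coverage: float   # % alignment coverage
    bitscore: float   # used to rank candidate hits (assumption)


def best_hits(hits: Dict[str, List[Hit]], min_id: float = 90.0,
              min_cov: float = 90.0) -> Dict[str, str]:
    """Best-scoring subject per query among hits passing both thresholds."""
    best: Dict[str, str] = {}
    for query, candidates in hits.items():
        passing = [h for h in candidates if h.identity >= min_id and h.coverage >= min_cov]
        if passing:
            best[query] = max(passing, key=lambda h: h.bitscore).subject
    return best


def bidirectional_best_hits(a_vs_b: Dict[str, List[Hit]], b_vs_a: Dict[str, List[Hit]],
                            min_id: float = 90.0, min_cov: float = 90.0) -> Dict[str, str]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b, min_id, min_cov)
    ba = best_hits(b_vs_a, min_id, min_cov)
    return {a: b for a, b in ab.items() if ba.get(b) == a}


def unique_cds(all_cds: Iterable[str], *pairings: Dict[str, str]) -> Set[str]:
    """CDS of genome A with no bidirectional best hit in any compared genome."""
    shared: Set[str] = set()
    for pairs in pairings:
        shared.update(pairs)  # adds the genome-A keys of each pairing
    return set(all_cds) - shared
```

Applied to a single pairing, the set difference reproduces the arithmetic behind the GV69 versus BMB9393 comparison: 2,758 GV69 CDS minus 2,415 shared leaves 343, the number of CDS found in GV69 but not in BMB9393. Passing all six pairings at once gives the CDS absent from every compared genome.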