Genome sequences and annotation of two urinary isolates of E. coli

The genus Escherichia includes pathogens and commensals. Bladder infections (cystitis) result most often from colonization of the bladder by uropathogenic E. coli strains. In contrast, a poorly defined condition called asymptomatic bacteriuria results from colonization of the bladder with E. coli strains without symptoms. As part of an on-going attempt to identify and characterize the newly discovered female urinary microbiota, we report the genome sequences and annotation of two urinary isolates of E. coli: one (E78) was isolated from a female patient who self-reported cystitis; the other (E75) was isolated from a female patient who reported that she did not have symptoms of cystitis. Whereas strain E75 is most closely related to an avian extraintestinal pathogen, strain E78 is a member of a clade that includes extraintestinal strains often found in the human bladder. Both genomes are uncommonly rich in prophages.

In an effort to provide a comprehensive view of the newly discovered female urinary microbiota, we have established an Enhanced Quantitative Urine Culture protocol. This enhanced culture protocol isolates bacteria from 75 to 90 % of urine samples deemed 'no growth' by the standard clinical microbiology urine culture method [4,7,10]. We have begun to the sequence and annotate the genomes of these isolated bacteria.
Here, we report the full genome sequences and annotations of two of those bacteria, Escherichia coli strains E75 and E78 isolated from female patients pursuing urogynecologic clinical care. Strain E75 was isolated from a patient who thought that she did not have a urinary tract infection, while E78 was isolated from a patient who thought that she did. The strains were sub-cultured to purity and then identified as E. coli by Matrix-Assisted Laser Desorption/Ionization-Time-of-Flight Mass Spectrometry [10]. Strain E75 is most closely related to APEC O1, an avian extraintestinal pathogen. In contrast, strain E78 is a member of a clade that includes extraintestinal strains often associated with the human bladder, including uropathogenic strains UTI89 and J89 and asymptomatic bacteriuric strain ABU83972. Both genomes are uncommonly rich in prophages.

Classification and features
Escherichia coli is a non-sporulating, Gram-negative, rod shaped bacterium. It is a facultative anaerobe found commonly in the environment and the lower intestines of mammals and other endotherms. Extra-intestinal strains can colonize other organs, including the urinary bladder. Most E. coli strains are harmless constituents of the normal microbiota, but others cause disease. For example, uropathogenic E. coli is the major case of urinary tract infections in humans; other E. coli strains colonize the bladder without causing symptoms, a condition called asymptomatic bacteriuria.
Transmission electron microscopy images were generated for both E75 and E78 (Fig. 1). Cell pellets were fixed with 0.1 % Ruthenium Red en bloc with sequential gluteraldehyde and osmium tetroxide fixation steps. These fixed samples were dehydrated with Ethanol and embedded in Resin. Ultrathin sections of 80 nm were mounted on copper grids, post-stained with uranyl acetate and lead citrate and observed in a Hitachi H-600 transmission electron microscope at 75 kV. Films were taken, negatives developed and scanned via a Microtek i800 film scanner. PhotoShop was used to convert negatives to positive images and adjust for brightness and contrast. The transmission electron micrographs revealed the typical E. coli rod-shape morphology. Strain E75 tended to possess electron poor intracellular inclusions (Fig. 1b, black arrow). The general features of E. coli strains E75 and E78 are presented in Table 1.
E. coli strains E75 and E78 were isolated from patients who sought clinical care at Loyola University Medical Center's Female Pelvic Medicine and Reconstructive Surgery center in September 2014. Patients were asked the question: Do you feel that you have a urinary tract infection? E75 was isolated from a patient who answered 'no, ' whereas E78 was isolated from a patient who answered 'yes.' Both patients were white, post-menopausal women seeking care for Pelvic Organ Prolapse. Neither patient was taking antibiotics; both were using daily vaginal estrogen supplement. The UTI Symptoms Assessment Questionnaire was used to characterize the degree of severity and bother of the patients' symptoms [11]. Both E. coli strains were identified at >100,000 colony forming units per milliliter, using an Expanded Spectrum version of the Enhanced Quantitative Urine Culture protocol [10]. After they were sub-cultured to purity, Matrix-Assisted Laser Desorption/Ionization-Time-of-Flight Mass Spectrometry was used to confidently identify them as E. coli. For E75, the identification score was 2.530; for E78, the score was 2.265. No other microbes were detected in the urine sample containing strain E75. In the urine sample containing strain E78, Alloscardovia omnicolens (10 colony forming units per milliliter) and Lactobacillus rhamnosus (10 colony forming units per milliliter) were also detected. Figure 2 shows a phylogenetic tree of the 16S rRNA sequences. 16S rRNA gene sequences include Yersinia enterocolitica (NR_104903), E. coli IAI39 (NC_011750), E. coli O157:H7 str. Sakai (NR_074891), E. coli K-12 substr. MG1655 (NR_102804), E. coli O157:H7 str.

Genome project history
The sequencing and quality assurance was performed at the Loyola Genome Facility at Loyola University Chicago, Maywood, IL, USA. The assemblies and finishing were done at the Lakeshore Campus of Loyola University Chicago, Chicago, IL, USA. Functional annotation was produced by the RAST service [13] and in-house scripts for COG classification [14]. Table 2 presents the project information and its association with MIGSversion2.0compliance [15].
Growth conditions and genomic DNA preparation E. coli strains E75 and E78 were isolated from transurethral catheterized urine specimens of adult women with urinary symptoms [10] using a Expanded Spectrum version of the previously described Enhanced Quantitative Urine Culture protocol [4]. Three urine volumes (1 μL, 10 μL, and 100 μL) of each urine sample was spread quantitatively (i.e., pinwheel streak) onto t5% sheep blood (BD BBL™ Prepared Plated Media, Cockeysville, MD), Chocolate, and Colistin Naladixic Acid agars (BD BBL™ Prepared Plated Media) and incubated in 5 % CO 2 at 35°C for 48 h; 5 % sheep blood and MacConkey (BD BBL™ Prepared Plated Media) agars incubated aerobically at 35°C for 48 h; two CDC Anaerobic 5 % sheep blood agars (BD BBL™ Prepared Plated Media) incubated in either Microaerophilic Campy gas mixture (5 % O 2 , 10 % CO 2 , 85 % N), or anaerobically at 35°C for 48 h. All agars were documented for growth (i.e., for morphologies and colony forming units per milliliter) at 24 and 48 h. Each distinct colony morphology was sub-cultured at 48 h to obtain pure culture for microbial identification.
Microbial identification was determined using a Matrix-Assisted Laser Desorption/Ionization-Time-of-Flight Mass Spectrometer (Bruker Daltonics, Billerica, MA) as described [4]. Pure cultures were stored at -80°C in a 2 ml CryoSaver Brucella Broth with 10 % Glycerol, no beads, Cryovial, for preservation (Hardy Diagnostics). For genome extraction and sequencing, the preserved pure culture isolates were grown on 5 % sheep blood agar under aerobic conditions at 35°C for 24 h. Evidence codes-IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [40] Genomic DNA extraction was performed using a phenol-chloroform extraction protocol. Briefly, cells were resuspended in 0.5 mL DNA Extraction Buffer (20 mM Tris-Cl, 2 mM EDTA, 1.2 % Triton X-100, pH 8) followed by addition of 50uL Lysozyme (20 mg/mL), 30uL Mutanolysin, and 5uL RNase (10 mg/mL). After a 1 h incubation at 37°C, 80uL 10 % SDS, and 20uL Proteinase K were added followed by a 2 h incubation at 55°C. 210uL of 6 M NaCl and 700uL phenol-chloroform were then added. After a 30-min incubation with rotation, the solutions were centrifuged at 13,500 RPM for 10 min, and the aqueous phase was extracted. An equivalent volume of Isopropanol was then added, and solution was centrifuged at 13,500 RPM for 10 min after a 10-min incubation. The supernatant was decanted and the DNA pellet was precipitated using 600uL 70 % Ethanol. Following ethanol evaporation, the DNA pellet was resuspended in Tris-EDTA and stored at -20°C.

Genome sequencing and assembly
DNA samples were diluted in water to a concentration of 0.2 ng/ul as measured by a fluorometric-based method (Life Technologies, Carlsbad, CA) and 5 ul was used to obtain a total of 1 ng of input DNA. Library preparation was performed using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA) according to manufacturer's instructions. The isolates were barcoded, pooled and each isolate was sequenced twice, on two separate runs, using the Illumina MiSeq platform and the MiSeq Reagent Kit v2 (300-cycles) to produce 150 bp paired-end reads. Sequencing reads were parsed into individual folders according to the respective barcodes.
Sequence assembly was conducted using Velvet [16] ( Table 2). The tool VelvetOptimiser was used to determine the best hash length; 99 was used in the two assemblies performed here. The scaffolding software SSPACE [17] was utilized for scaffold finishing. The genome of strain E75 was assembled into 463 contigs. The genome of strain E78 was assembled into 62 contigs. To confirm that the contigs were Enterobacteriaceae (i.e. not the result of contamination), each contig was  [19]. Two plasmids were identified by the SPAdes assembler with high coverage. Querying these two contigs against the Gen-Bank nr/nt database revealed sequence homology to the annotated E. coli plasmids p2PCN033 (GenBank: CP006634) and pVR50G (GenBank: CP011141) (among other E. coli plasmids), respectively. These two plasmids are listed in Table 3 as Plasmid pE78.1 and Plasmid pE78.2, respectively. All 62 E78 contigs were also assessed for putative plasmid sequences using Plasmid-Finder [20]. While PlasmidFinder recognized pE78.2, it did not detect pE78.1. The complete genome of the E78 chromosome is thus represented within 60 contigs (mean coverage 272×).

Genome annotation
Genes were identified using GLIMMER using the g3from-scratch.csh script included in the package [21] The predicted CDSs were translated using the transeq script within the EMBOSS suite [22]. rRNA genes were identified by RNAmmer [23] using the parameter set to identify bacterial rRNA sequences. The program tRNA-Scan [24] identified tRNA sequences, using the parameter for bacterial tRNAs. Trans-membrane proteins were identified using TMHMM with standard parameters [25]. Sig-nalP [26] predicted signal peptides. All CDSs were queried (blastp) locally against the COG sequence dataset ( [14]) and assigned based upon their sequence homologies. CRISPR elements were detected through CRISPR-db [27]. Genes with Pfam domains were ascertained via searches of the Pfam database (E-value threshold 1.0) [28].

Genome properties
Tables 4 and 5 include the summaries of the properties and statistics of each genome. Sequencing of the E78 isolate identified two plasmids (Table 3); the E75 isolate did not contain any identifiable plasmid sequences. The E75 and E78 chromosomes are similar in length and GC content: E75 is 5,032,328 bp (GC content 50.4 %), while E78 is 5,021,201 bp (GC content 50.3 %). The genomes for E75 and E78 are predicted to include 4587 and 4743  The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome protein coding genes, respectively. A similar coding density is observed within the two genomes. The 85 RNA genes identified within the E75 genome include 78 tRNAs and 7 rRNAs. The E78 genome encodes for more RNA genes: 83 tRNAs and 13 rRNAs. The scaffolds of E75 and E78 are only annotated as having a single 16S rRNA gene, an underestimation due to recognized challenges of assembling sequences containing genes with multiple copies such as the rRNA genes [29] Thus, we fully expect that the E75 and E78 genomes harbor rRNA gene numbers on par with the genus.

Insights from the genome sequence
Although E75 was isolated from a woman who reported that she did not have symptoms of cystitis, its genome encodes proteins associated with E. coli pathogenesis, including the P pilus, RTX toxin, and α-fimbriae. These genes were not found in E78. While the E75 strain did not include plasmid sequences, genome sequencing of the E78 isolate contained two. Plasmid pE78.2 was nearly identical (one mismatch) to the E. coli plasmid pVR50G, collected from urine obtained from an individual with asymptomatic bacteriuria [30]. Both genomes included a number of prophages. Each prophage sequence within the genomes was BLASTed (blastx) to the nr/nt database revealing numerous hits to phage sequences annotated as infecting Escherichia spp. Annotations within the genomes of the temperate phages Lambda and P4 were identified most frequently within the E75 and E78 genomes, respectively. Table 6 lists the statistics of this search. The vast majority of the hits were to phages annotated as infectious for The total is based on the total number of protein coding genes in the genome Escherichia, Salmonella, and/or Shigella spp. Nevertheless, prophage sequences for both temperate as well as lytic phages were identified. The abundance of prophage sequences within these two genomes exceeds that previously identified in E. coli genomes.

Conclusions
The genome of E75, isolated from a woman who reported no symptoms of cystitis, is more closely related to the avian extraintestinal pathogen APEC 01. The genome of E75, isolated from a woman who reported cystitis symptoms, resides in a clade populated by human extra-intestinal strains that are either uropathogenic or asymptomatic bacteriuric. Both genomes contain an unusually large number of prophage sequences.