Complete genome sequence of the new bacteriophage phiE142, which lyses both multidrug-resistant Escherichia coli O157:H7 and Salmonella enterica

The emergence of antibiotic-resistant foodborne bacteria is a global health problem that requires immediate attention. Bacteriophages are a promising biotechnological alternative for the control of bacterial pathogens. However, a detailed analysis of phage genomes is essential to assess the safety of phages prior to their use as biocontrol agents. Here we report the complete genome sequence of bacteriophage phiE142, which is able to lyse Salmonella and multidrug-resistant Escherichia coli O157:H7 strains. Bacteriophage phiE142 is assigned to the family Myoviridae on the basis of its long, non-flexible tail and icosahedral head. The genome is composed of 121,442 bp and contains 194 ORFs and 2 tRNAs. Furthermore, the phiE142 genome does not contain any genes coding for food-borne allergens, antibiotic resistance, or virulence factors, or any genes associated with lysogenic conversion. Bacteriophage phiE142 is characterized by a broad host range and compelling genetic attributes that make it a potential candidate as a biocontrol agent.

Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0211-5) contains supplementary material, which is available to authorized users.


Introduction
Foodborne diseases are an important cause of morbidity and mortality worldwide and are therefore a serious public health problem [1]. Bacteria cause the majority of foodborne illnesses; Escherichia coli and Salmonella are among the most common foodborne pathogens and affect millions of people annually [2]. Furthermore, the emergence of antimicrobial-resistant E. coli and Salmonella strains makes their control more difficult [3]. Hence, novel and environmentally friendly control methods for reducing the risk of bacterial food contamination are urgently needed.
In this context, bacteriophages have several potential applications in the food industry; these bacteria-killing viruses are alternatives to conventional antimicrobial methods for the control of pathogenic bacteria and have great potential to improve food safety [4][5][6]. Bacteriophages suitable for biocontrol purposes must be sequenced to ensure that they are strictly lytic (always lyse infected host cells) and do not encode any bacterial virulence factors or potentially allergenic proteins [7,8].
The primary aim of our research group is to increase knowledge of phage biodiversity and to contribute to the understanding of the different types of phages in several regions of Sinaloa, an important agricultural region in Northwestern Mexico. Recently, a new bacteriophage designated phiE142, one of the phages isolated, exhibited high potential as a biocontrol agent [9]. However, information about the genome of phage phiE142 is still limited; therefore, to further understand the biology of the phage, its genome was sequenced.

Classification and features
Bacteriophage phiE142 was previously isolated in the Food and Environmental Microbiology Laboratory at the Research Center for Food and Development from animal feces samples collected on a farm in Northwestern Mexico. E. coli strain EC-48 (the bacterium used for bacteriophage propagation and titration) was also isolated from the same geographical region, two years before the isolation of the phage [10]. Phage phiE142 produced clear plaques of 2 to 3 mm in diameter on the E. coli EC-48 lawn; the plaques were visible after four to six hours of incubation at 37°C.
We analyzed the lytic host range of the phage using spot test assays on different bacteria, including 48 Salmonella strains and 33 E. coli strains (Additional file 1: Table S1). Based on the spot test results, phage phiE142 had lytic activity against 76% of the E. coli strains and 29% of the Salmonella strains tested. These results indicate that bacteriophage phiE142 merits evaluation as an alternative strategy for the biocontrol of E. coli and Salmonella.
Phage phiE142 was stained with 2% uranyl acetate, examined by transmission electron microscopy (TEM), and assigned to a viral morphotype according to Ackermann's classification [11]. The analysis suggests that phage phiE142 belongs to the order Caudovirales and family Myoviridae, based on the presence of an almost isometric head with an average diameter of ∼58 nm and a long, non-flexible, contractile tail about 120 nm in length (Fig. 1) [12]. Phage phiE142 has a genome of 121,442 bp, with a coding region of 94.4%, a GC content of 37.4%, and a gene density of 1.60 genes per kb. It contains 194 coding sequences ranging from 102 bp to 3,300 bp, with 53 genes on the positive strand and 141 genes on the negative strand. Phylogenetic characteristics of this phage are indicated in Table 1.
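The summary statistics quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming that gene density is expressed as genes per kilobase (the source gives no unit, but 194 genes over 121,442 bp is consistent with the reported 1.60). The function names are hypothetical and are not part of the authors' pipeline.

```python
def gene_density(n_genes: int, genome_bp: int) -> float:
    """Return genes per kilobase (assumed unit for the reported density)."""
    return n_genes / (genome_bp / 1000)


def coding_bp(genome_bp: int, coding_fraction: float) -> int:
    """Return the number of base pairs covered by coding regions."""
    return round(genome_bp * coding_fraction)


def gc_content(seq: str) -> float:
    """Return GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Values taken from the text: 194 coding sequences, 121,442 bp, 94.4% coding.
    print(f"gene density: {gene_density(194, 121_442):.2f} genes/kb")
    print(f"coding bases: {coding_bp(121_442, 0.944)} bp")
```

The `gc_content` helper would be applied to the deposited sequence (GenBank accession KU255730) to reproduce the 37.4% figure; the sequence itself is not included here.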
The DNA polymerase sequence has become a commonly used marker for phylogenetic analysis; therefore, the phylogenetic tree was constructed from the deduced amino acid sequences of DNA polymerase. According to the phylogenetic tree, phage phiE142 and eight other phages that infect members of the bacterial family Enterobacteriaceae clustered in the same group (Figs. 2 and 3). All of these phages are members of the Tevenvirinae subfamily and are strictly lytic (based on the PHACTS program server). Considering the close relationship among these phages, it is likely that phiE142 also belongs to this subfamily. This result confirms the findings obtained by electron microscopy.

Genome project history
The genome of bacteriophage phiE142 is one of the first publicly available, completely sequenced genomes of a phage that infects E. coli and Salmonella strains and was isolated from environmental sources in Northwest Mexico. The analysis of more bacteriophage genomes is necessary to increase our understanding of the genetic diversity of bacteriophages, phage biology, and basic molecular mechanisms, and to provide deeper insight into the relationship of phages with their hosts. Furthermore, analysis of phage genomes may reveal novel antimicrobial peptides and enzymes with bactericidal activity. In addition, a well-characterized genome is an essential prerequisite for ensuring the safety of phages prior to their use as biocontrol agents. Therefore, the genome project was deposited in the Genomes On Line Database (GOLD). The genome sequence of bacteriophage phiE142 was deposited in GenBank under accession number KU255730. A summary of the genome project is given in Table 2.

Growth conditions and genomic DNA preparation
The standard double-layer agar plate method was used to obtain high-titer stocks of phage phiE142 [13], with some modifications. Briefly, 100 μl of phage stock and 1 ml of an overnight culture of E. coli strain EC-48 were mixed with 3 ml of TSB containing 0.4% agarose, spread on TSA plates, and incubated overnight at 37°C. The phage was then collected by adding 6 ml of SM buffer (50 mM Tris-HCl, pH 7.5, 0.1 M NaCl, 8 mM MgSO4, 0.01% gelatin) to the surface of each plate and scraping the soft agar off the surface of the agar plates. Cell debris was removed by centrifugation at 5,500 × g for 10 min, the supernatant was filtered through 0.22 μm syringe filters, and phage particles were pelleted by centrifugation at 40,000 × g at 4°C for 2 h. The phage pellet was suspended in SM buffer and stored at 4°C.
Bacteriophage DNA was isolated by the proteinase K and phenol-chloroform method as previously described [14], with minor modifications. One milliliter of purified phage suspension was treated with 1 μg/ml of DNase I and RNase A (Sigma-Aldrich) at 37°C for 1 h. Subsequently, sodium dodecyl sulfate (final concentration, 0.5%), EDTA (20 mM, pH 8.0), and proteinase K (final concentration, 25 μg/ml) were added, and the suspension was incubated at 56°C for 1 h. Proteins were then removed with an equal volume of phenol-chloroform (1:1), and DNA was precipitated from the aqueous phase with cold ethanol. Following centrifugation at 15,000 × g for 15 min at 4°C, the pellet was washed twice with 70% ethanol and centrifuged under the same conditions. Finally, the dried DNA pellet was suspended in nuclease-free water. The concentration of phage DNA was estimated with a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE), and the quality of the extracted DNA was checked visually by electrophoresis on a 1% agarose gel.

Genome sequencing and assembly
High-throughput sequencing of phage genomic DNA was performed using HiSeq 2000 technology (Illumina) to produce 100 bp paired-end reads; library construction and sequencing were performed according to the manufacturer's instructions. In total, about 18 million paired reads of 100 bases in length were obtained with a quality filter threshold of Q30. The reads were quality checked using FastQC, and the Geneious software package R8 (Biomatters Ltd., New Zealand) was used to trim raw reads with low quality scores. De novo assembly was conducted with Velvet (implemented in Geneious, running VelvetOptimiser for k-mer selection), resulting in one final contig with approximately 10,000-fold coverage. Additional manual functional annotation and genome mapping were performed using Geneious.
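As a rough sanity check on sequencing depth, expected coverage can be estimated as total sequenced bases divided by genome length. The sketch below is an illustration only, with a hypothetical function name. The text does not state whether "18 million" counts individual reads or read pairs, so both readings are shown as assumptions. Either raw estimate exceeds the reported figure of approximately 10,000-fold, which presumably reflects the reads retained after trimming and used in the assembly, a number the text does not give.

```python
def expected_coverage(n_reads: int, read_len_bp: int, genome_bp: int) -> float:
    """Return the expected mean fold coverage: total bases / genome length."""
    return n_reads * read_len_bp / genome_bp


if __name__ == "__main__":
    genome_bp = 121_442   # genome length reported in the text
    read_len_bp = 100     # read length reported in the text

    # Assumption A: 18 million individual reads in total.
    cov_reads = expected_coverage(18_000_000, read_len_bp, genome_bp)
    # Assumption B: 18 million read pairs, i.e. 36 million reads.
    cov_pairs = expected_coverage(36_000_000, read_len_bp, genome_bp)

    print(f"raw coverage if 18 M reads: {cov_reads:,.0f}x")
    print(f"raw coverage if 18 M pairs: {cov_pairs:,.0f}x")
```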

Genome annotation
Open reading frames (ORFs) were identified using Glimmer 3.02 [15], GeneMark.hmm [16], and ORF Finder [17]. The putative functions of the ORFs were analyzed by BLASTp searches with an E-value cutoff of 10⁻⁴. Predicted protein sequences were analyzed with InterProScan [18], Pfam [19], and TMHMM Server version 2.0 [20] for conserved domain identification. Signal peptides were predicted using SignalP 4.1. Putative tRNA-encoding genes were searched for using ARAGORN [21] and tRNAscan-SE [22]. The origin of replication was predicted using a GC-skew plot generated by GenSkew [23]. Moreover, all identified ORFs were compared against the virulence factor database [24] and the ResFinder database [25]. Additionally, the predicted phage protein sequences were searched for potentially allergenic proteins using tools from the Food Allergy Research and Resource Programme [26]. The lifestyle of the phage was predicted using the PHACTS program [27]. Whole genome comparisons were carried out using Mauve [28].
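The GC-skew analysis mentioned above can be illustrated with a short sketch. GC skew is computed per window as (G − C)/(G + C), and the minimum of the cumulative skew is commonly taken as a candidate origin of replication. This is a generic illustration of the technique and not GenSkew's implementation. The window size, step size, and function names are assumptions chosen for the example.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int = 1000, step: int = 1000) -> list[float]:
    """Return (G - C) / (G + C) for each window along the sequence."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C carries no skew signal.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew_minimum(seq: str, window: int = 1000, step: int = 1000) -> int:
    """Return the end coordinate of the window where cumulative skew is lowest."""
    skews = gc_skew(seq, window, step)
    if not skews:
        raise ValueError("sequence is shorter than one window")
    cumulative = list(accumulate(skews))
    idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    return idx * step + window


if __name__ == "__main__":
    # Toy sequence: a C-rich half followed by a G-rich half, so the
    # cumulative skew falls across the first half and rises afterwards.
    toy = "C" * 50 + "G" * 50
    print(gc_skew(toy, window=10, step=10))
    print(cumulative_skew_minimum(toy, window=10, step=10))
```

On a real genome the sequence would be read from the deposited record, and the predicted position would be treated as a hypothesis to compare with the GenSkew plot rather than as a definitive origin.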

Genome properties
The detailed annotation of the phage genome is summarized in Table 3. The phage has a DNA genome of 121,442 bp with a GC content of 37.4%, which is significantly lower than that of the host E. coli (about 50% GC). Genome analysis of the phage revealed 194 putative open reading frames (coding regions make up 94.4% of the genome), with 26 in the forward orientation and 168 in the reverse orientation, and two tRNA genes were identified. Based on BLAST results, functions were assigned to 95 of the genes; most of the annotated genes (98 genes) encoded hypothetical proteins, probably because of the enormous diversity of bacteriophages and the limited database information on the functional genes of phages. Only one gene product is a hypothetical novel protein (Additional file 2: Table S2). The distribution of the ORFs into COG functional categories is provided in Table 4.

Insights from the genome sequence
BLAST results revealed that the genome of phage phiE142 has high similarity (query coverage, 94%; identity, 97%) to that of coliphage vB_EcoM_PhAPEC2, which belongs to the genus T4-like viruses of the Tevenvirinae subfamily, an observation that is consistent with the analysis of the DNA polymerase. We therefore concluded that phiE142, based on sequence similarity, belongs to the Tevenvirinae subfamily. However, some differences in genome organization were observed: progressive Mauve genome alignment revealed one colinear block that is in a different order in the two bacteriophages (Additional file 3: Figure S3). The principal region of genomic dissimilarity was located between 110,000 bp and 121,000 bp; this region includes a set of ORFs associated with phage-host recognition, suggesting specific features of phage evolution. The phiE142 genome is functionally organized into four modules containing gene clusters for virion morphogenesis, DNA replication/regulation, DNA packaging, and host cell lysis. This modular organization of the genome is typical of bacteriophages.
Thirty-one ORFs were found to encode proteins involved in the morphogenesis of virions. These include ORFs 1-3, 170, 172, 175-185, and 187-194, which are proposed to encode the components of the tail fiber and baseplate. Database homology searches suggested that ORFs 46, 139, 142, and 174 encode capsid proteins. Additionally, the proteins encoded by ORFs 185 and 186 are most similar in amino acid sequence to neck proteins.
Overall, a total of 46 ORFs are associated with processing of the viral DNA. Our analysis of the phage genome revealed several genes potentially involved in nucleotide metabolism, including ORFs 14-15, 38-39, 47, 64, 70, 96, 100-101, 125, and 171. In addition, genes that encode proteins involved in replication and transcription of the phage DNA were identified in ORFs 5, 7, 12-13, 18, 20-21, 24-25, 28-29, 32, 34-35, 37, 49, 56, …
Two ORFs exhibit similarity to genes involved in host cell lysis, namely those encoding endolysin and holin. The protein encoded by ORF 143 displays a high degree of identity with endolysin. This ORF contains one glycohydrolase domain (which hydrolyzes the beta-1,4-glycosidic bond between N-acetylmuramic acid and N-acetylglucosamine), indicating that the protein is probably an enzyme that degrades peptidoglycan. The putative protein encoded by ORF 4 was identified as a holin. Unusually, this ORF is not located adjacent to the endolysin ORF; in most bacteriophage genomes, the holin ORF is adjacent to or overlaps an ORF encoding an endolysin. The deduced holin encoded by phage phiE142 has one putative transmembrane domain and thus resembles class III holins.
The phage lifestyle prediction from PHACTS indicated that phiE142 is a virulent phage, consistent with the results of the genomic analysis, which revealed the absence of genes associated with the establishment and maintenance of the lysogenic cycle.
The DNA packaging module includes ORF 60, which encodes the putative portal protein. However, it was not possible to identify the terminase subunits.