Insights into Cedecea neteri strain M006 through complete genome sequence, a rare bacterium from aquatic environment

Cedecea neteri M006 is a rare bacterium isolated from the environment, specifically from the Sungai Tua waterfall in a tropical rainforest (Gombak, Selangor, Malaysia). It is a Gram-negative, facultatively anaerobic bacillus. Here, we describe the features of Cedecea neteri M006, together with its genome sequence and annotation. The genome comprises 4,965,436 bp with 4447 protein-coding genes and 103 RNA genes.


Introduction
The Cedecea genus is an extremely rare member of the Enterobacteriaceae family [1]. The name Cedecea was proposed in 1980 for a new genus formerly designated as CDC Enteric Group 15 [1,2]. Cedecea is characterized by positive lipase activity, resistance to colistin and cephalothin, and the inability to hydrolyze gelatin or DNA [3][4][5]. Cedecea was first discovered in human sources, and its natural environmental habitat remains unknown; it is a rare pathogen of rising importance [6]. To date, only a few species of Cedecea have been identified: C. davisae, C. lapagei and C. neteri. All three species exhibit different behaviors in the human body. C. davisae has been reported to be associated with scrotal abscess [7] and, most recently, to cause bacteremia in patients with sigmoid colon cancer [8]. C. lapagei, on the other hand, has mostly been reported in pneumonia cases [5,9]. C. neteri is associated with bacteremia in heart disease patients [4] and in patients with systemic lupus erythematosus [10].
Strain M006 is a Cedecea neteri strain isolated from water at the Sungai Tua Waterfall, a Malaysian tropical rainforest waterfall (N 03 19.91′ E 101 42.15′). In this study, we present an overview of the classification and features of C. neteri M006 as well as its genome sequence and annotation. A few C. neteri aquatic isolates have been deposited in GenBank, and strain M006 is one of the few isolates recovered from a waterfall; its genome features have not previously been reported. Hence, we report here for the first time the genome information of C. neteri M006 isolated from a waterfall environment.

Classification and features
Strain M006 was categorized as a member of the genus Cedecea by 16S rRNA phylogeny and phenotypic characteristics (Table 1). The EzTaxon database [11] was used for preliminary identification based on the 16S rRNA gene sequence. Strain M006 was most closely related to C. neteri GTC 1717T (GenBank accession = AB086230), with a sequence similarity of 99.78%. Subsequent phylogenetic analysis compared the 16S rRNA gene sequences of strain M006 and related species (Fig. 1). The sequences were aligned and phylogenetic trees were built using the neighbor-joining (NJ) and maximum-likelihood (ML) methods implemented in MEGA version 5 [12].
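The similarity value quoted above is a pairwise identity over aligned 16S rRNA gene sequences. The following is a minimal sketch of such a calculation, assuming gapped columns are excluded from the comparison; the exact scoring used by EzTaxon is not described here and may differ, and the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other tools may count them as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data)
query = "ACGTTGCA-GGCTA"
ref = "ACGTTGCATGGCTT"
print(round(percent_identity(query, ref), 2))
```

In the toy example, 13 ungapped columns are compared and 12 of them match.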
C. neteri M006 cells are Gram-negative bacilli (0.6-0.7 × 1.3-1.9 μm) that are facultatively anaerobic and motile, with 5-9 peritrichous flagella. Colonies formed on nutrient agar are 1.5 mm in diameter and non-pigmented; they are whitish in color and round with a smooth edge. Scanning electron micrographs of cultures grown in nutrient broth showed free-floating cells and clotted cells (Fig. 2). The carbon sources utilized by C. neteri are D-sorbitol, sucrose, D-xylose and malonate. C. neteri is reported to be unable to utilize dulcitol, adonitol, L-rhamnose, erythritol, glycerol and mucate. The optimal temperature for strain M006 is 28°C.
Table 1 (classification entries): Phylum Proteobacteria, TAS [23,24]; Class Gammaproteobacteria, TAS [25][26][27]; Order unknown, TAS [23]; Family Enterobacteriaceae, TAS [28][29][30]; Genus Cedecea, TAS [4]. Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [31].
Fig. 1 Phylogenetic tree highlighting the position of Cedecea neteri M006 relative to the type strains of other species within the genus Cedecea. The strains and the corresponding GenBank accession numbers of their 16S rRNA genes are indicated in parentheses. The sequences were aligned and the phylogenetic inferences were obtained using the maximum-likelihood method with MEGA version 5 [12]. The numbers at the nodes are bootstrap percentages obtained from 500 replicates. Bar, 0.01 substitutions per nucleotide position.
Signaling molecules known as N-acylhomoserine lactones are produced for communication purposes in order to regulate physiological properties. Preliminary screening of strain M006 using the bacterial biosensor Chromobacterium violaceum CV026 showed purple pigmentation, indicative of the presence of signaling molecules (Fig. 3).

Genome sequencing information
Genome project history
Strain M006 was selected for sequencing based on its phylogenetic position and the similarity of its 16S rRNA to other members of the genus Cedecea. The genome project was deposited in the Genomes On-Line Database [13] and the genome sequence was deposited in GenBank (CP009458.1). A summary of the project [14] is shown in Table 2.

Growth conditions and genomic DNA preparation
Cedecea neteri M006 was cultured aerobically on Luria-Bertani (LB) agar medium at 28°C overnight (16-18 h). Genomic DNA was extracted using the MasterPure™ DNA Purification Kit (Epicentre Inc., Madison, WI, USA). The quality of the extracted genomic DNA was examined using a NanoDrop spectrophotometer (Thermo Scientific, Waltham, MA, USA) and a Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA).

Genome sequencing and assembly
The genome of strain M006 was sequenced at the microbiome lab, High Impact Research, University Malaya, using a Pacific Biosciences single-molecule real-time (PacBio SMRT) sequencer. The sequencing was carried out using P5 chemistry on two SMRT cells with a 20-kb SMRTbell library [15]. De novo assembly of 41,094 reads using the hierarchical genome assembly process in the SMRT portal version 2.1.1 resulted in one contig of 4,965,436 bp (about 4.97 Mb). The average sequencing coverage is 74.34× and the genome has a GC content of 54.41%.
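The assembly figures above are linked by simple arithmetic: average coverage is the total number of sequenced bases divided by the genome length. The sketch below works backwards from the reported values, assuming the 74.34× coverage was computed over all 41,094 reads (this is an assumption; the coverage may have been calculated on a filtered or corrected read set). The function names are illustrative only.

```python
GENOME_LENGTH_BP = 4_965_436  # chromosome length reported for strain M006
NUM_READS = 41_094            # reads used in the de novo assembly
MEAN_COVERAGE = 74.34         # reported average coverage (fold)


def implied_total_bases(coverage: float, genome_length: int) -> float:
    """Total sequenced bases implied by coverage = total_bases / genome_length."""
    return coverage * genome_length


def implied_mean_read_length(coverage: float, genome_length: int, num_reads: int) -> float:
    """Mean read length, assuming every read contributes to the coverage figure."""
    return implied_total_bases(coverage, genome_length) / num_reads


def gc_content(sequence: str) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


total = implied_total_bases(MEAN_COVERAGE, GENOME_LENGTH_BP)
mean_len = implied_mean_read_length(MEAN_COVERAGE, GENOME_LENGTH_BP, NUM_READS)
print(f"implied total bases: {total / 1e6:.1f} Mb")
print(f"implied mean read length: {mean_len:.0f} bp")
```

Under that assumption the reads average roughly 9 kb, which is plausible for a 20-kb SMRTbell library; `gc_content` applied to the assembled contig would be the analogous check for the 54.41% figure.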

Genome annotation
After assembly, the genome was analyzed using the Rapid Annotation using Subsystem Technology (RAST) server (version 2.0) [16], which identified 4423 predicted coding sequences and a total of 103 RNA genes. The predicted open reading frames were annotated by searching clusters of orthologous groups (COGs) [17] using the Integrated Microbial Genomes Expert Review (IMG-ER) platform [18]. The rRNA and tRNA genes were identified using RNAmmer 1.2 [19] and tRNAscan-SE 1.23 [20], respectively. Additional gene prediction analysis and functional annotation were performed within the IMG-ER platform.

Genome properties
The genome comprises a circular chromosome with a length of 4,965,436 bp and a G + C content of 54.41% (Fig. 4 and Table 3). It is composed of one contig. Of the 4550 predicted genes, 4447 are protein-coding genes. The properties of and the statistics for the genome are summarized in Table 3. The distribution of genes into COG functional categories is presented in Table 4.
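The gene counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in this section; the derived quantities (protein-coding fraction, gene density) are illustrative and are not values taken from Table 3.

```python
GENOME_LENGTH_BP = 4_965_436   # chromosome length
TOTAL_GENES = 4550             # all predicted genes
PROTEIN_CODING_GENES = 4447    # protein-coding genes

# Genes that are not protein-coding; this matches the 103 RNA genes reported.
non_coding_genes = TOTAL_GENES - PROTEIN_CODING_GENES

# Share of predicted genes that encode proteins, in percent.
protein_coding_pct = 100.0 * PROTEIN_CODING_GENES / TOTAL_GENES

# Gene density expressed as predicted genes per kilobase of chromosome.
genes_per_kb = TOTAL_GENES / (GENOME_LENGTH_BP / 1000)

print(f"non-protein-coding genes: {non_coding_genes}")
print(f"protein-coding share: {protein_coding_pct:.2f}%")
print(f"gene density: {genes_per_kb:.3f} genes/kb")
```

The difference of 103 agrees with the RNA gene count given in the abstract and in the annotation section.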

Insights from the genome sequence
RAST annotation provided insight into the subsystem category distribution of C. neteri strain M006. These categories aid the understanding of various functional roles such as protein classes, amino acid biosynthesis and metabolic pathways. There are 552 subsystems. The most abundant subsystem feature belonged to carbohydrate metabolism (n = 576; out of a total of 3760 subsystem feature counts), followed by amino acids and derivatives (n = 495) and protein metabolism (n = 299) (Fig. 5). One subsystem category, regulation and cell signaling, was examined in detail to search for functional genes related to quorum sensing (QS) activity. The in silico analysis identified a novel LuxI/R homologue pair of C. neteri, which was later designated CneIR. The complete open reading frames of the cneI and cneR homologues of C. neteri strain M006 are 462 bp and 723 bp, respectively. The complete genome sequence allows a deeper understanding of the genetic makeup, which may help in identifying links between pathogenicity, virulence factors and QS properties [15]. Currently, few genomes of this genus are available. Only 5 complete genomes of C. neteri strains, including strain M006, and a draft genome of the type strain NBRC 105707 are deposited in NCBI. A matrix and dendrogram were generated using the AAI calculator, which estimates the average amino acid identity between genomic datasets of proteins using best hits (one-way AAI) and reciprocal best hits (two-way AAI) [21]; C. davisae type strain DSM 4568 was included in the analyses. The analyses show that the proteins of strain M006 cluster most closely with those of strain ND14a (Fig. 6).
Fig. 6 Analyses of conserved genes in the core genome computed with the AAI calculator: (a) AAI matrix; (b) AAI-based phylogenetic distance tree, clustered according to distance pattern. The AAI-distance tree was clustered using the BIONJ method.
Some basic comparisons of the genomes are listed in Table 5.
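The distinction between one-way and two-way AAI can be illustrated with a small sketch. It assumes that protein similarity search results are already available as (query, subject, percent identity, bit score) tuples, that the best hit is the one with the highest bit score, and that a reciprocal pair is scored as the mean of its two identities. These are simplifying assumptions; the AAI calculator cited above [21] may apply additional identity and alignment-length filters. All names and the toy hits are hypothetical.

```python
from typing import Dict, List, Tuple

# (query protein, subject protein, percent identity, bit score)
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for query, subject, identity, score in hits:
        if query not in best or score > best[query][2]:
            best[query] = (subject, identity, score)
    return {q: (s, ident) for q, (s, ident, _) in best.items()}


def one_way_aai(a_vs_b: List[Hit]) -> float:
    """Mean identity of the best hit of every protein in genome A against genome B."""
    bh = best_hits(a_vs_b)
    if not bh:
        raise ValueError("no hits supplied")
    return sum(ident for _, ident in bh.values()) / len(bh)


def two_way_aai(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> float:
    """Mean identity over reciprocal best hits only."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    identities = []
    for query, (subject, ident_ab) in ab.items():
        # A pair is reciprocal when the subject's own best hit points back to the query.
        if subject in ba and ba[subject][0] == query:
            identities.append((ident_ab + ba[subject][1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best hits found")
    return sum(identities) / len(identities)


# Toy hit tables (hypothetical values, not taken from the genomes compared here)
a_vs_b = [("a1", "b1", 98.0, 500.0), ("a1", "b2", 60.0, 200.0),
          ("a2", "b2", 90.0, 400.0), ("a3", "b2", 70.0, 150.0)]
b_vs_a = [("b1", "a1", 98.0, 500.0), ("b2", "a2", 90.0, 400.0)]

print(one_way_aai(a_vs_b))
print(two_way_aai(a_vs_b, b_vs_a))
```

In the toy data, protein a3 has a best hit in genome B that is not reciprocated, so it contributes to the one-way value but is excluded from the two-way value.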

Conclusion
This study provides phenotypic and genomic insights into Cedecea neteri strain M006. It reports, for the first time, the genome of a C. neteri strain isolated from a waterfall environment. This study also revealed the QS ability of C. neteri.