Permanent draft genome sequence of Frankia sp. NRRL B-16219 reveals the presence of canonical nod genes, which are highly homologous to those detected in Candidatus Frankia Dg1 genome

Frankia sp. NRRL B-16219 was directly isolated from a soil sample obtained from the rhizosphere of Ceanothus jepsonii growing in the USA. Its host plant range includes members of Elaeagnaceae species. Phylogenetically, strain NRRL B-16219 is closely related to “Frankia discariae” with a 16S rRNA gene similarity of 99.78%. Because of the lack of genetic tools for Frankia, our understanding of the bacterial signals involved during the plant infection process and the development of actinorhizal root nodules is very limited. Since the first three Frankia genomes were sequenced, additional genome sequences covering more diverse strains have helped provide insight into the depth of the pangenome and attempts to identify bacterial signaling molecules like the rhizobial canonical nod genes. The genome sequence of Frankia sp. strain NRRL B-16219 was generated and assembled into 289 contigs containing 8,032,739 bp with 71.7% GC content. Annotation of the genome identified 6211 protein-coding genes, 561 pseudogenes, 1758 hypothetical proteins and 53 RNA genes including 4 rRNA genes. The NRRL B-16219 draft genome contained genes homologous to the rhizobial common nodulation genes clustered in two areas. The first cluster contains nodACIJH genes whereas the second has nodAB and nodH genes in the upstream region. Phylogenetic analysis shows that Frankia nod genes are more deeply rooted than their sister groups from rhizobia. PCR-sequencing suggested the widespread occurrence of highly homologous nodA and nodB genes in microsymbionts of field collected Ceanothus americanus. Electronic supplementary material The online version of this article (10.1186/s40793-017-0261-3) contains supplementary material, which is available to authorized users.


Introduction
The symbiosis resulting from members of the genus Frankia interacting with the roots of 8 dicotyledonous plant families (referred to as actinorhizal plants) is found worldwide and contributes to the ability of actinorhizal pioneer plants to grow in poor and marginally fertile soils [1]. This symbiotic association has drawn interest because of its high rate of soil nitrogen input and the ability of the plants to overcome harsh environmental conditions [2]. The molecular mechanism for the establishment of an actinorhizal nitrogen-fixing root nodule remains elusive [3]. Molecular phylogeny of the Frankia genus has consistently identified four main clusters regardless of the typing locus used [1]. These Frankia clusters also follow and support the host specificity groups proposed by Baker [4]. Cluster 1 is divided into sub-cluster 1a, which includes F. alni and relatives that are infective on Alnus and Myricaceae, and sub-cluster 1b, whose strains are infective on Allocasuarina, Casuarina and Myricaceae and include F. casuarinae [5]. Cluster 2 contains F. coriariae [6] and uncultured microsymbionts of Coriariaceae, Datiscaceae, Dryadoideae and Ceanothus, while cluster 3 contains F. elaeagni [5], "F. discariae" [7] and closely related strains that are infective on Colletieae, Elaeagnaceae, Gymnostoma and Myricaceae. Finally, cluster 4 groups a broad range of non-nitrogen-fixing and infective strains, including F. inefficax [8], together with "F. asymbiotica" [9] and other related strains that are unable to establish a symbiosis with actinorhizal plants. As has been established for rhizobial and arbuscular mycorrhizal symbioses, LysM-RLKs are also involved in the perception of Frankia signal molecules by the actinorhizal plant [10,11]. However, the bacterial signals triggering this symbiosis remain unknown.
At present, more than 30 Frankia genomes from strains in pure culture have been sequenced and annotated [12-30], and two Candidatus genomes were generated from nodule metagenomes [31,32]. Analysis of the Frankia genomes failed to reveal the presence of the common canonical nodABC genes [33], which also appear to be missing in several photosynthetic [34] and non-photosynthetic [35] bradyrhizobia. The only exceptions were found in the two Candidatus Frankia genomes, which contained the canonical nodABC and sulfotransferase nodH genes [32,36]. This contradictory situation justifies additional sequencing of genomes from cultivated Frankia strains to gain insight into the depth of the pangenome pool covered. Here we report the first proof of the presence of canonical nodABCH genes, homologous to those of rhizobia, within the draft genome of a cultivated Frankia isolate, strain NRRL B-16219, and the widespread occurrence of nodAB in field-collected Ceanothus americanus microsymbionts.

Classification and features
Strain NRRL B-16219 metabolizes short fatty acids, TCA-cycle intermediates and carbohydrates (Table 1). It is infective on members of Elaeagnaceae and Morella cerifera and produces effective root nodules [4,37]. Consistent with its host range, strain NRRL B-16219 is phylogenetically affiliated to cluster 3, whose members are known to effectively nodulate members of the Elaeagnaceae, Rhamnaceae and Myricaceae families. Phylogenetic analysis based on the 16S rRNA gene sequence showed that strain NRRL B-16219 was most closely related to the type strains of "F. discariae" DSM 46785T (99.78%) and F. elaeagni (98.26%) (Fig. 1). Frankia sp. strain NRRL B-16219 shows typical Frankia morphological structures: branched hyphae, vesicles (the site of nitrogenase activity) and multilocular sporangia containing non-motile spores (Fig. 2).

Extended feature descriptions
Strain NRRL B-16219 represents one of the rare Frankia strains directly isolated from soil on plate medium without passing through a plant trapping assay. The strain was isolated from the rhizosphere of Ceanothus jepsonii [37] following a complex protocol of soil treatment with phenol (0.7%), sample fractionation through ultracentrifugation in a sucrose density gradient, and plating on solid DPM without a nitrogen source. Strain NRRL B-16219 developed unpigmented white colonies after 4 weeks of growth on DPM medium at 28°C without shaking. The strain was phenotyped using GENIII microplates in an Omnilog device (BIOLOG Inc., Hayward, USA) as previously described [5]. It was able to metabolize acetic acid, citric acid, D-cellobiose, dextrin, D-fructose, D-mannitol, D-mannose, fructose-6-phosphate, fusidic acid, glucose-6-phosphate, D- and L-malic acid, p-hydroxy-phenylacetic acid, propionic acid and D-serine, and to grow in the presence of 1% sodium lactate and up to 1% NaCl. Growth occurred between pH 5.0 and 6.8. The strain was tolerant only to rifamycin.

Genome project history
Because it is one of the rare strains isolated directly from soil, strain NRRL B-16219 was selected as part of an effort to gain insight into the depth of the pangenome pool and to identify symbiotic signaling molecules. The sequencing project was completed in April 2016, and the generated data were submitted as a draft genome to GenBank under BioProject PRJNA318440 and the accession number MAXA00000000.1.

Growth conditions and genomic DNA preparation
The studied strain was kindly provided by David Labeda, ARS USDA bacterial collection, under the strain ID NRRL B-16219. The strain was grown at 28°C in stationary culture in 1-l bottles containing DPM medium [5] supplemented with 0.5 mM NH4Cl as nitrogen source. Biomass from a 1-month-old culture was harvested by centrifugation at 9000 x g for 15 min and rinsed several times with sterile distilled water. The mycelial mats were broken by repeated passages through syringe needles of progressively smaller diameter (21G to 27G). Genomic DNA extraction was performed using Plant DNeasy kits (Qiagen, Hilden, Germany) following the recommendations of the manufacturer. Prior to genome sequencing, the quality of the isolated DNA was checked by using it as a template for PCR, and partial sequences of several housekeeping genes and the 16S rRNA gene were generated and analyzed [16]. Genome sequencing was performed at the Genome Studies center (University of New Hampshire, Durham, NH) using Illumina technology [38]. A standard Illumina shotgun library was constructed and sequenced using the Illumina HiSeq2500 platform with paired-end reads of 2 × 250 bp. The Illumina sequence data were trimmed with Trimmomatic version 0.32 [39] and assembled using SPAdes version 3.5 [40] and ALLPATHS-LG version r52488 [41].
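The read-trimming step mentioned above can be illustrated with a minimal Python sketch of sliding-window quality trimming. This is a simplified illustration of the general idea, not Trimmomatic's implementation, and the text does not state which trimming steps or parameters were used: the window size, the quality threshold and the Phred+33 encoding below are all assumptions, and the function names are hypothetical.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_keep(qualities, window=4, min_mean=15):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose mean quality drops below min_mean. The window size and
    threshold are illustrative assumptions, not values from the paper.
    """
    n = len(qualities)
    if n < window:
        return n  # read shorter than one window: left untouched here
    for start in range(n - window + 1):
        if sum(qualities[start:start + window]) / window < min_mean:
            return start
    return n


def trim_read(sequence, quality_string, window=4, min_mean=15):
    """Trim a read and its quality string to the retained length."""
    keep = sliding_window_keep(phred33(quality_string), window, min_mean)
    return sequence[:keep], quality_string[:keep]
```

With these assumed settings, a read whose qualities fall sharply toward the 3' end is shortened to the bases preceding the first low-quality window, while a uniformly high-quality read is returned unchanged.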

Genome annotation
The genome was annotated via the NCBI Prokaryotic Genome Annotation Pipeline. Additionally, nod gene prediction analysis was done using similarity search tools within the Integrated Microbial Genomes-Expert Review system [42] developed by the Joint Genome Institute, Walnut Creek, CA, USA. This whole-genome shotgun sequence has been deposited at DDBJ/EMBL/GenBank under the accession number MAXA00000000.1. The version described in this paper is the first version, MAXA00000000.1. A summary of the project information is shown in Table 2.

Genome properties
The draft genome of Frankia sp. NRRL B-16219 consisted of 289 DNA contigs, corresponding to an estimated genome size of 8,032,739 bp and a GC content of 71.7%. The draft genome contained 6859 total genes, including 6211 protein-encoding genes (90.55%), 561 pseudogenes (8.17%) and 53 RNA genes (0.76%) (Table 3). Classification of genes into the COG functional categories is shown in Table 4.
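Summary values such as the contig count, total assembly size and GC content can be recomputed from the assembly FASTA. The following Python sketch shows one way to do this; the function name and the toy input are hypothetical, and this is not the procedure used by the annotation pipeline.

```python
def assembly_stats(fasta_text):
    """Compute contig count, total length and GC% from FASTA-formatted text."""
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous contig, if any.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))

    total_bp = sum(len(c) for c in contigs)
    gc_bp = sum(c.count("G") + c.count("C") for c in contigs)
    gc_percent = 100.0 * gc_bp / total_bp if total_bp else 0.0
    return {"contigs": len(contigs), "total_bp": total_bp, "gc_percent": gc_percent}


# Toy example with two short, made-up contigs.
toy = ">contig_1\nGGCC\nAT\n>contig_2\nGCGC\n"
stats = assembly_stats(toy)
```

Applied to the deposited assembly, the same function would be expected to return values close to those reported above, although ambiguous bases (for example N) are counted in the total length and so slightly lower the GC percentage.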

Insights from the genome sequence
Comparison of genomes from Frankia sp. NRRL B-16219 and other Frankia species
The Frankia sp. NRRL B-16219 genome was compared to all of the Frankia genomes available in the NCBI genome database, comprising seven Frankia species (F. alni, F. casuarinae, F. elaeagni, F. coriariae, "F. discariae", F. inefficax and "F. asymbiotica"), two Candidatus Frankia and other Frankia sp. strains. As shown for other closely related strains from cluster 3, strain NRRL B-16219 has one of the largest genome sizes (8,032,739 bp) with a high GC content of 71.72%. Genes shown or suggested to be involved in the actinorhizal symbiosis were detected. Nitrogenase genes were organized into one operon, nifH-D-K-E-N-X-orf1-orf2-W-Z-B-U, and a non-linked nifV gene. Genes encoding the hydrogenase subunits were clustered into two operons. Genes for two different types of truncated hemoglobins, HbN and HbO, were also present.

Nodulation pathway
In rhizobia, the common canonical nodABC genes play a key role in triggering root nodule formation in legumes: they direct the synthesis of Nod factors, signal molecules that are secreted in response to host-plant flavonoids perceived by compatible rhizobial strains [43]. The Nod factors are perceived by the host plant through the LysM-RLKs, and the resulting signal transduction cascade triggers bacterial invasion of root cortical cells and the genesis of functional nodules. Despite the presence of these LysM-RLKs in the actinorhizal plants [11], none of the Frankia genomes from cultivated strains contained any homologous nod genes [33], although they are present in the two Candidatus Frankia genomes [32,36]. Six nod-like genes were detected in the NRRL B-16219 draft genome (Additional file 1: Table S1), organized into two regions (Fig. 3). The first cluster contained genes encoding nodA1, nodC, an ABC-2 type transport system ATP-binding protein (nodJ), an ABC-2 transporter efflux protein of the DrrB family (nodI) and nodH. The second cluster contained nodA, nodB and nodH genes. Amino acid sequence similarities of the Frankia sp. strain NRRL B-16219 NodA, B, C and H predicted proteins ranged from 86 to 93% with the uncultured Frankia (Dg1 and Dg2) and from 57 to 67% with (α- and β-) rhizobia (Additional file 2: Table S2). Further phylogenetic analysis (Fig. 4) showed that the Frankia Nod proteins were positioned at the root of both the α- and β-rhizobial NodABC proteins, as previously reported [4,8]. They were most closely related to those of plant-nodulating Betaproteobacteria of the genera Burkholderia and Paraburkholderia.
The GC content of Frankia nod genes ranged from 57.9% for nodA to 66.37% for nodB, which is quite similar to that of some rhizobial species including Methylobacterium and Burkholderia. For both Frankia and rhizobia, the GC content of the nod genes was lower than that of the total genome sequences.
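Pairwise comparisons of predicted proteins, such as those summarized above, rest on scoring aligned positions. The Python sketch below computes percent identity from two pre-aligned sequences. It is an illustration only: the text reports similarities without describing the tool or scoring scheme, so counting identical residues over ungapped columns is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are skipped (an assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Other conventions exist, for example dividing by the full alignment length or by the length of the shorter sequence, and they can shift the reported percentage by several points, so values from different studies are comparable only when the convention is known.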

Field collected microsymbionts of Ceanothus americanus contain nod genes
Root nodules were collected from Alnus glutinosa, Casuarina glauca and Elaeagnus angustifolia growing in Tunisia and from Ceanothus americanus and Elaeagnus umbellata growing in Durham, New Hampshire, USA. The nodA-nodB region from C. americanus nodules was PCR-amplified and sequenced. Following alignment of the nodA and nodB gene sequences of Dg1 and NRRL B-16219, the primer set (forward primer nodAF 5′-AGCGCGACCCGAGCTCAGGATAATCG-3′ and reverse primer nodBF 5′-CGATCCCACCCGGATGGAGCTGC-3′) was designed in this study. The sequenced PCR products were translated into amino acid sequences to permit the detection of the 23 aa at the beginning of the 193 aa NodA, the intergenic region (160 nucleotides) and the 41 aa at the end of the 230 aa NodB. Both sequences showed 100% similarity to the corresponding regions of the NodA (23/193 aa) and NodB (41/230 aa) protein sequences of Candidatus Frankia Dg1. Regardless of their affiliation to cluster 2 or to cluster 3 (Fig. 5), all of the analyzed C. americanus microsymbionts contained the nodAB genes. In contrast, no PCR product of the expected size was amplified from A. glutinosa, C. glauca, E. umbellata and E. angustifolia microsymbionts. This result is consistent with previous reports that no homologous nod genes are retrievable in sequenced genomes from strains isolated from these actinorhizal plant species [33].
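The primer checks and the translation step described above can be sketched in Python as follows. The primer strings are those given in the text with spaces removed; the helper names are hypothetical, and the codon table is the standard layout shared by NCBI translation tables 1 and 11, which is assumed here to be appropriate for Frankia coding sequences.

```python
from itertools import product

# Primer sequences from the text (5' to 3'), with spaces removed.
NODAF = "AGCGCGACCCGAGCTCAGGATAATCG"
NODBF = "CGATCCCACCCGGATGGAGCTGC"

# Standard codon table, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate DNA in the given frame; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


for name, primer in (("nodAF", NODAF), ("nodBF", NODBF)):
    print(name, len(primer), "nt", round(gc_percent(primer), 1), "% GC")
```

Because an amplicon can be read from either strand and in any of three frames, a sequenced product would typically be translated in several frames (for example `translate(reverse_complement(amplicon), frame)`) before the NodA and NodB segments are compared with the reference proteins.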

Conclusions
We report here the genome sequence of a Frankia strain directly isolated from soil rhizosphere. The generated draft genome was assembled into 289 contigs corresponding to 8,032,739 bp, which falls within the size range of Frankia cluster 3 [33]. Bacterial factors triggering actinorhizal symbiosis remain enigmatic, since many sequenced Frankia genomes have revealed the absence of universal Nod factors. It was hypothesized that most Frankia strains use a novel nod-independent pathway for the infection process of actinorhizal plants. In contrast, the two Candidatus Frankia genomes, Dg1 and Dg2, contain canonical nod genes [32,36]. Here we provide the first proof for the presence of nod genes in the genome of a cultivated Frankia strain. In addition, a PCR-sequencing approach suggested that nod genes are widespread only in C. americanus microsymbionts. This situation is similar to that of legume symbionts, for which two nodulation pathways are described: the well-studied nod-dependent pathway and an alternative nod-independent pathway. The majority of rhizobia use the nod-dependent pathway, while some photosynthetic [34] and non-photosynthetic [35] bradyrhizobia use the alternative nod-independent pathway. Moreover, some rhizobia use both pathways, and the use of the nod-independent pathway seems to depend on the host species rather than on the presence or absence of nod genes in a given bradyrhizobial genome [44]. For Frankia, almost all host plants are infected through the nod-independent pathway, while the nod-dependent process may be present only in unstudied actinorhizal species such as members of the genus Ceanothus.

Fig. 4 Maximum likelihood phylogeny based on amino acid sequences of NodA (a), NodB (b), NodC (c) and NodH (d). GC content is provided for nod genes and for genomes (in parentheses). Only bootstrap and probability values larger than 50% are shown.