- Short genome report
- Open Access
Complete genome sequence of Mycobacterium tuberculosis K from a Korean high school outbreak, belonging to the Beijing family
Standards in Genomic Sciences volume 10, Article number: 78 (2015)
Mycobacterium tuberculosis K, a member of the Beijing family, was first identified in 1999 as the most prevalent genotype in South Korea among clinical isolates of M. tuberculosis from high school outbreaks. M. tuberculosis K is an aerobic, non-motile, Gram-positive, and non-spore-forming rod-shaped bacillus. A transmission electron microscopy analysis displayed an abundance of lipid bodies in the cytosol. The genome of the M. tuberculosis K strain was sequenced using two independent sequencing methods (Sanger and Illumina). Here, we present the genomic features of the 4,385,518-bp-long complete genome sequence of M. tuberculosis K (one chromosome, no plasmid, and 65.59 % G + C content) and its annotation, which consists of 4194 genes (3447 genes with predicted functions), 48 RNA genes (3 rRNA and 45 tRNA) and 261 genes with signal peptides.
Mycobacterium tuberculosis, the bacterium that causes tuberculosis (TB), is responsible for the highest mortality attributable to a single infectious agent. According to a 2014 World Health Organization report, an estimated 8.6 million people developed TB, and approximately 1.3 million people died from the disease worldwide. Strains of M. tuberculosis in different geographical locations or populations may have different levels of virulence due to co-evolutionary processes, which consequently lead to varying epidemiological dominance [2, 3].
Among the various M. tuberculosis strains, those belonging to the Beijing genotypes are more prone to induce disease progression and relapse from the latent state [4, 5]. For example, HN878, the causative agent of major TB outbreaks in Texas prisons between 1995 and 1998, belongs to the W-Beijing family, which expresses a highly biologically active lipid species (phenolic glycolipid). HN878 causes more rapid progression to death in mice than other clinical isolates (CDC1551) or standard laboratory-adapted virulent strains (H37RvT). In addition, the Beijing genotypes are associated with greater drug resistance than the other M. tuberculosis genotypes. The frequency of Beijing M. tuberculosis is estimated to be 85 to 95 % in South Korea. Recently, the Beijing strains have spread all over the world, including the US, Europe, and Africa, and account for over 13 % of all of the M. tuberculosis strains worldwide [10–12].
Unusually high rates of pulmonary TB occurred in senior high schools in Kyunggi Province in South Korea in 1998. During the national survey for genotyping analysis of clinical M. tuberculosis isolates in 1999, a single strain with a unique restriction fragment length polymorphism (RFLP) profile was the most frequently identified strain. This particular M. tuberculosis K strain phylogenetically belongs to the Beijing genotype and is the most dominant M. tuberculosis strain in South Korea.
M. tuberculosis K replicates rapidly during the early stages of infection in a murine model of TB, causing a more severe pathology and a high level of reactivation from latent infection. In addition, Bacillus Calmette-Guérin (BCG) vaccination is less effective against M. tuberculosis K infection than against H37RvT infection (unpublished data). These remarkable features of M. tuberculosis K are associated with its high transmissibility and dominance in South Korea. However, the molecular mechanisms of virulence and the pathogenicity-related genetic features of this strain remain unclear. To understand the genomic features of the strain in detail, we sequenced and annotated the complete genome of M. tuberculosis K.
Classification and features
A representative genomic rpoB gene of M. tuberculosis K was compared with those obtained using BLASTN with the default settings (only highly similar sequences). A single copy of the rpoB gene was found in the genome. The rpoB gene, which was derived from the M. tuberculosis K genome sequence, showed 99.97 % sequence similarity to that of M. tuberculosis H37RvT deposited in GenBank (GenBank accession: CP007803.1). We identified only one single-nucleotide polymorphism (C3225T) within the entire rpoB gene (3519 bp) in M. tuberculosis K compared to M. tuberculosis H37RvT. M. tuberculosis K shares a high nucleotide sequence similarity with M. tuberculosis H37RvT and other mycobacteria (Table 1, Fig. 1 and Additional file 1: Table S1). Figure 1 shows the phylogenetic position of M. tuberculosis K in the partial rpoB-based tree. For a more detailed analysis, the whole-genome sequences were used for an average nucleotide identity (ANI) analysis (Additional file 2: Figure S1). The ANI results showed that M. tuberculosis K belongs to the M. tuberculosis group but is separated from the other M. tuberculosis strains. The 16S rRNA gene sequence of M. tuberculosis K showed 100 % similarity with M. tuberculosis H37RvT.
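The reported 99.97 % similarity follows directly from one mismatch across the 3519-bp rpoB gene. The sketch below reproduces that arithmetic; it assumes an ungapped alignment, and the function name `percent_identity` is illustrative rather than part of any tool used in the study.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percent identity of an ungapped alignment with a given mismatch count."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= mismatches <= aligned_length:
        raise ValueError("mismatches must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Values taken from the text: a 3519-bp rpoB gene with a single SNP (C3225T).
RPOB_LENGTH_BP = 3519
RPOB_SNPS = 1

identity = percent_identity(RPOB_LENGTH_BP, RPOB_SNPS)
print(f"rpoB identity: {identity:.2f} %")  # prints "rpoB identity: 99.97 %"
```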
M. tuberculosis K is an aerobic, non-motile rod with a cell size of approximately 0.2–0.5 × 1.0–1.5 μm. It stains weakly positive under Gram staining and contains lipid bodies (Fig. 2). The colonies are slightly yellowish and appear rough and wrinkled on a 7H10-OADC plate (Fig. 3). The viable temperature range for growth is 4–37 °C, with optimum growth at 30–37 °C. The viable pH range is 5.5–8.0, with optimal growth at pH 7.0–7.5.
M. tuberculosis K is resistant to ampicillin, penicillin, chloramphenicol, erythromycin, azithromycin, clarithromycin and tetracycline, but it is susceptible to rifampicin, isoniazid, pyrazinamide, ethambutol, cycloserine, protionamide, amikacin, capreomycin, kanamycin, streptomycin, moxifloxacin, levofloxacin and ofloxacin.
To investigate the phenotype of M. tuberculosis K, we observed 106 M. tuberculosis K bacilli under transmission electron microscopy (TEM). Briefly, the immobilized bacteria were rinsed with phosphate-buffered saline (PBS) and fixed in 2.0 % paraformaldehyde and 2.0 % glutaraldehyde in 1x PBS with 3 mM MgCl2 (pH 7.2) for at least 1 h at room temperature. The bacterial cells were transferred to propylene oxide and were gradually infiltrated with a mixture of Spurr’s low-viscosity resin (Polysciences, Warrington, USA) and propylene oxide. After three changes of 100 % Spurr’s resin, the pellets were cured at 60 °C for two days. The sections were cut on an ultramicrotome using a Diatome diamond knife (Electron Microscopy Sciences, Hatfield, USA). Eighty-nanometer sections were picked up on formvar-coated 1 × 2-mm copper slot grids and stained with tannic acid and uranyl acetate followed by lead citrate. The grids were examined and photographed using TEM (JEM-1011, JEOL, Japan).
Genome sequencing information
Genome project history
Mycobacterium tuberculosis K and the other K-related strains comprise the most dominant genotype of M. tuberculosis in South Korea, but the genomic characteristics and genetic information regarding this strain are still poorly understood. This organism was selected to gain understanding of the molecular pathogenesis of the highly pathogenic and prevalent strain of M. tuberculosis in South Korea.
In this study, M. tuberculosis K was selected and sequenced as the reference strain for studying tuberculosis in Korea. We used two different sequencing methods: Sanger and Illumina. The Sanger sequencing was performed at the Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, South Korea. The Illumina sequencing, finishing and genome annotation were performed by ChunLab Inc., Seoul, Korea, and the finished genome sequence and the related data were deposited in GenBank under the accession number CP007803.1. Table 2 presents the project information and its association with MIGS version 2.0 compliance.
Growth conditions and genomic DNA preparation
M. tuberculosis K was kindly provided by the Korean Institute of Tuberculosis, Seoul, Korea. M. tuberculosis H37RvT, which is stored at the International Tuberculosis Research Centre (ITRC, Masan, South Korea), was also used in this study. M. tuberculosis was cultured aerobically at 37 °C in Middlebrook 7H10 media containing 0.02 % glycerol and 10 % OADC for 4 weeks.
From the M. tuberculosis cultures grown in the 7H10 media for a month, the bacterial DNA was isolated as previously described. In short, the bacilli in suspension were killed by heating at 80 °C for 30 min, and after centrifugation, the cell pellets were resuspended in 500 μl of TE buffer (0.01 M Tris–HCl, 0.001 M EDTA [pH 8.0]). The cells were treated with lysozyme (1 mg/ml) for 1 h at 37 °C, then with 10 % sodium dodecyl sulfate (SDS) and proteinase K (10 mg/ml) for 10 min at 65 °C prior to the DNA isolation. A total of 80 μl of N-cetyl-N,N,N-trimethyl ammonium bromide (CTAB) was then added to approximately 500 μl of the lysed cell suspension, and the suspension was vortexed briefly and incubated for 10 min at 65 °C. An equal volume of chloroform-isoamyl alcohol (24:1, vol/vol) was added, and the mixture was vortexed for 10 s. The solution was then centrifuged for 5 min, and 0.6 volumes of isopropanol were added to the supernatant to precipitate the DNA. After cooling for 30 min at −20 °C, the DNA solution was centrifuged for 15 min, and the pellet was washed once with 70 % ethanol. Finally, the air-dried pellet was redissolved in 50 μl of 0.1x TE buffer and stored at −20 °C until use.
Genome sequencing and assembly
The M. tuberculosis K genome was sequenced at KRIBB (Daejeon, South Korea) and ChunLab Inc. (Seoul, South Korea) using two Sanger libraries (a 2 kb random shotgun library and a fosmid library) and one Illumina library. The random shotgun and fosmid libraries were prepared using the pTZ19U vector and the CopyControl Fosmid Library Production Kit (Epicentre, Madison, USA), respectively. For the Illumina sequencing, the genomic DNA was fragmented using dsDNA fragmentase (NEB, Hitchin, UK) to obtain fragments of a size suitable for library construction. The resulting DNA fragments were processed using the TruSeq DNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, USA) following the manufacturer’s instructions. The final library was quantified using a Bioanalyzer 2100 (Agilent, Santa Clara, USA), and the average library size was 300 bp.
The genomic libraries were sequenced via Sanger sequencing on an ABI3730 and an Illumina MiSeq (Illumina, Inc., San Diego, USA). The generated Sanger sequencing reads (70,889 reads, total read length: 36,413,063 bp) and the Illumina paired-end sequencing reads (10,493,598 reads, total read length: 2,419,306,885 bp) were assembled using the Phred/Phrap/Consed package and CLC Genomics Workbench v6.5 (CLC bio, Aarhus, Denmark). The resulting contigs from the Sanger sequencing of the 2 kb random shotgun library were scaffolded by sequencing the reads from the fosmid clones, and the gaps in the scaffolds were closed using PCR and Sanger sequencing. The contigs and the Sanger sequence reads for the gap closure were combined via manual curation using Phred/Phrap/Consed and CodonCode Aligner 3.7.1 (CodonCode Corp., Centerville, USA). The final genome sequence was reviewed by remapping with the Illumina raw reads and correcting the dubious regions and errors.
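The read counts and total read lengths above imply a mean read length and a nominal sequencing depth for each library. The sketch below derives both, assuming depth is simply total sequenced bases divided by the final genome length; the function names are illustrative, and the derived values (roughly 8-fold for Sanger and roughly 550-fold for Illumina) are not figures stated by the authors.

```python
GENOME_LENGTH_BP = 4_385_518  # complete genome length reported in the text


def mean_read_length(total_bases: int, read_count: int) -> float:
    """Average read length in bp."""
    return total_bases / read_count


def nominal_depth(total_bases: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Nominal fold coverage: sequenced bases per genome base."""
    return total_bases / genome_length


# (read count, total read length in bp) as reported for each platform.
libraries = {
    "Sanger": (70_889, 36_413_063),
    "Illumina": (10_493_598, 2_419_306_885),
}

for name, (reads, bases) in libraries.items():
    print(
        f"{name}: mean read length {mean_read_length(bases, reads):.1f} bp, "
        f"nominal depth {nominal_depth(bases):.1f}x"
    )
```

Nominal depth ignores reads discarded during quality trimming or assembly, so the effective coverage of the finished sequence would be somewhat lower.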
The coding sequences (CDSs) were predicted by Glimmer 3.02. The tRNAs were identified using tRNAscan-SE, and the rRNAs were searched using HMMER with the EzTaxon-e rRNA profiles [20, 21]. For the functional annotation, the predicted CDSs were compared with the catalytic families (CatFam) and NCBI Clusters of Orthologous Groups (COG) databases using rpsBLAST, and with the NCBI reference sequences, SEED, TIGRFam, Pfam, Kyoto Encyclopedia of Genes and Genomes (KEGG) and InterPro databases using BLASTP and HMMER [22–25]. Additional analyses and functional annotations for the genome statistics were performed using the Integrated Microbial Genomes platform.
The total length of the complete genome sequence was 4,385,518 bp, and no plasmid was found. The G + C content was determined to be 65.59 %, which is similar to other M. tuberculosis strains (65–66 %) (Fig. 4 and Table 4). Based on the gene prediction results, 4194 CDSs were identified, and 45 tRNAs and 1 rRNA operon were annotated. The total length of the genes was 3,953,484 bp, which makes up 90.15 % of the entire genome. The majority of the genes (82.19 %) were assigned putative functions, while the remaining genes (17.81 %) were annotated as hypothetical. A total of 2610 CDSs were assigned to functional COG groups, 3349 genes were assigned to Pfam domains and 261 genes had signal peptides. The genome properties and statistics are summarized in Table 3. The distributions of the genes among the COG functional categories are shown in Tables 4 and 5.
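The percentages in this paragraph can be re-derived from the raw counts as a consistency check. The sketch below does so; the count of 3447 genes with predicted functions is taken from the abstract, and the helper name `percent` is illustrative.

```python
GENOME_LENGTH_BP = 4_385_518
GENE_LENGTH_BP = 3_953_484
TOTAL_CDS = 4194
CDS_WITH_FUNCTION = 3447  # "genes with predicted functions" in the abstract
CDS_IN_COG = 2610
CDS_WITH_PFAM = 3349
CDS_WITH_SIGNAL_PEPTIDE = 261


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


coding_density = percent(GENE_LENGTH_BP, GENOME_LENGTH_BP)
with_function = percent(CDS_WITH_FUNCTION, TOTAL_CDS)
hypothetical = 100.0 - with_function

print(f"Coding density:      {coding_density:.2f} %")  # 90.15 %
print(f"Predicted function:  {with_function:.2f} %")   # 82.19 %
print(f"Hypothetical:        {hypothetical:.2f} %")    # 17.81 %
print(f"Assigned to COGs:    {percent(CDS_IN_COG, TOTAL_CDS):.2f} %")
print(f"With Pfam domains:   {percent(CDS_WITH_PFAM, TOTAL_CDS):.2f} %")
print(f"With signal peptide: {percent(CDS_WITH_SIGNAL_PEPTIDE, TOTAL_CDS):.2f} %")
```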
M. tuberculosis strains in different populations or geographical locations can exhibit different levels of virulence during the human-adaptation process, with consequent varying epidemiological dominance. Importantly, clinical and epidemiological studies have demonstrated that the emergence of the Beijing strains may be associated with multi-drug resistance and a high level of virulence, resulting in increased transmissibility and rapid progression from infection to active disease. The M. tuberculosis K strain, which was isolated from an outbreak of pulmonary TB in senior high schools in South Korea, phylogenetically belongs to the Beijing genotype. Here, we present a summary classification and a set of genomic features of M. tuberculosis K together with the description of the complete genome sequence and annotation. The genome of the M. tuberculosis K strain is 4.4 Mbp with a G + C content of 65.59 %. The M. tuberculosis K genome contains several key virulence factors that are absent in the M. tuberculosis H37RvT genome, such as PE/PPE/PE-PGRS family proteins considered to be involved in granuloma formation and antigenic variation. Further functional analyses of the M. tuberculosis K-specific virulence factors involved in pathogenesis are currently under investigation. These studies may help us to understand the geographical evolution and molecular pathogenesis of this unique M. tuberculosis genotype.
Abbreviations
TEM: Transmission electron microscopy
ANI: Average nucleotide identity
RFLP: Restriction fragment length polymorphism
WHO. Global tuberculosis report. WHO report 2014.
Brown T, Nikolayevskyy V, Velji P, Drobniewski F. Associations between Mycobacterium tuberculosis strains and phenotypes. Emerg Infect Dis. 2010;16:272–80.
Bouwman AS, Kennedy SL, Muller R, Stephens RH, Holst M, Caffell AC, et al. Genotype of a historic strain of Mycobacterium tuberculosis. Proc Natl Acad Sci U S A. 2012;109:18511–6.
Thwaites G, Caws M, Chau TT, D'Sa A, Lan NT, Huyen MN, et al. Relationship between Mycobacterium tuberculosis genotype and the clinical phenotype of pulmonary and meningeal tuberculosis. J Clin Microbiol. 2008;46:1363–8.
de Jong BC, Hill PC, Aiken A, Awine T, Antonio M, Adetifa IM, et al. Progression to active tuberculosis, but not transmission, varies by Mycobacterium tuberculosis lineage in The Gambia. J Infect Dis. 2008;198:1037–43.
Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam TS, et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A. 1997;94:9869–74.
Barczak AK, Domenech P, Boshoff HI, Reed MB, Manca C, Kaplan G, et al. In vivo phenotypic dominance in mouse mixed infections with Mycobacterium tuberculosis clinical isolates. J Infect Dis. 2005;192:600–6.
Glynn JR, Whiteley J, Bifani PJ, Kremer K, van Soolingen D. Worldwide occurrence of Beijing/W strains of Mycobacterium tuberculosis: a systematic review. Emerg Infect Dis. 2002;8:843–9.
Shamputa IC, Lee J, Allix-Beguec C, Cho EJ, Lee JI, Rajan V, et al. Genetic diversity of Mycobacterium tuberculosis isolates from a tertiary care tuberculosis hospital in South Korea. J Clin Microbiol. 2010;48:387–94.
Bifani PJ, Mathema B, Kurepina NE, Kreiswirth BN. Global dissemination of the Mycobacterium tuberculosis W-Beijing family strains. Trends Microbiol. 2002;10:45–52.
Parwati I, van Crevel R, van Soolingen D. Possible underlying mechanisms for successful emergence of the Mycobacterium tuberculosis Beijing genotype strains. Lancet Infect Dis. 2010;10:103–11.
European Concerted Action on New Generation Genetic Markers and Techniques for the Epidemiology and Control of Tuberculosis. Beijing/W genotype Mycobacterium tuberculosis and drug resistance. Emerg Infect Dis. 2006;12:736–43.
Kim SJ, Bai GH, Lee H, Kim HJ, Lew WJ, Park YK, et al. Transmission of Mycobacterium tuberculosis among high school students in Korea. Int J Tuberc Lung Dis. 2001;5:824–30.
Jeon BY, Kwak J, Hahn MY, Eum SY, Yang J, Kim SC, et al. In vivo characteristics of Korean Beijing Mycobacterium tuberculosis strain K1 in an aerosol challenge model and in the Cornell latent tuberculosis model. J Med Microbiol. 2012;61:1373–9.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 2008;26:541–7.
Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A. 1990;87:4576–9.
Garrity GM, Holt J. The Road Map to the Manual. In: Bergey's Manual of Systematic Bacteriology, vol. 1. 2nd ed. New York: Springer; 2001. p. 119–69.
Stackebrandt ERF, Ward-Rainey NL. Proposal for a new hierarchic classification system, Actinobacteria classis nov. Int J Syst Bacteriol. 1997;47:479–91.
Zhi XY, Li WJ, Stackebrandt E. An update of the structure and 16S rRNA gene sequence-based definition of higher ranks of the class Actinobacteria, with the proposal of two new suborders and four new families and emended descriptions of the existing higher taxa. Int J Syst Evol Microbiol. 2009;59:589–608.
Skerman VBD, Mcgowan V, Sneath PHA. Approved Lists of Bacterial Names. Int J Syst Bacteriol. 1980;30:225–420.
Buchanan RE. Studies in the nomenclature and classification of bacteria. J Bacteriol. 1917;2:155–64.
Chester FD. Report of mycologist: bacteriological work. Del Agric Exp Stn Bull. 1897;9:38–145.
Runyon EH, Wayne LG, Kubica GP. Genus I. Mycobacterium Lehmann and Neumann 1896, 363. In: Buchanan RE, Gibbons NE, editors. Bergey's Manual of Determinative Bacteriology. 8th ed. Baltimore: The Williams and Wilkins Co; 1974. p. 682–701.
Lehmann KB, Neumann RO. Atlas und Grundriss der Bakteriologie und Lehrbuch der speziellen bakteriologischen Diagnostik, vol. 1. München: JF Lehmann; 1896. p. 1–448.
Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A. 1990;87:4576–9.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9.
van Soolingen D, Hermans PW, de Haas PE, Soll DR, van Embden JD. Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis. J Clin Microbiol. 1991;29:2578–86.
Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673–9.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–64.
Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–63.
Kim OS, Cho YJ, Lee K, Yoon SH, Kim M, Na H, et al. Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol. 2012;62:716–21.
Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33:5691–702.
Yu C, Zavaljevski N, Desai V, Reifman J. Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases. Proteins. 2009;74:449–60.
Pruitt KD, Tatusova T, Klimke W, Maglott DR. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2009;37:D32–36.
Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28:33–6.
Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39:783–91.
This research was supported by the Basic Science Research Program through the Ministry of Science, ICT, and Future Planning (NRF-2013R1A2A1A01009932). We would like to thank Dr. Hong-Seok Park (Korea Research Institute of Bioscience and Biotechnology) for performing the Sanger sequencing.
The authors declare that they have no competing interests.
SJH and TS performed all of the microbiological work and significantly contributed to the writing of the manuscript. SJH, TS, and YJC performed the molecular characterization and all of the bioinformatic analyses, including phylogenetic analysis, the genome assembly, and annotation. TS and YJC significantly contributed to the writing of the manuscript. JSJ and SYC performed all electron microscopy experiments. JC participated in genome sequencing and ensuring quality control of the data. GHB isolated the strain and prepared related clinical information regarding the isolate. SNC participated in the study design and helped to write the manuscript. SJS conceived the study as the supervisor of the project and was responsible for completing the manuscript. All authors read and approved the final manuscript.
Seung Jung Han, Taeksun Song and Yong-Joon Cho contributed equally to this work.
Associated MIGS record. (DOCX 36 kb)
The genome tree showing the relationships of M. tuberculosis K with other Mycobacterium species based on ANI values. Description of data: To convert the ANI into a distance, its complement to 1 was taken. From this pairwise distance matrix, an ANI tree was constructed using the UPGMA clustering method. (PDF 846 kb)
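The tree construction described for this additional file (distance = 1 − ANI, followed by UPGMA clustering) can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: the strain labels and ANI values are hypothetical placeholders, and the function names `ani_to_distance` and `upgma` are assumptions made for the example.

```python
from itertools import combinations


def ani_to_distance(ani):
    """Convert pairwise ANI fractions (0..1) to distances via 1 - ANI."""
    return {pair: 1.0 - value for pair, value in ani.items()}


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)                          # frozenset({x, y}) -> distance
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # New distances are size-weighted averages of the old ones.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical ANI values for illustration only (not data from this study).
labels = ["strain_A", "strain_B", "strain_C", "outgroup"]
ani = {
    frozenset(("strain_A", "strain_B")): 0.999,
    frozenset(("strain_A", "strain_C")): 0.998,
    frozenset(("strain_B", "strain_C")): 0.997,
    frozenset(("strain_A", "outgroup")): 0.950,
    frozenset(("strain_B", "outgroup")): 0.950,
    frozenset(("strain_C", "outgroup")): 0.950,
}

tree = upgma(labels, ani_to_distance(ani))
print(tree)
```

The nested tuples encode only the branching order; branch lengths are omitted to keep the sketch short.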
Cite this article
Han, S.J., Song, T., Cho, YJ. et al. Complete genome sequence of Mycobacterium tuberculosis K from a Korean high school outbreak, belonging to the Beijing family. Stand in Genomic Sci 10, 78 (2015). https://doi.org/10.1186/s40793-015-0071-4
- Mycobacterium tuberculosis
- Korean Beijing strain
- M. tuberculosis K complete genome
- TB clinical strain
- TB Beijing family