
Table 3. Selected annotation report examples¹

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

Column groups: Chromosome (No. of replicons, Length, GC); Feature counts (No. of proteins, No. of RNAs, No. of amino acids with tRNA, No. of hypothetical proteins); Calculated values (Coding density, Avg. and Min. protein length, Short proteins, Standard start codon).

| Bioproject ID² | Organism name | No. of replicons | Length (Mbp) | GC (%) | No. of proteins | No. of RNAs | No. of amino acids with tRNA⁵ | No. of hypothetical proteins³ | Coding density⁴ (proteins/kbp) | Avg. protein length (aa) | Min. protein length (aa) | Short proteins (%)⁶ | Standard start codon (%)⁷ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 225 | Escherichia coli str. K-12 substr. MG1655 | 1 | 4.640 | 50.79 | 4,144 | 1,75 | 22 | 21 | 0.89 | 316 | 14 | 20.32 | 90.54 |
| 76 | Bacillus subtilis subsp. subtilis str. 168 | 1 | 4.216 | 43.51 | 4,177 | 178 | 20 | 221 | 0.99 | 294 | 20 | 26.48 | 77.76 |
| 17977 | Candidatus Carsonella ruddii PV | 1 | 0.160 | 16.56 | 182 | 31 | 20 | 44 | 1.14 | 274 | 37 | 32.42 | 96.15 |
| 32135 | Candidatus Hodgkinia cicadicola Dsem | 1 | 0.144 | 58.39 | 169 | 18 | 12* | 37 | 1.18 | 257 | 38 | 33.73 | 27.22 |
| 46847 | Streptomyces bingchenggensis BCW-1 | 1 | 11.937 | 70.75 | 10,022 | 84 | 21 | 3,606 | 0.84 | 342 | 24 | 19.86 | 60.69 |
| 19943 | Rickettsia rickettsii str. Iowa | 1 | 1.268 | 32.45 | 1,384 | 37 | 19* | 607 | 1.09 | 232 | 17 | 47.76 | 73.55 |
| 81 | Clostridium tetani E88 | 1 | 2.799 | 28.75 | 2,373 | 72 | 20 | 247 | 0.85 | 336 | 101 | 12.09 | 68.27 |
| 12634 | Anaeromyxobacter dehalogenans 2CP-C | 1 | 5.013 | 74.91 | 4,346 | 58 | 21 | 965 | 0.87 | 349 | 38 | 15.85 | 69.21 |
| 49535 | Propionibacterium freudenreichii subsp. shermanii CIRM-BIA1 | 1 | 2.616 | 67.27 | 2,375 | 51 | 20 | 721 | 0.91 | 317 | 2 | 21.14 | 70.57 |
| 43535 | Lactobacillus salivarius CECT 5713 | 1 | 1.828 | 32.94 | 1,350 | 120 | 21 | 86 | 0.74 | 352 | 95 | 2.22 | 80.00 |
| 105 | Haloarcula marismortui ATCC 43049 | 2 | 3.420 | 61.93 | 3,412 | 59 | 20 | 1 | 1.00 | 285 | 30 | 27.02 | 100.00 |
| 13128 | Photobacterium profundum SS9 | 2 | 6.323 | 41.71 | 5,413 | 209 | 21 | 2,490 | 0.86 | 316 | 35 | 21.97 | 73.88 |
| 28711 | Haliangium ochraceum DSM 14365 | 1 | 9.446 | 69.48 | 6,719 | 55 | 20 | 1,827 | 0.71 | 411 | 32 | 13.37 | 79.67 |
| 244 | Nostoc sp. PCC 7120 | 1 | 6.414 | 41.35 | 5,368 | 64 | 20 | 0 | 0.84 | 326 | 17 | 25.58 | 82.41 |
| 19857 | Vibrio harveyi ATCC BAA-1116 | 2 | 5.969 | 45.44 | 5,944 | 159 | 20 | 5,944* | 1.00 | 286 | 24 | 30.43 | 84.84 |
| 28111 | Sorangium cellulosum ‘So ce 56’ | 1 | 13.034 | 71.38 | 9,375 | 319 | 0* | 4,170 | 0.72 | 401 | 30 | 13.08 | 73.33 |
| 344 | Rhizobium leguminosarum bv. viciae 3841 | 1 | 5.057 | 61.09 | 4,700 | 0* | 0* | 247 | 0.93 | 309 | 40 | 19.57 | 80.83 |
| 31271 | Mycobacterium leprae Br4923 | 1 | 3.268 | 57.80 | 1,604 | 47 | 20 | 143 | 0.49 | 335 | 33 | 21.01 | 54.30 |
| 29335 | Neisseria gonorrhoeae NCCP11945 | 1 | 2.232 | 52.37 | 2,662 | 67 | 20 | 324 | 1.19 | 240 | 32 | 41.81 | 71.22 |

  1. Selected INSDC genomes and annotation categories are shown. The first two rows are the model organisms E. coli and B. subtilis. Each of the other genomes was selected because it has the minimum (bolded) or maximum (bolded and underlined) value in one of the categories shown. Values marked with an asterisk fall below the minimal standards described in this publication.
  2. INSDC Bioproject ID for each genome [57].
  3. Number of proteins annotated as ‘hypothetical protein’.
  4. Number of proteins per kbp: (total number of proteins / genome length in bp) × 1000.
  5. Number of amino acids for which at least one tRNA is annotated in the genome (excluding predicted or annotated pseudo-tRNAs).
  6. Percentage of short proteins: (number of proteins shorter than 150 amino acids / total number of proteins) × 100.
  7. Percentage of standard start codons: (number of ATG starts / total number of starts) × 100.
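The calculated values defined in notes 3 to 7 can be reproduced from per-genome annotation records. The following is a minimal Python sketch under stated assumptions: the `Protein` and `TRNA` record types and the functions `coding_density` and `annotation_metrics` are hypothetical names, not part of any INSDC or NCBI tool; "hypothetical protein" is matched as an exact product string; and each protein is assumed to have exactly one start codon, so "total starts" in note 7 equals the number of proteins.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Protein:
    """Hypothetical minimal record for one annotated protein."""
    length_aa: int    # protein length in amino acids
    start_codon: str  # start codon of the CDS, e.g. "ATG", "GTG", "TTG"
    product: str      # annotated product name


@dataclass(frozen=True)
class TRNA:
    """Hypothetical minimal record for one annotated tRNA gene."""
    amino_acid: str       # amino acid carried by this tRNA, e.g. "Ala"
    pseudo: bool = False  # True for predicted or annotated pseudo-tRNAs


def coding_density(n_proteins: int, genome_length_bp: int) -> float:
    """Note 4: proteins per kbp = (total proteins / genome length in bp) * 1000."""
    return n_proteins / genome_length_bp * 1000


def annotation_metrics(proteins: list[Protein], trnas: Iterable[TRNA],
                       genome_length_bp: int, short_cutoff_aa: int = 150) -> dict[str, float]:
    """Compute the per-genome values described in notes 3 to 7 (sketch only)."""
    n = len(proteins)
    if n == 0:
        raise ValueError("at least one annotated protein is required")
    lengths = [p.length_aa for p in proteins]
    return {
        # Note 3: proteins whose product is exactly 'hypothetical protein' (assumed exact match)
        "hypothetical_proteins": sum(p.product == "hypothetical protein" for p in proteins),
        # Note 4
        "coding_density": coding_density(n, genome_length_bp),
        "avg_protein_length_aa": sum(lengths) / n,
        "min_protein_length_aa": min(lengths),
        # Note 5: distinct amino acids with at least one non-pseudo tRNA
        "amino_acids_with_trna": len({t.amino_acid for t in trnas if not t.pseudo}),
        # Note 6: percentage of proteins shorter than the cutoff (150 aa)
        "short_proteins_pct": sum(length < short_cutoff_aa for length in lengths) / n * 100,
        # Note 7: percentage of ATG starts, assuming one start codon per protein
        "standard_start_pct": sum(p.start_codon.upper() == "ATG" for p in proteins) / n * 100,
    }


if __name__ == "__main__":
    # Coding density for the E. coli K-12 row (4,144 proteins, about 4.640 Mbp)
    print(round(coding_density(4144, 4_640_000), 2))  # 0.89
```

Because the Length column is rounded to three decimals in Mbp, recomputing coding density from the table alone may differ from the listed value in the last digit; for example, 169 / 144,000 × 1000 rounds to 1.17 for Hodgkinia, whereas the table lists 1.18, presumably computed from the unrounded genome length.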