
Table 3. Selected annotation report examples¹

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

  

Column groups: Chromosome, Feature counts, Calculated values.

| Bioproject ID² | Organism name (No. of replicons) | Length (Mbp) | GC (%) | No. of proteins | No. of RNAs | No. of amino acids with tRNA⁵ | No. of hypothetical proteins³ | Coding density⁴ | Avg. protein length (aa) | Min. protein length (aa) | Short proteins (%)⁶ | Standard start codon (%)⁷ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 225 | Escherichia coli str. K-12 substr. MG1655 (1) | 4.640 | 50.79 | 4,144 | 1,75 | 22 | 21 | 0.89 | 316 | 14 | 20.32 | 90.54 |
| 76 | Bacillus subtilis subsp. subtilis str. 168 (1) | 4.216 | 43.51 | 4,177 | 178 | 20 | 221 | 0.99 | 294 | 20 | 26.48 | 77.76 |
| 17977 | Candidatus Carsonella ruddii PV (1) | 0.160 | 16.56 | 182 | 31 | 20 | 44 | 1.14 | 274 | 37 | 32.42 | 96.15 |
| 32135 | Candidatus Hodgkinia cicadicola Dsem (1) | 0.144 | 58.39 | 169 | 18 | 12* | 37 | 1.18 | 257 | 38 | 33.73 | 27.22 |
| 46847 | Streptomyces bingchenggensis BCW-1 (1) | 11.937 | 70.75 | 10,022 | 84 | 21 | 3,606 | 0.84 | 342 | 24 | 19.86 | 60.69 |
| 19943 | Rickettsia rickettsii str. Iowa (1) | 1.268 | 32.45 | 1,384 | 37 | 19* | 607 | 1.09 | 232 | 17 | 47.76 | 73.55 |
| 81 | Clostridium tetani E88 (1) | 2.799 | 28.75 | 2,373 | 72 | 20 | 247 | 0.85 | 336 | 101 | 12.09 | 68.27 |
| 12634 | Anaeromyxobacter dehalogenans 2CP-C (1) | 5.013 | 74.91 | 4,346 | 58 | 21 | 965 | 0.87 | 349 | 38 | 15.85 | 69.21 |
| 49535 | Propionibacterium freudenreichii subsp. shermanii CIRM-BIA1 (1) | 2.616 | 67.27 | 2,375 | 51 | 20 | 721 | 0.91 | 317 | 2 | 21.14 | 70.57 |
| 43535 | Lactobacillus salivarius CECT 5713 (1) | 1.828 | 32.94 | 1,350 | 120 | 21 | 86 | 0.74 | 352 | 95 | 2.22 | 80.00 |
| 105 | Haloarcula marismortui ATCC 43049 (2) | 3.420 | 61.93 | 3,412 | 59 | 20 | 1 | 1.00 | 285 | 30 | 27.02 | 100.00 |
| 13128 | Photobacterium profundum SS9 (2) | 6.323 | 41.71 | 5,413 | 209 | 21 | 2,490 | 0.86 | 316 | 35 | 21.97 | 73.88 |
| 28711 | Haliangium ochraceum DSM 14365 (1) | 9.446 | 69.48 | 6,719 | 55 | 20 | 1,827 | 0.71 | 411 | 32 | 13.37 | 79.67 |
| 244 | Nostoc sp. PCC 7120 (1) | 6.414 | 41.35 | 5,368 | 64 | 20 | 0 | 0.84 | 326 | 17 | 25.58 | 82.41 |
| 19857 | Vibrio harveyi ATCC BAA-1116 (2) | 5.969 | 45.44 | 5,944 | 159 | 20 | 5,944* | 1.00 | 286 | 24 | 30.43 | 84.84 |
| 28111 | Sorangium cellulosum ‘So ce 56’ (1) | 13.034 | 71.38 | 9,375 | 319 | 0* | 4,170 | 0.72 | 401 | 30 | 13.08 | 73.33 |
| 344 | Rhizobium leguminosarum bv. viciae 3841 (1) | 5.057 | 61.09 | 4,700 | 0* | 0* | 247 | 0.93 | 309 | 40 | 19.57 | 80.83 |
| 31271 | Mycobacterium leprae Br4923 (1) | 3.268 | 57.80 | 1,604 | 47 | 20 | 143 | 0.49 | 335 | 33 | 21.01 | 54.30 |
| 29335 | Neisseria gonorrhoeae NCCP11945 (1) | 2.232 | 52.37 | 2,662 | 67 | 20 | 324 | 1.19 | 240 | 32 | 41.81 | 71.22 |

  1. Selected genomes and categories for INSDC genomes are shown. The first two rows are for the model organisms E. coli and B. subtilis. The other genomes were selected as the minimum (bolded) or maximum (bolded and underlined) in the categories shown. Those marked with an asterisk fall below the minimal standards described in this publication.
  2. INSDC Bioproject ID for each genome [57].
  3. Number of proteins annotated as ‘hypothetical protein’.
  4. Number of proteins per Kbp ((total number of proteins / genome length (bp)) * 1000).
  5. Number of amino acids for which at least one tRNA is annotated in the genome (excluding predicted or annotated pseudo tRNAs).
  6. Percent of short proteins ((number of proteins less than 150 amino acids in length / total number of proteins) * 100).
  7. Percent of standard starts for proteins ((number of standard starts (ATG) / total starts) * 100).
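
The calculated values defined in notes 4, 6, and 7 can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the pipeline used to build the table: the function names and the toy inputs are hypothetical, and the genome length is assumed to be the Length (Mbp) column converted to base pairs.

```python
from typing import Sequence


def coding_density(n_proteins: int, genome_length_bp: int) -> float:
    """Note 4: proteins per Kbp = (proteins / genome length in bp) * 1000."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_proteins / genome_length_bp * 1000


def percent_short_proteins(lengths_aa: Sequence[int], threshold: int = 150) -> float:
    """Note 6: percent of proteins shorter than `threshold` amino acids."""
    if not lengths_aa:
        raise ValueError("no protein lengths given")
    short = sum(1 for n in lengths_aa if n < threshold)
    return short / len(lengths_aa) * 100


def percent_standard_starts(start_codons: Sequence[str], standard: str = "ATG") -> float:
    """Note 7: percent of proteins whose start codon is the standard ATG."""
    if not start_codons:
        raise ValueError("no start codons given")
    hits = sum(1 for codon in start_codons if codon.upper() == standard)
    return hits / len(start_codons) * 100


# E. coli row: 4,144 proteins over 4.640 Mbp (assumed to be 4,640,000 bp).
ecoli_density = round(coding_density(4144, 4_640_000), 2)

# Toy inputs (hypothetical, not taken from any genome in the table).
toy_short = percent_short_proteins([100, 149, 150, 300])
toy_starts = percent_standard_starts(["ATG", "GTG", "atg", "TTG"])
```

With the table's rounded length, the E. coli calculation gives 0.89, which agrees with the Coding density column; the B. subtilis row (4,177 proteins, 4.216 Mbp) likewise gives 0.99.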