Table 4. Pseudogene annotation strategies and outcomes

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

| Case | Situation | Flag¹ | How to Annotate | Consequence² | In BLAST³ |
| --- | --- | --- | --- | --- | --- |
| 1 | Pseudogene | “/pseudo” | pseudogene | no translation; product name is in note, associated feature (CDS, tRNA, rRNA, etc.) will be annotated | No |
| 2 | Potential pseudogene | N/A | normal gene annotated, potential pseudogene status in note | no CDS feature, not documented as a pseudogene, not trackable as protein vs. RNA-coding | No |
| 3a | Frameshifted gene and sequence IS correct | “/pseudo” | combine intervals into a single gene with /pseudo | no translation; product name is in note | No |
| 3b | Frameshifted gene and sequence MAY be correct | N/A | keep both and add a note to each CDS | two separate coding regions and two protein translations | Yes (Both) |
| 3c* | Frameshifted gene and there are sequence ERRORS | “/exception="annotated by transcript or proteomic data"” AND (“/experiment” OR “/inference”) | experimental evidence defining the evidence that translation is correct and/or inference pointing to Accession Number with correct translation | protein sequence imported; translation does not match nucleotide | Yes |
| 3d | Frameshifted gene and there are sequence ERRORS | “/artificial_location” | locations altered for ‘correct’ location | all protein deflines prefaced with “LOW-QUALITY PROTEIN:” | Yes |
| 4 | Region of similarity | N/A | misc_feature denoting location of region of similarity | no gene, no locus_tag, not systematically enumerated | No |
| 5 | Potential unresolvable problems | N/A | note explaining the issue | no change in annotation | Yes |
| 6⁴ | Split/interrupted gene in the case of an insertion (ex. transposon insertion) | N/A | could be either a single interval, or a split interval, annotation depends on consequence of insertion | no standards for split genes, locations do not match regions of similarity | No |

  1. Qualifier to be used on the feature.
  2. Downstream consequence of the annotation decision, including impacts on presentation of the record.
  3. Whether a protein sequence is encoded and will be present in protein and BLAST databases. Note that BLAST databases can differentiate proteins only by changes to the defline; i.e., Cases 3b, 3c, and 5 present undifferentiated protein deflines in BLAST databases, whereas Case 3d has an altered protein defline.
  4. Insertions can result in complicated cases such as gene fusion events. These annotation results should be due to real insertions, not simply regions of the genome that exhibit weak similarity to a part of a protein sequence.
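
The table and note 3 can be read as a small lookup: each case maps to a flag and to whether a protein reaches the BLAST databases, and only Case 3d changes the defline. The following is a minimal Python sketch of that reading, not an NCBI tool. The names `PseudogeneCase`, `CASES`, `blast_defline`, and `undifferentiated_cases` are hypothetical, as is the product name in the usage note; the flags, the Yes/No values, and the "LOW-QUALITY PROTEIN:" prefix are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PseudogeneCase:
    """One row of Table 4 (hypothetical container, for illustration only)."""
    situation: str
    flag: Optional[str]  # qualifier on the feature; None where the table says N/A
    in_blast: bool       # "In BLAST" column: does a protein reach the BLAST databases?


# Values copied from the table; the footnote markers (* and 4) are dropped from the keys.
CASES = {
    "1": PseudogeneCase("Pseudogene", "/pseudo", False),
    "2": PseudogeneCase("Potential pseudogene", None, False),
    "3a": PseudogeneCase("Frameshifted gene, sequence IS correct", "/pseudo", False),
    "3b": PseudogeneCase("Frameshifted gene, sequence MAY be correct", None, True),
    "3c": PseudogeneCase(
        "Frameshifted gene, sequence ERRORS",
        '/exception="annotated by transcript or proteomic data"',
        True,
    ),
    "3d": PseudogeneCase("Frameshifted gene, sequence ERRORS", "/artificial_location", True),
    "4": PseudogeneCase("Region of similarity", None, False),
    "5": PseudogeneCase("Potential unresolvable problems", None, True),
    "6": PseudogeneCase("Split/interrupted gene (insertion)", None, False),
}

LOW_QUALITY_PREFIX = "LOW-QUALITY PROTEIN: "


def blast_defline(case_id: str, product: str) -> Optional[str]:
    """Return the protein defline as it would appear in BLAST, or None if no protein."""
    case = CASES[case_id]
    if not case.in_blast:
        return None
    # Per the table, only Case 3d alters the defline.
    if case_id == "3d":
        return LOW_QUALITY_PREFIX + product
    return product


def undifferentiated_cases() -> list:
    """Cases whose proteins reach BLAST with an unchanged defline (note 3)."""
    probe = "probe"
    return sorted(cid for cid in CASES if blast_defline(cid, probe) == probe)
```

For example, `blast_defline("3d", "DNA gyrase subunit A")` yields the prefixed defline, `blast_defline("3a", "DNA gyrase subunit A")` yields `None`, and `undifferentiated_cases()` lists the cases that note 3 describes as indistinguishable in BLAST.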