Table 4. Pseudogene annotation strategies and outcomes

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

| Case | Situation | Flag¹ | How to Annotate | Consequence² | In BLAST³ |
|---|---|---|---|---|---|
| 1 | Pseudogene | “/pseudo” | pseudogene | no translation; product name is in note, associated feature (CDS, tRNA, rRNA, etc.) will be annotated | No |
| 2 | Potential pseudogene | N/A | normal gene annotated, potential pseudogene status in note | no CDS feature, not documented as a pseudogene, not trackable as protein vs. RNA-coding | No |
| 3a | Frameshifted gene and sequence IS correct | “/pseudo” | combine intervals into a single gene with /pseudo | no translation; product name is in note | No |
| 3b | Frameshifted gene and sequence MAY be correct | N/A | keep both and add a note to each CDS | two separate coding regions and two protein translations | Yes (Both) |
| 3c* | Frameshifted gene and there are sequence ERRORS | “/exception="annotated by transcript or proteomic data"” AND (“/experiment” OR “/inference”) | experimental evidence defining the evidence that translation is correct and/or inference pointing to Accession Number with correct translation | protein sequence imported-translation does not match nucleotide | Yes |
| 3d | Frameshifted gene and there are sequence ERRORS | “/artificial_location” | locations altered for ‘correct’ location | all protein deflines prefaced with “LOW-QUALITY PROTEIN:” | Yes |
| 4 | Region of similarity | N/A | misc_feature denoting location of region of similarity | no gene, no locus_tag, not systematically enumerated | No |
| 5 | Potential unresolvable problems | N/A | note explaining the issue | no change in annotation | Yes |
| 6⁴ | Split/interrupted gene in the case of an insertion (ex. transposon insertion) | N/A | could be either a single interval, or a split interval, annotation depends on consequence of insertion | no standards for split genes, locations do not match regions of similarity | No |

  1. Qualifier to be used on the feature.
  2. Downstream consequence of the annotation decision, including impacts on presentation of the record.
  3. Whether a protein sequence is encoded and will be present in protein and BLAST databases. Note that BLAST databases can differentiate proteins only by changes to the defline; i.e., Cases 3b, 3c, and 5 present undifferentiated protein deflines in BLAST databases, whereas Case 3d has an altered protein defline.
  4. Insertions can result in complicated cases such as gene fusion events. These annotation results should be due to real insertions, not simply regions of the genome that exhibit weak similarity to part of a protein sequence.
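
The table can also be read as a lookup from case to qualifier and BLAST outcome. The following is a minimal Python sketch of that reading, not part of the original standard: the names `Strategy`, `TABLE_4`, `cases_in_blast`, and `has_distinct_defline` are hypothetical, the situation labels are abbreviated, and the Case 3c flag is reduced to its `/exception` qualifier for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    situation: str
    flag: Optional[str]  # qualifier on the feature; None where the table says N/A
    in_blast: bool       # whether a protein appears in protein/BLAST databases


# Rows transcribed from Table 4; the dictionary layout is an assumption.
TABLE_4 = {
    "1": Strategy("Pseudogene", "/pseudo", False),
    "2": Strategy("Potential pseudogene", None, False),
    "3a": Strategy("Frameshifted gene, sequence is correct", "/pseudo", False),
    "3b": Strategy("Frameshifted gene, sequence may be correct", None, True),
    "3c": Strategy(
        "Frameshifted gene, sequence errors",
        '/exception="annotated by transcript or proteomic data"',
        True,
    ),
    "3d": Strategy("Frameshifted gene, sequence errors", "/artificial_location", True),
    "4": Strategy("Region of similarity", None, False),
    "5": Strategy("Potential unresolvable problems", None, True),
    "6": Strategy("Split/interrupted gene (insertion)", None, False),
}


def cases_in_blast(table=TABLE_4):
    """Return the case labels whose proteins reach BLAST databases."""
    return sorted(case for case, s in table.items() if s.in_blast)


def has_distinct_defline(case):
    """Per footnote 3, only Case 3d alters the protein defline."""
    return case == "3d"


if __name__ == "__main__":
    for case in cases_in_blast():
        marker = "altered defline" if has_distinct_defline(case) else "undifferentiated"
        print(case, marker)
```

Filtering `cases_in_blast()` with `has_distinct_defline` separates Case 3d from Cases 3b, 3c, and 5, which is the distinction footnote 3 draws.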