From: Solving the Problem: Genome Annotation Standards before the Data Deluge
Case | Situation | Flag¹ | How to Annotate | Consequence² | In BLAST³ |
---|---|---|---|---|---|
1 | Pseudogene | “/pseudo” | pseudogene | no translation; product name is in note, associated feature (CDS, tRNA, rRNA, etc.) will be annotated | No |
2 | Potential pseudogene | N/A | normal gene annotated, potential pseudogene status in note | no CDS feature, not documented as a pseudogene, not trackable as protein vs. RNA-coding | No |
3a | Frameshifted gene and sequence IS correct | “/pseudo” | combine intervals into a single gene with /pseudo | no translation; product name is in note | No |
3b | Frameshifted gene and sequence MAY be correct | N/A | keep both and add a note to each CDS | two separate coding regions and two protein translations | Yes (Both) |
3c* | Frameshifted gene and there are sequence ERRORS | “/exception="annotated by transcript or proteomic data"” AND (“/experiment” OR “/inference”) | experimental evidence showing that the translation is correct and/or an inference pointing to the accession number with the correct translation | protein sequence is imported; translation does not match nucleotide sequence | Yes |
3d | Frameshifted gene and there are sequence ERRORS | “/artificial_location” | locations altered for ‘correct’ location | all protein deflines prefaced with “LOW-QUALITY PROTEIN:” | Yes |
4 | Region of similarity | N/A | misc_feature denoting location of region of similarity | no gene, no locus_tag, not systematically enumerated | No |
5 | Potential unresolvable problems | N/A | note explaining the issue | no change in annotation | Yes |
6⁴ | Split/interrupted gene in the case of an insertion (e.g., transposon insertion) | N/A | either a single interval or a split interval; annotation depends on the consequence of the insertion | no standards for split genes; locations do not match regions of similarity | No |
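
The table can also be read as a lookup from case to flag and BLAST visibility. The following is a minimal sketch of that reading, not part of any annotation tool: the names `Rule`, `RULES`, and `cases_in_blast` are hypothetical, the flag for case 3c is abbreviated to a single descriptive string, and the values are transcribed from the rows above.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """One row of the table (hypothetical structure for illustration)."""
    situation: str
    flag: Optional[str]  # None where the table lists N/A
    in_blast: bool       # the "In BLAST" column


# Values transcribed from the table; case IDs are kept as strings.
RULES = {
    "1": Rule("Pseudogene", "/pseudo", False),
    "2": Rule("Potential pseudogene", None, False),
    "3a": Rule("Frameshifted gene, sequence is correct", "/pseudo", False),
    "3b": Rule("Frameshifted gene, sequence may be correct", None, True),
    "3c": Rule("Frameshifted gene, sequence errors",
               "/exception AND (/experiment OR /inference)", True),
    "3d": Rule("Frameshifted gene, sequence errors",
               "/artificial_location", True),
    "4": Rule("Region of similarity", None, False),
    "5": Rule("Potential unresolvable problems", None, True),
    "6": Rule("Split/interrupted gene (insertion)", None, False),
}


def cases_in_blast(rules: dict) -> list:
    """Return the case IDs whose proteins appear in BLAST, sorted by ID."""
    return sorted(case for case, rule in rules.items() if rule.in_blast)


if __name__ == "__main__":
    for case in cases_in_blast(RULES):
        rule = RULES[case]
        print(f"{case}: {rule.situation} (flag: {rule.flag or 'N/A'})")
```

Keeping the flag as `None` for the N/A rows makes it easy to separate cases that carry a formal qualifier from those documented only in a note.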