Table 1. Databases, tools, resources for genomes and annotation.

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

Category/Title Description Reference URL
General    
NCBI Genome Annotation Workshop All information from this publication, the Annotation Workshop, and future announcements will be made available   http://www.ncbi.nlm.nih.gov/genomes/AnnotationWorkshop.html
Difference between Archive and Curated Databases GenBank, RefSeq, TPA and UniProt: What’s in a Name? Microbe Online http://www.microbemagazine.org/index.php?option=com_content&view=article&id=1270:genbank-refseq-tpa-and-uniprot-whats-in-a-name&catid=347:letters&Itemid=646
Difference between Archive and Curated Databases GenBank, RefSeq, TPA and UniProt: What’s in a Name? NCBI Handbook http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch1#GenBank_ASM
INSDC International Nucleotide Sequence Database Collaboration   http://www.insdc.org
INSDC Feature Table Feature table document   http://www.insdc.org/documents/feature_table.html
DDBJ DNA Databank of Japan [35] http://www.ddbj.nig.ac.jp
ENA European Nucleotide Archive [36] http://www.ebi.ac.uk/ena
GenBank GenBank [20] http://www.ncbi.nlm.nih.gov/genbank/index.html
Automated Annotation providers    
NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) Intended for use during the annotation of prokaryotic genomes in preparation for submission to GenBank; capable of annotating complete genomes as well as WGS genomes   http://www.ncbi.nlm.nih.gov/genomes/static/Pipeline.html
JCVI Annotation Service Anyone with a prokaryotic genome sequence in need of annotation may submit it to the JCVI Annotation Service completely free-of-charge   http://www.jcvi.org/cms/research/projects/annotation-service/overview
IGS Annotation Engine A free resource for genomics researchers and educators bringing advanced bioinformatics tools to the lab bench and the classroom.   http://ae.igs.umaryland.edu/cgi/index.cgi
KAAS - KEGG Automatic Annotation Server KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database with resulting KO (KEGG Orthology) assignments and automatically generated KEGG pathways [37] http://www.genome.jp/tools/kaas
RAST RAST (Rapid Annotation using Subsystem Technology) is a fully automated service for annotating bacterial and archaeal genomes — provides high quality genome annotations for these genomes across the whole phylogenetic tree [38] http://rast.nmpdr.org
DOE-JGI MAP Expert Review Data Submission: Microbial Genomes & Management [39] http://img.jgi.doe.gov/cgi-bin/submit/main.cgi
Annotation Cleanup, Analyses, and Validation Tools    
NCBI Submission Check Tool For the validation of genome submissions to GenBank — utilizes a series of self-consistency checks as well as comparison of submitted annotations to computed annotations — web-based and downloadable versions available   http://www.ncbi.nlm.nih.gov/genomes/frameshifts/frameshifts.cgi
NCBI Sequin Validation Sequin is a standalone tool for submitting and updating sequences [20] http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.html
NCBI TBL2ASN Command-line tool for automating the creation of sequence records for submission to GenBank [20] http://www.ncbi.nlm.nih.gov/Genbank/tbl2asn2.html
NCBI Discrepancy report Evaluation of ASN.1 files for annotation discrepancies; part of Sequin, available separately as a downloadable command-line version, and part of tbl2asn [20] http://www.ncbi.nlm.nih.gov/Genbank/asndisc.html
Broad’s Gene Pidgin (formerly BioName) A free resource for genomics researchers and educators bringing advanced bioinformatics tools to the lab bench and the classroom.   http://ae.igs.umaryland.edu/cgi/index.cgi
JCVI’s Protein Naming Utility KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database with resulting KO (KEGG Orthology) assignments and automatically generated KEGG pathways [37] http://kaas.genome.jp/tools/kaas/
Frameshift Tool RAST (Rapid Annotation using Subsystem Technology) is a fully automated service for annotating bacterial and archaeal genomes; provides high quality genome annotations for these genomes across the whole phylogenetic tree [38] http://rast.nmpdr.org
Annotation Report Expert Review Data Submission: Microbial Genomes & Management [39] http://img.jgi.doe.gov/cgi-bin/submit/main.cgi
Annotation Guidelines    
GenBank Bacterial Genome Submission Guidelines For the validation of genome submissions to GenBank; utilizes a series of self-consistency checks as well as comparison of submitted annotations to computed annotations; web-based and downloadable versions available   http://www.ncbi.nlm.nih.gov/genomes/frameshifts/frameshifts.cgi
Annotation Instructions Sequin is a standalone tool for submitting and updating sequences [20] http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm
Project Submission Command-line tool for automating the creation of sequence records for submission to GenBank [20] http://www.ncbi.nlm.nih.gov/Genbank/tbl2asn2.html
Locus_tag proposal Evaluation of ASN.1 files for annotation discrepancies; part of Sequin, available separately as a downloadable command-line version, and part of tbl2asn [20] http://www.ncbi.nlm.nih.gov/Genbank/asndisc.html
UniProt’s Protein Naming Guidelines UniProt’s prokaryotic-specific protein naming guidelines — adopted by INSDC   http://www.uniprot.org/docs/nameprot
GSC Structured Format Accepted structured format for genome metadata including SOPs [43] http://gensc.org/gc_wiki/index.php/MIGS/MIMS/MIENS
Insertion Sequences Insertion sequence finder, nomenclature, and registry [44] http://www-is.biotoul.fr/
Transposons Transposon nomenclature and registry [45] http://www.ucl.ac.uk/eastman/tn/
Enzyme Commission Numbers Official NC-IUBMB site   http://www.chem.qmul.ac.uk/iubmb/enzyme/
UniProt ENZYME ENZYME is a repository of information relative to the nomenclature of enzymes.   http://ca.expasy.org/enzyme/
Functional Annotation/Protein Families    
NCBI COGs Clusters of orthologous groups - no longer actively curated [46] http://www.ncbi.nlm.nih.gov/COG/
NCBI ProtClustDB Cliques of related proteins, curated and uncurated, for multiple organism groups including prokaryotes and viruses [33] http://www.ncbi.nlm.nih.gov/proteinclusters
NCBI Cluster Comparison Tool Protein family comparison for functional annotation   http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi
NCBI Cluster Comparison Tool - Core Mode Protein family core comparison for functional annotation   http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi?core=on
List of Core Clusters Protein family core list   http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi?report=core
UniProt HAMAP system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies in prokaryotes and plastids [47] http://ca.expasy.org/sprot/hamap/
KEGG Orthology Groups Manually defined ortholog groups that correspond to KEGG pathway nodes and BRITE hierarchy nodes [48] http://www.genome.jp/kegg/ko.html
JCVI’s TIGRFAMs Protein families based on Hidden Markov Models [49] http://www.jcvi.org/cms/research/projects/tigrfams/overview/
ACLAME Database dedicated to the collection and classification of mobile genetic elements [50] http://aclame.ulb.ac.be/
E. coli CCDS Project Comparison of annotation for model E. coli K-12 MG1655   http://www.ncbi.nlm.nih.gov/genomes/MICROBES/ecok12.cgi