Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

Auch, Alexander F.; Klenk, Hans-Peter; Göker, Markus

doi:10.4056/sigs.541628

Environmental Microbiome

Table 1. HSP determination and filtering


Algorithm	WU BLAST^a	NCBI BLAST^b	BLAT^c	MUMmer^d	BLASTZ^e
Run time	Very high [M]	Low [M]	High [M]	Very low [M]	Moderate [M]
Memory consumption and output size	High [M]	Moderate [M]	Moderate [M]	Very low [M]	Low [M]
Typical effect on correlation with DDH values	decrease [M]	increase [M]	increase [M]	moderate increase [M]	decrease [M]
Seed parameter	W=	-W	-tileSize	-l	T=0 W=
Typical effect on runtime, RAM usage and file size	higher → speedup, smaller output files [E]	higher → speedup, smaller output files [E]	higher → speedup; lower → significant increase of memory consumption [M]	higher → speedup, smaller output files [M]	higher → speedup, smaller output files [E]
Typical effect on correlation with DDH values	N/A	N/A	lower → decrease of correlation [M]	higher → increase of correlation [M]	N/A
Identity parameter	score based, i.e., identical to initial word length	score based, i.e., identical to initial word length	-minIdentity	100% (fixed)	score based, i.e., identical to initial word length
Typical effect on runtime, RAM usage and file size	N/A	N/A	insignificant [M]	(none)	N/A
Typical effect on correlation with DDH values	N/A	N/A	lower → increase of correlation [M]	N/A	N/A
Measure of HSP quality used for filtering	e-value	e-value	substitution score	(not applicable)	substitution score
Typical effect on subsequent runtime and RAM usage	insignificant [E]	insignificant [E]	lower → small increase of runtime and memory consumption [M]	(none)	lower → small increase of runtime and memory consumption [M]
Typical effect on correlation with DDH values	insignificant [E]	insignificant [E]	lower → small increase of correlation	N/A	higher → slight increase of correlation

The table lists the parameters of the similarity search algorithms and their influence on the correlation with DNA-DNA hybridization (DDH) values (for details, see [1]). Because DDH values are similarities and genome-to-genome distances (GGD) are dissimilarities, the best possible correlation between them is −1.0; an ‘increase’ of correlation therefore means a more negative coefficient. Seed parameter: minimum length of a stretch of DNA used as the starting point of an HSP. Identity parameter: minimum identity within an HSP required for its extension. Evidence codes: [M], measured; [E], extrapolated.
^aVersion 2.0MP-WashU [04-May-2006], website http://blast.wustl.edu/ [2]
^bVersion 2.2.18, website ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ [2]
^cVersion 34, website http://users.soe.ucsc.edu/~kent/src/ [3]
^dVersion 3.0, website http://mummer.sourceforge.net/ [5]
^eVersion 7, website http://www.bx.psu.edu/miller_lab/ [4]
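
The sign convention in the table legend can be illustrated with a minimal sketch. It assumes that the correlation in question is an ordinary Pearson coefficient, which the table itself does not state; the function name `pearson` and all numeric values are hypothetical and serve only to show why a good fit between DDH similarities and GGD dissimilarities approaches −1.0.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, NOT measurements from the study.
ddh_similarity = [95.0, 80.0, 62.0, 40.0, 21.0]  # DDH values in percent
ggd_distance = [0.02, 0.10, 0.19, 0.30, 0.40]    # genome-to-genome distances

r = pearson(ddh_similarity, ggd_distance)

# Similarities fall as distances rise, so r is negative; the closer r is
# to -1.0, the better the distances agree with the DDH values.
print(f"r = {r:.4f}")
```

Under this convention, when two parameter settings are compared, the one with the more negative coefficient is the one the table describes as giving an ‘increase of correlation’.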


ISSN: 2524-6372
