Table 1. HSP determination and filtering

From: Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

| Algorithm | WU BLAST (a) | NCBI BLAST (b) | BLAT (c) | MUMmer (d) | BLASTZ (e) |
| --- | --- | --- | --- | --- | --- |
| Run time | Very high [M] | Low [M] | High [M] | Very low [M] | Moderate [M] |
| Memory consumption and output size | High [M] | Moderate [M] | Moderate [M] | Very low [M] | Low [M] |
| Typical effect on correlation with DDH values | decrease [M] | increase [M] | increase [M] | moderate increase [M] | decrease [M] |
| Seed parameter | W= | -W | -tileSize | -l | T=0 W= |
| Typical effect on runtime, RAM usage and file size | higher → speedup, smaller output files [E] | higher → speedup, smaller output files [E] | higher → speedup; lower → significant increase of memory consumption [M] | higher → speedup, smaller output files [M] | higher → speedup, smaller output files [E] |
| Typical effect on correlation with DDH values | N/A | N/A | lower → decrease of correlation [M] | higher → increase of correlation [M] | N/A |
| Identity parameter | score based, i.e., identical to initial word length | score based, i.e., identical to initial word length | -minIdentity | 100% (fixed) | score based, i.e., identical to initial word length |
| Typical effect on runtime, RAM usage and file size | N/A | N/A | insignificant [M] | (none) | N/A |
| Typical effect on correlation with DDH values | N/A | N/A | lower → increase of correlation [M] | N/A | N/A |
| Measure of HSP quality used for filtering | e-value | e-value | substitution score | (makes no sense) | substitution score |
| Typical effect on subsequent runtime and RAM usage | insignificant [E] | insignificant [E] | lower → small increase of runtime and memory consumption [M] | (none) | lower → small increase of runtime and memory consumption [M] |
| Typical effect on correlation with DDH values | insignificant [E] | insignificant [E] | lower → small increase of correlation | N/A | higher → slight increase of correlation |
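
The seed-parameter rows report that a higher seed length typically speeds up the search and shrinks the output. The following Python sketch is a toy illustration of why that happens: it counts exact shared words of a given length between two short sequences. It is not the implementation used by any of the listed programs; the function name `count_seed_hits` and the example sequences are hypothetical and chosen only for illustration.

```python
def count_seed_hits(query: str, subject: str, seed_len: int) -> int:
    """Count exact matches of length seed_len shared by two sequences."""
    # Index every word of length seed_len in the subject by start position.
    index = {}
    for i in range(len(subject) - seed_len + 1):
        index.setdefault(subject[i:i + seed_len], []).append(i)

    # Each query word found in the index yields one hit per subject occurrence.
    hits = 0
    for j in range(len(query) - seed_len + 1):
        hits += len(index.get(query[j:j + seed_len], ()))
    return hits


# Hypothetical sequences, for illustration only.
query = "ACGTACGTTAGC"
subject = "TTACGTACGATAGC"

for seed_len in (3, 5, 7):
    print(seed_len, count_seed_hits(query, subject, seed_len))
```

Every exact match of length w + 1 contains an exact match of length w at the same position, so the number of seed hits can only stay equal or drop as the seed length grows. Fewer seeds mean fewer candidate HSPs to extend and report.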

  1. The table lists parameters of the similarity search algorithms and their influence on the correlation with DDH values (for details, see [1]). Because DDH values are similarities and GGD values are dissimilarities, the best possible correlation between them is −1.0; an ‘increase’ of correlation therefore means a more negative value. Seed parameter: minimum length of the stretch of DNA used as the starting point of an HSP. Identity parameter: minimum identity within an HSP required for its extension. Evidence codes: [M] measured; [E] extrapolated.
  2. (a) Version 2.0MP-WashU [04-May-2006], website http://blast.wustl.edu/ [2]
  3. (b) Version 2.2.18, website ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ [2]
  4. (c) Version 34, website http://users.soe.ucsc.edu/kent/src/ [3]
  5. (d) Version 3.0, website http://mummer.sourceforge.net/ [5]
  6. (e) Version 7, website http://www.bx.psu.edu/miller_lab/ [4]
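
The sign convention in the table note can be made concrete with a small Python sketch. It assumes Pearson's coefficient, which the table itself does not specify, and uses made-up DDH and GGD values that are not taken from the study. The helper `pearson` is written out here only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not data from the study).
ddh = [95.0, 80.0, 62.0, 41.0, 20.0]  # DDH similarities in percent
ggd = [0.02, 0.10, 0.21, 0.35, 0.52]  # genome-to-genome distances

r = pearson(ddh, ggd)
print(f"r = {r:.3f}")  # negative: distances rise as similarities fall
```

Because a similarity is compared with a dissimilarity, a method that tracks DDH well yields a coefficient close to −1.0, so a parameter change that "increases" correlation in the table moves the coefficient toward −1.0 rather than toward +1.0.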