Table 1. HSP determination and filtering

From: Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs

| Algorithm | WU BLAST^a | NCBI BLAST^b | BLAT^c | MUMmer^d | BLASTZ^e |
|---|---|---|---|---|---|
| Run time | Very high [M] | Low [M] | High [M] | Very low [M] | Moderate [M] |
| Memory consumption and output size | High [M] | Moderate [M] | Moderate [M] | Very low [M] | Low [M] |
| Typical effect on correlation with DDH values | decrease [M] | increase [M] | increase [M] | moderate increase [M] | decrease [M] |
| Seed parameter | W= | -W | -tileSize | -l | T=0 W= |
| Typical effect on runtime, RAM usage and file size | higher → speedup, smaller output files [E] | higher → speedup, smaller output files [E] | higher → speedup; lower → significant increase of memory consumption [M] | higher → speedup, smaller output files [M] | higher → speedup, smaller output files [E] |
| Typical effect on correlation with DDH values | N/A | N/A | lower → decrease of correlation [M] | higher → increase of correlation [M] | N/A |
| Identity parameter | score based, i.e., identical to initial word length | score based, i.e., identical to initial word length | -minIdentity | 100% (fixed) | score based, i.e., identical to initial word length |
| Typical effect on runtime, RAM usage and file size | N/A | N/A | insignificant [M] | (none) | N/A |
| Typical effect on correlation with DDH values | N/A | N/A | lower → increase of correlation [M] | N/A | N/A |
| Measure of HSP quality used for filtering | e-value | e-value | substitution score | (makes no sense) | substitution score |
| Typical effect on subsequent runtime and RAM usage | insignificant [E] | insignificant [E] | lower → small increase of runtime and memory consumption [M] | (none) | lower → small increase of runtime and memory consumption [M] |
| Typical effect on correlation with DDH values | insignificant [E] | insignificant [E] | lower → small increase of correlation | N/A | higher → slight increase of correlation |
  1. The table lists the parameters of the similarity search algorithms and their influence on the correlation with DDH values (for details, see [1]). Because DDH values are similarities and GGD values are dissimilarities, the best possible correlation between them is −1.0; a ‘high’ correlation therefore means a more strongly negative one, and an ‘increase’ of correlation means a shift towards −1.0. Seed parameter: minimum length of a stretch of DNA used as the starting point of an HSP. Identity parameter: minimum identity within an HSP required for its extension. Evidence codes: [M] measured; [E] extrapolated.
  2. a: Version 2.0MP-WashU [04-May-2006], website http://blast.wustl.edu/ [2]
  3. b: Version 2.2.18, website ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ [2]
  4. c: Version 34, website http://users.soe.ucsc.edu/kent/src/ [3]
  5. d: Version 3.0, website http://mummer.sourceforge.net/ [5]
  6. e: Version 7, website http://www.bx.psu.edu/miller_lab/ [4]
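
The sign convention in the first note can be illustrated with a minimal sketch. It assumes Pearson's correlation coefficient, which the table itself does not specify. The function name `pearson` and all numbers are hypothetical and made up for illustration; they are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up DDH similarities (percent) for four hypothetical genome pairs.
ddh = [90.0, 70.0, 50.0, 30.0]

# Made-up distances that fall perfectly linearly as DDH rises.
ggd_linear = [0.05, 0.15, 0.25, 0.35]

# Made-up distances that follow the same trend less exactly.
ggd_noisy = [0.05, 0.20, 0.22, 0.35]

r_linear = pearson(ddh, ggd_linear)  # very close to -1.0, the best possible value
r_noisy = pearson(ddh, ggd_noisy)    # still negative, but nearer to 0

# In the table's wording, the first setting has the 'higher' correlation
# because its coefficient is the more negative of the two.
print(f"linear: {r_linear:.3f}, noisy: {r_noisy:.3f}")
```

Under this convention, comparing two parameter settings means preferring the one whose coefficient is smaller (closer to −1.0), not the one with the larger numeric value.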