This manual page briefly documents the commands bl2seq, blast2,
blastall, blastall_old, blastcl3, blastpgp, impala,
megablast, rpsblast, and seedtop. These commands
are documented together because they share many options.
bl2seq performs a comparison between two sequences using either
the blastn or blastp algorithm. Both sequences must be either
nucleotides or proteins.
blast2 compares a sequence against either a local database or a
second sequence; it incorporates most of the functionality of both
bl2seq and blastall, but uses a semi-experimental new
internal engine.
blastall and blastall_old find the best matches in a
local database for a sequence.
blastall uses a newer engine than blastall_old by default,
but supports using the older engine as well (when invoked with the
option -V F).
blastcl3 accesses the newest NCBI BLAST search engine (version
2.0). The software behind BLAST version 2.0 was written from scratch
to allow BLAST to handle the new challenges posed by the sequence
databases in the coming years. Updates to this software will continue
in the coming years.
blastpgp performs gapped blastp searches and can be used to
perform iterative searches in psi-blast and phi-blast mode.
impala searches a database of score matrices, prepared by
copymat(1), producing BLAST-like output.
megablast uses the greedy algorithm of Webb Miller et al. for
nucleotide sequence alignment search and concatenates many queries to
save time spent scanning the database. This program is optimized for
aligning sequences that differ slightly as a result of sequencing or
other similar "errors". It is up to 10 times faster than more common
sequence similarity programs and therefore can be used to swiftly
compare two large sets of sequences against each other.
rpsblast (Reverse PSI-BLAST) searches a query sequence against a
database of profiles. This is the opposite of PSI-BLAST, which searches
a profile against a database of sequences, hence the `Reverse'.
rpsblast uses a BLAST-like algorithm, finding single- or
double-word hits and then performing an ungapped extension on these
candidate matches. If a sufficiently high-scoring ungapped alignment
is produced, a gapped extension is performed and those (gapped)
alignments with sufficiently low expect value are reported. This
procedure contrasts with IMPALA, which performs a Smith-Waterman
calculation between the query and each profile, rather than using a
word-hit approach to identify matches that should be extended.
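The word-hit and ungapped-extension steps described above can be illustrated with a toy sketch. It is not rpsblast's implementation: it works on plain strings with exact word matches rather than on profiles, and the function names, the match/mismatch scores, and the x_drop cutoff are all assumptions chosen for illustration.

```python
def find_word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word matches."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, qpos, spos, word_size,
                    match=1, mismatch=-3, x_drop=5):
    """Extend one word hit in both directions without gaps.

    Extension in a direction stops once the running score has fallen
    more than x_drop below the best score seen so far.
    Returns (best_score, query_start, query_end), end exclusive.
    """
    def score(a, b):
        return match if a == b else mismatch

    best = word_size * match          # the seed word matches exactly
    q_start, q_end = qpos, qpos + word_size

    # Extend to the right of the seed.
    run = best
    i, j = qpos + word_size, spos + word_size
    while i < len(query) and j < len(subject):
        run += score(query[i], subject[j])
        i += 1
        j += 1
        if run > best:
            best, q_end = run, i
        elif best - run > x_drop:
            break

    # Extend to the left of the seed.
    run = best
    i, j = qpos - 1, spos - 1
    while i >= 0 and j >= 0:
        run += score(query[i], subject[j])
        if run > best:
            best, q_start = run, i
        elif best - run > x_drop:
            break
        i -= 1
        j -= 1

    return best, q_start, q_end
```

In a real search, only extensions whose score passes a threshold would go on to the gapped stage; that step is omitted here.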
seedtop answers two relatively simple questions:
1. Given a sequence and a database of patterns, which patterns occur
in the sequence and where?
2. Given a pattern and a sequence database, which sequences contain the
pattern and where?
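The second question can be sketched in a few lines of Python. This is an illustration only, not seedtop's algorithm: it assumes a small subset of PROSITE-style pattern syntax (elements separated by "-", x for any residue, [..] and {..} sets, and (n) or (n,m) repeats), and the function names are hypothetical.

```python
import re

# One pattern element: x, a residue, [set] or {excluded set},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (0-based start, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, find_pattern("C-x(2)-[DE]", "ACAAEKCGGDC") reports occurrences at offsets 1 and 6. Running the same function over every sequence in a FASTA file would answer the database form of the question.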
Some of these commands support multiple types of comparison, governed
by the -p ("program") flag:
blastp
compares an amino acid query sequence against a protein sequence
database.
blastn
compares a nucleotide query sequence against a nucleotide sequence
database.
blastx
compares the six-frame conceptual translation products of a nucleotide
query sequence (both strands) against a protein sequence database.
For bl2seq, the nucleotide sequence should be the first sequence given.
psitblastn
compares a protein query sequence against a nucleotide sequence
database dynamically translated in all six reading frames (both
strands) using a position specific matrix created by PSI-BLAST.
tblastn
compares a protein query sequence against a nucleotide sequence
database dynamically translated in all six reading frames (both
strands). For bl2seq, the nucleotide sequence should be the second
sequence given.
tblastx
compares the six-frame translations of a nucleotide query sequence
against the six-frame translations of a nucleotide sequence database.
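The "six-frame conceptual translation" used by blastx, tblastn, and tblastx can be sketched as follows. This is a minimal illustration, assuming the standard genetic code (code 1) and a frame-numbering convention of +1..+3 for the given strand and -1..-3 for its reverse complement; the function names are hypothetical and the BLAST programs' own translation code is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (third base varies fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame number (+1..+3, -1..-3) to its protein translation."""
    seq = seq.upper()
    frames = {}
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            frames[strand * (offset + 1)] = translate(s[offset:])
    return frames
```

Each of the six strings would then be compared against protein sequences (blastx) or against the six translations of each database sequence (tblastx).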
Multiple Hits window size; generally defaults to 0 (for single-hit
extensions), but defaults to 40 when using discontiguous templates.
-BN (blast2)
Produce on-the-fly output:
0
none (default)
1
table of offsets and quality values
2
add sequence data
3
text ASN.1
4
binary ASN.1
-BN (blastall, blastall_old)
Number of concatenated queries, in blastn or tblastn mode
-Bfilename (blastpgp)
Input Alignment File for PSI-BLAST Restart
-CX (blast2, blastall, blastall_old, blastcl3)
Use composition-based statistics for blastp or tblastn:
T, t, D, or d
Default (equivalent to 1 for blast2 and blastall_old
and to 2 for blastall and blastcl3)
0, F, or f
No composition-based statistics
1
Composition-based statistics as in NAR 29:2994-3005, 2001
2
Composition-based score adjustment as in Bioinformatics 21:902-911,
2005, conditioned on sequence properties
3
Composition-based score adjustment as in Bioinformatics 21:902-911,
2005, unconditionally
When enabling statistics in blastall, blastall_old, or blastcl3 (i.e.,
not blast2), appending u (case-insensitive) to the mode enables
use of unified p-values combining alignment and compositional p-values
in round 1 only.
-Cfilename (blastpgp)
Output File for PSI-BLAST Checkpointing
-CN (seedtop)
Score only or not (default = 1)
-DN (bl2seq)
Output format:
0
traditional (default)
1
tabular
-DN (blast2, blastall, blastall_old, blastcl3)
Translate sequences in the database according to genetic code N
in /usr/share/ncbi/data/gc.prt (default is 1; only applies to tblast*)
-DN (megablast)
Type of output:
0
alignment endpoints and score
1
endpoints of all ungapped segments
2
traditional BLAST output (default)
3
tab-delimited one line format
4
incremental text ASN.1
5
incremental binary ASN.1
-DN (seedtop)
Cost decline to align (default = 99999)
-EN (bl2seq, blastcl3, megablast)
Extending a gap costs N (-1 invokes default behavior)
-EN (blast2, blastall, blastall_old)
Extending a gap costs N (-1 invokes default behavior:
non-affine if greedy, 2 otherwise)
blastcl3, impala, megablast, rpsblast)
Filter options for DUST or SEG; defaults to T for bl2seq,
blast2, blastall, blastall_old, blastcl3, and megablast, and to
F for blastpgp, impala, and rpsblast.
-F (seedtop)
Filter sequence with SEG.
-GN (bl2seq, blastcl3, megablast)
Opening a gap costs N (-1 invokes default behavior)
-GN (blast2, blastall, blastall_old)
Opening a gap costs N (-1 invokes default behavior: non-affine
if greedy, 5 if using dynamic programming)
-GN (blastpgp, impala, seedtop)
Opening a gap costs N (default is 11)
-H (blast2)
Produce HTML output
-HN (blastpgp)
End of required region in query (-1 indicates end of query)
-H (impala)
Print help (different from usage message)
-HN (megablast)
Maximal number of HSPs to save per database sequence (default is 0, unlimited)
-I "start stop" (bl2seq, blast2)
Location on first (query) sequence (applies only if file specified
with -i contains a single sequence)
blastpgp, impala, seedtop)
Use matrix str (default = BLOSUM62)
-MN (megablast)
Maximal total length of queries for a single search (default = 5000000)
-N (blast2)
Show only accessions for sequence IDs in tabular output
-NX (blastpgp, rpsblast)
Number of bits to trigger gapping (default = 22.0)
-NN (megablast)
Type of a discontiguous word template:
0
coding (default)
1
optimal
2
two simultaneous
-Ofilename (blastall, blastall_old, blastcl3,
blastpgp, impala, megablast, rpsblast, seedtop)
Write (ASN.1) sequence alignments to filename; only valid for
blastpgp, impala, rpsblast, and seedtop with -J, and only valid
for megablast with -D2.
blastpgp, megablast, rpsblast)
Use words of size N (length of best perfect match). Zero invokes
default behavior, except with megablast, which defaults to 28, and
blastpgp, which defaults to 3. The default values for the other
commands vary with "program": 11 for blastn, 28 for megablast, and 3
for everything else.
blastpgp, megablast, rpsblast, seedtop)
X dropoff value for gapped alignment (in bits). Zero invokes default
behavior, except with megablast, which defaults to 20, and rpsblast
and seedtop, which default to 15. The default values for the other
commands vary with "program": 30 for blastn, 20 for megablast, 0 for
tblastx, and 15 for everything else.
megablast, rpsblast)
X dropoff value for final [dynamic programming?] gapped alignment in
bits (default is 100 for blastn and megablast, 0 for tblastx, 25 for
others)
impala, megablast, seedtop)
Database to use (default is nr for all executables except blast2,
which requires a second FASTA sequence if this is not set)
-dfilename (rpsblast)
RPS BLAST Database
-eX
Expectation value (E) (default = 10.0)
-fX (blast2, blastall, blastall_old, blastcl3)
Threshold for extending hits; a value of zero selects the default:
0 for blastn and megablast, 11 for blastp, 12 for blastx, and 13 for
tblastn and tblastx.
-fN (blastpgp)
Threshold for extending hits (default 11)
-f (megablast)
Show full IDs in the output (default: only GIs or accessions)
-f (seedtop)
Force searching for patterns even if they are too likely
-g F (bl2seq, blastall, blastall_old, blastcl3)
Do not perform gapped alignment (N/A for tblastx)
-g (blast2)
Use greedy algorithm for gapped extensions
-g F (megablast)
Make discontiguous megablast generate words for every base of the
database (mandatory with the current BLAST engine)
-hN (blast2)
Frame shift penalty for out-of-frame gapping (blastx, tblastn only;
default is zero)
-hX (blastpgp, impala)
e-value threshold for inclusion in multipass model (default = 0.002
for blastpgp, 0.005 for impala)
-ifilename
Read (first, query) sequence or set from filename (default is
stdin; not needed for blastpgp if restarting from scoremat)
-jfilename (bl2seq, blast2)
Read second (subject) sequence or set from filename
-jN (blastpgp)
Maximum number of passes to use in multipass version (default = 1)
Length of a discontiguous word template (the largest intron allowed in
a translated nucleotide sequence when linking multiple distinct
alignments; default = 0; negative values disable linking for blastall,
blastall_old, and blastcl3.)
-tN[u] (blastpgp)
Composition-based score adjustment.
The first character is interpreted as follows:
0, F, or f
no composition-based statistics
1
composition-based statistics as in NAR 29:2994-3005, 2001
2, T, or t
composition-based score adjustment as in Bioinformatics
21:902-911, 2005, conditioned on sequence properties in round 1 (default)
3
composition-based score adjustment as in Bioinformatics
21:902-911, 2005, unconditionally in round 1
When composition-based statistics are in use, appending u
(case-insensitive) to the argument requests a unified p-value combining
the alignment and compositional p-values in round 1 only.
-tN (megablast)
Length of a discontiguous word template (contiguous word if 0 [default])
-u (blast2)
Do only ungapped alignment (always TRUE for tblastx)
-ustr (blastcl3)
Restrict search of database to results of Entrez2 lookup
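As an illustration of how the common options combine, the following sketch assembles a blastall command line from the -p, -d, -i, and -e options. The helper name and the file name query.fasta are hypothetical; the sketch only builds and prints the argument list and does not run the program.

```python
import shlex


def blastall_argv(program, database, query_file, evalue=10.0):
    """Build an argument list for blastall (nothing is executed here).

    program    -- value for -p, e.g. "blastp" or "blastn"
    database   -- value for -d, e.g. "nr"
    query_file -- value for -i (hypothetical file name in the example)
    evalue     -- value for -e, the expectation value cutoff
    """
    return ["blastall",
            "-p", program,
            "-d", database,
            "-i", query_file,
            "-e", str(evalue)]


argv = blastall_argv("blastp", "nr", "query.fasta", evalue=0.001)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run() on a system where blastall and the named database are installed. Passing a list rather than a single string avoids shell quoting problems with file names.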