Some Significant Databases
These three databases contain the same data at any given time!
Genome
This resource organizes information on genomes including sequences,
maps, chromosomes, assemblies, and annotations.
http://www.ncbi.nlm.nih.gov/genome/
http://www.genome.jp/kegg/
Ensembl
BLAST
Basic Local Alignment Search Tool
Local vs Global alignment
Comparing biological sequences
Query vs library or database of sequences
Identifies library sequences that resemble the query above a certain threshold
One of the most widely used programs for sequence searching
Heuristic algorithm
Faster than calculating an optimal alignment with a full alignment procedure (e.g. the Smith–Waterman algorithm)
Cannot guarantee optimal alignments
Time-efficient
Searches only for the more significant patterns in the sequences
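For contrast with the heuristic approach, here is a minimal sketch of the full dynamic-programming procedure that BLAST avoids. It computes only the best Smith–Waterman local alignment score and fills one table cell per pair of residues, which is why it becomes slow on large databases. The function name and the match, mismatch and gap scores are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions (linear gap penalty).
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

The nested loops visit every (i, j) pair, so the running time grows with the product of the two sequence lengths.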
Input
Output
Graphical format showing the hits found
Table showing the sequences identified, with scoring data
Process
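Heuristic method
Finds short matches between two sequences
This step is called seeding
After the first match, BLAST begins to make local alignments
Default word size is 3 letters (protein search)
Example: the sequence GLKFA gives the words
GLK, LKF, KFA
Heuristic algorithm
Locates all common three-letter words between the query and the database sequences
Uses these matches to build an alignment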
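The word list from the GLKFA example can be reproduced with a few lines of Python. This is only a sketch of the word-splitting step; the function name `words` is a hypothetical helper chosen for illustration.

```python
def words(sequence, k=3):
    """Return every overlapping k-letter word of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(words("GLKFA"))  # ['GLK', 'LKF', 'KFA']
```

A sequence shorter than the word size yields an empty list, so it cannot seed any alignment.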
Process…
Words must have an alignment score of at least the threshold T
Default scoring matrix: BLOSUM62
The alignment is extended in both directions from the seed
If the score is higher than T, the alignment is included in the results
Otherwise it is discarded
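The extension step can be sketched as follows. Several things here are assumptions made to keep the example short: a toy match/mismatch score stands in for BLOSUM62, the extension is ungapped, and it stops once the running score falls more than `x_drop` below the best score seen so far (a stopping rule not described on the slide). All function names are hypothetical.

```python
MATCH, MISMATCH = 5, -4  # toy scores (assumption), standing in for BLOSUM62


def pair_score(a, b):
    return MATCH if a == b else MISMATCH


def extend_one_way(query, subject, q, s, step, x_drop):
    """Walk from (q, s) in direction step (+1 or -1).

    Returns (best score gained, number of residues added to reach it).
    """
    total = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        total += pair_score(query[q], subject[s])
        length += 1
        if total > best:
            best, best_len = total, length
        elif best - total > x_drop:
            break  # score has dropped too far; give up extending
        q += step
        s += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, k=3, x_drop=10):
    """Extend a k-letter seed in both directions; return (score, q_start, q_end)."""
    seed = sum(pair_score(query[q_pos + i], subject[s_pos + i]) for i in range(k))
    right, r_len = extend_one_way(query, subject, q_pos + k, s_pos + k, +1, x_drop)
    left, l_len = extend_one_way(query, subject, q_pos - 1, s_pos - 1, -1, x_drop)
    return seed + left + right, q_pos - l_len, q_pos + k + r_len


def keep_alignment(score, threshold):
    """Include the alignment only if its score is higher than the threshold T."""
    return score > threshold


score, start, end = extend_seed("AAGLKFAWW", "CCGLKFAYY", 2, 2)
print(score, "AAGLKFAWW"[start:end], keep_alignment(score, threshold=20))
```

The seed GLK grows to GLKFA because the two extra matches raise the score, while the mismatching flanks on either side are left out.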
Increasing the T score limits
Amount of search?
Process speed?
Overview of the BLAST algorithm
Remove low-complexity regions in the query sequence
Make a k-letter word list of the query sequence
Organize the high-scoring words into an efficient search tree
Scan the database sequences for exact matches to these words
Extend the exact matches to high-scoring segment pairs (HSPs)
List all HSPs in the database that score higher than T
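The word-list and database-scanning steps of the overview can be sketched as below. This is a simplified illustration: a Python dictionary stands in for the search tree, low-complexity filtering and scoring of neighborhood words are omitted, and the database is a small hypothetical dictionary of sequences keyed by identifier.

```python
from collections import defaultdict


def build_word_index(query, k=3):
    """Map each k-letter word of the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query, database, k=3):
    """Return (sequence_id, query_pos, subject_pos) for every exact word match."""
    index = build_word_index(query, k)
    seeds = []
    for seq_id, subject in database.items():
        for j in range(len(subject) - k + 1):
            # .get avoids adding empty entries to the defaultdict.
            for i in index.get(subject[j:j + k], ()):
                seeds.append((seq_id, i, j))
    return seeds


database = {"s1": "MKGLKFAT", "s2": "PPPPPP"}  # hypothetical toy database
print(find_seeds("GLKFA", database))
```

Each seed returned here would then be passed to the extension step to decide whether it grows into an HSP.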
BLAST Programs
Nucleotide-nucleotide BLAST (blastn)
DNA query, DNA database
Protein-protein BLAST (blastp)
Protein query, Protein database
Position-Specific Iterative BLAST (PSI-BLAST)
To find distant relatives of a protein
A list of all closely related proteins is created
These proteins are combined into a profile
A query against the protein database is then run using this profile
The process is repeated iteratively
BLAST Programs
Nucleotide 6-frame translation-protein (blastx)
Compares the six-frame translation products of a nucleotide query against a protein database
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
Slowest of the BLAST family
Translates the query nucleotide sequence in all six possible frames
Compares it against the six-frame translations of a nucleotide sequence database
To find very distant relationships between nucleotide sequences
Protein-nucleotide 6-frame translation (tblastn)
Compares a protein query against all six reading frames of a nucleotide sequence database
Large numbers of query sequences (megablast)
Comparing large numbers of input sequences via the command-line BLAST
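The six-frame translation used by blastx, tblastx and tblastn can be sketched as follows. The example assumes the standard genetic code and an unambiguous DNA alphabet (A, C, G, T); codons containing any other letter are translated as X. The function names and the frame labels (+1 to +3, -1 to -3) are conventions chosen for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the translations of the three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```

Because every nucleotide sequence produces six protein sequences, searches that translate both the query and the database (tblastx) do the most work.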
Uses of BLAST
Identifying species
Correctly identify a species or find homologous species
Locating domains
To locate known domains within the sequence of interest
Establishing phylogeny
Create a phylogenetic tree using the different alignments given by BLAST
DNA mapping
Compare an unknown sequence against known sequences
Some definitions in BLAST
Score
Score of the alignment
Query coverage
Percentage of the query length that is covered by the alignment
Identity
Percentage of identical positions within the covered part of the query
E value (Expected value)
Describes the random background noise
Describes the number of hits one can "expect" to see by chance when searching a database of a given size
It decreases exponentially as the score (S) of the match increases
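The three definitions above can be illustrated with a short sketch. The E value follows the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. The values of K and lambda below are placeholders of the size often quoted for ungapped BLOSUM62 scoring; real BLAST derives them from the scoring system in use. The function names are hypothetical.

```python
import math


def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns in which both residues are identical."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        a == b and a != "-" for a, b in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(query_start, query_end, query_length):
    """Percentage of the query spanned by the alignment (end is exclusive)."""
    return 100.0 * (query_end - query_start) / query_length


def expected_hits(score, query_length, database_length, k=0.134, lam=0.318):
    """E = K * m * n * exp(-lambda * S); k and lam are assumed placeholder values."""
    return k * query_length * database_length * math.exp(-lam * score)


print(percent_identity("GLKFA", "GLRFA"))
print(query_coverage(2, 7, 10))
for s in (30, 40, 50):
    print(s, expected_hits(s, query_length=100, database_length=1_000_000))
```

Raising the score by a fixed amount divides E by a constant factor, which is the exponential decrease described above.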