FASTA and BLAST

microbenotes.com/fasta-and-blast

Public databases hold a very large number of DNA and protein sequences.
Searching a database involves aligning the query sequence to each sequence in the database to find significant local alignments.
BLAST and FASTA are two similarity searching programs that identify homologous DNA and protein sequences based on sequence similarity in excess of what chance would produce.
They provide facilities for comparing DNA and protein sequences against existing DNA and protein databases.
Both are heuristic algorithms, and they are the two major ones used for database searches.

Working of FASTA and BLAST


FASTA and BLAST are software tools used in bioinformatics. Both use a heuristic word method for fast pairwise sequence alignment.
The word method works by finding short stretches of identical or nearly identical letters in two sequences.
These short strings of characters are called words.
The basic assumption is that two related sequences must have at least one word in common.
Once word matches are identified, a longer alignment is obtained by extending the regions of similarity outward from the words.
Once regions of high sequence similarity are found, adjacent high-scoring regions can be joined into a full alignment.
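
The word method described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual BLAST or FASTA implementation: the function names, the word size `w=3`, the match/mismatch scores, and the `drop` cutoff are all assumptions chosen for readability, and only exact word matches are considered.

```python
def find_word_hits(query, target, w=3):
    """Return (query_pos, target_pos) pairs where a length-w word matches exactly."""
    positions = {}
    for i in range(len(query) - w + 1):
        positions.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(target) - w + 1):
        for i in positions.get(target[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query, target, i, j, w=3, match=1, mismatch=-1, drop=2):
    """Extend a word hit in both directions without gaps.

    Extension in a direction stops when the running score falls more than
    `drop` below the best score seen so far (an illustrative cutoff).
    Returns (score, query_start, target_start, length).
    """
    best = run = w * match
    best_right = 0
    k = 0
    # Extend to the right of the word.
    while i + w + k < len(query) and j + w + k < len(target):
        run += match if query[i + w + k] == target[j + w + k] else mismatch
        k += 1
        if run > best:
            best, best_right = run, k
        elif best - run > drop:
            break
    # Extend to the left, starting again from the best score reached so far.
    run = best
    best_left = 0
    k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        run += match if query[i - k - 1] == target[j - k - 1] else mismatch
        k += 1
        if run > best:
            best, best_left = run, k
        elif best - run > drop:
            break
    return best, i - best_left, j - best_left, w + best_left + best_right


if __name__ == "__main__":
    q, t = "ACGTTGCA", "TTACGTTGAA"
    for i, j in find_word_hits(q, t):
        print((i, j), extend_hit(q, t, i, j))
```

Every hit in this toy example lies on the same diagonal, so each one extends to the same ungapped segment; a real search tool would merge such overlapping hits rather than extend each separately.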

The main difference between the two is that BLAST concentrates on finding ungapped, locally optimal sequence alignments, whereas FASTA is geared toward finding similarities between less similar sequences.

BLAST (Basic Local Alignment Search Tool)

The BLAST program was developed by Stephen Altschul and colleagues at NCBI in 1990 and has since become one of the most popular programs for sequence analysis.
BLAST uses heuristics to align a query sequence with all sequences in a database.
The objective is to find high-scoring ungapped segments among related sequences. The existence of such segments above a given threshold indicates pairwise similarity beyond random chance, which helps to discriminate related sequences from unrelated sequences in a database.
BLAST is popular as a bioinformatics tool because it identifies regions of local similarity between two sequences quickly. For each match it calculates an expectation value (E-value), which estimates the number of alignments scoring at least as well that would be expected by chance in a database of that size. The alignments it reports are local.
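
To make the expectation value concrete, the sketch below uses the Karlin-Altschul form E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length, and n the database length. The formula is not given in the text above, and the default values of `k` and `lam` are placeholders for illustration only; in practice they depend on the scoring matrix and gap penalties in use.

```python
import math


def expect_value(raw_score, query_len, db_len, k=0.13, lam=0.32):
    """Expected number of chance alignments scoring at least `raw_score`.

    Uses E = K * m * n * exp(-lambda * S). The defaults for `k` and `lam`
    are illustrative placeholders, not values taken from any real search.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    for score in (30, 50, 70):
        print(score, expect_value(score, query_len=250, db_len=1_000_000))
```

Under this formula a higher score gives an exponentially smaller E-value, and a larger database gives a proportionally larger one, which is why the same alignment looks less significant when found in a bigger database.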

Variants of BLAST
BLAST-N: compares a nucleotide sequence against nucleotide sequences
BLAST-P: compares a protein sequence against protein sequences
BLAST-X: compares the six-frame translations of a nucleotide sequence against protein sequences
tBLAST-N: compares a protein sequence against the six-frame translations of nucleotide sequences
tBLAST-X: compares the six-frame translations of a nucleotide sequence against the six-frame translations of nucleotide sequences
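
Several of these variants rely on six-frame translation: a nucleotide sequence is read in three frames on the forward strand and three on its reverse complement. The following is a minimal sketch of that step, assuming the standard genetic code and uppercase A/C/G/T input; the function names and frame labels (`+1` to `-3`) are illustrative choices, not part of BLAST itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels '+1'..'+3', '-1'..'-3' to proteins."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`. A translated search then compares each of these six protein strings, rather than the nucleotide sequence itself, against the other side of the comparison.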

FASTA
FASTA stands for “fast-all” or “FastA”.
It was the first widely used database similarity search tool and predates BLAST.
FASTA is another sequence alignment tool, used to search for similarities between DNA or protein sequences.
FASTA uses a “hashing” strategy to find matches for short stretches of identical residues of length k. These strings of residues are known as k-tuples or ktups; they are equivalent to words in BLAST but are normally shorter.
Typically, a ktup is two residues long for protein sequences and six residues long for DNA sequences.
The query sequence is thus broken down into k-tuples, and the target sequences are searched for these k-tuples in order to find the similarities between the two.
FASTA is well suited to similarity searches, particularly between less similar sequences.
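
The hashing step can be illustrated as follows. This is a simplified sketch under stated assumptions, not FASTA's actual code: the query's k-tuples are stored in a hash table keyed by the k-tuple string, the target is scanned for matching k-tuples, and hits are tallied by diagonal (query position minus target position), since hits sharing a diagonal can belong to the same ungapped region. The function names and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_ktup_index(query, ktup):
    """Hash table mapping each k-tuple in the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    return index


def diagonal_hits(query, target, ktup=2):
    """Count k-tuple matches per diagonal, where diagonal = query_pos - target_pos."""
    index = build_ktup_index(query, ktup)
    diagonals = Counter()
    for j in range(len(target) - ktup + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for i in index.get(target[j:j + ktup], ()):
            diagonals[i - j] += 1
    return diagonals


if __name__ == "__main__":
    hits = diagonal_hits("HEAGAWGHEE", "PAWHEAE", ktup=2)
    print(hits.most_common())
```

The diagonal with the most hits is the most promising candidate for a region of similarity. A smaller `ktup` produces more hits and so is more sensitive but slower, which matches the trade-off behind using two residues for proteins and six for DNA.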

Heuristic methods such as BLAST and FASTA are not guaranteed to find the optimal alignment or true homologs, but they are 50–100 times faster than dynamic programming.
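
For comparison, the dynamic programming approach that these heuristics approximate can be sketched as a Smith-Waterman local alignment score. The scoring values (`match=2`, `mismatch=-1`, and a linear `gap=-2`) are assumptions for illustration; real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b with a linear gap penalty.

    Fills the full (len(a) + 1) x (len(b) + 1) table one row at a time,
    so the work grows with the product of the two sequence lengths.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTTGCA", "TTACGTTGAA"))
```

Because every cell of the table is evaluated for every database sequence, this method finds the optimal local score but at a cost that the word-based heuristics avoid by only examining regions around word matches.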

