BLAST - Practical Information
We will use the TRIM5α nucleotide sequence in FASTA format (accession number AY843504.1) for a
blastn search. You can use either the accession number or the FASTA sequence (the one you
downloaded above; it is also available in /3.practical) as the query.
You must specify the database you want to search. To search the complete GenBank database,
choose the “Nucleotide collection (nr/nt)” database. Click on the BLAST button to start
the search.
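The same search can also be described outside the web form. The sketch below only builds the parameter string for a submit request to NCBI's BLAST URL API (CMD=Put with PROGRAM, DATABASE and QUERY); it does not send anything over the network. The helper name build_blast_request is made up for illustration, and the database name nt is assumed to correspond to the “Nucleotide collection (nr/nt)” choice in the web form.

```python
from urllib.parse import urlencode

# Address of the NCBI BLAST URL API (the request body below would be sent here).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(query, program="blastn", database="nt"):
    """Return the URL-encoded parameters of a BLAST 'Put' (submit) request.

    Hypothetical helper for illustration: it only builds the string and
    never contacts NCBI.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # blastn, blastp, blastx, ...
        "DATABASE": database,  # assumed name of the nr/nt nucleotide collection
        "QUERY": query,        # accession number or raw FASTA text
    }
    return urlencode(params)


body = build_blast_request("AY843504.1")
print(body)
```

Submitting by accession number and submitting the FASTA text are equivalent here: either one is passed as the QUERY value.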
A graphical overview with color-coded alignment scores is provided for the most similar sequences.
In the Descriptions tab, the output of the blastn search is shown as a table of sequences
producing significant alignments. The first hits have the highest Max score and percent identity;
one of them is, in fact, the query sequence itself. The Graphic Summary tab gives a visual
presentation of the results. For most hits, there are discontinuous stretches with high
alignment scores. The actual alignments between the query and the hits are shown in the
Alignments tab. The Taxonomy tab gives a breakdown of the hits by taxon. The first hits are all
TRIM5α sequences from African green monkeys.
You can also filter the results by organism name, percent identity, E-value and query coverage
under Filter Results.
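What the Filter Results box does can be pictured as a simple filter over the hit table. The sketch below is only an illustration: the Hit record, the filter_hits helper and all values in the example list are invented placeholders, not real BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row of a hit table (hypothetical structure for illustration)."""
    organism: str
    percent_identity: float
    evalue: float
    query_coverage: float


def filter_hits(hits, organism=None, min_identity=0.0,
                max_evalue=float("inf"), min_coverage=0.0):
    """Keep only the hits that pass every requested threshold."""
    kept = []
    for hit in hits:
        if organism is not None and organism.lower() not in hit.organism.lower():
            continue
        if hit.percent_identity < min_identity:
            continue
        if hit.evalue > max_evalue:
            continue
        if hit.query_coverage < min_coverage:
            continue
        kept.append(hit)
    return kept


# Placeholder rows with made-up numbers, NOT results of a real search.
hits = [
    Hit("Organism A", 100.0, 0.0, 100.0),
    Hit("Organism B", 90.0, 1e-50, 95.0),
    Hit("Organism C", 75.0, 1e-5, 40.0),
]

strict = filter_hits(hits, min_identity=80.0, max_evalue=1e-10, min_coverage=90.0)
print([hit.organism for hit in strict])
```

Filtering of this kind only changes which rows are displayed; it does not rerun the search.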
3. Go back to the blastn search page, scroll to the bottom and click on Algorithm parameters. This
will expand the options for the BLAST search, where the parameters of the algorithm can be
adjusted. Set the match/mismatch scores to 1,-4.
If you tick “Show results in a new window”, you can compare the results more easily. Compare the
result of this BLAST search with the previous one (you can rerun the search with default
parameters if you did not leave the earlier results open). What differences do you see after
changing the algorithm parameters?
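To get a feeling for what the match/mismatch setting does, the sketch below scores an ungapped alignment of two toy sequences under two scoring schemes. The sequences and the function name ungapped_score are invented for illustration, and 1,-2 is used only as a milder comparison point; real BLAST scoring also involves gap costs and statistical scaling that are left out here.

```python
def ungapped_score(query, subject, match=1, mismatch=-4):
    """Sum match rewards and mismatch penalties over an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


# Toy sequences that differ at a single position (the fifth base).
query = "ACGTACGTAC"
subject = "ACGTTCGTAC"

mild = ungapped_score(query, subject, match=1, mismatch=-2)
strict = ungapped_score(query, subject, match=1, mismatch=-4)
print(mild, strict)  # 9 matches and 1 mismatch: 7 with 1,-2 and 5 with 1,-4
```

With a harsher mismatch penalty, every difference costs more, so alignments to more divergent sequences lose score faster than alignments to near-identical ones.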
4. Run blastn again, but restrict the organism to Chlorocebus. Is the first hit significant?
This is another way to shrink the results but not the database. Compare the search summary of
this search to the search summary of a search without the organism limit. The statistics are the
same, so the size of the database did not change. Alternatively, you can filter the results by
organism name in Filter Results, but this will only show you the top 20 hits.
5. We will also perform a protein BLAST search using the amino acid translation of the TRIM5α
coding sequence (accession number AAV91975.1; it is also available in /3.practical). Use blastp.
During the search, a page will be displayed that shows four putative conserved domains detected
in the query. Alternatively, you can use the nucleotide sequence (AY843504.1) in a blastx search
to find similar protein sequences using the translated nucleotide sequence.
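A blastx search compares translations of the nucleotide query against a protein database. The sketch below shows what translating a nucleotide sequence in all six reading frames looks like, assuming the standard genetic code; the function names and the nine-base toy sequence are invented for illustration and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate a DNA sequence in the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")  # toy sequence: Met, Ala, stop in frame +1
for name, protein in sorted(frames.items()):
    print(name, protein)
```

Only one of the six frames usually corresponds to the real protein, which is why blastx hits report the frame in which the alignment was found.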
6. Run a blastn search on nerd.fasta (available in /3.practical). This is the sequence used by
Michael Crichton in The Lost World. What do you notice about the sequence? Have a look at the
Taxonomy tab! What is it most closely related to (top hit)?
Go back and run blastx on the same sequence. Scroll to the first few alignments. What do you
notice in the gaps? In case you are wondering, Mark Boguski was a scientist at NCBI.
You will notice that some amino acid positions are shown in lower case and gray.
7. Go to blastp and enter the following short sequence as the query:
>test
NIRVANA
Then tick “Show results in a new window” and click BLAST. Note the E-value. The lower the
E-value, the more significant the hit.
Now go back, open Algorithm parameters and untick “Automatically adjust parameters for short input
sequences”. Click BLAST again.
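Why a very short query struggles to reach a convincing E-value can be seen from the textbook relation E = m × n × 2^(-S'), where m is the query length, n the database length and S' the bit score. The sketch below evaluates that relation directly; it ignores the effective-length corrections that BLAST applies, and the database size and bit scores used in the example are arbitrary round numbers chosen for illustration.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S').

    Simplified textbook form without effective-length corrections.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


DATABASE_LENGTH = 1e9  # assumed database size in residues, for illustration only

# A 7-letter query such as NIRVANA can only reach a low bit score,
# so the number of equally good chance hits stays large.
low_score = expected_hits(bit_score=20, query_length=7,
                          database_length=DATABASE_LENGTH)
# A longer alignment can reach a higher bit score, and E drops sharply.
high_score = expected_hits(bit_score=60, query_length=20,
                           database_length=DATABASE_LENGTH)

print(f"{low_score:.3g} {high_score:.3g}")
```

Each additional bit of score halves the E-value, while doubling the query or database length doubles it.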
8. If you have extra time, go to blastp and try some different names made up only of letters from
the amino acid alphabet (ARNDCQEGHILKMFPSTWYVBZ). Note how the E-values change as the number of
letters increases. You can try NEIL, FLEA, ELVIS, SLASH, EMINEM, HANNAH, NICKCAVE, STEPHEN.
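Before pasting a name into blastp, you can check whether it uses only letters from the alphabet listed above. The sketch below does this for the suggested names; the helper name is_valid_protein_query is made up, and JOHN is added here only as an example of a name that fails the check.

```python
# Amino acid alphabet as given in the exercise text.
AMINO_ACID_LETTERS = set("ARNDCQEGHILKMFPSTWYVBZ")


def is_valid_protein_query(name):
    """True if the name is non-empty and uses only amino acid letters."""
    return bool(name) and set(name.upper()) <= AMINO_ACID_LETTERS


names = ["NEIL", "FLEA", "ELVIS", "SLASH", "EMINEM", "HANNAH",
         "NICKCAVE", "STEPHEN", "JOHN"]  # JOHN is an added counterexample

valid = [name for name in names if is_valid_protein_query(name)]
invalid = [name for name in names if not is_valid_protein_query(name)]

# Sorting by length mirrors the exercise: try shorter names first, then longer ones.
for name in sorted(valid, key=len):
    print(len(name), name)
print("not usable:", invalid)
```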