Bioinformatics Session11

This document discusses using various BLAST programs like TBLASTN, PSI-BLAST and DELTA-BLAST to search nucleotide and protein sequences against databases. It provides examples of using these programs to search the multidomain viral protein HIV-1 Pol against bacterial and human sequences. It also outlines a general strategy for a "Find-a-gene project" to discover a novel gene using TBLASTN and BLASTX/BLASTP searches. Key steps include searching a known protein against DNA databases, analyzing hits, and using multiple sequence alignments to confirm a novel protein is homologous to the query.

Uploaded by

Rohan Ray

Bioinformatics (BIO213)

Session 11
TBLASTN, PSI-BLAST, DELTA-BLAST …
Why are they around?
Which BLAST programs have you used so far?
• BLASTn
• BLASTp
• Smart BLAST
• Global Align
• Multiple Alignment
What are they used for?
When are they used? Example…
• BLASTn: nucleotide-to-nucleotide search
• BLASTp: protein-to-protein search
• BLASTn vs BLASTp: Which is preferred and why?
• Which scoring matrix/scheme is generally used and why?
TBLASTN, PSI-BLAST, DELTA-BLAST …
Why are they around?
• BLAST Searching with Multidomain Protein: HIV-1 Pol
• The Gag‐Pol protein of HIV‐1 (NP_057849.4) is a multidomain protein
of 1435 AA residues with protease, reverse transcriptase, and
integrase domains.
Kinds of searches we can perform with such a viral protein:
[Figure: BLASTP output for the HIV-1 Pol query. Graphical overview (clicking on domains takes you to domain databases) and list of alignments (query-anchored, with dots for identities), marking perfectly conserved and rarely substituted residues.]
• Taxonomy report for a BLASTP search shows
an overview of which species have proteins
matching the HIV‐1 query.
• Most matches are viral, but others include
rabbit, fungal, pig, and insect sequences.
• To learn more about the distribution of Pol proteins
throughout the tree of life, we may further ask what
bacterial proteins are related to the viral HIV‐1 Pol
polyprotein.

• Repeat the BLASTP search with NP_057849 as the query, but limit the search to “Bacteria”.
BLASTP searching HIV-1 pol against bacterial proteins

[Figure: graphical overview of the bacterial matches to HIV-1 Pol. Labeled regions: bacterial matches to the HIV-1 retropepsin and reverse transcriptase domains; bacterial matches to the HIV-1 ribonuclease H domain; bacterial matches to the HIV-1 integrase core domain.]

• This suggests that the ribonuclease H and integrase core domains of HIV‐1 match many dozens of bacterial
proteins.
• You can inspect pairwise alignments to confirm that the viral and bacterial proteins are homologous, often
sharing about 30% amino acid identity over spans of over 150 amino acids.
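When inspecting a pairwise alignment, the identity figure can be recomputed by hand. The sketch below is a minimal illustration, assuming the two sequences are already aligned (equal length, with "-" for gaps) and that identity is counted over the full alignment length, gap columns included, as in the "Identities" line of a BLAST report. The function name and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes the inputs are two rows of one pairwise alignment.
    Gap columns count toward the alignment length but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


# Toy alignment (hypothetical sequences): 4 identical columns out of 5.
print(percent_identity("AC-EF", "ACDEF"))
```

An identity value alone does not establish homology; the length of the aligned span and the E-value reported by BLAST should be read alongside it.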
BLAST searching HIV-1 pol against human sequences

Question: are there human homologs of HIV-1 pol protein?
Query: HIV-1 Pol
Program: BLASTP
Database: human nr (nonredundant)
Matches: many human proteins share significant identity.
BLAST searching HIV-1 pol against human sequences

Question: are there human RNA transcripts corresponding to HIV-1 pol?
Query: HIV-1 Pol
Program: TBLASTN
Database: human ESTs
Matches: many human genes are actively transcribed to generate transcripts homologous to HIV-1 pol.

TBLASTN/X helps when searching across highly diverged species.

PSI-BLAST and DELTA-BLAST serve the same purpose.
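The choice among the basic BLAST programs follows from the type of the query and the type of the database. The lookup below is a small sketch of that rule; the function name and the `translate_both` flag are hypothetical helpers for illustration and are not part of any BLAST tool.

```python
# Program choice by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Return the BLAST program name for a query/database pairing.

    translate_both=True selects tblastx, which translates a nucleotide
    query and a nucleotide database and compares them at the protein level.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError("types must be 'nucleotide' or 'protein'") from None


# A protein query (e.g. HIV-1 Pol) against ESTs calls for tblastn.
print(choose_blast_program("protein", "nucleotide"))
```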
Using BLAST for gene discovery: FIND-A-GENE

• A common problem in biology is finding a new gene.


• Traditionally, genes and proteins were identified using the techniques
of molecular biology and biochemistry.
• Such experimental biology approaches will always remain essential
but have practical limitations.
• Bioinformatics approaches can also be useful to provide evidence for
the existence of new genes.
• For our purposes a “new” gene refers to the discovery of some DNA
sequence in a database that is not annotated (described).
• You may want to find new genes for many reasons:
• Can you think of a few?
A general strategy for “Find-a-gene project” to practice BLAST

[Flowchart: start with the sequence of a known protein → TBLASTN → inspect the output → BLASTX nr or BLASTP nr]

Start with the sequence of a known protein, e.g. human beta globin (NP_000509), to search for a novel globin gene.
A general strategy for “Find-a-gene project” to practice BLAST

2) Perform a TBLASTN search against a DNA database consisting of genomic DNA or ESTs.
Include the output of that BLAST search in your document.
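TBLASTN compares a protein query against the six-frame translation of each nucleotide sequence in the database. The sketch below illustrates only that translation step, assuming the standard genetic code; it is a teaching aid, not how BLAST is implemented, and the function names are chosen here for illustration.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame number (+1..+3 forward, -1..-3 reverse) to its translation."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


# Short toy DNA fragment; frame +1 reads ATG GTG CAC CTG.
for frame, protein in sorted(six_frame_translation("ATGGTGCACCTG").items()):
    print(frame, protein)
```

Only one of the six frames usually matches the protein query, which is why the TBLASTN output reports the frame of each hit.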
A general strategy for “Find-a-gene project” to practice BLAST

3) You need to distinguish between a perfect match to your query (not “novel”), a near match (might be “novel”, depending on the results), and a nonhomologous result.
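One way to sort TBLASTN hits into the three groups of step 3 is to read the percent identity and E-value of each hit. The sketch below assumes the search was saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns). The E-value cutoff of 1e-5 is an assumption chosen for illustration, the rows are made-up sample data, and alignment coverage of the query is ignored for brevity.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def classify_hit(pident, evalue, max_evalue=1e-5):
    """Label one hit; max_evalue is an illustrative cutoff, not a standard."""
    if evalue > max_evalue:
        return "nonhomologous"
    if pident >= 100.0:
        return "perfect match"
    return "near match"


def classify_tabular_hits(text):
    """Return (subject id, label) for each row of tabular BLAST output."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    return [
        (row["sseqid"], classify_hit(float(row["pident"]), float(row["evalue"])))
        for row in reader
    ]


# Hypothetical rows, invented for this example.
rows = [
    ("query", "est_A", "100.000", "147", "0", "0", "1", "147", "10", "450", "1e-100", "290"),
    ("query", "est_B", "62.500", "144", "54", "0", "2", "145", "5", "436", "3e-50", "160"),
    ("query", "est_C", "28.000", "25", "18", "0", "40", "64", "90", "164", "5.2", "20"),
]
sample = "\n".join("\t".join(row) for row in rows)

for subject, label in classify_tabular_hits(sample):
    print(subject, label)
```

A near match is only a candidate; whether it is novel is decided in the later BLASTX or BLASTP step.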

Gather information about this “novel” protein
• At a minimum, identify the protein sequence of the “novel”
protein as displayed in the BLAST results.
• Propose a name for the novel protein (e.g., “Krishnazoa globin”),
and report the species from which it derives.
• It is very unlikely (but still possible) that you will find a novel gene
from an organism such as S. cerevisiae, human, or mouse,
because those genomes have already been thoroughly
annotated.
• It is more likely that you will discover a new gene in a genome
that is currently being sequenced, such as bacteria or mosses or
protozoa.
A general strategy for “Find-a-gene project” to practice BLAST

(4)
• Use the DNA sequence of the EST and perform a BLASTX query against the nonredundant (nr) database.
• As an alternative strategy, take the encoded protein sequence, and use it as a query in a BLASTP search of the
nonredundant (nr) database at NCBI.
Demonstrate that this gene, and its
corresponding protein, are novel
For the purposes of this course, “novel” is defined as follows.
• If there is a 100% identity match to a protein in the database from the same
species, then your protein is NOT novel (even if the match is to a protein with
a name such as “unknown”).
• If the best match is to a protein with < 100% identity to your query, then it is
likely that your protein is novel and you have succeeded.
• If there is a match with 100% identity but to a different species than the one
you started with, then you have succeeded in finding a novel gene.
• If there are no database matches to the original query from step (1), this
indicates that you have found a DNA/protein that is not homologous to the
original query. You should start over.
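The four rules above can be written as a small decision function. This is a sketch of the course definition only; the function name, parameter names, and return strings are hypothetical, and the species comparison is a plain case-insensitive string match.

```python
def assess_novelty(best_identity, best_hit_species, source_species,
                   original_query_found=True):
    """Apply the course definition of "novel" to the best nr database match.

    best_identity: percent identity of the best match (0 to 100).
    best_hit_species: species of the best matching database protein.
    source_species: species your candidate sequence comes from.
    original_query_found: False when the original query from step (1)
        does not appear among the database matches.
    """
    if not original_query_found:
        return "not homologous to the original query: start over"
    if best_identity < 100.0:
        return "novel"
    same_species = (
        best_hit_species.strip().lower() == source_species.strip().lower()
    )
    if same_species:
        return "not novel"
    return "novel"


# 100% identity, but the best match is from a different species.
print(assess_novelty(100.0, "Homo sapiens", "Pan troglodytes"))
```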
Confirm that the novel protein is homologous to your
query
• Generate a multiple sequence alignment with your novel
protein, your original query protein, and a group of other
members of this family.
• A typical number of proteins to use in a multiple sequence
alignment is a minimum of 5 or 10 and a reasonable maximum
is 30.
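Once the multiple sequence alignment is generated, perfectly conserved columns are one quick thing to look for when judging whether the novel protein belongs to the family. The sketch below assumes the alignment is given as equal-length strings with "-" for gaps; the function name and the three-sequence toy alignment are made up for illustration and are far smaller than the 5 to 30 sequences suggested above.

```python
def conserved_columns(aligned_seqs, gap="-"):
    """Return 0-based indices of columns with one identical residue and no gaps."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    conserved = []
    for index, column in enumerate(zip(*aligned_seqs)):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved


# Toy alignment (hypothetical sequences): columns 0 and 3 are invariant.
toy_alignment = ["MVHLT", "MVQLS", "M-HLT"]
print(conserved_columns(toy_alignment))
```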
