Module III
If you know your own DNA sequence, then you know everything about yourself
Unique requirements
Heuristic search
Basic Local Alignment Search Tool (BLAST) and
FASTA
Variants, Statistical significance
Pyrimidines:
It is an organic ring consisting of six atoms: 4 carbon atoms and 2
nitrogen atoms
The nitrogen atoms are placed in the 1 and 3 positions around the ring
Compounds containing a pyrimidine ring include cytosine, thymine, uracil,
thiamine (vitamin B1), uric acid, and barbiturates
Pyrimidines function in DNA and RNA, cell signaling, energy storage (as
residues, as well as
the likelihood of certain residues being substituted among true
homologous sequences
Certain amino acids with similar physicochemical properties can be
more easily substituted than those without similar characteristics
substitutions
Although the two different approaches
coincide to a certain extent,
the first approach has been shown
residues
A zero score means that the frequency of amino acid substitutions
aligned position,
nine of the sequences are F (phenylalanine) and the remaining one
I (isoleucine)
The observed frequency of I being substituted by F is one in ten (0.1)
Dividing this by the frequency expected by random chance gives a ratio of 2
(this implies a chance frequency of 0.05, since 0.1/0.05 = 2)
Taking the logarithm to the base 2 of this ratio gives a log-odds score of 1
This value can then be interpreted as the substitution between the two
residues being 2^1, that is, two times more frequent than expected by
random chance
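The log-odds arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name log_odds is hypothetical, and the chance frequency of 0.05 is an assumption chosen so that the ratio is 2, which is what a log-odds score of 1 implies.

```python
import math

def log_odds(observed_freq, expected_freq, base=2):
    """Return the log-odds score: log of observed over chance-expected frequency."""
    return math.log(observed_freq / expected_freq, base)

observed = 1 / 10  # I is aligned with F in one of ten sequences
expected = 0.05    # assumed chance frequency (not stated in the text above)

score = log_odds(observed, expected)
fold = 2 ** score  # how many times more frequent than chance
print(f"log-odds = {score:.2f}, fold over chance = {fold:.2f}")

# When the observed frequency equals the chance frequency, the score is zero.
print(f"zero-score case = {log_odds(0.05, 0.05):.2f}")
```

A positive score means the substitution is seen more often than chance, and a negative score means it is seen less often.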
February 10, 2025 Bioinformatics 13
Amino Acid Scoring Matrices- PAM
The PAM matrices (also called Dayhoff PAM matrices) were first
constructed by Margaret Dayhoff, who compiled alignments of 71
groups of very closely related protein sequences
PAM stands for “point accepted mutation” (also written “accepted point mutation”, APM)
Because of the use of very closely related homologs,
the observed mutations were not expected to significantly change
sequences
Sequence alignment statistics for more divergent sequences are not
available
To fill in the gap, a new set of substitution matrices has been
developed
This is the series of blocks amino acid substitution matrices
(BLOSUM), all of which are derived based on
direct observation for every possible amino acid substitution in
database
BLAST performs sequence alignment through the following steps
Step 1:
Create a list of words from the query sequence
Each word is typically three residues for protein sequences and eleven
sequence
This step is also called seeding
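The seeding step can be sketched as follows. This is a minimal illustration, not BLAST's actual implementation: the function name make_word_list and the example query are hypothetical, and the default word size of 3 follows the protein setting mentioned above.

```python
def make_word_list(query, word_size=3):
    """Return every overlapping word of length word_size with its start position."""
    return [
        (query[i:i + word_size], i)
        for i in range(len(query) - word_size + 1)
    ]

# Hypothetical protein query used only for illustration.
for word, position in make_word_list("MKVLAAGIV"):
    print(position, word)
```

A query of length n yields n - word_size + 1 overlapping words; a query shorter than the word size yields none.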
Heuristic Database Searching: BLAST
Step 2:
Search a sequence database for the occurrence of these words
words
Step 3:
The matching of the words is scored by a given substitution matrix
Step 4:
Perform pairwise alignment starting from the highest-scoring words by
extending the words in both directions while counting the alignment score
using the same substitution matrix
The extension continues until the score of the alignment drops below a
high-scoring segment
In the original version of BLAST, the highest scored HSPs are
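Steps 2 to 4 can be sketched as a simple seed-and-extend routine. Several simplifications are assumed here and are not taken from the slides: words are matched exactly, the scores (+5 match, -4 mismatch) stand in for a real PAM or BLOSUM lookup, and the drop_off value of 8 is an arbitrary illustrative threshold. All function names are hypothetical.

```python
def score_pair(a, b, match=5, mismatch=-4):
    """Toy substitution score; a real search would look up a PAM or BLOSUM matrix."""
    return match if a == b else mismatch

def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs in subject."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds

def extend_seed(query, subject, qpos, spos, word_size=3, drop_off=8):
    """Extend a seed without gaps in both directions.

    Returns (best_score, query_start, query_end) with query_end exclusive.
    Extension stops once the running score falls more than drop_off below the best.
    """
    best = sum(score_pair(query[qpos + k], subject[spos + k]) for k in range(word_size))
    # Extend to the right of the seed.
    running, right, k = best, 0, word_size
    while qpos + k < len(query) and spos + k < len(subject):
        running += score_pair(query[qpos + k], subject[spos + k])
        if running > best:
            best, right = running, k - word_size + 1
        elif best - running > drop_off:
            break
        k += 1
    # Extend to the left of the seed, starting from the best score so far.
    running, left, k = best, 0, 1
    while qpos - k >= 0 and spos - k >= 0:
        running += score_pair(query[qpos - k], subject[spos - k])
        if running > best:
            best, left = running, k
        elif best - running > drop_off:
            break
        k += 1
    return best, qpos - left, qpos + word_size + right

query, subject = "XXHELLOYY", "ZZZHELLOWW"
for qpos, spos in find_seeds(query, subject):
    score, start, end = extend_seed(query, subject, qpos, spos)
    print(qpos, spos, score, query[start:end])
```

Every seed on the same diagonal extends to the same segment here, so a fuller version would merge or skip seeds that fall inside an already extended segment.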
A recent improvement in the implementation of BLAST is the ability
database
BLASTp uses protein sequences as queries to search against a protein
sequence database
BLASTx uses nucleotide sequences as queries and translates them in all
six reading frames to search against a protein sequence database
tBLASTx uses nucleotide sequences as queries, translated in all six
frames, to search against a nucleotide sequence database that has all the
sequences translated in six frames
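Translation in six frames, as used by the translated BLAST variants, can be sketched as below. This is an illustrative sketch only: it assumes the standard genetic code, marks stop codons with "*", and the function names and the frame labels (+1 to +3, -1 to -3) are hypothetical conventions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna):
    """Return a dict mapping frame label to the translated protein string."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Three frames come from the forward strand and three from its reverse complement, which is why a nucleotide sequence gives six candidate protein sequences.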
Variants of BLAST
Step 3:
The gapped alignment is refined further using the Smith–Waterman
whereas
FASTA identifies identical matching words using the hashing
procedure
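The hashing idea can be sketched with a lookup table of word positions. This is a minimal illustration under stated assumptions rather than FASTA's actual code: the function names are hypothetical, the word length parameter is called ktup with an illustrative default of 2, and the diagonal count at the end is an added convenience for seeing which offset collects the most identical words.

```python
from collections import Counter, defaultdict

def build_word_table(sequence, ktup=2):
    """Hash every word of length ktup to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(sequence) - ktup + 1):
        table[sequence[i:i + ktup]].append(i)
    return table

def identical_word_matches(query, subject, ktup=2):
    """Return (query_pos, subject_pos) for every identical word shared by both."""
    table = build_word_table(query, ktup)
    matches = []
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], []):
            matches.append((i, j))
    return matches

def diagonal_counts(matches):
    """Count matches per diagonal, where diagonal = subject_pos - query_pos."""
    return Counter(j - i for i, j in matches)

matches = identical_word_matches("ACGT", "TACGA")
print(matches)
print(diagonal_counts(matches).most_common(1))
```

Because each word is looked up in the table directly, the subject sequence is scanned only once instead of being compared against every query position.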
Thank You