
Bioinformatics: Lecture 5: Calculating Identities, Similarity and Gap Scores

This document discusses how to calculate sequence identity, similarity, and gaps from a pairwise sequence alignment. It provides an example alignment between retinol-binding protein and β-lactoglobulin. There are 11 identical residues out of 50 aligned positions, giving 22% identity. There are 3 additional similar residues, giving 28% similarity. There are also internal and terminal gaps in the alignment.

Uploaded by

Salix Matt
Copyright
© All Rights Reserved

Bioinformatics

Lecture 5: Calculating Identities, Similarity and Gap Scores
Calculating Identities, Similarity & Gap Scores:
How the computer does it behind the scenes
Recall:
Sequence identity
• Exactly the same nucleotide/amino acid in the same position

Sequence similarity
• Substitutions with similar chemical properties

Sequence homology
• General term that indicates evolutionary relatedness among
sequences
• Sequences are homologous if they are derived from a
common ancestral sequence.
Pairwise alignment of retinol-binding protein (RBP)
and β-lactoglobulin:
Example of an alignment with internal and terminal gaps

1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP
. ||| | . |. . . | : .||||.:| :
1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin

51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP
: | | | | :: | .| . || |: || |.
45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 lactoglobulin

98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136 RBP
|| ||. | :.|||| | . .|
94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135 lactoglobulin

137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185 RBP
. | | | : || . | || |
136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI....... 178 lactoglobulin
• Pairwise alignment of human RBP and bovine
β-lactoglobulin:
• Note that the alignment is global (i.e., the
entire lengths of both proteins are compared).
• Identity between the two sequences is
indicated with bars ( | ).
• There are five different kinds of dots in this
alignment:
• (1) The paired dots ( : ) between aligned residues
indicate similarity (e.g., on the top line R and K
are joined by two dots and share similar
physicochemical properties) (arrow 1).
• (2) Single dots ( . ) between aligned residues
(arrow 2) also indicate similarity, but less than
for paired dots.
• (3, 4) The alignment contains both internal
gaps, indicated by dots in place of alphabetic
characters along the sequence (arrow 3), and
gaps at the amino and carboxy termini of
β-lactoglobulin (arrow 4).
• (5) A dot is placed above the sequences to
mark every 10 residues (arrow 5).
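The internal and terminal gaps described above can be counted with a short script. This is a minimal sketch, assuming the four alignment blocks are joined into one row per protein and that `.` is the only gap symbol. The names `RBP`, `BLG`, and `gap_summary` are illustrative, and dividing by the number of alignment columns is one possible convention for a gap percentage, not one stated in the lecture.

```python
import re

# Aligned rows copied from the RBP / beta-lactoglobulin alignment,
# joined block by block; '.' marks a gap position.
RBP = (
    "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
    "LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE"
    "DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC"
    "RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV"
)
BLG = (
    "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."
    "ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK"
    "IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC"
    "QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI......."
)


def gap_summary(row, gap="."):
    """Return (terminal_positions, internal_positions, internal_runs)."""
    core = row.strip(gap)                      # drop leading/trailing gaps
    terminal = len(row) - len(core)
    runs = re.findall(re.escape(gap) + "+", core)
    internal = sum(len(run) for run in runs)
    return terminal, internal, len(runs)


rbp_gaps = gap_summary(RBP)
blg_gaps = gap_summary(BLG)

# Assumed convention: gap positions in either row / alignment columns.
total_gap_positions = sum(rbp_gaps[:2]) + sum(blg_gaps[:2])
percent_gaps = 100 * total_gap_positions / len(RBP)

print("RBP (terminal, internal, runs):", rbp_gaps)
print("BLG (terminal, internal, runs):", blg_gaps)
print(f"{total_gap_positions} gap positions in {len(RBP)} columns "
      f"= {percent_gaps:.1f}%")
```

For this alignment the sketch finds no terminal gaps in RBP, 15 internal gap positions in RBP, and 10 terminal plus 12 internal gap positions in β-lactoglobulin, which matches the observation that only β-lactoglobulin has gaps at its termini.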
• Notice that along the top row the residues
GTWY are all identical between the two
proteins.
• We can count the number of identical
residues; in this case, the two proteins share
23% identity (43 residues/185 aligned
residues).

• Identity is the extent to which two amino acid
(or nucleotide) sequences are invariant.
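Counting identical residues can be sketched in a few lines. This is a minimal illustration, assuming `.` marks a gap and that the four alignment blocks are joined into one row per protein. The names `RBP`, `BLG`, and `count_identities` are illustrative. The divisor of 185 quoted on the slide equals the number of RBP residues in the alignment, so the sketch divides by that; treating it as the intended denominator is an assumption, and other conventions (alignment columns, length of the shorter sequence) give different percentages.

```python
# Aligned rows copied from the RBP / beta-lactoglobulin alignment,
# joined block by block; '.' marks a gap position.
RBP = (
    "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
    "LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE"
    "DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC"
    "RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV"
)
BLG = (
    "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."
    "ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK"
    "IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC"
    "QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI......."
)
GAP = "."


def count_identities(a, b, gap=GAP):
    """Count columns where both rows hold the same residue (gaps never match)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


identical = count_identities(RBP, BLG)
rbp_residues = len(RBP.replace(GAP, ""))   # assumed denominator (see above)
percent_identity = 100 * identical / rbp_residues

print(f"{identical}/{rbp_residues} = {percent_identity:.0f}% identity")
```

Run on the alignment above, this reproduces the 43 identical residues and the 23% figure quoted in the lecture.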
• Some of the aligned residues are similar but not
identical.
• They are related to each other because they
share similar biochemical properties.
• Similar pairs of residues are structurally or
functionally related.
• For example, on the first row of the alignment we
can find arginine and lysine (R and K connected
by two dots, : ); we can also see an aspartate
and a glutamate residue that are aligned.
• These are conservative substitutions. Amino
acids with similar properties include:
• basic amino acids (K, R, H),
• acidic amino acids (D, E),
• hydroxylated amino acids (S, T),
• and hydrophobic amino acids (W, F, Y, L, I, V,
M, A).
• The percent similarity of two protein
sequences is the sum of both identical and
similar matches.
• On the top part of Figure 3.5, there are 50
aligned amino acid residues, of which 11 are
identical and 3 are similar.
• The percent identity is 22% (11/50) and the
percent similarity is 28% (14/50).
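The arithmetic for the top row can be checked as follows. This is a minimal sketch: the identities are counted directly from the first row of the alignment, while the count of 3 similar residues is taken from the slide rather than computed. The variable names are illustrative.

```python
# First row of the RBP / beta-lactoglobulin alignment; '.' marks a gap.
ROW_RBP = "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
ROW_BLG = "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."

columns = len(ROW_RBP)

# Identical columns: same letter in both rows, and not a gap.
identical = sum(1 for a, b in zip(ROW_RBP, ROW_BLG) if a == b and a != ".")

similar = 3  # value quoted in the lecture, not derived here

percent_identity = 100 * identical / columns
percent_similarity = 100 * (identical + similar) / columns

print(f"identity:   {identical}/{columns} = {percent_identity:.0f}%")
print(f"similarity: {identical + similar}/{columns} = {percent_similarity:.0f}%")
```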
• It is more useful to consider the identity shared
by two protein sequences, rather than the
similarity, since a similarity measure may be
based on a variety of definitions of how
related (similar) two amino acid residues are
to each other.
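How much the similarity figure depends on the definition can be seen by applying a different rule to the same row. The sketch below is an illustration only: it assumes two residues are "similar" whenever they differ but fall in the same one of the four groups listed earlier (basic, acidic, hydroxylated, hydrophobic). This is not the rule that produced the dots in the alignment, and the names `GROUPS` and `is_similar` are hypothetical.

```python
# First row of the RBP / beta-lactoglobulin alignment; '.' marks a gap.
ROW_RBP = "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
ROW_BLG = "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."

# Assumed similarity rule: the four property groups from the lecture.
GROUPS = [
    set("KRH"),        # basic
    set("DE"),         # acidic
    set("ST"),         # hydroxylated
    set("WFYLIVMA"),   # hydrophobic
]


def is_similar(a, b):
    """True for two different residues that fall in the same group."""
    return a != b and any(a in group and b in group for group in GROUPS)


pairs = list(zip(ROW_RBP, ROW_BLG))
identical = sum(1 for a, b in pairs if a == b and a != ".")
similar = sum(1 for a, b in pairs if is_similar(a, b))
percent_similarity = 100 * (identical + similar) / len(pairs)

print(f"identical: {identical}, similar under group rule: {similar}")
print(f"similarity: {percent_similarity:.0f}%")
```

Under this group rule the row has 10 similar pairs rather than 3, so the similarity rises from 28% to 42%, while the identity stays at 22% whatever definition of similarity is chosen.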
• Pairwise alignment is the process of lining up
two sequences to achieve maximal levels of
identity (and maximal levels of conservation in
the case of amino acid alignments).
• The purpose of a pairwise alignment is to
assess the degree of similarity and the
possibility of homology between two
molecules.
• We may say that two proteins share 22% amino
acid identity or (as in the alignment above) that
they share 28% similarity.
• If the amount of sequence identity is
significant, then the two sequences are
probably homologous.
• It is not correct to say that two proteins share a
certain percent homology; they are either
homologous or not.
• The strongest evidence for whether two
proteins are homologous comes from
structural studies in combination with
evolutionary analyses.
Terms of Sequence Comparison
Sequence identity
• Exactly the same nucleotide/amino acid in the same position

Sequence similarity
• Substitutions with similar chemical properties

Sequence homology
• General term that indicates evolutionary relatedness among sequences
• Sequences are homologous if they are derived from a common ancestral
sequence.
Conservation
• Changes at a specific position of an amino acid (or, less commonly,
DNA) sequence that preserve the physicochemical properties of the
original residue.
Homework: From Second Edition

• Letters in between the aligned sequences indicate matches (identity)

• A + sign indicates similarity

Calculate percent identity and percent similarity.
How many gaps are there?
Calculate the percentage of gaps.
End here
