Lecture 4
-----
Multiple Sequence Alignment
Course Provider: PhD Tam Tran
Department of Life Sciences (LS) – USTH
• Introduction
• Applications: Why do we do multiple sequence alignments?
• Multiple-sequence-alignment methods
  - ClustalW
  - MUSCLE
From pairwise to multiple alignment
Pairwise (for 2 sequences):
F G K G K G
F G K F G K G
MSA (for more than 2 sequences):
F G K G K G
F G K F G K G
- G K Q G K G
- - K F G K G
Why do we do multiple sequence alignments?
Multiple Alignment Methods
ClustalW
ClustalW = multiple alignment tool
One of the most commonly used programs for making multiple sequence alignments
‘W’ stands for ‘weighted’ (sequences are weighted differently).
BLAST: pairwise comparison
Clustal: global alignment, "all against all"
ClustalW - Progressive Alignment
Pairwise matrix:
    A   B   C   D
A   -   -   -   -
B   1   -   -   -
C   7   8   -   -
D  11   5   2   -
[Figure: tree with leaves A, B, C, D]
See Thompson et al. (1994) for an explanation of the three
stages of progressive alignment implemented in ClustalW
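To make the first stages of progressive alignment concrete, here is a minimal sketch of how a pairwise matrix determines the order in which sequences are joined. It assumes the values in the A-D matrix above are distances (smaller = more similar), and it uses simple average-linkage (UPGMA-style) clustering for illustration. ClustalW itself builds its guide tree with neighbor joining, so this is a simplification. The function name join_order is hypothetical.

```python
from itertools import combinations

# Values from the A-D matrix above, ASSUMED here to be pairwise distances.
dist = {
    frozenset("AB"): 1, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("AD"): 11, frozenset("BD"): 5, frozenset("CD"): 2,
}


def join_order(dist, leaves):
    """Return the merges (cluster1, cluster2, distance) under average linkage."""
    clusters = [(leaf,) for leaf in leaves]
    merges = []

    def avg(c1, c2):
        # Mean distance over all leaf pairs drawn from the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Join the two closest clusters first, as a guide tree would dictate.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


for c1, c2, d in join_order(dist, "ABCD"):
    print("join", "".join(c1), "+", "".join(c2), "at distance", d)
```

Under these assumptions the closest pair is aligned first, and the resulting groups are then aligned to each other, which is the idea behind progressive alignment.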
MUSCLE
EXERCISE BREAK
Exercise 1: Perform multiple sequence alignments
1. Download amino acid sequences for accession numbers P20472, P80079, P02626,
P02619, P43305, P32930, Q91482, P02620, P02622, P02586 from the NCBI protein
database.
4. What do you think the symbols "*", ":" and "." under the alignment mean?
5. How many stretches of perfectly conserved sequence (of at least, say, 10 amino
acids) can you find? Write down the sequence(s) of the perfectly conserved
stretch(es).
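For long alignments, scanning for conserved stretches by eye is error-prone. The sketch below shows one way to do it programmatically, assuming the aligned sequences are already available as equal-length strings. The function name conserved_stretches and the toy alignment are hypothetical and are not the exercise data.

```python
def conserved_stretches(aligned, min_len=10):
    """Return (start, end, sequence) for runs of identical, gap-free columns.

    Positions are 0-based and 'end' is exclusive.
    """
    n_cols = len(aligned[0])
    assert all(len(s) == n_cols for s in aligned), "rows must have equal length"
    runs, start = [], None
    # Iterate one step past the last column so a trailing run gets closed.
    for i in range(n_cols + 1):
        column = {s[i] for s in aligned} if i < n_cols else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i, aligned[0][start:i]))
            start = None
    return runs


# Hypothetical toy alignment, for illustration only.
toy = [
    "MKT-AGCCFRSCDLLV",
    "MRTQAGCCFRSCDLIV",
    "MKTQAGCCFRSCDLLV",
]
print(conserved_stretches(toy, min_len=5))
```

With real data, the rows would come from the alignment file produced by the alignment program, read in the same order for every sequence.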
Interpreting Multiple Sequence Alignment
How can you tell whether a block is good?
Important amino acids (or nucleotides) are not allowed to mutate
active site
Editing and Analyzing Multiple Sequence Alignments
BioEdit: https://bioedit.software.informer.com/7.2/
Jalview: https://www.jalview.org/
MEGA: https://www.megasoftware.net/
Sequence logo
IGF1B_HUMAN  APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR
IGF1_PIG     APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR
IGF1_CANFA   APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR
IGF2_HORSE   -RSRGIVEECCFRSCDLALLETYCATPAKSERDV
INS_CHIBR    -----IVDQCCTSICTLYQLENYCN---------
INS_ORNAN    -----IVEECCKGVCSMYQLENYCN---------
INS_AOTTR    MQKRGVVDQCCTSICSLYQLQNYCN---------
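A sequence logo draws each alignment column as a stack whose total height is the column's information content in bits. As a rough sketch of that calculation for the alignment above, the code below computes log2(20) minus the Shannon entropy of each column. It makes two simplifying assumptions: gaps are ignored, and no small-sample correction is applied, so values from real logo tools may differ. The function name column_information is hypothetical.

```python
import math
from collections import Counter

alignment = {
    "IGF1B_HUMAN": "APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR",
    "IGF1_PIG":    "APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR",
    "IGF1_CANFA":  "APQTGIVDECCFRSCDLRRLEMYCAPLKPAKSAR",
    "IGF2_HORSE":  "-RSRGIVEECCFRSCDLALLETYCATPAKSERDV",
    "INS_CHIBR":   "-----IVDQCCTSICTLYQLENYCN---------",
    "INS_ORNAN":   "-----IVEECCKGVCSMYQLENYCN---------",
    "INS_AOTTR":   "MQKRGVVDQCCTSICSLYQLQNYCN---------",
}


def column_information(seqs, alphabet_size=20):
    """Information content in bits for each alignment column.

    Simplifications (assumptions): gap characters are ignored and no
    small-sample correction is applied.
    """
    info = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            info.append(0.0)
            continue
        n = len(residues)
        entropy = -sum((k / n) * math.log2(k / n)
                       for k in Counter(residues).values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


info = column_information(list(alignment.values()))
for position, bits in enumerate(info, start=1):
    print(position, round(bits, 2))
```

Columns where every sequence carries the same residue reach the maximum of log2(20) bits, and variable columns score lower. This is why conserved positions appear as tall letters in a logo.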
HOMEWORK - DAY 4
1. Get 10 homologous proteins with identity <80% from the BLAST results of Homework Day 3.
- Provide the list of protein sequences with their names and E-values, but NOT in full FASTA format.