Lecture 3

The document outlines homework assignments related to bioinformatics, focusing on gene structure prediction, sequence alignment, and database searching using tools like BLAST. It includes exercises on identifying sequences and analyzing results from BLAST searches, as well as discussions on scoring parameters and alignment algorithms. The document serves as a guide for students in a bioinformatics course to complete their assignments effectively.

HOMEWORK Day 2

Extra point for:

- Revising your Homework 2 from Day 1.

Homework 2 Day 1
- Extract the FASTA sequence of the genomic region of your genes (from
Homework Day 1) and predict the gene structure of these DNA sequences using
one gene prediction program. Summarize the exons and introns from your
prediction, and write your observations and conclusion.

- Find transcript information about a specific gene using NCBI and Ensembl,
and compare it with the prediction from your bioinformatics program.

- Explore the genomic information of your genes (from Homework Day 1) using
Ensembl (see exercise 2 for details).

- Between Ensembl and NCBI, which one would you prefer when searching for
information on human genes? Why?

Homework Day 2
Which ORF is a real gene?

Homologous sequences in databank

Is my ORF similar to an already known protein ?

Bioinformatics
-----
Sequence Alignment and Database Search
Course Provider: PhD Tam Tran
Department of Life Sciences (LS) – USTH
Email: [email protected]

Master in Medical biotechnology - Plant biotechnology – Pharmacology – Year 2

Outline

• Database searching

• Sequence Alignment Algorithms

• Local Alignment with BLAST

Database searching

We compare our DNA/protein sequences to all known sequences in the database to find
any matches.

One sequence (query) → lots of sequences (GenBank, SwissProt, NR, DDBJ)

Why do we compare sequences?

Human vs. Chimpanzee

A similarity between 2 sequences may indicate:

• a common biological function
• a similar 3D structure
• a common evolutionary origin → homology

Sequence Alignment Algorithms

What is an algorithm?
An algorithm is a step-by-step procedure to solve a problem

Input Algorithm Output

Algorithm for Pairwise Sequence Alignment

E.g.: Compare two sequences, ATGCATGC and TGCATGCA

ATGCATGC                  ATGCATGC-
TGCATGCA                  -TGCATGCA
no matching positions     seven matching positions
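
Counting the matching positions of the two layouts above can be checked mechanically. The sketch below is only an illustration; the function name `count_matches` is an assumption introduced here, and a gap (`-`) is never counted as a match.

```python
def count_matches(row1: str, row2: str) -> int:
    """Count alignment columns in which both rows carry the same residue."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    # A column with a gap character is never a match.
    return sum(a == b and a != "-" for a, b in zip(row1, row2))


# Unshifted layout, then the layout shifted by one position.
print(count_matches("ATGCATGC", "TGCATGCA"))
print(count_matches("ATGCATGC-", "-TGCATGCA"))
```

Shifting one sequence by a single position is enough to change the count completely, which is why the placement of gaps matters.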

Key questions for alignment of two sequences:

Q1: What do we want to align?

Q2: How do we “score” an alignment?

Q3: How do we find the “best” alignment?

Q1: What Do We Want to Align?

• Global alignment: find best match of both sequences in their entirety
• Local alignment: find best subsequence match
• Semi-global alignment: find best match without penalizing gaps on the
start or end of the alignment

Q2: How Do We Score Alignments?

S1: TACG---A--TTCAGATACG
     |||   |  ||||
S2: AACGCTAACGTTCAATCGTC

Score(alignment) = total cost of editing S1 into S2

• Cost of substitution (-1)
• Cost of insertion / deletion (-1)
• Reward of match (+2)
• Score for gaps (-1)

Substitution    Insertion    Deletion
AATAAGC         AAT-AAGC     AATAAGC
AATTAAGC        AATTAAGC     AA-AAGC

We would score it by:

s(T,A) + m(A,A) + m(C,C) + m(G,G) + 3g + m(A,A) + 2g + …

With 8 matches, 7 substitutions and 5 gap positions:
(8 x 2) + (7 x -1) + (5 x -1) = 4

→ Therefore the score should be: 4
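
The column-by-column sum can be reproduced with a few lines of Python. This is a minimal sketch using the slide's parameters (match +2, substitution -1, gap -1); the function name `score_alignment` is an assumption made for illustration.

```python
def score_alignment(row1: str, row2: str, match: int = 2,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Sum the per-column scores of an alignment given as two gapped rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap        # insertion or deletion
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


s1 = "TACG---A--TTCAGATACG"
s2 = "AACGCTAACGTTCAATCGTC"
print(score_alignment(s1, s2))
```

Changing the three parameters changes which alignment scores best, so they should always be reported together with the score.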

Q3: How Do We Find the Best Alignment?

• Simple approach: compute and score all possible alignments

E.g. How many possible global alignments are there for 2 sequences of length 3 (TAT and TCT)?

• But: too many possible alignments
The number of possible global alignments for 2 sequences of length n grows exponentially with n

e.g. two sequences of length 100 have ~10^77 possible alignments

We need a smart algorithm
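
To get a feeling for how fast the number of alignments grows, the sketch below counts them under one common convention, which is an assumption here and not stated on the slide: every alignment column is either a pair of residues, a gap in the first sequence, or a gap in the second sequence. Other conventions give different totals, so the figure quoted on the slide may rest on a different way of counting. The function name `count_alignments` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of global alignments of sequences of lengths m and n.

    Assumed convention: the last column is a residue pair, a gap in
    sequence 1, or a gap in sequence 2.
    """
    if m == 0 or n == 0:
        return 1  # only gaps remain, so there is exactly one way
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))


for length in (1, 2, 3, 10, 100):
    total = count_alignments(length, length)
    print(length, "->", len(str(total)), "digit(s)")
```

Whatever the exact convention, the count explodes long before the sequences reach a realistic length, which motivates dynamic programming.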

Dot plot

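• Maybe a dot plot will help

[Figure: dot plot of Sequence 1 (vertical axis, top to bottom: G, A, T, A, C, T) against the Query (horizontal axis: A, C, A, T, A, G)]

[Figure: The dot plot of the alignment of two different contigs to a reference sequence.]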
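
A dot plot is easy to compute: put one sequence on each axis and mark every cell where the two residues are identical. The sketch below is illustrative only; the function names and the two short example sequences are assumptions, not taken from the figure.

```python
def dot_plot(vertical: str, horizontal: str) -> list:
    """Return a grid with '*' wherever the two residues are identical."""
    return [["*" if v == h else "." for h in horizontal] for v in vertical]


def render(vertical: str, horizontal: str) -> str:
    """Format the grid with the sequences written along both axes."""
    lines = ["  " + " ".join(horizontal)]
    for residue, row in zip(vertical, dot_plot(vertical, horizontal)):
        lines.append(residue + " " + " ".join(row))
    return "\n".join(lines)


# Two illustrative sequences that differ only in their first residue.
print(render("TCATAG", "ACATAG"))
```

Runs of marks along a diagonal correspond to stretches where the two sequences agree; breaks or shifts of the diagonal point to mismatches or gaps.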

Needleman-Wunsch algorithm
Key insight: matrix representation of alignments

• Build a matrix
• S(i,j) = score of the best alignment of S1[1..i] and S2[1..j]
• Systematically fill in the matrix and compute the optimal score in S(i, j)
• Trace back from the optimal score, and find the alignment solution

Goal: find the best path through the matrix

(Needleman & Wunsch, Journal of Molecular Biology, 1970)
Scoring parameters

Parameters so far:
• Match/mismatch
• Insertion/deletion:
  - Gap opening
  - Gap extending

Example with match = +1, gap opening = -10 and gap extension = -1 per gap position:

One gap opening, six gap extensions:
CGATGCAGCAGCAGCATCG
||||||      |||||||
CGATGC------AGCATCG
(13 x 1) - 10 - (6 x 1) = -3

Five gap openings, six gap extensions:
CGATGCAGCAGCAGCATCG
|| || |||| ||  || |
CG-TG-AGCA-CA--AT-G
(13 x 1) - (5 x 10) - (6 x 1) = -43

The first alignment (single evolutionary event) is more likely than the second one (5 evolutionary events).
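
The two totals can be recomputed with a small scoring function that charges an opening penalty once per gap run and an extension penalty for every gap position. This is a sketch under the parameters read off the slide's arithmetic (match +1, opening 10, extension 1); the mismatch value is a placeholder that is never used in these two alignments, and the function name is an assumption.

```python
def score_affine(row1: str, row2: str, match: int = 1, mismatch: int = -1,
                 gap_open: int = 10, gap_extend: int = 1) -> int:
    """Score a gapped alignment with separate opening and extension costs."""
    score = 0
    in_gap = False
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            if not in_gap:
                score -= gap_open   # charged once at the start of a gap run
            score -= gap_extend     # charged for every gap position
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


top = "CGATGCAGCAGCAGCATCG"
print(score_affine(top, "CGATGC------AGCATCG"))
print(score_affine(top, "CG-TG-AGCA-CA--AT-G"))
```

Both alignments contain the same number of matches and gap positions; only the number of separate gap runs differs, and the opening penalty is what separates them.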
DNA substitutions

• DNA:
  - Purines (A, G) – dual ring
  - Pyrimidines (C, T) – single ring

• Transitions: substitutions within the same type (purine ↔ purine or pyrimidine ↔ pyrimidine)

• Transversions: exchanging one type for the other (purine ↔ pyrimidine)

• Transitions occur more frequently than transversions, so we can score them higher in the scoring matrix
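
A nucleotide scoring function that distinguishes the two kinds of substitution might look like the sketch below. The three score values are hypothetical, chosen only so that a transition is penalized less than a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a: str, b: str, match: int = 2,
                       transition: int = -1, transversion: int = -2) -> int:
    """Score one aligned pair of nucleotides (hypothetical values)."""
    if a == b:
        return match
    # A transition stays within one chemical class.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


for pair in ("AA", "AG", "CT", "AC", "GT"):
    print(pair, substitution_score(pair[0], pair[1]))
```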

Amino acid substitution matrices

Scoring matrices reflect:

• Probabilities of mutual substitutions
• Probability of occurrence of each amino acid

Two most commonly used models:

• BLOSUM
• PAM

BLOSUM 62 (BLOcks SUbstitution Matrix)

Cluster proteins with identity greater than 62%

Estimate the frequency with which a is replaced by b

S(a, b) = log (Pab / (Pa x Pb))

BLOSUM62 is the most frequently used scoring matrix for protein similarity alignment and is the default for NCBI BLASTP.
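
The log-odds formula compares how often a pair is observed (Pab) with how often it would be expected by chance (Pa x Pb). The sketch below evaluates it for made-up frequencies; the numbers and the choice of base 2 are assumptions for illustration, not values from any published matrix.

```python
import math


def log_odds(p_ab: float, p_a: float, p_b: float, base: float = 2.0) -> float:
    """Log-odds score: observed pair frequency over chance expectation."""
    return math.log(p_ab / (p_a * p_b), base)


# Hypothetical frequencies: each residue occurs with probability 0.1.
print(log_odds(0.02, 0.1, 0.1))    # pair seen twice as often as chance
print(log_odds(0.005, 0.1, 0.1))   # pair seen half as often as chance
```

A positive score means the substitution is seen more often than chance would predict, a negative score means it is seen less often.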
Scoring parameters

• Match/mismatch

• Gap opening

• Gap extending

• Substitution matrix

BLAST
(Basic Local Alignment Search Tool)

NCBI BLAST

• The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.

• The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

• BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

https://blast.ncbi.nlm.nih.gov/Blast.cgi

How BLAST works

1. The query sequence is broken into "words" (subsequences of the query sequence).

2. Query words are compared to the database (target sequences) and exact matches are identified.

3. For each word match, the alignment is extended in both directions to find alignments that score greater than some threshold (maximal segment pairs, or MSPs).

(Schneider and La Rota 2000)
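
The three steps can be imitated with a toy "seed and extend" search. This is a deliberately simplified sketch and not how the real program is implemented: it extends a seed only while residues are identical and uses the match count as the score, whereas BLAST itself is considerably more elaborate. The function name, word size and threshold are all assumptions.

```python
def seed_and_extend(query: str, target: str, word_size: int = 4,
                    threshold: int = 6) -> list:
    """Toy search: exact word hits, extended without gaps or mismatches.

    Returns (query_start, target_start, length, score) tuples.
    """
    # Step 1: index every word of the target by its start positions.
    index = {}
    for t in range(len(target) - word_size + 1):
        index.setdefault(target[t:t + word_size], []).append(t)

    hits = set()
    for q in range(len(query) - word_size + 1):
        # Step 2: look up exact matches of the query word.
        for t in index.get(query[q:q + word_size], []):
            # Step 3: extend the seed to the left and to the right.
            qs, ts = q, t
            while qs > 0 and ts > 0 and query[qs - 1] == target[ts - 1]:
                qs, ts = qs - 1, ts - 1
            qe, te = q + word_size, t + word_size
            while qe < len(query) and te < len(target) and query[qe] == target[te]:
                qe, te = qe + 1, te + 1
            score = qe - qs  # one point per identical residue
            if score >= threshold:
                hits.add((qs, ts, qe - qs, score))
    return sorted(hits)


print(seed_and_extend("AAACGTACGTTT", "CCCACGTACGGG"))
```

Because only exact word hits are examined, most of the database is never aligned at all, which is where the speed of this kind of heuristic comes from.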

BLAST @NCBI: easy !!

1. query (your sequence)

2. search space
(what you want to compare with)

3. submit !
Several versions of BLAST

Program    Query sequence                     Database
blastn     nucleic                            nucleic
blastp     protein                            protein
blastx     nucleic → translated to protein    protein
tblastn    protein                            nucleic → translated to protein
tblastx    nucleic → translated to protein    nucleic → translated to protein
EXERCISE BREAK
Exercise 1: Identify sequences with BLAST
>Unknown_sequence 1
AAATGAGTTAATAGAATCTTTACAAATAAGAATATACACTTCTGCTTAGGATGATAATTGGAGGCAAGTGAAT
CCTGAGCGTGATTTGATAATGACCTAATAATGATGGGTTTTATTTCCAGACTTCACTTCTAATGGTGATTATG
GGAGAACTGGAGCCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGA
TTATGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCAT
CAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGCATATGAACCCTTCACACT
ACCCAAATTATATATTTGGCTCCATATTCAATCGGTTAGTCTACATATATTTATGTTTCCTCTATGGGTAAGC
TACTGTGAATGGATCAATTAATAAAACACATGACCTATGCTTTAAGAAGCTTGCAAACACATGAA

1. Select the appropriate type of BLAST for this unknown sequence.

2. Report the "Algorithm parameters" of your search.

3. What does BLAST tell you about this sequence?

BLAST results

1. summary of the query

2. graphical summary of results

3. detailed summary of results

4. alignments
BLAST results

1. Summary of the query

• Which sequence was submitted?
• Which database is queried?
• Which program is used?

BLAST results

2. Graphical summary

• The red bar represents the submitted sequence (length: 253 aa).
• Each colored line represents a local alignment between the query sequence and a sequence from the selected database = HSP ("high scoring pair").
  - color → score
  - length → size of the alignment

BLAST results

3. List of alignments

Columns: identifier, description, score, E-value

Each line in this summary corresponds to a colored line in the graphics.
How to read a BLAST alignment ?

name, identifier, description and length of the similar sequence in the database

Query = user submitted sequence

Subject = similar sequence found in the database
How to read a BLAST alignment ?

3 lines in the alignment:

• top line: query sequence
• bottom line: subject sequence
• middle line: the amino acid (AA) if conserved between both sequences; "+" if the score of the substitution is positive
How to read a BLAST alignment ?

Reported for each alignment:

• Score (raw score)
• E-value
• Number of matches (%)
• Number of positive substitutions (%)
• Number of gaps (%)
Interpreting the E-value
What is the likelihood that this alignment represents a true homology between both sequences?

• E-value > 1: no homology
• E-value between 1 and 10^-10: maybe ("twilight zone")
• E-value < 10^-10: homology

False positives: both sequences are aligned, but there is no homology. Example:

Query: "British forces at a gaelic football match during the war of …"
Items of interest in BLAST

• Max[imum] Score: the highest alignment score, calculated from the sum of the rewards for matched nucleotides or amino acids and the penalties for mismatches and gaps.
• Tot[al] Score: the sum of alignment scores of all segments from the same subject sequence.
• Query Cover[age]: the percent of the query length that is included in the aligned segments.
• E[xpect] Value: the number of alignments expected by chance with the calculated score or better. The expect value is the default sorting metric; for significant alignments the E value should be very close to zero.
• Ident[ity]: the highest percent identity for a set of aligned segments to the same subject sequence.

BLAST must-know

• BLAST = very fast tool for local alignments, used to query a sequence database with a query sequence
• Output: list of HSPs (high scoring pairs = alignments) with
  - % identity, % positives, % gaps
  - score, E-value
• E-value = statistical measure: how many HSPs with a comparable score would we expect to find by chance?
• E-value decreases when score increases: the smaller, the better!
  E-value < 1e-10: very likely homology.
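
The slides do not give a formula for the E-value. One standard relation, assumed here for illustration, links it to the bit score S' and to the sizes of the search: E = m x n x 2^(-S'), where m is the query length and n the total database length. The database size in the sketch is a made-up number, and the function name is hypothetical.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score.

    Assumes the relation E = m * n * 2 ** (-bit_score).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 253          # length of the example query shown earlier
DB_LEN = 1_000_000_000   # hypothetical database size in residues

for bits in (30, 50, 70, 90):
    print(bits, expect_value(bits, QUERY_LEN, DB_LEN))
```

Under this relation every additional bit of score halves the E-value, and the same score becomes less significant as the database grows.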
EXERCISE BREAK
Exercise 2: Carrying out a BLAST search of an unknown protein
>unknown_protein2
VVKSSGVRQPFDKEKIYKVLKWACDGHNIDVRAFLENVLELIRDGMTTKQIQRIAAIKYA
ADHISVKEPDWQYVASNLEMFALRKDVYGQFDPIPFYDHIVKMVEAGKYDKEILEKYSKQ
DIQVFERAIDHDKDFEFSYAGSQQLIGKYLVQDRDTGEIFETPQYAFMLIAMCLHQEETG
AQVTHIVDFYNAISDRKLSLPTPIMAGVRTPTRQFSSCVVIESGDSLGSLNAVTSAIKVY
ISQRAGIGVNAGHIRAMGSKIRGGEAVHTGVIPFWKIQTAVKSCSQGGVRGGAATLYYPF
WHLEVENLLVLKNNKGVEENRVRHLDYGVQLNQLMYKRLMNRDYITLFSPDVANDRLYDL

1. Which type of BLAST search should you use for this sequence?

2. Report the "Algorithm parameters" of your search.
3. Do you find any sequences that look like your input sequence?
4. What is the typical length of the hits (the alignment length)?
5. What is the typical % identity?
6. What is the range of the E-values?

HOMEWORK - DAY 3
1. Report the translated protein sequence, in FASTA format, of one of your three genes
from the gene list (Homework Day 1).

2. Use BLAST to find proteins similar to this protein in mouse, fly, and C. elegans.
a. Report the "Algorithm parameters" of your search.
b. Do you find any significant hits? How many are there? What is the % identity? What is the range of the E-values?
c. Do all the best hits belong to the same category of function?
d. Report the best hit for your sequence:
- What is the identifier (Accession)?
- What is the alignment score ("max score")?
- What is the percent identity and query coverage?
- What is the E-value?
- Are there any gaps in the alignment?

e. Give a summary of your BLAST search for this protein.

DEADLINE: 10am Thursday 22nd 2021

END

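The NW algorithm calculates the scoring matrix for the alignment between two sequences

Each cell S(i, j) is computed from its three neighbours:

S(i-1, j-1)    S(i-1, j)

S(i, j-1)      S(i, j)

Example

Initialization:
S(1, 1) = 0
First row: S(1, j) = S(1, j-1) + g
First column: S(i, 1) = S(i-1, 1) + g

Gap penalty = -1
Match = +2
Mismatch = -1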

        -     A
  _     0    -1
  A    -1     2

S(i, j) = Max of:
  S(i-1, j-1) + M(i,j) = 0 + 2 = 2
  S(i, j-1) + g = -1 + (-1) = -2
  S(i-1, j) + g = -1 + (-1) = -2

→ S(i, j) = 2

Gap penalty = -1
Match = +2
Mismatch = -1
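
The three candidate values in the worked cell are instances of a single recurrence. Written out in LaTeX, using the slides' symbols (g for the gap penalty, M for the match or mismatch score of the two residues) and the slides' indexing in which the top-left cell is S(1,1):

```latex
\begin{aligned}
S_{1,1} &= 0, \qquad S_{1,j} = S_{1,j-1} + g, \qquad S_{i,1} = S_{i-1,1} + g, \\
S_{i,j} &= \max
\begin{cases}
S_{i-1,j-1} + M_{i,j} & \text{(diag: align the two residues)} \\
S_{i,j-1} + g         & \text{(left: gap in the left sequence)} \\
S_{i-1,j} + g         & \text{(up: gap in the top sequence)}
\end{cases}
\end{aligned}
```

Here M is +2 when the two residues are identical and -1 otherwise, and g = -1, as in the example.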

        -     A     G
  _     0    -1    -2
  A    -1     2     1

S(i, j) = Max of:
  S(i-1, j-1) + M(i,j) = -1 + (-1) = -2
  S(i, j-1) + g = 2 + (-1) = 1
  S(i-1, j) + g = -2 + (-1) = -3

→ S(i, j) = 1

Gap penalty = -1
Match = +2
Mismatch = -1

The traceback path
The traceback is performed on the completed scoring matrix:

        -     A     G     C     A
  _     0    -1    -2    -3    -4
  A    -1     2     1     0    -1
  C    -2     1     1     3     2
  A    -3     0     0     2     5
  A    -4    -1    -1     1     4   ← traceback starts here

The traceback path

Top sequence: S1 (A G C A); left sequence: S2 (A C A A)

• diag – the letters from the two sequences are aligned
• left – a gap is introduced in the left sequence (S2)
• up – a gap is introduced in the top sequence (S1)

[Figure: traceback arrows drawn on the matrix, starting from the bottom-right cell and ending at the top-left cell ("done")]

S1: AGCA_

S2: A_CAA
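
The fill and traceback steps of the worked example can be put together in a short program. This is a minimal sketch with the example's parameters (match +2, mismatch -1, gap -1), not a reference implementation; the function name is an assumption, and it uses 0-based indices, so the slides' S(1,1) is `S[0][0]` here. When several moves give the same score, the traceback below prefers the diagonal, so it may return a different alignment with the same optimal score as the one on the slide.

```python
def needleman_wunsch(top: str, left: str, match: int = 2,
                     mismatch: int = -1, gap: int = -1):
    """Fill the global alignment matrix and trace back one optimal path."""
    rows, cols = len(left) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):            # first row: only gaps
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, rows):            # first column: only gaps
        S[i][0] = S[i - 1][0] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if left[i - 1] == top[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # diag
                          S[i - 1][j] + gap,       # up
                          S[i][j - 1] + gap)       # left

    # Trace back from the bottom-right cell to the top-left cell.
    out_top, out_left = [], []
    i, j = len(left), len(top)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if left[i - 1] == top[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            out_top.append(top[j - 1]); out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_top.append("-"); out_left.append(left[i - 1])   # gap in top
            i -= 1
        else:
            out_top.append(top[j - 1]); out_left.append("-")    # gap in left
            j -= 1
    return S, "".join(reversed(out_top)), "".join(reversed(out_left))


matrix, aligned_top, aligned_left = needleman_wunsch("AGCA", "ACAA")
for row in matrix:
    print(row)
print("S1:", aligned_top)
print("S2:", aligned_left)
```

The printed matrix can be compared cell by cell with the completed matrix of the example; the optimal score is the value in the bottom-right cell.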
