
BLAST Lecture Notes

The document provides information about BLAST and sequence alignment algorithms. It defines key BLAST programs and what types of queries and databases they can search. It also defines important scoring terms like E-value, percent identity, and substitution matrices like BLOSUM and PAM that are used in sequence alignments.

Uploaded by

alyaasalahmohamedmahmoudali


Bioinformatics

Dr Mohamed Abdelmoteleb
Lecturer of Microbiology - Bioinformatics
BLAST (The Basic Local Alignment Search Tool)

BLAST algorithm and Karlin-Altschul statistics

Find a short common character pattern (a "word" or seed) in the two sequences, use it as a core, and extend the alignment in both directions from the core.
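The seed-and-extend idea can be sketched in a few lines. This is a simplified illustration, not the NCBI implementation: it assumes exact word seeds of length w = 3, a hypothetical match/mismatch score of +1/-2 and an X-drop cutoff of 3, and extends only without gaps. Real BLAST also uses neighbourhood words for proteins, a two-hit trigger and a later gapped extension step.

```python
from typing import Dict, List, Tuple


def find_seeds(query: str, subject: str, w: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index: Dict[str, List[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_seed(query: str, subject: str, qi: int, sj: int, w: int = 3,
                match: int = 1, mismatch: int = -2,
                x_drop: int = 3) -> Tuple[int, int, int, int]:
    """Extend one seed without gaps, first to the right, then to the left.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen so far. Returns
    (score, query_start, query_end_exclusive, subject_start).
    """
    best = running = w * match  # the seed itself is an exact match
    right = 0
    k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        running += match if query[qi + w + k] == subject[sj + w + k] else mismatch
        k += 1
        if running > best:
            best, right = running, k
        elif best - running > x_drop:
            break
    running = best  # continue leftwards from the best right-hand end
    left = 0
    k = 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        running += match if query[qi - k - 1] == subject[sj - k - 1] else mismatch
        k += 1
        if running > best:
            best, left = running, k
        elif best - running > x_drop:
            break
    return best, qi - left, qi + w + right, sj - left
```

For query "GGACGTACGTTT" and subject "CCACGTACGAAA", the seed "ACG" at positions (2, 2) extends to a score of 7 covering the shared core "ACGTACG".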
BLAST Programs
How to select a program:
– What type of query sequence you have
(nucleotide or protein)
– What type of database you want to search
against (nucleotide or protein)

Start with | Compare against | Use
Nucleotide sequence | Nucleotide sequence database | blastn
Protein sequence | Protein sequence database | blastp
Nucleotide sequence (6-frame translations) | Protein sequence database | blastx
Protein sequence | Nucleotide sequence database (6-frame translations) | tblastn
Nucleotide sequence (6-frame translations) | Nucleotide sequence database (6-frame translations) | tblastx
https://blast.ncbi.nlm.nih.gov/Blast.cgi
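The selection table above is just a lookup on two questions (query type, database type) plus one extra choice for translating both sides. A small sketch of that decision; the function name and its string arguments are hypothetical, chosen only for illustration.

```python
def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database types ('nucleotide' or 'protein')."""
    table = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",    # query translated in 6 frames
        ("protein", "nucleotide"): "tblastn",   # database translated in 6 frames
    }
    key = (query.lower(), database.lower())
    if key not in table:
        raise ValueError(f"unknown sequence types: {key}")
    if translate_both:
        # tblastx translates both the nucleotide query and the nucleotide database
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return table[key]
```

For example, a protein query searched against a genome assembly gives `choose_blast_program("protein", "nucleotide")`, i.e. tblastn.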
Important definitions

The RAW score S is calculated from three terms:

– the sum of the match or mismatch scores between aligned nucleic acid bases or amino acid residues (for proteins, taken from a BLOSUM or PAM scoring matrix)
– minus the number of indels (insertions/deletions) multiplied by the gap opening penalty
– minus the total length of the indels multiplied by the gap extension penalty
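A minimal sketch of the raw-score rule for a nucleotide alignment, where '-' marks an indel. The match/mismatch and gap values (+2, -3, 5, 2) are illustrative assumptions; for proteins the match/mismatch term would instead be looked up in a BLOSUM or PAM matrix.

```python
def raw_score(seq1: str, seq2: str, match: int = 2, mismatch: int = -3,
              gap_open: int = 5, gap_extend: int = 2) -> int:
    """Raw score S = sum(match/mismatch) - n_gaps * gap_open - gap_length * gap_extend."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    n_gaps = 0        # number of separate indels (each pays the opening penalty once)
    gap_length = 0    # total number of gap positions (each pays the extension penalty)
    in_gap1 = in_gap2 = False
    for a, b in zip(seq1, seq2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-":
            n_gaps += 0 if in_gap1 else 1
            gap_length += 1
        elif b == "-":
            n_gaps += 0 if in_gap2 else 1
            gap_length += 1
        else:
            score += match if a == b else mismatch
        in_gap1, in_gap2 = a == "-", b == "-"
    return score - n_gaps * gap_open - gap_length * gap_extend
```

For "ACGT--ACGT" against "ACGTTTACGA": 7 matches and 1 mismatch give 14 - 3 = 11, and one indel of length 2 costs 5 + 2 × 2 = 9, so S = 2.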

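The BIT score S' is the raw score normalized with the statistical parameters λ and K: S' = (λ × S - ln K) / ln 2. Because bit scores share a common unit, they can be compared between searches that used different scoring systems.

Parameters λ and K depend on the substitution matrix and the gap penalties (Karlin-Altschul statistics).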
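The conversion from raw score to bit score is the Karlin-Altschul normalization S' = (λS - ln K) / ln 2. In the sketch below, λ = 0.267 and K = 0.041 are assumed values (commonly quoted for BLOSUM62 with gap costs 11/1); other matrices and gap settings have different parameters.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


# Assumed parameters for BLOSUM62 with gap costs 11/1 (illustrative only)
LAMBDA, K = 0.267, 0.041
print(round(bit_score(100, LAMBDA, K), 1))  # a raw score of 100 is about 43.1 bits
```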

P-value: the probability of obtaining by chance a score x at least equal to S:

P-val (S) = P(x ≥ S)

E-value (Expectation value): a correction of the P-value for multiple testing.
In the context of database searches, the E-value is the number of distinct alignments with a score equal to or better than S that are expected to occur by chance in a database search.
The lower the E-value, the more significant the score.

E-val (S) = P-val (S) × N

where N is the size of the search space (N = n × m, where n is the length of the query sequence and m is the total length of the database).
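Both forms of the E-value can be written as one-line functions: the P-value × N form given in these notes, and the equivalent bit-score form E = m × n × 2^(-S'). The function names are hypothetical, and real BLAST uses "effective" lengths corrected for edge effects, so its reported E-values differ somewhat from this raw arithmetic.

```python
def e_value_from_p(p_value: float, query_length: int, database_length: int) -> float:
    """E-val(S) = P-val(S) * N, with search space N = n * m."""
    return p_value * query_length * database_length


def e_value_from_bits(bit_score: float, query_length: int, database_length: int) -> float:
    """Bit-score form: E = n * m * 2**(-S'). Each extra bit halves the E-value."""
    return query_length * database_length * 2.0 ** (-bit_score)


print(e_value_from_bits(10, 100, 1024))  # 100.0: expected by chance about 100 times
print(e_value_from_bits(50, 100, 1024))  # far below 1: unlikely to be a chance hit
```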
Max score = the highest alignment score (bit score) between the query sequence and a database sequence segment.

Total score = the sum of the alignment scores of all segments from the same database sequence that match the query sequence. This score differs from the max score when several parts of the database sequence match different parts of the query sequence.

Query coverage = the percentage of the query length that is included in the aligned segments.
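The three summary values for one database sequence can be computed from its list of aligned segments. In this sketch each segment is a hypothetical record (query_start, query_end_exclusive, bit_score); overlapping query regions are counted only once for coverage.

```python
from typing import List, Tuple

Segment = Tuple[int, int, float]  # hypothetical: (query_start, query_end_exclusive, bit_score)


def summarize_hits(segments: List[Segment], query_length: int) -> Tuple[float, float, float]:
    """Return (max score, total score, query coverage in %) for one database sequence."""
    max_score = max(bits for _, _, bits in segments)
    total_score = sum(bits for _, _, bits in segments)
    covered = set()
    for start, end, _ in segments:
        covered.update(range(start, end))  # union of query positions in any segment
    coverage = 100.0 * len(covered) / query_length
    return max_score, total_score, coverage
```

With segments (0, 50, 80.0) and (40, 100, 60.5) on a 200-residue query, the max score is 80.0, the total score is 140.5, and the coverage is 50% (positions 0 to 99).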
Percent Sequence Identity:
The percentage of aligned positions at which the two sequences have identical bases (nucleotides) or amino acids in a pairwise sequence alignment.
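A minimal sketch of percent identity for a gapped pairwise alignment. Tools differ in the denominator (alignment length, shorter sequence length, or ungapped columns only); this sketch assumes the full alignment length, gaps included.

```python
def percent_identity(aln1: str, aln2: str) -> float:
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aln1, aln2))
    return 100.0 * identical / len(aln1)
```

For "ACGT-A" against "ACGTTC", 4 of the 6 columns are identical, so the identity is about 66.7%.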
Percent Sequence Similarity:

The percentage of aligned positions with identical or similar residues. Amino acids do change during evolution; however, these changes tend to preserve the physico-chemical properties of the original residue:

– Polar to polar
• aspartate → glutamate
– Nonpolar to nonpolar
• leucine → valine
– Similar-sized residues
• glycine → alanine
Classification of Amino acids

• Acidic amino acid residues:
– aspartic acid (D)
– glutamic acid (E)
• Basic (high pH) amino acid residues:
– arginine (R)
– lysine (K)
– to a lesser extent histidine (H)
• Other polar (hydrophilic) residues:
– asparagine (N)
– glutamine (Q)
– serine (S)
– threonine (T)
• Aliphatic (oily, long chain) residues:
– leucine (L)
– isoleucine (I)
– valine (V)
– methionine (M)
• Aromatic residues:
– tryptophan (the largest residue, W)
– phenylalanine (F)
– tyrosine (Y)
• Small side chain:
– glycine (the smallest residue, G)
– alanine (A)
• Disulphide bridge forming:
– cysteine (C); see also selenocysteine
• Alpha-helix breaker, rigid structure:
– proline (P)
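Percent similarity can be sketched by treating two residues as "similar" when they fall in the same class of the classification above. This is an assumption made for illustration: BLAST itself reports similarity as "Positives", i.e. aligned pairs with a positive score in the substitution matrix, which does not map exactly onto these classes.

```python
# Residue classes taken from the classification above (an illustrative grouping)
GROUPS = ["DE", "RKH", "NQST", "LIVM", "FWY", "GA", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def percent_similarity(aln1: str, aln2: str) -> float:
    """Percent of alignment columns with identical or same-class residues ('-' = gap)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # gap columns count in the length but are never similar
        if a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b)):
            similar += 1
    return 100.0 * similar / len(aln1)
```

For "DLGK" against "EVAW", the pairs D/E, L/V and G/A are similar but K/W is not, giving 75% similarity even though the identity is 0%.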
Scoring matrices FYI

• Amino acid substitution matrices

– PAM
– BLOSUM

• DNA substitution matrices

– As a rule, DNA sequences are much less conserved than protein sequences
– It is therefore less effective to compare coding regions at the nucleotide level

FYI: For your information

PAM

Point accepted mutation

PAM matrices are amino acid substitution matrices that encode the expected evolutionary change at the amino acid level.

A PAM matrix is a matrix in which each row and each column represents one of the twenty standard amino acids.
For any specific pair (Ai, Aj) of amino acids, the PAM matrix reflects the frequency at which Ai is expected to be replaced by Aj in two sequences that are n PAM units diverged.
Figure: PAM matrix
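The "n PAM units" idea can be illustrated by matrix powers: the mutation probabilities after n PAM units are obtained by multiplying the PAM1 mutation probability matrix by itself n times. The sketch below uses a hypothetical 2-letter alphabet with made-up PAM1 values instead of the real 20 × 20 matrix, and stops before the final step of converting probabilities into log-odds scores.

```python
from typing import List

Matrix = List[List[float]]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a x b for square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """Mutation probabilities after n PAM units: PAM1 raised to the n-th power.

    Entry [i][j] is the probability that residue i has been replaced by residue j.
    """
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # PAM0
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical PAM1 for a 2-letter alphabet: 1% chance of change per PAM unit
toy_pam1 = [[0.99, 0.01], [0.01, 0.99]]
print(pam_n(toy_pam1, 2))  # approximately [[0.9802, 0.0198], [0.0198, 0.9802]]
```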

BLOSUM

Blocks Substitution Matrix

– Scores derived from observations of the
frequencies of substitutions in blocks of
local alignments in related proteins
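– The number in the matrix name reflects evolutionary distance: it is the percent-identity threshold used to build the matrix, so lower numbers suit more distantly related sequences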
– BLOSUM62 was created using sequences
sharing no more than 62% identity
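The core BLOSUM calculation turns observed substitution frequencies in aligned blocks into log-odds scores: s(i, j) = round(2 × log2(q_ij / e_ij)), where q_ij is the observed frequency of the pair, e_ij = p_i × p_j for i = j and 2 × p_i × p_j otherwise, and the factor 2 gives half-bit units. This is a simplified sketch on one tiny hypothetical block: it skips the step that clusters sequences above the identity threshold (62% for BLOSUM62) and the weighting that goes with it.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def blosum_scores(block: List[str]) -> Dict[Tuple[str, str], int]:
    """Log-odds scores (half bits) from one gap-free block of aligned sequences."""
    pair_counts: Counter = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):  # every pair of sequences in this column
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}
    # background frequency p_i = q_ii + sum over j != i of q_ij / 2
    p: Dict[str, float] = defaultdict(float)
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


print(blosum_scores(["AAS", "AAT", "ASS"]))
```

In this toy block the S/T pair is seen three times more often than chance predicts and scores +3, while A/S is seen less often than expected and scores -1.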


Figure: The BLOSUM62 scoring matrix, a brief summary of a large part of protein biochemistry (residue groups marked: -OH/-SH, small aliphatic, acidic pH, basic pH, large aliphatic, aromatic).