Bio 3

The document provides an overview of protein structure, including primary, secondary, tertiary, and quaternary structures, as well as the concept of active sites and their importance in drug design. It discusses bioinformatics topics such as sequence alignment, protein-ligand docking, and the challenges faced in the interdisciplinary field. Additionally, it covers DNA sequencing techniques and the significance of sequence conservation in evolutionary biology.

Uploaded by

mahmoudweso2003

BIOINFORMATICS (BIOCOMPUTING)

(3)
ALIGNMENT AND MATCHING
DR. IBRAHIM ZAGHLOUL
PROTEIN STRUCTURE

https://www.rcsb.org/structure/3ERT
MACROMOLECULAR STRUCTURE
• Primary structure of proteins
– Linear polymers linked by peptide bonds
– Sense of direction

SECONDARY STRUCTURE
• Polypeptide chains fold into regular local structures
– alpha helix, beta sheet, turn, loop
– based on energy considerations

ALPHA HELIX

BETA SHEET

[Figure: schematic of anti-parallel and parallel beta sheets]
TERTIARY STRUCTURE
• 3-d structure of a polypeptide sequence
– interactions between non-local and foreign atoms
– often separated into domains

[Figures: tertiary structure of domains of CD4; myoglobin]
QUATERNARY STRUCTURE

• Arrangement of protein subunits

[Figures: quaternary structure of Cro; human hemoglobin tetramer]
ACTIVE SITE (BINDING SITE)
- Upon folding, the protein's active site is formed.
- The site at which molecules fit and interact.
- The major point of protein activity.
- Usually a cleft, pocket, or cavity.
- Called "active" because interaction usually results in some chemical change or reaction.
- Basis of the lock-and-key model.
https://www.slideshare.net/MerlynH/protein-structure-function-46933802
http://www.chemeddl.org/collections/TSTS/Gellman/Gellmanpg5-8/Active%20Sites.html
ACTIVE SITE AND DRUG DESIGN: LOCK AND KEY

• Structure: the molecules must interact chemically.
• Shape: the shapes should match.
• The interacting molecules are called ligands.
• Lock: protein (receptor, target).
• Key: ligand (compound).
PROTEIN-LIGAND DOCKING
• Computational method that mimics the binding of a ligand to a protein.
• Given: target (protein), binding site, ligand (or a set of ligands).
• Predicts:
– The pose of the molecule in the binding site
– The binding affinity, or a score representing the strength of binding
• Docking can be used for:
– Virtual screening: virtual testing of compounds
– Lead optimization: investigating specific compounds
– De novo design of ligands: designing new compounds
Image credit: Charaka Goonatilake, Glen Group, University of Cambridge.
http://www-ucc.ch.cam.ac.uk/research/cg369-research.html
VIDEO: MOLECULAR DOCKING USING GLIDE
Bioinformatics: A simple view

Biological data + computational methods
Challenges of working in bioinformatics

• Need to feel comfortable in an interdisciplinary area
• Depend on others for primary data
• Need to address important biological and computer science problems
Skill set

• Artificial intelligence
• Machine learning
• Statistics & probability
• Algorithms
• Databases
• Programming

Bioinformatics Topics: Genome Sequence

• Finding Genes in Genomic DNA
– introns
– exons
– promoters
• Characterizing Repeats in Genomic DNA
– Statistics
– Patterns
• Duplications in the Genome
– Large-scale genomic alignment
Bioinformatics Topics: Protein Sequence
• Sequence Alignment
– Non-exact string matching, gaps
– How to align two strings optimally via Dynamic Programming
– Local vs Global Alignment
– Amino acid substitution scoring
• Patterns
– Motifs
• Scoring schemes and Matching statistics
– How to tell if a given alignment or match is statistically significant
– A P-value (or an e-value)?
– Score Distributions (extreme value distribution)
– Low Complexity Sequences
• Evolutionary Issues
– Rates of mutation and change
• Secondary Structure “Prediction”
– TM-helix finding
– Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
– Fold Recognition
– Threading
– Ab initio
• Function Prediction
– Active site identification
• Relation of Sequence Similarity to Structural Similarity
Evolution

DNA Sequencers
DNA Sequencing
• DNA sequencing refers to the general laboratory technique for determining the exact sequence of nucleotides, or bases, in a DNA molecule.

• A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer determines the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). The result is reported as a text string, called a read.
Sequencing Reads
Alignment
Assembly
Evolution

Sequence conservation implies function

Alignment is the key to

• Finding important regions
• Determining function
• Uncovering the evolutionary forces
Sequence alignment
• Comparing DNA/protein sequences for
– Similarity
– Homology
• Prediction of function
• Construction of phylogeny: the history of the evolution of a species or group.
• Shotgun assembly
– End-space-free alignment / overlap alignment
• Finding motifs
Homology

• Homology: homology among DNA sequences or proteins is inferred from their sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence.
• Orthologs (Different Species)
– Divergence follows speciation
– Similarity can be used to construct a phylogeny between species
• Paralogs (Same Species)
– Divergence follows duplication
Sequence Alignment

• Procedure of comparing two (pairwise) or more (multiple) sequences by searching for a series of individual characters that are in the same order in the sequences

GCTAGTCAGATCTGACGCTA
  | |||| ||||| |||
  TGGTCACATCTGCCGC
Sequence Alignment

• Procedure of comparing two (pairwise) or more (multiple) sequences by searching for a series of individual characters that are in the same order in the sequences

VLSPADKTNVKAAWGKVGAHAGYEG
|||  |   |   | || |    ||
VLSEGDWQLVLHVWAKVEADVAGEG
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Definition
Given two strings x = x_1 x_2 ... x_M and y = y_1 y_2 ... y_N, an alignment is an assignment of gaps to positions 0, ..., M in x and 0, ..., N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
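
The definition can be checked mechanically. The following is a minimal Python sketch, not part of the slides: the function name `is_valid_alignment` and the use of `-` as the gap symbol are illustrative assumptions, and the sketch additionally rejects columns that pair a gap with a gap, which is a common convention rather than something the definition states.

```python
def is_valid_alignment(x, y, row_x, row_y, gap="-"):
    """Return True if (row_x, row_y) is an alignment of x and y."""
    # Both gapped rows must have the same number of columns.
    if len(row_x) != len(row_y):
        return False
    # Removing the gaps must give back the original strings.
    if row_x.replace(gap, "") != x or row_y.replace(gap, "") != y:
        return False
    # Assumed convention: no column pairs a gap with a gap.
    return all(not (a == gap and b == gap) for a, b in zip(row_x, row_y))


# The two sequences and the gapped alignment from the slide.
x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(is_valid_alignment(x, y, row_x, row_y))
```

Stripping the gaps from each row recovers the original sequence, which is what makes the pair of rows an alignment of x and y rather than of some other strings.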

Sources of variation
• Nucleotide substitution
– Replication error
– Chemical reaction
• Insertions or deletions (indels)
– Unequal crossing over
– Replication slippage
• Duplication
– a single gene (complete gene duplication)
– part of a gene (internal or partial gene duplication)
• Domain duplication
• Exon shuffling
– part of a chromosome (partial polysomy)

– an entire chromosome (aneuploidy or polysomy)

– the whole genome (polyploidy)

A simple alignment

• Let us try to align two short nucleotide sequences:
– AATCTATA and AAGATA
• Without considering any gaps (insertions/deletions), there are 3 possible ways to align these sequences so that the shorter one lies entirely within the longer one:

AATCTATA    AATCTATA    AATCTATA
AAGATA       AAGATA       AAGATA

• Which one is better?
Scoring the alignments
• We need to have a scoring mechanism to evaluate alignments
– match score
– mismatch score
• We can have the total score as:

$$\text{score} = \sum_{i=1}^{n} (\text{match or mismatch score at position } i)$$

where n is the number of aligned positions.

• For the simple example, assume a match score of 1 and a mismatch score of 0:

AATCTATA    AATCTATA    AATCTATA
AAGATA       AAGATA       AAGATA
   4            1            3
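
The three scores can be reproduced with a short Python sketch. The function name `ungapped_scores` and its parameters are illustrative assumptions, not from the slides; it slides the shorter sequence along the longer one and sums the per-position scores.

```python
def ungapped_scores(long_seq, short_seq, match=1, mismatch=0):
    """Score every ungapped placement of short_seq inside long_seq."""
    scores = []
    for offset in range(len(long_seq) - len(short_seq) + 1):
        # Part of the long sequence that lies opposite the short one.
        window = long_seq[offset:offset + len(short_seq)]
        total = sum(match if a == b else mismatch
                    for a, b in zip(window, short_seq))
        scores.append(total)
    return scores


print(ungapped_scores("AATCTATA", "AAGATA"))  # one score per offset 0, 1, 2
```

With a match score of 1 and a mismatch score of 0, the score is simply the number of identical positions, so the first placement (offset 0) is the best of the three.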
Simple alignment with gaps
• Considering gapped alignments vastly increases the number of possible alignments.

• If the gap penalty is -1, what will be the new scores?

AATCTATA    AATCTATA    AATCTATA
AAG-AT-A    AA-G-ATA    AA--GATA
   1           3           3
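
A minimal Python sketch of this scoring, assuming a match score of 1, a mismatch score of 0 and a gap penalty of -1 as in the example. The function name `score_alignment` and the `-` gap symbol are illustrative choices, not from the slides.

```python
def score_alignment(row_a, row_b, match=1, mismatch=0, gap=-1):
    """Sum column scores of two equal-length gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += gap        # column contains a gap
        elif a == b:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


for row in ("AAG-AT-A", "AA-G-ATA", "AA--GATA"):
    print(row, score_alignment("AATCTATA", row))
```

Each gap column costs one point, so an alignment with two gaps needs five matches to reach a score of 3.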

BLOSUM 62 matrix
String Definitions

A string S is a finite ordered list of characters.

Characters are drawn from an alphabet Σ.

Nucleic acid alphabet: { A, C, G, T }

Amino acid alphabet: { A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V }

Length of S, |S|, is the number of characters in S.

ϵ is the empty string. |ϵ| = 0
String Definitions

• For strings S and T over Σ, their concatenation consists of the characters of S followed by the characters of T, denoted ST.

• S is a substring of T if there exist (possibly empty) strings u and v such that T = uSv.

• S is a prefix of T if there exists a string u such that T = Su. If neither S nor u is ϵ, S is a proper prefix of T.

• Definitions of suffix and proper suffix are similar.
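
These definitions map directly onto built-in Python string operations. The sketch below is illustrative only; the helper names are assumptions, and the proper-prefix test follows the definition above (neither S nor the remainder u may be empty).

```python
def is_substring(s, t):
    """T = uSv for some (possibly empty) u and v."""
    return s in t


def is_prefix(s, t):
    """T = Su for some (possibly empty) u."""
    return t.startswith(s)


def is_proper_prefix(s, t):
    """Prefix where neither S nor the remainder u is empty."""
    return s != "" and len(s) < len(t) and t.startswith(s)


def is_suffix(s, t):
    """T = uS for some (possibly empty) u."""
    return t.endswith(s)


print(is_substring("cat", "concatenate"))      # "cat" occurs inside the word
print(is_proper_prefix("con", "concatenate"))  # non-empty prefix, non-empty remainder
```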

String Definitions

• We defined substring. Subsequence is similar, except the characters need not be consecutive.

• “cat” is a substring and a subsequence of “concatenate”.

• “cant” is a subsequence of “concatenate”, but not a substring.
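
The difference can be shown with a small Python sketch. The function name `is_subsequence` is an illustrative assumption; it scans T once from left to right and consumes characters until each character of S has been found in order.

```python
def is_subsequence(s, t):
    """True if the characters of s appear in t in order, not necessarily adjacent."""
    remaining = iter(t)
    # 'ch in remaining' advances the iterator until ch is found,
    # so later characters of s can only match later positions of t.
    return all(ch in remaining for ch in s)


word = "concatenate"
print(is_subsequence("cat", word), "cat" in word)    # subsequence and substring
print(is_subsequence("cant", word), "cant" in word)  # subsequence only
```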

Exact matching

• Looking for places where a pattern P occurs as a substring of a text T. Each such place is an occurrence or match.

• An alignment is a way of putting P’s characters opposite T’s characters. It may or may not correspond to an occurrence.

Exact Matching

Exact matching: naïve algorithm
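
The slide figures for the naïve algorithm are not available in this text, so the following Python is only a sketch of the usual formulation: try every alignment of P against T and compare characters left to right. The function name `naive_match` is an illustrative assumption.

```python
def naive_match(p, t):
    """Return the offsets of all occurrences of pattern p in text t."""
    occurrences = []
    # Try every alignment of p against t.
    for i in range(len(t) - len(p) + 1):
        j = 0
        # Compare characters left to right until a mismatch or the end of p.
        while j < len(p) and p[j] == t[i + j]:
            j += 1
        if j == len(p):
            occurrences.append(i)  # every character matched
    return occurrences


text = "There would have been a time for such a word"
print(naive_match("word", text))
```

In the worst case this performs on the order of |P| x |T| character comparisons, since every alignment may be checked almost to the end of P.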

Can we improve on the naïve algorithm?

P: word
T: There would have been a time for such a word
         word

u doesn’t occur in P, so skip the next two alignments

P: word
T: There would have been a time for such a word
         word
          word   skip!
           word  skip!
            word
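
The skipping idea can be sketched in Python as follows. This is a simplified illustration, not the algorithm from the slides: it compares left to right as in the picture above and only skips when the mismatched text character does not occur anywhere in P. The function name `match_with_skips` and the alignment counter are illustrative assumptions. The same observation underlies the bad character rule of the Boyer-Moore algorithm, which compares right to left.

```python
def match_with_skips(p, t):
    """Exact matching that skips alignments overlapping a text
    character which does not occur in p.
    Returns (occurrences, number of alignments tried)."""
    chars_in_p = set(p)
    occurrences = []
    alignments_tried = 0
    i = 0
    while i + len(p) <= len(t):
        alignments_tried += 1
        j = 0
        while j < len(p) and p[j] == t[i + j]:
            j += 1
        if j == len(p):
            occurrences.append(i)
            i += 1
        elif t[i + j] not in chars_in_p:
            # Every alignment still overlapping t[i + j] must fail,
            # so jump to the first alignment that starts after it.
            i += j + 1
        else:
            i += 1
    return occurrences, alignments_tried


text = "There would have been a time for such a word"
found, tried = match_with_skips("word", text)
print(found, tried)
```

The naïve algorithm would try all 41 alignments for this text; the sketch tries fewer because alignments overlapping characters such as "u" are never examined.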
