0% found this document useful (0 votes)
6 views51 pages

Bio 3

The document provides an overview of protein structure, including primary, secondary, tertiary, and quaternary structures, as well as the concept of active sites and their importance in drug design. It discusses bioinformatics topics such as sequence alignment, protein-ligand docking, and the challenges faced in the interdisciplinary field. Additionally, it covers DNA sequencing techniques and the significance of sequence conservation in evolutionary biology.

Uploaded by

mahmoudweso2003
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views51 pages

Bio 3

The document provides an overview of protein structure, including primary, secondary, tertiary, and quaternary structures, as well as the concept of active sites and their importance in drug design. It discusses bioinformatics topics such as sequence alignment, protein-ligand docking, and the challenges faced in the interdisciplinary field. Additionally, it covers DNA sequencing techniques and the significance of sequence conservation in evolutionary biology.

Uploaded by

mahmoudweso2003
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 51

BIOINFORMATICS(BIOCOMPUTING)

(3)
ALIGNMENT AND MATCHING
DR. IBRAHIM ZAGHLOUL
PROTEIN STRUCTURE

https://www.rcsb.org/structure/3ERT 2
MACROMOLECULAR STRUCTURE
• Primary structure of proteins
– Linear polymers linked by peptide bonds
– Sense of direction

3
SECONDARY STRUCTURE
• Polypeptide chains fold into regular local structures
– alpha helix, beta sheet, turn, loop
– based on energy considerations

4
ALPHA HELIX

5
BETA SHEET

anti-parallel parallel

schematic

6
TERTIARY STRUCTURE
• 3-d structure of a polypeptide sequence
– interactions between non-local and foreign atoms
– often separated into domains

tertiary structure of domains of CD4


myoglobin

7
QUATERNARY STRUCTURE

• Arrangement of protein subunits

quaternary structure
of Cro
human hemoglobin
tetramer

8
ACTIVE SITE (BINDING SITE)
- Upon folding, the protein active site is formed.
- The spot at which molecules fit and interact.
- The major point for protein activity.
- Usually it is a Cleft, Pocket, Cavity.
- Called active because interaction usually
results in some chemical change or reaction.
- Basis of the lock and key model.
https://www.slideshare.net/MerlynH/protein-structure-
function-46933802

http://www.chemeddl.org/collections/TSTS/Gellman/Gellm
anpg5-8/Active%20Sites.html 9
ACTIVE SITE AND DRUG DESIGN: LOCK AND KEY

• Structure (chemically interact).


• Shape should match .
• Interacting molecules (ligands).
• Lock: Protein (Receptor, Target).
• Key: Ligand (Compound)

10
PROTEIN-LIGAND DOCKING
 Computational method that mimics the binding of a
ligand to a protein
 Given: Target (Protein), Binding Site, Ligand (set of
ligands)
• Predicts:
• The pose of the molecule in the binding site
• The binding affinity or a score representing the
strength of binding
• Docking can be used for:
• virtual screening: Virtual testing of compounds
• Lead optimization: Investigate specific compounds
• De novo design of ligands: Synthesize new
compounds.
Image credit: Charaka Goonatilake, Glen Group, University of Cambridge.
11
http://www-ucc.ch.cam.ac.uk/research/cg369-research.html
VIDEO: MOLECULAR DOCKING USING GLIDE
Bioinformatics: A simple view

Biological Computational
+
Data methods

13
Challenges of working in bioinformatics

• Need to feel comfortable in interdisciplinary area


• Depend on others for primary data
• Need to address important biological and computer
science problems

14
Skill set

• Artificial intelligence
• Machine learning
• Statistics & probability
• Algorithms
• Databases
• Programming

15
Bioinformatics Topics Genome Sequence

• Finding Genes in Genomic DNA


– introns
– exons
– promotors
• Characterizing Repeats in Genomic DNA
– Statistics
– Patterns
• Duplications in the Genome
– Large scale genomic alignment

16
Bioinformatics Topics Protein Sequence
• Sequence Alignment
– non-exact string matching, gaps
• Scoring schemes and Matching
– How to align two strings optimally via statistics
Dynamic Programming
• Patterns – How to tell if a given
– Local vs Global Alignment alignment or match is
– TM-helix finding – Amino acid substitution scoring
statistically significant
– Motifs
– A P-value (or an e-value)?
• Secondary Structure “Prediction”
– Assessing Secondary Structure Prediction – Score Distributions
(extreme val. dist.)
– Ab initio
– Low Complexity Sequences
• Function Prediction
• Evolutionary Issues
– Active site identification
– Rates of mutation and
• Tertiary Structure Prediction change
– Fold Recognition • Relation of Sequence Similarity to Structural
– Threading Similarity
17
Evolution

1
DNA Sequencers
DNA Sequencing
• DNA sequencing refers to the general laboratory technique for determining the exact
sequence of nucleotides, or bases, in a DNA molecule.

• A DNA sequencer is a scientific instrument used to automate the DNA


sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the
order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then
reported as a text string, called a read.
DNA Sequencers
DNA Sequencers
Sequencing Reads
Alignment
Assembly
Evolution

3
Sequence conservation implies function

Alignment is the key to


• Finding important regions
• Determining function
• Uncovering the evolutionary forces
Sequence alignment
• Comparing DNA/protein sequences for
– Similarity
– Homology
• Prediction of function
• Construction of phylogeny: the history of the evolution of a species or
group.
• Shotgun assembly
– End-space-free alignment / overlap alignment
• Finding motifs

3
Homology

• Homology: Homology among DNA, or proteins is


inferred from their sequence similarity. Significant
similarity is strong evidence that two sequences are related
by evolutionary changes from a common ancestral
sequence.
• Orthologs (Different Species)
– Divergence follows speciation
– Similarity can be used to construct phylogeny between
species
• Paralogs (Same Species)
- Divergence follows duplication
3
Sequence Alignment

• Procedure of comparing two (pairwise) or more


(multiple) sequences by searching for a series of
individual characters that are in the same order in
the sequences

GCTAGTCAGATCTGACGCTA
| |||| ||||| |||
TGGTCACATCTGCCGC

35
Sequence Alignment

• Procedure of comparing two (pairwise) or more


(multiple) sequences by searching for a series of
individual characters that are in the same order
in the sequences

VLSPADKTNVKAAWGKVGAHAGYEG
||| | | | || | ||
VLSEGDWQLVLHVWAKVEADVAGEG

3
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Definition
Given two strings x = x1x2...xM, y = y1y2…yN,

an alignment is an assignment of gaps to positions 0,…, M


in x, and 0,…, N in y, so as to line up each letter in one
sequence with either a letter, or a gap in the other sequence

3
Sources of variation
• Nucleotide substitution
– Replication error
– Chemical reaction
• Insertions or deletions (indels)
– Unequal crossing over
– Replication slippage
• Duplication
– a single gene (complete gene duplication)
– part of a gene (internal or partial gene duplication)
• Domain duplication
• Exon shuffling
– part of a chromosome (partial polysomy)

– an entire chromosome (aneuploidy or polysomy)


– the whole genome (polyploidy)

38
A simple alignment

• Let us try to align two short nucleotide sequences:


– AATCTATA and AAGATA
• Without considering any gaps (insertions/deletions) there
are 3 possible ways to align these sequences

AATCTATA AATCTATA AATCTATA


AAGATA AAGATA AAGATA

• Which one is better?

39
Scoring the alignments
• We need to have a scoring mechanism to evaluate alignments
– match score
– mismatch score
• We can have the total score as:
n

=1
i
match or mismatch score at position i

• For the simple example, assume a match score of 1 and a


mismatch score of 0:
AATCTATA AATCTATA AATCTATA
AAGATA AAGATA AAGATA
4 1 3
40
Simple alignment with gaps
• Considering gapped alignments vastly
increases the number of possible alignments.

• If gap penalty is -1 what will be the new


scores?

AATCTATA AATCTATA AATCTATA


AAG-AT-A AA-G-ATA AA--GATA
1 3 3

41
BLOSUM 62 matrix
String Definitions

A string S is a finite ordered list of characters.

Characters are drawn from an alphabet Σ.

Nucleic acid alphabet: { A, C, G, T }


Amino acid alphabet: { A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V }

Length of S, |S |, is the number of characters in S

ϵ is the empty string. | ϵ | = 0

43
String Definitions

• For strings S and T over Σ, their concatenation consists of the characters of


S followed by the characters of T, denoted ST

• S is a substring of T if there exist (possibly empty) strings u and v such that


T = uSv

• S is a prefix of T if there exists a string u such that T = Su.


If neither S nor u are ϵ, S is a proper prefix of T.
• Definitions of suffix and proper suffix are similar.

44
String Definitions

• We defined substring. Subsequence is similar except the


characters need not be consecutive.

• “cat” is a substring and a subsequence of “concatenate”

• “cant” is a subsequence of “concatenate”, but not a


substring

45
Exact matching

• Looking for places where a pattern P occurs as a substring


of a text T. Each such place is an occurrence or match.

• An alignment is a way of putting P’s characters opposite


T’s characters. It may or may not correspond to an
occurrence.

46
Exact Matching

47
Exact matching: naïve algorithm

48
Exact matching: naïve algorithm

49
Exact matching: naïve algorithm

50
Can we improve on the naïve algorithm?

P: word
T: There would have been a time for such a word
word

u doesn’t occur in P,so skip next two alignments

P: word
T: There would have been a time for such a word
word
word skip!
word skip!
word

51

You might also like