Disclaimer

The document discusses the importance of sequence alignment in bioinformatics, highlighting concepts such as similarity, identity, homology, and the differences between homologs, paralogs, orthologs, and analogs. It explains the significance of sequence conservation and variation, methods of alignment, and the characteristics of nucleic acids. Additionally, it covers the types of sequence alignment, including pairwise and multiple sequence alignments, and their respective algorithms and applications.

Uploaded by

Sneha Ardeshna
Copyright
© © All Rights Reserved

Disclaimer

It is hereby declared that the production of the said content is meant for non-commercial, scholastic and research
purposes only.

We admit that some of the content or the images provided in this channel's videos may be obtained through the
routine Google image searches and few of them may be under copyright protection. Such usage is completely
inadvertent.

It is quite possible that we overlooked to give full scholarly credit to the Copyright Owners. We believe that the non-
commercial, only-for-educational use of the material may allow the video in question fall under fair use of such
content. However we honour the copyright holder's rights and the video shall be deleted from our channel in case of
any such claim received by us or reported to us.
Department of Microbiology
Unit no 3: Introduction to Sequence databases
Alignment
Subject name and code: Bioinformatics & Biostatistics; 02MB0301
Dr. Purvi M. Rakhashiya
Importance of Sequence alignment

• Function or activity of a new gene/protein
• Structure or shape of a new protein
• Location or preferred location of a protein
• Stability of a gene or protein
• Origin of a gene, protein, organelle, organism…
Similarity and Identity

• The key difference between similarity and identity in sequence alignment is that similarity is the likeness (resemblance) between two sequences in comparison, while identity is the number of characters that match exactly between two different sequences.
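
Identity, as defined above, can be counted directly from two already-aligned sequences. The following is a minimal Python sketch, not a standard tool: the function names are hypothetical, gaps are assumed to be written as "-", and the percentage is taken over the full alignment length (other denominators are also in use, so the convention should always be stated).

```python
def identity_count(aligned_a: str, aligned_b: str) -> int:
    """Number of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts only if the characters are equal and are not gaps.
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity as a percentage of the alignment length (assumed denominator)."""
    if not aligned_a:
        raise ValueError("alignment is empty")
    return 100.0 * identity_count(aligned_a, aligned_b) / len(aligned_a)


print(identity_count("ACGT-A", "ACCTTA"))               # 4
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))   # 66.7
```

Similarity, unlike identity, needs an explicit scoring rule (for example a substitution matrix) before it can be computed.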
Similarity versus Homology

Similarity
• Similarity refers to the likeness or % similarity between 2 sequences
• Similarity of sequences usually means sharing a statistically measured number of bases or amino acids
• Similarity does not necessarily imply homology

Homology
• Homology refers to shared ancestry
• Two sequences are homologous if they are derived from a common ancestral sequence
• Homology often implies similarity (note that structural, but not sequence, similarity may occur)
Similarity versus Homology contd..

• Similarity can be quantified
• It is correct to say that two sequences are X% identical
• It is correct to say that two sequences have a similarity score of Z
• It is correct to say that two sequences are X% similar, as long as the criteria for similarity are clear.
Similarity versus Homology
• Homology cannot be quantified
“It’s homologous or it isn’t”
• If two sequences have a high % identity it is OK to say they are homologous
• It is incorrect to say two sequences have a homology score of Z
• It is incorrect to say two sequences are X% homologous or have a homology of X%
Department of Microbiology
Unit no 3: Introduction to Sequence databases
Alignment-2
Subject name and code: Bioinformatics & Biostatistics; 02MB0301
Dr. Purvi M. Rakhashiya
Homolog Vs. Paralog Vs. Ortholog Vs. Analog

• https://youtu.be/TiKbMw_bKEk
• Convergent Vs. Divergent Evolution
– Divergent evolution occurs when two different species share a
common ancestor but have different characteristics from one
another. Each time one ancestral species diverges into multiple
descendant species it is called speciation. Speciation is an
important result of divergent evolution.
– Convergent evolution is the independent evolution of similar
features in species of different periods or epochs in time.
Convergent evolution creates analogous structures that have
similar form or function but were not present in the last common
ancestor of those groups.
• Homologs
– Sequence homology is the biological homology between DNA,
RNA, or protein sequences, defined in terms of shared ancestry in
the evolutionary history of life.
– Two segments of DNA can have shared ancestry because of three
phenomena: either a speciation event (orthologs), or a duplication
event (paralogs), or else a horizontal (or lateral) gene transfer
event (xenologs).
– Paralogs: Paralogous genes are genes that are related via
duplication events in the last common ancestor (LCA) of the
species being compared. They result from the mutation of
duplicated genes during separate speciation events.
– Orthologs: Homologous sequences are orthologous if they are
inferred to be descended from the same ancestral sequence
separated by a speciation event: when a species diverges into two
separate species, the copies of a single gene in the two resulting
species are said to be orthologous. Orthologs, or orthologous
genes, are genes in different species that originated by vertical
descent from a single gene of the last common ancestor.
Conservation or variation

• Degree of sequence conservation – reflects the evolutionary relatedness of different sequences
• Degree of sequence variation – reflects the changes that have occurred during evolution
• Mechanisms of variation
– Mutations or substitutions
– Insertions
– Deletions
From unknown to known

• Sequence alignment provides inference for the relatedness of two sequences under study
• Identifying the evolutionary relationships between sequences helps to characterize the function of unknown sequences
• If the two sequences share significant similarity, it is extremely unlikely that the extensive similarity between them has arisen by chance, meaning that the two sequences most likely derive from a common evolutionary origin
Causes for sequence (dis)similarity

• mutation: a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)
• insertion: at a certain location one new nucleotide is inserted in between two existing nucleotides (e.g.: AA → AGA)
• deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
• indel: an insertion or a deletion
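
The three causes listed above can be read off column by column once the original and the changed sequence are written as an alignment. The sketch below is illustrative only: the function name and the event labels are hypothetical, and gaps are assumed to be written as "-".

```python
def classify_columns(original: str, changed: str) -> list:
    """Label each alignment column as match, substitution, insertion or deletion."""
    if len(original) != len(changed):
        raise ValueError("aligned sequences must have the same length")
    events = []
    for before, after in zip(original, changed):
        if before == "-" and after == "-":
            events.append("gap")            # empty column, carries no information
        elif before == "-":
            events.append("insertion")      # residue present only in the changed sequence
        elif after == "-":
            events.append("deletion")       # residue present only in the original sequence
        elif before == after:
            events.append("match")
        else:
            events.append("substitution")   # one nucleotide replaced by another
    return events


# The three examples from the list above, written as aligned pairs.
print(classify_columns("ATA", "AGA"))    # substitution in the middle column
print(classify_columns("A-A", "AGA"))    # insertion in the middle column
print(classify_columns("ACTG", "AC-G"))  # deletion in the third column
```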

Nucleic Acid Sequence characteristics and
parameters

• DNA has a double helical structure
• The strands are complementary
• Hence there is redundancy in DNA in terms of information
• But the complementary strand is not identical to the first strand
• The two strands run in opposite directions (anti-parallel)

Complementarity

• The two strands of DNA are reverse complementary

5’ ACGTTACG 3’
3’ TGCAATGC 5’
• Most cellular processes involving DNA occur in the 5’ to 3’ direction
• For every G on one strand there is a C on the other strand, and for every A on one strand there is a T on the other strand
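
The base-pairing rule above can be sketched in a few lines of Python. The function names are hypothetical and only the four unambiguous DNA letters A, C, G and T are handled; this is an illustration, not a replacement for a sequence library.

```python
# Translation table for Watson-Crick pairs: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Complementary bases in the same left-to-right order (reads 3' to 5')."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """The partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


top = "ACGTTACG"                  # 5' ACGTTACG 3'
print(complement(top))            # TGCAATGC, the lower strand as drawn above
print(reverse_complement(top))    # CGTAACGT, the lower strand read 5' to 3'
```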

DNA Composition (Rigidity and Flexibility)

• AT content
• GC content
• AT content (%) = (A + T) / (A + T + G + C) × 100
• GC content (%) = (G + C) / (A + T + G + C) × 100
• High GC = Rigid DNA
• High AT = Flexible DNA
• High probability to interact with proteins
• Example sequence: ATATACGGGCAGCAGC
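
As a worked example, AT and GC content can be computed for the sequence on this slide, taking the content of a base pair type as its count divided by the total number of A, C, G and T bases, times 100. The helper name `base_content` is hypothetical, and ignoring characters other than A, C, G and T is an assumption of this sketch.

```python
def base_content(sequence: str, bases: str) -> float:
    """Percentage of the counted bases (A, C, G, T) that fall in `bases`."""
    sequence = sequence.upper()
    total = sum(1 for base in sequence if base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    selected = sum(1 for base in sequence if base in bases)
    return 100.0 * selected / total


seq = "ATATACGGGCAGCAGC"
print(base_content(seq, "GC"))   # 56.25 (9 of 16 bases are G or C)
print(base_content(seq, "AT"))   # 43.75 (7 of 16 bases are A or T)
```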

Types of Alignment

• Pairwise Sequence Alignment
– Global Alignment
– Local Alignment
• Multiple Sequence Alignment
Pairwise sequence alignment
• Number of sequences compared = 2
• Can be global or local
• Simple algorithm for scoring
– Global – Needleman-Wunsch
– Local – Smith-Waterman

Multiple sequence alignment
• Number of sequences compared > 2
• Generally global, but can be local
• Sophisticated algorithm for scoring
– Progressive alignment
– Pairwise alignment in loop
– Align two most closely related sequences
– Followed by the next sequence most similar to the pair and so on
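
The progressive strategy described above (start with the two most closely related sequences, then keep adding the next most similar one) can be sketched as an ordering step. This is only an illustration of the joining order, not of the alignment itself: the function name, the sequence names and the similarity values are hypothetical, and "most similar to the pair" is interpreted here as "most similar to any sequence already added", which is an assumption.

```python
from itertools import combinations


def progressive_order(names, similarity):
    """Order in which sequences would be added to a progressive alignment.

    `similarity` maps frozenset({name_a, name_b}) to a pairwise similarity score.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")

    def sim(a, b):
        return similarity[frozenset((a, b))]

    # Step 1: start with the two most closely related sequences.
    first, second = max(combinations(names, 2), key=lambda pair: sim(*pair))
    order = [first, second]
    remaining = [name for name in names if name not in order]

    # Step 2: repeatedly add the sequence most similar to those already added.
    while remaining:
        nxt = max(remaining, key=lambda name: max(sim(name, done) for done in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical pairwise similarity scores for four sequences.
scores = {
    frozenset(("s1", "s2")): 0.3, frozenset(("s1", "s3")): 0.7,
    frozenset(("s1", "s4")): 0.2, frozenset(("s2", "s3")): 0.4,
    frozenset(("s2", "s4")): 0.9, frozenset(("s3", "s4")): 0.5,
}
print(progressive_order(["s1", "s2", "s3", "s4"], scores))
```

In practice the pairwise scores would come from pairwise alignments, and tools build a guide tree rather than a simple list.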
Pairwise sequence alignment
• To find conserved regions between sequences
• Similarity search in a database for homologous sequences
• Tools
– BLAST
– EMBOSS Needle
– EMBOSS Water

Multiple sequence alignment
• To detect regions of variability and conservation in a protein family
• Phylogenetic analysis
• Tools
– MUSCLE
– T-Coffee
– CLUSTAL-W
Pairwise sequence alignment

• Goal - To find the best pairing of two sequences, such that there is
maximum correspondence among residues

• How - One sequence is shifted relative to the other to find the position
where maximum matches are found

• Strategies -
1. Global alignment
2. Local alignment
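
The "shifting" idea in the How bullet can be illustrated without gaps: slide one sequence along the other and count matching positions at each offset. This is a simplified sketch with a hypothetical function name; real global and local alignment also allow gaps and use scoring schemes.

```python
def best_ungapped_offset(a: str, b: str):
    """Slide b along a and return (offset, matches) for the best position.

    offset is the index in `a` at which b[0] is placed; it may be negative
    when b overhangs the start of a.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    best_offset, best_matches = 0, -1
    for offset in range(-(len(b) - 1), len(a)):
        matches = 0
        for j, residue in enumerate(b):
            i = offset + j
            if 0 <= i < len(a) and a[i] == residue:
                matches += 1
        # Keep the first offset that reaches the highest match count.
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


print(best_ungapped_offset("GATTACA", "TTAC"))   # (2, 4)
```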
Global Alignment
Global alignments, which attempt to align every residue in every sequence, are most useful when the sequences in the query set are similar and of roughly equal size.
• Assumption – Similarity over the entire length of sequences
• Sequences – More or less of similar length
• Alignment – Carried out from beginning to end of both sequences
• Aim – Look for best possible alignment across the entire length between the two sequences
• General global alignment technique is the Needleman–Wunsch algorithm

Local Alignment
Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.
• Assumption – Similarity confined to local regions instead of the entire length of sequences
• Sequences – Could vary largely in length
• Alignment – Carried out considering local regions of high similarity without regard for the alignment of the rest of the sequence regions
• Aim – Look for local regions with the highest level of similarity
• General local alignment technique is the Smith–Waterman algorithm
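
A minimal sketch of Needleman–Wunsch global alignment follows. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; practical tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j], end to end.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a
    # Trace back from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACTG", "ACG"))   # (2, 'ACTG', 'AC-G')
```

Because the traceback always runs from one corner of the matrix to the other, every residue of both sequences appears in the result, which is what makes the alignment global.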
Global Alignment
• Applications – More applicable for aligning two closely related sequences of roughly the same length
• Limitation – Not good for divergent sequences and sequences of variable lengths

Local Alignment
• Applications – Used for aligning more divergent sequences to find conserved patterns, also known as domains or motifs
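
Local alignment of divergent sequences that share a motif can be sketched with the Smith–Waterman algorithm. The scoring values (match +2, mismatch -1, gap -1), the function name and the two test sequences are assumptions for illustration only.

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Two otherwise unrelated sequences that share the motif GATTACA.
print(smith_waterman("CCCGATTACACCC", "TTGATTACATT"))   # (14, 'GATTACA', 'GATTACA')
```

Only the shared region is reported; the flanking residues that do not match are left out of the alignment rather than forced into it.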

“:” indicates identical residue matches
“.” indicates similar residue matches
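
The marker line described by this legend can be generated from two aligned protein sequences. In the sketch below the function name and the residue groups treated as "similar" are hypothetical, illustrative choices; alignment programs usually decide similarity from a substitution matrix instead.

```python
# Illustrative groups of amino acids treated as similar (an assumption).
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def match_line(aligned_a: str, aligned_b: str, groups=SIMILAR_GROUPS) -> str:
    """Build the marker line: ':' identical, '.' similar, ' ' otherwise."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            marks.append(" ")    # gap column
        elif x == y:
            marks.append(":")    # identical residues
        elif any(x in group and y in group for group in groups):
            marks.append(".")    # different residues from the same group
        else:
            marks.append(" ")    # dissimilar residues
    return "".join(marks)


seq_a, seq_b = "MKV-LD", "MRVALE"
print(seq_a)
print(match_line(seq_a, seq_b))
print(seq_b)
```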
