Lecture3-DNA Data Analysis

DNA Data Analysis

Dr. Y. V. Lokeswari
Associate Professor
SSN College of Engineering
DNA Data Analysis - DNA Sequence
• DNA is the basis of heredity.
• It is a polymer made up of small molecules called nucleotides, which can be distinguished by the four
bases: adenine (A), cytosine (C), guanine (G), and thymine (T).
• DNA usually occurs in double strands, and the bases in the two strands are complementary to
each other, i.e., A pairs with T and G pairs with C through hydrogen bonds.
• A single strand of DNA (written in the 5' to 3' direction), 5' AACCGTACC 3', is paired to a
complementary strand running in the opposite direction, 3' TTGGCATGG 5'.
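
The base-pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `complement` and `reverse_complement` are chosen here for the example.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base complement, kept in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

top = "AACCGTACC"                 # the slide's example, read 5' -> 3'
print(complement(top))            # partner strand read 3' -> 5': TTGGCATGG
print(reverse_complement(top))    # partner strand read 5' -> 3': GGTACGGTT
```

Because the two strands run in opposite directions, databases normally store the partner strand as the reverse complement rather than the plain complement.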

DNA is the basis of heredity.

It is made of small molecules called nucleotides, which are distinguished by the four bases.
The bases running along the two strands are complementary to each other.
DNA Data Analysis - DNA Sequence
• The transcription process is different in prokaryotes (i.e., simple bacteria) and eukaryotes (non-bacteria
possessing a nucleus, e.g., fungi, unicellular paramecia, and all plants and animals).
• In prokaryotes, RNA polymerase produces an mRNA transcript directly from the DNA template.
• In eukaryotes, genes in a DNA sequence are not continuous, but instead are broken up into coding
regions (exons, which code for proteins) and noncoding regions (introns).
• The amino acid leucine (Leu) is encoded by six different codons. There are three codons, UAA, UAG,
and UGA, that do not encode any amino acids.
• They are the stop codons that terminate the translation process.
• It is therefore important to decide at which nucleotide translation starts and where it stops. A stretch
of sequence that runs from a start codon to a stop codon in one reading frame is called an open
reading frame (ORF). Determining the correct reading frame is an important problem in genomics and
bioinformatics.
The transcription process differs between prokaryotes (no nucleus) and eukaryotes.

Prokaryotes: RNA polymerase produces an mRNA transcript directly from the DNA template.

Eukaryotes: genes in the DNA sequence are broken into coding (exon) and noncoding (intron) regions.

Example: leucine (Leu) is encoded by 6 different codons.

Three codons (UAA, UAG, UGA) do not encode any amino acid; they are the stop codons that terminate the
translation process.

At which nucleotide should translation start, and at which should it stop?
Determining the correct reading frame (open reading frame, ORF) is an important problem in bioinformatics.
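
The reading-frame question above can be made concrete with a small ORF scan. This is a hedged sketch only: it checks the three frames of one strand, it writes the stop codons in DNA spelling (TAA, TAG, TGA for UAA, UAG, UGA), and it assumes ATG as the start codon, which the slides do not name. The function name `find_orfs` and the `min_codons` parameter are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA spelling of UAA, UAG, UGA
START_CODON = "ATG"                   # assumption: the usual start codon, not given in the slides

def find_orfs(dna: str, min_codons: int = 2):
    """Yield (frame, start, end, orf) for each ORF found on the given strand.

    start and end are 0-based and end-exclusive; the ORF text includes its stop codon.
    min_codons is the minimum number of codons required before the stop codon.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield frame, start, i + 3, dna[start:i + 3]
                start = None                    # look for the next start codon

for orf in find_orfs("CCATGGCTTAAGATGAAATGA"):
    print(orf)
```

The same sequence yields different ORFs in different frames, which is why choosing the frame matters. A fuller scan would also run over the reverse complement to cover the other strand.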
DNA Data Analysis - Sequence Comparison and Alignment
• Sequence comparison is used to study the functional and structural information encoded in a sequence.
• It is done by comparing the new sequence with sequences that have already been well studied and
annotated. (Note: compare the unknown with the known.)
• Sequences that are similar probably have the same function, be it a functional role (i.e., ORFs
coding for similar proteins), a regulatory role (i.e., similar regulatory or biochemical pathways), or
structural properties in the case of proteins. (Note: similar sequence suggests the same structure, function, and regulation.)
• Additionally, if two sequences from different organisms are similar, there may be a common ancestor
sequence, and the sequences are then said to be homologous. (Note: similar sequences in different organisms may share a common ancestor.)
• Sequence alignment: comparing two (pairwise alignment) or more (multiple sequence alignment)
sequences by searching for a series of individual characters or character patterns that are in the same
order in the sequences. (Note: this is a form of string matching.)
• Global alignment tries to align the entire sequence in such a way as to maximize the degree of similarity
between the two sequences. (Note: the aim is to show, one way or another, that the two sequences are similar.)
• In local alignment, the alignment stops at the ends of regions of strong similarity, and a much higher
priority is given to finding these local regions than to extending the alignment to include more
neighboring pairs, as in BLAST and FASTA. (Note: shorter regions of higher similarity take priority; the alignment is not extended.)
• Dynamic programming is an efficient mathematical technique for finding an optimal alignment, but it is
still too slow for comparing large numbers of bases. (Note: dynamic programming is slow.)
• Implementations of the Needleman-Wunsch and Smith-Waterman algorithms for sequence alignment are
freely available in EMBOSS (European Molecular Biology Open Software Suite).
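
The difference between global and local alignment shows up directly in the dynamic programming recurrence. The sketch below computes only the best score for each (no traceback), using a simple match/mismatch/linear-gap scheme; the scoring values and function names are assumptions for illustration and are not the EMBOSS defaults.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]      # first row: leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                             # first column: leading gaps in b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]                                  # bottom-right cell

def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score: a cell never drops below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])                # the best cell can be anywhere
        prev = curr
    return best

print(needleman_wunsch_score("ACGT", "ACT"))
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Both functions fill a table of len(a) x len(b) cells, which is why the slides call dynamic programming too slow for very large comparisons and why heuristic tools such as BLAST and FASTA are used for database searches.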
DNA Data Analysis - Sequence Comparison and Alignment
• In an alignment, the following symbols describe each column across all sequences used in the alignment:
• A dash ( - ) represents a gap.
• An asterisk ( * ) indicates identical amino acid residues.
• A colon ( : ) indicates a conserved substitution.
• A dot ( . ) indicates a semi-conserved substitution.
Demo on BLAST and ClustalW
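
As a partial illustration of the symbol line described above, the sketch below marks only the fully identical columns with an asterisk. The colon and dot symbols need a table of conserved and semi-conserved residue groups, which the slides do not give, so they are deliberately left out. The function name `identity_line` and the example sequences are invented for the example.

```python
def identity_line(aligned):
    """Return '*' under every column where all sequences share the same residue."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # A column of gaps is not counted as identical.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)

alignment = ["MKT-AYI", "MKTQAYL", "MRT-AYI"]   # made-up aligned protein fragments
for row in alignment:
    print(row)
print(identity_line(alignment))
```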
DNA Data Analysis - Gene Prediction
• Gene prediction requires the integration of many different signals, such as promoter regions, translation
start and stop codons, reading frame periodicities, polyadenylation (polyA) signals, and, in eukaryotes,
intron splicing signals, base compositional bias between codon positions for exons and introns, and
various coding statistics.
• In prokaryotes, gene finding is made simpler by the fact that coding regions are not interrupted by
intervening sequences such as introns.
• There are three approaches to gene prediction: similarity-based, content-based, and site-based.
• Similarity-based methods make use of already determined sequences by a comparison of sequence
data.
• Content-based methods determine the overall properties of a sequence in terms of the various coding
statistics.
• Site-based methods determine transcription factor binding sites, polyA signals, start and stop codons,
splice junctions, and other specific subsequences or sequence patterns.
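
One simple coding statistic of the kind a content-based method might use is base composition by codon position. The sketch below computes the G+C fraction at positions 1, 2 and 3 of an in-frame sequence. The choice of this particular statistic is an assumption made for illustration; the slides do not say which statistics they have in mind, and the function name is hypothetical.

```python
from collections import Counter

def gc_by_codon_position(seq: str):
    """Return the G+C fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3            # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = Counter(seq[pos:usable:3])      # every third base, starting at pos
        total = sum(bases.values())
        fractions.append((bases["G"] + bases["C"]) / total if total else 0.0)
    return fractions

print(gc_by_codon_position("ATGGCCAAGTGA"))     # codons: ATG GCC AAG TGA
```

A content-based predictor would compare such values across candidate regions; a marked difference between the three positions is the kind of bias the bullet on codon positions refers to.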
Gene prediction needs the integration of different signals, such as start and stop codons, promoter
regions, and reading frame periodicities.
In prokaryotes, coding regions are not interrupted by introns.
In eukaryotes, introns are spliced out, so coding and noncoding regions have to be told apart.
Three approaches to gene prediction: similarity-based, content-based, and site-based (SCS).
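
A site-based method looks for specific short signals in the sequence. The sketch below scans for three of the signal types named above. The motif strings are illustrative assumptions (ATG for the start codon, the three stop codons in DNA spelling, and AATAAA as a polyA signal); the slides name the signal types but not their sequences, and real predictors use weighted models rather than exact matches.

```python
import re

# Motifs are illustrative assumptions, not taken from the slides.
SITES = {
    "start_codon": "ATG",
    "stop_codon": "TAA|TAG|TGA",
    "polyA_signal": "AATAAA",
}

def scan_sites(dna: str):
    """Return sorted (position, site_name, matched_text) for every motif hit."""
    dna = dna.upper()
    hits = []
    for name, pattern in SITES.items():
        # The lookahead makes overlapping occurrences show up as separate hits.
        for m in re.finditer(f"(?=({pattern}))", dna):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)

for hit in scan_sites("ATGAAATAAA"):
    print(hit)
```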
DNA Data Analysis - Gene Prediction
• A study of several coding statistics for the recognition of human, yeast, and C. elegans coding and
noncoding sequences.
DNA Data Analysis - Gene Prediction
One Pyrimidine feature
DNA Data Analysis - Phylogenetic Analysis
• A phylogenetic analysis of a family of related DNA or protein sequences is a determination of how the
family might have been derived during molecular evolution.
• Phylogenetic analysis leads to the construction of an evolutionary tree.
• The evolutionary relationships among the sequences are depicted by placing the sequences at the leaves
of the tree in such a way that the branching relationship in the tree reflects the degree to which
different sequences are related.
• Phylogenetic study performed on a gene family could also aid in the prediction of genes with
equivalent or similar functions.
• Phylogenetic analysis is closely linked to sequence alignment.
• Three methods are commonly used to derive the phylogenetic tree: the maximum parsimony method, the
distance method, and the maximum likelihood method.
DNA Data Analysis - Phylogenetic Analysis
• For parsimony analysis, the best results are obtained when the amount of variation among all pairs of
sequences is similar and the amount of variation is small.
• Parsimony is not well suited to reconstructing ancient phylogenies.
• If variation among sequences is present (some sequences are more similar than others) and the amount of
variation is intermediate, a distance method can be used.
• For the distance method, the concept of genetic distance between two sequences needs to be defined
appropriately, depending on the type of sequences under consideration and on their structural properties.
• Algorithms are also available for converting sequence similarity scores into distance scores.
• The genetic distances between sequences are then used to construct the phylogenetic tree.
• Maximum likelihood methods are particularly useful when the sequences are more variable.
• The method uses probability calculations based on an explicit evolutionary model (e.g., the F84
substitution model in the PHYLIP package or the TN93 substitution model) to find a tree that best
accounts for the variation in the sequences.
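
The first step of a distance method, turning aligned sequences into pairwise genetic distances, can be sketched as follows. The proportion of differing sites (p-distance) is corrected here with the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3). Choosing Jukes-Cantor is an assumption made for illustration; the slides only say that a suitable distance must be defined, and the function names are hypothetical.

```python
import math
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; the formula is only defined for p < 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair of sequence names to its corrected distance."""
    return {(m, n): jukes_cantor(p_distance(seqs[m], seqs[n]))
            for m, n in combinations(sorted(seqs), 2)}

example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACGA"}   # made-up aligned sequences
for pair, d in distance_matrix(example).items():
    print(pair, round(d, 4))
```

A tree-building step (for example a clustering method) would then take this matrix as input; that step is outside the scope of this sketch.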
DNA Data Analysis - Phylogenetic Analysis
• Phylogenetic analysis programs are widely available. Two main ones are:
• PHYLIP (available at http://evolution.genetics.washington.edu/phylip.html) and
• PAUP (available at http://www.lms.si.edu/PAUP/).
• Both packages provide the three main methods for phylogenetic analysis.
