
T2 Syllabus Revision Class

This document discusses DNA sequencing processes and technologies. It begins with an overview of DNA structure, replication, and repeats found in DNA. It then covers various sequencing processes like fragmentation, amplification, isolation, and sequencing of DNA fragments. Finally, it discusses sequencing technologies like Sanger sequencing and Illumina sequencing. The key aspects covered are isolation of DNA, fragmentation into smaller pieces, amplification via PCR or cloning, labeling fragments, sequencing individual fragments, assembling sequenced reads to generate contigs, and improvements in read length, accuracy and throughput over time with newer sequencing technologies.

Uploaded by

Ohhh Okay

T2 Syllabus revision class

• DNA Structure
• Replication
• Repeats in DNA
• Sequencing processes
• Sequencing technologies
Molecular basis of DNA Structure
• Polynucleotide chain – sugar-phosphate backbone
with nitrogenous bases attached to it.
• Phosphodiester Bond – the backbone of DNA
• Nucleotide has three elements – phosphate,
pentose sugar, nitrogenous base
• Pairing between nitrogenous bases is by hydrogen bonds, not
covalent chemical bonds.
• Base stacking – allows millions of base pairs to lie one
above the other
Chargaff Rule
• Erwin Chargaff: A pairs with T & C pairs with G
• The C – G pair is stronger than the A – T pair (the amount of heat
required to separate the DNA strands increases with
increase in G + C)
• Base composition – % of G + C in terms of % of total
bases; differs among species but is constant in all cells of an
organism and within a species.
Human: A 29.8%, T 31.8%, G 20.2%, C 18.2%; G + C 38.4%
• A+ G = C + T that is purines = pyrimidines
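The base-composition figures above can be reproduced for any sequence with a few lines of code. The following is a minimal sketch, not taken from the slides: the function names and the example strand are made up for illustration, and the counts are taken over both strands of the double helix, where Chargaff's rule applies.

```python
from collections import Counter

# Watson-Crick complement of each base (A-T, G-C)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of a strand (same orientation)."""
    return strand.upper().translate(COMPLEMENT)


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T in seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_percent(seq: str) -> float:
    """Return G + C as a percentage of total bases."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Hypothetical example strand; count bases over both strands of the duplex
strand = "ATGCGGCTAAT"
duplex = strand + complement_strand(strand)
composition = base_composition(duplex)
print(composition)
print("G+C %:", gc_percent(duplex))
```

In the duplex the percentages of A and T are equal, as are those of G and C, so purines (A + G) always make up half of the bases.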
Watson Crick Model
The Double Helix Structure
• Right hand twisted helix
• Ten base pairs in each helical turn
• Bases are spaced at 3.4 Å
• Each turn measures 34 Å
• Bases are perpendicular to the sugar
phosphate backbone but stacked parallel to
each other
• Grooves - The major & the minor groove,
provide binding site for the proteins.
• Read in the 5’ to 3’ direction. These labels indicate the free
carbon on the sugar-phosphate backbone (the 5’ end has a
terminal phosphate group & the 3’ end has a free OH)
• DNA length is measured in base pair (bp) units (1 kb = 1000 bp)
• Sugars lie above and below the plane containing the base pair
Organization of DNA
• DNA strand is longer than the nucleus
• DNA of the smallest human chromosome is about 14,000 μm long
• Average size of the nucleus is 6 μm
• DNA is packed as
Chromosome
• Packing ratio – Length
of DNA/length of
Chromosome
Review….
• Genome is the complete DNA sequence of one set of
chromosomes
• Contains both coding and non-coding sequences of
DNA
• DNA is a polymer called a polynucleotide
• Nucleotide unit is made of – Phosphate group, pentose
sugar (deoxyribose) and nitrogenous base
• The phosphate can attach to the sugar at 3’ or 5’
position
• Nitrogenous base always at 1’ position
• Two polynucleotide chains make a DNA molecule (double helix
structure)
• The chains are antiparallel – the 5’ end of one chain pairs with
the 3’ end of the other chain
• Chargaff rule – A pairs with T & G pairs with C
• Base composition – % G + C in a genome is fixed for a
species
• Base stacking – allows millions of base pairs to lie one
above the other
• DNA length is measured in base pair (bp) units (1 kb =
1000 bp)
• Both forward and reverse strands are read in the 5’ to 3’ direction
• DNA is a very dynamic molecule
• Satisfies the criteria for genetic material – makes a
copy of itself, codes for life, allows for changes
• DNA is packed as chromosomes
• Packing ratio – length of DNA / length of chromosome
• Replication – biological process by which DNA
makes a copy of itself
• Each strand acts as a template
• Can also be performed in vitro
Essentials for replication
• A parent strand as template
• Nucleotides containing bases adenine, guanine,
cytosine & thymine
• RNA primer – oligonucleotide containing up to
30 bases
• In vitro synthesis – a DNA primer is used
• DNA polymerase
• Some proteins and enzymes like helicase, ligase
Replication
• Biological process of
producing two identical
replicas of DNA from one
original molecule.
• DNA makes a copy of itself
• Each strand acts as a
template
• Can also be performed in
vitro
• A double-stranded molecule is converted into
two identical double-stranded DNA molecules
Essentials
• A parent strand as template
• Nucleotides containing bases adenine,
guanine, cytosine & thymine
• RNA primer – oligonucleotide containing up to
30 bases
• DNA polymerase
• Some proteins and enzymes like helicase,
ligase
Primer
Replication begins at the 3’ end of each template strand
Bases are added one at a time
The process continues until the strand is completed
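Template-directed synthesis can be mimicked in a few lines. This is a simplified sketch under stated assumptions: the function name is hypothetical, the primer and enzymes are ignored, and the template is given as a 5’ to 3’ string that is walked from its 3’ end, adding one complementary base at a time.

```python
# Watson-Crick pairing rules
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_new_strand(template_5_to_3: str) -> str:
    """Return the new strand, written 5' to 3', copied from the template.

    The template is read from its 3' end, so the new strand
    grows in its own 5' to 3' direction.
    """
    new_strand = []
    for base in reversed(template_5_to_3.upper()):
        new_strand.append(PAIR[base])  # one base added per step
    return "".join(new_strand)


template = "ATGCCGTA"  # hypothetical template, written 5' to 3'
print(synthesize_new_strand(template))
```

Copying the new strand again gives back the original template sequence, which is why each strand of the double helix can serve as a template for the other.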
What does a genome look like?
Coding Regions
• Called Genes
• Roughly 20,000 in number
• Make up 5% of the total genome
• Eukaryotic genes contain interspersed non-coding
sequences – introns
Repeated Sequences
• Function largely unknown or poorly
understood – Labeled as Junk DNA
• Repeat can be Tandem or Interspersed
Tandem Repeats

• Regions include a large number of copies of a repeated DNA sequence family
• Array can be simple or complex
• Highly repetitive
• Known as satellite DNAs
Telomere & Centromere

• Telomeres make up the ends of chromosomes
• Base sequence (T/A)xGy
• Human telomere repeat: TTAGGG
• Usually repeated about 3,000 times and can
reach up to 15,000 base pairs.
Interspersed Repeated Sequence
• Identified by Barbara McClintock 1951
• Sequences dispersed throughout the genome
• Linked to transposable elements in genome
• Can move in the genome
• Two types – Transposons & Retrotransposons
• Transposons move from one place to another by
cut and paste mechanism
• Retrotransposons move by a copy and paste mechanism
Alu Family
 5' - Part A - A5TACA6 - Part B - PolyA Tail - 3'
• Alu sequences are repetitive
DNA elements
• An estimated frequency of
500,000 to 1 million copies per
genome.
• Primate-specific
• Interspersed
• May serve as functional genes
• Retrotransposon-mediated
reinsertion throughout the
genome over 65 million years
of primate evolution
Sequencing
• Read size
• Time
• Accuracy
• Cost
Timeline of large-scale genomic DNA Sequencing
• Isolation of
DNA/specific portion
• Shear into
pieces/fragments
• Amplify
• Label
• Sequence fragments
• Generate reads
• Assemble reads in
order into contigs
Isolation of DNA
• Lyse – Breaking the cell
membrane
• Bind – Binding of nucleic
acid to silica gel membrane
• Wash - washing the nucleic
acid bound to the silica gel
membrane to remove
impurities
• Elute – Removing the DNA
from silica gel membrane
https://www.youtube.com/watch?v=qfa0hi6s35E
Fragmentation
• Longer sequences subdivided into smaller
fragments.
• Sequencing can only be performed for fairly
short strands (100 to 5000 base pairs)
• Quality of the base identification decreases
with length.
• Three methods – Physical, Chemical and
enzyme assisted.
Physical method
• Physical methods like ultrasonication, acoustic
shearing use different frequency of sound
wave to shear DNA.
• The fragments obtained are nonspecific.
• Hydrogen bonds in the double helix as well as
oxygen-carbon bonds are broken.
Enzymatic Fragmentation
• Enzyme-assisted fragmentation uses endonuclease
enzymes.
• These enzymes are part of a bacterial defense mechanism.
• Extracted from several bacteria.
• These enzymes are site-specific, so the sequence at the ends
of the fragments is known.
• Can cut both strands at the same position or leave sticky ends
• The DNA fragments produced are called restriction fragments
• The human genome produces millions of fragments
• Fragments are separated by electrophoresis
Enzymes and fragmentation
Amplification
• Process to increase the number of fragments
• Helps in isolation & identification of fragments
by gel electrophoresis
• Polymerase chain reaction(PCR)
• Cloning(DNA recombinant Technology)
PCR
• The method for selective amplification is called
the polymerase chain reaction (PCR).
• Kary B. Mullis received the Nobel Prize in 1993.
• PCR amplification requires DNA polymerase, a
pair of short, synthetic primers & nucleotides
• DNA polymerase is obtained from heat-resistant
bacteria
• Primers are complementary in sequence to each
strand of the fragment.
• The three steps are
denaturation, annealing and
elongation
• Denaturation is the unzipping of
DNA. Temperature 95 °C
• Annealing of primers at a
temperature of 50-60 °C
• Elongation of the chain in the 5’ to 3’ direction
by addition of nucleotides
• After the first cycle there is a pair of
parent strands and a pair of
synthetic strands
• At the end of the 25th cycle there are 3.4 × 10^7
fragments.
https://www.youtube.com/watch?v=ThG_02miq-4
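The fragment count after 25 cycles follows from doubling the number of copies in every cycle. A minimal sketch of that arithmetic, assuming a single starting molecule and ideal doubling; the efficiency parameter is a hypothetical addition for non-ideal reactions and is not part of the slides.

```python
def pcr_copies(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies after a number of PCR cycles.

    efficiency = 1.0 means every molecule is duplicated in each cycle,
    so the count doubles: start_copies * 2 ** cycles.
    """
    return start_copies * (1.0 + efficiency) ** cycles


for n in (1, 10, 25):
    print(n, f"{pcr_copies(n):.3g}")
```

With ideal doubling, 25 cycles give 2^25 = 33,554,432 copies, which rounds to the 3.4 × 10^7 quoted in the slide.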
DNA Cloning
• Plasmids are special DNA
in certain bacteria
• A selected fragment of
DNA can be inserted into
the plasmid by a cut and
paste mechanism
• Uses restriction enzymes.
• The recombinant DNA
is amplified when inserted into
bacteria.
Amplification and isolation

• The fragments of DNA are isolated by lysis with restriction enzymes.
• The fragments obtained are used for further analysis.
Additional information……
Sequencing
• Identification of base sequences in the
fragments
• Methods like Sanger sequencing, pyrosequencing,
Illumina sequencing etc. are employed
• The Sanger method was used to sequence the first genome
Data Interpretation
Sanger Sequencing (I Gen)
• Sequencing method
• Chain Termination Method
• Dideoxynucleotide triphosphates(ddNTPs),
terminate DNA replication and stop synthesis.
• ddNTPs — ddGTP, ddCTP,ddTTP & ddATP (4)
• Synthesized DNAs separated by gel
electrophoresis.
• Fluorescently labeled nucleotides allowed
reading of sequence
• 99.99% base accuracy
• Read length up to 500 bp
• Low throughput
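The logic of reading a sequence from chain-terminated fragments can be sketched as follows. This is a toy model, not the laboratory protocol: it assumes exactly one fragment is produced for every possible termination position, that each fragment carries the label of its terminal ddNTP, and that electrophoresis is equivalent to sorting by length. The function names and the example strand are hypothetical.

```python
import random


def chain_terminated_fragments(new_strand: str) -> list:
    """Return (length, terminal base) for every possible ddNTP stop.

    The list is shuffled to mimic the unordered mixture in the tube.
    """
    fragments = [
        (length, new_strand[length - 1])
        for length in range(1, len(new_strand) + 1)
    ]
    random.shuffle(fragments)
    return fragments


def read_sequence(fragments: list) -> str:
    """Order fragments by length, as the gel does, and read the end labels."""
    return "".join(label for _, label in sorted(fragments))


mixture = chain_terminated_fragments("GATTACA")
print(read_sequence(mixture))  # GATTACA
```

Sorting by fragment length recovers the order of the bases regardless of how the fragments were mixed.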
Illumina Sequencing(II Gen)
• Fast, high throughput, parallel sequencing.
• Library preparation done by fragmenting DNA
(tagmentation)
• Single stranded templates (fragments)
attached to flow cell
• Cluster formation by amplification on flow cell.
• Sequencing
Sequencing by Synthesis
• Requires DNA fragments,
bases attached with
terminator which can
fluoresce, polymerase
• Base sequencing done
one at a time
• The emitted light is
imaged and read as a base.
Sequencing Read Options
• Two types: single-read and paired-end
sequencing
• Single-read sequencing reads DNA fragments
from one end to the other
• In paired-end sequencing, after a DNA
fragment is read from one end, the process
starts again in the other direction. Two reads
generated. Common method.
Nanopore Sequencing (III Gen)
• Conductivity of ion currents in the pore
changes when the strand of nucleic acid
passes through it.
• The flow of ion current depends on the shape
of the molecule translocating through the
pore. Since nucleotides have different shapes,
each nucleotide is recognized by its effect on
the change of the ionic current
• Sample preparation is minimal and less
time-consuming.
• Long read lengths.
• No amplification or ligation steps
required before sequencing.
• The challenge is to optimize the speed
of DNA translocation through the
nanopore to improve measurement
accuracy and reduce the
high error rates of base calling
Base Calling
• Base calling is the process of assigning bases along
with the confidence level for each base.
• Computer programs: Phred (base calling), DNA Baser
Assembler.
• Shows high base calling accuracy.
• Each cycle generates 320,000 images (.TIFF).
• Each image is approx. 7 MB, so the images come to
2.24 × 10^6 MB (2.24 TB) of total data.
• Sequencing a fragment of 100 bp requires 100 cycles.
• TIFF images are deleted on the fly
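The data volume quoted above is a simple product, shown here as a short calculation. It assumes decimal units (1 TB = 10^6 MB) and reads the 2.24 TB figure as the volume of one cycle's images; the 100-cycle total is an extrapolation for illustration, not a number from the slides.

```python
IMAGES_PER_CYCLE = 320_000
MB_PER_IMAGE = 7          # approximate size of one TIFF image
CYCLES_PER_READ = 100     # one cycle per base of a 100 bp fragment

mb_per_cycle = IMAGES_PER_CYCLE * MB_PER_IMAGE   # 2,240,000 MB
tb_per_cycle = mb_per_cycle / 1_000_000          # decimal terabytes
tb_if_all_kept = tb_per_cycle * CYCLES_PER_READ  # if no image were deleted

print(f"{mb_per_cycle:.3g} MB per cycle = {tb_per_cycle} TB")
print(f"{tb_if_all_kept:.0f} TB over {CYCLES_PER_READ} cycles if images were kept")
```

Under these assumptions, keeping every image for a 100-cycle run would take hundreds of terabytes, which is a plausible reason for deleting the images as soon as the bases are called.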


Reads
• Millions of raw reads generated
• Raw reads are the sequences created after the imaging
process; they give no information about sequence alignment
• Each sequence is described over 4 lines
• Data is stored in public repositories
as fastq.gz files.
• Essentially they are compressed text files
• Can be manipulated with standard Unix tools:
 cat, head, grep, more, less
• Same format regardless of sequencing protocol
(RNA-seq, DNA-seq etc)
• For paired-end data gives two files
– they should have the same number of lines
– the sequences should be in the same order
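The four-line record layout and the paired-end checks can be sketched with a small parser. This is a minimal illustration, not a replacement for an established FASTQ library: the function and class names are hypothetical, records are assumed to be exactly four lines each (no wrapped sequences), and read names in the two files are assumed to differ at most by a trailing /1 or /2 or by text after the first space.

```python
import gzip
from itertools import zip_longest
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # line 1, without the leading "@"
    sequence: str   # line 2
    quality: str    # line 4 (line 3 is the "+" separator)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record for every four lines of FASTQ text."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Parse a gzip-compressed FASTQ file opened in text mode."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)


def pairs_in_same_order(reads1, reads2) -> bool:
    """True if both inputs have the same number of reads, in the same order."""
    def key(name: str) -> str:
        token = name.split(" ", 1)[0]
        return token[:-2] if token.endswith(("/1", "/2")) else token

    for a, b in zip_longest(reads1, reads2):
        if a is None or b is None or key(a.name) != key(b.name):
            return False
    return True
```

For a paired-end run one would call `pairs_in_same_order(read_fastq_gz(r1_path), read_fastq_gz(r2_path))`, where the two paths are the user's own files.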
Errors

• Experimental errors are minimized by
redundancy, careful checking and sequencing
segments multiple times.
• Read errors are generated due to
- fidelity of replication
- occurrence of variation in the template
Base quality
• Base-calling software such as Phred reads the sequence.
• The ambiguity of each base is reported as its base quality.
• Base quality (Q) is derived from the base caller’s estimate of the
probability that the base was called incorrectly.
• Q = -10 log10(p), where p is the probability
that the base called is incorrect.
• p is the ratio of incorrect base calls over the entire
cluster in terms of light intensity.
• The Phred quality score Q is an easy representation of
base quality
• Characters in the base quality line correspond one-to-one with
characters in the sequence line in a typical
FASTQ file.
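The relation Q = -10 log10(p) is easy to check numerically. A minimal sketch with hypothetical function names:

```python
import math


def phred_quality(p_error: float) -> float:
    """Q = -10 * log10(p), where p is the probability of a wrong base call."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse relation: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for p in (0.1, 0.01, 0.001, 0.0001):
    print(f"p = {p:<7} Q = {phred_quality(p):.0f}")
```

Every increase of 10 in Q corresponds to a tenfold drop in the error probability, so Q = 30 means one expected error in 1,000 base calls.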
Base Quality Representation
• Base quality is stored as an ASCII-encoded version of
Q = -10 log10(p)
• The method of conversion is called Phred 33.
• Base quality is rounded to the nearest integer and
then 33 is added
• The integer is then mapped to an ASCII character.
• ASCII is the character encoding standard for
electronic communication
American Standard Code for Information Interchange (ASCII)
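The Phred 33 conversion described above takes only a couple of lines. A minimal sketch with hypothetical function names; the quality string in the example is invented.

```python
def encode_phred33(qualities: list) -> str:
    """Round each Q to the nearest integer, add 33, map to an ASCII character."""
    return "".join(chr(round(q) + 33) for q in qualities)


def decode_phred33(quality_line: str) -> list:
    """Recover integer Q values from a FASTQ quality line."""
    return [ord(char) - 33 for char in quality_line]


print(encode_phred33([0, 30, 40]))   # !?I
print(decode_phred33("II?5!"))       # [40, 40, 30, 20, 0]
```

The offset of 33 is used because ASCII code 33 ("!") is the first printable, non-space character, so Q = 0 maps to "!" and every higher quality maps to a visible character.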
Assembly of Reads
• Billions of raw reads generated by sequencing
machines create terabytes of data.
• The processing and assembly of sequences is
done by assemblers like Celera Assembler,
Arachne etc.
• Sequence assembly is an algorithm-driven
automated process.
• The process uses sequence overlaps to place
sequence reads in the correct order
Assembly approaches
• De novo approach – the sequence has to be
assembled from scratch.
• Mapping – map the fragments onto a previously
sequenced genome
Type of sequencing
• Mapping
Reads generated can be mapped to the reference
genome if the genome was sequenced earlier.

• De Novo assembly
Sequencing a novel genome where there is no
reference sequence available for alignment
Analysis by Mapping
• Reads from the experiment are mapped to the best-fit region
in the reference sequence.
• For a complex genome, a sequence may map to
multiple locations
• BLAST (basic local alignment search tool)
is used in modern genome analysis
• It compares nucleotide or protein sequences
to sequence databases and calculates the
statistical significance of matches.
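The idea that one read can map to several locations can be shown with a naive exact-match search. This is deliberately not how BLAST or production read mappers work (they tolerate mismatches and use indexes and scoring statistics); it is a toy sketch with a hypothetical function name and made-up sequences.

```python
def map_read(reference: str, read: str) -> list:
    """Return every 0-based position where read matches reference exactly."""
    positions = []
    pos = reference.find(read)
    while pos != -1:
        positions.append(pos)
        pos = reference.find(read, pos + 1)  # allow overlapping hits
    return positions


reference = "ACGTACGTTT"  # hypothetical reference with a repeated region
print(map_read(reference, "ACGT"))  # [0, 4]  -> ambiguous placement
print(map_read(reference, "TTT"))   # [7]     -> unique placement
print(map_read(reference, "GGG"))   # []      -> unmapped
```

Reads that fall inside repeats return more than one position, which is the ambiguity that repeated sequences introduce in complex genomes.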
de novo Assembly
• Sequence reads are assembled as contigs.
• Contiguous long sequences are obtained from
reads having common sequences
• The coverage quality of de novo sequence
data depends on the size and continuity of the
contigs (ie, the number of gaps in the data).
• de novo assemblers include greedy algorithm
assemblers (SEQAID, CAP) and de
Bruijn graph assemblers (ABySS, SPAdes)
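The greedy, overlap-based approach can be sketched in a few lines. This is a toy illustration of the general idea only, not the algorithm of SEQAID, CAP or any other named assembler: it assumes error-free reads from one strand, ignores reads contained inside other reads, and uses hypothetical function names and a made-up minimum overlap of 3 bases.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining gaps separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

When reads do not overlap, the function returns several contigs instead of one, which corresponds to the gaps mentioned in the slide.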
