DNA Sequencing
E-content
Course: M.Sc.
Subject: Biochemistry, Biotechnology, Microbiology, Environmental Science
Topic: Instrumentation and Analytical Techniques
Subtopic: DNA Sequencing
Prepared by: Dr. Prabhakar Singh
Department : Biochemistry
Faculty : Science
Email: [email protected]
Contact: +91-9454695363
Note: This E-content has been prepared as reading material for students without any
commercial interest. Original contributors are acknowledged.
Maxam-Gilbert sequencing
Allan Maxam and Walter Gilbert published a DNA sequencing method in 1977 based on
chemical modification of DNA and subsequent cleavage at specific bases.
Also known as chemical sequencing, this method allowed purified samples of double-
stranded DNA to be used without further cloning. This method's use of radioactive labeling
and its technical complexity discouraged extensive use after refinements in the Sanger
methods had been made.
Maxam-Gilbert sequencing requires radioactive labeling at one 5' end of the DNA and
purification of the DNA fragment to be sequenced.
Chemical treatment then generates breaks at a small proportion of one or two of the four
nucleotide bases in each of four reactions (G, A+G, C, C+T).
The fragments in the four reactions are electrophoresed side by side in denaturing
acrylamide gels for size separation.
To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a
series of dark bands, each corresponding to a radiolabeled DNA fragment, from which the
sequence may be inferred.
Principle
One of the first DNA sequencing techniques, based on chemical reagents, was developed by
Maxam and Gilbert (1977). The method is briefly described as follows.
A strand of source DNA is labeled at one end with ³²P, and the two strands of DNA are then
separated. The labeled DNA is distributed into four samples (in separate tubes).
Each sample is subjected to treatment with a chemical that specifically destroys one
(G, C) or two bases (A + G, T + C) in the DNA. Thus, the DNA strands are partially
digested in four samples at sites G, A + G, T + C and C. This results in the formation of a
series of labeled fragments of varying lengths.
The actual length of the fragment depends on the site at which the base is destroyed
from the labeled end. Thus for instance, if there are C residues at positions 4, 7, and
10 away from the labeled end, then the treatment of DNA that specifically destroys C
will give labeled pieces of length 3, 6 and 9 bases.
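The length rule in the example above can be checked with a short Python sketch. It assumes positions are counted from 1 at the labeled end and that the destroyed base is not part of the labeled piece; the function name is illustrative only.

```python
def labeled_fragment_lengths(cleavage_positions: list[int]) -> list[int]:
    """Labeled fragment lengths produced by cleaving at the given positions.

    Positions are 1-based, counted from the labeled end. The modified base is
    destroyed, so a cut at position p leaves a labeled piece of p - 1 bases.
    """
    return sorted(p - 1 for p in cleavage_positions)


# Worked example from the text: C residues at positions 4, 7 and 10.
print(labeled_fragment_lengths([4, 7, 10]))  # [3, 6, 9]
```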
The labeled DNA fragments obtained in the four tubes are separated side by side by
electrophoresis and detected by autoradiography.
The base sequence of the DNA can then be read from the pattern of bands on the
gel.
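How the four lanes (G, A+G, C, C+T) combine into a sequence can be sketched as follows. This is an idealized simulation under stated assumptions: the sequence "GATCCTAG" is hypothetical, every susceptible base is cut in some molecule, there are no background bands, and the function names are illustrative. A band in G marks G; a band in A+G but not G marks A; a band in C marks C; a band in C+T but not C marks T.

```python
# Which bases each Maxam-Gilbert reaction cleaves (from the text).
LANES = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}


def run_lanes(seq: str) -> dict[str, set[int]]:
    """Simulate an ideal gel: labeled fragment lengths present in each lane.

    seq is read from the labeled end; a cut at 1-based position p gives a
    labeled fragment of p - 1 bases.
    """
    lanes: dict[str, set[int]] = {name: set() for name in LANES}
    for pos, base in enumerate(seq, start=1):
        for name, cleaved in LANES.items():
            if base in cleaved:
                lanes[name].add(pos - 1)
    return lanes


def read_ladder(lanes: dict[str, set[int]], n_bases: int) -> str:
    """Read bands from the shortest fragment upward, one position per band."""
    calls = []
    for length in range(n_bases):
        if length in lanes["G"]:
            calls.append("G")  # band in G (and A+G)
        elif length in lanes["A+G"]:
            calls.append("A")  # band in A+G only
        elif length in lanes["C"]:
            calls.append("C")  # band in C (and C+T)
        elif length in lanes["C+T"]:
            calls.append("T")  # band in C+T only
        else:
            calls.append("N")  # no band: base cannot be called
    return "".join(calls)


seq = "GATCCTAG"
print(read_ladder(run_lanes(seq), len(seq)))  # GATCCTAG
```

On a real gel the shortest fragments migrate furthest, so the ladder is read from the bottom of the gel upward.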
Figure: Maxam and Gilbert method for DNA sequencing (fragments separated by electrophoresis).
SANGER SEQUENCING / CHAIN-TERMINATOR / DIDEOXY SEQUENCING
Once a dideoxynucleotide (ddNTP) analog is incorporated at the growing point of the DNA chain,
the 3′ end lacks a hydroxyl group and is no longer a substrate for chain elongation.
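The chain-termination principle can be illustrated with an idealized Python sketch. Assumptions: the template "TACGGA" is hypothetical and written 3'→5', every position yields a terminated fragment in the matching ddNTP reaction, and lengths count only bases added after the primer. Sorting all bands by length recovers the synthesized strand, which is complementary to the template.

```python
# Base pairing used to derive the synthesized strand from the template.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sanger_lanes(template_3_to_5: str) -> dict[str, list[int]]:
    """Chain-terminated fragment lengths (bases added after the primer) per ddNTP reaction.

    The template is written 3'->5', so the new strand is its complement read 5'->3'.
    Idealized: every position yields a terminated fragment in the matching lane.
    """
    synthesized = template_3_to_5.translate(COMPLEMENT)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # A ddNTP of this base incorporated here leaves no 3'-OH, so the chain stops.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by length; each band's lane gives the base at that position."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


print(read_gel(sanger_lanes("TACGGA")))  # ATGCCT
```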
PYROSEQUENCING
In pyrosequencing, DNA polymerase moves along a single-stranded template while each of the
four nucleoside triphosphates is fed in sequentially and then removed.
If the supplied nucleotide is incorporated, pyrophosphate is released, and this is
detected by an enzyme cascade that emits light.
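The cycle of feeding one nucleotide at a time can be simulated to show why the light signal reports homopolymer length. This is a minimal sketch: it works directly on the strand being synthesized, uses a hypothetical dispensation order "TACG", and ignores signal decay and incomplete incorporation.

```python
def flowgram(synthesized: str, order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing signals for the strand being synthesized.

    Nucleotides are dispensed one at a time in a repeating order (the "TACG"
    default is a hypothetical choice). A dispensed nucleotide is incorporated
    across a whole homopolymer run, so the signal for that flow is proportional
    to the run length; a non-matching flow gives zero signal.
    """
    if set(synthesized) - set(order):
        raise ValueError("sequence may only contain bases present in the dispensation order")
    signals = []
    i = flow = 0
    while i < len(synthesized):
        nucleotide = order[flow % len(order)]
        run = 0
        while i < len(synthesized) and synthesized[i] == nucleotide:
            run += 1
            i += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


print(flowgram("TTAGGGC"))
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
```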
The acceptable read length of pyrosequencing is currently about 200 nucleotides, i.e.
much less than is achieved with Sanger sequencing. However, many modifications are
being made to the reaction conditions to extend the read length. For example, the
addition of ssDNA-binding protein to the reaction mixture increases read length,
facilitates sequencing of difficult templates, and provides flexibility in primer design.
PYROGRAM
Pyrogram of the raw data obtained from liquid-phase pyrosequencing.
Signals are proportional to the number of bases incorporated (one, two, three or four).
The order of nucleotide additions is indicated below the pyrogram and the resulting
sequence is indicated above the pyrogram. (Redrawn with permission from Ronaghi 2001.)
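Reading a pyrogram amounts to rounding each normalized peak height to a whole number of bases. The sketch below assumes heights are already normalized so that a single-base incorporation is about 1.0; the peak heights are hypothetical, not data from the figure.

```python
def decode_pyrogram(order: str, heights: list[float]) -> str:
    """Turn normalized peak heights (one per flow) back into bases.

    Assumes a single-base incorporation gives a height of about 1.0;
    each height is rounded to the nearest whole number of bases.
    """
    bases = []
    for flow, height in enumerate(heights):
        count = max(round(height), 0)  # clip small negative noise to zero
        bases.append(order[flow % len(order)] * count)
    return "".join(bases)


# Hypothetical peak heights for the dispensation order T, A, C, G, T, A, C.
heights = [2.1, 0.9, 0.05, 3.2, 0.1, -0.02, 1.1]
print(decode_pyrogram("TACG", heights))  # TTAGGGC
```

Because the signal grows with run length, the relative difference between, say, a 7-base and an 8-base peak is small, so simple rounding of this kind tends to be least reliable for long homopolymers.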
Sanger Sequencing vs. Pyrosequencing
The shorter read length of pyrosequencing can make the process of genome assembly more
difficult, particularly for sequences containing a large amount of repetitive DNA.
The templates for pyrosequencing can be prepared either by solid-phase template
preparation (streptavidin-coated magnetic beads) or by enzymatic template
preparation (apyrase + exonuclease).
NOTE: In 2013 Roche announced that it would shut down development of its 454
pyrosequencing technology and phase out 454 machines completely in 2016. Roche also
produced software tools that optimised the analysis of 454 sequencing data.
GS Run Processor converts the raw images generated by a sequencing run into
intensity values. The process consists of two main steps: image processing and signal
processing. The software also performs normalization, signal correction and base calling,
and assigns quality scores to individual reads.
ION TORRENT / ION SEMICONDUCTOR SEQUENCING
The technology, licensed from DNA Electronics Ltd and developed by Ion Torrent
Systems Inc., was released in February 2010. Ion Torrent has marketed its
machine as a rapid, compact and economical benchtop sequencer that can be used in a
large number of laboratories.
Roche's 454 Life Sciences is partnering with DNA Electronics on the development of a
long-read, high-density semiconductor sequencing platform using this technology.
Sequencing characteristics
As of February 2011, the per-base accuracy achieved in house by Ion Torrent on its
semiconductor sequencer was 99.6%, based on 50-base reads, with 100 Mb per run. The
read length as of February 2011 was 100 base pairs. The accuracy for homopolymer
runs five bases in length was 98%. Later releases achieved a read length of 400 base
pairs.
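The accuracy figures above can be restated on the Phred quality scale and as expected errors per read. The Phred scale (Q = -10·log10 of the error probability) is a standard convention that the text does not itself use, and the expected-error calculation assumes errors are independent across positions; this is an illustrative sketch, not Ion Torrent's own metric.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality Q = -10 * log10(error probability)."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Mean number of incorrect bases per read, assuming independent errors."""
    return (1 - accuracy) * read_length


print(round(phred_quality(0.996), 1))        # 24.0 (per-base figure)
print(round(expected_errors(0.996, 50), 2))  # 0.2 (for a 50-base read)
print(round(phred_quality(0.98), 1))         # 17.0 (5-base homopolymer figure)
```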
Limitations
One limitation of this system is the short read length compared to other
sequencing methods such as Sanger sequencing or pyrosequencing.
Longer read lengths are beneficial for de novo genome assembly. Ion
Torrent semiconductor sequencers produce an average read length of
approximately 400 nucleotides per read.
ILLUMINA SEQUENCING
1. Library preparation
Double-stranded DNA is cleaved by transposomes. The cut ends are repaired, and adapters,
indices, primer binding sites and terminal sequences are added to each strand of the DNA.
2. Reduced cycle amplification
The next step is called reduced cycle amplification. During this step,
sequences for primer binding, indices, and terminal sequences are added.
Indices are usually six base pairs long and are used during DNA sequence
analysis to identify samples. Indices allow for up to 96 different samples to
be run together. During analysis, the computer will group all reads with the
same index together.
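Grouping reads by index (demultiplexing) can be sketched in a few lines. Assumptions: each read is given as a hypothetical (index, sequence) pair, the sample-sheet dictionary layout is illustrative, and only exact index matches are accepted; production demultiplexing software may tolerate index mismatches.

```python
from collections import defaultdict


def demultiplex(reads, sample_sheet):
    """Group read sequences by sample, using each read's index.

    reads: iterable of (index, sequence) pairs (hypothetical layout).
    sample_sheet: maps each index sequence to a sample name.
    Reads whose index is not on the sheet are collected under "undetermined".
    """
    groups = defaultdict(list)
    for index, sequence in reads:
        groups[sample_sheet.get(index, "undetermined")].append(sequence)
    return dict(groups)


sheet = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("ACGTAC", "TTAGGC"),
    ("AAAAAA", "GGGCCC"),
]
print(demultiplex(reads, sheet))
# {'sample_1': ['GATTACA', 'TTAGGC'], 'sample_2': ['CCGGTTA'], 'undetermined': ['GGGCCC']}
```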
The terminal sequences are used for attaching the DNA strand to the flow
cell. Illumina uses a "sequence by synthesis" approach.
This process takes place inside of an acrylamide-coated glass flow
cell. The flow cell has oligonucleotides (short nucleotide sequences)
coating the bottom of the cell, and they serve to hold the DNA strands in
place during sequencing.
The oligos match the two kinds of terminal sequences added to the DNA
during reduced cycle amplification. As the DNA enters the flow cell, one of
the adapters attaches to a complementary oligo.
3. Bridge amplification
Once attached, cluster generation can begin. The goal is to create hundreds of
identical strands of DNA. Some will be the forward strand; the rest, the reverse.
Clusters are generated through bridge amplification.
Polymerases move along a strand of DNA, creating its complementary strand. The
original strand is washed away, leaving only the reverse strand. At the top of the
reverse strand there is an adapter sequence. The DNA strand bends and attaches
to the oligo that is complementary to the top adapter sequence. Polymerases attach
to the reverse strand, and its complementary strand (which is identical to the
original) is made.
The now double stranded DNA is denatured so that each strand can separately
attach to an oligonucleotide sequence anchored to the flow cell. One will be the
reverse strand; the other, the forward. This process is called bridge amplification,
and it happens for thousands of clusters all over the flow cell at once.
Over and over again, DNA strands will bend and attach to oligos.
Polymerases will synthesize a new strand to create a double stranded
segment, and that will be denatured so that all of the DNA strands in one area
are from a single source (clonal amplification).
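A back-of-the-envelope calculation relates the number of amplification cycles to cluster size. Taking the figure of about 1,000 identical copies per cluster quoted under Data analysis, and assuming (idealized, not stated in the text) that every strand doubles in each cycle, the sketch gives a lower bound on the cycles needed; with less than perfect efficiency, more cycles would be required.

```python
import math


def min_cycles(target_copies: int) -> int:
    """Fewest cycles to reach target_copies if every strand doubles each cycle."""
    return math.ceil(math.log2(target_copies))


print(min_cycles(1000))       # 10
print(2 ** min_cycles(1000))  # 1024
```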
At the end of clonal amplification, all of the reverse strands are washed off the flow
cell, leaving only forward strands. Primers attach to the forward strands and a
polymerase adds fluorescently tagged nucleotides to the DNA strand. Only one base
is added per round. A reversible terminator is on every nucleotide to prevent multiple
additions in one round. Using the four-colour chemistry, each of the four bases has a
unique emission, and after each round, the machine records which base was added.
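With four-colour chemistry, the simplest possible base call picks the brightest dye channel in each round. This is a minimal sketch with hypothetical intensities; real base callers also correct for effects such as dye crosstalk and phasing, which are ignored here.

```python
def call_four_colour(cycles: list[dict[str, float]]) -> str:
    """Call the base whose dye channel is brightest in each cycle."""
    return "".join(max(signals, key=signals.get) for signals in cycles)


# Hypothetical per-cycle intensities for the four dye channels.
cycles = [
    {"A": 0.90, "C": 0.10, "G": 0.05, "T": 0.20},
    {"A": 0.10, "C": 0.20, "G": 0.85, "T": 0.10},
    {"A": 0.05, "C": 0.10, "G": 0.10, "T": 0.95},
]
print(call_four_colour(cycles))  # AGT
```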
Starting with the launch of the NextSeq and later the MiniSeq, Illumina introduced a
new two-colour sequencing chemistry. Nucleotides are distinguished by one of the
two colours (red or green), by no colour ("black"), or by both colours (appearing
orange, a mixture of red and green).
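Two-colour decoding can be sketched as thresholding the red and green channels and looking up the combination. The channel-to-base mapping below is an assumption (as commonly documented for NextSeq-style chemistry, but not stated in the text), and the threshold and intensities are hypothetical.

```python
# ASSUMED mapping of (red present, green present) to bases, as commonly documented
# for NextSeq-style two-colour chemistry; verify against the instrument's documentation.
TWO_COLOUR_MAP = {
    (True, True): "A",    # both colours, appears orange
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no colour ("black")
}


def call_two_colour(red: list[float], green: list[float], threshold: float = 0.5) -> str:
    """Call one base per cycle from thresholded red and green intensities."""
    return "".join(
        TWO_COLOUR_MAP[(r >= threshold, g >= threshold)] for r, g in zip(red, green)
    )


# Hypothetical normalized intensities for four cycles.
print(call_two_colour([0.9, 0.8, 0.1, 0.05], [0.85, 0.1, 0.9, 0.1]))  # ACTG
```

One consequence of this design is that a dark cycle is read as a base, so a cluster that stops emitting light would be called as a run of that base.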
Once the DNA strand has been read, the strand that was just added is washed away.
Then, the index 1 primer attaches, polymerizes the index 1 sequence, and is washed
away. The strand forms a bridge again, and the 3' end of the DNA strand attaches to
an oligo on the flow cell. The index 2 primer attaches, polymerizes the sequence, and
is washed away.
A polymerase sequences the complementary strand on top of the arched strand. They
separate, and the 3' end of each strand is blocked. The forward strand is washed
away, and the process of sequence by synthesis repeats for the reverse strand.
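Because the second read is synthesized on the opposite strand, it starts at the other end of the insert and is reported as a reverse complement. The sketch below assumes both reads are taken directly from the insert ends (adapters and indices ignored); the fragment and read length are hypothetical.

```python
# Complement table; N is kept as N for undetermined bases.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read 1 from the forward strand; read 2 from the start of the reverse strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGCGTACGTTAGC"
r1, r2 = paired_reads(fragment, 5)
print(r1, r2)  # ATGCG GCTAA
print(reverse_complement(r2) == fragment[-5:])  # True
```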
Data analysis
The sequencing occurs for millions of clusters at once, and each cluster has ~1,000
identical copies of a DNA insert. The sequence data are analyzed by finding reads
with overlapping regions and lining them up into longer contiguous sequences, called
contigs. If a reference sequence is known, the contigs are then compared to it for
variant identification.
This piecemeal process allows scientists to see the complete sequence even though
an unfragmented sequence was never run; however, because Illumina read lengths
are not very long (HiSeq sequencing can produce read lengths of around 90 bp), it
can be a struggle to resolve short tandem repeat regions.
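The idea of joining overlapping reads into contigs can be illustrated with a toy greedy merger: repeatedly join the pair of sequences with the longest suffix-prefix overlap. This is a swapped-in teaching technique, not Illumina's analysis method; real assemblers use more robust, typically graph-based, approaches and handle sequencing errors and both strands, which this sketch ignores. The reads and the minimum overlap of 3 are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATC"]))  # ['ATGGCGTGCAATC']
```

As the text notes for short tandem repeats, a greedy merge like this can join the wrong reads when a repeat is longer than the reads themselves.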
SINGLE-MOLECULE REAL-TIME (SMRT) SEQUENCING
Each of the four DNA bases is attached to one of four different fluorescent
dyes. When a nucleotide is incorporated by the DNA polymerase, the
fluorescent tag is cleaved off and diffuses out of the observation area of the
zero-mode waveguide (ZMW), where its fluorescence is no longer observable.
2. PHOSPHOLINKED NUCLEOTIDE
For each of the nucleotide bases, there is a corresponding fluorescent dye molecule
that enables the detector to identify the base being incorporated by the DNA
polymerase as it performs the DNA synthesis.
The fluorescent dye molecule is attached to the phosphate chain of the nucleotide.
When the nucleotide is incorporated by the DNA polymerase, the fluorescent dye is
cleaved off with the phosphate chain as a part of a natural DNA synthesis process
during which a phosphodiester bond is created to elongate the DNA chain. The
cleaved fluorescent dye molecule then diffuses out of the detection volume so that the
fluorescent signal is no longer detected.
History
Pacific Biosciences (PacBio) commercialized SMRT sequencing in 2011, after
releasing a beta version of its RS instrument in late 2010.
On 19 September 2018, Pacific Biosciences (PacBio) released the Sequel 6.0
chemistry, synchronizing the chemistry version with the software version.
Performance is reported separately for large-insert libraries made from high-molecular-weight
DNA and for shorter-insert libraries below ~15,000 bases in length. For larger
templates, average read lengths are up to 30,000 bases. For shorter-insert
libraries, average read lengths are up to 100,000 bases, while reading the same
molecule repeatedly in a circle. These shorter-insert libraries then yield up to 50 billion
bases from a single SMRT Cell.
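Reading a circular template repeatedly means one long polymerase read covers the same insert several times. The sketch below estimates the number of complete passes; it ignores adapter sequence and partial passes, and the read and insert lengths are hypothetical.

```python
def full_passes(polymerase_read_length: int, insert_length: int) -> int:
    """Complete passes over the insert in one circular read (idealized).

    Ignores adapter sequence and partial passes; both arguments are in bases.
    """
    if insert_length <= 0:
        raise ValueError("insert_length must be positive")
    return polymerase_read_length // insert_length


# Hypothetical values: a 100,000-base read over a 10,000-base insert.
print(full_passes(100_000, 10_000))  # 10
```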
8M Chip
In April 2019 the company released a new SMRT Cell with eight million
ZMWs, increasing the expected throughput per SMRT Cell by a factor of
eight. Early-access customers in March 2019 reported, over 58 customer-run
cells, a raw yield of 250 Gb per cell with templates about 15 kb in length, and
67.4 Gb per cell with templates of higher-molecular-weight DNA.
System performance is now reported either as high-molecular-weight continuous
long reads or as pre-corrected HiFi (also known as CCS, circular consensus sequencing)
reads. For high-molecular-weight reads, roughly half of all reads are longer than 50 kb.
Application
1. Single-molecule real-time sequencing may be applicable for a broad range of genomics
research.
2. For de novo genome sequencing, read lengths from the single-molecule real-time
sequencing are comparable to or greater than that from the Sanger sequencing method
based on dideoxynucleotide chain termination.
3. The same DNA molecule can be resequenced independently by creating the circular
DNA template and utilizing a strand displacing enzyme that separates the newly
synthesized DNA strand from the template. In August 2012, scientists from the Broad
Institute published an evaluation of SMRT sequencing for SNP calling.
4. The dynamics of polymerase can indicate whether a base is methylated. Scientists
demonstrated the use of single-molecule real-time sequencing for detecting methylation
and other base modifications. In 2012 a team of scientists used SMRT sequencing to
generate the full methylomes of six bacteria. In November 2012, scientists published a
report on genome-wide methylation of an outbreak strain of E. coli.
5. Long reads make it possible to sequence full gene isoforms, including the 5' and 3'
ends. This type of sequencing is useful to capture isoforms and splice variants.
6. SMRT sequencing has several applications in reproductive medical genetics research
when investigating families with suspected parental gonadal mosaicism. Long reads
enable haplotype phasing in patients to investigate parent-of-origin of mutations. Deep
sequencing enables determination of allele frequencies in sperm cells, of relevance for
estimation of recurrence risk for future affected offspring.
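Resequencing the same circular molecule (point 3 above) allows random errors in individual passes to be outvoted. The sketch shows a per-position majority vote; it assumes the passes are already aligned to equal length with substitution errors only, and the pass sequences are hypothetical. Real consensus callers are more sophisticated and also handle insertions and deletions.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Per-position majority vote across pre-aligned, equal-length passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("need at least one pass, all aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hypothetical passes over one molecule, each with a different random error.
passes = ["ACGTTAGC", "ACGATAGC", "ACGTTAGG", "TCGTTAGC", "ACGTTAGC"]
print(consensus(passes))  # ACGTTAGC
```

An odd number of passes avoids two-way ties at a position.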
OXFORD NANOPORE
The MinION device is four inches long and draws power from a USB port. It
decodes DNA directly as the molecule is drawn through a nanopore suspended in a
membrane at a rate of 450 bases per second.
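The quoted translocation rate translates directly into time per molecule. This sketch assumes a constant rate of 450 bases per second through a single pore; the molecule length is hypothetical. Real flow cells read many strands in parallel through separate pores.

```python
def translocation_time_s(length_bases: int, rate_bases_per_s: float = 450.0) -> float:
    """Seconds for one strand to pass through a single pore at a constant rate."""
    return length_bases / rate_bases_per_s


print(translocation_time_s(45_000))  # 100.0
```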
MinION: this portable protein nanopore sequencing USB device has been
commercially available since May 2015.
GridION X5: this desktop device has been commercially available since March
2017. The device processes up to five MinION Flow Cells and enables generation
of up to 100 Gb of data per run.
Metrichor: this spinout company from Oxford Nanopore was set up to provide
end to end solutions for biological analyses, using nanopore sensing technologies.
Use the in-tip detection method for dsDNA concentrations of 5-1200 ng/µL; for an
extended range, switch the holder to the TrayCell, which extends the measurable
concentration range up to 8500 ng/µL dsDNA.
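The dsDNA concentrations above are typically estimated from absorbance at 260 nm. The sketch uses the widely quoted conversion (an A260 of 1.0 over a 1 cm path corresponds to about 50 ng/µL dsDNA), which is not stated in the text; the function name and example reading are hypothetical, and the instruments may apply their own internal factors.

```python
# Widely used conversion: an A260 of 1.0 over a 1 cm path ~ 50 ng/uL dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_ng_per_ul(a260: float, path_length_cm: float = 1.0, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm.

    Absorbance scales linearly with path length (Beer-Lambert law), so the
    reading is first normalized to a 1 cm path.
    """
    return a260 / path_length_cm * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


print(dsdna_ng_per_ul(0.5))  # 25.0
```

The path-length term is why the same absorbance reading corresponds to a higher concentration when it is measured over a shorter optical path.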
All the original contributors of the concepts and findings published are
gratefully acknowledged while preparing this e-content for the
students of Biochemistry and allied sciences.