DNA Sequencing
(2023 - 2024)
Table of Contents
1. Introduction
6. Conclusion
7. References
1. Introduction
The plummeting cost of DNA sequencing makes it possible to approach
such a grand goal. Yet the current state-of-the-art clinical research toolkit
contains blunt instruments. The gold-standard clinical studies, randomized
controlled trials (RCTs), seldom make use of molecular data, and even more
rarely make use of the rapidly growing availability of vast new categories of
"omic" data.
The RCT has been a bedrock of clinical research in the quest to establish the
safety and efficacy of drugs and protocols. Many lessons have been learned,
usually through hard experience. But the RCT has had limitations that have
led to years of delays in drug development and, remarkably, questions about
its power. Randomizing patients to one of two treatments allows the
estimator of the average treatment effect to capture the difference between
the mean outcome under the treatment of interest and the mean outcome
under the alternative (Eichler et al., 2021).
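The role of randomization in that estimator can be illustrated with a minimal sketch. It assumes a simulated trial with a constant treatment effect and Gaussian patient-level noise; the function names, sample size, and effect size are hypothetical and are not taken from Eichler et al.

```python
import random
import statistics


def simulate_rct(n_patients, true_effect, seed=0):
    """Randomize patients to control (0) or treatment (1) and simulate outcomes."""
    rng = random.Random(seed)
    arms, outcomes = [], []
    for _ in range(n_patients):
        arm = rng.randint(0, 1)          # coin-flip randomization
        baseline = rng.gauss(0.0, 1.0)   # hypothetical patient-level noise
        arms.append(arm)
        outcomes.append(baseline + true_effect * arm)
    return arms, outcomes


def difference_in_means(arms, outcomes):
    """Estimate the average treatment effect as mean(treated) - mean(control)."""
    treated = [y for a, y in zip(arms, outcomes) if a == 1]
    control = [y for a, y in zip(arms, outcomes) if a == 0]
    return statistics.mean(treated) - statistics.mean(control)


arms, outcomes = simulate_rct(n_patients=20000, true_effect=2.0)
estimate = difference_in_means(arms, outcomes)
print(f"estimated average treatment effect: {estimate:.2f}")
```

Because assignment is random, the two arms are comparable on average, so the simple difference in means recovers the simulated effect up to sampling noise.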
The cost of determining a human DNA sequence has fallen drastically, from
about $3 billion for the first sequence in 2003 to less than $1000 in 2015.
The race to sequence much larger numbers of human genomes is now on.
Vast amounts of patient genomic information will soon be available, and
vast amounts of treatment and outcome information will also be released
under the aegis of precision or personalized medicine. In an ideal clinical
research world, massive data on patient genomic and treatment outcomes
could be rapidly combined to advance our understanding of human biology,
so that the path to better cures and therapeutics could be quickly mapped.
Maintaining health and shortening the time to a cure both rely on
understanding biology (Tyson et al., 2020).
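The scale of the cost decline quoted above can be made concrete with a short calculation. This sketch uses only the two figures given in the text and assumes, purely for illustration, a smooth exponential decline between them.

```python
import math

cost_2003 = 3e9   # approximate cost of the first sequence (from the text)
cost_2015 = 1e3   # upper bound on the 2015 cost (from the text)
years = 2015 - 2003

fold_reduction = cost_2003 / cost_2015
halvings = math.log2(fold_reduction)          # number of times the cost halved
halving_time_months = 12 * years / halvings   # assumes a steady exponential decline

print(f"fold reduction: {fold_reduction:,.0f}x")
print(f"cost halved roughly every {halving_time_months:.1f} months")
```

Under that assumption the cost halved well under once a year on average, although the real decline was not uniform over the period.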
2. Techniques and Methods
Usually, helical single-molecule sequencing is faster. New techniques use a
polymerase to rapidly identify DNA bases in a small chamber. Many
machines sequence one DNA fragment at a time, because this is typically
faster. The 454 platform grows a lawn of DNA in a large device with a few
beads. Ultrasonic vibrations distribute the beads evenly in the chamber
(Chakrawarti et al.).
Some new methods are based on structures. For example, 454 sequencing
machines grow identical DNA molecules on beads. DNA polymerase copies
the template using deoxynucleoside triphosphates (dNTPs). Each correct
incorporation releases inorganic pyrophosphate, which is converted
enzymatically into a flash of light picked up by sensitive detectors. The
location of the light spot matches the primer position. Once the polymerase
has finished, the DNA fragments are released, allowing the next round of
base copying to start (Piccaluga, 2023).
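The light-based readout described above can be sketched as a simple decoding step. The example assumes that nucleotides are flowed over the beads in a fixed repeating order and that each light signal has been normalized so that 1.0 corresponds to one incorporated base. The flow order and the signal values are hypothetical, and real instruments apply more elaborate signal correction.

```python
def decode_flowgram(flow_order, signals):
    """Turn per-flow light intensities into a base sequence.

    Each signal is assumed to be normalized so that 1.0 means a single
    incorporated nucleotide; rounding gives the homopolymer length.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # which base was flowed
        count = max(0, round(signal))                 # bases incorporated in this flow
        bases.append(nucleotide * count)
    return "".join(bases)


flow_order = "TACG"  # hypothetical repeating flow order
signals = [1.02, 0.05, 0.97, 2.10, 0.03, 1.05, 0.00, 0.96]
read = decode_flowgram(flow_order, signals)
print(read)
```

A signal near 2.0 is read as two identical bases in a row, which is why long runs of one base are the hardest case for this kind of chemistry: the difference between, say, 7 and 8 incorporations is small relative to the noise.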
2.1. Sanger Sequencing
that is used to determine the order of the four bases: adenine, guanine,
cytosine, and thymine. The advent of rapid DNA sequencing methods has
greatly accelerated biological and medical research and discovery. DNA
sequencing has become the basis for essential science, medicine, and
forensics and is a key tool in many applications of biotechnology. As a
result, sequencing DNA has become cheaper, faster, and easier over time.
Two classical strategies have been used to sequence DNA: Maxam-Gilbert
(chemical) sequencing and Sanger (chain-termination) sequencing. Illumina
sequencing has since displaced both and become the dominant technology.
In the 1990s, the NHGRI set out to reduce the cost of genome sequencing
significantly. Methods were developed to read DNA sequence information
directly, and the cost reduction was achieved through parallel sequencing
and optical imaging. The NHGRI has now reached the goal of this effort,
known as the "$1000 Genome" program (Schloss et al., 2020).
Figure 3: PacBio SMRT technology and Oxford Nanopore can use unaltered
DNA to detect methylation.
individual developing a disease. Once genetic tests are developed that reflect
these genetic variations, they could be used to personalize treatment for an
individual. Current knowledge of DNA variation has significantly advanced
our understanding of the genetic variation that plays a role in neurological
and psychiatric disorders as well as in phenotypic variability. Understanding
the genetic basis of an undesirable medical trait may help people recognize
that they are at risk for the problem.
3.1. Genetic Research
disease causes and pathogens. Next-generation DNA sequencing has become
a vital tool in biological research (Molla et al., 2021).
privacy concerns and to find an acceptable format for collecting and storing
DNA sequence data. Their results also indicated that even individuals who
have not shared any DNA data have a 12% overall chance of being matched.
This may well be far too high given the sensitive nature of the data. Their
estimate is higher still for individuals of European descent or of both
European and Asian descent. Their position is that the identity of
semi-anonymous individuals can be compromised by searching these SNP
catalogs, cross-referencing them, and performing similar tasks. This is a
major cause for concern, and such data are a potential target for insurance
companies and employers.
In forensic analysis, DNA sequence data has been added to other kinds of
biological evidence to help resolve cold cases and to exonerate some
innocent defendants. Although DNA sequence information is often the most
discriminating measure among individuals, the fact that individuals share
DNA sequence with their close relatives, or with strangers from certain
countries, needs to be taken into account. Methods already exist for
predicting some of these attributes from a person's DNA sequence data
alone (Mittelman, 2021).
gives us the likelihood of the higher quality signal and a pair of optimal
matched state path and hidden path for every signal-sequence pair. Our
prediction of sequencing error probabilities is thus directly related to the
state from which the given base would emanate. The decoding described in
this dissertation corresponds to fixing, beforehand, the known local
alignment of the signal and DNA sequences to be computed.
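The idea of recovering an optimal hidden state path together with its likelihood can be illustrated with a generic Viterbi decoder. This is a textbook stand-in and not the decoding procedure of the work described here. The two-state model, the state names, and all probabilities below are hypothetical, and every probability is assumed to be nonzero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-likelihood, path) of the most probable hidden state path."""
    # best[s] holds the best log-probability of any path ending in state s.
    best = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []  # one dict of back-pointers per observation after the first
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            prev = max(states, key=lambda r: prev_best[r] + math.log(trans_p[r][s]))
            best[s] = (prev_best[prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs]))
            pointers[s] = prev
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical two-state signal-quality model.
states = ["good", "noisy"]
start_p = {"good": 0.6, "noisy": 0.4}
trans_p = {"good": {"good": 0.8, "noisy": 0.2},
           "noisy": {"good": 0.3, "noisy": 0.7}}
emit_p = {"good": {"hi": 0.9, "lo": 0.1},
          "noisy": {"hi": 0.2, "lo": 0.8}}

log_lik, path = viterbi(["hi", "hi", "lo", "lo"], states, start_p, trans_p, emit_p)
print(path, math.exp(log_lik))
```

Working in log space avoids numerical underflow on long signals, which matters when a path spans thousands of observations.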
fragment yields alignments like those in Figure 2. We follow the current
practice of calling each DNA fragment a read. A read that was reversed
before binding to the sequenced sample DNA molecule is said to read in
reverse or revcomp. A typical read originates at an unknown position in an
unknown DNA molecule. Its actual origin depends on both the sample DNA
and the shearing procedure. The randomness and uniformity of the shear and
source distributions make the alignment and orientation of sequenced reads
random when the reads originate at unique sites in a single copy of a
genome.
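The notion of a read that aligns in reverse can be sketched with exact string matching. The helper names and the toy reference sequence are hypothetical, and real aligners tolerate mismatches and gaps rather than requiring an exact match.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_read(reference, read):
    """Return (position, strand) of the first exact match, or None."""
    pos = reference.find(read)
    if pos != -1:
        return pos, "+"
    # Not found as given: try the read as a reverse complement.
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"
    return None


reference = "ACGTTGCAAGGCTA"                 # toy reference sequence
forward_read = "TGCAAG"                      # taken directly from the reference
reverse_read = reverse_complement("AGGCTA")  # same region, opposite strand

print(locate_read(reference, forward_read))
print(locate_read(reference, reverse_read))
```

Since the orientation of a read is not known in advance, both strands have to be tried, as `locate_read` does here.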
number of bases sequenced and instrument noise may be contributing
factors.
DNA sequencing technology has improved, but errors still occur at a rate of
one per 10,000 base pairs. More funding is needed to complete sequencing
of eukaryotic genomes. Base data is used to find genes and proteins.
Genomes contain all genetic information, but not all is necessarily active.
Researchers must first define scientific questions before beginning a
sequencing project. (Delahaye & Nicolas, 2021)
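The quoted error rate can be put on the Phred quality scale, a widely used convention that is not part of the original text: Q = -10 log10(p), where p is the per-base error probability. The genome size used below, about 3 billion base pairs for a human genome, is an approximation introduced for illustration.

```python
import math


def error_prob_to_phred(p):
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


def phred_to_error_prob(q):
    """Convert a Phred quality score back to an error probability."""
    return 10 ** (-q / 10)


error_rate = 1 / 10_000              # one error per 10,000 base pairs (from the text)
quality = error_prob_to_phred(error_rate)

genome_size = 3_000_000_000          # approximate human genome size (assumption)
expected_errors = genome_size * error_rate

print(f"Phred quality: Q{quality:.0f}")
print(f"expected errors in one pass: {expected_errors:,.0f}")
```

Even at this rate, a single pass over a genome of that size would be expected to contain hundreds of thousands of errors, which is one reason each position is usually sequenced many times.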
sequencing the human genome required 3 liters of capillary buffer to do one
single read per individual base pair. Illumina's technology has improved in
the 15 years since the first human genome was sequenced, but the cost per
genome is still approximately $1,500. Other technologies, such as Oxford
Nanopore and Pacific Biosciences, have not yet reached the same throughput
(Wonkam, 2021).
5.1. Nanopore Sequencing
The way a DNA blockade translates into a change in ionic current varies
with the nature and type of the DNA. Resistance is lower for single strands
passing through a narrowed pore, which transmit signals on a shorter
timescale. For sequencing, single bases must be held firmly in the pore so
that the polymeric filament interacts with each base. The forces that hold
the base stationary, and the absence of interactions from neighboring bases,
are not yet understood. Advances in materials and engineering can help
provide the sensitivity and the force needed for the obstructing base to
approach the pore's mouth.
The idea of sequencing DNA using a polymer and a nanopore dates back to
the 1990s. David D. Deamer and Daniel Branton proposed that it should be
possible to determine the DNA sequence as it passes through a membrane
with a nanopore. The key is to drive the DNA through the pore so that each
nucleotide base blocks the flow of ions in a characteristic way.
Distinguishing all four bases requires a synthetic polymer with 20^4
combinations. Initially, there were doubts about achieving this resolution,
but recent developments in synthetic pore manufacturing and
nanofabrication have renewed interest in the concept. (MacKenzie &
Argyropoulos, 2023)
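The principle that each base blocks the ion flow in a characteristic way can be sketched as a nearest-level classifier. The current levels below are invented for illustration, and the sketch assumes the trace has already been split into one segment per base. In practice several neighboring bases influence the current at once, so real base callers are considerably more complex.

```python
# Hypothetical mean blockade currents in picoamperes; illustrative values only.
LEVELS_PA = {"A": 52.0, "C": 46.0, "G": 58.0, "T": 40.0}


def call_base(mean_current):
    """Pick the base whose reference level is closest to the measured current."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - mean_current))


def call_sequence(segment_means):
    """Call one base per pre-segmented stretch of the current trace."""
    return "".join(call_base(m) for m in segment_means)


segment_means = [51.2, 40.9, 57.1, 46.8, 45.1]  # hypothetical measurements
called = call_sequence(segment_means)
print(called)
```

The closer the reference levels sit to one another relative to the measurement noise, the more often a segment is assigned to the wrong base, which is the resolution problem raised above.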
DNA metaphoric copy of the DNA sequence currently being synthesized
and a tight but still open DNA polymerase that synthesizes DNA product in
the correct direction.
6. Conclusion
In conclusion, we believe that the Paracel platform is a good, efficient
candidate for solving many problems in computational biology, including
sequence error detection. Interested readers can build their own distributed
software for solving complex biological problems using Paracel as a
foundation. The source code of ParMBD can be downloaded for installation,
and more information about ParMBD is also available on the following
website. Future research will focus on RNA-seq data analysis and
multiple sequence alignment.
In this work, Paracel Sequence was used to solve the most compute-
intensive part of sequence error detection, and we achieved a 2x speedup.
Furthermore, ParMBD was proposed to take advantage of Paracel in solving
sequence error detection problems. We analyzed the performance of
ParMBD and showed that it scaled well with the sequence length and the
number of sequences. The publicly available tools were compared based on
time consumption, space usage, and memory usage. Both real and simulated
data were considered in our comparative analysis.
7. References
Eichler, H. G., Pignatti, F., Schwarzer‐Daum, B., Hidalgo‐Simon, A.,
Eichler, I., Arlett, P., ... & Rasi, G. (2021). Randomized controlled trials
versus real world evidence: neither magic nor myth. Clinical Pharmacology
& Therapeutics, 109(5), 1212-1218.
Tyson, J. R., James, P., Stoddart, D., Sparks, N., Wickenhagen, A., Hall, G.,
... & Quick, J. (2020). Improvements to the ARTIC multiplex PCR method
for SARS-CoV-2 genome sequencing using nanopore. BioRxiv.
Cheng, C., Fei, Z., & Xiao, P. (2023). Methods to improve the accuracy of
next-generation sequencing. Frontiers in Bioengineering and Biotechnology,
11, 982111.
Schloss, J. A., Gibbs, R. A., Makhijani, V. B., & Marziali, A. (2020).
Cultivating DNA sequencing technology after the human genome project.
Annual Review of Genomics and Human Genetics, 21, 117-138.
Kumar, A., Reed, A. J., Zahurancik, W. J., Daskalova, S. M., Hecht, S. M.,
& Suo, Z. (2022). Interlocking activities of DNA polymerase β in the base
excision repair pathway. Proceedings of the National Academy of Sciences,
119(10), e2118940119.
Yoo, K. W., Yadav, M. K., Song, Q., Atala, A., & Lu, B. (2022). Targeting
DNA polymerase to DNA double-strand breaks reduces DNA deletion size
and increases templated insertions generated by CRISPR/Cas9. Nucleic
Acids Research, 50(7), 3944-3957.
Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko,
A., ... & Phillippy, A. M. (2022). The complete sequence of a human
genome. Science, 376(6588), 44-53.
Molla, K. A., Sretenovic, S., Bansal, K. C., & Qi, Y. (2021). Precise plant
genome editing using base editors and prime editors. Nature Plants.
Jeon, S. A., Park, J. L., Park, S. J., Kim, J. H., Goh, S. H., Han, J. Y., &
Kim, S. Y. (2021). Comparison between MGI and Illumina sequencing
platforms for whole genome sequencing. Genes & Genomics, 43, 713-724.
Hu, T., Chitnis, N., Monos, D., & Dinh, A. (2021). Next-generation
sequencing technologies: An overview. Human Immunology.
Faust, H., Duffek, P., Hentschel, J., & Popp, D. (2024). Evaluation of
Automated Magnetic Bead–Based DNA Extraction for Detection of Short
Tandem Repeat Expansions With Nanopore Sequencing. Journal of Clinical
Laboratory Analysis, 38(6), e25029.
Maghini, D. G., Moss, E. L., Vance, S. E., & Bhatt, A. S. (2021). Improved
high-molecular-weight DNA extraction, nanopore sequencing and
metagenomic assembly from the human gut microbiome. Nature Protocols.
Exworthy, M., Lunt, N., Tuck, P., & Mistry, R. (2024). From
commodification to entrepreneurialism: how commercial income is
transforming the English NHS. Public Money & Management, 44(4),
308-316.
Baum, C., Berlips, J., Chen, W., Cui, H., Damgard, I., Dong, J., ... & Zhang,
H. (2024). A system capable of verifiably and privately screening global
DNA synthesis. arXiv preprint arXiv:2403.14023.
Jiang, C., Zhang, Y., Wang, F., & Liu, H. (2021). Toward Smart Information
Processing with Synthetic DNA Molecules. Macromolecular Rapid
Communications, 42(11), 2100084.