BioSystems 91 (2008) 117–125
Solving the SAT problem using a DNA computing
algorithm based on ligase chain reaction
Xiaolong Wang ∗ , Zhenmin Bao, Jingjie Hu, Shi Wang, Aibin Zhan
Department of Biotechnology, Ocean University of China, Qingdao 266003, People’s Republic of China
Received 4 April 2007; received in revised form 12 August 2007; accepted 17 August 2007
Abstract
A new DNA computing algorithm based on a ligase chain reaction is demonstrated to solve an SAT problem. The proposed DNA
algorithm can solve an n-variable m-clause SAT problem in m steps and the computation time required is O (3m + n). Instead of
generating the full-solution DNA library, we start with an empty test tube and then generate solutions that partially satisfy the SAT
formula. These partial solutions are then extended step by step by the ligation of new variables using Taq DNA ligase. Correct strands
are amplified and false strands are pruned by a ligase chain reaction (LCR) as soon as they fail to satisfy the conditions. If we score
and sort the clauses, we can use this algorithm to markedly reduce the number of DNA strands required throughout the computing
process. In a computer simulation, the maximum number of DNA strands required was $2^{0.48n}$ when n = 50, and the exponent ratio
varied inversely with the number of variables n and the clause/variable ratio m/n. This algorithm is highly space-efficient and
error-tolerant compared to conventional brute-force searching, and thus can be scaled-up to solve large and hard SAT problems.
© 2007 Elsevier Ireland Ltd. All rights reserved.
Keywords: DNA computing; SAT problem; Ligase chain reaction; Space complexity; Time complexity
1. Introduction
DNA computing is a newly emerging interdisciplinary science that uses molecular biotechnologies
to solve problems in computer science or mathematics. In their pioneering studies, Adleman and Lipton
solved combinatorial problems [Hamilton path problem
(HPP) (Adleman, 1994) and satisfiability problem (SAT)
(Lipton, 1995)], using DNA computing algorithms based
on a brute-force search. At the beginning of computation, they constructed a DNA pool that contained the
full-solution space, and then extracted correct answers
and/or eliminated false ones from the pool step by step.
Thus, the number of distinct DNA strands required in the initial data pool grows exponentially with the
size of the problem, and eventually swamps the DNA data storage for large problems, which makes molecular
computation impractical from the outset. Generally, it is believed that DNA computers that use a
brute-force search algorithm are limited to 60 to 70 variables (Lipton, 1995). Recently, a few new
algorithms, such as the breadth-first search algorithm (Yoshida and Suyama, 2000) and the random walk
algorithm (Liu et al., 2005), have been proposed. With the breadth-first search algorithm, the capacity
of a DNA computer can theoretically be increased to about 120 variables (Yoshida and Suyama, 2000).
∗ Corresponding author. Tel.: +86 532 82031970. E-mail address: [email protected] (X. Wang).
In the present study, we developed a new space-efficient DNA computing algorithm based on a ligase
chain reaction (LCR), which uses only four operations: ligating, amplifying, splitting and merging.
Instead of generating the full DNA library at the beginning, we start with an empty test tube and
generate only partial solutions that satisfy the clauses treated so far. The partial solutions are
extended through ligation of new variable DNA: correct solutions are selectively amplified and
false ones are pruned by LCR.
0303-2647/$ – see front matter © 2007 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.biosystems.2007.08.006
2. DNA Code Design
The SAT problem is an NP-hard computational problem that requires an exponential amount of time to
solve with known sequential algorithms. Since all NP-complete problems can be encoded into SAT, the study
of SAT problems has not only occupied a central role in theoretical computer science, but has also
served as a benchmark for testing the performance of DNA computers. An n-variable m-clause 3-SAT problem
can be written as

$$F = \bigwedge_{j=1}^{m} C_j = \bigwedge_{j=1}^{m} \left( \bigvee_{k=1}^{3} L_{jk} \right)$$

where $C_j = (L_{j1} \vee L_{j2} \vee L_{j3})$; ∧ is the logical AND operation; ∨ is the logical OR
operation; and the literals $L_{jk}$ (j = 1, 2, ..., m; k = 1, 2, 3) are either a variable $x_i$ or
its negation $\sim x_i$ (i = 1, 2, ..., n), where ∼ is the logical NOT operation.
Fig. 1. The principle of the proposed DNA computer. (A) Ligate $x_i^v$ with $x_j^v$ to form $x_i^v - x_j^v$, amplify the ligation product by LCR, and then ligate $x_k^v$
with $x_i^v - x_j^v$ to form $x_i^v - x_j^v - x_k^v$. Note that s and s′ are respectively the sense and antisense strands of the linker sequence S (5′-ACTTTCCC-3′).
(B) Computation of the literal $x_j$: DNA strands that contain $x_j^1$ ($x_j^0$) are (are not) amplified. (C) Computation of the literal $\sim x_j$: DNA strands that
contain $x_j^0$ ($x_j^1$) are (are not) amplified.
For simplicity, in this proof-of-principle study we target a simple example of the SAT problem with a
4-variable 8-clause 3-CNF Boolean formula,

$$\begin{aligned}
F = {} & (\sim x_0 \vee x_2 \vee x_1) \wedge (x_0 \vee x_1 \vee \sim x_2) \wedge (x_0 \vee \sim x_1 \vee x_2) \\
& \wedge (x_0 \vee \sim x_1 \vee \sim x_2) \wedge (x_0 \vee x_1 \vee x_2) \wedge (\sim x_0 \vee \sim x_3 \vee \sim x_2) \\
& \wedge (\sim x_0 \vee \sim x_1 \vee x_2) \wedge (\sim x_0 \vee \sim x_1 \vee \sim x_2)
\end{aligned}$$

Table 1
Oligonucleotides used to construct the DNA data
Every possible solution can be represented by 4 binary variables written as ($x_0^v$, $x_1^v$, $x_2^v$, $x_3^v$),
where $x_i$ (i = 0 to 3) is the variable name and v is the value assignment of the variable: v = 1
represents True and v = 0 represents False.
There exists a unique correct solution ($x_0^1$, $x_1^0$, $x_2^1$, $x_3^0$).
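The uniqueness claim can be checked on a conventional computer by enumerating all 16 assignments. The following is a minimal sketch, not part of the DNA protocol; the encoding of a literal as a `(variable index, required value)` pair and the function names are assumptions made for illustration.

```python
from itertools import product

# Each clause is a tuple of literals; a literal is (variable index, required value).
# (0, 0) encodes ~x0 and (2, 1) encodes x2, following the 8-clause formula above.
FORMULA = [
    ((0, 0), (2, 1), (1, 1)),  # ~x0 v  x2 v  x1
    ((0, 1), (1, 1), (2, 0)),  #  x0 v  x1 v ~x2
    ((0, 1), (1, 0), (2, 1)),  #  x0 v ~x1 v  x2
    ((0, 1), (1, 0), (2, 0)),  #  x0 v ~x1 v ~x2
    ((0, 1), (1, 1), (2, 1)),  #  x0 v  x1 v  x2
    ((0, 0), (3, 0), (2, 0)),  # ~x0 v ~x3 v ~x2
    ((0, 0), (1, 0), (2, 1)),  # ~x0 v ~x1 v  x2
    ((0, 0), (1, 0), (2, 0)),  # ~x0 v ~x1 v ~x2
]


def satisfies(assignment, formula):
    """True if every clause has at least one literal matched by the assignment."""
    return all(any(assignment[var] == val for var, val in clause) for clause in formula)


def all_solutions(formula, n_vars):
    """Brute-force enumeration of every satisfying assignment (x0, ..., x_{n-1})."""
    return [a for a in product((0, 1), repeat=n_vars) if satisfies(a, formula)]


solutions = all_solutions(FORMULA, 4)
print(solutions)  # [(1, 0, 1, 0)]
```

The single tuple returned is ordered by variable index, so it reads x0 = 1, x1 = 0, x2 = 1, x3 = 0.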
The data structure was designed in the form of double-strand DNA (dsDNA). Sixteen pairs of oligonucleotides
were synthesized and phosphorylated (Table 1). The first eight pairs, $x_i^v(+)$ and $x_i^v(-)$, i = 0 to 3,
v = 1 or 0, are respectively the sense and antisense strands of the variables. Since $x_i^v(+)$ and
$x_i^v(-)$ are complementary to each other, they can be annealed to form eight dsDNA molecules, $x_i^v$,
i = 0 to 3, v = 1 or 0. These molecules are all 36 bp in length and include a 20-bp core sequence
(underlined in Table 1), an 8-bp upstream 3′-extension (sticky end) and an 8-bp downstream linker
(blunt end). The 20-bp core sequences represent the value assignments of the variables. The sequences of
the sticky ends and the linkers are identical.
Another eight oligos, $P_i^v(-)$, are used as primers for the ligase chain reaction, and their sequences
are the same as those of $x_i^v(-)$, i = 0 to 3, v = 1 or 0, except that they lack the upstream
3′-extensions. The last eight oligos, $Q_i^v(-)$, are all 20 bp long, and their sequences are the same as
the core sequences of $x_i^v(-)$ (i = 0 to 3, v = 1 or 0), without either the upstream 3′-extensions or
the downstream linkers. $Q_i^v(-)$ have two uses: to act as primers for LCR and to be annealed with
$x_i^v(+)$ to form eight 28-bp headers, $^{h}x_i^v$, whose core sequences are the same as those of $x_i^v$
(i = 0 to 3, v = 1 or 0), but which are blunt-ended upstream of the core sequence and carry the
3′-extension downstream.
Since the downstream sticky end of $^{h}x_i^v$ and the upstream 3′-extension of $x_i^v$ (i = 0 to 3,
v = 1 or 0, respectively) are complementary to each other, any $^{h}x_i^v$ can be ligated with $x_j^v$ to
form $x_i^v - x_j^v$ by Taq DNA ligase (Fig. 1A). The ligation product $x_i^v - x_j^v$ can be amplified by a
ligase chain reaction (LCR) with four primers, $x_i^v(+)$, $P_i^v(-)$, $x_j^v(+)$ and $Q_j^v(-)$, by which
the downstream sticky end is regenerated in the product $x_i^v - x_j^v$, so that it can be ligated with
$x_k^v$ to form $x_i^v - x_j^v - x_k^v$ (Fig. 1A). Moreover, the ligase chain reaction has sufficient
computational power to solve SAT problems: to compute a literal, correct strands are selectively
amplified by an LCR that contains only primers that satisfy the literal; false ones are pruned because
primers that do not satisfy the literal are
excluded (Fig. 1B and C).
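The literal computation of Fig. 1B and C can be pictured as choosing which primers go into the reaction. The sketch below is an idealised software model under stated assumptions: a strand is a tuple of `(variable, value)` segments, a primer is identified by the segment it matches, and a strand is amplified only if every one of its segments has a primer. The names `primers_for_literal` and `amplify` are hypothetical and do not come from the paper.

```python
def primers_for_literal(variables, lit_var, lit_val):
    """Primer set for one LCR: both values for every variable, except the
    variable named by the literal, which gets only the satisfying value."""
    primers = set()
    for var in variables:
        values = (lit_val,) if var == lit_var else (0, 1)
        primers.update((var, v) for v in values)
    return primers


def amplify(strands, primers):
    """Idealised LCR: keep strands whose every segment has a matching primer."""
    return [s for s in strands if all(segment in primers for segment in s)]


# All four x0-x2 ligation products, then selection for the literal ~x2.
strands = [((0, 0), (2, 0)), ((0, 0), (2, 1)), ((0, 1), (2, 0)), ((0, 1), (2, 1))]
kept = amplify(strands, primers_for_literal([0, 2], lit_var=2, lit_val=0))
print(kept)  # [((0, 0), (2, 0)), ((0, 1), (2, 0))]
```

Leaving out the primer for the unwanted value is what prunes a strand; no separate removal step is modelled.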
Fig. 2. Graph $G_n$, which encodes the paths for SAT problems (n = 4; i,
j, k, l = 1, 2, ..., n).
3. DNA Computing Algorithm
In this computation, we always start with one empty test tube. The set of DNA in the remaining test
tubes corresponds to the graph $G_n$ (Fig. 2). The graph $G_n$ has n nodes, $x_i$, $x_j$, $x_k$, $x_l$, ...,
where i, j, k, l ∈ [1, n]. Every node has two edges to every neighbor, and every edge comprises a logic
gate. Every logic gate holds a fixed inner value of 0 or 1, which is conceptually new and different from
conventional logic gates. A SAT formula is taken as a restraint condition to generate paths through the
nodes, the edges and the logic gates. All possible partial paths that start at the beginning and end at
the latest node being treated are generated step by step for each literal and each clause. A path is
called a true path and reserved if a literal in a clause is satisfied by the gate value of the
corresponding node; otherwise the path is considered a false path and removed. All true paths for all
literals of a clause are merged to form a set of true paths for that clause; the true paths are then
extended and/or sieved for the next clause. Therefore, all, and only, partial paths that satisfy all of
the clauses that have been treated are generated. For example, given a SAT formula
$(\sim x_i \vee \sim x_j) \wedge (x_i \vee \sim x_k)$, for the first clause, all paths through the first two
nodes are generated, among which three true paths, $x_i^0 - x_j^0$, $x_i^1 - x_j^0$ and $x_i^0 - x_j^1$,
are reserved, and the false path, $x_i^1 - x_j^1$, is removed. For the second clause, the true paths are
extended to node $x_k$ to form six paths, among which four true paths, $x_i^0 - x_j^0 - x_k^0$,
$x_i^1 - x_j^0 - x_k^0$, $x_i^0 - x_j^1 - x_k^0$ and $x_i^1 - x_j^0 - x_k^1$, are reserved, and the false
paths, $x_i^0 - x_j^0 - x_k^1$ and $x_i^0 - x_j^1 - x_k^1$, are
removed.
The pseudocode for solving a 3-SAT problem with n variables and m clauses is shown in Program 1. In
the computing process, $t_j$ contains all of the sequences that satisfy clauses $C_1$ to $C_j$. Strings
that do not satisfy $C_1$ to $C_j$ are not produced because the corresponding primers are absent from
$t_{jk}$. After m steps of such operations guided by the SAT formula, all correct strings that satisfy
all of the clauses will be generated. The computation time is O(3m + n) because the split, ligation, LCR
and merge commands are executed m, n, m and m times,
respectively.
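The split, ligate, LCR and merge loop described in this section can be sketched in software as follows. This is an illustrative model written for this text, not a transcription of Program 1: strands are tuples of `(variable, value)` segments, LCR is idealised as a perfect filter, the introduction of the first variable is counted as a ligation, and the function names are assumptions.

```python
def ligate(tube, var):
    """Extend every strand with both value assignments of a new variable."""
    return [strand + ((var, v),) for strand in tube for v in (0, 1)]


def lcr(tube, literal):
    """Idealised selective amplification: keep strands containing the literal."""
    return [strand for strand in tube if literal in strand]


def solve(formula):
    """Return the merged tubes t_1..t_m and how many times each command ran."""
    tube, seen = [()], []  # one empty strand models the empty start tube
    ops = {"split": 0, "ligate": 0, "lcr": 0, "merge": 0}
    tubes = []
    for clause in formula:
        parts = [list(tube) for _ in clause]  # split: one tube per literal
        ops["split"] += 1
        for var in dict.fromkeys(var for var, _ in clause):
            if var not in seen:  # first appearance of this variable
                seen.append(var)
                parts = [ligate(p, var) for p in parts]
                ops["ligate"] += 1
        parts = [lcr(p, lit) for p, lit in zip(parts, clause)]
        ops["lcr"] += 1
        tube = sorted(set().union(*parts))  # merge the per-literal tubes
        ops["merge"] += 1
        tubes.append(tube)
    return tubes, ops


# (~xi v ~xj) ^ (xi v ~xk) with i, j, k = 0, 1, 2; a literal is (variable, value).
example = [((0, 0), (1, 0)), ((0, 1), (2, 0))]
tubes, ops = solve(example)
print([len(t) for t in tubes], ops)
# [3, 4] {'split': 2, 'ligate': 3, 'lcr': 2, 'merge': 2}
```

For the two-clause example the tubes hold three and then four true paths, and the command counts add up to 3m + n = 9 for m = 2, n = 3. Variables are appended in order of first appearance, which is why strands for the 8-clause formula come out ordered x0, x2, x1, x3.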
4. Implementation of the Algorithm
The biotechnological implementation of the proposed
DNA algorithm is shown in Fig. 1. The commands are
described in detail below:
(1) Ligate ($t_{jl}$, $x_i^v$), ligation of new variables, was performed in a volume of 40 µL containing
1 µmol/L header or LCR product, 1 µmol/L $x_i^v$, 1X Taq DNA ligase buffer and 80 U Taq DNA ligase (NEB).
This mixture was heated to 95 °C for 1 min, gradually cooled to 55 °C, and then incubated at 55 °C for
30 min.
(2) LCR ($t_{jk}$, $L_{jk}$), selective amplification of the ligation products in $t_{jk}$ according to
$L_{jk}$, was performed by an LCR that contains only primers that satisfy the literal $L_{jk}$. For
example, amplification of $x_i^v - x_j^v - x_k^v$ according to $\sim x_j$ was performed in a total volume
of 50 µL, using 10 ng of ligation product $x_i^v - x_j^v - x_k^v$, 100 nmol/L of each primer $x_i^v(+)$,
$x_j^0(+)$, $x_k^v(+)$, $P_i^v(-)$, $P_j^v(-)$ and $Q_k^v(-)$, and 20 U Taq DNA ligase in 1X buffer (NEB).
Amplification was carried out on a Biometra T-Gradient thermal controller as follows: predenaturing at
94 °C for 1 min, followed by 25 cycles of denaturing at 94 °C for 20 s and annealing and ligation at
60 °C for 2 min, and a final ligation at 60 °C for
10 min.
Fig. 3. PAGE analysis of products after each step in DNA computation. Every lane is labeled with the name of the corresponding test tube. In lanes
$t_{11}$ and $t_{12}$, the specific 64-bp bands indicate the ligation products $x_0^0 - x_2^v$ and $x_0^v - x_2^v$, respectively, in test tubes $t_{11}$ and $t_{12}$; lanes $t_1$ to $t_8$ show the
LCR products merged in each step, respectively $x_0^v - x_2^v - x_1^v$ (92 bp) in test tubes $t_1$, $t_2$, $t_3$, $t_4$, $t_5$ and $x_0^v - x_2^v - x_1^v - x_3^v$ (120 bp) in $t_6$, $t_7$, $t_8$; lane
M1 is a 25-bp DNA ladder (MBI); lane M2 is a 20-bp DNA ladder (TaKaRa).
5. Results of the Lab Experiment
The DNA samples generated in each step were detected by electrophoresis on 15% polyacrylamide gel
for 0.5 h at 240 V; the results are shown in Fig. 3. To compute the first clause, DNA fragments
$^{h}x_0^v$ (28 bp) were added into three empty test tubes, $t_{11}$, $t_{12}$ and $t_{13}$, ligated with
$x_2^v$ (36 bp) to form $x_0^v - x_2^v$ (64 bp), amplified by LCR to regenerate the sticky end, and then
ligated with $x_1^v$ to produce $x_0^v - x_2^v - x_1^v$ (92 bp). DNA strands in $t_{11}$, $t_{12}$ and
$t_{13}$ were selectively amplified by LCR, which removed incorrect solutions, and merged into $t_1$. DNA
in $t_1$ ($x_0^v - x_2^v - x_1^v$) was split into $t_{21}$, $t_{22}$ and $t_{23}$, and further computation
was carried out using Program 1 in a series of test tubes $t_j$ according to the corresponding clause
$C_j$, j = 2, 3, ..., 8. Note that to compute the sixth clause $(\sim x_0 \vee \sim x_3 \vee \sim x_2)$,
since this is the first appearance of $x_3$, DNA in $t_{61}$, $t_{62}$ and $t_{63}$ was ligated with
$x_3^v$ to form $x_0^v - x_2^v - x_1^v - x_3^v$ (120 bp), then selectively amplified by LCR and merged
into $t_6$. The 64-bp bands shown in lanes $t_1$ through $t_8$ are incomplete solutions, such as
$x_i^v - x_j^v$, formed in the LCR reactions; these incomplete solutions do not cause serious problems,
since they can be removed by size separation or recognized easily
by sequencing of the final answer DNA.
After eight steps of operations guided by the SAT formula, the answer DNA was finally produced in test
tube $t_8$. The answer DNA in each step was detected by molecular cloning and sequencing. The answer DNA
in $t_1$ through $t_8$ was re-amplified by LCR to make both ends blunt, and the LCR product was separated
by electrophoresis on 15% polyacrylamide gel for 0.5 h at 240 V. After EB staining, the desired DNA band
was excised under UV light, purified and ligated into pUC19 plasmid that had been digested with EcoRV.
The plasmid containing the answer DNA was transformed into E. coli JM109 and cloned, and then for each
sample 50 colonies were selected and propagated in liquid LB medium containing 50 µg/mL ampicillin. The
plasmid DNA was then extracted and sequenced. The sequences were decoded according to Table 1 and
classified. The results are consistent with the expectation (Table 2). The sequences obtained from the
last test tube ($t_8$) were identical (Fig. 4), and from these the answer was decoded into
$x_0^1 - x_2^1 - x_1^0 - x_3^0$. Thus, this simple example of a 3-SAT problem was solved using this DNA
computing
algorithm.
6. Evaluation of the Space Complexity of the
DNA Algorithm
For a given m-clause SAT formula, F, in the computing process, when j grows from 1 to m, the number of
different DNA strands in $t_j$ equals the number of true assignments ($N_j$) for the first j clauses. The
space complexity of this algorithm is the maximum number of DNA strands produced in test tubes $t_j$,
$\max \{N_j \mid j = 1, \ldots, m\}$, which is always smaller than the full-solution space ($2^n$).
Fig. 4. Sequence of the final answer DNA. The upstream and downstream sequencing primers are underlined and italicized. The inserted sequence
is between the two cutting points of EcoRV, which is underlined in bold. The core sequences of the variables are underlined and marked with their
names and values, which represent the unique answer $x_0^1 - x_2^1 - x_1^0 - x_3^0$.
To investigate the space complexity of this algorithm, randomly generated 3-CNF SAT problems were solved
using an in-house simulation program running on an HP ProLiant workstation and an 80-node PC cluster. To
generate sample formulas, we wrote a program that takes a range for the number of variables, $n_1$ to
$n_2$, and a range for the clause/variable ratio, $r_1$ to $r_2$, and constructs formulas of n variables
and m clauses, with n ∈ [$n_1$, $n_2$] and m/n ∈ [$r_1$, $r_2$]. When picking a clause, three literals
were repeatedly selected independently with equal
probability, while the clause was kept free from complementary literals and identical literals.
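A generator of this kind might look like the sketch below. It is an assumption-laden illustration rather than the authors' program: literals are `(variable, value)` pairs, a clause is redrawn until its three variables are distinct (which excludes both identical and complementary literals), and the `seed` parameter and function names are introduced here only for reproducibility.

```python
import random


def random_clause(n, rng):
    """Draw three literals uniformly; redraw until the three variables differ,
    which rules out both identical and complementary literals."""
    while True:
        clause = [(rng.randrange(n), rng.randrange(2)) for _ in range(3)]
        if len({var for var, _ in clause}) == 3:
            return tuple(clause)


def random_3sat(n, ratio, seed=None):
    """Random 3-CNF formula with n variables and round(ratio * n) clauses."""
    rng = random.Random(seed)
    m = round(ratio * n)
    return [random_clause(n, rng) for _ in range(m)]


formula = random_3sat(n=20, ratio=4.25, seed=0)
print(len(formula))  # 85
```

Requires n ≥ 3, since a clause needs three distinct variables.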
In both conventional and molecular computing studies on SAT problems, we are mainly concerned with
hard SAT problems. It is well known that random k-SAT undergoes a phase transition in computational
complexity for k > 2. This transition occurs for specific values of m/n, the ratio of the number of
clauses (m) to the number of variables (n). For low values of m/n the problem is easy to solve and there
is a large number of possible valuations for the variables that solve the problem; in this regime we say
that the problem is under-constrained. For large values of m/n, the problem is over-constrained and it
is easy to discover early in the search process that no solution to the problem exists. However, around
the critical value of m/n, the problem becomes computationally hard and sometimes there is only a single
valuation, out of the $2^n$ possible combinations, that solves the formula. In the case of random 3-SAT,
it is well known that the clause/variable ratio of the hardest instances of 3CNF-SAT is around 4.25
(Selman et al., 1996). Therefore, we generated 50,000 instances of random 3CNF-SAT formulas, with number
of variables n ∈ [5, 50] (more than 1000 for each n) and clause/variable ratio m/n ∈ [1, 50],
and then investigated how the number of partial assignments changes as the algorithm runs.
Table 2
Sequencing results for each sample

| Test tube no. | Clause treated | Sequence decoded |
|---|---|---|
| 1 | $C_1$ $(\sim x_0 \vee x_2 \vee x_1)$ | $x_0^0-x_2^0-x_1^0$, $x_0^0-x_2^0-x_1^1$, $x_0^0-x_2^1-x_1^0$, $x_0^0-x_2^1-x_1^1$, $x_0^1-x_2^0-x_1^1$, $x_0^1-x_2^1-x_1^0$, $x_0^1-x_2^1-x_1^1$ |
| 2 | $C_2$ $(x_0 \vee x_1 \vee \sim x_2)$ | $x_0^0-x_2^0-x_1^0$, $x_0^0-x_2^0-x_1^1$, $x_0^0-x_2^1-x_1^1$, $x_0^1-x_2^1-x_1^0$, $x_0^1-x_2^0-x_1^1$, $x_0^1-x_2^1-x_1^1$ |
| 3 | $C_3$ $(x_0 \vee \sim x_1 \vee x_2)$ | $x_0^0-x_2^0-x_1^0$, $x_0^0-x_2^1-x_1^1$, $x_0^1-x_2^0-x_1^1$, $x_0^1-x_2^1-x_1^0$, $x_0^1-x_2^1-x_1^1$ |
| 4 | $C_4$ $(x_0 \vee \sim x_1 \vee \sim x_2)$ | $x_0^0-x_2^0-x_1^0$, $x_0^1-x_2^0-x_1^1$, $x_0^1-x_2^1-x_1^0$, $x_0^1-x_2^1-x_1^1$ |
| 5 | $C_5$ $(x_0 \vee x_1 \vee x_2)$ | $x_0^1-x_2^0-x_1^1$, $x_0^1-x_2^1-x_1^0$, $x_0^1-x_2^1-x_1^1$ |
| 6 | $C_6$ $(\sim x_0 \vee \sim x_3 \vee \sim x_2)$ | $x_0^1-x_2^0-x_1^1-x_3^0$, $x_0^1-x_2^1-x_1^0-x_3^0$, $x_0^1-x_2^0-x_1^1-x_3^1$, $x_0^1-x_2^1-x_1^1-x_3^0$ |
| 7 | $C_7$ $(\sim x_0 \vee \sim x_1 \vee x_2)$ | $x_0^1-x_2^1-x_1^0-x_3^0$, $x_0^1-x_2^1-x_1^1-x_3^0$ |
| 8 | $C_8$ $(\sim x_0 \vee \sim x_1 \vee \sim x_2)$ | $x_0^1-x_2^1-x_1^0-x_3^0$ |

Selective LCR amplification helps to decrease the number of partial assignments, and the more often a
variable appears, the more LCR selections it brings.
Therefore, we adopted a selection-first strategy: the
clauses are scored and sorted to make the selections as
soon as possible after ligating a new variable. Once a
3-SAT formula is generated, say $F = C_1 \wedge C_2 \wedge \ldots \wedge C_m$, the
clauses, $C_j$, are scored by

$$W_{j1} = \sum_{k=1}^{3} \log(q_i); \quad W_{j2} = \sum_{k=1}^{3} \log(i+1)/n; \quad W_j = W_{j1}/n + W_{j2}/m; \quad j = 1, \ldots, m,$$

where i and $q_i$ are, respectively, the index and the occurrence number of the variable $x_i$ for the
literal $L_{jk}$. The clauses are then sorted in descending order by $W_j$, by which F was converted into
an equivalent form, $F' = C'_1 \wedge C'_2 \wedge \ldots \wedge C'_m$, where
$C'_j \in [C_1, C_2, \ldots, C_m]$ and $W'_1 > W'_2 > \ldots > W'_m$. The order of the clauses is determined
mainly by $W_{j1}$ and fine-tuned by $W_{j2}$. Next, F′ was solved with the simulation program for this
algorithm running on an electronic computer, which gave the maximum number of partial assignments (λ)
and the exponent ratio, $\rho = (\log_2 \lambda)/n$. The average and maximum ratio ρ for random 3-SAT
problems are shown in Fig. 5. The number of assignments in the initial pool generated by Lipton's
brute-force algorithm is $2^n$, so the exponent ratio for a brute-force algorithm is a constant,
$\rho_{Lipton} = 1.0$. The observed ratio ρ for this algorithm decreases almost linearly with an increase
in n and the m/n ratio. When n = 50, the overall average and maximum number of DNA strands required is,
respectively, $2^{0.4198n}$ and $2^{0.48n}$. If this relation $2^{0.48n}$ holds true or decreases further
in cases of 3-SAT problems with more variables, this algorithm should make it possible to solve large and
hard 3-SAT problems with a much smaller amount of DNA than the conventional brute-force method.
Additionally, as Fig. 6 indicates, the worst-case situation ($2^{0.48n}$) is rare; in most cases the
number of strands is less than $2^{0.446n}$. Thus, based on the analysis in Section 3 and Section 6, we
propose the hypothesis that most large and hard SAT problems can be solved on a DNA computer with time
complexity of at most O(3m + n) and space complexity of less than $2^{0.48n}$.
Fig. 5. Average and maximum exponent ratio ($\rho = (\log_2 \lambda)/n$) for different n and m/n ratio. Data were calculated from 50,000 random
3CNF-SAT instances with number of variables n ∈ [5, 50] and clause/variable ratio m/n ∈ [1, 50].
Fig. 6. Incidence of exponent ratio ($\rho = (\log_2 \lambda)/n$) obtained from 10,000 random 3CNF-SAT instances with number of variables n = 50
and clause/variable ratio m/n ∈ [1, 50].
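The selection-first ordering can be sketched as follows. The extracted scoring formula is ambiguous in two places, so this sketch makes explicit assumptions: the lost operator before "k = 1..3" is taken to be a sum, and "log(i + 1)/n" is read as log(i + 1) divided by n. Literals are `(variable index, value)` pairs with 0-based indices, and the function names are hypothetical.

```python
import math
from collections import Counter


def score_clauses(formula, n):
    """W_j = W_j1 / n + W_j2 / m, with W_j1 and W_j2 summed over the three literals
    (assumed reading of the scoring rule described in the text)."""
    m = len(formula)
    q = Counter(var for clause in formula for var, _ in clause)  # occurrences q_i
    scores = []
    for clause in formula:
        w1 = sum(math.log(q[var]) for var, _ in clause)
        w2 = sum(math.log(var + 1) / n for var, _ in clause)
        scores.append(w1 / n + w2 / m)
    return scores


def sort_clauses(formula, n):
    """Return the clauses in descending order of W_j (ties keep input order)."""
    scores = score_clauses(formula, n)
    order = sorted(range(len(formula)), key=lambda j: scores[j], reverse=True)
    return [formula[j] for j in order]


formula = [
    ((0, 1), (1, 1), (2, 1)),
    ((0, 0), (1, 0), (3, 1)),
    ((0, 1), (1, 0), (2, 0)),
]
print(sort_clauses(formula, n=4))
```

Sorting only permutes the clauses, so the reordered formula F′ is logically equivalent to F; what changes is how early frequently occurring variables are ligated and selected.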
7. Discussion
Even though the sample SAT problem solved here
is very small, the proposed DNA computing algorithm
has several advantages over the conventional brute-force
search algorithm. First, it eliminates the need to construct
a full-solution DNA library. The initial test tube (t0 ) is
empty instead of containing the full-solution data pool,
and the other test tubes tj (j = 1 to m) contain only strings
that satisfy clauses C1 to Cj , which greatly reduces the
number of DNA strands required in the DNA computation and makes it possible to extend this approach to
solve large combinatorial problems.
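To put the reduction in concrete terms, the exponent ratio can be converted back into strand counts. The short sketch below uses only the definition ρ = (log2 λ)/n and the maximum ratio of 0.48 reported for n = 50; the function names are introduced here for illustration.

```python
import math


def exponent_ratio(max_strands, n):
    """rho = log2(lambda) / n for a run that peaked at `max_strands` strands."""
    return math.log2(max_strands) / n


def strands_for_ratio(rho, n):
    """Number of distinct strands, 2 ** (rho * n), implied by an exponent ratio."""
    return 2.0 ** (rho * n)


n = 50
brute_force = strands_for_ratio(1.0, n)   # full-solution library, 2 ** n
worst_case = strands_for_ratio(0.48, n)   # maximum ratio observed at n = 50
print(f"brute force: {brute_force:.2e}, worst case here: {worst_case:.2e}")
```

For n = 50 this gives roughly 1.1 × 10^15 distinct strands for the full library against roughly 1.7 × 10^7 in the worst simulated case.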
Second, the special characteristics of the proposed
graph that was used to construct the paths to solve the
SAT problem not only help to reduce the number of
DNA strands required, but also bring some new concepts into the graph theory of combinatorial problems.
The proposed graph differs from Lipton’s mainly in two
aspects. First, every edge is embedded with a logic gate
that holds a value assignment for the corresponding node
and checks whether a path is allowed or not, and paths
are generated step by step, restricted by the clauses of the
SAT formula. Therefore, only true paths that satisfy the
clauses being treated are reserved. In contrast, in Lipton's graph, all $2^n$ possible paths are
generated at the beginning of computation. Second, the nodes are sequenced according to the order of
the first appearance of the variables in
the SAT formula instead of by their indices. Compared
to previous algorithms (Adleman, 1994; Lipton, 1995;
Yoshida and Suyama, 2000; Liu et al., 2005; Ouyang
et al., 1997; Sakamoto et al., 2000; Braich et al., 2002;
Liu et al., 2000; Su and Smith, 2004; Faulhammer et
al., 2000; Ogihara and Ray, 2000) in which the variables are commonly connected in the increasing order
of their indices, this feature of the present algorithm
makes it much easier to handle and possible to score
and sort clauses, which makes the searching space
smaller.
Third, this DNA computing algorithm is highly error-tolerant. In our experimental implementation of this
algorithm, only one enzyme, Taq DNA ligase, was used to carry out all of the enzymatic operations. Taq
DNA ligase is a highly accurate DNA ligase that has only nick-repairing activity. It can only ligate
perfectly matched sticky ends, which means that a one-base-pair mismatch in the sticky ends is enough to
prevent ligation. The intrinsically accurate, sequence-specific ligation ability of Taq DNA ligase makes
it probably one of the best tools for use
in DNA computing. Other enzymatic DNA operations,
such as PCR amplification and restriction digestion, have
been used successfully to solve the max clique problem (Ouyang et al., 1997). Those authors pointed out
that major errors arise from two sources. The first is
the production of single-stranded DNA (ssDNA) during
PCR, which cannot be cut by restriction enzymes. The
second is the incomplete cutting of double-strand DNA
(dsDNA) by restriction enzymes, which also leads to
incorrect answers (Ouyang et al., 1997).
Previously, we solved a SAT problem using PCR to
amplify the ligation product, and restriction endonuclease digestion to prune incorrect DNA strands
(unpublished). However, these methods are obviously
impractical for large problems, since it will become increasingly difficult to handle more and more
different restriction enzymes, which often work in different
buffer solutions. In the present study, all of the enzymatic
operations, including ligating, amplifying and pruning,
were carried out with the same enzyme that works in
the same buffer. This makes the whole computing process simple, fully automatable, and practical for large
problems.
Generally, PCR-based algorithms are error-prone,
since the key enzyme used in the PCR reaction, Taq DNA
polymerase, produces not only desired DNA strands but
also sometimes unwanted sequences, since it is possible that the primers may anneal non-specifically to the
wrong sequences (mismatching primers). More serious
errors, such as point mutations in the restriction sites,
may sometimes arise when PCR is used to amplify the
solution DNA, which is often required to contain many
restriction sites (Ouyang et al., 1997). In PCR, primers
that are annealed to the template are elongated nucleotide
by nucleotide, which is the main source of point mutation. In contrast, in LCR, they are extended block by
block, since all DNA sequence blocks are synthesized
before the beginning of computation, and during computation we only have to ligate these pre-synthesized
sequence blocks properly.
In this LCR-based algorithm, all of the above-mentioned sources of errors, such as point mutations,
mismatching primers, incomplete cutting of ssDNA and
dsDNA, are eliminated. Thanks to the high fidelity of
LCR, this procedure gives an exponential amplifier with
a larger exponent for correct strands than for incorrect ones. Therefore, by repeating the LCR process,
we should reduce the amount of noise arising from the
remaining false strands in the template DNA. Therefore,
this algorithm is inherently more reliable than conventional PCR-based algorithms.
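The "larger exponent for correct strands" argument can be illustrated with a toy model. The per-cycle efficiencies below are hypothetical values chosen only to show the shape of the effect; they are not measurements from this study, and the function name is an assumption.

```python
def enrichment(cycles, eff_correct, eff_incorrect):
    """Ratio of correct to incorrect strands after `cycles` LCR cycles, starting
    from equal amounts, if each cycle multiplies a population by (1 + efficiency)."""
    return ((1 + eff_correct) / (1 + eff_incorrect)) ** cycles


# Hypothetical efficiencies: 0.9 for matched strands, 0.1 for mismatched ones.
for cycles in (5, 15, 25):
    print(cycles, f"{enrichment(cycles, 0.9, 0.1):.3g}")
```

Because the ratio grows geometrically with the number of cycles, even a modest per-cycle advantage for correct strands compounds quickly, which is the sense in which repeating LCR reduces noise.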
As noted by Adleman (1994), the major advantages
of DNA computing lie in its high information-storage
capacity and parallel-computation power. The present
algorithm takes advantage of these features of DNA
computing without generating an initial data pool that
contains every possible solution. Compared with conventional brute-force algorithms, this algorithm is more
space-efficient and error-tolerant. Though the number of
DNA strands required still increases exponentially with
the size of the problem, the exponent ratio is less than
half of that for brute-force algorithms. We believe that it
can be scaled-up to solve large and hard SAT problems.
The maximum number of variables that it can deal with
depends on how many cycles of ligation and LCR amplification can be performed to extend the DNA strands
without any serious error. In the present study, we performed three steps of extension and obtained a 4-bit DNA
solution. This process can theoretically proceed for as
many steps as desired, while the actual number of steps
should be determined in practice by further experiments.
In conclusion, we believe that the algorithm described
here represents an important contribution to the development of DNA computing models, and there is great
potential for the further exploration of DNA computers.
This algorithm starts with an empty test tube, since an
empty start actually means all of the possibilities are
included. With continual improvements in both DNA
computing algorithms and molecular biotechnologies, it
may soon be possible to develop powerful DNA computers that can be useful in certain aspects, such as
in an in vivo biochemical/genetic controlling system.
Unfortunately, at present, only the data are encoded into
DNA sequences; the programs used for data processing are still controlled by humans or with the aid of
electrical-mechanical systems. One of the most interesting open questions is how to create a truly molecular
computer (TMC), which means that not only the data,
but also the programs themselves are encoded into DNA
sequences, and execution of the program is not controlled by a human but rather by an automatic biological
system.
Acknowledgements
This research was supported by the National Science
Foundation of China through Grant 30500379.
References
Adleman, L.M., 1994. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024.
Braich, R.S., Chelyapov, N., Johnson, C., Rothemund, P.W.K., Adleman, L.M., 2002. Solution of a 20-variable 3-SAT problem on a DNA computer. Science 296, 499–502.
Faulhammer, D., Cukras, A.R., Lipton, R.J., Landweber, L.F., 2000. Molecular computation: RNA solutions to chess problems. Proc. Natl. Acad. Sci. U.S.A. 97, 1385–1389.
Lipton, R.J., 1995. Using DNA to solve NP-complete problems. Science 268, 542–545.
Liu, W., Gao, L., Zhang, Q., Xu, G., Zhu, X., Liu, X., Xu, J., 2005. A random walk DNA algorithm for the 3-SAT problem. Curr. Nanosci. 1, 85–90.
Liu, Q., Wang, L., Frutos, A.G., Condon, A.E., Corn, R.M., Smith, L.M., 2000. DNA computing on surfaces. Nature 403, 175–179.
Ogihara, M., Ray, A., 2000. DNA computing on a chip. Nature 403, 143–144.
Ouyang, Q., Kaplan, P.D., Liu, S., Libchaber, A., 1997. DNA solution of the maximal clique problem. Science 278, 446–449.
Sakamoto, K., Gouzu, H., Komiya, K., Kiga, D., Yokoyama, S., Yokomori, T., Hagiya, M., 2000. Molecular computation by DNA hairpin formation. Science 288, 1223–1226.
Selman, B., Mitchell, D., Levesque, H., 1996. Generating hard satisfiability problems. Artif. Intell. 81, 17–29.
Su, X., Smith, L.M., 2004. Demonstration of a universal surface DNA computer. Nucleic Acids Res. 32 (10), 3115–3123.
Yoshida, H., Suyama, A., 2000. Solution to 3-SAT by Breadth-First Search. In: DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, vol. 54. American Mathematical Society, Providence, RI, pp. 9–20.