Available online at www.sciencedirect.com
BioSystems 91 (2008) 117–125

Solving the SAT problem using a DNA computing algorithm based on ligase chain reaction

Xiaolong Wang ∗, Zhenmin Bao, Jingjie Hu, Shi Wang, Aibin Zhan

Department of Biotechnology, Ocean University of China, Qingdao 266003, People’s Republic of China

Received 4 April 2007; received in revised form 12 August 2007; accepted 17 August 2007

Abstract

A new DNA computing algorithm based on a ligase chain reaction is demonstrated to solve a SAT problem. The proposed DNA algorithm can solve an n-variable m-clause SAT problem in m steps, and the computation time required is $O(3m + n)$. Instead of generating the full-solution DNA library, we start with an empty test tube and then generate solutions that partially satisfy the SAT formula. These partial solutions are then extended step by step by the ligation of new variables using Taq DNA ligase. Correct strands are amplified, and false strands are pruned by a ligase chain reaction (LCR) as soon as they fail to satisfy the conditions. If we score and sort the clauses, we can use this algorithm to markedly reduce the number of DNA strands required throughout the computing process. In a computer simulation, the maximum number of DNA strands required was $2^{0.48n}$ when n = 50, and the exponent ratio varied inversely with the number of variables n and the clause/variable ratio m/n. This algorithm is highly space-efficient and error-tolerant compared with conventional brute-force searching, and thus can be scaled up to solve large and hard SAT problems.

© 2007 Elsevier Ireland Ltd. All rights reserved.

Keywords: DNA computing; SAT problem; Ligase chain reaction; Space complexity; Time complexity

1. Introduction

DNA computing is a newly emerging interdisciplinary science that uses molecular biotechnologies to solve problems in computer science or mathematics.
In their pioneering studies, Adleman and Lipton solved combinatorial problems, the Hamiltonian path problem (HPP) (Adleman, 1994) and the satisfiability problem (SAT) (Lipton, 1995), using DNA computing algorithms based on brute-force search. At the beginning of the computation, they constructed a DNA pool that contained the full solution space, and then extracted correct answers and/or eliminated false ones from the pool step by step. Thus, the number of distinct DNA strands required in the initial data pool grows exponentially with the size of the problem and eventually swamps the DNA data storage for large problems, which makes molecular computation impractical from the outset. It is generally believed that DNA computers using a brute-force search algorithm are limited to 60 to 70 variables (Lipton, 1995). Recently, a few new algorithms, such as the breadth-first search algorithm (Yoshida and Suyama, 2000) and the random walk algorithm (Liu et al., 2005), have been proposed. With the breadth-first search algorithm, the capacity of a DNA computer can theoretically be increased to about 120 variables (Yoshida and Suyama, 2000).

In the present study, we developed a new space-efficient DNA computing algorithm based on a ligase chain reaction (LCR), which uses only four operations: ligating, amplifying, splitting and merging. Instead of generating the full DNA library at the beginning, we start with an empty test tube and generate partial solutions that satisfy only the clauses treated so far. The partial solutions are extended through ligation of new variable DNA: correct solutions are selectively amplified, and false ones are pruned by LCR.

∗ Corresponding author. Tel.: +86 532 82031970. E-mail address: [email protected] (X. Wang).

0303-2647 – see front matter. doi:10.1016/j.biosystems.2007.08.006

2. DNA Code Design

The SAT problem is an NP-hard computational problem that, in the worst case, requires an exponential amount of time to solve with known sequential algorithms. Since all NP-complete problems can be encoded into SAT, the study of SAT problems has not only occupied a central role in theoretical computer science but has also served as a benchmark for testing the performance of DNA computers. An n-variable m-clause 3-SAT problem can be written as

$$F = \bigwedge_{j=1}^{m} C_j = \bigwedge_{j=1}^{m} \Big( \bigvee_{k=1}^{3} L_{jk} \Big),$$

where $C_j = (L_{j1} \vee L_{j2} \vee L_{j3})$; ∧ is the logical AND operation; ∨ is the logical OR operation; and the literals $L_{jk}$ (j = 1, 2, ..., m; k = 1, 2, 3) are either a variable $x_i$ or its negation $\sim x_i$ (i = 1, 2, ..., n), where ∼ is the logical NOT operation.

Fig. 1. The principle of the proposed DNA computer. (A) Ligate $x_i^v$ with $x_j^v$ to form $x_i^v - x_j^v$, amplify the ligation product by LCR, and then ligate $x_k^v$ with $x_i^v - x_j^v$ to form $x_i^v - x_j^v - x_k^v$. Note that s and s′ are, respectively, the sense and antisense strands of the linker sequence S (5′-ACTTTCCC-3′). (B) Computation of the literal $x_j$: DNA strands that contain $x_j^1$ ($x_j^0$) are (are not) amplified. (C) Computation of the literal $\sim x_j$: DNA strands that contain $x_j^0$ ($x_j^1$) are (are not) amplified.

Table 1. Oligonucleotides used to construct the DNA data.

For simplicity, in this proof-of-principle study we target a small example of the SAT problem, a 4-variable 8-clause 3-CNF Boolean formula:

$$F = (\sim x_0 \vee x_2 \vee x_1) \wedge (x_0 \vee x_1 \vee \sim x_2) \wedge (x_0 \vee \sim x_1 \vee x_2) \wedge (x_0 \vee \sim x_1 \vee \sim x_2) \wedge (x_0 \vee x_1 \vee x_2) \wedge (\sim x_0 \vee \sim x_3 \vee \sim x_2) \wedge (\sim x_0 \vee \sim x_1 \vee x_2) \wedge (\sim x_0 \vee \sim x_1 \vee \sim x_2)$$

Every possible solution can be represented by 4 binary variables written as $(x_0^v, x_1^v, x_2^v, x_3^v)$, where $x_i$ (i = 0 to 3) is the variable name and v is the value assigned to the variable: v = 1 represents True, and v = 0 represents False. There exists a unique correct solution, $(x_0^1, x_1^0, x_2^1, x_3^0)$.
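For comparison, the brute-force approach that this algorithm avoids can be written in a few lines of Python. The sketch below is our illustration, not part of the original study: it encodes F as a list of clauses, with each literal as a hypothetical `(i, polarity)` pair, and enumerates all $2^4 = 16$ assignments. This is exactly the full solution space that a Lipton-style initial DNA pool would have to contain, and the enumeration confirms the unique solution stated above.

```python
from itertools import product

# The 4-variable, 8-clause example formula F. Each literal is (variable index, polarity):
# polarity True means x_i, polarity False means ~x_i.
F = [
    [(0, False), (2, True), (1, True)],    # C1: (~x0 v x2 v x1)
    [(0, True), (1, True), (2, False)],    # C2: (x0 v x1 v ~x2)
    [(0, True), (1, False), (2, True)],    # C3: (x0 v ~x1 v x2)
    [(0, True), (1, False), (2, False)],   # C4: (x0 v ~x1 v ~x2)
    [(0, True), (1, True), (2, True)],     # C5: (x0 v x1 v x2)
    [(0, False), (3, False), (2, False)],  # C6: (~x0 v ~x3 v ~x2)
    [(0, False), (1, False), (2, True)],   # C7: (~x0 v ~x1 v x2)
    [(0, False), (1, False), (2, False)],  # C8: (~x0 v ~x1 v ~x2)
]

def satisfies(assignment, formula):
    """True if every clause contains at least one literal made true by the assignment."""
    return all(any(assignment[i] == int(pol) for i, pol in clause) for clause in formula)

def brute_force(formula, n):
    """Enumerate all 2**n assignments (x0, ..., x_{n-1}) and keep the satisfying ones."""
    return [a for a in product((0, 1), repeat=n) if satisfies(a, formula)]

print(brute_force(F, 4))  # [(1, 0, 1, 0)], i.e. x0 = 1, x1 = 0, x2 = 1, x3 = 0
```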
The data structure was designed in the form of double-stranded DNA (dsDNA). Sixteen pairs of oligonucleotides were synthesized and phosphorylated (Table 1). The first eight pairs, $x_i^v(+)$ and $x_i^v(-)$ (i = 0 to 3; v = 1 or 0), are the sense and antisense strands of the variables, respectively. Since $x_i^v(+)$ and $x_i^v(-)$ are complementary, they can be annealed to form eight dsDNA molecules, $x_i^v$ (i = 0 to 3; v = 1 or 0). Each molecule $x_i^v$ is 36 bp long and comprises a 20-bp core sequence (underlined in Table 1), an 8-bp upstream 3′ extension (sticky end) and an 8-bp downstream linker (blunt end). The 20-bp core sequences represent the value assignments of the variables. The sequences of the sticky ends and the linkers are identical. Another eight oligos, $P_i^v(-)$, are used as primers for LCR; their sequences are the same as those of $x_i^v(-)$ (i = 0 to 3; v = 1 or 0), except that they lack the upstream 3′-extension sequences. The last eight oligos, $Q_i^v(-)$, are 20 nt long, and their sequences are the same as the core sequences of $x_i^v(-)$ (i = 0 to 3; v = 1 or 0), without either the upstream 3′ extensions or the downstream linkers. The $Q_i^v(-)$ oligos have two uses: they act as LCR primers, and they are annealed with $x_i^v(+)$ to form eight 28-bp headers, $h\_x_i^v$. Each header has the same core sequence as $x_i^v$ (i = 0 to 3; v = 1 or 0), but it is blunt-ended upstream of the core sequence and carries its 3′ extension downstream. Because the downstream sticky end of $h\_x_i^v$ and the upstream 3′ extension of $x_j^v$ are complementary, any $h\_x_i^v$ can be ligated with any $x_j^v$ by Taq DNA ligase to form $x_i^v - x_j^v$ (Fig. 1A).
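The pairing rule behind this design can be illustrated with the linker sequence S from Fig. 1 used as an example 8-nt end. The Python sketch below is a simplified model, not a description of the actual oligonucleotides in Table 1: it assumes that two single-stranded 3′ extensions, both written 5′→3′, anneal (and can then be sealed by Taq DNA ligase) only when one is the exact reverse complement of the other, and that a single mismatch blocks ligation, as the Discussion states for Taq DNA ligase. The function names are hypothetical.

```python
# Watson-Crick complement of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the strand that pairs antiparallel with seq, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def can_anneal(end_a: str, end_b: str) -> bool:
    """Simplified ligation rule (assumption): two 3' extensions, both written 5'->3',
    pair only if they are perfectly complementary; any mismatch blocks ligation."""
    return len(end_a) == len(end_b) and end_a == reverse_complement(end_b)

S = "ACTTTCCC"                    # linker sequence S (5'-ACTTTCCC-3'), from Fig. 1
s_prime = reverse_complement(S)   # complementary strand s'
print(s_prime)                    # GGGAAAGT
print(can_anneal(S, s_prime))     # True
print(can_anneal(S, "GGGTAAGT"))  # False: one mismatched base
```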
The ligation product $x_i^v - x_j^v$ can be amplified by LCR with four primers, $x_i^v(+)$, $P_i^v(-)$, $x_j^v(+)$ and $Q_j^v(-)$. This regenerates the downstream sticky end in the product $x_i^v - x_j^v$, so that it can be ligated with $x_k^v$ to form $x_i^v - x_j^v - x_k^v$ (Fig. 1A). Moreover, the ligase chain reaction has sufficient computational power to solve SAT problems: to compute a literal, correct strands are selectively amplified by an LCR reaction that contains only the primers that satisfy the literal, and false ones are pruned because the primers that do not satisfy the literal are excluded (Fig. 1B and C).

Fig. 2. Graph $G_n$, which encodes the paths for SAT problems (n = 4; i, j, k, l = 1, 2, ..., n).

3. DNA Computing Algorithm

In this computation, we always start with one empty test tube. The set of DNA in the remaining test tubes corresponds to the graph $G_n$ (Fig. 2). The graph $G_n$ has n nodes, $x_i, x_j, x_k, x_l, \ldots$, where i, j, k, l ∈ [1, n]. Every node has two edges to each neighbor, and every edge carries a logic gate. Each logic gate holds a fixed inner value of 0 or 1, which is conceptually new and different from conventional logic gates. A SAT formula is taken as a constraint to generate paths through the nodes, the edges and the logic gates. All possible partial paths that start at the beginning and end at the latest node being treated are generated step by step for each literal and each clause. A path is called a true path and is retained if a literal in a clause is satisfied by the gate value of the corresponding node; otherwise, the path is considered a false path and is removed. All true paths for all literals of a clause are merged to form the set of true paths for that clause; these true paths are then extended and/or sieved for the next clause. Therefore, all, and only, the partial paths that satisfy all of the clauses treated so far are generated.
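The path-generation procedure can be simulated on an electronic computer. The Python sketch below is our own illustration under stated assumptions, not the authors' Program 1: a DNA strand is modeled as a tuple of $(i, v)$ blocks standing for $x_i^v$ in ligation order, the hypothetical `ligate_new` stands for ligating both value blocks of each new variable, and the hypothetical `lcr_select` stands for an LCR reaction containing only the primers that satisfy one literal. LCR selection is assumed to be perfect, so false strands never survive.

```python
from typing import List, Tuple

Literal = Tuple[int, bool]          # (variable index i, True for x_i / False for ~x_i)
Clause = List[Literal]
Path = Tuple[Tuple[int, int], ...]  # ((i, v), ...): blocks x_i^v in ligation order

def ligate_new(tube: List[Path], new_vars: List[int]) -> List[Path]:
    """Ligate: append both value blocks x_i^0 and x_i^1 of each new variable to every strand."""
    for var in new_vars:
        tube = [path + ((var, v),) for path in tube for v in (0, 1)]
    return tube

def lcr_select(tube: List[Path], lit: Literal) -> List[Path]:
    """LCR(t_jk, L_jk), idealized: keep only strands whose block for the literal's
    variable satisfies it (primers for the other value are absent)."""
    var, positive = lit
    return [path for path in tube if dict(path)[var] == int(positive)]

def solve(formula: List[Clause]) -> Tuple[List[Path], List[int]]:
    """Run the clause-by-clause procedure; return the final tube and N_j for each t_j."""
    tube: List[Path] = [()]   # t_0: no blocks yet (the empty tuple is a modeling placeholder)
    seen: List[int] = []      # variables already ligated, in order of first appearance
    sizes: List[int] = []
    for clause in formula:
        new_vars = []
        for var, _ in clause:
            if var not in seen and var not in new_vars:
                new_vars.append(var)
        seen.extend(new_vars)
        # Every split tube t_jk receives the same new blocks, so ligating once is equivalent.
        extended = ligate_new(tube, new_vars)
        merged = {}                  # merge: union of t_j1, t_j2, t_j3, order preserved
        for lit in clause:           # one selective LCR per literal
            for path in lcr_select(extended, lit):
                merged[path] = None
        tube = list(merged)
        sizes.append(len(tube))
    return tube, sizes

# Two-clause example from the text, (~x_i v ~x_j) ^ (x_i v ~x_k), with i, j, k = 0, 1, 2.
tube, sizes = solve([[(0, False), (1, False)], [(0, True), (2, False)]])
print(sizes)  # [3, 4]

# The 4-variable, 8-clause formula F of Section 2.
F = [
    [(0, False), (2, True), (1, True)], [(0, True), (1, True), (2, False)],
    [(0, True), (1, False), (2, True)], [(0, True), (1, False), (2, False)],
    [(0, True), (1, True), (2, True)], [(0, False), (3, False), (2, False)],
    [(0, False), (1, False), (2, True)], [(0, False), (1, False), (2, False)],
]
answer, sizes = solve(F)
print(sizes)   # [7, 6, 5, 4, 3, 4, 2, 1]
print(answer)  # [((0, 1), (2, 1), (1, 0), (3, 0))]
```

The counts 7, 6, 5, 4, 3, 4, 2, 1 match the number of distinct sequences listed for test tubes 1 to 8 in Table 2, and the two-clause run reproduces the three and four true paths of the worked example in this section. Ligating once before selecting is a simplification: in the laboratory each of the three split tubes is ligated separately, which yields the same set of strands.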
For example, given the SAT formula $(\sim x_i \vee \sim x_j) \wedge (x_i \vee \sim x_k)$, for the first clause all paths through the first two nodes are generated, among which the three true paths $x_i^0 - x_j^0$, $x_i^1 - x_j^0$ and $x_i^0 - x_j^1$ are retained and the false path $x_i^1 - x_j^1$ is removed. For the second clause, the true paths are extended to node $x_k$ to form six paths, among which the four true paths $x_i^0 - x_j^0 - x_k^0$, $x_i^1 - x_j^0 - x_k^0$, $x_i^0 - x_j^1 - x_k^0$ and $x_i^1 - x_j^0 - x_k^1$ are retained and the false paths $x_i^0 - x_j^0 - x_k^1$ and $x_i^0 - x_j^1 - x_k^1$ are removed.

The pseudocode for solving a 3-SAT problem with n variables and m clauses is shown in Program 1. In the computing process, $t_j$ contains all of the sequences that satisfy clauses $C_1$ to $C_j$. Strings that do not satisfy $C_1$ to $C_j$ are not produced, because the corresponding primers are absent from $t_{jk}$. After m steps of such operations guided by the SAT formula, all correct strings that satisfy all of the clauses are generated. The computation time is $O(3m + n)$ because the split, ligation, LCR and merge commands are executed m, n, m and m times, respectively.

4. Implementation of the Algorithm

The biotechnological implementation of the proposed DNA algorithm is shown in Fig. 1. The commands are described in detail below:

(1) Ligate($t_{jl}$, $x_i^v$), the ligation of new variables, was performed in a volume of 40 μL containing 1 μmol/L header or LCR product, 1 μmol/L $x_i^v$, 1X Taq DNA ligase buffer and 80 U Taq DNA ligase (NEB). The mixture was heated to 95 °C for 1 min, gradually cooled to 55 °C, and then incubated at 55 °C for 30 min.

(2) LCR($t_{jk}$, $L_{jk}$), the selective amplification of the ligation products in $t_{jk}$ according to $L_{jk}$, is performed by an LCR reaction that contains only the primers that satisfy the literal $L_{jk}$.
For example, amplification of $x_i^v - x_j^v - x_k^v$ according to $\sim x_j$ was performed in a total volume of 50 μL, using 10 ng of ligation product $x_i^v - x_j^v - x_k^v$, 100 nmol/L of each primer $x_i^v(+)$, $x_j^0(+)$, $x_k^v(+)$, $P_i^v(-)$, $P_j^v(-)$ and $Q_k^v(-)$, and 20 U Taq DNA ligase in 1X buffer (NEB). Amplification was carried out on a Biometra T-Gradient thermal cycler as follows: predenaturation at 94 °C for 1 min, followed by 25 cycles of denaturation at 94 °C for 20 s and annealing/ligation at 60 °C for 2 min, and a final ligation at 60 °C for 10 min.

Fig. 3. PAGE analysis of the products after each step of the DNA computation. Every lane is labeled with the name of the corresponding test tube. In lanes $t_{11}$ and $t_{12}$, the specific 64-bp bands indicate the ligation products $x_0^0 - x_2^v$ and $x_0^v - x_2^v$ in test tubes $t_{11}$ and $t_{12}$, respectively; lanes $t_1$ to $t_8$ show the LCR products merged in each step, namely $x_0^v - x_2^v - x_1^v$ (92 bp) in test tubes $t_1$ to $t_5$ and $x_0^v - x_2^v - x_1^v - x_3^v$ (120 bp) in $t_6$ to $t_8$; lane M1 is a 25-bp DNA ladder (MBI); lane M2 is a 20-bp DNA ladder (TaKaRa).

5. Results of the Lab Experiment

The DNA samples generated in each step were analyzed by electrophoresis on a 15% polyacrylamide gel for 0.5 h at 240 V; the results are shown in Fig. 3. To compute the first clause, DNA fragments $h\_x_0^v$ (28 bp) were added to three empty test tubes, $t_{11}$, $t_{12}$ and $t_{13}$, ligated with $x_2^v$ (36 bp) to form $x_0^v - x_2^v$ (64 bp), amplified by LCR to regenerate the sticky end, and then ligated with $x_1^v$ to produce $x_0^v - x_2^v - x_1^v$ (92 bp). The DNA strands in $t_{11}$, $t_{12}$ and $t_{13}$ were selectively amplified by LCR, which removed incorrect solutions, and merged into $t_1$. The DNA in $t_1$ ($x_0^v - x_2^v - x_1^v$) was split into $t_{21}$, $t_{22}$ and $t_{23}$, and further computation was carried out using Program 1 in a series of test tubes $t_j$ according to the corresponding clause $C_j$, j = 2, 3, ..., 8.
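The tube-level protocol just described can be summarized by a short Python helper that, given a formula, lists for each clause which tubes are used, which new variable blocks are ligated, and which literal each selective LCR enforces. This is a hypothetical illustration of the steps reported for the example (header $h\_x_0^v$, ligation of $x_2^v$ and $x_1^v$ in step 1, ligation of $x_3^v$ in step 6), not the authors' Program 1; `lab_schedule` and the operation labels are our own names, and reagent amounts and thermal profiles are omitted.

```python
from typing import Dict, List, Tuple

Literal = Tuple[int, bool]  # (variable index i, True for x_i / False for ~x_i)

def lit_name(lit: Literal) -> str:
    var, positive = lit
    return ("x" if positive else "~x") + str(var)

def lab_schedule(formula: List[List[Literal]]) -> List[Dict]:
    """Hypothetical helper: for each clause C_j, list the operations applied in the
    split tubes t_j1..t_j3 before they are merged into t_j."""
    seen: List[int] = []
    steps: List[Dict] = []
    for j, clause in enumerate(formula, start=1):
        # Variables appearing for the first time, in order of appearance.
        new_vars = []
        for var, _ in clause:
            if var not in seen and var not in new_vars:
                new_vars.append(var)
        seen.extend(new_vars)

        ops: List[str] = []
        to_ligate = new_vars
        if j == 1:
            # The first clause starts from empty tubes: its first variable enters as a header.
            ops.append(f"add header h_x{new_vars[0]}")
            to_ligate = new_vars[1:]
        for n, var in enumerate(to_ligate):
            if n > 0:
                # Regenerate the downstream sticky end before the next ligation.
                ops.append("LCR to regenerate sticky end")
            ops.append(f"ligate x{var}")

        tubes = {f"t{j}{k}": ops + [f"LCR selecting {lit_name(lit)}"]
                 for k, lit in enumerate(clause, start=1)}
        source = "three empty tubes" if j == 1 else f"split t{j - 1}"
        steps.append({"source": source, "tubes": tubes, "merge_into": f"t{j}"})
    return steps

F = [
    [(0, False), (2, True), (1, True)], [(0, True), (1, True), (2, False)],
    [(0, True), (1, False), (2, True)], [(0, True), (1, False), (2, False)],
    [(0, True), (1, True), (2, True)], [(0, False), (3, False), (2, False)],
    [(0, False), (1, False), (2, True)], [(0, False), (1, False), (2, False)],
]
for step in lab_schedule(F):
    print(step["source"], "->", step["merge_into"])
    for tube, ops in step["tubes"].items():
        print("   ", tube, ops)
```

Only clauses that introduce a new variable (here $C_1$ and $C_6$) need ligation; all other steps consist of a split, three selective LCRs and a merge.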
Note that to compute the sixth clause, $(\sim x_0 \vee \sim x_3 \vee \sim x_2)$, since this is the first appearance of $x_3$, the DNA in $t_{61}$, $t_{62}$ and $t_{63}$ was ligated with $x_3^v$ to form $x_0^v - x_2^v - x_1^v - x_3^v$ (120 bp), then selectively amplified by LCR and merged into $t_6$. The 64-bp bands shown in lanes $t_1$ through $t_8$ are incomplete solutions, such as $x_i^v - x_j^v$, formed in the LCR reactions; these incomplete solutions do not cause serious problems, since they can be removed by size separation or easily recognized by sequencing the final answer DNA. After eight steps of operations guided by the SAT formula, the answer DNA was finally produced in test tube $t_8$.

The answer DNA in each step was analyzed by molecular cloning and sequencing. The answer DNA in $t_1$ through $t_8$ was re-amplified by LCR to make both ends blunt, and the LCR product was separated by electrophoresis on a 15% polyacrylamide gel for 0.5 h at 240 V. After ethidium bromide (EB) staining, the desired DNA band was excised under UV light, purified and ligated into pUC19 plasmid that had been digested with EcoRV. The plasmid containing the answer DNA was transformed into E. coli JM109 and cloned; for each sample, 50 colonies were then selected and propagated in liquid LB medium containing 50 μg/mL ampicillin. The plasmid DNA was extracted and sequenced. The sequences were decoded according to Table 1 and classified. The results are consistent with expectations (Table 2). The sequencing results for the last test tube ($t_8$) are identical (Fig. 4), and from these the answer was decoded as $x_0^1 - x_2^1 - x_1^0 - x_3^0$. Thus, this simple example of a 3-SAT problem was solved using this DNA computing algorithm.

Fig. 4. Sequence of the final answer DNA. The upstream and downstream sequencing primers are underlined and italicized. The inserted sequence lies between the two EcoRV cutting points, which are underlined in bold. The core sequences of the variables are underlined and marked with their names and values, which represent the unique answer $x_0^1 - x_2^1 - x_1^0 - x_3^0$.

6. Evaluation of the Space Complexity of the DNA Algorithm

For a given m-clause SAT formula F, as j grows from 1 to m during the computation, the number of different DNA strands in $t_j$ equals the number of true assignments ($N_j$) for the first j clauses. The space complexity of this algorithm is the maximum number of DNA strands produced in the test tubes $t_j$, $\max\{N_j \mid j = 1, \ldots, m\}$, which is always smaller than the full solution space ($2^n$).

To investigate the space complexity of this algorithm, randomly generated 3-CNF SAT problems were solved using a home-made simulation program running on an HP ProLiant workstation and an 80-node PC cluster. To generate sample formulas, we wrote a program that takes a range for the number of variables, $n_1$ to $n_2$, and a range for the clause/variable ratio, $r_1$ to $r_2$, and constructs formulas of n variables and m clauses, with $n \in [n_1, n_2]$ and $m/n \in [r_1, r_2]$. When generating a clause, three literals were selected repeatedly, independently and with equal probability, while the clause was kept free of complementary and identical literals.

In both conventional and molecular computing studies of SAT problems, we are mainly concerned with hard SAT problems. It is well known that random k-SAT undergoes a phase transition in computational complexity for k > 2. This transition occurs at specific values of m/n, the ratio of the number of clauses (m) to the number of variables (n). For low values of m/n, the problem is easy to solve, and many valuations of the variables solve the problem; in this regime we say that the problem is under-constrained. For large values of m/n, the problem is over-constrained, and it is easy to discover early in the search process that no solution exists.
However, around the critical value of m/n, the problem becomes computationally hard, and sometimes only a single valuation out of the $2^n$ possible combinations satisfies the formula. For random 3-SAT, it is well known that the hardest 3-CNF SAT instances occur at a clause/variable ratio of around 4.25 (Selman et al., 1996). We therefore generated 50,000 random 3-CNF SAT formulas, with the number of variables $n \in [5, 50]$ (more than 1000 for each n) and the clause/variable ratio $m/n \in [1, 50]$, and then investigated how the number of partial assignments changes as the algorithm runs.

Table 2. The sequencing results for each sample.

| Test tube no. | Clause treated | Sequences decoded |
|---|---|---|
| 1 | $C_1$: $(\sim x_0 \vee x_2 \vee x_1)$ | $x_0^0 - x_2^0 - x_1^0$, $x_0^0 - x_2^0 - x_1^1$, $x_0^0 - x_2^1 - x_1^0$, $x_0^0 - x_2^1 - x_1^1$, $x_0^1 - x_2^0 - x_1^1$, $x_0^1 - x_2^1 - x_1^0$, $x_0^1 - x_2^1 - x_1^1$ |
| 2 | $C_2$: $(x_0 \vee x_1 \vee \sim x_2)$ | $x_0^0 - x_2^0 - x_1^0$, $x_0^0 - x_2^0 - x_1^1$, $x_0^0 - x_2^1 - x_1^1$, $x_0^1 - x_2^1 - x_1^0$, $x_0^1 - x_2^0 - x_1^1$, $x_0^1 - x_2^1 - x_1^1$ |
| 3 | $C_3$: $(x_0 \vee \sim x_1 \vee x_2)$ | $x_0^0 - x_2^0 - x_1^0$, $x_0^0 - x_2^1 - x_1^1$, $x_0^1 - x_2^0 - x_1^1$, $x_0^1 - x_2^1 - x_1^0$, $x_0^1 - x_2^1 - x_1^1$ |
| 4 | $C_4$: $(x_0 \vee \sim x_1 \vee \sim x_2)$ | $x_0^0 - x_2^0 - x_1^0$, $x_0^1 - x_2^0 - x_1^1$, $x_0^1 - x_2^1 - x_1^0$, $x_0^1 - x_2^1 - x_1^1$ |
| 5 | $C_5$: $(x_0 \vee x_1 \vee x_2)$ | $x_0^1 - x_2^0 - x_1^1$, $x_0^1 - x_2^1 - x_1^0$, $x_0^1 - x_2^1 - x_1^1$ |
| 6 | $C_6$: $(\sim x_0 \vee \sim x_3 \vee \sim x_2)$ | $x_0^1 - x_2^0 - x_1^1 - x_3^0$, $x_0^1 - x_2^1 - x_1^0 - x_3^0$, $x_0^1 - x_2^0 - x_1^1 - x_3^1$, $x_0^1 - x_2^1 - x_1^1 - x_3^0$ |
| 7 | $C_7$: $(\sim x_0 \vee \sim x_1 \vee x_2)$ | $x_0^1 - x_2^1 - x_1^0 - x_3^0$, $x_0^1 - x_2^1 - x_1^1 - x_3^0$ |
| 8 | $C_8$: $(\sim x_0 \vee \sim x_1 \vee \sim x_2)$ | $x_0^1 - x_2^1 - x_1^0 - x_3^0$ |

Selective LCR amplification helps to decrease the number of partial assignments, and the more often a variable appears, the more LCR selections it brings. Therefore, we adopted a selection-first strategy: the clauses are scored and sorted so that selections are made as soon as possible after a new variable is ligated. Once a 3-SAT formula $F = C_1 \wedge C_2 \wedge \ldots \wedge C_m$ is generated, the clauses $C_j$ are scored by

$$W_{j1} = \sum_{k=1}^{3} \log(q_i), \qquad W_{j2} = \sum_{k=1}^{3} \log(i+1)/n, \qquad W_j = W_{j1}/n + W_{j2}/m, \qquad j = 1, \ldots, m,$$

where i and $q_i$ are, respectively, the index and the number of occurrences of the variable $x_i$ in the literal $L_{jk}$. The clauses are then sorted in descending order of $W_j$, which converts F into an equivalent form $F' = C'_1 \wedge C'_2 \wedge \ldots \wedge C'_m$, where $C'_j \in \{C_1, C_2, \ldots, C_m\}$ and $W'_1 > W'_2 > \ldots > W'_m$. The order of the clauses is determined mainly by $W_{j1}$ and fine-tuned by $W_{j2}$. Next, F′ was solved with the simulation program for this algorithm running on an electronic computer, which gave the maximum number of partial assignments (λ) and the exponent ratio $\rho = (\log_2 \lambda)/n$. The average and maximum ratios ρ for random 3-SAT problems are shown in Fig. 5.

The number of assignments in the initial pool generated by Lipton's brute-force algorithm is $2^n$, so the exponent ratio for a brute-force algorithm is a constant, $\rho_{\mathrm{Lipton}} = 1.0$. The observed ratio ρ for this algorithm decreases almost linearly with increasing n and m/n. When n = 50, the overall average and maximum numbers of DNA strands required are $2^{0.4198n}$ and $2^{0.48n}$, respectively. If the bound $2^{0.48n}$ holds, or decreases further, for 3-SAT problems with more variables, this algorithm should make it possible to solve large and hard 3-SAT problems with a much smaller amount of DNA than the conventional brute-force method.

Fig. 5. Average and maximum exponent ratio ($\rho = (\log_2 \lambda)/n$) for different n and m/n ratios. Data were calculated from 50,000 random 3-CNF SAT instances with number of variables n ∈ [5, 50] and clause/variable ratio m/n ∈ [1, 50].

Fig. 6. Incidence of the exponent ratio ($\rho = (\log_2 \lambda)/n$) obtained from 10,000 random 3-CNF SAT instances with number of variables n = 50 and clause/variable ratio m/n ∈ [1, 50].

Additionally, as Fig. 6 indicates, the worst case ($2^{0.48n}$) is rare; in most cases ρ is below 0.446, that is, fewer than $2^{0.446n}$ strands are required.
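The simulation pipeline of this section can be sketched in Python as follows. This is our reconstruction from the description above, not the authors' home-made program: clauses are drawn by rejection sampling, the scores use natural logarithms (the base does not change the ordering, since both terms are scaled by the same factor), variable indices are assumed to be 0-based as in the example $x_0$ to $x_3$, and λ is computed from the merged tubes $t_j$ only, following the definition $\max\{N_j\}$. The instance size (n = 12, m ≈ 4.25n), the seed and all function names are arbitrary illustrative choices, kept small so that the exhaustive partial-assignment lists fit in memory.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple

Literal = Tuple[int, bool]   # (variable index i, True for x_i / False for ~x_i)
Clause = List[Literal]

def random_3cnf(n: int, m: int, rng: random.Random) -> List[Clause]:
    """Draw m clauses; each takes three literals independently and uniformly from the
    2n literals, redrawn until no variable repeats (no identical or complementary literals)."""
    formula = []
    while len(formula) < m:
        clause = [(rng.randrange(n), rng.random() < 0.5) for _ in range(3)]
        if len({var for var, _ in clause}) == 3:
            formula.append(clause)
    return formula

def sort_clauses(formula: List[Clause], n: int) -> List[Clause]:
    """Selection-first ordering: sort clauses by W_j = W_j1/n + W_j2/m, descending."""
    m = len(formula)
    q = Counter(var for clause in formula for var, _ in clause)  # occurrences q_i

    def w(clause: Clause) -> float:
        w1 = sum(math.log(q[var]) for var, _ in clause)
        w2 = sum(math.log(var + 1) / n for var, _ in clause)
        return w1 / n + w2 / m

    return sorted(formula, key=w, reverse=True)

def max_partial_assignments(formula: List[Clause]) -> int:
    """lambda = max_j N_j, where N_j counts partial assignments satisfying C_1..C_j."""
    tube: List[Dict[int, int]] = [{}]
    seen = set()
    lam = 0
    for clause in formula:
        for var, _ in clause:
            if var not in seen:  # new variable: extend with both values
                seen.add(var)
                tube = [{**a, var: v} for a in tube for v in (0, 1)]
        tube = [a for a in tube if any(a[var] == int(pos) for var, pos in clause)]
        lam = max(lam, len(tube))
    return lam

def exponent_ratio(lam: int, n: int) -> float:
    """rho = log2(lambda) / n; a brute-force pool corresponds to rho = 1."""
    return math.log2(lam) / n

rng = random.Random(2007)
n = 12
m = round(4.25 * n)  # near the hard region for random 3-SAT
F = random_3cnf(n, m, rng)
print("unsorted rho:", exponent_ratio(max_partial_assignments(F), n))
print("sorted rho:  ", exponent_ratio(max_partial_assignments(sort_clauses(F, n)), n))
```

Averaging `exponent_ratio` over many such instances for each n and m/n would produce curves of the kind summarized in Fig. 5; no numbers are quoted here because this sketch is not the program that produced them.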
Thus, based on the analyses in Sections 3 and 6, we propose the hypothesis that most large and hard SAT problems can be solved on a DNA computer with a time complexity of at most $O(3m + n)$ and a space complexity of less than $2^{0.48n}$.

7. Discussion

Even though the sample SAT problem solved here is very small, the proposed DNA computing algorithm has several advantages over the conventional brute-force search algorithm. First, it eliminates the need to construct a full-solution DNA library. The initial test tube ($t_0$) is empty instead of containing the full-solution data pool, and the other test tubes $t_j$ (j = 1 to m) contain only strings that satisfy clauses $C_1$ to $C_j$. This greatly reduces the number of DNA strands required in the DNA computation and makes it possible to extend this approach to large combinatorial problems.

Second, the special characteristics of the graph used to construct the paths for the SAT problem not only help to reduce the number of DNA strands required but also bring some new concepts into the graph theory of combinatorial problems. The proposed graph differs from Lipton's in two main respects. First, every edge is embedded with a logic gate that holds a value assignment for the corresponding node and checks whether a path is allowed, and paths are generated step by step, restricted by the clauses of the SAT formula. Therefore, only true paths that satisfy the clauses treated so far are retained. In contrast, in Lipton's graph, all $2^n$ possible paths are generated at the beginning of the computation. Second, the nodes are ordered according to the first appearance of the variables in the SAT formula instead of by their indices.
Compared with previous algorithms (Adleman, 1994; Lipton, 1995; Yoshida and Suyama, 2000; Liu et al., 2005; Ouyang et al., 1997; Sakamoto et al., 2000; Braich et al., 2002; Liu et al., 2000; Su and Smith, 2004; Faulhammer et al., 2000; Ogihara and Ray, 2000), in which the variables are commonly connected in increasing order of their indices, this feature makes the present algorithm much easier to handle and makes it possible to score and sort the clauses, which shrinks the search space.

Third, this DNA computing algorithm is highly error-tolerant. In our experimental implementation of this algorithm, only one enzyme, Taq DNA ligase, was used to carry out all of the enzymatic operations. Taq DNA ligase is a highly accurate DNA ligase that has only nick-repairing activity. It can only ligate perfectly matched sticky ends, which means that a one-base-pair mismatch in the sticky ends is enough to prevent ligation. This intrinsically accurate, sequence-specific ligation makes Taq DNA ligase probably one of the best tools for DNA computing. Other enzymatic DNA operations, such as PCR amplification and restriction digestion, have been used successfully to solve the maximal clique problem (Ouyang et al., 1997). Those authors pointed out that major errors arise from two sources. The first is the production of single-stranded DNA (ssDNA) during PCR, which cannot be cut by restriction enzymes. The second is the incomplete cutting of double-stranded DNA (dsDNA) by restriction enzymes, which also leads to incorrect answers (Ouyang et al., 1997). Previously, we solved a SAT problem using PCR to amplify the ligation product and restriction endonuclease digestion to prune incorrect DNA strands (unpublished). However, these methods are impractical for large problems, since it becomes increasingly difficult to handle more and more different restriction enzymes, which often work in different buffer solutions.
In the present study, all of the enzymatic operations, including ligating, amplifying and pruning, were carried out with the same enzyme in the same buffer. This makes the whole computing process simple, fully automatable, and practical for large problems.

Generally, PCR-based algorithms are error-prone, because the key enzyme in PCR, Taq DNA polymerase, produces not only the desired DNA strands but sometimes also unwanted sequences when the primers anneal non-specifically to the wrong sequences (mismatching primers). More serious errors, such as point mutations in the restriction sites, may arise when PCR is used to amplify the solution DNA, which often must contain many restriction sites (Ouyang et al., 1997). In PCR, primers annealed to the template are elongated nucleotide by nucleotide, which is the main source of point mutations. In contrast, in LCR they are extended block by block: all DNA sequence blocks are synthesized before the computation begins, and during the computation we only have to ligate these pre-synthesized blocks properly. In this LCR-based algorithm, all of the above-mentioned sources of error, such as point mutations, mismatching primers, and incomplete cutting of ssDNA and dsDNA, are eliminated. Thanks to the high fidelity of LCR, this procedure acts as an exponential amplifier with a larger exponent for correct strands than for incorrect ones. Therefore, by repeating the LCR process, we should be able to reduce the noise arising from the remaining false strands in the template DNA. This algorithm is therefore inherently more reliable than conventional PCR-based algorithms.

As noted by Adleman (1994), the major advantages of DNA computing lie in its high information-storage capacity and parallel-computation power.
The present algorithm takes advantage of these features of DNA computing without generating an initial data pool that contains every possible solution. Compared with conventional brute-force algorithms, this algorithm is more space-efficient and error-tolerant. Although the number of DNA strands required still increases exponentially with the size of the problem, the exponent ratio is less than half that of brute-force algorithms. We believe that it can be scaled up to solve large and hard SAT problems. The maximum number of variables it can handle depends on how many cycles of ligation and LCR amplification can be performed to extend the DNA strands without serious error. In the present study, we performed three steps of extension and obtained a 4-bit DNA solution. This process can theoretically proceed for as many steps as desired, while the practical number of steps must be determined by further experiments.

In conclusion, we believe that the algorithm described here represents an important contribution to the development of DNA computing models, and there is great potential for further exploration of DNA computers. This algorithm starts with an empty test tube, since an empty start actually means that all of the possibilities are included. With continual improvements in both DNA computing algorithms and molecular biotechnologies, it may soon be possible to develop powerful DNA computers that are useful in certain applications, such as in vivo biochemical/genetic control systems. Unfortunately, at present, only the data are encoded into DNA sequences; the programs used for data processing are still controlled by humans or with the aid of electro-mechanical systems.
One of the most interesting open questions is how to create a truly molecular computer (TMC), in which not only the data but also the programs themselves are encoded into DNA sequences, and program execution is controlled not by a human but by an automatic biological system.

Acknowledgements

This research was supported by the National Science Foundation of China through Grant 30500379.

References

Adleman, L.M., 1994. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024.
Braich, R.S., Chelyapov, N., Johnson, C., Rothemund, P.W.K., Adleman, L.M., 2002. Solution of a 20-variable 3-SAT problem on a DNA computer. Science 296, 499–502.
Faulhammer, D., Cukras, A.R., Lipton, R.J., Landweber, L.F., 2000. Molecular computation: RNA solutions to chess problems. Proc. Natl. Acad. Sci. U.S.A. 97, 1385–1389.
Lipton, R.J., 1995. Using DNA to solve NP-complete problems. Science 268, 542–545.
Liu, W., Gao, L., Zhang, Q., Xu, G., Zhu, X., Liu, X., Xu, J., 2005. A random walk DNA algorithm for the 3-SAT problem. Curr. Nanosci. 1, 85–90.
Liu, Q., Wang, L., Frutos, A.G., Condon, A.E., Corn, R.M., Smith, L.M., 2000. DNA computing on surfaces. Nature 403, 175–179.
Ogihara, M., Ray, A., 2000. DNA computing on a chip. Nature 403, 143–144.
Ouyang, Q., Kaplan, P.D., Liu, S., Libchaber, A., 1997. DNA solution of the maximal clique problem. Science 278, 446–449.
Sakamoto, K., Gouzu, H., Komiya, K., Kiga, D., Yokoyama, S., Yokomori, T., Hagiya, M., 2000. Molecular computation by DNA hairpin formation. Science 288, 1223–1226.
Selman, B., Mitchell, D., Levesque, H., 1996. Generating hard satisfiability problems. Artif. Intell. 81, 17–29.
Su, X., Smith, L.M., 2004. Demonstration of a universal surface DNA computer. Nucleic Acids Res. 32 (10), 3115–3123.
Yoshida, H., Suyama, A., 2000. Solution to 3-SAT by breadth-first search. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 54. American Mathematical Society, Providence, RI, pp. 9–20.