Satisfiability of 3-CNF
Abstract
arXiv:1609.05709v1 [cs.CC] 25 Jul 2016
The relationship between the complexity classes P and NP is an unsolved question in the
nor P = NP was “unprovable” within the a-temporal framework of Mathematics. See [4].
Part of the proof of the impossibility of proving that P = NP turns out to be inexact, and
This paper proposes a new way to decide the satisfiability of any 3-CNF-SAT problem. The
analysis of this exact [non-heuristic] algorithm shows a strictly bounded exponential
descriptor functions) of the evolving set of solutions as a function of the already considered
clauses, without exploring these solutions. Any remarks about this paper are warmly welcome.
Index Terms
Boolean formulae are built in the usual way from propositional variables xi and the logical
A 3-CNF formula ϕ is a Boolean formula in conjunctive normal form with exactly three
literals per clause, like ϕ := (x1 ∨ x2 ∨ ¬x3 ) ∧ (¬x2 ∨ x3 ∨ ¬x4 ) := ψ1 ∧ ψ2 . A 3-CNF formula
logical values for the propositional variables, so that ϕ can be true. Until now, we do not
know whether it is possible or not to check the satisfiability of any given 3-CNF formula ϕ
A. Definitions
The size of a 3-CNF formula ϕ is defined as the size of the corresponding Boolean circuit,
i.e. the number of logical connectives in ϕ. Let us note the following property :
where α = m/n is the ratio of clauses with respect to variables. It seems that α ≈ 4.258
B. Examples
Let A be such a matrix, A can be extended to new propositional variables by adding columns
filled with the neutral sign “.”, meaning that the corresponding variable can be set either to
B. Reduction of 3-CNF-matrices
The inverse operation, called reduction, replaces two lines that differ only by a 0 and a
1 for one variable with a single line carrying a neutral sign for this variable:
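\[
A = \left(\begin{array}{ccc}
x_1 & x_2 & x_3 \\ \hline
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{array}\right)
\equiv
\left(\begin{array}{ccc}
x_1 & x_2 & x_3 \\ \hline
0 & 0 & . \\
0 & 1 & . \\
1 & 0 & . \\
1 & 1 & 0
\end{array}\right)
\equiv
\left(\begin{array}{ccc}
x_1 & x_2 & x_3 \\ \hline
0 & . & . \\
1 & 0 & . \\
1 & 1 & 0
\end{array}\right)
\tag{5}
\]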
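The reduction step of example (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rows are encoded as strings over `0`, `1` and `.`, the function name `reduce_rows` is hypothetical, and columns are scanned from the last variable to the first, which is one possible order among several (the result of a reduction is not unique in general).

```python
def reduce_rows(rows):
    """Merge rows that differ only by a 0/1 in one column into a row with '.'."""
    rows = set(rows)
    width = len(next(iter(rows))) if rows else 0
    changed = True
    while changed:
        changed = False
        # Scan columns from the last variable to the first, as in example (5).
        for col in reversed(range(width)):
            for row in sorted(rows):
                if row not in rows or row[col] != "0":
                    continue
                partner = row[:col] + "1" + row[col + 1:]
                if partner in rows:
                    # Replace the pair by a single row with a neutral sign.
                    rows -= {row, partner}
                    rows.add(row[:col] + "." + row[col + 1:])
                    changed = True
    return sorted(rows)


A = ["000", "001", "010", "011", "100", "101", "110"]
reduced = reduce_rows(A)
print(reduced)
```

With this scanning order the seven lines of A collapse to the three lines of the right-most matrix in (5).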
C. Disjunction of 3-CNF-matrices
Let A and B be two matrices and {x1 , · · · , xn } the union of their support variables. Let $\bar{A}$
and $\bar{B}$ be their extensions over {x1 , · · · , xn }. Then we define the disjunction of A and B by
\[
A \vee B = \left(\begin{array}{c}
x_1 \cdots x_n \\ \hline
\bar{A} \\
\bar{B}
\end{array}\right)
\tag{6}
\]
Of course, this new matrix should be reordered so that the lines are in ascending order,
which sometimes requires replacing a line with a neutral sign by two lines, one with a one
and one with a zero.
Let A be a matrix such that the reduction process yields lines with neutral signs, then A can
The block decomposition is not unique. For instance, there are 6 different block decompo-
E. Conjunction of 3-CNF-matrices
Let A and B be two matrices, A and B their extensions to the joint set of propositional
\[
A = \bigvee_{k=1}^{\Sigma_A} A_k \quad\text{and}\quad B = \bigvee_{l=1}^{\Sigma_B} B_l \tag{7}
\]
where
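\[
C^{k,l} =
\left(\begin{array}{ccc} x_1 & x_i & x_n \\ \hline a_k^1 & a_k^i & a_k^n \end{array}\right)
\wedge
\left(\begin{array}{ccc} x_1 & x_i & x_n \\ \hline b_l^1 & b_l^i & b_l^n \end{array}\right)
=
\begin{cases}
\emptyset & \text{if } \exists\, c_m^i = \text{“NaN”} \\[4pt]
\left(\begin{array}{ccc} x_1 & x_i & x_n \\ \hline c_m^1 & c_m^i & c_m^n \end{array}\right) & \text{otherwise}
\end{cases}
\tag{9}
\]
and
\[
c_m^i =
\begin{cases}
a_k^i & \text{if } a_k^i = b_l^i \\
a_k^i & \text{if } a_k^i \neq b_l^i \text{ and } b_l^i = \text{“·”} \\
b_l^i & \text{if } a_k^i \neq b_l^i \text{ and } a_k^i = \text{“·”} \\
\text{“NaN”} & \text{otherwise}
\end{cases}
\tag{10}
\]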
Let us call ∅, the empty matrix, with no line at all. The empty matrix is neutral for the
Let us define Ω, the full matrix, as a one line matrix with only neutral signs in it. The full
G. Example of operations
Let us consider the following block decompositions for [ψ1 ] and [ψ2 ] with x2 and (x2 x3 )
as common supports.
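\[
[\psi_1] =
\left(\begin{array}{ccc}
x_1 & x_2 & x_3 \\ \hline
0 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 1 & 1
\end{array}\right)
=
\left(\begin{array}{c} x_2 \\ \hline 1 \end{array}\right)
\vee
\left(\begin{array}{cc} x_2 & x_3 \\ \hline 0 & 0 \end{array}\right)
\vee
\left(\begin{array}{ccc} x_1 & x_2 & x_3 \\ \hline 1 & 0 & 1 \end{array}\right)
\]
and
\[
[\psi_2] =
\left(\begin{array}{ccc}
x_2 & x_3 & x_4 \\ \hline
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 1 & 0 \\
1 & 1 & 1
\end{array}\right)
=
\left(\begin{array}{c} x_2 \\ \hline 0 \end{array}\right)
\vee
\left(\begin{array}{cc} x_2 & x_3 \\ \hline 1 & 1 \end{array}\right)
\vee
\left(\begin{array}{ccc} x_2 & x_3 & x_4 \\ \hline 1 & 0 & 0 \end{array}\right)
\]
\[
[\psi_1] \wedge [\psi_2]
= \emptyset \vee
\left(\begin{array}{cc} x_2 & x_3 \\ \hline 1 & 1 \end{array}\right)
\vee
\left(\begin{array}{ccc} x_2 & x_3 & x_4 \\ \hline 1 & 0 & 0 \end{array}\right)
\vee
\left(\begin{array}{cc} x_2 & x_3 \\ \hline 0 & 0 \end{array}\right)
\vee \emptyset \vee \emptyset \vee
\left(\begin{array}{ccc} x_1 & x_2 & x_3 \\ \hline 1 & 0 & 1 \end{array}\right)
\vee \emptyset \vee \emptyset
\]
\[
=
\left(\begin{array}{cccc}
x_1 & x_2 & x_3 & x_4 \\ \hline
. & 0 & 0 & . \\
. & 1 & 0 & 0 \\
. & 1 & 1 & . \\
1 & 0 & 1 & .
\end{array}\right)
\]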
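The worked conjunction above can be replayed with a short Python sketch. It assumes that every block is first extended to the common support (x1, x2, x3, x4) and encoded as a string over `0`, `1` and `.`; the names `merge_rows` and `conjunction` are illustrative and do not come from the paper. `None` stands for the “NaN” case, i.e. for an empty product.

```python
from itertools import product

NEUTRAL = "."


def merge_rows(a, b):
    """Cell-wise conjunction of two rows; None marks an empty result."""
    merged = []
    for x, y in zip(a, b):
        if x == y or y == NEUTRAL:
            merged.append(x)
        elif x == NEUTRAL:
            merged.append(y)
        else:
            return None  # a 0 meets a 1: no common solution
    return "".join(merged)


def conjunction(A, B):
    """Conjunction of two matrices given as lists of rows on the same support."""
    rows = {merge_rows(a, b) for a, b in product(A, B)}
    rows.discard(None)
    return sorted(rows)


# Block decompositions of [psi_1] and [psi_2], extended to (x1, x2, x3, x4).
psi1 = [".1..", ".00.", "101."]
psi2 = [".0..", ".11.", ".100"]
result = conjunction(psi1, psi2)
print(result)
```

Of the nine pairwise products, five are empty and four survive, which gives the four lines of the final matrix.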
Let us note A the set of all the 3-CNF-matrices. Then (A, ∨) and (A, ∧) are both semi-
Let us define the two absorption laws as x = x ∨ (x ∧ y) and its dual x = x ∧ (x ∨ y). A
(A, ∨, ∧) is a lattice over the set of 3-CNF-matrices with respect to the disjunction and
distributive with respect to ∨ and A ∨ Ω = Ω & A ∧ ∅ = ∅ ∀A ∈ A. See [1] for more details
over lattices.
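\[
[\varphi] = \bigvee_{(\alpha_1,\cdots,\alpha_n)\in\{0,1\}^n}
\left(\begin{array}{ccccc}
x_1 & \cdots & x_i & \cdots & x_n \\ \hline
f_1(\alpha_1) & \cdots & f_i(\alpha_1,\cdots,\alpha_i) & \cdots & f_n(\alpha_1,\cdots,\alpha_n)
\end{array}\right)
\tag{11}
\]
\[
\overset{\text{notation}}{\equiv}
\begin{pmatrix}
f_1(\alpha_1) \\ \vdots \\ f_n(\alpha_1,\cdots,\alpha_n)
\end{pmatrix}
\tag{12}
\]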
Example :
Proof:
• Let the theorem be true for n − 1 and [ϕ] be a 3-CNF-matrix of dimension n. There exist
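\[
\text{Thus } [\varphi] = \bigvee_{\alpha_i\in\{0,1\}}
\left(\begin{array}{ccc}
x_1 & \cdots & x_n \\ \hline
h_1(\alpha_1) & \cdots & h_n(\alpha_1,\cdots,\alpha_n)
\end{array}\right)
\]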
where
h1 (α1 ) = α1
binations of αi .
Example :
Consider a clause ψ ≡ [¬]xr ∨ [¬]xs ∨ [¬]xt where 1 ≤ r < s < t ≤ n. [ψ] is then characterized
hi (α1 , · · · , αi ) = αi ∀i<t
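\[
h_t(\alpha_r,\alpha_s,\alpha_t) =
\begin{cases}
(\alpha_r+1)(\alpha_s+1)(\alpha_t+1)+\alpha_t & \text{if } \psi = x_r \vee x_s \vee x_t \\
(\alpha_r+1)(\alpha_s+1)\,\alpha_t+\alpha_t & \text{if } \psi = x_r \vee x_s \vee \neg x_t \\
(\alpha_r+1)\,\alpha_s(\alpha_t+1)+\alpha_t & \text{if } \psi = x_r \vee \neg x_s \vee x_t \\
(\alpha_r+1)\,\alpha_s\,\alpha_t+\alpha_t & \text{if } \psi = x_r \vee \neg x_s \vee \neg x_t \\
\alpha_r(\alpha_s+1)(\alpha_t+1)+\alpha_t & \text{if } \psi = \neg x_r \vee x_s \vee x_t \\
\alpha_r(\alpha_s+1)\,\alpha_t+\alpha_t & \text{if } \psi = \neg x_r \vee x_s \vee \neg x_t \\
\alpha_r\,\alpha_s(\alpha_t+1)+\alpha_t & \text{if } \psi = \neg x_r \vee \neg x_s \vee x_t \\
\alpha_r\,\alpha_s\,\alpha_t+\alpha_t & \text{if } \psi = \neg x_r \vee \neg x_s \vee \neg x_t
\end{cases}
\tag{14}
\]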
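The eight cases of (14) follow one pattern: each literal contributes the factor α when it is negated and (α + 1) when it is positive, and αt is added to the product modulo 2. The Python sketch below is an illustration of that reading, not the author's implementation; the function names and the 0/1 encoding of the signs are assumptions. It checks by brute force that (αr, αs, ht) always satisfies the clause, and that ht differs from αt only when the first two literals are both false.

```python
from itertools import product


def h_t(negated, a_r, a_s, a_t):
    """Evaluate (14) modulo 2; negated[i] is 1 when literal i carries a negation."""
    n_r, n_s, n_t = negated
    f_r = a_r if n_r else (a_r + 1) % 2
    f_s = a_s if n_s else (a_s + 1) % 2
    f_t = a_t if n_t else (a_t + 1) % 2
    return (f_r * f_s * f_t + a_t) % 2


def clause_holds(negated, values):
    """A literal is true when its variable value differs from its negation flag."""
    return any(v != n for v, n in zip(values, negated))


all_ok = True
for negated in product((0, 1), repeat=3):
    for a_r, a_s, a_t in product((0, 1), repeat=3):
        x_t = h_t(negated, a_r, a_s, a_t)
        # The value chosen for x_t must always satisfy the clause.
        all_ok &= clause_holds(negated, (a_r, a_s, x_t))
        # When x_r or x_s already satisfies the clause, x_t is left free.
        if clause_holds(negated[:2], (a_r, a_s)):
            all_ok &= (x_t == a_t)
print(all_ok)
```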
Theorem IV.2: The conjunction operator ∧ between two sets of clauses can be
rewritten as the merging of their characterization functions : [hi (·)] = [fi (·)]∧[gi (·)].
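\[
\text{Let } [\varphi] \equiv
\begin{pmatrix} f_1(\alpha_1) \\ \vdots \\ f_n(\alpha_1,\cdots,\alpha_n) \end{pmatrix}
\text{ and } [\varphi'] \equiv
\begin{pmatrix} g_1(\alpha_1) \\ \vdots \\ g_n(\alpha_1,\cdots,\alpha_n) \end{pmatrix}
\]
Note: ϕ or ϕ′ should be extended if necessary in order to get the same support of
propositional variables. Remember that all operations are modulo 2: $\alpha_i + \alpha_i = 0$, $\alpha_i^2 = \alpha_i$ and
\[
\text{Then } [\varphi] \wedge [\varphi'] \equiv
\begin{pmatrix} h_1(\alpha_1) \\ \vdots \\ h_n(\alpha_1,\cdots,\alpha_n) \end{pmatrix}
\]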
where for 1 ≤ t ≤ n :
Moreover if there exists a (unique) j < t, related to the highest αj such that :
(17)
Recursivity will end as soon as there is no longer such gj∗ (α1 , · · · , αj ) = 1 or when gj∗ (α1 , · · · , αj )
Proof: Consider the possible values for ft (α1 , · · · , αt ) and gt (α1 , · · · , αt ) in equation (15)
when αt ∈ {0, 1} :
ht (α1 , · · · , αt ) = (αt + 1) · { [ft (·, 0) + gt (·, 0)] · [ft (·, 1) · gt (·, 1)] + [ft (·, 0) · gt (·, 0)] } +
[ft (·, 1) + gt (·, 1)] · [ft (·, 0) · gt (·, 0)] + [ft (·, 1) · gt (·, 1)] }
= ft (α1 , · · · , αt ) = ht (α1 , · · · , αt )
ht (α1 , · · · , αt ) = (αt + 1) · ft (·, 0) + αt · ft (·, 0) [as ft (·, 1) + gt (·, 1) = 1 and ft (·, 1) · gt (·, 1) = 0]
= ft (α1 , · · · , 0) = gt (α1 , · · · , 0)
= ft (α1 , · · · , 1) = gt (α1 , · · · , 1)
but [ft (·, 0) + gt (·, 0)] · [ft (·, 1) + gt (·, 1)] = function(α1 , · · · , αj ) = 1
and gj∗ (α1 , · · · , αj ) := [ft (·, 0) + gt (·, 0)] · [ft (·, 1) + gt (·, 1)] + gj (α1 , · · · , αj ) = 1
Example :
Then
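\[
[\varphi] =
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_1\alpha_3+\alpha_2\alpha_3+\alpha_1\alpha_2\alpha_3 \\ \alpha_4 \\ \alpha_5
\end{pmatrix}
\wedge
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4+\alpha_2\alpha_4+\alpha_2\alpha_3\alpha_4 \\ \alpha_5
\end{pmatrix}
\wedge
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4+\alpha_1\alpha_4+\alpha_1\alpha_3\alpha_4 \\ \alpha_5
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_1\alpha_3+\alpha_2\alpha_3+\alpha_1\alpha_2\alpha_3 \\
\alpha_4+\alpha_1\alpha_4+\alpha_2\alpha_4+\alpha_1\alpha_2\alpha_4+\alpha_1\alpha_3\alpha_4+\alpha_2\alpha_3\alpha_4+\alpha_1\alpha_2\alpha_3\alpha_4 \\
\alpha_5
\end{pmatrix}
\]
\[
[\varphi'] =
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3+\alpha_1\alpha_3+\alpha_1\alpha_2\alpha_3 \\ \alpha_4 \\ \alpha_5
\end{pmatrix}
\wedge
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5+\alpha_2\alpha_5+\alpha_2\alpha_3\alpha_5
\end{pmatrix}
\wedge
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5+\alpha_1\alpha_5+\alpha_1\alpha_3\alpha_5
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3+\alpha_1\alpha_3+\alpha_1\alpha_2\alpha_3 \\ \alpha_4 \\
\alpha_5+\alpha_1\alpha_5+\alpha_2\alpha_5+\alpha_1\alpha_2\alpha_5+\alpha_2\alpha_3\alpha_5
\end{pmatrix}
\]
And
\[
[\varphi] \wedge [\varphi'] =
\begin{pmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_2\alpha_3 \\
\alpha_4+\alpha_1\alpha_4+\alpha_2\alpha_4+\alpha_1\alpha_2\alpha_4+\alpha_1\alpha_3\alpha_4+\alpha_2\alpha_3\alpha_4+\alpha_1\alpha_2\alpha_3\alpha_4 \\
\alpha_5+\alpha_1\alpha_5+\alpha_2\alpha_5+\alpha_1\alpha_2\alpha_5+\alpha_2\alpha_3\alpha_5
\end{pmatrix}
\]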
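Two rows of this example can be checked with plain polynomial arithmetic modulo 2 (with αi² = αi). The sketch below is only a consistency check on the arithmetic and not an implementation of the full merge rule (15): it relies on the observation, specific to these two rows, that both operands have αt as a common factor, so that the merged row is αt times the product of the remaining factors. Polynomials are encoded as sets of monomials, each monomial being a frozenset of variable indices; this encoding and the names `mono` and `multiply` are assumptions made for the illustration.

```python
def mono(*indices):
    """A monomial is the set of indices of the alpha variables it contains."""
    return frozenset(indices)


def multiply(p, q):
    """Product of two multilinear polynomials over GF(2), with alpha_i^2 = alpha_i."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}  # equal monomials cancel in pairs modulo 2
    return out


# Row 4 of [phi]: alpha_4 * (1 + a2 + a2 a3) * (1 + a1 + a1 a3)
row4 = multiply({mono(4)}, multiply({mono(), mono(2), mono(2, 3)},
                                    {mono(), mono(1), mono(1, 3)}))

# Row 3 of [phi] ^ [phi']: alpha_3 * (a1 + a2 + a1 a2) * (1 + a1 + a1 a2)
row3 = multiply({mono(3)}, multiply({mono(1), mono(2), mono(1, 2)},
                                    {mono(), mono(1), mono(1, 2)}))

print(sorted(map(sorted, row4)))
print(sorted(map(sorted, row3)))
```

The first product has the seven terms shown in the fourth row of [ϕ], and the second collapses to the single term α2 α3.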
A. Boolean descriptors
defined over n propositional variables. These m clauses describe perfectly the set of solu-
tions for the 3-CNF-SAT problem and can be considered as Boolean descriptors of the
3-CNF-SAT problem. These Boolean descriptors are of linear complexity as they can be
The difficulty with such Boolean descriptors is that there is no simple or direct relation
between them and the set of solutions or the answer to the satisfiability question.
B. 3-CNF-matrix descriptors
descriptors (each line in the 3-CNF-matrix) can be of exponential complexity as there are
as many descriptors as solutions. Even if one uses the reduction version of the 3-CNF-matrix
description as explained in (5), simulations show that the complexity remains exponential.
The interest of these descriptors is the direct link between them and the set of solutions
C. Functional descriptors
This paper proposes also in (11) a 3-CNF-matrix functional description for any 3-CNF-
SAT problem. These functional descriptors are of unknown complexity, at least at this
These functional descriptors are somehow in between both previous types of descriptors, as
they are in an exponential relation to the set of solutions and in a direct relation with
give the answer to the satisfiability question: no if the functional descriptors do not exist,
and yes otherwise. However, one needs to generate all possible values for αi to get the entire
Conclusion : The approach of the 3-CNF-SAT problem via functional descriptors seems
to be promising as it does not consider the set of all solutions, but only focuses on the sole
where lenj (ht ) is the number of terms in ht (·) when the j first clauses are considered :
\[
\text{Let } \mathrm{len}(h_t) \equiv \sum_{(\delta_1,\cdots,\delta_t)\in\{0,1\}^t} \Delta_t(\delta_1,\cdots,\delta_t)
\quad [\text{see (13) for the definition of } \Delta_t]
\tag{18}
\]
Proof: Let us compute the complexity of ft (·)∧gt (·) in (15). First of all, one has to compute
the four functions in square brackets : [ft (·, 0) + gt (·, 0)], [ft (·, 1) + gt (·, 1)], [ft (·, 0) · gt (·, 0)]
len(ft (·, 0)) ≤ len(ft (·, αt )) and len(ft (·, 1)) ≤ len(ft (·, αt )) [≡ len(ft )]
len(gt (·, 0)) ≤ len(gt (·, αt )) and len(gt (·, 1)) ≤ len(gt (·, αt )) [≡ len(gt )]
len(ft + gt ) ≤ len(ft ) + len(gt ) ≤ len(ft ) · len(gt ) when len(ft ) > 2 and len(gt ) > 2
The complexity for the four functions is then O(len(ft ) · len(gt )).
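\[
O\big(\, 3\cdot[(\mathrm{len}(f_t)\cdot \mathrm{len}(g_t))^2 + (\mathrm{len}(f_t)\cdot \mathrm{len}(g_t))] + 2\cdot[2(\mathrm{len}(f_t)\cdot \mathrm{len}(g_t))^2 + (\mathrm{len}(f_t)\cdot \mathrm{len}(g_t))] \,\big)
\]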
Note : it needs three runs over the formula in the brackets to do the product with (αt + 1),
one to compute the formula, one to multiply it by αt and one to add both results. Similarly,
To solve the 3-CNF-SAT problem, one should compute all n functional descriptors ht (·),
each of them with at most n recursive calls, and this for each step of integration of the
m clauses. So, using the equivalence between (20) and (19), the overall complexity of the
functional approach to the 3-CNF-SAT problem will be of order $O\big(m\, n^2 \max_{1\le t\le n}\, \max_{1\le j\le m} \mathrm{len}_j(h_t)\big)$.
Theorem VI.2: The most difficult 3-CNF-SAT problems are uniformly distributed ones.
Note : The invariance structure of 3-CNF-SAT problem is important with respect to the
complexity of the functional descriptor approach. It is then normal that problems with
can be done to reduce the complexity. A simple example of the importance of re-labeling is
Proof:
First, let us note that the computations involving negative literals are easier and faster than
for positive ones. This is a mere consequence of our definition of ht (·) in (14). So the
most difficult problems will be the balanced ones with respect to the proportion of negative
and positive literals. Otherwise, we invert some variables in order to get the maximum
of negative literals. Let us suppose from here that the proportion of positive and negative
• Some definitions
Let us divide now the clauses in two sets. The first set contains all the clauses with the
January 6, 2020—8 : 41 am DRAFT
A 3-CNF-SAT DESCRIPTOR ALGEBRA AND THE SOLUTION OF THE P =NP CONJECTURE 17
highest indexed literal being positive and the second with the negative ones:
\[
Cl^+ = \bigcup_t Cl^+(x_t)
= \bigcup_t \{\psi_i := [\neg]x_r \vee [\neg]x_s \vee x_t \text{ with } r < s < t, \text{ or any permutation of } x_r, x_s, x_t\}
\]
\[
Cl^- = \bigcup_t Cl^-(x_t)
= \bigcup_t \{\psi_i := [\neg]x_r \vee [\neg]x_s \vee \neg x_t \text{ with } r < s < t, \text{ or any permutation of } x_r, x_s, x_t\}
\]
By construction, there is at least one solution for each xt when considering clauses only in
Cl+ [xt = 1] or in Cl− [xt = 0]. Moreover, the computation of the functional descriptors will
not involve recursive calls [see (16)] as no impossibility exists for any xt .
and $\mathrm{len}(h_t)|_{Cl^-} \le 2^{(\#V^-(x_t)+1)}$.
\[
\mathrm{len}(h_n) \le 2^{\#(V^+(x_n)\,\cup\, V^-(x_n))+1}
\]
the variables xi (i < n − 1) found in common clauses with xn−1 as the highest variable :
$\mathrm{len}(h_{n-1}) \le 2^{\#V(x_{n-1})+1}$. But one has to add the potential αi involved in the recursive
call $g^*_{n-1}(\cdot)$ from the previous computation of hn (·) [see (16)].
From (16), $g^*_{n-1}(\alpha_1,\cdots,\alpha_{n-1}) = [h_n(\cdot,0)|_{Cl^+} + h_n(\cdot,0)|_{Cl^-}] \cdot [h_n(\cdot,1)|_{Cl^+} + h_n(\cdot,1)|_{Cl^-}]$.
So $g^*_{n-1}(\cdot)$ will be a multi-linear combination of the same αi , except αn , as for hn (·).
Therefore, $h_{n-1}(\cdot) \wedge g^*_{n-1}(\cdot)$ will be a combination of the αi associated to the variables
in $\bigcup_{i=n-1}^{n} [V^+(x_i) \cup V^-(x_i)]$ $\Rightarrow$ $\mathrm{len}(h_{n-1} \wedge g^*_{n-1}) \le 2^{\#[\bigcup_{i=n-1}^{n} V(x_i)]}$, as xn−1 should not be
counted in V (xn ).
So, $\forall\, t : \mathrm{len}(h_t \wedge g^*_t) \le 2^{\#[\bigcup_{i=t}^{n} V(x_i)] + 1 - (n-t)}$. But as $h_t \wedge g^*_t$ is a combination of at most
t αi ’s, we have:
\[
\mathrm{len}(h_t \wedge g^*_t) \le \min\big(\, 2^{\#[\bigcup_{i=t}^{n} V(x_i)] + 1 - (n-t)},\; 2^t \,\big)
\tag{23}
\]
As V (xi ) depends on the ordering of the variables, it is possible to re-order the variables
so that $2^{\#[\bigcup_{i=t}^{n} V(x_i)]}$ is minimal, except in the case of uniformly distributed literals. The
uniformly distributed case is then the most difficult problem, as no ordering can
variables :
\[
\varphi := \bigwedge_{1\le i\le m} (x_{2i-1} \vee x_{2i} \vee x_{2m+1})
\]
Here we have:
\[
V^+(x_t) = V^-(x_t) = \emptyset \quad \forall\, t \neq 2m+1
\]
Note : The proof of this equality is more difficult than interesting, so we do not write it here.
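• But the same 3-CNF-SAT problem can be formalized in terms of opposite literals
$y_i = \neg x_i$, $\forall i \in \{1,\cdots,2m+1\}$:
\[
\varphi := \bigwedge_{1\le i\le m} (\neg y_{2i-1} \vee \neg y_{2i} \vee \neg y_{2m+1})
\]
\[
V^+(y_t) = V^-(y_t) = \emptyset \quad \forall\, t \neq 2m+1
\]
\[
\begin{aligned}
\text{and } h_{2m+1}(\alpha_1,\cdots,\alpha_{2m+1})
&= [(\alpha_1\alpha_2\alpha_{2m+1} + \alpha_{2m+1}) \wedge (\alpha_3\alpha_4\alpha_{2m+1} + \alpha_{2m+1})] \wedge \cdots \\
&\overset{\text{see (15)}}{=} [(\alpha_{2m+1}+1)\cdot 0 + (\alpha_{2m+1})\cdot(\alpha_1\alpha_2+1)\cdot(\alpha_3\alpha_4+1)] \wedge \cdots \\
&= (\alpha_{2m+1}) \cdot \prod_{i=1}^{m} (\alpha_{2i-1}\alpha_{2i}+1) \\
\Rightarrow \mathrm{len}_{2m+1}(h_{2m+1}) &= 2^m
\end{aligned}
\]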
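The count of 2^m terms can be confirmed by expanding the product modulo 2. The sketch below is a check of the final closed form only, under the assumption that polynomials are stored as sets of monomials (frozensets of variable indices); the helper names are illustrative.

```python
def multiply(p, q):
    """Product of two multilinear polynomials over GF(2), with alpha_i^2 = alpha_i."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}  # equal monomials cancel in pairs modulo 2
    return out


def h_last(m):
    """Expand alpha_{2m+1} * prod_{i=1..m} (alpha_{2i-1} alpha_{2i} + 1)."""
    poly = {frozenset({2 * m + 1})}
    for i in range(1, m + 1):
        factor = {frozenset(), frozenset({2 * i - 1, 2 * i})}
        poly = multiply(poly, factor)
    return poly


lengths = [len(h_last(m)) for m in range(1, 8)]
print(lengths)
```

Each factor doubles the number of monomials because the factors share no variable, so nothing cancels.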
• Finally, the same 3-CNF-SAT problem can be formalized using re-ordered propositional
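\[
\varphi := \bigwedge_{1\le i\le m} (\neg z_{2i} \vee \neg z_{2i+1} \vee \neg z_1)
\]
\[
V^+(z_t) = \emptyset \quad \forall\, t
\]
\[
V^-(z_t) =
\begin{cases}
\emptyset & \text{for } t = 1 \text{ or } t = 2i \ (1 \le i \le m) \\
\{z_1, z_{2i}\} & \text{for } t = 2i+1 \ (1 \le i \le m)
\end{cases}
\]
\[
\max_{1\le t\le n}\, \max_{1\le j\le m} \mathrm{len}_j(h_t) = 2
\]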
So for this example, one can reach a linear complexity of O(2 · number of ht (·)) = O(2 m),
as only one ht (·) has to be computed at each step without any recursive call.
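The smallest exact uniformly distributed and optimally re-ordered 3-CNF-SAT problem is:
\[
\varphi := \bigwedge_{i=1}^{8} \psi_i = \bigwedge
\begin{cases}
x_1 \vee \neg x_2 \vee \neg x_3 \\
x_1 \vee x_2 \vee \neg x_3 \\
\neg x_1 \vee \neg x_2 \vee \neg x_3 \\
\neg x_1 \vee x_2 \vee \neg x_3 \\
x_1 \vee \neg x_2 \vee x_3 \\
x_1 \vee x_2 \vee x_3 \\
\neg x_1 \vee \neg x_2 \vee x_3 \\
\neg x_1 \vee x_2 \vee x_3
\end{cases}
\]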
No relabeling will reduce the 3-CNF-SAT complexity. This problem is “hard” in the sense
that each clause eliminates only one solution at a time. We have here:
V + (x1 ) = V − (x1 ) = ∅
V + (x2 ) = V − (x2 ) = ∅
and
Step         h1(·)   h2(·)    h3(·)                  max_t len(ht)   #Solutions
ψ1           α1      α2       (α1 + 1)α2 α3 + α3     3               7
ψ1 ∧ ψ2      α1      α2       α1 α3                  1               6
∧_{i=1}^3 ψi α1      α2       α1 α2 α3 + α1 α3       2               5
∧_{i=1}^4 ψi α1      α2       0                      1               4
∧_{i=1}^5 ψi α1      α1 α2    0                      1               3
∧_{i=1}^6 ψi 1       α2       0                      1               2
∧_{i=1}^7 ψi 1       0        0                      1               1
∧_{i=1}^8 ψi ∄       ∄        ∄                      0               0
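The last column of the table can be reproduced by brute force over the eight assignments of (x1, x2, x3). The sketch below only enumerates assignments and does not use the functional descriptors; the clause encoding (pairs of a variable index and a negation flag) is an assumption made for the illustration.

```python
from itertools import product

# Each literal is (variable index, negated?), clauses listed in the order psi_1 .. psi_8.
clauses = [
    ((1, 0), (2, 1), (3, 1)),
    ((1, 0), (2, 0), (3, 1)),
    ((1, 1), (2, 1), (3, 1)),
    ((1, 1), (2, 0), (3, 1)),
    ((1, 0), (2, 1), (3, 0)),
    ((1, 0), (2, 0), (3, 0)),
    ((1, 1), (2, 1), (3, 0)),
    ((1, 1), (2, 0), (3, 0)),
]


def satisfied(clause, x):
    """A literal is true when the variable value differs from its negation flag."""
    return any(x[i - 1] != neg for i, neg in clause)


def count_solutions(k):
    """Number of assignments satisfying the first k clauses."""
    return sum(
        all(satisfied(c, x) for c in clauses[:k])
        for x in product((0, 1), repeat=3)
    )


counts = [count_solutions(k) for k in range(1, 9)]
print(counts)
```

Each clause rules out exactly one assignment, and the eight clauses rule out eight different ones, which is why the count drops by one at every step.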
Conclusions :
The theorem about uniformity is important as it states: for any non-uniformly distributed
3-CNF-SAT problem ϕ with m clauses and n variables, there exists a uniformly distributed
3-CNF-SAT problem ϕ′ with m clauses and n variables which is more difficult to solve, in
Relabel the propositional variables [xi → yj ] in order to get their occurrence [O(m)]
Invert the sign of the literals in order to get the maximum of negative literals;
Sort the clauses to get an increasing order of the highest variable in the ordered
Within the set of clauses with the same highest variable, sort the clauses so that the
ones with negative highest variable appear before the ones with positive highest variable.
As we seldom have exact uniformly distributed 3-CNF-SAT problems, the complexity can
[Figure 1: two panels, each titled “len(h_t) for 20 variables and 120 clauses”; the plotted series include h_1 and h_9 to h_13.]
Fig. 1 : Complexity for the same dataset before and after the sorting algorithm.
literals be randomly distributed amongst the clauses. Such problem is called an exact
Remark: For exact uniformly distributed 3-CNF-SAT problems, the labeling part of the
previous sorting algorithm has no effect, as the occurrence of each literal is $\frac{3\alpha}{2}$. Only the
problems, the expected number of clauses where i (i > 2) is the highest index, is
\[
E[\#\{\psi = [\neg]x_r \vee [\neg]x_s \vee [\neg]x_t \mid \max(r,s,t) = i\}] = 3\alpha\,\frac{(i-1)(i-2)}{(n-1)(n-2)} \equiv m_\alpha(i)
\tag{24}
\]
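A quick sanity check on (24) is that the expected counts add up to the total number of clauses: since the sum of (i − 1)(i − 2) for i = 3, …, n equals n(n − 1)(n − 2)/3, the values m_α(i) sum to αn = m. The sketch below verifies this numerically for the sizes used in this paper (175 variables, 753 clauses); the function name `m_alpha` is illustrative.

```python
def m_alpha(i, n, alpha):
    """Expected number of clauses whose highest variable index is i, from (24)."""
    return 3 * alpha * (i - 1) * (i - 2) / ((n - 1) * (n - 2))


n, m = 175, 753
alpha = m / n

total = sum(m_alpha(i, n, alpha) for i in range(3, n + 1))
at_top = m_alpha(n, n, alpha)  # equals 3 * alpha for i = n
print(total, at_top)
```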
Proof: Let ψ be a clause with xi or ¬xi ; there are $C_{i-1}^{2} \cdot 3\alpha$ combinations with smaller
indices amongst $C_{n-1}^{2} \cdot 3\alpha$ possible combinations. So, the probability for xi to get the
highest index is $\frac{(i-1)(i-2)}{(n-1)(n-2)}$. The expected value is obtained by multiplying the probability
Figure 2 shows the theoretical density and cumulative distributions of mα (i) versus the
distributions for the observed values for #{ψ = [¬]xr ∨ [¬]xs ∨ [¬]xt | max(r, s, t) = i} in the
case of a 3-CNF-SAT problem with 175 variables and 753 random clauses.
[Figure 2: two panels, “Empirical versus theoretical distribution of m_α(i)” and “Empirical versus theoretical cumulative distribution”. Axis labels: “Number of clauses with i as maximal index”, “Index i of x_i”, “Total number of variables in the first N clauses”, “N first clauses”. Legend: empirical data and theoretical data (175 variables, α = 4.30); empirical and theoretical number of variables in N clauses. Data: 175 variables and 753 random clauses from uuf175-01.cnf, http://www.satlib.org/ubcsat.]
Fig. 2 : Density and cumulative distributions of “sorted clauses” for n = 175 and α = 4.30.
lems, the expected number of variables in V (xi ), for i > 2 and large n, is given
by :
\[
E[\#V(x_i)] = 2\, m_\alpha(i) = 6\alpha\,\frac{(i-1)(i-2)}{(n-1)(n-2)}
\tag{25}
\]
Proof: There are $C_{i-1}^{2}$ possible triplets with xi being the highest indexed variable. The
probability for some xj (j < i) to appear in one of these triplets is $\frac{i-2}{C_{i-1}^{2}} = \frac{2}{i-1} = p$ for any
j. The occurrence of xj follows a binomial model Bi(mα (i), p), as one can choose several
times the same triplet (given different clauses with respect to the negative or positive sign
of the included literals). The expected number of occurrences of xj in the mα (i) triplets is
then $m_\alpha(i) \cdot p = \frac{6\alpha(i-2)}{(n-1)(n-2)} < 1$ for large n. So each variable is expected to appear at most
once in the mα (i) triplets-clauses. Therefore, the number of variables, different from xi ,
occurring in these mα (i) clauses is 2mα (i) as there are two variables distinct from xi in each
clause.
lems, the maximal expected complexity for the computation of ht (·) is bounded
by :
\[
\mathrm{len}(h_t) \le \max_{k \ge 0}\; 2^{\,\min\{\,2(\sum_{j=0}^{k} m_\alpha(n^{(j)})) - (k-1)\,,\; n^{(k)}\,\}}
\]
where $2(\sum_{j=0}^{k} m_\alpha(n^{(j)})) - (k-1)$ is a concave quadratic function with respect to t or
k, as shown in figures 3 and 4.
Proof: From (25), we know that #V (xi ) is expected to be maximal for i = n, when
For the computation of the recursive call $g^*_j(\alpha_1,\cdots,\alpha_j)$ (see 16), the index j is the highest
index of the variables in V (xn ). Let us note it $n^{(1)}$. We have thus # V (xn ) = 2 mα (n)
indexes uniformly chosen from {1, · · · , n − 1}. $n^{(1)}$ will be the expected maximal index from
\[
n^{(1)} = E\big[\max_{1\le i\le 2m_\alpha(n)} (u_i)\big] = \frac{2\, m_\alpha(n)}{2\, m_\alpha(n)+1}\,(n-1) = \frac{6\alpha}{6\alpha+1}\,(n-1)
\tag{27}
\]
So, for the recursive call, we will have to compute $h_{n^{(1)}}(\cdot) \wedge g^*_{n^{(1)}}(\cdot)$. We get:
\[
\#V(x_{n^{(1)}}) = 2\, m_\alpha(n^{(1)}) = 6\alpha\,\frac{(n^{(1)}-1)(n^{(1)}-2)}{(n-1)(n-2)} \quad \text{from (25)}
\]
\[
\begin{aligned}
\mathrm{len}(h_{n^{(1)}}(\cdot) \wedge g^*_{n^{(1)}}(\cdot)) &\le 2^{\#\{V(x_n)\,\cup\, V(x_{n^{(1)}})\} + 1 - (2-1)} \quad \text{from (23)} \\
&\le 2^{2(m_\alpha(n) + m_\alpha(n^{(1)}))}
\end{aligned}
\]
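And so on, for the next recursive calls. We get for the recursive call k (k > 1):
\[
\begin{aligned}
n &\equiv n^{(0)} \\
u_i &\sim U[1,\cdots,n^{(k-1)}-1] \\
n^{(k)} &= E\big[\max_{1\le i\le 2m_\alpha(n^{(k-1)})} (u_i)\big] = \frac{2\, m_\alpha(n^{(k-1)})}{2\, m_\alpha(n^{(k-1)})+1}\,(n^{(k-1)}-1) \\
\#\{V(x_{n^{(k)}})\} &= 2\, m_\alpha(n^{(k)}) = 6\alpha\,\frac{(n^{(k)}-1)(n^{(k)}-2)}{(n-1)(n-2)}
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{len}(h_{n^{(k)}}(\cdot) \wedge g^*_{n^{(k)}}(\cdot))
&\le 2^{\,\min\{\#\bigcup_{j=0}^{k}\{V(x_{n^{(j)}})\} - (k-1)\,,\; n^{(k)}\}} \\
&\le 2^{\,\min\{2(\sum_{j=0}^{k} m_\alpha(n^{(j)})) - (k-1)\,,\; n^{(k)}\}} \\
&= 2^{\,\min\{M_\alpha(n^{(k)})\,,\; n^{(k)}\}}
\end{aligned}
\tag{28}
\]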
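The recursion for n^(k) and the quantity M_α(n^(k)) = 2(Σ m_α(n^(j))) − (k − 1) are straightforward to iterate numerically. The following Python sketch is a minimal illustration under stated assumptions: the indices n^(k) are kept as real-valued expectations, the iteration stops once the next index would fall to 2 or below (where m_α is no longer positive), and the function names are hypothetical. No numerical maximum is asserted here; the printed values are only meant for comparison with the figures of this section.

```python
def m_alpha(i, n, alpha):
    """Expected number of clauses with highest index i, from (24)."""
    return 3 * alpha * (i - 1) * (i - 2) / ((n - 1) * (n - 2))


def trajectory(n, alpha):
    """Expected indices n^(0) = n, n^(1), n^(2), ... of the recursive calls."""
    traj = [float(n)]
    while True:
        prev = traj[-1]
        draws = 2 * m_alpha(prev, n, alpha)   # expected size of V(x_prev)
        nxt = draws / (draws + 1) * (prev - 1)
        if nxt <= 2:                          # assumed stopping rule
            return traj
        traj.append(nxt)


def big_m(traj, n, alpha):
    """M_alpha(n^(k)) = 2 * sum_{j<=k} m_alpha(n^(j)) - (k - 1) for each k."""
    values, running = [], 0.0
    for k, n_k in enumerate(traj):
        running += m_alpha(n_k, n, alpha)
        values.append(2 * running - (k - 1))
    return values


traj = trajectory(500, 4.0)
values = big_m(traj, 500, 4.0)
print(len(traj), max(values))
```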
This bound is only defined for the variables with $n^{(k)}$ as index. Note that the $n^{(k)}$ are functions
of the starting index $n^{(0)} = n$. We can compute similar bounds for other starting indexes
$n^{(0)}$ in [1, · · · , n − 1], so that Mα (·) can be defined for all t as shown in Figure 4.
Numerical computations show that, for large n, $M_\alpha(n^{(k)})$ as well as Mα (t) are concave
quadratic functions with coefficients only depending on α. This can be easily explained
as a mere consequence of the iid randomness of the variables $\#\{V(x_{n^{(k)}})\}$ and $m_\alpha(n^{(j)})$.
[Figure 3: plot of M_α(n^(k)) against the recursive call index k.]
Fig. 3 : Complexity wrt k : $M_\alpha(n^{(k)}) = 2(\sum_{j=0}^{k} m_\alpha(n^{(j)})) - (k-1)$ where $n^{(0)} = n$.
Indeed, the central limit theorem for the expectation of iid random variables predicts that
$E[2^{M_\alpha(n^{(k)})}]$ follows a Normal distribution (censored by min). But X ∼ N (µ, σ²) implies a
quadratic log-density: $\log(f_X(x)) \propto -\frac{(x-\mu)^2}{2\sigma^2}$. For each value of α, we can compute the
corresponding µα , σα and the maximum value for $M_\alpha(n^{(k)})$. Quadratic regression estimations
give $\max_k(M_\alpha(n^{(k)})) \approx 294$ for α = 4, $\max_k(M_\alpha(n^{(k)})) \approx 490$ for α = 5.12 (see figure 3 and
below for the choice of such α) and $\max_t(M_\alpha(t)) \approx 1160$ for α = 8.
Remark: It is now important to check that different starting points $n^{(0)}$ do not yield
aggregating trajectories, in which case the bounds would have to be added. This situation can
Theorem VI.6: The probability for a given variable xi to be in more than one
Proof: Let us consider separately the possible trajectories $tr(x_{n^{(0)}} \to x_{n^{(k)}})$ for $n^{(0)} = m \in$
{1, · · · , n} and k ∈ {1, · · · , n}. Let us note a given trajectory: tr(m, km ) with km such that
$n^{(k_m)} > i$. For each variable xi , there exist at most (n − i)(n − i − 1)/2 trajectories tr(m, km )
where xi could be the next highest indexed variable for $n^{(k_m+1)}$: tr(n, 0), · · · , tr(n, kn ), · · · ,
tr(i + 1, 0). The probability for xi to get the highest index in a trajectory is:
We have :
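\[
\begin{aligned}
P[x_i \in tr(m,k_m)] &= \#\{\text{clauses in } tr(m,k_m)\} \cdot P[x_i \in \text{the clause and } i \text{ is the maximum index}] \\
&= \sum_{j=0 \,|\, n^{(0)}=m}^{j=k_m} m_\alpha(n^{(j)}) \cdot \frac{n^{(j)}-2}{C_{n^{(j)}-1}^{2}} \quad [\text{see (24)}] \\
&= \sum_{j=0 \,|\, n^{(0)}=m}^{j=k_m} m_\alpha(n^{(j)}) \cdot \frac{2}{n^{(j)}-1} \\
&= \sum_{j=0 \,|\, n^{(0)}=m}^{j=k_m} 6\alpha\,\frac{(n^{(j)}-1)(n^{(j)}-2)}{(n-1)(n-2)(n^{(j)}-1)} \\
&= \frac{6\alpha}{(n-1)(n-2)} \sum_{j=0}^{j=k_m} (n^{(j)}-2)
\end{aligned}
\]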
For instance :
\[
P[x_i \in tr(m,0)] = \frac{6\alpha(n^{(0)}-2)}{(n-1)(n-2)} = \frac{6\alpha}{n-1}\cdot\frac{m-2}{n-2} \le \frac{6\alpha}{n-1} \quad \text{as } n^{(0)} = m \le n
\]
and
Finally,
\[
P[x_i \in tr(m,k_m)] \le P[x_i \in tr(n,k_m)] < \frac{6\alpha}{n-1}\Big(1 + \frac{n-3}{n-2} + \cdots + \frac{n-(k_m+2)}{n-2}\Big)
\]
→ 0 for large n with respect to km and α.
Now, considering that the elements of tr(m, km ) are iid uniformly distributed random vari-
In conclusion,
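\[
\begin{aligned}
P[i = \max_l \{l : x_l \in tr(m,k_m)\}] &= \Big(\frac{i}{m-1}\Big)^{\sum_{j=0}^{j=k_m} 2m_\alpha(n^{(j)}) - 1} \cdot \frac{6\alpha}{(n-1)(n-2)} \sum_{j=0}^{j=k_m} (n^{(j)}-2) \\
&\le \frac{6\alpha}{(n-1)(n-2)} \sum_{j=0}^{j=k_m} (n^{(j)}-2) \quad \text{as } i \le (m-1) \\
&\to 0 \quad \text{for large } n \text{ with respect to } k \text{ and } \alpha
\end{aligned}
\]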
trajectories tr(m, km ), as we can see this event as the output of a binomial model with a very
small probability of success (“xi being maximal in some tr(m, km )”), over (n − i)(n − i − 1)/2
possible trajectories :
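\[
\begin{aligned}
&= 1 - (P[0 \text{ success}] + P[1 \text{ success}]) \\
&\le 1 - \Big(\big[(1-p)^{\frac{(n-i)(n-i-1)}{2}}\big] + \big[\tfrac{(n-i)(n-i-1)}{2}\; p\,(1-p)^{\frac{(n-i)(n-i-1)}{2}-1}\big]\Big) \\
&\to 0 \quad \text{for large } n, \text{ as } p \to 0 \text{ for large } n \text{ with respect to } k \text{ and } \alpha.
\end{aligned}
\]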
The last thing to prove is that k is not O(n) as α is a given constant. Figure 3, which
is computed with the theoretical formula from (28), shows that the maximal value for k is
and α = 8.
[Figure: curve plotted against the index t of the variables for N = 500 and α = 4 (theory), with quadratic fit y = −0.0037x² + 1.8559x + 19.51, R² = 0.99937.]
“cluster” process, the size of one trajectory, i.e. the number of xi involved in that trajectory,
becoming more and more important so that this trajectory attracts all the variables. Then,
k is O(n), P [xi ∈ tr(n, k)] → 1 and the complexity becomes exponential. It is easy to solve
these cases. As the variables are uniformly distributed in random 3-CNF-SAT problems,
variables xj (belonging to two or more trajectories) with a smaller indexed variable, such as
xj−1 (or xj−2 if xj−1 is already in a previous trajectory, and so on). The two trajectories
Beginning with the last clause (with [¬]xn ), mark xj where j = max{i : xi ∈ V (xn )}
as already belonging in a trajectory and initialize W (xn ) := V (xn ) and W (xj ) := V (xn )
[we do not consider j < 3α as merging of trajectories for small indexes is not a
Figure 4 shows the result for a Dimacs generated 3-CNF-SAT problem with 500 variables
and α = 4. We apply the sorting and the permuting algorithms on the generated file to
Although we have shown in this section that the complexity is bounded with respect to n, we still
have to show that the complexity does not increase with respect to α, which is not the case for
Mα (t).
It is easy to see that the complexity is an increasing function of α for exact uniformly
But there should be somewhere a threshold for α as large α-random problems are easy to
results from the literature suggest that this threshold for α is ≈ 4.258. See [3].
The analysis of complexity with respect to α will be done through Sϕ , the set of all satisfying
lems ϕ = {ψj }1≤j≤m with n variables and m clauses, we get for large n and m the
\[
E[\#S_\varphi] = E[\#\{(x_1,\cdots,x_n) \in \{0,1\}^n \mid \varphi(x_1,\cdots,x_n) = 1\}] = 7\,\Big(\frac{7}{4}\Big)^{n-3}\Big(\frac{7}{8}\Big)^{(m-n+2)}
\tag{29}
\]
4 8
Proof:
• Let us re-order the m clauses ψj in such a way that each clause introduces only one new
variable with respect to the set of variables appearing in the previous clauses.
The cases where a clause ψk+1 introduces two or three new variables to Vk are
neglected, as this means that the 3-CNF-SAT problem can be split into two sub-problems
with one or zero common variables, which drastically reduces the complexity of the problem.
• Let us look at the expected effect of a clause ψj (1 ≤ j ≤ m) over the number of solutions :
1. Let us consider ψ1 .
The first clause yields $7 \cdot 2^{n-3}$ solutions. The matrix representation of ψ1 will be
a 7 × 3 matrix.
2. Let us consider ψ2 .
Let ψ2 introduce only one new additional variable xt , and let xr and xs be the two
Depending on the sign of the literal xt in ψ2 , the result matrix for [ψ1 ∧ ψ2 ] will get
the 7 lines of [ψ1 ] with a zero in the column for xt if ψ2 = [¬]xr ∨ [¬]xs ∨ ¬xt or with
a one if ψ2 = [¬]xr ∨ [¬]xs ∨ xt . This corresponds to solutions where the literal [¬]xt
is satisfied.
On the contrary, when the value in the column for xt is opposite to the sign of
[¬]xt , the satisfiability of ψ2 should pass through the literals xr and xs . Among
the four possible values for (xr , xs ), only three will be accepted. One couple for
(xr , xs ) will be ruled out, as well in matrix [ψ1 ] as in [ψ2 ]. This corresponds to
one or two lines deleted in [ψ1 ], depending on the sign for xr and xs in ψ1 . The
be equal to $7 + (7 - \frac{7}{4}) = 7(1 + \frac{3}{4}) = 7(\frac{7}{4}) = 12.25$. And the boundaries for #[ψ1 ∧ψ2 ]
3. Let us consider ψ3 .
Let ψ3 introduce a new additional variable. Using the same type of arguments as
\[
E(\#\text{ deletions in } \psi_1 \wedge \psi_2) = \sum_{k=12}^{13}\; \sum_{d=1}^{2^2} d \cdot P[d \text{ deletions} \mid \#[\psi_1 \wedge \psi_2] = k] \cdot P[\#[\psi_1 \wedge \psi_2] = k]
\]
Let us consider the couples (xi , xj ) and the number of deleted lines for each case :
xi xj # del. x1 x2 # del.
0 0 d1 0 0 2
0 1 d2 For the above example : 0 1 4
1 0 d3 1 0 3
1 1 d4 1 1 4
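We see that, whatever the value of #[ψ1 ∧ψ2 ], $\sum_{i=1}^{4} d_i = \#[\psi_1 \wedge \psi_2]$. So, the expected
Therefore, the expected number of lines in $[\bigwedge_{i=1}^{3} \psi_i]$ will be:
\[
\begin{aligned}
E\big[\#[\bigwedge_{i=1}^{3} \psi_i]\big]
&= \sum_{k} \{\#[\psi_1 \wedge \psi_2] + (\#[\psi_1 \wedge \psi_2] - \#\text{ deletions})\} \cdot P[\#[\psi_1 \wedge \psi_2] = k] \\
&= \sum_{k=12}^{13} \Big\{\#[\psi_1 \wedge \psi_2] + \Big(\#[\psi_1 \wedge \psi_2] - \frac{\#[\psi_1 \wedge \psi_2]}{4}\Big)\Big\} \cdot P[\#[\psi_1 \wedge \psi_2] = k] \\
&= 12\Big\{1 + \Big(1 - \frac{1}{4}\Big)\Big\} \cdot \frac{3}{4} + 13\Big\{1 + \Big(1 - \frac{1}{4}\Big)\Big\} \cdot \frac{1}{4} \\
&= E\big[\#[\bigwedge_{i=1}^{2} \psi_i]\big] \cdot \frac{7}{4} \\
&= 7 \cdot \Big(\frac{7}{4}\Big)^2 \\
&= 21.4375
\end{aligned}
\]
And $\#[\bigwedge_{i=1}^{3} \psi_i] \in [\min_3, \max_3] = [12 + 12 - 4,\; 13 + 13 - 1] = [20, 25]$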
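The arithmetic of this step can be checked exactly with rational numbers. The sketch below only replays the numbers given above (12 or 13 lines after two clauses, with probabilities 3/4 and 1/4, and a growth factor of 7/4 per clause that introduces a new variable); the variable names are illustrative.

```python
from fractions import Fraction

growth = Fraction(7, 4)                     # 1 + (1 - 1/4)
p12, p13 = Fraction(3, 4), Fraction(1, 4)   # P[#lines = 12], P[#lines = 13]

after_two = 12 * p12 + 13 * p13
after_three = 12 * growth * p12 + 13 * growth * p13

print(after_two, float(after_two))      # 49/4 12.25
print(after_three, float(after_three))  # 343/16 21.4375
```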
We know that ψj introduces a new additional variable. Then, using the same type
6. For ψj where j > n − 2, no new variable will be added, and the number of
Using the same previous argument, we can consider the six possible cases (xi , xj , xk )
\[
\sum_{i=1}^{8} d_i = \#[\bigwedge_{i=1}^{j-1} \psi_i]
\]
Thus, the expected number of deleted lines in $[\bigwedge_{i=1}^{j-1} \psi_i]$ will be $\frac{\#[\bigwedge_{i=1}^{j-1} \psi_i]}{8}$.
\[
\text{and } \#[\bigwedge_{i=1}^{m} \psi_i] \in [0\,,\; 6 \cdot 2^{n-3} - m + n - 1]
\]
5, 19.
Proof: Let us consider exact uniformly distributed α-random 3-CNF-SAT problems. The
most difficult problems are the ones where the decision between satisfiability and unsatisfia-
bility arises only when considering the last clause ψm . This is equivalent to have E[#Sϕ ] ≈ 1.
We get :
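\[
\begin{aligned}
E[\#S_\varphi] \approx 1 &\Leftrightarrow 7\Big(\frac{7}{4}\Big)^{n-3}\Big(\frac{7}{8}\Big)^{m-n+2} \approx 1 \\
&\Leftrightarrow \Big(\frac{7}{8}\Big)^{m}\, 2^{n} \approx 1 \\
&\Leftrightarrow m \log\Big(\frac{7}{8}\Big) + n \log(2) \approx 0 \\
&\Leftrightarrow (\alpha\, n) \log\Big(\frac{7}{8}\Big) + n \log(2) \approx 0 \\
&\Leftrightarrow \alpha \log\Big(\frac{7}{8}\Big) \approx -\log(2) \\
&\Leftrightarrow \alpha \approx \frac{-\log(2)}{\log(\frac{7}{8})} \\
&\Leftrightarrow \alpha \approx 5.19089307
\end{aligned}
\]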
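The last two lines of this derivation, and the simplification of (29) into (7/8)^m 2^n used in its second line, can be checked numerically. The sketch below uses only the Python standard library; the function name `expected_solutions` and the test sizes n = 20, m = 100 are arbitrary choices for the illustration.

```python
import math


def expected_solutions(n, m):
    """Right-hand side of (29)."""
    return 7 * (7 / 4) ** (n - 3) * (7 / 8) ** (m - n + 2)


# Value of alpha for which the expected number of solutions is about 1.
alpha_star = -math.log(2) / math.log(7 / 8)

# (29) simplifies exactly to (7/8)^m * 2^n.
n, m = 20, 100
lhs = expected_solutions(n, m)
rhs = (7 / 8) ** m * 2 ** n

print(alpha_star)
print(lhs, rhs)
```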
Proof:
When considering exact uniformly distributed α-random 3-CNF-SAT problems, each literal
occurs with exactly the same frequency in the m clauses, only the combination of the literals
But the usual uniform α-random 3-CNF-SAT problems are such that: E[#xi ] = 3 α, where
the variables are drawn randomly from a multinomial population with P [xi appears in a
clause] $= p_i = \frac{3\alpha}{m} = \frac{3}{n}$. For large n, the number of occurrences of each variable will asymptotically
follow a Normal distribution N (µ, σ²) with µ = m·pi = 3 α and σ² = m·pi (1−pi ) ≈ 3 α.
If we consider, after sorting the clauses as explained in our descriptor approach, the second
half of the clauses (where Mα (t) ≥ t), we will get a folded normal distribution for
\[
\begin{aligned}
D_i &\sim |N(0, 3\alpha)| \\
E[D_i] &\approx \sigma\sqrt{\frac{2}{\pi}} \approx \sqrt{\frac{6\alpha}{\pi}} = \sqrt{1.9098\,\alpha} \\
\Rightarrow E[\#\{x_i\}] &\approx 3\alpha - \sqrt{1.9098\,\alpha} \quad \text{for the clauses where } M_\alpha(t) \ge t
\end{aligned}
\]
So, if we have #{xi } = α in the exact uniformly distributed α-random 3-CNF-SAT problems,
this corresponds to a “folded” expected occurrence $E[\#\{x_i\}] \approx 3\alpha - \sqrt{1.9098\,\alpha}$ for usual
Corollary VII.1: The threshold α = 5, 19 found for exact uniformly distributed α-random
[Figure: ordered clauses for the 3-CNF-SAT problem with n = 75 and m = 325 (uuf75-01.cnf from http://www.satlib.org/ubcsat).]
Note: This is still a theoretical value for the threshold. Indeed, for usual uniform α-random
generated 3-CNF-SAT problems, we detect a small difference between the observed and the
theoretical expected number of solutions with respect to the first j analyzed clauses $\bigwedge_{i=1}^{j} \psi_i$.
The theoretical expected number of solutions $E[\#[\bigwedge_{i=1}^{j} \psi_i]]$ is defined as $7 \cdot (\frac{7}{4})^s \cdot (\frac{7}{8})^t$, where
s is the number of clauses in {ψ2 , · · · , ψj } introducing a new additional variable and t the
number of remaining clauses. Figure 5 shows the situation for a 3-CNF-SAT problem with
This difference shows that the theoretical expected values overestimate the observed values.
Let us note that we re-ordered the m clauses ψj in such a way that new additional
variables appear as late as possible in the 3-CNF-SAT problem (in order not to
Our research was built on exact uniformly distributed α-random 3-CNF-SAT problems.
The complexity analysis was mainly done in terms of expected values of some characteristics.
We see that these expected values overestimate the real values. This means that our
conclusions about the most difficult value for α [= 5.19], and therefore about the maximum
theoretical value for the complexity $2^{M_\alpha(t)}$ [= $2^{490}$], are overestimated. Future research will try
to suppress this bias to be more accurate in our estimation of the complexity for the NP
problems.
We have seen that for α ≈ 5.19, the maximum complexity for an α-random 3-CNF-SAT problem
will be around $2^{490}$ whatever the number of variables. The NP problems are then
not exponential but bounded exponential problems. This makes them belong
to P . But even with “yottaflops” computers ($10^{24}$ instructions per second), this can
Even if this paper is mostly theoretical, each theorem was validated by extensive numerical
tests. Future research will aim to improve our different algorithms implementing the
References
[1] Stanley Burris. A Course in Universal Algebra. Dover Pubns, City, 2012.
[2] Th. Cormen, Ch. Leiserson, R. Rivest, and Cl. Stein. Introduction to Algo-
[5] M. Sipser. The History and Status of the P versus NP Question. Proceed-