Cellular Automata, Many-Valued Logic, and Deep Neural Networks
YANI ZHANG AND HELMUT BÖLCSKEI
1. We will abbreviate both the singular “cellular automaton” and the plural form “cellular automata” as “CA”.
2. With slight abuse of terminology, we shall use the term MV logic to refer to Łukasiewicz propositional logic.
3. ReLU stands for the Rectified Linear Unit nonlinearity, defined as x ↦ max{0, x}.
Notation: 1{·} denotes the truth function, which takes the value 1 if the statement inside {·} is true and equals 0 otherwise. 0_N stands for the N-dimensional column vector with all entries equal to 0. N_0 := N ∪ {0}, and |A| stands for the cardinality of the set A. ‖·‖_1 is the ℓ1-norm.
1.1. Cellular automata. CA were invented in the 1940s by von Neu-
mann [45] and Ulam [43] in an effort to build models that are capable of
universal computation and self-reproduction. Von Neumann’s conceptual-
ization emphasized the aspect of self-reproduction, while Ulam suggested
the use of finite state machines on two-dimensional lattices. A widely
known CA is the two-dimensional Game of Life devised by Conway [18].
Despite the simplicity of its rules, the Game of Life exhibits remarkable
behavioral complexity and has therefore attracted widespread and long-
standing interest. We begin by briefly reviewing the Game of Life.
Consider an infinite two-dimensional grid of square cells centered on the
points of an underlying lattice (symbolized by dashed lines in Figure 1).
Initially, each cell (or equivalently lattice point) is in one of two possible
states, namely “live” or “dead”. Each cell has eight neighbors, taking into
account horizontal, vertical, and diagonal directions (see Figure 1). For a
given cell, we refer to the set made up of the cell itself and its neighbors as
the neighborhood of the cell. The cells change their states synchronously
at discrete time steps, all following the same rule given by:
1. Every live cell with two or three live neighbors stays live; other live
cells turn into dead cells.
2. Every dead cell with three live neighbors becomes live; other dead
cells stay dead.
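To make these two rules concrete, the following sketch (in Python; not from the paper, and it approximates the infinite grid by a finite patch whose exterior is treated as dead) advances a configuration by one synchronous time step.

```python
# Minimal sketch of one synchronous Game of Life update on a finite patch.
# Cells outside the patch are treated as dead, so this only approximates the
# infinite grid considered in the text.
def game_of_life_step(grid):
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight neighbors.
            live = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if live in (2, 3) else 0  # rule 1
            else:
                new_grid[r][c] = 1 if live == 3 else 0       # rule 2
    return new_grid

# A "blinker": three live cells in a row oscillate with period 2.
blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(game_of_life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```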
(x0, x−1)      00   10   01   11
f(x0, x−1)      0    0    0    1

Table 1. Transition function f.
x        0    1/3    2/3    1
f(x)     0     1      1     0
[Figure: graphs of f(x) (left) and fc(x) (right) on [0, 1], with grid points at 0, 1/3, 2/3, 1.]
x        0    1/3    2/3    1
g(x)     0    1/3     1     0
[Figure: graphs of g(x) (left) and gc(x) (right) on [0, 1], with grid points at 0, 1/3, 2/3, 1.]
where
W_1^{\oplus}(x, y) = \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + 1, \qquad W_2^{\oplus}(x) = -x + 1.
To realize the operation x ⊙ y = max{0, x + y − 1}, we directly note that
\max\{0, x + y - 1\} = \rho\left( \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} - 1 \right) = (W_2^{\odot} \circ \rho \circ W_1^{\odot})(x, y),
for x, y ∈ [0, 1], where
W_1^{\odot}(x, y) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} - 1, \qquad W_2^{\odot}(x) = x.
⊣
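As a quick numerical sanity check (a sketch in Python, not part of the paper), the two-layer ReLU realizations above can be evaluated and compared with min{1, x + y} and max{0, x + y − 1} on a grid of points in [0, 1]².

```python
# Sketch: the two-layer ReLU realizations of the Lukasiewicz operations
# x (+) y = min{1, x + y} and x (.) y = max{0, x + y - 1} constructed above.
def rho(t):                       # ReLU nonlinearity
    return max(0.0, t)

def oplus(x, y):                  # (W2 o rho o W1)(x, y), W1: (x, y) -> -x - y + 1, W2: t -> -t + 1
    return -rho(-x - y + 1) + 1

def odot(x, y):                   # (W2 o rho o W1)(x, y), W1: (x, y) -> x + y - 1, W2: t -> t
    return rho(x + y - 1)

grid = [i / 10 for i in range(11)]
assert all(abs(oplus(x, y) - min(1.0, x + y)) < 1e-12 for x in grid for y in grid)
assert all(abs(odot(x, y) - max(0.0, x + y - 1)) < 1e-12 for x in grid for y in grid)
print("both realizations match on the test grid")
```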
Equipped with the ReLU network realizations of the logical operations
underlying Id , we are now ready to state the following universal represen-
tation result.
Proposition 3.5. Consider the DMV algebra I_d in Definition 2.6. Let n ∈ N. For each DMV term τ(x_1, . . . , x_n) and its associated term function τ^{I_d} : [0, 1]^n → [0, 1], there exists a ReLU network Φ ∈ N_{n,1} such that
Φ(x_1, . . . , x_n) = τ^{I_d}(x_1, . . . , x_n), for all (x_1, . . . , x_n) ∈ [0, 1]^n.
Proof. The proof follows by realizing the logical operations appearing in the term function τ^{I_d} through corresponding concatenations, according to Lemma 3.1, of ReLU networks implementing the operations ⊕ and ⊙ as per Lemma 3.4, and noting that ¬x = 1 − x and δ_i x = (1/i) x are trivially ReLU networks with one layer. ⊣
We note that, in general, the ReLU network Φ in Proposition 3.5 will be a properly deep network, as it is obtained by concatenating the networks realizing the basic logical operations ¬, ⊕, ⊙, and {δ_i}_{i∈N}.
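To illustrate the composition underlying Proposition 3.5 (a sketch; the term τ below is an arbitrary example, and the building blocks are composed as Python functions rather than written out as concatenated affine maps), one can realize, e.g., τ(x, y) = ¬(x ⊕ x) ⊙ δ₂y and compare with direct evaluation of the Łukasiewicz operations.

```python
# Sketch: realizing the term function of tau(x, y) = (not (x (+) x)) (.) delta_2 y
# by composing the ReLU building blocks, in the spirit of Proposition 3.5.
def rho(t):
    return max(0.0, t)

def oplus(x, y):  return -rho(-x - y + 1) + 1    # x (+) y = min{1, x + y}
def odot(x, y):   return rho(x + y - 1)          # x (.) y = max{0, x + y - 1}
def neg(x):       return 1 - x                   # one-layer affine map
def delta(i, x):  return x / i                   # one-layer affine map

def tau_network(x, y):
    return odot(neg(oplus(x, x)), delta(2, y))

def tau_direct(x, y):
    return max(0.0, (1 - min(1.0, 2 * x)) + y / 2 - 1)

grid = [i / 8 for i in range(9)]
assert all(abs(tau_network(x, y) - tau_direct(x, y)) < 1e-12 for x in grid for y in grid)
print("composed ReLU realization matches the term function on the test grid")
```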
Now the ground has been prepared for the central result in this section,
namely a universal ReLU network realization theorem for CA transition
functions. Specifically, this will be effected by combining the connection
between CA and DMV algebras established in Section 2 with the ReLU
network realizations of the logical operations in DMV algebras presented
above.
Theorem 3.6. Consider a CA with cellular space dimension d ∈ N, neighborhood size n ∈ N, state set K = {0, 1/(k−1), . . . , (k−2)/(k−1), 1} of cardinality k ∈ N, k ≥ 2, and transition function f : K^n → K. There exists a ReLU network Φ ∈ N_{n,1} satisfying
Φ(x_1, . . . , x_n) = f(x_1, . . . , x_n), for all (x_1, . . . , x_n) ∈ K^n.
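For intuition, here is a hand-constructed instance (a sketch; this particular network is built directly rather than through the DMV-term machinery used in the proof): for the transition function with values f(0) = 0, f(1/3) = 1, f(2/3) = 1, f(1) = 0 tabulated earlier, the one-hidden-layer network Φ(x) = ρ(3x) − ρ(3x − 1) − ρ(3x − 2) agrees with f on K = {0, 1/3, 2/3, 1}.

```python
# Sketch: a directly constructed ReLU network matching the 4-state transition
# function f(0) = 0, f(1/3) = 1, f(2/3) = 1, f(1) = 0 on K = {0, 1/3, 2/3, 1}.
# (Built by hand for illustration; the proof of Theorem 3.6 instead composes
# the DMV operations underlying f.)
from fractions import Fraction

def rho(t):
    return max(Fraction(0), t)

def phi(x):
    # One hidden layer with three rho-neurons; integer weights and biases.
    return rho(3 * x) - rho(3 * x - 1) - rho(3 * x - 2)

K = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
f = {K[0]: Fraction(0), K[1]: Fraction(1), K[2]: Fraction(1), K[3]: Fraction(0)}
assert all(phi(x) == f[x] for x in K)
print("network matches f on all of K")
```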
We are left with the design of the network Φ_f, which has to satisfy
F(c)[z] = Φ_f(c[z], c[z−1], . . . , c[z−n+1]).
It follows by inspection of (20) that this amounts to realizing the CA
transition function f through the ReLU network Φf . Based on this insight,
we can apply Theorem 3.6 to conclude the existence of Φf . In fact, the
proof of Theorem 3.6 spells out how Φf can be obtained explicitly by
composing the logical operations in the DMV term associated with f . The
proof is now completed by noting that the overall network Φ is obtained
by applying Lemma 3.2 and Lemma 3.3 to combine Φf and Φh according
to (17). ⊣
for all (x1 , . . . , xn ) ∈ K n . This, of course, requires that the RNN being
trained have “seen” all possible combinations of neighborhood states and
thereby the transition function f on its entire domain K n , a condition met
when the training configuration sequences are sufficiently long and their
initial configurations exhibit sufficient richness [1, Thesis 3.1, Thesis 3.2].
We will make this a standing assumption. In addition, the cellular space,
the state set, and the neighborhood set are taken to be known a priori.
4.1. Interpolation. The procedure for extracting DMV terms under-
lying CA evolution data presented in Section 4.3 below works off Φf as a
(continuous) function mapping [0, 1]n to [0, 1]. It turns out, however, that
condition (25) does not uniquely determine Φf on [0, 1]n as there are—in
general infinitely many—different ways of interpolating f (x1 , . . . , xn ) to a
continuous piecewise linear function; recall that ReLU networks always re-
alize continuous piecewise linear functions. Since Theorem 2.3 states that
the truth functions associated with DMV terms under Id are continuous
piecewise linear functions with rational coefficients, we can constrain the
weights of Φf to be rational. In fact, as the next two lemmata show, when
interpolating CA transition functions, we can tighten this constraint even
further, namely to integer weights and rational biases. We first establish
that CA transition functions can be interpolated to continuous piecewise
linear functions with integer weights and rational biases.⁴ Then, it is
shown that such functions can be realized by ReLU networks with integer
weights and rational biases.
4. To be consistent with terminology used in the context of neural networks, for a linear function of the form p(x_1, . . . , x_n) = m_1 x_1 + · · · + m_n x_n + b, the coefficients m_i will often be referred to as weights and b as bias.
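The first of these claims is easy to check numerically in the one-dimensional case (a sketch with an arbitrarily chosen transition function): since the grid spacing and the function values are both multiples of 1/(k − 1), the slope of each linear piece of the interpolant is an integer and its bias is rational.

```python
# Sketch (one-dimensional case, arbitrary example transition function): the
# piecewise linear interpolation of f: K -> K, K = {0, 1/(k-1), ..., 1}, has
# integer weights (slopes) and rational biases on every piece.
from fractions import Fraction

k = 5
K = [Fraction(i, k - 1) for i in range(k)]
f = {K[0]: K[2], K[1]: K[4], K[2]: K[1], K[3]: K[3], K[4]: K[0]}  # arbitrary f: K -> K

for x0, x1 in zip(K[:-1], K[1:]):
    slope = (f[x1] - f[x0]) / (x1 - x0)  # a multiple of 1/(k-1) divided by 1/(k-1)
    bias = f[x0] - slope * x0
    assert slope.denominator == 1        # integer weight
    print(f"piece on [{x0}, {x1}]: weight {slope}, bias {bias}")
```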
For n ≥ 3, we first divide the unit cube [0, 1]^n into the n-dimensional cubes [i_1/(k−1), (i_1+1)/(k−1)] × · · · × [i_n/(k−1), (i_n+1)/(k−1)], i_1, . . . , i_n ∈ {0, . . . , k − 2}. Each of the resulting smaller cubes is then subdivided following the procedure described in [22, Proof of 2.10], which divides, e.g., ∆^{n−1} × [0, 1/(k−1)] into n-simplices as follows. Let ∆^{n−1} × {0} = [v_0, . . . , v_{n−1}] and ∆^{n−1} × {1/(k−1)} = [w_0, . . . , w_{n−1}] and note that the coordinates of v_j ∈ R^n and w_j ∈ R^n coincide in the first n − 1 indices. Then, ∆^{n−1} × [0, 1/(k−1)] is given by the union of the n-simplices [v_0, . . . , v_j, w_j, . . . , w_{n−1}], j = 0, . . . , n − 1, each intersecting the next one in an (n−1)-simplex face. The result of this procedure yields a subdivision of [0, 1]^n into n-simplices with vertices in K^n. As in the case n = 2, an interpolating linear function is uniquely determined on each of these n-simplices [41, Chapter 13]; we stitch the resulting linear pieces together to obtain a function f_c satisfying Property 1 and exhibiting the structure demanded by Property 3. As f_c constitutes a simplex interpolation of the discrete function f on the cube [0, 1]^n, it is continuous [39], [48], thereby satisfying Property 2.
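For readers who wish to experiment, the following sketch evaluates a simplex interpolant of a discrete function on the grid K^n using the standard sort-based (Kuhn-type) triangulation of each sub-cube; as remarked below, the subdivision into simplices is not unique, so this particular choice need not coincide with the one from [22, Proof of 2.10].

```python
# Sketch: sort-based simplex interpolation of a discrete function given on the
# grid K^n, K = {0, 1/(k-1), ..., 1}.  One admissible triangulation of each
# sub-cube is used; it need not be the subdivision employed in the text.
def simplex_interpolate(f, x, k):
    """f maps index tuples in {0,...,k-1}^n to values; x is a point in [0,1]^n."""
    n = len(x)
    base, lam = [], []
    for xi in x:
        t = xi * (k - 1)
        i = min(int(t), k - 2)       # sub-cube index along this axis
        base.append(i)
        lam.append(t - i)            # local coordinate in [0, 1]
    # Walk the simplex path determined by sorting the local coordinates.
    order = sorted(range(n), key=lambda j: -lam[j])
    vertex = list(base)
    value = (1 - lam[order[0]]) * f(tuple(vertex))
    for pos, j in enumerate(order):
        vertex[j] += 1
        nxt = lam[order[pos + 1]] if pos + 1 < n else 0.0
        value += (lam[j] - nxt) * f(tuple(vertex))
    return value

# Example: k = 3 (states 0, 1/2, 1), n = 2, f given by its values on the grid.
table = {(i, j): ((i + j) % 3) / 2 for i in range(3) for j in range(3)}
f = lambda idx: table[idx]
print(simplex_interpolate(f, (0.5, 0.5), 3))  # grid point: reproduces f, here 1.0
print(simplex_interpolate(f, (0.3, 0.7), 3))  # an interpolated value off the grid
```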
We hasten to add that, for n ≥ 2, the set of n-simplices resulting from the subdivision procedure employed here is not unique; see Figure 7 for an illustration in the case n = 2. But each fixed subdivision into n-simplices uniquely determines a continuous piecewise linear function f_c that interpolates f.
Figure 8 shows two different ReLU networks Φf1 , Φf2 : [0, 1]2 → [0, 1],
both with integer weights and biases in Qk , interpolating the transition
function. We remark that Φf1 effects simplex interpolation, while Φf2 does
not. Now, applying the DMV formula extraction algorithm introduced in
Section 4.3 below yields the DMV term associated with Φf1 as
τ1 = (x−1 ⊕ x−1 ) ∧ (¬x−1 ⊕ ¬x−1 ) ∧ (x0 ⊕ x0 ) ∧ (¬x0 ⊕ ¬x0 ),
and that associated with Φf2 according to
τ2 = (x−1 ⊕ x−1 ) ∧ (¬x−1 ⊕ ¬x−1 ) ∧ (x0 ⊕ x0 ⊕ x0 ) ∧ (¬x0 ⊕ ¬x0 ⊕ ¬x0 ).
These two DMV terms are algebraically and functionally different, but the associated term functions under I_d coincide on K^2, i.e.,
τ_1^{I_d}(x_{−1}, x_0) = τ_2^{I_d}(x_{−1}, x_0) = f(x_{−1}, x_0), for (x_{−1}, x_0) ∈ {0, 1/2, 1}^2.
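This coincidence is easy to verify numerically (a sketch; under I_d, ∧ is the pointwise minimum, ⊕ is truncated addition, and ¬x = 1 − x):

```python
# Sketch: tau_1 and tau_2 above are different as functions on [0,1]^2 but agree
# on the state set K^2 = {0, 1/2, 1}^2.
from fractions import Fraction
from functools import reduce

def oplus(*args):                # x (+) y = min{1, x + y}, extended to several arguments
    return min(Fraction(1), sum(args))

def neg(x):                      # not x = 1 - x
    return 1 - x

def meet(*args):                 # lattice meet; under I_d this is the pointwise minimum
    return reduce(min, args)

def tau1(xm1, x0):
    return meet(oplus(xm1, xm1), oplus(neg(xm1), neg(xm1)),
                oplus(x0, x0), oplus(neg(x0), neg(x0)))

def tau2(xm1, x0):
    return meet(oplus(xm1, xm1), oplus(neg(xm1), neg(xm1)),
                oplus(x0, x0, x0), oplus(neg(x0), neg(x0), neg(x0)))

K = [Fraction(0), Fraction(1, 2), Fraction(1)]
assert all(tau1(a, b) == tau2(a, b) for a in K for b in K)
# Off K^2 the two term functions differ, e.g. at (1/2, 1/3):
print(tau1(Fraction(1, 2), Fraction(1, 3)), tau2(Fraction(1, 2), Fraction(1, 3)))  # 2/3 1
```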
We shall consider all term functions that coincide with a given CA
transition function on K n to constitute an equivalence class in the sense
of faithfully describing the logical behavior of the underlying CA.
There are two further sources of potential nonuniqueness discussed next.
4.2. Uniqueness properties of DMV formula extraction. We
first comment on a structural property pertaining to the RNN emulat-
ing the CA evolution. Recall the decomposition of Φ in (17). Our construction builds on the idea of having the hidden-state vector store the neighbors of the current input cell, which leads to the specific form of Φh, with the subnetwork Φf responsible for the computation of the output samples. But this separation need not be the only way the RNN
can realize the CA evolution. The resulting ambiguity is, however, eas-
ily eliminated by enforcing the split of Φ according to (17) on the RNN
This shows that even if we restrict ourselves to DNF, the algebraic ex-
pression will not be unique in general. Moreover, we can also express rule
30 in conjunctive normal form (CNF), i.e., as a concatenation of a finite
number of clauses linked by the Boolean ⊙ operation, where each clause
consists of a finite number of variables or negated variables connected by
the Boolean ⊕ operation, e.g.,
or
and
Now, noting that the ReLU network realization of f_30, as expressed in (32), is given by
\widehat{W}_3^{f_{30}} \circ \rho \circ \widehat{W}_2^{f_{30}} \circ \rho \circ \widehat{W}_1^{f_{30}},
with
\widehat{W}_1^{f_{30}}(x_{-1}, x_0, x_1) = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & -1 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_0 \\ x_{-1} \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix},
\widehat{W}_2^{f_{30}}(x) = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} x + \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad x ∈ R^4,
\widehat{W}_3^{f_{30}}(x) = \begin{pmatrix} 1 & -1 \end{pmatrix} x, \quad x ∈ R^2,
we can conclude, by comparing to the network (7) built based on (2),
that different algebraic expressions for f30 lead to different ReLU network
realizations. Yet, these two networks must exhibit identical input-output
relations. Conversely, the observation just made shows that ReLU net-
works can be modified without changing their input-output relations.
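The reconstruction above can be checked numerically (a sketch; rule 30 is f_30(x_{−1}, x_0, x_1) = x_{−1} XOR (x_0 OR x_1) on Boolean inputs):

```python
# Sketch: the ReLU network with the affine maps W1_hat, W2_hat, W3_hat given
# above reproduces elementary CA rule 30 on all Boolean neighborhood states.
from itertools import product

def rho_vec(v):
    return [max(0, t) for t in v]

def W1(xm1, x0, x1):
    rows = [(-1, -1, 1), (1, 1, -1), (-1, 1, -1), (1, -1, -1)]
    bias = (0, -1, 0, 0)
    # The matrix acts on the column vector (x1, x0, x_{-1}).
    return [a * x1 + b * x0 + c * xm1 + d for (a, b, c), d in zip(rows, bias)]

def W2(z):
    s = sum(z)
    return [s, s - 1]

def W3(y):
    return y[0] - y[1]

def network(xm1, x0, x1):
    return W3(rho_vec(W2(rho_vec(W1(xm1, x0, x1)))))

def rule30(xm1, x0, x1):
    return xm1 ^ (x0 | x1)

assert all(network(*x) == rule30(*x) for x in product((0, 1), repeat=3))
print("network reproduces rule 30 on all 8 neighborhood configurations")
```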
4.3. Extracting DMV formulae from trained networks. We now
discuss how DMV formulae can be read out from the network Φf . Recall
that the idea is that Φf was trained on CA evolution data and the ex-
tracted DMV formula describes the logical behavior underlying this CA.
In the Boolean case, with K = {0, 1}, the truth table representing the
CA transition function can be obtained by passing all possible neighbor-
hood state combinations through the trained network Φf and recording
the corresponding outputs. Following, e.g., the procedure in [38, Section
12.2], this truth table can then be turned into a Boolean formula. For
state sets K of cardinality larger than two, we are not aware of systematic
procedures that construct DMV terms from truth tables. The approach
we develop here applies to state sets of arbitrary cardinality and does not
work off truth tables, but rather starts from a (trained) ReLU network Φf
that, by virtue of satisfying the interpolation property (25), realizes a
linear interpolation of the truth table.
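Returning briefly to the Boolean case described above, the truth-table read-out is straightforward to sketch (the helper below is hypothetical; the trained network is represented abstractly as a callable, and the formula is assembled in disjunctive normal form with one conjunction per true row):

```python
# Sketch of the Boolean read-out: query the network on all neighborhood
# configurations and turn the resulting truth table into a DNF formula.
from itertools import product

def extract_boolean_formula(phi_f, n):
    """phi_f: callable {0,1}^n -> {0,1}; returns a DNF expression as a string."""
    clauses = []
    for assignment in product((0, 1), repeat=n):
        if round(phi_f(*assignment)) == 1:
            literals = [f"x{i + 1}" if v else f"(not x{i + 1})"
                        for i, v in enumerate(assignment)]
            clauses.append("(" + " and ".join(literals) + ")")
    return " or ".join(clauses) if clauses else "0"

# Toy stand-in for a trained network: the majority of three Boolean inputs.
majority = lambda a, b, c: int(a + b + c >= 2)
print(extract_boolean_formula(majority, 3))
```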
We start by noting that Φf encodes its underlying DMV formula both
through the network topology and the network weights and biases. Hon-
oring how the network architecture, i.e., layer-by-layer compositions of
affine mappings and the ReLU nonlinearity, affects the extracted DMV
term, we aim to proceed in a decompositional manner and on a node-by-
node basis. It turns out, however, that the individual neurons, in general,
will not correspond to DMV terms. To see this, recall that DMV term
functions map [0, 1]n to [0, 1] and note that the range of the function ρ is
not in [0, 1] in general, e.g., ρ(3x) : [0, 1] → [0, 3]. We will deal with this
matter by transforming the ρ-neurons in Φf into σ-neurons with a suit-
ably chosen σ : R → [0, 1]. This will be done in a manner that preserves
the network’s input-output relation and is reversible in a sense to be made
precise below. We start by defining σ-networks.
We are now ready to describe how the DMV term underlying a given ReLU network with integer weights and biases in Q_k mapping [0, 1]^n to [0, 1] can be extracted. The corresponding algorithm starts by applying Lemma 4.3 to convert the network into a functionally equivalent σ-network
Ψ_f = W_L ◦ σ ◦ · · · ◦ W_2 ◦ σ ◦ W_1,
Ψ_f = σ ◦ W_L ◦ · · · ◦ σ ◦ W_2 ◦ σ ◦ W_1.
6. Implementation available at https://www.mins.ee.ethz.ch/research/downloads/NN2MV.html
N_0 = d, and N_L = d′. The basic idea of the proof is to use the relationship
(48) σ(x) = ρ(x) − ρ(x − 1), for all x ∈ R,
to replace every σ-neuron in Ψ with a pair of ρ-neurons. We start with
σ ◦ W1 to obtain the equivalent network
\Psi^{(1)} = W_L \circ \sigma \circ \cdots \circ \sigma \circ W_2 \circ H_1^{(1)} \circ \rho \circ W_1^{(1)},
where
W_1^{(1)}(x) := \begin{pmatrix} W_1(x) \\ W_1(x) - 1_{N_1} \end{pmatrix}, \quad x ∈ R^{N_0}, \qquad H_1^{(1)}(x) := \begin{pmatrix} I_{N_1} & -I_{N_1} \end{pmatrix} x, \quad x ∈ R^{2N_1},
with 1_{N_1} denoting the N_1-dimensional column vector with all entries equal to 1. It follows directly that Ψ^{(1)} has integer weights and biases in Q_k. Continuing in this manner, we get
\Psi^{(L-1)} = W_L \circ H_{L-1}^{(L-1)} \circ \rho \circ W_{L-1}^{(L-1)} \circ \cdots \circ W_2^{(2)} \circ H_1^{(1)} \circ \rho \circ W_1^{(1)},
with
W_\ell^{(\ell)}(x) = \begin{pmatrix} W_\ell(x) \\ W_\ell(x) - 1_{N_\ell} \end{pmatrix}, \quad x ∈ R^{N_{\ell-1}}, \qquad H_\ell^{(\ell)}(x) = \begin{pmatrix} I_{N_\ell} & -I_{N_\ell} \end{pmatrix} x, \quad x ∈ R^{2N_\ell},
for ℓ = 1, . . . , L − 1, satisfying
Ψ^{(L−1)}(x) = Ψ(x), for all x ∈ R^d.
The proof is concluded upon identifying Ψ^{(L−1)} with Φ and noting that Ψ^{(L−1)} has integer weights and biases in Q_k and L(Ψ^{(L−1)}) = L(Ψ). ⊣
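A small numerical sketch of the mechanism in this proof (the layer sizes and weights below are arbitrary test data, not taken from the paper): every σ-neuron σ(w·x + b) is replaced by the pair ρ(w·x + b), ρ(w·x + b − 1), which the next layer recombines with weights (1, −1), leaving the input-output map unchanged by (48).

```python
# Sketch of the sigma-to-rho replacement (48): sigma(t) = rho(t) - rho(t - 1),
# demonstrated on a single hidden layer with arbitrary integer weights.
def rho(t):
    return max(0.0, t)

def sigma(t):                       # clipped ReLU, mapping R onto [0, 1]
    return min(1.0, max(0.0, t))

W1 = [(2, -1), (-3, 2)]             # arbitrary integer weight rows
b1 = [0.5, 1.0]                     # arbitrary biases
w2, b2 = (1, 1), -0.5               # output layer

def sigma_net(x1, x2):
    h = [sigma(w[0] * x1 + w[1] * x2 + b) for w, b in zip(W1, b1)]
    return w2[0] * h[0] + w2[1] * h[1] + b2

def rho_net(x1, x2):
    # Each sigma-neuron becomes two rho-neurons, recombined with weights (1, -1).
    h = []
    for w, b in zip(W1, b1):
        t = w[0] * x1 + w[1] * x2 + b
        h.append(rho(t) - rho(t - 1))
    return w2[0] * h[0] + w2[1] * h[1] + b2

pts = [(i / 7, j / 7) for i in range(8) for j in range(8)]
assert all(abs(sigma_net(*p) - rho_net(*p)) < 1e-12 for p in pts)
print("sigma-network and expanded rho-network agree on all test points")
```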
7. To keep the exposition simple, we consider the form of Ψ for L ≥ 3; the cases L = 1, 2 are trivially contained in the discussion.
Appendix D.
Lemma D.1. Let n ∈ N and let f_c : [0, 1]^n → [0, 1] be a continuous piecewise linear function with linear pieces
(49) p_j(x_1, . . . , x_n) = m_{j1} x_1 + · · · + m_{jn} x_n + b_j, j = 1, . . . , ℓ,
It hence suffices to prove that f_c(x) ≥ g_π(x), for x ∈ [0, 1]^n, for all π ∈ Σ, to obtain
f_c(x) = max_{π∈Σ} g_π(x), for x ∈ [0, 1]^n,
which establishes the desired result (50). By continuity of f_c and the fact that the min and max of continuous functions are continuous, it suffices to establish f_c ≥ g_π on the set
{x ∈ [0, 1]^n : ∃ π ∈ Σ such that x is in the interior of P_π}.
Now fix an arbitrary π ∈ Σ. As already noted above, g_π(x) = f_c(x), for x ∈ P_π. Take an arbitrary point y ∉ P_π that is in the interior of some P_η. There exists a k ∈ {1, . . . , ℓ} so that f_c(x) = p_{π(k)}(x), for x ∈ P_η. We treat the cases k ≤ i_π and k > i_π separately. First, if k ≤ i_π, then
f_c(y) = p_{π(k)}(y) ≥ min_{i∈{1,...,i_π}} p_{π(i)}(y) = g_π(y).
REFERENCES