Cellular Automata, Many-Valued Logic, and Deep Neural Networks
YANI ZHANG AND HELMUT BÖLCSKEI
1. We will abbreviate both the singular “cellular automaton” and the plural form “cellular automata” as “CA”.
2. With slight abuse of terminology, we shall use the term MV logic to refer to Łukasiewicz propositional logic.
3. ReLU stands for the Rectified Linear Unit nonlinearity, defined as x ↦ max{0, x}.
Notation: 1{·} denotes the truth function, which takes the value 1 if the statement inside {·} is true and equals 0 otherwise. 0_N stands for the N-dimensional column vector with all entries equal to 0. N_0 := N ∪ {0}, and |A| stands for the cardinality of the set A. ‖·‖_1 is the ℓ1-norm.
1.1. Cellular automata. CA were invented in the 1940s by von Neu-
mann [45] and Ulam [43] in an effort to build models that are capable of
universal computation and self-reproduction. Von Neumann’s conceptual-
ization emphasized the aspect of self-reproduction, while Ulam suggested
the use of finite state machines on two-dimensional lattices. A widely
known CA is the two-dimensional Game of Life devised by Conway [18].
Despite the simplicity of its rules, the Game of Life exhibits remarkable
behavioral complexity and has therefore attracted widespread and long-
standing interest. We begin by briefly reviewing the Game of Life.
Consider an infinite two-dimensional grid of square cells centered on the
points of an underlying lattice (symbolized by dashed lines in Figure 1).
Initially, each cell (or equivalently lattice point) is in one of two possible
states, namely “live” or “dead”. Each cell has eight neighbors, taking into
account horizontal, vertical, and diagonal directions (see Figure 1). For a
given cell, we refer to the set made up of the cell itself and its neighbors as
the neighborhood of the cell. The cells change their states synchronously
at discrete time steps, all following the same rule given by:
1. Every live cell with two or three live neighbors stays live; other live
cells turn into dead cells.
2. Every dead cell with three live neighbors becomes live; other dead
cells stay dead.
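To make these two rules concrete, the following sketch (in Python; not from the paper, and it approximates the infinite grid by a finite patch whose exterior is treated as dead) advances a configuration by one synchronous time step.

```python
# Minimal sketch of one synchronous Game of Life update on a finite patch.
# Cells outside the patch are treated as dead, so this only approximates the
# infinite grid considered in the text.
def game_of_life_step(grid):
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight neighbors.
            live = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if live in (2, 3) else 0  # rule 1
            else:
                new_grid[r][c] = 1 if live == 3 else 0       # rule 2
    return new_grid

# A "blinker": three live cells in a row oscillate with period 2.
blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(game_of_life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```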
(x0, x−1)      00   10   01   11
f(x0, x−1)      0    0    0    1

Table 1. Transition function f.
x        0    1/3    2/3    1
f(x)     0     1      1     0
[Figure: graphs of f(x) (left) and fc(x) (right) on [0, 1], with grid points at 0, 1/3, 2/3, 1.]
x        0    1/3    2/3    1
g(x)     0    1/3     1     0
[Figure: graphs of g(x) (left) and gc(x) (right) on [0, 1], with grid points at 0, 1/3, 2/3, 1.]
where
W_1^{\oplus}(x, y) = \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + 1, \qquad W_2^{\oplus}(x) = -x + 1.
To realize the operation x ⊙ y = max{0, x + y − 1}, we directly note that
\max\{0, x + y - 1\} = \rho\left( \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} - 1 \right) = (W_2^{\odot} \circ \rho \circ W_1^{\odot})(x, y),
for x, y ∈ [0, 1], where
W_1^{\odot}(x, y) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} - 1, \qquad W_2^{\odot}(x) = x.
⊣
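As a quick numerical sanity check (a sketch in Python, not part of the paper), the two-layer ReLU realizations above can be evaluated and compared with min{1, x + y} and max{0, x + y − 1} on a grid of points in [0, 1]².

```python
# Sketch: the two-layer ReLU realizations of the Lukasiewicz operations
# x (+) y = min{1, x + y} and x (.) y = max{0, x + y - 1} constructed above.
def rho(t):                       # ReLU nonlinearity
    return max(0.0, t)

def oplus(x, y):                  # (W2 o rho o W1)(x, y), W1: (x, y) -> -x - y + 1, W2: t -> -t + 1
    return -rho(-x - y + 1) + 1

def odot(x, y):                   # (W2 o rho o W1)(x, y), W1: (x, y) -> x + y - 1, W2: t -> t
    return rho(x + y - 1)

grid = [i / 10 for i in range(11)]
assert all(abs(oplus(x, y) - min(1.0, x + y)) < 1e-12 for x in grid for y in grid)
assert all(abs(odot(x, y) - max(0.0, x + y - 1)) < 1e-12 for x in grid for y in grid)
print("both realizations match on the test grid")
```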
Equipped with the ReLU network realizations of the logical operations
underlying Id , we are now ready to state the following universal represen-
tation result.
Proposition 3.5. Consider the DMV algebra I_d in Definition 2.6. Let n ∈ N. For each DMV term τ(x_1, . . . , x_n) and its associated term function τ^{I_d} : [0, 1]^n → [0, 1], there exists a ReLU network Φ ∈ N_{n,1} such that
Φ(x_1, . . . , x_n) = τ^{I_d}(x_1, . . . , x_n), for all (x_1, . . . , x_n) ∈ [0, 1]^n.
Proof. The proof follows by realizing the logical operations appearing in the term function τ^{I_d} through corresponding concatenations, according to Lemma 3.1, of ReLU networks implementing the operations ⊕ and ⊙ as per Lemma 3.4, and noting that ¬x = 1 − x and δ_i x = (1/i) x are trivially ReLU networks with one layer. ⊣
We note that, in general, the ReLU network Φ in Proposition 3.5 will be a properly deep network, as it is obtained by concatenating the networks realizing the basic logical operations ¬, ⊕, ⊙, and {δ_i}_{i∈N}.
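To illustrate the composition underlying Proposition 3.5 (a sketch; the term τ below is an arbitrary example, and the building blocks are composed as Python functions rather than written out as concatenated affine maps), one can realize, e.g., τ(x, y) = ¬(x ⊕ x) ⊙ δ₂y and compare with direct evaluation of the Łukasiewicz operations.

```python
# Sketch: realizing the term function of tau(x, y) = (not (x (+) x)) (.) delta_2 y
# by composing the ReLU building blocks, in the spirit of Proposition 3.5.
def rho(t):
    return max(0.0, t)

def oplus(x, y):  return -rho(-x - y + 1) + 1    # x (+) y = min{1, x + y}
def odot(x, y):   return rho(x + y - 1)          # x (.) y = max{0, x + y - 1}
def neg(x):       return 1 - x                   # one-layer affine map
def delta(i, x):  return x / i                   # one-layer affine map

def tau_network(x, y):
    return odot(neg(oplus(x, x)), delta(2, y))

def tau_direct(x, y):
    return max(0.0, (1 - min(1.0, 2 * x)) + y / 2 - 1)

grid = [i / 8 for i in range(9)]
assert all(abs(tau_network(x, y) - tau_direct(x, y)) < 1e-12 for x in grid for y in grid)
print("composed ReLU realization matches the term function on the test grid")
```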
Now the ground has been prepared for the central result in this section,
namely a universal ReLU network realization theorem for CA transition
functions. Specifically, this will be effected by combining the connection
between CA and DMV algebras established in Section 2 with the ReLU
network realizations of the logical operations in DMV algebras presented
above.
Theorem 3.6. Consider a CA with cellular space dimension d ∈ N, neighborhood size n ∈ N, state set K = {0, 1/(k−1), . . . , (k−2)/(k−1), 1} of cardinality k ∈ N, k ≥ 2, and transition function f : K^n → K. There exists a ReLU network Φ ∈ N_{n,1} satisfying
Φ(x_1, . . . , x_n) = f(x_1, . . . , x_n), for all (x_1, . . . , x_n) ∈ K^n.
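For intuition, here is a hand-constructed instance (a sketch; this particular network is built directly rather than through the DMV-term machinery used in the proof): for the transition function with values f(0) = 0, f(1/3) = 1, f(2/3) = 1, f(1) = 0 tabulated earlier, the one-hidden-layer network Φ(x) = ρ(3x) − ρ(3x − 1) − ρ(3x − 2) agrees with f on K = {0, 1/3, 2/3, 1}.

```python
# Sketch: a directly constructed ReLU network matching the 4-state transition
# function f(0) = 0, f(1/3) = 1, f(2/3) = 1, f(1) = 0 on K = {0, 1/3, 2/3, 1}.
# (Built by hand for illustration; the proof of Theorem 3.6 instead composes
# the DMV operations underlying f.)
from fractions import Fraction

def rho(t):
    return max(Fraction(0), t)

def phi(x):
    # One hidden layer with three rho-neurons; integer weights and biases.
    return rho(3 * x) - rho(3 * x - 1) - rho(3 * x - 2)

K = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
f = {K[0]: Fraction(0), K[1]: Fraction(1), K[2]: Fraction(1), K[3]: Fraction(0)}
assert all(phi(x) == f[x] for x in K)
print("network matches f on all of K")
```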
We are left with the design of the network Φ_f, which has to satisfy
F(c)[z] = Φ_f(c[z], c[z−1], . . . , c[z−n+1]).
It follows by inspection of (20) that this amounts to realizing the CA
transition function f through the ReLU network Φf . Based on this insight,
we can apply Theorem 3.6 to conclude the existence of Φf . In fact, the
proof of Theorem 3.6 spells out how Φf can be obtained explicitly by
composing the logical operations in the DMV term associated with f . The
proof is now completed by noting that the overall network Φ is obtained
by applying Lemma 3.2 and Lemma 3.3 to combine Φf and Φh according
to (17). ⊣
for all (x1 , . . . , xn ) ∈ K n . This, of course, requires that the RNN being
trained have “seen” all possible combinations of neighborhood states and
thereby the transition function f on its entire domain K n , a condition met
when the training configuration sequences are sufficiently long and their
initial configurations exhibit sufficient richness [1, Thesis 3.1, Thesis 3.2].
We will make this a standing assumption. In addition, the cellular space,
the state set, and the neighborhood set are taken to be known a priori.
4.1. Interpolation. The procedure for extracting DMV terms under-
lying CA evolution data presented in Section 4.3 below works off Φf as a
(continuous) function mapping [0, 1]n to [0, 1]. It turns out, however, that
condition (25) does not uniquely determine Φf on [0, 1]n as there are—in
general infinitely many—different ways of interpolating f (x1 , . . . , xn ) to a
continuous piecewise linear function; recall that ReLU networks always re-
alize continuous piecewise linear functions. Since Theorem 2.3 states that
the truth functions associated with DMV terms under Id are continuous
piecewise linear functions with rational coefficients, we can constrain the
weights of Φf to be rational. In fact, as the next two lemmata show, when
interpolating CA transition functions, we can tighten this constraint even
further, namely to integer weights and rational biases. We first establish
that CA transition functions can be interpolated to continuous piecewise
linear functions with integer weights and rational biases.⁴ Then, it is
shown that such functions can be realized by ReLU networks with integer
weights and rational biases.
4. To be consistent with terminology used in the context of neural networks, for a linear function of the form p(x_1, . . . , x_n) = m_1 x_1 + · · · + m_n x_n + b, the coefficients m_i will often be referred to as weights and b as bias.
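The first of these claims is easy to check numerically in the one-dimensional case (a sketch with an arbitrarily chosen transition function): since the grid spacing and the function values are both multiples of 1/(k − 1), the slope of each linear piece of the interpolant is an integer and its bias is rational.

```python
# Sketch (one-dimensional case, arbitrary example transition function): the
# piecewise linear interpolation of f: K -> K, K = {0, 1/(k-1), ..., 1}, has
# integer weights (slopes) and rational biases on every piece.
from fractions import Fraction

k = 5
K = [Fraction(i, k - 1) for i in range(k)]
f = {K[0]: K[2], K[1]: K[4], K[2]: K[1], K[3]: K[3], K[4]: K[0]}  # arbitrary f: K -> K

for x0, x1 in zip(K[:-1], K[1:]):
    slope = (f[x1] - f[x0]) / (x1 - x0)  # a multiple of 1/(k-1) divided by 1/(k-1)
    bias = f[x0] - slope * x0
    assert slope.denominator == 1        # integer weight
    print(f"piece on [{x0}, {x1}]: weight {slope}, bias {bias}")
```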
For n ≥ 3, we first divide the unit cube [0, 1]^n into the n-dimensional cubes [i_1/(k−1), (i_1+1)/(k−1)] × · · · × [i_n/(k−1), (i_n+1)/(k−1)], i_1, . . . , i_n ∈ {0, . . . , k − 2}. Each of the resulting smaller cubes is then subdivided following the procedure described in [22, Proof of 2.10], which divides, e.g., ∆^{n−1} × [0, 1/(k−1)] into n-simplices as follows. Let ∆^{n−1} × {0} = [v_0, . . . , v_{n−1}] and ∆^{n−1} × {1/(k−1)} = [w_0, . . . , w_{n−1}] and note that the coordinates of v_j ∈ R^n and w_j ∈ R^n coincide in the first n − 1 indices. Then, ∆^{n−1} × [0, 1/(k−1)] is given by the union of the n-simplices [v_0, . . . , v_j, w_j, . . . , w_{n−1}], j = 0, . . . , n − 1, each intersecting the next one in an (n−1)-simplex face. The result of this procedure yields a subdivision of [0, 1]^n into n-simplices with vertices in K^n. As in the case n = 2, an interpolating linear function is uniquely determined on each of these n-simplices [41, Chapter 13]; we stitch the resulting linear pieces together to obtain a function f_c satisfying Property 1 and exhibiting the structure demanded by Property 3. As f_c constitutes a simplex interpolation of the discrete function f on the cube [0, 1]^n, it is continuous [39], [48], thereby satisfying Property 2.
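For readers who wish to experiment, the following sketch evaluates a simplex interpolant of a discrete function on the grid K^n using the standard sort-based (Kuhn-type) triangulation of each sub-cube; as remarked below, the subdivision into simplices is not unique, so this particular choice need not coincide with the one from [22, Proof of 2.10].

```python
# Sketch: sort-based simplex interpolation of a discrete function given on the
# grid K^n, K = {0, 1/(k-1), ..., 1}.  One admissible triangulation of each
# sub-cube is used; it need not be the subdivision employed in the text.
def simplex_interpolate(f, x, k):
    """f maps index tuples in {0,...,k-1}^n to values; x is a point in [0,1]^n."""
    n = len(x)
    base, lam = [], []
    for xi in x:
        t = xi * (k - 1)
        i = min(int(t), k - 2)       # sub-cube index along this axis
        base.append(i)
        lam.append(t - i)            # local coordinate in [0, 1]
    # Walk the simplex path determined by sorting the local coordinates.
    order = sorted(range(n), key=lambda j: -lam[j])
    vertex = list(base)
    value = (1 - lam[order[0]]) * f(tuple(vertex))
    for pos, j in enumerate(order):
        vertex[j] += 1
        nxt = lam[order[pos + 1]] if pos + 1 < n else 0.0
        value += (lam[j] - nxt) * f(tuple(vertex))
    return value

# Example: k = 3 (states 0, 1/2, 1), n = 2, f given by its values on the grid.
table = {(i, j): ((i + j) % 3) / 2 for i in range(3) for j in range(3)}
f = lambda idx: table[idx]
print(simplex_interpolate(f, (0.5, 0.5), 3))  # grid point: reproduces f, here 1.0
print(simplex_interpolate(f, (0.3, 0.7), 3))  # an interpolated value off the grid
```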
We hasten to add that, for n ≥ 2, the set of n-simplices resulting from the subdivision procedure employed here is not unique; see Figure 7 for an illustration in the case n = 2. But each fixed subdivision into n-simplices uniquely determines a continuous piecewise linear function f_c that interpolates f.
Figure 8 shows two different ReLU networks Φf1 , Φf2 : [0, 1]2 → [0, 1],
both with integer weights and biases in Qk , interpolating the transition
function. We remark that Φf1 effects simplex interpolation, while Φf2 does
not. Now, applying the DMV formula extraction algorithm introduced in
Section 4.3 below yields the DMV term associated with Φf1 as
τ1 = (x−1 ⊕ x−1 ) ∧ (¬x−1 ⊕ ¬x−1 ) ∧ (x0 ⊕ x0 ) ∧ (¬x0 ⊕ ¬x0 ),
and that associated with Φf2 according to
τ2 = (x−1 ⊕ x−1 ) ∧ (¬x−1 ⊕ ¬x−1 ) ∧ (x0 ⊕ x0 ⊕ x0 ) ∧ (¬x0 ⊕ ¬x0 ⊕ ¬x0 ).
These two DMV terms are algebraically and functionally different, but the associated term functions under I_d coincide on K^2, i.e.,
τ_1^{I_d}(x_{−1}, x_0) = τ_2^{I_d}(x_{−1}, x_0) = f(x_{−1}, x_0), for (x_{−1}, x_0) ∈ {0, 1/2, 1}^2.
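This coincidence is easy to verify numerically (a sketch; under I_d, ∧ is the pointwise minimum, ⊕ is truncated addition, and ¬x = 1 − x):

```python
# Sketch: tau_1 and tau_2 above are different as functions on [0,1]^2 but agree
# on the state set K^2 = {0, 1/2, 1}^2.
from fractions import Fraction
from functools import reduce

def oplus(*args):                # x (+) y = min{1, x + y}, extended to several arguments
    return min(Fraction(1), sum(args))

def neg(x):                      # not x = 1 - x
    return 1 - x

def meet(*args):                 # lattice meet; under I_d this is the pointwise minimum
    return reduce(min, args)

def tau1(xm1, x0):
    return meet(oplus(xm1, xm1), oplus(neg(xm1), neg(xm1)),
                oplus(x0, x0), oplus(neg(x0), neg(x0)))

def tau2(xm1, x0):
    return meet(oplus(xm1, xm1), oplus(neg(xm1), neg(xm1)),
                oplus(x0, x0, x0), oplus(neg(x0), neg(x0), neg(x0)))

K = [Fraction(0), Fraction(1, 2), Fraction(1)]
assert all(tau1(a, b) == tau2(a, b) for a in K for b in K)
# Off K^2 the two term functions differ, e.g. at (1/2, 1/3):
print(tau1(Fraction(1, 2), Fraction(1, 3)), tau2(Fraction(1, 2), Fraction(1, 3)))  # 2/3 1
```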
We shall consider all term functions that coincide with a given CA
transition function on K n to constitute an equivalence class in the sense
of faithfully describing the logical behavior of the underlying CA.
There are two further sources of potential nonuniqueness discussed next.
4.2. Uniqueness properties of DMV formula extraction. We
first comment on a structural property pertaining to the RNN emulat-
ing the CA evolution. Recall the decomposition of Φ in (17). Our construction builds on the idea of having the hidden-state vector store the neighbors of the current input cell, which leads to the specific form of Φh, with the subnetwork Φf responsible for the computation of the output samples. But this separation need not be the only way the RNN
can realize the CA evolution. The resulting ambiguity is, however, eas-
ily eliminated by enforcing the split of Φ according to (17) on the RNN
This shows that even if we restrict ourselves to DNF, the algebraic ex-
pression will not be unique in general. Moreover, we can also express rule
30 in conjunctive normal form (CNF), i.e., as a concatenation of a finite
number of clauses linked by the Boolean ⊙ operation, where each clause
consists of a finite number of variables or negated variables connected by
the Boolean ⊕ operation, e.g.,
or
and
Now, noting that the ReLU network realization of f_30, as expressed in (32), is given by
\widehat{W}_3^{f_{30}} \circ \rho \circ \widehat{W}_2^{f_{30}} \circ \rho \circ \widehat{W}_1^{f_{30}},
with
\widehat{W}_1^{f_{30}}(x_{-1}, x_0, x_1) = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & -1 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_0 \\ x_{-1} \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix},
\widehat{W}_2^{f_{30}}(x) = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} x + \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad x ∈ R^4,
\widehat{W}_3^{f_{30}}(x) = \begin{pmatrix} 1 & -1 \end{pmatrix} x, \quad x ∈ R^2,
we can conclude, by comparing to the network (7) built based on (2),
that different algebraic expressions for f30 lead to different ReLU network
realizations. Yet, these two networks must exhibit identical input-output
relations. Conversely, the observation just made shows that ReLU net-
works can be modified without changing their input-output relations.
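The reconstruction above can be checked numerically (a sketch; rule 30 is f_30(x_{−1}, x_0, x_1) = x_{−1} XOR (x_0 OR x_1) on Boolean inputs):

```python
# Sketch: the ReLU network with the affine maps W1_hat, W2_hat, W3_hat given
# above reproduces elementary CA rule 30 on all Boolean neighborhood states.
from itertools import product

def rho_vec(v):
    return [max(0, t) for t in v]

def W1(xm1, x0, x1):
    rows = [(-1, -1, 1), (1, 1, -1), (-1, 1, -1), (1, -1, -1)]
    bias = (0, -1, 0, 0)
    # The matrix acts on the column vector (x1, x0, x_{-1}).
    return [a * x1 + b * x0 + c * xm1 + d for (a, b, c), d in zip(rows, bias)]

def W2(z):
    s = sum(z)
    return [s, s - 1]

def W3(y):
    return y[0] - y[1]

def network(xm1, x0, x1):
    return W3(rho_vec(W2(rho_vec(W1(xm1, x0, x1)))))

def rule30(xm1, x0, x1):
    return xm1 ^ (x0 | x1)

assert all(network(*x) == rule30(*x) for x in product((0, 1), repeat=3))
print("network reproduces rule 30 on all 8 neighborhood configurations")
```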
4.3. Extracting DMV formulae from trained networks. We now
discuss how DMV formulae can be read out from the network Φf . Recall
that the idea is that Φf was trained on CA evolution data and the ex-
tracted DMV formula describes the logical behavior underlying this CA.
In the Boolean case, with K = {0, 1}, the truth table representing the
CA transition function can be obtained by passing all possible neighbor-
hood state combinations through the trained network Φf and recording
the corresponding outputs. Following, e.g., the procedure in [38, Section
12.2], this truth table can then be turned into a Boolean formula. For
state sets K of cardinality larger than two, we are not aware of systematic
procedures that construct DMV terms from truth tables. The approach
we develop here applies to state sets of arbitrary cardinality and does not
work off truth tables, but rather starts from a (trained) ReLU network Φf
that, by virtue of satisfying the interpolation property (25), realizes a
linear interpolation of the truth table.
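Returning briefly to the Boolean case described above, the truth-table read-out is straightforward to sketch (the helper below is hypothetical; the trained network is represented abstractly as a callable, and the formula is assembled in disjunctive normal form with one conjunction per true row):

```python
# Sketch of the Boolean read-out: query the network on all neighborhood
# configurations and turn the resulting truth table into a DNF formula.
from itertools import product

def extract_boolean_formula(phi_f, n):
    """phi_f: callable {0,1}^n -> {0,1}; returns a DNF expression as a string."""
    clauses = []
    for assignment in product((0, 1), repeat=n):
        if round(phi_f(*assignment)) == 1:
            literals = [f"x{i + 1}" if v else f"(not x{i + 1})"
                        for i, v in enumerate(assignment)]
            clauses.append("(" + " and ".join(literals) + ")")
    return " or ".join(clauses) if clauses else "0"

# Toy stand-in for a trained network: the majority of three Boolean inputs.
majority = lambda a, b, c: int(a + b + c >= 2)
print(extract_boolean_formula(majority, 3))
```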
We start by noting that Φf encodes its underlying DMV formula both
through the network topology and the network weights and biases. Hon-
oring how the network architecture, i.e., layer-by-layer compositions of
affine mappings and the ReLU nonlinearity, affects the extracted DMV
term, we aim to proceed in a decompositional manner and on a node-by-
node basis. It turns out, however, that the individual neurons, in general,
will not correspond to DMV terms. To see this, recall that DMV term
functions map [0, 1]n to [0, 1] and note that the range of the function ρ is
not in [0, 1] in general, e.g., ρ(3x) : [0, 1] → [0, 3]. We will deal with this
matter by transforming the ρ-neurons in Φf into σ-neurons with a suit-
ably chosen σ : R → [0, 1]. This will be done in a manner that preserves
the network’s input-output relation and is reversible in a sense to be made
precise below. We start by defining σ-networks.
We are now ready to describe how the DMV term underlying a given ReLU network with integer weights and biases in Q_k mapping [0, 1]^n to [0, 1] can be extracted. The corresponding algorithm starts by applying Lemma 4.3 to convert the network into a functionally equivalent σ-network
Ψ_f = W_L ◦ σ ◦ · · · ◦ W_2 ◦ σ ◦ W_1,
Ψ_f = σ ◦ W_L ◦ · · · ◦ σ ◦ W_2 ◦ σ ◦ W_1.
6. Implementation available at https://www.mins.ee.ethz.ch/research/downloads/NN2MV.html
N_0 = d, and N_L = d′. The basic idea of the proof is to use the relationship
(48) σ(x) = ρ(x) − ρ(x − 1), for all x ∈ R,
to replace every σ-neuron in Ψ with a pair of ρ-neurons. We start with
σ ◦ W1 to obtain the equivalent network
\Psi^{(1)} = W_L \circ \sigma \circ \cdots \circ \sigma \circ W_2 \circ H_1^{(1)} \circ \rho \circ W_1^{(1)},
where
W_1^{(1)}(x) := \begin{pmatrix} W_1(x) \\ W_1(x) - 1_{N_1} \end{pmatrix}, \quad x ∈ R^{N_0}, \qquad H_1^{(1)}(x) := \begin{pmatrix} I_{N_1} & -I_{N_1} \end{pmatrix} x, \quad x ∈ R^{2N_1},
with 1_{N_1} denoting the N_1-dimensional column vector with all entries equal to 1. It follows directly that Ψ^{(1)} has integer weights and biases in Q_k. Continuing in this manner, we get
\Psi^{(L-1)} = W_L \circ H_{L-1}^{(L-1)} \circ \rho \circ W_{L-1}^{(L-1)} \circ \cdots \circ W_2^{(2)} \circ H_1^{(1)} \circ \rho \circ W_1^{(1)},
with
W_\ell^{(\ell)}(x) = \begin{pmatrix} W_\ell(x) \\ W_\ell(x) - 1_{N_\ell} \end{pmatrix}, \quad x ∈ R^{N_{\ell-1}}, \qquad H_\ell^{(\ell)}(x) = \begin{pmatrix} I_{N_\ell} & -I_{N_\ell} \end{pmatrix} x, \quad x ∈ R^{2N_\ell},
for ℓ = 1, . . . , L − 1, satisfying
Ψ^{(L−1)}(x) = Ψ(x), for all x ∈ R^d.
The proof is concluded upon identifying Ψ^{(L−1)} with Φ and noting that Ψ^{(L−1)} has integer weights and biases in Q_k and L(Ψ^{(L−1)}) = L(Ψ). ⊣
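A small numerical sketch of the mechanism in this proof (the layer sizes and weights below are arbitrary test data, not taken from the paper): every σ-neuron σ(w·x + b) is replaced by the pair ρ(w·x + b), ρ(w·x + b − 1), which the next layer recombines with weights (1, −1), leaving the input-output map unchanged by (48).

```python
# Sketch of the sigma-to-rho replacement (48): sigma(t) = rho(t) - rho(t - 1),
# demonstrated on a single hidden layer with arbitrary integer weights.
def rho(t):
    return max(0.0, t)

def sigma(t):                       # clipped ReLU, mapping R onto [0, 1]
    return min(1.0, max(0.0, t))

W1 = [(2, -1), (-3, 2)]             # arbitrary integer weight rows
b1 = [0.5, 1.0]                     # arbitrary biases
w2, b2 = (1, 1), -0.5               # output layer

def sigma_net(x1, x2):
    h = [sigma(w[0] * x1 + w[1] * x2 + b) for w, b in zip(W1, b1)]
    return w2[0] * h[0] + w2[1] * h[1] + b2

def rho_net(x1, x2):
    # Each sigma-neuron becomes two rho-neurons, recombined with weights (1, -1).
    h = []
    for w, b in zip(W1, b1):
        t = w[0] * x1 + w[1] * x2 + b
        h.append(rho(t) - rho(t - 1))
    return w2[0] * h[0] + w2[1] * h[1] + b2

pts = [(i / 7, j / 7) for i in range(8) for j in range(8)]
assert all(abs(sigma_net(*p) - rho_net(*p)) < 1e-12 for p in pts)
print("sigma-network and expanded rho-network agree on all test points")
```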
7. To keep the exposition simple, we consider the form of Ψ for L ≥ 3; the cases L = 1, 2 are trivially contained in the discussion.
Appendix D.
Lemma D.1. Let n ∈ N and let f_c : [0, 1]^n → [0, 1] be a continuous piecewise linear function with linear pieces
(49) p_j(x_1, . . . , x_n) = m_{j1} x_1 + · · · + m_{jn} x_n + b_j, j = 1, . . . , ℓ,
It hence suffices to prove that f_c(x) ≥ g_π(x), for x ∈ [0, 1]^n, for all π ∈ Σ, to obtain
f_c(x) = max_{π∈Σ} g_π(x), for x ∈ [0, 1]^n,
which establishes the desired result (50). By continuity of f_c and the fact that the min and max of continuous functions are continuous, it suffices to establish f_c ≥ g_π on the set
{x ∈ [0, 1]^n : ∃ π ∈ Σ such that x is in the interior of P_π}.
Now fix an arbitrary π ∈ Σ. As already noted above, g_π(x) = f_c(x), for x ∈ P_π. Take an arbitrary point y ∉ P_π that is in the interior of some P_η. There exists a k ∈ {1, . . . , ℓ} so that f_c(x) = p_{π(k)}(x), for x ∈ P_η. We treat the cases k ≤ i_π and k > i_π separately. First, if k ≤ i_π, then
f_c(y) = p_{π(k)}(y) ≥ min_{i∈{1,...,i_π}} p_{π(i)}(y) = g_π(y).
REFERENCES