EDEN - A Practical, SNARK-friendly Combinator VM and ISA
Abstract
Succinct Non-interactive Arguments of Knowledge (SNARKs) enable a party to cryptographically
prove a statement regarding a computation to another party that has constrained resources. Practical
use of SNARKs often involves a Zero-Knowledge Virtual Machine (zkVM) that receives an input pro-
gram and input data, then generates a SNARK proof of the correct execution of the input program.
Most zkVMs emulate the von Neumann architecture and must prove relations between a program’s
execution and its use of Random Access Memory. However, there are conceptually simpler models of
computation that are naturally modeled in a SNARK yet are still practical for use. Nock is a minimal,
homoiconic combinator function, a Turing-complete instruction set that is practical for general com-
putation, and is notable for its use in Urbit.
We introduce Eden, an Efficient Dyck Encoding of Nock that serves as a practical, SNARK-friendly
combinator function and instruction set architecture. We describe arithmetization techniques and
polynomial equations used to represent the Eden ISA in an Interactive Oracle Proof. Eden provides
the ability to prove statements regarding the execution of any program that compiles down to the
Eden ISA. We present the Eden zkVM, a particular instantiation of Eden as a zk-STARK.
Contents
1 Introduction
  1.1 Background
  1.2 Our Work
2 Proof Systems
  2.1 Interactive Oracle Proofs
  2.2 zk-STARKs
3 Eden ISA
  3.1 Nouns
  3.2 Operators
  3.3 The Eden Function
4 Overview of Eden zkVM Design
  4.1 Eden’s fundamental invariants and operations
  4.2 Tables
5 Arithmetization Fundamentals
  5.1 Field Choice
  5.2 Schwartz-Zippel Lemma
  5.3 Noun Encoding
  5.4 Optimized Noun Encoding
6 Probabilistic Structures and Algorithms
  6.1 Multiset Arguments
  6.2 Stacks in the Eden zkVM
7 Eden zkVM Design
  7.1 Tables
  7.2 Input/Output Linking
  7.3 Extra Commitment Phase
  7.4 Large-Noun Elision
8 Security Analysis
  8.1 Noun Collision
  8.2 Cons & Inverse Cons Validation
References
1 Introduction
1.1 Background
In recent years, work on and interest in zero-knowledge proof systems have grown explosively. These are
protocols in which a prover entity can convince a verifier entity of a computational claim while
keeping certain information secret. Such systems operate on computations that have been encoded, via an
arithmetization process, into the mathematical language of finite fields and polynomials. Given the delicate
and specialized nature of this translation process, these proof systems are largely impractical for ordi-
nary programmers who would ideally specify computation in a familiar language and not in terms of
polynomials. Thus there is a natural demand to find a way to generate proofs without deviating too
far from normal programming practices.
One way of accomplishing this is to target a specific instruction set architecture (ISA) which is the
compilation target of a Turing-complete programming language, and to write polynomial constraints
which represent the execution of an arbitrary program for this ISA. Since modern CPUs physically im-
plement von Neumann architectures, it is natural that efforts to implement this solution have resulted
in ISAs that substantially mirror modern CPU architecture.
However, combinatory logic and the functional programming paradigm provide an alternative math-
ematically-centered perspective on computation which stretches back to the duality between Turing’s
machines and Church’s lambda calculus. Our attachment to functional computation flows from our
enthusiasm for the elegant minimalism of Nock. Nock is an ISA given by approximately a dozen rules
for transforming binary trees, which together define a practical Turing-complete combinator function.
While Nock is expressive enough to be read and understood, code is typically written in higher level
languages that compile down to Nock instead of in Nock directly. Hoon is at present the primary high-
level language that compiles to Nock; Hoon exists as part of the Urbit project.
Natural and minimal arithmetizations provide a conceptually simple substrate to reason about and
have numerous advantages, including efficiency and flexibility. Our uniquely compact and purely
mathematical specification has the potential to be utilized in various existing and future cryptographic
schemes.
1.2 Our Work
1.2.1 Binary Tree Arithmetization
We describe a linearized arithmetic encoding of Eden’s noun data structure, which is based on binary
trees. This involves two vectors, one which describes the tree shape and another which encompasses
the leaf values stored in the tree. For the shape encoding, we utilize the bijection between binary trees
and dyck words, which is motivated by the simplicity of performing tree concatenation in this repre-
sentation. It transpires that dyck words are fundamentally related to depth-first traversal of a tree. The
tree leaves are then placed in the leaf vector in depth-first order. See Section 5.3 for more details.
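To make this concrete, the following minimal Python sketch computes the (dyck-word, leaf-vector) encoding by depth-first traversal; the tuple-based noun representation and names are illustrative, not taken from an Eden implementation.

```python
# A sketch of the linearized noun encoding: nouns are nested Python tuples,
# where an atom is an int and a cell is a pair (left, right).

def encode(noun):
    """Return (dyck_word, leaf_vector) for a noun via depth-first traversal."""
    if not isinstance(noun, tuple):   # an atom: empty shape word, single leaf
        return [], [noun]
    left, right = noun
    left_d, left_l = encode(left)
    right_d, right_l = encode(right)
    # Concatenation rule: T_d = 0 L_d 1 R_d and T_l = L_l R_l.
    return [0] + left_d + [1] + right_d, left_l + right_l

# The noun [0 [[6 20] 1]], used as a running example later in the paper:
dyck, leaves = encode((0, ((6, 20), 1)))
assert dyck == [0, 1, 0, 0, 1, 1]     # the word 010011
assert leaves == [0, 6, 20, 1]        # leaves in depth-first order
```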
2 Proof Systems
Proofs are demonstrations of mathematical or computational facts, e.g. “Assuming axiom system 𝒜,
statement 𝑋 implies statement 𝑌 ,” or “The computation 𝒞 with input 𝑥 outputs 𝑦.”
Traditional mathematical proofs are, ideally, pieces of text of reasonable length, composed by a prover
after some non-deterministic process involving intuition and guesswork. The resulting text can then
be checked by handing it to a verifier who engages in a straightforward check of logical implications
under reasonably limited time constraints. This roughly corresponds to the 𝖭𝖯 computational com-
plexity class of decision problems.
In the last several decades, complexity theorists have been exploring less traditional conceptions of
proof: interactive proofs (IPs), multiprover interactive proofs (MIPs), and probabilistically checkable
proofs (PCPs), as well as variations on and refinements of these models.
At the core of these innovations is a rather more flexible understanding of what proofs are. Tradi-
tionally, a proof purports to offer ironclad guarantees of truth or falsity if one merely examines some
hypotheses and logical deductions. This newer, innovative conception, called a proof system, is any
protocol by which a prover can convince a verifier of a claim, where the meaning of “convincing”
depends on the proof system.
In such proof systems, the prover is not necessarily a trustworthy entity: a prover may manage to
convince the verifier of a false statement, or fail to convince the verifier of a true one. These
possibilities are quantified by the soundness error – the probability that the verifier will
accept a false statement – and the completeness – the probability that the verifier will accept the proof
of a true statement.
2.1 Interactive Oracle Proofs
One of the recently introduced families of proof systems is the interactive oracle proof (IOP) model
of [BCS16]. The IOP model combines aspects of the earlier IP and PCP models. Like the PCP model,
it is expressive enough to characterize the complexity class 𝖭𝖤𝖷𝖯, which contains the 𝖭𝖯 class of
traditional proofs. However, the injection of interactivity from the IP model provides efficiency gains
over the PCP model (in e.g. proof length). This fusion works by allowing the verifier in the interaction
to probabilistically query the prover’s messages.
In an IOP, the prover and verifier interact over a number of rounds. In each round, the verifier sends
the prover a challenge. Then, the prover sends the verifier a set of data in response to the challenge,
which the verifier can partially query in this or later rounds. At the end of an agreed upon number of
rounds, the verifier either accepts or rejects the proof.
The surprise challenges are meant to dissuade a malicious prover from giving in to the temptation to
deceive: there is only a low probability that a falsified response will pass the
verifier’s challenge.
We remark that an IOP can be made into a non-interactive protocol via the Fiat-Shamir heuristic
[FS87]. The Fiat-Shamir heuristic is a transformation which substitutes the verifier’s messages with
the output of a random function – this is a purely theoretical construction called a random oracle; in
practice, it’s a cryptographic hash function – where the input is the totality of previously exchanged
messages. To allow the verifier access to parts of the prover’s messages, this procedure is augmented
to utilize cryptographic commitments to the data. The proof in this non-interactive variant is a collec-
tion of information related to the commitments, and the verifier’s role is to check this information for
validity without additional communication with the prover. For details see [BCS16].
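As a rough illustration of this transformation, the sketch below derives verifier challenges by hashing the totality of previously exchanged messages; SHA-256 stands in for the random oracle, and the class and labels are illustrative.

```python
# A minimal Fiat-Shamir transcript sketch: challenges are hashes of all
# prior prover messages, so no interaction with a verifier is needed.
import hashlib

class Transcript:
    def __init__(self):
        self.data = b""

    def absorb(self, message: bytes):
        # A prover message, e.g. a Merkle-root commitment to some columns.
        self.data += message

    def challenge(self, label: bytes) -> int:
        # The simulated verifier message: a hash of everything so far.
        return int.from_bytes(hashlib.sha256(self.data + label).digest(), "big")

t = Transcript()
t.absorb(b"merkle root of committed base columns")
alpha = t.challenge(b"alpha")   # stands in for a random verifier challenge
```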
2.2 zk-STARKs
A zk-STARK - zero-knowledge Scalable Transparent Argument of Knowledge - is a way to realize an
IOP model with some very desirable properties. Before discussing these features it’s useful to motivate
the utility of this system.
Imagine that a verifier wants some computation performed but does not have the resources to do the
computation itself. It would like to outsource the computation to a prover who has more resources but
may be untrusted by the verifier. The verifier would want some procedure to check the work of the
prover that takes vastly less time than running the computation and can be accomplished by the prover
in a practical amount of time. This is what a zk-STARK provides alongside the following guarantees:
1. Scalability: Verifying the computation takes exponentially less time than running it, and proving
the computation incurs only marginal overhead relative to running it.
2. Transparency: The verifier only has to trust hash functions and mathematics, not the prover or
anyone else.
3. Perfect completeness: The verifier is always convinced by the protocol if the prover is honest.
4. High soundness: The verifier is only convinced by a malicious prover with astronomically low
odds.
5. Zero-knowledge: The proof can hide information that the verifier wants to keep private, while
still proving statements about this information.
A zk-STARK uses a technique called arithmetization to encode a computation. Computation is mod-
eled as a sequence of steps, where at each step we update a fixed finite amount of information according
to a set of rules. The information recorded at each step is arranged in a horizontal row, and the se-
quential ordering of the steps is reflected by stacking the rows vertically into a two-dimensional table.
For example, imagine you’re at the bank to cash several checks, and are filling out a deposit slip. You
record the check amounts in the appropriate column, and have to compute the total amount. To avoid
having to do the entire computation in your head, you can make a new column next to the given one
in the margin which keeps a running total. See Table 1. The first entry in this column is just the first
check’s amount. To compute the second entry of the running total, add the second entry of the check
amount column to the first entry of the running total column. In general, to compute the 𝑛th running
total, add the 𝑛th check amount to the (𝑛 − 1)st running total. The equation implicit in this rule applies
at every pair of rows and describes the correct execution of the computation; it is called a constraint,
an important concept in zk-STARKs. If every step is computed correctly, then the last running total is
the total amount.
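The running-total rule translates directly into a transition constraint over adjacent rows, as in this small sketch (the check amounts are made up):

```python
# Two-column trace for the deposit-slip example: one column of check
# amounts, one prover-computed running total.
checks = [25, 40, 10, 5]

totals, acc = [], 0
for c in checks:
    acc += c
    totals.append(acc)

# Boundary constraint on the first row, transition constraint on each pair:
# total[n] - total[n-1] - check[n] = 0.
assert totals[0] == checks[0]
for n in range(1, len(checks)):
    assert totals[n] - totals[n - 1] - checks[n] == 0
```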
However, even a single error at one step of the computation can wildly affect the intended output. The verifier would need some way to magnify an
error in a single location so that it’s noticeable.
Since the discovery of Reed-Solomon codes, it is well-known that polynomials are good at this kind
of error detection, which one can see in the following way. Suppose we fit the simplest polynomial
possible to some data points over a particular input domain. If we corrupt the data by changing even
a single one of the data points, then either the polynomial will look radically different outside of the
input domain, or its degree – the largest exponent of a variable needed to express the polynomial –
will potentially differ from the original polynomial.
The zk-STARK protocol takes advantage of this property by (a) requiring that constraints be repre-
sented as polynomials and (b) requiring that table data be interpolated into polynomials. This is done
so that verification (i.e., applying constraints to the computation data) becomes equivalent to polyno-
mial composition, which in turn yields a final polynomial that can be subjected to the error detection
process. It is in this way that deceit by the prover is magnified and made visible to the verifier.
The techniques of arithmetization and polynomial error correction sit at the heart of the zk-STARK
protocol. In a zk-STARK, a prover and verifier engage in one or more rounds of roughly four distinct
phases: commitment, challenge, response, and Low Degree Testing (LDT).
• In a commitment phase, the prover interpolates polynomials over various columns of the trace
tables, evaluates these polynomials over a large input domain, and constructs a Merkle tree of the
evaluations. The root of the Merkle tree is sent down the proof stream, serving as the commitment
of the polynomial.
• In a challenge phase, the prover receives random challenges from the verifier.
• In a response phase, the prover provides a direct response to random challenges that are specifically
used to reveal evaluations of parts of the committed polynomials.
• In an LDT phase, the verifier checks that one or more polynomials are of low degree.
Specifically, the phases are ordered in the following way:
(1) Base Commitment - The prover populates the base columns of the tables and commits to them
in the proof stream. Base columns are columns that do not require randomness to compute.
(2) Extension Challenge - The verifier sends the first round of random challenges.
(3) Extension Commitment - The prover uses the randomness to compute extension columns which
are interpolated and then committed.
(4) Quotient Commitment - The prover generates and commits to quotient polynomials, which en-
code information about the evaluation of constraints on the data.
(5) Combination Commitment - The prover creates and commits to the nonlinear combination code-
word polynomial. This polynomial is used as a means to mask information from the verifier and
enables the system to provide zero-knowledge properties.
(6) Evaluation Challenge - The prover receives random challenges, which are interpreted as requested
indices to be opened in the Merkle trees committed thus far.
(7) Evaluation Response - The prover provides the requested leaves from the Merkle trees through a
series of openings. Using the revealed values, the verifier performs a number of consistency checks
on the polynomials that have been committed to ensure that they are properly linked to one an-
other.
(8) LDT - The verifier runs the FRI protocol on the decommitment of the nonlinear combination code-
word to check that it is of sufficiently low degree; if it is, the verifier accepts the prover’s work.
Note that the random challenges sent by the verifier serve distinct purposes depending on the stage
of the proof: in (2), they are used primarily by the prover as a means of computing additional columns
that make use of randomness, while in (6), they are used by the verifier to actually request data from
the prover.
To make our system sound, we augment the sketch provided above with an extra round of the Chal-
lenge/Commitment phases described by (2) and (3). This extra round prevents a potentially malicious
prover from taking advantage of an extra degree of freedom that could otherwise be exploited.
While an in-depth explanation of the zk-STARK protocol is out of scope, we urge the reader to refer
to [Ben+18] or [KR08] for further information.
In order to use a zk-STARK to prove arbitrary computation, we need both a suitable Turing-complete
model of computation, and a way to encode this model of computation in terms of finite field elements.
Therefore, we will proceed to describe Eden, our chosen model of computation, and its arithmetization.
3 Eden ISA
The Eden ISA is a specification of a Turing-complete combinator function, comprising a minimalist
set of reduction rules that map a data-code pair to a product. All data and code are represented using
the same data type, the noun, which makes Eden a homoiconic language. Eden is a minimally modified
version of Nock that is optimized for zkVM performance; to describe the modifications tersely, Eden
replaces natural number atoms with finite field atoms, and replaces regular increment with finite field
increment. What follows is an informal introduction to the Eden specification. For a more complete
and detailed introduction, see the Nock specification in [Noc23].
3.1 Nouns
A noun is defined as either an atom, which is a finite field element, or a cell, which is an ordered pair
of nouns.
This definition corresponds in a straightforward way to a full rooted binary tree whose leaves are finite
field elements, like so:
•
/ \
0 •
/ \
• 1
/ \
6 20
Full means each tree node has either no children – the node is an atom – or two children – the node
is a cell. Rooted means the tree has a distinguished node which is singled out as the top of the tree.
From now on, we will drop these two adjectives and just use the term binary tree.
We write [a b] to denote a cell that consists of two nouns, a and b. In a cell [a b], we say that the
head of the cell is a and that the tail is b. We also say that a is the left subtree of [a b] and that b is
the right subtree of [a b].
The expression [a [b c]] is a way to write a cell whose head is the noun a and tail is a cell of nouns
b and c. As an example, the noun pictured above is denoted as [0 [[6 20] 1]].
Note well that Eden is implicitly right-associative. For example, the preferred way to write the ex-
pression [a [b c]] is [a b c]; the b and c to the right of a are assumed to associate into a cell. Similarly,
we have [a b c d] = [a [b c d]] = [a [b [c d]]], etc. As a concrete example, the preferred way
to write [0 [[6 20] 1]] is [0 [6 20] 1]. Note that one cannot drop any more brackets to reduce
this to [0 6 20 1]; the latter is equivalent, by convention, to [0 [6 [20 1]]], which looks like this
picture:
•
/ \
0 •
/ \
6 •
/ \
20 1
▪ /(2a b) will reduce to /(2 /(a b)) which will then be further evaluated until it reaches one
of the base cases.
▪ /((2a + 1) b) will reduce to /(3 /(a b)) which will then be further evaluated until it reaches
one of the base cases.
• the # operator functions recursively like the / operator, but rather than accessing a subtree, it mod-
ifies a subtree at a particular axis.
The slot operator is essentially a numerical addressing scheme for subtrees of a tree. The full tree gets
address 1; if a subtree has address a, its left and right subtrees get addresses 2a and 2a+1, respectively.
Thus, moving to the left child of a node with address a adds 0 to the binary expansion of a, and moving
to the right adds 1. This gives a straightforward way of translating a subtree address into a sequence of
left/right moves from the root: convert the address to binary; ignore the leading 1, which corresponds
to the tree’s root; the rest of the digits, read left-to-right, correspond to left/right moves according to
whether they are 0/1. This can also be used in reverse to turn a sequence of left/right moves into an
address. Note that the slot operator accepts an axis atom as input and reduces it recursively: by the
rules above, the axis is repeatedly halved, descending one step further into the subject each time,
until it reaches the base case of 1.
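The binary-expansion scheme is easy to express in code. The following sketch (tuple nouns, illustrative names) resolves an axis into a subtree:

```python
# Slot addressing: drop the leading 1 of the axis's binary expansion and
# read the remaining bits as left (0) / right (1) moves from the root.

def slot(axis: int, noun):
    """Return the subtree of `noun` at the given axis (axis >= 1)."""
    assert axis >= 1
    for bit in bin(axis)[3:]:    # bin(6) == '0b110'; skip '0b' and the leading 1
        left, right = noun       # fails (crashes) if asked to descend into an atom
        noun = left if bit == "0" else right
    return noun

tree = (0, ((6, 20), 1))         # the noun [0 [[6 20] 1]]
assert slot(1, tree) == tree     # axis 1 addresses the whole tree
assert slot(2, tree) == 0        # axis 2: the head
assert slot(6, tree) == (6, 20)  # 6 = 0b110: go right, then left
```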
3.3 The Eden Function
We define the Eden function as a function that takes an input noun, and returns an output noun or
crashes. Input is formatted as a cell of [subject formula], where formula is the code that is to be
executed and subject is the data that the formula can reference, which can contain code itself owing
to Eden’s homoiconicity. The Eden function is defined below:
eden(x) may be written as *x.
Denote “evaluates to” by ↦.
A valid Eden formula is always a cell.
If the head of the formula is a cell, Eden treats both head and tail as formulas, evaluates each against
the subject, and produces the cell of their products. That is,
• *[a [b c] d] ↦ [*[a b c] *[a d]].
In this case, we say as shorthand that the formula is cons (which is related to, but not the same as,
the cons operation).
If the head of the formula is an atom, it acts as a numeric opcode and must lie in the range 0
to 11; any formula whose head is an atom greater than 11 is invalid. Invalid Eden instructions result in a
crash.
In the following transformation rules, a is a subject and the following cell is the formula:
• *[a 0 b] ↦ /(b a)
• *[a 1 b] ↦ b
• *[a 2 b c] ↦ *[*[a b] *[a c]].
• *[a 3 b] ↦ ?*[a b]
• *[a 4 b] ↦ +*[a b]
• *[a 5 b c] ↦ =[*[a b] *[a c]]
• *[a 6 b c d] ↦ *[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]
• *[a 7 b c] ↦ *[*[a b] c]
• *[a 8 b c] ↦ *[[*[a b] a] c]
• *[a 9 b c] ↦ *[*[a c] 2 [0 1] 0 b]
• *[a 10 [b c] d] ↦ #[b *[a c] *[a d]]
• *[a 11 b c] ↦ *[a c]
The fundamental opcodes are 0-5, and 6-10 are essentially macros of 0-5. The fundamental opcodes
can be divided into two categories: recursive and non-recursive. The non-recursive formulas are those
with opcodes 0 or 1, as their evaluations don’t involve another invocation of the Eden function, while
the recursive formulas are 2, 3, 4, and 5. Eden 11 is a formalism for dealing with extension instructions,
or jets, which replace a formally specified computation with a faster version thereof.
When dealing with recursive formulas, the tail is either a noun b or a cell of nouns [b c], and these
nouns b and c are used as formulas themselves in defining the evaluations. We refer to them as sub-
formulas of the outer formula.
Note that the evaluation of these subformulas against the subject must be computed in order to com-
pute the original invocation of the Eden function; that is, *[a b] and/or *[a c] must be eagerly
computed prior to the outer computation.
We observe that opcode 2 stands out since it involves three invocations of the Eden function * in its
evaluation. The first subformula b is evaluated against the subject a and generates a new subject, while
the second subformula c generates a new formula. Then, the new formula is evaluated against the new
subject as follows: *[new-subject new-formula].
For example, in the Eden expression *[[0 4 0 1] 2 [0 2] 0 3], the subject is [0 4 0 1] and the
formula is [2 [0 2] 0 3]. Thus to compute this expression, we compute a new subject via *[[0 4 0
1] 0 2] = 0 and a new formula via *[[0 4 0 1] 0 3] = [4 0 1]. To complete the computation, we
have to apply * one last time with this new subject and formula to find
*[[0 4 0 1] 2 [0 2] 0 3] = *[0 4 0 1] = +*[0 0 1] = +0 = 1.
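To make the reduction rules concrete, here is a minimal, illustrative interpreter for the cons rule and the fundamental opcodes 0-5 (the macros 6-10 and opcode 11 are omitted); it sketches the semantics above and is not the Eden zkVM's evaluator. The modulus P is the base field of Section 5.1, used for the field increment of opcode 4.

```python
# A sketch interpreter for Eden's cons rule and opcodes 0-5 over tuple nouns.
P = 2**64 - 2**32 + 1   # base field modulus (see Section 5.1)

def slot(axis, noun):
    for bit in bin(axis)[3:]:
        noun = noun[0] if bit == "0" else noun[1]
    return noun

def eden(subject, formula):
    head, tail = formula
    if isinstance(head, tuple):              # cons: *[a [b c] d]
        return (eden(subject, head), eden(subject, tail))
    if head == 0:                            # slot: *[a 0 b]
        return slot(tail, subject)
    if head == 1:                            # constant: *[a 1 b]
        return tail
    if head == 2:                            # evaluate: *[a 2 b c]
        b, c = tail
        return eden(eden(subject, b), eden(subject, c))
    if head == 3:                            # cell test (0 = yes)
        return 0 if isinstance(eden(subject, tail), tuple) else 1
    if head == 4:                            # field increment
        return (eden(subject, tail) + 1) % P
    if head == 5:                            # equality test (0 = equal)
        b, c = tail
        return 0 if eden(subject, b) == eden(subject, c) else 1
    raise Exception("crash")                 # invalid formula

# The worked example above: *[[0 4 0 1] 2 [0 2] 0 3] = 1.
assert eden((0, (4, (0, 1))), (2, ((0, 2), (0, 3)))) == 1
```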
3. If there are any subformulas, we execute them eagerly. Otherwise, we execute the decoded outer
formula.
The following is a complete list of operations which are performed in the process of executing Eden:
• the identity operation
• increment
• equality checking
• binary tree concatenation (cons)
• binary tree decomposition (inverse cons)
• subtree access (iterated tree decomposition)
• new recursive invocations of Eden
Of these, only the operations involving cons require something beyond standard logical or arithmetic
primitives.
From the description above, in order to capture the computational invariants of an Eden computation,
it is sufficient to have the following:
1. a formalism for manipulating nouns;
2. a means to inspect and decompose formulas;
3. a way to manage subcomputation order, and a way to store intermediate results for completing
earlier computations;
4. a way to perform cons, inverse cons, and subtree accesses within our noun encoding.
It is based on these high-level invariants that we motivate the design of our zkVM and its subsystems.
Since the design is ultimately divided up into various tables, we now briefly describe each table and its
purpose.
4.2 Tables
For conceptual simplicity, our design involves multiple tables. Roughly, each table specializes in one
of the core aspects of a generic Eden computation that we enumerated above. The role of each table
is to use its constraints to disallow traces that violate Eden’s invariants.
We now describe the core functionality of each table and how it relates to Eden’s invariants.
Stack Table
The stack table is the central table in our design. It models Eden computation as a stack machine which
directs the flow of computation according to the invariants of the Eden specification. Accordingly,
several other tables are its direct subsidiaries. The table makes use of various stacks as a means to keep
track of subcomputation ordering.
Noun Table
All data expressed in proof tables must be encoded in terms of finite field elements, and nouns are
no exception. The noun table’s role is to store encoded nouns for later reference by other parts of the
zkVM, and to prevent any improperly encoded nouns from being used in a computation. The table’s
design contains critical optimizations that allow us to validate nouns using our efficient linearized tree
representation.
Subtree Access Table
The subtree access table performs Eden computations whose formulas have opcode 0 for the stack
table, according to the semantics of the Eden specification for the / operator. Because the subject rep-
resents the totality of data accessible to the program, one can consider subtree access in Eden as an
analogue of memory access found in traditional systems.
Decoder Table
The role of the decoder table is to serve the stack table with a means to inspect and decompose the for-
mulas encountered throughout the computation, check that the formula nouns are correctly formed,
and expose information such as the opcode and subformulas of the formula. Because of the tight cou-
pling between the functionality that this table provides and what the stack table requires, it exists as
a virtual table alongside the stack table.
Exponent Table
In order to perform cons and inverse cons correctly in our noun encoding, we require the computation
of 𝑥^𝑦 for some quantities 𝑥 and 𝑦 that depend on the nouns being operated on. In practice, calculating
𝑥^𝑦 is non-trivial, since 𝑥^𝑦 is not a polynomial quantity and ends up being quite costly. Thus, in order
to amortize the costs, we batch computation of these exponentials into a single table. The exponent
table’s role is to compute these quantities correctly and share the results with other tables.
Pop Table
The pop table is a technical necessity that is required to ensure the validity of the pop operation on
the various stacks used in the system, and is thus a subsidiary of the stack table. More detail about the
stack data structure can be found in Section 6.2.
5 Arithmetization Fundamentals
5.1 Field Choice
We utilize the field 𝔽_𝑝 with prime cardinality 𝑝 = 2^64 − 2^32 + 1 as our base field. This is the field we
use to populate the cells in the tables before any randomness is generated. This field is widely referred
to as the “Goldilocks field”, despite the fact that the “Goldilocks” label was originally attached to the
prime 2^448 − 2^224 − 1 in [Ham15]. Much has been written about the virtues of this particular field, see
e.g. [Por22], [Blo23], [Gou21]; they include fast arithmetic mod 𝑝, as well as support for efficient 32-
bit arithmetic and fast Fourier transforms.
The main drawback of this field is its small size in relation to achieving our desired proof soundness.
Thus, we use the (unique up to isomorphism) degree-3 extension 𝔽 of 𝔽_𝑝. As a set this is 𝔽 = 𝔽_𝑝^3; note that
|𝔽| ≈ 2^192. Additive operations are performed component-wise according to the addition in 𝔽_𝑝, and
multiplicative operations derive from 𝔽_𝑝[𝑥], modulo an irreducible cubic polynomial. Theoretically,
any such polynomial gives rise to the same field up to isomorphism, but we use the depressed cubic
𝑥^3 − 𝑥 + 1 because its coefficients are simple; irreducibility was verified in Sage.
Whenever we use the symbol 𝔽 in the context of discussing our zkVM, 𝔽 will denote this degree 3
extension.
5.2 Schwartz-Zippel Lemma
The main technical tool for proving the soundness of certain arguments throughout is the following:
Lemma (Schwartz-Zippel): Let 𝑓(𝑥_1, …, 𝑥_𝑘) be a nonzero multivariable polynomial with coefficients
in a field 𝕂, let 𝑑 be the total degree of 𝑓, and let 𝑆 be a finite subset of 𝕂. Then the number of zeroes of 𝑓
in 𝑆^𝑘 is at most

$$d \cdot |S|^{k-1},$$

so the probability that 𝑓(𝑟_1, …, 𝑟_𝑘) = 0, where 𝑟_1, …, 𝑟_𝑘 are drawn randomly and uniformly from 𝑆, is
at most

$$\frac{d \cdot |S|^{k-1}}{|S|^k} = \frac{d}{|S|}.$$
This lemma generalizes the elementary fact that the number of roots of a univariate polynomial over
a field cannot exceed its degree; the proof is an uncomplicated induction from this fact.
A weaker version of the bound in this result was proved by Zippel in [Zip79]; in fact, he was bested by
a year by DeMillo & Lipton in [DL78], whose names are sometimes attached to the result. The strong
version given above was proved later by Schwartz in [Sch80].
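As a quick empirical illustration of the bound (parameters chosen arbitrarily), the nonzero polynomial f(x, y) = xy − 1 over 𝔽_97 has total degree 2, so it should vanish at a uniformly random point with probability at most 2/97:

```python
# Monte Carlo check of the Schwartz-Zippel bound for f(x, y) = x*y - 1.
import random

p, trials = 97, 200_000
zeros = sum(
    1
    for _ in range(trials)
    if (random.randrange(p) * random.randrange(p) - 1) % p == 0
)
# Exact zero rate is 96 / 97**2 (one y per nonzero x), below the 2/97 bound.
print(zeros / trials, "<=", 2 / p)
```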
5.3 Noun Encoding
Given that nouns are the fundamental data structure in Eden and we intend to prove Eden computa-
tions in the zk-STARK framework, our aim turns to the arithmetization of nouns.
Nouns have two distinct components: the shape of the tree, and the leaf data.
The fundamental connection between tree shape, leaf data ordering, and our arithmetization is depth-
first traversal, which we briefly review.
5.3.1 Depth-First Traversal
Depth-first traversal is a fundamental algorithm for ordering the nodes of a binary tree. It can be de-
scribed recursively in the following way: Add the current node to the list of nodes. Move to the left
child node and do a depth-first traversal. Then move to the right child node and do a depth-first tra-
versal.
As an example, here are the successive stages of the example noun we introduced above being ordered
according to the depth-first algorithm:
[Figure: successive stages of the depth-first traversal of the noun [0 [[6 20] 1]], visiting the nodes in the order: root, 0, [[6 20] 1], [6 20], 6, 20, 1.]
If the noun is merely an atom, then the word describing the shape is empty and we denote this with
the symbol 𝜖.
We note by Section 3.1.1 that if the length of the vector component of the noun encoding is 𝑛 then
the binary word component has length 2𝑛 − 2, since the letters of the word correspond to the edges
in the tree.
We call a word that describes the shape of a noun the dyck word of the noun and the vector contain-
ing the leaf data the leaf vector of the noun. We now turn to a deeper explanation of the various
properties of dyck words.
If 𝑇 = [𝐿 𝑅] is a cell with left subtree 𝐿 and right subtree 𝑅, then the dyck word and leaf vector of 𝑇 satisfy

$$T_d = 0\,L_d\,1\,R_d \qquad (1)$$

$$T_l = L_l R_l \qquad (2)$$
These follow immediately from the recursive description of depth-first traversal. To see (1), note that
to depth-first traverse a tree (𝑇𝑑 ), we first move to the left subtree (0), then depth-first traverse the left
subtree (𝐿𝑑 ), then move to the right subtree (1), then depth-first traverse the right subtree (𝑅𝑑 ). To see
(2), use the same recursive description of depth-first traversal: since we traverse 𝑇 by first traversing
𝐿 in depth-first traversal order, the left subtree’s leaves will be first in the list of 𝑇 ’s leaves, and then
we traverse 𝑅 and find its leaves in depth-first order.
Relation (1) was in fact our main motivation for anchoring our arithmetization to depth-first traversal,
as opposed to other potential schemes, since it is intimately related to the fundamental tree operations
cons and inverse cons, and it is imperative that these operations can be efficiently encoded. Relying
on a dyck word-based encoding allows us to unlock performance characteristics that would otherwise
not be possible under a scheme utilizing hash-consing.
We’ve seen that a binary tree gives rise to a word that satisfies (1). If we flip (1) on its head and use it
to define a language we obtain the dyck language of dyck words:
A dyck word 𝑤 over the alphabet {0, 1} is either the empty word 𝜖 or has the form
𝑤 = 0𝑥1𝑦 (3)
where 𝑥 and 𝑦 are dyck words.
We point out that, a priori, the set of dyck words could be larger than the set of dyck words of binary
trees. However, it’s easy to convince oneself that this is not the case. First, the simplest dyck word is 𝜖
and this is the dyck word of the trivial tree with one node. We can then argue inductively from (3): if
a set of dyck words {𝑥_𝑖} are the dyck words of some binary trees, i.e. (𝑇_𝑖)_𝑑 = 𝑥_𝑖, then each word in
the set of all words 0𝑥_𝑖 1𝑥_𝑗 is the dyck word of a tree, namely [𝑇_𝑖 𝑇_𝑗]_𝑑 = 0𝑥_𝑖 1𝑥_𝑗. But according to the
definition, every dyck word is generated inductively from the single word 𝜖 via relation (3), so every
dyck word represents the shape of a binary tree.
Now, we present a nice theorem which characterizes how to recognize a dyck word nonrecursively
within {0, 1}∗ by merely reading it left-to-right, which will become relevant when we write constraints
that recognize dyck words.
Theorem: The set of dyck words over the alphabet {0, 1} is characterized by two properties:
1. While reading left-to-right, the number of 0’s encountered so-far is at least as large as the number
of 1’s.
2. The total numbers of 0’s and 1’s are equal.
If 𝑤 = 𝑤_0 … 𝑤_{𝑛−1}, then these properties are equivalent to the constraints
1. $\sum_{i=0}^{j} (1 - 2w_i) \ge 0$ for $j = 0, \dots, n-1$;
2. $\sum_{i=0}^{n-1} (1 - 2w_i) = 0$.
The fact that a dyck word has these properties is easy to verify by induction from the definition, and
the fact that the empty word has these properties. The converse can also be proven by induction, but
is a bit more involved.
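In code, the characterization amounts to tracking a running count, as in this small sketch:

```python
# Recognize dyck words by the theorem's two properties: the running sum of
# (1 - 2*w_i) never goes negative and ends at exactly zero.

def is_dyck_word(word) -> bool:
    ct = 0
    for w in word:
        ct += 1 - 2 * w          # +1 for a 0, -1 for a 1
        if ct < 0:               # property 1 violated
            return False
    return ct == 0               # property 2

assert is_dyck_word([0, 1, 0, 0, 1, 1])    # shape of [0 [[6 20] 1]]
assert not is_dyck_word([0, 1, 1, 0])      # the count dips below zero
assert not is_dyck_word([0, 0, 1])         # unequal numbers of 0's and 1's
```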
5.4 Optimized Noun Encoding
While the encoding of nouns as (dyck-word, leaf-vector) pairs is a noteworthy and necessary step
on the road to arithmetization, it is insufficient by itself. This is due to the cost within our zkVM of
any manipulations of the literals in the undoubtedly large (dyck-word, leaf-vector) pairs that would
occur in any realistic Eden computation.
What is called for is a compression mechanism that would make the representation of nouns more
succinct, but without compromising the ability to perform the fundamental tree operation, cons.
To make the representation compact, we map words and vectors to polynomials via

$$(a_0, \dots, a_k) \mapsto a_0 x^k + a_1 x^{k-1} + \dots + a_k,$$

where (𝑎_0, …, 𝑎_𝑘) is any finite ordered tuple of field elements. If the tuple is empty, the convention is
that we map to the zero polynomial.
If 𝑇 is a noun, we will write 𝑇𝑑 (𝑥) and 𝑇𝑙 (𝑥) for the dyck and leaf polynomials, respectively, and
len(𝑇 ) for the length.
For example, if the noun is [0 [6 20] 1], the encoding is (4, 𝑥^4 + 𝑥 + 1, 6𝑥^2 + 20𝑥 + 1). On the
other hand, the encoding of an atom a is (1, 0, 𝑎).
The len component is the length of the tree, which we define to be the number of leaf nodes; i.e.
len = length(leaf-vector). Recall from the theorem of Section 3.1.1 that the number of leaf nodes of
a tree determines the number of total nodes and the number of edges. Through this, len implicitly
encodes information about the length of the dyck word since it directly corresponds to the number of
edges, thus making it a comprehensive metric for the size of the tree.
The len component is also convenient for restoring any leading 0’s which become invisible when the
dyck word and leaf vector are converted into polynomials; we will point out other ways in which it is
useful later. However, including the length in the encoding is not strictly necessary. Briefly, one can
always restore the correct number of 0’s to recover the dyck word from the dyck polynomial since the
number of 0’s has to be equal to the number of 1’s. Then, the length of the leaf vector can be recovered
from the length of the dyck word, which can be used to restore any missing 0’s to the leaf vector from
the leaf polynomial.
$$T_d(x) = 0 \cdot x^{2\operatorname{len}(L) + 2\operatorname{len}(R) - 3} + x^{2\operatorname{len}(R) - 1} L_d(x) + 1 \cdot x^{2\operatorname{len}(R) - 2} + R_d(x) \qquad (5)$$

$$T_l(x) = x^{\operatorname{len}(R)} L_l(x) + R_l(x) \qquad (6)$$

The powers of 𝑥 are merely to shift the different components of the original cons formulas so there is
no additive “mixing” of the word/vector literals.
We also have the straightforward identity
len(𝑇 ) = len(𝑅) + len(𝐿) (7)
Now we present an algorithm for performing inverse cons (that is, the computation of 𝐿 and 𝑅 from
𝑇 ): Simply write down (len(𝐿), 𝐿𝑑 (𝑥), 𝐿𝑙 (𝑥)) and (len(𝑅), 𝑅𝑑 (𝑥), 𝑅𝑙 (𝑥)) and check that (5), (6), and
(7) hold for the given 𝑇 ! This is a nondeterministic algorithm: we don’t worry about how 𝐿 or 𝑅 were
produced, we only care that we can efficiently check that they are correct. This is useful in an IOP
context precisely because traces are computed after a program has been fully executed, allowing a
prover to have access to such non-deterministic information.
Suppose a noun has length 𝜆, dyck word 𝑤_0 … 𝑤_{2𝜆−3}, and leaf vector (𝑙_0, …, 𝑙_{𝜆−1}), so that
(𝜆, 𝑇_𝑑(𝑥), 𝑇_𝑙(𝑥)) is the polynomial representation of the noun. Then we fingerprint the dyck and leaf
polynomials by evaluating them at the 𝛼_𝑖, producing the field elements (felts) that comprise the ION
fingerprint:

$$(\operatorname{len}, \text{dyck-felt}, \text{leaf-felt}) = \left(\lambda,\ \sum_{i=0}^{2\lambda-3} w_i \alpha_1^{2\lambda-3-i},\ \sum_{i=0}^{\lambda-1} l_i \alpha_2^{\lambda-1-i}\right) \qquad (8)$$
This fingerprinting scheme is homomorphic because for any felt 𝛼, the evaluation map 𝑓(𝑥) ↦ 𝑓(𝛼)
is a ring homomorphism 𝔽[𝑥] → 𝔽. We will make use of this property in the following section.
An advantage of including the length in the noun representation emerges here for security: the only
possibility for collisions in this representation can occur between nouns of the same length. Since
distinct polynomials evaluated on a random input will almost certainly give different outputs over a
large domain, the probability that two different nouns have the same length, dyck felt, and leaf felt is
negligible; see Section 8.1 for a more involved analysis.
5.4.4 Cons in the Felt Triple Representation
Since the polynomial cons relations are polynomial identities, these can, with high probability, be
validated by checking that each side has identical evaluations on random field elements. Thus, if
len(𝑇 ) = len(𝑅) + len(𝐿) and the relations

$$T_d(\alpha_1) = 0 \cdot \alpha_1^{2\operatorname{len}(L) + 2\operatorname{len}(R) - 3} + \alpha_1^{2\operatorname{len}(R) - 1} L_d(\alpha_1) + 1 \cdot \alpha_1^{2\operatorname{len}(R) - 2} + R_d(\alpha_1) \qquad (9)$$

$$T_l(\alpha_2) = \alpha_2^{\operatorname{len}(R)} L_l(\alpha_2) + R_l(\alpha_2) \qquad (10)$$

hold, then it is overwhelmingly likely that 𝑇 = cons(𝐿, 𝑅). Note that, theoretically, the cons relations (9)
and (10) have the same form as the relations (5) and (6) precisely because the evaluation map is a ring
homomorphism.
For performing inverse cons in this representation (outlined in Section 5.4.2), we can use the same
nondeterministic algorithm we employed in the polynomial encoding with the 𝑥 variables replaced by
the appropriate 𝛼𝑖 .
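The following sketch puts the pieces together: it fingerprints tuple-encoded nouns at random α₁, α₂ and validates a cons by checking relations (9) and (10). The representation and helper names are ours, for illustration only.

```python
# Validating cons on ION-style felt triples (len, dyck-felt, leaf-felt).
import random

P = 2**64 - 2**32 + 1    # base field modulus; a full implementation would
                         # draw the alphas from the degree-3 extension field

def encode(noun):
    if not isinstance(noun, tuple):
        return [], [noun]
    ld, ll = encode(noun[0])
    rd, rl = encode(noun[1])
    return [0] + ld + [1] + rd, ll + rl

def fingerprint(noun, a1, a2):
    dyck, leaves = encode(noun)
    d = l = 0
    for w in dyck:
        d = (d * a1 + w) % P        # Horner evaluation of T_d at alpha_1
    for leaf in leaves:
        l = (l * a2 + leaf) % P     # Horner evaluation of T_l at alpha_2
    return len(leaves), d, l

def check_cons(T, L, R, a1, a2):
    tlen, td, tl = T
    llen, ld, ll = L
    rlen, rd, rl = R
    ok_len = tlen == llen + rlen
    # Relation (9); the leading 0 * alpha_1^(...) term vanishes.
    ok_d = td == (pow(a1, 2 * rlen - 1, P) * ld + pow(a1, 2 * rlen - 2, P) + rd) % P
    # Relation (10).
    ok_l = tl == (pow(a2, rlen, P) * ll + rl) % P
    return ok_len and ok_d and ok_l

a1, a2 = random.randrange(P), random.randrange(P)
L, R = 0, ((6, 20), 1)
assert check_cons(fingerprint((L, R), a1, a2),
                  fingerprint(L, a1, a2), fingerprint(R, a1, a2), a1, a2)
```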
For the soundness of this procedure, which uses the Schwartz-Zippel lemma in the multivariate linear
case, see [Sze].
Here, 1 means we perform the operation in that column. In order for this to be correct, we must have
𝑖3 = 𝑖2 , 𝑖6 = 𝑖5 , and 𝑖7 = 𝑖4 . Assuming this to be the case, we can show the state of the stack as a
polynomial in each row after the operation in that row has been performed:
while pop is

$$\operatorname{pop}(i, s(x)) = \frac{s(x) - i}{x}, \qquad (12)$$

provided that the constant term of 𝑠(𝑥) is 𝑖, i.e. 𝑠(0) = 𝑖.
Obviously, the pop operation is more difficult to verify, given the condition on its evaluation. Note that
𝑠(0) = 𝑖 if and only if there is a polynomial 𝑠̂(𝑥) such that 𝑠(𝑥) = 𝑥 · 𝑠̂(𝑥) + 𝑖, and that 𝑠̂(𝑥) is unique
and is precisely pop(𝑖, 𝑠(𝑥)).
Thus, we can reconsider pop as a nondeterministic algorithm that returns the new state of the stack
and the item popped:

$$\operatorname{pop}(s(x)) = (\hat{s}(x), i)$$
Let 𝑣′ be the variable which denotes the value of the variable 𝑣 in the next row. Whenever pop′ is 1,
we imagine there to be a constraint poly = 𝑥 · 𝑠̂ + item′, which guarantees that item′ can be
popped and that 𝑠̂ is the next stack state; thus we also imagine a constraint forcing poly′ = 𝑠̂ in this
instance.
In the actual zkVM we have to use field elements in cells and not polynomials involving an indetermi-
nate variable 𝑥, but we can do this by replacing the 𝑥 in the previous figure with a verifier-provided
random value 𝛾. Identities such as

$$\operatorname{poly}(x) = x \cdot \hat{s}(x) + i$$

can then be checked at 𝑥 = 𝛾, provided that the 𝑠̂ polynomials are validly accumulated and made available. This is the purpose of the
Pop Table, explained in the next chapter.
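A small sketch of this evaluated-stack idea, assuming a prime field and a random γ (names illustrative): push follows the identity above, and pop plays the prover's nondeterministic role by supplying ŝ and checking the constraint.

```python
# Stacks as polynomials evaluated at a random gamma: push multiplies by
# gamma and adds the item; pop supplies s_hat and checks s = gamma*s_hat + i.
import random

P = 2**64 - 2**32 + 1
gamma = random.randrange(1, P)

def push(item, s):
    return (gamma * s + item) % P          # s(x) -> x*s(x) + i, at x = gamma

def pop(item, s):
    # The prover nondeterministically supplies s_hat; here we compute it by
    # field division and then check the pop constraint explicitly.
    s_hat = ((s - item) * pow(gamma, P - 2, P)) % P
    assert (gamma * s_hat + item) % P == s
    return s_hat

s = 0                                      # the empty stack
s = push(7, s)
s = push(9, s)
s = pop(9, s)                              # items come off in LIFO order
s = pop(7, s)
assert s == 0
```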
7 Eden zkVM Design
7.1 Tables
7.1.1 Stack Table
7.1.1.1 Background
As mentioned previously, the core functionality of the stack table is to model valid Eden computation
and reject any computation that does not follow the proper semantics.
It accomplishes this by modeling a stack machine that executes Eden, always eagerly processing the
subformulas of the current computation and then processing the outer formula with the results. When
non-recursive formulas are encountered, their result is computed directly.
In other words, it can be described as an improved version of a tree-walking interpreter that has been
linearized in order to fit more neatly within the tools that the AIR formalism has to offer.
While the table is central to the overall zkVM, it outsources a number of responsibilities to other tables,
namely:
• noun validation to the noun table,
• Eden 0 calculations to the subtree access table, and
• stack-popping validation to the pop table.
7.1.1.2 Description
The stack machine has 3 stacks:
• a compute stack (CS), which is used to hold the remaining subcomputations to execute, which
come in subject/formula pairs
• a product stack (PS), which is used to hold the results of both non-recursive (i.e. terminal) and
intermediate subformula executions
• an operation stack (OS), which stores machine-specific operations as well as deferred Eden op-
erations to be done after subformulas have been computed.
Deferred opcodes are either lone atoms or tuples containing additional context to be used after process-
ing subformulas of the opcode in question. The machine-specific operations allow the stack machine
to perform critical tasks in addition to executing Eden opcodes themselves.
The primary machine-specific operation used is pop2, which takes a subject/formula pair off of the
compute stack and sets it as the next computation to run. When combined with the OS, pop2 allows
the machine to delineate subcomputation boundaries in the otherwise flat compute stack and allows
the machine to descend into a subcomputation while preserving the context of earlier computations.
The other machine-specific operation is cons, which signals to the machine that the formula is cons
and needs to be handled accordingly.
The stack machine also has a few registers:
• subj - the current subject
• form - the current formula
• op - the current operation to perform, taken from the OS
Moreover, we can think of the machine as having 3 states:
• start - this is the initial state of the machine. No computation has been executed, but the subject and
formula are loaded on the computation stack
• middle - this is the state during which Eden is recursively evaluated
• end - this is the final state of the machine, when no further computation is left to complete on the
computation stack, and the result of the Eden computation is available on the product stack.
These states are helpful for thinking about the machine, but they do not need to be explicitly encoded
in the trace; they can be enforced implicitly through constraints.
iii) Push the opcode of the formula to the OS such that when the subformulas have been
processed, we can come back to the outer instruction and finish processing it. If the formula
was a cons cell, push the cons operation to the OS.
iv) Push a pop2 on the OS for each subcomputation that will need to be evaluated.
b) If the operation is an Eden opcode, finish executing the instruction by manipulating the CS and
PS appropriately based on the opcode using the intermediate results stored on the PS.
c) If the operation is cons, finish processing it by consing the top two items on the product stack
to produce a cell.
4) If the OS is empty, then the final result of the computation is the single remaining value on the PS.
Otherwise, the machine returns to Step 2, and advances by recording state in the next row of the
trace.
To keep the table terse, we employ a shorthand: for the stack related columns, an empty cell means
the value remains unchanged, while a filled cell means that the included value(s) are to be pushed to
the stack.
Below are the semantics of the deferred Eden opcodes, including the cons instruction.
This observation generalizes to deconstructing other formulas with 2 or 3 subformulas, and thus the
whole formula deconstruction process can be implemented in terms of memory accesses operating on
formulas as the subject. In light of this, we can completely obviate any additional constraints in the
table that are used to prove the inverse cons relation when deconstructing formulas and instead reuse
the logic already present in the subtree access table.
dyck word is checked for validity and the dyck-felt and leaf-felt are accumulated in extension columns.
In the multiplicity phase, the result of this accumulation is placed repeatedly in a multiset.
One of the primary variables in the noun table is dyck, which denotes the column where the dyck
words are input by the prover. At the beginning of a segment of the noun table, the variable rem, for
“remaining,” is initialized by the prover. It indicates the number of letters remaining until we reach the
intended end of the dyck word for this segment, i.e. until the accumulation phase is completed.
The dyck column is constrained to only contain 0’s and 1’s in each cell. We can recall from the theorem
of Section 3.1.1 that the constraints
1. $\sum_{i=0}^{j} (1 - 2w_i) \ge 0$ for $j = 0, \dots, n-1$;
2. $\sum_{i=0}^{n-1} (1 - 2w_i) = 0$
characterize dyck words, where 𝑤_𝑖 denotes a binary letter. In the noun table, the partial sum
$\sum_{i=0}^{j} (1 - 2w_i)$ is denoted by the column variable ct for “count” (though it’s properly a weighted
count). We enforce the constraint ct′ = ct + (1 − 2 dyck′) throughout the accumulation phase to
maintain ct’s character as a running sum. It’s easy to encode the second constraint, which is that ct = 0
at the end of the accumulation phase. To encode the first constraint above we need to work harder
since we’re encoding an inequality in terms of polynomial equalities. The basic idea is that since the
weighted count can only increment or decrement, if ct ever went below 0 it would have to hit −1, and
so we encode ct ≠ −1 instead. We do this by introducing the variable ict-i, for “increment-of-count’s
inverse”, which is the true inverse of ct + 1 when ct + 1 ≠ 0, and 0 otherwise. Then the constraint
(ct + 1) · ict-i = 1 prevents ct = −1.
The column leaf contains the components of the leaf vectors, which aren’t constrained during the ac-
cumulation phase. The column len counts up from 1 for each 1 in the dyck word; since a tree of length
𝑛 has a dyck word with 2𝑛 − 2 letters, half of which are 1’s, this increments the variable len 𝑛 − 1
times from 1 to the correct final value of 𝑛.
In the extension columns dyck-felt and leaf-felt, the values in the dyck and leaf columns are accumu-
lated into the evaluations of the dyck and leaf polynomials at the random values 𝛼1 , 𝛼2 . While in the
accumulation phase,
dyck-felt′ = 𝛼1 · dyck-felt + dyck

and a second, conditional constraint that likewise accumulates the leaf column into leaf-felt (scaled
by 𝛼2) exactly when the next dyck character is 1.
The second constraint here is a conditional on the next character of the dyck word, and works because
of some tree combinatorics: we’re currently at a leaf if we can’t go down into the tree any further
from our current position, which means the next symbol of the dyck word is 1. So we accumulate leaf
components when the next dyck character is 1.
Once this accumulation phase is complete, we enter the multiplicity phase. The multiplicity mult is a
variable whose value is set by the prover at the top of a segment and remains constant throughout the
accumulation phase. In the multiplicity phase it decrements from row to row. With each decrement,
the values in the len, dyck, and leaf columns after the accumulation phase are compressed and put
into a multiset mset:
mset′ = mset · (𝛽 − (𝑎 len′ + 𝑏 dyck-felt′ + 𝑐 leaf-felt′))
Atomic nouns are an edge case due to their empty dyck words and have to be handled specially, but
this is not terribly difficult and we omit the details here.
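As a sanity check on the count constraints described above, here is a small sketch; the driver loop is illustrative, standing in for the table's row-to-row transition.

```python
# The ct running count and the ict-i inverse trick over one noun segment.
P = 2**64 - 2**32 + 1

def inv_or_zero(x):
    x %= P
    return pow(x, P - 2, P) if x else 0    # true inverse when nonzero, else 0

dyck = [0, 1, 0, 0, 1, 1]                  # shape of [0 [[6 20] 1]]
ct = 0
for w in dyck:
    ct += 1 - 2 * w                        # transition: ct' = ct + (1 - 2*dyck')
    ict_i = inv_or_zero(ct + 1)            # prover-supplied inverse column
    # (ct + 1) * ict-i = 1 is unsatisfiable exactly when ct = -1, i.e. when
    # the weighted count would dip below zero.
    assert (ct + 1) * ict_i % P == 1
assert ct == 0                             # boundary constraint at segment end
```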
We close out this section by mentioning that this basic design can be optimized. The basic problem
with the design above is that there’s substantial redundancy. For example, imagine two nouns with
large identical left subtrees but distinct right subtrees; these left subtrees would both be present in the
table twice, taking up double the number of rows. In the optimized design, we pass over the literals of
certain nouns in a first phase and record them as above. Then in a second phase, we cons the nouns
recorded in the first phase to create larger composite nouns and record them in a multiset. This cons
operation occurs in one row, yielding a significantly more efficient algorithm.
Since binary expansions are unique, any wrong path digit will lead to the axis never hitting the target
value. If the axis value never hits the target, the result of the axis lookup is never recorded.
In each row, we have a tree and its left and right subtrees. This decomposition is enforced by cons
constraints verifying that the left and right subtrees are correct.
The 1 in the first row of the path column determines, via constraint, that the right subtree of the ini-
tial tree will become the new tree of the next row. Then the 0 in the second row of the path column
determines that the left subtree of 𝑇 will become the new tree in the third row. Finally, the 1 in the
third row determines that the right subtree will become the tree in the final row, which is the target
subtree of 𝑆 since we’re out of digits.
Once the target subtree is found, we enter the multiplicity phase, where the results of the traversal
phase are recorded with multiplicity. There is an mset variable such that during the multiplicity phase
mset′ = mset · (𝛽 − (𝑝 · subject′ + 𝑞 · tarax′ + 𝑟 · tree′))
until the multiplicity variable mult, which is constant throughout the traversal phase, decrements to
0. Here, 𝛽, 𝑝, 𝑞, 𝑟 are verifier-generated randomness. Note that the input subject and axis to the Eden
0 computation are included in the multiset, as well as the output. This is the same form as the query
from the stack table: the stack table effectively asks, “Does running an Eden 0 computation with this
subject and axis yield this output?” and the subtree access table responds, “Running an Eden 0 on this
subject and axis yields this output.”
The variable subject is similar to tarax in that it is part of the input to the computation, and it re-
mains constant throughout each segment. Its value is constrained in the first row of a segment to be
𝑎 · tree-len + 𝑏 · tree-dyck + 𝑐 · tree-leaf where (tree-len, tree-dyck, tree-leaf) is the disaggregation
of the tree variable in Table 8, and 𝑎, 𝑏, 𝑐 are verifier randomness.
$$T_l(\alpha_2) = \alpha_2^{\operatorname{len}(R)} L_l(\alpha_2) + R_l(\alpha_2)$$

In order to make use of these, we need to compute the exponents $\alpha_1^{2\operatorname{len}(R)-1}$, $\alpha_1^{2\operatorname{len}(R)-2}$, and $\alpha_2^{\operatorname{len}(R)}$.
Since len(𝑅) is a variable these are exponential functions which need to be computed in polynomial
terms. This is the sole purpose of the exponent table. It then shares its results via multiset with other
tables that perform cons.
It’s easy to describe a naive design for computing exponents 𝑎^𝑥 of a value 𝑎 (𝑎 could be either of the
𝛼_𝑖 above). Starting at 𝑎, we multiply, one step at a time, new factors of 𝑎 into a variable exp, and
increment the power variable pow. We then enter a secondary phase in which we accumulate (𝑥, 𝑎^𝑥)
pairs into the multiset mset according to the value of the multiplicity mult. This multiset evolves as

mset′ = mset · (𝛽 − (𝑎 · pow + 𝑏 · exp))

while the multiplicity decrements. Once mult reaches 0, pow increments and the accumulation process
begins again.
A more involved optimization of this design that we implement is to initially compute 𝑎, 𝑎^2, 𝑎^4, …,
𝑎^256 by repeated squaring, so that one can increment the power by any integer in the range [1, 512) and skip
ranges of exponents which aren’t needed and would take up unnecessary table space.
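A sketch of the idea (parameters illustrative): with a, a^2, a^4, …, a^256 precomputed, any power increment in [1, 512) is a product of at most nine precomputed factors.

```python
# Repeated-squaring table for the exponent table's skip optimization.
import random

P = 2**64 - 2**32 + 1
a = random.randrange(1, P)

squares = [a]                              # a^(2^0) through a^(2^8) = a^256
for _ in range(8):
    squares.append(squares[-1] * squares[-1] % P)

def jump(exp, delta):
    """Advance a running exponent a^pow by any delta in [1, 512)."""
    assert 1 <= delta < 512
    for k in range(9):
        if delta >> k & 1:                 # multiply in a^(2^k) per set bit
            exp = exp * squares[k] % P
    return exp

pow_, exp_ = 5, pow(a, 5, P)
pow_, exp_ = pow_ + 300, jump(exp_, 300)   # skip a range of unneeded powers
assert exp_ == pow(a, pow_, P)
```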
7.2 Input/Output Linking
Boundary constraints force the appropriate cells to contain the fingerprints of the input computation and output
respectively. From there, the zk-STARK protocol proceeds normally. This addition allows the proofs
generated from the system to be tied to the input and output as desired.
7.3 Extra Commitment Phase
As we’ve discussed, in order to use ION Fingerprints across the various tables of the Eden zkVM, these
tables create multisets of the (len, dyck-felt, leaf-felt) triples that they use and check via a multiset
argument that these coincide with ION Fingerprints constructed by the noun table.
However, there’s a technicality to be aware of here. In a multiset argument for the equality of
multisets {𝑥𝑖 }, {𝑦𝑗 }, the point 𝛽 at which the polynomial identity ∏(𝑡 − 𝑥𝑖 ) = ∏(𝑡 − 𝑦𝑗 ) is evaluated
must be random. In practice, this means that all values {𝑥𝑖 }, {𝑦𝑗 } must be cryptographically committed to
before the verifier generates the random value 𝛽. If the prover knew the value of 𝛽 and was not com-
mitted to {𝑥𝑖 }, {𝑦𝑗 }, i.e. could change them after knowing 𝛽, it would be trivially easy to satisfy
∏(𝛽 − 𝑥𝑖 ) = ∏(𝛽 − 𝑦𝑗 ) with unequal multisets {𝑥𝑖 } ≠ {𝑦𝑗 }.
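The failure mode is easy to demonstrate outside the protocol. In the toy Python fragment below (with an illustrative modulus), a party that already knows 𝛽 constructs a different multiset whose grand product nevertheless matches:

    p = (1 << 64) - (1 << 32) + 1
    beta = 0xDEADBEEF                     # "known in advance"
    xs = [11, 22, 33]
    target = 1
    for x in xs:
        target = target * (beta - x) % p

    ys = [101, 202]                       # two arbitrary forged values
    partial = (beta - ys[0]) * (beta - ys[1]) % p
    # solve for a third value that makes the grand products agree
    ys.append((beta - target * pow(partial, -1, p)) % p)

    lhs = rhs = 1
    for x in xs: lhs = lhs * (beta - x) % p
    for y in ys: rhs = rhs * (beta - y) % p
    assert lhs == rhs and sorted(xs) != sorted(ys)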
In the Eden zkVM, the prover needs randomness to construct the ION fingerprints, and this random-
ness is obtained from the verifier after an initial commitment to base columns. However, if the random
value 𝛽 used in multiset arguments as above is released at the same time as the random values used
to construct the ION Fingerprints, there is a problem of the type indicated in the previous paragraph.
This is because in every table except the noun table the fingerprint has not been committed to, and
now the value 𝛽 is known. This would give the prover leverage to forge inappropriate fingerprints that
would pass the multiset argument.
Our solution is to add an extra commitment phase in the construction of our tables. First, base columns
are committed to, and the verifier generates the randomness 𝛼1 and 𝛼2 used to construct ION Finger-
prints. Then, columns that only depend on the values 𝛼1 and 𝛼2 are filled and committed to; these are
the primary extension columns. The verifier then generates the remaining randomness, which includes
the randomness used to compress tuples and make multiset and stack arguments. Only then does the
prover fill the secondary extension columns to complete the construction of the tables.
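Schematically, the schedule looks as follows. This Python sketch uses hashes as stand-ins for polynomial commitments; the labels and transcript API are our own illustration of the ordering, not the implementation:

    import hashlib

    def challenge(transcript: bytes, label: bytes) -> int:
        return int.from_bytes(hashlib.sha256(transcript + label).digest(), 'big')

    transcript = b''
    transcript += b'base-columns-commitment'         # 1. commit base columns
    alpha1 = challenge(transcript, b'alpha1')        # 2. fingerprint randomness
    alpha2 = challenge(transcript, b'alpha2')
    transcript += b'primary-extension-commitment'    # 3. commit columns depending
                                                     #    only on alpha1, alpha2
    beta = challenge(transcript, b'beta')            # 4. multiset/stack randomness
    transcript += b'secondary-extension-commitment'  # 5. complete the tables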
7.4 Large-Noun Elision
Among various proof system performance metrics, proof size is often a primary concern. Because of
the way that Eden organizes computation, all code that a given computation wishes to reference or
make use of must be present in the subject, including the standard library. In combination with the
input/output linking technique described above, this would necessitate the prover having to send large
nouns in every proof, paying for the cost each time.
To avoid this, we make another modification to the zk-STARK protocol by taking advantage of the fact
that certain large nouns such as the standard library are “well-known”, i.e. are known a priori to both
the prover and verifier. When sending the subject and formula over the proof stream, we elide well-
known nouns by replacing each noun with a special tag that contains its hash instead. The verifier
then scans the noun for these tags and replaces them with the appropriate noun that it has associated
to the hash. The normal zk-STARK protocol proceeds from this point.
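A minimal sketch of the elision step, again with nouns as nested Python tuples; the tag format and hash choice here are illustrative, not the wire format:

    import hashlib

    WELL_KNOWN = {}  # hash -> noun, agreed upon by prover and verifier

    def digest(noun) -> bytes:
        return hashlib.sha256(repr(noun).encode()).digest()

    def elide(noun):
        h = digest(noun)
        if h in WELL_KNOWN:
            return ('TAG', h)           # replace the noun with its hash tag
        if isinstance(noun, tuple):
            return (elide(noun[0]), elide(noun[1]))
        return noun

    def expand(noun):
        if isinstance(noun, tuple) and noun[0] == 'TAG':
            return WELL_KNOWN[noun[1]]  # verifier substitutes the known noun
        if isinstance(noun, tuple):
            return (expand(noun[0]), expand(noun[1]))
        return noun

    stdlib = ((1, 2), (3, (4, 5)))      # stand-in for a large well-known noun
    WELL_KNOWN[digest(stdlib)] = stdlib
    subject = (stdlib, 42)
    assert expand(elide(subject)) == subject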
8 Security Analysis
8.1 Noun Collision
In this section we investigate the probability that two distinct nouns a and b encoded by the
noun table have the same ION Fingerprint. Recall that the ION Fingerprint of a noun a is
𝜄(𝑎) = (len(𝑎), 𝑑𝑎 (𝛼1 ), 𝑙𝑎 (𝛼2 )), where 𝛼1 , 𝛼2 are random values from the field. Thus a and b have the
same ION Fingerprint if and only if len(𝑎) = len(𝑏), 𝛼1 is a root of 𝑑𝑎 − 𝑑𝑏 , and 𝛼2 is a root of 𝑙𝑎 − 𝑙𝑏 .
For each length 𝑙, and each unordered pair {𝑎, 𝑏} of distinct nouns with length 𝑙, there is a set of
roots 𝑟𝑑 ({𝑎, 𝑏}) of 𝑑𝑎 − 𝑑𝑏 in 𝔽 and a set of roots 𝑟𝑙 ({𝑎, 𝑏}) of 𝑙𝑎 − 𝑙𝑏 in 𝔽, such that a and b
have the same ION Fingerprint if and only if 𝛼1 ∈ 𝑟𝑑 ({𝑎, 𝑏}) and 𝛼2 ∈ 𝑟𝑙 ({𝑎, 𝑏}). Since the 𝛼𝑖 are
independently chosen, we can equivalently say that a and b have the same ION Fingerprint if and
only if (𝛼1 , 𝛼2 ) ∈ 𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏}). Let 𝑆 denote the set of pairs {𝑎, 𝑏} such that 𝑎 ≠ 𝑏 and
len(𝑎) = len(𝑏). Then, there is an ION Fingerprint collision between some pair {𝑎, 𝑏} ∈ 𝑆 if and only
if (𝛼1 , 𝛼2 ) ∈ ∪𝑆 𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏}).
Let 𝑁 be the number of total nouns in the program; note that 𝑁 ≤ 2^32 since 2^32 is the maximum
allowable height of the noun table. For a length 𝑙, let 𝑁𝑙 denote the number of nouns of length 𝑙, so
that 𝑁 = ∑𝑙 𝑁𝑙 .
Note that for {𝑎, 𝑏} ∈ 𝑆, at most one of 𝑟𝑑 ({𝑎, 𝑏}) = 𝔽 and 𝑟𝑙 ({𝑎, 𝑏}) = 𝔽 can hold, since both together would force 𝑎 = 𝑏. The equality
𝑟𝑑 ({𝑎, 𝑏}) = 𝔽 is equivalent to the case where a and b have the same tree shape but different leaf vec-
tors, whereas 𝑟𝑙 ({𝑎, 𝑏}) = 𝔽 is equivalent to the case where a and b are distinct nouns with the same
leaf vectors on different tree shapes. If neither is equal to 𝔽, then |𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏})| ≤ 2(𝑙 − 1)𝑙.
In the worst case, |𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏})| ≤ 2(𝑙 − 1)|𝔽|.
The worst case a priori upper estimate for |∪𝑆 𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏})| is thus
∑𝑙 (1/2) 𝑁𝑙 (𝑁𝑙 − 1) · 2(𝑙 − 1)|𝔽|.
In a naive implementation of the noun table we have 𝑁𝑙 ≤ 2^32/(2(𝑙 − 1)), since the dyck word of each noun is
stacked vertically in the noun table. Using this naive estimate as a heuristic, we’d find
|∪𝑆 𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏})| < 𝑁 · 2^31 · |𝔽| ≤ 2^63 |𝔽|
which puts the probability of sampling (𝛼1 , 𝛼2 ) from ∪𝑆 𝑟𝑑 ({𝑎, 𝑏}) × 𝑟𝑙 ({𝑎, 𝑏}) at no more than
2^63 |𝔽|/|𝔽|^2 = 2^(63−192) = 2^(−129), since |𝔽| ≈ 2^192.
If a performance-optimized version of the noun table were to cause the previous upper bound in the
inequality 𝑁𝑙 ≤ 2^32/(2(𝑙 − 1)) to swell by a factor of 2^32, the resulting inequality would then be
𝑁𝑙 ≤ 2^64/(2(𝑙 − 1)). This would still only put the probability of sampling appropriate (𝛼1 , 𝛼2 ) at
2^(−97) in the worst case.
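The arithmetic behind these two bounds, spelled out in plain Python (with the field size |𝔽| ≈ 2^192 that the exponents above presuppose):

    from math import log2

    F = 2.0 ** 192                # |F|, approximately
    N = 2 ** 32                   # maximum number of nouns
    # naive table: at most N * 2^31 * |F| bad (alpha1, alpha2) pairs
    print(log2((N * 2 ** 31 * F) / F ** 2))   # -129.0
    # optimized table: at most N * 2^63 * |F| bad pairs
    print(log2((N * 2 ** 63 * F) / F ** 2))   # -97.0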
Thus we conclude that the ION Fingerprint scheme within our zkVM is sufficiently secure against
collision.
8.2 Cons & Inverse Cons Validation
Recall that the way we perform cons(𝑎, 𝑏) of two nouns in our zkVM is by obtaining the ION Finger-
prints 𝜄(𝑎), 𝜄(𝑏) with randomness 𝛼1 , 𝛼2 and then performing cons in this representation (which we
denote by [𝜄(𝑎) 𝜄(𝑏)]). This latter quantity is a triple of field elements that we always validate (using
a multiset argument) to be equal to 𝜄(𝑐) for some noun c. Now, it is a priori possible that there is a c
such that 𝜄(𝑐) = [𝜄(𝑎) 𝜄(𝑏)] despite c ≠ [a b]. But if [a b] exists in the noun table, then we have the
collision 𝜄([𝑎 𝑏]) = 𝜄(𝑐), which we know is highly unlikely by Section 8.1. If [a b] is not already in
the noun table, then the noun table augmented with this single noun would produce a collision. This
can be demonstrated to be highly unlikely by making the same argument as in Section 8.1 with one
extra noun and a noun table of height at most 2^33 (since the extra noun can occupy at most 2^32 rows).
Inverse cons is somewhat different; to perform inverse cons on a noun a in the zkVM, which is repre-
sented in the form 𝜄(𝑎), two triples of field elements are produced, which are validated to be equal to
𝜄(𝑏) and 𝜄(𝑐) for nouns b and c, and it is checked that 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)]. The nondeterministic nature
of the algorithm now gives the prover more latitude: they can survey all combinations of 𝜄(𝑏) and 𝜄(𝑐)
such that len(𝑎) = len(𝑏) + len(𝑐) and hope to find one such that a ≠ [b c] but 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)].
However, the prover is still heavily constrained, as they must commit to the particular multiplicities
for each noun before they ever see any randomness. To illustrate, suppose that after the 𝛼𝑖 are gener-
ated by the verifier, the prover identifies appropriate a, b, c such that a ≠ [b c] but 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)].
Either the multiplicities of b and c were committed to based on an honest run of the program, or they
were incorrect and committed to maliciously. In the former case, a malicious prover would be forced
to repurpose an existing instance of b and c in the program execution in order to take advantage of
𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] to prove the false inverse cons relation a ≠ [b c]. This is because the prover cannot
modify the multiplicities of b and c given their prior commitment. But now the original rows where
b and c were used can no longer make reference to them, and have to be filled by some other nouns.
This creates a cascade of difficulties for the prover who now has to reshuffle nouns into new roles and
hope not to violate constraints. This adds robustness to our security analysis, as the variety of layers
of constraints drastically reduces the prover’s degrees of freedom. In the remaining case, the prover
simply commits to false multiplicities with the expectation that a ≠ [b c] and 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] would
hold. But this involves predicting ahead of time that the verifier’s random challenges 𝛼1 and 𝛼2 will
be the roots of two given polynomials, which by Schwartz-Zippel is overwhelmingly improbable.
Leaving such intuition to the side, what is the probability that a ≠ [b c] but 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] holds?
Firstly, based on Equations (5) and (6), there are two categories of ways in which a ≠ [b c] can occur:
(1) 𝑑𝑎 ≠ [𝑑𝑏 𝑑𝑐 ] and 𝑙𝑎 ≠ [𝑙𝑏 𝑙𝑐 ]; and (2) exactly one of 𝑑𝑎 ≠ [𝑑𝑏 𝑑𝑐 ], 𝑙𝑎 ≠ [𝑙𝑏 𝑙𝑐 ] holds. We can
easily argue that for the dishonest prover, looking for instances of a ≠ [b c] in category (1) where
𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] is futile. For a given a, there are at most 2^64 pairs (b, c), so even if a had length 2^32,
the probability of a single such instance is at most ((2^32)^3/|𝔽|)^2 ≈ 2^(−192), by using Schwartz-Zippel once
for each random value 𝛼𝑖 . So, even if there were 2^32 such inverse cons equations, the probability of
a single false positive would be in the neighborhood of 2^(−160). This is negligible, so for all practical
purposes such cases can be ignored.
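In plain numbers (again assuming |𝔽| ≈ 2^192):

    from math import log2

    F = 2.0 ** 192
    # 2^64 pairs (b, c) times roughly 2^32 roots, once per random alpha_i
    single = ((2 ** 32) ** 3 / F) ** 2
    print(log2(single))                 # -192.0
    print(log2(single * 2 ** 32))       # -160.0, with 2^32 inverse cons uses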
Thus, we turn to the odds of 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] for instances of a ≠ [b c] in category (2). The odds
of the prover seeing a ≠ [b c] because exactly one of 𝑑𝑎 = [𝑑𝑏 𝑑𝑐 ] or 𝑙𝑎 = [𝑙𝑏 𝑙𝑐 ] holds, and seeing
𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)], depends on:
1. the number of inverse cons relations
2. for any given a to which the inverse cons will apply, the number of b and c nouns available in the
program such that exactly one of 𝑑𝑎 = [𝑑𝑏 𝑑𝑐 ] or 𝑙𝑎 = [𝑙𝑏 𝑙𝑐 ] holds, and
3. the length of a, which determines the number of roots of 𝑙𝑎 − [𝑙𝑏 𝑙𝑐 ] that can be targeted by 𝛼2 (in
the case where 𝑑𝑎 = [𝑑𝑏 𝑑𝑐 ]) and the number of roots of 𝑑𝑎 − [𝑑𝑏 𝑑𝑐 ] that can be targeted by 𝛼1 (in
the case where 𝑙𝑎 = [𝑙𝑏 𝑙𝑐 ]).
In light of this, we will now provide a security argument using the naive implementation of the noun
table. Let 𝑁 be the total number of nouns in a given zkVM program, so 𝑁 ≤ 2^32. For any noun length
𝑙, let 𝑁𝑙 be the number of nouns of length 𝑙. Of course, 𝑁 = ∑ 𝑁𝑙 , but critically we also have
𝑁𝑙 ≤ 2^32/(2(𝑙 − 1)),
since each noun of length 𝑙 takes up a vertical space of 2(𝑙 − 1). Now, for each noun a of length 𝑙 and
each decomposition 𝑙 = 𝑘 + (𝑙 − 𝑘) (where 𝑘 = 1, …, 𝑙 − 1), there are at most 𝑁𝑘 nouns b of length
𝑘 and 𝑁𝑙−𝑘 nouns c of length 𝑙 − 𝑘 available to construct a false equation a ≠ [b c] of category (2)
such that 𝜄(𝑎) = [𝜄(𝑏) 𝜄(𝑐)] holds. Thus, for each noun a of length 𝑙, there are at most 𝑃𝑙 total ways to
attempt a false construction, where 𝑃𝑙 is the convolution-type sum 𝑃𝑙 = ∑_{𝑘=1}^{𝑙−1} 𝑁𝑘 · 𝑁𝑙−𝑘 . Since there
are 𝑁𝑙 nouns of length 𝑙, and the number of roots of either the dyck cons or leaf cons equation cannot
exceed 2(𝑙 − 1), the maximum possible number of roots available that would allow a category (2) false
cons is
∑ 𝑃𝑙 · 𝑁𝑙 · 2(𝑙 − 1) (14)
Since 𝑁𝑙 · 2(𝑙 − 1) ≤ 2^32 in the naive implementation, and since ∑ 𝑃𝑙 is bounded by (∑ 𝑁𝑙 )^2 = 𝑁^2 ≤ 2^64
(by expanding (∑ 𝑁𝑙 )^2 and the positivity of the terms), this double sum is at most 2^32 · 2^64 = 2^96.
Putting this together, we conclude that the maximum possible number of roots available that would
allow a false cons in category (2) is at most 2^96, yielding a probability of at most 2^96/|𝔽| ≈ 2^(−96)
of such an event.
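As a numeric sanity check of this bound, the following Python fragment greedily builds a length profile that saturates both constraints and evaluates the double sum from (14):

    # Fill N_l subject to N <= 2^32 and N_l <= 2^32 / (2(l-1)), then
    # evaluate sum_l P_l * N_l * 2(l-1) with P_l the convolution sum.
    BUDGET = 2 ** 32
    N_l, remaining = {}, BUDGET
    for l in range(2, 1025):
        N_l[l] = min(2 ** 32 // (2 * (l - 1)), remaining)
        remaining -= N_l[l]
    P_l = {l: sum(N_l.get(k, 0) * N_l.get(l - k, 0) for k in range(1, l))
           for l in N_l}
    total = sum(P_l[l] * N_l[l] * 2 * (l - 1) for l in N_l)
    assert total <= 2 ** 96    # the bound derived above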
References
[Sze] A. Szepieniec. [Online]. Available: https://web.archive.org/web/20230630200925/https://
github.com/aszepieniec/stark-brainfuck/issues/43
[KR08] Y. T. Kalai and R. Raz, “Interactive PCP,” in Automata, Languages and Programming, Berlin,
Heidelberg, 2008, pp. 536–547.
[Ham15] M. Hamburg, “Ed-448 Goldilocks, a new elliptic curve,” 2015. [Online]. Available: https://
eprint.iacr.org/2015/625.pdf
[BCS16] E. Ben-Sasson, A. Chiesa, and N. Spooner, “Interactive Oracle Proofs,” 2016. [Online]. Avail-
able: https://eprint.iacr.org/2016/116 (Cryptology ePrint Archive, Paper 2016/116)
[Ben+18] E. Ben-Sasson, I. Bentov, Y. Horesh, and M. Riabzev, “Scalable, transparent, and post-
quantum secure computational integrity,” 2018. [Online]. Available: https://eprint.iacr.org/
2018/046 (Cryptology ePrint Archive, Paper 2018/046)
[Gou21] A. P. Goucher, “An efficient prime for number-theoretic transforms,” 2021. [On-
line]. Available: https://cp4space.hatsya.com/2021/09/01/an-efficient-prime-for-number-
theoretic-transforms/
[GPR21] L. Goldberg, S. Papini, and M. Riabzev, “Cairo – a Turing-complete STARK-friendly CPU
architecture,” 2021. [Online]. Available: https://eprint.iacr.org/2021/1063.pdf (Cryptology
ePrint Archive, Paper 2021/1063)
[Por22] T. Pornin, “EcGFp5: a Specialized Elliptic Curve,” 2022. [Online]. Available: https://
eprint.iacr.org/2022/274.pdf
[Noc23] “Nock Definition,” 2023. [Online]. Available: https://developers.urbit.org/reference/nock/
definition
[Blo23] R. Bloemen, “The Goldilocks Prime,” 2023. [Online]. Available: https://xn--2-umb.com/22/
goldilocks/
[DL78] R. A. DeMillo and R. J. Lipton, “A probabilistic remark on algebraic program testing,” 1978.
[Zip79] R. Zippel, “Probabilistic algorithms for sparse polynomials,” 1979.
[Sch80] J. T. Schwartz, “Fast probabilistic algorithms for verification of polynomial identities,” 1980.
[FS87] A. Fiat and A. Shamir, “How To Prove Yourself: Practical Solutions to Identification and
Signature Problems,” in Advances in Cryptology --- CRYPTO ’86, Berlin, Heidelberg, 1987, pp.
186–194.