Jellyfish Merkle Tree

Zhenhuan Gao, Yuxuan Hu, Qinfan Wu*

Abstract. This paper presents Jellyfish Merkle Tree (JMT), a space- and computation-efficient sparse
Merkle tree optimized for Log-Structured Merge-tree (LSM-tree) based key-value storage and
designed specifically for the Diem Blockchain. JMT was inspired by Patricia Merkle Tree (PMT),
the sparse Merkle tree structure that powers the widely known Ethereum network. JMT makes
several further optimizations in node key, node types and proof format to balance data structure
complexity, storage, I/O overhead and computation efficiency for the needs of the Diem Blockchain.
JMT has been implemented in Rust, but it is language-independent and could be implemented in
other programming languages. The JMT structure presented also leaves considerable flexibility in
implementation details to fit various practical use cases.

1 Introduction

Since its introduction by Ralph Merkle [1], the Merkle tree has been the de facto solution for
production account-model blockchain systems. Ethereum [2] and HyperLedger [3] are examples that
use various versions of Merkle trees.
Although a Merkle tree fits well as an authenticated key-value store holding a huge amount of
data in a tamper-proof way, its computation cost and storage footprint are two major concerns that
people have been trying to improve. Among these efforts, the sparseness of Merkle trees [2], [4], [5]
has been leveraged to cut unnecessary complexity and obtain more compact and efficient designs.
In practice, a Merkle tree has far fewer leaf nodes at the bottom level than the intractable number
that a perfect Merkle tree can hold. A design that sticks rigorously to the original Merkle tree
structure cannot escape excessive computation and space overhead.
The optimizations offered by sparseness also benefit related data, notably proofs [6].
With sparseness and the resulting smaller tree sizes, proofs can be made shorter, taking less CPU
time to generate and verify and less network bandwidth to transmit, which benefits both server and
client sides.
Building on earlier work in the industry, we present JMT, a sparse Merkle tree optimized for both
space and time and designed for blockchain systems such as the Diem Blockchain [7], [8]. JMT is
built on top of an LSM-tree based key-value storage and features a version-based key that avoids
the heavy I/O caused by the randomness of the commonly used hash-based key.

∗ The authors work at Novi Financial, a subsidiary of Facebook, Inc., and contribute this paper to the Diem Association
under a Creative Commons Attribution 4.0 International License.

Revised January 12, 2021

This mini whitepaper is structured as follows. First, we give a concise retrospective of Merkle trees
in practical use cases where each leaf node is addressable via a key, followed by their variations and
evolution (Section 2). We then present JMT, including the node key structure and its implicit
benefits, the logical data structures of the two node types, and related operations (Section 3). We
then describe the proof format of JMT and the verification algorithm, paired with examples
(Section 4). Finally, we conclude and discuss future work (Section 5).

2 A Retrospection of Addressable Merkle Trees

Merkle trees are widely adopted by the blockchain industry as a cryptographically authenticated,
deterministic data structure that can be used to store and map between arbitrary binary data. In the
family of Merkle trees, the addressable Merkle tree has become the de facto standard authenticated
data structure for keeping a record of the global blockchain state. We presume that readers of this
paper are familiar with the background of Merkle trees and with how they allow efficient and
secure verification of the contents of large data structures with proofs.

2.1 Addressable Merkle Tree (AMT)

An authenticated data structure allows a verifier 𝑉 to hold a short authenticator 𝑎, which forms a
binding commitment to a larger data structure 𝐷. An untrusted prover 𝑃 , which holds 𝐷, computes
𝑓(𝐷) → 𝑟 and returns both 𝑟 — the result of the computation of some function 𝑓 on 𝐷 — as well as 𝜋
— a proof of the correct computation of the result — to the verifier. 𝑉 can run Verify(𝑎, 𝑓, 𝑟, 𝜋), which
returns true if and only if 𝑓(𝐷) = 𝑟. In the context of the Diem Blockchain, provers are generally
validators and verifiers are clients executing read queries.
In summary, an addressable Merkle tree is an authenticated data structure in the form of a binary
Merkle tree that stores maps. In a Merkle tree of size 2^ℎ, the structure 𝐷 maps every key 𝑖 ∈ [0, 2^ℎ)
to a value 𝑣𝑖. The authenticator is formed from the root of a full binary tree created from the values,
labeling leaves as hash(𝑖 ‖ 𝑣𝑖) and internal nodes as hash(left ‖ right), where hash (written H in
Figure 1) is a cryptographic hash function. Keys are encoded as the binary paths from the root to
each leaf, which are referred to as keys in production.¹
The function 𝑓, which the prover wishes to authenticate, is an inclusion proof that a key-value pair
(𝑘, 𝑣) is within the map 𝐷.

[Figure: binary Merkle tree; each left edge is labeled 0 and each right edge is labeled 1]
Root: a = H(h4 ‖ h5)
Internal nodes: h4 = H(h0 ‖ h1), h5 = H(h2 ‖ h3)
Leaves: h0 = H(0 ‖ s0), h1 = H(1 ‖ s1), h2 = H(2 ‖ s2), h3 = H(3 ‖ s3)

Figure 1: A Merkle tree storing 𝐷 = {0 ∶ s0, …}. If 𝑓 is a function that gets the third item (shown with a
dashed line) then 𝑟 = s2 and 𝜋 = [h3, h4] (these nodes are shown with a dotted line). Verify(𝑎, 𝑓, 𝑟, 𝜋)
verifies that 𝑎 ≟ hash(h4 ‖ hash(hash(2 ‖ 𝑟) ‖ h3)).

¹ Secure Merkle trees must use different hash functions to hash the leaves and internal nodes to avoid
confusion between the two types of nodes. While we have omitted this detail in the example for
simplicity, the Diem Core uses a unique hash function to distinguish between different hash function
types to avoid attacks based on type confusion [?].

𝑃 authenticates lookups for an item 𝑖 in 𝐷 by returning a proof 𝜋 that consists of the labels of the
sibling of each of the ancestors of node 𝑖. Figure 1 shows an AMT with 2-bit addresses where the
addresses of the nodes in binary are 0b00, 0b01, 0b10 and 0b11, respectively. The addressing process
is straightforward: following each bit of the address from MSB to LSB, visit the left child of the
current node on a 0 and the right child otherwise. Figure 1 also illustrates a lookup for the third
item, which is shown with a dashed line; the nodes included in 𝜋 are shown with dotted lines.
In this way, the root digest 𝑎 efficiently authenticates all the data stored at all the leaf nodes,
addressed by their keys, by recursively hashing the concatenation of the hashes of the two
children.
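The lookup proof for Figure 1 can be sketched in a few lines of C++. This is only an illustration under stated assumptions: toy_hash is a non-cryptographic FNV-1a stand-in for the hash function, the "L" and "I" prefixes are invented here to separate leaf and internal hashing in the spirit of footnote 1, and the names leaf_digest, internal_digest and verify_inclusion are hypothetical. Siblings are passed from the leaf upward, as in 𝜋 = [h3, h4].

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Digest = std::uint64_t;

// Toy 64-bit FNV-1a hash: a stand-in for a cryptographic hash, NOT secure.
Digest toy_hash(const std::string& bytes) {
    Digest h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Leaf label hash(i || v_i); the "L"/"I" prefixes are made up for this sketch.
Digest leaf_digest(std::uint64_t index, const std::string& value) {
    return toy_hash("L" + std::to_string(index) + "|" + value);
}

// Internal label hash(left || right).
Digest internal_digest(Digest left, Digest right) {
    return toy_hash("I" + std::to_string(left) + "|" + std::to_string(right));
}

// Recomputes the root from leaf `index` and its siblings (leaf side first)
// and compares the result with the authenticator `a`.
bool verify_inclusion(Digest a, unsigned height, std::uint64_t index,
                      const std::string& value,
                      const std::vector<Digest>& siblings) {
    assert(height <= 64);
    if (siblings.size() != height) return false;
    Digest cur = leaf_digest(index, value);
    for (unsigned level = 0; level < height; ++level) {
        // Bit `level` of the index, counted from the LSB, tells whether the
        // current node is a right child (1) or a left child (0).
        bool is_right = (index >> level) & 1U;
        cur = is_right ? internal_digest(siblings[level], cur)
                       : internal_digest(cur, siblings[level]);
    }
    return cur == a;
}
```

For the third item (index 2 = 0b10), the loop first treats the leaf as a left child and combines it with h3, then treats the result as a right child and combines it with h4, which reproduces hash(h4 ‖ hash(hash(2 ‖ 𝑟) ‖ h3)).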

2.1.1 Tractable Representations

While a full tree with ℎ-bit keys (256 bits in Diem) of size 2^ℎ is an intractable representation when ℎ
is large, Figure 2 shows two optimizations that can be applied to transform a naive binary
implementation (1) into an efficient one for a sparse Merkle tree. First, subtrees that consist entirely
of empty nodes are replaced with a placeholder value (□ in (2)) whose digest is a predefined constant
𝐷_default, as used in the certificate transparency system [9]. This optimization creates a representation
of a tractable size without substantially changing the proof generation of the Merkle tree.
However, leaves are still stored at the bottom level of the tree, meaning that the structure requires ℎ
hashes to be computed on every leaf modification. A second optimization replaces subtrees consisting
of exactly one leaf with a single node (3). In expectation, the depth of any given item is 𝑂(log 𝑛),
where 𝑛 is the number of items in the tree. This optimization reduces the number of hashes to be
computed when performing operations on the map, including proof-related operations.
Some works [5] define a separate default digest for the empty nodes at each height ℎ and let
𝐷_default^ℎ = 𝐻(𝐷_default^(ℎ−1) ‖ 𝐷_default^(ℎ−1)). In this paper, we choose a single default digest
𝐷_default to represent empty subtrees at any level, which causes no ambiguity in our design.

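[Figure 2 shows three binary trees with left edges labeled 0 and right edges labeled 1: (1) the naive
full tree with the leaves A, B and C at the bottom level; (2) the same tree with every empty subtree
replaced by the placeholder □; (3) the tree after every subtree with exactly one leaf is replaced by a
single node.]

Figure 2: Three versions of a sparse Merkle tree storing 𝐷 = {0100₂ ∶ A, 1000₂ ∶ B, 1011₂ ∶ C}.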
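To make the two optimizations concrete, the following sketch computes the root digest of the map in Figure 2 recursively. It is an illustration rather than the production algorithm: toy_hash is a non-cryptographic stand-in, kDefaultDigest is an arbitrary value standing in for the single default digest, and subtree_digest and the other names are hypothetical. Items must be sorted by key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Digest = std::uint64_t;
using Item = std::pair<std::uint64_t, std::string>;  // (key, value)
constexpr Digest kDefaultDigest = 0;  // made-up value for the default digest

// Toy 64-bit FNV-1a hash: a stand-in for a cryptographic hash, NOT secure.
Digest toy_hash(const std::string& bytes) {
    Digest h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

Digest leaf_digest(const Item& item) {
    return toy_hash("L" + std::to_string(item.first) + "|" + item.second);
}

Digest internal_digest(Digest left, Digest right) {
    return toy_hash("I" + std::to_string(left) + "|" + std::to_string(right));
}

// Digest of the subtree at `depth` that holds items[lo, hi). The items must be
// sorted by key and share their first `depth` bits.
Digest subtree_digest(const std::vector<Item>& items, std::size_t lo,
                      std::size_t hi, unsigned depth, unsigned key_bits) {
    if (lo == hi) return kDefaultDigest;              // empty subtree
    if (hi - lo == 1) return leaf_digest(items[lo]);  // single-leaf subtree
    assert(depth < key_bits);  // distinct keys differ within key_bits bits
    // Find the first item whose bit at `depth` (counted from the MSB) is 1.
    std::size_t mid = lo;
    while (mid < hi &&
           ((items[mid].first >> (key_bits - 1 - depth)) & 1U) == 0) {
        ++mid;
    }
    return internal_digest(subtree_digest(items, lo, mid, depth + 1, key_bits),
                           subtree_digest(items, mid, hi, depth + 1, key_bits));
}
```

For the three keys of Figure 2 with key_bits = 4, the recursion evaluates only three internal hashes, whereas the full 4-level tree has 15 internal nodes.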

2.2 Addressable Radix Merkle Tree (ARMT)

The same architecture can be extended to a more general radix tree, since an AMT is just the special
case with radix 𝑟 = 2. In turn, an ARMT is the compressed version of the corresponding AMT
such that every log2(𝑟) levels of the original AMT are compressed into a single level where each node
has at most 𝑟 children, though some children may not exist. We will use ARr MT to denote an
addressable Merkle tree with radix 𝑟. Although any 𝑟 = 2^𝑘, 𝑘 ∈ ℕ+, is possible, most ARMT designs
choose AR16 MT in production, i.e., each internal node has up to 16 children.

The choice is mostly attributed to the following tradeoff between storage and I/O overhead, which
arises because an update of a single leaf node can require all the nodes on the path to the root to
be updated together.

• A larger 𝑟 leads to more storage cost for each update: the size of an internal (non-leaf) node is
proportional to 𝑟, and each update of a leaf node rewrites all the nodes on the path to the
root, resulting in greater write amplification.
• A smaller 𝑟 means a taller tree, which increases I/O when querying or updating the tree. For
example, to traverse 4 levels of the AMT, a radix-16 tree needs only one read but a binary tree
requires four. For a read-heavy service such as Diem Core, 4x reads can add up to a considerable
burden on the underlying storage medium.

As long as this rule is honored, any 𝑟 can be applied to the physical representation or
implementation of an ARMT to adjust the tradeoff between I/O and space cost and fulfil
different requirements.
Figure 3 illustrates an AR16 MT with 8-bit keys. It is noted that when the tree is sparse, nodes may
have some empty children representing non-existence. In this figure, there are only four keys: 0x14,
0x1A, 0x1F, and 0xD5. Instead of bit by bit, traversing downward consumes one nibble at a time.
Similarly, an empty node □ indicates an empty subtree as aforementioned.

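[Figure 3 shows a binary AMT holding the leaves A, B, C and D (left edges labeled 0, right edges
labeled 1) and, marked "Compressed to radix-16 tree", the equivalent AR16 MT, whose edges are
labeled with one nibble each and whose root has the children 0x1 and 0xD.]

Figure 3: AMT compressed to AR16 MT.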
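Consuming a key one nibble at a time can be sketched as follows. The helper names to_nibbles and common_prefix_nibbles are hypothetical; the sketch only shows how an 8-bit key such as 0x14 turns into the radix-16 path [0x1, 0x4] and how many levels two keys share.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits a key into 4-bit nibbles, the high nibble of each byte first, so the
// nibbles appear in the order in which a radix-16 traversal consumes them.
std::vector<std::uint8_t> to_nibbles(const std::vector<std::uint8_t>& key) {
    std::vector<std::uint8_t> nibbles;
    nibbles.reserve(key.size() * 2);
    for (std::uint8_t byte : key) {
        nibbles.push_back(static_cast<std::uint8_t>(byte >> 4));
        nibbles.push_back(static_cast<std::uint8_t>(byte & 0x0F));
    }
    return nibbles;
}

// Number of leading nibbles shared by two keys, i.e. how many radix-16 levels
// their paths from the root have in common.
std::size_t common_prefix_nibbles(const std::vector<std::uint8_t>& a,
                                  const std::vector<std::uint8_t>& b) {
    const std::vector<std::uint8_t> na = to_nibbles(a);
    const std::vector<std::uint8_t> nb = to_nibbles(b);
    std::size_t n = 0;
    while (n < na.size() && n < nb.size() && na[n] == nb[n]) ++n;
    return n;
}
```

With the four keys above, 0x14, 0x1A and 0x1F share the first nibble 0x1 and therefore hang under the same child of the root, while 0xD5 shares no nibble with them.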

2.3 Use case: State Tree

In most mainstream blockchain architectures with an account based model, including the Diem
Blockchain, the global state at a specific time represents all the account information of the blockchain
at that time. The blockchain itself can be viewed as a state machine where the state can be updated
via some transaction execution mechanism.
It is the states at different versions of this state machine that carry the meaningful data recorded
in the chain and agreed upon by all users. In Diem, a state tree for a global state 𝑆𝑖 represents
the state of all accounts at version 𝑖 as a map of key-value pairs, on the assumption that forks are
not allowed. We use versioning to distinguish the states after applying each transaction, so 𝑆𝑖 is the
state of the Diem Blockchain after applying the 𝑖-th transaction. Keys are based on the 256-bit
account addresses, and their corresponding value is the authenticator of the account. Therefore, a
state tree is a perfect use case of AMT: each leaf node represents an account with its data, and the
account address is uniquely linked to the key of that leaf node.

3 Jellyfish Merkle Tree

To hold the global state of the Diem Blockchain, we propose a modified version of AR16 MT, named
Jellyfish Merkle Tree (JMT), with the following features:

• Version-based Node Key In lieu of directly adopting keys as raw node keys to look up nodes
from Storage, JMT chooses a version-based key schema with several benefits:
– Facilitating version-based sharding.
– Greatly lowering compaction overhead in LSM-tree based storage engines such as
RocksDB [10].
– Smaller key size on average.
• Less Complexity JMT has only two physical node types, Internal Node and Leaf Node.
• Concise Proof Format The number of sibling digests in a JMT proof is smaller on average
(Θ(log(number of existent leaves))) than that of the same ARMT without optimizations
(log(number of maximum leaves), i.e., the height of the equivalent AMT), requiring less
computation and space.

3.1 Versioning

Usually, an AMT is used as a single-version database. In blockchain systems like Diem, a version
number is a monotonically increasing unsigned integer that represents the number of transactions
the system has executed and applied to the global state authenticated via an AMT. Since this paper
aims to be application independent, we define a version as a monotonically increasing unsigned
integer used to identify different AMTs between updates, so each update bumps the version by one.
The new tree reuses unchanged portions generated at previous versions, forming a persistent data
structure [11], [12]. If an update modifies 𝑚 out of 𝑛 leaves, on average 𝑂(𝑚 ⋅ log 𝑛) new nodes
are created in the tree that differ from the previous version. This approach allows any upper-layer
application to store multiple versions of the state efficiently by recording only the “delta”. This
feature also allows for efficient recomputation of the state authenticator after performing an update.

3.2 Core Building Blocks

In short, JMT is a modified AR16 MT to which the two optimizations introduced in Section 2.1.1
are applied. Moreover, unlike other Merkle tree designs in production, it has a simpler structure
with a special node key schema that leverages versioning.

3.2.1 Node Key

Since nodes are serialized and stored in a key-value store that supports point lookup, each node is
associated with a unique key identifying that node in Storage. JMT adopts a version-based node key
schema, by splicing version and nibble path, as:

version ‖ nibble path

where version is the version at which the node of this key was created and nibble path is the
sequence of nibbles on the path from the root node to this node following the given key. At any
version, a nibble path itself can uniquely identify a node. Therefore, combining version and nibble
path can pinpoint any node across versions in the whole blockchain history. In Diem Core, the
version is an unsigned 64-bit integer encoded in big endian, but any reasonable numeric type can be
used; the nibble path is encoded so that nibbles nearer to the root come first.
There are two advantages of the version-based node key over the widely adopted hash-based node key:

• Compared to a fixed-length hash-based node key, a JMT node key takes less space in a sparse
Merkle tree. For a JMT with 256-bit keys and 1 billion leaves, the average height is around 8
nibbles. The average size of all the node keys is about 12 bytes, significantly smaller than the 32
bytes of a 256-bit hash key.
• We choose RocksDB, an LSM-tree key-value storage engine similar to LevelDB [13], as the
underlying storage engine for Diem. The versioned JMT can reduce compactions to zero, thereby
achieving the optimal write amplification of one (not including the write-ahead log).
In LevelDB or RocksDB, all data is stored as key-value pairs and all keys are sorted in a
predefined order, usually lexicographic or reverse-lexicographic. If a hash key is adopted,
each insertion places a new key-value pair at a random position within the current key
range. With the JMT node key schema, however, the new nodes generated by each version are
appended sequentially to the current key set in storage in lexicographic order, because the key
schema ensures that keys of a higher version are always lexicographically greater than those of a
lower version. In this case, compaction is no longer necessary as the keys inserted are already
ordered. Our experiment shows that this schema saves IOPS and disk bandwidth by more than
90% in contrast to hash-based node keys.
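Both points can be checked with a small encoding sketch. The byte layout below is an assumption made for illustration (a big-endian 64-bit version followed by the nibbles packed two per byte, with no length field), not necessarily the layout used in Diem Core, and encode_node_key and bytewise_less are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes `version || nibble path`: the version as a big-endian 64-bit
// integer, then the nibbles packed two per byte, root-side nibble first.
// The packing and the missing length field are assumptions of this sketch.
std::string encode_node_key(std::uint64_t version,
                            const std::vector<std::uint8_t>& nibble_path) {
    std::string key;
    for (int shift = 56; shift >= 0; shift -= 8) {
        key.push_back(static_cast<char>((version >> shift) & 0xFF));
    }
    for (std::size_t i = 0; i < nibble_path.size(); i += 2) {
        assert(nibble_path[i] < 16);
        std::uint8_t packed = static_cast<std::uint8_t>(nibble_path[i] << 4);
        if (i + 1 < nibble_path.size()) {
            assert(nibble_path[i + 1] < 16);
            packed = static_cast<std::uint8_t>(packed | nibble_path[i + 1]);
        }
        key.push_back(static_cast<char>(packed));
    }
    return key;
}

// Bytewise lexicographic order on unsigned bytes, the kind of order an
// LSM-tree store sorts its keys by.
bool bytewise_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

Under this layout a node at a depth of 8 nibbles has a key of 8 + 8/2 = 12 bytes, which matches the estimate above, and every key written at version 256 sorts after every key written at version 255, whatever the nibble paths are. A production encoding would also have to disambiguate odd-length paths, for example with a length field.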

3.2.2 Node Types

JMT is made of two kinds of nodes²:

• Internal Node is an interior node that has at least one child node and serves as a stepping-stone
representing one nibble on the path to leaf nodes. As in a normal AR16 MT, each internal node
can hold up to 16 children, compressing a subtree of 4 levels of AMT into one node. The structure
contains the indices, versions, and digests of all the child nodes. If the subtree represented by a
child is empty, the internal node leaves its slot empty to denote an empty node. For example,
if an internal node with key 30 ‖ 0x996F has 4 child nodes, its structural layout can look like:

Index     0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
Version  12                           9              25              18
Digest   𝑑0                          𝑑7              𝑑𝐵              𝑑𝐹

where 𝑑𝑖 denotes the digest of the child node at index 𝑖. In this example, the node key of the
child node at index 0xB would be 25 ‖ 0x996FB. Later in this section we cover the details
of constructing the node key of the next node.
• Leaf Node is a node that stores a user value at the bottom of the tree. Besides the data, it also
contains the key used to query the tree for this node and the digest of the data. The nibble
path field of a leaf node key must be a prefix of its key.

² With only two node types, an implementation cannot gracefully handle the special case where the
whole tree is empty. Diem Core adopts a third node type called ‘Null’ expressly for this case, but it
is not the only workaround.

Lookup When querying the tree at version 𝑣 with key 𝑘 to get the associated value, the following
steps are performed:

1. Get the root node key by an “out-of-band” method and use it as the next node key. In Diem
Core, 𝑣 itself is used as the root node key.
2. Query the database to get the next node by its key.
3. Check the node type:
• If the node is an internal node, read the next nibble 𝑛 in 𝑘 and use it as the index to find
the version of the next node. If the slot is empty, the node of 𝑘 does not exist. Otherwise,
splice the nibble path of the node and the nibble just read to form the next nibble path,
i.e., current nibble path ‖ 𝑛. Finally, splice the next version and the next nibble path to
form the next node key. Then go to step 2.
• If the node is a leaf node, read the stored key and check whether it matches 𝑘. If it
matches, return the stored value. Otherwise, the node of 𝑘 does not exist.
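The lookup steps can be sketched over an in-memory map that stands in for the key-value store. Everything named here (Node, Storage, lookup) is hypothetical, child digests are left out because lookup does not need them, and the sketch requires C++17 for std::optional.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Nibbles = std::vector<std::uint8_t>;
using NodeKey = std::pair<std::uint64_t, Nibbles>;  // (version, nibble path)

struct Node {
    bool is_leaf = false;
    // Internal node: version of the child in each of the 16 slots, if any.
    std::array<std::optional<std::uint64_t>, 16> child_version{};
    // Leaf node: the full key (as nibbles) and the stored value.
    Nibbles key;
    std::string value;
};

using Storage = std::map<NodeKey, Node>;  // stand-in for the key-value store

std::optional<std::string> lookup(const Storage& db, std::uint64_t version,
                                  const Nibbles& key) {
    // Step 1: the root node key is the version with an empty nibble path.
    NodeKey next(version, Nibbles{});
    for (std::size_t depth = 0; depth <= key.size(); ++depth) {
        // Step 2: fetch the next node by its key.
        auto it = db.find(next);
        if (it == db.end()) return std::nullopt;
        const Node& node = it->second;
        // Step 3, leaf case: the stored key decides presence or absence.
        if (node.is_leaf) {
            if (node.key == key) return node.value;
            return std::nullopt;
        }
        // Step 3, internal case: follow the slot of the next nibble.
        if (depth == key.size()) return std::nullopt;  // no nibble left to read
        const std::uint8_t n = key[depth];
        assert(n < 16);
        const auto& slot = node.child_version[n];
        if (!slot) return std::nullopt;  // empty slot: the key is absent
        next.first = *slot;              // next version
        next.second.push_back(n);        // current nibble path || n
    }
    return std::nullopt;
}
```

Note that a child may carry an older version than its parent: the version in the slot, not the query version, is spliced into the next node key.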

Insertion and Update Insertion shares similar steps with lookup, with slightly different behaviors.
Suppose we are going to insert data 𝑑 with key 𝑘 at version 𝑣.

1. Follow the lookup steps until reaching a leaf node, or an internal node whose slot at the next
nibble index 𝑛 is empty.
2. Check the current node type:
• Internal Node: Create a new leaf node from 𝑘 and 𝑑 with its node key as
𝑣 ‖ current nibble path ‖ 𝑛. Fill in version 𝑣 and digest hash(𝑘 ‖ hash(𝑑)) at index 𝑛 of
the current internal node.
• Leaf Node: Check whether 𝑘 equals the key of the node.
– If the keys mismatch, a new leaf node is created likewise. Moreover, a series of
cascading internal nodes is created one by one to represent the nibble path shared
by the unvisited nibbles of both keys, and both leaf nodes are grafted onto the
bottom one as child nodes. Afterwards, the uppermost new internal node takes the
place of the old leaf node in its parent.
– If the keys match, the insertion becomes an update. We just have to replace the
value of the leaf node with 𝑑 and update its node key with the new version 𝑣.
3. Once step 2 is finished, we have reached the deepest nodes of the tree. Because every update
involving a version change affects addressing, all the ancestor nodes visited have to be modified
as well: starting from the leaf node just inserted or updated, update its parent internal nodes
one by one from the bottom up to the root node with the new version and the digests of all
modified child nodes.
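The mismatching-keys case is the least obvious part of insertion, so here is a sketch of one reading of it: given the key of the existing leaf, the new key, and the number of nibbles already visited, it computes where the chain of new internal nodes and the two leaves end up. SplitPlan and plan_leaf_split are hypothetical names, and versions and digests are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Nibbles = std::vector<std::uint8_t>;

struct SplitPlan {
    std::vector<Nibbles> internal_paths;  // new internal nodes, uppermost first
    Nibbles old_leaf_path;                // new position of the existing leaf
    Nibbles new_leaf_path;                // position of the inserted leaf
};

// `visited` is the length of the nibble path of the existing leaf, i.e. how
// many nibbles the lookup consumed before reaching it.
SplitPlan plan_leaf_split(const Nibbles& old_key, const Nibbles& new_key,
                          std::size_t visited) {
    assert(old_key.size() == new_key.size() && old_key != new_key);
    // Length of the common prefix; the loop stops because the keys differ.
    std::size_t common = 0;
    while (old_key[common] == new_key[common]) ++common;
    assert(common >= visited);  // both keys led the lookup to the same leaf
    SplitPlan plan;
    // One internal node per shared nibble path, starting at the position of
    // the old leaf and ending at the full common prefix.
    for (std::size_t len = visited; len <= common; ++len) {
        plan.internal_paths.emplace_back(new_key.begin(), new_key.begin() + len);
    }
    // Both leaves hang under the bottom internal node, one nibble deeper.
    plan.old_leaf_path.assign(old_key.begin(), old_key.begin() + common + 1);
    plan.new_leaf_path.assign(new_key.begin(), new_key.begin() + common + 1);
    return plan;
}
```

For example, if the leaf with key 0x1234 sits at nibble path [1] and key 0x1256 is inserted, internal nodes are created at paths [1] and [1, 2], and the two leaves move to [1, 2, 3] and [1, 2, 5].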

3.2.3 Taking Extension Node Away

Note that we deliberately abandon the extension node introduced by PMT [2], for the reasons
below:

• The efficacy of extension nodes diminishes quickly as the tree grows and in turn becomes less
and less sparse. Two leaf nodes rarely share a long common key prefix, so the case where an
extension node can substitute for a long chain of internal nodes is very uncommon.
• Removing the extension node leads to less complicated code and thus a smaller surface for
potential bugs.

3.3 Miscellaneous

Besides fundamental building blocks and operations, JMT can also support a range proof to validate
the existence of a set of consecutive leaf nodes. Moreover, with proper design, JMT stands ready
and flexible for functionality-wise extensions such as pruning, backup and restore, that are already
implemented in Diem Core.

4 Proof Format and Verification

Since ARMT is just a physical representation of AMT, irrespective of 𝑟, the proof format is always
indistinguishable from that of the equivalent AMT.
With the first optimization mentioned in Section 2.1.1, the proof of a node is simplified by
replacing all empty subtrees with empty nodes whose placeholder digest is 𝐷_default. JMT adopts
this simplification in its design as well.
Whereas this format serves well for a dense Merkle tree, it is verbose and in turn inefficient for a
sparse tree because of the excessive default digests. For example, if an ℎ-bit tree has only one leaf
node, the proof still includes ℎ default digests. In fact, a key and only one default digest are
adequate to prove its existence. Therefore, similar to [6], we further improve the proof format in
JMT by collapsing consecutive levels of empty siblings into one, shrinking the nibble path of the
node. As a result, JMT optimized Merkle proofs fall into three cases under two proof categories:

• Proof of Inclusion
– A leaf node of the key exists.
• Proof of Exclusion
– A leaf node of a different key exists that shares a prefix with the queried key.
Counterintuitively, its existence indirectly proves that the queried node does not exist.
– An empty node is on the path to the node during lookup. This means that the
corresponding position specified by the key currently belongs to an empty subtree, i.e., the
queried node doesn’t exist.

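The proof format can then be summarized in C++ syntax as:

#include <array>
#include <cstdint>
#include <vector>

using HashValue = std::array<std::uint8_t, 32>;  // 256-bit digest

struct Leaf {
    HashValue key;         // key of the leaf node
    HashValue value_hash;  // digest of the value stored in the leaf
};

struct Proof {
    Leaf* leaf;                       // NULL if the path ends at an empty node
    std::vector<HashValue> siblings;  // sibling digests on the path
};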

Table 2: Proof format of Jellyfish Merkle Tree

In this format, HashValue is the digest type that is output by the hash function. The leaf in a proof
consists of the node key, key, and the digest of the stored value, value_hash. siblings holds the
digests that are hashed with the node digest iteratively to obtain the root digest, upper levels
first. Algorithm 1 shows how to verify a proof.

Algorithm 1: JMT Proof Verification
Input: Node key 𝑘, proof 𝑝, the expected root digest 𝑑_root, and the node value 𝑣 that needs to
be verified against (NULL when absence is to be verified).
Output: true if 𝑣 is verified by 𝑝, false otherwise

if 𝑣 ≠ NULL then // Node existence expects an inclusion proof
    if p.leaf ≠ NULL then
        // Prove inclusion with an inclusion proof
        if 𝑘 ≠ p.leaf.key or hash(𝑣) ≠ p.leaf.value_hash then
            return false
        end
    else // Expected an inclusion proof but got an exclusion proof
        return false
    end
else // Node absence expects an exclusion proof
    if p.leaf ≠ NULL then // The inclusion proof of another node
        if 𝑘 = p.leaf.key or CommonPrefixLengthInBits(𝑘, p.leaf.key) < Len(p.siblings) then
            return false
        end
    else // Prove exclusion with an empty node on the path
        // Noop
    end
end
if p.leaf = NULL then 𝑑_cur ← 𝐷_default else 𝑑_cur ← hash(p.leaf)
// Bits and siblings are indexed from 0
for 𝑖 ← Len(p.siblings) − 1 to 0 do
    if the 𝑖-th bit from MSB of 𝑘 = 1 then
        𝑑_cur ← hash(p.siblings[𝑖] ‖ 𝑑_cur)
    else
        𝑑_cur ← hash(𝑑_cur ‖ p.siblings[𝑖])
    end
end
return 𝑑_cur = 𝑑_root
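A compact C++17 sketch of the verification follows, with several assumptions that are not part of the paper: keys are at most 64 bits wide (key_bits is 4 in the examples below), toy_hash is a non-cryptographic stand-in, kDefaultDigest is an arbitrary value for the default digest, the hash prefixes are invented, and siblings are indexed from the root side so that siblings[i] pairs with the 𝑖-th bit from the MSB. A proof whose siblings are listed from the leaf side would have to be reversed before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Digest = std::uint64_t;
constexpr Digest kDefaultDigest = 0;  // made-up value for the default digest

// Toy 64-bit FNV-1a hash: a stand-in for a cryptographic hash, NOT secure.
Digest toy_hash(const std::string& bytes) {
    Digest h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

struct Leaf {
    std::uint64_t key;
    Digest value_hash;
};

struct Proof {
    std::optional<Leaf> leaf;      // empty plays the role of leaf = NULL
    std::vector<Digest> siblings;  // assumed order: root side first
};

// The "V"/"L"/"I" prefixes are invented here to keep the three uses apart.
Digest hash_value(const std::string& v) { return toy_hash("V" + v); }
Digest hash_leaf(const Leaf& l) {
    return toy_hash("L" + std::to_string(l.key) + "|" + std::to_string(l.value_hash));
}
Digest hash_internal(Digest left, Digest right) {
    return toy_hash("I" + std::to_string(left) + "|" + std::to_string(right));
}

// Bit i of a key_bits-wide key, counted from the MSB starting at 0.
bool bit_from_msb(std::uint64_t key, unsigned key_bits, std::size_t i) {
    return (key >> (key_bits - 1 - i)) & 1U;
}

std::size_t common_prefix_bits(std::uint64_t a, std::uint64_t b, unsigned key_bits) {
    std::size_t n = 0;
    while (n < key_bits && bit_from_msb(a, key_bits, n) == bit_from_msb(b, key_bits, n)) ++n;
    return n;
}

// Pass v = std::nullopt to check absence of k, or the value to check presence.
bool verify(std::uint64_t k, unsigned key_bits, const Proof& p, Digest root,
            const std::optional<std::string>& v) {
    if (key_bits == 0 || key_bits > 64 || p.siblings.size() > key_bits) return false;
    if (v) {
        // Presence needs an inclusion proof for exactly this key and value.
        if (!p.leaf || p.leaf->key != k || p.leaf->value_hash != hash_value(*v)) return false;
    } else if (p.leaf) {
        // Absence shown by another leaf: it must sit on the path of k.
        if (p.leaf->key == k ||
            common_prefix_bits(k, p.leaf->key, key_bits) < p.siblings.size()) return false;
    }
    // Fold the siblings into the current digest from the bottom up.
    Digest cur = p.leaf ? hash_leaf(*p.leaf) : kDefaultDigest;
    for (std::size_t i = p.siblings.size(); i-- > 0;) {
        cur = bit_from_msb(k, key_bits, i) ? hash_internal(p.siblings[i], cur)
                                           : hash_internal(cur, p.siblings[i]);
    }
    return cur == root;
}
```

The same function covers all three proof cases: an inclusion proof passes only together with the matching value, while both kinds of exclusion proof pass only when no value is supplied.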

We will give an example of each case to elaborate how JMT proof works. The assumption is that we
want to verify a proof against the node with key 𝑘𝑛 = 0b1000 and data value 𝑣𝑛 = 0x1234. It is
worth noting that we exemplify the proof verification algorithm by using 4-bit paths instead of full
nibble paths to help with understanding, because one nibble represents 4 bits and proof only concerns
bits but not nibbles. In these examples, all the siblings are marked grey in figures and their digests
are denoted as 𝑑𝑝𝑎𝑡ℎ where 𝑝𝑎𝑡ℎ is represented in binary notation.

4.1 Proof of inclusion

Figure 4 gives an example proof showing that the node with key 0b1000 exists. The proof has
leaf.key = 0b1000, leaf.value_hash = hash(0x1234) and siblings = [𝑑101, 𝑑11 (𝐷_default), 𝑑0].
We can then prove that the node with key 0b1000 exists as follows:

1. Check that leaf ≠ NULL and leaf.key = 𝑘𝑛. So if it is a valid proof, it must be an inclusion proof.
2. Get the first Len(siblings) bits of 𝑘𝑛, i.e., 0b100, which is the binary path of the target leaf
node.
3. Compute 𝑑100 = hash(𝑘𝑛 ‖ hash(𝑣𝑛)). Since the 3rd bit from MSB is 0, 𝑑10 =
hash(𝑑100 ‖ 𝑑101); similarly, 𝑑1 = hash(𝑑10 ‖ 𝐷_default) and the root node digest is
𝑑𝑅 = hash(𝑑0 ‖ 𝑑1).
4. Compare 𝑑𝑅 with the expected root digest. If they are the same, the proof is verified; otherwise
it is rejected.

[Figure 4 shows the tree rooted at R with edges labeled 0 and 1; the target leaf is labeled
Hash(0b1000 ‖ Hash(0x1234)).]

Figure 4: Proof of Inclusion

4.2 Proof of Exclusion

There are two possible cases, both of which prove the exclusion of the node: either another node
exists whose key differs from the query key but shares a common prefix with it, or an empty node
exists whose nibble path is a prefix of the query key. The definition may seem obscure, so we give
two examples to distinguish the two cases.

4.2.1 Another node existence

[Figure 5 shows two trees rooted at R with edges labeled 0 and 1; in (a) the existing leaf is labeled
Hash(0b1011 ‖ Hash(0x5678)).]
(a) Proof of exclusion with another leaf node (b) Proof of exclusion with empty node on the path

Figure 5: Proofs of Exclusion

Figure 5a gives an example proof that a node with key 0b1011 and value 0x5678 exists. The proof has
leaf.key = 0b1011, leaf.value_hash = hash(0x5678) and siblings = [𝑑11 (𝐷_default), 𝑑0]. We can
also use it to prove that the node with key 0b1000 does not exist as follows:

1. Check that leaf ≠ NULL in the proof and leaf.key ≠ 𝑘𝑛. So if it is a valid proof, it must be a
non-inclusion proof.
2. Check that the first 2 (the number of siblings) bits of the two keys are identical; otherwise
it is not a valid proof. In our case the bits are 0b10.
3. Compute 𝑑10 = hash(leaf.key ‖ leaf.value_hash). Since the 2nd bit is 0,
𝑑1 = hash(𝑑10 ‖ 𝐷_default). Finally, 𝑑𝑅 = hash(𝑑0 ‖ 𝑑1).
4. Compare 𝑑𝑅 with the expected root digest.

4.2.2 Empty node existence

Figure 5b illustrates an example proof showing that an empty node exists with path 0b1, which in
turn proves that no node with a 0b1-prefixed key exists, including the target node. The proof has
leaf = NULL and siblings = [𝑑0].

1. Check that leaf = NULL. So if it is a valid proof, it must be a non-inclusion proof.
2. Take the first 1 (the number of siblings) bit from MSB of 𝑘𝑛, i.e., 0b1, which is the path of the
empty node.
3. Then 𝑑1 = 𝐷_default and, since the 1st bit is 1, 𝑑𝑅 = hash(𝑑0 ‖ 𝐷_default).
4. Compare 𝑑𝑅 with the expected root digest.

4.3 Proof Generation

Proof generation can piggyback on a lookup. It is as simple as collecting all the siblings while
performing a lookup into a JMT. After a lookup, whether or not the node of the lookup key
exists, the siblings collected on the path together with the final node reached, empty or not, make
up the final proof that can be verified against its existence or absence.

5 Conclusion

We have presented Jellyfish Merkle Tree, a sparse Merkle tree architecture optimized for
computation and space and designed for the Diem Blockchain. Compared to other Merkle trees in
production, JMT reduces the node types to only two and leverages sparseness to benefit LSM-tree
based storage. The space saving can be increased further if delta encoding is enabled for keys.
Moreover, though the proof format and verification algorithm have become more complex, the
smaller proof size and lower verification overhead benefit users in practice, while the algorithm
complexity stays transparent to end users.

5.1 Future Work

The JMT node key structure and encoding are expressly optimized for LSM-tree key-value stores to
avoid compaction. However, when we incorporate pruning features that delete expired data that
falls out of a recent window, compaction is inevitable since the key of a tombstone will definitely
overlap with the current sorted runs. The benefit will then be greatly diminished by the
compactions triggered. We may need to look into a smarter way to relieve the overhead this brings.
In the long run, we hope our efforts on precise specifications can help the evolution of practical
authenticated data structures or inspire better designs and implementations.

References

[1] R. C. Merkle, “A digital signature based on a conventional encryption function,” in Advances in
Cryptology - CRYPTO ’87, A Conference on the Theory and Applications of Cryptographic Techniques,
Santa Barbara, California, USA, August 16-20, 1987, Proceedings, 1987, pp. 369–378.
[2] G. Wood, “Ethereum: A Secure Decentralised Generalised Transaction Ledger.”
http://gavwood.com/paper.pdf, 2016.
[3] C. Cachin, “Architecture of the Hyperledger blockchain fabric,” in Workshop on Distributed
Cryptocurrencies and Consensus Ledgers, 2016.
[4] B. Laurie and E. Kasper, “Revocation transparency,” Google Research, September, p. 33, 2012.
[5] R. Dahlberg, T. Pulls, and R. Peeters, “Efficient sparse Merkle trees,” 2016, pp. 199–215.
[6] H. Park, “Modified Merkle Patricia trie specification (also Merkle Patricia tree).” 2018.
https://medium.com/aergo/releasing-statetrie-a-hash-tree-built-for-high-performance-interoperability-6ce0406b12ae
[7] The Diem Association, “An Introduction to Diem.” https://diem.com/en-us/white-paper/.
[8] Z. Amsden et al., “The Libra Blockchain.”
https://developers.diem.com/docs/technical-papers/the-diem-blockchain-paper.
[9] B. Laurie, “Certificate transparency,” Communications of the ACM, vol. 57, no. 10, pp. 40–46,
2014.
[10] S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strum, “Optimizing space
amplification in RocksDB,” in CIDR 2017, 8th Biennial Conference on Innovative Data Systems
Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017.
http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[11] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan, “Making data structures persistent,”
J. Comput. Syst. Sci., vol. 38, no. 1, pp. 86–124, 1989.
[12] C. Okasaki, “Purely functional data structures,” 1999.
[13] S. Ghemawat and J. Dean, “LevelDB.” 2011. https://github.com/google/leveldb



Revised January 12, 2021
