Nightshade
Contents
1 Sharding Basics
1.1 Validator partitioning and Beacon chains
1.2 Quadratic sharding
1.3 State sharding
1.4 Cross-shard transactions
1.5 Malicious behavior
1.5.1 Malicious forks
1.5.2 Approving invalid blocks
3 Nightshade
3.1 From shard chains to shard chunks
3.2 Consensus
3.3 Block production
3.4 Ensuring data availability
3.4.1 Dealing with lazy block producers
3.5 State transition application
3.6 Cross-shard transactions and receipts
3.6.1 Receipt transaction lifetime
3.6.2 Handling too many receipts
3.7 Chunks validation
3.7.1 State validity challenge
3.7.2 Fishermen and fast cross-shard transactions
3.7.3 Hiding validators
3.7.4 Commit-Reveal
3.7.5 Handling challenges
3.8 Signature Aggregation
3.9 Snapshots Chain
4 Conclusion
Introduction
It is well known that Ethereum, the most used general-purpose blockchain at
the time of this writing, can process fewer than 20 transactions per second
on the main chain. This limitation, coupled with the popularity of the network,
leads to high gas prices (the cost of executing a transaction on the network)
and long confirmation times: although at the time of this writing a new block
is produced approximately every 10-20 seconds, the average time it actually
takes for a transaction to be added to the blockchain is 1.2 minutes,
according to ETH Gas Station. Low throughput, high prices, and high latency
all make Ethereum unsuitable for running services that need to scale with adoption.
The main reason for Ethereum's low throughput is that every node in the
network needs to process every single transaction. Developers have proposed
many solutions to address the issue of throughput on the protocol level. These
solutions can be mostly separated into those that delegate all the computation to
a small set of powerful nodes, and those that have each node in the network do
only a subset of the total amount of work. An example of the former approach is
Solana, which through careful low-level optimizations and GPU usage can reach
hundreds of thousands of simple payment transactions per second processed by
each node in the system. Algorand, SpaceMesh, and Thunder all fit into the former
category, building various improvements in the consensus and the structure of
the blockchain itself to run more transactions than Ethereum, but remaining
bounded by what a single (albeit very powerful) machine can process.
The latter approach, in which the work is split among all the participating
nodes, is called sharding. This is how the Ethereum Foundation currently plans to
scale Ethereum. At the time of this writing the spec is not finalized yet; the most
recent spec can be found at https://github.com/ethereum/eth2.0-specs.
Near Protocol is also built on sharding. The Near team, which includes
several ex-MemSQL engineers responsible for building sharding, cross-shard
transactions and distributed JOINs, as well as five ex-Googlers, has significant
industry expertise in building distributed systems.
This document outlines the general approach to blockchain sharding, the
major problems that need to be overcome, including state validity and data
availability problems, and presents Nightshade, the solution Near Protocol is
built upon that addresses those issues.
1 Sharding Basics
Let’s start with the simplest approach to sharding. In this approach instead of
running one blockchain, we will run multiple, and call each such blockchain a
“shard”. Each shard will have its own set of validators. Here and below we use
a generic term “validator” to refer to participants that verify transactions and
produce blocks, either by mining, such as in Proof of Work, or via a voting-based
mechanism. For now let's assume that the shards never communicate with each
other.

1 This section was previously published at https://near.ai/shard1.
This design, though simple, is sufficient to outline some initial major
challenges in sharding.
Which brings us to the second point: who chooses validators for each shard?
Controlling 5.1% of validators is only damaging if all of those validators
are in the same shard. If validators can't choose which shard they get to validate
in, a participant controlling 5.1% of the validators is highly unlikely to get all
their validators into the same shard, heavily reducing their ability to compromise
the system.
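To make this concrete, the chance that a randomly sampled shard ends up with a malicious majority is a hypergeometric tail. The sketch below computes it exactly; all parameters (10,000 validators, shards of 100) are illustrative, not prescribed by any protocol.

```python
from math import comb

def p_shard_majority_corrupt(total, malicious, shard_size):
    """Probability that a shard of `shard_size` validators, sampled
    uniformly without replacement from `total` validators of which
    `malicious` are corrupt, contains a malicious majority."""
    p = 0.0
    for bad in range(shard_size // 2 + 1, shard_size + 1):
        good = shard_size - bad
        if bad > malicious or good > total - malicious:
            continue  # impossible composition
        p += comb(malicious, bad) * comb(total - malicious, good) \
             / comb(total, shard_size)
    return p

# 5.1% of 10,000 validators are malicious, shards of 100 validators each:
print(p_shard_majority_corrupt(10_000, 510, 100))  # vanishingly small
```

With random assignment the attacker's expected presence in any one shard is about 5 validators out of 100, so a malicious majority is astronomically unlikely; without random assignment the same 5.1% could simply all join one shard.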
Almost all sharding designs today rely on some source of randomness to
assign validators to shards. Randomness on a blockchain is itself a very
challenging topic and is out of scope for this document. For now let's assume
there's some source of randomness we can use. We will cover validator assignment
in more detail in section 2.1.
Both randomness and validator assignment require computation that is not
specific to any particular shard. For that computation, practically all existing
designs have a separate blockchain that is tasked with performing operations
necessary for the maintenance of the entire network. Besides generating random
numbers and assigning validators to the shards, these operations often also
include receiving updates from shards and taking snapshots of them, processing
stakes and slashing in Proof-of-Stake systems, and rebalancing shards when that
feature is supported. Such a chain is called a Beacon chain in Ethereum, a Relay
chain in Polkadot, and the Cosmos Hub in Cosmos.
Throughout this document we will refer to such a chain as a Beacon chain.
The existence of the Beacon chain brings us to the next interesting topic,
quadratic sharding.
1. The necessity to process transactions requires more compute power with
the increased number of transactions being processed;
2. The necessity to relay transactions and blocks requires more network
bandwidth with the increased number of transactions being relayed;
3. The necessity to store data requires more storage as the state grows.
Importantly, unlike the processing power and network, the storage requirement
grows even if the transaction rate (number of transactions processed
per second) remains constant.
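Point 3 can be made concrete with a toy model: compute and bandwidth scale with the transaction rate, while storage scales with the total number of transactions ever processed. All constants below are invented for illustration.

```python
# Toy resource model: storage accumulates over time even at a constant
# transaction rate, unlike compute and bandwidth.

TX_PER_SEC = 20      # assumed constant transaction rate
BYTES_PER_TX = 250   # assumed average on-chain footprint per transaction

def storage_bytes(days):
    """Cumulative storage growth after `days` at a constant tx rate."""
    return TX_PER_SEC * 86_400 * days * BYTES_PER_TX

for days in (1, 365, 5 * 365):
    print(f"{days:>5} days -> {storage_bytes(days) / 1e9:8.1f} GB")
```

Even though the rate never changes, the storage requirement grows linearly with time, which is the asymmetry the list above points out.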
From the above list it might appear that the storage requirement would be
the most pressing, since it is the only one that grows over time even if the
number of transactions per second doesn't change. In practice, however, the
most pressing requirement today is compute power. The entire state of
Ethereum as of this writing is around 100GB, easily manageable by most nodes.
But the number of transactions Ethereum can process per second is around 20,
orders of magnitude less than what is needed for many practical use cases.
Zilliqa is the most well-known project that shards processing but not storage.
Sharding of processing is an easier problem because each node has the entire
state, meaning that contracts can freely invoke other contracts and read any data
from the blockchain. Some careful engineering is needed to make sure updates
from multiple shards updating the same parts of the state do not conflict. In
this regard Zilliqa takes a relatively simplistic approach.
While sharding of storage without sharding of processing was proposed, it is
extremely uncommon. Thus in practice sharding of storage, or State Sharding,
almost always implies sharding of processing and sharding of network.
Practically, under State Sharding the nodes in each shard are building their
own blockchain that contains transactions that affect only the local part of the
global state that is assigned to that shard. Therefore, the validators in the
shard only need to store their local part of the global state and only execute,
and as such only relay, transactions that affect their part of the state. This
partition linearly reduces the requirement on all compute power, storage, and
network bandwidth, but introduces new problems, such as data availability and
cross-shard transactions, both of which we will cover below.
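Under state sharding, each account's state must live in exactly one shard. A minimal sketch of one common assignment scheme, hashing the account identifier, is shown below; the shard count and account-id format are assumptions, and real designs may use ranges or explicit placement instead.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count

def shard_of(account_id: str) -> int:
    """Map an account to the shard that holds its part of the global
    state, via a hash of the account identifier."""
    digest = hashlib.sha256(account_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

print(shard_of("alice.near"), shard_of("bob.near"))
```

A validator of shard s then stores and executes only transactions touching accounts with shard_of(account) == s, which is what makes the linear reduction in compute, storage, and bandwidth possible.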
one account to another within the same shard, the transaction can be processed
entirely by the validators in that shard. If, however, Alice, who resides on shard
#1, wants to send money to Bob, who resides on shard #2, neither the validators
on shard #1 (who won't be able to credit Bob's account) nor the validators on
shard #2 (who won't be able to debit Alice's account) can process the entire
transaction.
There are two families of approaches to cross-shard transactions:
Figure 2: Asynchronous cross-shard transactions
is not negligible, we can't assume that forks won't happen even if a byzantine
consensus was reached among the shard validators, or many blocks were
produced on top of the block with the state change.
This problem has multiple solutions, the most common one being occasional
cross-linking of the latest shard chain block to the beacon chain. The fork
choice rule in the shard chains is then changed to always prefer the cross-linked
chain, and to apply the shard-specific fork choice rule only to blocks published
since the last cross-link.
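The modified fork choice rule can be sketched as follows; the chain representation and the fallback rule (longest chain) are simplifying assumptions.

```python
def fork_choice(chains, latest_crosslink_hash):
    """Pick the canonical shard chain: among candidate chains (each a list
    of block hashes from genesis), prefer those containing the block most
    recently cross-linked to the beacon chain; fall back to the
    shard-specific rule (here simply: longest chain) to break ties."""
    crosslinked = [c for c in chains if latest_crosslink_hash in c]
    candidates = crosslinked or chains  # no cross-link observed: plain rule
    return max(candidates, key=len)

# Chain y is longer, but x contains the cross-linked block "b2" and wins:
x = ["b0", "b1", "b2", "b3"]
y = ["b0", "b1", "b2'", "b3'", "b4'"]
print(fork_choice([x, y], "b2"))  # -> x
```

This captures the key property: a malicious fork created after a cross-link cannot displace the cross-linked chain, no matter how long it grows.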
Figure 4: Attempt to create an invalid block in a non-sharded blockchain
of the last valid block known to them, creating a fork. Since there are fewer
validators in the honest fork, their chain is shorter. However, in a classic
non-sharded blockchain every participant that uses the blockchain for any purpose
is responsible for validating all the blocks they receive and recomputing the state.
Thus any person with any interest in the blockchain would observe that A'
is invalid, immediately discard B', C' and D', and take the
chain A-B as the current longest valid chain.
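A non-sharded full node's behavior in this scenario can be sketched directly; the validity predicate stands in for full re-execution of every block.

```python
def canonical_chain(chains, is_valid_block):
    """A full node's view: a chain is a candidate only if every block in
    it validates; among fully valid chains, take the longest."""
    valid = [c for c in chains if all(is_valid_block(b) for b in c)]
    return max(valid, key=len, default=[])

honest = ["A", "B"]
attack = ["A", "B", "A'", "B'", "C'", "D'"]   # fork built on invalid A'
invalid = {"A'"}  # block the colluding majority produced from a bad state
print(canonical_chain([honest, attack], lambda b: b not in invalid))
```

The attack chain is longer but is discarded entirely because one of its blocks fails validation, which is exactly the guarantee that sharding loses once no participant can validate every shard.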
In a sharded blockchain, however, no participant can validate all the
transactions on all the shards, so they need some way to confirm that at no
point in the history of any shard has an invalid block been included.
Note that unlike with forks, cross-linking to the Beacon chain is not a
sufficient solution, since the Beacon chain doesn't have the capacity to validate
the blocks. It can only validate that a sufficient number of validators in that
shard signed the block (and as such attested to its correctness).
We will discuss solutions to this problem in section 2.2 below.
they interact is the result of some valid sequence of blocks and that such sequence
of blocks is indeed the canonical chain in the shard, a problem that doesn't
exist in a non-sharded blockchain.
We will first present a simple solution to this problem that has been proposed
by many protocols and then analyze how this solution can break and what
attempts have been made to address it.
Figure 6: A blockchain with each block finalized via BFT consensus
This simple solution doesn't work if we assume that the validators can be
corrupted adaptively, which is not an unreasonable assumption. Adaptively
corrupting a single shard in a system with 1000 shards is significantly cheaper
than corrupting the entire system. Therefore, the security of the protocol
decreases linearly with the number of shards. To have certainty in the validity of
a block, we must know that at no point in history did any shard in the system have
a majority of validators colluding; with adaptive adversaries, we no longer have
such certainty. As we discussed in section 1.5, colluding validators can exercise
two basic malicious behaviors: creating forks and producing invalid blocks.
Malicious forks can be addressed by blocks being cross-linked to the Beacon
chain, which is generally designed to have significantly higher security than
the shard chains. Producing invalid blocks, however, is a significantly more
challenging problem to tackle.
Figure 7: A cross-shard transaction from a chain that has an invalid block
1. For validators of Shard #2 to validate the block from which the transaction
is initiated. This won’t work even in the example above, since block C
appears to be completely valid.
2. For validators in Shard #2 to validate some large number of blocks
preceding the block from which the transaction is initiated. Naturally, for
any number of blocks N validated by the receiving shard the malicious
validators can create N+1 valid blocks on top of the invalid block they
produced.
Figure 8: An invalid cross-shard transaction in chainweb-like system that will
get detected
Shard #3 validates all the blocks in Shard #2, but not in Shard #1, and
has no way to detect the malicious block.
There are two major directions for properly solving state validity: fishermen
and cryptographic proofs of computation.
2.3 Fisherman
The idea behind the first approach is the following: whenever a block header
is communicated between chains for any purpose (such as cross-linking to the
beacon chain, or a cross-shard transaction), there’s a period of time during
which any honest validator can provide a proof that the block is invalid. There
are various constructions that enable very succinct proofs that a block is
invalid, so the communication overhead for the receiving nodes is far smaller
than that of receiving a full block.
With this approach, as long as there is at least one honest validator in the
shard, the system is secure.
This is the dominant approach (besides pretending the problem doesn't
exist) among the proposed protocols today. This approach, however, has two
major disadvantages:
1. The challenge period needs to be sufficiently long for the honest validator
to recognize a block was produced, download it, fully verify it, and prepare
the challenge if the block is invalid. Introducing such a period would
significantly slow down the cross-shard transactions.
2. The existence of the challenge protocol creates a new vector of attack:
malicious nodes can spam with invalid challenges. An obvious solution
to this problem is to make challengers deposit some amount of tokens that
is returned if the challenge is valid. This is only a partial solution, as it
might still be beneficial for the adversary to spam the system (and burn
the deposits) with invalid challenges, for example to prevent the valid
challenge from an honest validator from going through. These attacks are
called griefing attacks.
See section 3.7.2 for a way to get around the latter point.
Figure 11: Merkle Tree
Now if a majority of full nodes collude, they can produce a block, valid or
invalid, and send its hash to the light nodes, but never disclose the full content
of the block. There are various ways they can benefit from it. For example,
consider figure 12:
and sent you a header of that block with a Merkle proof of the state in which
you have money (or a Merkle proof of a valid transaction that sends the money
to you). Confident the transaction is finalized, you provide the service.
However, the validators never distribute the full content of the block B to
anyone. As such, the honest validators of block C can’t retrieve the block, and
are either forced to stall the system or to build on top of A, depriving you as a
merchant of money.
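The Merkle proof at the heart of this scenario can be sketched end to end; the tree shape (binary, duplicate-last-on-odd-levels) and the leaf encoding are illustrative choices, not a specific protocol's format.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over the leaf values."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf `index` up to the root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])     # sibling of the current node
        index //= 2
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return proof

def verify(leaf, index, proof, root):
    """What a light node runs: log-sized check against a header's root."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [b"alice:100", b"bob:30", b"carol:7", b"dave:0"]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 1)
print(verify(b"bob:30", 1, proof, root))    # True
print(verify(b"bob:9999", 1, proof, root))  # False
```

Note what the proof does and does not give you: it convinces a light node that a state entry is consistent with a header, but says nothing about whether the full block behind that header was ever published, which is exactly the data availability problem described above.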
When we apply the same scenario to sharding, the definitions of full and
light node generally apply per shard: validators in each shard download every
block in that shard and validate every transaction in that shard, but other
nodes in the system, including those that snapshot shard chains state into the
beacon chain, only download the headers. Thus the validators in the shard are
effectively full nodes for that shard, while other participants in the system,
including the beacon chain, operate as light nodes.
For the fisherman approach we discussed above to work, honest validators
need to be able to download blocks that are cross-linked to the beacon chain.
If malicious validators cross-linked a header of an invalid block (or used it to
initiate a cross-shard transaction), but never distributed the block, the honest
validators have no way to craft a challenge.
We will cover three approaches to address this problem that complement
each other.
Figure 13: Validators need to download state and thus cannot be rotated
frequently
to withhold the parts of the block that were not downloaded by any light node,
thus still making the block unavailable.
One solution is to use a construction called Erasure Codes to make it possible
to recover the full block even if only part of the block is available, as shown
in figure 14.
Both Polkadot and Ethereum Serenity have designs around this idea that
provide a way for light nodes to be reasonably confident the blocks are available.
The Ethereum Serenity approach has a detailed description in [2].
2.5.3 Polkadot's approach to data availability
In Polkadot, like in most sharded solutions, each shard (called a parachain)
snapshots its blocks to the beacon chain (called the relay chain). Say there are
2f + 1 validators on the relay chain. Once a parachain block is produced, its
block producers, called collators, compute an erasure coded version of the block
that consists of 2f + 1 parts such that any f parts are sufficient to reconstruct
the block. They then distribute one part to each validator on the relay chain.
A particular relay chain validator only signs a relay chain block if they have
their part for each parachain block that is snapshotted to that relay chain
block. Thus, if a relay chain block has signatures from 2f + 1 validators, then
as long as no more than f of them violated the protocol, each parachain block
can be reconstructed by fetching the parts from the validators that follow the
protocol. See figure 15.
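The erasure-coding step can be illustrated with polynomial interpolation over a prime field: k data symbols define a degree-(k-1) polynomial, the n parts are its evaluations, and any k parts recover the data. This is a toy sketch, not Polkadot's actual construction; production systems use optimized Reed-Solomon codes, and the field and symbol encoding here are assumptions.

```python
P = 2**127 - 1  # Mersenne prime; the field must exceed any data symbol

def _lagrange_eval(points, x):
    """Evaluate the unique degree < k polynomial through `points` at x (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(symbols, n):
    """Extend k data symbols to n coded parts. Parts 0..k-1 are the data
    itself (a systematic code); the rest are extra evaluations."""
    k = len(symbols)
    base = list(enumerate(symbols))
    return [(x, symbols[x] if x < k else _lagrange_eval(base, x))
            for x in range(n)]

def decode(parts, k):
    """Recover the k data symbols from any k surviving (index, value) parts."""
    assert len(parts) >= k
    pts = parts[:k]
    return [_lagrange_eval(pts, x) for x in range(k)]

data = [12, 345, 6789]                  # k = 3 symbols
parts = encode(data, n=7)               # any 3 of the 7 parts suffice
survivors = [parts[1], parts[4], parts[6]]
print(decode(survivors, k=3))           # -> [12, 345, 6789]
```

The availability argument then follows: if each of n validators holds one part and at most a minority of them withhold theirs, the honest remainder always holds enough parts to reconstruct the block.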
shards, the security of the sharded protocol needs to be designed in such a
way that the system is secure even if some old blocks in some shards become
completely unavailable.
3 Nightshade
3.1 From shard chains to shard chunks
The sharding model with shard chains and a beacon chain is very powerful but
has certain complexities. In particular, the fork choice rule needs to be executed
in each chain separately, and the fork choice rules in the shard chains and the
beacon chain must be built differently and tested separately.
In Nightshade we model the system as a single blockchain, in which each
block logically contains all the transactions for all the shards and changes the
whole state of all the shards. Physically, however, no participant downloads the
full state or the full logical block. Instead, each participant of the network only
maintains the state that corresponds to the shards they validate transactions
for, and the list of all the transactions in the block is split into physical
chunks, one chunk per shard.
Under ideal conditions each block contains exactly one chunk per shard,
which roughly corresponds to the model with shard chains in which the
shard chains produce blocks at the same speed as the beacon chain. However,
due to network delays some chunks might be missing, so in practice each block
contains either one or zero chunks per shard. See section 3.3 for details on how
blocks are produced.
Figure 16: A model with shard chains on the left and with one chain having
blocks split into chunks on the right
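The structure just described can be sketched as a data type; the field names and contents are simplified assumptions, not the actual Nightshade block format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkHeader:
    shard_id: int
    prev_state_root: bytes   # merkle root of the shard state pre-application
    tx_root: bytes           # merkle root of the chunk's transactions
    # the transactions themselves live in the physical chunk, not the block

@dataclass
class Block:
    prev_hash: bytes
    # exactly one slot per shard; None when the chunk for that shard
    # missed this block due to network delays
    chunks: List[Optional[ChunkHeader]]

NUM_SHARDS = 4
b = Block(prev_hash=b"\x00" * 32, chunks=[None] * NUM_SHARDS)
b.chunks[2] = ChunkHeader(shard_id=2,
                          prev_state_root=b"\x01" * 32,
                          tx_root=b"\x02" * 32)
print(sum(c is not None for c in b.chunks))  # -> 1
```

The logical block is the whole `Block`; what any given participant physically downloads is the header plus the full chunks only for the shards they maintain.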
3.2 Consensus
The two dominant approaches to consensus in blockchains today are the
longest (or heaviest) chain, in which the chain with the most work or stake
behind it is considered canonical, and BFT, in which for each block some
set of validators reach a BFT consensus.
In recently proposed protocols the latter is the more dominant approach,
since it provides immediate finality, while in the longest chain approach more
blocks need to be built on top of a block to ensure its finality. Often, for
meaningful security, the time it takes for a sufficient number of blocks to be
built is on the order of hours.
Using BFT consensus on each block also has disadvantages, such as:
A hybrid model, in which the consensus used is some sort of heaviest
chain but some blocks are periodically finalized using a BFT finality gadget,
maintains the advantages of both models. Examples of such BFT finality gadgets
are Casper FFG [6], used in Ethereum 2.0 8 , Casper CBC (see https://vitalik.
ca/general/2018/12/05/cbc_casper.html), and GRANDPA (see https://
medium.com/polkadot-network/d08a24a021b5), used in Polkadot.
Nightshade uses the heaviest chain consensus. Specifically, when a block
producer produces a block (see section 3.3), they can collect signatures from
other block producers and validators attesting to the previous block. See section
3.8 for details on how such a large number of signatures is aggregated.

8 Also see the whiteboard session with Justin Drake for an in-depth overview of Casper
FFG, and how it is integrated with the GHOST heaviest chain consensus: https://www.
youtube.com/watch?v=S262StTwkmo

The weight
of a block is then the cumulative stake of all the signers whose signatures are
included in the block. The weight of a chain is the sum of the block weights.
On top of the heaviest chain consensus we use a finality gadget that uses
the attestations to finalize the blocks. To reduce the complexity of the system,
we use a finality gadget that doesn’t influence the fork choice rule in any way,
and instead only introduces extra slashing conditions, such that once a block is
finalized by the finality gadget, a fork is impossible unless a very large percentage
of the total stake is slashed. Casper CBC is such a finality gadget, and we
presently model with Casper CBC in mind.
We also work on a separate BFT protocol called TxFlow. At the time of
writing this document it is unclear if TxFlow will be used instead of Casper
CBC. We note, however, that the choice of the finality gadget is largely
orthogonal to the rest of the design.
root of the resulting state. b will ultimately only contain a very small header of
the chunk, namely the merkle root of all the applied transactions (see section
3.7.1 for exact details), and the merkle root of the final state.
Throughout the rest of the document we often refer to the block producer
that is responsible for producing a chunk at a particular time for a particular
shard as a chunk producer. A chunk producer is always one of the block producers.
The block producers and the chunk producers rotate each block according
to a fixed schedule. The block producers have an order and repeatedly produce
blocks in that order. E.g., if there are 100 block producers, the first block
producer is responsible for producing blocks 1, 101, 201, etc., and the second is
responsible for producing blocks 2, 102, 202, etc.
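The rotation schedule is a simple modular assignment, sketched below with the 100-producer example from the text; the producer naming is invented for illustration.

```python
def producer_for_height(height, producers):
    """Fixed round-robin schedule: with 100 producers, producers[0]
    makes blocks 1, 101, 201, ..., producers[1] makes 2, 102, 202, ..."""
    return producers[(height - 1) % len(producers)]

producers = [f"bp{i}" for i in range(1, 101)]
print(producer_for_height(1, producers),
      producer_for_height(102, producers))  # -> bp1 bp2
```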
Since chunk production, unlike block production, requires maintaining the
state, and for each shard only sw·w/n block producers maintain the state,
correspondingly only those sw·w/n block producers rotate to create chunks for
that shard. E.g., with the constants above, four block producers are assigned to
each shard, and each block producer will be creating chunks once every four blocks.
Figure 17: Each block contains one or zero chunks per shard, and each chunk
is erasure coded. Each part of the erasure coded chunk is sent to a designated
block producer via a special onepart message
the chunk header contains the merkle root of the merkelized state as of before
the transactions in the chunk are applied.
The transactions are only applied when a full block that includes the chunk
is processed. A participant only processes a block if
3. For each chunk in a shard for which the participant maintains the state,
they have the full chunk.
Once the block is being processed, for each shard for which the participant
maintains the state, they apply the transactions and compute the new state
as of after the transactions are applied. They are then ready to produce the
chunks for the next block, if they are assigned to any shard, since they have
the merkle root of the new merkelized state.
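The apply-then-commit step can be sketched as follows. The flat hash standing in for the merkelized state is a deliberate simplification (a real implementation uses a Merkle tree so that proofs stay logarithmic), and the account names and transaction format are invented for illustration.

```python
import hashlib

def state_root(state):
    """Toy 'merkelized state' commitment: a hash over sorted key-value
    pairs. A real implementation uses a Merkle (Patricia/AVL) tree."""
    acc = hashlib.sha256()
    for k in sorted(state):
        acc.update(f"{k}={state[k]};".encode())
    return acc.hexdigest()

def apply_chunk(state, transactions):
    """Apply a chunk's transactions and return the post-state plus its
    root, which goes into the next chunk header the producer creates."""
    state = dict(state)
    for sender, receiver, amount in transactions:
        if state.get(sender, 0) >= amount:   # skip invalid transfers
            state[sender] -= amount
            state[receiver] = state.get(receiver, 0) + amount
    return state, state_root(state)

pre = {"alice": 100, "bob": 30}
post, root = apply_chunk(pre, [("alice", "bob", 25)])
print(post)  # -> {'alice': 75, 'bob': 55}
```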
Distributing the receipts. Once cpa is ready to produce the chunk for
shard a for block B, they fetch all the receipts generated by applying the
transactions from block A for shard a, and include them into the chunk for shard
a in block B. Once such a chunk is generated, cpa produces its erasure coded
version and all the corresponding onepart messages. cpa knows which block
producers maintain the full state for which shards. For a particular block
producer bp, when distributing the chunk for shard a in block B, cpa includes
in bp's onepart message those receipts that resulted from applying the
transactions in block A for shard a and whose destination is any of the shards
that bp cares about (see figure 17, which shows receipts included in the onepart
message).
Receiving the receipts. Remember that the participants (both block
producers and validators) do not process blocks until they have onepart messages
for each chunk included in the block. Thus, by the time any particular
participant applies block B, they have all the onepart messages that correspond to
chunks in B, and thus they have all the incoming receipts destined for the
shards the participant maintains state for. When applying the state transition
for a particular shard, the participant applies both the receipts that they have
collected for the shard in the onepart messages and all the transactions included
in the chunk itself.
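The routing rule above can be sketched as follows; the message layout and identifiers are illustrative, not the actual wire format.

```python
from collections import defaultdict

def route_receipts(receipts, part_owners, shards_of):
    """Group outgoing receipts by destination shard, then attach to each
    block producer's onepart message exactly the receipts destined for
    shards that producer maintains state for.
    receipts:    list of (destination_shard, payload)
    part_owners: block producer id -> erasure coded part index
    shards_of:   block producer id -> set of shards they track"""
    by_shard = defaultdict(list)
    for dest, payload in receipts:
        by_shard[dest].append(payload)
    messages = {}
    for bp, part_idx in part_owners.items():
        relevant = [r for s in shards_of[bp] for r in by_shard.get(s, [])]
        messages[bp] = {"part": part_idx, "receipts": relevant}
    return messages

receipts = [(0, "pay bob 5"), (2, "pay carol 7"), (2, "pay dave 1")]
msgs = route_receipts(receipts,
                      part_owners={"bp1": 0, "bp2": 1},
                      shards_of={"bp1": {0}, "bp2": {2, 3}})
print(msgs["bp2"]["receipts"])  # -> ['pay carol 7', 'pay dave 1']
```

Because receipts piggy-back on the onepart messages a participant must collect anyway, no extra round trip is needed for cross-shard delivery.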
Figure 19: If all the receipts target the same shard, the shard might not have
the capacity to process them
Figure 20: Delayed receipts processing
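The delayed processing that figures 19 and 20 depict can be sketched as a bounded per-block queue; the capacity unit and FIFO ordering are assumptions made for illustration, not the exact protocol rule.

```python
from collections import deque

def process_receipts(queue, incoming, capacity):
    """One block's worth of receipt processing for a shard: execute up to
    `capacity` receipts (oldest first) and carry the rest over to later
    blocks, so an overloaded destination shard is throttled rather than
    stalled."""
    queue.extend(incoming)
    return [queue.popleft() for _ in range(min(capacity, len(queue)))]

q = deque()
print(process_receipts(q, ["r1", "r2", "r3", "r4", "r5"], capacity=2))
print(process_receipts(q, ["r6"], capacity=2))
print(list(q))
```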
participants that maintain the state. They can be block producers, validators,
or just external witnesses that downloaded the state and validate the shard in
which they store assets.
In this document we assume that the majority of the participants cannot store
the state for a large fraction of the shards. It is worth mentioning, however,
that there are sharded blockchains designed with the assumption that most
participants do have the capacity to store the state for and validate most of
the shards, such as QuarkChain.
Since only a fraction of the participants have the state needed to validate the
shard chunks, it is possible to adaptively corrupt just the participants that have
the state, and apply an invalid state transition.
Multiple sharding designs have been proposed that sample validators every few
days, and within a day any block in the shard chain that has signatures from
more than 2/3 of the validators assigned to that shard is immediately considered
final. With such an approach an adaptive adversary only needs to corrupt 2n/3 + 1
of the validators in a shard chain to apply an invalid state transition, which,
while likely hard to pull off, is not a level of security sufficient for a public
blockchain.
As discussed in section 2.3, the common approach is to allow a certain
window of time after a block is created for any participant that has the state
(whether it's a block producer, a validator, or an external observer) to challenge
its validity. Such participants are called Fishermen. For a fisherman to be able
to challenge an invalid block, it must be ensured that such a block is available to
them. Data availability in Nightshade is discussed in section 3.4.
In Nightshade, once a block is produced, the chunks have not been validated by
anyone but the actual chunk producers. In particular, the block producer that
proposed the block naturally doesn't have the state for most of the shards and
is not able to validate the chunks. When the next block is produced, it contains
attestations (see section 3.2) of multiple block producers and validators,
but since the majority of block producers and validators do not maintain the
state for most shards either, a block with just one invalid chunk will collect
significantly more than half of the attestations and will remain on the heaviest
chain.
To address this issue, we allow any participant that maintains the state of
a shard to submit a challenge on-chain for any invalid chunk produced in that
shard.
3.7.2 Fishermen and fast cross-shard transactions
As discussed in section 2.3, once we assume that the shard chunks (or shard
blocks in the model with shard chains) can be invalid and introduce a challenge
period, it negatively affects finality, and thus cross-shard communication. In
particular, the destination shard of any cross-shard transaction cannot be certain
the originating shard chunk or block is final until the challenge period is over
(see figure 21).
Figure 21: Waiting for the challenge period before applying a receipt
Figure 22: Applying receipts immediately and rolling back the destination
chain if the source chain had an invalid block
the challenge period the adaptive adversary needs to corrupt all the participants
that maintain the state of the shard, including all the validators.
Estimating the likelihood of such an event is extremely complex, since no
sharded blockchain has been live long enough for any such attack to be
attempted. We argue that the probability, while extremely low, is still
non-negligible for a system that is expected to execute multi-million-dollar
transactions and run world-wide financial operations.
There are two main reasons for this belief:
1. Validators of Proof-of-Work chains are primarily incentivized by the
financial upside. If an adaptive adversary offers them more money than the
expected return from operating honestly, it is reasonable to expect that many
validators will accept the offer.
2. Many entities validate Proof-of-Stake chains professionally, and
it is expected that a large percentage of the stake in any chain will be
from such entities. The number of such entities is sufficiently small for an
adaptive adversary to get to know most of them personally and have a
good understanding of their inclination to be corrupted.
We take one step further in reducing the probability of adaptive corruption
by hiding which validators are assigned to which shard. The idea is
remotely similar to the way Algorand [5] conceals validators.
It is critical to note that even if the validators are concealed, as in Algorand
or as described below, the adaptive corruption is still in theory possible. While
the adaptive adversary doesn’t know the participants that will create or validate
a block or a chunk, the participants themselves do know that they will perform
such a task and have a cryptographic proof of it. Thus, the adversary can
broadcast their intent to corrupt, and pay to any participant that will provide
such a cryptographic proof. We note, however, that since the adversary doesn't
know the validators that are assigned to the shard they want to corrupt, they
have no other choice but to broadcast their intent to corrupt a particular shard to
the entire community. At that point it is economically beneficial for any honest
participant to spin up a full node that validates that shard, since there's a high
chance of an invalid block appearing in that shard, which is an opportunity to
create a challenge and collect the associated reward.
To not reveal the validators that are assigned to a particular shard, we do
the following (see figure 24):
Using VRF to get the assignment. At the beginning of each epoch, each validator uses a VRF to get a bitmask of the shards the validator is assigned to. The bitmask of each validator has Sw bits (see section 3.3 for the definition of Sw). The validator then fetches the state of the corresponding shards and, during the epoch, for each block received validates the chunks that correspond to the shards it is assigned to.
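The derivation of the assignment bitmask can be sketched as follows. This is an illustrative sketch only: a SHA-256 hash of a seed stands in for a real VRF output (a real VRF also produces a proof of correct evaluation), and the NUM_SHARDS and S_W values are made up for the example rather than taken from the paper.

```python
import hashlib

NUM_SHARDS = 8   # total number of shards (illustrative value)
S_W = 3          # shards per validator, Sw from section 3.3 (illustrative value)

def assignment_bitmask(vrf_output: bytes, num_shards: int = NUM_SHARDS,
                       s_w: int = S_W) -> list:
    """Derive a deterministic bitmask selecting s_w shards from a VRF output.

    A counter is appended and hashed repeatedly until s_w distinct shards
    have been selected, so the result is fully determined by the VRF output.
    """
    mask = [False] * num_shards
    counter = 0
    while sum(mask) < s_w:
        h = hashlib.sha256(vrf_output + counter.to_bytes(4, "big")).digest()
        mask[int.from_bytes(h, "big") % num_shards] = True
        counter += 1
    return mask
```

Because the VRF output is unpredictable to everyone but the validator, the adversary cannot compute this bitmask for other participants, which is precisely what conceals the assignment.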
Sign on blocks instead of chunks. Since the shard assignment is concealed, the validator cannot sign on chunks. Instead, it always signs on the entire block, thus not revealing which shards it validates. Specifically, when the validator receives a block and validates all the chunks, it either creates a message that attests that all the chunks in all the shards the validator is assigned to are valid (without indicating in any way what those shards are), or a message that contains a proof of an invalid state transition if any chunk is invalid. See section 3.8 for details on how such messages are aggregated, section 3.7.4 for details on how to prevent validators from piggy-backing on messages from other validators, and section 3.7.5 for details on how to reward and punish validators should a successful invalid state transition challenge actually happen.
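The shape of such a message can be sketched as below. The field names and the representation of the invalid-state-transition proof as opaque bytes are assumptions for illustration; the key property is that the message references only the block, never the shards the validator checked.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationMessage:
    """Attests to a whole block without naming any shard, so the
    concealed assignment is not leaked by the message itself."""
    block_hash: bytes
    valid: bool
    # Present only when valid is False: a proof of the invalid state
    # transition for some chunk (structure is illustrative).
    invalid_proof: Optional[bytes] = None

def make_message(block_hash: bytes, chunks_assigned) -> ValidationMessage:
    """chunks_assigned: iterable of (chunk, is_valid, proof_if_invalid)
    covering the chunks of the validator's concealed shards."""
    for _chunk, is_valid, proof in chunks_assigned:
        if not is_valid:
            return ValidationMessage(block_hash, False, proof)
    return ValidationMessage(block_hash, True)
```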
Figure 24: Concealing the validators in Nightshade
3.7.4 Commit-Reveal
A common problem with validators is that a validator can skip downloading the state and actually validating the chunks and blocks, and instead observe the network, see what the other validators submit, and repeat their messages. A validator that follows such a strategy doesn't provide any extra security for the network, yet still collects rewards.
A common solution to this problem is for each validator to provide a proof that they actually validated the block, for example by providing a unique trace of applying the state transition, but such proofs significantly increase the cost of validation.
Instead, we make the validators first commit to the validation result (either the message that attests to the validity of the chunks, or the proof of an invalid state transition), wait for a certain period, and only then reveal the actual validation result, as shown in figure 25. The commit period doesn't intersect with the reveal period, and thus a lazy validator cannot simply copy the messages of honest validators. Moreover, if a dishonest validator committed to a message that attests to the validity of the assigned chunks, and at least one chunk was invalid, then once the chunk is shown to be invalid the validator cannot avoid being slashed, since, as we show in section 3.7.5, the only way to avoid slashing in such a situation is to present a message that contains a proof of the invalid state transition matching the commit.
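A minimal sketch of the commit-reveal mechanism, assuming a standard salted-hash commitment (the paper does not prescribe a particular commitment scheme; SHA-256 and a random salt are assumptions for illustration):

```python
import hashlib
import os

def commit(result: bytes) -> tuple:
    """Return (commitment, salt). The random salt prevents other
    validators from brute-forcing the committed result during the
    commit phase."""
    salt = os.urandom(32)
    return hashlib.sha256(salt + result).digest(), salt

def check_reveal(commitment: bytes, salt: bytes, result: bytes) -> bool:
    """Verify during the reveal phase that the revealed (salt, result)
    pair matches the earlier commitment. A copycat who only saw the
    commitment cannot produce a matching pair."""
    return hashlib.sha256(salt + result).digest() == commitment
```

Because the commitment is binding, a validator who committed to "all chunks valid" cannot later reveal a proof of an invalid state transition instead, which is what makes the slashing in section 3.7.5 unavoidable.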
Figure 26: Handling the challenge
the ECDSA signatures from the block producers and rely on the fact that the block producer was not challenged and slashed.
Using on-chain transactions and merkle proofs for challenges. It can be noted that there's no value in revealing messages from validators if no invalid state transition was detected. Only the messages that contain actual proofs of an invalid state transition need to be revealed, and only for such messages does it need to be shown that they match the prior commit. The message needs to be revealed for two purposes:
1. To actually initiate the rollback of the chain to the moment before the invalid state transition (see section 3.7.5).
2. To prove that the validator didn't attempt to attest to the validity of the invalid chunk.
Revealing such a message involves two complications:
1. The actual commit was not included on chain, only the merkle root of the commit aggregated with other messages. The validator needs to use the merkle path provided by the block producer, together with their original commit, to prove that they committed to the challenge.
2. It is possible that all the validators assigned to the shard with the invalid state transition happen to be assigned to corrupted block producers that are censoring them. To get around this, we allow them to submit their reveals as regular on-chain transactions, bypassing the aggregation.
The latter is only allowed for the proofs of invalid state transition, which are
extremely rare, and thus should not result in spamming the blocks.
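Checking a revealed commit against the on-chain merkle root of aggregated commits can be sketched as follows. The (sibling, side) path encoding and the use of SHA-256 are assumptions for illustration; the paper only requires that some merkle inclusion proof exists.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_merkle_path(leaf: bytes, path: list, root: bytes) -> bool:
    """path is a list of (sibling_hash, side) pairs from leaf to root,
    where side says whether the sibling sits on the 'left' or 'right'
    of the current node."""
    node = h(leaf)
    for sibling, side in path:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root
```

The block producer who aggregated the commits supplies the path; the validator supplies the leaf (their original commit), and anyone can check the result against the root already on chain.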
The final issue that needs to be addressed is that the block producers can choose not to participate in message aggregation, or to intentionally censor particular validators. We make such behavior economically disadvantageous by making the block producer's reward proportional to the number of validators assigned to them. We also note that since the block producer sets of consecutive epochs largely intersect (it's always the top w participants with the highest stake), validators can largely stick to working with the same block producers, and thus reduce the risk of getting assigned to a block producer that censored them in the past.
Near to Ethereum is a requirement, and today verifying BLS signatures to ensure the validity of Near blocks on Ethereum's side is not possible.
Each block in the Nightshade main chain can optionally contain a Schnorr
multisignature on the header of the last block that included such a Schnorr
multisignature. We call such blocks snapshot blocks. The very first block of
every epoch must be a snapshot block. While working on such a multisignature,
the block producers must also accumulate the BLS signatures of the validators
on the last snapshot block, and aggregate them the same way as described in
section 3.8.
Since the block producer set is constant throughout the epoch, validating only the first snapshot block in each epoch is sufficient, assuming that at no point did a large percentage of block producers and validators collude and create a fork.
The first block of the epoch must contain information sufficient to compute
the block producers and validators for the epoch.
We call the subchain of the main chain that only contains the snapshot
blocks a snapshot chain.
Creating a Schnorr multisignature is an interactive process, but since we only need to perform it infrequently, any process, no matter how inefficient, will suffice. The Schnorr multisignatures can be easily validated on Ethereum, thus providing a crucial primitive for a secure way of performing cross-blockchain communication.
To sync with the Near chain, one only needs to download all the snapshot blocks and confirm that the Schnorr signatures are correct (optionally also verifying the individual BLS signatures of the validators), and then sync only the main chain blocks from the last snapshot block onward.
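The snapshot-chain portion of this sync can be sketched as below. The block fields, the assumption that the first snapshot block is trusted (e.g. a checkpoint), and the verify_multisig callback (standing in for actual Schnorr multisignature verification against the producer set derivable from the first block of the epoch) are all illustrative assumptions, not the paper's concrete data structures.

```python
from dataclasses import dataclass

@dataclass
class SnapshotBlock:
    header: bytes
    schnorr_multisig: bytes       # multisignature over the previous snapshot header
    epoch_block_producers: tuple  # producer set that signs the next snapshot block

def sync_snapshot_chain(blocks, verify_multisig) -> bool:
    """Walk the snapshot chain: each snapshot block must carry a valid
    Schnorr multisignature, by the previous block's producer set, over
    the previous snapshot block's header. The first block is trusted."""
    for prev, cur in zip(blocks, blocks[1:]):
        if not verify_multisig(prev.header, cur.schnorr_multisig,
                               prev.epoch_block_producers):
            return False
    return True
```

A light client following this loop touches only one signature check per epoch, which is what makes the snapshot chain cheap to follow compared to the main chain.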
4 Conclusion
In this document we discussed approaches to building sharded blockchains and
covered two major challenges with existing approaches, namely state validity
and data availability. We then presented Nightshade, a sharding design that
powers NEAR Protocol.
The design is a work in progress; if you have comments, questions or feedback on this document, please go to https://near.chat.
References
[1] Will Martino, Monica Quaintance, and Stuart Popejoy. Chainweb: A proof-of-work parallel-chain architecture for massive throughput. 2018.
[2] Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin. Fraud proofs:
Maximising light client security and scaling blockchains with dishonest ma-
jorities. CoRR, abs/1809.09044, 2018.
[3] Songze Li, Mingchao Yu, Salman Avestimehr, Sreeram Kannan, and Pramod
Viswanath. Polyshard: Coded sharding achieves linearly scaling efficiency
and security simultaneously. CoRR, abs/1809.10361, 2018.
[4] Ittai Abraham, Guy Gueta, and Dahlia Malkhi. Hot-stuff the linear, optimal-
resilience, one-message BFT devil. CoRR, abs/1803.05069, 2018.
[5] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai
Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In
Proceedings of the 26th Symposium on Operating Systems Principles, SOSP
’17, pages 51–68, New York, NY, USA, 2017. ACM.
[6] Vitalik Buterin and Virgil Griffith. Casper the friendly finality gadget.
CoRR, abs/1710.09437, 2017.