Supra Containers Whitepaper
v2.0
Abstract
By definition, app-chains expect isolation to efficiently serve specific use cases. Currently, Layer 2s,
Avalanche’s Subnets, Polkadot’s Parachains, and Cosmos Zones are commonly used solutions to
host app-chains. We observe that in an ecosystem with a high-throughput Layer 1 blockchain, it is
sufficient to offer this isolation at the application layer via namespaces. We also observe that most of
the time smart contracts come bundled, as they compose with and call one another within the bundle. Hence, on
Supra, app-chains are offered as Supra Containers, which group smart contracts into namespaces,
disallowing atomic function calls across containers.
On closer investigation, we observe that it is not isolation per se that is desired in the app-chain
setting, but rather a regulated way to access services outside of the app-chain. Such accesses are
implemented as cross-zone (Cosmos), cross-L2, or cross-parachain (Polkadot) transactions.
Supra containers allow container admins to bake their custom cross-container access control
specification into the runtime, so that such services are accessed atomically and efficiently, compared
to the multi-network-hop accesses required in regular app-chains. This completely avoids
fragmentation of liquidity across containers.
This container abstraction lends itself naturally to the parallel execution of transactions in batches
derived from different containers, as transactions belonging to batches of two different containers
are non-conflicting. Through experimentation, we demonstrate the performance impact of Supra’s
containerized parallel execution by comparing it against Aptos' Block-STM and Solana's SeaLevel
techniques at various parameters. In the presence of cross-container transactions, a partial-order-based
approach towards parallel execution is more applicable.
1 Introduction
Blockchains started off primarily with native currency transfers in the form of the Bitcoin
network [16]. Soon, Ethereum [9] introduced smart contract capability and provided full
Turing-complete on-chain programming, and it was heralded as Blockchain 2.0. This on-chain
programming enabled the transition of many traditional financial instruments onto
blockchains and provided the Decentralized Finance (DeFi) space with a clear technological
advantage over legacy financial infrastructure. Though blockchains are being adopted
in various applications like synthetic assets, forex swaps, supply chain management and
validation, etc., DeFi has been the major driving force in the adoption of blockchains.
Over time many blockchains have proliferated, and Turing-complete, on-chain programming
has become the norm. With the rise in the adoption of blockchains and the efficiency gains
achieved in their consensus protocols, the challenge of optimizing transaction execution
times has come to the fore. The traditional sequential execution employed by various
Ethereum clients seems insufficient at scale, and the need to exploit and apply advances
in parallel execution is now pressing [7].
Blockchains such as Solana [26], Sawtooth [19], Sei [20], and Sui [23] adopted the approach
of allowing the clients to specify the read and write set information in the form of the accounts
and objects being accessed. This specification helps their runtimes to infer conflicts and
schedule non-conflicting transactions for parallel execution.
2 App-Chains - An overview
An app-chain is a blockchain dedicated to serving a specific application as well as its community.
A community can employ its own bespoke business models, release its own tokens, create its own
governance mechanisms, and set transaction fees to facilitate smooth and sustainable operations.
In a sense, app-chains are becoming digital, decentralized gated communities. Some examples of
app-chains are dYdX v4 [8] on Cosmos, and DeFi Kingdoms [6] on Avalanche.
Here, isolation is the desideratum. The transactions targeting such an application expect
no influence from transactions meant for another application, be it in the form of escalated
transaction fees or in any other form. It is often seen in popular public blockchains that a
high volume of traffic targeting some contracts increases the transaction fees for transactions
targeting smart contracts that attract low traffic.
Users and developers debate over whether this unfairly impacts certain dApps more than
others. This is one of the reasons the demand for app-chains has grown stronger as more
users enter the space.
Interestingly, we observe that app-chains came into use before public blockchains had
become capable of the high throughput they achieve today. Hence, Layer 2s, sidechains,
Cosmos Zones, Polkadot Parachains, Avalanche Subnets, etc., were favored as solutions to
app-chain requirements; refer to Figure 1. However, in the case of a high-throughput,
fast-finality Layer 1 blockchain, we believe app-chains can be accommodated and offered
inside the Layer 1 alone, rather than looking outwardly for solutions in the Layer 2 space.
This direction led us to the design of Supra containers.
Though isolation is desired, it is not the end-game for app-chains. They do want some
extent of composability, typically for accessing services outside of their app-chain. Usually,
such composability requests go through a relay bridge and involve a multi-network-hop
process leading to high latency. We aim to provide such cross-container accesses in
an atomic execution resulting in low latency. As a benefit, we also avoid fragmenting
liquidity across containers.
3 Supra Containers
Supra containers are basically namespaces, partitions of the blockchain’s global state (shown
in Figure 2), and should not be confused with ‘docker’ containers. They group multiple
related smart contracts under one namespace, regulating any cross-namespace function calls.
An important point to note is that cross-container interactions are gated.
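To make the gating concrete, the sketch below shows one way a runtime could enforce an admin-specified cross-container allow-list. It is a minimal illustration under our own assumptions: the names ContainerId, AccessPolicy, ContainerRegistry, and is_call_allowed are hypothetical and do not correspond to Supra's actual runtime interfaces.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifiers; the actual Supra runtime types are not public.
type ContainerId = u64;
type ModuleAddr = String;

/// A cross-container access policy baked in by a container admin: the set of
/// foreign (container, module) pairs this container is allowed to call into.
struct AccessPolicy {
    allowed: HashSet<(ContainerId, ModuleAddr)>,
}

struct ContainerRegistry {
    /// Which container each deployed module belongs to.
    module_owner: HashMap<ModuleAddr, ContainerId>,
    /// Per-container cross-call policies.
    policies: HashMap<ContainerId, AccessPolicy>,
}

impl ContainerRegistry {
    /// Returns true if `caller_module` may invoke `callee_module`. Calls inside
    /// the same container are always allowed; cross-container calls must be
    /// explicitly whitelisted by the calling container's policy.
    fn is_call_allowed(&self, caller_module: &ModuleAddr, callee_module: &ModuleAddr) -> bool {
        let caller_c = match self.module_owner.get(caller_module) {
            Some(&c) => c,
            None => return false, // unknown module: reject the call
        };
        let callee_c = match self.module_owner.get(callee_module) {
            Some(&c) => c,
            None => return false,
        };
        if caller_c == callee_c {
            return true; // intra-container composition is unrestricted
        }
        self.policies
            .get(&caller_c)
            .map_or(false, |policy| policy.allowed.contains(&(callee_c, callee_module.clone())))
    }
}
```

Intra-container calls pass unconditionally, mirroring the idea that composition inside a container is unrestricted while cross-container composition follows the access-control specification baked in by the container admin.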
We also observe that smart contracts are naturally grouped, especially in a setting where
no dynamic dispatch is used. Ethereum Virtual Machine (EVM) supports dynamic dispatch
but the original Move [3] and most of its variants [2, 24] do not support it. However, Aptos
allows a limited form of dynamic dispatch, in that transactions themselves can contain
scripts so that functions across different smart contracts can be dynamically called.
Supra Containers naturally inherit the advantages of Supra’s network and its services, and
the prominent ones are listed below:
Security The containers are hosted on Supra's Layer 1 blockchain, meaning they share
the security guarantees of Supra's network, which are detailed in the Supra Technology
Overview whitepaper (which will be available soon on Supra.com). There is no exclusive
set of validators responsible for containers; all containers share the same validator set
of the Supra network.
Vertically Integrated and Cross-chain Services As detailed in the Supra Technology Overview
whitepaper, all the services hosted on Supra network such as DORA [5] for Oracle feeds,
VRF [13] for on-chain randomness, zero-block delay Automation, and Supra’s IntraLayer
for cross-chain services become available to the applications deployed in a container. So
this turns any dApp into a Super-dApp from the get-go (the Supra zero-block delay
Automation whitepaper will be available soon on Supra.com).
State Sharding Supra containers complement state sharding. Traditionally, sharding [15] has
been the first approach to scaling execution. However, the complexity of demarcating the
state across various shards so that a consistent global state is maintained has proven to
be the core challenge. Semantically, sharding and the container abstraction are similar.
The container abstraction provides the required demarcation: several containers can be
packed into a single shard, and as long as the state shards respect container boundaries
(i.e., a container is never split across shards), this challenge is tamed.
DeFi Containers also enable the separation of jurisdictions. Consider the cases of public
DeFi and nation state-regulated DeFi. Such cases typically come with a mandate to
disallow interactions between the services of one and the other. Privacy-preserving
and publicly-visible services form another such classification. Here too, the container
abstraction helps the governance bodies of these classes structure their offerings without
violating any mandates, as separation is guaranteed via containerization.
DevX Towards making strides in developer experience (DevX), Supra is working to enable
the simple plug-and-play creation of containers or app-chains. Developers will be
presented with options for which applications their containers should be initialized
with. For example, a developer may choose a lending protocol X from one array of
choices and an Automated Market Maker Y from another, and build a dApp in their
own exclusive container.
In this section we investigate the benefits of grouping smart contracts into containers
towards optimizing execution time.
From the perspective of executing transactions, containers may be seen as an access
specification. We therefore position this study on the spectrum of input specifications of the
storage locations that transactions read or write. At one extreme, transactions come with no
specification; we call this the access-set-oblivious model, e.g., Ethereum [9] and Aptos [1]
transactions. At the other extreme, transactions come with a full specification of their read
and write sets; we call this the access-set-aware model, e.g., Solana [26] and Sui [23]. It
is well noted by the community that transaction sizes bloat and occupy more block space
with full specifications. We find the specification arising from the container abstraction
to lie in the middle of these extremes. As it does not add any extra information to the
transactions themselves, it neither bloats the transactions nor burdens the network bandwidth
needed to disseminate the transaction data. In the rest of the paper, we show that the execution-time
benefits yielded by the container abstraction are significant and comparable with those of the
full specification, and hence this abstraction is a secure and efficient optimization.
We now present Supra-work-sharing queue (Supra-WSQ) scheduler, a simple algorithm
that leverages the container abstraction and achieves maximal parallelism. This algorithm
works analogously to the following real-world scenario. Consider that there are many clerks
and a stream of incoming clients who require a clerk’s signature on a document. The clients
form a queue such that whoever is in the front of the queue walks to any available clerk.
The clerk gets busy working through the document provided by the client and concludes
the session with the client by either signing the document or not. The clerks are never idle,
as the next client approaches a clerk as soon as the current client finishes their session. This
protocol is depicted in Figure 3.
Recall that the container abstraction requires the transactions to target a single container.
We term an ordered list of transactions targeting a single container as a batch of container
transactions, or simply a batch, when the context is clear. Now we apply the aforementioned
real-world scenario to our execution model by mapping clerks to the cores and the clients to
the batches of container transactions.
We present the pseudocode of Supra-work-sharing queue scheduler in Algorithm 1. The
procedure init sets all the cores (note that clerks are analogous to cores) to the available
state. The procedure Supra-work-sharing-queue is the main procedure. It takes the batch
which is at the front of the batch queue (given by head(batches)) and assigns it to an available
core (given by getAvailableIndex(cores)). Note that getAvailableIndex needs to be an
atomic operation. If no cores are available, the algorithm waits until at least one becomes
available. Then, the assigned core initializes the execution of its batch. Upon completion of
the execution, the core becomes available for execution of subsequent batches in the queue.
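As a concrete illustration of the above description, the following sketch realizes Algorithm 1 as a pull-based work-sharing queue: worker threads stand in for cores, and popping the head of the shared queue is the single synchronised step, analogous to the atomic getAvailableIndex. This is a minimal sketch of the idea rather than Supra's implementation; Batch, Tx, and execute_sequentially are placeholders.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Placeholder transaction; how it is applied to state is application-specific.
struct Tx;

/// A batch: an ordered list of transactions targeting a single container.
struct Batch {
    container_id: u64,
    txs: Vec<Tx>,
}

/// Execute a batch in order on the calling core (no aborts, no retries).
fn execute_sequentially(batch: &Batch) {
    for _tx in &batch.txs {
        // apply the transaction to the state slice of `batch.container_id`
    }
}

/// Work-sharing queue: `num_cores` workers repeatedly take the batch at the
/// head of the shared queue and execute it sequentially. Popping the head is
/// the only synchronised step, mirroring the atomic `getAvailableIndex`
/// requirement of Algorithm 1.
fn supra_wsq(batches: Vec<Batch>, num_cores: usize) {
    let queue = Arc::new(Mutex::new(VecDeque::from(batches)));
    let mut workers = Vec::new();
    for _ in 0..num_cores {
        let queue = Arc::clone(&queue);
        workers.push(thread::spawn(move || loop {
            // Take the head batch under the lock, then release the lock before executing.
            let next = queue.lock().unwrap().pop_front();
            match next {
                Some(batch) => execute_sequentially(&batch),
                None => break, // queue drained: this core is done
            }
        }));
    }
    for w in workers {
        w.join().unwrap();
    }
}
```

Because each batch is executed in order by exactly one worker, no aborts or retries arise, provided batches from different containers do not conflict.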
Because the Supra-work-sharing queue algorithm uses multiple cores for the sequential
execution of batches, it is faster than executing all the batches sequentially on a single core
when multiple batches from different containers are to be executed.
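As a rough back-of-envelope comparison (ours, not a claim made in the paper), suppose a block contains B equally sized batches, each taking time t to execute sequentially. With c cores, the work-sharing queue needs about

```latex
T_{\mathrm{WSQ}} \approx \left\lceil B / c \right\rceil \cdot t
\qquad \text{versus} \qquad
T_{\mathrm{seq}} = B \cdot t
```

so the speedup approaches c whenever the block holds many more batches than there are cores (assumption A1 in Section 7), and shrinks towards 1 as the number of distinct containers per block shrinks.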
Consider an alternative approach that uses a scheduling algorithm or other technique
employing all the cores to execute a single batch. After the execution of a batch, the next batch
is executed in a similar fashion. Since we do not preclude the presence of conflicts among the
transactions inside a batch, these techniques naturally and necessarily need to account for
such conflicts and facilitate their resolution using aborts and retries.
As there are no aborts and retries in the Supra-work-sharing queue algorithm, it executes
even faster than these techniques in the presence of many conflicts among transactions. The
important thing to note is that these techniques do not exploit the specification given by
the container abstraction, so it is not truly an apples-to-apples comparison. Nevertheless, because
the Supra-work-sharing queue algorithm leverages the specification provided by the container
abstraction, it executes faster.
To reiterate, it is not the superiority of the Supra-work-sharing queue algorithm that
gives an advantage in execution times, but rather the facility of the specification given by
the container abstraction.
6 Related Work
In this section we review state-of-the-art parallel execution techniques pertaining to
blockchain transactions and position our containerized parallel execution technique among them.
In the access set oblivious model, the transactions do not come with any specification on
the accounts or the storage locations that they may read or write. This is so in the case of
Ethereum, Aptos and many other chains. Ethereum uses a sequential execution mechanism
to process the transactions. However, Aptos utilizes parallel execution of transactions on
multi-core processors by innovating a speculative Software Transactional Memory (STM)
approach, called Block-STM [12].
In Block-STM, there are two tasks for each transaction: an execution task and a
validation task; the scheduler prioritizes tasks of transactions that appear earlier in the
preset serialization order.
A transaction may be executed and validated several times by the scheduler; the term
i-th incarnation refers to the i-th execution of a transaction. An incarnation is terminated
when the scheduler determines that a subsequent re-execution with an increased incarnation
number is required. For each incarnation i of a transaction T_k, Block-STM maintains two
transaction-local buffers: a read-set_k^i and a write-set_k^i. The accounts and the respective
versions read during the execution of the incarnation are recorded in read-set_k^i. The
updates, represented as (account, value) pairs, are stored in write-set_k^i. For an
incarnation i of T_k, write-set_k^i is stored in the multi-version in-memory data structure.
Further, an incarnation of a transaction must pass validation once it executes. The validation
compares the observed versions after re-reading (double-collecting) read-set_k^i. Intuitively,
a successful validation indicates that write-set_k^i is still legitimate and all reads in
read-set_k^i are consistent, while an unsuccessful validation implies that incarnation i must
be aborted. When a transaction T_k is aborted, all transactions after T_k in the preset order
can be committed only if they are successfully validated afterwards.
By executing transactions according to a preset serialization order, Block-STM dynamically
detects dependencies and avoids conflicts during execution, further ensuring that the outcome
is consistent with sequential execution.
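To illustrate the double-collect validation step described above, the sketch below re-reads each location recorded in a read-set and compares the observed versions. It is a heavily simplified illustration under our own assumptions, keeping a single latest write per location instead of Block-STM's full multi-version, index-aware structure, and is not the actual Block-STM code.

```rust
use std::collections::HashMap;

/// Versions in Block-STM are (transaction index, incarnation) pairs.
type TxnIdx = usize;
type Incarnation = usize;
type Version = (TxnIdx, Incarnation);

/// What an incarnation observed while executing: each location together with
/// the version it read (None meaning it read pre-block storage).
struct ReadSet {
    reads: Vec<(String, Option<Version>)>,
}

/// Drastically simplified stand-in for the multi-version memory: only the
/// latest visible write per location is kept here, ignoring index-aware reads.
struct MvMemory {
    latest_write: HashMap<String, Version>,
}

impl MvMemory {
    /// The version a fresh read of `loc` would observe right now.
    fn current_version(&self, loc: &str) -> Option<Version> {
        self.latest_write.get(loc).copied()
    }
}

/// Validation as a "double collect": re-read every location in the read-set
/// and check that the observed versions are unchanged. Any mismatch means the
/// incarnation is stale and must be aborted and re-executed.
fn validate_read_set(read_set: &ReadSet, mem: &MvMemory) -> bool {
    read_set
        .reads
        .iter()
        .all(|(loc, observed)| mem.current_version(loc) == *observed)
}
```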
In the access-set-aware model, the transactions come with a specification of all the
accounts or storage locations that they may read or write. Consider Solana [22]. The
following reflects our best understanding of Solana's documentation and code. We
are happy to revise our paraphrasing of Solana's approach if we are made aware of
any errors or discrepancies. The client’s wallet packs a transaction with all the accounts
that are read from and written to by it, and sends it to the current and the next leader
(leaders are determined in advance) via an RPC node. This read-write set specification
helps in parallelizing the execution as illustrated by a simplified algorithm run by the leader
(mimicking Solana’s parallel execution) given in Algorithm 2. Figure 4 (taken from [25]) also
depicts this scheme visually.
Let us consider the question of the maximum number of iterations. As there are n cores,
there are n lists. Given a list, we denote by k its contention factor, meaning the maximum
number of transactions in that list contending for a common account. Let k_1, k_2, ..., k_n
be the contention factors of the n lists. Then the number of iterations will be
max(k_i), 1 ≤ i ≤ n. For example, with n = 4 lists whose contention factors are 3, 1, 2,
and 2, three iterations are required.
Theoretically, Algorithm 2 performs best when all transactions are executed in a single
iteration; as the number of iterations increases, the algorithm's performance degrades.
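The sketch below conveys the flavour of such access-set-aware scheduling: transactions are greedily packed into iterations so that no two transactions in the same iteration touch a common account in a conflicting way, and the transactions of one iteration can then execute in parallel. The greedy packing is our own simplification, not Solana's actual scheduler nor a verbatim rendering of Algorithm 2; AccessTx and the string account keys are illustrative.

```rust
use std::collections::HashSet;

/// A transaction together with its declared account access lists, as in the
/// access-set-aware model.
struct AccessTx {
    reads: HashSet<String>,
    writes: HashSet<String>,
}

/// Two transactions conflict if one writes an account the other reads or writes.
fn conflicts(a: &AccessTx, b: &AccessTx) -> bool {
    a.writes.iter().any(|k| b.writes.contains(k) || b.reads.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

/// Greedily pack transactions into iterations such that no two transactions in
/// the same iteration conflict. Iterations run one after another; transactions
/// inside an iteration can run in parallel on the available cores.
fn schedule_iterations(txs: Vec<AccessTx>) -> Vec<Vec<AccessTx>> {
    let mut iterations: Vec<Vec<AccessTx>> = Vec::new();
    for tx in txs {
        // First iteration that has no transaction conflicting with `tx`.
        let slot = iterations
            .iter()
            .position(|it| it.iter().all(|other| !conflicts(&tx, other)));
        match slot {
            Some(i) => iterations[i].push(tx),
            None => iterations.push(vec![tx]),
        }
    }
    iterations
}
```

Transactions contending for the same account necessarily land in different iterations, which is why a high contention factor inflates the iteration count and degrades performance.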
As mentioned earlier, the container abstraction falls between the access-set-oblivious
model and the access-set-aware model, and the rest of the paper shows the benefits of such
an abstraction.
7 Experiments
In this section we study the execution latency and execution throughput of the aforementioned
Supra-work-sharing queue (Supra-WSQ) approach by executing on multiple cores.
As discussed in Section 5, containerization groups transactions into different batches,
each of which is executed sequentially, while batches of different containers are executed in
parallel. The sequential execution of transactions within each batch avoids the abort
and re-execution overhead of speculative or optimistic execution. Hence, we can expect that
batching will outperform speculative execution when the workload is inherently sequential,
meaning transactions inside each batch are sequential by nature and form a chain of conflicts.
Naturally, the number of conflicts in a batch increases as we decrease the number of keys.
Further, increasing the number of operations per transaction increases the number of
conflicts in a batch. We construct the high-conflict workload (W_h) by allocating just two
keys and performing 20 operations per transaction. We include 10 transactions per batch
and 100 such batches in a block. This results in a workload with high conflicts, pushing
each batch towards sequential execution. The transactions in every batch operate by
accessing a key space of size 10^3 at random (following the log-normal distribution).
When 10^3 keys are allocated per batch with 2 operations per transaction, we find that
conflicts reduce, and so we generate the moderate-conflict workload (W_m). Here we include
10 transactions per batch and 100 such batches in a block.
When we increase the key space allotted per batch to 10^5, with a uniform random
distribution for the keys that transactions access, the number of conflicts decreases further.
We construct the low-conflict workload (W_l) by fixing 2 operations per transaction, 10
transactions per batch, and 100 such batches in the block, where each batch is allocated
10^5 keys.
All the above-mentioned workloads are succinctly represented in Table 1.
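For concreteness, the sketch below shows one way to generate such synthetic blocks from the stated parameters. The parameter values mirror the description above, while the data structures, the xorshift generator, and the uniform key draw (in place of the log-normal skew mentioned for W_h) are our own simplifications.

```rust
/// Parameters of a synthetic workload, mirroring the description above (Table 1).
struct WorkloadParams {
    keys_per_batch: u64,      // size of the key range each batch draws from
    ops_per_tx: usize,        // operations per transaction
    txs_per_batch: usize,
    batches_per_block: usize,
}

impl WorkloadParams {
    fn new(keys_per_batch: u64, ops_per_tx: usize) -> Self {
        // All three workloads use 10 transactions per batch and 100 batches per block.
        WorkloadParams { keys_per_batch, ops_per_tx, txs_per_batch: 10, batches_per_block: 100 }
    }
}

/// W_h: 2 keys, 20 ops/tx; W_m: 10^3 keys, 2 ops/tx; W_l: 10^5 keys, 2 ops/tx.
fn w_h() -> WorkloadParams { WorkloadParams::new(2, 20) }
fn w_m() -> WorkloadParams { WorkloadParams::new(1_000, 2) }
fn w_l() -> WorkloadParams { WorkloadParams::new(100_000, 2) }

/// One synthetic transaction: the keys it reads and writes.
type SyntheticTx = Vec<u64>;

/// Tiny xorshift PRNG so the sketch needs no external crates.
fn next_rand(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Build a block: batches draw keys from disjoint ranges, so batches of
/// different containers never conflict with one another.
fn generate_block(p: &WorkloadParams, seed: u64) -> Vec<Vec<SyntheticTx>> {
    let mut rng = seed | 1; // xorshift must be seeded with a non-zero value
    (0..p.batches_per_block)
        .map(|b| {
            let base = b as u64 * p.keys_per_batch; // disjoint key range per batch
            (0..p.txs_per_batch)
                .map(|_| {
                    (0..p.ops_per_tx)
                        .map(|_| base + next_rand(&mut rng) % p.keys_per_batch)
                        .collect()
                })
                .collect()
        })
        .collect()
}
```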
Assumptions
Our experiments and evaluation are under the following assumptions:
A1 There are more batches in the block than the number of cores available to start with.
A2 The batches contain transactions targeted for a single container, meaning there are no
cross-container transactions. We plan to consider and analyze cross-container transactions
in the next version of this paper.
A3 The workload is homogeneous, meaning all batches contain an equal number of transactions.
Evaluation
For each workload (from Table 1), we execute blocks for Supra-work-sharing queue (containerized
batching), Block-STM, SeaLevel, and sequential execution. The sequential execution serves as
a baseline. The histogram plots in Figure 5 − Figure 7 show an average execution throughput
(tps) on the Y-axis, while the number of threads varies on the X-axis from 2 to 32. The
experiments were carried out 52 times; the first 2 executions are left as warm-up runs, and
each data point in the histograms is an average of 50 executions.
The histogram plots for the average tps on high conflict workload Wh can be seen in
Figure 5. As shown, Supra-work-sharing queue outperforms the other approaches as the number
of threads increases, since there is no speculative execution overhead of aborts and re-executions. Due to increased
contention at the shared queue, there is a decline in performance for Supra-work-sharing
queue at 32 threads. Although throughput for both Block-STM and SeaLevel increases
with threads, the Block-STM performance falls short of Supra-work-sharing queue due to
increased aborts and re-execution (re-validation) overhead; similarly, in SeaLevel, there will
be more iterations with fewer transactions. The optimal throughput for Supra-work-sharing
queue is ≈ 98k tps at 24 threads. Block-STM and SeaLevel reach a peak throughput of
≈ 54k and ≈ 52k tps at 32 threads, respectively.
Figure 5 Synthetic workload W_h: high conflicts within each batch (average throughput in tps on the Y-axis vs. number of threads on the X-axis, for Supra-WSQ, Block-STM, and SeaLevel).
The workload W_m has fewer conflicts than W_h. Figure 6 shows that all three parallel
execution approaches outperform sequential execution in this case and perform neck and
neck with one another. The execution overhead in Block-STM is moderate because
there aren’t many conflicts; likewise, SeaLevel will have a moderate number of iterations
with multiple transactions. Block-STM performs more effectively with an optimal tps of
≈ 102k at 24 threads. At 32 threads, the maximum throughput of Supra-work-sharing queue
is ≈ 91k, while SeaLevel reaches ≈ 77k tps.
Figure 7 shows the histogram plot for W_l with low conflicts in the batches. The plots
demonstrate how the performance of Block-STM, Supra-work-sharing queue, and SeaLevel
increases with the number of threads, peaking at ≈ 127k, ≈ 100k, and ≈ 114k tps, respectively.
All three approaches reach their maximum throughput because blocks exhibit high concurrency
in this workload. Block-STM outperforms the other approaches as it utilizes the threads
most effectively in this workload.
Figure 6 Synthetic workload W_m: moderate conflicts within each batch (average throughput in tps vs. number of threads, for Supra-WSQ, Block-STM, SeaLevel, and sequential execution).
Figure 7 Synthetic workload W_l: low conflicts within each batch (average throughput in tps vs. number of threads, for Supra-WSQ, Block-STM, SeaLevel, and sequential execution).
Table 2 Details of the historical Solana block workloads

Block Number            | Reads/Tx (avg) | Writes/Tx (avg) | Reads/Tx (max) | Writes/Tx (max) | Batches | Max Txs in a Batch | Total Txs | Keys
205465004               | 5              | 7               | 23             | 37              | 50      | 118                | 382       | 1007
205465000 - 205465049   | 6              | 7               | 34             | 41              | 171     | 1321               | 5626      | 5354
We next evaluate the approaches on historical workloads, using Solana blocks obtained via
the API services of Quicknode [17]. We chose Solana because its transactions carry account
access information, namely an array of accounts to read from (read-set) or write to
(write-set) [22]. This information aids in the construction of the batches for containerized
execution. The real-world block transactions exhibit a heterogeneous workload: when
transactions are batched, the batches are not necessarily of the same size, and transactions
do not all have the same number of reads and writes, unlike in our synthetic tests.
We show the results of two experiments here: one on a randomly selected block, Block
205465004 of the Solana blockchain, and another on a block constructed by concatenating
50 consecutive blocks, Blocks 205465000 to 205465049 of the Solana blockchain. Table 2
shows more details of these tests. We remove the Solana voting transactions from
these blocks. The containerized batches are constructed by grouping conflicting transactions
into the same batch, so that transactions in different batches are mutually non-conflicting:
for any two transactions i and j, if read-set_i ∩ write-set_j, read-set_j ∩ write-set_i, or
write-set_i ∩ write-set_j is non-empty, then i and j are placed in the same batch.
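The sketch below shows one way to derive such batches from access-set-aware transactions: mutually conflicting transactions are merged into the same batch via union-find over the conflict graph, so the resulting batches are pairwise non-conflicting and can be handed to Supra-WSQ as if they came from distinct containers. This is our own reconstruction of the described procedure; HistTx and the pairwise conflict scan are illustrative simplifications.

```rust
use std::collections::{HashMap, HashSet};

/// A historical transaction with its declared account access lists.
struct HistTx {
    read_set: HashSet<String>,
    write_set: HashSet<String>,
}

/// The conflict condition used above: a read-write or write-write overlap.
fn conflict(a: &HistTx, b: &HistTx) -> bool {
    a.read_set.iter().any(|k| b.write_set.contains(k))
        || b.read_set.iter().any(|k| a.write_set.contains(k))
        || a.write_set.iter().any(|k| b.write_set.contains(k))
}

/// Group transactions so that any two (transitively) conflicting transactions
/// end up in the same batch: the batches are the connected components of the
/// conflict graph, hence pairwise non-conflicting. The pairwise O(n^2)
/// conflict scan is kept for clarity only.
fn build_batches(txs: Vec<HistTx>) -> Vec<Vec<HistTx>> {
    let n = txs.len();
    let mut parent: Vec<usize> = (0..n).collect(); // union-find over tx indices
    fn find(parent: &mut Vec<usize>, mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        x
    }
    for i in 0..n {
        for j in (i + 1)..n {
            if conflict(&txs[i], &txs[j]) {
                let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
                parent[ri] = rj;
            }
        }
    }
    // Collect components into batches, preserving the original order inside each.
    let roots: Vec<usize> = (0..n).map(|i| find(&mut parent, i)).collect();
    let mut batches: Vec<Vec<HistTx>> = Vec::new();
    let mut root_to_batch: HashMap<usize, usize> = HashMap::new();
    for (i, tx) in txs.into_iter().enumerate() {
        if !root_to_batch.contains_key(&roots[i]) {
            root_to_batch.insert(roots[i], batches.len());
            batches.push(Vec::new());
        }
        batches[root_to_batch[&roots[i]]].push(tx);
    }
    batches
}
```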
The histogram plots for the throughput in average tps on block 205465004 from Solana
are shown in Figure 8. As shown, the containerized abstraction proves more valuable as
the number of threads increases since there is no speculative execution overhead of abort
and re-execution. Although throughput for both Block-STM and SeaLevel increases with
threads, the Block-STM performance falls short of Supra-work-sharing queue due to increased
aborts and re-execution (re-validation) overhead; similarly, in SeaLevel, there will be more
iterations with fewer transactions. The optimal throughput for Supra-work-sharing queue
is ≈ 3k tps. Block-STM and SeaLevel reach peak throughputs of ≈ 1k and ≈ 800 tps, respectively.
Figure 8 Historical workload W_ht: Block 205465004 (Txs 382, Batches 50, Max Txs in a Batch 118, Keys 1007); throughput in tps vs. number of threads for Supra-WSQ, Block-STM, SeaLevel, and sequential execution.
Figure 9 Historical workload W_ht: Blocks 205465000 - 205465049 (Txs 5626, Batches 171, Max Txs in a Batch 1321, Keys 5354); throughput in tps vs. number of threads for Supra-WSQ, Block-STM, SeaLevel, and sequential execution.
Similarly, on the concatenated Solana blocks 205465000 to 205465049, the performance
trends remain the same for Supra-work-sharing queue and Block-STM, which reach peak
throughputs of ≈ 3.4k and ≈ 642 tps, respectively, as shown in Figure 9. However, SeaLevel's
performance trends upward and achieves an optimal throughput of ≈ 1.5k tps, due to the
increased number of transactions per iteration.
In summary, the analysis of real-world transactions of the Solana blocks shows that the
container abstraction provides more value in parallel execution than existing state-of-the-art
parallelization techniques.
8 Conclusion
References
1 The Aptos blockchain: Safe, scalable, and upgradeable web3 infrastructure. https://aptos.dev/assets/files/Aptos-Whitepaper-47099b4b907b432f81fc0effd34f3b6a.pdf, August 2022. [Online: accessed August 11, 2022].
2 Move on aptos. https://aptos.dev/move/move-on-aptos/. [Online: accessed 22 Sep 2023].
3 Sam Blackshear, Evan Cheng, David L Dill, Victor Gao, Ben Maurer, Todd Nowacki, Alistair
Pott, Shaz Qadeer, Dario Russi Rain, Stephane Sezer, et al. Move: A language with
programmable resources. Libra Assoc, page 1, 2019.