Beigepaper: An Ethereum Technical Specification: 1. Imagining Bitcoin As A Computer
Micah Dameron
Abstract
The Ethereum Protocol is a deterministic but practically unbounded state machine with two basic functions: the first is a globally accessible singleton state, and the second is a virtual machine that applies changes to that state. This paper explains the individual parts that make up these two functions.
nal non-network forces. This property arises from the inherent security of the blockchain which is built by, and

Unit    Ether                    Wei
Finney  Ξ0.001000000000000000    1,000,000,000,000,000
Szabo   Ξ0.000001000000000000    1,000,000,000,000
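The denominations listed here can be related to one another with a small conversion helper. This is a minimal sketch: the Finney and Szabo factors come from the table, the 10^18 factor for ether follows from them, and the names `WEI_PER_UNIT`, `to_wei` and `from_wei` are hypothetical helpers introduced only for illustration.

```python
from decimal import Decimal

# Wei per unit. Finney and Szabo are taken from the table above;
# the ether factor follows from 0.001 ether = 10**15 wei.
WEI_PER_UNIT = {
    "wei": 1,
    "szabo": 10**12,
    "finney": 10**15,
    "ether": 10**18,
}


def to_wei(amount: str, unit: str) -> int:
    """Convert a decimal amount of `unit` (given as a string) to an integer number of wei."""
    return int(Decimal(amount) * WEI_PER_UNIT[unit])


def from_wei(amount_wei: int, unit: str) -> Decimal:
    """Convert an integer number of wei to a Decimal amount of `unit`."""
    return Decimal(amount_wei) / Decimal(WEI_PER_UNIT[unit])


if __name__ == "__main__":
    print(to_wei("1", "finney"))       # one finney expressed in wei
    print(from_wei(10**12, "ether"))   # one szabo expressed in ether
```

Amounts are passed as strings and handled with `Decimal` so that fractional ether values are not distorted by binary floating point.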
2. Memory and Storage Beigepaper – v0.8.5 2019-08-15
a The database backend is accessed by users through an external application, most likely an Ethereum client; see also: state database
b A bytearray is a specific set of bytes [data] that can be loaded into memory. It is a structure for storing binary data, e.g. the contents of a file.
c This permanent data structure makes it possible to easily recall any previous state with its root hash keeping the resources off-chain and minimizing
2.2.1. Recursive Length Prefix Encoding

Recursive Length Prefix Encoding (RLP) imposes a structure on data that intrinsically considers a prefixed hex value to position the data in the state database tree. This hex value determines the depth of a certain piece of data. There are two types of fundamental items one can encode in RLP:2

2.3.1. The Block Header

Description: The information contained in a block besides the transactions list. This consists of:

1. Parent Hash – This is the Keccak-256 hash of the parent block’s header.
2. Ommers Hash – This is the Keccak-256 hash of the ommers list portion of this block.
3. Beneficiary – This is the 20-byte address to which all block rewards are transferred.
4. State Root – This is the Keccak-256 hash of the root node of the state trie, after a block and its transactions are finalized.
14. Mix Hash – This is a 32-byte hash that verifies a sufficient amount of computation has been done on this block.
15. Nonce – This is an 8-byte hash that verifies a sufficient amount of computation has been done on this block.
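To make the Recursive Length Prefix encoding of Section 2.2.1 more concrete, the following is a minimal encoder sketch. It assumes the two fundamental item types are byte strings and (possibly nested) lists of items, and it uses the prefix offsets 0x80 and 0xc0 from the public RLP specification; the function names are hypothetical and the sketch covers encoding only, not decoding.

```python
def _encode_length(length: int, offset: int) -> bytes:
    """Build the RLP prefix for a payload of `length` bytes."""
    if length < 56:
        # Short payload: a single prefix byte carries the length.
        return bytes([offset + length])
    # Long payload: prefix byte gives the size of the length field,
    # followed by the big-endian length itself.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string or a (nested) list of byte strings."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            # A single byte below 0x80 is its own encoding.
            return data
        return _encode_length(len(data), 0x80) + data
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(element) for element in item)
        return _encode_length(len(payload), 0xC0) + payload
    raise TypeError("RLP items must be bytes or lists of RLP items")


if __name__ == "__main__":
    print(rlp_encode(b"dog").hex())
    print(rlp_encode([b"cat", b"dog"]).hex())
```

The recursion on lists is what gives the scheme its name: a list is encoded by encoding each element and then prefixing the concatenated result with its total length.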
Ommer Block Headers – These are ommer block headers (15 components listed above) of this block.

Transaction Series – This is a list of transactions in this block and the only non-header content in the block.

2.3.3. Block Number and Difficulty

The genesis block has its own fixed difficulty. The Homestead difficulty parameter is used to effect a dynamic homeostasis of time between blocks as the time between blocks varies, as discussed below and as implemented in EIP-2. In the Homestead release, the exponential difficulty term causes the difficulty to slowly increase (every 100,000 blocks) at an exponential rate, thus increasing the block time difference and putting time pressure on transitioning to proof-of-stake. This effect, known as the “difficulty bomb” or “ice age”, was implemented earlier in EIP-2 and was explained and delayed in EIP-649. It was also modified in EIP-100 with the use of x, the adjustment factor, and the denominator 9, in order to target the mean block time including uncle blocks. Finally, in the Byzantium release, with EIP-649, the ice age was delayed by creating a fake block number, which is obtained by subtracting three million from the actual block number. In other words, this reduced the time difference between blocks, in order to allow more time to develop proof-of-stake and to prevent the network from “freezing” up.4

2.3.4. Account Creation

Account creation definitively occurs with contract creation. It is related to: init. Lastly, there is the body, which is the EVM code that executes if/when the account containing it receives a message call.

The account state contains details of any particular account during some specified world state. The account state is made up of four variables:

1. nonce – The number of transactions sent from this address, or the number of contract creations made by the account associated with this address.
2. balance – The amount of Wei owned by this account. Stored as a key/value pair inside the state database.
3. storage_root – A 256-bit (32-byte) hash of the root node of a Merkle Patricia Tree that encodes the storage contents of the account.a
4. code_hash – The hash of the EVM code of this account’s contract. Code hashes are stored in the state database. Code hashes are permanent, and the code they identify is executed when the address belonging to that account receives a message call.

2.3.6. Bloom Filter

The Bloom Filter is composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list.

2.3.7. Transaction Receipts

3. Processing and Computation

The basic method for Ethereum accounts to interact with each other. The transaction is a single cryptographically signed instruction sent to the Ethereum network. There are two types of transactions: message calls and contract creations. Transactions lie at the heart of Ethereum, and are entirely responsible for

a A particular path from root to leaf in the state database
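As a rough illustration of the Byzantium delay described in Section 2.3.3, the sketch below computes the “fake” block number and the exponential term that grows every 100,000 blocks. The three-million offset and the 100,000-block period come from the text; the function names are hypothetical, and the exponent offset of 2 is an assumption taken from the commonly published form of the difficulty formula rather than from this paper.

```python
ICE_AGE_DELAY = 3_000_000   # blocks subtracted by the Byzantium delay (from the text)
BOMB_PERIOD = 100_000       # the exponential term steps once per period (from the text)


def fake_block_number(block_number: int) -> int:
    """Block number used by the difficulty bomb after the Byzantium delay."""
    return max(block_number - ICE_AGE_DELAY, 0)


def bomb_term(block_number: int) -> int:
    """Exponential difficulty term, assumed to be floor(2 ** (period - 2))."""
    period = fake_block_number(block_number) // BOMB_PERIOD
    if period < 2:
        # 2 ** -2 and 2 ** -1 both round down to zero.
        return 0
    return 2 ** (period - 2)


if __name__ == "__main__":
    for number in (3_000_000, 3_200_000, 4_000_000, 5_000_000):
        print(number, fake_block_number(number), bomb_term(number))
```

Because the term doubles with every period, pushing the effective block number back by three million keeps it negligible for a long stretch of blocks, which is the delay the text describes.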
All locations in both storage and memory are well-defined initially as zero. The machine does not follow the standard von Neumann architecture. Rather than storing program code in generally-accessible memory or storage, it is stored separately in a virtual ROM interactable only through specialized instructions.

The machine can have exceptional execution for several reasons, including stack underflows and invalid instructions. Like the out-of-gas exception, they do not leave state changes intact. Rather, the machine halts immediately and reports the issue to the execution agent (either the transaction processor or, recursively, the spawning execution environment) which will deal with it separately.

clears an entry in the storage is not only waived, a qualified refund is given; in fact, this refund is effectively paid up-front since the initial usage of a storage location costs substantially more than normal usage.4

3.8. Execution

The execution of a transaction defines the state transition function: stf. However, before any transaction can be executed it needs to go through the initial tests of intrinsic validity.

3.8.1. Intrinsic Validity
A valid transaction execution begins with a permanent change to the state: the nonce of the sender account is increased by one and the balance is decreased by the collateral_gasa which is the amount of gas a transaction is required to pay prior to its execution. The original transactor will differ from the sender if the message call or contract creation comes from a contract account executing code.

After a transaction is executed, there comes a provisional state, which is a pre-final state. Gas used for the execution of individual EVM opcodes prior to their potential addition to the world_state creates:

• Provisional state.
• intrinsic gas, and
• an associated substate.
• The accounts tagged for self-destruction following the transaction’s completion. self_destruct(accounts)
• The logs_series, which creates checkpoints in EVM code execution for frontend applications to explore, and is made up of the logs_set and logs_bloom from the tx_receipt.
• The refund balance.b

Code execution always depletes gas. If gas runs out, an out-of-gas error is signaled (oog) and the resulting state defines itself as an empty set; it has no effect on the world state. This describes the transactional nature of Ethereum. In order to affect the world state, a transaction must go through completely or not at all.

3.8.3. Code Deposit

If the initialization code completes successfully, a final contract-creation cost is paid, the code-deposit cost, c, proportional to the size of the created contract’s code.

3.8.4. Execution Model

Basics: The stack-based virtual machine which lies at the heart of the Ethereum and performs the actions of a computer. This is actually an instantial runtime that executes several substates, as EVM computation instances, before adding the finished result, all calculations having been completed, to the final state via the finalization function.

In addition to the system state and the remaining gas for computation there are several pieces of important information used in the execution environment that the execution agent must provide:

• account_address, the address of the account which owns the code that is executing.
• sender_address, the sender address of the transaction that originated this execution.
• originator_price, the price of gas in the transaction that originated this execution.
• input_data, a byte array that is the input data to this execution; if the execution agent is a transaction, this would be the transaction data.
• account_address, the address of the account which caused the code to be executing; if the execution agent is a transaction, this would be the transaction sender.
• newstate_value, the value, in Wei, passed to this account; if the execution agent is a transaction, this would be the transaction value.4
• code array, the byte array that is the machine code to be executed.4
• block_header, the block header of the present block.
• stack_depth, the depth of the present message-call or contract-creation (i.e. the number of CALLs or CREATEs being executed at present).4

The execution model defines the state_transition function, which can compute the resultant state, the remaining_gas, the accrued_substate and the resultant_output, given these definitions. For the present context, we will define it
a Designated “intrinsic_gas” in the Yellowpaper
b The sstore operation increases the amount refunded by resetting contract storage to zero from some non-zero state.
where the accrued substate is defined as the tuple of the self-destructs_set, the log_series, the touched_accounts and the refunds.4

3.8.5. Execution Overview

of two instructions, which evaluates to the according value: otherwise

In general, we assume the memory, self-destruct set and system state don’t change; however, instructions do typically alter one or several components of these values.
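The shape of the execution model, a function returning the resultant state, the remaining gas, the accrued substate and an output, together with the all-or-nothing behaviour on an out-of-gas error, can be sketched as follows. This is a toy model under stated assumptions: the four substate fields are named after the tuple in the text, but the “operations” are plain storage writes with made-up gas costs, the log series is left unmodelled, and none of this reflects real EVM opcodes or pricing.

```python
import copy
from dataclasses import dataclass, field


class OutOfGas(Exception):
    """Raised when an operation costs more gas than remains."""


@dataclass
class AccruedSubstate:
    # The four components named in the text; only touched_accounts is
    # populated by this toy model.
    self_destructs_set: set = field(default_factory=set)
    log_series: list = field(default_factory=list)
    touched_accounts: set = field(default_factory=set)
    refunds: int = 0


def state_transition(system_state: dict, gas: int, ops: list):
    """Apply hypothetical ops of the form (gas_cost, account, key, value).

    Returns (resultant_state, remaining_gas, accrued_substate, resultant_output).
    """
    provisional = copy.deepcopy(system_state)  # never mutate the input state
    substate = AccruedSubstate()
    try:
        for cost, account, key, value in ops:
            if cost > gas:
                raise OutOfGas(f"need {cost} gas, have {gas}")
            gas -= cost
            provisional.setdefault(account, {})[key] = value
            substate.touched_accounts.add(account)
    except OutOfGas:
        # All or nothing: discard the provisional state and the substate.
        return system_state, 0, AccruedSubstate(), b""
    return provisional, gas, substate, b"ok"


if __name__ == "__main__":
    world = {"a": {"x": 1}}
    print(state_transition(world, 10, [(3, "a", "x", 2), (4, "b", "y", 5)]))
    print(state_transition(world, 5, [(3, "a", "x", 2), (4, "b", "y", 5)]))
```

Working on a deep copy is the design choice that makes the rollback trivial: a failed execution simply never replaces the original state.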
it. Contract creations and message calls have entirely different ways of executing, and are entirely different in their roles in Ethereum. The two concepts are easily conflated. Message calls can result in computation that occurs in the next state rather than the current one. If an account that is currently executing receives a message call, no code will execute, because the account might exist but has no code in it yet. To execute a message call, transactions are required:

3.9. Gas

Gas is the fundamental network cost unit, converted to and from ether as needed to complete the transaction while it is sent. The price of gas is not fixed by the protocol: it is determined at the moment it is needed, by the block and according to the decision of the network’s miners to charge certain fees. Each miner chooses individually which gas prices they want to accept and which they want to reject.
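The relationship between gas, gas price and ether described in Section 3.9 amounts to a multiplication, and a miner's choice of which prices to accept amounts to a filter. The sketch below illustrates both; the function names, the dictionary layout of a pending transaction and the example figures are assumptions made for illustration only.

```python
def transaction_fee_wei(gas_used: int, gas_price_wei: int) -> int:
    """Fee paid for a transaction: gas consumed times the price per unit of gas."""
    if gas_used < 0 or gas_price_wei < 0:
        raise ValueError("gas and gas price must be non-negative")
    return gas_used * gas_price_wei


def acceptable_transactions(pending: list, min_gas_price_wei: int) -> list:
    """Keep transactions at or above one miner's minimum price, best-paying first."""
    accepted = [tx for tx in pending if tx["gas_price"] >= min_gas_price_wei]
    return sorted(accepted, key=lambda tx: tx["gas_price"], reverse=True)


if __name__ == "__main__":
    # Illustrative figures: 21,000 gas at 20 * 10**9 wei per unit of gas.
    print(transaction_fee_wei(21_000, 20 * 10**9))
    pool = [
        {"id": "t1", "gas_price": 5},
        {"id": "t2", "gas_price": 30},
        {"id": "t3", "gas_price": 12},
    ]
    print([tx["id"] for tx in acceptable_transactions(pool, 10)])
```

Each miner would apply its own minimum price, which is why the same transaction can be rejected by one miner and included by another.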
akin to existing schemes, such as that employed in Bitcoin-derived protocols. Since a block header includes the difficulty, the header alone is enough to validate the computation done. Any block contributes toward the total computation or total difficulty of a chain. Thus we define the total difficulty of this_block recursively as the total difficulty of its parent block plus the difficulty of the block itself. The jobs of miners and validators are as follows: Validate (or, if mining, determine) ommers; validate (or, if mining, determine) transactions; apply rewards; verify (or, if mining, compute a valid) state and nonce.

3.11. Ommer Validation

The validation of ommer headers means nothing more than verifying that each ommer header is both a valid header and satisfies the relation of Nth-generation ommer to the present block. The maximum number of ommer headers is two.

or an ommer with the same beneficiary address as the present block, additions are applied cumulatively. The block reward is three ether per block.

State & Nonce Validation: The function that maps a block B to its initiation state, that is, the hash of the root node of a trie of state x. This value is stored in the state database, which is trivial and efficient since the trie is by nature a resilient data structure. And finally define the block_transition_function, which maps an incomplete block to a complete block with a specified dataset. As specified at the beginning of the present work, the state_transition_function, which is defined in terms of, the block_finalisation_function and, the transaction_evaluation_function. As previously detailed, there is the nth corresponding status code, logs and cumulative gas used after each transaction, the fourth component in the tuple, has already been defined in terms of the logs).
determination of some token value pow_token. It is utilised to enforce the security of the blockchain. Since mined blocks produce a reward, the proof-of-work also serves as a wealth distribution mechanism. For this reason, the proof of work function is designed to be as accessible as possible to as many people as possible.

A very basic application of this principle of accessibility is found in combining the traditional Proof-of-Work function with a Memory-Hardness function. By forcing the hashing algorithm to use memory as well as CPU, miners are more likely to use computers than ASICs, meaning that ASIC efficiency will not exclude the person who wants to mine on their home computer from participating in the mining process. To make the Ethereum Blockchain ASIC resistant, the Proof-of-Work mechanism has been designed to be sequential and memory-hard. This means that the nonce requires high amounts of memory and bandwidth such that the memory cannot be used in parallel to discover multiple nonces simultaneously. Therefore, the proof-of-work function takes the form $n \leq \frac{2^{256}}{\text{block\_difficulty}}$, where $n$ is the second item of the array the function returns, evaluated over the new block’s header but without the nonce and mix-hash components. There is the header_nonce, and data_set which are required to compute the mix hash and block_difficulty, the difficulty value of the new block. The proof-of-work function evaluates to an array with the first item being the mix hash and the second item being a pseudorandom number which is cryptographically dependent on the header_nonce and the data_set. The name for this algorithm is Ethash.

3.14.1. Ethash: Seed→Cache→Dataset→Slice

Ethash is the Proof-of-Work algorithm which was used to launch the Ethereum network and bring it through its first few releases. It is in the process of being gradually phased out and replaced with a Proof-of-Stake model. For now it is the latest version of Dagger-Hashimoto, introduced by Vitalik Buterin. The general route that the algorithm takes is as follows: There exists a seed which can be computed for each block by scanning through the block headers up until that point. From the seed, one can compute a pseudorandom cache that is cache_init bytes in initial size. Light clients store the cache. From the cache, a dataset is generated, dataset_size bytes in initial size, with the property that each item in the dataset depends on only a small number of items from the cache. Full clients and miners store the dataset. The dataset grows linearly with time. Mining involves grabbing random slices of the dataset and hashing them together. Verification can be done with low memory by using the cache to regenerate the specific pieces of the dataset that you need, so you only need to store the cache. The large dataset is updated once every epoch (30,000 blocks), so the vast majority of a miner’s effort is spent on reading the dataset, rather than on making changes to it.

3.14.2. Difficulty Mechanism

This mechanism enforces a relative predictability in terms of the time-window between blocks; a smaller period between the last two blocks results in an increase in the difficulty level and thus additional computation required, lengthening the next time-window. Conversely, if the time-window is too large, the difficulty is reduced, reducing the amount of time to the next block. The total_difficultya is the difficulty_state of the entire Ethereum blockchain. The block_difficulty, in contrast, is not a state of the blockchain, but is local–particular to each specific block. You reach the total difficulty by summing the individual difficulty of all previous blocks and then adding the difficulty of the present block.

The GHOST Protocol provides an alternative solution to double-spend attacks from the original solution in Satoshi Nakamoto’s Bitcoin Whitepaper. Nakamoto solved the problem of double-spending by requiring the network to agree on a single block in order to function. For that reason, in the Bitcoin protocol, it’s impossible to submit a “double-spend” block without having at
a Alternatively known as total_computation
least 50% of the network’s mining power to force the longest chain. This is because the network automatically chooses the longest chain. So even if one wanted to submit two spend transactions in a row, the network simply picks whichever one comes first, ignoring the second because it no longer pertains to the longest chain (which now contains the first block that was sent), so the would-be hacker needs to submit a new block, as the first double block is no longer feasible.

The “GHOST Protocol” (which stands for Greedy Heaviest Observed SubTree) rather requires that miners begin mining whichever chain the most other miners are on. Because of differences in network propagation of data about which miners are mining which block, this has a tendency to create more uncles. Nevertheless, in spite of the increased number of uncle blocks, the chain itself is equally secure, and this method allows for higher throughput of transactions than Satoshi’s solution to double-spending does.

state data. This would be a variation on a compression scheme.4

3.17. Scalability

Scalability is a constant concern. Because Ethereum’s state transitions are so broad in terms of possible content, and because its applications and use-cases are so numerous in the number of potential transactions required, scalability is inherently necessary for increased transaction throughput and for more efficient storage and traversal of the chain.

3.17.1. Sharding

Parallelization of transaction combination and block building.

3.17.2. Casper

3.17.3. Plasma
A. EVM Opcodes5
References
[1] Wikipedia contributors, Tree (data structure) - Wikipedia, the free encyclopedia, [Online; accessed 15-December-2017], 2017. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Tree_(data_structure)&oldid=813972413
Glossary
account state – The state of a particular account–a section of the total world state. Comprises: the nonce, balance, storage root, and code hash of the account.

addresses – 20-byte (160-bit) values, specifically the rightmost 20 bytes of the Keccak-256 hash of the RLP-derived mapping which contains the sender’s address and nonce.

beneficiary – The 20-byte (160-bit) address to which all fees collected from the successful mining of a block are transferred.

block header – All the information in a block besides transaction information.

Contract – A piece of EVM Code that may be associated with an Account or an Autonomous Object.

Cryptographic hashing functions – Hash functions make secure blockchains possible by mapping any input to a single possible output that is, for practical purposes, unique to that input.

Ethereum Runtime Environment – The environment which is provided to an Autonomous Object executing in the EVM. Includes the EVM but also the structure of the world state on which the EVM relies for certain I/O instructions including CALL & CREATE.

EVM Assembly – The human readable version of EVM code.

EVM Code – The bytecode that the EVM can natively execute. Used to formally specify the meaning and ramifications of a message to an Account.

serialization – Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the machine state of an object in order to be able to recreate it when needed.

state machine – The term State Machine is reserved for any simple or complex process that moves deterministically from one discrete state to the next.

state database – A database stored off-chain [i.e. on the computer of some user running an Ethereum client] which contains a radix tree mapping bytearrays (organized chunks of binary data) to other bytearrays. The relationships between the nodes on this trie constitute a mapping of Ethereum’s state.

storage root – One aspect of an account’s state: this is the hash of the triea that decides the storage contents of the account.

Storage State – The information particular to a given account that is maintained between the times that the account’s associated EVM Code runs.

transaction – A piece of data, signed by an External Actor. It represents either a Message or a new Autonomous Object. Transactions are recorded into each block of the blockchain.

Acronyms

ERE – Ethereum Runtime Environment.
EVM – Ethereum Virtual Machine.
RLP – Recursive Length Prefix.

a A particular path from root to leaf in the state database