Unit 2
BLOCKCHAIN ARCHITECTURE:
Operation of Bitcoin Blockchain, Blockchain Architecture – Block, Hash,
Distributed P2P, Structure of Blockchain – Consensus mechanisms: Proof of Work
(PoW), Proof of Stake (PoS), Byzantine Fault Tolerance (BFT), Proof of Authority
(PoA) and Proof of Elapsed Time (PoET)
Blockchain Architecture
1. Header: It is used to identify a particular block in the entire blockchain. As part of normal mining activity, miners repeatedly hash the block header, changing the nonce value each time. The block header contains three sets of block metadata.
2. Previous Block Address/Hash: It connects the (i+1)th block to the ith block using a hash. In short, it is a reference to the hash of the previous (parent) block in the chain.
3. Timestamp: It records when the block was created, which ties the data in the block to a point in time. In Bitcoin, the timestamp is stored as the number of seconds since 1 January 1970 (Unix time).
4. Nonce: A nonce is a “number used only once”. It is a central part of the proof of work in the block. The block header, including the nonce, is hashed, and the resulting hash is compared with the current target. The hash is valid only if it is smaller than or equal to the target. Miners test and discard many nonce values per second until they find one that produces a valid hash.
5. Merkle Root: It is the root hash of a Merkle tree, a data structure built from the hashes of the transactions in a block. The Merkle tree summarizes all the transactions in a block by producing a digital fingerprint of the entire set of transactions. It allows users to verify whether or not a transaction is included in a block.
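The nonce search described in item 4 can be sketched in Python as follows. This is a toy illustration that makes two assumptions. First, the "header" is an arbitrary byte string (`b"toy block header"`) with a 4-byte little-endian nonce appended. Second, the target (`2**240`) is deliberately easy, so the loop finishes in well under a second. A real Bitcoin header uses the fixed 80-byte layout described later in this unit and has a far harder target.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until double_sha256(prefix + nonce) <= target.

    header_prefix and target are hypothetical toy values, not a real header.
    Returns (nonce, hash), or None if no nonce in range works.
    """
    for nonce in range(max_nonce):
        candidate = header_prefix + nonce.to_bytes(4, "little")
        digest = double_sha256(candidate)
        # Bitcoin compares the hash, read as a little-endian 256-bit number, with the target
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None

# Hypothetical easy target: about 1 in 65,536 hashes succeeds
easy_target = 2**240
result = mine(b"toy block header", easy_target)
if result is not None:
    nonce, digest = result
    print("nonce:", nonce)
    print("hash :", digest[::-1].hex())  # byte-reversed, as block explorers display it
```

Each failed nonce is simply discarded. Lowering the target makes a valid hash rarer, so more nonces must be tried on average.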
Key Characteristics of Blockchain Architecture
Decentralization: In centralized transaction systems, each transaction has to be validated by a central trusted agency (e.g., the central bank). This naturally creates costs and performance bottlenecks at the central servers. In contrast to the centralized mode, a blockchain does not need a third party. Consensus algorithms are used to keep the data consistent across the decentralized network.
Persistency: Transactions can be validated quickly, and honest miners do not accept invalid transactions. Once transactions are included in the blockchain, it is practically infeasible to delete or roll them back. Invalid transactions are not carried forward.
Anonymity: Each user interacts with the blockchain through a generated address, which does not disclose the user's real identity. Note that blockchain cannot guarantee perfect privacy, because all transactions are permanently and publicly recorded.
Auditability: Blockchain stores users' data based on the Unspent Transaction Output (UTXO) model. Every transaction has to refer to some previous unspent transactions. Once the current transaction is recorded in the blockchain, the state of those referenced unspent transactions switches from unspent to spent. Because of this process, transactions can easily be tracked and verified.
Transparency: In a cryptocurrency such as Bitcoin, every transaction is publicly recorded and can be tracked by address. For security, the real identity of the person behind an address stays hidden, both during and after the transaction. Each transaction is made by the owner of the address it spends from. Because the process is transparent, everyone involved in the transaction can verify it.
Cryptography: The blockchain concept depends heavily on security, so every block on the blockchain network must be secured. To achieve this, blockchain relies on cryptography, chiefly cryptographic hash functions and digital signatures, to protect the data.
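The UTXO bookkeeping behind the Auditability property can be sketched as a small Python model. It simplifies things under a few assumptions. Transactions are identified by made-up strings (`"tx1"`, `"tx2"`), outputs are referenced as `(txid, index)` pairs, and signatures, scripts and fees are left out. The class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TxOut:
    owner: str   # hypothetical address label
    amount: int  # value in satoshis

class UtxoSet:
    """Toy UTXO set: maps (txid, output index) -> TxOut for unspent outputs."""

    def __init__(self):
        self.unspent = {}

    def add_tx(self, txid, inputs, outputs):
        """Record a transaction that spends `inputs` and creates `outputs`.

        inputs:  list of (txid, index) references to earlier outputs
        outputs: list of TxOut objects
        """
        if len(set(inputs)) != len(inputs):
            raise ValueError("the same output is referenced twice")
        # Every input must refer to an output that is still unspent
        for ref in inputs:
            if ref not in self.unspent:
                raise ValueError(f"input {ref} is missing or already spent")
        # Non-coinbase transactions may not create more value than they spend
        if inputs and sum(o.amount for o in outputs) > sum(self.unspent[r].amount for r in inputs):
            raise ValueError("outputs exceed inputs")
        for ref in inputs:
            del self.unspent[ref]  # state switches from unspent to spent
        for index, out in enumerate(outputs):
            self.unspent[(txid, index)] = out

utxos = UtxoSet()
utxos.add_tx("tx1", [], [TxOut("alice", 50)])  # coinbase-like: no inputs
utxos.add_tx("tx2", [("tx1", 0)], [TxOut("bob", 30), TxOut("alice", 20)])
print(sorted(utxos.unspent))  # [('tx2', 0), ('tx2', 1)]

try:
    utxos.add_tx("tx3", [("tx1", 0)], [TxOut("carol", 50)])  # double spend
except ValueError as err:
    print("rejected:", err)
```

Spending an output removes it from the unspent set. That is why the second attempt to spend `("tx1", 0)` is rejected, and why every coin can be traced back through the outputs it came from.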
Block
The blockchain data structure is an ordered, back-linked list of blocks of transactions. The
blockchain can be stored as a flat file, or in a simple database. Blocks are linked “back,” each
referring to the previous block in the chain. The blockchain is often visualized as a vertical stack,
with blocks layered on top of each other and the first block serving as the foundation of the
stack. The visualization of blocks stacked on top of each other results in the use of terms such as
“height” to refer to the distance from the first block, and “top” or “tip” to refer to the most
recently added block.
Each block within the blockchain is identified by a hash, generated using the SHA256
cryptographic hash algorithm on the header of the block. Each block also references a previous
block, known as the parent block, through the “previous block hash” field in the block header. In
other words, each block contains the hash of its parent inside its own header. The sequence of
hashes linking each block to its parent creates a chain going back all the way to the first block
ever created, known as the genesis block.
Although a block has just one parent, it can temporarily have multiple children. Each of the
children refers to the same block as its parent and contains the same (parent) hash in the
“previous block hash” field. Multiple children arise during a blockchain “fork,” a temporary
situation that occurs when different blocks are discovered almost simultaneously by different
miners. Eventually, only one child block becomes part of the blockchain and the “fork” is
resolved. Even though a block may have more than one child, each block can have only one
parent. This is because a block has one single “previous block hash” field referencing its single
parent.
The “previous block hash” field is inside the block header and thereby affects the current block’s
hash. The child’s own identity changes if the parent’s identity changes. When the parent is
modified in any way, the parent’s hash changes. The parent’s changed hash necessitates a change
in the “previous block hash” pointer of the child. This in turn causes the child’s hash to change,
which requires a change in the pointer of the grandchild, which in turn changes the grandchild,
and so on. This cascade effect ensures that once a block has many generations following it, it
cannot be changed without forcing a recalculation of all subsequent blocks. Because such a
recalculation would require enormous computation, the existence of a long chain of blocks
makes the blockchain’s deep history immutable, which is a key feature of bitcoin’s security.
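The cascade effect can be demonstrated with a toy hash chain. For readability, each block here is a small dictionary serialized as JSON and double-hashed with SHA-256, rather than a real 80-byte Bitcoin header. `build_chain` and `is_valid` are hypothetical helper names.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Double-SHA256 of a block's JSON form (a stand-in for hashing a real header)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(hashlib.sha256(encoded).digest()).hexdigest()

def build_chain(payloads):
    """Create a list of blocks, each storing the hash of its parent."""
    chain = []
    prev = "0" * 64  # the genesis block has no parent
    for data in payloads:
        block = {"prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain

def is_valid(chain) -> bool:
    """Check that each block's prev_hash matches the recomputed hash of its parent."""
    for parent, child in zip(chain, chain[1:]):
        if child["prev_hash"] != block_hash(parent):
            return False
    return True

chain = build_chain(["genesis", "tx batch 1", "tx batch 2", "tx batch 3"])
print(is_valid(chain))  # True

chain[1]["data"] = "tampered"  # modify a block that already has descendants
print(is_valid(chain))  # False: block 2's prev_hash no longer matches
```

To make the tampered chain valid again, the `prev_hash` of every later block would have to be recomputed. In Bitcoin, each of those blocks would also need a new proof of work.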
Structure of a Block
A block is a container data structure that aggregates transactions for inclusion in the public
ledger, the blockchain. The block is made of a header, containing metadata, followed by a long
list of transactions that make up the bulk of its size. The block header is 80 bytes, whereas the
average transaction is at least 250 bytes and the average block contains more than 500
transactions. A complete block, with all transactions, is therefore 1,000 times larger than the
block header. The following table describes the structure of a block.
Size                 Field                 Description
4 bytes              Block Size            The size of the block, in bytes, following this field
80 bytes             Block Header          Several fields form the block header
1-9 bytes (VarInt)   Transaction Counter   How many transactions follow
Variable             Transactions          The transactions recorded in this block
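The 1-9 byte size of the Transaction Counter comes from Bitcoin's variable-length integer format, which Bitcoin Core calls CompactSize. Below is a minimal encoder and decoder sketch. Values below 0xFD fit in one byte. Larger values use a marker byte (0xFD, 0xFE or 0xFF) followed by 2, 4 or 8 little-endian bytes. The function names are illustrative.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin variable-length integer."""
    if n < 0:
        raise ValueError("VarInt cannot encode negative numbers")
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    if n <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + n.to_bytes(8, "little")
    raise ValueError("value too large for a VarInt")

def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed) from the start of data."""
    prefix = data[0]
    if prefix < 0xFD:
        return prefix, 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    return int.from_bytes(data[1:1 + width], "little"), 1 + width

for count in (1, 500, 70_000):
    encoded = encode_varint(count)
    print(count, encoded.hex(), decode_varint(encoded))
# 1 01 (1, 1)
# 500 fdf401 (500, 3)
# 70000 fe70110100 (70000, 5)
```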
Block Header
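The block header consists of three sets of block metadata. First, there is a reference to a previous block hash, which connects this block to the previous block in the blockchain. The second set of metadata, namely the difficulty, timestamp, and nonce, relates to the mining competition. The third piece of metadata is the merkle tree root, a data structure used to efficiently summarize all the transactions in the block. The following table describes the structure of a block header.
Size       Field                 Description
4 bytes    Version               A version number to track software/protocol upgrades
32 bytes   Previous Block Hash   A reference to the hash of the previous (parent) block in the chain
32 bytes   Merkle Root           A hash of the root of the merkle tree of this block's transactions
4 bytes    Timestamp             The approximate creation time of this block (seconds since the Unix epoch)
4 bytes    Difficulty Target     The proof-of-work difficulty target for this block
4 bytes    Nonce                 A counter used for the proof-of-work algorithm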
The primary identifier of a block is its cryptographic hash, a digital fingerprint, made by
hashing the block header twice through the SHA256 algorithm. The resulting 32-byte hash is
called the block hash but is more accurately the block header hash, because only the block
header is used to compute it.
For example, 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f is the
block hash of the first bitcoin block ever created. The block hash identifies a block uniquely and
unambiguously and can be independently derived by any node by simply hashing the block
header.
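Because the block hash is just the double-SHA256 of the 80-byte header, any node can re-derive it. The sketch below rebuilds the genesis block header from the field values in this unit's JSON listing of block 0 (version, merkle root, time, bits, nonce) and then hashes it. It assumes Bitcoin's serialization conventions: integer fields are little-endian, and 32-byte hashes are stored in the reverse of the byte order in which they are usually displayed.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def header_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce) -> str:
    """Serialize an 80-byte block header and return its hash in display (hex) form."""
    header = (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]    # hashes are stored byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )
    assert len(header) == 80
    return double_sha256(header)[::-1].hex()   # reverse again for display

genesis = header_hash(
    version=1,
    prev_hash_hex="00" * 32,  # the genesis block has no parent
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1d00ffff,
    nonce=2083236893,
)
print(genesis)
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```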
The block hash is not actually included inside the block’s data structure, neither when the block
is transmitted on the network, nor when it is stored in a node’s persistent storage as part of the
blockchain. Instead, the block’s hash is computed by each node as the block is received from the
network. The block hash might be stored in a separate database table as part of the block’s
metadata, to facilitate indexing and faster retrieval of blocks from disk.
A second way to identify a block is by its position in the blockchain, called the block height. The first block ever created, the genesis block, is at block height 0 (zero). It is the same block identified above by the block hash
000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f.
A block can thus be identified two ways: by referencing the block hash or by referencing the
block height. Each subsequent block added “on top” of that first block is one position “higher” in
the blockchain.
The user can search for that block hash on any block explorer website, such as blockchain.info, and will find a page describing the contents of this block, with a URL containing that hash:
https://blockchain.info/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
The details of this block, in JSON form, are:
{
"hash" : "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
"confirmations" : 308321,
"size" : 285,
"height" : 0,
"version" : 1,
"merkleroot" : "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
"tx" : [
"4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
],
"time" : 1231006505,
"nonce" : 2083236893,
"bits" : "1d00ffff",
"difficulty" : 1.00000000,
"nextblockhash" : "00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048"
}
Bitcoin full nodes maintain a local copy of the blockchain, starting at the genesis block. The
local copy of the blockchain is constantly updated as new blocks are found and used to extend
the chain. As a node receives incoming blocks from the network, it will validate these blocks and
then link them to the existing blockchain. To establish a link, a node will examine the incoming
block header and look for the “previous block hash.”
Let’s assume, for example, that a node has 277,314 blocks in the local copy of the blockchain.
The last block the node knows about is block 277,314, with a block header hash
of 00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249.
The bitcoin node then receives a new block from the network, which it parses as follows:
{
"size" : 43560,
"version" : 2,
"previousblockhash" :
"00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249",
"merkleroot" :
"5e049f4030e0ab2debb92378f53c0a6e09548aea083f3ab25e1d94ea1155e29d",
"time" : 1388185038,
"difficulty" : 1180923195.25802612,
"nonce" : 4215469401,
"tx" : [
"257e7497fb8bc68421eb2c7b699dbab234831600e7352f0d9e6522c7cf3f6c77",
"05cfd38f6ae6aa83674cc99e4d75a1458c165b7ab84725eda41d018a09176634"
]
}
Looking at this new block, the node finds the previousblockhash field, which contains the hash
of its parent block. It is a hash known to the node, that of the last block on the chain at height
277,314. Therefore, this new block is a child of the last block on the chain and extends the
existing blockchain. The node adds this new block to the end of the chain, making the blockchain
longer with a new height of 277,315.
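The linking step can be sketched as follows. `LocalChain` is a hypothetical, heavily simplified model of a node's state that only tracks block hashes and heights. A real node would also fully validate the block (proof of work, transactions) before linking it. The text does not give the new block's own hash, so a placeholder string is used for it.

```python
class LocalChain:
    """Hypothetical minimal node state: known block hashes and their heights."""

    def __init__(self, tip_hash: str, tip_height: int):
        self.height_of = {tip_hash: tip_height}
        self.tip_hash = tip_hash

    def receive(self, block_hash: str, block: dict) -> str:
        """Link an incoming block using its previousblockhash field."""
        parent = block["previousblockhash"]
        if parent == self.tip_hash:
            # Child of the current tip: extend the chain by one
            self.height_of[block_hash] = self.height_of[parent] + 1
            self.tip_hash = block_hash
            return f"extends chain to height {self.height_of[block_hash]}"
        if parent in self.height_of:
            return "child of an earlier block: possible fork"
        return "parent unknown: orphan block"

node = LocalChain(
    "00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249", 277_314
)
incoming = {
    "previousblockhash": "00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249"
}
# "<hash of the new block>" is a placeholder; the real value is not given in the text
print(node.receive("<hash of the new block>", incoming))  # extends chain to height 277315
```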
The following Figure shows the chain of three blocks, linked by references in
the previousblockhash field.
Merkle Trees
Each block in the bitcoin blockchain contains a summary of all the transactions in the block,
using a merkle tree.
A merkle tree, also known as a binary hash tree, is a data structure used for efficiently
summarizing and verifying the integrity of large sets of data. Merkle trees are binary trees
containing cryptographic hashes. The term “tree” is used in computer science to describe a
branching data structure, but these trees are usually displayed upside down with the “root” at the
top and the “leaves” at the bottom of a diagram.
Merkle trees are used in bitcoin to summarize all the transactions in a block, producing an
overall digital fingerprint of the entire set of transactions, providing a very efficient process to
verify whether a transaction is included in a block. A Merkle tree is constructed by recursively
hashing pairs of nodes until there is only one hash, called the root, or merkle root. The
cryptographic hash algorithm used in bitcoin’s merkle trees is SHA256 applied twice, also
known as double-SHA256.
When N data elements are hashed and summarized in a merkle tree, you can check to see if any
one data element is included in the tree with at most 2*log2(N) calculations, making this a very
efficient data structure.
The merkle tree is constructed bottom-up. In the following example, we start with four
transactions, A, B, C and D, which form the leaves of the Merkle tree, as shown in the following
Figure.
The transactions are not stored in the merkle tree; rather, their data is hashed and the resulting hash is stored in each leaf node as $H_A$, $H_B$, $H_C$, and $H_D$:
$$H_A = \mathrm{SHA256}(\mathrm{SHA256}(\text{Transaction } A))$$
Consecutive pairs of leaf nodes are then summarized in a parent node, by concatenating the two hashes and hashing them together. For example, to construct the parent node $H_{AB}$, the two 32-byte hashes of the children are concatenated to create a 64-byte string. That string is then double-hashed to produce the parent node’s hash:
$$H_{AB} = \mathrm{SHA256}(\mathrm{SHA256}(H_A + H_B))$$
The process continues until there is only one node at the top, the node known as the Merkle root.
That 32-byte hash is stored in the block header and summarizes all the data in all four
transactions.
Figure: Calculating the nodes in a merkle tree
Because the merkle tree is a binary tree, it needs an even number of leaf nodes. If there is an odd number of transactions to summarize, the last transaction hash is duplicated to create an even number of leaf nodes. The result is also known as a balanced tree.
This is shown in the following figure, where transaction C is duplicated.
Figure: Duplicating one data element achieves an even number of data elements
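The construction described above, including the duplication of the last hash, can be sketched in Python. The byte strings `b"Transaction A"` and so on are placeholders for serialized transactions. The hashes are kept in raw byte order, although block explorers usually display them byte-reversed. Bitcoin applies the duplication rule at every level of the tree that has an odd number of nodes, not only at the leaves, and the sketch does the same.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Compute a Bitcoin-style merkle root from 32-byte leaf hashes."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [
            double_sha256(level[i] + level[i + 1])  # concatenate the pair, then double-hash
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Leaves H_A, H_B, H_C for three placeholder transactions (an odd count, so H_C is duplicated)
leaves = [double_sha256(tx) for tx in (b"Transaction A", b"Transaction B", b"Transaction C")]
root = merkle_root(leaves)
print(root.hex())
```

Whatever the number of leaves, `root` is always 32 bytes.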
The same method for constructing a tree from four transactions can be generalized to construct
trees of any size. In bitcoin it is common to have several hundred to more than a thousand
transactions in a single block, which are summarized in exactly the same way, producing just 32
bytes of data as the single merkle root. In the following figure, you will see a tree built from 16
transactions. Note that although the root looks bigger than the leaf nodes in the diagram, it is the
exact same size, just 32 bytes. Whether there is one transaction or a hundred thousand
transactions in the block, the merkle root always summarizes them into 32 bytes.
To prove that a specific transaction is included in a block, a node only needs to
produce log2(N) 32-byte hashes, constituting an authentication path or merkle path connecting
the specific transaction to the root of the tree. This is especially important as the number of
transactions increases, because the base-2 logarithm of the number of transactions increases
much more slowly. This allows bitcoin nodes to efficiently produce paths of 10 or 12 hashes
(320–384 bytes), which can provide proof of a single transaction out of more than a thousand
transactions in a megabyte-size block.
In the following figure, a node can prove that a transaction K is included in the block by producing a merkle path that is only four 32-byte hashes long (128 bytes total). The path consists of the four hashes HL, HIJ, HMNOP and HABCDEFGH (highlighted in the figure). With those four hashes provided as an authentication path, any node can prove that HK (noted in green in the diagram) is included in the merkle root. It does so by computing four additional pair-wise hashes: HKL, HIJKL, HIJKLMNOP, and the merkle tree root (outlined in a dotted line in the diagram).
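The merkle path check can be sketched as shown below. `merkle_path` collects the sibling hash at each level, and `verify` recomputes the root from a single leaf and that path. The 16 leaves are hashes of placeholder strings standing in for transactions A to P. For transaction K the path has four entries, matching HL, HIJ, HMNOP and HABCDEFGH in the text. The function names are illustrative.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, index):
    """Collect (sibling hash, sibling_is_left) pairs from leaf `index` up to the root."""
    path = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other member of this node's pair
        path.append((level[sibling], sibling < index))
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify(leaf_hash, path, root):
    """Recompute the root from one leaf hash and its authentication path."""
    current = leaf_hash
    for sibling, sibling_is_left in path:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == root

names = "ABCDEFGHIJKLMNOP"
leaves = [double_sha256(f"Transaction {n}".encode()) for n in names]
root = merkle_root(leaves)
k = names.index("K")
path = merkle_path(leaves, k)
print(len(path))                      # 4 hashes: H_L, H_IJ, H_MNOP, H_ABCDEFGH
print(verify(leaves[k], path, root))  # True
```

The verifier only needs the leaf hash, the path and the root from the block header, not the other transactions.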
HASH
Hash functions are one of the most extensively-used cryptographic algorithms that generate a
fixed-length output for any input data irrespective of its size and length. The input data can be a
word, a sentence, a longer text, or an entire file. The fixed-length output generated for the input
data is called a hash. Many types of cryptographic hash functions/ algorithms are available like
MD5, BLAKE2, SHA-1, SHA-256, etc. Secure Hashing Algorithm 256, commonly referred to as
SHA-256, is one of the most famous cryptographic hash functions used extensively in Blockchain
technology. It was developed by the National Security Agency (NSA) in 2001.
When a message is passed through the hash algorithm, the algorithm generates a hash for this input. Regardless of the length of the input, the hash algorithm always generates a fixed-length output. The output length depends on the algorithm being used: for example, MD5 produces 128 bits, SHA-1 produces 160 bits, and SHA-256 produces 256 bits. A SHA-256 hash value of 256 bits is usually written as 64 hexadecimal characters.
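Python's standard `hashlib` module can be used to observe the fixed output length. The sample inputs are arbitrary. Whatever their length, each SHA-256 digest is 64 hexadecimal characters (256 bits).

```python
import hashlib

inputs = ["a", "Blockchain", "A much longer input sentence " * 100]
for text in inputs:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Always 256 bits = 32 bytes = 64 hexadecimal characters
    print(len(text), "chars ->", len(digest), "hex chars:", digest[:16] + "...")
```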
Using a fixed-length output increases security, since anyone trying to reverse the hash cannot tell how long or short the input is simply by looking at the length of the output. For a secure hash function, the only practical way to determine the original input from its hash is “brute force”. Brute force means trying candidate inputs, hashing them and comparing each result with the target hash. For instance, if the SHA-256 hash algorithm is used, a brute-force attack on an arbitrary input could need up to 2^256 attempts to find the original data.
The hash algorithm used in Blockchain has specific properties. These properties are:
1. It is designed to produce a practically unique output, or hash, for every transaction (input).
2. The hashes used in the Blockchain are one-way hash functions, which cannot feasibly be reversed.
3. A cryptographic hash function used in the Blockchain must be deterministic.
4. The Blockchain relies on the properties of the cryptographic hash function in its consensus mechanism (for example, Proof of Work). The cryptographic hash acts as a digest and a digital fingerprint for a definite quantity of data.
5. In the Blockchain's cryptographic hash functions, the transaction is accepted as an input and run through a hash algorithm, which gives an output of fixed size.
A Small Change in the Input Value Alters the Hash Value Significantly
The technology is designed to produce a unique output, or hash, for every transaction (input). A key property of a cryptographic hash function is that even a small change in the input value brings about a drastic change in its output value. This is referred to as the Avalanche Effect. For instance, when the input ‘Blockchain is the future’ is passed through the hash function, a specific output or hash is generated. If a small change is then made to the input, such as adding an exclamation mark (‘Blockchain is the future!’), the new hash is entirely different from the previous one. This property makes cryptographic hash functions resistant to attack, because looking at the hash alone reveals no correlation with the input data.
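The avalanche effect can be observed by hashing the two example sentences and counting how many of the 256 output bits differ. For a good hash function, roughly half of the bits are expected to change. The exact count depends on the inputs, so no specific number is claimed here. `differing_bits` is an illustrative helper.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

h1 = sha256_hex("Blockchain is the future")
h2 = sha256_hex("Blockchain is the future!")
print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 bits differ")
```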
One-Way Functions
Hash functions are generally referred to as one-way functions because they are not reversible.
While a hash function is a cryptographic function, it’s not encryption. Encryption works by
encrypting the relevant data with an encryption algorithm and an encryption key. This results in a
cipher text that can only be viewed in its original form if decrypted with the correct key. A hash
function, in contrast to encryption, works as a one-way function, in simple words, if you have a
hash, you cannot decrypt it to find the corresponding input. So in a real-life scenario, even if an attacker gets access to a hash output, the attacker cannot simply decrypt it to recover the input.
The following figure represents the functions of hash in blockchain.
Therefore, cryptography used in Blockchain requires one-way hash functions, which makes it safe, secure, and reliable. Although hash functions can be used to track and validate the input data, they cannot be used to decrypt and recover the input data.
Deterministic
A cryptographic hash function used in the Blockchain must be deterministic. In simple words, a
hash function is said to be deterministic if it generates the same hash whenever the same input is
passed through it. No matter how many times we pass the input ‘Blockchain is the future’ through
the hash function, it should always generate the same exact output or hash every single time.
If different outputs are generated by a hash function for the same input, the hash function will
become useless, and it would be impossible to verify a specific input.
SHA-256 Algorithm
We can divide the algorithm for SHA-256 into three steps, as outlined below.
Step one: Appending bits
The first step involves preprocessing the input message to make it compatible with the hash function.
It can be divided into two main sub-steps:
Padding bits
The total length of the processed message must be a multiple of 512 bits. In this step, we append bits to the end of our message so that its length becomes 64 bits less than a multiple of 512. The formula below depicts this step:
$$m + p = (n \times 512) - 64$$
where m = length of the message in bits, p = length of the padding in bits, and n = a positive integer (the number of 512-bit blocks).
The first bit that we append is a single 1, followed by as many 0 bits as needed.
Length bits
Next, we take the length of the original message in bits, modulo 2^64, and write it as a 64-bit big-endian integer. Appending these 64 bits to the padded message makes the processed message an exact multiple of 512 bits.
The image below illustrates the final message after step one is completed.
Figure: The original message after preprocessing
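Step two: Initializing the buffers
Eight 32-bit hash values (H0 to H7) are initialized to the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. The 64 round constants (K[0] to K[63]) are initialized to the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers.
Step three: Processing each block (compression function)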
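The message is processed one 512-bit block at a time. For each block, a message schedule of 64 words, W[0] to W[63], is computed. The first 16 words are taken directly from the block, and the remaining 48 are derived from earlier words. The block then goes through 64 rounds of the compression function. Each round uses one of the already-initialized constants K[i] and one word W[i]. At the end of the block, the result is added to the running hash values.
The entire cycle repeats until all n 512-bit blocks have been processed.
Finally, the eight hash values left after the last block are concatenated to form the 256-bit SHA-256 digest.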
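The three steps can be tied together in a compact, educational Python sketch of SHA-256, checked against Python's built-in `hashlib`. As a design choice, the initial hash values `H0` and round constants `K` are computed from their standard definitions (the first 32 bits of the fractional parts of square and cube roots of primes) using exact integer arithmetic, rather than being typed in. The sketch is written for readability, not for speed or constant-time behavior, and should not replace `hashlib` in real code.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep values to 32 bits

def first_primes(count):
    """Return the first `count` prime numbers (simple trial division)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Largest integer x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Step two: first 32 bits of the fractional parts of square / cube roots of primes
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message: bytes) -> bytes:
    """Step one: append a 1 bit, 0 bits, and the 64-bit big-endian bit length."""
    bit_len = (len(message) * 8) % 2**64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

def sha256(message: bytes) -> bytes:
    h = list(H0)
    data = pad(message)
    for start in range(0, len(data), 64):  # step three: each 512-bit block
        w = list(struct.unpack(">16I", data[start:start + 64]))
        for i in range(16, 64):  # message schedule W[16..63]
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h
        for i in range(64):  # 64 compression rounds
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[i] + w[i]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            hh, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
        # Add this block's result into the running hash values
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return b"".join(struct.pack(">I", x) for x in h)  # final 256-bit digest

digest = sha256(b"hello world")
print(digest.hex())
# b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
print(digest == hashlib.sha256(b"hello world").digest())  # True
```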
Example
“hello world”
Step 1: Convert it into binary
Step 2: Append a single 1 bit
Step 3: Pad with 0s until the length is 64 bits less than a multiple of 512 (here, 448 bits)
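The “hello world” preprocessing can be checked in code. This sketch performs step one only: binary conversion, the single 1 bit, zero padding to 448 bits, and the 64-bit length field. `preprocess` is an illustrative name. The 11-character message is 88 bits long, so everything fits in a single 512-bit block.

```python
import struct

def preprocess(message: bytes) -> bytes:
    """SHA-256 step one: append a 1 bit, pad with 0 bits, append the 64-bit length."""
    bit_length = (len(message) * 8) % 2**64
    padded = message + b"\x80"                     # a single 1 bit followed by seven 0 bits
    while (len(padded) * 8) % 512 != 448:          # stop 64 bits short of a multiple of 512
        padded += b"\x00"
    return padded + struct.pack(">Q", bit_length)  # original length as a 64-bit big-endian integer

msg = b"hello world"
block = preprocess(msg)
print(" ".join(f"{byte:08b}" for byte in msg[:2]), "...")  # step 1: the message in binary
print(len(msg) * 8, "message bits ->", len(block) * 8, "bits after preprocessing")
print(block[-8:].hex())  # 0000000000000058, i.e. 88 bits
```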
DISTRIBUTED P2P
A peer-to-peer (P2P) network connects a group of computers together and grants each of them equal permissions and responsibilities for processing data.
In a peer-to-peer network, the “peers” are the computers that are linked with each other through the internet. A P2P network does not have a central server, so each user can share any type of file with any peer on the network. In other words, every peer in a P2P network plays the role of both server and client.
In a P2P network, three methods are used to connect computer systems. The basic method is to use a USB (Universal Serial Bus) connection between two peers. The second method is to use copper wiring to connect more computers. The final method is to use protocols that manage all the connections between many terminals on the internet.
Types of peer-to-peer (P2P) networks
A P2P architecture can be categorized into structured, unstructured and hybrid peer-to-peer
networks.
Structured peer-to-peer networks
In a structured P2P network, the nodes follow an organized architecture, so they can search for files and use them efficiently rather than searching randomly. To make structured P2P networks work, hash functions are used for lookups, typically in the form of a distributed hash table (DHT).
Structured P2P networks are clearly more efficient. However, they also involve some degree of centralization because they use an organized architecture. This also means that they require higher maintenance and setup costs. Lastly, a structured network is more robust than an unstructured P2P network.
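Hash-based lookup in structured P2P networks is commonly implemented as a distributed hash table. The sketch below shows one such technique, consistent hashing on a ring, which is the idea used by DHTs such as Chord. It is a swapped-in illustration, not a scheme the notes specify. The peer names and file keys are hypothetical. The hop-by-hop routing that a real DHT performs is omitted, and every peer's position is assumed to be known locally.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a peer name or file key onto a ring of size 2**32 using SHA-256."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, peers):
        # Peers sorted by their position on the ring
        self.points = sorted((ring_position(p), p) for p in peers)

    def responsible_peer(self, key: str) -> str:
        """The first peer at or after the key's position (wrapping around) stores the key."""
        positions = [pos for pos, _ in self.points]
        i = bisect.bisect_left(positions, ring_position(key))
        return self.points[i % len(self.points)][1]

ring = HashRing(["peer-A", "peer-B", "peer-C", "peer-D"])
for key in ["song.mp3", "notes.pdf", "video.mp4"]:
    print(key, "->", ring.responsible_peer(key))
```

Because the mapping is deterministic, any peer can work out where a file should be without searching randomly. When a peer joins, only the keys between it and its predecessor on the ring change owner.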
Unstructured peer-to-peer networks
In this type of P2P network, each device can make an equal contribution. This network is easy to build, as devices can be connected randomly. However, because it is unstructured, it is difficult to find content.
However, P2P can also be difficult to manage and secure due to varying capabilities,
configurations, and trust levels among nodes.
Advantages of P2P Network
Easy to maintain: The network is easy to maintain because each node is independent of
the other.
Less costly: Since each node acts as a server, the cost of a central server is saved, so there is no need to buy an expensive server.
No network manager: In a P2P network, each user manages their own computer, so there is no need for a network manager.
Adding nodes is easy: Adding, deleting, and repairing nodes in this network is easy.
Less network traffic: In a P2P network, there is less network traffic than in a client/ server
network.
Disadvantages of P2P Network
Data is vulnerable: Because there is no central server, data can be lost if it has not been backed up.
Less secure: It is difficult to secure the complete network because each node is independent.
Slow performance: In a P2P network, each computer is accessed by other computers in the network, which slows down performance for the user.
Files hard to locate: In a P2P network, the files are not centrally stored, rather they are
stored on individual computers which makes it difficult to locate the files.
7. Client-Server Network: used for both small and large networks. Peer-to-Peer Network: generally suited for small networks with fewer than 10 computers.