Block Chain Material
I. Distributed Systems
Bitcoin transactions are recorded on the blockchain, a ledger maintained by a distributed system: a network of
independent nodes connected by message channels that move information between them. A critical aspect of a
distributed system is the way in which these nodes, which do not know or trust each other, come to agreement, or
consensus. In the case of Bitcoin, the network needs to agree on the number of bitcoins each individual owns and
on every transaction being made.
Distributed systems remove the need for trust in potentially unreliable parties; instead, we can trust the mathematics
and the correct operation of these systems. If one of the nodes crashes or is corrupted by malicious entities, we can
still protect our information and services by relying on previously set protocols that withstand these failures.
A distributed system is considered “correct” if it comes to consensus on an answer -- given an input, the nodes must
agree on an output.
To prove the correctness of a distributed system, we use the scheme designed by Lamport. The scheme says that a
system is correct if two things are true: safety (nothing bad ever happens) and liveness (something good
eventually happens).
To ensure correctness, we use consensus algorithms. There are 3 requirements of any correct consensus algorithm:
Validity: any value agreed upon must be proposed by one of the processes
Agreement: all non-faulty processes must agree on the same value
Termination: all non-faulty processes eventually decide
Notice that validity and agreement are safety properties while termination is a liveness property.
Consistency: every node provides the most recent state of the system
Availability: every node that has not failed responds to requests
Partition Tolerance: ability to function in spite of partitions in the network, where a partition is the inability of
two or more nodes to communicate with each other; this is almost a given for any distributed system
It is important to understand there aren’t black and white tradeoffs between these three properties -- compromises
can be made.
There are two types of faults that may be produced by Byzantine nodes, where faults are deviations from protocol:
Paxos - Consensus mechanism inspired by the Paxon parliament, which used the Paxos algorithm to pass decrees and
make sure everyone on the island was in consensus. Assumes nodes do not try to subvert the protocol; it only works
for fail-stop failures and has no Byzantine fault tolerance.
Raft - Leader-based approach designed to be more understandable and easier to implement than Paxos; used in, e.g.,
JP Morgan’s Quorum (enterprise version of Ethereum).
o One and only one leader: communicates with clients directly, is responsible for log replication on the other
servers, and leads until it fails or disconnects
o Leader election: the leader sends heartbeats to signal it is online and functioning; if no heartbeats are received,
the first node to notice the lack of a leader becomes a candidate and takes over as the new leader once a majority
of nodes vote for it
Practical Byzantine Fault Tolerance - fast, handles F faults in a system with 3F + 1 nodes, BFT-NFS
implementation only 3% slower than standard unreplicated NFS
o Pre-prepare: the primary node sends out pre-prepare messages to everyone in the network; a node accepts the
pre-prepare message so long as it is valid.
o Prepare: If a node accepts a pre-prepare message, it follows up by sending out a prepare message to everyone else;
prepare messages are accepted by receiving nodes so long as they’re valid, again based on sequence number,
signature, and other metadata. A node is considered “prepared” if it has seen the original request from the primary
node, has pre-prepared, and has seen 2F prepare messages that match its pre-prepare, making for 2F + 1 prepares.
o Commit: Once prepared, a node sends a commit message to everyone else. If a node receives 2F + 1 valid matching
commit messages (counting its own), it carries out the client request and then sends out the reply to the client.
The client waits for F + 1 of the same reply. Since we allow for at most F faults, at least one of any F + 1
matching replies comes from a correct node, and this ensures the response is valid.
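The thresholds quoted in these phases all follow from the 3F + 1 bound. A minimal sketch, assuming a hypothetical helper named pbft_thresholds (it is not part of any PBFT library), derives them from the network size:

```python
def pbft_thresholds(n: int) -> dict:
    """Fault bound and message thresholds for an n-node PBFT network.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    if n < 4:
        raise ValueError("PBFT needs at least 4 nodes to tolerate one fault")
    f = (n - 1) // 3  # largest F such that 3F + 1 <= n
    return {
        "max_faults": f,
        "prepares_needed": 2 * f,        # matching prepares seen from other nodes
        "client_replies_needed": f + 1,  # identical replies the client waits for
    }


print(pbft_thresholds(4))
print(pbft_thresholds(10))
```

With 4 nodes the network tolerates a single faulty node; growing to 10 nodes raises the bound to 3.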
V. Nakamoto Consensus
Used in Bitcoin and other cryptocurrencies. Whereas the voting-based consensus mechanisms covered above use
explicit voting, Nakamoto consensus uses implicit voting, i.e. voting based on lottery selection and earned voting
power.
o Any user can have as many virtual identities/key pairs as they want
o To prevent unfair voting from anyone who dishonestly creates multiple identities, voting power must be made
scarce, done by tying voting power to a scarce resource such as computing power or electricity
Each Nakamoto Consensus protocol must have a set of rules defining how to choose the most valid state of the
network, such as the longest chain policy in Bitcoin and many cryptocurrencies. This is because each node in the
Nakamoto consensus network gets to choose its own state, and try to convince others of its validity.
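As a rough illustration of such a rule, the sketch below applies a longest chain policy to a set of candidate chains. The function name choose_chain and the use of plain lists of block labels are assumptions made for this example; Bitcoin nodes in practice compare accumulated proof-of-work rather than a raw block count.

```python
from typing import List, Sequence


def choose_chain(candidates: Sequence[List[str]]) -> List[str]:
    """Longest chain policy: return the candidate with the most blocks.

    Ties keep the chain that was seen first (an assumption of this sketch).
    """
    if not candidates:
        raise ValueError("need at least one candidate chain")
    best = candidates[0]
    for chain in candidates[1:]:
        if len(chain) > len(best):
            best = chain
    return best


fork_a = ["genesis", "a1"]
fork_b = ["genesis", "b1", "b2"]
print(choose_chain([fork_a, fork_b]))
```

Each node can run this rule locally, which is why a node only has to convince others that its chain is the longer one.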
o Proof of Work - current blockchain standard, led by Bitcoin and followed by most networks; led to a mining craze
and rapid acquisition of computing hardware
o Proof of Stake - experimental protocol to end electricity drain by staking tokens, can mine or validate block
transactions according to how many tokens staked
o Proof of Capacity/Space - memory-hard PoW, allocating amount of memory or disk to solve a challenge
VI. Proof-of-Stake
With proof-of-stake, validators are stakeholders with voting power proportional to the economic stake they have
locked up. The assumption here is that someone with more stake is more incentivized to do things that will benefit
the system and thus increase their economic stake.
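To make "voting power proportional to stake" concrete, here is a minimal sketch of stake-weighted validator selection. The validator names, stake amounts, and the pick_validator helper are all made up for illustration; real protocols draw their randomness from the chain itself rather than from a local random number generator.

```python
import random
from collections import Counter


def pick_validator(stakes: dict, rng: random.Random) -> str:
    """Pick one validator with probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 70, "bob": 20, "carol": 10}  # hypothetical locked-up stakes
rng = random.Random(0)  # seeded only so the demo is repeatable
wins = Counter(pick_validator(stakes, rng) for _ in range(10_000))
print(wins.most_common())
```

Over many rounds the share of selections tracks the share of stake, which is the mechanism behind the "rich become richer" concern.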
Chain-based PoS chooses availability while Byzantine Fault Tolerant PoS chooses consistency.
PoS is susceptible to corruption if over 33% of the network are malicious actors, whereas PoW requires over 50%
malicious actors
PoS tends to lead to a rich-become-richer problem where those who stake substantial portions of the total network
will grow in proportion due to higher likelihood of being selected, and thus rewarded
If the larger players grow past 33% of the network, they pose a threat to validity.
If we don’t trust certain nodes in the quorum, we can avoid having a central party choose the quorum for us by
using quorum slices: subsets of a quorum that a particular node can trust. Quorum slices allow us to
individually choose who we trust, and when multiple quorum slices overlap, we form quorum intersections and
thus a larger quorum.
Federated consensus is powerful because of its decentralized control, low latency, and flexibility towards trust.
Popular implementations of federated consensus include Ripple and Stellar.
I. Cryptoeconomics
Economic principles help us to design a system so that actors are incentivized to make decisions
in line with the goals of the greater good. Incentives allow us to secure the future (e.g. block
reward in Bitcoin and cost of mining to deter Sybil attacks).
Cryptography allows us to secure the past and ensure our decisions cannot be manipulated by
observers (e.g. cryptographic signatures for authentication and hashes for immutability)
II. Cryptography
The need for cryptography is especially important in distributed systems, where unknown actors
are a potential threat to the secrecy and safekeeping of information.
Cryptographic hash functions, used to capture the identity of information without revealing
anything about the information itself
Digital signatures, used to prove your identity and that you sent a particular message
Timelocks allow for a message to be easily encrypted but take a longer amount of time to
decrypt.
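As a small illustration of the first primitive, the sketch below hashes two nearly identical messages with SHA-256 (the hash function used by Bitcoin) through Python's standard hashlib module. The messages and the digest helper are made up for the example.

```python
import hashlib


def digest(message: str) -> str:
    """Return the SHA-256 digest of a message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


a = digest("pay Bob 1 BTC")
b = digest("pay Bob 2 BTC")
print(a)
print(b)
print(a == digest("pay Bob 1 BTC"))  # same input always gives the same digest
```

The two digests share no visible pattern even though the inputs differ by one character, and neither digest reveals the message it was computed from.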
III. Economics
Economics boils down to a fundamental question: how do you determine the best choice to make
with your limited resources in order to maximize your profit? Economics also helps us to design
a system so that everyone is incentivized to act in a certain way.
In game theory, we aim to deduce how an actor will act in a given situation. These decisions are
influenced by the actions of others and the rewards and penalties associated with certain
decisions. Therefore we aim to manipulate these factors.
In blockchain, tokens are used as economic incentives. Tokens are units of protocol-defined
cryptocurrency given out to miners and privileges miners can charge for. The assumption here is
that the underlying objective for actors in a blockchain network is to maximize their profit,
which equals their revenues minus their costs.
Casper the Friendly GHOST (CBC) - a family of consensus algorithms designed from ground
up i.e. Correct-by-construction, a proposed upgrade for the Ethereum network
Casper the Friendly Finality Gadget - a Proof-of-Work and Proof-of-Stake hybrid; another
upgrade proposed for the Ethereum network
V. Attacks
Each proof of stake attack represents a scenario in which the incentives of an individual are not
aligned with the incentives of a group, i.e. giving an unfair advantage to any single actor.
Because the resource consumed is monetary value, bad actors need to receive an explicit
monetary penalty with each attempted attack to keep the system in check.
If there was zero penalty, the expected profit of any given attack would be some number greater
than zero, providing an incentive. By penalizing users for incorrect or malicious actions, the
system hopes to bring the expected value to less than or equal to zero.
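The expected-value argument can be sketched numerically. The model below is a simplifying assumption made for this example (one reward, one penalty, fixed probabilities), and the function names are hypothetical:

```python
def expected_attack_profit(p_success: float, reward: float,
                           p_caught: float, penalty: float) -> float:
    """Expected profit of one attack attempt under a simple two-outcome model."""
    return p_success * reward - p_caught * penalty


def min_penalty(p_success: float, reward: float, p_caught: float) -> float:
    """Smallest penalty that brings the expected profit down to zero."""
    if p_caught <= 0:
        raise ValueError("a penalty only deters if attacks can be caught")
    return p_success * reward / p_caught


# Made-up numbers: 25% chance of success, reward of 100, 50% chance of being caught.
print(expected_attack_profit(0.25, 100, 0.5, 0))   # no penalty: positive profit
print(min_penalty(0.25, 100, 0.5))                 # penalty needed to break even
print(expected_attack_profit(0.25, 100, 0.5, 50))  # with that penalty: zero
```

Any penalty above the break-even value makes the expected profit negative, which is the condition the protocol designer is aiming for.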
Examples of attacks:
Nothing-at-Stake: voting in favor of every fork in hopes of maximizing one’s rewards i.e.
guaranteeing you will not miss the reward from the chosen branch; solution: slashing an actor if
they are caught voting on multiple forks, or a less popular scheme penalizes incorrect votes; keep
in mind that voting takes place using cryptographically identifiable/verifiable signatures.
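A slashing rule for Nothing-at-Stake needs a way to spot validators who voted on more than one fork. The sketch below is one possible check, assuming votes arrive as (validator, height, block hash) tuples whose signatures have already been verified; the tuple layout and the find_equivocators name are assumptions of this example.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Vote = Tuple[str, int, str]  # (validator, block height, block hash)


def find_equivocators(votes: Iterable[Vote]) -> Set[str]:
    """Return validators that voted for two different blocks at the same height."""
    seen = defaultdict(set)  # (validator, height) -> block hashes voted for
    for validator, height, block_hash in votes:
        seen[(validator, height)].add(block_hash)
    return {validator for (validator, _height), hashes in seen.items()
            if len(hashes) > 1}


votes = [
    ("alice", 5, "fork-A"),
    ("alice", 5, "fork-B"),  # alice votes on both forks at height 5
    ("bob", 5, "fork-A"),
    ("bob", 6, "fork-C"),
]
print(find_equivocators(votes))
```

Because each vote carries a verifiable signature, the two conflicting votes themselves serve as the evidence needed to slash the validator's deposit.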
Stake grinding: attack where a validator performs some computation or takes some other step to
try to bias the randomness in their own favor; solution: require validators to deposit their coins
well in advance, and avoid information that can be easily manipulated as source data for
randomness
Weak subjectivity is a problem for new nodes or nodes that have been offline for a long time;
the node does not know which chain is the main chain; solution: introduce a "revert limit" - a
rule that nodes must simply refuse to revert further back in time than the deposit length
Text: Week 4 Summary
Participation due Jul 30, 2020 01:03 IST
I. Background
Modern day public blockchains have been victims of their own success. Bitcoin and Ethereum
especially have scalability issues: they aim to be global networks able to support
global-scale transaction volumes, but both currently fall short on transaction
throughput.
Fundamentally, scaling solutions can either increase the transaction volume, or decrease the
block time. This is self evident as scalability is measured in a blockchain’s achievable TPS
(transactions per second.)
Going forward, we can classify blockchain scaling solutions two ways. The first is a rough
comparison with traditional cloud architecture scaling classifications: horizontal, vertical, and
diagonal. Secondly, there are the blockchain-specific scaling classifications: layer 1 (on-chain)
and layer 2 (off-chain).
Bitcoin processes fewer than 10 transactions per second, and without any scalability upgrades, it’s
bound to stay at low TPS. Looking at how we calculate TPS in the first place, namely in the
rough dimensional analysis above, we can see that the fields we can attempt to modify in efforts
to create new scaling solutions are:
1. Block time
2. Block size
3. Transaction size
These parameters are all built into the blockchain system itself, and tuning them
directly constitutes a layer 1 scaling solution.
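The relationship between these three parameters and TPS can be sketched as TPS = (block size / transaction size) / block time. In the example below, the 1 MB block size and 10 minute block time are Bitcoin's well-known base parameters, while the 250 byte average transaction size is an assumed round figure chosen for illustration.

```python
def transactions_per_second(block_size_bytes: float,
                            tx_size_bytes: float,
                            block_time_seconds: float) -> float:
    """Rough throughput estimate: transactions per block divided by block time."""
    txs_per_block = block_size_bytes / tx_size_bytes
    return txs_per_block / block_time_seconds


baseline = transactions_per_second(1_000_000, 250, 600)
print(round(baseline, 2))  # roughly 6.67

# Each layer 1 knob scales throughput linearly:
print(round(transactions_per_second(2_000_000, 250, 600), 2))  # double the block size
print(round(transactions_per_second(1_000_000, 250, 300), 2))  # halve the block time
print(round(transactions_per_second(1_000_000, 125, 600), 2))  # halve the transaction size
```

Each of the three changes only doubles the estimate, so none of them alone closes the gap to global-scale transaction volumes.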
Ethereum has dealt with this problem historically by employing the GHOST (Greedy Heaviest
Observed SubTree) protocol. With the GHOST protocol, miners no longer agree on the longest chain
to be canon (as in Bitcoin), but rather on the chain with the most “weight”, where weight is a
value calculated from both a chain’s length and the number of uncle blocks it has.
Increasing block size would improve a blockchain’s TPS. Since a block can now contain more
transactions, it would also lower transaction fees.
However, as with decreasing block time, there are some side effects. For one, increasing block
size would imply hard forking, and depending on the community, this could be a less than
pleasant experience. It would also make the blockchain grow in size at a much faster rate – a
problem decreasing the block time also faced. And finally, increasing the block size is most
likely not a one-time fix, since the scalability boost is only linear. The block size might need to
be increased in the future again, leading to a “slippery slope” type of debate.
Non-SegWit nodes would see a transaction without a signature, but would mark the transaction
as valid. SegWit nodes on the other hand would know to read into the segregated witness, and
would verify it using the signature.
Given that the speed of a blockchain limits its scalability, we can consider moving the more
costly operations off the chain entirely and only publishing when we require a global sense of truth.
Payment channels in Bitcoin could be implemented using HTLCs (hashed timelock contracts), and
could move transactions off the main Bitcoin blockchain and onto side chains. If Alice and Bob
transact often, perhaps it makes sense for Alice and Bob to construct a private payment channel,
where they conduct their transactions off-chain. Only when they want to settle their final
balances do they post back to the main blockchain. This allows Alice and Bob to still conduct
their transactions as they do, but the main blockchain only has to store Alice and Bob’s initial
and end balances.
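The bookkeeping behind a payment channel can be sketched as follows. This is a toy model under strong assumptions: it leaves out signatures, HTLCs, and dispute handling entirely, and the PaymentChannel class and its method names are invented for this example.

```python
class PaymentChannel:
    """Toy two-party channel: only opening and closing would touch the main chain."""

    def __init__(self, alice_deposit: int, bob_deposit: int):
        # Opening the channel records the initial balances on-chain.
        self.balances = {"alice": alice_deposit, "bob": bob_deposit}
        self.updates = 0  # number of off-chain balance updates

    def pay(self, sender: str, receiver: str, amount: int) -> None:
        """Off-chain payment: both parties simply update their shared balance sheet."""
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid or unaffordable payment")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.updates += 1

    def close(self) -> dict:
        """Settlement: only the final balances are posted back to the main chain."""
        return dict(self.balances)


channel = PaymentChannel(alice_deposit=10, bob_deposit=5)
channel.pay("alice", "bob", 3)
channel.pay("bob", "alice", 1)
print(channel.updates, channel.close())
```

However many payments Alice and Bob exchange, the main blockchain sees two events: the opening deposit and the closing settlement.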
The idea behind the Bitcoin Lightning Network is to create a network of payment channels.
In the diagram above, Alice can pay Charlie without having a payment channel to Charlie
directly, so long as there is a path from Alice to Charlie through the payment channel network.
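Finding such a path is a graph search problem. The sketch below uses a breadth-first search over an undirected graph of open channels; it ignores channel capacity and routing fees, which a real network must take into account, and the find_route helper and the intermediary Bob are assumptions of this example.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def find_route(channels: Iterable[Tuple[str, str]],
               source: str, target: str) -> Optional[List[str]]:
    """Return a shortest sequence of hops from source to target, or None."""
    graph = {}
    for a, b in channels:  # channels are usable in both directions
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


channels = [("Alice", "Bob"), ("Bob", "Charlie")]
print(find_route(channels, "Alice", "Charlie"))
```

Here Alice reaches Charlie through Bob without ever opening a direct channel to Charlie.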
Ethereum has a similar scalability solution in the works, appropriately named Raiden.
Payment channels and payment channel networks would allow us to keep many transactions off
chain, reducing payments to simple bookkeeping. Since the main blockchain only sees the start
and end balances of the parties in a payment channel, we can keep a majority of transactions off
chain, scaling Bitcoin from under ten transactions per second to potentially hundreds of
thousands of transactions per second.
Some problems include having to lock up capital in order to initiate a payment channel, and
centralization concerns of payment channel networks converging to hub-and-spoke topologies.
Sharding is a database scaling strategy that breaks up a monolithic database into “shards”, each
a separate database that contains a subset of the original data, such that the union of all shards
is the original database. The same idea can be applied to blockchain, and is currently one of the
active areas of research in the Ethereum community.
The idea translated to blockchain implies that not every node keeps track of every block. It
would be a layer 1 horizontal scaling solution. We could have multiple blockchains running in
parallel, each containing a subset of all transactions. Issues currently being researched include
the classification of various nodes in a sharded blockchain system (e.g. nodes that see a single
shard vs nodes that see all shards), cross-shard communication, and defenses against single shard
takeovers.
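The basic partitioning idea can be sketched with a hash-based assignment of transactions to shards. Assigning by hash is an assumption made for this example and not a description of any specific sharding design; the function names are hypothetical.

```python
import hashlib
from typing import List


def shard_for(tx_id: str, num_shards: int) -> int:
    """Deterministically map a transaction id to a shard index."""
    digest = hashlib.sha256(tx_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(tx_ids: List[str], num_shards: int) -> List[List[str]]:
    """Split transactions so that each shard holds a disjoint subset."""
    shards = [[] for _ in range(num_shards)]
    for tx_id in tx_ids:
        shards[shard_for(tx_id, num_shards)].append(tx_id)
    return shards


txs = [f"tx{i}" for i in range(12)]
shards = partition(txs, 3)
for index, shard in enumerate(shards):
    print(index, shard)
```

Every transaction lands in exactly one shard, so the union of the shards is the original set while each node only needs to track its own shard.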
Sidechains are the idea that you can create multiple side chains for different purposes that plug
into a main chain, effectively decreasing the traffic on it.
This does separate hashing power across multiple chains, which raises security concerns.
Ethereum’s Plasma can be seen as a diagonal scaling solution, since it enables horizontal scaling
by implementing side chains and vertical scaling by increasing their speed through Tendermint
and alternative consensus mechanisms. The security of off-chain transactions is derived from the
root chain, the main source of truth within the system.
The application layer processes transactions and updates the state of the system
The consensus layer makes sure the entire network agrees on transactions and updates to the
database
The networking layer makes sure all nodes get updates within a reasonable amount of time
The purpose of the Tendermint project is to provide the networking and consensus layers so that
arbitrary applications could be built on top of it. Tendermint is the consensus “engine” of the
Cosmos network, which aims to make blockchains interoperable and scalable.
The following table summarizes the scaling solutions we have learned, categorized by 2 different
methods. Layer 1 and Layer 2 specify whether solutions are built on-chain or off-chain.
Solutions can also scale vertically or horizontally.
Most blockchains are pseudonymous by default, not anonymous. With their sights set on
decentralization first and foremost, the original designers of Bitcoin chose to build a distributed
database in which everyone (all full nodes) stores the Bitcoin blockchain. While this grants us
decentralization properties, the fact that records of all transactions are now public is concerning,
especially for those who do not actively adhere to best practices, such as generating a new
pseudonym for each transaction. Additionally, the privacy of users can be severely compromised
if their virtual identities, or their pseudonyms, are somehow linked to their real ones -- this is
known as linking.
And sometimes, as hard as one might try to make a particular technology secure, private, or
anonymous, human factors make all of this very difficult.
II. Deanonymization
The goal of deanonymization is to link your online identity (pseudonym) to your real identity
through analysis and heuristics.
Taint analysis can be used to tag “bad” addresses and trace their associated activity throughout
the network (taint is the percentage of funds received by an address that can be traced back to
another address).
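Given that definition, the taint of an address is a simple ratio. The sketch below assumes the hard part, tracing each incoming payment back through the transaction graph, has already been done and is supplied as a flag; the taint_percentage helper and the sample amounts are made up for illustration.

```python
from typing import List, Tuple


def taint_percentage(received: List[Tuple[float, bool]]) -> float:
    """Percentage of received funds that trace back to the tagged address.

    Each entry is (amount, traced_to_tagged_address).
    """
    total = sum(amount for amount, _traced in received)
    if total == 0:
        return 0.0
    traced_total = sum(amount for amount, traced in received if traced)
    return 100.0 * traced_total / total


# Hypothetical address that received 3 coins from a tagged source and 1 clean coin.
print(taint_percentage([(3, True), (1, False)]))
```

An address with a high taint relative to a known "bad" address is a strong candidate for being controlled by, or closely linked to, the same entity.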
Third Party Protocol (TPP): Of course, as with any centralized service, TPP faces the issue of
being a single point of failure. Additionally, if the only UTXOs being sent to the centralized
mixer are dirty coins, the outputs for later users will only be dirty coins unless there are enough
clean coins being cycled through the slush fund.
Altcoin exchange mixing works by sending dirty funds through several layers of altcoin
⇔ altcoin exchanges to obfuscate the money trail. This is less centralized than TPP. In this case,
there is no mixing fee; the cost is instead the sum of the exchange fees between each
cryptocurrency used. Advantages of this technique include better plausible deniability and
increased tracing difficulty, while disadvantages include the chance that exchanges may reveal
the links between your inputs and outputs.
CoinJoin was one of the earliest decentralized mixing protocols proposed, all the way back in
2013. It used n-of-n multi-signature transactions to mix coins together between n peers. The
problems were that it assumed there was some central “mix facilitator” with a central server
coordinating all the users, in addition to lacking plausible deniability and DoS resistance.
CoinShuffle improved upon CoinJoin by introducing decentralized mixnets, which allow a set
of users to pool together inputs from a group without revealing which input was submitted by
which user. Additionally, it used an “Accountable Anonymous Group Messaging” protocol
known as Dissent to coordinate users. Though this prevents attacks such as traffic analysis and
deanonymization by a mix facilitator, it is still susceptible to all the drawbacks associated with
CoinJoin.
JoinMarket sought to fill a liquidity market of mixable coins for a fee, allowing anyone to
contact this service to mix their coins. However, this approach does not provide a large
anonymity set and was claimed to be deanonymizable with $32,000 USD alone.
CoinParty aimed to provide efficient mixing with plausible deniability and a larger anonymity
set via escrow addresses and threshold signatures. However, this comes at the cost of some
protocol security, as any ⅔ of the users colluding can steal funds from all the other users.
These are built upon the traditional fair-exchange protocol so that no trusted third party is
needed.
CoinSwap - Uses hash-locked, 2-of-2 multi-signature transactions to securely swap coins without
linking transactions. While this process is trustless and provides better plausible deniability, it
comes at the cost of inefficiency and is also insecure against a mix-passive intermediary.
XIM - Uses an untrusted intermediary to create a fair-exchange mixer. It prevents Sybil and DoS
attacks by charging fees to use the service, but it requires several hours to run because of its
group-forming protocol.
BSC - Builds upon XIM to allow users to skip the group-forming protocol with anonymous fee
vouchers. However, it is not supported on the Bitcoin protocol due to insufficient scripting
capabilities.
o Untraceability: for any incoming transaction, all possible senders are equiprobable
o Unlinkability: for any two outgoing transactions, it is impossible to prove that they went to the
same person
Zcash - transactions reveal nothing about input and output addresses, using zero-knowledge
Succinct Non-interactive ARguments of Knowledge (zk-SNARKs)
MimbleWimble is currently under active development and its most popular implementation is
called Grin.
MimbleWimble sacrifices some of Bitcoin’s advanced functionality (namely Bitcoin script) for
anonymity and scalability. As with all other blockchain platforms, this brings us back to our
trilemma -- to enable the increase in privacy, there have to be sacrifices with regards to
scalability and decentralization.
Here are some of the more recent FAQ for this course:
Q: What is in a block?
A: Many students who have not taken CS198.1x have been curious about the actual contents of a
block, especially in Bitcoin, following our discussion of SegWit and other scalability solutions.
In Bitcoin, a block contains: a block header, block size, transactions, transaction counters, and a
magic number. We studied the block header extensively in lecture 3 of CS198.1x. It contains a
version number, previous block header hash, merkle root, timestamp, target difficulty, and a
nonce. These values are important for ensuring tamper-evidence. Because most blockchains aim to
provide the same tamper-evidence properties, many other blockchains have similar
structures.
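The six header fields listed above can be sketched as a small data structure. The layout below mirrors Bitcoin's 80-byte header (little-endian integers, double SHA-256), but the BlockHeader class, its field names, and the sample values are written for this example and are not taken from any Bitcoin library.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass
class BlockHeader:
    version: int
    prev_block_hash: bytes  # 32 bytes, links this block to its parent
    merkle_root: bytes      # 32 bytes, commits to every transaction in the block
    timestamp: int          # seconds since the Unix epoch
    bits: int               # compact encoding of the target difficulty
    nonce: int              # value miners vary while searching for a valid hash

    def serialize(self) -> bytes:
        """Pack the six fields into 80 bytes."""
        return struct.pack("<I32s32sIII", self.version, self.prev_block_hash,
                           self.merkle_root, self.timestamp, self.bits, self.nonce)

    def block_hash(self) -> str:
        """Double SHA-256 of the header, shown byte-reversed as block explorers do."""
        first = hashlib.sha256(self.serialize()).digest()
        return hashlib.sha256(first).digest()[::-1].hex()


# Made-up header values, for illustration only.
header = BlockHeader(1, bytes(32), bytes(32), 1_600_000_000, 0x1D00FFFF, 42)
print(len(header.serialize()), header.block_hash())
```

Changing any field, including the previous block hash or the merkle root, changes the block hash, which is what makes tampering evident to every later block.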
Q: What is “hashing”?
A: In our courses, we’ve explained “hashing” to be the process of solving proof-of-work partial
preimage hash puzzles. It’s best to think of hashing then as a lottery, in which the more you
spend, the more likely you are to win. Or perhaps like throwing darts blindfolded – you can’t
really aim your throws, so to increase your chances of getting a near-bullseye, simply increase
your throwing rate. Related questions have been about effective hash rate, wasteful hashing, and
hash functions in general. Course material can be found in CS198.1x weeks 5, 4, and 3
respectively. It may help to get a general overview of why we need to expend computational
resources in the first place. This is the topic of proof-of-work consensus and of the more general
class of Nakamoto consensus (expanding on the lottery analogy), which we define in CS198.2x
week 1.
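The lottery analogy can be seen in a toy version of the puzzle. The sketch below searches for a nonce whose hash falls under a target; it uses a single SHA-256 over arbitrary bytes and a tiny difficulty so that it finishes quickly, which are simplifications of this example rather than Bitcoin's actual parameters. The mine function name is hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def mine(data: bytes, difficulty_bits: int,
         max_nonce: int = 2**32) -> Optional[Tuple[int, str]]:
    """Try nonces in order until the hash, read as a number, is below the target."""
    target = 1 << (256 - difficulty_bits)  # more difficulty bits means a smaller target
    for nonce in range(max_nonce):
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no winning nonce found in the search range


result = mine(b"example block", difficulty_bits=12)
print(result)
```

There is no way to aim for a winning nonce; each attempt is an independent lottery ticket, so the only way to win more often is to make more attempts per second.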
A: You may have realized this in our program summary section, but the overarching narrative of
both of our courses can be seen as modeling the gradual maturing and general sentiment about
cryptocurrency and later blockchain technologies. This approach explains the necessary context
in the blockchain space, which influences design choices. Our narrative starts with CS198.1x
Bitcoin and Cryptocurrencies, explaining cryptocurrencies as the first use case for blockchain,
and Bitcoin as the original inspiration.
Our aim was to explore both the technological and social aspects of Bitcoin, and towards the end
of the first course reveal that intuition from Bitcoin’s design carries over to understanding other
blockchains in general. While the name of the first course is “Bitcoin and Cryptocurrencies,” one
of the primary motivations was to decouple the ideas of Bitcoin and cryptocurrencies from that
of blockchain. And with something to refer back to such as a solid understanding of Bitcoin, it
makes analysis of blockchain systems in general much easier.
A: There are many factors to take into account when answering this question. In this course, we
have aimed to teach a framework with which students can gauge for themselves whether or not
blockchain is useful for a given use case or industry. It mainly reduces to the question: what
fundamental properties and innovations of blockchain specifically does the use case leverage?
These are all design considerations, and the entirety of our second course should be of use to
you. Course material related to blockchain use cases can be found in CS198.1x week 6 and
CS198.2x week 3.
A: Again, our goal is to stay nonpartisan here. There will invariably be pros, cons, and
tradeoffs in the blockchain platform in question. These are all design considerations, and
hopefully our course material, especially CS198.2x week 3, has empowered you to judge these
parameters by yourself.
A: We are actively supporting this course and engaging with students in the discussion boards.
There are a lot of students in this course though, so the best way to get a question or concern
addressed is to first scan the discussion boards to see if someone has asked something similar,
upvoting the post if there is one, posting if there is not, commenting, etc. As a rough analogy to
what we’ve learned throughout this course and last, the posts with the most “work” (upvotes,
comments, activity) will be more likely to be addressed first.