DTUnit 1 & 2

Uploaded by

23mdts59

Distributed Technologies and Cryptocurrencies

Dr. K. Prakash
Assistant Professor - CS
PG
Unit 1
Introduction to Distributed Database

A distributed database is a collection of interrelated data that is
distributed across different locations connected by a network. Unlike a
centralized database, where all data resides in a single location, a
distributed database system stores data across multiple computers, which
can be in the same physical location or dispersed across different
geographic locations.
Advantages of Distributed Databases:
• Improved Reliability and Availability: Data replication
ensures that the system can recover from failures.
• Scalability: It can handle an increasing amount of work or
accommodate growth.
• Reduced Latency: Data can be stored closer to where it is used,
reducing access time.
• Cost Efficiency: Resources can be allocated as needed, and
cheaper hardware can be used.
Examples:
• Google Spanner: A globally distributed, strongly
consistent database.
• Apache Cassandra: A highly scalable, high-performance
distributed database designed to handle large amounts of
data.
• Amazon DynamoDB: A fully managed NoSQL database
service that provides fast and predictable performance.
Key components
• Data Distribution:
✔ Horizontal Fragmentation: Data is divided into subsets of rows and stored
across different sites.
✔ Vertical Fragmentation: Columns of a table are distributed across different
sites.
✔ Replication: Data is duplicated and stored in multiple locations to ensure high
availability and reliability.

• Transparency:
✔ Location Transparency: Users do not need to know the physical location of the
data.
✔ Replication Transparency: Users are unaware of the replication process and
the number of replicas.
✔ Fragmentation Transparency: Users do not need to know how data is
fragmented and stored.
• Consistency:
✔ Ensuring that all users see a consistent view of data, even though data
may be distributed across different sites.

• Scalability:
✔ The system can scale horizontally by adding more machines to
accommodate more data or handle more load.

• Fault Tolerance:
✔ The system can continue to function even if one or more nodes fail,
thanks to data replication and distributed architecture.

• Concurrency Control:
✔ Mechanisms are in place to manage simultaneous operations on the
database without leading to conflicts or inconsistencies.
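The data distribution options listed under Key components can be illustrated with a small sketch. The table, its column names, and the site names below are hypothetical and exist only for illustration; a real distributed database applies these ideas inside its storage layer rather than in application code.

```python
# Hypothetical employee table; rows and column names are illustrative only.
employees = [
    {"id": 1, "name": "Asha", "city": "Chennai", "salary": 50000},
    {"id": 2, "name": "Ravi", "city": "Mumbai", "salary": 62000},
    {"id": 3, "name": "Meena", "city": "Chennai", "salary": 58000},
]

def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (one fragment per site)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments

def vertical_fragment(rows, columns, key="id"):
    """Keep only the chosen columns, plus the key so rows can be rejoined."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

def replicate(fragment, sites):
    """Store an identical copy of a fragment at every listed site."""
    return {site: list(fragment) for site in sites}

by_city = horizontal_fragments(employees, "city")          # subsets of rows
public_cols = vertical_fragment(employees, ["name", "city"])  # subset of columns
payroll_cols = vertical_fragment(employees, ["salary"])
copies = replicate(by_city["Chennai"], ["site_a", "site_b"])
```

Each vertical fragment keeps the `id` column so that the original rows can be rebuilt by joining fragments on that key.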
The Two Generals Problem
The Two Generals Problem, the Byzantine Generals Problem, and fault
tolerance are fundamental concepts in distributed computing and systems
design. They address the challenges of ensuring reliability and
consistency in systems with multiple components, especially when
communication between these components can be unreliable or
compromised.

The Two Generals Problem is a thought experiment in
computer science and communication theory that illustrates the
challenges of achieving coordinated action in a distributed system
where communication is unreliable.
Byzantine Generals Problem
The Byzantine Generals Problem is a more general
and complex scenario that involves multiple parties, some
of whom may act maliciously or be unreliable.

It was introduced by Leslie Lamport, Robert Shostak, and
Marshall Pease in 1982 to describe the difficulty of reaching
a consensus in the presence of faulty or malicious
components.
Fault Tolerance
Fault tolerance is the property of a system to continue
operating properly in the event of the failure of some of its
components. In distributed systems, fault tolerance is
crucial because individual components (like servers or
network links) can fail, and the system must continue to
operate despite these failures.
Blockchain Technology
What is blockchain?
Blockchain is a record-keeping technology designed to
make it extremely difficult to hack the system or forge the data
stored on the blockchain, thereby making it secure and
effectively immutable. It's a type of distributed ledger technology (DLT),
a digital record-keeping system for recording transactions
and related data in multiple places at the same time.

Each computer in a blockchain network maintains a copy of
the ledger where transactions are recorded to prevent a
single point of failure. All copies are updated and
validated together, so they stay consistent with one another.
• Blockchain is also considered a type of database, but it
differs substantially from conventional databases in how it
stores and manages information.
• Instead of storing data in rows, columns, tables and files as
traditional databases do, a blockchain stores data in blocks that
are digitally chained together.
• In addition, a blockchain is a decentralized database
managed by computers belonging to a peer-to-peer
network instead of a central computer like in traditional
databases.
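The idea of blocks that are "digitally chained together" can be sketched in a few lines. This is a minimal illustration, assuming each block simply records the SHA-256 hash of the block before it; the field names (`index`, `data`, `prev_hash`) are hypothetical, and real blockchains add consensus, signatures, and much more.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over the block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, data):
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def is_valid(chain):
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
for payment in ["A pays B 5", "B pays C 2", "C pays A 1"]:
    add_block(chain, payment)

print(is_valid(chain))              # True
chain[1]["data"] = "B pays C 200"   # tamper with an earlier block
print(is_valid(chain))              # False
```

Changing an earlier block changes its hash, so the link stored in the next block no longer matches. This is why tampering is detectable without any central authority.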
Applications
• Bitcoin, launched in 2009 on the Bitcoin blockchain, was
the first cryptocurrency and popular application to
successfully use blockchain. Blockchain has been most
often associated with Bitcoin and alternatives such as
Dogecoin and Bitcoin Cash, which both use public ledgers.

• Logistics companies use blockchain to track and trace
goods as they move through the supply chain.
• Government central banks and the global financial
community have been testing blockchain technology as a
foundation for currency exchange.
• The legal community and the entertainment industry are using
blockchain as the basis for smart contracts and other mechanisms for
transferring and protecting intellectual property rights.
• Companies and other organizations are using blockchain-based
applications as a secure and cost-effective way to
create and manage a distributed database and maintain
records for digital transactions of all types.
• Blockchain is increasingly viewed as a way of securely
tracking and sharing data between multiple business
entities.
Distributed File System
A distributed file system (DFS) is a type of computer file
system that enables users to store and access data from
multiple distributed locations.
It is a way of sharing information between different
computers in a distributed environment.
By using a distributed file system, users can access the
same data from multiple sources and can access the data
even if one or more sources are unavailable.
Distributed file systems are an important part of any
organization’s data storage and access needs.
The design of the system should be based on the
principles of scalability, availability, reliability, performance,
and security.
HDFS (Hadoop Distributed File System)
HDFS is designed to be a highly reliable file system.
Its design provides storage for extremely large files with a
streaming data access pattern, and it runs on commodity hardware.
Features:
• Extremely large files:
Data in the range of petabytes (1 PB = 1000 TB).
• Streaming Data Access Pattern:
Designed on the principle of write-once, read-many-times. Once
data is written, large portions of the dataset can be processed any
number of times.
• Commodity hardware:
Hardware that is inexpensive and easily available in the market.
HDFS Cluster
NameNode (Master Node):
• Runs on the master node.
• Manages all the slave nodes and assigns work to them.
• Stores metadata (data about data) such as file paths, the number of blocks, and block IDs.
• Executes filesystem namespace operations like opening, closing, and renaming files
and directories.
• Requires a large amount of RAM.
• Needs fast retrieval to reduce seek time.

DataNode (Slave Node):
• Runs on slave nodes.
• Does the actual work, such as reading, writing, and processing data.
• Also performs block creation, deletion, and replication upon instruction from the master.
• Can be deployed on commodity hardware.
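The split between NameNode and DataNode responsibilities can be sketched as a toy model. This is not the Hadoop API: the class, the tiny `BLOCK_SIZE`, the `REPLICATION` value, and the round-robin placement are all assumptions chosen to keep the example short.

```python
import itertools

BLOCK_SIZE = 4    # bytes per block; tiny on purpose (real HDFS blocks are far larger)
REPLICATION = 2   # copies of each block; an assumed value for this sketch

class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    def __init__(self, datanodes):
        # Assumes at least REPLICATION DataNodes, so replicas land on distinct nodes.
        self.datanodes = datanodes
        self.files = {}                      # path -> list of block IDs
        self.locations = {}                  # block ID -> list of DataNode names
        self._ids = itertools.count(1)
        self._next_node = itertools.cycle(sorted(datanodes))

    def write(self, path, data):
        self.files[path] = []
        for start in range(0, len(data), BLOCK_SIZE):
            block_id = next(self._ids)
            targets = [next(self._next_node) for _ in range(REPLICATION)]
            for name in targets:             # DataNodes store the actual bytes
                self.datanodes[name][block_id] = data[start:start + BLOCK_SIZE]
            self.files[path].append(block_id)
            self.locations[block_id] = targets

    def read(self, path):
        # Look up block IDs, then fetch each block from its first replica.
        return b"".join(
            self.datanodes[self.locations[b][0]][b] for b in self.files[path]
        )

datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}   # name -> {block ID: bytes}
namenode = NameNode(datanodes)
namenode.write("/logs/app.txt", b"hello hdfs!")
print(namenode.files["/logs/app.txt"])          # [1, 2, 3]
print(namenode.read("/logs/app.txt"))           # b'hello hdfs!'
```

The NameNode in this sketch never holds file contents, only the block list and block locations, which mirrors the metadata role described above.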
Distributed Hash Table (DHT)
A distributed hash table (DHT) is a type of distributed
system that provides a lookup service similar to a hash table.

Data is stored and retrieved using keys, and the keys are
used to determine the location of the data in the table.
The data is distributed across multiple nodes in a
network rather than being stored in a single table.
Each node is responsible for storing and managing a
portion of the data.
For example, when a client wants to retrieve or store data,
it sends a request to the network. The request is then
forwarded to the appropriate node based on the key of the
data being requested. The node then responds to the
request and either retrieves or stores the data.
Applications
• Peer-to-peer (P2P) networks −
Facilitate the sharing of resources, such as files or data, between peers.

• Distributed databases −
Provide a scalable and efficient way to store and retrieve large amounts of data.

• Distributed file systems −
Store and manage files in a distributed file system. Provide a scalable and
fault-tolerant way to store and access large amounts of data.

• Content delivery networks −
Store and distribute content, such as videos or images, across a network of
servers. This can help to reduce the load on a single server and improve the
performance of the network.
Working Concept

A distributed hash table (DHT) is a decentralized storage
system that provides lookup and storage schemes similar to
a hash table, storing key-value pairs.

Each node in a DHT is responsible for keys along with the
mapped values. Any node can efficiently retrieve the value
associated with a given key.

Just like in hash tables, values mapped against keys in a
DHT can be any arbitrary form of data.
DHTs have the following properties:

• Decentralised & Autonomous: Nodes collectively form the system without any
central authority.
• Fault Tolerant: The system remains reliable even with many nodes joining, leaving,
and failing at all times.
• Scalable: The system should function efficiently even with thousands or millions of
nodes.

DHTs support the following 2 functions:

• put (key, value)


• get (key)

The nodes in a DHT are connected together through an overlay network in which
neighboring nodes are connected. This network allows the nodes to find any given key
in the key-space.
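The put/get interface and the mapping of keys to nodes can be sketched as follows. This is a simplified illustration that assumes a ring-style key-space in which a key belongs to the first node at or after its hashed position; the class and node names are hypothetical. A real DHT routes each request hop by hop through the overlay network, whereas this sketch keeps a global view of all nodes for brevity.

```python
import bisect
import hashlib

def key_id(text):
    """Map a string to a position in the key-space using SHA-1."""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16)

class TinyDHT:
    def __init__(self, node_names):
        # Each node sits at a position on a ring and has its own local store.
        self.ring = sorted((key_id(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        self.stores = {n: {} for n in node_names}

    def node_for(self, key):
        """The first node at or after the key's position, wrapping around."""
        i = bisect.bisect_left(self.positions, key_id(key)) % len(self.ring)
        return self.ring[i][1]

    def put(self, key, value):
        self.stores[self.node_for(key)][key] = value

    def get(self, key):
        return self.stores[self.node_for(key)].get(key)

dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.put("song.mp3", "stored bytes")
print(dht.get("song.mp3"))        # stored bytes
print(dht.get("missing.txt"))     # None
```

Because the responsible node is computed from the key alone, any participant can locate a value without asking a central directory.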
Cryptography
Cryptography is the technique of concealing or
encoding (changing the original form of) information in
such a way that only an authorized person can
decode it (recover the original form). It
plays an important role in keeping our data
safe. The data or information can be bank card details, computer
passwords, online transactions, or other private data.

Cryptography is very important in the modern world
because it helps to protect digital information from hackers by
turning it into a secret code.
Applications of Cryptography
• Securing communication by encrypting messages and emails.

• Protecting data in applications by securing user data, like passwords and personal
information.

• Securing confidential files and documents.

• E-commerce platforms - Securing online transactions and payment information.

• Blockchain technology - Ensuring the security and integrity of transactions in
blockchain-based systems.

• Password protection for storing and managing passwords securely.

• Digital signatures - Verifying the authenticity of digital messages or documents.

Purpose of Cryptography

• Confidentiality

• Integrity

• Authentication

• Non-repudiation
Cryptography usage areas
• Authentication

• Internet of Things

• Card Payments

• Computer and other passwords

• Ecommerce Websites

• Digital currency

• Secure file storage

• Digital Signatures
Top languages for Cryptography
• Python

• Go

• Ruby

• C++

• C#

• Java

• PHP
Hash Functions
A hash function is a cryptographic algorithm that takes an input (or "message") and
returns a fixed-size string of bytes. The output, typically called a hash value or digest, is
for practical purposes unique to each input: different inputs can in principle share a
digest, but finding such a pair should be infeasible.

The key properties are:

• Deterministic: The same input will always produce the same output.

• Quick computation: The hash value can be computed quickly.

• Pre-image resistance: Given a hash value, it should be computationally infeasible to find
the original input.

• Small changes in input drastically change the output: Even a tiny alteration to the input
should produce a vastly different hash.

• Collision resistance: It should be infeasible to find two different inputs that produce the
same hash output.
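Several of these properties can be observed directly with Python's standard `hashlib` module. The sketch below uses SHA-256 as one example of a cryptographic hash function; the helper names are illustrative.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode()).hexdigest()

def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex("blockchain")
b = sha256_hex("blockchain")      # same input again
c = sha256_hex("Blockchain")      # one character changed

print(len(a) * 4)                 # 256: the digest size is fixed
print(a == b)                     # True: deterministic
print(a == c)                     # False: a small change alters the digest
print(differing_bits(a, c))       # typically close to half of the 256 bits
```

Pre-image and collision resistance cannot be demonstrated by running code; they are claims about how infeasible the reverse searches are.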
Features of Hash Functions in Cryptography
• Data Integrity: Ensuring that data has not been altered
during transmission or storage.
• Password Storage: Storing passwords as hashes rather
than plaintext, which enhances security.
• Digital Signatures: Hash functions play a critical role in
creating digital signatures.
Digital Signatures

A digital signature is a cryptographic technique that allows
someone to sign digital data in a way that the signature can
be verified by anyone, ensuring the authenticity and
integrity of the signed data.

This process involves two key steps:

• Signing: The sender generates a hash of the message (or
document) and encrypts it with their private key. This
encrypted hash, along with the hash function used, forms
the digital signature.
• Verification: The recipient decrypts the signature using
the sender's public key, obtaining the hash value. They
then hash the received message using the same hash
function and compare the two values; if they match, the
signature is valid.
Digital signatures - Features

• Authentication: Ensures that the sender of the message is
who they claim to be.
• Integrity: Verifies that the message has not been altered in
transit.
• Non-repudiation: The sender cannot deny having sent the
message, as only they possess the private key needed to
create the signature.
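The hash-then-sign flow in this section can be sketched with textbook RSA. This is a toy only: the primes are tiny, there is no padding, and the hash is reduced modulo the key so that it fits, all of which are assumptions made for readability. Real systems use a vetted cryptographic library with keys of 2048 bits or more.

```python
import hashlib

# Textbook RSA with tiny primes, for illustration only.
P, Q = 61, 53
N = P * Q                            # public modulus (3233)
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def digest(message):
    """Hash the message, reduced mod N so it fits this toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message):
    """Signer: transform the hash with the private key."""
    return pow(digest(message), D, N)

def verify(message, signature):
    """Verifier: recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest(message)

sig = sign(b"pay Bob 5 coins")
print(verify(b"pay Bob 5 coins", sig))    # True
```

With a modulus this small, unrelated messages can share a reduced hash, so the sketch shows only the mechanics of signing and verifying, not the security.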
Unit 2
Advantages and Differences: Blockchain vs. Database
A BLOCKCHAIN is a database or ledger that stores
information in a data structure called blocks. It is based on
distributed ledger technology, which can be used between
parties that don't trust each other with data.

A DATABASE is a kind of central ledger where the
administrator manages everything. The administrator
grants rights to read, write, update, or delete data. Since
it is centralized in nature, maintenance is easy and
throughput is high. The drawback is that a centralized
database, if corrupted, can compromise all of the data and can even
change the ownership of digital records.
Blockchain Network
A blockchain network is a shared database that uses a
chain of blocks to store and track data across a network.
The blocks are linked together in chronological order
using cryptography, making the data immutable and secure.

It is a structure that stores public transactional records,
known as "blocks," in several databases,
known as the "chain," in a network connected through
peer-to-peer nodes.
Typically, this storage is referred to as a "digital ledger."
Types
• Public blockchain is where cryptocurrency like Bitcoin
originated and helped to popularize distributed ledger
technology (DLT).
• A private blockchain works in a restrictive environment like a
closed network or is under the control of a single entity.
• Hybrid blockchain combines elements of both private and
public blockchain. It lets organizations set up a private,
permission-based system alongside a public permissionless
system, allowing them to control who can access specific data
stored in the blockchain, and what data will be opened up
publicly.
• The fourth type of blockchain, consortium blockchain, also
known as a federated blockchain, is similar to a hybrid
blockchain in that it has private and public blockchain features.
But it's different in that multiple organizational members
Unit 3
Distributed consensus
It is the process by which multiple nodes in a distributed
system (such as a blockchain network) agree on a single
version of truth or the state of the network.

Achieving consensus is crucial for ensuring that all nodes in


a decentralized network maintain a consistent record of
transactions or data, despite potential faults, delays, or
malicious behavior.
Key Aspects of Distributed Consensus
• Agreement
All honest nodes must agree on the same value or outcome,
ensuring consistency across the network.
• Fault Tolerance
The system must be resilient to faults, such as hardware
failures, network issues, or malicious attacks (Byzantine faults).
Consensus algorithms are designed to function even when some
nodes fail or behave unpredictably.
• Decentralization
Unlike centralized systems where a single authority makes
decisions, distributed consensus relies on the participation of
many nodes, which helps improve security and trust in the system.
Types of Consensus Algorithms
• Proof of Work (PoW)
Used in Bitcoin and other blockchains, this requires nodes
(miners) to solve complex mathematical puzzles to propose a new
block.
The solution demonstrates computational effort, ensuring the
proposer has invested resources.

• Merits
Secure and robust against attacks, as altering the blockchain
would require significant computational power.

• Demerits
Energy-intensive and slower, leading to scalability issues.
• Proof of Stake (PoS)
Nodes (validators) are chosen to propose new blocks based on
the number of coins they hold and are willing to "stake" as
collateral.
Validators are incentivized to act honestly, as dishonest
behavior can result in a loss of their staked coins.

• Merits
Energy-efficient and can offer faster transaction processing
compared to PoW.

• Demerits
May lead to centralization if a few holders control most of the
staked coins.
• Byzantine Fault Tolerance (BFT)
BFT algorithms are designed to reach consensus even if
some nodes act maliciously or provide incorrect
information.
Practical Byzantine Fault Tolerance (PBFT) is an example
that ensures a system can tolerate Byzantine faults and
continue functioning correctly.

• Merits
Fast finality and high efficiency for smaller, permissioned
networks.

• Demerits
Less scalable for larger, public networks.
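The Proof of Work puzzle described under Types of Consensus Algorithms can be sketched as a search for a nonce. This is a simplification, assuming the condition is "the SHA-256 hash begins with a given number of zero hex digits"; Bitcoin itself hashes a structured block header twice and compares the result against a numeric target. The function names are illustrative.

```python
import hashlib

def mine(block_data, difficulty):
    """Try nonces until the SHA-256 hash starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        h = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if h.startswith(target):
            return nonce, h
        nonce += 1

def check(block_data, nonce, difficulty):
    """Verification is a single hash, however long the search took."""
    h = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return h.startswith("0" * difficulty)

nonce, found_hash = mine("A pays B 5", difficulty=4)
print(nonce, found_hash)
print(check("A pays B 5", nonce, 4))   # True
```

Each extra zero digit multiplies the expected number of attempts by 16, while checking a claimed solution stays cheap. That asymmetry is what makes rewriting past blocks costly.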
Importance of Distributed Consensus
• Security
Consensus mechanisms prevent double-spending and
ensure that only valid transactions are added to the
blockchain.
• Decentralization
With consensus, decision-making is distributed across
multiple nodes rather than controlled by a central authority.
• Fault Tolerance
Consensus mechanisms help the system remain
operational even when some nodes fail or act maliciously.
Life of Blockchain Applications
Miners
Miners are participants in a blockchain network who use
their computational power to validate and verify
transactions, add them to the blockchain, and secure the
network.

In Proof of Work (PoW) blockchains, like Bitcoin, miners
play a crucial role in maintaining the integrity of the
distributed ledger.
Roles and Responsibilities of Miners

• Transaction Validation: Miners collect transactions
broadcast by users in the network and verify their
validity.
• Block Creation: Once transactions are validated, miners
group them into a block, which is then added to the
blockchain.
• Solving Cryptographic Puzzles: In PoW blockchains,
miners compete to solve a complex mathematical puzzle,
which involves finding a specific value that makes the
hash of the block meet certain conditions.
This process is called "mining."
• Securing the Network: The computational power miners
provide helps secure the network against attacks, like
double-spending, by making it very costly and difficult for
Validators
Validators are participants in a blockchain network who are
responsible for validating transactions and creating new
blocks, particularly in Proof of Stake (PoS) and other
consensus mechanisms that do not rely on mining.

Unlike miners, validators are selected based on their stake
in the network rather than their computational power.
Roles and Responsibilities of Validators
• Transaction Validation: Validators check the validity of
transactions broadcast in the network, ensuring that
they meet all the required rules (e.g., verifying that the
sender has sufficient funds).

• Proposing and Attesting to Blocks: Validators take turns
proposing new blocks of transactions to add to the
blockchain. Other validators then confirm the validity of
the proposed block (a process called attesting).
• Securing the Network: By validating transactions and
creating blocks, validators help secure the network and
maintain its consensus. Their actions prevent malicious
behavior like double-spending and ensure the blockchain
remains up-to-date.
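Stake-based selection can be sketched as a weighted random draw. The validator names, stake amounts, and the `slash` penalty rule below are hypothetical; real Proof of Stake protocols use verifiable randomness and more detailed penalty rules.

```python
import random

# Hypothetical validators and stakes, in coins.
stakes = {"alice": 50, "bob": 30, "carol": 20}

def pick_proposer(stakes, rng):
    """Choose one validator with probability proportional to its stake."""
    names = list(stakes)
    return rng.choices(names, weights=[stakes[n] for n in names], k=1)[0]

def slash(stakes, validator, fraction):
    """Penalize dishonest behavior by removing part of the validator's stake."""
    penalty = stakes[validator] * fraction
    stakes[validator] -= penalty
    return penalty

rng = random.Random(42)            # fixed seed so runs are repeatable
rounds = 10_000
wins = {name: 0 for name in stakes}
for _ in range(rounds):
    wins[pick_proposer(stakes, rng)] += 1

# Shares of proposals come out roughly in proportion to stake (0.5, 0.3, 0.2).
print({n: round(w / rounds, 2) for n, w in wins.items()})
```

No puzzle is solved at any point, which is why this style of selection uses far less energy than mining.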
Difference Between Validators and Miners
• Mining vs. Staking:
Miners use computational power to solve cryptographic
puzzles and propose blocks (Proof of Work), while
validators are selected based on their stake in the network
and attest to block validity (Proof of Stake).
• Energy Consumption:
Validators typically consume much less energy compared
to miners since they do not rely on solving computationally
intensive puzzles.
• Consensus Mechanism:
Miners work in PoW blockchains (e.g., Bitcoin), whereas
validators work in PoS and related consensus models (e.g.,
Ethereum 2.0, Cardano).
