Distributed Systems Notes

The document discusses key concepts in distributed systems including: 1) Distributed systems are collections of independent computers that work together to perform tasks. They face challenges of transparency, flexibility, and scaling. 2) Middleware provides services to simplify programming distributed systems and hides heterogeneity between nodes. 3) Techniques for scaling distributed systems include decentralization, hiding communication latencies, and replication. Distributed algorithms must work with partial information and node failures.

A distributed system is a collection of independent computers that are used jointly to perform a single task or to provide a single service.

The distributed nature of a system gives rise to the following challenges:
 Transparency
 Flexibility
 Reliability
 Performance
 Scalability

Network OS:
Properties
 No single system image
 Individual nodes are highly autonomous
 All distribution of tasks is explicit to the user
 Examples: Linux, Windows

Middleware OS:
 Similar to a uniprocessor OS but
 System-independent interface for distributed programming
 Improves transparency (e.g., hides heterogeneity)
 Provides services (e.g., naming services)
 Provides programming model (e.g., distributed objects)

Flexibility:
➜ Build a system out of (only) required components
➜ Extensibility: Components/services can be changed or added
➜ Openness of interfaces and specification
➜ Allows reimplementation and extension
➜ Interoperability
➜ Separation of policy and mechanism
➜ Standardized internal interfaces

Techniques for Scaling:
 Decentralisation
 Hiding communication latencies (asynchronous communication, reduce communication)
 Distribution (spreading data and control around)
 Replication (making copies of data and processes)

Decentralisation:
Avoid centralising
 Services (e.g., single server)
 Data (e.g., central directories)
 Algorithms (e.g., based on complete information)

With regards to algorithms:
➜ Do not require any machine to hold complete system state
➜ Allow nodes to make decisions based on local info
➜ Algorithms must survive failure of nodes
➜ No assumption of a global clock

Several key principles underlying all distributed systems:
 Concurrency
 Communication
 Synchronization
 Replication and Consistency
 Naming
 Fault Tolerance
 Security

Most distributed systems are built based on a particular paradigm (or model):
 Shared memory
 Distributed objects
 Distributed file system
 Shared documents
 Distributed coordination
 Agents

MISCELLANEOUS ‘RULES OF THUMB’
 Trade-offs: Many of the challenges impose conflicting requirements. For example, better scalability can cause worse overall performance. Trade-offs have to be made: what is more important?
 Separation of Concerns: Split a problem into individual concerns and address each separately.
 End-to-End Argument: Some communication functions can only be reliably implemented at the application level.
 Policy vs. Mechanism: A system should provide mechanisms that allow flexible application of policies. Avoid built-in policies.
System Architecture
⮚ Placement of machines
⮚ Placement of software on machines

Software Architecture
Logical organization and roles of software components
⮚ Layered
⮚ Object-oriented
⮚ Data-centered
⮚ Event-based

UNSTRUCTURED OVERLAY
⮚ Data stored at random nodes
⮚ Partial view: node’s list of neighbours
⮚ Exchange partial views with neighbours to update

STRUCTURED OVERLAY (see the sketch below)
⮚ Nodes have an identifier and a range
⮚ Data has an identifier
⮚ A node is responsible for data that falls in its range
⮚ Search is routed to the appropriate node

Combination of architectures:
Example:
⮚ Superpeer networks
⮚ Collaborative distributed systems
⮚ Edge-server systems
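A minimal Python sketch of the structured-overlay lookup described above, assuming consistent hashing on an identifier ring; the class and function names are illustrative and not taken from any particular DHT.

import hashlib
from bisect import bisect_left

def ring_id(key: str, bits: int = 32) -> int:
    """Hash a key or node name onto a 2**bits identifier ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

class StructuredOverlay:
    """Toy structured overlay: each node owns the identifiers between its
    predecessor's id (exclusive) and its own id (inclusive)."""

    def __init__(self, node_names):
        # Sort nodes by their position on the ring.
        self.nodes = sorted((ring_id(n), n) for n in node_names)

    def responsible_node(self, key: str) -> str:
        """Route a lookup: the first node clockwise from the key's id owns it."""
        kid = ring_id(key)
        ids = [nid for nid, _ in self.nodes]
        idx = bisect_left(ids, kid) % len(self.nodes)  # wrap around the ring
        return self.nodes[idx][1]

if __name__ == "__main__":
    overlay = StructuredOverlay(["nodeA", "nodeB", "nodeC", "nodeD"])
    print(overlay.responsible_node("user:42"))  # prints the node responsible for this key

A full structured overlay such as Chord also keeps routing state (finger tables) so a lookup takes O(log N) hops instead of needing a complete membership list.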
STATEFUL VS STATELESS SERVERS:
Stateful:
⮚ Keeps persistent information about clients
⮚ Improved performance
⮚ Expensive crash recovery
⮚ Must track clients

Stateless:
⮚ Does not keep state of clients
⮚ Soft state design: limited client state
⮚ Can change own state without informing clients
⮚ No cleanup after crash
⮚ Easy to replicate
⮚ Increased communication

Code Mobility:
Weak vs Strong Mobility
⮚ Weak: transfer only code
⮚ Strong: transfer code and execution segment

Sender vs Receiver initiated migration:
⮚ Sender: send program to a compute server
⮚ Receiver: download applets
COMMUNICATION MODES
➀ Data-oriented vs control-oriented communication
➁ Synchronous vs asynchronous communication
➂ Transient vs persistent communication

Data-Oriented vs Control-Oriented Communication:
Data-oriented communication
➜ Facilitates data exchange between threads
➜ Shared address space, shared memory & message passing
Control-oriented communication
➜ Associates a transfer of control with communication
➜ Active messages, remote procedure call (RPC) & remote method invocation (RMI)

COMMUNICATION ABSTRACTIONS:
Abstractions above simple message passing make communication easier for the programmer. Provided by higher-level APIs.
• ➀ Remote Procedure Call (RPC) & Remote Method Invocation (RMI)
• ➁ Message-Oriented Communication
• ➂ Group Communication
• ➃ Streams

REMOTE PROCEDURE CALL (RPC):
Idea: Replace the I/O-oriented message passing model by execution of a procedure call on a remote node [BN84]:
➜ Synchronous - based on blocking messages
➜ Message-passing details hidden from application
➜ Procedure call parameters used to transmit data
➜ Client calls local “stub” which does messaging and marshalling
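A minimal, hedged Python sketch of the client-stub idea above: it marshals the parameters, sends them to the remote node, blocks for the reply, and unmarshals the result. The wire format (JSON over a TCP socket) and the names rpc_call/add_stub are assumptions for illustration, not any specific RPC framework.

import json
import socket

def rpc_call(host: str, port: int, procedure: str, *args):
    """Client stub: marshal the call, send it, block for the reply, unmarshal."""
    request = json.dumps({"proc": procedure, "args": list(args)}).encode()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request + b"\n")        # marshalling + messaging
        reply = sock.makefile().readline()   # synchronous: block until the reply arrives
    response = json.loads(reply)
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["result"]

def add_stub(x: int, y: int) -> int:
    """Looks like a local procedure call; hides the messaging details."""
    return rpc_call("server.example.org", 9000, "add", x, y)

The matching server-side stub would unmarshal the request, invoke the local procedure, and marshal the result back to the caller.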
GROUP-BASED COMMUNICATION
Used for:
➜ Replication of services
➜ Replication of data
➜ Service discovery
➜ Event notification

GOSSIP-BASED COMMUNICATION

Data Streams:
➜ Sequence of data units
➜ Can apply to discrete and continuous media (e.g., a TCP connection is a stream)
➜ Simple stream: single sequence of data
➜ Complex stream: several related streams (substreams) (e.g., audio and video)

Transmission modes:
 Asynchronous: no timing constraints (e.g., file transfer)
 Synchronous: maximum end-to-end delay for each data unit (e.g., sending sampled data; it must not be too old when it reaches the destination)
 Isochronous: maximum and minimum end-to-end delay (e.g., video)

Timing model of a distributed system
Affected by:
➜ Execution speed/time of processes
➜ Communication delay
➜ Clocks & clock drift
➜ (Partial) failure
With coordination you should agree on values:
➜ Agree on global value
➜ Agree on environment
➜ Agree on state

Main issues of coordination and synchronization:
• Time and Clocks
• Global State
• Concurrency Control

EVALUATING DISTRIBUTED ALGORITHMS
General Properties:
➜ Performance
 number of messages exchanged
 response/wait time
 delay
 throughput: 1/(delay + execution time)
 complexity: O()
➜ Efficiency
 resource usage: memory, CPU, etc.
➜ Scalability
➜ Reliability
 number of points of failure (low is good)

Synchronization Modes in the NETWORK TIME PROTOCOL (NTP):
 Multicast: for LANs, low accuracy
 Procedure Call: clients poll, reasonable accuracy
 Symmetric: between peer servers, highest accuracy
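A hedged sketch of how an NTP-style request/reply estimates clock offset: the client records T1 and T4 on its own clock, the server records T2 and T3 on its clock; the offset and round-trip delay formulas below are the standard ones, simplified from what full NTP does.

def ntp_estimate(t1: float, t2: float, t3: float, t4: float):
    """Estimate clock offset and round-trip delay from one request/reply pair.

    t1: client sends request  (client clock)
    t2: server receives it    (server clock)
    t3: server sends reply    (server clock)
    t4: client receives reply (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the client clock lags the server
    delay = (t4 - t1) - (t3 - t2)          # time spent on the network, both directions
    return offset, delay

# Example: client is ~5 ms behind the server, ~20 ms round trip.
print(ntp_estimate(100.000, 100.015, 100.016, 100.021))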
Read about LOGICAL CLOCKS – Lamport’s logical clock
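A minimal sketch of a Lamport logical clock as a refresher while reading up on it: increment on every local event and send, and on receive take the max of the local counter and the message timestamp before incrementing (class name is illustrative).

class LamportClock:
    """Minimal Lamport logical clock: C := C + 1 on local/send events,
    C := max(C, received) + 1 on receive."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp, then advance."""
        self.time = max(self.time, msg_time) + 1
        return self.time

This guarantees that if event a happened-before event b then C(a) < C(b); the converse does not hold, which is what vector clocks address.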
Read about VECTOR CLOCKS
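In the same spirit, a minimal vector-clock sketch (process ids and names are illustrative): one counter per process, the own entry bumped on local events and sends, element-wise maxima merged on receive, so causality and concurrency can be detected.

class VectorClock:
    """Minimal vector clock for n processes; this process is `pid`."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.clock = [0] * n

    def tick(self):
        """Local event or message send: bump own entry."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, other: list):
        """Element-wise max with the sender's vector, then bump own entry."""
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.pid] += 1
        return list(self.clock)

def happened_before(a: list, b: list) -> bool:
    """a -> b iff a <= b element-wise and a != b; otherwise they may be concurrent."""
    return all(x <= y for x, y in zip(a, b)) and a != b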
Read about CHANDY & LAMPORT’S SNAPSHOTS

DISTRIBUTED MUTUAL EXCLUSION

Requirements:
• ➀ Safety: At most one process may execute the critical section at a time
• ➁ Liveness: Requests to enter and exit the critical section eventually succeed
• ➂ Ordering: Requests are processed in happened-before ordering

METHOD 1: CENTRAL SERVER
METHOD 2: TOKEN RING
METHOD 3: USING MULTICAST AND LOGICAL CLOCKS

Algorithm by Ricart & Agrawala:
➜ Processes pi maintain a Lamport clock and can communicate pairwise
➜ Processes are in one of three states:
 1. Released: outside of the critical section
 2. Wanted: waiting to enter the critical section
 3. Held: inside the critical section
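A hedged sketch of the Ricart & Agrawala decision rule described above, with message transport abstracted away (method and attribute names are illustrative): a process multicasts a timestamped request, defers replies while it holds the critical section or has an older pending request, and enters once all peers have replied.

from enum import Enum

class State(Enum):
    RELEASED = 1
    WANTED = 2
    HELD = 3

class RicartAgrawala:
    """Core decision logic only; sending/receiving messages is left abstract."""

    def __init__(self, pid: int, peers: set):
        self.pid = pid
        self.peers = peers            # ids of all other processes
        self.clock = 0                # Lamport clock
        self.state = State.RELEASED
        self.request_ts = None        # (timestamp, pid) of our own request
        self.replies = set()
        self.deferred = set()         # peers whose replies we are postponing

    def request_cs(self):
        self.clock += 1
        self.state = State.WANTED
        self.request_ts = (self.clock, self.pid)
        self.replies = set()
        return [("REQUEST", self.request_ts, p) for p in self.peers]  # messages to send

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts[0]) + 1
        # Defer the reply if we hold the CS, or we want it with an older request.
        if self.state == State.HELD or (self.state == State.WANTED and self.request_ts < ts):
            self.deferred.add(sender)
            return []
        return [("REPLY", sender)]

    def on_reply(self, sender):
        self.replies.add(sender)
        if self.state == State.WANTED and self.replies == self.peers:
            self.state = State.HELD   # safe to enter the critical section

    def release_cs(self):
        self.state = State.RELEASED
        to_answer, self.deferred = self.deferred, set()
        return [("REPLY", p) for p in to_answer]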
SYNCHRONISATION AND COORDINATION
• Multicast
• Elections
• Transactions

Examples of Multi-Cast:
 Fault Tolerance
 Service Discovery
 Performance
 Event or Notification propagation

Multicast ISSUES
Performance:
➜ Bandwidth
➜ Delay
Efficiency:
➜ Avoid sending a message over a link multiple times (stress)
➜ Distribution tree
➜ Hardware support (e.g., Ethernet broadcast)
Network-level vs Application-level:
➜ Network routers understand multicast
➜ Applications (or middleware) send unicasts to group members
➜ Overlay distribution tree
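A hedged illustration of the application-level option in the last bullet group: the middleware simply sends one unicast per group member. The addresses and port below are illustrative assumptions.

import socket

GROUP = [("10.0.0.2", 9100), ("10.0.0.3", 9100), ("10.0.0.4", 9100)]  # example members

def app_level_multicast(payload: bytes, members=GROUP):
    """Application-level multicast: fan the same message out as unicasts.
    The sender's uplink carries one copy per member (link stress)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for addr in members:
            sock.sendto(payload, addr)

if __name__ == "__main__":
    app_level_multicast(b"group-update")

The obvious cost is the link stress at the sender, which is exactly what an overlay distribution tree (or router-level multicast) avoids.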
Types of Multicast:
 BASIC MULTICAST
 FIFO MULTICAST
 CAUSAL MULTICAST
 TOTALLY ORDERED MULTICAST
  • Sequence based (see the sketch below)
  • Agreement-based
  • Other possibilities:
   ➜ Moving sequencer
   ➜ Logical clock based (each receiver determines the order independently)
   ➜ Token based
   ➜ Physical clock ordering (delivery based on sender timestamp ordering)
 Hybrid Ordering:
  ➜ FIFO + Total
  ➜ Causal + Total

Properties of Multicast:
 Group membership: Open vs Closed group
 Reliability
 Ordering
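A hedged sketch of the sequence-based approach to totally ordered multicast listed above, assuming it refers to a central sequencer: senders hand messages to the sequencer, which stamps a global sequence number, and receivers deliver a message only after delivering all lower-numbered ones (names are illustrative).

import itertools

class Sequencer:
    """Assigns a global sequence number to every multicast message."""

    def __init__(self):
        self._next = itertools.count()

    def order(self, msg):
        return (next(self._next), msg)   # broadcast this pair to the group

class Receiver:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0
        self.holdback = {}               # out-of-order messages wait here

    def on_message(self, seq, msg):
        self.holdback[seq] = msg
        delivered = []
        while self.expected in self.holdback:        # deliver any gap-free prefix
            delivered.append(self.holdback.pop(self.expected))
            self.expected += 1
        return delivered

# Usage: messages arriving out of order are still delivered in total order.
seq = Sequencer()
r = Receiver()
m0, m1 = seq.order("a"), seq.order("b")
print(r.on_message(*m1))   # [] : held back, waiting for seq 0
print(r.on_message(*m0))   # ['a', 'b']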
REPLICATION ISSUES
 Updates
 Replica placement
 Redirection/Routing

Replica INCONSISTENCY
 Staleness
 Operation order
 Conflicting data

Research the Bully Algorithm:
Three types of messages in the Bully algorithm (see the sketch below):
• Election: announce an election
• Answer: response to an election
• Coordinator: announce the elected coordinator
Read about the Election RING ALGORITHM
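A hedged sketch of one Bully election round using the three message types above, with transport abstracted into returned message lists (names are illustrative): a process that suspects the coordinator sends Election to all higher ids, any higher process Answers and takes over, and whoever receives no Answer before its timeout announces itself as Coordinator.

class BullyNode:
    """One process in the Bully election; the highest id wins."""

    def __init__(self, pid: int, all_ids: list):
        self.pid = pid
        self.all_ids = sorted(all_ids)
        self.coordinator = max(all_ids)
        self.got_answer = False

    def start_election(self):
        """Coordinator suspected dead: challenge every higher-numbered process."""
        self.got_answer = False
        higher = [p for p in self.all_ids if p > self.pid]
        if not higher:                          # nobody outranks us
            return self.announce()
        return [("ELECTION", self.pid, p) for p in higher]

    def on_election(self, sender: int):
        """A lower-numbered process challenged us: answer and run our own election."""
        return [("ANSWER", self.pid, sender)] + self.start_election()

    def on_answer(self, sender: int):
        self.got_answer = True                  # someone higher is alive; stand down

    def on_timeout(self):
        """No Answer arrived before the timeout: we win."""
        return [] if self.got_answer else self.announce()

    def announce(self):
        self.coordinator = self.pid
        others = [p for p in self.all_ids if p != self.pid]
        return [("COORDINATOR", self.pid, p) for p in others]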
ACID PROPERTIES OF TRANSACTIONS
 atomic
 consistent
 isolated
 durable

CLASSIFICATION OF TRANSACTIONS
 Flat
 Nested
 Distributed

TYPES OF DATA-CENTRIC CONSISTENCY MODELS:
 STRICT CONSISTENCY
 SEQUENTIAL CONSISTENCY
 CAUSAL CONSISTENCY
 WEAK CONSISTENCY
 RELEASE CONSISTENCY
 ENTRY CONSISTENCY

CLIENT-CENTRIC CONSISTENCY MODELS:
CONSISTENCY PROTOCOLS:
Primary-Based Protocols:
➜ Remote-write protocols
➜ Local-write protocols
Replicated-Write Protocols:
➜ Active Replication
➜ Quorum-Based Protocols (see the sketch below)

PUSH vs PULL

REPLICA PLACEMENT

DYNAMIC REPLICATION
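A hedged sketch of the quorum-based replicated-write idea referenced above: with N replicas choose a read quorum R and write quorum W such that R + W > N (every read overlaps the latest write) and 2W > N (two writes overlap); the version bookkeeping below is illustrative.

def valid_quorums(n: int, r: int, w: int) -> bool:
    """Quorum conditions: reads see the latest write, and concurrent writes conflict."""
    return r + w > n and 2 * w > n

class QuorumReplica:
    def __init__(self):
        self.version = 0
        self.value = None

def quorum_write(replicas, w, value):
    """Write to any W replicas with a version one past the highest seen."""
    group = replicas[:w]
    new_version = max(rep.version for rep in group) + 1
    for rep in group:
        rep.version, rep.value = new_version, value

def quorum_read(replicas, r):
    """Read R replicas and return the value with the highest version."""
    group = replicas[:r]
    newest = max(group, key=lambda rep: rep.version)
    return newest.value

# Example: N = 5, R = 3, W = 3 satisfies both conditions.
reps = [QuorumReplica() for _ in range(5)]
assert valid_quorums(5, 3, 3)
quorum_write(reps, 3, "x=1")
print(quorum_read(reps, 3))   # 'x=1', since any read quorum overlaps the write quorum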
DEPENDABILITY DEPENDS ON:
• Failure
• Reliable Communication
• Process Resilience
• Recovery

DEPENDABILITY DEPENDS ON:
• Availability: system is ready to be used immediately
• Reliability: system can run continuously without failure
• Safety: when a system (temporarily) fails to operate correctly, nothing catastrophic happens
• Maintainability: how easily a failed system can be repaired

CATEGORISING FAULTS AND FAILURES
• Types of Faults:
 • Transient Fault: occurs once then disappears
 • Intermittent Fault: occurs, vanishes, reoccurs, vanishes, etc.
 • Permanent Fault: persists until the faulty component is replaced
• Types of Failures:
 • Process Failure: process proceeds incorrectly or not at all
 • Storage Failure: “stable” secondary storage is inaccessible
 • Communication Failure: communication link or node failure

FAILURE MODELS:
• Crash Failure: a server halts, but works correctly until it halts
 • Fail-Stop: the server stops in a way that clients can tell that it has halted
 • Fail-Resume: the server stops, then resumes execution at a later time
 • Fail-Silent: clients do not know the server has halted
• Omission Failure: a server fails to respond to incoming requests
 • Receive Omission: fails to receive incoming messages
 • Send Omission: fails to send messages
• Timing Failure: a server’s response lies outside the specified time interval
• Response Failure: a server’s response is incorrect
 • Value Failure: the value of the response is wrong
 • State Transition Failure: the server deviates from the correct flow of control
• Arbitrary Failure: a server may produce arbitrary responses at arbitrary times (aka Byzantine failure)

FAULT TOLERANCE Techniques:
➜ Prevention: prevent or reduce the occurrence of faults
➜ Prediction: predict the faults that can occur and deal with them
➜ Masking: hide the occurrence of the fault
➜ Recovery: restore an erroneous state to an error-free state

STUDY Byzantine Fault Tolerance

DS CHALLENGES
 Transparency
 Flexibility
 Dependability
 Performance
 Scalability
Three designs of replication:
➜ Explicit replication: The client explicitly writes files to multiple servers (not transparent).
➜ Lazy file replication: The server automatically copies files to other servers after a file is written.
➜ Group file replication: WRITEs simultaneously go to a group of servers.

Read about case studies:
➜ Network File System (NFS)
➜ Andrew File System (AFS) & Coda
➜ Google File System (GFS)