
CS6456: Graduate Operating Systems
Brad Campbell – [email protected]
https://www.cs.virginia.edu/~bjc8c/class/cs6456-f19/

Some slides modified from CS162 at UCB


1
End-to-End Principle
Implementing complex functionality in the network:
• Doesn't reduce host implementation complexity
• Does increase network complexity
• Probably imposes delay and overhead on all applications, even if they don't need functionality
• However, implementing in the network can enhance performance in some cases
• e.g., very lossy link

2
Conservative Interpretation of E2E
• Don't implement a function at the lower levels of the system unless it can be completely implemented at this level
• Or: Unless you can relieve the burden from hosts, don't bother

3
Moderate Interpretation
• Think twice before implementing functionality in the
network
• If hosts can implement functionality correctly, implement
it in a lower layer only as a performance enhancement
• But do so only if it does not impose burden on
applications that do not require that functionality
• This is the interpretation we are using

• Is this still valid?
• What about Denial of Service?
• What about privacy against intrusion?
• Perhaps there are things that must be in the network?

4
Remote Procedure Call (RPC)
• Raw messaging is a bit too low-level for
programming
• Must wrap up information into message at source
• Must decide what to do with message at destination
• May need to sit and wait for multiple messages to
arrive

• Another option: Remote Procedure Call (RPC)
• Calls a procedure on a remote machine
• Client calls: remoteFileSystemRead("rutabaga");
• Translated automatically into call on server: fileSysRead("rutabaga"); (see the sketch below)
5
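To make the call translation concrete, here is a minimal sketch of the same pattern using Python's standard xmlrpc module. This is purely illustrative, not the RPC system from the slides; the host/port and the open()-based body of fileSysRead are assumptions, and the names mirror the slide's remoteFileSystemRead/fileSysRead example.

```python
# --- server side (machine B) ---
from xmlrpc.server import SimpleXMLRPCServer

def fileSysRead(path):
    # Hypothetical local file-system read; returns file contents.
    with open(path) as f:
        return f.read()

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
# Expose the local procedure under the name the client will call.
server.register_function(fileSysRead, "remoteFileSystemRead")
# server.serve_forever()   # run the server loop

# --- client side (machine A) ---
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
# Looks like a local call, but is translated into a request message,
# sent to the server, executed there, and the reply unmarshalled here.
# data = proxy.remoteFileSystemRead("rutabaga")
```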
RPC Implementation
• Request-response message passing (under covers!)
• “Stub” provides glue on client/server
• Client stub is responsible for “marshalling” arguments and
“unmarshalling” the return values
• Server-side stub is responsible for “unmarshalling”
arguments and “marshalling” the return values.

• Marshalling involves (depending on system)
• Converting values to a canonical form, serializing objects, copying arguments passed by reference, etc. (see the sketch below)

6
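As a minimal sketch of what a stub's marshalling/unmarshalling step might look like, the fragment below bundles a procedure name and arguments into a byte string and recovers them on the other side. JSON is just one possible wire format, chosen here for brevity; real RPC systems use their own encodings.

```python
import json

def marshal_call(proc_name, args):
    # Client stub: bundle the procedure name and arguments into a
    # canonical wire format (JSON here, purely for illustration).
    return json.dumps({"proc": proc_name, "args": args}).encode("utf-8")

def unmarshal_call(message):
    # Server stub: recover the procedure name and arguments.
    request = json.loads(message.decode("utf-8"))
    return request["proc"], request["args"]

msg = marshal_call("fileSysRead", ["rutabaga"])
proc, args = unmarshal_call(msg)     # ("fileSysRead", ["rutabaga"])
```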
RPC Information Flow
[Figure: RPC information flow. On Machine A, the client (caller) calls the client stub, which bundles the arguments and hands the message to the packet handler to send over the network; the reply arrives in mbox2, the stub unbundles the return values, and the call returns to the client. On Machine B, the packet handler receives the request in mbox1, the server stub unbundles the arguments and calls the server (callee); the return values are bundled and sent back.]

7
RPC Details
• Equivalence with regular procedure call
• Parameters → Request message
• Result → Reply message
• Name of Procedure: passed in request message
• Return Address: mbox2 (client return mailbox)
• Stub generator: compiler that generates stubs
• Input: interface definitions in an "interface definition language (IDL)"
• Contains, among other things, types of arguments/return values
• Output: stub code in the appropriate source language
• Code for client to pack message, send it off, wait for result, unpack result and return to caller
• Code for server to unpack message, call procedure, pack results, send them off
8
RPC Details
• Cross-platform issues:
• What if client/server machines are
different architectures/ languages?
• Convert everything to/from some canonical form
• Tag every item with an indication of how it is
encoded (avoids unnecessary conversions)

9
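One common way to get a canonical form is to send fixed-size fields in network (big-endian) byte order, with a type tag in front of each item so the receiver knows how it was encoded. The fragment below is a toy illustration of that idea using Python's struct module; the tag values and the two supported types are assumptions made up for the example.

```python
import struct

TAG_INT32, TAG_FLOAT64 = 1, 2   # illustrative type tags

def encode_int32(value):
    # '!' = network (big-endian) byte order, no padding.
    return struct.pack("!Bi", TAG_INT32, value)

def encode_float64(value):
    return struct.pack("!Bd", TAG_FLOAT64, value)

def decode(buf):
    # First byte tells us how the item was encoded.
    tag = buf[0]
    if tag == TAG_INT32:
        return struct.unpack("!i", buf[1:5])[0]
    if tag == TAG_FLOAT64:
        return struct.unpack("!d", buf[1:9])[0]
    raise ValueError("unknown tag")

assert decode(encode_int32(42)) == 42
assert decode(encode_float64(3.5)) == 3.5
```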
Problems with RPC: Non-Atomic Failures
• Different failure modes in dist. system than on a
single machine
• Consider many different types of failures
• User-level bug causes address space to crash
• Machine failure, kernel bug causes all
processes on same machine to fail
• Some machine is compromised by malicious
party
• Before RPC: whole system would crash/die
• After RPC: One machine crashes/compromised
while others keep working
• Can easily result in inconsistent view of the world
• Did my cached data get written back or not?
• Did server do what I requested or not?
10
Problems with RPC: Performance
• Cost of procedure call ≪ same-machine RPC ≪ network RPC
• Means programmers must be aware that RPC is not free
• Caching can help, but may make failure handling complex

11
12
Important “ilities”
• Availability: probability that the system can accept and process requests
• Durability: the ability of a system to recover data despite faults
• Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition)
13
Distributed: Why?
• Simple, cheaper components

• Easy to add capability incrementally

• Let multiple users cooperate (maybe)


• Physical components owned by different users
• Enable collaboration between diverse users

17
The Promise of Dist. Systems
• Availability: One machine goes down, overall system stays up
• Durability: One machine loses data, but system does not lose anything
• Security: Easier to secure each component of the system individually?

18
Distributed: Worst-Case Reality
• Availability: Failure in one machine brings down entire system
• Durability: Any machine can lose your data
• Security: More components means more points of attack

19
Distributed Systems Goal
• Transparency: Hide "distributed-ness" from any
external observer, make system simpler
• Types
• Location: Location of resources is invisible
• Migration: Resources can move without user knowing
• Replication: Invisible extra copies of resources (for
reliability, performance)
• Parallelism: Job split into multiple pieces, but looks
like a single task
• Fault Tolerance: Components fail without users
knowing
20
Challenge of Coordination
• Components communicate over the network
• Send messages between machines
• Need to use messages to agree on system state
• This issue does not exist in a centralized system

21
CAP Theorem
• Originally proposed by Eric Brewer (Berkeley)
1. Consistency – changes appear to everyone in the same sequential order
2. Availability – can get a result at any time
3. Partition Tolerance – system continues to work even when one part of the network can't communicate with the other
• Impossible to achieve all 3 at the same time (pick two)
22
CAP Theorem Example
• What do we do if a network partition occurs?
• Prefer Availability: Allow the state at some nodes to disagree with the state at other nodes (AP)
• Prefer Consistency: Reject requests until the partition is resolved (CP)
[Figure: nodes split into Partition A and Partition B, which cannot communicate with each other]
23
Consistency Preferred

• Block writes until all nodes able to agree (see the sketch below)
• Consistent: Reads never return wrong values
• Not Available: Writes block until partition is resolved and unanimous approval is possible

24
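A minimal sketch of this CP choice, assuming a hypothetical send_to_replica transport that raises ConnectionError for unreachable replicas: the write is rejected unless every replica acknowledges, so consistency is preserved at the cost of availability during a partition.

```python
class NotAvailable(Exception):
    pass

def write(key, value, replicas, send_to_replica):
    # Try to push the write to every replica.
    acks = 0
    for r in replicas:
        try:
            send_to_replica(r, ("write", key, value))
            acks += 1
        except ConnectionError:
            pass                # replica unreachable (partitioned or down)
    if acks < len(replicas):
        # CP choice: reject the write instead of letting state diverge.
        raise NotAvailable("partition suspected; write rejected")
    return "committed"
```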
What about AP Systems?
• Partition occurs, but both groups of nodes
continue to accept requests
• Consequence: State might diverge between
the two groups (e.g., different updates are
executed)
• When communication is restored, there
needs to be an explicit recovery process
• Resolve conflicting updates so everyone agrees
on system state once again
25
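As one concrete (and deliberately simple) example of such a recovery process, the sketch below uses a last-writer-wins merge over timestamped values. This is only one possible reconciliation policy, not the one any particular system uses; real systems may instead keep both versions and ask the application to merge them.

```python
def lww_merge(store_a, store_b):
    """Merge two replicas of {key: (timestamp, value)} after a partition.

    Last-writer-wins: for each key, the higher-timestamped value survives.
    """
    merged = dict(store_a)
    for key, (ts, val) in store_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

a = {"x": (10, "set in partition A")}
b = {"x": (12, "set in partition B"), "y": (5, "only in B")}
print(lww_merge(a, b))   # x comes from B (newer write), plus y
```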
General’s Paradox
• Two generals located on opposite sides of
their enemy’s position
• Can only communicate via messengers
• Messengers go through enemy territory:
might be captured

• Problem: Need to coordinate time of attack
• Two generals lose unless they attack at same time
• If they attack at same time, they win
26
General’s Paradox
• Can messages over an unreliable network
be used to guarantee two entities do
something simultaneously?
• No, even if all messages go through
[Figure: message exchange between General 1 and General 2: "11 am OK?", "Yes, 11 works", "So, 11 it is?", "Yeah, but what if you don't get this ack?" – no finite chain of acknowledgments lets both generals be certain the other will attack.]

27
Two-Phase Commit

• We can't solve the General's Paradox
• No simultaneous action
• But we can solve a related problem
• Distributed Transaction: Two (or more) machines agree to do something or not do it atomically
• Extra tool: Persistent log
• If machine fails, it will remember what happened
28
Two-Phase Commit: Setup
• One machine (coordinator) initiates the
protocol
• It asks every machine to vote on transaction
• Two possible votes:
• Commit
• Abort
• Commit transaction only if unanimous
29
Two-Phase Commit: Preparing
Agree to Commit
• Machine has guaranteed that it will accept
transaction
• Must be recorded in log so machine will remember
this decision if it fails and restarts
Agree to Abort
• Machine has guaranteed that it will never accept
this transaction
• Must be recorded in log so machine will remember
this decision if it fails and restarts
30
Two-Phase Commit: Finishing

Commit Transaction
• Coordinator learns all machines have agreed
to commit
• Apply transaction, inform voters
• Record decision in local log
Abort Transaction
• Coordinator learns at least one machine has voted to abort
• Do not apply transaction, inform voters
• Record decision in local log
31
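The sketch below condenses the decision logic from the last few slides into a single-process illustration. The workers and log objects are hypothetical stand-ins (each worker is assumed to log its vote durably before replying); this is not a full 2PC implementation with timeouts or crash recovery.

```python
def two_phase_commit(txn, workers, log):
    # Phase 1: collect votes.
    votes = []
    for w in workers:
        vote = w.vote(txn)          # worker logs its vote before replying
        votes.append(vote)
        if vote == "abort":
            break                   # one abort vote is enough to abort

    # Decision: commit only if every worker voted commit.
    unanimous = len(votes) == len(workers) and all(v == "commit" for v in votes)
    decision = "commit" if unanimous else "abort"
    log.append((txn, decision))     # coordinator records the outcome first

    # Phase 2: tell everyone the outcome.
    for w in workers:
        w.finish(txn, decision)
    return decision
```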
Example: Failure-Free 2PC

[Figure: timeline of a failure-free commit. The coordinator sends VOTE-REQ to workers 1-3, each worker replies VOTE-COMMIT, and the coordinator then sends GLOBAL-COMMIT to all workers.]
36
Example: Failure-Free 2PC

[Figure: timeline of a failure-free abort. The coordinator sends VOTE-REQ; one worker replies VOTE-ABORT while the others reply VOTE-COMMIT, so the coordinator sends GLOBAL-ABORT to all workers.]
37
Example of Worker Failure
[Figure: coordinator state machine INIT → WAIT → ABORT/COMMIT. The coordinator sends VOTE-REQ; some workers reply VOTE-COMMIT, but one worker never responds. The coordinator times out in the WAIT state and sends GLOBAL-ABORT.]
38
Example of Coordinator Failure
[Figure: worker state machine INIT → READY → ABORT/COMMIT. The coordinator sends VOTE-REQ and then crashes; a worker replies VOTE-COMMIT and blocks in READY waiting for the coordinator. Once the coordinator is restarted, it sends GLOBAL-ABORT.]
40
Paxos: fault tolerant agreement
• Paxos lets all nodes agree on the same value despite node failures, network failures, and delays
• High-level process:
• One (or more) nodes decide to be the leader
• Leader proposes a value and solicits acceptance from the others
• Leader announces the result or tries again (see the sketch below)

45
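To make the propose-and-accept step more concrete, here is a heavily simplified, single-process sketch of one round of single-decree Paxos. All class and variable names are illustrative; a real implementation must also handle message loss, retries with higher ballot numbers, and letting every node learn the chosen value.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest ballot number promised
        self.accepted = None        # (ballot, value) accepted so far

    def prepare(self, n):
        # Phase 1b: promise not to accept ballots below n.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, n, value):
        # Phase 2b: accept if no higher ballot has been promised.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return "accepted"
        return "nack"

def propose(acceptors, n, value):
    # Phase 1a: ask everyone to promise ballot n; need a majority.
    promises = [a.prepare(n) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None                                 # retry with a higher n
    # If any acceptor already accepted a value, we must propose that one.
    prior = [p[1] for p in granted if p[1] is not None]
    if prior:
        value = max(prior, key=lambda av: av[0])[1]
    # Phase 2a: ask acceptors to accept (n, value); majority => chosen.
    acks = [a.accept(n, value) for a in acceptors]
    if acks.count("accepted") > len(acceptors) // 2:
        return value
    return None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, n=1, value="commit txn 42"))
```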
Google Spanner
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman,
Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian
Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle,
Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth
Wang, and Dale Woodford. 2012. Spanner: Google's globally-distributed database. In Proceedings of
the 10th USENIX conference on Operating Systems Design and Implementation (OSDI'12). USENIX
Association, Berkeley, CA, USA, 251-264.

46
Basic Spanner Operation
• Data replicated across datacenters
• Paxos groups support transactions
• On commit:
• Grab Paxos lock
• Paxos algorithm decides consensus
• If all agree, the transaction is committed

47
Spanner Operation

[Figure: a two-phase commit coordinated across two Paxos groups; each 2PC participant is itself a Paxos-replicated group.]
48
Base operation great for writes…
• What about reads?
• Reads are dominant operations
• e.g., FB’s TAO had 500 reads : 1 write [ATC 2013]
• e.g., Google Ads (F1) on Spanner from 1? DC in
24h:
21.5B reads
31.2M single-shard transactions
32.1M multi-shard transactions
• Want efficient read transactions

49
Make Read-Only Txns Efficient
• Ideal: Read-only transactions that are non-
blocking
• Arrive at shard, read data, send data back

• Goal 1: Lock-free read-only transactions

• Goal 2: Non-blocking stale read-only txns

50
TrueTime

• "Global wall-clock time" with bounded uncertainty
• ε is the worst-case clock divergence
• Timestamps become intervals, not single values
[Figure: TT.now() returns an interval [earliest, latest] on the time axis, of width 2ε]
Consider an event e_now which invoked tt = TT.now():
Guarantee: tt.earliest <= t_abs(e_now) <= tt.latest
51
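Below is a toy model of the interval-valued clock API the slide describes, assuming a fixed worst-case divergence EPSILON. Real TrueTime derives its uncertainty from GPS and atomic-clock time masters; this sketch only shows the shape of the interface.

```python
import time
from collections import namedtuple

EPSILON = 0.004                 # assumed 4 ms worst-case clock divergence
TTInterval = namedtuple("TTInterval", ["earliest", "latest"])

def tt_now():
    t = time.time()             # this machine's (possibly skewed) clock
    return TTInterval(t - EPSILON, t + EPSILON)

tt = tt_now()
# Property being modeled: tt.earliest <= absolute time <= tt.latest
print(tt.latest - tt.earliest)  # interval width is 2 * EPSILON
```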
TrueTime for Read-Only Txns

• Assign all transactions a wall-clock commit time (s)
• All replicas of all shards track how up-to-date they are with t_safe: all transactions with s < t_safe have committed on this machine
• Goal 1: Lock-free read-only transactions
• Current time ≤ TT.now().latest
• s_read = TT.now().latest
• Wait until s_read < t_safe
• Read data as of s_read
• Goal 2: Non-blocking stale read-only txns
• Similar to the above, except explicitly choose a time in the past
• (Trades away consistency for better performance, e.g., lower latency)
52
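A minimal sketch of the Goal 1 recipe above, under an assumed clock uncertainty EPSILON; the replica object (with a t_safe attribute and a read_at snapshot-read method) is a made-up stand-in, not Spanner's actual interface.

```python
import time

EPSILON = 0.004                          # assumed clock uncertainty (s)

def tt_now_latest():
    return time.time() + EPSILON         # upper edge of the TrueTime interval

def read_only_txn(replica, keys):
    s_read = tt_now_latest()             # no locks taken anywhere
    while replica.t_safe <= s_read:      # wait until the replica catches up
        time.sleep(0.001)
    # Every transaction with commit timestamp < s_read has already been
    # applied on this replica, so the snapshot read is consistent.
    return {k: replica.read_at(k, s_read) for k in keys}
```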
Timestamps and TrueTime

[Figure: timeline of a read-write transaction. Locks are acquired, the leader picks a commit timestamp s > TT.now().latest, then performs a commit wait until TT.now().earliest > s before releasing the locks; the figure marks an average of ε on either side of s.]
• Key: Need to ensure that all future transactions will get a higher timestamp
• Commit wait ensures this

53
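The sketch below shows the commit-wait rule from the figure, again with an assumed EPSILON and hypothetical apply_writes/release_locks callbacks: pick s above TT.now().latest, then hold the locks until TT.now().earliest has passed s, so no correct clock can still show a time earlier than s once the commit is visible.

```python
import time

EPSILON = 0.004                          # assumed clock uncertainty (s)

def commit_with_wait(apply_writes, release_locks):
    s = time.time() + EPSILON + 1e-6     # commit timestamp s > TT.now().latest
    apply_writes(s)                      # record the transaction at time s
    while time.time() - EPSILON <= s:    # commit wait: TT.now().earliest <= s
        time.sleep(0.001)
    release_locks()                      # s is now in the past everywhere
    return s
```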
Commit wait
• What does this mean for performance?
• Larger TrueTime uncertainty bound → longer commit wait
• Longer commit wait → locks held longer → can't process conflicting transactions → lower throughput
• i.e., if time is less certain, Spanner is slower!

54
