
Slides 04

This chapter discusses communication in distributed systems. It covers layered network protocols like the physical, data link, network, and transport layers. The transport layer provides the actual communication facilities for most distributed systems, with TCP and UDP being standard protocols. Middleware provides common services like communication protocols, (un)marshaling, naming, security, and scaling. Client/server communication is typically transient and synchronous, while message-oriented middleware aims for asynchronous communication using messages. Remote procedure call (RPC) hides communication between caller and callee using a procedure call mechanism. Parameter passing and asynchronous/multiple RPCs are also discussed.

Uploaded by

Susu Melody
Copyright
© All Rights Reserved

Distributed Systems

(4th edition, version 01)

Chapter 04: Communication


Communication Foundations

Basic networking model

Drawbacks
• Focus on message-passing only
• Often unneeded or unwanted functionality
• Violates access transparency
Layered Protocols

Low-level layers
Recap
• Physical layer: contains the specification and implementation of bits,
and their transmission between sender and receiver
• Data link layer: prescribes the transmission of a series of bits into a
frame to allow for error and flow control
• Network layer: describes how packets in a network of computers are
to be routed.
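Framing with error control can be illustrated in a few lines. The sketch below assumes a made-up frame layout (a 2-byte length header, the payload, and a 4-byte CRC-32 trailer computed with Python's `zlib.crc32`); real data link protocols define their own formats and usually compute checksums in hardware. It only shows how a receiver detects a corrupted frame.

```python
import struct
import zlib

def make_frame(payload: bytes) -> bytes:
    """Wrap a payload in a hypothetical frame: length header, payload, CRC-32 trailer."""
    header = struct.pack("!H", len(payload))           # 2-byte payload length
    trailer = struct.pack("!I", zlib.crc32(payload))   # 4-byte checksum over the payload
    return header + payload + trailer

def check_frame(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    (length,) = struct.unpack("!H", frame[:2])
    payload = frame[2:2 + length]
    (crc,) = struct.unpack("!I", frame[2 + length:6 + length])
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: frame was corrupted in transit")
    return payload

frame = make_frame(b"hello")
print(check_frame(frame))            # b'hello'

damaged = bytearray(frame)
damaged[3] ^= 0x01                   # flip one bit inside the payload
try:
    check_frame(bytes(damaged))
except ValueError as err:
    print(err)                       # the receiver detects the error
```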

Observation
For many distributed systems, the lowest-level interface is that of the
network layer.


Transport Layer
Important
The transport layer provides the actual communication facilities for most
distributed systems.

Standard Internet protocols


• TCP: connection-oriented, reliable, stream-oriented
communication
• UDP: unreliable (best-effort) datagram communication
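The difference between the two services is visible in the socket API. A minimal sketch on the loopback interface: each UDP `sendto()` is a self-contained datagram and each `recvfrom()` returns exactly one of them, so message boundaries are preserved, whereas TCP hands the application a continuous byte stream. Nothing guarantees that a datagram arrives, hence the timeout; the helper name `udp_exchange` is illustrative.

```python
import socket

def udp_exchange(payloads):
    """Send each payload as a separate UDP datagram over loopback and collect what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))            # port 0: let the OS choose a free port
    receiver.settimeout(2.0)                   # best-effort delivery: never wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for p in payloads:
            sender.sendto(p, receiver.getsockname())   # no connection setup needed
        received = []
        for _ in payloads:
            data, _addr = receiver.recvfrom(1024)      # one call returns one whole datagram
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()

print(udp_exchange([b"ping", b"pong"]))   # two separate messages, if none is lost
```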


Middleware layer
Observation
Middleware was invented to provide common services and protocols that can
be used by many different applications:
• A rich set of communication protocols
• (Un)marshaling of data, necessary for integrated systems
• Naming protocols, to allow easy sharing of resources
• Security protocols for secure communication
• Scaling mechanisms, such as for replication and caching

Note
What remains are truly application-specific protocols... such as?


An adapted layering scheme


Types of communication
Distinguish...

• Transient versus persistent communication
• Asynchronous versus synchronous communication

Types of Communication

Types of communication
Transient versus persistent

• Transient communication: a communication server discards a message when
it cannot be delivered at the next server or at the receiver.
• Persistent communication: a message is stored at a communication
server as long as it takes to deliver it.
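The two policies differ only in what the communication server does with an undeliverable message. A toy sketch, where `CommServer` and its methods are hypothetical names and an in-memory list stands in for the receiver's inbox:

```python
from collections import deque

class CommServer:
    """Toy communication server (hypothetical) relaying messages to one receiver."""

    def __init__(self, persistent: bool):
        self.persistent = persistent
        self.stored = deque()                  # buffer used only in persistent mode

    def relay(self, message, receiver_online: bool, inbox: list):
        if receiver_online:
            inbox.append(message)              # deliver right away
        elif self.persistent:
            self.stored.append(message)        # keep it as long as it takes
        # transient mode: the undeliverable message is simply discarded

    def receiver_back_online(self, inbox: list):
        while self.stored:
            inbox.append(self.stored.popleft())

for persistent in (False, True):
    server, inbox = CommServer(persistent), []
    server.relay("m1", receiver_online=False, inbox=inbox)
    server.receiver_back_online(inbox)
    print("persistent" if persistent else "transient", inbox)
```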


Types of communication
Places for synchronization

• At request submission
• At request delivery
• After request processing


Client/Server
Some observations
Client/Server computing is generally based on a model of
transient synchronous communication:
• Client and server have to be active at the time of
communication
• Client issues request and blocks until it receives reply
• Server essentially waits only for incoming requests, and
subsequently processes them

Drawbacks of synchronous communication
• Client cannot do any other work while waiting for reply
• Failures have to be handled immediately: the client is waiting
• The model may simply not be appropriate (mail, news)


Messaging
Message-oriented middleware
Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
• Middleware often ensures fault tolerance

Communication Remote procedure call

Basic RPC operation


Observations
• Application developers are familiar with simple procedure model
• Well-engineered procedures operate in isolation (black box)
• There is no fundamental reason not to execute procedures on a
separate machine

Conclusion
Communication between caller & callee can be hidden by using a
procedure-call mechanism.


Basic RPC operation

1. Client procedure calls client stub.
2. Stub builds message; calls local OS.
3. OS sends message to remote OS.
4. Remote OS gives message to stub.
5. Stub unpacks parameters; calls server.
6. Server does local call; returns result to stub.
7. Stub builds message; calls OS.
8. OS sends message to client’s OS.
9. Client’s OS gives message to stub.
10. Client stub unpacks result; returns to client.
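The ten steps can be traced in a small hand-written RPC. This is a sketch under simplifying assumptions: `socket.socketpair()` stands in for the network and both operating systems, JSON lines serve as the message format, and `add`, `ClientStub`, and `server_stub` are illustrative names. Comments refer to the step numbers above.

```python
import json
import socket
import threading

def add(a, b):
    """The procedure called remotely: an ordinary local function at the server."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(sock):
    """Steps 4-8: receive a request, unpack it, call locally, send back the result."""
    rfile, wfile = sock.makefile("r"), sock.makefile("w")
    for line in rfile:                                          # step 4: message arrives
        request = json.loads(line)                              # step 5: unpack parameters
        result = PROCEDURES[request["proc"]](*request["args"])  # step 6: local call
        wfile.write(json.dumps({"result": result}) + "\n")      # step 7: build reply
        wfile.flush()                                           # step 8: hand reply to the OS

class ClientStub:
    """Steps 2-3 and 9-10: looks like a local procedure to the caller."""

    def __init__(self, sock):
        self.rfile, self.wfile = sock.makefile("r"), sock.makefile("w")

    def add(self, a, b):
        self.wfile.write(json.dumps({"proc": "add", "args": [a, b]}) + "\n")  # step 2
        self.wfile.flush()                          # step 3: the OS sends the message
        reply = json.loads(self.rfile.readline())   # step 9: block until the reply arrives
        return reply["result"]                      # step 10: return to the caller

client_end, server_end = socket.socketpair()        # stands in for the network
threading.Thread(target=server_stub, args=(server_end,), daemon=True).start()
stub = ClientStub(client_end)
print(stub.add(3, 4))                               # step 1: reads like a local call; prints 7
```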


RPC: Parameter passing


There’s more than just wrapping parameters into a message
• Client and server machines may have different data representations
(think of byte ordering)
• Wrapping a parameter means transforming a value into a sequence
of bytes
• Client and server have to agree on the same encoding:
  • How are basic data values represented (integers, floats, characters)?
  • How are complex data values represented (arrays, unions)?

Conclusion
Client and server need to properly interpret messages, transforming them
into machine-dependent representations.
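Python's `struct` module makes the byte-ordering problem and its usual fix explicit. The first part shows one integer in both byte orders and what happens when the receiver guesses wrong; the second sketches marshaling with an agreed encoding. The record layout (`RECORD`: a 32-bit account number, a 64-bit float, and a length-prefixed UTF-8 name, all in network byte order) is a hypothetical example, not a standard.

```python
import struct

value = 0x01020304
little = struct.pack("<I", value)       # little-endian layout (e.g., x86)
big = struct.pack(">I", value)          # big-endian layout, also network byte order
print(little.hex(), big.hex())          # 04030201 01020304

# A receiver that assumes the wrong byte order silently gets a different number
misread = struct.unpack(">I", little)[0]
print(hex(misread))                     # 0x4030201

RECORD = "!idH"   # hypothetical agreed encoding: int32, float64, uint16 name length

def marshal(account: int, balance: float, owner: str) -> bytes:
    name = owner.encode("utf-8")                        # characters: agree on UTF-8
    return struct.pack(RECORD, account, balance, len(name)) + name

def unmarshal(data: bytes):
    account, balance, n = struct.unpack_from(RECORD, data)
    start = struct.calcsize(RECORD)                     # 14 bytes: no padding with "!"
    return account, balance, data[start:start + n].decode("utf-8")

print(unmarshal(marshal(42, 99.5, "Zoë")))              # (42, 99.5, 'Zoë')
```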

Parameter passing

RPC: Parameter passing


Some assumptions
• Copy in/copy out semantics: while the procedure is executing, nothing can
be assumed about parameter values.
• All data that is to be operated on is passed by parameters.
This excludes passing references to (global) data.


Conclusion
Full access transparency cannot be realized.
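Aliasing shows why copy in/copy out is not the same as a local call by reference. The sketch simulates both semantics with an illustrative `Cell` class standing in for a memory location, and passes the same variable twice to a procedure that increments both of its parameters:

```python
class Cell:
    """A mutable box standing in for a memory location (illustrative)."""
    def __init__(self, value):
        self.value = value

def increment_both(x: Cell, y: Cell):
    x.value += 1
    y.value += 1

def call_by_reference(proc, *cells):
    proc(*cells)                               # works directly on the caller's memory

def call_copy_in_copy_out(proc, *cells):
    copies = [Cell(c.value) for c in cells]    # copy in: each parameter gets its own copy
    proc(*copies)
    for original, copy in zip(cells, copies):
        original.value = copy.value            # copy out, in parameter order

a = Cell(0)
call_by_reference(increment_both, a, a)
b = Cell(0)
call_copy_in_copy_out(increment_both, b, b)
print(a.value, b.value)                        # 2 1
```

By reference, both increments hit the same location; with copy in/copy out, each copy is incremented once and the last copy-out wins. A caller relying on local semantics would notice the difference.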


A remote reference mechanism enhances access transparency
• A remote reference offers unified access to remote data
• Remote references can be passed as parameters in RPCs
• Note: stubs can sometimes be used as such references


Asynchronous RPCs
Essence
Try to get rid of the strict request-reply behavior by letting the client
continue without waiting for an answer from the server.
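Inside a single process, the client-side effect can be mimicked with futures. A minimal sketch, assuming a thread pool stands in for the network and `remote_square` is a hypothetical remote procedure: `submit()` returns immediately, the client does other work, and `result()` blocks only when the answer is actually needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Hypothetical stand-in for a remote procedure; the sleep mimics network and server delay."""
    time.sleep(0.2)
    return x * x

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(remote_square, 12)   # issue the request; returns at once
    local_work = sum(range(1000))                 # client continues with other work
    result = future.result()                      # block only when the answer is needed

print(local_work, result)                         # 499500 144
```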

Variations on RPC

Sending out multiple RPCs


Essence
Sending an RPC request to a group of
servers.
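A similar sketch for a group of servers, again with a thread pool in place of the network and `make_server` producing hypothetical server stand-ins: the same request goes to every member, and replies are collected in the order they arrive.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_server(name, delay):
    """Build a hypothetical stand-in for one server in the group."""
    def handle(request):
        time.sleep(delay)                          # simulated processing/network delay
        return f"{name}: {request.upper()}"
    return handle

group = [make_server("s1", 0.3), make_server("s2", 0.1), make_server("s3", 0.2)]

with ThreadPoolExecutor(max_workers=len(group)) as executor:
    futures = [executor.submit(server, "lookup x") for server in group]  # same request to all
    replies = [f.result() for f in as_completed(futures)]                # in order of arrival

print(replies)
```

Whether the client waits for all replies or uses only the first is a design choice; the first reply often suffices when the servers are replicas.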

Communication Message-oriented communication

Transient messaging: sockets


Berkeley socket interface
Operation   Description
socket      Create a new communication end point
bind        Attach a local address to a socket
listen      Tell the operating system what the maximum number of pending connection requests should be
accept      Block caller until a connection request arrives
connect     Actively attempt to establish a connection
send        Send some data over the connection
receive     Receive some data over the connection
close       Release the connection

Simple transient messaging with sockets



Sockets: Python code


Server
```python
from socket import *

HOST, PORT = "localhost", 12345   # assumed values; the slide leaves them undefined

class Server:
    def run(self):
        s = socket(AF_INET, SOCK_STREAM)
        s.bind((HOST, PORT))
        s.listen(1)
        (conn, addr) = s.accept()        # returns new socket and addr. client
        while True:                      # forever
            data = conn.recv(1024)       # receive data from client
            if not data: break           # stop if client stopped
            conn.sendall(data + b"*")    # return sent data plus an "*"
        conn.close()                     # close the connection
        s.close()                        # release the listening socket
```

Client
```python
from socket import *

HOST, PORT = "localhost", 12345   # must match the server

class Client:
    def run(self):
        s = socket(AF_INET, SOCK_STREAM)
        s.connect((HOST, PORT))          # connect to server (block until accepted)
        s.sendall(b"Hello, world")       # send some data
        data = s.recv(1024)              # receive the response
        print(data)                      # print what you received
        s.close()                        # closing makes the server's recv() return b""
```


Making sockets easier to work with


Observation
Sockets are rather low level and programming mistakes are easily
made. However, the way that they are used is often the same (such as
in a client-server setting).

Alternative: ZeroMQ
Provides a higher level of expression by pairing sockets: one for sending
messages at process P and a corresponding one at process Q for
receiving messages. All communication is asynchronous.

Three patterns
• Request-reply
• Publish-subscribe
• Pipeline

Advanced transient messaging



Request-reply

```python
import zmq

def server():
    context = zmq.Context()
    socket = context.socket(zmq.REP)           # create reply socket
    socket.bind("tcp://*:12345")               # bind socket to address

    while True:
        message = socket.recv()                # wait for incoming message
        if "STOP" not in message.decode():     # if not to stop...
            reply = message.decode() + "*"     # append "*" to message
            socket.send(reply.encode())        # send it away (encoded)
        else:
            break                              # break out of loop and end

def client():
    context = zmq.Context()
    socket = context.socket(zmq.REQ)           # create request socket

    socket.connect("tcp://localhost:12345")    # returns at once; ZeroMQ connects in the background
    socket.send(b"Hello world")                # send message
    message = socket.recv()                    # block until response
    socket.send(b"STOP")                       # tell server to stop
    print(message.decode())                    # print result
```


Publish-subscribe

```python
import multiprocessing
import time
import zmq

def server():
    context = zmq.Context()
    socket = context.socket(zmq.PUB)           # create a publisher socket
    socket.bind("tcp://*:12345")               # bind socket to the address
    while True:
        time.sleep(5)                          # publish every 5 seconds
        t = "TIME " + time.asctime()
        socket.send(t.encode())                # publish the current time

def client():
    context = zmq.Context()
    socket = context.socket(zmq.SUB)           # create a subscriber socket
    socket.connect("tcp://localhost:12345")    # connect to the server
    socket.setsockopt(zmq.SUBSCRIBE, b"TIME")  # subscribe to TIME messages

    for i in range(5):                         # five iterations
        message = socket.recv()                # receive a message related to subscription
        print(message.decode())                # print the result

if __name__ == "__main__":
    multiprocessing.Process(target=server, daemon=True).start()
    client()
```


Pipeline

```python
import pickle
import random
import time
import zmq

NWORKERS = 10   # number of workers (assumed value; not given on the slide)

def producer():
    context = zmq.Context()
    socket = context.socket(zmq.PUSH)          # create a push socket
    socket.bind("tcp://127.0.0.1:12345")       # bind socket to address

    while True:
        workload = random.randint(1, 100)      # compute workload
        socket.send(pickle.dumps(workload))    # send workload to worker
        time.sleep(workload / NWORKERS)        # balance production by waiting

def worker(worker_id):
    context = zmq.Context()
    socket = context.socket(zmq.PULL)          # create a pull socket
    socket.connect("tcp://localhost:12345")    # connect to the producer

    while True:
        work = pickle.loads(socket.recv())     # receive work from a source
        time.sleep(work)                       # pretend to work
```


MPI: When lots of flexibility is needed


Representative operations

Operation      Description
MPI_BSEND      Append outgoing message to a local send buffer
MPI_SEND       Send a message and wait until copied to local or remote buffer
MPI_SSEND      Send a message and wait until transmission starts
MPI_SENDRECV   Send a message and wait for reply
MPI_ISEND      Pass reference to outgoing message, and continue
MPI_ISSEND     Pass reference to outgoing message, and wait until receipt starts
MPI_RECV       Receive a message; block if there is none
MPI_IRECV      Check if there is an incoming message, but do not block


Queue-based messaging
Four possible combinations

Message-oriented persistent communication



Message-oriented middleware
Essence
Asynchronous persistent communication through support of middleware-level
queues. Queues correspond to buffers at communication servers.

Operations

Operation   Description
PUT         Append a message to a specified queue
GET         Block until the specified queue is nonempty, and remove the first message
POLL        Check a specified queue for messages, and remove the first. Never block
NOTIFY      Install a handler to be called when a message is put into the specified queue
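The four primitives map naturally onto a thread-safe in-process queue. The sketch below is illustrative only (`MessageQueue` is a hypothetical class, not the API of any MOM product); a real queue would live at a communication server and keep messages persistently.

```python
import threading
from collections import deque

class MessageQueue:
    """In-process sketch of the PUT/GET/POLL/NOTIFY primitives."""

    def __init__(self):
        self._messages = deque()
        self._handlers = []
        self._cond = threading.Condition()

    def put(self, message):
        """PUT: append a message to the queue."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify()                  # wake one blocked GET, if any
            handlers = list(self._handlers)
        for handler in handlers:                 # NOTIFY handlers run outside the lock
            handler(self)

    def get(self):
        """GET: block until the queue is nonempty, then remove the first message."""
        with self._cond:
            while not self._messages:
                self._cond.wait()
            return self._messages.popleft()

    def poll(self):
        """POLL: remove the first message if there is one; never block."""
        with self._cond:
            return self._messages.popleft() if self._messages else None

    def notify(self, handler):
        """NOTIFY: install handler(queue), called whenever a message is put."""
        with self._cond:
            self._handlers.append(handler)

q = MessageQueue()
print(q.poll())                                    # None: empty queue, no blocking
threading.Timer(0.1, q.put, args=("m1",)).start()  # another party puts a message later
print(q.get())                                     # blocks about 0.1 s, then prints m1
```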


General model
Queue managers
Queues are managed by queue managers. An application can put
messages only into a local queue. Likewise, it can get a message only by
extracting it from a local queue ⇒ queue managers need to route
messages.

Routing
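A sketch of the routing idea, assuming each hypothetical `QueueManager` knows only its local queues plus a table naming the next hop for each remote queue. For brevity it relays a message immediately; a real queue manager would first store it (for instance in a local transmission queue) and forward it later.

```python
from collections import deque

class QueueManager:
    """Hypothetical queue manager: owns local queues and a routing table."""

    def __init__(self, name):
        self.name = name
        self.queues = {}        # local queue name -> deque of messages
        self.routes = {}        # remote queue name -> next-hop QueueManager

    def put(self, queue_name, message):
        if queue_name in self.queues:
            self.queues[queue_name].append(message)            # destination is local
        else:
            self.routes[queue_name].put(queue_name, message)   # relay one hop closer

    def get(self, queue_name):
        return self.queues[queue_name].popleft()               # only local queues can be read

a, b, c = QueueManager("A"), QueueManager("B"), QueueManager("C")
c.queues["orders"] = deque()          # the destination queue lives at C
a.routes["orders"] = b                # A does not know C; it relays via B
b.routes["orders"] = c

a.put("orders", "order #1")           # the application talks only to its local manager A
print(c.get("orders"))                # order #1
```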


Message broker
Observation
Message queuing systems assume a common messaging protocol: all
applications agree on message format (i.e., structure and data
representation)

Broker handles application heterogeneity in an MQ system


• Transforms incoming messages to target format
• Very often acts as an application gateway
• May provide subject-based routing capabilities (i.e., publish-
subscribe capabilities)
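A sketch of the two broker roles listed above, under illustrative assumptions: subscribers register per subject, and each may ask for a converter into its own format. `Broker`, `subscribe`, `publish`, and `xml_to_json` are hypothetical names; the converter handles only flat XML records.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_json(raw: bytes) -> bytes:
    """Hypothetical converter: flat XML record -> JSON object with the same fields."""
    root = ET.fromstring(raw)
    return json.dumps({child.tag: child.text for child in root}).encode()

class Broker:
    """Sketch of a broker that routes by subject and converts to each receiver's format."""

    def __init__(self):
        self.subscriptions = {}   # subject -> list of (converter, deliver) pairs

    def subscribe(self, subject, deliver, converter=lambda message: message):
        self.subscriptions.setdefault(subject, []).append((converter, deliver))

    def publish(self, subject, message):
        for converter, deliver in self.subscriptions.get(subject, []):
            deliver(converter(message))   # transform to the target format, then forward

broker = Broker()
json_inbox, raw_inbox = [], []
broker.subscribe("orders", json_inbox.append, xml_to_json)  # JSON-speaking application
broker.subscribe("orders", raw_inbox.append)                # application that accepts XML
broker.publish("orders", b"<order><id>7</id><item>disk</item></order>")
print(json_inbox[0])
```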


Message broker: general architecture


Example: AMQP
Lack of standardization
Advanced Message-Queuing Protocol was intended to play the same role
as, for example, TCP in networks: a protocol for high-level messaging with
different implementations.

Basic model
Client sets up a (stable) connection, which is a container for several
(possibly ephemeral) one-way channels. Two one-way channels can form a
session. A link is akin to a socket, and maintains state about message
transfers.
Example: Advanced Message Queuing Protocol (AMQP)

Example: AMQP-based producer

```python
import rabbitpy

def producer():
    connection = rabbitpy.Connection()                  # Connect to RabbitMQ server
    channel = connection.channel()                      # Create new channel on the connection

    exchange = rabbitpy.Exchange(channel, "exchange")   # Create an exchange
    exchange.declare()

    queue1 = rabbitpy.Queue(channel, "example1")        # Create 1st queue
    queue1.declare()

    queue2 = rabbitpy.Queue(channel, "example2")        # Create 2nd queue
    queue2.declare()

    queue1.bind(exchange, "example-key")                # Bind queue1 to a single key
    queue2.bind(exchange, "example-key")                # Bind queue2 to the same key

    message = rabbitpy.Message(channel, "Test message")
    message.publish(exchange, "example-key")            # Publish the message using the key
    exchange.delete()
```


Example: AMQP-based consumer

```python
import rabbitpy

def consumer():
    connection = rabbitpy.Connection()
    channel = connection.channel()

    queue = rabbitpy.Queue(channel, "example1")

    # While there are messages in the queue, fetch them using Basic.Get
    while len(queue) > 0:
        message = queue.get()
        print("Message Q1: %s" % message.body.decode())
        message.ack()

    queue = rabbitpy.Queue(channel, "example2")

    while len(queue) > 0:
        message = queue.get()
        print("Message Q2: %s" % message.body.decode())
        message.ack()
```



Communication Multicast communication

Application-level multicasting
Essence
Organize nodes of a distributed system into an overlay network and use
that network to disseminate data:
• Oftentimes a tree, leading to unique paths
• Alternatively, also mesh networks, requiring a form of routing

Application-level tree-based multicasting



Application-level multicasting in Chord


Basic approach
1. Initiator generates a multicast identifier mid.
2. Look up succ(mid), the node responsible for mid.
3. Request is routed to succ(mid), which will become the root.
4. If P wants to join, it sends a join request to the root.
5. When the request arrives at Q:
• Q has not seen a join request before ⇒ it becomes a forwarder; P
becomes child of Q. The join request continues to be forwarded.
• Q knows about the tree ⇒ P becomes child of Q. No need to
forward the join request anymore.
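The join procedure can be sketched independently of Chord's lookup machinery. The sketch assumes the route a join request takes toward the root is already known (in Chord it would follow from routing toward succ(mid)) and records the tree as a child-to-parent dictionary; `join` and the node numbers are illustrative.

```python
def join(tree, route):
    """Process one join request.

    tree:  dict mapping each tree node to its parent (the root maps to None)
    route: hypothetical route of the request; route[0] is the joining node P,
           route[-1] is the root succ(mid)
    """
    child = route[0]
    for q in route[1:]:
        q_in_tree = q in tree        # has Q seen a join request before (is it in the tree)?
        tree[child] = q              # the requester becomes a child of Q
        if q_in_tree:
            break                    # Q knows about the tree: stop forwarding
        child = q                    # Q becomes a forwarder and passes the request on
    return tree

tree = {7: None}                     # node 7 = succ(mid) is the root
join(tree, [1, 4, 7])                # 4 becomes a forwarder; 1 is its child
join(tree, [2, 4, 7])                # 4 is already in the tree: the request stops there
print(tree)                          # {7: None, 1: 4, 4: 7, 2: 4}
```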


ALM: Some costs


Different metrics

• Link stress: How often does an ALM message cross the same
physical link? Example: message from A to D needs to cross ⟨Ra, Rb⟩
twice.
• Stretch: Ratio in delay between the ALM-level path and the network-level
path. Example: messages from B to C follow a path of length 73 at ALM, but
47 at network level ⇒ stretch = 73/47 ≈ 1.55.
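Both metrics are simple to compute once overlay hops are mapped onto physical paths. The topology below is hypothetical (D is assumed to sit behind Ra and B behind Rb), chosen so that the overlay route A → B → D crosses ⟨Ra, Rb⟩ twice as in the example; the stretch uses the path lengths 73 and 47 quoted above.

```python
from collections import Counter

def link_stress(physical_paths):
    """Count how often each (undirected) physical link is crossed by one ALM message."""
    stress = Counter()
    for path in physical_paths:                  # one physical path per overlay hop
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1
    return stress

def stretch(alm_delay, network_delay):
    return alm_delay / network_delay

# Hypothetical mapping of the overlay route A -> B -> D onto routers Ra and Rb
hops = [["A", "Ra", "Rb", "B"],                  # overlay hop A -> B
        ["B", "Rb", "Ra", "D"]]                  # overlay hop B -> D
stress = link_stress(hops)
print(stress[frozenset(("Ra", "Rb"))])           # 2: <Ra, Rb> is crossed twice
print(round(stretch(73, 47), 2))                 # 1.55
```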


Flooding
Essence
P simply sends a message m to each of its neighbors. Each neighbor
will forward that message, except to P, and only if it had not seen m
before.


Variation
Let Q forward a message with a certain probability $p_{flood}$, possibly even
dependent on its own number of neighbors (i.e., node degree) or the degree
of its neighbors.
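A sketch of flooding with the probabilistic variation. Assumptions: `graph` is an adjacency dictionary, each node makes a single forward-or-not decision with probability `p_flood` when it first sees m, and the initiator P always sends to all its neighbors. The function name `flood` is illustrative.

```python
import random
from collections import deque

def flood(graph, source, p_flood=1.0, rng=random):
    """Flood a message from source; return (set of nodes reached, messages sent)."""
    seen = {source}
    in_transit = deque((source, nbr) for nbr in graph[source])  # P sends m to every neighbor
    messages = 0
    while in_transit:
        sender, node = in_transit.popleft()
        messages += 1
        if node in seen:
            continue                                 # has seen m before: drop it
        seen.add(node)
        if rng.random() < p_flood:                   # forward with probability p_flood
            in_transit.extend((node, nbr) for nbr in graph[node] if nbr != sender)
    return seen, messages

ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(flood(ring, 0))                 # every node is reached
print(flood(ring, 0, p_flood=0.0))    # only the initiator's direct neighbors
```

Lowering `p_flood` trades fewer (often redundant) messages for a risk that some nodes are never reached.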

Flooding-based multicasting

Epidemic protocols
Assume there are no write–write conflicts
• Update operations are performed at a single server
• A replica passes updated state to only a few neighbors
• Update propagation is lazy, i.e., not immediate
• Eventually, each update should reach every replica

Two forms of epidemics


• Anti-entropy: Each replica regularly chooses another replica at
random, and exchanges state differences, leading to identical states at
both afterwards
• Rumor spreading: A replica that has just been updated (i.e., has been
contaminated) tells several other replicas about its update
(contaminating them as well).

Gossip-based data dissemination



Anti-entropy
Principal operations
• A node P selects another node Q from the system at random.
• Pull: P only pulls in new updates from Q
• Push: P only pushes its own updates to Q
• Push-pull: P and Q send updates to each other

Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes
(round = when every node has taken the initiative to start an exchange).
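The logarithmic behavior is easy to observe in a small simulation. The sketch assumes every node can contact every other node, that each node initiates exactly one exchange per round with a uniformly chosen peer, and that exchanges within a round are processed one after another (so an update received early in a round may be passed on in the same round). It prints the measured number of rounds next to log2(N) for comparison.

```python
import math
import random

def push_pull_rounds(n, rng):
    """Rounds until all n nodes hold an update that starts at node 0 (push-pull)."""
    updated = [False] * n
    updated[0] = True                          # a single source holds the update
    rounds = 0
    while not all(updated):
        rounds += 1
        for p in range(n):                     # every node takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                         # a uniformly random node other than p
            if updated[p] or updated[q]:
                updated[p] = updated[q] = True # push-pull: both sides end up updated
    return rounds

rng = random.Random(42)
for n in (100, 1_000, 10_000):
    print(n, push_pull_rounds(n, rng), round(math.log2(n), 1))
```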


Anti-entropy: analysis
Basics
Consider a single source, propagating its update. Let $p_i$ be the probability
that a node has not received the update after the $i$th round.

Analysis: staying ignorant

• With pull, $p_{i+1} = (p_i)^2$: the node was not updated during the $i$th round
and should contact another ignorant node during the next round.
• With push, $p_{i+1} = p_i \left(1 - \frac{1}{N-1}\right)^{(N-1)(1-p_i)} \approx p_i e^{-1}$
(for small $p_i$ and large $N$): the node was ignorant during the $i$th round
and no updated node chooses to contact it during the next round.
• With push-pull: $(p_i)^2 \cdot (p_i e^{-1})$


Anti-entropy performance


Rumor spreading
Basic model
A server S having an update to report contacts other servers. If S contacts
a server to which the update has already propagated, S stops contacting
other servers with probability $p_{stop}$.

Observation
If $s$ is the fraction of ignorant servers (i.e., servers that are unaware of the
update), it can be shown that with many servers

$s = e^{-(1/p_{stop}+1)(1-s)}$


Formal analysis
Notations
Let $s$ denote the fraction of nodes that have not yet been updated (i.e.,
susceptible), $i$ the fraction of updated (infected) and active nodes, and $r$ the
fraction of updated nodes that gave up (removed).

From theory of epidemics

(1) $\frac{ds}{dt} = -s \cdot i$
(2) $\frac{di}{dt} = s \cdot i - p_{stop} \cdot (1 - s) \cdot i$
Dividing (2) by (1):
$\Rightarrow \frac{di}{ds} = -(1 + p_{stop}) + \frac{p_{stop}}{s}$
$\Rightarrow i(s) = -(1 + p_{stop}) \cdot s + p_{stop} \cdot \ln(s) + C$

Wrap up
$i(1) = 0 \Rightarrow C = 1 + p_{stop} \Rightarrow i(s) = (1 + p_{stop}) \cdot (1 - s) + p_{stop} \cdot \ln(s)$. We
are looking for the case $i(s) = 0$, which leads to $s = e^{-(1/p_{stop}+1)(1-s)}$.


Rumor spreading
The effect of stopping
Consider 10,000 nodes
1/p_stop   s          N·s
1          0.203188   2032
2          0.059520   595
3          0.019827   198
4          0.006977   70
5          0.002516   25
6          0.000918   9
7          0.000336   3
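The values of s in this table follow from solving $s = e^{-(1/p_{stop}+1)(1-s)}$ numerically. A minimal sketch using fixed-point iteration from s = 0, which climbs monotonically to the nontrivial root (s = 1 always satisfies the equation too); `k` stands for 1/p_stop, and `ignorant_fraction` is an illustrative name. Printing s to six decimals and N·s rounded should reproduce the table's columns.

```python
import math

def ignorant_fraction(k, tol=1e-12):
    """Solve s = exp(-(k + 1) * (1 - s)) for the root below 1, where k = 1/p_stop."""
    s = 0.0
    while True:
        nxt = math.exp(-(k + 1) * (1 - s))
        if abs(nxt - s) < tol:
            return nxt
        s = nxt

N = 10_000
for k in range(1, 8):
    s = ignorant_fraction(k)
    print(f"{k}  {s:.6f}  {round(N * s)}")
```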


Note
If we really have to ensure that all servers are eventually updated,
rumor spreading alone is not enough


Deleting values
Fundamental problem
We cannot remove an old value from a server and expect the removal to
propagate. Instead, mere removal will be undone in due time by the
epidemic algorithms.

Solution
Removal has to be registered as a special update by inserting a
death certificate.
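A minimal sketch of why the certificate is needed, assuming each replica is a dictionary from keys to (timestamp, value) pairs that anti-entropy merges with a newest-timestamp-wins rule. The integer timestamps, the function name `merge`, and the choice of `None` as the death certificate are illustrative assumptions.

```python
def merge(local, remote):
    """Anti-entropy exchange: for every key, both replicas keep the newest (timestamp, value)."""
    for key, (ts, value) in remote.items():
        if key not in local or local[key][0] < ts:
            local[key] = (ts, value)
    for key, (ts, value) in local.items():
        if key not in remote or remote[key][0] < ts:
            remote[key] = (ts, value)

# Mere removal: P deletes "x", but Q still holds it, so the next exchange restores it
p, q = {"x": (1, "old")}, {"x": (1, "old")}
del p["x"]
merge(p, q)
print("x" in p)                     # True: the removal was undone

# Death certificate: record the removal as a newer update instead
p, q = {"x": (1, "old")}, {"x": (1, "old")}
p["x"] = (2, None)                  # death certificate with a later timestamp
merge(p, q)
print(q["x"])                       # (2, None): the removal propagates
```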


Deleting values
When to remove a death certificate (it is not allowed to stay forever)
• Run a global algorithm to detect whether the removal is known
everywhere, and then collect the death certificates (looks like
garbage collection)
• Assume death certificates propagate in finite time, and associate a
maximum lifetime for a certificate (can be done at risk of not reaching
all servers)

Note
It is necessary that a removal actually reaches all servers.
