Slides 04
Drawbacks
• Focus on message-passing only
• Often unneeded or unwanted functionality
• Violates access transparency
Layered Protocols
Communication Foundations
Low-level layers
Recap
• Physical layer: contains the specification and implementation of bits,
and their transmission between sender and receiver
• Data link layer: prescribes the transmission of a series of bits into a
frame to allow for error and flow control
• Network layer: describes how packets in a network of computers are
to be routed.
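The sketch below shows what putting bits into a frame for error control can involve. It uses a hypothetical frame layout: a 4-byte length, the payload, and a CRC-32 checksum over the payload. The layout was chosen for illustration and is not the format of any particular data link protocol. The receiver recomputes the checksum and rejects frames that do not match.

```python
import struct
import zlib

def make_frame(payload: bytes) -> bytes:
    """Wrap payload in a hypothetical frame: length | payload | CRC-32."""
    header = struct.pack("!I", len(payload))           # 4-byte length, network byte order
    checksum = struct.pack("!I", zlib.crc32(payload))  # checksum over the payload only
    return header + payload + checksum

def parse_frame(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the frame is damaged."""
    if len(frame) < 8:
        raise ValueError("frame too short")
    (length,) = struct.unpack("!I", frame[:4])
    if len(frame) != 8 + length:
        raise ValueError("length field does not match frame size")
    payload = frame[4:4 + length]
    (checksum,) = struct.unpack("!I", frame[4 + length:])
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: corrupted frame")
    return payload

frame = make_frame(b"hello")
print(parse_frame(frame))  # b'hello'

# Flip one bit inside the payload to simulate a transmission error.
damaged = frame[:5] + bytes([frame[5] ^ 0x01]) + frame[6:]
try:
    parse_frame(damaged)
except ValueError as err:
    print(err)  # checksum mismatch: corrupted frame
```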
Observation
For many distributed systems, the lowest-level interface is that of the
network layer.
Transport Layer
Important
The transport layer provides the actual communication facilities for most
distributed systems.
Middleware layer
Observation
Middleware is invented to provide common services and protocols that can
be used by many different applications
• A rich set of communication protocols
• (Un)marshaling of data, necessary for integrated systems
• Naming protocols, to allow easy sharing of resources
• Security protocols for secure communication
• Scaling mechanisms, such as for replication and caching
Note
What remains are truly application-specific protocols... such as?
Types of communication
Distinguish...
Types of Communication
Communication Foundations
Types of communication
Transient versus persistent
Types of communication
Places for synchronization
• At request submission
• At request delivery
• After request processing
Client/Server
Some observations
Client/Server computing is generally based on a model of
transient synchronous communication:
• Client and server have to be active at the time of
communication
• Client issues request and blocks until it receives reply
• Server essentially waits only for incoming requests, and
subsequently processes them
Messaging
Message-oriented middleware
Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
• Middleware often ensures fault tolerance
Communication Remote procedure call
Conclusion
Communication between caller &
callee can be hidden by using a
procedure-call mechanism.
1. Client procedure calls client stub.
2. Stub builds message; calls local OS.
3. OS sends message to remote OS.
4. Remote OS gives message to stub.
5. Stub unpacks parameters; calls server.
6. Server does local call; returns result to stub.
7. Stub builds message; calls OS.
8. OS sends message to client's OS.
9. Client's OS gives message to stub.
10. Client stub unpacks result; returns to client.
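The ten steps can be imitated in a single process. In this minimal sketch, a plain function that passes bytes stands in for the network and the operating systems, and JSON is the assumed message format. The names `client_stub`, `server_stub`, `network`, and `add` are hypothetical.

```python
import json

def add(a, b):
    """The remote procedure, as implemented on the server."""
    return a + b

PROCEDURES = {"add": add}  # procedures the server exposes (hypothetical registry)

def server_stub(request: bytes) -> bytes:
    msg = json.loads(request.decode())               # step 5: unpack parameters
    result = PROCEDURES[msg["proc"]](*msg["args"])   # step 6: local call on the server
    return json.dumps({"result": result}).encode()   # step 7: build reply message

def network(request: bytes) -> bytes:
    """Stand-in for steps 3-4 and 8-9: the OSes ship bytes back and forth."""
    return server_stub(request)

def client_stub(proc, *args):
    request = json.dumps({"proc": proc, "args": list(args)}).encode()  # step 2
    reply = network(request)                         # steps 3 to 9
    return json.loads(reply.decode())["result"]      # step 10: unpack result

def remote_add(a, b):
    # Step 1: to the caller this looks like an ordinary local procedure call.
    return client_stub("add", a, b)

print(remote_add(3, 4))  # 7
```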
Conclusion
Client and server need to properly interpret messages, transforming them
into machine-dependent representations.
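Byte order is one concrete source of misinterpretation. The sketch below uses Python's standard `struct` module to pack the same 32-bit integer in little-endian and big-endian order. Reading little-endian bytes as big-endian yields a different number. Agreeing on one order in the message ("!" is network, that is big-endian, order) lets both sides recover the value.

```python
import struct

value = 1025  # 0x00000401

little = struct.pack("<i", value)   # how a little-endian machine (e.g., x86) stores it
big = struct.pack(">i", value)      # how a big-endian machine stores it
print(little.hex(), big.hex())      # 01040000 00000401

# Interpreting the little-endian bytes as big-endian gives a wrong value.
(wrong,) = struct.unpack(">i", little)
print(wrong)  # 17039360

# Marshaling in an agreed-upon order on both sides avoids the problem.
wire = struct.pack("!i", value)
(received,) = struct.unpack("!i", wire)
print(received)  # 1025
```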
Parameter passing
Conclusion
Full access transparency cannot be realized.
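A mutable parameter shows one reason why full access transparency fails. A local call passes a reference, so the caller sees the callee's changes. An RPC copies the value into a message, so the changes stay on the server unless the stubs explicitly copy them back. In this sketch, `pickle` serialization stands in for marshaling. The names `append_item` and `fake_remote_call` are hypothetical.

```python
import pickle

def append_item(items):
    items.append("new")   # modifies the list it was given

def fake_remote_call(func, arg):
    """Simulate an RPC: the argument travels as a marshaled copy."""
    server_copy = pickle.loads(pickle.dumps(arg))  # the server gets its own copy
    func(server_copy)                              # changes affect the copy only

local_list = ["a"]
append_item(local_list)
print(local_list)   # ['a', 'new']  (reference semantics)

remote_list = ["a"]
fake_remote_call(append_item, remote_list)
print(remote_list)  # ['a']  (the caller never sees the change)
```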
Asynchronous RPCs
Essence
Try to get rid of the strict request-reply behavior, but let the client
continue without waiting for an answer from the server.
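Here is a minimal sketch of the idea using Python's standard `concurrent.futures`. A local function with an artificial delay simulates the "remote" procedure; the delay is an assumption for illustration. The client issues the call, continues with other work, and blocks only when it collects the result. Collecting the result later like this is what is often called a deferred synchronous RPC.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    """Stand-in for a remote procedure that takes a while to answer."""
    time.sleep(0.2)
    return x * x

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_square, 12)  # issue the request; do not wait
    other_work = sum(range(1000))              # the client continues meanwhile
    result = future.result()                   # block only when the answer is needed

print(other_work, result)  # 499500 144
```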
Variations on RPC
Communication Message-oriented communication
Client
from socket import socket, AF_INET, SOCK_STREAM

HOST, PORT = "localhost", 12345  # assumed server address (hypothetical values)

class Client:
    def run(self):
        s = socket(AF_INET, SOCK_STREAM)
        s.connect((HOST, PORT))  # connect to server (block until accepted)
        s.send(b"Hello, world")  # send some data
        data = s.recv(1024)      # receive the response
        print(data)              # print what you received
        s.close()                # close the connection; the server's recv() then returns b""
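The server side is not shown above. Below is a minimal sketch of a matching server, run in the same process as a client so the exchange can be tried locally. Several details are assumptions for illustration: the echo-with-marker behavior, the use of a thread, and binding to a free port chosen by the OS.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one client and echo its data back until the client closes."""
    conn, _addr = server_sock.accept()   # block until a client connects
    with conn:
        while True:
            data = conn.recv(1024)       # b"" means the client closed
            if not data:
                break
            conn.sendall(data + b"*")    # reply with the data plus a marker

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]
t = threading.Thread(target=serve_once, args=(server_sock,))
t.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("127.0.0.1", port))
    s.sendall(b"Hello, world")
    reply = s.recv(1024)
print(reply)  # b'Hello, world*'

t.join()                                 # the server loop ends once the client closes
server_sock.close()
```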
Alternative: ZeroMQ
Provides a higher level of expression by pairing sockets: one for sending
messages at process P and a corresponding one at process Q for
receiving messages. All communication is asynchronous.
Three patterns
• Request-reply
• Publish-subscribe
• Pipeline
Request-reply
import zmq

def server():
    context = zmq.Context()
    socket = context.socket(zmq.REP)      # create reply socket
    socket.bind("tcp://*:12345")          # bind socket to address

    while True:
        message = socket.recv()           # wait for incoming message
        if b"STOP" not in message:        # if not to stop...
            reply = message.decode() + "*"  # append "*" to message
            socket.send(reply.encode())   # send it away (encoded)
        else:
            break                         # break out of loop and end

def client():
    context = zmq.Context()
    socket = context.socket(zmq.REQ)      # create request socket

    socket.connect("tcp://localhost:12345")  # returns at once; ZeroMQ connects in the background
    socket.send(b"Hello world")           # send message
    message = socket.recv()               # block until response
    socket.send(b"STOP")                  # tell server to stop
    print(message.decode())               # print result
Publish-subscribe
import time
import zmq

def server():
    context = zmq.Context()
    socket = context.socket(zmq.PUB)      # create a publisher socket
    socket.bind("tcp://*:12345")          # bind socket to the address
    while True:
        time.sleep(5)                     # wait every 5 seconds
        t = "TIME " + time.asctime()
        socket.send(t.encode())           # publish the current time

def client():
    context = zmq.Context()
    socket = context.socket(zmq.SUB)      # create a subscriber socket
    socket.connect("tcp://localhost:12345")    # connect to the server
    socket.setsockopt(zmq.SUBSCRIBE, b"TIME")  # subscribe to TIME messages

    for _ in range(5):                    # five iterations
        msg = socket.recv()               # receive a message related to subscription
        print(msg.decode())               # print the result
Pipeline
import pickle
import random
import time
import zmq

NWORKERS = 10  # number of workers (hypothetical value; not given in the original)

def producer():
    context = zmq.Context()
    socket = context.socket(zmq.PUSH)     # create a push socket
    socket.bind("tcp://127.0.0.1:12345")  # bind socket to address

    while True:
        workload = random.randint(1, 100)    # pick a random workload
        socket.send(pickle.dumps(workload))  # send workload to a worker
        time.sleep(workload / NWORKERS)      # balance production by waiting

def worker(worker_id):
    context = zmq.Context()
    socket = context.socket(zmq.PULL)     # create a pull socket
    socket.connect("tcp://localhost:12345")  # connect to the producer

    while True:
        work = pickle.loads(socket.recv())   # receive work from a source
        time.sleep(work)                     # pretend to work
Operation      Description
MPI_BSEND      Append outgoing message to a local send buffer
MPI_SEND       Send a message and wait until copied to local or remote buffer
MPI_SSEND      Send a message and wait until transmission starts
MPI_SENDRECV   Send a message and wait for reply
MPI_ISEND      Pass reference to outgoing message, and continue
MPI_ISSEND     Pass reference to outgoing message, and wait until receipt starts
MPI_RECV       Receive a message; block if there is none
MPI_IRECV      Check if there is an incoming message, but do not block
Queue-based messaging
Four possible combinations
Message-oriented middleware
Essence
Asynchronous persistent communication through support of middleware-level
queues. Queues correspond to buffers at communication servers.
Operations
Operation   Description
PUT         Append a message to a specified queue
GET         Block until the specified queue is nonempty, and remove the first message
POLL        Check a specified queue for messages, and remove the first. Never block
NOTIFY      Install a handler to be called when a message is put into the specified queue
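The four operations can be sketched as an in-memory queue built on Python's `threading.Condition`. The class and method names are hypothetical. In this sketch, a NOTIFY handler only learns about a new message; the message itself stays in the queue for a later GET or POLL.

```python
import threading
from collections import deque

class MessageQueue:
    """Hypothetical in-memory queue offering PUT, GET, POLL, and NOTIFY."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()
        self._handlers = []

    def put(self, msg):
        with self._cond:
            self._messages.append(msg)   # PUT: append to the queue
            self._cond.notify()          # wake up one blocked GET, if any
            handlers = list(self._handlers)
        for handler in handlers:         # NOTIFY: call installed handlers (outside the lock)
            handler(msg)

    def get(self):
        with self._cond:
            while not self._messages:    # GET: block until the queue is nonempty
                self._cond.wait()
            return self._messages.popleft()

    def poll(self):
        with self._cond:
            # POLL: never block; return None if the queue is empty
            return self._messages.popleft() if self._messages else None

    def notify(self, handler):
        with self._cond:
            self._handlers.append(handler)

q = MessageQueue()
seen = []
q.notify(seen.append)        # install a handler
print(q.poll())              # None: POLL returns at once on an empty queue
q.put("m1")
q.put("m2")
first = q.get()              # GET would block if the queue were empty
second = q.poll()
print(first, second, seen)   # m1 m2 ['m1', 'm2']
```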
General model
Queue managers
Queues are managed by queue managers. An application can put
messages only into a local queue, and it can get a message only by
extracting it from a local queue ⇒ queue managers need to route
messages between queues.
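A minimal sketch of such routing follows, assuming each manager keeps a static routing table that maps a destination queue name to the next-hop manager. The class, the names, and the three-manager chain are hypothetical. Forwarding here is a direct, synchronous call. A real queue manager would store the message persistently and forward it asynchronously.

```python
class QueueManager:
    """Hypothetical queue manager: local queues plus a routing table."""

    def __init__(self, name):
        self.name = name
        self.queues = {}   # local queue name -> list of messages
        self.routes = {}   # destination queue name -> next-hop QueueManager

    def put(self, queue_name, msg):
        """Applications call put/get only on their local manager."""
        if queue_name in self.queues:
            self.queues[queue_name].append(msg)           # destination is local
        else:
            self.routes[queue_name].put(queue_name, msg)  # forward to next hop

    def get(self, queue_name):
        return self.queues[queue_name].pop(0)  # only local queues can be read

# The sender's manager A routes via relay B to the receiver's manager C.
a, b, c = QueueManager("A"), QueueManager("B"), QueueManager("C")
c.queues["orders"] = []
a.routes["orders"] = b
b.routes["orders"] = c

a.put("orders", "order 1")
print(c.get("orders"))  # order 1
```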
Routing
Message broker
Observation
Message queuing systems assume a common messaging protocol: all
applications agree on message format (i.e., structure and data
representation)
Example: AMQP
Lack of standardization
The Advanced Message-Queuing Protocol (AMQP) was intended to play the same role
as, for example, TCP in networks: one protocol for high-level messaging, with
different implementations.
Basic model
A client sets up a (stable) connection, which is a container for several
(possibly ephemeral) one-way channels. Two one-way channels can form a
session. A link is akin to a socket and maintains state about message
transfers.
Example: Advanced Message Queuing Protocol (AMQP)
import rabbitpy

def producer():
    connection = rabbitpy.Connection()    # connect to RabbitMQ server
    channel = connection.channel()        # create new channel on the connection

    exchange = rabbitpy.Exchange(channel, 'exchange')  # create an exchange
    exchange.declare()

    queue1 = rabbitpy.Queue(channel, 'example1')  # create 1st queue
    queue1.declare()

    queue2 = rabbitpy.Queue(channel, 'example2')  # create 2nd queue
    queue2.declare()

    queue1.bind(exchange, 'example-key')  # bind queue1 to a single key
    queue2.bind(exchange, 'example-key')  # bind queue2 to the same key

    message = rabbitpy.Message(channel, 'Test message')
    message.publish(exchange, 'example-key')  # publish the message using the key
    exchange.delete()

import rabbitpy

def consumer():
    connection = rabbitpy.Connection()
    channel = connection.channel()

    queue = rabbitpy.Queue(channel, 'example1')

    # While there are messages in the queue, fetch them using Basic.Get
    while len(queue) > 0:
        message = queue.get()
        print('Message Q1: %s' % message.body.decode())
        message.ack()

    queue = rabbitpy.Queue(channel, 'example2')

    while len(queue) > 0:
        message = queue.get()
        print('Message Q2: %s' % message.body.decode())
        message.ack()
Application-level multicasting
Essence
Organize nodes of a distributed system into an overlay network and use
that network to disseminate data:
• Oftentimes a tree, leading to unique paths
• Alternatively, also mesh networks, requiring a form of routing
• Link stress: How often does an ALM message cross the same
physical link? Example: message from A to D needs to cross ⟨Ra, Rb⟩
twice.
• Stretch: Ratio in delay between the ALM-level path and the network-level
path. Example: messages from B to C follow a path of length 73 at ALM level,
but 47 at network level ⇒ stretch = 73/47.
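Both metrics are straightforward to compute once overlay hops are mapped to physical paths. The figure with the actual topology is not reproduced here. The underlay routes below are therefore hypothetical, chosen only so that the overlay path A → B → D crosses the physical link ⟨Ra, Rb⟩ twice, as in the example. The delays 73 and 47 are the ones from the text. The `underlay_path` parameter is an assumed helper that returns the routers a packet traverses.

```python
from collections import Counter

def link_stress(overlay_hops, underlay_path):
    """Count how often each physical link is crossed along the overlay hops."""
    stress = Counter()
    for u, v in overlay_hops:
        path = underlay_path(u, v)
        for link in zip(path, path[1:]):
            stress[frozenset(link)] += 1   # treat physical links as undirected
    return stress

def stretch(alm_delay, network_delay):
    return alm_delay / network_delay

# Hypothetical underlay routes between end hosts A, B, D via routers Ra, Rb.
ROUTES = {
    ("A", "B"): ["A", "Ra", "Rb", "B"],
    ("B", "D"): ["B", "Rb", "Ra", "D"],
}
stress = link_stress([("A", "B"), ("B", "D")], lambda u, v: ROUTES[(u, v)])
print(stress[frozenset(("Ra", "Rb"))])  # 2
print(round(stretch(73, 47), 3))        # 1.553
```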
Flooding
Essence
P simply sends a message m to each of its neighbors. Each neighbor
will forward that message, except to P, and only if it had not seen m
before.
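A minimal sketch of flooding on an overlay is shown below. It uses a hypothetical graph given as an adjacency mapping. It counts the messages sent so the cost of flooding becomes visible. Every node forwards on first receipt to all neighbors except the one it received the message from, and it drops duplicates.

```python
from collections import deque

def flood(graph, source):
    """Flood a message from source; return (nodes reached, messages sent)."""
    seen = {source}
    pending = deque((source, nbr) for nbr in graph[source])  # P sends to all neighbors
    messages = len(graph[source])
    while pending:
        sender, node = pending.popleft()
        if node in seen:
            continue                  # already seen m: do not forward again
        seen.add(node)
        for nbr in graph[node]:
            if nbr != sender:         # forward to every neighbor except the sender
                pending.append((node, nbr))
                messages += 1
    return seen, messages

# Hypothetical overlay: a triangle P-Q-R with an extra node S attached to Q.
GRAPH = {"P": ["Q", "R"], "Q": ["P", "R", "S"], "R": ["P", "Q"], "S": ["Q"]}
reached, messages = flood(GRAPH, "P")
print(sorted(reached), messages)  # ['P', 'Q', 'R', 'S'] 5
```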
Flooding-based multicasting
Communication Multicast communication
Variation
Let Q forward a message with a certain probability $p_{flood}$, possibly even
dependent on its own number of neighbors (i.e., node degree) or the degree
of its neighbors.
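The sketch below shows probabilistic flooding with a constant $p_{flood}$, a simplification of the degree-dependent variant just mentioned. The source always sends. Every other node decides once, on first receipt, whether to forward. The ring topology and the seeded random generator are assumptions for illustration. With p_flood = 1 the result equals plain flooding; with p_flood = 0 only the source's neighbors are reached.

```python
import random

def probabilistic_flood(graph, source, p_flood, rng):
    """Flood from source, but let every other node forward only with probability p_flood."""
    seen = {source}
    frontier = [(None, source)]      # (sender, node) pairs still to be handled
    messages = 0
    while frontier:
        sender, node = frontier.pop()
        if node != source and rng.random() >= p_flood:
            continue                 # this node decided not to forward
        for nbr in graph[node]:
            if nbr == sender:
                continue             # never send back to the sender
            messages += 1
            if nbr not in seen:      # only a first receipt leads to a forwarding decision
                seen.add(nbr)
                frontier.append((node, nbr))
    return seen, messages

RING = {n: [(n - 1) % 20, (n + 1) % 20] for n in range(20)}  # hypothetical overlay
for p in (1.0, 0.5, 0.0):
    reached, sent = probabilistic_flood(RING, 0, p, random.Random(42))
    print(p, len(reached), sent)
```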
Epidemic protocols
Assume there are no write–write conflicts
• Update operations are performed at a single server
• A replica passes updated state to only a few neighbors
• Update propagation is lazy, i.e., not immediate
• Eventually, each update should reach every replica
Anti-entropy
Principal operations
• A node P selects another node Q from the system at random.
• Pull: P only pulls in new updates from Q
• Push: P only pushes its own updates to Q
• Push-pull: P and Q send updates to each other
Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes
(round = when every node has taken the initiative to start an exchange).
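The logarithmic behavior can be observed in a rough simulation. Two simplifying assumptions apply. First, a single source holds the update initially. Second, within a round, nodes act one after the other, so a node updated earlier in a round can pass the update on later in the same round. In every round each node contacts one other node chosen uniformly at random, and after a push-pull exchange both hold the update if either did.

```python
import math
import random

def push_pull_rounds(n, rng):
    """Rounds of push-pull anti-entropy until all n nodes have the update."""
    updated = [False] * n
    updated[0] = True                   # a single source starts with the update
    rounds = 0
    while not all(updated):
        rounds += 1
        for p in range(n):              # every node takes the initiative once per round
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                  # pick a random node other than p
            if updated[p] or updated[q]:
                updated[p] = updated[q] = True  # push-pull: both end up updated
    return rounds

rng = random.Random(1)
for n in (100, 1000, 10000):
    print(n, push_pull_rounds(n, rng), round(math.log(n), 1))
```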
Anti-entropy: analysis
Basics
Consider a single source, propagating its update. Let $p_i$ be the probability
that a node has not received the update after the $i$-th round.
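Two simplifying assumptions give rough recurrences for $p_i$: in each round every node contacts one node chosen uniformly at random, and N is large. With pull, an ignorant node stays ignorant only if the node it contacts is ignorant as well. With push, it stays ignorant only if none of the roughly $N(1 - p_i)$ updated nodes picks it. This is a sketch of the usual argument under those assumptions:

```latex
\begin{aligned}
\text{pull:} \quad & p_{i+1} \approx p_i^2 \\
\text{push:} \quad & p_{i+1} \approx p_i \left(1 - \frac{1}{N-1}\right)^{N(1 - p_i)} \approx p_i \, e^{-(1 - p_i)} \\
\text{push-pull:} \quad & p_{i+1} \approx p_i^2 \, e^{-(1 - p_i)}
\end{aligned}
```

Once $p_i$ is small, pull squares it every round, whereas push shrinks it by only a factor of about $e$. This is why pulling, and hence push-pull, quickly reaches the last remaining ignorant nodes.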
Anti-entropy performance
Rumor spreading
Basic model
A server S having an update to report contacts other servers. If S contacts
a server to which the update has already propagated, S stops contacting
other servers with probability $p_{stop}$.
Observation
If s is the fraction of ignorant servers (i.e., which are unaware of the update),
it can be shown that with many servers
$s = e^{-(1/p_{stop}+1)(1-s)}$
Formal analysis
Notations
Let s denote the fraction of nodes that have not yet been updated (i.e.,
susceptible), i the fraction of updated (infected) and active nodes, and r the
fraction of updated nodes that gave up (removed).
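(1) $\frac{ds}{dt} = -s \cdot i$
(2) $\frac{di}{dt} = s \cdot i - p_{stop} \cdot (1 - s) \cdot i$
$\Rightarrow \frac{di}{ds} = -(1 + p_{stop}) + \frac{p_{stop}}{s}$
$\Rightarrow i(s) = -(1 + p_{stop}) \cdot s + p_{stop} \cdot \ln(s) + C$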
Wrap up
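Initially essentially all nodes are susceptible and none is active, so $i(1) = 0 \Rightarrow C = 1 + p_{stop} \Rightarrow i(s) = (1 + p_{stop}) \cdot (1 - s) + p_{stop} \cdot \ln(s)$.
We are looking for the case $i(s) = 0$ (no active nodes remain), which leads to $s = e^{-(1/p_{stop}+1)(1-s)}$.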
Rumor spreading
The effect of stopping
Consider 10,000 nodes
$1/p_{stop}$   $s$        $N \cdot s$ (never updated)
1              0.203188   2032
2              0.059520   595
3              0.019827   198
4              0.006977   70
5              0.002516   25
6              0.000918   9
7              0.000336   3
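The table values should follow from solving $s = e^{-(1/p_{stop}+1)(1-s)}$ numerically. The sketch below uses plain fixed-point iteration started at $s = 0$. Because the right-hand side is increasing in $s$, the iteration converges to the smallest root in $(0, 1)$ and not to the trivial root $s = 1$. The iteration count is an arbitrary, generous choice.

```python
import math

def ignorant_fraction(inv_p_stop, iterations=1000):
    """Solve s = exp(-(1/p_stop + 1) * (1 - s)) by fixed-point iteration from s = 0."""
    k = inv_p_stop + 1
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-k * (1 - s))
    return s

N = 10_000
for inv in range(1, 8):
    s = ignorant_fraction(inv)
    print(inv, f"{s:.6f}", round(N * s))
```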
Note
If we really have to ensure that all servers are eventually updated,
rumor spreading alone is not enough
Deleting values
Fundamental problem
We cannot remove an old value from a server and expect the removal to
propagate. Instead, a mere removal will be undone in due time by the
epidemic algorithms.
Solution
Removal has to be registered as a special update by inserting a
death certificate
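A minimal sketch of both cases follows, assuming that each replica maps a key to a (timestamp, value) pair and that an anti-entropy exchange keeps the entry with the newer timestamp. The timestamps are hypothetical logical clock values, and `None` as the value marks a death certificate. A plain deletion loses to the old copy held elsewhere. A death certificate is a newer update and wins.

```python
def merge(local, remote):
    """Anti-entropy merge: for every key, keep the entry with the newest timestamp.

    Replicas map key -> (timestamp, value); value None marks a death certificate.
    """
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Mere removal: A deletes "x", but merging with B brings the old value back.
a = {"x": (1, "old")}
b = {"x": (1, "old")}
del a["x"]
undone = merge(a, b)
print(undone)  # {'x': (1, 'old')}

# Death certificate: A records the removal as a newer update instead.
a_cert = {"x": (2, None)}
kept = merge(b, a_cert)
print(kept)    # {'x': (2, None)}
```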
Deleting values
When to remove a death certificate (it is not allowed to stay forever)
• Run a global algorithm to detect whether the removal is known
everywhere, and then collect the death certificates (looks like
garbage collection)
• Assume death certificates propagate in finite time, and associate a
maximum lifetime with a certificate (at the risk of the removal not
reaching all servers)
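The second option can be sketched as a periodic purge. This sketch assumes each entry stores its creation time in seconds and `None` as the value marks a death certificate. The 30-day `MAX_LIFETIME` is an arbitrary, hypothetical choice. Ordinary values are never purged; only certificates older than the lifetime are dropped.

```python
MAX_LIFETIME = 30 * 24 * 3600  # hypothetical maximum lifetime: 30 days, in seconds

def purge_expired_certificates(replica, now):
    """Drop death certificates (value None) older than MAX_LIFETIME; keep everything else."""
    return {
        key: (ts, value)
        for key, (ts, value) in replica.items()
        if value is not None or now - ts <= MAX_LIFETIME
    }

replica = {"x": (0, None), "y": (0, "kept"), "z": (2_000_000, None)}
purged = purge_expired_certificates(replica, now=3_000_000)
print(purged)  # {'y': (0, 'kept'), 'z': (2000000, None)}
```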
Note
It is necessary that a removal actually reaches all servers.