Chapter 3
Divyashikha Sethia (DTU)
Objective
Interprocess communication is at the heart of all distributed
systems
• Protocols
Layered Protocols (1)
•Each host runs a user agent that lets users compose, send, and receive e-mail.
•The sending user agent passes mail to the mail delivery system, which delivers
the mail to the recipient.
•The user agent at the receiver's side connects to the mail delivery system to see
whether any mail has come in. If so, the messages are transferred to the
user agent so that they can be displayed and read by the user.
Remote Procedure Call
•Programs call procedures located on other machines
•In a conventional (local) call, the parameters of read are pushed onto the
stack and the read library routine then makes a system call
•For a read implemented as RPC (e.g., one that will run on the file
server's machine), a different version of read, called a client stub, is put
into the library.
•Like the original, it makes a call to the local operating system.
•Instead of asking the OS for data, it packs the parameters into a message and
requests that the message be sent to the server.
•After the call to send, the client stub calls receive, blocking until the reply
comes back.
RPC – server side
•The server stub (the server-side equivalent of a client stub) calls receive and
remains blocked waiting for incoming messages.
•When a message arrives, the server's OS passes it up to the server stub, which
transforms requests coming in over the network into local procedure calls.
•The server stub unpacks the parameters from the message and then calls the server
procedure in the usual way. To the server it looks as if it is being called directly
by the client: the parameters and return address are all on the stack.
•After completing the work, the server returns the result to its caller
- E.g., in the case of read, the server fills the buffer pointed to by the second
parameter with the data. This buffer is internal to the server stub.
- When the server stub gets control back after the call has completed, it
packs the result (the buffer) in a message and calls send to return it to the
client.
•The server stub then calls receive again, to wait for the next incoming
request.
RPC – client receiving results
•When the reply arrives, the client's OS sees that it is addressed to the client
process (in fact, to the client stub)
•The client stub inspects the message, unpacks the result, copies it to its caller,
and returns in the usual way.
•The caller gets control back following the call to read, with the data available,
unaware that the work was done remotely
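The client stub / server stub round trip described above can be sketched as follows. This is a minimal illustration, not a real RPC library: two in-process queues stand in for the send and receive primitives, JSON is an assumed message format, and `FILE_DATA`, `server_read`, and the `None` shutdown signal are hypothetical names introduced only for the sketch.

```python
import json
import queue
import threading

# Hypothetical in-process "network": one queue per direction stands in for
# the send/receive primitives offered by the operating system.
to_server = queue.Queue()
to_client = queue.Queue()

FILE_DATA = b"hello from the file server"  # assumed file contents on the server


def server_read(offset, nbytes):
    """The actual procedure, which lives on the server."""
    return FILE_DATA[offset:offset + nbytes]


def server_stub():
    """Block in receive, unpack, call locally, pack the result, send, repeat."""
    while True:
        message = to_server.get()        # receive: blocks until a request arrives
        if message is None:              # shutdown signal, used by this sketch only
            break
        request = json.loads(message)    # unpack the parameters
        result = server_read(request["offset"], request["nbytes"])
        to_client.put(json.dumps({"data": result.decode()}))  # pack result, send


def read(offset, nbytes):
    """Client stub: looks like a local read, but the work is done remotely."""
    to_server.put(json.dumps({"offset": offset, "nbytes": nbytes}))  # pack, send
    reply = json.loads(to_client.get())  # receive: blocks until the reply arrives
    return reply["data"].encode()        # unpack and return in the usual way


worker = threading.Thread(target=server_stub, daemon=True)
worker.start()
data = read(6, 4)      # the caller cannot tell that this was served remotely
to_server.put(None)
worker.join()
```

The caller only ever sees `read(6, 4)`; all packing, sending, and blocking is hidden inside the two stubs.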
Procedure with two parameters: an integer and a four-character string (5, "JILL").
Each parameter requires one 32-bit word (each box is a byte).
The message is transferred byte for byte (the first byte sent is the first received).
a) Original message on the Pentium
Intel Pentium numbers its bytes from right to left (little endian).
b) The message after receipt on the SPARC
SPARC numbers its bytes from left to right (big endian), so the 5 is
interpreted as 83,886,080 (5 × 2^24).
c) The message after being inverted (the bytes of each word reversed)
=> the integer is 5 but the string is "LLIJ", since integers are reversed by the
different byte ordering, but strings are not.
The little numbers in the boxes indicate the address of each byte.
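The three cases above can be reproduced with Python's standard `struct` module. This is a small sketch; the variable names are chosen for illustration only.

```python
import struct

# (a) A little-endian machine (Pentium) lays out the integer 5, then "JILL".
message = struct.pack("<i", 5) + b"JILL"

# (b) A big-endian machine (SPARC) reads the same four bytes as an integer.
misread_int = struct.unpack(">i", message[:4])[0]

# (c) Reversing the bytes of every 32-bit word repairs the integer
#     but scrambles the string.
inverted = message[:4][::-1] + message[4:][::-1]
fixed_int = struct.unpack(">i", inverted[:4])[0]
scrambled = inverted[4:].decode()

print(misread_int, fixed_int, scrambled)
```

Blind inversion cannot work in general because the receiver has to know which words hold integers and which hold strings.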
Passing References (1)
•A pointer is meaningful only within the address space of the process in which it
is being used, e.g., address 1000 on the server might be in the middle of the
program text
- One option: pass the pointer to the server stub and add special code in the server
to handle such pointers, e.g., send a request back to the client to provide the
referenced data
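The "ask the client for the referenced data" option can be sketched as below. Everything here is hypothetical and in-process: `Client`, `ServerStub`, `fetch`, and `count_bytes` are illustrative names, and a dictionary stands in for the client's address space.

```python
class Client:
    """Owns the memory; a 'pointer' is just a key into its address space."""

    def __init__(self):
        self.memory = {1000: b"client buffer"}  # address 1000 is made up

    def fetch(self, address):
        # Callback used by the server stub to obtain the referenced data.
        return self.memory[address]


class ServerStub:
    def __init__(self, client):
        self.client = client

    def count_bytes(self, pointer):
        # The raw pointer is meaningless in the server's address space,
        # so the stub sends a request back to the client for the data.
        data = self.client.fetch(pointer)
        return len(data)


client = Client()
length = ServerStub(client).count_bytes(1000)
```

Each dereference costs an extra round trip to the client in this scheme, which is its main drawback.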
•One-way RPC:
• Variant of asynchronous RPC
- Client does not wait for an acknowledgement that the server has accepted the request
- Reliability is not guaranteed
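A one-way call can be sketched as a send that returns immediately. In this illustration a queue stands in for the network, and `one_way_call`, `log_event`, and the `None` shutdown signal are hypothetical names. The in-process queue never loses messages, unlike a real network, so the sketch shows only the "no acknowledgement, no result" aspect.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the network in this sketch
handled = []


def server():
    while True:
        request = requests.get()
        if request is None:          # shutdown signal, used by this sketch only
            break
        handled.append(request)      # the server never replies


def one_way_call(procedure, *args):
    """Send the request and return at once: no acknowledgement, no result."""
    requests.put((procedure, args))


thread = threading.Thread(target=server)
thread.start()
returned = one_way_call("log_event", "disk full")  # hypothetical procedure name
requests.put(None)
thread.join()
```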
Berkeley Sockets (2)
Primitive    Meaning
MPI_send     Send a message and wait until copied to local or remote buffer
MPI_issend   Pass reference to outgoing message, and wait until receipt starts
- The local MPI runtime system will remove the message from its local
buffer and take care of transmission as soon as a receiver has called a
receive primitive.
Primitive   Meaning
Get         Block until the specified queue is nonempty, and remove the first message
Poll        Check a specified queue for messages, and remove the first. Never block.
Notify      Install a handler to be called when a message is put into the specified queue.
- Notify can also be used to automatically start a process that will fetch messages
from the queue if no process is currently executing, e.g., implemented by a
daemon on the receiver's side that continuously monitors the queue for incoming
messages.
•The message queuing system provides queues to sender and receiver and takes
care of the transfer of messages
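The three primitives can be illustrated with a small in-memory queue. This is a sketch under assumptions: the class name `MessageQueue` and the method signatures are invented here, `put` is included only so the queue can be filled, and the handler is simply called with the queue as its argument.

```python
import queue


class MessageQueue:
    def __init__(self):
        self._messages = queue.Queue()
        self._handler = None

    def put(self, message):
        """Append a message, then call the installed handler (if any)."""
        self._messages.put(message)
        if self._handler is not None:
            self._handler(self)

    def get(self):
        """Block until the queue is nonempty, then remove the first message."""
        return self._messages.get()

    def poll(self):
        """Remove the first message if there is one; never block."""
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a handler to be called when a message is put into the queue."""
        self._handler = handler


mq = MessageQueue()
empty = mq.poll()                 # nothing queued yet, returns immediately
mq.put("m1")
first = mq.get()                  # would block if the queue were empty

received = []
mq.notify(lambda q: received.append(q.poll()))
mq.put("m2")                      # triggers the handler, which fetches "m2"
```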
http://www.deakin.edu.au/scitech/sit/dsapp/archive/techreport/TR-C95-20.pdf
Stream-Oriented Communication
Datagram Header Format - TOS
Bit     0-2          3   4   5   6-7
Field   Precedence   D   T   R   Unused
•In the late 1990s the IETF redefined the meaning of the Service Type field to
support Differentiated Services (DiffServ) for QoS
•The first six bits form the codepoint, the Differentiated Services Code Point
(DSCP); the last 2 bits are unused
•Only a few services are defined, each with its own codepoints.
•Backward compatibility – when the last three bits of the codepoint are zero
(xxx000), the precedence bits define the class of service
The default DSCP is 000 000. Class selector DSCPs are values that are backward
compatible with IP precedence. When converting between IP precedence and
DSCP, match the three most significant bits. In other words:
IP Prec 5 (101) maps to IP DSCP 101 000
•A router honors the original precedence scheme for high-priority traffic
(precedence values 6 and 7) even when it is configured for differentiated services
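The precedence-to-DSCP mapping above is plain bit shifting. The following sketch uses hypothetical helper names and assumes the DSCP occupies the six most significant bits of the 8-bit field, as stated above.

```python
def precedence_to_dscp(precedence):
    """Map a 3-bit IP precedence to its class selector DSCP (xxx000)."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in 3 bits")
    return precedence << 3


def dscp_to_precedence(dscp):
    """Keep only the three most significant bits of the 6-bit DSCP."""
    return dscp >> 3


def is_class_selector(dscp):
    """Class selector codepoints have their last three bits set to zero."""
    return dscp & 0b000111 == 0


def dscp_to_field_byte(dscp):
    """Place the 6-bit DSCP in the upper bits of the 8-bit field."""
    return dscp << 2


dscp = precedence_to_dscp(5)
print(format(dscp, "06b"))        # bit pattern of the class selector codepoint
```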
Differentiated Services (DiffServ)
•Forward error correction (FEC): encode outgoing packets in such a way that
any k out of n received packets (k < n) are enough to reconstruct k correct packets.
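The simplest instance of this idea is a single XOR parity packet, i.e. n = k + 1. The sketch below assumes equal-length packets and at most one loss; it is not a general (n, k) erasure code, and the function names are illustrative.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets):
    """Append one parity packet: the XOR of all k data packets (n = k + 1)."""
    return list(packets) + [reduce(xor_bytes, packets)]


def decode(received):
    """Rebuild the k data packets from n slots, at most one of which is None."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost packet")
    if missing:
        present = [packet for packet in received if packet is not None]
        recovered = reduce(xor_bytes, present)   # XOR of the other n - 1 packets
        i = missing[0]
        received = received[:i] + [recovered] + received[i + 1:]
    return received[:-1]                         # drop the parity packet


data = [b"abcd", b"efgh", b"ijkl"]
sent = encode(data)
sent[1] = None                                   # simulate one lost packet
restored = decode(sent)
```

Codes that tolerate more than one loss (larger n - k) need stronger constructions than a single parity packet.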
•Types of synchronization:
1. Between a discrete data stream and a continuous data stream (e.g., a slide show
on the Web that has been enhanced with audio)
- The continuous audio stream is to be synchronized with the discrete slides.
2. Between continuous data streams, e.g.:
- playing a movie in which the video stream needs to be synchronized with the audio
(lip synchronization)
- playing a stereo audio stream consisting of two substreams, one for each
channel; the two substreams must be tightly synchronized for proper playout:
a difference of more than 20 μsec can distort the stereo effect.
Synchronization Mechanisms (1)
2. Nodes organize into a mesh network in which every node has multiple
neighbors and there are multiple paths between every pair of nodes.
- Provides better robustness if a connection breaks due to node failure
- The node starting a multicast session looks up succ(mid), the node responsible
for key mid (the multicast identifier), and promotes it to become the root of the
multicast tree that will be used for sending data to interested nodes such as node P
- While being routed to succ(mid), P's join request passes through several nodes
- The first node Q on the route that has not yet seen a join request for mid
becomes a forwarder, and P becomes a child of Q
- If the next node R has also not seen a join request for mid, it too becomes a
forwarder, and Q becomes a child of R
    R
   /
  Q
 /
P
Multicast tree in Chord
- A later join request, say from node X, that reaches Q or R need not be forwarded
to the root, since Q and R are already forwarders and thus part of the multicast
tree.
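The join procedure can be sketched as follows, assuming the route of each join request (the nodes it visits on its way to succ(mid)) is already known. The function `join`, the dictionary `parent_of`, and the node name `"root"` are hypothetical; real Chord routing is not modelled.

```python
def join(parent_of, forwarders, route):
    """Process one join request.

    route: nodes the request visits, from the joining node to succ(mid).
    Returns the number of hops the request travelled before it stopped.
    """
    child = route[0]
    hops = 0
    for node in route[1:]:
        hops += 1
        parent_of[child] = node
        if node in forwarders:
            break                 # node is already in the tree: stop forwarding
        forwarders.add(node)      # first join request seen here: become a forwarder
        child = node
    return hops


parent_of = {}
forwarders = {"root"}             # succ(mid) is in the tree from the start

hops_p = join(parent_of, forwarders, ["P", "Q", "R", "root"])
hops_x = join(parent_of, forwarders, ["X", "Q", "R", "root"])
```

The second request stops at Q after a single hop, which is why later joins put less load on the root.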
1. Link stress is defined per link and counts how often a packet crosses the same
link. Link stress > 1 => although at the logical level a packet may be forwarded
along two different connections, part of those connections may actually
correspond to the same physical link
2. Stretch or Relative Delay Penalty (RDP) measures the ratio of the delay
between two nodes in the overlay to the delay that those two nodes would
experience in the underlying network
E.g., messages from B to C follow the overlay route B → Rb → Ra → Rc → C, total cost = 59 units.
But messages would have been routed in the underlying network along the path
B → Rb → Rd → Rc → C, total cost = 47 units
RDP = 59/47 ≈ 1.255
Goal => minimize the aggregated stretch, or similarly, the average RDP
measured over all node pairs
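Both metrics are easy to compute once the physical path of each overlay connection is known. In the sketch below, the two overlay connections out of node A and their physical paths are made-up inputs; only the 59 and 47 unit costs come from the example above.

```python
from collections import Counter


def rdp(overlay_delay, underlay_delay):
    """Stretch: delay in the overlay divided by delay in the underlying network."""
    return overlay_delay / underlay_delay


def link_stress(overlay_paths):
    """Count how often each physical link is crossed by one multicast packet."""
    stress = Counter()
    for path in overlay_paths:
        for a, b in zip(path, path[1:]):
            stress[frozenset((a, b))] += 1   # links are undirected
    return stress


stretch = rdp(59, 47)

# Hypothetical physical paths of two overlay connections, A-B and A-C.
stress = link_stress([
    ["A", "Ra", "Rb", "B"],
    ["A", "Ra", "Rc", "C"],
])
shared = stress[frozenset(("A", "Ra"))]      # both connections use link A-Ra
```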
Quality of application-level multicast tree
3. Tree cost is a global metric, generally related to minimizing the
aggregated link costs.
E.g., if the cost of a link is the delay between its two end nodes, then optimizing
the tree requires finding a minimal spanning tree, in which the total time for
disseminating information to all nodes is minimal.
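A minimal spanning tree over link delays can be found with Prim's algorithm, one standard choice among several. The sketch below is illustrative: the graph, its node names, and its delays are invented for the example.

```python
import heapq


def minimum_spanning_tree(graph, start):
    """Prim's algorithm.

    graph: {node: {neighbor: cost}}, undirected and connected.
    Returns (edges, total_cost), where each edge is (parent, node, cost).
    """
    visited = {start}
    frontier = [(cost, start, nbr) for nbr, cost in graph[start].items()]
    heapq.heapify(frontier)
    edges, total = [], 0
    while frontier and len(visited) < len(graph):
        cost, parent, node = heapq.heappop(frontier)   # cheapest link leaving the tree
        if node in visited:
            continue
        visited.add(node)
        edges.append((parent, node, cost))
        total += cost
        for nbr, nbr_cost in graph[node].items():
            if nbr not in visited:
                heapq.heappush(frontier, (nbr_cost, node, nbr))
    return edges, total


# Hypothetical overlay with link delays as costs.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 5, "C": 3},
}
edges, total = minimum_spanning_tree(graph, "A")
```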
These metrics can be used to determine the best parent for a new node
joining a multicast tree
•When a new node issues a join request, it contacts the group's rendezvous node
to obtain a (potentially partial) list of members
•It then selects the best member that can operate as the new node's parent in the tree