Distributed Systems Introduction
The Plan: Theoretical Stuff
Theoretical Foundations
– Fundamental Limitations
– Causality
– Logical clocks (logical, vector, matrix clocks)
– Global states
Algorithms for distributed mutual exclusion
Distributed Shared Memory (DSM)
Topics in fault tolerance and reliability

Distributed Systems: Intro
Distributed System:
– Autonomous Computers + Network
– Communication via message-passing
– No shared memory
– No global clock
– Range:
  • Two PCs connected by $25 worth of networking hardware
  • Beowulf clusters: racks (or stacks) of PCs connected by high-speed networking
  • Millions of computers, connected by diverse networking technologies ranging from modems to gigabit connections (the Internet)
[Figure labels: distributed computing, mobile computing]
Network Operating Systems
Network operating systems extend sequential operating systems to provide:
– Resource sharing (files, devices, …)
– Interoperability (email, remote command execution, remote login…)
User is generally aware of machine boundaries (do “this” on “that” machine)

DOS vs. NOS
(Virtual) Transparency: The ability to see what you want to see, and not see what you consider to be of no interest
An exaggeration:
$\lim_{\text{transparency} \to \infty} \text{NetworkOS} = \text{DistributedOS}$
DOS’s allow you to “slide” toward a “one large machine” view of the network of computers
Unix/“Distributed Unix”
Unix: pervasive when cheap network technologies became available (1970’s), so a logical choice for building “distributed systems”
Extensions to Unix which provided interprocess communication were a principal building block of early distributed systems
Unix sockets API

Distributed Unix, Cont.
Problems with distributed Unix, though:
– Monolithic kernels
– Scattered process information
– Process migration, checkpointing difficult
These problems stem from taking a tool in wide use and “molding” it to fit a new need
Why not “kill” Unix and use a modern, “this is what I do well” distributed OS?
– Commercial pressures.
– Too much code already in use.
Desirable Characteristics
These are the “selling points” for distributed systems, answers to the question “What can they provide for me?”
Scalability
– Want to be able to pile on more hardware as needed to tackle bigger problems without rewriting applications
– E.g., in Parallel Virtual Machine (PVM), add more processors to share workload of smaller pieces of a large computation

Desirable: Fault Tolerance
Fault Tolerance
– Higher availability and more resilience to faults than uniprocessor/shared memory multiprocessor solutions
– Redundancy is the key: it’s present in all fault tolerance schemes, e.g.,
  • replicated servers (e.g., for file or database storage)
  • snapshots of application state for recovery
Desirable: Transparency
(Virtual) Transparency
– Want distributed system to appear to be one big, seamless machine, however...
– ‘A distributed system is a system on which I cannot get any work done because a machine I’ve never heard of is down.’ -- L. Lamport
– Fault tolerance/transparency must be considered together
– Don’t want “transparent” components failing and preventing work from getting done

Desirable: Concurrency
Concurrency
– Distributed systems can bring a lot of hardware to bear on difficult or time-consuming applications
– MIMD situations (e.g. file server, compute server, web server machines)
  • (Multiple Instruction, Multiple Data: means distributed software components are different)
– SIMD situations (e.g. parallel rendering applications for computer graphics)
  • (Single Instruction, Multiple Data: means lots of instances of a software component that performs a specific operation)
Desirable: Resource Sharing
Resource Sharing
– Resource: display, printer, disk, CD-ROM, applications
– “One Cadillac instead of 12 Yugos”
– Distributed systems allow resources to be shared freely (or not), regardless of their location in the system
– Can drastically reduce cost, improve utilization of resources, reduce administration nightmare
– Security issues must be considered; security issues arise in distributed systems which don’t exist in isolated systems

Desirable: “Openness”
“Openness”
– Heterogeneous hardware and software can be used to build systems and solve complicated problems
– Published protocols and interfaces make putting together the diverse pieces possible
  • Which protocols are spoken?
  • What data formats are used?
  • Where are you?
– Example: WWW. Diverse machines “speak” a standard protocol: HTTP. “Open” extensions include CGI (Common Gateway Interface)
– Example: Universal Plug and Play (UPnP), Service Location Protocol (SLP) for building highly dynamic client/server systems
Advantages in Brief
The potential for building large, scalable, fault-tolerant “computers” with huge resources from commodity machines
Commodity “supercomputers”
In many circumstances, individual machines can still be used for traditional tasks
– E.g., no reason individual users couldn’t read mail on one node of the Beowulf cluster…
Web-based supercomputing

A Few of the Challenges
No shared memory =>
– an unfamiliar programming model
– application state is spread around
– existing algorithms may be inappropriate
– object-based distributed computing helps
No perfectly synchronized time source =>
– difficult to order events
– difficult to say “do something NOW” to the entire system
We’re stuck with the speed of light (?)
More complicated failure modes than single machines!
Much easier for things to be “half broken”
“Failure” has Many Meanings
Halting failure: component simply stops
Fail-stop: halting failures with ability to detect failures
Omission failure: failure to send/recv message
Network failure: network link breaks
Network partition: network fragments into two or more disjoint subnetworks
Timing failure: action early/late; clock fails, etc.
Byzantine failure: arbitrary “malicious” behavior
– This one models random, worst-case behavior

Models of Distributed Systems
An asynchronous distributed system is a theoretical model of a network with no notion of time
– no bound on message transmission time
– no bound on computation time
A synchronous distributed system, in contrast, has bounds; algorithms operate in “rounds”
Our model (most of the time) will be close to the asynchronous model
Topic Switch: Networking Basics
Network programming
Goal: be able to implement networked/distributed software rather than just talk about it
– solve real problems
– design client/server protocols
– evaluate proposed solutions experimentally

Network Protocols
Protocol: Set of rules and data formats which make communication possible
A “language” for communication
Protocols are typically constructed using layers, with more abstract services provided by higher-level layers
Bottom layer(s) are the actual network hardware
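
The layering idea from the Network Protocols slide can be illustrated with a toy sketch. The layer names and the bracketed "headers" here are hypothetical, chosen only for illustration; real stacks use binary headers. On the way down each layer wraps what it is handed with its own header, and on the way up each layer strips the header it understands.

```java
public class LayerDemo {
    // Hypothetical layer names, ordered from highest (most abstract) to lowest.
    static final String[] LAYERS = {"APP", "TRANSPORT", "NETWORK", "LINK"};

    // Sending: each layer, top to bottom, prepends its own header.
    public static String send(String payload) {
        String frame = payload;
        for (String layer : LAYERS) {
            frame = "[" + layer + "]" + frame;
        }
        return frame;
    }

    // Receiving: each layer, bottom to top, strips the header it understands.
    public static String receive(String frame) {
        for (int i = LAYERS.length - 1; i >= 0; i--) {
            String header = "[" + LAYERS[i] + "]";
            if (!frame.startsWith(header)) {
                throw new IllegalArgumentException("missing header " + header);
            }
            frame = frame.substring(header.length());
        }
        return frame;
    }

    public static void main(String[] args) {
        String onTheWire = send("hello");
        System.out.println(onTheWire);          // outermost header belongs to the lowest layer
        System.out.println(receive(onTheWire)); // original payload is recovered
    }
}
```

The point of the sketch is only that the lowest layer's header ends up outermost, so each layer on the receiving side sees exactly what its peer on the sending side produced.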
Networking Performance Parameters
Latency - time to transfer “empty” message
Bandwidth or data transfer rate - how many bits/sec can be transferred (how thick the “pipe” is)
message_transfer_time = latency + msg_length / data_transfer_rate
Consider: a modem connection vs. a van of magnetic tapes traveling an interstate highway
QoS: Quality of Service (bandwidth/latency guarantees for particular connections)

OSI Protocol Stack
OSI - Open Systems Interconnect
Application - application interfaces (httpd, ftp)
Presentation - network representation for data
Session - connections, encryption
Transport - message packets
Network - network-specific packets, routing
Data Link - transmission of packets between “directly” connected machines + error issues
Physical - hardware (“I can touch it”)
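
The transfer-time formula from the Networking Performance Parameters slide is easy to try out. The sketch below is only an illustration: the payload size, the modem speed and latency, and the van's travel time and tape-reading rate are all assumed numbers, not figures from the slides.

```java
public class TransferTime {
    // message_transfer_time = latency + msg_length / data_transfer_rate
    // Units: latency in seconds, length in bits, rate in bits per second.
    public static double transferSeconds(double latencySec, double lengthBits,
                                         double rateBitsPerSec) {
        return latencySec + lengthBits / rateBitsPerSec;
    }

    public static void main(String[] args) {
        double payloadBits = 8e12; // assumed payload: 1 TB

        // Assumed modem: 0.1 s latency, 56 kbit/s.
        double modem = transferSeconds(0.1, payloadBits, 56_000);

        // Assumed van: 4 hours on the highway (latency), then tapes
        // are read at 1 Gbit/s on arrival.
        double van = transferSeconds(4 * 3600, payloadBits, 1e9);

        System.out.printf("modem: %.0f s, van: %.0f s%n", modem, van);
    }
}
```

Under these assumptions the van has terrible latency but wins easily on a large payload, which is the contrast the modem-versus-van comparison is meant to draw.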
Communication Through Layers
[Figure: two protocol stacks side by side, each with the layers Application, Presentation, Session, Transport, Network, Data Link, Physical]

TCP/IP Protocol Stack
OSI stack is good as a model for understanding networks
Layers in “real” network stacks aren’t so differentiated
TCP/IP stack has won primarily because of the free implementation shipped in early versions of BSD Unix
Addresses above IP are (port, address) combinations
Layers and what occupies them:
Application: Application
Transport: UDP, TCP
Network: IP
Physical: Physical
Transport Protocols
UDP (User Datagram Protocol)
– Connectionless
– Fast setup
– Easy one-to-many communication
– Datagram-oriented (fixed size chunks of data)
– Packet reordering
– Packet loss (no flow control, bad packets dropped)
– Packet duplication
– (Absolute) maximum datagram length: 64K
– Usable maximum is more complicated
– 8K is generally safe for modern systems

Transport Protocols, Cont.
TCP (Transmission Control Protocol)
– Connection-oriented
– Byte stream-oriented
– Slower setup
– Consumes file handles: one per connection
– Flow control, automatic retransmission
  • No packet reordering (delivery is FIFO)
  • No packet loss
  • No duplication
– Theoretically “no” limit on size of objects that can be dumped into a TCP stream
– In practice, limits exist
Unix Sockets: TCP and UDP from a Programming Perspective
First the standard Unix system calls for C, then from a Java perspective
Unix C Server:
– int socket(PF_INET | PF_UNIX, SOCK_STREAM | SOCK_DGRAM, …)
– int bind(socket, localaddr …)
– int listen(socket, queuelength)
– int accept(socket, remoteaddr)
– select( … ) allows a set of sockets to be checked to determine if input is available
– Allows service of multiple clients without multithreading

Unix Sockets, Cont.
Unix C Client:
– int socket(PF_INET | PF_UNIX, SOCK_STREAM | SOCK_DGRAM, …)
– int connect(socket, remoteaddr)
Unlike the server, the client typically doesn’t care which port; the system selects one
Then data is transmitted and received (for both client and server) with:
– write(socket, message, len, …)
– read(socket, buffer, len, …)
// SERVER
// Headers needed by the server and client code on these slides.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// Shutdown(msg) is a helper that is not shown on the slides; it is
// assumed to print msg and terminate the program.
void Shutdown(const char *msg);
int ReadAndEcho(int handle);

void ServeEchoClients(int port) {
  int i, found;
  int alive;           // client still around after read?
  int sock;            // socket for listening
  int newconn;         // socket for new client
  int highest;         // highest handle in use; needed for select()
  int ready;           // number of ready sockets (from select() call)
  int connected[100];  // handle only 100 simultaneous clients.
  fd_set socks;        // sockets ready for reading, for select() call
  struct sockaddr_in server_address; // structure for bind() call
  int reuse=1;         // avoid port in use problems

  // initialize sockets stuff
  sock = socket(AF_INET, SOCK_STREAM, 0);
  if (sock < 0) {
    Shutdown("echo_server: socket() call failed. Can't continue.");
  }
  setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
  memset((char *) &server_address, 0, sizeof(server_address));
  server_address.sin_family = AF_INET;
  server_address.sin_addr.s_addr = htonl(INADDR_ANY);
  server_address.sin_port = htons(port);
  if (bind(sock, (struct sockaddr *)&server_address,
           sizeof(server_address)) < 0 ) {
    Shutdown("echo_server: bind() call failed. Can't continue.");
  }

  listen(sock, 15);  // 15 is queue length for incoming connections
  highest = sock;
  memset((char *) &connected, 0, sizeof(connected));
  printf("echo_server: Listening...\n");
  while (1) {
    FD_ZERO(&socks);      // initialize set of sockets to monitor
    FD_SET(sock,&socks);  // always care about listening socket
    // also care about sockets for connected clients
    for (i=0; i < 100; i++) {
      if (connected[i] != 0) {
        FD_SET(connected[i],&socks);
        if (connected[i] > highest) {
          highest = connected[i];
        }
      }
    }
    ready = select(highest+1, &socks, NULL, NULL, NULL);
    if (ready < 0) {
      Shutdown("echo_server: select() call failed. Can't continue.");
    }

    // see who's knocking at our (socket) door...
    if (FD_ISSET(sock,&socks)) {
      // new client
      newconn = accept(sock, NULL, NULL);
      if (newconn < 0) {
        printf("** FAILED TO CONNECT TO NEW CLIENT **\n");
      }
      else {
        // find a home for new client socket
        found=0;
        for (i=0; i < 100 && ! found; i++) {
          if (connected[i] == 0) {
            printf("echo_server: Connected to new client.\n");
            connected[i] = newconn;
            found=1;
          }
        }
        if (! found) {
          printf("echo_server: OVERLOADED.\n");
          close(newconn);
        }
      }
    }

    // check connected clients, deal with one line for each ready client
    for (i=0; i < 100; i++) {
      // skip empty slots: a 0 entry would otherwise test descriptor 0 (stdin)
      if (connected[i] != 0 && FD_ISSET(connected[i],&socks)) {
        alive = ReadAndEcho(connected[i]);
        if (! alive) {
          close(connected[i]);
          connected[i] = 0;  // client hung up
        }
      }
    }
  }
}
// SERVER, continued: echo one line back to a client.
// Returns 0 if the client hung up (or the read/write failed), 1 otherwise.
int ReadAndEcho(int handle) {
  char c=-1;
  int count=1;
  int ret=1;

  printf("echo_server: Reading, hoping for \\n...\n");
  count = read(handle, &c, 1);      // read one char
  while (count > 0 && c != '\n') {
    count = write(handle, &c, 1);   // echo it
    if (count > 0) {
      putchar(c);
      count = read(handle, &c, 1);  // read one char
    }
  }
  if (count <= 0) {                 // 0 = hung up, < 0 = I/O error
    printf("echo_server: Client hung up.\n");
    ret=0;
  }
  else {                            // echo final \n
    count = write(handle, &c, 1);
    putchar('\n');
  }
  printf("echo_server: Returning to listening state.\n");
  return ret;
}

// CLIENT
void EchoClient(char *ip, int port) {
  struct sockaddr_in them;       // address of server
  int sock;                      // socket for communication w/ server
  int err;
  size_t len;
  char buf[512];
  char c;
  int count;
  struct hostent *remip = NULL;  // will use this one...
  in_addr_t remip2;              // or this one as the binary remote addr

  memset(&them, 0, sizeof(them));
  them.sin_family = AF_INET;
  them.sin_port = htons(port);   // hton*() convert integer byte order

  // try inet_addr() call first; some unixes freak if we provide a
  // dotted numeric IP address to gethostbyname()
  remip2 = inet_addr(ip);
  if (remip2 == INADDR_NONE) {   // not a dotted numeric address
    remip = gethostbyname(ip);
    if (remip == NULL) {
      herror(NULL);
      Shutdown("Couldn't initialize connection parameters.");
    }
    memcpy(&(them.sin_addr.s_addr), remip->h_addr, remip->h_length);
  }
  else {
    them.sin_addr.s_addr = remip2;
  }
  if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    printf("echo_client: socket() failed with error %d.\n", sock);
    Shutdown("Can't continue.");
  }
  if ((err = connect(sock, (struct sockaddr*)&them,
                     sizeof(struct sockaddr_in))) < 0) {
    printf("echo_client: connect() failed with error %d.\n", err);
    Shutdown("Can't continue.");
  }

  printf("echo_client: \".\" on a line by itself disconnects.\n");
  // fgets() replaces the unsafe gets(); one byte is reserved so that a
  // newline can always be appended to an over-long line.
  while (fgets(buf, sizeof(buf) - 1, stdin) != NULL && buf[0] != '.') {
    len = strlen(buf);
    if (len == 0 || buf[len-1] != '\n') {
      buf[len++] = '\n';           // add newline
      buf[len] = 0;
    }
    write(sock, buf, len);         // transmit

    // get response one char at a time
    printf("echo_client: Service response:\"");
    count = read(sock, &c, 1);
    while (count > 0 && c != '\n') {
      putchar(c);
      count = read(sock, &c, 1);   // read one char
    }
    printf("\"\n");
    if (count <= 0) {
      printf("echo_client: Server hung up. How rude!\n");
      break;
    }
  }
  close(sock);
} // end of EchoClient
Java IPC
TCP and UDP socket protocols have separate interfaces in Java
More abstract than standard Unix interface, but interoperable and almost as powerful
Far more portable

Simple TCP Client in Java
try {
  s=new Socket(servhostname, port);
  out=new DataOutputStream(s.getOutputStream());
  in=new DataInputStream(s.getInputStream());
}
catch (Exception e) {
  /* error */
}
// do standard I/O operations on 'in' and 'out'
Simple TCP Server in Java
try {
  servsock=new ServerSocket(listenport);
}
catch (Exception e) { /* error */ }
...
while ( … ) {
  try {
    cl=servsock.accept();
    out=new DataOutputStream(cl.getOutputStream());
    in=new DataInputStream(cl.getInputStream());
    // do standard I/O operations on 'in' and 'out'
    …
    cl.close();
  }
  catch (Exception e) {
    /* error for this client connection */
  }
}

Limitations
Simple client/server are single threaded
Affects server most, since it can service only one client at a time
Other clients are blocked while server is busy
“Bad” client can tie up server forever
The java.net socket classes used here do NOT support select() for sockets (java.nio, added in Java 1.4, provides a Selector for non-blocking channels)
Simple UDP Client/Server in Java
byte[] buf = new byte[MAXDGRAMSIZE];
sock=new DatagramSocket(myport);
while (...) {
  try {
    // incoming datagram
    DatagramPacket ingram = new DatagramPacket(buf, buf.length);
    sock.receive(ingram);
    …
    // outgoing datagram
    DatagramPacket outgram = new DatagramPacket(buf, buf.length,
                                                theiraddr, theirport);
    sock.send(outgram);
  }
  catch (UnknownHostException uhe) { … }
  catch (IOException ioe) { }
}

Multithreading for the TCP Server
public class MTServer {
  public static void main(String[] args) {
    int port=2345;
    final int MaxClients = 35;
    ...
    ServerSocket serversocket = null;
    try {
      // the second argument is the backlog (queue length for pending
      // connections), not a cap on the number of running threads
      serversocket = new ServerSocket(port, MaxClients);
      while (true) {
        Socket sock = serversocket.accept();
        ServerThread thr = new ServerThread(sock ...);
        thr.start();
      }
    }
    catch (Exception e) { /* server must die */ }
  }
}
Multithreading, Cont.
public class ServerThread extends Thread {
  private Socket sock;
  private DataInputStream in=null;
  private DataOutputStream out=null;

  public ServerThread(Socket sock, ... ) {
    this.sock = sock;
    ...
    try {
      in=new DataInputStream(this.sock.getInputStream());
      out=new DataOutputStream(this.sock.getOutputStream());
    }
    catch (Exception e) { /* oops */ }
  }

Multithreading, Cont.
  public void run() {
    boolean bye=false;
    try {
      while (!bye) {
        String command = in.readUTF();
        if (command.equals("BYE")) {
          bye = true; …
        }
        else if (…) {
          …
        }
      }
    }
    catch (Exception e) {
      /* death for this client connection */
    }
    finally {
      /* cleanup for this client; close() itself can throw IOException */
      try { out.close(); in.close(); }
      catch (IOException e) { /* nothing more to do */ }
    }
  }
}
Robust TCP

Java, TCP, and Robust Client/Server
TCP: Maintaining Control with Timeouts
try {
  gotstr = false;
  while (! gotstr) {
    try {
      sock.setSoTimeout(90000); // 90s timeout
      s = input.readUTF();
      gotstr=true;
    }
    catch (InterruptedIOException ie) {
      // at least I'm not stuck!
      // can do other processing here
    }
  }
}
catch (Exception e) {
  System.out.println("Broken connection");
}

Server Side “Probing”
// Detects a broken connection; note that a write to a dead peer may
// still succeed once, so the failure can surface on a later probe
try {
  gotstr = false;
  while (! gotstr) {
    try {
      sock.setSoTimeout(90000);
      s = input.readUTF();
      gotstr=true;
    }
    catch (InterruptedIOException ie) {
      // timeout...tempt fate by writing
      System.out.println("Connection check");
      // client must ignore
      output.writeUTF("$!$");
      System.out.println("Connection OK");
    }
  }
}
catch (Exception e) {
  System.out.println("Broken connection");
}
Client Side “Probing”
// Client wants to read a string...
// Assumes that a response is expected w/in 30s
boolean significant=false;
while (! significant) {
  socket.setSoTimeout(30000); // 30s timeout
  try {
    s = input.readUTF();
    significant = (! s.equals("$!$"));
  }
  catch (InterruptedIOException ie) {
    // assume connection is broken
    throw new IOException("Server is down?");
  }
}

Higher-level Communication
MOM (Message-oriented Middleware)
Message-passing libraries
– PVM
– MPI
Spaces
– Linda
Object-based approaches
– RMI
– CORBA
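
Returning to the probe convention from the two “Probing” slides: the client-side rule (“skip anything equal to the probe marker”) can be exercised without a network. The sketch below is an illustration only; it replaces the socket with an in-memory byte stream, and the class and helper names (`ProbeFilter`, `readSignificant`, `firstSignificant`) are made up for this example. Only the `"$!$"` marker and the use of `readUTF()`/`writeUTF()` come from the slides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ProbeFilter {
    public static final String PROBE = "$!$"; // probe marker used on the slides

    // Reads UTF strings until one arrives that is not a probe.
    public static String readSignificant(DataInputStream in) throws IOException {
        String s = in.readUTF();
        while (s.equals(PROBE)) {
            s = in.readUTF();
        }
        return s;
    }

    // Hypothetical helper: writes the given messages into an in-memory
    // stream (standing in for the socket) and returns the first
    // message that is not a probe.
    public static String firstSignificant(String... messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String m : messages) {
                out.writeUTF(m);
            }
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readSignificant(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSignificant(PROBE, PROBE, "real reply"));
    }
}
```

With a real socket the same loop would sit inside the timeout handling shown on the client-side slide; the in-memory stream just makes the filtering step easy to check in isolation.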
Message Oriented Middleware
Weakens link between the client and server…
Client sees an asynchronous interface
Request is sent independent of reply
Reply must be dequeued from a reply queue later
Client and server do not need to be running at the same time.

MOM: Guts
MOM system implements a queue between clients and servers
Each sends to other by enqueuing messages on one or more queues
Queues can have names, for “subject” of the queue
Simple API
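
The queue-based interaction described on the MOM slides can be sketched in a few lines. This is a toy, single-process illustration and not the API of any real MOM product: the class `TinyMom`, its `enqueue`/`dequeue` methods, and the queue names `"requests"` and `"replies"` are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class TinyMom {
    // Named queues, created on first use.
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<String> queue(String name) {
        return queues.computeIfAbsent(name, n -> new LinkedBlockingQueue<>());
    }

    // Sender does not wait for anyone to be listening.
    public void enqueue(String queueName, String message) {
        queue(queueName).add(message);
    }

    // Non-blocking receive: returns the oldest message, or null if the queue is empty.
    public String dequeue(String queueName) {
        return queue(queueName).poll();
    }

    public static void main(String[] args) {
        TinyMom mom = new TinyMom();

        // Client: send the request and carry on; the server need not be running yet.
        mom.enqueue("requests", "square 11");

        // Server (later): pick up the request and enqueue a reply.
        String request = mom.dequeue("requests");
        int n = Integer.parseInt(request.split(" ")[1]);
        mom.enqueue("replies", String.valueOf(n * n));

        // Client (later still): collect the reply from the reply queue.
        System.out.println(mom.dequeue("replies"));
    }
}
```

A real MOM would persist the queues and run as a separate service, which is what lets client and server run at different times; the sketch only shows the decoupled request and reply flow.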
MOM: Guts, Cont.
[Figure: client and server each reach the MOM through the MOM API; the MOM holds a request queue and a reply queue]

Other MOM Issues?
Administrative overhead
– Management of the queues
– Replication
– Load Balancing
Handling runaway applications that flood queue with requests or fail to collect responses
Cleanup after crashes
Performance
PVM/MPI: Message Passing Libraries

PVM Schematic
[Figure label: pvmd]
Linda: Tuple Spaces
Linda is a small “coordination” language for distributed systems development with a few simple operations
Extends a traditional language like C or FORTRAN or Java
Easy for programmers since it isn’t necessary to learn an entire language from scratch
Appropriate for “bag of tasks” problems
– e.g., many rendering algorithms in computer graphics

Linda Operations
out(t) puts a tuple t into the “bag”
in(t) removes a tuple matching t from the bag (blocks until one is available)
rd(t) reads (w/o removing) a tuple matching t
eval(t) creates a process to evaluate the tuple t
Predicate primitives (newer):
– inp, rdp test for presence, behave like blocking versions if they return true
Simple Linda Example
int main(int argc, char *argv[]) {
  int nworker, j, hello();
  nworker=atoi (argv[1]);
  for (j=0; j < nworker; j++)
    eval ("worker", hello(j));
  for(j=0; j < nworker; j++)
    in("done" /* , could read other values here */);
  printf("Got responses from all slaves.\n");
}

int hello(int i) {
  printf("Slave %d reporting.\n",i);
  out("done" /* , could return other values here */);
  return(0);
}

Java is a Natural for Linda-ness
Instead of tuple spaces, object spaces…bags of objects
Operations insert and remove arbitrary objects from the space
Retrieve by “name” or by class
Fairly easy to implement because of object serialization facility
Threaded implementations make powerful extensions like transactions fairly easy
JavaSpaces…
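
To make the Linda operations concrete, here is a minimal in-memory sketch in Java. It is not JavaSpaces and not any real Linda implementation: the class `TinyTupleSpace` is invented for this example, tuples are plain `Object[]` arrays, a `null` field in a template is assumed to act as a wildcard, and `eval` is stood in for by starting a thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TinyTupleSpace {
    private final List<Object[]> bag = new ArrayList<>();

    // out(t): put a tuple into the bag and wake any blocked readers.
    public synchronized void out(Object... tuple) {
        bag.add(tuple.clone());
        notifyAll();
    }

    // A null field in the template matches anything (assumed convention).
    private static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) return false;
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) return false;
        }
        return true;
    }

    // Finds the first matching tuple, optionally removing it; null if none.
    private Object[] find(Object[] template, boolean remove) {
        for (Iterator<Object[]> it = bag.iterator(); it.hasNext();) {
            Object[] t = it.next();
            if (matches(template, t)) {
                if (remove) it.remove();
                return t;
            }
        }
        return null;
    }

    // inp / rdp: non-blocking versions; return null when nothing matches.
    public synchronized Object[] inp(Object... template) { return find(template, true); }
    public synchronized Object[] rdp(Object... template) { return find(template, false); }

    // in(t): remove a matching tuple, blocking until one is available.
    public synchronized Object[] in(Object... template) throws InterruptedException {
        Object[] t;
        while ((t = find(template, true)) == null) wait();
        return t;
    }

    // rd(t): like in(t), but leaves the tuple in the bag.
    public synchronized Object[] rd(Object... template) throws InterruptedException {
        Object[] t;
        while ((t = find(template, false)) == null) wait();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyTupleSpace space = new TinyTupleSpace();
        int nworker = 3;
        for (int j = 0; j < nworker; j++) {
            final int id = j;
            new Thread(() -> space.out("done", id)).start(); // stands in for eval()
        }
        for (int j = 0; j < nworker; j++) {
            Object[] t = space.in("done", null);
            System.out.println("worker " + t[1] + " reported");
        }
    }
}
```

The `main` method mirrors the C example above: workers deposit a `"done"` tuple and the coordinator blocks in `in()` until it has collected one per worker. The order in which workers report is not deterministic.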
Java: RMI
RMI: Remote Method Invocation
Java’s OO facility provides a superset of RPC (Remote Procedure Call) functionality
RMI provides distributed objects for Java
– Objects can reside on different machines, and other objects can invoke their methods
When searching for objects on a remote host:
– rmi://host:port/name
Port defaults to 1099 if omitted

RMI Compilation/Deployment
Write interface for server
Write server implementation
Compile service interface, implementation
Run rmic on server implementation
– Generates server_stub, server_skel
Client needs only server interface
Server needs server_stub
rmiregistry runs on server machine
Server provides location of server_stub to rmiregistry
(Client will automatically download server_stub upon a lookup, if necessary)
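
The `rmi://host:port/name` form and the default port from the Java: RMI slide can be illustrated with a small parser. This is only a sketch of how such a name breaks down, written with `java.net.URI`; it is not how `Naming.lookup()` is implemented, and the class name `RmiName` and its `parse` method are made up for the example.

```java
import java.net.URI;

public class RmiName {
    public static final int DEFAULT_REGISTRY_PORT = 1099; // default stated on the slide

    // Splits an rmi:// URL into {host, port, name}, filling in the default port.
    public static String[] parse(String url) {
        URI uri = URI.create(url);
        if (!"rmi".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an rmi URL: " + url);
        }
        int port = (uri.getPort() == -1) ? DEFAULT_REGISTRY_PORT : uri.getPort();
        String path = uri.getPath();
        String name = path.startsWith("/") ? path.substring(1) : path;
        return new String[] { uri.getHost(), String.valueOf(port), name };
    }

    public static void main(String[] args) {
        String[] parts = parse("rmi://localhost/MOL");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
    }
}
```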
RMI Schematic
[Figure labels: Client, Server_stub, Server_skel, Server, RMIC, Server_Interface (public interface server)]

RMI-in-action Schematic
[Figure labels: Client, O = (server_type)lookup(), rmiregistry, O.a(..), Server_stub, Server_skel, bind(this), Server, a(..)]
Simple RMI Server Interface
// Meaning of life server interface.
import java.lang.*;
import java.io.*;
import java.rmi.*;

public interface RMI_MOLServerInterface extends Remote {
  // reveal the meaning of life
  public String reveal() throws java.rmi.RemoteException;
}

Simple RMI Client
// Meaning of life client
import java.lang.*;
import java.io.*;
import java.rmi.*;

public class RMI_MOLClient {
  public static void main (String args[]) throws Exception {
    if (args.length != 1) {
      throw new RuntimeException("Usage: java RMI_MOLClient <host>");
    }
    System.setSecurityManager(new RMISecurityManager());
    RMI_MOLServerInterface mol = null;
    try {
      mol = (RMI_MOLServerInterface)Naming.lookup("rmi://" + args[0] + "/MOL");
    }
    catch (java.rmi.NotBoundException e1) {
      System.out.println("No MOL service object bound on that host.");
    }
    catch (java.rmi.ConnectException e2) {
      System.out.println("Either the RMI registry or the MOL service is dead on that host.");
    }
    if (mol != null) {
      System.out.println(mol.reveal());
    }
  }
}
Simple RMI Server Implementation
// Meaning of life server implementation.
import java.lang.*;
import java.io.*;
import java.rmi.*;
import java.rmi.server.UnicastRemoteObject;

// The class header and constructor signature were lost from the slide;
// they are reconstructed here (an assumption) from the imports, the
// interface, and the call in main().
public class RMI_MOLServer extends UnicastRemoteObject
                           implements RMI_MOLServerInterface {
  public RMI_MOLServer(String mol) throws java.rmi.RemoteException {
    this.mol = mol;
  }
  public String reveal() throws java.rmi.RemoteException {
    return mol;
  }
  public static void main (String args[]) throws Exception {
    if (args.length != 1) {
      throw new RuntimeException("Usage: java RMI_MOLServer <string>");
    }
    RMI_MOLServer us = new RMI_MOLServer(args[0]);
    Naming.rebind("MOL", us);
  }
  // shhhhhhhh!
  private String mol;
}

Compile…
javac RMI_MOLServerInterface.java
javac RMI_MOLServer.java
rmic RMI_MOLServer
– generates RMI_MOLServer_stub.class and RMI_MOLServer_skel.class
Run rmiregistry on server end
To run client and server…
java -Djava.security.policy="policy.all" -Djava.rmi.server.codebase="file://c:/rmi/mol/" RMI_MOLServer "Life ain't no box of chocolates."
java -Djava.security.policy="policy.all" RMI_MOLClient localhost
Un/reliable Communication

RPC Design
[Figure labels: Caller, Callee]

RPC Implementation

RPC Execution
[Figure: RPC execution between Caller and Callee; labels: params packing, local call, wait, pack, return results, unpack, return result]

Sun RPC Specification

Sun RPC: Server Side
Client Side
client.c :
#include "unpipc.h"   /* local headers */
#include "square.h"   /* generated by rpcgen */

int main(int argc, char **argv) {
  CLIENT *cl;         /* defined in rpc.h */
  square_in in;
  square_out *outp;

  if (argc != 3) err_quit("usage: client <hostname> <integer_value>");
  cl = Clnt_create(argv[1], SQUARE_PROG, SQUARE_VERS, "tcp");
  in.arg1 = atol(argv[2]);
  if ((outp = squareproc_1(&in, cl)) == NULL)
    err_quit("%s", clnt_sperror(cl, argv[1]));
  printf("result: %ld\n", (long) outp->res1);
  exit(0);
}

Client Compilation
Compilation:
cc -c client.c -o client.o
cc -c square_clnt.c -o square_clnt.o
cc -c square_xdr.c -o square_xdr.o
cc -o client client.o square_clnt.o square_xdr.o libunpipc.a -lnsl
Notes:
rpcgen generates:
– square_xdr.c -> XDR data conversions
– square_clnt.c -> client stub
Execution:
client bsdi 11 -> result: 121
client 209.76.87.90 22 -> result: 484
client nosuchhost 11 -> nosuchhost: RPC: Unknown host….
client localhost 11 -> localhost: RPC: Program not registered
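
The pack / unpack steps that the stubs perform around the remote call can be sketched in a single process. This is an illustration only and not Sun RPC: there is no network, no portmapper, and no real XDR. The class `SquareRpcSketch` and its method names are invented for the example, and the "wire format" is assumed to be one big-endian 8-byte integer (XDR is also big-endian, but its actual encoding rules are not reproduced here).

```java
import java.nio.ByteBuffer;

public class SquareRpcSketch {
    // Caller stub: pack the parameter into a request message.
    public static byte[] packRequest(long arg) {
        return ByteBuffer.allocate(Long.BYTES).putLong(arg).array();
    }

    // The procedure the callee actually implements.
    public static long square(long x) {
        return x * x;
    }

    // Callee stub: unpack the parameter, make the local call, pack the result.
    public static byte[] serve(byte[] request) {
        long arg = ByteBuffer.wrap(request).getLong();
        long result = square(arg);
        return ByteBuffer.allocate(Long.BYTES).putLong(result).array();
    }

    // Caller stub: unpack the result from the reply message.
    public static long unpackReply(byte[] reply) {
        return ByteBuffer.wrap(reply).getLong();
    }

    // What the caller sees: an ordinary-looking function call.
    // In a real system the byte arrays would travel over the network
    // and the caller would wait in between.
    public static long callSquare(long arg) {
        byte[] request = packRequest(arg);
        byte[] reply = serve(request);
        return unpackReply(reply);
    }

    public static void main(String[] args) {
        System.out.println("result: " + callSquare(11));
    }
}
```

In the Sun RPC example above, the role of `packRequest`/`unpackReply` is played by the rpcgen-generated client stub and XDR routines, so the caller only ever writes something like `squareproc_1(&in, cl)`.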
RPC Client-Server

RPC Implementation
[Figure labels: client.c, square_clnt.c, square_xdr.c, square_svc.c, server.c, runtime library, cc, client, server]

Reference Book:
Unix Network Programming by Richard Stevens, Prentice-Hall.