DS Unit 1 Part 2
[Figure: a distributed system as a network of workstations]
1. Transparency
Issue: How to achieve the single system image? How to "fool" everyone into
thinking that the collection of machines is a "simple" computer?
Access transparency - local and remote resources are accessed using identical
operations.
Location transparency - users cannot tell where hardware and software resources
(CPUs, files, databases) are located; the name of a resource should not encode
its location.
Migration (mobility) transparency - resources should be free to move from one
location to another without having their names changed.
Replication transparency - the system is free to make additional copies of files
and other resources (for purpose of performance and/or reliability), without the
users noticing.
Example: several copies of a file exist; a request is served by whichever copy is
closest to the client.
Concurrency transparency - the users will not notice the existence of other users
in the system (even if they access the same resources).
Failure transparency - applications should be able to complete their task despite
failures occurring in certain components of the system.
Performance transparency - load variation should not lead to performance
degradation.
This could be achieved by automatic reconfiguration in response to load changes;
it is difficult to achieve.
3. Performance
Several factors are influencing the performance of a distributed system:
The performance of individual workstations.
The speed of the communication infrastructure.
Extent to which reliability (fault tolerance) is provided (replication and
preservation of coherence imply large overheads).
Flexibility in workload allocation: for example, idle processors
(workstations) could be allocated automatically to a user’s task.
4. Scalability
The system should remain efficient even with a significant increase in the number
of users and resources connected:
Cost of adding resources should be reasonable;
Performance loss with increased number of users and resources should be
controlled;
Software resources should not run out (number of bits allocated to
addresses, number of entries in tables, etc.)
5. Heterogeneity
Distributed applications are typically heterogeneous:
Different hardware: mainframes, workstations, PCs, servers, etc.;
Different software: UNIX, MS-Windows, IBM OS/2, Real-time OSs, etc.;
6. Openness
One of the important features of distributed systems is openness and flexibility:
Every service is equally accessible to every client (local or remote);
It is easy to implement, install and debug new services;
Users can write and install their own services.
Key aspect of openness:
Standard interfaces and protocols (like Internet communication protocols)
Support of heterogeneity (by adequate middleware, like CORBA)
Availability: if machines go down, the system should continue to work with the
reduced amount of resources.
There should be a very small number of critical resources;
Critical resources: resources which have to be up in order for the distributed
system to work.
Key pieces of hardware and software (critical resources) should be
replicated, i.e. if one of them fails another one takes over - redundancy.
Data on the system must not be lost, and copies stored redundantly on different
servers must be kept consistent.
The more copies kept, the better the availability, but the harder it becomes to
keep them consistent.
Security: distributed systems must allow communication between
programs/users/resources on different computers, and this free access brings
security risks.
The appropriate use of resources by different users has to be guaranteed.
1. Architectural Models
Issue: How are responsibilities distributed between system components and how
are these components placed?
Client-server model
Peer-to-peer
Variations of the above two:
Proxy server
Mobile code
Mobile agents
Network computers
Thin clients
Mobile devices
[Figure: the peer-to-peer model]
2. Interaction Models
Issue: How do we handle time? Are there time limits on process execution,
message delivery, and clock drifts?
Synchronous distributed systems
Asynchronous distributed systems
Main features of synchronous distributed systems:
Lower and upper bounds on execution time of processes can be set.
Transmitted messages are received within a known bounded time.
Drift rates between local clocks have a known bound.
Important consequences:
In a synchronous distributed system there is a notion of global physical time
(with a known relative precision depending on the drift rate).
Only synchronous distributed systems have a predictable behavior in terms
of timing. Only such systems can be used for hard real-time applications.
In a synchronous distributed system it is possible and safe to use timeouts
in order to detect failures of a process or communication link.
It is difficult and costly to implement synchronous distributed systems.
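As a minimal sketch of the timeout idea above (the bound and all names are illustrative, not from the notes): since the round-trip time has a known upper bound, a reply that has not arrived within that bound can safely be taken as a failure.

import queue
import threading

ROUND_TRIP_BOUND = 0.5   # known upper bound on the round trip (assumed value)

def peer_alive(reply_queue):
    """True if the peer replied within the known bound; in a synchronous
    system a missing reply is a safe indication of failure."""
    try:
        reply_queue.get(timeout=ROUND_TRIP_BOUND)
        return True
    except queue.Empty:
        return False

q = queue.Queue()
threading.Timer(1.0, q.put, args=("pong",)).start()  # reply arrives too late
print("peer alive:", peer_alive(q))                  # -> peer alive: False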
3. Fault Models
Issue: What kind of faults can occur and what are their effects?
Omission faults
Arbitrary faults
Timing faults
Omission Faults
A processor or communication channel fails to perform actions it is supposed to
do.
This means that the particular action is not performed!
We do not have an omission fault if:
An action is delayed (regardless of how long) but is finally executed.
An action is executed with an erroneous result.
With synchronous systems, omission faults can be detected by timeouts.
If we are sure that messages arrive, a timeout will indicate that the sending
process has crashed. Such a system has ‘fail-stop’ behavior.
Timing Faults
Timing faults can occur in synchronous distributed systems, where time limits are
set to process execution, communications, and clock drifts.
A timing fault occurs if any of these time limits is exceeded.
Network Protocol
Middleware and distributed applications have to be implemented on top of a
network protocol. Such a protocol is implemented as several layers.
In the case of the Internet:
[Figure: the Internet protocol layers]
Implementation of RMI
[Figure: the implementation structure of RMI]
Time in Distributed Systems
Problems:
Time-triggered systems: these are systems in which certain activities are
scheduled to occur at predefined moments in time. If such activities are to
be coordinated over a distributed system, we need a coherent notion of
time.
Example: time-triggered real-time systems
Maintaining the consistency of distributed data is often based on the time
when a certain modification has been performed.
Example: a make program.
When the programmer has finished changing some source files he starts make;
make examines the times at which all object and source files were last modified
and decides which source files have to be recompiled.
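A minimal sketch of the decision make takes, with hypothetical file names; this is exactly the step that goes wrong when the clocks of the machines involved disagree:

import os

def needs_recompile(source, obj):
    """True if the object file is missing or older than the source file."""
    if not os.path.exists(obj):
        return True
    return os.path.getmtime(source) > os.path.getmtime(obj)

# e.g. needs_recompile("main.c", "main.o") -- if main.o was produced on a
# machine whose clock runs ahead, a later edit of main.c may look "older"
# and the necessary recompilation is skipped.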
Solutions:
1. Synchronization of physical clocks
2. Logical clocks
[R1]:
CPi is incremented before each event is issued at process Pi: CPi := CPi + 1.
[R2]:
a) When ‘a’ is the event of sending a message ‘m’ from process Pi, then the
timestamp tm = CPi (a) is included in ‘m’. (CPi(a) is the logical clock value
obtained after applying rule R1).
b) On receiving message ‘m’ by process Pj, its logical clock CPj is updated as
follows: CPj := max(CPj, tm).
c) The new value of CPj is used to timestamp the event of receiving message
‘m’ by Pj (applying rule R1).
If ‘a’ and ‘b’ are events in the same process and ‘a’ occurred before ‘b’, then
a → b, and (by R1) C(a) < C(b).
If ‘a’ is the event of sending a message ‘m’ in a process, and ‘b’ is the event of
the same message ‘m’ being received by another process, then a → b, and (by
R2) C(a) < C(b).
If a → b and b → c, then a → c, and (by induction) C(a) < C(c).
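Rules [R1] and [R2] can be summarized in a short sketch (the class and method names are mine, not from the notes):

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1             # [R1]: increment before each event
        return self.time

    def send(self):
        return self.local_event()  # [R2a]: tm = CPi(a) is attached to m

    def receive(self, tm):
        self.time = max(self.time, tm)  # [R2b]: CPj := max(CPj, tm)
        return self.local_event()       # [R2c]: timestamp the receive event

p1, p2 = LamportClock(), LamportClock()
tm = p1.send()           # tm == 1
t_rcv = p2.receive(tm)   # t_rcv == 2, so C(send) < C(receive), as required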
Vector Clocks
Vector clocks give the ability to decide whether two events are causally related or
not by simply looking at their timestamp.
Each process Pi has a clock CPi, which is an integer vector of length n (n is
the number of processes).
The value of CPi is used to assign timestamps to events in process Pi.
CPi(a) is the timestamp of event a in process Pi.
CPi[i], the ith entry of CPi, corresponds to Pi’s own logical time.
CPi[j], j ≠ i, is Pi’s "best guess" of the logical time at Pj.
CPi[j] indicates the (logical) time of occurrence of the last event at Pj which
is in a happened before relation to the current event at Pi.
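A minimal sketch of a vector clock and of the causality test it enables (the update rules follow the usual convention; all names are mine):

class VectorClock:
    def __init__(self, i, n):
        self.i = i            # index of this process
        self.clock = [0] * n  # CPi, one entry per process

    def local_event(self):
        self.clock[self.i] += 1        # advance own logical time CPi[i]
        return list(self.clock)

    def receive(self, tm):
        # merge: update the "best guess" about every other process
        self.clock = [max(c, t) for c, t in zip(self.clock, tm)]
        return self.local_event()

def happened_before(a, b):
    """a -> b iff a <= b componentwise and a != b; if neither a -> b nor
    b -> a holds, the two events are concurrent."""
    return all(x <= y for x, y in zip(a, b)) and a != b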
Basic Idea:
A message is delivered to a process only if the message immediately preceding it
(considering the causal ordering) has been already delivered to the process.
Otherwise, the message is buffered.
We assume that processes communicate using broadcast messages. (There exist
similar protocols for non-broadcast communication too.)
The events which are of interest here are the sending of messages ⇒ vector
clocks will be incremented only for message sending.
[R1]: Before broadcasting a message m, process Pi increments its own entry,
CPi[i] := CPi[i] + 1, and assigns the resulting vector as m’s timestamp tm.
[R2]: Delivery of a message m with timestamp tm, sent by Pi, is delayed at
process Pj until both of the following hold:
[R2.1]: CPj[i] = tm[i] - 1;
[R2.2]: CPj[k] ≥ tm[k] for all k ≠ i.
[R3]: When a message is delivered at process Pj, its vector clock CPj is updated
according to rule [R1:b] of the vector clock implementation.
tm[i] - 1 indicates how many messages originating from Pi precede m.
Step [R2.1] ensures that process Pj has received all the messages
originating from Pi that precede m.
Step [R2.2] ensures that Pj has received all those messages received by Pi
before sending m.
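The delivery test of steps [R2.1] and [R2.2] fits in a few lines (a sketch in the Birman-Schiper-Stephenson style; the function name is mine):

def can_deliver(cpj, tm, i):
    """May Pj (with vector clock cpj) deliver broadcast m sent by Pi
    with vector timestamp tm?"""
    # [R2.1]: Pj has already delivered the tm[i]-1 earlier broadcasts of Pi.
    from_sender_ok = cpj[i] == tm[i] - 1
    # [R2.2]: Pj has delivered everything Pi had delivered before sending m.
    others_ok = all(cpj[k] >= tm[k] for k in range(len(tm)) if k != i)
    return from_sender_ok and others_ok

# can_deliver([0, 0, 0], [2, 0, 0], 0) -> False: buffer the message, since
# P0's first broadcast is still missing.
# can_deliver([1, 0, 0], [2, 0, 0], 0) -> True: deliver.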
Sending a Message:
Send message M, timestamped tm, along with V_P1 to P2.
Insert (P2, tm) into V_P1, overwriting the previous value (P2, t), if any.
(P2, tm) itself is not sent to P2. Any future message carrying (P2, tm) in its
vector cannot be delivered to P2 until t_P2 > tm.
Delivering a Message:
If V_M (the vector carried by the message) does not contain any pair (P2, t),
the message can be delivered.
Otherwise, a pair (P2, t) exists: if t ≥ t_P2, buffer the message (do not
deliver it); else (t < t_P2) deliver it.
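The buffering rule above reduces to a one-line test; a sketch assuming the pairs (P, t) are kept in a dictionary and, for simplicity, the scalar comparison written above (in the full protocol the timestamps are vector times compared componentwise):

def can_deliver_to(v_m, dest, t_dest):
    """Deliver M at process dest only if V_M carries no pair (dest, t)
    with t >= t_dest; otherwise the message must be buffered."""
    t = v_m.get(dest)
    return t is None or t < t_dest

# can_deliver_to({"P2": 3}, "P2", 2) -> False: buffer until t_P2 grows.
# can_deliver_to({"P2": 3}, "P2", 4) -> True: deliver.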
Global States
Problem: How to collect and record a consistent global state in a distributed
system.
Why a problem?
Because there is no global clock (no coherent notion of time) and no shared
memory!
Consider a bank system with two accounts A and B at two different sites, and a
transfer of $50 between A and B: if the local states are recorded at unsuitable
moments, the recorded global state may show the $50 twice or not at all.
In general, a global state consists of a set of local states and a set of states
of the communication channels.
The state of the communication channel in a consistent global state should
be the sequence of messages sent along the channel before the sender’s local
state was recorded, excluding the messages already received by the receiver
before its local state was recorded.
Formal Definition
LSi is the local state of process Pi. Besides other information, the local state
also includes a record of all messages sent and received by the process.
We consider the global state GS of a system, as the collection of the local
states of its processes: GS = {LS1, LS2, ..., LSn}.
A certain global state can be consistent or not!
send(Mij) denotes the event of sending message Mij from Pi to Pj;
rec(Mij) denotes the event of receiving message Mij by Pj.
send(Mij) ∈ LSi if and only if the sending event occurred before the local
state was recorded;
rec(Mij) ∈ LSj if and only if the receiving event occurred before the local
state was recorded.
transit(LSi, LSj) = {Mij | send(Mij) ∈ LSi ∧ rec(Mij) ∉ LSj}
inconsistent(LSi, LSj) = {Mij | send(Mij) ∉ LSi ∧ rec(Mij) ∈ LSj}
A global state GS is consistent iff inconsistent(LSi, LSj) = ∅ for every pair of
processes; it is strongly consistent if, in addition, transit(LSi, LSj) = ∅, i.e.
every sent message has also been received.
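The two sets translate directly into code (a sketch; representing a local state by the set of messages sent/received so far is my choice, not from the notes):

def transit(sent_i, received_j):
    """Messages sent by Pi but not yet received by Pj: the channel state."""
    return sent_i - received_j

def inconsistent(sent_i, received_j):
    """Messages received by Pj whose send is not recorded at Pi."""
    return received_j - sent_i

sent, received = {"M12"}, set()
assert inconsistent(sent, received) == set()   # the global state is consistent
assert transit(sent, received) == {"M12"}      # but M12 is still in transit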
Example
{LS11, LS22, LS32} is inconsistent;
{LS12, LS23, LS33} is consistent;
{LS11, LS21, LS31} is strongly consistent.
Chandy-Lamport Algorithm
A process Pi records its local state LSi and later sends a message ‘m’ to Pj;
LSj at Pj has to be recorded before Pj has received m.
The state SChij of the channel Chij consists of all messages that process Pi
sent before recording LSi and which have not been received by Pj when
recording LSj.
A snapshot is started at the request of a particular process Pi, for example,
when it suspects a deadlock because of a long delay in accessing a resource;
Pi then records its state LSi and, before sending any other message, it sends
a token to every Pj that Pi communicates with.
When Pj receives a token from Pi, and this is the first token it has received,
it must record its state before it receives the next message from Pi.
After recording its state Pj sends a token to every process it communicates
with, before sending them any other message.
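A minimal sketch of the token handling above, assuming FIFO channels and a send callback supplied by the runtime (all names are mine):

class SnapshotProcess:
    def __init__(self, pid, peers, send):
        self.pid, self.peers, self.send = pid, peers, send
        self.recorded_state = None                    # LSi, once recorded
        self.channel_state = {p: [] for p in peers}   # SCh per incoming channel
        self.closed = set()          # channels whose token has already arrived

    def start_snapshot(self, state):
        """Record the local state, then send a token to every peer
        before any other message."""
        self.recorded_state = state
        for p in self.peers:
            self.send(p, "TOKEN")

    def on_message(self, sender, msg, state):
        if msg == "TOKEN":
            if self.recorded_state is None:   # first token: record state now
                self.start_snapshot(state)
            self.closed.add(sender)           # channel from sender is recorded
        elif self.recorded_state is not None and sender not in self.closed:
            # sent before the sender recorded its state, received after we
            # recorded ours: the message belongs to the channel state
            self.channel_state[sender].append(msg)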