Distributed Computing
UNIT I
INTRODUCTION
1.1 DEFINITION
A distributed system is a collection of independent entities that cooperate to solve a
problem that cannot be solved individually. A distributed system can be characterized as a
collection of mostly autonomous processors communicating over a communication network.
Features / Issues of Distributed Systems
1. No common physical clock
2. No shared memory: This is a key feature that requires message-passing for
communication.
3. Geographical separation: The processors may be geographically distant; for example, the Google search engine is based on the NOW (network of workstations) architecture.
4. Autonomy and heterogeneity: The processors are “loosely coupled” in that they have
different speeds and each can be running a different operating system.
5. Communication is hidden from users
6. Applications interact in uniform and consistent way
7. High degree of scalability
8. A distributed system is functionally equivalent to the systems of which it is composed.
9. Resource sharing is possible in distributed systems.
10. Distributed systems act as fault tolerant systems
11. Enhanced performance
Differences between centralized and distributed systems
Middleware
The distributed software is also termed middleware. Middleware is the distributed
software that drives the distributed system while providing transparency of heterogeneity at the
platform level.
Distributed execution
A distributed execution is the execution of processes across the distributed system to
collaboratively achieve a common goal. An execution is also sometimes termed a computation
or a run.
The distributed system uses a layered architecture to break down the complexity of
system design.
Here we assume that the middleware layer does not contain the traditional application
layer functions of the network protocol stack, such as http, mail, ftp, and telnet. Various
primitives and calls to functions defined in various libraries of the middleware layer are
embedded in the user program code.
Examples of middleware
1. Object Management Group’s (OMG) common object request broker architecture
(CORBA)
2. Remote procedure call (RPC) mechanism
3. DCOM (distributed component object model)
The main objective of parallel systems is to improve processing speed. They are
sometimes known as multiprocessors, multicomputers, or tightly coupled systems. They refer to
the simultaneous use of multiple computer resources, which can include a single computer with
multiple processors. Multiprocessors are classified as symmetric or asymmetric.
Symmetric multiprocessor
When all the processors have equal access to all the peripheral devices, the system is
called a symmetric multiprocessor.
Asymmetric multiprocessor
When only one or a few processors can access the peripheral devices, the system is called
an asymmetric multiprocessor.
Drawbacks:
1. Shared memory can quickly become a bottleneck for system performance.
2. All processors must synchronize on the single bus and memory access.
ii) Multicomputer parallel system: non-uniform memory access (NUMA)
In the NUMA multiprocessor model, the access time varies with the location of the memory
word. Here, the shared memory is physically distributed among all the processors; these
portions are called local memories.
1.4.2 TOPOLOGIES
The choice of the interconnection network affects several characteristics of the
system, such as node complexity, scalability, and cost. The interconnection network forms the
topology used to access memory. It may be one of the following:
Omega network
Butterfly network
Torus or 2D Mesh Topology
Hypercube
Array Processors
Omega network
A multistage omega network is formed from 2×2 switching elements. Each 2×2 switch allows
data on either of the two input wires. Only one data unit can be sent on an output wire in a
single step, so many buffering techniques have been proposed to avoid collisions of data.
The omega network connecting n processors with n memory units has (n/2) log n switching
elements of size 2×2, arranged in log n stages.
Omega interconnection function
The interconnection function is the perfect shuffle: output i of a stage connects to input j of the next stage, where
j = 2i for 0 ≤ i ≤ n/2 − 1
j = 2i + 1 − n for n/2 ≤ i ≤ n − 1
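The perfect shuffle maps output i to the input whose label is a one-bit left rotation of i's binary representation. As an illustrative sketch (not part of the original text), the following Python function computes this mapping:

    def omega_next(i, n):
        # Perfect-shuffle connection of an omega network:
        # output i of a stage connects to input j of the next stage.
        if i <= n // 2 - 1:
            return 2 * i
        return 2 * i + 1 - n

    # For n = 8, outputs 0..7 map to inputs [0, 2, 4, 6, 1, 3, 5, 7].
    print([omega_next(i, 8) for i in range(8)])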
Butterfly network
A butterfly network links multiple computers into a high-speed network. For a butterfly
network with n processor nodes, there need to be n(log n + 1) switching nodes. The generation
of the interconnection pattern between a pair of adjacent stages depends not only on n but also
on the stage number s. In a stage s switch, if the (s + 1)th MSB of j is 0, the data is routed to the
upper output wire; otherwise it is routed to the lower output wire.
Torus or 2D mesh
Each processing unit in the torus topology is identified using a unique label, with dimensions
distinguished as bit positions.
Hypercube
The path between any two nodes in a 4-D hypercube is found using the Hamming distance.
Routing is done hop by hop, with each hop moving to an adjacent node whose label differs in
one bit. This topology has good congestion control and fault tolerance.
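As a small illustrative sketch in Python (the function names are ours, not from the text), hop-by-hop routing corrects one differing bit per hop, so the hop count equals the Hamming distance between the two labels:

    def hamming_distance(a, b):
        # Number of bit positions in which the node labels differ.
        return bin(a ^ b).count("1")

    def hypercube_route(src, dst):
        # Route hop by hop; each hop flips the lowest-order differing bit,
        # moving to an adjacent node whose label differs in one bit.
        path, cur = [src], src
        while cur != dst:
            diff = cur ^ dst
            cur ^= diff & -diff
            path.append(cur)
        return path

    # In a 4-D hypercube: hamming_distance(0b0000, 0b1011) = 3 hops.
    print(hypercube_route(0b0000, 0b1011))   # [0, 1, 3, 11]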
A message-passing system can provide shared memory emulation; this is expensive because the
read and write operations are implemented by using network-wide communication.
The processing for the receive primitive completes when the data to be received is copied
into the receiver’s user buffer.
Asynchronous primitives
A Send primitive is said to be asynchronous, if control returns back to the invoking
process after the data item to be sent has been copied out of the user-specified buffer.
It does not make sense to define asynchronous Receive primitives.
For non-blocking primitives, a return parameter on the primitive call returns a system-
generated handle which can be later used to check the status of completion of the call.
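As an illustrative analogy (the thread pool and the socket-like object conn are hypothetical, not part of the original text), Python's concurrent.futures.Future behaves like such a system-generated handle:

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=4)

    def nonblocking_send(conn, data):
        # Returns immediately; the actual copy/transmission happens later.
        return pool.submit(conn.sendall, data)   # the Future is the handle

    # handle = nonblocking_send(conn, b"payload")
    # ... do other work ...
    # handle.done()     # poll the status of completion of the call
    # handle.result()   # or block until the send completes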
Processor synchrony indicates that all the processors execute in lock-step with
their clocks synchronized.
Since distributed systems do not follow a common clock, this abstraction is implemented
using some form of barrier synchronization to ensure that no processor begins executing the next
step of code until all the processors have completed executing the previous steps of code
assigned to each of the processors.
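A minimal sketch of this barrier abstraction, using Python threads to stand in for processors (real distributed barriers are built from message passing, so this is only illustrative):

    import threading

    N = 4
    barrier = threading.Barrier(N)

    def processor(pid):
        for step in range(3):
            print(f"processor {pid} executing step {step}")
            barrier.wait()   # no processor starts step+1 until all finish step

    threads = [threading.Thread(target=processor, args=(i,)) for i in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()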
A synchronous system is one in which:
the processors are synchronized and the clock drift rate between any two processors is bounded;
message delivery times are such that delivery occurs in one logical step or round;
there is a known upper bound on the time taken by a process to execute a step.
If system A can be emulated by system B, denoted A/B, and if a problem is not solvable
in B, then it is also not solvable in A. If a problem is solvable in A, it is also solvable in B.
Hence, in a sense, all four classes are equivalent in terms of computability in failure-free
systems.
The design of distributed systems has numerous challenges. They can be categorized into:
Issues related to system and operating systems design
Issues related to algorithm design
Issues arising due to emerging technologies
The above three classes are not mutually exclusive.
Logical time is relative and eliminates the overhead of providing physical time for
applications. Logical time can
(i) capture the logic and inter-process dependencies;
(ii) track the relative progress at each process.
d. Synchronization/coordination mechanisms
Synchronization is essential for the distributed processes to facilitate concurrent
execution without affecting other processes.
The synchronization mechanisms also involve resource management and
concurrency management mechanisms.
Some techniques for providing synchronization are:
Physical clock synchronization
Leader election
Mutual exclusion
Deadlock detection and resolution: detection should be coordinated to avoid duplicate work, and deadlock resolution should be coordinated to avoid unnecessary aborts of processes.
Termination detection: this requires cooperation among the processes to detect the specific global state of quiescence.
Garbage collection: detecting garbage requires coordination among the processes.
e. Group communication, multicast, and ordered message delivery
A group is a collection of processes that share a common context and collaborate on
a common task within an application domain. Group management protocols are
q. Grid computing
Grid computing is deployed to manage resources; for instance, idle CPU cycles of
machines connected to the network are made available to others.
The challenges include: scheduling jobs, providing a framework for implementing quality of
service, real-time guarantees, and security.
r. Security in distributed systems
The challenges of security in a distributed setting include confidentiality,
authentication, and availability. These must be addressed using efficient and scalable
solutions.
Consistent state
A distributed snapshot should reflect a consistent state. A global state is consistent if it
could have been observed by an external observer. For a successful Global State, all states must
be consistent:
If we have recorded that a process P has received a message from a process Q,
then we should also have recorded that process Q actually sent that message.
Otherwise, the snapshot will contain recordings of messages that have been
received but never sent.
The reverse condition (Q has sent a message that P has not received) is allowed.
The history of each process pi is given by the sequence of events it executes,
history(pi) = ⟨e_i^0, e_i^1, e_i^2, …⟩.
Consistent states:
The states should not violate causality. Such states are called consistent global states and
are meaningful global states.
Inconsistent global states:
They are not meaningful in the sense that a distributed system can never be in an
inconsistent state.
1.13 CUTS OF A DISTRIBUTED COMPUTATION
The notion of a global state can be graphically represented by a cut. A cut represents the
last event that has been recorded for each process.
Pictorially, a cut is a line that slices the space–time diagram, and thus the set of events in the
distributed computation, into a PAST and a FUTURE.
The PAST contains all the events to the left of the cut
FUTURE contains all the events to the right of the cut.
For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the PAST
and FUTURE of C, respectively.
Consistent cut:
A consistent global state corresponds to a cut in which every message received in the
PAST of the cut was sent in the PAST of that cut.
Inconsistent cut:
A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST.
1.14 PAST AND FUTURE CONES OF AN EVENT
In a distributed computation, an event ej could have been affected only by events ei
such that ei → ej; all the information available at ei could be made accessible at ej. In other
words, ei and ej should have a causal relationship. Let Past(ej) denote all events in the past of ej
in any computation.
The term max(Pasti(ej)) denotes the latest event of process pi that has affected ej.
This will always be a message send event.
A cut in a space–time diagram is a line joining an arbitrary point on each process line that
slices the space–time diagram into a PAST and a FUTURE. A consistent global state corresponds
to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut.
Futurei(ej) is the set of those events of Future(ej) that are on process pi, and min(Futurei(ej)) is
the first event on process pi that is affected by ej. All events at a process pi that occurred after
max(Pasti(ej)) but before min(Futurei(ej)) are concurrent with ej.
LOGICAL TIME
Logical clocks are based on capturing chronological and causal relationships of processes and
ordering events based on these relationships.
Three types of logical clock are maintained in distributed systems:
Scalar clock
Vector clock
Matrix clock
Differences between physical and logical clocks
A physical clock is a physical process combined with a method of measuring that process to
record the passage of time; physical clocks are based on cyclic processes such as celestial
rotation.
A logical clock is a mechanism for capturing chronological and causal relationships in a
distributed system; a logical clock allows a global ordering on events from different processes.
A system of logical clocks consists of a time domain T and a logical clock C. Elements of T
form a partially ordered set over a relation <. This relation is usually called the happened
before or causal precedence.
1.16 A FRAMEWORK FOR A SYSTEM OF LOGICAL CLOCKS
In general, every time R1 is executed, d can have a different value, and this value may be
application-dependent. However, typically d is kept at 1, because this is able to identify the time of
each event uniquely at a process while keeping the rate of increase of d to its lowest level.
R2: Each message piggybacks the clock value of its sender at sending time. When a process pi
receives a message with timestamp Cmsg, it executes the following actions:
1. Ci := max(Ci, Cmsg);
2. execute R1;
3. deliver the message.
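A minimal Python sketch of rules R1 and R2 for a scalar (Lamport) clock with d = 1 (the class and method names are illustrative):

    class ScalarClock:
        def __init__(self, d=1):
            self.C = 0
            self.d = d

        def r1(self):
            # R1: executed before any internal, send, or receive event.
            self.C += self.d
            return self.C

        def on_send(self):
            # Timestamp to piggyback on the outgoing message.
            return self.r1()

        def on_receive(self, C_msg):
            # R2: merge the sender's clock value, then execute R1 and deliver.
            self.C = max(self.C, C_msg)
            return self.r1()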
3. Event Counting
If event e has a timestamp h, then h − 1 represents the minimum logical duration, counted
in units of events, required before producing the event e; this is called the height of the event e.
That is, h − 1 events have been produced sequentially before the event e, regardless of the
processes that produced these events.
4. No strong consistency
The reason that scalar clocks are not strongly consistent is that the logical local clock and the
logical global clock of a process are squashed into one, resulting in the loss of causal
dependency information among events at different processes.
1.18 VECTOR TIME
The time domain is represented by a set of n-dimensional non-negative
integer vectors in vector time.
The system of vector clocks was developed independently by Fidge, Mattern, and
Schmuck. In the system of vector clocks, the time domain is represented by a set of n-
dimensional non-negative integer vectors.
Each process pi maintains a vector vti[1…n], where vti[i] is the local logical clock of pi
and describes the logical time progress at process pi. vti[j] represents process pi's latest
knowledge of process pj's local time. If vti[j] = x, then process pi knows that the local time at
process pj has progressed up to x. The entire vector vti constitutes pi's view of the global logical
time and is used to timestamp events.
Rule 1:
Before executing an event, process pi updates its local logical time as follows:
vti[i] := vti[i] + d   (d > 0)
Rule 2:
Each message m is piggybacked with the vector clock vt of the sender process at sending
time. On the receipt of such a message (m, vt), process pi executes the following sequence
of actions:
1. update its global logical time: for 1 ≤ k ≤ n, vti[k] := max(vti[k], vt[k]);
2. execute R1;
3. deliver the message m.
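A corresponding Python sketch of the two vector clock rules for process pi (illustrative names, d = 1):

    class VectorClock:
        def __init__(self, n, i, d=1):
            self.vt = [0] * n    # vt[i] is pi's local logical clock
            self.i = i
            self.d = d

        def r1(self):
            # Rule 1: update the local component before executing an event.
            self.vt[self.i] += self.d

        def on_send(self):
            # Piggyback a copy of the vector clock on message m.
            self.r1()
            return list(self.vt)

        def on_receive(self, vt_msg):
            # Rule 2: component-wise maximum, then R1, then deliver m.
            self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
            self.r1()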
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamp of
two events, we can determine if the events are causally related.
3. Event counting
If an event e has timestamp vh, then vh[j] denotes the number of events executed by
process pj that causally precede e.
Clocking Inaccuracies
Physical clocks are synchronized to an accurate real-time standard like UTC (Coordinated
Universal Time). Due to clock inaccuracy, a timer (clock) is said to be working within its
specification if its drift rate stays within a bound ρ, i.e., 1 − ρ ≤ dC/dt ≤ 1 + ρ.
Fig 1.30 a) Offset and delay estimation Fig 1.30 b) Offset and delay estimation
between processes from same server between processes from different servers
Let T1, T2, T3, T4 be the values of the four most recent timestamps. Assume clocks A and B
are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay
difference from A to B and from B to A, called the differential delay, is small, the clock offset θ
and round-trip delay δ of B relative to A at time T4 are approximately given by:
θ = (a + b)/2, δ = a − b
Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is
determined upon arrival.
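A small Python sketch of this computation (the timestamp values below are made up for illustration):

    def ntp_offset_delay(t1, t2, t3, t4):
        # Estimate the clock offset and round-trip delay of B relative to A.
        a = t1 - t3
        b = t2 - t4
        offset = (a + b) / 2   # theta
        delay = a - b          # delta
        return offset, delay

    print(ntp_offset_delay(10.002, 10.005, 10.000, 10.006))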
7. What is reliability?
Reliability comprises:
Availability: the resource/service provided should be accessible at all times.
Integrity: the value/state of the resource should be correct and consistent.
Fault-tolerance: the ability to recover from system failures.
Each processor may also run a different operating system and have its own bus
control logic.
Loosely coupled systems are less costly than tightly coupled systems, but are
physically bigger and have lower performance than tightly coupled systems.
The individual nodes in a loosely coupled system can be easily replaced and are
usually inexpensive.
They require more power, but are more robust and can better resist failures.
UNIT II
MESSAGE ORDERING AND GROUP COMMUNICATION
FIFO EXECUTIONS
In FIFO, each channel acts as a FIFO message queue. So message ordering is preserved
by a channel.
FIFO logical channels can be realistically assumed when designing distributed
algorithms, since most transport layer protocols provide connection-oriented service.
A FIFO logical channel can be created over a non-FIFO channel by using a separate
numbering scheme to sequence the messages on each logical channel.
The sender assigns and appends a <sequence_num, connection_id> tuple to each
message.
The receiver uses a buffer to order the incoming messages as per the sender’s sequence
numbers, and accepts only the “next” message in sequence.
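A receiver-side sketch of this scheme in Python (deliver is a hypothetical callback that hands a message to the application):

    class FifoReceiver:
        # Reorders messages per logical channel using sender sequence numbers.
        def __init__(self):
            self.next_seq = {}  # connection_id -> next expected sequence number
            self.pending = {}   # (connection_id, seq) -> buffered message

        def on_message(self, connection_id, seq, msg, deliver):
            self.pending[(connection_id, seq)] = msg
            expected = self.next_seq.get(connection_id, 0)
            # Accept only the "next" message in sequence; buffer the rest.
            while (connection_id, expected) in self.pending:
                deliver(self.pending.pop((connection_id, expected)))
                expected += 1
            self.next_seq[connection_id] = expected

    r = FifoReceiver()
    r.on_message("c1", 1, "second", print)   # out of order: buffered
    r.on_message("c1", 0, "first", print)    # delivers "first", then "second"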
SYNCHRONOUS ORDER
When all the communication between pairs of processes uses synchronous send and receives
primitives, the resulting order is the synchronous order.
Synchronous communication always involves a handshake between the receiver and
the sender; the handshake events may appear to be occurring instantaneously and
atomically.
The instantaneous communication property of synchronous executions requires a modified
definition of the causality relation, because for each (s, r) ∈ T, the send event is not
causally ordered before the receive event.
The two events are viewed as being atomic and simultaneous, and neither event precedes
the other.
• An execution can be modeled to give a total order that extends the partial order (E, ≺).
• In an A-execution, the messages can be made to appear instantaneous if there exists a
linear extension of the execution such that each send event is immediately followed by
its corresponding receive event in this linear extension.
• In the non-separated linear extension, if the adjacent send event and its corresponding
receive event are viewed atomically, then that pair of events shares a common past and a
common future with each other.
Crown
Cyclic dependencies may exist in a crown. The crown criterion states that an A-
computation is RSC, i.e., it can be realized on a system with synchronous communication, if and
only if it contains no crown.
The above hierarchy implies that some executions belonging to a class X do not belong to
any of the classes included in X. The degree of concurrency is highest in A and lowest in SYNC.
A program using synchronous communication is easiest to develop and verify.
A program using non-FIFO communication, resulting in an A-execution, is hardest to
design and verify.
Fig: Messages used to implement synchronous order. Pi has higher priority than Pj . (a) Pi
issues SEND(M). (b) Pj issues SEND(M).
Key rules to prevent cycles
To send to a lower priority process, messages M and ack(M) are involved in that order.
The sender issues send(M) and blocks until ack(M) arrives. Thus, when sending to a
lower priority process, the sender blocks waiting for the partner process to synchronize
and send an acknowledgement.
To send to a higher priority process, messages request(M), permission(M), and M are
involved, in that order. The sender issues send(request(M)), does not block, and awaits
permission. When permission(M) arrives, the sender issues send(M).
Steps in Bagrodia algorithm
1. Receive commands are forever enabled from all processes.
2. A send command, once enabled, remains enabled until it completes, i.e., it is not
possible that a send command gets disabled before the send is executed.
3. To prevent deadlock, process identifiers are used to introduce asymmetry to break
potential crowns that arise.
4. Each process attempts to schedule only one send event at any time.
Fig : Examples showing how to schedule messages sent with synchronous primitives.
In either case, a higher priority process blocks on a lower priority process. So cyclic
waits are avoided.
Closed group algorithm: the sender is also one of the receivers in the multicast. Closed group
algorithms are specific and easy to implement, but they do not support large systems in which
client processes have short lifetimes.
Open group algorithm: the sender is not a part of the communication group. Open group
algorithms are more general, but more difficult to design and more expensive; they can support
large systems.
The data structures maintained are sorted row–major and then column–major:
1. Explicit tracking
Tracking of (source, timestamp, destination) information for messages that are (i) not known
to be delivered and (ii) not guaranteed to be delivered in CO is performed explicitly.
2. Implicit tracking
Tracking of messages that are either (i) already delivered or (ii) guaranteed to be delivered in
CO is performed implicitly.
When message M4,2 is received by processes P2 and P3, they insert the (new)
piggybacked information in their local logs, as information M5,1.Dests = {P6}. They both
continue to store this in Log2 and Log3 and propagate this information on multicasts until
they learn, at events (2, 4) and (3, 2) on receipt of messages M3,3 and M4,3, respectively, that
any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent
to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3.
Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further,
P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit
information that M5,1 has been delivered to P6, by its very absence in the explicit information.
Processing at P1
When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new)
information is inserted in Log1. When M6,2 arrives with piggybacked information M5,1.Dests
= {P4}, P1 learns the implicit information that M5,1 has been delivered to P6 by the very absence
of explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks
information P6 ∈ M5,1.Dests for deletion from Log1. Simultaneously, M5,1.Dests = {P6} in Log1
implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in
causal order to P4. Thus, P1 also learns that the explicit piggybacked information M5,1.Dests =
{P4} is outdated. M5,1.Dests in Log1 is set to ∅.
Drawbacks:
A centralized algorithm has a single point of failure and congestion, and is not an elegant
solution.
Three phase distributed algorithm
Three phases can be seen in both sender and receiver side.
I. SENDER SIDE
Phase 1
In the first phase, a process multicasts the message M with a locally unique tag and the
local timestamp to the group members.
Phase 2
The sender process awaits replies from all the group members, who respond with a
tentative proposal for a revised timestamp for that message M.
The await call is non-blocking.
Phase 3
The process multicasts the final timestamp to the group.
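A sender-side sketch of the three phases in Python (send, await_replies, the message tags, and the clock object are hypothetical helpers, not taken from the original algorithm text):

    def three_phase_send(msg, group, tag, clock, send, await_replies):
        # Phase 1: multicast msg with a locally unique tag and local timestamp.
        for member in group:
            send(member, ("PROPOSE", msg, tag, clock.tick()))
        # Phase 2: collect a tentative revised timestamp from every member.
        proposals = await_replies(tag, expected=len(group))
        # Phase 3: multicast the final (agreed) timestamp: the maximum proposal.
        final_ts = max(proposals)
        for member in group:
            send(member, ("FINAL", tag, final_ts))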
If shared memory were available, an up-to-date state of the entire system would be
available to the processes sharing the memory.
The absence of shared memory necessitates ways of getting a coherent and complete
view of the system based on the local states of individual processes.
A meaningful global snapshot can be obtained if the components of the distributed
system record their local states at the same time.
2.7 SYSTEM MODEL & ITS DEFINITIONS
The system consists of a collection of n processes, p1, p2,…, pn that are connected by channels.
Let Cij denote the channel from process pi to process pj.
Processes and channels have states associated with them.
The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc., and may be highly dependent on the local context of
the distributed application.
The state of channel Cij, denoted by SCij, is given by the set of messages in
transit in the channel.
The events that may happen are internal events, send (send(mij)) events, and receive
(rec(mij)) events.
The occurrences of events cause changes in the process state.
A channel is a distributed entity and its state depends on the local states of the
processes on which it is incident.
Issue 1: How to distinguish between the messages to be recorded in the snapshot and those not to be recorded?
Issue 2:
Any message that is sent by a process before recording its snapshot, must be recorded in
the global snapshot (from C1).
Any message that is sent by a process after recording its snapshot, must not be recorded in the
global snapshot (from C2).
Marker receiving rule: record the state of channel C as the set of messages received along C
after pj's state was recorded and before pj received the marker along C.
Initiating a snapshot:
Process Pi initiates the snapshot.
Pi records its own state and prepares a special marker message.
Pi sends the marker message to all other processes.
Pi starts recording all incoming messages on channels Cij for j ≠ i.
Propagating a snapshot:
Each process Pj handles a marker message arriving on channel Ckj as follows.
If the marker message is seen for the first time:
Pj records its own state and marks Ckj as empty.
Pj sends the marker message to all other processes.
Pj starts recording all incoming messages on channels Clj for l ≠ j, k.
Otherwise, Pj adds the messages received on the still-recorded inbound channels to those channels' states.
Terminating a snapshot:
All processes have received a marker.
All processes have received a marker on all the N − 1 incoming channels.
A central server can gather the partial states to build a global snapshot.
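A compact Python sketch of these rules (send_markers is a hypothetical function that places a marker on every outgoing channel):

    class SnapshotProcess:
        def __init__(self, incoming_channels, send_markers):
            self.state = None
            self.taken = False
            self.recording = {c: False for c in incoming_channels}
            self.channel_state = {c: [] for c in incoming_channels}
            self.send_markers = send_markers

        def initiate(self, local_state):
            # Marker sending rule: record own state, then send markers and
            # start recording on all incoming channels.
            self.taken = True
            self.state = local_state
            self.send_markers()
            for c in self.recording:
                self.recording[c] = True

        def on_marker(self, channel, local_state):
            # Marker receiving rule.
            if not self.taken:
                self.initiate(local_state)
                self.recording[channel] = False  # this channel recorded empty
            else:
                self.recording[channel] = False  # stop recording this channel

        def on_message(self, channel, msg):
            # Messages arriving while a channel is being recorded belong
            # to that channel's state.
            if self.recording[channel]:
                self.channel_state[channel].append(msg)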
Correctness of the algorithm
Since a process records its snapshot when it receives the first marker on any incoming
channel, no messages that follow markers on the channels incoming to it are recorded in
the process’s snapshot.
A process stops recording the state of an incoming channel when a marker is received on
that channel.
Due to FIFO property of channels, it follows that no message sent after the marker on
that channel is recorded in the channel state. Thus, condition C2 is satisfied.
When a process pj receives message mij that precedes the marker on channel Cij, it acts
as follows: if process pj has not taken its snapshot yet, then it includes mij in its recorded
snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition C1
is satisfied.
Complexity
The recording part of a single instance of the algorithm requires O(e) messages and O(d) time,
where e is the number of edges in the network and d is the diameter of the network.
UNIT III
DISTRIBUTED MUTEX & DEADLOCK
3.1 INTRODUCTION
In distributed systems, a process may request resources in any order, which may not be
known a priori, and a process can request a resource while holding others. If the allocation
sequence of process resources is not controlled in such environments, deadlocks can occur.
Deadlocks can be dealt with using one of the following three strategies:
Deadlock prevention
Deadlock avoidance
Deadlock detection
Deadlock prevention:
Deadlock prevention is achieved by either having a process acquire all the needed
resources simultaneously before it begins execution or by pre-empting a process that holds the
needed resource.
Deadlock avoidance:
Deadlock avoidance means a resource is granted to a process only if the resulting global
system state is safe.
Deadlock detection:
Deadlock detection requires an examination of the status of the process–resources
interaction for the presence of a deadlock condition. To resolve the deadlock, abort a deadlocked
process.
Mutual exclusion is introduced to prevent race conditions.
Mutual exclusion in a distributed system states that only one process is allowed to
execute the critical section (CS) at any given time.
Critical section:
A critical section is a segment of code that cannot be executed by more than one process at a time.
Three approaches for implementing distributed mutual exclusion:
Token-based approach
Non-token-based approach
Quorum-based approach
1. Token-based approach:
A unique token is shared among all the sites.
If a site possesses the unique token, it is allowed to enter its critical section.
This approach uses sequence numbers to order requests for the critical section.
2. Non-token-based approach:
A site communicates with other sites to determine which site should execute the
critical section next. This requires the exchange of two or more successive rounds of
messages among sites.
This approach uses timestamps instead of sequence numbers to order requests for
the critical section.
Whenever a site makes a request for the critical section, it gets a timestamp.
Timestamps are also used to resolve any conflict between critical section requests.
E.g.: Lamport's algorithm, Ricart–Agrawala algorithm.
3. Quorum-based approach:
Instead of requesting permission to execute the critical section from all other
sites, each site requests permission only from a subset of sites, called a quorum.
Any two quorums contain at least one common site.
This common site is responsible for ensuring mutual exclusion.
E.g.: Maekawa's algorithm.
3.2 PRELIMINARIES
3.2.1 SYSTEM MODEL
The system consists of N sites, S1, S2, S3, …, SN.
Assume that a single process is running on each site.
The process at site Si is denoted by pi. All these processes communicate
asynchronously over an underlying communication network.
A process wishing to enter the CS requests all other or a subset of processes
by sending REQUEST messages, and waits for appropriate replies before
entering the CS.
While waiting the process is not allowed to make further requests to enter the
CS.
A site can be in one of the following three states:
requesting the CS
executing the CS
neither requesting nor executing the CS.
In the requesting state, the site is blocked and cannot make further requests
for the CS.
3.2.2 PERFORMANCE METRICS
Message complexity: the number of messages required per CS execution by a site.
Synchronization delay (SD): after a site leaves the CS, the time required before the
next site enters the CS.
Response time: the time interval a request waits for its CS execution to be over
after its request messages have been sent out.
System throughput: the rate at which the system executes requests for the CS,
given by
system throughput = 1/(SD + E)
where SD is the synchronization delay and E is the average critical section execution time.
In fig.9.3. Sites S1 and S2 are making requests for the CS and send out
REQUEST messages to other sites. The timestamps of the requests are (1,1) and (1,2),
respectively.
In fig.9.4. Both the sites S1 and S2 have received REPLY messages from all other
sites. S1 has its request at the top of its request_queue but site S2 does not have its
request at the top of its request_queue. Consequently, site S1 enters the CS.
In fig.9.5, S1 exits the CS and sends RELEASE messages to all other sites.
In fig.9.6, site S2 has received REPLY messages from all other sites and has also received a
RELEASE message from site S1. Site S2 updates its request_queue and its request is now at the
top of its request_queue. Consequently, it enters the CS next.
In Fig. 9.7, Sites S1 and S2 are each making requests for the CS and sending out
REQUEST messages to other sites. The timestamps of the requests are (2,1) and (1,2),
respectively.
In Fig. 9.8, S2 has received REPLY messages from all other sites and, consequently,
enters the CS.
In Fig. 9.9, S2 exits the CS and sends a REPLY message to site S1.
In Fig. 9.10, site S1 has received REPLY from all other sites and enters the CS next.
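A sketch of the Ricart–Agrawala rules illustrated above, in Python (send is a hypothetical transport function; a real implementation also needs a driver that actually enters and exits the CS):

    class RicartAgrawala:
        def __init__(self, pid, peers, send):
            self.pid, self.peers, self.send = pid, peers, send
            self.clock = 0
            self.requesting = False
            self.req_ts = None
            self.replies = set()
            self.deferred = []

        def request_cs(self):
            self.clock += 1
            self.req_ts = (self.clock, self.pid)   # totally ordered timestamp
            self.requesting = True
            self.replies.clear()
            for p in self.peers:
                self.send(p, ("REQUEST", self.req_ts))

        def on_request(self, sender, ts):
            self.clock = max(self.clock, ts[0]) + 1
            if self.requesting and self.req_ts < ts:
                self.deferred.append(sender)       # our request has priority
            else:
                self.send(sender, ("REPLY", self.pid))

        def on_reply(self, sender):
            self.replies.add(sender)

        def can_enter_cs(self):
            # Enter the CS only after REPLYs from all other sites.
            return self.requesting and self.replies == set(self.peers)

        def release_cs(self):
            self.requesting = False
            for p in self.deferred:                # answer deferred requests now
                self.send(p, ("REPLY", self.pid))
            self.deferred.clear()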
A site sends a REPLY message to the requesting site to give its permission to enter the
critical section.
A site sends a RELEASE message to all other sites in its request set or quorum upon
exiting the critical section.
3.7 INTRODUCTION:
Deadlocks are a fundamental problem in distributed systems. A deadlock can be defined
as a condition where a set of processes request resources that are held by other processes
in the set.
Deadlocks can be dealt with using any one of the following three strategies:
Deadlock prevention
Deadlock avoidance
Deadlock detection.
Deadlock prevention:
Deadlock prevention is commonly achieved by either having a process acquire all the
needed resources simultaneously before it begins execution or by pre-empting a
process that holds the needed resource.
Deadlock avoidance:
In deadlock avoidance, a resource is granted to a process if the resulting global system
is safe.
Deadlock detection:
Deadlock detection requires an examination of the status of the process–resources
interaction for the presence of a deadlock condition.
3.8 SYSTEM MODEL
The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.
A process can be in two states, running or blocked. In the running state (also called
active state), a process has all the needed resources and is either executing or is ready
for execution. In the blocked state, a process is waiting to acquire some resource.
Deadlock prevention and avoidance are impractical in distributed systems, because no
site has complete, up-to-date knowledge of the global system state. Therefore, only deadlock
detection is usually implemented.
Correctness criteria for deadlock detection:
Progress: the method should be able to detect all the deadlocks in the system.
Safety: the method should not detect false or phantom deadlocks.
Three approaches:
Centralized approach:
Here a single designated site is responsible for detecting deadlocks.
Figure 10.1 shows a WFG, where process P11 of site 1 has an edge to process P21 of site
1 and an edge to process P32 of site 2. Process P32 of site 2 is waiting for a resource that is
currently held by process P33 of site 3.
At the same time process P21 at site 1 is waiting on process P24 at site 4 to release a
resource, and so on. If P33 starts waiting on process P24, then processes in the WFG are
involved in a deadlock depending upon the request model.
3.9 PRELIMINARIES
3.9.1 DEADLOCK HANDLING STRATEGIES
Three strategies:
Deadlock prevention
Deadlock avoidance
Deadlock detection.
Deadlock prevention:
Deadlock prevention is commonly achieved by either having a process acquire all the
needed resources simultaneously before it begins execution or by pre-empting a process that
holds the needed resource.
Deadlock avoidance:
In deadlock avoidance, a resource is granted to a process if the resulting global system is
safe.
Deadlock detection:
Deadlock detection requires an examination of the status of the process–resources
interaction for the presence of a deadlock condition.
3.9.2 ISSUES IN DEADLOCK DETECTION
Deadlock handling faces two major issues
1. Detection of existing deadlocks
2. Resolution of detected deadlocks
I. Detection of existing deadlocks
Detection of deadlocks involves addressing two issues namely maintenance of the
WFG and searching of the WFG for the presence of cycles or knots.
In distributed systems, a cycle or knot may involve several sites; the search for
cycles greatly depends upon how the WFG of the system is represented across the
system.
Depending upon the way WFG information is maintained and the search for cycles
is carried out, there are centralized, distributed, and hierarchical algorithms for
deadlock detection in distributed systems.
A deadlock detection algorithm must satisfy the following two conditions:
a) Progress-No undetected deadlocks:
That is, even if a process is not a part of a cycle, it can still be deadlocked.
In Fig. 10.1. Process P11 has two outstanding resource requests. In case of the
AND model, P11 shall become active from idle state only after both the resources
are granted. There is a cycle P11→P21→P24→P54→P11, which corresponds to
a deadlock situation.
Consider process P44 in Figure 10.1. It is not a part of any cycle but is still
deadlocked as it is dependent on P24.
3.10.3 OR MODEL
In OR model, a process can make a request for numerous resources simultaneously and the
request is satisfied if any one of the requested resources is granted. The requested resources may
exist at different locations.
If all requests in the WFG are OR requests, then the nodes are called OR nodes.
Presence of a cycle in the WFG of an OR model does not imply a deadlock in the OR model.
In the OR model, the presence of a knot indicates a deadlock.
2. Edge-chasing
3. Diffusion computation
4. Global state detection
message until it has received a reply message for every query it sent.
o For all subsequent queries for this deadlock detection initiation, it immediately
sends back a reply message.
o The initiator of a deadlock detection detects a deadlock when it receives reply for
every query it had sent out.
Examples:
Chandy–Misra–Haas algorithm for the OR model
Chandy– Herman algorithm
4. Global state detection-based algorithms
Global state detection based deadlock detection algorithms exploit the following
facts:
A consistent snapshot of a distributed system can be obtained without
freezing the underlying computation.
If a stable property holds in the system before the snapshot collection is
initiated, this property will still hold in the snapshot.
Therefore, distributed deadlocks can be detected by taking a snapshot
of the system and examining it for the condition of a deadlock.
The transitions defined by the algorithm are block, activate, transmit, and detect.
Block creates an edge in the WFG.
Two messages are needed, one resource request and one message back to the
blocked process to inform it of the public label of the process it is waiting for.
Activate denotes that a process has acquired the resource from the process it was
waiting for.
Transmit propagates larger labels in the opposite direction of the edges by sending
a probe message.
Detect means that the probe with the private label of some process has returned to
it, indicating a deadlock.
This algorithm can easily be extended to include priorities, so that whenever a
deadlock occurs, the lowest priority process gets aborted.
This priority based algorithm has two phases.
1. The first phase is almost identical to the algorithm.
2. In the second phase, the smallest priority is propagated around the cycle. The
propagation stops when one process recognizes the propagated priority as its
own.
The invariant is: for every process u/v, v ≤ u.
Proof
Initially u = v for all processes. The only steps that change u or v are:
1. Block: u and v are set such that u = v.
2. Transmit: u is increased.
Hence, the invariant follows.
From the previous invariant, we have the following lemmas.
Lemma
For any process u/v, if u > v, then u was set by a Transmit step.
Theorem
If a deadlock is detected, a cycle of blocked nodes exists.
Message Complexity:
If a deadlock persists long enough to be detected, the worst-case complexity of the
algorithm is s(s - 1)/2 Transmit steps, where s is the number of processes in the cycle.
Performance analysis:
In the algorithm, one probe message is sent on every edge of the WFG which
connects processes on two sites.
The algorithm exchanges at most m(n − 1)/2 messages to detect a deadlock that
involves m processes and spans over n sites.
The size of messages is fixed and is very small (only three integer words).
The delay in detecting a deadlock is O(n).
Advantages:
It is easy to implement.
Each probe message is of fixed length.
There is very little computation.
There is very little overhead.
There is no need to construct a graph, nor to pass graph information to other sites.
This algorithm does not detect false (phantom) deadlocks.
There is no need for special data structures.
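A minimal sketch of the probe handling in this edge-chasing scheme (the AND-model Chandy–Misra–Haas algorithm analyzed above); the process structure and send function are hypothetical:

    class Process:
        def __init__(self, pid, send):
            self.pid = pid
            self.send = send
            self.blocked = False
            self.waiting_for = set()      # processes this process waits on
            self.seen_initiators = set()  # initiators whose probes were forwarded

        def initiate_detection(self):
            # A blocked process sends probe(i, j, k) along each wait edge.
            for k in self.waiting_for:
                self.send(k, (self.pid, self.pid, k))

        def on_probe(self, probe):
            initiator, sender, receiver = probe
            if receiver != self.pid or not self.blocked:
                return
            if initiator == self.pid:
                print(f"deadlock detected by {self.pid}")   # probe came back
            elif initiator not in self.seen_initiators:
                self.seen_initiators.add(initiator)         # forward only once
                for k in self.waiting_for:
                    self.send(k, (initiator, self.pid, k))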
UNIT IV
4.1 INTRODUCTION
Rollback recovery and checkpointing are used to restore a distributed system to a consistent state after a failure.
Delayed messages:
Messages whose receive is not recorded because the receiving process was either down
or the message arrived after the rollback of the receiving process, are called delayed
messages.
Messages m2 and m5 are delayed messages.
Orphan messages:
Messages with receive recorded but message send not recorded are called orphan
messages.
A rollback might have undone the send of such messages, leaving the receive event intact
at the receiving process.
Orphan messages do not arise if processes roll back to a consistent global state.
Duplicate messages:
Duplicate messages arise due to message logging and replaying during process recovery.
In fig 4.3, message m4 was sent and received before the rollback.
Due to the rollback of process P4 to C4,8 and process P3 to C3,8, both send and receipt of
message m4 are undone.
When process P3 restarts from C3,8, it will resend message m4. Therefore, P4 should not
replay message m4 from its log. If P4 replays message m4, then message m4 is called a
duplicate message.
Assume that the process Pi fails and all the contents of the volatile memory of Pi are lost
and, after Pi has recovered from the failure, the system needs to be restored to a consistent global
state from where the processes can resume their execution. Process P i’s state is restored to a
valid state by rolling it back to its most recent checkpoint Ci,1.
To restore the system to a consistent state, the process Pj rolls back to checkpoint
Cj,1 because the rollback of process Pi to checkpoint Ci,1 created an orphan
message H.
Pj does not roll back to checkpoint Cj,2 but to checkpoint Cj,1, because rolling back
to checkpoint Cj,2 does not eliminate the orphan message H.
Even this resulting state is not a consistent global state, as an orphan message I is
created due to the rollback of process Pj to checkpoint Cj,1.
To eliminate this orphan message, process Pk rolls back to checkpoint Ck,1.
The restored global state {Ci,1, Cj,1, Ck,1} is a consistent state.
The system state has been restored to a consistent state but there are several messages left
in an erroneous state which must be handled correctly. Messages A, B, D, G, H, I, and J had
been received at the points indicated in the figure and messages C, E, and F were in transit when
the failure occurred.
Lost messages like D can be handled by having processes keep a message log of
all the sent messages.
So when a process restores to a checkpoint, it replays the messages from its log to
handle the lost message problem.
However, message logging and message replaying during recovery can result in
duplicate messages.
Process Pk, which has already received message J, will receive it again, thereby
causing inconsistency in the system state.
Overlapping failures further complicate the recovery process. A process Pj that
begins rollback/recovery in response to the failure of a process Pi can itself fail
and develop amnesia with respect to process Pi's failure; that is, Pj can act in a
fashion that exhibits ignorance of process Pi's failure.
In checkpoint-based recovery, the state of each process and each communication
channel is checkpointed frequently, so that when a failure occurs, the system can be
restored to a globally consistent set of checkpoints.
This scheme does not depend on the PWD assumption, and hence it does not need
to detect, log, or replay non-deterministic events.
These protocols are less restrictive and simpler to implement than log-based
rollback recovery.
The drawback here is, it does not guarantee that pre-failure execution can be
deterministically regenerated after a rollback.
The three types of checkpoint based rollback-recovery techniques are:
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing
1. Uncoordinated Checkpointing
Here, each process has autonomy in deciding when to take checkpoints. This eliminates
the synchronization overhead as there is no need for coordination between processes and it
allows processes to take checkpoints when it is most convenient or efficient.
Advantage of this method:
Lower runtime overhead during normal execution.
Limitations
Domino effect during a recovery.
Recovery from a failure is slow, because processes need to iterate to find a consistent
set of checkpoints.
Each process maintains multiple checkpoints, and a garbage collection algorithm must
be invoked periodically.
Not suitable for applications with frequent output commits.
Pessimistic logging protocols assume that a failure can occur after any non-deterministic
event in the computation.
Pessimistic protocols log to stable storage the determinant of each non-deterministic
event before the event affects the computation. Pessimistic protocols implement synchronous
logging, which is stronger than the always-no-orphans condition. Processes also take periodic
checkpoints to minimize the amount of work that has to be repeated during recovery.
When a process fails, the process is restarted from the most recent checkpoint and
the logged determinants are used to recreate the prefailure execution.
In a pessimistic logging system, the observable state of each process is always
recoverable.
But there is a performance penalty incurred by synchronous logging, which may
lead to high performance overhead.
Implementations of pessimistic logging must use special techniques to reduce the
effects of synchronous logging on the performance.
This overhead can be lowered using special hardware.
Magnetic disk devices and a special bus to guarantee atomic logging of all
messages exchanged in the system can mitigate this overhead.
In Figure 13.8, during failure-free operation the logs of processes P0, P1, and P2
contain the determinants needed to replay messages {m0, m4, m7}, {m1, m3, m6}, and
{m2, m5}, respectively.
Upon a failure, the dependency information is used to calculate and recover the latest
global state of the pre-failure execution in which no process is an orphan.
invoked concurrently.
This is implemented in two phases:
The initiating process sends a message to all other processes and asks for their
preferences about restarting from the previous checkpoints. All need to agree
either to do so or not.
The initiating process sends the final decision to all processes; all the
processes act accordingly after receiving the final decision.
Phase I:
An initiating process Pi sends a message to all other processes to check if they all are willing
to restart from their previous checkpoints.
A process may reply no to a restart request due to any reason.
If Pi learns that all processes are willing to restart from their previous checkpoints, Pi decides
that all processes should roll back to their previous checkpoints.
Phase II:
Pi propagates its decision to all the processes.
On receiving Pi’s decision, a process acts accordingly.
During the execution of the recovery algorithm, a process cannot send
messages related to the underlying computation while it is waiting for Pi’s
decision.
In Figure 13.11, the set {x1, y1, z1} is a consistent set of checkpoints. Suppose process
X decides to initiate the checkpointing algorithm after receiving message m.
It takes a tentative checkpoint x2 and sends “take tentative checkpoint” messages to
processes Y and Z.
Basic idea:
The main idea of the algorithm is to find a set of consistent checkpoints, from the
set of checkpoints.
This is done based on the number of messages sent and received.
Recovery may involve multiple iterations of roll backs by processors.
Whenever a processor rolls back, it is necessary for all other processors to find
out if any message sent by the rolled back processor has become an orphan
message.
The orphan messages are identified by comparing the number of messages sent to
and received from neighboring processors.
When a processor restarts after a failure, it broadcasts a ROLLBACK message
announcing that it has failed.
The recovery algorithm at a processor is initiated when it restarts after a failure or
when it learns of a failure at another processor.
Because of the broadcast of ROLLBACK messages, the recovery algorithm is
initiated at all processors.
The rollback starts at the failed processor and slowly diffuses into the entire
system through ROLLBACK messages.
During the kth iteration (k > 1), a processor pi does the following:
(i) based on the state CkPti to which it was rolled back in the (k − 1)th iteration, it
computes SENTi→j(CkPti) for each neighbor pj and sends this value in a
ROLLBACK message to that neighbor;
(ii) pi waits for and processes ROLLBACK messages that it receives from its
neighbors in the kth iteration, and determines a new recovery point CkPti for pi
based on the information in these messages.
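A sketch of the per-iteration orphan test (the data structures here are hypothetical): pi compares the number of messages it recorded as received from each neighbor pj against the SENTj→i value reported in pj's ROLLBACK message:

    def must_roll_back_further(recd_at_ckpt, rollback_msgs):
        # recd_at_ckpt[j]: messages from pj received up to the current
        # checkpoint CkPti; rollback_msgs[j]: SENT_j->i reported by pj.
        for j, sent_by_j in rollback_msgs.items():
            if recd_at_ckpt.get(j, 0) > sent_by_j:
                # pi recorded receives that pj (after rolling back) never
                # sent: those are orphan messages, so pi must roll back
                # to an earlier checkpoint.
                return True
        return False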
At the end of each iteration, at least one processor will roll back to its final
recovery point, unless the current recovery points are already consistent.
The second and third iterations will progress in the same manner. Note that
the set of recovery points chosen at the end of the first iteration, {ex2, ey2,
ez1}, is consistent, and no further rollback occurs.
4. Sender identification:
A process that receives a message always knows the identity of the sender
process.
When multiple messages are expected from the same sender in a single round, a
scheduling algorithm is employed that sends these messages in sub-rounds, so
that each message sent within the round can be uniquely identified.
5. Channel reliability
The channels are reliable, and only the processes may fail.
Authenticated vs. non-authenticated messages:
With unauthenticated messages, when a faulty process relays a message to other
processes
it can forge the message and claim that it was received from another process,
it can also tamper with the contents of a received message before relaying it.
When a process receives a message, it has no way to verify its authenticity. This is
known as an unauthenticated message, an oral message, or an unsigned message.
Using authentication via techniques such as digital signatures, it is easier to solve
the agreement problem because, if some process forges a message or tampers
with the contents of a received message before relaying it, the recipient can detect
the forgery or tampering.
Thus, faulty processes can inflict less damage.
6. Agreement variable:
The agreement variable may be boolean or multivalued, and need not be an
integer.
This simplifying assumption does not affect the results for other data types, but
helps in the abstraction while presenting the algorithms.
4.8.2 BYZANTINE GENERAL PROBLEM
The Byzantine Generals’ Problem (BGP) is a classic problem faced by any distributed
computer system network. Imagine that the grand Eastern Roman empire aka Byzantine empire
has decided to capture a city.
There is fierce resistance from within the city.
The Byzantine army has completely encircled the city.
The army has many divisions and each division has a general.
The generals communicate with each other, as well as with the lieutenants within
their divisions, only through messengers.
All the generals or commanders have to agree upon one of two plans of action:
an exact time to attack all at once or, if faced with fierce resistance, a time to
retreat all at once. The army cannot hold on forever.
If the attack or retreat happens without full strength, it means only one thing:
unacceptable, brutal defeat.
If all generals and messengers were trustworthy, the solution would be very
simple.
In Fig 4.10, Lieutenant 2 is a traitor who purposely changes the message that
is to be passed to Lieutenant 1.
Now Lieutenant 1 has received two messages and does not know which one to
follow. Assume Lieutenant 1 follows the Commander because of the strict
hierarchy in the army.
Still, one third of the army is weaker in force, since Lieutenant 2 is a traitor, and
this creates a lot of confusion.
However, what if the Commander is a traitor (as explained in Fig 4.11)? Then
two thirds of the total army follows the incorrect order, and failure is certain.
After adding 1 more Lieutenant and 1 more type of message (Let’s say the 3rd message
is ‘Not sure’), the complexity of finding a consensus between all the Lieutenants and the
Commander is increased.
Now imagine the exponential increase when there are hundreds of Lieutenants.
This is BGP. It is applicable to every distributed network. All participants or nodes
(“lieutenants”) are exactly of equal hierarchy. If agreement is reachable, then protocols to
reach it need to be devised.
All participating nodes have to agree upon every message that is transmitted
between the nodes.
If a group of nodes is corrupt or the message that they transmit is corrupt then still
the network as a whole should not be affected by it and should resist this ‘Attack’.
The network in its entirety has to agree upon every message transmitted in the
network. This agreement is called consensus.
program. If a process executes an assignment more than once, this could
lead to a corruption fault.
Misevaluation: the process misevaluates an expression included in its
program. This fault is different from a corruption fault: misevaluating an
expression does not imply the update of the variables involved in the
expression, and in some cases the result of an evaluation is not assigned to a
variable.
4.8.3 CONSENSUS PROBLEM
Each process has an initial value, and all the correct processes must agree
on a single value. This is the consensus problem.
Consensus is a fundamental paradigm for fault-tolerant asynchronous distributed
systems. Each process proposes a value to the others. All correct processes have to agree
(Termination) on the same value (Agreement) which must be one of the initially proposed values
(Validity).
The requirements of the consensus problem are:
Agreement: All non-faulty processes must agree on the same (single) value.
Validity: If all the non-faulty processes have the same initial value, then the
agreed upon value by all the non-faulty processes must be that same value.
Termination: Each non-faulty process must eventually decide on a value.
Interactive Consistency Problem:
Each process has an initial value, and all the correct processes must agree
upon a set of values, with one value for each process. This is the interactive
consistency problem.
The formal specifications are:
Agreement: All non-faulty processes must agree on the same array of values
A[v1, …,vn].
Validity: if process i is non-faulty and its initial value is vi, then all non-faulty
processes agree on vi as the ith element of the array A. If process j is faulty, then
the non-faulty processes can agree on any value for A[j].
Termination: Each non-faulty process must eventually decide on the array A.
The difference between the agreement problem and the consensus problem is that, in the
agreement problem, a single process has the initial value, whereas in the consensus problem, all
processes have an initial value.
4.9 OVERVIEW OF RESULTS
Consensus is not solvable in asynchronous systems even if one process can fail by
crashing. The results are tabulated below, where f indicates the number of processes that can
fail and n is the total number of processes.
the consensus problem. The following are weaker versions of the consensus problem in
asynchronous systems:
Terminating reliable broadcast: a correct process will always get a message
even if the sender crashes while sending. If the sender crashes while sending the
message, the message may even be null, but it still has to be delivered to the
correct processes.
K-set consensus: solvable as long as the number of crashes f is less than the
parameter k. The non-faulty processes may agree on different values, as long as
the size of the set of values agreed upon is bounded by k.
Approximate agreement: the consensus value is from a multi-valued domain.
The values agreed upon by the non-faulty processes must be within ε of each other.
Renaming problem: requires the processes to agree on necessarily distinct
values.
Reliable broadcast: a weaker version of reliable terminating broadcast (RTB), in
which the terminating condition is dropped; it is solvable under crash failures.
Consider the Fig 4.18. Commander Pc sends its value to the other three
lieutenants.
In the second round, each lieutenant relays to the other two lieutenants, the value
it received from the commander in the first round.
At the end of the second round, a lieutenant takes the majority of the values it
received
(i) directly from the commander in the first round
(ii) from the other two lieutenants in the second round.
Fig 4.18: Achieving Byzantine agreement when n = 4 processes and f = 1 malicious process
Lamport–Shostak–Pease Algorithm
This is also known as the Oral Message algorithm OM(f), where f is the number of faulty
processes and n is the total number of processes (n ≥ 3f + 1). The algorithm is defined
recursively. The base of the recursion, OM(0), says that:
1. The source process sends its value to each other process.
2. Each process uses the value it receives from the source; if no value is received,
the default value 0 is assumed.
Each message has the following parameters:
i) a consensus estimate value (v)
ii) a set of destinations (Dests)
iii) a list of nodes traversed by the message, from most recent to least recent
(List)
iv) The number of Byzantine processes that the algorithm still needs to
tolerate (faulty).
The commander invokes the algorithm with parameter faulty set to f, the maximum
number of malicious processes to be tolerated.
The algorithm uses f + 1 synchronous rounds. Each message (having this parameter
faulty = k) received by a process invokes several other instances of the algorithm
with parameter faulty = k − 1.
The terminating case of the recursion is when the parameter faulty is 0.
As the recursion folds, each process progressively computes the majority function
over the values it used as a source for that level of invocation in the unfolding, and
the values it has just computed as consensus values using the majority function for
the lower level of invocations.
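The recursion can be made concrete with a small, self-contained simulation. The sketch below is not the book's algorithm listing; it is a minimal Python illustration of OM(f) under simplifying assumptions: a traitor simply flips every bit it sends, and ties in the majority function fall back to the default value 0.

from collections import Counter

DEFAULT = 0  # value used when no usable message is received or votes tie

def majority(values):
    # Majority function over the received values; DEFAULT breaks ties.
    top, count = Counter(values).most_common(1)[0]
    return top if count * 2 > len(values) else DEFAULT

def send(sender, value, traitors):
    # A loyal process relays faithfully; a traitor lies by flipping the bit.
    return 1 - value if sender in traitors else value

def om(f, commander, value, lieutenants, traitors):
    # Every lieutenant records the value received from the commander.
    received = {p: send(commander, value, traitors) for p in lieutenants}
    if f == 0:                     # OM(0): use the received value directly
        return received
    # OM(f), f > 0: each lieutenant acts as the commander of OM(f - 1),
    # relaying the value it received to the other lieutenants.
    relayed = {p: om(f - 1, p, received[p],
                     [q for q in lieutenants if q != p], traitors)
               for p in lieutenants}
    # As the recursion folds, each lieutenant takes the majority of the
    # direct value and the values computed from the relays.
    return {p: majority([received[p]] +
                        [relayed[q][p] for q in lieutenants if q != p])
            for p in lieutenants}

# n = 4, f = 1: a loyal commander "C" proposes 1; lieutenant "L3" is a traitor.
decisions = om(1, "C", 1, ["L1", "L2", "L3"], traitors={"L3"})
print(decisions)   # the loyal lieutenants L1 and L2 both decide 1

With n = 4 and f = 1 the loyal lieutenants still agree on the commander's value, matching the scenario of Fig 4.18.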
Fig 4.22: Local tree at P3 for solving the Byzantine agreement, for n = 10 and f = 3. Only
one branch of the tree is shown for simplicity
Correctness
The correctness of the Byzantine agreement algorithm can be observed from the following two
informal inductive arguments. Here we assume that the Oral_Msg algorithm is invoked with
parameter x, and that there are a total of f malicious processes. There are two cases depending on
whether the commander is malicious. A malicious commander causes more chaos than an honest
commander.
Phase-king algorithm for consensus: polynomial (synchronous system)
The phase-king algorithm proposed by Berman and Garay solves the consensus
problem under the same model, requiring f + 1 phases and a polynomial number
of messages, but it can tolerate only f < ⌈n/4⌉ malicious processes.
The algorithm is so called because it operates in f + 1 phases, each with two
rounds, and a unique process plays an asymmetrical role as a leader in each
phase.
In the first round of each phase, each process broadcasts its estimate of the
consensus value to all other processes, and likewise awaits the values broadcast
by others.
At the end of the round, it counts the number of “1” votes and the number of “0”
votes. If either number is greater than n/2, then it sets its majority variable to
that consensus value, and sets mult to the number of votes received for the
majority value.
If neither number is greater than n/2, which may happen when the malicious
processes do not respond, and the correct processes are split among themselves,
then a default value is used for the majority variable.
In the second round (lines 1g–1o) of each phase, the phase king initiates
processing; the phase king for phase k is the process with identifier Pk, where
k ∈ {1, …, n}.
The phase king broadcasts its majority value majority, which serves the role of a
tie-breaker vote for those other processes that have a value of mult of less than
n/2+ f.
Thus, when a process receives the tie-breaker from the phase king, it updates its
estimate of the decision variable v to the value sent by the phase king if its own
mult variable < n/2 + f.
The reason for this is that among the votes for its own majority value, f votes
could be bogus and hence it does not have a clear majority of votes (i.e., > n/2)
from the non-malicious processes.
Hence, it adopts the value of the phase king. However, if mult > n/2 + f (lines
1k–1l), then it has received a clear majority of votes from the non-malicious
processes, and hence it updates its estimate of the consensus variable v to its own
majority value, irrespective of what tie-breaker value the phase king has sent in
the second round.
At the end of f + 1 phases, it is guaranteed that the estimate v of all the processes
is the correct consensus value.
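The two-round structure can be sketched as a single-threaded simulation. This is only an illustration under simplifying assumptions (malicious processes always flip their value and send the same lie to everyone, and broadcasts are modelled with shared lists), not the algorithm's message-passing pseudo-code.

def phase_king(n, f, initial, byzantine):
    # initial: dict pid -> 0/1 estimate; byzantine: set of malicious pids.
    assert n > 4 * f, "the phase-king algorithm needs n > 4f"
    v = dict(initial)
    for k in range(f + 1):                    # f + 1 phases; king of phase k is pid k
        # Round 1: all processes broadcast their estimates (traitors flip theirs).
        votes = [1 - v[p] if p in byzantine else v[p] for p in range(n)]
        ones = sum(votes)
        majority = 1 if ones > n / 2 else 0   # 0 also serves as the default value
        mult = ones if majority == 1 else n - ones
        # Round 2: the phase king broadcasts its majority value as the tie-breaker.
        tiebreak = 1 - majority if k in byzantine else majority
        for p in range(n):
            if p in byzantine:
                continue
            # Keep one's own majority only if its multiplicity beats n/2 + f;
            # otherwise up to f of those votes could be bogus, so adopt the king's.
            v[p] = majority if mult > n / 2 + f else tiebreak
    return v

final = phase_king(n=5, f=1, initial={0: 1, 1: 1, 2: 0, 3: 0, 4: 0}, byzantine={4})
print(final)   # the non-malicious processes 0..3 all end with the same value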
Correctness
The correctness reasoning is in three steps:
Among the f + 1 phases, the phase king of some phase k is non-malicious because
there are at most f malicious processes.
As the phase king of phase k is non-malicious, all non-malicious processes can be
seen to have the same estimate value v at the end of phase k.
All non-malicious processes have the same consensus estimate v at the start of
phase k + 1 and they continue to have the same estimate at the end of phase k + 1.
Complexity
The algorithm requires f + 1 phases with two sub-rounds in each phase, and
(f + 1)(n − 1)(n + 1) messages.
Z-dependency implies that there does not exist a non-blocking algorithm that allows a minimum
number of processes to take their checkpoints.
23. Give the types of communication-induced checkpointing.
The checkpoints that a process takes independently are called local checkpoints, while the
checkpoints that a process is forced to take are called forced checkpoints.
24. What is MRS model?
The MRS (mark, send, and receive) model avoids the domino effect by ensuring that within
every checkpoint interval all message receiving events precede all message- sending events.
25. Write about Index-based checkpointing
This assigns monotonically increasing indexes to checkpoints, such that the
checkpoints having the same index at different processes form a consistent state.
Inconsistency between checkpoints of the same index can be avoided in a lazy
fashion if indexes are piggybacked on application messages to help receivers
decide when they should take a forced checkpoint.
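A minimal sketch of this piggybacking rule, with illustrative class and method names (not from the text): each outgoing message carries the sender's current checkpoint index, and a receiver whose index lags takes a forced checkpoint before delivery, so checkpoints with the same index stay consistent.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.index = 0                       # index of the latest checkpoint

    def take_checkpoint(self, forced=False):
        self.index += 1
        kind = "forced" if forced else "local"
        print(f"P{self.pid}: {kind} checkpoint #{self.index}")

    def send(self, msg, receiver):
        # Piggyback the current checkpoint index on the application message.
        receiver.receive(msg, piggybacked=self.index)

    def receive(self, msg, piggybacked):
        # If the sender has checkpointed ahead of us, take a forced
        # checkpoint first so that same-index checkpoints remain consistent.
        if piggybacked > self.index:
            self.index = piggybacked - 1
            self.take_checkpoint(forced=True)
        print(f"P{self.pid}: delivered {msg!r}")

p1, p2 = Process(1), Process(2)
p1.take_checkpoint()        # P1 advances to checkpoint index 1 independently
p1.send("m", p2)            # P2 is forced to checkpoint #1 before delivering m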
26. Define log-based rollback recovery.
It makes use of deterministic and nondeterministic events in a computation.
27. State No-orphans condition.
It states that if any surviving process depends on an event e, then either event e is logged on the
stable storage, or the process has a copy of the determinant of event e.
28. Define pessimistic logging.
Pessimistic logging protocols assume that a failure can occur after any non- deterministic event
in the computation.
29. What is optimistic logging?
In these protocols, processes log determinants asynchronously to the stable
storage.
Optimistically assume that logging will be complete before a failure occurs.
These protocols do not implement the always-no-orphans condition.
UNIT V
CLOUD COMPUTING
Originally, the cloud was thought of as a bunch of combined services, technologies, and activities.
What happened inside the cloud was not known to the users of the services. This is partially how the
cloud got its name. Cloud computing is a virtualization-based technology that allows us to create,
configure, and customize applications via an internet connection. The cloud technology includes a
development platform, hard disk, software application, and database. The term cloud refers to a network or
the internet. It is a technology that uses remote servers on the internet to store, manage, and access data
online rather than local drives. The data can be anything such as files, images, documents, audio, video,
and more.
Using cloud computing, we can perform operations such as storing, backing up, and accessing data and
applications over the internet.
The NIST definition of cloud computing outlines five key cloud characteristics: on-demand self-service,
broad network access, resource pooling, rapid elasticity, and measured service.
a)On-Demand Self-Service
On-demand self-service means that a consumer can request and receive access to a service offering,
without an administrator or some sort of support staff having to fulfill the request manually. The request
processes and fulfillment processes are all automated. This offers advantages for both the provider and the
consumer of the service.
b)Broad Network Access
Cloud services should be able to be accessed by a wide variety of client devices. Laptops and desktops
aren’t the only devices used to connect to networks and the Internet. Users also connect via tablets,
smartphones, and a host of other options. Cloud services need to support all of these devices. If the
service requires a client application, the provider may have to build platform-specific applications (i.e.,
Windows, Mac, iOS, and Android). Having to develop and maintain a number of different client
applications is costly, so it is extremely advantageous if the solution can be architected in such a way that
doesn’t require a client at all.
c)Resource Pooling
Resource pooling helps save costs and allows flexibility on the provider side. Resource pooling is based
on the fact that clients will not have a constant need for all the resources available to them. When
resources are not being used by one customer, instead of sitting idle those resources can be used by
another customer. This gives providers the ability to service many more customers than they could if each
customer required dedicated resources. Resource pooling is often achieved using virtualization.
Virtualization allows providers to increase the density of their systems. They can host multiple virtual
sessions on a single system. In a virtualized environment, the resources on one physical system are placed
into a pool that can be used by multiple virtual systems.
d)Rapid Elasticity
Rapid elasticity describes the ability of a cloud environment to easily grow to satisfy user demand. Cloud
deployments should already have the needed infrastructure in place to expand the service capacity. Rapid
elasticity is usually accomplished through the use of automation and orchestration. When resource usage
hits a certain point, a trigger is set off. This trigger automatically begins the process of capacity expansion.
Once the usage has subsided, the capacity shrinks as needed to ensure that resources are not wasted. The
rapid elasticity feature of cloud implementations is what enables them to be able to handle the “burst”
capacity needed by many of their users. Burst capacity is an increased capacity that is needed for only a
short period of time.
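The trigger-driven loop described above can be sketched in a few lines of Python. The thresholds, step sizes, and workload numbers below are illustrative assumptions, not values from any particular provider.

def autoscale(capacity, demand, high=0.8, low=0.3, step=2, minimum=2):
    load = demand / capacity
    if load > high:                               # trigger fires: expand capacity
        return capacity + step
    if load < low and capacity > minimum:         # burst over: release resources
        return max(minimum, capacity - step)
    return capacity

capacity = 4
for demand in [2, 5, 9, 9, 3, 1]:                 # simulated workload over time
    capacity = autoscale(capacity, demand)
    print(f"demand={demand} -> capacity={capacity}")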
e)Measured Service
Cloud services must have the ability to measure usage. Usage can be quantified using various metrics,
such as time used, bandwidth used, and data used. The measured service characteristic is what enables the
“pay as you go” feature of cloud computing. Once an appropriate metric has been identified, a rate is
determined. This rate is used to determine how much a customer should be charged. This way, the client is
billed based on consumption levels.
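As a toy illustration of metering and the pay-as-you-go calculation (all metrics and rates below are made-up numbers):

rates = {"compute_hours": 0.09, "gb_bandwidth": 0.02, "gb_storage": 0.01}
usage = {"compute_hours": 120, "gb_bandwidth": 340, "gb_storage": 50}

# The bill is simply the metered usage multiplied by the per-unit rate.
bill = sum(usage[metric] * rates[metric] for metric in usage)
print(f"monthly charge: ${bill:.2f}")   # 10.80 + 6.80 + 0.50 = $18.10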
The way the cloud is used varies from organization to organization. Every organization has its own
requirements as to what services it wants to access from a cloud and how much control it wants to have
over the environment. To accommodate these varying requirements, a cloud environment can be
implemented using different service models. Each service model has its own set of requirements and
benefits. The NIST definition of cloud computing outlines four different cloud deployment models:
public, private, community, and hybrid.
i)PUBLIC CLOUDS
When most people think about cloud computing, they are thinking of the public cloud service model. In
the public service model, all the systems and resources that provide the service are housed at an external
service provider. That service provider is responsible for the management and administration of the
systems that are used to provide the service. The client is only responsible for any software or client
application that is installed on the end-user system. Connections to public cloud providers are usually
made through the Internet.
Benefits
The number of public cloud implementations continues to grow at a rapid pace due to the numerous
benefits public clouds offer.
Availability
Public cloud deployments can offer increased availability over what is achievable internally. Every
organization has an availability quotient that they would like to achieve. Every organization also has an
availability quotient that they are capable of achieving. Sometimes the two match; sometimes they don’t.
The problem is that availability comes at a cost, whether hardware cost, software cost, training cost, or
staffing cost. Whichever it is, an organization may not be able to afford it, so they have to make do with
what they have and therefore not be able to achieve the level of availability they would like. Most public
cloud providers already have the hardware, software, and staffing in place to make their offerings highly
available. They may charge a little extra for the service to provide increased availability, but it will be
nowhere near the cost of doing it internally.
Scalability
Public cloud implementations offer a highly scalable architecture, as do most cloud implementations.
What public cloud implementations offer that private clouds do not is the ability to scale your
organization’s capacity without having to build out your own infrastructure.
Accessibility
Public cloud providers place great importance on accessibility. To expand their potential customer base as
wide as possible, they attempt to ensure that they can service as many different client types as possible.
Their goal is to ensure that their services can be accessed by any device on the Internet without the need
for VPNs or any special client software.
Cost Savings
Public clouds are particularly attractive because of the cost savings they offer. But you do have to be
careful because the savings might not be as good as you think. You need to have a good understanding of
not only the amount of savings but also the type of savings.
Drawbacks
a)Integration Limitations
In public SaaS clouds, the systems are external to your organization; this means that the data is also
external. Having your data housed externally can cause problems when you’re doing reporting or trying to
move to on-premises systems. If you need to run reports or do business intelligence (BI) analytics against
the data, you could end up having to transmit the data through the Internet. This can raise performance
concerns as well as security issues.
b)Reduced Flexibility
When you are using a public cloud provider, you are subject to that provider’s upgrade schedule. In most
cases, you will have little or no influence over when upgrades are performed.
c)Forced Downtime
When you use a public cloud provider, the provider controls when systems are taken offline for
maintenance. Maintenance may be performed at a time that is inconvenient for you and your organization.
Responsibilities
With public clouds, most of the responsibilities lie with the service provider. The provider is responsible
for maintenance and support. The provider is also responsible for making sure support personnel are
properly trained.
Security Considerations
Ensuring security is especially difficult in public cloud scenarios. Since you probably won’t manage
access to the systems providing the services, it’s very difficult to ensure that they are secure. You basically
have to take the provider’s word for it and trust in the provider’s capabilities.
a)Data
Public cloud providers raise a real issue over data security. There is a question of data ownership. Since
the service provider owns the systems where your data resides, the provider could potentially be
considered the true owner of the data. There is also an issue with data access. Theoretically, anyone who
works at the service provider could potentially have access to your data.
b) Compliance
Compliance can be a big concern with public service providers, much to do with the fact that you will
have little to no visibility around what’s happening on the back end.
c) Auditing
In the case of public cloud providers, you will generally have limited auditing capabilities. You may not
have direct access to any logs or event management systems. Many public cloud providers will allow you
access to at least some form of application logs. These logs can be used to view user access and make
decisions regarding licensing.
ii)PRIVATE CLOUDS
In a private cloud, the systems and resources that provide the service are located internal to the company
or organization that uses them. That organization is responsible for the management and administration of
the systems that are used to provide the service. In addition, the organization is also responsible for any
software or client application that is installed on the end-user system. Private clouds are usually accessed
through the local LAN or wide area network (WAN). In the case of remote users, the access will generally
be provided through the Internet or occasionally through the use of a virtual private network (VPN).
Benefits
a)Support and Troubleshooting
Private cloud environments can be easier to troubleshoot than public cloud environments. In a private
cloud environment, you will have direct access to all systems. You can access logs, run network traces,
run debug traces, or do anything else you need to do to troubleshoot an issue. You don’t have to rely on a
service provider for help.
b) Maintenance
With private clouds, you control the upgrade cycle. You aren’t forced to upgrade when you don’t want.
You don’t have to perform upgrades unless the newer version has some feature or functionality that you
want to take advantage of. You can control when upgrades are performed. If your organization has
regularly scheduled maintenance windows, you can perform your upgrades and other maintenance
activities during that specified timeframe. This may help reduce the overall impact of a system outage.
c) Monitoring
Since you will have direct access to the systems in your private cloud environment, you will be able to do
whatever monitoring you require. You can monitor everything from the application to the system
hardware. One big advantage of this capability is that you can take preemptive measures to prevent an
outage, so you are able to be more proactive in servicing your customers.
Drawbacks
a) Cost
Implementing a private cloud requires substantial upfront costs. You have to implement an infrastructure
that not only can support your current needs but your future needs as well. You need to estimate the needs
of all the business units you will be supporting. You also have to implement an infrastructure that can
support peak times. All the systems needed to support peak times don’t always have to be running if you
have a way of automatically starting them when necessary.
b) Hardware and Software Compatibility
You have to make sure the software you implement is compatible with the hardware in your environment.
In addition, you have to make sure the software you implement is compatible with the clients in your
environment.
c) Expertise Needed
With private clouds you still need expertise in all the applications and systems you want to deploy. The
need for internal expertise can lead to expensive training and education. You will be responsible for
installing, maintaining, and supporting them, so you must ensure that you either have the in-house
knowledge to do so or the ability to bring in outside contractors or consultants to help.
Responsibilities
a)Security Considerations
With a private cloud implementation, your organization will have complete control over the systems,
applications, and data. You can control who has access to what. Ensuring security is easier in a private
cloud environment. There you have complete control over the systems, so you can implement any security
means you like.
b)Compliance
In a private cloud environment, you are responsible for making sure that you follow any applicable
compliance regulations. If your organization has the skills and the technology to ensure adherence to
compliance regulations, having the systems and the data internal can be a big advantage. If you don’t have
the skills and technology, you will have to obtain the skills, or you could face serious problems.
Data
In a private cloud environment, you own the data and the systems that house the data. This gives you
more control over who can access the data and what they can do with it. It also gives you greater assurance
that your data is safe.
Auditing
In a private cloud environment, you have complete access to all the application and system logs. You can
see who accessed what and what they did with it. The biggest advantage is that you can see all of this in
real time, so you are able to take any corrective action necessary to ensure the integrity of your systems.
iii)COMMUNITY CLOUDS
Community clouds are semi-public clouds that are shared between members of a select group of
organizations. These organizations will generally have a common purpose or mission. The organizations
do not want to use a public cloud that is open to everyone. They want more privacy than what a public
cloud offers. In addition, each organization doesn’t want to be individually responsible for maintaining the
cloud; they want to be able to share the responsibilities with others.
Benefits
a)Cost
In a community cloud, costs are shared between the community members. This shared cost allows for the
purchase of infrastructure that any single member organization may not have been able to afford. This way
the community members are also able to achieve greater economies of scale.
b)Multitenancy
In a community cloud, multitenancy can help you take advantage of some economies of scale. Your
organization alone may not be large enough to take advantage of some of the cost savings, but working
with another organization or multiple organizations, together you may be large enough to see these
benefits.
Drawbacks
There are some potential drawbacks to implementing a community cloud. Any time you have multiple
organizations working together, there is the potential for conflict. Steps must be taken to mitigate any
potential issues.
a)Ownership
Ownership in a community cloud needs to be clearly defined. If multiple organizations are coming
together to assemble infrastructure, you must determine some agreement for joint ownership. If you are
purchasing capital resources, those resources need to go against some organization’s budget. In some
instances, the organizations coming together to build the community cloud may establish a single common
organization that can “own” the resources.
b) Responsibilities
In a community cloud, responsibilities are shared between the member organizations. There may be
problems deciding who owns what and who is responsible for what, but after those questions have been
decided, the shared responsibility can be quite beneficial. This shared responsibility reduces the
administrative burden on any single organization.
c)Security Considerations
Community clouds present a special set of circumstances when it comes to security because there will be
multiple organizations accessing and controlling the environment.
d)Data
In a community cloud, all the participants in the community may have access to the data. For this reason,
you don’t want to store any data that is restricted to only your organization.
e)Compliance
In a community cloud, compliance can be particularly tricky. The systems will be subject to all the
compliance regulations to which each of the member organizations is subject. So, your organization may
be subject to regulations with which you have little familiarity.
f)Auditing
In a community cloud, member organizations will have shared access to all the application and system
audit logs. You will want to have some agreement as to who will perform what activities. Trolling through
logs can be particularly tedious and time consuming, so you don’t want people wasting time doing
duplicate work.
iv)HYBRID CLOUDS
A hybrid cloud model is a combination of two or more other cloud models. The clouds themselves are not
mixed together; rather, each cloud is separate, and they are all linked together. A hybrid cloud may
introduce more complexity to the environment, but it also allows more flexibility in fulfilling an
organization’s objectives.
Benefits
In addition to the benefits brought by each of the cloud models, the hybrid cloud model brings increased
flexibility. If your ultimate goal is to move everything to a public service provider, a hybrid environment
allows you to move to a cloud environment without being forced to move everything public until the time
is right. You may have certain applications for which the public service offerings are very expensive. You
can keep these applications internal until the price comes down. You may also have security concerns
about moving certain types of data to a public service provider. Again, the hybrid cloud model allows you
to leave that data internal until you can be assured that it will be safe in a public cloud environment.
Drawbacks
A hybrid cloud environment can be the most complex environment to implement. You have different
considerations for each type of cloud you plan to implement. Not all your rules and procedures will apply
to all environments. You will have to develop a different set of rules and procedures for each environment.
a)Integration
You may have some applications in a public cloud and some applications in a private one, but these
applications may need to access and use the same data. You have two choices here: You can duplicate
copies of data, which would require you to set up some type of replication mechanism to keep the data in
sync, or you can move data around as needed. Moving data around in a hybrid cloud environment can be
tricky because you have to worry about bandwidth constraints.
b)Security Considerations
Hybrid clouds can bring about particular security considerations. Not only do you have to worry about
security issues in each individual environment, you have to worry about issues created by connecting the
environments together.
c)Data
Moving data back and forth between cloud environments can be very risky. You have to ensure that all
environments involved have satisfactorily secured data. Data in motion can be particularly difficult to
secure. Both sides of the conversation must support the same security protocols, and they must be
compatible with each other.
Auditing
Auditing in hybrid environments can be tricky. User access may rotate between internal and external.
Following a process from start to finish may take you through both internal and external systems. It’s
important that you have some way of doing event log correlation so that you can match up these internal
and external events.
Drivers
Many organizations look to IaaS providers to expand their capacity. Instead of spending a lot of money
expanding a datacenter or building a new datacenter, organizations are basically renting systems provided
by an IaaS provider.
Challenges
There have been several challenges to IaaS adoption. Most organizations see the benefits, but they worry
about the loss of control. The total cost can also be an issue. In many IaaS environments, you are charged
for resource usage, such as processor and memory. Unless you carefully monitor your system usage, you
may be in for a shock when the bill comes.
Security Challenges
The security challenges for IaaS implementations are similar to those for other service providers.
However, since the provider does not need access to the actual operating system or items at a higher level,
there is no need for them to have administrative accounts on the system. This can give the customer at
least some level of comfort regarding security.
IaaS Providers
IaaS providers are really picking up steam in the marketplace. This isn’t just due to demand. There is also
the fact that IaaS platforms such as CloudStack and OpenStack have been developed to make automation
and orchestration easier. Here we cover two of the most well-known IaaS providers: Amazon EC2 and
Rackspace.
Platform as a Service
Platform as a Service, or PaaS, provides an operating system, development platform, and/or a database
platform. PaaS implementations allow organizations to develop applications without having to worry
about building the infrastructure needed to support the development environment.
PaaS Characteristics
PaaS implementations allow organizations to build and deploy Web applications without having to
build their own infrastructure. PaaS offerings generally include facilities for development, integration,
and testing.
a)Customization
With PaaS, you have complete control over the application, so you are free to customize the application as
you see fit.
b) Analytics
Since you, the customer, will be creating the applications, you will have the ability to view application
usage and determine trends. You will be able to see which components are getting the most use and which
ones are not being used.
c)Integration
In a PaaS environment, the data will be stored at the provider site, but the customer will have direct
access to it. Conducting business intelligence and reporting should not be a problem from an access point
of view, but you could run into issues when it comes to bandwidth usage, because you may be moving
large amounts of data between your internal environment and the provider’s environment.
PaaS Responsibilities
In a PaaS offering, responsibilities are somewhat distributed between the service provider and the
customer.
In a PaaS implementation, the customer is generally responsible for everything above the operating system
and development platform level. You will be responsible for installing and maintaining any additional
applications you will need. This includes application patching and application monitoring. The database
platform may be supplied for you, but you will be responsible for the data. In a PaaS implementation, you
will usually have direct access to the data. If there are any problems with the data, you will be able to
implement any direct data fix you might need to perform.
PaaS Drivers
There have been many drivers influencing the growth of the PaaS market. Many organizations want to
move towards a public cloud model, but can’t find public SaaS providers offering the applications they
need. A PaaS model allows them to move the infrastructure and platforms out of their internal datacenters
while allowing them to be able to develop the applications they need.
PaaS Challenges
A number of challenges come into play with public PaaS environments, including issues related to
flexibility and security.
Flexibility Challenges
You may have difficulty finding a provider with the platform you need. Most PaaS providers limit their
offerings to specific platform sets. If you need a special set or special configuration, you might not be able
to find a provider that offers what you need.
Security Challenges
The provider will have administrative control over the operating system and the database platform. Since
the provider has direct access to the systems, they will have direct access to all of the applications and
data.
PaaS Providers
The number of PaaS providers in the market continues to grow. First we take a look at Windows Azure.
a)Windows Azure
Windows Azure has a free offering and upgraded offerings that include features such as increased SLAs.
Windows Azure makes it very easy to spin up a Web site or development platform. Windows Azure
includes a wide variety of options such as compute services, data services, app services, and network
services.
Software as a Service
Software as a Service, or SaaS, provides application and data services. Applications, data, and all the
necessary platforms and infrastructure are provided by the service provider. SaaS is the original cloud
service model. It still remains the most popular model, offering by far the largest number of provider
options.
Customization
With SaaS implementations, the service provider usually controls virtually everything about the
application. In many cases, this will limit any customization that can be done. But depending on the
implementation, you may be able to request that the user interface (UI) or the look and feel of the
application be modified slightly. Usually wholesale changes are not allowed. In most cases the customer
will not be able to make the changes themselves; the provider will have to make the changes. In a SaaS
environment, allowing customization can be very costly for the service provider and, hence, the customer.
Support and Maintenance
In a SaaS environment, software upgrades are centralized and performed by the service provider. You
don’t have to worry about upgrading software on multiple clients. The centralized upgrades allow for more
frequent upgrades, which can allow for accelerated feature delivery. The exception to this rule is when
there is client software that is used to access the centralized application. But most SaaS providers will try
to provide access to their applications without requiring a client application.
Analytics
Analytics and usage statistics can provide valuable information about application usage. In SaaS
implementations, the provider has the ability to view user activities and determine trends. In many cases
this information is shared with the customers. For large organizations, this information can be invaluable.
Since most cloud environments are pay-as-you-go offerings, it’s important to understand usage trends.
Understanding trends helps you understand when you may have a spike in usage and therefore a spike in
costs. It’s also important to understand concurrent usage and total usage. You may be able to reduce your
license costs.
Integration
In a SaaS environment, the data will be stored at the provider site. In most cases, the customer will not
have direct access to the data. This can be a problem when it comes to reporting and business intelligence.
It’s also a problem if you need to do a manual fix of the data or load or reload data in bulk. In some cases
there is nothing you can do about that.
Responsibilities
In SaaS implementations, most of the responsibilities fall on the service provider. This is one of the
reasons SaaS implementations have become so popular. Organizations are able to free up their internal
resources for other activities, as opposed to using them for system administration. Figure 4.3 gives you an
idea of what is generally the responsibility of the service provider and what is usually taken care of by the
customer.
In a SaaS environment, the provider is basically responsible for everything except the client systems. It
will ensure that the application is up to date. It will make sure that the systems have been patched
appropriately. It will ensure that the data is being stored properly. It will monitor the systems for
performance and make any adjustments that need to be made. In a SaaS environment, the customer is
responsible for the client system or systems. The customer must ensure that the clients have connectivity
to the SaaS application. The client systems must have any necessary client software installed. The client
systems must be patched to an appropriate level.
SaaS Drivers
Many drivers have contributed to the rise of public SaaS offerings. There has been a big rise in the
creation and consumption of Web-based applications. Users are getting used to the look and feel of these
types of applications. Not to mention the fact that the look and feel have improved. Most SaaS providers
offer their services in the form of Web-based applications. So, as acceptance of Web-based applications
grows, so does the acceptance of SaaS services.
SaaS Challenges
Even though SaaS is currently the most popular cloud service model, there are still many challenges to the
adoption of SaaS. SaaS providers have been able to resolve many of the challenges and mitigate concerns,
but many still exist.
Disparate Location
SaaS applications are generally hosted offsite. This means connections between the client and the
application must travel over the public Internet, sometimes long distances. This distance may introduce
latency into the environment. This can be a limiting factor for some applications. Some applications
require response times in milliseconds. These applications will not work in environments where there is a
great deal of latency.
Multitenancy
Multitenancy can cause several issues. Since the application is shared, generally little to no customization
can be performed. This can be a problem if your organization requires extensive customization. You may
have to go with an on-premises application.
SaaS Providers
Outlook.com
A default Outlook.com mail account is free. If you want advanced features or a version that does not
include advertisements, you have to upgrade your account. This can be done by selecting the gear icon in
the top-right corner and selecting More Mail Settings.
Google Drive
Google Drive gives you online access to view and create word processing documents, spreadsheets,
presentations, and a host of other documents. You can use the built-in document types or add new types.
To add a new type of document, choose Create in the left pane and select Connect More Apps.
Driving Factors
1. Reduced costs
Establishing and running a data center is expensive. You need to purchase the right equipment and hire
technicians to install and manage the center. When you shift to cloud computing, you will only pay for the
services procured.
2. Flexibility
One of the major benefits of cloud computing is mobility. The service gives you and your employees the
flexibility to work from any location. Employees can complete their tasks at home or from the field.
You can reduce the number of workstations in your office and allow some employees to work from home
to save costs further. Cloud computing enables you to monitor the operations in your business effectively.
You just need a fast internet connection to get real time updates of all operations.
3. Scalability
The traditional way of planning for unexpected growth is to purchase and keep additional servers, storage,
and licenses. It may take years before you actually use the reserve resources. Scaling cloud computing
services is easy. You can get additional storage space or features whenever you need them. Your provider
will simply upgrade your package within minutes as long as you meet the additional cost.
4. Data backup and recovery
Traditional computing systems require backup plans, especially for data storage. A disaster can lead to
permanent data loss if no backup storage is in place. Businesses do not require any such means when
storing data on a cloud. The data will always be available as long as users have an internet connection.
Some businesses use cloud computing services as backup and a plan for disaster recovery.
5. Data security
Sometimes storing data on the cloud is safer than storing it on physical servers and data centers. A breach
of security at your premises can lead to compromised data security if laptops or computers are stolen. If you
have data on the cloud, you can delete any confidential information remotely or move it to a different
account. Breaching the security measures on cloud platforms is difficult. Hence, you are assured of data
security.
6. Variety of service options
We have already mentioned the main groups of cloud computing services, that is, IaaS, PaaS, and SaaS.
Each of these groups has many subcategories that vary across providers. For instance, if you are looking
for software, you will have hundreds of options from different providers. You can choose the service
providers with the best features and rates for the service that your business needs.
7. Improved collaboration
Business owners are always looking for ways to boost individual and team performance. Cloud computing
is among the most effective ways of improving team performance. Staff members can easily share data and
collaborate to complete projects even from different locations. Field workers can easily share real-time
data and updates with those in the office. In addition, cloud computing eliminates redundant or repetitive
tasks such as data re-entry. You can improve the level of efficiency, increase productivity, and save costs
by moving your business to cloud computing. The best approach is to shift the operations gradually to
avoid data losses or manipulation during the shift.
Challenges of Cloud Computing
1. Security
The topmost concern when investing in cloud services is security. Your data gets stored and processed by
a third-party vendor, and you cannot see it. Every other day you get informed about broken authentication,
compromised credentials, account hacking, data breaches, etc. in some organization. It makes you a little
more skeptical.
2. Password Security
As large numbers of people access your cloud account, it becomes vulnerable. Anybody who knows your
password or hacks into your cloud will be able to access your confidential information. The
organization should use multi-level authentication and ensure that the passwords remain protected.
Also, the passwords should be modified regularly, especially when a particular employee resigns and leaves
the organization. Access rights to usernames and passwords should be given judiciously.
3. Cost Management
Cloud computing enables you to access application software over a fast internet connection and lets you
save on investing in costly computer hardware, software, management, and maintenance, which makes it
affordable. The challenge, however, lies in monitoring and optimizing this spending so that unused or
over-provisioned resources do not inflate the bill.
4. Lack of expertise
With the increasing workload on cloud technologies and continuously improving cloud tools, management
has become difficult. There has been a consistent demand for a trained workforce who can deal with cloud
computing tools and services. Hence, firms need to train their IT staff to minimize this challenge.
5. Internet Connectivity
Cloud services are dependent on a high-speed internet connection. So businesses that are relatively small
and face connectivity issues should ideally first invest in a good internet connection so that no downtime
happens. It is because internet downtime might incur vast business losses.
6. Control or Governance
Another issue in cloud computing is maintaining proper control over asset management and
maintenance. There should be a dedicated team to ensure that the assets used to implement cloud services
are used according to agreed policies and dedicated procedures, that the assets are properly maintained,
and that they are used to meet your organization’s goals successfully.
7. Compliance
Another major risk of cloud computing is maintaining compliance. By compliance we mean, a set of rules
about what data is allowed to be moved and what should be kept in-house to maintain compliance. The
organizations must follow and respect the compliance rules set by various government bodies.
8. Managing Multiple Clouds
Companies have started to invest in multiple public clouds, multiple private clouds, or a combination of
both, called the hybrid cloud. This has grown rapidly in recent times. So it has become important to list the
challenges faced by such organizations and find solutions to grow with the trend.
9. Creating a Private Cloud
Implementing an internal cloud is advantageous because all the data remains secure in-house. But
the challenge here is that the IT team has to build and fix everything by themselves. The team also needs
to ensure the smooth functioning of the cloud, automate as many manual tasks as possible, and execute
tasks in the correct order.
10. Performance
When your business applications move to a cloud or a third-party vendor, your business performance
starts to depend on your provider as well. Another major problem in cloud computing is investing in the
right cloud service provider. Before investing, you should look for providers with innovative technologies.
Be cautious about choosing the provider and investigate whether they have protocols to mitigate issues
that arise in real-time.
11. Migration
Migration is nothing but moving a new application or an existing application to a cloud. In the case of a
new application, the process is pretty straightforward. But if it is an age-old company application, it
becomes tedious. Velostrata conducted a survey recently, wherein 95% of organizations are moving their
applications to the cloud. The survey showed that most organizations are finding it a nightmare. Some
notable issues faced here are slow data migrations, security challenges in cloud computing, extensive
troubleshooting, application downtime, migration agents, and cutover complexity.
12. Interoperability and Portability
Another challenge of cloud computing is that applications need to be easily migrated between cloud
providers without being locked in for a set period. There is a lack of flexibility in moving from one cloud
provider to another because of the complexity involved. Changing cloud providers brings a slew of new
challenges, like managing data movement and establishing a secure network from scratch. Another
challenge is that customers can’t always access the cloud from everywhere, but this can be fixed by the
cloud provider so that the customer can securely access the cloud from anywhere.
13. High Availability and Reliability
Some of the most pressing issues in cloud computing are the need for high availability (HA) and reliability.
Reliability refers to the likelihood that a system will run without failure over a given period of time,
whereas availability refers to how likely it is that the system will be up and running at any given point in time.
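To make the distinction concrete, availability is often estimated from the mean time between failures (MTBF) and the mean time to repair (MTTR). A short illustration, using made-up figures:

# Availability is commonly estimated as MTBF / (MTBF + MTTR).
mtbf_hours = 990.0   # mean time between failures (illustrative)
mttr_hours = 10.0    # mean time to repair (illustrative)

availability = mtbf_hours / (mtbf_hours + mttr_hours)
print(f"availability = {availability:.3f}")   # 0.990, i.e., "two nines"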
14. Hybrid Cloud Complexity
For any company, a hybrid cloud environment is often a messy mix of multiple cloud application
development and cloud service providers, as well as private and public clouds, all operating at once. A
common user interface, consistent data, and analytical benefits for businesses are all missing from these
complex cloud ecosystems.
Virtualization is a technique for separating a service from the underlying physical delivery of that
service. It is the process of creating a virtual version of something, such as computer hardware. It was
initially developed during the mainframe era. It involves using specialized software to create a virtual or
software-created version of a computing resource rather than the actual version of the same resource. With
the help of Virtualization, multiple operating systems and applications can run on the same machine and
the same hardware at the same time, increasing the utilization and flexibility of hardware.
In other words, one of the main cost-effective, hardware-reducing, and energy-saving techniques used by
cloud providers is Virtualization. Virtualization allows sharing of a single physical instance of a resource
or an application among multiple customers and organizations at one time. It does this by assigning a
logical name to physical storage and providing a pointer to that physical resource on demand. The term
virtualization is often synonymous with hardware virtualization, which plays a fundamental role in
efficiently delivering Infrastructure-as-a-Service (IaaS) solutions for cloud computing. Moreover,
virtualization technologies provide a virtual environment for not only executing applications but also for
storage, memory, and networking.
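The "logical name plus pointer" indirection described above can be pictured with a toy mapping; the names and device paths below are purely illustrative:

physical = {"disk-7": "/dev/sdb1", "disk-9": "/dev/sdc1"}   # real resources
logical = {"customer-volume": "disk-7"}     # logical name -> physical resource

def resolve(name):
    # The pointer is followed on demand, so callers see only the logical name.
    return physical[logical[name]]

print(resolve("customer-volume"))        # /dev/sdb1
logical["customer-volume"] = "disk-9"    # storage migrated transparently
print(resolve("customer-volume"))        # /dev/sdc1 -- callers are unchanged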
Virtualization
Host Machine: The machine on which the virtual machine is going to be built is known as Host
Machine.
Guest Machine: The virtual machine is referred to as a Guest Machine.
In the case of cloud computing, users store data in the cloud, but with the help of Virtualization, users have
the extra benefit of sharing the infrastructure. Cloud vendors take care of the required physical resources,
but these cloud providers charge a significant amount for their services, which impacts every user or
organization. Virtualization helps users and organizations maintain the services a company requires
through external (third-party) providers, which helps in reducing costs to the company. This is the way
Virtualization works in Cloud Computing.
Benefits of Virtualization
Drawback of Virtualization
High Initial Investment: Clouds have a very high initial investment, but it is also true that they will
help in reducing costs for companies.
Learning New Infrastructure: As companies shift from servers to the cloud, they require highly
skilled staff who can work with the cloud easily; for this, you have to hire new staff
or provide training to current staff.
Risk of Data: Hosting data on third-party resources can put the data at risk; it has the
chance of being attacked by any hacker or cracker very easily.
Characteristics of Virtualization
Increased Security: The ability to control the execution of a guest program in a completely
transparent manner opens new possibilities for delivering a secure, controlled execution
environment. All the operations of the guest programs are generally performed against the virtual
machine, which then translates and applies them to the host programs.
Managed Execution: In particular, sharing, aggregation, emulation, and isolation are the most
relevant features.
Sharing: Virtualization allows the creation of a separate computing environment within the same
host.
Aggregation: It is possible to share physical resources among several guests, but virtualization
also allows aggregation, which is the opposite process.
Types of Virtualization
1. Application Virtualization
2. Network Virtualization
3. Desktop Virtualization
4. Storage Virtualization
5. Server Virtualization
6. Data virtualization
Types of Virtualization
1. Application Virtualization: Application virtualization helps a user gain remote access to an application
from a server. The server stores all personal information and other characteristics of the application, but
the application can still run on a local workstation through the internet.
2. Network Virtualization: The ability to run multiple virtual networks, each with a separate
control and data plane. They co-exist together on top of one physical network and can be managed by
individual parties that are potentially confidential to each other. Network virtualization provides a facility
to create and provision virtual networks, logical switches, routers, firewalls, load balancers, Virtual Private
Networks (VPN), and workload security within days or even weeks.
Network Virtualization
3. Desktop Virtualization: Desktop virtualization allows the users’ OS to be remotely stored on a server
in the data center. It allows the user to access their desktop virtually, from any location by a different
machine. Users who want specific operating systems other than Windows Server will need to have a
virtual desktop. The main benefits of desktop virtualization are user mobility, portability, and easy
management of software installation, updates, and patches.
4. Storage Virtualization: Storage virtualization is an array of servers that are managed by a virtual
storage system. The servers aren’t aware of exactly where their data is stored and instead function more
like worker bees in a hive. It allows storage from multiple sources to be managed and utilized as a
single repository. Storage virtualization software maintains smooth operations, consistent performance,
and a continuous suite of advanced functions despite changes, breakdowns, and differences in the
underlying equipment.
5. Server Virtualization: This is a kind of virtualization in which the masking of server resources takes
place. Here, the central server (physical server) is divided into multiple different virtual servers by
changing the identity number and processors, so that each virtual server can run its own operating system
in an isolated manner, while each sub-server knows the identity of the central server. It increases
performance and reduces operating cost by deploying main server resources into sub-server
resources. It is beneficial in virtual migration, reducing energy consumption, reducing infrastructural costs,
etc.
Server Virtualization
6. Data Virtualization: This is the kind of virtualization in which data is collected from various
sources and managed in a single place, without knowing technical details such as how the data is
collected, stored, and formatted. The data is then arranged logically, so that its virtual view can be
accessed remotely by interested people, stakeholders, and users through various cloud services. Many big
companies, such as Oracle, IBM, AtScale, and CData, provide such services.
Uses of Virtualization
Data-integration
Business-integration
Service-oriented architecture data-services
Searching organizational data
Load balancing is an essential technique used in cloud computing to optimize resource utilization and
ensure that no single resource is overburdened with traffic. It is a process of distributing workloads across
multiple computing resources, such as servers, virtual machines, or containers, to achieve better
performance, availability, and scalability.
In cloud computing, load balancing can be implemented at various levels, including the network
layer, application layer, and database layer. The most common load balancing techniques used in
cloud computing are the following (a minimal dispatcher sketch follows this list):
1. Network Load Balancing: This technique is used to balance the network traffic across multiple
servers or instances. It is implemented at the network layer and ensures that the incoming traffic is
distributed evenly across the available servers.
2. Application Load Balancing: This technique is used to balance the workload across multiple
instances of an application. It is implemented at the application layer and ensures that each instance
receives an equal share of the incoming requests.
3. Database Load Balancing: This technique is used to balance the workload across multiple database
servers. It is implemented at the database layer and ensures that the incoming queries are
distributed evenly across the available database servers.
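A minimal round-robin dispatcher shows the basic idea of spreading incoming requests evenly across a pool; the server names are placeholders.

from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, servers):
        self._servers = cycle(servers)    # endlessly rotate through the pool

    def route(self, request):
        return f"{request} -> {next(self._servers)}"

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
for i in range(5):
    print(lb.route(f"req{i}"))
# req0 -> app-1, req1 -> app-2, req2 -> app-3, req3 -> app-1, ...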
Advantages:
1. Improved Performance: Load balancing helps to distribute the workload across multiple resources,
which reduces the load on each resource and improves the overall performance of the system.
2. High Availability: Load balancing ensures that there is no single point of failure in the system,
which provides high availability and fault tolerance to handle server failures.
3. Scalability: Load balancing makes it easier to scale resources up or down as needed, which helps to
handle spikes in traffic or changes in demand.
4. Efficient Resource Utilization: Load balancing ensures that resources are used efficiently, which
reduces wastage and helps to optimize costs.
Disadvantages:
1. Complexity: Implementing load balancing in cloud computing can be complex, especially when
dealing with large-scale systems. It requires careful planning and configuration to ensure that it
works effectively.
2. Cost: Implementing load balancing can add to the overall cost of cloud computing, especially when
using specialized hardware or software.
3. Single Point of Failure: While load balancing helps to reduce the risk of a single point of failure, it
can also become a single point of failure if not implemented correctly.
4. Security: Load balancing can introduce security risks if not implemented correctly, such as
allowing unauthorized access or exposing sensitive data.
Cloud load balancing is defined as the method of splitting workloads and computing resources in a cloud
computing environment. It enables enterprises to manage workload demands or application demands by
distributing resources among numerous computers, networks, or servers. Cloud load balancing includes
holding the circulation of workload traffic and demands that exist over the Internet. Traffic on the internet
is growing rapidly, at about 100% of the present traffic annually. Hence, the workload on servers is
growing so fast that it leads to the overloading of servers, mainly for popular web servers. There are two
elementary solutions to overcome the problem of overloading on the servers:
First is a single-server solution in which the server is upgraded to a higher performance server.
However, the new server may also be overloaded soon, demanding another upgrade. Moreover, the
upgrading process is arduous and expensive.
Second is a multiple-server solution in which a scalable service system on a cluster of servers is
built. That is why it is more cost effective as well as more scalable to build a server cluster system
for network services.
Load balancing is beneficial with almost any type of service, like HTTP, SMTP, DNS, FTP, and
POP/IMAP. It also raises reliability through redundancy. The balancing service is provided by a dedicated
hardware device or program. Cloud-based server farms can attain more precise scalability and availability
using server load balancing. Load balancing solutions can be categorized into the following types:
1. Direct Routing Request Dispatching Technique: This approach to request dispatching is similar to the one implemented in IBM's NetDispatcher. A real server and the load balancer share the virtual IP address. The load balancer takes an interface constructed with the virtual IP address that accepts request packets and directly routes the packets to the selected servers.
2. Dispatcher-Based Load Balancing Cluster: A dispatcher performs smart load balancing by using server availability, workload, capability, and other user-defined criteria to decide where to send a TCP/IP request. The dispatcher module of a load balancer can split HTTP requests among various nodes in a cluster. The dispatcher splits the load among many servers in a cluster so that the services of the various nodes appear as a virtual service on a single IP address; consumers interact as if with a single server, without any knowledge of the back-end infrastructure.
3. Linux Virtual Server (LVS) Load Balancer: This is an open-source, enhanced load-balancing solution used to build highly scalable and highly available network services such as HTTP, POP3, FTP, SMTP, media and caching, and Voice over Internet Protocol (VoIP). It is a simple and powerful product made for load balancing and fail-over. The load balancer itself is the primary entry point of the server cluster system and can execute IP Virtual Server (IPVS), which implements transport-layer load balancing in the Linux kernel, also known as Layer-4 switching.
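To make the dispatching idea concrete, the following minimal Python sketch shows two common selection policies a dispatcher might use. It is purely illustrative: the server addresses are made up, and the round-robin and least-connections policies are standard textbook choices, not the exact logic of NetDispatcher or LVS.

import itertools

# Illustrative back-end pool; the addresses are hypothetical.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round-robin: hand out servers in a fixed cyclic order.
_rr = itertools.cycle(SERVERS)

def round_robin():
    return next(_rr)

# Least-connections: track active connections per server and
# dispatch each new request to the least-loaded one.
active = {s: 0 for s in SERVERS}

def least_connections():
    server = min(active, key=active.get)
    active[server] += 1          # connection opened
    return server

def release(server):
    active[server] -= 1          # connection closed

if __name__ == "__main__":
    for _ in range(5):
        print("round-robin ->", round_robin())
    s = least_connections()
    print("least-connections ->", s)
    release(s)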
Cloud Elasticity: Elasticity refers to the ability of a cloud to automatically expand or compress its infrastructural resources in response to sudden rises and drops in requirements, so that the workload can be managed efficiently. This elasticity helps to minimize infrastructural costs. It is not applicable to all kinds of environments; it helps address only those scenarios where the resource requirements fluctuate up and down suddenly for a specific interval of time.
It works in such a way that when the number of client accesses grows, applications are automatically provisioned with extra computing, storage, and network resources such as CPU, memory, storage, or bandwidth, and when there are fewer clients, those resources are automatically reduced as per requirement.
It is most commonly used in pay-per-use public cloud services, where IT managers are willing to pay only for the duration for which they consumed the resources.
Example: Consider an online shopping site whose transaction workload increases during a festive season like Christmas. For this specific period of time, the resources need to be spiked up. To handle this kind of situation, we can opt for a Cloud Elasticity service rather than Cloud Scalability. As soon as the season is over, the deployed resources can be requested for withdrawal.
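As a rough illustration of this scale-up/scale-down behaviour, the sketch below adjusts a virtual-machine count from observed CPU utilization. The thresholds, VM limits, and utilization samples are assumed figures for demonstration, not values from any real cloud API.

# Minimal threshold-based autoscaler sketch.
# Scale out when average CPU is high, scale in when it is low.
SCALE_OUT_AT = 0.80   # assumed upper utilization threshold
SCALE_IN_AT = 0.30    # assumed lower utilization threshold
MIN_VMS, MAX_VMS = 1, 10

def desired_vm_count(current_vms, avg_cpu):
    if avg_cpu > SCALE_OUT_AT and current_vms < MAX_VMS:
        return current_vms + 1   # provision one more VM
    if avg_cpu < SCALE_IN_AT and current_vms > MIN_VMS:
        return current_vms - 1   # release one VM
    return current_vms           # workload within bounds

# Example: a festive-season traffic spike followed by a lull.
vms = 2
for cpu in [0.85, 0.90, 0.70, 0.25, 0.20]:
    vms = desired_vm_count(vms, cpu)
    print(f"avg_cpu={cpu:.2f} -> {vms} VM(s)")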
Cloud Scalability: Cloud scalability is used to handle a growing workload where good performance is also needed to work efficiently with software or applications. Scalability is commonly used where persistent deployment of resources is required to handle a workload that grows statically.
Example: Consider you are the owner of a company whose database size was small in its earlier days, but as time passed your business grew and the size of your database increased as well. In this case you just need to request your cloud service vendor to scale up your database capacity to handle the heavier workload. This is different from what you read above under Cloud Elasticity: scalability is used to fulfil static needs, while elasticity is used to fulfil the dynamic needs of the organization. Scalability is a similar kind of pay-per-use service provided by the cloud. In conclusion, we can say that scalability is useful where the workload remains high and increases statically.
Types of Scalability:
1. Vertical Scalability (Scale-up): In this kind of scaling, resources are added to a single existing node, for example more CPU or memory on the same server, to increase its capacity.
2. Horizontal Scalability (Scale-out): In this kind of scaling, the resources are added in a horizontal row, i.e., more nodes are added alongside the existing ones.
3. Diagonal Scalability: It is a mixture of both horizontal and vertical scalability, where the resources are added both vertically and horizontally.
5.9 REPLICATION
Replication in cloud computing refers to storing the same data in several different locations, usually with synchronization between these data sources. Replication is done partly for backup and partly to reduce response times, especially for read requests.
The simplest form of data replication in a cloud computing environment is to store a copy of a file, much like copying and pasting in any modern operating system. Replication is the reproduction of the original data in unchanged form. Write accesses are generally expensive under replication. In the frequently encountered master/slave replication, a distinction is made between the original data (the primary data) and the dependent copies. With peer copies (as in version control) the data sets must be merged (synchronized). Sometimes it is important to know which data sets the replicas must contain. Depending on the type of replication, a certain period of time lies between the processing and creation of the primary data and their replication; this period is usually referred to as latency.
Synchronous replication means that a change operation on a data object can only complete successfully once it has also been performed on the replicas. To implement this, a protocol ensures the indivisibility of transactions: the commit protocol.
If there is latency between the processing of the primary data and their replication, the replication is asynchronous, and the replicas are not necessarily identical at every instant. A simple variant of asynchronous replication is file-transfer replication, the transfer of files via FTP or SSH. The data of the replicas then represent only a snapshot of the primary data at a specific time. At the database level this can happen at short intervals: the transaction logs are transported from one server to another and read into the replica database. Assuming an intact network, the latency corresponds to the time interval at which the transaction logs are written. Four methods can be used for replication in cloud computing: merge replication, primary copy, snapshot replication, and standby replication.
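The difference between the two modes can be sketched with a toy in-memory model. This is a minimal sketch only: the replica list, the pending-change queue, and the "ship logs" step are illustrative assumptions, not a real commit protocol.

# Toy primary store and two replicas.
primary, replicas = {}, [{}, {}]
pending = []  # changes not yet applied to replicas (asynchronous mode)

def write_synchronous(key, value):
    # The write completes only after every replica has applied it,
    # mimicking the all-or-nothing guarantee of a commit protocol.
    primary[key] = value
    for r in replicas:
        r[key] = value

def write_asynchronous(key, value):
    # The write completes immediately; replicas lag behind by the
    # latency until the pending changes are shipped and applied.
    primary[key] = value
    pending.append((key, value))

def ship_pending():
    # e.g. periodic transfer of transaction logs to the replicas
    while pending:
        key, value = pending.pop(0)
        for r in replicas:
            r[key] = value

write_asynchronous("x", 1)
print(replicas[0])   # {} -- the replica is only a stale snapshot
ship_pending()
print(replicas[0])   # {'x': 1} -- the replicas have caught up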
5.10 MONITORING
Cloud monitoring is a method of reviewing, observing, and managing the operational workflow in a cloud-
based IT infrastructure. Manual or automated management techniques confirm the availability and
performance of websites, servers, applications, and other cloud infrastructure. This continuous evaluation
of resource levels, server response times, and speed predicts possible vulnerability to future issues before
they arise.
The cloud has numerous moving components, and for top performance, it's critical to ensure that everything comes together seamlessly. This need has led to a variety of monitoring techniques to fit the type of outcome that a user wants. The main types of cloud monitoring are:
Database monitoring
Because most cloud applications rely on databases, this technique reviews processes, queries, availability,
and consumption of cloud database resources. This technique can also track queries and data integrity,
monitoring connections to show real-time usage data. For security purposes, access requests can be
tracked as well. For example, an uptime detector can alert if there’s database instability and can help
improve resolution response time from the precise moment that a database goes down.
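A very small uptime detector of the kind mentioned above could look like the sketch below. The host, port, polling interval, and alerting action are placeholders; a production monitor would also track queries, connections, and resource consumption.

import socket
import time

DB_HOST, DB_PORT = "db.example.internal", 5432  # hypothetical database

def is_reachable(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(interval=30):
    while True:
        if not is_reachable(DB_HOST, DB_PORT):
            # Placeholder alert; a real monitor would page an on-call engineer.
            print("ALERT: database unreachable at", time.ctime())
        time.sleep(interval)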
Website monitoring
A website is a set of files that is stored locally, which, in turn, sends those files to other computers over a
network. This monitoring technique tracks processes, traffic, availability, and resource utilization of cloud-
hosted sites.
Virtual network monitoring
This monitoring type creates software versions of network technology such as firewalls, routers, and load balancers. Because they're designed with software, these integrated tools can give you a wealth of data about their operation. If one virtual router is endlessly overwhelmed with traffic, for example, the network adjusts to compensate. Therefore, instead of swapping hardware, virtualization infrastructure quickly adjusts to optimize the flow of data.
Cloud storage monitoring
This technique tracks multiple analytics simultaneously, monitoring storage resources and processes that are provisioned to virtual machines, services, databases, and applications. This technique is often used to host infrastructure-as-a-service (IaaS) and software-as-a-service (SaaS) solutions. For these applications, you can configure monitoring to track performance metrics, processes, users, databases, and available storage. It provides data to help you focus on useful features or to fix bugs that disrupt functionality.
Virtual machine monitoring
This technique is a simulation of a computer within a computer, that is, virtualization infrastructure and virtual machines. It's usually scaled out in IaaS as a virtual server that hosts several virtual desktops. A monitoring application can track the users, traffic, and status of each machine. You get the benefits of traditional IT infrastructure monitoring with the added benefit of cloud monitoring solutions.
Monitoring is a skill, not a full-time job. In today's world of cloud-based architectures implemented through DevOps projects, developers, site reliability engineers (SREs), and operations staff must collectively define an effective cloud monitoring strategy. Such a strategy should focus on identifying when service-level objectives (SLOs) are not being met, which is likely to negatively affect the user experience (a minimal sketch of such an SLO check follows the list below). So, what are the benefits of leveraging cloud monitoring tools? With cloud monitoring:
Scaling for increased activity is seamless and works in organizations of any size
Dedicated tools (and hardware) are maintained by the host
Tools are used across several types of devices, including desktop computers, tablets, and phones,
so your organization can monitor apps from any location
Installation is simple because infrastructure and configurations are already in place
Your system doesn’t suffer interruptions when local problems emerge, because resources aren’t
part of your organization’s servers and workstations
Subscription-based solutions can keep your costs low
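As referenced above, here is a minimal sketch of checking whether an availability SLO is being met. The 99.9% target and the request counts are assumed figures for illustration, not taken from any particular monitoring tool.

SLO_AVAILABILITY = 0.999  # assumed objective: 99.9% of requests succeed

def slo_met(successes: int, total: int) -> bool:
    if total == 0:
        return True  # no traffic in the window, nothing violated
    return successes / total >= SLO_AVAILABILITY

# Example: 10,000 requests in the window, 15 of them failed.
print(slo_met(successes=9985, total=10000))  # False -> SLO missed, alert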
A private cloud gives you extensive control and visibility. Because systems and the software stack are fully accessible, cloud monitoring is easier when it's operated in a private cloud. Monitoring in public or hybrid clouds, however, can be tough. Let's review the focal points:
Because the data exists between private and public clouds, a hybrid cloud environment presents unique challenges. Limited security and compliance create problems for data access. Your administrator can solve these issues by deciding which data to store in which cloud and which data to update asynchronously.
A private cloud gives you more control, but to promote optimal performance, it’s still wise to
monitor workloads. Without a clear picture of workload and network performance, it’s nearly
impossible to justify configuration or architectural changes or to quantify quality-of-service
implementations.
What is a cloud platform? A cloud platform refers to the operating system and hardware of a server in an
Internet-based data center. It allows software and hardware products to co-exist remotely and at scale.
So, how do cloud platforms work? Enterprises rent access to compute services, such as servers, databases,
storage, analytics, networking, software, and intelligence. Therefore, the enterprises don’t have to set up
and own data centers or computing infrastructure. They simply pay for what they use.
There are several types of cloud platforms; no single one works for everyone. Several models, types, and services are available to help meet the varying needs of users. They include:
Public Cloud: Public cloud platforms are third-party providers that deliver computing resources
over the Internet. Examples include Amazon Web Services (AWS), Google Cloud Platform,
Alibaba, Microsoft Azure, and IBM Bluemix.
Private Cloud: A private cloud platform is exclusive to a single organization. It’s usually in an on-
site data center or hosted by a third-party service provider.
Hybrid Cloud: This is a combination of public and private cloud platforms. Data and applications
move seamlessly between the two. This gives the organization greater flexibility and helps
optimize infrastructure, security, and compliance.
A cloud platform allows organizations to create cloud-native applications, test and build applications, and
store, back up, and recover data. It also allows organizations to analyze data. Organizations can also
stream video and audio, embed intelligence into their operations, and deliver software on-demand on a
global scale.
Cloud Storage is a mode of computer data storage in which digital data is stored on servers in off-site
locations. The servers are maintained by a third-party provider who is responsible for hosting, managing,
and securing data stored on its infrastructure. The provider ensures that data on its servers is always
accessible via public or private internet connections.
Cloud Storage enables organizations to store, access, and maintain data without needing to own and operate their own data centers, moving expenses from a capital expenditure model to an operational one.
Cloud Storage is scalable, allowing organizations to expand or reduce their data footprint depending on
need.
Cloud Storage uses remote servers to save data, such as files, business data, videos, or images. Users
upload data to servers via an internet connection, where it is saved on a virtual machine on a physical
server. To maintain availability and provide redundancy, cloud providers will often spread data to multiple
virtual machines in data centers located across the world. If storage needs increase, the cloud provider will
spin up more virtual machines to handle the load. Users can access data in Cloud Storage through an internet connection and software such as a web portal, browser, or mobile app, via an application programming interface (API).
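For instance, uploading a file through a provider's HTTP API often reduces to an authenticated PUT request. The sketch below assumes the third-party requests library is installed; the endpoint and token are hypothetical placeholders, not a specific vendor's API.

import requests  # third-party HTTP library

# Hypothetical object-storage endpoint and credentials.
ENDPOINT = "https://storage.example.com/buckets/my-bucket"
TOKEN = "REPLACE_WITH_REAL_TOKEN"

def upload(local_path, object_name):
    with open(local_path, "rb") as f:
        resp = requests.put(
            f"{ENDPOINT}/{object_name}",
            data=f,
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
    resp.raise_for_status()  # fail loudly if the upload was rejected

def download(object_name, local_path):
    resp = requests.get(
        f"{ENDPOINT}/{object_name}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    with open(local_path, "wb") as f:
        f.write(resp.content)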
Public
Public Cloud Storage is a model where an organization stores data in a service provider’s data centers that
are also utilized by other companies. Data in public Cloud Storage is spread across multiple regions and is
often offered on a subscription or pay-as-you-go basis. Public Cloud Storage is considered to be “elastic”
which means that the data stored can be scaled up or down depending on the needs of the organization.
Public cloud providers typically make data available from any device such as a smartphone or web portal.
Private
Private Cloud Storage is a model where an organization utilizes its own servers and data centers to store data within its own network. Alternatively, organizations can contract with cloud service providers for dedicated servers and private connections that are not shared by any other organization. Private clouds are typically utilized by organizations that require more control over their data and have stringent compliance and security requirements.
Hybrid
A hybrid cloud model is a mix of private and public cloud storage models. A hybrid cloud storage model allows organizations to decide which data they want to store in which cloud. Sensitive data and data that must meet strict compliance requirements may be stored in a private cloud, while less sensitive data is stored in the public cloud. A hybrid cloud storage model typically has a layer of orchestration to integrate the two clouds. A hybrid cloud offers flexibility and allows organizations to still scale up with the public cloud if the need arises.
Multicloud
A multicloud storage model is when an organization sets up more than one cloud model from more than one cloud service provider (public or private). Organizations might choose a multicloud model if one cloud vendor offers certain proprietary apps, an organization requires data to be stored in a specific country, various teams are trained on different clouds, or the organization needs to serve different requirements that are not stated in the providers' Service Level Agreements. A multicloud model offers organizations flexibility and redundancy.
Cost efficiency
Cloud Storage enables organizations to move from a capital expenditure to an operational expenditure model, allowing them to adjust budgets and resources quickly.
Elasticity
Cloud Storage is elastic and scalable, meaning that it can be scaled up (more storage added) or down (less
storage needed) depending on the organization’s needs.
Flexibility
Cloud Storage offers organizations flexibility on how to store and access data, deploy and budget
resources, and architect their IT infrastructure.
Security
Most cloud providers offer robust security, including physical security at data centers and cutting-edge security at the software and application levels. The best cloud providers offer zero trust architecture, identity and access management, and encryption.
Sustainability
One of the greatest costs when operating on-premises data centers is the overhead of energy consumption.
The best cloud providers operate on sustainable energy through renewable resources.
Redundancy
Redundancy (replicating data on multiple servers in different locations) is an inherent trait in public
clouds, allowing organizations to recover from disasters while maintaining business continuity.
Compliance
Certain industries, such as finance and healthcare, have stringent requirements about how data is stored and accessed. Some public cloud providers offer tools to maintain compliance with applicable rules and regulations.
Disadvantages of Cloud Storage
Latency
Traffic to and from the cloud can be delayed because of network traffic congestion or slow internet
connections.
Control
Storing data in public clouds relinquishes some control over access and management of that data,
entrusting that the cloud service provider will always be able to make that data available and maintain its
systems and security.
Outages
While public cloud providers aim to ensure continuous availability, outages sometimes do occur, making
stored data unavailable.
Cloud Storage provides several use cases that can benefit individuals and organizations. Whether a person
is storing their family budget on a spreadsheet, or a massive organization is saving years of financial data
in a highly secure database, Cloud Storage can be used for saving digital data of all kinds for as long as
needed.
Backup
Data backup is one of the simplest and most prominent uses of Cloud Storage. Production data can be
separated from backup data, creating a gap between the two that protects organizations in the case of a
cyber threat such as ransomware. Data backup through Cloud Storage can be as simple as saving files to a
digital folder such as Google Drive or using block storage to maintain gigabytes or more of important
business data.
Archiving
The ability to archive old data has become an important aspect of Cloud Storage, as organizations
move to digitize decades of old records, as well as hold on to records for governance and compliance
purposes. Google Cloud offers several tiers of storage for archiving data, including coldline storage
and archival storage, that can be accessed whenever an organization needs them.
Disaster recovery
A disaster, natural or otherwise, that wipes out a data center or old physical records need not be the business-crippling event that it was in the past. Cloud Storage allows for disaster recovery so that organizations can continue with their business, even when times are tough.
Data processing
As Cloud Storage makes digital data immediately available, data becomes much more useful on an
ongoing basis. Data processing, such as analyzing data for business intelligence or applying machine
learning and artificial intelligence to large datasets, is possible because of Cloud Storage.
Content delivery
With the ability to save copies of media data, such as large audio and video files, on servers dispersed across the globe, media and entertainment companies can serve their audiences low-latency, always-available content wherever they reside.
Cloud Storage comes in three different types: object, file, and block.
Object
Object storage is a data storage architecture for large stores of unstructured data. It designates each piece
of data as an object, keeps it in a separate storehouse, and bundles it with metadata and a unique identifier
for easy access and retrieval.
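The object model can be pictured as a flat store keyed by identifier, with metadata travelling alongside the data. The following is a toy in-memory sketch; real object stores persist data across many servers, but the shape of the interface is similar.

import uuid
import time

store = {}  # flat namespace: identifier -> object

def put_object(data: bytes, **metadata):
    object_id = str(uuid.uuid4())  # unique identifier for the object
    store[object_id] = {
        "data": data,
        "metadata": {"created": time.time(), "size": len(data), **metadata},
    }
    return object_id

def get_object(object_id):
    return store[object_id]

oid = put_object(b"hello", content_type="text/plain")
print(get_object(oid)["metadata"]["size"])   # 5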
File
File storage organizes data in a hierarchical format of files and folders. File storage is common in personal
computing where data is saved as files and those files are organized in folders. File storage makes it easy
to locate and retrieve individual data items when they are needed. File storage is most often used in
directories and data repositories.
Block
Block storage breaks data into blocks, each with a unique identifier, and then stores those blocks as separate pieces on the server. The cloud network stores those blocks wherever it is most efficient for the system. Block storage is best suited for large volumes of data that require low latency, such as high-performance workloads or databases.
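Splitting data into identified blocks, as described above, can be sketched as follows. The 4-byte block size is only for demonstration; real systems use much larger blocks, such as 4 KiB or more.

import uuid

BLOCK_SIZE = 4  # tiny block size, purely for demonstration

def split_into_blocks(data: bytes):
    # Returns (ordered block ids, id -> block contents) so the
    # blocks can be stored independently and reassembled later.
    blocks, order = {}, []
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = str(uuid.uuid4())
        blocks[block_id] = data[i:i + BLOCK_SIZE]
        order.append(block_id)
    return order, blocks

def reassemble(order, blocks):
    return b"".join(blocks[block_id] for block_id in order)

order, blocks = split_into_blocks(b"block storage demo")
assert reassemble(order, blocks) == b"block storage demo"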
Definition
Application Services (a term often used interchangeably with application management services or application services management) are a pool of services, such as load balancing, application performance monitoring, application acceleration, autoscaling, micro-segmentation, service proxy, and service discovery, needed to optimally deploy, run, and improve applications.
The process of configuring, monitoring, optimizing, and orchestrating these different app services is known as application services management.
Today, organizations that have their own data centers or that use the public cloud handle application services management themselves. In the early days of online adoption, application service providers (ASPs) were companies that delivered applications to end users for a fixed cost. This single-tenant, hosted model was largely replaced by the advent of the Software-as-a-Service (SaaS) delivery model, which is multi-tenant and on-demand.
Cloud App Services are a wide range of specific application services for applications deployed in cloud-
based resources. Services such as load balancing, application firewalling and service discovery can be
achieved for applications running in private, public, hybrid or multi-cloud environments.
Traditional applications were built as monolithic blocks of software. These monolithic applications have long life cycles because any change or update to one function usually requires reconfiguring the entire application. This costly and time-consuming process delays advancements and updates in application development.
Application Modernization Services enable the migration of monolithic, legacy application architectures to
new application architectures that more closely match the business needs of modern enterprises’
application portfolio. Application modernization is often part of an organization’s digital transformation.
An example of this is the use of a microservices architecture where all app services are created
individually and deployed separately from one another. This allows for scaling services based on specific
business needs. Services can also be rapidly changed without affecting other parts of the application.
Application-centric enterprises are choosing microservices architectures to take advantage of flexible
container-based infrastructure models.
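As a small illustration of one such app service, service discovery in a microservices deployment can be reduced to a registry that instances register with and clients query. This is an in-memory toy under assumed names and addresses; production systems use dedicated or DNS-based registries.

import random

registry = {}  # service name -> list of live instance addresses

def register(service, address):
    registry.setdefault(service, []).append(address)

def deregister(service, address):
    registry[service].remove(address)

def discover(service):
    # Pick any registered instance; this pairs naturally with load balancing.
    instances = registry.get(service)
    if not instances:
        raise LookupError(f"no instances of {service!r} registered")
    return random.choice(instances)

register("orders", "10.0.1.5:8080")   # hypothetical instance addresses
register("orders", "10.0.1.6:8080")
print(discover("orders"))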
Part A
1. What is load balancing in cloud computing?
Load balancing is an essential technique used in cloud computing to optimize resource utilization and ensure that no single resource is overburdened with traffic. It is a process of distributing workloads across multiple computing resources, such as servers, virtual machines, or containers, to achieve better performance, availability, and scalability.
Difference between Cloud Elasticity and Cloud Scalability:
1. Elasticity is used just to meet sudden ups and downs in the workload for a small period of time, whereas scalability is used to meet a static increase in the workload.
2. Elasticity is used to meet dynamic changes, where the resource need can increase or decrease, whereas scalability is always used to address an increase in workload in an organization.
3. Elasticity is commonly used by small companies whose workload and demand increase only for a specific period of time, whereas scalability is used by giant companies whose customer circle persistently grows, in order to carry out operations efficiently.
4. Elasticity is short-term planning, adopted just to deal with an unexpected increase in demand or seasonal demands, whereas scalability is long-term planning, adopted to deal with an expected increase in demand.
Part B
Part C
1. Explain storage services and application services.
REFERENCES
1. Kshemkalyani, Ajay D., and Mukesh Singhal, "Distributed Computing: Principles, Algorithms, and Systems", Cambridge University Press, 2011.
2. George Coulouris, Jean Dollimore and Tim Kindberg, "Distributed Systems: Concepts and Design", Fifth Edition, Pearson Education, 2012.
3. Pradeep K. Sinha, "Distributed Operating Systems: Concepts and Design", Prentice Hall of India, 2007.
4. Mukesh Singhal and Niranjan G. Shivaratri, "Advanced Concepts in Operating Systems", McGraw-Hill, Inc., 1994.
5. Tanenbaum A. S., Van Steen M., "Distributed Systems: Principles and Paradigms", Pearson Education, 2007.
6. Liu M. L., "Distributed Computing: Principles and Applications", Pearson Education, 2004.
7. Nancy A. Lynch, "Distributed Algorithms", Morgan Kaufmann Publishers, USA, 2003.
ANNA UNIVERSITY
MODEL
QUESTION PAPERS
13. (a) Compare the various distributed mutual exclusion algorithms with diagrams. (13)
(OR)
(b) Explain Knapp's classification with suitable examples. (13)
14. (a) Explain in detail checkpoint-based recovery and log-based rollback recovery. (13)
(OR)
(b) Discuss about agreement in synchronous systems with failures. (13)
15. (a) Explain about the working of Tapestry in detail. (13)
(OR)
(b) Describe the memory consistency models. (13)
PART – C (1x15=15 Marks)
16. (a) Design the algorithms for asynchronous checkpointing and recovery. (15)
(OR)
(b) Explain shared memory mutual exclusion in detail. (15)