Distributed Systems Ch2-2022
Distributed Systems:
Benefits, Issues and Challenges
22/03/2022
Resource Sharing
l Each node can access and utilize the available resources of the other nodes in the system.
l Examples:
l At a given point in time, a user at node A may access and utilize a laser printer resource (via a print
daemon) located at node B.
l At the same time, a user at node B may access a file (via a network file server) that resides on node
A.
l Benefits:
l We can directly access resources that would otherwise be unavailable to us.
l We can share our own resources with our friends / colleagues.
l Issues:
l How do we control access to the resource?
l What happens if a resource becomes unavailable?
Concurrency
l Multiple machines allow multiple users to work in parallel.
l This means the system must be able to handle multiple tasks concurrently.
l The tasks themselves can be decomposed into subtasks that can be executed at the same time on
different machines.
l Examples:
l Five different users attempt to print a 100 page document at the same time.
l 10,000 users attempt to access the same website at the same time.
l Consider an online auction: if two bidders (Farid and Djamel) place bids at the same time, how
can we ensure that both bids are recorded correctly?
Concurrency
l Benefits:
l Computational Speedup – the system can exploit parallelism to reduce the time required to complete
a task.
l Issues:
l Too many users trying to do the same thing can lead to resources becoming overloaded, thus
increasing the execution time of that task.
l Concurrency has costs in terms of task management, network communication overheads, …
l How can we ensure that (possibly conflicting) operations are performed correctly?
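The auction question above can be illustrated with a minimal single-process sketch. It assumes a hypothetical `Auction` class whose compare-and-update step is protected by a lock; in a real distributed system the same guarantee would have to come from the server or a coordination service rather than from an in-memory lock.

```python
import threading


class Auction:
    """Hypothetical in-memory auction; all names are illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self.highest_bid = 0
        self.highest_bidder = None
        self.history = []  # every bid received, in the order it was applied

    def place_bid(self, bidder, amount):
        # The lock makes "record, compare, update" one atomic step, so two
        # simultaneous bids cannot interleave and overwrite each other.
        with self._lock:
            self.history.append((bidder, amount))
            if amount > self.highest_bid:
                self.highest_bid = amount
                self.highest_bidder = bidder
                return True
            return False


auction = Auction()
bids = [("Farid", 120), ("Djamel", 150)]
threads = [threading.Thread(target=auction.place_bid, args=bid) for bid in bids]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread runs first, both bids end up in `history` and the higher bid wins.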
Global Time
l Some tasks carried out by a distributed system require a shared concept of time.
l Examples:
l In a banking system, all transactions must be recorded using a single global clock to ensure that those
transactions are applied to customer accounts in the correct order.
l Two spray paint robots in a car manufacturing plant that try to coordinate their activities in order to
speed up the painting of the car.
l Issues:
l A Distributed System has, by default, no global clock. Any solution that requires a global clock
must therefore include an appropriate implementation.
l Unfortunately, all current approaches to implementing a global clock (as we will see in this course) are
limited.
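One well-known substitute for a global clock is a logical clock in the style of Lamport. The sketch below is illustrative only and is not necessarily the approach this course covers; the class and method names are assumptions. It orders causally related events consistently but says nothing about physical time, which is one example of the limitations mentioned above.

```python
class LamportClock:
    """Minimal logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the new timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past both the local time and the sender's timestamp, so the
        # receive event is always ordered after the matching send event.
        self.time = max(self.time, msg_time) + 1
        return self.time


node_a = LamportClock()
node_b = LamportClock()

node_a.tick()                 # local event at A
stamp = node_a.send()         # A sends a message carrying its timestamp
node_b.receive(stamp)         # B receives it and adjusts its own clock
```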
Reliability
l Reliability refers to the ability of a system to continue functioning after failure of a component.
l System perspective: If tasks are distributed over a number of nodes, then the failure of one node will
not affect the operation of the other nodes.
l User perspective: If a particular resource becomes unavailable, I can search the distributed system for
another suitable resource.
A DS is “one on which I cannot get any work done because some machine I have never heard of has
crashed”
– L. Lamport
Scalability
l Scalability is concerned with how well a system can handle an increasing number of users and
resources.
l In theory, a DS would scale well, because all we have to do is to add more machines and make the
rest of the system aware of them.
l Unfortunately, this is NOT EASY to do in practice!
l Benefits:
l The system can adapt to changing requirements both in terms of the number of users and the number
of resources.
l Issues:
l Does the new machine have the right resources?
l What impact does adding the machine have on the network usage?
l How do the additional machines affect the reliability of the system?
Heterogeneity
l Heterogeneity:
l The heterogeneity of a Distributed System refers to its ability to work with different
Networks, Computer Hardware, Operating Systems, Programming Languages, …
l Examples:
l The Internet supports heterogeneity through the Internet Protocol Suite.
l Communication between programs written in different languages must cater for different type
encoding schemes (e.g. the representation of an int).
l Programs developed for standardised frameworks must operate on all implementations of that
framework (e.g. EJB, Servlets).
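The point about type encoding can be made concrete with a small sketch. It assumes two hypothetical helpers, `encode_int` and `decode_int`, that agree on network (big-endian) byte order for a 32-bit integer, and it shows what happens when a receiver assumes a different byte order.

```python
import struct


def encode_int(value):
    # "!i" = network (big-endian) byte order, 4-byte signed integer.
    return struct.pack("!i", value)


def decode_int(data):
    return struct.unpack("!i", data)[0]


value = 1025                              # 0x00000401
big = struct.pack(">i", value)            # big-endian layout
little = struct.pack("<i", value)         # little-endian layout

# A receiver that wrongly assumes little-endian reads a different number.
misread = struct.unpack("<i", big)[0]

# Agreeing on one wire format avoids the problem.
round_trip = decode_int(encode_int(value))
```

The same four bytes decode to 1025 or to 17039360 depending on the assumed byte order, which is why both sides must agree on a single external representation.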
Openness
l Openness refers to the degree to which a system is extensible.
l In DS, this often refers to the potential of adding and integrating new resource-sharing services.
l Openness requires that key software interfaces be made available to developers (published) and
that a uniform communication mechanism be defined.
l Open distributed systems can be constructed from heterogeneous hardware and software,
possibly from different vendors.
l In practice, openness can often only be achieved on a limited scale (e.g. Web Services).
Security
l Security is concerned with the protection of (information) resources that may be of value to
their owner, but which need to be shared.
l E.g. credit card information on the internet, FBI access to regional police files.
l In publicly accessible systems, the security challenge must also deal with issues such as
Denial of Service Attacks.
Fault Tolerance
l Computer systems sometimes fail.
l Hardware faults may cause errors in processes, or even stop the process from finishing.
l In a distributed system, failure is partial – the failure of one component does not cause the whole
system to fail.
Transparency
l The concealment from the user of the specific set of components that make up a distributed
system.
l Software artifact requirements: these deal with matters such as modularity, reusability,
choice of programming language, adherence to a specific programming style, etc.
Performance
l Responsiveness
l Users of interactive applications require a fast and consistent response to interaction.
l The key factor is the response time
l Because client systems often need to access shared resources, response times for remote services
can be affected by:
l Load and performance of the server
l Latency and performance of the Network
l Time costs associated with the use of middleware services and OS communication services.
l The processing time for the actual code that implements the service
l Maximum responsiveness is often achieved by:
l Minimising the number of software layers used by the application
l Minimising the quantity of data that is transferred between the client and the server.
l Example: Web Browsing
l Locally cached web pages and images load more quickly than similarly sized remote ones, and simple
text-based pages load more quickly than graphically intensive pages.
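The factors listed above can be combined into a rough additive model of response time. This is a deliberately simplified sketch: the function name, the parameter names and all the figures are assumptions chosen for illustration, not measurements.

```python
def response_time_ms(server_queue_ms, network_ms, middleware_ms, service_ms):
    """Rough model: response time as the sum of the four listed factors."""
    return server_queue_ms + network_ms + middleware_ms + service_ms


# Hypothetical remote request: loaded server, network round trip,
# middleware overhead, and the service code itself.
remote = response_time_ms(server_queue_ms=3.0, network_ms=14.0,
                          middleware_ms=1.0, service_ms=2.0)

# Hypothetical locally cached page: no server queue and no network cost.
cached = response_time_ms(server_queue_ms=0.0, network_ms=0.0,
                          middleware_ms=0.5, service_ms=0.5)

speedup = remote / cached
```

Under these assumed figures the network term dominates the remote case, which is consistent with the web browsing example: removing the network and server costs is what makes the cached page faster.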
Performance
l Throughput
l Throughput is the rate at which computational work is done.
l It is similar to responsiveness, with the exception that it focuses on task completion times rather than
interface update times.
l However, in contrast with responsiveness, it is affected by the processing speeds of both the client
and the server.
l When analysing throughput, it is also vital to consider the throughput of each of the software layers
that the architecture employs.
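One reason to look at every layer is that, when each request passes through the layers in series, the slowest layer bounds the throughput of the whole path. The sketch below assumes hypothetical layer names and rates (requests per second) purely for illustration.

```python
def pipeline_throughput(layer_rates):
    """Upper bound on end-to-end throughput for layers used in series."""
    return min(layer_rates.values())


def bottleneck(layer_rates):
    """Name of the layer with the lowest throughput."""
    return min(layer_rates, key=layer_rates.get)


# Hypothetical rates in requests per second for each layer.
layers = {"client": 900, "middleware": 400, "network": 650, "server": 1200}

limit = pipeline_throughput(layers)
slowest = bottleneck(layers)
```

With these assumed rates, making the server faster would not help; only improving the middleware layer raises the bound.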
Quality of Service
l Quality of Service is measured in terms of reliability, security, and performance.
l In the previous slides performance was measured in terms of responsiveness and throughput.
l Another factor that is sometimes associated with performance is time-criticality:
l The handling of data that must be processed or transferred from one process to another at a fixed rate or
within strict delivery deadlines.
l For example, a video streaming service must be designed to ensure that successive frames of
video are delivered in sufficient time for the client to be able to display the video to the service
users.
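The delivery deadline idea can be sketched as a simple check on frame arrival times. The function name, the frame rate, the start-up delay and the arrival times below are all assumptions made for illustration.

```python
def late_frames(arrival_ms, fps, startup_delay_ms):
    """Return the indices of frames that arrive after their display deadline.

    Frame i is due at startup_delay_ms + i * (1000 / fps) milliseconds.
    """
    interval_ms = 1000 / fps
    late = []
    for index, arrived in enumerate(arrival_ms):
        deadline = startup_delay_ms + index * interval_ms
        if arrived > deadline:
            late.append(index)
    return late


# Hypothetical arrival times (ms) for five frames of a 25 fps stream
# that starts playing after 100 ms of buffering.
arrivals = [30, 70, 190, 200, 250]
missed = late_frames(arrivals, fps=25, startup_delay_ms=100)
```

Here frame 2 is due at 180 ms but arrives at 190 ms, so it misses its deadline even though the average delivery rate is adequate.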
Dependability
l Dependability is a key requirement for many critical and commercial systems (e.g. nuclear
power plant control systems, financial systems) and can be defined in terms of correctness,
security, and fault tolerance.
l Fault Tolerance:
l Dependable systems should continue to function correctly in the presence of faults in hardware,
software, and networks.
l The key tool for achieving this is redundancy: the provision of multiple resources so that the system and
application software can reconfigure and continue to function correctly.
l At the architectural level, reliability is achieved through redundancy via the use of multiple computers
at which each component process can run and multiple communication paths through which
messages can be transmitted.
l Multiple copies (replicas) of both data and processes should be maintained to ensure that the required
level of fault tolerance can be achieved.
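Redundancy through replicas can be sketched as a client that tries each replica in turn until one answers. The replica names, the `ReplicaUnavailable` exception and the helper functions are hypothetical; a real system would also have to keep the replicas consistent, which this sketch ignores.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica cannot serve the request."""


def make_replica(name, up):
    # Build a stand-in for a remote replica that is either up or down.
    def read(key):
        if not up:
            raise ReplicaUnavailable(name)
        return f"{key}@{name}"
    return read


def read_with_failover(replicas, key):
    # Try each replica in order; the request fails only if all of them fail.
    for read in replicas:
        try:
            return read(key)
        except ReplicaUnavailable:
            continue
    raise ReplicaUnavailable("all replicas failed")


replicas = [make_replica("node-a", up=False), make_replica("node-b", up=True)]
result = read_with_failover(replicas, "balance")
```

The failure of `node-a` is masked from the caller because `node-b` holds a copy of the same data.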
Exercises
l A user arrives at a railway station that she has never visited before, carrying a PDA that is
capable of wireless networking. Suggest how the user could be provided with information about
the local services and amenities at that station, without entering the station’s name or
attributes. What technical challenges must be overcome?
l Describe and illustrate the client-server architecture of one or more major Internet applications
(for example the Web, email or netnews).
l A client sends a 200 byte request message to a service, which produces a response containing
5000 bytes. Estimate the total time to complete the request in each of the following cases, with
the performance assumptions listed below:
- Using connectionless (datagram) communication (for example, UDP);
- Using connection-oriented communication (for example, TCP);