Distributed Systems Ch2-2022

This document summarizes key concepts in distributed systems including benefits, issues, and challenges. It discusses decomposing a system into components, resource sharing, concurrency, global time, reliability, scalability, heterogeneity, openness, security, fault tolerance, and transparency. The main benefits are resource sharing and exploiting parallelism. The main challenges are complexity, controlling access, handling concurrency, implementing a global clock, reliability, scalability issues, heterogeneity, and security concerns.

Uploaded by

Anis MOHAMMEDI

22/03/2022

Distributed Systems:
Benefits, Issues and Challenges

Prof. Tahar KECHADI

School of Computer Science

University College Dublin, Ireland

Benefits, Issues and Challenges

l Creating a Distributed System is not trivial:

l It requires the decomposition of the system into a number of components.
l At the very least, we must consider:
l What will these components do?
l Where will they be located?
l Which components will interact with which?
l How will they interact with one another to realise the system goals?
l This can often result in a system that is more complex than a single centralised system.


Resource Sharing
l Each node can access and utilize the available resources of the other nodes in the system.

l Examples:
l At a given point in time, a user at node A may access and utilize a laser printer resource (via a print
daemon) located at node B.

l At the same time, a user at node B may access a file (via a network file server) that resides on node
A.

l Benefits:
l This means that we can directly access resources that we would otherwise not be able to.
l We can share our own resources with our friends / colleagues.

l Issues:
l How do we control access to the resource?
l What happens if a resource becomes unavailable?

Concurrency
l Multiple machines allow multiple users to work in parallel.
l This means the system must be able to handle multiple tasks concurrently.
l The tasks themselves can be decomposed into subtasks that can be executed at the same time on
different machines.

l Examples:
l Five different users attempt to print a 100-page document at the same time.
l 10,000 users attempt to access the same website at the same time.
l Consider an online auction: if two bidders (Farid and Djamel) place bids at the same time, how
can we ensure that their bids are recorded correctly?


Concurrency
l Benefits:
l Computational Speedup – the system can exploit parallelism to reduce the time required to complete
a task.

l Issues:
l Too many users trying to do the same thing can lead to resources becoming overloaded, thus
increasing the execution time of that task.
l Concurrency has costs in terms of task management, network communication overheads, …
l How can we ensure that (possibly conflicting) operations are performed correctly?
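
To make the auction question concrete, here is a minimal sketch, assuming a single auction server process that serialises bids with a lock. The class `Auction` and its methods are hypothetical names chosen for illustration; a real distributed auction would also need to deal with network delays and server failures.

```python
import threading


class Auction:
    """Records bids so that concurrent bidders cannot corrupt the highest bid."""

    def __init__(self):
        self._lock = threading.Lock()
        self.highest_bid = 0
        self.highest_bidder = None
        self.history = []  # every accepted bid, in the order it was applied

    def place_bid(self, bidder, amount):
        # The check and the update must form one atomic step; otherwise two
        # threads could both pass the check and overwrite each other.
        with self._lock:
            if amount <= self.highest_bid:
                return False
            self.highest_bid = amount
            self.highest_bidder = bidder
            self.history.append((bidder, amount))
            return True


def bidder_thread(auction, name, amounts):
    for amount in amounts:
        auction.place_bid(name, amount)


auction = Auction()
farid = threading.Thread(target=bidder_thread, args=(auction, "Farid", range(1, 200, 2)))
djamel = threading.Thread(target=bidder_thread, args=(auction, "Djamel", range(2, 201, 2)))
farid.start()
djamel.start()
farid.join()
djamel.join()

print(auction.highest_bidder, auction.highest_bid)
```

Whatever the interleaving of the two threads, the largest bid always wins and the accepted bids form a strictly increasing sequence, because each bid is checked and applied while holding the lock.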

Global Time
l Some tasks carried out by a distributed system require a shared concept of time.

l Examples:
l In a banking system, all transactions must be recorded using a single global clock to ensure that those
transactions are applied to customer accounts in the correct order.
l Two spray-paint robots in a car manufacturing plant that coordinate their activities in order to
speed up the painting of a car.

l Issues:
l A Distributed System has, by default, no global clock, therefore any solution requiring a global clock
must include an appropriate implementation.
l Unfortunately, all current approaches to implementing a global clock (as we will see in this course) are
limited.
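
As an illustration of one such limited approach, here is a minimal sketch of a Lamport logical clock. The slides do not say which techniques the course covers, so this choice is an assumption. A logical clock orders events consistently with message passing, but it says nothing about real (wall-clock) time.

```python
class LamportClock:
    """Logical clock: a counter that respects the send-before-receive order."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Any local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, message_time) + 1
        return self.time


node_a = LamportClock()
node_b = LamportClock()

node_a.tick()                         # local event on A
node_a.tick()                         # another local event on A
sent_at = node_a.send()               # A sends a message to B
node_b.tick()                         # unrelated local event on B
received_at = node_b.receive(sent_at)

print(sent_at, received_at)
```

The receive timestamp is always larger than the send timestamp, even though node B's own counter was behind node A's when the message arrived.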


Reliability
l Reliability refers to the ability of a system to continue functioning after failure of a component.
l System perspective: If tasks are distributed over a number of nodes, then the failure of one node will
not affect the operation of the other nodes.
l User perspective: If a particular resource becomes unavailable, I can search the distributed system for
another suitable resource.

l Reliability is often related to the level of redundancy in the system.

l I.e. how many copies of each resource are there in the system? How many key components are there in the system?
l In theory: A Distributed System is more reliable than a single machine due to replication of resources
and distribution of workload.
l In practice: This is not always the case!

A DS is “one on which I cannot get any work done because some machine I have never heard of has
crashed”
– L. Lamport
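
The gap between theory and practice can be sketched with a simple probability model. It assumes that components fail independently and that each is up with the same probability, which is a simplification; the function names are hypothetical.

```python
def parallel_availability(a, n):
    """Probability that at least one of n independent replicas is up."""
    return 1 - (1 - a) ** n


def serial_availability(a, n):
    """Probability that all n independent components are up at the same time."""
    return a ** n


# Three replicas of one resource, each up 90% of the time.
replicated = parallel_availability(0.9, 3)   # about 0.999

# A task that needs three different machines, each up 90% of the time.
chained = serial_availability(0.9, 3)        # about 0.729

print(round(replicated, 3), round(chained, 3))
```

Replication raises availability above that of a single machine, while a task that depends on several machines at once is less available than any one of them. The second case is the situation the Lamport quote describes.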

Scalability
l Scalability is concerned with how well a system can handle an increasing number of users and
resources.
l In theory, a DS would scale well, because all we have to do is to add more machines and make the
rest of the system aware of them.
l Unfortunately, this is NOT EASY to do in practice!

l Benefits:
l The system can adapt to changing requirements both in terms of the number of users and the number
of resources.

l Issues:
l Does the new machine have the right resources?
l What impact does adding the machine have on the network usage?
l How do the additional machines affect the reliability of the system?


Heterogeneity

l Heterogeneity:
l The heterogeneity of a Distributed System refers to its ability to work with different
Networks, Computer Hardware, Operating Systems, Programming Languages, …

l Examples:
l The Internet supports heterogeneity through the Internet Protocol Suite.
l Communication between programs written in different languages must cater for different type
encoding schemes (e.g. the representation of an int).
l Programs developed for standardised frameworks must operate on all implementations of that
framework (e.g. EJB, Servlets).
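
The int encoding issue can be sketched with Python's standard `struct` module. The example assumes a 32-bit signed integer and uses network (big-endian) byte order as the agreed wire format; `encode_int` and `decode_int` are hypothetical helper names.

```python
import struct

value = 1025  # 0x00000401

# The same integer has different byte layouts on different machines.
big_endian = struct.pack(">i", value)      # b'\x00\x00\x04\x01'
little_endian = struct.pack("<i", value)   # b'\x01\x04\x00\x00'

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<i", big_endian)[0]   # 17039360


def encode_int(n):
    """Encode a 32-bit signed int in network byte order ('!')."""
    return struct.pack("!i", n)


def decode_int(data):
    """Decode a 32-bit signed int that was sent in network byte order."""
    return struct.unpack("!i", data)[0]


print(misread, decode_int(encode_int(value)))
```

When both sides agree on one external representation, the value survives the round trip regardless of the native byte order of either machine.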

Openness
l Openness refers to the degree to which a system is extensible.
l In DS, this often refers to the potential of adding and integrating new resource-sharing services.
l Openness requires that key software interfaces be made available to developers (published) and
that a uniform communication mechanism be defined.
l Open distributed systems can be constructed from heterogeneous hardware and software,
possibly from different vendors.

l In practice, openness can often only be achieved on a limited scale (e.g. Web Services).


Security
l Security is concerned with the protection of (information) resources that may be of value to
their owner, but which need to be shared.
l E.g. credit card information on the internet, FBI access to regional police files.

l Security of this information often takes two forms:

l Security of information transmitted within the distributed system (encryption).
l Security of access to a resource (proof of identity).

l In publicly accessible systems, the security challenge must also deal with issues such as
Denial of Service Attacks.
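
As an illustration of proof of identity, here is a minimal challenge-response sketch using Python's standard `hmac` module. It assumes the client and the server already share a secret key, and it does not cover encryption of the transmitted data; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: a fresh random value, so old responses cannot be replayed."""
    return secrets.token_bytes(16)


def respond(shared_key, challenge):
    """Client side: prove knowledge of the key without sending the key itself."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = secrets.token_bytes(32)        # assumed to be shared in advance
challenge = make_challenge()

genuine = verify(key, challenge, respond(key, challenge))
impostor = verify(key, challenge, respond(b"not the real key", challenge))

print(genuine, impostor)
```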

Fault Tolerance
l Computer systems sometimes fail.
l Hardware faults may cause errors in processes, or even stop the process from finishing.
l In a distributed system, failure is partial – the failure of one component does not cause the whole
system to fail.

l In designing a distributed system we must consider:

l Failure Detection
l Failure Masking (hiding the failure)
l Failure Tolerance
l Recovery from Failure
l Redundancy
l Level of Availability
l e.g. 3 nines (99.9%) = 8.76 hours of downtime/year, 5 nines (99.999%) = about 5.26 minutes of downtime/year
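
The availability figures can be checked with a short calculation. This is a sketch that assumes a 365-day year; `downtime_hours_per_year` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(nines):
    """Allowed downtime for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


three_nines_hours = downtime_hours_per_year(3)          # about 8.76 hours
five_nines_minutes = downtime_hours_per_year(5) * 60    # about 5.26 minutes

print(round(three_nines_hours, 2), round(five_nines_minutes, 2))
```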


Transparency
l The concealment from the user of the specific set of components that make up a distributed
system.

l [ISO 1992] Reference Model for Open Distributed Processing:

l Access: enables local and remote resources to be accessed using identical operations (e.g. folder
containing remote & local files).
l Location: enables resources to be accessed without knowledge of their physical or network
location (e.g. domain based URLs).
l Concurrency: enables several processes to operate concurrently using shared resources without
interference between them.
l Replication: enables multiple instances of resources to be used to increase reliability and
performance without knowledge of the replicas.
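
Access transparency can be sketched as two resource types that offer the same operation. The classes below are hypothetical, and the "remote node" is only a dictionary standing in for a network call.

```python
class LocalFile:
    """A file held on this node."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


class RemoteFile:
    """A file held on another node; a dict stands in for the remote storage."""

    def __init__(self, node_storage, name):
        self._node_storage = node_storage
        self._name = name

    def read(self):
        # A real implementation would send a request over the network here.
        return self._node_storage[self._name]


def word_count(resource):
    # The caller uses the same operation whether the resource is local or remote.
    return len(resource.read().split())


node_b = {"report.txt": "distributed systems are hard to build"}
local = LocalFile("resource sharing and concurrency")
remote = RemoteFile(node_b, "report.txt")

counts = [word_count(local), word_count(remote)]
print(counts)
```

The function `word_count` never needs to know where the data lives, which is the property the access and location transparencies aim for.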

Distributed System Challenges

l [ISO 1992] Reference Model for Open Distributed Processing:

l Failure: enables the concealment of faults, allowing users and application programs to complete
their tasks despite the failure of hardware or software components (e.g. emails delivered even
when the mail server is down).
l Mobility: allows the movement of resources and clients within a system without affecting the
operation of users or programs.
l Performance: allows the system to be reconfigured to improve performance as loads vary.
l Scaling: allows the system and applications to expand in scale without change to the system
structure or the application algorithms.


Distributed Software Systems

l Functionality requirements. These requirements specify an input/output relationship. To
satisfy them, the system must implement a function that returns the specified output for
each given input.

l Run-time requirements. These are requirements on run-time behaviour such as performance,
distribution, the underlying machine (virtual or otherwise), etc.

l Software artifact requirements. These requirements deal with things such as
modularity, reusability, choice of programming language, adherence to a specific programming
style, etc.

Performance
l Responsiveness
l Users of interactive applications require a fast and consistent response to interaction.
l The key factor is the response time
l Because client systems often need to access shared resources, response times for remote services
can be affected by:
l Load and performance of the server
l Latency and performance of the Network
l Time costs associated with the use of middleware services and OS communication services.
l The processing time for the actual code that implements the service
l Maximum responsiveness is often achieved by:
l Minimising the number of software layers used by the application
l Minimising the quantity of data that is transferred between the client and the server.
l Example: Web Browsing
l Locally cached web pages and images load more quickly than similarly sized remote ones, and simple
text-based pages load more quickly than graphically intensive pages.
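
The factors listed above can be combined into a rough response-time model. This is only a sketch: the formula is a simplification, and all numbers below are assumptions chosen for illustration.

```python
def response_time(request_bytes, reply_bytes, latency_s,
                  bandwidth_bytes_per_s, server_time_s, middleware_time_s=0.0):
    """Rough estimate: two one-way latencies, transfer time, and processing."""
    transfer_s = (request_bytes + reply_bytes) / bandwidth_bytes_per_s
    return 2 * latency_s + transfer_s + server_time_s + middleware_time_s


# Assumed link: 20 ms one-way latency, 1,000,000 bytes/s; server needs 5 ms.
graphical_page = response_time(300, 500_000, 0.02, 1_000_000, 0.005)
text_page = response_time(300, 50_000, 0.02, 1_000_000, 0.005)

# Each extra software layer adds its own cost on top of the same request.
with_middleware = response_time(300, 50_000, 0.02, 1_000_000, 0.005,
                                middleware_time_s=0.01)

print(graphical_page > text_page, with_middleware > text_page)
```

In this model, shrinking the reply and removing a software layer both reduce the estimate, which matches the two recommendations on the slide.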


Performance
l Throughput
l Throughput is the rate at which computational work is done.
l It is similar to responsiveness, with the exception that it focuses on task completion times rather than
interface update times.
l However, in contrast with responsiveness, it is affected by the processing speeds of both the client
and the server.
l When analysing throughput, it is also vital to consider the throughput of each of the software layers
that the architecture employs.
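
One way to see why every layer matters: when a request must pass through the layers one after another, the sustained throughput cannot exceed that of the slowest layer. The sketch below assumes such a serial arrangement; the layer names and rates are made up.

```python
def pipeline_throughput(layer_rates):
    """Requests per second sustained by layers in series: the slowest layer wins."""
    return min(layer_rates.values())


def bottleneck(layer_rates):
    """Name of the layer that limits overall throughput."""
    return min(layer_rates, key=layer_rates.get)


# Assumed capacities in requests per second.
layers = {"client": 900.0, "middleware": 400.0, "network": 1200.0, "server": 650.0}

print(bottleneck(layers), pipeline_throughput(layers))
```

Speeding up the server alone would not help here, because the middleware layer is the limiting factor.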

Quality of Service
l Quality of Service is measured in terms of reliability, security, and performance.

l More recently, Quality of Service measures have begun to include adaptability:

l The ability of the system to adapt to changing system configurations and resource availability.

l In the previous slides performance was measured in terms of responsiveness and throughput.
l Another factor that is sometimes associated with performance is time-criticality:
l The handling of data that must be processed or transferred from one process to another at a fixed rate or
within strict delivery deadlines.
l For example, a video streaming service must be designed to ensure that the successive frames of
video must be delivered in sufficient time for the client to be able to display the video to the service
users.
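
The video example can be sketched as a deadline check. It assumes that frame i must be displayed at a fixed start-up delay plus i divided by the frame rate; the arrival times are made up and `late_frames` is a hypothetical helper.

```python
def late_frames(arrival_times_s, fps, startup_delay_s):
    """Return the indices of frames that arrive after their playout deadline."""
    late = []
    for i, arrival in enumerate(arrival_times_s):
        deadline = startup_delay_s + i / fps  # when frame i must be on screen
        if arrival > deadline:
            late.append(i)
    return late


# Arrival times in seconds; the network stalls before the third frame.
arrivals = [0.00, 0.05, 0.30, 0.31, 0.32]

short_buffer = late_frames(arrivals, fps=25, startup_delay_s=0.1)
medium_buffer = late_frames(arrivals, fps=25, startup_delay_s=0.2)
long_buffer = late_frames(arrivals, fps=25, startup_delay_s=0.3)

print(short_buffer, medium_buffer, long_buffer)
```

A longer start-up delay (more buffering) lets the client absorb the stall, at the cost of making the user wait longer before playback begins.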


Dependability
l Dependability is a key requirement for many critical and commercial systems (e.g. nuclear
power plant control systems, financial systems) and can be defined in terms of correctness,
security, and fault tolerance.

l Fault Tolerance:
l Dependable systems should continue to function correctly in the presence of faults in hardware,
software, and networks.
l The key tool for achieving this is redundancy: the provision of multiple resources so that the system and
application software can reconfigure and continue to function correctly.
l At the architectural level, reliability is achieved through redundancy via the use of multiple computers
at which each component process can run and multiple communication paths through which
messages can be transmitted.
l Multiple copies (replicas) of both data and processes should be maintained to ensure that the required
level of fault tolerance can be achieved.
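
A minimal sketch of how replicas can mask a failure follows. The classes `Replica` and `ReplicatedStore` are hypothetical, replicas are plain objects rather than separate machines, and the sketch ignores the problem of bringing a recovered replica back up to date.

```python
class Replica:
    """One copy of the data; `up` simulates whether the machine is reachable."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Writes to every reachable replica and reads from the first one that answers."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        stored = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                stored += 1
            except ConnectionError:
                pass  # skip the failed replica; the others still hold the data
        if stored == 0:
            raise ConnectionError("no replica reachable")
        return stored

    def get(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ConnectionError:
                continue  # mask the failure by trying the next replica
        raise ConnectionError("no replica reachable")


replicas = [Replica("node-1"), Replica("node-2"), Replica("node-3")]
store = ReplicatedStore(replicas)
copies = store.put("balance", 100)

replicas[0].up = False            # one machine crashes
value = store.get("balance")      # the read is still served

print(copies, value)
```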

Exercises
l A user arrives at a railway station that she has never visited before, carrying a PDA that is
capable of wireless networking. Suggest how the user could be provided with information about
the local services and amenities at that station, without entering the station’s name or
attributes. What technical challenges must be overcome?

l Describe and illustrate the client-server architecture of one or more major Internet applications
(for example the Web, email or netnews).

l A client sends a 200 byte request message to a service, which produces a response containing
5000 bytes. Estimate the total time to complete the request in each of the following cases, with
the performance assumptions listed below:
- Using connectionless (datagram) communication (for example, UDP);
- Using connection-oriented communication (for example, TCP);