CS542: Topics in Distributed Systems

Diganta Goswami

Distributed System

• A collection of independent computers that appears to its users as a single coherent system.
– Autonomous computers
– Many components, connected by a network, sharing resources.

Distributed System

• A system of networked components that communicate and coordinate their actions only by passing messages
– concurrent execution of programs
– no global clock
– components fail independently of one another
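
As an illustration of the message-passing model, here is a minimal Python sketch in which a thread stands in for a component and two `queue.Queue` objects stand in for the network. The names `node_b`, `inbox` and `outbox` and the `None` shutdown message are hypothetical, and the channel here is reliable and ordered, which a real network is not.

```python
import queue
import threading

def node_b(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A component whose state is private; it interacts only via messages."""
    handled = 0                      # local state, never shared with the caller
    while True:
        msg = inbox.get()            # block until a message arrives
        if msg is None:              # hypothetical shutdown message
            outbox.put(("done", handled))
            return
        handled += 1
        outbox.put(("ack", msg))     # reply by sending, not by writing shared data

inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=node_b, args=(inbox, outbox))
worker.start()

for i in range(3):
    inbox.put(i)
inbox.put(None)

acks, handled = [], None
while handled is None:
    kind, value = outbox.get()
    if kind == "ack":
        acks.append(value)
    else:
        handled = value
worker.join()

print(acks, handled)  # [0, 1, 2] 3
```

The caller never reads the counter directly; it learns the value only from a message, which is the constraint this definition imposes.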

Another definition

• You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done.
– inter-dependencies
– shared state
– independent failure of components

A working definition for us

A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium using message passing.

• Entity = a process on a device (PC, PDA)
• Communication medium = wired or wireless network
• Our interest in distributed systems involves
– design and implementation, maintenance, algorithmics

“Important” Distributed Systems Issues

• No global clock: no single global notion of the correct time (asynchrony)
• Unpredictable failures of components: lack of response may be due to the failure of a network component, a network path being down, or a computer crash (failure-prone, unreliable)
• Highly variable bandwidth: from 16 Kbps (slow modems or Google Balloon) to Gbps (Internet2) to Tbps (between data centers of the same big company)
• Possibly large and variable latency: a few ms to several seconds
• Large numbers of hosts: 2 to several million

There is a range of interesting problems for distributed system designers



• Real distributed systems
– Cloud Computing, Peer to peer systems, Hadoop, distributed file
systems, sensor networks, graph processing, …
• Classical Problems
– Failure detection, Asynchrony, Snapshots, Multicast, Consensus,
Mutual Exclusion, Election, …
• Concurrency
– RPCs, Concurrency Control, Replication Control, …
• Security
– Byzantine Faults, …
• Others…

Typical Distributed Systems Design Goals
• Common Goals:
– Heterogeneity – can the system handle a large variety of
types of PCs and devices?
– Robustness – is the system resilient to host crashes
and failures, and to the network dropping messages?
– Availability – are data+services always there for clients?
– Transparency – can the system hide its internal
workings from the users?
– Concurrency – can the server handle multiple clients
simultaneously?
– Efficiency – is the service fast enough? Does it utilize
100% of all resources?
– Scalability – can it handle 100 million nodes without
degrading service? (nodes=clients and/or servers)
– Security – can the system withstand hacker attacks?
– Openness – is the system extensible?

Challenges and Goals of Distributed Systems

• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency

Challenges

• Heterogeneity (variety and difference) of the underlying network infrastructure
• The Internet consists of many different sorts of network; their differences are masked by the fact that all of the computers attached to them use the Internet Protocols for communication.
– e.g., a computer attached to an Ethernet has an implementation of the Internet Protocols over the Ethernet, whereas a computer on a different sort of network will need an implementation of the Internet Protocols for that network.

Heterogeneity

• Computer hardware and software
– e.g., operating systems: compare UNIX socket and Winsock calls
• Programming languages: in particular, data representations
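
Differences in data representation can be made concrete with byte order. This Python sketch uses the standard `struct` module to encode the same 32-bit integer in big-endian (network byte order) and little-endian form; the value 1,000,000 is an arbitrary example.

```python
import struct

value = 1_000_000  # arbitrary example; 0x000F4240 in hexadecimal

big = struct.pack(">I", value)     # big-endian, same as network byte order "!I"
little = struct.pack("<I", value)  # little-endian, the in-memory order on x86

print(big.hex())     # 000f4240
print(little.hex())  # 40420f00

# A receiver that assumes the wrong byte order decodes a different number.
wrong = struct.unpack("<I", big)[0]
print(wrong == value)  # False

# Agreeing on one wire format (here network byte order) fixes the problem.
print(struct.unpack("!I", big)[0] == value)  # True
```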

Some approaches: Middleware

• A S/W layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, H/W, O/S and programming languages.
– Middleware (e.g., CORBA) provides transparency of network, hardware and software, and programming-language heterogeneity; Java RMI is another example.
• In addition to solving the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications.
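
The uniform computational model that middleware offers is typically a remote call that looks like a local one. The sketch below is not CORBA or Java RMI; it is a hypothetical, in-process Python illustration in which a client-side `Proxy` marshals a call into a language-neutral JSON message, a server-side `skeleton` unmarshals and dispatches it, and a plain function stands in for the network transport.

```python
import json

# --- server side: the real service and a skeleton that dispatches requests ---
class Calculator:
    def add(self, a, b):
        return a + b

def skeleton(service, request_bytes: bytes) -> bytes:
    """Unmarshal a request, invoke the named method, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    method = getattr(service, request["method"])
    result = method(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- client side: a proxy that makes remote calls look like local ones ---
class Proxy:
    def __init__(self, transport):
        self._transport = transport  # hypothetical: any callable bytes -> bytes

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)}).encode("utf-8")
            reply = self._transport(request)
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call

server = Calculator()
# Real middleware would send the bytes over a network; here it is a direct call.
calc = Proxy(lambda req: skeleton(server, req))
print(calc.add(2, 3))  # 5
```

Because only JSON bytes cross the boundary, the client and server could in principle be written in different languages, which is the kind of heterogeneity middleware masks.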

Positioning Middleware

• General structure of a distributed system as middleware. [Figure 1-22]

Openness

• The characteristic that determines whether the system can be extended and re-implemented in various ways.
– Determined primarily by the degree to which new resource-sharing services can be added and be made available for use by a variety of client programs.
– Cannot be achieved unless the specification and documentation of the key s/w interfaces are made available to s/w developers (i.e., key interfaces are published)

Openness

• Designers of the Internet protocols introduced a series of documents called RFCs (Requests for Comments)
– Specifications of the Internet communication protocols
– Specifications for applications run over them
» e.g., email, telnet, file transfer, etc. (by the mid 80’s)
• RFCs are not the only means; e.g., CORBA is published through a series of documents, including a complete specification of the interfaces of its services (www.omg.org)

Openness

• Offering services according to standard rules that describe the syntax and semantics of those services
– e.g., network protocol rules (RFCs)
• Services specified through interfaces
– Interface definition languages (IDLs)
» specify names and available functions as well as parameters, return values, exceptions, etc.
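
Python has no IDL, but as a loose analogy a `typing.Protocol` can state an interface's names, parameters, return values and exceptions separately from any implementation. `FileService`, `FileNotFound` and `InMemoryFileService` below are hypothetical names chosen for this sketch.

```python
from typing import Protocol

class FileNotFound(Exception):
    """Exception declared as part of the (hypothetical) interface contract."""

class FileService(Protocol):
    def read(self, name: str, offset: int, length: int) -> bytes:
        """Return up to `length` bytes of `name` from `offset`; raise FileNotFound."""
        ...

    def write(self, name: str, offset: int, data: bytes) -> int:
        """Write `data` at `offset`; return the number of bytes written."""
        ...

class InMemoryFileService:
    """One possible implementation; it matches FileService structurally."""

    def __init__(self):
        self._files: dict[str, bytearray] = {}

    def read(self, name: str, offset: int, length: int) -> bytes:
        if name not in self._files:
            raise FileNotFound(name)
        return bytes(self._files[name][offset:offset + length])

    def write(self, name: str, offset: int, data: bytes) -> int:
        buf = self._files.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a gap with zero bytes
        buf[offset:offset + len(data)] = data
        return len(data)

def client(svc: FileService) -> bytes:
    """Client code depends only on the interface, not on an implementation."""
    svc.write("notes.txt", 0, b"hello")
    return svc.read("notes.txt", 1, 3)

print(client(InMemoryFileService()))  # b'ell'
```

A real IDL would additionally be compiled into stubs for several languages; the Protocol only documents the contract and lets a static type checker verify that implementations conform to it.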

Security

• Distributed systems must protect the shared information and resources
• The openness of DS makes them vulnerable to security threats
• Providing security is a significant challenge for DS

Security

Privacy / Confidentiality: protection against disclosure to unauthorized individuals

Integrity: protection against alteration or corruption

Availability: protection against interference with the means to access the resources
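
Integrity can be checked with a message authentication code. This Python sketch uses the standard `hmac` and `hashlib` modules; the shared key is a hypothetical placeholder, and key distribution (itself a hard problem) is not shown. An HMAC protects integrity, not confidentiality: the message itself is still sent in the clear.

```python
import hashlib
import hmac

KEY = b"hypothetical-shared-key"  # assumed to be known only to sender and receiver

def sign(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer 100 to alice"
tag = sign(msg)
print(verify(msg, tag))                       # True
print(verify(b"transfer 900 to alice", tag))  # False: the message was altered
```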

Scalability

• Scalable system: a system that can handle an additional number of users/resources without suffering a noticeable loss of performance
• Three metrics of a scalable system
– Number of users/resources
– Distance between the farthest nodes in the system (network radius)
– Number of organizations exerting control over the pieces of the system

Challenges in designing scalable DS

• Controlling the cost of physical resources:
– As the demand for a resource grows, it should be possible to extend the system, at reasonable cost, to meet it.
» e.g., it must be possible to add server computers to avoid the performance bottleneck that would arise if a single file server had to handle all file access requests when the frequency of file access requests grows in an intranet with the increase in users and computers.

www.amazon.com is more than one computer
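
One way to spread file requests over several servers is to hash each file name to a server. This Python sketch assumes a hypothetical workload of 10,000 file names and uses SHA-256 only as a well-mixed hash; it shows the load on the busiest server falling as servers are added.

```python
import hashlib
from collections import Counter

def assign(filename: str, num_servers: int) -> int:
    """Map a file name to one of num_servers servers by hashing it."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def busiest(requests, num_servers: int) -> int:
    """Number of requests handled by the most heavily loaded server."""
    load = Counter(assign(r, num_servers) for r in requests)
    return max(load.values())

requests = [f"file{i}.txt" for i in range(10_000)]  # hypothetical workload
for n in (1, 2, 4, 8):
    print(f"{n} server(s): busiest handles {busiest(requests, n)} requests")
```

A known drawback of placing files by `hash % num_servers` is that changing the number of servers moves most files to a different server, so real systems often use more elaborate placement schemes.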


Challenges in designing scalable DS

• Controlling the performance loss:
– Management of a set of data whose size is proportional to the number of users or resources in the system
» e.g., the Domain Name System holds the table with the correspondence between domain names of computers and their Internet addresses
» Hierarchic structures scale better than linear structures.
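
The hierarchical idea behind DNS can be sketched as a tree in which each node knows only its direct children, so the table can be split across many servers instead of being held in one flat list. This is a toy Python illustration, not a DNS resolver; the names are hypothetical and the addresses come from the reserved documentation ranges.

```python
# Toy name tree: each level knows only its own children (hypothetical data).
# In a real system, each dict level could be held by a different server.
ROOT = {
    "edu": {"example": {"www": "192.0.2.10", "mail": "192.0.2.25"}},
    "com": {"shop": {"www": "198.51.100.7"}},
}

def resolve(name: str) -> str:
    """Walk the tree from the top-level label down, as DNS names are read."""
    node = ROOT
    for label in reversed(name.split(".")):
        if not isinstance(node, dict) or label not in node:
            raise KeyError(name)
        node = node[label]
    if not isinstance(node, str):
        raise KeyError(name)  # the name denotes a domain, not a host
    return node

print(resolve("www.example.edu"))  # 192.0.2.10
print(resolve("www.shop.com"))     # 198.51.100.7
```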

Scaling Techniques

[Figure 1.5: An example of dividing the DNS name space into zones.]


Challenges in designing scalable DS

• Preventing s/w resources running out:
– Numbers used as Internet addresses: 32 bits were used from the late 70’s, and the supply of 32-bit addresses has since run out.
– Change from 32 bits to 128 bits (IPv6)?
– Difficult to predict the demand.
– Over-compensating for future growth may be worse than adapting to a change when we are forced to: large Internet addresses occupy extra space in messages and in computer storage.
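
The trade-off in the last point is simple arithmetic. Each IPv4 and IPv6 packet header carries one source and one destination address, so this Python snippet compares the size of each address space with the bytes spent on addresses in every packet.

```python
IPV4_BITS, IPV6_BITS = 32, 128

print(2 ** IPV4_BITS)  # 4294967296 possible IPv4 addresses
print(2 ** IPV6_BITS)  # 340282366920938463463374607431768211456 IPv6 addresses

def address_bytes_per_packet(bits: int) -> int:
    """Bytes used by the source and destination addresses in one header."""
    return 2 * bits // 8

print(address_bytes_per_packet(IPV4_BITS))  # 8
print(address_bytes_per_packet(IPV6_BITS))  # 32
```

Quadrupling the address length buys an enormous address space, but every packet pays for it with 24 extra bytes of header.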

Failure Handling

• Failure in a DS is partial
– Some components fail while others continue to function
– This makes handling of failures difficult.


Techniques for dealing with failures

• Detecting failures
– May be impossible: did the remote site crash, or is message transmission merely delayed?
– Some failures can be detected.
– e.g., checksums can be used to detect corrupted data
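
A minimal sketch of checksum-based detection in Python, using the standard `zlib.crc32`: the sender appends a 4-byte CRC-32 to the payload, and the receiver recomputes it. The framing format here is a hypothetical choice for illustration. A CRC catches accidental corruption such as flipped bits, but it does not protect against deliberate tampering.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload (hypothetical format)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(framed: bytes) -> bool:
    """Recompute the checksum and compare it with the one that was sent."""
    payload, received = framed[:-4], int.from_bytes(framed[-4:], "big")
    return zlib.crc32(payload) == received

good = frame(b"hello, world")
corrupted = bytearray(good)
corrupted[0] ^= 0x01  # flip one bit "in transit"

print(check(good))              # True
print(check(bytes(corrupted)))  # False
```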

Techniques for dealing with failures

• Masking failures
– Some failures can be hidden or made less severe
– e.g., retransmission when messages fail to arrive
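
Retransmission can be simulated with a channel that randomly drops messages. In this Python sketch the `LossyChannel` class, its 30% loss rate and the fixed random seed are assumptions made for illustration. The sender keeps resending until the send is acknowledged, so the caller never sees the losses. Lost acknowledgements, and the duplicate messages they cause, are deliberately not modeled.

```python
import random

class LossyChannel:
    """Hypothetical channel that drops each message with probability loss_rate."""

    def __init__(self, loss_rate: float, seed: int = 42):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)  # fixed seed makes the run repeatable
        self.delivered = []

    def send(self, msg) -> bool:
        """Return True if the message arrived (i.e., an ack came back)."""
        if self.rng.random() < self.loss_rate:
            return False
        self.delivered.append(msg)
        return True

def send_reliably(channel: LossyChannel, msg) -> int:
    """Retransmit until acknowledged; return the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        if channel.send(msg):
            return attempts

ch = LossyChannel(loss_rate=0.3)
attempts = [send_reliably(ch, i) for i in range(5)]
print(ch.delivered)  # [0, 1, 2, 3, 4]
print(attempts)      # depends on the seed; values above 1 mean retransmission
```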


Techniques for dealing with failures

• Tolerating failures
– It would not be practical to detect and hide all of the failures; systems can be designed to tolerate some of them.
– e.g., timeouts when waiting for a web resource: clients give up after a predetermined number of attempts, take other actions and inform the user.
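
The give-up-after-N-attempts behavior can be sketched in Python with `concurrent.futures`. `fetch_resource` is a hypothetical stand-in for a slow web request, and the timeout and attempt count are arbitrary example values.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def fetch_resource() -> str:
    """Hypothetical web request that is too slow to answer in time."""
    time.sleep(0.3)
    return "<html>...</html>"

def fetch_with_timeout(fetch, timeout_s: float = 0.1, max_attempts: int = 3) -> str:
    with ThreadPoolExecutor(max_workers=max_attempts) as pool:
        for attempt in range(1, max_attempts + 1):
            future = pool.submit(fetch)
            try:
                # Wait at most timeout_s for this attempt.
                return future.result(timeout=timeout_s)
            except TimeoutError:
                print(f"attempt {attempt} timed out")
    # All attempts failed: take another action, here informing the user.
    return "Resource unavailable, please try again later."

print(fetch_with_timeout(fetch_resource))
```

`future.result(timeout=...)` only stops the client from waiting; the slow call itself keeps running in its worker thread until it finishes.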

Failure Handling

• Recovery from failures
– Rollback
– Undo/Redo in transactions
• Redundancy
– Makes the system more available through replication of resources/data
– Redundant routes in the network
– Replication of name tables in multiple domain name servers
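
Rollback with an undo log can be sketched as follows: before each update the old value is recorded, and on failure the log is replayed backwards. The `Store` class is hypothetical and keeps everything in memory; a real transaction system would write its log to stable storage and would also keep redo information, which this sketch omits.

```python
_MISSING = object()  # sentinel: the key did not exist before the update

class Store:
    """Hypothetical in-memory key-value store with undo-log rollback."""

    def __init__(self):
        self.data = {}
        self._undo = []  # (key, old_value) pairs, newest last

    def begin(self):
        self._undo.clear()

    def put(self, key, value):
        # Record the old value before overwriting it.
        self._undo.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        self._undo.clear()  # changes become permanent; nothing left to undo

    def rollback(self):
        # Undo newest-first so each key ends with its pre-transaction value.
        while self._undo:
            key, old = self._undo.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old

s = Store()
s.begin()
s.put("alice", 100)
s.put("bob", 50)
s.commit()

s.begin()
s.put("alice", 0)
s.put("carol", 100)  # suppose a failure happens here, before commit
s.rollback()
print(s.data)  # {'alice': 100, 'bob': 50}
```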
Concurrency

• In a distributed system, multiple machines/processes/users may try to access shared data/resources concurrently
– This can lead to incorrect results and/or deadlocks
• The operations must be synchronized/serialized so that the end result is correct
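
On a single machine, serializing access to shared data is commonly done with a lock, as in this Python sketch with four threads updating a hypothetical shared `balance`. Across machines there is no shared lock object, which is why mutual exclusion appears among the classical distributed-systems problems.

```python
import threading

balance = 0              # hypothetical shared data
lock = threading.Lock()

def deposit(times: int) -> None:
    global balance
    for _ in range(times):
        with lock:       # serialize the read-modify-write on shared data
            balance += 1

threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 400000
```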

Transparency

• Concealing the heterogeneous and distributed nature of the system so that it appears to the user like one system
– Making the user believe that there is only a single, undivided system, i.e., hiding the notion of distribution completely
• What are the challenges of transparency?

Transparency Categories

• Access transparency: access local and remote resources using identical operations
– e.g., users of UNIX NFS can use the same commands and parameters for file system operations regardless of whether the accessed files are on a local or remote disk.

Transparency Categories

• Location transparency: access without knowledge of the location of a resource
– e.g., URLs, email addresses (hostnames, IP addresses, etc. are not required; the part of the URL that identifies a web server's domain name refers to a computer name in a domain, rather than to an Internet address)

Transparency Categories

• Concurrency transparency: allow several processes to operate concurrently using shared resources in a consistent fashion without interference between them.
– That is, users and programmers are unaware that components request services concurrently.
• Replication transparency
– Use a replicated resource as if there were just one instance.
» Increase reliability and performance without knowledge of the replicas by users or application programmers.
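
Replication transparency can be illustrated with a wrapper that hides several copies behind one interface. In this hypothetical Python sketch, writes go to every replica and reads go to one replica at a time, skipping any marked as down; failures during a write and concurrent clients are not handled.

```python
class ReplicatedStore:
    """Hypothetical wrapper: callers see one store, not its replicas."""

    def __init__(self, num_replicas: int = 3):
        self._replicas = [dict() for _ in range(num_replicas)]
        self._next = 0
        self.down = set()  # indices of replicas currently unreachable (reads only)

    def put(self, key, value):
        # Write-all: every replica applies the update before put() returns.
        # (This sketch does not model failures during writes.)
        for replica in self._replicas:
            replica[key] = value

    def get(self, key):
        # Read-one: rotate through replicas, skipping any marked as down.
        for _ in range(len(self._replicas)):
            i = self._next
            self._next = (self._next + 1) % len(self._replicas)
            if i not in self.down:
                return self._replicas[i][key]
        raise RuntimeError("no replica available")

store = ReplicatedStore()
store.put("x", 42)
print([store.get("x") for _ in range(3)])  # [42, 42, 42]
store.down.add(0)                          # one replica fails...
print(store.get("x"))                      # ...and the caller still gets 42
```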

Failure transparency

• Enables the concealment of faults, allowing users and application programs to complete their tasks despite failures of h/w or s/w components.
• Retransmission of email messages: mail is eventually delivered even when servers or communication links fail, though it may take several days.

Failure transparency

• Failure transparency depends on concurrency and replication transparency.
• Replication can be employed to achieve failure transparency
• Message transmission governed by TCP is a mechanism for providing failure transparency

Mobility Transparency

• Mobility transparency: allow resources to move around without affecting the operation of users or programs
• e.g., a 700 phone number provides mobility transparency, but URLs do not: someone’s personal web page cannot move to their new place of work in a different domain, because all of the links in other pages will still point to the original page!
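
One way to obtain the kind of mobility a 700 number offers is indirection: others store a stable name, and a directory maps it to the current location. The Python sketch below is hypothetical (the directory, the names and the `.example` URLs are invented for illustration) and is not how web links work today.

```python
# Hypothetical directory mapping stable names to current locations.
directory = {"alice-homepage": "http://www.uni-a.example/~alice"}

def lookup(stable_name: str) -> str:
    """Resolve a stable name to wherever the resource lives now."""
    return directory[stable_name]

def move(stable_name: str, new_location: str) -> None:
    """Relocate a resource: only the directory entry changes."""
    directory[stable_name] = new_location

link = "alice-homepage"  # other pages store this name, not the URL itself
print(lookup(link))      # http://www.uni-a.example/~alice
move(link, "http://www.company-b.example/people/alice")
print(lookup(link))      # http://www.company-b.example/people/alice
```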


Transparency Categories

• Performance transparency: adaptation of the system to varying load situations without the user noticing it.
• Scaling transparency: allow the system and applications to expand without the need to change the structure or application algorithms

Degree of transparency

• In some systems, attempting to blindly hide all distribution aspects from users is not a good idea
– Requesting your electronic newspaper in your mailbox before 7 am local time, while you are at the other end of the world living in a different time zone
– (Your morning paper will not be the morning paper you are used to)

Degree of transparency

• There is a trade-off between a high degree of transparency and the performance of a system
– Masking a transient server failure by retransmitting the request may slow down the system
– If several replicas must be kept consistent all the time, a single update may take a long time, something that cannot be hidden from the user.
