Chapter 6 BasicsDS

The document discusses distributed systems, which consist of independent computers that communicate through message passing, highlighting their key characteristics such as asynchrony, failure-proneness, and the absence of a global clock. It outlines important issues in distributed systems design, including heterogeneity, security, scalability, and failure handling, as well as the goals of robustness, availability, and transparency. Additionally, it emphasizes the role of middleware in addressing these challenges and the various types of transparency that can be achieved in distributed systems.

Uploaded by

shikharmishra252
CS542: Topics in Distributed Systems

Courtesy: Diganta Goswami
Distributed System

• A collection of independent computers that appears to its users as a single coherent system.
– Autonomous computers
– Many components, connected by a network, sharing resources.
Distributed System

• A system of networked components that communicate and coordinate their actions only by passing messages
– concurrent execution of programs
– no global clock
– components fail independently of one another
A working definition for us

A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium using message passing.

• Entity = a process on a device (PC, PDA)
• Communication medium = wired or wireless network
• Our interest in distributed systems involves
– design and implementation, maintenance, algorithmics
“Important” Distributed Systems Issues

• No global clock: no single global notion of the correct time (asynchrony)
• Unpredictable failures of components: lack of
response may be due to either failure of a network
component, network path being down, or a computer
crash (failure-prone, unreliable)
• Highly variable bandwidth: from 16Kbps (slow
modems or Google Balloon) to Gbps (Internet2) to
Tbps (in between DCs of same big company)
• Possibly large and variable latency: few ms to
several seconds
• Large numbers of hosts: 2 to several million
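
As a rough illustration of how the bandwidth and latency ranges above interact, the sketch below estimates a one-way transfer time as latency plus payload size divided by bandwidth. This is a simplified model; the function name, the 1 MB payload, and the two latency values are assumptions chosen for illustration. Only the 16 Kbps and Gbps bandwidth figures come from the list above.

```python
def transfer_time_s(size_bytes: int, bandwidth_bps: float, latency_s: float) -> float:
    """Simplified model: one-way latency plus time to push the bits onto the link."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


ONE_MB = 1_000_000  # hypothetical payload size in bytes

slow = transfer_time_s(ONE_MB, 16_000, 2.0)   # 16 Kbps link, 2 s latency (assumed)
fast = transfer_time_s(ONE_MB, 1e9, 0.005)    # 1 Gbps link, 5 ms latency (assumed)

print(f"16 Kbps: {slow:.1f} s")          # 16 Kbps: 502.0 s
print(f"1 Gbps: {fast * 1000:.1f} ms")   # 1 Gbps: 13.0 ms
```

The same payload differs by more than four orders of magnitude in delivery time, which is why a design cannot assume one "typical" network.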
There is a range of interesting problems for Distributed System designers



• Real distributed systems
– Cloud Computing, Peer to peer systems, Hadoop, distributed file
systems, sensor networks, graph processing, …
• Classical Problems
– Failure detection, Asynchrony, Snapshots, Multicast, Consensus,
Mutual Exclusion, Election, …
• Concurrency
– RPCs, Concurrency Control, Replication Control, …
• Security
– Byzantine Faults, …
• Others…

Typical Distributed Systems Design Goals
• Common Goals:
– Heterogeneity – can the system handle a large variety of
types of PCs and devices?
– Robustness – is the system resilient to host crashes
and failures, and to the network dropping messages?
– Availability – are data+services always there for clients?
– Transparency – can the system hide its internal
workings from the users?
– Concurrency – can the server handle multiple clients
simultaneously?
– Efficiency – is the service fast enough? Does it utilize
100% of all resources?
– Scalability – can it handle 100 million nodes without
degrading service? (nodes=clients and/or servers)
– Security – can the system withstand hacker attacks?
– Openness – is the system extensible?
Challenges and Goals of Distributed Systems

• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency
Challenges

• Heterogeneity (variety and difference) of the underlying network infrastructure
• The Internet consists of many different sorts of networks; their differences are masked by the fact that all of the computers attached to them use the Internet protocols for communication.
– e.g., a computer attached to an Ethernet has an implementation of the Internet protocols over the Ethernet, whereas a computer on a different sort of network will need an implementation of the Internet protocols for that network.
Heterogeneity

• Computer hardware and software
– e.g., operating systems: compare UNIX socket and Winsock calls

• Programming languages: in particular, data representations
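
To make the data-representation point concrete, here is a minimal sketch using Python's standard `struct` module. It packs the same integer in big-endian and little-endian byte order and shows what happens when the receiver assumes the wrong one. The value 1025 is an arbitrary example.

```python
import struct

value = 1025  # 0x00000401, an arbitrary example value

big = struct.pack(">I", value)      # big-endian ("network") byte order
little = struct.pack("<I", value)   # little-endian byte order

print(big.hex())     # 00000401
print(little.hex())  # 01040000

# A receiver that assumes the wrong byte order silently reads a different number.
misread = struct.unpack("<I", big)[0]
print(misread)       # 17039360

# Agreeing on a single wire format lets both sides recover the original value.
roundtrip = struct.unpack(">I", big)[0]
print(roundtrip)     # 1025
```

Middleware typically hides this step by marshalling values into an agreed external representation before they are sent.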
Some approaches: Middleware

• A S/W layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, H/W, O/S and programming languages.
– Middleware (e.g., CORBA, Java RMI): transparency of network, hardware and software, and programming language heterogeneity.
• In addition to solving the problems of heterogeneity,
middleware provides a uniform computational model for
use by the programmers of servers and distributed
applications.

Positioning Middleware

• General structure of a distributed system as middleware.

[Figure 1-22 not reproduced.]
Openness

• The characteristic that determines whether the system can be extended and re-implemented in various ways.

– Determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.

– Cannot be achieved unless the specification and documentation of the key s/w interfaces are made available to s/w developers (i.e., key interfaces are published)



Openness

• Designers of the Internet protocols introduced a series of documents called RFCs

– Specifications of the Internet communication protocols
– Specifications for applications run over them
» e.g., email, telnet, file transfer, etc. (by the mid 80’s)

• RFCs are not the only means: e.g., CORBA is published through a series of documents, including a complete specification of the interfaces of its services (www.omg.org)


Openness

• Offering services according to standard rules that describe the syntax and semantics of those services
– e.g., network protocol rules (RFCs)

• Services specified through interfaces
– Interface definition languages (IDLs)
• IDLs specify names and available functions as well as parameters, return values, exceptions, etc.
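
The idea of a published interface can be sketched in Python with an abstract base class. This is not a real IDL such as CORBA's; it is only a stand-in that shows the same ingredients: function names, parameters, return values, and declared exceptions. `FileService`, `FileNotFound`, and `InMemoryFileService` are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class FileNotFound(Exception):
    """A failure mode that is part of the published interface."""


class FileService(ABC):
    """The published interface: clients depend only on this description."""

    @abstractmethod
    def read(self, name: str, offset: int, length: int) -> bytes:
        """Return up to `length` bytes starting at `offset`; raise FileNotFound."""


class InMemoryFileService(FileService):
    """One possible implementation; others can replace it without client changes."""

    def __init__(self, files: dict):
        self._files = files

    def read(self, name: str, offset: int, length: int) -> bytes:
        if name not in self._files:
            raise FileNotFound(name)
        return self._files[name][offset:offset + length]


service: FileService = InMemoryFileService({"notes.txt": b"hello"})
print(service.read("notes.txt", 1, 3))  # b'ell'
```

Because clients are written against `FileService` alone, a new implementation can be added or substituted later, which is the sense of openness described above.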
Security

• Distributed systems must protect the shared information and resources

• The openness of DS makes them vulnerable to security threats

• Providing security is a significant challenge for DS
Security

• Privacy / Confidentiality: protection against disclosure to unauthorized individuals

• Integrity: protection against alteration or corruption

• Availability: protection against interference with the means to access the resources
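
As one hedged illustration of the integrity property, the sketch below attaches a keyed message authentication code (HMAC-SHA256, from Python's standard `hmac` and `hashlib` modules) to a message so that the receiver can detect alteration. The shared key and the message contents are made up for the example, and key distribution is out of scope here.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-shared-key"  # hypothetical key, assumed known to both parties


def sign(message: bytes) -> bytes:
    """Compute an authentication tag over the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), tag)


original = b"transfer 10 to alice"
tag = sign(original)

print(verify(original, tag))                     # True
print(verify(b"transfer 1000 to mallory", tag))  # False
```

Note that this addresses integrity only; confidentiality would additionally require encryption.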
Scalability

• Scalable system: a system that can handle additional users/resources without suffering a noticeable loss of performance

• Three metrics of a scalable system
– Number of users/resources
– Distance between the farthest nodes in the system (network radius)
– Number of organizations exerting control over the pieces of the system
Challenges in designing scalable DS

• Controlling the cost of physical resources:
– As the demand for a resource grows, it should be possible to extend the system, at reasonable cost, to meet it.
» e.g., as users and computers are added to an intranet, the frequency of file access requests grows; it must be possible to add server computers to avoid the performance bottleneck that would arise if a single file server had to handle all file access requests.

www.amazon.com is more than one computer


Challenges in designing scalable DS

• Controlling the performance loss:
– Management of a set of data whose size is proportional to the number of users or resources in the system
» e.g., the Domain Name System holds the table with the correspondence between domain names of computers and their Internet addresses
» Hierarchic structures scale better than linear structures.
Scaling Techniques

[Figure 1.5: An example of dividing the DNS name space into zones.]
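
The zone idea can be sketched as a toy lookup in which each zone holds only its own hosts and pointers to the child zones it delegates to, so no single table grows with the whole name space. This is a simplified model and not the real DNS protocol; the zone contents, the name `www.example.edu`, and the address `192.0.2.10` are hypothetical.

```python
# Each zone knows its own hosts and which child zones it delegates to.
ZONES = {
    ".": {"delegates": {"edu": "edu."}, "hosts": {}},
    "edu.": {"delegates": {"example": "example.edu."}, "hosts": {}},
    "example.edu.": {"delegates": {}, "hosts": {"www": "192.0.2.10"}},
}


def resolve(name: str):
    """Walk down from the root zone; return (address, number of zones consulted)."""
    labels = name.rstrip(".").split(".")[::-1]  # "www.example.edu" -> ["edu", "example", "www"]
    zone, consulted = ".", 1
    for label in labels:
        entry = ZONES[zone]
        if label in entry["hosts"]:
            return entry["hosts"][label], consulted
        zone = entry["delegates"][label]  # raises KeyError for an unknown name
        consulted += 1
    raise KeyError(name)


print(resolve("www.example.edu"))  # ('192.0.2.10', 3)
```

Each lookup touches a number of zones proportional to the depth of the name rather than to the total number of names, and each zone can be administered and replicated separately.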


Challenges in designing scalable DS

• Preventing s/w resources running out:
– Numbers used as Internet addresses: 32 bits were chosen in the late 70’s, but the supply may run out soon.
– Change from 32 bits to 128 bits?
– Difficult to predict the demand.
– Over-compensating for future growth may be worse than adapting to a change when we are forced to: large Internet addresses occupy extra space in messages and in computer storage.
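
The trade-off between address-space size and per-message overhead can be checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module; the two sample addresses are taken from ranges reserved for documentation and are otherwise arbitrary.

```python
import ipaddress

v4_total = 2 ** 32    # number of distinct 32-bit addresses
v6_total = 2 ** 128   # number of distinct 128-bit addresses

print(v4_total)                         # 4294967296
print(v6_total // v4_total == 2 ** 96)  # True: 2**96 times as many addresses

v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# Bytes needed to carry one address in a message or store it in a table.
print(len(v4.packed), len(v6.packed))   # 4 16
```

Every address carried in a message or stored in a table costs four times as many bytes at 128 bits, which is the extra space the last bullet refers to.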
Failure Handling

• Failure in a DS is partial

– Some components fail while others continue to function

– This makes handling of failures difficult.


Techniques for dealing with failures

• Detecting failures

– May be impossible: was it a remote site crash or a delay in message transmission?

– Some failures can be detected.
– e.g., checksums can be used to detect corrupted data
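
A minimal sketch of checksum-based detection, using CRC-32 from Python's standard `zlib` module. The framing format (payload followed by a 4-byte checksum) and the helper names `frame` and `check` are assumptions made for this example.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the one carried in the frame."""
    payload, stored = framed[:-4], int.from_bytes(framed[-4:], "big")
    return zlib.crc32(payload) == stored


good = frame(b"hello, world")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit of the first byte

print(check(good))       # True
print(check(corrupted))  # False
```

A checksum like this detects accidental corruption; it does not protect against deliberate tampering, since an attacker can recompute it.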
Techniques for dealing with failures

• Masking failures

– Some failures can be hidden or made less severe

– Retransmission: when messages fail to arrive
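
Retransmission can be sketched with a simulated channel that loses a fixed number of messages. `LossyChannel` and `send_reliably` are hypothetical names, and the channel's boolean return value stands in for an acknowledgement. A real protocol would also need sequence numbers so the receiver can discard duplicates when only the acknowledgement is lost.

```python
class LossyChannel:
    """Hypothetical channel that drops the first `drops` messages sent through it."""

    def __init__(self, drops: int):
        self.drops = drops
        self.delivered = []

    def send(self, message: str) -> bool:
        if self.drops > 0:
            self.drops -= 1
            return False              # message lost, no acknowledgement
        self.delivered.append(message)
        return True                   # acknowledgement received


def send_reliably(channel: LossyChannel, message: str, max_attempts: int = 5) -> int:
    """Resend until acknowledged; return the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


channel = LossyChannel(drops=2)
attempts = send_reliably(channel, "hello")
print(attempts, channel.delivered)  # 3 ['hello']
```

The caller sees one successful send; the two lost messages are masked, at the cost of extra delay.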


Techniques for dealing with failures

• Tolerating failures

– It would not be practical to detect and hide all failures. Systems can be designed to tolerate some of them.

– e.g., timeouts when waiting for a web resource: clients give up after a predetermined number of attempts, take other actions, and inform the user.
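
The timeout behaviour described above might look like the following sketch, which uses a thread and a queue from the standard library to stand in for a remote server. The function names, the delays, the timeout value, and the give-up message are all assumptions for illustration.

```python
import queue
import threading
import time

GIVE_UP_MESSAGE = "Server not responding, please try again later."  # shown to the user


def slow_server(reply_box: queue.Queue, delay_s: float) -> None:
    """Stand-in for a remote server that answers after `delay_s` seconds."""
    time.sleep(delay_s)
    reply_box.put("page contents")


def fetch(server_delay_s: float, timeout_s: float = 0.2, attempts: int = 3) -> str:
    """Wait a bounded time per attempt; give up after a fixed number of attempts."""
    for _ in range(attempts):
        reply_box = queue.Queue()
        worker = threading.Thread(
            target=slow_server, args=(reply_box, server_delay_s), daemon=True
        )
        worker.start()
        try:
            return reply_box.get(timeout=timeout_s)
        except queue.Empty:
            continue  # timed out: the server may be slow or crashed, we cannot tell
    return GIVE_UP_MESSAGE


print(fetch(0.0))                  # page contents
print(fetch(2.0, timeout_s=0.05))  # Server not responding, please try again later.
```

The client cannot distinguish a crashed server from a slow one, so it tolerates the failure by bounding its wait rather than by trying to diagnose it.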
Failure Handling

• Recovery from failures


– Rollback
– Undo/Redo in transactions

• Redundancy
– Makes the system more available through replication of
resources/data
– Redundant routes in the network
– Replication of name tables in multiple domain name servers
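
Rollback, mentioned under recovery above, can be illustrated with a small undo log: before each change the old value is recorded, and on failure the recorded values are restored in reverse order. The `transfer` function and the account data are hypothetical; real transaction systems also have to make the log itself survive crashes.

```python
def transfer(balances: dict, source: str, target: str, amount: int) -> None:
    """Move `amount` between accounts; undo every change if any step fails."""
    undo_log = []  # (account, previous balance) pairs, in the order they were changed
    try:
        undo_log.append((source, balances[source]))
        balances[source] -= amount
        if balances[source] < 0:
            raise ValueError("insufficient funds")
        undo_log.append((target, balances[target]))
        balances[target] += amount
    except Exception:
        # Roll back: restore old values, most recent change first.
        for account, previous in reversed(undo_log):
            balances[account] = previous
        raise


accounts = {"alice": 50, "bob": 0}
transfer(accounts, "alice", "bob", 30)
print(accounts)  # {'alice': 20, 'bob': 30}

try:
    transfer(accounts, "alice", "bob", 100)
except ValueError:
    pass
print(accounts)  # {'alice': 20, 'bob': 30}
```

The failed transfer leaves no partial effect behind, which is the property rollback is meant to provide.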
Concurrency

• In a distributed system it is possible that multiple machines/processes/users may try to access shared data/resources concurrently
– Can potentially lead to incorrect results and/or
– Deadlocks

• The operations must be synchronized/serialized so that the end result is correct
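
Within a single process, the serialization requirement can be sketched with a lock from Python's standard `threading` module. The `Counter` class is a hypothetical shared resource; in a real distributed system the same role is played by distributed mutual exclusion or concurrency control rather than a local lock.

```python
import threading


class Counter:
    """A shared resource whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:      # only one thread at a time runs the read-modify-write
            self.value += 1


def worker(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


shared = Counter()
threads = [threading.Thread(target=worker, args=(shared, 10_000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(shared.value)  # 40000
```

Without the lock, the read-modify-write in `increment` could interleave between threads and lose updates.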
Transparency

• Concealing the heterogeneous and distributed nature of the system so that it appears to the user like one system

– Making the user believe that there is only a single, undivided system, i.e., to hide the notion of distribution completely

• What are the challenges of transparency?


Transparency Categories

• Access transparency: access local and remote resources using identical operations

– e.g., users of UNIX NFS can use the same commands and parameters for file system operations regardless of whether the accessed files are on a local or remote disk.


Transparency categories

• Location transparency: access without knowledge of the location of a resource

– e.g., URLs, email addresses (hostnames, IP addresses, etc. are not required; the part of the URL that identifies a web server domain name refers to a computer name in a domain, rather than to an Internet address)
Transparency Categories

• Concurrency transparency: allow several processes to operate concurrently using shared resources in a consistent fashion w/o interference between them.
– That is, users and programmers are unaware that
components request services concurrently.

• Replication transparency
– Use replicated resource as if there was just one
instance.
» Increase reliability and performance w/o knowledge of
the replicas by users or application programmers.
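
Replication transparency can be sketched as a client-side wrapper that exposes a single `get`/`put` interface while talking to several copies behind it. `Replica` and `ReplicatedStore` are hypothetical classes, and the write path is deliberately naive: it assumes every replica is reachable when a write happens and ignores consistency protocols.

```python
class Replica:
    """One copy of the data; `up` simulates whether the replica is reachable."""

    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError("replica unreachable")
        return self.data[key]

    def put(self, key, value):
        if not self.up:
            raise ConnectionError("replica unreachable")
        self.data[key] = value


class ReplicatedStore:
    """Looks like a single store to the caller; replicas are an internal detail."""

    def __init__(self, replicas):
        self._replicas = replicas

    def put(self, key, value):
        for replica in self._replicas:   # naive: write to every copy
            replica.put(key, value)

    def get(self, key):
        for replica in self._replicas:   # read from the first reachable copy
            try:
                return replica.get(key)
            except ConnectionError:
                continue
        raise ConnectionError("no replica available")


replicas = [Replica(), Replica()]
store = ReplicatedStore(replicas)
store.put("x", 1)
replicas[0].up = False   # one copy fails
print(store.get("x"))    # 1
```

The caller's code is the same whether there is one copy or several, and it keeps working after one replica fails.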
Failure transparency
• Enables the concealment of faults, allowing users and application programs to complete their tasks despite failures of h/w or s/w components.

• e.g., retransmission of email messages: they are eventually delivered even when servers or communication links fail, though it may take several days.
Failure transparency

• Failure transparency depends on concurrency and replication transparency.

• Replication can be employed to achieve failure transparency

• Message transmission governed by TCP is a mechanism for providing failure transparency
Mobility Transparency

• Mobility transparency: allow resources to move around w/o affecting the operation of users or programs
• e.g., a 700 phone number is mobility-transparent, but URLs are not: someone’s personal web page cannot move to their new place of work in a different domain, because all of the links in other pages will still point to the original page!


Transparency Categories

• Performance transparency: adaptation of the system to varying load situations without the user noticing it.

• Scaling transparency: allow the system and applications to expand without the need to change the structure or application algorithms
Degree of transparency
• Attempting to blindly hide all distribution aspects from users is not always a good idea

– e.g., requesting that your electronic newspaper appear in your mailbox before 7 am local time, while you are at the other end of the world living in a different time zone

– (Your morning paper will not be the morning paper you are used to)
Degree of transparency

• There is a trade-off between a high degree of transparency and the performance of a system

– Masking transient server failures by retransmitting the request may slow down the system

– If it is necessary to guarantee that several replicas are consistent all the time, a single update may take a long time, something that cannot be hidden from the user.
