Handouts
Chapter-1
Overview: Distributed Computing
“A collection of independent computers that appears to its users as a single coherent system”
Features:
No shared memory – message-based communication
Each runs its own local OS
Heterogeneity
Ideal: to present a single-system image
The distributed system “looks like” a single computer rather than a collection of
separate computers
Distributed Systems Characteristics:
To present a single-system image
Hide internal organization, communication details
Provide uniform interface
Easily expandable
Adding new computers is hidden from users
Continuous availability
Failures in one component can be covered by other components
Supported by Middleware
“Middleware in the context of distributed applications is software that provides services beyond those
provided by the operating system to enable the various components of a distributed system to
communicate and manage data”
The figure shows a distributed system organized as middleware. The middleware layer runs on all
machines and offers a uniform interface to the system.
Middleware masks the heterogeneity of distributed systems.
In some early research systems, middleware (MW) tried to provide the illusion that a collection of
separate machines was a single computer.
Today:
clustering software allows independent computers to work together closely
Middleware also supports seamless access to remote services, but doesn’t try to look like a
general-purpose OS
Middleware Examples:
CORBA (Common Object Request Broker Architecture)
DCOM (Distributed Component Object Model) – being replaced by .NET
Sun’s ONC RPC (Remote Procedure Call)
RMI (Remote Method Invocation)
SOAP (Simple Object Access Protocol)
All of the previous examples support communication across a network
They provide protocols that allow a program running on one kind of computer, using one
kind of operating system, to call a program running on another computer with a different
operating system
The communicating programs must be running the same middleware.
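Example (a minimal sketch, not taken from any of the middleware products listed above): the call path that middleware hides can be imitated in a few lines. The client stub marshals a call into a neutral byte format (JSON is assumed here), a transport carries the bytes, and the server skeleton unmarshals them and runs the procedure. The names `SERVICES`, `client_stub` and `server_skeleton` are hypothetical, and the "network" is just a function call.

```python
import json

# Hypothetical service table: procedure name -> local implementation.
SERVICES = {"add": lambda a, b: a + b}


def server_skeleton(request_bytes):
    """Server side: unmarshal the request, run the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = SERVICES[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(transport, method, *params):
    """Client side: marshal the call into a neutral byte format and send it."""
    request = json.dumps({"method": method, "params": list(params)}).encode("utf-8")
    reply = transport(request)
    return json.loads(reply.decode("utf-8"))["result"]


# The transport is a plain function call here; a real system would use a network.
print(client_stub(server_skeleton, "add", 2, 3))
```

Because both sides agree only on the byte format, either side could be rewritten in another language or moved to another OS without the other noticing, which is the sense in which both programs "run the same middleware".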
Distributed System Goals:
Resource Accessibility
Distribution Transparency
Openness
Scalability
Resource Accessibility
Support user access to remote resources (printers, data files, web pages, CPU cycles)
and the fair sharing of the resources
Economics of sharing expensive resources
Performance enhancement – due to multiple processors; also due to ease of
collaboration and info exchange – access to remote services
Resource sharing introduces security problems
Distribution Transparency
Software hides some of the details of the distribution of system resources.
A distributed system that appears to its users & applications to be a single computer
system is said to be transparent.
Transparency has several dimensions:
o Access: Hide differences in data representation & resource access (enables
interoperability)
o Location: Hide location of resource (can use resource without knowing its location)
o Migration: Hide possibility that a system may change location of resource (no effect
on access)
o Replication: Hide the possibility that multiple copies of the resource exist (for
reliability and/or availability)
o Concurrency: Hide the possibility that the resource may be shared concurrently
o Failure: Hide failure and recovery of the resource. How does one differentiate
between slow and failed?
o Relocation: Hide that resource may be moved during use
Too much emphasis on transparency may prevent the user from understanding system
behavior
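Example (a sketch under simplifying assumptions): location, replication and failure transparency can be pictured as a single logical store that sits in front of several copies. The classes `Replica` and `TransparentStore` and the exception `ReplicaDown` are hypothetical names invented for this illustration; a failed replica is assumed to signal its failure immediately, which sidesteps the "slow or failed?" question raised above.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class TransparentStore:
    """Single logical store; callers never see which replica answered."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def read(self, key):
        for replica in self._replicas:
            try:
                return replica.read(key)
            except ReplicaDown:
                continue  # failure is masked by trying the next copy
        raise ReplicaDown("all replicas failed")


store = TransparentStore([
    Replica("r1", {"x": 1}, up=False),  # first copy is down
    Replica("r2", {"x": 1}),
])
print(store.read("x"))
```

The caller only ever writes `store.read("x")`. It cannot tell where the data lives, how many copies exist, or that one copy failed, which is also why too much transparency can make system behavior hard to understand.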
Openness
“An open distributed system offers services according to standard rules that describe
the syntax and semantics of those services.” In other words, the interfaces to the
system are clearly specified and freely available.
Interface Definition/Description Languages (IDL): used to describe the interfaces
between software components, usually in a distributed system
o Definitions are language & machine independent
o Support communication between systems using different OS/programming
languages; e.g. a C++ program running on Windows communicates with a Java
program running on UNIX
o Communication is usually RPC-based.
Examples:
o IDL: Interface Description Language
The original
o WSDL: Web Services Description Language
Provides machine-readable descriptions of the services
o OMG IDL: used for RPC in CORBA
OMG – Object Management Group
Open Systems Support:
o Interoperability: the ability of two different systems or applications to work
together
A process that needs a service should be able to talk to any process
that provides the service.
Multiple implementations of the same service may be provided, as long
as the interface is maintained
o Portability: an application designed to run on one distributed system can run
on another system which implements the same interface.
o Extensibility: Easy to add new components, features
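Example (a sketch, using a Python `Protocol` as a stand-in for a real IDL): the published interface is the only thing a client depends on, so any implementation that honors it can be swapped in. The interface `NameService`, both implementations and the `.example.internal` suffix are hypothetical.

```python
from typing import Protocol


class NameService(Protocol):
    """Published interface: the only thing clients may depend on."""

    def lookup(self, name: str) -> str: ...


class TableNameService:
    """One implementation: answers from a fixed table."""

    def __init__(self, table):
        self._table = dict(table)

    def lookup(self, name: str) -> str:
        return self._table[name]


class SuffixNameService:
    """Another implementation: derives the address from the name."""

    def lookup(self, name: str) -> str:
        return f"{name}.example.internal"


def connect_string(service: NameService, name: str) -> str:
    """Client code written against the interface, not an implementation."""
    return "tcp://" + service.lookup(name)


print(connect_string(TableNameService({"db": "10.0.0.5"}), "db"))
print(connect_string(SuffixNameService(), "db"))
```

The client function works unchanged with either service (interoperability), and a third implementation can be added later without touching it (extensibility).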
Scalability
Dimensions that may scale:
o With respect to size
o With respect to geographical distribution
o With respect to the number of administrative organizations spanned
A scalable system still performs well as it scales up along any of the three dimensions.
Scalability is negatively affected when the system is based on
o Centralized server: one for all users
o Centralized data: a single database for all users
o Centralized algorithms: one site collects all information, processes it, and
distributes the results to all sites.
Decentralized Algorithms:
o No machine has complete information about the system state
o Machines make decisions based only on local information
o Failure of a single machine doesn’t ruin the algorithm
o There is no assumption that a global clock exists.
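Example (a sketch of one well-known decentralized technique, pairwise gossip averaging, chosen here only as an illustration): each node holds one local value and repeatedly averages it with a single peer. No node ever sees the whole system state, yet every node ends up close to the global mean. The function name `gossip_average` and the seeded random pairing are assumptions of this simulation; the loop stands in for independent exchanges and does not represent a global clock.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Pairwise gossip: two random nodes repeatedly average their local values."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    state = list(values)
    for _ in range(rounds):
        # Pick two distinct nodes; each only learns the other's current value.
        i, j = rng.sample(range(len(state)), 2)
        mean = (state[i] + state[j]) / 2
        state[i] = state[j] = mean
    return state


final = gossip_average([10, 20, 30, 40], rounds=200)
print(final)
```

Each exchange preserves the sum of the two values involved, so the overall mean (25 for this input) never changes while the values drift toward it. If one node stops participating, the remaining nodes can still keep averaging among themselves.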
Scalability affects performance more than anything else.
Three techniques to improve scalability:
o Hiding communication latencies
o Distribution
o Replication
Distribution
o Instead of one centralized service, divide into parts and distribute geographically
o Each part handles one aspect of the job
Example: DNS namespace is organized as a tree of domains; each domain is
divided into zones; names in each zone are handled by a different name server
The WWW is spread across many (millions?) servers
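Example (a simplified sketch of zone-based name resolution, not the real DNS protocol): each name server knows only its own zone and which server to ask for names below it. The server names, the zone layout in `SERVERS` and the address `192.0.2.10` (from a range reserved for documentation) are made up for illustration.

```python
# Hypothetical zone data: each "name server" knows only its own zone.
SERVERS = {
    "root-server": {"delegations": {"edu": "edu-server"}, "records": {}},
    "edu-server": {"delegations": {"example.edu": "example-server"}, "records": {}},
    "example-server": {
        "delegations": {},
        "records": {"www.example.edu": "192.0.2.10"},
    },
}


def resolve(name, server="root-server"):
    """Follow delegations from server to server until one holds the record."""
    path = [server]
    while True:
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], path
        # Look for a delegated suffix that covers the requested name.
        for suffix, next_server in zone["delegations"].items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                path.append(server)
                break
        else:
            raise KeyError(name)  # no server is responsible for this name


address, path = resolve("www.example.edu")
print(address, path)
```

No single server stores the whole namespace; the lookup touches only the servers on the path from the root to the zone that owns the name.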
Replication
o Replication: multiple identical copies of something
Replicated objects may also be distributed, but aren’t necessarily.
o Benefits of replication:
Increases availability
Improves performance through load balancing
May avoid latency by improving proximity of resource
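Example (a sketch of load balancing over replicas, assuming a simple round-robin policy): requests are handed to identical copies in turn, so no single copy carries the whole load. The class `ReplicatedService` and the replica names are hypothetical.

```python
import itertools
from collections import Counter


class ReplicatedService:
    """Spreads requests over identical replicas in round-robin order."""

    def __init__(self, replica_names):
        self._cycle = itertools.cycle(replica_names)

    def handle(self, request):
        replica = next(self._cycle)  # next copy in the rotation
        return replica, f"{replica} served {request}"


service = ReplicatedService(["replica-a", "replica-b", "replica-c"])
# Count how many of nine requests each replica ends up serving.
load = Counter(service.handle(f"req-{i}")[0] for i in range(9))
print(load)
```

Round-robin is only one possible policy; choosing the replica closest to the client instead would target the latency benefit rather than the load-balancing one.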