
ch01-introduction

A distributed system is defined as a collection of autonomous computing elements that users perceive as a single coherent system. Key characteristics include autonomous nodes that communicate through an overlay network and the necessity for collaboration among nodes to maintain coherence. The document also discusses middleware, design goals such as distribution transparency and scalability, and various types of distributed systems including cloud and grid computing.

Uploaded by

mennatalah777
Copyright
© All Rights Reserved

Distributed Systems (3rd Edition)
Maarten van Steen, Andrew S. Tanenbaum
Chapter 01: Introduction
Edited by: Hicham G. Elmongui

Introduction: What is a distributed system?

Distributed System

Definition
A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

Characteristic features
Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes.
Single coherent system: users or applications perceive a single system ⇒ nodes need to collaborate.

Characteristic 1: Collection of autonomous computing elements

Collection of autonomous nodes

Independent behavior
Each node is autonomous and will thus have its own notion of time: there is no global clock. Leads to fundamental synchronization and coordination problems.

Collection of nodes
How to manage group membership?
How to know that you are indeed communicating with an authorized (non)member?

Organization

Overlay network
Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a lookup).

Overlay types
Well-known example of overlay networks: peer-to-peer systems.
Structured: each node has a well-defined set of neighbors with whom it can communicate (tree, ring).
Unstructured: each node has references to randomly selected other nodes from the system.
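
The two overlay types can be illustrated with a small sketch. It is not taken from the slides: the function names `ring_overlay` and `random_overlay`, the node labels, and the parameter `k` are assumptions chosen for illustration, and a real peer-to-peer system would maintain these neighbor sets dynamically.

```python
import random

def ring_overlay(nodes):
    """Structured overlay: each node's neighbors are its ring predecessor and successor."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}

def random_overlay(nodes, k, seed=0):
    """Unstructured overlay: each node holds references to k randomly selected other nodes."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return {
        node: rng.sample([other for other in nodes if other != node], k)
        for node in nodes
    }

nodes = ["n0", "n1", "n2", "n3", "n4"]   # hypothetical node names
ring = ring_overlay(nodes)
unstructured = random_overlay(nodes, k=2)
print(ring["n0"])          # ['n4', 'n1']
print(unstructured["n0"])  # two randomly chosen nodes other than n0
```

In the ring, a node's neighbor set follows from its position; in the unstructured variant it is whatever references the node happens to hold.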

Characteristic 2: Single coherent system

Coherent system

Essence
The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place.

Examples
An end user cannot tell where a computation is taking place
Where data is exactly stored should be irrelevant to an application
Whether or not data has been replicated is completely hidden

Keyword is distribution transparency

The snag: partial failures
It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult and in general impossible.

Middleware and distributed systems

Middleware: the OS of distributed systems

[Figure: Computers 1 to 4, each with its own local OS (Local OS 1 to 4), connected by a network. A distributed-system layer (middleware) sits on top of the local operating systems and offers the same interface everywhere to Appl. A, Application B, and Appl. C.]

What does it contain?
Commonly used components and functions that need not be implemented by applications separately.
Middleware: the OS of distributed systems

Middleware services are offered in a networked environment:
Resource management
Facilities for interapplication communication.
Security services.
Accounting services.
Masking of and recovery from failures.

Typical middleware services:
Communication
Transactions
Service composition
Reliability

Introduction: Design goals

What do we want to achieve?

Support sharing of resources
Distribution transparency
Openness
Scalability

Supporting resource sharing

Sharing resources

Canonical examples
Cloud-based shared storage and files
Peer-to-peer assisted multimedia streaming
Shared mail services (think of outsourced mail systems)
Shared Web hosting (think of content distribution networks)

Observation
“The network is the computer”
(quote from John Gage, then at Sun Microsystems)

Making distribution transparent

Distribution transparency

Types

Transparency   Description
Access         Hide differences in data representation and how an object [a] is accessed
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide the failure and recovery of an object

[a] We use the term object to mean either a process or a resource.

Degree of transparency

Observation
Aiming at full distribution transparency may be too much:
There are communication latencies that cannot be hidden
Completely hiding failures of networks and nodes is (theoretically and practically) impossible
  You cannot distinguish a slow computer from a failing one
  You can never be sure that a server actually performed an operation before a crash
Full transparency will cost performance, exposing distribution of the system
  Keeping replicas exactly up-to-date with the master takes time
  Immediately flushing write operations to disk for fault tolerance

Degree of transparency

Exposing distribution may be good
Making use of location-based services (finding your nearby friends)
When dealing with users in different time zones
When it makes it easier for a user to understand what’s going on (e.g., when a server does not respond for a long time, report it as failing).

Conclusion
Distribution transparency is a nice goal, but achieving it is a different story, and it should often not even be aimed at.
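
The point that a slow computer cannot be distinguished from a failing one, and that it can help to report a long-silent server as failing, can be sketched with a timeout. This is an illustrative assumption, not something the slides prescribe: `call_with_timeout`, `fast_server`, `slow_server`, and the timeout values are hypothetical, and the "servers" are plain local functions.

```python
import concurrent.futures
import time

def call_with_timeout(operation, timeout_s):
    """Run operation in a worker thread; report 'suspected-failed' if no reply arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The caller cannot tell whether the server crashed or is merely slow.
        return ("suspected-failed", None)
    finally:
        pool.shutdown(wait=False)

def fast_server():
    return "done"

def slow_server():
    time.sleep(0.5)   # alive and working, just slower than the caller's patience
    return "done"

print(call_with_timeout(fast_server, 0.2))   # ('ok', 'done')
print(call_with_timeout(slow_server, 0.05))  # ('suspected-failed', None)
```

Note that `slow_server` still completes its work after being reported as failed, which is exactly why the caller can never be sure whether an operation was performed.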


Being open

Openness of distributed systems

What are we talking about?
Be able to interact with services from other open systems, irrespective of the underlying environment:
Systems should conform to well-defined interfaces
Systems should easily interoperate
Systems should support portability of applications
Systems should be easily extensible

Policies versus mechanisms

Implementing openness: policies
What level of consistency do we require for client-cached data?
Which operations do we allow downloaded code to perform?
Which QoS requirements do we adjust in the face of varying bandwidth?
What level of secrecy do we require for communication?

Implementing openness: mechanisms
Allow (dynamic) setting of caching policies
Support different levels of trust for mobile code
Provide adjustable QoS parameters per data stream
Offer different encryption algorithms
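
As a rough illustration of separating policy from mechanism, the sketch below shows a cache (the mechanism) whose freshness rule (the policy) can be set dynamically. The class `Cache`, its method names, and the age thresholds are assumptions made for this example only.

```python
import time

class Cache:
    """Mechanism: stores entries and asks a pluggable policy whether an entry is still usable."""

    def __init__(self, is_fresh):
        self._is_fresh = is_fresh   # policy: callable(age_in_seconds) -> bool
        self._entries = {}          # key -> (value, time stored)

    def set_policy(self, is_fresh):
        """Allow (dynamic) setting of the caching policy."""
        self._is_fresh = is_fresh

    def put(self, key, value, now=None):
        self._entries[key] = (value, time.monotonic() if now is None else now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        if key not in self._entries:
            return None
        value, stored_at = self._entries[key]
        return value if self._is_fresh(now - stored_at) else None

cache = Cache(is_fresh=lambda age: age < 60)   # policy: accept entries younger than 60 s
cache.put("page", "<html>", now=0)
print(cache.get("page", now=30))               # <html>
cache.set_policy(lambda age: age < 10)         # stricter policy, same mechanism
print(cache.get("page", now=30))               # None
```

The cache code never changes when the required level of consistency changes; only the policy function does.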

Being scalable

Scale in distributed systems

Observation
Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales.

At least three components
Number of users and/or processes (size scalability)
Maximum distance between nodes (geographical scalability)
Number of administrative domains (administrative scalability)

Observation
Most systems account only, to a certain extent, for size scalability. Often a solution: multiple powerful servers operating independently in parallel. Today, the challenge still lies in geographical and administrative scalability.

Size scalability

Root causes for scalability problems with centralized solutions
The computational capacity, limited by the CPUs
The storage capacity, including the transfer rate between CPUs and disks
The network between the user and the centralized service

Problems with geographical scalability

Cannot simply go from LAN to WAN: many distributed systems assume synchronous client-server interactions: client sends request and waits for an answer. Latency may easily prohibit this scheme.
WAN links are often inherently unreliable: simply moving streaming video from LAN to WAN is bound to fail.
Lack of multipoint communication, so that a simple search broadcast cannot be deployed. Solution is to develop separate naming and directory services (having their own scalability problems).

Problems with administrative scalability

Essence
Conflicting policies concerning usage (and thus payment), management, and security

Examples
Computational grids: share expensive resources between different domains.

Exception: several peer-to-peer networks
File-sharing systems (based, e.g., on BitTorrent)
Peer-to-peer telephony (Skype)
Peer-assisted audio streaming (Spotify)

Note: end users collaborate and not administrative entities.


Techniques for scaling

Scalability problems in distributed systems appear as performance problems caused by limited capacity of servers and network

Two solutions:
Scaling up: improving their capacity (e.g., by increasing memory, upgrading CPUs, or replacing network modules)
Scaling out: expanding the distributed system by essentially deploying more machines
  Hiding communication latencies
  Distribution of work
  Replication

Techniques for scaling

Hide communication latencies
Make use of asynchronous communication
Have separate handler for incoming response
Problem: not every application fits this model
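
A minimal sketch of hiding latency with asynchronous communication and a separate response handler, using Python's `asyncio`. The coroutine `remote_request` only simulates a remote call with a sleep; its name, the delay, and the log messages are assumptions for illustration.

```python
import asyncio

async def remote_request(name, delay_s):
    """Stand-in for a remote call; the sleep models network latency."""
    await asyncio.sleep(delay_s)
    return f"reply to {name}"

async def main():
    log = []
    # Send the request without waiting for the answer.
    task = asyncio.create_task(remote_request("query", 0.05))
    # Separate handler that runs when the response comes in.
    task.add_done_callback(lambda t: log.append(t.result()))
    # The client keeps doing useful local work in the meantime.
    log.append("local work done")
    await task
    await asyncio.sleep(0)  # give the event loop a turn so the handler has run
    return log

print(asyncio.run(main()))  # ['local work done', 'reply to query']
```

This only helps when the client actually has other work to do while waiting, which is the limitation the slide notes.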

Techniques for scaling

Facilitate solution by moving computations to client

[Figure: two client/server arrangements for a form with the fields FIRST NAME, LAST NAME, and E-MAIL, with the steps "Check form" and "Process form". In the first, the client sends the input to the server one character at a time (M, A, A, R, T, E, N). In the second, the client sends the completed fields to the server in one go.]

Techniques for scaling

Partition data and computations across multiple machines
Move computations to clients (Java applets)
Decentralized naming services (DNS)
Decentralized information systems (WWW)
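
Moving a computation to the client can be sketched as follows: the client checks the form itself and contacts the server only with a valid form. The field names, the deliberately simplistic validation rule, and the `Server` class are hypothetical and exist only for this example.

```python
def check_form(form):
    """Return the names of fields that are empty or malformed (very rough check)."""
    errors = [f for f in ("first_name", "last_name", "email") if not form.get(f)]
    if form.get("email") and "@" not in form["email"]:
        errors.append("email")
    return errors

class Server:
    """Toy server that counts how many messages reach it."""

    def __init__(self):
        self.messages_received = 0

    def process_form(self, form):
        self.messages_received += 1
        return "stored"

def submit(form, server):
    """Client checks the form locally and contacts the server only with a valid form."""
    errors = check_form(form)
    if errors:
        return ("rejected locally", errors)
    return (server.process_form(form), [])

server = Server()
bad = {"first_name": "MAARTEN", "last_name": "VAN STEEN", "email": "not-an-address"}
good = dict(bad, email="user@example.org")   # placeholder address
print(submit(bad, server), server.messages_received)    # ('rejected locally', ['email']) 0
print(submit(good, server), server.messages_received)   # ('stored', []) 1
```

The invalid form costs the server nothing, which is the scalability benefit of doing the check at the client.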

Techniques for scaling

Replication and caching: Make copies of data available at different machines
Replicated file servers and databases
Mirrored Web sites
Web caches (in browsers and proxies)
File caching (at server and client)

Benefits of replication:
increases availability
helps to balance the load between components
can hide much of the communication latency problems

Caching
A decision made by the client of a resource and not by the owner

Scaling: The problem with replication

Applying replication is easy, except for one thing
Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest.
Always keeping copies consistent and in a general way requires global synchronization on each modification.
Global synchronization precludes large-scale solutions.

Observation
If we can tolerate inconsistencies, we may reduce the need for global synchronization, but tolerating inconsistencies is application dependent.
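
The trade-off between keeping copies consistent and tolerating inconsistency can be made concrete with a toy store. Everything here is an assumption for illustration: the classes `Replica` and `ReplicatedStore`, the method names, and the fact that all copies live in one process, where real propagation would involve network messages.

```python
class Replica:
    def __init__(self):
        self.data = {}

class ReplicatedStore:
    """Toy store with one primary copy and several replicas (all in one process)."""

    def __init__(self, n_replicas):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []   # updates not yet propagated

    def write_sync(self, key, value):
        """Update every copy before returning: consistent, but the cost grows with the number of copies."""
        self.primary.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value

    def write_lazy(self, key, value):
        """Update the primary only; replicas stay stale until propagate() runs."""
        self.primary.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

store = ReplicatedStore(n_replicas=2)
store.write_sync("x", 1)
store.write_lazy("x", 2)
print(store.primary.data["x"], store.replicas[0].data["x"])  # 2 1  (replica is stale)
store.propagate()
print(store.replicas[0].data["x"])                           # 2
```

Whether a reader may see the stale value `1` for a while is precisely the application-dependent question the observation raises.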
Pitfalls

Developing distributed systems: Pitfalls

Observation
Many distributed systems are needlessly complex, caused by mistakes that required patching later on. Many false assumptions are often made.

False (and often hidden) assumptions
The network is reliable
The network is secure
The network is homogeneous
The topology does not change
Latency is zero
Bandwidth is infinite
Transport cost is zero
There is one administrator

Introduction: Types of distributed systems

Three types of distributed systems

High performance distributed computing systems
Distributed information systems
Distributed systems for pervasive computing

High performance distributed computing

Cluster computing

Essentially a group of high-end systems connected through a LAN
Homogeneous: same OS, near-identical hardware
Single managing node

[Figure: a master node (management application, parallel libs, local OS) and three compute nodes (each running a component of the parallel application on its local OS). Network labels: remote access network, standard network, high-speed network.]

Grid computing

The next step: lots of nodes from everywhere
Heterogeneous
Dispersed across several organizations
Can easily span a wide-area network

Note
To allow for collaborations, grids generally use virtual organizations. In essence, this is a grouping of users (or better: their IDs) that will allow for authorization on resource allocation.

Architecture for grid computing

The layers
Fabric: Provides interfaces to local resources (for querying state and capabilities, locking, etc.)
Connectivity: Communication/transaction protocols, e.g., for moving data between resources. Also various authentication protocols.
Resource: Manages a single resource, such as creating processes or reading data.
Collective: Handles access to multiple resources: discovery, scheduling, replication.
Application: Contains actual grid applications in a single organization.

[Figure: layered grid architecture, from top to bottom: Applications; Collective layer; Connectivity layer and Resource layer; Fabric layer.]

Cloud computing

Layer            Contents                                                    Examples                              Service model
Application      Web services, multimedia, business apps                     Google docs, Gmail, YouTube, Flickr   Software as a Service
Platforms        Software framework (Java/Python/.Net), storage (databases)  MS Azure, Google App engine           Platform as a Service
Infrastructure   Computation (VM), storage (block, file)                     Amazon S3, Amazon EC2                 Infrastructure as a Service
Hardware         CPU, memory, disk, bandwidth                                Datacenters


Cloud computing

Make a distinction between four layers
Hardware: Processors, routers, power and cooling systems. Customers normally never get to see these.
Infrastructure: Deploys virtualization techniques. Revolves around allocating and managing virtual storage devices and virtual servers.
Platform: Provides higher-level abstractions for storage and such. Example: Amazon S3 storage system offers an API for (locally created) files to be organized and stored in so-called buckets.
Application: Actual applications, such as office suites (text processors, spreadsheet applications, presentation applications). Comparable to the suite of apps shipped with OSes.

Distributed information systems

Integrating applications

Situation
Organizations were confronted with many networked applications, but achieving interoperability was painful.

Basic approach
A networked application is one that runs on a server making its services available to remote clients. Simple integration: clients combine requests for (different) applications; send that off; collect responses, and present a coherent result to the user.

Next step
Allow direct application-to-application communication, leading to Enterprise Application Integration.

Example EAI: (nested) transactions

Transaction

Primitive            Description
BEGIN TRANSACTION    Mark the start of a transaction
END TRANSACTION      Terminate the transaction and try to commit
ABORT TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

Issue: all-or-nothing

[Figure: a nested transaction made up of two subtransactions, one on an airline database and one on a hotel database: two different (independent) databases.]

Atomic: happens indivisibly (seemingly)
Consistent: does not violate system invariants
Isolated: no mutual interference
Durable: commit means changes are permanent

TPM: Transaction Processing Monitor

[Figure: a client application sends a transaction to the TP monitor; the TP monitor sends requests to several servers, collects their replies, and sends a reply back to the client.]

Observation
In many cases, the data involved in a transaction is distributed across several servers. A TP Monitor is responsible for coordinating the execution of a transaction.
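
The all-or-nothing behavior that a TP monitor coordinates across servers can be sketched in a few lines. This is a single-process toy under stated assumptions, not a real commit protocol: the `Server` class, `run_transaction`, and the airline/hotel stock values are hypothetical, and a snapshot stands in for the log a real system would keep.

```python
class Server:
    """Toy database server holding named counters of available items."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, item):
        if self.stock.get(item, 0) <= 0:
            raise RuntimeError(f"{item} unavailable")
        self.stock[item] -= 1

def run_transaction(steps):
    """Run a list of (server, item) reservations all-or-nothing.

    Taking snapshots plays the role of BEGIN TRANSACTION; restoring them
    plays the role of ABORT TRANSACTION.
    """
    servers = {id(s): s for s, _ in steps}.values()
    snapshots = [(s, dict(s.stock)) for s in servers]
    try:
        for server, item in steps:
            server.reserve(item)
        return "committed"
    except RuntimeError:
        for server, old_stock in snapshots:
            server.stock = old_stock   # restore the old values
        return "aborted"

airline = Server({"flight": 1})
hotel = Server({"room": 0})
print(run_transaction([(airline, "flight"), (hotel, "room")]))  # aborted
print(airline.stock["flight"])                                   # 1 (reservation undone)
hotel.stock["room"] = 1
print(run_transaction([(airline, "flight"), (hotel, "room")]))  # committed
```

The flight reservation succeeds first but is rolled back when the hotel subtransaction fails, so the two independent databases never end up half-updated.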

Middleware and EAI

[Figure: client applications and server-side applications connected through communication middleware.]

Middleware offers communication facilities for integration
Remote Procedure Call (RPC): Requests are sent through local procedure call, packaged as message, processed, responded through message, and result returned as return from call.
Message Oriented Middleware (MOM): Messages are sent to logical contact point (published), and forwarded to subscribed applications.

Pervasive systems

Distributed pervasive systems

Observation
Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment.

Three (overlapping) subtypes
Ubiquitous computing systems: pervasive and continuously present, i.e., there is a continuous interaction between system and user.
Mobile computing systems: pervasive, but emphasis is on the fact that devices are inherently mobile.
Sensor (and actuator) networks: pervasive, with emphasis on the actual (collaborative) sensing and actuation of the environment.
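
Returning to message-oriented middleware (MOM) as described under Middleware and EAI, the publish/subscribe idea can be sketched as below. The `MessageBroker` class, the topic name `orders`, and the two in-memory inboxes are assumptions for illustration; an actual MOM product would queue messages and deliver them across the network.

```python
from collections import defaultdict

class MessageBroker:
    """Logical contact point: publishers send to a topic, subscribers receive from it."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Forward the message to every subscribed application; return how many received it."""
        for handler in self._subscribers[topic]:
            handler(message)
        return len(self._subscribers[topic])

broker = MessageBroker()
inbox_billing, inbox_shipping = [], []          # hypothetical subscribing applications
broker.subscribe("orders", inbox_billing.append)
broker.subscribe("orders", inbox_shipping.append)
delivered = broker.publish("orders", {"order_id": 1})
print(delivered, inbox_billing, inbox_shipping)
# 2 [{'order_id': 1}] [{'order_id': 1}]
```

Unlike RPC, the publisher does not name a receiver or wait for a result; it only knows the contact point.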
Ubiquitous systems

Core elements
1 (Distribution) Devices are networked, distributed, and accessible in a transparent manner
2 (Interaction) Interaction between users and devices is highly unobtrusive
3 (Context awareness) The system is aware of a user’s context in order to optimize interaction
4 (Autonomy) Devices operate autonomously without human intervention, and are thus highly self-managed
5 (Intelligence) The system as a whole can handle a wide range of dynamic actions and interactions

Mobile computing

Distinctive features
A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges).
Mobile implies that a device’s location is expected to change over time ⇒ change of local services, reachability, etc. Keyword: discovery.
Communication may become more difficult: no stable route, but also perhaps no guaranteed connectivity ⇒ disruption-tolerant networking.

Sensor networks

Characteristics
The nodes to which sensors are attached are:
Many (10s-1000s)
Simple (small memory/compute/communication capacity)
Often battery-powered (or even battery-less)

Sensor networks as distributed databases

Two extremes
[Figure, first extreme: sensor data is sent directly from the sensor network to the operator's site.]
[Figure, second extreme: each sensor can process and store data; the operator's site sends a query into the sensor network and sensors send only answers.]
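
The two extremes can be compared with a small sketch that computes an average reading both ways and counts sensor-to-operator messages. The function names, the sample readings, and the choice of "average" as the query are assumptions for illustration; the query message itself is not counted.

```python
def centralized_average(readings_per_sensor):
    """First extreme: every reading is sent to the operator, who computes the answer."""
    all_readings = [r for readings in readings_per_sensor for r in readings]
    messages = len(all_readings)            # one message per raw reading
    return sum(all_readings) / len(all_readings), messages

def in_network_average(readings_per_sensor):
    """Second extreme: each sensor stores and processes its data and answers with (sum, count) only."""
    answers = [(sum(readings), len(readings)) for readings in readings_per_sensor]
    messages = len(answers)                 # one answer per sensor
    total = sum(s for s, _ in answers)
    count = sum(c for _, c in answers)
    return total / count, messages

readings = [[20.0, 21.0, 22.0], [19.0, 20.0], [23.0]]   # hypothetical temperature samples
avg_central, msgs_central = centralized_average(readings)
avg_local, msgs_local = in_network_average(readings)
print(msgs_central, msgs_local)   # 6 3
```

Both approaches return the same average, but the second sends fewer messages, which matters for simple, battery-powered nodes.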
