Slides.01 Distributed System
[Figure: centralized, decentralized, and distributed organizations]
Alternative approach
Two views on realizing distributed systems
• Integrative view: connecting existing networked computer systems into a larger system.
• Expansive view: extending an existing networked computer system with additional computers.
Two definitions
• A decentralized system is a networked computer system in which
processes and resources are necessarily spread across multiple
computers.
• A distributed system is a networked computer system in which
processes and resources are sufficiently spread across multiple
computers.
Important
There are many poorly founded misconceptions regarding scalability, fault
tolerance, security, etc. We need to develop skills by which distributed
systems can be readily understood, so that we can judge such misconceptions.
Sharing resources
Canonical examples
• Cloud-based shared storage and files
• Peer-to-peer assisted multimedia streaming
• Shared mail services (think of outsourced mail systems)
• Shared Web hosting (think of content distribution networks)
Observation
“The network is the computer”
(quote from John Gage, then at Sun Microsystems)
Resource sharing
Introduction Design goals
Distribution transparency
What is transparency?
The phenomenon by which a distributed system attempts to hide the fact
that its processes and resources are physically distributed across multiple
computers, possibly separated by large distances.
Observation
Distribution transparency is handled through many different techniques in
a layer between applications and operating systems: a middleware layer.
Types
Transparency   Description
Access         Hide differences in data representation and how an object is accessed
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide the failure and recovery of an object
Degree of transparency
Aiming at full distribution transparency may be too much
• There are communication latencies that cannot be hidden
• Completely hiding failures of networks and nodes is (theoretically
and practically) impossible
• You cannot distinguish a slow computer from a failing one
• You can never be sure that a server actually performed an
operation before a crash
• Full transparency will cost performance, exposing distribution of the
system
• Keeping replicas exactly up-to-date with the master takes time
• Immediately flushing write operations to disk for fault tolerance takes time
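The point that a slow computer cannot be told apart from a failing one can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: a client waits at most `timeout` seconds for a reply, network delay is modeled as a plain number rather than measured, and the `Reply` type and `observe` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Hypothetical model of a server's reply as seen by a client."""
    delay: Optional[float]  # seconds until the reply arrives; None = never (crashed)


def observe(reply: Reply, timeout: float) -> str:
    """What a client can conclude after waiting at most `timeout` seconds."""
    if reply.delay is not None and reply.delay <= timeout:
        return "alive"
    # No reply within the timeout: the server may be slow or may have crashed.
    return "suspected"


slow_server = Reply(delay=5.0)      # alive, but overloaded
crashed_server = Reply(delay=None)  # will never answer

# With a 2-second timeout, both servers produce exactly the same observation.
print(observe(slow_server, 2.0), observe(crashed_server, 2.0))
```

Raising the timeout only shifts the problem: the slow server is then recognized as alive, but a real crash is detected later.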
Exposing distribution may be good
• Making use of location-based services (finding your nearby friends)
• When dealing with users in different time zones
• When it makes it easier for a user to understand what’s going on
(e.g., when a server does not respond for a long time, report it as
failing).
Conclusion
Distribution transparency is a nice goal, but achieving it is a different story,
and it should often not even be aimed at.
Openness
On strict separation
Observation
The stricter the separation between policy and mechanism, the more we
need to ensure proper mechanisms, potentially leading to many configuration
parameters and complex management.
Finding a balance
Hard-coding policies often simplifies management and reduces complexity,
at the price of less flexibility. There is no obvious solution.
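Separating policy from mechanism can be illustrated with a cache, a common textbook example (not taken from these slides). In this minimal sketch the mechanism is a bounded key-value store and the policy is a pluggable function that picks which entry to evict; the `Cache` class and the `lru_policy`/`mru_policy` helpers are hypothetical names, and capacity is assumed to be at least 1.

```python
from collections import OrderedDict
from collections.abc import Hashable
from typing import Callable, List, Optional

# Policy: given keys ordered from least to most recently used, pick one to evict.
EvictionPolicy = Callable[[List[Hashable]], Hashable]


def lru_policy(keys_oldest_first: List[Hashable]) -> Hashable:
    return keys_oldest_first[0]   # evict the least recently used key


def mru_policy(keys_oldest_first: List[Hashable]) -> Hashable:
    return keys_oldest_first[-1]  # evict the most recently used key


class Cache:
    """Mechanism: a bounded key-value store. Policy: which entry to evict."""

    def __init__(self, capacity: int, evict: EvictionPolicy = lru_policy):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.evict = evict
        self.data: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # record the access (most recent at the end)
        return self.data[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            victim = self.evict(list(self.data.keys()))  # delegate the decision
            del self.data[victim]
        self.data[key] = value
```

Passing `mru_policy` instead of the default changes behavior without touching the mechanism; the flip side, as the observation above notes, is one more parameter that somebody has to configure and manage.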
Dependability
Basics
A component provides services to clients. To provide services, the
component may require services from other components ⇒ a component
may depend on some other component.
Specifically
A component C depends on C∗ if the correctness of C’s behavior depends
on the correctness of C∗’s behavior. (Components are processes or
channels.)
Requirements related to dependability
Requirement      Description
Availability     Readiness for usage
Reliability      Continuity of service delivery
Safety           Very low probability of catastrophes
Maintainability  How easily a failed system can be repaired
Traditional metrics
• Mean Time To Failure (MTTF): The average time until a component fails.
• Mean Time To Repair (MTTR): The average time needed to repair
a component.
• Mean Time Between Failures (MTBF): Simply MTTF + MTTR.
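A worked example of these metrics follows. It also uses the standard definition of availability as the fraction of time a component is up, A = MTTF / (MTTF + MTTR) = MTTF / MTBF, which is not stated on this slide and is added here as a common companion formula. The component figures are hypothetical; any consistent time unit works.

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean Time Between Failures: one failure-free period plus one repair."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Fraction of time the component is up: MTTF / MTBF."""
    return mttf / mtbf(mttf, mttr)


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
print(mtbf(999.0, 1.0))          # 1000.0
print(availability(999.0, 1.0))  # 0.999
```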
Terminology
Failure, error, fault
Term     Description                                          Example
Failure  A component is not living up to its specifications   Crashed program
Error    Part of a component that can lead to a failure       Programming bug
Fault    Cause of an error                                    Sloppy programmer
Handling faults
On security
Observation
A distributed system that is not secure is not dependable.
What we need
• Confidentiality: information is disclosed only to authorized parties
• Integrity: alterations to assets of a system can be made only in an
authorized way
Security mechanisms
Keeping it simple
It’s all about encrypting and decrypting data using security
keys.
Notation
$K(data)$ denotes that we use key $K$ to encrypt/decrypt data.
Symmetric cryptosystem
With encryption key $E_K(data)$ and decryption key $D_K(data)$:
if $data = D_K(E_K(data))$ then $D_K = E_K$. Note: the encryption and decryption keys
are the same and should be kept secret.
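To see that one shared key both encrypts and decrypts, consider a toy one-time pad: XOR is its own inverse, so $E_K$ and $D_K$ are literally the same operation with the same key. This is an illustration only, assuming a random key at least as long as the message that is never reused; it provides no integrity protection, and real systems use vetted ciphers from a cryptographic library. The helper name `xor_with_key` is hypothetical.

```python
import secrets


def xor_with_key(key: bytes, data: bytes) -> bytes:
    """Toy one-time-pad step: XOR each data byte with the matching key byte."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(k ^ d for k, d in zip(key, data))


# In this toy cipher, encryption and decryption are the same operation with key K.
encrypt = xor_with_key  # E_K
decrypt = xor_with_key  # D_K

key = secrets.token_bytes(32)           # shared secret; must never be reused
message = b"transfer 100 EUR to Alice"
ciphertext = encrypt(key, message)
assert decrypt(key, ciphertext) == message  # data == D_K(E_K(data))
```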
Asymmetric cryptosystem
Distinguish a public key $PK(data)$ and a private (secret) key $SK(data)$.
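One way to see the roles of $PK$ and $SK$ is textbook RSA with tiny primes (the widely used p = 61, q = 53 example). RSA is swapped in here as a concrete asymmetric scheme; at this key size and without padding it is completely insecure and serves only as an illustration. Anyone can encrypt with the public key, but only the holder of the secret key can decrypt; conversely, only the secret-key holder can sign, and anyone can verify with the public key. `apply_key` is a hypothetical helper, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                 # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent:  PK = (e, n)
d = pow(e, -1, phi)       # secret exponent:  SK = (d, n), the inverse of e mod phi


def apply_key(exponent: int, modulus: int, value: int) -> int:
    """Apply a key to an integer in [0, modulus)."""
    if not 0 <= value < modulus:
        raise ValueError("value must be in [0, n)")
    return pow(value, exponent, modulus)


data = 65
ciphertext = apply_key(e, n, data)             # anyone can encrypt with PK
recovered = apply_key(d, n, ciphertext)        # only the SK holder can decrypt
signature = apply_key(d, n, data)              # only the SK holder can sign
verified = apply_key(e, n, signature) == data  # anyone can verify with PK
```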
Secure hashing
In practice, we use secure hash functions: $H(data)$ returns a fixed-length
string.
• Any change from $data$ to $data^*$ will lead to a completely different
string $H(data^*)$.
• Given a hash value $h$, it is computationally infeasible to find $data$
such that $h = H(data)$.
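The first property can be observed with Python's standard `hashlib`, here assuming SHA-256 as the secure hash function $H$ (the slide does not name one). The sketch shows the fixed-length output and how a one-character change yields an unrelated digest; the second property, preimage resistance, cannot be demonstrated by running code, only relied upon.

```python
import hashlib


def H(data: bytes) -> str:
    """SHA-256 as the secure hash function H; returns 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


data = b"pay Bob 10 EUR"
data_star = b"pay Bob 11 EUR"  # a one-character change

h1, h2 = H(data), H(data_star)
print(len(h1), len(h2))  # 64 64: fixed length regardless of input
print(h1 == h2)          # False
```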
Scalability
Observation
Most systems account for size scalability only to a certain extent. A common
solution is to have multiple powerful servers operating independently in parallel.
Today, the challenge still lies in geographical and administrative scalability.
Size scalability
Root causes for scalability problems with centralized solutions
• The computational capacity, limited by the CPUs
• The storage capacity, including the transfer rate between CPUs and
disks
• The network between the user and the centralized service
Formal analysis
A centralized service can be modeled as a simple queuing system
Utilization U of a service is the fraction of time that it is busy
Average throughput
Response time: total time taken to process a request after submission
Observations
• If U is small, the response-to-service time ratio is close to 1: a request
is processed immediately
• If U goes up to 1, the system comes to a grinding halt. Solution: decrease
S, the service time.
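These observations match the standard result for a single-server queue with Poisson arrivals and exponentially distributed service times (an M/M/1 model, assumed here): with arrival rate λ and mean service time S, utilization is U = λ·S and the mean response time is R = S / (1 − U), so R/S = 1/(1 − U). The arrival rates and the 10 ms service time below are hypothetical.

```python
def utilization(arrival_rate: float, service_time: float) -> float:
    """U = lambda * S: fraction of time the single server is busy."""
    return arrival_rate * service_time


def response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time R = S / (1 - U) for an M/M/1 queue."""
    u = utilization(arrival_rate, service_time)
    if u >= 1:
        raise ValueError("U >= 1: the queue grows without bound")
    return service_time / (1 - u)


S = 0.010  # 10 ms mean service time (hypothetical)
for rate in (10, 50, 90, 99):  # requests per second
    ratio = response_time(rate, S) / S
    print(f"U = {utilization(rate, S):.2f}  R/S = {ratio:.1f}")
```

The ratio is about 2 at U = 0.5, about 10 at U = 0.9, and explodes as U approaches 1. Decreasing S lowers U for the same load, which is why it is the remedy.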
Examples
• Computational grids: share expensive resources between different
domains.
• Shared equipment: how to control, manage, and use a shared
radio telescope constructed as a large-scale shared sensor network?
Observation
If we can tolerate inconsistencies, we may reduce the need for global
synchronization, but tolerating inconsistencies is application dependent.
Introduction A simple classification of distributed systems
Parallel computing
Observation
High-performance distributed computing started with parallel computing
Problem
The performance of distributed shared memory could never compete with that
of multiprocessors, and it failed to meet the expectations of programmers. It
has by now been widely abandoned.
Cluster computing
Essentially a group of high-end systems connected through a LAN
• Homogeneous: same OS, near-identical hardware
• Single, or tightly coupled managing node(s)
Grid computing
The next step: plenty of nodes from everywhere
• Heterogeneous
• Dispersed across several organizations
• Can easily span a wide-area network
Note
To allow for collaborations, grids generally use virtual organizations. In
essence, this is a grouping of users (or better: their IDs) that allows
for authorization on resource allocation.
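A virtual organization, as described above, is essentially a named grouping of user IDs that drives authorization decisions. The following minimal sketch assumes that a VO holds a set of member IDs (which may come from different organizations) and a set of resources it may allocate; the class, its fields, and every ID and resource name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VirtualOrganization:
    """Hypothetical VO: a set of user IDs plus the resources it may allocate."""
    name: str
    member_ids: Set[str] = field(default_factory=set)  # IDs from any organization
    resources: Set[str] = field(default_factory=set)   # resources granted to the VO

    def may_allocate(self, user_id: str, resource: str) -> bool:
        """Authorize an allocation only for members, and only for VO resources."""
        return user_id in self.member_ids and resource in self.resources


climate = VirtualOrganization(
    "climate-sim",
    member_ids={"alice@uni-a.example", "bob@lab-b.example"},
    resources={"cluster-a/cpu", "lab-b/storage"},
)
```

Here a user from one organization can be authorized to use a resource owned by another, purely by virtue of VO membership.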
Integrating applications
Situation
Organizations were confronted with many networked applications, but
achieving interoperability was painful.
Basic approach
A networked application is one that runs on a server making its services
available to remote clients. Simple integration: clients combine requests
for (different) applications; send that off; collect responses, and present a
coherent result to the user.
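The simple client-side integration just described can be sketched as follows: send requests to several applications, collect the responses, and merge them into one result. The two services are hypothetical and stubbed as local functions so the sketch runs on its own; in a real deployment each would be a remote call to a different server. Requests are issued concurrently with the standard `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical networked applications, stubbed as local functions.
def inventory_service(item: str) -> dict:
    return {"item": item, "in_stock": 3}


def pricing_service(item: str) -> dict:
    return {"item": item, "price_eur": 12.5}


def integrated_view(item: str) -> dict:
    """Client-side integration: fan out requests, collect replies, combine them."""
    services = {"inventory": inventory_service, "pricing": pricing_service}
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        # Send the requests to the different applications concurrently.
        futures = {name: pool.submit(fn, item) for name, fn in services.items()}
        # Collect the responses; result() blocks until each one is available.
        replies = {name: f.result() for name, f in futures.items()}
    # Present one coherent result to the user.
    return {
        "item": item,
        "in_stock": replies["inventory"]["in_stock"],
        "price_eur": replies["pricing"]["price_eur"],
    }


print(integrated_view("widget"))
```

All integration logic lives in the client here; the next step moves it into direct application-to-application communication.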
Next step
Allow direct application-to-application communication, leading to Enterprise
Application Integration.
Issue: all-or-nothing
• Atomic: happens indivisibly (seemingly)
• Consistent: does not violate system invariants
• Isolated: no mutual interference
• Durable: commit means changes are permanent
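The all-or-nothing behavior can be observed in a single local database using Python's standard `sqlite3` module, chosen here only as a convenient stand-in. A `CHECK` constraint encodes a system invariant (no negative balances); a transfer that would violate it fails, and the transaction rolls back the credit that was already applied, so no partial update survives. The table layout and the `transfer`/`balances` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes a system invariant: no negative balances.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the debit would break the invariant
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


print(transfer(conn, "alice", "bob", 150), balances(conn))  # credit to bob is undone
print(transfer(conn, "alice", "bob", 30), balances(conn))
```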
Observation
Often, the data involved in a transaction is distributed across several servers.
A TP Monitor is responsible for coordinating the execution of a transaction.
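The slides do not say how a TP monitor coordinates a distributed transaction. One common technique is two-phase commit: the coordinator first asks every participant to vote, then commits everywhere only on a unanimous yes and aborts everywhere otherwise. The sketch below is a simplified illustration of that idea under strong assumptions: it ignores crashes, timeouts, and the logging that a real protocol needs, and the `Participant` class and `coordinate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical server holding part of a transaction's data."""
    name: str
    can_commit: bool = True  # would its local part of the transaction succeed?
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1 vote: "yes" means the participant promises it can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def coordinate(participants: List[Participant]) -> str:
    """Coordinator role: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:      # phase 2: global abort
        p.abort()
    return "aborted"


servers = [Participant("orders"), Participant("stock", can_commit=False), Participant("billing")]
print(coordinate(servers), [p.state for p in servers])
```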
Pervasive systems
Ubiquitous systems
Core elements
1. (Distribution) Devices are networked, distributed, and accessible
transparently
2. (Interaction) Interaction between users and devices is highly
unobtrusive
3. (Context awareness) The system is aware of a user’s context to optimize
interaction
4. (Autonomy) Devices operate autonomously without human
intervention, and are thus highly self-managed
5. (Intelligence) The system as a whole can handle a wide range of
dynamic actions and interactions
Mobile computing
Distinctive features
• A myriad of different mobile devices (smartphones, tablets, GPS
devices, remote controls, active badges).
• Mobile implies that a device’s location is expected to change over time
⇒ change of local services, reachability, etc. Keyword: discovery.
• Maintaining stable communication can introduce serious problems.
• For a long time, research has focused on directly sharing
resources between mobile devices. It never became popular and is
by now considered to be a fruitless path for research.
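Discovery can be pictured as a lookup that a device repeats as its location changes: the set of local services it can reach depends on where it currently is. This minimal sketch assumes a hypothetical registry of stationary services with fixed planar positions and a simple distance cutoff for reachability; all service names, coordinates, and the radius are illustrative. `math.dist` requires Python 3.8 or later.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Service:
    """Hypothetical stationary service advertised at a fixed position."""
    name: str
    position: Tuple[float, float]  # planar coordinates in meters


def discover(registry: List[Service], here: Tuple[float, float], radius: float) -> List[str]:
    """Return the names of services within `radius` meters of the device."""
    return sorted(s.name for s in registry if math.dist(s.position, here) <= radius)


registry = [
    Service("printer-3rd-floor", (0.0, 0.0)),
    Service("coffee-machine", (120.0, 40.0)),
    Service("projector-room-12", (5.0, 8.0)),
]

# As the device moves, the set of reachable local services changes.
print(discover(registry, (2.0, 3.0), 20.0))
print(discover(registry, (115.0, 35.0), 20.0))
```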
Bottomline
Mobile devices set up connections to stationary servers, essentially
putting mobile computing in the position of clients of cloud-based services.
Sensor networks
Characteristics
The nodes to which sensors are attached are:
• Many (10s-1000s)
• Simple (small memory/compute/communication capacity)
• Often battery-powered (or even battery-less)
Introduction Pitfalls