Distributed System
PRASHASTI KANIKAR 2
Definitions
• Tanenbaum: “A distributed system is a collection of
independent computers that appears to its users as a
single coherent system”.
Introduction contd…
• Two aspects of the definition:
– Hardware: autonomous computers, network links
– Software: communication protocols, system and
application software.
Introduction contd…
Figure: centralized computing vs. distributed computing (components: terminal, mainframe computer, workstation, network link, network host)
Introduction contd…
• Examples of Distributed systems:
– Network of workstations (NOW): a group of
networked personal workstations connected to one or
more server machines.
– The Internet
Introduction contd…
• Why study distributed systems?
– Economics: distributed systems allow the pooling of
resources, including CPU cycles, data storage, input/output
devices, and services.
– Reliability: a distributed system allows replication of
resources and/or services, so the failure of one machine need
not stop the whole system.
– Speed: a distributed system may have more total computing power than a
mainframe.
– Inherent distribution: some applications are inherently distributed, e.g., a
supermarket chain.
– Incremental growth: computing power can be added in small
increments (modular expandability).
– Another driving force: the large number of personal computers in use and
the need for people to collaborate and share information.
Parallel vs Distributed Systems
Parallel Systems Distributed Systems
Goals of Distributed Systems
– Transparency
– Openness
– Scalability
Goals of Distributed Systems contd…
1. Connecting Users and Resources:
– Share resources
– Problem: security (shared resources must be protected against unauthorized access)
Goals of Distributed Systems contd…
2. Transparency:
Transparency | Description
Relocation | Hide that a resource may be moved to another location while in use
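Relocation transparency is commonly obtained through a naming layer: clients refer to a resource only by a logical name and resolve that name on every access, so the resource can move while it is in use. The sketch below assumes a hypothetical in-process `NameService` and `Client`, with plain dictionaries standing in for servers; a real system would use a distributed naming service.

```python
class NameService:
    """Hypothetical registry mapping logical names to current locations."""

    def __init__(self):
        self._table = {}

    def register(self, name, address):
        self._table[name] = address

    def resolve(self, name):
        return self._table[name]


class Client:
    """Client that knows only logical names, never physical addresses."""

    def __init__(self, names, servers):
        self.names = names
        self.servers = servers  # address -> dict acting as the resource store

    def read(self, name, key):
        # Resolve on every access, so a relocation is picked up transparently.
        address = self.names.resolve(name)
        return self.servers[address][key]


servers = {"hostA:9000": {"greeting": "hello"}, "hostB:9000": {}}
names = NameService()
names.register("files/shared", "hostA:9000")
client = Client(names, servers)
print(client.read("files/shared", "greeting"))  # hello

# Relocate the resource while in use: copy its state, then update the name.
servers["hostB:9000"].update(servers["hostA:9000"])
servers["hostA:9000"].clear()
names.register("files/shared", "hostB:9000")
print(client.read("files/shared", "greeting"))  # hello, now served by hostB
```

The client code never changes when the resource moves; only the name-to-address binding does.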
3. Openness:
– Interoperability
– Portability
– Flexibility
Goals of Distributed Systems contd…
4. Scalability:
– Size scalability
• Can add more users and resources
– Geographical scalability
• Can spread across different geographical areas
– Administrative scalability
• Remains manageable even if it spans many
independent organizations
Goals of Distributed Systems contd…
Scalability Problems
Scaling w.r.t. Size:
Concept Example
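Geographical Scaling:
– Form of communication: synchronous (the client blocks until the server
replies, which works on a LAN but is slow across wide-area latencies)
– Communication is inherently unreliable and point-to-point.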
Goals of Distributed Systems contd…
• Scalability Problems
• Administrative Scaling:
– Conflicting policies
– If a DS expands into another domain, two types of
security measures need to be taken:
• DS has to protect itself against malicious attacks
from the new domain.
• The new domain has to protect itself against
malicious attacks from DS.
Goals of Distributed Systems contd…
• Scaling Techniques:
– Hiding communication latencies
– Distribution
– Replication
Goals of Distributed Systems contd…
• Scaling Techniques:
– Hiding communication latencies
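One way to hide communication latency is asynchronous communication: instead of blocking on each reply, the client issues its requests and does useful work while the replies are in flight. The sketch below is an illustration under assumptions: `remote_call` is hypothetical, and `asyncio.sleep` merely simulates network latency.

```python
import asyncio
import time

async def remote_call(request, latency=0.1):
    """Simulated remote request; the sleep stands in for network latency."""
    await asyncio.sleep(latency)
    return f"reply to {request}"

async def one_at_a_time(requests):
    # Synchronous style: wait for each reply before sending the next request.
    return [await remote_call(r) for r in requests]

async def overlapped(requests):
    # Asynchronous style: send every request, then collect the replies.
    # While waiting, the event loop is free to run other work.
    return await asyncio.gather(*(remote_call(r) for r in requests))

def timed(strategy, requests):
    start = time.perf_counter()
    replies = asyncio.run(strategy(requests))
    return replies, time.perf_counter() - start

reqs = ["a", "b", "c", "d"]
seq_replies, seq_time = timed(one_at_a_time, reqs)
ovl_replies, ovl_time = timed(overlapped, reqs)
print(f"one at a time: {seq_time:.2f}s, overlapped: {ovl_time:.2f}s")
```

Both strategies return the same replies; the overlapped version pays roughly one round-trip of latency instead of one per request.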
Goals of Distributed Systems contd…
• Scaling Techniques:
– Distribution
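Distribution means splitting a component into smaller parts and spreading them across the system, so that no single server holds all the data or handles all the requests. The sketch below uses hash partitioning, one common technique (swapped in here for illustration, not taken from the slide); the server names and the `owner`/`put`/`get` helpers are hypothetical.

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names

def owner(key, servers=SERVERS):
    """Pick the server responsible for a key by hashing it (static partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

# Each server stores only its own partition of the data.
stores = {s: {} for s in SERVERS}

def put(key, value):
    stores[owner(key)][key] = value

def get(key):
    return stores[owner(key)].get(key)

for i in range(9):
    put(f"user{i}", i)
print({s: sorted(d) for s, d in stores.items()})
```

With plain modulo hashing, adding or removing a server remaps most keys; consistent hashing is the usual remedy when the server set changes often.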
Goals of Distributed Systems contd…
• Scaling Techniques:
– Replication
• Increases availability, balances load
• Caching: a special form of replication
• Drawback: leads to inconsistency problems
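The inconsistency drawback appears as soon as updates reach replicas later than the primary copy. The sketch below assumes a simple primary/replica scheme with lazy propagation (the `LazilyReplicatedStore` class is hypothetical): reads are spread over replicas to balance load, and a read issued before `sync()` returns a stale value.

```python
class Replica:
    def __init__(self):
        self.data = {}

class LazilyReplicatedStore:
    """Primary applies writes at once; replicas are updated only on sync()."""

    def __init__(self, n_replicas=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # writes not yet propagated to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Reads go to a replica (load balancing), which may be out of date.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()

store = LazilyReplicatedStore()
store.write("x", 1)
store.sync()
store.write("x", 2)
print(store.read("x", 0))  # 1: stale, the update has not been propagated
store.sync()
print(store.read("x", 0))  # 2
```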
Hardware Concepts
• Multiprocessors vs. Multicomputers
Hardware Concepts contd…
• Multiprocessors
– Property: all the CPUs have direct access to the
shared memory.
Figure: A bus-based multiprocessor.
Hardware Concepts contd…
• Multiprocessors
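– Problem: the bus is easily overloaded as more CPUs are added
– Solution: add a high-speed cache memory between each CPU and the bus
– New problem: a cache may hold a stale copy of data that another CPU
has written (cache coherence)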
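One classic answer to the cache problem is a write-through cache combined with snooping: every write goes to memory over the bus, and every other cache watching the bus invalidates its own copy. The sketch below is a simplified simulation under that assumption (the `Bus` and `Cache` classes are hypothetical, and real protocols such as MESI are more elaborate); it is not necessarily the scheme intended on the slide.

```python
class Bus:
    """Shared bus connecting CPU caches to memory; every cache snoops writes."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value              # write-through to memory
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)    # snooping cache drops its copy

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch over the bus
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu1.read(100)          # cpu1 caches address 100 (value 0)
cpu0.write(100, 42)     # write-through; cpu1's copy is invalidated
print(cpu1.read(100))   # 42, re-fetched instead of the stale 0
```

Note that every write still uses the bus, so write-through trades some bus traffic for a simple coherence rule.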
Hardware Concepts contd…
• Multiprocessors
Figure: a) A crossbar switch
b) An omega switching network
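The two switching networks differ in cost: an n x n crossbar needs n² crosspoints, while an omega network uses log2(n) stages of n/2 two-by-two switches. The sketch below also shows destination-tag routing through an omega network, assuming n is a power of two and a perfect-shuffle wiring between stages; the function names are illustrative only.

```python
import math

def crossbar_crosspoints(n):
    """An n x n crossbar needs a crosspoint at every input/output intersection."""
    return n * n

def omega_switches(n):
    """An n x n omega network has log2(n) stages of n/2 two-by-two switches."""
    return (n // 2) * int(math.log2(n))

def omega_route(src, dst, n):
    """Destination-tag routing: return the line index after each stage."""
    stages = int(math.log2(n))
    mask = n - 1
    pos, path = src, []
    for i in range(stages):
        # Perfect shuffle between stages = rotate the address left by one bit.
        pos = ((pos << 1) | (pos >> (stages - 1))) & mask
        # The 2x2 switch sends the message up (0) or down (1) according to
        # the next destination bit, most significant bit first.
        bit = (dst >> (stages - 1 - i)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path

print(crossbar_crosspoints(8), omega_switches(8))  # 64 12
print(omega_route(src=2, dst=5, n=8))              # [5, 2, 5]
```

After the last stage the message always sits on line `dst`, which is why no routing tables are needed; the price is that two messages can compete for the same switch.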
Hardware Concepts contd…
• Homogeneous Multicomputer systems
– Each CPU has direct connection to its own local
memory.
– Also referred to as System Area Networks (SANs)
– Bus-based and switch-based
– Two popular topologies: Grid and Hypercube
Hardware Concepts contd…
• Homogeneous Multicomputer systems
Figure: a) Grid
b) Hypercube
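The two topologies can be compared through their neighbor relations. In a d-dimensional hypercube, node addresses are d-bit numbers and two nodes are linked exactly when their addresses differ in one bit, so the shortest path length equals the number of differing bits; in a 2-D grid each node links only to its adjacent nodes. The helper names below are illustrative.

```python
def hypercube_neighbors(node, dim):
    """In a dim-dimensional hypercube, neighbors differ in exactly one address bit."""
    return [node ^ (1 << i) for i in range(dim)]

def hypercube_hops(a, b):
    """Minimum number of hops = number of differing bits (Hamming distance)."""
    return bin(a ^ b).count("1")

def grid_neighbors(row, col, rows, cols):
    """In a 2-D grid (mesh) each node links to up to four adjacent nodes."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

print(hypercube_neighbors(0b0101, 4))   # [4, 7, 1, 13]
print(hypercube_hops(0b0000, 0b1111))   # 4
print(grid_neighbors(0, 0, 4, 4))       # [(1, 0), (0, 1)]
```

For 16 nodes, the worst case is 4 hops in a 4-D hypercube versus 6 hops in a 4 x 4 grid, at the cost of more links per node.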
Hardware Concepts contd…
• Heterogeneous Multicomputer systems
– Computers may vary w.r.t. processor type,
memory size and I/O bandwidth.
– Varying interconnection networks
Software Concepts
• Distributed systems are very much like operating systems:
– They act as resource managers
– They hide the heterogeneous nature of the underlying
H/W
Two categories:
tightly-coupled systems: DOS (distributed operating system)
loosely-coupled systems: NOS (network operating system)
In addition: middleware
Software Concepts
System Description Main Goal
Software Concepts contd…
• Multiprocessor Operating systems:
– Goal: make the number of CPUs transparent to the
application
– Idea: protect shared data against simultaneous access
– Mechanisms: semaphores and monitors
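Both mechanisms can be sketched with Python's `threading` module, used here only as a stand-in for a multiprocessor OS. Python has no built-in monitor construct, so the `Counter` class imitates one: a single lock guards every access to its state. The counting semaphore limits how many threads may be inside a section at once; the class and variable names are illustrative.

```python
import threading

class Counter:
    """Monitor-style object: one lock guards all access to the shared state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:          # only one thread inside at a time
            self.value += 1

counter = Counter()
slots = threading.Semaphore(2)    # at most 2 threads in the guarded section
stats = {"active": 0, "max_active": 0}
stats_lock = threading.Lock()

def worker():
    with slots:
        with stats_lock:
            stats["active"] += 1
            stats["max_active"] = max(stats["max_active"], stats["active"])
        for _ in range(1000):
            counter.increment()
        with stats_lock:
            stats["active"] -= 1

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value, stats["max_active"])  # 8000, and at most 2
```

Without the lock, concurrent `+= 1` operations could interleave and lose updates.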
Software Concepts contd…
• Multicomputer Operating systems:
Software Concepts contd…
• Network Operating systems:
Software Concepts contd…
• Middleware:
Role of Middleware (MW)
In some early research systems, MW tried to provide
the illusion that a collection of separate machines was
a single computer.
E.g., the NOW project: GLUNIX middleware
Today:
clustering software allows independent computers to
work together closely;
MW also supports seamless access to remote services,
but does not try to look like a general-purpose OS.
Software Concepts contd…
• Middleware Models:
Software Concepts contd…
• Middleware Services:
– Communication: implements access transparency
– Naming: allows entities to be shared and looked
up
– Persistence: storage by means of distributed
file systems or integrated databases
– Distributed transactions: allow multiple read
and write operations to occur atomically
– Security
Software Concepts
Item | Distributed OS (multiproc.) | Distributed OS (multicomp.) | Network OS | Middleware-based OS
Degree of transparency | Very high | High | Low | High
Same OS on all nodes? | Yes | Yes | No | No
Number of copies of OS | 1 | N | N | N
Basis for communication | Shared memory | Messages | Files | Model specific
Resource management | Global, central | Global, distributed | Per node | Per node
Scalability | No | Moderately | Yes | Varies
Openness | Closed | Closed | Open | Open
The Client-Server Model
• Server: process implementing a specific service, e.g., a file
system service
• Client: process that requests a service from a server
• Client-server interaction is known as request-reply behavior
• Security
– Resources to be secured
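The request-reply behavior can be sketched with plain TCP sockets: the client sends a request and blocks until the server's reply arrives. This is a minimal illustration under assumptions: the server handles a single request, the "service" just upper-cases the request text, and a single `recv` is enough only because the messages are tiny (real protocols frame their messages).

```python
import socket
import threading

def serve_once(server_sock):
    """Server: accept one connection, read a request, send back a reply."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(f"REPLY:{request.upper()}".encode())

def send_request(host, port, message):
    """Client: send a request and block until the reply arrives."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode())
        return sock.recv(1024).decode()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve_once, args=(server,))
t.start()
reply = send_request("127.0.0.1", port, "read file.txt")
t.join()
server.close()
print(reply)  # REPLY:READ FILE.TXT
```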
Advantages and Disadvantages of Distributed
Systems
• Advantages:
– The affordability of computers and availability of
network access
– Resource sharing
– Scalability
– Fault Tolerance
• Disadvantages:
– Multiple points of failure
– Security concerns
Types of Distributed Systems
• Distributed Computing Systems
– Clusters
– Grids
– Clouds
• Distributed Information Systems
– Transaction Processing Systems
– Enterprise Application Integration
• Distributed Embedded Systems
– Home systems
– Health care systems
– Sensor networks
Cluster Computing
• A collection of similar processors (PCs,
workstations) running the same operating
system, connected by a high-speed LAN.
• Parallel computing capabilities using
inexpensive PC hardware
• Can replace large, expensive parallel computers
Cluster Types & Uses
• High Performance Clusters (HPC)
– run large parallel programs
– Scientific, military, engineering apps; e.g., weather
modeling
• Load Balancing Clusters
– Front end processor distributes incoming requests
– e.g., at banks or popular web sites
• High Availability Clusters (HA)
– Provide redundancy – back up systems
– May be more fault tolerant than large mainframes
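In a load balancing cluster, the front-end processor decides which back-end node handles each incoming request. The sketch below assumes a plain round-robin policy (one simple choice among many; real balancers may weigh node load or health), and the `FrontEnd` class and node names are hypothetical.

```python
import itertools

class FrontEnd:
    """Front-end processor that spreads incoming requests over back-end nodes."""

    def __init__(self, backends):
        self._next = itertools.cycle(backends)   # round-robin order
        self.assigned = {b: [] for b in backends}

    def dispatch(self, request):
        backend = next(self._next)
        self.assigned[backend].append(request)
        return backend

fe = FrontEnd(["node1", "node2", "node3"])
for i in range(7):
    fe.dispatch(f"req{i}")
print({b: len(r) for b, r in fe.assigned.items()})
# {'node1': 3, 'node2': 2, 'node3': 2}
```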
Clusters – Beowulf model
• Linux-based
• Master-slave paradigm
– One processor is the master; allocates tasks to
other processors, maintains batch queue of
submitted jobs, handles interface to users
– Master has libraries to handle message-based
communication or other features (the
middleware).
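The master-slave paradigm can be sketched as a master that keeps a batch queue of submitted jobs and hands them to slaves as they become free. Here threads stand in for cluster nodes; a real Beowulf cluster would run separate machines coordinated through a message-passing library such as MPI. The `master` function, job names and the sum-of-squares "work" are hypothetical.

```python
import queue
import threading

def master(jobs, n_workers):
    """Master: keep a batch queue of submitted jobs and farm them out."""
    batch_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def slave(worker_id):
        while True:
            job = batch_queue.get()
            if job is None:                       # sentinel: no more work
                return
            job_id, n = job
            value = sum(i * i for i in range(n))  # stand-in computation
            with results_lock:
                results[job_id] = (worker_id, value)

    workers = [threading.Thread(target=slave, args=(w,)) for w in range(n_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        batch_queue.put(job)
    for _ in workers:
        batch_queue.put(None)                     # one sentinel per slave
    for w in workers:
        w.join()
    return results

results = master([("job-a", 10), ("job-b", 100), ("job-c", 1000)], n_workers=2)
print({job: value for job, (_, value) in results.items()})
```

Which slave runs which job depends on timing; only the results are deterministic.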
Cluster Computing Systems
• Figure 1-6. An example of a cluster computing
system.
More About MOSIX
• “Operating-system-like”; looks & feels like a
single computer with multiple processors
• Supports interactive and batch processes
• Provides resource discovery and workload
distribution among clusters
• Clusters can be partitioned for use by an
individual or a group
• Best for compute-intensive jobs
Grid Computing Systems
• Modeled loosely on the electrical grid.
• Highly heterogeneous with respect to
hardware, software, networks, security
policies, etc.
• Grids support virtual organizations: a
collaboration of users who pool resources
(servers, storage, databases) and share
them
• Grid software is concerned with managing
sharing across administrative domains.
Grids
• Similar to clusters but processors are more loosely
coupled, tend to be heterogeneous, and are not all in
a central location.
• Can handle workloads similar to those on
supercomputers, but grid computers connect over a
network and supercomputers’ CPUs connect to a
high-speed internal bus/network
• Problems are broken up into parts and distributed
across multiple computers in the grid.
• Less communication between parts than in clusters.
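Breaking a problem into independent parts can be sketched as: split the input into non-overlapping chunks, let each node work on its chunk alone, and combine only the small partial results. Threads here merely simulate grid nodes, and counting primes by trial division is an arbitrary stand-in workload.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """Independent work unit: count primes in [lo, hi) by trial division."""
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))

def split(lo, hi, parts):
    """Break [lo, hi) into roughly equal, non-overlapping chunks."""
    step = (hi - lo + parts - 1) // parts
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]

chunks = split(0, 10_000, parts=4)
# Each "grid node" works on its chunk alone; only the partial counts come back.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(lambda c: count_primes(*c), chunks))
print(chunks, sum(partials))  # total: 1229 primes below 10,000
```

The only communication is one range sent out and one integer returned per chunk, which suits loosely coupled grid nodes.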
A Proposed Architecture for Grid Systems*
Globus Toolkit
• An example of grid middleware
• Supports the combination of heterogeneous
platforms into virtual organizations.
• Implements the OGSA (Open Grid Services Architecture)
standards, among others.
Cloud Computing
• Provides scalable services as a utility over the
Internet.
• Often built on a computer grid
• Users buy services from the cloud
– By contrast, grid users may develop and run their own
software
• Cluster/grid/cloud distinctions blur at the
edges!
Types of Distributed Systems
• Distributed Computing Systems
– Clusters
– Grids
– Clouds
• Distributed Information Systems
• Distributed Embedded Systems
Distributed Information Systems
• Business-oriented
• Systems to make a number of separate
network applications interoperable and build
“enterprise-wide information systems”.
• Two types :
– Transaction processing systems
– Enterprise application integration (EAI)
Transaction Processing Systems
• Provide a highly structured client-server
approach for database applications
• Transactions are the communication model
• Obey the ACID properties:
– Atomic: all or nothing
– Consistent: invariants are preserved
– Isolated (serializable): concurrent transactions do not
interfere with each other
– Durable: once committed, changes are permanent and
can't be undone
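Atomicity and consistency can be observed with the standard-library `sqlite3` module: using the connection as a context manager commits on success and rolls back if an exception escapes. The `account` table, its `CHECK` constraint and the `transfer` helper are hypothetical; the constraint acts as the invariant "no negative balance".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```

The failed transfer had already credited alice before the debit violated the constraint; the rollback undoes that partial work, which is exactly the all-or-nothing property.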
Transaction Processing Systems
• Figure 1-8. Example primitives for
transactions.
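The figure's primitives are not reproduced here; assuming the usual set (begin transaction, read, write, end transaction, abort transaction), they can be sketched with a private workspace: writes stay tentative until the transaction ends, and an abort simply discards them. The `TransactionalStore` class is hypothetical, single-threaded, and in-memory, so it shows atomicity only, not isolation or durability.

```python
class TransactionalStore:
    """Key-value store whose writes go to a private workspace until commit."""

    def __init__(self):
        self.committed = {}
        self._workspace = None

    def begin_transaction(self):
        if self._workspace is not None:
            raise RuntimeError("transaction already active")
        self._workspace = {}

    def read(self, key):
        # A transaction sees its own tentative writes first.
        if self._workspace is not None and key in self._workspace:
            return self._workspace[key]
        return self.committed.get(key)

    def write(self, key, value):
        if self._workspace is None:
            raise RuntimeError("no active transaction")
        self._workspace[key] = value

    def end_transaction(self):
        # Commit: install all tentative writes at once.
        if self._workspace is None:
            raise RuntimeError("no active transaction")
        self.committed.update(self._workspace)
        self._workspace = None

    def abort_transaction(self):
        # Abort: discard the workspace; committed state is untouched.
        self._workspace = None

store = TransactionalStore()
store.begin_transaction()
store.write("seat_12A", "reserved")
store.end_transaction()

store.begin_transaction()
store.write("seat_12A", "free")
store.abort_transaction()
print(store.read("seat_12A"))  # reserved
```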
Transactions
• Transaction processing may be centralized
(traditional client/server system) or
distributed.
• A distributed database is one in which the
data storage is distributed – connected to
separate processors.
Nested Transactions
• A nested transaction is a transaction within
another transaction (a sub-transaction)
– Example: a transaction may ask for two things
(e.g., airline reservation info + hotel info) which
would spawn two nested transactions
• The primary (parent) transaction waits for the results.
– While children are active, the parent may only abort,
commit, or spawn other children
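The airline-plus-hotel example can be sketched with a hypothetical `Transaction` class in which a child's commit only hands its tentative writes to the parent; nothing becomes permanent until the top-level transaction commits, and aborting the parent also discards the results of children that already committed. This is a simplified model that does not enforce the rule about what the parent may do while children are active.

```python
class Transaction:
    """Nested transaction: a child's commit is tentative until the top level commits."""

    def __init__(self, db, parent=None):
        self.db = db              # dict standing in for the permanent store
        self.parent = parent
        self.writes = {}

    def spawn(self):
        return Transaction(self.db, parent=self)

    def read(self, key):
        t = self
        while t is not None:      # own tentative writes, then the ancestors'
            if key in t.writes:
                return t.writes[key]
            t = t.parent
        return self.db.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if self.parent is None:
            self.db.update(self.writes)             # top level: make permanent
        else:
            self.parent.writes.update(self.writes)  # child: pass results up
        self.writes = {}

    def abort(self):
        self.writes = {}          # discard this (sub)transaction's effects


db = {}
trip = Transaction(db)
flight, hotel = trip.spawn(), trip.spawn()
flight.write("flight", "booked")
hotel.write("hotel", "booked")
flight.commit()
hotel.commit()
print(db)        # {}: nothing is permanent yet
trip.commit()
print(db)        # {'flight': 'booked', 'hotel': 'booked'}
```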
Transaction Processing Systems
Enterprise Application Integration