DS Lec03: Distributed Systems

Distributed Systems: Software Concepts

DS software falls into two families:

Multiprocessor systems:
- OS for each CPU
- Asymmetric
- Symmetric
- Time sharing

Multicomputer systems:
- Network OS
- Distributed OS
10/24/2024 Distributed Systems 6
Distributed Systems: Software

System                              | Software coupling              | Hardware coupling | Main goal
Network operating system            | Loosely coupled                | Loosely coupled   | Offer local services to remote clients
Middleware                          | Additional layer on top of NOS | Loosely coupled   | Provide distribution transparency
Distributed operating system        | Tightly coupled                | Loosely coupled   | Provide users with transparency
Multiprocessor time-sharing systems | Tightly coupled                | Tightly coupled   | Provide the feeling of working with a single processor
Multiprocessor OS
OS for each CPU
Each CPU has its own operating system.
The simplest possible way to organize a multiprocessor operating system is to:
- divide memory into as many partitions as there are CPUs
- give each CPU its own
  - private memory
  - private copy of the operating system
The n CPUs then operate as n independent computers.
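As a minimal sketch of the partitioning step (the memory size, CPU count, and function name are all invented for illustration), each CPU is simply handed a fixed slice of physical memory for its private OS copy and data:

```python
# Toy model: split a flat "physical memory" into one private partition per CPU.
MEM_SIZE = 1024          # hypothetical total memory, in arbitrary units
NUM_CPUS = 4             # hypothetical CPU count

def partition_memory(mem_size, num_cpus):
    """Return the (start, end) bounds of each CPU's private partition."""
    part = mem_size // num_cpus
    return [(cpu * part, (cpu + 1) * part) for cpu in range(num_cpus)]

partitions = partition_memory(MEM_SIZE, NUM_CPUS)
print(partitions)  # CPU 0 sees only (0, 256), CPU 1 only (256, 512), ...
```

Each CPU then boots its own OS copy inside its range, which is why the n CPUs behave as n independent computers.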
Asymmetric Multiprocessors
Also called master-slave multiprocessing.
One copy of the operating system and its tables is present on one CPU (the master) and not on any of the others.
All system calls are redirected to the master CPU for processing there.
The master CPU may also run user processes if there is CPU time left over.
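A toy model of the master-slave redirection, using an in-process Python thread to stand in for the master CPU (the request format and names are invented for illustration): slaves never run OS code themselves; they forward each system call through a queue and wait for the master's reply.

```python
import queue
import threading

# Toy AMP model: slave CPUs forward every "system call" to the single master.
syscalls = queue.Queue()

def master():
    # The master services redirected system calls one at a time.
    while True:
        item = syscalls.get()
        if item is None:             # sentinel: shut the master down
            break
        name, args, reply = item
        reply.put(("ok", name))      # pretend the master performed the call

def slave_syscall(name, *args):
    # A slave "traps": it enqueues the request and waits for the master's reply.
    reply = queue.Queue()
    syscalls.put((name, args, reply))
    return reply.get()

t = threading.Thread(target=master)
t.start()
res = slave_syscall("open", "/tmp/f")
print(res)      # the call ran on the master, not on the slave
syscalls.put(None)
t.join()
```

The single service loop is exactly why AMP is simple to control, and also why the master is a single point of failure and a scalability bottleneck.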
Asymmetric Multiprocessors

Advantages of AMP:
- Simplified control
- Reduced resource conflicts
- Specialized processors

Disadvantages of AMP:
- Single point of failure
- Limited scalability
- Less efficient resource utilization
Symmetric Multiprocessors
In SMP, two or more identical processors:
- are connected to a single, shared main memory
- have full access to all input/output devices
- are controlled by a single OS instance that treats all processors equally.
There is one copy of the operating system in memory, but any CPU can run it.
When a system call is made, the CPU on which it was made traps to the kernel and processes the system call itself.
A mutex (i.e., a lock) is associated with the OS: when a CPU wants to run operating system code, it must first acquire the mutex.
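A minimal sketch of this single OS mutex, with Python threads standing in for CPUs (the names are illustrative): any "CPU" may enter the kernel, but only one at a time holds the lock.

```python
import threading

# Toy "big kernel lock" for SMP: one OS image, any CPU may run it, but only
# one CPU at a time may execute kernel code.
kernel_mutex = threading.Lock()
syscall_log = []

def do_syscall(cpu_id, name):
    # The CPU that made the system call traps to the kernel itself...
    with kernel_mutex:           # ...but must hold the single OS mutex first.
        syscall_log.append((cpu_id, name))

cpus = [threading.Thread(target=do_syscall, args=(i, "read")) for i in range(4)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(len(syscall_log))  # all 4 calls were serialized through the kernel lock
```

Serializing all kernel entries through one lock is correct but contended, which previews the resource-contention disadvantage listed below.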


Symmetric Multiprocessors

Advantages of SMP:
- Better resource utilization
- Scalability
- Fault tolerance

Disadvantages of SMP:
- Complex scheduling
- Resource contention
- Increased hardware costs
SMP vs AMP

Parameter       | Symmetric Multiprocessing                            | Asymmetric Multiprocessing
Processor types | Identical                                            | Not identical
Tasks of the OS | Run by any CPU                                       | Run by the master processor
Communication   | Via shared memory                                    | The master manages all the slave processors
Task assignment | Via a shared ready queue                             | Via the master processor
Architecture    | Very complex due to the need for CPU synchronization | Very simple and straightforward
Expense         | Expensive due to complicated architectural design    | Comparatively cheaper due to simpler architectural design
Multiprocessor Scheduling
On a uniprocessor, scheduling is one-dimensional. The only question that must be answered (repeatedly) is: "Which process should be run next?"
On a multiprocessor, scheduling is two-dimensional: the scheduler has to decide which process to run and which CPU to run it on.
This extra dimension greatly complicates scheduling on multiprocessors.
Another complicating factor is that in some systems all the processes are unrelated, whereas in others they come in groups.
An example of the former is a timesharing system in which independent users start up independent processes.
The processes are unrelated, and each one can be scheduled without regard to the others.


Timesharing
The simplest scheduling algorithm for dealing with unrelated processes (or threads) is to have a single system-wide data structure for ready processes: possibly just a list, but more likely a set of lists for processes at different priorities, as depicted in the figure.


Timesharing
Here the 16 CPUs are all currently busy, and a prioritized set of 14 processes is waiting to run.
The first CPU to finish its current work (or have its process block) is CPU 4, which then locks the scheduling queues and selects the highest-priority process, A.
Next, CPU 12 goes idle and chooses process B, as illustrated in the figure.
As long as the processes are completely unrelated, doing scheduling this way is a reasonable choice.
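The single system-wide ready structure can be sketched as a lock-protected priority heap (a simplification of the figure; the process names and priority values are invented):

```python
import heapq
import threading

# Toy system-wide scheduler: one prioritized ready list shared by all CPUs,
# protected by a single lock, as in the timesharing figure.
sched_lock = threading.Lock()
ready = []  # min-heap of (priority, name); a lower number = higher priority

def make_ready(priority, name):
    with sched_lock:
        heapq.heappush(ready, (priority, name))

def pick_next():
    # An idle CPU locks the queues and takes the highest-priority process.
    with sched_lock:
        return heapq.heappop(ready)[1] if ready else None

make_ready(2, "B")
make_ready(1, "A")
make_ready(3, "C")
print(pick_next())  # the first idle CPU (CPU 4) gets "A", the highest priority
print(pick_next())  # the next idle CPU (CPU 12) gets "B"
```

Because every CPU must take `sched_lock` to pick work, this structure load-balances automatically but becomes a contention point as the CPU count grows, as the next slide notes.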


Multiprocessor Time Sharing
Having a single scheduling data structure used by all CPUs timeshares the CPUs, much as they would be in a uniprocessor system.
It also provides automatic load balancing, because it can never happen that one CPU is idle while others are overloaded.
Two disadvantages of this approach are:
- potential contention for the scheduling data structure as the number of CPUs grows
- the usual overhead of a context switch when a process blocks for I/O.
A context switch may also happen when a process's quantum expires.
Challenges and Overheads
- Synchronization: spin locks vs. blocking locks; hardware vs. software support.
- Memory access: the number of accesses increases with the number of CPUs; memory speed must scale with the CPU count or it becomes the constraint.
- Caching and consistency: the memory bottleneck increases the need to cache, but multiple copies in caches mean coherency problems.
- System bus: traffic increases with the number of CPUs.
- File system: serializes access if designed conventionally.
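The spin-lock vs. blocking-lock trade-off can be illustrated with a toy sketch (Python threads stand in for CPUs; a real spin lock would busy-wait on a hardware test-and-set instruction rather than polling a library lock):

```python
import threading

# Toy contrast between a spin lock and a blocking lock, each protecting
# its own counter.
flag = threading.Lock()       # raw flag that the spin path polls
blocking = threading.Lock()   # ordinary blocking lock
spin_count = 0
block_count = 0

def spin_acquire():
    # Busy-wait: keep trying to grab the flag without ever sleeping.
    while not flag.acquire(blocking=False):
        pass  # burns CPU; only sensible when hold times are very short

def bump_spin():
    global spin_count
    spin_acquire()
    spin_count += 1
    flag.release()

def bump_blocking():
    global block_count
    with blocking:            # the thread sleeps if the lock is held
        block_count += 1

threads = [threading.Thread(target=bump_spin) for _ in range(4)]
threads += [threading.Thread(target=bump_blocking) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(spin_count, block_count)  # both styles serialize their increments
```

Spinning avoids the context-switch overhead but wastes cycles while waiting; blocking frees the CPU but pays the switch cost, which is the trade-off the synchronization bullet above refers to.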


Network Operating System
Each computer has its own operating system with networking facilities.
Computers work independently (i.e., they may even have different operating systems).
Services are tied to individual nodes (ftp, WWW).
Highly file oriented (basically, processors share only files).

[Figure: Machines A, B, and C each run their own kernel with NOS services on top; distributed applications run across the machines over the network.]
Network Operating System
Intermediate stage between
independent individual workstations and
a distributed SW environment.
Independent machines with significant interaction:
They may run different operating systems.
Servers support interaction of distributed pieces.
Network File System (NFS) is the most obvious and the strongest
combining force.
Extensive communication - location dependent.
Client/server - service model.
File servers - file sharing.
WWW service - newest addition.



NOS for users
Users are aware of multiplicity of machines
Access to resources of various machines is done explicitly
by:
Remote logging into the appropriate remote machine
Transferring data from remote machines to local machines,
via the File Transfer Protocol (FTP) mechanism
Upload, download, access, or share files through cloud storage
Users must change paradigms – establish a session, give
network-based commands, use a web browser
More difficult for users



PC vs. NOS
The NOS enhances the reach of the
client PC by making remote services
available as extensions of the local
operating system.
Although a number of users may have
accounts on a PC, only a single account is
active on the system at any given time.
NOS supports multiple user accounts at
the same time and enables concurrent
access to shared resources by multiple
clients (multitasking and multiuser
environment).



Multiuser, Multitasking, and Multiprocessor Systems
A NOS server is a multitasking system: the OS is capable of executing multiple tasks at the same time.
Some systems are equipped with more than one processor; these are called multiprocessing systems.
Multiprocessing systems are capable of executing multiple tasks in parallel by assigning each task to a different processor.
The total amount of work that the server can perform in a given time is greatly enhanced in multiprocessor systems.


NOS: Where to use?
NOS can be used in:
- Routers, switches, and hardware firewalls
- PCs in peer-to-peer networks
- Client-server architectures


Middleware
Middleware is software that connects software components or people and their applications.
It consists of a set of services that allow multiple processes, running on one or more machines, to interact.
It is interoperable in support of distributed systems.
Middleware sits "in the middle" between application software that may be working on different operating systems.
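As an illustrative sketch only (every service and machine name here is invented), the "set of services" idea can be reduced to a registry that hides where each service lives, so the caller uses one uniform interface:

```python
# Toy middleware layer: applications call services by name; the middleware
# looks up where the service lives and dispatches the call, so the caller
# never deals with machine boundaries directly.
registry = {}

def register(name, machine, fn):
    registry[name] = (machine, fn)

def invoke(name, *args):
    # The middleware hides the location; a real system would send a network
    # message when the owning machine is remote. Here we call in-process.
    machine, fn = registry[name]
    return fn(*args)

register("add", "machine-B", lambda a, b: a + b)
register("upper", "machine-C", str.upper)

print(invoke("add", 2, 3))     # the caller neither knows nor cares it's on B
print(invoke("upper", "nos"))
```

This location-hiding lookup is the seed of the distribution transparency that the slides attribute to middleware.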
Middleware
OS on each computer need
not know about the other
computers.
OS on different computers
need not generally be the
same.
Services are generally
(transparently) distributed
across computers.



Distributed Operating System

Multicomputer OS
A multicomputer OS has a totally different structure and complexity from a multiprocessor OS, because of the lack of physically shared memory for storing the data structures needed for system-wide resource management.
Users are not aware of the multiplicity of machines.
Access to remote resources is similar to access to local resources.

[Figure: Machines A, B, and C each run a kernel; distributed operating system services span all machines, with distributed applications on top, communicating over the network.]
Fully Distributed Systems
The programming and user interface provide a virtual uniprocessor.
Built on top of distributed, heterogeneous machines.
New approaches to, and implementations of, user and system software are generally required.
Long development time.
Execution overhead is the REAL challenge.
Message latency and overhead are usually the right place to start looking.
Fully Distributed Systems
Familiar uniprocessor challenges are all present.
They become more challenging when generalized for a truly distributed implementation:
- Synchronization squared
- Scheduling multiplied
- Programming model complications
- Shared and virtual memory complications
Fully Distributed Systems
Distribution brings its own complications:
- Load balancing
- Cache coherence
- Fault tolerance
- Process migration


Migrations
Data migration: transfer data by transferring the entire file, or transferring only those portions of the file necessary for the immediate task.
Computation migration: transfer the computation, rather than the data, across the system
- via remote procedure calls (RPCs)
- via a messaging system
Process migration: execute an entire process, or parts of it, at different sites
- Load balancing: distribute processes across the network to even the workload
- Computation speedup: subprocesses can run concurrently on different sites
- Hardware preference: process execution may require a specialized processor
- Software preference: required software may be available at only a particular site
- Data access: run the process remotely, rather than transfer all data locally
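Computation migration via RPC can be sketched as follows (the "remote site" is simulated in-process, and the procedure names, data, and message format are all hypothetical): the client ships a small request naming a procedure, and the computation runs next to the data instead of the data moving to the client.

```python
import json

# Toy computation migration: rather than fetching the whole dataset, the
# caller sends an RPC-style request and the remote site computes near its data.
REMOTE_DATA = {"readings": [3, 1, 4, 1, 5, 9]}
PROCEDURES = {"total": sum, "peak": max}

def remote_execute(request_json):
    # The "remote site" decodes the request and runs the named procedure
    # on its local data.
    req = json.loads(request_json)
    result = PROCEDURES[req["proc"]](REMOTE_DATA[req["arg"]])
    return json.dumps({"result": result})

# The client migrates the computation instead of transferring the list.
reply = remote_execute(json.dumps({"proc": "total", "arg": "readings"}))
print(json.loads(reply)["result"])  # 23
```

Only the small request and the small result cross the (simulated) network, which is the data-access motivation listed above.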
Design questions!
Robustness
Can the distributed system withstand failures?
Transparency
Can the distributed system be transparent to the
user both in terms of where files are stored and user
mobility?
Scalability
Can the distributed system be scalable to allow
addition of more computation power, storage, or
users?
Robustness
Hardware failures can include failure of a link,
failure of a site, and loss of a message.
A fault-tolerant system can tolerate a certain
level of failure
Degree of fault tolerance depends on design of
system and the specific fault
The more fault tolerance, the better!
Involves failure detection, reconfiguration,
and recovery
Failure Detection
Detecting hardware failure is difficult.
To detect a link failure, a heartbeat protocol can be used:
- Assume Site A and Site B have established a link.
- At fixed intervals, each site exchanges an I-am-up message indicating that it is up and running.
- If Site A does not receive a message within the fixed interval, it assumes either (a) the other site is not up or (b) the message was lost.
- Site A can then send an Are-you-up? message to Site B.
- If Site A does not receive a reply, it can repeat the message or try an alternate route to Site B.
- If Site A does not ultimately receive a reply from Site B, it concludes some type of failure has occurred.
Types of failures:
- Site B is down
- The direct link between A and B is down
- The alternate link from A to B is down
- The message has been lost
However, Site A cannot determine exactly why the failure occurred.
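The heartbeat decision logic above can be sketched as a deterministic simulation from Site A's point of view (timings are abstracted away, and the return strings are invented for illustration):

```python
# Toy heartbeat check: Site A concludes "failure" only after the I-am-up
# interval passes with no message AND every Are-you-up? probe (direct route,
# then alternate route) also goes unanswered.
def check_site_b(heartbeat_received, probe_replies):
    """probe_replies: answers to successive Are-you-up? probes."""
    if heartbeat_received:
        return "ok"                      # I-am-up arrived within the interval
    for reply in probe_replies:          # direct route, then alternate route
        if reply:
            return "ok (heartbeat was lost, site is up)"
    return "failure (cause unknown: site down, links down, or messages lost)"

print(check_site_b(True, []))
print(check_site_b(False, [False, True]))   # retry over the alternate route
print(check_site_b(False, [False, False]))  # all probes unanswered
```

Note that the final return value deliberately does not name a cause: as the slide says, Site A cannot distinguish a down site from a down link or lost messages.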



Reconfiguration and Recovery
When Site A determines a failure has occurred, it
must reconfigure the system:
If the link from A to B has failed, this must be
broadcast to every site in the system
If a site has failed, every other site must also be notified
indicating that the services offered by the failed site are
no longer available
When the link or the site becomes available again,
this information must again be broadcast to all other
sites
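A toy sketch of the broadcast step (site names are invented; each site simply keeps a set of resources it currently believes are down):

```python
# Toy reconfiguration: when a failure is detected, the news is broadcast so
# every site stops using the failed link or site; when it recovers, the
# recovery is broadcast the same way.
sites = {"A": set(), "B": set(), "C": set()}   # each site's known-down list

def broadcast_down(resource):
    for known_down in sites.values():
        known_down.add(resource)

def broadcast_up(resource):
    for known_down in sites.values():
        known_down.discard(resource)

broadcast_down("link A-B")
print(sites["C"])   # every site, not just A, learned of the failure
broadcast_up("link A-B")
print(sites["C"])   # availability restored and re-broadcast
```

A real system would deliver these notices as network messages, but the invariant is the same: all sites converge on the same view of which resources are available.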
Transparency
The distributed system should appear as a
conventional, centralized system to the user
User interface should not distinguish between
local and remote resources
User mobility allows users to log into any
machine in the environment and see his/her
environment



Scalability
As demands increase, the system should easily
accept the addition of new resources to
accommodate the increased demand
Reacts gracefully to increased load
Adding more resources may generate additional
indirect load on other resources if not careful
Data compression or deduplication can cut down
on storage and network resources used
Comparison between Systems

Item                    | Multiproc. DOS  | Multicomp. DOS      | Network OS | Middleware-based OS
Degree of transparency  | Very high       | High                | Low        | High
Same OS on all nodes    | Yes             | Yes                 | No         | No
Number of copies of OS  | 1               | N                   | N          | N
Basis for communication | Shared memory   | Messages            | Files      | Model specific
Resource management     | Global, central | Global, distributed | Per node   | Per node
Scalability             | No              | Moderately          | Yes        | Varies
Openness                | Closed          | Closed              | Open       | Open
