0% found this document useful (0 votes)

2 views63 pages

Multi Processor

Uploaded by

tanushreesamantha35

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views63 pages

Multi Processor

Uploaded by

tanushreesamantha35

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 63

Systems with Multiple CPUs

❑ Collection of independent CPUs (or computers) that

appears to the users/applications as a single system
❑ Technology trends
▪ Powerful, yet cheap, microprocessors
▪ Advances in communications
▪ Physical limits on computing power of a single CPU
❑ Examples
▪ Network of workstations
▪ Servers with multiple processors
▪ Network of computers of a company
▪ Microcontrollers inside a car
1
Advantages
❑ Data sharing: allows many users to share a common
data base
❑ Resource sharing: expensive devices such as a color
printer
❑ Parallelism and speed-up: multiprocessor system can
have more computing power than a mainframe
❑ Better price/performance ratio than mainframes
❑ Reliability: Fault-tolerance can be provided against
crashes of individual machines
❑ Flexibility: spread the workload over available
machines
❑ Modular expandability: Computing power can be
added in small increments (upgrading CPUs like
memory)
2
Design Issues
❑ Transparency: How to achieve a single-system image
▪ How to hide distribution of memory from applications?
▪ How to maintain consistency of data?
❑ Performance
▪ How to exploit parallelism?
▪ How to reduce communication delays?
❑ Scalability: As more components (say, processors) are
added, performance should not degrade
▪ Centralized schemes (e.g. broadcast messages) don’t work
❑ Security

3
Classification
❑ Multiprocessors
▪ Multiple CPUs with shared memory
▪ Memory access delays about 10 – 50 nsec
❑ Multicomputers
▪ Multiple computers, each with own CPU and memory, connected by a high-
speed interconnect
▪ Tightly coupled with delays in micro-seconds
❑ Distributed Systems
▪ Loosely coupled systems connected over Local Area Network (LAN), or even
long-haul networks such as Internet
▪ Delays can be seconds, and unpredictable

4
Mutiprocessors

5
Multiprocessor Systems
❑ Multiple CPUs with a shared memory
❑ From an application’s perspective, difference with
single-processor system need not be visible
▪ Virtual memory where pages may reside in memories
associated with other CPUs
▪ Applications can exploit parallelism for speed-up
❑ Topics to cover
1. Multiprocessor architectures
2. Cache coherence
3. OS organization
4. Synchronization
5. Scheduling
6
Multiprocessor Architecture

❑ UMA (Uniform Memory Access)

▪ Time to access each memory word is the same
▪ Bus-based UMA
▪ CPUs connected to memory modules through switches
❑ NUMA (Non-uniform memory access)
▪ Memory distributed (partitioned among processors)
▪ Different access times for local and remote accesses

7
Bus-based UMA
❑ All CPUs and memory module connected over a
shared bus
❑ To reduce traffic, each CPU also has a cache
❑ Key design issue: how to maintain coherency of data
that appears in multiple places?
❑ Each CPU can have a local memory module also that
is not shared with others
❑ Compilers can be designed to exploit the memory
structure
❑ Typically, such an architecture can support 16 or 32
CPUs as a common bus is a bottleneck (memory
access not parallelized)

8
Switched UMA
❑ Goal: To reduce traffic on bus, provide multiple
connections between CPUs and memory units so that
many accesses can be concurrent
❑ Crossbar Switch: Grid with horizontal lines from CPUs
and vertical lines from memory modules
❑ Crossbar at (i,j) can connect i-th CPU with j-th
memory module
❑ As long as different processors are accessing
different modules, all requests can be in parallel
❑ Non-blocking: waiting caused only by contention for
memory, but not for bus
❑ Disadvantage: Too many connections (quadratic)
❑ Many other networks: omega, counting, …
9
Crossbar Switch

10
Cache Coherence
❑ Many processors can have locally cached copies of
the same object
▪ Level of granularity can be an object or a block of 64 bytes
❑ We want to maximize concurrency
▪ If many processors just want to read, then each one can have a
local copy, and reads won’t generate any bus traffic
❑ We want to ensure coherence
▪ If a processor writes a value, then all subsequent reads by
other processors should return the latest value
❑ Coherence refers to a logically consistent global
ordering of reads and writes of multiple processors
❑ Modern multiprocessors support intricate schemes

11
Consistency and replication
❑ Need to replicate (cache) to improve
performance
▪ How updates are propagated between cached replicas
▪ How to keep them consistent
❑ How to keep them consistency (much more
complicated than sequential processor)
▪ When a processor change the vale value of its copy of a
variable,
• the other copies are invalidated (invalidate protocol), or
• the other copies are updated (update protocol).

12
Example

P1’s cache X=1

X=1 Memory

P2’s cache X=1

13
Invalidate vs. update protocols

P1’s cache X=3

X=1 Memory
P2’s cache X=1

P1’s cache X=3

X=3 Memory
P2’s cache X=3

14
Snoopy Protocol
❑ Each processor, for every cached object, keeps a state that can be
Invalid, Exclusive or Read-only
❑ Goal: If one has Exclusive copy then all others must be Invalid
❑ Each processor issues three types of messages on bus
▪ Read-request (RR), Write-request (WR), and Value-response (VR)
▪ Each message identifies object, and VR has a tagged value
❑ Assumption:
▪ If there is contention for bus then only one succeeds
▪ No split transactions (RR will have a response by VR)
❑ Protocol is called Snoopy, because everyone is listening to the bus
all the time, and updates state in response to messages RR and WR
❑ Each cache controller responds to 4 types of events
▪ Read or write operation issued by its processor
▪ Messages (RR, WR, or VR) observed on the bus
❑ Caution: This is a simplified version
15
Snoopy Cache Coherence

ID Val State
Processor 1 Processor N

Read(x), Write(x,u)
x v Exclusive

Cache Controller

RR(x), WR(x), VR(x,u)

16
Snoopy Protocol
❑ If state is Read-only
▪ Read operation: return local value
▪ Write operation: Broadcast WR message on bus, update state to Exclusive,
and update local value
▪ WR message on bus: update state to Invalid
▪ RR message on bus: broadcast VR(v) on bus
❑ If state is Exclusive
▪ Read operation: return local value
▪ Write operation: update local value
▪ RR message on bus: Broadcast VR(v), and change state to Read-only
▪ WR message on bus: update state to Invalid
❑ If state is Invalid
▪ Read operation: Broadcast RR, Receive VR(v), update state to Read-only,
and local value to v
▪ Write operation: As in first case
▪ VR(v) message on bus: Update state to Read-only, and local copy to v
▪ WR message on the bus: do nothing
17
Sample Scenario for Snoopy
❑ Assume 3 processors P1, P2, P3. One object x : int
❑ Initially, P1’s entry for x is invalid, P2’s entry is Exclusive with value 3, and
P3’s entry is invalid
❑ A process running on P3 issues Read(x)
❑ P3 sends the message RR(x) on the bus
❑ P2 updates its entry to Read-only, and sends the message VR(x,3) on the bus
❑ P3 updates its entry to Read-only, records the value 3 in the cache, and
returns the value 3 to Read(x)
❑ P1 also updates the x-entry to (Read-Only, 3)
❑ Now, if Read(x) is issued on any of the processors, no messages will be
exchanged, and the corresponding processor will just return value 3 by a
local look-up

❑ P1: x=(inv,-) … x=(ro,3)

❑ P2: x=(exc,3) … X=(ro,3); VR(x,3);
❑ P3: x=(inv,-) … Read(x); RR(x); … x=(ro,3),return(x,3)

18
Snoopy Scenario (Continued)
❑ Suppose a process running on P1 issues Write(x,0)
❑ At the same time, a process running on P2 issues Write(x,2)
❑ P1 will try to send WR on the bus, as well as P2 will try to send WR on
the bus
❑ Only one of them succeeds, say, P1 succeeds
❑ P1 will update cache-entry to (Exclusive,0)
❑ P3 will update cache-entry to Invalid
❑ P2 will update cache-entry to Invalid
❑ Now, Read / Write operations by processes on P1 will use local copy,
and won’t generate any messages

❑ P1: Write(x,0); WR(x); x=(ex,0)

❑ P2: Write(x,2); WR(x); x=(inv,-)
❑ P3: … x=(inv,-)

19
Notions of consistency
❑ Strict consistency: any read on a data item x returns a
value corresponding to the result of the most recent write
on x (need absolute global time)
▪ P1: w(x)a P1: w(x)a
▪ P2: r(x)a P2: r(x)NIL r(x)a

❑ Sequential consistency: the result of any execution is the

same as if the R/W operations by all processes were
executed in some sequential order and the operations of
each process appear in this sequence in the order
specified by its program
▪ P1: w(x)a P1: w(x)a
▪ P2: w(x)b P2: w(x)b
▪ P3: r(x)b r(x)a P3: r(x)b r(x)a
▪ P4: r(x,b) r(x,a) P4: r(x)a r(x)b

20
Multiprocessor OS
❑ How should OS software be organized?
❑ OS should handle allocation of processes to
processors. Challenge due to shared data structures
such as process tables and ready queues
❑ OS should handle disk I/O for the system as a whole
❑ Two standard architectures
▪ Master-slave
▪ Symmetric multiprocessors (SMP)

21
Master-Slave Organization

❑ Master CPU runs kernel, all others run user processes

❑ Only one copy of all OS data structures
❑ All system calls handled by master CPU
❑ Problem: Master CPU can be a bottleneck

22
Symmetric Multiprocessing (SMP)
❑ Only one kernel space, but OS can run on any CPU
❑ Whenever a user process makes a system call, the same CPU runs
OS to process it
❑ Key issue: Multiple system calls can run in parallel on different
CPUs
▪ Need locks on all OS data structures to ensure mutual exclusion for critical
updates
❑ Design issue: OS routines should have independence so that level
of granularity for locking gives good performance

Bus
23
Synchronization
❑ Recall: Mutual exclusion solutions to protect critical
regions involving updates to shared data structures
❑ Classical single-processor solutions
▪ Disable interrupts
▪ Powerful instructions such as Test&Set (TSL)
▪ Software solution such as Peterson’s algorithm
❑ In multiprocessor setting, competing processes can
all be OS routines (e.g., to update process table)
❑ Disabling interrupts is not relevant as there are
multiple CPUs
❑ TSL can be used, but requires modification

24
Original Solution using TSL
Shared variable: lock :{0,1}
lock==1 means some process is in CS
Initially lock is 0
Code for process P0 as well as P1:
while (TRUE) {
try: TSL X, lock /* test-and-set lock */
if (X!=0) goto try; /*retry if lock set*/
CS();
lock = 0; /* reset the lock */
Non_CS();
}

25
TSL solution for multi-processors
❑ TSL involves testing and setting memory, this can
require 2 memory accesses
▪ Not a problem to implement this in single-processor system
❑ Now, bus must be locked to avoid split transaction
▪ Bus provides a special line for locking
❑ A process that fails to acquire lock checks repeatedly
issuing more TSL instructions
▪ Requires Exclusive access to memory block
▪ Cache coherence protocol would generate lots of traffic
❑ Goal: To reduce number of checks
1. Exponential back-off: instead of constant polling, check only
after delaying (1, 2, 4, 8 instructions)
2. Maintain a list of processes waiting to acquire lock.
26
Busy-Waiting vs Process switch
❑ In single-processors, if a process is waiting to
acquire lock, OS schedules another ready process
❑ This may not be optimal for multiprocessor systems
▪ If OS itself is waiting to acquire ready list, then switching
impossible
▪ Switching may be possible, but involves acquiring locks, and
thus, is expensive
❑ OS must decide whether to switch (choice between
spinning and switching)
▪ spinning wastes CPU cycles
▪ switching uses up CPU cycles also
▪ possible to make separate decision each time locked mutex
encountered
27
Multiprocessors: Summary
❑ Set of processors connected over a bus with shared
memory modules
❑ Architecture of bus and switches important for efficient
memory access
❑ Caching essential; to manage multiple caches, cache
coherence protocol necessary (e.g. Snoopy)
❑ Symmetric Multiprocessing (SMP) allows OS to run on
different CPUs concurrently
❑ Synchronization issues: OS components work on shared
data structures
▪ TSL based solution to ensure mutual exclusion
▪ Spin locks (i.e. busy waiting) with exponential backoff to reduce
bus traffic
28
Scheduling
❑ Recall: Standard scheme for single-processor scheduling
▪ Make a scheduling decision when a process blocks/exits or when
a clock interrupt happens indicating end of time quantum
▪ Scheduling policy needed to pick among ready processes, e.g.
multi-level priority (queues for each priority level)
❑ In multiprocessor system, scheduler must pick among
ready processes and also a CPU
❑ Natural scheme: when a process executing on CPU k
finishes or blocks or exceeds its time quantum, then
pick a ready process according to scheduling policy and
assign it to CPU k. But this ignores many issues…

29
Issues for Multiprocessor Scheduling
❑ If a process is holding a lock, it is unwise to switch it
even if time quantum expires
❑ Locality issues
▪ If a process p is assigned to CPU k, then CPU k may hold
memory blocks relevant to p in its cache, so p should be assigned
to CPU k whenever possible
▪ If a set of threads/processes communicate with one another then
it is advantageous to schedule them together
❑ Solutions
▪ Space sharing by allocating CPUs in partitions
▪ Gang scheduling: scheduling related threads in same time slots

30
Multicomputers

31
Multicomputers

❑ Definition:
Tightly-coupled CPUs that do not share memory

❑ Communication by high-speed interconnect via

messages

❑ Also known as
▪ cluster computers
▪ clusters of workstations (COWs)

32
Clusters

❑Interconnection topologies
(a) single switch (d) double torus
(b) ring (e) cube
(c) grid (f) Hypercube (2**d, d is dimeter)
33
Switching Schemes
❑ Messages are transferred in chunks called packets
❑ Store and forward packet switching
▪ Each switch collects bits on input line, assembles the packet, and
forwards it towards destination
▪ Each switch has a buffer to store packets
▪ Delays can be long
❑ Hot-potato routing: No buffering
▪ Necessary for optical communication links
❑ Circuit switching
▪ First establish a path from source to destination
▪ Pump bits on the reserved path at a high rate
❑ Wormhole routing
▪ Split packet into subpackets to optimize circuit switching
34
Interprocess Communication

❑ How can processes talk to each other on multi-

computers?
▪ User-level considerations: ease of use etc
▪ OS level consideration: efficient implementation
❑ Message passing
❑ Remote procedure calls (RPC)
❑ Distributed shared memory (DSM)

35
Message-based Communication

(a) Blocking send call

❑ Minimum services
provided
▪ send and receive
commands

❑ These are blocking

(synchronous) calls

(b) Nonblocking send call

36
User-level Communication Primitives

❑ Library Routines
▪ Send (destination address, buffer containing message)
▪ Receive (optional source address, buffer to store message)
❑ Design issues
▪ Blocking vs non-blocking calls
▪ Should buffers be copied into kernel space?

37
Blocking vs Non-blocking

❑ Blocking send: Sender process waits until the message

is sent
▪ Disadvantage: Process has to wait
❑ Non-blocking send: Call returns control to sender
immediately
▪ Buffer must be protected
❑ Possible ways of handling non-blocking send
▪ Copy into kernel buffer
▪ Interrupt sender upon completion of transmission`
▪ Mark the buffer as read-only (at least a page long), copy on write
❑ Similar issues for handling receive calls

38
Buffers and Copying

❑ Network interface card has its own buffers

▪ Copy from RAM to sender’s card
▪ Store-and-forward switches may involve copying
▪ Copy from receiver’s card to RAM
❑ Copying slows down end-to-end communication
▪ Copying not an issue in disk I/O due to slow speed
❑ Additional problem: should message be copied from
sender process buffer to kernel space?
▪ User pages can be swapped out
❑ Typical solutions
▪ Programmed I/O for small packets
▪ DMA for large messages with disabling of page replacement
39
The Problem with Messages

❑ Messages are flexible, but

❑ They are not a natural programming model
▪ Programmers have to worry about message formats
▪ messages must be packed and unpacked
▪ messages have to be decoded by server to figure out
what is requested
▪ messages are often asynchronous
▪ they may require special error handling functions

40
Remote Procedure Call
❑ Procedure call is a more natural way to
communicate
▪ every language supports it
▪ semantics are well defined and understood
▪ natural for programmers to use
❑ Basic idea of RPC (Remote Procedure Call)
▪ define a server as a module that exports a set of
procedures that can be called by client programs.
call
Client Server
return

41
A brief history of RPC

❑ Birrell and Nelson in 1980, based on work done

at Xerox PARC.
❑ Similar idea used in RMI, CORBA or COM standards
❑ Core of many client-server systems
❑ Transparency is to goal!

42
Remote Procedure Call
❑ Use procedure call as a model for distributed
communication
❑ RPCs can offer a good programming abstraction to hide
low-level communication details
❑ Goal - make RPC look as much like local PC as possible
❑ Many issues:
▪ how do we make this invisible to the programmer?
▪ what are the semantics of parameter passing?
▪ how is binding done (locating the server)?
▪ how do we support heterogeneity (OS, arch., language)?
▪ how to deal with failures?
▪ etc.

43
Steps in Remote Procedure Calls

❑ There are 3 components on each side:

▪ a user program (client or server)
▪ a set of stub procedures
▪ RPC runtime support
❑ Steps in RPC
▪ Client invokes a library routine called client stub, possibly with
parameters
▪ Client stub generates a message to be sent: parameter marshaling
▪ Kernels on client and server handle communication
▪ Receiver kernel calls server stub
▪ Server stub unpacks parameters and invokes server routine
44
Remote Procedure Call

❑ Steps in making a remote procedure call

▪ the stubs are shaded gray

45
RPC Call Structure

client client makes server is proc foo(a,b) server

program call foo(x,y) local call to called by program
stub proc. its stub begin foo...
end foo
call foo call foo

client stub builds msg stub unpacks

proc foo(a,b) params and call foo(x,y) server
stub packet, inserts
makes call stub
params

send msg msg received

runtime sends runtime
RPC msg to remote receives msg RPC
runtime runtime
node and calls stub

Call
46
RPC Return Structure

client proc foo(a,b) server

server proc
program call foo(x,y) client continues program
returns begin foo...
end foo
return return
stub builds
client stub unpacks result msg
proc foo(a,b) call foo(x,y) server
stub msg, returns with output stub
to caller args

msg received send msg

runtime runtime
RPC responds RPC
runtime receives msg,
to original runtime
calls stub
msg

return
47
RPC Stubs

❑ A client-side stub is a procedure that looks to the client as

if it were a callable server procedure.
❑ A server-side stub looks to the server as if it’s a calling
client.
❑ The stubs send messages to each other to make the RPC
happen.
❑ Server program defines the server’s interface using an
interface definition language (IDL)
▪ Define names, parameters, and types
❑ A stub compiler reads the IDL and produces two stub
procedures for each server procedure
❑ The server program links it with the server-side stubs; the
client program links with the client-side stubs.

48
RPC Parameter Marshalling

❑ The packing of procedure parameters into a message packet.

❑ The RPC stubs call type-specific procedures to marshall (or
unmarshall) all of the parameters to the call.
❑ Representation needs to deal with byte ordering issues
▪ Different data representatin (ASCII, UNICODE, EBCDIC)
▪ big-endian (bytes from left to rigth, Intel) versus little-endian (bytes from
right to left, SPARC)
▪ strings (some CPUs require padding)
▪ alignment, etc.
❑ Parameter passing
▪ By value
▪ By reference
▪ Size limit?

49
RPC failure semantics

❑ A remote procedure call makes a call to a

remote service look like a local call
▪ RPC makes transparent whether server is local or
remote
▪ RPC allows applications to become distributed
transparently
▪ RPC makes architecture of remote machine transparent
❑ What if there is a failure?
❑ Goal: Make RPC behave like local procedure call

50
Types of failure

❑ Cannot locate the server

▪ server down
▪ version mismatch
▪ raise an exception
❑ Request message is lost
❑ Reply message is lost
❑ Server crashes after receiving a request
❑ Client crashes after sending a request

51
Handling message failure

❑ request msg is lost

▪ use timer and resend request msg
❑ reply msg is lost
▪ use timer and resend another request
▪ server need to tell whether the request is duplicate
unless the request is idempotent
• make all request idempotent
– redefine read (fd, buf, n) to read (fd, buf, pos, n)
– deposit (money) -- not possible to make it idempotent
• assign request numbers and keep track

52
Possible semantics to deal with crashes

❑ Do nothing and leave it up to the user

❑ At least once
▪ Successful return
• Executed at lease once.
▪ Only for idempotent functions
❑ At most once
▪ Suppress duplicated requests
▪ Client
• Each request has an unique id
▪ Server
• Saves request results
❑ Exactly once (not possible to implement)

53
Shared memory vs. message passing

❑ Message passing
▪ better performance
▪ know when and what msgs sent: control, knowledge
❑ Shared memory
▪ familiar
▪ hides details of communication
▪ no need to name receivers or senders, just write to
specific memory address and read later
▪ caching for “free”
▪ porting from centralized system (the original “write
once run anywhere”)
▪ no need to rewrite when adding processs, scales because
adds memory for each node
▪ Initial implementation correct (agreement is reached at
the memory system level), all changes are just
optimizations
54
Distributed Shared Memory (DSM)

Replication
(a) Pages distributed on 4
machines

(b) CPU 0 reads page 10

(c) CPU 1 reads page 10

55
Distributed Shared Memory (DSM)
❑ data in shared address space accessed as in traditional
VM.
❑ mapping manager -- maps the shared address space to
the physical address space.
❑ Advantage of DSM

▪ no explicit comm. primitives, send and receive, needed in program.

It is believed to be easier to design and write parallel alg's using
DSM
▪ complex data structure can be passed by reference.
▪ moving page containing the data take advantage of locality and
reduce comm. overhead.
▪ cheaper to build DSM system than tightly coupled multiprocessor
system.
▪ scalable -- improved portability of programs written for
multiprocessors.

56
DSM Implementation Issues
❑ Recall: In virtual memory, OS hides the fact that pages may reside
in main memory or on disk
❑ Recall: In multiprocessors, there is a single shared memory
(possibly virtual) accessed by multiple CPUs. There may be
multiple caches, but cache coherency protocols hide this from
applications
▪ how to make shared data concurrently accessible
❑ DSM: Each machine has its own physical memory, but virtual
memory is shared, so pages can reside in any memory or on disk
▪ how to keep track of the location of shared data
❑ On page fault, OS can fetch the page from remote memory
▪ how to overcome comm. delays and protocol overhead when accessing
remote data

57
Distributed Shared Memory

❑ Note layers where it can be implemented

▪ hardware
▪ operating system
▪ user-level software
58
Some Implementation Details

❑ Every computer has its own page-table

❑ If accessed page is not in memory, then message sent to lookup
where it resides, and the page is fetched
▪ The client-server algorithm
▪ The migration algorithm
❑ Replication used to reduce the traffic
▪ The read-replication algorithm
▪ The full-replication algorithm
❑ As in cache coherence for multiple caches in a multiprocessor
system, a page can reside on multiple computers with read-only
flag set
❑ To write a page other copies must be invalidated
❑ False sharing: No variables actually shared, but they may reside on
the same page
▪ Compiler should make an effort to avoid this
59
Cache/Memory Coherence and
Consistency
❑Coherence: every cache/CPU must have a
coherent view of memory
▪ If P writes X to A, then reads A, if no other proc writes
A, then P reads X
▪ If P1 writes X to A, and no other processor writes to A,
then P2 will eventually read X from A.
▪ If P1 writes X to A, and P2 writes Y to A, then every
processor will either read X then Y, or Y then X, but all
will see the writes in the same order.
❑Consistency: memory consistency model tells us
when writes to different locations will be seen by
readers.
60
False sharing in DSM

❑ False Sharing
❑ Must also achieve sequential consistency

61
Load Balancing

❑ In a multicomputer setting, system must determine

assignment of processes to machines
❑ Formulation as an optimization problem:
▪ Each process has estimated CPU and memory requirements
▪ For every pair of processes, there is an estimated traffic
❑ Goal: Given k machines, cluster the processes into k
clusters such that
▪ Traffic between clusters is minimized
▪ Aggregate memory/CPU requirements of processes within each
cluster are evenly balanced (or are below specified limits)

62
Algorithms for Load Balancing

❑ Finding optimal allocation is computationally expensive

▪ NP-hard (must try all possible combinations in the worst case)
▪ Must settle for greedy heuristics that perform well
❑ Dynamic adaptation schemes
▪ Sender Initiated Schemes
• Assign processes by default choices
• If one machine senses overload, probe others for help
• If load is low, respond to probes, and accept process migration
▪ Receiver Initiated Schemes
• When load is low, probe other machines if they are loaded with processes
• Probe traffic does not degrade performance during overload

Bosch FPA5000 General
0% (1)
Bosch FPA5000 General
95 pages
WS Retail Services Pvt. LTD.
No ratings yet
WS Retail Services Pvt. LTD.
2 pages
Yahoo
100% (11)
Yahoo
14 pages
Maths Class X Half Yearly Exam Sample Paper 01
No ratings yet
Maths Class X Half Yearly Exam Sample Paper 01
4 pages
10 Multithreading
No ratings yet
10 Multithreading
60 pages
CompArch 23a MP-1
No ratings yet
CompArch 23a MP-1
17 pages
CH 4 Synchronization Models of Memory Consistency
100% (1)
CH 4 Synchronization Models of Memory Consistency
26 pages
Shared Memory Multiprocessors: Logical Design and Software Interactions
No ratings yet
Shared Memory Multiprocessors: Logical Design and Software Interactions
107 pages
Using DFS and GPO in FIM High Availability Scenarios
No ratings yet
Using DFS and GPO in FIM High Availability Scenarios
40 pages
William Stallings Computer Organization and Architecture 10 Edition
No ratings yet
William Stallings Computer Organization and Architecture 10 Edition
34 pages
1st Ia Preparation
No ratings yet
1st Ia Preparation
15 pages
Chapter 8 - Parallel Processing
No ratings yet
Chapter 8 - Parallel Processing
50 pages
Slot28 CH17 ParallelProcessing 32 Slides
No ratings yet
Slot28 CH17 ParallelProcessing 32 Slides
32 pages
SKJ2413 Object-Oriented Programming (Lab 5 Sheet) : DR Muhd Zalisham Bin Jali
No ratings yet
SKJ2413 Object-Oriented Programming (Lab 5 Sheet) : DR Muhd Zalisham Bin Jali
17 pages
Cache Coherence (Part 1)
No ratings yet
Cache Coherence (Part 1)
13 pages
Expert: Rotating Machinery
No ratings yet
Expert: Rotating Machinery
13 pages
Basic Configuration of Mikrotik
No ratings yet
Basic Configuration of Mikrotik
7 pages
MULTIPROCTLPA
No ratings yet
MULTIPROCTLPA
99 pages
Laptop Comparision Table
No ratings yet
Laptop Comparision Table
9 pages
Shared Memory Architecture
No ratings yet
Shared Memory Architecture
39 pages
Aos Questions Ia
No ratings yet
Aos Questions Ia
19 pages
Unit 4 - Advanced Computer Architecture - WWW - Rgpvnotes.in
No ratings yet
Unit 4 - Advanced Computer Architecture - WWW - Rgpvnotes.in
14 pages
ACA Lecture 29 Cache-Coherence 2
No ratings yet
ACA Lecture 29 Cache-Coherence 2
42 pages
MODULE 4 HPC
No ratings yet
MODULE 4 HPC
41 pages
CS Chap7 Multicores Multiprocessors Clusters
No ratings yet
CS Chap7 Multicores Multiprocessors Clusters
65 pages
Memory Consistency Model
No ratings yet
Memory Consistency Model
17 pages
Comporg6 ch12
No ratings yet
Comporg6 ch12
36 pages
HW 3 ELEG 4339
No ratings yet
HW 3 ELEG 4339
3 pages
05 Multiprocessor
No ratings yet
05 Multiprocessor
54 pages
Multiprocessors and Multithreading: CS151B/EE M116C Computer Systems Architecture
No ratings yet
Multiprocessors and Multithreading: CS151B/EE M116C Computer Systems Architecture
13 pages
Parallel and Distributed Computing
No ratings yet
Parallel and Distributed Computing
7 pages
Lecture-7 SMP NUMA Cache Coherence
No ratings yet
Lecture-7 SMP NUMA Cache Coherence
34 pages
William Stallings Computer Organization and Architecture 10 Edition
No ratings yet
William Stallings Computer Organization and Architecture 10 Edition
34 pages
CH17 COA9e
No ratings yet
CH17 COA9e
51 pages
LD Config
No ratings yet
LD Config
9 pages
Unit 5
No ratings yet
Unit 5
89 pages
PART17
No ratings yet
PART17
45 pages
Multiprocessor Cache Coherence
No ratings yet
Multiprocessor Cache Coherence
13 pages
ITE 12 Topic1
No ratings yet
ITE 12 Topic1
20 pages
The FPGA Implementation of The Digital Receiver 3
No ratings yet
The FPGA Implementation of The Digital Receiver 3
64 pages
Cache Coherency
No ratings yet
Cache Coherency
33 pages
Lecture 5
No ratings yet
Lecture 5
15 pages
Coherence
No ratings yet
Coherence
16 pages
Module 4
No ratings yet
Module 4
40 pages
Week 5
No ratings yet
Week 5
52 pages
Knowledge Management & Expert System Module-3
No ratings yet
Knowledge Management & Expert System Module-3
43 pages
Multi Processors and Thread Level Parallelism
No ratings yet
Multi Processors and Thread Level Parallelism
74 pages
Licensing and Activation: Mapinfo Professional
No ratings yet
Licensing and Activation: Mapinfo Professional
14 pages
L7 Multicore 2
No ratings yet
L7 Multicore 2
22 pages
Cache Coherence
No ratings yet
Cache Coherence
53 pages
Multiprocessors
No ratings yet
Multiprocessors
39 pages
Lec13 Multiprocessors
No ratings yet
Lec13 Multiprocessors
69 pages
Install Cyberpanel
No ratings yet
Install Cyberpanel
3 pages
CH17 COA9e Parallel Processing
No ratings yet
CH17 COA9e Parallel Processing
52 pages
CA Lecture 13
No ratings yet
CA Lecture 13
27 pages
Chapter 7
No ratings yet
Chapter 7
97 pages
Bus-Based Multiprocessor: A.K.A or Snoopy-Bus Architecture
No ratings yet
Bus-Based Multiprocessor: A.K.A or Snoopy-Bus Architecture
54 pages
Module 4
No ratings yet
Module 4
66 pages
Cisco - How To Configure NTP
No ratings yet
Cisco - How To Configure NTP
12 pages
CA-unit 5-Material-For Reference
No ratings yet
CA-unit 5-Material-For Reference
16 pages
CC 2015 rpps5m l01 FL
No ratings yet
CC 2015 rpps5m l01 FL
2 pages
Service Manual: Model
No ratings yet
Service Manual: Model
98 pages
R12 U5 MultiProcessor Architectures
No ratings yet
R12 U5 MultiProcessor Architectures
47 pages
Parallel Architecture
No ratings yet
Parallel Architecture
33 pages
L7 Multicore 1
No ratings yet
L7 Multicore 1
50 pages
KTMTSS Shared Memory Multiprocessor
No ratings yet
KTMTSS Shared Memory Multiprocessor
29 pages
Flares Im
No ratings yet
Flares Im
270 pages
DSM
No ratings yet
DSM
36 pages
5986 iPhone-7-7-Plus ApplicationForm en PDF
No ratings yet
5986 iPhone-7-7-Plus ApplicationForm en PDF
4 pages
Comparative Analysis On Techniques For Big Data Testing: Adiba Abidin Divya Lal Naveen Garg Vikas Deep
No ratings yet
Comparative Analysis On Techniques For Big Data Testing: Adiba Abidin Divya Lal Naveen Garg Vikas Deep
5 pages
jbpm3 2 2-Handsontutorial
No ratings yet
jbpm3 2 2-Handsontutorial
99 pages
David Mason Resume
No ratings yet
David Mason Resume
1 page
WIA IM IQAS Application Process
No ratings yet
WIA IM IQAS Application Process
2 pages
1.symmetric and Distributed Shared Memory Architectures
79% (19)
1.symmetric and Distributed Shared Memory Architectures
29 pages
William Stallings Computer Organization and Architecture 9 Edition
No ratings yet
William Stallings Computer Organization and Architecture 9 Edition
51 pages
Parallel Computer Architecture A Hardware-Software
No ratings yet
Parallel Computer Architecture A Hardware-Software
18 pages
MultiProcessors Tanenbaum BP
No ratings yet
MultiProcessors Tanenbaum BP
29 pages
Subtitle Edit 3.4 Help - SHORTCUTS
No ratings yet
Subtitle Edit 3.4 Help - SHORTCUTS
4 pages
Expert System in AI
No ratings yet
Expert System in AI
11 pages
Parallel 2
No ratings yet
Parallel 2
14 pages
L32 SMP
No ratings yet
L32 SMP
47 pages
Shared-Memory Architectures: Adapted From A Lecture by Ian Watson, University of Machester
No ratings yet
Shared-Memory Architectures: Adapted From A Lecture by Ian Watson, University of Machester
33 pages
Caster Manual
No ratings yet
Caster Manual
291 pages
Computer Architecture: Multiprocessors Shared Memory Architectures Prof. Jerry Breecher CSCI 240 Fall 2003
No ratings yet
Computer Architecture: Multiprocessors Shared Memory Architectures Prof. Jerry Breecher CSCI 240 Fall 2003
24 pages
Cement Case Study
No ratings yet
Cement Case Study
5 pages
LQ 680
No ratings yet
LQ 680
6 pages
Multiprocessing: Flynn's Classification (1966)
No ratings yet
Multiprocessing: Flynn's Classification (1966)
8 pages
The Complete Future Trait Guide
From Everand
The Complete Future Trait Guide
Hamze Ghalebi
No ratings yet
Advanced Unix Programming
From Everand
Advanced Unix Programming
Prof. N. B Venkateswarlu
No ratings yet
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
From Everand
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
Joerg Christian Seubert
No ratings yet

Multi Processor

Uploaded by

Multi Processor

Uploaded by

Systems with Multiple CPUs

❑ Collection of independent CPUs (or computers) that

❑ UMA (Uniform Memory Access)

P1’s cache X=1

P2’s cache X=1

P1’s cache X=3

P1’s cache X=3

RR(x), WR(x), VR(x,u)

❑ P1: x=(inv,-) … x=(ro,3)

❑ P1: Write(x,0); WR(x); x=(ex,0)

❑ Sequential consistency: the result of any execution is the

❑ Master CPU runs kernel, all others run user processes

❑ Communication by high-speed interconnect via

❑ How can processes talk to each other on multi-

(a) Blocking send call

❑ These are blocking

(b) Nonblocking send call

❑ Blocking send: Sender process waits until the message

❑ Network interface card has its own buffers

❑ Messages are flexible, but

❑ Birrell and Nelson in 1980, based on work done

❑ There are 3 components on each side:

❑ Steps in making a remote procedure call

client client makes server is proc foo(a,b) server

client stub builds msg stub unpacks

send msg msg received

client proc foo(a,b) server

msg received send msg

❑ A client-side stub is a procedure that looks to the client as

❑ The packing of procedure parameters into a message packet.

❑ A remote procedure call makes a call to a

❑ Cannot locate the server

❑ request msg is lost

❑ Do nothing and leave it up to the user

(b) CPU 0 reads page 10

(c) CPU 1 reads page 10

▪ no explicit comm. primitives, send and receive, needed in program.

❑ Note layers where it can be implemented

❑ Every computer has its own page-table

❑ In a multicomputer setting, system must determine

❑ Finding optimal allocation is computationally expensive

You might also like