CH17 COA9e Parallel Processing

This document discusses parallel processing and symmetric multiprocessor (SMP) systems. It covers multiple processor organizations including single instruction, single data (SISD); single instruction, multiple data (SIMD); multiple instruction, single data (MISD); and multiple instruction, multiple data (MIMD). It also discusses SMP organization, advantages and disadvantages of the bus organization, multiprocessor operating system design considerations, cache coherence solutions including software, directory protocols, and snoopy protocols like MESI. Finally, it briefly introduces multithreading and chip multiprocessors.


+

William Stallings
Computer Organization
and Architecture
9th Edition
+
Chapter 17
Parallel Processing
+
17.1 Multiple Processor Organization
Types of Parallel Processor Systems

 Single instruction, single data (SISD) stream
 Single processor executes a single instruction stream to operate on data stored in a single memory
 Uniprocessors fall into this category

 Single instruction, multiple data (SIMD) stream
 A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
 Vector and array processors fall into this category

 Multiple instruction, single data (MISD) stream
 A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence
 Not commercially implemented

 Multiple instruction, multiple data (MIMD) stream
 A set of processors simultaneously execute different instruction sequences on different data sets
 SMPs, clusters and NUMA systems fit this category
Processor Organizations
17.2 Symmetric Multiprocessor (SMP)

A stand-alone computer with the following characteristics:

 Two or more similar processors of comparable capacity
 Processors share the same memory and I/O facilities
• Processors are connected by a bus or other internal connection
• Memory access time is approximately the same for each processor
 All processors share access to I/O devices
• Either through the same channels or different channels giving paths to the same devices
 All processors can perform the same functions (hence “symmetric”)
 System controlled by an integrated operating system
• Provides interaction between processors and their programs at the job, task, file and data element levels
Multiprogramming and Multiprocessing Organization
Symmetric Multiprocessor Organization
+
The bus organization has several
attractive features:

 Simplicity
 Simplest approach to multiprocessor organization

 Flexibility
 Generally easy to expand the system by attaching more processors to the
bus

 Reliability
 The bus is essentially a passive medium and the failure of any attached
device should not cause failure of the whole system
+
Disadvantages of the bus organization:

 Main drawback is performance


 All memory references pass through the common bus
 Performance is limited by bus cycle time

 Each processor should have cache memory


 Reduces the number of bus accesses

 Leads to problems with cache coherence


 If a word is altered in one cache it could conceivably invalidate a word in
another cache
 To prevent this the other processors must be alerted that an update has
taken place
 Typically addressed in hardware rather than the operating system
+
Multiprocessor Operating System Design Considerations
 Simultaneous concurrent processes
 OS routines need to be reentrant to allow several processors to execute the same OS code simultaneously
 OS tables and management structures must be managed properly to avoid deadlock or invalid operations

 Scheduling
 Any processor may perform scheduling so conflicts must be avoided
 Scheduler must assign ready processes to available processors

 Synchronization
 With multiple active processes having potential access to shared address spaces or I/O resources, care must be
taken to provide effective synchronization
 Synchronization is a facility that enforces mutual exclusion and event ordering (a minimal lock sketch follows this list)

 Memory management
 In addition to dealing with all of the issues found on uniprocessor machines, the OS needs to exploit the
available hardware parallelism to achieve the best performance
 Paging mechanisms on different processors must be coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement

 Reliability and fault tolerance
 OS should provide graceful degradation in the face of processor failure
 Scheduler and other portions of the operating system must recognize the loss of a processor and restructure management tables accordingly
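
As a minimal sketch of such a mutual-exclusion facility (an illustration using C11 atomics, not an example from the book), a test-and-set spinlock could look like this:

```c
#include <stdatomic.h>

/* Illustrative test-and-set spinlock; compile as C11. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

spinlock_t lock = { ATOMIC_FLAG_INIT };  /* shared, initially free */

void spin_lock(spinlock_t *l) {
    /* Atomically set the flag; spin while another processor holds it. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;  /* busy-wait */
}

void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The atomic read-modify-write keeps two processors from entering the critical section at once, and the acquire/release ordering makes updates made under the lock visible to the next holder.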
+
17.3 Cache Coherence
Software Solutions

 Attempt to avoid the need for additional hardware circuitry and logic
by relying on the compiler and operating system to deal with the
problem

 Attractive because the overhead of detecting potential problems is transferred from run time to compile time, and the design complexity is transferred from hardware to software

 However, compile-time software approaches generally must make conservative decisions, leading to inefficient cache utilization
+
Cache Coherence
Hardware-Based Solutions
 Generally referred to as cache coherence protocols

 These solutions provide dynamic recognition at run time of potential inconsistency conditions

 Because the problem is only dealt with when it actually arises there is more effective use of caches, leading to improved performance over a software approach

 Approaches are transparent to the programmer and the compiler, reducing the
software development burden

 Can be divided into two categories:


 Directory protocols
 Snoopy protocols
Directory Protocols

 Collect and maintain information about copies of data in cache
 Directory stored in main memory
 Requests are checked against directory; appropriate transfers are performed
 Creates central bottleneck
 Effective in large scale systems with complex interconnection schemes
Snoopy Protocols
 Distribute the responsibility for maintaining cache coherence among all
of the cache controllers in a multiprocessor
 A cache must recognize when a line that it holds is shared with other caches
 When updates are performed on a shared cache line, it must be announced to
other caches by a broadcast mechanism
 Each cache controller is able to “snoop” on the network to observe these
broadcast notifications and react accordingly

 Suited to bus-based multiprocessors because the shared bus provides a simple means for broadcasting and snooping
 Care must be taken that the increased bus traffic required for broadcasting and snooping does not cancel out the gains from the use of local caches

 Two basic approaches have been explored:


 Write invalidate
 Write update (or write broadcast)
+
Write Invalidate

 Multiple readers, but only one writer at a time

 When a write is required, all other caches of the line are invalidated

 Writing processor then has exclusive (cheap) access until line is required by another processor

 Most widely used in commercial multiprocessor systems such as the Pentium 4 and PowerPC

 State of every line is marked as modified, exclusive, shared or invalid
 For this reason the write-invalidate protocol is called MESI
+
Write Update

 Can be multiple readers and writers

 When a processor wishes to update a shared line, the word to be updated is distributed to all others, and caches containing that line can update it

 Some systems use an adaptive mixture of both write-invalidate and write-update mechanisms
+
MESI Protocol
To provide cache consistency on an SMP the data cache supports
a protocol known as MESI:

 Modified
 The line in the cache has been modified and is available only in this cache

 Exclusive
 The line in the cache is the same as that in main memory and is not present
in any other cache

 Shared
 The line in the cache is the same as that in main memory and may be
present in another cache

 Invalid
 The line in the cache does not contain valid data
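
As a rough sketch of how these four states drive a cache controller's next-state decisions (a deliberate simplification of Table 17.1 and the transition diagram below, with illustrative event names), consider:

```c
/* Simplified MESI next-state logic for a single cache line.
   Event names are illustrative; the real protocol also specifies the
   bus transactions and write-backs noted in the comments. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state_t;

typedef enum {
    LOCAL_READ, LOCAL_WRITE,  /* requests from this cache's processor */
    SNOOP_READ, SNOOP_WRITE   /* transactions observed on the shared bus */
} mesi_event_t;

mesi_state_t mesi_next(mesi_state_t s, mesi_event_t e, int other_copies) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                 /* read miss: fetch the line */
            return other_copies ? SHARED : EXCLUSIVE;
        return s;                         /* read hit: state unchanged */
    case LOCAL_WRITE:
        return MODIFIED;                  /* from S or I: invalidate other copies first */
    case SNOOP_READ:                      /* another processor reads the line */
        return (s == INVALID) ? INVALID : SHARED;  /* from M: write back first */
    case SNOOP_WRITE:                     /* another processor takes ownership */
        return INVALID;
    }
    return s;
}
```

The function only captures where each line ends up; a real controller also issues the invalidate broadcasts, read-with-intent-to-modify requests, and write-backs that the comments mention.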
Table 17.1
MESI Cache Line States
MESI State Transition Diagram
+
17.4 Multithreading and Chip Multiprocessors
 Processor performance can be measured by the rate at which it executes
instructions

 MIPS rate = f * IPC


 f = processor clock frequency, in MHz
 IPC = average instructions per cycle

 Increase performance by increasing clock frequency and by increasing the number of instructions that complete during a cycle
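
For example, under this formula a hypothetical processor clocked at f = 2,000 MHz that sustains an average of IPC = 1.5 instructions per cycle delivers a MIPS rate of 2,000 × 1.5 = 3,000 MIPS; raising either factor raises the rate proportionally.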

 Multithreading
 Allows for a high degree of instruction-level parallelism without increasing
circuit complexity or power consumption
 Instruction stream is divided into several smaller streams, known as threads,
that can be executed in parallel
Definitions of Threads and Processes

 Thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system

 Thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership

Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and data area for stack
• Executes sequentially and is interruptible so that the processor can turn to another thread

Thread switch:
• The act of switching processor control between threads within the same process
• Typically less costly than a process switch

Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution

Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second
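
To make the distinction concrete, a minimal POSIX sketch (illustrative, not from the book) shows two threads sharing one process's address space; compile with -pthread:

```c
#include <pthread.h>
#include <stdio.h>

int shared = 0;  /* one copy, visible to every thread in the process */

void *worker(void *arg) {
    (void)arg;
    shared++;     /* threads share the process's data area */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    /* Two dispatchable units of work within one process: each thread has
       its own program counter and stack but shares 'shared'. */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);  /* unsynchronized: a lock belongs here */
    return 0;
}
```

A fork()-created process, by contrast, would get its own copy of shared, reflecting the resource-ownership side of the process concept.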

Implicit and Explicit Multithreading

 All commercial processors and most experimental ones use explicit multithreading
 Concurrently execute instructions from different explicit threads
 Interleave instructions from different threads on shared pipelines or parallel execution on parallel pipelines

 Implicit multithreading is concurrent execution of multiple threads extracted from a single sequential program
 Implicit threads defined statically by compiler or dynamically by hardware
+ Approaches to Explicit Multithreading

 Interleaved
 Fine-grained
 Processor deals with two or more thread contexts at a time
 Switching thread at each clock cycle
 If a thread is blocked it is skipped

 Blocked
 Coarse-grained
 Thread executed until event causes delay
 Effective on in-order processor
 Avoids pipeline stall

 Simultaneous (SMT)
 Instructions are simultaneously issued from multiple threads to execution units of superscalar processor

 Chip multiprocessing
 Processor is replicated on a single chip
 Each processor handles separate threads
 Advantage is that the available logic area on a chip is used effectively
+

Approaches to Executing Multiple Threads
+
Example Systems

Pentium 4
 More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
 Approach is to use SMT with support for two threads
 Thus the single multithreaded processor is logically two processors

IBM Power5
 Chip used in high-end PowerPC products
 Combines chip multiprocessing with SMT
 Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
 Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
Power5 Instruction Data Flow
Clusters
 Alternative to SMP as an approach to providing high
performance and high availability
 Particularly attractive for server applications
 Defined as:
 A group of interconnected whole computers working together
as a unified computing resource that can create the illusion of
being one machine
 (The term whole computer means a system that can run on its
own, apart from the cluster)

 Each computer in a cluster is called a node

 Benefits:
 Absolute scalability
 Incremental scalability
 High availability
 Superior price/performance
+

Cluster Configurations
Table 17.2
Clustering Methods: Benefits and Limitations
+
Operating System Design Issues

 How failures are managed depends on the clustering method used


 Two approaches:
 Highly available clusters
 Fault tolerant clusters

 Failover
 The function of switching applications and data resources over from a failed system to an
alternative system in the cluster

 Failback
 Restoration of applications and data resources to the original system once it has been
fixed

 Load balancing
 Incremental scalability
 Automatically include new computers in scheduling
 Middleware needs to recognize that processes may switch between machines
Parallelizing Computation

 Effective use of a cluster requires executing software from a single application in parallel

 Three approaches are:

Parallelizing compiler
• Determines at compile time which parts of an application can be executed in parallel
• These are then split off to be assigned to different computers in the cluster

Parallelized application
• Application written from the outset to run on a cluster and uses message passing to move data between cluster nodes

Parametric computing
• Can be used if the essence of the application is an algorithm or program that must be executed a large number of times, each time with a different set of starting conditions or parameters (a minimal sketch follows)
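
As a minimal single-machine sketch of parametric computing (pthreads stand in for cluster nodes here, and model() is a hypothetical simulation kernel):

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical kernel: the same algorithm run per parameter set. */
static double model(double param) {
    return param * param;   /* placeholder computation */
}

static void *run_case(void *arg) {
    double *p = arg;
    *p = model(*p);         /* replace the parameter with its result */
    return NULL;
}

int main(void) {
    double params[4] = { 1.0, 2.0, 3.0, 4.0 };  /* starting conditions */
    pthread_t tid[4];
    /* On a real cluster each case would be dispatched to a node;
       here each case runs on its own thread as a stand-in. */
    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, run_case, &params[i]);
    for (int i = 0; i < 4; i++) {
        pthread_join(tid[i], NULL);
        printf("case %d -> %g\n", i, params[i]);
    }
    return 0;
}
```

Because the cases share no data, no coordination beyond the final join is needed, which is what makes this style of workload such a natural fit for clusters.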
Cluster Computer Architecture
Example 100-Gbps Ethernet Configuration for Massive Blade Server Site
+
Clusters Compared to SMP
 Both provide a configuration with multiple processors to support high demand applications
 Both solutions are available commercially

SMP:
 Easier to manage and configure
 Much closer to the original single processor model for which nearly all applications are written
 Less physical space and lower power consumption
 Well established and stable

Clustering:
 Far superior in terms of incremental and absolute scalability
 Superior in terms of availability
 All components of the system can readily be made highly redundant
+
Nonuniform Memory Access (NUMA)

 Alternative to SMP and clustering


 Uniform memory access (UMA)
 All processors have access to all parts of main memory using loads and stores
 Access time to all regions of memory is the same
 Access time to memory for different processors is the same

 Nonuniform memory access (NUMA)


 All processors have access to all parts of main memory using loads and stores
 Access time of processor differs depending on which region of main memory is
being accessed
 Different processors access different regions of memory at different speeds

 Cache-coherent NUMA (CC-NUMA)


 A NUMA system in which cache coherence is maintained among the caches of the
various processors
Motivation
 SMP has practical limit to number of processors that can be used
• Bus traffic limits to between 16 and 64 processors

 In clusters each node has its own private main memory
• Applications do not see a large global memory
• Coherency is maintained by software rather than hardware

 Objective with NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor nodes, each with its own bus or internal interconnect system

 NUMA retains SMP flavor while giving large scale multiprocessing
+

CC-NUMA Organization
+
NUMA Pros and Cons

Pros:
 Main advantage of a CC-NUMA system is that it can deliver effective performance at higher levels of parallelism than SMP without requiring major software changes
 Bus traffic on any individual node is limited to a demand that the bus can handle

Cons:
 If many of the memory accesses are to remote nodes, performance begins to break down
 Does not transparently look like an SMP
 Software changes will be required to move an operating system and applications from an SMP to a CC-NUMA system
 Concern with availability
+
Vector Computation

 There is a need for computers to solve mathematical problems of physical processes in disciplines such as aerodynamics, seismology, meteorology, and atomic, nuclear, and plasma physics

 Need for high precision and a program that repetitively performs floating point arithmetic calculations on large arrays of numbers
 Most of these problems fall into the category known as continuous-field simulation

 Supercomputers were developed to handle these types of problems
 However they have limited use and a limited market because of their price tag
 There is a constant demand to increase performance

 Array processor
 Designed to address the need for vector computation
 Configured as peripheral devices by both mainframe and minicomputer users to
run the vectorized portions of programs
Vector Addition Example
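
The figure referenced above contrasts element-by-element and vector execution. A minimal C rendering of the computation itself (a sketch, not the book's figure) is:

```c
/* C(i) = A(i) + B(i): a scalar processor runs this loop one element per
   iteration, while a vector or SIMD machine applies the add to many
   elements of A and B in lockstep under a single instruction. */
void vector_add(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```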
+

Matrix Multiplication
(C = A * B)
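
As a sketch of the computation named on this slide (a conventional triple loop, not the book's vectorized formulation):

```c
/* C = A * B for n x n matrices stored row-major. The inner loop over k
   is the dot product that vector hardware can pipeline; the i and j
   loops are independent and can be spread across processors. */
void matmul(const double *A, const double *B, double *C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}
```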
+
Approaches to Vector Computation
+
Pipelined Processing of Floating-Point Operations
A Taxonomy of Computer Organizations
+
IBM 3090 with Vector Facility
+
Alternative Programs for Vector Calculation
+
Registers for the IBM 3090 Vector Facility
Table 17.3
IBM 3090 Vector Facility: Arithmetic and Logical Instructions
+ Summary: Parallel Processing
Chapter 17

 Multiple processor organizations
 Types of parallel processor systems
 Parallel organizations

 Symmetric multiprocessors
 Organization
 Multiprocessor operating system design considerations

 Cache coherence and the MESI protocol
 Software solutions
 Hardware solutions
 The MESI protocol

 Multithreading and chip multiprocessors
 Implicit and explicit multithreading
 Approaches to explicit multithreading
 Example systems

 Clusters
 Cluster configurations
 Operating system design issues
 Cluster computer architecture
 Blade servers
 Clusters compared to SMP

 Nonuniform memory access
 Motivation
 Organization
 NUMA pros and cons

 Vector computation
 Approaches to vector computation
 IBM 3090 vector facility
+ Key Terms: Chapter 17

 cache coherence
 cluster
 directory protocol
 failback
 failover
 MESI protocol
 multiprocessor
 nonuniform memory access (NUMA)
 passive standby
 snoopy protocol
 symmetric multiprocessor (SMP)
 uniform memory access (UMA)
 uniprocessor
 vector facility
