CH17 Parallel Processing (32 slides)

This chapter discusses parallel processing, focusing on various types of parallel processor organizations, including SISD, SIMD, and MIMD. It also covers symmetric multiprocessors (SMP), cache coherence issues, the MESI protocol, and multithreading techniques. Key design considerations for multiprocessor operating systems, including scheduling, synchronization, and memory management, are also highlighted.


+
Chapter 17
Parallel Processing
William Stallings, Computer Organization and Architecture, 9th Edition
+
Objectives

You benefit from computers with multiple CPUs, so you should understand how they work.

After studying this chapter, you should be able to:

 Summarize the types of parallel processor organizations.
 Present an overview of design features of symmetric multiprocessors.
 Understand the issue of cache coherence in a multiple-processor system.
 Explain the key features of the MESI protocol.
 Explain the difference between implicit and explicit multithreading.
 Summarize key design issues for clusters.
+
Contents

 17.1 Multiple Processor Organizations


 17.2 Symmetric Multiprocessors
 17.3 Cache Coherence and the MESI Protocol
 17.4 Multithreading and Chip Multiprocessors
+
17.1- Multiple Processor Organizations

 Single instruction, single data (SISD) stream
  • A single processor executes a single instruction stream to operate on data stored in a single memory
  • Uniprocessors fall into this category

 Single instruction, multiple data (SIMD) stream
  • A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
  • Vector and array processors fall into this category

 Multiple instruction, single data (MISD) stream
  • A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence
  • Not commercially implemented

 Multiple instruction, multiple data (MIMD) stream
  • A set of processors simultaneously execute different instruction sequences on different data sets
  • SMPs, clusters and NUMA systems fit this category
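To make the SISD/SIMD contrast concrete, here is a minimal Python sketch (not from the slides): a scalar loop stands in for SISD-style execution, and a NumPy vectorized operation stands in for SIMD-style lockstep execution. The array size, function names, and use of NumPy are illustrative assumptions.

import numpy as np

data = np.arange(1_000_000, dtype=np.float64)

# SISD style: one instruction stream operating on one data element at a time.
def scale_scalar(xs, k):
    out = []
    for x in xs:              # each iteration handles a single datum
        out.append(x * k)
    return out

# SIMD style: one operation applied to many elements in lockstep.
def scale_vector(xs, k):
    return xs * k             # NumPy issues one vectorized multiply

assert scale_vector(data, 2.0)[5] == scale_scalar(data[:6], 2.0)[5]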
Parallel Organizations (figures)
17.2- Symmetric Multiprocessor (SMP)

An SMP can be defined as a stand-alone computer with the following characteristics:

 Two or more similar processors of comparable capacity
 Processors share the same main memory and I/O facilities
  • Processors are connected by a bus or other internal connection; memory access time is approximately the same for each processor
 All processors share access to I/O devices
  • Either through the same channels or through different channels giving paths to the same devices
 All processors can perform the same functions (hence “symmetric”)
 System is controlled by an integrated operating system
  • Provides interaction between processors and their programs at the job, task, file and data element levels
Multiprogramming and Multiprocessing

The operating system of an SMP schedules processes or threads across all of the processors. An SMP has a number of potential advantages over a uniprocessor organization, including the following: performance, availability, incremental growth (the user can add processors), and scaling (vendors can offer a range of products with different configurations).
Organization: Tightly Coupled
• Each processor is self-contained (CU, registers, one or more caches).
• Main memory and I/O devices are shared through some form of interconnection mechanism.
• Processors can communicate with each other through memory.
• Processors can also exchange signals directly with one another.
• The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible.
• In some configurations, each processor may also have its own private main memory and I/O channels in addition to the shared resources.
Organization: Symmetric Multiprocessor
• The most common organization for personal computers, workstations, and servers is the time-shared bus. The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The structure and interfaces are basically the same as for a single-processor system that uses a bus interconnection. To support DMA-style transfers, the bus provides:
  • Addressing: <source, destination>
  • Arbitration: any I/O module can be “master”
  • Time-sharing
+
The bus organization has several
attractive features:

 Simplicity
 Simplest approach to multiprocessor organization

 Flexibility
 Generally easy to expand the system by attaching more
processors to the bus

 Reliability
 The bus is essentially a passive medium and the failure of
any attached device should not cause failure of the whole
system
+
Disadvantages of the bus organization:

 Main drawback is performance


 All memory references pass through the common bus
 Performance is limited by bus cycle time

 Each processor should have cache memory


 Reduces the number of bus accesses

 Leads to problems with cache coherence


 If a word is altered in one cache it could conceivably
invalidate a word in another cache
 To prevent this the other processors must be alerted that
an update has taken place
 Typically addressed in hardware rather than the operating
system
+ Multiprocessor Operating System Design Considerations

 Simultaneous concurrent processes
  • OS routines need to be reentrant to allow several processors to execute the same OS code (OS service) simultaneously
  • OS tables and management structures must be managed properly to avoid deadlock or invalid operations

 Scheduling
  • Any processor may perform scheduling, so conflicts must be avoided
  • Scheduler must assign ready processes to available processors

 Synchronization
  • With multiple active processes having potential access to shared address spaces or I/O resources, care must be taken to provide effective synchronization
  • Mutual exclusion (exclusive access to a shared resource) must be enforced; mismanaged, it is one of the causes of deadlock (a minimal lock sketch follows this slide)
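To illustrate the mutual-exclusion point above, here is a minimal Python sketch (an assumption of this note, not code from the text): four threads increment a shared counter, and a lock guarantees that only one thread is inside the critical section at a time.

import threading

counter = 0
lock = threading.Lock()

def deposit(n):
    global counter
    for _ in range(n):
        with lock:            # mutual exclusion: only one thread may be
            counter += 1      # inside the critical section at a time

threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                # always 400000 when the lock is held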

+ Multiprocessor Operating System Design Considerations…
 Memory management
 In addition to dealing with all of the issues found on
uniprocessor machines, the OS needs to exploit the available
hardware parallelism to achieve the best performance
 Paging mechanisms on different processors must be
coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement

 Reliability and fault tolerance


 OS should provide graceful degradation in the face of
processor failure
 Scheduler and other portions of the operating system must
recognize the loss of a processor and restructure
accordingly
+
17.3- Cache Coherence and the MESI Protocol
Review:
Write back: Write operations are usually made only to the cache.
Main memory is only updated when the corresponding cache line
is flushed from the cache  can result in inconsistency

Write through: All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Even with the write-through policy, inconsistency can occur unless other caches monitor the memory traffic or receive some direct notification of the update.

MESI (modified/exclusive/shared/invalid) protocol is
recommended here.
Coherent: sticking together
Consistency: agreement, without ambiguity
Protocol: a defined set of steps for communication
+ Cache Coherence…

Software Solutions

 Attempt to avoid the need for additional hardware circuitry and logic by relying on the compiler and operating system to deal with the problem (no additional hardware is desired)
 Attractive because the overhead of detecting
potential problems is transferred from run time to
compile time, and the design complexity is transferred
from hardware to software
 However, compile-time software approaches generally must
make conservative decisions, leading to inefficient cache
utilization
+
Cache Coherence…
Hardware-Based Solutions
 Generally referred to as cache coherence protocols
 These solutions provide dynamic recognition at run time
of potential inconsistency conditions
 Because the problem is only dealt with when it actually
arises there is more effective use of caches, leading to
improved performance over a software approach
 Approaches are transparent to the programmer and the
compiler, reducing the software development burden
 Can be divided into two categories:
 Directory protocols
 Snoopy protocols
Transparent: not visible to (i.e., requiring no action from) the programmer or compiler
Snoop: to spy or eavesdrop
Directory Protocols

 Collect and maintain information about copies of data in caches
 A centralized controller, part of the main memory controller, holds the directory, which is stored in main memory
 Requests are checked against the directory and the appropriate transfers are performed
 Effective in large-scale systems with complex interconnection schemes
 Drawback: creates a central bottleneck
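As a toy illustration of the directory idea (an assumed design, not any specific machine's protocol), the sketch below keeps a Python dict mapping each line address to the set of caches holding a copy; a write consults the directory and invalidates only the recorded sharers.

directory = {}                 # line address -> set of cache ids holding a copy

def handle_read(line, cache_id):
    directory.setdefault(line, set()).add(cache_id)   # record the new sharer

def handle_write(line, cache_id):
    holders = directory.setdefault(line, set())
    for other in holders - {cache_id}:                # invalidate other copies
        print(f"invalidate line {line:#x} in cache {other}")
    directory[line] = {cache_id}                      # writer holds the only copy

handle_read(0x100, 0)
handle_read(0x100, 1)
handle_write(0x100, 0)         # prints: invalidate line 0x100 in cache 1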
Snoopy Protocols
 Distribute the responsibility for maintaining cache coherence
among all of the cache controllers in a multiprocessor
 A cache must recognize when a line that it holds is shared with other
caches
 When updates are performed on a shared cache line, it must be
announced to other caches by a broadcast mechanism
 Each cache controller is able to “snoop” on the network to observe these
broadcast notifications and react accordingly

 Suited to bus-based multiprocessors because the shared bus provides a simple means for broadcasting and snooping
 Care must be taken that the increased bus traffic required for broadcasting
and snooping does not cancel out the gains from the use of local caches

 Two basic approaches have been explored:


 Write invalidate
 Write update (or write broadcast)
+
Write Invalidate

 Multiple readers, but only one writer at a time


 When a write is required, all other cached copies of the line are invalidated (marked invalid)
 The writing processor then has exclusive access until the line is required by another processor
 Most widely used in commercial multiprocessor systems
such as the Pentium 4 and PowerPC
 State of every line is marked as modified, exclusive,
shared or invalid
 For this reason the write-invalidate protocol is called MESI
+
Write Update

 Can be multiple readers and writers


 When a processor wishes to update a shared line
the word to be updated is distributed to all others
and caches containing that line can update it
 Some systems use an adaptive mixture of both write-
invalidate and write-update mechanisms
+
MESI Protocol
To provide cache consistency on an SMP (symmetric
multi-processor) the data cache supports a protocol
known as MESI:
 Modified
 The line in the cache has been modified and is available
only in this cache

 Exclusive
 The line in the cache is the same as that in main memory
and is not present in any other cache

 Shared
 The line in the cache is the same as that in main memory
and may be present in another cache

 Invalid
 The line in the cache does not contain valid data
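A simplified sketch of the MESI idea (an illustration, not the full protocol): the transition table below models one cache's view of a single line. The event names and the reduced set of transitions are assumptions of this sketch; a real implementation also handles bus transactions and write-backs.

# Simplified MESI state transitions for one cache line, from one cache's view.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def next_state(state, event):
    transitions = {
        (INVALID,   "local_read_miss_no_sharers"): EXCLUSIVE,
        (INVALID,   "local_read_miss_shared"):     SHARED,
        (INVALID,   "local_write_miss"):           MODIFIED,
        (EXCLUSIVE, "local_write"):                MODIFIED,  # silent upgrade, no bus traffic
        (EXCLUSIVE, "remote_read"):                SHARED,
        (SHARED,    "local_write"):                MODIFIED,  # broadcast invalidate first
        (SHARED,    "remote_write"):               INVALID,
        (MODIFIED,  "remote_read"):                SHARED,    # supply the dirty data
        (MODIFIED,  "remote_write"):               INVALID,   # flush, then invalidate
    }
    return transitions.get((state, event), state)

state = INVALID
for event in ["local_read_miss_no_sharers", "local_write",
              "remote_read", "remote_write"]:
    state = next_state(state, event)
    print(event, "->", state)   # E, M, S, I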
Table 17.1: MESI Cache Line States

Table 17.1 summarizes the meaning of the four states.

MESI State Transition Diagram (figure)
+
17.4- Multithreading and Chip
Multiprocessors
 Processor performance can be measured by the rate at which it executes instructions:
  • MIPS rate = f × IPC   (Millions of Instructions Per Second)
  • f = processor clock frequency, in MHz
  • IPC = average Instructions Per Cycle
  • (a worked example follows this slide)

 Performance can be increased by raising the clock frequency or by increasing the number of instructions that complete during a cycle

 Multithreading
  • Allows a high degree of instruction-level parallelism without increasing circuit complexity or power consumption, thereby increasing IPC
  • The instruction stream is divided into several smaller streams, known as threads, that can be executed in parallel
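A quick worked example of the MIPS rate formula above (the clock rate and IPC values are illustrative, not from the text):

f_mhz = 2000          # assumed 2 GHz clock (illustrative)
ipc = 1.5             # assumed average instructions completed per cycle
mips = f_mhz * ipc
print(mips)           # 3000.0 MIPS, i.e. 3 billion instructions per second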
Definitions of Threads and Processes

 Thread: the dispatchable unit of work within a process
  • Includes processor context (which includes the program counter and stack pointer) and its own data area for a stack
  • Executes sequentially and is interruptible so that the processor can turn to another thread

 Process: an instance of a program running on a computer
  • Two key characteristics: resource ownership and scheduling/execution

 Thread switch: the act of switching processor control between threads within the same process
  • Typically less costly than a process switch

 Process switch: an operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second

 Note: a thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system. A thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership.
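A minimal Python sketch of the thread/process distinction (illustrative, not from the text): the threads below are the scheduled units and share the owning process's memory, while each keeps its own local, per-stack state.

import threading

shared_log = []                 # owned by the process, visible to all threads

def worker(name):
    local_count = 0             # per-thread state lives on the thread's own stack
    for _ in range(3):
        local_count += 1
    shared_log.append((name, local_count))

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_log)               # both threads wrote into shared process memory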
Implicit and Explicit Multithreading

 All commercial processors and most experimental ones use explicit multithreading
  • Concurrently execute instructions from different explicit threads
  • Interleave instructions from different threads on shared pipelines, or execute threads in parallel on parallel pipelines

 Implicit multithreading is the concurrent execution of multiple threads extracted from a single sequential program
  • Implicit threads are defined statically by the compiler or dynamically by the hardware
+ Approaches to Explicit Multithreading

 Interleaved (fine-grained)
  • Processor deals with two or more thread contexts at a time
  • Switches threads at each clock cycle
  • If a thread is blocked it is skipped

 Blocked (coarse-grained)
  • Thread is executed until an event causes a delay (e.g., I/O or a cache miss)
  • Effective on an in-order processor
  • Avoids pipeline stalls

 Simultaneous (SMT: Simultaneous Multithreading)
  • Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor

 Chip multiprocessing
  • Processor is replicated on a single chip
  • Each processor handles separate threads
  • Advantage is that the available logic area on a chip is used effectively

(A toy scheduling sketch of the interleaved and blocked policies follows this slide.)
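The toy simulation below (an illustration, not from the text) contrasts the interleaved and blocked policies on two made-up instruction streams; "stall" marks a long-latency event, and the simplification that the interleaved scheduler never skips a blocked thread is noted in the comments.

threads = {"A": ["op", "op", "stall", "op"],
           "B": ["op", "stall", "op", "op"]}

def interleaved(thread_streams):
    # Switch threads every cycle (real designs would also skip blocked threads).
    order = []
    streams = {k: list(v) for k, v in thread_streams.items()}
    while any(streams.values()):
        for name, ops in streams.items():
            if ops:
                order.append((name, ops.pop(0)))
    return order

def blocked(thread_streams):
    # Run one thread until a long-latency event forces a switch.
    order = []
    streams = {k: list(v) for k, v in thread_streams.items()}
    names = list(streams)
    turn = 0
    while any(streams.values()):
        name = names[turn % len(names)]
        ops = streams[name]
        while ops:
            op = ops.pop(0)
            order.append((name, op))
            if op == "stall":
                break           # switch threads on the delay event
        turn += 1
    return order

print(interleaved(threads))     # A and B alternate cycle by cycle
print(blocked(threads))         # each thread runs until its first stall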
+
Approaches to Executing Multiple Threads (figure)
+
Example Systems

 Pentium 4
  • More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
  • The approach is to use SMT with support for two threads
  • Thus the single multithreaded processor is logically two processors

 IBM Power5
  • Chip used in high-end PowerPC products
  • Combines chip multiprocessing with SMT
  • Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
  • Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
+
Exercises
 17.1 List and briefly define three types of computer system
organization.
 17.2 What are the chief characteristics of an SMP (symmetric multiprocessor)?
 17.3 What are some of the potential advantages of an SMP compared with a uniprocessor?
 17.4 What are some of the key OS design issues for an SMP?
 17.5 What is the difference between software and hardware cache coherence schemes?
 17.6 What is the meaning of each of the four states in the
MESI protocol?
+ Summary: Parallel Processing (Chapter 17)

 Multiple processor organizations
  • Types of parallel processor systems
  • Parallel organizations
 Symmetric multiprocessors
  • Organization
  • Multiprocessor operating system design considerations
 Cache coherence and the MESI protocol
  • Software solutions
  • Hardware solutions
  • The MESI protocol
 Multithreading and chip multiprocessors
  • Implicit and explicit multithreading
  • Approaches to explicit multithreading
  • Example systems
