CH17 Parallel Processing (32 slides)

This chapter discusses parallel processing, focusing on various types of parallel processor organizations, including SISD, SIMD, and MIMD. It also covers symmetric multiprocessors (SMP), cache coherence issues, the MESI protocol, and multithreading techniques. Key design considerations for multiprocessor operating systems, including scheduling, synchronization, and memory management, are also highlighted.


+
Chapter 17
Parallel Processing
William Stallings, Computer Organization and Architecture, 9th Edition
+
Objectives

You benefit from computers with multiple CPUs, so you should understand how they work.

After studying this chapter, you should be able to:

 Summarize the types of parallel processor organizations.
 Present an overview of design features of symmetric multiprocessors.
 Understand the issue of cache coherence in a multiple-processor system.
 Explain the key features of the MESI protocol.
 Explain the difference between implicit and explicit multithreading.
 Summarize key design issues for clusters.
+
Contents

 17.1 Multiple Processor Organizations


 17.2 Symmetric Multiprocessors
 17.3 Cache Coherence and the MESI Protocol
 17.4 Multithreading and Chip Multiprocessors
+
17.1- Multiple Processor Organizations

 Single instruction, single data (SISD) stream
  • A single processor executes a single instruction stream to operate on data stored in a single memory
  • Uniprocessors fall into this category

 Single instruction, multiple data (SIMD) stream
  • A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
  • Vector and array processors fall into this category

 Multiple instruction, single data (MISD) stream
  • A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence
  • Not commercially implemented

 Multiple instruction, multiple data (MIMD) stream
  • A set of processors simultaneously execute different instruction sequences on different data sets
  • SMPs, clusters and NUMA systems fit this category
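To make the SISD/SIMD contrast concrete, here is a minimal Python sketch (not from the slides): a scalar loop stands in for SISD-style execution, and a NumPy vectorized operation stands in for SIMD-style lockstep execution. The array size, function names, and use of NumPy are illustrative assumptions.

import numpy as np

data = np.arange(1_000_000, dtype=np.float64)

# SISD style: one instruction stream operating on one data element at a time.
def scale_scalar(xs, k):
    out = []
    for x in xs:              # each iteration handles a single datum
        out.append(x * k)
    return out

# SIMD style: one operation applied to many elements in lockstep.
def scale_vector(xs, k):
    return xs * k             # NumPy issues one vectorized multiply

assert scale_vector(data, 2.0)[5] == scale_scalar(data[:6], 2.0)[5]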
Parallel Organizations (figures)
17.2- Symmetric Multiprocessor (SMP)

An SMP can be defined as a stand-alone computer with the following characteristics:

 Two or more similar processors of comparable capacity
 Processors share the same main memory and I/O facilities
  • Processors are connected by a bus or other internal connection; memory access time is approximately the same for each processor
 All processors share access to I/O devices
  • Either through the same channels or through different channels giving paths to the same devices
 All processors can perform the same functions (hence “symmetric”)
 System is controlled by an integrated operating system
  • Provides interaction between processors and their programs at the job, task, file and data element levels
Multiprogramming and Multiprocessing

The operating system of an SMP schedules processes or threads across all of the processors. An SMP has a number of potential advantages over a uniprocessor organization, including the following: performance, availability, incremental growth (the user can add processors), and scaling (vendors can offer a range of products with different configurations).
Organization: Tightly Coupled
• Each processor is self-contained (CU, registers, one or more caches).
• Main memory and I/O devices are shared through some form of interconnection mechanism.
• Processors can communicate with each other through memory.
• Processors can also exchange signals directly with one another.
• The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible.
• In some configurations, each processor may also have its own private main memory and I/O channels in addition to the shared resources.
Organization: Symmetric Multiprocessor
• The most common organization for personal computers, workstations, and servers is the time-shared bus. The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The structure and interfaces are basically the same as for a single-processor system that uses a bus interconnection. To support DMA-style transfers, the bus provides:
  • Addressing: <source, destination>
  • Arbitration: any I/O module can be “master”
  • Time-sharing
+
The bus organization has several
attractive features:

 Simplicity
 Simplest approach to multiprocessor organization

 Flexibility
 Generally easy to expand the system by attaching more
processors to the bus

 Reliability
 The bus is essentially a passive medium and the failure of
any attached device should not cause failure of the whole
system
+
Disadvantages of the bus organization:

 Main drawback is performance


 All memory references pass through the common bus
 Performance is limited by bus cycle time

 Each processor should have cache memory


 Reduces the number of bus accesses

 Leads to problems with cache coherence


 If a word is altered in one cache it could conceivably
invalidate a word in another cache
 To prevent this the other processors must be alerted that
an update has taken place
 Typically addressed in hardware rather than the operating
system
+ Multiprocessor Operating System Design Considerations

 Simultaneous concurrent processes
  • OS routines need to be reentrant to allow several processors to execute the same OS code (OS service) simultaneously
  • OS tables and management structures must be managed properly to avoid deadlock or invalid operations

 Scheduling
  • Any processor may perform scheduling, so conflicts must be avoided
  • Scheduler must assign ready processes to available processors

 Synchronization
  • With multiple active processes having potential access to shared address spaces or I/O resources, care must be taken to provide effective synchronization
  • Mutual exclusion (exclusive access to a shared resource) must be enforced; mismanaged, it is one of the causes of deadlock (a minimal lock sketch follows this slide)
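To illustrate the mutual-exclusion point above, here is a minimal Python sketch (an assumption of this note, not code from the text): four threads increment a shared counter, and a lock guarantees that only one thread is inside the critical section at a time.

import threading

counter = 0
lock = threading.Lock()

def deposit(n):
    global counter
    for _ in range(n):
        with lock:            # mutual exclusion: only one thread may be
            counter += 1      # inside the critical section at a time

threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                # always 400000 when the lock is held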

+ Multiprocessor Operating System Design Considerations…
 Memory management
 In addition to dealing with all of the issues found on
uniprocessor machines, the OS needs to exploit the available
hardware parallelism to achieve the best performance
 Paging mechanisms on different processors must be
coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement

 Reliability and fault tolerance


 OS should provide graceful degradation in the face of
processor failure
 Scheduler and other portions of the operating system must
recognize the loss of a processor and restructure
accordingly
+
17.3- Cache Coherence and the MESI Protocol
Review:
Write back: Write operations are usually made only to the cache.
Main memory is only updated when the corresponding cache line
is flushed from the cache  can result in inconsistency

Write through: All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Even with the write-through policy, inconsistency can occur unless other caches monitor the memory traffic or receive some direct notification of the update.

MESI (modified/exclusive/shared/invalid) protocol is
recommended here.
Coherent: sticking together
Consistency: agreement, without ambiguity
Protocol: a defined set of steps for communication
+ Cache Coherence…

Software Solutions

 Attempt to avoid the need for additional hardware circuitry and logic by relying on the compiler and operating system to deal with the problem (no additional hardware is desired)
 Attractive because the overhead of detecting
potential problems is transferred from run time to
compile time, and the design complexity is transferred
from hardware to software
 However, compile-time software approaches generally must
make conservative decisions, leading to inefficient cache
utilization
+
Cache Coherence…
Hardware-Based Solutions
 Generally referred to as cache coherence protocols
 These solutions provide dynamic recognition at run time
of potential inconsistency conditions
 Because the problem is only dealt with when it actually
arises there is more effective use of caches, leading to
improved performance over a software approach
 Approaches are transparent to the programmer and the
compiler, reducing the software development burden
 Can be divided into two categories:
 Directory protocols
 Snoopy protocols
Transparent: not visible to (i.e., requiring no action from) the programmer or compiler
Snoop: to spy or eavesdrop
Directory Protocols

 Collect and maintain information about copies of data in caches
 A centralized controller, part of the main memory controller, holds the directory, which is stored in main memory
 Requests are checked against the directory and the appropriate transfers are performed
 Effective in large-scale systems with complex interconnection schemes
 Drawback: creates a central bottleneck
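As a toy illustration of the directory idea (an assumed design, not any specific machine's protocol), the sketch below keeps a Python dict mapping each line address to the set of caches holding a copy; a write consults the directory and invalidates only the recorded sharers.

directory = {}                 # line address -> set of cache ids holding a copy

def handle_read(line, cache_id):
    directory.setdefault(line, set()).add(cache_id)   # record the new sharer

def handle_write(line, cache_id):
    holders = directory.setdefault(line, set())
    for other in holders - {cache_id}:                # invalidate other copies
        print(f"invalidate line {line:#x} in cache {other}")
    directory[line] = {cache_id}                      # writer holds the only copy

handle_read(0x100, 0)
handle_read(0x100, 1)
handle_write(0x100, 0)         # prints: invalidate line 0x100 in cache 1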
Snoopy Protocols
 Distribute the responsibility for maintaining cache coherence
among all of the cache controllers in a multiprocessor
 A cache must recognize when a line that it holds is shared with other
caches
 When updates are performed on a shared cache line, it must be
announced to other caches by a broadcast mechanism
 Each cache controller is able to “snoop” on the network to observe these
broadcast notifications and react accordingly

 Suited to bus-based multiprocessors because the shared bus provides a simple means for broadcasting and snooping
 Care must be taken that the increased bus traffic required for broadcasting
and snooping does not cancel out the gains from the use of local caches

 Two basic approaches have been explored:


 Write invalidate
 Write update (or write broadcast)
+
Write Invalidate

 Multiple readers, but only one writer at a time


 When a write is required, all other cached copies of the line are invalidated (marked invalid)
 The writing processor then has exclusive access until the line is required by another processor
 Most widely used in commercial multiprocessor systems
such as the Pentium 4 and PowerPC
 State of every line is marked as modified, exclusive,
shared or invalid
 For this reason the write-invalidate protocol is called MESI
+
Write Update

 Can be multiple readers and writers


 When a processor wishes to update a shared line
the word to be updated is distributed to all others
and caches containing that line can update it
 Some systems use an adaptive mixture of both write-
invalidate and write-update mechanisms
+
MESI Protocol
To provide cache consistency on an SMP (symmetric
multi-processor) the data cache supports a protocol
known as MESI:
 Modified
 The line in the cache has been modified and is available
only in this cache

 Exclusive
 The line in the cache is the same as that in main memory
and is not present in any other cache

 Shared
 The line in the cache is the same as that in main memory
and may be present in another cache

 Invalid
 The line in the cache does not contain valid data
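A simplified sketch of the MESI idea (an illustration, not the full protocol): the transition table below models one cache's view of a single line. The event names and the reduced set of transitions are assumptions of this sketch; a real implementation also handles bus transactions and write-backs.

# Simplified MESI state transitions for one cache line, from one cache's view.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def next_state(state, event):
    transitions = {
        (INVALID,   "local_read_miss_no_sharers"): EXCLUSIVE,
        (INVALID,   "local_read_miss_shared"):     SHARED,
        (INVALID,   "local_write_miss"):           MODIFIED,
        (EXCLUSIVE, "local_write"):                MODIFIED,  # silent upgrade, no bus traffic
        (EXCLUSIVE, "remote_read"):                SHARED,
        (SHARED,    "local_write"):                MODIFIED,  # broadcast invalidate first
        (SHARED,    "remote_write"):               INVALID,
        (MODIFIED,  "remote_read"):                SHARED,    # supply the dirty data
        (MODIFIED,  "remote_write"):               INVALID,   # flush, then invalidate
    }
    return transitions.get((state, event), state)

state = INVALID
for event in ["local_read_miss_no_sharers", "local_write",
              "remote_read", "remote_write"]:
    state = next_state(state, event)
    print(event, "->", state)   # E, M, S, I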
Table 17.1: MESI Cache Line States

Table 17.1 summarizes the meaning of the four states.

MESI State Transition Diagram (figure)
+
17.4- Multithreading and Chip
Multiprocessors
 Processor performance can be measured by the rate at which it executes instructions:
  • MIPS rate = f × IPC   (Millions of Instructions Per Second)
  • f = processor clock frequency, in MHz
  • IPC = average Instructions Per Cycle
  • (a worked example follows this slide)

 Performance can be increased by raising the clock frequency or by increasing the number of instructions that complete during a cycle

 Multithreading
  • Allows a high degree of instruction-level parallelism without increasing circuit complexity or power consumption, thereby increasing IPC
  • The instruction stream is divided into several smaller streams, known as threads, that can be executed in parallel
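A quick worked example of the MIPS rate formula above (the clock rate and IPC values are illustrative, not from the text):

f_mhz = 2000          # assumed 2 GHz clock (illustrative)
ipc = 1.5             # assumed average instructions completed per cycle
mips = f_mhz * ipc
print(mips)           # 3000.0 MIPS, i.e. 3 billion instructions per second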
Definitions of Threads and Processes

 Thread: the dispatchable unit of work within a process
  • Includes processor context (which includes the program counter and stack pointer) and its own data area for a stack
  • Executes sequentially and is interruptible so that the processor can turn to another thread

 Process: an instance of a program running on a computer
  • Two key characteristics: resource ownership and scheduling/execution

 Thread switch: the act of switching processor control between threads within the same process
  • Typically less costly than a process switch

 Process switch: an operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second

 Note: a thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system. A thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership.
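A minimal Python sketch of the thread/process distinction (illustrative, not from the text): the threads below are the scheduled units and share the owning process's memory, while each keeps its own local, per-stack state.

import threading

shared_log = []                 # owned by the process, visible to all threads

def worker(name):
    local_count = 0             # per-thread state lives on the thread's own stack
    for _ in range(3):
        local_count += 1
    shared_log.append((name, local_count))

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_log)               # both threads wrote into shared process memory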
Implicit and Explicit Multithreading

 All commercial processors and most experimental ones use explicit multithreading
  • Concurrently execute instructions from different explicit threads
  • Interleave instructions from different threads on shared pipelines, or execute threads in parallel on parallel pipelines

 Implicit multithreading is the concurrent execution of multiple threads extracted from a single sequential program
  • Implicit threads are defined statically by the compiler or dynamically by the hardware
+ Approaches to Explicit Multithreading

 Interleaved (fine-grained)
  • Processor deals with two or more thread contexts at a time
  • Switches threads at each clock cycle
  • If a thread is blocked it is skipped

 Blocked (coarse-grained)
  • Thread is executed until an event causes a delay (e.g., I/O or a cache miss)
  • Effective on an in-order processor
  • Avoids pipeline stalls

 Simultaneous (SMT: Simultaneous Multithreading)
  • Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor

 Chip multiprocessing
  • Processor is replicated on a single chip
  • Each processor handles separate threads
  • Advantage is that the available logic area on a chip is used effectively

(A toy scheduling sketch of the interleaved and blocked policies follows this slide.)
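The toy simulation below (an illustration, not from the text) contrasts the interleaved and blocked policies on two made-up instruction streams; "stall" marks a long-latency event, and the simplification that the interleaved scheduler never skips a blocked thread is noted in the comments.

threads = {"A": ["op", "op", "stall", "op"],
           "B": ["op", "stall", "op", "op"]}

def interleaved(thread_streams):
    # Switch threads every cycle (real designs would also skip blocked threads).
    order = []
    streams = {k: list(v) for k, v in thread_streams.items()}
    while any(streams.values()):
        for name, ops in streams.items():
            if ops:
                order.append((name, ops.pop(0)))
    return order

def blocked(thread_streams):
    # Run one thread until a long-latency event forces a switch.
    order = []
    streams = {k: list(v) for k, v in thread_streams.items()}
    names = list(streams)
    turn = 0
    while any(streams.values()):
        name = names[turn % len(names)]
        ops = streams[name]
        while ops:
            op = ops.pop(0)
            order.append((name, op))
            if op == "stall":
                break           # switch threads on the delay event
        turn += 1
    return order

print(interleaved(threads))     # A and B alternate cycle by cycle
print(blocked(threads))         # each thread runs until its first stall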
+
Approaches to Executing Multiple Threads (figure)
+
Example Systems

 Pentium 4
  • More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
  • The approach is to use SMT with support for two threads
  • Thus the single multithreaded processor is logically two processors

 IBM Power5
  • Chip used in high-end PowerPC products
  • Combines chip multiprocessing with SMT
  • Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
  • Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
+
Exercises
 17.1 List and briefly define three types of computer system
organization.
 17.2 What are the chief characteristics of an SMP (symmetric multiprocessor)?
 17.3 What are some of the potential advantages of an SMP compared with a uniprocessor?
 17.4 What are some of the key OS design issues for an SMP?
 17.5 What is the difference between software and hardware cache coherence schemes?
 17.6 What is the meaning of each of the four states in the
MESI protocol?
+ Summary: Parallel Processing (Chapter 17)

 Multiple processor organizations
  • Types of parallel processor systems
  • Parallel organizations
 Symmetric multiprocessors
  • Organization
  • Multiprocessor operating system design considerations
 Cache coherence and the MESI protocol
  • Software solutions
  • Hardware solutions
  • The MESI protocol
 Multithreading and chip multiprocessors
  • Implicit and explicit multithreading
  • Approaches to explicit multithreading
  • Example systems
