TLP


Overview

¨ Announcement
¤ Homework 4 is due on Dec. 11th

¨ This lecture
¤ Thread level parallelism (TLP)
¤ Parallel architectures for exploiting TLP
n Hardware multithreading
n Symmetric multiprocessors
n Chip multiprocessing
Flynn’s Taxonomy
¨ Forms of computer architectures
¤ Classified by instruction stream (single or multiple) and data stream (single or multiple)
n Single-Instruction, Single Data (SISD): uniprocessors
n Multiple-Instruction, Single Data (MISD): systolic arrays
n Single-Instruction, Multiple Data (SIMD): vector processors
n Multiple-Instruction, Multiple Data (MIMD): multiprocessors
Basics of Threads
¨ A thread is a single sequential flow of control within a
program, including its instructions and state
¤ The register state is called the thread context
¨ A program may be single- or multi-threaded
¤ A single-threaded program can handle only one task at a
time
¨ Modern operating systems perform multitasking by context
switching: the old thread’s context is written back to memory
and the context of a new thread is loaded
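The context switch described above can be sketched as a toy model. This is an illustration only: `ThreadContext`, `ToyCPU`, and `context_switch` are hypothetical names, and the four-register machine is an assumption, not something from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Register state of one thread (hypothetical, heavily simplified)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)


class ToyCPU:
    """A CPU with one set of registers, so it holds one context at a time."""

    def __init__(self):
        self.pc = 0
        self.regs = [0] * 4

    def save(self) -> ThreadContext:
        # Copy the running thread's register state so it can go to memory.
        return ThreadContext(self.pc, list(self.regs))

    def load(self, ctx: ThreadContext) -> None:
        # Copy a previously saved context into the CPU registers.
        self.pc = ctx.pc
        self.regs = list(ctx.regs)


def context_switch(cpu, memory, old_tid, new_tid):
    memory[old_tid] = cpu.save()   # old context is written back to memory
    cpu.load(memory[new_tid])      # new context is loaded


memory = {"A": ThreadContext(), "B": ThreadContext(pc=100, regs=[7, 0, 0, 0])}
cpu = ToyCPU()
cpu.load(memory["A"])
cpu.pc, cpu.regs[0] = 12, 42       # thread A makes some progress
context_switch(cpu, memory, "A", "B")
state_b = (cpu.pc, cpu.regs[0])    # CPU now holds B's context
context_switch(cpu, memory, "B", "A")
state_a = (cpu.pc, cpu.regs[0])    # A resumes where it stopped
```

Every switch copies the whole context in both directions, which is the cost that hardware multithreading tries to avoid by keeping several contexts on the processor.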
Thread Level Parallelism (TLP)
¨ Users prefer to execute multiple applications
¤ Piping applications in Linux
n gunzip -c foo.gz | grep bar | perl some-script.pl

¤ Your favorite applications while working in the office
n Music player, web browser, terminal, etc.
¨ Many applications are amenable to parallelism
¤ Explicitly multi-threaded programs
n Pthreaded applications
¤ Parallel languages and libraries
n Java, C#, OpenMP
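As a minimal sketch of an explicitly multi-threaded program, the example below splits a grep-like count across worker threads using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function names and the input data are hypothetical. In CPython the global interpreter lock limits how much of this work truly runs in parallel, so the sketch shows the program structure rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(lines, needle):
    """Count the lines that contain needle (the work done by one thread)."""
    return sum(1 for line in lines if needle in line)


def parallel_count(lines, needle, workers=4):
    """Split lines into chunks and count matches in each chunk on its own thread."""
    chunk = max(1, (len(lines) + workers - 1) // workers)
    parts = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda part: count_matches(part, needle), parts))


# Hypothetical input: 1000 lines, half of which contain "bar".
lines = ["foo bar", "baz", "bar bar", "qux"] * 250
total = parallel_count(lines, "bar")
```

The threads share no mutable state; each returns a partial count that the main thread adds up, which keeps the example free of locks.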
Thread Level Parallel Architectures
¨ Architectures for exploiting thread-level parallelism
Hardware Multithreading
q Multiple threads run on the same processor pipeline
q Multithreading levels
o Coarse grained multithreading (CGMT)
o Fine grained multithreading (FGMT)
o Simultaneous multithreading (SMT)
Multiprocessing
q Different threads run on different processors
q Two general types
o Symmetric multiprocessors (SMP)
§ Single CPU per chip
o Chip Multiprocessors (CMP)
§ Multiple CPUs per chip
Hardware Multithreading
¨ Observation: CPUs become idle due to the latency of
memory operations, dependent instructions, and
branch resolution
¨ Key idea: utilize idle resources to improve
performance
¤ Support multiple thread contexts in a single processor
¤ Exploit thread level parallelism

¨ Challenge: the energy and performance costs of
context switching
Coarse Grained Multithreading
¨ A single thread runs until a costly stall (e.g., a last-level
cache miss)
¨ Another thread starts running while the first thread is stalled
¤ Filling the pipeline after a switch takes several cycles!
¨ At any time, only one thread is in the pipeline
¨ Does not cover short stalls
¨ Needs hardware support
¤ PC and register file for each thread
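The switching policy above can be sketched as a small cycle-count model. All parameters are assumptions chosen for illustration: a miss costs `MISS_LATENCY` cycles, every thread switch (including the first start) costs `REFILL` cycles of pipeline fill, and each thread is written as a string in which `.` is a one-cycle instruction and `M` is a load that misses in the last-level cache.

```python
MISS_LATENCY = 20   # assumed stall for a last-level cache miss, in cycles
REFILL = 3          # assumed pipeline fill time after a thread switch


def run_cgmt(threads, miss_latency=MISS_LATENCY, refill=REFILL):
    """Return the cycles needed to finish all threads under a toy CGMT model."""
    pc = [0] * len(threads)         # one PC per thread (hardware support)
    ready_at = [0] * len(threads)   # cycle at which each thread's miss resolves
    cycle, current = 0, None
    while True:
        pending = [i for i in range(len(threads)) if pc[i] < len(threads[i])]
        if not pending:
            return cycle
        # Pick the unfinished thread whose stall resolves earliest.
        nxt = min(pending, key=lambda i: (ready_at[i], i))
        cycle = max(cycle, ready_at[nxt])   # idle if every thread is stalled
        if nxt != current:
            cycle += refill                 # pay the pipeline fill time
            current = nxt
        # Only this thread is in the pipeline until it misses or finishes.
        while pc[current] < len(threads[current]):
            instr = threads[current][pc[current]]
            pc[current] += 1
            cycle += 1
            if instr == "M":
                ready_at[current] = cycle + miss_latency
                break


one_thread = run_cgmt(["..M.."])
two_threads = run_cgmt(["..M..", "..M.."])
```

In this model the second thread's work overlaps with the first thread's miss, so `two_threads` is well below twice `one_thread`. A stall shorter than `REFILL` would not be worth a switch, which matches the point that CGMT does not cover short stalls.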
Coarse Grained Multithreading
¨ Superscalar vs. CGMT
[Figure: two panels with columns FU1 to FU4, labeled Conventional Superscalar and Coarse Grained Multithreading]
Fine Grained Multithreading
¨ Two or more threads interleave instructions
¤ Round-robin fashion
¤ Skip stalled threads
¨ Needs hardware support
¤ Separate PC and register file for each thread
¤ Hardware to control alternating pattern
¨ Naturally hides delays
¤ Data hazards, cache misses
¤ Pipeline runs with rare stalls
¨ Does not make full use of multi-issue architecture
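The round-robin interleaving with skipping of stalled threads can be sketched as follows. This is a toy single-issue model under stated assumptions: each thread is a string where `.` is an ordinary instruction and `M` is an instruction that stalls its own thread for `stall` further cycles. The function name and the encoding are hypothetical.

```python
def run_fgmt(threads, stall=4):
    """Return a timeline string: the thread id issued in each cycle, '-' if none."""
    n = len(threads)
    pc = [0] * n          # separate PC per thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    timeline = []
    last = -1             # thread that issued most recently
    cycle = 0
    while any(pc[i] < len(threads[i]) for i in range(n)):
        issued = None
        # Round-robin: start after the last issuer, skip stalled or finished threads.
        for step in range(1, n + 1):
            i = (last + step) % n
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                issued = i
                break
        if issued is None:
            timeline.append("-")          # every remaining thread is stalled
        else:
            instr = threads[issued][pc[issued]]
            pc[issued] += 1
            if instr == "M":
                ready_at[issued] = cycle + 1 + stall
            timeline.append(str(issued))
            last = issued
        cycle += 1
    return "".join(timeline)


interleaved = run_fgmt(["..", ".."])
hidden_stall = run_fgmt(["M.", "..."], stall=2)
exposed_stall = run_fgmt(["M."], stall=2)
```

With two threads, the stall of thread 0 is hidden behind instructions from thread 1; with a single thread the same stall shows up as idle cycles. The model issues at most one instruction per cycle, which is why FGMT alone does not fill a multi-issue machine.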


Fine Grained Multithreading
¨ CGMT vs. FGMT
[Figure: two panels with columns FU1 to FU4, labeled Coarse Grained Multithreading and Fine Grained Multithreading]

Simultaneous Multithreading
¨ Instructions from multiple threads are issued in the same
cycle
¤ Uses the register renaming and dynamic scheduling facilities
of a multi-issue architecture
¨ Needs more hardware support
¤ Register file and PC for each thread
¤ Temporary result registers before commit
¤ Support to sort out which threads get results from which
instructions
¨ Maximizes utilization of execution units
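A rough sketch of why SMT fills more issue slots than FGMT on a multi-issue machine. The model is an assumption made for illustration: each thread is a pair `(total_instructions, ilp)`, where `ilp` (at least 1) is the number of independent instructions the thread can offer per cycle, and `width` is the issue width. Renaming, scheduling, and fairness are not modeled; thread 0 simply gets first pick of the slots.

```python
def cycles_fgmt(threads, width=4):
    """One thread issues per cycle, round-robin, limited by its own ILP."""
    remaining = [total for total, _ in threads]
    cycle, turn = 0, 0
    while any(remaining):
        while remaining[turn] == 0:              # skip finished threads
            turn = (turn + 1) % len(threads)
        ilp = threads[turn][1]
        remaining[turn] -= min(ilp, remaining[turn], width)
        turn = (turn + 1) % len(threads)
        cycle += 1
    return cycle


def cycles_smt(threads, width=4):
    """Several threads share the issue slots of the same cycle."""
    remaining = [total for total, _ in threads]
    cycle = 0
    while any(remaining):
        slots = width
        for i, (_, ilp) in enumerate(threads):
            issue = min(ilp, remaining[i], slots)
            remaining[i] -= issue
            slots -= issue
        cycle += 1
    return cycle


# Hypothetical workload: two threads, 8 instructions each, ILP of 2.
workload = [(8, 2), (8, 2)]
fgmt_cycles = cycles_fgmt(workload)
smt_cycles = cycles_smt(workload)
```

In this workload each thread alone can use only two of the four slots, so FGMT leaves half the machine idle every cycle, while SMT fills the remaining slots from the other thread.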
Simultaneous Multithreading
¨ FGMT vs. SMT
[Figure: two panels with columns FU1 to FU4, labeled Fine Grained Multithreading and Simultaneous Multithreading]
Multiprocessing
Symmetric Multiprocessors
¨ Multiple CPU chips share the same memory
¨ From the OS’s point of view
¤ All of the CPUs have equal compute capabilities
¤ The main memory is equally accessible by the CPU chips
¨ OS runs every thread on a CPU
¨ Every CPU has its own power distribution and cooling system
[Figure: CPU 0, CPU 1, CPU 2, CPU 3; app, app, app; OS]
AMD Opteron
Chip Multiprocessors
¨ Can be viewed as a simple SMP on a single chip
¨ CPUs are now called cores
¤ One thread per core
¨ Shared higher level caches
¤ Typically the last level
¤ Lower latency
¤ Improved bandwidth
¨ Not necessarily homogeneous cores!
[Figure: cores labeled 0, 1, 3 and a shared cache]
Intel Nehalem (Core i7)


Why Chip Multiprocessing?
¨ CMP exploits parallelism at lower costs than SMP
¤ A single interface to the main memory
¤ Only one CPU socket is required on the motherboard
¨ CMP requires less off-chip communication
¤ Lower power and energy consumption
¤ Better performance due to improved average memory access time (AMAT)
¨ CMP better employs the additional transistors that
are made available by Moore’s law
¤ More cores rather than more complicated pipelines
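The AMAT argument can be made concrete with the standard formula AMAT = hit time + miss rate × miss penalty. The numbers below are hypothetical and chosen only to show the direction of the effect: a private-cache miss is served on chip in a CMP but has to leave the chip in an SMP.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty


# Assumed values, not measurements.
HIT_TIME = 1            # cycles for a hit in the private cache
MISS_RATE = 0.05        # fraction of accesses that miss
ON_CHIP_PENALTY = 30    # miss served by an on-chip shared cache (CMP)
OFF_CHIP_PENALTY = 150  # miss served over an off-chip link (SMP)

amat_cmp = amat(HIT_TIME, MISS_RATE, ON_CHIP_PENALTY)
amat_smp = amat(HIT_TIME, MISS_RATE, OFF_CHIP_PENALTY)
```

With the same hit time and miss rate, only the miss penalty differs, so the shorter on-chip path directly lowers AMAT.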
