
Chapter 16
Instruction-Level Parallelism and Superscalar Processors
Table of contents
1. Overview
2. Design issues
3. Pentium 4
4. ARM Cortex-A8
01.
Overview
Superscalar
• Term first coined in 1987
• Refers to a machine that is designed to improve the performance of the execution of scalar instructions
• In most applications the bulk of the operations are on scalar quantities
• Represents the next step in the evolution of high-performance general-purpose processors
• Essence of the approach is the ability to execute instructions independently and concurrently in different pipelines
• Concept can be further exploited by allowing instructions to be executed in an order different from the program order
Superscalar organization vs. ordinary scalar organization (figure)
Comparison of the superscalar and superpipelined approaches (figure)
Constraints
• Instruction level parallelism
  • Refers to the degree to which the instructions of a program can be executed in parallel
  • A combination of compiler-based optimization and hardware techniques can be used to maximize instruction level parallelism
• Limitations (three of the data-related ones are illustrated in the sketch below):
  • True data dependency
  • Procedural dependency
  • Resource conflicts
  • Output dependency
  • Antidependency
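To make the data-related limitations concrete, here is a minimal Python sketch (all register names hypothetical): each tuple is (destination, sources) for one instruction, and the nested loop classifies true data dependencies (RAW), output dependencies (WAW), and antidependencies (WAR).

```python
# A minimal sketch with a hypothetical four-instruction stream.
instrs = [
    ("r3", ("r1", "r2")),  # I1: r3 = r1 + r2
    ("r4", ("r3",)),       # I2: true (RAW) dependency on I1 via r3
    ("r3", ("r5",)),       # I3: output (WAW) dependency on I1 via r3
    ("r5", ("r6", "r7")),  # I4: antidependency (WAR) on I3 via r5
]

for i, (dst_i, srcs_i) in enumerate(instrs):
    for j in range(i + 1, len(instrs)):
        dst_j, srcs_j = instrs[j]
        if dst_i in srcs_j:   # later instruction reads an earlier result
            print(f"I{j+1} has a true (RAW) dependency on I{i+1} via {dst_i}")
        if dst_i == dst_j:    # both write the same register
            print(f"I{j+1} has an output (WAW) dependency on I{i+1} via {dst_i}")
        if dst_j in srcs_i:   # later instruction overwrites an earlier source
            print(f"I{j+1} has an antidependency (WAR) on I{i+1} via {dst_j}")
```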
02.
Design issues
Instruction level parallelism and machine parallelism

Instruction level parallelism
• Exists when instructions in a sequence are independent
• Execution can be overlapped
• Governed by data and procedural dependency

Machine parallelism
• Ability to take advantage of instruction level parallelism
• Governed by number of parallel pipelines
Instruction issue policy

Instruction issue
• Refers to the process of initiating instruction execution in the processor’s functional units
• Instruction issue occurs when an instruction moves from the decode stage of the pipeline to the first execute stage of the pipeline

Instruction issue policy
• Refers to the protocol used to issue instructions

Three types of orderings are important
• The order in which instructions are fetched
• The order in which instructions are executed
• The order in which instructions update the contents of registers and memory locations

Superscalar instruction issue policies can be grouped into the following categories (a small issue sketch follows the list)
• In-order issue with in-order completion
• In-order issue with out-of-order completion
• Out-of-order issue with out-of-order completion
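As a rough illustration of the last category, the following Python sketch models a hypothetical 2-wide machine in which an instruction issues as soon as everything it depends on has completed, regardless of program order. It is a toy model under simplifying assumptions (single-cycle execution, made-up dependency list), not any real issue logic.

```python
# name -> set of instructions it depends on (hypothetical program).
instrs = {
    "I1": set(),
    "I2": {"I1"},   # I2 must wait for I1
    "I3": set(),
    "I4": {"I3"},   # I4 must wait for I3
}

done, cycle = set(), 0
while len(done) < len(instrs):
    # Ready = not yet issued and all dependencies completed.
    ready = [i for i in instrs if i not in done and instrs[i] <= done]
    issued = ready[:2]                  # at most two issues per cycle
    done.update(issued)
    print(f"cycle {cycle}: issue {issued}")
    cycle += 1
# cycle 0: issue ['I1', 'I3']   (I3 issues ahead of I2 -> out of order)
# cycle 1: issue ['I2', 'I4']
```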
Superscalar instruction issue and completion policies (figure)
Organization for out-of-order issue with out-of-order completion (figure)
Register renaming
• Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program
• May result in a pipeline stall
• Registers are allocated dynamically (a minimal sketch follows)
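A minimal Python sketch of the idea, with hypothetical register names and free-list size: every architectural destination is given a fresh physical register, which removes output dependencies and antidependencies while true dependencies survive through the renamed sources.

```python
# Architectural r0..r7 initially map to physical p0..p7; fresh physical
# registers come from a (hypothetical, 4-entry) free list.
free_list = ["p8", "p9", "p10", "p11"]
rename_map = {f"r{i}": f"p{i}" for i in range(8)}

def rename(dst, srcs):
    srcs = tuple(rename_map[s] for s in srcs)   # read current mapping first
    rename_map[dst] = free_list.pop(0)          # fresh physical destination
    return rename_map[dst], srcs

print(rename("r3", ("r1", "r2")))  # I1 writes p8
print(rename("r4", ("r3",)))       # I2 reads p8: true dependency preserved
print(rename("r3", ("r5",)))       # I3 writes p10: WAW with I1 removed
print(rename("r5", ("r6", "r7")))  # I4 writes p11: WAR with I3 removed
```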


Speedups of various machine organizations without procedural dependencies (figure)
Branch prediction
• Any high-performance pipelined machine must address the issue of dealing with branches
• Intel 80486 addressed the problem by fetching both the next sequential instruction after a branch and speculatively fetching the branch target instruction
• RISC machines:
  • Delayed branch strategy was explored
  • Processor always executes the single instruction that immediately follows the branch
  • Keeps the pipeline full while the processor fetches a new instruction stream
• Superscalar machines:
  • Delayed branch strategy has less appeal
  • Have returned to pre-RISC techniques of branch prediction (a minimal predictor sketch follows)
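One classic dynamic prediction technique of this kind is a table of 2-bit saturating counters indexed by branch address. The sketch below is a generic illustration; the table size and the modulo indexing are arbitrary choices for the example, not a specific processor's design.

```python
class TwoBitPredictor:
    def __init__(self, entries=512):
        self.counters = [2] * entries          # states 0..3; start "weakly taken"

    def predict(self, pc):
        # Counter >= 2 means predict taken.
        return self.counters[pc % len(self.counters)] >= 2

    def update(self, pc, taken):
        # Saturate at 0 and 3 so one anomalous outcome doesn't flip the prediction.
        i = pc % len(self.counters)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bp = TwoBitPredictor()
for outcome in (True, True, False, True):      # a loop-like branch history
    print("predict taken:", bp.predict(0x400), "actual:", outcome)
    bp.update(0x400, outcome)
```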
Conceptual depiction of superscalar processing (figure)
Superscalar implementation
• Key elements:
  • Instruction fetch strategies that simultaneously fetch multiple instructions
  • Logic for determining true dependencies involving register values, and mechanisms for communicating these values to where they are needed during execution
  • Mechanisms for initiating, or issuing, multiple instructions in parallel
  • Resources for parallel execution of multiple instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references
  • Mechanisms for committing the process state in correct order
03.
Pentium 4
Pentium 4 diagram
1. The processor fetches instructions from memory in the order of the static program
2. Each instruction is translated into one or more fixed-length RISC instructions, known as micro-operations (a hedged decode sketch follows the list)
3. The processor executes the micro-ops on a superscalar pipeline organization, so that the micro-ops may be executed out of order
4. The processor commits the results of each micro-op execution to the processor’s register set in the order of the original program flow
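As a hedged illustration of step 2, the sketch below breaks a memory-form x86 add into load / operate / store micro-ops. The mnemonics and the three-way split are illustrative only, not Intel's actual micro-op encoding.

```python
def decode(x86_instr):
    # A memory-destination CISC instruction becomes several RISC-like
    # micro-ops; a simple register-register op maps to a single micro-op.
    if x86_instr == "add [mem], eax":          # hypothetical memory form
        return ["uLOAD  t0, [mem]",
                "uADD   t0, t0, eax",
                "uSTORE [mem], t0"]
    return [x86_instr]

print(decode("add [mem], eax"))   # three micro-ops
print(decode("add ebx, eax"))     # one micro-op
```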
Pentium 4 pipeline
The Pentium 4 architecture implements a CISC instruction set architecture on a RISC microarchitecture. The inner RISC micro-ops pass through a pipeline with at least 20 stages; in some cases, a micro-op requires multiple execution stages, resulting in an even longer pipeline.
Front end
• Generation of micro-ops
  • The Pentium 4 organization has an in-order front end that feeds into an L1 instruction cache called the trace cache.
  • The fetch/decode unit fetches x86 machine instructions from the L2 cache 64 bytes at a time.
  • Branch prediction via the BTB & I-TLB unit may alter the sequential fetch operation.
  • Once instructions are fetched, the fetch/decode unit scans the bytes to determine instruction boundaries, and the decoder translates each machine instruction into one to four micro-ops.
  • The generated micro-ops are stored in the trace cache.
• Trace cache next instruction pointer
  • The Pentium 4 uses a dynamic branch prediction strategy, using the history information stored for a BTB entry to determine whether to predict that the branch is taken.
  • The Pentium 4 BTB is organized as a four-way set-associative cache with 512 lines, and each entry uses the address of the branch as a tag.
  • Conditional branches that do not have a history in the BTB are predicted using a static prediction algorithm.
• Trace cache fetch
  • The trace cache takes the already-decoded micro-ops from the instruction decoder and assembles them into program-ordered sequences of micro-ops called traces.
  • Micro-ops are fetched sequentially from the trace cache, subject to the branch prediction logic.
  • A few instructions require more than four micro-ops, and these instructions are transferred to microcode ROM.
• Drive
  • The fifth stage of the Pentium 4 pipeline delivers decoded instructions from the trace cache to the rename/allocator module.
Out-of-order execution logic
• In the allocate stage, resources required for execution are allocated, including a reorder buffer (ROB) entry, a register entry for the result data value, and possibly a load or store buffer. The ROB is a circular buffer that can hold up to 126 micro-ops and tracks their completion status.
• In the register renaming stage, references to the 16 architectural registers are remapped into a set of 128 physical registers to remove false dependencies. Micro-ops are then placed in one of two micro-op queues, one for memory operations and the other for micro-ops that do not involve memory references.
• In the micro-op scheduling and dispatching stage, the schedulers retrieve micro-ops from the queues and dispatch them for execution, up to six at a time, favoring in-order execution but with flexibility to allow out-of-order execution. (A small in-order-commit sketch follows.)
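The sketch below illustrates the in-order-commit role of the ROB described above: micro-ops may complete out of order, but results leave the buffer only from its head, in program order. The entry names are hypothetical; the 126-entry capacity matches the figure quoted above.

```python
from collections import deque

rob = deque(maxlen=126)                    # program order, oldest at the left
rob.extend([["u1", False], ["u2", False], ["u3", False]])  # [uop, completed?]

rob[2][1] = True                           # u3 completes first (out of order)
rob[0][1] = True                           # then u1 completes

# Commit only completed entries, and only from the head of the buffer.
while rob and rob[0][1]:
    print("commit", rob.popleft()[0])      # prints "commit u1", then stalls at u2
```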
Integer and floating-point execution units
• The integer and floating-point register files are used as the source for pending operations by the execution units.
• The execution units retrieve values from the register files and the L1 data cache.
• A separate pipeline stage computes flags such as zero and negative, typically used as input to a branch instruction.
• Branch checking is performed in a subsequent pipeline stage, comparing the actual branch result with the prediction.
  • If a branch prediction is wrong, micro-operations in various stages must be removed from the pipeline.
  • The branch predictor is provided with the correct branch destination during a drive stage, which restarts the pipeline from the new target address. (A small branch-check sketch follows.)
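A small sketch of the branch-check idea, under the simplifying assumption that a misprediction squashes all younger speculative micro-ops and returns the correct target for refetch. All names here are hypothetical.

```python
def branch_check(predicted_taken, actual_taken, younger_uops, correct_target):
    if predicted_taken == actual_taken:
        return younger_uops, None          # prediction right: keep the work
    return [], correct_target              # misprediction: flush and restart

print(branch_check(True, True,  ["u5", "u6"], 0x4000))  # (['u5', 'u6'], None)
print(branch_check(True, False, ["u5", "u6"], 0x4000))  # ([], 16384)
```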
04.
ARM Cortex-A8
Instruction fetch unit
 Predicts the instruction stream
 Fetches instructions from the L1 instruction cache
 Places the fetched instructions into a buffer for consumption by the decode pipeline
 Also includes the L1 instruction cache
 Fetches are speculative (there is no guarantee that fetched instructions are executed)
 A branch or exceptional instruction in the code stream can cause a pipeline flush
 Can fetch up to four instructions per cycle
 F0
   The address generation unit (AGU) generates a new virtual address
   Not counted as part of the 12-stage pipeline
 F1
   The calculated address is used to fetch instructions from the L1 instruction cache
   In parallel, the fetch address is used to access the branch prediction arrays
 F3
   Instruction data are placed in the instruction queue
   If an instruction results in a branch prediction, the new target address is sent to the address generation unit
Instruction decode unit
 Decodes and sequences all ARM and Thumb instructions
 Dual pipeline structure, pipe0 and pipe1
   Two instructions can progress at a time
   Pipe0 contains the older instruction in program order
   If the instruction in pipe0 cannot issue, the instruction in pipe1 will not issue
 All issued instructions progress in order
 Results are written back to the register file at the end of the execution pipeline
   Prevents WAR hazards
   Keeps track of WAW hazards and makes recovery from flush conditions straightforward
 Main concern of the decode pipeline is prevention of RAW hazards (a small dual-issue sketch follows)
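A minimal Python sketch of the in-order dual-issue rule above: pipe0 holds the older instruction, and the younger one in pipe1 may issue only if pipe0 issues too. An instruction is treated as issuable here when all of its source registers are ready, a stand-in for the real RAW check; the register names are hypothetical.

```python
def try_issue(pair, ready_regs):
    issued = []
    for dst, srcs in pair:                     # pair = (older, younger)
        if all(s in ready_regs for s in srcs): # sources ready -> no RAW stall
            issued.append(dst)
        else:
            break                              # the younger never passes the older
    return issued

ready = {"r1", "r2", "r3"}
print(try_issue((("r4", ("r1", "r2")), ("r5", ("r3",))), ready))  # both issue
print(try_issue((("r4", ("r9",)),      ("r5", ("r3",))), ready))  # neither issues
```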
Thank you!
