Multiple Issue

1) The document discusses multiple-issue processors, which can execute more than one instruction per cycle, including superscalar processors and VLIW processors. 2) Superscalar processors have their instructions scheduled for execution by the hardware and may issue a different number of instructions each cycle, while VLIW processors issue a fixed number of operations that the compiler packs into one long instruction. 3) In-order superscalars issue instructions in compiler-generated order, so a stalled instruction holds up everything behind it, while out-of-order superscalars dynamically schedule around stalls in hardware.

Uploaded by

Nusrat Mary Chowdhury


Autumn 2006 CSE P548 - Multiple Instruction Width
Multiple Instruction Issue
Multiple instructions issued each cycle
a processor that can execute more than one instruction per cycle
issue width = the number of issue slots, 1 slot/instruction
not all types of instructions can be issued together
an example: 2 ALUs, 1 load/store unit, 1 FPU
1 ALU does shifts & integer multiplies; the other executes branches
Motivation:
better performance
increase instruction throughput
decrease in CPI (below 1)
Cost:
greater hardware complexity, potentially longer wire lengths
harder code scheduling job for the compiler
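The functional-unit example on this slide can be sketched as a check of which instruction groups fit into one issue cycle. This is only an illustration, not any real processor's issue logic: the class names (`alu`, `shift`, `mul`, `branch`, `mem`, `fp`), the unit names and the function `can_issue_together` are hypothetical, and the sketch assumes a plain ALU operation can run on either ALU.

```python
from itertools import permutations

# Hypothetical unit table for the slide's example machine:
# 2 ALUs, 1 load/store unit, 1 FPU.
UNITS = {
    "alu0": {"alu", "shift", "mul"},  # the ALU that also does shifts & integer multiplies
    "alu1": {"alu", "branch"},        # the ALU that also executes branches
    "lsu": {"mem"},
    "fpu": {"fp"},
}


def can_issue_together(group):
    """Return True if every instruction class in `group` can get its own unit."""
    if len(group) > len(UNITS):
        return False
    # Brute force over unit assignments; cheap enough for four units.
    for assignment in permutations(UNITS, len(group)):
        if all(cls in UNITS[unit] for cls, unit in zip(group, assignment)):
            return True
    return False


print(can_issue_together(["alu", "alu", "mem", "fp"]))  # one instruction per unit
print(can_issue_together(["shift", "mul"]))             # both need the same ALU
print(can_issue_together(["mem", "mem"]))               # only one load/store unit
```

Under these assumptions a full group needs one instruction per unit, while two shifts/multiplies or two memory operations cannot share a cycle.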
Superscalars
Require:
instruction fetch
fetching of multiple instructions at once
dynamic branch prediction & fetching speculatively beyond conditional branches
instruction issue
methods for determining which instructions can be issued next
the ability to issue multiple instructions in parallel
instruction commit
methods for committing several instructions in fetch order
duplicate & more complex hardware
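Committing several instructions in fetch order can be sketched with a queue that retires only from its head. The slides do not name a mechanism, so this is one possible model under stated assumptions: `commit_cycle`, the `[name, done]` entry layout and the `width` parameter are all hypothetical.

```python
from collections import deque


def commit_cycle(window, width):
    """Commit up to `width` finished instructions from the head of `window`.

    `window` is a deque of [name, done] entries kept in fetch order.
    Commit stops at the first unfinished instruction, so nothing
    retires ahead of an older instruction.
    """
    committed = []
    while window and len(committed) < width and window[0][1]:
        committed.append(window.popleft()[0])
    return committed


window = deque([["i0", True], ["i1", False], ["i2", True]])
print(commit_cycle(window, 2))  # i2 is done but must wait behind i1
window[0][1] = True             # i1 finishes
print(commit_cycle(window, 2))
```

The second call retires both remaining instructions in one cycle, which is the "several instructions" part of the requirement.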
2-way Superscalar
Multiple Instruction Issue
Superscalar processors
instructions are scheduled for execution by the hardware
different numbers of instructions may be issued simultaneously
VLIW (very long instruction word) processors
instructions are scheduled for execution by the compiler
a fixed number of operations are formatted as one big instruction
usually LIW (3 operations) today
In-order vs. Out-of-order Execution
In-order instruction execution
instructions are fetched, executed & committed in compiler-generated order
if one instruction stalls, all instructions behind it stall
instructions are statically scheduled: the hardware issues them in compiler-generated order
the hardware decides how many of the next n instructions can be issued, where n is the superscalar issue width
superscalars can have hazards within the n instructions
advantage of in-order instruction scheduling: simpler implementation
faster clock cycle
fewer transistors
faster design/development/debug time
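The question "how many of the next n instructions can be issued" can be sketched as a scan that stops at the first hazard inside the group. This is a minimal sketch that checks only register dependences within one cycle; structural hazards and multi-cycle latencies are ignored, and the `(dest, sources)` encoding and the name `issue_count` are hypothetical.

```python
def issue_count(window, width):
    """Count how many of the next `width` instructions can issue this cycle.

    Each instruction is (dest, sources). Issue is in order: the scan stops
    at the first instruction that reads or rewrites a register written
    earlier in the same group.
    """
    written = set()
    count = 0
    for dest, sources in window[:width]:
        if written & set(sources) or dest in written:
            break  # RAW or WAW hazard inside the group
        if dest is not None:
            written.add(dest)
        count += 1
    return count


# lw R1,0(R5) followed by addu R1,R1,R6: the addu needs the loaded R1
dependent_pair = [("R1", ["R5"]), ("R1", ["R1", "R6"])]
# lw R1,0(R5) followed by addi R5,R5,-4: no register written by lw is read
independent_pair = [("R1", ["R5"]), ("R5", ["R5"])]
print(issue_count(dependent_pair, 2), issue_count(independent_pair, 2))
```

With a 2-wide group the dependent pair issues one instruction and the independent pair issues two, which is why the compiler's ordering still matters on an in-order machine.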
In-order vs. Out-of-order Execution
Out-of-order instruction execution
instructions are fetched in compiler-generated order
instruction completion may be in-order (today) or out-of-order (older computers)
in between they may be executed in some other order
instructions are dynamically scheduled by the hardware
hardware decides in what order instructions can be executed
instructions behind a stalled instruction can pass it
advantages: higher performance
better at hiding latencies, less processor stalling
higher utilization of functional units
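The difference between stalling behind an instruction and passing it can be sketched with a toy issue model. Everything here is an assumption made for illustration: the `(dest, sources, latency)` encoding, the function name `last_issue_cycle`, and the simplifications that only true (read-after-write) dependences matter, as if registers were renamed, and that functional units are never a limit.

```python
def last_issue_cycle(program, width, out_of_order):
    """Return the cycle in which the last instruction issues.

    program: list of (dest, sources, latency) in compiler-generated order.
    A result is usable `latency` cycles after its instruction issues.
    """
    # For each instruction, find the older instructions that produce its sources.
    producers = []
    last_writer = {}
    for i, (dest, sources, _) in enumerate(program):
        producers.append([last_writer[r] for r in sources if r in last_writer])
        last_writer[dest] = i

    result_cycle = {}  # instruction index -> cycle its result becomes usable
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for i in pending:
            if len(issued) == width:
                break
            if all(result_cycle.get(p, float("inf")) <= cycle for p in producers[i]):
                issued.append(i)
                result_cycle[i] = cycle + program[i][2]
            elif not out_of_order:
                break  # in-order: everything behind a stalled instruction waits
        pending = [i for i in pending if i not in issued]
    return cycle


program = [
    ("R1", ["R5"], 3),        # load with a 3-cycle latency (assumed)
    ("R2", ["R1", "R6"], 1),  # depends on the load
    ("R7", ["R8"], 1),        # independent
    ("R9", ["R10"], 1),       # independent
]
print(last_issue_cycle(program, 1, out_of_order=False))  # 6
print(last_issue_cycle(program, 1, out_of_order=True))   # 4
```

In the in-order run the two independent instructions wait behind the stalled add; in the out-of-order run they issue during the load latency, so the load delay is hidden.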
In-order instruction issue: Alpha 21164
2 styles of static instruction scheduling
dispatch buffer & instruction slotting (Alpha 21164)
shift register model (UltraSPARC-1)
In-order instruction issue: Alpha 21164
Instruction slotting
can issue up to 4 instructions
completely empties the instruction buffer before filling it again
compiler can pad with nops so that conflicting instructions are issued with the following instructions, not alone
there can be no data dependences in the same issue cycle (some exceptions)
hardware to:
detect data hazards
control bypass logic
21164 Instruction Unit Pipeline
Fetch & issue
S0: instruction fetch
branch prediction bits read
S1: opcode decode
target address calculation
if predict taken, redirect the fetch
S2: instruction slotting: decide which of the next 4 instructions can be issued
intra-cycle structural hazard check
intra-cycle data hazard check
S3: instruction dispatch
inter-cycle load-use hazard check
register read
21164 Integer Pipeline
Execute (2 integer pipelines)
S4: integer execution
effective address calculation
S5: conditional move & branch execution
data cache access
S6: register write
also a 9-stage FP pipeline
In-order instruction issue: UltraSPARC-1
Shift register model
can issue up to 4 instructions per cycle
shift in new instructions after every group of instructions is issued
some data dependent instructions can issue in same cycle
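The two static issue styles differ mainly in when the instruction buffer is refilled: the slotting model empties the buffer completely before refilling, while the shift register model shifts in new instructions after every issued group. The sketch below compares only those two refill policies and does not model either real pipeline. The hazard rule (at most one memory operation `M` per cycle, any number of ALU operations `A`) and the name `cycles_to_issue` are assumptions made for illustration.

```python
def cycles_to_issue(stream, width, refill_when_empty):
    """Count cycles to issue `stream` in order from a `width`-entry buffer.

    refill_when_empty=True  -> refill only once the buffer is empty
    refill_when_empty=False -> top the buffer up after every issue group
    """
    stream = list(stream)
    buffer = []
    cycles = 0
    while stream or buffer:
        if not refill_when_empty or not buffer:
            while len(buffer) < width and stream:
                buffer.append(stream.pop(0))
        # Issue the longest in-order prefix containing at most one memory op.
        n = 0
        mem_seen = False
        for op in buffer:
            if op == "M":
                if mem_seen:
                    break
                mem_seen = True
            n += 1
        buffer = buffer[n:]
        cycles += 1
    return cycles


stream = "AMMAAAAAAA"
print(cycles_to_issue(stream, 4, refill_when_empty=True))   # 4
print(cycles_to_issue(stream, 4, refill_when_empty=False))  # 3
```

In this example the empty-before-refill policy spends a cycle issuing a half-empty buffer, which is the situation the nop-padding advice for the slotting model is meant to avoid.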
UltraSPARC 1
Superscalars
Performance impact:
increased performance because multiple instructions execute in parallel, not just overlapped
CPI potentially < 1 (0.5 on our R3000 example)
IPC (instructions/cycle) potentially > 1 (2 on our R3000 example)
better functional unit utilization
but
need to fetch more instructions (how many?)
need independent instructions (why?)
need a good local mix of instructions (why?)
need more instructions to hide load delays (why?)
need to make better branch predictions (why?)
Code Scheduling on Superscalars
Original code
Loop: lw R1, 0(R5)
addu R1, R1, R6
sw R1, 0(R5)
addi R5, R5, -4
bne R5, R0, Loop
Code Scheduling on Superscalars
ALU/branch instructions memory instructions clock cycle
Loop: 1
2
3
4
With latency-hiding code scheduling
Loop: lw R1, 0(R5)
addi R5, R5, -4
addu R1, R1, R6
sw R1, 4(R5)
bne R5, R0, Loop
Original code
Loop: lw R1, 0(R5)
addu R1, R1, R6
sw R1, 0(R5)
addi R5, R5, -4
bne R5, R0, Loop
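Moving the addi above the sw is only legal because the store offset changes from 0 to 4. The sketch below checks that claim by running both loop orderings on the same memory image. It assumes the scheduled loop loads from 0(R5) and branches against R0, ignores 32-bit wraparound in addu, and uses hypothetical helper names (`run_original`, `run_scheduled`) and made-up memory contents.

```python
def run_original(mem, r5, r6):
    """Original loop order, one Python statement per instruction."""
    mem = dict(mem)
    while True:
        r1 = mem[r5 + 0]   # lw   R1, 0(R5)
        r1 = r1 + r6       # addu R1, R1, R6
        mem[r5 + 0] = r1   # sw   R1, 0(R5)
        r5 = r5 - 4        # addi R5, R5, -4
        if r5 == 0:        # bne  R5, R0, Loop (falls through when R5 == 0)
            return mem


def run_scheduled(mem, r5, r6):
    """Latency-hiding order: addi moved up between the load and its use."""
    mem = dict(mem)
    while True:
        r1 = mem[r5 + 0]   # lw   R1, 0(R5)
        r5 = r5 - 4        # addi R5, R5, -4
        r1 = r1 + r6       # addu R1, R1, R6
        mem[r5 + 4] = r1   # sw   R1, 4(R5): offset compensates for the early addi
        if r5 == 0:        # bne  R5, R0, Loop
            return mem


memory = {4: 1, 8: 2, 12: 3, 16: 4}  # made-up word-addressed test data
assert run_original(memory, 16, 10) == run_scheduled(memory, 16, 10)
print(run_scheduled(memory, 16, 10))
```

Both versions add R6 to every word from address 16 down to 4, so the reordering changes the schedule but not the result.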
Code Scheduling on Superscalars: Loop Unrolling
What is the number of cycles per iteration?
What is the IPC?
Loop unrolling provides:
+
+
-
-
ALU/branch instructions      Memory instructions    clock cycle
Loop: addi R5, R5, -16       lw R1, 0(R5)           1
      (none)                 lw R2, 12(R5)          2
      addu R1, R1, R6        lw R3, 8(R5)           3
      addu R2, R2, R6        lw R4, 4(R5)           4
      addu R3, R3, R6        sw R1, 16(R5)          5
      addu R4, R4, R6        sw R2, 12(R5)          6
      (none)                 sw R3, 8(R5)           7
      bne R5, R0, Loop       sw R4, 4(R5)           8
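The two questions on this slide can be checked by counting slots in the unrolled schedule. This is a small sketch that copies the schedule into a list; the unroll factor of 4 is an assumption read off the four lw/sw pairs, and the variable names are hypothetical.

```python
# (ALU/branch slot, memory slot) for each clock cycle; None marks an empty slot.
schedule = [
    ("addi R5, R5, -16", "lw R1, 0(R5)"),
    (None,               "lw R2, 12(R5)"),
    ("addu R1, R1, R6",  "lw R3, 8(R5)"),
    ("addu R2, R2, R6",  "lw R4, 4(R5)"),
    ("addu R3, R3, R6",  "sw R1, 16(R5)"),
    ("addu R4, R4, R6",  "sw R2, 12(R5)"),
    (None,               "sw R3, 8(R5)"),
    ("bne R5, R0, Loop", "sw R4, 4(R5)"),
]
UNROLL = 4  # original iterations covered by one pass (assumed from the four lw/sw pairs)

cycles = len(schedule)
instructions = sum(op is not None for row in schedule for op in row)
cycles_per_iteration = cycles / UNROLL
ipc = instructions / cycles

print(cycles, instructions, cycles_per_iteration, ipc)  # 8 14 2.0 1.75
```

Fourteen instructions in eight cycles gives an IPC of 1.75 on a machine whose peak is 2, at two cycles per original iteration.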
Superscalars
Hardware impact:
more & pipelined functional units
multi-ported registers for multiple register access
more buses from the register file to the additional functional units
multiple decoders
more hazard detection logic
more bypass logic
wider instruction fetch
multi-banked L1 data cache
or else the processor has structural hazards (due to an unbalanced design) and stalling
The restrictions on instruction types that can be issued together help to reduce the amount of hardware.
Static (compiler) scheduling helps.
Modern Superscalars
Alpha 21364: 4 instructions
Pentium 4: 5 RISC-like operations dispatched to functional units
R12000: 4 instructions
UltraSPARC-3: 6 instructions dispatched
