Lec 15

This document discusses techniques for increasing instruction level parallelism (ILP) through multiple issue, including superscalar and very long instruction word (VLIW) processors. It describes how superscalar processors can issue a varying number of instructions per cycle using static or dynamic scheduling. It works through an example of static scheduling on a superscalar DLX processor and describes how dynamic scheduling (scoreboarding or Tomasulo's algorithm) can be applied. The limits on multiple issue imposed by inherent program ILP, hardware costs, and implementation complexity are covered. Compiler techniques that support ILP, such as loop unrolling, software pipelining, and trace scheduling, are also summarized.

LECTURE - 15

Further Topics in ILP

Multiple issue

Software support

Hardware support
Increasing ILP through Multiple Issue

With at most one issue per cycle, the minimum possible CPI is 1
But there are multiple functional units
Hence issue multiple instructions per cycle, so that CPI can drop below 1
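Put as a formula, with n (a symbol introduced here) the number of instructions issued per cycle, the bound above generalizes as follows; real programs fall short of it because of hazards and limited ILP.

```latex
\mathrm{CPI}_{\min} = \frac{1}{n}, \qquad \text{e.g. } n = 2 \;\Rightarrow\; \mathrm{CPI}_{\min} = 0.5
```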

Two ways to do multiple issue
Superscalar processor

Issue a varying number of instructions per cycle

Static or dynamic scheduling
Very Long Instruction Word (VLIW)

Issue a fixed number of instructions, packaged as one long instruction
Superscalar DLX

Simple version: two instructions issued per cycle
One integer instruction (load, store, branch, integer ALU) and one FP instruction
Instructions are paired and aligned on 64-bit boundaries: integer first, FP next

                  CC1  CC2  CC3  CC4  CC5  CC6
Integer instr.    IF   ID   EX   MEM  WB
FP instr.         IF   ID   EX   MEM  WB
Integer instr.         IF   ID   EX   MEM  WB
FP instr.              IF   ID   EX   MEM  WB
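A minimal Python sketch of the pairing rule, assuming (hypothetically) that instruction k sits at byte address 4·k, so a 64-bit-aligned pair can only start at an even index. Hazards are ignored here; the function only decides which instructions could share an issue cycle.

```python
from typing import List, Tuple


def issue_groups(classes: List[str]) -> List[Tuple[int, ...]]:
    """Group instructions (class 'int' or 'fp', in program order) into
    issue packets for the simple two-issue DLX.

    Assumption: instruction k is at byte address 4*k, so only even k
    starts a 64-bit-aligned pair. A pair issues together only if it is
    aligned, with the integer instruction first and the FP one second.
    """
    groups: List[Tuple[int, ...]] = []
    k = 0
    while k < len(classes):
        aligned = k % 2 == 0
        if (aligned and k + 1 < len(classes)
                and classes[k] == "int" and classes[k + 1] == "fp"):
            groups.append((k, k + 1))  # dual issue in one cycle
            k += 2
        else:
            groups.append((k,))  # single issue
            k += 1
    return groups


if __name__ == "__main__":
    print(issue_groups(["int", "fp", "int", "fp"]))  # [(0, 1), (2, 3)]
    print(issue_groups(["fp", "int", "fp"]))         # [(0,), (1,), (2,)]
```

The number of groups is the number of issue cycles when no stalls occur.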
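Superscalar DLX (continued)

No conflicts, almost...
Integer and FP instructions use separate register sets, so only FP loads, stores, and moves (which issue in the integer slot but access FP registers) cause problems:

Structural hazard on the FP register ports

New RAW hazard between the two instructions of a pair (the FP operation reads the register that the FP load writes)
Structural hazard:

Detect it, and do not issue the FP operation

Or, provide additional FP register ports
RAW hazard:

Detect it, and do not issue the FP operation

Also, the result of a LD cannot be used by the next 3 instructions: the one-cycle load delay now covers the other instruction of its pair and both instructions of the next pair.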
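The detection logic for both hazards can be sketched as a single check. The instruction classes (`fp_load`, `fp_op`, and so on) and the `extra_fp_ports` switch are hypothetical labels chosen for illustration; when the check fails, the FP operation simply waits for the next cycle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical class labels: integer-slot instructions that touch FP registers.
FP_REG_ACCESS_IN_INT_SLOT = {"fp_load", "fp_store", "fp_move"}


@dataclass(frozen=True)
class Instr:
    kind: str                   # e.g. "int_alu", "fp_load", "fp_op"
    dest: Optional[str] = None  # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read


def can_dual_issue(int_instr: Instr, fp_instr: Instr,
                   extra_fp_ports: bool = False) -> bool:
    """Decide whether the FP instruction may issue with the integer one.

    Only FP loads/stores/moves in the integer slot can conflict, because
    otherwise the two instructions use separate register sets.
    """
    if int_instr.kind not in FP_REG_ACCESS_IN_INT_SLOT:
        return True
    if not extra_fp_ports:
        return False  # structural hazard: both need an FP register port
    if int_instr.dest is not None and int_instr.dest in fp_instr.srcs:
        return False  # RAW hazard inside the pair
    return True


if __name__ == "__main__":
    ld = Instr("fp_load", "F0", ("R1",))
    add_dep = Instr("fp_op", "F4", ("F0", "F2"))
    add_indep = Instr("fp_op", "F8", ("F6", "F2"))
    print(can_dual_issue(ld, add_indep))                       # False
    print(can_dual_issue(ld, add_indep, extra_fp_ports=True))  # True
    print(can_dual_issue(ld, add_dep, extra_fp_ports=True))    # False
```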
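Static Scheduling in the Superscalar DLX: An Example
Loop: LD   F0, 0(R1)    // F0 is array element
      ADDD F4, F0, F2   // F2 has the scalar 'C'
      SD   0(R1), F4    // Stored result
      SUBI R1, R1, 8    // For next iteration
      BNEZ R1, Loop     // More iterations?

Unrolled five times and scheduled for the two-issue DLX:

      Integer instruction  FP instruction       Cycle
Loop: LD   F0, 0(R1)                            1
      LD   F6, -8(R1)                           2
      LD   F10, -16(R1)    ADDD F4, F0, F2      3
      LD   F14, -24(R1)    ADDD F8, F6, F2      4
      LD   F18, -32(R1)    ADDD F12, F10, F2    5
      SD   0(R1), F4       ADDD F16, F14, F2    6
      SD   -8(R1), F8      ADDD F20, F18, F2    7
      SD   -16(R1), F12                         8
      SUBI R1, R1, #40                          9
      SD   16(R1), F16                          10
      BNEZ R1, Loop                             11
      SD   8(R1), F20                           12

The last two stores use offsets 16 and 8 because SUBI has already decremented R1 by 40; the store of F20 fills the branch delay slot. The loop takes 12 cycles for 5 elements, i.e. 2.4 cycles per element.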
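A schedule like this can be checked mechanically. The sketch below encodes the five-times-unrolled loop (assuming the final result F20 is stored in the branch delay slot in cycle 12) and reports slot conflicts and RAW latency violations. The minimum producer-to-consumer distances in `MIN_DISTANCE` are assumptions modelled on classic DLX latencies, not values stated on the slide.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Assumed minimum distance (cycles) between producer and consumer issue;
# pairs not listed need only 1 cycle.
MIN_DISTANCE = {
    ("load", "fpadd"): 2,   # one-cycle load delay
    ("fpadd", "store"): 3,  # FP add result reaches a store 3 cycles later
    ("alu", "branch"): 2,   # branch tests R1 early, in ID
}


@dataclass(frozen=True)
class Op:
    cycle: int             # clock cycle in which the op issues
    slot: str              # "int" or "fp" issue slot
    kind: str              # "load", "fpadd", "store", "alu", "branch"
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


# Program order, integer slot before FP slot within a cycle.
SCHEDULE = [
    Op(1, "int", "load", "F0", ("R1",)),
    Op(2, "int", "load", "F6", ("R1",)),
    Op(3, "int", "load", "F10", ("R1",)), Op(3, "fp", "fpadd", "F4", ("F0", "F2")),
    Op(4, "int", "load", "F14", ("R1",)), Op(4, "fp", "fpadd", "F8", ("F6", "F2")),
    Op(5, "int", "load", "F18", ("R1",)), Op(5, "fp", "fpadd", "F12", ("F10", "F2")),
    Op(6, "int", "store", None, ("R1", "F4")), Op(6, "fp", "fpadd", "F16", ("F14", "F2")),
    Op(7, "int", "store", None, ("R1", "F8")), Op(7, "fp", "fpadd", "F20", ("F18", "F2")),
    Op(8, "int", "store", None, ("R1", "F12")),
    Op(9, "int", "alu", "R1", ("R1",)),
    Op(10, "int", "store", None, ("R1", "F16")),
    Op(11, "int", "branch", None, ("R1",)),
    Op(12, "int", "store", None, ("R1", "F20")),  # branch delay slot
]


def violations(schedule: List[Op]) -> List[str]:
    """Report issue-slot conflicts and RAW latencies the schedule breaks."""
    problems = []
    for (cycle, slot), n in Counter((op.cycle, op.slot) for op in schedule).items():
        if n > 1:
            problems.append(f"cycle {cycle}: {n} ops in the {slot} slot")
    for i, consumer in enumerate(schedule):
        for reg in consumer.srcs:
            # Most recent earlier writer of this register, if any.
            producer = next((p for p in reversed(schedule[:i]) if p.dest == reg), None)
            if producer is None:
                continue  # value was produced before this loop body
            need = MIN_DISTANCE.get((producer.kind, consumer.kind), 1)
            if consumer.cycle - producer.cycle < need:
                problems.append(f"cycle {consumer.cycle}: {reg} ready too late")
    return problems


if __name__ == "__main__":
    print(violations(SCHEDULE))                   # []
    print(max(op.cycle for op in SCHEDULE) / 5)   # 2.4 cycles per element
    too_early = list(SCHEDULE)
    too_early[3] = replace(too_early[3], cycle=2)  # move ADDD F4 one cycle earlier
    print(violations(too_early))                  # ['cycle 2: F0 ready too late']
```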
Dynamic Scheduling in the Superscalar DLX

Scoreboarding or Tomasulo's algorithm can be applied

Should preserve in-order issue!
Use separate data structures for Int and FP

Even when the two instructions of a pair are dependent, we wish to issue both in the same cycle
Two approaches:

Pipeline the issue stage, so that it runs twice as fast (each instruction of the pair is handled in half a cycle)

Exclude load/store buffers from the set of RSs
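The first approach can be sketched with a Tomasulo-style register status table: the two instructions of a pair are processed in program order within one cycle, so the second one already sees the reservation-station tag assigned to the first. This is a minimal sketch; the tag names (`RS0`, `RS1`, ...) are hypothetical, a single table is used instead of separate Int and FP structures, and execution, the common data bus, and full-station stalls are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Operand = Tuple[str, str]  # ("reg", name): read now; ("tag", rs): wait for RS
PendingInstr = Tuple[str, Optional[str], Tuple[str, ...]]  # (op, dest, srcs)


@dataclass
class RSEntry:
    tag: str
    op: str
    operands: List[Operand]


class IssueStage:
    """Issue logic with register renaming only."""

    def __init__(self) -> None:
        self.reg_status: Dict[str, str] = {}  # register -> tag of pending writer
        self.stations: List[RSEntry] = []

    def issue_one(self, op: str, dest: Optional[str],
                  srcs: Tuple[str, ...]) -> RSEntry:
        # Read sources BEFORE renaming dest, so e.g. ADDD F0, F0, F2 works.
        operands: List[Operand] = [
            ("tag", self.reg_status[r]) if r in self.reg_status else ("reg", r)
            for r in srcs
        ]
        entry = RSEntry(f"RS{len(self.stations)}", op, operands)
        self.stations.append(entry)
        if dest is not None:
            self.reg_status[dest] = entry.tag
        return entry

    def issue_pair(self, first: PendingInstr,
                   second: PendingInstr) -> Tuple[RSEntry, RSEntry]:
        # Issue stage run "twice as fast": the first instruction updates
        # reg_status in the first half-cycle, so the second sees it.
        return self.issue_one(*first), self.issue_one(*second)


if __name__ == "__main__":
    stage = IssueStage()
    ld, add = stage.issue_pair(("LD", "F0", ("R1",)), ("ADDD", "F4", ("F0", "F2")))
    print(add.operands)      # [('tag', 'RS0'), ('reg', 'F2')]
    print(stage.reg_status)  # {'F0': 'RS0', 'F4': 'RS1'}
```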
Multiple Issue using VLIW

Superscalar ==> too much hardware
For hazard detection and scheduling

Alternative: let the compiler do all the scheduling
VLIW (Very Long Instruction Word)
E.g., a VLIW instruction may include 2 integer operations, 2 FP operations, 2 memory references, and a branch
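A minimal sketch of how a compiler might fill such instructions, using the 2 int / 2 FP / 2 mem / 1 branch format above. It assumes (hypothetically) that dependence analysis has already assigned each operation an earliest legal cycle, and counts the empty slots, which in a simple uncompressed encoding become explicit no-ops and inflate code size.

```python
from typing import Dict, List, Tuple

# Slots per VLIW instruction, following the example format.
SLOTS: Dict[str, int] = {"int": 2, "fp": 2, "mem": 2, "branch": 1}

Bundle = Dict[str, List[str]]


def pack(ops: List[Tuple[str, str, int]]) -> List[Bundle]:
    """Greedily pack (name, unit, earliest_cycle) ops into VLIW bundles.

    Each op goes into the first bundle at or after its earliest cycle that
    still has a free slot for its unit.
    """
    bundles: List[Bundle] = []
    for name, unit, earliest in ops:
        cycle = earliest
        while True:
            while len(bundles) <= cycle:
                bundles.append({u: [] for u in SLOTS})
            if len(bundles[cycle][unit]) < SLOTS[unit]:
                bundles[cycle][unit].append(name)
                break
            cycle += 1  # slot type full in this bundle; try the next one
    return bundles


def nop_count(bundles: List[Bundle]) -> int:
    """Number of unused slots across all bundles."""
    return sum(SLOTS[u] - len(b[u]) for b in bundles for u in SLOTS)


if __name__ == "__main__":
    ops = [("LD F0", "mem", 0), ("LD F6", "mem", 0), ("LD F10", "mem", 0),
           ("ADDD F4", "fp", 2)]
    bundles = pack(ops)
    print(len(bundles), nop_count(bundles))  # 3 17
```

Four useful operations occupy 3 bundles of 7 slots each, so 17 of 21 slots are empty: this is the code-size cost of VLIW when the compiler cannot find enough ILP.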
Limitations to Multiple Issue

Why not 10 issues per cycle? Why not 20?

Three limitations:
Inherent ILP limitations in programs
Hardware costs (even for VLIW)

Memory/register bandwidth
Implementation issues:

Superscalar: complexity of hardware logic

VLIW: increased code size, binary compatibility problems
Support for ILP

Software (compiler) support

Hardware support

Combination of both
Compiler Support for ILP

Loop unrolling:
Dependence analysis is a major component
Analysis is simple when array indices are affine functions of the loop variable i, i.e. of the form a × i + b (called affine indices)

Limitations to dependence analysis:
Pointers
Indirect indexing
Analysis has to consider corner cases too
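For affine indices, one standard dependence test (the GCD test, not named on the slide) can be sketched as follows. It answers only "definitely independent" or "may depend", because it ignores the loop bounds; this conservatism is one of the corner cases the analysis must live with.

```python
from math import gcd


def may_depend(a: int, b: int, c: int, d: int) -> bool:
    """GCD test for a write to X[a*i + b] and a read of X[c*i + d].

    A dependence needs iterations j, k with a*j + b == c*k + d, which has
    an integer solution only if gcd(a, c) divides d - b.
    False means "definitely independent"; True means "may depend".
    """
    g = gcd(a, c)
    if g == 0:          # a == c == 0: both indices are constants
        return b == d
    return (d - b) % g == 0


if __name__ == "__main__":
    # for (...) X[2*i + 3] = X[2*i] * 5.0;
    print(may_depend(2, 3, 2, 0))   # False: never the same element
    # for (...) X[i] = X[i - 1] + 1;
    print(may_depend(1, 0, 1, -1))  # True
```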
Compiler Support for ILP (continued)

Two important techniques:
Software pipelining
Trace scheduling

Software pipelining: reorganize a loop such
that each iteration is made from instructions
chosen from different iterations of the original
loop
Software Pipelining
[Figure: iterations 0 to 4 of the original loop overlapped in time; one software-pipelined iteration takes instructions from several of these original iterations]
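Software Pipelining in Our Example
Original loop:
Loop: LD   F0, 0(R1)    // F0 is array element
      ADDD F4, F0, F2   // F2 has the scalar 'C'
      SD   0(R1), F4    // Stored result
      SUBI R1, R1, 8    // For next iteration
      BNEZ R1, Loop     // More iterations?

Three consecutive iterations of the loop body:
Iter i:   LD   F0, 0(R1)
          ADDD F4, F0, F2
          SD   0(R1), F4
Iter i+1: LD   F0, 0(R1)
          ADDD F4, F0, F2
          SD   0(R1), F4
Iter i+2: LD   F0, 0(R1)
          ADDD F4, F0, F2
          SD   0(R1), F4

Software-pipelined loop (SD from iteration i, ADDD from iteration i+1, LD from iteration i+2):
Loop: SD   16(R1), F4   // Store for iteration i (two elements above R1)
      ADDD F4, F0, F2   // Add for iteration i+1
      LD   F0, 0(R1)    // Load for iteration i+2
      SUBI R1, R1, 8
      BNEZ R1, Loop
Start-up and wind-down code (not shown) is needed before and after the loop.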
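The overlap can be tabulated with a short sketch: treating LD, ADDD, and SD as three pipeline stages, iteration k runs stage s at step k + s. The stage names are taken from the example; the step-based model itself is a simplification that ignores latencies.

```python
from typing import List, Sequence, Tuple


def software_pipeline(n_iters: int,
                      stages: Sequence[str] = ("LD", "ADDD", "SD")
                      ) -> List[List[Tuple[str, int]]]:
    """Return, for each step, the (stage, iteration) pairs executed.

    Iteration k runs stage s at step k + s, so a steady-state step executes
    SD of iteration i, ADDD of i+1 and LD of i+2 together. With
    n_iters >= len(stages), the first len(stages)-1 steps are the prologue
    (start-up) and the last len(stages)-1 steps are the epilogue (wind-down).
    """
    depth = len(stages)
    steps = []
    for t in range(n_iters + depth - 1):
        step = []
        for s in reversed(range(depth)):  # oldest iteration first, as in the kernel
            k = t - s
            if 0 <= k < n_iters:
                step.append((stages[s], k))
        steps.append(step)
    return steps


if __name__ == "__main__":
    for t, step in enumerate(software_pipeline(4)):
        print(t, step)
```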
Trace Scheduling

Compiler picks a program trace which it considers most likely
Schedule instructions from the trace, including the branches into and out of the trace
Also need bookkeeping instructions in case the trace is not taken during execution

Example code fragment (from the slide's flow chart):
A[i] = A[i] + B[i]
if (A[i] == 0)
    B[i] = ...      // T path
else
    X = ...         // F path
C[i] = ...
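Trace selection is often driven by profile data. The sketch below assumes a simple greedy rule (follow the most frequently executed edge) and then lists the side exits and side entrances where bookkeeping code may be needed. The block names and profile counts for the slide's A[i]/B[i]/X/C[i] fragment are hypothetical.

```python
from typing import Dict, List, Tuple

CFG = Dict[str, List[Tuple[str, int]]]  # block -> [(successor, profile count)]


def select_trace(succ: CFG, start: str) -> List[str]:
    """Greedily follow the most frequently executed edge from start."""
    trace, seen, block = [start], {start}, start
    while succ.get(block):
        nxt, _ = max(succ[block], key=lambda edge: edge[1])
        if nxt in seen:  # stop at a back edge
            break
        trace.append(nxt)
        seen.add(nxt)
        block = nxt
    return trace


def off_trace_edges(succ: CFG, trace: List[str]):
    """Side exits and side entrances: where bookkeeping code may be needed."""
    pos = {b: i for i, b in enumerate(trace)}
    exits, entries = [], []
    for b, edges in succ.items():
        for s, _ in edges:
            if b in pos and s in pos and pos[s] == pos[b] + 1:
                continue  # edge along the trace itself
            if b in pos:
                exits.append((b, s))
            elif s in pos:
                entries.append((b, s))
    return exits, entries


if __name__ == "__main__":
    cfg: CFG = {
        "entry": [("then", 90), ("else", 10)],  # A[i] = A[i] + B[i]; A[i] == 0?
        "then": [("join", 90)],                 # B[i] = ...
        "else": [("join", 10)],                 # X = ...
        "join": [],                             # C[i] = ...
    }
    trace = select_trace(cfg, "entry")
    print(trace)                        # ['entry', 'then', 'join']
    print(off_trace_edges(cfg, trace))  # ([('entry', 'else')], [('else', 'join')])
```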
