
Instruction Level Parallelism

Pipeline Architecture

Indian Institute of Technology Tirupati


Jaynarayan T Tudu
[[email protected]]

Computer System Architecture (CS5202)


19th March, 2020
Pipeline CPU
- Pipeline architecture and design

- Performance measurement

Recall (Quantitative Principles):

- Take advantage of parallelism (processor design)
- Principle of locality (memory system design)
- Make the common case fast (used in all aspects, e.g., loops)
- Amdahl's Law (IPC improvement)
CPU Design
Objective: to execute the ISA. CPU design begins from the ISA.

ISA            Registers   ALU type            Year
IBM 701        1           Accumulator         1953
CDC 6600       8           Load-store          1963
IBM 360        18          Reg-Mem             1964
DEC PDP-8      1           Accumulator         1965
DEC PDP-11     8           Reg-Mem             1970
Intel 8008     1           Accumulator         1974
DEC VAX        16          Reg-Mem, Mem-Mem    1977
Motorola       16          Reg-Mem             1980
Intel 80386    8           Reg-Mem             1985
ARM            16          Load-store          1985
MIPS           32          Load-store          1985
HP PA-RISC     32          Load-store          1986
SPARC          32          Load-store          1987
PowerPC        32          Load-store          1992
IA-64          128         Load-store          2001
AMD64          16          Reg-Mem             2003
x86-64         16          Reg-Mem             2003
RISC-V         32          Load-store          2010
CPU Design
Evolution of CPU Design:

Example: ADD R1 R2 R3 | R1 ← R2 + R3
We will design a processor that executes the above ADD instruction!

[Figure: a minimal datapath, with the PC pointing at the instruction ADD R1 R2 R3, a register file (R1, R2, R3), an adder, and control logic]

Single-cycle CPU design: all the micro-operations of a given instruction need to be carried out in just one cycle.
CPU Design
Single-cycle CPU design. Imagine a processor for a large ISA!

Instructions              Actual time
-----------------------------------------------
ADD  R10, R0, R0           2 ns
ADD  R11, R8, R8           2 ns
ADDI R12, R11, 80          2 ns
Loop:
LW   R13, 0(R11)           4 ns
ADD  R10, R10, R13         2 ns
ADDI R11, R11, 4           2 ns
BLT  R11, R12, Loop        2 ns

Total time to execute the program
  = instruction count x cycle time
  = 7 x 4 ns = 28 ns

where the cycle time is determined by the maximum time of any instruction.

[Figure: single-cycle timing diagram, the same instruction sequence drawn against a fixed 4 ns clock, so every instruction occupies a full 4 ns cycle]
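As a side illustration (not part of the original slides), a minimal Python sketch of the calculation above; the instruction latencies are the ones listed on this slide, and a single pass through the loop body is assumed:

    # Single-cycle CPU: every instruction is stretched to the slowest instruction's latency.
    # Latencies (ns) taken from the slide; one pass through the loop body is assumed.
    latencies_ns = [2, 2, 2, 4, 2, 2, 2]   # ADD, ADD, ADDI, LW, ADD, ADDI, BLT

    cycle_time_ns = max(latencies_ns)             # 4 ns, set by the slowest instruction (LW)
    total_ns = len(latencies_ns) * cycle_time_ns  # instruction count x cycle time

    print(f"cycle time = {cycle_time_ns} ns, total = {total_ns} ns")  # 4 ns, 28 ns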
CPU Design
Single cycle CPU design:
Important points to note with respect to the single-cycle design:

1) Each instruction takes only one cycle to complete execution.

2) Every instruction has to go through five important phases:
   - Fetch the instruction from memory.
   - Decode the instruction to identify the operands and generate control signals.
   - Fetch the necessary operands from registers, immediates, or data memory, according to the addressing mode.
   - Perform the necessary operation (ALU, load, store, branch, etc.).
   - Write the results back to a register or to data memory.

3) All these micro-operations are to be performed in just one cycle.

4) Since the single-cycle processor uses only one clock of fixed period, all instructions require the same cycle time.

5) One thing to observe: a processor has two different sets of paths:
   - the control path
   - the data path
   (in the later designs these paths will be separated systematically to create the multi-cycle and pipelined designs)
CPU Design
Multi-cycle CPU design:

The first question we ask is: what is the problem with the single-cycle design?

We need to think in terms of performance gain and loss!
We also need to think in terms of hardware area overhead!
We also need to think in terms of power consumption!

Instructions              Actual time
-----------------------------------------------
ADD  R10, R0, R0           2 ns
ADD  R11, R8, R8           2 ns
ADDI R12, R11, 80          2 ns
Loop:
LW   R13, 0(R11)           4 ns
ADD  R10, R10, R13         2 ns
ADDI R11, R11, 4           2 ns
BLT  R11, R12, Loop        2 ns

- What if we design a processor with a clock cycle time of 2 ns?
- LW would take two cycles to complete.
- All other instructions would take one cycle.
CPU Design
Multi-cycle CPU design:

For the single-cycle CPU: one fixed 4 ns clock cycle per instruction.

For the multi-cycle CPU: a 2 ns clock cycle.

Instructions              Actual time   Clock cycles
-----------------------------------------------------
ADD  R10, R0, R0           2 ns          CC1
ADD  R11, R8, R8           2 ns          CC2
ADDI R12, R11, 80          2 ns          CC3
Loop:
LW   R13, 0(R11)           4 ns          CC4, CC5
ADD  R10, R10, R13         2 ns
ADDI R11, R11, 4           2 ns
BLT  R11, R12, Loop        2 ns
CPU Design
Multi-cycle CPU design:

The idea is to partition the data path and the buses.

The path is broken by introducing a temporary register (Temp Reg).

[Figure: the datapath from before, with PC, register file (R0 to R15), adder, and control, and a Temp Reg inserted to break the path; instructions shown: ADD R1 R2 R3 and LW R13, 0(R11)]
CPU Design
Multi-cycle CPU design: Performance analysis

Instructions              Actual time
-----------------------------------------------
ADD  R10, R0, R0           2 ns
ADD  R11, R8, R8           2 ns
ADDI R12, R11, 80          2 ns
Loop:
LW   R13, 0(R11)           4 ns
ADD  R10, R10, R13         2 ns
ADDI R11, R11, 4           2 ns
BLT  R11, R12, Loop        2 ns

Total execution time = 2 + 2 + 2 + 4 + 2 + 2 + 2 = 16 ns

Speed-up = Performance of new / Performance of old

         = Time of old (single-cycle) / Time of new (multi-cycle)

         = 28 ns / 16 ns = 1.75x

How did we get this performance?

At the cost of additional registers and nets.
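A small Python sketch (illustration only, not from the lecture) reproducing the comparison on this slide; the 2 ns multi-cycle clock and the per-instruction latencies are the ones given in the slides:

    latencies_ns = [2, 2, 2, 4, 2, 2, 2]        # per-instruction latencies from the slide

    # Single-cycle: every instruction pays the worst-case 4 ns.
    single_cycle_ns = len(latencies_ns) * max(latencies_ns)                       # 28 ns

    # Multi-cycle with a 2 ns clock: each instruction takes ceil(latency / 2) cycles.
    clock_ns = 2
    multi_cycle_ns = sum(-(-lat // clock_ns) * clock_ns for lat in latencies_ns)  # 16 ns

    speedup = single_cycle_ns / multi_cycle_ns
    print(single_cycle_ns, multi_cycle_ns, round(speedup, 2))                     # 28 16 1.75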
CPU Design
Multi-cycle CPU design:

Important points to note:

1 – The idea is to partition the data path in such a way that the cycle time is as small as possible (the smallest execution time of any instruction).

2 – The design requires more hardware resources than the corresponding single-cycle design.

3 – Each instruction requires multiple cycles to complete its execution.

4 – The multi-cycle design supports a more diverse set of instructions efficiently, whereas in the single-cycle design a diverse instruction set leads to performance loss.

5 – If the instruction set is uniform in terms of execution time, it may seem wise to implement a single-cycle design. However, we will see next that this argument is not always true.

6 – Multi-cycle is the beginning of the pipeline architecture.

7 – Partitioning the design is a challenging problem, since it requires that each segment of the data path be uniform in terms of path length (propagation delay).
CPU Design
Single cycle CPU design

Multi-cycle CPU design

How can we do better than the multi-cycle?


Pipeline Architecture
The basic idea is: Parallelism

This is one of the architectural principles.

There are many real-life examples of pipelining, for instance what happens when you do a Google search.
Pipeline Architecture
The basic idea is: Parallelism

Instructions              Actual time
-----------------------------------------------
ADD  R10, R0, R0           2 ns
ADD  R11, R8, R8           2 ns
ADDI R12, R11, 80          2 ns
Loop:
LW   R13, 0(R11)           4 ns
ADD  R10, R10, R13         2 ns
ADDI R11, R11, 4           2 ns
BLT  R11, R12, Loop        2 ns

From start to end, an instruction has to travel through:
1 – Fetch from the instruction memory
2 – Decode to generate control signals and read the operands
3 – Execute (ALU operations)
4 – Memory operations, if data is to be read from or written into memory
5 – Write back the results from stage 3

These micro-operations are necessary.

How do you execute these instructions in parallel?
Pipeline Architecture
The basic idea is: Parallelism
How do you execute these instructions in parallel?

Examples of very bad designs:

1 – You can have multiple single-cycle processors operating in parallel
2 – You can have multiple multi-cycle designs operating in parallel
3 – You can have a multi-clock design with multiple single-cycle processors

The proven idea of the pipeline architecture:

Parallelism can be achieved by executing the micro-operations of more than one instruction in parallel.
This is one kind of instruction-level parallelism (ILP).
Pipeline Architecture
The pipeline execution flow:
Stage delay →   IF: 0.5 ns   ID: 0.5 ns   EX: 0.5 ns   MEM: 0.5 ns   WB: 0.5 ns

        IF       ID       EX       MEM      WB
CC 1    INS 1
CC 2    INS 2    INS 1
CC 3    INS 3    INS 2    INS 1
CC 4    INS 4    INS 3    INS 2    INS 1
CC 5    INS 5    INS 4    INS 3    INS 2    INS 1
CC 6             INS 5    INS 4    INS 3    INS 2
CC 7                      INS 5    INS 4    INS 3
CC 8                               INS 5    INS 4
CC 9                                        INS 5

CC – clock cycle, INS 1 – instruction 1;
IF – instruction fetch, ID – instruction decode, EX – instruction execution,
MEM – memory operation (load/store), WB – write back to register
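As an aside (not from the lecture), a short Python sketch that regenerates the occupancy table above for an ideal five-stage pipeline; the stage names and the instruction count follow the slide:

    # Print which instruction sits in which stage at each clock cycle of an ideal pipeline.
    stages = ["IF", "ID", "EX", "MEM", "WB"]
    n_instructions = 5
    k = len(stages)

    print("      " + "".join(f"{s:>8}" for s in stages))
    for cc in range(1, n_instructions + k):            # total cycles = n + k - 1 = 9
        row = []
        for stage_idx in range(k):
            ins = cc - stage_idx                       # instruction occupying this stage
            row.append(f"INS {ins}" if 1 <= ins <= n_instructions else "")
        print(f"CC {cc}  " + "".join(f"{cell:>8}" for cell in row))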
Pipeline Architecture
Performance measurement:
Five stages, each with a stage delay of 0.5 ns.

Assuming an ideal pipeline, what would be the performance improvement?

Ideal pipeline:
- No stall in any stage at any point of time
- All instructions get executed in free flow

Example: n instructions, k stages, and a stage delay of t ns

Total execution time = (k + (n – 1)) x t

The k term is due to the fact that the first instruction requires k cycles to complete;
the remaining n – 1 instructions complete in the following n – 1 cycles, one per cycle.
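A minimal sketch of this formula in Python (illustrative only; the instruction count n = 1000 is an arbitrary assumption, while k = 5 and t = 0.5 ns come from the slides):

    def ideal_pipeline_time_ns(n, k, t):
        """Ideal k-stage pipeline: the first instruction takes k cycles, then one finishes per cycle."""
        return (k + (n - 1)) * t

    n, k, t = 1000, 5, 0.5                       # t = stage delay in ns (from the slide)
    pipelined = ideal_pipeline_time_ns(n, k, t)  # (5 + 999) * 0.5 = 502 ns
    no_overlap = n * k * t                       # same k steps with no overlap: 2500 ns
    print(pipelined, no_overlap / pipelined)     # speed-up approaches k for large n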
Pipeline Architecture
Example Exercise:
Consider the unpipelined processor that has been discussed. Assume that it has
a 2 GHz clock (i.e., a 0.5 ns clock cycle) and that it uses 4 cycles for ALU
operations and branches and 5 cycles for memory operations. Assume that
the relative frequencies of these operations are 40%, 20% and 40%, respectively.
Suppose that, due to clock skew and setup, pipelining the processor adds 0.1 ns
of overhead to the clock. Ignoring all other latencies, how much speed-up
in the instruction execution rate will be gained from pipelining?

Solution: Find the average instruction execution time on the unpipelined
processor, and then on the pipelined processor. Then calculate the speed-up.

The clock cycle of the pipelined processor would be 0.6 ns (0.5 ns + 0.1 ns of skew and setup overhead).
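A sketch of the calculation outlined in the solution (the numbers are the ones given in the problem statement; the code itself is only an illustration):

    cycle_ns = 0.5                         # unpipelined 2 GHz clock
    mix = {"ALU": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}  # (frequency, cycles)

    # Average instruction time on the unpipelined processor.
    avg_unpipelined_ns = cycle_ns * sum(freq * cycles for freq, cycles in mix.values())  # 2.2 ns

    # Pipelined: ideally one instruction per cycle, but the clock stretches by 0.1 ns.
    avg_pipelined_ns = cycle_ns + 0.1      # 0.6 ns

    speedup = avg_unpipelined_ns / avg_pipelined_ns
    print(round(avg_unpipelined_ns, 2), round(speedup, 2))   # 2.2 ns, about 3.67x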
Pipeline Architecture
Exercise on Performance Comparison:

Exercise 1:

1) What is the performance improvement due to an ideal pipeline over the single-cycle and multi-cycle designs?

2) What is the difference between the multi-cycle design and the pipeline architecture?

3) Imagine an ISA where all the instructions take exactly the same time, say t ns. What would then be the performance comparison of the single-cycle, multi-cycle, and pipelined processors?
Pipeline Architecture
Pipeline Processor Design:

Recall the multi-cycle design:

- It partitioned the data path by introducing additional registers, multiplexers, and interconnects (nets or wires).

On that design, if the control path is also partitioned, you get a pipeline design.

Let us start from the micro-operations at each stage:

IF – instruction fetch:

IR ← Mem[PC]       where IR is the instruction register and PC is the program counter
NPC ← PC + 4       NPC is the next-program-counter temporary register
Pipeline Architecture
Pipeline Processor Design:
Let us start from the micro-operations at each stage:

IF – instruction fetch:

IR ← Mem[PC]       where IR is the instruction register and PC is the program counter
NPC ← PC + 4       NPC is the next-program-counter temporary register

ID – instruction decode:

A ← Regs[rs]       where rs and rt are the source registers;
B ← Regs[rt]       A and B are temporary registers
Imm ← sign-extended immediate field of IR

EX – execute the ALU operation or compute the effective address:

ALUout ← A + Imm            effective address, where Imm is the immediate value in IR
ALUout ← A func B           func is the function field of an ALU instruction
ALUout ← A op Imm           op is the operation field of an immediate instruction
ALUout ← NPC + (Imm << 2)   << is a left shift; branch-target (offset) addressing
Pipeline Architecture
Pipeline Processor Design:
Let us start from the micro-operations at each stage:

MEM – memory access and branch completion:

PC ← NPC                  or
PC ← ALUout               if the condition is satisfied for a branch instruction
LMD ← Mem[ALUout]         for a load instruction
Mem[ALUout] ← B           for a store instruction

WB – write back to registers:

Regs[rd] ← ALUout         for register-register ALU instructions
Regs[rt] ← ALUout         for register-immediate ALU instructions
Regs[rt] ← LMD            for a load instruction
Next, the processor is designed according to these operations!
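To make the register-transfer notation concrete, here is a toy Python sketch (purely illustrative; the dictionary-based register file and the hand-built instruction record are simplifying assumptions, not the actual design) that walks a single register-register ADD through the five stages:

    # Toy register-transfer walk-through of one R-type ADD (illustration only).
    regs = {f"R{i}": 0 for i in range(32)}
    regs["R2"], regs["R3"] = 7, 5
    instr = {"op": "ADD", "rd": "R1", "rs": "R2", "rt": "R3"}   # stands in for Mem[PC]
    pc = 0

    # IF: IR <- Mem[PC]; NPC <- PC + 4
    IR, NPC = instr, pc + 4
    # ID: A <- Regs[rs]; B <- Regs[rt]
    A, B = regs[IR["rs"]], regs[IR["rt"]]
    # EX: ALUout <- A func B
    ALUout = A + B
    # MEM: no memory access for an R-type instruction; PC <- NPC
    pc = NPC
    # WB: Regs[rd] <- ALUout
    regs[IR["rd"]] = ALUout

    print(regs["R1"], pc)   # 12 4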


Pipeline Architecture
Multi-cycle processor data path:
Pipeline Architecture
Pipeline view of the data path:

Note: The figure above does not show the control path; the control signals would go to the
select lines of the muxes and to the ALU control. These signals are also pipelined.
Pipeline Architecture
Exercise:

1 – Complete the pipeline design of the MIPS architecture with control signals.

2 – Complete the pipeline design of the RISC-V architecture with control signals.

3 – Find the hardware overhead in comparison to the single-cycle and multi-cycle designs, for the MIPS architecture and the RISC-V architecture.

4 – Calculate the exact stage delay for each stage.
Assume that a mux takes 0.2 ns, a register read or write takes 0.5 ns, the ALU or
any other arithmetic operation takes 0.4 ns, and reading/writing the
data or instruction memory takes 1 ns.
References:
1 – Chapter 4, Patterson and Hennessy, Computer Organization and Design, RISC-V edition or MIPS edition.
2 – Appendix C, Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 5th or 6th edition.
Next Lecture
Pipelining continued: hazards
Long-latency pipelines
