Chapter 12 Performance of Single-Cycle and Multi-Cycle Data Path

The document discusses the performance and limitations of a single-cycle implementation of a MIPS-based instruction set, highlighting that while it is straightforward, it is inefficient as all instructions must complete within the time of the slowest instruction. It suggests that a multicycle approach can improve performance by allowing faster instructions to execute without being delayed by slower ones, thereby reducing the overall clock cycle time and hardware duplication. The document also outlines the proposed execution stages and benefits of a multicycle design, including cost savings and enhanced efficiency.

Uploaded by

hung.lq05

Performance of Single-Cycle implementation

 We just saw a single-cycle datapath and control unit for our simple MIPS-
based instruction set.
 Single-cycle implementation: An implementation in which every
instruction is executed in one clock cycle. While easy to understand, it is
too slow to be practical.

The single-cycle design again…

[Figure: the single-cycle datapath. A control unit (not shown) generates all
the control signals from the instruction’s “op” and “func” fields.]
The example add from last time
 Consider the instruction add $s4, $t1, $t2.

op      rs     rt     rd     shamt  func
000000  01001  01010  10100  00000  100000

 Assume $t1 and $t2 initially contain 1 and 2 respectively.

 Executing this instruction involves several steps.
1. The instruction word is read from the instruction memory, and the
program counter is incremented by 4.
2. The sources $t1 and $t2 are read from the register file.
3. The values 1 and 2 are added by the ALU.
4. The result (3) is stored back into $s4 in the register file.
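The field split and the four steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name decode_r_type and the dictionary standing in for the register file are made up here, and the register numbers 9, 10 and 20 for $t1, $t2 and $s4 follow the usual MIPS numbering.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "func":  word & 0x3F,          # bits 5-0
    }

# Step 1: the instruction word for add $s4, $t1, $t2, as given on the slide.
word = int("000000" "01001" "01010" "10100" "00000" "100000", 2)
fields = decode_r_type(word)

# Hypothetical register file: $t1 (register 9) = 1 and $t2 (register 10) = 2.
registers = {9: 1, 10: 2}

# Steps 2-4: read both sources, add them, write the sum to rd.
registers[fields["rd"]] = registers[fields["rs"]] + registers[fields["rt"]]

print(fields)
print(registers[20])  # the value now held in $s4
```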

How the add goes through the datapath
[Figure: the single-cycle datapath annotated for add $s4, $t1, $t2. The
register file receives read register 1 = 01001, read register 2 = 01010 and
write register = 10100; it outputs 00...01 and 00...10, the ALU result
00...11 is written back, and the PC is updated to PC+4.]
The slowest instruction...
 If all instructions must complete within one clock cycle, then the cycle
time has to be large enough to accommodate the slowest instruction.
 For example, lw $t0, -4($sp) needs 8ns, assuming the delays shown here.

reading the instruction memory      2ns
reading the base register $sp       1ns
computing memory address $sp-4      2ns
reading the data memory             2ns
storing data back to $t0            1ns
total                               8ns

[Figure: the single-cycle datapath annotated with the component delays used
above (2 ns, 1 ns and 0 ns labels).]
...determines the clock cycle time
 If we make the cycle time 8ns, then every instruction will take 8ns, even
if it doesn’t need that much time. An 8ns cycle corresponds to a clock rate
of 125 MHz.
 For example, the instruction add $s4, $t1, $t2 really needs just 6ns.

reading the instruction memory      2ns
reading registers $t1 and $t2       1ns
computing $t1 + $t2                 2ns
storing the result into $s4         1ns
total                               6ns

[Figure: the same delay-annotated single-cycle datapath.]
How bad is this?
 With these same component delays, a sw instruction would need 7ns, and
beq would need just 5ns.
 Let’s consider the gcc program.

Instruction Frequency
Arithmetic 48%
Loads 22%
Stores 11%
Branches 19%

 With a single-cycle datapath, each instruction would require 8ns.

 But if we could execute instructions as fast as possible, the average time
per instruction for gcc would be:

(48% x 6ns) + (22% x 8ns) + (11% x 7ns) + (19% x 5ns) = 6.36ns

 The single-cycle datapath is about 1.26 times slower!
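The arithmetic behind these numbers can be checked with a small script. It is a sketch under stated assumptions: the component delays and the gcc mix come from the slides, while the list of components each instruction class passes through is an assumption chosen to match the quoted 6ns, 8ns, 7ns and 5ns figures. The names DELAY_NS, PATH and MIX_PERCENT are made up for this example.

```python
# Component delays in ns, taken from the slides.
DELAY_NS = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Assumed sequence of components used by each instruction class.
PATH = {
    "arithmetic": ["imem", "reg_read", "alu", "reg_write"],
    "load":       ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":      ["imem", "reg_read", "alu", "dmem"],
    "branch":     ["imem", "reg_read", "alu"],
}

# gcc instruction mix in percent, from the table above.
MIX_PERCENT = {"arithmetic": 48, "load": 22, "store": 11, "branch": 19}

# Time each class really needs: the sum of the delays along its path.
latency = {kind: sum(DELAY_NS[c] for c in path) for kind, path in PATH.items()}

# The single-cycle clock must fit the slowest instruction.
cycle_time = max(latency.values())

# Average time per instruction if each could finish as soon as it is done.
ideal_average = sum(MIX_PERCENT[k] * latency[k] for k in PATH) / 100

slowdown = cycle_time / ideal_average
print(latency, cycle_time, ideal_average, round(slowdown, 2))
```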

Disadvantage of single-cycle implementation
 The clock cycle is determined by the longest possible path, which does not
belong to the most common instruction. This type of implementation violates
the idea of making the common case fast.
 It may be wasteful with respect to area, since some functional units, such
as adders, must be duplicated because they cannot be shared during a single
clock cycle.
 This is also why we used a Harvard architecture with two memories; you
can’t easily read two addresses from the same memory in one cycle.

 Example:
— We’ve made very optimistic assumptions about memory latency:
• A main memory access on a modern machine takes >50ns.
For comparison, an ALU on the Pentium 4 takes ~0.3ns.
— Our worst-case cycle (loads/stores) includes 2 memory accesses:
• A modern single-cycle implementation would be stuck at <10MHz.
• Caches will improve common-case access time, not worst-case.
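The 10MHz ceiling follows from simple arithmetic. A minimal sketch, assuming exactly 50ns per memory access (the lower bound quoted above) and ignoring every other delay in the datapath, so the result is an upper bound on the clock rate:

```python
MEMORY_ACCESS_NS = 50      # assumed: the ">50ns" lower bound from the slide
ACCESSES_PER_CYCLE = 2     # instruction fetch plus one data access

# The cycle must be at least as long as both accesses back to back.
min_cycle_ns = MEMORY_ACCESS_NS * ACCESSES_PER_CYCLE

# A period of T ns corresponds to 1000 / T MHz.
max_clock_mhz = 1000 / min_cycle_ns

print(min_cycle_ns, max_clock_mhz)
```

Any realistic access time above 50ns, or any register file and ALU delay, only lowers this bound further.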

A multistage approach to instruction execution
 A multicycle implementation fixes some shortcomings in the single-cycle
implementation.
— Faster instructions are not held back by slower ones.
— The clock cycle time can be decreased.
— We don’t have to duplicate any hardware units.
 A multicycle processor requires a somewhat simpler datapath which we’ll
see today, but a more complex control unit that we’ll see later.

 We’ve informally described instructions as executing in several steps.

1. Instruction fetch and PC increment.

2. Reading sources from the register file.
3. Performing an ALU computation.
4. Reading or writing (data) memory.
5. Storing data back to the register file.

 What if we made these stages explicit in the hardware design?

Performance benefits
 Each instruction can execute only the stages that are necessary.
— Arithmetic: 1, 2, 3, 5
— Load: 1, 2, 3, 4, 5
— Store: 1, 2, 3, 4
— Branches: 1, 2, 3

 This would mean that instructions complete as soon as possible, instead
of being limited by the slowest instruction.

Proposed execution stages

1. Instruction fetch and PC increment
2. Reading sources from the register file
3. Performing an ALU computation
4. Reading or writing (data) memory
5. Storing data back to the register file
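The stage lists above translate directly into a cycle count per instruction class. A small sketch, assuming each stage occupies exactly one clock cycle; the names STAGE_NAMES, STAGES_USED and cycles_needed are made up for this example.

```python
STAGE_NAMES = {
    1: "Instruction fetch and PC increment",
    2: "Reading sources from the register file",
    3: "Performing an ALU computation",
    4: "Reading or writing (data) memory",
    5: "Storing data back to the register file",
}

# Stages used by each instruction class, as listed on the slide.
STAGES_USED = {
    "arithmetic": [1, 2, 3, 5],
    "load":       [1, 2, 3, 4, 5],
    "store":      [1, 2, 3, 4],
    "branch":     [1, 2, 3],
}

# With one cycle per stage, the cycle count is just the number of stages used.
cycles_needed = {kind: len(stages) for kind, stages in STAGES_USED.items()}

for kind, stages in STAGES_USED.items():
    skipped = sorted(set(STAGE_NAMES) - set(stages))
    print(f"{kind}: {cycles_needed[kind]} cycles, skips stages {skipped}")
```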

The clock cycle
 Things are simpler if we assume that each “stage” takes one clock cycle.
— This means instructions will require multiple clock cycles to execute.
— But since a single stage is fairly simple, the cycle time can be low.
 For the proposed execution stages and the sample datapath delays shown
earlier, each stage needs 2ns at most.
— This accounts for the slowest devices, the ALU and data memory.
— A 2ns clock cycle time corresponds to a 500MHz clock rate!

Cost benefits
 As an added bonus, we can eliminate some of the extra hardware from
the single-cycle datapath.
— We will restrict ourselves to using each functional unit once per cycle,
just like before.
— But since instructions require multiple cycles, we could reuse some
units in a different cycle during the execution of a single instruction.
 For example, we could use the same ALU:
— to increment the PC (first clock cycle), and
— for arithmetic operations (third clock cycle).

Two extra adders

 Our original single-cycle datapath had an ALU and two adders.

 The arithmetic-logic unit had two responsibilities.
— Doing an operation on two registers for arithmetic instructions.
— Adding a register to a sign-extended constant, to compute effective
addresses for lw and sw instructions.
 One of the extra adders incremented the PC by computing PC + 4.
 The other adder computed branch targets, by adding a sign-extended,
shifted offset to (PC + 4).

The extra single-cycle adders

[Figure: the single-cycle datapath with its two extra adders highlighted.
One computes PC + 4; the other adds the sign-extended, shifted offset to
(PC + 4) for branch targets.]
Our new adder setup
 We can eliminate both extra adders in a multicycle datapath, and instead
use just one ALU, with multiplexers to select the proper inputs.
 A 2-to-1 mux ALUSrcA sets the first ALU input to be the PC or a register.
 A 4-to-1 mux ALUSrcB selects the second ALU input from among:
— the register file (for arithmetic operations),
— a constant 4 (to increment the PC),
— a sign-extended constant (for effective addresses), and
— a sign-extended and shifted constant (for branch targets).
 This permits a single ALU to perform all of the necessary functions.
— Arithmetic operations on two register operands.
— Incrementing the PC.
— Computing effective addresses for lw and sw.
— Adding a sign-extended, shifted offset to (PC + 4) for branches.
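The two input muxes can be modelled as a pair of selections. This is a sketch, not the slides' definition: the select encodings (ALUSrcA 0 = PC, 1 = register; ALUSrcB 0 = register, 1 = constant 4, 2 = sign-extended constant, 3 = sign-extended and shifted constant) are an assumption following the common textbook convention, and the function names are made up here.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def alu_inputs(alu_src_a, alu_src_b, pc, reg_a, reg_b, imm16):
    """Return the two ALU operands chosen by the (assumed) mux encodings."""
    first = pc if alu_src_a == 0 else reg_a            # 2-to-1 mux ALUSrcA
    second = [
        reg_b,                      # 0: register file (arithmetic)
        4,                          # 1: constant 4 (PC increment)
        sign_extend16(imm16),       # 2: sign-extended constant (lw/sw address)
        sign_extend16(imm16) << 2,  # 3: sign-extended, shifted (branch target)
    ][alu_src_b]                                       # 4-to-1 mux ALUSrcB
    return first, second

# One ALU, four jobs (all operand values are illustrative).
pc_plus_4 = sum(alu_inputs(0, 1, pc=100, reg_a=0, reg_b=0, imm16=0))
arith     = sum(alu_inputs(1, 0, pc=0, reg_a=1, reg_b=2, imm16=0))
address   = sum(alu_inputs(1, 2, pc=0, reg_a=1000, reg_b=0, imm16=0xFFFC))
target    = sum(alu_inputs(0, 3, pc=104, reg_a=0, reg_b=0, imm16=3))
print(pc_plus_4, arith, address, target)
```

In the branch case the pc argument already holds PC + 4, matching the last bullet above.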

The multicycle adder setup highlighted
[Figure: the multicycle datapath with the single ALU and its input muxes
ALUSrcA and ALUSrcB highlighted. Control signals shown: PCWrite, IorD,
MemRead, MemWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemToReg.]
Eliminating a memory
 Similarly, we can get by with one unified memory, which will store both
program instructions and data. (a Princeton architecture)
 This memory is used in both the instruction fetch and data access stages,
and the address could come from either:
— the PC register (when we’re fetching an instruction), or
— the ALU output (for the effective address of a lw or sw).
 We add another 2-to-1 mux, IorD, to decide whether the memory is being
accessed for instructions or for data.
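A unified memory with an address mux can be sketched as follows. The encoding IorD = 0 for the PC and IorD = 1 for the ALU output is an assumption (the slides only say the mux chooses between the two), and the dictionary-based memory, the addresses and the stored value 42 are purely illustrative.

```python
def memory_address(i_or_d: int, pc: int, alu_out: int) -> int:
    """2-to-1 mux: assumed encoding 0 = PC (instruction), 1 = ALU output (data)."""
    return pc if i_or_d == 0 else alu_out

# Encoding of lw $t0, -4($sp): op = 0x23, rs = 29 ($sp), rt = 8 ($t0).
LW_WORD = (0x23 << 26) | (29 << 21) | (8 << 16) | 0xFFFC

# One memory holds both the program and its data.
memory = {
    0:   LW_WORD,  # instruction at address 0
    996: 42,       # data word at $sp - 4, with $sp = 1000
}

pc, alu_out = 0, 996

# Instruction fetch stage: the address comes from the PC.
fetched = memory[memory_address(0, pc, alu_out)]

# Data access stage: the address comes from the ALU output.
loaded = memory[memory_address(1, pc, alu_out)]

print(hex(fetched), loaded)
```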

The new memory setup highlighted
[Figure: the multicycle datapath with the unified memory and the IorD
address mux highlighted.]
Intermediate registers
 Sometimes we need the output of a functional unit in a later clock cycle
during the execution of one instruction.
— The instruction word fetched in stage 1 determines the destination of
the register write in stage 5.
— The ALU result for an address computation in stage 3 is needed as the
memory address for lw or sw in stage 4.
 These outputs will have to be stored in intermediate registers for future
use. Otherwise they would probably be lost by the next clock cycle.
— The instruction read in stage 1 is saved in the instruction register.
— Register file outputs from stage 2 are saved in registers A and B.
— The ALU output will be stored in a register ALUOut.
— Any data fetched from memory in stage 4 is kept in the Memory data
register, also called MDR.
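To see why each intermediate register is needed, here is lw $t0, -4($sp) traced stage by stage. It is a simplified sketch: the instruction is held as an already-decoded dictionary rather than a bit pattern, and the register contents, addresses and the loaded value 42 are made-up illustration values.

```python
# Illustrative machine state: $sp is register 29, $t0 is register 8.
regs = {29: 1000, 8: 0}
memory = {
    0:   {"op": "lw", "rs": 29, "rt": 8, "imm": -4},  # instruction at address 0
    996: 42,                                           # data word at $sp - 4
}
pc = 0

# Stage 1: fetch. The instruction is saved in IR; the PC is incremented.
IR = memory[pc]
pc = pc + 4

# Stage 2: register read. Outputs are saved in A and B (B is unused by lw).
A = regs[IR["rs"]]
B = regs[IR["rt"]]

# Stage 3: ALU computes the effective address, saved in ALUOut.
ALUOut = A + IR["imm"]

# Stage 4: memory read at the address held in ALUOut, saved in MDR.
MDR = memory[ALUOut]

# Stage 5: write back. IR, saved back in stage 1, still names the destination.
regs[IR["rt"]] = MDR

print(pc, ALUOut, regs[8])
```

Without IR, ALUOut and MDR, the values produced in stages 1, 3 and 4 would no longer be available when the later stages need them.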

The final multicycle datapath
[Figure: the final multicycle datapath, with the instruction register
(fields [31-26], [25-21], [20-16], [15-11], [15-0]), the memory data
register, and the A, B and ALUOut registers added. Control signals shown:
PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA,
ALUSrcB, ALUOp, PCSource, MemToReg.]
Register write control signals
 We have to add a few more control signals to the datapath.
 Since instructions now take a variable number of cycles to execute, we
cannot update the PC on each cycle.
— Instead, a PCWrite signal controls the loading of the PC.
— The instruction register also has a write signal, IRWrite. We need to
keep the instruction word for the duration of its execution, and must
explicitly re-load the instruction register when needed.
 The other intermediate registers, MDR, A, B and ALUOut, will store data
for only one clock cycle at most, and do not need write control signals.

 In a multicycle implementation the control unit must generate sequences
of control signals. This is considerably more complicated, so we skip it
here.
The single-cycle datapath; what is the cycle time?

[Figure: the single-cycle datapath annotated with its 2ns and 1ns component
delays. Clock rate = 125 MHz]

Performance of a multicycle implementation
 Let’s assume the following delays for the major functional units.

[Figure: the multicycle datapath annotated with delays of 2ns for the
memory, 1ns for the register file and 2ns for the ALU.]
Comparing cycle times
 The clock period has to be long enough to allow all of the required work
to complete within the cycle.
 In the single-cycle datapath, the “required work” was just the complete
execution of any instruction.
— The longest instruction, lw, requires 8ns.
— So the clock cycle time has to be 8ns, for a 125MHz clock rate.
 For the multicycle datapath, the “required work” is only a single stage.
— The longest delay is 2ns, for both the ALU and the memory.
— So our cycle time has to be 2ns, or a clock rate of 500MHz.
— The register file needs only 1ns, but it must wait an extra 1ns to stay
synchronized with the other functional units.
 The single-cycle cycle time is limited by the slowest instruction, whereas
the multicycle cycle time is limited by the slowest functional unit.
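The two rules for choosing the clock period can be put side by side. A minimal sketch using the functional-unit delays from the slides; the list of units that lw passes through is an assumption matching the 8ns total, and the names UNIT_DELAY_NS, LW_PATH and rate_mhz are made up here.

```python
# Functional-unit delays in ns, from the slides.
UNIT_DELAY_NS = {"memory": 2, "register file": 1, "ALU": 2}

# Assumed sequence of units used by lw, the slowest instruction.
LW_PATH = ["memory", "register file", "ALU", "memory", "register file"]

def rate_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a clock rate in MHz."""
    return 1000 / period_ns

# Single-cycle: the period must cover the whole slowest instruction.
single_cycle_ns = sum(UNIT_DELAY_NS[u] for u in LW_PATH)

# Multicycle: the period must cover only the slowest functional unit.
multicycle_ns = max(UNIT_DELAY_NS.values())

# Time the register file sits idle in each cycle in which it is used.
register_file_idle_ns = multicycle_ns - UNIT_DELAY_NS["register file"]

print(single_cycle_ns, rate_mhz(single_cycle_ns))
print(multicycle_ns, rate_mhz(multicycle_ns), register_file_idle_ns)
```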

Comparing instruction execution times
 In the single-cycle datapath, each instruction needs an entire clock cycle,
or 8ns, to execute.
 With the multicycle CPU, different instructions need different numbers of
clock cycles, and hence different amounts of time.
— A branch needs 3 cycles, or 3 x 2ns = 6ns.
— Arithmetic and sw instructions each require 4 cycles, or 8ns.
— Finally, a lw takes 5 cycles, or 10ns.
 We can make some observations about performance already.
— Loads take longer with this multicycle implementation and branches are
faster, while arithmetic and sw instructions take the same 8ns as before.
— So we should see an increase in performance only if our program executes
more branches than loads.
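The trade-off between loads and branches can be made explicit with a short derivation. The notation is introduced here and is not on the slides: each f is the fraction of executed instructions in that class, and the four classes are assumed to cover every instruction, so the fractions sum to 1.

```latex
\begin{aligned}
\bar{T}_{\text{multi}}
  &= 8 f_{\text{arith}} + 10 f_{\text{load}} + 8 f_{\text{store}} + 6 f_{\text{branch}} \\
  &= 8\,(f_{\text{arith}} + f_{\text{load}} + f_{\text{store}} + f_{\text{branch}})
     + 2 f_{\text{load}} - 2 f_{\text{branch}} \\
  &= 8 + 2\,(f_{\text{load}} - f_{\text{branch}}) \quad \text{(ns)}
\end{aligned}
```

Since the single-cycle datapath spends 8ns on every instruction, the multicycle average falls below it exactly when the branch fraction exceeds the load fraction.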

The gcc example
 Let’s consider the gcc program again.

Instruction Frequency
Arithmetic 48%
Loads 22%
Stores 11%
Branches 19%

 In a single-cycle datapath, all instructions take 8ns to execute.

 The average execution time for an instruction on the multicycle processor
works out to 8.06ns.

(48% x 8ns) + (22% x 10ns) + (11% x 8ns) + (19% x 6ns) = 8.06ns

 The multicycle implementation is slower in this case.

 What is the result for a program in which branches are 22% and loads are
19% of the instructions?
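Both cases can be worked out with the same few lines. A sketch using the cycle counts and the 2ns cycle time from the earlier slides; the second mix keeps arithmetic at 48% and stores at 11%, which is an assumption, since the question only specifies the branch and load percentages.

```python
CYCLE_NS = 2
CYCLES = {"arithmetic": 4, "load": 5, "store": 4, "branch": 3}
SINGLE_CYCLE_NS = 8

def average_ns(mix_percent: dict) -> float:
    """Average multicycle time per instruction for a mix given in percent."""
    return sum(mix_percent[k] * CYCLES[k] * CYCLE_NS for k in CYCLES) / 100

gcc_mix = {"arithmetic": 48, "load": 22, "store": 11, "branch": 19}

# Assumed variant: load and branch percentages swapped, the rest unchanged.
swapped_mix = {"arithmetic": 48, "load": 19, "store": 11, "branch": 22}

gcc_avg = average_ns(gcc_mix)
swapped_avg = average_ns(swapped_mix)

for name, avg in [("gcc", gcc_avg), ("swapped", swapped_avg)]:
    verdict = "faster" if avg < SINGLE_CYCLE_NS else "slower"
    print(f"{name}: {avg}ns per instruction, {verdict} than single-cycle")
```

With more branches than loads, the average drops just below the single-cycle figure of 8ns.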

Summary
 A single-cycle CPU has two main disadvantages.
— The cycle time is limited by the worst case latency.
— It requires more hardware than necessary.
 A multicycle processor splits instruction execution into several stages.
— Instructions only execute as many stages as required.
— Each stage is relatively simple, so the clock cycle time is reduced.
— Functional units can be reused on different cycles.
 We made several modifications to the single-cycle datapath.
— The two extra adders and one memory were removed.
— Multiplexers were inserted so the ALU and memory can be used for
different purposes in different execution stages.
— New registers are needed to store intermediate results.
 Next week we will look at the pipelined approach to the datapath.
