Lecture 9

Chapter 4 of the document discusses the processor architecture, focusing on R-Type and load instructions, as well as the CBZ instruction. It emphasizes the importance of pipelining to improve performance by overlapping execution stages, detailing the LEGv8 pipeline with five stages and comparing pipelined and single-cycle datapaths. The chapter also highlights how the LEGv8 ISA is designed for efficient pipelining with uniform instruction sizes and formats.

Uploaded by iosama

COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
ARM Edition

Chapter 4
The Processor
R-Type Instruction

Chapter 4 — The Processor — 2


Example


Example


Load Instruction


Example


Example


CBZ Instruction


Example


Example


Control Summary


Performance Issues
• Longest delay determines clock period
  • Critical path: load instruction
  • Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  • Making the common case fast
• We will improve performance by pipelining


§4.5 An Overview of Pipelining
Pipelining Analogy
• Pipelined laundry: overlapping execution
  • Parallelism improves performance
• Four loads:
  • Speedup = $8 / 3.5 \approx 2.3$
• Non-stop ($n$ loads):
  • Speedup = $2n / (0.5n + 1.5) \approx 4$ = number of stages
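
The two speedup figures above can be checked with a short calculation. This is a minimal sketch that assumes four laundry stages of 0.5 hours each, which is inferred from the slide's 2n and 0.5n + 1.5 terms rather than stated on it; the function names are illustrative.

```python
# Assumed from the slide's arithmetic: 4 stages, 0.5 hours per stage.
STAGES = 4
STAGE_HOURS = 0.5


def sequential_hours(loads):
    """Each load runs all stages before the next load starts: 2n hours."""
    return loads * STAGES * STAGE_HOURS


def pipelined_hours(loads):
    """First load takes all stages; each later load finishes one stage
    time after the previous one: 0.5n + 1.5 hours."""
    return (STAGES + loads - 1) * STAGE_HOURS


def speedup(loads):
    return sequential_hours(loads) / pipelined_hours(loads)


for n in (4, 100, 10_000):
    print(n, round(speedup(n), 2))
```

As the number of loads grows, the fixed 1.5-hour fill time matters less and the ratio approaches the number of stages.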


LEGv8 Pipeline
• Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
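
To see how the five stages overlap, the sketch below prints a simple cycle-by-cycle chart in which each instruction starts one cycle after the previous one. It assumes an ideal pipeline with no stalls; the `schedule` helper is a hypothetical name used only for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(num_instructions):
    """Return one row per instruction; row i is shifted right by i cycles.

    Assumes no hazards, so every instruction advances one stage per cycle.
    """
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


for i, row in enumerate(schedule(3)):
    print(f"instr {i}: " + " ".join(f"{cell:<3}" for cell in row))
```

Reading a column of the chart top to bottom shows which stage each instruction occupies in that cycle.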


Pipeline Performance
• Assume time for stages is
  • 100 ps for register read or write
  • 200 ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
LDUR      200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
STUR      200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
CBZ       200 ps       100 ps         200 ps  -              -               500 ps
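
The totals in the table, and the clock periods they imply, can be recomputed from the per-step times. This is a sketch; the dictionary layout and names are illustrative, and the step names are taken from the table's column headers.

```python
# Per-step latencies in picoseconds, from the assumptions on this slide.
STEP_PS = {
    "instr_fetch": 200,
    "register_read": 100,
    "alu_op": 200,
    "memory_access": 200,
    "register_write": 100,
}

# Which steps each instruction class uses, from the table.
STEPS_USED = {
    "LDUR": ["instr_fetch", "register_read", "alu_op", "memory_access", "register_write"],
    "STUR": ["instr_fetch", "register_read", "alu_op", "memory_access"],
    "R-format": ["instr_fetch", "register_read", "alu_op", "register_write"],
    "CBZ": ["instr_fetch", "register_read", "alu_op"],
}


def total_ps(instr):
    return sum(STEP_PS[step] for step in STEPS_USED[instr])


# Single-cycle: one clock must fit the slowest whole instruction.
single_cycle_tc = max(total_ps(instr) for instr in STEPS_USED)
# Pipelined: one clock must fit the slowest single step.
pipelined_tc = max(STEP_PS.values())

for instr in STEPS_USED:
    print(instr, total_ps(instr))
print("single-cycle Tc:", single_cycle_tc, "pipelined Tc:", pipelined_tc)
```

The load instruction sets the single-cycle period because it is the only class that uses all five steps.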


Pipeline Performance
Single-cycle (Tc = 800 ps)

Pipelined (Tc = 200 ps)
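
With these two clock periods, total execution time for a run of instructions can be estimated as follows. This is a sketch that assumes an ideal five-stage pipeline with no stalls; the function names are illustrative.

```python
SINGLE_CYCLE_TC_PS = 800
PIPELINED_TC_PS = 200
NUM_STAGES = 5


def single_cycle_total_ps(n):
    """Every instruction occupies one full 800 ps cycle."""
    return n * SINGLE_CYCLE_TC_PS


def pipelined_total_ps(n):
    """NUM_STAGES cycles for the first instruction, then one more cycle
    for each additional instruction (assumes no hazards or stalls)."""
    return (NUM_STAGES + n - 1) * PIPELINED_TC_PS


for n in (3, 1_000_000):
    ratio = single_cycle_total_ps(n) / pipelined_total_ps(n)
    print(n, single_cycle_total_ps(n), pipelined_total_ps(n), round(ratio, 2))
```

For three instructions the pipeline is still mostly filling and draining, so the gain is modest; over a long run the ratio approaches 800/200 = 4.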


Pipeline Speedup
• If all stages are balanced
  • i.e., all take the same time
  • $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
• If not balanced, speedup is less
• Speedup due to increased throughput
  • Latency (time for each instruction) does not decrease
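
The effect of unbalanced stages can be illustrated by comparing the stage times used earlier in this lecture with a hypothetical perfectly balanced split of the same 800 ps of work. The balanced 160 ps stages are an assumption made only for this comparison.

```python
def time_between_instructions_ps(stage_ps):
    """In steady state a pipeline completes one instruction per clock,
    and the clock must fit the slowest stage."""
    return max(stage_ps)


unbalanced = [200, 100, 200, 200, 100]  # stage times from this lecture
balanced = [160] * 5                    # hypothetical: same 800 ps, split evenly

nonpipelined_ps = sum(unbalanced)       # 800 ps per instruction

speedup_balanced = nonpipelined_ps / time_between_instructions_ps(balanced)
speedup_unbalanced = nonpipelined_ps / time_between_instructions_ps(unbalanced)

# Latency of one instruction through the pipeline: stages x clock period.
latency_pipelined_ps = len(unbalanced) * time_between_instructions_ps(unbalanced)

print(speedup_balanced, speedup_unbalanced, latency_pipelined_ps)
```

The balanced case reaches the full factor of five, the unbalanced case only four, and a single instruction takes 1000 ps in the pipeline versus 800 ps without it, so the gain is in throughput rather than latency.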


Pipelining and ISA Design
• LEGv8 ISA designed for pipelining
  • All instructions are 32 bits
    • Easier to fetch and decode in one cycle
    • cf. x86: 1- to 17-byte instructions
  • Few and regular instruction formats
    • Can decode and read registers in one step
  • Load/store addressing
    • Can calculate address in 3rd stage, access memory in 4th stage
  • Alignment of memory operands
    • Memory access takes only one cycle
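
One way to see why fixed-size, regular formats help decoding: every field sits at a known bit position, so it can be extracted with a shift and a mask without first working out the instruction's length. The sketch below assumes the R-format field layout from the textbook (opcode 31:21, Rm 20:16, shamt 15:10, Rn 9:5, Rd 4:0) and the ADD opcode value 1112; neither appears on these slides, and the helper name is illustrative.

```python
def decode_r_format(word):
    """Split a 32-bit R-format word into its fields.

    Bit positions are assumed from the textbook's R-format layout:
    opcode[31:21], Rm[20:16], shamt[15:10], Rn[9:5], Rd[4:0].
    """
    return {
        "opcode": (word >> 21) & 0x7FF,  # 11 bits
        "Rm": (word >> 16) & 0x1F,       # 5 bits
        "shamt": (word >> 10) & 0x3F,    # 6 bits
        "Rn": (word >> 5) & 0x1F,        # 5 bits
        "Rd": word & 0x1F,               # 5 bits
    }


# ADD X9, X20, X21 (opcode 1112 assumed from the textbook's opcode table).
word = (1112 << 21) | (21 << 16) | (0 << 10) | (20 << 5) | 9
fields = decode_r_format(word)
print(hex(word), fields)
```

Because the register fields never move, hardware can begin reading the register file while the opcode is still being decoded, which is what lets decode and register read share one stage.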
