Unit 1

The document provides an overview of computer architecture and organization, highlighting the differences between the two concepts, with architecture focusing on functional design and organization on structural implementation. It discusses various architectural models such as Von Neumann and Harvard, as well as instruction set architecture (ISA) and microarchitecture, detailing their components and characteristics. Additionally, it introduces RISC-V as an open-source ISA, emphasizing its significance in modern computing.


COMPUTER ARCHITECTURE

21EC74H6
Dr. P.N.JAYANTHI
Asst.Prof, Dept. of ECE,
RVCE.

SYLLABUS

References

Differences between Computer
Architecture & Computer Organization
 Computer Architecture is a functional description of
requirements and design implementation for the various parts
of a computer.
 It deals with the functional behavior of computer systems and comes before computer organization when designing a computer.
 Computer architecture refers to the design of the internal
workings of a computer system, including the CPU, memory,
and other hardware components.
 It involves decisions about the organization of the hardware,
such as the instruction set architecture, the data path design,
and the control unit design. Computer architecture is concerned
with optimizing the performance of a computer system and
ensuring that it can execute instructions quickly and efficiently.
 Architecture describes what the computer does.

Advantages of Computer Architecture
 Performance Optimization: Proper architectural design can substantially improve the efficiency of a system.
 Flexibility: A well-designed architecture can adapt to and incorporate new technologies and accommodate different hardware components.
 Scalability: Designs should make provision for future expansion and growth.
Disadvantages of Computer Architecture:
 Complexity: Design and optimization can be a challenging, time-consuming task.
 Cost: High-performance architectures often require expensive equipment and parts, which makes them more costly.
Computer Organization
 Computer Organization comes after the Computer Architecture has been decided.
Computer Organization is how operational attributes are linked together to realize the architectural specification.
Computer Organization deals with structural relationships.
The organization describes how it does it.
Advantages of Computer Organization
 Practical Implementation: It gives a concrete account of the physical layout of the computer system.
 Cost Efficiency: Good organization helps avoid wasting resources, reducing costs.
 Reliability: Organization helps guarantee that the same work produces the same, correct results.
Disadvantages of Computer Organization
 Hardware Limitations: The physical components available to the organization limit the systems that can be implemented, and hence the performance.
 Less Flexibility: Once set, the organization is rigidly defined and difficult to change.
Computer System

COMPUTER ARCHITECTURE vs COMPUTER ORGANIZATION

 Architecture describes what the computer does; Organization describes how it does it.
 Computer Architecture deals with the functional behavior of computer systems; Computer Organization deals with structural relationships.
 Architecture deals with high-level design issues; Organization deals with low-level design issues.
 Architecture indicates its hardware; Organization indicates its performance.
 As a programmer, you can view architecture as a series of instructions, addressing modes, and registers; the implementation of the architecture is called organization.
 For designing a computer, its architecture is fixed first; the organization is decided after its architecture.
 Computer Architecture is also called Instruction Set Architecture (ISA); Computer Organization is frequently called microarchitecture.
 Computer Architecture comprises logical functions such as instruction sets, registers, data types, and addressing modes; Computer Organization consists of physical units like circuit designs, peripherals, and adders.
 The architectural categories found in our computer systems are: Von Neumann architecture, Harvard architecture, Instruction Set Architecture, Microarchitecture, and System Design. CPU organization is classified into three categories based on the number of address fields: single-accumulator organization, general-register organization, and stack organization.
 Architecture makes the computer's hardware visible; Organization offers details on how well the computer performs.
 Architecture coordinates the hardware and software of the system; Organization handles the segments of the network in a system.
RISC vs CISC

Load-Store architecture
The Load-Store architecture is a type of
CPU design that differentiates it from other
architectures, like the CISC.
It is commonly associated with RISC
architecture, where instructions are
streamlined and optimized for efficiency.
Load instruction brings data from
memory into a CPU register.
Store instruction transfers data from a
CPU register to memory.

Load-Store architecture
Separation of Memory and ALU Operations:
 In Load-Store architectures, the CPU separates
memory access instructions from computation
(arithmetic/logic) instructions.
 Only load and store instructions can access
memory, and all other instructions operate only on
data within the CPU registers
Efficiency through Reduced Instructions:
• By limiting memory access to load and store
instructions, this architecture reduces the
complexity of the CPU’s instruction set,
streamlining the pipeline stages, which can
speed up instruction processing.
Load-Store architecture
 Improved Pipeline and Parallelism:
 Load-Store architectures facilitate instruction
pipelining by separating data movement (load/store)
from computation, allowing multiple instructions to be
processed concurrently.
 Since each instruction generally takes the same amount
of time, this leads to fewer pipeline stalls, increasing the
CPU's efficiency
Registers-Based Computation:
 All computation (arithmetic/logic) operations are carried
out on data in registers, which are much faster to access
than memory. This design encourages using more
registers and managing data within the CPU, minimizing
costly memory accesses.

LDR & STR examples
Generally, LDR is used to load something from memory into a register, and STR is used to store something from a register to a memory address.

LDR R2, [R0] @ [R0] - origin address is the value found in R0.
STR R2, [R1] @ [R1] - destination address is the value found in R1.

LDR operation: loads the value at the address found in R0 into the register R2.
STR operation: stores the value found in R2 to the memory address found in R1.
Architecture vs. Microarchitecture

Architecture and Microarchitecture are two distinct concepts in computer design, each addressing different layers of how a computer or processor is organized and operates.
Architecture (Instruction Set Architecture (ISA))

 Definition: Architecture, or Instruction Set Architecture (ISA), is the abstract model of a computer. It defines what the processor can do and how it does it. The ISA serves as a blueprint for software developers and hardware engineers, establishing the instructions that a processor can execute, such as arithmetic operations, data manipulation, control flow, and memory access.
 Components:
 Instruction Set: The set of operations the processor can perform.
 Registers: The number, type, and purpose of registers accessible to the programmer.
 Memory Model: How memory is accessed and organized.
 Data Types: Defines data sizes and structures.
 Examples: x86, ARM, MIPS, and RISC-V are some well-known ISAs.
Architecture is primarily concerned with functionality and acts as an interface between hardware and software. It is independent of the physical implementation.
Microarchitecture
 Microarchitecture is the detailed, lower-level design that
defines how a particular processor (or implementation) of
an ISA is constructed.
 It deals with the actual organization and operational details of
the CPU and covers the physical arrangement of components like
pipelines, ALUs, caches, and execution units.
 Components:
 Pipelines: Defines how instructions are processed in stages.
 Caches: Levels of memory caches (L1, L2, L3) to optimize data access.
 Execution Units: ALUs, FPUs and other functional units.
 Branch Prediction and Speculative Execution: Techniques to
improve performance.
 Examples: Intel’s Haswell, AMD’s Zen, and ARM’s Cortex
microarchitectures are examples of how different manufacturers
implement the same or similar ISAs differently.
 Microarchitecture is implementation-specific and focuses on
optimizing the execution of instructions defined by the ISA,
aiming to balance speed, power efficiency, and physical
constraints.

Machine models
Machine models in computing refer to
abstract frameworks or theoretical models
used to understand and design computing
systems, helping to simulate, analyse, and
implement algorithms. These models play a
crucial role in both theoretical computer
science and practical computing, as they
influence how processors, memory, and
computation workflows are organized.

1. Von Neumann Model
 This is the classic model for most modern computers,
named after John von Neumann. It consists of a
single memory space for storing both
instructions and data, and a central processing unit
(CPU) that sequentially processes instructions.
 Components:
 Memory: Holds both instructions and data.
 Control Unit: Interprets instructions from memory.
 ALU :Executes operations.
 Input/Output: Interfaces for user interaction and data
transfer.
 Characteristics: Sequential instruction processing, a
shared memory space for instructions and data (leading
to the "von Neumann bottleneck"), which can slow
down performance due to limited data throughput.

2. Harvard Architecture
This model separates the memory for instructions
and data, allowing the CPU to access both
simultaneously.
Components:
 Separate Memory Banks: Separate instruction
and data memory.
 Control Unit and ALU: As in the von Neumann
model.
Characteristics: The separation allows
parallelism in fetching instructions and data,
reducing bottlenecks. It’s common in
embedded systems and certain digital signal
processors.

3 .Parallel Models
 These models consider multiple processors or cores to
achieve concurrent processing.
 SIMD (Single Instruction, Multiple Data): A single
control unit issues one instruction that operates on
multiple data points simultaneously. Ideal for tasks like
graphics processing and scientific computing.
 MIMD (Multiple Instruction, Multiple Data): Each
processor executes its own set of instructions
independently. Common in multi-core CPUs and distributed
computing systems.
 SISD (Single Instruction, Single Data): The traditional,
sequential model used in single-core processors.
 MISD (Multiple Instruction, Single Data): Rarely used,
where multiple instructions operate on a single data
stream.
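The SISD/SIMD distinction above can be sketched in a few lines. This is an illustrative sketch only, not real vector hardware: in actual SIMD units the per-lane adds happen in one hardware instruction, which pure Python cannot show directly.

```python
# Illustrative sketch (not real vector hardware): SISD applies one
# operation to one data element per step; SIMD conceptually issues the
# same operation once across all data elements ("lanes").

def sisd_add(xs, ys):
    out = []
    for x, y in zip(xs, ys):        # one instruction, one data pair per step
        out.append(x + y)
    return out

def simd_add(xs, ys):
    # conceptually a single instruction applied to every lane at once
    return [x + y for x, y in zip(xs, ys)]

print(sisd_add([1, 2], [3, 4]))     # [4, 6]
print(simd_add([1, 2], [3, 4]))     # [4, 6]
```

Both produce the same result; the difference is how many instruction issues the control unit performs, which is why SIMD suits graphics and scientific workloads.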
Machine (Data Path) Models
● Data in registers, memory, or the stack can form operands for the ALU; this choice decides the type of machine model followed.
Stack machine model:
• The stack machine performs operations by pushing operands (data or variables) onto a stack and then applying operations to the values at the top of the stack.
• Simple model with no explicit operands specified in ALU instructions, unless it is a multilevel stack (more than two locations).
Ex: the 8087 floating-point coprocessor, used in conjunction with the 8086 processor.
Ex: the Java virtual machine and Python interpreters work on the stack machine model.
*TOS: Top of the stack
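The stack machine model above can be sketched as a tiny interpreter. This is a minimal illustration with an invented mini-instruction set (PUSH/ADD/POP), not the 8087 or JVM instruction encodings; it only shows that the ALU operation takes its operands implicitly from the top of the stack.

```python
# Minimal stack-machine sketch (hypothetical mini-ISA for illustration):
# operands are pushed, and ADD takes its inputs implicitly from the
# top of the stack (TOS), as in the 8087 or the JVM evaluation stack.

def run_stack(program, memory):
    stack = []
    for op, *arg in program:
        if op == "PUSH":                 # push memory[var] onto the stack
            stack.append(memory[arg[0]])
        elif op == "ADD":                # no explicit operands: use TOS
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":                # pop TOS back into memory[var]
            memory[arg[0]] = stack.pop()
    return memory

mem = {"A": 2, "B": 3, "C": 0}
run_stack([("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")], mem)
print(mem["C"])   # 5
```

The program is exactly the stack column of the C = A + B comparison: PUSH A, PUSH B, ADD, POP C.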
Machine (Data Path) Models

Accumulator
• Accumulator machines were among the earliest computer systems, like the EDSAC and the IBM 701.
• Instructions are simple, with a named accumulator and a single named operand.

Register-Memory
• This model represents the architecture of many real-world computer systems, like x86.
• Instructions have 2 or 3 named operands, operating directly with memory.

Register-Register
• This model represents the load-store architecture of modern systems, like ARM.
• Instructions have 2 or 3 named operands.
Machine (Data Path) Models: Typical Program Sequence
Let us take an operation: C = A + B

Stack       Accumulator   Register-Memory   Register-Register
PUSH A      LOAD A        LOAD R1, A        LOAD R1, A
PUSH B      ADD B         ADD R3, R1, B     LOAD R2, B
ADD         STORE C       STORE R3, C       ADD R3, R1, R2
POP C                                       STORE R3, C
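The register-to-register column of the C = A + B comparison can be sketched as a small load-store interpreter. The opcode spellings and dict-based memory are invented for illustration; the point is that only LOAD and STORE touch memory, while ADD reads and writes registers only.

```python
# Sketch of the register-register (load-store) program sequence:
# only LOAD and STORE access memory; ADD operates purely on registers.

def run_load_store(program, memory):
    regs = {}
    for inst in program:
        op = inst[0]
        if op == "LOAD":            # LOAD Rd, addr  -> Rd = memory[addr]
            _, rd, addr = inst
            regs[rd] = memory[addr]
        elif op == "ADD":           # ADD Rd, Rs1, Rs2  (registers only)
            _, rd, rs1, rs2 = inst
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "STORE":         # STORE Rs, addr -> memory[addr] = Rs
            _, rs, addr = inst
            memory[addr] = regs[rs]
    return memory

mem = {"A": 10, "B": 32}
run_load_store([("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")], mem)
print(mem["C"])   # 42
```

Compare this with the stack-machine version: here the operands are named explicitly, and the compiler is free to keep intermediate values in registers.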
Instruction Set Architecture (ISA)
ISA is the interface between software and hardware, defining the supported
instructions, data types, registers, memory addressing modes, and I/O
mechanisms.
Each ISA has a unique set of characteristics that influences how effectively it
can handle different types of computation and how it balances performance,
energy efficiency, and simplicity.
Key Characteristics of an ISA
 Instruction Set:
 Types of Instructions: Defines arithmetic, logical, data transfer, control, and
floating-point instructions.
 Instruction Length: Fixed-length or variable-length instructions. Fixed-length
instructions (like in RISC) simplify decoding, while variable-length instructions
(like in CISC) provide flexibility.
 Format: Specifies how operands and operation codes are structured in
instructions.
 Registers:
 General-Purpose Registers (GPRs): Registers that can be used for various
purposes, providing faster access than memory.
 Special-Purpose Registers: These may include program counters, status
registers, and others, optimized for specific operations.
 Register Count: Impacts performance and complexity; more registers
generally mean more data can be processed without needing memory access.

Instruction Set Architecture (ISA)
 Memory Addressing Modes:
 Specifies how instructions reference memory. Common modes include
immediate, direct, indirect, register, and displacement addressing.
 Complex addressing modes, as in CISC, allow versatile data
manipulation, while RISC designs like RISC-V often use simpler modes
to speed up instruction decoding and execution.
 Data Types:
 Defines supported data types, such as integer, floating-point, and
sometimes more specialized data types like packed or SIMD data types.
 The types supported impact the ISA's utility for various applications,
e.g., floating-point support for scientific computation.
 Instruction Decoding:
 Complexity of Decoding: Affects CPU design; simpler decoding is a
characteristic of RISC ISAs, which can speed up the pipeline.
 Control Flow Instructions: Defines branches, loops, and jumps.
Branch prediction optimizations depend on control instructions'
predictability.
 Power Efficiency:
 Simpler ISAs (RISC) tend to consume less power due to fewer
instructions, fixed-length encoding, and streamlined decoding.

Types of ISA

RISC-V
• RISC-V (pronounced "risk-five") is an ISA standard
– An open-source implementation of a reduced instruction set computing (RISC)
based instruction set architecture (ISA)
– There was RISC-I, II, III, IV before
• Most ISAs: X86, ARM, Power, MIPS, SPARC
– Commercially protected by patents
– Preventing practical efforts to reproduce the computer systems.
• RISC-V is open
– Permitting any person or group to construct compatible computers
– Use associated software
• Originated in 2010 by researchers at UC Berkeley
– Krste Asanović, David Patterson and students
• In 2017, version 2 of the user-space ISA was fixed
– User-Level ISA Specification v2.2
– Draft Compressed ISA Specification v1.79 https://riscv.org/
– Draft Privileged ISA Specification v1.10 https://en.wikipedia.org/wiki/RISC-V

Key Characteristics of RISC-V

 Modular Design:
 RISC-V follows a modular approach, allowing implementers to add
or omit specific features as needed.
 The base ISA (RV32I for 32-bit, RV64I for 64-bit, RV128I for 128-
bit) provides basic integer instructions.
 Additional modules (extensions) support features like floating-point
arithmetic (F, D), atomic operations (A), and vector processing (V).
 Simplified Instruction Set:
 RISC-V follows RISC principles with a small, fixed set of instructions
that are easy to decode and execute.
 Fixed-Length Instructions: RISC-V uses 32-bit fixed-length
instructions in the base ISA, simplifying instruction fetch and
decode stages.
 32 General-Purpose Registers:
 RISC-V uses 32 GPRs for each base ISA (RV32, RV64, and RV128).
These registers allow efficient data handling without frequent
memory access, which is especially beneficial for high-performance
computing.

Key Characteristics of RISC-V

 Support for Extensions:


 The base RISC-V ISA is minimal, but a wide array of
extensions allows it to be tailored for different applications.
 Common Extensions:
 M (Multiply/Divide): Adds support for integer multiplication
and division.
 F and D (Single- and Double-Precision Floating Point):
Adds support for floating-point calculations, critical for
scientific and engineering applications.
 A (Atomic Instructions): Adds atomic read-modify-write
operations, essential for multithreading.
 C (Compressed Instructions): Reduces code size, improving
efficiency, especially in memory-constrained environments.
 V (Vector Extension): Enables SIMD (Single Instruction,
Multiple Data) operations, suitable for data-parallel tasks like
machine learning and image processing.

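The modular extension scheme above can be illustrated with a small helper that composes an ISA name the way RISC-V conventionally appends extension letters to the base (e.g. RV32IMAC). The canonical letter ordering used here is assumed for illustration; the authoritative ordering is defined in the RISC-V specification.

```python
# Illustrative helper for RISC-V's naming convention: base ISA name
# plus extension letters, e.g. RV32I + M, A, C -> "RV32IMAC".
# The ordering string below is an assumption for this sketch.

EXTENSIONS = {
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed instructions",
    "V": "vector operations",
}

def isa_name(base, exts):
    order = "MAFDCV"                       # assumed canonical order
    chosen = sorted(set(exts), key=order.index)
    return base + "".join(chosen)

print(isa_name("RV32I", ["C", "M", "A"]))  # RV32IMAC
print(isa_name("RV64I", []))               # RV64I
```

This mirrors how an implementer picks only the extensions a product needs: a microcontroller might be RV32IMC, while a vector accelerator adds V.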
Key Characteristics of RISC-V
 Scalability and Interoperability:
 RISC-V supports 32-bit, 64-bit, and 128-bit address spaces, providing
scalability across devices.
 Modular extensions ensure compatibility, enabling devices of different
capabilities to run the same software while tailoring hardware to specific
application needs.
 Simplified Load/Store Architecture:
 RISC-V follows a load/store architecture where only load and store
instructions access memory, while arithmetic instructions operate solely
on registers. This approach simplifies pipelining and optimizes
performance.
 Efficient and Power-Aware:
 RISC-V’s minimalist design, fixed-length instructions, and load/store
architecture contribute to high power efficiency, making it suitable for
embedded and low-power applications.
 Open-Source and Customizable:
 Unlike proprietary ISAs like ARM or x86, RISC-V is an open standard,
fostering innovation and custom hardware designs without licensing
fees. This openness has led to rapid adoption in academia and industry,
as it encourages experimentation and customization.

RISC-V
RISC-V’s design aligns with traditional RISC
principles, prioritizing simplicity,
modularity, and flexibility.
Its open-source nature, coupled with an
extensible design, makes it versatile for a
wide range of applications, from small
embedded devices to large-scale data
centers.
This flexibility allows for targeted
optimizations, balancing performance,
power consumption, and cost for various
computing environments.
Applications
The application options are endless for the RISC-V ISA:
•Wearables, Industrial, IoT, and Home Appliances. RISC-V processors are ideal for
meeting the power requirements of space-constrained and battery-operated designs.
•Smartphones. RISC-V cores can be customized to handle the performance needed to
power smartphones, or can be used as part of a larger SoC to handle specific tasks for
phone operation.
•Automotive, High-Performance Computing (HPC), and Data Centres. RISC-V
cores can handle complex computational tasks with customized ISAs, while RISC-V
extensions enable development of simple, secure, and flexible cores for greater energy
efficiency.
•Aerospace and Government. RISC-V offers high reliability and security for these
applications.

Pipelining

Pipelining is a technique used in computer architecture to improve the overall processing speed of a system by breaking down tasks into smaller stages, allowing multiple instructions to overlap in execution. It is widely used in processors to enhance performance by organizing operations so that different stages of multiple instructions execute simultaneously.
An Ideal Pipeline

stage 1 → stage 2 → stage 3 → stage 4

• All objects go through the same stages
• No sharing of resources between any two stages
• Propagation delay through all pipeline stages is equal
• Scheduling of a transaction entering the pipeline is not affected by the transactions in other stages
• These conditions generally hold for industry assembly lines, but instructions depend on each other, causing various hazards
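The overlap in an ideal pipeline can be sketched as a schedule: with S stages and N instructions, instruction i occupies stage s in cycle i + s, so N instructions finish in N + S − 1 cycles instead of N × S. A minimal sketch:

```python
# Ideal-pipeline schedule: instruction i is in stage s during cycle i + s.
# With n_instructions = N and n_stages = S, everything finishes in
# N + S - 1 cycles, versus N * S cycles without overlap.

def pipeline_schedule(n_instructions, n_stages):
    total_cycles = n_instructions + n_stages - 1
    # schedule[cycle] = list of (instruction, stage) pairs active that cycle
    schedule = [[] for _ in range(total_cycles)]
    for i in range(n_instructions):
        for s in range(n_stages):
            schedule[i + s].append((i, s))
    return schedule

sched = pipeline_schedule(3, 4)
print(len(sched))   # 6 cycles, versus 3 * 4 = 12 unpipelined
print(sched[3])     # [(0, 3), (1, 2), (2, 1)] -- three instructions in flight
```

Note this models only the ideal case of the bullets above; dependences between instructions (hazards) would insert stalls into the schedule.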
Instruction Pipeline Design: Simple RISC ISA

Simple RISC

Instruction encoding

● Branch format: call, b, beq, bgt
● ALU instructions (add, sub, mul, div, mod, and, or, lsl, lsr, asr) come in two formats:
  Immediate format (2nd operand is an immediate)
  Register format (2nd source operand is in a register)
Simple RISC Processor Design
● The approach to designing the processor is to divide the processing into stages.

Instruction Fetch (IF) → Operand Fetch (OF) → Execute (EX) → Memory Access (MA) → Register Write (RW)

• Instruction Fetch (IF)
 Fetch an instruction from the instruction memory
 Compute the address of the next instruction
• Operand Fetch (OF)
 Decode the instruction (break it into fields)
 Fetch the register operands from the register file
 Compute the branch target (PC + offset)
 Compute the immediate
 Generate control signals
Simple RISC Processor
Design
MA (Memory Access) Stage
• Interfaces with the memory system
• Executes a load or a store
RW (Register Write) Stage
• Writes to the register file
• In the case of a call instruction, it writes the
return address to register, ra

SimpleRISC Processor Datapath
The EX Stage
Contains an Arithmetic-Logic Unit (ALU). This unit can perform all arithmetic operations (add, sub, mul, div, cmp, mod) and logical operations (and, or, not).
Contains the branch unit for computing the branch condition (beq, bgt).
Contains the flags register (updated by the cmp instruction).
SimpleRISC Processor Pipelined Datapath
IF Stage

isBranchTaken is a control signal generated by the execute stage when a branch is taken.
The instruction's contents are saved in the instruction field.
Simple RISC Processor Pipelined Data Path

Add latches (registers) between subsequent stages.
● 4 latches → IF-OF, OF-EX, EX-MA, and MA-RW
Unpipelined Datapath for MIPS

[Figure: unpipelined MIPS datapath — PC, instruction memory, register file (GPRs), immediate extender, ALU, and data memory, with control signals PCSrc, RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc, and zero?]
Simplified Unpipelined Datapath

[Figure: simplified unpipelined datapath — PC, instruction memory, register file (GPRs), immediate extender, ALU, and data memory]
Pipelined Datapath

[Figure: pipelined datapath — the same units divided into fetch, decode & register-fetch, execute, memory, and write-back phases, with an IR latch after the fetch phase]

Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
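The clock-period bound above can be made concrete with numbers. The picosecond delays below are invented for illustration; the point is that the unpipelined clock must cover the sum of the stage delays, while the pipelined clock only has to cover the slowest stage.

```python
# Sketch of the clock-period bound tC > max{tIM, tRF, tALU, tDM, tRW}.
# The stage delays (in picoseconds) are invented for illustration.

delays = {"tIM": 200, "tRF": 150, "tALU": 180, "tDM": 220, "tRW": 150}

t_unpipelined = sum(delays.values())   # one long cycle must do everything
t_pipelined = max(delays.values())     # cycle set by the slowest stage

print(t_unpipelined)   # 900 ps
print(t_pipelined)     # 220 ps (= tDM here, the data-memory access)
```

With these numbers the pipelined clock is roughly 4x faster, which is why the slowest stage (often the memory access, tDM) dictates the whole machine's cycle time.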
Pipelined Control

[Figure: the pipelined datapath with a hardwired controller generating the control signals for the fetch, decode & register-fetch, execute, memory, and write-back phases]

Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably)
Hardwired control in pipelining
 Speed: Hardwired control is faster than microprogrammed control because it uses combinational logic to produce control signals directly from the current state and input. This speed is crucial in pipelined processors, where each instruction stage needs quick control-signal generation to avoid slowing down the pipeline.
 Predictability: Hardwired control is deterministic, making it more straightforward to design and to predict behavior for each pipeline stage. This predictability helps in minimizing stalls or hazards (such as data hazards, control hazards, or structural hazards) within the pipeline.
 Lower Latency: The primary goal of pipelining is to improve the throughput of instruction execution. Hardwired control contributes to this by ensuring minimal latency in signal generation, helping each pipeline stage transition smoothly to the next without delays.
 Simplicity in Execution: Hardwired control is typically simpler to implement in terms of the circuitry needed for straightforward, well-defined control tasks in each pipeline stage, making it an efficient choice for processors with simpler instruction sets, like RISC.

While hardwired control is fast and efficient, it can become complex and difficult to modify if the instruction set is extensive or if additional functionality is needed, as adding control paths requires physical changes in the wiring.
For more complex processors, or those that need flexibility (like supporting multiple instruction sets), microprogrammed control may be a better fit.
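Hardwired control can be pictured as a fixed truth table: the opcode combinationally selects a bundle of control signals, with no microprogram sequencing in between. The opcodes and signal names below are invented for illustration (loosely following the isLd/isSt/isWb naming used elsewhere in these slides).

```python
# Sketch of hardwired control: a direct, table-like mapping from opcode
# to control signals -- one combinational lookup, no microcode sequencer.
# Opcode spellings and signal names are assumptions for this sketch.

CONTROL = {
    "add": {"isLd": 0, "isSt": 0, "isWb": 1, "aluOp": "add"},
    "ld":  {"isLd": 1, "isSt": 0, "isWb": 1, "aluOp": "add"},  # addr = base + offset
    "st":  {"isLd": 0, "isSt": 1, "isWb": 0, "aluOp": "add"},
    "beq": {"isLd": 0, "isSt": 0, "isWb": 0, "aluOp": "cmp"},
}

def control_signals(opcode):
    # in hardware this is combinational logic; here, a single lookup
    return CONTROL[opcode]

print(control_signals("ld"))   # {'isLd': 1, 'isSt': 0, 'isWb': 1, 'aluOp': 'add'}
```

The trade-off discussed above shows up directly: adding a new opcode means adding a row (new wiring), whereas a microprogrammed controller would just get a new microcode routine.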
However, CPI will increase unless instructions are pipelined
CPI Examples

Microcoded machine: Inst 1 takes 7 cycles, Inst 2 takes 5 cycles, Inst 3 takes 10 cycles
→ 3 instructions, 22 cycles, CPI = 7.33

Unpipelined (single-cycle) machine: each instruction takes one long cycle
→ 3 instructions, 3 cycles, CPI = 1

Pipelined machine: instructions overlap in execution
→ 3 instructions, 3 cycles, CPI = 1
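The microcoded-machine arithmetic above is just an average over per-instruction cycle counts:

```python
# Reproducing the CPI example: the microcoded machine spends
# 7 + 5 + 10 = 22 cycles on 3 instructions.

cycles = [7, 5, 10]
cpi = sum(cycles) / len(cycles)

print(sum(cycles))      # 22
print(round(cpi, 2))    # 7.33
```

The pipelined machine reaches CPI = 1 not by making any instruction faster, but by overlapping them so one completes per cycle in steady state.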
Technology Assumptions
• A small amount of very fast memory (caches) backed up by a large, slower memory
• Fast ALU (at least for integers)
• Multiported register files (slower!)

Thus, the following timing assumption is reasonable:
tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW

A 5-stage pipeline will be the focus of our detailed design
- some commercial designs have over 30 pipeline stages to do an integer add!
"Iron Law" of Processor Performance

Time/Program = (Instructions/Program) × (Cycles/Instruction) × (Time/Cycle)

– Instructions per program depends on source code, compiler technology, and ISA
– Cycles per instruction (CPI) depends upon the ISA and the microarchitecture
– Time per cycle depends upon the microarchitecture and the base technology

Microarchitecture                  CPI   cycle time
Microcoded                         >1    short
Single-cycle unpipelined           1     long
Pipelined                          1     short
Multi-cycle, unpipelined control   >1    short
Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much?
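Working the example with the Iron Law (CPU time = instructions × CPI × cycle time), where the instruction count N is the same for both machines and cancels out of the ratio:

```python
# CPU time = instructions x CPI x cycle time (the Iron Law).
# The program's instruction count is common to A and B, so we can
# compare per-instruction times directly (N = 1).

def cpu_time(n_instructions, cpi, cycle_time_ps):
    return n_instructions * cpi * cycle_time_ps

time_a = cpu_time(1, 2.0, 250)   # 2.0 * 250 ps = 500 ps per instruction
time_b = cpu_time(1, 1.2, 500)   # 1.2 * 500 ps = 600 ps per instruction

print(time_a, time_b)            # 500.0 600.0
print(time_b / time_a)           # 1.2 -- computer A is 1.2x faster
```

So despite its higher CPI, computer A's faster clock wins: it is 1.2 times faster than B on this program.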
Design of a Pipeline
* Splitting the Data Path
* We divide the data path into 5 parts : IF, OF,
EX, MA, and RW
* Timing
* We insert latches (registers) between
consecutive stages
* 4 Latches → IF-OF, OF-EX, EX-MA, and MA-RW
* At the negative edge of a clock, an instruction
moves from one stage to the next

The Instruction Packet
* What travels between stages ?
* ANSWER : the instruction packet
* Instruction Packet
* Instruction contents
* Program counter
* All intermediate results
* Control signals
* Every instruction moves with its entire state, no
interference between instructions

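The instruction packet above is naturally a record type: every field an instruction needs travels with it through the latches, so instructions never interfere. A minimal sketch, with field names following the SimpleRISC discussion but the exact structure assumed for illustration:

```python
# Sketch of an instruction packet: the bundle of state that moves from
# latch to latch with each instruction. Field names follow the slides;
# the concrete structure is an assumption for illustration.

from dataclasses import dataclass, field

@dataclass
class InstructionPacket:
    pc: int
    instruction: int
    control: dict = field(default_factory=dict)  # e.g. isLd, isSt, isWb
    op1: int = 0                                 # intermediate results,
    op2: int = 0                                 # filled in stage by stage
    aluResult: int = 0

pkt = InstructionPacket(pc=0, instruction=0x12345678)
pkt.control["isWb"] = 1      # e.g. set by the OF stage's control unit
pkt.aluResult = 42           # e.g. produced by the EX stage
```

Because each packet is self-contained, a stage only ever reads and writes the packet currently in its input latch, which is exactly the "no interference between instructions" property.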
Pipelined Data Path with Latches

Instruction Fetch (IF) → Operand Fetch (OF) → Execute (EX) → Memory Access (MA) → Register Write (RW), with a latch between each pair of stages

* Add a latch between subsequent stages.
* Triggered by a negative clock edge
Simple RISC Processor
Design
MA (Memory Access) Stage
• Interfaces with the memory system
• Executes a load or a store
RW (Register Write) Stage
• Writes to the register file
• In the case of a call instruction, it writes the
return address to register, ra

76
SimpleRISC Processor
Datapath
The EX Stage
Contains an Arithmetic-Logical Unit
(ALU)This unit can perform all arithmetic
operations ( add, sub, mul, div, cmp, mod),
and logical operations (and, or, not)
Contains the branch unit for computing the
branch condition (beq, bgt).
Contains the flags register (updated by the
cmp instruction)

77
Instruction Fetch

* The pc register contains the program counter (negative edge triggered)
* We use the pc to access the instruction memory
* The multiplexer chooses between
  * pc + 4
  * branchTarget
* It uses a control signal → isBranchTaken
79
Simple RISC
Basic Instruction Format

[Figure: instruction encoding — a 5-bit opcode and an immediate field (imm)]

80
OF Stage

[Figure: OF-stage datapath — the control unit decodes the instruction; the immediate and branch-target unit produces the immediate and branchTarget; outputs op2 and the control signals]

* A, B → ALU operands, op2 (store operand), control (set of all control signals)

81
OF Stage
*The register file has two read ports
* 1st input : rs1 (or ra, in the case of a ret instruction)
* 2nd input : rs2 (or rd, in the case of a store instruction)
*The two outputs are op1, and op2
* op1 is the branch target (return address) in the case of a ret
instruction, or rs1
* op2 is the value that needs to be stored in the case of a store
instruction, or rs2

82
EX Stage

[Figure: EX-stage datapath — the ALU (driven by aluSignals) and the branch unit (isBeq, isBgt, isUBranch) operate on A, B, and the flags register (updated by cmp); a mux (isRet) selects the branchPC; the outputs aluResult and isBranchTaken are sent to the fetch unit]

* aluResult → result of the ALU Operation


* op2, control, pc, instruction (passed from OF-EX)

83
MA Stage
[Figure: MA-stage datapath — the mar (address) and mdr (data) registers feed the data memory; isLd and isSt control loads and stores]

* ldResult → result of the load operation


* aluResult, control, pc, instruction (passed
from EX-MA)

84
RW Stage

[Figure: RW-stage datapath — a mux (controlled by isLd and isCall) selects among aluResult, ldResult, and pc + 4; the result is written to register rd (or ra(15) for a call) when isWb is set]
85
[Figure: the complete pipelined datapath of the SimpleRISC processor — the IF, OF, EX, MA, and RW stages connected by the inter-stage latches]
86
Abridged Diagram

[Figure: abridged diagram — the fetch, immediate/branch, ALU, memory, and register-write units, separated by the IF-OF, OF-EX, EX-MA, and MA-RW latches]
87
Instructions Interact With Each Other
in Pipeline
• Data Hazard: An instruction depends on a
data value produced by an earlier instruction
• Control Hazard: Whether or not an
instruction should be executed depends on a
control decision made by an earlier instruction
• Structural Hazard: An instruction in the
pipeline needs a resource being used by
another instruction in the pipeline

88
* Pipeline Diagram
[1]: add r1, r2, r3
[2]: sub r4, r5, r6
[3]: mul r8, r9, r10

Clock cycles  1  2  3  4  5  6  7
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

* It has 5 rows, one per stage
  * The rows are named : IF, OF, EX, MA, and RW
*Each column represents a clock cycle
*Each cell represents the execution of an
instruction in a stage
* It is annotated with the name(label) of the instruction
* Instructions proceed from one stage to the next across clock cycles

89
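Pipeline diagrams like the one above are easy to generate programmatically. A minimal sketch for an ideal 5-stage pipeline (no stalls):

```python
# Print a pipeline diagram: rows are stages, columns are clock cycles, and
# cell (stage, cycle) holds the number of the instruction occupying that
# stage in that cycle. Instruction i enters stage s in cycle i + s.

STAGES = ["IF", "OF", "EX", "MA", "RW"]

def pipeline_diagram(n_insts):
    rows = {s: {} for s in STAGES}
    for i in range(1, n_insts + 1):
        for s, stage in enumerate(STAGES):
            rows[stage][i + s] = i          # instruction i in stage s at cycle i+s
    n_cycles = n_insts + len(STAGES) - 1
    header = "     " + " ".join(f"{c:>2}" for c in range(1, n_cycles + 1))
    lines = [header]
    for stage in STAGES:
        cells = " ".join(f"{rows[stage].get(c, ''):>2}" for c in range(1, n_cycles + 1))
        lines.append(f"{stage:<5}{cells}")
    return "\n".join(lines)

print(pipeline_diagram(3))
```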
Example

[1]: add r1, r2, r3
[2]: sub r4, r2, r5
[3]: mul r5, r8, r9

Clock cycles  1  2  3  4  5  6  7
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

90
Data Hazards
[1]: add r1, r2, r3
[2]: sub r3, r1, r4

Clock cycles  1  2  3  4  5  6
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

 Instruction [2] reads r1 in the OF stage (cycle 3), before instruction [1] writes it in the RW stage (cycle 5) → instruction [2] will read incorrect values !!!

91
Data Hazard
Definition: A hazard is defined as the possibility of erroneous execution of an
instruction in a pipeline. A data hazard represents the possibility of erroneous
execution because of the unavailability of data, or the availability of incorrect
data.

* This situation represents a data hazard


* Specifically, it is a RAW (read after write) hazard
* The earliest we can dispatch instruction [2] to the EX stage is cycle 5

92
Other Types of Data Hazards
* Our pipeline is in-order
Definition: In an in-order pipeline (such as ours), a preceding instruction is
always ahead of a succeeding instruction in the pipeline. Modern processors
however use out-of-order pipelines that break this rule. It is possible for later
instructions to execute before earlier instructions.

* We will only have RAW hazards in our pipeline.


* Out-of-order pipelines can have WAR and WAW
hazards

93
WAW Hazards & WAR Hazards
WAW hazard :
[1]: add r1, r2, r3
[2]: sub r1, r4, r3
* Instruction [2] cannot write the value of r1 before instruction [1] writes to it → a WAW hazard

WAR hazard :
[1]: add r1, r2, r3
[2]: add r2, r5, r6
* Instruction [2] cannot write the value of r2 before instruction [1] reads it → a WAR hazard

94
Control Hazards

[1]: beq .foo


[2]: mov r1, 4
[3]: add r2, r4, r3
...
...
.foo:
[100]: add r4, r1, r2

* If the branch is taken, instructions [2] and [3] might get fetched incorrectly

95
Control Hazard –
Pipeline Diagram Clock cycles

[1]: beq .foo
[2]: mov r1, 4
[3]: add r2, r4, r3

Clock cycles  1  2  3  4  5  6  7
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

* The two instructions fetched immediately


after a branch instruction might have been
fetched incorrectly.

96
Control Hazards
* The two instructions fetched immediately after a branch
instruction might have been fetched incorrectly.
* These instructions are said to be on the wrong path
* A control hazard represents the possibility of erroneous
execution in a pipeline because instructions in the wrong
path of a branch can possibly get executed and save their
results in memory, or in the register file

97
Structural Hazards
* A structural hazard may occur when two instructions have a
conflict on the same set of resources in a cycle
* Example :
* Assume that we have an add instruction that can read
one operand from memory
* add r1, r2, 10[r3]

* This code will have a structural hazard


[1]: st r4, 20[r5]
[2]: sub r8, r9, r10
[3]: add r1, r2, 10[r3]
* [3] tries to read 10[r3] (MA unit) in cycle 4
* [1] tries to write to 20[r5] (MA unit) in cycle 4
* Does not happen in our pipeline
98
Structural Hazards - II

[Figure: pipeline diagram of the structural hazard example above]

99
Solutions in Software
* Data hazards
1.Insert nop instructions, reorder code
Before :
[1]: add r1, r2, r3
[2]: sub r3, r1, r4

After :
[1]: add r1, r2, r3
[2]: nop
[3]: nop
[4]: nop
[5]: sub r3, r1, r4

100
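The nop-insertion fix can be sketched as a small pass over the instruction stream. The tuple encoding (dest, src1, src2) and the rule that a consumer must be at least 4 instructions behind its producer (3 intervening slots, as in this 5-stage pipeline without forwarding) are simplifying assumptions for illustration:

```python
# Insert nops so that every consumer is far enough behind its producer.
# Instructions are (dest, src1, src2) register-number tuples.

NOP = (None, None, None)
MIN_DIST = 4   # producer -> consumer distance that needs no stall

def insert_nops(program):
    out = []
    for inst in program:
        dest, src1, src2 = inst
        # find the nearest earlier producer of one of our sources
        for back, prev in enumerate(reversed(out), start=1):
            if prev[0] is not None and prev[0] in (src1, src2):
                for _ in range(MIN_DIST - back):   # pad up to the safe distance
                    out.append(NOP)
                break
        out.append(inst)
    return out

# add r1, r2, r3 ; sub r3, r1, r4  ->  3 nops in between
prog = [(1, 2, 3), (3, 1, 4)]
print(insert_nops(prog))
```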
2. Code Reordering

Original code :
add r1, r2, r3
add r4, r1, 3
add r8, r5, r6
add r9, r8, r5
add r10, r11, r12
add r13, r10, 2

Reordered code :
add r1, r2, r3
add r8, r5, r6
add r10, r11, r12
nop
add r4, r1, 3
add r9, r8, r5
add r13, r10, 2

To reduce the number of nop instructions :
* The programmer can modify the code, or
* The compiler can reorder it
* Instruction-scheduling algorithms in the compiler perform this reordering
* Explore some of these compiler algorithms for this task !!

101
Control Hazards—Delayed branch

* Trivial Solution : Add two nop


instructions after every branch
* Better solution :
* Assume that the two instructions fetched
after a branch are valid instructions
* These instructions are said to be in the delay
slots
* Such a branch is known as a delayed branch

102
3. Delay Slots
Before :
add r1, r2, r3
add r4, r5, r6
b .foo
add r8, r9, r10

After :
b .foo
add r1, r2, r3
add r4, r5, r6
add r8, r9, r10

 The compiler transfers instructions before the


branch to the delay slots.

 If it cannot find 2 valid instructions, it inserts nops.

103
Hardware solution for
pipeline hazards

Pipeline with Interlocks

104
Why interlocks ?
 We cannot always trust the compiler to do a good
job, or even introduce nop instructions correctly.
 Compilers now need to be tailored to specific
hardware.
 We should ideally not expose the details of the
pipeline to the compiler (might be confidential
also)
 Hardware mechanism to enforce correctness →
interlock

105
Two kinds of Interlocks
* Data-Lock
* Do not allow a consumer instruction to move beyond
the OF stage till it has read the correct values.
Implication : Stall the IF and OF stages.

* Branch-Lock
* We never execute instructions in the wrong
path.
* The hardware needs to ensure both
these conditions.

106
Comparison between Software and
Hardware

Attribute    | Software                                                      | Hardware (with interlocks)
-------------|---------------------------------------------------------------|---------------------------
Portability  | Limited to a specific processor                               | Programs can be run on any processor, irrespective of the nature of the pipeline
Branches     | Possible to have no performance penalty, by using delay slots | Need to stall the pipeline for 2 cycles in our design
RAW hazards  | Possible to eliminate them through code scheduling            | Need to stall the pipeline
Performance  | Highly dependent on the nature of the program                 | The basic version of a pipeline with interlocks is expected to be slower than the version that relies on software

107
Conceptual Look at Pipeline with
Interlocks

[1]: add r1, r2, r3


[2]: sub r4, r1, r2

* We have a RAW hazard


* We need to stall instruction [2] at the OF stage for 3 cycles
* We need to keep sending nop instructions to the EX stage during these 3 cycles

108
Example

[1]: add r1, r2, r3
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2  2  2  2
EX                  1  b  b  b  2
MA                     1  b  b  b  2
RW                        1  b  b  b  2

(b → bubble)

109
A Pipeline Bubble
* A pipeline bubble is inserted into a
stage, when the previous stage
needs to be stalled
* It is a nop instruction
* To insert a bubble
* Create a nop instruction packet
* OR, Mark a designated bubble bit to 1

110
Bubbles in the Case of a Branch
Instruction

[1]: beq .foo
[2]: add r1, r2, r3
[3]: sub r4, r5, r6
....
.foo:
[4]: add r8, r9, r10

Clock cycles  1  2  3  4  5  6  7  8
IF            1  2  3  4
OF               1  2  b  4
EX                  1  b  b  4
MA                     1  b  b  4
RW                        1  b  b  4

(b → bubble; [2] and [3] are converted to bubbles when the branch is taken in the EX stage in cycle 3)

111
Control Hazards and Bubbles
* We know that an instruction is a branch in
the OF stage
* When it reaches the EX stage and the
branch is taken, let us convert the
instructions in the IF, and OF stages to
bubbles
* Ensures the branch-lock condition

112
Ensuring the Data-Lock Condition

* When an instruction reaches the OF


stage, check if it has a conflict with
any of the instructions in the EX, MA,
and RW stages
* If there is no conflict, nothing needs
to be done
* Otherwise, stall the pipeline (IF and
OF stages only)

113
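The data-lock check above can be sketched as a simple register-conflict test; the dict-based instruction encoding is an assumption for illustration:

```python
# Stall decision for the data-lock condition: when an instruction is in the
# OF stage, stall if any instruction in the EX, MA, or RW stage writes a
# register that the OF-stage instruction reads.

def has_conflict(of_inst, later_insts):
    """later_insts: the instructions currently in the EX, MA, and RW stages."""
    reads = set(of_inst.get("src", []))
    for inst in later_insts:
        if inst is not None and inst.get("dest") in reads:
            return True   # stall the IF and OF stages, insert a bubble
    return False

of = {"op": "sub", "dest": 4, "src": [1, 2]}   # sub r4, r1, r2
ex = {"op": "add", "dest": 1, "src": [2, 3]}   # add r1, r2, r3
print(has_conflict(of, [ex, None, None]))      # True -> stall
```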
How to Stall a
Pipeline ?
* Disable the write functionality of :
* The IF-OF register
* and the Program Counter (PC)
* To insert a bubble
* Write a bubble (nop instruction) into the OF-EX
register

114
Data Path with Interlocks (Data-
Lock)
[Figure: datapath with interlocks (data-lock) — the data-lock unit compares the instruction in the OF stage with those in the EX, MA, and RW stages; on a conflict it stalls the PC and the IF-OF latch, and inserts a bubble into the OF-EX latch]
115
Ensuring the Branch-Lock Condition

* Option 1 :
* Use delay slots (interlocks not required)
* Option 2 :
* Convert the instructions in the IF, and OF
stages, to bubbles once a branch instruction
reaches the EX stage.
* Start fetching from the next PC (not taken) or
the branch target (taken)

116
Ensuring the Branch-Lock Condition
- II

* Option 3
* If the branch instruction in the EX stage is taken,
then invalidate the instructions in the IF and OF
stages. Start fetching from the branch target.
* Otherwise, do not take any special action
* This method is also called predict not-taken (we shall use this method because it is more efficient than option 2)

117
Data Path with Interlocks

[Figure: datapath with interlocks — the branch-lock unit uses isBranchTaken from the EX stage to convert the instructions in the IF-OF and OF-EX latches into bubbles; the data-lock unit stalls the IF and OF stages]
118
Measuring Performance
* What do we mean by the
performance of a processor ?
* ANSWER : Almost nothing
* What should we ask instead ?
* What is the performance with respect to a
given program or a set of programs ?
* Performance is inversely proportional to the
time it takes to execute a program

122
Computing the Time a Program Takes

τ = (#insts × CPI) / f seconds

* CPI → Cycles per instruction
* f → frequency (cycles per second)

123
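The formula in code, with illustrative numbers:

```python
# Execution time = (#insts x CPI) / f seconds.

def exec_time_s(n_insts, cpi, freq_hz):
    return (n_insts * cpi) / freq_hz

# e.g. 10^9 dynamic instructions, CPI = 1.5, 2 GHz clock (illustrative)
print(exec_time_s(1e9, 1.5, 2e9))   # 0.75 seconds
```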
The Performance Equation

P ∝ (IPC × f) / #insts

* IPC → 1/CPI (Instructions per Cycle)


* What are the units of performance ?
* ANSWER : arbitrary

124
Number of Instructions (#insts)

Static instruction: the binary (executable) of a program contains a list of static instructions.

Dynamic instruction: a dynamic instruction is a running instance of a static instruction, created by the processor when the instruction enters the pipeline.

* Note that #insts counts dynamic instructions
  * NOT static instructions

* A smart compiler can reduce the number of executed instructions

125
Number of Instructions(#insts) – 2

* Dead code removal


* Often programmers write code that does not
determine the final output
* This code is redundant
* It can be identified and removed by the compiler

* Function inlining
* Very small functions have a lot of overhead → call,
ret instructions, register spilling, and restoring
* Paste the code of the callee in the code of the caller
(known as inlining)

126
Computing the CPI
* CPI for a single cycle processor = 1
* CPI for an ideal pipeline(no hazards)
* Assume we have n instructions, and k stages
* The first instruction enters the pipeline in cycle 1
* It leaves the pipeline in cycle k
* The rest of the (n-1) instructions leave in the next (n-1) consecutive cycles

CPI = (n + k − 1) / n

127
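A quick check of the ideal-pipeline CPI formula:

```python
# CPI of an ideal k-stage pipeline running n instructions: (n + k - 1) / n.

def ideal_cpi(n, k):
    return (n + k - 1) / n

print(ideal_cpi(5, 5))       # 1.8 for a tiny program (9 cycles / 5 insts)
print(ideal_cpi(10**6, 5))   # ~1.000004: CPI tends to 1 as n grows
```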
Computing the Maximum Frequency
* Let the maximum amount of time that it takes to execute any instruction be :
* tmax (also known as algorithmic work)

* In the case of a pipeline, let us assume that all the pipeline stages are balanced

* Time per stage → tmax / k


* Let the latch delay be l

* We thus have :

t_stage = (tmax / k) + l

1/f = (tmax / k) + l

The minimum cycle time (1/f) is equal to t_stage. Let us thus assume that our cycle time is as low as possible.

128
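The cycle-time relation in code, with illustrative numbers for tmax and l:

```python
# Cycle time and maximum frequency of a balanced k-stage pipeline:
# t_stage = tmax/k + l, and f = 1 / t_stage.

def max_freq_ghz(tmax_ns, k, latch_ns):
    t_stage = tmax_ns / k + latch_ns   # per-stage delay in ns
    return 1.0 / t_stage               # frequency in GHz (1 / ns)

# Illustrative: 10 ns of algorithmic work, 0.2 ns latch delay
print(max_freq_ghz(10.0, 1, 0.2))   # single cycle: 1/10.2 ~ 0.098 GHz
print(max_freq_ghz(10.0, 5, 0.2))   # 5 stages:     1/2.2  ~ 0.45 GHz
```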
Performance of an Ideal Pipeline
* Let us assume that the number of instructions is a constant

P = f / CPI

129
Optimal Number of Pipeline Stages

* Setting ∂P/∂k = 0 yields the optimal number of pipeline stages, k
* k is inversely proportional to √(latch delay, l)
* k is proportional to √(algorithmic work, tmax)
* As we increase the latch delay, we should have fewer pipeline stages
* We need to minimise the time wasted in accessing latches

* As we increase the amount of algorithmic work, we require more pipeline


stages for ideal performance
* More pipeline stages help distribute the work better, and increase the overlap across
instructions
* As the number of instructions tends to ∞, the number of ideal pipeline stages also
tends to ∞

130
A Non-Ideal Pipeline
* Our ideal CPI is 1 (CPI_ideal = 1)
* However, in reality, we have stalls

CPI = CPI_ideal + stall rate × stall penalty


* Let us assume that the stall rate is a function of the program, and its nature of
dependences

* Let us assume that the stall penalty is proportional to the number of


pipeline stages
* Both these assumptions are strictly not correct. They are being used
to make a coarse grained mathematical model.
* CPI = (n+k-1)/n + rck
* r → stall rate, c → constant of proportionality

131
Mathematical Model
P = f / CPI

132
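The whole coarse model can be swept numerically. All parameter values below are illustrative assumptions, chosen only to show the shape of the curve; the sweep confirms that the optimum sits near √(tmax / (r·c·l)):

```python
# P = f / CPI, with 1/f = tmax/k + l and CPI = (n + k - 1)/n + r*c*k.

def perf(k, n=10**6, tmax=10.0, l=0.2, r=0.2, c=0.5):
    f = 1.0 / (tmax / k + l)             # frequency from the stage delay
    cpi = (n + k - 1) / n + r * c * k    # ideal CPI plus the stall term
    return f / cpi

best_k = max(range(1, 51), key=perf)
print(best_k)   # 22 with these numbers; sqrt(10 / (0.2*0.5*0.2)) ~ 22.4
```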
Implications
* For programs with a lot of dependences (high value of r) → use fewer pipeline stages
* For a pipeline with forwarding → c is smaller (than a pipeline
that just has interlocks)
* It requires a larger number of pipeline stages for optimal performance

* The optimal number of pipeline stages is directly proportional to √(tmax / l)
* This explains why the number of pipeline stages has remained more or less constant for the last 5-10 years: this ratio is not changing significantly across years

133
Example
Example Consider two programs that have the following characteristics.

Program 1 :
  loads : 0.4, branches : 0.2, ratio (taken branches) : 0.5
Program 2 :
  loads : 0.3, branches : 0.1, ratio (taken branches) : 0.4

135
Example
CPI = CPI_ideal + stall rate × stall penalty

136
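One way to finish this example in code, using CPI = CPI_ideal + stall rate × stall penalty. The penalties are assumptions for illustration (the excerpt does not state them): every load stalls for 1 cycle, and every taken branch costs 2 cycles (the branch penalty of this design):

```python
# CPI of a program given its instruction mix and assumed stall penalties.

def cpi(load_frac, branch_frac, taken_ratio,
        load_penalty=1, branch_penalty=2, cpi_ideal=1.0):
    return (cpi_ideal
            + load_frac * load_penalty                   # load stalls
            + branch_frac * taken_ratio * branch_penalty)  # taken-branch stalls

print(cpi(0.4, 0.2, 0.5))   # Program 1: ~1.6
print(cpi(0.3, 0.1, 0.4))   # Program 2: ~1.38
```

With these assumed penalties, Program 2 has the lower CPI.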
Performance, Architecture, Compiler

[Diagram: P depends on f and IPC — f is determined by the manufacturing technology and the architecture; IPC by the compiler and the architecture]
• Manufacturing technology affects the speed of transistors, and in turn
the speed of combinational logic blocks, and latches.
• Transistors are steadily getting smaller and faster.
• Consequently, the total algorithmic work (tmax) and the latch delay (l) are also steadily reducing.
• Hence, it is possible to run processors at higher frequencies leading to
improvements in performance.
• Manufacturing technology exclusively affects the frequency at which
we can run a processor.
• It does not have any effect on the IPC, or the number of instructions.

140
Constraints for pipelining
• Note that the overall picture is not as simple as we described
• We need to consider power and complexity issues also.
• Typically, implementing a pipeline beyond 20 stages is very difficult
because of the increase in complexity.
• Secondly, most modern processors have severe power and temperature
constraints.
• This problem is also known as the power wall.
• It is often not possible to ramp up the frequency, because we cannot
afford the increase in power consumption.
• As a thumb rule, power increases as the cube of frequency. Hence,
increasing the frequency by 10% increases the power consumption by
more than 30%, which is prohibitively large.
• Designers are thus increasingly avoiding deeply pipelined designs that
run at very high frequencies.

141
Consider a pipelined processor with the following four
stages, Instruction Fetch: IF, Instruction decode and
Operand Fetch: ID, EX: Execute, WB: Write Back
The IF,ID and WB stages take one clock cycle each to
complete the operation. The number of clock cycles of
the EX stage depends on the instruction. The ADD and
SUB instructions need 1 clock cycle and the MUL
instruction need 3 clock cycles in the EX stage. Operand
forwarding is used in the pipelined processor. What is the
number of clock cycles taken to complete the following
sequence of operations?
ADD R2,R1,R0
MUL R4,R3,R2
SUB R6,R5,R4

143
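A small scheduler can check the answer. It is a simplification of the question's pipeline: with operand forwarding, a dependent instruction enters EX one cycle after its producer leaves EX, and the EX unit is occupied for the instruction's full EX latency:

```python
# Walk instructions through the 4-stage pipeline (IF, ID, EX, WB) with
# forwarding from the output of EX. Each instruction is
# (ex_cycles, dest, srcs); returns the cycle in which the last WB happens.

def schedule(insts):
    ready = {}      # register -> cycle in which its value can be forwarded
    ex_free = 0     # last cycle in which the EX unit is busy
    finish = 0
    for i, (ex_cycles, dest, srcs) in enumerate(insts):
        id_c = i + 2                                     # IF in cycle i+1, ID next
        ex_start = max(id_c + 1, ex_free + 1,
                       max((ready[s] + 1 for s in srcs if s in ready), default=0))
        ex_end = ex_start + ex_cycles - 1
        ready[dest] = ex_end
        ex_free = ex_end
        finish = ex_end + 1                              # WB one cycle after EX
    return finish

prog = [(1, "R2", ["R1", "R0"]),    # ADD R2, R1, R0
        (3, "R4", ["R3", "R2"]),    # MUL R4, R3, R2
        (1, "R6", ["R5", "R4"])]    # SUB R6, R5, R4
print(schedule(prog))               # 8 cycles
```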
A processor executes instructions without
pipelining in 10 cycles per instruction. A
pipelined version of the processor splits
execution into 5 stages, each taking 2
cycles. Calculate the speedup for executing
100 instructions on the pipelined processor
compared to the non-pipelined processor.
Assume no stalls or hazards.

156
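A sketch of the calculation, assuming the pipelined machine completes the program in (k + n − 1) stage slots of 2 cycles each (no stalls, as stated):

```python
# Speedup = non-pipelined time / pipelined time.

def speedup(n, nonpipe_cpi=10, k=5, stage_cycles=2):
    t_nonpipe = n * nonpipe_cpi             # 100 x 10  = 1000 cycles
    t_pipe = (k + n - 1) * stage_cycles     # (5+99) x 2 = 208 cycles
    return t_nonpipe / t_pipe

print(speedup(100))   # 1000 / 208 ~ 4.81
```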
A pipelined processor has 6 stages, each
taking 2 ns. Due to hazards and stalls, only
80% of the pipeline is utilized.
 a) What is the effective throughput in
instructions per second?
 b) What would the throughput be if the
pipeline were fully utilized?

158
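The throughput calculation in code: a fully utilized pipeline completes one instruction per stage time, scaled by the utilization factor.

```python
# Throughput of a pipeline with a given stage time and utilization.

def throughput_mips(stage_time_ns, utilization=1.0):
    per_sec = utilization * 1e9 / stage_time_ns   # instructions per second
    return per_sec / 1e6                          # in millions (MIPS)

print(throughput_mips(2, 0.8))   # a) effective:      400.0 MIPS
print(throughput_mips(2, 1.0))   # b) fully utilized: 500.0 MIPS
```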
Thank you

[email protected]

162
