
Computer Architecture

Fall 2024

Lecture 02: Performance & RISC-V ISA

[Adapted from Computer Organization and Design, P&H, UCB; Computer Architecture, Jie Zhang
Guangyu Sun, PKU; Computer Architecture, Myoungsoo Jung, KAIST]
Performance

Peking University
Performance Metrics
Ø Purchasing perspective
• given a collection of machines, which has the
• best performance ?
• least cost ?
• best cost/performance?
Ø Design perspective
• faced with design options, which has the
• best performance improvement ?
• least cost ?
• best cost/performance?
Ø Both require
• basis for comparison
• metric for evaluation
Ø Our goal is to understand what factors in the architecture contribute to overall system
performance and the relative importance (and cost) of these factors

Defining (Speed) Performance
Ø Normally interested in reducing
• Response time (aka execution time) – the time between the start and the completion
of a task
• Important to individual users
• Thus, to maximize performance, need to minimize execution time

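If X is n times faster than Y, then

$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X} = n$$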

• Throughput – the total amount of work done in a given time

• Important to data center managers
• Decreasing response time almost always improves throughput

Machine Clock Rate
Ø Clock rate (MHz, GHz) is inverse of clock cycle time (clock period)
CC = 1 / CR

one clock period

10 nsec clock cycle => 100 MHz clock rate

5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate

Performance
Ø Two common measures
• Latency (how long to do X)
• Also called response time and execution time
• Throughput (how often can it do X)
Ø Example of car assembly line
• Takes 6 hours to make a car (latency is 6 hours per car)
• A car leaves every 5 minutes (throughput is 12 cars per hour)
• Overlap results in Throughput > 1/Latency

Clock Cycles per Instruction
Ø Not all instructions take the same amount of time to execute
• One way to think about execution time is that it equals the number of instructions
executed multiplied by the average time per instruction

• Clock cycles per instruction (CPI) – the average number of clock cycles each
instruction takes to execute
• A way to compare two different implementations of the same ISA

CPI for each instruction class:

Instruction class   A   B   C
CPI                 1   2   3

Effective CPI
Ø Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging

$$\text{Overall effective CPI} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)$$

• Where IC_i is the count (percentage) of the number of instructions of class i executed
• CPI_i is the (average) number of clock cycles per instruction for that instruction class
• n is the number of instruction classes

Ø The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs

CPU Performance Equation (1)
CPU time = CPU Clock Cycles × Clock cycle time

CPU time = Instruction Count × Cycles Per Instruction × Clock cycle time

What determines each factor:
• Instruction Count: ISA, Compiler Technology
• Cycles Per Instruction: Organization, ISA
• Clock cycle time: Hardware Technology, Organization

A.K.A. The “iron law” of performance

Car Analogy
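Ø Drive from SIT to Sinchon
• “Clock Speed” = 3500 RPM
• “CPI” = 5250 rotations/km or 0.19 m/rot
• “Insts” = 60 km

$$60\ \text{km} \times 5250\ \frac{\text{rotations}}{\text{km}} \times \frac{1}{3500}\ \frac{\text{min}}{\text{rotation}} = 90\ \text{minutes}$$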

CPU Version
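Ø Program takes 33 billion instructions to run
Ø CPU processes instructions at 2 cycles per inst
Ø Clock speed of 3 GHz

$$33 \times 10^{9}\ \text{insts} \times 2\ \frac{\text{cycles}}{\text{inst}} \times \frac{1}{3 \times 10^{9}}\ \frac{\text{s}}{\text{cycle}} = 22\ \text{seconds}$$

• Sometimes clock cycle time given instead (ex. cycle = 333 ps)
• IPC sometimes used instead of CPI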
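
The car trip and the CPU run are the same three-factor product, which a few lines of Python can check. This is a minimal sketch; the function name `cpu_time_seconds` and the variable names are assumptions made for illustration, not part of the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time


# CPU version: 33 billion instructions, 2 cycles per instruction, 3 GHz clock.
cpu_seconds = cpu_time_seconds(33e9, 2, 3e9)

# Car analogy: 60 km at 5250 rotations/km, with 3500 rotations per minute.
trip_minutes = 60 * 5250 / 3500

print(f"CPU time: {cpu_seconds:.1f} s, trip time: {trip_minutes:.0f} min")
```

Passing a cycle time instead of a clock rate only changes the last factor: multiply by the cycle time directly rather than by its inverse.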

CPU Performance Equation (2)
CPU time = CPU Clock Cycles × Clock cycle time

Comparing Performance
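Ø “X is n times faster than Y”

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Ø “Throughput of X is n times that of Y”

$$\frac{\text{Throughput}_X}{\text{Throughput}_Y} = n$$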

A Simple Example
Op       Freq   CPI_i   Freq × CPI_i   Load = 2 cycles   Branch = 1 cycle   Two ALU at once
ALU      50%    1       .5             .5                .5                 .25
Load     20%    5       1.0            .4                1.0                1.0
Store    10%    3       .3             .3                .3                 .3
Branch   20%    2       .4             .4                .2                 .4
                        ∑ = 2.2        1.6               2.0                1.95

• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
  CPU time new = 1.6 × IC × CC, so 2.2/1.6 means 37.5% faster
• How does this compare with using branch prediction to shave a cycle off the branch time?
  CPU time new = 2.0 × IC × CC, so 2.2/2.0 means 10% faster
• What if two ALU instructions could be executed at once?
  CPU time new = 1.95 × IC × CC, so 2.2/1.95 means 12.8% faster
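
The table can be recomputed mechanically. The sketch below uses the frequencies and cycle counts from the table; the dictionary layout and the name `effective_cpi` are assumptions for illustration, and "two ALU instructions at once" is modelled as an ALU CPI of 0.5.

```python
def effective_cpi(mix):
    """Sum of (frequency x CPI) over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


# (frequency, CPI) per instruction class, taken from the table.
base = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}

variants = {
    "load in 2 cycles": {**base, "Load": (0.2, 2)},
    "branch in 1 cycle": {**base, "Branch": (0.2, 1)},
    "two ALU at once": {**base, "ALU": (0.5, 0.5)},
}

base_cpi = effective_cpi(base)
for name, mix in variants.items():
    cpi = effective_cpi(mix)
    # IC and CC are unchanged, so the speedup is just the ratio of CPIs.
    gain = 100 * (base_cpi / cpi - 1)
    print(f"{name}: CPI {cpi:.2f}, {gain:.1f}% faster")
```

Because instruction count and clock cycle time cancel, only the CPI ratio matters in this comparison.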
If Only it Were That Simple
Ø “X is n times faster than Y on A”

$$\frac{\text{Execution time}(Y, A)}{\text{Execution time}(X, A)} = n$$

Ø But what about different applications (or even parts of the same application)?
• X is 10 times faster than Y on A, and 1.5 times on B, but Y is 2 times faster than X on C, and 3 times on D, and…
Ø Which would you buy?
Ø So does X have better performance than Y?

Summarizing Performance
Ø Arithmetic mean
• Average execution time
• Gives more weight to longer-running programs
Ø Weighted arithmetic mean
• More important programs can be emphasized
• But what do we use as weights?
• Different weights will make different machines look better

Normalizing & the Geometric Mean
Ø Speedup of arithmetic means != arithmetic mean of speedups
Ø Use geometric mean:

$$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{Normalized execution time on program } i}$$

Ø Neat property of the geometric mean: Consistent whatever the reference machine
Ø Do not use the arithmetic mean for normalized execution times
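
A small experiment makes the consistency property concrete. The execution times below are hypothetical numbers chosen for illustration, and the helper names are assumptions; the point is only that the arithmetic mean of normalized times flips its verdict with the reference machine, while the geometric mean does not.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


# Hypothetical execution times (seconds) of two programs on machines X and Y.
times = {"X": [1.0, 1000.0], "Y": [10.0, 200.0]}


def normalized(machine, reference):
    """Execution times of `machine` divided by those of `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


# Arithmetic mean: each machine looks slower when the other is the reference.
am_y_vs_x = arithmetic_mean(normalized("Y", "X"))
am_x_vs_y = arithmetic_mean(normalized("X", "Y"))

# Geometric mean: the two references give reciprocal, hence consistent, answers.
gm_y_vs_x = geometric_mean(normalized("Y", "X"))
gm_x_vs_y = geometric_mean(normalized("X", "Y"))

print(f"AM  Y/X = {am_y_vs_x:.2f}, X/Y = {am_x_vs_y:.2f}")
print(f"GM  Y/X = {gm_y_vs_x:.3f}, X/Y = {gm_x_vs_y:.3f}")
```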

CPI/IPC
Ø Often when making comparisons in comp-arch studies
• Program (or set of) is the same for two CPUs
• The clock speed is the same for two CPUs
Ø So we can just directly compare CPIs, and often we use IPCs

Average CPI vs. “Average” IPC

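$$\text{Average CPI} = \frac{\text{CPI}_1 + \text{CPI}_2 + \dots + \text{CPI}_N}{N}$$

$$\text{Average IPC} = \frac{\text{IPC}_1 + \text{IPC}_2 + \dots + \text{IPC}_N}{N} \neq \frac{1}{\text{Average CPI}}$$

Ø Must use Harmonic Mean to remain proportional to runtime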

Harmonic Mean

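Ø What in the world is this?

– Average of inverse relationships

$$\text{H.M.}(x_1, x_2, \dots, x_N) = \frac{N}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_N}}$$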

A.M.(CPI) vs. H.M.(IPC)
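
The relationship named in this slide title can be checked numerically: the harmonic mean of the IPCs equals the inverse of the arithmetic mean of the CPIs, while the arithmetic mean of the IPCs does not. The CPI values below are hypothetical, and the sketch assumes every program executes the same number of instructions.

```python
def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    """N divided by the sum of the inverses."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical per-program CPIs, assuming equal instruction counts.
cpis = [1.0, 2.0, 4.0]
ipcs = [1.0 / cpi for cpi in cpis]

average_cpi = arithmetic_mean(cpis)
am_ipc = arithmetic_mean(ipcs)
hm_ipc = harmonic_mean(ipcs)

print(f"1 / A.M.(CPI) = {1 / average_cpi:.4f}")
print(f"H.M.(IPC)     = {hm_ipc:.4f}")
print(f"A.M.(IPC)     = {am_ipc:.4f}")
```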

Amdahl’s Law (1)

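Ø What if enhancement does not enhance everything?

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - f) + \frac{f}{s}}$$

where f is the fraction of the original execution time that the enhancement affects and s is the speedup of that fraction.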

Amdahl’s Law (2)
Ø Make the Common Case Fast

Important: Principle of locality

Approx. 90% of the time spent in 10% of the code

Amdahl’s Law (3)
Ø Diminishing Returns

Yet Another Car Analogy

Ø Driving from SIT to Sinchon Campus
• you’ve got a “Turbo” for your car, but can only use it on the highway
Ø Highway to city border (50 km)
• avg. speed of 100 km/h
• avg. speed of 200 km/h with Turbo
Ø City border → Sinchon Campus (10 km)
• stuck in bad rush hour traffic
• avg. speed of 5 km/h

Turbo gives 100% speedup across 83% of the distance …
… but only results in a 10% reduction on total trip time (which is a 11.1% speedup)
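
The trip numbers are an instance of Amdahl's Law, where the enhanced fraction is measured in time rather than distance. The sketch below is one way to verify them; the function name `amdahl_speedup` and the variable names are assumptions for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the original time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


highway_hours = 50 / 100            # 50 km at 100 km/h
city_hours = 10 / 5                 # 10 km at 5 km/h
total_hours = highway_hours + city_hours

# The Turbo covers 83% of the distance but only this share of the time.
time_fraction = highway_hours / total_hours

turbo_speedup = amdahl_speedup(time_fraction, 2.0)
turbo_hours = 50 / 200 + city_hours

# Diminishing returns: even an infinitely fast highway leg is bounded.
upper_bound = amdahl_speedup(time_fraction, float("inf"))

print(f"time fraction on highway: {time_fraction:.0%}")
print(f"speedup with Turbo: {turbo_speedup:.3f}")
print(f"trip time reduction: {1 - turbo_hours / total_hours:.0%}")
print(f"best possible speedup: {upper_bound:.2f}")
```

The upper bound shows why the common case matters: the slow city leg dominates no matter how fast the highway leg becomes.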

Now Consider Price-Performance
Ø Without Turbo
• Car costs $8,000 to manufacture
• Selling price is $12,000 → $4K profit per car
• If we sell 10,000 cars, that’s $40M in profit
Ø With Turbo
• Car costs extra $3,000
• Selling price is $16,000 → $5K profit per car
• But only a few gear heads buy the car:
• We only sell 400 cars and make $2M in profit

CPU Design is Similar
Ø What does it cost me to add some performance enhancement?
Ø How much effective performance do I get out of it?
• 100% speedup for small fraction of time wasn’t a big win for the car example
Ø How much more do I have to charge for it?
• Extra development, testing, marketing costs
Ø How much more can I charge for it?
• Does the market even care?
Ø How does the price change affect volume?

Summary: Evaluating ISAs
Ø Design-time metrics:
• Can it be implemented, in how long, at what cost?
• Can it be programmed? Ease of compilation?
Ø Static Metrics:
• How many bytes does the program occupy in memory?
Ø Dynamic Metrics:
• How many instructions are executed? How many bytes does the processor fetch
to execute the program?
• How many clocks are required per instruction?
• How "lean" a clock is practical?
Best Metric: Time to execute the program!
It depends on the instruction set, the processor organization, and compilation techniques.

RISC-V ISA

Abstraction Hierarchy
Application Software

Operating System (threads, files, exceptions)

Computer Architecture (instruction set)

Micro-Architecture (execution pipeline)

Logic (adders, multipliers, FSMs)

Digital Circuits (gates)

Analog Circuits (amplifiers)

Process and Devices (MOSFET transistors)

Physics (electrons, holes, tunneling, band gap)

Source: Alex Bronstein
Instruction Set Architecture (ISA)
Ø Indicating which resources (of processor) are needed
Ø Explain how instructions can be encoded as a bitstream
Ø Software: how to program a machine
• Processor executes instructions as a stream (in a sequence)
Ø Hardware: what needs to be built
• Use a wide spectrum of techniques to make CPU faster

Levels of Program Code
• High Level Language (C++, Python, Java): x = a+b;
• Assembly Language (ARM, MIPS, x86): add R1, R2, R3
• Machine Language: 1001 1110 0110 1010

[Figure: Application, Compiler, and OS (software) sit above the Instruction Set Architecture, the abstraction layer; CPU Design with Processor, Memory, and I/O system (hardware) sits below it]

For a given level of function, however, that
system is best in which one can specify
things with the most simplicity and
straightforwardness. … Simplicity and
straightforwardness proceed from
conceptual integrity. … Ease of use, then,
dictates unity of design, conceptual
integrity.
The Mythical Man-Month

The Instruction Set: a Critical Interface

ØProperties of a good abstraction:

• Lasts through many generations (portability)
• Used in many ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels

Definitions
Ø The words of a computer’s language are called instructions, and its vocabulary is called an instruction set.
Ø The similarity of instruction sets occurs because all computers are constructed from hardware technologies based on similar underlying principles and because there are a few basic operations that all computers must provide.
Ø Computer designers have a common goal: to find a language that makes it easy to build the hardware and the compiler while maximizing performance and minimizing cost and energy.
What is ISA (Instruction Set Architecture)?

(vonNeumann) Processor Organization
Ø Control needs to
1. Bring input instructions from Memory
2. Issue signals to control the information flow between the Datapath components and to control what operations they perform
3. Manage instruction sequencing

Ø Datapath needs to have the
• Components – the functional units and storage (e.g., register file) needed to execute instructions
• Interconnects – components connected so that the instructions can be accomplished and so that data can be loaded from and stored to Memory

[Figure: Memory and the Fetch, Decode, Execute, (Store) cycle]

History of ISA designs
Ø Long ago, resources were limited:
• Memory: very expensive and very small capacity
• Most programmers worked in assembly languages
Ø CISC (Complex Instruction Set Computer)
✓ Dense instruction size (1~15 Bytes)
✓ Programmer friendly
• Complexity: almost 1 new instruction per month
• Hardware unfriendly: compiler, registers, state machines
[Figure: IBM 1st hard disk (5MB in total)]
e.g., a = a * b
CISC assembly instruction:
MULT 2:3, 5:2

RISC
ØRISC (Reduced Instruction Set Computer) philosophy
• fixed instruction lengths
• load-store instruction sets
• limited addressing modes
• limited operations
e.g., a = a * b
CISC assembly instruction:
MULT 2:3, 5:2
RISC assembly instructions:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
ARM, RISC-V, MIPS, Sun SPARC, HP PA-RISC, IBM PowerPC, Intel (Compaq) Alpha, …

RISC: Past, Present, and Future
PC Era
Ø Hardware translates x86 instructions into internal RISC instructions
Ø Then use any RISC technique inside MPU
Ø > 350M / year !
Ø x86 ISA eventually dominates servers as well as desktops

PostPC Era: Client/Cloud
• IP in SoC vs. MPU
• Value die area, energy as much as performance
• > 20B total / year in 2017
• x86 in PCs peaks in 2011, now decline ~8% / year (2016 < 2007)
• x86 servers ⇒ Cloud ~10M servers total* (0.05% of 20B)
• 99% Processors today are RISC

Source: P&H Turing Award Talk; * “A Decade of Mobile Computing”, Vijay Reddi, 7/21/17, Computer Architecture Today

What is RISC‐V
• RISC-V (pronounced “risk-five”) is an ISA standard
– An open instruction set architecture (ISA) based on reduced instruction set computing (RISC) principles
– There was RISC-I, II, III, IV before
• Most ISAs: x86, ARM, Power, MIPS, SPARC
– Commercially protected by patents
– Preventing practical efforts to reproduce the computer systems
• RISC-V is open
– Permitting any person or group to construct compatible computers
– Use associated software
• Originated in 2010 by researchers at UC Berkeley
– Krste Asanović, David Patterson and students
• In 2017, version 2 of the user-level ISA was fixed
– User-Level ISA Specification v2.2
– Draft Compressed ISA Specification v1.79
– Draft Privileged ISA Specification v1.10

Goals in Defining RISC‐V
• A completely open ISA that is freely available to academia and industry
• An ISA that avoids "over-architecting" for
– a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-
order) or
– implementation technology (e.g., full-custom, ASIC, FPGA), but which allows
efficient implementation in any of these
• RISC-V ISA includes
– A small base integer ISA, usable by itself as a base for customized accelerators or
for educational purposes, and
– Optional standard extensions, to support general-purpose software development
– Optional custom extensions
• Support for the revised 2008 IEEE-754 floating-point standard

RISC‐V ISA Principles
• Generally kept very simple and extendable
• Separated into multiple specifications
– User-Level ISA spec (compute instructions)
– Compressed ISA spec (16-bit instructions)
– Privileged ISA spec (supervisor-mode instructions)
– More …

• ISA support is given by RV + word-width + extensions supported

– E.g. RV32I means 32-bit RISC-V with support for the I(nteger) instruction set

User Level ISA
• Defines the normal instructions needed for computation
– A mandatory Base integer ISA
• I: Integer instructions:
– ALU
– Branches/jumps
– Loads/stores
– Standard Extensions
• M: Integer Multiplication and Division
• A: Atomic Instructions
• F: Single‐Precision Floating‐Point
• D: Double‐Precision Floating‐Point
• C: Compressed Instructions (16 bit)
• G = IMAFD: Integer base + four standard extensions
– Optional extensions
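
The naming rule (RV + word width + extension letters) is regular enough to parse mechanically. The sketch below is illustrative only: the function name `parse_isa_name` is an assumption, it accepts only the widths 32 and 64, and it knows only the extension letters listed on this slide.

```python
import re

G_EXPANSION = "IMAFD"            # from the slide: G = IMAFD
KNOWN_LETTERS = set("IMAFDC")    # only the extensions listed on the slide


def parse_isa_name(name):
    """Split a name such as 'RV64GC' into (word width, extension letters)."""
    match = re.fullmatch(r"RV(32|64)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognised ISA name: {name!r}")
    width = int(match.group(1))
    letters = match.group(2).replace("G", G_EXPANSION)
    unknown = sorted(set(letters) - KNOWN_LETTERS)
    if unknown:
        raise ValueError(f"extensions not covered by this sketch: {unknown}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return width, list(dict.fromkeys(letters))


print(parse_isa_name("RV32I"))
print(parse_isa_name("RV64GC"))
```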

RISC‐V Instruction Set Architecture (ISA)
Ø Instruction Categories
• Arithmetic, Logical, Shift
• Data transfer
• Un-/Conditional branch
Ø Registers: x0 - x31, PC

6 Instruction Formats: all 32 bits wide

Fields are listed from bit 31 down to bit 0, with widths in bits:
R format:   funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
I format:   immediate (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
S format:   imm[11:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:0] (5) | opcode (7)
SB format:  imm[12,10:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:1,11] (5) | opcode (7)
U format:   immediate[31:12] (20) | rd (5) | opcode (7)
UJ format:  immediate[20,10:1,11,19:12] (20) | rd (5) | opcode (7)

Question review
Ø “Weird” orders of immediate

[Figure labels: Sign bit, imm[10:5], imm[19:12], imm[4:1]]

https://five-embeddev.com/riscv-isa-manual/latest/a.html#

Issues to be Explored
ØInstruction Types
• R, I, S, SB, U and UJ
ØHow to identify/encode these instructions?
ØHow to process data?
• Data types supported
• Where to store data?
• Addressing methods

RISC‐V Arithmetic Instructions
ØRISC-V assembly language arithmetic statement
add x5, x6, x7
sub x5, x6, x7
Ø Each arithmetic instruction performs 1 operation
Ø Each arithmetic instruction fits in 32 bits and specifies exactly 3 operands

destination ← source1 op source2

Ø Operand order is fixed (destination first)
Ø Those operands are all contained in the Register File (x5, x6, x7)

Aside: RISC‐V Register Convention
Name        Register Number   Usage
x0          0                 the constant value 0
x1 (ra)     1                 return address (link register)
x2 (sp)     2                 stack pointer
x3 (gp)     3                 global pointer
x4 (tp)     4                 thread pointer
x5 - x7     5-7               temporaries
x8 - x9     8-9               frame pointer/saved
x10 - x17   10-17             arguments/results
x18 - x27   18-27             saved
x28 - x31   28-31             temporaries

RISC‐V Register File
Ø Holds thirty-two 64-bit registers
• How many read ports? 2
• How many write ports? 1
[Figure: Register File with 32 locations of 64 bits; inputs src1 addr, src2 addr, dst addr, write data, and write control; outputs src1 data and src2 data]
Ø Registers are
• Faster than other memory levels
- But register files with more locations are slower (e.g., a 64-word file could be as much as 50% slower than a 32-word file)
- Read/write port increase impacts speed quadratically
• Easier for a compiler to use
- e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order vs. stack
• Can hold variables so that
- code density improves (since registers are named with fewer bits than a memory location)

Machine Language ‐ Add Instruction
Instructions, like words of data, are 32 bits long (the registers themselves are 64 bits wide)
Arithmetic Instruction Format (R format):
add x5, x6, x7

funct7 rs2 rs1 funct3 rd opcode

7 5 5 3 5 7

opcode   7-bit opcode that specifies the operation
rs1      5-bit register file address of the first source operand
rs2      5-bit register file address of the second source operand
rd       5-bit register file address of the result’s destination
funct3   3-bit function code augmenting the opcode
funct7   7-bit function code augmenting the opcode
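
Packing the six fields in order gives the 32-bit machine word. The sketch below encodes `add x5, x6, x7` and `sub x5, x6, x7`; the helper name `encode_r` is an assumption, and the funct7/funct3 values (0000000/000 for add, 0100000/000 for sub) come from the RISC-V base ISA rather than from this slide.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack R-format fields, most significant field first, into one word."""
    fields = [(funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


OPCODE_OP = 0b0110011  # register-register arithmetic

add_word = encode_r(0b0000000, 7, 6, 0b000, 5, OPCODE_OP)  # add x5, x6, x7
sub_word = encode_r(0b0100000, 7, 6, 0b000, 5, OPCODE_OP)  # sub x5, x6, x7

print(f"add x5, x6, x7 -> {add_word:032b} (0x{add_word:08X})")
print(f"sub x5, x6, x7 -> {sub_word:032b} (0x{sub_word:08X})")
```

The two words differ in a single funct7 bit, which is how one opcode can cover several operations.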

RISC‐V Memory Access Instructions
• RISC-V has two basic data transfer instructions for accessing memory
ld x5, 24(x6) #load doubleword from memory
sd x5, 24(x6) #store doubleword to memory
• The data is loaded into (ld) or stored from (sd) a register in the register file – a 5-bit address
• The memory address – a 64-bit address – is formed by adding the contents of the base address register to the offset value
• The offset is a 12-bit field, meaning access is limited to memory locations within a region of ±2^8 or 256 doublewords (±2^11 or 2048 bytes) of the address in the base register
• Note that the offset can be positive or negative

Machine Language ‐ Load & Store Instruction
Load Instruction Format (I format):
ld x5, 24(x6)

immediate rs1 funct3 rd opcode I format

[Figure labels: x5, x6, “Omit the MSB”, “32-bit addresses”]

Machine Language ‐ Load & Store Instruction
Load Instruction Format (I format):
ld x5, 24(x6)

immediate rs1 funct3 rd opcode I format

Store Instruction Format (S format):

sd x5, 24(x6)

imm[11:5] rs2 rs1 funct3 imm[4:0] opcode S format
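
The difference between the two formats is where the 12-bit offset lives: contiguous in the I format, split around the register fields in the S format. The sketch below encodes the two instructions shown above; the helper names are assumptions, and funct3 = 011 for doubleword accesses comes from the RISC-V base ISA rather than from this slide.

```python
def check_imm12(imm):
    """The offset is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError(f"offset {imm} does not fit in 12 signed bits")
    return imm & 0xFFF  # two's-complement bit pattern


def encode_i(imm, rs1, funct3, rd, opcode):
    """I format: immediate[11:0] | rs1 | funct3 | rd | opcode."""
    return (check_imm12(imm) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_s(imm, rs2, rs1, funct3, opcode):
    """S format: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    bits = check_imm12(imm)
    upper = bits >> 5      # imm[11:5]
    lower = bits & 0x1F    # imm[4:0]
    return (upper << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (lower << 7) | opcode


ld_word = encode_i(24, rs1=6, funct3=0b011, rd=5, opcode=0b0000011)   # ld x5, 24(x6)
sd_word = encode_s(24, rs2=5, rs1=6, funct3=0b011, opcode=0b0100011)  # sd x5, 24(x6)

print(f"ld x5, 24(x6) -> 0x{ld_word:08X}")
print(f"sd x5, 24(x6) -> 0x{sd_word:08X}")
```

Splitting the store immediate keeps rs1 and rs2 in the same bit positions as in the R format, which simplifies register-file decoding.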

Byte Addresses

, RISC-V

Loading and Storing Bytes
ØRISC-V provides special instructions to move bytes
lb x5, 40(x6) #load byte from memory
sb x5, 40(x6) #store byte to memory

lb immediate rs1 funct3 rd opcode

Ø What 8 bits get loaded and stored?

• load byte places the byte from memory in the least significant 8 bits of the destination register
- what happens to the other bits in the register?
• store byte takes the byte from the least significant 8 bits of a register and writes it to a byte in memory
- what happens to the other bits in the memory word?
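
The two questions can be answered with a small model. In RISC-V, lb sign-extends the loaded byte to fill the register, and sb changes only the addressed byte in memory. The sketch below models memory as a Python `bytearray`; the function names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def load_byte(memory, address):
    """Model lb: read one byte and sign-extend it to 64 bits."""
    value = memory[address]
    if value & 0x80:        # sign bit of the byte is set
        value -= 0x100      # reinterpret as a negative number
    return value & MASK64   # 64-bit register contents


def store_byte(memory, address, register_value):
    """Model sb: write only the low 8 bits; neighbouring bytes are untouched."""
    memory[address] = register_value & 0xFF


memory = bytearray([0x11, 0x22, 0x33, 0x44])
store_byte(memory, 1, 0x1234567890ABCDEF)   # only 0xEF is written
x5 = load_byte(memory, 1)                   # upper 56 bits copy the sign bit

print(memory.hex(), f"0x{x5:016X}")
```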

Immediate Instructions
Ø Small constants are used often in typical code
Ø Possible approaches?
• put “typical constants” in memory and load them
• create hard-wired registers (like x0) for constants like 1
• have special instructions that contain constants !

addi sp, sp, 4 #sp = sp + 4

• Machine format (I format):

immediate rs1 funct3 rd opcode I format

How About Larger Constants?
We'd also like to be able to load a long constant into a register. For this we have a new
"load upper immediate" instruction
lui x5, 34534

immediate[31:12] rd opcode U format

lui loads a 20-bit constant into bits 12 through 31 of a register. The most significant 32
bits are filled with copies of bit 31, and the least significant 12 bits are filled with zeros.
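
The lui behaviour described above, together with a following addi, can be modelled in a few lines. This is a sketch of the register-level effect on a 64-bit machine; the function names are assumptions, and the second example (building 0x12345678) is an illustrative constant rather than one from the slide.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def lui(imm20):
    """Place a 20-bit constant in bits 31:12, zero bits 11:0, sign-extend from bit 31."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("lui takes a 20-bit constant")
    return sign_extend(imm20 << 12, 32) & MASK64


def addi(register_value, imm12):
    """Add a signed 12-bit immediate, wrapping at 64 bits."""
    if not -2048 <= imm12 <= 2047:
        raise ValueError("addi takes a signed 12-bit immediate")
    return (register_value + imm12) & MASK64


x5 = lui(34534)                        # lui x5, 34534
constant = addi(lui(0x12345), 0x678)   # lui + addi builds a 32-bit constant
negative = lui(0x80000)                # bit 31 set, so the upper 32 bits become ones

print(f"0x{x5:016X} 0x{constant:016X} 0x{negative:016X}")
```

When the low 12 bits of the target constant have bit 11 set, addi subtracts instead of adds, so the upper part has to be adjusted by one; the sketch does not automate that step.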

RISC‐V Control Flow Instructions
ØRISC-V conditional branch instructions:
beq rs1, rs2, L1 #go to L1 if rs1==rs2
bne rs1, rs2, L1 #go to L1 if rs1!=rs2

Ex: if (i==j) h = i + j;
bne x22, x23, L1
add x19, x22, x23
L1: ...

Ø Instruction Format (SB format):

imm[12,10:5] rs2 rs1 funct3 imm[4:1,11] opcode

Ø How is the branch destination address specified?

Specifying Branch Destinations
Ø Use a register (like in ld and sd) added to the 12-bit offset
• which register? Instruction Address Register (PC)
• its use is automatically implied by instruction
• PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
• limits the branch distance to -2^10 to +2^10-1 words from the branch instruction (in RISC-V the offset is relative to the address of the branch itself), but most branches are local anyway

imm[12,10:5] rs2 rs1 funct3 imm[4:1,11] opcode

[Figure: one adder computes PC + 4, a second adder computes the branch destination address, and a multiplexer selects the next PC]
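
The scrambled immediate order of the SB format is easiest to follow in code. The sketch below encodes a branch from a byte offset and decodes it again; the helper names are assumptions, and the branch opcode 1100011 and funct3 = 001 for bne are taken from the RISC-V base ISA. The example is the `bne x22, x23, L1` from the earlier slide, where L1 is two instructions (8 bytes) past the branch.

```python
OPCODE_BRANCH = 0b1100011
FUNCT3_BNE = 0b001


def encode_branch(offset, rs1, rs2, funct3):
    """SB format: imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within -4096..4094 bytes")
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is always 0 and not stored
    return (
        (((imm >> 12) & 0x1) << 31)
        | (((imm >> 5) & 0x3F) << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (((imm >> 1) & 0xF) << 8)
        | (((imm >> 11) & 0x1) << 7)
        | OPCODE_BRANCH
    )


def decode_branch_offset(word):
    """Reassemble the byte offset from the scattered immediate bits."""
    imm = (
        (((word >> 31) & 0x1) << 12)
        | (((word >> 7) & 0x1) << 11)
        | (((word >> 25) & 0x3F) << 5)
        | (((word >> 8) & 0xF) << 1)
    )
    return imm - 0x2000 if imm & 0x1000 else imm


bne_word = encode_branch(8, rs1=22, rs2=23, funct3=FUNCT3_BNE)  # bne x22, x23, +8
print(f"0x{bne_word:08X} -> offset {decode_branch_offset(bne_word)}")
```

The legal range of -4096 to +4094 bytes is the same as the -2^10 to +2^10-1 words quoted above.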

Instructions for Accessing Procedures
RISC-V procedure call instruction:
jal x1, ProcedureAddress #jump and link

Saves PC+4 in register x1 to have a link to the next instruction for the procedure return
Machine format (UJ format):

immediate[20,10:1,11,19:12] rd opcode
Return instruction:
jalr x0, 0(x1) #jump and link register

Question: how about unconditional jump/branch?

Answer: jal x0, Label

Unconditional Branch
Use register x0 to help
jal x0, Label #unconditionally branch to Label

A long jump with jalr

lui x5, 32423
jalr x1, 1342(x5) #offset must fit in 12 signed bits (-2048 to 2047)

Instructions for Accessing Procedures
Recall how a function works in the programming language:
1. Parameters;
2. Reserve caller’s info;
3. Global variables;

Name        Register Number   Usage                            Preserve on call?
x0          0                 the constant value 0             n.a.
x1 (ra)     1                 return address (link register)   caller
x2 (sp)     2                 stack pointer                    callee
x3 (gp)     3                 global pointer                   --
x4 (tp)     4                 thread pointer                   --
x5 - x7     5-7               temporaries                      caller
x8 - x9     8-9               frame pointer/saved              callee
x10 - x17   10-17             arguments/results                caller
x18 - x27   18-27             saved                            callee
x28 - x31   28-31             temporaries                      caller

Spilling Registers
What if the callee needs more registers? What if the procedure is recursive?
• uses a stack – a last-in-first-out queue – in memory for passing additional values or saving (recursive) return address(es)

Ø One of the general registers, sp (x2), is used to address the stack (which “grows” from high address to low address)
• add data onto the stack – push
  sp = sp – 4 (sp – 8 when pushing a 64-bit doubleword)
  data on stack at new sp
• remove data from the stack – pop
  data from stack at sp
  sp = sp + 4 (sp + 8 for a doubleword)

[Figure: memory drawn from high addr to low addr, with sp pointing at the top of stack]

Spilling Registers (cont.)
What does memory allocation look like?

Argument reg.
Return address
Saved reg.

Instructions for Synchronization
• Synchronization primitive:
• A simple lock: the value 0 is used to indicate that the lock is free
and 1 is used to indicate that the lock is unavailable
• Hardware: atomic exchange or atomic swap
• Recall: compare-and-swap (C&S) primitive in CISC

Instructions for Synchronization (cont.)
C&S is a complex instruction, so RISC-V uses two R-type instructions to replace it:
lr.d x5, (x6) # x5 = Memory[x6]
sc.d x7, x5, (x6) # Memory[x6] = x5; x7=0/1
• lr.d: load-reserved doubleword;
• sc.d: store-conditional doubleword;
• Function: if the contents of the memory location specified by the load-reserved are
changed before the store-conditional to the same address occurs, then the store-
conditional fails and does not write the value to memory.
• Example of atomic exchange between x23 and (x20):

Instructions for Atomic Memory Updates
The pair of synchronization instructions is used to achieve atomic memory
updates without locking.
lr.d x5, (x6) # x5 = Memory[x6]
sc.d x7, x5, (x6) # Memory[x6] = x5; x7=0/1

In this example, lr.d (load reserved) will load the value stored at Memory[x6] into
register x5, then you can modify it however you like there.

sc.d (store conditional) will overwrite Memory[x6] with your modified value in x5, only
if Memory[x6] has not been altered while you were working on the copy in x5.

x7 indicates whether sc.d is successful (0 on success, nonzero on failure).
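
The retry pattern built from this pair can be sketched as a small simulation. Everything here is a simplified model under stated assumptions: the `Memory` class, the one-address-per-hart reservation, and the `interfere` hook (which lets a second hart store between the two instructions) are hypothetical and only illustrate the semantics described above.

```python
class Memory:
    """Shared memory that tracks one reserved address per hart."""

    def __init__(self):
        self.cells = {}
        self.reservations = {}  # hart id -> reserved address

    def load_reserved(self, hart, address):
        """Model lr.d: read the value and register a reservation."""
        self.reservations[hart] = address
        return self.cells.get(address, 0)

    def store(self, hart, address, value):
        """Ordinary store: breaks other harts' reservations on this address."""
        self.cells[address] = value
        for other, reserved in list(self.reservations.items()):
            if other != hart and reserved == address:
                del self.reservations[other]

    def store_conditional(self, hart, address, value):
        """Model sc.d: return 0 on success, 1 on failure."""
        success = self.reservations.pop(hart, None) == address
        if success:
            self.store(hart, address, value)
        return 0 if success else 1


def atomic_exchange(memory, hart, address, new_value, interfere=None):
    """Retry loop: lr.d, then sc.d, and go around again if sc.d reports failure."""
    while True:
        old_value = memory.load_reserved(hart, address)
        if interfere is not None:   # hypothetical hook: a competing store
            interfere()
            interfere = None
        if memory.store_conditional(hart, address, new_value) == 0:
            return old_value


LOCK = 0x1000  # hypothetical address of a lock word (0 = free, 1 = taken)
mem = Memory()
mem.cells[LOCK] = 0
previous = atomic_exchange(mem, hart=0, address=LOCK, new_value=1)
print(previous, mem.cells[LOCK])
```

A caller that gets back 0 has acquired the lock; a caller that gets back 1 found it already taken and would retry later.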

Question review
Ø Edge case study of instructions LR.D, SC.D
Ø Case 1: LR/SC addresses don’t match – can this succeed?
lr.w t0,(a0)
sc.w t1,a1,(a3)
Ø Case 2: unbalanced LR.D, SC.D
lr.w t0,(a0)
sc.w t1,a1,(a0)
addi a1,a1,1
sc.w t2,a1,(a0)
Ø Case 3: multiple LRs, SCs from one core
lr.w t0,(a0)
lr.w t1,(a2)
sc.w t2,a1,(a0)
sc.w t3,a1,(a2)

Note that:
• the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written.
• Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart.

https://five-embeddev.com/riscv-isa-manual/latest/a.html#
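
The three cases can be traced with a toy model that applies the two rules in the note. This is a sketch under simplifying assumptions: the hypothetical `Hart` class keeps a single reserved address, treats a0, a2, and a3 as distinct addresses in distinct reservation sets, and ignores other harts. Real hardware may use larger reservation sets, so Case 1 in particular could behave differently on an implementation where both addresses fall in the same set.

```python
class Hart:
    """One hart with a single-address reservation (a simplifying assumption)."""

    def __init__(self):
        self.memory = {}
        self.reservation = None

    def lr(self, address):
        """A new LR replaces any earlier reservation."""
        self.reservation = address
        return self.memory.get(address, 0)

    def sc(self, address, value):
        """Return 0 on success and 1 on failure; always drops the reservation."""
        success = self.reservation == address
        self.reservation = None
        if success:
            self.memory[address] = value
        return 0 if success else 1


A0, A2, A3 = 0x100, 0x200, 0x300  # hypothetical, distinct addresses


def case1():
    hart = Hart()
    hart.lr(A0)
    return [hart.sc(A3, 5)]


def case2():
    hart = Hart()
    hart.lr(A0)
    first = hart.sc(A0, 5)
    second = hart.sc(A0, 6)   # no reservation left after the first sc
    return [first, second]


def case3():
    hart = Hart()
    hart.lr(A0)
    hart.lr(A2)               # most recent reservation is now A2
    return [hart.sc(A0, 5), hart.sc(A2, 5)]


print("case 1:", case1())
print("case 2:", case2())
print("case 3:", case3())
```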
RISC‐V ISA So Far
Category                          Instr               Op Code   Example           Meaning
Arithmetic (R & I format)         add                 0110011   add x5, x6, x7    x5 = x6 + x7
                                  subtract            0110011   sub x5, x6, x7    x5 = x6 - x7
                                  add immediate       0010011   addi x5, x6, 20   x5 = x6 + 20
                                  or immediate        0010011   ori x5, x6, 20    x5 = x6 | 20
Data Transfer (I, S & U format)   load double word    0000011   ld x5, 40(x6)     x5 = Memory[x6 + 40]
                                  store double word   0100011   sd x5, 40(x6)     Memory[x6 + 40] = x5
                                  load byte           0000011   lb x5, 40(x6)     x5(7:0) = Memory[x6 + 40](7:0)
                                  store byte          0100011   sb x5, 40(x6)     Memory[x6 + 40](7:0) = x5(7:0)
                                  load upper imm      0110111   lui x5, 0x12345   x5 = 0x12345000
Cond. Branch (SB format)          br on equal         1100011   beq x5, x6, 100   if (x5 == x6) go to PC+100
                                  br on not equal     1100011   bne x5, x6, 100   if (x5 != x6) go to PC+100
Jump (UJ and I format)            jump and link       1101111   jal x1, imm       x1 = PC+4; PC = PC+{imm,1’b0}
                                  jump and link reg.  1100111   jalr x0, 0(x1)    PC = x1 + 0

Addressing modes
• Register addressing – operand is in a register

funct7 rs2 rs1 funct3 rd opcode
[Figure: the register fields select a doubleword operand in the Register file]

• Base (displacement) addressing – operand is at the memory location whose address is the sum of a register and a 12-bit constant contained within the instruction

immediate rs1 funct3 rd opcode
[Figure: base register + immediate addresses a doubleword, word, halfword, or byte operand in Memory]

Addressing modes (cont.)
• Immediate addressing – operand is a 12-bit constant contained within the
instruction

immediate rs1 funct3 rd opcode

• PC-relative addressing – instruction address is the sum of the PC and a 12-bit constant contained within the instruction

imm rs2 rs1 funct3 imm opcode
[Figure: Program Counter (PC) + immediate addresses the branch destination instruction in Memory]

RISC‐V Design Principles
ØSimplicity favors regularity
• fixed size instructions – 32-bits
• small number of instruction formats
ØGood design demands good compromises
• six instruction formats
ØSmaller is faster
• limited instruction set
• limited number of registers in register file
• limited number of addressing modes
ØMake the common case fast
• arithmetic operands from the register file (load-store machine)
• allow instructions to contain immediate operands

Computer Organization and Architecture (AT70.01)
29 pages
Advanced Computer Architecture
No ratings yet
Advanced Computer Architecture
18 pages
Digital Microscope: Instruction Manual
No ratings yet
Digital Microscope: Instruction Manual
72 pages
Thesis On Mobile Cloud Computing
100% (2)
Thesis On Mobile Cloud Computing
5 pages
Lecture Ch4 Performance
No ratings yet
Lecture Ch4 Performance
25 pages
Huawei Imanager Neteco Software Brochure
No ratings yet
Huawei Imanager Neteco Software Brochure
6 pages
DetailedFormHelpDoc OTR
No ratings yet
DetailedFormHelpDoc OTR
16 pages
Worksheet in TLE 6-Week 9
100% (1)
Worksheet in TLE 6-Week 9
2 pages
Library Management System: Supervisor
No ratings yet
Library Management System: Supervisor
21 pages
CSS Stylesheets
No ratings yet
CSS Stylesheets
31 pages
Develop Your Own Rat
No ratings yet
Develop Your Own Rat
85 pages
IGNOU Operating System Previous Years Solved Papers
From Everand
IGNOU Operating System Previous Years Solved Papers
Manish Soni
No ratings yet
PPDS OSS Restriction Maintenance For Model Mix Planning SAPAPO RET2
No ratings yet
PPDS OSS Restriction Maintenance For Model Mix Planning SAPAPO RET2
2 pages
Group 2 - Security - Comelec Data Theft
No ratings yet
Group 2 - Security - Comelec Data Theft
20 pages
Studio One 6 - Release Notes
No ratings yet
Studio One 6 - Release Notes
10 pages
Database Engineering Summary of Coursework-1
No ratings yet
Database Engineering Summary of Coursework-1
4 pages
HPE6 A79 Demo
No ratings yet
HPE6 A79 Demo
8 pages
Susan Diebel Lms
No ratings yet
Susan Diebel Lms
14 pages
2nd Generation Computers
No ratings yet
2nd Generation Computers
3 pages
Genrate QR Code in EBS Custom Report - P
No ratings yet
Genrate QR Code in EBS Custom Report - P
8 pages
STK - 32-Bit Starter Kit Quick Guide - Eng
No ratings yet
STK - 32-Bit Starter Kit Quick Guide - Eng
23 pages
SLT Form Two
No ratings yet
SLT Form Two
5 pages
VBOX 3 Data Sheet
No ratings yet
VBOX 3 Data Sheet
2 pages
Concert Band 9 10
No ratings yet
Concert Band 9 10
1 page
Attachment
No ratings yet
Attachment
1 page


