L06 - RISCVII (Revised)
Computer Architecture
Intro to RISC-V I
Instructors:
Siting Liu & Chundong Wang
Course website: https://toast-lab.sist.shanghaitech.edu.cn/courses/CS110@ShanghaiTech/Spring-2023/index.html
School of Information Science and Technology (SIST)
ShanghaiTech University
2023/2/6
Course Info
• Lab 3 will be released after class (10 a.m.), get yourself prepared
before going to lab sessions!
• Our project 1.1 will be available this weekend, and will be marked
in lab sessions. Deadline March 13th.
• Next week discussion on RISC-V related materials and assembly
coding.
Assembly Instructions
• Different types of instructions
• I-type
• Register-Immediate type
• Has two operands (one accessed from source register, another a constant/
immediate) and one output (saved to destination register)
• Can do arithmetic/logic/load from main memory/jump (covered later)
RV32I I-type Arithmetic
• Syntax of instructions: assembly language
• Addition: addi rd, rs1, imm
  Adds imm to rs1 and stores the result to rd; imm is a signed number.
• Example: addi x5, x4, 10
           addi x6, x4, -10
• Similarly, andi/ori/xori/slti/sltiu
• All the imm’s are sign-extended
• slli/srli/srai are special (defined in RV64I), and can be extended to RV32I usage (RTFM)
Registers:
    x0/zero  0
    x1       0x12340000
    x2       0x00006789
    x3       0xFFFFFFFF
    x4       0x3
    x5
    x6
    x7
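The sign extension of the 12-bit immediate can be sketched in C. This is a minimal model rather than a definitive implementation: the helper names sext12 and addi are hypothetical, and the immediate is passed as the raw 12-bit field value.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a raw 12-bit immediate field (bits 11..0) to 32 bits. */
int32_t sext12(uint32_t imm12)
{
    assert(imm12 <= 0xFFFu);                 /* only 12 bits may be set */
    /* Flip bit 11, then subtract its weight: maps 0x800..0xFFF to -2048..-1. */
    return (int32_t)(imm12 ^ 0x800u) - 0x800;
}

/* Model of "addi rd, rs1, imm": returns the value written to rd. */
uint32_t addi(uint32_t rs1, uint32_t imm12)
{
    return rs1 + (uint32_t)sext12(imm12);    /* wraps modulo 2^32 */
}
```

With x4 = 0x3 as in the register table, addi(3, 10) models addi x5, x4, 10 and yields 13, while addi(3, 0xFF6) models addi x6, x4, -10 and yields 0xFFFFFFF9 (that is, -7).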
RV32I Arithmetic/Logic Test
• addi x1, x2, -1
• or x2, x2, x1
• add x3, x1, x2
• slt x4, x3, x1
• sra x5, x3, x4
• sub x0, x5, x4
Registers: x0/zero, x1, x2, x3, x4, x5, x6, x7 all hold 0
• Register zero (x0) is ‘hard-wired’ to 0;
• By convention RISC-V has a specific no-op instruction...
  – add x0 x0 x0
  – You may need to replace code later: No-ops can fill space, align data, and perform other options
  – Practical use in jump-and-link operations (covered later)
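One way to check the test sequence by hand is a small C model of the register file. This is only a sketch: RegFile, reg_write, sra32, and run_test are hypothetical names, all registers are assumed to start at 0 as in the register table, and the signed comparison assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x[32]; } RegFile;

/* Write a register; writes to x0 are discarded because x0 is hard-wired to 0. */
void reg_write(RegFile *rf, unsigned rd, uint32_t value)
{
    assert(rd < 32);
    if (rd != 0) rf->x[rd] = value;
}

/* Arithmetic right shift built from a logical shift plus manual sign fill. */
uint32_t sra32(uint32_t v, uint32_t shamt)
{
    shamt &= 31;                              /* only the low 5 bits are used */
    uint32_t r = v >> shamt;
    if ((v & 0x80000000u) && shamt) r |= ~(0xFFFFFFFFu >> shamt);
    return r;
}

/* Runs the six instructions from the slide, starting from all-zero registers. */
void run_test(RegFile *rf)
{
    uint32_t *x = rf->x;
    for (int i = 0; i < 32; i++) x[i] = 0;
    reg_write(rf, 1, x[2] + (uint32_t)-1);            /* addi x1, x2, -1 */
    reg_write(rf, 2, x[2] | x[1]);                    /* or   x2, x2, x1 */
    reg_write(rf, 3, x[1] + x[2]);                    /* add  x3, x1, x2 */
    reg_write(rf, 4, (int32_t)x[3] < (int32_t)x[1]);  /* slt  x4, x3, x1 */
    reg_write(rf, 5, sra32(x[3], x[4]));              /* sra  x5, x3, x4 */
    reg_write(rf, 0, x[5] - x[4]);                    /* sub  x0, x5, x4 */
}
```

Under these assumptions the model ends with x1 = 0xFFFFFFFF, x2 = 0xFFFFFFFF, x3 = 0xFFFFFFFE, x4 = 1, x5 = 0xFFFFFFFF, and x0 still 0, because the final sub targets x0.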
RV32I I-type Load
[Figure: processor (control; datapath with PC and registers) connected to memory through Enable?, Read/Write, Address, and Write Data signals, plus input. Memory holds the program and data as bytes: a much larger place to hold values, but slower than registers!]
Examples | Examples
Names in the West (e.g. Siting, Liu) | Names in China (e.g. LIU Siting)
Java Packages: (e.g. org.mypackage.HelloWorld) | Internet names (e.g. sist.shanghaitech.edu.cn)
Dates done correctly ISO 8601 YYYY-MM-DD (e.g. 2020-03-22) | Dates written in England DD/MM/YYYY (e.g. 22/03/2020)
Eating Pizza crust first | Eating Pizza skinny part first (the normal way)
Unix file structure (e.g., /usr/local/bin/python) |
"Network Byte Order": most network protocols | CANopen
IBM z/Architecture; very old Macs | Intel x86; RISC-V (can also support big-endian)
• lw rd, imm(rs1) : Load word at addr. to register rd
  addr. = (number in rs1) + imm, where rs1 is the base and imm is the offset
• Example: lw x1, 12(x4)
  addr. = 4 + 12 = (10)HEX
[Figure: registers x0/zero = 0, x1 = 0x12340000, x2 = 0x00006789, x3 = 0xFFFFFFFF, x4 = 0x4, x5 to x7 empty; main memory drawn as rows of 4 bytes at byte addresses 0, 4, 8, c, …, with rows alternating between 56 34 23 01 and 34 12 cd ab]
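The address calculation and little-endian byte order of lw can be sketched in C as follows. The function name load_word and the memory contents used in the usage note are assumptions for illustration; they are not taken from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "lw rd, imm(rs1)" on a little-endian, byte-addressed memory.
 * Returns the 32-bit value that would be written to rd. */
uint32_t load_word(const uint8_t *mem, size_t mem_size, uint32_t rs1, int32_t imm)
{
    uint32_t addr = rs1 + (uint32_t)imm;      /* addr = base + offset */
    assert((size_t)addr + 4 <= mem_size);     /* stay inside the modelled memory */
    /* Little-endian: the byte at the lowest address is the least significant. */
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

For example, if bytes 0xab, 0xcd, 0x12, 0x34 are stored at addresses 0x10 to 0x13 (an assumed layout), then load_word(mem, size, 4, 12) computes address 4 + 12 = 0x10 and returns 0x3412cdab.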
Assembly Instructions—Load
• RV32I is a load-store architecture, where only load and store instructions
access memory and arithmetic instructions only operate on CPU registers.
Question! What’s in x12?
D: 0xBC
E: 0xFFFFFF85
F: 0xFFFFFFF8
G: 0xFFFFFFC
H: 0xFFFFFFBC
Summary
• RISC-V is little-endian
• Load-store architecture
CS 110
Computer Architecture
Intro to RISC-V II
Computer Decision Making
Computer Decision Making—Branch
• Normal operation: execute instructions in sequence
• In C: if/while/for-statement; function call
• RISC-V provides conditional branch (B-type) & unconditional jump (j)
• C code
    int main(void) {
      int i=5;
      if (i!=6){
        i++;
      }
      else i--;
      return 0;
    }
• Assembly
      addi x2, x0, 5
      addi x3, x0, 6
      bne x2, x3, L1
      beq x2, x3, L2
    L1: addi x2, x2, 1
      ret (kind of jump)
    L2: addi x2, x2, -1
      ret
• Label can also point to data (more in discussion)
Computer Decision Making—Branch
• Example: beq rs1, rs2, L(imm/label)
• C code
    int main(void) {
      int i=5;
      if (i!=6){
        i++;
      }
      else i--;
      return 0;
    }
• Assembly (real stuff in ARM64)
      mov w8, #5
    Ltmp3:
      .loc 1 10 9 is_stmt 0
      subs w8, w8, #6
      b.eq LBB0_2
      b LBB0_1
    LBB0_1:
    Ltmp4:
      .loc 1 11 10 is_stmt 1
      ldr w8, [sp, #8]
      add w8, w8, #1
      str w8, [sp, #8]
      .loc 1 12 5
      b LBB0_3
    Ltmp5:
    LBB0_2:
      .loc 1 13 11
      ldr w8, [sp, #8]
      subs w8, w8, #1
      str w8, [sp, #8]
      b LBB0_3
    Ltmp6:
    LBB0_3:
      .loc 1 0 11 is_stmt 0
      mov w0, #0
      .loc 1 14 5 is_stmt 1
      add sp, sp, #16
      ret
Computer Decision Making—Branch
• Normal operation: execute instructions in sequence
• In programming languages: if/while/for-statement
• RISC-V provides conditional branch & unconditional jump
C Loop Mapped to RISC-V Assembly
    int A[20];
    int sum = 0;
    for (int i=0; i < 20; i++)
      sum += A[i];

    # Assume x8 holds pointer to A
    # Assign x10=sum
      add x9, x8, x0      # x9=&A[0]
      add x10, x0, x0     # sum=0
      add x11, x0, x0     # i=0
      addi x13, x0, 20    # x13=20
    Loop:
      bge x11, x13, Done
      lw x12, 0(x9)       # x12=A[i]
      add x10, x10, x12   # sum+=
      addi x9, x9, 4      # &A[i+1]
      addi x11, x11, 1    # i++
      j Loop
    Done:
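To see how each assembly instruction corresponds to the C loop, the same control flow can be written in C with explicit labels and goto. This is an illustrative sketch only: the function name sum_array and the parameter n (20 on the slide) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the assembly loop: p plays x9, sum x10, i x11, limit x13. */
int sum_array(const int *A, int n)
{
    assert(A != NULL && n >= 0);
    const int *p = A;              /* add  x9, x8, x0   : x9 = &A[0] */
    int sum = 0;                   /* add  x10, x0, x0  : sum = 0    */
    int i = 0;                     /* add  x11, x0, x0  : i = 0      */
    int limit = n;                 /* addi x13, x0, 20  : loop bound */
loop:
    if (i >= limit) goto done;     /* bge  x11, x13, Done */
    sum += *p;                     /* lw x12, 0(x9); add x10, x10, x12 */
    p += 1;                        /* addi x9, x9, 4 (one int is 4 bytes on RV32) */
    i += 1;                        /* addi x11, x11, 1 */
    goto loop;                     /* j Loop */
done:
    return sum;
}
```

Note that the loop test sits at the top, so a bound of 0 skips the body entirely, just as bge jumps straight to Done.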
Optimization
• The simple translation is sub-optimal!
    # Assume x8 holds pointer to A
    # Assign x10=sum
Translate Assembly
      addi x10, x0, 0x7         x10 = 7
      add x12, x0, x0           x12 = 0
    label_a:                  label_a:
      andi x14, x10, 1          x14 = x10 & 1
      beq x14, x0, label_b      if (x14!=0)
      add x12, x10, x12           {x12 = x10+x12;}
    label_b:                  label_b:
      addi x10, x10, -1         x10 = x10-1;
      bne x10, x0, label_a      if (x10!=0)
                                  {go to label_a;}
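The translation can be taken one step further into structured C: the two labels and branches form a do-while loop with an if inside. The function name sum_odd_countdown and its start parameter (7 on the slide) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Structured version of the assembly: counts x10 down from start to 1
 * and accumulates the odd values in x12. */
uint32_t sum_odd_countdown(uint32_t start)
{
    assert(start != 0);            /* with 0 the assembly would wrap around 2^32 times */
    uint32_t x10 = start;          /* addi x10, x0, 0x7 */
    uint32_t x12 = 0;              /* add  x12, x0, x0  */
    do {                           /* label_a: */
        if ((x10 & 1) != 0)        /* andi x14, x10, 1; beq x14, x0, label_b */
            x12 = x10 + x12;       /* add  x12, x10, x12 */
        x10 = x10 - 1;             /* label_b: addi x10, x10, -1 */
    } while (x10 != 0);            /* bne  x10, x0, label_a */
    return x12;
}
```

For the slide's starting value, sum_odd_countdown(7) adds 7 + 5 + 3 + 1 and returns 16.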
Call a Function—Unconditional Jump
0000000100003f40 <_main>:
100003f40: ff c3 00 d1 sub sp, sp, #48
… …
100003f58: 48 9a 80 52 mov w8, #1234
100003f5c: a8 83 1f b8 stur w8, [x29, #-8]
100003f60: 28 1c 82 52 mov w8, #4321
100003f64: a8 43 1f b8 stur w8, [x29, #-12]
100003f68: a8 83 5f b8 ldur w8, [x29, #-8]
100003f6c: a9 43 5f b8 ldur w9, [x29, #-12]
100003f70: 08 01 09 0b add w8, w8, w9
… …
100003f90: 05 00 00 94 bl 0x100003fa4 <_printf+0x100003fa4>
… …
Disassembly of section __TEXT,__stubs:
0000000100003fa4 <__stubs>:
100003fa4: 10 00 00 b0 adrp x16, 0x100004000 <__stubs+0x4>
100003fa8: 10 02 40 f9 ldr x16, [x16]
100003fac: 00 02 1f d6 br x16
[Figure: processor (control; datapath with PC, registers, and Arithmetic & Logic Unit (ALU)) reading instructions and data from memory (program and data bytes) at the program address. The PC increases by 4 each time an instruction is executed, except for branch/jump/function call.]
Call a Function
    #include <stdio.h>
    int sum_two_number(int a, int b)
    {
      int y;
      return y=a+b;
    }
    int main(int argc, const char * argv[]) {
      int x=4321, y=1234;
      int a=1,b=2,c=3,d=4,e=5,f=6,g=0;
      y = sum_two_number(x,y);
      c = sum_two_number(a,b);
      f = sum_two_number(e,d);
      g = sum_two_number(c,f);
      printf("Sum is %d.\n",y);
      return 0;
    }
1. Put parameters in a place where function can access them
2. Transfer control to function (PC jump to sum_two_number)
3. Acquire (local) storage resources needed for function
4. Perform desired task of the function
[Figure: processor (control; datapath with PC, registers, and Arithmetic & Logic Unit (ALU)) and memory (bytes)]
Call a Function
(Same sum_two_number()/main() listing as on the previous slide.)
5. Put result value in a place where calling code can access it and restore any registers you used
6. Return control to point of origin, since a function can be called from several points in a program
[Figure: processor (control; datapath with PC, registers, and Arithmetic & Logic Unit (ALU)) and memory (bytes)]
RISC-V Function Call Conventions
• Registers faster than memory, so use them as much as possible
• Give names to registers, conventions on how to use them
Call a Function
(Same sum_two_number()/main() listing as before.)
• In y = sum_two_number(x,y), x and y are function arguments; they can be put in registers a0-a7
• In return y=a+b, y is the return value; it can be put in registers a0-a1
[Figure: processor (control; datapath with PC, registers, and Arithmetic & Logic Unit (ALU)) and memory (bytes)]
Call a Function
    #include <stdio.h>
    int sum_two_number(int a, int b)
    {
      int y;
      return y=a+b;
    }
    int main(int argc, const char * argv[]) {
      int x=4321, y=1234;
      int a=1,b=2,c=3,d=4,e=5,f=6,g=0;
      y = sum_two_number(x,y);
      c = sum_two_number(a,b);
      f = sum_two_number(e,d);
      g = sum_two_number(c,f);
      printf("Sum is %d.\n",y);
      return 0;
    }
Instruction addresses:
    Start:
    0x1000 //one instruction
    0x1004 //another instruction
    0x1008 //a third instruction
    0x100c //PC jump to 0x2000 (call function sum_two_number)
    0x1010 //next instruction… …   (save this value to register ra)
    ……
    Func_called:
    0x2000 //one instruction
    0x2004 //another instruction
    …… //need jump back to main()
Call a Function—Jump
• JAL: Jump & Link, jump to function
• Unconditional jump (J-type)
    jal rd, label
  Jump to label (imm+PC, explained later) and save the return address (PC+4) to rd;
  rd is x1 (ra) by convention; sometimes it can be x5.
  When rd is x0, it is simply an unconditional jump (j) without recording PC+4.
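The effect of jal on the PC and on rd can be modelled in a few lines of C. This is a sketch under stated assumptions: the Cpu struct and the jal function are hypothetical, and offset stands for label minus the current PC.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pc; uint32_t x[32]; } Cpu;

/* Model of "jal rd, label", where offset = label - pc (a multiple of 2). */
void jal(Cpu *cpu, unsigned rd, int32_t offset)
{
    assert(rd < 32 && (offset & 1) == 0);
    uint32_t link = cpu->pc + 4;       /* address of the next instruction */
    cpu->pc += (uint32_t)offset;       /* jump: PC = PC + offset */
    if (rd != 0) cpu->x[rd] = link;    /* x0 stays 0, so rd = x0 is a plain j */
}
```

Using the addresses from the earlier slide, a jal executed at PC = 0x100c with rd = x1 and a target of 0x2000 leaves PC = 0x2000 and ra = 0x1010.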
Return—Jump
• JALR: Jump & Link Register
• Unconditional jump (I-type)
    jalr rd, imm(rs1)
  Jump to address (imm+rs1)&~1 and save the return address (PC+4) to rd
  rs1 can be the return address we just saved to ra
  When rd is x0, it is simply an unconditional register jump (jr) without recording PC+4.
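A matching C sketch for jalr shows the target computation (imm+rs1)&~1. As before, the Cpu struct and the jalr function are hypothetical names; rs1 is read before rd is written, so the model also behaves sensibly when rd and rs1 are the same register.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pc; uint32_t x[32]; } Cpu;

/* Model of "jalr rd, imm(rs1)" with a 12-bit signed immediate. */
void jalr(Cpu *cpu, unsigned rd, unsigned rs1, int32_t imm)
{
    assert(rd < 32 && rs1 < 32 && imm >= -2048 && imm <= 2047);
    uint32_t link = cpu->pc + 4;                                 /* return address */
    uint32_t target = (cpu->x[rs1] + (uint32_t)imm) & ~(uint32_t)1; /* clear bit 0 */
    cpu->pc = target;
    if (rd != 0) cpu->x[rd] = link;                              /* rd = x0 acts as jr */
}
```

Returning from the function in the earlier address example corresponds to jalr(&cpu, 0, 1, 0): with ra = 0x1010 the PC becomes 0x1010 and nothing is recorded.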
Jump
— jal rd offset — jalr rd rs offset
• Jump and Link
  • Add the immediate value to the current address in the program (the “Program Counter”), go to that location
  • The offset is 20 bits, sign extended and left-shifted one (not two)
  • At the same time, store into rd the value of PC+4
  • So we know where it came from (need to return to)
• jal offset == jal x1 offset (pseudo-instruction; x1 = ra = return address)
• Two uses:
  • Unconditional jumps in loops and the like
  • Calling other functions
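The "20 bits, sign extended and left-shifted one" rule determines how far jal can reach. The sketch below treats imm20 as the logical 20-bit immediate value, not the scrambled bit positions inside the instruction word; the function name jal_byte_offset is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset represented by a logical 20-bit jal immediate value. */
int32_t jal_byte_offset(uint32_t imm20)
{
    assert(imm20 <= 0xFFFFFu);                              /* only 20 bits */
    int32_t sext = (int32_t)(imm20 ^ 0x80000u) - 0x80000;   /* sign-extend 20 bits */
    return sext * 2;                                        /* left shift by one */
}
```

The largest positive value 0x7FFFF gives an offset of 1048574 bytes and the most negative value 0x80000 gives -1048576 bytes, so jal reaches roughly 1 MiB in either direction from the PC, always in steps of 2 bytes.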
Jump and Link Register
• The same except the destination
• Instead of PC + immediate it is rs + immediate
• Same immediate format as I-type: 12 bits, sign extended
• Again, if you don’t want to record where you jump from…
• jr rs == jalr x0 rs
Call a Function
1. Put parameters in a place where function can access them
Stack
• Stack frame may include:
• Return “instruction” address
• Parameters (spill)
• Space for other local variables
• Stack frames are contiguous; the stack pointer (sp/x2) tells where the bottom of the stack frame is
• When the procedure ends, its stack frame is tossed off the stack, which frees memory for future stack frames; sp is restored
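The push and toss of stack frames can be sketched in C with a small array standing in for stack memory. Everything here is illustrative: the Stack struct, its 256-byte size, and the helper names are assumptions, and the stack is modelled as growing toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];  /* stand-in for stack memory, one word per slot */
    uint32_t sp;                /* byte offset of the bottom of the current frame */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS * 4; }

/* Like "addi sp, sp, -bytes": allocate a new frame below the current one. */
void frame_push(Stack *s, uint32_t bytes)
{
    assert(bytes % 4 == 0 && bytes <= s->sp);
    s->sp -= bytes;
}

/* Like "addi sp, sp, bytes": toss the frame and restore the previous sp. */
void frame_pop(Stack *s, uint32_t bytes)
{
    assert(bytes % 4 == 0 && s->sp + bytes <= STACK_WORDS * 4);
    s->sp += bytes;
}

/* Like "sw value, offset(sp)". */
void frame_store(Stack *s, uint32_t offset, uint32_t value)
{
    uint32_t addr = s->sp + offset;
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    s->mem[addr / 4] = value;
}

/* Like "lw rd, offset(sp)". */
uint32_t frame_load(const Stack *s, uint32_t offset)
{
    uint32_t addr = s->sp + offset;
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return s->mem[addr / 4];
}
```

A nested call pushes a second frame below the first; once it is popped, offsets from sp again refer to the outer frame, so a value stored there earlier (for example a saved return address) is still available.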
Example
• Leaf function: a function that calls no function
    int Leaf (int g, int h, int i, int j)
    {
      int f; f = (g + h) - (i + j);
      return f;
    }
    int main (void){
      int a=1, b=2, c=3, d=4, e;
      e = Leaf(a,b,d,c);
      return e;
    } /*a function called by OS*/
Registers:
    x0/zero  0
    x1       ra
    x2       sp
    ……
    x9       s1
    x10      a0
    x11      a1
    x12      a2
    x13      a3
    x14      a4
    ……
Stack Before, During, After Function
• Need to save old values of ra, a1, a2, a3 and a4 (caller-saved)
• W.r.t. main()
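[Figure: the stack before, during, and after the call. During the call the frame holds, from top to bottom, Saved ra, Saved a1, Saved a2, Saved a3, Saved a4, with sp at the bottom of the frame; before and after the call sp is at the top of this region.]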
RISC-V Code for Main()/Leaf()
Main:
  addi sp, sp, -20 # adjust stack for 5 items, 4 int & 1 ra pointer/address
  sw ra, 16(sp)    # save ra for use afterwards (return to OS)
  sw a1, 12(sp)    # save a1 for use afterwards, these are all caller-saved
  sw a2, 8(sp)     # save a2 for use afterwards
  sw a3, 4(sp)     # save a3 for use afterwards
  sw a4, 0(sp)     # save a4 for use afterwards
  jal ra, Leaf     # call Leaf; the return address (PC+4) is written to ra
  lw a1, 12(sp)    # restore register a1
  lw a2, 8(sp)     # restore register a2
  lw a3, 4(sp)     # restore register a3
  lw a4, 0(sp)     # restore register a4
  lw ra, 16(sp)    # restore register ra
  addi sp, sp, 20  # adjust stack to delete 5 items
  mv a0, a0        # result of Leaf is already in the return register a0
  jr ra            # return
[Figure: stack during call. Below the OS stack: Saved ra, Saved a1, Saved a2, Saved a3, Saved a4, with sp pointing at Saved a4.]
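The save and restore pattern in Main can be simulated in C to see why every restore must use the same offset as its matching save. This is a sketch, not the course's code: Machine, slot, clobbering_leaf, and call_with_saves are hypothetical, the 64-byte stack is an assumed size, and 0x1010 is an arbitrary placeholder for the address that jal would write to ra.

```c
#include <assert.h>
#include <stdint.h>

enum { RA = 1, SP = 2, A0 = 10, A1 = 11, A2 = 12, A3 = 13, A4 = 14 };

typedef struct {
    uint32_t x[32];       /* register file */
    uint32_t stack[16];   /* hypothetical 64-byte stack region */
} Machine;

/* Address of the stack word at offset(sp). */
uint32_t *slot(Machine *m, uint32_t offset)
{
    uint32_t addr = m->x[SP] + offset;
    assert(addr % 4 == 0 && addr / 4 < 16);
    return &m->stack[addr / 4];
}

/* Stand-in callee: computes (a0+a1)-(a2+a3) into a0, then deliberately
 * overwrites a1..a4, which a callee is allowed to do with caller-saved registers. */
void clobbering_leaf(Machine *m)
{
    m->x[A0] = (m->x[A0] + m->x[A1]) - (m->x[A2] + m->x[A3]);
    m->x[A1] = m->x[A2] = m->x[A3] = m->x[A4] = 0xDEADBEEFu;
}

/* Mirrors Main: save at offsets 0..16, call, restore from the same offsets. */
void call_with_saves(Machine *m)
{
    static const int saved[5] = { A4, A3, A2, A1, RA };   /* offsets 0,4,8,12,16 */
    m->x[SP] -= 20;                                       /* addi sp, sp, -20 */
    for (unsigned i = 0; i < 5; i++) *slot(m, 4u * i) = m->x[saved[i]];  /* sw */
    m->x[RA] = 0x1010u;                                   /* jal ra, Leaf overwrites ra */
    clobbering_leaf(m);
    for (unsigned i = 0; i < 5; i++) m->x[saved[i]] = *slot(m, 4u * i);  /* lw */
    m->x[SP] += 20;                                       /* addi sp, sp, 20 */
}
```

After call_with_saves returns, a1 to a4, ra, and sp hold their original values and only a0 has changed, which is the state Main needs before its final jr ra.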
Call a Function
1. Caller puts parameters in a place where the function can access them (a0-a7, or the stack when registers are not available), and then saves caller-saved registers to the stack
2. Transfer control to function (PC jumps to function): JAL; ra is set to the return address, i.e., where the caller left off
3. Acquire (local) storage resources needed for function: change sp (size decided when compiling); push callee-saved registers to the stack (e.g., s0-s11)
4. Perform desired task of the function
5. Put result value in a place where calling code can access it (a0, a1), and restore callee-saved registers (s0-s11, sp)
6. Return control to point of origin, since a function can be called from several points in a program (jr); caller restores caller-saved registers