The lecture covers the RISC-V Instruction Set Architecture (ISA), focusing on its support for high-level programming constructs like conditionals and loops. It discusses translating C statements into assembly language, the use of pointers for efficient array access, and the conventions for function calls and stack management. Additionally, it outlines the process of compiling, linking, and the memory layout in RISC-V architecture.

ELT3047 Computer Architecture

Lecture 5: RISC-V ISA (2)

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Last lecture review (1)
❑ Applications of ISA design principles to 5 aspects of RISC-V
➢ Data storage: 32 GPRs and a load/store design, for good compiler support
➢ Addressing modes:
Displacement addressing (RV64I)
Last lecture review (2)
❑ Operations in the Instruction Set: 3 types - Arithmetic and
Logical, Data transfer, and Control.
❑ Encoding the Instruction Set:

➢ Self-practice: Venus online RISC-V simulator https://venus.cs61c.org/

❑ Today: RISC-V ISA’s support for high level programming


IF statement
❑ C statement to translate:
      if (i == j)
          f = g + h;
  Variable mapping: f ➔ x12, g ➔ x13, h ➔ x14, i ➔ x15, j ➔ x16

❑ Two equivalent translations:
  Version 1:
        beq x15, x16, L1
        j   Exit
  L1:   add x12, x13, x14
  Exit:
  Version 2:
        bne x15, x16, Exit
        add x12, x13, x14
  Exit:
➢ The second version is more efficient (one fewer instruction)
❑ Common technique: invert the condition to get shorter code


Loops (1/2)

Key concept:
Any form of loop can be written in assembly with the
help of conditional branches and jumps.

▪ C while-loop:
      while (j == k)
          i = i + 1;
▪ Rewritten with goto:
  Loop: if (j != k)
            goto Exit;
        i = i + 1;
        goto Loop;
  Exit:
Loops (2/2)

C Statement to translate Variables Mapping


Loop: if (j != k) i ➔ x13
goto Exit; j ➔ x14
i = i+1; k ➔ x15
goto Loop;
Exit:

Loop: bne x14, x15, Exit # if (j != k) goto Exit


addi x13, x13, 1
j Loop # repeat loop
Exit:
Array and Loop
❑ Access large amounts of similar data (in memory)
➢ E.g. count the number of zeros in an array A
▪ A is a word array with 40 elements (element size = 4 bytes)

  Zero-count C code:
      result = 0;
      i = 0;
      while ( i < 40 ) {
          if ( A[i] == 0 )
              result++;
          i++;
      }

  Memory layout:
      0x12348010   array[4]
      0x1234800C   array[3]
      0x12348008   array[2]
      0x12348004   array[1]
      0x12348000   array[0]

❑ Compiling: think about


➢ How to translate A[i] correctly
➢ How to perform the right comparison
Array and Loop
❑ Array indexing vs pointer
➢ Accessing array elements via indexing in a loop:
  ▪ Initialization of the result variable and loop counter
  ▪ Loop body: multiply the index by the element size, add it to the array
    base address, load the data, perform the task, update the loop counter,
    then compare and branch
➢ Accessing array elements via pointers in a loop:
  ▪ Initialization of the result variable and array pointers
  ▪ Loop body: load the data, perform the task, update the array pointer,
    then compare and branch
➢ In both cases the element address is base + i * element size; the pointer
  version keeps this address in a register instead of recomputing it on
  every iteration.
  (Memory diagram: array[0] at 0x12348000 up to array[4] at 0x12348010.)
Array and Loop: Version 1.0
Address of A[] ➔ t0
Result ➔ t6
i ➔ t1
addi t6, zero, 0
addi t1, zero, 0
addi t2, zero, 40 # end point
loop: bge t1, t2, end
slli t3, t1, 2 # i * 4
add t4, t0, t3 # &A[i]
lw t5, 0(t4) # t5 = A[i]
bne t5, zero, skip
addi t6, t6, 1 # result++
skip: addi t1, t1, 1 # i++
j loop
end:
Array and Loop: Version 2.0
Address of A[] ➔ t0
Result ➔ t6
&A[i] ➔ t1
addi t6, zero, 0
addi t1, t0, 0 # pointer to &A[current]
addi t2, t0, 160 # end point: &A[40]
loop: bge t1, t2, end # comparing address!
lw t3, 0(t1) # t3 = A[i]
bne t3, zero, skip
addi t6, t6, 1 # result++
skip: addi t1, t1, 4 # move to next item
j loop
end:

❑ Use of “pointers” can produce more efficient code!


➢ Reduces the instructions executed per iteration by 2
Procedure/function call: overview
❑ Procedure/function call
➢ Spy analogy: Leaves with a secret plan, acquires resources, performs the
task, covers their tracks & returns to the point of origin with desired result.
➢ One way to implement abstraction in software
➢ C Code example:
... sum(a,b);... /* a,b:s0,s1 */
int sum(int x, int y) {
return x+y; }
➢ Assembly translation:
address (shown in decimal)
1000 mv a0,s0 # x = a
1004 mv a1,s1 # y = b
1008 addi ra,zero,1016 # ra=1016
1012 j sum # jump to sum
1016 … # next instruction

2000 sum: add a0,a0,a1
2004 jr ra # jump register, why not use j?
Instructional support for function call
❑ Return from function: jr ra
➢ Unconditional jump to address specified in register ra
➢ Is a pseudo instruction (i.e. just an assembler shorthand), like ret

❑ Single instruction to jump and save return address:


➢ jal FunctionLabel: jumps to address of FunctionLabel &
simultaneously saves the address of the following instruction in ra.
▪ jal in this format is also a pseudo instruction
➢ Why RV32I needs jal?
▪ Make the common case fast: function calls very common
▪ Programmers needn’t know instruction address in memory with jal
▪ Reduce program size:
Before:
1008 addi ra,zero,1016 # ra=1016
1012 j sum # goto sum
After:
1008 jal sum # ra=1012, goto sum
Architectural support for function call
❑ Steps required and architectural support
Step Description RISC-V conventions
1 Place parameters in registers a0–a7 (x10–x17)
2 Transfer control to procedure & save return address jal, ra
3 Acquire storage for procedure gp, sp, fp
4 Perform function’s operations t0–t6, s1-s11
5 Place result in register for caller a0, a1
6 Return to place of call jr, ra

❑ Example:
int Leaf(int g, int h, int i, int j) {
int f;
f = (g + h) – (i + j);
return f; }
➢ Compiling assumptions: parameter variables g, h, i, and j in argument
registers a0, a1, a2, and a3, and f in s0; one temporary register s1.
➢ Assume need to save s0, s1 for later use by the caller.
Local Data on the Stack
❑ How do we save old register values (e.g. s0, s1) before call
function, restore them when return, and delete?
➢ Stack: last-in-first-out (LIFO) structure with push (spilling a register) &
pop (restoring a register) operations.

❑ RISC-V stack conventions


➢ sp (x2) is the pointer to the last used space in the stack.
➢ Stack grows down from high to low addresses, i.e. push decrements sp,
pop increments sp
High address
  Before call:  sp points to the last used word, above where the frame will go
  During call:  sp is decremented; the stack holds saved s1 and saved s0,
                with sp pointing to saved s0
  After call:   sp is restored to its value before the call
Low address
RISC-V Code for Leaf()
❑ Using the stack:
➢ Spilling: calculate the amount of space for spilling registers & decrease sp
by the amount of space we need, then fill it with data via store instructions.
➢ Restoring: restore the registers to previously spilled values via load
instructions, then increase sp by the same amount to clear the stack.
➢ RISC-V assembly code for Leaf():
Leaf: addi sp,sp,-8 # adjust stack for 2 items
sw s1, 4(sp) # save s1 for use afterwards “push”
sw s0, 0(sp) # save s0 for use afterwards

add s0,a0,a1 # f = g + h
add s1,a2,a3 # s1 = i + j
sub a0,s0,s1 # return value (g + h) – (i + j)

lw s0, 0(sp) # restore register s0 for caller


lw s1, 4(sp) # restore register s1 for caller “pop”
addi sp,sp,8 # adjust stack to delete 2 items
jr ra # jump back to calling routine
Nested Procedures (1/2)
❑ Recursive function calls (a function calls a function):
int sumSquare(int x, int y) {
return mult(x,x)+ y; }
➢ Some caller invoked sumSquare; now sumSquare is calling mult → the return
address in ra that sumSquare needs to jump back to will be overwritten by
the call to mult → sumSquare must save ra before calling mult.
➢ In general, we may need to save other registers in addition to ra → we want
to minimize the expensive loads and stores caused by spilling and restoring registers.

❑ RISC-V function-calling convention divides registers into:


1. Preserved across function call
▪ Caller can rely on values being unchanged
▪ sp, gp, tp, “saved registers” s0-s11 (s0 is also fp)
2. Non preserved across function call
▪ Caller cannot rely on values being unchanged
▪ Argument/return registers a0-a7,ra, “temporary registers” t0-t6
Nested Procedures (2/2)
❑ sumSquare compilation:
int sumSquare(int x, int y) {
return mult(x,x)+ y; }

sumSquare:
    addi sp,sp,-8   # space on stack   ("push" non-preserved registers)
    sw ra, 4(sp)    # save return address
    sw a1, 0(sp)    # save y
    mv a1,a0        # mult(x,x)
    jal mult        # call mult
    lw a1, 0(sp)    # restore y        ("pop" non-preserved registers)
    add a0,a0,a1    # mult() + y
    lw ra, 4(sp)    # get return address
    addi sp,sp,8    # restore stack
    jr ra
mult: ...
Memory Layout
❑ Text: program code (binary instruction codes)
❑ Data:
➢ Static data: variables declared once per program, e.g. global variables
➢ Dynamic data: variables declared dynamically, e.g. heap (malloc in C)
➢ Stack: stores saved registers and local variables that don’t fit in registers;
such variables are local to the function and discarded when it exits
(called automatic variables).
▪ E.g. local arrays

❑ RV32 memory convention:


➢ Stack starts in high memory & grows down.
➢ Text segment starts in the low end right
above the reserved segment.
➢ Static data segment is above the text
segment. The RISC-V’s global pointer (gp)
points to an address in this segment.
➢ Heap segment is located above static &
grows up to high addresses.
Allocating space on the stack
❑ Local data is allocated by callee
➢ The saved registers and local variables for each callee are bundled into a
segment on the stack called procedure frame or activation record.
➢ In RV32, the frame pointer (fp) points to the first word of the frame → easier
for programmers to reference variables via the stable fp
➢ fp is initialized using the address in sp on a call, and sp is restored using fp

(Diagram: sp and fp before, during, and after the call; fp stays fixed at the first word of the frame while sp moves.)


From writing to running a program

  sum.c   → (compiler)       → sum.s
  sum.s   → (assembler)      → sum.obj
  sum.obj → (static linking) → sum.exe

➢ Many compilers produce object modules directly.
➢ When most people say “compile” they mean the entire process:
  compile + assemble + link.
Translating vs interpreting
❑ Translating (compiling) versus interpreting
➢ Compiling: the HLL program is translated once into machine code, which then
  runs directly on the hardware to produce the program’s outputs.
➢ Interpreting: a virtual machine (the interpreter) running on the hardware
  reads the HLL program and produces its outputs directly.

                                Compiler                   Interpreter
  How it converts the input?    Entire program at once     Reads one statement at a time
  When is it needed?            Once, before the 1st run   Every time the program is run
  Decision made at              Compile time               Run time
  What it slows down            Program development        Program execution

➢ Some languages mix both concepts, e.g. Java.
Compiling
❑ Assembler (or compiler) translates HLL code into machine language for a
(modified) Harvard architecture
➢ The machine has distinct code and data caches, backed by a common address space.
➢ Most assembler instructions represent machine instructions one-to-one
➢ Pseudo-instructions: figments of the assembler’s imagination, e.g.
mv t0, t1 → add t0, zero, t1

❑ Produces object files that contain relevant information to build a


complete program
➢ Header: describes the contents of the object module
➢ Text segment: translated instructions (machine code)
➢ Static data segment: the static data (in binary) of the source file
➢ Relocation information: identifies lines of code that need to be fixed up
later (depend on absolute location of loaded program)
➢ Symbol table: matches names of labels and static data to the addresses of
the memory words they occupy.
➢ Debug info
Linking object modules
❑ Combines several object (.o) files into a single executable
1. Merges segments (i.e. “stitches” standard library routines together)
2. Resolves labels (determines their addresses) using the relocation
information and symbol table in each object module
3. Patches location-dependent and external references

❑ Executable file has the same format as an object file, except


that it contains no unresolved references.
➢ Some location dependencies might be fixed by relocating loader, but with
virtual memory, no need to do this
➢ Program can be loaded into absolute location in virtual memory space

❑ Linking libraries
➢ Static linking: all/most needed routines in the library are loaded as part of
the executable code → image bloat & recompilation of the whole library
when new versions arise.
➢ Dynamically linked libraries (DLLs): only link/load library procedure when it
is called → adds complexity to the compiler, but avoids image bloat &
automatically pick up new library versions.
An RV64 illustration
(Figure: assembling produces A.obj and B.obj, which are then linked into a single executable.)
Loading
❑ Load from image file on disk into memory for execution
1. Reads executable file’s header to determine size of text and data segments
2. Creates new address space for program large enough to hold text and data
segments, along with a stack segment
3. Copies instructions + data from executable file into the new address space
4. Copies arguments passed to the program onto the stack
5. Initializes registers (most registers cleared, but sp assigned the address
of the 1st free stack location)
6. Jumps to the startup routine
▪ Copies program’s arguments from stack to registers & sets the PC
▪ When main returns, start-up routine terminates program with the exit
system call

❑ In reality, loader is the operating system (OS).


➢ loading a program into memory is one of the OS kernel tasks
Anatomy of a compiler
❑ Frontend (analysis phase)
➢ Read source code text & break it up into
meaningful elements (lexeme) & gen. tokens
▪ token = {type, location} of a lexeme
➢ Check correctness, report errors
▪ e.g. “3x” is an illegal token in C
➢ Translate to intermediate representation
▪ IR = machine-independent language
▪ e.g. three-address code (3AC)

❑ Backend (synthesis)
➢ Optimize IR
▪ reduces #operations to be executed
➢ Translate IR to assembly & further optimize
▪ take advantage of particular features of the ISA
▪ e.g. replace mult with sll when multiplying by a power of two
Front end stages: Lexical Analysis
❑ Lexical Analysis (scanning)
➢ Source → list of tokens.
Front end stages: Syntax Analysis
❑ Syntax Analysis (parsing)
➢ Tokens → syntax tree = syntactic structure of the original source code (text)
Front end stages: Semantic Analysis
❑ Semantic Analysis
➢ Mainly type checking, i.e. if types of the operands are compatible with the
requested operation.

Symbol table
Front end stages: Intermediate
representation (IR)
❑ Internal compiler language that is
➢ Language-independent
➢ Machine-independent
➢ Easy to optimize

❑ Why yet another language?


➢ Assembly does not have enough info to
optimize it well
➢ Enable modularity and reuse

❑ A common IR: Control Flow Graph


(CFG)
➢ Nodes: basic blocks = sequences of
operations that are executed as a unit
➢ Edges: branches connecting basic blocks
Backend stages: IR optimization
❑ Perform multiple passes over the CFG
➢ A pass = a specific, simple optimization.
➢ Repeatedly apply multiple passes until no further optimization can be found.
➢ Combination of multiple simple optimizations = very complex optimizations.

❑ Typical optimizations:
➢ Dead code elimination: eliminate assignments to variables that are never
used and basic blocks that are never reached.
➢ Constant propagation: identify variables that have a constant value &
substitute that constant in place of references to the variable.
➢ Constant folding: compute expressions with all constant operands.
➢ Example: the following slides apply these passes to a small CFG.
IR optimization example: 1st batch of
passes
❑ First 3 passes:
1. Dead code elimination: remove the assignment to z in the first basic block.
2. Constant propagation: replace all references to x with the constant 3.
3. Constant folding: compute constant expressions: y=3+7=10 & _t1=3/2=1
IR optimization example: subsequent
batches of passes

Repetition of simple optimizations on the CFG = very powerful optimizations.
The passes are reapplied until no further optimization is found.
Backend stages: Code generation
❑ Translate IR to assembly
➢ Map variables to registers (register allocation)
▪ Code generator assigns each variable a dedicated register.
▪ If #variables > #registers, map some less frequently used variables to
Mem and load/store them when needed.
➢ Translate each assignment to instructions
▪ Some assignments require more than one instruction.
➢ Emit each basic block
▪ Code + appropriate labels and branches.
➢ Reorder basic block code wherever possible
▪ to eliminate superfluous jumps.
➢ Perform ISA- and CPU-specific optimizations
▪ e.g. reorder instructions to improve performance.
Summary

❑ How RISC-V ISA supports high level programming paradigms


❑ How a HLL code is translated and run
❑ Next lecture: Implementation of RISC-V ISA
