Slide 4
Key concept:
Any form of loop can be written in assembly with the
help of conditional branches and jumps.
C while loop:
  while (j == k)
    i = i + 1;

Equivalent goto/branch form:
  Loop: if (j != k)
          goto Exit;
        i = i + 1;
        goto Loop;
  Exit:
Loops (2/2)
❑ Example: scan an array, counting its zero elements.
    i = 0;
    while ( i < 40 ) {
      if ( A[i] == 0 )
        result++;
      i++;
    }
➢ Array layout in memory (one 4-byte word per element):
    0x12348010 array[4]
    0x1234800C array[3]
    0x12348008 array[2]
    0x12348004 array[1]
    0x12348000 array[0]
❑ Example:
int Leaf(int g, int h, int i, int j) {
int f;
f = (g + h) – (i + j);
return f; }
➢ Compiling assumptions: parameter variables g, h, i, and j in argument
registers a0, a1, a2, and a3, and f in s0; one temporary register s1.
➢ Assume s0 and s1 must be saved, since the caller needs their values later.
Local Data on the Stack
❑ How do we save old register values (e.g. s0, s1) before calling a
function, restore them on return, and discard the space afterwards?
➢ Stack: last-in-first-out (LIFO) structure with push (spilling a register) & pop
(restoring a register) operations.
[Diagram: the stack before, during, and after the call. Before the call sp
points above the save area; during the call sp is lowered past the slots
holding saved s0 and saved s1; after the call sp is back at its original
position. Lower addresses are at the bottom.]
RISC-V Code for Leaf()
❑ Using the stack:
➢ Spilling: calculate the amount of space for spilling registers & decrease sp
by the amount of space we need, then fill it with data via store instructions.
➢ Restoring: restore the registers to previously spilled values via load
instructions, then increase sp by the same amount to clear the stack.
➢ RISC-V assembly code for Leaf():
Leaf: addi sp,sp,-8 # adjust stack for 2 items
sw s1, 4(sp) # save s1 for use afterwards “push”
sw s0, 0(sp) # save s0 for use afterwards
add s0,a0,a1 # f = g + h
add s1,a2,a3 # s1 = i + j
sub a0,s0,s1 # return value (g + h) – (i + j)
lw s0, 0(sp) # restore s0 for the caller “pop”
lw s1, 4(sp) # restore s1 for the caller
addi sp,sp,8 # adjust stack to delete 2 items
jr ra # return to caller
sumSquare:
addi sp,sp,-8 # space on stack
sw ra, 4(sp) # “push” non-preserved registers: save return address
sw a1, 0(sp) # save y
mv a1,a0 # set up mult(x,x)
jal mult # call mult
lw a1, 0(sp) # “pop” non-preserved registers: restore y
add a0,a0,a1 # mult(x,x) + y
lw ra, 4(sp) # get return address
addi sp,sp,8 # restore stack
jr ra
mult: ...
Memory Layout
❑ Text: program code (binary instruction codes)
❑ Data:
➢ Static data: variables declared once per program, e.g. global variables
➢ Dynamic data: variables declared dynamically, e.g. heap (malloc in C)
➢ Stack: stores saved registers, plus variables that are local to a function and
discarded when it exits (automatic variables) and don’t fit in registers.
▪ E.g. local arrays
❑ Translation steps:
    sum.c → (compiler) → sum.s → (assembler) → sum.obj → (static linking) → sum.exe
➢ Many compilers produce object modules directly.
                             Compiler                   Interpreter
How it converts the input    Entire program at once     Reads one instruction at a time
When is it needed            Once, before the 1st run   Every time the program is run
What it slows down           Program development        Program execution

➢ Some languages mix both concepts, e.g. Java.
Compiling
❑ Assembler (or compiler) translates HLL into machine language that runs
on a (modified) Harvard architecture
➢ Has distinct code and data caches, backed by a common address space.
➢ Most assembler instructions represent machine instructions one-to-one
➢ Pseudo-instructions: figments of the assembler’s imagination, e.g.
mv t0, t1 → add t0, zero, t1
❑ Linking libraries
➢ Static linking: all/most needed routines in the library are loaded as part of
the executable code → image bloat & recompilation of the whole library
when new versions arise.
➢ Dynamically linked libraries (DLLs): only link/load a library procedure when it
is called → adds complexity to the compiler, but avoids image bloat &
automatically picks up new library versions.
An RV64 illustration
[Figure: source files assembled into A.obj and B.obj, which are then
combined by linking.]
❑ Loads the program image from a file on disk into memory for execution
1. Reads executable file’s header to determine size of text and data segments
2. Creates new address space for program large enough to hold text and data
segments, along with a stack segment
3. Copies instructions + data from executable file into the new address space
4. Copies arguments passed to the program onto the stack
5. Initializes registers (most registers cleared, but sp assigned address of 1st
free stack location)
6. Jumps to the startup routine
▪ Copies program’s arguments from stack to registers & sets the PC
▪ When main returns, start-up routine terminates program with the exit
system call
❑ Backend (synthesis)
➢ Optimize IR
▪ reduces #operations to be executed
➢ Translate IR to assembly & further optimize
▪ take advantage of particular features of
the ISA
▪ e.g. replace a multiply by a power of 2 with a shift (mult ← sll)
Front end stages: Lexical Analysis
❑ Lexical Analysis (scanning)
➢ Source → list of tokens.
Front end stages: Syntax Analysis
❑ Syntax Analysis (parsing)
➢ Tokens → syntax tree = syntactic structure of the original source code (text)
Front end stages: Semantic Analysis
❑ Semantic Analysis
➢ Mainly type checking, i.e. if types of the operands are compatible with the
requested operation.
Symbol table
Front end stages: Intermediate
representation (IR)
❑ Internal compiler language that is
➢ Language-independent
➢ Machine-independent
➢ Easy to optimize
❑ Typical optimizations:
➢ Dead code elimination: eliminate assignments to variables that are never
used and basic blocks that are never reached.
➢ Constant propagation: identify variables that have a constant value &
substitute that constant in place of references to the variable.
➢ Constant folding: compute expressions with all constant operands.
➢ Example: optimize
IR optimization example: 1st batch of
passes
❑ First 3 passes:
1. Dead code elimination: remove the assignment to z in the first basic block.
2. Constant propagation: replace all references to x with the constant 3.
3. Constant folding: compute constant expressions: y=3+7=10 & _t1=3/2=1
IR optimization example: subsequent
batches of passes
Repetition of simple optimizations on the CFG = very powerful optimization.
The passes are repeated until no further optimization is found.
Backend stages: Code generation
❑ Translate IR to assembly
➢ Map variables to registers (register allocation)
▪ Code generator assigns each variable a dedicated register.
▪ If #variables > #registers, map some less frequently used variables to
Mem and load/store them when needed.
➢ Translate each assignment to instructions
▪ Some assignments require more than one instruction.
➢ Emit each basic block
▪ Code + appropriate labels and branches.
➢ Reorder basic block code wherever possible
▪ to eliminate superfluous jumps.
➢ Perform ISA- and CPU-specific optimizations
▪ e.g. reorder instructions to improve performance.
Summary