CENG 3420

Computer Organization & Design


Lecture 04: Control Instruction
Bei Yu
CSE Department, CUHK

(Textbook: Chapters 2.8 – 2.11)

2025 Spring
Overview

1 Introduction

2 Control Instructions

3 Accessing Procedures

4 Summary

Introduction
RISC-V Instruction Fields

RISC-V fields are given names to make them easier to refer to

opcode 7-bits, opcode that specifies the operation


rs1 5-bits, register file address of the first source operand
rs2 5-bits, register file address of the second source operand
rd 5-bits, register file address of the result’s destination
imm 12-bits / 20-bits, immediate number field
funct 3-bits / 10-bits, function code augmenting the opcode
The RISC-V ISA

Instruction Categories
• Load and Store instructions
• Bitwise instructions
• Arithmetic instructions
• Control transfer instructions
• Pseudo instructions

RISC-V Register File

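[Figure: register file with two 5-bit source address inputs (src1 addr, src2 addr), a 5-bit destination address input (dst addr), a 32-bit write data input, a write control signal, and two 32-bit data outputs (src1 data, src2 data); 32 locations of 32 bits each]

• Holds thirty-two 32-bit general purpose registers
• Two read ports
• One write port
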
Registers are
• Faster than main memory
• But register files with more locations are slower
• E.g., a 64 word file may be 50% slower than a 32 word file
• Read/write port increase impacts speed quadratically
• Easier for a compiler to use
• E.g., for (A*B)-(C*D)-(E*F) the multiplies can be done in any order, unlike on a stack
• Can hold variables so that code density improves (since registers are named with
fewer bits than a memory location)
Aside: RISC-V Register Convention

Table: Register names and descriptions

Register Names   ABI Names   Description
x0               zero        Hard-wired zero
x1               ra          Return address
x2               sp          Stack pointer
x3               gp          Global pointer
x4               tp          Thread pointer
x5               t0          Temporary / Alternate link register
x6-7             t1-t2       Temporary registers
x8               s0 / fp     Saved register / Frame pointer
x9               s1          Saved register
x10-11           a0-a1       Function argument / Return value registers
x12-17           a2-a7       Function argument registers
x18-27           s2-s11      Saved registers
x28-31           t3-t6       Temporary registers
History of RISC-V

Control Instructions
RISC-V Control Flow Instructions

RISC-V conditional branch instructions:


bne s0, s1, Lbl # go to Lbl if s0 != s1
beq s0, s1, Lbl # go to Lbl if s0 == s1

Example

if (i==j) h = i + j;

      bne s0, s1, Lbl1 # skip the add if i != j
      add s3, s0, s1   # h = i + j
Lbl1: ...

• Instruction Format (B format)
• How is the branch destination address specified?
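
One way to make the question concrete: in the B format the destination is PC-relative. The immediate bits are scattered across the instruction word and form a signed byte offset that is added to the address of the branch itself. The C sketch below decodes that offset; the function names `b_type_offset` and `branch_target` are illustrative and not part of the lecture.

```c
#include <stdint.h>

/* Extract the sign-extended byte offset from a 32-bit B-format instruction. */
int32_t b_type_offset(uint32_t inst) {
    uint32_t imm = 0;
    imm |= ((inst >> 31) & 0x1)  << 12;  /* imm[12]   comes from bit 31     */
    imm |= ((inst >> 7)  & 0x1)  << 11;  /* imm[11]   comes from bit 7      */
    imm |= ((inst >> 25) & 0x3f) << 5;   /* imm[10:5] comes from bits 30:25 */
    imm |= ((inst >> 8)  & 0xf)  << 1;   /* imm[4:1]  comes from bits 11:8  */
    /* imm[0] is implicitly 0, so offsets are multiples of 2 bytes. */
    int32_t offset = (int32_t)imm;
    if (imm & 0x1000) {
        offset -= 0x2000;                /* sign-extend from bit 12 */
    }
    return offset;
}

/* Branch target = address of the branch instruction + decoded offset. */
uint32_t branch_target(uint32_t pc, uint32_t inst) {
    return pc + (uint32_t)b_type_offset(inst);
}
```

For example, the word 0x00000463 encodes beq x0, x0 with an offset of +8 bytes, so a branch located at address 0x100 would target 0x108.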

RISC-V Control Flow Instructions

RARS example: beq

• What is the final value of a0?


In Support of Branch Instructions

• We have beq, bne, but what about other kinds of branches (e.g., branch-if-less-than)?
• For this, we need yet another instruction, slt

Set on less than instruction:


slt t0, s0, s1 # if s0 < s1 then
# t0 = 1 else
# t0 = 0

• Instruction format (R format or I format)

Alternate versions of slt


slti t0, s0, 25 # if s0 < 25 (signed) then t0 = 1 ...
sltu t0, s0, s1 # if s0 < s1 (unsigned) then t0 = 1 ...
sltiu t0, s0, 25 # if s0 < 25 (unsigned) then t0 = 1 ...

In Support of Branch Instructions

RARS example: slt

• What is the final value of a0?


Aside: More Branch Instructions

Can use slt, beq, bne, and the fixed value of 0 in register zero to create other
conditions
• less than: blt s1, s2, Label

slt t0, s1, s2      # t0 set to 1 if s1 < s2
bne t0, zero, Label # branch if t0 != 0

• less than or equal to: ble s1, s2, Label
• greater than: bgt s1, s2, Label
• greater than or equal to: bge s1, s2, Label
• RISC-V provides blt and bge (and the unsigned bltu and bgeu) as base instructions; ble and
bgt are pseudo instructions, recognized by the assembler and expanded into bge and blt with
the operands swapped
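
The four conditions can all be built from a single slt plus a comparison with zero. The C sketch below models that composition; the helper names (`slt`, `takes_blt`, and so on) are illustrative and each returns 1 when the corresponding branch would be taken.

```c
#include <stdint.h>

/* Model of slt: 1 if a < b (signed comparison), else 0. */
static int32_t slt(int32_t a, int32_t b) { return a < b ? 1 : 0; }

/* slt t0, s1, s2 ; bne t0, zero, Label */
int takes_blt(int32_t s1, int32_t s2) { return slt(s1, s2) != 0; }

/* slt t0, s1, s2 ; beq t0, zero, Label */
int takes_bge(int32_t s1, int32_t s2) { return slt(s1, s2) == 0; }

/* slt t0, s2, s1 ; bne t0, zero, Label  (operands swapped) */
int takes_bgt(int32_t s1, int32_t s2) { return slt(s2, s1) != 0; }

/* slt t0, s2, s1 ; beq t0, zero, Label  (operands swapped) */
int takes_ble(int32_t s1, int32_t s2) { return slt(s2, s1) == 0; }
```

Swapping the operands of slt turns "less than" into "greater than", and testing for zero instead of non-zero negates the condition, so no further comparison instruction is needed.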

Bounds Check Shortcut

• Treating signed numbers as if they were unsigned gives a low-cost way of checking whether
0 ≤ x < y (an index-out-of-bounds check for arrays)

sltu t0, s1, t2    # t0 = 0 if s1 >= t2 (index too large)
                   # or s1 < 0 (index negative)
beq t0, zero, IOOB # go to IOOB if t0 = 0

• The key is that negative integers in two’s complement look like large numbers in
unsigned notation.
• Thus, an unsigned comparison x < y checks both that x is non-negative and that x is
less than y.
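
The same shortcut can be written in C by casting both operands to an unsigned type before comparing. This is a minimal sketch that assumes the bound y is non-negative; the name `index_in_bounds` is illustrative.

```c
#include <stdint.h>

/* Returns 1 if 0 <= x < y, using a single unsigned comparison (like sltu).
 * Assumes y >= 0. A negative x becomes a very large unsigned value,
 * so it fails the test just like an index that is too big. */
int index_in_bounds(int32_t x, int32_t y) {
    return (uint32_t)x < (uint32_t)y;
}
```

With y = 10, the indices 0 and 9 pass, while 10 and -1 are both rejected by the one comparison.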

Other Control Flow Instructions

• RISC-V also has an unconditional branch instruction or jump instruction:

jal zero, label # go to label; label can be an immediate value

• Instruction Format (J Format)
• j is a pseudo instruction for an unconditional jal that discards the return address
(e.g., j label expands to jal zero, label)

In Support of Branch Instructions

RARS example: jal

• What is the final value of a0?


EX-2: Branching Far Away
What if the branch destination is further away than can be captured in 12 bits? Rewrite
the following code.
beq s0, s1, L1
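
As a hedged sketch of the reasoning: the 12-bit B-format immediate encodes a signed offset in multiples of 2 bytes, so a conditional branch reaches roughly ±4 KiB, while the 20-bit immediate of jal reaches roughly ±1 MiB. One common rewrite is therefore to invert the condition and branch over an unconditional jump (bne s0, s1, L2 followed by j L1, with L2 labelling the next instruction). The C helpers below, whose names are illustrative, check whether a byte offset fits each format.

```c
#include <stdint.h>

/* B format: 13-bit signed byte offset with bit 0 fixed at 0. */
int fits_branch(int32_t offset) {
    return offset >= -4096 && offset <= 4094 && (offset % 2) == 0;
}

/* J format: 21-bit signed byte offset with bit 0 fixed at 0. */
int fits_jal(int32_t offset) {
    return offset >= -1048576 && offset <= 1048574 && (offset % 2) == 0;
}
```

An offset of 4096 bytes, for instance, is out of reach for beq but well within reach of jal, which is why the inverted-branch-plus-jump form works.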

EX: Compiling a while Loop in C

while (save[i] == k) i += 1;
Assume that i and k correspond to registers s3 and s5 and the base of the array save is in
s6.

Loop: slli t1, s3, 2   # Temp reg t1 = s3 * 4
      add t1, t1, s6   # t1 = address of save[i]
      lw t0, 0(t1)     # Temp reg t0 = save[i]
      bne t0, s5, Exit # go to Exit if save[i] != k
      addi s3, s3, 1   # i = i + 1
      j Loop           # go to Loop (j is a pseudo instruction for jal)
Exit:

Note: s3 is shifted left by 2 (multiplied by 4) to turn the index i into a byte offset, because each word occupies 4 bytes; the index i itself is then increased by 1.
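
To make the address arithmetic explicit, the loop can be rewritten in C so that each statement mirrors one assembly instruction. This is an illustrative sketch, assuming 32-bit array elements and a non-negative index; `while_loop` and `demo_while_loop` are hypothetical names.

```c
#include <stdint.h>

/* Returns the final value of i after: while (save[i] == k) i += 1; */
int32_t while_loop(const int32_t *save, int32_t i, int32_t k) {
    for (;;) {
        int32_t byte_offset = i * 4;                       /* slli t1, s3, 2   */
        const char *addr = (const char *)save + byte_offset; /* add t1, t1, s6 */
        int32_t value = *(const int32_t *)addr;            /* lw t0, 0(t1)     */
        if (value != k) {                                  /* bne t0, s5, Exit */
            break;
        }
        i += 1;                                            /* addi s3, s3, 1   */
    }                                                      /* j Loop           */
    return i;
}

/* Hypothetical demo data: three matching words followed by a mismatch. */
int32_t demo_while_loop(void) {
    static const int32_t save[] = {5, 5, 5, 3};
    return while_loop(save, 0, 5);
}
```

The byte offset grows by 4 per iteration even though i grows by 1, which is exactly the scaling that the slli performs.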

Six Steps in Execution of a Procedure

1 Main routine (caller) places parameters in a place where the procedure (callee) can
access them
• a0 – a7: for argument registers
2 Caller transfers control to the callee
3 Callee acquires the storage resources needed
4 Callee performs the desired task
5 Callee places the result value in a place where the caller can access it
• a0 – a1: return value registers in the RISC-V register convention (the examples in this lecture place results in s0)
6 Callee returns control to the caller
• ra: one return address register to return to the point of origin

Accessing Procedures
Instructions for Accessing Procedures

We have learned jal; now let’s continue.

• RISC-V procedure call instruction:

jal ra, label # jump and link; label can be an immediate value

• Saves PC + 4 in register ra, giving the procedure a link to the next instruction for its
return
• Machine format (J format)
• The procedure can then return with

jalr x0, 0(ra) # return

• Instruction format (I format)
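
The link-and-return behaviour can be modelled with a few lines of C. This is a simplified sketch, not a full simulator: the `Cpu` struct, the helper names, and the addresses 0x100 and 0x200 are assumptions made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;
    uint32_t x[32];   /* x[0] is hard-wired zero, x[1] is ra */
} Cpu;

/* Writes to x0 are discarded. */
static void write_reg(Cpu *cpu, int rd, uint32_t value) {
    if (rd != 0) {
        cpu->x[rd] = value;
    }
}

/* jal rd, offset: rd = pc + 4, then jump relative to pc. */
void jal(Cpu *cpu, int rd, int32_t offset) {
    write_reg(cpu, rd, cpu->pc + 4);
    cpu->pc += (uint32_t)offset;
}

/* jalr rd, imm(rs1): rd = pc + 4, then jump to (rs1 + imm) with bit 0 cleared. */
void jalr(Cpu *cpu, int rd, int rs1, int32_t imm) {
    uint32_t target = (cpu->x[rs1] + (uint32_t)imm) & ~(uint32_t)1;
    write_reg(cpu, rd, cpu->pc + 4);
    cpu->pc = target;
}

/* Hypothetical demo: call from 0x100 to a procedure at 0x200, then return. */
uint32_t demo_call_and_return(void) {
    Cpu cpu = { .pc = 0x100 };
    jal(&cpu, 1, 0x100);   /* jal ra, label   : ra = 0x104, pc = 0x200 */
    jalr(&cpu, 0, 1, 0);   /* jalr x0, 0(ra)  : pc = 0x104             */
    return cpu.pc;
}
```

After the return, the program counter holds 0x104, the address of the instruction that follows the original jal.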

Example of Accessing Procedures

RARS example: accessing a procedure with jal & jalr

• What is the final value of t1?


Example of Accessing Procedures

• For a procedure that computes the GCD of two values i (in t0) and j (in t1):
gcd(i,j);
• The caller puts i and j (the parameter values) in a0 and a1 and issues a

jal ra, gcd # jump to routine gcd

• The callee computes the GCD, puts the result in s0, and returns control to the caller
using

gcd: . . .          # code to compute gcd
     jalr x0, 0(ra) # return

What if the callee needs to use more registers than allocated to argument and
return values?

• Use a stack: a last-in-first-out queue
• One of the general registers, sp, is used to address the stack
• The stack “grows” from high addresses to low addresses
• push: add data onto the stack; the data is stored at the new sp
sp = sp - 4
• pop: remove data from the stack; the data is read from the stack at sp, then
sp = sp + 4
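
The push and pop rules above can be sketched in C with a word array standing in for memory. The initial stack pointer `STACK_TOP` and all function names are hypothetical, and the sketch omits overflow and underflow checks.

```c
#include <stdint.h>

#define STACK_TOP 0x1000u                 /* hypothetical initial sp (byte address) */

static uint32_t memory[STACK_TOP / 4];    /* word-addressed backing store */
static uint32_t sp = STACK_TOP;           /* stack pointer, grows downward */

void push(uint32_t value) {
    sp -= 4;                  /* addi sp, sp, -4 */
    memory[sp / 4] = value;   /* sw value, 0(sp) */
}

uint32_t pop(void) {
    uint32_t value = memory[sp / 4];  /* lw value, 0(sp) */
    sp += 4;                          /* addi sp, sp, 4  */
    return value;
}

/* Hypothetical demo: returns 1 if two pushes and two pops behave last-in-first-out. */
int demo_lifo(void) {
    push(1);
    push(2);
    int sp_moved_down = (sp == STACK_TOP - 8);
    int lifo_order = (pop() == 2) && (pop() == 1);
    return sp_moved_down && lifo_order && (sp == STACK_TOP);
}
```

Two pushes lower sp by 8 bytes, and the pops return the values in reverse order while restoring sp to its starting value.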

Allocating Space on the Stack

• The segment of the stack containing a procedure’s saved registers and local variables is its procedure frame (aka activation record)
• The frame pointer (fp) points to the first word of the frame of a procedure – providing a stable “base” register for the procedure
• fp is initialized using sp on a call and sp is restored using fp on a return

Allocating Space on the Stack

RARS example: allocating space on the stack

• What is the final value of t1?


Allocating Space on the Heap

• Static data segment for constants and other static variables (e.g., arrays)
• Dynamic data segment (aka heap) for structures that grow and shrink (e.g., linked lists)
• Allocate space on the heap with malloc() and free it with free() in C

EX-3: Compiling a C Leaf Procedure
Leaf procedures are ones that do not call other procedures. Give the RISC-V assembly
code for the following.
int leaf_ex (int g, int h, int i, int j)
{
int f;
f = (g+h) - (i+j);
return f;
}
Solution:
Suppose g, h, i, and j are in a0, a1, a2, a3
leaf_ex: addi sp, sp, -8 # make stack room
sw t1, 4(sp) # save t1 on stack
sw t0, 0(sp) # save t0 on stack
add t0, a0, a1
add t1, a2, a3
sub s0, t0, t1
lw t0, 0(sp) # restore t0
lw t1, 4(sp) # restore t1
addi sp, sp, 8 # adjust stack ptr
jalr zero, 0(ra)

Nested Procedures

• Nested procedures: procedures that call other procedures
• What happens to return addresses with nested procedures?

int rt_1 (int i)
{
    if (i == 0) return 0;
    else return rt_2(i-1);
}

Nested procedures (cont.)

caller: jal ra, rt_1
next:   . . .

rt_1:   bne a0, zero, to_2
        add s0, zero, zero
        jalr zero, 0(ra)
to_2:   addi a0, a0, -1
        jal ra, rt_2
        jalr zero, 0(ra)

rt_2:   . . .

• On the call to rt_1, the return address (next in the caller routine) gets stored in ra.

Question:
What happens to the value in ra (when a0 != 0) when to_2 makes a call to rt_2?
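
A sketch of the answer: jal ra, rt_2 overwrites ra with the address of the instruction after that call inside rt_1, so the link back to next is lost unless rt_1 saves ra first (for example, on the stack). Without the save, the final jalr in rt_1 would jump to itself. The C model below tracks only the value of ra; the addresses and the function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical addresses, for illustration only. */
enum {
    NEXT = 0x104,               /* instruction after "jal ra, rt_1" in the caller */
    AFTER_CALL_IN_RT1 = 0x218   /* instruction after "jal ra, rt_2" inside rt_1   */
};

/* Returns the value ra holds when rt_1 executes its final jalr zero, 0(ra). */
uint32_t ra_at_return(int save_ra_on_stack) {
    uint32_t stack_slot = 0;
    uint32_t ra = NEXT;                 /* caller: jal ra, rt_1          */
    if (save_ra_on_stack) {
        stack_slot = ra;                /* rt_1: sw ra, 0(sp)            */
    }
    ra = AFTER_CALL_IN_RT1;             /* to_2: jal ra, rt_2 clobbers ra */
    /* rt_2 returns with jalr zero, 0(ra), which leaves ra unchanged.    */
    if (save_ra_on_stack) {
        ra = stack_slot;                /* rt_1: lw ra, 0(sp)            */
    }
    return ra;
}
```

Only the variant that saves and restores ra ends up returning to next, which motivates the stack discipline used in the recursive example.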
Compiling a Recursive Procedure

A procedure for calculating factorial

int fact (int n)
{
    if (n < 1) return 1;
    else return (n * fact (n-1));
}

• A recursive procedure (one that calls itself!)


fact (0) = 1
fact (1) = 1 * 1 = 1
fact (2) = 2 * 1 * 1 = 2
fact (3) = 3 * 2 * 1 * 1 = 6
fact (4) = 4 * 3 * 2 * 1 * 1 = 24
. . .
• Assume n is passed in a0; result returned in s0

Compiling a Recursive Procedure (cont.)

fact: addi sp, sp, -8 # adjust stack pointer
sw ra, 4(sp) # save return address
sw a0, 0(sp) # save argument n
slti t0, a0, 1 # test for n < 1
beq t0, zero, L1 # if n >= 1, go to L1
addi s0, zero, 1 # else return 1 in s0
addi sp, sp, 8 # adjust stack pointer
jalr zero, 0(ra) # return to caller
L1: addi a0, a0, -1 # n >= 1, so decrement n
jal ra, fact # call fact with (n-1)
# this is where fact returns
bk_f: lw a0, 0(sp) # restore argument n
lw ra, 4(sp) # restore return address
addi sp, sp, 8 # adjust stack pointer
mul s0, a0, s0 # s0 = n * fact(n-1)
jalr zero, 0(ra) # return to caller

Note: execution resumes at bk_f when the recursive call to fact returns.
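
What the assembly does with the stack can be mimicked in C by replacing the recursion with an explicit array of saved arguments. This is a simplified sketch: it records only the saved n of each frame (not ra), skips the frame that the base case pushes and immediately pops, and rejects n above 12 because 13! no longer fits in 32 bits. The name `fact_explicit_stack` is illustrative.

```c
#include <stdint.h>

#define MAX_N 12   /* 12! is the largest factorial that fits in a signed 32-bit word */

int32_t fact_explicit_stack(int32_t n) {
    int32_t stack[MAX_N];   /* one saved argument per active call */
    int depth = 0;

    if (n > MAX_N) {
        return -1;          /* result would overflow 32 bits */
    }

    /* Descent: each call with n >= 1 saves n (sw a0, 0(sp)) and calls fact(n - 1). */
    while (n >= 1) {
        stack[depth++] = n;
        n -= 1;
    }

    int32_t result = 1;     /* base case: addi s0, zero, 1 */

    /* Unwinding: each return lands at bk_f, restores n, and multiplies. */
    while (depth > 0) {
        int32_t saved_n = stack[--depth];   /* lw a0, 0(sp)   */
        result = saved_n * result;          /* mul s0, a0, s0 */
    }
    return result;
}
```

For n = 4 the saved arguments are popped in the order 1, 2, 3, 4, giving 1 * 1 * 2 * 3 * 4 = 24, matching the fact (4) line of the slide.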

Compiling a Recursive Procedure (cont.)

RARS example: compiling a recursive procedure

• What is the final value of t1?


Summary
The C Code Translation Hierarchy

C program

compiler

assembly code

assembler

object code library routines

linker

machine code executable

loader

memory

Compiler Benefits

• Comparing performance for bubble (exchange) sort


• To sort 100,000 words with the array initialized to random values on a Pentium 4
with a 3.06 GHz clock rate, a 533 MHz system bus, and 2 GB of DDR SDRAM, running
Linux version 2.4.20

The un-optimized code has the best CPI (clock cycles per instruction), the O1 version has the lowest
instruction count, but the O3 version is the fastest.

gcc opt          Relative performance   Clock cycles (M)   Instr count (M)   CPI
None             1.00                   158,615            114,938           1.38
O1 (medium)      2.37                   66,990             37,470            1.79
O2 (full)        2.38                   66,521             39,993            1.66
O3 (proc mig)    2.41                   65,747             44,993            1.46

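
The derived columns of the table follow from the measured ones: CPI is clock cycles divided by instruction count, and, assuming the clock rate is the same for every run, relative performance is the un-optimized cycle count divided by the cycle count of the version in question. A minimal C sketch, with illustrative function names:

```c
/* CPI = clock cycles / instruction count (both given in millions). */
double cpi(double clock_cycles_m, double instr_count_m) {
    return clock_cycles_m / instr_count_m;
}

/* Speedup over the baseline, assuming an identical clock rate for both runs. */
double relative_performance(double baseline_cycles_m, double cycles_m) {
    return baseline_cycles_m / cycles_m;
}
```

For the O3 row, 65,747 / 44,993 gives a CPI of about 1.46 and 158,615 / 65,747 gives a relative performance of about 2.41, so a higher instruction count than O1 is more than offset by the lower CPI.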
Addressing Modes Illustrated
