6 RISC-V
In this chapter, we will introduce the RISC-V assembly language, which is newer than ARM
and x86. The ISA has a very interesting history. There were both legal and technical reasons
for designing a new RISC ISA as late as 2010. Recall that ISA design had been considered a
dead area long before that. Furthermore, a massive SW-HW ecosystem is needed to sustain an ISA,
and such an ecosystem is hard to build from scratch. Nevertheless, there was a requirement in 2010 for a RISC
ISA that could be freely used by everybody and that incorporated a lot of the technical know-how
that had been generated in the past two decades of computer architecture research.
Any modern ISA should be suitable for all kinds of devices, ranging from IoT devices
to mobile phones, laptops and servers. Many of these devices didn’t exist when conventional
ISAs were designed. This means that it should have 32, 64 and 128-bit variants, support a
relatively large number of registers, and have extensive support for atomics and floating point
numbers.
Let us understand the situation that prevailed in 2010. Designing a simple RISC processor
had become relatively easy. The architectures were well understood, EDA tools were
reasonably mature and enough computational power was available to even amateur designers.
Hence, putting together a small RISC core became feasible for fabless companies and academic
groups. It is important to note that circa 2010, there was a widespread consensus that RISC is
the way to go for new cores and x86-like CISC ISAs were not an option. This is because of the
1000+ instructions in CISC ISAs and the resulting decoding complexity. Decoders are power
hungry and given that these ISAs were meant to run on low-power embedded systems, a CISC
ISA was not a feasible option.
However, just designing a RISC processor is not enough. It needs to be fabricated as
well. A lot of fabs increased their capacity at that time and it also became possible to design
reasonably high-performance chips at older technology nodes. Hence, fabrication became quite
inexpensive. Along with these technology-level tailwinds, many fabless companies rapidly came
up in different parts of the world. They either wanted to design their bespoke processors or
wanted to integrate their custom cores with third-party accelerators in SoCs. They always had
c Smruti R. Sarangi 226
an option to use or modify existing RISC cores designed by ARM and MIPS or design their own
processors that are compatible with their ISAs. The advantage of using existing technologies is
that their toolchain can be used. This includes the compilers, operating systems, libraries and
binutils.
Despite so much software support, many processor developers decided to forgo the
advantages of using existing ISAs, primarily for legal reasons. There were a fair number of
restrictions and licensing requirements for using such legacy ISAs, even when designing new
processors from scratch. Furthermore, licensing existing cores and using them in SoCs was also
proving to be an expensive proposition for many developers. There was thus a need to create a
new ISA from scratch and also create the full ecosystem to support it. The idea was to have
extremely lenient licensing requirements such that the barrier to entry is reduced to a minimum.
In 2010, the RISC-V project began in Berkeley. It was initially supported by various academic
groups. The RISC-V technical documents were later released under a Creative Commons
license in 2015. Currently, the RISC-V foundation maintains this ISA and publishes
regular updates.
Name      Description

ISA base versions
RV32      32-bit ISA
RV32E     32-bit ISA (embedded version)
RV64      64-bit ISA
RV64E     64-bit ISA (embedded version)
RV128     128-bit ISA

Extensions
E         embedded version
I         base integer ISA
M         integer multiplication/division instructions
A         atomic instructions
F         single-precision floating point
D         double-precision floating point
V         vector instructions
RISC-V has three base versions: RV32, RV64 and RV128. The term “RV” is a short form
of RISC-V. The numbers 32, 64 and 128 indicate the bit width, respectively. The ‘E’ suffix
has a special place. It indicates the embedded version that uses a reduced number of registers.
For example, RV32E assumes only 16 integer registers as opposed to 32 integer registers in the
regular version. It is important to note that unlike other ISAs (such as ARM Thumb), the
instruction sizes remain the same. This simplifies compiler and processor design.
Let us now focus on the extensions. Most versions of the ISAs support basic integer instruc-
tions. They are thus named RV32I, RV64I and RV128I, respectively. Then, there are a bunch of
extensions that can be added based on users’ requirements. The list of extensions has become
quite large as of 2024 (around 20-30). Most of the common ones correspond to floating point
instructions, atomic operations, cryptographic primitives, memory barriers, etc. For example,
RV32IMA means that integer instructions (I), multiplication/division instructions (M) and
atomic instructions (A) are supported.
These extensions themselves can have version numbers: a major version number and a
minor version number. This is because the specifications keep changing as the ISAs are under
development. For example, RV32I1p3 means that the major version number is 1 and the minor
version is 3. The separator ‘p’ is used to separate the major and minor version numbers.
The extensions can be grouped into packages the same way we bundle together add-ons
in flight or hotel deals. The ‘G’ suffix, which represents general-purpose computing, combines
the base integer instruction set, additional integer instructions, floating point instructions and
basic synchronization primitives. This is considered to be an essential set of instructions in a
multi-core setup. RV32G is thus a general-purpose RISC-V ISA. The number of extensions
keeps increasing, and we are running out of single letters.
This is why the ‘Z’ series was introduced, where the extension name (suffix) should start
with ‘Z’ and be followed by a word that describes the extension. For example, ‘Zfa’ refers to
additional floating point instructions.
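To make the naming scheme concrete, here is a minimal Python sketch that splits an ISA name into its bit width and its list of extensions. The function name parse_isa is hypothetical, and the sketch assumes that multi-letter ‘Z’ extensions may be separated by underscores; it is an illustration of the convention described above, not a complete parser.

```python
import re

def parse_isa(name):
    """Split an ISA name such as 'RV32IMA' into (width, extensions)."""
    m = re.fullmatch(r"RV(32|64|128)(.*)", name)
    if m is None:
        raise ValueError(f"not an RV32/RV64/RV128 name: {name}")
    width, rest = int(m.group(1)), m.group(2)
    extensions = []
    # Each token is a 'Z...' word or a single letter, optionally followed by
    # a version written as <major>p<minor>. An underscore separator between
    # tokens is an assumption of this sketch.
    for tok in re.finditer(r"(Z[a-z]+|[A-Z])(?:(\d+)p(\d+))?_?", rest):
        ext, major, minor = tok.groups()
        version = (int(major), int(minor)) if major else None
        extensions.append((ext, version))
    return width, extensions

print(parse_isa("RV32IMA"))
print(parse_isa("RV32I1p3"))
```

For RV32I1p3, the sketch reports the extension I with major version 1 and minor version 3, which matches the description of the ‘p’ separator.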
Let us now recall the fact that the embedded version of RISC-V does not reduce the instruction
width; it instead reduces the number of available registers by 50%. Along the lines of
ARM Thumb, RISC-V does have a compressed format. In this case, the ISA specifier is ‘C’.
In such contexts, we most often use the ISA RV32GC (general-purpose and compressed). The
compressed instructions have the following limitations.
• They access a limited number of registers (typically limited to 8 registers in the 16-bit
version).
Akin to any other compressed ISA, they lead to reduced code size, better usage of the
i-cache and lower power consumption in terms of fetching and decoding instructions.
Important Point 7 The RISC-V ISA is a RISC ISA. It is however not small and simple
like SimpleRisc. Instead, it has a base set of instructions, a set of extensions, 48 and 64-bit
instruction lengths (may get fully frozen in the future) and different ISA variants including
a compressed 16-bit form. This sounds more like “CISC”. However, there is still a lot of
regularity in the ISA: there are only a few instruction formats, instructions are mostly 32 bits
in length and the base ISA is very “RISC like”. Such trade-offs are inevitable in designing
any modern ISA that needs to support a wide range of devices, from embedded processors to
supercomputers.
Given the machine model, let us now explain the instructions supported by the RV32I ISA.
Note that this is our fourth chapter on low-level assembly languages. Hence, the treatment will
be brief.
The most basic operation in any assembly language is to load a value into a register. We
typically transfer the contents from another register or from the immediate field of an instruc-
tion. We need a counterpart of the mov instruction in RISC-V. The relevant instructions are
shown in Table 6.3. RISC-V does not have a dedicated mov instruction; instead, we add an
immediate to the zero register and store the result in the destination register.
Specifically, the addi instruction can be used to load a signed 12-bit immediate to the
destination register. In this case, its usage is somewhat unconventional. As the example shows,
the immediate is added to the contents of x0 (zero register), which is hardwired to 0. Effectively,
the immediate gets transferred to the destination register. An advantage of such a mechanism
is that we need not have a dedicated mov instruction. The add instruction and its variants
can be used to load immediates. Similarly, we can set the immediate to 0 and transfer the
contents of the source register to the destination register. This simulates a regular register mov
instruction. We can alternatively use the regular add instruction to achieve this. We can set
the second register operand to zero. The net effect is that the contents are transferred to the
destination register. The add instruction otherwise does the same as its counterparts in ARM
and SimpleRisc.
A major issue with the addi instruction is that the immediate is limited to 12 bits. Loading a
full 32-bit value thus requires several instructions. RISC-V therefore provides the lui instruction
that loads a 20-bit immediate into the upper 20 bits of a register – the immediate is effectively
left-shifted by 12 positions. The semantics of this instruction is shown in Table 6.4 (also refer
to Example 91).
Example 91
Write a RISC-V assembly program to add 409932 + 409823.
Answer:
.main:
    lui t0, 100        # t0 = 4096 * 100 = 409600
    addi t0, t0, 332   # t0 = t0 + 332
    lui t1, 100        # t1 = 4096 * 100 = 409600
    addi t1, t1, 223   # t1 = t1 + 223
    add t2, t0, t1     # t2 = t0 + t1
It is evident from Example 91 that loading a 32-bit value into a register requires two
instructions. Even though the ISA has this limitation, most RISC-V assemblers support the
assembler directive li that directly loads a 32-bit value into a register. The assembler replaces
the directive with two assembly instructions: lui followed by addi. The code in Example 91 can be
compressed using the li assembler directive (refer to Example 92).
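Let us sketch in Python how a 32-bit constant can be split into the two immediates. The function names split_li and run_lui_addi are hypothetical. One subtlety is that addi sign-extends its 12-bit immediate; hence, when the lower 12 bits are 2048 or more, the sketch uses a negative lower part and increases the upper part by one. Actual assemblers may choose a different expansion.

```python
def split_li(value):
    """Split a 32-bit constant into (upper20, lower12) for lui + addi."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:        # addi would treat this as a negative immediate
        lower -= 0x1000
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower

def run_lui_addi(upper, lower):
    """Model 'lui rd, upper' followed by 'addi rd, rd, lower' on RV32."""
    rd = (upper << 12) & 0xFFFFFFFF      # lui: immediate shifted left by 12
    return (rd + lower) & 0xFFFFFFFF     # addi: add the signed lower part

print(split_li(409932))                  # the constant used in Example 91
print(hex(run_lui_addi(*split_li(0x12345FFF))))
```

For 409932, the split yields the same pair (100, 332) that Example 91 uses.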
Example 92
Write a RISC-V assembly program to add 409932 + 409823 using the li assembler directive.
Answer:
.main:
    li t0, 409932      # t0 = 409932
    li t1, 409823      # t1 = 409823
    add t2, t0, t1     # t2 = t0 + t1
Table 6.5 shows the general form of the add and sub instructions in RISC-V. They have the
same general format as SimpleRisc add and sub instructions, respectively. The generic format
is inst rd, rs1, rs2/imm.
Example 93
Write a RISC-V assembly program to compute 4 + 5 - 19.
Answer:
    addi t0, zero, 4   # load 4 into t0
    addi t1, zero, 5   # load 5 into t1
    add t2, t0, t1     # t2 = t0 + t1
    addi t2, t2, -19   # subtract 19 from t2
Example 94
Write an assembly program to swap two numbers stored in x1 and x2.
Answer:
    add x3, x0, x1     # x3 = x1
    add x1, x0, x2     # x1 = x2
    add x2, x0, x3     # x2 = x3 (old x1)
Table 6.6 shows the multiplication and division instructions. They are a part of the ‘M’
extension. The reason for including them in an extension is to enable the creation of really
low-end and low-power implementations that do not require such instructions.
The multiplication instruction has some complications. The product of two 32-bit operands can
require 64 bits, which means that it will not fit in a single register. The default mul instruction
thus places only the lower 32 bits in the destination register. However, sometimes there is a
need to store the full 64-bit product; this requires two registers. The mulh and mulhu
instructions can then be used to compute the upper 32 bits for signed
× signed and unsigned × unsigned multiplication, respectively. Even though we require two
separate instructions now, micro-architectures can fuse them dynamically. They can identify
two consecutive multiplication instructions where one instruction computes the lower 32 bits and
the next instruction computes the upper 32 bits, and then perform only a single multiplication.
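The following Python sketch models the results of mul, mulh and mulhu on 32-bit register values. The helper to_signed and the constant MASK32 are assumptions of this sketch; the functions only mirror the description above (lower 32 bits, upper 32 bits of the signed product, upper 32 bits of the unsigned product).

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(a, b):
    """Lower 32 bits of the product (same bits for signed and unsigned)."""
    return (a * b) & MASK32

def mulh(a, b):
    """Upper 32 bits of the signed x signed product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhu(a, b):
    """Upper 32 bits of the unsigned x unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

lo, hi = mul(3, -17), mulh(3, -17)
print(hex(hi), hex(lo))   # the 64-bit pattern of -51, split into two halves
```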
Example 95
Write an assembly program to multiply 3 with -17 and save the result in t3.
Answer:
    addi t1, zero, 3     # t1 = 3
    addi t2, zero, -17   # t2 = -17
    mul t3, t1, t2       # t3 = t1 * t2
Example 96
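Compute 12^3 + 1 and save the result in t4.
Answer:
    # load the registers with required values
    addi t1, zero, 1     # t1 = 1
    addi t2, zero, 12    # t2 = 12
    addi t3, zero, 12    # t3 = 12
    # remaining steps (reconstructed; the original listing is cut off here)
    mul t3, t3, t2       # t3 = 12 * 12
    mul t3, t3, t2       # t3 = 12 * 12 * 12
    add t4, t3, t1       # t4 = 12^3 + 1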
The division instruction div is comparatively simpler. In the RV32 variant, it requires 32-
bit dividends and divisors. The quotient is stored in the destination register. The rounding is
towards zero. Let us explain rounding using a few examples.
We see that rounding towards zero also means that the sign of the remainder is the same
as the sign of the dividend. The remainder instruction works on similar lines. It computes the
remainder of the division operation (rounding towards zero).
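A few concrete cases can be worked out with a small Python sketch. The functions div and rem below are hypothetical models of the two instructions for a non-zero divisor. Note that Python's own // and % operators round towards −∞, which is why the sketch does not use them directly on negative operands.

```python
def div(a, b):
    """Signed quotient, rounded towards zero (b is assumed to be non-zero)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def rem(a, b):
    """Remainder that matches div: a = b * div(a, b) + rem(a, b)."""
    return a - b * div(a, b)

for a, b in [(50, 3), (-50, 3), (50, -3), (-50, -3)]:
    print(a, b, div(a, b), rem(a, b))
```

In every case the remainder has the sign of the dividend. For example, dividing -50 by 3 gives a quotient of -16 and a remainder of -2.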
The division and remainder operations are handled in the same manner as the multiplication
operations. When they are issued back to back, micro-architectures are expected to fuse
them: they perform a single division operation and store the results in two registers, one
register for the quotient and one for the remainder.
This is an example of a scenario where the ISA has deliberately been under-designed. Instead
of having an instruction that writes to two 32-bit registers, the programmer or the compiler is
expected to invoke these instructions consecutively. It is the job of the hardware to dynamically
identify such sequences and fuse them. This keeps the ISA simple and transfers the
responsibility of ensuring efficiency to the hardware.
Example 97
Write a RISC-V assembly program to divide -50 by 3. Store the quotient in t2 and
remainder in t3.
Answer:
    addi t0, zero, -50   # t0 = -50
    addi t1, zero, 3     # t1 = 3
    div t2, t0, t1       # quotient in t2
    rem t3, t0, t1       # remainder in t3
Example 98
Write a RISC-V assembly program to compute the bitwise OR of A and B. Let A = 4 and
B = 1.
Answer:
    addi t1, zero, 4     # t1 = 4
    ori t2, t1, 1        # bitwise OR of 4 and 1
Akin to other ISAs, RISC-V has three shift instructions: shift left logical (sll), shift right
logical (srl) and shift right arithmetic (sra). They have their variants where the second source
is an immediate. They are slli, srli and srai, respectively.
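The difference between the logical and arithmetic right shifts shows up only for negative values. The Python sketch below models the three shifts on 32-bit register values; the function names mirror the instruction names, and the sketch assumes that only the lower 5 bits of the shift amount are used in RV32.

```python
MASK32 = 0xFFFFFFFF

def sll(x, n):
    """Shift left logical: vacated bits are filled with zeros."""
    return (x << (n & 31)) & MASK32

def srl(x, n):
    """Shift right logical: the sign bit is not preserved."""
    return (x & MASK32) >> (n & 31)

def sra(x, n):
    """Shift right arithmetic: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret the pattern as a negative number
    return (x >> (n & 31)) & MASK32

print(sra(50, 2))                 # 50 / 4, rounded down
print(hex(sra(-50, 2)), hex(srl(-50, 2)))
```

For a negative operand, sra rounds towards −∞ (the result for -50 is the bit pattern of -13), whereas div rounds towards zero.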
Example 99
Write RISC-V assembly code to compute 50/4.
Answer:
    addi t0, zero, 50    # t0 = 50
    srai t1, t0, 2       # t1 = 50/4
Example 100
Answer:
    addi t3, zero, 5     # t3 = 5
    addi t2, zero, 7     # t2 = 7
    slli t4, t3, 2       # t4 = t3 * 4
    add t1, t2, t4       # t1 = t2 + t3 * 4
Table 6.8: The slt family of instructions. The destination register is by default set to 0.
Table 6.8 shows the slt family of instructions. They compare the values of two registers, or
a register and an immediate. If the first source operand is less than the second source operand,
then the destination register’s value is set to 1. Otherwise, it remains 0. The conditional branch
instructions can then directly compare this register with zero and decide the outcome of the
branch instruction: taken or not-taken.
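The Python sketch below models slt and its unsigned counterpart sltu (an RV32I instruction that compares the operands as unsigned numbers). As one possible use, the hypothetical helper add64 detects the carry out of a 32-bit addition with sltu, which is a common way to add 64-bit values on a 32-bit machine.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def slt(a, b):
    """1 if a < b when both are read as signed numbers, else 0."""
    return 1 if to_signed(a) < to_signed(b) else 0

def sltu(a, b):
    """1 if a < b when both are read as unsigned numbers, else 0."""
    return 1 if (a & MASK32) < (b & MASK32) else 0

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (upper, lower) pairs of 32-bit words."""
    lo = (a_lo + b_lo) & MASK32
    carry = sltu(lo, a_lo)        # the lower sum wrapped around
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo

print(slt(-1, 1), sltu(-1, 1))    # -1 is the largest unsigned 32-bit value
print(add64(0, 1, 2, 0xFFFFFFFF))
```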
Example 101
Write RISC-V assembly code to set t2 if 2 < 5.
Answer:
    addi t0, zero, 2     # t0 = 2
    addi t1, zero, 5     # t1 = 5
    slt t2, t0, t1       # t2 = (t0 < t1)
Example 102
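Add two long 64-bit values stored in ⟨t1, t0⟩ and ⟨t3, t2⟩. Store the result in ⟨t5, t4⟩.
Answer:
    # initialize the registers
    addi t2, zero, -1
    addi t3, zero, 2
    addi t0, zero, 1
    addi t1, zero, 0
    # add the two halves (reconstructed; the original listing is cut off here)
    add t4, t0, t2       # lower 32 bits
    sltu t6, t4, t0      # carry = 1 if the lower sum wrapped around
    add t5, t1, t3       # upper 32 bits
    add t5, t5, t6       # add the carry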
Branch Instructions
The conditional branch instructions in RISC-V are shown in Table 6.9. The instructions
take two register arguments and compare them. The result of the comparison is immediately
used to decide the direction of the branch.
Table 6.9 shows the beq, bne, bge and blt instructions that have their usual meanings. The
third argument is a label that represents the branch target. Along with these signed comparison
instructions, RISC-V has comparison instructions to compare unsigned integers: bgeu and bltu.
Recall that ARM also has similar instructions that are implemented with the help of custom
flags.
Example 103
Write a RISC-V assembly program to compute the factorial of a positive number (> 1)
stored in a1. Save the result in a0.
Answer:
.main:
    addi a0, zero, 1     # prod = 1
    addi t0, zero, 1     # index = 1
.loop:
    mul a0, a0, t0       # prod = prod * index
    addi t0, t0, 1       # index++
    bge a1, t0, .loop    # loop condition
Example 104
Write an assembly program to add the numbers from 1 to 10. Store the result in s0.
Answer:
.main:
    addi t0, zero, 1     # initialize t0 to 1
    addi s0, zero, 0     # result (s0) = 0
    addi t1, zero, 10    # loop end value
.loop:
    add s0, s0, t0       # add to the result
    addi t0, t0, 1       # increment the counter
    bge t1, t0, .loop    # loop condition
Example 105
Write an assembly program to test if a number stored in a1 is prime or not. Save the
Boolean result in a0.
Answer:
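# input in a1 (assumed to be at least 2), return value in a0
# the loop control and the .prime block are reconstructed; the original listing is incomplete
.main:
    addi t0, zero, 2           # starting divisor
.loop:
    bge t0, a1, .prime         # no divisor found below a1
    rem t1, a1, t0             # find the remainder (t1)
    beq t1, zero, .notPrime
    addi t0, t0, 1             # try the next divisor
    jal zero, .loop
.prime:
    addi a0, zero, 1
    jal zero, .end
.notPrime:
    addi a0, zero, 0
.end:
    # a0 contains the result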
Example 106
Write an assembly program to find the number of ones in a 32-bit number stored in a1.
Answer:
.main:
    addi t0, zero, 0     # counter, t0 = 0
    addi t1, zero, 32    # maximum possible ones
    addi t2, zero, 1     # t2 = 1
    addi a0, zero, 0     # will contain the result (a0 = 0)
.loop:
    andi t3, a1, 1       # check the LSB of the argument a1
    srli a1, a1, 1       # shift the argument by 1 step
    beq t3, t2, .inc     # jump to .inc if the LSB is 1
.lret:
    addi t0, t0, 1       # increment the counter
    blt t0, t1, .loop    # next iteration (reconstructed; missing in the original listing)
    jal zero, .end       # all 32 bits examined (reconstructed)
.inc:
    addi a0, a0, 1       # increment the count of 1s
    jal zero, .lret      # resume the next iteration
.end:
    # a0 contains the result
Example 107
Write an assembly program to check if a natural number stored in a1 is a perfect square or
not. Save the Boolean result in a0.
Answer:
.main:
    # input number in a1
    addi a1, zero, 101
    addi a0, zero, 0     # assuming result (a0) = false
    addi t1, zero, 1     # counter (t1) = 1
.loop:
    mul t2, t1, t1       # square -> compare
    beq t2, a1, .square  # it is a square
    addi t1, t1, 1       # increment the counter
    blt a1, t2, .end     # the square exceeded a1: not a perfect square
    jal zero, .loop      # loop back
.square:
    addi a0, a0, 1       # result = 1
.end:
    # result in a0
The jal instruction jumps to the target label and
additionally stores the return address (pc+4) in its first register operand, which is the
destination register (x1 in the example).
Note that if this register is x0 (zero), then the return address is not stored.
jal in this case acts as a regular unconditional jump that does not store the return address.
The jalr instruction augments jal with one additional register argument. Consider the
example: jalr x1, x2, 20. In this case, we add the offset 20 to the contents of x2 and jump to
the resulting address. The return address is stored in x1. Similar to jal, we do not store the
return address if the destination register is x0. The jalr instruction can be used to implement
a function return instruction. All that we have to do is to jump to the PC whose value is 0(ra)
(contents of the register ra + 0).
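The call and return mechanism can be traced with a small Python sketch. The functions jal and jalr below are hypothetical models that return the next PC and update a list of 32 register values. The sketch also clears the least significant bit of the jalr target, which the RISC-V specification mandates.

```python
def jal(regs, pc, rd, offset):
    """Jump to pc + offset; save the return address unless rd is x0."""
    if rd != 0:
        regs[rd] = pc + 4
    return pc + offset

def jalr(regs, pc, rd, rs1, offset):
    """Jump to regs[rs1] + offset; save the return address unless rd is x0."""
    target = (regs[rs1] + offset) & 0xFFFFFFFE   # computed before rd is written
    if rd != 0:
        regs[rd] = pc + 4
    return target

regs = [0] * 32
pc = jal(regs, 100, 1, 20)     # call: jal ra, <label 20 bytes ahead>
print(pc, regs[1])             # now at 120, return address 104 in ra (x1)
pc = jalr(regs, pc, 0, 1, 0)   # return: jalr zero, 0(ra)
print(pc)
```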
Example 108
Write a RISC-V assembly program that has a function call.
Answer:
.main:
    addi s0, zero, 3     # s0 = 3
    jal ra, .foo         # jump to .foo
    add s1, s0, a0       # y = x + foo()
Example 109
Write a RISC-V assembly program to compute x^n and store the result in a0. x is passed
through a1 and n is passed through a2.
Answer:
.power:
    addi a0, zero, 1     # a0 will contain the result
    add t1, zero, a2     # t1 = n
    beq t1, zero, .end   # check (n == 0)
.loop:
    mul a0, a0, a1       # result *= x
    addi t1, t1, -1      # decrement n
    bne t1, zero, .loop
.end:
    jalr zero, 0(ra)     # return (reconstructed; missing in the original listing)
.main:
    addi a1, zero, 7     # x = 7
    addi a2, zero, 3     # n = 3
    jal ra, .power       # call the function (reconstructed)
    # the result is in a0
Table 6.11: Load and store instructions. Note that la is an assembler directive.
Table 6.11 shows the load and store instructions in RISC-V. We only show the 32-bit versions
of these instructions. The lw instruction loads 32-bit values from memory that is specified in
the base-offset format. On similar lines, the sw instruction stores the value of a register to
memory. Note that the store instruction takes two register source operands, and it has its
separate format. The store operation has always been an exception in such respects.
RISC-V defines a special format for it, which accepts two register-based source operands and an
immediate.
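The base-offset addressing used by lw and sw can be modeled in a few lines of Python. In this sketch, memory is a bytearray, registers are a list of 32 integers, and values are stored in little-endian byte order; the function signatures follow the operand order of the assembly syntax and are otherwise an assumption of the sketch.

```python
import struct

def sw(mem, regs, rs2, offset, rs1):
    """sw rs2, offset(rs1): store the 32-bit value of rs2 at regs[rs1] + offset."""
    addr = regs[rs1] + offset
    struct.pack_into("<I", mem, addr, regs[rs2] & 0xFFFFFFFF)

def lw(mem, regs, rd, offset, rs1):
    """lw rd, offset(rs1): load the 32-bit word at regs[rs1] + offset into rd."""
    addr = regs[rs1] + offset
    if rd != 0:                           # x0 always remains 0
        (regs[rd],) = struct.unpack_from("<I", mem, addr)

mem = bytearray(64)
regs = [0] * 32
regs[2] = 16                              # sp (x2)
regs[5] = 0xDEADBEEF                      # t0 (x5)
sw(mem, regs, 5, 8, 2)                    # sw t0, 8(sp)
lw(mem, regs, 10, 8, 2)                   # lw a0, 8(sp)
print(hex(regs[10]), mem[24:28].hex())
```

Note that the store reads two registers (the value and the base address), whereas the load reads only one; this is why the store needs its own format.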
Example 110
Write an assembly program to load a0 with the contents of the memory address
sp − s0 × 4 − 12.
Answer:
.main:
    slli s0, s0, 2       # s0 = s0 * 4
    addi s0, s0, 12      # s0 = s0 + 12
    sub t0, sp, s0       # t0 = sp - s0
    lw a0, 0(t0)         # load the value of mem[t0] in a0
Example 111
Write an assembly program to create a copy of a 10-element array. Assume the start-
ing address of the original array is stored in a1 and that of destination array is stored in a2.
Answer:
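.main:
    addi t1, zero, 0     # counter (t1) = 0
    addi t2, zero, 10    # number of iterations
.loop:
    lw t0, 0(a1)         # load an element from the source array
    sw t0, 0(a2)         # store an element in the destination array
    # loop control (reconstructed; the original listing is cut off here)
    addi a1, a1, 4       # next source element
    addi a2, a2, 4       # next destination element
    addi t1, t1, 1       # increment the counter
    blt t1, t2, .loop    # loop condition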
Example 112
Write a RISC-V assembly program to compute the sum of the elements in a 10-element
array. Assume that the base address of the array is stored in a1. Store the result in a0.
Answer:
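int idx;
int sum = 0;
for (idx = 0; idx < 10; idx++) {
    sum = sum + a[idx];
}

# setup and loop control are reconstructed; the original listing is incomplete
.main:
    addi a0, zero, 0     # sum = 0
    addi t0, zero, 0     # idx = 0
    addi t1, zero, 10    # number of elements
.loop:
    lw t2, 0(a1)         # load an element in t2
    add a0, a0, t2       # update the result
    addi a1, a1, 4       # address of the next element
    addi t0, t0, 1       # idx++
    blt t0, t1, .loop    # loop condition
    # result in a0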
Example 113
Write a RISC-V assembly program to compute the factorial of a number (stored in a1)
using recursion. Store the result in a0.
Answer:
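.fact:
    # check if n (in a1) is 0 or 1
    addi t1, zero, 1     # t1 = 1
    bge t1, a1, .ltone   # if (a1 <= 1) jump to .ltone
    # recursive call (the stack handling and the multiplication are
    # reconstructed; the original listing is incomplete)
    addi sp, sp, -8      # make space on the stack
    sw ra, 4(sp)         # save the return address
    sw a1, 0(sp)         # save n
    addi a1, a1, -1      # argument = n - 1
    jal ra, .fact
    lw a1, 0(sp)         # restore n
    lw ra, 4(sp)         # restore the return address
    addi sp, sp, 8
    mul a0, a0, a1       # result = n * fact(n - 1)
    jalr zero, 0(ra)     # return
.ltone:
    addi a0, zero, 1     # result is 1
    jalr zero, 0(ra)     # return
.main:
    addi a1, zero, 5     # compute 5!
    jal ra, .fact        # call the factorial function
    # result in a0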
Example 114
Define a constant val that is initialized to 17. Store its value in register s0 after loading it
from memory.
Answer:
val: .word 17
    la t1, val
    lw s0, 0(t1)
It is not possible to directly load an immediate into a floating point register. Like x86,
floating point registers can only be initialized by loading values from memory.
Bits     31..8      7..5                   4    3    2    1    0
Field    Reserved   Rounding Mode (frm)    NV   DZ   OF   UF   NX    (bits 4..0: Accrued Exceptions, fflags)
Width    24         3                      1    1    1    1    1

Mnemonic   Explanation
NV         Invalid operation
DZ         Divide by zero
OF         Overflow
UF         Underflow
NX         Inexact
The fflags field stores five flags, which are also known as the accrued exception flags. The
first four flags (invalid operation, divide by zero, overflow and underflow) have their standard
meanings. Let us discuss the fifth flag (inexact) that we have not encountered before. This is
set when the result cannot exactly be stored in a floating point register and some rounding was
required. Next, let us discuss the different rounding modes. They are stored in bits 5-7 of the
fcsr.
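As a quick illustration of this layout, the following Python sketch extracts the frm field and the names of the raised flags from a 32-bit fcsr value. The function name decode_fcsr is hypothetical; the bit positions are those of the layout described above.

```python
FLAG_NAMES = ["NX", "UF", "OF", "DZ", "NV"]    # bit 0 up to bit 4

def decode_fcsr(fcsr):
    """Return (frm, list of raised accrued exception flags)."""
    frm = (fcsr >> 5) & 0b111                  # bits 7..5
    fflags = fcsr & 0b11111                    # bits 4..0
    raised = [name for bit, name in enumerate(FLAG_NAMES)
              if (fflags >> bit) & 1]
    return frm, raised

print(decode_fcsr(0b00100101))   # frm = 1, flags NX and OF are set
```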
Rounding Modes
RISC-V instructions can use a static rounding mode (encoded in the instruction) or a
dynamic rounding mode (encoded in the fcsr’s frm field). The default rounding mode is RNE
(round to nearest, ties to even).
We round the result to the nearest value that can be represented in the IEEE 754 format. If the
real value is exactly midway between two representable values, then the result is rounded to the value that has
an even LSB. The next rounding mode is RTZ, which is round towards zero. It is equivalent
to truncation where the bits that cannot be fit in the format are simply removed. The next
two rounding modes are self-evident: RDN (round towards −∞ or the floor function) and RUP
(round towards +∞ or the ceiling function).
The RMM rounding mode is similar to RNE. However, if the result is exactly midway between two
representable values, then we round towards the number that has the higher magnitude (away from
zero). The next two values are not used at the moment. Finally, the DYN mode selects a
rounding mode dynamically (stored in the frm field of the fcsr).
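The five rounding modes can be compared using Python's decimal module, which offers rounding rules with the same behavior. This is only an analogy: the sketch rounds decimal numbers to integers, whereas the hardware rounds binary significands. The mapping in MODES and the function round_to_int are assumptions of this sketch.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_CEILING, ROUND_HALF_UP)

MODES = {
    "RNE": ROUND_HALF_EVEN,   # nearest, ties to even
    "RTZ": ROUND_DOWN,        # towards zero
    "RDN": ROUND_FLOOR,       # towards minus infinity
    "RUP": ROUND_CEILING,     # towards plus infinity
    "RMM": ROUND_HALF_UP,     # nearest, ties away from zero
}

def round_to_int(text, mode):
    """Round a decimal string to an integer using the given mode name."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))

for value in ("2.5", "-2.5", "2.7"):
    print(value, {mode: round_to_int(value, mode) for mode in MODES})
```

The tie cases 2.5 and -2.5 are the ones where RNE and RMM differ.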
Let us now look at the basic floating point load and store instructions in Table 6.14. They
load and store values from memory, respectively. They do not perform type conversion. The
fcvt instruction and its variants can be used to perform type conversion, as we shall see later.
The key idea in RISC-V is the same as in x86, which is that floating point immediates
cannot be directly loaded into registers. Their contents need to be stored in memory first and
then the 32-bit floating point value can be loaded into a floating point register. In this sense,
this part of the ISA is less powerful than its integer counterpart. However, this does not cause
much of a performance loss in practice because most of the time we do not face the need for
loading floating point immediates, other than while loading built-in constants such as π and e.
In this case, we can declare the constant in memory using an assembler directive and then use
the assembler pseudoinstruction la to load its starting memory address into a register (refer to
Example 115). The floating point load and store instructions otherwise are quite similar to
their integer counterparts in RISC-V.
Example 115
Load the value of a constant val into a floating point register fs1.
Answer:
val: .float 3.14
.main:
    la a1, val
    flw fs1, 0(a1)
Example√116
Compute π + e + π × e, and store the result in fa0.
Answer:
# declare the constants
pi: .float 3.14
e: .float 2.72
.main:
    # load them into floating point registers
    la a1, pi
    flw fs1, 0(a1)
    la a2, e
    flw fs2, 0(a2)
To support operations such as dot products, matrix multiplication, and similar operations,
RISC-V supports a few fused arithmetic instructions such as the fused multiply-add and
multiply-subtract operations (refer to Table 6.16). The fused add instruction (fmadd.s) takes three register source
operands as arguments. It multiplies the first two and adds the product to the third. On similar
lines, the fused subtract instruction subtracts the third source operand from the product of the
first two register-based source operands.
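The following Python sketch shows why such instructions suit dot products: each loop iteration maps to one fused operation. The functions fmadd and fmsub are hypothetical models that capture only the arithmetic; they ignore the rounding behavior of the hardware.

```python
def fmadd(a, b, c):
    """Model of the fused add: (a * b) + c."""
    return a * b + c

def fmsub(a, b, c):
    """Model of the fused subtract: (a * b) - c."""
    return a * b - c

def dot(xs, ys):
    """Dot product with one fused multiply-add per element."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fmadd(x, y, acc)    # acc = x * y + acc
    return acc

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```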
Example 117
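Compute π × e + 4, and store the result in fa0. Convert the result to an integer and store
the result in a0.
Answer:
pi: .float 3.14
e: .float 2.72
.main:
    la a1, pi                    # load pi
    flw fs1, 0(a1)
    la a2, e                     # load e
    flw fs2, 0(a2)
    # remaining steps (reconstructed; the original listing is cut off here)
    addi t0, zero, 4             # t0 = 4
    fcvt.s.w fs3, t0             # fs3 = 4.0
    fmadd.s fa0, fs1, fs2, fs3   # fa0 = pi * e + 4
    fcvt.w.s a0, fa0             # a0 = (int) fa0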
Table 6.17 shows the floating point to integer conversion (and vice versa) instructions. The
fcvt.s.w instruction proves to be very helpful. It can be used to convert integer immediates to
floating point numbers, whenever we wish to multiply a floating point number with a multiplier
of the form 2.0 or 3.0.
Comparing floating point numbers is not the same as comparing integers. They cannot be
directly given as arguments to conditional branch instructions. In this case, the status of the
comparison needs to be stored in an integer register. This register can then be compared with
the zero register using a regular conditional branch instruction.
Table 6.18 shows the three floating point comparison instructions that store the result in
an integer register. Let us explore their usage using an example (Example 118).
Example 118
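First, initialize a0 = 0, then set a0 = 17 if e < π.
Answer:
pi: .float 3.14
e: .float 2.72
.main:
    la a1, pi                # load pi
    flw fs1, 0(a1)
    la a2, e                 # load e
    flw fs2, 0(a2)
    # comparison (reconstructed; the original listing is cut off here)
    addi a0, zero, 0         # a0 = 0
    flt.s t0, fs2, fs1       # t0 = (e < pi)
    beq t0, zero, .end       # skip the update if the comparison failed
    addi a0, zero, 17        # a0 = 17
.end: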
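Format   Fields from the most significant bit to the least significant bit (widths in bits)
R-type   funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
I-type   imm[11..0] (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
S-type   imm[11..5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4..0] (5) | opcode (7)
U-type   imm[31..12] (20) | rd (5) | opcode (7)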
Table 6.19: Instructions belonging to each of the four arithmetic RISC-V formats
Let us now take a deep look at each of these four instruction formats. The least significant 7 bits
(bits 6..0) are reserved for the opcodes in all the formats. This means that we can support a maximum of
128 instructions in the ISA using the opcode alone; moreover, finding the opcode is also quite easy. This is
important because minimizing the decoder’s complexity is a key goal of
ISA design. Given that all the opcode bits are in the same positions across the formats, finding
the opcode and consequently the type of the instruction is easy.
The next 5 bits are used to store the id of the destination register rd in the R, I and U
formats. S-type instructions do not have a destination register. For example, the sw instruction,
which is an S-type instruction, does not have a destination register. It however has two source
registers and an immediate. Instead of changing the positions of the source registers in the
S-type instruction format, the ISA designers chose to use the bits meant for the
destination register to store a part of the immediate. Hence, the least significant 5 bits of the
immediate (imm[4..0]) are stored at the corresponding positions.
Next, let us consider U-type instructions (such as lui). We can use the remaining 20 bits
to store the 20-bit immediate. We don’t need to store any more information because such instructions
have only two arguments: the destination register rd and the 20-bit immediate.
The remaining three formats (R, I and S) store a 3-bit field funct3. It is used to hierarchically
organize the instructions and to also support more instructions. In RISC-V, a single
opcode can correspond to multiple instructions. For example, the opcodes of the add, slt and
xor instructions are the same: 0110011. They are differentiated by the values of the funct3
field. We can thus support more than 128 instructions. However, that is not the main aim here.
We can group similar instructions such that they are processed by the hardware in the same
manner. They will still have differences between them such as add and xor. However, most of
the processing can remain the same given that both are R-type instructions.
The next 5 bits store the rs1 field (first source register) in all three formats (R, I and S).
Subsequently, differences arise. I-type instructions need to store a 12-bit immediate. They don’t
have a second source register (rs2). They thus use the remaining 12 bits to store the immediate.
The 12-bit immediate is sign-extended before being used in hardware. This holds even for an
instruction with a ‘u’ suffix such as sltiu, which sign-extends the immediate first and then
compares the operands as unsigned numbers. The R and S-type instructions store the second source register rs2 in
the next 5 bits.
Let us now consider the last 7 bits in the R and S formats. R-type instructions have another
opcode extender called funct7 (in addition to funct3). It serves the same purpose as funct3.
The aim is to increase the number of instructions and also create a grouping of instructions
based on their similarity. It is possible for two instructions to have the same opcode and funct3
fields, yet have a different funct7 field. Consider add and sub. They have the same opcode
(0110011) and same funct3 (000); however, their funct7 fields are different: 0000000 and
0100000, respectively.
In contrast, S-type instructions store the remaining 7 immediate bits in the uppermost (most
significant) 7 positions. Recall that we had already stored 5 immediate bits in the positions
at which other formats store the id of the destination register (rd). This is a very reasonable
decision because the explicit aim is to ensure that the same field is stored at the same set of
positions across the formats – it is very easy for the decoder to extract it.
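The field positions described above can be checked with a short Python sketch that assembles R, I and S-type instruction words. The function names are hypothetical. The opcode of add and sub (0110011) and their funct7 values are taken from the text; the opcodes and funct3 values used for addi and sw come from the RISC-V specification rather than from this chapter.

```python
def encode_r(opcode, rd, funct3, rs1, rs2, funct7):
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)

def encode_i(opcode, rd, funct3, rs1, imm):
    return (((imm & 0xFFF) << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)

def encode_s(opcode, funct3, rs1, rs2, imm):
    imm &= 0xFFF
    # imm[11..5] goes to the top 7 bits, imm[4..0] to the rd position
    return (((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | ((imm & 0x1F) << 7) | opcode)

add_word = encode_r(0b0110011, 3, 0b000, 1, 2, 0b0000000)   # add x3, x1, x2
sub_word = encode_r(0b0110011, 3, 0b000, 1, 2, 0b0100000)   # sub x3, x1, x2
addi_word = encode_i(0b0010011, 5, 0b000, 0, -1)            # addi x5, x0, -1
sw_word = encode_s(0b0100011, 0b010, 2, 5, 8)               # sw x5, 8(x2)
print(hex(add_word), hex(sub_word), hex(addi_word), hex(sw_word))
```

The add and sub words differ only in bit 30, which lies in the funct7 field.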
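Format   Fields from the most significant bit to the least significant bit (widths in bits)
B-type   imm[12] imm[10..5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4..1] imm[11] (5) | opcode (7)
J-type   imm[20] imm[10..1] imm[11] imm[19..12] (20) | rd (5) | opcode (7)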
All the conditional branch instructions such as beq, bne and blt are implemented using the B format.
Recall that such instructions take two source registers. They are compared and then based on
the results of the comparison, a taken/not-taken decision is made. The instruction uses PC
offset-based addressing, where the offset is encoded in the immediate field.
The immediate is encoded in a special manner in the B format. The offset needs to be a
multiple of 2 (limitation of the ISA). This means that its LSB (the 0th bit) is always 0 and thus
need not be represented. The format stores the immediate bits 4..1 in the upper 4 bits of the
5-bit field next to the opcode. Note that as compared to the S format, the immediate bits 4..1
are stored in exactly the same positions. This makes extracting these bits very easy and there is
consequently no need to design additional decoder hardware to handle these bits differently in
the B format. Given that the 0th bit is not stored, its corresponding position can be used to
store the 11th bit.
The rest of the immediate bits are stored in the most significant bit positions. The most
significant 7 bits store the 12th bit and the bits 10..5. We thus store 13 immediate bits: 12 of
them are explicitly stored and the least significant immediate bit is assumed to be 0. This format
can thus encode an even offset between -4096 and 4094 (≈ ± 4 KB). This offset is sign-extended
and added to the PC.
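The bit shuffling of the B format is easier to follow in code. The sketch below is illustrative only: encode_b and decode_b_offset are hypothetical helper names, and the opcode 1100011 (with funct3 = 000 for beq) is taken from the RISC-V specification.

```python
def encode_b(offset: int, rs2: int, rs1: int, funct3: int, opcode: int = 0b1100011) -> int:
    """Pack a B-type instruction from a byte offset relative to the PC."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and lie in [-4096, 4094]")
    imm = offset & 0x1FFF              # 13-bit two's complement pattern
    bit12 = (imm >> 12) & 0x1
    bit11 = (imm >> 11) & 0x1
    bits10_5 = (imm >> 5) & 0x3F
    bits4_1 = (imm >> 1) & 0xF         # bit 0 is implicit and never stored
    return ((bit12 << 31) | (bits10_5 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (bits4_1 << 8) | (bit11 << 7) | opcode)


def decode_b_offset(inst: int) -> int:
    """Recover the sign-extended byte offset from a B-type instruction."""
    imm = ((((inst >> 31) & 0x1) << 12) | (((inst >> 7) & 0x1) << 11)
           | (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1))
    return imm - 8192 if imm & 0x1000 else imm


word = encode_b(8, rs2=2, rs1=1, funct3=0b000)  # beq x1, x2, +8
print(f"0x{word:08X}")        # 0x00208463
print(decode_b_offset(word))  # 8
```

The largest positive offset is 4094 because the implicit LSB forces every offset to be even.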
Next, consider the J format. It takes a single destination register and a 20-bit immediate
as its arguments. The immediate here encodes an offset that is a multiple of 2 (akin to the B
format). There is no need to store the LSB, which is set to 0. Hence, only 20 bits need to be
stored. The order of storing the bits from the most significant position to the least
significant position is as follows: bit 20, bits 10..1, bit 11, bits 19..12. Given that we encode
21 bits in this format (20 explicitly and 1 implicitly), we can represent an offset range that is
within ±1 MB of the current PC.
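The ordering of the immediate bits in the J format can be sketched in the same way. The helper names (encode_j, decode_j_offset) are assumptions made for this example, and the jal opcode 1101111 is taken from the RISC-V specification.

```python
def encode_j(offset: int, rd: int, opcode: int = 0b1101111) -> int:
    """Pack a J-type instruction (jal) from a byte offset relative to the PC."""
    if offset % 2 != 0 or not -(1 << 20) <= offset <= (1 << 20) - 2:
        raise ValueError("offset must be even and lie within +/- 1 MB")
    imm = offset & 0x1FFFFF            # 21-bit two's complement pattern
    bit20 = (imm >> 20) & 0x1
    bits10_1 = (imm >> 1) & 0x3FF      # bit 0 is implicit and never stored
    bit11 = (imm >> 11) & 0x1
    bits19_12 = (imm >> 12) & 0xFF
    # Order from MSB to LSB: bit 20, bits 10..1, bit 11, bits 19..12
    return ((bit20 << 31) | (bits10_1 << 21) | (bit11 << 20)
            | (bits19_12 << 12) | (rd << 7) | opcode)


def decode_j_offset(inst: int) -> int:
    """Recover the sign-extended byte offset from a J-type instruction."""
    imm = ((((inst >> 31) & 0x1) << 20) | (((inst >> 12) & 0xFF) << 12)
           | (((inst >> 20) & 0x1) << 11) | (((inst >> 21) & 0x3FF) << 1))
    return imm - (1 << 21) if imm & (1 << 20) else imm


word = encode_j(2048, rd=1)   # jal x1, +2048
print(f"0x{word:08X}")        # 0x001000EF
print(decode_j_offset(word))  # 2048
```

As in the B format, the sign bit of the offset (bit 20) sits in bit 31 of the instruction word.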
LOAD-FP
imm rs1 width rd opcode
12 5 3 5 7
STORE-FP
imm[11..5] rs2 rs1 width imm[4..0] opcode
7 5 5 3 5 7
Figure 6.4: Encoding the flw and fsw instructions
Figure 6.4 shows the encoding of the flw and fsw instructions. flw instructions are encoded
in the I format. The funct3 field is replaced with the width field (amount of data that
is loaded). Similarly, the fsw instruction is encoded using the S format. The only change is
that the funct3 field is replaced with the width.
funct5 fmt rs2 rs1 rm rd opcode
5 2 5 5 3 5 7
Figure 6.5: Encoding of floating point arithmetic instructions
Figure 6.5 shows the encoding format of floating point arithmetic instructions (variation of
the R format). All such instructions take one floating point destination register and one or two
source registers as inputs. The format is the same for all variants. The rm field encodes the
rounding mode and the fmt field represents the precision (32-bit, 64-bit, 16-bit, 128-bit).
The opcode field is typically the same for all common floating point arithmetic instructions.
The funct5 field stores the code for the specific type of instruction. For instructions like fsqrt
that do not have the second source register, the rs2 field is set to 0.
The same format is also used by the floating point conversion instructions (fcvt.w.s and
fcvt.s.w).
This format is also used by floating point comparison instructions. The rm field in this case
stores the following comparison conditions: EQ, LT and LE. The funct5 field stores a code
for floating point comparison (FCMP).
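The floating point variant of the R format can be sketched as a pack/unpack pair. The helper names (encode_fp, decode_fp) are hypothetical. The numeric codes used below (opcode 1010011, funct5 = 00000 for fadd, fmt = 00 for single precision and rm = 111 for the dynamic rounding mode) are taken from the RISC-V specification and are not stated in the text above.

```python
def encode_fp(funct5: int, fmt: int, rs2: int, rs1: int, rm: int, rd: int,
              opcode: int = 0b1010011) -> int:
    """Pack the fields funct5, fmt, rs2, rs1, rm, rd and opcode (5+2+5+5+3+5+7 bits)."""
    return ((funct5 << 27) | (fmt << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | opcode)


def decode_fp(inst: int) -> dict:
    """Split a floating point arithmetic instruction word into its fields."""
    return {
        "funct5": (inst >> 27) & 0x1F,
        "fmt": (inst >> 25) & 0x3,
        "rs2": (inst >> 20) & 0x1F,
        "rs1": (inst >> 15) & 0x1F,
        "rm": (inst >> 12) & 0x7,
        "rd": (inst >> 7) & 0x1F,
        "opcode": inst & 0x7F,
    }


FADD = 0b00000    # funct5 code for floating point addition
SINGLE = 0b00     # fmt code for 32-bit (single precision)
DYNAMIC = 0b111   # rm code: use the rounding mode stored in fcsr

word = encode_fp(FADD, SINGLE, rs2=3, rs1=2, rm=DYNAMIC, rd=1)  # fadd.s f1, f2, f3
print(f"0x{word:08X}")
print(decode_fp(word))
```

The funct5 and fmt fields together occupy the same 7 bits as funct7 in the integer R format, which is why this layout is described as a variation of the R format.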
rs3 fmt rs2 rs1 rm rd opcode
5 2 5 5 3 5 7
Figure 6.6 shows the encoding format of the fmadd and fmsub instructions. Instead of the
funct5 field, the third source register rs3 is stored in its place. The rest remains the same.
Summary 6
1. The RISC-V ISA refers to a family of instruction sets. The basic ISA is RV32 (32-
bit). There are 64-bit and 128-bit variants as well that are currently under different
stages of development. They are named RV64 and RV128, respectively.
2. The ISA has a modular structure. Different sets of instructions can be added to it
depending upon the use case. Each such module is known as an “extension”.
3. Some popular extensions are as follows: integer (default), embedded, atomic instruc-
tions, single and double-precision floating point arithmetic, and vector arithmetic.
4. There is a compressed instruction set (suffix ‘C’) that is similar in principle to ARM
Thumb.
5. There are 32 integer registers. The zeroth integer register (x0 or zero) is hardwired
to 0. There is an elaborate usage convention that most assembly programmers are
expected to follow.
6. The usage convention distinguishes between temporary registers (caller saved), callee
saved registers and function arguments/return values.
7. The integer registers are named x0 . . . x31. They additionally can be addressed using
their mnemonics t0-t6 (temporary), s0-s11 (callee saved), a0-a7 (arguments and
return values), ra (return address), sp (stack pointer), gp (global pointer) and zero.
Addressing registers by their mnemonics is preferred.
8. For example, the integer register t3 is the fourth temporary register (counting from t0);
it is the same as x28.
9. The RISC-V ISA is a RISC ISA that accepts 12-bit immediates in arithmetic instructions
and conditional branch instructions, and 20-bit immediates in the jal instruction and the
load-upper-immediate (lui) instruction.
10. There is no dedicated move-immediate instruction in the ISA. Instead, the way to load
a 12-bit immediate is to use the addi instruction to add it to the register zero. For a
larger constant, the lui (load upper immediate) instruction first sets the upper 20 bits
and an addi instruction then adds the lower 12 bits.
11. Akin to other RISC ISAs, RISC-V supports all the standard arithmetic and logical
instructions including some unsigned variants.
12. There is no dedicated flags register that stores the result of the last comparison.
Instead, branch instructions take two register arguments. They directly compare them
and depending upon the branch condition, jump to the label specified in the instruction.
13. The jal and jalr instructions are used to jump to a different location and store the
return address in the destination register. If the register is zero, then the return
address is not saved. The jal instruction can be used to implement the classical call
instruction while the jalr instruction can be used to implement the return instruction.
14. There are two important assembler directives that translate to multiple RISC-V
instructions when the program is assembled. They are li (load 32-bit immediate) and la
(load the address of a constant defined in the assembly file into a register).
15. RISC-V has 32 floating point registers numbered f0 . . . f31. No floating point register
is hardwired to 0. They also have a usage convention and are also known by their
mnemonics. These mnemonics have a similar pattern: ft0-ft11, fa0-fa7 and fs0-fs11.
16. The floating point control status register (fcsr) is used to control the behavior of
floating point instructions. It stores the rounding mode and floating point exceptions
seen after the last time this register was reset (divide-by-zero, overflow, etc.).
17. There is no direct way of loading a floating point immediate into a register in RISC-V.
In RISC-V, an immediate is associated with a label, and it is assumed to be stored
in memory before the execution of the code starts. The address of the label (or the
immediate) can be loaded to a register using the assembler directive la. Subsequently,
the flw instruction can be used to load the corresponding floating point value. The
fsw instruction can be used to store floating point values.
18. All single-precision floating point arithmetic instructions operate in a manner that
is more or less similar to their integer counterparts. They have the “.s” suffix. For
example, the floating point add instruction is named fadd.s.
19. Another way of loading or storing immediate values is using the floating point
conversion instructions: fcvt.w.s and fcvt.s.w.
20. Floating point comparison instructions have an integer destination register and two
floating point source registers. The hardware compares the source registers based on
the type of the comparison that needs to be performed, and then the Boolean result is
stored in the destination register.
21. RISC-V has six different instruction formats: 4 integer formats (R, I, S and U) and
2 branch formats (B and J).
22. Most arithmetic instructions that do not have immediates use the R format. The I
format is used for instructions that use an immediate such as the addi instruction or
the lw (load) instruction. Store instructions are encoded using the S format and the
lui instruction uses the U format.
23. All the conditional branch instructions use the B format. The B format admits a
12-bit immediate with an additional and implicit LSB bit that is hardwired to 0. jal is
a J-type instruction that has a single destination register and a 20-bit immediate (the
LSB is not specified because it is 0). Effectively, the B format has a 13-bit immediate
and the J format has a 21-bit immediate.
24. Floating point instructions use the I format for flw and S format for fsw instructions,
respectively. The rest of the instructions primarily rely on minor variations of the R
format.
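As an illustration of point 10 of the summary, the following sketch splits a 32-bit constant into the 20-bit value for lui and the 12-bit value for addi. The function name split_for_lui_addi is hypothetical, and the sketch only models the arithmetic; it is not the expansion used by any particular assembler. Since addi sign-extends its immediate, the upper part has to be incremented by one whenever bit 11 of the constant is set.

```python
def split_for_lui_addi(value: int) -> tuple:
    """Return (hi20, lo12) such that lui hi20 followed by addi lo12 yields value."""
    value &= 0xFFFFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000           # addi will sign-extend, so treat lo12 as negative
    hi20 = ((value - lo12) >> 12) & 0xFFFFF   # compensates for a negative lo12
    return hi20, lo12


def rebuild(hi20: int, lo12: int) -> int:
    """Model lui rd, hi20 followed by addi rd, rd, lo12 on a 32-bit register."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


for constant in (0x12345678, 0x12345FFF, 0xFFFFFFFF):
    hi20, lo12 = split_for_lui_addi(constant)
    print(f"0x{constant:08X} -> lui 0x{hi20:05X}, addi {lo12}")
    assert rebuild(hi20, lo12) == constant
```

For 0x12345FFF the lower part becomes -1, so the value passed to lui is 0x12346 rather than 0x12345.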
look at architecture simulators that simulate RISC-V instructions and their vector extensions
such as the simulator released by Ramírez et al. [Ramírez et al., 2020]. The next logical step
is to study RISC-V processors such as BOOMv2 [Celio et al., 2017], RISC-V 2 [Patsidis et al.,
2020] and the processor in reference [Stangherlin and Sachdev, 2022]. RISC-V processors are
also being designed to operate in high-radiation environments like outer space. Many space
research organizations are creating their bespoke RISC-V processors [Wessman et al., 2021].
Exercises
Ex. 2 — Why does RISC-V not have a mov instruction? What is the advantage of making
this choice?
* Ex. 4 — RISC-V does not have a flags register. However, it stores some information in
the fcsr register. Why is this required?
Ex. 6 — Why is it not a good idea to have instructions to load floating point immediates
directly into registers (similar to addi and lui for integers)?
Ex. 8 — What is the advantage of maintaining the positions of the fields across the different
RISC-V instruction encoding formats?
* Ex. 9 — How do the opcode, funct3, funct5 and funct7 fields help in implementing
RISC-V extensions?
Ex. 10 — What is the advantage of making it easy to extract the sign bit of the immediate
in the different formats, especially the B and J formats?
Design Problems
Ex. 11 — Extend the RISC-V assembler available on the author’s website to support the
following extensions: double precision, vector, SIMD and cryptographic operations.
Ex. 12 — Cross-compile a piece of C code using the RISC-V and ARM cross compilers.
Use the -O3 gcc optimization. Next, run them on the QEMU emulation engine. Compare the
performance and find the reasons for the differences.