Week 2 - Instructions: Language of the Computer - Part 1
Instruction Set
• The repertoire of instructions of a computer
• Different computers have different instruction sets
• But with many aspects in common
• Early computers had very simple instruction sets
• Simplified implementation
• Many modern computers also have simple instruction sets
Chapter 2 — Instructions: Language of the Computer — 3
Arithmetic Operations
• Add and subtract, three operands
• Two sources and one destination
add a, b, c # a gets b + c
• All arithmetic operations have this form
• Design Principle 1: Simplicity favours regularity
• Regularity makes implementation simpler
• Simplicity enables higher performance at lower cost
Arithmetic Example
• C code:
f = (g + h) - (i + j);
• Compiled RISC-V code:
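A minimal sketch in C of how the expression breaks into three-operand steps, one per add or sub instruction. The temporaries t0 and t1 and the function name are illustrative assumptions, not register assignments taken from the slides.

```c
/* Each statement has two sources and one destination,
 * mirroring the three-operand form of add and sub. */
long f_three_operand(long g, long h, long i, long j)
{
    long t0 = g + h;   /* add t0, g, h   (t0 is a hypothetical temporary) */
    long t1 = i + j;   /* add t1, i, j   (t1 is a hypothetical temporary) */
    long f  = t0 - t1; /* sub f, t0, t1 */
    return f;
}
```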
Register Operands
• Arithmetic instructions use register operands
RISC-V Registers
• x0: the constant value 0
• x1: return address
• x2: stack pointer
• x3: global pointer
• x4: thread pointer
• x5 – x7, x28 – x31: temporaries
• x8: frame pointer
• x9, x18 – x27: saved registers
• x10 – x11: function arguments/results
• x12 – x17: function arguments
Memory Operands
• Main memory used for composite data
• Arrays, structures, dynamic data
• To apply arithmetic operations
• Load values from memory into registers
• Store result from register to memory
• Memory is byte addressed
• Each address identifies an 8-bit byte
• RISC-V is Little Endian
• Least-significant byte at least address of a word
• c.f. Big Endian: most-significant byte at least address
• RISC-V does not require words to be aligned in memory
• Unlike some other ISAs
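As a rough illustration of little-endian byte order, the sketch below stores a 32-bit word into a byte array with the least-significant byte at the lowest address, then reads it back. The function names are assumptions made for this example, and the shifts make the byte order explicit so the result does not depend on the host machine.

```c
#include <stdint.h>

/* Store a 32-bit word so that the least-significant byte
 * lands at the lowest address (little-endian order). */
void store_word_le(uint8_t *mem, uint32_t value)
{
    mem[0] = (uint8_t)(value & 0xFFu);
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Reassemble the word from four consecutive bytes. */
uint32_t load_word_le(const uint8_t *mem)
{
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}
```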
ld x9, 64(x22) // x9 = doubleword at address x22 + 64
add x9, x21, x9 // x9 = x21 + x9
sd x9, 96(x22) // store x9 to address x22 + 96
Immediate Operands
• Constant data specified in an instruction
addi x22, x22, 4
• Range of an n-bit unsigned integer: 0 to $2^n - 1$
• Example
• $0000\ 0000 \ldots 0000\ 1011_2$
$= 0 + \ldots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
$= 0 + \ldots + 8 + 0 + 2 + 1 = 11_{10}$
• Using 64 bits: 0 to +18,446,744,073,709,551,615
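The positional evaluation above can be sketched in C as follows. The helper name bits_to_unsigned is an assumption for illustration; it skips spaces so that grouped bit strings such as "0000 1011" can be passed directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Evaluate a string of '0'/'1' characters as an unsigned binary number.
 * Any other character (e.g. a space used for grouping) is skipped. */
uint64_t bits_to_unsigned(const char *bits)
{
    uint64_t value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        if (bits[i] == '0' || bits[i] == '1') {
            /* Shift previous digits one place left, then add the new bit. */
            value = value * 2 + (uint64_t)(bits[i] - '0');
        }
    }
    return value;
}
```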
Signed Negation
• Complement and add 1
• Complement means 1 → 0, 0 → 1
$x + \bar{x} = 1111\ldots111_2 = -1$
$\bar{x} + 1 = -x$
• Example: negate +2
• $+2 = 0000\ 0000 \ldots 0010_2$
• $-2 = 1111\ 1111 \ldots 1101_2 + 1 = 1111\ 1111 \ldots 1110_2$
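A small C sketch of the complement-and-add-1 rule, working on the raw 64-bit pattern. The function name is an assumption, and the final cast back to a signed type relies on the two's-complement behaviour that mainstream compilers provide.

```c
#include <stdint.h>

/* Negate x by inverting every bit and adding 1. */
int64_t negate_twos_complement(int64_t x)
{
    uint64_t bits = (uint64_t)x;        /* view x as a raw bit pattern */
    uint64_t result = ~bits + 1u;       /* complement, then add 1 */
    return (int64_t)result;             /* reinterpret as signed */
}
```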
Sign Extension
• Representing a number using more bits
• Preserve the numeric value
• Replicate the sign bit to the left
• c.f. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
• +2: 0000 0010 => 0000 0000 0000 0010
• –2: 1111 1110 => 1111 1111 1111 1110
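The 8-bit to 16-bit examples can be reproduced with a short C sketch. The function names are illustrative assumptions; the sign bit is replicated by hand so that the mechanism is visible.

```c
#include <stdint.h>

/* Sign extension: copy bit 7 into bits 15..8. */
uint16_t sign_extend_8_to_16(uint8_t v)
{
    uint16_t wide = v;
    if (v & 0x80u) {
        wide |= 0xFF00u;   /* sign bit is 1: fill the upper byte with 1s */
    }
    return wide;
}

/* Zero extension (unsigned values): upper byte is always 0. */
uint16_t zero_extend_8_to_16(uint8_t v)
{
    return (uint16_t)v;
}
```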
Representing Instructions
• Instructions are encoded in binary
• Called machine code
• RISC-V instructions
• Encoded as 32-bit instruction words
• Small number of formats encoding operation code (opcode), register numbers, …
• Regularity!
Hexadecimal
• Base 16
• Compact representation of bit strings
• 4 bits per hex digit
• Instruction fields
• opcode: operation code
• rd: destination register number
• funct3: 3-bit function code (additional opcode)
• rs1: the first source register number
• rs2: the second source register number
• funct7: 7-bit function code (additional opcode)
R-format Example
funct7   rs2      rs1      funct3   rd       opcode
7 bits   5 bits   5 bits   3 bits   5 bits   7 bits
add x9,x20,x21
0        21       20       0        9        51
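As a sketch of how the six fields pack into one 32-bit instruction word, the C function below shifts each field into its position. The function name encode_r is an assumption for illustration; the field widths follow the table above.

```c
#include <stdint.h>

/* Pack R-format fields into a 32-bit instruction word.
 * Layout, high bits to low: funct7 | rs2 | rs1 | funct3 | rd | opcode. */
uint32_t encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    return ((funct7 & 0x7Fu) << 25)
         | ((rs2    & 0x1Fu) << 20)
         | ((rs1    & 0x1Fu) << 15)
         | ((funct3 & 0x07u) << 12)
         | ((rd     & 0x1Fu) << 7)
         |  (opcode & 0x7Fu);
}
```

With the field values for add x9,x20,x21 (0, 21, 20, 0, 9, 51) this yields 0x015A04B3, which also shows how hexadecimal compresses the 32-bit pattern into eight digits.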
Logical Operations
• Instructions for bitwise manipulation
Shift Operations
funct6   immed    rs1      funct3   rd       opcode
6 bits   6 bits   5 bits   3 bits   5 bits   7 bits
AND Operations
• Useful to mask bits in a word
• Select some bits, clear others to 0
and x9,x10,x11
OR Operations
• Useful to include bits in a word
• Set some bits to 1, leave others unchanged
or x9,x10,x11
XOR Operations
• Differencing operation: result bit is 1 where the operand bits differ
• Invert the bits selected by the mask, leave others unchanged
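The three bitwise operations can be compared side by side in C, where &, | and ^ correspond to and, or and xor. The function names and the sample values are assumptions chosen for illustration.

```c
#include <stdint.h>

/* AND with a mask: keep the bits where mask is 1, clear the rest. */
uint64_t keep_bits(uint64_t word, uint64_t mask) { return word & mask; }

/* OR with a mask: set the bits where mask is 1, leave the rest. */
uint64_t set_bits(uint64_t word, uint64_t mask)  { return word | mask; }

/* XOR with a mask: invert the bits where mask is 1, leave the rest.
 * XOR with all 1s therefore acts as NOT. */
uint64_t flip_bits(uint64_t word, uint64_t mask) { return word ^ mask; }
```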
Conditional Operations
• Branch to a labeled instruction if a condition is true
• Otherwise, continue sequentially
Compiling If Statements
• C code:
if (i==j) f = g+h;
else f = g-h;
• f, g, … in x19, x20, …
• Compiled RISC-V code:
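A minimal C sketch of the branch structure such compiled code would have, with goto standing in for the branch instructions. The labels Else and Exit, the function name, and the register assignments in the comments (i in x22, j in x23, continuing the sequence f, g, h in x19, x20, x21) are assumptions for illustration.

```c
/* if (i == j) f = g + h; else f = g - h;
 * rewritten so that each line matches one branch or arithmetic step. */
long if_else_with_branches(long i, long j, long g, long h)
{
    long f;
    if (i != j) goto Else;  /* assumed: bne x22, x23, Else  */
    f = g + h;              /* assumed: add x19, x20, x21   */
    goto Exit;              /* assumed: beq x0, x0, Exit (unconditional) */
Else:
    f = g - h;              /* assumed: sub x19, x20, x21   */
Exit:
    return f;
}
```

The condition is inverted (branch when i != j) so that the "then" part can follow the test sequentially.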
Compiling Loop Statements
• C code:
while (save[i] == k) i += 1;
• i in x22, k in x24, address of save in x25
• Compiled RISC-V code:
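Loop: slli x10, x22, 3     // x10 = i * 8
      add  x10, x10, x25   // x10 = address of save[i]
      ld   x9, 0(x10)      // x9 = save[i]
      bne  x9, x24, Exit   // exit loop if save[i] != k
      addi x22, x22, 1     // i = i + 1
      beq  x0, x0, Loop    // unconditional branch back to Loop
Exit: …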
Basic Blocks
• A basic block is a sequence of instructions with
• No embedded branches (except at end)
• No branch targets (except at beginning)
• Example
• if (a > b) a += 1;
• a in x22, b in x23
bge x23, x22, Exit // branch if b >= a
addi x22, x22, 1
Exit:
Procedure Calling
• Steps required
1. Place parameters in registers x10 to x17
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure’s operations
5. Place result in register for caller
6. Return to place of call (address in x1)
Register Usage
• x5 – x7, x28 – x31: temporary registers
• Not preserved by the callee
Non-Leaf Procedures
• Procedures that call other procedures
• For nested call, caller needs to save on the stack:
• Its return address
• Any arguments and temporaries needed after the call
• Argument n in x10
• Result in x10
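To make the non-leaf case concrete, here is a hypothetical recursive procedure in C. The choice of factorial and the name fact are assumptions for illustration; the point is only that the argument n is still needed after the nested call returns.

```c
/* A non-leaf procedure: it calls another procedure (itself).
 * The nested call reuses the argument register and overwrites the
 * return address, so compiled code would have to save both n and
 * the return address on the stack before the call. */
long long fact(long long n)
{
    if (n < 1) {
        return 1;              /* base case: no nested call needed */
    }
    return n * fact(n - 1);    /* n is used after the call returns */
}
```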
Memory Layout
• Text: program code
• Static data: global variables
• e.g., static variables in C, constant arrays and strings
• x3 (global pointer) initialized to address allowing ±offsets into this segment
• Dynamic data: heap
• e.g., malloc in C, new in Java
• Stack: automatic storage
Character Data
• Byte-encoded character sets
• ASCII: 128 characters
• 95 graphic, 33 control
• Latin-1: 256 characters
• ASCII, +96 more graphic characters
Byte/Halfword/Word Operations
• RISC-V byte/halfword/word load/store
• Load byte/halfword/word: Sign extend to 64 bits in rd
• lb rd, offset(rs1)
• lh rd, offset(rs1)
• lw rd, offset(rs1)
• Load byte/halfword/word unsigned: Zero extend to 64 bits in rd
• lbu rd, offset(rs1)
• lhu rd, offset(rs1)
• lwu rd, offset(rs1)
• Store byte/halfword/word: Store rightmost 8/16/32 bits
• sb rs2, offset(rs1)
• sh rs2, offset(rs1)
• sw rs2, offset(rs1)
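The difference between the signed and unsigned loads can be sketched in C with casts. The function names mirror the mnemonics but are assumptions for illustration, and memory is modelled as a little-endian byte array.

```c
#include <stdint.h>

/* Assemble a 32-bit word from four little-endian bytes. */
uint32_t word_le(const uint8_t *mem)
{
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}

/* lb: load one byte and sign-extend it to 64 bits. */
int64_t load_lb(const uint8_t *mem)   { return (int64_t)(int8_t)mem[0]; }

/* lbu: load one byte and zero-extend it to 64 bits. */
uint64_t load_lbu(const uint8_t *mem) { return (uint64_t)mem[0]; }

/* lw: load a 32-bit word and sign-extend it to 64 bits. */
int64_t load_lw(const uint8_t *mem)   { return (int64_t)(int32_t)word_le(mem); }

/* lwu: load a 32-bit word and zero-extend it to 64 bits. */
uint64_t load_lwu(const uint8_t *mem) { return (uint64_t)word_le(mem); }
```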
32-bit Constants
• Most constants are small
• 12-bit immediate is sufficient
• For the occasional 32-bit constant
lui rd, constant
• Copies 20-bit constant to bits [31:12] of rd
• Extends bit 31 to bits [63:32]
• Clears bits [11:0] of rd to 0
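A sketch, under stated assumptions, of how a 32-bit constant can be built from lui followed by addi. The C functions below emulate the two instructions; their names and the helper load_const32 are illustrative. Because addi sign-extends its 12-bit immediate, the upper 20 bits are rounded up by one when bit 11 of the constant is set.

```c
#include <stdint.h>

/* Emulate lui: imm20 goes to bits [31:12], bits [11:0] are cleared,
 * and bit 31 is extended into bits [63:32]. */
int64_t lui(uint32_t imm20)
{
    uint32_t low32 = (imm20 & 0xFFFFFu) << 12;
    return (int64_t)(int32_t)low32;
}

/* Emulate addi: add a sign-extended 12-bit immediate. */
int64_t addi(int64_t rs1, uint32_t imm12)
{
    int64_t imm = (int64_t)(imm12 & 0xFFFu);
    if (imm & 0x800) {
        imm -= 0x1000;          /* bit 11 set: immediate is negative */
    }
    return rs1 + imm;
}

/* Split a 32-bit constant into lui/addi parts and combine them. */
int64_t load_const32(uint32_t c)
{
    uint32_t lo12 = c & 0xFFFu;
    /* Adding 0x800 compensates for addi subtracting when bit 11 is set. */
    uint32_t hi20 = ((c + 0x800u) >> 12) & 0xFFFFFu;
    return addi(lui(hi20), lo12);
}
```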
Branch Addressing
• Branch instructions specify
• Opcode, two registers, target address
• Most branch targets are near branch
• Forward or backward
• SB format:
imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1] imm[11] opcode
• PC-relative addressing
• Target address = PC + immediate × 2
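The scattered immediate bits of the SB format can be packed and unpacked with a short C sketch. The function names are assumptions, and the opcode value 0x63 used for conditional branches is stated from the RISC-V base ISA rather than from these slides. Bit 0 of the byte offset is never stored, which is why the stored field counts in units of 2 bytes.

```c
#include <stdint.h>

/* Encode a conditional branch with a byte offset (must be even). */
uint32_t sb_encode(int32_t offset, uint32_t rs2, uint32_t rs1, uint32_t funct3)
{
    uint32_t u = (uint32_t)offset;
    return (((u >> 12) & 0x1u)  << 31)   /* imm[12]   */
         | (((u >> 5)  & 0x3Fu) << 25)   /* imm[10:5] */
         | ((rs2    & 0x1Fu) << 20)
         | ((rs1    & 0x1Fu) << 15)
         | ((funct3 & 0x07u) << 12)
         | (((u >> 1)  & 0xFu)  << 8)    /* imm[4:1]  */
         | (((u >> 11) & 0x1u)  << 7)    /* imm[11]   */
         | 0x63u;                        /* branch opcode */
}

/* Recover the signed byte offset from an SB-format instruction word. */
int32_t sb_offset(uint32_t insn)
{
    int32_t imm = (int32_t)((((insn >> 31) & 0x1u)  << 12)
                          | (((insn >> 7)  & 0x1u)  << 11)
                          | (((insn >> 25) & 0x3Fu) << 5)
                          | (((insn >> 8)  & 0xFu)  << 1));
    if (imm & 0x1000) {
        imm -= 0x2000;                   /* sign-extend from bit 12 */
    }
    return imm;
}

/* PC-relative target address. */
uint64_t branch_target(uint64_t pc, uint32_t insn)
{
    return pc + (uint64_t)(int64_t)sb_offset(insn);
}
```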
Jump Addressing
• Jump and link (jal) target uses 20-bit immediate for larger range
• UJ format: imm[20] imm[10:1] imm[11] imm[19:12] rd opcode
Synchronization
• Two processors sharing an area of memory
• P1 writes, then P2 reads
• Data race if P1 and P2 don’t synchronize
• Result depends on the order of accesses
• Hardware support required
• Atomic read/write memory operation
• No other access to the location allowed between the read and write
• Could be a single instruction
• E.g., atomic swap of register ↔ memory
• Or an atomic pair of instructions
Synchronization in RISC-V
• Load reserved: lr.d rd,(rs1)
• Load from address in rs1 to rd
• Place reservation on memory address
• Store conditional: sc.d rd,(rs1),rs2
• Store from rs2 to address in rs1
• Succeeds if location not changed since the lr.d
• Returns 0 in rd
• Fails if location is changed
• Returns non-zero value in rd
Synchronization in RISC-V
• Example 1: atomic swap (to test/set lock variable)
again: lr.d x10,(x20)
       sc.d x11,(x20),x23 // x11 = status
       bne  x11,x0,again  // branch if store failed
       addi x23,x10,0     // x23 = loaded value
• Example 2: lock
       addi x12,x0,1      // copy locked value
again: lr.d x10,(x20)     // read lock
       bne  x10,x0,again  // check if it is 0 yet
       sc.d x11,(x20),x12 // attempt to store
       bne  x11,x0,again  // branch if fails
• Unlock:
       sd   x0,0(x20)     // free lock
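For comparison, the same lock and unlock idea can be sketched at the C level with C11 atomics. This is an analogue rather than a translation of the assembly above: the function names are assumptions, and how the compiler implements atomic_exchange (for example with an lr/sc loop or a single atomic instruction) depends on the target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try once to take the lock: atomically swap in 1 and inspect the
 * old value. The lock was free only if the old value was 0. */
bool try_lock(atomic_int *lock)
{
    return atomic_exchange(lock, 1) == 0;
}

/* Spin until the lock is acquired. */
void spin_lock(atomic_int *lock)
{
    while (!try_lock(lock)) {
        /* busy-wait: another thread holds the lock */
    }
}

/* Release the lock by storing 0, like the sd x0 in the unlock step. */
void spin_unlock(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```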
Static linking
Loading a Program
• Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
• Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including sp, fp, gp)
6. Jump to startup routine
• Copies arguments to x10, … and calls main
• When main returns, do exit syscall
Dynamic Linking
• Only link/load library procedure when it is called
• Requires procedure code to be relocatable
• Avoids image bloat caused by static linking of all (transitively) referenced libraries
• Automatically picks up new library versions
Lazy Linkage
[Figure: lazy linkage, showing the indirection table, the linker/loader code, and the dynamically mapped code]
[Figure labels: interprets bytecodes; compiles bytecodes of “hot” methods into native code for host machine]
C Sort Example
• Illustrates use of assembly instructions for a C bubble sort function
• Swap procedure (leaf)
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
• v in x10, k in x11, temp in x5
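As a sketch of how a sort routine might call swap, the C code below pairs it with a simple bubble-style outer and inner loop. The sort function and its loop structure are assumptions for illustration, not code reproduced from the slides.

```c
/* Leaf procedure: exchange v[k] and v[k+1]. */
void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Hypothetical caller: sort v[0..n-1] into ascending order.
 * The inner loop walks element i downwards while it is out of order. */
void sort(int v[], int n)
{
    for (int i = 0; i < n; i += 1) {
        for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
        }
    }
}
```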
        li   x19,0           // i = 0
for1tst:
        bge  x19,x11,exit1   // go to exit1 if x19 ≥ x11 (i≥n)
        addi x19,x19,1       // i += 1
        j    for1tst         // branch to test of outer loop
exit1:
Preserving Registers
• Preserve saved registers:
addi sp,sp,-40 // make room on stack for 5 regs
sd x1,32(sp) // save x1 on stack
sd x22,24(sp) // save x22 on stack
sd x21,16(sp) // save x21 on stack
sd x20,8(sp) // save x20 on stack
sd x19,0(sp) // save x19 on stack
[Figure: three bar charts over the configurations C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT]
Lessons Learnt
• Instruction count and CPI are not good performance indicators in isolation
• Compiler optimizations are sensitive to the algorithm
• Java/JIT compiled code is significantly faster than JVM interpreted
• Comparable to optimized C in some cases
• Nothing can fix a dumb algorithm!
MIPS Instructions
• MIPS: commercial predecessor to RISC-V
• Similar basic set of instructions
• 32-bit instructions
• 32 general purpose registers, register 0 is always 0
• 32 floating-point registers
• Memory accessed only by load/store instructions
• Consistent use of addressing modes for all data sizes
• Different conditional branches
• For <, <=, >, >=
• RISC-V: blt, bge, bltu, bgeu
• MIPS: slt, sltu (set less than, result is 0 or 1)
• Then use beq, bne to complete the branch
Instruction Encoding
Implementing IA-32
• Complex instruction set makes implementation difficult
• Hardware translates instructions to simpler microoperations
• Simple instructions: 1–1
• Complex instructions: 1–many
• Microengine similar to RISC
• Market share makes this economically viable
Fallacies
• Powerful instruction ⇒ higher performance
• Fewer instructions required
• But complex instructions are hard to implement
• May slow down all instructions, including simple ones
• Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
• But modern compilers are better at dealing with modern processors
• More lines of code ⇒ more errors and less productivity
Fallacies
• Backward compatibility ⇒ instruction set doesn’t change
• But they do accrete more instructions
Pitfalls
• Sequential words are not at sequential addresses
• Increment by 4, not by 1!
• Keeping a pointer to an automatic variable after procedure returns
• e.g., passing pointer back via an argument
• Pointer becomes invalid when stack popped
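The first pitfall above can be checked with a small C sketch that measures the distance in bytes between neighbouring array elements. The function names are assumptions for illustration; the fixed-width types keep the result independent of the host's int size.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between two consecutive 32-bit words. */
ptrdiff_t word_stride_bytes(void)
{
    int32_t words[2] = {0, 0};
    return (const char *)&words[1] - (const char *)&words[0];
}

/* Byte distance between two consecutive 64-bit doublewords,
 * the element size that ld and sd operate on. */
ptrdiff_t doubleword_stride_bytes(void)
{
    int64_t doublewords[2] = {0, 0};
    return (const char *)&doublewords[1] - (const char *)&doublewords[0];
}
```

Because memory is byte addressed, the address advances by the element size (4 for words, 8 for doublewords), not by 1, when stepping to the next element.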
Concluding Remarks
• Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
• Layers of software/hardware
• Compiler, assembler, hardware
• RISC-V: typical of RISC ISAs
• c.f. x86