Chapter 2
Computer Architecture
Instructions
Prepared by
Madhusudan Basak
Assistant Professor
CSE, BUET
*Some modifications done by Saem Hasan
Instructions
A computer is run by instructions
Instruction Set
All possible instructions for a CPU
Instruction Set Architecture
Instruction set architecture is the attributes of a computing system as seen by
the assembly language programmer or compiler.
Instruction Set (what operations can be performed?)
Would it be good to use just one format for all kinds of operations?
No!
Design Principle 3: Good design demands good compromises.
Another MIPS Instruction Format
I-Format
Deals with immediate and data transfer operations
No NOT Operation?
Shift Left
sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
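As a rough illustration of what the shift computes, the C sketch below mirrors sll on a 32-bit value. The function name shift_left is hypothetical; shifting left by 4 bits gives the same result as multiplying by 2^4 = 16, provided no significant bits are shifted out.

```c
#include <stdint.h>

/* Mirrors sll: shifts a 32-bit value left, filling the vacated bits with 0.
   The amount is assumed to be in the range 0..31, like the 5-bit shamt field. */
uint32_t shift_left(uint32_t value, unsigned amount) {
    return value << amount;
}
```

For example, shift_left(9, 4) yields 144, the same as 9 * 16.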
Branching Operations
if (i==j)
    f = g + h;
else
    f = g - h;
Transfer Control:
Jump-and-link instruction (jal)
Keeps the return address for the procedure in $ra and jumps to the procedure
Performing Tasks
Can perform all the operations allowed by the MIPS instruction set
Procedures in MIPS
Six steps during a procedure call
1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.
Procedure returned
Stack is adjusted using $sp and $fp pointers
An unconditional jump to the address from where the caller will resume execution
non_leaf:
An Example
...
add $a0, $s1, $zero # put parameters in a place where
add $a1, $s2, $zero # the procedure can access them
add $a2, $s3, $zero
add $a3, $s4, $zero
jal leaf # transfer control to the procedure
add $s1, $zero, $v0
...
leaf: addi $sp, $sp, -12 # adjust stack to make room for 3 items (acquire the storage resources needed for the procedure)
sw $t1, 8($sp) # save register $t1 for use afterwards
sw $t0, 4($sp) # save register $t0 for use afterwards
sw $s0, 0($sp) # save register $s0 for use afterwards
Dealing with Strings
Three commonly available strategies
the first position of the string is reserved to give the length of a string
an accompanying variable has the length of the string
the last position of a string is indicated by a character used to mark the end of a
string
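The three strategies can be sketched in C as follows. This is only an illustration: the struct and function names are hypothetical, and the length-prefixed variant assumes the length fits in one byte.

```c
#include <stddef.h>

/* Strategy 1: the first position of the string holds its length
   (assumes the length fits in one byte, i.e. at most 255). */
size_t len_prefixed(const unsigned char *s) {
    return s[0];
}

/* Strategy 2: an accompanying variable holds the length. */
struct counted_string {
    size_t length;
    const char *chars;
};

size_t len_counted(const struct counted_string *s) {
    return s->length;
}

/* Strategy 3: a special character marks the end of the string
   (C uses the null character '\0'). */
size_t len_terminated(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Only the third strategy has to scan the string to find its length; the first two read it directly.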
Dealing with String: Example
ASCII demands 1-byte memory operation
MIPS supports
lb $t0,0($sp) #Reads 1 byte from memory and stores in the lowest (rightmost) byte of $t0
sb $t0,0($gp) #Reads 1 byte from the lowest (rightmost) byte of $t0 and stores in the memory
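Unsigned Version:
Load Byte: lbu (zero-extends the byte instead of sign-extending it)
Store Byte: no unsigned version in MIPS (sb stores the lowest byte as-is, so signedness does not matter)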
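The difference between lb and lbu shows up only when the top bit of the byte is set. The following C sketch models the three byte operations on a byte of memory; the function names are hypothetical and the code is a model of the behavior, not of the hardware.

```c
#include <stdint.h>

/* Models lb: the byte is sign-extended to fill the 32-bit register. */
int32_t load_byte_signed(const uint8_t *mem) {
    int32_t value = *mem;
    if (value & 0x80)      /* top bit set: the byte is negative */
        value -= 256;
    return value;
}

/* Models lbu: the byte is zero-extended to fill the 32-bit register. */
uint32_t load_byte_unsigned(const uint8_t *mem) {
    return (uint32_t)*mem;
}

/* Models sb: only the lowest (rightmost) byte of the register is stored. */
void store_byte(uint8_t *mem, uint32_t reg) {
    *mem = (uint8_t)(reg & 0xFFu);
}
```

For the byte 0xF0, the signed load gives -16 while the unsigned load gives 240; for bytes below 0x80 the two loads agree.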
Dealing with String: An Example
strcpy: addi $sp, $sp, -4 # adjust stack for 1 more item
sw $s0, 0($sp) # save $s0
add $s0, $zero, $zero # i = 0 + 0
L1: add $t1, $s0, $a1 # address of y[i] in $t1
lbu $t2, 0($t1) # $t2 = y[i]
add $t3, $s0, $a0 # address of x[i] in $t3
sb $t2, 0($t3) # x[i] = y[i]
beq $t2, $zero, L2 # if y[i] == 0, go to L2
addi $s0, $s0, 1 # i = i + 1
j L1 # go to L1. While loop continues
L2: lw $s0, 0($sp) # End of string. Restore old $s0
addi $sp, $sp, 4 # pop 1 word off stack
jr $ra # return
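For comparison, a C sketch of the loop that the MIPS code implements is given below, with x in $a0, y in $a1 and i in $s0 as in the comments above. The name copy_string is hypothetical, chosen to avoid clashing with the library strcpy.

```c
/* Copies the null-terminated string y into x, including the terminator.
   The caller is assumed to provide enough room in x. */
void copy_string(char x[], const char y[]) {
    int i = 0;
    while ((x[i] = y[i]) != '\0')   /* copy, then test the byte just copied */
        i += 1;
}
```

As in the assembly, the byte is stored before it is tested, so the terminating null is copied as well.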
op address
6 bits 26 bits
Chapter 2 —
Instructions: Language
Target Addressing Example
Loop code from earlier example
Assume Loop at location 80000
[Figure labels: many compilers produce object modules directly; static linking]
Assembler Pseudoinstructions
Most assembler instructions represent machine instructions one-to-one
Pseudoinstructions: figments of the assembler’s imagination
move $t0, $t1 → add $t0, $zero, $t1
blt $t0, $t1, L → slt $at, $t0, $t1
bne $at, $zero, L
$at (register 1): assembler temporary
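A toy sketch of how an assembler might expand these two pseudoinstructions is shown below. The function expand_pseudo and its interface are hypothetical; a real assembler handles many more cases.

```c
#include <stdio.h>
#include <string.h>

/* Expands "move" and "blt" into real MIPS instructions.
   a, b, c are the operands as written in the source line.
   Returns the number of instructions written to out (0 if op is not
   one of the two pseudoinstructions handled here). */
int expand_pseudo(const char *op, const char *a, const char *b,
                  const char *c, char out[2][64]) {
    if (strcmp(op, "move") == 0) {
        snprintf(out[0], 64, "add %s, $zero, %s", a, b);
        return 1;
    }
    if (strcmp(op, "blt") == 0) {
        /* $at is reserved for the assembler, so it is safe to clobber here */
        snprintf(out[0], 64, "slt $at, %s, %s", a, b);
        snprintf(out[1], 64, "bne $at, $zero, %s", c);
        return 2;
    }
    return 0;
}
```

Because blt expands to two instructions that use $at, programs should not keep their own values in that register.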
Producing an Object Module
Assembler (or compiler) translates program into machine instructions
Provides information for building a complete program from the pieces
Header: describes contents of object module
Text segment: translated instructions
Static data segment: data allocated for the life of the program
Relocation info: for contents that depend on absolute location of loaded
program
Symbol table: global definitions and external refs
Debug info: for associating with source code
Linking Object Modules
Produces an executable image: multiple object modules -> unified executable file
1. Merge segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs
Could leave location dependencies for fixing by a relocating loader
But with virtual memory, no need to do this
Program can be loaded into absolute location in virtual memory space
Since each process has its own virtual address space, there is no need for relocation during loading.
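The three linking steps can be illustrated with a toy model in C. Everything here is an assumption made for illustration: the struct layout, the function names, and the idea that text segments are simply placed one after another from a given base address. The patch step uses the jump format with a 6-bit op and a 26-bit address field.

```c
#include <stdint.h>

/* Hypothetical summary of an object module: only the segment sizes. */
struct module {
    uint32_t text_size;
    uint32_t data_size;
};

/* Step 1, merge segments: module `index` starts after the text of all
   earlier modules, beginning at text_base. */
uint32_t text_start(const struct module *mods, int index, uint32_t text_base) {
    uint32_t addr = text_base;
    for (int i = 0; i < index; i++)
        addr += mods[i].text_size;
    return addr;
}

/* Step 2, resolve a label: its final address is the start of its module's
   text plus its offset within that module. */
uint32_t resolve_label(const struct module *mods, int index,
                       uint32_t offset, uint32_t text_base) {
    return text_start(mods, index, text_base) + offset;
}

/* Step 3, patch a jump: keep the 6-bit op and write the word address of
   the target into the 26-bit address field. */
uint32_t patch_jump(uint32_t instr, uint32_t target) {
    return (instr & 0xFC000000u) | ((target >> 2) & 0x03FFFFFFu);
}
```

With two modules of text size 0x100 and 0x200 and a base of 0x00400000 (an assumed value), a label at offset 0x40 in the second module resolves to 0x00400140.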
Text size = size of the instructions
Data size = size of the variables (100 hex = 256 bytes)
Object 1
X is a global/external variable, as it is in the data segment
[Figure labels: indirection table; linker/loader code; dynamically mapped code]
Starting a Java Program
Java programs gain portability at the cost of some performance
Compiled first into an easy-to-interpret instruction set: Java bytecode
A software interpreter, called the Java Virtual Machine (JVM), can execute Java bytecode
Interpretation is slow
A Just In Time (JIT) compiler makes it faster
Statistically identifies the commonly used (hot) methods
Compiles these methods into the native instruction set
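The idea of finding hot methods can be sketched with a simple call counter. This is a toy model only: the threshold, the struct, and the function name are all hypothetical, and real JVMs use more elaborate profiling.

```c
#define HOT_THRESHOLD 3   /* hypothetical number of calls before compiling */

struct method {
    int call_count;   /* how often the method has been interpreted */
    int compiled;     /* nonzero once native code exists */
};

/* Simulates one call of the method.
   Returns 1 if the call runs native code, 0 if it is interpreted. */
int invoke(struct method *m) {
    if (m->compiled)
        return 1;
    m->call_count++;
    if (m->call_count >= HOT_THRESHOLD)
        m->compiled = 1;   /* method is hot: later calls use native code */
    return 0;
}
```

Methods that are called rarely never reach the threshold, so the cost of compiling is paid only where it is likely to be repaid.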
Starting Java Applications
Simple portable instruction set for the JVM
Interprets bytecodes
Compiles bytecodes of “hot” methods into native code for host machine
C Sort Example
The Sort Procedure in C
Non-leaf (calls swap)
void sort (int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i += 1) {
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
        }
    }
}
v in $a0, n in $a1, i in $s0, j in $s1
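To try the procedure, it needs a swap helper, which is not shown in this section. The sketch below assumes swap(v, k) exchanges v[k] and v[k + 1], which is what the call swap(v, j) in the inner loop requires.

```c
/* Assumed helper: exchanges the adjacent elements v[k] and v[k + 1]. */
void swap(int v[], int k) {
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Sorts v[0..n-1] into ascending order by moving each element left
   until the element before it is not larger. */
void sort(int v[], int n) {
    int i, j;
    for (i = 0; i < n; i += 1) {
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
        }
    }
}
```

Because sort calls swap, it is a non-leaf procedure and must save $ra before the call.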
The Procedure Body
# Move params
move $s2, $a0 # save $a0 into $s2
move $s3, $a1 # save $a1 into $s3
# Outer loop
move $s0, $zero # i = 0
for1tst: slt $t0, $s0, $s3 # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
beq $t0, $zero, exit1 # go to exit1 if $s0 ≥ $s3 (i ≥ n)
# Inner loop
addi $s1, $s0, -1 # j = i - 1
for2tst: slti $t0, $s1, 0 # $t0 = 1 if $s1 < 0 (j < 0)
bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0)
sll $t1, $s1, 2 # $t1 = j * 4
add $t2, $s2, $t1 # $t2 = v + (j * 4)
lw $t3, 0($t2) # $t3 = v[j]
lw $t4, 4($t2) # $t4 = v[j + 1]
slt $t0, $t4, $t3 # $t0 = 0 if $t4 ≥ $t3
beq $t0, $zero, exit2 # go to exit2 if $t4 ≥ $t3
# Pass params & call
move $a0, $s2 # 1st param of swap is v (old $a0)
move $a1, $s1 # 2nd param of swap is j
jal swap # call swap procedure
# Inner loop
addi $s1, $s1, -1 # j -= 1
j for2tst # jump to test of inner loop
# Outer loop
exit2: addi $s0, $s0, 1 # i += 1
j for1tst # jump to test of outer loop
The Full Procedure
sort: addi $sp,$sp, -20 # make room on stack for 5 registers
sw $ra, 16($sp) # save $ra on stack
sw $s3,12($sp) # save $s3 on stack
sw $s2, 8($sp) # save $s2 on stack
sw $s1, 4($sp) # save $s1 on stack
sw $s0, 0($sp) # save $s0 on stack
… # procedure body
…
exit1: lw $s0, 0($sp) # restore $s0 from stack
lw $s1, 4($sp) # restore $s1 from stack
lw $s2, 8($sp) # restore $s2 from stack
lw $s3,12($sp) # restore $s3 from stack
lw $ra,16($sp) # restore $ra from stack
addi $sp,$sp, 20 # restore stack pointer
jr $ra # return to calling routine
Effect of Compiler Optimization
[Figure: bar charts comparing C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT]
Example: Clearing an Array
Array version:
clear1(int array[], int size) {
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}
Pointer version:
clear2(int *array, int size) {
    int *p;
    for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}
Array version in MIPS:
move $t0,$zero # i = 0
loop1: sll $t1,$t0,2 # $t1 = i * 4
add $t2,$a0,$t1 # $t2 = &array[i]
sw $zero, 0($t2) # array[i] = 0
addi $t0,$t0,1 # i = i + 1
slt $t3,$t0,$a1 # $t3 = (i < size)
bne $t3,$zero,loop1 # if (…) goto loop1
Pointer version in MIPS:
move $t0,$a0 # p = &array[0]
sll $t1,$a1,2 # $t1 = size * 4
add $t2,$a0,$t1 # $t2 = &array[size]
loop2: sw $zero,0($t0) # Memory[p] = 0
addi $t0,$t0,4 # p = p + 4
slt $t3,$t0,$t2 # $t3 = (p < &array[size])
bne $t3,$zero,loop2 # if (…) goto loop2
Comparison of Array vs. Ptr
Multiply “strength reduced” to shift
Array version requires shift to be inside loop
Part of index calculation for incremented i
cf. incrementing pointer
Compiler can achieve same effect as manual use of pointers
Induction variable elimination
Better to make program clearer and safer
MIPS (RISC) Design Principles In Summary
Simplicity favors regularity
fixed size instructions
small number of instruction formats
opcode always the first 6 bits
Smaller is faster
limited instruction set
limited number of registers
limited number of addressing modes
Good design demands good compromises
Three instruction formats
Make the common case fast
arithmetic operands using the registers
Saving the commonly used registers into stack
PC-relative addressing
allow instructions to contain immediate operands
Lessons Learnt
Instruction count and CPI are not good performance indicators in isolation
Compiler optimizations are sensitive to the algorithm
Java/JIT compiled code is significantly faster than JVM interpreted
Comparable to optimized C in some cases
Nothing can fix a dumb algorithm!
Alternative Architectures
Design alternative
provide more powerful operations
goal is to reduce number of instructions executed
danger is a slower cycle time and/or a higher CPI
2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
ARM: the most popular embedded core
Similar basic set of instructions to MIPS
                        ARM             MIPS
Date announced          1985            1985
Instruction size        32 bits         32 bits
Address space           32-bit flat     32-bit flat
Data alignment          Aligned         Aligned
Data addressing modes   9               3
Registers               15 × 32-bit     31 × 32-bit
Input/output            Memory mapped   Memory mapped
Compare and Branch in ARM
Uses condition codes for result of an arithmetic/logical instruction
Negative, zero, carry, overflow
Compare instructions to set condition codes without keeping the result
Each instruction can be conditional
Top 4 bits of instruction word: condition value
Can avoid branches over single instructions
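How a compare sets the four condition codes can be modeled in C as below. The struct and function names are hypothetical. The model follows the ARM convention that, after a subtraction, the carry flag is set when no borrow occurred.

```c
#include <stdint.h>

/* The four condition codes: negative, zero, carry, overflow. */
struct flags {
    int n, z, c, v;
};

/* Models a compare of a with b: computes a - b only to set the flags;
   the difference itself is discarded. */
struct flags compare(uint32_t a, uint32_t b) {
    uint32_t result = a - b;
    struct flags f;
    f.n = (int)((result >> 31) & 1u);   /* sign bit of the result */
    f.z = (result == 0);                /* result is zero */
    f.c = (a >= b);                     /* no borrow in the unsigned subtract */
    /* signed overflow: operands differ in sign and the result's sign
       differs from the sign of a */
    f.v = (int)((((a ^ b) & (a ^ result)) >> 31) & 1u);
    return f;
}
```

A later conditional instruction then tests these flags; for instance, an "equal" condition checks z, so no separate branch over the instruction is needed.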
Instruction Encoding
2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
Evolution with backward compatibility
8080 (1974): 8-bit microprocessor
• Accumulator, plus 3 index-register pairs
8086 (1978): 16-bit extension to 8080
• Complex instruction set (CISC)
8087 (1980): floating-point coprocessor
• Adds FP instructions and register stack
80286 (1982): 24-bit addresses, MMU
• Segmented memory mapping and protection
80386 (1985): 32-bit extension (now IA-32)
• Additional addressing modes and operations
• Paged memory mapping as well as segments
The Intel x86 ISA
Further evolution…
i486 (1989): pipelined, on-chip caches and FPU
• Compatible competitors: AMD, Cyrix, …
Pentium (1993): superscalar, 64-bit datapath
• Later versions added MMX (Multi-Media eXtension) instructions
• The infamous FDIV bug
Pentium Pro (1995), Pentium II (1997)
• New microarchitecture (see Colwell, The Pentium Chronicles)
Pentium III (1999)
• Added SSE (Streaming SIMD Extensions) and associated registers
Pentium 4 (2001)
• New microarchitecture
• Added SSE2 instructions
The Intel x86 ISA
And further…
AMD64 (2003): extended architecture to 64 bits
EM64T – Extended Memory 64 Technology (2004)
• AMD64 adopted by Intel (with refinements)
• Added SSE3 instructions
Intel Core (2006)
• Added SSE4 instructions, virtual machine support
AMD64 (announced 2007): SSE5 instructions
• Intel declined to follow, instead…
Advanced Vector Extension (announced 2008)
• Longer SSE registers, more instructions
If Intel didn’t extend with compatibility, its competitors would!
Technical elegance ≠ market success
Basic x86 Registers
Basic x86 Addressing Modes
Two operands per instruction
Source/dest operand    Second source operand
Register               Register
Register               Immediate
Register               Memory
Memory                 Register
Memory                 Immediate