Lecture 3
Machine Instructions: Overview
■ Language of the machine
■ More primitive than higher level languages, e.g., no
sophisticated control flow such as while or for loops
■ Very restrictive
■ e.g., MIPS arithmetic instructions
■ We’ll be working with the MIPS instruction set architecture
■ inspired most architectures developed since the 80's
■ used by NEC, Nintendo, Silicon Graphics, Sony
■ the name is not related to millions of instructions per second!
■ it stands for Microprocessor without Interlocked Pipeline Stages
■ Design goals: maximize performance, minimize cost, and reduce
design time
MIPS Arithmetic
■ All MIPS arithmetic instructions have 3 operands
■ Operand order is fixed (e.g., destination first)
■ Example:
C code: A = B + C
(it is the compiler's job to associate variables with registers)
[Figure: the components of a computer: processor (control and datapath), memory, and I/O (input and output)]
Memory Organization
■ Viewed as a large single-dimension array with access by address
■ A memory address is an index into the memory array
■ Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
0 8 bits of data
1 8 bits of data
2 8 bits of data
3 8 bits of data
4 8 bits of data
5 8 bits of data
6 8 bits of data
...
Memory Organization
■ Bytes are load/store units, but most data items use larger words
■ For MIPS, a word is 32 bits or 4 bytes.
0 32 bits of data
4 32 bits of data
8 32 bits of data
12 32 bits of data
...
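As a rough illustration of the two views of memory above, the following Python sketch converts word indices to byte addresses. It is not part of the original slides: the helper names are made up for this example, and the alignment check assumes that word accesses start at multiples of 4.

```python
WORD_SIZE = 4  # bytes per MIPS word (32 bits)

def word_address(index):
    """Byte address of the word with the given index (0, 1, 2, ...)."""
    return index * WORD_SIZE

def is_word_aligned(address):
    """True when a byte address is a multiple of the word size."""
    return address % WORD_SIZE == 0

# The first four words start at byte addresses 0, 4, 8 and 12.
print([word_address(i) for i in range(4)])
```

Byte addresses in between (1, 2, 3, 5, ...) still exist; they name the individual bytes inside each word.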
Load/Store Instructions
■ Load and store instructions
■ Example: why is the offset 32?
The constant index is multiplied by 4, because each word occupies 4 bytes (32 = 8 × 4)
■ Instruction Meaning
■ Instructions, like registers and words of data, are also 32 bits long
■ Example: add $t0, $s1, $s2
■ registers are numbered, e.g., $t0 is 8, $s1 is 17, $s2 is 18
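To make the 32-bit encoding concrete, here is a hedged Python sketch that packs add $t0, $s1, $s2 into one word using the R format fields (op, rs, rt, rd, shamt, funct) with widths 6, 5, 5, 5, 5 and 6 bits. The function name is hypothetical; the funct value 32 for add and op value 0 are taken from the standard MIPS encoding rather than from these slides.

```python
def encode_r_format(rs, rt, rd, shamt, funct, op=0):
    """Pack the six R-format fields into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

ADD_FUNCT = 32  # funct code for add (assumption: standard MIPS encoding)

# add $t0, $s1, $s2: rd = $t0 (8), rs = $s1 (17), rt = $s2 (18)
word = encode_r_format(rs=17, rt=18, rd=8, shamt=0, funct=ADD_FUNCT)
print(f"{word:032b}")
print(hex(word))
```

Note that the destination register comes first in the assembly text but sits in the rd field, after rs and rt, in the machine word.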
Memory Organization: Big/Little Endian Byte Order
Byte numbering within a word (bit 31 on the left, bit 0 on the right):
        Big-endian                     Little-endian
Word 0  Byte 0 Byte 1 Byte 2 Byte 3    Byte 3 Byte 2 Byte 1 Byte 0
Word 1  Byte 4 Byte 5 Byte 6 Byte 7    Byte 7 Byte 6 Byte 5 Byte 4
■ SPIM’s memory storage depends on that of the underlying
machine
■ Intel 80x86 processors are little-endian
■ because SPIM always shows words from left to right, a “mental
adjustment” has to be made for little-endian memory, as in the Intel PCs
in our labs: start at the right of the first word and go left, then start
at the right of the next word and go left, and so on
■ Word placement in memory (from .data area of code) or word
access (lw, sw) is the same in big or little endian
■ Byte placement and byte access (lb, lbu, sb) depend on big or
little endian because of the different numbering of bytes within a
word
■ Character placement in memory (from .data area of code)
depends on big or little endian because it is equivalent to byte
placement after ASCII encoding
■ Run storeWords.asm from SPIM examples!
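The difference between the two byte orders can be sketched in Python with the standard struct module. This is only an illustration, not SPIM output; the value 0x11223344 is an arbitrary example.

```python
import struct

word = 0x11223344  # arbitrary example value

big = struct.pack(">I", word)     # big-endian byte order
little = struct.pack("<I", word)  # little-endian byte order

# In each listing the byte at the lowest address comes first.
print([hex(b) for b in big])
print([hex(b) for b in little])

# Reading the bytes back with the matching order recovers the same word,
# which is why word access looks the same on both kinds of machine.
assert struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == word
```

Only byte-level access (as with lb, lbu, sb) sees the difference: byte 0 holds 0x11 in the big-endian layout and 0x44 in the little-endian one.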
Control: Conditional Branch
■ Decision making instructions
■ alter the control flow,
■ i.e., change the next instruction to be executed
I    op | rs | rt | 16 bit offset
J    op | 26 bit address
■ The MIPS jump instruction j replaces the lower 28 bits of the PC with
A00, where A is the 26 bit address; it never changes the upper 4 bits
■ Example: if PC = 1011X (where X = 28 bits), it is replaced with
1011A00
■ there are 16 (= 2^4) partitions of the 2^32 size address space, each
partition of size 256 MB (= 2^28), such that, in each partition, the upper
4 bits of the address are the same
■ if a program crosses an address partition, then a j that reaches a
different partition has to be replaced by jr, with the full 32-bit address
first loaded into the jump register
■ therefore, the OS should always try to load a program inside a single
partition
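The address computation described above can be sketched as follows. The function names are hypothetical, and the sketch follows the slide's simplification of using the PC itself (real hardware takes the upper 4 bits from the address of the instruction after the jump).

```python
def jump_target(pc, address26):
    """Keep the upper 4 bits of pc; replace the lower 28 with address26 followed by 00."""
    return (pc & 0xF0000000) | ((address26 & 0x03FFFFFF) << 2)

def same_partition(a, b):
    """True when two addresses share their upper 4 bits (same 256 MB partition)."""
    return (a >> 28) == (b >> 28)

pc = 0xB0000100                      # upper 4 bits are 1011
target = jump_target(pc, 0x123456)   # arbitrary 26-bit address field
print(hex(target))
print(same_partition(pc, target))
```

Whatever 26-bit value is supplied, the result stays in the partition of the PC, which is why a jump into another partition needs jr with a full 32-bit address.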
Constants
■ Small constants are used quite frequently (50% of operands)
e.g., A = A + 5;
B = B + 1;
C = C - 18;
■ MIPS Instructions:
addi $29, $29, 4
slti $8, $18, 10
andi $29, $29, 6
ori $29, $29, 4
op | rs | rt | 16 bit number
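As a hedged illustration of how a small constant fits inside the instruction word, the Python sketch below packs the fields op, rs, rt and the 16 bit number. The function name is made up, the addi opcode value 8 comes from the standard MIPS encoding, and the choice of register 18 for the variable C is purely an assumption for the example.

```python
def encode_i_format(op, rs, rt, immediate):
    """Pack op, rs, rt and a 16-bit immediate into one 32-bit word."""
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

ADDI_OP = 8  # opcode for addi (assumption: standard MIPS encoding)

# addi $29, $29, 4: rt is the destination, rs is the source
print(hex(encode_i_format(ADDI_OP, rs=29, rt=29, immediate=4)))

# C = C - 18 becomes an addi with a negative constant in two's complement
# (register 18 for C is a hypothetical assignment).
print(hex(encode_i_format(ADDI_OP, rs=18, rt=18, immediate=-18)))
```

A constant that does not fit in 16 bits cannot be encoded this way and is rejected by the range check.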
How about larger constants?
■ First we need to load a 32 bit constant into a register
■ Must use two instructions for this: first the new load upper immediate
instruction, for the upper 16 bits
lui $t0, 1010101010101010
$t0 = 1010101010101010 0000000000000000   (lower 16 bits filled with zeros)
■ Then ori sets the lower 16 bits:
      1010101010101010 0000000000000000
ori   0000000000000000 1010101010101010
    = 1010101010101010 1010101010101010
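The two-step construction can be modelled in a few lines of Python. This is a sketch of the arithmetic only, with made-up function names; register values are plain integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def lui(immediate16):
    """Place a 16-bit value in the upper half; the lower half is filled with zeros."""
    return (immediate16 << 16) & MASK32

def ori(register_value, immediate16):
    """Bitwise OR with a zero-extended 16-bit immediate."""
    return register_value | (immediate16 & 0xFFFF)

def load_constant(value32):
    """Build a 32-bit constant the way the lui + ori pair does."""
    upper = (value32 >> 16) & 0xFFFF
    lower = value32 & 0xFFFF
    return ori(lui(upper), lower)

t0 = lui(0b1010101010101010)
print(f"{t0:032b}")
t0 = ori(t0, 0b1010101010101010)
print(f"{t0:032b}")
```

Because lui leaves the lower half at zero, the OR in the second step cannot disturb the upper half.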
■ Formats:
R    op | rs | rt | rd | shamt | funct
I    op | rs | rt | 16 bit address
J    op | 26 bit address
Control Flow
■ We have: beq, bne. What about branch-if-less-than?
■ New instruction:
slt $t0, $s1, $s2
if $s1 < $s2 then $t0 = 1 else $t0 = 0
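The behaviour of slt, and one way it could be combined with bne to get a branch-if-less-than, can be sketched in Python. The function names are hypothetical, and the sketch assumes register contents are 32-bit patterns compared as two's complement signed values.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def slt(rs_value, rt_value):
    """Return 1 if rs < rt as signed 32-bit values, else 0."""
    return 1 if to_signed32(rs_value) < to_signed32(rt_value) else 0

def branch_if_less_than(rs_value, rt_value):
    """slt followed by bne against zero: the branch is taken when the slt result is not 0."""
    t0 = slt(rs_value, rt_value)
    return t0 != 0

print(slt(3, 7), slt(7, 3), slt(0xFFFFFFFF, 1))
```

The last call shows why the signed interpretation matters: the pattern 0xFFFFFFFF is -1, so it counts as less than 1.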
Procedures
■ Example C code:
int add10(int i)
{ return (i + 10);}
■ Translated MIPS assembly
■ Note more efficient use of registers possible!
        .text
        .globl main
main:
        addi $s0, $0, 5
        add  $a0, $s0, $0     # argument to callee
        jal  add10            # jump and link
        add  $s1, $v0, $0     # control returns here
        add  $s0, $s1, $0
        li   $v0, 10          # system code & call to exit
        syscall

add10:
        addi $sp, $sp, -4     # save register in stack, see figure below
        sw   $s0, 0($sp)
        addi $s0, $a0, 10
        add  $v0, $s0, $0     # result to caller
        lw   $s0, 0($sp)      # restore values
        addi $sp, $sp, 4
        jr   $ra              # return

[Figure: MEMORY, from high address down to low address; $sp points to the word holding the content of $s0]
Run this code with PCSpim: procCallsProg1.asm
MIPS: Software Conventions for Registers
Number  Name  Use
0       zero  constant 0
1       at    reserved for assembler
2       v0    results from callee
3       v1    returned to caller
4       a0    arguments to callee
5       a1    from caller: caller saves
6       a2
7       a3
8       t0    temporary: caller saves
...           (callee can clobber)
15      t7
16      s0    callee saves
...           (caller can clobber)
23      s7
24      t8    temporary (cont’d)
25      t9
26      k0    reserved for OS kernel
27      k1
28      gp    pointer to global area
29      sp    stack pointer
30      fp    frame pointer
31      ra    return address (HW): caller saves
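The numbering above is regular enough to generate. The Python sketch below builds a name-to-number lookup; the function names are invented for illustration and are not part of any assembler.

```python
def build_register_table():
    """Map conventional register names to their numbers 0..31."""
    names = ["zero", "at", "v0", "v1"]
    names += [f"a{i}" for i in range(4)]   # 4..7
    names += [f"t{i}" for i in range(8)]   # 8..15
    names += [f"s{i}" for i in range(8)]   # 16..23
    names += ["t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]  # 24..31
    return {name: number for number, name in enumerate(names)}

REGISTERS = build_register_table()

def register_number(name):
    """Accept '$t0' or 't0' and return the register number."""
    return REGISTERS[name.lstrip("$")]

print(register_number("$t0"), register_number("$s1"), register_number("$s2"))
```

This reproduces the numbers quoted earlier in these slides for $t0, $s1 and $s2.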
Procedures (recursive)
■ Example C code (recursive factorial subroutine):
int fact(int n);
int main()
{ int i, j;
i = 4;
j = fact(i);
return 0;}
int fact(int n)
{ if (n < 1) return (1);
else return ( n*fact(n-1) );}
Procedures (recursive)
■ Translated MIPS assembly:
        .text
        .globl main
main:
        addi $a0, $0, 4
        jal  fact
        nop                   # control returns from fact
        move $a0, $v0         # print value returned by fact
        li   $v0, 1
        syscall
        li   $v0, 10          # exit
        syscall

fact:
        addi $sp, $sp, -8     # save return address and argument in stack
        sw   $ra, 4($sp)
        sw   $a0, 0($sp)
        slti $t0, $a0, 1
        beq  $t0, $0, L1      # branch to L1 if n>=1
        nop
        addi $v0, $0, 1       # return 1 if n < 1
        addi $sp, $sp, 8
        jr   $ra
L1:
        addi $a0, $a0, -1     # if n>=1 call fact recursively with argument n-1
        jal  fact
        nop
        lw   $a0, 0($sp)      # restore return address, argument, and stack pointer
        lw   $ra, 4($sp)
        addi $sp, $sp, 8
        mul  $v0, $a0, $v0    # return n*fact(n-1)
        jr   $ra              # return control

Run this code with PCSpim: factorialRecursive.asm
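To see how the stack grows during the recursion, the following Python sketch mirrors the fact routine with a modelled stack. It is a simplification under stated assumptions: memory is a dictionary keyed by address, only the saved argument is stored (the return address slot is skipped because Python handles returns itself), and the starting value of the stack pointer is the SPIM value quoted later in these slides.

```python
STACK_TOP = 0x7FFFEFFC  # initial $sp value that these slides quote for SPIM

def fact(n, memory, sp, frames):
    """Recursive factorial that saves its argument in a modelled stack frame."""
    sp -= 8                 # addi $sp, $sp, -8
    memory[sp] = n          # sw $a0, 0($sp); the $ra slot at sp + 4 is not modelled
    frames.append(sp)       # record where this call's frame starts
    if n < 1:
        return 1            # base case: the frame is released on return
    result = fact(n - 1, memory, sp, frames)
    n = memory[sp]          # lw $a0, 0($sp): restore the argument
    return n * result       # mul $v0, $a0, $v0

frames = []
print(fact(4, {}, STACK_TOP, frames))
print([hex(sp) for sp in frames])
```

Each call, including the final one with n = 0, allocates its own 8-byte frame, so computing fact(4) uses five frames before any of them is released.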
Using a Frame Pointer
Variables that are local to a procedure but do not fit into registers (e.g., local arrays,
structures, etc.) are also stored in the stack. This area of the stack is the frame. The frame
pointer $fp points to the top of the frame and the stack pointer to the bottom. The frame
pointer does not change during procedure execution, unlike the stack pointer, so it is a
stable base register from which to compute offsets to local variables.
Use of the frame pointer is optional. If there are no local variables to store in the stack, it
is not efficient to use a frame pointer.
Using a Frame Pointer
■ Example: procCallsProg1Modified.asm
This program shows code where it may be better to use $fp
■ Because the stack size is changing, the offset of variables stored in
the stack w.r.t. the stack pointer $sp changes as well. However, the
offset w.r.t. $fp would remain constant.
■ Why would this be better?
The compiler, when generating assembly, typically maintains a table
of program variables and their locations. If these locations are
offsets w.r.t. $sp, then every entry must be updated every time the
stack size changes!
■ Exercise:
Modify procCallsProg1Modified.asm to use a frame pointer
■ Observe that SPIM names register 30 as s8 rather than fp. Of
course, you can use it as fp, but make sure to initialize it with the
same value as sp, i.e., 0x7fffeffc.
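The point about changing offsets can be checked with a little arithmetic. In this Python sketch (an illustration only, with made-up variable names), one local variable is placed on the stack, the stack then grows by two more words, and the offsets of that variable relative to $sp and $fp are compared.

```python
WORD = 4

fp = sp = 0x7FFFEFFC     # frame pointer initialised to the stack pointer value

sp -= WORD               # allocate one local variable on the stack
local_address = sp

def offsets(address, sp, fp):
    """Offsets of a stack location relative to $sp and to $fp."""
    return address - sp, address - fp

before = offsets(local_address, sp, fp)

sp -= 2 * WORD           # the stack grows by two more words

after = offsets(local_address, sp, fp)

print(before, after)
```

The first element of each pair (the offset from $sp) changes as the stack grows, while the second (the offset from $fp) stays the same, so a compiler's table of variable locations would not need updating.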
MIPS Addressing Modes
Overview of MIPS
■ Simple instructions – all 32 bits wide
■ Very structured – no unnecessary baggage
■ Only three instruction formats
R    op | rs | rt | rd | shamt | funct
I    op | rs | rt | 16 bit address
J    op | 26 bit address
■ Saving grace:
■ the most frequently used instructions are not too difficult to build
■ compilers avoid the portions of the architecture that are slow
■ Design Principles:
■ simplicity favors regularity
■ smaller is faster
■ good design demands compromise
■ make the common case fast.