
Chapter 2

The document outlines the fundamentals of computer architecture, focusing on the MIPS instruction set architecture, its design principles, and the processes involved in executing procedures. It details the structure of instructions, data storage, and the handling of registers, including how to manage memory and perform operations with constants. Additionally, it explains the steps involved in procedure calls and the management of stack memory during execution.

Uploaded by

শতক দে
CSE 305

Computer Architecture

Instructions
Prepared by
Madhusudan Basak
Assistant Professor
CSE, BUET
*Some modifications done by Saem Hasan
Instructions
 A computer is run by instructions
 Instruction Set
 All possible instructions for a CPU
Instruction Set Architecture
 Instruction set architecture is the attributes of a computing system as seen by
the assembly language programmer or compiler.
 Instruction Set (what operations can be performed?)

 Instruction Format (how are instructions specified?)

 Data storage (where is data located?)

 Addressing Modes (how is data accessed?)

 Exceptional Conditions (what happens if something goes wrong?)


Design Philosophy

MIPS Architecture
 Acronym of “Microprocessor without Interlocked Pipelined Stages”
 Is a RISC (Reduced Instruction Set Computer) ISA (Instruction Set Architecture)
 Developed by MIPS Technologies (then MIPS Computer Systems) in 1985
Design Principles
 Design Principle 1: Simplicity favors regularity.

 Design Principle 2: Smaller is faster.

 Design Principle 3: Good design demands good compromises.


Simplicity favors regularity
 The instructions of MIPS are fixed and rigid
 Rigidity ensures Regularity
 Simplicity favors regularity
Example Instruction
 and instructions always follow the following fixed format
MIPS Instruction Properties
 Arithmetic operations take only register values as operands
 Size of a register : 32 bit
 Number of registers: 32
 Satisfies Design Principle 2: Smaller is faster.
• Large number of registers -> longer time for electronic signal to travel -> Increased
clock cycle
• Large number of registers -> Increased number of control bits

 Each register is identified by a dedicated number
 5 bits are enough to identify a register when there are 32 of them
 Compiler maps a program variable to a register
MIPS Instruction Properties
 Example
MIPS Instruction Properties
 What about data structures (Arrays and Structures)?
 These are kept in memory and a particular element is transferred to registers
whenever required
 Compiler
 places an array/structure into memory
 keeps the starting address of the memory in a register
 tracks which register is used for which array/structure

 Data Transfer Instructions


 load word: lw
 store word: sw
Data Transfer Instructions
 Example

Spilling Registers
 What if the number of variables and data structures is more than 32?
 The process of putting less commonly used variables into memory is called
Spilling Registers.
 Keeps the most frequently used variables in registers and the less frequently used ones in memory
 Moves variables around registers and memory
Operation with Constants
 Add immediate or addi instruction
 Takes one register and one constant as input

 Constant 0 is used very often (e.g., for transfer operations)


 A register ($zero) is dedicated to this purpose
add $s2, $s3, $zero # $s2 = $s3

 Is an instruction represented using register names?


Representing Instructions in the Computer
 Key principles of Today’s computer instructions
 Instructions are represented as numbers
 Programs are stored in memory to be read or written, just like data
Representing Instructions in the Computer
 An instruction is a sequence of bits
 Each register is mapped to a number
 There is a convention to map register names into numbers (e.g., registers $s0 to
$s7 map onto registers 16 to 23)

 Each instruction is 32 bit long (size of a word)


 Satisfies Design Principle 1: Simplicity favors regularity.

 Instruction Format: format used by an instruction


A MIPS Instruction Format
 R-Format
 Deals with arithmetic operations with registers
 op: shorthand of opcode, denotes operation type and format type
 rs: Source Register 1
 rt: Source Register 2
 rd: Destination Register
 shamt: shorthand of shift amount
 funct: shorthand of function code, denotes the specific variant of an operation
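The field layout above can be sketched in C as a packing function. This is only an illustration: the function name `encode_r_format` is hypothetical, and the field widths (6, 5, 5, 5, 5, 6 bits) are the standard MIPS R-format widths rather than something stated on this slide.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit instruction word.
   Assumed field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits. */
uint32_t encode_r_format(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct)
{
    /* Every field must fit in its allotted number of bits. */
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}
```

For example, `add $t0, $s1, $s2` uses op 0, rs 17, rt 18, rd 8, shamt 0 and funct 32, so `encode_r_format(0, 17, 18, 8, 0, 32)` gives 0x02324020.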
A MIPS Instruction Format

Would it be good to use just one format for all kinds of operations?

No!
Design Principle 3: Good design demands good compromises.
Another MIPS Instruction Format
 I-Format
 Deals with immediate and data transfer operations

 Here rt represents the destination register
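A minimal C sketch of I-format packing, assuming the standard layout of op (6 bits), rs (5), rt (5) and a 16-bit immediate. The function name `encode_i_format` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four I-format fields into one 32-bit instruction word.
   The immediate is stored as its 16-bit two's-complement pattern. */
uint32_t encode_i_format(uint32_t op, uint32_t rs, uint32_t rt, int16_t immediate)
{
    assert(op < 64 && rs < 32 && rt < 32);
    return (op << 26) | (rs << 21) | (rt << 16) | (uint16_t)immediate;
}
```

For example, `lw $t0, 32($s3)` has op 35, rs 19 (base register), rt 8 (destination) and immediate 32, which packs to 0x8E680020; `addi $sp, $sp, -12` (op 8, rs 29, rt 29) packs to 0x23BDFFF4.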


MIPS Instruction Encoding (Simplified)
Summary
Logical Operations

No NOT Operation?
Shift Left
 sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
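The effect of these logical operations can be mirrored in C. This is a sketch with hypothetical function names; the NOT-via-NOR idiom (`nor` with `$zero`) is the usual MIPS answer to the question above and is an assumption here, since the slide's own answer did not survive extraction.

```c
#include <assert.h>
#include <stdint.h>

/* sll: shifting left by n bits multiplies an unsigned value by 2^n. */
uint32_t shift_left_logical(uint32_t value, unsigned amount)
{
    assert(amount < 32);
    return value << amount;
}

/* nor: NOT (a OR b). With b = 0 this is plain NOT a. */
uint32_t mips_nor(uint32_t a, uint32_t b)
{
    return ~(a | b);
}
```

With `$s0` holding 9, `sll $t2, $s0, 4` corresponds to `shift_left_logical(9, 4)`, which is 144 (9 × 16).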
Branching Operations

if (i==j)
f = g + h;
else
f = g - h;

Why not blt?


Branching Operations
Memory Allocation for Program and Data
 Reserved Memory
 Text Segment: Program Code
 Static Data Segment:
 Contains static variables, constants, arrays
 Dynamic Data Segment (Heap):
 Contains linked lists, variable length data
 Stack
 Contains function local variables, parameters etc.
Stack During Procedure Call
 Stores/saves the values of registers and Restores those later
 Either $sp and $fp combination, or only $sp is used
 The stack area allocated by a procedure is known as its activation record or procedure frame (the space allocated for a function = activation record)
Procedures in MIPS
 Six steps during a procedure call
1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from
several points in a program.
 Place for parameters:


 $a0–$a3: four argument registers in which to pass parameters
 Memory (e.g., stack) can be used if more values to be passed
 Transfer Control:
 Jump-and-link instruction (jal)

 Keeps the return address for the procedure in $ra and jumps to the procedure
 Acquiring the storage resources


 Different register values (manipulated by caller) are stored in the stack
 Prepares the registers for operations
 Can use the memory for data storage
 Performing Tasks
 Can perform all the operations allowed by the MIPS instruction set
 Result storing by the callee


 $v0–$v1: two value registers in which to return values
 Memory (e.g., stack) can be used if more values to be returned
 Procedure returned
 Stack is adjusted using $sp and $fp pointers
 An unconditional jump to the address from where the caller will resume execution
An Example
non_leaf:
...
# 1. Put parameters in a place where the procedure can access them.
add $a0, $s1, $zero
add $a1, $s2, $zero
add $a2, $s3, $zero
add $a3, $s4, $zero
# 2. Transfer control to the procedure.
jal leaf
add $s1, $zero, $v0
...
leaf:
# 3. Acquire the storage resources needed for the procedure.
addi $sp, $sp, -12 # adjust stack to make room for 3 items
sw $t1, 8($sp) # save register $t1 for use afterwards
sw $t0, 4($sp) # save register $t0 for use afterwards
sw $s0, 0($sp) # save register $s0 for use afterwards
# 4. Perform the desired task.
add $t0, $a0, $a1 # register $t0 contains g + h
add $t1, $a2, $a3 # register $t1 contains i + j
sub $s0, $t0, $t1 # f = $t0 - $t1, which is (g + h) - (i + j)
# 5. Put the result value in a place where the calling program can access it.
add $v0, $s0, $zero # set return value
lw $s0, 0($sp) # restore register $s0 for caller
lw $t0, 4($sp) # restore register $t0 for caller
lw $t1, 8($sp) # restore register $t1 for caller
addi $sp, $sp, 12 # adjust stack to delete 3 items
# 6. Return control to the point of origin, since a procedure can be called from several points in a program.
jr $ra # jump back to calling routine
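For comparison, the leaf procedure above corresponds to a C function along these lines. The variable names g, h, i, j and f are taken from the comments in the assembly; the function name `leaf` mirrors the label.

```c
/* C view of the leaf procedure: four arguments arrive in $a0-$a3,
   the result is returned in $v0. */
int leaf(int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}
```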
Nested Call
fact: addi $sp, $sp, -8 # adjust stack for 2 items
sw $ra, 4($sp) # save the return address
sw $a0, 0($sp) # save the argument n
slti $t0, $a0, 1 # test for n < 1
beq $t0, $zero, L1 # if n >= 1, go to L1
addi $v0, $zero, 1 # set the return value to 1
addi $sp, $sp, 8 # pop 2 items off stack
jr $ra # return to caller
L1: addi $a0, $a0, -1 # n >= 1: argument gets (n - 1)
jal fact # call fact with (n - 1)
lw $a0, 0($sp) # return from jal: restore argument n
lw $ra, 4($sp) # restore the return address
addi $sp, $sp, 8 # adjust stack pointer to pop 2 items
mul $v0, $a0, $v0 # return n * fact(n - 1)
jr $ra # return to the caller
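The nested-call assembly above implements a recursive factorial. A C sketch of the same logic follows; the range check is an added assumption to keep the result inside a 32-bit int and has no counterpart in the assembly.

```c
#include <assert.h>

/* Recursive factorial: each call needs its own copy of n and of the
   return address, which is why the assembly saves $a0 and $ra on the stack. */
int fact(int n)
{
    assert(n <= 12); /* 13! does not fit in a 32-bit int */
    if (n < 1)
        return 1;
    else
        return n * fact(n - 1);
}
```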
J-Format
 To support long jump to a remote procedure address

opcode Jumping Address


ASCII representation of characters
 ASCII stands for American Standard Code for Information Interchange
 Uses 1 byte to represent a character
1 Byte Memory Operation
 ASCII demands 1-byte memory operation
 MIPS supports
lb $t0,0($sp) #Reads 1 byte from memory and stores in the lowest (rightmost) byte of $t0
sb $t0,0($gp) #Reads 1 byte from the lowest (rightmost) byte of $t0 and stores in the memory

Unsigned Version:
Load Byte: lbu
Store Byte: no unsigned version in MIPS (sb writes only the lowest byte, so no extension is involved)
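The difference between lb and lbu is how the loaded byte is widened to 32 bits. The following C sketch models memory as a byte array; the function names are hypothetical and the model ignores everything except the extension behaviour.

```c
#include <stdint.h>

/* lb: load one byte and sign-extend it to 32 bits. */
int32_t load_byte(const uint8_t *memory, uint32_t address)
{
    int32_t value = memory[address];
    if (value & 0x80)
        value -= 0x100; /* replicate the sign bit into the upper 24 bits */
    return value;
}

/* lbu: load one byte and zero-extend it to 32 bits. */
uint32_t load_byte_unsigned(const uint8_t *memory, uint32_t address)
{
    return memory[address];
}

/* sb: store only the lowest (rightmost) byte of the register value. */
void store_byte(uint8_t *memory, uint32_t address, uint32_t reg)
{
    memory[address] = (uint8_t)(reg & 0xFF);
}
```

A byte holding 0xFF therefore loads as -1 with lb and as 255 with lbu.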
Dealing with Strings
 Three commonly available strategies
 the first position of the string is reserved to give the length of a string
 an accompanying variable has the length of the string
 the last position of a string is indicated by a character used to mark the end of a
string
String Copy Example
 MIPS code:
strcpy:
addi $sp, $sp, -4 # adjust stack for 1 item
sw $s0, 0($sp) # save $s0
add $s0, $zero, $zero # i = 0
L1: add $t1, $s0, $a1 # addr of y[i] in $t1
lbu $t2, 0($t1) # $t2 = y[i]
add $t3, $s0, $a0 # addr of x[i] in $t3
sb $t2, 0($t3) # x[i] = y[i]
beq $t2, $zero, L2 # exit loop if y[i] == 0
addi $s0, $s0, 1 # i = i + 1
j L1 # next iteration of loop
L2: lw $s0, 0($sp) # restore saved $s0
addi $sp, $sp, 4 # pop 1 item from stack
jr $ra # and return
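The assembly above follows the third strategy (a terminating character marks the end of the string). A C sketch of the same loop is shown here; it is named `string_copy` rather than `strcpy` only to avoid clashing with the C library, and the null-pointer check is an added assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the null-terminated string y into x, including the terminator.
   x corresponds to $a0, y to $a1 and i to $s0 in the assembly. */
void string_copy(char x[], const char y[])
{
    int i;
    assert(x != NULL && y != NULL);
    i = 0;
    while ((x[i] = y[i]) != '\0') /* copy, then test the byte just copied */
        i += 1;
}
```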

Chapter 2 — Instructions: Language of the Computer — 46


Dealing with Multiple Languages
 Unicode, a universal encoding, supports alphabets of most human languages
 Java uses Unicode
 Requires 16 bits to represent a character
 Operations required to load and store 16 bits (halfwords)
 Load halfword: lh
 Load halfword unsigned: lhu
 Store halfword: sh
Constant Size: Is larger feasible?
 The I-format is used by both immediate and memory data transfer operations

 More than 16 bit long constant?


 We have two options in hand
• Supporting short constants in 1 instruction and dealing with long constants using additional instructions
• Supporting long constants in 2 instructions
 We opted for the first option (to make the common case fast)
 Are we limited to 16-bit constants (-32,768 to 32,767)?
 No → use two instructions to populate a register with a 32-bit constant
Manipulating Larger Constant
 Populating a register with a 32-bit constant
 Load Upper Immediate (lui): loads the upper 16 bits
 Or Immediate (ori): loads the lower 16 bits
 No sign extension of the immediate for logical operations, but it happens for arithmetic operations
 Example: x = y + 4000000
 4000000 = 0000 0000 0011 1101 0000 1001 0000 0000
lui $s0, 61 # 61 = 0000 0000 0011 1101
# $s0 = 0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304 # 2304 = 0000 1001 0000 0000, zero-extended to 0000 0000 0000 0000 0000 1001 0000 0000
# $s0 = 0000 0000 0011 1101 0000 1001 0000 0000
add $s1, $t0, $s0

 Think how to use 32 bit address to access data from memory
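The lui/ori pair can be checked with a small C model. The function names mirror the instructions but are otherwise hypothetical, and `load_constant` is an assumed helper that splits a 32-bit value into its two halves.

```c
#include <stdint.h>

/* lui: place a 16-bit immediate in the upper half, clear the lower half. */
uint32_t lui(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* ori: OR with a zero-extended 16-bit immediate (no sign extension). */
uint32_t ori(uint32_t reg, uint16_t immediate)
{
    return reg | immediate;
}

/* Build any 32-bit constant with one lui followed by one ori. */
uint32_t load_constant(uint32_t constant)
{
    uint32_t reg = lui((uint16_t)(constant >> 16));
    return ori(reg, (uint16_t)(constant & 0xFFFF));
}
```

Here `ori(lui(61), 2304)` is 61 × 65536 + 2304 = 4000000, matching the example.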


Addressing in Branches
 PC is 32 bits but the address field in conditional branches (if-else, loops, etc.) is 16 bits
 Used as an absolute address, this would confine the program to a small region of memory (128 KB)
 Most conditional branches jump nearby (whether forward or backward)
 Use PC-relative addressing to reach a forward (PC + relative constant address) or backward (PC - relative constant address) location
 Can support branching up to ±2^15 words away
 PC normally increments by 1 word (4 bytes) after every instruction
 The addressing should therefore be (PC + 4 + relative constant address) for forward access and (PC + 4 - relative constant address) for backward access
Addressing in Jumps
 PC is 32 bits but the address field in jumps is 26 bits

opcode Jumping Address

 Limits a program to 2^26 instructions (2^28 bytes = 256 MB)
 The address field, multiplied by 4, replaces the 28 rightmost bits of PC
 Known as pseudodirect addressing
 A program is not placed across an address boundary of 256 MB
 Otherwise, a jump must be replaced by a jump register instruction preceded by other instructions to load the full 32-bit address into a register
Food for Thought
 Can you reverse engineer a binary?
 Is your IP secured?
Jump Addressing
 Jump (j and jal) targets could be anywhere in text
segment
 Encode full address in instruction

op address
6 bits 26 bits

 (Pseudo)Direct jump addressing


 Target address = PC[31…28] : (address × 4)

Target Addressing Example
 Loop code from earlier example
 Assume Loop at location 80000

Instruction                 Address   Encoded fields
Loop: sll  $t1, $s3, 2      80000      0   0  19   9   4   0
      add  $t1, $t1, $s6    80004      0   9  22   9   0  32
      lw   $t0, 0($t1)      80008     35   9   8   0
      bne  $t0, $s5, Exit   80012      5   8  21   2
      addi $s3, $s3, 1      80016      8  19  19   1
      j    Loop             80020      2   20000
Exit: …                     80024
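The two address calculations used in this example can be sketched in C. The function names are hypothetical; the jump version takes the upper four bits from the already-incremented PC (PC + 4), which is an assumption about the exact hardware behaviour that makes no difference for the addresses used here.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative: target = (PC + 4) + offset * 4, offset is a signed word count. */
uint32_t branch_target(uint32_t pc, int16_t offset)
{
    return pc + 4 + (uint32_t)((int32_t)offset * 4);
}

/* Pseudodirect: upper 4 bits of (PC + 4) joined with the 26-bit field times 4. */
uint32_t jump_target(uint32_t pc, uint32_t address)
{
    assert(address < (1u << 26));
    return ((pc + 4) & 0xF0000000u) | (address << 2);
}
```

For the table above, `branch_target(80012, 2)` gives 80024 (Exit) and `jump_target(80020, 20000)` gives 80000 (Loop).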
Branching Far Away
 If branch target is too far to encode with 16-bit offset, assembler rewrites the
code
 Example
beq $s0, $s1, L1
is rewritten as
bne $s0, $s1, L2
j L1
L2: …
Addressing Mode Summary
2.11 Parallelism and Instructions: Synchronization

Synchronization
 Two processors sharing an area of memory may try to read/write it at the same time
 P1 writes, then P2 reads
 Data race if P1 and P2 don’t synchronize
• Result depends on the order of accesses
 Hardware support required
 Atomic read/write memory operation
 No other access to the location allowed between the read and write
 Could be a single instruction
 E.g., atomic swap of register ↔ memory: the register and memory values are exchanged atomically (no other processor can interfere)
 Or an atomic pair of instructions
 Locking mechanisms such as spinlocks, where a processor waits until the resource is available, are built on these
Synchronization in MIPS
 Load linked: ll rt, offset(rs)
 Store conditional: sc rt, offset(rs)
 Succeeds if location not changed since the ll
• Returns 1 in rt
 Fails if location is changed
• Returns 0 in rt
 Example: atomic swap (to test/set lock variable)
try: add $t0, $zero, $s4 # copy exchange value
ll $t1, 0($s1) # load linked
sc $t0, 0($s1) # store conditional
beq $t0, $zero, try # branch back if the store fails
add $s4, $zero, $t1 # put load value in $s4
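The test-and-set use of an atomic swap can be sketched in C. This is not the ll/sc sequence itself: it swaps in the C11 `<stdatomic.h>` exchange operation as a stand-in, and the function names `try_acquire`, `acquire` and `release` are hypothetical.

```c
#include <stdatomic.h>

/* Atomically write 1 into the lock and report whether it was free (0) before. */
int try_acquire(atomic_int *lock)
{
    return atomic_exchange(lock, 1) == 0;
}

/* Spinlock: keep retrying the atomic swap until the lock is obtained. */
void acquire(atomic_int *lock)
{
    while (!try_acquire(lock)) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Mark the lock as free again. */
void release(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```

The retry loop in `acquire` plays the same role as the `beq $t0, $zero, try` branch: the swap is repeated until it succeeds.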
2.12 Translating and Starting a Program
Translation and Startup

[Figure: translation hierarchy from source program to loaded program. Annotations: many compilers produce object modules directly; static linking.]
Assembler Pseudoinstructions
 Most assembler instructions represent machine instructions one-to-one
 Pseudoinstructions: figments of the assembler’s imagination
move $t0, $t1 → add $t0, $zero, $t1
blt $t0, $t1, L → slt $at, $t0, $t1
bne $at, $zero, L
 $at (register 1): assembler temporary
Producing an Object Module
 Assembler (or compiler) translates program into machine instructions
 Provides information for building a complete program from the pieces
 Header: describes the contents of the object module
 Text segment: translated instructions
 Static data segment: data allocated for the life of the program
 Relocation info: for contents that depend on absolute location of loaded
program
 Symbol table: global definitions and external refs
 Debug info: for associating with source code
Linking Object Modules
 Produces an executable image (multiple object modules -> one unified executable file)

1. Merges segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs
 Could leave location dependencies for fixing by a relocating loader
 But with virtual memory, no need to do this
 Program can be loaded into absolute location in virtual memory space

Since each process has its own virtual address space, there is no need for relocation during loading.
Linking example (figure not reproduced), annotations:
 Text size = size of the instructions; data size = size of the variables (100 hex = 256 bytes)
 Object 1 (two instructions shown)
 lw has a dependency upon X; X is a global/external variable, as it is in the data segment
 jal has a dependency upon B; B must be an external function
 The symbol table stores global variables, normal variables and external procedures
 Object 2
 Y must be a global/external variable
 A is an external function/procedure
 Merged Object
Loading a Program
 Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
• Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including $sp, $fp, $gp)
6. Jump to startup routine
• Copies arguments to $a0, … and calls main
• When main returns, do exit syscall
Dynamic Linking
 Only link/load library procedure when it is called
 Requires procedure code to be relocatable
 Avoids image bloat caused by static linking of all (transitively) referenced libraries
 Automatically picks up new library versions
Dynamically Linked Libraries (DLL)
 Library routines that are linked to a program during execution
 To incorporate the updated version of library routines
 In lazy procedure linkage, each routine is linked only when it is called
 Provides a level of indirection
 Nonlocal routines are called through a set of dummy entries at the end of the program, with one entry per nonlocal routine. These dummy entries each contain an indirect jump
 The Dynamic Linker/Loader finds the desired routine, remaps it, and changes the address in the indirect jump location to point to that routine
 From then on, a call to the library routine jumps indirectly to the routine without the extra hops
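The indirection described above can be imitated in C with a function pointer that initially points at a stub. This is only a toy model under stated assumptions: `real_square` stands in for a library routine, `square_entry` for its slot in the indirection table, and `resolve_count` exists purely to show that the stub runs once.

```c
/* Counts how many times the "linker" had to resolve the routine. */
int resolve_count = 0;

/* The library routine that is linked lazily. */
int real_square(int x)
{
    return x * x;
}

int square_stub(int x);

/* Indirection table entry: starts out pointing at the stub. */
int (*square_entry)(int) = square_stub;

/* Stub: plays the role of the linker/loader on the first call,
   then patches the table entry so later calls skip the stub. */
int square_stub(int x)
{
    resolve_count += 1;
    square_entry = real_square;
    return square_entry(x);
}

/* All callers go through the table entry (the indirect jump). */
int call_square(int x)
{
    return square_entry(x);
}
```

Only the first call pays for the extra hops; every later call goes straight to `real_square`.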
Lazy Linkage

[Figure: lazy linkage. Components shown: indirection table; stub (loads routine ID, jumps to linker/loader); linker/loader code; dynamically mapped code.]
Starting a Java Program
 Java programs ensure portability at the cost of some performance
 Compiled first into an easy-to-interpret instruction set: Java bytecode
 A software interpreter, called the Java Virtual Machine (JVM), can execute Java bytecode
 This process is slow
 Just In Time Compiler (JIT) makes it faster
 Statistically identify the commonly used (hot) methods
 Compiles these methods into native instruction set
Starting Java Applications

[Figure: starting a Java application. Java bytecode is a simple portable instruction set for the JVM; the JVM interprets bytecodes; the JIT compiler compiles bytecodes of “hot” methods into native code for the host machine.]
C Sort Example

§2.13 A C Sort Example to Put It All Together


 Illustrates use of assembly instructions for a C
bubble sort function
 Swap procedure (leaf)
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
 v in $a0, k in $a1, temp in $t0
The Procedure Swap
swap: sll $t1, $a1, 2 # $t1 = k * 4
add $t1, $a0, $t1 # $t1 = v+(k*4)
# (address of v[k])
lw $t0, 0($t1) # $t0 (temp) = v[k]
lw $t2, 4($t1) # $t2 = v[k+1]
sw $t2, 0($t1) # v[k] = $t2 (v[k+1])
sw $t0, 4($t1) # v[k+1] = $t0 (temp)
jr $ra # return to calling routine

The Sort Procedure in C
 Non-leaf (calls swap)
void sort (int v[], int n)
{
int i, j;
for (i = 0; i < n; i += 1) {
for (j = i - 1;
j >= 0 && v[j] > v[j + 1];
j -= 1) {
swap(v,j);
}
}
}
 v in $a0, n in $a1, i in $s0, j in $s1

The Procedure Body
# Move params
move $s2, $a0 # save $a0 into $s2
move $s3, $a1 # save $a1 into $s3
# Outer loop
move $s0, $zero # i = 0
for1tst: slt $t0, $s0, $s3 # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
beq $t0, $zero, exit1 # go to exit1 if $s0 ≥ $s3 (i ≥ n)
# Inner loop
addi $s1, $s0, -1 # j = i - 1
for2tst: slti $t0, $s1, 0 # $t0 = 1 if $s1 < 0 (j < 0)
bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0)
sll $t1, $s1, 2 # $t1 = j * 4
add $t2, $s2, $t1 # $t2 = v + (j * 4)
lw $t3, 0($t2) # $t3 = v[j]
lw $t4, 4($t2) # $t4 = v[j + 1]
slt $t0, $t4, $t3 # $t0 = 0 if $t4 ≥ $t3
beq $t0, $zero, exit2 # go to exit2 if $t4 ≥ $t3
# Pass params & call
move $a0, $s2 # 1st param of swap is v (old $a0)
move $a1, $s1 # 2nd param of swap is j
jal swap # call swap procedure
# Inner loop
addi $s1, $s1, -1 # j -= 1
j for2tst # jump to test of inner loop
# Outer loop
exit2: addi $s0, $s0, 1 # i += 1
j for1tst # jump to test of outer loop
The Full Procedure
sort: addi $sp, $sp, -20 # make room on stack for 5 registers
sw $ra, 16($sp) # save $ra on stack
sw $s3,12($sp) # save $s3 on stack
sw $s2, 8($sp) # save $s2 on stack
sw $s1, 4($sp) # save $s1 on stack
sw $s0, 0($sp) # save $s0 on stack
… # procedure body

exit1: lw $s0, 0($sp) # restore $s0 from stack
lw $s1, 4($sp) # restore $s1 from stack
lw $s2, 8($sp) # restore $s2 from stack
lw $s3,12($sp) # restore $s3 from stack
lw $ra,16($sp) # restore $ra from stack
addi $sp,$sp, 20 # restore stack pointer
jr $ra # return to calling routine
Effect of Compiler Optimization

Compiled with gcc for Pentium 4 under Linux


[Figure: four bar charts comparing optimization levels none, O1, O2 and O3: Relative Performance, Instruction count, Clock Cycles, and CPI.]
Effect of Language and Algorithm
[Figure: three bar charts over C/none, C/O1, C/O2, C/O3, Java/int and Java/JIT: Bubblesort Relative Performance, Quicksort Relative Performance, and Quicksort vs. Bubblesort Speedup.]
Example: Clearing an Array
Array version:
void clear1(int array[], int size) {
int i;
for (i = 0; i < size; i += 1)
array[i] = 0;
}
move $t0, $zero # i = 0
loop1: sll $t1, $t0, 2 # $t1 = i * 4
add $t2, $a0, $t1 # $t2 = &array[i]
sw $zero, 0($t2) # array[i] = 0
addi $t0, $t0, 1 # i = i + 1
slt $t3, $t0, $a1 # $t3 = (i < size)
bne $t3, $zero, loop1 # if (i < size) goto loop1

Pointer version:
void clear2(int *array, int size) {
int *p;
for (p = &array[0]; p < &array[size]; p = p + 1)
*p = 0;
}
move $t0, $a0 # p = &array[0]
sll $t1, $a1, 2 # $t1 = size * 4
add $t2, $a0, $t1 # $t2 = &array[size]
loop2: sw $zero, 0($t0) # Memory[p] = 0
addi $t0, $t0, 4 # p = p + 4
slt $t3, $t0, $t2 # $t3 = (p < &array[size])
bne $t3, $zero, loop2 # if (p < &array[size]) goto loop2
Comparison of Array vs. Ptr
 Multiply “strength reduced” to shift
 Array version requires shift to be inside loop
 Part of index calculation for incremented i
 c.f. incrementing pointer
 Compiler can achieve same effect as manual use of pointers
 Induction variable elimination
 Better to make program clearer and safer
MIPS (RISC) Design Principles In Summary
 Simplicity favors regularity
 fixed size instructions
 small number of instruction formats
 opcode always the first 6 bits
 Smaller is faster
 limited instruction set
 limited number of registers
 limited number of addressing modes
 Good design demands good compromises
 Three instruction formats
 Make the common case fast
 arithmetic operands using the registers
 Saving the commonly used registers into stack
 PC-relative addressing
 allow instructions to contain immediate operands
Lessons Learnt
 Instruction count and CPI are not good performance indicators in isolation
 Compiler optimizations are sensitive to the algorithm
 Java/JIT compiled code is significantly faster than JVM interpreted
 Comparable to optimized C in some cases
 Nothing can fix a dumb algorithm!
Alternative Architectures
 Design alternative
 provide more powerful operations
 goal is to reduce number of instructions executed
 danger is a slower cycle time and/or a higher CPI
2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
 ARM: the most popular embedded core
 Similar basic set of instructions to MIPS
                        ARM              MIPS
Date announced          1985             1985
Instruction size        32 bits          32 bits
Address space           32-bit flat      32-bit flat
Data alignment          Aligned          Aligned
Data addressing modes   9                3
Registers               15 × 32-bit      31 × 32-bit
Input/output            Memory mapped    Memory mapped
Compare and Branch in ARM
 Uses condition codes for result of an arithmetic/logical instruction
 Negative, zero, carry, overflow
 Compare instructions to set condition codes without keeping the result
 Each instruction can be conditional
 Top 4 bits of instruction word: condition value
 Can avoid branches over single instructions
Instruction Encoding
2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
 Evolution with backward compatibility
 8080 (1974): 8-bit microprocessor
• Accumulator, plus 3 index-register pairs
 8086 (1978): 16-bit extension to 8080
• Complex instruction set (CISC)
 8087 (1980): floating-point coprocessor
• Adds FP instructions and register stack
 80286 (1982): 24-bit addresses, MMU
• Segmented memory mapping and protection
 80386 (1985): 32-bit extension (now IA-32)
• Additional addressing modes and operations
• Paged memory mapping as well as segments
The Intel x86 ISA
 Further evolution…
 i486 (1989): pipelined, on-chip caches and FPU
• Compatible competitors: AMD, Cyrix, …
 Pentium (1993): superscalar, 64-bit datapath
• Later versions added MMX (Multi-Media eXtension) instructions
• The infamous FDIV bug
 Pentium Pro (1995), Pentium II (1997)
• New microarchitecture (see Colwell, The Pentium Chronicles)
 Pentium III (1999)
• Added SSE (Streaming SIMD Extensions) and associated registers
 Pentium 4 (2001)
• New microarchitecture
• Added SSE2 instructions
The Intel x86 ISA
 And further…
 AMD64 (2003): extended architecture to 64 bits
 EM64T – Extended Memory 64 Technology (2004)
• AMD64 adopted by Intel (with refinements)
• Added SSE3 instructions
 Intel Core (2006)
• Added SSE4 instructions, virtual machine support
 AMD64 (announced 2007): SSE5 instructions
• Intel declined to follow, instead…
 Advanced Vector Extension (announced 2008)
• Longer SSE registers, more instructions
 If Intel didn’t extend with compatibility, its competitors would!
 Technical elegance ≠ market success
Basic x86 Registers
Basic x86 Addressing Modes
 Two operands per instruction
Source/dest operand Second source operand
Register Register
Register Immediate
Register Memory
Memory Register
Memory Immediate

 Memory addressing modes


 Address in register
 Address = Rbase + displacement
 Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
 Address = Rbase + 2^scale × Rindex + displacement
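The most general of these addressing modes can be written as a one-line calculation. A minimal C sketch, with a hypothetical function name and 32-bit arithmetic assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Address = base + 2^scale * index + displacement, with scale in 0..3. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement)
{
    assert(scale <= 3);
    return base + (index << scale) + (uint32_t)displacement;
}
```

With scale = 2 the index is multiplied by 4, which suits an array of 32-bit words: `effective_address(0x1000, 3, 2, 8)` is 0x1000 + 12 + 8 = 0x1014. Setting the scale or displacement to 0 gives the simpler modes in the list.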
x86 Instruction Encoding
 Variable length encoding
 Postfix bytes specify
addressing mode
 Prefix bytes modify
operation
• Operand length,
repetition, locking, …
Implementing IA-32 (Seg:Off)
 Complex instruction set makes implementation difficult
 Hardware translates instructions to simpler microoperations
• Simple instructions: 1–1
• Complex instructions: 1–many
 Microengine similar to RISC
 Market share makes this economically viable
 Comparable performance to RISC
 Compilers avoid complex instructions
2.18 Real Stuff: ARM v8 (64-bit) Instructions
ARM v8 Instructions
 In moving to 64-bit, ARM did a complete overhaul
 ARM v8 resembles MIPS
 Changes from v7:
• No conditional execution field
• Immediate field is 12-bit constant
• Dropped load/store multiple
• PC is no longer a GPR
• GPR set expanded to 32
• Addressing modes work for all word sizes
• Divide instruction
• Branch if equal/branch if not equal instructions
2.19 Fallacies and Pitfalls
Fallacies
 Powerful instruction → higher performance
 Fewer instructions required
 But complex instructions are hard to implement
• May slow down all instructions, including simple ones
 Compilers are good at making fast code from simple instructions
 Use assembly code for high performance
 But modern compilers are better at dealing with modern processors
 More lines of code → more errors and less productivity
Fallacies
 Backward compatibility → instruction set doesn’t change
 But they do accrete more instructions

x86 instruction set


Pitfalls
 Sequential words are not at sequential addresses
 Increment by 4, not by 1!
 Keeping a pointer to an automatic variable after procedure returns
 e.g., passing pointer back via an argument
 Pointer becomes invalid when stack popped
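The first pitfall can be seen directly from C by measuring the distance in bytes between array elements. This is a small sketch with a hypothetical function name; it assumes 32-bit words via `int32_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes between element i and element j of a word array. */
ptrdiff_t byte_distance(const int32_t *array, size_t i, size_t j)
{
    return (const char *)&array[j] - (const char *)&array[i];
}
```

Consecutive words are 4 bytes apart, so assembly code that steps through a word array must add 4 to the address, not 1.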
2.20 Concluding Remarks
Concluding Remarks
 Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
 Layers of software/hardware
 Compiler, assembler, hardware
 MIPS: typical of RISC ISAs
 c.f. x86
Concluding Remarks
 Measure MIPS instruction executions in benchmark
programs
 Consider making the common case fast
 Consider compromises

Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
Arithmetic          add, sub, addi                       16%            48%
Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui    35%            36%
Logical             and, or, nor, andi, ori, sll, srl    12%            4%
Cond. Branch        beq, bne, slt, slti, sltiu           34%            8%
Jump                j, jr, jal                           2%             0%
Acknowledgements
 These slides contain material developed and copyright by:
 Lecture slides by Dr. Tanzima Hashem, Professor, CSE, BUET
 Lecture slides by Mehnaz Tabassum Mahin, Assistant Professor, CSE, BUET
Thank You