Chapter 2

The document outlines the fundamentals of computer architecture, focusing on the MIPS instruction set architecture, its design principles, and the processes involved in executing procedures. It details the structure of instructions, data storage, and the handling of registers, including how to manage memory and perform operations with constants. Additionally, it explains the steps involved in procedure calls and the management of stack memory during execution.

Uploaded by

শতক দে

CSE 305

Computer Architecture

Instructions
Prepared by
Madhusudan Basak
Assistant Professor
CSE, BUET
*Some modifications done by Saem Hasan
Instructions
 A computer is run by instructions
 Instruction Set
 All possible instructions for a CPU
Instruction Set Architecture
 The instruction set architecture is the set of attributes of a computing system as seen by
the assembly language programmer or compiler.
 Instruction Set (what operations can be performed?)

 Instruction Format (how are instructions specified?)

 Data storage (where is data located?)

 Addressing Modes (how is data accessed?)

 Exceptional Conditions (what happens if something goes wrong?)

Design Philosophy

MIPS Architecture
 Acronym of “Microprocessor without Interlocked Pipelined Stages”
 Is a RISC (Reduced Instruction Set Computer) ISA (Instruction Set Architecture)
 Developed by MIPS Technologies (then MIPS Computer Systems) in 1985
Design Principles
 Design Principle 1: Simplicity favors regularity.

 Design Principle 2: Smaller is faster.

 Design Principle 3: Good design demands good compromises.

Simplicity favors regularity
 The instructions of MIPS are fixed and rigid
 Rigidity ensures Regularity
 Simplicity favors regularity
Example Instruction
 and instructions always follow the following fixed format
MIPS Instruction Properties
 Arithmetic operations take only register values as operands
 Size of a register: 32 bits
 Number of registers: 32
 Satisfies Design Principle 2: Smaller is faster.
• Large number of registers -> longer time for electronic signals to travel -> longer
clock cycle
• Large number of registers -> increased number of control bits

 A number is dedicated to each register.

 5 bits are needed to identify a register when the count is 32 (2^5 = 32)
 The compiler maps program variables to registers
MIPS Instruction Properties
 Example
MIPS Instruction Properties
 What about data structures (Arrays and Structures)?
 These are kept in memory and a particular element is transferred to registers
whenever required
 Compiler
 places an array/structure into memory
 keeps the starting address of the memory in a register
 tracks which register is used for which array/structure

 Data Transfer Instructions

 load word (lw): copies a word from memory into a register
 store word (sw): copies a word from a register into memory
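To make the base-plus-offset idea concrete, here is a minimal Python sketch of how lw and sw locate a word in memory. The register contents, the base address 1000, the array values, and the helper names `lw` and `sw` are all assumptions chosen for illustration; they do not come from the slides.

```python
# Minimal model of MIPS word transfers: address = base register + constant offset.
# All values here are illustrative assumptions, not taken from the slides.
regs = {"$s3": 1000, "$t0": 0, "$s2": 5}            # $s3 holds the base address of array A
memory = {1000 + 4 * i: 10 * i for i in range(16)}  # A[i] = 10*i, one word = 4 bytes

def lw(rt, offset, rs):
    """lw rt, offset(rs): copy the word at regs[rs] + offset into rt."""
    regs[rt] = memory[regs[rs] + offset]

def sw(rt, offset, rs):
    """sw rt, offset(rs): copy rt into the word at regs[rs] + offset."""
    memory[regs[rs] + offset] = regs[rt]

# A[12] = $s2 + A[8]
lw("$t0", 32, "$s3")                     # A[8] lives at byte offset 8 * 4 = 32
regs["$t0"] = regs["$s2"] + regs["$t0"]  # add $t0, $s2, $t0
sw("$t0", 48, "$s3")                     # A[12] lives at byte offset 12 * 4 = 48
print(memory[1048])                      # 85
```

The offsets are multiples of 4 because each array element occupies one 4-byte word.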
Data Transfer Instructions
 Example

Spilling Registers
 What if the number of variables and data structures is more than 32?
 The process of putting less commonly used variables into memory is called
spilling registers.
 The compiler keeps the most frequently used variables in registers and the less frequently used ones in
memory
 It moves variables between registers and memory as needed
Operation with Constants
 Add immediate (addi) instruction
 Takes one register and one constant as input

 The constant 0 is used very often (e.g., for transfer operations)

 A register ($zero) is dedicated to this purpose
add $s2, $s3, $zero   # $s2 = $s3

 Is an instruction represented using register names?

Representing Instructions in the Computer
 Key principles of today’s computers
 Instructions are represented as numbers
 Programs are stored in memory to be read or written, just like data
Representing Instructions in the Computer
 An instruction is a sequence of bits
 Each register is mapped to a number
 There is a convention to map register names into numbers (e.g., registers $s0 to
$s7 map onto registers 16 to 23)

 Each instruction is 32 bits long (the size of a word)

 Satisfies Design Principle 1: Simplicity favors regularity.

 Instruction Format: format used by an instruction

A MIPS Instruction Format
 R-Format
 Deals with arithmetic operations on registers
 Fields, from left to right (32 bits in total):

 op (6 bits): shorthand for opcode; denotes the operation type and format type

 rs (5 bits): source register 1
 rt (5 bits): source register 2
 rd (5 bits): destination register
 shamt (5 bits): shorthand for shift amount
 funct (6 bits): shorthand for function code; denotes the specific variant of an operation
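The field layout above can be checked with a short Python sketch that packs the six R-format fields into one 32-bit word. The helper name `encode_r` and the `REG` table are assumptions for illustration; the register numbers follow the usual MIPS convention and funct 32 is the standard code for add.

```python
# Register numbers follow the usual MIPS convention ($t0-$t7 -> 8-15, $s0-$s7 -> 16-23).
REG = {"$zero": 0}
REG.update({f"$t{i}": 8 + i for i in range(8)})
REG.update({f"$s{i}": 16 + i for i in range(8)})

def encode_r(funct, rd, rs, rt, shamt=0, op=0):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into one 32-bit word."""
    return ((op << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (shamt << 6) | funct)

word = encode_r(funct=32, rd="$t0", rs="$s1", rt="$s2")  # add $t0, $s1, $s2
print(f"{word:032b}")  # 00000010001100100100000000100000
print(hex(word))       # 0x2324020
```

Note that the destination register rd comes first in assembly but sits in the fourth field of the encoded word.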
A MIPS Instruction Format

Would it be good to use just one format for all kinds of operations?

No!
Design Principle 3: Good design demands good compromises.
Another MIPS Instruction Format
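 I-Format
 Deals with immediate and data transfer operations
 Fields: op (6 bits), rs (5 bits), rt (5 bits), constant or address (16 bits)

 Here rt represents the destination register (for a load or an immediate operation)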
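As a rough illustration of the I-format, the Python sketch below packs an opcode, two register numbers, and a 16-bit constant into a word. The helper name `encode_i` is an assumption; opcodes 35 (lw) and 8 (addi) and the register numbers 19 ($s3) and 8 ($t0) follow the standard MIPS encoding.

```python
def encode_i(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5) and a 16-bit immediate into a 32-bit word."""
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# lw $t0, 32($s3): op = 35, rs = $s3 = 19 (base), rt = $t0 = 8 (destination)
lw_word = encode_i(35, 19, 8, 32)
# addi $s3, $s3, -1: op = 8; the negative constant is stored in two's complement
addi_word = encode_i(8, 19, 19, -1)
print(hex(lw_word), hex(addi_word))  # 0x8e680020 0x2273ffff
```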

MIPS Instruction Encoding (Simplified)
Summary
Logical Operations

No NOT Operation?
Shift Left
 sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
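Two points from these slides can be sketched in Python: a left shift by n bits multiplies by 2^n, and a NOT can be obtained from NOR by using zero as the second operand. The helper names `sll` and `nor` and the sample value 9 are assumptions for illustration.

```python
MASK = 0xFFFFFFFF  # keep results within a 32-bit register

def sll(value, shamt):
    """Shift left logical: bits shifted out of the 32-bit register are lost."""
    return (value << shamt) & MASK

def nor(a, b):
    """Bitwise NOR, the MIPS instruction that can stand in for NOT."""
    return ~(a | b) & MASK

s0 = 9
print(sll(s0, 4))        # 144, i.e. 9 * 2**4
print(hex(nor(s0, 0)))   # 0xfffffff6, the bitwise NOT of 9
```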
Branching Operations

if (i == j)
    f = g + h;
else
    f = g - h;

Why not blt?

Branching Operations
Memory Allocation for Program and Data
 Reserved Memory
 Text Segment: Program Code
 Static Data Segment:
 Contains static variables, constants, arrays
 Dynamic Data Segment (Heap):
 Contains linked lists, variable length data
 Stack
 Contains function local variables, parameters etc.
Stack During Procedure Call
 Stores/saves the values of registers and Restores those later
 Either $sp and $fp combination, or only $sp is used
 The stack area allocated by a procedure is known as its activation record or
procedure frame (the space allocated for a function call = activation record)
Procedures in MIPS
 Six steps during a procedure call
1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from
several points in a program.

 Place for parameters:

 $a0–$a3: four argument registers in which to pass parameters
 Memory (e.g., the stack) can be used if more values need to be passed

 Transfer Control:
 Jump-and-link instruction (jal)

 Saves the return address (the address of the instruction after the jal) in $ra and jumps to the procedure's address
 Acquiring the storage resources

 Different register values (manipulated by caller) are stored in the stack
 Prepares the registers for operations
 Can use the memory for data storage

 Performing Tasks
 Can perform all the operations allowed by the MIPS instruction set

 Result storing by the callee

 $v0–$v1: two value registers in which to return values
 Memory (e.g., the stack) can be used if more values need to be returned
 Procedure returned
 Stack is adjusted using $sp and $fp pointers
 An unconditional jump to the address from where the caller will resume execution
An Example
non_leaf:
        ...
        # Step 1: put parameters in a place where the procedure can access them
        add  $a0, $s1, $zero
        add  $a1, $s2, $zero
        add  $a2, $s3, $zero
        add  $a3, $s4, $zero
        # Step 2: transfer control to the procedure
        jal  leaf
        add  $s1, $zero, $v0
        ...
        # Step 3: acquire the storage resources needed for the procedure
leaf:   addi $sp, $sp, -12      # adjust stack to make room for 3 items
        sw   $t1, 8($sp)        # save register $t1 for use afterwards
        sw   $t0, 4($sp)        # save register $t0 for use afterwards
        sw   $s0, 0($sp)        # save register $s0 for use afterwards
        # Step 4: perform the desired task
        add  $t0, $a0, $a1      # register $t0 contains g + h
        add  $t1, $a2, $a3      # register $t1 contains i + j
        sub  $s0, $t0, $t1      # f = $t0 - $t1, which is (g + h) - (i + j)
        # Step 5: put the result value in a place where the calling program can access it
        add  $v0, $s0, $zero    # set return value
        lw   $s0, 0($sp)        # restore register $s0 for caller
        lw   $t0, 4($sp)        # restore register $t0 for caller
        lw   $t1, 8($sp)        # restore register $t1 for caller
        addi $sp, $sp, 12       # adjust stack to delete 3 items
        # Step 6: return control to the point of origin, since a procedure can be called
        # from several points in a program
        jr   $ra                # jump back to calling routine
Nested Call
fact:   addi $sp, $sp, -8       # adjust stack for 2 items
        sw   $ra, 4($sp)        # save the return address
        sw   $a0, 0($sp)        # save the argument n
        slti $t0, $a0, 1        # test for n < 1
        beq  $t0, $zero, L1     # if n >= 1, go to L1
        addi $v0, $zero, 1      # set the return value to 1
        addi $sp, $sp, 8        # pop 2 items off stack
        jr   $ra                # return to caller
L1:     addi $a0, $a0, -1       # n >= 1: argument gets (n - 1)
        jal  fact               # call fact with (n - 1)
        lw   $a0, 0($sp)        # return from jal: restore argument n
        lw   $ra, 4($sp)        # restore the return address
        addi $sp, $sp, 8        # adjust stack pointer to pop 2 items
        mul  $v0, $a0, $v0      # return n * fact(n - 1)
        jr   $ra                # return to the caller
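The way the nested calls grow and shrink the stack can be mimicked in Python with an explicit list in place of $sp. This is only a sketch: the function name `fact`, the tuple used as a frame, and the returned `depth` value are assumptions for illustration, and the comments map each step loosely onto the MIPS instructions above.

```python
def fact(n):
    """Factorial with an explicit stack that mirrors the frames pushed by the MIPS code."""
    stack = []                       # each entry plays the role of one 2-word frame
    while True:
        stack.append(("$ra", n))     # addi $sp,$sp,-8 ; sw $ra ; sw $a0
        if n < 1:                    # slti / beq: base case when n < 1
            stack.pop()              # addi $sp,$sp,8
            v0 = 1                   # addi $v0,$zero,1
            break
        n -= 1                       # addi $a0,$a0,-1 ; jal fact
    depth = len(stack)               # frames still waiting for their callee to return
    while stack:                     # each return unwinds one frame
        _, saved_n = stack.pop()     # lw $a0 ; lw $ra ; addi $sp,$sp,8
        v0 = saved_n * v0            # mul $v0,$a0,$v0
    return v0, depth

print(fact(5))   # (120, 5)
```

The saved argument matters because $a0 is overwritten before each nested call, so the caller's n would otherwise be lost.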
J-Format
 Supports long jumps to a remote procedure address

opcode (6 bits) | jump address (26 bits)

ASCII representation of characters
 ASCII stands for American Standard Code for Information Interchange
 Uses 1 byte to represent a character
1 Byte Memory Operation
 ASCII demands 1-byte memory operation
 MIPS supports
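lb $t0,0($sp) # Reads 1 byte from memory, stores it in the lowest (rightmost) byte of $t0, and sign-extends it into the upper 24 bits
sb $t0,0($gp) # Reads 1 byte from the lowest (rightmost) byte of $t0 and stores it in memory

Unsigned version:
Load byte unsigned: lbu (fills the upper 24 bits with zeros instead of sign-extending)
Store byte: no unsigned version in MIPS (sb stores the same 8 bits either way)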
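The difference between the signed and unsigned byte loads shows up only when the top bit of the byte is 1. A minimal Python sketch, with the helper names `lb` and `lbu` standing in for the instructions and register contents shown as unsigned 32-bit patterns:

```python
def lb(byte):
    """Load byte: sign-extend an 8-bit value to 32 bits (returned as an unsigned pattern)."""
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

def lbu(byte):
    """Load byte unsigned: zero-extend, so the upper 24 bits are always 0."""
    return byte & 0xFF

print(hex(lb(0x41)), hex(lbu(0x41)))   # 0x41 0x41  ('A' loads identically)
print(hex(lb(0xF0)), hex(lbu(0xF0)))   # 0xfffffff0 0xf0
```

ASCII characters fit in 7 bits, so both loads give the same result for them; this is why the string example uses lbu without any loss.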
Dealing with Strings
 Three commonly available strategies
 the first position of the string is reserved to give the length of a string
 an accompanying variable has the length of the string
 the last position of a string is indicated by a character used to mark the end of a
string
String Copy Example
 MIPS code:
strcpy:
addi $sp, $sp, -4 # adjust stack for 1 item
sw $s0, 0($sp) # save $s0
add $s0, $zero, $zero # i = 0
L1: add $t1, $s0, $a1 # addr of y[i] in $t1
lbu $t2, 0($t1) # $t2 = y[i]
add $t3, $s0, $a0 # addr of x[i] in $t3
sb $t2, 0($t3) # x[i] = y[i]
beq $t2, $zero, L2 # exit loop if y[i] == 0
addi $s0, $s0, 1 # i = i + 1
j L1 # next iteration of loop
L2: lw $s0, 0($sp) # restore saved $s0
addi $sp, $sp, 4 # pop 1 item from stack
jr $ra # and return
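For comparison, the same loop can be sketched in Python over a flat byte array that stands in for memory. The function name `strcpy`, the addresses 0 and 16, and the sample string are assumptions for illustration; the comments point back to the matching MIPS instructions.

```python
def strcpy(memory, x, y):
    """Copy the null-terminated string at address y to address x, one byte at a time."""
    i = 0                              # add $s0,$zero,$zero
    while True:
        byte = memory[y + i]           # lbu $t2, 0($t1)
        memory[x + i] = byte           # sb  $t2, 0($t3)
        if byte == 0:                  # beq $t2,$zero,L2: the terminator is copied too
            return i
        i += 1                         # addi $s0,$s0,1

memory = bytearray(32)
memory[16:22] = b"MIPS!\x00"           # source string y at address 16 (assumed layout)
copied = strcpy(memory, 0, 16)         # destination x at address 0
print(copied, bytes(memory[0:6]))      # 5 b'MIPS!\x00'
```

The index advances by 1, not 4, because characters are bytes rather than words.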

Chapter 2 — Instructions: Language of the Computer — 46

Dealing with Multiple Languages
 Unicode, a universal encoding, supports alphabets of most human languages
 Java uses Unicode
 Requires 16 bits to represent a character
 Operations are required to load and store 16 bits (halfwords)
 Load halfword: lh
 Load halfword unsigned: lhu
 Store halfword: sh
Constant Size: Is larger feasible?
 The I-format is used by both immediate and memory data transfer operations

 What about constants longer than 16 bits?

 We have two options in hand
• Support short constants in 1 instruction and handle long constants with additional instructions
• Support long constants in 2 instructions
 We opted for the first option (to make the common case fast)
 Are we limited to 16-bit constants (-32,768 to 32,767)?
 No: use two instructions to populate a register with a 32-bit constant
Manipulating Larger Constant
 Populating a register with a 32-bit constant
 Load Upper Immediate (lui): loads the upper 16 bits
 Or Immediate (ori): loads the lower 16 bits
 No sign extension for logical operations (e.g., ori), but it happens for
arithmetic operations (e.g., addi)
 Example: x = y + 4000000
 4000000 = 0000 0000 0011 1101 0000 1001 0000 0000
lui $s0, 61          # 61 = 0000 0000 0011 1101
                     # $s0 = 0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304   # 2304 = 0000 1001 0000 0000, zero-extended to 0000 0000 0000 0000 0000 1001 0000 0000
                     # $s0 = 0000 0000 0011 1101 0000 1001 0000 0000
add $s1, $t0, $s0    # x = y + 4000000 (y in $t0, x in $s1)
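The split of 4000000 into 61 and 2304 can be verified with a short Python sketch. The helper names `split_constant`, `lui`, and `ori` are assumptions for illustration; they model only the bit manipulation, not the full instructions.

```python
def split_constant(value):
    """Return the (upper, lower) 16-bit halves used by lui and ori."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def lui(imm):
    """lui: place imm in the upper 16 bits and clear the lower 16 bits."""
    return (imm << 16) & 0xFFFFFFFF

def ori(reg, imm):
    """ori: OR with the zero-extended 16-bit immediate (no sign extension)."""
    return reg | (imm & 0xFFFF)

upper, lower = split_constant(4000000)
s0 = ori(lui(upper), lower)
print(upper, lower, s0)   # 61 2304 4000000
```

Using ori rather than addi for the lower half matters: addi would sign-extend a lower half whose top bit is 1 and corrupt the upper 16 bits.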

 Think how to use 32 bit address to access data from memory

Addressing in Branches
 The PC is 32 bits, but the address field in conditional branches (if-else, loops, etc.) is 16 bits

 Used directly, it would limit a program to 2^15 instructions (2^17 bytes = 128 KB)

 Most conditional branches jump nearby (whether forward or backward)
 Use PC-relative addressing to reach a forward (PC + relative constant address) or
backward (PC - relative constant address) location
 Can branch up to about ±2^15 words from the current instruction
 The PC normally increments by 1 word (4 bytes) after every instruction
 So the addressing is (PC + 4 + relative constant address) for forward access
and (PC + 4 - relative constant address) for backward access, with the constant counted in words
Addressing in Jumps
 The PC is 32 bits, but the address field in jumps is 26 bits

opcode (6 bits) | jump address (26 bits)

 Limits a program to 2^26 instructions (2^28 bytes = 256 MB)

 The address field, shifted left by 2, replaces the 28 rightmost bits of the PC

 Known as pseudodirect addressing
 A program is not placed across an address boundary of 256 MB
 Otherwise, a jump must be replaced by a jump register instruction preceded
by other instructions to load the full 32-bit address into a register
Food for Thought
 Can you reverse engineer a binary?
 Is your IP secured?
Jump Addressing
 Jump (j and jal) targets could be anywhere in text
segment
 Encode full address in instruction

op address
6 bits 26 bits

 (Pseudo)Direct jump addressing

 Target address = PC[31…28] : (address × 4)

Target Addressing Example
 Loop code from earlier example
 Assume Loop at location 80000

Label   Instruction            Address   Encoded fields
Loop:   sll  $t1, $s3, 2       80000     0 | 0 | 19 | 9 | 4 | 0
        add  $t1, $t1, $s6     80004     0 | 9 | 22 | 9 | 0 | 32
        lw   $t0, 0($t1)       80008     35 | 9 | 8 | 0
        bne  $t0, $s5, Exit    80012     5 | 8 | 21 | 2
        addi $s3, $s3, 1       80016     8 | 19 | 19 | 1
        j    Loop              80020     2 | 20000
Exit:   …                      80024
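The two address fields in this example (2 in the bne and 20000 in the j) can be turned back into byte addresses with a small Python sketch. The function names `branch_target` and `jump_target` are assumptions for illustration; the second one takes the upper 4 bits from PC + 4, which for this example is the same as taking them from the PC.

```python
def branch_target(pc, offset_words):
    """PC-relative: target = (PC + 4) + offset * 4, with a signed 16-bit word offset."""
    return pc + 4 + offset_words * 4

def jump_target(pc, address_field):
    """Pseudodirect: upper 4 bits of PC + 4 joined with the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | (address_field << 2)

print(branch_target(80012, 2))     # 80024, the address of Exit
print(jump_target(80020, 20000))   # 80000, the address of Loop
```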
Branching Far Away
 If branch target is too far to encode with 16-bit offset, assembler rewrites the
code
 Example
beq $s0,$s1, L1
↓
bne $s0,$s1, L2
j L1
L2: …
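The decision the assembler makes here can be sketched in Python: compute the word offset from PC + 4, and fall back to the inverted branch plus jump when it does not fit in a signed 16-bit field. The function name `assemble_beq`, the addresses, and the fixed label name L2 are assumptions for illustration, not the behaviour of any particular assembler.

```python
def assemble_beq(rs, rt, label, pc, target):
    """Return the instruction(s) an assembler might emit for 'beq rs, rt, label'."""
    offset = (target - (pc + 4)) // 4          # distance in words from PC + 4
    if -(1 << 15) <= offset < (1 << 15):       # fits the signed 16-bit field
        return [f"beq {rs}, {rt}, {label}"]
    # Too far: invert the condition and use a jump, which has a 26-bit field.
    return [f"bne {rs}, {rt}, L2", f"j {label}", "L2:"]

near = assemble_beq("$s0", "$s1", "L1", pc=0x1000, target=0x1040)
far = assemble_beq("$s0", "$s1", "L1", pc=0x1000, target=0x100000)
print(near)   # ['beq $s0, $s1, L1']
print(far)    # ['bne $s0, $s1, L2', 'j L1', 'L2:']
```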
Addressing Mode Summary
When two or more processors share a memory area, they may try to read/write at the same time.

2.11 Parallelism and Instructions: Synchronization

Synchronization
 Two processors sharing an area of memory
 P1 writes, then P2 reads
 Data race if P1 and P2 don’t synchronize
• Result depends on the order of accesses
 Hardware support required
 Atomic read/write memory operation
 No other access to the location allowed between the read and write
 Could be a single instruction
 E.g., atomic swap of register ↔ memory: a register and a memory value are swapped atomically (no other
processor can interfere)

 Or an atomic pair of instructions

 Locking mechanisms can be built from these using spinlocks, where a processor waits until the
resource is available.
Synchronization in MIPS
 Load linked: ll rt, offset(rs)
 Store conditional: sc rt, offset(rs)
 Succeeds if location not changed since the ll
• Returns 1 in rt
 Fails if location is changed
• Returns 0 in rt
 Example: atomic swap (to test/set lock variable)
try: add $t0,$zero,$s4   # copy exchange value
     ll  $t1,0($s1)      # load linked
     sc  $t0,0($s1)      # store conditional
     beq $t0,$zero,try   # branch if store fails
     add $s4,$zero,$t1   # put loaded value in $s4
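The retry behaviour of the ll/sc pair can be imitated with a toy Python model. Everything here is an assumption made for illustration: the `Memory` class, its single `linked` flag, and the `interfere` callback that plays the role of a second processor. Real hardware tracks the link per processor and per address.

```python
class Memory:
    """Toy single-word memory with a link flag, loosely modelling ll/sc."""
    def __init__(self, value):
        self.value = value
        self.linked = False

    def ll(self):
        self.linked = True            # remember that a linked load happened
        return self.value

    def write(self, value):
        self.value = value            # a store by another processor...
        self.linked = False           # ...breaks the link

    def sc(self, value):
        if not self.linked:
            return 0                  # fail: location changed since the ll
        self.value, self.linked = value, False
        return 1                      # success

def atomic_swap(mem, new_value, interfere=None):
    """Retry loop corresponding to the 'try:' label; returns (old value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.ll()
        if interfere and attempts == 1:
            interfere(mem)            # simulate another processor writing in between
        if mem.sc(new_value):
            return old, attempts

lock = Memory(0)
first = atomic_swap(lock, 1)
second = atomic_swap(lock, 2, lambda m: m.write(7))
print(first, second)   # (0, 1) (7, 2)
```

In the second call the first attempt fails because of the intervening write, so the loop reloads and returns the value written by the other party.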
2.12 Translating and Starting a Program
Translation and Startup

Figure annotations: many compilers produce object modules directly; static linking.
Assembler Pseudoinstructions
 Most assembler instructions represent machine instructions one-to-one
 Pseudoinstructions: figments of the assembler’s imagination
move $t0, $t1 → add $t0, $zero, $t1
blt $t0, $t1, L → slt $at, $t0, $t1
bne $at, $zero, L
 $at (register 1): assembler temporary
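The two expansions above can be expressed as a tiny Python rewriting function. The name `expand` and the simple comma-and-space parsing are assumptions for illustration; a real assembler handles many more pseudoinstructions and operand forms.

```python
def expand(instr):
    """Expand the two pseudoinstructions from the slide; pass anything else through."""
    op, *args = instr.replace(",", " ").split()
    if op == "move":
        rd, rs = args
        return [f"add {rd}, $zero, {rs}"]
    if op == "blt":
        rs, rt, label = args
        return [f"slt $at, {rs}, {rt}", f"bne $at, $zero, {label}"]
    return [instr]

print(expand("move $t0, $t1"))     # ['add $t0, $zero, $t1']
print(expand("blt $t0, $t1, L"))   # ['slt $at, $t0, $t1', 'bne $at, $zero, L']
```

The blt expansion needs a scratch register for the comparison result, which is why $at is reserved for the assembler.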
Producing an Object Module
 Assembler (or compiler) translates program into machine instructions
 Provides information for building a complete program from the pieces
 Header: describes the contents of the object module
 Text segment: translated instructions
 Static data segment: data allocated for the life of the program
 Relocation info: for contents that depend on absolute location of loaded
program
 Symbol table: global definitions and external refs
 Debug info: for associating with source code
Linking Object Modules
 Produces an executable image (multiple object modules -> one unified executable file)

1. Merges segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs
 Could leave location dependencies for fixing by a relocating loader
 But with virtual memory, no need to do this
 Program can be loaded into absolute location in virtual memory space

Since each process has its own virtual address space, there is no need for relocation during loading.
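The three linking steps can be sketched in Python on two made-up modules. This is a deliberately simplified model: the dictionary layout, the module sizes and offsets, the function name `link`, and the text base address 0x00400000 (the conventional MIPS text segment start) are all assumptions for illustration, and only text-segment symbols are handled.

```python
def link(modules, text_base=0x00400000):
    """Merge text segments, resolve labels, and patch external references."""
    symbols, base = {}, text_base
    for m in modules:                               # 1. merge segments
        m["base"] = base
        for name, offset in m["defs"].items():      # 2. resolve labels
            symbols[name] = base + offset
        base += m["text_size"]
    patches = {}
    for m in modules:                               # 3. patch external references
        for offset, name in m["refs"].items():
            patches[m["base"] + offset] = symbols[name]
    return symbols, patches

# Hypothetical modules: A calls B and B calls A (sizes and offsets are made up).
mod_a = {"text_size": 0x100, "defs": {"A": 0x0}, "refs": {0x4: "B"}}
mod_b = {"text_size": 0x200, "defs": {"B": 0x0}, "refs": {0x8: "A"}}
symbols, patches = link([mod_a, mod_b])
print({k: hex(v) for k, v in symbols.items()})   # {'A': '0x400000', 'B': '0x400100'}
print({hex(k): hex(v) for k, v in patches.items()})
```

Each entry in `patches` maps the address of an instruction that needs fixing to the final address of the symbol it refers to.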
 Text size = size of the instructions; data size = size of the variables (100 hex = 256 bytes)

Object 1
 Contains two instructions here: lw and jal
 X is a global/external variable, as it is in the data segment
 lw has a dependency on X (X is a variable)
 jal has a dependency on B (B must be an external procedure)
 The symbol table stores global variables, ordinary variables, and external procedures

Object 2
 Y must be a global/external variable
 A is an external function/procedure

Merged Object
Loading a Program
 Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
• Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including $sp, $fp, $gp)
6. Jump to startup routine
• Copies arguments to $a0, … and calls main
• When main returns, do exit syscall
Dynamic Linking
 Only link/load library procedure when it is called
 Requires procedure code to be relocatable
 Avoids image bloat caused by static linking of all (transitively) referenced libraries
 Automatically picks up new library versions
Dynamically Linked Libraries (DLL)
 Library routines that are linked to a program during execution
 To incorporate the updated version of library routines
 In lazy procedure linkage, each routine is linked only when it is called
 Provides a level of indirection
 Calls to nonlocal routines go through a set of dummy entries at the end of the program, with
one entry per nonlocal routine. These dummy entries each contain an indirect
jump
 The dynamic linker/loader finds the desired routine, remaps it, and changes
the address in the indirect jump location to point to that routine
 From then on, calls to the library routine jump indirectly to the routine without the
extra hops
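The indirection described above can be imitated in Python with a table of callables that is patched on first use. All names here (`table`, `library`, `make_stub`, `dynamic_linker`, `real_sqrt`) are hypothetical and exist only for this sketch; it illustrates the control flow, not how any real loader is implemented.

```python
calls_to_linker = 0

def real_sqrt(x):
    """Stands in for a routine living in a dynamically linked library."""
    return x ** 0.5

def dynamic_linker(name):
    """Find the routine and patch the indirection table so later calls skip the stub."""
    global calls_to_linker
    calls_to_linker += 1
    table[name] = library[name]
    return table[name]

def make_stub(name):
    """Dummy entry: on first use, hand over to the linker, then call the real routine."""
    return lambda *args: dynamic_linker(name)(*args)

library = {"sqrt": real_sqrt}
table = {"sqrt": make_stub("sqrt")}     # one indirection-table entry per nonlocal routine

print(table["sqrt"](9.0), table["sqrt"](16.0), calls_to_linker)   # 3.0 4.0 1
```

The linker runs only once: after the first call the table entry points straight at the real routine.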
Lazy Linkage

Figure labels: indirection table; stub (loads routine ID, jumps to linker/loader); linker/loader code; dynamically mapped code.
Starting a Java Program
 Java programs gain portability at the cost of some performance
 Compiled first into an easy-to-interpret instruction set: Java bytecode
 A software interpreter, called the Java Virtual Machine (JVM), can execute Java bytecode
 This process is slow
 A Just In Time (JIT) compiler makes it faster
 Uses profiling statistics to identify the commonly used (hot) methods
 Compiles these methods into the native instruction set
Starting Java Applications

Figure labels: simple portable instruction set for the JVM; the interpreter interprets bytecodes; the JIT compiles bytecodes of “hot” methods into native code for the host machine.
C Sort Example

§2.13 A C Sort Example to Put It All Together

 Illustrates use of assembly instructions for a C
bubble sort function
 Swap procedure (leaf)
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
 v in $a0, k in $a1, temp in $t0
The Procedure Swap
swap: sll $t1, $a1, 2 # $t1 = k * 4
add $t1, $a0, $t1 # $t1 = v+(k*4)
# (address of v[k])
lw $t0, 0($t1) # $t0 (temp) = v[k]
lw $t2, 4($t1) # $t2 = v[k+1]
sw $t2, 0($t1) # v[k] = $t2 (v[k+1])
sw $t0, 4($t1) # v[k+1] = $t0 (temp)
jr $ra # return to calling routine

The Sort Procedure in C
 Non-leaf (calls swap)
void sort (int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i += 1) {
        for (j = i - 1;
             j >= 0 && v[j] > v[j + 1];
             j -= 1) {
            swap(v, j);
        }
    }
}
 v in $a0, n in $a1, i in $s0, j in $s1

The Procedure Body
         # Move params
         move $s2, $a0            # save $a0 into $s2
         move $s3, $a1            # save $a1 into $s3
         # Outer loop
         move $s0, $zero          # i = 0
for1tst: slt  $t0, $s0, $s3       # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
         beq  $t0, $zero, exit1   # go to exit1 if $s0 ≥ $s3 (i ≥ n)
         # Inner loop
         addi $s1, $s0, -1        # j = i - 1
for2tst: slti $t0, $s1, 0         # $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0, $zero, exit2   # go to exit2 if $s1 < 0 (j < 0)
         sll  $t1, $s1, 2         # $t1 = j * 4
         add  $t2, $s2, $t1       # $t2 = v + (j * 4)
         lw   $t3, 0($t2)         # $t3 = v[j]
         lw   $t4, 4($t2)         # $t4 = v[j + 1]
         slt  $t0, $t4, $t3       # $t0 = 0 if $t4 ≥ $t3
         beq  $t0, $zero, exit2   # go to exit2 if $t4 ≥ $t3
         # Pass params & call
         move $a0, $s2            # 1st param of swap is v (old $a0)
         move $a1, $s1            # 2nd param of swap is j
         jal  swap                # call swap procedure
         # Inner loop
         addi $s1, $s1, -1        # j -= 1
         j    for2tst             # jump to test of inner loop
         # Outer loop
exit2:   addi $s0, $s0, 1         # i += 1
         j    for1tst             # jump to test of outer loop
The Full Procedure
sort: addi $sp,$sp, -20 # make room on stack for 5 registers
sw $ra, 16($sp) # save $ra on stack
sw $s3,12($sp) # save $s3 on stack
sw $s2, 8($sp) # save $s2 on stack
sw $s1, 4($sp) # save $s1 on stack
sw $s0, 0($sp) # save $s0 on stack
… # procedure body
…
exit1: lw $s0, 0($sp) # restore $s0 from stack
lw $s1, 4($sp) # restore $s1 from stack
lw $s2, 8($sp) # restore $s2 from stack
lw $s3,12($sp) # restore $s3 from stack
lw $ra,16($sp) # restore $ra from stack
addi $sp,$sp, 20 # restore stack pointer
jr $ra # return to calling routine
Effect of Compiler Optimization

Compiled with gcc for Pentium 4 under Linux

(Figure: four bar charts over optimization levels none, O1, O2, O3: relative performance, instruction count, clock cycles, and CPI)
Effect of Language and Algorithm
(Figure: three bar charts over C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT: Bubblesort relative performance, Quicksort relative performance, and Quicksort vs. Bubblesort speedup)
Example: Clearing an Array
Array version:
clear1(int array[], int size) {
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}
       move $t0,$zero        # i = 0
loop1: sll  $t1,$t0,2        # $t1 = i * 4
       add  $t2,$a0,$t1      # $t2 = &array[i]
       sw   $zero, 0($t2)    # array[i] = 0
       addi $t0,$t0,1        # i = i + 1
       slt  $t3,$t0,$a1      # $t3 = (i < size)
       bne  $t3,$zero,loop1  # if (…) goto loop1
Pointer version:
clear2(int *array, int size) {
    int *p;
    for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}
       move $t0,$a0          # p = &array[0]
       sll  $t1,$a1,2        # $t1 = size * 4
       add  $t2,$a0,$t1      # $t2 = &array[size]
loop2: sw   $zero,0($t0)     # Memory[p] = 0
       addi $t0,$t0,4        # p = p + 4
       slt  $t3,$t0,$t2      # $t3 = (p < &array[size])
       bne  $t3,$zero,loop2  # if (…) goto loop2
Comparison of Array vs. Ptr
 Multiply “strength reduced” to shift
 Array version requires shift to be inside loop
 Part of index calculation for incremented i
 c.f. incrementing pointer
 Compiler can achieve same effect as manual use of pointers
 Induction variable elimination
 Better to make program clearer and safer
MIPS (RISC) Design Principles In Summary
 Simplicity favors regularity
 fixed size instructions
 small number of instruction formats
 opcode always the first 6 bits
 Smaller is faster
 limited instruction set
 limited number of registers
 limited number of addressing modes
 Good design demands good compromises
 Three instruction formats
 Make the common case fast
 arithmetic operands using the registers
 Saving the commonly used registers into stack
 PC-relative addressing
 allow instructions to contain immediate operands
Lessons Learnt
 Instruction count and CPI are not good performance indicators in isolation
 Compiler optimizations are sensitive to the algorithm
 Java/JIT compiled code is significantly faster than JVM interpreted
 Comparable to optimized C in some cases
 Nothing can fix a dumb algorithm!
Alternative Architectures
 Design alternative
 provide more powerful operations
 goal is to reduce number of instructions executed
 danger is a slower cycle time and/or a higher CPI
2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
 ARM: the most popular embedded core
 Similar basic set of instructions to MIPS
                        ARM             MIPS
Date announced          1985            1985
Instruction size        32 bits         32 bits
Address space           32-bit flat     32-bit flat
Data alignment          Aligned         Aligned
Data addressing modes   9               3
Registers               15 × 32-bit     31 × 32-bit
Input/output            Memory mapped   Memory mapped
Compare and Branch in ARM
 Uses condition codes for result of an arithmetic/logical instruction
 Negative, zero, carry, overflow
 Compare instructions to set condition codes without keeping the result
 Each instruction can be conditional
 Top 4 bits of instruction word: condition value
 Can avoid branches over single instructions
Instruction Encoding
2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
 Evolution with backward compatibility
 8080 (1974): 8-bit microprocessor
• Accumulator, plus 3 index-register pairs
 8086 (1978): 16-bit extension to 8080
• Complex instruction set (CISC)
 8087 (1980): floating-point coprocessor
• Adds FP instructions and register stack
 80286 (1982): 24-bit addresses, MMU
• Segmented memory mapping and protection
 80386 (1985): 32-bit extension (now IA-32)
• Additional addressing modes and operations
• Paged memory mapping as well as segments
The Intel x86 ISA
 Further evolution…
 i486 (1989): pipelined, on-chip caches and FPU
• Compatible competitors: AMD, Cyrix, …
 Pentium (1993): superscalar, 64-bit datapath
• Later versions added MMX (Multi-Media eXtension) instructions
• The infamous FDIV bug
 Pentium Pro (1995), Pentium II (1997)
• New microarchitecture (see Colwell, The Pentium Chronicles)
 Pentium III (1999)
• Added SSE (Streaming SIMD Extensions) and associated registers
 Pentium 4 (2001)
• New microarchitecture
• Added SSE2 instructions
The Intel x86 ISA
 And further…
 AMD64 (2003): extended architecture to 64 bits
 EM64T – Extended Memory 64 Technology (2004)
• AMD64 adopted by Intel (with refinements)
• Added SSE3 instructions
 Intel Core (2006)
• Added SSE4 instructions, virtual machine support
 AMD64 (announced 2007): SSE5 instructions
• Intel declined to follow, instead…
 Advanced Vector Extension (announced 2008)
• Longer SSE registers, more instructions
 If Intel didn’t extend with compatibility, its competitors would!
 Technical elegance ≠ market success
Basic x86 Registers
Basic x86 Addressing Modes
 Two operands per instruction
Source/dest operand    Second source operand
Register               Register
Register               Immediate
Register               Memory
Memory                 Register
Memory                 Immediate

 Memory addressing modes

 Address in register
 Address = Rbase + displacement
 Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
 Address = Rbase + 2^scale × Rindex + displacement
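The most general of these modes covers the others as special cases, which a short Python sketch makes visible. The function name `effective_address` and the sample values (an array at 0x1000 with 4-byte elements and a field offset of 8) are assumptions for illustration.

```python
def effective_address(base=0, index=0, scale=0, displacement=0):
    """Address = Rbase + 2**scale * Rindex + displacement, with scale in 0..3."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return base + (index << scale) + displacement

# Element 5 of an array of 4-byte ints at 0x1000, plus a field offset of 8
addr = effective_address(base=0x1000, index=5, scale=2, displacement=8)
print(hex(addr))   # 0x101c
```

Leaving out index and displacement gives the "address in register" mode; leaving out only the index gives base plus displacement.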
x86 Instruction Encoding
 Variable length encoding
 Postfix bytes specify addressing mode
 Prefix bytes modify operation
• Operand length, repetition, locking, …
Implementing IA-32 (Seg:Off)
 Complex instruction set makes implementation difficult
 Hardware translates instructions to simpler microoperations
• Simple instructions: 1–1
• Complex instructions: 1–many
 Microengine similar to RISC
 Market share makes this economically viable
 Comparable performance to RISC
 Compilers avoid complex instructions
2.18 Real Stuff: ARM v8 (64-bit) Instructions
ARM v8 Instructions
 In moving to 64-bit, ARM did a complete overhaul
 ARM v8 resembles MIPS
 Changes from v7:
• No conditional execution field
• Immediate field is 12-bit constant
• Dropped load/store multiple
• PC is no longer a GPR
• GPR set expanded to 32
• Addressing modes work for all word sizes
• Divide instruction
• Branch if equal/branch if not equal instructions
2.19 Fallacies and Pitfalls
Fallacies
 Powerful instruction ⇒ higher performance
 Fewer instructions required
 But complex instructions are hard to implement
• May slow down all instructions, including simple ones
 Compilers are good at making fast code from simple instructions
 Use assembly code for high performance
 But modern compilers are better at dealing with modern processors
 More lines of code ⇒ more errors and less productivity
Fallacies
 Backward compatibility ⇒ instruction set doesn’t change
 But they do accrete more instructions

[Figure: x86 instruction set growth]

Pitfalls
 Sequential words are not at sequential addresses
 Increment by 4, not by 1!
 Keeping a pointer to an automatic variable after procedure returns
 e.g., passing pointer back via an argument
 Pointer becomes invalid when stack popped
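The first pitfall above comes down to byte addressing: consecutive 32-bit words are 4 bytes apart. A minimal sketch of the arithmetic follows; the names `WORD_SIZE` and `word_address` and the base address used in the example are hypothetical.

```python
WORD_SIZE = 4  # bytes per 32-bit MIPS word


def word_address(base, i):
    """Byte address of word i in an array that starts at byte address base."""
    return base + WORD_SIZE * i


# Addresses of the first three words of an array placed at 0x10010000.
print([hex(word_address(0x10010000, i)) for i in range(3)])
```

Stepping the address by 1 instead of `WORD_SIZE` would land in the middle of a word rather than on the next one.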
2.20 Concluding Remarks
Concluding Remarks
 Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
 Layers of software/hardware
 Compiler, assembler, hardware
 MIPS: typical of RISC ISAs
 cf. x86
Concluding Remarks
 Measure MIPS instruction executions in benchmark programs
 Consider making the common case fast
 Consider compromises

Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
Arithmetic          add, sub, addi                       16%            48%
Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui    35%            36%
Logical             and, or, nor, andi, ori, sll, srl    12%            4%
Cond. Branch        beq, bne, slt, slti, sltiu           34%            8%
Jump                j, jr, jal                           2%             0%
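One way to read the table with "make the common case fast" in mind is to rank the instruction classes by frequency. The sketch below copies the percentages from the table; the names `SPEC2006_MIX` and `rank_classes` are hypothetical.

```python
# Instruction-class frequencies (percent), copied from the table above.
SPEC2006_MIX = {
    "Arithmetic":    {"int": 16, "fp": 48},
    "Data transfer": {"int": 35, "fp": 36},
    "Logical":       {"int": 12, "fp": 4},
    "Cond. Branch":  {"int": 34, "fp": 8},
    "Jump":          {"int": 2,  "fp": 0},
}


def rank_classes(suite):
    """Return instruction classes sorted from most to least frequent."""
    return sorted(SPEC2006_MIX, key=lambda c: SPEC2006_MIX[c][suite], reverse=True)


print(rank_classes("int"))
print(rank_classes("fp"))
```

For the integer programs, data transfer and conditional branches dominate, while arithmetic leads in the floating-point programs, so those are the classes where speedups would matter most.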
Acknowledgements
 These slides contain material developed and copyrighted by:
 Lecture slides by Dr. Tanzima Hashem, Professor, CSE, BUET
 Lecture slides by Mehnaz Tabassum Mahin, Assistant Professor, CSE, BUET
Thank You 

