Instruction Set Principles and Architectures: Computer Architecture Prof. Muhamed Mudawar
COE 403
Computer Architecture
Prof. Muhamed Mudawar
Computer Engineering Department
King Fahd University of Petroleum and Minerals
Instruction Set Architecture
Critical interface between software and hardware
Instruction Set Principles and Architectures COE 403 – Computer Architecture - KFUPM Muhamed Mudawar – slide 2
Evolution of Instruction Sets
Four classes of processors: Accumulator, Stack, Register-Memory, and Register-Register (Load/Store)
Code sequence for C = A + B in each class:
Class                           Code sequence
Accumulator                     Load [A] ; Add [B] ; Store [C]
Stack                           Push [A] ; Push [B] ; Add ; Pop [C]
Register-Memory                 Load R1, [A] ; Add R1, [B] ; Store R1, [C]
Register-Register (Load/Store)  Load R1, [A] ; Load R2, [B] ; Add R3, R1, R2 ; Store R3, [C]
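The code sequences for C = A + B can be traced with a few lines of Python. This is a minimal sketch and not part of the original slides: the function names, the tuple encoding of instructions, and the dictionary used as memory are all assumptions made for illustration. Three of the four classes are modeled.

```python
# Hypothetical mini-interpreters (names and encodings are assumptions).
# Memory is a dict that maps a variable name to its value.

def run_accumulator(program, mem):
    """Accumulator machine: every instruction uses one implicit register."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown op {op}")
    return mem

def run_stack(program, mem):
    """Stack machine: operands are implicit, taken from the top of stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "Push":
            stack.append(mem[instr[1]])
        elif op == "Add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            mem[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return mem

def run_load_store(program, mem):
    """Register-register machine: only Load and Store touch memory."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "Load":
            regs[instr[1]] = mem[instr[2]]
        elif op == "Add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "Store":
            mem[instr[2]] = regs[instr[1]]
        else:
            raise ValueError(f"unknown op {op}")
    return mem

ACC_PROG = [("Load", "A"), ("Add", "B"), ("Store", "C")]
STACK_PROG = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
LS_PROG = [("Load", "R1", "A"), ("Load", "R2", "B"),
           ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]
```

All three programs leave the same value in C; they differ only in where the operands are named, which is the point of the classification.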
Classifying Instruction Sets
Early Instruction Set Architectures
Accumulator-based or Stack-based
Replaced with General-Purpose Register (GPR) architectures
Addressing Modes (Commonly Used)
How instructions specify the addresses of their operands
Operands can be in registers, constants, or in memory
Mode        Example            Meaning                      When used
Register    ADD R1, R2, R3     R1 ← R2 + R3                 Values in registers
Pre-update  LD R1, [R2, 8]!    R2 ← R2 + 8 ; R1 ← Mem[R2]   Using pointer to traverse array (address is pre-updated)
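The two modes in the table can be sketched in Python as follows. This is an illustrative model only: the helper names, the dictionary-based register file, and the word-addressed memory dictionary are assumptions, not part of the slides.

```python
# Hypothetical helpers that model two addressing modes (names are assumptions).

def add_register(regs, rd, rs, rt):
    """Register mode: ADD Rd, Rs, Rt  =>  Rd <- Rs + Rt."""
    regs[rd] = regs[rs] + regs[rt]

def load_pre_update(regs, mem, rd, rbase, offset):
    """Pre-update mode: LD Rd, [Rbase, offset]!
    The base register is updated first, then used as the address."""
    regs[rbase] = regs[rbase] + offset
    regs[rd] = mem[regs[rbase]]

def traverse(mem, start, count, stride=8):
    """Walk an array of `count` elements using only pre-update loads.
    The pointer starts one stride before the first element."""
    regs = {"R1": 0, "R2": start - stride}
    values = []
    for _ in range(count):
        load_pre_update(regs, mem, "R1", "R2", stride)
        values.append(regs["R1"])
    return values, regs["R2"]
```

Because the pointer update is folded into the load, the loop body needs no separate add instruction to advance through the array.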
Types and Size of Operands
Common operand types:
ASCII character = 1 byte (64-bit register can store 8 characters)
Unicode character or Short integer = 2 bytes = 16 bits (half word)
Integer = 4 bytes = 32 bits (word size on many RISC processors)
Single-precision float = 4 bytes = 32 bits (word size)
Long integer = 8 bytes = 64 bits (double word)
Double-precision float = 8 bytes = 64 bits (double word)
Extended-precision float = 10 bytes = 80 bits (Intel architecture)
Quad-precision float = 16 bytes = 128 bits (quad word)
32-bit versus 64-bit architectures
64-bit architectures support 64-bit operands & memory addresses
Older architectures were 32-bit (can address 4 GB of memory)
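The sizes listed above and the 4 GB limit can be checked with a short Python sketch. The dictionary and helper names are assumptions for illustration; the sizes come from the `struct` module's standard (platform-independent) format codes.

```python
import struct

# Standard sizes in bytes for the common operand types ("<" selects
# standard sizes, independent of the host platform).
OPERAND_BYTES = {
    "ASCII character": struct.calcsize("<b"),
    "short integer": struct.calcsize("<h"),
    "integer": struct.calcsize("<i"),
    "single-precision float": struct.calcsize("<f"),
    "long integer": struct.calcsize("<q"),
    "double-precision float": struct.calcsize("<d"),
}

def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

def chars_per_register(register_bits, char_bytes=1):
    """How many characters of a given size fit in one register."""
    return register_bits // (8 * char_bytes)
```

A 32-bit address gives 2^32 bytes, which is the 4 GB limit quoted for older architectures.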
Data Accesses by Size
[Figure not reproduced. Copyright © 2019, Elsevier Inc. All rights reserved.]
Addressing Modes for Control
How to specify the target address for control instructions?
PC-relative addressing for branch instructions
PC-relative offset is added to the program counter (PC)
The target instruction is often near the branch instruction
Position independent code: can be loaded anywhere in memory
As a register (or memory) containing the target address
For procedure return and indirect jumps
For case or switch statements
For methods in object-oriented languages
For higher-order functions or function pointers
For dynamically shared libraries that are loaded/linked at runtime
As a direct address in the instruction format
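A short Python sketch of how the branch target is formed under each option. The conventions here are assumptions chosen for illustration (a 16-bit signed offset counted in 4-byte instructions and measured from the instruction after the branch); real instruction sets differ in these details.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def pc_relative_target(pc, offset_field, offset_bits=16, instr_bytes=4):
    """PC-relative: target = address of next instruction + scaled offset.
    The scaling and the 'next instruction' base are assumed conventions."""
    return pc + instr_bytes + sign_extend(offset_field, offset_bits) * instr_bytes

def register_indirect_target(regs, reg):
    """Register indirect: the target address is read from a register,
    as for procedure return or a call through a function pointer."""
    return regs[reg]
```

The distance between branch and target depends only on the offset field, not on where the code is loaded, which is why PC-relative code is position independent.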
Conditional Branch Options
Condition Register (Alpha, MIPS): comparison result is put in a register. Simple. Requires an extra compare instruction for a general condition.
Compare and Branch (MIPS, PA-RISC): compare is part of the branch. One instruction rather than two for a branch. May be too much work for pipelined execution.
Procedure Call Options
At a minimum, the return address should be saved
In a special link register, in a GPR, or in memory on the stack
IBM 360: 50 Years Later (zSeries z12)
The IBM zSeries z12 die [2012]
Six-core design (large cores)
2.75 billion transistors (597 mm²)
32 nm technology (13 layers)
The z12 runs at 5.5 GHz to 6 GHz
Power = 300 Watts (liquid cooling)
I-Cache: 64KB L1 + 1MB L2 per core
D-cache: 96KB L1 + 1MB L2 per core
On-chip shared L3: 48MB eDRAM
64-bit virtual addressing
Original S/360 used 24-bit addresses; S/370-XA extended them to 31 bits
R9 to R15 are available in 64-bit mode (R9d to R15d = low 32 bits)
Segment registers: SS, DS, ES, FS, GS
RIP (EIP = 32 bits)
RFLAGS (EFLAGS = 32 bits)
Status flags: CF = Carry Flag, OF = Overflow Flag, ZF = Zero Flag, SF = Sign Flag
MOV Instruction
MOV has different meanings according to source and destination
Three types of source operands:
Immediate: constant encoded in the instruction
Register: register number is encoded in the instruction
Memory: address is computed according to memory addressing mode
Two types of destination operands: Register or Memory
However, Memory to Memory transfer is not allowed
Instruction       Meaning       Comment
MOV Rd, Rs        Rd = Rs       Register copy
MOV Rd, Imm       Rd = Imm      Initialize Rd with immediate
MOV Rd, [mem]     Rd = [mem]    Load register Rd from memory
MOV [mem], Rs     [mem] = Rs    Store register Rs in memory
MOV [mem], Imm    [mem] = Imm   Store immediate in memory
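The operand rules for MOV can be modeled in a few lines of Python. This is a sketch under stated assumptions: the `mov` function, the `("reg", name)` / `("imm", value)` / `("mem", address)` operand tuples, and the `state` dictionary are hypothetical and exist only to illustrate the five legal combinations and the one that is rejected.

```python
def mov(state, dest, src):
    """Model MOV dest, src. Operands are (kind, value) tuples where kind
    is "reg", "imm", or "mem" (an encoding assumed for this sketch)."""
    dest_kind, dest_id = dest
    src_kind, src_id = src

    if dest_kind not in ("reg", "mem"):
        raise ValueError("destination must be a register or memory")
    if dest_kind == "mem" and src_kind == "mem":
        raise ValueError("memory-to-memory MOV is not allowed")

    # Read the source operand.
    if src_kind == "imm":
        value = src_id
    elif src_kind == "reg":
        value = state["regs"][src_id]
    elif src_kind == "mem":
        value = state["mem"][src_id]
    else:
        raise ValueError(f"unknown source kind {src_kind}")

    # Write the destination operand.
    if dest_kind == "reg":
        state["regs"][dest_id] = value
    else:
        state["mem"][dest_id] = value
    return state
```

A memory-to-memory copy therefore takes two MOV instructions that pass through a register.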
Data Movement Instructions
Instruction       Meaning                     Comment
MOVZX Rd, src     Rd = zero_extend(src)       Move with zero extend
MOVSX Rd, src     Rd = sign_extend(src)       Move with sign extend
PUSH src          RSP -= 8 ; [RSP] = src      Push src value on stack
POP dest          dest = [RSP] ; RSP += 8     Pop top of stack
XCHG dest, src    {dest,src} = {src,dest}     Exchange src with dest
LEA Rd, [mem]     Rd = address_of(mem)        Load effective address
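The difference between zero and sign extension, and the order of the pointer update in PUSH and POP, can be sketched in Python. The function names and the `state` dictionary are assumptions for illustration; registers are modeled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1

def movzx(src, src_bits):
    """Zero extend: keep the low src_bits, upper bits become 0."""
    return src & ((1 << src_bits) - 1)

def movsx(src, src_bits, dest_bits=64):
    """Sign extend: copy the sign bit of the source into the upper bits."""
    value = src & ((1 << src_bits) - 1)
    if value & (1 << (src_bits - 1)):
        value -= 1 << src_bits          # source was negative
    return value & ((1 << dest_bits) - 1)

def push(state, value):
    """PUSH: decrement RSP by 8 first, then store at the new top."""
    state["RSP"] = (state["RSP"] - 8) & MASK64
    state["mem"][state["RSP"]] = value & MASK64

def pop(state):
    """POP: read from the top first, then increment RSP by 8."""
    value = state["mem"][state["RSP"]]
    state["RSP"] = (state["RSP"] + 8) & MASK64
    return value
```

For example, the byte 0xFF becomes 0x00000000000000FF under MOVZX but 0xFFFFFFFFFFFFFFFF under MOVSX, because its sign bit is set.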
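[Instruction format figure not reproduced; surviving labels follow]
Scale: ... 4, or 8
Mod field: 00 = no displacement, 01 = 8-bit displacement, 10 = 16- or 32-bit displacement, 11 = register to register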
Top 10 Integer Instructions for Intel x86
1. Load: 22% (read from memory)
2. Conditional branch: 20%
3. Compare: 16%
4. Store: 12% (write to memory)
5. Add: 8%
6. And: 6%
7. Sub: 5%
8. Move register-register: 4%
9. Call: 1% (function call)
10. Return: 1% (function return)
Total = 96% of instructions executed
Percentages are based on five SPEC INT 92 programs
The most widely executed instructions are the simplest operations of an instruction set
Intel x86-64 FPU & XMM Registers
x87 FPU registers: ST0 to ST7 = 80 bits each (replaced by the XMM registers)
XMM registers: XMM0 to XMM7 = 128 bits each
Additional registers in 64-bit mode: XMM8 to XMM15 = 128 bits each
x87 FPU state: Exception Flags; FP Control (Precision control, Rounding control, Exception masks); FPU IP and FPU DP (saved for exception handlers)
MXCSR: Rounding Control, Exception Masks, Exception Flags
SSE Instruction Set
SSE = Streaming SIMD Extensions
SIMD instructions operate in parallel on multiple data packed in a register
SSE Instructions consist of the following:
Data movement instructions
Arithmetic Instructions
Logical Instructions
Comparison Instructions
Conversion Instructions
The SSE instruction set introduced 70 new instructions
SSE2 added 144 more instructions to SSE
SSE3 added 13 more instructions
SSE4 added 54 more instructions
SSE Scalar Instructions
128-bit XMM registers, each holding four 32-bit floats: A = (A3, A2, A1, A0) and B = (B3, B2, B1, B0)
Scalar result: (A3, A2, A1, A0 op B0)
Scalar Single-Precision Floating-Point Instructions (SSE):
MOVSS, ADDSS, SUBSS, …
MULSS, DIVSS, SQRTSS, …
MAXSS, MINSS, CMPSS, …
SSE Parallel (SIMD) Instructions
128-bit XMM registers, each holding four 32-bit floats: A = (A3, A2, A1, A0) and B = (B3, B2, B1, B0)
Packed result: (A3 op B3, A2 op B2, A1 op B1, A0 op B0)
Packed Single-Precision Floating-Point Instructions (SSE):
MOVAPS, MOVUPS, …
ADDPS, SUBPS, MULPS, …
MAXPS, MINPS, CMPPS, …
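The contrast between scalar and packed SSE operations can be sketched in Python by modeling an XMM register as a list of four lanes. The function names are assumptions, and index 0 of each list stands for lane 0 (A0 or B0). Python floats are double precision, so this models only the lane behavior, not 32-bit rounding.

```python
import operator

def packed_op(op, a, b):
    """Packed (SIMD) form, as in ADDPS: apply op to all four lanes."""
    return [op(x, y) for x, y in zip(a, b)]

def scalar_op(op, a, b):
    """Scalar form, as in ADDSS: apply op to lane 0 only.
    The upper three lanes are copied unchanged from the first operand."""
    return [op(a[0], b[0])] + list(a[1:])

A = [1.0, 2.0, 3.0, 4.0]       # lanes A0, A1, A2, A3
B = [10.0, 20.0, 30.0, 40.0]   # lanes B0, B1, B2, B3
```

With these values, `packed_op(operator.add, A, B)` adds every lane, while `scalar_op(operator.add, A, B)` changes only lane 0.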
SSE/2 Data Movement Instructions
Instruction          Meaning
MOVSS dest, src      Move Scalar (S = 32-bit float) from src to dest
MOVSD dest, src      Move Scalar (D = 64-bit float) from src to dest
MOVAPS dest, src     Move Aligned Packed floats (16 bytes)
MOVUPS dest, src     Move Unaligned Packed floats (16 bytes)
MOVAPD dest, src     Move Aligned Packed double-precision floats
MOVUPD dest, src     Move Unaligned Packed double-precision floats
MOVD dest, src       Move Double-word (32 bits) between GPR and XMM
MOVQ dest, src       Move Quad-word (64 bits) between GPR and XMM
The MIPS Architecture
Announced in 1985: MIPS I, II, III, IV, V, MIPS32, MIPS64
MIPS64 has 32 × 64-bit general-purpose registers
Named R0 to R31 (also known as integer registers)
Register R0 is always zero and cannot be written
There are also 32 × 64-bit floating-point registers
Named F0 to F31 for double-precision FP numbers
Single-precision FP numbers use the lower 32 bits of the register
Integer and Floating-Point data types for MIPS64
8-bit bytes, 16-bit half words, 32-bit words, and 64-bit long words
32-bit single-precision and 64-bit double precision
Latest MIPS64 release eliminated the HI and LO registers
Multiply and Divide instructions write their results into GPR registers
MIPS Instruction Formats
All instructions are 32 bits with a 6-bit primary opcode
These are the main instruction formats, not the only ones
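Extracting the 6-bit primary opcode and the remaining fields from a 32-bit instruction word can be sketched in Python. The field positions used here (rs, rt, rd, sa, funct, and a 16-bit immediate) follow the classic MIPS R-format and I-format layouts, which is an assumption because the format figure did not survive in this text. Treating opcode 0 as R-format and everything else as I-format is a simplification; as noted above, these are not the only formats.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(word):
    """Split a 32-bit instruction word into fields (simplified sketch)."""
    opcode = (word >> 26) & 0x3F      # 6-bit primary opcode
    rs = (word >> 21) & 0x1F          # 5-bit source register
    rt = (word >> 16) & 0x1F          # 5-bit source/destination register
    if opcode == 0:
        # R-format: register operands plus shift amount and function code
        return {
            "format": "R", "opcode": opcode, "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,
            "sa": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    # I-format: 16-bit signed immediate or displacement
    return {
        "format": "I", "opcode": opcode, "rs": rs, "rt": rt,
        "imm": sign_extend16(word),
    }
```

A fixed 32-bit length with fields in fixed positions lets the hardware read the register numbers before the opcode is fully decoded.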
MIPS Load and Store Instructions
Load/Store instructions use the I-Format with 16-bit displacement
Instruction        Name                  Meaning
LD Rt, Imm(Rs)     Load double word      Reg[Rt] ←64 Mem[Reg[Rs] + Imm]
LWU Rt, Imm(Rs)    Load word unsigned    Reg[Rt] ←32 Mem[Reg[Rs] + Imm] (zero-extend)
LHU Rt, Imm(Rs)    Load half unsigned    Reg[Rt] ←16 Mem[Reg[Rs] + Imm] (zero-extend)
LBU Rt, Imm(Rs)    Load byte unsigned    Reg[Rt] ←8 Mem[Reg[Rs] + Imm] (zero-extend)
SH Rt, Imm(Rs)     Store half word       Mem[Reg[Rs] + Imm] ←16 Reg[Rt] (lower 16 bits)
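The displacement addressing and zero-extending behavior in the table can be sketched in Python. The helper names are assumptions, memory is modeled as a `bytearray`, and little-endian byte order is assumed here for simplicity (MIPS supports both byte orders).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def effective_address(base, imm16):
    """Address = Reg[Rs] + sign-extended 16-bit displacement."""
    return base + sign_extend(imm16, 16)

def load_unsigned(mem, addr, nbytes):
    """LBU / LHU / LWU / LD: read nbytes; upper register bits are zero."""
    return int.from_bytes(mem[addr:addr + nbytes], "little")

def store(mem, addr, value, nbytes):
    """SH and similar stores: write only the lower nbytes of the register."""
    low = value & ((1 << (8 * nbytes)) - 1)
    mem[addr:addr + nbytes] = low.to_bytes(nbytes, "little")
```

Storing a half word writes only the lower 16 bits of the register, and an unsigned load of the same location returns that value with all upper bits cleared.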
MIPS Floating-Point Load and Store
Instruction          Name                Meaning
LDC1 Ft, Imm(Rs)     Load double to FP   Reg[Ft] ←64 Mem[Reg[Rs] + Imm]
SWC1 Ft, Imm(Rs)     Store FP word       Mem[Reg[Rs] + Imm] ←32 Reg[Ft] (lower 32 bits)
Instruction               Meaning
DADD Rd, Rs, Rt           Reg[Rd] ← Reg[Rs] + Reg[Rt] (64-bit integer addition)
DADDI Rt, Rs, Imm         Reg[Rt] ← Reg[Rs] + Imm (immediate can be negative)
DSLL, DSRL, DSRA          Shift Left, Shift Right Logical, Shift Right Arithmetic
DSLLV, DSRLV, DSRAV       Same as DSLL, DSRL, DSRA, but with a variable amount
AND, OR, XOR, NOR         R-type bitwise logic instructions (64-bit operands)
SLT, SLTU, SLTI, SLTIU    Set Less Than, Unsigned, Immediate (result is 0 or 1)
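The difference between logical and arithmetic right shifts, and between signed and unsigned comparison, can be sketched in Python. The function names mirror the mnemonics but are otherwise assumptions; registers are modeled as unsigned 64-bit integers, and overflow traps are ignored in this sketch, so addition simply wraps around.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def dadd(a, b):
    """64-bit addition, wrapping modulo 2^64 (traps are not modeled)."""
    return (a + b) & MASK64

def dsll(a, sa):
    """Shift left: zeros enter from the right."""
    return (a << sa) & MASK64

def dsrl(a, sa):
    """Shift right logical: zeros enter from the left."""
    return (a & MASK64) >> sa

def dsra(a, sa):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    return (to_signed(a) >> sa) & MASK64

def slt(a, b):
    """Set less than, signed comparison: result is 0 or 1."""
    return 1 if to_signed(a) < to_signed(b) else 0

def sltu(a, b):
    """Set less than, unsigned comparison: result is 0 or 1."""
    return 1 if (a & MASK64) < (b & MASK64) else 0
```

The all-ones pattern is -1 when compared with SLT but the largest possible value when compared with SLTU, so the two instructions give opposite answers against 1.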
MIPS64 Multiply and Divide Instructions
Multiplication of 64-bit integers produces a 128-bit product
The low and high 64 bits of the product are computed using two instructions
Division of 64-bit integers produces a quotient and remainder
Results are written to a register Rd; the LO and HI registers are eliminated
Instruction Meaning
DMUL Rd, Rs, Rt Rd = Low 64-bit of Signed 64-bit integer multiplication
DMUH Rd, Rs, Rt Rd = High 64-bit of Signed 64-bit integer multiplication
DMULU Rd, Rs, Rt Rd = Low 64-bit of Unsigned 64-bit integer multiplication
DMUHU Rd, Rs, Rt Rd = High 64-bit of Unsigned 64-bit integer multiplication
DDIV Rd, Rs, Rt Rd = Quotient of Signed 64-bit integer division
DMOD Rd, Rs, Rt Rd = Modulo (Remainder) of Signed 64-bit integer division
DDIVU Rd, Rs, Rt Rd = Quotient of Unsigned 64-bit integer division
DMODU Rd, Rs, Rt Rd = Modulo (Remainder) of Unsigned 64-bit integer division
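The split of a 128-bit product into low and high halves, and the pairing of quotient and remainder, can be sketched in Python using its arbitrary-precision integers. The function names mirror the mnemonics but are otherwise assumptions. Signed division is assumed to truncate toward zero with the remainder taking the sign of the dividend, and division by zero is not modeled (Python raises ZeroDivisionError).

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def dmul(a, b):
    """Low 64 bits of the signed product."""
    return (to_signed(a) * to_signed(b)) & MASK64

def dmuh(a, b):
    """High 64 bits of the signed 128-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 64) & MASK64

def dmulu(a, b):
    """Low 64 bits of the unsigned product."""
    return ((a & MASK64) * (b & MASK64)) & MASK64

def dmuhu(a, b):
    """High 64 bits of the unsigned 128-bit product."""
    return ((a & MASK64) * (b & MASK64)) >> 64

def ddiv(a, b):
    """Signed quotient, truncated toward zero (assumed convention)."""
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & MASK64

def dmod(a, b):
    """Signed remainder, consistent with ddiv: a = q * b + r."""
    sa, sb = to_signed(a), to_signed(b)
    return (sa - to_signed(ddiv(a, b)) * sb) & MASK64
```

The low halves from DMUL and DMULU are always identical; only the high halves depend on whether the operands are read as signed or unsigned.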
MIPS Floating-Point Instructions
Instruction           Meaning
ADD.S Fd, Fs, Ft      Reg[Fd] ← Reg[Fs] + Reg[Ft] (32-bit single-precision add)
ADD.D Fd, Fs, Ft      Reg[Fd] ← Reg[Fs] + Reg[Ft] (64-bit double-precision add)
SUB.S, SUB.D          FP Subtract (FR-format): Single and Double-precision
MUL.S, MUL.D          FP Multiply (FR-format): Single and Double-precision
DIV.S, DIV.D          FP Divide (FR-format): Single and Double-precision
MADDF.S, MADDF.D      FP Fused Multiply-Add: Reg[Fd] ← Reg[Fd] + Reg[Fs] × Reg[Ft]
SEL.S, SEL.D          Select: Reg[Fd] ← Reg[Fd].bit0 ? Reg[Ft] : Reg[Fs]
CVT.x.y Fd, Fs        Convert: Reg[Fd] ← convert_from_format_y_to_x (Reg[Fs])
CMP.cond.S (or .D)    Compare: Reg[Fd] ← compare_cond (Reg[Fs], Reg[Ft])
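The practical difference between the .S and .D forms can be sketched in Python, where floats are double precision and the `struct` module can round a value to single precision. The function names are assumptions that mirror the mnemonics, and `sel` takes the low bit of the destination as a plain integer argument for simplicity.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_s(fs, ft):
    """Single-precision add: operands and result are rounded to 32 bits."""
    return to_f32(to_f32(fs) + to_f32(ft))

def add_d(fs, ft):
    """Double-precision add: Python floats are already 64-bit doubles."""
    return fs + ft

def sel(fd_bit0, fs, ft):
    """Select: returns ft if bit 0 of the old destination is 1, else fs."""
    return ft if fd_bit0 & 1 else fs
```

Single precision has a 24-bit significand, so 2^24 + 1 cannot be represented: `add_s(16777216.0, 1.0)` returns 16777216.0, while `add_d` returns 16777217.0.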
MIPS Instruction Set Usage (cont’d)
[Figure not reproduced: instruction usage for five SPEC FP 2000 programs]
Fallacies and Pitfalls
Fallacy: Complex and Powerful instruction ⇒ higher performance
Fewer instructions required
But complex instructions are hard to implement
May slow down instruction execution
Compilers are good at making fast code from simple instructions