IA32 Instruction Set (Short Form)

The document provides a comprehensive list of IA32 instructions, detailing operand types, register sets, and flags. It includes descriptions of data transfer instructions, conditional moves, and stack operations, along with their corresponding mnemonics and symbolic operations. Additionally, it outlines the structure and functionality of various registers, including general purpose, segment, and FPU registers.

IA32 Instruction List (Short Form)

© by Jarosław Kuchta 2012

Gdańsk University of Technology


Faculty of Electronics, Telecommunication and Informatics
Computer Architecture Department

Description of Operands
r8 8-bit general purpose register
r16 16-bit general purpose register
r32 32-bit general purpose register
EDX:EAX 64-bit integer number; EDX holds the more significant part, EAX the less significant part
imm8 immediate 8-bit value from -128 to +127
imm16 immediate 16-bit value from -32768 to +32767
imm32 immediate 32-bit value from -2147483648 to +2147483647
r/m8 8-bit general purpose register or memory location
r/m16 16-bit general purpose register or memory location
r/m32 32-bit general purpose register or memory location
m 16-bit or 32-bit memory location
m8 8-bit memory location
m16 16-bit memory location
m32 32-bit memory location
m64 64-bit memory location
m128 128-bit memory location
mNbyte N-byte memory location
ptr16:16 far pointer (16-bit segment, 16-bit offset) in a different code segment
ptr16:32 far pointer (16-bit segment, 32-bit offset) in a different code segment
m16:16 a memory location containing a far pointer composed of two 16-bit numbers: segment & offset
m16:32 a memory location containing a far pointer composed of a 16-bit segment & a 32-bit offset
m16&16 a memory location containing a data pair: 16&16-bit
m16&32 a memory location containing a data pair: 16&32-bit
m32&32 a memory location containing a data pair: 32&32-bit
moffs8 simple 8-bit memory location, whose actual address is given by a simple offset relative to segment base
moffs16 simple 16-bit memory location, whose actual address is given by a simple offset relative to segment base
moffs32 simple 32-bit memory location, whose actual address is given by a simple offset relative to segment base
Sreg segment register: CS, DS, SS, ES, FS or GS
rel8 relative address in the range from 128 bytes before to 127 bytes after the end of instruction
rel16 16-bit relative address within the same code segment
rel32 32-bit relative address within the same code segment
m32fp a single-precision floating-point memory location
m64fp a double-precision floating-point memory location
m80fp an extended-precision floating-point memory location
m16int a word integer memory location
m32int a double-word (dword) integer memory location
m64int a quad-word (qword) integer memory location
ST the top element of the FPU register stack
ST(0) the top element of the FPU register stack
ST(i) the i-th element of the FPU register stack (i = 0..7)
mm 64-bit MMX register from MM0 to MM7
mm/m32 low order 32 bits of an MMX register or 32-bit memory location
mm/m64 MMX register or 64-bit memory location
xmm 128-bit XMM register from XMM0 to XMM7
xmm/m32 XMM register or 32-bit memory location
xmm/m64 XMM register or 64-bit memory location
xmm/m128 XMM register or 128-bit memory location
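Register Set
General Purpose Registers (32-bit; bits 15..0 form the 16-bit register, which splits into a high byte, bits 15..8, and a low byte, bits 7..0)
EAX AX AH AL
EBX BX BH BL
ECX CX CH CL
EDX DX DH DL
Index Registers (32-bit; bits 15..0 form the 16-bit register)
ESI SI
EDI DI
Pointer Registers (32-bit; bits 15..0 form the 16-bit register)
EBP BP
ESP SP
Instruction Pointer (32-bit; bits 15..0 form the 16-bit register)
EIP IP
Segment Registers (16-bit)
CS, DS, SS, ES, FS, GS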

Flags
EFLAGS (32-bit; bits 15..0 form FLAGS), listed from bit 21 down to bit 0:
ID VIP VIF AC VM RF 0 NT IOPL (2 bits) OF DF IF TF SF ZF 0 AF 0 PF 1 CF

Bit Flag Description


0 CF Carry Flag Carry out of the most significant bit, or borrow into it; indicates overflow in unsigned arithmetic.
2 PF Parity Flag Set to 1 if the 8 least significant bits of the result contain an even number of 1’s, else set to 0.
4 AF Auxiliary carry Flag Used as carry flag in BCD instructions.
6 ZF Zero Flag Set to 1 if result is zero, else set to 0.
7 SF Sign Flag Set to 1 if result is negative (below zero), else set to 0.
8 TF Trap Flag Used by debuggers.
9 IF Interrupt Flag If set to 1, interrupts are enabled; otherwise they are disabled.
10 DF Direction Flag When set to 0, string instructions increment the index registers; otherwise they decrement them.
11 OF Overflow Flag Used in signed instructions.
12,13 IOPL I/O Privilege Level Indicates the I/O privilege level of the currently running program or task.
14 NT Nested Task Controls the chaining of interrupt and called tasks.
16 RF Resume Flag Controls the processor’s response to instruction-breakpoint conditions.
17 VM Virtual 8086 Mode Set to enable virtual-8086 mode; clear to return to protected mode.
18 AC Alignment check Set this flag and the AM flag in control register CR0 to enable alignment checking of memory references.
19 VIF Virtual Interrupt Flag Contains a virtual image of the IF flag.
20 VIP Virtual Interrupt Pending Set by software to indicate that an interrupt is pending.
21 ID Identification The ability of a program or procedure to set or clear this flag indicates support for the CPUID instruction.
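
The status flag descriptions above can be illustrated with a small Python model of an 8-bit addition. This is only a sketch: the function name `add8_flags` and the dictionary layout are assumptions made for illustration, not part of the instruction list.

```python
def add8_flags(dst, src):
    """Model an 8-bit ADD; returns (result, flags) with flags as a dict (hypothetical layout)."""
    dst &= 0xFF
    src &= 0xFF
    full = dst + src                 # unbounded sum, used to detect the carry out
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                                # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1 bits
        "AF": int((dst & 0x0F) + (src & 0x0F) > 0x0F),         # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 7) & 1,                               # copy of the sign bit
        # signed overflow: both operands differ in sign from the result
        "OF": int(((dst ^ result) & (src ^ result) & 0x80) != 0),
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))   # signed overflow without carry
    print(add8_flags(0xFF, 0x01))   # carry without signed overflow
```

The two calls show why CF and OF are separate: the first sum overflows only when read as signed, the second only when read as unsigned.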
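FPU Registers
Data registers (80-bit, bits 79..0): ST(7), ST(6), ST(5), ST(4), ST(3), ST(2), ST(1), ST(0)
Control Register (16-bit, bits 15..0)
Status Register (16-bit, bits 15..0)
Tag Register (16-bit, bits 15..0)
Opcode Register (11-bit, bits 10..0)
FPU Instruction Pointer Register (48-bit, bits 47..0)
FPU Data Pointer Register (48-bit, bits 47..0)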

MMX Registers (mapped onto the FPU registers)
64-bit, bits 63..0: MMX(7), MMX(6), MMX(5), MMX(4), MMX(3), MMX(2), MMX(1), MMX(0)
Contents of one register: 8 * byte, 4 * word, 2 * dword or 1 * qword

SSE/SSE2 Registers
128-bit, bits 127..0: XMM(7), XMM(6), XMM(5), XMM(4), XMM(3), XMM(2), XMM(1), XMM(0)
Contents of one register: 16 * byte, 8 * word, 4 * dword, 4 * single FP, 2 * qword, 2 * double FP or 1 * dbl qword
MXCSR (32-bit, bits 31..0)
CPU Instruction set

Data Transfer Instructions


Instruction Mnemonic Operands Description Symbolic operations
Move MOV r/m8, r8 Move r8 to r/m8. DST ← SRC
r/m16, r16 Move r16 to r/m16.
r/m32, r32 Move r32 to r/m32.
r8, r/m8 Move r/m8 to r8.
r16, r/m16 Move r/m16 to r16.
r32, r/m32 Move r/m32 to r32.
r/m16, Sreg Move segment register to r/m16.
Sreg, r/m16 Move r/m16 to segment register.
AL, moffs8 Move byte at (segment: offset) to AL.
AX, moffs16 Move word at (segment: offset) to AX.
EAX, moffs32 Move dword at (segment: offset) to EAX.
moffs8, AL Move AL to byte at (segment: offset).
moffs16, AX Move AX to word at (segment: offset).
moffs32, EAX Move EAX to dword at (segment: offset).
r8, imm8 Move imm8 to r8.
r16, imm16 Move imm16 to r16.
r32, imm32 Move imm32 to r32.
r/m8, imm8 Move imm8 to r/m8.
r/m16, imm16 Move imm16 to r/m16.
r/m32, imm32 Move imm32 to r/m32.
Conditional Move CMOVA r16, r/m16 Move if above (CF=0 and ZF=0) TMP ← SRC;
CMOVAE r32, r/m32 Move if above or equal (CF=0) IF (condition) THEN
CMOVB Move if below (CF=1) DST ← TMP
CMOVBE Move if below or equal (CF=1 or ZF=1) END
CMOVC Move if carry (CF=1)
CMOVE Move if equal (ZF=1)
CMOVG Move if greater (ZF=0 and SF=OF)
CMOVGE Move if greater or equal (SF=OF)
CMOVL Move if less (SF<>OF)
CMOVLE Move if less or equal (ZF=1 or SF<>OF)
CMOVNA Move if not above (CF=1 or ZF=1)
CMOVNAE Move if not above or equal (CF=1)
CMOVNB Move if not below (CF=0)
CMOVNBE Move if not below or equal (CF=0 and ZF=0)
CMOVNC Move if not carry (CF=0)
CMOVNE Move if not equal (ZF=0)
CMOVNG Move if not greater (ZF=1 or SF<>OF)
CMOVNGE Move if not greater or equal (SF<>OF)
CMOVNL Move if not less (SF=OF)
CMOVNLE Move if not less or equal (ZF=0 and SF=OF)
CMOVNO Move if not overflow (OF=0)
CMOVNP Move if not parity (PF=0)
CMOVNS Move if not sign (SF=0)
CMOVNZ Move if not zero (ZF=0)
CMOVO Move if overflow (OF=1)
CMOVP Move if parity (PF=1)
CMOVPE Move if parity even (PF=1)
CMOVPO Move if parity odd (PF=0)
CMOVS Move if sign (SF=1)
CMOVZ Move if zero (ZF=1)
Exchange XCHG AX, r16 Exchanges the contents of the register with other register or memory TMP ← DST;
r16, AX location. DST ← SRC;
EAX, r32 SRC ← TMP
r32, EAX
r/m8, r8
r8, r/m8
r/m16, r16
r16, r/m16
r/m32, r32
r32, r/m32
Byte Swap BSWAP r32 Reverses the byte order of a 32-bit register. TMP ← DST;
DST[7..0] ← TMP [31..24];
DST[15..8] ← TMP[23..16];
DST[23..16] ← TMP[15..8];
DST[31..24] ← TMP[7..0];
Exchange and ADD XADD r/m8, r8 Exchanges source and destination operands. Load sum into destination TMP ← SRC + DST
r/m16, r16 operand. SRC ← DST
r/m32, r32 DST ← TMP
Compare and CMPXCHG r/m8, r8 Compares the accumulator (AL, AX or EAX) with the first operand. If IF ACC = DST THEN
Exchange r/m16, r16 equal ZF is set and the second operand is loaded into the first operand. ZF ← 1; DST ← SRC
r/m32, r32 Else, clears ZF and loads the first operand into the accumulator. ELSE
ZF ← 0; ACC ← DST
END
Compare and Exchange 8 Bytes CMPXCHG8B m64 Compares EDX:EAX with the operand. If equal, sets ZF and loads ECX:EBX into the operand. Else, clears ZF and loads the operand into EDX:EAX.
IF EDX:EAX = DST THEN
ZF ← 1; DST ← ECX:EBX
ELSE
ZF ← 0; EDX:EAX ← DST
END
Push onto Stack PUSH r/m16 Decrements the stack pointer, then stores the register, memory or immediate operand at the new top of stack. ESP ← ESP - OPERANDSIZE/8;
r/m32 SS:[ESP] ← SRC
imm8
imm16
imm32
DS
ES
SS
FS
GS
Push All general registers PUSHA Pushes AX, CX, DX, BX, original SP, BP, SI, and DI.
PUSHAD Pushes EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI.
TMP ← (E)SP;
PUSH (E)AX;
PUSH (E)CX;
PUSH (E)DX;
PUSH (E)BX;
PUSH TMP;
PUSH (E)BP;
PUSH (E)SI;
PUSH (E)DI
Pop from stack POP r/m16 Pops top of stack into register or memory, increments stack pointer. DST ← SS:[ESP];
r/m32 ESP ← ESP + OPERANDSIZE/8
DS
ES
SS
FS
GS
Pop All general registers POPA Pops DI, SI, BP, BX, DX, CX, and AX; the SP value saved by PUSHA is skipped.
POPAD Pops EDI, ESI, EBP, EBX, EDX, ECX, and EAX; the ESP value saved by PUSHAD is skipped.
POP (E)DI;
POP (E)SI;
POP (E)BP;
ESP ← ESP + OPERANDSIZE/8; // skip the saved (E)SP
POP (E)BX;
POP (E)DX;
POP (E)CX;
POP (E)AX
Input from port IN AL, imm8 Inputs byte from given I/O port address into AL. DST ← Port(SRC)
AX, imm8 Inputs word from given I/O port address into AX.
EAX, imm8 Inputs dword from given I/O port address into EAX.
AL, DX Inputs byte from I/O port specified in DX into AL.
AX, DX Inputs word from I/O port specified in DX into AX.
EAX, DX Inputs dword from I/O port specified in DX into EAX.
Output to port OUT imm8, AL Outputs byte from AL to given I/O port address. Port(DST) ← SRC
imm8, AX Outputs word from AX to given I/O port address.
imm8, EAX Outputs dword from EAX to given I/O port address.
DX, AL Outputs byte from AL to I/O port specified in DX.
DX, AX Outputs word from AX to I/O port specified in DX.
DX, EAX Outputs dword from EAX to I/O port specified in DX.
Convert Word to CWD Sign-extends AX to DX:AX DX:AX ← SignExtend(AX)
Dword
Convert Dword to CDQ Sign-extends EAX to EDX:EAX EDX:EAX ← SignExtend(EAX)
Qword
Move with Zero- MOVZX r16, r/m8 Move byte to word with zero-extension. DST ← ZeroExtend(SRC)
Extend r32, r/m8 Move byte to dword with zero-extension.
r32, r/m16 Move word to dword with zero-extension.
Move with Sign- MOVSX r16, r/m8 Move byte to word with sign-extension. DST ← SignExtend(SRC)
Extend r32, r/m8 Move byte to dword with sign-extension.
r32, r/m16 Move word to dword with sign-extension.
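
A few of the data transfer operations above (MOVZX, MOVSX, BSWAP and CMPXCHG) can be sketched in Python as follows. The function names and the tuple return values are hypothetical conventions chosen for this illustration; registers are modelled as plain unsigned integers.

```python
def zero_extend(value, from_bits):
    """MOVZX model: keep the low from_bits bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """MOVSX model: replicate the sign bit of a from_bits value up to to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # fill bits from_bits..to_bits-1 with ones
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def bswap32(value):
    """BSWAP model: reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def cmpxchg(acc, dst, src):
    """CMPXCHG model: returns (new_acc, new_dst, zf)."""
    if acc == dst:
        return acc, src, 1      # equal: DST <- SRC, ZF set
    return dst, dst, 0          # not equal: ACC <- DST, ZF cleared


if __name__ == "__main__":
    print(hex(sign_extend(0x80, 8, 16)))
    print(hex(bswap32(0x12345678)))
    print(cmpxchg(5, 5, 9), cmpxchg(5, 7, 9))
```

The model says nothing about atomicity; it only mirrors the symbolic operations listed in the table.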
Binary Arithmetic Instructions
Instruction Mnemonic Operands Description Symbolic operations
Increment INC r/m8 Adds 1 to the destination operand, while preserving the state of the CF DST ← DST + 1;
r/m16 flag. SET EFLAGS.OF, .SF, .ZF, .AF, .PF
r/m32
Decrement DEC r/m8 Subtracts 1 from the destination operand, while preserving the state of the DST ← DST - 1;
r/m16 CF flag. SET EFLAGS.OF, .SF, .ZF, .AF, .PF
r/m32
Arithmetic NEG r/m8 Two’s complement negation of the operand. IF DST=0 THEN EFLAGS.CF ← 0
Negation r/m16 ELSE EFLAGS.CF ← 1;
r/m32 DST ← -DST;
SET EFLAGS.OF, .SF, .ZF, .AF, .PF
Add ADD Adds source (second) operand to the destination (first) operand.
DST ← DST + SRC;
SET EFLAGS.OF, .SF, .ZF, .AF, .CF, .PF
Add with Carry ADC Adds source (second) operand with CF to the destination (first) operand.
DST ← DST + SRC + EFLAGS.CF;
SET EFLAGS.OF, .SF, .ZF, .AF, .CF, .PF
Subtract SUB Subtracts source (second) operand from the destination (first) operand.
DST ← DST - SRC;
SET EFLAGS.OF, .SF, .ZF, .AF, .CF, .PF
Subtract with Borrow SBB Subtracts source (second) operand with CF from the destination (first) operand.
DST ← DST - (SRC + EFLAGS.CF);
SET EFLAGS.OF, .SF, .ZF, .AF, .CF, .PF
Compare CMP Compares two operands by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction.
TMP ← SRC1 - SIGNEXTEND(SRC2);
SET EFLAGS.OF, .SF, .ZF, .AF, .CF, .PF
Operands (common to ADD, ADC, SUB, SBB and CMP):
AL, imm8
AX, imm16
EAX, imm32
r/m8, imm8
r/m16, imm16
r/m32, imm32
r/m16, imm8
r/m32, imm8
r/m8, r8
r/m16, r16
r/m32, r32
r8, r/m8
r16, r/m16
r32, r/m32
Unsigned Multiply MUL r/m8 Multiplies unsigned AL by r/m8. Stores result in AX. AX ← AL * SRC;
IF AH=0 THEN EFLAGS.OF, .CF ← 00B;
ELSE EFLAGS.OF, .CF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r/m16 Multiplies unsigned AX by r/m16. Stores result in DX:AX. DX:AX ← AX * SRC;
IF DX=0 THEN EFLAGS.OF, .CF ← 00B;
ELSE EFLAGS.OF, .CF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r/m32 Multiplies unsigned EAX by r/m32. Stores result in EDX:EAX. EDX:EAX ← EAX * SRC;
IF EDX=0 THEN EFLAGS.OF, .CF ← 00B;
ELSE EFLAGS.OF, .CF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
Unsigned Divide DIV r/m8 Divides unsigned AX by r/m8. Stores result in AL, remainder in AH. AL ← AX / SRC; AH ← AX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
r/m16 Divides unsigned DX:AX by r/m16. Stores result in AX, remainder in DX. AX ← DX:AX / SRC; DX ← DX:AX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
r/m32 Divides unsigned EDX:EAX by r/m32. Stores result in EAX, remainder in EDX. EAX ← EDX:EAX / SRC; EDX ← EDX:EAX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
Signed Multiply IMUL r/m8 Multiplies signed AL by r/m8. Stores result in AX. AX ← AL * SRC;
IF AX=SignExtend(AL) THEN EFLAGS.CF, .OF ← 00B;
ELSE EFLAGS.CF, .OF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r/m16 Multiplies signed AX by r/m16. Stores result in DX:AX. DX:AX ← AX * SRC;
IF DX:AX=SignExtend(AX) THEN EFLAGS.CF, .OF ← 00B;
ELSE EFLAGS.CF, .OF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r/m32 Multiplies signed EAX by r/m32. Stores result in EDX:EAX. EDX:EAX ← EAX * SRC;
IF EDX:EAX=SignExtend(EAX) THEN EFLAGS.CF, .OF ← 00B;
ELSE EFLAGS.CF, .OF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r16, r/m16 Multiplies signed word register by r/m16 word.
r32, r/m32 Multiplies signed dword register by r/m32 dword.
r16, imm8 Multiplies signed word register by sign-extended imm8 value.
r32, imm8 Multiplies signed dword register by sign-extended imm8 value.
r16, imm16 Multiplies signed word register by imm16 value.
r32, imm32 Multiplies signed dword register by imm32 value.
Symbolic operations for the two-operand forms:
TMP ← DST * SRC; // TMP is double DST size
DST ← DST * SRC;
IF TMP=SignExtend(DST) THEN EFLAGS.CF, .OF ← 00B;
ELSE EFLAGS.CF, .OF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
r16, r/m16, imm8 Multiplies signed r/m16 word by sign-extended imm8 value. Stores result in word register.
r32, r/m32, imm8 Multiplies signed r/m32 dword by sign-extended imm8 value. Stores result in dword register.
r16, r/m16, imm16 Multiplies signed r/m16 word by imm16 value. Stores result in word register.
r32, r/m32, imm32 Multiplies signed r/m32 dword by imm32 value. Stores result in dword register.
Symbolic operations for the three-operand forms:
TMP ← SRC1 * SRC2; // TMP is double SRC1 size
DST ← SRC1 * SRC2
IF TMP=SignExtend(DST) THEN EFLAGS.CF, .OF ← 00B;
ELSE EFLAGS.CF, .OF ← 11B;
// EFLAGS.ZF, .AF, .PF, .SF are undefined
Signed Divide IDIV r/m8 Divides signed AX by r/m8. Stores result in AL, remainder in AH. AL ← AX / SRC; AH ← AX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
r/m16 Divides signed DX:AX by r/m16. Stores result in AX, remainder in DX. AX ← DX:AX / SRC; DX ← DX:AX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
r/m32 Divides signed EDX:EAX by r/m32. Stores result in EAX, remainder in EDX. EAX ← EDX:EAX / SRC; EDX ← EDX:EAX MOD SRC;
// EFLAGS.CF, .OF, .ZF, .AF, .PF, .SF are undefined
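
The 32-bit one-operand forms of IMUL and IDIV can be modelled with the Python sketch below. The helper names (`to_signed`, `imul32`, `idiv32`) are hypothetical. Two points go beyond the table and are stated here as assumptions of the model: the quotient is truncated toward zero (Python's `//` rounds toward negative infinity, so it is not used directly on signed values), and a zero divisor or an oversized quotient raises a divide error, represented by Python exceptions.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul32(eax, src):
    """IMUL r/m32 model: returns (EDX, EAX, CF_OF)."""
    product = to_signed(eax, 32) * to_signed(src, 32)
    lo = product & 0xFFFFFFFF
    hi = (product >> 32) & 0xFFFFFFFF
    # CF = OF = 0 only if EDX:EAX equals the sign extension of EAX
    overflow = int(product != to_signed(lo, 32))
    return hi, lo, overflow


def idiv32(edx, eax, src):
    """IDIV r/m32 model: returns (EAX quotient, EDX remainder)."""
    dividend = to_signed((edx << 32) | eax, 64)
    divisor = to_signed(src, 32)
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient = abs(dividend) // abs(divisor)      # truncate toward zero
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor     # takes the sign of the dividend
    if not -(1 << 31) <= quotient < (1 << 31):
        raise OverflowError("divide error: quotient does not fit in EAX")
    return quotient & 0xFFFFFFFF, remainder & 0xFFFFFFFF


if __name__ == "__main__":
    print(imul32(0xFFFFFFFF, 2))                 # -1 * 2
    print(idiv32(0xFFFFFFFF, 0xFFFFFFF9, 2))     # -7 / 2
```

Dividing -7 by 2 in this model gives quotient -3 and remainder -1, whereas Python's floor division would give -4 and 1.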

Decimal Arithmetic Instructions


Instruction Mnemonic Operands Description Symbolic operations
Decimal Adjust AL after Addition DAA Adjust AL after BCD4 addition.
Decimal Adjust AL after Subtraction DAS Adjust AL after BCD4 subtraction.
ASCII Adjust after Addition AAA Adjust AL after decimal addition. IF ((AL AND 0FH)>9) OR (AF=1) THEN
AL ← AL + 6; AH ← AH + 1; AF ← 1; CF ← 1;
ELSE
CF ← 0; AF ← 0;
END
AL ← AL AND 0FH
ASCII Adjust after Subtraction AAS Adjust AL after decimal subtraction. IF ((AL AND 0FH)>9) OR (AF=1) THEN
AL ← AL - 6; AH ← AH - 1; AF ← 1; CF ← 1;
ELSE
CF ← 0; AF ← 0;
END
AL ← AL AND 0FH
ASCII Adjust before Division AAD Adjust AX before decimal division. TMPAL ← AL; TMPAH ← AH;
AL ← (TMPAL + (TMPAH * 10)) AND 0FFH;
AH ← 0
ASCII Adjust after Multiplication AAM Adjust AX after decimal multiplication. TMPAL ← AL;
AH ← TMPAL / 10;
AL ← TMPAL MOD 10
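
The ASCII adjust operations work on unpacked decimal digits, one digit per byte. A minimal Python sketch of AAA, AAD and AAM, following the symbolic operations above, could look like this; the function names and the order of the returned tuples are assumptions made for the example.

```python
def aaa(ah, al, af):
    """AAA model: returns (AH, AL, AF, CF)."""
    if (al & 0x0F) > 9 or af == 1:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        af = cf = 1
    else:
        af = cf = 0
    return ah, al & 0x0F, af, cf


def aad(ah, al):
    """AAD model: combine two unpacked digits into a binary value; returns (AH, AL)."""
    return 0, (al + ah * 10) & 0xFF


def aam(al):
    """AAM model: split a binary value into two unpacked digits; returns (AH, AL)."""
    return al // 10, al % 10


if __name__ == "__main__":
    # 9 + 5 = 0x0E in AL; AAA turns it into the digits 1 and 4
    print(aaa(0, 0x0E, 0))
    print(aam(63), aad(6, 3))
```

AAM and AAD are inverses for values below 100: `aam(63)` yields the digits 6 and 3, and `aad(6, 3)` restores 63.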

Logical Instructions
Instruction Mnemonic Operands Description Symbolic operations
Logical Negation NOT r/m8 Reverses each bit of the operand. DST ← NOT DST;
r/m16 // EFLAGS.CF, .OF, .SF, .ZF, .AF, .PF are not affected
r/m32
Logical AND AND Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
DST ← DST AND SRC;
EFLAGS.OF, .CF ← 00B;
SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Logical Inclusive OR OR Performs a bitwise OR operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
DST ← DST OR SRC;
EFLAGS.OF, .CF ← 00B;
SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Logical Exclusive OR XOR Performs a bitwise XOR operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
DST ← DST XOR SRC;
EFLAGS.OF, .CF ← 00B;
SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Operands (common to AND, OR and XOR):
AL, imm8
AX, imm16
EAX, imm32
r/m8, imm8
r/m16, imm16
r/m32, imm32
r/m16, imm8
r/m32, imm8
r/m8, r8
r/m16, r16
r/m32, r32
r8, r/m8
r16, r/m16
r32, r/m32
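
The flag behaviour shared by AND, OR and XOR (OF and CF cleared; SF, ZF and PF set from the result) can be sketched in Python as below. The function name `logic_op` and the string selector for the operation are hypothetical; AF is left out because the table marks it as undefined.

```python
def logic_op(op, dst, src, bits=32):
    """Model AND/OR/XOR on `bits`-wide operands; returns (result, flags)."""
    mask = (1 << bits) - 1
    dst &= mask
    src &= mask
    if op == "AND":
        result = dst & src
    elif op == "OR":
        result = dst | src
    elif op == "XOR":
        result = dst ^ src
    else:
        raise ValueError("unsupported operation: " + op)
    flags = {
        "OF": 0,                                             # always cleared
        "CF": 0,                                             # always cleared
        "SF": (result >> (bits - 1)) & 1,
        "ZF": int(result == 0),
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # low byte only
    }
    return result, flags


if __name__ == "__main__":
    # XOR of a value with itself always gives zero and sets ZF
    print(logic_op("XOR", 0x1234, 0x1234))
    print(logic_op("OR", 0x80, 0x01, bits=8))
```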

Shift and Rotate Instructions


Instruction Mnemonic Operands Description Symbolic operations
Shift Left SHL Shifts operand bits left by 1 position: once, CL times or imm8 times, depending on the operand form.
FOR i←1 TO COUNT AND 1FH
(CF, DST) ← ShiftLeft (DST)
NEXT
Shift Right SHR Shifts operand bits right by 1 position: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
(DST,CF) ← ShiftRight (DST)
NEXT
Shift Arithmetic Left SAL Multiplies signed operand by 2: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
DST ← DST * 2
NEXT
Shift Arithmetic Right SAR Divides signed operand by 2: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
DST ← DST / 2
NEXT
Rotate Left ROL Rotates operand bits left by 1 position: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
(CF,DST) ← RotateLeft (DST)
NEXT
Rotate Right ROR Rotates operand bits right by 1 position: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
(DST,CF) ← RotateRight (DST)
NEXT
Rotate thru Carry Left RCL Rotates operand bits and CF flag left by 1 position: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
(DST,CF) ← RotateLeft (DST,CF)
NEXT
Rotate thru Carry Right RCR Rotates operand bits and CF flag right by 1 position: once, CL times or imm8 times.
FOR i←1 TO COUNT AND 1FH
(DST,CF) ← RotateRight (DST,CF)
NEXT
Operands (common to SHL, SHR, SAL, SAR, ROL, ROR, RCL and RCR; COUNT is 1, CL or imm8):
r/m8
r/m8, CL
r/m8, imm8
r/m16
r/m16, CL
r/m16, imm8
r/m32
r/m32, CL
r/m32, imm8
Shift Left Double SHLD Shifts the first operand left by imm8 or CL places while shifting bits from the second operand (r16 or r32) in from the right.
TMP ← DST2; // DST2 is the second operand
FOR i←1 TO COUNT AND 1FH
DST1 ← ShiftLeft (DST1, MSB (TMP));
TMP ← ShiftLeft (TMP);
NEXT
Shift Right Double SHRD Shifts the first operand right by imm8 or CL places while shifting bits from the second operand (r16 or r32) in from the left.
TMP ← DST2; // DST2 is the second operand
FOR i←1 TO COUNT AND 1FH
DST1 ← ShiftRight (LSB (TMP), DST1);
TMP ← ShiftRight (TMP);
NEXT
Operands (common to SHLD and SHRD):
r/m16, r16, imm8
r/m16, r16, CL
r/m32, r32, imm8
r/m32, r32, CL
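
The loop-based symbolic operations above translate almost directly into Python. The sketch below models ROL, RCL and SHLD one bit per iteration, with the count masked by 1FH as in the table; the function names and return conventions are hypothetical. Counts larger than the operand width for 16-bit SHLD are not treated specially here.

```python
def rol(value, count, bits=32):
    """ROL model: returns (result, CF); CF is None when the masked count is 0."""
    mask = (1 << bits) - 1
    value &= mask
    cf = None                                  # CF unchanged if nothing is rotated
    for _ in range(count & 0x1F):
        cf = (value >> (bits - 1)) & 1         # bit leaving on the left
        value = ((value << 1) | cf) & mask     # re-enters on the right
    return value, cf


def rcl(value, cf, count, bits=32):
    """RCL model: rotate through CF; returns (result, CF)."""
    mask = (1 << bits) - 1
    value &= mask
    for _ in range(count & 0x1F):
        new_cf = (value >> (bits - 1)) & 1
        value = ((value << 1) | cf) & mask     # old CF enters on the right
        cf = new_cf
    return value, cf


def shld(dst, src, count, bits=32):
    """SHLD model: shift dst left, filling from the top bits of src."""
    mask = (1 << bits) - 1
    dst &= mask
    tmp = src & mask
    for _ in range(count & 0x1F):
        msb = (tmp >> (bits - 1)) & 1
        dst = ((dst << 1) | msb) & mask
        tmp = (tmp << 1) & mask
    return dst


if __name__ == "__main__":
    print(rol(0x80000001, 1))
    print(rcl(0x80, 0, 1, bits=8))
    print(hex(shld(0x12345678, 0x9ABCDEF0, 8)))
```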

Bit and Byte Instructions


Instruction Mnemonic Operands Description Symbolic operations
Bit Test BT Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand) and stores the value of the bit in CF flag.
CF ← BIT (bit-base, bit-offset)
Bit Test and Complement BTC Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in CF flag, and complements the selected bit in the bit string.
CF ← BIT (bit-base, bit-offset);
BIT (bit-base, bit-offset) ← NOT BIT (bit-base, bit-offset)
Bit Test and Reset BTR Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in CF flag, and clears the selected bit to 0.
CF ← BIT (bit-base, bit-offset);
BIT (bit-base, bit-offset) ← 0
Bit Test and Set BTS Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in CF flag, and sets the selected bit to 1.
CF ← BIT (bit-base, bit-offset);
BIT (bit-base, bit-offset) ← 1
Operands (common to BT, BTC, BTR and BTS):
r/m16,r16
r/m32,r32
r/m16,imm8
r/m32,imm8
Bit Scan Forward BSF r16, r/m16 Searches the source (second) operand for the least significant set bit (1 bit). If one is found, its bit index is stored in the destination (first) operand.
r32, r/m32
IF SRC = 0 THEN
ZF ← 1; // DST is undefined
ELSE
ZF ← 0; TMP ← 0;
WHILE BIT (SRC, TMP) = 0 DO
TMP ← TMP + 1;
END
DST ← TMP
END
Bit Scan Reverse BSR r16, r/m16 Searches the source (second) operand for the most significant set bit (1 bit). If one is found, its bit index is stored in the destination (first) operand.
r32, r/m32
IF SRC = 0 THEN
ZF ← 1; // DST is undefined
ELSE
ZF ← 0; TMP ← OPERANDSIZE - 1;
WHILE BIT (SRC, TMP) = 0 DO
TMP ← TMP - 1;
END
DST ← TMP
END
Conditional Set SETA r/m8 Set byte if above (CF=0 and ZF=0) IF (condition) THEN DST ← 1
Byte SETAE Set byte if above or equal (CF=0) ELSE DST ← 0
SETB Set byte if below (CF=1)
SETBE Set byte if below or equal (CF=1 or ZF=1)
SETC Set byte if carry (CF=1)
SETE Set byte if equal (ZF=1)
SETG Set byte if greater (ZF=0 and SF=OF)
SETGE Set byte if greater or equal (SF=OF)
SETL Set byte if less (SF<>OF)
SETLE Set byte if less or equal (ZF=1 or SF<>OF)
SETNA Set byte if not above (CF=1 or ZF=1)
SETNAE Set byte if not above or equal (CF=1)
SETNB Set byte if not below (CF=0)
SETNBE Set byte if not below or equal (CF=0 and ZF=0)
SETNC Set byte if not carry (CF=0)
SETNE Set byte if not equal (ZF=0)
SETNG Set byte if not greater (ZF=1 or SF<>OF)
SETNGE Set byte if not greater or equal (SF<>OF)
SETNL Set byte if not less (SF=OF)
SETNLE Set byte if not less or equal (ZF=0 and SF=OF)
SETNO Set byte if not overflow (OF=0)
SETNP Set byte if not parity (PF=0)
SETNS Set byte if not sign (SF=0)
SETNZ Set byte if not zero (ZF=0)
SETO Set byte if overflow (OF=1)
SETP Set byte if parity (PF=1)
SETPE Set byte if parity even (PF=1)
SETPO Set byte if parity odd (PF=0)
SETS Set byte if sign (SF=1)
SETZ Set byte if zero (ZF=1)
Logical Compare TEST AL, imm8 Performs a bitwise AND operation on the destination (first) and source TMP ← DST AND SRC;
AX, imm16 (second) operands. Result is not stored, but flags are affected. EFLAGS.OF, .CF ← 00B;
EAX, imm32 SET EFLAGS.SF, .ZF, .PF
r/m8, imm8 // EFLAGS.AF is undefined
r/m16, imm16
r/m32, imm32
r/m8, r8
r/m16, r16
r/m32, r32
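
Two groups from this table lend themselves to a short Python sketch: the bit scans (BSF, BSR) and the condition codes shared by SETcc, CMOVcc and Jcc. The names `bsf`, `bsr`, `CONDITIONS` and `setcc` are hypothetical, flags are passed as a dictionary, and only a subset of the conditions is included. An undefined destination is represented by `None`.

```python
def bsf(src, bits=32):
    """BSF model: returns (index of lowest set bit, ZF); index is None if src is 0."""
    src &= (1 << bits) - 1
    if src == 0:
        return None, 1
    index = 0
    while ((src >> index) & 1) == 0:
        index += 1
    return index, 0


def bsr(src, bits=32):
    """BSR model: returns (index of highest set bit, ZF); index is None if src is 0."""
    src &= (1 << bits) - 1
    if src == 0:
        return None, 1
    index = bits - 1
    while ((src >> index) & 1) == 0:
        index -= 1
    return index, 0


# Condition predicates taken from the SETcc rows above (subset only).
CONDITIONS = {
    "A":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "AE": lambda f: f["CF"] == 0,
    "B":  lambda f: f["CF"] == 1,
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "E":  lambda f: f["ZF"] == 1,
    "NE": lambda f: f["ZF"] == 0,
    "G":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "GE": lambda f: f["SF"] == f["OF"],
    "L":  lambda f: f["SF"] != f["OF"],
    "LE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
}


def setcc(cond, flags):
    """SETcc model: returns 1 if the condition holds for the given flags, else 0."""
    return int(CONDITIONS[cond](flags))


if __name__ == "__main__":
    print(bsf(0b101000), bsr(0b101000))
    # Flags after comparing 0FFFFFFFFH with 1: above when unsigned, less when signed
    flags = {"CF": 0, "ZF": 0, "SF": 1, "OF": 0}
    print(setcc("A", flags), setcc("L", flags))
```

The last two lines show why the "above/below" and "greater/less" families exist side by side: the same flags answer the unsigned and the signed comparison differently.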

Control Transfer Instructions


Instruction Mnemonic Operands Description Symbolic operations
Jump JMP rel8 Jumps near, relative, displacement relative to the next instruction (E)IP ← (E)IP + DST
rel16
rel32
r/m16 Jumps near, absolute indirect, address given in the register or memory (E)IP ← DST
r/m32 location.
ptr16:16 Jumps far, absolute, address given in operand. (E)IP ← DST.Offset;
ptr16:32 CS ← DST.Segment
m16:16 Jumps far, absolute indirect, address given in memory location.
m16:32
Jump if condition JA rel8 Jumps near, relative, if above (CF=0 and ZF=0) IF (condition) THEN (E)IP ← (E)IP + DST;
JAE Jumps near, relative, if above or equal (CF=0)
JB Jumps near, relative, if below (CF=1)
JBE Jumps near, relative, if below or equal (CF=1 or ZF=1)
JC Jumps near, relative, if carry (CF=1)
JE Jumps near, relative, if equal (ZF=1)
JG Jumps near, relative, if greater (ZF=0 and SF=OF)
JGE Jumps near, relative, if greater or equal (SF=OF)
JL Jumps near, relative, if less (SF<>OF)
JLE Jumps near, relative, if less or equal (ZF=1 or SF<>OF)
JNA Jumps near, relative, if not above (CF=1 or ZF=1)
JNAE Jumps near, relative, if not above or equal (CF=1)
JNB Jumps near, relative, if not below (CF=0)
JNBE Jumps near, relative, if not below or equal (CF=0 and ZF=0)
JNC Jumps near, relative, if not carry (CF=0)
JNE Jumps near, relative, if not equal (ZF=0)
JNG Jumps near, relative, if not greater (ZF=1 or SF<>OF)
JNGE Jumps near, relative, if not greater or equal (SF<>OF)
JNL Jumps near, relative, if not less (SF=OF)
JNLE Jumps near, relative, if not less or equal (ZF=0 and SF=OF)
JNO Jumps near, relative, if not overflow (OF=0)
JNP Jumps near, relative, if not parity (PF=0)
JNS Jumps near, relative, if not sign (SF=0)
JNZ Jumps near, relative, if not zero (ZF=0)
JO Jumps near, relative, if overflow (OF=1)
JP Jumps near, relative, if parity (PF=1)
JPE Jumps near, relative, if parity even (PF=1)
JPO Jumps near, relative, if parity odd (PF=0)
JS Jumps near, relative, if sign (SF=1)
JZ Jumps near, relative, if zero (ZF=1)
Jump on (E)CX Zero JCXZ rel8 Jumps near, relative, if CX is zero IF (CX=0) THEN (E)IP ← (E)IP + DST;
JECXZ Jumps near, relative, if ECX is zero IF (ECX=0) THEN (E)IP ← (E)IP + DST;
Loop with counter LOOP rel8 Decrements counter, jumps near, relative, if counter<>0. DEC (E)CX;
IF ((E)CX<>0) THEN (E)IP ← (E)IP + DST;
Loop with counter while LOOPZ rel8 Decrements counter, jumps near, relative, if counter<>0 and ZF=1. DEC (E)CX;
Zero/Equal LOOPE IF ((E)CX<>0 AND ZF=1) THEN (E)IP ← (E)IP + DST;
Loop with counter while LOOPNZ rel8 Decrements counter, jumps near, relative, if counter<>0 and ZF=0. DEC (E)CX;
Not Zero/ Not Equal LOOPNE IF ((E)CX<>0 AND ZF=0) THEN (E)IP ← (E)IP + DST;
Call procedure CALL rel16 Calls near, relative, displacement relative to the next instruction PUSH (E)IP;
rel32 (E)IP ← (E)IP + DST
r/m16 Calls near, absolute indirect, address given in the register or memory PUSH (E)IP;
r/m32 location. (E)IP ← DST
ptr16:16 Calls far, absolute, address given in operand. PUSH CS;
ptr16:32 PUSH (E)IP;
m16:16 Calls far, absolute indirect, address given in memory location. (E)IP ← DST.Offset;
m16:32 CS ← DST.Segment
Return from procedure RET Returns from near or far procedure (depending on procedure kind). POP (E)IP
imm16 Returns from near or far procedure (depending on procedure kind) and POP (E)IP;
pop imm16 bytes from the stack. (E)SP ← (E)SP+SRC
Interrupt call INT imm8 Calls to interrupt or exception handler using interrupt vector specified by interrupt number.
Interrupt 3 call INT 3 Calls to debugger trap.
Interrupt on Overflow INTO Calls to interrupt or exception handler 4 if overflow flag is set to 1.
IF REALMODE THEN
PUSH FLAGS;
EFLAGS.IF, .TF, .AC ← 000B;
PUSH CS;
PUSH IP;
CS ← IDT[DST].Segment;
(E)IP ← IDT[DST].Offset;
ELSE

Interrupt Return IRET Returns from the interrupt or exception handler (16 bits) IF REALMODE THEN
IRETD Returns from the interrupt or exception handler (32 bits) POP (E)IP;
POP CS;
POP TMP;
EFLAGS ← (TMP AND 257FD5H)
OR (EFLAGS AND 1A0000H)
ELSE

Enter procedure ENTER imm16,0 Creates a stack frame for a procedure. The first operand specifies the size of the stack frame (in bytes), the second operand gives the lexical nesting level (0 to 31) of the procedure. It determines the number of stack frame pointers that are copied into the “display area” of the new stack frame from the preceding frame.
imm16,1
imm16, imm8
NestingLevel ← NestingLevel MOD 32;
PUSH (E)BP;
FrameTMP ← (E)SP
IF NestingLevel>0 THEN
FOR I ← 1 TO NestingLevel - 1 DO
(E)BP ← (E)BP - OPERANDSIZE/8;
PUSH [(E)BP]
NEXT
PUSH FrameTMP
END
(E)BP ← FrameTMP;
(E)SP ← (E)SP - Size;
Leave procedure LEAVE Releases the stack frame set up by an earlier ENTER instruction. (E)SP ← (E)BP;
POP (E)BP;
Check Bounds BOUND r16, m16[2] Checks if array index in the first operand is within bounds specified by the IF (REG < MEM[0] OR REG >MEM[1]) THEN
r32, m32[2] second (memory) operand. If not, then a Bound Range exception is raised. RAISE #BR
END
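
Three of the control transfer rules above are easy to get wrong by one: relative targets are measured from the next instruction, LOOP decrements before it tests, and BOUND treats both limits as inclusive. The following Python sketch models them; the names `rel_target`, `loop_iterations` and `bound_ok` are hypothetical, and the counter width is a parameter so that the wrap-around case can be tried with a small register.

```python
def rel_target(next_ip, rel, rel_bits=8, ip_bits=32):
    """Target of a relative jump: next instruction address plus sign-extended displacement."""
    rel &= (1 << rel_bits) - 1
    if rel >> (rel_bits - 1):
        rel -= 1 << rel_bits                   # negative displacement
    return (next_ip + rel) & ((1 << ip_bits) - 1)


def loop_iterations(counter, bits=32):
    """How many times the body of `body: ... LOOP body` runs for a given initial counter."""
    mask = (1 << bits) - 1
    counter &= mask
    iterations = 0
    while True:
        iterations += 1                        # the body runs before LOOP is reached
        counter = (counter - 1) & mask         # DEC (E)CX
        if counter == 0:                       # jump taken only while counter <> 0
            return iterations


def bound_ok(index, lower, upper):
    """BOUND model: True if lower <= index <= upper (no exception would be raised)."""
    return lower <= index <= upper


if __name__ == "__main__":
    # A 2-byte jump at 0FFEH with rel8 = 0FEH (-2) targets itself
    print(hex(rel_target(0x1000, 0xFE)))
    print(loop_iterations(3), loop_iterations(0, bits=8))
    print(bound_ok(5, 0, 9), bound_ok(10, 0, 9))
```

Starting LOOP with a counter of 0 does not skip the loop: the decrement wraps around, which is the case JCXZ and JECXZ are meant to guard against.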

String Instructions
Instruction Mnemonic Operands Description Symbolic operations
Move String item MOVS m8, m8 Moves byte, word or dword from address DS:(E)SI to the byte, word or double word at address ES:(E)DI, and increases or decreases (E)SI and (E)DI (depending on DF flag). Both operands specify only the type of the moved data, not the location. The locations of the operands are always specified by the DS:(E)SI and ES:(E)DI registers.
m16, m16
m32, m32
Move String Byte MOVSB Moves byte from address DS:(E)SI to the byte at address ES:(E)DI, and increases or decreases (E)SI and (E)DI (depending on DF flag).
Move String Word MOVSW Moves word from address DS:(E)SI to the word at address ES:(E)DI, and increases or decreases (E)SI and (E)DI (depending on DF flag).
Move String Dword MOVSD Moves dword from address DS:(E)SI to the dword at address ES:(E)DI, and increases or decreases (E)SI and (E)DI (depending on DF flag).
Symbolic operations (common to MOVS, MOVSB, MOVSW and MOVSD):
ES:[(E)DI] ← DS:[(E)SI];
IF (DF=0) THEN
(E)SI ← (E)SI + SIZEOF(SRC);
(E)DI ← (E)DI + SIZEOF(DST);
ELSE
(E)SI ← (E)SI - SIZEOF(SRC);
(E)DI ← (E)DI - SIZEOF(DST);
END
Repeat Move String REP MOVS m8, m8 Moves (E)CX bytes, words or dwords from address DS:(E)SI to the byte, WHILE (E)CX<>0 DO
item m16, m16 word or double word at address ES:(E)DI, and increases or decreases MOVS DST, SRC;
m32, m32 (E)SI and (E)DI (depending on DF flag). Both operands specify only the (E)CX ← (E)CX - 1
type of the moved data, not the location. The locations of the operands END
are always specified by the DS:(E)SI and ES:(E)DI registers.
Repeat Move String REP MOVSB Moves (E)CX bytes from address DS:(E)SI to the address ES:(E)DI, and WHILE (E)CX<>0 DO
Byte increases or decreases (E)SI and (E)DI (depending on DF flag). MOVS(B|W|D)
Repeat Move String REP MOVSW Moves (E)CX words from address DS:(E)SI to the address ES:(E)DI, and (E)CX ← (E)CX - 1
Word increases or decreases (E)SI and (E)DI (depending on DF flag). END
Repeat Move String REP MOVSD Moves (E)CX dwords from address DS:(E)SI to the address ES:(E)DI, and
Dword increases or decreases (E)SI and (E)DI (depending on DF flag).
Load String item LODS m8 Loads byte from address DS:(E)SI to AL, increases or decreases (E)SI. ACC ← DS:[(E)SI];
m16 Loads word from address DS:(E)SI to AX, increases or decreases (E)SI. IF (DF=0) THEN
m32 Loads dword from address DS:(E)SI to EAX, increases or decreases (E)SI (E)SI ← (E)SI + SIZEOF(SRC);
Load String Byte LODSB Loads byte from address DS:(E)SI to AL, increases or decreases (E)SI. ELSE
Load String Word LODSW Loads word from address DS:(E)SI to AX, increases or decreases (E)SI. (E)SI ← (E)SI - SIZEOF(SRC);
Load String Dword LODSD Loads dword from address DS:(E)SI to EAX, increases or decreases (E)SI. END
Repeat Load String item REP LODS m8 Loads byte (E)CX times from address DS:(E)SI to AL. WHILE (E)CX<>0 DO
m16 Loads word (E)CX times from address DS:(E)SI to AX. LODS SRC;
m32 Loads dword (E)CX times from address DS:(E)SI to EAX. (E)CX ← (E)CX - 1
END
Repeat Load String Byte REP LODSB Loads byte (E)CX times from address DS:(E)SI to AL. WHILE (E)CX<>0 DO
LODS(B|W|D)
Repeat Load String REP LODSW Loads word (E)CX times from address DS:(E)SI to AX. (E)CX ← (E)CX - 1
Word END
Repeat Load String REP LODSD Loads dword (E)CX times from address DS:(E)SI to EAX.
Dword
Store String item STOS m8 Stores byte from AL to address ES:(E)DI, increases or decreases (E)DI. ES:[(E)DI] ← ACC;
m16 Stores word from AX to address ES:(E)DI, increases or decreases (E)DI. IF (DF=0) THEN
m32 Stores dword from EAX to address ES:(E)DI, increases or decreases (E)DI. (E)DI ← (E)DI + SIZEOF(DST);
Store String Byte STOSB Stores byte from AL to address ES:(E)DI, increases or decreases (E)DI. ELSE
Store String Word STOSW Stores word from AX to address ES:(E)DI, increases or decreases (E)DI. (E)DI ← (E)DI - SIZEOF(DST);
Store String Dword STOSD Stores dword from EAX to address ES:(E)DI, increases or decreases END
(E)DI.
Repeat Store String item REP STOS m8 Stores byte (E)CX times from AL to address ES:(E)DI. WHILE (E)CX<>0 DO
m16 Stores word (E)CX times from AX to address ES:(E)DI. STOS DST;
m32 Stores dword (E)CX times from EAX to address ES:(E)DI. (E)CX ← (E)CX - 1
END
Repeat Store String REP STOSB Stores byte (E)CX times from AL to address ES:(E)DI. WHILE (E)CX<>0 DO
Byte STOS(B|W|D)
Repeat Store String REP STOSW Stores word (E)CX times from AX to address ES:(E)DI. (E)CX ← (E)CX - 1
Word END
Repeat Store String REP STOSD Stores dword (E)CX times from EAX to address ES:(E)DI.
Dword
Compare String item CMPS m8, m8 Compares byte, word or dword at address DS:(E)SI with byte, word or TMP ← DS:[(E)SI] - ES:[(E)DI];
m16, m16 dword at address ES:(E)DI and sets the status flags accordingly. Both SET EFLAGS;
m32, m32 operands specify only the type of the compared data, not the location. The IF (DF=0) THEN
locations of the operands are always specified by the DS:(E)SI and (E)SI ← (E)SI + SIZEOF(SRC);
ES:(E)DI registers.
(E)DI ← (E)DI + SIZEOF(DST);
Compare String Byte CMPSB Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets
ELSE
the status flags accordingly.
(E)SI ← (E)SI - SIZEOF(SRC);
Compare String Word CMPSW Compares word at address DS:(E)SI with word at address ES:(E)DI and
sets the status flags accordingly. (E)DI ← (E)DI - SIZEOF(DST);
Compare String Dword CMPSD Compares dword at address DS:(E)SI with dword at address ES:(E)DI and END
sets the status flags accordingly.
Repeat Compare String REPE CMPS m8, m8 Repeats (E)CX times comparing byte, word or dword at address DS:(E)SI WHILE (E)CX<>0 DO
item until Equal / Zero REPZ CMPS m16, m16 with byte, word or double word at address ES:(E)DI until ZF flag is set to (E)CX ← (E)CX - 1;
m32, m32 0. CMPS DST,SRC;
UNTIL ZF=0
Repeat Compare String REPE CMPSB Repeats (E)CX times comparing byte at address DS:(E)SI with byte at WHILE (E)CX<>0 DO
Byte until Equal / Zero REPZ CMPSB address ES:(E)DI until ZF flag is set to 0. (E)CX ← (E)CX - 1;
Repeat Compare String REPE CMPSW Repeats (E)CX times comparing word at address DS:(E)SI with word at CMPS(B|W|D)
Word until Equal / Zero REPZ CMPSW address ES:(E)DI until ZF flag is set to 0.
Repeat Compare String REPE CMPSD Repeats (E)CX times comparing dword at address DS:(E)SI with dword at UNTIL ZF=0
Dword until Equal / REPZ CMPSD address ES:(E)DI until ZF flag is set to 0.
Zero
Repeat Compare String REPNE CMPS m8, m8 Repeats (E)CX times comparing byte, word or dword at address DS:(E)SI WHILE (E)CX<>0 DO
item until Not Equal / REPNZ CMPS m16, m16 with byte, word or double word at address ES:(E)DI until ZF flag is set to (E)CX ← (E)CX - 1;
Not Zero m32, m32 1. CMPS DST,SRC;
UNTIL ZF=1
Repeat Compare String REPNE CMPSB Repeats (E)CX times comparing byte at address DS:(E)SI with byte at WHILE (E)CX<>0 DO
Byte until Not Equal REPNZ CMPSB address ES:(E)DI until ZF flag is set to 1. (E)CX ← (E)CX - 1;
/Not Zero CMPS(B|W|D)
Repeat Compare String REPNE CMPSW Repeats (E)CX times comparing word at address DS:(E)SI with word at UNTIL ZF=1
Word until Not Equal / REPNZ CMPSW address ES:(E)DI until ZF flag is set to 1.
Not Zero
Repeat Compare String REPNE CMPSD Repeats (E)CX times comparing dword at address DS:(E)SI with dword at
Dword until Not Equal REPNZ CMPSD address ES:(E)DI until ZF flag is set to 1.
Scan String item SCAS m8 Compares AL with byte at ES:(E)DI and sets status flags. TMP ← ACC - ES:[(E)DI];
m16 Compares AX with word at ES:(E)DI and sets status flags. SET EFLAGS;
m32 Compares EAX with dword at ES:(E)DI and sets status flags. IF (DF=0) THEN
Scan String Byte SCASB Compares AL with byte at ES:(E)DI and sets status flags. (E)DI ← (E)DI + SIZEOF(DST);
Scan String Word SCASW Compares AX with word at ES:(E)DI and sets status flags. ELSE
Scan String Dword SCASD Compares EAX with dword at ES:(E)DI and sets status flags. (E)DI ← (E)DI - SIZEOF(DST);
END
Repeat Scan String item REPE SCAS m8 Repeats (E)CX times comparing accumulator with byte, word or dword at WHILE (E)CX<>0 DO
until Equal / Zero REPZ SCAS m16 address ES:(E)DI until ZF flag is set to 0. (E)CX ← (E)CX - 1;
m32 SCAS DST;
UNTIL ZF=0
Repeat Scan String Byte REPE SCASB Repeats (E)CX times comparing AL with byte at address ES:(E)DI until WHILE (E)CX<>0 DO
until Equal / Zero REPZ SCASB ZF flag is set to 0. (E)CX ← (E)CX - 1;
Repeat Scan String REPE SCASW Repeats (E)CX times comparing AX with word at address ES:(E)DI until SCAS(B|W|D);
Word until Equal / Zero REPZ SCASW ZF flag is set to 0. UNTIL ZF=0
Repeat Scan String REPE SCASD Repeats (E)CX times comparing EAX with dword at address ES:(E)DI
Dword until Equal / REPZ SCASD until ZF flag is set to 0.
Zero
Repeat Scan String item REPNE SCAS m8 Repeats (E)CX times comparing accumulator with byte, word or dword at WHILE (E)CX<>0 DO
until Not Equal / Not REPNZ SCAS m16 address ES:(E)DI until ZF flag is set to 1. (E)CX ← (E)CX - 1;
Zero m32 SCAS DST;
UNTIL ZF=1
Repeat Scan String Byte REPNE SCASB Repeats (E)CX times comparing AL with byte at address ES:(E)DI until WHILE (E)CX<>0 DO
until Not Equal / Not REPNZ SCASB ZF flag is set to 1. (E)CX ← (E)CX - 1;
Zero SCAS(B|W|D);
Repeat Scan String REPNE SCASW Repeats (E)CX times comparing AX with word at address ES:(E)DI until UNTIL ZF=1
Word until Not Equal / REPNZ SCASW ZF flag is set to 1.
Not Zero
Repeat Scan String REPNE SCASD Repeats (E)CX times comparing EAX with dword at address ES:(E)DI
Dword until Not Equal / REPNZ SCASD until ZF flag is set to 1.
Not Zero
Input String item INS m8, DX Inputs byte, word or dword from the I/O port specified in DX into memory ES:[(E)DI] ← Port(DX);
m16, DX location specified with ES:(E)DI. Increments or decrements (E)DI. IF (DF=0) THEN
m32, DX (E)DI ← (E)DI + SIZEOF(DST);
Input String Byte INSB Inputs byte from the I/O port specified in DX into memory location specified with ELSE
ES:(E)DI. Increments or decrements (E)DI. (E)DI ← (E)DI - SIZEOF(DST);
Input String Word INSW Inputs word from the I/O port specified in DX into memory location specified with END
ES:(E)DI. Increments or decrements (E)DI.
Input String Dword INSD Inputs dword from the I/O port specified in DX into memory location specified
with ES:(E)DI. Increments or decrements (E)DI.
Repeat Input String item REP INS m8, DX Inputs (E)CX bytes, words or dwords from the I/O port specified in DX into WHILE (E)CX<>0 DO
m16, DX memory at address specified with ES:(E)DI. Increments or decrements INS DST, DX;
m32, DX (E)DI. (E)CX ← (E)CX - 1
END
Repeat Input String REP INSB Inputs (E)CX bytes from the I/O port specified in DX into memory at address WHILE (E)CX<>0 DO
Byte specified with ES:(E)DI. Increments or decrements (E)DI. INS(B|W|D);
Repeat Input String REP INSW Inputs (E)CX words from the I/O port specified in DX into memory at address (E)CX ← (E)CX - 1
Word specified with ES:(E)DI. Increments or decrements (E)DI. END
Repeat Input String REP INSD Inputs (E)CX dwords from the I/O port specified in DX into memory at address
Dword specified with ES:(E)DI. Increments or decrements (E)DI.
Output String item OUTS DX, m8 Outputs byte, word or dword from memory location specified with Port(DX) ← DS:[(E)SI];
DX, m16 DS:(E)SI to the I/O port specified in DX. Increments or decrements (E)SI. IF (DF=0) THEN
DX, m32 (E)SI ← (E)SI + SIZEOF(SRC);
Output String Byte OUTSB Outputs byte from memory location specified with DS:(E)SI to the I/O port ELSE
specified in DX. Increments or decrements (E)SI. (E)SI ← (E)SI - SIZEOF(SRC);
Output String Word OUTSW Outputs word from memory location specified with DS:(E)SI to the I/O port END
specified in DX. Increments or decrements (E)SI.
Output String Dword OUTSD Outputs dword from memory location specified with DS:(E)SI to the I/O port
specified in DX. Increments or decrements (E)SI.
Repeat Output String REP OUTS DX, m8 Outputs (E)CX bytes, words or dwords from memory location specified WHILE (E)CX<>0 DO
item DX, m16 with DS:(E)SI to the I/O port specified in DX. Increments or decrements OUTS DX, SRC;
DX, m32 (E)SI. (E)CX ← (E)CX - 1
END
Repeat Output String REP OUTSB Outputs (E)CX bytes from memory location specified with DS:(E)SI to WHILE (E)CX<>0 DO
Byte the I/O port specified in DX. Increments or decrements (E)SI. OUTS(B|W|D);
Repeat Output String REP OUTSW Outputs (E)CX words from memory location specified with DS:(E)SI to (E)CX ← (E)CX - 1
Word the I/O port specified in DX. Increments or decrements (E)SI. END
Repeat Output String REP OUTSD Outputs (E)CX dwords from memory location specified with DS:(E)SI to
Dword the I/O port specified in DX. Increments or decrements (E)SI.
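The repeat loops in the table can be modelled in a few lines of Python. This is only an illustrative sketch: the class `Machine` and its methods are hypothetical names, memory is a flat `bytearray` with segmentation ignored, and only the byte forms are shown.

```python
class Machine:
    """Toy model: flat memory, no segmentation, registers as plain ints."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esi = self.edi = self.ecx = 0
        self.al = 0
        self.df = 0  # direction flag: 0 = increment, 1 = decrement
        self.zf = 0

    def _step(self):
        # Byte forms move the index registers by one item of size 1.
        return -1 if self.df else 1

    def rep_movsb(self):
        # WHILE ECX <> 0 DO MOVSB; ECX <- ECX - 1 END
        while self.ecx != 0:
            self.mem[self.edi] = self.mem[self.esi]
            self.esi += self._step()
            self.edi += self._step()
            self.ecx -= 1

    def repne_scasb(self):
        # Repeat while ECX <> 0; stop once a comparison sets ZF = 1.
        while self.ecx != 0:
            self.zf = 1 if self.al == self.mem[self.edi] else 0
            self.edi += self._step()
            self.ecx -= 1
            if self.zf == 1:
                break


m = Machine()
m.mem[0:5] = b"hello"
m.esi, m.edi, m.ecx = 0, 16, 5
m.rep_movsb()            # copy 5 bytes forward (DF = 0)

m.al, m.edi, m.ecx = ord("l"), 16, 5
m.repne_scasb()          # scan the copy for the first 'l'
found_at = m.edi - 1     # EDI has already advanced past the match
```

Note that after a successful scan the index register points one item past the match, which is why the sketch subtracts one.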

Flag Control Instructions


Instruction Mnemonic Operands Description Symbolic operations
Clear Carry Flag CLC Clears CF flag in the EFLAGS register. CF ← 0
Complement Carry CMC Complements CF flag in the EFLAGS register. CF ← NOT CF
Flag
Set Carry Flag STC Sets CF flag in the EFLAGS register CF ← 1
Clear Direction flag CLD Clears DF Flag in the EFLAGS register. When the DF flag is set to 0, DF ← 0
string instructions increment the index registers.
Set Direction flag STD Sets DF Flag in the EFLAGS register to 1. When the DF flag is set to 1, DF ← 1
string instructions decrement the index registers.
Clear Interrupt Flag CLI Clears interrupt flag; interrupts disabled when IF flag is cleared. IF ← 0
Set Interrupt Flag STI Sets interrupt flag; interrupts enabled when IF flag is set to 1. IF ← 1
Push Flags PUSHF Pushes FLAGS (16 bits). PUSH FLAGS
PUSHFD Pushes EFLAGS. PUSH (EFLAGS AND 00FCFFFFH)
// VM and RF flags are cleared on the stack
Pop Flags POPF Pops FLAGS (16 bits). POP FLAGS
POPFD Pops EFLAGS. POP EFLAGS
// All non-reserved flags except VIP, VIF and VM can be modified.
// VIP and VIF are cleared; VM is unaffected
Load Flags into AH LAHF Moves the low byte of the EFLAGS register into AH register. Reserved AH ← EFLAGS[SF:ZF:0:AF:0:PF:1:CF]
bits (1, 3, 5) are set to 1, 0, 0 respectively.
Store AH into Flags SAHF Moves AH register into the low byte of the EFLAGS. Reserved bits (1, 3, EFLAGS[SF:ZF:0:AF:0:PF:1:CF] ← AH
5) are ignored.
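The bit-level effect of LAHF, SAHF and PUSHFD can be sketched with integer masks. The function names below are hypothetical and EFLAGS is treated as a plain integer; the bit positions (CF=0, PF=2, AF=4, ZF=6, SF=7, RF=16, VM=17) are the architectural ones.

```python
CF, PF, AF, ZF, SF = 0, 2, 4, 6, 7      # bit positions in EFLAGS
RF, VM = 16, 17


def lahf(eflags):
    """Return the AH value LAHF would produce for the given EFLAGS."""
    ah = eflags & 0xFF
    ah |= 1 << 1                        # reserved bit 1 reads as 1
    ah &= ~((1 << 3) | (1 << 5))        # reserved bits 3 and 5 read as 0
    return ah & 0xFF


def sahf(eflags, ah):
    """Return EFLAGS after SAHF: only SF, ZF, AF, PF, CF come from AH."""
    mask = (1 << SF) | (1 << ZF) | (1 << AF) | (1 << PF) | (1 << CF)
    return (eflags & ~mask) | (ah & mask)


def pushfd_image(eflags):
    """Value PUSHFD places on the stack: VM and RF are cleared."""
    return eflags & 0x00FCFFFF


flags = (1 << ZF) | (1 << CF) | (1 << VM)
ah = lahf(flags)                 # ZF, CF and the always-one bit 1
stacked = pushfd_image(flags)    # VM removed from the stacked copy
```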

Segment Register Instructions


Instruction Mnemonic Operands Description Symbolic operations
Load far pointer LDS r16, m16:16 Loads DS:r16 with far pointer from memory. DS ← SRC.Segment;
using DS r32, m16:32 Loads DS:r32 with far pointer from memory. DST ← SRC.Offset;
Load far pointer LES r16, m16:16 Loads ES:r16 with far pointer from memory. ES ← SRC.Segment;
using ES r32, m16:32 Loads ES:r32 with far pointer from memory. DST ← SRC.Offset;
Load far pointer LFS r16, m16:16 Loads FS:r16 with far pointer from memory. FS ← SRC.Segment;
using FS r32, m16:32 Loads FS:r32 with far pointer from memory. DST ← SRC.Offset;
Load far pointer LGS r16, m16:16 Loads GS:r16 with far pointer from memory. GS ← SRC.Segment;
using GS r32, m16:32 Loads GS:r32 with far pointer from memory. DST ← SRC.Offset;
Load far pointer LSS r16, m16:16 Loads SS:r16 with far pointer from memory. SS ← SRC.Segment;
using SS r32, m16:32 Loads SS:r32 with far pointer from memory. DST ← SRC.Offset;
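A far pointer in memory stores the offset at the lower address, followed by the 16-bit segment selector. The sketch below decodes both layouts with Python's `struct` module; the helper names are hypothetical, and memory is modelled as a little-endian `bytearray`.

```python
import struct


def load_far_pointer32(mem, addr):
    """Read a far pointer with a 32-bit offset followed by a 16-bit selector."""
    offset, segment = struct.unpack_from("<IH", mem, addr)
    return segment, offset


def load_far_pointer16(mem, addr):
    """Read a far pointer with a 16-bit offset followed by a 16-bit selector."""
    offset, segment = struct.unpack_from("<HH", mem, addr)
    return segment, offset


mem = bytearray(16)
struct.pack_into("<IH", mem, 4, 0x00401000, 0x0023)
ds, ebx = load_far_pointer32(mem, 4)     # models LDS EBX, [mem + 4]
```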

Miscellaneous Instructions
Instruction Mnemonic Operands Description Symbolic operations
Load effective LEA r16, m Stores effective address for m in the destination register. DST ← EffectiveAddress(m)
address r32, m
No Operation NOP Do nothing.
Undefined UD2 Raises invalid opcode exception.
instruction
Table Look-up XLAT m8 Set AL to memory byte DS:[(E)BX + unsigned AL] AL ← DS:[(E)BX + ZeroExtend(AL)]
Translation XLATB
CPU Identification CPUID Returns processor identification and feature information to the EAX,
EBX, ECX and EDX registers, according to the input value entered
initially in the EAX register
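As a rough illustration of XLAT and LEA, the sketch below treats the translation table as a `bytes` object standing in for memory at DS:(E)BX and computes an effective address from the usual base, index, scale and displacement components. Both function names and their parameters are assumptions made for this example.

```python
def xlat(table, al):
    """AL <- table[ZeroExtend(AL)]; `table` stands for memory at DS:(E)BX."""
    return table[al & 0xFF]


def lea(base=0, index=0, scale=1, disp=0, bits=32):
    """Effective address = base + index*scale + disp, truncated to `bits`."""
    return (base + index * scale + disp) & ((1 << bits) - 1)


hex_digits = b"0123456789ABCDEF"
al = xlat(hex_digits, 0x0B)                          # ASCII code of 'B'
addr = lea(base=0x1000, index=3, scale=4, disp=8)    # 0x1000 + 12 + 8
```

LEA performs only the address arithmetic and never accesses memory, which is why the model needs no memory argument.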
FPU Instruction Set

Data Transfer Instructions


Instruction Mnemonic Operands Description Symbolic operations
FPU Load FLD m32fp Pushes the source operand onto the FPU stack. FPU.Top ← (FPU.Top - 1) MOD 8;
m64fp ST(0) ← SRC
m80fp
ST(i)
FPU Store FST m32fp Stores ST(0) to destination operand. DST ← ST(0)
m64fp
ST(i)
FPU Store and Pop FSTP m32fp Stores ST(0) to destination operand and pops the register stack DST ← ST(0);
m64fp FPU.Pop
m80fp
ST(i)
FPU Integer Load FILD m16int Converts the signed integer value from memory into extended FPU.Top ← (FPU.Top - 1) MOD 8;
m32int floating-point and pushes it onto the FPU stack. ST(0) ← IntToExtended(SRC)
m64int
FPU Integer Store FIST m16int Converts the value in the ST(0) to the signed integer value and DST ← ExtendedToInt(ST(0));
m32int stores the result in the destination operand.
FPU Integer Store and Pop FISTP m16int Converts the value in the ST(0) to the signed integer value, DST ← ExtendedToInt(ST(0));
m32int stores the result in the destination operand, and pops register FPU.Pop;
m64int stack.
FPU BCD Load FBLD m80bcd Converts BCD value from memory to floating-point and pushes FPU.Top ← (FPU.Top - 1) MOD 8;
it onto the FPU stack. ST(0) ← BCDToExtended(SRC)
FPU BCD Store and Pop FBSTP m80bcd Converts the value in the ST(0) to an 18-digit packed BCD, DST ← ExtendedToBCD(ST(0));
stores the result in memory, and pops the register stack. FPU.Pop;
FPU Exchange FXCH ST(i) Exchanges the contents of the registers ST(0) and ST(i). TMP ← ST(0); ST(0) ← SRC; SRC ← TMP
Exchanges the contents of the registers ST(0) and ST(1). TMP ← ST(0); ST(0) ← ST(1); ST(1) ← TMP
FPU Move if Equal FCMOVE ST(0), ST(i) Moves ST(i) to ST(0) when ZF=1. IF (condition) THEN ST(0) ← ST(i)
FPU Move if Below FCMOVB ST(0), ST(i) Moves ST(i) to ST(0) when CF=1.
FPU Move if Below or Equal FCMOVBE ST(0), ST(i) Moves ST(i) to ST(0) when CF=1 or ZF=1.
FPU Move if Unordered FCMOVU ST(0), ST(i) Moves ST(i) to ST(0) when PF=1.
FPU Move if Not Equal FCMOVNE ST(0), ST(i) Moves ST(i) to ST(0) when ZF=0.
FPU Move if Not Below FCMOVNB ST(0), ST(i) Moves ST(i) to ST(0) when CF=0.
FPU Move if Not Below or Equal FCMOVNBE ST(0), ST(i) Moves ST(i) to ST(0) when CF=0 and ZF=0.
FPU Move if Not Unordered FCMOVNU ST(0), ST(i) Moves ST(i) to ST(0) when PF=0.
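The push, pop and exchange operations in this table all work relative to the Top field. A minimal Python model of that register stack is sketched below; `X87Stack` and its methods are hypothetical names, values are ordinary Python floats rather than 80-bit extended values, and tag bits are not modelled.

```python
class X87Stack:
    """Toy model of the eight-register x87 stack (values only, no tags)."""

    def __init__(self):
        self.regs = [0.0] * 8   # physical registers R0..R7
        self.top = 0            # Top field of the status word

    def _phys(self, i):
        # ST(i) maps to physical register (Top + i) MOD 8.
        return (self.top + i) % 8

    def st(self, i):
        return self.regs[self._phys(i)]

    def fld(self, value):
        # FPU.Top <- (FPU.Top - 1) MOD 8; ST(0) <- SRC
        self.top = (self.top - 1) % 8
        self.regs[self.top] = float(value)

    def fstp(self):
        # DST <- ST(0); FPU.Pop
        value = self.regs[self.top]
        self.top = (self.top + 1) % 8
        return value

    def fxch(self, i=1):
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]

    def fcmove(self, i, zf):
        # IF ZF = 1 THEN ST(0) <- ST(i)
        if zf == 1:
            self.regs[self._phys(0)] = self.st(i)


fpu = X87Stack()
fpu.fld(1.5)      # Top: 0 -> 7
fpu.fld(2.5)      # Top: 7 -> 6, ST(0) = 2.5, ST(1) = 1.5
fpu.fxch()        # ST(0) = 1.5, ST(1) = 2.5
```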

Floating-Point Basic Arithmetic Instructions


Instruction Mnemonic Operands Description Symbolic operations
FPU Add FADD m32fp Adds single-precision floating-point value from memory to ST(0) ST(0) ← ST(0) + SingleToExtended(SRC)
m64fp Adds double-precision floating-point value from memory to ST(0) ST(0) ← ST(0) + DoubleToExtended(SRC)
ST(0), ST(i) Adds ST(i) to ST(0) and stores result in ST(0) ST(0) ← ST(0) + ST(i)
ST(i), ST(0) Adds ST(0) to ST(i) and stores result in ST(i) ST(i) ← ST(i) + ST(0)
FPU Add and Pop FADDP ST(i), ST(0) Adds ST(0) to ST(i), stores result in ST(i), and pops the register stack ST(i) ← ST(i) + ST(0); FPU.Pop;
FADDP Adds ST(0) to ST(1), stores result in ST(1), and pops the register stack ST(1) ← ST(1) + ST(0); FPU.Pop;
FPU Integer Add FIADD m16int Adds word signed integer from memory to ST(0) ST(0) ← ST(0) + IntToExtended(SRC)
m32int Adds double-word signed integer from memory to ST(0) ST(0) ← ST(0) + IntToExtended(SRC)
FPU Subtract FSUB m32fp Subtracts single-precision floating-point value from memory from ST(0) ST(0) ← ST(0) - SingleToExtended(SRC)
m64fp Subtracts double-precision floating-point value from memory from ST(0) ST(0) ← ST(0) - DoubleToExtended(SRC)
ST(0), ST(i) Subtracts ST(i) from ST(0) and stores result in ST(0) ST(0) ← ST(0) - ST(i)
ST(i), ST(0) Subtracts ST(0) from ST(i) and stores result in ST(i) ST(i) ← ST(i) - ST(0)
FPU Subtract and Pop FSUBP ST(i), ST(0) Subtracts ST(0) from ST(i), stores result in ST(i), and pops the register stack ST(i) ← ST(i) - ST(0); FPU.Pop;
FSUBP Subtracts ST(0) from ST(1), stores result in ST(1), and pops the register stack ST(1) ← ST(1) - ST(0); FPU.Pop;
FPU Integer Subtract FISUB m16int Subtracts word signed integer from memory from ST(0) ST(0) ← ST(0) - IntToExtended(SRC)
FISUB m32int Subtracts double-word signed integer from memory from ST(0) ST(0) ← ST(0) - IntToExtended(SRC)
FPU Subtract Reverse FSUBR m32fp Subtracts ST(0) from single-precision floating-point value from memory and stores ST(0) ← SingleToExtended(SRC) - ST(0)
result in ST(0)
m64fp Subtracts ST(0) from double-precision floating-point value from memory and stores ST(0) ← DoubleToExtended(SRC) - ST(0)
result in ST(0)
ST(0), ST(i) Subtracts ST(0) from ST(i) and stores result in ST(0) ST(0) ← ST(i) - ST(0)
ST(i), ST(0) Subtracts ST(i) from ST(0) and stores result in ST(i) ST(i) ← ST(0) - ST(i)
FPU Subtract Reverse FSUBRP ST(i), ST(0) Subtracts ST(i) from ST(0), stores result in ST(i), and pops the register stack ST(i) ← ST(0) - ST(i); FPU.Pop;
and Pop FSUBRP Subtracts ST(1) from ST(0), stores result in ST(1), and pops the register stack ST(1) ← ST(0) - ST(1); FPU.Pop;
FPU Integer Subtract FISUBR m16int Subtracts ST(0) from word signed integer from memory and stores result in ST(0) ← IntToExtended(SRC) - ST(0)
Reverse ST(0)
FISUBR m32int Subtracts ST(0) from double-word signed integer from memory and stores result in ST(0) ← IntToExtended(SRC) - ST(0)
ST(0)
FPU Multiply FMUL m32fp Multiplies ST(0) by single-precision floating-point value from memory. ST(0) ← ST(0) * SingleToExtended(SRC)
m64fp Multiplies ST(0) by double-precision floating-point value from memory. ST(0) ← ST(0) * DoubleToExtended(SRC)
ST(0), ST(i) Multiplies ST(0) by ST(i) and stores result in ST(0). ST(0) ← ST(0) * ST(i)
ST(i), ST(0) Multiplies ST(i) by ST(0) and stores result in ST(i). ST(i) ← ST(i) * ST(0)
FPU Multiply and Pop FMULP ST(i), ST(0) Multiplies ST(i) by ST(0), stores result in ST(i), and pops the register stack. ST(i) ← ST(i) * ST(0); FPU.Pop;
FMULP Multiplies ST(1) by ST(0), stores result in ST(1), and pops the register stack. ST(1) ← ST(1) * ST(0); FPU.Pop;
FPU Integer Multiply FIMUL m16int Multiplies ST(0) by word signed integer from memory. ST(0) ← ST(0) * IntToExtended(SRC)
m32int Multiplies ST(0) by double-word signed integer from memory. ST(0) ← ST(0) * IntToExtended(SRC)
FPU Divide FDIV m32fp Divides ST(0) by single-precision floating-point value from memory. ST(0) ← ST(0) / SingleToExtended(SRC)
m64fp Divides ST(0) by double-precision floating-point value from memory. ST(0) ← ST(0) / DoubleToExtended(SRC)
ST(0), ST(i) Divides ST(0) by ST(i) and stores result in ST(0). ST(0) ← ST(0) / ST(i)
ST(i), ST(0) Divides ST(i) by ST(0) and stores result in ST(i). ST(i) ← ST(i) / ST(0)
FPU Divide and Pop FDIVP ST(i), ST(0) Divides ST(i) by ST(0), stores result in ST(i), and pops the register stack. ST(i) ← ST(i) / ST(0); FPU.Pop;
FDIVP Divides ST(1) by ST(0), stores result in ST(1), and pops the register stack. ST(1) ← ST(1) / ST(0); FPU.Pop;
FPU Integer Divide FIDIV m16int Divides ST(0) by word signed integer from memory. ST(0) ← ST(0) / IntToExtended(SRC)
m32int Divides ST(0) by double-word signed integer from memory. ST(0) ← ST(0) / IntToExtended(SRC)
FPU Divide Reverse FDIVR m32fp Divides single-precision floating-point value from memory by ST(0). ST(0) ← SingleToExtended(SRC) / ST(0)
m64fp Divides double-precision floating-point value from memory by ST(0). ST(0) ← DoubleToExtended(SRC) / ST(0)
ST(0), ST(i) Divides ST(i) by ST(0) and stores result in ST(0). ST(0) ← ST(i) / ST(0)
ST(i), ST(0) Divides ST(0) by ST(i) and stores result in ST(i). ST(i) ← ST(0) / ST(i)
FPU Divide Reverse FDIVRP ST(i), ST(0) Divides ST(0) by ST(i), stores result in ST(i), and pops the register stack. ST(i) ← ST(0) / ST(i); FPU.Pop;
and Pop FDIVRP Divides ST(0) by ST(1), stores result in ST(1), and pops the register stack. ST(1) ← ST(0) / ST(1); FPU.Pop;
FPU Integer Divide FIDIVR m16int Divides word signed integer from memory by ST(0). ST(0) ← IntToExtended(SRC) / ST(0)
Reverse m32int Divides double-word signed integer from memory by ST(0). ST(0) ← IntToExtended(SRC) / ST(0)
FPU Partial Remainder FPREM Replaces ST(0) with partial remainder obtained from dividing ST(0) by ST(1). ST(0) ← PartialRemainder (ST(0),ST(1))
FPU IEEE Partial Remainder FPREM1 Replaces ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1). ST(0) ← IEEERemainder (ST(0),ST(1))
The quotient of ST(0) divided by ST(1) is rounded to the nearest integer.
FPU Absolute Value FABS Replaces ST(0) with its absolute value. ST(0) ← Abs( ST(0))
FPU Change Sign FCHS Complements the sign bit of ST(0). ST(0) ← - ST(0)
FPU Round to Integer FRNDINT Rounds ST(0) to an integer. ST(0) ← Round (ST(0))
FPU Scale FSCALE Scales ST(0) by ST(1). Truncates the value in the ST(1) to an integral value (toward ST(0) ← ST(0) *Base2Power(Truncate(ST(1)))
0) and adds that value to the exponent of ST(0)
FPU Square Root FSQRT Replaces ST(0) with its square root. ST(0) ← SquareRoot(ST(0))
FPU Extract Exponent FXTRACT Separates value in ST(0) into exponent and significand, stores exponent in ST(0) and TMP ← Significand (ST(0))
and Significand pushes the significand onto the register stack ST(0) ← Exponent (ST(0))
FPU.Top ← (FPU.Top - 1) MOD 8;
ST(0) ← TMP;
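The difference between the two remainder instructions, and the behaviour of FSCALE and FXTRACT, can be illustrated with Python's `math` module. This is a sketch on double-precision floats, not a bit-exact model: the function names are hypothetical, `fprem` returns the fully reduced result (the hardware may need several iterations to reach it), and `fxtract` assumes a nonzero finite input.

```python
import math


def fprem(st0, st1):
    """Remainder with the quotient truncated toward zero (FPREM, fully reduced)."""
    return math.fmod(st0, st1)


def fprem1(st0, st1):
    """IEEE 754 remainder: quotient rounded to nearest, ties to even (FPREM1)."""
    return math.remainder(st0, st1)


def fscale(st0, st1):
    """ST(0) * 2**trunc(ST(1))."""
    return math.ldexp(st0, math.trunc(st1))


def fxtract(st0):
    """Return (exponent, significand) with 1 <= |significand| < 2."""
    m, e = math.frexp(st0)        # st0 == m * 2**e with 0.5 <= |m| < 1
    return float(e - 1), m * 2.0


r_trunc = fprem(7.0, 2.0)     # quotient 3 (truncated): 7 - 3*2
r_ieee = fprem1(7.0, 2.0)     # quotient 4 (nearest even): 7 - 4*2
scaled = fscale(3.0, 2.9)     # 2.9 truncates to 2, so 3 * 2**2
exponent, significand = fxtract(12.0)   # 12 = 1.5 * 2**3
```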

Floating-Point Comparison Instructions


Instruction Mnemonic Operands Description Symbolic operations
FPU Compare FCOM m32fp Compares single-precision floating-point value from memory with ST(0) CASE OF
m64fp Compares double-precision floating-point value from memory with ST(0) ST(0) > ST(i): FPU.C3, .C2, .C0 ← 000B;
ST(i) Compares ST(i) with ST(0) ST(0) < ST(i): FPU.C3, .C2, .C0 ← 001B;
Compares ST(1) with ST(0) ST(0) = ST(i): FPU.C3, .C2, .C0 ← 100B;
FPU Compare and Pop FCOMP m32fp Compares single-precision floating-point value from memory with ST(0) and pops. Unordered: FPU.C3, .C2, .C0 ← 111B; // optionally
m64fp Compares double-precision floating-point value from memory with ST(0) and pops. END
ST(i) Compares ST(i) with ST(0) and pops register stack. FPU.Pop; // optionally
Compares ST(1) with ST(0) and pops register stack. FPU.Pop; // optionally
FPU Compare and Pop FCOMPP Compares ST(1) with ST(0) and pops register stack twice.
and Pop
FPU Unordered FUCOM m32fp Compares single-precision floating-point value from memory with ST(0) CASE OF
Compare m64fp Compares double-precision floating-point value from memory with ST(0) ST(0) > ST(i): FPU.C3, .C2, .C0 ← 000B;
ST(i) Compares ST(i) with ST(0) ST(0) < ST(i): FPU.C3, .C2, .C0 ← 001B;
Compares ST(1) with ST(0) ST(0) = ST(i): FPU.C3, .C2, .C0 ← 100B;
FPU Unordered FUCOMP m32fp Compares single-precision floating-point value from memory with ST(0) and pops. Unordered: FPU.C3, .C2, .C0 ← 111B; // optionally
Compare m64fp Compares double-precision floating-point value from memory with ST(0) and pops. END
ST(i) Compares ST(i) with ST(0) and pops register stack. FPU.Pop; // optionally
Compares ST(1) with ST(0) and pops register stack. FPU.Pop; // optionally
FPU Unordered FUCOMPP Compares ST(1) with ST(0) and pops register stack twice.
Compare, Pop and Pop
FPU Compare and set FCOMI ST,ST(i) Compares ST(0) with ST(i) and sets status flags accordingly. CASE OF
EFLAGS ST(0) > ST(i): ZF, PF, CF ← 000B;
FPU Compare, set FCOMIP ST,ST(i) Compares ST(0) with ST(i), sets status flags accordingly and pops register stack. ST(0) < ST(i): ZF, PF, CF ← 001B;
EFLAGS and Pop ST(0) = ST(i): ZF, PF, CF ← 100B;
FPU Unordered FUCOMI ST,ST(i) Compares ST(0) with ST(i), checks for ordered values, and sets status flags
Compare and set accordingly. Unordered: ZF, PF, CF ← 111B; // optionally
EFLAGS END
FPU Unordered FUCOMIP ST,ST(i) Compares ST(0) with ST(i), checks for ordered values, sets status flags accordingly FPU.Pop; // optionally
Compare, set and pops register stack.
EFLAGS and Pop
FPU Integer Compare FICOM m16int Compares ST(0) with integer value in memory and sets condition code flags C0, C2 CASE OF
m32int and C3 in the FPU status word according to the results. ST(0) > SRC: FPU.C3, .C2, .C0 ← 000B;
ST(0) < SRC: FPU.C3, .C2, .C0 ← 001B;
ST(0) = SRC: FPU.C3, .C2, .C0 ← 100B;
FPU Integer Compare FICOMP m16int Compares ST(0) with integer value in memory, sets condition code flags C0, C2 and Unordered: FPU.C3, .C2, .C0 ← 111B;
and Pop m32int C3 in the FPU status word according to the results and pops register stack. END
FPU.Pop; // optionally
FPU Test FTST Compares ST(0) with 0.0 and sets condition flags C0, C2 and C3 in the FPU status CASE OF
word according to the results. ST(0) > 0.0: FPU.C3, .C2, .C0 ← 000B;
ST(0) < 0.0: FPU.C3, .C2, .C0 ← 001B;
ST(0) = 0.0: FPU.C3, .C2, .C0 ← 100B;
Unordered: FPU.C3, .C2, .C0 ← 111B;
END

FPU Examine FXAM Examines the contents of the ST(0) register and sets the condition flags C0, C2 and CASE ST(0) OF
C3 in the FPU status word according to the results. NaN: FPU.C3, .C2, .C0 ← 001B;
normal: FPU.C3, .C2, .C0 ← 010B;
infinity: FPU.C3, .C2, .C0 ← 011B;
zero: FPU.C3, .C2, .C0 ← 100B;
empty: FPU.C3, .C2, .C0 ← 101B;
denormal: FPU.C3, .C2, .C0 ← 110B;
END
FPU.C1 ← Sign(ST(0))
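The condition-code patterns produced by the compare instructions can be captured in a small lookup. The sketch below uses Python floats, treats any NaN operand as unordered, and uses hypothetical function names; it does not model exceptions or the stack pop.

```python
import math


def fcom_codes(st0, src):
    """Return (C3, C2, C0) as set by FCOM-style comparisons."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)          # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)              # equal


def fcomi_flags(st0, src):
    """FCOMI-style compare: the same pattern goes to ZF, PF, CF instead."""
    zf, pf, cf = fcom_codes(st0, src)
    return {"ZF": zf, "PF": pf, "CF": cf}


below = fcomi_flags(1.0, 2.0)     # ST(0) < source sets CF only
```

Because FCOMI writes ZF, PF and CF directly, its result can be consumed by the unsigned conditional jumps and by the FCMOVcc instructions without first copying the status word.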

Transcendental Instructions (Trigonometric and Logarithmic Operations)


Instruction Mnemonic Operands Description Symbolic operations
FPU Sine FSIN Replaces ST(0) with its sine. ST(0) ← Sine(ST(0))
FPU Cosine FCOS Replaces ST(0) with its cosine. ST(0) ← Cosine(ST(0))
FPU Sine and Cosine FSINCOS Replaces ST(0) with its sine and pushes its cosine onto the register stack. TMP ← Cosine(ST(0));
ST(0) ← Sine(ST(0));
FPU.Top ← (FPU.Top - 1) MOD 8;
ST(0) ← TMP
FPU Partial Tangent FPTAN Replaces ST(0) with its tangent and pushes 1 onto the register stack. ST(0) ← Tangent (ST(0));
FPU.Top ← (FPU.Top - 1) MOD 8;
ST(0) ← 1.0;
FPU Partial Arctangent FPATAN Replaces ST(1) with arc tangent (ST(1)/ST(0)) and pops the register stack. ST(1) ← ArcTangent (ST(1)/ST(0)); FPU.Pop
FPU 2^x - 1 F2XM1 Replaces ST(0) with (2^ST(0) - 1). ST(0) ← Base2Power(ST(0)) - 1
FPU y*log2(x) FYL2X Replaces ST(1) with ST(1)*log2(ST(0)) and pops the register stack. ST(1) ← ST(1) * Log2(ST(0));
FPU.Pop;
FPU y*log2(x+1) FYL2XP1 Replaces ST(1) with ST(1)*log2(ST(0)+1) and pops the register stack. ST(1) ← ST(1) * Log2(ST(0)+1);
FPU.Pop;
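F2XM1 and FYL2X are commonly combined to compute a general power, using the identity x^y = 2^(y*log2(x)). The sketch below shows the idea in Python. The function names are hypothetical, the range check reflects the documented F2XM1 source range of -1.0 to +1.0, and splitting off the integer part of the exponent stands in for the FRNDINT and FSCALE steps of a typical sequence.

```python
import math


def f2xm1(x):
    """2**x - 1; the hardware instruction only accepts -1.0 <= x <= +1.0."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("F2XM1 source out of range")
    return 2.0 ** x - 1.0


def fyl2x(st1, st0):
    """ST(1) * log2(ST(0))."""
    return st1 * math.log2(st0)


def power(x, y):
    """x**y for x > 0 via 2**(y*log2(x)), keeping f2xm1 within its range."""
    t = fyl2x(y, x)
    n = round(t)                    # integer part, in the role of FRNDINT
    frac = t - n                    # |frac| <= 0.5
    return math.ldexp(f2xm1(frac) + 1.0, n)   # scaling in the role of FSCALE


kilo = power(2.0, 10.0)
root = power(9.0, 0.5)
```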
Floating-Point Constant Loading Instructions
Instruction Mnemonic Operands Description Symbolic operations
FPU Load One FLD1 Pushes +1.0 onto the FPU stack. FPU.Top ← (FPU.Top - 1) MOD 8;
FPU Load Zero FLDZ Pushes +0.0 onto the FPU stack. ST(0) ← Constant
FPU Load Pi FLDPI Pushes π onto the FPU stack.
FPU Load Base 2 Log of Ten FLDL2T Pushes log2(10) onto the FPU stack.
FPU Load Base 2 Log of E FLDL2E Pushes log2(e) onto the FPU stack.
FPU Load Base 10 Log of 2 FLDLG2 Pushes log10(2) onto the FPU stack.
FPU Load Base E Log of 2 FLDLN2 Pushes ln(2) onto the FPU stack.

X87 FPU Control Instructions


Instruction Mnemonic Operands Description Symbolic operations
FPU Initialize FINIT Initializes FPU after checking for pending unmasked floating-point FPU.ControlWord ← 037FH;
exceptions. FPU.StatusWord ← 0;
FPU.TagWord ← FFFFH;
FNINIT Initializes FPU without checking for pending unmasked floating-point FPU.DataPointer ← 0;
exceptions. FPU.InstructionPointer ← 0;
FPU.LastInstructionOpcode ← 0;
FPU Clear Exceptions FCLEX Clears the floating point exceptions flags after checking for pending FPU.StatusWord[0..7] ← 0;
unmasked floating-point exceptions. FPU.StatusWord[15] ← 0
FNCLEX Clears the floating point exceptions flags without checking for pending
unmasked floating-point exceptions.
FPU Decrement Stack- FDECSTP Decrements Top field in FPU status word. The effect is to rotate the stack FPU.Top ← (FPU.Top - 1) MOD 8;
Top Pointer by one position.
FPU Increment Stack- FINCSTP Increments Top field in FPU status word. The effect is to rotate the stack by FPU.Top ← (FPU.Top + 1) MOD 8;
Top Pointer one position. Not equivalent to popping the stack, because the tag for the
previous top-of-stack register is not marked empty.
FPU Free floating-point FFREE ST(i) Sets tag for ST(i) to empty FPU.Tag(i) ← 11B;
register
FPU Store Control FSTCW m2byte Stores FPU control word to memory after checking for pending unmasked DST ← FPU.ControlWord;
Word floating point exceptions.
FNSTCW Stores FPU control word to memory without checking for pending
unmasked floating point exceptions.
FPU Load Control FLDCW m2byte Loads FPU control word from memory. FPU.ControlWord ← SRC;
Word
FPU Store Status Word FSTSW m2byte Stores FPU status word to memory or AX register after checking for DST ← FPU.StatusWord;
AX pending unmasked floating point exceptions.
FNSTSW Stores FPU status word to memory or AX register without checking for
pending unmasked floating point exceptions.
FPU Load Environment FLDENV m14/28byte Loads FPU environment from memory FPU.ControlWord ← SRC.ControlWord;
FPU.StatusWord ← SRC.StatusWord;
FPU.TagWord ← SRC.TagWord;
FPU.DataPointer ← SRC.DataPointer;
FPU.InstructionPointer ← SRC.InstructionPointer;
FPU.LastInstructionOpcode ← SRC.LastInstructionOpcode;
FPU Save state FSAVE m94/108byte Stores FPU state to memory after checking for pending unmasked floating- DST.ControlWord ← FPU.ControlWord;
point exceptions. Then reinitializes the FPU. DST.StatusWord ← FPU.StatusWord;
DST.TagWord ← FPU.TagWord;
DST.DataPointer ← FPU.DataPointer;
DST.InstructionPointer ← FPU.InstructionPointer;
DST.LastInstructionOpcode ← FPU.LastInstructionOpcode;
DST.ST(0) ← FPU.ST(0);
DST.ST(1) ← FPU.ST(1);
DST.ST(2) ← FPU.ST(2);
DST.ST(3) ← FPU.ST(3);
FNSAVE Stores FPU state to memory without checking for pending unmasked DST.ST(4) ← FPU.ST(4);
floating-point exceptions. Then reinitializes the FPU. DST.ST(5) ← FPU.ST(5);
DST.ST(6) ← FPU.ST(6);
DST.ST(7) ← FPU.ST(7);
FPU.ControlWord ← 037FH;
FPU.StatusWord ← 0;
FPU.TagWord ← FFFFH;
FPU.DataPointer ← 0;
FPU.InstructionPointer ← 0;
FPU.LastInstructionOpcode ← 0;
FPU Restore state FRSTOR m94/108byte Loads FPU state from memory FPU.ControlWord ← SRC.ControlWord;
FPU.StatusWord ← SRC.StatusWord;
FPU.TagWord ← SRC.TagWord;
FPU.DataPointer ← SRC.DataPointer;
FPU.InstructionPointer ← SRC.InstructionPointer;
FPU.LastInstructionOpcode ← SRC.LastInstructionOpcode;
FPU.ST(0) ← SRC.ST(0);
FPU.ST(1) ← SRC.ST(1);
FPU.ST(2) ← SRC.ST(2);
FPU.ST(3) ← SRC.ST(3);
FPU.ST(4) ← SRC.ST(4);
FPU.ST(5) ← SRC.ST(5);
FPU.ST(6) ← SRC.ST(6);
FPU.ST(7) ← SRC.ST(7);
Wait for FPU FWAIT Checks for pending unmasked floating-point exceptions.
WAIT
FPU No Operation FNOP No operation is performed
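The control word value 037FH that FSAVE/FNSAVE write when reinitializing the FPU can be decoded field by field. The following is a minimal Python sketch, not part of the original list; the function name decode_control_word is hypothetical, and the bit positions used (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11) are the standard x87 control word layout.

```python
def decode_control_word(cw: int) -> dict:
    """Split a 16-bit x87 control word into its main fields (illustrative helper)."""
    precision = {
        0b00: "single (24-bit)",
        0b10: "double (53-bit)",
        0b11: "extended (64-bit)",
    }
    rounding = {
        0b00: "nearest",
        0b01: "down",
        0b10: "up",
        0b11: "toward zero",
    }
    return {
        # Bits 0-5: invalid, denormal, zero-divide, overflow, underflow, precision masks.
        "exception_masks": cw & 0x3F,
        # Bits 8-9: precision control (01b is reserved).
        "precision": precision.get((cw >> 8) & 0b11, "reserved"),
        # Bits 10-11: rounding control.
        "rounding": rounding[(cw >> 10) & 0b11],
    }


# Value loaded into FPU.ControlWord after FSAVE/FNSAVE reinitialize the FPU.
fields = decode_control_word(0x037F)
```

With 037FH all six exception masks are set, so no floating-point exception is delivered until software clears a mask with FLDCW.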

X87 FPU and SIMD State Management Instructions


Instruction Mnemonic Operands Description Symbolic operations
FXSAVE Save x87 FPU, MMX, SSE and SSE2 State m512byte Save x87 FPU and SIMD state in memory. WORD PTR DST[0] ← FPU.CW;
WORD PTR DST[2] ← FPU.SW;
BYTE PTR DST[4] ← FPU.TW;
WORD PTR DST[6] ← FPU.OP;
DWORD PTR DST[8] ← FPU.IP;
WORD PTR DST[12] ← CS;
DWORD PTR DST[16] ← FPU.DP;
WORD PTR DST[20] ← DS;
DWORD PTR DST[24] ← MXCSR;
DWORD PTR DST[28] ← MXCSR_MASK;
TWORD PTR DST[32] ← ST0/MM0;
TWORD PTR DST[48] ← ST1/MM1;
TWORD PTR DST[64] ← ST2/MM2;
TWORD PTR DST[80] ← ST3/MM3;
TWORD PTR DST[96] ← ST4/MM4;
TWORD PTR DST[112] ← ST5/MM5;
TWORD PTR DST[128] ← ST6/MM6;
TWORD PTR DST[144] ← ST7/MM7;
DQWORD PTR DST[160] ← XMM0;
DQWORD PTR DST[176] ← XMM1;
DQWORD PTR DST[192] ← XMM2;
DQWORD PTR DST[208] ← XMM3;
DQWORD PTR DST[224] ← XMM4;
DQWORD PTR DST[240] ← XMM5;
DQWORD PTR DST[256] ← XMM6;
DQWORD PTR DST[272] ← XMM7;
FXRSTOR Restore x87 FPU, MMX, SSE and SSE2 State m512byte Restores x87 FPU and SIMD state from memory. FPU.CW ← WORD PTR SRC[0];
FPU.SW ← WORD PTR SRC[2];
FPU.TW ← BYTE PTR SRC[4];
FPU.OP ← WORD PTR SRC[6];
FPU.IP ← DWORD PTR SRC[8];
CS ← WORD PTR SRC[12];
FPU.DP ← DWORD PTR SRC[16];
DS ← WORD PTR SRC[20];
MXCSR ← DWORD PTR SRC[24];
MXCSR_MASK ← DWORD PTR SRC[28];
ST0/MM0 ← TWORD PTR SRC[32];
ST1/MM1 ← TWORD PTR SRC[48];
ST2/MM2 ← TWORD PTR SRC[64];
ST3/MM3 ← TWORD PTR SRC[80];
ST4/MM4 ← TWORD PTR SRC[96];
ST5/MM5 ← TWORD PTR SRC[112];
ST6/MM6 ← TWORD PTR SRC[128];
ST7/MM7 ← TWORD PTR SRC[144];
XMM0 ← DQWORD PTR SRC[160];
XMM1 ← DQWORD PTR SRC[176];
XMM2 ← DQWORD PTR SRC[192];
XMM3 ← DQWORD PTR SRC[208];
XMM4 ← DQWORD PTR SRC[224];
XMM5 ← DQWORD PTR SRC[240];
XMM6 ← DQWORD PTR SRC[256];
XMM7 ← DQWORD PTR SRC[272];
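The register slots in the 512-byte FXSAVE/FXRSTOR area follow a regular 16-byte stride, so their offsets can be computed rather than memorized. The sketch below is illustrative only: the helper names st_offset, xmm_offset and read_mxcsr are hypothetical, and the buffer is an ordinary Python bytearray standing in for the m512byte operand.

```python
import struct

FXSAVE_AREA_SIZE = 512  # size of the m512byte operand


def st_offset(i: int) -> int:
    """Byte offset of STi/MMi: slots start at 32 and are 16 bytes apart."""
    return 32 + 16 * i


def xmm_offset(i: int) -> int:
    """Byte offset of XMMi: slots start at 160 and are 16 bytes apart."""
    return 160 + 16 * i


def read_mxcsr(area: bytes) -> int:
    """Read the little-endian dword stored at offset 24 (MXCSR)."""
    return struct.unpack_from("<I", area, 24)[0]


# Build a dummy save area and place an example MXCSR value in it.
# 0x1F80 is used here as the example because it is the power-on default of MXCSR.
area = bytearray(FXSAVE_AREA_SIZE)
struct.pack_into("<I", area, 24, 0x1F80)
mxcsr = read_mxcsr(area)
```

Each x87 register occupies only 10 of the 16 bytes in its slot (TWORD), which is why the ST offsets advance by 16 even though the data is 80 bits wide.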
MMX/SSE Instruction set

MMX/SSE Data transfer instructions


Instruction Mnemonic Operands Description Symbolic operations
Move Dword MOVD mm, r/m32 Move double word from r/m32 to mm. DST[31..0] ← SRC; DST[63..32] ← 0
r/m32, mm Move double word from mm to r/m32. DST ← SRC[31..0]
xmm, r/m32 Move double word from r/m32 to xmm. DST[31..0] ← SRC; DST[127..32] ← 0
r/m32, xmm Move double word from xmm to r/m32. DST ← SRC[31..0]
Move Qword MOVQ mm, r/m64 Move quad word from r/m64 to mm. DST ← SRC
m64, mm Move quad word from mm to m64. DST ← SRC
xmm, m64 Move quad word from m64 to xmm. DST[63..0] ← SRC; DST[127..64] ← 0
m64, xmm Move quad word from xmm to m64. DST ← SRC[63..0]
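The zero-extension and truncation rules of MOVD and MOVQ can be modeled with plain integers. This is a rough Python sketch under the assumption that a register is represented as an unsigned Python int; the function names are hypothetical and only mirror the symbolic operations listed above.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def movd_to_simd(src32: int) -> int:
    """MOVD mm/xmm, r/m32: low dword is copied, all upper bits become 0."""
    return src32 & MASK32


def movd_from_simd(src: int) -> int:
    """MOVD r/m32, mm/xmm: only SRC[31..0] is stored."""
    return src & MASK32


def movq_to_xmm(src64: int) -> int:
    """MOVQ xmm, m64: low qword is copied, DST[127..64] becomes 0."""
    return src64 & MASK64


def movq_from_xmm(src128: int) -> int:
    """MOVQ m64, xmm: only SRC[63..0] is stored."""
    return src128 & MASK64


# Example: a 128-bit value whose upper qword is discarded by MOVQ m64, xmm.
xmm = (0x1122334455667788 << 64) | 0x99AABBCCDDEEFF00
low_qword = movq_from_xmm(xmm)
low_dword = movd_from_simd(xmm)
```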

MMX/SSE Conversion instructions


Instruction Mnemonic Operands Description Symbolic operations
Pack Signed Saturated PACKSSWB mm1, mm2/m64 Converts 4 packed signed word integers from mm1 and DST[7..0] ← SaturateSignedWordToSignedByte(DST[15..0]);
Words to Bytes from mm2/m64 into 8 packed signed byte integers in DST[15..8] ← SaturateSignedWordToSignedByte(DST[31..16]);
mm1 using signed saturation. DST[23..16] ← SaturateSignedWordToSignedByte(DST[47..32]);
SRC
63
H G F E
DST[31..24] ← SaturateSignedWordToSignedByte(DST[63..48]);
DST[39..32] ← SaturateSignedWordToSignedByte(SRC[15..0]);
63
DST[47..40] ← SaturateSignedWordToSignedByte(SRC[31..16]);
DST H’ G’ F’ E’ D’ C’ B’ A’ DST[55..48] ← SaturateSignedWordToSignedByte(SRC[47..32]);
DST[63..56] ← SaturateSignedWordToSignedByte(SRC[63..48]);
63
DST D C B A

xmm1, xmm2/m128 Converts 8 packed signed word integers from xmm1 and DST[7..0] ← SaturateSignedWordToSignedByte(DST[15..0]);
from xmm2/m128 into 16 packed signed byte integers in DST[15..8] ← SaturateSignedWordToSignedByte(DST[31..16]);
xmm1 using signed saturation. DST[23..16] ← SaturateSignedWordToSignedByte(DST[47..32]);
127
P O N M L K J I
DST[31..24] ← SaturateSignedWordToSignedByte(DST[63..48]);
SRC
DST[39..32] ← SaturateSignedWordToSignedByte(DST[79..64]);
127
DST[47..40] ← SaturateSignedWordToSignedByte(DST[95..80]);
DST P’ O’ N’ M’ L’ K’ J’ I’ H’ G’ F’ E’ D’ C’ B’ A’ DST[55..48] ← SaturateSignedWordToSignedByte(DST[111..96]);
DST[63..56] ← SaturateSignedWordToSignedByte(DST[127..112]);
127
DST H G F E D C B A DST[71..64] ← SaturateSignedWordToSignedByte(SRC[15..0]);
DST[79..72] ← SaturateSignedWordToSignedByte(SRC[31..16]);
DST[87..80] ← SaturateSignedWordToSignedByte(SRC[47..32]);
DST[95..88] ← SaturateSignedWordToSignedByte(SRC[63..48]);
DST[103..96] ← SaturateSignedWordToSignedByte(SRC[79..64]);
DST[111..104] ← SaturateSignedWordToSignedByte(SRC[95..80]);
DST[119..112] ← SaturateSignedWordToSignedByte(SRC[111..96]);
DST[127..120] ← SaturateSignedWordToSignedByte(SRC[127..112]);
Pack Signed Saturated PACKSSDW mm1, mm2/m64 Converts 2 packed signed dword integers from mm1 and DST[15..0] ← SaturateSignedDwordToSignedWord(DST[31..0]);
Dwords to Words from mm2/m64 into 4 packed signed word integers in DST[31..16] ← SaturateSignedDwordToSignedWord(DST[63..32]);
mm1 using signed saturation. DST[47..32] ← SaturateSignedDwordToSignedWord(SRC[31..0]);
SRC
63
D C
DST[63..48] ← SaturateSignedDwordToSignedWord(SRC[63..32]);

63
DST D’ C’ B’ A’

63
DST B A

xmm1, xmm2/m128 Converts 4 packed signed dword integers from xmm1 and DST[15..0] ← SaturateSignedDwordToSignedWord(DST[31..0]);
from xmm2/m128 into 8 packed signed word integers in DST[31..16] ← SaturateSignedDwordToSignedWord(DST[63..32]);
xmm1 using signed saturation. DST[47..32] ← SaturateSignedDwordToSignedWord(DST[95..64]);
SRC
127
H G F E
DST[63..48] ← SaturateSignedDwordToSignedWord(DST[127..96]);
DST[79..64] ← SaturateSignedDwordToSignedWord(SRC[31..0]);
127 DST[95..80] ← SaturateSignedDwordToSignedWord(SRC[63..32]);
DST H’ G’ F’ E’ D’ C’ B’ A’
DST[111..96] ← SaturateSignedDwordToSignedWord(SRC[95..64]);
DST[127..112] ← SaturateSignedDwordToSignedWord(SRC[127..96]);
127
DST D C B A

Pack Unsigned Saturated PACKUSWB mm1, mm2/m64 Converts 4 packed signed word integers from mm1 and DST[7..0] ← SaturateSignedWordToUnsignedByte(DST[15..0]);
Words to Bytes from mm2/m64 into 8 packed unsigned byte integers in DST[15..8] ← SaturateSignedWordToUnsignedByte(DST[31..16]);
mm1 using unsigned saturation. DST[23..16] ← SaturateSignedWordToUnsignedByte(DST[47..32]);
63
H G F E
DST[31..24] ← SaturateSignedWordToUnsignedByte(DST[63..48]);
SRC
DST[39..32] ← SaturateSignedWordToUnsignedByte(SRC[15..0]);
63
DST[47..40] ← SaturateSignedWordToUnsignedByte(SRC[31..16]);
DST H’ G’ F’ E’ D’ C’ B’ A’ DST[55..48] ← SaturateSignedWordToUnsignedByte(SRC[47..32]);
DST[63..56] ← SaturateSignedWordToUnsignedByte(SRC[63..48]);
63
DST D C B A

xmm1, xmm2/m128 Converts 8 packed signed word integers from xmm1 and DST[7..0] ← SaturateSignedWordToUnsignedByte(DST[15..0]);
from xmm2/m128 into 16 packed unsigned byte integers DST[15..8] ← SaturateSignedWordToUnsignedByte(DST[31..16]);
in xmm1 using unsigned saturation. DST[23..16] ← SaturateSignedWordToUnsignedByte(DST[47..32]);
SRC
127
P O N M L K J I
DST[31..24] ← SaturateSignedWordToUnsignedByte(DST[63..48]);
DST[39..32] ← SaturateSignedWordToUnsignedByte(DST[79..64]);
127
DST[47..40] ← SaturateSignedWordToUnsignedByte(DST[95..80]);
DST P’ O’ N’ M’ L’ K’ J’ I’ H’ G’ F’ E’ D’ C’ B’ A’ DST[55..48] ← SaturateSignedWordToUnsignedByte(DST[111..96]);
DST[63..56] ← SaturateSignedWordToUnsignedByte(DST[127..112]);
127
DST H G F E D C B A DST[71..64] ← SaturateSignedWordToUnsignedByte(SRC[15..0]);
DST[79..72] ← SaturateSignedWordToUnsignedByte(SRC[31..16]);
DST[87..80] ← SaturateSignedWordToUnsignedByte(SRC[47..32]);
DST[95..88] ← SaturateSignedWordToUnsignedByte(SRC[63..48]);
DST[103..96] ← SaturateSignedWordToUnsignedByte(SRC[79..64]);
DST[111..104] ← SaturateSignedWordToUnsignedByte(SRC[95..80]);
DST[119..112] ← SaturateSignedWordToUnsignedByte(SRC[111..96]);
DST[127..120] ← SaturateSignedWordToUnsignedByte(SRC[127..112]);
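The difference between PACKSSWB and PACKUSWB lies only in the clamping range applied to each signed source word. A minimal Python model is sketched below, assuming registers are unsigned Python ints with lane 0 in the least significant bits; the helper names are hypothetical and do not come from the original list.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned lane value as two's-complement signed."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack_words(dst: int, src: int, width_bits: int, lo: int, hi: int) -> int:
    """Pack the words of dst (low half of result) and src (high half) into bytes."""
    lanes = width_bits // 16
    words = [(dst >> (16 * i)) & 0xFFFF for i in range(lanes)]
    words += [(src >> (16 * i)) & 0xFFFF for i in range(lanes)]
    result = 0
    for i, word in enumerate(words):
        clamped = max(lo, min(hi, to_signed(word, 16)))  # saturation step
        result |= (clamped & 0xFF) << (8 * i)
    return result


def packsswb(dst: int, src: int, width_bits: int = 64) -> int:
    """Signed saturation: results are clamped to -128..127."""
    return pack_words(dst, src, width_bits, -128, 127)


def packuswb(dst: int, src: int, width_bits: int = 64) -> int:
    """Unsigned saturation: results are clamped to 0..255."""
    return pack_words(dst, src, width_bits, 0, 255)


# Destination words (lane 0 first): 1, 32767, -32768, -1; source words all zero.
dst = 0xFFFF_8000_7FFF_0001
signed_result = packsswb(dst, 0)
unsigned_result = packuswb(dst, 0)
```

Passing width_bits=128 models the xmm1, xmm2/m128 forms with 8 words per operand.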
Unpack interleaving Low- PUNPCKLBW mm1, mm2/m64 Unpacks and interleaves 4 low-order bytes from mm1 and DST[7..0] ← DST[7..0];
order Bytes to Words 4 low-order bytes from mm2/m64 into 4 words in mm1. DST[15..8] ← SRC[7..0];
63 DST[23..16] ← DST[15..8];
SRC D’’ C’’ B’’ A’’
DST[31..24] ← SRC[15..8];
63
DST[39..32] ← DST[23..16];
DST D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[47..40] ← SRC[23..16];
DST[55..48] ← DST[31..24];
DST
63
D’ C’ B’ A’ DST[63..56] ← SRC[31..24];

xmm1, xmm2/m128 Unpacks and interleaves 8 low-order bytes from xmm1 DST[7..0] ← DST[7..0];
and 8 low-order bytes from xmm2/m128 into 8 words in DST[15..8] ← SRC[7..0];
xmm1. DST[23..16] ← DST[15..8];
SRC
127
H’’ G’’ F’’ E’’ D’’ C’’ B’’ A’’
DST[31..24] ← SRC[15..8];
DST[39..32] ← DST[23..16];
127
DST[47..40] ← SRC[23..16];
DST H’’ H’ G’’ G’ F’’ F’ E’’ E’ D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[55..48] ← DST[31..24];
DST[63..56] ← SRC[31..24];
127
DST H’ G’ F’ E’ D’ C’ B’ A’ DST[71..64] ← DST[39..32];
DST[79..72] ← SRC[39..32];
DST[87..80] ← DST[47..40];
DST[95..88] ← SRC[47..40];
DST[103..96] ← DST[55..48];
DST[111..104] ← SRC[55..48];
DST[119..112] ← DST[63..56];
DST[127..120] ← SRC[63..56];
Unpack interleaving Low- PUNPCKLWD mm1, mm2/m64 Unpacks and interleaves 2 low-order words from mm1 DST[15..0] ← DST[15..0];
order Words to Dwords and 2 low-order words from mm2/m64 into 2 dwords in DST[31..16] ← SRC[15..0];
mm1. DST[47..32] ← DST[31..16];
SRC
63
B’’ A’’
DST[63..48] ← SRC[31..16];

63
DST B’’ B’ A’’ A’

63
DST B’ A’

xmm1, xmm2/m128 Unpacks and interleaves 4 low-order words from xmm1 DST[15..0] ← DST[15..0];
and 4 low-order words from xmm2/m128 into 4 dwords in DST[31..16] ← SRC[15..0];
xmm1. DST[47..32] ← DST[31..16];
SRC
127
D’’ C’’ B’’ A’’
DST[63..48] ← SRC[31..16];
DST[79..64] ← DST[47..32];
127
DST[95..80] ← SRC[47..32];
DST D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[111..96] ← DST[63..48];
DST[127..112] ← SRC[63..48];
127
DST D’ C’ B’ A’
Unpack interleaving Low- PUNPCKLDQ xmm1, xmm2/m128 Unpacks and interleaves 2 low-order dwords from xmm1 DST[31..0] ← DST[31..0];
order Dwords to Qwords and 2 low-order dwords from xmm2/m128 into 2 qwords DST[63..32] ← SRC[31..0];
in xmm1. DST[95..64] ← DST[63..32];
SRC
127
B’’ A’’
DST[127..96] ← SRC[63..32];

127
DST B’’ B’ A’’ A’

127
DST B’ A’

Unpack interleaving Low- PUNPCKLQDQ xmm1, xmm2/m128 Unpacks and interleaves low-order qword from xmm1 DST[63..0] ← DST[63..0];
order Qwords to Qwords and low-order qword from xmm2/m128 into xmm1. DST[127..64] ← SRC[63..0];
127
SRC A’’

127
DST A’’ A’

127
DST A’

Unpack interleaving High- PUNPCKHBW mm1, mm2/m64 Unpacks and interleaves 4 high-order bytes from mm1 DST[7..0] ← DST[39..32];
order Bytes to Words and 4 high-order bytes from mm2/m64 into 4 words in DST[15..8] ← SRC[39..32];
mm1. DST[23..16] ← DST[47..40];
63
SRC D’’ C’’ B’’ A’’
DST[31..24] ← SRC[47..40];
DST[39..32] ← DST[55..48];
63
DST[47..40] ← SRC[55..48];
DST D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[55..48] ← DST[63..56];
DST[63..56] ← SRC[63..56];
63
DST D’ C’ B’ A’

xmm1, xmm2/m128 Unpacks and interleaves 8 high-order bytes from xmm1 DST[7..0] ← DST[71..64];
and 8 high-order bytes from xmm2/m128 into 8 words in DST[15..8] ← SRC[71..64];
xmm1. DST[23..16] ← DST[79..72];
127 DST[31..24] ← SRC[79..72];
SRC H’’ G’’ F’’ E’’ D’’ C’’ B’’ A’’
DST[39..32] ← DST[87..80];
127
DST[47..40] ← SRC[87..80];
DST H’’ H’ G’’ G’ F’’ F’ E’’ E’ D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[55..48] ← DST[95..88];
DST[63..56] ← SRC[95..88];
127
DST H’ G’ F’ E’ D’ C’ B’ A’ DST[71..64] ← DST[103..96];
DST[79..72] ← SRC[103..96];
DST[87..80] ← DST[111..104];
DST[95..88] ← SRC[111..104];
DST[103..96] ← DST[119..112];
DST[111..104] ← SRC[119..112];
DST[119..112] ← DST[127..120];
DST[127..120] ← SRC[127..120];
Unpack interleaving High- PUNPCKHWD mm1, mm2/m64 Unpacks and interleaves 2 high-order words from mm1 DST[15..0] ← DST[47..32];
order Words to Dwords and 2 high-order words from mm2/m64 into 2 dwords in DST[31..16] ← SRC[47..32];
mm1. DST[47..32] ← DST[63..48];
SRC
63
B’’ A’’
DST[63..48] ← SRC[63..48];

63
DST B’’ B’ A’’ A’

63
DST B’ A’

xmm1, xmm2/m128 Unpacks and interleaves 4 high-order words from xmm1 DST[15..0] ← DST[79..64];
and 4 high-order words from xmm2/m128 into 4 dwords DST[31..16] ← SRC[79..64];
in xmm1. DST[47..32] ← DST[95..80];
SRC
127
D’’ C’’ B’’ A’’
DST[63..48] ← SRC[95..80];
DST[79..64] ← DST[111..96];
127
DST[95..80] ← SRC[111..96];
DST D’’ D’ C’’ C’ B’’ B’ A’’ A’ DST[111..96] ← DST[127..112];
DST[127..112] ← SRC[127..112];
127
DST D’ C’ B’ A’

Unpack interleaving High- PUNPCKHDQ xmm1, xmm2/m128 Unpacks and interleaves 2 high-order dwords from xmm1 DST[31..0] ← DST[95..64];
order Dwords to Qwords and 2 high-order dwords from xmm2/m128 into 2 qwords DST[63..32] ← SRC[95..64];
in xmm1. DST[95..64] ← DST[127..96];
127
B’’ A’’
DST[127..96] ← SRC[127..96];
SRC

127
DST B’’ B’ A’’ A’

127
DST B’ A’

Unpack interleaving High- PUNPCKHQDQ xmm1, xmm2/m128 Unpacks and interleaves high-order qword from xmm1 DST[63..0] ← DST[127..64];
order Qwords to Qwords and high-order qword from xmm2/m128 into xmm1. DST[127..64] ← SRC[127..64];
127
SRC A’’

127
DST A’’ A’

127
DST A’
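All PUNPCKL* and PUNPCKH* variants follow one pattern: lanes from half of DST and half of SRC are interleaved, with the DST lane in the lower position of each pair. The Python sketch below models this for any lane size; it assumes registers are unsigned Python ints, and the names punpckl and punpckh are hypothetical helpers rather than real mnemonics.

```python
def punpckl(dst: int, src: int, lane_bits: int, width_bits: int = 64) -> int:
    """Interleave the low-order lanes of dst and src (dst lane first)."""
    lanes_per_operand = width_bits // 2 // lane_bits
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(lanes_per_operand):
        d = (dst >> (i * lane_bits)) & mask
        s = (src >> (i * lane_bits)) & mask
        result |= d << (2 * i * lane_bits)        # even lane comes from DST
        result |= s << ((2 * i + 1) * lane_bits)  # odd lane comes from SRC
    return result


def punpckh(dst: int, src: int, lane_bits: int, width_bits: int = 64) -> int:
    """Interleave the high-order lanes: same pattern on the upper halves."""
    half = width_bits // 2
    return punpckl(dst >> half, src >> half, lane_bits, width_bits)


# Byte lanes 01..08 in the destination and 11..18 in the source (64-bit form).
dst = 0x0807060504030201
src = 0x1817161514131211
low_bytes = punpckl(dst, src, 8)    # models PUNPCKLBW mm1, mm2/m64
high_bytes = punpckh(dst, src, 8)   # models PUNPCKHBW mm1, mm2/m64
```

Calling punpckl(a, b, 64, 128) models PUNPCKLQDQ, where a single qword is taken from each operand.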
MMX/SSE Packed Arithmetic instructions
Instruction Mnemonic Operands Description Symbolic operations
Packed Add Bytes PADDB mm1, mm2/m64 Add 8 packed byte integers from mm2/m64 to 8 DST[7..0] ← DST[7..0] + SRC[7..0]
packed byte integers in mm1. DST[15..8] ← DST[15..8] + SRC[15..8]
63 DST[23..16] ← DST[23..16] + SRC[23..16]
DST
DST[31..24] ← DST[31..24] + SRC[31..24]
63
SRC DST[39..32] ← DST[39..32] + SRC[39..32]
DST[47..40] ← DST[47..40] + SRC[47..40]
+ + + + + + + +
DST[55..48] ← DST[55..48] + SRC[55..48]
DST
63
DST[63..56] ← DST[63..56] + SRC[63..56]

xmm1, xmm2/m128 Add 16 packed byte integers from xmm2/m128 to 16 DST[7..0] ← DST[7..0] + SRC[7..0]
packed byte integers in xmm1. DST[15..8] ← DST[15..8] + SRC[15..8]
127 DST[23..16] ← DST[23..16] + SRC[23..16]
DST
DST[31..24] ← DST[31..24] + SRC[31..24]
127
SRC DST[39..32] ← DST[39..32] + SRC[39..32]
DST[47..40] ← DST[47..40] + SRC[47..40]
+ + + + + + + + + + + + + + + +
DST[55..48] ← DST[55..48] + SRC[55..48]
DST
127
DST[63..56] ← DST[63..56] + SRC[63..56]
DST[71..64] ← DST[71..64] + SRC[71..64]
DST[79..72] ← DST[79..72] + SRC[79..72]
DST[87..80] ← DST[87..80] + SRC[87..80]
DST[95..88] ← DST[95..88] + SRC[95..88]
DST[103..96] ← DST[103..96] + SRC[103..96]
DST[111..104] ← DST[111..104] + SRC[111..104]
DST[119..112] ← DST[119..112] + SRC[119..112]
DST[127..120] ← DST[127..120] + SRC[127..120]
Packed Add Words PADDW mm1, mm2/m64 Add 4 packed word integers from mm2/m64 to 4 DST[15..0] ← DST[15..0] + SRC[15..0]
packed word integers in mm1. DST[31..16] ← DST[31..16] + SRC[31..16]
63 DST[47..32] ← DST[47..32] + SRC[47..32]
DST
DST[63..48] ← DST[63..48] + SRC[63..48]
63
SRC

+ + + +
63
DST

xmm1, xmm2/m128 Add 8 packed word integers from xmm2/m128 to 8 DST[15..0] ← DST[15..0] + SRC[15..0]
packed word integers in xmm1. DST[31..16] ← DST[31..16] + SRC[31..16]
127 DST[47..32] ← DST[47..32] + SRC[47..32]
DST
DST[63..48] ← DST[63..48] + SRC[63..48]
127
SRC DST[79..64] ← DST[79..64] + SRC[79..64]
DST[95..80] ← DST[95..80] + SRC[95..80]
+ + + + + + + +
DST[111..96] ← DST[111..96] + SRC[111..96]
DST
127
DST[127..112] ← DST[127..112] + SRC[127..112]
Packed Add Dwords PADDD mm1, mm2/m64 Add 2 packed double-word integers from mm2/m64 DST[31..0] ← DST[31..0] + SRC[31..0]
to 2 packed double-word integers in mm1. DST[63..32] ← DST[63..32] + SRC[63..32]
63
DST
63
SRC

+ +
63
DST

xmm1, xmm2/m128 Add 4 packed double-word integers from DST[31..0] ← DST[31..0] + SRC[31..0]
xmm2/m128 to 4 packed double-word integers in DST[63..32] ← DST[63..32] + SRC[63..32]
xmm1. DST[95..64] ← DST[95..64] + SRC[95..64]
DST
127 DST[127..96] ← DST[127..96] + SRC[127..96]
127
SRC

+ + + +
127
DST

Packed Add Bytes with PADDSB mm1, mm2/m64 Add 8 packed byte integers from mm2/m64 to 8 DST[7..0] ← SaturateToSignedByte (DST[7..0] + SRC[7..0])
Saturation packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToSignedByte (DST[15..8] + SRC[15..8])
with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] + SRC[23..16])
DST
63 DST[31..24] ← SaturateToSignedByte (DST[31..24] + SRC[31..24])
DST[39..32] ← SaturateToSignedByte (DST[39..32] + SRC[39..32])
63
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] + SRC[55..48])
+ + + + + + + +
DST[63..56] ← SaturateToSignedByte (DST[63..56] + SRC[63..56])
63
DST

xmm1, xmm2/m128 Add 16 packed byte integers from xmm2/m128 to 16 DST[7..0] ← SaturateToSignedByte (DST[7..0] + SRC[7..0])
packed byte integers in xmm1. Overflow is handled DST[15..8] ← SaturateToSignedByte (DST[15..8] + SRC[15..8])
with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] + SRC[23..16])
127 DST[31..24] ← SaturateToSignedByte (DST[31..24] + SRC[31..24])
DST
DST[39..32] ← SaturateToSignedByte (DST[39..32] + SRC[39..32])
127
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] + SRC[55..48])
+ + + + + + + + + + + + + + + +
DST[63..56] ← SaturateToSignedByte (DST[63..56] + SRC[63..56])
127
DST DST[71..64] ← SaturateToSignedByte (DST[71..64] + SRC[71..64])
DST[79..72] ← SaturateToSignedByte (DST[79..72] + SRC[79..72])
DST[87..80] ← SaturateToSignedByte (DST[87..80] + SRC[87..80])
DST[95..88] ← SaturateToSignedByte (DST[95..88] + SRC[95..88])
DST[103..96] ← SaturateToSignedByte (DST[103..96] + SRC[103..96])
DST[111..104] ← SaturateToSignedByte (DST[111..104] + SRC[111..104])
DST[119..112] ← SaturateToSignedByte (DST[119..112] + SRC[119..112])
DST[127..120] ← SaturateToSignedByte (DST[127..120] + SRC[127..120])
Packed Add Words with PADDSW mm1, mm2/m64 Add 4 packed word integers from mm2/m64 to 4 DST[15..0] ← SaturateToSignedWord (DST[15..0] + SRC[15..0])
Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] + SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] + SRC[47..32])
DST
63 DST[63..48] ← SaturateToSignedWord (DST[63..48] + SRC[63..48])
63
SRC

+ + + +
63
DST

xmm1, xmm2/m128 Add 8 packed word integers from xmm2/m128 to 8 DST[15..0] ← SaturateToSignedWord (DST[15..0] + SRC[15..0])
packed word integers in xmm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] + SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] + SRC[47..32])
127
DST
DST[63..48] ← SaturateToSignedWord (DST[63..48] + SRC[63..48])
DST[79..64] ← SaturateToSignedWord (DST[79..64] + SRC[79..64])
127
SRC DST[95..80] ← SaturateToSignedWord (DST[95..80] + SRC[95..80])
DST[111..96] ← SaturateToSignedWord (DST[111..96] + SRC[111..96])
+ + + + + + + +
DST[127..112] ← SaturateToSignedWord (DST[127..112] + SRC[127..112])
127
DST

Packed Add Bytes with PADDUSB mm1, mm2/m64 Add 8 packed byte integers from mm2/m64 to 8 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] + SRC[7..0])
Unsigned Saturation packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] + SRC[15..8])
with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] + SRC[23..16])
63 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] + SRC[31..24])
DST
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] + SRC[39..32])
63
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] + SRC[55..48])
+ + + + + + + +
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] + SRC[63..56])
63
DST

xmm1, xmm2/m128 Add 16 packed byte integers from xmm2/m128 to 16 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] + SRC[7..0])
packed byte integers in xmm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] + SRC[15..8])
with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] + SRC[23..16])
DST
127 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] + SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] + SRC[39..32])
127
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] + SRC[55..48])
+ + + + + + + + + + + + + + + +
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] + SRC[63..56])
127
DST DST[71..64] ← SaturateToUnsignedByte (DST[71..64] + SRC[71..64])
DST[79..72] ← SaturateToUnsignedByte (DST[79..72] + SRC[79..72])
DST[87..80] ← SaturateToUnsignedByte (DST[87..80] + SRC[87..80])
DST[95..88] ← SaturateToUnsignedByte (DST[95..88] + SRC[95..88])
DST[103..96] ← SaturateToUnsignedByte (DST[103..96] + SRC[103..96])
DST[111..104] ← SaturateToUnsignedByte (DST[111..104] + SRC[111..104])
DST[119..112] ← SaturateToUnsignedByte (DST[119..112] + SRC[119..112])
DST[127..120] ← SaturateToUnsignedByte (DST[127..120] + SRC[127..120])
Packed Add Words with PADDUSW mm1, mm2/m64 Add 4 packed word integers from mm2/m64 to 4 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] + SRC[15..0])
Unsigned Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] + SRC[31..16])
with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] + SRC[47..32])
DST
63 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] + SRC[63..48])
63
SRC

+ + + +
63
DST

xmm1, xmm2/m128 Add 8 packed word integers from xmm2/m128 to 8 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] + SRC[15..0])
packed word integers in xmm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] + SRC[31..16])
with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] + SRC[47..32])
127
DST
DST[63..48] ← SaturateToUnsignedWord (DST[63..48] + SRC[63..48])
DST[79..64] ← SaturateToUnsignedWord (DST[79..64] + SRC[79..64])
127
SRC DST[95..80] ← SaturateToUnsignedWord (DST[95..80] + SRC[95..80])
DST[111..96] ← SaturateToUnsignedWord (DST[111..96] + SRC[111..96])
+ + + + + + + +
DST[127..112] ← SaturateToUnsignedWord (DST[127..112] + SRC[127..112])
127
DST
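The three families of packed addition (wraparound, signed saturation, unsigned saturation) differ only in how an out-of-range lane result is handled. The following Python sketch illustrates this on integer-modeled registers; packed_add and its mode argument are hypothetical names introduced for the example.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned lane value as two's-complement signed."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def packed_add(dst: int, src: int, lane_bits: int,
               width_bits: int = 64, mode: str = "wrap") -> int:
    """Lane-wise addition; mode is 'wrap', 'signed' or 'unsigned'."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(width_bits // lane_bits):
        a = (dst >> (i * lane_bits)) & mask
        b = (src >> (i * lane_bits)) & mask
        if mode == "signed":
            hi = (1 << (lane_bits - 1)) - 1
            lo = -(1 << (lane_bits - 1))
            total = max(lo, min(hi, to_signed(a, lane_bits) + to_signed(b, lane_bits)))
        elif mode == "unsigned":
            total = min(a + b, mask)
        else:
            total = a + b  # carry out of the lane is simply discarded below
        result |= (total & mask) << (i * lane_bits)
    return result


# Byte lane 0: F0h + 20h, byte lane 1: 7Fh + 01h.
dst, src = 0x7FF0, 0x0120
wrapped = packed_add(dst, src, 8)                   # models PADDB
signed_sat = packed_add(dst, src, 8, mode="signed")      # models PADDSB
unsigned_sat = packed_add(dst, src, 8, mode="unsigned")  # models PADDUSB
```

Note that lane 0 overflows only in the unsigned interpretation and lane 1 only in the signed one, so each saturating form changes a different byte.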

Packed Subtract Bytes PSUBB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← DST[7..0] - SRC[7..0]
packed byte integers in mm1. DST[15..8] ← DST[15..8] - SRC[15..8]
63 DST[23..16] ← DST[23..16] - SRC[23..16]
DST
DST[31..24] ← DST[31..24] - SRC[31..24]
63
SRC DST[39..32] ← DST[39..32] - SRC[39..32]
DST[47..40] ← DST[47..40] - SRC[47..40]
- - - - - - - -
DST[55..48] ← DST[55..48] - SRC[55..48]
DST
63
DST[63..56] ← DST[63..56] - SRC[63..56]

xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← DST[7..0] - SRC[7..0]
16 packed byte integers in xmm1. DST[15..8] ← DST[15..8] - SRC[15..8]
127 DST[23..16] ← DST[23..16] - SRC[23..16]
DST
DST[31..24] ← DST[31..24] - SRC[31..24]
127
SRC DST[39..32] ← DST[39..32] - SRC[39..32]
DST[47..40] ← DST[47..40] - SRC[47..40]
- - - - - - - - - - - - - - - -
DST[55..48] ← DST[55..48] - SRC[55..48]
DST
127
DST[63..56] ← DST[63..56] - SRC[63..56]
DST[71..64] ← DST[71..64] - SRC[71..64]
DST[79..72] ← DST[79..72] - SRC[79..72]
DST[87..80] ← DST[87..80] - SRC[87..80]
DST[95..88] ← DST[95..88] - SRC[95..88]
DST[103..96] ← DST[103..96] - SRC[103..96]
DST[111..104] ← DST[111..104] - SRC[111..104]
DST[119..112] ← DST[119..112] - SRC[119..112]
DST[127..120] ← DST[127..120] - SRC[127..120]
Packed Subtract Words PSUBW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← DST[15..0] - SRC[15..0]
packed word integers in mm1. DST[31..16] ← DST[31..16] - SRC[31..16]
63 DST[47..32] ← DST[47..32] - SRC[47..32]
DST
DST[63..48] ← DST[63..48] - SRC[63..48]
63
SRC

- - - -
63
DST

xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← DST[15..0] - SRC[15..0]
8 packed word integers in xmm1. DST[31..16] ← DST[31..16] - SRC[31..16]
127 DST[47..32] ← DST[47..32] - SRC[47..32]
DST
DST[63..48] ← DST[63..48] - SRC[63..48]
127
SRC DST[79..64] ← DST[79..64] - SRC[79..64]
DST[95..80] ← DST[95..80] - SRC[95..80]
- - - - - - - -
DST[111..96] ← DST[111..96] - SRC[111..96]
127
DST
DST[127..112] ← DST[127..112] - SRC[127..112]

Packed Subtract Dwords PSUBD mm1, mm2/m64 Subtract 2 packed double-word integers in DST[31..0] ← DST[31..0] - SRC[31..0]
mm2/m64 from 2 packed double-word integers in mm1. DST[63..32] ← DST[63..32] - SRC[63..32]
63
DST
63
SRC

- -
63
DST

xmm1, xmm2/m128 Subtract 4 packed double-word integers in DST[31..0] ← DST[31..0] - SRC[31..0]
xmm2/m128 from 4 packed double-word integers in DST[63..32] ← DST[63..32] - SRC[63..32]
xmm1. DST[95..64] ← DST[95..64] - SRC[95..64]
127
DST
DST[127..96] ← DST[127..96] - SRC[127..96]
127
SRC

- - - -
127
DST

Packed Subtract Bytes PSUBSB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← SaturateToSignedByte (DST[7..0] - SRC[7..0])
with Saturation packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToSignedByte (DST[15..8] - SRC[15..8])
with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] - SRC[23..16])
DST
63 DST[31..24] ← SaturateToSignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToSignedByte (DST[39..32] - SRC[39..32])
63
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] - SRC[55..48])
- - - - - - - -
DST[63..56] ← SaturateToSignedByte (DST[63..56] - SRC[63..56])
63
DST
xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← SaturateToSignedByte (DST[7..0] - SRC[7..0])
16 packed byte integers in xmm1. Overflow is DST[15..8] ← SaturateToSignedByte (DST[15..8] - SRC[15..8])
handled with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] - SRC[23..16])
DST
127 DST[31..24] ← SaturateToSignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToSignedByte (DST[39..32] - SRC[39..32])
127
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] - SRC[55..48])
- - - - - - - - - - - - - - - -
DST[63..56] ← SaturateToSignedByte (DST[63..56] - SRC[63..56])
127
DST DST[71..64] ← SaturateToSignedByte (DST[71..64] - SRC[71..64])
DST[79..72] ← SaturateToSignedByte (DST[79..72] - SRC[79..72])
DST[87..80] ← SaturateToSignedByte (DST[87..80] - SRC[87..80])
DST[95..88] ← SaturateToSignedByte (DST[95..88] - SRC[95..88])
DST[103..96] ← SaturateToSignedByte (DST[103..96] - SRC[103..96])
DST[111..104] ← SaturateToSignedByte (DST[111..104] - SRC[111..104])
DST[119..112] ← SaturateToSignedByte (DST[119..112] - SRC[119..112])
DST[127..120] ← SaturateToSignedByte (DST[127..120] - SRC[127..120])
Packed Subtract Words PSUBSW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← SaturateToSignedWord (DST[15..0] - SRC[15..0])
with Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] - SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] - SRC[47..32])
DST
63 DST[63..48] ← SaturateToSignedWord (DST[63..48] - SRC[63..48])
63
SRC

- - - -
63
DST

xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← SaturateToSignedWord (DST[15..0] - SRC[15..0])
8 packed word integers in xmm1. Overflow is DST[31..16] ← SaturateToSignedWord (DST[31..16] - SRC[31..16])
handled with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] - SRC[47..32])
127
DST
DST[63..48] ← SaturateToSignedWord (DST[63..48] - SRC[63..48])
DST[79..64] ← SaturateToSignedWord (DST[79..64] - SRC[79..64])
127
SRC DST[95..80] ← SaturateToSignedWord (DST[95..80] - SRC[95..80])
DST[111..96] ← SaturateToSignedWord (DST[111..96] - SRC[111..96])
- - - - - - - -
DST[127..112] ← SaturateToSignedWord (DST[127..112] - SRC[127..112])
127
DST

Packed Subtract Bytes PSUBUSB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] - SRC[7..0])
with Unsigned packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] - SRC[15..8])
Saturation with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] - SRC[23..16])
63 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] - SRC[31..24])
DST
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] - SRC[39..32])
63
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] - SRC[55..48])
- - - - - - - -
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] - SRC[63..56])
63
DST
xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← SaturateToUnsignedByte (DST[7..0] - SRC[7..0])
16 packed byte integers in xmm1. Overflow is DST[15..8] ← SaturateToUnsignedByte (DST[15..8] - SRC[15..8])
handled with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] - SRC[23..16])
DST
127 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] - SRC[39..32])
127
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] - SRC[55..48])
- - - - - - - - - - - - - - - -
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] - SRC[63..56])
127
DST DST[71..64] ← SaturateToUnsignedByte (DST[71..64] - SRC[71..64])
DST[79..72] ← SaturateToUnsignedByte (DST[79..72] - SRC[79..72])
DST[87..80] ← SaturateToUnsignedByte (DST[87..80] - SRC[87..80])
DST[95..88] ← SaturateToUnsignedByte (DST[95..88] - SRC[95..88])
DST[103..96] ← SaturateToUnsignedByte (DST[103..96] - SRC[103..96])
DST[111..104] ← SaturateToUnsignedByte (DST[111..104] - SRC[111..104])
DST[119..112] ← SaturateToUnsignedByte (DST[119..112] - SRC[119..112])
DST[127..120] ← SaturateToUnsignedByte (DST[127..120] - SRC[127..120])
Packed Subtract Words PSUBUSW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] - SRC[15..0])
with Unsigned packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] - SRC[31..16])
Saturation with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] - SRC[47..32])
DST
63 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] - SRC[63..48])
63
SRC

- - - -
63
DST

xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← SaturateToUnsignedWord (DST[15..0] - SRC[15..0])
8 packed word integers in xmm1. Overflow is DST[31..16] ← SaturateToUnsignedWord (DST[31..16] - SRC[31..16])
handled with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] - SRC[47..32])
DST
127 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] - SRC[63..48])
DST[79..64] ← SaturateToUnsignedWord (DST[79..64] - SRC[79..64])
127
SRC DST[95..80] ← SaturateToUnsignedWord (DST[95..80] - SRC[95..80])
DST[111..96] ← SaturateToUnsignedWord (DST[111..96] - SRC[111..96])
- - - - - - - -
DST[127..112] ← SaturateToUnsignedWord (DST[127..112] - SRC[127..112])
127
DST
Packed Multiply, Low PMULLW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 TMP0[31..0] ← DST[15..0] * SRC[15..0]
Word packed word integers from mm1. Store low-order TMP1[31..0] ← DST[31..16] * SRC[31..16]
result words in mm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
DST
63 TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[15..0]
63
SRC DST[31..16] ← TMP1[15..0]
DST[47..32] ← TMP2[15..0]
× × × ×
DST[63..48] ← TMP3[15..0]
31
TMP0
31
TMP1
31
TMP2

31
TMP3

63
DST

xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 TMP0[31..0] ← DST[15..0] * SRC[15..0]
by 8 packed word integers from xmm1. Store low- TMP1[31..0] ← DST[31..16] * SRC[31..16]
order result words in xmm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
DST
127 TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
127
SRC TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
× × × × × × × ×
TMP7[31..0] ← DST[127..112] * SRC[127..112]
31 31
TMP4 TMP0 DST[15..0] ← TMP0[15..0]
31 31 DST[31..16] ← TMP1[15..0]
TMP5 TMP1 DST[47..32] ← TMP2[15..0]
31 31 DST[63..48] ← TMP3[15..0]
TMP6 TMP2
DST[79..64] ← TMP4[15..0]
31 31 DST[95..80] ← TMP5[15..0]
TMP7 TMP3
DST[111..96] ← TMP6[15..0]
DST
127
DST[127..112] ← TMP7[15..0]
Packed Multiply, High Word PMULHW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 packed word integers from mm1. Store high-order result words in mm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0]
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[31..16]
DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 by 8 packed word integers from xmm1. Store high-order result words in xmm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0]
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
DST[15..0] ← TMP0[31..16]
DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
DST[79..64] ← TMP4[31..16]
DST[95..80] ← TMP5[31..16]
DST[111..96] ← TMP6[31..16]
DST[127..112] ← TMP7[31..16]
Packed Multiply Unsigned, High Word PMULHUW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 packed word integers from mm1. Treat integers as unsigned. Store high-order result words in mm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0] // unsigned multiplication
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[31..16]
DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 by 8 packed word integers from xmm1. Treat integers as unsigned. Store high-order result words in xmm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0] // unsigned multiplication
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
DST[15..0] ← TMP0[31..16]
DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
DST[79..64] ← TMP4[31..16]
DST[95..80] ← TMP5[31..16]
DST[111..96] ← TMP6[31..16]
DST[127..112] ← TMP7[31..16]
Packed Multiply Add Word PMADDWD mm1, mm2/m64 Multiply 4 packed words in mm1 by 4 packed words in mm2/m64, add adjacent double word results and store in mm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0]
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[31..0] ← TMP0[31..0] + TMP1[31..0]
DST[63..32] ← TMP2[31..0] + TMP3[31..0]
xmm1, xmm2/m128 Multiply 8 packed words in xmm1 by 8 packed words in xmm2/m128, add adjacent double word results and store in xmm1.
TMP0[31..0] ← DST[15..0] * SRC[15..0]
TMP1[31..0] ← DST[31..16] * SRC[31..16]
TMP2[31..0] ← DST[47..32] * SRC[47..32]
TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
DST[31..0] ← TMP0[31..0] + TMP1[31..0]
DST[63..32] ← TMP2[31..0] + TMP3[31..0]
DST[95..64] ← TMP4[31..0] + TMP5[31..0]
DST[127..96] ← TMP6[31..0] + TMP7[31..0]
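The packed multiply instructions differ only in which part of each 32-bit intermediate product (TMPn) is kept. The following Python sketch models the 64-bit MMX forms on plain integers. It is an illustration under stated assumptions, not a reference implementation: the helper names (`split_words`, `join_lanes`, `to_signed16`, `signed_products`) are introduced here for the example only, and lane 0 is taken to be bits 15..0.

```python
def split_words(value, lanes=4):
    """Split a packed integer into 16-bit lanes, lowest lane first."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(lanes)]


def join_lanes(values, width):
    """Pack lane values (lowest first), truncating each to `width` bits."""
    mask = (1 << width) - 1
    packed = 0
    for i, v in enumerate(values):
        packed |= (v & mask) << (width * i)
    return packed


def to_signed16(word):
    """Reinterpret a 16-bit pattern as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def signed_products(dst, src, lanes=4):
    """TMPn <- DST word n * SRC word n, words treated as signed."""
    return [to_signed16(a) * to_signed16(b)
            for a, b in zip(split_words(dst, lanes), split_words(src, lanes))]


def pmullw(dst, src, lanes=4):
    """Keep bits 15..0 of each product."""
    return join_lanes(signed_products(dst, src, lanes), 16)


def pmulhw(dst, src, lanes=4):
    """Keep bits 31..16 of each signed product."""
    return join_lanes([p >> 16 for p in signed_products(dst, src, lanes)], 16)


def pmulhuw(dst, src, lanes=4):
    """Keep bits 31..16 of each unsigned product."""
    prods = [a * b for a, b in zip(split_words(dst, lanes), split_words(src, lanes))]
    return join_lanes([p >> 16 for p in prods], 16)


def pmaddwd(dst, src, lanes=4):
    """Add adjacent signed products into 32-bit lanes."""
    p = signed_products(dst, src, lanes)
    return join_lanes([p[i] + p[i + 1] for i in range(0, lanes, 2)], 32)


# Example operands: words (low to high) 3, 4000H, FFFEH, 2 and 5, 4, 3, FFFFH
DST = 0x0002FFFE40000003
SRC = 0xFFFF000300040005
```

Passing `lanes=8` with 128-bit integers models the XMM forms in the same way.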
MMX/SSE Comparison instructions
Instruction Mnemonic Operands Description Symbolic operations
Parallel Compare Bytes for Equality PCMPEQB mm1, mm2/m64 Compare 8 packed bytes in mm1 and mm2/m64 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[7..0] = SRC[7..0] THEN DST[7..0] ← 0FFH ELSE DST[7..0] ← 00H
IF DST[15..8] = SRC[15..8] THEN DST[15..8] ← 0FFH ELSE DST[15..8] ← 00H
IF DST[23..16] = SRC[23..16] THEN DST[23..16] ← 0FFH ELSE DST[23..16] ← 00H
IF DST[31..24] = SRC[31..24] THEN DST[31..24] ← 0FFH ELSE DST[31..24] ← 00H
IF DST[39..32] = SRC[39..32] THEN DST[39..32] ← 0FFH ELSE DST[39..32] ← 00H
IF DST[47..40] = SRC[47..40] THEN DST[47..40] ← 0FFH ELSE DST[47..40] ← 00H
IF DST[55..48] = SRC[55..48] THEN DST[55..48] ← 0FFH ELSE DST[55..48] ← 00H
IF DST[63..56] = SRC[63..56] THEN DST[63..56] ← 0FFH ELSE DST[63..56] ← 00H
xmm1, xmm2/m128 Compare 16 packed bytes in xmm1 and xmm2/m128 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[7..0] = SRC[7..0] THEN DST[7..0] ← 0FFH ELSE DST[7..0] ← 00H
IF DST[15..8] = SRC[15..8] THEN DST[15..8] ← 0FFH ELSE DST[15..8] ← 00H
IF DST[23..16] = SRC[23..16] THEN DST[23..16] ← 0FFH ELSE DST[23..16] ← 00H
IF DST[31..24] = SRC[31..24] THEN DST[31..24] ← 0FFH ELSE DST[31..24] ← 00H
IF DST[39..32] = SRC[39..32] THEN DST[39..32] ← 0FFH ELSE DST[39..32] ← 00H
IF DST[47..40] = SRC[47..40] THEN DST[47..40] ← 0FFH ELSE DST[47..40] ← 00H
IF DST[55..48] = SRC[55..48] THEN DST[55..48] ← 0FFH ELSE DST[55..48] ← 00H
IF DST[63..56] = SRC[63..56] THEN DST[63..56] ← 0FFH ELSE DST[63..56] ← 00H
IF DST[71..64] = SRC[71..64] THEN DST[71..64] ← 0FFH ELSE DST[71..64] ← 00H
IF DST[79..72] = SRC[79..72] THEN DST[79..72] ← 0FFH ELSE DST[79..72] ← 00H
IF DST[87..80] = SRC[87..80] THEN DST[87..80] ← 0FFH ELSE DST[87..80] ← 00H
IF DST[95..88] = SRC[95..88] THEN DST[95..88] ← 0FFH ELSE DST[95..88] ← 00H
IF DST[103..96] = SRC[103..96] THEN DST[103..96] ← 0FFH ELSE DST[103..96] ← 00H
IF DST[111..104] = SRC[111..104] THEN DST[111..104] ← 0FFH ELSE DST[111..104] ← 00H
IF DST[119..112] = SRC[119..112] THEN DST[119..112] ← 0FFH ELSE DST[119..112] ← 00H
IF DST[127..120] = SRC[127..120] THEN DST[127..120] ← 0FFH ELSE DST[127..120] ← 00H
Parallel Compare Words for Equality PCMPEQW mm1, mm2/m64 Compare 4 packed words in mm1 and mm2/m64 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[15..0] = SRC[15..0] THEN DST[15..0] ← 0FFFFH ELSE DST[15..0] ← 0000H
IF DST[31..16] = SRC[31..16] THEN DST[31..16] ← 0FFFFH ELSE DST[31..16] ← 0000H
IF DST[47..32] = SRC[47..32] THEN DST[47..32] ← 0FFFFH ELSE DST[47..32] ← 0000H
IF DST[63..48] = SRC[63..48] THEN DST[63..48] ← 0FFFFH ELSE DST[63..48] ← 0000H
xmm1, xmm2/m128 Compare 8 packed words in xmm1 and xmm2/m128 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[15..0] = SRC[15..0] THEN DST[15..0] ← 0FFFFH ELSE DST[15..0] ← 0000H
IF DST[31..16] = SRC[31..16] THEN DST[31..16] ← 0FFFFH ELSE DST[31..16] ← 0000H
IF DST[47..32] = SRC[47..32] THEN DST[47..32] ← 0FFFFH ELSE DST[47..32] ← 0000H
IF DST[63..48] = SRC[63..48] THEN DST[63..48] ← 0FFFFH ELSE DST[63..48] ← 0000H
IF DST[79..64] = SRC[79..64] THEN DST[79..64] ← 0FFFFH ELSE DST[79..64] ← 0000H
IF DST[95..80] = SRC[95..80] THEN DST[95..80] ← 0FFFFH ELSE DST[95..80] ← 0000H
IF DST[111..96] = SRC[111..96] THEN DST[111..96] ← 0FFFFH ELSE DST[111..96] ← 0000H
IF DST[127..112] = SRC[127..112] THEN DST[127..112] ← 0FFFFH ELSE DST[127..112] ← 0000H
Parallel Compare Dwords for Equality PCMPEQD mm1, mm2/m64 Compare 2 packed double words in mm1 and mm2/m64 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[31..0] = SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH ELSE DST[31..0] ← 00000000H
IF DST[63..32] = SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH ELSE DST[63..32] ← 00000000H
xmm1, xmm2/m128 Compare 4 packed double words in xmm1 and xmm2/m128 for equality. If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[31..0] = SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH ELSE DST[31..0] ← 00000000H
IF DST[63..32] = SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH ELSE DST[63..32] ← 00000000H
IF DST[95..64] = SRC[95..64] THEN DST[95..64] ← 0FFFFFFFFH ELSE DST[95..64] ← 00000000H
IF DST[127..96] = SRC[127..96] THEN DST[127..96] ← 0FFFFFFFFH ELSE DST[127..96] ← 00000000H
Parallel Compare Bytes for Greater PCMPGTB mm1, mm2/m64 Compare 8 packed bytes in mm1 and mm2/m64 for greater than. Bytes are treated as signed integers. If a data element in mm1 is greater than the corresponding element in mm2/m64, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[7..0] > SRC[7..0] THEN DST[7..0] ← 0FFH ELSE DST[7..0] ← 00H
IF DST[15..8] > SRC[15..8] THEN DST[15..8] ← 0FFH ELSE DST[15..8] ← 00H
IF DST[23..16] > SRC[23..16] THEN DST[23..16] ← 0FFH ELSE DST[23..16] ← 00H
IF DST[31..24] > SRC[31..24] THEN DST[31..24] ← 0FFH ELSE DST[31..24] ← 00H
IF DST[39..32] > SRC[39..32] THEN DST[39..32] ← 0FFH ELSE DST[39..32] ← 00H
IF DST[47..40] > SRC[47..40] THEN DST[47..40] ← 0FFH ELSE DST[47..40] ← 00H
IF DST[55..48] > SRC[55..48] THEN DST[55..48] ← 0FFH ELSE DST[55..48] ← 00H
IF DST[63..56] > SRC[63..56] THEN DST[63..56] ← 0FFH ELSE DST[63..56] ← 00H
xmm1, xmm2/m128 Compare 16 packed bytes in xmm1 and xmm2/m128 for greater than. Bytes are treated as signed integers. If a data element in xmm1 is greater than the corresponding element in xmm2/m128, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[7..0] > SRC[7..0] THEN DST[7..0] ← 0FFH ELSE DST[7..0] ← 00H
IF DST[15..8] > SRC[15..8] THEN DST[15..8] ← 0FFH ELSE DST[15..8] ← 00H
IF DST[23..16] > SRC[23..16] THEN DST[23..16] ← 0FFH ELSE DST[23..16] ← 00H
IF DST[31..24] > SRC[31..24] THEN DST[31..24] ← 0FFH ELSE DST[31..24] ← 00H
IF DST[39..32] > SRC[39..32] THEN DST[39..32] ← 0FFH ELSE DST[39..32] ← 00H
IF DST[47..40] > SRC[47..40] THEN DST[47..40] ← 0FFH ELSE DST[47..40] ← 00H
IF DST[55..48] > SRC[55..48] THEN DST[55..48] ← 0FFH ELSE DST[55..48] ← 00H
IF DST[63..56] > SRC[63..56] THEN DST[63..56] ← 0FFH ELSE DST[63..56] ← 00H
IF DST[71..64] > SRC[71..64] THEN DST[71..64] ← 0FFH ELSE DST[71..64] ← 00H
IF DST[79..72] > SRC[79..72] THEN DST[79..72] ← 0FFH ELSE DST[79..72] ← 00H
IF DST[87..80] > SRC[87..80] THEN DST[87..80] ← 0FFH ELSE DST[87..80] ← 00H
IF DST[95..88] > SRC[95..88] THEN DST[95..88] ← 0FFH ELSE DST[95..88] ← 00H
IF DST[103..96] > SRC[103..96] THEN DST[103..96] ← 0FFH ELSE DST[103..96] ← 00H
IF DST[111..104] > SRC[111..104] THEN DST[111..104] ← 0FFH ELSE DST[111..104] ← 00H
IF DST[119..112] > SRC[119..112] THEN DST[119..112] ← 0FFH ELSE DST[119..112] ← 00H
IF DST[127..120] > SRC[127..120] THEN DST[127..120] ← 0FFH ELSE DST[127..120] ← 00H
Parallel Compare Words for Greater PCMPGTW mm1, mm2/m64 Compare 4 packed words in mm1 and mm2/m64 for greater than. Words are treated as signed integers. If a data element in mm1 is greater than the corresponding element in mm2/m64, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[15..0] > SRC[15..0] THEN DST[15..0] ← 0FFFFH ELSE DST[15..0] ← 0000H
IF DST[31..16] > SRC[31..16] THEN DST[31..16] ← 0FFFFH ELSE DST[31..16] ← 0000H
IF DST[47..32] > SRC[47..32] THEN DST[47..32] ← 0FFFFH ELSE DST[47..32] ← 0000H
IF DST[63..48] > SRC[63..48] THEN DST[63..48] ← 0FFFFH ELSE DST[63..48] ← 0000H
xmm1, xmm2/m128 Compare 8 packed words in xmm1 and xmm2/m128 for greater than. Words are treated as signed integers. If a data element in xmm1 is greater than the corresponding element in xmm2/m128, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[15..0] > SRC[15..0] THEN DST[15..0] ← 0FFFFH ELSE DST[15..0] ← 0000H
IF DST[31..16] > SRC[31..16] THEN DST[31..16] ← 0FFFFH ELSE DST[31..16] ← 0000H
IF DST[47..32] > SRC[47..32] THEN DST[47..32] ← 0FFFFH ELSE DST[47..32] ← 0000H
IF DST[63..48] > SRC[63..48] THEN DST[63..48] ← 0FFFFH ELSE DST[63..48] ← 0000H
IF DST[79..64] > SRC[79..64] THEN DST[79..64] ← 0FFFFH ELSE DST[79..64] ← 0000H
IF DST[95..80] > SRC[95..80] THEN DST[95..80] ← 0FFFFH ELSE DST[95..80] ← 0000H
IF DST[111..96] > SRC[111..96] THEN DST[111..96] ← 0FFFFH ELSE DST[111..96] ← 0000H
IF DST[127..112] > SRC[127..112] THEN DST[127..112] ← 0FFFFH ELSE DST[127..112] ← 0000H
Parallel Compare Dwords for Greater PCMPGTD mm1, mm2/m64 Compare 2 packed double words in mm1 and mm2/m64 for greater than. Dwords are treated as signed integers. If a data element in mm1 is greater than the corresponding element in mm2/m64, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[31..0] > SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH ELSE DST[31..0] ← 00000000H
IF DST[63..32] > SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH ELSE DST[63..32] ← 00000000H
xmm1, xmm2/m128 Compare 4 packed double words in xmm1 and xmm2/m128 for greater than. Dwords are treated as signed integers. If a data element in xmm1 is greater than the corresponding element in xmm2/m128, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS register are affected.
IF DST[31..0] > SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH ELSE DST[31..0] ← 00000000H
IF DST[63..32] > SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH ELSE DST[63..32] ← 00000000H
IF DST[95..64] > SRC[95..64] THEN DST[95..64] ← 0FFFFFFFFH ELSE DST[95..64] ← 00000000H
IF DST[127..96] > SRC[127..96] THEN DST[127..96] ← 0FFFFFFFFH ELSE DST[127..96] ← 00000000H
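The packed compare instructions produce a per-lane mask rather than setting flags. The Python sketch below models that behaviour for any lane width on plain integers. The function names (`lanes_of`, `to_signed`, `pcmp`) and the `greater` keyword are assumptions made for this illustration; they are not instruction names.

```python
def lanes_of(value, width, total_bits):
    """Split a packed integer into lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> pos) & mask for pos in range(0, total_bits, width)]


def to_signed(value, width):
    """Reinterpret a `width`-bit pattern as a signed integer."""
    return value - (1 << width) if value >> (width - 1) else value


def pcmp(dst, src, width, total_bits=64, greater=False):
    """Model PCMPEQx (greater=False) or PCMPGTx (greater=True).

    Each lane of the result is all 1s when the comparison holds
    and all 0s otherwise. PCMPGTx compares lanes as signed integers.
    """
    all_ones = (1 << width) - 1
    result = 0
    pairs = zip(lanes_of(dst, width, total_bits), lanes_of(src, width, total_bits))
    for i, (a, b) in enumerate(pairs):
        if greater:
            hit = to_signed(a, width) > to_signed(b, width)
        else:
            hit = a == b
        if hit:
            result |= all_ones << (i * width)
    return result


# PCMPEQB on 8 bytes, PCMPGTW on 4 words, PCMPEQD on 2 double words
eq_bytes = pcmp(0x1122334455667788, 0x1100330055007700, 8)
gt_words = pcmp(0x80007FFFFFFF0001, 0x7FFF7FFF00000000, 16, greater=True)
eq_dwords = pcmp(0x00000001FFFFFFFF, 0x0000000100000000, 32)
```

In `gt_words` only the lowest lane is set: FFFFH and 8000H are negative when read as signed words, so they are not greater than 0 or 7FFFH.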

MMX/SSE Logical Instructions


Instruction Mnemonic Operands Description Symbolic operations
Parallel AND PAND mm1, mm2/m64 Bitwise AND mm2/m64 and mm1. Store result in mm1. DST ← DST AND SRC
xmm1, xmm2/m128 Bitwise AND xmm2/m128 and xmm1. Store result in xmm1. DST ← DST AND SRC
Parallel AND NOT PANDN mm1, mm2/m64 Bitwise AND NOT: invert mm1, then AND it with mm2/m64. Store result in mm1. DST ← (NOT DST) AND SRC
xmm1, xmm2/m128 Bitwise AND NOT: invert xmm1, then AND it with xmm2/m128. Store result in xmm1. DST ← (NOT DST) AND SRC
Parallel OR POR mm1, mm2/m64 Bitwise OR mm2/m64 and mm1. Store result in mm1. DST ← DST OR SRC
xmm1, xmm2/m128 Bitwise OR xmm2/m128 and xmm1. Store result in xmm1. DST ← DST OR SRC
Parallel XOR PXOR mm1, mm2/m64 Bitwise XOR mm2/m64 and mm1. Store result in mm1. DST ← DST XOR SRC
xmm1, xmm2/m128 Bitwise XOR xmm2/m128 and xmm1. Store result in xmm1. DST ← DST XOR SRC
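The logical instructions act on the whole register as one bit string, so a model needs no lane handling. The following Python sketch covers the 64-bit forms; `MASK64` and the function names are assumptions for this illustration. It also makes the operand order of PANDN explicit: the destination is inverted, not the source.

```python
MASK64 = (1 << 64) - 1  # width of an MMX register


def pand(dst, src):
    """DST <- DST AND SRC"""
    return dst & src


def pandn(dst, src):
    """DST <- (NOT DST) AND SRC; the inversion applies to the destination."""
    return (~dst & MASK64) & src


def por(dst, src):
    """DST <- DST OR SRC"""
    return dst | src


def pxor(dst, src):
    """DST <- DST XOR SRC"""
    return dst ^ src


# Keep the bytes of `data` where `keep_mask` is clear
keep_mask = 0x00FF00FF00FF00FF
data = 0x123456789ABCDEF0
selected = pandn(keep_mask, data)
```

XORing a register with itself, as in `pxor(data, data)`, yields zero, which is a common way to clear an MMX or XMM register.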

MMX/SSE Shift and Rotate Instructions


Instruction Mnemonic Operands Description Symbolic operations
Packed Shift Left Logical Words PSLLW mm1, mm2/m64 or mm1, imm8 Shift words in mm1 left by the specified position count, clearing low-order bits.
DST[15..0] ← ZeroExtend(DST[15..0] SHL Count)
DST[31..16] ← ZeroExtend(DST[31..16] SHL Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHL Count)
DST[63..48] ← ZeroExtend(DST[63..48] SHL Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift words in xmm1 left by the specified position count, clearing low-order bits.
DST[15..0] ← ZeroExtend(DST[15..0] SHL Count)
DST[31..16] ← ZeroExtend(DST[31..16] SHL Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHL Count)
DST[63..48] ← ZeroExtend(DST[63..48] SHL Count)
DST[79..64] ← ZeroExtend(DST[79..64] SHL Count)
DST[95..80] ← ZeroExtend(DST[95..80] SHL Count)
DST[111..96] ← ZeroExtend(DST[111..96] SHL Count)
DST[127..112] ← ZeroExtend(DST[127..112] SHL Count)
Packed Shift Left Logical Dwords PSLLD mm1, mm2/m64 or mm1, imm8 Shift double words in mm1 left by the specified position count, clearing low-order bits.
DST[31..0] ← ZeroExtend(DST[31..0] SHL Count)
DST[63..32] ← ZeroExtend(DST[63..32] SHL Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift double words in xmm1 left by the specified position count, clearing low-order bits.
DST[31..0] ← ZeroExtend(DST[31..0] SHL Count)
DST[63..32] ← ZeroExtend(DST[63..32] SHL Count)
DST[95..64] ← ZeroExtend(DST[95..64] SHL Count)
DST[127..96] ← ZeroExtend(DST[127..96] SHL Count)
Packed Shift Left Logical Qwords PSLLQ mm1, mm2/m64 or mm1, imm8 Shift quad word in mm1 left by the specified position count, clearing low-order bits.
DST[63..0] ← ZeroExtend(DST[63..0] SHL Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift quad words in xmm1 left by the specified position count, clearing low-order bits.
DST[63..0] ← ZeroExtend(DST[63..0] SHL Count)
DST[127..64] ← ZeroExtend(DST[127..64] SHL Count)
Packed Shift Right Logical Words PSRLW mm1, mm2/m64 or mm1, imm8 Shift words in mm1 right by the specified position count, clearing high-order bits.
DST[15..0] ← ZeroExtend(DST[15..0] SHR Count)
DST[31..16] ← ZeroExtend(DST[31..16] SHR Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHR Count)
DST[63..48] ← ZeroExtend(DST[63..48] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift words in xmm1 right by the specified position count, clearing high-order bits.
DST[15..0] ← ZeroExtend(DST[15..0] SHR Count)
DST[31..16] ← ZeroExtend(DST[31..16] SHR Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHR Count)
DST[63..48] ← ZeroExtend(DST[63..48] SHR Count)
DST[79..64] ← ZeroExtend(DST[79..64] SHR Count)
DST[95..80] ← ZeroExtend(DST[95..80] SHR Count)
DST[111..96] ← ZeroExtend(DST[111..96] SHR Count)
DST[127..112] ← ZeroExtend(DST[127..112] SHR Count)
Packed Shift Right Logical Dwords PSRLD mm1, mm2/m64 or mm1, imm8 Shift double words in mm1 right by the specified position count, clearing high-order bits.
DST[31..0] ← ZeroExtend(DST[31..0] SHR Count)
DST[63..32] ← ZeroExtend(DST[63..32] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift double words in xmm1 right by the specified position count, clearing high-order bits.
DST[31..0] ← ZeroExtend(DST[31..0] SHR Count)
DST[63..32] ← ZeroExtend(DST[63..32] SHR Count)
DST[95..64] ← ZeroExtend(DST[95..64] SHR Count)
DST[127..96] ← ZeroExtend(DST[127..96] SHR Count)
Packed Shift Right Logical Qwords PSRLQ mm1, mm2/m64 or mm1, imm8 Shift quad word in mm1 right by the specified position count, clearing high-order bits.
DST[63..0] ← ZeroExtend(DST[63..0] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift quad words in xmm1 right by the specified position count, clearing high-order bits.
DST[63..0] ← ZeroExtend(DST[63..0] SHR Count)
DST[127..64] ← ZeroExtend(DST[127..64] SHR Count)
Packed Shift Right Arithmetic Words PSRAW mm1, mm2/m64 or mm1, imm8 Shift words in mm1 right by the specified position count, duplicating the sign bit.
DST[15..0] ← SignExtend(DST[15..0] SHR Count)
DST[31..16] ← SignExtend(DST[31..16] SHR Count)
DST[47..32] ← SignExtend(DST[47..32] SHR Count)
DST[63..48] ← SignExtend(DST[63..48] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift words in xmm1 right by the specified position count, duplicating the sign bit.
DST[15..0] ← SignExtend(DST[15..0] SHR Count)
DST[31..16] ← SignExtend(DST[31..16] SHR Count)
DST[47..32] ← SignExtend(DST[47..32] SHR Count)
DST[63..48] ← SignExtend(DST[63..48] SHR Count)
DST[79..64] ← SignExtend(DST[79..64] SHR Count)
DST[95..80] ← SignExtend(DST[95..80] SHR Count)
DST[111..96] ← SignExtend(DST[111..96] SHR Count)
DST[127..112] ← SignExtend(DST[127..112] SHR Count)
Packed Shift Right Arithmetic Dwords PSRAD mm1, mm2/m64 or mm1, imm8 Shift double words in mm1 right by the specified position count, duplicating the sign bit.
DST[31..0] ← SignExtend(DST[31..0] SHR Count)
DST[63..32] ← SignExtend(DST[63..32] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8 Shift double words in xmm1 right by the specified position count, duplicating the sign bit.
DST[31..0] ← SignExtend(DST[31..0] SHR Count)
DST[63..32] ← SignExtend(DST[63..32] SHR Count)
DST[95..64] ← SignExtend(DST[95..64] SHR Count)
DST[127..96] ← SignExtend(DST[127..96] SHR Count)
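The three shift families differ in what fills the vacated bits: zeros on the right for SHL, zeros on the left for logical SHR, and copies of the sign bit for arithmetic SHR. The Python sketch below models them per lane. The function name `shift_lanes` and the `kind` strings are assumptions for this example. The handling of counts at or above the lane width (all zeros for logical shifts, all sign bits for arithmetic shifts) follows the behaviour documented by Intel and is not stated in the table above.

```python
def shift_lanes(value, count, width, total_bits=64, kind="sll"):
    """Shift every `width`-bit lane of `value` by `count` positions.

    kind: "sll" = logical left, "srl" = logical right, "sra" = arithmetic right.
    """
    mask = (1 << width) - 1
    result = 0
    for pos in range(0, total_bits, width):
        lane = (value >> pos) & mask
        if kind == "sll":
            out = (lane << count) & mask if count < width else 0
        elif kind == "srl":
            out = lane >> count if count < width else 0
        elif kind == "sra":
            # Reinterpret the lane as signed, then let Python's >> replicate the sign.
            signed = lane - (1 << width) if lane >> (width - 1) else lane
            out = (signed >> min(count, width - 1)) & mask
        else:
            raise ValueError("unknown shift kind: " + kind)
        result |= out << pos
    return result


# Words (low to high): FFFFH, 00FFH, 0001H, 8001H
packed = 0x8001000100FFFFFF
psllw_4 = shift_lanes(packed, 4, 16, kind="sll")
psrlw_4 = shift_lanes(packed, 4, 16, kind="srl")
psraw_4 = shift_lanes(packed, 4, 16, kind="sra")
```

Comparing `psrlw_4` and `psraw_4` shows the difference on the negative lanes: 8001H becomes 0800H under the logical shift and F800H under the arithmetic shift.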

MMX State Management Instructions


Instruction Mnemonic Operands Description Symbolic operations
Empty MMX State EMMS Sets the x87 FPU tag word to empty x87FPUTagWord ← FFFFH

SSE Instruction Set

SSE Data Transfer Instructions


Instruction Mnemonic Operands Description Symbolic operations
Move Aligned Packed Singles MOVAPS xmm1, xmm2/m128 or xmm1/m128, xmm2 Moves packed single-precision floating-point values from source to destination operand. When the source or destination operand is a memory location, it must be aligned on a 16-byte boundary. DST ← SRC
Move Unaligned Packed Singles MOVUPS xmm1, xmm2/m128 or xmm1/m128, xmm2 Moves packed single-precision floating-point values from source to destination operand. When the source or destination operand is a memory location, it need not be aligned on a 16-byte boundary. DST ← SRC
Move High Packed Singles MOVHPS Moves two packed single-precision floating-point values from memory to the high quad word of the destination register, or from the high quad word of the source register to memory.
xmm, m64 DST[127..64] ← SRC // DST[63..0] remains unchanged
m64, xmm DST ← SRC[127..64]
Move High to Low Packed Singles MOVHLPS xmm1, xmm2 Moves two packed single-precision floating-point values from high quad word of xmm2 to low quad word of xmm1.
DST[63..0] ← SRC[127..64] // DST[127..64] remains unchanged
Move Low Packed Singles MOVLPS Moves two packed single-precision floating-point values from memory to the low quad word of the destination register, or from the low quad word of the source register to memory.
xmm, m64 DST[63..0] ← SRC // DST[127..64] remains unchanged
m64, xmm DST ← SRC[63..0]
Move Low to High Packed Singles MOVLHPS xmm1, xmm2 Moves two packed single-precision floating-point values from low quad word of xmm2 to high quad word of xmm1.
DST[127..64] ← SRC[63..0] // DST[63..0] remains unchanged
Extract Packed Singles Sign Mask MOVMSKPS r32, xmm Extracts the 4-bit sign mask from xmm and stores it in r32.
DST[0] ← SRC[31]
DST[1] ← SRC[63]
DST[2] ← SRC[95]
DST[3] ← SRC[127]
DST[31..4] ← 0000000H
Move Scalar Single MOVSS Moves scalar single-precision floating-point value from source to destination operand.
xmm, m32 DST[31..0] ← SRC[31..0]; DST[127..32] ← 000000000000000000000000H
m32, xmm DST[31..0] ← SRC[31..0]
xmm1, xmm2 DST[31..0] ← SRC[31..0] // DST[127..32] remains unchanged
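The data transfer instructions that touch only part of a register are easy to misread, so a small model may help. The Python sketch below represents an XMM register as a list of four values, lane 0 (bits 31..0) first. That representation, and the helper `float_bits`, are assumptions made for this illustration; `struct` is used only to obtain the IEEE 754 single-precision bit pattern.

```python
import struct


def float_bits(x):
    """Return the 32-bit single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def movmskps(xmm):
    """Collect the sign bit of each lane: DST[i] <- sign of lane i."""
    mask = 0
    for i, lane in enumerate(xmm):
        mask |= (float_bits(lane) >> 31) << i
    return mask


def movhlps(dst, src):
    """DST[63..0] <- SRC[127..64]; high half of DST unchanged."""
    return [src[2], src[3], dst[2], dst[3]]


def movlhps(dst, src):
    """DST[127..64] <- SRC[63..0]; low half of DST unchanged."""
    return [dst[0], dst[1], src[0], src[1]]


sign_mask = movmskps([1.0, -2.0, 0.0, -0.0])
high_to_low = movhlps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
low_to_high = movlhps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
```

Note that negative zero has its sign bit set, so it contributes a 1 to the mask even though it compares equal to zero.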

SSE Packed Arithmetic Instructions


Instruction Mnemonic Operands Description Symbolic operations
Add Packed Singles ADDPS xmm1, xmm2/m128 Adds 4 packed single-precision floating-point values from xmm2/m128 to 4 packed single-precision floating-point values in xmm1.
DST[31..0] ← DST[31..0] + SRC[31..0];
DST[63..32] ← DST[63..32] + SRC[63..32];
DST[95..64] ← DST[95..64] + SRC[95..64];
DST[127..96] ← DST[127..96] + SRC[127..96];
Add Scalar Singles ADDSS xmm1, xmm2/m32 Adds the low single-precision floating-point value from xmm2/m32 to the low single-precision floating-point value in xmm1.
DST[31..0] ← DST[31..0] + SRC[31..0]; // DST[127..32] remains unchanged
Subtract Packed Singles SUBPS xmm1, xmm2/m128 Subtracts 4 packed single-precision floating-point values in xmm2/m128 from 4 packed single-precision floating-point values in xmm1.
DST[31..0] ← DST[31..0] - SRC[31..0];
DST[63..32] ← DST[63..32] - SRC[63..32];
DST[95..64] ← DST[95..64] - SRC[95..64];
DST[127..96] ← DST[127..96] - SRC[127..96];
Subtract Scalar Singles SUBSS xmm1, xmm2/m32 Subtracts the low single-precision floating-point value in xmm2/m32 from the low single-precision floating-point value in xmm1.
DST[31..0] ← DST[31..0] - SRC[31..0]; // DST[127..32] remains unchanged
Multiply Packed Singles MULPS xmm1, xmm2/m128 Multiplies 4 packed single-precision floating-point values in xmm1 by 4 packed single-precision floating-point values in xmm2/m128.
DST[31..0] ← DST[31..0] * SRC[31..0];
DST[63..32] ← DST[63..32] * SRC[63..32];
DST[95..64] ← DST[95..64] * SRC[95..64];
DST[127..96] ← DST[127..96] * SRC[127..96];
Multiply Scalar Singles MULSS xmm1, xmm2/m32 Multiplies the low single-precision floating-point value in xmm1 by the low single-precision floating-point value in xmm2/m32.
DST[31..0] ← DST[31..0] * SRC[31..0]; // DST[127..32] remains unchanged
Divide Packed Singles DIVPS xmm1, xmm2/m128 Divides 4 packed single-precision floating-point values in xmm1 by 4 packed single-precision floating-point values in xmm2/m128.
DST[31..0] ← DST[31..0] / SRC[31..0];
DST[63..32] ← DST[63..32] / SRC[63..32];
DST[95..64] ← DST[95..64] / SRC[95..64];
DST[127..96] ← DST[127..96] / SRC[127..96];
Divide Scalar Singles DIVSS xmm1, xmm2/m32 Divides the low single-precision floating-point value in xmm1 by the low single-precision floating-point value in xmm2/m32.
DST[31..0] ← DST[31..0] / SRC[31..0]; // DST[127..32] remains unchanged
Reciprocals of Packed Singles RCPPS xmm1, xmm2/m128 Computes the approximate reciprocals of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← Approximate (1.0 / SRC[31..0]);
DST[63..32] ← Approximate (1.0 / SRC[63..32]);
DST[95..64] ← Approximate (1.0 / SRC[95..64]);
DST[127..96] ← Approximate (1.0 / SRC[127..96]);
Reciprocal of Scalar Single RCPSS xmm1, xmm2/m32 Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← Approximate (1.0 / SRC[31..0]); // DST[127..32] remains unchanged
Square Roots of Packed Singles SQRTPS xmm1, xmm2/m128 Computes the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← SquareRoot (SRC[31..0]);
DST[63..32] ← SquareRoot (SRC[63..32]);
DST[95..64] ← SquareRoot (SRC[95..64]);
DST[127..96] ← SquareRoot (SRC[127..96]);
Square Root of Scalar Single SQRTSS xmm1, xmm2/m32 Computes the square root of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← SquareRoot (SRC[31..0]); // DST[127..32] remains unchanged
Reciprocals of Square Roots of Packed Singles RSQRTPS xmm1, xmm2/m128 Computes the approximate reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← Approximate (1.0 / SquareRoot (SRC[31..0]));
DST[63..32] ← Approximate (1.0 / SquareRoot (SRC[63..32]));
DST[95..64] ← Approximate (1.0 / SquareRoot (SRC[95..64]));
DST[127..96] ← Approximate (1.0 / SquareRoot (SRC[127..96]));
Reciprocal of Square Root of Scalar Single RSQRTSS xmm1, xmm2/m32 Computes the approximate reciprocal of the square root of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← Approximate (1.0 / SquareRoot (SRC[31..0])); // DST[127..32] remains unchanged
Maximum Packed Single MAXPS xmm1, xmm2/m128 Returns the maximum single-precision floating-point values between xmm2/m128 and xmm1.
DST[31..0] ← MaximumOf (DST[31..0], SRC[31..0]);
DST[63..32] ← MaximumOf (DST[63..32], SRC[63..32]);
DST[95..64] ← MaximumOf (DST[95..64], SRC[95..64]);
DST[127..96] ← MaximumOf (DST[127..96], SRC[127..96]);
Maximum Scalar Single MAXSS xmm1, xmm2/m32 Returns the maximum scalar single-precision floating-point value between xmm2/m32 and xmm1.
DST[31..0] ← MaximumOf (DST[31..0], SRC[31..0]); // DST[127..32] remains unchanged
Minimum Packed Single MINPS xmm1, xmm2/m128 Returns the minimum single-precision floating-point values between xmm2/m128 and xmm1.
DST[31..0] ← MinimumOf (DST[31..0], SRC[31..0]);
DST[63..32] ← MinimumOf (DST[63..32], SRC[63..32]);
DST[95..64] ← MinimumOf (DST[95..64], SRC[95..64]);
DST[127..96] ← MinimumOf (DST[127..96], SRC[127..96]);
Minimum Scalar Single MINSS xmm1, xmm2/m32 Returns the minimum scalar single-precision floating-point value between xmm2/m32 and xmm1.
DST[31..0] ← MinimumOf (DST[31..0], SRC[31..0]); // DST[127..32] remains unchanged
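Each arithmetic instruction in this group comes in a packed form (all four lanes) and a scalar form (lane 0 only, upper lanes of the destination preserved). The Python sketch below shows that distinction. It is a rough model under several assumptions: an XMM register is a list of four Python floats with lane 0 first, the helpers `f32`, `packed` and `scalar` are names invented for the example, and results are rounded to single precision through `struct`. Exception behaviour (for example division by zero or NaN operands) is not modelled.

```python
import operator
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed(op, dst, src):
    """Apply `op` lane by lane, as ADDPS, SUBPS, MULPS and DIVPS do."""
    return [f32(op(a, b)) for a, b in zip(dst, src)]


def scalar(op, dst, src):
    """Apply `op` to lane 0 only; lanes 1..3 of DST stay unchanged."""
    return [f32(op(dst[0], src[0]))] + list(dst[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 0.5, 0.5, 0.5]

addps = packed(operator.add, x, y)
subps = packed(operator.sub, x, y)
addss = scalar(operator.add, x, y)
mulss = scalar(operator.mul, x, [4.0, 9.0, 9.0, 9.0])
```

In the scalar results only the first element changes, which mirrors the "DST[127..32] remains unchanged" notes in the table.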
SSE Comparison Instructions
Instruction Mnemonic Operands Description Symbolic operations
Compare Packed Singles CMPPS xmm1, xmm2/m128, imm8 Compares 4 packed single-precision floating-point values in xmm2/m128 and xmm1 using imm8 as the comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal, 5 – not less, 6 – not less or equal, 7 – ordered. The result of each comparison is a double-word mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true when at least one of the two operands is a NaN; the ordered relationship is true when neither operand is a NaN.
CMP0 ← DST[31..0] OP SRC[31..0];
CMP1 ← DST[63..32] OP SRC[63..32];
CMP2 ← DST[95..64] OP SRC[95..64];
CMP3 ← DST[127..96] OP SRC[127..96];
IF CMP0 THEN DST[31..0] ← FFFFFFFFH ELSE DST[31..0] ← 00000000H;
IF CMP1 THEN DST[63..32] ← FFFFFFFFH ELSE DST[63..32] ← 00000000H;
IF CMP2 THEN DST[95..64] ← FFFFFFFFH ELSE DST[95..64] ← 00000000H;
IF CMP3 THEN DST[127..96] ← FFFFFFFFH ELSE DST[127..96] ← 00000000H
Compare Packed Singles (predicate forms, operands xmm1, xmm2; see CMPPS)
CMPEQPS <=> CMPPS xmm1, xmm2, 0
CMPLTPS <=> CMPPS xmm1, xmm2, 1
CMPLEPS <=> CMPPS xmm1, xmm2, 2
CMPUNORDPS <=> CMPPS xmm1, xmm2, 3
CMPNEQPS <=> CMPPS xmm1, xmm2, 4
CMPNLTPS <=> CMPPS xmm1, xmm2, 5
CMPNLEPS <=> CMPPS xmm1, xmm2, 6
CMPORDPS <=> CMPPS xmm1, xmm2, 7
Compare Scalar Singles CMPSS xmm1, xmm2/m32, imm8 Compares the low single-precision floating-point values in xmm2/m32 and xmm1 using imm8 as the comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal, 5 – not less, 6 – not less or equal, 7 – ordered. The result of the comparison is a double-word mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true when at least one of the two operands is a NaN; the ordered relationship is true when neither operand is a NaN.
CMP0 ← DST[31..0] OP SRC[31..0];
IF CMP0 THEN DST[31..0] ← FFFFFFFFH ELSE DST[31..0] ← 00000000H;
// DST[127..32] remains unchanged
Compare Scalar Singles (predicate forms, operands xmm1, xmm2; see CMPSS)
CMPEQSS <=> CMPSS xmm1, xmm2, 0
CMPLTSS <=> CMPSS xmm1, xmm2, 1
CMPLESS <=> CMPSS xmm1, xmm2, 2
CMPUNORDSS <=> CMPSS xmm1, xmm2, 3
CMPNEQSS <=> CMPSS xmm1, xmm2, 4
CMPNLTSS <=> CMPSS xmm1, xmm2, 5
CMPNLESS <=> CMPSS xmm1, xmm2, 6
CMPORDSS <=> CMPSS xmm1, xmm2, 7
Compare Scalar Singles and set EFLAGS COMISS xmm1, xmm2/m32 Compares the low single-precision floating-point values in the operands and sets the EFLAGS flags accordingly. Performs ordered compare. This instruction differs from the UCOMISS instruction in that it signals an invalid operation exception when a source operand is a QNaN or an SNaN.
Result ← OrderedCompare(DST[31..0], SRC[31..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
Unordered Compare Scalar Singles and set EFLAGS UCOMISS xmm1, xmm2/m32
Compares the low single-precision floating-point values in the operands and sets the EFLAGS flags
accordingly. Performs unordered compare. This instruction differs from the COMISS instruction in
that it signals an invalid operation exception only when a source operand is an SNaN.
Result ← UnorderedCompare(DST[31..0], SRC[31..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
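
The flag encoding shared by COMISS and UCOMISS can be modelled with a short Python sketch. The function name `comiss_flags` is hypothetical, and the model covers only the resulting flags; the difference in exception signalling between the two instructions is not represented.

```python
import math

def comiss_flags(x, y):
    """Return (ZF, PF, CF) for DST = x, SRC = y; OF, AF and SF are always 0."""
    if math.isnan(x) or math.isnan(y):
        return (1, 1, 1)  # UNORDERED
    if x > y:
        return (0, 0, 0)  # GREATER_THAN
    if x < y:
        return (0, 0, 1)  # LESS_THAN
    return (1, 0, 0)      # EQUAL

for pair in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (float("nan"), 1.0)]:
    print(pair, comiss_flags(*pair))
```

Because only ZF, PF and CF carry information, the result is tested with the unsigned-style conditions (above, below, equal), with PF indicating the unordered case.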

SSE Logical Instructions


Instruction Mnemonic Operands Description Symbolic operations
AND of Packed Singles ANDPS xmm1, xmm2/m128
Performs a bitwise AND operation of the four packed single-precision floating-point values from
the destination (first) and source (second) operands and stores the result in the destination
operand.
DST[31..0] ← DST[31..0] AND SRC[31..0];
DST[63..32] ← DST[63..32] AND SRC[63..32];
DST[95..64] ← DST[95..64] AND SRC[95..64];
DST[127..96] ← DST[127..96] AND SRC[127..96];
AND NOT of Packed Singles ANDNPS xmm1, xmm2/m128
Inverts the bits of the four packed single-precision floating-point values in the destination
(first) operand, performs a bitwise AND of the inverted result with the source (second) operand,
and stores the result in the destination operand.
DST[31..0] ← (NOT DST[31..0]) AND SRC[31..0];
DST[63..32] ← (NOT DST[63..32]) AND SRC[63..32];
DST[95..64] ← (NOT DST[95..64]) AND SRC[95..64];
DST[127..96] ← (NOT DST[127..96]) AND SRC[127..96];

OR of Packed Singles ORPS xmm1, xmm2/m128
Performs a bitwise OR operation of the four packed single-precision floating-point values from
the destination (first) and source (second) operands and stores the result in the destination
operand.
DST[31..0] ← DST[31..0] OR SRC[31..0];
DST[63..32] ← DST[63..32] OR SRC[63..32];
DST[95..64] ← DST[95..64] OR SRC[95..64];
DST[127..96] ← DST[127..96] OR SRC[127..96];

Exclusive OR of Packed Singles XORPS xmm1, xmm2/m128
Performs a bitwise XOR operation of the four packed single-precision floating-point values from
the destination (first) and source (second) operands and stores the result in the destination
operand.
DST[31..0] ← DST[31..0] XOR SRC[31..0];
DST[63..32] ← DST[63..32] XOR SRC[63..32];
DST[95..64] ← DST[95..64] XOR SRC[95..64];
DST[127..96] ← DST[127..96] XOR SRC[127..96];
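
Because these instructions operate on raw bit patterns, they are commonly combined with constant masks. The Python sketch below models two such uses on 32-bit lane patterns: clearing the sign bit with an AND (absolute value) and flipping it with an XOR (negation). The helper names and the mask constants are assumptions chosen for the example, not part of the instruction list.

```python
import struct

def f32_bits(x):
    """Bit pattern of x encoded as an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Single-precision value encoded by the 32-bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def andps(dst, src):
    return [a & b for a, b in zip(dst, src)]

def andnps(dst, src):
    return [(~a & 0xFFFFFFFF) & b for a, b in zip(dst, src)]

def xorps(dst, src):
    return [a ^ b for a, b in zip(dst, src)]

values = [f32_bits(v) for v in (1.5, -2.0, 0.0, -0.25)]
abs_mask = [0x7FFFFFFF] * 4   # everything except the sign bit
sign_mask = [0x80000000] * 4  # only the sign bit

absolute = [bits_f32(b) for b in andps(values, abs_mask)]
negated = [bits_f32(b) for b in xorps(values, sign_mask)]
print(absolute, negated)
```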
SSE Shuffle and Unpack Instructions
Instruction Mnemonic Operands Description Symbolic operations
Shuffle Packed Singles SHUFPS xmm1, xmm2/m128, imm8
Moves two of the four packed single-precision floating-point values from the destination (first)
operand into the low quad word of the destination operand; moves two of the four packed
single-precision floating-point values from the source (second) operand into the high quad word
of the destination operand. The select (third) operand, SEL, determines which values are moved to
the destination operand.
// all selections read the original value of DST
CASE (SEL[1..0]) OF
0: DST[31..0] ← DST[31..0]
1: DST[31..0] ← DST[63..32]
2: DST[31..0] ← DST[95..64]
3: DST[31..0] ← DST[127..96]
END
CASE (SEL[3..2]) OF
0: DST[63..32] ← DST[31..0]
1: DST[63..32] ← DST[63..32]
2: DST[63..32] ← DST[95..64]
3: DST[63..32] ← DST[127..96]
END
CASE (SEL[5..4]) OF
0: DST[95..64] ← SRC[31..0]
1: DST[95..64] ← SRC[63..32]
2: DST[95..64] ← SRC[95..64]
3: DST[95..64] ← SRC[127..96]
END
CASE (SEL[7..6]) OF
0: DST[127..96] ← SRC[31..0]
1: DST[127..96] ← SRC[63..32]
2: DST[127..96] ← SRC[95..64]
3: DST[127..96] ← SRC[127..96]
END
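
The selection performed by SHUFPS can be expressed compactly in Python. In this sketch the function name `shufps` and the list representation (index 0 = bits 31..0) are assumptions; both low result lanes are taken from the original destination value, and both high lanes from the source.

```python
def shufps(dst, src, imm8):
    """Return the new destination lanes for SHUFPS dst, src, imm8."""
    return [
        dst[imm8 & 3],         # SEL[1..0] picks from DST
        dst[(imm8 >> 2) & 3],  # SEL[3..2] picks from DST
        src[(imm8 >> 4) & 3],  # SEL[5..4] picks from SRC
        src[(imm8 >> 6) & 3],  # SEL[7..6] picks from SRC
    ]

x = ["X0", "X1", "X2", "X3"]
y = ["Y0", "Y1", "Y2", "Y3"]
print(shufps(x, y, 0x1B))  # select fields 3, 2, 1, 0
print(shufps(x, x, 0x1B))  # same register twice: reverses the lanes
```

Passing the same register as both operands turns the instruction into an arbitrary permutation of a single vector.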
Unpack Low Packed Singles UNPCKLPS xmm1, xmm2/m128
Unpacks and interleaves the low single-precision floating-point values from the low quad words of
the source (second) operand and the destination (first) operand.
DST[31..0] ← DST[31..0]
DST[63..32] ← SRC[31..0]
DST[95..64] ← DST[63..32]
DST[127..96] ← SRC[63..32]

Unpack High Packed Singles UNPCKHPS xmm1, xmm2/m128
Unpacks and interleaves the high single-precision floating-point values from the high quad words
of the source (second) operand and the destination (first) operand.
DST[31..0] ← DST[95..64]
DST[63..32] ← SRC[95..64]
DST[95..64] ← DST[127..96]
DST[127..96] ← SRC[127..96]
SSE Conversion Instructions
Instruction Mnemonic Operands Description Symbolic operations
Convert Packed Integers to Packed Singles CVTPI2PS xmm, mm/m64
Converts two packed signed double-word integers from mm/m64 to two packed single-precision
floating-point values in xmm.
DST[31..0] ← IntToSingle (SRC[31..0]);
DST[63..32] ← IntToSingle (SRC[63..32]);
// DST[127..64] remains unchanged

Convert Packed Singles CVTPS2PI mm, xmm/m64 Converts two packed single-precision floating-point values from DST[31..0] ← SingleToInt (SRC[31..0]);
to Packed Integers xmm/m64 to two packed signed double-word integers in mm. DST[63..32] ← SingleToInt (SRC[63..32]);

Convert Scalar Integer to Scalar Single CVTSI2SS xmm, r/m32
Converts one signed double-word integer from r/m32 to one scalar single-precision floating-point
value in xmm.
DST[31..0] ← IntToSingle (SRC);
// DST[127..32] remains unchanged

Convert Scalar Single to Scalar Integer CVTSS2SI r32, xmm/m32
Converts a single-precision floating-point value from xmm/m32 to a signed double-word integer in
r32.
DST ← SingleToInt (SRC[31..0]);

Convert with Truncation Packed Singles to Packed Integers CVTTPS2PI mm, xmm/m64
Converts two packed single-precision floating-point values from xmm/m64 to two packed signed
double-word integers in mm using truncation.
DST[31..0] ← TruncateSingleToInt (SRC[31..0]);
DST[63..32] ← TruncateSingleToInt (SRC[63..32]);

Convert with Truncation Scalar Single to Scalar Integer CVTTSS2SI r32, xmm/m32
Converts a single-precision floating-point value from xmm/m32 to a signed double-word integer in
r32 using truncation.
DST ← TruncateSingleToInt (SRC[31..0]);
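
The difference between the rounding and the truncating conversions is easiest to see on a few values. The Python sketch below is a model under stated assumptions: `cvtss2si` assumes the default MXCSR rounding mode (round to nearest, ties to even), and both functions return the integer indefinite value 80000000H for NaN, infinity and out-of-range inputs, as happens when the invalid-operation exception is masked. The function names are chosen for the example.

```python
import math

INT_INDEFINITE = -0x80000000  # 80000000H read as a signed 32-bit integer

def _to_int32(x, rounder):
    if math.isnan(x) or math.isinf(x):
        return INT_INDEFINITE
    r = rounder(x)
    return r if -2**31 <= r < 2**31 else INT_INDEFINITE

def cvtss2si(x):
    """Convert using round-to-nearest-even (assumed default rounding mode)."""
    return _to_int32(x, round)

def cvttss2si(x):
    """Convert using truncation toward zero."""
    return _to_int32(x, math.trunc)

for v in (2.5, 3.5, 2.7, -2.7):
    print(v, cvtss2si(v), cvttss2si(v))
```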
SSE 64-Bit SIMD Integer Instructions
Instruction Mnemonic Operands Description Symbolic operations
Packed Average PAVGB mm1, mm2/m64 Averages 8 packed unsigned bytes from mm1 and 8 packed DST[7..0] ← (DST[7..0]+SRC[7..0]+1) SHR 1
Bytes unsigned bytes from mm2/m64 with rounding. DST[15..8] ← (DST[15..8]+SRC[15..8]+1) SHR 1
DST[23..16] ← (DST[23..16]+SRC[23..16]+1) SHR 1
DST[31..24] ← (DST[31..24]+SRC[31..24]+1) SHR 1
DST[39..32] ← (DST[39..32]+SRC[39..32]+1) SHR 1
DST[47..40] ← (DST[47..40]+SRC[47..40]+1) SHR 1
DST[55..48] ← (DST[55..48]+SRC[55..48]+1) SHR 1
DST[63..56] ← (DST[63..56]+SRC[63..56]+1) SHR 1

xmm1,xmm2/m128 Averages 16 packed unsigned bytes from xmm1 and 16 packed DST[7..0] ← (DST[7..0]+SRC[7..0]+1) SHR 1
unsigned bytes from xmm2/m128 with rounding. DST[15..8] ← (DST[15..8]+SRC[15..8]+1) SHR 1
DST[23..16] ← (DST[23..16]+SRC[23..16]+1) SHR 1
DST[31..24] ← (DST[31..24]+SRC[31..24]+1) SHR 1
DST[39..32] ← (DST[39..32]+SRC[39..32]+1) SHR 1
DST[47..40] ← (DST[47..40]+SRC[47..40]+1) SHR 1
DST[55..48] ← (DST[55..48]+SRC[55..48]+1) SHR 1
DST[63..56] ← (DST[63..56]+SRC[63..56]+1) SHR 1
DST[71..64] ← (DST[71..64]+SRC[71..64]+1) SHR 1
DST[79..72] ← (DST[79..72]+SRC[79..72]+1) SHR 1
DST[87..80] ← (DST[87..80]+SRC[87..80]+1) SHR 1
DST[95..88] ← (DST[95..88]+SRC[95..88]+1) SHR 1
DST[103..96] ← (DST[103..96]+SRC[103..96]+1) SHR 1
DST[111..104] ← (DST[111..104]+SRC[111..104]+1) SHR 1
DST[119..112] ← (DST[119..112]+SRC[119..112]+1) SHR 1
DST[127..120] ← (DST[127..120]+SRC[127..120]+1) SHR 1
Packed Average PAVGW mm1, mm2/m64 Averages 4 packed unsigned words from mm1 and 4 packed DST[15..0] ← (DST[15..0]+SRC[15..0]+1) SHR 1
Words unsigned words from mm2/m64 with rounding. DST[31..16] ← (DST[31..16]+SRC[31..16]+1) SHR 1
DST[47..32] ← (DST[47..32]+SRC[47..32]+1) SHR 1
DST[63..48] ← (DST[63..48]+SRC[63..48]+1) SHR 1

xmm1,xmm2/m128 Averages 8 packed unsigned words from xmm1 and 8 packed DST[15..0] ← (DST[15..0]+SRC[15..0]+1) SHR 1
unsigned words from xmm2/m128 with rounding. DST[31..16] ← (DST[31..16]+SRC[31..16]+1) SHR 1
DST[47..32] ← (DST[47..32]+SRC[47..32]+1) SHR 1
DST[63..48] ← (DST[63..48]+SRC[63..48]+1) SHR 1
DST[79..64] ← (DST[79..64]+SRC[79..64]+1) SHR 1
DST[95..80] ← (DST[95..80]+SRC[95..80]+1) SHR 1
DST[111..96] ← (DST[111..96]+SRC[111..96]+1) SHR 1
DST[127..112] ← (DST[127..112]+SRC[127..112]+1) SHR 1
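
The "+1" in the averaging formula rounds halves upward, and the sum is formed in a temporary wider than the element so it cannot overflow. A minimal Python sketch of the per-element rule (the function name `pavg` is an assumption; the same expression serves bytes and words):

```python
def pavg(dst, src):
    """Rounded average of corresponding unsigned elements."""
    return [(a + b + 1) >> 1 for a, b in zip(dst, src)]

print(pavg([0, 1, 254, 255], [0, 2, 255, 255]))
```

Python integers are unbounded, so the intermediate sum 255 + 255 + 1 is represented exactly, mirroring the wider temporary used by the instruction.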
Packed Maximum of PMAXSW mm1, mm2/m64 Compares 4 signed word integers in mm1 with 4 signed word DST[15..0] ← MaximumOf (DST[15..0], SRC[15..0])
Words integers in mm2/m64 and returns maximum values. DST[31..16] ← MaximumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MaximumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MaximumOf (DST[63..48], SRC[63..48])

xmm1, xmm2/m128 Compares 8 signed word integers in xmm1 with 8 signed word DST[15..0] ← MaximumOf (DST[15..0], SRC[15..0])
integers in xmm2/m128 and returns maximum values. DST[31..16] ← MaximumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MaximumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MaximumOf (DST[63..48], SRC[63..48])
DST[79..64] ← MaximumOf (DST[79..64], SRC[79..64])
DST[95..80] ← MaximumOf (DST[95..80], SRC[95..80])
DST[111..96] ← MaximumOf (DST[111..96], SRC[111..96])
DST[127..112] ← MaximumOf (DST[127..112], SRC[127..112])

Packed Maximum of Unsigned Bytes PMAXUB mm1, mm2/m64
Compares 8 unsigned byte integers in mm1 with 8 unsigned byte integers in mm2/m64 and returns
maximum values.
DST[7..0] ← MaximumOf (DST[7..0], SRC[7..0])
DST[15..8] ← MaximumOf (DST[15..8], SRC[15..8])
DST[23..16] ← MaximumOf (DST[23..16], SRC[23..16])
DST[31..24] ← MaximumOf (DST[31..24], SRC[31..24])
DST[39..32] ← MaximumOf (DST[39..32], SRC[39..32])
DST[47..40] ← MaximumOf (DST[47..40], SRC[47..40])
DST[55..48] ← MaximumOf (DST[55..48], SRC[55..48])
DST[63..56] ← MaximumOf (DST[63..56], SRC[63..56])

Packed Minimum of Words PMINSW mm1, mm2/m64
Compares 4 signed word integers in mm1 with 4 signed word integers in mm2/m64 and returns minimum
values.
DST[15..0] ← MinimumOf (DST[15..0], SRC[15..0])
DST[31..16] ← MinimumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MinimumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MinimumOf (DST[63..48], SRC[63..48])

xmm1, xmm2/m128 Compares 8 signed word integers in xmm1 with 8 signed word DST[15..0] ← MinimumOf (DST[15..0], SRC[15..0])
integers in xmm2/m128 and returns minimum values. DST[31..16] ← MinimumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MinimumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MinimumOf (DST[63..48], SRC[63..48])
DST[79..64] ← MinimumOf (DST[79..64], SRC[79..64])
DST[95..80] ← MinimumOf (DST[95..80], SRC[95..80])
DST[111..96] ← MinimumOf (DST[111..96], SRC[111..96])
DST[127..112] ← MinimumOf (DST[127..112], SRC[127..112])
Packed Minimum of Unsigned Bytes PMINUB mm1, mm2/m64
Compares 8 unsigned byte integers in mm1 with 8 unsigned byte integers in mm2/m64 and returns
minimum values.
DST[7..0] ← MinimumOf (DST[7..0], SRC[7..0])
DST[15..8] ← MinimumOf (DST[15..8], SRC[15..8])
DST[23..16] ← MinimumOf (DST[23..16], SRC[23..16])
DST[31..24] ← MinimumOf (DST[31..24], SRC[31..24])
DST[39..32] ← MinimumOf (DST[39..32], SRC[39..32])
DST[47..40] ← MinimumOf (DST[47..40], SRC[47..40])
DST[55..48] ← MinimumOf (DST[55..48], SRC[55..48])
DST[63..56] ← MinimumOf (DST[63..56], SRC[63..56])

Move Byte Mask To Integer PMOVMSKB r32, mm
Creates a mask made up of the most significant bit of each byte of the mmx register and stores
the result in the low byte of the r32 register.
DST[0] ← SRC[7]
DST[1] ← SRC[15]
DST[2] ← SRC[23]
DST[3] ← SRC[31]
DST[4] ← SRC[39]
DST[5] ← SRC[47]
DST[6] ← SRC[55]
DST[7] ← SRC[63]
DST[31..8] ← 000000H
r32, xmm
Creates a mask made up of the most significant bit of each byte of the xmm register and stores
the result in the low word of the r32 register.
DST[0] ← SRC[7]
DST[1] ← SRC[15]
DST[2] ← SRC[23]
DST[3] ← SRC[31]
DST[4] ← SRC[39]
DST[5] ← SRC[47]
DST[6] ← SRC[55]
DST[7] ← SRC[63]
DST[8] ← SRC[71]
DST[9] ← SRC[79]
DST[10] ← SRC[87]
DST[11] ← SRC[95]
DST[12] ← SRC[103]
DST[13] ← SRC[111]
DST[14] ← SRC[119]
DST[15] ← SRC[127]
DST[31..16] ← 0000H
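
The bit gathering done by PMOVMSKB can be modelled as a loop over the source bytes. In this Python sketch the function name `pmovmskb` and the byte-list representation (index 0 = bits 7..0) are assumptions; the same code covers the 8-byte and the 16-byte form.

```python
def pmovmskb(src_bytes):
    """Collect bit 7 of every byte into consecutive low bits of the result."""
    mask = 0
    for i, b in enumerate(src_bytes):
        mask |= ((b >> 7) & 1) << i
    return mask

print(hex(pmovmskb([0x80, 0x00, 0xFF, 0x7F, 0x00, 0x00, 0x00, 0x81])))
```

A typical use is to reduce the result of a packed byte comparison (all 1s or all 0s per byte) to an integer that can be tested with ordinary instructions.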
Packed Sum of Absolute Differences PSADBW mm1, mm2/m64
Computes the absolute differences of the packed unsigned byte integers from mm1 and mm2/m64;
differences are then summed to produce an unsigned word integer result.
TMP0 ← ABS(DST[7..0] - SRC[7..0])
TMP1 ← ABS(DST[15..8] - SRC[15..8])
TMP2 ← ABS(DST[23..16] - SRC[23..16])
TMP3 ← ABS(DST[31..24] - SRC[31..24])
TMP4 ← ABS(DST[39..32] - SRC[39..32])
TMP5 ← ABS(DST[47..40] - SRC[47..40])
TMP6 ← ABS(DST[55..48] - SRC[55..48])
TMP7 ← ABS(DST[63..56] - SRC[63..56])
DST[15..0] ← SUM(TMP0..TMP7)
DST[63..16] ← 000000000000H
xmm1, xmm2/m128
Computes the absolute differences of the packed unsigned byte integers from xmm1 and xmm2/m128;
the 8 low differences and the 8 high differences are then summed separately to produce two
unsigned word integer results.
TMP0 ← ABS(DST[7..0] - SRC[7..0])
TMP1 ← ABS(DST[15..8] - SRC[15..8])
TMP2 ← ABS(DST[23..16] - SRC[23..16])
TMP3 ← ABS(DST[31..24] - SRC[31..24])
TMP4 ← ABS(DST[39..32] - SRC[39..32])
TMP5 ← ABS(DST[47..40] - SRC[47..40])
TMP6 ← ABS(DST[55..48] - SRC[55..48])
TMP7 ← ABS(DST[63..56] - SRC[63..56])
TMP8 ← ABS(DST[71..64] - SRC[71..64])
TMP9 ← ABS(DST[79..72] - SRC[79..72])
TMP10 ← ABS(DST[87..80] - SRC[87..80])
TMP11 ← ABS(DST[95..88] - SRC[95..88])
TMP12 ← ABS(DST[103..96] - SRC[103..96])
TMP13 ← ABS(DST[111..104] - SRC[111..104])
TMP14 ← ABS(DST[119..112] - SRC[119..112])
TMP15 ← ABS(DST[127..120] - SRC[127..120])
DST[15..0] ← SUM(TMP0..TMP7)
DST[63..16] ← 000000000000H
DST[79..64] ← SUM(TMP8..TMP15)
DST[127..80] ← 000000000000H
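
The sum of absolute differences can be sketched in Python as one sum per group of eight bytes. The function name `psadbw` and the byte-list representation are assumptions; an 8-byte input yields one word result and a 16-byte input yields two.

```python
def psadbw(dst, src):
    """Return one sum of absolute differences per group of 8 unsigned bytes."""
    sums = []
    for base in range(0, len(dst), 8):
        pairs = zip(dst[base:base + 8], src[base:base + 8])
        sums.append(sum(abs(a - b) for a, b in pairs))
    return sums

print(psadbw([10, 0, 255, 3, 0, 0, 0, 0], [7, 5, 0, 3, 0, 0, 0, 1]))
```

Each sum is at most 8 * 255 = 2040, so it always fits in the 16-bit result field.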
Packed Extract Word PEXTRW r32, mm, imm8
Extracts the word specified by imm8 from the mmx register and moves it to the r32 register.
SEL ← imm8 AND 3H
TMP ← (SRC SHR (SEL * 16)) AND 0FFFFH
DST[15..0] ← TMP[15..0]
DST[31..16] ← 0000H

r32, xmm, imm8
Extracts the word specified by imm8 from the xmm register and moves it to the r32 register.
SEL ← imm8 AND 7H
TMP ← (SRC SHR (SEL * 16)) AND 0FFFFH
DST[15..0] ← TMP[15..0]
DST[31..16] ← 0000H

Packed Insert Word PINSRW mm, r32/m16, imm8
Inserts the low word from the r32 register or memory into the mmx register at the word position
specified by imm8.
SEL ← imm8 AND 3H
CASE (SEL) OF
0: MASK ← 000000000000FFFFH
1: MASK ← 00000000FFFF0000H
2: MASK ← 0000FFFF00000000H
3: MASK ← FFFF000000000000H
END
DST ← (DST AND NOT MASK) OR
((SRC SHL (SEL * 16)) AND MASK)
xmm, r32/m16, imm8
Inserts the low word from the r32 register or memory into the xmm register at the word position
specified by imm8.
SEL ← imm8 AND 7H
CASE (SEL) OF
0: MASK ← 0000000000000000000000000000FFFFH
1: MASK ← 000000000000000000000000FFFF0000H
2: MASK ← 00000000000000000000FFFF00000000H
3: MASK ← 0000000000000000FFFF000000000000H
4: MASK ← 000000000000FFFF0000000000000000H
5: MASK ← 00000000FFFF00000000000000000000H
6: MASK ← 0000FFFF000000000000000000000000H
7: MASK ← FFFF0000000000000000000000000000H
END
DST ← (DST AND NOT MASK) OR
((SRC SHL (SEL * 16)) AND MASK)
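
The mask table above is simply FFFFH shifted left by SEL * 16, which the following Python sketch makes explicit. Registers are modelled as plain integers; the function names and the `lanes` parameter (4 for an mmx register, 8 for an xmm register) are assumptions made for the example.

```python
def pextrw(src, imm8, lanes=4):
    """Extract the word selected by imm8 from a register given as an integer."""
    sel = imm8 & (lanes - 1)
    return (src >> (sel * 16)) & 0xFFFF

def pinsrw(dst, src, imm8, lanes=4):
    """Insert the low word of src into dst at the word selected by imm8."""
    sel = imm8 & (lanes - 1)
    mask = 0xFFFF << (sel * 16)
    return (dst & ~mask) | ((src << (sel * 16)) & mask)

reg = 0x1111222233334444
print(hex(pinsrw(reg, 0xABCD, 2)))
print(hex(pextrw(reg, 1)))
```

Only the low bits of imm8 take part in the selection, so an out-of-range index wraps around instead of faulting.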
Packed Shuffle Words PSHUFW mm1, mm2/m64, imm8
Copies words from the source (second) operand and inserts them into the destination (first)
operand at word locations selected with the order (third) operand.
DST[15..0] ← (SRC SHR (ORDER[1..0] * 16))[15..0]
DST[31..16] ← (SRC SHR (ORDER[3..2] * 16))[15..0]
DST[47..32] ← (SRC SHR (ORDER[5..4] * 16))[15..0]
DST[63..48] ← (SRC SHR (ORDER[7..6] * 16))[15..0]

MXCSR State Management Instructions


Instruction Mnemonic Operands Description Symbolic operations
Store MXCSR Register STMXCSR m32 Store MXCSR register to the memory. The reserved bits are stored as DST ← MXCSR
State 0s.
Load MXCSR Register LDMXCSR m32 Load MXCSR register from the memory. MXCSR ← SRC
State

SSE Cacheability Control, Prefetch and Instruction Ordering Instructions


Instruction Mnemonic Operands Description Symbolic operations
Store Selected Bytes of MASKMOVQ mm1, mm2 Stores selected bytes from the mmx register (first operand) into a 64-bit IF (MASK[7]=1) THEN DS:[(E)DI] ← SRC[7..0]
Quad-Word using Non- memory location. The address of the memory location is specified by IF (MASK[15]=1) THEN DS:[(E)DI+1] ← SRC[15..8]
temporal Hint DS:[(E)DI] registers. The mask (second) operand selects which bytes IF (MASK[23]=1) THEN DS:[(E)DI+2] ← SRC[23..16]
from the source operand are written to the memory. IF (MASK[31]=1) THEN DS:[(E)DI+3] ← SRC[31..24]
IF (MASK[39]=1) THEN DS:[(E)DI+4] ← SRC[39..32]
IF (MASK[47]=1) THEN DS:[(E)DI+5] ← SRC[47..40]
IF (MASK[55]=1) THEN DS:[(E)DI+6] ← SRC[55..48]
IF (MASK[63]=1) THEN DS:[(E)DI+7] ← SRC[63..56]
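
The byte-selective store can be modelled as a conditional copy. In this Python sketch memory is a `bytearray`, the parameter `addr` stands in for DS:[(E)DI], and the function name is an assumption; the non-temporal hint affects caching only and is not modelled.

```python
def maskmovq(memory, addr, src, mask):
    """Store src[i] at memory[addr + i] when bit 7 of mask[i] is set."""
    for i in range(8):
        if mask[i] & 0x80:
            memory[addr + i] = src[i]

mem = bytearray(8)
maskmovq(mem, 0,
         bytes([1, 2, 3, 4, 5, 6, 7, 8]),
         bytes([0x80, 0x00, 0xFF, 0x7F, 0x00, 0x00, 0x00, 0x80]))
print(list(mem))
```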
Store Quad-Word Using Non-temporal Hint MOVNTQ m64, mm
Moves the quad word from the mmx register to m64 using a non-temporal hint to prevent caching of
the data during the write to memory.
DST ← SRC
Store Packed Single- MOVNTPS m128, xmm Moves the packed single-precision floating-point values from xmm to DST ← SRC
Precision Floating- m128 using a non-temporal hint to minimize cache pollution of the data
Point Values Using during the write to memory.
Non-temporal Hint
Prefetch Temporal to All Cache Levels PREFETCHT0 m8
Fetches the line of data from memory that contains the byte specified with the source operand to
a location in the cache hierarchy using the T0 hint, that is, to all levels of the cache
hierarchy.
Prefetch Temporal to Second Level Cache and Higher PREFETCHT1 m8
Fetches the line of data from memory that contains the byte specified with the source operand to
a location in the cache hierarchy using the T1 hint (temporal with respect to first-level cache
misses), that is, to the second-level cache and higher.
Prefetch Temporal to Third Level Cache and Higher PREFETCHT2 m8
Fetches the line of data from memory that contains the byte specified with the source operand to
a location in the cache hierarchy using the T2 hint (temporal with respect to second-level cache
misses), that is, to the third-level cache and higher, or an implementation-specific choice.

SSE2 Instruction set

SSE2 Data Movement Instructions


Instruction Mnemonic Operands Description Symbolic operations
Move Aligned Packed MOVAPD xmm1, xmm2/m128 Moves packed double-precision floating-point values from source to DST ← SRC
Doubles xmm1/m128, xmm2 destination operand. When the source or destination operand is a memory
location, it must be aligned on a 16-byte boundary.
Move Unaligned MOVUPD xmm1, xmm2/m128 Moves packed double-precision floating-point values from source to DST ← SRC
Packed Doubles xmm1/m128, xmm2 destination operand. When the source or destination operand is a memory
location, it need not be aligned on a 16-byte boundary.
Move Double Quad- MOVDQA xmm1, xmm2/m128 Moves a double quad word from the source (second) operand to the DST ← SRC
Word Aligned xmm1/m128, xmm2 destination (first) operand. When the source or destination operand is a
memory location, it must be aligned on a 16-byte boundary.
Move Double Quad- MOVDQU xmm1, xmm2/m128 Moves a double quad word from the source (second) operand to the DST ← SRC
Word Unaligned xmm1/m128, xmm2 destination (first) operand. When the source or destination operand is a
memory location, it need not be aligned on a 16-byte boundary.
Move Quad-Word from MOVQ2DQ xmm, mm Moves the quad word from the mmx register to the low quad word of the DST[63..0] ← SRC
MMX to XMM Register xmm register. DST[127..64] ← 0000000000000000H
Move Quad-Word from MOVDQ2Q mm, xmm Moves the low quad word from the xmm register to the mmx register. DST ← SRC[63..0]
XMM to MMX Register
Move Low Packed MOVLPD xmm, m64 Moves a double-precision floating point from the memory to the low DST[63..0] ← SRC
Double quad word of the xmm register. // DST[127..64] remains unchanged
m64, xmm Moves a double-precision floating point from the low quad word of the DST ← SRC[63..0]
xmm register to the memory.
Move High Packed MOVHPD xmm, m64 Moves a double-precision floating point from the memory to the high DST[127..64] ← SRC
Double quad word of the xmm register. // DST[63..0] remains unchanged
m64, xmm Moves a double-precision floating point from the high quad word of the DST ← SRC[127..64]
xmm register to the memory.
Extract Packed Doubles Sign Mask MOVMSKPD r32, xmm
Extracts the 2-bit sign mask from xmm and stores it in r32.
DST[0] ← SRC[63]
DST[1] ← SRC[127]
DST[31..2] ← 0

Move Scalar Double MOVSD xmm, m64 Moves scalar double-precision floating-point value from source to DST[63..0] ← SRC[63..0]
destination operand. DST[127..64] ← 0000000000000000H

SSE2 Packed Arithmetic Instructions


Instruction Mnemonic Operands Description Symbolic operations
Add Packed Doubles ADDPD xmm1, xmm2/m128
Adds 2 packed double-precision floating-point values from xmm2/m128 to 2 packed double-precision
floating-point values in xmm1.
DST[63..0] ← DST[63..0] + SRC[63..0];
DST[127..64] ← DST[127..64] + SRC[127..64];

Add Scalar Doubles ADDSD xmm1, xmm2/m64 Adds the low double-precision floating-point value from xmm2/m64 to DST[63..0] ← DST[63..0] + SRC[63..0];
the low double-precision floating-point value in xmm1. // DST[127..64] remains unchanged

Subtract Packed Doubles SUBPD xmm1, xmm2/m128
Subtracts 2 packed double-precision floating-point values in xmm2/m128 from 2 packed
double-precision floating-point values in xmm1.
DST[63..0] ← DST[63..0] - SRC[63..0];
DST[127..64] ← DST[127..64] - SRC[127..64];
Subtract Scalar Doubles SUBSD xmm1, xmm2/m64
Subtracts the low double-precision floating-point value in xmm2/m64 from the low double-precision
floating-point value in xmm1.
DST[63..0] ← DST[63..0] - SRC[63..0];
// DST[127..64] remains unchanged

Multiply Packed Doubles MULPD xmm1, xmm2/m128
Multiplies 2 packed double-precision floating-point values from xmm1 with 2 packed
double-precision floating-point values in xmm2/m128.
DST[63..0] ← DST[63..0] * SRC[63..0];
DST[127..64] ← DST[127..64] * SRC[127..64];

Multiply Scalar Doubles MULSD xmm1, xmm2/m64
Multiplies the low double-precision floating-point value from xmm1 with the low double-precision
floating-point value in xmm2/m64.
DST[63..0] ← DST[63..0] * SRC[63..0];
// DST[127..64] remains unchanged

Divide Packed Doubles DIVPD xmm1, xmm2/m128
Divides 2 packed double-precision floating-point values from xmm1 by 2 packed double-precision
floating-point values in xmm2/m128.
DST[63..0] ← DST[63..0] / SRC[63..0];
DST[127..64] ← DST[127..64] / SRC[127..64];

Divide Scalar Doubles DIVSD xmm1, xmm2/m64
Divides the low double-precision floating-point value from xmm1 by the low double-precision
floating-point value in xmm2/m64.
DST[63..0] ← DST[63..0] / SRC[63..0];
// DST[127..64] remains unchanged
Square Roots of Packed SQRTPD xmm1, xmm2/m128 Computes the square roots of the packed double-precision floating-point DST[63..0] ← SquareRoot (SRC[63..0]);
Doubles values in xmm2/m128 and stores the results in xmm1. DST[127..64] ← SquareRoot (SRC[127..64]);

Square Root of Scalar Double SQRTSD xmm1, xmm2/m64
Computes the square root of the scalar double-precision floating-point value in xmm2/m64 and
stores the result in xmm1.
DST[63..0] ← SquareRoot (SRC[63..0]);
// DST[127..64] remains unchanged

Maximum Packed Double MAXPD xmm1, xmm2/m128
Returns the maximum double-precision floating-point values between xmm2/m128 and xmm1.
DST[63..0] ← MaximumOf (DST[63..0], SRC[63..0]);
DST[127..64] ← MaximumOf (DST[127..64], SRC[127..64]);

Maximum Scalar Double MAXSD xmm1, xmm2/m64
Returns the maximum scalar double-precision floating-point value between xmm2/m64 and xmm1.
DST[63..0] ← MaximumOf (DST[63..0], SRC[63..0]);
// DST[127..64] remains unchanged

Minimum Packed Double MINPD xmm1, xmm2/m128
Returns the minimum double-precision floating-point values between xmm2/m128 and xmm1.
DST[63..0] ← MinimumOf (DST[63..0], SRC[63..0]);
DST[127..64] ← MinimumOf (DST[127..64], SRC[127..64]);
Minimum Scalar Double MINSD xmm1, xmm2/m64
Returns the minimum scalar double-precision floating-point value between xmm2/m64 and xmm1.
DST[63..0] ← MinimumOf (DST[63..0], SRC[63..0]);
// DST[127..64] remains unchanged
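
The packed (PD) and scalar (SD) forms in this section differ only in how many lanes they touch. The Python sketch below models a register as a `(low, high)` tuple; the helper names `packed` and `scalar` are assumptions introduced for the illustration.

```python
import operator

def packed(op):
    """Build a PD-style operation: op is applied to both lanes."""
    return lambda dst, src: (op(dst[0], src[0]), op(dst[1], src[1]))

def scalar(op):
    """Build an SD-style operation: op on the low lane, high lane of dst kept."""
    return lambda dst, src: (op(dst[0], src[0]), dst[1])

addpd, addsd = packed(operator.add), scalar(operator.add)
mulpd, divsd = packed(operator.mul), scalar(operator.truediv)
maxpd, minsd = packed(max), scalar(min)

x = (1.0, 2.0)
y = (10.0, 20.0)
print(addpd(x, y), addsd(x, y))
```

The sketch uses Python's `max` and `min` only for ordinary finite values; it makes no claim about how the instructions treat NaN or signed zero operands.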

SSE2 Logical Instructions


Instruction Mnemonic Operands Description Symbolic operations
AND of Packed Doubles ANDPD xmm1, xmm2/m128
Performs a bitwise AND operation of the two packed double-precision floating-point values from
the destination (first) and source (second) operands and stores the result in the destination
operand.
DST[63..0] ← DST[63..0] AND SRC[63..0];
DST[127..64] ← DST[127..64] AND SRC[127..64]

AND NOT of Packed Doubles ANDNPD xmm1, xmm2/m128
Inverts the bits of the two packed double-precision floating-point values in the destination
(first) operand, performs a bitwise AND of the inverted result with the source (second) operand,
and stores the result in the destination operand.
DST[63..0] ← (NOT DST[63..0]) AND SRC[63..0];
DST[127..64] ← (NOT DST[127..64]) AND SRC[127..64]

OR of Packed Doubles ORPD xmm1, xmm2/m128
Performs a bitwise OR operation of the two packed double-precision floating-point values from the
destination (first) and source (second) operands and stores the result in the destination
operand.
DST[63..0] ← DST[63..0] OR SRC[63..0];
DST[127..64] ← DST[127..64] OR SRC[127..64]
Exclusive OR of Packed Doubles XORPD xmm1, xmm2/m128
Performs a bitwise XOR operation of the two packed double-precision floating-point values from
the destination (first) and source (second) operands and stores the result in the destination
operand.
DST[63..0] ← DST[63..0] XOR SRC[63..0];
DST[127..64] ← DST[127..64] XOR SRC[127..64]

SSE2 Comparison Instructions


Instruction Mnemonic Operands Description Symbolic operations
Compare Packed Doubles CMPPD xmm1, xmm2/m128, imm8
Compares packed double-precision floating-point values in xmm2/m128 and xmm1 using imm8 as
comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal,
5 – not less, 6 – not less or equal, 7 – ordered. The result of each comparison is a quad-word
mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true
when at least one of the two operands is a NaN; the ordered relationship is true when neither
operand is a NaN.
CMP0 ← DST[63..0] OP SRC[63..0];
CMP1 ← DST[127..64] OP SRC[127..64];
IF CMP0 THEN DST[63..0] ← FFFFFFFFFFFFFFFFH
ELSE DST[63..0] ← 0000000000000000H;
IF CMP1 THEN DST[127..64] ← FFFFFFFFFFFFFFFFH
ELSE DST[127..64] ← 0000000000000000H

Compare Packed CMPEQPD xmm1, xmm2 <=> CMPPD xmm1,xmm2, 0 see CMPPD
Doubles CMPLTPD <=> CMPPD xmm1,xmm2, 1
CMPLEPD <=> CMPPD xmm1,xmm2, 2
CMPUNORDPD <=> CMPPD xmm1,xmm2, 3
CMPNEQPD <=> CMPPD xmm1,xmm2, 4
CMPNLTPD <=> CMPPD xmm1,xmm2, 5
CMPNLEPD <=> CMPPD xmm1,xmm2, 6
CMPORDPD <=> CMPPD xmm1,xmm2, 7
Compare Scalar Doubles CMPSD xmm1, xmm2/m64, imm8
Compares the low double-precision floating-point values in xmm2/m64 and xmm1 using imm8 as
comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal,
5 – not less, 6 – not less or equal, 7 – ordered. The result of the comparison is a quad-word
mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true
when at least one of the two operands is a NaN; the ordered relationship is true when neither
operand is a NaN.
CMP0 ← DST[63..0] OP SRC[63..0];
IF CMP0 THEN DST[63..0] ← FFFFFFFFFFFFFFFFH
ELSE DST[63..0] ← 0000000000000000H;
// DST[127..64] remains unchanged

Compare Scalar CMPEQSD xmm1, xmm2 <=> CMPSD xmm1,xmm2, 0 see CMPSD
Doubles CMPLTSD <=> CMPSD xmm1,xmm2, 1
CMPLESD <=> CMPSD xmm1,xmm2, 2
CMPUNORDSD <=> CMPSD xmm1,xmm2, 3
CMPNEQSD <=> CMPSD xmm1,xmm2, 4
CMPNLTSD <=> CMPSD xmm1,xmm2, 5
CMPNLESD <=> CMPSD xmm1,xmm2, 6
CMPORDSD <=> CMPSD xmm1,xmm2, 7
Compare Scalar Doubles and Set EFLAGS COMISD xmm1, xmm2/m64
Compares the low double-precision floating-point values in the operands and sets the EFLAGS flags
accordingly. Performs ordered compare. This instruction differs from the UCOMISD instruction in
that it signals an invalid operation exception when a source operand is a QNaN or an SNaN.
Result ← OrderedCompare(DST[63..0], SRC[63..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;

Unordered Compare Scalar Doubles and set EFLAGS UCOMISD xmm1, xmm2/m64
Compares the low double-precision floating-point values in the operands and sets the EFLAGS flags
accordingly. Performs unordered compare. This instruction differs from the COMISD instruction in
that it signals an invalid operation exception only when a source operand is an SNaN.
Result ← UnorderedCompare(DST[63..0], SRC[63..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
SSE2 Shuffle and Unpack Instructions
Instruction Mnemonic Operands Description Symbolic operations
Shuffle Packed Dwords PSHUFD xmm1, xmm2/m128, imm8
Moves double words from the source (second) operand and inserts them in the destination (first)
operand at locations selected with the order (third) operand.
DST[31..0] ← (SRC SHR (ORDER[1..0]*32))[31..0]
DST[63..32] ← (SRC SHR (ORDER[3..2]*32))[31..0]
DST[95..64] ← (SRC SHR (ORDER[5..4]*32))[31..0]
DST[127..96] ← (SRC SHR (ORDER[7..6]*32))[31..0]

Shuffle Packed Low Words PSHUFLW xmm1, xmm2/m128, imm8
Moves the words from the low quad word of the source (second) operand and inserts them into the
low quad word of the destination (first) operand at locations selected with the order (third)
operand.
DST[15..0] ← (SRC SHR (ORDER[1..0]*16))[15..0]
DST[31..16] ← (SRC SHR (ORDER[3..2]*16))[15..0]
DST[47..32] ← (SRC SHR (ORDER[5..4]*16))[15..0]
DST[63..48] ← (SRC SHR (ORDER[7..6]*16))[15..0]
DST[127..64] ← SRC[127..64]

Shuffle Packed High Words PSHUFHW xmm1, xmm2/m128, imm8 Moves the words from the high quad word of the source (second) operand and inserts them into the high quad word of the destination (first) operand at locations selected with the order (third) operand. The low quad word is copied unchanged.
DST[63..0] ← SRC[63..0]
DST[79..64] ← (SRC SHR (ORDER[1..0]*16))[79..64]
DST[95..80] ← (SRC SHR (ORDER[3..2]*16))[79..64]
DST[111..96] ← (SRC SHR (ORDER[5..4]*16))[79..64]
DST[127..112] ← (SRC SHR (ORDER[7..6]*16))[79..64]

Unpack Low Packed Doubles UNPCKLPD xmm1, xmm2/m128 Unpacks and interleaves the low double-precision floating-point values from the low quad words of the source (second) operand and the destination (first) operand.
DST[63..0] ← DST[63..0]
DST[127..64] ← SRC[63..0]
Unpack High Packed Doubles UNPCKHPD xmm1, xmm2/m128 Unpacks and interleaves the high double-precision floating-point values from the high quad words of the source (second) operand and the destination (first) operand.
DST[63..0] ← DST[127..64]
DST[127..64] ← SRC[127..64]

Unpack Low Data PUNPCKLQDQ xmm1, xmm2/m128 Unpacks and interleaves low-order quad words from xmm1 and xmm2/m128 into xmm1 register.
DST[63..0] ← DST[63..0]
DST[127..64] ← SRC[63..0]

Unpack High Data PUNPCKHQDQ xmm1, xmm2/m128 Unpacks and interleaves high-order quad words from xmm1 and xmm2/m128 into xmm1 register.
DST[63..0] ← DST[127..64]
DST[127..64] ← SRC[127..64]
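The selection rule used by PSHUFD (each destination double word picks one of the four source double words through a 2-bit field of the order operand) can be sketched in plain C. The function name and the array representation of a 128-bit register (element 0 holds bits 31..0) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models PSHUFD on arrays of four dwords: dst[i] receives the source
   element selected by ORDER bits [2i+1..2i]. */
static void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t order)
{
    uint32_t tmp[4];                      /* temporary so dst may alias src */
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(order >> (2 * i)) & 3u];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

With this model an order byte of 1BH (fields 0, 1, 2, 3 from high to low) reverses the four double words, and an order byte of 00H broadcasts the lowest double word to all four positions.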

SSE2 Conversion Instructions


Instruction Mnemonic Operands Description Symbolic operations
Convert Packed Integers to Packed Doubles CVTPI2PD xmm, mm/m64 Converts two packed signed double word integers from mm/m64 to two packed double-precision floating-point values in xmm.
DST[63..0] ← IntToDouble (SRC[31..0]);
DST[127..64] ← IntToDouble (SRC[63..32]);

Convert Packed Doubles to Packed Integers CVTPD2PI mm, xmm/m128 Converts two packed double-precision floating-point values from xmm/m128 to two packed signed double-word integers in mm.
DST[31..0] ← DoubleToInt (SRC[63..0]);
DST[63..32] ← DoubleToInt (SRC[127..64])
Convert with Truncation Packed Doubles to Packed Integers CVTTPD2PI mm, xmm/m128 Converts two packed double-precision floating-point values from xmm/m128 to two packed signed double-word integers in mm using truncation.
DST[31..0] ← TruncateDoubleToInt (SRC[63..0]);
DST[63..32] ← TruncateDoubleToInt (SRC[127..64])

Convert Packed Doubles to Packed Dwords CVTPD2DQ xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed signed double-word integers in xmm1. The high quad word of xmm1 is cleared.
DST[31..0] ← DoubleToInt (SRC[63..0]);
DST[63..32] ← DoubleToInt (SRC[127..64]);
DST[127..64] ← 0000000000000000H

Convert with Truncation Packed Doubles to Packed Dwords CVTTPD2DQ xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed signed double-word integers in xmm1 using truncation. The high quad word of xmm1 is cleared.
DST[31..0] ← TruncateDoubleToInt (SRC[63..0]);
DST[63..32] ← TruncateDoubleToInt (SRC[127..64]);
DST[127..64] ← 0000000000000000H

Convert Packed Dwords to Packed Doubles CVTDQ2PD xmm1, xmm2/m64 Converts two packed signed double-word integers from xmm2/m64 to two packed double-precision floating-point values in xmm1.
DST[63..0] ← IntToDouble(SRC[31..0]);
DST[127..64] ← IntToDouble(SRC[63..32])

Convert Packed Singles to Packed Doubles CVTPS2PD xmm1, xmm2/m64 Converts two packed single-precision floating-point values from xmm2/m64 to two packed double-precision floating-point values in xmm1.
DST[63..0] ← SingleToDouble (SRC[31..0]);
DST[127..64] ← SingleToDouble (SRC[63..32])

Convert Packed Doubles to Packed Singles CVTPD2PS xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed single-precision floating-point values in xmm1. The high quad word of xmm1 is cleared.
DST[31..0] ← DoubleToSingle (SRC[63..0]);
DST[63..32] ← DoubleToSingle (SRC[127..64]);
DST[127..64] ← 0000000000000000H
Convert Scalar Single to Scalar Double CVTSS2SD xmm1, xmm2/m32 Converts one scalar single-precision floating-point value from xmm2/m32 to one double-precision floating-point value in xmm1.
DST[63..0] ← SingleToDouble (SRC[31..0]);
// DST[127..64] remains unchanged

Convert Scalar Double to Scalar Single CVTSD2SS xmm1, xmm2/m64 Converts one scalar double-precision floating-point value from xmm2/m64 to one single-precision floating-point value in xmm1.
DST[31..0] ← DoubleToSingle (SRC[63..0]);
// DST[127..32] remains unchanged

Convert Scalar Double to Scalar Integer CVTSD2SI r32, xmm/m64 Converts one scalar double-precision floating-point value from xmm/m64 to one signed double-word integer in r32.
DST ← DoubleToInt (SRC[63..0]);

Convert with Truncation Scalar Double to Scalar Integer CVTTSD2SI r32, xmm/m64 Converts one scalar double-precision floating-point value from xmm/m64 to one signed double-word integer in r32 using truncation.
DST ← TruncateDoubleToInt (SRC[63..0]);

Convert Scalar Integer to Scalar Double CVTSI2SD xmm, r/m32 Converts one signed double-word integer from r/m32 to one scalar double-precision floating-point value in xmm.
DST[63..0] ← IntToDouble (SRC);
// DST[127..64] remains unchanged
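The difference between DoubleToInt and TruncateDoubleToInt can be illustrated with a small C model. It rests on assumptions that the table does not state: DoubleToInt is modelled with round-to-nearest-even, which is the processor's default rounding mode (the actual mode is taken from MXCSR), and inputs that cannot be represented are mapped to 80000000H, the value the processor returns when the invalid operation exception is masked. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define INTEGER_INDEFINITE INT32_MIN     /* 80000000H */

/* Models DoubleToInt under round-to-nearest-even (assumed default mode). */
static int32_t double_to_int_model(double x)
{
    /* Values outside this window round outside int32; NaN fails both tests. */
    if (!(x >= -2147483648.5 && x < 2147483647.5))
        return INTEGER_INDEFINITE;
    int64_t t = (int64_t)x;              /* C casts truncate toward zero */
    double frac = x - (double)t;         /* exact, lies in (-1, 1) */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                          /* round up, ties go to even */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;                          /* round down, ties go to even */
    return (int32_t)t;
}

/* Models TruncateDoubleToInt: the fraction is always discarded. */
static int32_t truncate_double_to_int_model(double x)
{
    if (!(x > -2147483649.0 && x < 2147483648.0))
        return INTEGER_INDEFINITE;
    return (int32_t)x;
}

/* Usage example: 2.5 and 3.5 round to the even neighbours, 2.9 truncates. */
static void conversion_example(void)
{
    assert(double_to_int_model(2.5) == 2);
    assert(double_to_int_model(3.5) == 4);
    assert(truncate_double_to_int_model(2.9) == 2);
}
```

The truncating forms match the behaviour of a C cast from double to int for in-range values, which is why compilers commonly emit them for such casts.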

SSE2 Packed Single-Precision Floating-Point Instructions


Instruction Mnemonic Operands Description Symbolic operations
Convert Packed Dwords CVTDQ2PS xmm1, xmm2/m128 Converts four packed signed double-word integers from xmm2/m128 to DST[31..0] ← IntToSingle(SRC[31..0]);
to Packed Singles four packed single-precision floating point values in xmm1. DST[63..32] ← IntToSingle(SRC[63..32]);
DST[95..64] ← IntToSingle(SRC[95..64]);
DST[127..96] ← IntToSingle(SRC[127..96])
Convert Packed Singles CVTPS2DQ xmm1, xmm2/m128 Converts four packed single-precision floating-point values from DST[31..0] ← SingleToInt (SRC[31..0]);
to Packed Dwords xmm2/m128 to four packed signed double-word integers in xmm1. DST[63..32] ← SingleToInt (SRC[63..32]);
DST[95..64] ← SingleToInt (SRC[95..64]);
DST[127..96] ← SingleToInt (SRC[127..96])
Convert with CVTTPS2DQ xmm1, xmm2/m128 Converts four packed single-precision floating-point values from DST[31..0] ← TruncateSingleToInt (SRC[31..0]);
Truncation Packed xmm2/m128 to four packed signed double-word integers in xmm1 using DST[63..32] ← TruncateSingleToInt (SRC[63..32]);
Singles to Packed truncation. DST[95..64] ← TruncateSingleToInt (SRC[95..64]);
Dwords DST[127..96] ← TruncateSingleToInt (SRC[127..96])

SSE2 128-Bit SIMD Integer Instructions


Instruction Mnemonic Operands Description Symbolic operations
Add Packed Quad-word Integers PADDQ mm1, mm2/m64 Adds the quad word integer from the source (second) operand to the destination (first) operand.
DST[63..0] ← DST[63..0] + SRC[63..0]
xmm1, xmm2/m128 Adds 2 quad word integers from the source (second) operand to 2 quad word integers in the destination (first) operand.
DST[63..0] ← DST[63..0] + SRC[63..0]
DST[127..64] ← DST[127..64] + SRC[127..64]

Subtract Packed Quad-word Integers PSUBQ mm1, mm2/m64 Subtracts the quad word integer from the source (second) operand from the destination (first) operand.
DST[63..0] ← DST[63..0] - SRC[63..0]
xmm1, xmm2/m128 Subtracts 2 quad word integers from the source (second) operand from 2 quad word integers in the destination (first) operand.
DST[63..0] ← DST[63..0] - SRC[63..0]
DST[127..64] ← DST[127..64] - SRC[127..64]
Multiply Packed Unsigned Double-word Integers PMULUDQ mm1, mm2/m64 Multiplies the unsigned double word integer from the destination (first) operand by the unsigned double word integer from the source (second) operand and stores the quad word result in the destination (first) operand.
DST[63..0] ← DST[31..0] * SRC[31..0]
xmm1, xmm2/m128 Multiplies the low unsigned double word integer of each quad word in the destination (first) operand by the low unsigned double word integer of the corresponding quad word in the source (second) operand and stores the 2 quad word results in the destination (first) operand.
DST[63..0] ← DST[31..0] * SRC[31..0]
DST[127..64] ← DST[95..64] * SRC[95..64]

Packed Shift Left Logical Double Quad-word PSLLDQ xmm1, imm8 Shifts xmm1 left by imm8 bytes while shifting in 0s.
TMP ← Count;
IF (TMP > 15) THEN TMP ← 16;
DST ← DST SHL (TMP * 8)
Packed Shift Right Logical Double Quad-word PSRLDQ xmm1, imm8 Shifts xmm1 right by imm8 bytes while shifting in 0s.
TMP ← Count;
IF (TMP > 15) THEN TMP ← 16;
DST ← DST SHR (TMP * 8)
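The byte-granular shifts PSLLDQ and PSRLDQ, including the clamping of counts above 15, can be sketched in C as follows. The register is represented as an array of 16 bytes with index 0 holding bits 7..0; this layout and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models PSLLDQ: bytes move toward higher indices, zeros enter at index 0. */
static void pslldq_model(uint8_t reg[16], unsigned count)
{
    unsigned n = (count > 15) ? 16 : count;   /* TMP in the pseudocode */
    assert(reg != NULL);
    for (unsigned i = 16; i-- > 0; )          /* descending: source is still intact */
        reg[i] = (i >= n) ? reg[i - n] : 0;
}

/* Models PSRLDQ: bytes move toward lower indices, zeros enter at index 15. */
static void psrldq_model(uint8_t reg[16], unsigned count)
{
    unsigned n = (count > 15) ? 16 : count;
    assert(reg != NULL);
    for (unsigned i = 0; i < 16; i++)         /* ascending: source is still intact */
        reg[i] = (i + n < 16) ? reg[i + n] : 0;
}
```

Any count above 15 clears the whole register, which is the effect of the `TMP ← 16` clamp in the symbolic operations.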

SSE2 Cacheability Control and Instruction Ordering Instructions


Instruction Mnemonic Operands Description Symbolic operations
Flush Cache Line CLFLUSH m8 Invalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy (data and instruction). The invalidation is broadcast throughout the cache coherence domain. If, at any level of the cache hierarchy, the line is inconsistent with the memory, it is written to memory before invalidation.
Store Fence SFENCE Performs a serializing operation on all store-to-memory instructions that were issued prior to the SFENCE instruction.
Load Fence LFENCE Performs a serializing operation on all load-from-memory instructions that were issued prior to the LFENCE instruction.
Memory Fence MFENCE Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction.
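Spin Loop Hint PAUSE Hints to the processor that the code sequence is a spin-wait loop, which improves the performance of such loops.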
Store Selected Bytes of Double Quad-Word Using Non-temporal Hint MASKMOVDQU xmm1, xmm2 Stores selected bytes from the xmm register (first operand) into a 128-bit memory location. The address of the memory location is specified by the DS:[(E)DI] registers. The mask (second) operand selects which bytes from the source operand are written to the memory.
IF (MASK[7]=1) THEN DS:[(E)DI] ← SRC[7..0]
IF (MASK[15]=1) THEN DS:[(E)DI+1] ← SRC[15..8]
IF (MASK[23]=1) THEN DS:[(E)DI+2] ← SRC[23..16]
IF (MASK[31]=1) THEN DS:[(E)DI+3] ← SRC[31..24]
IF (MASK[39]=1) THEN DS:[(E)DI+4] ← SRC[39..32]
IF (MASK[47]=1) THEN DS:[(E)DI+5] ← SRC[47..40]
IF (MASK[55]=1) THEN DS:[(E)DI+6] ← SRC[55..48]
IF (MASK[63]=1) THEN DS:[(E)DI+7] ← SRC[63..56]
IF (MASK[71]=1) THEN DS:[(E)DI+8] ← SRC[71..64]
IF (MASK[79]=1) THEN DS:[(E)DI+9] ← SRC[79..72]
IF (MASK[87]=1) THEN DS:[(E)DI+10] ← SRC[87..80]
IF (MASK[95]=1) THEN DS:[(E)DI+11] ← SRC[95..88]
IF (MASK[103]=1) THEN DS:[(E)DI+12] ← SRC[103..96]
IF (MASK[111]=1) THEN DS:[(E)DI+13] ← SRC[111..104]
IF (MASK[119]=1) THEN DS:[(E)DI+14] ← SRC[119..112]
IF (MASK[127]=1) THEN DS:[(E)DI+15] ← SRC[127..120]
Store Double Quad-Word Using Non-temporal Hint MOVNTDQ m128, xmm Moves the double quad word from xmm to m128 using a non-temporal hint to prevent caching of the data during the write to memory.
DST ← SRC
Store Packed Double-Precision Floating-Point Values Using Non-temporal Hint MOVNTPD m128, xmm Moves the packed double-precision floating-point values from xmm to m128 using a non-temporal hint to minimize cache pollution during the write to memory.
DST ← SRC
Store Double-Word Using Non-temporal Hint MOVNTI m32, r32 Moves the double word integer from the r32 register to the m32 memory using a non-temporal hint to minimize cache pollution during the write to memory.
DST ← SRC
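The byte selection performed by MASKMOVDQU (the most significant bit of each mask byte decides whether the corresponding source byte is stored) can be sketched in C. The sketch is only a model: the function name is hypothetical, the destination is passed as an ordinary pointer instead of DS:[(E)DI], and the non-temporal hint has no equivalent in portable C and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models MASKMOVDQU: byte i of src is written to mem[i] only when
   bit 7 of mask byte i (MASK[8*i+7] in the table) is set. */
static void maskmovdqu_model(uint8_t *mem, const uint8_t src[16],
                             const uint8_t mask[16])
{
    assert(mem != NULL && src != NULL && mask != NULL);
    for (int i = 0; i < 16; i++) {
        if (mask[i] & 0x80u)
            mem[i] = src[i];       /* unselected bytes of mem stay untouched */
    }
}
```

Only bit 7 of each mask byte matters, so a mask byte of 7FH leaves the memory byte unchanged while 80H selects it.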

System Instructions
Instruction Mnemonic Operands Description Symbolic operations
Load Global Descriptor Table Register LGDT m16&32 Loads the values from the memory into the global descriptor table register (GDTR).
GDTR ← SRC
Store Global Descriptor Table Register SGDT m16&32 Stores the contents of the global descriptor table register (GDTR) to the memory.
DST ← GDTR
Load Interrupt Descriptor Table Register LIDT m16&32 Loads the values from the memory into the interrupt descriptor table register (IDTR).
IDTR ← SRC
Store Interrupt Descriptor Table Register SIDT m16&32 Stores the contents of the interrupt descriptor table register (IDTR) to the memory.
DST ← IDTR
Load Local Descriptor Table Register LLDT r/m16 Loads the segment selector from the register or memory into the local descriptor table register (LDTR).
r32
LDTR ← SRC
Store Local Descriptor Table Register SLDT r/m16 Stores the segment selector from the local descriptor table register (LDTR) to the register or memory.
r32
DST ← LDTR
Load Machine Status LMSW r/m16 Loads the machine status word from the register or memory. MSW ← SRC
Word r32
Store Machine Status SMSW r/m16 Stores the machine status word to the register or memory. DST ← MSW
Word r32
Load Task Register LTR r/m16 Loads the source operand into the segment selector field of the task TR ← SRC
register.
Store Task Register STR r/m16 Stores the segment selector field of the task register into the destination operand.
DST ← TR
Move to/from control MOV CR0, r32 Moves r32 to CR0. DST ← SRC
registers CR2, r32 Moves r32 to CR2.
CR3, r32 Moves r32 to CR3.
CR4, r32 Moves r32 to CR4.
r32, CR0 Moves CR0 to r32.
r32, CR2 Moves CR2 to r32.
r32, CR3 Moves CR3 to r32.
r32, CR4 Moves CR4 to r32.
Clear Task-Switch CLTS Clears the TS flag in the CR0 register. This flag is set every time a task CR0.TS ←0;
switch occurs.
Adjust RPL Field of Segment Selector ARPL r/m16, r16 Compares the RPL fields of the two segment selectors. The first (destination) operand contains one segment selector and the second (source) operand contains the other. If the RPL field of the first operand is less than the RPL field of the second operand, it is raised to the RPL of the second operand and ZF is set; otherwise ZF is cleared and the first operand is left unchanged.
IF DST.RPL < SRC.RPL THEN
ZF ← 1;
DST.RPL ← SRC.RPL;
ELSE
ZF ← 0;
END;
Load Access Rights Byte LAR r16, r/m16 Loads the access rights from the segment descriptor specified by the source operand into the destination operand and sets the ZF flag.
r32, r/m32
Load Segment Limit LSL r16, r/m16 Loads the unscrambled segment limit from the segment descriptor specified by the source operand into the destination operand and sets the ZF flag.
r32, r/m32
Verify a Segment for VERR r/m16 Sets ZF=1 if segment specified with r/m16 can be read.
Reading
Verify a Segment for VERW r/m16 Sets ZF=1 if segment specified with r/m16 can be written.
Writing
Move to/from Debug MOV r32, DR0-DR7 Moves debug register to r32. DST ← SRC
Registers DR0-DR7, r32 Moves r32 to debug register.
Invalidate Cache INVD Flushes internal caches; initiates flushing of external caches.
Write Back and Invalidate Data Cache WBINVD Writes back and flushes internal caches; initiates writing-back and flushing of external caches.
Invalidate TLB Entry INVLPG Invalidates (flushes) the translation look-aside buffer (TLB) entry specified with the source operand.
Assert LOCK# Signal Prefix LOCK (prefix) Causes the processor LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.
Halt HLT Stops instruction execution and places the processor in a HALT state. An
enabled interrupt (including NMI and SMI), a debug exception, the
BINIT# signal, the INIT# signal or the RESET# signal will resume
execution.
Resume from System Management Mode RSM Returns program control from system management mode (SMM) to the application program or operating-system procedure that was interrupted when the processor received an SMI interrupt.
Read from Model-Specific Register RDMSR Reads the contents of the 64-bit model specific register (MSR) specified in the ECX register into registers EDX:EAX.
Write to Model-Specific Register WRMSR Writes the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register.
Read Performance Monitoring Counters RDPMC Reads the contents of the 40-bit performance-monitoring counter (PMC) specified in the ECX register into registers EDX:EAX.
Read Time-Stamp Counter RDTSC Reads the current value of the processor's time-stamp counter into the EDX:EAX registers.
Fast System Call SYSENTER Fast call to privilege level 0 system procedures or routines.
Fast Return from Fast System Call SYSEXIT Executes a fast return to privilege level 3 user code.
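The RPL adjustment listed for ARPL in the table above can be modelled in a few lines of C. The sketch assumes the usual selector layout in which the RPL occupies bits 1..0 of the 16-bit segment selector; the function name is hypothetical and the return value stands in for the ZF flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models ARPL: raises the RPL of *dst to the RPL of src when it is lower.
   Returns the resulting ZF value (true when the selector was adjusted). */
static bool arpl_model(uint16_t *dst, uint16_t src)
{
    assert(dst != NULL);
    unsigned dst_rpl = *dst & 3u;        /* RPL = selector bits 1..0 */
    unsigned src_rpl = src & 3u;
    if (dst_rpl < src_rpl) {
        *dst = (uint16_t)((*dst & ~3u) | src_rpl);
        return true;                     /* ZF = 1 */
    }
    return false;                        /* ZF = 0, selector unchanged */
}
```

For example, adjusting selector 0008H (RPL 0) against 0013H (RPL 3) yields 000BH with ZF set, so the selector can no longer be used with more privilege than its originator had.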
