
X86 Instruction Listings

The document discusses the x86 instruction set and its evolution over time. It lists the original 8086/8088 instruction set and notes how it has been extended with new instructions in subsequent processors. The document is organized by type of instructions such as integer, floating-point, SIMD, and cryptographic instructions.

x86 instruction listings

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The
instructions are usually part of an executable program, often stored as a computer file and executed on the
processor.

The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as
new functionality.[1]

Contents
x86 integer instructions
Original 8086/8088 instructions
Added in specific processors
Added with 80186/80188
Added with 80286
Added with 80386
Added with 80486
Added with Pentium
Added with Pentium MMX
Added with AMD K6
Added with Pentium Pro
Added with Pentium II
Added with SSE
Added with SSE2
Added with SSE3
Added with SSE4.2
Added with x86-64
Added with AMD-V
Added with Intel VT-x
Added with ABM
Added with BMI1
Added with BMI2
Added with TBM
Added with CLMUL instruction set
Added with Intel ADX
x87 floating-point instructions
Original 8087 instructions
Added in specific processors
Added with 80287
Added with 80387
Added with Pentium Pro
Added with SSE
Added with SSE3
SIMD instructions
MMX instructions
Original MMX instructions
MMX instructions added in specific processors
EMMI instructions
MMX instructions added with MMX+ and SSE
MMX instructions added with SSE2
MMX instructions added with SSSE3
3DNow! instructions
3DNow!+ instructions
Added with Athlon and K6-2+
Added with Geode GX
SSE instructions
SSE2 instructions
SSE2 SIMD floating-point instructions
SSE2 data movement instructions
SSE2 packed arithmetic instructions
SSE2 logical instructions
SSE2 compare instructions
SSE2 shuffle and unpack instructions
SSE2 conversion instructions
SSE2 SIMD integer instructions
SSE2 MMX-like instructions extended to SSE registers
SSE2 integer instructions for SSE registers only
SSE3 instructions
SSE3 SIMD floating-point instructions
SSE3 SIMD integer instructions
SSSE3 instructions
SSE4 instructions
SSE4.1
SSE4.1 SIMD floating-point instructions
SSE4.1 SIMD integer instructions
SSE4a
SSE4.2
SSE5 derived instructions
XOP
F16C
FMA3
FMA4
AVX
AVX2
AVX-512
AVX-512 foundation
Cryptographic instructions
Intel AES instructions
RDRAND and RDSEED
Intel SHA instructions
Undocumented instructions
Undocumented x86 instructions
Undocumented x87 instructions
See also
References
External links

x86 integer instructions


The table below lists the full 8086/8088 instruction set of Intel. Nearly all of these instructions are also available in 32-bit mode, where they operate on 32-bit registers (EAX, EBX, etc.) and values instead of their 16-bit counterparts (AX, BX, etc.). See also x86 assembly language for a quick tutorial on this processor family. The extended instruction set is also grouped by architecture generation (i386, i486, i686) and, more generally, is referred to as x86-32 and x86-64 (also known as AMD64).

Original 8086/8088 instructions


Original 8086/8088 instruction set
AAA: ASCII adjust AL after addition
    Used with unpacked binary-coded decimal.
    Opcode: 0x37
AAD: ASCII adjust AX before division
    The 8086/8088 datasheet documents only the base-10 version of the AAD instruction (opcode 0xD5 0x0A), but any other base will work. Later Intel documentation has the generic form too. NEC V20 and V30 (and possibly other NEC V-series CPUs) always use base 10 and ignore the argument, causing a number of incompatibilities.
    Opcode: 0xD5
AAM: ASCII adjust AX after multiplication
    Only the base-10 version (operand 0x0A) is documented; see notes for AAD.
    Opcode: 0xD4
AAS: ASCII adjust AL after subtraction
    Opcode: 0x3F
ADC: Add with carry
    destination = destination + source + carry_flag
    Opcode: 0x10…0x15, 0x80…0x81/2, 0x82…0x83/2 (since 80186)
ADD: Add
    (1) r/m += r/imm; (2) r += m/imm;
    Opcode: 0x00…0x05, 0x80/0…0x81/0, 0x82/0…0x83/0 (since 80186)
AND: Logical AND
    (1) r/m &= r/imm; (2) r &= m/imm;
    Opcode: 0x20…0x25, 0x80…0x81/4, 0x82…0x83/4 (since 80186)
CALL: Call procedure
    push (E)IP, which points to the instruction directly after the call; far calls (0x9A, 0xFF/3) push CS first.
    Opcode: 0x9A, 0xE8, 0xFF/2, 0xFF/3
CBW: Convert byte to word
    Opcode: 0x98
CLC: Clear carry flag
    CF = 0;
    Opcode: 0xF8
CLD: Clear direction flag
    DF = 0;
    Opcode: 0xFC
CLI: Clear interrupt flag
    IF = 0;
    Opcode: 0xFA
CMC: Complement carry flag
    CF = !CF;
    Opcode: 0xF5
CMP: Compare operands
    Opcode: 0x38…0x3D, 0x80…0x81/7, 0x82…0x83/7 (since 80186)
CMPSB: Compare bytes in memory
    Opcode: 0xA6
CMPSW: Compare words
    Opcode: 0xA7
CWD: Convert word to doubleword
    Opcode: 0x99
DAA: Decimal adjust AL after addition
    Used with packed binary-coded decimal.
    Opcode: 0x27
DAS: Decimal adjust AL after subtraction
    Opcode: 0x2F
DEC: Decrement by 1
    Opcode: 0x48…0x4F, 0xFE/1, 0xFF/1
DIV: Unsigned divide
    (1) AX = DX:AX / r/m; resulting DX = remainder; (2) AL = AX / r/m; resulting AH = remainder
    Opcode: 0xF7/6, 0xF6/6
ESC: Used with floating-point unit
    Opcode: 0xD8…0xDF
HLT: Enter halt state
    Opcode: 0xF4
IDIV: Signed divide
    (1) AX = DX:AX / r/m; resulting DX = remainder; (2) AL = AX / r/m; resulting AH = remainder
    Opcode: 0xF7/7, 0xF6/7
IMUL: Signed multiply
    (1) DX:AX = AX * r/m; (2) AX = AL * r/m
    Opcode: 0x69, 0x6B (both since 80186), 0xF7/5, 0xF6/5, 0x0FAF (since 80386)
IN: Input from port
    (1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX = port[DX];
    Opcode: 0xE4, 0xE5, 0xEC, 0xED
INC: Increment by 1
    Opcode: 0x40…0x47, 0xFE/0, 0xFF/0
INT: Call to interrupt
    Opcode: 0xCC, 0xCD
INTO: Call to interrupt if overflow
    Opcode: 0xCE
IRET: Return from interrupt
    Opcode: 0xCF
Jcc: Jump if condition
    (JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ)
    Opcode: 0x70…0x7F, 0x0F80…0x0F8F (since 80386)
JCXZ: Jump if CX is zero
    Opcode: 0xE3
JMP: Jump
    Opcode: 0xE9…0xEB, 0xFF/4, 0xFF/5
LAHF: Load FLAGS into AH register
    Opcode: 0x9F
LDS: Load pointer using DS
    Opcode: 0xC5
LEA: Load Effective Address
    Opcode: 0x8D
LES: Load ES with pointer
    Opcode: 0xC4
LOCK: Assert BUS LOCK# signal (for multiprocessing)
    Opcode: 0xF0
LODSB: Load string byte
    if (DF==0) AL = *SI++; else AL = *SI--;
    Opcode: 0xAC
LODSW: Load string word
    if (DF==0) AX = *SI++; else AX = *SI--;
    Opcode: 0xAD
LOOP/LOOPx: Loop control
    (LOOPE, LOOPNE, LOOPNZ, LOOPZ) if (--CX && x) goto lbl; CX is decremented on every pass, and for plain LOOP the condition x is always true.
    Opcode: 0xE0…0xE2
MOV: Move
    Copies data from one location to another: (1) r/m = r; (2) r = r/m;
    Opcode: 0x88…0x8C, 0x8E, 0xA0…0xA3, 0xB0…0xBF, 0xC6, 0xC7
MOVSB: Move byte from string to string
    if (DF==0) *(byte*)DI++ = *(byte*)SI++; else *(byte*)DI-- = *(byte*)SI--;
    Opcode: 0xA4
MOVSW: Move word from string to string
    if (DF==0) *(word*)DI++ = *(word*)SI++; else *(word*)DI-- = *(word*)SI--;
    Opcode: 0xA5
MUL: Unsigned multiply
    (1) DX:AX = AX * r/m; (2) AX = AL * r/m;
    Opcode: 0xF7/4, 0xF6/4
NEG: Two's complement negation
    r/m *= -1;
    Opcode: 0xF6/3…0xF7/3
NOP: No operation
    Opcode equivalent to XCHG AX, AX (XCHG EAX, EAX in 32-bit code).
    Opcode: 0x90
NOT: Logical NOT (one's complement)
    r/m ^= -1;
    Opcode: 0xF6/2…0xF7/2
OR: Logical OR
    (1) r/m |= r/imm; (2) r |= m/imm;
    Opcode: 0x08…0x0D, 0x80…0x81/1, 0x82…0x83/1 (since 80186)
OUT: Output to port
    (1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX] = AX;
    Opcode: 0xE6, 0xE7, 0xEE, 0xEF
POP: Pop data from stack
    r/m = *SP++; POP CS (opcode 0x0F) works only on 8086/8088. Later CPUs use 0x0F as a prefix for newer instructions.
    Opcode: 0x07, 0x0F (8086/8088 only), 0x17, 0x1F, 0x58…0x5F, 0x8F/0
POPF: Pop FLAGS register from stack
    FLAGS = *SP++;
    Opcode: 0x9D
PUSH: Push data onto stack
    *--SP = r/m;
    Opcode: 0x06, 0x0E, 0x16, 0x1E, 0x50…0x57, 0x68, 0x6A (both since 80186), 0xFF/6
PUSHF: Push FLAGS onto stack
    *--SP = FLAGS;
    Opcode: 0x9C
RCL: Rotate left (with carry)
    Opcode: 0xC0…0xC1/2 (since 80186), 0xD0…0xD3/2
RCR: Rotate right (with carry)
    Opcode: 0xC0…0xC1/3 (since 80186), 0xD0…0xD3/3
REPxx: Repeat MOVS/STOS/CMPS/LODS/SCAS
    (REP, REPE, REPNE, REPNZ, REPZ)
    Opcode: 0xF2, 0xF3
RET: Return from procedure
    Not a real instruction. The assembler translates it to a RETN or a RETF depending on the memory model of the target system.
RETN: Return from near procedure
    Opcode: 0xC2, 0xC3
RETF: Return from far procedure
    Opcode: 0xCA, 0xCB
ROL: Rotate left
    Opcode: 0xC0…0xC1/0 (since 80186), 0xD0…0xD3/0
ROR: Rotate right
    Opcode: 0xC0…0xC1/1 (since 80186), 0xD0…0xD3/1
SAHF: Store AH into FLAGS
    Opcode: 0x9E
SAL: Shift arithmetically left (signed shift left)
    (1) r/m <<= 1; (2) r/m <<= CL;
    Opcode: 0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4
SAR: Shift arithmetically right (signed shift right)
    (1) (signed) r/m >>= 1; (2) (signed) r/m >>= CL;
    Opcode: 0xC0…0xC1/7 (since 80186), 0xD0…0xD3/7
SBB: Subtraction with borrow
    An alternative 1-byte encoding of SBB AL, AL is available via the undocumented SALC instruction.
    Opcode: 0x18…0x1D, 0x80…0x81/3, 0x82…0x83/3 (since 80186)
SCASB: Compare byte string
    Opcode: 0xAE
SCASW: Compare word string
    Opcode: 0xAF
SHL: Shift left (unsigned shift left)
    Opcode: 0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4
SHR: Shift right (unsigned shift right)
    Opcode: 0xC0…0xC1/5 (since 80186), 0xD0…0xD3/5
STC: Set carry flag
    CF = 1;
    Opcode: 0xF9
STD: Set direction flag
    DF = 1;
    Opcode: 0xFD
STI: Set interrupt flag
    IF = 1;
    Opcode: 0xFB
STOSB: Store byte in string
    if (DF==0) *ES:DI++ = AL; else *ES:DI-- = AL;
    Opcode: 0xAA
STOSW: Store word in string
    if (DF==0) *ES:DI++ = AX; else *ES:DI-- = AX;
    Opcode: 0xAB
SUB: Subtraction
    (1) r/m -= r/imm; (2) r -= m/imm;
    Opcode: 0x28…0x2D, 0x80…0x81/5, 0x82…0x83/5 (since 80186)
TEST: Logical compare (AND)
    (1) r/m & r/imm; (2) r & m/imm; only the flags are updated.
    Opcode: 0x84, 0x85, 0xA8, 0xA9, 0xF6/0, 0xF7/0
WAIT: Wait until not busy
    Waits until the BUSY# pin is inactive (used with floating-point unit).
    Opcode: 0x9B
XCHG: Exchange data
    r :=: r/m; A spinlock typically uses xchg as an atomic operation (see also the Cyrix coma bug).
    Opcode: 0x86, 0x87, 0x91…0x97
XLAT: Table look-up translation
    Behaves like MOV AL, [BX+AL]
    Opcode: 0xD7
XOR: Exclusive OR
    (1) r/m ^= r/imm; (2) r ^= m/imm;
    Opcode: 0x30…0x35, 0x80…0x81/6, 0x82…0x83/6 (since 80186)

Added in specific processors

Added with 80186/80188


BOUND: Check array index against bounds
    Raises software interrupt 5 if the test fails.
ENTER: Enter stack frame
    Modifies the stack for entry to a procedure for a high-level language. Takes two operands: the amount of storage to be allocated on the stack and the nesting level of the procedure.
INS: Input from port to string
    Equivalent to:
        IN (E)AX, DX
        MOV ES:[(E)DI], (E)AX
        ; adjust (E)DI according to operand size and DF
LEAVE: Leave stack frame
    Releases the local stack storage created by the previous ENTER instruction.
OUTS: Output string to port
    Equivalent to:
        MOV (E)AX, DS:[(E)SI]
        OUT DX, (E)AX
        ; adjust (E)SI according to operand size and DF
POPA: Pop all general-purpose registers from stack
    Equivalent to:
        POP DI
        POP SI
        POP BP
        POP AX ; no POP SP here; all it does is ADD SP, 2 (since AX will be overwritten later)
        POP BX
        POP DX
        POP CX
        POP AX
PUSHA: Push all general-purpose registers onto stack
    Equivalent to:
        PUSH AX
        PUSH CX
        PUSH DX
        PUSH BX
        PUSH SP ; the value stored is the initial SP value
        PUSH BP
        PUSH SI
        PUSH DI
PUSH immediate: Push an immediate byte/word value onto the stack
    Examples:
        PUSH 12h
        PUSH 1200h
IMUL immediate: Signed multiplication of immediate byte/word value
    Examples:
        IMUL BX,12h
        IMUL DX,1200h
        IMUL CX, DX, 12h
        IMUL BX, SI, 1200h
        IMUL DI, word ptr [BX+SI], 12h
        IMUL SI, word ptr [BP-4], 1200h
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR immediate: Rotate/shift bits with an immediate value greater than 1
    Examples:
        ROL AX,3
        SHR BL,3

Added with 80286

Instruction Meaning Notes


ARPL Adjust RPL field of selector
CLTS Clear task-switched flag in register CR0
LAR Load access rights byte
LGDT Load global descriptor table
LIDT Load interrupt descriptor table
LLDT Load local descriptor table
LMSW Load machine status word
LOADALL Load all CPU registers, including internal ones such as GDT Undocumented, 80286 and 80386 only
LSL Load segment limit
LTR Load task register
SGDT Store global descriptor table
SIDT Store interrupt descriptor table
SLDT Store local descriptor table
SMSW Store machine status word
STR Store task register
VERR Verify a segment for reading
VERW Verify a segment for writing

Added with 80386


BSF: Bit scan forward
BSR: Bit scan reverse
BT: Bit test
BTC: Bit test and complement
BTR: Bit test and reset
BTS: Bit test and set
CDQ: Convert double-word to quad-word
    Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since IDIV uses EDX:EAX as its dividend, CDQ is executed after setting EAX when dividing a signed 32-bit value, unless EDX has been initialized manually (as in a 64/32 division). For unsigned DIV, EDX is usually zeroed instead.
CMPSD: Compare string double-word
    Compares ES:[(E)DI] with DS:[(E)SI] and increments or decrements both (E)DI and (E)SI, depending on DF; can be prefixed with REP.
CWDE: Convert word to double-word
    Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX.
IBTS: Insert Bit String
    Discontinued with the B1 stepping of the 80386.
INSD: Input from port to string double-word
IRETx: Interrupt return; D suffix means 32-bit return, F suffix means do not generate epilogue code (i.e. LEAVE instruction)
    Use IRETD rather than IRET in 32-bit situations.
JECXZ: Jump if ECX is zero
LFS, LGS: Load far pointer
LSS: Load stack segment
LODSD: Load string double-word
    EAX = *DS:ESI±±; (±± depends on DF; DS can be overridden with a segment prefix); can be prefixed with REP.
LOOPW, LOOPccW: Loop, conditional loop
    Same as LOOP, LOOPcc for earlier processors (CX is the counter).
LOOPD, LOOPccD: Loop, conditional loop (ECX is the counter)
    if (--ECX && cc) goto lbl; cc = Z(ero), E(qual), N(on)Z(ero), N(on)E(qual)
MOV to/from CR/DR/TR: Move to/from special registers
    CR = control registers, DR = debug registers, TR = test registers (up to 80486).
MOVSD: Move string double-word
    *(dword*)ES:EDI±± = *(dword*)ESI±±; (±± depends on DF); can be prefixed with REP.
MOVSX: Move with sign-extension
    (long)r = (signed char) r/m; and similar
MOVZX: Move with zero-extension
    (long)r = (unsigned char) r/m; and similar
OUTSD: Output to port from string double-word
    port[DX] = *(long*)ESI±±; (±± depends on DF)
POPAD: Pop all double-word (32-bit) registers from stack
    Does not load ESP from the stack; the saved ESP value is skipped.
POPFD: Pop data into EFLAGS register
PUSHAD: Push all double-word (32-bit) registers onto stack
PUSHFD: Push EFLAGS register onto stack
SCASD: Scan string data double-word
    Compares ES:[(E)DI] with EAX and increments or decrements (E)DI, depending on DF; can be prefixed with REP.
SETcc: Set byte to one on condition, zero otherwise
    (SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ)
SHLD: Shift left double-word
    r1 = r1<<CL | r2>>(32-CL); instead of CL, an 8-bit immediate count can be used.
SHRD: Shift right double-word
    r1 = r1>>CL | r2<<(32-CL); instead of CL, an 8-bit immediate count can be used.
STOSD: Store string double-word
    *ES:EDI±± = EAX; (±± depends on DF, ES cannot be overridden); can be prefixed with REP.
XBTS: Extract Bit String
    Discontinued with the B1 stepping of the 80386.

Added with 80486

BSWAP: Byte Swap
    r = r<<24 | r<<8&0x00FF0000 | r>>8&0x0000FF00 | r>>24; Only defined for 32-bit registers. Usually used to change between little-endian and big-endian representations. When used with 16-bit registers, it produces various different results on 486,[2] 586, and Bochs/QEMU.[3]
CMPXCHG: atomic CoMPare and eXCHanGe
    See compare-and-swap. Also available as an undocumented opcode on later 80386 processors.
INVD: Invalidate Internal Caches
    Flush internal caches.
INVLPG: Invalidate TLB Entry
    Invalidate the TLB entry for the page that contains the data specified.
WBINVD: Write Back and Invalidate Cache
    Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal caches.
XADD: eXchange and ADD
    Exchanges the first operand with the second operand, then loads the sum of the two values into the destination operand.

Added with Pentium


CPUID: CPU IDentification
    Returns data regarding processor identification and features in the EAX, EBX, ECX, and EDX registers; the function to execute is selected by the value in EAX.[1] This was also added to later 80486 processors.
CMPXCHG8B: CoMPare and eXCHanGe 8 bytes
    Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.
RDMSR: ReaD from Model-specific register
    Load the MSR specified by ECX into EDX:EAX.
RDTSC: ReaD Time Stamp Counter
    Returns, in EDX:EAX, the number of processor ticks since the processor was last reset (normally since the last power-on of the system).
WRMSR: WRite to Model-Specific Register
    Write the value in EDX:EAX to the MSR specified by ECX.
RSM[4]: Resume from System Management Mode
    Resumes from System Management Mode (SMM). This was introduced by the i386SL and later and is also in the i486SL and later.

Added with Pentium MMX

RDPMC: Read the PMC (Performance Monitoring Counter)
    Reads the PMC specified in the ECX register into EDX:EAX.

MMX registers and MMX support instructions were also added. The MMX registers share storage with the x87 floating-point registers; see below.

Added with AMD K6

Instruction Meaning Notes


SYSCALL functionally equivalent to SYSENTER
SYSRET functionally equivalent to SYSEXIT

AMD changed the CPUID detection bit for this feature from the K6-II on.

Added with Pentium Pro

CMOVcc: Conditional move
    (CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, CMOVNZ, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ)
UD2: Undefined Instruction
    Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose.

Added with Pentium II

SYSENTER: SYStem call ENTER
    Sometimes called the Fast System Call instruction, this instruction was intended to increase the performance of operating system calls. Note that on the Pentium Pro, the CPUID instruction incorrectly reports these instructions as available.
SYSEXIT: SYStem call EXIT

Added with SSE

NOP r/m16, NOP r/m32: Multi-byte no-operation instruction
    Opcode: 0F 1F /0
PREFETCHT0: Prefetch Data from Address
    Prefetch into all cache levels.
    Opcode: 0F 18 /1
PREFETCHT1: Prefetch Data from Address
    Prefetch into all cache levels EXCEPT L1.[5][6]
    Opcode: 0F 18 /2
PREFETCHT2: Prefetch Data from Address
    Prefetch into all cache levels EXCEPT L1 and L2.
    Opcode: 0F 18 /3
PREFETCHNTA: Prefetch Data from Address
    Prefetch to a non-temporal cache structure, minimizing cache pollution.
    Opcode: 0F 18 /0
SFENCE: Store Fence
    Guarantees that all store operations issued before the SFENCE are globally visible before any store issued after it.
    Opcode: 0F AE F8

Added with SSE2

CLFLUSH m8: Cache Line Flush
    Invalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy.
    Opcode: 0F AE /7
LFENCE: Load Fence
    Serializes load operations.
    Opcode: 0F AE E8
MFENCE: Memory Fence
    Performs a serializing operation on all load and store instructions that were issued prior to the MFENCE instruction.
    Opcode: 0F AE F0
MOVNTI m32, r32: Move Doubleword Non-Temporal
    Move doubleword from r32 to m32, minimizing pollution in the cache hierarchy.
    Opcode: 0F C3 /r
PAUSE: Spin Loop Hint
    Provides a hint to the processor that the following code is a spin-wait loop, which can improve the loop's performance and reduce power consumption.
    Opcode: F3 90

Added with SSE3


MONITOR EAX, ECX, EDX: Setup Monitor Address
    Sets up a linear address range to be monitored by hardware and activates the monitor.
MWAIT EAX, ECX: Monitor Wait
    Processor hint to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.

Added with SSE4.2

CRC32: Accumulate CRC32
    Computes a CRC value using the CRC-32C (Castagnoli) polynomial 0x11EDC6F41 (normal form 0x1EDC6F41). This is the polynomial used in iSCSI. In contrast to the more popular one used in Ethernet, its parity is even, and it can thus detect any error with an odd number of changed bits.
    Forms and opcodes:
        CRC32 r32, r/m8: F2 0F 38 F0 /r
        CRC32 r32, r/m8: F2 REX 0F 38 F0 /r
        CRC32 r32, r/m16: F2 0F 38 F1 /r
        CRC32 r32, r/m32: F2 0F 38 F1 /r
        CRC32 r64, r/m8: F2 REX.W 0F 38 F0 /r
        CRC32 r64, r/m64: F2 REX.W 0F 38 F1 /r

Added with x86-64


Instruction Meaning Notes
CDQE Sign extend EAX into RAX
CQO Sign extend RAX into RDX:RAX
CMPSQ CoMPare String Quadword
CMPXCHG16B CoMPare and eXCHanGe 16 Bytes
IRETQ 64-bit Return from Interrupt
JRCXZ Jump if RCX is zero
LODSQ LoaD String Quadword
MOVSXD MOV with Sign Extend 32-bit to 64-bit
POPFQ POP RFLAGS Register
PUSHFQ PUSH RFLAGS Register
RDTSCP ReaD Time Stamp Counter and Processor ID
SCASQ SCAn String Quadword
STOSQ STOre String Quadword
SWAPGS Exchange GS base with KernelGSBase MSR

Added with AMD-V

CLGI: Clear Global Interrupt Flag
    Clears the GIF.
    Opcode: 0x0F 0x01 0xDD
INVLPGA: Invalidate TLB entry in a specified ASID
    Invalidates the TLB mapping for the virtual page specified in RAX and the ASID specified in ECX.
    Opcode: 0x0F 0x01 0xDF
MOV(CRn): Move to or from control registers
    Moves 32- or 64-bit contents to a control register and vice versa.
    Opcode: 0x0F 0x22 or 0x0F 0x20
MOV(DRn): Move to or from debug registers
    Moves 32- or 64-bit contents to a debug register and vice versa.
    Opcode: 0x0F 0x23 or 0x0F 0x21
SKINIT: Secure Init and Jump with Attestation
    Verifiable startup of trusted software based on secure hash comparison.
    Opcode: 0x0F 0x01 0xDE
STGI: Set Global Interrupt Flag
    Sets the GIF.
    Opcode: 0x0F 0x01 0xDC
VMLOAD: Load state from VMCB
    Loads a subset of processor state from the VMCB specified by the physical address in the RAX register.
    Opcode: 0x0F 0x01 0xDA
VMMCALL: Call VMM
    Used exclusively to communicate with the VMM.
    Opcode: 0x0F 0x01 0xD9
VMRUN: Run virtual machine
    Performs a switch to the guest OS.
    Opcode: 0x0F 0x01 0xD8
VMSAVE: Save state to VMCB
    Saves additional guest state to the VMCB.
    Opcode: 0x0F 0x01 0xDB

Added with Intel VT-x


INVEPT: Invalidate Translations Derived from EPT
    Invalidates EPT-derived entries in the TLBs and paging-structure caches.
    Opcode: 0x66 0x0F 0x38 0x80
INVVPID: Invalidate Translations Based on VPID
    Invalidates entries in the TLBs and paging-structure caches based on VPID.
    Opcode: 0x66 0x0F 0x38 0x81
VMFUNC: Invoke VM function
    Invokes the VM function specified in EAX.
    Opcode: 0x0F 0x01 0xD4
VMPTRLD: Load Pointer to Virtual-Machine Control Structure
    Loads the current-VMCS pointer from memory.
    Opcode: 0x0F 0xC7/6
VMPTRST: Store Pointer to Virtual-Machine Control Structure
    Stores the current-VMCS pointer into a specified memory address. The operand of this instruction is always 64 bits and is always in memory.
    Opcode: 0x0F 0xC7/7
VMCLEAR: Clear Virtual-Machine Control Structure
    Writes any cached data to the VMCS.
    Opcode: 0x66 0x0F 0xC7/6
VMREAD: Read Field from Virtual-Machine Control Structure
    Reads out a field in the VMCS.
    Opcode: 0x0F 0x78
VMWRITE: Write Field to Virtual-Machine Control Structure
    Modifies a field in the VMCS.
    Opcode: 0x0F 0x79
VMCALL: Call to VM Monitor
    Calls a VM Monitor function from the guest system.
    Opcode: 0x0F 0x01 0xC1
VMLAUNCH: Launch Virtual Machine
    Launches the virtual machine managed by the current VMCS.
    Opcode: 0x0F 0x01 0xC2
VMRESUME: Resume Virtual Machine
    Resumes the virtual machine managed by the current VMCS.
    Opcode: 0x0F 0x01 0xC3
VMXOFF: Leave VMX Operation
    Stops the hardware-supported virtualisation environment.
    Opcode: 0x0F 0x01 0xC4
VMXON: Enter VMX Operation
    Enters the hardware-supported virtualisation environment.
    Opcode: 0xF3 0x0F 0xC7/6

Added with ABM

LZCNT, POPCNT (POPulation CouNT) – advanced bit manipulation

Added with BMI1

ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT

Added with BMI2

BZHI, MULX, PDEP, PEXT, RORX, SARX, SHRX, SHLX

Added with TBM

AMD introduced TBM together with BMI1 in its Piledriver[7] line of processors; later AMD Jaguar and Zen-based processors do not support TBM.[8] No Intel processors (as of 2020) support TBM.
Instruction: Description[9]
    Equivalent C expression[10]
BEXTR: Bit field extract (with immediate)
    (src >> start) & ((1 << len) - 1)
BLCFILL: Fill from lowest clear bit
    x & (x + 1)
BLCI: Isolate lowest clear bit
    x | ~(x + 1)
BLCIC: Isolate lowest clear bit and complement
    ~x & (x + 1)
BLCMSK: Mask from lowest clear bit
    x ^ (x + 1)
BLCS: Set lowest clear bit
    x | (x + 1)
BLSFILL: Fill from lowest set bit
    x | (x - 1)
BLSIC: Isolate lowest set bit and complement
    ~x | (x - 1)
T1MSKC: Inverse mask from trailing ones
    ~x | (x + 1)
TZMSK: Mask from trailing zeros
    ~x & (x - 1)

Added with CLMUL instruction set

PCLMULQDQ xmmreg, xmmrm, imm: Perform a carry-less multiplication of two 64-bit polynomials over GF(2), the basic operation behind arithmetic in the finite field GF(2^k).
    Opcode: 66 0F 3A 44 /r ib
PCLMULLQLQDQ xmmreg, xmmrm: Multiply the low halves of the two registers.
    Opcode: 66 0F 3A 44 /r 00
PCLMULHQLQDQ xmmreg, xmmrm: Multiply the high half of the destination register by the low half of the source register.
    Opcode: 66 0F 3A 44 /r 01
PCLMULLQHQDQ xmmreg, xmmrm: Multiply the low half of the destination register by the high half of the source register.
    Opcode: 66 0F 3A 44 /r 10
PCLMULHQHQDQ xmmreg, xmmrm: Multiply the high halves of the two registers.
    Opcode: 66 0F 3A 44 /r 11

Added with Intel ADX

ADCX: Adds two unsigned integers plus carry, reading the carry from the carry flag and, if necessary, setting it there. Does not affect flags other than the carry flag.
ADOX: Adds two unsigned integers plus carry, reading the carry from the overflow flag and, if necessary, setting it there. Does not affect flags other than the overflow flag.

x87 floating-point instructions

Original 8087 instructions


Instruction Meaning Notes
F2XM1 2^x − 1 (more precise than computing 2^x and subtracting 1 for x close to zero)
FABS Absolute value
FADD Add
FADDP Add and pop
FBLD Load BCD
FBSTP Store BCD and pop
FCHS Change sign
FCLEX Clear exceptions
FCOM Compare
FCOMP Compare and pop
FCOMPP Compare and pop twice
FDECSTP Decrement floating point stack pointer
FDISI Disable interrupts 8087 only, otherwise FNOP
FDIV Divide Pentium FDIV bug
FDIVP Divide and pop
FDIVR Divide reversed
FDIVRP Divide reversed and pop
FENI Enable interrupts 8087 only, otherwise FNOP
FFREE Free register
FIADD Integer add
FICOM Integer compare
FICOMP Integer compare and pop
FIDIV Integer divide
FIDIVR Integer divide reversed
FILD Load integer
FIMUL Integer multiply
FINCSTP Increment floating point stack pointer
FINIT Initialize floating point processor
FIST Store integer
FISTP Store integer and pop
FISUB Integer subtract
FISUBR Integer subtract reversed
FLD Floating point load
FLD1 Load 1.0 onto stack
FLDCW Load control word
FLDENV Load environment state
FLDENVW Load environment state, 16-bit
FLDL2E Load log2(e) onto stack
FLDL2T Load log2(10) onto stack
FLDLG2 Load log10(2) onto stack
FLDLN2 Load ln(2) onto stack
FLDPI Load π onto stack
FLDZ Load 0.0 onto stack
FMUL Multiply
FMULP Multiply and pop
FNCLEX Clear exceptions, no wait
FNDISI Disable interrupts, no wait 8087 only, otherwise FNOP
FNENI Enable interrupts, no wait 8087 only, otherwise FNOP
FNINIT Initialize floating point processor, no wait
FNOP No operation
FNSAVE Save FPU state, no wait, 8-bit
FNSAVEW Save FPU state, no wait, 16-bit
FNSTCW Store control word, no wait
FNSTENV Store FPU environment, no wait
FNSTENVW Store FPU environment, no wait, 16-bit
FNSTSW Store status word, no wait
FPATAN Partial arctangent
FPREM Partial remainder
FPTAN Partial tangent
FRNDINT Round to integer
FRSTOR Restore saved state
FRSTORW Restore saved state Perhaps not actually available in 8087
FSAVE Save FPU state
FSAVEW Save FPU state, 16-bit
FSCALE Scale by factor of 2
FSQRT Square root
FST Floating point store
FSTCW Store control word
FSTENV Store FPU environment
FSTENVW Store FPU environment, 16-bit
FSTP Store and pop
FSTSW Store status word
FSUB Subtract
FSUBP Subtract and pop
FSUBR Reverse subtract
FSUBRP Reverse subtract and pop
FTST Test for zero
FWAIT Wait while FPU is executing
FXAM Examine condition flags
FXCH Exchange registers
FXTRACT Extract exponent and significand

FYL2X y · log2(x) (if y = log_b(2), then the base-b logarithm of x is computed)
FYL2XP1 y · log2(x + 1) (more precise than computing log2(z) with z = x + 1 when x is close to zero)

Added in specific processors

Added with 80287

Instruction Meaning Notes


FSETPM Set protected mode 80287 only, otherwise FNOP

Added with 80387

Instruction Meaning Notes


FCOS Cosine
FLDENVD Load environment state, 32-bit
FSAVED Save FPU state, 32-bit
FPREM1 Partial remainder Computes IEEE remainder
FRSTORD Restore saved state, 32-bit
FSIN Sine
FSINCOS Sine and cosine
FSTENVD Store FPU environment, 32-bit
FUCOM Unordered compare
FUCOMP Unordered compare and pop
FUCOMPP Unordered compare and pop twice

Added with Pentium Pro


FCMOV variants: FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE,
FCMOVNU, FCMOVU
FCOMI variants: FCOMI, FCOMIP, FUCOMI, FUCOMIP

Added with SSE

FXRSTOR, FXSAVE
These are also supported on later Pentium IIs which do not contain SSE support

Added with SSE3

FISTTP (x87 to integer conversion with truncation regardless of status word)

SIMD instructions

MMX instructions

MMX instructions operate on the mm registers, which are 64 bits wide. The mm registers share storage with the x87 FPU registers, so EMMS must be executed after MMX code before x87 instructions are used again.

Original MMX instructions

Added with Pentium MMX


Instruction Opcode Meaning Notes
Marks all x87 FPU registers
EMMS 0F 77 Empty MMX Technology State
for use by FPU
MOVD mm, r/m32 0F 6E /r Move doubleword
MOVD r/m32, mm 0F 7E /r Move doubleword
MOVQ mm/m64,
0F 7F /r Move quadword
mm
MOVQ mm,
0F 6F /r Move quadword
mm/m64
REX.W +
MOVQ mm, r/m64 Move quadword
0F 6E /r
REX.W +
MOVQ r/m64, mm Move quadword
0F 7E /r
PACKSSDW mm1, Pack doublewords to words (signed with
0F 6B /r
mm2/m64 saturation)
PACKSSWB mm1, mm2/m64 0F 63 /r Pack words to bytes (signed with saturation)
PACKUSWB mm, mm/m64 0F 67 /r Pack words to bytes (unsigned with saturation)
PADDB mm, mm/m64 0F FC /r Add packed byte integers
PADDW mm, mm/m64 0F FD /r Add packed word integers
PADDD mm, mm/m64 0F FE /r Add packed doubleword integers
PADDQ mm, mm/m64 0F D4 /r Add packed quadword integers
PADDSB mm, mm/m64 0F EC /r Add packed signed byte integers and saturate
PADDSW mm, mm/m64 0F ED /r Add packed signed word integers and saturate
PADDUSB mm, mm/m64 0F DC /r Add packed unsigned byte integers and saturate
PADDUSW mm, mm/m64 0F DD /r Add packed unsigned word integers and saturate
PAND mm, mm/m64 0F DB /r Bitwise AND
PANDN mm, mm/m64 0F DF /r Bitwise AND NOT
POR mm, mm/m64 0F EB /r Bitwise OR
PXOR mm, mm/m64 0F EF /r Bitwise XOR
PCMPEQB mm, mm/m64 0F 74 /r Compare packed bytes for equality
PCMPEQW mm, mm/m64 0F 75 /r Compare packed words for equality
PCMPEQD mm, mm/m64 0F 76 /r Compare packed doublewords for equality
PCMPGTB mm, mm/m64 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW mm, mm/m64 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD mm, mm/m64 0F 66 /r Compare packed signed doubleword integers for greater than
PMADDWD mm, mm/m64 0F F5 /r Multiply packed words, add adjacent doubleword results
PMULHW mm, mm/m64 0F E5 /r Multiply packed signed word integers, store high 16 bits of results
PMULLW mm, mm/m64 0F D5 /r Multiply packed signed word integers, store low 16 bits of results
PSLLW mm, imm8 0F 71 /6 ib Shift left words, shift in zeros
PSLLW mm, mm/m64 0F F1 /r Shift left words, shift in zeros
PSLLD mm, imm8 0F 72 /6 ib Shift left doublewords, shift in zeros
PSLLD mm, mm/m64 0F F2 /r Shift left doublewords, shift in zeros
PSLLQ mm, imm8 0F 73 /6 ib Shift left quadword, shift in zeros
PSLLQ mm, mm/m64 0F F3 /r Shift left quadword, shift in zeros
PSRAD mm, imm8 0F 72 /4 ib Shift right doublewords, shift in sign bits
PSRAD mm, mm/m64 0F E2 /r Shift right doublewords, shift in sign bits
PSRAW mm, imm8 0F 71 /4 ib Shift right words, shift in sign bits
PSRAW mm, mm/m64 0F E1 /r Shift right words, shift in sign bits
PSRLW mm, imm8 0F 71 /2 ib Shift right words, shift in zeros
PSRLW mm, mm/m64 0F D1 /r Shift right words, shift in zeros
PSRLD mm, imm8 0F 72 /2 ib Shift right doublewords, shift in zeros
PSRLD mm, mm/m64 0F D2 /r Shift right doublewords, shift in zeros
PSRLQ mm, imm8 0F 73 /2 ib Shift right quadword, shift in zeros
PSRLQ mm, mm/m64 0F D3 /r Shift right quadword, shift in zeros
PSUBB mm, mm/m64 0F F8 /r Subtract packed byte integers
PSUBW mm, mm/m64 0F F9 /r Subtract packed word integers
PSUBD mm, mm/m64 0F FA /r Subtract packed doubleword integers
PSUBSB mm, mm/m64 0F E8 /r Subtract signed packed bytes with saturation
PSUBSW mm, mm/m64 0F E9 /r Subtract signed packed words with saturation
PSUBUSB mm, mm/m64 0F D8 /r Subtract unsigned packed bytes with saturation
PSUBUSW mm, mm/m64 0F D9 /r Subtract unsigned packed words with saturation
PUNPCKHBW mm, mm/m64 0F 68 /r Unpack and interleave high-order bytes
PUNPCKHWD mm, mm/m64 0F 69 /r Unpack and interleave high-order words
PUNPCKHDQ mm, mm/m64 0F 6A /r Unpack and interleave high-order doublewords
PUNPCKLBW mm, mm/m32 0F 60 /r Unpack and interleave low-order bytes
PUNPCKLWD mm, mm/m32 0F 61 /r Unpack and interleave low-order words
PUNPCKLDQ mm, mm/m32 0F 62 /r Unpack and interleave low-order doublewords
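The difference between the wrapping forms (PADDB, PSUBB) and the saturating forms (PADDSB, PADDUSB, PSUBUSB, PACKSSWB) can be sketched with a small scalar reference model. This is an illustration only, assuming a register is represented as a list of lane values with element 0 first; the helper names are hypothetical, and the usage lines use 4 lanes for brevity, whereas an MMX register holds 8 bytes.

```python
def wrap8(x):
    """Keep the low 8 bits, as the wrapping PADDB/PSUBB lanes do."""
    return x & 0xFF


def as_signed8(x):
    """Reinterpret a raw byte 0..255 as a signed byte -128..127."""
    return x - 256 if x >= 128 else x


def sat_s8(x):
    """Clamp to the signed byte range."""
    return max(-128, min(127, x))


def sat_u8(x):
    """Clamp to the unsigned byte range."""
    return max(0, min(255, x))


def paddb(a, b):
    return [wrap8(x + y) for x, y in zip(a, b)]


def paddsb(a, b):
    # Treat the raw bytes as signed, add, clamp, then store as a raw byte again.
    return [wrap8(sat_s8(as_signed8(x) + as_signed8(y))) for x, y in zip(a, b)]


def paddusb(a, b):
    return [sat_u8(x + y) for x, y in zip(a, b)]


def psubusb(a, b):
    return [sat_u8(x - y) for x, y in zip(a, b)]


def packsswb(dst_words, src_words):
    # Signed words (Python ints) become signed bytes; the destination operand's
    # words fill the low half of the result, the source operand's the high half.
    return [wrap8(sat_s8(w)) for w in dst_words + src_words]


a = [0x7F, 0xFF, 0x10, 200]
b = [0x01, 0x01, 0x20, 100]
print(paddb(a, b))    # [128, 0, 48, 44]
print(paddsb(a, b))   # [127, 0, 48, 44]
print(paddusb(a, b))  # [128, 255, 48, 255]
```

The same input bytes give three different results depending on whether the lanes wrap, saturate as signed, or saturate as unsigned.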

MMX instructions added in specific processors

EMMI instructions

Added with the Cyrix 6x86MX; now deprecated

PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB, PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW, PMACHRIW

MMX instructions added with MMX+ and SSE

The following MMX instructions were added with SSE. They are also available on the Athlon under the name
MMX+.
Instruction Opcode Meaning
MASKMOVQ mm1, mm2 0F F7 /r Masked Move of Quadword
MOVNTQ m64, mm 0F E7 /r Move Quadword Using Non-Temporal Hint
PSHUFW mm1, mm2/m64, imm8 0F 70 /r ib Shuffle Packed Words
PINSRW mm, r32/m16, imm8 0F C4 /r ib Insert Word
PEXTRW reg, mm, imm8 0F C5 /r ib Extract Word
PMOVMSKB reg, mm 0F D7 /r Move Byte Mask
PMINUB mm1, mm2/m64 0F DA /r Minimum of Packed Unsigned Byte Integers
PMAXUB mm1, mm2/m64 0F DE /r Maximum of Packed Unsigned Byte Integers
PAVGB mm1, mm2/m64 0F E0 /r Average Packed Unsigned Byte Integers with Rounding
PAVGW mm1, mm2/m64 0F E3 /r Average Packed Unsigned Word Integers with Rounding
PMULHUW mm1, mm2/m64 0F E4 /r Multiply Packed Unsigned Integers and Store High Result
PMINSW mm1, mm2/m64 0F EA /r Minimum of Packed Signed Word Integers
PMAXSW mm1, mm2/m64 0F EE /r Maximum of Packed Signed Word Integers
PSADBW mm1, mm2/m64 0F F6 /r Compute Sum of Absolute Differences
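PAVGB rounds halves up, and PSADBW reduces eight byte differences to a single sum, which makes it a common building block for block matching in video encoding. A minimal Python sketch of the per-lane arithmetic, assuming lists of unsigned byte values; the function names are illustrative only.

```python
def pavgb(a, b):
    """Unsigned average rounded half up: (x + y + 1) >> 1.
    The hardware uses a 9-bit intermediate sum, so it never overflows."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def psadbw(a, b):
    """Sum of absolute differences of 8 unsigned bytes.
    The MMX form writes this sum to the low word of the destination and zeroes the rest."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("PSADBW (MMX form) works on exactly 8 bytes")
    return sum(abs(x - y) for x, y in zip(a, b))


print(pavgb([1, 255, 0], [2, 255, 1]))            # [2, 255, 1]
print(psadbw([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```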

MMX instructions added with SSE2

The following MMX instructions were added with SSE2:

Instruction Opcode Meaning


PSUBQ mm1, mm2/m64 0F FB /r Subtract quadword integer
PMULUDQ mm1, mm2/m64 0F F4 /r Multiply unsigned doubleword integer
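On MMX registers, PMULUDQ multiplies only the low unsigned doubleword of each operand, and the full product always fits in 64 bits. A short Python model, assuming each MMX register is represented as a plain 64-bit integer; the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def pmuludq(a, b):
    """Multiply the low unsigned doublewords of two 64-bit values.
    A 32 x 32-bit product needs at most 64 bits, so no truncation is required."""
    return (a & MASK32) * (b & MASK32)


def psubq(a, b):
    """Subtract 64-bit integers, wrapping modulo 2**64."""
    return (a - b) & MASK64


print(hex(pmuludq(0xDEAD_0000_FFFF_FFFF, 0x1234_0000_FFFF_FFFF)))  # 0xfffffffe00000001
print(hex(psubq(0, 1)))                                            # 0xffffffffffffffff
```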

MMX instructions added with SSSE3


Instruction Opcode Meaning
PSIGNB mm1, mm2/m64 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW mm1, mm2/m64 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND mm1, mm2/m64 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB mm1, mm2/m64 0F 38 00 /r Shuffle bytes
PMULHRSW mm1, mm2/m64 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW mm1, mm2/m64 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed words
PHSUBW mm1, mm2/m64 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW mm1, mm2/m64 0F 38 07 /r Subtract and pack 16-bit signed integers horizontally with saturation
PHSUBD mm1, mm2/m64 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW mm1, mm2/m64 0F 38 03 /r Add and pack 16-bit signed integers horizontally, pack saturated integers to mm1
PHADDW mm1, mm2/m64 0F 38 01 /r Add and pack 16-bit integers horizontally
PHADDD mm1, mm2/m64 0F 38 02 /r Add and pack 32-bit integers horizontally
PALIGNR mm1, mm2/m64, imm8 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB mm1, mm2/m64 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW mm1, mm2/m64 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD mm1, mm2/m64 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
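The control operand of PSHUFB selects bytes of the destination by index, with bit 7 forcing a zero, while PSIGNB applies the sign of one operand to the other. A minimal Python model, assuming a register is a list of byte lanes with element 0 first (8 lanes for MMX, 16 for XMM); the function names are illustrative and not a library API.

```python
def pshufb(data, control):
    """PSHUFB data, control: result[i] is 0 if bit 7 of control[i] is set,
    otherwise data[control[i] mod n], where n is 8 (MMX) or 16 (XMM)."""
    n = len(data)
    if n not in (8, 16) or len(control) != n:
        raise ValueError("expected 8 or 16 lanes")
    return [0 if c & 0x80 else data[c & (n - 1)] for c in control]


def neg8(x):
    """Negate a signed byte with wraparound (-128 stays -128)."""
    r = -x
    return r - 256 if r > 127 else r


def psignb(a, b):
    """Lanes are signed bytes: negate a where b < 0, zero where b == 0, keep where b > 0."""
    return [neg8(x) if y < 0 else (0 if y == 0 else x) for x, y in zip(a, b)]


data = [10, 11, 12, 13, 14, 15, 16, 17]
print(pshufb(data, [7, 6, 5, 4, 3, 2, 1, 0]))     # [17, 16, 15, 14, 13, 12, 11, 10]
print(pshufb(data, [0x80, 9, 0, 0, 0, 0, 0, 0]))  # [0, 11, 10, 10, 10, 10, 10, 10]
print(psignb([5, 5, 5, -128], [-1, 0, 3, -1]))    # [-5, 0, 5, -128]
```

Note that an index of 9 in the MMX form selects byte 1, because only the low 3 bits of each control byte are used.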

3DNow! instructions

Added with K6-2

FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN,
PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD,
PMULHRW, PREFETCH, PREFETCHW

3DNow!+ instructions

Added with Athlon and K6-2+

PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD


Added with Geode GX

PFRSQRTV, PFRCPV

SSE instructions

Added with Pentium III

SSE instructions operate on xmm registers, which are 128 bits wide.

SSE includes the following SIMD floating-point instructions:


Instruction Opcode Meaning
ANDPS* xmm1, xmm2/m128 0F 54 /r Bitwise Logical AND of Packed Single-Precision Floating-Point Values
ANDNPS* xmm1, xmm2/m128 0F 55 /r Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values
ORPS* xmm1, xmm2/m128 0F 56 /r Bitwise Logical OR of Packed Single-Precision Floating-Point Values
XORPS* xmm1, xmm2/m128 0F 57 /r Bitwise Logical XOR of Packed Single-Precision Floating-Point Values
MOVUPS xmm1, xmm2/m128 0F 10 /r Move Unaligned Packed Single-Precision Floating-Point Values
MOVSS xmm1, xmm2/m32 F3 0F 10 /r Move Scalar Single-Precision Floating-Point Value
MOVUPS xmm2/m128, xmm1 0F 11 /r Move Unaligned Packed Single-Precision Floating-Point Values
MOVSS xmm2/m32, xmm1 F3 0F 11 /r Move Scalar Single-Precision Floating-Point Value
MOVLPS xmm, m64 0F 12 /r Move Low Packed Single-Precision Floating-Point Values
MOVHLPS xmm1, xmm2 0F 12 /r Move Packed Single-Precision Floating-Point Values High to Low
MOVLPS m64, xmm 0F 13 /r Move Low Packed Single-Precision Floating-Point Values
UNPCKLPS xmm1, xmm2/m128 0F 14 /r Unpack and Interleave Low Packed Single-Precision Floating-Point Values
UNPCKHPS xmm1, xmm2/m128 0F 15 /r Unpack and Interleave High Packed Single-Precision Floating-Point Values
MOVHPS xmm, m64 0F 16 /r Move High Packed Single-Precision Floating-Point Values
MOVLHPS xmm1, xmm2 0F 16 /r Move Packed Single-Precision Floating-Point Values Low to High
MOVHPS m64, xmm 0F 17 /r Move High Packed Single-Precision Floating-Point Values
MOVAPS xmm1, xmm2/m128 0F 28 /r Move Aligned Packed Single-Precision Floating-Point Values
MOVAPS xmm2/m128, xmm1 0F 29 /r Move Aligned Packed Single-Precision Floating-Point Values
MOVMSKPS reg, xmm 0F 50 /r Extract Packed Single-Precision Floating-Point 4-bit Sign Mask; the upper bits of the register are filled with zeros
CVTPI2PS xmm, mm/m64 0F 2A /r Convert Packed Dword Integers to Packed Single-Precision FP Values
CVTSI2SS xmm, r/m32 F3 0F 2A /r Convert Dword Integer to Scalar Single-Precision FP Value
CVTSI2SS xmm, r/m64 F3 REX.W 0F 2A /r Convert Qword Integer to Scalar Single-Precision FP Value
MOVNTPS m128, xmm 0F 2B /r Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint
CVTTPS2PI mm, xmm/m64 0F 2C /r Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers
CVTTSS2SI r32, xmm/m32 F3 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Dword Integer
CVTTSS2SI r64, xmm1/m32 F3 REX.W 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Qword Integer
CVTPS2PI mm, xmm/m64 0F 2D /r Convert Packed Single-Precision FP Values to Packed Dword Integers
CVTSS2SI r32, xmm/m32 F3 0F 2D /r Convert Scalar Single-Precision FP Value to Dword Integer
CVTSS2SI r64, xmm1/m32 F3 REX.W 0F 2D /r Convert Scalar Single-Precision FP Value to Qword Integer
UCOMISS xmm1, xmm2/m32 0F 2E /r Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
COMISS xmm1, xmm2/m32 0F 2F /r Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
SQRTPS xmm1, xmm2/m128 0F 51 /r Compute Square Roots of Packed Single-Precision Floating-Point Values
SQRTSS xmm1, xmm2/m32 F3 0F 51 /r Compute Square Root of Scalar Single-Precision Floating-Point Value
RSQRTPS xmm1, xmm2/m128 0F 52 /r Compute Approximate Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values
RSQRTSS xmm1, xmm2/m32 F3 0F 52 /r Compute Approximate Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value
RCPPS xmm1, xmm2/m128 0F 53 /r Compute Approximate Reciprocals of Packed Single-Precision Floating-Point Values
RCPSS xmm1, xmm2/m32 F3 0F 53 /r Compute Approximate Reciprocal of Scalar Single-Precision Floating-Point Value
ADDPS xmm1, xmm2/m128 0F 58 /r Add Packed Single-Precision Floating-Point Values
ADDSS xmm1, xmm2/m32 F3 0F 58 /r Add Scalar Single-Precision Floating-Point Value
MULPS xmm1, xmm2/m128 0F 59 /r Multiply Packed Single-Precision Floating-Point Values
MULSS xmm1, xmm2/m32 F3 0F 59 /r Multiply Scalar Single-Precision Floating-Point Value
SUBPS xmm1, xmm2/m128 0F 5C /r Subtract Packed Single-Precision Floating-Point Values
SUBSS xmm1, xmm2/m32 F3 0F 5C /r Subtract Scalar Single-Precision Floating-Point Value
MINPS xmm1, xmm2/m128 0F 5D /r Return Minimum Packed Single-Precision Floating-Point Values
MINSS xmm1, xmm2/m32 F3 0F 5D /r Return Minimum Scalar Single-Precision Floating-Point Value
DIVPS xmm1, xmm2/m128 0F 5E /r Divide Packed Single-Precision Floating-Point Values
DIVSS xmm1, xmm2/m32 F3 0F 5E /r Divide Scalar Single-Precision Floating-Point Value
MAXPS xmm1, xmm2/m128 0F 5F /r Return Maximum Packed Single-Precision Floating-Point Values
MAXSS xmm1, xmm2/m32 F3 0F 5F /r Return Maximum Scalar Single-Precision Floating-Point Value
LDMXCSR m32 0F AE /2 Load MXCSR Register State
STMXCSR m32 0F AE /3 Store MXCSR Register State
CMPPS xmm1, xmm2/m128, imm8 0F C2 /r ib Compare Packed Single-Precision Floating-Point Values
CMPSS xmm1, xmm2/m32, imm8 F3 0F C2 /r ib Compare Scalar Single-Precision Floating-Point Value
SHUFPS xmm1, xmm2/m128, imm8 0F C6 /r ib Shuffle Packed Single-Precision Floating-Point Values

The single-precision floating-point bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce
the same result as the SSE2 integer (PAND, PANDN, POR, PXOR) and double-precision (ANDPD,
ANDNPD, ORPD, XORPD) operations, but can introduce extra latency for domain changes when applied
to values of the wrong type.[11]
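Because these instructions act on raw bit patterns, ANDPS and XORPS are commonly used with sign-bit masks to compute absolute values and negations. The Python sketch below models this on single-precision bit patterns using the standard `struct` module; the helper names are illustrative. An integer AND (PAND) on the same bits would produce identical bits; the difference discussed above is latency, not the result.

```python
import struct

SIGN_BIT = 0x8000_0000


def f32_to_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def abs_ps(v):
    # ANDPS with 0x7FFFFFFF in every lane clears the sign bits.
    return [bits_to_f32(f32_to_bits(x) & 0x7FFF_FFFF) for x in v]


def neg_ps(v):
    # XORPS with 0x80000000 in every lane flips the sign bits.
    return [bits_to_f32(f32_to_bits(x) ^ SIGN_BIT) for x in v]


print(abs_ps([-1.5, 2.0, -0.0, -8.25]))  # [1.5, 2.0, 0.0, 8.25]
print(neg_ps([1.5, -2.0]))               # [-1.5, 2.0]
```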

SSE2 instructions

Added with Pentium 4

SSE2 SIMD floating-point instructions

SSE2 data movement instructions

Instruction Opcode Meaning


MOVAPD xmm1, xmm2/m128 66 0F 28 /r Move Aligned Packed Double-Precision Floating-Point Values
MOVAPD xmm2/m128, xmm1 66 0F 29 /r Move Aligned Packed Double-Precision Floating-Point Values
MOVNTPD m128, xmm1 66 0F 2B /r Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint
MOVHPD xmm1, m64 66 0F 16 /r Move High Packed Double-Precision Floating-Point Value
MOVHPD m64, xmm1 66 0F 17 /r Move High Packed Double-Precision Floating-Point Value
MOVLPD xmm1, m64 66 0F 12 /r Move Low Packed Double-Precision Floating-Point Value
MOVLPD m64, xmm1 66 0F 13 /r Move Low Packed Double-Precision Floating-Point Value
MOVUPD xmm1, xmm2/m128 66 0F 10 /r Move Unaligned Packed Double-Precision Floating-Point Values
MOVUPD xmm2/m128, xmm1 66 0F 11 /r Move Unaligned Packed Double-Precision Floating-Point Values
MOVMSKPD reg, xmm 66 0F 50 /r Extract Packed Double-Precision Floating-Point Sign Mask
MOVSD* xmm1, xmm2/m64 F2 0F 10 /r Move or Merge Scalar Double-Precision Floating-Point Value
MOVSD xmm1/m64, xmm2 F2 0F 11 /r Move or Merge Scalar Double-Precision Floating-Point Value
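The register and memory forms of MOVSD* treat the upper half of the destination differently: the register-to-register form merges, while a load from memory zeroes the upper double. A small Python model, assuming an XMM register is a two-element list `[low, high]` of doubles; the function names are hypothetical.

```python
import math


def movsd_reg(dst, src):
    """MOVSD xmm1, xmm2: copy the low double and keep the destination's upper double."""
    return [src[0], dst[1]]


def movsd_load(m64):
    """MOVSD xmm1, m64: load the low double and zero the upper double."""
    return [m64, 0.0]


def movmskpd(v):
    """MOVMSKPD: bit i of the result is the sign bit of double i; all other bits are 0."""
    return sum(1 << i for i, x in enumerate(v) if math.copysign(1.0, x) < 0)


print(movsd_reg([1.0, 2.0], [3.0, 4.0]))  # [3.0, 2.0]
print(movsd_load(5.0))                    # [5.0, 0.0]
print(movmskpd([-0.0, 3.0]))              # 1
```

MOVMSKPD reads only the sign bits, so a negative zero sets its bit just like any negative value.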
SSE2 packed arithmetic instructions

Instruction Opcode Meaning


ADDPD xmm1, xmm2/m128 66 0F 58 /r Add Packed Double-Precision Floating-Point Values
ADDSD xmm1, xmm2/m64 F2 0F 58 /r Add Low Double-Precision Floating-Point Value
DIVPD xmm1, xmm2/m128 66 0F 5E /r Divide Packed Double-Precision Floating-Point Values
DIVSD xmm1, xmm2/m64 F2 0F 5E /r Divide Scalar Double-Precision Floating-Point Value
MAXPD xmm1, xmm2/m128 66 0F 5F /r Maximum of Packed Double-Precision Floating-Point Values
MAXSD xmm1, xmm2/m64 F2 0F 5F /r Return Maximum Scalar Double-Precision Floating-Point Value
MINPD xmm1, xmm2/m128 66 0F 5D /r Minimum of Packed Double-Precision Floating-Point Values
MINSD xmm1, xmm2/m64 F2 0F 5D /r Return Minimum Scalar Double-Precision Floating-Point Value
MULPD xmm1, xmm2/m128 66 0F 59 /r Multiply Packed Double-Precision Floating-Point Values
MULSD xmm1, xmm2/m64 F2 0F 59 /r Multiply Scalar Double-Precision Floating-Point Value
SQRTPD xmm1, xmm2/m128 66 0F 51 /r Compute Square Roots of Packed Double-Precision Floating-Point Values
SQRTSD xmm1, xmm2/m64 F2 0F 51 /r Compute Square Root of Scalar Double-Precision Floating-Point Value
SUBPD xmm1, xmm2/m128 66 0F 5C /r Subtract Packed Double-Precision Floating-Point Values
SUBSD xmm1, xmm2/m64 F2 0F 5C /r Subtract Scalar Double-Precision Floating-Point Value
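MINPD/MINSD and MAXPD/MAXSD are not symmetric in their operands: when either input is a NaN, or when both are zeros of either sign, the second (source) operand is returned. That rule reduces to a single comparison per lane, sketched here in Python (function names are illustrative).

```python
def sse_min(a, b):
    """One lane of MINSD/MINPD a, b: a if a < b, otherwise b (the source operand).
    Any comparison involving NaN is false, so a NaN in either input yields b."""
    return a if a < b else b


def sse_max(a, b):
    """One lane of MAXSD/MAXPD a, b: a if a > b, otherwise b."""
    return a if a > b else b


nan = float("nan")
print(sse_min(nan, 1.0))   # 1.0
print(sse_min(1.0, nan))   # nan
print(sse_min(0.0, -0.0))  # -0.0
print(sse_max(-0.0, 0.0))  # 0.0
```

Swapping the operands can therefore change the result, which matters when translating source-level `min`/`max` functions with different NaN rules.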

SSE2 logical instructions

Instruction Opcode Meaning


ANDPD xmm1, xmm2/m128 66 0F 54 /r Bitwise Logical AND of Packed Double-Precision Floating-Point Values
ANDNPD xmm1, xmm2/m128 66 0F 55 /r Bitwise Logical AND NOT of Packed Double-Precision Floating-Point Values
ORPD xmm1, xmm2/m128 66 0F 56 /r Bitwise Logical OR of Packed Double-Precision Floating-Point Values
XORPD xmm1, xmm2/m128 66 0F 57 /r Bitwise Logical XOR of Packed Double-Precision Floating-Point Values

SSE2 compare instructions

Instruction Opcode Meaning


CMPPD xmm1, xmm2/m128, imm8 66 0F C2 /r ib Compare Packed Double-Precision Floating-Point Values
CMPSD* xmm1, xmm2/m64, imm8 F2 0F C2 /r ib Compare Low Double-Precision Floating-Point Values
COMISD xmm1, xmm2/m64 66 0F 2F /r Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
UCOMISD xmm1, xmm2/m64 66 0F 2E /r Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
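The imm8 operand of CMPPD and CMPSD* selects one of eight predicates, and each lane of the result is an all-ones or all-zeros 64-bit mask that can be combined directly with ANDPD/ANDNPD. The Python sketch below models predicates 0 to 7 of the legacy SSE2 encoding; the function names and the list-of-lanes representation are assumptions for illustration.

```python
import math

ALL_ONES_64 = 0xFFFF_FFFF_FFFF_FFFF


def cmp_predicate(a, b, imm8):
    """Evaluate one of the eight SSE2 compare predicates for a single lane."""
    unordered = math.isnan(a) or math.isnan(b)
    predicates = {
        0: a == b,          # EQ: false when unordered
        1: a < b,           # LT
        2: a <= b,          # LE
        3: unordered,       # UNORD
        4: not (a == b),    # NEQ: true when unordered
        5: not (a < b),     # NLT
        6: not (a <= b),    # NLE
        7: not unordered,   # ORD
    }
    return predicates[imm8]


def cmppd(a, b, imm8):
    """Each result lane is an all-ones mask when the predicate holds, else 0."""
    if not 0 <= imm8 <= 7:
        raise ValueError("legacy SSE2 CMPPD defines predicates 0-7 only")
    return [ALL_ONES_64 if cmp_predicate(x, y, imm8) else 0 for x, y in zip(a, b)]


nan = float("nan")
print(cmppd([1.0, nan], [2.0, 1.0], 1) == [ALL_ONES_64, 0])  # True
print(cmppd([1.0, nan], [1.0, 1.0], 4) == [0, ALL_ONES_64])  # True
```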

SSE2 shuffle and unpack instructions


Instruction Opcode Meaning
SHUFPD xmm1, xmm2/m128, imm8 66 0F C6 /r ib Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
UNPCKHPD xmm1, xmm2/m128 66 0F 15 /r Unpack and Interleave High Packed Double-Precision Floating-Point Values
UNPCKLPD xmm1, xmm2/m128 66 0F 14 /r Unpack and Interleave Low Packed Double-Precision Floating-Point Values

SSE2 conversion instructions


Instruction Opcode Meaning
CVTDQ2PD xmm1, xmm2/m64 F3 0F E6 /r Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values
CVTDQ2PS xmm1, xmm2/m128 0F 5B /r Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
CVTPD2DQ xmm1, xmm2/m128 F2 0F E6 /r Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTPD2PI mm, xmm/m128 66 0F 2D /r Convert Packed Double-Precision FP Values to Packed Dword Integers
CVTPD2PS xmm1, xmm2/m128 66 0F 5A /r Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
CVTPI2PD xmm, mm/m64 66 0F 2A /r Convert Packed Dword Integers to Packed Double-Precision FP Values
CVTPS2DQ xmm1, xmm2/m128 66 0F 5B /r Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTPS2PD xmm1, xmm2/m64 0F 5A /r Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
CVTSD2SI r32, xmm1/m64 F2 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
CVTSD2SI r64, xmm1/m64 F2 REX.W 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Quadword Integer With Sign Extension
CVTSD2SS xmm1, xmm2/m64 F2 0F 5A /r Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
CVTSI2SD xmm1, r32/m32 F2 0F 2A /r Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
CVTSI2SD xmm1, r/m64 F2 REX.W 0F 2A /r Convert Quadword Integer to Scalar Double-Precision Floating-Point Value
CVTSS2SD xmm1, xmm2/m32 F3 0F 5A /r Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
CVTTPD2DQ xmm1, xmm2/m128 66 0F E6 /r Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTTPD2PI mm, xmm/m128 66 0F 2C /r Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers
CVTTPS2DQ xmm1, xmm2/m128 F3 0F 5B /r Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTTSD2SI r32, xmm1/m64 F2 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Dword Integer
CVTTSD2SI r64, xmm1/m64 F2 REX.W 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Qword Integer
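The CVTT* forms always truncate toward zero, while the plain forms round according to the rounding-control field of MXCSR, which defaults to round-to-nearest-even. The sketch below models the 32-bit CVTSD2SI and CVTTSD2SI in Python under two stated assumptions: MXCSR holds its default rounding mode, and the invalid-operation exception is masked (also the default), in which case NaN and out-of-range inputs produce the "integer indefinite" value 0x80000000. Function names are illustrative.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INTEGER_INDEFINITE = INT32_MIN  # 0x80000000 read as a signed 32-bit value


def _to_int32(x, rounder):
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE
    r = rounder(x)
    return r if INT32_MIN <= r <= INT32_MAX else INTEGER_INDEFINITE


def cvttsd2si(x):
    """CVTTSD2SI r32, xmm/m64: truncate toward zero."""
    return _to_int32(x, math.trunc)


def cvtsd2si(x):
    """CVTSD2SI r32, xmm/m64 with default MXCSR rounding (nearest, ties to even).
    Python's one-argument round() also rounds ties to even."""
    return _to_int32(x, round)


print(cvttsd2si(-2.7), cvtsd2si(-2.7))  # -2 -3
print(cvtsd2si(2.5), cvtsd2si(3.5))     # 2 4
print(cvttsd2si(3e9))                   # -2147483648
```

This is why C compilers use the truncating form for a plain `(int)` cast: C conversion truncates regardless of the current rounding mode.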

CMPSD and MOVSD have the same names as the string instruction mnemonics CMPSD
(CMPS) and MOVSD (MOVS); however, the former operate on scalar double-precision floating-point
values, whereas the latter operate on doublewords in strings.

SSE2 SIMD integer instructions

SSE2 MMX-like instructions extended to SSE registers


SSE2 makes the MMX integer instructions available on the 128-bit SSE registers (encoded with a 66 prefix), so each instruction processes twice as much data.
Instruction Opcode Meaning
MOVD xmm, r/m32 66 0F 6E /r Move doubleword
MOVD r/m32, xmm 66 0F 7E /r Move doubleword
MOVQ xmm1, xmm2/m64 F3 0F 7E /r Move quadword
MOVQ xmm2/m64, xmm1 66 0F D6 /r Move quadword
MOVQ r/m64, xmm 66 REX.W 0F 7E /r Move quadword
MOVQ xmm, r/m64 66 REX.W 0F 6E /r Move quadword
PMOVMSKB reg, xmm 66 0F D7 /r Move a byte mask, zeroing the upper bits of the register
PEXTRW reg, xmm, imm8 66 0F C5 /r ib Extract specified word and move it to reg, setting bits 15-0 and zeroing the rest
PINSRW xmm, r32/m16, imm8 66 0F C4 /r ib Insert the low word of r32/m16 at the specified word position
PACKSSDW xmm1, xmm2/m128 66 0F 6B /r Converts 4 packed signed doubleword integers from each operand into 8 packed signed word integers with saturation
PACKSSWB xmm1, xmm2/m128 66 0F 63 /r Converts 8 packed signed word integers from each operand into 16 packed signed byte integers with saturation
PACKUSWB xmm1, xmm2/m128 66 0F 67 /r Converts 8 signed word integers from each operand into 16 unsigned byte integers with saturation
PADDB xmm1, xmm2/m128 66 0F FC /r Add packed byte integers
PADDW xmm1, xmm2/m128 66 0F FD /r Add packed word integers
PADDD xmm1, xmm2/m128 66 0F FE /r Add packed doubleword integers
PADDQ xmm1, xmm2/m128 66 0F D4 /r Add packed quadword integers
PADDSB xmm1, xmm2/m128 66 0F EC /r Add packed signed byte integers with saturation
PADDSW xmm1, xmm2/m128 66 0F ED /r Add packed signed word integers with saturation
PADDUSB xmm1, xmm2/m128 66 0F DC /r Add packed unsigned byte integers with saturation
PADDUSW xmm1, xmm2/m128 66 0F DD /r Add packed unsigned word integers with saturation
PAND xmm1, xmm2/m128 66 0F DB /r Bitwise AND
PANDN xmm1, xmm2/m128 66 0F DF /r Bitwise AND NOT
POR xmm1, xmm2/m128 66 0F EB /r Bitwise OR
PXOR xmm1, xmm2/m128 66 0F EF /r Bitwise XOR
PCMPEQB xmm1, xmm2/m128 66 0F 74 /r Compare packed bytes for equality
PCMPEQW xmm1, xmm2/m128 66 0F 75 /r Compare packed words for equality
PCMPEQD xmm1, xmm2/m128 66 0F 76 /r Compare packed doublewords for equality
PCMPGTB xmm1, xmm2/m128 66 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW xmm1, xmm2/m128 66 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD xmm1, xmm2/m128 66 0F 66 /r Compare packed signed doubleword integers for greater than
PMULLW xmm1, xmm2/m128 66 0F D5 /r Multiply packed signed word integers, store the low 16 bits of the results
PMULHW xmm1, xmm2/m128 66 0F E5 /r Multiply the packed signed word integers, store the high 16 bits of the results
PMULHUW xmm1, xmm2/m128 66 0F E4 /r Multiply packed unsigned word integers, store the high 16 bits of the results
PMULUDQ xmm1, xmm2/m128 66 0F F4 /r Multiply packed unsigned doubleword integers
PSLLW xmm1, xmm2/m128 66 0F F1 /r Shift words left while shifting in 0s
PSLLW xmm1, imm8 66 0F 71 /6 ib Shift words left while shifting in 0s
PSLLD xmm1, xmm2/m128 66 0F F2 /r Shift doublewords left while shifting in 0s
PSLLD xmm1, imm8 66 0F 72 /6 ib Shift doublewords left while shifting in 0s
PSLLQ xmm1, xmm2/m128 66 0F F3 /r Shift quadwords left while shifting in 0s
PSLLQ xmm1, imm8 66 0F 73 /6 ib Shift quadwords left while shifting in 0s
PSRAD xmm1, xmm2/m128 66 0F E2 /r Shift doublewords right while shifting in sign bits
PSRAD xmm1, imm8 66 0F 72 /4 ib Shift doublewords right while shifting in sign bits
PSRAW xmm1, xmm2/m128 66 0F E1 /r Shift words right while shifting in sign bits
PSRAW xmm1, imm8 66 0F 71 /4 ib Shift words right while shifting in sign bits
PSRLW xmm1, xmm2/m128 66 0F D1 /r Shift words right while shifting in 0s
PSRLW xmm1, imm8 66 0F 71 /2 ib Shift words right while shifting in 0s
PSRLD xmm1, xmm2/m128 66 0F D2 /r Shift doublewords right while shifting in 0s
PSRLD xmm1, imm8 66 0F 72 /2 ib Shift doublewords right while shifting in 0s
PSRLQ xmm1, xmm2/m128 66 0F D3 /r Shift quadwords right while shifting in 0s
PSRLQ xmm1, imm8 66 0F 73 /2 ib Shift quadwords right while shifting in 0s
PSUBB xmm1, xmm2/m128 66 0F F8 /r Subtract packed byte integers
PSUBW xmm1, xmm2/m128 66 0F F9 /r Subtract packed word integers
PSUBD xmm1, xmm2/m128 66 0F FA /r Subtract packed doubleword integers
PSUBQ xmm1, xmm2/m128 66 0F FB /r Subtract packed quadword integers
PSUBSB xmm1, xmm2/m128 66 0F E8 /r Subtract packed signed byte integers with saturation
PSUBSW xmm1, xmm2/m128 66 0F E9 /r Subtract packed signed word integers with saturation
PMADDWD xmm1, xmm2/m128 66 0F F5 /r Multiply the packed word integers, add adjacent doubleword results
PSUBUSB xmm1, xmm2/m128 66 0F D8 /r Subtract packed unsigned byte integers with saturation
PSUBUSW xmm1, xmm2/m128 66 0F D9 /r Subtract packed unsigned word integers with saturation
PUNPCKHBW xmm1, xmm2/m128 66 0F 68 /r Unpack and interleave high-order bytes
PUNPCKHWD xmm1, xmm2/m128 66 0F 69 /r Unpack and interleave high-order words
PUNPCKHDQ xmm1, xmm2/m128 66 0F 6A /r Unpack and interleave high-order doublewords
PUNPCKLBW xmm1, xmm2/m128 66 0F 60 /r Interleave low-order bytes
PUNPCKLWD xmm1, xmm2/m128 66 0F 61 /r Interleave low-order words
PUNPCKLDQ xmm1, xmm2/m128 66 0F 62 /r Interleave low-order doublewords
PAVGB xmm1, xmm2/m128 66 0F E0 /r Average packed unsigned byte integers with rounding
PAVGW xmm1, xmm2/m128 66 0F E3 /r Average packed unsigned word integers with rounding
PMINUB xmm1, xmm2/m128 66 0F DA /r Compare packed unsigned byte integers and store packed minimum values
PMINSW xmm1, xmm2/m128 66 0F EA /r Compare packed signed word integers and store packed minimum values
PMAXSW xmm1, xmm2/m128 66 0F EE /r Compare packed signed word integers and store packed maximum values
PMAXUB xmm1, xmm2/m128 66 0F DE /r Compare packed unsigned byte integers and store packed maximum values
PSADBW xmm1, xmm2/m128 66 0F F6 /r Computes the absolute differences of the packed unsigned byte integers; the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results
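PMADDWD is a common building block for 16-bit integer dot products: it multiplies signed words and adds adjacent pairs of 32-bit products. A Python model of the XMM form, assuming each register is a list of 8 signed 16-bit lanes (names are illustrative).

```python
def to_s32(x):
    """Wrap an integer to a signed 32-bit value."""
    x &= 0xFFFF_FFFF
    return x - 2**32 if x >= 2**31 else x


def pmaddwd(a, b):
    """PMADDWD xmm1, xmm2/m128: 8 signed words per operand -> 4 signed doublewords."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected 8 word lanes per operand")
    return [to_s32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]) for i in range(4)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, -1, 2, 2, 0, 0, 3, 1]
print(pmaddwd(a, b))                        # [-1, 14, 0, 29]
print(sum(pmaddwd(a, b)))                   # 42, the dot product of a and b
print(pmaddwd([-32768] * 8, [-32768] * 8))  # [-2147483648, -2147483648, -2147483648, -2147483648]
```

The last line shows the single overflow case: when all four words of a pair are -32768, the sum 2**31 wraps to 0x80000000.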

SSE2 integer instructions for SSE registers only

The following instructions can be used only on SSE registers, since by their nature they do not work on MMX
registers.
Instruction Opcode Meaning
MASKMOVDQU xmm1, xmm2 66 0F F7 /r Non-Temporal Store of Selected Bytes from an XMM Register into Memory
MOVDQ2Q mm, xmm F2 0F D6 /r Move low quadword from XMM to MMX register
MOVDQA xmm1, xmm2/m128 66 0F 6F /r Move aligned double quadword
MOVDQA xmm2/m128, xmm1 66 0F 7F /r Move aligned double quadword
MOVDQU xmm1, xmm2/m128 F3 0F 6F /r Move unaligned double quadword
MOVDQU xmm2/m128, xmm1 F3 0F 7F /r Move unaligned double quadword
MOVQ2DQ xmm, mm F3 0F D6 /r Move quadword from MMX register to low quadword of XMM register
MOVNTDQ m128, xmm1 66 0F E7 /r Store Packed Integers Using Non-Temporal Hint
PSHUFHW xmm1, xmm2/m128, imm8 F3 0F 70 /r ib Shuffle packed high words
PSHUFLW xmm1, xmm2/m128, imm8 F2 0F 70 /r ib Shuffle packed low words
PSHUFD xmm1, xmm2/m128, imm8 66 0F 70 /r ib Shuffle packed doublewords
PSLLDQ xmm1, imm8 66 0F 73 /7 ib Shift double quadword left by imm8 bytes while shifting in 0s
PSRLDQ xmm1, imm8 66 0F 73 /3 ib Shift double quadword right by imm8 bytes while shifting in 0s
PUNPCKHQDQ xmm1, xmm2/m128 66 0F 6D /r Unpack and interleave high-order quadwords
PUNPCKLQDQ xmm1, xmm2/m128 66 0F 6C /r Unpack and interleave low-order quadwords
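Unlike per-element bit shifts such as PSLLQ, PSLLDQ and PSRLDQ shift the whole register by a count of bytes, and PSHUFD selects each destination doubleword with a 2-bit field of imm8. A Python sketch, assuming byte lists with byte 0 as the least significant byte and doubleword lists with element 0 first (names are illustrative).

```python
def pshufd(src, imm8):
    """PSHUFD: result[i] = src[(imm8 >> 2*i) & 3] for the 4 doublewords."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


def pslldq(src, count):
    """PSLLDQ: move the 16 bytes toward higher positions by `count` bytes;
    counts above 15 clear the register."""
    n = min(count, 16)
    return [0] * n + src[:16 - n]


def psrldq(src, count):
    """PSRLDQ: move the 16 bytes toward lower positions by `count` bytes, filling with zeros."""
    n = min(count, 16)
    return src[n:] + [0] * n


print(pshufd([10, 11, 12, 13], 0x1B))   # [13, 12, 11, 10]
print(pshufd([10, 11, 12, 13], 0x00))   # [10, 10, 10, 10]
print(pslldq(list(range(16)), 2)[:4])   # [0, 0, 0, 1]
print(psrldq(list(range(16)), 4)[-5:])  # [15, 0, 0, 0, 0]
```

An imm8 of 0x1B reverses the four doublewords, and 0x00 broadcasts element 0.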

SSE3 instructions

Added with Pentium 4 supporting SSE3

SSE3 SIMD floating-point instructions


Instruction Opcode Meaning
For complex arithmetic:
ADDSUBPS xmm1, xmm2/m128 F2 0F D0 /r Add/subtract single-precision floating-point values
ADDSUBPD xmm1, xmm2/m128 66 0F D0 /r Add/subtract double-precision floating-point values
MOVDDUP xmm1, xmm2/m64 F2 0F 12 /r Move double-precision floating-point value and duplicate
MOVSLDUP xmm1, xmm2/m128 F3 0F 12 /r Move and duplicate even index single-precision floating-point values
MOVSHDUP xmm1, xmm2/m128 F3 0F 16 /r Move and duplicate odd index single-precision floating-point values
For graphics:
HADDPS xmm1, xmm2/m128 F2 0F 7C /r Horizontal add packed single-precision floating-point values
HADDPD xmm1, xmm2/m128 66 0F 7C /r Horizontal add packed double-precision floating-point values
HSUBPS xmm1, xmm2/m128 F2 0F 7D /r Horizontal subtract packed single-precision floating-point values
HSUBPD xmm1, xmm2/m128 66 0F 7D /r Horizontal subtract packed double-precision floating-point values
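The complex-arithmetic use of these instructions is easiest to see with interleaved `[re, im]` pairs: MOVSLDUP, MOVSHDUP, a SHUFPS swap and ADDSUBPS together multiply two pairs of complex numbers. The Python sketch below models each instruction on 4-element lists (element 0 first); it shows one common instruction sequence, not necessarily what a given compiler emits, and the function names are illustrative.

```python
def movsldup(v):
    return [v[0], v[0], v[2], v[2]]  # duplicate even-index elements


def movshdup(v):
    return [v[1], v[1], v[3], v[3]]  # duplicate odd-index elements


def shufps(a, b, imm8):
    # Low two results come from a (xmm1), high two from b (xmm2/m128).
    return [a[imm8 & 3], a[(imm8 >> 2) & 3], b[(imm8 >> 4) & 3], b[(imm8 >> 6) & 3]]


def mulps(a, b):
    return [x * y for x, y in zip(a, b)]


def addsubps(a, b):
    # Even lanes subtract, odd lanes add.
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def haddps(a, b):
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def complex_mul2(x, y):
    """x and y each hold two complex numbers as [re0, im0, re1, im1]."""
    t1 = mulps(movsldup(x), y)                   # [a*c, a*d, ...]
    t2 = mulps(movshdup(x), shufps(y, y, 0xB1))  # [b*d, b*c, ...]
    return addsubps(t1, t2)                      # [a*c - b*d, a*d + b*c, ...]


print(complex_mul2([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # [-7.0, 16.0, -11.0, 52.0]
print(haddps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))    # [3.0, 7.0, 30.0, 70.0]
```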

SSE3 SIMD integer instructions

Instruction Opcode Meaning
LDDQU xmm1, mem F2 0F F0 /r Load unaligned data and return double quadword (instructionally equivalent to MOVDQU; for video encoding)

SSSE3 instructions

Added with Xeon 5100 series and initial Core 2

The following MMX-like instructions extended to SSE registers were added with SSSE3:
Instruction Opcode Meaning
PSIGNB xmm1, xmm2/m128 66 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW xmm1, xmm2/m128 66 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND xmm1, xmm2/m128 66 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB xmm1, xmm2/m128 66 0F 38 00 /r Shuffle bytes
PMULHRSW xmm1, xmm2/m128 66 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW xmm1, xmm2/m128 66 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed words
PHSUBW xmm1, xmm2/m128 66 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW xmm1, xmm2/m128 66 0F 38 07 /r Subtract and pack 16-bit signed integers horizontally with saturation
PHSUBD xmm1, xmm2/m128 66 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW xmm1, xmm2/m128 66 0F 38 03 /r Add and pack 16-bit signed integers horizontally with saturation
PHADDW xmm1, xmm2/m128 66 0F 38 01 /r Add and pack 16-bit integers horizontally
PHADDD xmm1, xmm2/m128 66 0F 38 02 /r Add and pack 32-bit integers horizontally
PALIGNR xmm1, xmm2/m128, imm8 66 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB xmm1, xmm2/m128 66 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW xmm1, xmm2/m128 66 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD xmm1, xmm2/m128 66 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
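PMULHRSW amounts to a rounded Q15 fixed-point multiply, and PMADDUBSW mixes an unsigned and a signed operand before a saturating pairwise add. A Python sketch of both, assuming lists of lane values (for PMADDUBSW, the first operand holds unsigned bytes and the second holds signed bytes); the names are illustrative.

```python
def to_s16(x):
    """Keep the low 16 bits as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def sat_s16(x):
    return max(-32768, min(32767, x))


def pmulhrsw(a, b):
    """Per signed word: ((x * y >> 14) + 1) >> 1, keeping 16 bits (a rounded Q15 multiply)."""
    return [to_s16((((x * y) >> 14) + 1) >> 1) for x, y in zip(a, b)]


def pmaddubsw(a_unsigned, b_signed):
    """Multiply unsigned bytes of the first operand by signed bytes of the second,
    then add adjacent products with signed 16-bit saturation."""
    p = [x * y for x, y in zip(a_unsigned, b_signed)]
    return [sat_s16(p[2 * i] + p[2 * i + 1]) for i in range(len(p) // 2)]


print(pmulhrsw([16384, -16384, -32768], [16384, 16384, -32768]))  # [8192, -8192, -32768]
print(pmaddubsw([255, 255, 1, 2], [127, 127, 1, -1]))            # [32767, -1]
```

In Q15 terms, the first PMULHRSW lane computes 0.5 × 0.5 = 0.25; the last lane shows that -1.0 × -1.0 cannot be represented and wraps back to -32768.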

SSE4 instructions

SSE4.1

Added with Core 2 processors manufactured on a 45 nm process

SSE4.1 SIMD floating-point instructions


Instruction Opcode Meaning
DPPS xmm1, xmm2/m128, imm8 66 0F 3A 40 /r ib Selectively multiply packed SP floating-point values, add and selectively store
DPPD xmm1, xmm2/m128, imm8 66 0F 3A 41 /r ib Selectively multiply packed DP floating-point values, add and selectively store
BLENDPS xmm1, xmm2/m128, imm8 66 0F 3A 0C /r ib Select packed single-precision floating-point values from specified mask
BLENDVPS xmm1, xmm2/m128, <XMM0> 66 0F 38 14 /r Select packed single-precision floating-point values from specified mask
BLENDPD xmm1, xmm2/m128, imm8 66 0F 3A 0D /r ib Select packed DP-FP values from specified mask
BLENDVPD xmm1, xmm2/m128, <XMM0> 66 0F 38 15 /r Select packed DP FP values from specified mask
ROUNDPS xmm1, xmm2/m128, imm8 66 0F 3A 08 /r ib Round packed single-precision floating-point values
ROUNDSS xmm1, xmm2/m32, imm8 66 0F 3A 0A /r ib Round the low packed single-precision floating-point value
ROUNDPD xmm1, xmm2/m128, imm8 66 0F 3A 09 /r ib Round packed double-precision floating-point values
ROUNDSD xmm1, xmm2/m64, imm8 66 0F 3A 0B /r ib Round the low packed double-precision floating-point value
INSERTPS xmm1, xmm2/m32, imm8 66 0F 3A 21 /r ib Insert a selected single-precision floating-point value at the specified destination element and zero out the destination elements selected by imm8
EXTRACTPS reg/m32, xmm1, imm8 66 0F 3A 17 /r ib Extract one single-precision floating-point value at specified offset and store the result (zero-extended, if applicable)
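The imm8 operand of DPPS does double duty: bits 7:4 choose which products enter the sum, and bits 3:0 choose which destination elements receive the sum (the others are zeroed). A Python sketch, assuming 4-element lists; Python floats are double precision, so this model does not reproduce single-precision rounding, and the function name is illustrative.

```python
def dpps(a, b, imm8):
    """Model of DPPS xmm1, xmm2/m128, imm8."""
    # Products not selected by bits 7:4 contribute 0.
    p = [a[i] * b[i] if imm8 & (0x10 << i) else 0.0 for i in range(4)]
    # Pairwise order (p0 + p1) + (p2 + p3), following Intel's pseudocode.
    total = (p[0] + p[1]) + (p[2] + p[3])
    # Bits 3:0 write the sum into the selected lanes; other lanes become 0.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(dpps(a, b, 0xF1))  # [70.0, 0.0, 0.0, 0.0]
print(dpps(a, b, 0x7F))  # [38.0, 38.0, 38.0, 38.0]
```

An imm8 of 0xF1 gives a plain 4-element dot product in lane 0, while 0x7F computes a 3-element dot product and broadcasts it to every lane.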

SSE4.1 SIMD integer instructions


Instruction Opcode Meaning
MPSADBW xmm1, xmm2/m128, imm8 66 0F 3A 42 /r ib Sums absolute 8-bit integer differences of adjacent groups of 4 byte integers, with starting offsets selected by imm8
PHMINPOSUW xmm1, xmm2/m128 66 0F 38 41 /r Find the minimum unsigned word and its index
PMULLD xmm1, xmm2/m128 66 0F 38 40 /r Multiply the packed signed dword integers and store the low 32 bits
PMULDQ xmm1, xmm2/m128 66 0F 38 28 /r Multiply packed signed doubleword integers and store quadword result
PBLENDVB xmm1, xmm2/m128, <XMM0> 66 0F 38 10 /r Select byte values from specified mask
PBLENDW xmm1, xmm2/m128, imm8 66 0F 3A 0E /r ib Select words from specified mask
PMINSB xmm1, xmm2/m128 66 0F 38 38 /r Compare packed signed byte integers and store packed minimum values
PMINUW xmm1, xmm2/m128 66 0F 38 3A /r Compare packed unsigned word integers and store packed minimum values
PMINSD xmm1, xmm2/m128 66 0F 38 39 /r Compare packed signed dword integers and store packed minimum values
PMINUD xmm1, xmm2/m128 66 0F 38 3B /r Compare packed unsigned dword integers and store packed minimum values
PMAXSB xmm1, xmm2/m128 66 0F 38 3C /r Compare packed signed byte integers and store packed maximum values
PMAXUW xmm1, xmm2/m128 66 0F 38 3E /r Compare packed unsigned word integers and store packed maximum values
PMAXSD xmm1, xmm2/m128 66 0F 38 3D /r Compare packed signed dword integers and store packed maximum values
PMAXUD xmm1, xmm2/m128 66 0F 38 3F /r Compare packed unsigned dword integers and store packed maximum values
PINSRB xmm1, r32/m8, imm8 66 0F 3A 20 /r ib Insert a byte integer value at specified destination element
PINSRD xmm1, r/m32, imm8 66 0F 3A 22 /r ib Insert a dword integer value at specified destination element
PINSRQ xmm1, r/m64, imm8 66 REX.W 0F 3A 22 /r ib Insert a qword integer value at specified destination element
PEXTRB reg/m8, xmm2, imm8 66 0F 3A 14 /r ib Extract a byte integer value at source byte offset; upper bits are zeroed
PEXTRW reg/m16, xmm, imm8 66 0F 3A 15 /r ib Extract word and copy to lowest 16 bits, zero-extended
PEXTRD r/m32, xmm2, imm8 66 0F 3A 16 /r ib Extract a dword integer value at source dword offset
PEXTRQ r/m64, xmm2, imm8 66 REX.W 0F 3A 16 /r ib Extract a qword integer value at source qword offset
PMOVSXBW xmm1, xmm2/m64 66 0F 38 20 /r Sign extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVZXBW xmm1, xmm2/m64 66 0F 38 30 /r Zero extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVSXBD xmm1, xmm2/m32 66 0F 38 21 /r Sign extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVZXBD xmm1, xmm2/m32 66 0F 38 31 /r Zero extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVSXBQ xmm1, xmm2/m16 66 0F 38 22 /r Sign extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVZXBQ xmm1, xmm2/m16 66 0F 38 32 /r Zero extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVSXWD xmm1, xmm2/m64 66 0F 38 23 /r Sign extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVZXWD xmm1, xmm2/m64 66 0F 38 33 /r Zero extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVSXWQ xmm1, xmm2/m32 66 0F 38 24 /r Sign extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVZXWQ xmm1, xmm2/m32 66 0F 38 34 /r Zero extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVSXDQ xmm1, xmm2/m64 66 0F 38 25 /r Sign extend 2 packed 32-bit integers to 2 packed 64-bit integers
PMOVZXDQ xmm1, xmm2/m64 66 0F 38 35 /r Zero extend 2 packed 32-bit integers to 2 packed 64-bit integers
PTEST xmm1, xmm2/m128 66 0F 38 17 /r Set ZF if AND result is all 0s, set CF if AND NOT result is all 0s
PCMPEQQ xmm1, xmm2/m128 66 0F 38 29 /r Compare packed qwords for equality
PACKUSDW xmm1, xmm2/m128 66 0F 38 2B /r Convert 2 × 4 packed signed doubleword integers into 8 packed unsigned word integers with saturation
MOVNTDQA xmm1, m128 66 0F 38 2A /r Move double quadword using non-temporal hint if WC memory type
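The following Python sketch illustrates the element-level behaviour of a few SSE4.1 rows above: sign versus zero extension (PMOVSXBW/PMOVZXBW), PACKUSDW saturation, and the two PTEST flags. It assumes a simplified representation in which a register is either a little-endian `bytes` object (element 0 first) or a 128-bit Python int. The function names are hypothetical helpers and do not belong to any library.

```python
import struct


def pmovsxbw(src: bytes) -> list:
    """PMOVSXBW: sign-extend the low 8 bytes of src to eight 16-bit integers."""
    return list(struct.unpack("<8b", src[:8]))


def pmovzxbw(src: bytes) -> list:
    """PMOVZXBW: zero-extend the low 8 bytes of src to eight 16-bit integers."""
    return list(src[:8])


def packusdw(a: list, b: list) -> list:
    """PACKUSDW: pack 4 + 4 signed dwords into 8 unsigned words, saturating to 0..65535."""
    return [min(max(x, 0), 0xFFFF) for x in a + b]


def ptest(xmm1: int, xmm2: int) -> tuple:
    """PTEST on two 128-bit values; returns (ZF, CF)."""
    mask = (1 << 128) - 1
    zf = (xmm1 & xmm2) == 0            # ZF: the AND result is all zeros
    cf = (~xmm1 & mask & xmm2) == 0    # CF: the AND NOT result is all zeros
    return zf, cf


print(pmovsxbw(bytes([0xFF, 0x80, 0x7F, 0x01, 0, 0, 0, 0])))
print(packusdw([-5, 0, 70000, 1234], [65535, 65536, -1, 42]))
```

Note that CF is set exactly when every bit set in the second operand is also set in the first. This makes PTEST usable as a "contains all of these bits" test as well as a zero test.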

SSE4a

Added with Phenom processors

EXTRQ/INSERTQ
MOVNTSD/MOVNTSS

SSE4.2

Added with Nehalem processors

Instruction Opcode Meaning


PCMPESTRI xmm1, xmm2/m128, imm8 66 0F 3A 61 /r imm8 Packed comparison of string data with explicit lengths (in EAX and EDX), generating an index
PCMPESTRM xmm1, xmm2/m128, imm8 66 0F 3A 60 /r imm8 Packed comparison of string data with explicit lengths (in EAX and EDX), generating a mask
PCMPISTRI xmm1, xmm2/m128, imm8 66 0F 3A 63 /r imm8 Packed comparison of string data with implicit (zero-terminated) lengths, generating an index
PCMPISTRM xmm1, xmm2/m128, imm8 66 0F 3A 62 /r imm8 Packed comparison of string data with implicit (zero-terminated) lengths, generating a mask
PCMPGTQ xmm1, xmm2/m128 66 0F 38 37 /r Compare packed signed qwords for greater than
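The behaviour of the string instructions depends entirely on imm8. The sketch below models a single setting only: PCMPISTRI with imm8 = 0x00, which means unsigned bytes, "equal any" aggregation, positive polarity and least-significant index. In this mode the instruction finds the first byte of the second operand that appears anywhere in the set held in the first operand, similar in spirit to running C's `strpbrk` on one 16-byte chunk. Ranges, ordered comparison, word elements and the mask-producing forms are not covered. The helper names `chunk`, `implicit_length` and `pcmpistri_equal_any` are hypothetical.

```python
def chunk(s: bytes) -> bytes:
    """Pad (or cut) a byte string to one 16-byte XMM-sized chunk."""
    return s.ljust(16, b"\0")[:16]


def implicit_length(block: bytes) -> int:
    """Valid elements end at the first zero byte, or after all 16 bytes."""
    n = block.find(0)
    return 16 if n < 0 else n


def pcmpistri_equal_any(set_block: bytes, text_block: bytes) -> tuple:
    """Model PCMPISTRI xmm1, xmm2, 0x00; returns (ECX, ZF, CF)."""
    assert len(set_block) == 16 and len(text_block) == 16
    set_len = implicit_length(set_block)
    text_len = implicit_length(text_block)
    valid_set = set(set_block[:set_len])
    # Result bit i: text byte i is valid and equals any valid byte of the set operand.
    hits = [i < text_len and text_block[i] in valid_set for i in range(16)]
    ecx = hits.index(True) if any(hits) else 16  # least-significant index, 16 if none
    cf = any(hits)                               # CF: the result mask is non-zero
    zf = text_len < 16                           # ZF: second operand contains a zero byte
    return ecx, zf, cf


print(pcmpistri_equal_any(chunk(b"aeiou"), chunk(b"xyz-hello")))
```

When no byte matches, ECX is 16 and CF is clear. A loop over a longer string can therefore advance 16 bytes at a time until CF (match found) or ZF (end of string) is set.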
SSE5 derived instructions

SSE5 was a proposed SSE extension by AMD. The bundle did not include the full set of Intel's SSE4 instructions, making it a competitor to SSE4 rather than a successor. AMD chose not to implement SSE5 as originally proposed; however, SSE extensions derived from it were introduced.

XOP

Introduced with the Bulldozer processor core and removed again from Zen onward.

A revision of most of the SSE5 instruction set.

F16C

Half-precision floating-point conversion.

Instruction Meaning
VCVTPH2PS xmmreg,xmmrm64 Convert four half-precision floating-point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register
VCVTPH2PS ymmreg,xmmrm128 Convert eight half-precision floating-point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register
VCVTPS2PH xmmrm64,xmmreg,imm8 Convert four single-precision floating-point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register; imm8 selects the rounding mode
VCVTPS2PH xmmrm128,ymmreg,imm8 Convert eight single-precision floating-point values in a YMM register to half-precision floating-point values in memory or an XMM register; imm8 selects the rounding mode
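The F16C conversions can be sketched with Python's `struct` module, whose `'e'` format is IEEE 754 binary16. This sketch makes three assumptions:

- The rounding mode is round-to-nearest-even (imm8 = 0).
- Half-precision values are passed around as raw 16-bit patterns.
- CPython's `'e'` packing rounds half to even; the tests below check this.

Where the hardware would round an out-of-range value to infinity, `struct` raises `OverflowError`, so the sketch maps that case explicitly. The function names are hypothetical.

```python
import math
import struct


def vcvtph2ps(halves: list) -> list:
    """Widen binary16 bit patterns to floats; every half value is exact in float32."""
    return [struct.unpack("<e", struct.pack("<H", h))[0] for h in halves]


def vcvtps2ph(values: list) -> list:
    """Narrow values to binary16 bit patterns, round-to-nearest-even (imm8 = 0 assumed)."""
    out = []
    for v in values:
        v = struct.unpack("<f", struct.pack("<f", v))[0]  # model the float32 source element
        try:
            packed = struct.pack("<e", v)                  # CPython rounds half to even here
        except OverflowError:
            # Round-to-nearest overflows to a signed infinity instead of raising.
            packed = struct.pack("<e", math.copysign(math.inf, v))
        out.append(struct.unpack("<H", packed)[0])
    return out


print([hex(h) for h in vcvtps2ph([1.0, -2.0, 0.5, 65504.0])])
```

A round trip through VCVTPS2PH and VCVTPH2PS keeps only about 11 significant bits. This is why F16C is normally used for storage and bandwidth savings rather than for arithmetic.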

FMA3

Supported in AMD processors starting with the Piledriver architecture, and in Intel processors starting with Haswell; Broadwell processors, available since 2014, support it as well.

Fused multiply-add (floating-point vector multiply–accumulate) with three operands.


Instruction Meaning
VFMADD132PD Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADD213PD Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADD231PD Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADD132PS Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADD213PS Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADD231PS Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADD132SD Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADD213SD Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADD231SD Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADD132SS Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADD213SS Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADD231SS Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADDSUB132PD Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUB213PD Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUB231PD Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUB132PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMADDSUB213PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMADDSUB231PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMSUB132PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB213PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB231PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB132PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB213PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB231PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB132SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB213SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB231SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB132SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUB213SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUB231SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUBADD132PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD213PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD231PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD132PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBADD213PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBADD231PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFNMADD132PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD213PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD231PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD132PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD213PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD231PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD132SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD213SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD231SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD132SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMADD213SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMADD231SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMSUB132PD Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB213PD Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB231PD Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB132PS Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB213PS Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB231PS Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB132SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB213SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB231SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB132SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMSUB213SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMSUB231SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
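The 132/213/231 suffix gives the operand order, and the result always overwrites operand 1. The first two digits name the operands that are multiplied and the third names the addend:

- 132 computes op1 = op1·op3 + op2.
- 213 computes op1 = op2·op1 + op3.
- 231 computes op1 = op2·op3 + op1.

The other families use the same orderings:

- VFMSUB subtracts the addend.
- VFNMADD negates the product.
- VFNMSUB does both.

The sketch below models the scalar double-precision VFMADD forms. It computes the exact value with `fractions.Fraction` and rounds once, which is the "fused" property. The function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly and rounded once to a double (the 'fused' part)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmadd_sd(form: int, op1: float, op2: float, op3: float) -> float:
    """New value of op1 for VFMADD132SD / VFMADD213SD / VFMADD231SD op1, op2, op3."""
    if form == 132:
        return fused_multiply_add(op1, op3, op2)   # op1 = op1*op3 + op2
    if form == 213:
        return fused_multiply_add(op2, op1, op3)   # op1 = op2*op1 + op3
    if form == 231:
        return fused_multiply_add(op2, op3, op1)   # op1 = op2*op3 + op1
    raise ValueError("form must be 132, 213 or 231")


a, b = 1.0 + 2**-52, 1.0 - 2**-52
print(a * b - 1.0)                     # separate multiply and add: the product rounds to 1.0
print(fused_multiply_add(a, b, -1.0))  # fused: the tiny exact residue survives
```

Compilers choose among the three forms so that whichever input is no longer needed is the one overwritten. This avoids an extra register copy.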

FMA4

Supported in AMD processors starting with the Bulldozer architecture. Not supported by any Intel chip as of 2017.

Fused multiply-add with four operands. FMA4 was realized in hardware before FMA3.
Instruction Opcode Meaning Notes
VFMADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 69 /r /is4 Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 68 /r /is4 Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADDSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6B /r /is4 Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADDSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6A /r /is4 Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADDSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5D /r /is4 Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5C /r /is4 Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMSUBADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5F /r /is4 Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5E /r /is4 Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6D /r /is4 Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6C /r /is4 Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUBSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6F /r /is4 Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUBSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6E /r /is4 Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 79 /r /is4 Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 78 /r /is4 Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADDSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7B /r /is4 Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADDSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7A /r /is4 Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7D /r /is4 Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7C /r /is4 Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUBSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7F /r /is4 Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUBSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7E /r /is4 Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
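The opcode column above packs a whole VEX encoding into one line. C4 is the three-byte VEX escape. E3 selects the 0F 3A opcode map with no register-extension bits. "WvvvvL01" is the third byte, and "/is4" means the fourth register operand is stored in bits 7:4 of a trailing immediate byte. The sketch below assembles that byte sequence under several assumptions:

- register operands only;
- xmm0 to xmm7;
- the 128-bit form (L = 0);
- W = 0, which puts the third operand in ModRM.rm and the fourth in the is4 byte.

With W = 1 those two operands swap places, allowing the fourth to come from memory. The table and function names are hypothetical.

```python
# Opcode bytes from the FMA4 table above (VEX map 0F 3A).
FMA4_OPCODES = {
    "VFMADDPD": 0x69, "VFMADDPS": 0x68, "VFMADDSD": 0x6B, "VFMADDSS": 0x6A,
    "VFMADDSUBPD": 0x5D, "VFMADDSUBPS": 0x5C, "VFMSUBADDPD": 0x5F, "VFMSUBADDPS": 0x5E,
    "VFMSUBPD": 0x6D, "VFMSUBPS": 0x6C, "VFMSUBSD": 0x6F, "VFMSUBSS": 0x6E,
    "VFNMADDPD": 0x79, "VFNMADDPS": 0x78, "VFNMADDSD": 0x7B, "VFNMADDSS": 0x7A,
    "VFNMSUBPD": 0x7D, "VFNMSUBPS": 0x7C, "VFNMSUBSD": 0x7F, "VFNMSUBSS": 0x7E,
}


def encode_fma4(mnemonic: str, dst: int, src1: int, src2: int, src3: int) -> bytes:
    """Encode 'mnemonic xmm<dst>, xmm<src1>, xmm<src2>, xmm<src3>' (xmm0-xmm7, W = 0)."""
    if not all(0 <= r <= 7 for r in (dst, src1, src2, src3)):
        raise ValueError("this sketch only handles xmm0-xmm7 (no extension bits)")
    byte1 = 0b1110_0011                        # R, X, B stored inverted as 1; mmmmm = 00011 (0F 3A map)
    byte2 = ((~src1 & 0xF) << 3) | 0b01        # W = 0, vvvv = inverted src1, L = 0 (128-bit), pp = 01 (66)
    modrm = 0b11_000_000 | (dst << 3) | src2   # register-direct: reg = destination, rm = third operand
    is4 = src3 << 4                            # fourth operand in bits 7:4 of the immediate byte
    return bytes([0xC4, byte1, byte2, FMA4_OPCODES[mnemonic], modrm, is4])


print(encode_fma4("VFMADDPD", 0, 1, 2, 3).hex(" "))
```

Because the destination is a separate operand, FMA4 is non-destructive. FMA3 instead reuses operand 1 as both a source and the destination, which is why it needs the 132/213/231 variants.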

AVX

AVX was first supported by Intel with Sandy Bridge and by AMD with Bulldozer.

Vector operations on 256-bit registers.

Instruction Description
VBROADCASTSS/VBROADCASTSD/VBROADCASTF128 Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of an XMM or YMM vector register.
VINSERTF128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VMASKMOVPS/VMASKMOVPD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.[12]

VPERMILPS/VPERMILPD Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all 256 bits with two separate 128-bit shuffles, so they cannot shuffle across the 128-bit lanes.[13]
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
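The sketch below makes the in-lane restriction concrete by contrasting the immediate form of VPERMILPS with VPERMILPS's cross-lane partner VPERM2F128.

- **VPERMILPS:** applies the same four 2-bit selectors to each 128-bit lane independently.
- **VPERM2F128:** lets each destination half take any 128-bit half of either source, or zero.

It assumes a YMM register is modelled as a list of eight elements (element 0 first); the function names are hypothetical.

```python
def vpermilps_imm(src: list, imm8: int) -> list:
    """VPERMILPS ymm, ymm/m256, imm8: same 2-bit selectors applied inside each 128-bit lane."""
    if len(src) != 8:
        raise ValueError("expected 8 single-precision elements")
    out = []
    for lane_start in (0, 4):
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11
            out.append(src[lane_start + sel])   # never reaches into the other lane
    return out


def vperm2f128(src1: list, src2: list, imm8: int) -> list:
    """VPERM2F128: each destination half picks a 128-bit half of either source, or zero."""
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]

    def pick(control: int) -> list:
        if control & 0b1000:                    # imm8 bit 3 (low half) or bit 7 (high half) zeroes it
            return [0.0] * 4
        return list(halves[control & 0b11])

    return pick(imm8 & 0xF) + pick((imm8 >> 4) & 0xF)


v = list(range(8))
reversed_in_lanes = vpermilps_imm(v, 0x1B)
print(reversed_in_lanes)                        # each lane reversed on its own
print(vperm2f128(reversed_in_lanes, reversed_in_lanes, 0x01))
```

Reversing all eight elements therefore takes two steps: an in-lane VPERMILPS, then a lane swap such as VPERM2F128 with imm8 = 0x01.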

AVX2

Introduced in Intel's Haswell microarchitecture and AMD's Excavator.

Expansion of most vector integer SSE and AVX instructions to 256 bits.
Instruction Description
VBROADCASTSS/VBROADCASTSD Copy the low 32-bit or 64-bit element of an XMM register operand to all elements of an XMM or YMM vector register. These are register versions of the same instructions in AVX1. There is no register version of VBROADCASTF128; however, the same effect can be achieved with VINSERTF128.
VPBROADCASTB/VPBROADCASTW/VPBROADCASTD/VPBROADCASTQ Copy an 8, 16, 32 or 64-bit integer from the low element of an XMM register or from memory to all elements of an XMM or YMM vector register.
VBROADCASTI128 Copy a 128-bit memory operand to all elements of a YMM vector register.
VINSERTI128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTI128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VGATHERDPD/VGATHERQPD/VGATHERDPS/VGATHERQPS Gathers single or double precision floating-point values using either 32 or 64-bit indices and scale.
VPGATHERDD/VPGATHERDQ/VPGATHERQD/VPGATHERQQ Gathers 32 or 64-bit integer values using either 32 or 64-bit indices and scale.
VPMASKMOVD/VPMASKMOVQ Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMPS/VPERMD Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register operand as selector.
VPERMPD/VPERMQ Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with an immediate constant as selector.
VPERM2I128 Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VPBLENDD Doubleword immediate version of the PBLEND instructions from SSE4.
VPSLLVD/VPSLLVQ Shift left logical. Allows variable shifts where each element is shifted by the count in the corresponding element of a second packed operand.
VPSRLVD/VPSRLVQ Shift right logical. Allows variable shifts where each element is shifted by the count in the corresponding element of a second packed operand.
VPSRAVD Shift right arithmetically. Allows variable shifts where each element is shifted by the count in the corresponding element of a second packed operand.
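The older SSE shifts apply one count to every element, while the AVX2 variable shifts read a separate count for each element. Counts are unsigned, so an oversized count has the following effect:

- For the logical shifts, a count of 32 or more yields zero.
- For the arithmetic shift, it fills the element with copies of the sign bit.

The sketch below models the doubleword forms on lists of unsigned 32-bit integers. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def vpsllvd(values: list, counts: list) -> list:
    """Per-element logical left shift; counts above 31 produce 0."""
    return [(v << c) & MASK32 if c < 32 else 0 for v, c in zip(values, counts)]


def vpsrlvd(values: list, counts: list) -> list:
    """Per-element logical right shift; counts above 31 produce 0."""
    return [(v & MASK32) >> c if c < 32 else 0 for v, c in zip(values, counts)]


def vpsravd(values: list, counts: list) -> list:
    """Per-element arithmetic right shift; counts above 31 fill with the sign bit."""
    out = []
    for v, c in zip(values, counts):
        signed = v - (1 << 32) if v & 0x80000000 else v   # reinterpret as signed 32-bit
        out.append((signed >> min(c, 31)) & MASK32)       # Python's >> on ints is arithmetic
    return out


print([hex(x) for x in vpsravd([0x80000000, 0x7FFFFFFF], [1, 99])])
```

Shift counts above the element width are not masked to their low bits, which differs from scalar SHL/SHR. Code ported from scalar loops may need an explicit clamp.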

AVX-512

Introduced in Intel's Xeon Phi x200.

Vector operations on 512-bit registers.

AVX-512 foundation
Instruction Description
VBLENDMPD Blend float64 vectors using opmask control
VBLENDMPS Blend float32 vectors using opmask control
VPBLENDMD Blend int32 vectors using opmask control
VPBLENDMQ Blend int64 vectors using opmask control
VPCMPD/VPCMPUD Compare signed/unsigned doublewords into mask
VPCMPQ/VPCMPUQ Compare signed/unsigned quadwords into mask
VPTESTMD/VPTESTMQ Logical AND of 32- or 64-bit integers; set each mask bit where the result is non-zero.
VPTESTNMD/VPTESTNMQ Logical AND of 32- or 64-bit integers; set each mask bit where the result is zero (the complement of VPTESTM).
VCOMPRESSPD/VCOMPRESSPS Store sparse packed double/single-precision floating-point values into dense memory
VPCOMPRESSD/VPCOMPRESSQ Store sparse packed doubleword/quadword integer values into dense memory/register
VEXPANDPD/VEXPANDPS Load sparse packed double/single-precision floating-point values from dense memory
VPEXPANDD/VPEXPANDQ Load sparse packed doubleword/quadword integer values from dense memory/register
VPERMI2PD/VPERMI2PS Full single/double floating point permute overwriting the index.
VPERMI2D/VPERMI2Q Full doubleword/quadword permute overwriting the index.
VPERMT2PS/VPERMT2PD Full single/double floating point permute overwriting first source.
VPERMT2D/VPERMT2Q Full doubleword/quadword permute overwriting first source.
VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2 Shuffle four packed 128-bit lines.
VPTERNLOGD/VPTERNLOGQ Bitwise ternary logic: an imm8 truth table selects the result bit for each combination of the three input bits.
VPMOVQD/VPMOVSQD/VPMOVUSQD/VPMOVQW/VPMOVSQW/VPMOVUSQW/VPMOVQB/VPMOVSQB/VPMOVUSQB/VPMOVDW/VPMOVSDW/VPMOVUSDW/VPMOVDB/VPMOVSDB/VPMOVUSDB Down convert quadword or doubleword to doubleword, word or byte; unsaturated, saturated or saturated unsigned. The reverse of the sign/zero extend instructions from SSE4.1.
VCVTPS2UDQ/VCVTPD2UDQ/VCVTTPS2UDQ/VCVTTPD2UDQ Convert with or without truncation, packed single or double-precision floating point to packed unsigned doubleword integers.
VCVTSS2USI/VCVTSD2USI/VCVTTSS2USI/VCVTTSD2USI Convert with or without truncation, scalar single or double-precision floating point to an unsigned doubleword (or, in 64-bit mode, quadword) integer.
VCVTUDQ2PS/VCVTUDQ2PD Convert packed unsigned doubleword integers to packed single or double-precision floating point.
VCVTUSI2SS/VCVTUSI2SD Convert a scalar unsigned doubleword or quadword integer to single or double-precision floating point.
VCVTQQ2PD/VCVTQQ2PS Convert packed quadword integers to packed single or double-precision floating point.
VGETEXPPD/VGETEXPPS Convert exponents of packed fp values into fp values
VGETEXPSD/VGETEXPSS Convert exponent of scalar fp value into fp value
VGETMANTPD/VGETMANTPS Extract vector of normalized mantissas from float32/float64 vector
VGETMANTSD/VGETMANTSS Extract float32/float64 of normalized mantissa from float32/float64 scalar
VFIXUPIMMPD/VFIXUPIMMPS Fix up special packed float32/float64 values
VFIXUPIMMSD/VFIXUPIMMSS Fix up special scalar float32/float64 value
VRCP14PD/VRCP14PS Compute approximate reciprocals of packed float32/float64 values
VRCP14SD/VRCP14SS Compute approximate reciprocal of scalar float32/float64 value
VRNDSCALEPS/VRNDSCALEPD Round packed float32/float64 values to include a given number of fraction bits
VRNDSCALESS/VRNDSCALESD Round scalar float32/float64 value to include a given number of fraction bits
VRSQRT14PD/VRSQRT14PS Compute approximate reciprocals of square roots of packed float32/float64 values
VRSQRT14SD/VRSQRT14SS Compute approximate reciprocal of square root of scalar float32/float64 value
VSCALEFPS/VSCALEFPD Scale packed float32/float64 values with float32/float64 values
VSCALEFSS/VSCALEFSD Scale scalar float32/float64 value with float32/float64 value
VALIGND/VALIGNQ Align doubleword or quadword vectors
VPABSQ Packed absolute value quadword
VPMAXSQ/VPMAXUQ Maximum of packed signed/unsigned quadword
VPMINSQ/VPMINUQ Minimum of packed signed/unsigned quadword
VPROLD/VPROLVD/VPROLQ/VPROLVQ/VPRORD/VPRORVD/VPRORQ/VPRORVQ Bit rotate left or right
VPSCATTERDD/VPSCATTERDQ/VPSCATTERQD/VPSCATTERQQ Scatter packed doubleword/quadword with signed doubleword and quadword indices
VSCATTERDPS/VSCATTERDPD/VSCATTERQPS/VSCATTERQPD Scatter packed float32/float64 with signed doubleword and quadword indices
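The sketch below models two of the less obvious AVX-512 foundation groups in the table above.

**VPTERNLOG.** Each result bit is looked up in the imm8 truth table at index (a << 2) | (b << 1) | c, where a, b and c are the corresponding bits of the three operands. A convenient way to derive the imm8 for any boolean function is to evaluate it on the constants 0xF0, 0xCC and 0xAA.

**VCOMPRESS and VEXPAND.** These move mask-selected elements to or from a dense, contiguous form.

The model covers the memory-destination view of compress and zero-masking for expand; the register forms can instead merge into the destination. All names are hypothetical.

```python
def vpternlog(a: int, b: int, c: int, imm8: int, width: int = 32) -> int:
    """Bitwise ternary logic: result bit = imm8[(a_bit << 2) | (b_bit << 1) | c_bit]."""
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


def ternlog_imm8(f) -> int:
    """Derive the imm8 for a 3-input boolean function by evaluating it on 0xF0, 0xCC, 0xAA."""
    return f(0xF0, 0xCC, 0xAA) & 0xFF


def vcompress(src: list, mask: int) -> list:
    """Elements whose mask bit is set, packed contiguously (memory-destination view)."""
    return [x for i, x in enumerate(src) if (mask >> i) & 1]


def vexpand(dense: list, mask: int, n: int, fill=0) -> list:
    """Inverse of vcompress: dense elements go to the mask-selected slots, others get fill."""
    it = iter(dense)
    return [next(it) if (mask >> i) & 1 else fill for i in range(n)]


print(hex(ternlog_imm8(lambda a, b, c: a ^ b ^ c)))              # three-way XOR
print(hex(ternlog_imm8(lambda a, b, c: (a & b) | (~a & c))))     # bitwise select
print(vcompress([10, 11, 12, 13, 14, 15, 16, 17], 0b10100110))
```

One VPTERNLOG can replace a chain of two or three AND/OR/XOR/ANDN instructions. Compress and expand are commonly used for filtering elements without branches.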

Cryptographic instructions

Intel AES instructions

6 new instructions.

Instruction Description
AESENC Perform one round of an AES encryption flow
AESENCLAST Perform the last round of an AES encryption flow
AESDEC Perform one round of an AES decryption flow
AESDECLAST Perform the last round of an AES decryption flow
AESKEYGENASSIST Assist in AES round key generation
AESIMC Assist in AES Inverse Mix Columns
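AESENC and AESENCLAST each perform one complete round on the 128-bit AES state held in an XMM register. A full AES-128 encryption is therefore a PXOR with the first round key, nine AESENC and one AESENCLAST. The pure-Python sketch below models that sequence to show what each instruction computes. It makes three assumptions:

- The state uses the FIPS-197 column-major byte order, matching the byte order of the XMM register.
- The S-box is computed rather than tabulated.
- The key schedule is done in software instead of with AESKEYGENASSIST.

The helper names are hypothetical. The code is slow and not constant-time, so it is for illustration only.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def _rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def _sbox_entry(x: int) -> int:
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the multiplicative inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    # Affine transformation from FIPS-197.
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def shift_rows(s) -> list:
    # Byte s[r + 4*c] holds row r, column c; row r is rotated left by r positions.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s) -> list:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [
            gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
            a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
            a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
            gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
        ]
    return out


def aesenc(state: bytes, round_key: bytes) -> bytes:
    """Model of AESENC: ShiftRows, SubBytes, MixColumns, then XOR with the round key."""
    s = mix_columns([SBOX[b] for b in shift_rows(state)])
    return bytes(x ^ k for x, k in zip(s, round_key))


def aesenclast(state: bytes, round_key: bytes) -> bytes:
    """Model of AESENCLAST: like aesenc but without MixColumns."""
    s = [SBOX[b] for b in shift_rows(state)]
    return bytes(x ^ k for x, k in zip(s, round_key))


def expand_key_128(key: bytes) -> list:
    """Standard AES-128 key schedule (software stand-in for AESKEYGENASSIST)."""
    rcon = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # RotWord, then SubWord
            temp[0] ^= rcon[i // 4 - 1]
        words.append([x ^ y for x, y in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


def aes128_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Mirror of the usual AES-NI sequence: PXOR, 9 x AESENC, AESENCLAST."""
    rk = expand_key_128(key)
    state = bytes(x ^ k for x, k in zip(block, rk[0]))
    for r in range(1, 10):
        state = aesenc(state, rk[r])
    return aesenclast(state, rk[10])


print(aes128_encrypt_block(bytes(range(16)), bytes.fromhex("00112233445566778899aabbccddeeff")).hex())
```

Decryption with AESDEC and AESDECLAST follows the equivalent inverse cipher. For that reason, the middle round keys are first transformed with AESIMC.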

RDRAND and RDSEED

Instruction Description
RDRAND Read Random Number
RDSEED Read Random Seed

Intel SHA instructions

7 new instructions.

Instruction Description
SHA1RNDS4 Perform Four Rounds of SHA1 Operation
SHA1NEXTE Calculate SHA1 State Variable E after Four Rounds
SHA1MSG1 Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2 Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 Perform Two Rounds of SHA256 Operation
SHA256MSG1 Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2 Perform a Final Calculation for the Next Four SHA256 Message Dwords
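SHA256MSG1 and SHA256MSG2 split the SHA-256 message-schedule recurrence from FIPS 180-4 into two four-word steps:

W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16]

The steps divide the work as follows:

- MSG1 adds σ0(W[t-15]) to W[t-16].
- Software then adds the W[t-7] terms, typically with PALIGNR and PADDD.
- MSG2 adds σ1 of the word two positions back. For the last two words of each group, that word is one MSG2 has just produced.

The sketch below models only this arithmetic split, not the XMM lane ordering of the real instructions, and checks it against the direct recurrence. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def msg1(w: list, t: int) -> list:
    """SHA256MSG1-style step: W[t-16+i] + sigma0(W[t-15+i]) for i = 0..3."""
    return [(w[t - 16 + i] + sigma0(w[t - 15 + i])) & MASK32 for i in range(4)]


def msg2(partial: list, w: list, t: int) -> list:
    """SHA256MSG2-style step: add sigma1 of the word two positions back, chaining in the group."""
    out = []
    for i in range(4):
        prev2 = w[t - 2 + i] if i < 2 else out[i - 2]
        out.append((partial[i] + sigma1(prev2)) & MASK32)
    return out


def schedule_via_split(block: list) -> list:
    w = list(block)
    for t in range(16, 64, 4):
        partial = [(p + w[t - 7 + i]) & MASK32 for i, p in enumerate(msg1(w, t))]
        w += msg2(partial, w, t)
    return w


def schedule_reference(block: list) -> list:
    w = list(block)
    for t in range(16, 64):
        w.append((sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w


abc = [0x61626380] + [0] * 14 + [0x18]   # padded one-block message "abc"
print([hex(x) for x in schedule_via_split(abc)[16:20]])
```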

Undocumented instructions

Undocumented x86 instructions


The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in various sources across the Internet, such as Ralf Brown's Interrupt List and sandpile.org (https://www.sandpile.org/).

Mnemonic Opcode Description Status


AAM imm8 D4 imm8 Divide AL by imm8, put the quotient in AH, and the remainder in AL Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments)
AAD imm8 D5 imm8 Multiplication counterpart of AAM: set AL to AL + AH × imm8 (truncated to 8 bits) and AH to zero Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments)
SALC D6 Set AL to FFh if the Carry Flag is set, or to 00h otherwise (a 1-byte alternative to SBB AL, AL) Available beginning with 8086, but only documented since Pentium Pro.
ICEBP F1 Single byte single-step exception / Invoke ICE Available beginning with 80386, documented (as INT1) since Pentium Pro
Unknown mnemonic 0F 04 Exact purpose unknown, causes CPU hang (HCF). The only way out is CPU reset.[14] In some implementations, emulated through BIOS as a halting sequence.[15] In a forum post at the Vintage Computing Federation (http://www.vcfed.org/forum/showthread.php?70386-I-found-the-SAVEALL-opcode), this instruction is explained as SAVEALL. It interacts with ICE mode. Only available on 80286
LOADALL 0F 05 Loads All Registers from Memory Address 0x000800 Only available on 80286
LOADALLD 0F 07 Loads All Registers from Memory Address ES:EDI Only available on 80386
UD1 0F B9 Intentionally undefined instruction, but unlike UD2 this was not published
ALTINST 0F 3F Jump and execute instructions in the undocumented Alternate Instruction Set. Only available on some x86 processors made by VIA Technologies.
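The three 8086-era entries at the top of the table are simple enough to model directly. The Python sketch below treats AL and AH as plain integers and returns the new register values. The function names are hypothetical, and the divide error (#DE) for AAM with imm8 = 0 is represented by a Python `ZeroDivisionError`.

```python
def aam(al: int, imm8: int = 10) -> tuple:
    """AAM imm8: returns (AH, AL) = (AL // imm8, AL % imm8); imm8 = 0 faults with #DE."""
    if imm8 == 0:
        raise ZeroDivisionError("#DE: divide error")
    return al // imm8, al % imm8


def aad(ah: int, al: int, imm8: int = 10) -> tuple:
    """AAD imm8: returns (AH, AL) = (0, (AL + AH * imm8) & 0xFF)."""
    return 0, (al + ah * imm8) & 0xFF


def salc(cf: bool) -> int:
    """SALC: AL = 0xFF if CF is set, else 0x00 (flags unchanged, unlike SBB AL, AL)."""
    return 0xFF if cf else 0x00


print(aam(63))         # decimal digits of 63
print(aam(0x5A, 16))   # with imm8 = 16, AAM splits AL into its two nibbles
```

The undocumented immediate forms are what make AAM 16 (nibble split) and AAD 16 (nibble join) possible. The documented forms only mention base 10.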

Undocumented x87 instructions

FFREEP performs FFREE ST(i) and then pops the register stack.

See also
CLMUL
RDRAND
Larrabee extensions
Advanced Vector Extensions 2
Bit Manipulation Instruction Sets
CPUID

References
1. "Re: Intel Processor Identification and the CPUID Instruction" (http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html?wapkw=processor-identification-cpuid-instruction). Retrieved 2013-04-21.
2. Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers" (https://web.archive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem000c.html). Archived from the original (http://www.df.lth.se/~john_e/gems/gem000c.html) on 1999-11-03. "The instruction brings down the upper word of the doubleword register without affecting its upper 16 bits."
3. Coldwind, Gynvael (2009-12-29). "BSWAP + 66h prefix" (https://gynvael.coldwind.pl/?id=268). Retrieved 2018-10-03. "internal (zero-)extending the value of a smaller (16-bit) register … applying the bswap to a 32-bit value "00 00 AH AL", … truncated to lower 16-bits, which are "00 00". … Bochs … bswap reg16 acts just like the bswap reg32 … QEMU … ignores the 66h prefix"
4. "RSM—Resume from System Management Mode" (https://web.archive.org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc279.htm). Archived from the original on 2012-03-12.
5. Intel 64 and IA-32 Architectures Optimization Reference Manual (https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), section 7.3.2
6. Intel 64 and IA-32 Architectures Software Developer’s Manual (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf), section 4.3, subsection "PREFETCHh—Prefetch Data Into Caches"
7. Hollingsworth, Brent. "New "Bulldozer" and "Piledriver" instructions" (http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf) (PDF). Advanced Micro Devices, Inc. Retrieved 2014-12-11.
8. "Family 16h AMD A-Series Data Sheet" (http://support.amd.com/TechDocs/52169_KB_A_Series_Mobile.pdf) (PDF). amd.com. AMD. October 2013. Retrieved 2014-01-02.
9. "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (http://support.amd.com/TechDocs/24594.pdf) (PDF). amd.com. AMD. October 2013. Retrieved 2014-01-02.
10. "tbmintrin.h from GCC 4.8" (https://gcc.gnu.org/viewcvs/gcc/branches/gcc-4_8-branch/gcc/config/i386/tbmintrin.h?revision=196696&view=markup). Retrieved 2014-03-17.
11. Intel 64 and IA-32 Architectures Optimization Reference Manual (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf), section 3.5.2.3
12. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers" (http://www.agner.org/optimize/microarchitecture.pdf) (PDF). Retrieved 2016-10-17.
13. "Chess programming AVX2" (https://chessprogramming.wikispaces.com/AVX2). Retrieved 2016-10-17.
14. "Re: Undocumented opcodes (HINT_NOP)" (https://web.archive.org/web/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm). Archived from the original (http://www.sandpile.org/post/msgs/20004129.htm) on 2004-11-06. Retrieved 2010-11-07.
15. "Re: Also some undocumented 0Fh opcodes" (https://web.archive.org/web/20030626044017/http://www.sandpile.org/post/msgs/20003986.htm). Archived from the original (http://www.sandpile.org/post/msgs/20003986.htm) on 2003-06-26. Retrieved 2010-11-07.

External links
Free IA-32 and x86-64 documentation (https://software.intel.com/en-us/articles/intel-sdm), provided by Intel
x86 Opcode and Instruction Reference (http://ref.x86asm.net)
x86 and amd64 instruction reference (https://www.felixcloutier.com/x86/index.html)
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs (https://www.agner.org/optimize/instruction_tables.pdf)
Netwide Assembler Instruction List (https://www.nasm.us/doc/nasmdocb.html) (from Netwide Assembler)

Retrieved from "https://en.wikipedia.org/w/index.php?title=X86_instruction_listings&oldid=988147692"

This page was last edited on 11 November 2020, at 11:20 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this
site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia
Foundation, Inc., a non-profit organization.
