X86 Instruction Listings
The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The
instructions are usually part of an executable program, often stored as a computer file and executed on the
processor.
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as
new functionality.[1]
Contents
x86 integer instructions
Original 8086/8088 instructions
Added in specific processors
Added with 80186/80188
Added with 80286
Added with 80386
Added with 80486
Added with Pentium
Added with Pentium MMX
Added with AMD K6
Added with Pentium Pro
Added with Pentium II
Added with SSE
Added with SSE2
Added with SSE3
Added with SSE4.2
Added with x86-64
Added with AMD-V
Added with Intel VT-x
Added with ABM
Added with BMI1
Added with BMI2
Added with TBM
Added with CLMUL instruction set
Added with Intel ADX
x87 floating-point instructions
Original 8087 instructions
Added in specific processors
Added with 80287
Added with 80387
Added with Pentium Pro
Added with SSE
Added with SSE3
SIMD instructions
MMX instructions
Original MMX instructions
MMX instructions added in specific processors
EMMI instructions
MMX instructions added with MMX+ and SSE
MMX instructions added with SSE2
MMX instructions added with SSSE3
3DNow! instructions
3DNow!+ instructions
Added with Athlon and K6-2+
Added with Geode GX
SSE instructions
SSE2 instructions
SSE2 SIMD floating-point instructions
SSE2 data movement instructions
SSE2 packed arithmetic instructions
SSE2 logical instructions
SSE2 compare instructions
SSE2 shuffle and unpack instructions
SSE2 conversion instructions
SSE2 SIMD integer instructions
SSE2 MMX-like instructions extended to SSE registers
SSE2 integer instructions for SSE registers only
SSE3 instructions
SSE3 SIMD floating-point instructions
SSE3 SIMD integer instructions
SSSE3 instructions
SSE4 instructions
SSE4.1
SSE4.1 SIMD floating-point instructions
SSE4.1 SIMD integer instructions
SSE4a
SSE4.2
SSE5 derived instructions
XOP
F16C
FMA3
FMA4
AVX
AVX2
AVX-512
AVX-512 foundation
Cryptographic instructions
Intel AES instructions
RDRAND and RDSEED
Intel SHA instructions
Undocumented instructions
Undocumented x86 instructions
Undocumented x87 instructions
See also
References
External links
IMUL Signed multiply (1) DX:AX = AX * r/m; (2) AX = AL * r/m 0x69, 0x6B (both since 80186), 0xF7/5, 0xF6/5, 0x0FAF (since 80386)
IN Input from port (1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX = port[DX]; 0xE4, 0xE5, 0xEC, 0xED
INC Increment by 1 0x40…0x47, 0xFE/0, 0xFF/0
INT Call to interrupt 0xCC, 0xCD
INTO Call to interrupt if overflow 0xCE
IRET Return from interrupt 0xCF
Jcc Jump if condition (JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) 0x70…0x7F, 0x0F80…0x0F8F (since 80386)
MOVSB Move byte from string to string if (DF==0) *(byte*)DI++ = *(byte*)SI++; else *(byte*)DI-- = *(byte*)SI--; 0xA4
OR Logical OR (1) r/m |= r/imm; (2) r |= m/imm; 0x08…0x0D, 0x80…0x81/1, 0x82…0x83/1 (since 80186)
OUT Output to port (1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX] = AX; 0xE6, 0xE7, 0xEE, 0xEF
POP Pop data from stack r/m = *SP++; POP CS (opcode 0x0F) works only on 8086/8088. Later CPUs use 0x0F as a prefix for newer instructions. 0x07, 0x0F (8086/8088 only), 0x17, 0x1F, 0x58…0x5F, 0x8F/0
POPF Pop FLAGS register from stack FLAGS = *SP++; 0x9D
PUSH Push data onto stack *--SP = r/m; 0x06, 0x0E, 0x16, 0x1E, 0x50…0x57, 0x68, 0x6A (both since 80186), 0xFF/6
PUSHF Push FLAGS onto stack *--SP = FLAGS; 0x9C
RCL Rotate left (with carry) 0xC0…0xC1/2 (since 80186), 0xD0…0xD3/2
RCR Rotate right (with carry) 0xC0…0xC1/3 (since 80186), 0xD0…0xD3/3
REPxx Repeat MOVS/STOS/CMPS/LODS/SCAS (REP, REPE, REPNE, REPNZ, REPZ) 0xF2, 0xF3
RET Return from procedure Not a real instruction. The assembler will translate these to a RETN or a RETF depending on the memory model of the target system.
RETN Return from near procedure 0xC2, 0xC3
RETF Return from far procedure 0xCA, 0xCB
ROL Rotate left 0xC0…0xC1/0 (since 80186), 0xD0…0xD3/0
ROR Rotate right 0xC0…0xC1/1 (since 80186), 0xD0…0xD3/1
SAHF Store AH into FLAGS 0x9E
SAL Shift Arithmetically left (signed shift left) (1) r/m <<= 1; (2) r/m <<= CL; 0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4
SUB Subtraction (1) r/m -= r/imm; (2) r -= m/imm; 0x28…0x2D, 0x80…0x81/5, 0x82…0x83/5 (since 80186)
TEST Logical compare (AND) (1) r/m & r/imm; (2) r & m/imm; 0x84, 0x85, 0xA8, 0xA9, 0xF6/0, 0xF7/0
WAIT Wait until not busy Waits until BUSY# pin is inactive (used with floating-point unit) 0x9B
XCHG Exchange data r :=: r/m; A spinlock typically uses xchg as an atomic operation. (coma bug). 0x86, 0x87, 0x91…0x97
XLAT Table look-up translation behaves like MOV AL, [BX+AL] 0xD7
XOR Exclusive OR (1) r/m ^= r/imm; (2) r ^= m/imm; 0x30…0x35, 0x80…0x81/6, 0x82…0x83/6 (since 80186)
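The MOVSB pseudocode in the table, combined with a REP prefix, can be modelled roughly as follows. This is only a sketch: it treats memory as one flat byte array and ignores segment registers, and the names `StringRegs` and `rep_movsb` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file for the string instructions:
   SI and DI are offsets into `mem`, CX is the repeat count,
   DF is the direction flag (0 = ascending, 1 = descending). */
typedef struct {
    uint16_t si, di, cx;
    int df;
} StringRegs;

/* Model of REP MOVSB: copy CX bytes from mem[SI] to mem[DI],
   stepping both pointers up or down according to DF. */
void rep_movsb(uint8_t *mem, StringRegs *r) {
    assert(mem != NULL && r != NULL);
    while (r->cx != 0) {
        mem[r->di] = mem[r->si];
        if (r->df == 0) {
            r->si++;
            r->di++;
        } else {
            r->si--;
            r->di--;
        }
        r->cx--;
    }
}
```

With DF set, the copy runs from high addresses to low ones, which is the usual way to move overlapping regions when the destination lies above the source.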
POPA Pop all general purpose registers from stack; equivalent to
POP DI
POP SI
POP BP
POP AX ; no POP SP here, all it does is ADD SP, 2 (since AX will be overwritten later)
POP BX
POP DX
POP CX
POP AX
PUSHA Push all general purpose registers onto stack; equivalent to
PUSH AX
PUSH CX
PUSH DX
PUSH BX
PUSH SP ; The value stored is the initial SP value
PUSH BP
PUSH SI
PUSH DI
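The PUSHA sequence above, including the detail that the stored SP is the value from before the first push, can be sketched in C. The `Regs16` struct and the byte-array stack are assumptions made for this illustration; segment registers are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit register file. */
typedef struct {
    uint16_t ax, cx, dx, bx, sp, bp, si, di;
} Regs16;

/* Push one little-endian word: decrement SP by 2, then store. */
static void push16(uint8_t *stack, Regs16 *r, uint16_t value) {
    r->sp = (uint16_t)(r->sp - 2);
    stack[r->sp] = (uint8_t)(value & 0xFF);
    stack[r->sp + 1] = (uint8_t)(value >> 8);
}

/* Model of PUSHA: eight pushes in the order listed above. */
void pusha(uint8_t *stack, Regs16 *r) {
    assert(r->sp >= 16);           /* room for eight words */
    uint16_t initial_sp = r->sp;   /* captured before any push */
    push16(stack, r, r->ax);
    push16(stack, r, r->cx);
    push16(stack, r, r->dx);
    push16(stack, r, r->bx);
    push16(stack, r, initial_sp);  /* not the already-decremented SP */
    push16(stack, r, r->bp);
    push16(stack, r, r->si);
    push16(stack, r, r->di);
}
```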
PUSH immediate Push an immediate byte/word value onto the stack, for example PUSH 12h or PUSH 1200h
MOVZX Move with zero-extension (long)r = (unsigned char) r/m; and similar
Also MMX registers and MMX support instructions were added. They are usable for both integer and floating
point operations, see below.
AMD changed the CPUID detection bit for this feature from the K6-II on.
AMD introduced TBM together with BMI1 in its Piledriver[7] line of processors; later AMD Jaguar and Zen-
based processors do not support TBM.[8] No Intel processors (as of 2020) support TBM.
Instruction Description[9] Equivalent C expression[10]
BEXTR Bit field extract (with immediate) (src >> start) & ((1 << len) - 1)
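The C expression in the table is only well defined when the shift amounts stay below the operand width. A minimal sketch for 32-bit operands follows; the function name `bextr32` and the choice to reject out-of-range fields with an assertion are assumptions of this example, not a statement about what the hardware does in those cases.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `len` bits of `src` starting at bit `start`. */
uint32_t bextr32(uint32_t src, unsigned start, unsigned len) {
    assert(start < 32 && len >= 1 && len <= 32);
    uint32_t shifted = src >> start;
    if (len == 32) {
        return shifted;  /* avoid the undefined shift 1 << 32 */
    }
    return shifted & ((UINT32_C(1) << len) - 1);
}
```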
Instruction Description
ADCX Adds two unsigned integers plus carry, reading the carry from the carry flag and if necessary setting it there. Does not affect other flags than the carry.
ADOX Adds two unsigned integers plus carry, reading the carry from the overflow flag and if necessary setting it there. Does not affect other flags than the overflow.
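Because the two instructions use different flags, two carry chains can be kept alive at the same time. The following sketch models that with two separate carry variables; the `Flags` struct and the function names are hypothetical and only 64-bit operands are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag state: only the two flags used here. */
typedef struct {
    unsigned cf;  /* carry flag, used by ADCX */
    unsigned of;  /* overflow flag, used by ADOX */
} Flags;

/* Add a + b + *carry, writing the carry-out back to *carry. */
static uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned *carry) {
    assert(*carry <= 1);
    uint64_t partial = a + b;
    unsigned c1 = partial < a;       /* carry out of a + b */
    uint64_t sum = partial + *carry;
    unsigned c2 = sum < partial;     /* carry out of adding the carry-in */
    *carry = c1 | c2;
    return sum;
}

/* ADCX reads and writes only CF. */
uint64_t adcx(uint64_t a, uint64_t b, Flags *f) {
    return add_with_carry(a, b, &f->cf);
}

/* ADOX reads and writes only OF. */
uint64_t adox(uint64_t a, uint64_t b, Flags *f) {
    return add_with_carry(a, b, &f->of);
}
```

Interleaving calls to `adcx` and `adox` leaves each chain's carry untouched by the other.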
FXRSTOR, FXSAVE
These are also supported on later Pentium IIs, which do not contain SSE support.
SIMD instructions
MMX instructions
MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU
registers.
EMMI instructions
The following MMX instructions were added with SSE. They are also available on the Athlon under the name
MMX+.
Instruction Opcode Meaning
MASKMOVQ mm1, mm2 0F F7 /r Masked Move of Quadword
MOVNTQ m64, mm 0F E7 /r Move Quadword Using Non-Temporal Hint
PSHUFW mm1, mm2/m64, imm8 0F 70 /r ib Shuffle Packed Words
PINSRW mm, r32/m16, imm8 0F C4 /r Insert Word
PEXTRW reg, mm, imm8 0F C5 /r Extract Word
PMOVMSKB reg, mm 0F D7 /r Move Byte Mask
PMINUB mm1, mm2/m64 0F DA /r Minimum of Packed Unsigned Byte Integers
PMAXUB mm1, mm2/m64 0F DE /r Maximum of Packed Unsigned Byte Integers
PAVGB mm1, mm2/m64 0F E0 /r Average Packed Integers
PAVGW mm1, mm2/m64 0F E3 /r Average Packed Integers
PMULHUW mm1, mm2/m64 0F E4 /r Multiply Packed Unsigned Integers and Store High Result
PMINSW mm1, mm2/m64 0F EA /r Minimum of Packed Signed Word Integers
PMAXSW mm1, mm2/m64 0F EE /r Maximum of Packed Signed Word Integers
PSADBW mm1, mm2/m64 0F F6 /r Compute Sum of Absolute Differences
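Two of the listed operations, PAVGB and PSADBW, can be sketched on plain byte arrays standing in for the 64-bit mm registers. The rounding rule used for the average, (a + b + 1) >> 1, is stated here from general knowledge of the instruction rather than from the table above, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PAVGB sketch: per-byte unsigned average, rounding up. */
void pavgb(uint8_t dst[8], const uint8_t src[8]) {
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
    }
}

/* PSADBW sketch: sum of absolute differences of eight unsigned bytes. */
uint16_t psadbw(const uint8_t a[8], const uint8_t b[8]) {
    assert(a != NULL && b != NULL);
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum = (uint16_t)(sum + (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]));
    }
    return sum;
}
```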
3DNow! instructions
FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN,
PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD,
PMULHRW, PREFETCH, PREFETCHW
3DNow!+ instructions
PFRSQRTV, PFRCPV
SSE instructions
SSE instructions operate on xmm registers, which are 128 bits wide.
The single-precision floating-point bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce
the same result as the SSE2 integer (PAND, PANDN, POR, PXOR) and double-precision ones (ANDPD,
ANDNPD, ORPD, XORPD), but can introduce extra latency for domain changes when applied
to values of the wrong type.[11]
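That the bitwise result does not depend on the nominal element type can be illustrated on a single element. The sketch below assumes `float` is a 32-bit IEEE 754 value; `xor_float_bits` is a hypothetical helper that applies an integer XOR to the bits of a float, which is what XORPS and PXOR both compute per 32-bit lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XOR the raw bits of a float with an integer mask. */
float xor_float_bits(float x, uint32_t mask) {
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing issues */
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

XORing with 0x80000000 flips only the sign bit, a common way to negate packed floats.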
SSE2 instructions
CMPSD and MOVSD have the same names as the string instruction mnemonics CMPSD
(CMPS) and MOVSD (MOVS); however, the former refer to scalar double-precision
floating-point values, whereas the latter refer to doubleword strings.
The following instructions can be used only on SSE registers, since by their nature they do not work on MMX
registers:
Instruction Opcode Meaning
MASKMOVDQU xmm1, xmm2 66 0F F7 /r Non-Temporal Store of Selected Bytes from an XMM Register into Memory
MOVDQ2Q mm, xmm F2 0F D6 /r Move low quadword from XMM to MMX register.
MOVDQA xmm1, xmm2/m128 66 0F 6F /r Move aligned double quadword
MOVDQA xmm2/m128, xmm1 66 0F 7F /r Move aligned double quadword
MOVDQU xmm1, xmm2/m128 F3 0F 6F /r Move unaligned double quadword
MOVDQU xmm2/m128, xmm1 F3 0F 7F /r Move unaligned double quadword
MOVQ2DQ xmm, mm F3 0F D6 /r Move quadword from MMX register to low quadword of XMM register
MOVNTDQ m128, xmm1 66 0F E7 /r Store Packed Integers Using Non-Temporal Hint
PSHUFHW xmm1, xmm2/m128, imm8 F3 0F 70 /r ib Shuffle packed high words.
PSHUFLW xmm1, xmm2/m128, imm8 F2 0F 70 /r ib Shuffle packed low words.
PSHUFD xmm1, xmm2/m128, imm8 66 0F 70 /r ib Shuffle packed doublewords.
PSLLDQ xmm1, imm8 66 0F 73 /7 ib Packed shift left logical double quadwords.
PSRLDQ xmm1, imm8 66 0F 73 /3 ib Packed shift right logical double quadwords.
PUNPCKHQDQ xmm1, xmm2/m128 66 0F 6D /r Unpack and interleave high-order quadwords
PUNPCKLQDQ xmm1, xmm2/m128 66 0F 6C /r Interleave low quadwords
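As an illustration of how the imm8 operand of the shuffle instructions is interpreted, here is a sketch of PSHUFD on an array of four doublewords. The encoding assumed here (two selector bits per destination element, lowest bits for element 0) is given from general knowledge of the instruction, not from the table, and `pshufd` is just an illustrative function name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PSHUFD sketch: dst[i] = src[selector i], where selector i
   is bits 2i+1..2i of imm8. */
void pshufd(uint32_t dst[4], const uint32_t src[4], uint8_t imm8) {
    assert(dst != NULL && src != NULL);
    uint32_t tmp[4];  /* temporary so dst and src may be the same array */
    for (int i = 0; i < 4; i++) {
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    }
    for (int i = 0; i < 4; i++) {
        dst[i] = tmp[i];
    }
}
```

For example, imm8 = 0x1B reverses the four elements and imm8 = 0x00 broadcasts element 0.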
SSE3 instructions
SSSE3 instructions
The following MMX-like instructions extended to SSE registers were added with SSSE3:
Instruction Opcode Meaning
PSIGNB xmm1, xmm2/m128 66 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW xmm1, xmm2/m128 66 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND xmm1, xmm2/m128 66 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB xmm1, xmm2/m128 66 0F 38 00 /r Shuffle bytes
PMULHRSW xmm1, xmm2/m128 66 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW xmm1, xmm2/m128 66 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW xmm1, xmm2/m128 66 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW xmm1, xmm2/m128 66 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD xmm1, xmm2/m128 66 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW xmm1, xmm2/m128 66 0F 38 03 /r Add and pack 16-bit signed integers horizontally with saturation
PHADDW xmm1, xmm2/m128 66 0F 38 01 /r Add and pack 16-bit integers horizontally
PHADDD xmm1, xmm2/m128 66 0F 38 02 /r Add and pack 32-bit integers horizontally
PALIGNR xmm1, xmm2/m128, imm8 66 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB xmm1, xmm2/m128 66 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW xmm1, xmm2/m128 66 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD xmm1, xmm2/m128 66 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
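The byte shuffle PSHUFB is the least self-explanatory entry in the table, so a small model may help. The rule assumed below (the low four bits of each control byte select a source byte, and a set top bit forces the result byte to zero) comes from general knowledge of the 128-bit form of the instruction; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PSHUFB sketch: shuffle the 16 bytes of `dst` in place,
   steered by the 16 control bytes in `ctl`. */
void pshufb(uint8_t dst[16], const uint8_t ctl[16]) {
    assert(dst != NULL && ctl != NULL);
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        /* Top bit set: zero the byte; otherwise pick dst[ctl & 15]. */
        tmp[i] = (ctl[i] & 0x80) ? 0 : dst[ctl[i] & 0x0F];
    }
    for (int i = 0; i < 16; i++) {
        dst[i] = tmp[i];
    }
}
```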
SSE4 instructions
SSE4.1
SSE4a
EXTRQ/INSERTQ
MOVNTSD/MOVNTSS
SSE4.2
SSE5 derived instructions
SSE5 was a proposed SSE extension by AMD. The bundle did not include the full set of Intel's SSE4
instructions, making it a competitor to SSE4 rather than a successor. AMD chose not to implement SSE5 as
originally proposed; however, derived SSE extensions were introduced.
XOP
Introduced with the Bulldozer processor core and removed again from the Zen microarchitecture onward.
F16C
Instruction Meaning
VCVTPH2PS xmmreg,xmmrm64 Convert four half-precision floating point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register
VCVTPH2PS ymmreg,xmmrm128 Convert eight half-precision floating point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register
VCVTPS2PH xmmrm64,xmmreg,imm8 Convert four single-precision floating point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register
VCVTPS2PH xmmrm128,ymmreg,imm8 Convert eight single-precision floating point values in a YMM register to half-precision floating-point values in memory or an XMM register
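What VCVTPH2PS does to each element can be sketched in portable C. The sketch assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits) and a 32-bit IEEE 754 `float`; `half_to_float` is a hypothetical helper, and NaN payload handling is simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one half-precision value (given as raw bits) to float. */
float half_to_float(uint16_t h) {
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, keep the fraction. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: normalize into a normal float. */
        uint32_t e = 113u;  /* biased float exponent of 2^-14 */
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Every half-precision value is exactly representable as a float, so this direction needs no rounding; the reverse conversion (VCVTPS2PH) does, which is why it takes an imm8 rounding control.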
FMA3
Supported in AMD processors starting with the Piledriver architecture, and in Intel processors starting with
Haswell and Broadwell processors since 2014.
FMA4
Supported in AMD processors starting with the Bulldozer architecture. Not supported by any Intel chip as of
2017.
Fused multiply-add with four operands. FMA4 was realized in hardware before FMA3.
Instruction Opcode Meaning Notes
VFMADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 69 /r /is4 Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 68 /r /is4 Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADDSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6B /r /is4 Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADDSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6A /r /is4 Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADDSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5D /r /is4 Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5C /r /is4 Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMSUBADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5F /r /is4 Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 5E /r /is4 Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6D /r /is4 Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6C /r /is4 Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUBSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6F /r /is4 Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUBSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 6E /r /is4 Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMADDPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 79 /r /is4 Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADDPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 78 /r /is4 Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADDSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7B /r /is4 Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADDSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7A /r /is4 Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMSUBPD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7D /r /is4 Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUBPS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7C /r /is4 Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUBSD xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7F /r /is4 Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUBSS xmm0, xmm1, xmm2, xmm3 C4E3 WvvvvL01 7E /r /is4 Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
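The six families in the table differ only in the signs applied to the product and the addend. The sketch below shows those sign conventions for the packed double-precision forms on two-element arrays. It is not a fused implementation: the product and the sum are rounded separately here, whereas the real instructions round once. The enum and function names are hypothetical, and the odd/even pattern for the alternating forms is stated from general knowledge of these instructions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    FMADD,     /*  a*b + c */
    FMSUB,     /*  a*b - c */
    FNMADD,    /* -(a*b) + c */
    FNMSUB,    /* -(a*b) - c */
    FMADDSUB,  /* odd elements add, even elements subtract */
    FMSUBADD   /* odd elements subtract, even elements add */
} FmaKind;

/* Four-operand form: d is written, a, b and c are only read. */
void fma4_pd(double d[2], const double a[2], const double b[2],
             const double c[2], FmaKind kind) {
    assert(d != NULL && a != NULL && b != NULL && c != NULL);
    for (int i = 0; i < 2; i++) {
        double p = a[i] * b[i];  /* rounded here, unlike real FMA */
        int odd = i & 1;
        switch (kind) {
        case FMADD:    d[i] = p + c[i]; break;
        case FMSUB:    d[i] = p - c[i]; break;
        case FNMADD:   d[i] = -p + c[i]; break;
        case FNMSUB:   d[i] = -p - c[i]; break;
        case FMADDSUB: d[i] = odd ? p + c[i] : p - c[i]; break;
        case FMSUBADD: d[i] = odd ? p - c[i] : p + c[i]; break;
        }
    }
}
```

The separate destination is what distinguishes FMA4 from FMA3, where one of the three source registers is overwritten with the result.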
AVX
AVX was first supported by Intel with Sandy Bridge and by AMD with Bulldozer.
Instruction Description
VBROADCASTSS, VBROADCASTSD, VBROADCASTF128 Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of an XMM or YMM vector register.
VINSERTF128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VMASKMOVPS, VMASKMOVPD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.[12]
VPERMILPS, VPERMILPD Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all 256 bits with two separate 128-bit shuffles, so they cannot shuffle across the 128-bit lanes.[13]
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
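The two directions of VMASKMOVPS described in the table can be sketched on arrays of eight floats standing in for a YMM register. The sketch assumes that an element is selected when the most significant bit of the corresponding 32-bit mask element is set, which is stated here from general knowledge of the instruction; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked load: selected elements are read, the others become zero
   and their memory locations are never touched. */
void maskload_ps(float dst[8], const float *mem, const int32_t mask[8]) {
    assert(dst != NULL && mem != NULL && mask != NULL);
    for (int i = 0; i < 8; i++) {
        dst[i] = (mask[i] < 0) ? mem[i] : 0.0f;  /* sign bit = select */
    }
}

/* Masked store: selected elements are written, the others are
   left unchanged in memory. */
void maskstore_ps(float *mem, const int32_t mask[8], const float src[8]) {
    assert(mem != NULL && mask != NULL && src != NULL);
    for (int i = 0; i < 8; i++) {
        if (mask[i] < 0) {
            mem[i] = src[i];
        }
    }
}
```

A typical use is handling the tail of an array whose length is not a multiple of the vector width.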
AVX2
Expansion of most vector integer SSE and AVX instructions to 256 bits
Instruction Description
VBROADCASTSS, VBROADCASTSD Copy a 32-bit or 64-bit register operand to all elements of an XMM or YMM vector register. These are register versions of the same instructions in AVX1. There is no 128-bit version however, but the same effect can be simply achieved using VINSERTF128.
VPBROADCASTB, VPBROADCASTW, VPBROADCASTD, VPBROADCASTQ Copy an 8, 16, 32 or 64-bit integer register or memory operand to all elements of an XMM or YMM vector register.
VBROADCASTI128 Copy a 128-bit memory operand to all elements of a YMM vector register.
VINSERTI128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTI128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VGATHERDPD, VGATHERQPD, VGATHERDPS, VGATHERQPS Gathers single or double precision floating point values using either 32 or 64-bit indices and scale.
VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ Gathers 32 or 64-bit integer values using either 32 or 64-bit indices and scale.
VPMASKMOVD, VPMASKMOVQ Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMPS, VPERMD Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMPD, VPERMQ Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERM2I128 Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VPBLENDD Doubleword immediate version of the PBLEND instructions from SSE4.
VPSLLVD, VPSLLVQ Shift left logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRLVD, VPSRLVQ Shift right logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRAVD Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input.
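The per-element variable shifts at the end of the table can be modelled in C, with one caveat: shifting a 32-bit value by 32 or more is undefined in C, so the out-of-range cases are spelled out. The behaviour assumed here for those cases (logical shifts give 0, the arithmetic shift fills with the sign bit) comes from general knowledge of the instructions, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One element of VPSLLVD: counts of 32 or more give 0. */
uint32_t sllv32(uint32_t x, uint32_t n) {
    return n < 32 ? x << n : 0;
}

/* One element of VPSRLVD: counts of 32 or more give 0. */
uint32_t srlv32(uint32_t x, uint32_t n) {
    return n < 32 ? x >> n : 0;
}

/* One element of VPSRAVD: counts above 31 behave like 31. */
int32_t srav32(int32_t x, uint32_t n) {
    if (n > 31) {
        n = 31;
    }
    uint32_t ux = (uint32_t)x;
    /* Shift the complement so that ones are shifted in for negatives. */
    uint32_t r = (x < 0) ? ~(~ux >> n) : ux >> n;
    return (int32_t)r;  /* assumes two's complement conversion */
}

/* Eight-element wrapper in the shape of the 256-bit VPSLLVD. */
void vpsllvd(uint32_t dst[8], const uint32_t a[8], const uint32_t cnt[8]) {
    assert(dst != NULL && a != NULL && cnt != NULL);
    for (int i = 0; i < 8; i++) {
        dst[i] = sllv32(a[i], cnt[i]);
    }
}
```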
AVX-512
AVX-512 foundation
Instruction Description
VBLENDMPD Blend float64 vectors using opmask control
VBLENDMPS Blend float32 vectors using opmask control
VPBLENDMD Blend int32 vectors using opmask control
VPBLENDMQ Blend int64 vectors using opmask control
VPCMPD, VPCMPUD Compare signed/unsigned doublewords into mask
VPCMPQ, VPCMPUQ Compare signed/unsigned quadwords into mask
VPTESTMD, VPTESTMQ Logical AND and set mask for 32 or 64 bit integers.
VPTESTNMD, VPTESTNMQ Logical NAND and set mask for 32 or 64 bit integers.
VCOMPRESSPD, VCOMPRESSPS Store sparse packed double/single-precision floating-point values into dense memory
VPCOMPRESSD, VPCOMPRESSQ Store sparse packed doubleword/quadword integer values into dense memory/register
VEXPANDPD, VEXPANDPS Load sparse packed double/single-precision floating-point values from dense memory
VPEXPANDD, VPEXPANDQ Load sparse packed doubleword/quadword integer values from dense memory/register
VPERMI2PD, VPERMI2PS Full single/double floating point permute overwriting the index.
VPERMI2D, VPERMI2Q Full doubleword/quadword permute overwriting the index.
VPERMT2PS, VPERMT2PD Full single/double floating point permute overwriting first source.
VPERMT2D, VPERMT2Q Full doubleword/quadword permute overwriting first source.
VSHUFF32x4, VSHUFF64x2, VSHUFI32x4, VSHUFI64x2 Shuffle four packed 128-bit lanes.
VPTERNLOGD, VPTERNLOGQ Bitwise Ternary Logic
VPMOVQD, VPMOVSQD, VPMOVUSQD, VPMOVQW, VPMOVSQW, VPMOVUSQW, VPMOVQB, VPMOVSQB, VPMOVUSQB, VPMOVDW, VPMOVSDW, VPMOVUSDW, VPMOVDB, VPMOVSDB, VPMOVUSDB Down convert quadword or doubleword to doubleword, word or byte; unsaturated, saturated or saturated unsigned. The reverse of the sign/zero extend instructions from SSE4.1.
VCVTPS2UDQ, VCVTPD2UDQ, VCVTTPS2UDQ, VCVTTPD2UDQ Convert with or without truncation, packed single or double-precision floating point to packed unsigned doubleword integers.
VCVTSS2USI, VCVTSD2USI, VCVTTSS2USI, VCVTTSD2USI Convert with or without truncation, scalar single or double-precision floating point to unsigned doubleword integer.
VCVTUDQ2PS, VCVTUDQ2PD Convert packed unsigned doubleword integers to packed single or double-precision floating point.
VCVTUSI2PS, VCVTUSI2PD Convert scalar unsigned doubleword integers to single or double-precision floating point.
VCVTUSI2SD, VCVTUSI2SS Convert scalar unsigned integers to single or double-precision floating point.
VCVTQQ2PD, VCVTQQ2PS Convert packed quadword integers to packed single or double-precision floating point.
VGETEXPPD, VGETEXPPS Convert exponents of packed fp values into fp values
VGETEXPSD, VGETEXPSS Convert exponent of scalar fp value into fp value
VGETMANTPD, VGETMANTPS Extract vector of normalized mantissas from float32/float64 vector
VGETMANTSD, VGETMANTSS Extract float32/float64 of normalized mantissa from float32/float64 scalar
VFIXUPIMMPD, VFIXUPIMMPS Fix up special packed float32/float64 values
VFIXUPIMMSD, VFIXUPIMMSS Fix up special scalar float32/float64 value
VRCP14PD, VRCP14PS Compute approximate reciprocals of packed float32/float64 values
VRCP14SD, VRCP14SS Compute approximate reciprocals of scalar float32/float64 value
VRNDSCALEPS, VRNDSCALEPD Round packed float32/float64 values to include a given number of fraction bits
VRNDSCALESS, VRNDSCALESD Round scalar float32/float64 value to include a given number of fraction bits
VRSQRT14PD, VRSQRT14PS Compute approximate reciprocals of square roots of packed float32/float64 values
VRSQRT14SD, VRSQRT14SS Compute approximate reciprocal of square root of scalar float32/float64 value
VSCALEFPS, VSCALEFPD Scale packed float32/float64 values with float32/float64 values
VSCALEFSS, VSCALEFSD Scale scalar float32/float64 value with float32/float64 value
VALIGND, VALIGNQ Align doubleword or quadword vectors
VPABSQ Packed absolute value quadword
VPMAXSQ, VPMAXUQ Maximum of packed signed/unsigned quadword
VPMINSQ, VPMINUQ Minimum of packed signed/unsigned quadword
VPROLD, VPROLVD, VPROLQ, VPROLVQ, VPRORD, VPRORVD, VPRORQ, VPRORVQ Bit rotate left or right
VPSCATTERDD, VPSCATTERDQ, VPSCATTERQD, VPSCATTERQQ Scatter packed doubleword/quadword with signed doubleword and quadword indices
VSCATTERDPS, VSCATTERDPD, VSCATTERQPS, VSCATTERQPD Scatter packed float32/float64 with signed doubleword and quadword indices
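Of the entries above, "Bitwise Ternary Logic" (VPTERNLOGD/VPTERNLOGQ) says the least about what is computed, so a sketch may help. The model assumed here, taken from general knowledge of the instruction rather than from the table, treats imm8 as an 8-entry truth table indexed per bit position by the three input bits, with the destination operand supplying the most significant index bit. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit element: for every bit position, look up the result
   bit in imm8 at index (a_bit << 2) | (b_bit << 1) | c_bit. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8) {
    uint32_t result = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
        result |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return result;
}

/* Sixteen-element wrapper in the shape of the 512-bit VPTERNLOGD:
   dst is both the first input and the output. */
void vpternlogd(uint32_t dst[16], const uint32_t b[16],
                const uint32_t c[16], uint8_t imm8) {
    assert(dst != NULL && b != NULL && c != NULL);
    for (int i = 0; i < 16; i++) {
        dst[i] = ternlog32(dst[i], b[i], c[i], imm8);
    }
}
```

Under this model, imm8 = 0x96 is a three-way XOR, 0xE8 is the bitwise majority function and 0xCA is a bitwise select (a ? b : c).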
Cryptographic instructions
Intel AES instructions
6 new instructions.
Instruction Description
AESENC Perform one round of an AES encryption flow
AESENCLAST Perform the last round of an AES encryption flow
AESDEC Perform one round of an AES decryption flow
AESDECLAST Perform the last round of an AES decryption flow
AESKEYGENASSIST Assist in AES round key generation
AESIMC Assist in AES Inverse Mix Columns
RDRAND and RDSEED
Instruction Description
RDRAND Read Random Number
RDSEED Read Random Seed
Intel SHA instructions
7 new instructions.
Instruction Description
SHA1RNDS4 Perform Four Rounds of SHA1 Operation
SHA1NEXTE Calculate SHA1 State Variable E after Four Rounds
SHA1MSG1 Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2 Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 Perform Two Rounds of SHA256 Operation
SHA256MSG1 Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2 Perform a Final Calculation for the Next Four SHA256 Message Dwords
Undocumented instructions
LOADALL 0F 05 Loads All Registers from Memory Address 0x000800H Only available on 80286
LOADALLD 0F 07 Loads All Registers from Memory Address ES:EDI Only available on 80386
UD1 0F B9 Intentionally undefined instruction, but unlike UD2 this was not published
ALTINST 0F 3F Jump and execute instructions in the undocumented Alternate Instruction Set. Only available on some x86 processors made by VIA Technologies.
See also
CLMUL
RDRAND
Larrabee extensions
Advanced Vector Extensions 2
Bit Manipulation Instruction Sets
CPUID
References
1. "Re: Intel Processor Identification and the CPUID Instruction"
(http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html?wapkw=processor-identification-cpuid-instruction).
Retrieved 2013-04-21.
2. Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers"
(https://web.archive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem000c.html).
Archived from the original (http://www.df.lth.se/~john_e/gems/gem000c.html) on 1999-11-03.
"The instruction brings down the upper word of the doubleword register without affecting its upper 16 bits."
3. Coldwind, Gynvael (2009-12-29). "BSWAP + 66h prefix" (https://gynvael.coldwind.pl/?id=268).
Retrieved 2018-10-03. "internal (zero-)extending the value of a smaller (16-bit) register …
applying the bswap to a 32-bit value "00 00 AH AL", … truncated to lower 16-bits, which are
"00 00". … Bochs … bswap reg16 acts just like the bswap reg32 … QEMU … ignores the 66h
prefix"
4. "RSM—Resume from System Management Mode"
(https://web.archive.org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc279.htm).
Archived from the original on 2012-03-12.
5. Intel 64 and IA-32 Architectures Optimization Reference Manual
(https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html),
section 7.3.2
6. Intel 64 and IA-32 Architectures Software Developer’s Manual
(https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf),
section 4.3, subsection "PREFETCHh—Prefetch Data Into Caches"
7. Hollingsworth, Brent. "New "Bulldozer" and "Piledriver" instructions"
(http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf)
(PDF). Advanced Micro Devices, Inc. Retrieved 11 December 2014.
8. "Family 16h AMD A-Series Data Sheet" (http://support.amd.com/TechDocs/52169_KB_A_Series_Mobile.pdf)
(PDF). amd.com. AMD. October 2013. Retrieved 2014-01-02.
9. "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System
Instructions" (http://support.amd.com/TechDocs/24594.pdf) (PDF). amd.com. AMD. October
2013. Retrieved 2014-01-02.
10. "tbmintrin.h from GCC 4.8"
(https://gcc.gnu.org/viewcvs/gcc/branches/gcc-4_8-branch/gcc/config/i386/tbmintrin.h?revision=196696&view=markup).
Retrieved 2014-03-17.
11. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
section 3.5.2.3
12. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly
programmers and compiler makers" (http://www.agner.org/optimize/microarchitecture.pdf)
(PDF). Retrieved October 17, 2016.
13. "Chess programming AVX2" (https://chessprogramming.wikispaces.com/AVX2). Retrieved
October 17, 2016.
14. "Re: Undocumented opcodes (HINT_NOP)"
(https://web.archive.org/web/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm).
Archived from the original (http://www.sandpile.org/post/msgs/20004129.htm) on 2004-11-06.
Retrieved 2010-11-07.
15. "Re: Also some undocumented 0Fh opcodes"
(https://web.archive.org/web/20030626044017/http://www.sandpile.org/post/msgs/20003986.htm).
Archived from the original (http://www.sandpile.org/post/msgs/20003986.htm) on 2003-06-26.
Retrieved 2010-11-07.
External links
Free IA-32 and x86-64 documentation (https://software.intel.com/en-us/articles/intel-sdm),
provided by Intel
x86 Opcode and Instruction Reference (http://ref.x86asm.net)
x86 and amd64 instruction reference (https://www.felixcloutier.com/x86/index.html)
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns
for Intel, AMD and VIA CPUs (https://www.agner.org/optimize/instruction_tables.pdf)
Netwide Assembler Instruction List (https://www.nasm.us/doc/nasmdocb.html) (from Netwide
Assembler)