IA32 Instruction Set (Short Form)
Description of Operands
r8         8-bit general purpose register
r16        16-bit general purpose register
r32        32-bit general purpose register
EDX:EAX    64-bit integer number; EDX holds the more significant part, EAX the less significant part
imm8       immediate 8-bit value from -128 to +127
imm16      immediate 16-bit value from -32768 to +32767
imm32      immediate 32-bit value from -2147483648 to +2147483647
r/m8       8-bit general purpose register or memory location
r/m16      16-bit general purpose register or memory location
r/m32      32-bit general purpose register or memory location
m          16-bit or 32-bit memory location
m8         8-bit memory location
m16        16-bit memory location
m32        32-bit memory location
m64        64-bit memory location
m128       128-bit memory location
mNbyte     N-byte memory location
ptr16:16   16-bit far pointer in a different code segment
ptr32:32   32-bit far pointer in a different code segment
m16:16     memory location containing a far pointer composed of two 16-bit numbers: segment and offset
m16:32     memory location containing a far pointer composed of a 16-bit segment and a 32-bit offset
m16&16     memory location containing a data pair: 16&16-bit
m16&32     memory location containing a data pair: 16&32-bit
m32&32     memory location containing a data pair: 32&32-bit
moffs8     simple 8-bit memory location whose actual address is given by a simple offset relative to the segment base
moffs16    simple 16-bit memory location whose actual address is given by a simple offset relative to the segment base
moffs32    simple 32-bit memory location whose actual address is given by a simple offset relative to the segment base
Sreg       segment register: CS, DS, SS, ES, FS or GS
rel8       relative address in the range from 128 bytes before to 127 bytes after the end of the instruction
rel16      16-bit relative address within the same code segment
rel32      32-bit relative address within the same code segment
m32fp      single-precision floating-point memory location
m64fp      double-precision floating-point memory location
m80fp      extended-precision floating-point memory location
m16int     word integer memory location
m32int     double-word (dword) integer memory location
m64int     quad-word (qword) integer memory location
ST         the top element of the FPU register stack
ST(0)      the top element of the FPU register stack
ST(i)      the i-th element of the FPU register stack (i = 0..7)
mm         64-bit MMX register from MM0 to MM7
mm/m32     low-order 32 bits of an MMX register or 32-bit memory location
mm/m64     MMX register or 64-bit memory location
xmm        128-bit XMM register from XMM0 to XMM7
xmm/m32    XMM register or 32-bit memory location
xmm/m64    XMM register or 64-bit memory location
xmm/m128   XMM register or 128-bit memory location
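Register Set
General Purpose Registers (32-bit; bits 15..0 form the 16-bit register, bits 15..8 its high byte, bits 7..0 its low byte)
    EAX    AX    AH    AL
    EBX    BX    BH    BL
    ECX    CX    CH    CL
    EDX    DX    DH    DL
Index Registers (32-bit; bits 15..0 form the 16-bit register)
    ESI    SI
    EDI    DI
Pointer Registers (32-bit; bits 15..0 form the 16-bit register)
    EBP    BP
    ESP    SP
Instruction Pointer (32-bit; bits 15..0 form the 16-bit register)
    EIP    IP
Segment Registers (16-bit)
    CS, DS, SS, ES, FS, GS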
Flags
EFLAGS (bits 31..0; bits 15..0 form FLAGS). Bits 31..22 are reserved. Bit assignments, from bit 21 down to bit 0:
    21 ID, 20 VIP, 19 VIF, 18 AC, 17 VM, 16 RF, 15 0, 14 NT, 13..12 IOPL, 11 OF, 10 DF, 9 IF, 8 TF, 7 SF, 6 ZF, 5 0, 4 AF, 3 0, 2 PF, 1 1, 0 CF
MXCSR (bits 31..0)
CPU Instruction set
Logical Instructions
Instruction Mnemonic Operands Description Symbolic operations
Logical Negation    NOT    r/m8 | r/m16 | r/m32
    Description: Inverts each bit of the operand.
    Symbolic operations:
        DST ← NOT DST;
        // EFLAGS.CF, .OF, .SF, .ZF, .AF, .PF are not affected
Logical AND    AND
    Description: Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
    Symbolic operations:
        DST ← DST AND SRC;
        EFLAGS.OF, .CF ← 00B;
        SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Logical Inclusive OR    OR
    Description: Performs a bitwise OR operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
    Symbolic operations:
        DST ← DST OR SRC;
        EFLAGS.OF, .CF ← 00B;
        SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Logical Exclusive OR    XOR
    Description: Performs a bitwise XOR operation on the destination (first) and source (second) operands and stores the result in the destination operand location.
    Symbolic operations:
        DST ← DST XOR SRC;
        EFLAGS.OF, .CF ← 00B;
        SET EFLAGS.SF, .ZF, .PF // EFLAGS.AF is undefined
Operands (AND, OR and XOR accept the same forms):
    AL, imm8 | AX, imm16 | EAX, imm32
    r/m8, imm8 | r/m16, imm16 | r/m32, imm32
    r/m16, imm8 | r/m32, imm8
    r/m8, r8 | r/m16, r16 | r/m32, r32
    r8, r/m8 | r16, r/m16 | r32, r/m32
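The flag effects listed for AND, OR and XOR can be illustrated with a small model. This is a minimal sketch, not a definitive implementation: the helper names `logic_flags` and `and_op` are hypothetical, operands are plain Python integers, and AF (undefined after these instructions) is simply left out.

```python
def logic_flags(result: int, width: int = 8) -> dict:
    """Model of the flags after AND/OR/XOR on a result of `width` bits."""
    mask = (1 << width) - 1
    result &= mask
    low_byte = result & 0xFF          # PF looks only at the low 8 bits
    return {
        "CF": 0,                      # always cleared by logical instructions
        "OF": 0,                      # always cleared by logical instructions
        "SF": (result >> (width - 1)) & 1,
        "ZF": int(result == 0),
        "PF": int(bin(low_byte).count("1") % 2 == 0),  # 1 on even parity
    }


def and_op(dst: int, src: int, width: int = 8):
    """DST <- DST AND SRC, returning the result and the modelled flags."""
    result = (dst & src) & ((1 << width) - 1)
    return result, logic_flags(result, width)


result, flags = and_op(0xF0, 0x0F)
print(result, flags)
```

The same `logic_flags` helper applies unchanged to OR and XOR results, since the three instructions differ only in the bitwise operation.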
String Instructions
Instruction Mnemonic Operands Description Symbolic operations
Move String item    MOVS    m8, m8 | m16, m16 | m32, m32
    Description: Moves a byte, word or dword from address DS:(E)SI to address ES:(E)DI, then increases or decreases (E)SI and (E)DI (depending on the DF flag). Both operands specify only the type of the moved data, not the location. The locations of the operands are always given by the DS:(E)SI and ES:(E)DI registers.
Move String Byte    MOVSB
    Description: Moves a byte from address DS:(E)SI to address ES:(E)DI, then increases or decreases (E)SI and (E)DI (depending on the DF flag).
Move String Word    MOVSW
    Description: Moves a word from address DS:(E)SI to address ES:(E)DI, then increases or decreases (E)SI and (E)DI (depending on the DF flag).
Move String Dword    MOVSD
    Description: Moves a dword from address DS:(E)SI to address ES:(E)DI, then increases or decreases (E)SI and (E)DI (depending on the DF flag).
Symbolic operations for MOVS, MOVSB, MOVSW, MOVSD:
    ES:[(E)DI] ← DS:[(E)SI];
    IF (DF=0) THEN
        (E)SI ← (E)SI + SIZEOF(SRC);
        (E)DI ← (E)DI + SIZEOF(DST);
    ELSE
        (E)SI ← (E)SI - SIZEOF(SRC);
        (E)DI ← (E)DI - SIZEOF(DST);
    END
Repeat Move String item    REP MOVS    m8, m8 | m16, m16 | m32, m32
    Description: Moves (E)CX bytes, words or dwords from address DS:(E)SI to address ES:(E)DI, increasing or decreasing (E)SI and (E)DI after each item (depending on the DF flag). Both operands specify only the type of the moved data, not the location. The locations of the operands are always given by the DS:(E)SI and ES:(E)DI registers.
    Symbolic operations:
        WHILE (E)CX<>0 DO
            MOVS DST, SRC;
            (E)CX ← (E)CX - 1
        END
Repeat Move String Byte    REP MOVSB
    Description: Moves (E)CX bytes from address DS:(E)SI to address ES:(E)DI, increasing or decreasing (E)SI and (E)DI (depending on the DF flag).
Repeat Move String Word    REP MOVSW
    Description: Moves (E)CX words from address DS:(E)SI to address ES:(E)DI, increasing or decreasing (E)SI and (E)DI (depending on the DF flag).
Repeat Move String Dword    REP MOVSD
    Description: Moves (E)CX dwords from address DS:(E)SI to address ES:(E)DI, increasing or decreasing (E)SI and (E)DI (depending on the DF flag).
Symbolic operations for REP MOVSB, REP MOVSW, REP MOVSD:
    WHILE (E)CX<>0 DO
        MOVS(B|W|D)
        (E)CX ← (E)CX - 1
    END
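The REP MOVSB loop above can be traced with a short simulation. This is a sketch under stated assumptions: memory is a flat `bytearray` (the DS and ES segment bases are both taken as 0), and the function name `rep_movsb` and its parameters are hypothetical stand-ins for the registers.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int, df: int = 0):
    """Simulate REP MOVSB on a flat memory image.

    Returns the final (esi, edi, ecx) register values.
    """
    step = -1 if df else 1            # DF=0 counts up, DF=1 counts down
    while ecx != 0:
        mem[edi] = mem[esi]           # ES:[(E)DI] <- DS:[(E)SI]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx


forward = bytearray(b"hello___")
print(rep_movsb(forward, esi=0, edi=5, ecx=3), forward)

backward = bytearray(b"hello___")
print(rep_movsb(backward, esi=4, edi=7, ecx=3, df=1), backward)
```

With DF=1 the pointers start at the last item and walk downwards, which is why the second call begins at offsets 4 and 7.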
Load String item    LODS    m8 | m16 | m32
    Description: Loads a byte (m8), word (m16) or dword (m32) from address DS:(E)SI into AL, AX or EAX respectively, then increases or decreases (E)SI.
Load String Byte    LODSB
    Description: Loads a byte from address DS:(E)SI into AL, then increases or decreases (E)SI.
Load String Word    LODSW
    Description: Loads a word from address DS:(E)SI into AX, then increases or decreases (E)SI.
Load String Dword    LODSD
    Description: Loads a dword from address DS:(E)SI into EAX, then increases or decreases (E)SI.
Symbolic operations for LODS, LODSB, LODSW, LODSD:
    ACC ← DS:[(E)SI];
    IF (DF=0) THEN
        (E)SI ← (E)SI + SIZEOF(SRC);
    ELSE
        (E)SI ← (E)SI - SIZEOF(SRC);
    END
Repeat Load String item    REP LODS    m8 | m16 | m32
    Description: Loads a byte (m8), word (m16) or dword (m32) (E)CX times from address DS:(E)SI into AL, AX or EAX respectively.
    Symbolic operations:
        WHILE (E)CX<>0 DO
            LODS SRC;
            (E)CX ← (E)CX - 1
        END
Repeat Load String Byte    REP LODSB
    Description: Loads a byte (E)CX times from address DS:(E)SI into AL.
Repeat Load String Word    REP LODSW
    Description: Loads a word (E)CX times from address DS:(E)SI into AX.
Repeat Load String Dword    REP LODSD
    Description: Loads a dword (E)CX times from address DS:(E)SI into EAX.
Symbolic operations for REP LODSB, REP LODSW, REP LODSD:
    WHILE (E)CX<>0 DO
        LODS(B|W|D)
        (E)CX ← (E)CX - 1
    END
Store String item    STOS    m8 | m16 | m32
    Description: Stores a byte (m8), word (m16) or dword (m32) from AL, AX or EAX respectively to address ES:(E)DI, then increases or decreases (E)DI.
Store String Byte    STOSB
    Description: Stores a byte from AL to address ES:(E)DI, then increases or decreases (E)DI.
Store String Word    STOSW
    Description: Stores a word from AX to address ES:(E)DI, then increases or decreases (E)DI.
Store String Dword    STOSD
    Description: Stores a dword from EAX to address ES:(E)DI, then increases or decreases (E)DI.
Symbolic operations for STOS, STOSB, STOSW, STOSD:
    ES:[(E)DI] ← ACC;
    IF (DF=0) THEN
        (E)DI ← (E)DI + SIZEOF(DST);
    ELSE
        (E)DI ← (E)DI - SIZEOF(DST);
    END
Repeat Store String item    REP STOS    m8 | m16 | m32
    Description: Stores a byte (m8), word (m16) or dword (m32) (E)CX times from AL, AX or EAX respectively to address ES:(E)DI.
    Symbolic operations:
        WHILE (E)CX<>0 DO
            STOS DST;
            (E)CX ← (E)CX - 1
        END
Repeat Store String Byte    REP STOSB
    Description: Stores a byte (E)CX times from AL to address ES:(E)DI.
Repeat Store String Word    REP STOSW
    Description: Stores a word (E)CX times from AX to address ES:(E)DI.
Repeat Store String Dword    REP STOSD
    Description: Stores a dword (E)CX times from EAX to address ES:(E)DI.
Symbolic operations for REP STOSB, REP STOSW, REP STOSD:
    WHILE (E)CX<>0 DO
        STOS(B|W|D)
        (E)CX ← (E)CX - 1
    END
Compare String item    CMPS    m8, m8 | m16, m16 | m32, m32
    Description: Compares the byte, word or dword at address DS:(E)SI with the byte, word or dword at address ES:(E)DI and sets the status flags accordingly. Both operands specify only the type of the compared data, not the location. The locations of the operands are always given by the DS:(E)SI and ES:(E)DI registers.
Compare String Byte    CMPSB
    Description: Compares the byte at address DS:(E)SI with the byte at address ES:(E)DI and sets the status flags accordingly.
Compare String Word    CMPSW
    Description: Compares the word at address DS:(E)SI with the word at address ES:(E)DI and sets the status flags accordingly.
Compare String Dword    CMPSD
    Description: Compares the dword at address DS:(E)SI with the dword at address ES:(E)DI and sets the status flags accordingly.
Symbolic operations for CMPS, CMPSB, CMPSW, CMPSD:
    TMP ← DS:[(E)SI] - ES:[(E)DI];
    SET EFLAGS;
    IF (DF=0) THEN
        (E)SI ← (E)SI + SIZEOF(SRC);
        (E)DI ← (E)DI + SIZEOF(DST);
    ELSE
        (E)SI ← (E)SI - SIZEOF(SRC);
        (E)DI ← (E)DI - SIZEOF(DST);
    END
Repeat Compare String item while Equal / Zero    REPE CMPS, REPZ CMPS    m8, m8 | m16, m16 | m32, m32
    Description: Compares the byte, word or dword at address DS:(E)SI with the byte, word or dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0 (the items differ).
    Symbolic operations:
        WHILE (E)CX<>0 DO
            (E)CX ← (E)CX - 1;
            CMPS DST,SRC;
        UNTIL ZF=0
Repeat Compare String Byte while Equal / Zero    REPE CMPSB, REPZ CMPSB
    Description: Compares the byte at address DS:(E)SI with the byte at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Repeat Compare String Word while Equal / Zero    REPE CMPSW, REPZ CMPSW
    Description: Compares the word at address DS:(E)SI with the word at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Repeat Compare String Dword while Equal / Zero    REPE CMPSD, REPZ CMPSD
    Description: Compares the dword at address DS:(E)SI with the dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Symbolic operations for REPE/REPZ CMPSB, CMPSW, CMPSD:
    WHILE (E)CX<>0 DO
        (E)CX ← (E)CX - 1;
        CMPS(B|W|D)
    UNTIL ZF=0
Repeat Compare String item while Not Equal / Not Zero    REPNE CMPS, REPNZ CMPS    m8, m8 | m16, m16 | m32, m32
    Description: Compares the byte, word or dword at address DS:(E)SI with the byte, word or dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1 (the items are equal).
    Symbolic operations:
        WHILE (E)CX<>0 DO
            (E)CX ← (E)CX - 1;
            CMPS DST,SRC;
        UNTIL ZF=1
Repeat Compare String Byte while Not Equal / Not Zero    REPNE CMPSB, REPNZ CMPSB
    Description: Compares the byte at address DS:(E)SI with the byte at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Repeat Compare String Word while Not Equal / Not Zero    REPNE CMPSW, REPNZ CMPSW
    Description: Compares the word at address DS:(E)SI with the word at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Repeat Compare String Dword while Not Equal / Not Zero    REPNE CMPSD, REPNZ CMPSD
    Description: Compares the dword at address DS:(E)SI with the dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Symbolic operations for REPNE/REPNZ CMPSB, CMPSW, CMPSD:
    WHILE (E)CX<>0 DO
        (E)CX ← (E)CX - 1;
        CMPS(B|W|D)
    UNTIL ZF=1
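The repeated compare can be modelled in a few lines to show where the registers end up when a mismatch stops the loop. This is a sketch only: memory is assumed flat (DS and ES bases both 0), only ZF is modelled, and the name `repe_cmpsb` and its parameters are hypothetical.

```python
def repe_cmpsb(mem: bytes, esi: int, edi: int, ecx: int, df: int = 0, zf: int = 0):
    """Simulate REPE CMPSB; returns the final (esi, edi, ecx, zf).

    If ecx is 0 on entry no comparison is made and zf is returned unchanged.
    """
    step = -1 if df else 1
    while ecx != 0:
        zf = int(mem[esi] == mem[edi])   # ZF=1 when the two bytes are equal
        esi += step                      # pointers advance even on a mismatch
        edi += step
        ecx -= 1
        if zf == 0:                      # REPE stops at the first difference
            break
    return esi, edi, ecx, zf


print(repe_cmpsb(b"abcdabXd", esi=0, edi=4, ecx=4))
print(repe_cmpsb(b"abcdabcd", esi=0, edi=4, ecx=4))
```

Note that after a mismatch the pointers have already moved past the differing items, so the offending byte sits one step behind (E)SI and (E)DI.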
Scan String item    SCAS    m8 | m16 | m32
    Description: Compares AL, AX or EAX with the byte (m8), word (m16) or dword (m32) at ES:(E)DI and sets the status flags accordingly, then increases or decreases (E)DI.
Scan String Byte    SCASB
    Description: Compares AL with the byte at ES:(E)DI and sets the status flags accordingly.
Scan String Word    SCASW
    Description: Compares AX with the word at ES:(E)DI and sets the status flags accordingly.
Scan String Dword    SCASD
    Description: Compares EAX with the dword at ES:(E)DI and sets the status flags accordingly.
Symbolic operations for SCAS, SCASB, SCASW, SCASD:
    TMP ← ACC - ES:[(E)DI];
    SET EFLAGS;
    IF (DF=0) THEN
        (E)DI ← (E)DI + SIZEOF(DST);
    ELSE
        (E)DI ← (E)DI - SIZEOF(DST);
    END
Repeat Scan String item while Equal / Zero    REPE SCAS, REPZ SCAS    m8 | m16 | m32
    Description: Compares the accumulator with the byte, word or dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0 (the item differs from the accumulator).
    Symbolic operations:
        WHILE (E)CX<>0 DO
            (E)CX ← (E)CX - 1;
            SCAS DST;
        UNTIL ZF=0
Repeat Scan String Byte while Equal / Zero    REPE SCASB, REPZ SCASB
    Description: Compares AL with the byte at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Repeat Scan String Word while Equal / Zero    REPE SCASW, REPZ SCASW
    Description: Compares AX with the word at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Repeat Scan String Dword while Equal / Zero    REPE SCASD, REPZ SCASD
    Description: Compares EAX with the dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is cleared to 0.
Symbolic operations for REPE/REPZ SCASB, SCASW, SCASD:
    WHILE (E)CX<>0 DO
        (E)CX ← (E)CX - 1;
        SCAS(B|W|D);
    UNTIL ZF=0
Repeat Scan String item while Not Equal / Not Zero    REPNE SCAS, REPNZ SCAS    m8 | m16 | m32
    Description: Compares the accumulator with the byte, word or dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1 (the item equals the accumulator).
    Symbolic operations:
        WHILE (E)CX<>0 DO
            (E)CX ← (E)CX - 1;
            SCAS DST;
        UNTIL ZF=1
Repeat Scan String Byte while Not Equal / Not Zero    REPNE SCASB, REPNZ SCASB
    Description: Compares AL with the byte at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Repeat Scan String Word while Not Equal / Not Zero    REPNE SCASW, REPNZ SCASW
    Description: Compares AX with the word at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Repeat Scan String Dword while Not Equal / Not Zero    REPNE SCASD, REPNZ SCASD
    Description: Compares EAX with the dword at address ES:(E)DI at most (E)CX times, stopping early as soon as ZF is set to 1.
Symbolic operations for REPNE/REPNZ SCASB, SCASW, SCASD:
    WHILE (E)CX<>0 DO
        (E)CX ← (E)CX - 1;
        SCAS(B|W|D);
    UNTIL ZF=1
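A common use of the repeated scan is finding a terminating zero byte. The sketch below models REPNE SCASB on a flat memory image (ES base assumed 0) and derives a string length from the final (E)DI; the names `repne_scasb` and `c_strlen` are hypothetical, and only ZF is modelled.

```python
def repne_scasb(mem: bytes, al: int, edi: int, ecx: int, df: int = 0, zf: int = 0):
    """Simulate REPNE SCASB; returns the final (edi, ecx, zf)."""
    step = -1 if df else 1
    while ecx != 0:
        zf = int(al == mem[edi])      # ZF=1 when AL equals the scanned byte
        edi += step                   # (E)DI advances past the scanned byte
        ecx -= 1
        if zf == 1:                   # REPNE stops at the first match
            break
    return edi, ecx, zf


def c_strlen(mem: bytes, start: int) -> int:
    """Length of the zero-terminated string at `start`, or -1 if no terminator."""
    edi, _, zf = repne_scasb(mem, al=0, edi=start, ecx=len(mem) - start)
    # On a match (E)DI points one byte past the terminator.
    return edi - start - 1 if zf else -1


print(repne_scasb(b"hello\x00xx", al=0, edi=0, ecx=8))
print(c_strlen(b"hello\x00xx", 0))
```

Bounding (E)CX by the buffer size, as done here, keeps the scan from running past the end of memory when no terminator is present.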
Input String item    INS    m8, DX | m16, DX | m32, DX
    Description: Inputs a byte, word or dword from the I/O port specified in DX into the memory location ES:(E)DI, then increments or decrements (E)DI.
Input String Byte    INSB
    Description: Inputs a byte from the I/O port specified in DX into the memory location ES:(E)DI, then increments or decrements (E)DI.
Input String Word    INSW
    Description: Inputs a word from the I/O port specified in DX into the memory location ES:(E)DI, then increments or decrements (E)DI.
Input String Dword    INSD
    Description: Inputs a dword from the I/O port specified in DX into the memory location ES:(E)DI, then increments or decrements (E)DI.
Symbolic operations for INS, INSB, INSW, INSD:
    ES:[(E)DI] ← Port(DX)
    IF (DF=0) THEN
        (E)DI ← (E)DI + SIZEOF(DST);
    ELSE
        (E)DI ← (E)DI - SIZEOF(DST);
    END
Repeat Input String item    REP INS    m8, DX | m16, DX | m32, DX
    Description: Inputs (E)CX bytes, words or dwords from the I/O port specified in DX into memory at address ES:(E)DI, incrementing or decrementing (E)DI after each item.
    Symbolic operations:
        WHILE (E)CX<>0 DO
            INS DST,DX;
            (E)CX ← (E)CX - 1
        END
Repeat Input String Byte    REP INSB
    Description: Inputs (E)CX bytes from the I/O port specified in DX into memory at address ES:(E)DI, incrementing or decrementing (E)DI.
Repeat Input String Word    REP INSW
    Description: Inputs (E)CX words from the I/O port specified in DX into memory at address ES:(E)DI, incrementing or decrementing (E)DI.
Repeat Input String Dword    REP INSD
    Description: Inputs (E)CX dwords from the I/O port specified in DX into memory at address ES:(E)DI, incrementing or decrementing (E)DI.
Symbolic operations for REP INSB, REP INSW, REP INSD:
    WHILE (E)CX<>0 DO
        INS(B|W|D);
        (E)CX ← (E)CX - 1
    END
Output String item    OUTS    DX, m8 | DX, m16 | DX, m32
    Description: Outputs a byte, word or dword from the memory location DS:(E)SI to the I/O port specified in DX, then increments or decrements (E)SI.
Output String Byte    OUTSB
    Description: Outputs a byte from the memory location DS:(E)SI to the I/O port specified in DX, then increments or decrements (E)SI.
Output String Word    OUTSW
    Description: Outputs a word from the memory location DS:(E)SI to the I/O port specified in DX, then increments or decrements (E)SI.
Output String Dword    OUTSD
    Description: Outputs a dword from the memory location DS:(E)SI to the I/O port specified in DX, then increments or decrements (E)SI.
Symbolic operations for OUTS, OUTSB, OUTSW, OUTSD:
    Port(DX) ← DS:[(E)SI]
    IF (DF=0) THEN
        (E)SI ← (E)SI + SIZEOF(SRC);
    ELSE
        (E)SI ← (E)SI - SIZEOF(SRC);
    END
Repeat Output String item    REP OUTS    DX, m8 | DX, m16 | DX, m32
    Description: Outputs (E)CX bytes, words or dwords from the memory location DS:(E)SI to the I/O port specified in DX, incrementing or decrementing (E)SI after each item.
    Symbolic operations:
        WHILE (E)CX<>0 DO
            OUTS DX,SRC;
            (E)CX ← (E)CX - 1
        END
Repeat Output String Byte    REP OUTSB
    Description: Outputs (E)CX bytes from the memory location DS:(E)SI to the I/O port specified in DX, incrementing or decrementing (E)SI.
Repeat Output String Word    REP OUTSW
    Description: Outputs (E)CX words from the memory location DS:(E)SI to the I/O port specified in DX, incrementing or decrementing (E)SI.
Repeat Output String Dword    REP OUTSD
    Description: Outputs (E)CX dwords from the memory location DS:(E)SI to the I/O port specified in DX, incrementing or decrementing (E)SI.
Symbolic operations for REP OUTSB, REP OUTSW, REP OUTSD:
    WHILE (E)CX<>0 DO
        OUTS(B|W|D);
        (E)CX ← (E)CX - 1
    END
Miscellaneous Instructions
Instruction Mnemonic Operands Description Symbolic operations
Load effective address    LEA    r16, m | r32, m
    Description: Stores the effective address of m in the destination register.
    Symbolic operations:
        DST ← EffectiveAddress(m)
No Operation    NOP
    Description: Does nothing.
Undefined instruction    UD2
    Description: Raises an invalid opcode exception.
Table Look-up Translation    XLAT m8, XLATB
    Description: Sets AL to the memory byte at DS:[(E)BX + unsigned AL].
    Symbolic operations:
        AL ← DS:[(E)BX + ZeroExtend(AL)]
CPU Identification    CPUID
    Description: Returns processor identification and feature information in the EAX, EBX, ECX and EDX registers, according to the input value initially placed in the EAX register.
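The table look-up performed by XLAT can be shown with a one-line model. This is an illustrative sketch: the table is a Python `bytes` object standing in for memory at DS:[(E)BX], and the function name `xlat` and the hex-digit table are hypothetical choices for the example.

```python
def xlat(mem: bytes, ebx: int, al: int) -> int:
    """AL <- mem[EBX + ZeroExtend(AL)]; returns the new AL value."""
    return mem[ebx + (al & 0xFF)]     # AL is used as an unsigned 8-bit index


HEX_DIGITS = b"0123456789ABCDEF"      # translation table placed at offset 0
print(chr(xlat(HEX_DIGITS, ebx=0, al=0x0B)))
```

Because AL is zero-extended, a table of up to 256 entries can be addressed regardless of the sign bit of AL.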
FPU Instruction Set
FPU Examine    FXAM
    Description: Examines the contents of the ST(0) register and sets the condition flags C0, C2 and C3 in the FPU status word according to the result.
    Symbolic operations:
        CASE ST(0) OF
            NaN:      FPU.C3, .C2, .C0 ← 001B;
            normal:   FPU.C3, .C2, .C0 ← 010B;
            infinity: FPU.C3, .C2, .C0 ← 011B;
            zero:     FPU.C3, .C2, .C0 ← 100B;
            empty:    FPU.C3, .C2, .C0 ← 101B;
            denormal: FPU.C3, .C2, .C0 ← 110B;
        END
        FPU.C1 ← Sign(ST(0))
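The FXAM classification table can be mirrored in a short model. This is only a sketch: it classifies a Python float (a 64-bit double) rather than the 80-bit extended value held in ST(0), uses `None` to stand for an empty register, and assumes C1 = 0 for the empty case; the names `FXAM_CODES` and `fxam` are hypothetical.

```python
import math
import sys

# (C3, C2, C0) encodings taken from the FXAM table above.
FXAM_CODES = {
    "nan": 0b001, "normal": 0b010, "infinity": 0b011,
    "zero": 0b100, "empty": 0b101, "denormal": 0b110,
}


def fxam(value):
    """Return ((C3, C2, C0) as a 3-bit integer, C1) for a float or None."""
    if value is None:                       # empty register
        return FXAM_CODES["empty"], 0       # C1 = 0 is an assumption here
    c1 = 1 if math.copysign(1.0, value) < 0 else 0   # sign bit of the value
    if math.isnan(value):
        kind = "nan"
    elif math.isinf(value):
        kind = "infinity"
    elif value == 0.0:
        kind = "zero"
    elif abs(value) < sys.float_info.min:   # below the smallest normal double
        kind = "denormal"
    else:
        kind = "normal"
    return FXAM_CODES[kind], c1


for sample in (1.5, -0.0, float("-inf"), 5e-324, None):
    print(sample, fxam(sample))
```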
xmm1, xmm2/m128
    Description: Converts 8 packed signed word integers from xmm1 and 8 from xmm2/m128 into 16 packed signed byte integers in xmm1 using signed saturation.
    Symbolic operations:
        DST[7..0] ← SaturateSignedWordToSignedByte(DST[15..0]);
        DST[15..8] ← SaturateSignedWordToSignedByte(DST[31..16]);
        DST[23..16] ← SaturateSignedWordToSignedByte(DST[47..32]);
        DST[31..24] ← SaturateSignedWordToSignedByte(DST[63..48]);
        DST[39..32] ← SaturateSignedWordToSignedByte(DST[79..64]);
        DST[47..40] ← SaturateSignedWordToSignedByte(DST[95..80]);
        DST[55..48] ← SaturateSignedWordToSignedByte(DST[111..96]);
        DST[63..56] ← SaturateSignedWordToSignedByte(DST[127..112]);
        DST[71..64] ← SaturateSignedWordToSignedByte(SRC[15..0]);
        DST[79..72] ← SaturateSignedWordToSignedByte(SRC[31..16]);
        DST[87..80] ← SaturateSignedWordToSignedByte(SRC[47..32]);
        DST[95..88] ← SaturateSignedWordToSignedByte(SRC[63..48]);
        DST[103..96] ← SaturateSignedWordToSignedByte(SRC[79..64]);
        DST[111..104] ← SaturateSignedWordToSignedByte(SRC[95..80]);
        DST[119..112] ← SaturateSignedWordToSignedByte(SRC[111..96]);
        DST[127..120] ← SaturateSignedWordToSignedByte(SRC[127..112]);
Pack Signed Saturated Dwords to Words    PACKSSDW    mm1, mm2/m64
    Description: Converts 2 packed signed dword integers from mm1 and 2 from mm2/m64 into 4 packed signed word integers in mm1 using signed saturation.
    Symbolic operations:
        DST[15..0] ← SaturateSignedDwordToSignedWord(DST[31..0]);
        DST[31..16] ← SaturateSignedDwordToSignedWord(DST[63..32]);
        DST[47..32] ← SaturateSignedDwordToSignedWord(SRC[31..0]);
        DST[63..48] ← SaturateSignedDwordToSignedWord(SRC[63..32]);
PACKSSDW    xmm1, xmm2/m128
    Description: Converts 4 packed signed dword integers from xmm1 and 4 from xmm2/m128 into 8 packed signed word integers in xmm1 using signed saturation.
    Symbolic operations:
        DST[15..0] ← SaturateSignedDwordToSignedWord(DST[31..0]);
        DST[31..16] ← SaturateSignedDwordToSignedWord(DST[63..32]);
        DST[47..32] ← SaturateSignedDwordToSignedWord(DST[95..64]);
        DST[63..48] ← SaturateSignedDwordToSignedWord(DST[127..96]);
        DST[79..64] ← SaturateSignedDwordToSignedWord(SRC[31..0]);
        DST[95..80] ← SaturateSignedDwordToSignedWord(SRC[63..32]);
        DST[111..96] ← SaturateSignedDwordToSignedWord(SRC[95..64]);
        DST[127..112] ← SaturateSignedDwordToSignedWord(SRC[127..96]);
Pack Unsigned Saturated Words to Bytes    PACKUSWB    mm1, mm2/m64
    Description: Converts 4 packed signed word integers from mm1 and 4 from mm2/m64 into 8 packed unsigned byte integers in mm1 using unsigned saturation.
    Symbolic operations:
        DST[7..0] ← SaturateSignedWordToUnsignedByte(DST[15..0]);
        DST[15..8] ← SaturateSignedWordToUnsignedByte(DST[31..16]);
        DST[23..16] ← SaturateSignedWordToUnsignedByte(DST[47..32]);
        DST[31..24] ← SaturateSignedWordToUnsignedByte(DST[63..48]);
        DST[39..32] ← SaturateSignedWordToUnsignedByte(SRC[15..0]);
        DST[47..40] ← SaturateSignedWordToUnsignedByte(SRC[31..16]);
        DST[55..48] ← SaturateSignedWordToUnsignedByte(SRC[47..32]);
        DST[63..56] ← SaturateSignedWordToUnsignedByte(SRC[63..48]);
PACKUSWB    xmm1, xmm2/m128
    Description: Converts 8 packed signed word integers from xmm1 and 8 from xmm2/m128 into 16 packed unsigned byte integers in xmm1 using unsigned saturation.
    Symbolic operations:
        DST[7..0] ← SaturateSignedWordToUnsignedByte(DST[15..0]);
        DST[15..8] ← SaturateSignedWordToUnsignedByte(DST[31..16]);
        DST[23..16] ← SaturateSignedWordToUnsignedByte(DST[47..32]);
        DST[31..24] ← SaturateSignedWordToUnsignedByte(DST[63..48]);
        DST[39..32] ← SaturateSignedWordToUnsignedByte(DST[79..64]);
        DST[47..40] ← SaturateSignedWordToUnsignedByte(DST[95..80]);
        DST[55..48] ← SaturateSignedWordToUnsignedByte(DST[111..96]);
        DST[63..56] ← SaturateSignedWordToUnsignedByte(DST[127..112]);
        DST[71..64] ← SaturateSignedWordToUnsignedByte(SRC[15..0]);
        DST[79..72] ← SaturateSignedWordToUnsignedByte(SRC[31..16]);
        DST[87..80] ← SaturateSignedWordToUnsignedByte(SRC[47..32]);
        DST[95..88] ← SaturateSignedWordToUnsignedByte(SRC[63..48]);
        DST[103..96] ← SaturateSignedWordToUnsignedByte(SRC[79..64]);
        DST[111..104] ← SaturateSignedWordToUnsignedByte(SRC[95..80]);
        DST[119..112] ← SaturateSignedWordToUnsignedByte(SRC[111..96]);
        DST[127..120] ← SaturateSignedWordToUnsignedByte(SRC[127..112]);
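The pack-with-saturation operations above follow one pattern: clamp every word of DST, then every word of SRC, and concatenate the results with the DST elements in the low-order positions. A minimal sketch, assuming registers are modelled as Python lists of signed integers with element 0 as the lowest-order element (the function names are hypothetical):

```python
def saturate(value: int, lo: int, hi: int) -> int:
    """Clamp value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def pack_signed_words_to_bytes(dst_words, src_words):
    """Signed saturation: each word is clamped to -128..127."""
    return [saturate(w, -128, 127) for w in list(dst_words) + list(src_words)]


def pack_unsigned_words_to_bytes(dst_words, src_words):
    """Unsigned saturation: each signed word is clamped to 0..255."""
    return [saturate(w, 0, 255) for w in list(dst_words) + list(src_words)]


dst = [1, -200, 300, -1]
src = [127, 128, -128, -129]
print(pack_signed_words_to_bytes(dst, src))
print(pack_unsigned_words_to_bytes(dst, src))
```

The same two functions cover the 64-bit and 128-bit forms, since only the number of elements per operand changes (4 or 8 words).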
Unpack interleaving Low-order Bytes to Words    PUNPCKLBW    mm1, mm2/m64
    Description: Unpacks and interleaves 4 low-order bytes from mm1 and 4 low-order bytes from mm2/m64 into 4 words in mm1.
    Symbolic operations:
        DST[7..0] ← DST[7..0];
        DST[15..8] ← SRC[7..0];
        DST[23..16] ← DST[15..8];
        DST[31..24] ← SRC[15..8];
        DST[39..32] ← DST[23..16];
        DST[47..40] ← SRC[23..16];
        DST[55..48] ← DST[31..24];
        DST[63..56] ← SRC[31..24];
PUNPCKLBW    xmm1, xmm2/m128
    Description: Unpacks and interleaves 8 low-order bytes from xmm1 and 8 low-order bytes from xmm2/m128 into 8 words in xmm1.
    Symbolic operations:
        DST[7..0] ← DST[7..0];
        DST[15..8] ← SRC[7..0];
        DST[23..16] ← DST[15..8];
        DST[31..24] ← SRC[15..8];
        DST[39..32] ← DST[23..16];
        DST[47..40] ← SRC[23..16];
        DST[55..48] ← DST[31..24];
        DST[63..56] ← SRC[31..24];
        DST[71..64] ← DST[39..32];
        DST[79..72] ← SRC[39..32];
        DST[87..80] ← DST[47..40];
        DST[95..88] ← SRC[47..40];
        DST[103..96] ← DST[55..48];
        DST[111..104] ← SRC[55..48];
        DST[119..112] ← DST[63..56];
        DST[127..120] ← SRC[63..56];
Unpack interleaving Low-order Words to Dwords    PUNPCKLWD    mm1, mm2/m64
    Description: Unpacks and interleaves 2 low-order words from mm1 and 2 low-order words from mm2/m64 into 2 dwords in mm1.
    Symbolic operations:
        DST[15..0] ← DST[15..0];
        DST[31..16] ← SRC[15..0];
        DST[47..32] ← DST[31..16];
        DST[63..48] ← SRC[31..16];
PUNPCKLWD    xmm1, xmm2/m128
    Description: Unpacks and interleaves 4 low-order words from xmm1 and 4 low-order words from xmm2/m128 into 4 dwords in xmm1.
    Symbolic operations:
        DST[15..0] ← DST[15..0];
        DST[31..16] ← SRC[15..0];
        DST[47..32] ← DST[31..16];
        DST[63..48] ← SRC[31..16];
        DST[79..64] ← DST[47..32];
        DST[95..80] ← SRC[47..32];
        DST[111..96] ← DST[63..48];
        DST[127..112] ← SRC[63..48];
Unpack interleaving Low-order Dwords to Qwords    PUNPCKLDQ    xmm1, xmm2/m128
    Description: Unpacks and interleaves 2 low-order dwords from xmm1 and 2 low-order dwords from xmm2/m128 into 2 qwords in xmm1.
    Symbolic operations:
        DST[31..0] ← DST[31..0];
        DST[63..32] ← SRC[31..0];
        DST[95..64] ← DST[63..32];
        DST[127..96] ← SRC[63..32];
Unpack interleaving Low-order Qwords to Qwords    PUNPCKLQDQ    xmm1, xmm2/m128
    Description: Unpacks and interleaves the low-order qword from xmm1 and the low-order qword from xmm2/m128 into xmm1.
    Symbolic operations:
        DST[63..0] ← DST[63..0];
        DST[127..64] ← SRC[63..0];
Unpack interleaving High-order Bytes to Words    PUNPCKHBW    mm1, mm2/m64
    Description: Unpacks and interleaves 4 high-order bytes from mm1 and 4 high-order bytes from mm2/m64 into 4 words in mm1.
    Symbolic operations:
        DST[7..0] ← DST[39..32];
        DST[15..8] ← SRC[39..32];
        DST[23..16] ← DST[47..40];
        DST[31..24] ← SRC[47..40];
        DST[39..32] ← DST[55..48];
        DST[47..40] ← SRC[55..48];
        DST[55..48] ← DST[63..56];
        DST[63..56] ← SRC[63..56];
PUNPCKHBW    xmm1, xmm2/m128
    Description: Unpacks and interleaves 8 high-order bytes from xmm1 and 8 high-order bytes from xmm2/m128 into 8 words in xmm1.
    Symbolic operations:
        DST[7..0] ← DST[71..64];
        DST[15..8] ← SRC[71..64];
        DST[23..16] ← DST[79..72];
        DST[31..24] ← SRC[79..72];
        DST[39..32] ← DST[87..80];
        DST[47..40] ← SRC[87..80];
        DST[55..48] ← DST[95..88];
        DST[63..56] ← SRC[95..88];
        DST[71..64] ← DST[103..96];
        DST[79..72] ← SRC[103..96];
        DST[87..80] ← DST[111..104];
        DST[95..88] ← SRC[111..104];
        DST[103..96] ← DST[119..112];
        DST[111..104] ← SRC[119..112];
        DST[119..112] ← DST[127..120];
        DST[127..120] ← SRC[127..120];
Unpack interleaving High-order Words to Dwords    PUNPCKHWD    mm1, mm2/m64
    Description: Unpacks and interleaves 2 high-order words from mm1 and 2 high-order words from mm2/m64 into 2 dwords in mm1.
    Symbolic operations:
        DST[15..0] ← DST[47..32];
        DST[31..16] ← SRC[47..32];
        DST[47..32] ← DST[63..48];
        DST[63..48] ← SRC[63..48];
PUNPCKHWD    xmm1, xmm2/m128
    Description: Unpacks and interleaves 4 high-order words from xmm1 and 4 high-order words from xmm2/m128 into 4 dwords in xmm1.
    Symbolic operations:
        DST[15..0] ← DST[79..64];
        DST[31..16] ← SRC[79..64];
        DST[47..32] ← DST[95..80];
        DST[63..48] ← SRC[95..80];
        DST[79..64] ← DST[111..96];
        DST[95..80] ← SRC[111..96];
        DST[111..96] ← DST[127..112];
        DST[127..112] ← SRC[127..112];
Unpack interleaving High-order Dwords to Qwords    PUNPCKHDQ    xmm1, xmm2/m128
    Description: Unpacks and interleaves 2 high-order dwords from xmm1 and 2 high-order dwords from xmm2/m128 into 2 qwords in xmm1.
    Symbolic operations:
        DST[31..0] ← DST[95..64];
        DST[63..32] ← SRC[95..64];
        DST[95..64] ← DST[127..96];
        DST[127..96] ← SRC[127..96];
Unpack interleaving High-order Qwords to Qwords    PUNPCKHQDQ    xmm1, xmm2/m128
    Description: Unpacks and interleaves the high-order qword from xmm1 and the high-order qword from xmm2/m128 into xmm1.
    Symbolic operations:
        DST[63..0] ← DST[127..64];
        DST[127..64] ← SRC[127..64];
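The low-order and high-order unpack operations above differ only in which half of each operand they read. A minimal sketch, assuming each register is modelled as a Python list of equally sized elements with element 0 as the lowest-order element; the function names are hypothetical. All reads use the operand values from before the instruction, which is why the model builds a new list instead of updating `dst` in place.

```python
def unpack_low(dst, src):
    """Interleave the low-order halves: d0, s0, d1, s1, ..."""
    half = len(dst) // 2
    out = []
    for d, s in zip(dst[:half], src[:half]):
        out += [d, s]
    return out


def unpack_high(dst, src):
    """Interleave the high-order halves of dst and src."""
    half = len(dst) // 2
    out = []
    for d, s in zip(dst[half:], src[half:]):
        out += [d, s]
    return out


dst = [0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7]   # 8 bytes, mm1-like
src = [0xB0, 0xB1, 0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xB7]   # 8 bytes, mm2-like
print([hex(x) for x in unpack_low(dst, src)])
print([hex(x) for x in unpack_high(dst, src)])
```

Passing lists of 4 words, 2 dwords or 2 qwords instead of 8 bytes models the word, dword and qword variants with the same code.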
MMX/SSE Packed Arithmetic instructions
Instruction Mnemonic Operands Description Symbolic operations
Packed Add Bytes    PADDB    mm1, mm2/m64
    Description: Adds 8 packed byte integers from mm2/m64 to 8 packed byte integers in mm1.
    Symbolic operations:
        DST[7..0] ← DST[7..0] + SRC[7..0]
        DST[15..8] ← DST[15..8] + SRC[15..8]
        DST[23..16] ← DST[23..16] + SRC[23..16]
        DST[31..24] ← DST[31..24] + SRC[31..24]
        DST[39..32] ← DST[39..32] + SRC[39..32]
        DST[47..40] ← DST[47..40] + SRC[47..40]
        DST[55..48] ← DST[55..48] + SRC[55..48]
        DST[63..56] ← DST[63..56] + SRC[63..56]
PADDB    xmm1, xmm2/m128
    Description: Adds 16 packed byte integers from xmm2/m128 to 16 packed byte integers in xmm1.
    Symbolic operations:
        DST[7..0] ← DST[7..0] + SRC[7..0]
        DST[15..8] ← DST[15..8] + SRC[15..8]
        DST[23..16] ← DST[23..16] + SRC[23..16]
        DST[31..24] ← DST[31..24] + SRC[31..24]
        DST[39..32] ← DST[39..32] + SRC[39..32]
        DST[47..40] ← DST[47..40] + SRC[47..40]
        DST[55..48] ← DST[55..48] + SRC[55..48]
        DST[63..56] ← DST[63..56] + SRC[63..56]
        DST[71..64] ← DST[71..64] + SRC[71..64]
        DST[79..72] ← DST[79..72] + SRC[79..72]
        DST[87..80] ← DST[87..80] + SRC[87..80]
        DST[95..88] ← DST[95..88] + SRC[95..88]
        DST[103..96] ← DST[103..96] + SRC[103..96]
        DST[111..104] ← DST[111..104] + SRC[111..104]
        DST[119..112] ← DST[119..112] + SRC[119..112]
        DST[127..120] ← DST[127..120] + SRC[127..120]
Packed Add Words    PADDW    mm1, mm2/m64
    Description: Adds 4 packed word integers from mm2/m64 to 4 packed word integers in mm1.
    Symbolic operations:
        DST[15..0] ← DST[15..0] + SRC[15..0]
        DST[31..16] ← DST[31..16] + SRC[31..16]
        DST[47..32] ← DST[47..32] + SRC[47..32]
        DST[63..48] ← DST[63..48] + SRC[63..48]
PADDW    xmm1, xmm2/m128
    Description: Adds 8 packed word integers from xmm2/m128 to 8 packed word integers in xmm1.
    Symbolic operations:
        DST[15..0] ← DST[15..0] + SRC[15..0]
        DST[31..16] ← DST[31..16] + SRC[31..16]
        DST[47..32] ← DST[47..32] + SRC[47..32]
        DST[63..48] ← DST[63..48] + SRC[63..48]
        DST[79..64] ← DST[79..64] + SRC[79..64]
        DST[95..80] ← DST[95..80] + SRC[95..80]
        DST[111..96] ← DST[111..96] + SRC[111..96]
        DST[127..112] ← DST[127..112] + SRC[127..112]
Packed Add Dwords    PADDD    mm1, mm2/m64
    Description: Adds 2 packed double-word integers from mm2/m64 to 2 packed double-word integers in mm1.
    Symbolic operations:
        DST[31..0] ← DST[31..0] + SRC[31..0]
        DST[63..32] ← DST[63..32] + SRC[63..32]
PADDD    xmm1, xmm2/m128
    Description: Adds 4 packed double-word integers from xmm2/m128 to 4 packed double-word integers in xmm1.
    Symbolic operations:
        DST[31..0] ← DST[31..0] + SRC[31..0]
        DST[63..32] ← DST[63..32] + SRC[63..32]
        DST[95..64] ← DST[95..64] + SRC[95..64]
        DST[127..96] ← DST[127..96] + SRC[127..96]
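The packed additions above treat every element independently: a carry out of one element never reaches its neighbour, so each sum wraps around within its own width. A minimal sketch of that behaviour, assuming registers are modelled as Python lists of unsigned element values with element 0 as the lowest-order element (the name `packed_add` is hypothetical):

```python
def packed_add(dst, src, width: int):
    """Element-wise wraparound addition for elements of `width` bits."""
    mask = (1 << width) - 1
    return [(d + s) & mask for d, s in zip(dst, src)]


# Byte elements (PADDB-like): 0xFF + 0x01 wraps to 0x00 without touching
# the next element.
print(packed_add([0xFF, 0x01], [0x01, 0x02], width=8))

# Word elements (PADDW-like).
print(packed_add([0xFFFF, 0x0010], [0x0002, 0x0020], width=16))
```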
Packed Add Bytes with Saturation    PADDSB    mm1, mm2/m64
    Description: Adds 8 packed byte integers from mm2/m64 to 8 packed byte integers in mm1. Overflow is handled with signed saturation.
    Symbolic operations:
        DST[7..0] ← SaturateToSignedByte (DST[7..0] + SRC[7..0])
        DST[15..8] ← SaturateToSignedByte (DST[15..8] + SRC[15..8])
        DST[23..16] ← SaturateToSignedByte (DST[23..16] + SRC[23..16])
        DST[31..24] ← SaturateToSignedByte (DST[31..24] + SRC[31..24])
        DST[39..32] ← SaturateToSignedByte (DST[39..32] + SRC[39..32])
        DST[47..40] ← SaturateToSignedByte (DST[47..40] + SRC[47..40])
        DST[55..48] ← SaturateToSignedByte (DST[55..48] + SRC[55..48])
        DST[63..56] ← SaturateToSignedByte (DST[63..56] + SRC[63..56])
PADDSB    xmm1, xmm2/m128
    Description: Adds 16 packed byte integers from xmm2/m128 to 16 packed byte integers in xmm1. Overflow is handled with signed saturation.
    Symbolic operations:
        DST[7..0] ← SaturateToSignedByte (DST[7..0] + SRC[7..0])
        DST[15..8] ← SaturateToSignedByte (DST[15..8] + SRC[15..8])
        DST[23..16] ← SaturateToSignedByte (DST[23..16] + SRC[23..16])
        DST[31..24] ← SaturateToSignedByte (DST[31..24] + SRC[31..24])
        DST[39..32] ← SaturateToSignedByte (DST[39..32] + SRC[39..32])
        DST[47..40] ← SaturateToSignedByte (DST[47..40] + SRC[47..40])
        DST[55..48] ← SaturateToSignedByte (DST[55..48] + SRC[55..48])
        DST[63..56] ← SaturateToSignedByte (DST[63..56] + SRC[63..56])
        DST[71..64] ← SaturateToSignedByte (DST[71..64] + SRC[71..64])
        DST[79..72] ← SaturateToSignedByte (DST[79..72] + SRC[79..72])
        DST[87..80] ← SaturateToSignedByte (DST[87..80] + SRC[87..80])
        DST[95..88] ← SaturateToSignedByte (DST[95..88] + SRC[95..88])
        DST[103..96] ← SaturateToSignedByte (DST[103..96] + SRC[103..96])
        DST[111..104] ← SaturateToSignedByte (DST[111..104] + SRC[111..104])
        DST[119..112] ← SaturateToSignedByte (DST[119..112] + SRC[119..112])
        DST[127..120] ← SaturateToSignedByte (DST[127..120] + SRC[127..120])
Packed Add Words with PADDSW mm1, mm2/m64 Add 4 packed word integers from mm2/m64 to 4 DST[15..0] ← SaturateToSignedWord (DST[15..0] + SRC[15..0])
Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] + SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] + SRC[47..32])
63 DST[63..48] ← SaturateToSignedWord (DST[63..48] + SRC[63..48])
xmm1, xmm2/m128 Add 8 packed word integers from xmm2/m128 to 8 DST[15..0] ← SaturateToSignedWord (DST[15..0] + SRC[15..0])
packed word integers in xmm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] + SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] + SRC[47..32])
DST[63..48] ← SaturateToSignedWord (DST[63..48] + SRC[63..48])
DST[79..64] ← SaturateToSignedWord (DST[79..64] + SRC[79..64])
SRC DST[95..80] ← SaturateToSignedWord (DST[95..80] + SRC[95..80])
DST[111..96] ← SaturateToSignedWord (DST[111..96] + SRC[111..96])
DST[127..112] ← SaturateToSignedWord (DST[127..112] + SRC[127..112])
Packed Add Bytes with PADDUSB mm1, mm2/m64 Add 8 packed byte integers from mm2/m64 to 8 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] + SRC[7..0])
Unsigned Saturation packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] + SRC[15..8])
with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] + SRC[23..16])
63 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] + SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] + SRC[39..32])
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] + SRC[55..48])
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] + SRC[63..56])
xmm1, xmm2/m128 Add 16 packed byte integers from xmm2/m128 to 16 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] + SRC[7..0])
packed byte integers in xmm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] + SRC[15..8])
with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] + SRC[23..16])
127 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] + SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] + SRC[39..32])
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] + SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] + SRC[55..48])
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] + SRC[63..56])
DST[71..64] ← SaturateToUnsignedByte (DST[71..64] + SRC[71..64])
DST[79..72] ← SaturateToUnsignedByte (DST[79..72] + SRC[79..72])
DST[87..80] ← SaturateToUnsignedByte (DST[87..80] + SRC[87..80])
DST[95..88] ← SaturateToUnsignedByte (DST[95..88] + SRC[95..88])
DST[103..96] ← SaturateToUnsignedByte (DST[103..96] + SRC[103..96])
DST[111..104] ← SaturateToUnsignedByte (DST[111..104] + SRC[111..104])
DST[119..112] ← SaturateToUnsignedByte (DST[119..112] + SRC[119..112])
DST[127..120] ← SaturateToUnsignedByte (DST[127..120] + SRC[127..120])
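Unsigned saturation differs from the signed case only in the clamp range, 0 to 255. A minimal C sketch under the same assumptions as before (hypothetical names saturate_to_unsigned_byte and paddusb_model, scalar model only):

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the unsigned byte range [0, 255]. */
static uint8_t saturate_to_unsigned_byte(int value)
{
    if (value > UINT8_MAX) return UINT8_MAX;
    if (value < 0) return 0;
    return (uint8_t)value;
}

/* Lane-wise model: dst[i] <- SaturateToUnsignedByte(dst[i] + src[i]). */
static void paddusb_model(uint8_t *dst, const uint8_t *src, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = saturate_to_unsigned_byte((int)dst[i] + (int)src[i]);
}
```

A typical use of this behaviour is brightening pixel data, where a sum that would wrap past 255 should stay white rather than turn dark.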
Packed Add Words with PADDUSW mm1, mm2/m64 Add 4 packed word integers from mm2/m64 to 4 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] + SRC[15..0])
Unsigned Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] + SRC[31..16])
with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] + SRC[47..32])
63 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] + SRC[63..48])
xmm1, xmm2/m128 Add 8 packed word integers from xmm2/m128 to 8 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] + SRC[15..0])
packed word integers in xmm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] + SRC[31..16])
with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] + SRC[47..32])
DST[63..48] ← SaturateToUnsignedWord (DST[63..48] + SRC[63..48])
DST[79..64] ← SaturateToUnsignedWord (DST[79..64] + SRC[79..64])
SRC DST[95..80] ← SaturateToUnsignedWord (DST[95..80] + SRC[95..80])
DST[111..96] ← SaturateToUnsignedWord (DST[111..96] + SRC[111..96])
DST[127..112] ← SaturateToUnsignedWord (DST[127..112] + SRC[127..112])
Packed Subtract Bytes PSUBB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← DST[7..0] - SRC[7..0]
packed byte integers in mm1. DST[15..8] ← DST[15..8] - SRC[15..8]
63 DST[23..16] ← DST[23..16] - SRC[23..16]
DST[31..24] ← DST[31..24] - SRC[31..24]
SRC DST[39..32] ← DST[39..32] - SRC[39..32]
DST[47..40] ← DST[47..40] - SRC[47..40]
DST[55..48] ← DST[55..48] - SRC[55..48]
DST[63..56] ← DST[63..56] - SRC[63..56]
xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← DST[7..0] - SRC[7..0]
16 packed byte integers in xmm1. DST[15..8] ← DST[15..8] - SRC[15..8]
127 DST[23..16] ← DST[23..16] - SRC[23..16]
DST[31..24] ← DST[31..24] - SRC[31..24]
SRC DST[39..32] ← DST[39..32] - SRC[39..32]
DST[47..40] ← DST[47..40] - SRC[47..40]
DST[55..48] ← DST[55..48] - SRC[55..48]
DST[63..56] ← DST[63..56] - SRC[63..56]
DST[71..64] ← DST[71..64] - SRC[71..64]
DST[79..72] ← DST[79..72] - SRC[79..72]
DST[87..80] ← DST[87..80] - SRC[87..80]
DST[95..88] ← DST[95..88] - SRC[95..88]
DST[103..96] ← DST[103..96] - SRC[103..96]
DST[111..104] ← DST[111..104] - SRC[111..104]
DST[119..112] ← DST[119..112] - SRC[119..112]
DST[127..120] ← DST[127..120] - SRC[127..120]
Packed Subtract Words PSUBW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← DST[15..0] - SRC[15..0]
packed word integers in mm1. DST[31..16] ← DST[31..16] - SRC[31..16]
63 DST[47..32] ← DST[47..32] - SRC[47..32]
DST[63..48] ← DST[63..48] - SRC[63..48]
xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← DST[15..0] - SRC[15..0]
8 packed word integers in xmm1. DST[31..16] ← DST[31..16] - SRC[31..16]
127 DST[47..32] ← DST[47..32] - SRC[47..32]
DST[63..48] ← DST[63..48] - SRC[63..48]
SRC DST[79..64] ← DST[79..64] - SRC[79..64]
DST[95..80] ← DST[95..80] - SRC[95..80]
DST[111..96] ← DST[111..96] - SRC[111..96]
DST[127..112] ← DST[127..112] - SRC[127..112]
Packed Subtract Dwords PSUBD mm1, mm2/m64 Subtract 2 packed double-word integers in DST[31..0] ← DST[31..0] - SRC[31..0]
mm2/m64 from 2 packed double-word integers in mm1. DST[63..32] ← DST[63..32] - SRC[63..32]
xmm1, xmm2/m128 Subtract 4 packed double-word integers in DST[31..0] ← DST[31..0] - SRC[31..0]
xmm2/m128 from 4 packed double-word integers in DST[63..32] ← DST[63..32] - SRC[63..32]
xmm1. DST[95..64] ← DST[95..64] - SRC[95..64]
DST[127..96] ← DST[127..96] - SRC[127..96]
Packed Subtract Bytes PSUBSB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← SaturateToSignedByte (DST[7..0] - SRC[7..0])
with Saturation packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToSignedByte (DST[15..8] - SRC[15..8])
with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] - SRC[23..16])
63 DST[31..24] ← SaturateToSignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToSignedByte (DST[39..32] - SRC[39..32])
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] - SRC[55..48])
DST[63..56] ← SaturateToSignedByte (DST[63..56] - SRC[63..56])
xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← SaturateToSignedByte (DST[7..0] - SRC[7..0])
16 packed byte integers in xmm1. Overflow is DST[15..8] ← SaturateToSignedByte (DST[15..8] - SRC[15..8])
handled with signed saturation. DST[23..16] ← SaturateToSignedByte (DST[23..16] - SRC[23..16])
127 DST[31..24] ← SaturateToSignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToSignedByte (DST[39..32] - SRC[39..32])
SRC DST[47..40] ← SaturateToSignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToSignedByte (DST[55..48] - SRC[55..48])
DST[63..56] ← SaturateToSignedByte (DST[63..56] - SRC[63..56])
DST[71..64] ← SaturateToSignedByte (DST[71..64] - SRC[71..64])
DST[79..72] ← SaturateToSignedByte (DST[79..72] - SRC[79..72])
DST[87..80] ← SaturateToSignedByte (DST[87..80] - SRC[87..80])
DST[95..88] ← SaturateToSignedByte (DST[95..88] - SRC[95..88])
DST[103..96] ← SaturateToSignedByte (DST[103..96] - SRC[103..96])
DST[111..104] ← SaturateToSignedByte (DST[111..104] - SRC[111..104])
DST[119..112] ← SaturateToSignedByte (DST[119..112] - SRC[119..112])
DST[127..120] ← SaturateToSignedByte (DST[127..120] - SRC[127..120])
Packed Subtract Words PSUBSW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← SaturateToSignedWord (DST[15..0] - SRC[15..0])
with Saturation packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToSignedWord (DST[31..16] - SRC[31..16])
with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] - SRC[47..32])
63 DST[63..48] ← SaturateToSignedWord (DST[63..48] - SRC[63..48])
xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← SaturateToSignedWord (DST[15..0] - SRC[15..0])
8 packed word integers in xmm1. Overflow is DST[31..16] ← SaturateToSignedWord (DST[31..16] - SRC[31..16])
handled with signed saturation. DST[47..32] ← SaturateToSignedWord (DST[47..32] - SRC[47..32])
DST[63..48] ← SaturateToSignedWord (DST[63..48] - SRC[63..48])
DST[79..64] ← SaturateToSignedWord (DST[79..64] - SRC[79..64])
SRC DST[95..80] ← SaturateToSignedWord (DST[95..80] - SRC[95..80])
DST[111..96] ← SaturateToSignedWord (DST[111..96] - SRC[111..96])
DST[127..112] ← SaturateToSignedWord (DST[127..112] - SRC[127..112])
Packed Subtract Bytes PSUBUSB mm1, mm2/m64 Subtract 8 packed byte integers in mm2/m64 from 8 DST[7..0] ← SaturateToUnsignedByte (DST[7..0] - SRC[7..0])
with Unsigned packed byte integers in mm1. Overflow is handled DST[15..8] ← SaturateToUnsignedByte (DST[15..8] - SRC[15..8])
Saturation with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] - SRC[23..16])
63 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] - SRC[39..32])
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] - SRC[55..48])
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] - SRC[63..56])
xmm1, xmm2/m128 Subtract 16 packed byte integers in xmm2/m128 from DST[7..0] ← SaturateToUnsignedByte (DST[7..0] - SRC[7..0])
16 packed byte integers in xmm1. Overflow is DST[15..8] ← SaturateToUnsignedByte (DST[15..8] - SRC[15..8])
handled with unsigned saturation. DST[23..16] ← SaturateToUnsignedByte (DST[23..16] - SRC[23..16])
127 DST[31..24] ← SaturateToUnsignedByte (DST[31..24] - SRC[31..24])
DST[39..32] ← SaturateToUnsignedByte (DST[39..32] - SRC[39..32])
SRC DST[47..40] ← SaturateToUnsignedByte (DST[47..40] - SRC[47..40])
DST[55..48] ← SaturateToUnsignedByte (DST[55..48] - SRC[55..48])
DST[63..56] ← SaturateToUnsignedByte (DST[63..56] - SRC[63..56])
DST[71..64] ← SaturateToUnsignedByte (DST[71..64] - SRC[71..64])
DST[79..72] ← SaturateToUnsignedByte (DST[79..72] - SRC[79..72])
DST[87..80] ← SaturateToUnsignedByte (DST[87..80] - SRC[87..80])
DST[95..88] ← SaturateToUnsignedByte (DST[95..88] - SRC[95..88])
DST[103..96] ← SaturateToUnsignedByte (DST[103..96] - SRC[103..96])
DST[111..104] ← SaturateToUnsignedByte (DST[111..104] - SRC[111..104])
DST[119..112] ← SaturateToUnsignedByte (DST[119..112] - SRC[119..112])
DST[127..120] ← SaturateToUnsignedByte (DST[127..120] - SRC[127..120])
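For unsigned subtraction the only way to leave the byte range is to go below zero, so saturation reduces to clamping at 0. A hedged C sketch of one way to model it (psubusb_model is a hypothetical name; lanes is assumed to be 8 or 16):

```c
#include <stdint.h>

/* Lane-wise model: dst[i] <- SaturateToUnsignedByte(dst[i] - src[i]).
 * A difference that would be negative becomes 0. */
static void psubusb_model(uint8_t *dst, const uint8_t *src, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = (dst[i] > src[i]) ? (uint8_t)(dst[i] - src[i]) : 0;
}
```

Comparing before subtracting avoids any reliance on wraparound: the subtraction is performed only when the result is known to be non-negative.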
Packed Subtract Words PSUBUSW mm1, mm2/m64 Subtract 4 packed word integers in mm2/m64 from 4 DST[15..0] ← SaturateToUnsignedWord (DST[15..0] - SRC[15..0])
with Unsigned packed word integers in mm1. Overflow is handled DST[31..16] ← SaturateToUnsignedWord (DST[31..16] - SRC[31..16])
Saturation with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] - SRC[47..32])
63 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] - SRC[63..48])
xmm1, xmm2/m128 Subtract 8 packed word integers in xmm2/m128 from DST[15..0] ← SaturateToUnsignedWord (DST[15..0] - SRC[15..0])
8 packed word integers in xmm1. Overflow is DST[31..16] ← SaturateToUnsignedWord (DST[31..16] - SRC[31..16])
handled with unsigned saturation. DST[47..32] ← SaturateToUnsignedWord (DST[47..32] - SRC[47..32])
127 DST[63..48] ← SaturateToUnsignedWord (DST[63..48] - SRC[63..48])
DST[79..64] ← SaturateToUnsignedWord (DST[79..64] - SRC[79..64])
SRC DST[95..80] ← SaturateToUnsignedWord (DST[95..80] - SRC[95..80])
DST[111..96] ← SaturateToUnsignedWord (DST[111..96] - SRC[111..96])
DST[127..112] ← SaturateToUnsignedWord (DST[127..112] - SRC[127..112])
Packed Multiply, Low PMULLW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 TMP0[31..0] ← DST[15..0] * SRC[15..0]
Word packed word integers from mm1. Store low-order TMP1[31..0] ← DST[31..16] * SRC[31..16]
result words in mm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
63 TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[15..0]
SRC DST[31..16] ← TMP1[15..0]
DST[47..32] ← TMP2[15..0]
DST[63..48] ← TMP3[15..0]
xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 TMP0[31..0] ← DST[15..0] * SRC[15..0]
by 8 packed word integers from xmm1. Store low- TMP1[31..0] ← DST[31..16] * SRC[31..16]
order result words in xmm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
127 TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
SRC TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
TMP4 TMP0 DST[15..0] ← TMP0[15..0]
31 31 DST[31..16] ← TMP1[15..0]
TMP5 TMP1 DST[47..32] ← TMP2[15..0]
31 31 DST[63..48] ← TMP3[15..0]
DST[79..64] ← TMP4[15..0]
31 31 DST[95..80] ← TMP5[15..0]
DST[111..96] ← TMP6[15..0]
DST[127..112] ← TMP7[15..0]
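The TMPn step above (a full 32-bit product, of which only bits 15..0 are kept) can be sketched in C like this. pmullw_model is a hypothetical name and the function is a scalar illustration only; lanes is assumed to be 4 or 8.

```c
#include <stdint.h>

/* Lane-wise model: TMP <- dst[i] * src[i] (32 bits), dst[i] <- TMP[15..0]. */
static void pmullw_model(uint16_t *dst, const uint16_t *src, int lanes)
{
    for (int i = 0; i < lanes; ++i) {
        uint32_t tmp = (uint32_t)dst[i] * (uint32_t)src[i];
        dst[i] = (uint16_t)(tmp & 0xFFFFu);   /* keep the low-order word */
    }
}
```

The low 16 bits of a product are the same whether the operands are read as signed or unsigned, which is why the sketch can use unsigned arithmetic throughout.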
Packed Multiply, High PMULHW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 TMP0[31..0] ← DST[15..0] * SRC[15..0]
Word packed word integers from mm1. Store high-order TMP1[31..0] ← DST[31..16] * SRC[31..16]
result words in mm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
63 TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[31..16]
SRC DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 TMP0[31..0] ← DST[15..0] * SRC[15..0]
by 8 packed word integers from xmm1. Store high- TMP1[31..0] ← DST[31..16] * SRC[31..16]
order result words in xmm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
127 TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
SRC TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
TMP7 TMP3 DST[15..0] ← TMP0[31..16]
31 31 DST[31..16] ← TMP1[31..16]
TMP6 TMP2 DST[47..32] ← TMP2[31..16]
31 31 DST[63..48] ← TMP3[31..16]
DST[79..64] ← TMP4[31..16]
31 31 DST[95..80] ← TMP5[31..16]
DST[111..96] ← TMP6[31..16]
DST[127..112] ← TMP7[31..16]
Packed Multiply PMULHUW mm1, mm2/m64 Multiply 4 packed word integers from mm2/m64 by 4 TMP0[31..0] ← DST[15..0] * SRC[15..0] // unsigned multiplication
Unsigned, High Word packed word integers from mm1. Treat integers as TMP1[31..0] ← DST[31..16] * SRC[31..16]
unsigned. Store high-order result words in mm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
63 TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[15..0] ← TMP0[31..16]
SRC DST[31..16] ← TMP1[31..16]
DST[47..32] ← TMP2[31..16]
DST[63..48] ← TMP3[31..16]
xmm1, xmm2/m128 Multiply 8 packed word integers from xmm2/m128 TMP0[31..0] ← DST[15..0] * SRC[15..0] // unsigned multiplication
by 8 packed word integers from xmm1. Treat integers TMP1[31..0] ← DST[31..16] * SRC[31..16]
as unsigned. Store high-order result words in xmm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
127 TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
SRC TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
TMP7 TMP3 DST[15..0] ← TMP0[31..16]
31 31 DST[31..16] ← TMP1[31..16]
TMP6 TMP2 DST[47..32] ← TMP2[31..16]
31 31 DST[63..48] ← TMP3[31..16]
DST[79..64] ← TMP4[31..16]
31 31 DST[95..80] ← TMP5[31..16]
DST[111..96] ← TMP6[31..16]
DST[127..112] ← TMP7[31..16]
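The signed and unsigned high-word multiplies share the same symbolic form and differ only in how the 16-bit operands are interpreted before multiplying. The following C sketch models one lane of each; pmulhw_lane and pmulhuw_lane are hypothetical names, and results are returned as raw 16-bit patterns to sidestep signed conversion questions.

```c
#include <stdint.h>

/* Signed variant: bits 31..16 of the signed 32-bit product. */
static uint16_t pmulhw_lane(int16_t a, int16_t b)
{
    int32_t tmp = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)tmp >> 16);
}

/* Unsigned variant: bits 31..16 of the unsigned 32-bit product. */
static uint16_t pmulhuw_lane(uint16_t a, uint16_t b)
{
    uint32_t tmp = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(tmp >> 16);
}
```

For example, the bit pattern 0FFFFH times 1 yields a high word of 0FFFFH when read as the signed value -1, but 0000H when read as the unsigned value 65535.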
Packed Multiply Add PMADDWD mm1, mm2/m64 Multiply 4 packed words in mm1 by 4 packed words TMP0[31..0] ← DST[15..0] * SRC[15..0]
Word in mm2/m64, add adjacent double word results and TMP1[31..0] ← DST[31..16] * SRC[31..16]
store in mm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
63 TMP3[31..0] ← DST[63..48] * SRC[63..48]
DST[31..0] ← TMP0[31..0] + TMP1[31..0]
SRC DST[63..32] ← TMP2[31..0] + TMP3[31..0]
xmm1, xmm2/m128 Multiply 8 packed words in xmm1 by 8 packed words TMP0[31..0] ← DST[15..0] * SRC[15..0]
in xmm2/m128, add adjacent double word results and TMP1[31..0] ← DST[31..16] * SRC[31..16]
store in xmm1. TMP2[31..0] ← DST[47..32] * SRC[47..32]
127 TMP3[31..0] ← DST[63..48] * SRC[63..48]
TMP4[31..0] ← DST[79..64] * SRC[79..64]
SRC TMP5[31..0] ← DST[95..80] * SRC[95..80]
TMP6[31..0] ← DST[111..96] * SRC[111..96]
TMP7[31..0] ← DST[127..112] * SRC[127..112]
TMP0 DST[31..0] ← TMP0[31..0] + TMP1[31..0]
DST[63..32] ← TMP2[31..0] + TMP3[31..0]
TMP5 TMP1 DST[95..64] ← TMP4[31..0] + TMP5[31..0]
31 31 DST[127..96] ← TMP6[31..0] + TMP7[31..0]
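The multiply-add pattern above (word products, then sums of adjacent pairs) is the building block of a dot product. A minimal C sketch follows; it assumes the words are signed, as in Intel's definition of PMADDWD, writes to a separate output array for clarity (the instruction itself overwrites the destination register), and uses the hypothetical name pmaddwd_model. pairs is assumed to be 2 for the mm form or 4 for the xmm form.

```c
#include <stdint.h>

/* out[i] <- dst[2i]*src[2i] + dst[2i+1]*src[2i+1], each product 32 bits wide. */
static void pmaddwd_model(int32_t *out, const int16_t *dst,
                          const int16_t *src, int pairs)
{
    for (int i = 0; i < pairs; ++i) {
        int32_t lo = (int32_t)dst[2 * i]     * (int32_t)src[2 * i];
        int32_t hi = (int32_t)dst[2 * i + 1] * (int32_t)src[2 * i + 1];
        /* Add as unsigned so the single overflowing case (all four words
         * equal to 8000H) wraps instead of being undefined behaviour. The
         * conversion back to int32_t is implementation-defined in that case. */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```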
MMX/SSE Comparison instructions
Instruction Mnemonic Operands Description Symbolic operations
Parallel Compare Bytes PCMPEQB mm1, mm2/m64 Compare 8 packed bytes in mm1 and mm2/m64 for equality. IF DST[7..0] = SRC[7..0] THEN DST[7..0] ← 0FFH
for Equality If a pair of data element is equal, then the corresponding data ELSE DST [7..0] ← 00H
element in the destination operand is set to all 1s; otherwise, it IF DST[15..8] = SRC[15..8] THEN DST[15..8] ← 0FFH
is set to all 0s. No flags in the EFLAGS registers are affected. ELSE DST [15..8] ← 00H
IF DST[23..16] = SRC[23..16] THEN DST[23..16] ← 0FFH
ELSE DST [23..16] ← 00H
IF DST[31..24] = SRC[31..24] THEN DST[31..24] ← 0FFH
ELSE DST [31..24] ← 00H
IF DST[39..32] = SRC[39..32] THEN DST[39..32] ← 0FFH
DST ELSE DST [39..32] ← 00H
IF DST[47..40] = SRC[47..40] THEN DST[47..40] ← 0FFH
ELSE DST [47..40] ← 00H
IF DST[55..48] = SRC[55..48] THEN DST[55..48] ← 0FFH
ELSE DST [55..48] ← 00H
IF DST[63..56] = SRC[63..56] THEN DST[63..56] ← 0FFH
ELSE DST [63..56] ← 00H
xmm1, xmm2/m128 Compare 16 packed bytes in xmm1 and xmm2/m128 for IF DST[7..0] = SRC[7..0] THEN DST[7..0] ← 0FFH
equality. If a pair of data element is equal, then the ELSE DST [7..0] ← 00H
corresponding data element in the destination operand is set to IF DST[15..8] = SRC[15..8] THEN DST[15..8] ← 0FFH
all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS ELSE DST [15..8] ← 00H
registers are affected.
IF DST[23..16] = SRC[23..16] THEN DST[23..16] ← 0FFH
DST ELSE DST [23..16] ← 00H
IF DST[31..24] = SRC[31..24] THEN DST[31..24] ← 0FFH
SRC ELSE DST [31..24] ← 00H
IF DST[39..32] = SRC[39..32] THEN DST[39..32] ← 0FFH
ELSE DST [39..32] ← 00H
DST IF DST[47..40] = SRC[47..40] THEN DST[47..40] ← 0FFH
ELSE DST [47..40] ← 00H
IF DST[55..48] = SRC[55..48] THEN DST[55..48] ← 0FFH
ELSE DST [55..48] ← 00H
IF DST[63..56] = SRC[63..56] THEN DST[63..56] ← 0FFH
ELSE DST [63..56] ← 00H
IF DST[71..64] = SRC[71..64] THEN DST[71..64] ← 0FFH
ELSE DST [71..64] ← 00H
IF DST[79..72] = SRC[79..72] THEN DST[79..72] ← 0FFH
ELSE DST [79..72] ← 00H
IF DST[87..80] = SRC[87..80] THEN DST[87..80] ← 0FFH
ELSE DST [87..80] ← 00H
IF DST[95..88] = SRC[95..88] THEN DST[95..88] ← 0FFH
ELSE DST [95..88] ← 00H
IF DST[103..96] = SRC[103..96] THEN DST[103..96] ← 0FFH
ELSE DST [103..96] ← 00H
IF DST[111..104] = SRC[111..104] THEN DST[111..104] ← 0FFH
ELSE DST [111..104] ← 00H
IF DST[119..112] = SRC[119..112] THEN DST[119..112] ← 0FFH
ELSE DST [119..112] ← 00H
IF DST[127..120] = SRC[127..120] THEN DST[127..120] ← 0FFH
ELSE DST [127..120] ← 00H
Parallel Compare Words PCMPEQW mm1, mm2/m64 Compare 4 packed words in mm1 and mm2/m64 for equality. IF DST[15..0] = SRC[15..0] THEN DST[15..0] ← 0FFFFH
for Equality If a pair of data element is equal, then the corresponding data ELSE DST [15..0] ← 00H
element in the destination operand is set to all 1s; otherwise, it IF DST[31..16] = SRC[31..16] THEN DST[31..16] ← 0FFFFH
is set to all 0s. No flags in the EFLAGS registers are affected. ELSE DST [31..16] ← 00H
IF DST[47..32] = SRC[47..32] THEN DST[47..32] ← 0FFFFH
ELSE DST [47..32] ← 00H
IF DST[63..48] = SRC[63..48] THEN DST[63..48] ← 0FFFFH
ELSE DST [63..48] ← 00H
xmm1, xmm2/m128 Compare 8 packed words in xmm1 and xmm2/m128 for equality. IF DST[15..0] = SRC[15..0] THEN DST[15..0] ← 0FFFFH
If a pair of data element is equal, then the corresponding data ELSE DST [15..0] ← 00H
element in the destination operand is set to all 1s; otherwise, it IF DST[31..16] = SRC[31..16] THEN DST[31..16] ← 0FFFFH
is set to all 0s. No flags in the EFLAGS registers are affected. ELSE DST [31..16] ← 00H
IF DST[47..32] = SRC[47..32] THEN DST[47..32] ← 0FFFFH
ELSE DST [47..32] ← 00H
IF DST[63..48] = SRC[63..48] THEN DST[63..48] ← 0FFFFH
ELSE DST [63..48] ← 00H
IF DST[79..64] = SRC[79..64] THEN DST[79..64] ← 0FFFFH
DST ELSE DST [79..64] ← 00H
IF DST[95..80] = SRC[95..80] THEN DST[95..80] ← 0FFFFH
ELSE DST [95..80] ← 00H
IF DST[111..96] = SRC[111..96] THEN DST[111..96] ← 0FFFFH
ELSE DST [111..96] ← 00H
IF DST[127..112] = SRC[127..112] THEN DST[127..112] ← 0FFFFH
ELSE DST [127..112] ← 00H
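Because an equal pair produces an element of all 1s (0FFFFH for a word) and an unequal pair all 0s, the result can be used directly as a bit mask. The C sketch below models the compare and shows one common follow-up, a branch-free select; pcmpeqw_model and select_word are hypothetical names, and the select step is an illustration of typical usage rather than part of the instruction.

```c
#include <stdint.h>

/* Lane-wise model: dst[i] <- all 1s if dst[i] == src[i], else all 0s. */
static void pcmpeqw_model(uint16_t *dst, const uint16_t *src, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = (dst[i] == src[i]) ? 0xFFFFu : 0x0000u;
}

/* Pick if_equal where the mask is all 1s, otherwise where it is all 0s. */
static uint16_t select_word(uint16_t mask, uint16_t if_equal, uint16_t otherwise)
{
    return (uint16_t)((mask & if_equal) | (~mask & otherwise));
}
```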
Parallel Compare PCMPEQD mm1, mm2/m64 Compare 2 packed double words in mm1 and mm2/m64 for IF DST[31..0] = SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH
Dwords for Equality equality. If a pair of data element is equal, then the ELSE DST [31..0] ← 00H
corresponding data element in the destination operand is set to IF DST[63..32] = SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH
all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS ELSE DST [63..32] ← 00H
registers are affected.
xmm1, xmm2/m128 Compare 4 packed double words in xmm1 and xmm2/m128 for IF DST[31..0] = SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH
equality. If a pair of data element is equal, then the ELSE DST [31..0] ← 00H
corresponding data element in the destination operand is set to IF DST[63..32] = SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH
all 1s; otherwise, it is set to all 0s. No flags in the EFLAGS ELSE DST [63..32] ← 00H
registers are affected. IF DST[95..64] = SRC[95..64] THEN DST[95..64] ← 0FFFFFFFFH
DST ELSE DST [95..64] ← 00H
IF DST[127..96] = SRC[127..96] THEN DST[127..96] ← 0FFFFFFFFH
SRC ELSE DST [127..96] ← 00H
Parallel Compare Bytes PCMPGTB mm1, mm2/m64 Compare 8 packed bytes in mm1 and mm2/m64 for greater. IF DST[7..0] > SRC[7..0] THEN DST[7..0] ← 0FFH
for Greater Bytes are treated as signed integers. If a pair of data element is ELSE DST [7..0] ← 00H
greater, then the corresponding data element in the destination IF DST[15..8] > SRC[15..8] THEN DST[15..8] ← 0FFH
operand is set to all 1s; otherwise, it is set to all 0s. No flags in ELSE DST [15..8] ← 00H
the EFLAGS registers are affected. IF DST[23..16] > SRC[23..16] THEN DST[23..16] ← 0FFH
DST ELSE DST [23..16] ← 00H
IF DST[31..24] > SRC[31..24] THEN DST[31..24] ← 0FFH
SRC ELSE DST [31..24] ← 00H
IF DST[39..32] > SRC[39..32] THEN DST[39..32] ← 0FFH
ELSE DST [39..32] ← 00H
DST IF DST[47..40] > SRC[47..40] THEN DST[47..40] ← 0FFH
ELSE DST [47..40] ← 00H
IF DST[55..48] > SRC[55..48] THEN DST[55..48] ← 0FFH
ELSE DST [55..48] ← 00H
IF DST[63..56] > SRC[63..56] THEN DST[63..56] ← 0FFH
ELSE DST [63..56] ← 00H
xmm1, xmm2/m128 Compare 16 packed bytes in xmm1 and xmm2/m128 for greater. IF DST[7..0] > SRC[7..0] THEN DST[7..0] ← 0FFH
Bytes are treated as signed integers. If a pair of data element is ELSE DST [7..0] ← 00H
greater, then the corresponding data element in the destination IF DST[15..8] > SRC[15..8] THEN DST[15..8] ← 0FFH
operand is set to all 1s; otherwise, it is set to all 0s. No flags in ELSE DST [15..8] ← 00H
the EFLAGS registers are affected. IF DST[23..16] > SRC[23..16] THEN DST[23..16] ← 0FFH
DST ELSE DST [23..16] ← 00H
IF DST[31..24] > SRC[31..24] THEN DST[31..24] ← 0FFH
SRC ELSE DST [31..24] ← 00H
IF DST[39..32] > SRC[39..32] THEN DST[39..32] ← 0FFH
ELSE DST [39..32] ← 00H
DST IF DST[47..40] > SRC[47..40] THEN DST[47..40] ← 0FFH
ELSE DST [47..40] ← 00H
IF DST[55..48] > SRC[55..48] THEN DST[55..48] ← 0FFH
ELSE DST [55..48] ← 00H
IF DST[63..56] > SRC[63..56] THEN DST[63..56] ← 0FFH
ELSE DST [63..56] ← 00H
IF DST[71..64] > SRC[71..64] THEN DST[71..64] ← 0FFH
ELSE DST [71..64] ← 00H
IF DST[79..72] > SRC[79..72] THEN DST[79..72] ← 0FFH
ELSE DST [79..72] ← 00H
IF DST[87..80] > SRC[87..80] THEN DST[87..80] ← 0FFH
ELSE DST [87..80] ← 00H
IF DST[95..88] > SRC[95..88] THEN DST[95..88] ← 0FFH
ELSE DST [95..88] ← 00H
IF DST[103..96] > SRC[103..96] THEN DST[103..96] ← 0FFH
ELSE DST [103..96] ← 00H
IF DST[111..104] > SRC[111..104] THEN DST[111..104] ← 0FFH
ELSE DST [111..104] ← 00H
IF DST[119..112] > SRC[119..112] THEN DST[119..112] ← 0FFH
ELSE DST [119..112] ← 00H
IF DST[127..120] > SRC[127..120] THEN DST[127..120] ← 0FFH
ELSE DST [127..120] ← 00H
Parallel Compare Words PCMPGTW mm1, mm2/m64 Compare 4 packed words in mm1 and mm2/m64 for greater. IF DST[15..0] > SRC[15..0] THEN DST[15..0] ← 0FFFFH
for Greater Words are treated as signed integers. If a pair of data element ELSE DST [15..0] ← 00H
is greater, then the corresponding data element in the IF DST[31..16] > SRC[31..16] THEN DST[31..16] ← 0FFFFH
destination operand is set to all 1s; otherwise, it is set to all 0s. ELSE DST [31..16] ← 00H
No flags in the EFLAGS registers are affected. IF DST[47..32] > SRC[47..32] THEN DST[47..32] ← 0FFFFH
DST ELSE DST [47..32] ← 00H
IF DST[63..48] > SRC[63..48] THEN DST[63..48] ← 0FFFFH
SRC ELSE DST [63..48] ← 00H
xmm1, xmm2/m128 Compare 8 packed words in xmm1 and xmm2/m128 for greater. IF DST[15..0] > SRC[15..0] THEN DST[15..0] ← 0FFFFH
Words are treated as signed integers. If a pair of data element ELSE DST [15..0] ← 00H
is greater, then the corresponding data element in the IF DST[31..16] > SRC[31..16] THEN DST[31..16] ← 0FFFFH
destination operand is set to all 1s; otherwise, it is set to all 0s. ELSE DST [31..16] ← 00H
No flags in the EFLAGS registers are affected. IF DST[47..32] > SRC[47..32] THEN DST[47..32] ← 0FFFFH
DST ELSE DST [47..32] ← 00H
IF DST[63..48] > SRC[63..48] THEN DST[63..48] ← 0FFFFH
SRC ELSE DST [63..48] ← 00H
IF DST[79..64] > SRC[79..64] THEN DST[79..64] ← 0FFFFH
ELSE DST [79..64] ← 00H
DST IF DST[95..80] > SRC[95..80] THEN DST[95..80] ← 0FFFFH
ELSE DST [95..80] ← 00H
IF DST[111..96] > SRC[111..96] THEN DST[111..96] ← 0FFFFH
ELSE DST [111..96] ← 00H
IF DST[127..112] > SRC[127..112] THEN DST[127..112] ← 0FFFFH
ELSE DST [127..112] ← 00H
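The greater-than compares treat elements as signed integers, which matters whenever the top bit is set. A small C sketch of the word form, using the hypothetical name pcmpgtw_model and storing -1 for the all-1s result (the bit pattern 0FFFFH in a 16-bit two's complement word):

```c
#include <stdint.h>

/* Lane-wise model: dst[i] <- all 1s if dst[i] > src[i] (signed), else 0. */
static void pcmpgtw_model(int16_t *dst, const int16_t *src, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = (dst[i] > src[i]) ? (int16_t)-1 : (int16_t)0;
}
```

With signed interpretation the word 0FFFFH is -1 and therefore not greater than 1, whereas an unsigned comparison of the same bit patterns would give the opposite answer.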
Parallel Compare PCMPGTD mm1, mm2/m64 Compare 2 packed double words in mm1 and mm2/m64 for IF DST[31..0] > SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH
Dwords for Greater greater. Dwords are treated as signed integers. If a pair of data ELSE DST [31..0] ← 00H
element is greater, then the corresponding data element in the IF DST[63..32] > SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH
destination operand is set to all 1s; otherwise, it is set to all 0s. ELSE DST [63..32] ← 00H
No flags in the EFLAGS registers are affected.
xmm1, xmm2/m128 Compare 4 packed double words in xmm1 and xmm2/m128 for IF DST[31..0] > SRC[31..0] THEN DST[31..0] ← 0FFFFFFFFH
greater. Dwords are treated as signed integers. If a pair of data ELSE DST [31..0] ← 00H
element is greater, then the corresponding data element in the IF DST[63..32] > SRC[63..32] THEN DST[63..32] ← 0FFFFFFFFH
destination operand is set to all 1s; otherwise, it is set to all 0s. ELSE DST [63..32] ← 00H
No flags in the EFLAGS registers are affected. IF DST[95..64] > SRC[95..64] THEN DST[95..64] ← 0FFFFFFFFH
DST ELSE DST [95..64] ← 00H
IF DST[127..96] > SRC[127..96] THEN DST[127..96] ← 0FFFFFFFFH
SRC ELSE DST [127..96] ← 00H
xmm1, xmm2/m128 Shift double words in xmm1 left by the specified position count clearing low- DST[31..0] ← ZeroExtend(DST[31..0] SHL Count)
xmm1, imm8 order bits. DST[63..32] ← ZeroExtend(DST[63..32] SHL Count)
127 DST[95..64] ← ZeroExtend(DST[95..64] SHL Count)
DST[127..96] ← ZeroExtend(DST[127..96] SHL Count)
Packed Shift Left PSLLQ mm1, mm2/m64 Shift quad word in mm1 left by the specified position count clearing low-order DST[63..0] ← ZeroExtend(DST[63..0] SHL Count)
Logical Qwords mm1, imm8 bits.
xmm1, xmm2/m128 Shift quad words in xmm1 left by the specified position count clearing low-order DST[63..0] ← ZeroExtend(DST[63..0] SHL Count)
xmm1, imm8 bits. DST[127..64] ← ZeroExtend(DST[127..64] SHL Count)
Packed Shift Right PSRLW mm1, mm2/m64 Shift words in mm1 right by the specified position count clearing high-order bits. DST[15..0] ← ZeroExtend(DST[15..0] SHR Count)
Logical Words mm1, imm8 DST[31..16] ← ZeroExtend(DST[31..16] SHR Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHR Count)
0 DST[63..48] ← ZeroExtend(DST[63..48] SHR Count)
xmm1, xmm2/m128 Shift words in xmm1 right by the specified position count clearing high-order bits. DST[15..0] ← ZeroExtend(DST[15..0] SHR Count)
xmm1, imm8 DST[31..16] ← ZeroExtend(DST[31..16] SHR Count)
DST[47..32] ← ZeroExtend(DST[47..32] SHR Count)
0 0 DST[63..48] ← ZeroExtend(DST[63..48] SHR Count)
TMP DST[79..64] ← ZeroExtend(DST[79..64] SHR Count)
0 0 DST[95..80] ← ZeroExtend(DST[95..80] SHR Count)
127 DST[111..96] ← ZeroExtend(DST[111..96] SHR Count)
DST[127..112] ← ZeroExtend(DST[127..112] SHR Count)
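A logical right shift applies one count to every word and fills vacated high-order bits with zeros. The C sketch below models that; psrlw_model is a hypothetical name. The handling of counts above 15 (every word becomes 0) follows Intel's manuals and is not stated in the table above, so treat it as background rather than part of this reference.

```c
#include <stdint.h>

/* Lane-wise model: dst[i] <- dst[i] >> count, zero-filled from the left. */
static void psrlw_model(uint16_t *dst, unsigned count, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = (count > 15) ? 0 : (uint16_t)(dst[i] >> count);
}
```

The explicit check also keeps the C code well defined, since a plain shift by the operand width or more would not portably produce zero for wider types.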
Packed Shift Right PSRLD mm1, mm2/m64 Shift double words in mm1 right by the specified position count clearing high- DST[31..0] ← ZeroExtend(DST[31..0] SHR Count)
Logical Dwords mm1, imm8 order bits. DST[63..32] ← ZeroExtend(DST[63..32] SHR Count)
xmm1, xmm2/m128 Shift double words in xmm1 right by the specified position count clearing high- DST[31..0] ← ZeroExtend(DST[31..0] SHR Count)
xmm1, imm8 order bits. DST[63..32] ← ZeroExtend(DST[63..32] SHR Count)
127 DST[95..64] ← ZeroExtend(DST[95..64] SHR Count)
DST[127..96] ← ZeroExtend(DST[127..96] SHR Count)
Packed Shift Right PSRLQ mm1, mm2/m64 Shift quad word in mm1 right by the specified position count clearing high-order DST[63..0] ← ZeroExtend(DST[63..0] SHR Count)
Logical Qwords mm1, imm8 bits.
xmm1, xmm2/m128 Shift quad words in xmm1 right by the specified position count clearing high- DST[63..0] ← ZeroExtend(DST[63..0] SHR Count)
xmm1, imm8 order bits. DST[127..64] ← ZeroExtend(DST[127..64] SHR Count)
Packed Shift Right PSRAW mm1, mm2/m64 Shift words in mm1 right by the specified position count duplicating sign. DST[15..0] ← SignExtend(DST[15..0] SHR Count)
Arithmetical Words mm1, imm8 DST[31..16] ← SignExtend(DST[31..16] SHR Count)
DST[47..32] ← SignExtend(DST[47..32] SHR Count)
DST[63..48] ← SignExtend(DST[63..48] SHR Count)
xmm1, xmm2/m128 Shift words in xmm1 right by the specified position count duplicating sign bits. DST[15..0] ← SignExtend(DST[15..0] SHR Count)
xmm1, imm8 DST[31..16] ← SignExtend(DST[31..16] SHR Count)
DST
DST[47..32] ← SignExtend(DST[47..32] SHR Count)
DST[63..48] ← SignExtend(DST[63..48] SHR Count)
TMP DST[79..64] ← SignExtend(DST[79..64] SHR Count)
0 0
DST[95..80] ← SignExtend(DST[95..80] SHR Count)
DST
127 DST[111..96] ← SignExtend(DST[111..96] SHR Count)
DST[127..112] ← SignExtend(DST[127..112] SHR Count)
Packed Shift Right Arithmetic Dwords PSRAD mm1, mm2/m64 or mm1, imm8
Shift double words in mm1 right by the specified position count, duplicating the sign bit.
DST[31..0] ← SignExtend(DST[31..0] SHR Count)
DST[63..32] ← SignExtend(DST[63..32] SHR Count)
xmm1, xmm2/m128 or xmm1, imm8
Shift double words in xmm1 right by the specified position count, duplicating the sign bit.
DST[31..0] ← SignExtend(DST[31..0] SHR Count)
DST[63..32] ← SignExtend(DST[63..32] SHR Count)
DST[95..64] ← SignExtend(DST[95..64] SHR Count)
DST[127..96] ← SignExtend(DST[127..96] SHR Count)
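The difference between the logical (PSRLx) and arithmetic (PSRAx) packed shifts can be illustrated with a small Python model of the word-sized forms. This is only a sketch: the function names `psrlw` and `psraw` and the use of a Python integer as the register image are assumptions made for illustration, and the handling of counts above 15 (all zeros for the logical shift, all sign bits for the arithmetic shift) is not stated in the table and is added here as an assumption about the hardware behaviour.

```python
def psrlw(dst: int, count: int, lanes: int = 4) -> int:
    """Logical right shift of each 16-bit lane; vacated bits become 0."""
    out = 0
    for i in range(lanes):
        word = (dst >> (16 * i)) & 0xFFFF
        # Assumption: a count above 15 clears the lane completely.
        shifted = 0 if count > 15 else word >> count
        out |= shifted << (16 * i)
    return out


def psraw(dst: int, count: int, lanes: int = 4) -> int:
    """Arithmetic right shift of each 16-bit lane; the sign bit is duplicated."""
    out = 0
    for i in range(lanes):
        word = (dst >> (16 * i)) & 0xFFFF
        signed = word - 0x10000 if word & 0x8000 else word
        # Assumption: a count above 15 behaves like a shift by 15.
        shifted = (signed >> min(count, 15)) & 0xFFFF
        out |= shifted << (16 * i)
    return out


if __name__ == "__main__":
    x = 0x8000_0004_FFFF_0010
    print(hex(psrlw(x, 4)))  # high bits of every lane are cleared
    print(hex(psraw(x, 4)))  # negative lanes keep their sign
```

Passing `lanes=8` models the 128-bit xmm form on a wider integer.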
Move Scalar Single MOVSS
Moves a scalar single-precision floating-point value from the source to the destination operand.
xmm, m32
DST[31..0] ← SRC[31..0]
DST[127..32] ← 000000000000000000000000H
m32, xmm
DST[31..0] ← SRC[31..0]
xmm1, xmm2
DST[31..0] ← SRC[31..0]
// DST[127..32] remains unchanged
Add Scalar Singles ADDSS xmm1, xmm2/m32
Adds the low single-precision floating-point value from xmm2/m32 to the low single-precision floating-point value in xmm1.
DST[31..0] ← DST[31..0] + SRC[31..0];
// DST[127..32] remains unchanged
Subtract Packed Singles SUBPS xmm1, xmm2/m128
Subtracts 4 packed single-precision floating-point values in xmm2/m128 from 4 packed single-precision floating-point values in xmm1.
DST[31..0] ← DST[31..0] - SRC[31..0];
DST[63..32] ← DST[63..32] - SRC[63..32];
DST[95..64] ← DST[95..64] - SRC[95..64];
DST[127..96] ← DST[127..96] - SRC[127..96];
Subtract Scalar Singles SUBSS xmm1, xmm2/m32
Subtracts the low single-precision floating-point value in xmm2/m32 from the low single-precision floating-point value in xmm1.
DST[31..0] ← DST[31..0] - SRC[31..0];
// DST[127..32] remains unchanged
Multiply Packed Singles MULPS xmm1, xmm2/m128
Multiplies 4 packed single-precision floating-point values in xmm1 by 4 packed single-precision floating-point values in xmm2/m128.
DST[31..0] ← DST[31..0] * SRC[31..0];
DST[63..32] ← DST[63..32] * SRC[63..32];
DST[95..64] ← DST[95..64] * SRC[95..64];
DST[127..96] ← DST[127..96] * SRC[127..96];
Multiply Scalar Singles MULSS xmm1, xmm2/m32
Multiplies the low single-precision floating-point value in xmm1 by the low single-precision floating-point value in xmm2/m32.
DST[31..0] ← DST[31..0] * SRC[31..0];
// DST[127..32] remains unchanged
Divide Packed Singles DIVPS xmm1, xmm2/m128
Divides 4 packed single-precision floating-point values in xmm1 by 4 packed single-precision floating-point values in xmm2/m128.
DST[31..0] ← DST[31..0] / SRC[31..0];
DST[63..32] ← DST[63..32] / SRC[63..32];
DST[95..64] ← DST[95..64] / SRC[95..64];
DST[127..96] ← DST[127..96] / SRC[127..96];
Divide Scalar Singles DIVSS xmm1, xmm2/m32
Divides the low single-precision floating-point value in xmm1 by the low single-precision floating-point value in xmm2/m32.
DST[31..0] ← DST[31..0] / SRC[31..0];
// DST[127..32] remains unchanged
Reciprocals of Packed Singles RCPPS xmm1, xmm2/m128
Computes the approximate reciprocals of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← Approximate (1.0 / SRC[31..0]);
DST[63..32] ← Approximate (1.0 / SRC[63..32]);
DST[95..64] ← Approximate (1.0 / SRC[95..64]);
DST[127..96] ← Approximate (1.0 / SRC[127..96]);
Reciprocal of Scalar Single RCPSS xmm1, xmm2/m32
Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← Approximate (1.0 / SRC[31..0]);
// DST[127..32] remains unchanged
Square Roots of Packed Singles SQRTPS xmm1, xmm2/m128
Computes the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← SquareRoot (SRC[31..0]);
DST[63..32] ← SquareRoot (SRC[63..32]);
DST[95..64] ← SquareRoot (SRC[95..64]);
DST[127..96] ← SquareRoot (SRC[127..96]);
Square Root of Scalar Single SQRTSS xmm1, xmm2/m32
Computes the square root of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← SquareRoot (SRC[31..0]);
// DST[127..32] remains unchanged
Reciprocals of Square Roots of Packed Singles RSQRTPS xmm1, xmm2/m128
Computes the approximate reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[31..0] ← Approximate (1.0 / SquareRoot (SRC[31..0]));
DST[63..32] ← Approximate (1.0 / SquareRoot (SRC[63..32]));
DST[95..64] ← Approximate (1.0 / SquareRoot (SRC[95..64]));
DST[127..96] ← Approximate (1.0 / SquareRoot (SRC[127..96]));
Reciprocal of Square Root of Scalar Single RSQRTSS xmm1, xmm2/m32
Computes the approximate reciprocal of the square root of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1.
DST[31..0] ← Approximate (1.0 / SquareRoot (SRC[31..0]));
// DST[127..32] remains unchanged
Maximum Packed Single MAXPS xmm1, xmm2/m128
Returns the maximum of each pair of single-precision floating-point values in xmm1 and xmm2/m128.
DST[31..0] ← MaximumOf (DST[31..0], SRC[31..0]);
DST[63..32] ← MaximumOf (DST[63..32], SRC[63..32]);
DST[95..64] ← MaximumOf (DST[95..64], SRC[95..64]);
DST[127..96] ← MaximumOf (DST[127..96], SRC[127..96]);
Maximum Scalar Single MAXSS xmm1, xmm2/m32
Returns the maximum of the low single-precision floating-point values in xmm1 and xmm2/m32.
DST[31..0] ← MaximumOf (DST[31..0], SRC[31..0]);
// DST[127..32] remains unchanged
Minimum Packed Single MINPS xmm1, xmm2/m128
Returns the minimum of each pair of single-precision floating-point values in xmm1 and xmm2/m128.
DST[31..0] ← MinimumOf (DST[31..0], SRC[31..0]);
DST[63..32] ← MinimumOf (DST[63..32], SRC[63..32]);
DST[95..64] ← MinimumOf (DST[95..64], SRC[95..64]);
DST[127..96] ← MinimumOf (DST[127..96], SRC[127..96]);
Minimum Scalar Single MINSS xmm1, xmm2/m32
Returns the minimum of the low single-precision floating-point values in xmm1 and xmm2/m32.
DST[31..0] ← MinimumOf (DST[31..0], SRC[31..0]);
// DST[127..32] remains unchanged
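The packed (xxxPS) and scalar (xxxSS) forms differ only in how many lanes they touch. A minimal Python sketch, using SUBPS and SUBSS as the example pair, may make this concrete. The list representation (index 0 holds bits 31..0), the helper `f32`, and the function names are assumptions for illustration; exceptions and MXCSR state are not modelled.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def subps(dst, src):
    """Packed form: every one of the four lanes is updated."""
    return [f32(d - s) for d, s in zip(dst, src)]


def subss(dst, src):
    """Scalar form: only lane 0 is updated, lanes 1..3 remain unchanged."""
    return [f32(dst[0] - src[0])] + list(dst[1:])


if __name__ == "__main__":
    a = [4.0, 3.0, 2.0, 1.0]
    b = [1.0, 1.0, 1.0, 1.0]
    print(subps(a, b))
    print(subss(a, b))
```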
SSE Comparison Instructions
Instruction Mnemonic Operands Description Symbolic operations
Compare Packed Singles CMPPS xmm1, xmm2/m128, imm8
Compares 4 packed single-precision floating-point values in xmm2/m128 and xmm1 using imm8 as the comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal, 5 – not less, 6 – not less or equal, 7 – ordered. The result of each comparison is a double-word mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true when at least one of the two operands is a NaN; the ordered relationship is true when neither operand is a NaN.
CMP0 ← DST[31..0] OP SRC[31..0];
CMP1 ← DST[63..32] OP SRC[63..32];
CMP2 ← DST[95..64] OP SRC[95..64];
CMP3 ← DST[127..96] OP SRC[127..96];
IF CMP0 THEN DST[31..0] ← FFFFFFFFH
ELSE DST[31..0] ← 00000000H;
IF CMP1 THEN DST[63..32] ← FFFFFFFFH
ELSE DST[63..32] ← 00000000H;
IF CMP2 THEN DST[95..64] ← FFFFFFFFH
ELSE DST[95..64] ← 00000000H;
IF CMP3 THEN DST[127..96] ← FFFFFFFFH
ELSE DST[127..96] ← 00000000H
Compare Packed CMPEQPS xmm1, xmm2 <=> CMPPS xmm1,xmm2, 0 see CMPPS
Singles CMPLTPS <=> CMPPS xmm1,xmm2, 1
CMPLEPS <=> CMPPS xmm1,xmm2, 2
CMPUNORDPS <=> CMPPS xmm1,xmm2, 3
CMPNEQPS <=> CMPPS xmm1,xmm2, 4
CMPNLTPS <=> CMPPS xmm1,xmm2, 5
CMPNLEPS <=> CMPPS xmm1,xmm2, 6
CMPORDPS <=> CMPPS xmm1,xmm2, 7
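The eight comparison predicates and their treatment of NaN operands can be sketched in Python as follows. The names `PREDICATES` and `cmpps` and the list-of-floats lane model (index 0 holds bits 31..0) are assumptions for illustration; only the predicate values 0..7 listed for CMPPS are modelled, and exception signalling is ignored.

```python
import math


def _unordered(a: float, b: float) -> bool:
    return math.isnan(a) or math.isnan(b)


# imm8 value -> predicate, in the order listed for CMPPS.
PREDICATES = {
    0: lambda a, b: a == b,                 # equal
    1: lambda a, b: a < b,                  # less than
    2: lambda a, b: a <= b,                 # less or equal
    3: _unordered,                          # unordered
    4: lambda a, b: not (a == b),           # not equal
    5: lambda a, b: not (a < b),            # not less
    6: lambda a, b: not (a <= b),           # not less or equal
    7: lambda a, b: not _unordered(a, b),   # ordered
}


def cmpps(dst, src, imm8: int):
    """Return one 32-bit mask per lane: all 1s if the predicate holds, else 0."""
    op = PREDICATES[imm8]
    return [0xFFFFFFFF if op(d, s) else 0 for d, s in zip(dst, src)]


if __name__ == "__main__":
    nan = float("nan")
    x = [1.0, 2.0, nan, 3.0]
    y = [1.0, 3.0, 1.0, 2.0]
    print([hex(m) for m in cmpps(x, y, 1)])  # CMPLTPS
    print([hex(m) for m in cmpps(x, y, 5)])  # CMPNLTPS
```

Note that the negated predicates (4, 5, 6) are true for a lane containing a NaN, because the underlying comparison is false there.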
Compare Scalar Singles CMPSS xmm1, xmm2/m32, imm8
Compares the low single-precision floating-point values in xmm2/m32 and xmm1 using imm8 as the comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal, 5 – not less, 6 – not less or equal, 7 – ordered. The result of the comparison is a double-word mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true when at least one of the two operands is a NaN; the ordered relationship is true when neither operand is a NaN.
CMP0 ← DST[31..0] OP SRC[31..0];
IF CMP0 THEN DST[31..0] ← FFFFFFFFH
ELSE DST[31..0] ← 00000000H;
// DST[127..32] remains unchanged
Compare Scalar Singles CMPEQSS xmm1, xmm2 <=> CMPSS xmm1,xmm2, 0 see CMPSS
CMPLTSS <=> CMPSS xmm1,xmm2, 1
CMPLESS <=> CMPSS xmm1,xmm2, 2
CMPUNORDSS <=> CMPSS xmm1,xmm2, 3
CMPNEQSS <=> CMPSS xmm1,xmm2, 4
CMPNLTSS <=> CMPSS xmm1,xmm2, 5
CMPNLESS <=> CMPSS xmm1,xmm2, 6
CMPORDSS <=> CMPSS xmm1,xmm2, 7
Compare Scalar Singles and set EFLAGS COMISS xmm1, xmm2/m32
Compares the low single-precision floating-point values in the operands and sets the EFLAGS flags accordingly. Performs an ordered compare. This instruction differs from the UCOMISS instruction in that it signals an invalid operation exception when a source operand is a QNaN or an SNaN.
Result ← OrderedCompare(DST[31..0], SRC[31..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
Unordered Compare Scalar Singles and set EFLAGS UCOMISS xmm1, xmm2/m32
Compares the low single-precision floating-point values in the operands and sets the EFLAGS flags accordingly. Performs an unordered compare. This instruction differs from the COMISS instruction in that it signals an invalid operation exception only when a source operand is an SNaN.
Result ← UnorderedCompare(DST[31..0], SRC[31..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
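The flag encoding shared by COMISS and UCOMISS can be sketched in Python. The function name `comiss_flags` and the dictionary result are assumptions for illustration; the difference between the two instructions (which kinds of NaN raise the invalid operation exception) is not modelled.

```python
import math


def comiss_flags(x: float, y: float) -> dict:
    """Return the EFLAGS bits produced by comparing x (DST) with y (SRC)."""
    if math.isnan(x) or math.isnan(y):
        zf, pf, cf = 1, 1, 1      # UNORDERED
    elif x > y:
        zf, pf, cf = 0, 0, 0      # GREATER_THAN
    elif x < y:
        zf, pf, cf = 0, 0, 1      # LESS_THAN
    else:
        zf, pf, cf = 1, 0, 0      # EQUAL
    # OF, AF and SF are always cleared.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "AF": 0, "SF": 0}


if __name__ == "__main__":
    for pair in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (float("nan"), 1.0)]:
        print(pair, comiss_flags(*pair))
```

Because the result lands in ZF, PF and CF, the comparison can be followed by the same conditional jumps used after an unsigned integer compare, with PF distinguishing the unordered case.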
OR of Packed Singles ORPS xmm1, xmm2/m128
Performs a bitwise OR operation of the four packed single-precision floating-point values from the destination (first) and source (second) operands and stores the result in the destination operand.
DST[31..0] ← DST[31..0] OR SRC[31..0];
DST[63..32] ← DST[63..32] OR SRC[63..32];
DST[95..64] ← DST[95..64] OR SRC[95..64];
DST[127..96] ← DST[127..96] OR SRC[127..96];
Exclusive OR of Packed Singles XORPS xmm1, xmm2/m128
Performs a bitwise XOR operation of the four packed single-precision floating-point values from the destination (first) and source (second) operands and stores the result in the destination operand.
DST[31..0] ← DST[31..0] XOR SRC[31..0];
DST[63..32] ← DST[63..32] XOR SRC[63..32];
DST[95..64] ← DST[95..64] XOR SRC[95..64];
DST[127..96] ← DST[127..96] XOR SRC[127..96];
Unpack High Packed Singles UNPCKHPS xmm1, xmm2/m128
Unpacks and interleaves the single-precision floating-point values from the high quad words of the source (second) operand and the destination (first) operand.
DST[31..0] ← DST[95..64]
DST[63..32] ← SRC[95..64]
DST[95..64] ← DST[127..96]
DST[127..96] ← SRC[127..96]
SSE Conversion Instructions
Instruction Mnemonic Operands Description Symbolic operations
Convert Packed Integers to Packed Singles CVTPI2PS xmm, mm/m64
Converts two packed signed double-word integers from mm/m64 to two packed single-precision floating-point values in xmm.
DST[31..0] ← IntToSingle (SRC[31..0]);
DST[63..32] ← IntToSingle (SRC[63..32]);
// DST[127..64] remains unchanged
Convert Packed Singles to Packed Integers CVTPS2PI mm, xmm/m64
Converts two packed single-precision floating-point values from xmm/m64 to two packed signed double-word integers in mm.
DST[31..0] ← SingleToInt (SRC[31..0]);
DST[63..32] ← SingleToInt (SRC[63..32]);
Convert Scalar Integer to Scalar Single CVTSI2SS xmm, r/m32
Converts one signed double-word integer from r/m32 to one scalar single-precision floating-point value in xmm.
DST[31..0] ← IntToSingle (SRC);
// DST[127..32] remains unchanged
Convert Scalar Single to Scalar Integer CVTSS2SI r32, xmm/m32
Converts a single-precision floating-point value from xmm/m32 to a signed double-word integer in r32.
DST ← SingleToInt (SRC[31..0]);
Convert with Truncation Packed Singles to Packed Integers CVTTPS2PI mm, xmm/m64
Converts two packed single-precision floating-point values from xmm/m64 to two packed signed double-word integers in mm using truncation.
DST[31..0] ← TruncateSingleToInt (SRC[31..0]);
DST[63..32] ← TruncateSingleToInt (SRC[63..32]);
Convert with Truncation Scalar Single to Scalar Integer CVTTSS2SI r32, xmm/m32
Converts a single-precision floating-point value from xmm/m32 to a signed double-word integer in r32 using truncation.
DST ← TruncateSingleToInt (SRC[31..0]);
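The difference between the rounding conversions (CVTSS2SI, CVTPS2PI) and the truncating ones (CVTTSS2SI, CVTTPS2PI) shows up on values with a fractional part. The sketch below assumes the default rounding mode (round to nearest, ties to even), which is what Python's built-in `round` implements; the function names mirror the pseudocode helpers in the table, and out-of-range inputs and NaNs are not modelled.

```python
import math


def single_to_int(x: float) -> int:
    """SingleToInt under the assumed default mode: nearest, ties to even."""
    return round(x)


def truncate_single_to_int(x: float) -> int:
    """TruncateSingleToInt: discard the fractional part (round toward zero)."""
    return math.trunc(x)


if __name__ == "__main__":
    for value in (2.5, 3.5, -2.5, 2.9, -2.9):
        print(value, single_to_int(value), truncate_single_to_int(value))
```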
SSE 64-Bit SIMD Integer Instructions
Instruction Mnemonic Operands Description Symbolic operations
Packed Average Bytes PAVGB mm1, mm2/m64
Averages 8 packed unsigned bytes from mm1 and 8 packed unsigned bytes from mm2/m64 with rounding.
DST[7..0] ← (DST[7..0]+SRC[7..0]+1) SHR 1
DST[15..8] ← (DST[15..8]+SRC[15..8]+1) SHR 1
DST[23..16] ← (DST[23..16]+SRC[23..16]+1) SHR 1
DST[31..24] ← (DST[31..24]+SRC[31..24]+1) SHR 1
DST[39..32] ← (DST[39..32]+SRC[39..32]+1) SHR 1
DST[47..40] ← (DST[47..40]+SRC[47..40]+1) SHR 1
DST[55..48] ← (DST[55..48]+SRC[55..48]+1) SHR 1
DST[63..56] ← (DST[63..56]+SRC[63..56]+1) SHR 1
xmm1, xmm2/m128
Averages 16 packed unsigned bytes from xmm1 and 16 packed unsigned bytes from xmm2/m128 with rounding.
DST[7..0] ← (DST[7..0]+SRC[7..0]+1) SHR 1
DST[15..8] ← (DST[15..8]+SRC[15..8]+1) SHR 1
DST[23..16] ← (DST[23..16]+SRC[23..16]+1) SHR 1
DST[31..24] ← (DST[31..24]+SRC[31..24]+1) SHR 1
DST[39..32] ← (DST[39..32]+SRC[39..32]+1) SHR 1
DST[47..40] ← (DST[47..40]+SRC[47..40]+1) SHR 1
DST[55..48] ← (DST[55..48]+SRC[55..48]+1) SHR 1
DST[63..56] ← (DST[63..56]+SRC[63..56]+1) SHR 1
DST[71..64] ← (DST[71..64]+SRC[71..64]+1) SHR 1
DST[79..72] ← (DST[79..72]+SRC[79..72]+1) SHR 1
DST[87..80] ← (DST[87..80]+SRC[87..80]+1) SHR 1
DST[95..88] ← (DST[95..88]+SRC[95..88]+1) SHR 1
DST[103..96] ← (DST[103..96]+SRC[103..96]+1) SHR 1
DST[111..104] ← (DST[111..104]+SRC[111..104]+1) SHR 1
DST[119..112] ← (DST[119..112]+SRC[119..112]+1) SHR 1
DST[127..120] ← (DST[127..120]+SRC[127..120]+1) SHR 1
Packed Average Words PAVGW mm1, mm2/m64
Averages 4 packed unsigned words from mm1 and 4 packed unsigned words from mm2/m64 with rounding.
DST[15..0] ← (DST[15..0]+SRC[15..0]+1) SHR 1
DST[31..16] ← (DST[31..16]+SRC[31..16]+1) SHR 1
DST[47..32] ← (DST[47..32]+SRC[47..32]+1) SHR 1
DST[63..48] ← (DST[63..48]+SRC[63..48]+1) SHR 1
xmm1, xmm2/m128
Averages 8 packed unsigned words from xmm1 and 8 packed unsigned words from xmm2/m128 with rounding.
DST[15..0] ← (DST[15..0]+SRC[15..0]+1) SHR 1
DST[31..16] ← (DST[31..16]+SRC[31..16]+1) SHR 1
DST[47..32] ← (DST[47..32]+SRC[47..32]+1) SHR 1
DST[63..48] ← (DST[63..48]+SRC[63..48]+1) SHR 1
DST[79..64] ← (DST[79..64]+SRC[79..64]+1) SHR 1
DST[95..80] ← (DST[95..80]+SRC[95..80]+1) SHR 1
DST[111..96] ← (DST[111..96]+SRC[111..96]+1) SHR 1
DST[127..112] ← (DST[127..112]+SRC[127..112]+1) SHR 1
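The rounding average `(a + b + 1) SHR 1` used by PAVGB and PAVGW can be modelled on a Python integer that stands in for the register. The function name `pavg` and its parameters are assumptions for illustration. The key point is that the sum is formed in a wider temporary, so the carry out of the lane is not lost before the shift.

```python
def pavg(dst: int, src: int, lane_bits: int, lanes: int) -> int:
    """Rounded unsigned average of each lane: (a + b + 1) >> 1."""
    mask = (1 << lane_bits) - 1
    out = 0
    for i in range(lanes):
        a = (dst >> (lane_bits * i)) & mask
        b = (src >> (lane_bits * i)) & mask
        # Python integers are unbounded, so a + b + 1 cannot overflow here.
        out |= ((a + b + 1) >> 1) << (lane_bits * i)
    return out


def pavgb(dst: int, src: int, lanes: int = 8) -> int:
    return pavg(dst, src, 8, lanes)


def pavgw(dst: int, src: int, lanes: int = 4) -> int:
    return pavg(dst, src, 16, lanes)


if __name__ == "__main__":
    print(hex(pavgw(0x00FF_0001_0002_FFFF, 0x0000_0002_0002_FFFF)))
```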
Packed Maximum of Words PMAXSW mm1, mm2/m64
Compares 4 signed word integers in mm1 with 4 signed word integers in mm2/m64 and returns maximum values.
DST[15..0] ← MaximumOf (DST[15..0], SRC[15..0])
DST[31..16] ← MaximumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MaximumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MaximumOf (DST[63..48], SRC[63..48])
xmm1, xmm2/m128
Compares 8 signed word integers in xmm1 with 8 signed word integers in xmm2/m128 and returns maximum values.
DST[15..0] ← MaximumOf (DST[15..0], SRC[15..0])
DST[31..16] ← MaximumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MaximumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MaximumOf (DST[63..48], SRC[63..48])
DST[79..64] ← MaximumOf (DST[79..64], SRC[79..64])
DST[95..80] ← MaximumOf (DST[95..80], SRC[95..80])
DST[111..96] ← MaximumOf (DST[111..96], SRC[111..96])
DST[127..112] ← MaximumOf (DST[127..112], SRC[127..112])
Packed Maximum of Unsigned Bytes PMAXUB mm1, mm2/m64
Compares 8 unsigned byte integers in mm1 with 8 unsigned byte integers in mm2/m64 and returns maximum values.
DST[7..0] ← MaximumOf (DST[7..0], SRC[7..0])
DST[15..8] ← MaximumOf (DST[15..8], SRC[15..8])
DST[23..16] ← MaximumOf (DST[23..16], SRC[23..16])
DST[31..24] ← MaximumOf (DST[31..24], SRC[31..24])
DST[39..32] ← MaximumOf (DST[39..32], SRC[39..32])
DST[47..40] ← MaximumOf (DST[47..40], SRC[47..40])
DST[55..48] ← MaximumOf (DST[55..48], SRC[55..48])
DST[63..56] ← MaximumOf (DST[63..56], SRC[63..56])
Packed Minimum of Words PMINSW mm1, mm2/m64
Compares 4 signed word integers in mm1 with 4 signed word integers in mm2/m64 and returns minimum values.
DST[15..0] ← MinimumOf (DST[15..0], SRC[15..0])
DST[31..16] ← MinimumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MinimumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MinimumOf (DST[63..48], SRC[63..48])
xmm1, xmm2/m128
Compares 8 signed word integers in xmm1 with 8 signed word integers in xmm2/m128 and returns minimum values.
DST[15..0] ← MinimumOf (DST[15..0], SRC[15..0])
DST[31..16] ← MinimumOf (DST[31..16], SRC[31..16])
DST[47..32] ← MinimumOf (DST[47..32], SRC[47..32])
DST[63..48] ← MinimumOf (DST[63..48], SRC[63..48])
DST[79..64] ← MinimumOf (DST[79..64], SRC[79..64])
DST[95..80] ← MinimumOf (DST[95..80], SRC[95..80])
DST[111..96] ← MinimumOf (DST[111..96], SRC[111..96])
DST[127..112] ← MinimumOf (DST[127..112], SRC[127..112])
Packed Minimum of Unsigned Bytes PMINUB mm1, mm2/m64
Compares 8 unsigned byte integers in mm1 with 8 unsigned byte integers in mm2/m64 and returns minimum values.
DST[7..0] ← MinimumOf (DST[7..0], SRC[7..0])
DST[15..8] ← MinimumOf (DST[15..8], SRC[15..8])
DST[23..16] ← MinimumOf (DST[23..16], SRC[23..16])
DST[31..24] ← MinimumOf (DST[31..24], SRC[31..24])
DST[39..32] ← MinimumOf (DST[39..32], SRC[39..32])
DST[47..40] ← MinimumOf (DST[47..40], SRC[47..40])
DST[55..48] ← MinimumOf (DST[55..48], SRC[55..48])
DST[63..56] ← MinimumOf (DST[63..56], SRC[63..56])
Move Byte Mask To Integer PMOVMSKB r32, mm
Creates a mask made up of the most significant bit of each byte of the mmx register and stores the result in the low byte of the r32 register.
DST[0] ← SRC[7]
DST[1] ← SRC[15]
DST[2] ← SRC[23]
DST[3] ← SRC[31]
DST[4] ← SRC[39]
DST[5] ← SRC[47]
DST[6] ← SRC[55]
DST[7] ← SRC[63]
DST[31..8] ← 000000H
r32, xmm
Creates a mask made up of the most significant bit of each byte of the xmm register and stores the result in the low word of the r32 register.
DST[0] ← SRC[7]
DST[1] ← SRC[15]
DST[2] ← SRC[23]
DST[3] ← SRC[31]
DST[4] ← SRC[39]
DST[5] ← SRC[47]
DST[6] ← SRC[55]
DST[7] ← SRC[63]
DST[8] ← SRC[71]
DST[9] ← SRC[79]
DST[10] ← SRC[87]
DST[11] ← SRC[95]
DST[12] ← SRC[103]
DST[13] ← SRC[111]
DST[14] ← SRC[119]
DST[15] ← SRC[127]
DST[31..16] ← 0000H
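The bit-gathering done by PMOVMSKB can be written as a short loop. In this sketch the function name `pmovmskb` and the `nbytes` parameter (8 for an mmx source, 16 for an xmm source) are assumptions for illustration, and the register is modelled as a Python integer.

```python
def pmovmskb(src: int, nbytes: int = 8) -> int:
    """Collect the most significant bit of each source byte into a bit mask."""
    mask = 0
    for i in range(nbytes):
        msb = (src >> (8 * i + 7)) & 1   # bit 7 of byte i
        mask |= msb << i                 # becomes bit i of the result
    return mask                          # all higher result bits stay 0


if __name__ == "__main__":
    # Bytes, high to low: 80 00 FF 7F 00 00 00 80
    print(bin(pmovmskb(0x8000FF7F00000080)))
```

A typical use is to follow a packed byte comparison with PMOVMSKB and test the resulting integer, which turns a per-byte mask into something a scalar branch can act on.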
Packed Sum of Absolute Differences PSADBW mm1, mm2/m64
Computes the absolute differences of the packed unsigned byte integers from mm1 and mm2/m64; the differences are then summed to produce an unsigned word integer result.
TMP0 ← ABS(DST[7..0] - SRC[7..0])
TMP1 ← ABS(DST[15..8] - SRC[15..8])
TMP2 ← ABS(DST[23..16] - SRC[23..16])
TMP3 ← ABS(DST[31..24] - SRC[31..24])
TMP4 ← ABS(DST[39..32] - SRC[39..32])
TMP5 ← ABS(DST[47..40] - SRC[47..40])
TMP6 ← ABS(DST[55..48] - SRC[55..48])
TMP7 ← ABS(DST[63..56] - SRC[63..56])
DST[15..0] ← SUM(TMP0..TMP7)
DST[63..16] ← 000000000000H
xmm1, xmm2/m128
Computes the absolute differences of the packed unsigned byte integers from xmm1 and xmm2/m128; the 8 low differences and the 8 high differences are then summed separately to produce two unsigned word integer results.
TMP0 ← ABS(DST[7..0] - SRC[7..0])
TMP1 ← ABS(DST[15..8] - SRC[15..8])
TMP2 ← ABS(DST[23..16] - SRC[23..16])
TMP3 ← ABS(DST[31..24] - SRC[31..24])
TMP4 ← ABS(DST[39..32] - SRC[39..32])
TMP5 ← ABS(DST[47..40] - SRC[47..40])
TMP6 ← ABS(DST[55..48] - SRC[55..48])
TMP7 ← ABS(DST[63..56] - SRC[63..56])
TMP8 ← ABS(DST[71..64] - SRC[71..64])
TMP9 ← ABS(DST[79..72] - SRC[79..72])
TMP10 ← ABS(DST[87..80] - SRC[87..80])
TMP11 ← ABS(DST[95..88] - SRC[95..88])
TMP12 ← ABS(DST[103..96] - SRC[103..96])
TMP13 ← ABS(DST[111..104] - SRC[111..104])
TMP14 ← ABS(DST[119..112] - SRC[119..112])
TMP15 ← ABS(DST[127..120] - SRC[127..120])
DST[15..0] ← SUM(TMP0..TMP7)
DST[63..16] ← 000000000000H
DST[79..64] ← SUM(TMP8..TMP15)
DST[127..80] ← 000000000000H
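The sum-of-absolute-differences computation can be sketched in Python for both register widths. The function name `psadbw` and the `nbytes` parameter (8 for mmx, 16 for xmm) are assumptions for illustration; each group of 8 bytes yields one sum that is stored in the low word of its own quad word.

```python
def psadbw(dst: int, src: int, nbytes: int = 8) -> int:
    """Sum |dst_byte - src_byte| over each group of 8 unsigned bytes."""
    out = 0
    for group in range(nbytes // 8):
        total = 0
        for i in range(8):
            shift = 8 * (8 * group + i)
            a = (dst >> shift) & 0xFF
            b = (src >> shift) & 0xFF
            total += abs(a - b)
        # The sum (at most 8 * 255 = 2040) fits in 16 bits; upper bits are 0.
        out |= total << (64 * group)
    return out


if __name__ == "__main__":
    print(hex(psadbw(0x0102030405060708, 0x0807060504030201)))
```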
Packed Extract Word PEXTRW r32, mm, imm8
Extracts the word specified by imm8 from the mmx register and moves it to the r32 register.
SEL ← Count AND 3H
TMP ← (SRC SHR (SEL * 16)) AND 0FFFFH
DST[15..0] ← TMP[15..0]
DST[31..16] ← 0000H
r32, xmm, imm8
Extracts the word specified by imm8 from the xmm register and moves it to the r32 register.
SEL ← Count AND 7H
TMP ← (SRC SHR (SEL * 16)) AND 0FFFFH
DST[15..0] ← TMP[15..0]
DST[31..16] ← 0000H
Packed Insert Word PINSRW mm, r32/m16, imm8
Inserts the low word from the r32 register or memory into the mmx register at the word position specified by imm8.
SEL ← Count AND 3H
CASE (SEL) OF
0: MASK ← 000000000000FFFFH
1: MASK ← 00000000FFFF0000H
2: MASK ← 0000FFFF00000000H
3: MASK ← FFFF000000000000H
END
DST ← (DST AND NOT MASK) OR
((SRC SHL (SEL *16)) AND MASK)
xmm, r32/m16, imm8
Inserts the low word from the r32 register or memory into the xmm register at the word position specified by imm8.
SEL ← Count AND 7H
CASE (SEL) OF
0: MASK ← 0000000000000000000000000000FFFFH
1: MASK ← 000000000000000000000000FFFF0000H
2: MASK ← 00000000000000000000FFFF00000000H
3: MASK ← 0000000000000000FFFF000000000000H
4: MASK ← 000000000000FFFF0000000000000000H
5: MASK ← 00000000FFFF00000000000000000000H
6: MASK ← 0000FFFF000000000000000000000000H
7: MASK ← FFFF0000000000000000000000000000H
END
DST ← (DST AND NOT MASK) OR
((SRC SHL (SEL *16)) AND MASK)
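The select-and-mask logic of PEXTRW and PINSRW can be checked with a small Python model. The function names and the `words` parameter (4 for an mmx register, 8 for an xmm register) are assumptions for illustration; the mask is computed by shifting rather than looked up in a CASE table, which gives the same values.

```python
def pextrw(src: int, imm8: int, words: int = 4) -> int:
    """Return the selected 16-bit word, zero-extended."""
    sel = imm8 & (words - 1)            # only the low selector bits are used
    return (src >> (sel * 16)) & 0xFFFF


def pinsrw(dst: int, src: int, imm8: int, words: int = 4) -> int:
    """Replace the selected word of dst with the low word of src."""
    sel = imm8 & (words - 1)
    mask = 0xFFFF << (sel * 16)
    return (dst & ~mask) | ((src << (sel * 16)) & mask)


if __name__ == "__main__":
    reg = pinsrw(0x1111222233334444, 0xDEADBEEF, 2)
    print(hex(reg))
    print(hex(pextrw(reg, 2)))
```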
Packed Shuffle Words PSHUFW mm1, mm2/m64, imm8
Copies words from the source (second) operand and inserts them into the destination (first) operand at word locations selected with the order (third) operand.
DST[15..0] ← (SRC SHR (ORDER[1..0] * 16))[15..0]
DST[31..16] ← (SRC SHR (ORDER[3..2] * 16))[15..0]
DST[47..32] ← (SRC SHR (ORDER[5..4] * 16))[15..0]
DST[63..48] ← (SRC SHR (ORDER[7..6] * 16))[15..0]
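How the order operand selects source words in PSHUFW can be shown with a short Python model. The function name `pshufw` is an assumption for illustration; each pair of bits in `order` picks which source word lands in the corresponding destination word.

```python
def pshufw(src: int, order: int) -> int:
    """Build a 64-bit result whose word i is source word ORDER[2i+1..2i]."""
    out = 0
    for i in range(4):
        sel = (order >> (2 * i)) & 3
        word = (src >> (sel * 16)) & 0xFFFF
        out |= word << (16 * i)
    return out


if __name__ == "__main__":
    x = 0x3333_2222_1111_0000
    print(hex(pshufw(x, 0x1B)))  # 0x1B = 00 01 10 11b reverses the words
    print(hex(pshufw(x, 0xE4)))  # 0xE4 = 11 10 01 00b is the identity
```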
Move Scalar Double MOVSD xmm, m64
Moves a scalar double-precision floating-point value from the source to the destination operand.
DST[63..0] ← SRC[63..0]
DST[127..64] ← 0000000000000000H
Add Scalar Doubles ADDSD xmm1, xmm2/m64
Adds the low double-precision floating-point value from xmm2/m64 to the low double-precision floating-point value in xmm1.
DST[63..0] ← DST[63..0] + SRC[63..0];
// DST[127..64] remains unchanged
Subtract Packed Doubles SUBPD xmm1, xmm2/m128
Subtracts 2 packed double-precision floating-point values in xmm2/m128 from 2 packed double-precision floating-point values in xmm1.
DST[63..0] ← DST[63..0] - SRC[63..0];
DST[127..64] ← DST[127..64] - SRC[127..64];
Subtract Scalar Doubles SUBSD xmm1, xmm2/m64
Subtracts the low double-precision floating-point value in xmm2/m64 from the low double-precision floating-point value in xmm1.
DST[63..0] ← DST[63..0] - SRC[63..0];
// DST[127..64] remains unchanged
Multiply Packed Doubles MULPD xmm1, xmm2/m128
Multiplies 2 packed double-precision floating-point values in xmm1 by 2 packed double-precision floating-point values in xmm2/m128.
DST[63..0] ← DST[63..0] * SRC[63..0];
DST[127..64] ← DST[127..64] * SRC[127..64];
Multiply Scalar Doubles MULSD xmm1, xmm2/m64
Multiplies the low double-precision floating-point value in xmm1 by the low double-precision floating-point value in xmm2/m64.
DST[63..0] ← DST[63..0] * SRC[63..0];
// DST[127..64] remains unchanged
Divide Packed Doubles DIVPD xmm1, xmm2/m128
Divides 2 packed double-precision floating-point values in xmm1 by 2 packed double-precision floating-point values in xmm2/m128.
DST[63..0] ← DST[63..0] / SRC[63..0];
DST[127..64] ← DST[127..64] / SRC[127..64];
Divide Scalar Doubles DIVSD xmm1, xmm2/m64
Divides the low double-precision floating-point value in xmm1 by the low double-precision floating-point value in xmm2/m64.
DST[63..0] ← DST[63..0] / SRC[63..0];
// DST[127..64] remains unchanged
Square Roots of Packed Doubles SQRTPD xmm1, xmm2/m128
Computes the square roots of the packed double-precision floating-point values in xmm2/m128 and stores the results in xmm1.
DST[63..0] ← SquareRoot (SRC[63..0]);
DST[127..64] ← SquareRoot (SRC[127..64]);
Square Root of Scalar Double SQRTSD xmm1, xmm2/m64
Computes the square root of the scalar double-precision floating-point value in xmm2/m64 and stores the result in xmm1.
DST[63..0] ← SquareRoot (SRC[63..0]);
// DST[127..64] remains unchanged
Maximum Packed Double MAXPD xmm1, xmm2/m128
Returns the maximum of each pair of double-precision floating-point values in xmm1 and xmm2/m128.
DST[63..0] ← MaximumOf (DST[63..0], SRC[63..0]);
DST[127..64] ← MaximumOf (DST[127..64], SRC[127..64]);
Maximum Scalar Double MAXSD xmm1, xmm2/m64
Returns the maximum of the low double-precision floating-point values in xmm1 and xmm2/m64.
DST[63..0] ← MaximumOf (DST[63..0], SRC[63..0]);
// DST[127..64] remains unchanged
Minimum Packed Double MINPD xmm1, xmm2/m128
Returns the minimum of each pair of double-precision floating-point values in xmm1 and xmm2/m128.
DST[63..0] ← MinimumOf (DST[63..0], SRC[63..0]);
DST[127..64] ← MinimumOf (DST[127..64], SRC[127..64]);
Minimum Scalar Double MINSD xmm1, xmm2/m64
Returns the minimum of the low double-precision floating-point values in xmm1 and xmm2/m64.
DST[63..0] ← MinimumOf (DST[63..0], SRC[63..0]);
// DST[127..64] remains unchanged
AND NOT of Packed Doubles ANDNPD xmm1, xmm2/m128
Inverts the bits of the two packed double-precision floating-point values in the destination (first) operand, performs a bitwise logical AND of the inverted result with the two packed double-precision floating-point values in the source (second) operand, and stores the result in the destination operand.
DST[63..0] ← (NOT DST[63..0]) AND SRC[63..0];
DST[127..64] ← (NOT DST[127..64]) AND SRC[127..64]
OR of Packed Doubles ORPD xmm1, xmm2/m128
Performs a bitwise OR operation of the two packed double-precision floating-point values from the destination (first) and source (second) operands and stores the result in the destination operand.
DST[63..0] ← DST[63..0] OR SRC[63..0];
DST[127..64] ← DST[127..64] OR SRC[127..64]
Exclusive OR of Packed Doubles XORPD xmm1, xmm2/m128
Performs a bitwise XOR operation of the two packed double-precision floating-point values from the destination (first) and source (second) operands and stores the result in the destination operand.
DST[63..0] ← DST[63..0] XOR SRC[63..0];
DST[127..64] ← DST[127..64] XOR SRC[127..64]
Compare Packed CMPEQPD xmm1, xmm2 <=> CMPPD xmm1,xmm2, 0 see CMPPD
Doubles CMPLTPD <=> CMPPD xmm1,xmm2, 1
CMPLEPD <=> CMPPD xmm1,xmm2, 2
CMPUNORDPD <=> CMPPD xmm1,xmm2, 3
CMPNEQPD <=> CMPPD xmm1,xmm2, 4
CMPNLTPD <=> CMPPD xmm1,xmm2, 5
CMPNLEPD <=> CMPPD xmm1,xmm2, 6
CMPORDPD <=> CMPPD xmm1,xmm2, 7
Compare Scalar Doubles CMPSD xmm1, xmm2/m64, imm8
Compares the low double-precision floating-point values in xmm2/m64 and xmm1 using imm8 as the comparison predicate: 0 – equal, 1 – less than, 2 – less or equal, 3 – unordered, 4 – not equal, 5 – not less, 6 – not less or equal, 7 – ordered. The result of the comparison is a quad-word mask of all 1s (comparison true) or all 0s (comparison false). The unordered relationship is true when at least one of the two operands is a NaN; the ordered relationship is true when neither operand is a NaN.
CMP0 ← DST[63..0] OP SRC[63..0];
IF CMP0 THEN DST[63..0] ← FFFFFFFFFFFFFFFFH
ELSE DST[63..0] ← 0000000000000000H;
// DST[127..64] remains unchanged
Compare Scalar CMPEQSD xmm1, xmm2 <=> CMPSD xmm1,xmm2, 0 see CMPSD
Doubles CMPLTSD <=> CMPSD xmm1,xmm2, 1
CMPLESD <=> CMPSD xmm1,xmm2, 2
CMPUNORDSD <=> CMPSD xmm1,xmm2, 3
CMPNEQSD <=> CMPSD xmm1,xmm2, 4
CMPNLTSD <=> CMPSD xmm1,xmm2, 5
CMPNLESD <=> CMPSD xmm1,xmm2, 6
CMPORDSD <=> CMPSD xmm1,xmm2, 7
Compare Scalar Doubles and Set EFLAGS COMISD xmm1, xmm2/m64
Compares the low double-precision floating-point values in the operands and sets the EFLAGS flags accordingly. Performs an ordered compare. This instruction differs from the UCOMISD instruction in that it signals an invalid operation exception when a source operand is a QNaN or an SNaN.
Result ← OrderedCompare(DST[63..0], SRC[63..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
Unordered Compare Scalar Doubles and set EFLAGS UCOMISD xmm1, xmm2/m64
Compares the low double-precision floating-point values in the operands and sets the EFLAGS flags accordingly. Performs an unordered compare. This instruction differs from the COMISD instruction in that it signals an invalid operation exception only when a source operand is an SNaN.
Result ← UnorderedCompare(DST[63..0], SRC[63..0])
CASE (Result) OF
UNORDERED: ZF, PF, CF ← 111;
GREATER_THAN: ZF, PF, CF ← 000;
LESS_THAN: ZF, PF, CF ← 001;
EQUAL: ZF, PF, CF ← 100;
END
OF, AF, SF ← 0;
SSE2 Shuffle and Unpack Instructions
Instruction Mnemonic Operands Description Symbolic operations
Shuffle Packed Dwords PSHUFD xmm1, xmm2/m128, imm8
Moves double words from the source (second) operand and inserts them in the destination (first) operand at locations selected with the order (third) operand.
DST[31..0] ← (SRC SHR (ORDER[1..0]*32))[31..0]
DST[63..32] ← (SRC SHR (ORDER[3..2]*32))[31..0]
DST[95..64] ← (SRC SHR (ORDER[5..4]*32))[31..0]
DST[127..96] ← (SRC SHR (ORDER[7..6]*32))[31..0]
Shuffle Packed Low Words PSHUFLW xmm1, xmm2/m128, imm8
Moves the words from the low quad word of the source (second) operand and inserts them into the low quad word of the destination (first) operand at locations selected with the order (third) operand.
DST[15..0] ← (SRC SHR (ORDER[1..0]*16))[15..0]
DST[31..16] ← (SRC SHR (ORDER[3..2]*16))[15..0]
DST[47..32] ← (SRC SHR (ORDER[5..4]*16))[15..0]
DST[63..48] ← (SRC SHR (ORDER[7..6]*16))[15..0]
DST[127..64] ← SRC[127..64]
Shuffle Packed High Words PSHUFHW xmm1, xmm2/m128, imm8 Moves the words from the high quad word of the source (second) operand and inserts them in the high quad word of the destination (first) operand at locations selected with the order (third) operand. The low quad word is copied unchanged.
DST[63..0] ← SRC[63..0]
DST[79..64] ← (SRC SHR (ORDER[1..0]*16))[79..64]
DST[95..80] ← (SRC SHR (ORDER[3..2]*16))[79..64]
DST[111..96] ← (SRC SHR (ORDER[5..4]*16))[79..64]
DST[127..112] ← (SRC SHR (ORDER[7..6]*16))[79..64]
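The two word shuffles differ only in which half of the register they permute. A minimal Python sketch, assuming the register is given as a list of eight words with word 0 (bits 15..0) first; the function names are hypothetical:

```python
def pshuflw(src, order):
    """Permute words 0..3 by the order byte; copy words 4..7 unchanged."""
    low = [src[(order >> (2 * i)) & 0b11] for i in range(4)]
    return low + list(src[4:8])

def pshufhw(src, order):
    """Copy words 0..3 unchanged; permute words 4..7 by the order byte."""
    high = [src[4 + ((order >> (2 * i)) & 0b11)] for i in range(4)]
    return list(src[0:4]) + high
```

Each 2-bit field of the order byte indexes within the selected quad word only, so a word can never move from one half of the register to the other.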
Unpack Low Packed Doubles UNPCKLPD xmm1, xmm2/m128 Unpacks and interleaves the low double-precision floating-point values from the low quad words of the source (second) operand and the destination (first) operand.
DST[63..0] ← DST[63..0]
DST[127..64] ← SRC[63..0]
Diagram: DST = (X1, X0), SRC = (Y1, Y0); result DST = (Y0, X0).
Unpack High Packed Doubles UNPCKHPD xmm1, xmm2/m128 Unpacks and interleaves the high double-precision floating-point values from the high quad words of the source (second) operand and the destination (first) operand.
DST[63..0] ← DST[127..64]
DST[127..64] ← SRC[127..64]
Diagram: DST = (X1, X0), SRC = (Y1, Y0); result DST = (Y1, X1).
Unpack Low Data PUNPCKLQDQ xmm1, xmm2/m128 Unpacks and interleaves the low-order quad words from xmm1 and xmm2/m128 into the xmm1 register.
DST[63..0] ← DST[63..0]
DST[127..64] ← SRC[63..0]
Unpack High Data PUNPCKHQDQ xmm1, xmm2/m128 Unpacks and interleaves the high-order quad words from xmm1 and xmm2/m128 into the xmm1 register.
DST[63..0] ← DST[127..64]
DST[127..64] ← SRC[127..64]
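The four unpack instructions above reduce to two data movements on 64-bit elements. The following Python sketch assumes each register is a pair `(low, high)`; the names `unpack_low` and `unpack_high` are hypothetical and stand for UNPCKLPD/PUNPCKLQDQ and UNPCKHPD/PUNPCKHQDQ respectively.

```python
def unpack_low(dst, src):
    """DST[63..0] <- DST[63..0]; DST[127..64] <- SRC[63..0]."""
    return (dst[0], src[0])

def unpack_high(dst, src):
    """DST[63..0] <- DST[127..64]; DST[127..64] <- SRC[127..64]."""
    return (dst[1], src[1])
```

With DST = (X0, X1) and SRC = (Y0, Y1), `unpack_low` yields (X0, Y0) and `unpack_high` yields (X1, Y1).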
Convert Packed Doubles to Packed Integers CVTPD2PI mm, xmm/m128 Converts two packed double-precision floating-point values from xmm/m128 to two packed signed double-word integers in mm.
DST[31..0] ← DoubleToInt (SRC[63..0]);
DST[63..32] ← DoubleToInt (SRC[127..64])
Convert with Truncation Packed Doubles to Packed Integers CVTTPD2PI mm, xmm/m128 Converts two packed double-precision floating-point values from xmm/m128 to two packed signed double-word integers in mm using truncation.
DST[31..0] ← TruncateDoubleToInt (SRC[63..0]);
DST[63..32] ← TruncateDoubleToInt (SRC[127..64])
Convert Packed Doubles to Packed Dwords CVTPD2DQ xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed signed double-word integers in xmm1. The high quad word of xmm1 is cleared.
DST[31..0] ← DoubleToInt (SRC[63..0]);
DST[63..32] ← DoubleToInt (SRC[127..64]);
DST[127..64] ← 0000000000000000H
Convert with Truncation Packed Doubles to Packed Dwords CVTTPD2DQ xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed signed double-word integers in xmm1 using truncation. The high quad word of xmm1 is cleared.
DST[31..0] ← TruncateDoubleToInt (SRC[63..0]);
DST[63..32] ← TruncateDoubleToInt (SRC[127..64]);
DST[127..64] ← 0000000000000000H
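The difference between DoubleToInt and TruncateDoubleToInt can be illustrated with a small Python model. It assumes the default MXCSR rounding mode (round to nearest, ties to even) and a masked invalid-operation exception, in which case out-of-range inputs and NaNs produce the integer indefinite value 80000000H. The function names are hypothetical.

```python
import math

INT_INDEFINITE = -2**31  # bit pattern 80000000H as a signed double word

def double_to_int(x, truncate=False):
    """Convert one double to a signed 32-bit integer."""
    if math.isnan(x) or math.isinf(x):
        return INT_INDEFINITE
    # int() truncates toward zero; round() rounds to nearest, ties to even.
    n = int(x) if truncate else round(x)
    if n < -2**31 or n > 2**31 - 1:
        return INT_INDEFINITE
    return n

def cvtpd2dq(src, truncate=False):
    """src = [low double, high double]; returns the four DST double words."""
    return [double_to_int(src[0], truncate), double_to_int(src[1], truncate), 0, 0]
```

For the input pair (2.5, -1.7) the rounding form gives (2, -2) and the truncating form gives (2, -1); in both cases the upper two double words are zero.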
Convert Packed Dwords to Packed Doubles CVTDQ2PD xmm1, xmm2/m64 Converts two packed signed double-word integers from xmm2/m64 to two packed double-precision floating-point values in xmm1.
DST[63..0] ← IntToDouble(SRC[31..0]);
DST[127..64] ← IntToDouble(SRC[63..32])
Convert Packed Singles to Packed Doubles CVTPS2PD xmm1, xmm2/m64 Converts two packed single-precision floating-point values from xmm2/m64 to two packed double-precision floating-point values in xmm1.
DST[63..0] ← SingleToDouble (SRC[31..0]);
DST[127..64] ← SingleToDouble (SRC[63..32])
Convert Packed Doubles to Packed Singles CVTPD2PS xmm1, xmm2/m128 Converts two packed double-precision floating-point values from xmm2/m128 to two packed single-precision floating-point values in xmm1. The high quad word of xmm1 is cleared.
DST[31..0] ← DoubleToSingle (SRC[63..0]);
DST[63..32] ← DoubleToSingle (SRC[127..64]);
DST[127..64] ← 0000000000000000H
Convert Scalar Single to Scalar Double CVTSS2SD xmm1, xmm2/m32 Converts one scalar single-precision floating-point value from xmm2/m32 to one double-precision floating-point value in xmm1.
DST[63..0] ← SingleToDouble (SRC[31..0]);
// DST[127..64] remains unchanged
Convert Scalar Double to Scalar Single CVTSD2SS xmm1, xmm2/m64 Converts one scalar double-precision floating-point value from xmm2/m64 to one single-precision floating-point value in xmm1.
DST[31..0] ← DoubleToSingle (SRC[63..0]);
// DST[127..32] remains unchanged
Convert Scalar Double to Scalar Integer CVTSD2SI r32, xmm/m64 Converts one scalar double-precision floating-point value from xmm/m64 to one signed double-word integer in r32.
DST ← DoubleToInt (SRC[63..0]);
Convert with Truncation Scalar Double to Scalar Integer CVTTSD2SI r32, xmm/m64 Converts one scalar double-precision floating-point value from xmm/m64 to one signed double-word integer in r32 using truncation.
DST ← TruncateDoubleToInt (SRC[63..0]);
Convert Scalar Integer to Scalar Double CVTSI2SD xmm, r/m32 Converts one signed double-word integer from r/m32 to one scalar double-precision floating-point value in xmm.
DST[63..0] ← IntToDouble (SRC);
// DST[127..64] remains unchanged
PADDQ xmm1, xmm2/m128 Adds 2 quad word integers from the source (second) operand to 2 quad word integers in the destination (first) operand.
DST[63..0] ← DST[63..0] + SRC[63..0]
DST[127..64] ← DST[127..64] + SRC[127..64]
Subtract Packed Quad-word Integers PSUBQ mm1, mm2/m64 Subtracts the quad word integer in the source (second) operand from the quad word integer in the destination (first) operand.
DST[63..0] ← DST[63..0] - SRC[63..0]
PSUBQ xmm1, xmm2/m128 Subtracts 2 quad word integers in the source (second) operand from 2 quad word integers in the destination (first) operand.
DST[63..0] ← DST[63..0] - SRC[63..0]
DST[127..64] ← DST[127..64] - SRC[127..64]
Multiply Packed Unsigned Double-word Integers PMULUDQ mm1, mm2/m64 Multiplies the low unsigned double word integer from the destination (first) operand by the low unsigned double word integer from the source (second) operand and stores the quad word result in the destination (first) operand.
DST[63..0] ← DST[31..0] * SRC[31..0]
PMULUDQ xmm1, xmm2/m128 Multiplies the low unsigned double word integer of each quad word in the destination (first) operand by the low unsigned double word integer of the corresponding quad word in the source (second) operand and stores the 2 quad word results in the destination (first) operand.
DST[63..0] ← DST[31..0] * SRC[31..0]
DST[127..64] ← DST[95..64] * SRC[95..64]
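PMULUDQ reads only the low double word of each quad word and ignores the upper half. A short Python sketch, assuming each operand is a list of 64-bit quad words with the low quad word first (one element for the MMX form, two for the XMM form); the function name is hypothetical:

```python
MASK32 = 0xFFFFFFFF

def pmuludq(dst, src):
    """Multiply the low 32 bits of each quad word; each product is a full 64-bit value."""
    return [(d & MASK32) * (s & MASK32) for d, s in zip(dst, src)]
```

The product of two 32-bit unsigned values always fits in 64 bits, so no truncation or saturation is needed.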
Packed Shift Left Logical Double Quad-word PSLLDQ xmm1, imm8 Shifts xmm1 left by imm8 bytes while shifting in 0s.
TMP ← Count;
IF (TMP > 15) THEN TMP ← 16;
DST ← DST SHL (TMP * 8)
Packed Shift Right Logical Double Quad-word PSRLDQ xmm1, imm8 Shifts xmm1 right by imm8 bytes while shifting in 0s.
TMP ← Count;
IF (TMP > 15) THEN TMP ← 16;
DST ← DST SHR (TMP * 8)
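The byte-granular shifts can be modelled by treating the 128-bit register as a Python integer. This is a sketch of the symbolic operations above; the function names and the integer representation are assumptions.

```python
MASK128 = (1 << 128) - 1

def pslldq(dst, count):
    """Shift a 128-bit value left by count bytes, shifting in zeros."""
    tmp = 16 if count > 15 else count
    return (dst << (tmp * 8)) & MASK128

def psrldq(dst, count):
    """Shift a 128-bit value right by count bytes, shifting in zeros."""
    tmp = 16 if count > 15 else count
    return (dst & MASK128) >> (tmp * 8)
```

Any count above 15 clears the register, because the whole 128-bit value is shifted out.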
System Instructions
Instruction Mnemonic Operands Description Symbolic operations
Load Global Descriptor Table Register LGDT m16&32 Loads the values from memory into the global descriptor table register (GDTR).
GDTR ← SRC
Store Global Descriptor Table Register SGDT m16&32 Stores the contents of the global descriptor table register (GDTR) to memory.
DST ← GDTR
Load Interrupt Descriptor Table Register LIDT m16&32 Loads the values from memory into the interrupt descriptor table register (IDTR).
IDTR ← SRC
Store Interrupt Descriptor Table Register SIDT m16&32 Stores the contents of the interrupt descriptor table register (IDTR) to memory.
DST ← IDTR
Load Local Descriptor Table Register LLDT r/m16; r32 Loads the segment selector from the register or memory into the local descriptor table register (LDTR).
LDTR ← SRC
Store Local Descriptor Table Register SLDT r/m16; r32 Stores the segment selector from the local descriptor table register (LDTR) to the register or memory.
DST ← LDTR
Load Machine Status Word LMSW r/m16; r32 Loads the machine status word from the register or memory.
MSW ← SRC
Store Machine Status Word SMSW r/m16; r32 Stores the machine status word to the register or memory.
DST ← MSW
Load Task Register LTR r/m16 Loads the source operand into the segment selector field of the task register.
TR ← SRC
Store Task Register STR r/m16 Stores the segment selector field of the task register into the destination operand.
DST ← TR
Move to/from Control Registers MOV CR0, r32 Moves r32 to CR0.
MOV CR2, r32 Moves r32 to CR2.
MOV CR3, r32 Moves r32 to CR3.
MOV CR4, r32 Moves r32 to CR4.
MOV r32, CR0 Moves CR0 to r32.
MOV r32, CR2 Moves CR2 to r32.
MOV r32, CR3 Moves CR3 to r32.
MOV r32, CR4 Moves CR4 to r32.
DST ← SRC
Clear Task-Switch CLTS Clears the TS flag in the CR0 register. This flag is set every time a task CR0.TS ←0;
switch occurs.
Adjust RPL Field of Segment Selector ARPL r/m16, r16 Compares the RPL fields of the two segment selectors. The first (destination) operand contains one segment selector and the second (source) operand contains the other. If the RPL field of the first operand is less than the RPL field of the second operand, it is raised to match and ZF is set; otherwise ZF is cleared.
IF DST.RPL < SRC.RPL THEN
ZF ← 1;
DST.RPL ← SRC.RPL;
ELSE
ZF ← 0;
END;
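The ARPL adjustment can be sketched in Python, using the fact that the RPL occupies bits 1..0 of a 16-bit segment selector. The function name `arpl` and the tuple return value are illustrative assumptions.

```python
RPL_MASK = 0b11

def arpl(dst, src):
    """Return (new DST selector, ZF) after adjusting DST.RPL against SRC.RPL."""
    if (dst & RPL_MASK) < (src & RPL_MASK):
        # Raise the destination RPL to the source RPL; leave index and TI bits alone.
        return (dst & 0xFFFC) | (src & RPL_MASK), 1
    return dst, 0
```

For example, a selector 0008H (RPL 0) checked against a caller selector 0023H (RPL 3) becomes 000BH with ZF set.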
Load Access Rights Byte LAR r16, r/m16; r32, r/m32 Loads the access rights from the segment descriptor specified by the source operand into the destination operand and sets the ZF flag.
Load Segment Limit LSL r16, r/m16; r32, r/m32 Loads the unscrambled segment limit from the segment descriptor specified by the source operand into the destination operand and sets the ZF flag.
Verify a Segment for Reading VERR r/m16 Sets ZF=1 if the segment specified with r/m16 can be read.
Verify a Segment for Writing VERW r/m16 Sets ZF=1 if the segment specified with r/m16 can be written.
Move to/from Debug Registers MOV r32, DR0-DR7 Moves a debug register to r32.
MOV DR0-DR7, r32 Moves r32 to a debug register.
DST ← SRC
Invalidate Cache INVD Flushes internal caches; initiates flushing of external caches.
Write Back and Invalidate Data Cache WBINVD Writes back and flushes internal caches; initiates writing-back and flushing of external caches.
Invalidate TLB Entry INVLPG Invalidates (flushes) the translation look-aside buffer (TLB) entry specified with the source operand.
Assert LOCK# Signal Prefix LOCK (prefix) Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.
Halt HLT Stops instruction execution and places the processor in a HALT state. An
enabled interrupt (including NMI and SMI), a debug exception, the
BINIT# signal, the INIT# signal or the RESET# signal will resume
execution.
Resume from System Management Mode RSM Returns program control from system management mode (SMM) to the application program or operating-system procedure that was interrupted when the processor received a system management interrupt (SMI).
Read from Model-Specific Register RDMSR Reads the contents of the 64-bit model-specific register (MSR) specified in the ECX register into registers EDX:EAX.
Write to Model-Specific Register WRMSR Writes the contents of registers EDX:EAX into the 64-bit model-specific register (MSR) specified in the ECX register.
Read Performance Monitoring Counters RDPMC Reads the contents of the 40-bit performance-monitoring counter (PMC) specified in the ECX register into registers EDX:EAX.
Read Time-Stamp Counter RDTSC Reads the current value of the processor's time-stamp counter into registers EDX:EAX.
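RDMSR, RDPMC and RDTSC all return a 64-bit result split across EDX:EAX, with EDX holding the more significant half, and WRMSR expects its input in the same form. A minimal Python sketch of combining and splitting such a pair; the helper names are hypothetical:

```python
MASK32 = 0xFFFFFFFF

def combine_edx_eax(edx, eax):
    """Build the 64-bit value held in EDX:EAX (EDX = high 32 bits, EAX = low 32 bits)."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

def split_edx_eax(value):
    """Split a 64-bit value into the (EDX, EAX) pair, as needed before WRMSR."""
    return (value >> 32) & MASK32, value & MASK32
```

Two time-stamp readings combined this way can be subtracted directly to obtain an elapsed tick count.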
Fast System Call SYSENTER Executes a fast call to a privilege level 0 system procedure or routine.
Fast Return from Fast System Call SYSEXIT Executes a fast return to privilege level 3 user code.