
AT&T Syntax MMX

The document provides information about MMX instructions including: 1. It discusses AT&T syntax vs Intel syntax for MMX instructions and defines the order of operations. 2. It describes various MMX instructions categorized as arithmetic, comparison, conversion, logical, shift, data transfer, and state management. For each category it lists the mnemonics and number of opcodes and provides a brief description. 3. It provides examples of using specific MMX instructions like PMADDWD for dot product calculation and bitmasking for compositing images.

Uploaded by

akirank1
Copyright
© Attribution Non-Commercial (BY-NC)

CS220

April 25, 2007


AT&T syntax MMX
• Most MMX documents are in Intel Syntax
OPERATION DEST, SRC
• We use AT&T Syntax
OPERATION SRC, DEST
• Always remember:
DEST = DEST OPERATION SRC
(Please note: the reversed operand direction of some FP subtraction and
division instructions in AT&T syntax is a historical assembler quirk that
gcc retains for compatibility)
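The rule DEST = DEST OPERATION SRC can be sketched in plain C. This is only a scalar model, not MMX code: the type mmx_words and the function psubw_model are hypothetical names introduced for illustration, and the example assumes the wrap-around subtraction PSUBW.

```c
#include <stdint.h>

/* Hypothetical model of one MMX register viewed as four 16-bit words. */
typedef struct { uint16_t w[4]; } mmx_words;

/* Models AT&T "psubw SRC, DEST": DEST = DEST - SRC in every field,
   with wrap-around. The source operand is left unchanged. */
void psubw_model(const mmx_words *src, mmx_words *dest)
{
    for (int i = 0; i < 4; i++)
        dest->w[i] = (uint16_t)(dest->w[i] - src->w[i]);
}
```

Calling psubw_model(&mm1, &mm0) corresponds to "psubw %mm1, %mm0": only mm0 changes.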
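Multiplication
• Except for multiplication, conversion, and comparison, all other MMX
instructions are straightforward.
• PMADDWD mm/m64, mm

Multiply packed words, then add adjacent pairs of doubleword products

• PMULHW mm/m64, mm

Doubleword->word, keep high part (high 16 bits of each signed product)

• PMULLW mm/m64, mm

Doubleword->word, keep low part (low 16 bits of each product)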
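How PMULHW and PMULLW split one 32-bit product can be sketched with a scalar model in plain C. The names mmx_words, pmullw_model, pmulhw_model and full_product are hypothetical; the sketch assumes signed 16-bit fields and a compiler that converts out-of-range values to signed types by wrapping, as gcc does.

```c
#include <stdint.h>

/* Hypothetical model of one MMX register viewed as four signed words. */
typedef struct { int16_t w[4]; } mmx_words;

/* Models "pmullw SRC, DEST": keep the low 16 bits of each product. */
void pmullw_model(const mmx_words *src, mmx_words *dest)
{
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)dest->w[i] * src->w[i];
        dest->w[i] = (int16_t)(uint16_t)((uint32_t)p & 0xFFFFu);
    }
}

/* Models "pmulhw SRC, DEST": keep the high 16 bits of each signed product. */
void pmulhw_model(const mmx_words *src, mmx_words *dest)
{
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)dest->w[i] * src->w[i];
        dest->w[i] = (int16_t)(uint16_t)((uint32_t)p >> 16);
    }
}

/* Glue a high half and a low half back into the full 32-bit product. */
int32_t full_product(int16_t hi, int16_t lo)
{
    uint32_t bits = ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
    return (int32_t)bits;
}
```

Running both models on copies of the same data and recombining the halves recovers the full products, which is why the two instructions are usually used as a pair.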


Conversion
• PACKSSDW mm/m64, mm
doubleword->word (signed saturation)

• PACKSSWB mm/m64, mm
• PACKUSWB mm/m64, mm
word->byte (signed / unsigned saturation)
How to do interleave pack?
• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0
(interleave the low-end 16-bit values of the operands)
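The three-instruction interleave pack can be traced with a scalar model in plain C. All function names here are hypothetical, and registers are modeled as small arrays with index 0 as the low-end field.

```c
#include <stdint.h>

/* Signed saturation of a doubleword to the 16-bit range. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Models "packssdw SRC, DEST": low two words come from DEST,
   high two words come from SRC, each saturated. */
void packssdw_model(const int32_t src[2], const int32_t dest[2], int16_t out[4])
{
    out[0] = sat16(dest[0]);
    out[1] = sat16(dest[1]);
    out[2] = sat16(src[0]);
    out[3] = sat16(src[1]);
}

/* Models "punpcklwd SRC, DEST": interleave the low two words,
   DEST word first, then SRC word. */
void punpcklwd_model(const int16_t src[4], const int16_t dest[4], int16_t out[4])
{
    out[0] = dest[0];
    out[1] = src[0];
    out[2] = dest[1];
    out[3] = src[1];
}

/* packssdw %mm0,%mm0 ; packssdw %mm1,%mm1 ; punpcklwd %mm1,%mm0 */
void interleave_pack(const int32_t mm0[2], const int32_t mm1[2], int16_t out[4])
{
    int16_t p0[4], p1[4];
    packssdw_model(mm0, mm0, p0);
    packssdw_model(mm1, mm1, p1);
    punpcklwd_model(p1, p0, out);
}
```

With mm0 = {A0, A1} and mm1 = {B0, B1}, the result is {A0, B0, A1, B1}, whereas a single "packssdw %mm1, %mm0" would give {A0, A1, B0, B1}.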
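• PUNPCKHBW mm/m64, mm

Low parts of original 64 bits are ignored

byte_src+byte_dst=word_dst (each destination byte is paired with the
corresponding source byte to form one word: destination byte in the low
half, source byte in the high half)

• PUNPCKLBW mm/m32, mm

High parts of original 64 bits are ignored

byte_src+byte_dst=word_dst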
How to do non-interleaved unpack?
• MOVQ %mm0, %mm2
• PUNPCKLDQ %mm1, %mm0
(replace the two high-end words of mm0 with the two low-end words of mm1;
leave the two low-end words of mm0 in place)
• PUNPCKHDQ %mm1, %mm2
(move the two high-end words of mm2 to the two low-end words of mm2;
place the two high-end words of mm1 in the two high-end words of mm2)
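The non-interleaved unpack can be checked with a scalar model in plain C. The names mmx_dwords, punpckldq_model and punpckhdq_model are hypothetical; d[0] stands for the low doubleword (the two low-end words) of a register.

```c
#include <stdint.h>

/* Hypothetical model of one MMX register viewed as two doublewords. */
typedef struct { uint32_t d[2]; } mmx_dwords;

/* Models "punpckldq SRC, DEST": DEST keeps its low doubleword and
   receives the low doubleword of SRC in its high half. */
void punpckldq_model(const mmx_dwords *src, mmx_dwords *dest)
{
    dest->d[1] = src->d[0];
}

/* Models "punpckhdq SRC, DEST": DEST's high doubleword moves down and
   the high doubleword of SRC goes into the high half. */
void punpckhdq_model(const mmx_dwords *src, mmx_dwords *dest)
{
    dest->d[0] = dest->d[1];
    dest->d[1] = src->d[1];
}
```

After copying mm0 to mm2 and applying the two models, mm0 holds the low halves of both inputs and mm2 holds the high halves, with no interleaving inside a doubleword.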
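Comparison
• PCMPEQW mm/m64, mm
Each destination word becomes all 1's if it equals the source word,
otherwise all 0's

• PCMPGTW mm/m64, mm
Each destination word becomes all 1's if it is greater (signed) than the
source word, otherwise all 0's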
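The packed comparisons produce a mask in the destination rather than setting flags. A scalar model in plain C, with hypothetical names mmx_words, pcmpeqw_model and pcmpgtw_model, might look like this:

```c
#include <stdint.h>

/* Hypothetical model of one MMX register viewed as four signed words. */
typedef struct { int16_t w[4]; } mmx_words;

/* Models "pcmpeqw SRC, DEST": each field of DEST becomes all 1's (-1)
   where DEST == SRC and all 0's elsewhere. */
void pcmpeqw_model(const mmx_words *src, mmx_words *dest)
{
    for (int i = 0; i < 4; i++)
        dest->w[i] = (dest->w[i] == src->w[i]) ? -1 : 0;
}

/* Models "pcmpgtw SRC, DEST": each field of DEST becomes all 1's (-1)
   where DEST > SRC as signed words and all 0's elsewhere. */
void pcmpgtw_model(const mmx_words *src, mmx_words *dest)
{
    for (int i = 0; i < 4; i++)
        dest->w[i] = (dest->w[i] > src->w[i]) ? -1 : 0;
}
```

The resulting mask is typically fed to PAND, PANDN and POR to select fields without branching.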
Rule of Thumb
• Only shift instructions can take an immediate operand
• Only the movd instruction can take a 32-bit general-purpose register
• The punpckl instructions can take a 32-bit memory source
• All other instructions deal with 64-bit registers or memory. No
immediate operands!
Constant numbers
• Generate a zero in mm0:
PXOR %mm0, %mm0 (or: PANDN %mm0, %mm0)

• Generate all 1's in register mm1, which is -1 in each of the packed data type fields
(any of PCMPEQB/PCMPEQW/PCMPEQD works, since a register always equals itself):
PCMPEQB %mm1, %mm1

• Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
PXOR %mm0, %mm0
PCMPEQB %mm1, %mm1
PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)

• Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
PCMPEQB %mm1, %mm1
PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)

• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
PCMPEQB %mm1, %mm1
PSLLW $n, %mm1 (PSLLD $n, %mm1)
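The arithmetic behind these constant tricks can be checked on a single 16-bit field in plain C. The function names are hypothetical, and the sketch assumes n is between 1 and 15.

```c
#include <stdint.h>

/* pxor + pcmpeq + psubw on one word: 0 - (-1) = 1. */
uint16_t word_one(void)
{
    uint16_t zero = 0;           /* pxor %mm0, %mm0    */
    uint16_t ones = 0xFFFFu;     /* pcmpeqb %mm1, %mm1 */
    return (uint16_t)(zero - ones);
}

/* pcmpeq + psrlw $(16-n): shifting all 1's right leaves n one bits,
   which is 2^n - 1. */
uint16_t word_pow2_minus_1(unsigned n)
{
    return (uint16_t)(0xFFFFu >> (16u - n));
}

/* pcmpeq + psllw $n: shifting all 1's left clears the low n bits,
   which is -2^n when read as a signed word. */
uint16_t word_neg_pow2(unsigned n)
{
    return (uint16_t)(0xFFFFu << n);
}
```

For example, with n = 3 the logical right shift leaves 0x0007 and the left shift leaves 0xFFF8, which is -8 as a signed word.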
Examples
• absolute value of a vector of signed words
movq %mm0, %mm1 # make a copy of the source data
psraw $15, %mm0 # replicate the sign bit into every bit of each field
pxor %mm0, %mm1 # XOR with all 0's keeps a field; XOR with all 1's gives NOT(field)
psubsw %mm0, %mm1 # add 1 to just the negative fields; result is in %mm1
After psraw, each field in %mm0 is all 0's or all 1's
For a positive number, psubsw subtracts 0's (0)
For a negative number, psubsw subtracts 1's (-1)
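The same four steps can be followed on one signed word in plain C. This is a scalar sketch with hypothetical names (subs_sat16, abs_word_model), assuming the final subtraction is the saturating PSUBSW.

```c
#include <stdint.h>

/* Signed saturating subtraction on one word, as psubsw does per field. */
int16_t subs_sat16(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    if (d > INT16_MAX) return INT16_MAX;
    if (d < INT16_MIN) return INT16_MIN;
    return (int16_t)d;
}

/* One field of the absolute-value sequence. */
int16_t abs_word_model(int16_t x)
{
    int16_t mask = (x < 0) ? -1 : 0;      /* psraw $15: all 1's if negative */
    int16_t flipped = (int16_t)(x ^ mask); /* pxor: NOT(x) for negative x    */
    return subs_sat16(flipped, mask);      /* psubsw: subtract 0 or -1       */
}
```

Because the last step saturates, the most negative word -32768 maps to 32767 instead of wrapping back to -32768.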
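Dot Product
#include <stdio.h>

int main(void)
{
    int i;
    int result;
    unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

    /* Basic asm (no operands) uses a single %; clear the accumulator mm7. */
    __asm__("pxor %mm7,%mm7");

    /* Four 16-bit elements per iteration; pmaddwd treats them as signed. */
    for (i = 0; i < (int)(sizeof(a) / sizeof(a[0])); i += 4) {
        __asm__("movq %0,%%mm0\n\t"        /* load a[i..i+3] */
                "movq %1,%%mm1\n\t"        /* load b[i..i+3] */
                "pmaddwd %%mm1,%%mm0\n\t"  /* two sums of adjacent products */
                "paddd %%mm0,%%mm7"        /* accumulate into mm7 */
                :
                : "m" (a[i]), "m" (b[i])
        );
    }
    __asm__("movq %%mm7,%%mm0\n\t"
            "psrlq $32,%%mm0\n\t"          /* bring the high doubleword down */
            "paddd %%mm7,%%mm0\n\t"        /* add the two partial sums */
            "movd %%mm0,%0\n\t"            /* movd moves the lower 32 bits of mm0 */
            "emms"                         /* leave MMX mode before FP code runs */
            : "=m" (result)
    );
    printf("dot product: %d\n", result);
    return 0;
}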
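For comparison, the data flow of the MMX loop can be written as a portable scalar model in plain C. The function dot_product_model is a hypothetical name; the sketch assumes the element count is a multiple of four and that the sums do not overflow 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the pmaddwd/paddd loop followed by the final
   horizontal add. acc[] plays the role of the two doublewords in mm7. */
int32_t dot_product_model(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc[2] = {0, 0};

    for (size_t i = 0; i < n; i += 4) {
        /* pmaddwd: multiply four word pairs, add adjacent products */
        int32_t lo = (int32_t)a[i] * b[i] + (int32_t)a[i + 1] * b[i + 1];
        int32_t hi = (int32_t)a[i + 2] * b[i + 2] + (int32_t)a[i + 3] * b[i + 3];
        /* paddd: accumulate both doublewords */
        acc[0] += lo;
        acc[1] += hi;
    }
    /* psrlq $32 + paddd + movd: add the high half to the low half */
    return acc[0] + acc[1];
}
```

A model like this is handy as a reference result when testing the inline-assembly version on the same arrays.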
Weathercaster
• PCMPEQ (packed compare for equality) is performed on the weathercaster
and blue-screen images, yielding a bitmask that traces the outline of the
weathercaster.
• This bitmask image is PANDNed (packed and not) with the weathercaster
image, yielding the first intermediate image: now the weathercaster has no
background behind her.
• The same bitmask image is PANDed (packed and) with the weather map
image, yielding the second intermediate image.
• The two intermediate images are PORed (packed or) together, resulting in
the final composite of the weathercaster over the weather map.
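The four compositing steps can be sketched per byte in plain C. This is a scalar model, not MMX code: composite_model and its parameters are hypothetical, and a single byte value blue stands in for the blue-screen image.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-byte model of the pcmpeq / pandn / pand / por compositing steps. */
void composite_model(const uint8_t *caster, const uint8_t *map,
                     uint8_t blue, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* pcmpeqb: all 1's where the pixel is blue-screen background */
        uint8_t mask = (caster[i] == blue) ? 0xFF : 0x00;
        /* pandn: NOT(mask) AND caster keeps only the weathercaster */
        uint8_t fg = (uint8_t)(~mask & caster[i]);
        /* pand: mask AND map keeps the map where the background was */
        uint8_t bg = (uint8_t)(mask & map[i]);
        /* por: merge the two intermediate images */
        out[i] = (uint8_t)(fg | bg);
    }
}
```

Note that PANDN inverts its destination operand, so the bitmask has to be the destination of that step.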
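Address or Content?
    .section .rodata
mybytes:
    .byte 'a','b','c','d','e','f','g','h'
mystr:
    .ascii "abcdefghijklmnopqrstuvwxyz"
    .text
    .globl main
    .type main, @function
main:
    pushl %ebp
    movl %esp, %ebp
    movl mybytes, %eax         # content: 0x64636261 == "abcd"
    movl $mybytes, %ebx        # address of mybytes
    movl (mybytes), %edx       # content (same as movl mybytes, %edx)
    movl (%ebx), %edx          # content, through the address in %ebx
    xorl %ecx, %ecx            # index = 0
    movl $mystr, %ebx          # address of mystr
    movq (%ebx,%ecx,8),%mm0    # content: 0x6867666564636261
    leal mystr, %ebx           # address of mystr
    movq (%ebx,%ecx,8),%mm1    # content
    leal (mystr), %ebx         # address of mystr
    movq (%ebx,%ecx,8),%mm2    # content
    movq mystr(,%ecx,8),%mm3   # content
    movq mystr,%mm4            # content
    movq (mystr),%mm5          # content
    subl $8, %esp
    movq %mm0, (%esp)          # the lowest address holds 0x61 == 97 == 'a'
    leave
    ret
    .size main, .-main
Content in %eax and %edx: 0x64636261 == "abcd"
Content in %ebx: Address
Content in %mm0-%mm5: 0x6867666564636261 == "abcdefgh"
The register value 0x6867666564636261 is written from the H address byte
to the L address byte; the string "abcdefgh" in memory runs from the
L address to the H address.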
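The byte order seen in the registers can be reproduced in plain C. The helpers load_le32 and load_le64 are hypothetical names; they build the value explicitly in little-endian order, which is what movl and movq do on x86, and the expected values assume ASCII.

```c
#include <stdint.h>

/* Read 4 bytes the way movl does on x86: the byte at the lowest
   address becomes the least significant byte of the value. */
uint32_t load_le32(const unsigned char *p)
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Read 8 bytes the way movq does on x86. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```

Loading from the bytes "abcdefgh" therefore puts 'a' (0x61) in the lowest byte of the value, so it appears last when the value is written in hexadecimal.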
Misc
• Context Switching
– FP mode to MMX mode: 28 cycles
– MMX mode to FP mode: 53 cycles
FP_code:
…...
……
MMX_code:
…...
EMMS (*mark the FP tag word as empty*)
FP_code 1:
…...
…...
• Also FNSAVE and FRSTOR
MMX Instruction Set
Category       Mnemonic             Different Opcodes  Description
Arithmetic     PADD[B,W,D]          3                  Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]           2                  Add signed with saturation on [byte, word]
               PADDUS[B,W]          2                  Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]          3                  Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]           2                  Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]          2                  Subtract unsigned with saturation on [byte, word]
               PMULHW               1                  Packed multiply high on words
               PMULLW               1                  Packed multiply low on words
               PMADDWD              1                  Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]        3                  Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]        3                  Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB             1                  Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]        2                  Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]    3                  Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]    3                  Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                 1                  Bitwise AND
               PANDN                1                  Bitwise AND NOT
               POR                  1                  Bitwise OR
               PXOR                 1                  Bitwise XOR
Shift          PSLL[W,D,Q]          6                  Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]          6                  Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]            4                  Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]             4                  Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                 1                  Empty MMX state
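The difference between wrap-around, signed saturation and unsigned saturation in the arithmetic rows can be sketched on a single byte in plain C. The three function names are hypothetical models of one field of PADDB, PADDUSB and PADDSB.

```c
#include <stdint.h>

/* One field of paddb: the sum wraps around modulo 256. */
uint8_t paddb_model(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* One field of paddusb: the sum is clamped to the unsigned range 0..255. */
uint8_t paddusb_model(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (s > 255u) ? 255u : (uint8_t)s;
}

/* One field of paddsb: the sum is clamped to the signed range -128..127. */
int8_t paddsb_model(int8_t a, int8_t b)
{
    int s = (int)a + (int)b;
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return (int8_t)s;
}
```

Saturation is usually what image and audio code wants, since a clamped value stays bright or loud instead of wrapping to the opposite extreme.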
