AT&T Syntax MMX

The document provides information about MMX instructions including: 1. It discusses AT&T syntax vs Intel syntax for MMX instructions and defines the order of operations. 2. It describes various MMX instructions categorized as arithmetic, comparison, conversion, logical, shift, data transfer, and state management. For each category it lists the mnemonics and number of opcodes and provides a brief description. 3. It provides examples of using specific MMX instructions like PMADDWD for dot product calculation and bitmasking for compositing images.

CS220

April 25, 2007

AT&T syntax MMX
• Most MMX documents are in Intel Syntax
OPERATION DEST, SRC
• We use AT&T Syntax
OPERATION SRC, DEST
• Always remember:
DEST = DEST OPERATION SRC
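(Please note: the reversed operand direction of some x87 FP subtraction and
division instructions is a historical quirk of AT&T-derived assemblers that
GNU as keeps for compatibility; it is not part of this rule)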
Multiplication
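• Except for multiplication, conversion, and comparison, all other MMX
instructions are straightforward.
• PMADDWD mm/m64, mm
Multiply four pairs of signed words, then add adjacent doubleword products
(four words -> two doublewords)
• PMULHW mm/m64, mm
Signed word multiply; of each doubleword product, keep the high word
• PMULLW mm/m64, mm
Word multiply; of each doubleword product, keep the low word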
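The three multiply forms can be modeled one lane at a time in plain C. This is only a scalar sketch of the packed behavior, not MMX code; the function names are made up for illustration, and the results of the two word multiplies are returned as raw 16-bit patterns.

```c
#include <stdint.h>

/* One lane of PMULLW: low 16 bits of the signed 32-bit product. */
uint16_t pmullw_lane(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(p & 0xFFFFu);
}

/* One lane of PMULHW: high 16 bits of the signed 32-bit product. */
uint16_t pmulhw_lane(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(p >> 16);
}

/* One doubleword of PMADDWD: two signed word products added together. */
int32_t pmaddwd_pair(int16_t a0, int16_t b0, int16_t a1, int16_t b1) {
    /* 64-bit sum keeps the single overflowing case (all inputs -32768) defined in C */
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    return sum > INT32_MAX ? INT32_MIN : (int32_t)sum; /* hardware wraps to 0x80000000 */
}
```

For example, 300 * 300 = 90000 = 0x00015F90, so the high word is 0x0001 and the low word is 0x5F90.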

Conversion
• PACKSSDW mm/m64, mm
doubleword->word (signed saturation)
• PACKSSWB mm/m64, mm
• PACKUSWB mm/m64, mm
word->byte (signed / unsigned saturation)
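The pack instructions narrow each field with saturation rather than truncation. The following is a scalar C sketch of that clamping, with hypothetical helper names; the array function models the AT&T form PACKSSDW src, dst, where the destination's doublewords fill the low two result words and the source's fill the high two.

```c
#include <stdint.h>

/* Signed saturation of a doubleword to a word, as PACKSSDW does per field. */
int16_t sat_s16(int32_t x) {
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* Unsigned saturation of a signed word to a byte, as PACKUSWB does per field. */
uint8_t sat_u8(int16_t x) {
    if (x > 255) return 255;
    if (x < 0) return 0;
    return (uint8_t)x;
}

/* Model of PACKSSDW src, dst: index 0 is the least significant field. */
void packssdw(const int32_t src[2], const int32_t dst[2], int16_t out[4]) {
    out[0] = sat_s16(dst[0]);
    out[1] = sat_s16(dst[1]);
    out[2] = sat_s16(src[0]);
    out[3] = sat_s16(src[1]);
}
```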
How to do interleave pack?
• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0
(interleave the low end 16-bit values of the operands)
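• PUNPCKHBW mm/m64, mm
The low halves of the original 64-bit operands are ignored
Each result word = one source byte (high byte) + one destination byte (low byte)
• PUNPCKLBW mm/m64/m32, mm
The high halves of the original 64-bit operands are ignored
Each result word = one source byte (high byte) + one destination byte (low byte)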
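The rule byte_src+byte_dst=word_dst can be written out in C. This is a scalar sketch with invented names, assuming index 0 is the least significant field; it is a model of the instruction, not MMX code.

```c
#include <stdint.h>

/* One result word: source byte in the high half, destination byte in the low half. */
uint16_t unpack_lane(uint8_t src_byte, uint8_t dst_byte) {
    return (uint16_t)(((uint16_t)src_byte << 8) | dst_byte);
}

/* Model of PUNPCKLBW src, dst: uses bytes 0..3 of each operand. */
void punpcklbw(const uint8_t src[8], const uint8_t dst[8], uint16_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = unpack_lane(src[i], dst[i]);
}

/* Model of PUNPCKHBW src, dst: uses bytes 4..7 of each operand. */
void punpckhbw(const uint8_t src[8], const uint8_t dst[8], uint16_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = unpack_lane(src[i + 4], dst[i + 4]);
}
```

With a source of all zeros, each destination byte is simply zero-extended to a word, which is a common use of these instructions.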
How to do non-interleaved unpack?
• MOVQ %mm0, %mm2
• PUNPCKLDQ %mm1, %mm0
(replace the two high end words of mm0 with the two low end words of mm1;
leave the two low end words of mm0 in place)
• PUNPCKHDQ %mm1, %mm2
(move the two high end words of mm2 to the two low end words of mm2;
place the two high end words of mm1 in the two high end words of mm2)
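Comparison
• PCMPEQW mm/m64, mm
Each destination word becomes all 1s if it equals the source word, else all 0s
• PCMPGTW mm/m64, mm
Each destination word becomes all 1s if it is greater than the source word
(signed compare), else all 0s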
Rule of Thumb
• Only shift instructions can take an immediate operand
• Only the movd instruction can take a 32-bit register
• The punpckl instructions can take a 32-bit memory source
• All other instructions deal with 64-bit registers or memory. No immediate
operands!
Constant numbers
• Generate a zero in mm0:
PXOR %mm0, %mm0 (or PANDN %mm0, %mm0)
• Generate all 1's in register mm1, which is -1 in each of the packed data type fields:
PCMPEQB %mm1, %mm1 (PCMPEQW or PCMPEQD work equally well)
• Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
PXOR %mm0, %mm0
PCMPEQB %mm1, %mm1
PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)
• Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
PCMPEQW %mm1, %mm1
PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)
• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
PCMPEQW %mm1, %mm1
PSLLW $n, %mm1 (PSLLD $n, %mm1)
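The arithmetic behind these recipes can be checked on a single 16-bit field in C. This is a sketch under the assumption that the exponent in the two signed constants is n (2 to the power n); the function names are illustrative and the results are returned as raw bit patterns.

```c
#include <stdint.h>

/* All 1s shifted right logically by (16 - n): 2^n - 1, for 1 <= n <= 16. */
uint16_t const_pow2_minus1(unsigned n) {
    return (uint16_t)(0xFFFFu >> (16 - n));
}

/* All 1s shifted left by n: the bit pattern of -2^n, for 0 <= n <= 15. */
uint16_t const_neg_pow2(unsigned n) {
    return (uint16_t)(0xFFFFu << n);
}

/* Zero minus all 1s: 0 - (-1) = 1 in every field. */
uint16_t const_one(void) {
    uint16_t zero = 0;
    uint16_t ones = 0xFFFF;
    return (uint16_t)(zero - ones);
}
```

For n = 3 the left shift gives 0xFFF8, which is -8 when read as a signed word.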
Examples
• absolute value of a vector of signed words
movq %mm0, %mm1 # make a copy of the source data
psraw $15, %mm0 # replicate the sign bit: each field becomes all 0s or all 1s
pxor %mm0, %mm1 # XOR with all 0s leaves a field unchanged; XOR with all 1s gives NOT(field)
psubsw %mm0, %mm1 # add 1 to just the negative fields (result in %mm1)
After psraw, each field of %mm0 is all 0s (0) or all 1s (-1)
For a non-negative number, psubsw subtracts 0
For a negative number, psubsw subtracts -1, i.e. adds 1
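The same three steps can be traced on one signed word in C. This is a scalar sketch with a made-up function name, assuming the final subtract is the saturating word form (psubsw), so that -32768 clamps to 32767 instead of wrapping.

```c
#include <stdint.h>

int16_t abs_word(int16_t x) {
    int16_t mask = (x < 0) ? -1 : 0;  /* psraw $15: all 1s for negative, else all 0s */
    int32_t flipped = x ^ mask;       /* pxor: NOT(x) for negative, x otherwise */
    int32_t diff = flipped - mask;    /* subtract -1 (add 1) for negative, 0 otherwise */
    if (diff > 32767)                 /* signed saturation of the subtract */
        diff = 32767;
    return (int16_t)diff;
}
```

For x = -5 the mask is -1, the XOR gives 4, and 4 - (-1) = 5.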
Dot Product
#include <stdio.h>

int main(void)
{
    int i;
    int result;
    unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

    /* clear the accumulator; basic asm (no operands) uses a single % */
    __asm__ __volatile__("pxor %mm7,%mm7");

    for (i = 0; i < (int)(sizeof(a)/sizeof(short)); i += 4) {
        /* four word products, added pairwise into two doublewords */
        __asm__ __volatile__("movq %0,%%mm0\n\t"
                "movq %1,%%mm1\n\t"
                "pmaddwd %%mm1,%%mm0\n\t"
                "paddd %%mm0,%%mm7"
                :
                : "m" (a[i]), "m" (b[i])
                );
    }
    __asm__ __volatile__("movq %%mm7,%%mm0\n\t"
            "psrlq $32,%%mm0\n\t"    /* bring the high doubleword down */
            "paddd %%mm7,%%mm0\n\t"  /* add the two partial sums */
            "movd %%mm0,%0\n\t"      /* movd moves the lower 32 bits of mm0 */
            "emms"
            : "=m" (result)
            );
    printf("dot product: %d\n", result);
    return 0;
}
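A plain C reference for the same computation is handy for checking the MMX result. The sketch below mirrors the structure of the loop above: two partial sums stand in for the low and high doublewords of the accumulator register, and they are added at the end. The function names are illustrative, and n is assumed to be a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

int32_t dot_words(const int16_t *a, const int16_t *b, size_t n) {
    int32_t lo = 0;  /* models the low doubleword of the accumulator */
    int32_t hi = 0;  /* models the high doubleword of the accumulator */
    for (size_t i = 0; i + 3 < n; i += 4) {
        lo += a[i] * b[i] + a[i + 1] * b[i + 1];
        hi += a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    }
    return lo + hi;  /* the final shift-and-add step */
}

/* Same data as the slide's arrays. */
int32_t dot_demo(void) {
    static const int16_t a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const int16_t b[] = {2, 4, 6, 8, 10, 12, 14, 16};
    return dot_words(a, b, sizeof a / sizeof a[0]);
}
```

For these arrays the partial sums are 132 and 276, so the dot product is 408.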
Weathercaster
• PCMPEQ (packed compare for equality) is performed on the weathercaster and
blue-screen images, yielding a bitmask that traces the outline of the
weathercaster: all 1s where a pixel matches the blue screen, all 0s elsewhere.
• This bitmask image is PANDNed (packed and not) with the weathercaster image,
yielding the first intermediate image: now the weathercaster has no
background behind her.
• The same bitmask image is PANDed (packed and) with the weather map image,
yielding the second intermediate image.
• The two intermediate images are PORed (packed or) together, resulting in the
final composite of the weathercaster over the weather map.
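The four steps reduce to a few bitwise operations per pixel. Here is a scalar C sketch for one 8-bit pixel; the function and parameter names are invented for illustration, and a real image would use one value per color channel and process eight bytes per instruction.

```c
#include <stdint.h>

/* fg: weathercaster pixel, bg: weather map pixel, key: blue-screen value */
uint8_t composite_pixel(uint8_t fg, uint8_t bg, uint8_t key) {
    uint8_t mask = (fg == key) ? 0xFF : 0x00;  /* compare for equality */
    uint8_t person = (uint8_t)(~mask & fg);    /* and-not: keep only the weathercaster */
    uint8_t backdrop = (uint8_t)(mask & bg);   /* and: keep the map where the screen was */
    return (uint8_t)(person | backdrop);       /* or: merge the two intermediate images */
}
```

A pixel equal to the key color comes out as the map pixel; any other pixel passes through unchanged.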
Address or Content?
.section .rodata
mybytes:
.byte 'a','b','c','d','e','f','g','h'
mystr:
.ascii "abcdefghijklmnopqrstuvwxyz"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
movl mybytes, %eax          # content: 0x64636261 == "abcd"
movl $mybytes, %ebx         # address of mybytes
movl (mybytes), %edx        # content; the parentheses change nothing
movl (%ebx), %edx           # content, loaded through the address in %ebx
xorl %ecx, %ecx
movl $mystr, %ebx           # address of mystr
movq (%ebx,%ecx,8),%mm0     # content: 0x6867666564636261
leal mystr, %ebx            # address of mystr
movq (%ebx,%ecx,8),%mm1     # content
leal (mystr), %ebx          # address of mystr
movq (%ebx,%ecx,8),%mm2     # content
movq mystr(,%ecx,8),%mm3    # content
movq mystr,%mm4             # content
movq (mystr),%mm5           # content
subl $8, %esp
movq %mm0, (%esp)
emms                        # leave MMX state before returning
leave
ret
.size main, .-main
Content in %mm0-%mm5: 0x6867666564636261
Register view, high byte to low byte: 0x68 ('h') down to 0x61 ('a')
Memory view, low address to high address: "abcdefgh"
0x61 == 97 == 'a'
Misc
• Context Switching
– FP mode to MMX mode: 28 cycles
– MMX mode to FP mode: 53 cycles
FP_code:
…...
……
MMX_code:
…...
EMMS (*mark the FP tag word as empty*)
FP_code 1:
…...
…...
• Also FNSAVE and FRSTOR
MMX Instruction Set
Category       Mnemonic            Opcodes  Description
Arithmetic     PADD[B,W,D]         3        Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]          2        Add signed with saturation on [byte, word]
               PADDUS[B,W]         2        Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]         3        Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]          2        Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]         2        Subtract unsigned with saturation on [byte, word]
               PMULHW              1        Packed multiply high on words
               PMULLW              1        Packed multiply low on words
               PMADDWD             1        Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]       3        Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]       3        Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB            1        Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]       2        Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]   3        Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]   3        Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                1        Bitwise AND
               PANDN               1        Bitwise AND NOT
               POR                 1        Bitwise OR
               PXOR                1        Bitwise XOR
Shift          PSLL[W,D,Q]         6        Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]         6        Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]           4        Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]            4        Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                1        Empty MMX state
