Class04 X86assembly
Machine-Level Programming I:
Topics
Assembly Programmer’s Execution Model
Accessing Information
Registers
Memory
Arithmetic operations
IA32 Processors
Totally Dominate Computer Market
Evolutionary Design
Starting in 1978 with 8086 (really 1971 with 4004)
Added more features as time goes on
Still support old features, although obsolete
8080 1974 6K
Compatible at source level with 8008
Processor in first “kit” computers
Pricing caused it to beat similar processors with better
programming model
Motorola 6800
MOS Technology 6502
–3– CS 105
X86 Evolution:
Programmer’s View
Name Date Transistors
8086 1978 29K
16-bit processor. Basis for IBM PC & DOS
Limited to 1MB address space. DOS only gives you 640K
486 1989 1.9M
Pentium 1993 3.1M
Pentium/MMX 1997 4.5M
Added special collection of instructions for operating on 64-bit
vectors of 1-, 2-, or 4-byte integer data
Pentium III 1999 8.2M
Added “streaming SIMD” instructions for operating on 128-bit
vectors of 1-, 2-, or 4-byte integer or floating point data
New Species: IA64
Name Date Transistors
Assembly Programmer’s View
[Figure: the CPU (registers, EIP, condition codes) sends addresses to memory and exchanges data and instructions with it; memory holds object code, program data, OS data, and the stack.]
Programmer-Visible State
EIP (Program Counter)
Address of next instruction
Register File
Heavily used program data
Condition Codes
Store status information about most recent arithmetic operation
Used for conditional branching
Memory
Byte addressable array
Code, user data, (most) OS data
Includes stack used to support procedures
Turning C into Object Code
Code in files p1.c p2.c
Compile with command: gcc -O p1.c p2.c -o p
Use optimizations (-O)
Put resulting binary in file p
Assembly Characteristics
Minimal data types
Integer data of 1, 2, or 4 bytes
Data values
Addresses (untyped pointers)
Floating-point data of 4, 8, or 10 bytes
No aggregate types such as arrays or structures
Just contiguously allocated bytes in memory
Primitive operations
Perform arithmetic function on register or memory data
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Object Code
Code for sum
0x401040 <sum>:
0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
• Total of 13 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040
Assembler
Translates .s into .o
Binary encoding of each instruction
Nearly-complete image of executable code
Missing linkages between code in different files
Linker
Resolves references between files
Combines with static run-time libraries
E.g., code for malloc, printf
Some libraries are dynamically linked
Linking occurs when program begins execution
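As a hedged illustration, here is a C function that could plausibly compile to the 13 bytes listed for sum. The C source of sum does not appear in this copy of the slides, so the function body is an assumption inferred from the statement int t = x+y and from the disassembly; the names sum_bytes and sum_nbytes are made up for this sketch. The byte values are copied from the listing.

```c
#include <stddef.h>

/* Assumed source for sum (not shown in the slides). */
int sum(int x, int y)
{
    int t = x + y;
    return t;
}

/* The 13 object-code bytes from the listing, starting at 0x401040,
 * grouped one instruction per line. */
const unsigned char sum_bytes[] = {
    0x55,             /* push %ebp           */
    0x89, 0xe5,       /* mov  %esp,%ebp      */
    0x8b, 0x45, 0x0c, /* mov  0xc(%ebp),%eax */
    0x03, 0x45, 0x08, /* add  0x8(%ebp),%eax */
    0x89, 0xec,       /* mov  %ebp,%esp      */
    0x5d,             /* pop  %ebp           */
    0xc3              /* ret                 */
};

/* Number of bytes in the encoding of sum. */
size_t sum_nbytes(void)
{
    return sizeof sum_bytes;
}
```

Grouping the bytes by instruction makes the "1, 2, or 3 bytes each" claim easy to check by eye.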
Machine Instruction Example
C Code
int t = x+y;
Add two signed integers
Assembly
addl 8(%ebp),%eax
Add 2 4-byte integers
“Long” words in GCC parlance
Object Code
0x401046: 03 45 08
3-byte instruction
Stored at address 0x401046
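To see why this instruction takes three bytes, it may help to split the middle byte into its fields. The sketch below goes slightly beyond the slide: it assumes the standard IA32 layout in which the byte after the opcode (the ModRM byte) holds a 2-bit mode, a 3-bit register number, and a 3-bit register/memory number. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the byte that follows the opcode. */
typedef struct {
    unsigned mod;  /* addressing mode; 1 means "register + 8-bit displacement" */
    unsigned reg;  /* register operand; 0 is %eax                              */
    unsigned rm;   /* base register for the memory operand; 5 is %ebp          */
} ModRM;

/* Split a ModRM byte into its three bit fields. */
ModRM decode_modrm(uint8_t b)
{
    ModRM m;
    m.mod = (unsigned)(b >> 6);
    m.reg = (unsigned)((b >> 3) & 7);
    m.rm  = (unsigned)(b & 7);
    return m;
}
```

For the bytes 03 45 08, the opcode is 0x03 (add memory to register), decode_modrm(0x45) gives mod 1, reg 0, rm 5, and the final byte 0x08 is the displacement, which together read as addl 8(%ebp),%eax.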
Disassembling Object Code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi
Disassembler
objdump -d p
Useful tool for examining object code
Analyzes bit pattern of series of instructions
Produces approximate rendition of assembly code
Can be run on either a.out (complete executable) or .o file
Alternate Disassembly
Object
0x401040:
0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
Disassembled
0x401040 <sum>: push %ebp
0x401041 <sum+1>: mov %esp,%ebp
0x401043 <sum+3>: mov 0xc(%ebp),%eax
0x401046 <sum+6>: add 0x8(%ebp),%eax
0x401049 <sum+9>: mov %ebp,%esp
0x40104b <sum+11>: pop %ebp
0x40104c <sum+12>: ret
0x40104d <sum+13>: lea 0x0(%esi),%esi
Within gdb Debugger
gdb p
disassemble sum
Disassemble procedure
x/13b sum
Examine the 13 bytes starting at sum
What Can Be Disassembled?
% objdump -d WINWORD.EXE
No symbols in "WINWORD.EXE".
Disassembly of section .text:
30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91
Moving Data
movl Source,Dest:
Move 4-byte (“long”) word
Lots of these in typical code
Integer registers (figure): %eax, %edx, %ecx, %ebx, %esi, %edi, %esp, %ebp
Operand Types
Immediate: Constant integer data
Like C constant, but prefixed with ‘$’
E.g., $0x400, $-533
Encoded with 1, 2, or 4 bytes
Register: One of 8 integer registers
But %esp and %ebp reserved for special use
Others have special uses for particular instructions
Memory: 4 consecutive bytes of memory
Various “address modes”
movl Operand Combinations
Source Destination C Analog
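The body of this table did not survive in this copy. As a rough sketch only, the usual pairings of movl source and destination kinds with C statements look like the following. The variable names (cell, p, temp, temp1, temp2), the constants, and the register choices in the comments are illustrative assumptions, not compiler output.

```c
/* Each C statement is paired with the movl form it roughly corresponds to.
 * There is no memory-to-memory case: a single movl cannot do that. */
int movl_analogs(void)
{
    int cell = 7;      /* stands in for a 4-byte memory location */
    int *p = &cell;    /* address that would be held in a register */
    int temp, temp1, temp2;

    temp = 0x4;        /* Imm -> Reg:  movl $0x4,%eax     */
    *p = -147;         /* Imm -> Mem:  movl $-147,(%eax)  */
    temp1 = temp;      /* Reg -> Reg:  movl %eax,%edx     */
    *p = temp1;        /* Reg -> Mem:  movl %eax,(%edx)   */
    temp2 = *p;        /* Mem -> Reg:  movl (%eax),%edx   */

    return temp2;      /* 4 with the values above */
}
```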
Simple Addressing Modes
Normal (R) Mem[Reg[R]]
Register R specifies memory address
movl (%ecx),%eax
Using Simple Addressing Modes
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
swap:
# Set Up
pushl %ebp
movl %esp,%ebp
pushl %ebx
# Body
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# Finish
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Understanding Swap
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
Stack (offsets from %ebp)
Offset Contents
12 yp
8 xp
4 Rtn adr
0 Old %ebp (%ebp points here)
-4 Old %ebx
Register Variable
%ecx yp
%edx xp
%eax t1
%ebx t0
movl 12(%ebp),%ecx # ecx = yp
movl 8(%ebp),%edx # edx = xp
movl (%ecx),%eax # eax = *yp (t1)
movl (%edx),%ebx # ebx = *xp (t0)
movl %eax,(%edx) # *xp = eax
movl %ebx,(%ecx) # *yp = ebx
Understanding Swap (trace)
Initial state, %ebp = 0x104
Address Offset Contents
0x124 123
0x120 456
0x11c
0x118
0x114
0x110 12 yp = 0x120
0x10c 8 xp = 0x124
0x108 4 Rtn adr
0x104 0 (%ebp points here)
0x100 -4
State change after each instruction
movl 12(%ebp),%ecx # ecx = yp: %ecx = 0x120
movl 8(%ebp),%edx # edx = xp: %edx = 0x124
movl (%ecx),%eax # eax = *yp (t1): %eax = 456
movl (%edx),%ebx # ebx = *xp (t0): %ebx = 123
movl %eax,(%edx) # *xp = eax: Mem[0x124] = 456
movl %ebx,(%ecx) # *yp = ebx: Mem[0x120] = 123
Final state: Mem[0x124] = 456, Mem[0x120] = 123
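The trace can be replayed with a small C model of the machine state. This is a sketch under stated assumptions: the Machine struct, the word-indexed mem array covering addresses 0x100 through 0x124, and the helper names mem_load, mem_store, swap_initial_state, and swap_body are all invented for illustration. The addresses and values are the ones in the diagram.

```c
#include <stdint.h>

#define BASE  0x100u   /* lowest address shown in the diagram  */
#define WORDS 10       /* 0x100 .. 0x124, one 4-byte word each */

typedef struct {
    uint32_t eax, ebx, ecx, edx, ebp;
    uint32_t mem[WORDS];   /* mem[i] models the word at BASE + 4*i */
} Machine;

/* Read the 4-byte word at addr. */
uint32_t mem_load(const Machine *m, uint32_t addr)
{
    return m->mem[(addr - BASE) / 4];
}

/* Write the 4-byte word at addr. */
void mem_store(Machine *m, uint32_t addr, uint32_t val)
{
    m->mem[(addr - BASE) / 4] = val;
}

/* State just before the body of swap runs. */
Machine swap_initial_state(void)
{
    Machine m = {0};
    m.ebp = 0x104;
    mem_store(&m, 0x124, 123);     /* *xp            */
    mem_store(&m, 0x120, 456);     /* *yp            */
    mem_store(&m, 0x110, 0x120);   /* yp at 12(%ebp) */
    mem_store(&m, 0x10c, 0x124);   /* xp at 8(%ebp)  */
    return m;
}

/* The six body instructions of swap, one statement each. */
void swap_body(Machine *m)
{
    m->ecx = mem_load(m, m->ebp + 12);   /* movl 12(%ebp),%ecx */
    m->edx = mem_load(m, m->ebp + 8);    /* movl 8(%ebp),%edx  */
    m->eax = mem_load(m, m->ecx);        /* movl (%ecx),%eax   */
    m->ebx = mem_load(m, m->edx);        /* movl (%edx),%ebx   */
    mem_store(m, m->edx, m->eax);        /* movl %eax,(%edx)   */
    mem_store(m, m->ecx, m->ebx);        /* movl %ebx,(%ecx)   */
}
```

Calling swap_body on the result of swap_initial_state leaves 456 at 0x124 and 123 at 0x120, matching the last frame of the trace.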
Indexed Addressing Modes
Most General Form
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
D: Constant “displacement” 1, 2, or 4 bytes
Rb: Base register: Any of 8 integer registers
Ri: Index register: Any, except for %esp
Unlikely you’d use %ebp, either
S: Scale: 1, 2, 4, or 8
Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples
%edx 0xf000
%ecx 0x100
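The example table itself is missing from this copy, so the following sketch only shows how the general form D(Rb,Ri,S) is evaluated with the two register values given. The function name effective_address and the four sample expressions are assumptions chosen for illustration; an omitted base or index is passed as 0 and an omitted scale as 1.

```c
#include <stdint.h>

/* Effective address of D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * rb and ri are the register contents, not register numbers. */
uint32_t effective_address(int32_t d, uint32_t rb, uint32_t ri, uint32_t s)
{
    return rb + s * ri + (uint32_t)d;
}

/* With %edx = 0xf000 and %ecx = 0x100:
 *   0x8(%edx)        effective_address(0x8,  0xf000, 0,      1)
 *   (%edx,%ecx)      effective_address(0,    0xf000, 0x100,  1)
 *   (%edx,%ecx,4)    effective_address(0,    0xf000, 0x100,  4)
 *   0x80(,%edx,2)    effective_address(0x80, 0,      0xf000, 2)
 */
```

These four calls evaluate to 0xf008, 0xf100, 0xf400, and 0x1e080 respectively.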
Address Computation Instruction
leal Src,Dest
Src is address mode expression
Set Dest to address denoted by expression
Uses
Computing address without doing memory reference
E.g., translation of p = &x[i];
Computing arithmetic expressions of the form x + k*y
k = 1, 2, 4, or 8.
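Both uses of leal can be imitated in C. This is a minimal sketch, assuming a flat address space where a pointer converts to an integer and back unchanged; the names lea, element_address, and x_plus_ky are hypothetical.

```c
#include <stdint.h>

/* Models leal D(Rb,Ri,S),Dest: computes the address, never reads memory. */
uintptr_t lea(intptr_t d, uintptr_t rb, uintptr_t ri, uintptr_t s)
{
    return rb + s * ri + (uintptr_t)d;
}

/* p = &x[i]; the scale is sizeof(int), which is 4 on IA32. */
int *element_address(int *x, unsigned i)
{
    return (int *)lea(0, (uintptr_t)x, i, sizeof(int));
}

/* x + k*y for k in {1, 2, 4, 8}; with x == y and k == 2 this gives 3*y. */
unsigned x_plus_ky(unsigned x, unsigned y, unsigned k)
{
    return (unsigned)lea(0, x, y, k);
}
```

For example, x_plus_ky(5, 5, 2) yields 15, which is how a compiler can multiply by 3 without a multiply instruction.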
Some Arithmetic Operations
Format Computation
Two Operand Instructions
addl Src,Dest Dest = Dest + Src
subl Src,Dest Dest = Dest - Src
imull Src,Dest Dest = Dest * Src
sall k,Dest Dest = Dest << k Also called shll
sarl k,Dest Dest = Dest >> k Arithmetic
shrl k,Dest Dest = Dest >> k Logical
k is an immediate value or contents of %cl
xorl Src,Dest Dest = Dest ^ Src
andl Src,Dest Dest = Dest & Src
orl Src,Dest Dest = Dest | Src
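The two right shifts differ only in what fills the vacated high bits. The sketch below models them on 32-bit values; the C function names simply reuse the mnemonics for readability. It avoids >> on a negative signed value, whose result C leaves implementation-defined, and it masks the count to 5 bits on the assumption that IA32 uses only the low 5 bits of the shift count.

```c
#include <stdint.h>

/* shrl k,Dest: logical right shift, fills with zeros. */
uint32_t shrl(uint32_t dest, unsigned k)
{
    return dest >> (k & 31);
}

/* sarl k,Dest: arithmetic right shift, fills with copies of the sign bit. */
int32_t sarl(int32_t dest, unsigned k)
{
    k &= 31;
    if (dest >= 0)
        return (int32_t)((uint32_t)dest >> k);
    /* For negative values: invert, shift zeros in, invert back,
     * so the bits shifted in end up as ones. */
    return ~(int32_t)(~(uint32_t)dest >> k);
}
```

On the bit pattern 0xFFFFFFF8 (the value -8), shifting right by 1 gives 0x7FFFFFFC with shrl and -4 with sarl.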
One Operand Instructions
incl Dest Dest = Dest + 1
decl Dest Dest = Dest - 1
negl Dest Dest = -Dest
notl Dest Dest = ~Dest
Using leal for Arithmetic Expressions
int arith(int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
arith:
# Set Up
pushl %ebp
movl %esp,%ebp
# Body
movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax
# Finish
movl %ebp,%esp
popl %ebp
ret
Understanding arith
int arith(int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
Stack (offsets from %ebp)
Offset Contents
16 z
12 y
8 x
4 Rtn adr
0 Old %ebp (%ebp points here)
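One way to read the body of arith is to rewrite it in C with one statement per instruction, using variables named after the registers. The function arith_steps below is a hypothetical companion written for this purpose; arith is the C source from the slide. The shift by 4 is written as a multiplication by 16 so that the sketch stays well defined for negative inputs.

```c
/* C source from the slide. */
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Same computation, following the assembly body line by line. */
int arith_steps(int x, int y, int z)
{
    int eax = x;             /* movl 8(%ebp),%eax        eax = x       */
    int edx = y;             /* movl 12(%ebp),%edx       edx = y       */
    int ecx = edx + eax;     /* leal (%edx,%eax),%ecx    ecx = t1      */
    edx = edx + edx * 2;     /* leal (%edx,%edx,2),%edx  edx = 3*y     */
    edx = edx * 16;          /* sall $4,%edx             edx = 48*y    */
    ecx = ecx + z;           /* addl 16(%ebp),%ecx       ecx = t2      */
    eax = 4 + edx + eax;     /* leal 4(%edx,%eax),%eax   eax = t5      */
    eax = eax * ecx;         /* imull %ecx,%eax          eax = rval    */
    return eax;
}
```

Note how y * 48 never uses a multiply: leal forms 3*y and the shift by 4 multiplies that by 16.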
Another Example
int logical(int x, int y)
{
    int t1 = x^y;
    int t2 = t1 >> 17;
    int mask = (1<<13) - 7;
    int rval = t2 & mask;
    return rval;
}
logical:
# Set Up
pushl %ebp
movl %esp,%ebp
# Body
movl 8(%ebp),%eax
xorl 12(%ebp),%eax
sarl $17,%eax
andl $8185,%eax
# Finish
movl %ebp,%esp
popl %ebp
ret
2^13 = 8192, 2^13 - 7 = 8185
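The same line-by-line reading works here. In this sketch, logical is the C source from the slide and logical_steps is a hypothetical companion that follows the three body instructions, with the mask already folded into the constant 8185 as the compiler did. The >> on int is assumed to behave like sarl, which holds for GCC on IA32 but is implementation-defined in C for negative values, so the usage example sticks to non-negative inputs.

```c
/* C source from the slide. */
int logical(int x, int y)
{
    int t1 = x ^ y;
    int t2 = t1 >> 17;
    int mask = (1 << 13) - 7;
    int rval = t2 & mask;
    return rval;
}

/* Same computation, following the assembly body line by line. */
int logical_steps(int x, int y)
{
    int eax = x;     /* movl 8(%ebp),%eax    eax = x    */
    eax ^= y;        /* xorl 12(%ebp),%eax   eax = t1   */
    eax >>= 17;      /* sarl $17,%eax        eax = t2   */
    eax &= 8185;     /* andl $8185,%eax      eax = rval */
    return eax;
}
```

For instance, logical(1 << 20, 0) is 8: the shift leaves 8, and bit 3 is set in 8185 (0x1FF9).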
CISC Properties
Instruction can reference different operand types
Immediate, register, memory
Summary: Abstract Machines
Assembly
[Figure: machine model with mem, regs, alu, Cond. Codes, Stack, processor]
Data
1) byte
2) 2-byte word
3) 4-byte long word
4) contiguous byte allocation
5) address of initial byte
Control
3) branch/jump
4) call
5) ret
Pentium Pro (P6)
History
Announced in Feb. ‘95
Basis for Pentium II, Pentium III, and Celeron processors
Pentium 4 similar idea, but different details
Features
Dynamically translates instructions to more regular format
Very wide, but simple instructions
Executes operations in parallel
Up to 5 at once
Very deep pipeline
12–18 cycle latency
PentiumPro Block Diagram
Microprocessor Report
2/16/95
PentiumPro Operation
Translates instructions dynamically into “Uops”
118 bits wide
Holds operation, two sources, and destination
Consequences
Indirect relationship between IA32 code & what actually gets
executed
Tricky to predict / optimize performance at assembly level
Whose Assembler?
Intel/Microsoft Format              GAS/Gnu Format
lea eax,[ecx+ecx*2]                 leal (%ecx,%ecx,2),%eax
sub esp,8                           subl $8,%esp
cmp dword ptr [ebp-8],0             cmpl $0,-8(%ebp)
mov eax,dword ptr [eax*4+100h]      movl 0x100(,%eax,4),%eax