Class04 X86assembly
Uploaded by Jemima A
Copyright © All Rights Reserved

CS 105
“Tour of the Black Holes of Computing”

Machine-Level Programming I:

Topics
• Assembly Programmer’s Execution Model
• Accessing Information
  – Registers
  – Memory
• Arithmetic operations

X86.1.ppt
IA32 Processors
Totally Dominate Computer Market

Evolutionary Design
• Starting in 1978 with 8086 (really 1971 with 4004)
• Added more features as time goes on
• Still support old features, although obsolete

Complex Instruction Set Computer (CISC)
• Many different instructions with many different formats
  – But only a small subset is encountered in Linux programs
• Hard to match performance of Reduced Instruction Set Computers (RISC)
• But Intel has done just that!
–2– CS 105
X86 Evolution: Programmer’s View
Name     Date   Transistors
4004     1971   2.3K
• 4-bit processor. First 1-chip microprocessor
• Didn’t even have interrupts!
8008     1972   3.3K
• Like 4004, but with 8-bit ALU
8080     1974   6K
• Compatible at source level with 8008
• Processor in first “kit” computers
• Pricing caused it to beat similar processors with better programming model
  – Motorola 6800
  – MOS Technology 6502
X86 Evolution: Programmer’s View
Name     Date   Transistors
8086     1978   29K
• 16-bit processor. Basis for IBM PC & DOS
• Limited to 1MB address space. DOS only gives you 640K
80286    1982   134K
• Added elaborate, but not very useful, addressing scheme
• Basis for IBM PC-AT and Windows
386      1985   275K
• Extended to 32 bits. Added “flat addressing”
• Capable of running Unix
• By default, Linux/gcc use no instructions introduced in later models
X86 Evolution: Programmer’s View
Name          Date   Transistors
486           1989   1.9M
Pentium       1993   3.1M
Pentium/MMX   1997   4.5M
• Added special collection of instructions for operating on 64-bit vectors of 1-, 2-, or 4-byte integer data
PentiumPro    1995   6.5M
• Added conditional move instructions
• Big change in underlying microarchitecture
X86 Evolution: Programmer’s View
Name          Date   Transistors
Pentium III   1999   8.2M
• Added “streaming SIMD” instructions for operating on 128-bit vectors of 1-, 2-, or 4-byte integer or floating point data
Pentium 4     2001   42M
• Added 8-byte formats and 144 new instructions for streaming SIMD mode
New Species: IA64
Name        Date   Transistors
Itanium     2001   10M
• Extends to IA64, a 64-bit architecture
• Radically new instruction set “designed for high performance”
• Able to run existing IA32 programs
  – On-board “x86 engine”
• Joint project with Hewlett-Packard
• Compiler-writer’s nightmare
Itanium 2   2002   221M
• Big performance boost
• Hasn’t sold well
X86 Evolution: Clones
Advanced Micro Devices (AMD)
• Historically
  – AMD has followed just behind Intel
  – A little bit slower, a lot cheaper
• Recently
  – Recruited top circuit designers from Digital Equipment Corp.
  – Exploited the fact that Intel was distracted by IA64
  – Now close competitors to Intel
• Developed own extension to 64 bits
  – Intel adopted it after IA64 bombed
Assembly Programmer’s View
[Figure: the CPU (Registers, EIP, Condition Codes) exchanges Addresses, Data, and Instructions with Memory (Object Code, Program Data, OS Data, Stack)]

Programmer-Visible State
• EIP (Program Counter)
  – Address of next instruction
• Register File
  – Heavily used program data
• Condition Codes
  – Store status information about most recent arithmetic operation
  – Used for conditional branching
• Memory
  – Byte addressable array
  – Code, user data, (most) OS data
  – Includes stack used to support procedures
Turning C into Object Code
• Code in files p1.c p2.c
• Compile with command: gcc -O p1.c p2.c -o p
  – Use optimizations (-O)
  – Put resulting binary in file p

text     C program (p1.c p2.c)
           ↓ Compiler (gcc -S)
text     Asm program (p1.s p2.s)
           ↓ Assembler (gcc or as)
binary   Object program (p1.o p2.o)     Static libraries (.a)
           ↓ Linker (gcc or ld)
binary   Executable program (p)
Compiling Into Assembly
C Code
  int sum(int x, int y)
  {
    int t = x+y;
    return t;
  }

Generated Assembly
  _sum:
    pushl %ebp
    movl %esp,%ebp
    movl 12(%ebp),%eax
    addl 8(%ebp),%eax
    movl %ebp,%esp
    popl %ebp
    ret

Obtain with command
  gcc -O -S code.c
Produces file code.s
Assembly Characteristics
Minimal data types
• Integer data of 1, 2, or 4 bytes
  – Data values
  – Addresses (untyped pointers)
• Floating-point data of 4, 8, or 10 bytes
• No aggregate types such as arrays or structures
  – Just contiguously allocated bytes in memory

Primitive operations
• Perform arithmetic function on register or memory data
• Transfer data between memory and register
  – Load data from memory into register
  – Store register data into memory
• Transfer control
  – Unconditional jumps to/from procedures
  – Conditional branches
Object Code
Code for sum
  0x401040 <sum>:
    0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
• Total of 13 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040

Assembler
• Translates .s into .o
• Binary encoding of each instruction
• Nearly-complete image of executable code
• Missing linkages between code in different files

Linker
• Resolves references between files
• Combines with static run-time libraries
  – E.g., code for malloc, printf
• Some libraries are dynamically linked
  – Linking occurs when program begins execution
Machine Instruction Example
C Code
  int t = x+y;
• Add two signed integers

Assembly
  addl 8(%ebp),%eax
  Similar to expression y += x
• Add 2 4-byte integers
  – “Long” words in GCC parlance
  – Same instruction whether signed or unsigned
• Operands:
  – y: Register %eax
  – x: Memory M[%ebp+8]
  – t: Register %eax
    » Return function value in %eax

Object Code
  0x401046: 03 45 08
• 3-byte instruction
• Stored at address 0x401046
Disassembling Object Code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi

Disassembler
  objdump -d p
• Useful tool for examining object code
• Analyzes bit pattern of series of instructions
• Produces approximate rendition of assembly code
• Can be run on either a.out (complete executable) or .o file
Alternate Disassembly
Object
  0x401040:
    0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

Disassembled
  0x401040 <sum>:      push %ebp
  0x401041 <sum+1>:    mov %esp,%ebp
  0x401043 <sum+3>:    mov 0xc(%ebp),%eax
  0x401046 <sum+6>:    add 0x8(%ebp),%eax
  0x401049 <sum+9>:    mov %ebp,%esp
  0x40104b <sum+11>:   pop %ebp
  0x40104c <sum+12>:   ret
  0x40104d <sum+13>:   lea 0x0(%esi),%esi

Within gdb Debugger
  gdb p
  disassemble sum
• Disassemble procedure
  x/13b sum
• Examine the 13 bytes starting at sum
What Can Be Disassembled?
% objdump -d WINWORD.EXE

WINWORD.EXE: file format pei-i386

No symbols in "WINWORD.EXE".
Disassembly of section .text:

30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91

• Anything that can be interpreted as executable code
• Disassembler examines bytes and reconstructs assembly source
Moving Data
movl Source,Dest:
• Move 4-byte (“long”) word
• Lots of these in typical code

Operand Types
• Immediate: Constant integer data
  – Like C constant, but prefixed with ‘$’
  – E.g., $0x400, $-533
  – Encoded with 1, 2, or 4 bytes
• Register: One of 8 integer registers
  – But %esp and %ebp reserved for special use
  – Others have special uses for particular instructions
• Memory: 4 consecutive bytes of memory
  – Various “address modes”

Integer registers: %eax %edx %ecx %ebx %esi %edi %esp %ebp
movl Operand Combinations
Source   Dest   Example              C Analog
Imm      Reg    movl $0x4,%eax       temp = 0x4;
Imm      Mem    movl $-147,(%eax)    *p = -147;
Reg      Reg    movl %eax,%edx       temp2 = temp1;
Reg      Mem    movl %eax,(%edx)     *p = temp;
Mem      Reg    movl (%eax),%edx     temp = *p;

• Cannot do memory-memory transfers with a single instruction
Simple Addressing Modes
Normal          (R)     Mem[Reg[R]]
• Register R specifies memory address
    movl (%ecx),%eax

Displacement    D(R)    Mem[Reg[R]+D]
• Register R specifies start of memory region
• Constant displacement D specifies offset
    movl 8(%ebp),%edx
Using Simple Addressing Modes
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
   pushl %ebp             # Set Up
   movl %esp,%ebp
   pushl %ebx

   movl 12(%ebp),%ecx     # Body
   movl 8(%ebp),%edx
   movl (%ecx),%eax
   movl (%edx),%ebx
   movl %eax,(%edx)
   movl %ebx,(%ecx)

   movl -4(%ebp),%ebx     # Finish
   movl %ebp,%esp
   popl %ebp
   ret
Understanding Swap
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

Stack
Offset
   12   yp
    8   xp
    4   Rtn adr
    0   Old %ebp    ← %ebp
   -4   Old %ebx

Register   Variable
%ecx       yp
%edx       xp
%eax       t1
%ebx       t0

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax              %esi
  %edx              %edi
  %ecx              %esp
  %ebx              %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              123
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax              %esi
  %edx              %edi
  %ecx  0x120       %esp
  %ebx              %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              123
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax              %esi
  %edx  0x124       %edi
  %ecx  0x120       %esp
  %ebx              %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              123
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax  456         %esi
  %edx  0x124       %edi
  %ecx  0x120       %esp
  %ebx              %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              123
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax  456         %esi
  %edx  0x124       %edi
  %ecx  0x120       %esp
  %ebx  123         %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              123
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax  456         %esi
  %edx  0x124       %edi
  %ecx  0x120       %esp
  %ebx  123         %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              456
  0x120              456
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap
Registers
  %eax  456         %esi
  %edx  0x124       %edi
  %ecx  0x120       %esp
  %ebx  123         %ebp  0x104

Memory
  Address   Offset   Contents
  0x124              456
  0x120              123
  0x11c
  0x118
  0x114
  0x110      12      0x120 (yp)
  0x10c       8      0x124 (xp)
  0x108       4      Rtn adr
  0x104       0      (%ebp points here)
  0x100      -4

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Indexed Addressing Modes
Most General Form
  D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]
• D: Constant “displacement” 1, 2, or 4 bytes
• Rb: Base register: Any of 8 integer registers
• Ri: Index register: Any, except for %esp
  – Unlikely you’d use %ebp, either
• S: Scale: 1, 2, 4, or 8

Special Cases
  (Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]
  D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]
  (Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples
  %edx   0xf000
  %ecx   0x100

Expression       Computation          Address
0x8(%edx)        0xf000 + 0x8         0xf008
(%edx,%ecx)      0xf000 + 0x100       0xf100
(%edx,%ecx,4)    0xf000 + 4*0x100     0xf400
0x80(,%edx,2)    2*0xf000 + 0x80      0x1e080
Address Computation Instruction
leal Src,Dest
• Src is address mode expression
• Set Dest to address denoted by expression

Uses
• Computing address without doing memory reference
  – E.g., translation of p = &x[i];
• Computing arithmetic expressions of the form x + k*y
  – k = 1, 2, 4, or 8

LEARN THIS INSTRUCTION!!!
• Used heavily by compiler
• Appears regularly on exams
Some Arithmetic Operations
Format            Computation
Two Operand Instructions
addl Src,Dest     Dest = Dest + Src
subl Src,Dest     Dest = Dest - Src
imull Src,Dest    Dest = Dest * Src
sall k,Dest       Dest = Dest << k     Also called shll
sarl k,Dest       Dest = Dest >> k     Arithmetic
shrl k,Dest       Dest = Dest >> k     Logical
  k is an immediate value or contents of %cl
xorl Src,Dest     Dest = Dest ^ Src
andl Src,Dest     Dest = Dest & Src
orl Src,Dest      Dest = Dest | Src
Some Arithmetic Operations
Format            Computation
One Operand Instructions
incl Dest         Dest = Dest + 1
decl Dest         Dest = Dest - 1
negl Dest         Dest = -Dest
notl Dest         Dest = ~Dest
Using leal for Arithmetic Expressions
int arith
  (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}

arith:
   pushl %ebp                 # Set Up
   movl %esp,%ebp

   movl 8(%ebp),%eax          # Body
   movl 12(%ebp),%edx
   leal (%edx,%eax),%ecx
   leal (%edx,%edx,2),%edx
   sall $4,%edx
   addl 16(%ebp),%ecx
   leal 4(%edx,%eax),%eax
   imull %ecx,%eax

   movl %ebp,%esp             # Finish
   popl %ebp
   ret
Understanding arith
int arith
  (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}

Stack
Offset
   16   z
   12   y
    8   x
    4   Rtn adr
    0   Old %ebp    ← %ebp

movl 8(%ebp),%eax          # eax = x
movl 12(%ebp),%edx         # edx = y
leal (%edx,%eax),%ecx      # ecx = x+y (t1)
leal (%edx,%edx,2),%edx    # edx = 3*y
sall $4,%edx               # edx = 48*y (t4)
addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
imull %ecx,%eax            # eax = t5*t2 (rval)
Understanding arith
int arith
  (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}

   # eax = x
   movl 8(%ebp),%eax
   # edx = y
   movl 12(%ebp),%edx
   # ecx = x+y (t1)
   leal (%edx,%eax),%ecx
   # edx = 3*y
   leal (%edx,%edx,2),%edx
   # edx = 48*y (t4)
   sall $4,%edx
   # ecx = z+t1 (t2)
   addl 16(%ebp),%ecx
   # eax = 4+t4+x (t5)
   leal 4(%edx,%eax),%eax
   # eax = t5*t2 (rval)
   imull %ecx,%eax
Another Example
int logical(int x, int y)
{
  int t1 = x^y;
  int t2 = t1 >> 17;
  int mask = (1<<13) - 7;
  int rval = t2 & mask;
  return rval;
}

logical:
   pushl %ebp              # Set Up
   movl %esp,%ebp

   movl 8(%ebp),%eax       # Body
   xorl 12(%ebp),%eax
   sarl $17,%eax
   andl $8185,%eax

   movl %ebp,%esp          # Finish
   popl %ebp
   ret

2^13 = 8192, 2^13 – 7 = 8185

movl 8(%ebp),%eax      # eax = x
xorl 12(%ebp),%eax     # eax = x^y (t1)
sarl $17,%eax          # eax = t1>>17 (t2)
andl $8185,%eax        # eax = t2 & 8185
CISC Properties
Instruction can reference different operand types
• Immediate, register, memory

Arithmetic operations can read/write memory

Memory reference can involve complex computation
• Rb + S*Ri + D
• Useful for arithmetic expressions, too

Instructions can have varying lengths
• IA32 instructions can range from 1 to 15 bytes
Summary: Abstract Machines
Machine Models                  Data                            Control
C                               1) char                         1) loops
  (mem, proc)                   2) int, float                   2) conditionals
                                3) double                       3) switch
                                4) struct, array                4) Proc. call
                                5) pointer                      5) Proc. return
Assembly                        1) byte                         3) branch/jump
  (mem, regs, alu, processor;   2) 2-byte word                  4) call
   Stack, Cond. Codes)          3) 4-byte long word             5) ret
                                4) contiguous byte allocation
                                5) address of initial byte
Pentium Pro (P6)
History
• Announced in Feb. ’95
• Basis for Pentium II, Pentium III, and Celeron processors
• Pentium 4 similar idea, but different details

Features
• Dynamically translates instructions to more regular format
  – Very wide, but simple instructions
• Executes operations in parallel
  – Up to 5 at once
• Very deep pipeline
  – 12–18 cycle latency
PentiumPro Block Diagram
[Figure: PentiumPro block diagram. Source: Microprocessor Report, 2/16/95]
PentiumPro Operation
Translates instructions dynamically into “Uops”
• 118 bits wide
• Holds operation, two sources, and destination

Executes Uops with “Out of Order” engine
• Uop executed when
  – Operands available
  – Functional unit available
• Execution controlled by “Reservation Stations”
  – Keeps track of data dependencies between uops
  – Allocates resources

Consequences
• Indirect relationship between IA32 code & what actually gets executed
• Tricky to predict / optimize performance at assembly level
Whose Assembler?
Intel/Microsoft Format              GAS/Gnu Format
lea eax,[ecx+ecx*2]                 leal (%ecx,%ecx,2),%eax
sub esp,8                           subl $8,%esp
cmp dword ptr [ebp-8],0             cmpl $0,-8(%ebp)
mov eax,dword ptr [eax*4+100h]      movl 0x100(,%eax,4),%eax

Intel/Microsoft Differs from GAS
• Operands listed in opposite order
    mov Dest, Src            movl Src, Dest
• Constants not preceded by ‘$’; denote hex with ‘h’ at end
    100h                     $0x100
• Operand size indicated by operands rather than operator suffix
    sub                      subl
• Addressing format shows effective address computation
    [eax*4+100h]             0x100(,%eax,4)
