07 Basicx86Architecture 1up
7.1: What is an instruction set architecture?
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2013
Timothy Roscoe
Definitions
• Architecture: (also instruction set architecture: ISA) The
parts of a processor design that one needs to
understand to write assembly code.
Examples:
– instruction set specification, registers.
Instruction Set Architecture
• Assembly Language View
– Processor state
• Registers, memory, …
– Instructions
• addl, movl, leal, …
• How instructions are encoded as bytes
• Layer of Abstraction
– Above: how to program machine
• Processor executes instructions in a sequence
– Below: what needs to be built
• Use variety of tricks to make it run fast
• E.g., execute multiple instructions simultaneously
[Figure: abstraction layers, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]
CISC Instruction Sets
– Complex Instruction Set Computer
– Dominant style through mid-80’s
• Stack-oriented instruction set
– Use stack to pass arguments, save program counter
– Explicit push and pop instructions
• Arithmetic instructions can access memory
– addl %eax, 12(%ebx,%ecx,4)
• requires memory read and write
• Complex address calculation
• Condition codes
– Set as side effect of arithmetic and logical instructions
• Philosophy
– Add instructions to perform “typical” programming tasks
RISC Instruction Sets
– Reduced Instruction Set Computer
– Internal project at IBM, later popularized by Hennessy (Stanford)
and Patterson (Berkeley)
• Fewer, simpler instructions
– Might take more instructions to get a given task done
– Can execute them with small and fast hardware
• Register-oriented instruction set
– Many more (typically 32) registers
– Use for arguments, return pointer, temporaries
• Only load and store instructions can access memory
– Similar to Y86 mrmovl and rmmovl – see later!
• No Condition codes
– Test instructions return 0/1 in register
Contrast with x86: Alpha (64-bit RISC)
• Operations are highly uniform
– All encoded in exactly 32 bits
– All take the same time to execute (mostly)
– All operate between registers, or only load/store
– All operate on 64 or 32 bit quantities (nothing
smaller)
• No condition codes: use registers
• Lots of registers, including one hardwired to zero
– All registers are uniform
Other RISC features (not in Alpha)
• Explicit delay slots (e.g. MIPS)
– E.g. can’t use a value until 2 instructions after the load
• Make most instructions conditional (e.g. ARM)
– Needs condition codes (why?)
– Reduces branches, increases code density
• Etc.
CISC vs. RISC
• Original Debate
– Strong opinions!
– CISC proponents: easy for compiler, fewer code bytes
– RISC proponents: better for optimizing compilers, can make it run fast with a simple chip design
• Current Status
– For desktop processors, choice of ISA not a technical issue
• With enough hardware, can make anything run fast
• Code compatibility more important
– For embedded processors, RISC still makes sense
• Smaller, cheaper, less power
• For how much longer?
Comparison with MIPS (remember Digital Design?)
Summary
• Architecture vs. Microarchitecture
• Instruction set architectures
• RISC vs. CISC
• x86: comparison with MIPS
7.2: A bit of x86 history
Intel x86 Processors
• The x86 Architecture dominates the computer market
• Evolutionary design
– Backwards compatible all the way back to the 8086, introduced in 1978
– Added more features as time goes on
Intel x86 Evolution: Milestones
Name          Date   Transistors   MHz
• 8086        1978   29K           5-10
– First 16-bit processor. Basis for IBM PC & DOS
– 1MB address space
• 80386       1985   275K          16-33
– First 32-bit processor, referred to as IA32
– Added “flat addressing”
– Capable of running Unix
– 32-bit Linux/gcc uses no instructions introduced in later models
• Pentium 4F  2005   230M          2800-3800
– First 64-bit Intel x86 processor
– Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line
x86 Processors: Overview
Intel Architecture   Processors (time runs downward)
x86-16               8086, 286
x86-32 / IA32        386, 486, Pentium
MMX                  Pentium MMX
SSE2                 Pentium 4
SSE3                 Pentium 4E
x86-64 / EM64T       Pentium 4F, Core 2 Duo
SSE4                 Core i7
IA: often redefined as latest Intel architecture
Intel x86 Processors, contd.
• Machine Evolution
486 1989 1.9M
Pentium 1993 3.1M
Pentium/MMX 1997 4.5M
PentiumPro 1995 6.5M
Pentium III 1999 8.2M
Pentium 4 2001 42M
Core 2 Duo 2006 291M
• Added Features
– Instructions to support multimedia operations
• Parallel operations on 1, 2, and 4-byte data, both integer & FP
– Instructions to enable more efficient conditional operations
x86 Clones: Advanced Micro Devices (AMD)
• Historically
– AMD has followed just behind Intel
– A little bit slower, a lot cheaper
• Then
– Recruited top circuit designers from Digital Equipment
Corp. and other downward trending companies
– Built Opteron: tough competitor to Pentium 4
– Developed x86-64, their own extension to 64 bits
• Recently
– Intel much quicker with dual core design
– Intel currently far ahead in performance
– EM64T is backwards compatible with x86-64
Intel’s 64-Bit (partially true…)
• Intel Attempted Radical Shift from IA32 to IA64
– Totally different architecture (Itanium)
– Executes IA32 code only as legacy
– Performance disappointing
• AMD Stepped in with Evolutionary Solution
– x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
– Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
– Extended Memory 64-bit Technology
– Almost identical to x86-64!
Intel Nehalem-EX
• Current leader (for the next few weeks)
– 2.3 billion transistors/die
– 8 or 10 cores per die
– 2 threads per core
– Up to 8 packages (= 128 contexts!)
– 4 memory channels per package
– Virtualization support
– etc.
• Good illustration of why it is hard to teach state-of-the-art processor design!
Intel Single-Chip Cloud Computer - 2010
• Experimental processor (only a few hundred made)
– Designed for research
– Working version in our Lab
• 48 old-style Pentium cores
• Very fast interconnection network
– Hardware support for messaging between cores
– Variable speed of network
• Non-cache coherent
– Sharing memory between cores won’t work with a conventional OS!
A quick note on syntax
There are two common ways to write x86 assembler:
• AT&T syntax
– What we'll use in this course, common on Unix
• Intel syntax
– Generally used for Windows machines
7.3: Basics of machine code
Assembly programmer’s view
[Figure: the CPU (Registers, PC, Condition Codes) is connected to Memory (Object Code, Program Data, OS Data) by Addresses, Data, and Instructions]
Assembly data types
• “Integer” data of 1, 2, or 4 bytes
– Data values
– Addresses (untyped pointers)
• Transfer control
– Unconditional jumps to/from procedures
– Conditional branches
Object code
Code for sum
0x401040 <sum>:
0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
• Total of 13 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040
• Assembler
– Translates .s into .o
– Binary encoding of each instruction
– Nearly-complete image of executable code
– Missing linkages between code in different files
• Linker
– Resolves references between files
– Combines with static run-time libraries
• E.g., code for malloc, printf
– Some libraries are dynamically linked
• Linking occurs when program begins execution
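To make “each instruction 1, 2, or 3 bytes” concrete, here is a toy C sketch that walks the 13 bytes of sum and recovers the instruction boundaries. It rests on a strong assumption. The hypothetical helper insn_length knows only the opcodes that occur in sum (0x55, 0x89, 0x8b, 0x03, 0x5d, 0xc3) and the two ModRM forms they use. A real x86 decoder must also handle prefixes, SIB bytes, and hundreds of other opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 13 bytes of sum, as listed above (loaded at 0x401040). */
static const uint8_t sum_code[] = {
    0x55,             /* push %ebp           */
    0x89, 0xe5,       /* mov  %esp,%ebp      */
    0x8b, 0x45, 0x0c, /* mov  0xc(%ebp),%eax */
    0x03, 0x45, 0x08, /* add  0x8(%ebp),%eax */
    0x89, 0xec,       /* mov  %ebp,%esp      */
    0x5d,             /* pop  %ebp           */
    0xc3              /* ret                 */
};

/* Toy length decoder: knows ONLY the opcodes that occur in sum.
   Returns 0 for anything it does not handle. */
static size_t insn_length(const uint8_t *p) {
    switch (p[0]) {
    case 0x55: /* push %ebp */
    case 0x5d: /* pop %ebp  */
    case 0xc3: /* ret       */
        return 1;
    case 0x89:   /* mov r32 -> r/m32 */
    case 0x8b:   /* mov r/m32 -> r32 */
    case 0x03: { /* add r/m32 -> r32 */
        uint8_t mod = p[1] >> 6; /* top two bits of the ModRM byte */
        uint8_t rm = p[1] & 7;   /* low three bits of the ModRM byte */
        if (mod == 3) return 2;            /* register operand: opcode + ModRM */
        if (mod == 1 && rm != 4) return 3; /* memory operand + 8-bit displacement */
        return 0;                          /* SIB and other forms: not handled here */
    }
    default:
        return 0;
    }
}

/* Splits code into instructions, storing each start offset in offsets[];
   returns the number of instructions found. */
static size_t count_insns(const uint8_t *code, size_t n, size_t *offsets) {
    size_t off = 0, count = 0;
    while (off < n) {
        size_t len = insn_length(code + off);
        assert(len != 0); /* the sketch only supports the bytes in sum_code */
        offsets[count++] = off;
        off += len;
    }
    return count;
}
```

The recovered start offsets are 0, 1, 3, 6, 9, 11, and 12, which match the offsets in the objdump listing of sum.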
Machine instruction example
• C Code
int t = x+y;
– Add two signed integers
• Assembly
addl 8(%ebp),%eax
– Add 2 4-byte integers
• “Long” words in GCC parlance
• Same instruction whether signed or unsigned
– Operands:
• x: Register %eax
• y: Memory M[%ebp+8]
• t: Register %eax
– Return function value in %eax
Similar to expression: x += y
More precisely:
int eax;
int *ebp;
eax += ebp[2];
• Object Code
0x401046: 03 45 08
– 3-byte instruction
– Stored at address 0x401046
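The “more precisely” reading can be checked with a small C model in which memory is an int array and ebp points into it. This is a sketch under two stated assumptions: int is 4 bytes, so index 2 sits at byte offset 8, and the values do not overflow. The hardware wraps on overflow, but signed overflow is undefined in C. The function name addl_8_ebp_eax is hypothetical.

```c
/* Assumes a 4-byte int, so ebp[2] is the int stored 8 bytes above ebp. */
_Static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

/* Models addl 8(%ebp),%eax as eax += ebp[2].
   Callers must keep values in range: addl wraps, C signed overflow is undefined. */
static int addl_8_ebp_eax(int eax, const int *ebp) {
    eax += ebp[2]; /* M[%ebp+8] is the int at index 2 above %ebp */
    return eax;
}
```

For a fake frame {0, 0, 5} and eax = 7, the model returns 12.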
Disassembling object code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi
• Disassembler
– objdump -d p
– Useful tool for examining object code
– Analyzes bit pattern of series of instructions
– Produces approximate rendition of assembly code
– Can be run on either a.out (complete executable) or .o file
Alternate disassembly
Object
0x401040: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
Disassembled
0x401040 <sum>: push %ebp
0x401041 <sum+1>: mov %esp,%ebp
0x401043 <sum+3>: mov 0xc(%ebp),%eax
0x401046 <sum+6>: add 0x8(%ebp),%eax
0x401049 <sum+9>: mov %ebp,%esp
0x40104b <sum+11>: pop %ebp
0x40104c <sum+12>: ret
0x40104d <sum+13>: lea 0x0(%esi),%esi
• Within gdb Debugger
– gdb p
– disassemble sum
• Disassemble procedure
– x/13b sum
• Examine the 13 bytes starting at sum
What can be disassembled?
% objdump -d WINWORD.EXE
No symbols in "WINWORD.EXE".
Disassembly of section .text:
30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91
7.4: 32-bit x86 architecture
Integer registers (ia32)
%eax (%ax, %ah, %al): accumulate
%ecx
%edx
%ebx
%esi
%edi
%esp
%ebp
• Moving data: movx Source, Dest, with x in {b, w, l}
– movl Source, Dest: Move 4-byte “long word”
– movw Source, Dest: Move 2-byte “word”
– movb Source, Dest: Move 1-byte “byte”
Moving data: ia32
• Operand Types
– Immediate: Constant integer data
• Example: $0x400, $-533
• Like C constant, but prefixed with ‘$’
• Encoded with 1, 2, or 4 bytes
– Register: One of 8 integer registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp)
• Example: %eax, %edx
• But %esp and %ebp reserved for special use
• Others have special uses for particular instructions
– Memory: 4 consecutive bytes of memory at address given by register
• Simplest example: (%eax)
• Various other “address modes”
movl operand combinations
Source Dest Src,Dest C Analog
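The C function below is a sketch, not the original table. It pairs each legal movl source/destination combination with a typical C analog. The register names in the comments are illustrative. The one combination x86 rules out is memory to memory, which has to go through a register.

```c
/* C analogs of the five legal movl source/destination combinations.
   Register names in comments are examples only. */
static int movl_combinations(int *p, int *q) {
    int temp, temp2;
    temp = 0x4;    /* Imm -> Reg: e.g. movl $0x4,%eax    */
    *p = -147;     /* Imm -> Mem: e.g. movl $-147,(%eax) */
    temp2 = temp;  /* Reg -> Reg: e.g. movl %eax,%edx    */
    *q = temp2;    /* Reg -> Mem: e.g. movl %eax,(%edx)  */
    temp = *p;     /* Mem -> Reg: e.g. movl (%eax),%edx  */
    /* No Mem -> Mem form: copying *p to *q takes two movl
       instructions through a register, as the last two lines do. */
    *q = temp;
    return temp2;
}
```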
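Simple memory addressing modes
• Normal (R) Mem[Reg[R]]
– Register R specifies memory address
movl (%ecx),%eax
• Displacement D(R) Mem[Reg[R]+D]
– Register R specifies start of memory region
– Constant displacement D specifies offset
movl 8(%ebp),%edx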
Using simple addressing modes
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
swap:
    pushl %ebp              # Set Up
    movl %esp,%ebp
    pushl %ebx
    movl 12(%ebp),%ecx      # Body
    movl 8(%ebp),%edx
    movl (%ecx),%eax
    movl (%edx),%ebx
    movl %eax,(%edx)
    movl %ebx,(%ecx)
    movl -4(%ebp),%ebx      # Finish
    movl %ebp,%esp
    popl %ebp
    ret
Understanding swap
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
Stack (in memory)
Offset
12    yp
8     xp
4     Rtn adr
0     Old %ebp    ← %ebp
-4    Old %ebx
Register   Value
%ecx       yp
%edx       xp
%eax       t1
%ebx       t0
movl 12(%ebp),%ecx    # ecx = yp
movl 8(%ebp),%edx     # edx = xp
movl (%ecx),%eax      # eax = *yp (t1)
movl (%edx),%ebx      # ebx = *xp (t0)
movl %eax,(%edx)      # *xp = eax
movl %ebx,(%ecx)      # *yp = ebx
Understanding swap
Initial state: %ebp = 0x104
Address   Offset   Contents
0x124              123
0x120              456
0x110     12       yp = 0x120
0x10c     8        xp = 0x124
0x108     4        Rtn adr
0x104     0        Old %ebp    ← %ebp
0x100     -4
Instruction                            Effect
movl 12(%ebp),%ecx    # ecx = yp       %ecx = 0x120
movl 8(%ebp),%edx     # edx = xp       %edx = 0x124
movl (%ecx),%eax      # eax = *yp (t1) %eax = 456
movl (%edx),%ebx      # ebx = *xp (t0) %ebx = 123
movl %eax,(%edx)      # *xp = eax      Mem[0x124] = 456
movl %ebx,(%ecx)      # *yp = ebx      Mem[0x120] = 123
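The six body instructions can also be replayed on a toy memory model to confirm the final state. The names mem, load32, store32, and swap_body are all hypothetical. The model covers only addresses 0x100 through 0x127, and the host's byte order stands in for ia32's little-endian layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed memory covering addresses 0x100..0x127. */
#define MEM_BASE 0x100u
static uint8_t mem[0x28];

static uint32_t load32(uint32_t addr) {
    uint32_t v;
    assert(addr >= MEM_BASE && addr + 4 <= MEM_BASE + sizeof mem);
    memcpy(&v, &mem[addr - MEM_BASE], 4); /* host byte order, used consistently */
    return v;
}

static void store32(uint32_t addr, uint32_t v) {
    assert(addr >= MEM_BASE && addr + 4 <= MEM_BASE + sizeof mem);
    memcpy(&mem[addr - MEM_BASE], &v, 4);
}

/* Replays the body of swap, given the value of %ebp. */
static void swap_body(uint32_t ebp) {
    uint32_t ecx = load32(ebp + 12); /* movl 12(%ebp),%ecx  # ecx = yp       */
    uint32_t edx = load32(ebp + 8);  /* movl 8(%ebp),%edx   # edx = xp       */
    uint32_t eax = load32(ecx);      /* movl (%ecx),%eax    # eax = *yp (t1) */
    uint32_t ebx = load32(edx);      /* movl (%edx),%ebx    # ebx = *xp (t0) */
    store32(edx, eax);               /* movl %eax,(%edx)    # *xp = eax      */
    store32(ecx, ebx);               /* movl %ebx,(%ecx)    # *yp = ebx      */
}
```

Setting up the initial state (123 at 0x124, 456 at 0x120, xp and yp at 0x10c and 0x110) and calling swap_body(0x104) leaves 456 at 0x124 and 123 at 0x120.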
Complete memory addressing modes
• Most General Form:
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+D]
– D: Constant “displacement” 1, 2, or 4 bytes
– Rb: Base register: Any of 8 integer registers
– Ri: Index register: Any, except for %esp
• Unlikely you’d use %ebp, either
– S: Scale: 1, 2, 4, or 8 (why these numbers?)
• Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address computation examples
%edx 0xf000
%ecx 0x100
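With the register values above (%edx = 0xf000, %ecx = 0x100), the general formula can be evaluated directly. The C helper below is a sketch that mirrors D(Rb,Ri,S) with 32-bit wraparound. Passing 0 for an omitted base or index register is our own convention. The example operands in the usage note are illustrative choices.

```c
#include <assert.h>
#include <stdint.h>

/* Computes D(Rb,Ri,S) = Reg[Rb] + S*Reg[Ri] + D on 32 bits.
   Pass rb = 0 or ri = 0 for forms that omit the base or index. */
static uint32_t effective_address(int32_t d, uint32_t rb, uint32_t ri, uint32_t s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8); /* only legal scales */
    return rb + s * ri + (uint32_t)d;             /* unsigned: wraps modulo 2^32 */
}
```

For example, 0x8(%edx) gives 0xf008, (%edx,%ecx) gives 0xf100, (%edx,%ecx,4) gives 0xf400, and 0x80(,%edx,2), written effective_address(0x80, 0, 0xf000, 2), gives 0x1e080.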
Address computation instruction
• leal Src,Dest
– Src is address mode expression
– Set Dest to address denoted by expression
• Uses
– Computing addresses without a memory reference
• E.g., translation of p = &x[i];
– Computing arithmetic expressions of the form x + k*y
• k = 1, 2, 4, or 8
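Both uses of leal can be written as ordinary C arithmetic. The sketch below assumes ia32's 4-byte int. The function names are hypothetical, and the register names in the comments are placeholders rather than actual compiler output. Because the scale is limited to 1, 2, 4, or 8, multiplying by 3 or 9 uses the form y + k*y.

```c
#include <stddef.h>
#include <stdint.h>

/* leal (%edx,%edx,2),%edx computes edx + 2*edx = 3*y, with no memory access. */
static int32_t times3(int32_t y) { return y + 2 * y; }

/* leal (%eax,%eax,8),%eax would similarly compute 9*y. */
static int32_t times9(int32_t y) { return y + 8 * y; }

/* p = &x[i] is base + 4*i on ia32, i.e. one leal with scale 4
   (the 4 comes from sizeof(int)). */
static int *element_address(int *x, ptrdiff_t i) { return &x[i]; }
```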
Summary
• 32-bit x86 registers
• mov instruction: loads and stores
• memory addressing modes
– Example: swap()
• leal: address computation
7.5: ia32 integer arithmetic
Some arithmetic operations
• Two operand instructions:
Format Computation
addl Src,Dest Dest ← Dest + Src
subl Src,Dest Dest ← Dest - Src
imull Src,Dest Dest ← Dest * Src
sall Src,Dest Dest ← Dest << Src Also called shll
sarl Src,Dest Dest ← Dest >> Src Arithmetic
shrl Src,Dest Dest ← Dest >> Src Logical
xorl Src,Dest Dest ← Dest ^ Src
andl Src,Dest Dest ← Dest & Src
orl Src,Dest Dest ← Dest | Src
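The difference between sarl and shrl is easiest to see on a negative value. The sketch below models 32-bit registers as uint32_t, and sall, sarl, and shrl are hypothetical helper names. sarl is written without shifting a negative int, because C leaves that case implementation-defined. All helpers assume a shift count below 32.

```c
#include <stdint.h>

/* sall / shll: shift left, fill with zeros. */
static uint32_t sall(uint32_t x, unsigned k) { return x << k; }

/* shrl: logical right shift, fill with zeros. */
static uint32_t shrl(uint32_t x, unsigned k) { return x >> k; }

/* sarl: arithmetic right shift, fill with copies of the sign bit. */
static uint32_t sarl(uint32_t x, unsigned k) {
    uint32_t shifted = x >> k;
    if (x & 0x80000000u)              /* negative: set the top k bits to 1 */
        shifted |= ~(UINT32_MAX >> k);
    return shifted;
}
```

For example, sarl(0xFFFFFFF0, 2) gives 0xFFFFFFFC (-16 becomes -4), while shrl of the same value gives 0x3FFFFFFC.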
Some arithmetic operations
• One operand instructions
Format Computation
incl Dest Dest ← Dest + 1
decl Dest Dest ← Dest - 1
negl Dest Dest ← -Dest
notl Dest Dest ← ~Dest
Using leal for arithmetic expressions
int arith(int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
arith:
    pushl %ebp                # Set Up
    movl %esp,%ebp
    movl 8(%ebp),%eax         # Body
    movl 12(%ebp),%edx
    leal (%edx,%eax),%ecx
    leal (%edx,%edx,2),%edx
    sall $4,%edx
    addl 16(%ebp),%ecx
    leal 4(%edx,%eax),%eax
    imull %ecx,%eax
    movl %ebp,%esp            # Finish
    popl %ebp
    ret
Understanding arith
int arith(int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
Stack
Offset
16    z
12    y
8     x
4     Rtn adr
0     Old %ebp    ← %ebp
movl 8(%ebp),%eax          # eax = x
movl 12(%ebp),%edx         # edx = y
leal (%edx,%eax),%ecx      # ecx = x+y (t1)
leal (%edx,%edx,2),%edx    # edx = 3*y
sall $4,%edx               # edx = 48*y (t4)
addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
imull %ecx,%eax            # eax = t5*t2 (rval)
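To check the annotated listing, the C sketch below computes arith twice. The first version follows the C source. The second uses one statement per instruction, with variables named after the registers (arith_asm_steps is a hypothetical name). It assumes inputs small enough that nothing overflows, since the hardware wraps but signed overflow is undefined in C. It also writes sall $4 as a multiply by 16 to avoid left-shifting a negative int in C.

```c
/* C version, as on the slide. */
static int arith(int x, int y, int z) {
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Same computation, one statement per body instruction. */
static int arith_asm_steps(int x, int y, int z) {
    int eax = x;          /* movl 8(%ebp),%eax                 */
    int edx = y;          /* movl 12(%ebp),%edx                */
    int ecx = edx + eax;  /* leal (%edx,%eax),%ecx   : t1      */
    edx = edx + 2 * edx;  /* leal (%edx,%edx,2),%edx : 3*y     */
    edx = edx * 16;       /* sall $4,%edx            : 48*y    */
    ecx += z;             /* addl 16(%ebp),%ecx      : t2      */
    eax = 4 + edx + eax;  /* leal 4(%edx,%eax),%eax  : t5      */
    eax *= ecx;           /* imull %ecx,%eax         : rval    */
    return eax;
}
```

For x = 1, y = 2, z = 3, both versions give 606.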
Another example
int logical(int x, int y)
{
    int t1 = x^y;
    int t2 = t1 >> 17;
    int mask = (1<<13) - 7;
    int rval = t2 & mask;
    return rval;
}
logical:
    pushl %ebp                # Setup
    movl %esp,%ebp
    movl 8(%ebp),%eax         # Body
    xorl 12(%ebp),%eax
    sarl $17,%eax
    andl $8185,%eax
    movl %ebp,%esp            # Finish
    popl %ebp
    ret
2^13 = 8192, 2^13 - 7 = 8185
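Similarly, the sketch below compares the C source of logical with a step-by-step mirror of its four body instructions. The name logical_asm_steps is hypothetical. For a negative x^y, the result of >> on an int is implementation-defined in C. gcc and clang shift arithmetically there, matching sarl, but comparisons are safest with non-negative inputs.

```c
/* C version, as on the slide. */
static int logical(int x, int y) {
    int t1 = x ^ y;
    int t2 = t1 >> 17;
    int mask = (1 << 13) - 7; /* 8192 - 7 = 8185 */
    int rval = t2 & mask;
    return rval;
}

/* One statement per body instruction. */
static int logical_asm_steps(int x, int y) {
    int eax = x;  /* movl 8(%ebp),%eax  */
    eax ^= y;     /* xorl 12(%ebp),%eax */
    eax >>= 17;   /* sarl $17,%eax      */
    eax &= 8185;  /* andl $8185,%eax    */
    return eax;
}
```

For example, with x = 0x7FFFFFFF and y = 0, t2 is 0x3FFF and the result is 8185.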
Data representations: ia32 and x86-64 (sizes in bytes)
C data type      Typical 32-bit   ia32     Intel x86-64
char             1                1        1
short            2                2        2
int              4                4        4
long             4                4        8
long long        8                8        8
float            4                4        4
double           8                8        8
long double      8                10/12    10/16
char *           4                4        8
(or any other pointer)
• New instructions:
– movl → movq
– addl → addq
– sall → salq
– etc.
Swap in 32-bit mode
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
swap:
    pushl %ebp              # Setup
    movl %esp,%ebp
    pushl %ebx
    movl 12(%ebp),%ecx      # Body
    movl 8(%ebp),%edx
    movl (%ecx),%eax
    movl (%edx),%ebx
    movl %eax,(%edx)
    movl %ebx,(%ecx)
    movl -4(%ebp),%ebx      # Finish
    movl %ebp,%esp
    popl %ebp
    ret
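Swap in 64-bit Mode
void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
swap:
    movl (%rdi), %edx
    movl (%rsi), %eax
    movl %eax, (%rdi)
    movl %edx, (%rsi)
    retq
• Arguments passed in registers: xp in %rdi, yp in %rsi
– 64-bit pointers; no stack operations required
• 32-bit data (int)
– Data held in registers %eax and %edx
– movl operation
• For 64-bit data (e.g. long), data would be held in %rax and %rdx
– movq operation
– “q” stands for quad-word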
7.7: Condition codes
Processor State (ia32, Partial)
• Information about currently executing program
– Integer registers: %eax, %ecx, …
– Condition codes: CF ZF SF OF
Condition codes (implicit setting)
• Single bit registers
CF Carry Flag (for unsigned)     SF Sign Flag (for signed)
ZF Zero Flag                     OF Overflow Flag (for signed)
• Implicitly set as a side effect of arithmetic and logical operations
– Not set by leal
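As a sketch of the implicit setting, the function below computes the four flags that addl would leave after t = a+b. It models 32-bit registers as uint32_t so that wraparound is well defined in C. flags_t and add_flags are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags_t;

/* Condition codes after addl: t = a + b on 32 bits. */
static flags_t add_flags(uint32_t a, uint32_t b) {
    uint32_t t = a + b;
    flags_t f;
    f.cf = t < a;              /* carry out of bit 31 (unsigned overflow) */
    f.zf = (t == 0);
    f.sf = (t >> 31) & 1;      /* sign bit of the result */
    bool sa = a >> 31, sb = b >> 31;
    f.of = (sa == sb) && (f.sf != sa); /* same-sign operands, other-sign result */
    return f;
}
```

Adding 0xFFFFFFFF + 1 sets CF and ZF, because the unsigned result wraps to 0. Adding 0x7FFFFFFF + 1 sets SF and OF, because two positive operands give a negative result.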
Condition Codes (Explicit Setting: Compare)
• Explicit Setting by Compare Instruction
cmpl/cmpq Src2,Src1
– cmpl b,a sets the condition codes as if computing a-b, without storing the result
Condition Codes (Explicit Setting: Test)
• Explicit Setting by Test instruction
testl/testq Src2,Src1
– testl b,a sets ZF and SF as if computing a&b, without storing the result; CF and OF are cleared
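The function below is a matching sketch for testl, under the same modeling assumptions (uint32_t registers; test_flags is a hypothetical name). The parameter order follows the AT&T form testl Src2,Src1.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } test_flags_t;

/* Condition codes after testl src2,src1: only the flags of src1 & src2 survive. */
static test_flags_t test_flags(uint32_t src2, uint32_t src1) {
    uint32_t t = src1 & src2;  /* result is discarded */
    test_flags_t f;
    f.zf = (t == 0);
    f.sf = (t >> 31) & 1;
    f.cf = false;              /* test always clears CF and OF */
    f.of = false;
    return f;
}
```

testl %eax,%eax corresponds to test_flags(eax, eax). It sets ZF exactly when eax is zero and SF exactly when eax is negative.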
Reading Condition Codes
• SetX Instructions
– Set a single byte to 0 or 1 based on combinations of condition codes
• Are the high-order 32 bits of %rax zero after writing %eax?
Yes: 32-bit instructions set the high-order 32 bits to 0!
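The rule that 32-bit writes clear the upper half of a 64-bit register, while 8- and 16-bit writes do not, can be modeled on uint64_t values. The helper names are hypothetical. This rule is why a setX result in %al is usually widened with movzbl %al,%eax before it is used as a full-width value.

```c
#include <stdint.h>

/* Writing a 32-bit register (e.g. %eax): zero-extended into all of %rax. */
static uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;           /* the old upper 32 bits are discarded */
    return (uint64_t)v;
}

/* Writing %al: the other 56 bits of %rax are unchanged. */
static uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF) | v;
}

/* Writing %ax: the other 48 bits of %rax are unchanged. */
static uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~(uint64_t)0xFFFF) | v;
}
```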
Jumping
jX Instructions:
Jump to different part of code depending on condition codes
jX Condition Description
jmp 1 Unconditional
je ZF Equal / Zero
jne ~ZF Not Equal / Not Zero
js SF Negative
jns ~SF Non-negative
jg ~(SF^OF)&~ZF Greater (Signed)
jge ~(SF^OF) Greater or Equal (Signed)
jl (SF^OF) Less (Signed)
jle (SF^OF)|ZF Less or Equal (Signed)
ja ~CF&~ZF Above (unsigned)
jb CF Below (unsigned)
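The jump conditions in the table can be checked by computing the flags that cmpl b,a leaves (from a-b) and then evaluating each condition. In this sketch, cc_t, cmp_flags, and the cond_* helpers are hypothetical names, and registers are modeled as uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } cc_t;

/* Flags after cmpl b,a: computed from a - b, result discarded. */
static cc_t cmp_flags(uint32_t b, uint32_t a) {
    uint32_t t = a - b;
    cc_t f;
    f.cf = a < b;              /* borrow: unsigned a < b */
    f.zf = (t == 0);
    f.sf = (t >> 31) & 1;
    bool sa = a >> 31, sb = b >> 31;
    f.of = (sa != sb) && (f.sf != sa); /* signed overflow in a - b */
    return f;
}

/* Conditions from the jX table. */
static bool cond_g(cc_t f)  { return !(f.sf ^ f.of) && !f.zf; }
static bool cond_ge(cc_t f) { return !(f.sf ^ f.of); }
static bool cond_l(cc_t f)  { return f.sf ^ f.of; }
static bool cond_le(cc_t f) { return (f.sf ^ f.of) || f.zf; }
static bool cond_a(cc_t f)  { return !f.cf && !f.zf; }
static bool cond_b(cc_t f)  { return f.cf; }
```

For a = 0xFFFFFFFF (-1 as a signed int) and b = 1, jl is taken but jb is not. The same bits are less than 1 as a signed value but above it as an unsigned value.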
Summary
• Condition codes (C, Z, S, O)
• Explicit setting of condition codes
– Compare
– Test
• Reading condition codes
– setX
• Jumps