
7: Basic x86 architecture

Computer Architecture and Systems Programming


252-0061-00, Autumn Semester 2013
Timothy Roscoe

7.1: What is an instruction set architecture?
Definitions
• Architecture (also instruction set architecture, ISA): the
parts of a processor design that one needs to
understand to write assembly code.
– Examples: instruction set specification, registers.

• Microarchitecture: implementation of the architecture.
– Examples: cache sizes and core frequency.

• Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc.

Instruction Set Architecture
• Assembly Language View
– Processor state
• Registers, memory, …
– Instructions
• addl, movl, leal, …
• How instructions are encoded as bytes
• Layer of Abstraction
– Above: how to program machine
• Processor executes instructions in a sequence
– Below: what needs to be built
• Use variety of tricks to make it run fast
• E.g., execute multiple instructions simultaneously
• Layers (top to bottom): Application Program, Compiler / OS, ISA, CPU Design, Circuit Design, Chip Layout
CISC Instruction Sets
– Complex Instruction Set Computer
– Dominant style through mid-80’s
• Stack-oriented instruction set
– Use stack to pass arguments, save program counter
– Explicit push and pop instructions
• Arithmetic instructions can access memory
– addl %eax, 12(%ebx,%ecx,4)
• requires memory read and write
• Complex address calculation
• Condition codes
– Set as side effect of arithmetic and logical instructions
• Philosophy
– Add instructions to perform “typical” programming tasks
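The memory-operand add on this slide can be sketched in C to show the three steps hidden inside one CISC instruction. This is only an illustrative model, not compiler output: the name addl_mem and the byte-pointer view of %ebx are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: addl %eax, 12(%ebx,%ecx,4)
 * ebx is treated as a byte pointer, ecx as an element index. */
void addl_mem(uint32_t eax, unsigned char *ebx, uint32_t ecx)
{
    assert(ebx != NULL);
    /* Complex address calculation: base + 4*index + 12 */
    unsigned char *addr = ebx + 12 + 4 * (size_t)ecx;
    uint32_t value;
    memcpy(&value, addr, sizeof value);  /* memory read  */
    value += eax;                        /* arithmetic   */
    memcpy(addr, &value, sizeof value);  /* memory write */
}
```

A load/store (RISC) machine would need separate instructions for the address calculation, the load, the add, and the store.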

RISC Instruction Sets
– Reduced Instruction Set Computer
– Internal project at IBM, later popularized by Hennessy (Stanford)
and Patterson (Berkeley)
• Fewer, simpler instructions
– Might take more to get given task done
– Can execute them with small and fast hardware
• Register-oriented instruction set
– Many more (typically 32) registers
– Use for arguments, return pointer, temporaries
• Only load and store instructions can access memory
– Similar to Y86 mrmovl and rmmovl – see later!
• No Condition codes
– Test instructions return 0/1 in register

Contrast with x86 / 64-bit
• Operations are highly uniform
– All encoded in exactly 32 bits
– All take the same time to execute (mostly)
– All operate between registers, or only load/store
– All operate on 64 or 32 bit quantities (nothing
smaller)
• No condition codes: use registers
• Lots of registers, including zero
– All registers are uniform

Other RISC features
(not in Alpha)
• Explicit delay slots (e.g. MIPS)
– E.g. can’t use a value until 2 instructions after the load
• Make most instructions conditional (e.g. ARM)
– Needs condition codes (why?)
– Reduces branches, increases code density
• Etc.

• Key message: x86 is not the only way to do this!

CISC vs. RISC
• Original Debate
– Strong opinions!
– CISC proponents---easy for compiler, fewer code bytes
– RISC proponents---better for optimizing compilers, can
make run fast with simple chip design
• Current Status
– For desktop processors, choice of ISA not a technical issue
• With enough hardware, can make anything run fast
• Code compatibility more important
– For embedded processors, RISC still makes sense
• Smaller, cheaper, less power
• For how much longer?

Comparison with MIPS
(remember Digital Design?)

• MIPS is RISC: Reduced Instruction Set


– Motivation: simpler is faster
• Fewer gates ⇒ higher frequency
• Fewer gates ⇒ more transistors left for cache
– Seemed like a really good idea
• x86 is CISC: Complex Instruction Set
– More complex instructions, addressing modes
• Intel turned out to be way too good at manufacturing
• Difference in gate count became too small to make a
difference
• x86 inside is mostly RISC anyway, decode logic is small
– ⇒ Argument is mostly irrelevant these days
There are many architectures…
• You’ve already seen MIPS 2000 → MIPS 3000 → …
– Workstations, minicomputers, now mostly embedded networking
• IBM S/360 → S/370 → … → zSeries
– First to separate architecture from (many) implementations
• ARM (several variants)
– Very common in embedded systems, basis for Advanced OS course at ETHZ
• IBM POWER → PowerPC (→ Cell, sort of)
– Basis for all 3 last-gen games console systems
• DEC Alpha
– Personal favorite; killed by Compaq, team left for Intel to work on…
• Intel Itanium
– First 64-bit Intel product; very fast (esp. FP), hot, and expensive
– Mostly overtaken by 64-bit x86 designs
• etc.

Summary
• Architecture vs. Microarchitecture
• Instruction set architectures
• RISC vs. CISC
• x86: comparison with MIPS

7.2: A bit of x86 history
Intel x86 Processors
• The x86 Architecture dominates the computer market

• Evolutionary design
– Backwards compatible all the way back to the 8086, introduced in 1978
– Added more features as time goes on

• Complex instruction set computer (CISC)


– Many different instructions with many different formats
• But, only small subset encountered with Linux programs
– Hard to match performance of Reduced Instruction Set
Computers (RISC)
– But, Intel has done just that!

Intel x86 Evolution: Milestones
Name Date Transistors MHz
• 8086 1978 29K 5-10
– First 16-bit processor. Basis for IBM PC & DOS
– 1MB address space
• 80386 1985 275K 16-33
– First 32-bit processor, referred to as IA32
– Added “flat addressing”
– Capable of running Unix
– 32-bit Linux/gcc uses no instructions introduced in later models
• Pentium 4F 2005 230M 2800-3800
– First 64-bit [x86] processor
– Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of
“Core” line

Intel x86 Processors: Overview
Architectures       Processors (listed in time order)
X86-16              8086
                    286
X86-32 / IA32       386
                    486
                    Pentium
MMX                 Pentium MMX
SSE                 Pentium III
SSE2                Pentium 4
SSE3                Pentium 4E
X86-64 / EM64t      Pentium 4F
                    Core 2 Duo
SSE4                Core i7
IA: often redefined as latest Intel architecture
Intel x86 Processors, contd.
• Machine Evolution
Name          Date   Transistors
486           1989   1.9M
Pentium       1993   3.1M
Pentium/MMX   1997   4.5M
PentiumPro    1995   6.5M
Pentium III   1999   8.2M
Pentium 4     2001   42M
Core 2 Duo    2006   291M
• Added Features
– Instructions to support multimedia operations
• Parallel operations on 1, 2, and 4-byte data, both integer & FP
– Instructions to enable more efficient conditional operations

x86 Clones: Advanced Micro Devices (AMD)
• Historically
– AMD has followed just behind Intel
– A little bit slower, a lot cheaper
• Then
– Recruited top circuit designers from Digital Equipment
Corp. and other downward trending companies
– Built Opteron: tough competitor to Pentium 4
– Developed x86-64, their own extension to 64 bits
• Recently
– Intel much quicker with dual core design
– Intel currently far ahead in performance
– em64t backwards compatible to x86-64

Intel’s 64-Bit
(partially true…)
• Intel Attempted Radical Shift from IA32 to IA64
– Totally different architecture (Itanium)
– Executes IA32 code only as legacy
– Performance disappointing
• AMD Stepped in with Evolutionary Solution
– x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
– Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
– Extended Memory 64-bit Technology
– Almost identical to x86-64!

Intel Nehalem-EX
• Current leader
(for the next few weeks)
– 2.3 billion transistors/die
– 8 or 10 cores per die
– 2 threads per core
– Up to 8 packages
(= 128 contexts!)
– 4 memory channels per package
– Virtualization support
– etc.
• Good illustration of why it is
hard to teach state-of-the-art
processor design!

Intel Single-Chip Cloud Computer - 2010
• Experimental processor
(only a few 100 made)
– Designed for research
– Working version in our Lab
• 48 old-style Pentium cores
• Very fast interconnection
network
– Hardware support for
messaging between cores
– Variable speed of network
• Non-cache coherent
– Sharing memory between
cores won’t work with a
conventional OS!

A quick note on syntax
There are two common ways to write x86
Assembler:

• AT&T syntax
– What we'll use in this course, common on Unix
• Intel syntax
– Generally used for Windows machines

7.3: Basics of machine code
Assembly programmer’s view
CPU: PC, registers, condition codes
Memory: object code, program data, OS data, stack
Between CPU and memory: addresses, data, instructions

Programmer-Visible State
– PC: Program counter
• Address of next instruction
• Called “EIP” (IA32) or “RIP” (x86-64)
– Register file
• Heavily used program data
– Condition codes
• Store status information about most recent arithmetic operation
• Used for conditional branching
– Memory
• Byte addressable array
• Code, user data, (some) OS data
• Includes stack used to support procedures
Compiling into assembly

C code

    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }

Generated ia32 assembly

    sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret

Obtain with command
    gcc -O -S code.c
Produces file code.s
Some compilers use single instruction “leave”
Assembly data types
• “Integer” data of 1, 2, or 4 bytes
– Data values
– Addresses (untyped pointers)

• Floating point data of 4, 8, or 10 bytes

• No aggregate types such as arrays or structures
– Just contiguously allocated bytes in memory
Assembly code operations
• Perform arithmetic function on register or
memory data

• Transfer data between memory and register


– Load data from memory into register
– Store register data into memory

• Transfer control
– Unconditional jumps to/from procedures
– Conditional branches

Object code
Code for sum
    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3
• Total of 13 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040

• Assembler
– Translates .s into .o
– Binary encoding of each instruction
– Nearly-complete image of executable code
– Missing linkages between code in different files
• Linker
– Resolves references between files
– Combines with static run-time libraries
• E.g., code for malloc, printf
– Some libraries are dynamically linked
• Linking occurs when program begins execution
Machine instruction example
• C Code
    int t = x+y;
– Add two signed integers
• Assembly
    addl 8(%ebp),%eax
– Add 2 4-byte integers
• “Long” words in GCC parlance
• Same instruction whether signed or unsigned
– Operands:
• x: Register %eax
• y: Memory M[%ebp+8]
• t: Register %eax
– Return function value in %eax
– Similar to expression: x += y
– More precisely:
    int eax;
    int *ebp;
    eax += ebp[2]
• Object Code
    0x401046: 03 45 08
– 3-byte instruction
– Stored at address 0x401046
Disassembling object code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi

• Disassembler
– objdump -d p
– Useful tool for examining object code
– Analyzes bit pattern of series of instructions
– Produces approximate rendition of assembly code
– Can be run on either a.out (complete executable) or .o file
Alternate disassembly
Object
    0x401040:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

Disassembled
    0x401040 <sum>: push %ebp
    0x401041 <sum+1>: mov %esp,%ebp
    0x401043 <sum+3>: mov 0xc(%ebp),%eax
    0x401046 <sum+6>: add 0x8(%ebp),%eax
    0x401049 <sum+9>: mov %ebp,%esp
    0x40104b <sum+11>: pop %ebp
    0x40104c <sum+12>: ret
    0x40104d <sum+13>: lea 0x0(%esi),%esi

Within gdb Debugger
– gdb p
– disassemble sum
• Disassemble procedure
– x/13b sum
• Examine the 13 bytes starting at sum
What can be disassembled?
% objdump -d WINWORD.EXE

WINWORD.EXE: file format pei-i386

No symbols in "WINWORD.EXE".
Disassembly of section .text:

30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91

• Anything that can be interpreted as executable code


• Disassembler examines bytes and reconstructs assembly source
Summary
• Compiling into assembly
• Data types in assembly
• Assembly code operations
• Object code, and disassembling it

7.4: 32-bit x86 architecture
Integer registers (ia32)
32-bit   16-bit   8-bit     Origin (mostly obsolete)
General purpose:
%eax     %ax      %ah %al   accumulate
%ecx     %cx      %ch %cl   counter
%edx     %dx      %dh %dl   data
%ebx     %bx      %bh %bl   base
%esi     %si                source index
%edi     %di                destination index

%esp     %sp                stack pointer
%ebp     %bp                base pointer

The 16-bit names are virtual registers (backwards compatibility)
Moving data: ia32
• movx Source, Dest
– x in {b, w, l}
– movl Source, Dest: Move 4-byte “long word”
– movw Source, Dest: Move 2-byte “word”
– movb Source, Dest: Move 1-byte “byte”

• Lots of these in typical code
Moving data: ia32
movl Source, Dest:
• Operand Types
– Immediate: Constant integer data
• Example: $0x400, $-533
• Like C constant, but prefixed with ‘$’
• Encoded with 1, 2, or 4 bytes
– Register: One of 8 integer registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp)
• Example: %eax, %edx
• But %esp and %ebp reserved for special use
• Others have special uses for particular instructions
– Memory: 4 consecutive bytes of memory at address given by register
• Simplest example: (%eax)
• Various other “address modes”
movl operand combinations
Source   Dest   Src,Dest             C Analog
Imm      Reg    movl $0x4,%eax       temp = 0x4;
Imm      Mem    movl $-147,(%eax)    *p = -147;
Reg      Reg    movl %eax,%edx       temp2 = temp1;
Reg      Mem    movl %eax,(%edx)     *p = temp;
Mem      Reg    movl (%eax),%edx     temp = *p;

Cannot do memory-memory transfer with a single instruction
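Because there is no memory-to-memory movl, a C assignment such as *dst = *src has to go through a register. A minimal sketch follows; the helper name copy_word and the register choices in the comments are illustrative assumptions, not actual compiler output.

```c
#include <assert.h>

/* Hypothetical helper: *dst = *src becomes a load followed by a store. */
void copy_word(int *dst, const int *src)
{
    assert(dst != 0 && src != 0);
    int temp = *src;   /* e.g. movl (%eax),%edx : memory -> register */
    *dst = temp;       /* e.g. movl %edx,(%ecx) : register -> memory */
}
```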

Simple memory addressing modes
• Normal (R) Mem[Reg[R]]
– Register R specifies memory address

movl (%ecx),%eax

• Displacement D(R) Mem[Reg[R]+D]


– Register R specifies start of memory region
– Constant displacement D specifies offset

movl 8(%ebp),%edx

Using simple addressing modes
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp             # Set Up
        movl %esp,%ebp
        pushl %ebx

        movl 12(%ebp),%ecx     # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)

        movl -4(%ebp),%ebx     # Finish
        movl %ebp,%esp
        popl %ebp
        ret
Understanding swap
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

Stack (in memory)
Offset   Contents
  12     yp
   8     xp
   4     Rtn adr
   0     Old %ebp    ← %ebp
  -4     Old %ebx

Register   Value
%ecx       yp
%edx       xp
%eax       t1
%ebx       t0

    movl 12(%ebp),%ecx # ecx = yp
    movl 8(%ebp),%edx # edx = xp
    movl (%ecx),%eax # eax = *yp (t1)
    movl (%edx),%ebx # ebx = *xp (t0)
    movl %eax,(%edx) # *xp = eax
    movl %ebx,(%ecx) # *yp = ebx
Understanding swap (step by step)
Memory, with %ebp = 0x104
Address   Offset   Contents
0x124              123
0x120              456
0x11c
0x118
0x114
0x110     12       0x120 (yp)
0x10c     8        0x124 (xp)
0x108     4        Rtn adr
0x104     0                    ← %ebp
0x100     -4

Register file as the loads execute
Instruction              Register file afterwards
movl 12(%ebp),%ecx       %ecx = 0x120 (yp)
movl 8(%ebp),%edx        %edx = 0x124 (xp)
movl (%ecx),%eax         %eax = 456 (*yp, t1)
movl (%edx),%ebx         %ebx = 123 (*xp, t0)

Remaining instructions
    movl %eax,(%edx) # *xp = eax
    movl %ebx,(%ecx) # *yp = ebx
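The trace can be replayed in C with the slide's values (123 at *xp, 456 at *yp). This is a small sketch: the function replay_slide_values is a hypothetical helper, and the real addresses will be whatever the compiler chooses rather than 0x124 and 0x120.

```c
#include <assert.h>

void swap(int *xp, int *yp)
{
    int t0 = *xp;   /* movl (%edx),%ebx */
    int t1 = *yp;   /* movl (%ecx),%eax */
    *xp = t1;       /* movl %eax,(%edx) */
    *yp = t0;       /* movl %ebx,(%ecx) */
}

/* Hypothetical helper: runs swap on the values used on the slides. */
void replay_slide_values(void)
{
    int x = 123;    /* word at 0x124 on the slides */
    int y = 456;    /* word at 0x120 on the slides */
    swap(&x, &y);
    assert(x == 456 && y == 123);   /* the two stores exchanged the words */
}
```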
Complete memory addressing modes
• Most General Form:
D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]
– D: Constant “displacement” 1, 2, or 4 bytes
– Rb: Base register: Any of 8 integer registers
– Ri: Index register: Any, except for %esp
• Unlikely you’d use %ebp, either
– S: Scale: 1, 2, 4, or 8 (why these numbers?)

• Special Cases
(Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]

Address computation examples
%edx 0xf000
%ecx 0x100

Expression Address Computation Address


0x8(%edx) 0xf000 + 0x8 0xf008
(%edx,%ecx) 0xf000 + 0x100 0xf100
(%edx,%ecx,4) 0xf000 + 4*0x100 0xf400
0x80(,%edx,2) 2*0xf000 + 0x80 0x1e080
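The table rows can be checked with a one-line C model of the general form D(Rb,Ri,S). The function name effective_address is an assumption made for this sketch; an omitted base or index register is modelled by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * Pass 0 for a register that is omitted from the expression. */
uint32_t effective_address(uint32_t d, uint32_t rb, uint32_t ri, uint32_t s)
{
    assert(s == 1 || s == 2 || s == 4 || s == 8);  /* only legal scales */
    return rb + s * ri + d;
}
```

With %edx = 0xf000 and %ecx = 0x100, effective_address(0x80, 0, 0xf000, 2) models 0x80(,%edx,2).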

Address computation instruction
• leal Src,Dest
– Src is address mode expression
– Set Dest to address denoted by expression

• Uses
– Computing addresses without a memory reference
• E.g., translation of p = &x[i];
– Computing arithmetic expressions of the form x + k*y
• k = 1, 2, 4, or 8
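Both uses of leal correspond to C expressions that compute a value without touching memory. The sketch below uses hypothetical function names, and the instruction shown in each comment is what a compiler might plausibly emit on ia32 (assuming 4-byte int), not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* p = &x[i]: address arithmetic only, no load from x[i].
 * A compiler may use something like: leal (%base,%index,4),%dest */
int *element_address(int *x, size_t i)
{
    return &x[i];
}

/* x + k*y with k restricted to the scale factors leal supports. */
int scaled_sum(int x, int y, int k)
{
    assert(k == 1 || k == 2 || k == 4 || k == 8);
    return x + k * y;   /* e.g. leal (%x,%y,k),%dest */
}
```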

Summary
• 32-bit x86 registers
• mov instruction: loads and stores
• memory addressing modes
– Example: swap()
• leal: address computation

7.5: ia32 integer arithmetic
Some arithmetic operations
• Two operand instructions:
Format Computation
addl Src,Dest Dest ← Dest + Src
subl Src,Dest Dest ← Dest - Src
imull Src,Dest Dest ← Dest * Src
sall Src,Dest Dest ← Dest << Src Also called shll
sarl Src,Dest Dest ← Dest >> Src Arithmetic
shrl Src,Dest Dest ← Dest >> Src Logical
xorl Src,Dest Dest ← Dest ^ Src
andl Src,Dest Dest ← Dest & Src
orl Src,Dest Dest ← Dest | Src

• No distinction between signed and unsigned int (why?)
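One way to answer the "why?": with two's complement, addition produces the same 32-bit pattern whether the operands are read as signed or unsigned, so a single addl serves both. Right shift is the exception, which is why both sarl and shrl exist. The following sketch uses hypothetical function names and models registers as uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* addl: the same bits come out for signed and unsigned interpretations. */
uint32_t add_bits(uint32_t a, uint32_t b)
{
    return a + b;                       /* wraps modulo 2^32 */
}

/* shrl: logical right shift, fills with zeros. */
uint32_t shr_logical(uint32_t v, unsigned n)
{
    assert(n < 32);
    return v >> n;
}

/* sarl: arithmetic right shift, fills with copies of the sign bit. */
uint32_t sar_arith(uint32_t v, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = v >> n;
    if ((v & 0x80000000u) && n > 0)
        shifted |= (uint32_t)~(UINT32_MAX >> n);  /* set the top n bits */
    return shifted;
}
```

For example, the bit pattern of -16 shifted right by 2 gives the pattern of -4 with sar_arith, but a large positive number with shr_logical.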

Some arithmetic operations
• One operand instructions
Format Computation
incl Dest Dest ← Dest + 1
decl Dest Dest ← Dest - 1
negl Dest Dest ← -Dest
notl Dest Dest ← ~Dest

• See book for more instructions

Using leal for arithmetic expressions
    int arith
    (int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    arith:
        pushl %ebp                  # Set Up
        movl %esp,%ebp

        movl 8(%ebp),%eax           # Body
        movl 12(%ebp),%edx
        leal (%edx,%eax),%ecx
        leal (%edx,%edx,2),%edx
        sall $4,%edx
        addl 16(%ebp),%ecx
        leal 4(%edx,%eax),%eax
        imull %ecx,%eax

        movl %ebp,%esp              # Finish
        popl %ebp
        ret
Understanding arith
    int arith
    (int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

Stack
Offset   Contents
  16     z
  12     y
   8     x
   4     Rtn adr
   0     Old %ebp    ← %ebp

    movl 8(%ebp),%eax # eax = x
    movl 12(%ebp),%edx # edx = y
    leal (%edx,%eax),%ecx # ecx = x+y (t1)
    leal (%edx,%edx,2),%edx # edx = 3*y
    sall $4,%edx # edx = 48*y (t4)
    addl 16(%ebp),%ecx # ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax # eax = 4+t4+x (t5)
    imull %ecx,%eax # eax = t5*t2 (rval)
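The compiled body can be mirrored line by line in C to see how y * 48 turns into a leal (3*y) followed by a shift by 4 (times 16). This is a sketch: arith_as_compiled and check_arith are hypothetical names, and the shift is written as a multiplication by 16 so that the C stays well defined for negative values.

```c
#include <assert.h>

int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Same computation, following the instruction order of the assembly. */
int arith_as_compiled(int x, int y, int z)
{
    int eax = x;               /* movl 8(%ebp),%eax        */
    int edx = y;               /* movl 12(%ebp),%edx       */
    int ecx = edx + eax;       /* leal (%edx,%eax),%ecx    */
    edx = edx + 2 * edx;       /* leal (%edx,%edx,2),%edx  */
    edx = edx * 16;            /* sall $4,%edx             */
    ecx += z;                  /* addl 16(%ebp),%ecx       */
    eax = 4 + edx + eax;       /* leal 4(%edx,%eax),%eax   */
    return eax * ecx;          /* imull %ecx,%eax          */
}

/* Hypothetical checker: both versions agree for inputs that do not overflow. */
void check_arith(int x, int y, int z)
{
    assert(arith(x, y, z) == arith_as_compiled(x, y, z));
}
```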
Another example
    int logical(int x, int y)
    {
        int t1 = x^y;
        int t2 = t1 >> 17;
        int mask = (1<<13) - 7;
        int rval = t2 & mask;
        return rval;
    }

    logical:
        pushl %ebp                  # Set Up
        movl %esp,%ebp

        movl 8(%ebp),%eax           # Body
        xorl 12(%ebp),%eax
        sarl $17,%eax
        andl $8185,%eax

        movl %ebp,%esp              # Finish
        popl %ebp
        ret

2^13 = 8192, 2^13 – 7 = 8185

    movl 8(%ebp),%eax # eax = x
    xorl 12(%ebp),%eax # eax = x^y (t1)
    sarl $17,%eax # eax = t1>>17 (t2)
    andl $8185,%eax # eax = t2 & 8185
7.6: 64-bit x86 architecture
Data representations: ia32 and x86-64
Sizes of C objects (in bytes)
C data type                      Typical 32-bit   Intel ia32   x86-64
char                             1                1            1
short                            2                2            2
int                              4                4            4
long                             4                4            8
long long                        8                8            8
float                            4                4            4
double                           8                8            8
long double                      8                10/12        10/16
char * (or any other pointer)    4                4            8


x86-64 integer registers
64-bit   32-bit       64-bit   32-bit
%rax     %eax         %r8      %r8d
%rbx     %ebx         %r9      %r9d
%rcx     %ecx         %r10     %r10d
%rdx     %edx         %r11     %r11d
%rsi     %esi         %r12     %r12d
%rdi     %edi         %r13     %r13d
%rsp     %esp         %r14     %r14d
%rbp     %ebp         %r15     %r15d

– Extend existing registers. Add 8 new ones.
– Make %ebp/%rbp general purpose
Instructions
• Long word l (4 Bytes) ↔ Quad word q (8 Bytes)

• New instructions:
– movl → movq
– addl → addq
– sall → salq
– etc.

• 32-bit instructions that generate 32-bit results


– Set higher order bits of destination register to 0
– Example: addl
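The zeroing rule can be modelled in C by treating a 64-bit register as a uint64_t. This is a sketch under that assumption; addl_model and addq_model are hypothetical names, not real instructions or library calls.

```c
#include <assert.h>
#include <stdint.h>

/* addl on a 64-bit register: add the low 32 bits, then zero the upper 32. */
uint64_t addl_model(uint64_t dest, uint32_t src)
{
    uint32_t low = (uint32_t)dest + src;   /* 32-bit add, wraps modulo 2^32 */
    uint64_t result = (uint64_t)low;       /* zero-extend into the register */
    assert((result >> 32) == 0);           /* high-order bits are always 0  */
    return result;
}

/* addq: full 64-bit add, upper bits participate normally. */
uint64_t addq_model(uint64_t dest, uint64_t src)
{
    return dest + src;
}
```

Note that any stale value in the upper half of the destination is discarded by the 32-bit form.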

Swap in 32-bit mode
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp             # Setup
        movl %esp,%ebp
        pushl %ebx

        movl 12(%ebp),%ecx     # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)

        movl -4(%ebp),%ebx     # Finish
        movl %ebp,%esp
        popl %ebp
        ret
Swap in 64-bit Mode
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        movl (%rdi), %edx
        movl (%rsi), %eax
        movl %eax, (%rdi)
        movl %edx, (%rsi)
        retq

• Operands passed in registers (why useful?)
– First (xp) in %rdi, second (yp) in %rsi
– 64-bit pointers
• No stack operations required
• 32-bit data
– Data held in registers %eax and %edx
– movl operation
Swap Long Ints in 64-bit Mode
    void swap_l
    (long int *xp, long int *yp)
    {
        long int t0 = *xp;
        long int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap_l:
        movq (%rdi), %rdx
        movq (%rsi), %rax
        movq %rax, (%rdi)
        movq %rdx, (%rsi)
        retq

• 64-bit data
– Data held in registers %rax and %rdx
– movq operation
– “q” stands for quad-word

7.7: Condition codes
Processor State (ia32, Partial)
• Information about currently executing program
– Temporary data ( %eax, … )
– Location of runtime stack ( %ebp,%esp )
– Location of current code control point ( %eip, … )
– Status of recent tests ( CF,ZF,SF,OF )

Registers
%eax, %ecx, %edx, %ebx, %esi, %edi    General purpose registers
%esp                                  Current stack top
%ebp                                  Current stack frame
%eip                                  Instruction pointer
CF ZF SF OF                           Condition codes
Condition codes (implicit setting)
• Single bit registers
CF Carry Flag (for unsigned)
ZF Zero Flag
SF Sign Flag (for signed)
OF Overflow Flag (for signed)

• Implicitly set (think of it as side effect) by arithmetic operations


Example: addl/addq Src,Dest ↔ t = a+b
– CF set if carry out from most significant bit (unsigned overflow)
– ZF set if t == 0
– SF set if t < 0 (as signed)
– OF set if two’s complement (signed) overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)

• Not set by lea instruction


• Full documentation link on course website
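The four flags for a 32-bit add can be computed in C as a cross-check of the rules above. This is a sketch, not the processor's actual logic: the struct addl_flags and function add_flags are hypothetical names, and the overflow test uses the usual sign-bit identity (operands share a sign that the result does not).

```c
#include <assert.h>
#include <stdint.h>

struct addl_flags { unsigned cf, zf, sf, of; };

/* Flags after t = a + b on 32-bit values. */
struct addl_flags add_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a + b;
    struct addl_flags f;
    f.cf = (t < a);                           /* carry out of bit 31       */
    f.zf = (t == 0);                          /* result is zero            */
    f.sf = (t >> 31) & 1;                     /* result negative as signed */
    f.of = ((~(a ^ b) & (a ^ t)) >> 31) & 1;  /* signed overflow           */
    assert(!(f.zf && f.sf));                  /* zero is never negative    */
    return f;
}
```

For instance, 0x7FFFFFFF + 1 sets SF and OF but not CF, while 0xFFFFFFFF + 1 sets CF and ZF but not OF.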

Condition Codes
(Explicit Setting: Compare)
• Explicit Setting by Compare Instruction

cmpl/cmpq Src2,Src1

cmpl b,a like computing a-b without setting destination

CF set if carry out from most significant bit (used for unsigned comparisons)
ZF set if a == b
SF set if (a-b) < 0 (as signed)
OF set if two’s complement (signed) overflow:
(a>=0 && b<0 && (a-b)<0)
|| (a<0 && b>0 && (a-b)>0)
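A compare can be modelled in C by computing a-b and discarding the result. In this sketch the overflow flag is derived from exact 64-bit arithmetic, so it does not depend on any particular sign formula; struct cmp_flags and the function cmpl are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct cmp_flags { int cf, zf, sf, of; };

/* Models cmpl b,a: flags of a - b, the difference itself is thrown away. */
struct cmp_flags cmpl(int32_t a, int32_t b)
{
    int64_t exact = (int64_t)a - (int64_t)b;   /* true mathematical result */
    uint32_t t = (uint32_t)a - (uint32_t)b;    /* wrapped 32-bit result    */
    struct cmp_flags f;
    f.cf = (uint32_t)a < (uint32_t)b;          /* borrow: unsigned a < b   */
    f.zf = (a == b);
    f.sf = (int)(t >> 31);                     /* sign bit of wrapped result */
    f.of = (exact > INT32_MAX || exact < INT32_MIN);
    assert(f.zf == (t == 0));                  /* ZF agrees with the wrapped result */
    return f;
}
```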

Condition Codes
(Explicit Setting: Test)
• Explicit Setting by Test instruction

testl/testq Src2,Src1

testl b,a like computing a&b w/o setting destination

– Sets condition codes based on value of Src1 & Src2


– Useful to have one of the operands be a mask

ZF set when a&b == 0


SF set when a&b < 0

Reading Condition Codes
• SetX Instructions
– Set single byte based on combinations of
condition codes

SetX Condition Description


sete ZF Equal / Zero
setne ~ZF Not Equal / Not Zero
sets SF Negative
setns ~SF Nonnegative
setg ~(SF^OF)&~ZF Greater (Signed)
setge ~(SF^OF) Greater or Equal (Signed)
setl (SF^OF) Less (Signed)
setle (SF^OF)|ZF Less or Equal (Signed)
seta ~CF&~ZF Above (unsigned)
setb CF Below (unsigned)
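The conditions in the table can be evaluated directly from the flags. The sketch below computes flags for a compare and then applies four of the table's formulas; the names struct cc, cmp_flags, and gt are assumptions for illustration, and the C functions setg, setl, seta, setb merely borrow the instruction names.

```c
#include <assert.h>
#include <stdint.h>

struct cc { unsigned cf, zf, sf, of; };

/* Flags after cmpl b,a (computes a - b). */
struct cc cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a - b;
    struct cc f;
    f.cf = (a < b);                        /* unsigned borrow */
    f.zf = (t == 0);
    f.sf = t >> 31;
    f.of = ((a ^ b) & (a ^ t)) >> 31;      /* signed overflow of a - b */
    return f;
}

unsigned setg(struct cc f) { return !(f.sf ^ f.of) & !f.zf; }  /* ~(SF^OF)&~ZF */
unsigned setl(struct cc f) { return f.sf ^ f.of; }             /* SF^OF        */
unsigned seta(struct cc f) { return !f.cf & !f.zf; }           /* ~CF&~ZF      */
unsigned setb(struct cc f) { return f.cf; }                    /* CF           */

/* Signed x > y decided only from the condition codes. */
int gt(int32_t x, int32_t y)
{
    unsigned result = setg(cmp_flags((uint32_t)x, (uint32_t)y));
    assert(result == (unsigned)(x > y));   /* agrees with C's own comparison */
    return (int)result;
}
```

The SF^OF term is what keeps signed comparisons correct when a-b overflows, for example when comparing INT32_MIN with 1.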
Reading Condition Codes (Cont.)
• setx Instructions:
Set single byte based on combination of condition codes
• One of 8 addressable byte registers (%al, %ah, %cl, %ch, %dl, %dh, %bl, %bh)
– Does not alter remaining 3 bytes
– Typically use movzbl to finish job

    int gt (int x, int y)
    {
        return x > y;
    }

Body

    movl 12(%ebp),%eax # eax = y
    cmpl %eax,8(%ebp) # Compare x : y
    setg %al # al = x > y
    movzbl %al,%eax # Zero rest of %eax
Reading Condition Codes: x86-64
• setx Instructions:
– Set single byte based on combination of condition codes
– Does not alter remaining 7 bytes of the 64-bit register

    int gt (long x, long y)
    {
        return x > y;
    }

    long lgt (long x, long y)
    {
        return x > y;
    }

Body (same for both)

    xorl %eax, %eax # eax = 0
    cmpq %rsi, %rdi # Compare x and y
    setg %al # al = x > y

Is %rax zero?
Yes: 32-bit instructions set high order 32 bits to 0!
Jumping
jX Instructions:
Jump to different part of code depending on condition codes

jX Condition Description
jmp 1 Unconditional
je ZF Equal / Zero
jne ~ZF Not Equal / Not Zero
js SF Negative
jns ~SF Non-negative
jg ~(SF^OF)&~ZF Greater (Signed)
jge ~(SF^OF) Greater or Equal (Signed)
jl (SF^OF) Less (Signed)
jle (SF^OF)|ZF Less or Equal (Signed)
ja ~CF&~ZF Above (unsigned)
jb CF Below (unsigned)
Summary
• Condition codes (C, Z, S, O)
• Explicit setting of condition codes
– Compare
– Test
• Reading condition codes
– setX
• Jumps
