
Lecture 4 - Assembly Programming - A 2025

The document provides an overview of assembly and machine language, focusing on the structure and basic instructions of assembly language, including the use of registers and addressing modes. It discusses the differences between AT&T and Intel assembly syntax and outlines basic arithmetic and logical instructions. Additionally, it highlights the limitations of assembly language in terms of data types and abstractions.

Uploaded by

idoamar2609

Computer Structure
Presentation 4
Assembly and Machine Language

Dr. Marina Kogan-Sadetsky

Based on lectures by Prof. Gal Kaminka and lectures by Bryant and O'Hallaron

Agenda

0  Welcome
1  Registers file
2  Data types and addressing modes (types and memory references)
3  Assembly basic instructions
4  C Calling Convention (functions in assembly)
5  Jump Tables
6  Pipeline (parallel execution of instructions)
Summary

Assembly basics

1952: IBM researcher Nathaniel Rochester wrote the first symbolic Assembler for the IBM 701 machine, which allowed programs to be written in short, readable commands.

1-1: one Assembly language instruction corresponds to one machine language instruction.

one Assembly instruction <-> one machine instruction

Assembly does not support (almost) any abstractions

Which tools does the Assembly language have?

data types ?          NO
arrays ?              NO
loops ?               NO
functions ?           NO
standard libraries ?  NO
data structures ?     NO
conditions ?          NO

What we do have:
- sizes
- direct work with RAM as an array of bytes
- direct access to registers

We would manually simulate all the rest.

AT&T vs Intel Assembly Syntax

We will learn two kinds of Assembly syntax:

- AT&T assembly syntax was used for Unix
- Intel assembly syntax was used for Windows

Registers

x86-64 Integer Registers

Let's start by better understanding the CPU register file.

The CPU register file is a set of registers; each register is a 64-bit array:

0 1 1 1 0 0 0 1 1 0 0 0 1 1 0 1 0 . . . 0 1 1 1 0 1 1 0 0 0 1 1

x86-64 Integer Registers

rax   64 bits   the whole register
eax   32 bits   low half of rax
ax    16 bits   low half of eax
ah     8 bits   high byte of ax
al     8 bits   low byte of ax

x86-64 Integer Registers

General purpose registers: Extended Accumulator, Base, Counter, Data, … (rax, rbx, rcx, rdx)
Index registers: Source Index, Destination Index (rsi, rdi)
Stack registers: RSP, RBP
    RSP - Stack pointer - contains the address of the last value pushed onto the stack
    RBP - Base pointer - contains the address of the current activation frame
Further general purpose registers

X86-64 Registers – (almost) full picture

In this course we study only a small number of registers.

Assembly Basics

Assembly Language Program
• Consists of processor instructions, assembler directives, and data
• Translated by the Assembler into machine language instructions (binary code)

Example:
Assembly code:
    mov al, 0x61        ; load 0x61 into the AL register

AL register after execution: 0 1 1 0 0 0 0 1

Machine code:
    10110000 01100001

    1011      binary code (opcode) of the instruction 'MOV'
    0         specifies whether the data is a byte ('0') or full size, 16/32/64 bits ('1')
    000       binary identifier of the AL register
    01100001  binary representation of 0x61

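The bit fields above can be put together mechanically. The following is a minimal sketch in C, not part of the lecture: the helper name encode_mov_r8_imm8 and the enum reg8 are hypothetical, and only the byte-sized form "mov r8, imm8" shown on the slide is covered.

```c
#include <stdint.h>

/* Register identifiers for the low 8-bit registers (3-bit field). */
enum reg8 { AL = 0, CL = 1, DL = 2, BL = 3 };

/* Hypothetical helper: encodes `mov r8, imm8` into two bytes.
 * Byte 0 is 1011 w rrr with w = 0 (byte operand); byte 1 is the immediate. */
void encode_mov_r8_imm8(enum reg8 reg, uint8_t imm, uint8_t out[2])
{
    out[0] = (uint8_t)(0xB0 | ((unsigned)reg & 0x7u)); /* 1011 0 rrr */
    out[1] = imm;
}
```

For AL and 0x61 this yields the bytes 0xB0 0x61, that is 10110000 01100001 as on the slide.
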
Basic Structure of Assembly Instruction

optional_label: opcode operands ; comment

• A label is equivalent to the address of the instruction in RAM. Non-local labels must be unique.
• The operand field may be optional. An operand is reg (register), mem (memory), or imm (immediate).

Example:
myLabel: mov al, 0x61       ; load 0x61 into the AL register
         …
         jmp myLabel        ; jmp makes RIP point to RAM[myLabel]

RAM (Text Segment):
    address 1024    myLabel: mov al, 0x61
    …
                    jmp myLabel

Example of Assembly directive

buffer: resb 4                  ; reserve 4 bytes

mov buffer, 2          =>  mov 2048, 2            wrong: the destination is the number 2048, not a memory location
mov [buffer], 2        =>  mov [2048], 2          wrong: the operand size is not specified
mov dword [buffer], 2  =>  mov dword [2048], 2    correct

'[2048]' means RAM[2048].
'dword' means double word (4 bytes) starting from buffer, so we refer to RAM[buffer] … RAM[buffer+3].

RAM after the correct instruction (buffer is at address 2048):
    2051  00000000
    2050  00000000
    2049  00000000
    2048  00000010   <- buffer

Appropriate C code:

int buffer;
buffer = 2;

Sizes

● The CPU does not operate on "types", only on sizes
● CPU instructions operate on the following sizes:
    ○ b = byte  = 8 bits
    ○ w = word  = 16 bits
    ○ l = dword = 32 bits
    ○ q = qword = 64 bits

In most Assembly instructions both operands must be of the same size. If one of the operands is a register, the size of the other operand is derived from it, because a register has a predefined size. Therefore in Intel style it is customary not to state the size in the instruction (although it is possible).

In AT&T style it is customary to state the operand size in the instruction name, although this is not mandatory.

Example:
    Intel style         AT&T style
    mov AL, 3           movb $3, %AL
    mov AX, 3           movw $3, %AX
    mov EAX, 3          movl $3, %EAX
    mov RAX, 3          movq $3, %RAX

Memory Addressing Modes

● An addressing mode is an expression that calculates an address in memory

Template for Intel style:   [base + scale*index + imm]
Template for AT&T style:    imm(base, index, scale)

    base    any integer register
    scale   1, 2, 4 or 8
    index   any integer register except rsp
    imm     an immediate (constant) displacement

Examples (suppose rcx = 1024 and rdx = 2):

    Intel style                      meaning                     AT&T style
    mov rax, qword [1024]            rax ← RAM[1024]             movq 1024, %rax
    mov rax, qword [rcx]             rax ← RAM[rcx]              movq (%rcx), %rax
    mov rax, qword [rcx-4]           rax ← RAM[rcx-4]            movq -4(%rcx), %rax
    mov rax, qword [rcx+rdx]         rax ← RAM[rcx+rdx]          movq (%rcx, %rdx), %rax
    mov rax, qword [rcx+4*rdx]       rax ← RAM[rcx+4*rdx]        movq (%rcx, %rdx, 4), %rax
    mov rax, qword [rcx+4*rdx+8]     rax ← RAM[rcx+4*rdx+8]      movq 8(%rcx, %rdx, 4), %rax
    mov rax, qword [4*rdx+8]         rax ← RAM[4*rdx+8]          movq 8(,%rdx, 4), %rax

For the second-to-last line: rcx+4*rdx+8 = 1024+4*2+8 = 1040, so RAX receives the 8 bytes of RAM starting at address 1040.

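The address arithmetic of the template can be checked with a few lines of C. This is only a sketch of the calculation, not of how the CPU implements it; the function name effective_address is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes base + scale*index + disp,
 * the address denoted by [base + scale*index + imm] in Intel style
 * or imm(base, index, scale) in AT&T style. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp)
{
    /* The hardware only allows these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps modulo 2^64, like address arithmetic. */
    return base + index * scale + (uint64_t)disp;
}
```

With rcx = 1024 and rdx = 2, effective_address(1024, 2, 4, 8) gives 1040, matching the slide.
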
Memory Addressing Modes

Example of using an addressing mode to scan an array:

C code:
int A[5];
for (int i = 0; i < 5; i++)
    A[i]++;

Assembly code (Intel style):
A: resd 5
    mov ebx, 0
startLoop:
    cmp ebx, 5
    je endLoop
    inc dword [A+4*ebx]
    inc ebx
    jmp startLoop
endLoop:

RAM layout of A (4 bytes per element):
    2064  A[4]
    2060  A[3]
    2056  A[2]
    2052  A[1]
    2048  A[0]
Memory Addressing Modes

All the options for an addressing mode:

https://reverseengineering.stackexchange.com/questions/22115/understanding-operand-forms

Sanity Test

Question: for each line of Assembly code, propose an equivalent line of C code.
Suppose rax ↔ x, rdx ↔ y.

    Assembly                  C
    movq $0x4, %rax           x = 0x4;
    movq $-147, (%rax)        *x = -147;
    movq %rax, %rdx           y = x;
    movq %rax, (%rdx)         *y = x;
    movq (%rax), %rdx         y = *x;

Basic Arithmetical Instructions

ADD - add integers
Example:
    add RAX, RBX        ; RAX gets the value RAX + RBX
    addq %RBX, %RAX

SUB - subtract integers
Example:
    sub RAX, RBX        ; RAX gets the value RAX - RBX
    subq %RBX, %RAX

Note the added 'q' in the AT&T instruction name: it is the size of the instruction's arguments.

Basic Arithmetical Instructions

INC - increment integer
Example:
    inc RAX             ; RAX gets the value RAX + 1
    incq %RAX

DEC - decrement integer
Example:
    dec byte [buffer]   ; the first byte of RAM[buffer] is decremented
    decb buffer

RAM (4 bytes at buffer): the first byte changes from 00000010 to 00000001; the other three bytes stay 00000000.

Note that in AT&T style a bare label is a memory reference, so no square brackets are needed.

Basic Logical Instructions

OR – bitwise or – the bit at index i of the destination gets '1' if the bit at index i of the source or of the destination is '1'; otherwise '0'
Example:
    Intel style                                         AT&T style
    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b00000010                                  movb $0b00000010, %BL
    or AL, BL       ; AL gets the value 0b11111110      orb %BL, %AL

AND – bitwise and – the bit at index i of the destination gets '1' if the bits at index i of both the source and the destination are '1'; otherwise '0'
Example:
    Intel style                                         AT&T style
    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b00000010                                  movb $0b00000010, %BL
    and AL, BL      ; AL gets the value 0               andb %BL, %AL

Basic Logical Instructions

NOT – one's complement negation – inverts all the bits
Example:
    Intel style                                         AT&T style
    mov AL, 0b11111110                                  movb $0b11111110, %AL
    not AL          ; AL gets the value 00000001b       notb %AL
                    ; (11111110b + 00000001b = 11111111b)

NEG – two's complement negation – inverts all the bits, and adds 1
Example:
    Intel style                                         AT&T style
    mov AL, 0b11111110                                  movb $0b11111110, %AL
    neg AL          ; AL gets not(11111110b) + 1        negb %AL
                    ;   = 00000001b + 1 = 00000010b
                    ; (11111110b + 00000010b = 100000000b = 0 in 8 bits)

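The two identities in the comments above (a value plus its one's complement is all ones; a value plus its two's complement is zero in 8 bits) can be verified in C. The sketch below models an 8-bit register with uint8_t; the function names not8 and neg8 are illustrative only.

```c
#include <stdint.h>

/* NOT: invert all 8 bits. */
uint8_t not8(uint8_t v)
{
    return (uint8_t)~v;
}

/* NEG: invert all 8 bits and add 1 (two's complement), truncated to 8 bits. */
uint8_t neg8(uint8_t v)
{
    return (uint8_t)(~v + 1u);
}
```

For v = 11111110b, not8 returns 00000001b and neg8 returns 00000010b, as on the slide.
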
CMP – Compare Instruction

CMP – performs a 'mental' SUB

Affects RFLAGS as if the subtraction had taken place. The calculation result is discarded.
    ZF ← 1 when dest − src == 0
    SF ← 1 when the result of dest − src is negative (its most significant bit is 1)

Examples:
    Intel style                                         AT&T style
    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b00000010                                  movb $0b00000010, %BL
    cmp AL, BL      ; ZF (zero flag) gets the value 0   cmpb %BL, %AL

    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b11111100                                  movb $0b11111100, %BL
    cmp AL, BL      ; ZF (zero flag) gets the value 1   cmpb %BL, %AL

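As a rough model of how a compare sets flags without keeping a result, the following C sketch computes ZF, SF and CF for 8-bit operands. It also covers TEST, which does the same with AND instead of SUB. The struct and function names are assumptions made for illustration; OF and the other flags are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* A reduced view of RFLAGS: zero, sign and carry flags only. */
struct flags { bool zf, sf, cf; };

/* cmp dest, src: flags of dest - src; the difference itself is discarded. */
struct flags cmp8(uint8_t dest, uint8_t src)
{
    uint8_t r = (uint8_t)(dest - src);
    struct flags f = { r == 0, (r & 0x80) != 0, dest < src /* borrow */ };
    return f;
}

/* test dest, src: flags of dest & src; the result is discarded, CF is cleared. */
struct flags test8(uint8_t dest, uint8_t src)
{
    uint8_t r = (uint8_t)(dest & src);
    struct flags f = { r == 0, (r & 0x80) != 0, false };
    return f;
}
```

cmp8(0xFC, 0x02) leaves ZF = 0 and cmp8(0xFC, 0xFC) sets ZF = 1, matching the two CMP examples.
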
TEST – Logical Compare Instruction

TEST – performs a 'mental' AND

Affects RFLAGS as if the AND had taken place. The calculation result is discarded.
    ZF ← 1 when src & dest == 0
    SF ← 1 when the most significant bit of src & dest is 1

Examples:
    Intel style                                         AT&T style
    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b00000010                                  movb $0b00000010, %BL
    test AL, BL     ; ZF (zero flag) gets the value 1   testb %BL, %AL

    mov AL, 0b11111100                                  movb $0b11111100, %AL
    mov BL, 0b11111100                                  movb $0b11111100, %BL
    test AL, BL     ; ZF (zero flag) gets the value 0   testb %BL, %AL

Shift – Bitwise Shift

SHL, SHR – Bitwise Logical Shifts
– vacated bits are filled with zero
– the last bit shifted out enters the CF flag

Example:
    Intel style                                                         AT&T style
    mov AL, 0b10110111  ; AL = 10110111b                                movb $0b10110111, %AL
    shr AL, 1           ; shift right 1 bit: AL = 01011011b, CF = 1     shrb $1, %AL
    shl AL, 4           ; shift left 4 bits: AL = 10110000b, CF = 1     shlb $4, %AL

A shift performs fast division / multiplication by 2.

SAL, SAR – Bitwise Arithmetic Shifts
– vacated bits are filled with zero for SAL, and with copies of the MSB for SAR
– the last bit shifted out enters the CF flag

Example:
    Intel style                                                                     AT&T style
    mov AL, 0b10110111  ; AL = 10110111b                                            movb $0b10110111, %AL
    sar AL, 3           ; shift arithmetic right 3 bits: AL = 11110110b, CF = 1     sarb $3, %AL
    sal AL, 2           ; shift arithmetic left 2 bits: AL = 11011000b, CF = 1      salb $2, %AL

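The shift results and carry values above can be reproduced with a small C model of an 8-bit register. This is a sketch under the assumption that the shift count is between 1 and 7; the type and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an 8-bit shift: the new register value and the carry flag. */
struct shift_result { uint8_t value; bool cf; };

/* SHR: logical right shift, zeros enter from the left. count must be 1..7. */
struct shift_result shr8(uint8_t v, unsigned count)
{
    struct shift_result r = { (uint8_t)(v >> count),
                              ((v >> (count - 1)) & 1u) != 0 };
    return r;
}

/* SHL / SAL: left shift, zeros enter from the right. count must be 1..7. */
struct shift_result shl8(uint8_t v, unsigned count)
{
    struct shift_result r = { (uint8_t)(v << count),
                              ((v >> (8 - count)) & 1u) != 0 };
    return r;
}

/* SAR: arithmetic right shift, copies of the MSB enter from the left. */
struct shift_result sar8(uint8_t v, unsigned count)
{
    uint8_t fill = (v & 0x80) ? (uint8_t)(0xFF << (8 - count)) : 0;
    struct shift_result r = { (uint8_t)((v >> count) | fill),
                              ((v >> (count - 1)) & 1u) != 0 };
    return r;
}
```

Starting from 10110111b, shr8 by 1 gives 01011011b, shl8 of that by 4 gives 10110000b, and sar8 of the original by 3 gives 11110110b, each with CF = 1.
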
Basic Operations – Summary

⬛ Two Operand Instructions: Source and Destination

    Intel style       AT&T style        Calculation
    add Dest, Src     add Src, Dest     Dest = Dest + Src
    sub Dest, Src     sub Src, Dest     Dest = Dest − Src
    sal Dest, Src     sal Src, Dest     Dest = Dest << Src
    sar Dest, Src     sar Src, Dest     Dest = Dest >> Src   (arithmetic)
    shr Dest, Src     shr Src, Dest     Dest = Dest >> Src   (logical)
    xor Dest, Src     xor Src, Dest     Dest = Dest ^ Src
    and Dest, Src     and Src, Dest     Dest = Dest & Src
    or Dest, Src      or Src, Dest      Dest = Dest | Src

⬛ One operand instructions

    inc Dest          Dest = Dest + 1
    dec Dest          Dest = Dest − 1
    neg Dest          Dest = -Dest
    not Dest          Dest = ~Dest

⬛ Watch out for argument order! Although the order of the arguments differs between the two styles, the calculation is the same.
⬛ The two arguments of an instruction may not both be in memory; one of them must be a numeric value or a register.
⬛ No distinction between signed and unsigned integers

JMP – Unconditional Jump

JMP tells the processor that the next instruction to be executed is located at the label that is given as part of the jmp instruction.

What happens when we run the following code?

Example:
    mov eax, 1
inc_again:
    inc eax
    jmp inc_again       ; the RIP register gets the inc_again label (address)
    mov eax, 5          ; this instruction is never reached from this code

This is an infinite loop! Equivalent C code:

int x = 1;
while (true)
    x++;

j<cond> – Conditional Jump

• execution is transferred to the target instruction only if the specified condition is satisfied
• usually, the condition being tested is the result of the last arithmetic or logic operation

Example:
    mov eax, 1
inc_again:
    cmp eax, 10
    jge after_loop      ; if eax >= 10, go to after_loop
    inc eax
    jmp inc_again       ; go back to loop
after_loop:

jge = jump if greater or equal, that is, jump if eax − 10 >= 0.

What is the equivalent C code?

int x = 1;
while (x < 10)
    x++;

In C it is possible (but really not advisable) to use labels and goto. We do so here only to mimic how the code is written in assembly:

int x = 1;
inc_again:
    if (x >= 10)
        goto after_loop;
    x++;
    goto inc_again;
after_loop:
j<cond> – Conditional Jump

    Instruction   Description                     Flags
    JO            Jump if overflow                OF = 1
    JNO           Jump if not overflow            OF = 0
    JS            Jump if sign                    SF = 1
    JNS           Jump if not sign                SF = 0
    JE            Jump if equal                   ZF = 1
    JZ            Jump if zero                    ZF = 1
    JNE           Jump if not equal               ZF = 0
    JNZ           Jump if not zero                ZF = 0
    JB            Jump if below                   CF = 1
    JNAE          Jump if not above or equal      CF = 1
    JC            Jump if carry                   CF = 1
    JNB           Jump if not below               CF = 0
    JAE           Jump if above or equal          CF = 0
    JNC           Jump if not carry               CF = 0
    JBE           Jump if below or equal          CF = 1 or ZF = 1
    JNA           Jump if not above               CF = 1 or ZF = 1
    JA            Jump if above                   CF = 0 and ZF = 0
    JNBE          Jump if not below or equal      CF = 0 and ZF = 0
    JL            Jump if less                    SF <> OF
    JNGE          Jump if not greater or equal    SF <> OF
    JGE           Jump if greater or equal        SF = OF
    JNL           Jump if not less                SF = OF
    JLE           Jump if less or equal           ZF = 1 or SF <> OF
    JNG           Jump if not greater             ZF = 1 or SF <> OF
    JG            Jump if greater                 ZF = 0 and SF = OF
    JNLE          Jump if not less or equal       ZF = 0 and SF = OF
    JCXZ          Jump if CX register is 0        CX = 0
    JECXZ         Jump if ECX register is 0       ECX = 0

Note that the list above is partial; the slide links to the full list.
Conditional Branch Example

int max(int x, int y)
{
    if (x > y)
        return x;
    else
        return y;
}

max:
    # suppose rdi ↔ x, rsi ↔ y
    cmpl %esi, %edi         # edi - esi (the computational part)
    jle else                # jle – jump if less or equal
    movl %edi, %eax         # the high 32 bits of the RAX register are set to '0'
    ret
else:
    movl %esi, %eax
    ret

d<size> – declare initialized data – AT&T

.byte .word .long .quad
    Define char, short, int or long, respectively
.space
    Reserve a specific number of bytes
.zero
    Reserve space and initialize it with zero bytes
.string
    Define string constants
.fill x, y, val
    Define x elements of size y with value val

Example: global variables definition

C code:
int x;
int y = 0;
char str[] = "Hi\n";        // in a C string, the '\0' character is added automatically
int A[10] = {0};
int main {
    …
}

Assembly code:
.section .bss
x:   .space 4
.section .data
y:   .zero 4
str: .string "Hi\n"         # in an Assembly string, the '\0' character is added automatically
A:   .fill 10, 4, 0
.section .text
.globl main
main:

d<size> – declare initialized data - Intel (self-study)

    Directive   size           size in bytes
    DB          byte           1 byte
    DW          word           2 bytes
    DD          double word    4 bytes
    DQ          quadword       8 bytes

Examples:
x: db 0x55                  ; just the byte 0x55
x: db 0x55,0x56,0x57        ; three bytes in succession
x: db 'a'                   ; character constant 0x61 (ascii code of 'a')
x: db 'hello',10, 0         ; string constant
x: dw 0x1234                ; 0x34 0x12
x: dw 'A'                   ; 0x41 0x00 – complete to word
x: dw 'ABC'                 ; 0x41 0x42 0x43 0x00 – complete to word
x: dd 0x12345678            ; 0x78 0x56 0x34 0x12
Sanity Test

Question: what does the program do?

section .data
x dd 4
y dd 2
z dd 3

section .text
global _start
_start:
    mov ecx, [x]
    cmp ecx, [y]
    jg check_z
    mov ecx, [y]

check_z:
    cmp ecx, [z]
    jg exit
    mov ecx, [z]

exit:
    mov rax, 1
    int 0x80
Sanity Test

Question: what does the program do?

The program finds the maximum among x, y, and z and stores it in ecx.

Let's go through the program from the previous slide step by step:
1. The program defines three variables in the data section:
    1. x = 4
    2. y = 2
    3. z = 3
2. At the _start label:
    1. load the value of x (4) into ecx
    2. compare ecx (4) with y (2)
    3. since 4 > 2, jump to check_z
3. In check_z:
    1. compare ecx (still 4) with z (3)
    2. since 4 > 3, jump to exit
4. In exit:
    1. move 1 into rax (this is the system call number for exit on Linux when using int 0x80)
    2. trigger a system interrupt with int 0x80, which exits the program
lea – Load Effective Address Instruction

⬛ lea Src, Dst
    ▪ Src is an address mode expression
    ▪ This instruction obtains an address and saves it for later use

⬛ Uses
    ▪ Computing addresses without a memory reference
        ▪ e.g., translation of p = &x[i];
    ▪ Computing arithmetic expressions of the form x + k*y
        ▪ k = 1, 2, 4, or 8

The multiplication involved in address calculations is handled by the Address Generation Unit (AGU) and not by the ALU. The AGU is a specialized part of the CPU that is optimized for computing addresses efficiently.

⬛ Example: lea can be used to perform calculations

long m12(long x)
{
    return x*12;
}

m12:
    # suppose rdi ↔ x
    leaq (%rdi,%rdi,2), %rax    # t ← x + x*2   (the computational part)
    salq $2, %rax               # t << 2
    ret

Overall, with two instructions we computed x*12.
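The two-instruction decomposition of x*12 can be written out in C to see why it works: x + 2x = 3x, and shifting left by 2 multiplies by 4. The function name m12_via_lea is made up for this sketch; the shift is done on an unsigned copy because left-shifting a negative signed value is undefined in C.

```c
/* Mirrors the assembly of m12 step by step. */
long m12_via_lea(long x)
{
    long t = x + x * 2;                     /* leaq (%rdi,%rdi,2), %rax : t = 3x */
    t = (long)((unsigned long)t << 2);      /* salq $2, %rax            : t = 12x */
    return t;
}
```

For any x whose product fits in a long, m12_via_lea(x) equals x*12; for example, 5 gives 60.
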
C Calling Convention

Stack Operations

PUSH <r/m16/32/64, imm8/16/32> – push data to the Stack
– decrements RSP by 2/4/8 bytes (according to the operand size)
– stores the operand value at the RSP address on the Stack (in Little Endian manner)

POP <r/m16/32/64> – load data from the Stack
– reads the value at the RSP address on the Stack (in Little Endian manner)
– increments RSP by 2/4/8 bytes (according to the operand size)

Note: RSP points at the last value pushed onto the stack.

Example:
    movw $0x1020, %ax
    pushw %ax
    movw $0x3040, %ax
    pushw %ax
    movl $0x50607060, %eax
    pushl %eax              # 4-byte push as on the slide; 64-bit mode itself encodes only 2- and 8-byte pushes
    popq %rbx

Stack contents step by step (RSP starts just above address 3036):

    after pushw %ax (0x1020):       RSP = 3035    RAM[3036] = 0x10, RAM[3035] = 0x20
    after pushw %ax (0x3040):       RSP = 3033    RAM[3034] = 0x30, RAM[3033] = 0x40
    after pushl %eax (0x50607060):  RSP = 3029    RAM[3032] = 0x50, RAM[3031] = 0x60, RAM[3030] = 0x70, RAM[3029] = 0x60
    after popq %rbx:                RSP = 3037    RBX = 0x1020304050607060 (the 8 bytes at 3029 … 3036)

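The push/pop rules can be simulated over a byte array to check the little-endian layout. This is a minimal sketch, not an exact model of the CPU: the struct machine, the RAM size of 4096 bytes, and the function names stack_push and stack_pop are assumptions, and the initial RSP of 3037 is inferred from the slide's diagram.

```c
#include <stdint.h>

enum { RAM_SIZE = 4096 };   /* assumed size, large enough for the example */

struct machine {
    uint8_t  ram[RAM_SIZE];
    uint64_t rsp;
};

/* PUSH: decrement RSP by the operand size, then store little-endian. */
void stack_push(struct machine *m, uint64_t value, unsigned size)
{
    m->rsp -= size;
    for (unsigned i = 0; i < size; i++)
        m->ram[m->rsp + i] = (uint8_t)(value >> (8 * i)); /* low byte first */
}

/* POP: read little-endian from RSP, then increment RSP by the operand size. */
uint64_t stack_pop(struct machine *m, unsigned size)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint64_t)m->ram[m->rsp + i] << (8 * i);
    m->rsp += size;
    return value;
}
```

Starting with rsp = 3037 and pushing 0x1020 (2 bytes), 0x3040 (2 bytes) and 0x50607060 (4 bytes) leaves rsp at 3029; an 8-byte pop then returns 0x1020304050607060 and restores rsp to 3037.
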
X86-64 C Calling Convention

Some of the registers have a special role for functions:

    rax           return value     caller saved
    rbx                            callee saved
    rdi           argument #1      caller saved
    rsi           argument #2      caller saved
    rdx           argument #3      caller saved
    rcx           argument #4      caller saved
    r8            argument #5      caller saved
    r9            argument #6      caller saved
    rsp
    rbp
    r10, r11                       caller saved
    r12 … r15                      callee saved

Let's explore how this code is translated to Assembly and executed. main() is the caller, func() is the callee.

int result;
void main() {
    result = func(1, 2);
}
int func(int x, int y)
{
    int sum;
    sum = x + y;
    return sum;
}

Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
100 232 240
RBP 240
232 RSP
Stack 224
216
212

suppose …
int result; the section result: resd 1 130
following .bss ret 129
void main() { position of
result = func(1, pop rbp 127
RSP and
2); RBP mov rsp, rbp 126
} suppose mov eax, [rbp-4] 124
int func(int x, int y) the mov [rbp-4], edi 120
{ following add edi, esi 116
int sum; position of sub rsp, 4 114
sum = x + y; code and section 112
mov rbp, rsp
return sum; data
} .text func push rbp 110
… 109
mov [result], eax 107
call func 104
50 mov esi, 2 102
main mov edi, 1 100 RIP
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
100 232 240
RBP 240
232 RSP
224
section .bss 216
result: resd 1 212
section .text
main: ; caller code …
int result; result: resd 1 130
ret 129
void main() {
result = func(1, pop rbp 127
2); mov rsp, rbp 126
} mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
{ add edi, esi 116
int sum; sub rsp, 4 114
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
51 mov esi, 2 102
main mov edi, 1 100 RIP
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
104 232 240
RBP 240
232 RSP
224
section .bss 216
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
ret 129
void main() {
result = func(1, pop rbp 127
2); mov rsp, rbp 126
} mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
{ add edi, esi 116
int sum; sub rsp, 4 114
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104 RIP

52 mov esi, 2 102


main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
107 232 240
RBP 240
232 RSP
224
section .bss 216
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
int sum; sub rsp, 4 114
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107 RIP
call func 104
53 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
110 224 240
RBP 240
232
107 - return address 224 RSP

section .bss 216


result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
int sum; sub rsp, 4 114
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110 RIP
}
… 109
mov [result], eax 107
call func 104
54 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
112 224 240
RBP 240
232
107 - return address 224 RSP

section .bss 216


result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
sum = x + y; 112 RIP
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
55 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
112 216 240
RBP 240
232
107 - return address 224
section .bss 240 - RBP old value 216 RSP
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
sum = x + y; 112 RIP
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
56 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
114 216 240
RBP 240
232
107 - return address 224
section .bss 240 - RBP old value 216 RSP
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114 RIP
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
57 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
114 216 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216 RSP
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114 RIP
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
58 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
116 216 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216 RSP
result: resd 1 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116 RIP
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
59 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
116 212 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216
result: resd 1 212 RSP
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116 RIP
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
func push rbp 110
}
… 109
mov [result], eax 107
call func 104
60 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
124 212 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216
result: resd 1 212 RSP
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124 RIP
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov [result], eax 107
call func 104
61 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
124 212 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216
result: resd 1 3 212 RSP
The stack region marked here is the activation frame of func.
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124 RIP
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov [result], eax 107
call func 104
62 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
126 212 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216
result: resd 1 3 212 RSP
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126 RIP
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
call func 104
63 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
127 212 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216
result: resd 1 3 212 RSP
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127 RIP
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
64 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
127 216 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216 RSP
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127 RIP
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
65 mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
129 216 216
240
232
107 - return address 224
section .bss RBP 240 - RBP old value 216 RSP
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129 RIP
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
pop rbp ; restore activation frame of
66 main() mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
129 224 240
RBP 240
232
107 - return address 224 RSP
section .bss 240 - RBP old value 216
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129 RIP
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
pop rbp ; restore activation frame of
67 main() mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
130 224 240
RBP 240
232
107 - return address 224 RSP
section .bss 240 - RBP old value 216
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130 RIP
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
pop rbp ; restore activation frame of
68 main()
ret ; return from the function mov esi, 2 102
main mov edi, 1 100
Registers file
X86-64 C Calling Convention RIP RSP RBP RAM
107 232 240
RBP 240
232 RSP
107 - return address 224
section .bss 240 - RBP old value 216
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1 130
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107 RIP
mov rsp, rbp ; close function activation frame
call func 104
pop rbp ; restore activation frame of
69 main()
ret ; return from the function mov esi, 2 102
main mov edi, 1 100
X86-64 C Calling Convention RAM

RBP 240
232 RSP
The highlighted lines together define the C Calling Convention.
107 - return address 224
section .bss 240 - RBP old value 216
result: resd 1 3 212
section .text
main: ; caller code …
int result; mov edi, 1 ; x – first argument
result: resd 1
mov esi, 2 ; y – second argument
call func ; push return address into Stack ret 129
void main() {
result = func(1, ; move RIP to point to func code pop rbp 127
2); mov [result], eax ; retrieve return value from EAX mov rsp, rbp 126
} … mov eax, [rbp-4] 124
int func(int x, int y) mov [rbp-4], edi 120
func: ; callee code
{ add edi, esi 116
push rbp ; backup RBP
int sum; sub rsp, 4 114
mov rbp, rsp ; set RBP to Func activation frame
sum = x + y; 112
sub rsp, 4 ; allocate space for local variable sum mov rbp, rsp
return sum;
add edi, esi ; calculate x+y func push rbp 110
} mov [rbp-4], edi ; set sum to be x+y … 109 RIP
mov eax, [rbp-4] ; put return value into (part of) RAX
mov [result], eax 107
mov rsp, rbp ; close function activation frame
call func 104
pop rbp ; restore activation frame of
70 main()
ret ; return from the function mov esi, 2 102
main mov edi, 1 100
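The walkthrough above can be replayed in C as a rough sketch. The `Machine` struct, the helper names and the 256-byte RAM are assumptions made for illustration only; the addresses (100, 107, 110, 232, 240) and register values are the ones shown on the slides.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the machine state used in the slides. */
typedef struct {
    uint64_t rip, rsp, rbp;
    uint32_t eax, edi, esi;
    uint8_t  ram[256];   /* tiny RAM; the slides' stack addresses (212..240) fit in it */
} Machine;

/* push: decrement RSP by 8, then store the value at the new top of stack. */
void push64(Machine *m, uint64_t v) {
    m->rsp -= 8;
    memcpy(&m->ram[m->rsp], &v, 8);
}

/* pop: load the value at the top of stack, then increment RSP by 8. */
uint64_t pop64(Machine *m) {
    uint64_t v;
    memcpy(&v, &m->ram[m->rsp], 8);
    m->rsp += 8;
    return v;
}

/* Replays main's call to func and func's body, instruction by instruction. */
Machine run_call_sequence(void) {
    Machine m = { .rip = 100, .rsp = 232, .rbp = 240 };
    m.edi = 1;                              /* mov edi, 1 */
    m.esi = 2;                              /* mov esi, 2 */
    push64(&m, 107);                        /* call func: push return address, RSP = 224 */
    m.rip = 110;                            /*            and jump to func */
    push64(&m, m.rbp);                      /* push rbp: RSP = 216 */
    m.rbp = m.rsp;                          /* mov rbp, rsp */
    m.rsp -= 4;                             /* sub rsp, 4: RSP = 212 */
    m.edi += m.esi;                         /* add edi, esi */
    memcpy(&m.ram[m.rbp - 4], &m.edi, 4);   /* mov [rbp-4], edi */
    memcpy(&m.eax, &m.ram[m.rbp - 4], 4);   /* mov eax, [rbp-4] */
    m.rsp = m.rbp;                          /* mov rsp, rbp: RSP = 216 */
    m.rbp = pop64(&m);                      /* pop rbp: RBP = 240, RSP = 224 */
    m.rip = pop64(&m);                      /* ret: RIP = 107, RSP = 232 */
    return m;
}
```

After the sequence, RSP and RBP are back at the values main started with, RIP holds the return address, and EAX carries the result.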
C Calling Convention – sumstore() example

C code that we want to translate to Assembly:

long plus(long x, long y);

void sumstore (long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}

Matching Assembly code:

sumstore:
    call plus
    movq %rax, (%rdx)
    ret

rdx is caller-saved, so plus is free to overwrite it. sumstore must therefore back up the value of rdx before the call; the code above does not do this yet.

Register Value
%rdi x
%rsi y
%rdx dest
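A small C sketch makes the requirement concrete: `dest` is used after the call to `plus`, so whatever holds it must survive that call. The body of `plus` below is an assumption (the slide only declares it). In assembly, the usual way to keep `dest` alive is to copy it into a callee-saved register such as `%rbx` before the call.

```c
/* Hypothetical definition; the slide only declares plus(). */
long plus(long x, long y) {
    return x + y;
}

void sumstore(long x, long y, long *dest) {
    long t = plus(x, y);   /* dest must still be intact after this call returns */
    *dest = t;             /* store through the preserved pointer */
}
```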
Sanity Test

Question
Translate the given C code into the matching Assembly code.

void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

Option 1:
swap:
    mov eax, dword [rdi]
    mov edx, dword [rsi]
    mov dword [rdi], edx
    mov dword [rsi], eax
    ret

Option 2:
swap:
    mov eax, dword [rdi]
    mov dword [rdi], [rsi]
    mov dword [rsi], eax
    ret

Option 3:
swap:
    mov eax, word [rdi]
    mov edx, word [rsi]
    mov word [rdi], edx
    mov word [rsi], eax
    ret

Option 4:
swap:
    mov eax, rdi
    mov edx, rsi
    mov rdi, edx
    mov rsi, eax
    ret

The instruction mov dword [rdi], [rsi] is invalid, because both of its operands are in memory.

Register Value
%rdi x
%rsi y
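As a rough C model of the variant that loads both 32-bit values into registers and then stores them back, each statement below mirrors one `mov`. The local names `eax` and `edx` are only stand-ins for the registers.

```c
#include <stdint.h>

/* Two 32-bit loads into temporaries, then two 32-bit stores. */
void swap(int *x, int *y) {
    int32_t eax = *x;   /* mov eax, dword [rdi] */
    int32_t edx = *y;   /* mov edx, dword [rsi] */
    *x = edx;           /* mov dword [rdi], edx */
    *y = eax;           /* mov dword [rsi], eax */
}
```

Every memory access has a register on the other side, which is why no memory-to-memory `mov` is needed.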
Reading Condition Codes

SetX – set the destination 8-bit register to 0 or 1 according to a combination of flags

SetX    Condition       Description
sete    ZF              Equal / Zero
setne   ~ZF             Not Equal / Not Zero
sets    SF              Negative
setns   ~SF             Non-negative
setb    CF              Below (unsigned)
setae   ~CF             Above or equal (unsigned)
seta    ~CF&~ZF         Above (unsigned)
seto    OF              Overflow (signed)
setno   ~OF             Not Overflow (signed)
setg    ~(SF^OF)&~ZF    Greater (signed)
setge   ~(SF^OF)        Greater or Equal (signed)
setl    (SF^OF)         Less (signed)
setle   (SF^OF) | ZF    Less or Equal (signed)

With a setX instruction we can obtain the value of a flag or of a combination of flags. The destination operand must be an 8-bit register, and it receives 0 or 1 according to the result of the computation.
• Use one of the addressable byte registers
• Does not alter the remaining bytes of the register
• Typically use movzbl (or movzbq) to finish the job

Example: the predicate function gt, "is x > y?"

long gt (int x, int y)
{
    return x > y;
}

gt:
    cmpl %esi, %edi     # compare x : y
    setg %al            # al = (x > y); al holds the answer
    movzbq %al, %rax    # extend al over all of rax (zero the remaining bits), because the return value is passed in rax
    ret
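The flag combinations in the table can be checked with a small C model. This is only a sketch: the `Flags` struct and `cmp32` are illustrative names, and `cmp32` assumes that `cmp a, b` sets the flags as the 32-bit subtraction `a - b` would.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool zf, sf, cf, of; } Flags;

/* Flags produced by "cmp a, b", i.e. by computing a - b on 32 bits. */
Flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                          /* wraps modulo 2^32 */
    Flags f;
    f.zf = (r == 0);                               /* result is zero */
    f.sf = (r >> 31) != 0;                         /* sign bit of the result */
    f.cf = ua < ub;                                /* unsigned borrow */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;    /* signed overflow */
    return f;
}

bool setg(Flags f) { return !(f.sf ^ f.of) && !f.zf; }   /* ~(SF^OF)&~ZF */
bool setl(Flags f) { return f.sf ^ f.of; }               /* SF^OF */
bool seta(Flags f) { return !f.cf && !f.zf; }            /* ~CF&~ZF */
bool setb(Flags f) { return f.cf; }                      /* CF */

/* Same predicate as the slide's gt: cmp, setg, then widen to long. */
long gt(int x, int y) { return setg(cmp32(x, y)); }
```

The signed and unsigned predicates disagree on the same bit patterns: comparing -1 with 1 gives "less" for `setl` but "above" for `seta`, because -1 is the largest unsigned 32-bit value.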
Reading Condition Codes

SetX – set combination of flags to destination 8-bit register

The j<cond> instructions correspond to the setX instructions: both evaluate the same combinations of flags.

sete, setne, sets, setns, setg, setge, setl, setle, seta, setb, …
Jump table
Switch-Case Statement

C has a switch-case construct. How do we implement it in Assembly?

typedef enum
    {ADD, MULT, MINUS, DIV}
op_type;

char dense(op_type op)
{
    switch (op) {
    case ADD:
        return '+';
    case MULT:
        return '*';
    case MINUS:
        return '-';
    case DIV:
        return '/';
    }
}

switch-case can be implemented as a collection of if-else statements. This works well when there are only a few cases and poorly when there are many.

A much better option is to implement it with a Jump Table. Let's see how.
Jump Table Structure

A jump table is a table that collects the addresses of the code blocks that correspond to the cases. Each case is given a label, and the value of that label (the address of its code block) is kept in the table.

Jump Table              Jump Targets
JumpTable: Target0      Target0: Code Block 0
           Target1      Target1: Code Block 1
           Target2      Target2: Code Block 2
           Target3      Target3: Code Block 3

Advantage of Jump Table: k-way branch in O(1)
Jump Table Implementation

.section .rodata
    .align 8
.JT:
    .quad .ADD      # op = 0
    .quad .MULT     # op = 1
    .quad .MINUS    # op = 2
    .quad .DIV      # op = 3

.section .text
.ADD:
    movb $43,%al    # '+'
    jmp .DONE
.MULT:
    movb $42,%al    # '*'
    jmp .DONE
.MINUS:
    movb $45,%al    # '-'
    jmp .DONE
.DIV:
    movb $47,%al    # '/'
    jmp .DONE
.DONE:
    ret

RAM layout (illustrative addresses):

Address   Contents
2022      .DONE:  ret                          (Jump Targets)
2020      .DIV:   movq $47,%rax ; jmp .DONE
2016      .MINUS: movq $45,%rax ; jmp .DONE
2008      .MULT:  movq $42,%rax ; jmp .DONE
2000      .ADD:   movq $43,%rax ; jmp .DONE
1024      2020    (address of .DIV)            (Jump Table)
1016      2016    (address of .MINUS)
1008      2008    (address of .MULT)
1000      2000    (address of .ADD)   <- .JT
Jump Table Implementation

dense:
    cmpq $3,%rdi          # compare op : 3
    ja .DONE              # if the value is greater than 3, jump to the end of the function
    jmp *.JT(,%rdi,8)     # otherwise jump through the table

If the value of op is valid, we jump to the label obtained from the computation *(.JT + 8*rdi). The character * serves as a dereference operator.

Suppose rdi = MINUS (2) and .JT is at address 1000:
    .JT(,%rdi,8) = 2*8 + 1000 = 1016
    jmp .JT(,%rdi,8)    => next instruction would be fetched from RAM[1016], i.e. from inside the table itself
    jmp *.JT(,%rdi,8)   => next instruction is fetched from RAM[RAM[1016]] = RAM[2016], i.e. from .MINUS
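The same dispatch can be sketched in C with an array of function pointers playing the role of .JT. The helper names (`do_add` and so on) and the value returned for an out-of-range `op` are assumptions; the slides leave the result unspecified in that case.

```c
typedef enum { ADD, MULT, MINUS, DIV } op_type;

/* One code block per case (the jump targets). */
char do_add(void)   { return '+'; }
char do_mult(void)  { return '*'; }
char do_minus(void) { return '-'; }
char do_div(void)   { return '/'; }

typedef char (*target_fn)(void);

/* The jump table: entry i holds the address of the code block for op == i. */
const target_fn JT[] = { do_add, do_mult, do_minus, do_div };

char dense(op_type op) {
    unsigned long idx = (unsigned long)op;   /* unsigned compare, like ja */
    if (idx > 3)                             /* cmpq $3,%rdi ; ja .DONE */
        return 0;                            /* assumption: placeholder result */
    return JT[idx]();                        /* jmp *.JT(,%rdi,8) */
}
```

The unsigned comparison also rejects negative values of `op`, since they become very large unsigned numbers. The lookup cost does not depend on the number of cases.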
Sparse Switch-Case

Is a jump table always the right way to implement switch-case? Definitely not!

int sparce(int x)
{
    switch(x) {
    case 100: return 1;
    case 200: return 2;
    case 300: return 3;
    case 400: return 4;
    case 500: return 5;
    case 600: return 6;
    default: return -1;
    }
}

A Jump Table would require about 600 entries, almost all of them pointing to the default case, so it is not practical for sparse cases.

The obvious translation into if-then-else performs better here.

We can get logarithmic performance if we organize the cases as a balanced binary search tree (e.g. an AVL tree):

          400
     200       600
  100   300  500
Sparse Switch-Case

          400
     200       600
  100   300  500

sparce:
    cmpq $400,%rdi    # x:400
    je Leaf4
    jg L6
    cmpq $200,%rdi    # x:200
    je Leaf2
    jg L3
    cmpq $100,%rdi    # x:100
    …
L6:
    cmpq $600,%rdi    # x:600
    je Leaf6
    cmpq $500,%rdi    # x:500
    je Leaf5
    jmp DEFAULT
L3:
    cmpq $300,%rdi    # x:300
    je Leaf3
    jmp DEFAULT
Leaf1:
    movq $1,%rax
    ret
Leaf2:
    movq $2,%rax
    ret
…
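The comparison tree can be written out in C as nested tests. This is a sketch of the same idea, not compiler output; the function name `sparce_tree` is made up for the example.

```c
/* Search the case values 100..600 as a binary tree rooted at 400. */
int sparce_tree(int x) {
    if (x == 400) return 4;
    if (x > 400) {                  /* right subtree (label L6) */
        if (x == 600) return 6;
        if (x == 500) return 5;
        return -1;                  /* default */
    }
    if (x == 200) return 2;
    if (x > 200) {                  /* subtree between 200 and 400 (label L3) */
        if (x == 300) return 3;
        return -1;                  /* default */
    }
    if (x == 100) return 1;
    return -1;                      /* default */
}
```

Each call performs only a handful of comparisons, and no table of about 600 mostly unused entries is needed.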
Pipeline

RAM – read & write
CRAFTING A CPU TO RUN PROGRAMS
https://www.youtube.com/watch?v=GYlNoAMBY6o

Read (load) from RAM
• put address on Address Bus
• enable reading
• get data from Data Bus

Write (store) to RAM
• put address on Address Bus
• put data on Data Bus
• enable writing
CPU – instruction execution steps
CRAFTING A CPU TO RUN PROGRAMS
https://www.youtube.com/watch?v=GYlNoAMBY6o

An instruction is executed by the arithmetic unit (ALU) and by the Control Unit, which performs all the preparations for the computation.

[Figure: CPU block diagram with two labeled parts, the computing unit (ALU) and the controlling unit (Control Unit).]
CPU – instruction execution steps

Fetch – bring the next instruction (RIP points to it) from RAM
Decode – break the instruction into its parts (opcode and arguments)
Read – bring all RAM arguments to CPU registers
Execute – execute the instruction's calculation
Write-back – write the output back to RAM if needed
Pipeline
Each instruction is composed of (at most) five steps

step 1: Fetch – bring the instruction from RAM[RIP] to the RIR register
step 2: Decode – understand the instruction according to the ISA
step 3: Read – if needed, read operand values from RAM
step 4: Execute – execute the operation
step 5: Write – if needed, write the output value to RAM

Different steps are performed by different hardware, so they can be carried out in parallel. This is what makes pipelining of instructions possible.

[Figure: datapath with RIP, Instructions Memory, Registers file, ALU and Data Memory, annotated with the step numbers 1–5.]

Example: add qword [rdi], rbx
step 1: Fetch – bring "add qword [rdi], rbx" to RIR
step 2: Decode – addition, Source = rbx, Destination = RAM[rdi]
step 3: Read – read qword RAM[rdi] into some temporary t
step 4: Execute – execute the addition
step 5: Write – write the value of t to qword RAM[rdi]
Pipeline
Ep 085: Introduction to the CPU Pipeline
https://www.youtube.com/watch?v=E5qacBU1XjQ

We have 4 instructions. Each instruction has 5 steps, 1 clock cycle each.

Execution with no pipeline:

t0        t5        t10       t15       t20
I1        I2        I3        I4
F D R E W F D R E W F D R E W F D R E W …

4 * 5 = 20 cycles

Execution with pipeline:

     t0 t1 t2 t3 t4 t5 t6 t7 t8
I1   F  D  R  E  W
I2      F  D  R  E  W
I3         F  D  R  E  W
I4            F  D  R  E  W
…

It takes 4 cycles to fill up the pipe. After that, one instruction completes at every clock cycle, so we need an additional 4 cycles to complete the 4 instructions of our sample program.

4 + 4 = 8 cycles

With n instructions in the program, the pipelined execution takes 4 + n cycles instead of 5n, so for large n the execution time improves by a factor that approaches 5.
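The cycle counts can be reproduced with two small helper functions. This is a sketch under the slide's assumptions: every step takes one clock cycle and there are no stalls or flushes. The function names are made up for the example.

```c
/* Without a pipeline, each instruction runs all of its steps before the next one starts. */
unsigned long cycles_sequential(unsigned long n, unsigned long steps) {
    return n * steps;
}

/* With a pipeline, (steps - 1) cycles fill the pipe, then one instruction completes per cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long steps) {
    if (n == 0)
        return 0;
    return (steps - 1) + n;
}
```

For the 4-instruction example this gives 20 versus 8 cycles, a speedup of 2.5. For 1000 instructions it gives 5000 versus 1004 cycles, which is already close to the limit of 5.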
Pipeline – challenges
CPU Pipelining - The cool way your CPU avoids idle time!
https://www.youtube.com/watch?v=cZIPxra_apA

This situation is called a Read-After-Write Hazard:

x ← y+z    F D R E W
w ← x        F D R E W

• x is only ready at t=5, when the first instruction finishes.
• The second instruction needs the updated value of x at its Read step, but at that point x still holds its previous value.

Stall (wait): the second instruction waits until x is ready.

x ← y+z    F D R E W
w ← x        F D stall R E W

Operand forwarding: the result of the first instruction is handed to the second instruction directly, which shortens the wait. The remaining stall is unavoidable.

x ← y+z    F D R E W
w ← x        F D stall R E W

Detecting hazards, inserting stalls and operand forwarding are done by the compiler or by the CPU; the programmer does not need to take care of this.
Pipeline – challenges

Let's add a third instruction to our code. The third instruction is completely independent of the two preceding instructions.

Original order (t0 to t8):
x ← y+z    F D R E W
w ← x        F D stall R E W
a ← b+c        F stall D R E W

Sometimes it is possible to change the order of instructions to avoid stalling.

After changing the order (t0 to t7):
x ← y+z    F D R E W
a ← b+c      F D R E W
w ← x          F D R E W

To find out whether the order of instructions can be changed, the compiler builds a dependency tree of the instructions, i.e. which instruction depends on which. Only if the instructions are independent and have no side effects can they be swapped in order to save stalls.
Pipeline – challenges

[Figure: pipeline diagrams for instructions I1–I6 around a taken branch ("we are jumping"), with the point where the branch decision is made marked in each diagram.]

If it turns out that another iteration of the loop begins, the (partial) execution of the instructions that have already entered the pipeline must be cancelled, and execution of the loop's instructions must start instead. This situation is called a Pipeline flush. It obviously hurts performance. What can be done?
Pipeline – challenges

There are three policies:

1. Static Branch Prediction – for example, for loops, always guess that we would not jump, and for if-else, always guess that the 'if' condition is true.
2. Random Branch Prediction – pick randomly whether a branch is taken or not.
3. Dynamic Branch Prediction – take some branch and keep statistics of failures and successes. Use these statistics to improve the next predictions.
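One common way to "keep statistics" for dynamic prediction is a 2-bit saturating counter per branch. The slides do not name a specific scheme, so this choice and all names below are assumptions made for illustration.

```c
#include <stdbool.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken", 2 and 3 predict "taken". */
typedef struct { unsigned state; } Predictor;

bool predict(const Predictor *p) {
    return p->state >= 2;
}

/* Move one step toward the observed outcome, saturating at 0 and 3. */
void update(Predictor *p, bool taken) {
    if (taken) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Counts mispredictions over a recorded sequence of branch outcomes. */
int count_mispredictions(Predictor *p, const bool *outcomes, int n) {
    int misses = 0;
    for (int i = 0; i < n; i++) {
        if (predict(p) != outcomes[i])
            misses++;
        update(p, outcomes[i]);
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through, a predictor starting at state 0 mispredicts the first two iterations and the final exit; one that already "knows" the branch (state 3) mispredicts only the exit.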
Pipeline – challenges

[Figure: pipeline diagrams for the loop instructions I1–I3 executed repeatedly ("we are jumping"); I4–I6 are no longer fetched speculatively.]

If we use the Static prediction policy for our loop, we will mispredict only once, when i=0.
Pipeline – if-else example

long absdiff (long * x, long y)
{
    if (*x > y)
        return *x-y;
    else
        return y-*x;
}

absdiff:
    cmpq %rsi, (%rdi)     # check if *x > y    # instruction 1
    jle Else              #                    # instruction 2
    movq (%rdi), %rax     # rax = *x
    subq %rsi, %rax       # rax = *x - y
    ret
Else:
    movq %rsi, %rax       # rax = y            # instruction 3
    subq (%rdi), %rax     # rax = y - *x       # instruction 4
    ret                   #                    # instruction 5

The jle instruction has no Read step, so it executes before the cmp instruction has completed. If, according to the result of cmp, the jump should not have been taken but we have already performed it, we have to cancel its execution.

Let's look at an Assembly instruction that helps us avoid jumps in certain cases.
Conditional Move

A jump is expensive because it may disrupt the pipeline, i.e. the parallel execution of instructions. One way to avoid this damage to the pipeline is to avoid the jump.

C Code
val = Test ? Then_Expr : Else_Expr;

long absdiff (long * x, long y)
{
    if (*x > y)
        return *x-y;
    else
        return y-*x;
}

absdiff:
    movq (%rdi), %rax     # rax = *x
    subq %rsi, %rax       # rax = *x - y
    movq %rsi, %rdx       # rdx = y
    subq (%rdi), %rdx     # rdx = y - *x
    cmpq %rsi, (%rdi)     # compare *x : y
    cmovle %rdx, %rax     # if *x <= y, result = y - *x
    ret

cmovle – conditional move if less or equal

With the cmov (conditional move) instruction we can run the same code in a purely sequential way: we perform both computations, x-y and y-x, and then decide which of the results to return.

gcc will try to translate if-else into a conditional move, provided that this is not risky. When is it risky?

Register Value
%rdi x
%rsi y
%rax return value
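The "compute both, then select" structure can be written directly in C. This is only a sketch of the idea: whether a compiler really emits a conditional move for it depends on the compiler and its options.

```c
/* Both results are computed unconditionally; the condition only selects one of them. */
long absdiff(long *x, long y) {
    long then_val = *x - y;     /* like rax = *x - y */
    long else_val = y - *x;     /* like rdx = y - *x */
    long result = then_val;
    if (*x <= y)                /* the selection a cmovle would perform */
        result = else_val;
    return result;
}
```

The selection does not change which instructions are executed, so there is no control-flow decision for the pipeline to guess.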
Bad Cases for Conditional Move

Should we always compute both values and decide what to return only at the end?

Expensive Computations
val = Test(x) ? Hard1(x) : Hard2(x);
⬛ Both values get computed
⬛ Only makes sense when computations are very simple
If the computation is long and takes a lot of time, it is better to check the condition first and perform only one of the computations.

Risky Computations
val = p ? *p : 0;
⬛ Both values get computed
⬛ May have undesirable effects
The computation may cause a run-time error if we perform it without checking the condition. For example, we would access a pointer that is null before checking that it is not null.

Computations with side effects
val = x > 0 ? x*=7 : x+=3;
⬛ Both values get computed
⬛ Must be side-effect free
The same applies to computations that change the values of variables.
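The side-effect case can be demonstrated in C by comparing a branching evaluation with an eager "compute both" evaluation of the same expression. Both function names are hypothetical.

```c
/* Branching version: exactly one arm runs, so x is modified once. */
int with_branch(int x) {
    int val = x > 0 ? (x *= 7) : (x += 3);
    return val;
}

/* "Compute both, then select": both arms run, so both side effects hit x. */
int compute_both(int x) {
    int test = x > 0;
    int then_val = (x *= 7);    /* modifies x even when the test is false */
    int else_val = (x += 3);    /* sees the x already changed by the first arm */
    return test ? then_val : else_val;
}
```

For x = -2 the branching version returns 1, while the eager version returns -11, so the two are not equivalent. This is why the arms must be free of side effects before such a transformation is safe.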
Agenda

0 Welcome
1 Registers file
2 Data types and Addressing modes (types and memory references)
3 Assembly basic instructions
4 C Calling Convention (functions in assembly)
5 Jump Tables
6 Pipeline (parallelizing instruction execution)
Summary
Thank You!
