Lecture 4 - Assembly Programming - A 2025
Computer
Presentation 4
Assembly and Machine Language
Outline
0. Welcome
1. Registers file
2. Data types and addressing modes (types and memory references)
3. Assembly basic instructions
4. C Calling Convention (functions in Assembly)
5. Jump Tables
6. Pipeline (parallel execution of instructions)
Summary
Assembly basics
1952: IBM researcher Nathaniel Rochester wrote the first symbolic Assembler for the IBM 701 machine, which allowed programs to be written in short, readable commands.
1-1: One Assembly language instruction corresponds to one machine language instruction.
standard libraries? NO
data structures? NO
conditions? NO
We would manually simulate all the rest.
AT&T vs Intel Assembly Syntax
AT&T assembly syntax was used for Unix.
Intel assembly syntax was used for Windows.
Registers
x86-64 Integer Registers
A register is a 64-bit array:
0 1 1 1 0 0 0 1 1 0 0 0 1 1 0 1 0 . . . 0 1 1 1 0 1 1 0 0 0 1 1
x86-64 Integer Registers
Name   Size      Part of rax
rax    64 bits   the whole register
eax    32 bits   low 32 bits of rax
ax     16 bits   low 16 bits of rax
ah     8 bits    high byte of ax
al     8 bits    low byte of ax
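The nesting of rax, eax, ax, ah and al can be sketched in C with masks and shifts. This is only an illustrative model; the helper names (view_eax and so on) are hypothetical and not part of the lecture.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns one named view of a 64-bit rax value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }   /* low 32 bits */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }       /* low 16 bits */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }          /* low byte of ax */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }   /* high byte of ax */
```

For example, with rax = 0x1122334455667788 the views are eax = 0x55667788, ax = 0x7788, ah = 0x77 and al = 0x88.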
x86-64 Integer Registers
• General purpose registers: Extended (Accumulator, Base, Counter, Data, …)
• Index registers (Source Index, Destination Index)
• Stack registers: RSP, RBP
  RSP - Stack pointer - contains the address of the last used element (the top) of the stack
  RBP - Base pointer - contains the address of the current activation frame
X86-64 Registers – (almost) full picture
In this course we study only a small number of registers.
Assembly Basics
Assembly Language Program
• Consists of processor instructions, assembler directives, and data
• Translated by Assembler into machine language instructions (binary code)
Example:
Assembly code:
mov al, 0x61 # load 0x61 to AL register (AL = 0 1 1 0 0 0 0 1)
Machine code:
10110000 01100001
1011      binary code (opcode) of instruction 'MOV'
0         specifies if data is byte ('0') or full size 16/32/64 bits ('1')
000       binary identifier of the AL register
01100001  binary representation of 0x61
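The bit fields listed above can be assembled in a few lines of C. The sketch below is a minimal illustration that covers only the byte form "mov r8, imm8"; the function name encode_mov_r8_imm8 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical encoder for "mov r8, imm8" (byte registers AL=0, CL=1, DL=2, BL=3).
   Byte 0: opcode bits 1011, size bit 0 (byte), then the 3-bit register id.
   Byte 1: the immediate value. */
void encode_mov_r8_imm8(unsigned reg_id, uint8_t imm, uint8_t out[2]) {
    out[0] = (uint8_t)(0xB0u | (reg_id & 0x7u));
    out[1] = imm;
}
```

Calling encode_mov_r8_imm8(0, 0x61, out) fills out with 0xB0 0x61, which is the bit pattern 10110000 01100001 from the example.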
Basic Structure of Assembly Instruction
optional label: opcode operands # comment
• Label: equivalent to the instruction's address in RAM. Non-local labels must be unique.
• Operands: each operand is a reg (register), mem (memory), or imm (immediate). The operand field may be optional.
Example:
myLabel: mov al, 0x61 # load 0x61 to AL register
…
jmp myLabel # jmp makes RIP point to RAM[myLabel]
[Figure: Text Segment in RAM, with "mov al, 0x61" stored at address myLabel = 1024 and "jmp myLabel" stored further on.]
Example of Assembly directive
buffer: resb 4 # reserve 4 bytes
Appropriate C code:
int buffer;
buffer = 2;
mov dword [buffer], 2 => mov dword [2048], 2
• '[2048]' means RAM[2048] (in this example, buffer is located at address 2048).
• 'dword' means double word (4 bytes) starting from buffer, so we refer to RAM[buffer] through RAM[buffer+3].
RAM after the store (x86 is little-endian, so the lowest address holds the lowest byte):
2048 (buffer)  00000010
2049           00000000
2050           00000000
2051           00000000
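The effect of a dword store on byte-addressable RAM can be modeled in C. This is a sketch under the assumption of a little-endian machine (as x86 is); the names store_dword and load_dword are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of "mov dword [addr], value": writes 4 bytes, lowest byte at the lowest address. */
void store_dword(uint8_t *ram, size_t addr, uint32_t value) {
    for (size_t i = 0; i < 4; i++) {
        ram[addr + i] = (uint8_t)(value >> (8 * i));
    }
}

/* Model of "mov reg32, dword [addr]": reads the same 4 bytes back. */
uint32_t load_dword(const uint8_t *ram, size_t addr) {
    uint32_t value = 0;
    for (size_t i = 0; i < 4; i++) {
        value |= (uint32_t)ram[addr + i] << (8 * i);
    }
    return value;
}
```

With a zeroed RAM array, store_dword(ram, 2048, 2) leaves 2 in ram[2048] and 0 in ram[2049] through ram[2051].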
Sizes
Example:
Size      Intel style    AT&T style
8 bits    mov AL, 3      movb $3, %AL
16 bits   mov AX, 3      movw $3, %AX
32 bits   mov EAX, 3     movl $3, %EAX
64 bits   mov RAX, 3     movq $3, %RAX
Memory Addressing Modes
● An addressing mode is an expression that calculates an address in memory.
Template for Intel style: [base + scale * index + imm]
Template for AT&T style: imm(base, index, scale)
• base: any integer register
• scale: 1, 2, 4 or 8
• index: any integer register except for rsp
• imm: any integer
Examples:
Example of using an addressing mode when scanning an array: accessing A[4] with the Intel template [base + scale * index + imm].
All the options for addressing modes:
https://reverseengineering.stackexchange.com/questions/22115/understanding-operand-forms
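The address computed by the template can be written as a one-line C function. This is an illustrative sketch; the function name effective_address and the sample numbers below are assumptions, not values from the slides.

```c
#include <stdint.h>

/* Computes base + scale * index + disp, as in [base + scale*index + imm]
   (Intel) or imm(base, index, scale) (AT&T). scale should be 1, 2, 4 or 8. */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale, int64_t disp) {
    return base + (uint64_t)scale * index + (uint64_t)disp;
}
```

For an int array A that is assumed to start at address 1040, A[4] is at effective_address(1040, 4, 4, 0), which is 1056.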
Sanity Test
Question: suggest an equivalent line of C code for each line of Assembly code (suppose rax = x, rdx = y).
Assembly                C
movq $0x4, %rax         x = 0x4;
movq $-147, (%rax)      *x = -147;
movq %rax, %rdx         y = x;
movq %rax, (%rdx)       *y = x;
Basic Arithmetical Instructions
ADD - add integers
Example:
add RAX, RBX # (RAX gets a value of RAX+RBX)
addq %RBX, %RAX
Note the added 'q' in the name of the AT&T instruction. It is the size of the instruction's arguments.
Basic Arithmetical Instructions
INC - increment integer
Example:
inc RAX # (RAX gets a value of RAX+1)
incq %RAX
DEC - decrement integer
Example:
dec byte [buffer] # (first byte of RAM[buffer]--)
decb buffer
[Figure: RAM, where the first of the 4 bytes at buffer changes from 00000010 to 00000001.]
Note that in AT&T style, specifying a label is a memory reference, with no need for square brackets.
Basic Logical Instructions
CMP – Compare Instruction
Examples:
TEST – Logical Compare Instruction
Examples:
Shift – Bitwise Shift
JMP – Unconditional Jump
JMP tells the processor that the next instruction to be executed is located at the label that is given as part of the jmp instruction.
What happens when we run the following code?
Example:
mov eax, 1
inc_again:
inc eax
jmp inc_again # RIP register gets the inc_again label (address)
mov eax, 5 # this instruction is never reached from this code
This is an infinite loop! Equivalent C code:
int x=1;
while (true)
    x++;
j<cond> – Conditional Jump
int max(int x, int y)
{
    if (x > y)
        return x;
    else
        return y;
}
max: # suppose rdi = x, rsi = y
    cmpl %esi, %edi # edi - esi (the computational part)
    jle else # jle – jump if lower/equal
    movl %edi, %eax # high 32 bits of RAX register are set to '0'
    ret
else:
    movl %esi, %eax
    ret
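The control flow of the assembly version of max can be mirrored in C with a goto, which makes the inverted condition visible: the jump is taken when x <= y, that is, when the if condition is false. The name max_goto is hypothetical.

```c
/* C rendering of the assembly control flow: jump to the else part when x <= y. */
int max_goto(int x, int y) {
    int result;
    if (x <= y)            /* cmpl %esi, %edi ; jle else */
        goto else_branch;
    result = x;            /* movl %edi, %eax */
    return result;         /* ret */
else_branch:
    result = y;            /* movl %esi, %eax */
    return result;         /* ret */
}
```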
d<size> – declare initialized data – AT&T
.byte .word .long .quad
    Define char, short, int or long, respectively
.space
    Reserve a specific number of bytes
.zero
    Reserve space and initialize it with zero bytes
.string
    Define string constants
.fill x, y, val
    Define x elements of size y with value val
Example:
Global variables definition in C:
int x;
int y = 0;
char str [] = "Hi\n";   // in a C string, the '\0' character is added automatically
int A [10] = {0};
int main() {
    …
}
The same definitions in Assembly:
.section .bss
x : .space 4
.section .data
y : .zero 4
str : .string "Hi\n"   # in an Assembly string, the '\0' character is added automatically
A : .fill 10, 4, 0
.section .text
.globl main
main:
    …
d<size> – declare initialized data - Intel (self-study)
Directive   Size          Size in bytes
DB          byte          1 byte
DW          word          2 bytes
DD          double word   4 bytes
DQ          quadword      8 bytes
Examples:
x: db 0x55
x: db 0x55,0x56,0x57 ; three bytes in succession
x: db 'a' ; character constant 0x61 (ascii code of 'a')
x: db 'hello',10, 0 ; string constant
x: dw 0x1234 ; 0x34 0x12
x: dw 'A' ; 0x41 0x00 – complete to word
x: dw 'ABC' ; 0x41 0x42 0x43 0x00 – complete to word
x: dd 0x12345678 ; 0x78 0x56 0x34 0x12
Sanity Test
Question: what does the program do?
section .data
x dd 4
y dd 2
z dd 3
section .text
global _start
_start:
    mov ecx, [x]
    cmp ecx, [y]
    jg check_z
    mov ecx, [y]
check_z:
    cmp ecx, [z]
    jg exit
    mov ecx, [z]
exit:
    mov rax, 1
    int 0x80
Answer: The program finds the maximum among x, y, and z and stores it in ecx.
Let's go through the program step by step:
1. The program defines three variables in the data section:
    1. x = 4
    2. y = 2
    3. z = 3
2. In the _start section:
    1. load the value of x (4) into ecx
    2. compare ecx (4) with y (2)
    3. Since 4 > 2, jump to check_z
3. In check_z:
    1. compare ecx (still 4) with z (3)
    2. Since 4 > 3, jump to exit
4. In exit:
    1. move 1 into rax (this is the system call number for exit in the Linux int 0x80 interface)
    2. trigger a software interrupt with int 0x80, which exits the program
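The same logic can be written in C, with one line per group of instructions. This is a sketch for comparison only; the function name max3 is hypothetical and the exit system call is left out.

```c
/* Mirrors the program: ecx starts as x and is replaced whenever it is not greater. */
int max3(int x, int y, int z) {
    int ecx = x;         /* mov ecx, [x] */
    if (!(ecx > y))      /* cmp ecx, [y] ; jg check_z */
        ecx = y;         /* mov ecx, [y] */
    if (!(ecx > z))      /* cmp ecx, [z] ; jg exit */
        ecx = z;         /* mov ecx, [z] */
    return ecx;
}
```

With the values from the data section, max3(4, 2, 3) returns 4.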
lea – Load Effective Address Instruction
⬛ Uses
▪ Computing addresses without a memory reference
    ▪ e.g., translation of p = &x[i];
▪ Computing arithmetic expressions of the form x + k*y
    ▪ k = 1, 2, 4, or 8
The multiplication involved in address calculations is handled by the Address Generation Unit (AGU) and not by the ALU. The AGU is a specialized part of the CPU that is optimized for computing addresses efficiently.
With lea it is possible to perform calculations; here is an example.
⬛ Example
long m12(long x)
{
    return x*12;
}
m12: # suppose rdi = x
    leaq (%rdi,%rdi,2), %rax # t = x + x*2 (the computational part)
    salq $2, %rax # t<<2
    ret
Altogether, with the help of two instructions we computed x*12.
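The two-instruction computation can be checked in C. In this sketch the shift by 2 is written as a multiplication by 4, because left-shifting a negative signed value is undefined in C; the name m12_lea is hypothetical.

```c
/* x*12 computed the way the assembly does it: 3x first, then times 4. */
long m12_lea(long x) {
    long t = x + x * 2;   /* leaq (%rdi,%rdi,2), %rax : t = x + x*2 */
    return t * 4;         /* salq $2, %rax : t << 2 */
}
```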
C Calling Convention
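Stack Operations
Example:
movw $0x1020, %ax
pushw %ax
movw $0x3040, %ax
pushw %ax
movl $0x50607060, %eax
pushl %eax # 4-byte push, shown for illustration; pushl cannot be encoded in 64-bit mode
popq %rbx # pops the 8 bytes pushed above: rbx = 0x1020304050607060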
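The example can be traced with a small C model of a downward-growing, little-endian stack. This is a sketch, not a description of real hardware: the Stack type, the 64-byte size, and the function names are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t ram[64];   /* hypothetical tiny stack area */
    size_t  rsp;       /* index of the last used byte (top of stack) */
} Stack;

void stack_init(Stack *s) {
    for (size_t i = 0; i < sizeof s->ram; i++) s->ram[i] = 0;
    s->rsp = sizeof s->ram;   /* empty stack: rsp just past the end */
}

/* push: decrease rsp by size, then store the value little-endian. */
void push_bytes(Stack *s, uint64_t value, size_t size) {
    s->rsp -= size;
    for (size_t i = 0; i < size; i++)
        s->ram[s->rsp + i] = (uint8_t)(value >> (8 * i));
}

/* pop: load size bytes little-endian, then increase rsp by size. */
uint64_t pop_bytes(Stack *s, size_t size) {
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)s->ram[s->rsp + i] << (8 * i);
    s->rsp += size;
    return value;
}

/* The sequence from the example: two 2-byte pushes, one 4-byte push, one 8-byte pop. */
uint64_t run_example(void) {
    Stack s;
    stack_init(&s);
    push_bytes(&s, 0x1020, 2);        /* pushw %ax */
    push_bytes(&s, 0x3040, 2);        /* pushw %ax */
    push_bytes(&s, 0x50607060, 4);    /* pushl %eax */
    return pop_bytes(&s, 8);          /* popq %rbx */
}
```

In this model run_example() returns 0x1020304050607060: the value pushed first ends up in the highest bytes of the popped quadword.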
X86-64 C Calling Convention
C code:
int result;
void main() {
    result = func(1, 2);
}
int func(int x, int y)
{
    int sum;
    sum = x + y;
    return sum;
}
Assembly code. Suppose the following position of code and data in RAM (the Address column) and the following initial position of RSP and RBP: RSP = 232, RBP = 240.
Address   Instruction           Comment
section .bss
130       result: resd 1
section .text
main:                           ; caller code
100       mov edi, 1            ; x – first argument
102       mov esi, 2            ; y – second argument
104       call func             ; push return address into Stack, move RIP to point to func code
107       mov [result], eax     ; retrieve return value from EAX
109       …
func:                           ; callee code
110       push rbp              ; backup RBP
112       mov rbp, rsp          ; set RBP to func activation frame
114       sub rsp, 4            ; allocate space for local variable sum
116       add edi, esi          ; calculate x+y
120       mov [rbp-4], edi      ; set sum to be x+y
124       mov eax, [rbp-4]      ; put return value into (part of) RAX
126       mov rsp, rbp          ; close function activation frame
127       pop rbp               ; restore activation frame of main()
129       ret                   ; return from the function
Execution trace (registers file after each step; RIP holds the address of the next instruction):
Step                        RIP   RSP   RBP   Stack (RAM) change
start                       100   232   240
after mov edi, 1            102   232   240
after mov esi, 2            104   232   240
after call func             110   224   240   RAM[224] = 107 - return address
after push rbp              112   216   240   RAM[216] = 240 - RBP old value
after mov rbp, rsp          114   216   216
after sub rsp, 4            116   212   216   RAM[212] reserved for sum
after add edi, esi          120   212   216
after mov [rbp-4], edi      124   212   216   RAM[212] = 3
after mov eax, [rbp-4]      126   212   216
after mov rsp, rbp          127   216   216
after pop rbp               129   224   240
after ret                   107   232   240
after mov [result], eax     109   232   240   RAM[130] = 3
In the original slide, the set of highlighted lines defines the C Calling Convention.
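The walk through the call can be replayed with a small C model of the registers and the stack. This is only a sketch under simplifying assumptions: the Machine type, the 256-byte RAM, and the function names are hypothetical, and only the registers used in the example are modeled.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  ram[256];        /* hypothetical RAM, large enough for addresses 0..255 */
    uint64_t rsp, rbp;
    uint32_t eax, edi, esi;
} Machine;

static void push64(Machine *m, uint64_t v) {
    m->rsp -= 8;                              /* stack grows toward lower addresses */
    memcpy(&m->ram[m->rsp], &v, 8);
}

static uint64_t pop64(Machine *m) {
    uint64_t v;
    memcpy(&v, &m->ram[m->rsp], 8);
    m->rsp += 8;
    return v;
}

/* Initial state assumed in the example: RSP = 232, RBP = 240. */
void machine_init(Machine *m) {
    memset(m, 0, sizeof *m);
    m->rsp = 232;
    m->rbp = 240;
}

/* Replays "call func" and the body of func; returns the address that ret jumps to. */
uint64_t call_func(Machine *m, uint64_t return_address) {
    push64(m, return_address);                 /* call func : push return address */
    push64(m, m->rbp);                         /* push rbp */
    m->rbp = m->rsp;                           /* mov rbp, rsp */
    m->rsp -= 4;                               /* sub rsp, 4 */
    m->edi += m->esi;                          /* add edi, esi */
    memcpy(&m->ram[m->rbp - 4], &m->edi, 4);   /* mov [rbp-4], edi */
    memcpy(&m->eax, &m->ram[m->rbp - 4], 4);   /* mov eax, [rbp-4] */
    m->rsp = m->rbp;                           /* mov rsp, rbp */
    m->rbp = pop64(m);                         /* pop rbp */
    return pop64(m);                           /* ret : pop return address */
}
```

After machine_init, setting edi = 1 and esi = 2 and calling call_func(&m, 107) yields eax = 3, a returned address of 107, and RSP and RBP back at 232 and 240, so the caller's frame is restored.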
C Calling Convention – sumstore() example
C code that we want to translate into Assembly:
long plus(long x, long y);
void sumstore (long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
Matching Assembly code:
sumstore:
    pushq %rbx # save the callee-saved register rbx
    movq %rdx, %rbx # rdx is caller saved, so we must back up its value (dest) before the call
    call plus
    movq %rax, (%rbx) # *dest = t
    popq %rbx # restore rbx
    ret
Register   Value
%rdi       x
%rsi       y
%rdx       dest
Sanity Test
Question: translate the given C code into the matching Assembly code.
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}
Option 1:
swap:
    mov eax, dword [rdi]
    mov edx, dword [rsi]
    mov dword [rdi], edx
    mov dword [rsi], eax
    ret
Option 2:
swap:
    mov eax, dword [rdi]
    mov dword [rdi], [rsi]
    mov dword [rsi], eax
    ret
Option 3:
swap:
    mov eax, word [rdi]
    mov edx, word [rsi]
    mov word [rdi], edx
    mov word [rsi], eax
    ret
Option 4:
swap:
    mov eax, rdi
    mov edx, rsi
    mov rdi, edx
    mov rsi, eax
    ret
Reading Condition Codes
Example of use: a predicate function
long gt (int x, int y)
{
    return x > y;
}
gt:
    cmpl %esi, %edi # compare x : y
    setg %al # al = x > y
    movzbq %al, %rax # zero the rest of the bits of %rax
    ret
Register al will hold the answer to whether x > y.
We extend the value in al over all of rax, because the return value is passed in rax.
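Reading Condition Codes
SetX – set the destination 8-bit register to 0 or 1 according to a combination of flags
The j<cond> instructions and the setX instructions correspond to each other in how they evaluate the flags.
SetX    Condition        Description
sete    ZF               equal / zero
setne   ~ZF              not equal / not zero
sets    SF               negative
setns   ~SF              nonnegative
setg    ~(SF^OF)&~ZF     greater (signed)
setge   ~(SF^OF)         greater or equal (signed)
setl    SF^OF            less (signed)
setle   (SF^OF)|ZF       less or equal (signed)
seta    ~CF&~ZF          above (unsigned)
setb    CF               below (unsigned)
…       …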
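How a setX result follows from the flags can be sketched in C. The model below computes the flags of a 32-bit comparison of a with b (a - b, as in Intel "cmp a, b" or AT&T "cmpl b, a") and then evaluates a few setX conditions using the standard x86 flag definitions. The Flags type and the function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int zf, sf, of, cf; } Flags;

/* Flags produced by comparing a with b, i.e. by computing a - b on 32 bits. */
Flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;
    Flags f;
    f.zf = (r == 0);                              /* result is zero */
    f.sf = (int)(r >> 31);                        /* sign bit of the result */
    f.cf = (ua < ub);                             /* unsigned borrow */
    f.of = (int)(((ua ^ ub) & (ua ^ r)) >> 31);   /* signed overflow */
    return f;
}

int setg(Flags f) { return !(f.sf ^ f.of) && !f.zf; }   /* signed greater */
int setl(Flags f) { return f.sf ^ f.of; }               /* signed less */
int seta(Flags f) { return !f.cf && !f.zf; }            /* unsigned above */
int setb(Flags f) { return f.cf; }                      /* unsigned below */
```

The signed and unsigned conditions can disagree: for a = -1 and b = 1, setg gives 0 (signed -1 is not greater than 1) while seta gives 1 (as unsigned values, 0xFFFFFFFF is above 1).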
Jump table
Switch-Case Statement
Sparse Switch-Case
Write (store) to RAM
• put address on Address Bus
• put data on Data Bus
• enable writing
CPU – instruction execution steps
Video: CRAFTING A CPU TO RUN PROGRAMS
https://www.youtube.com/watch?v=GYlNoAMBY6o
[Figure labels: computing unit, control unit]
CPU – instruction execution steps
Fetch – bring the next instruction (RIP points to it) from RAM
Decode – break the instruction into its parts (opcode and arguments)
Read – bring all RAM arguments to CPU registers
Execute – execute the instruction calculation
Write-back – write back the output to RAM if needed
Pipeline
Each instruction is composed of (at most) five steps
step 1: Fetch
    bring instruction from RAM[RIP] to RIR register
step 2: Decode
    understand the instruction according to ISA
step 3: Read
    if needed, read operands values from RAM
step 4: Execute
    execute the operation
step 5: Write
    if needed, write the output value to RAM
Different steps are performed by different hardware, so they can be performed in parallel. This is what enables a Pipeline of instructions.
[Figure: datapath with Instructions Memory (step 1, addressed by RIP), Registers file (step 2), Data Memory (steps 3 and 5), and ALU (step 4).]
Example: add qword [rdi], rbx
step 1: Fetch – bring "add qword [rdi], rbx" to RIR
step 2: Decode – addition, Source = rbx, Destination = RAM[rdi]
step 3: Read – read qword RAM[rdi] to some temporary t
step 4: Execute – execute the addition
step 5: Write – write t value to qword RAM[rdi]
Pipeline
Video: Ep 085: Introduction to the CPU Pipeline
https://www.youtube.com/watch?v=E5qacBU1XjQ
We have 4 instructions. Each instruction has 5 steps, 1 clock cycle each.
Execution without pipeline:
F D R E W  F D R E W  F D R E W  F D R E W  …
4 * 5 = 20 cycles
Execution with pipeline (each column is one clock cycle, starting at the given time):
      t0  t1  t2  t3  t4  t5  t6  t7
I1    F   D   R   E   W
I2        F   D   R   E   W
I3            F   D   R   E   W
I4                F   D   R   E   W
…
It takes 4 cycles to fill up the pipe. After that, at every clock cycle one instruction execution is completed, so we need an additional 4 cycles to complete the 4 instructions of our sample program.
4 + 4 = 8 cycles
n – number of instructions in our program. Without pipeline the program takes 5n cycles; with pipeline it takes 4 + n cycles. For large n the ratio 5n / (4 + n) approaches 5, so we get an improvement of about 5 times in the execution time of a program when we use pipeline.
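The cycle counts above can be computed with two small C functions. This is a sketch that assumes an ideal pipeline with no stalls and one clock cycle per step; the function names are hypothetical.

```c
/* Cycles needed when each instruction finishes all its steps before the next one starts. */
unsigned long cycles_sequential(unsigned long n, unsigned long stages) {
    return n * stages;
}

/* Cycles needed in an ideal pipeline: (stages - 1) cycles to fill the pipe,
   then one instruction completes per cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long stages) {
    return n == 0 ? 0 : (stages - 1) + n;
}
```

For the sample program, cycles_sequential(4, 5) is 20 and cycles_pipelined(4, 5) is 8. For n = 1000 the counts are 5000 and 1004, a ratio close to 5.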
Pipeline – challenges
Video: CPU Pipelining - The cool way your CPU avoids idle time!
https://www.youtube.com/watch?v=cZIPxra_apA
This situation is called Read-After-Write Hazard:
           t0  t1  t2  t3  t4  t5
x ← y+z    F   D   R   E   W
w ← x          F   D   R   E   W
x is needed at the Read step of the second instruction (t3): the second instruction needs the updated value of x to be ready, but x still holds its previous value. x is only ready at t=5, where the first instruction finishes.
Solution 1: Stall (wait). The second instruction finishes at t8.
           t0  t1  t2  t3     t4     t5  t6  t7
x ← y+z    F   D   R   E      W
w ← x          F   D   stall  stall  R   E   W
Solution 2: Operand forwarding. The second instruction finishes at t7; this one-cycle stall is unavoidable.
           t0  t1  t2  t3     t4  t5  t6
x ← y+z    F   D   R   E      W
w ← x          F   D   stall  R   E   W
Detecting hazards and inserting stalls, as well as operand forwarding, are done by the compiler or by the CPU; the programmer does not need to take care of this.
Pipeline – challenges
Let's add a third instruction to our code. The third instruction is completely independent of the two preceding instructions.
Original order (finishes at t8):
           t0  t1  t2  t3     t4  t5  t6  t7
x ← y+z    F   D   R   E      W
w ← x          F   D   stall  R   E   W
a ← b+c            F   stall  D   R   E   W
Sometimes it is possible to change the order of instructions to avoid stalling.
Changed order (finishes at t7):
           t0  t1  t2  t3  t4  t5  t6
x ← y+z    F   D   R   E   W
a ← b+c        F   D   R   E   W
w ← x              F   D   R   E   W
To identify whether reordering lines is feasible, the compiler builds a dependency tree of the instructions, that is, which instruction depends on which. Only if the instructions are independent and have no side effects can they be swapped in order to save stalls.
Pipeline – challenges
[Figure: pipeline timing diagrams (F D R E W) for instructions I1 to I6 over several loop iterations; the point after I3 is annotated "we are jumping".]
If we use a static prediction policy for our loop, we will mispredict only once, when i=0.
Pipeline – if-else example
Conditional Move
One of the options for solving the harm to the Pipeline is to avoid the jump.
gcc will try to translate if-else into a conditional move, provided that this is not dangerous. When is it dangerous?
long absdiff (long * x, long y)
{
    if (*x > y)
        return *x-y;
    else
        return y-*x;
}
absdiff:
    movq (%rdi), %rax # rax = x
    subq %rsi, %rax # rax = x-y
    movq %rsi, %rdx # rdx = y
    subq (%rdi), %rdx # rdx = y-x
    cmpq %rsi, (%rdi) # x-y ?
    cmovle %rdx, %rax # if x <= y, result = y-x
    ret
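The structure of the conditional-move version can be written out in C: both candidate results are computed first, and only then is one of them selected. This is a sketch for illustration; the name absdiff_cmov is hypothetical, and a compiler is free to translate it either with a jump or with cmov.

```c
/* Both results are computed unconditionally; the condition only selects one of them. */
long absdiff_cmov(const long *x, long y) {
    long xv = *x;              /* movq (%rdi), %rax : x is dereferenced in every case */
    long result = xv - y;      /* subq %rsi, %rax   : candidate x-y */
    long alt = y - xv;         /* movq, subq        : candidate y-x */
    if (xv <= y)               /* cmpq ; cmovle %rdx, %rax */
        result = alt;
    return result;
}
```

Because both sides are always evaluated, this form fits only when evaluating the unused side is harmless. For example, it would not be safe if one side dereferenced a pointer that may be invalid or had side effects. Here both branches read *x anyway, so nothing extra is executed.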
Thank You!