Lecture9
Announcement
• Programming Assignment 2 is out
• Details: https://www.cs.rochester.edu/courses/252/fall2024/labs/assignment2.html
• Due on Sep. 30th, 11:59 PM
• You (may still) have 3 slip days
• TA Office hours
• Grace Hopper Conference
Carnegie Mellon
Getting a bomb
Stack Frame: Putting It Together
Caller Frame (top to bottom):
• Saved Registers
• Local Variables
• Arguments 7, 8, …
• Return Addr ← %rsp
Callee Frame
Passing Function Arguments
• Two choices: memory or registers
• Registers are faster, but there are only a limited number of them
• x86-64 convention (part of the calling conventions):
• First 6 arguments in registers, in this order: %rdi, %rsi, %rdx, %rcx, %r8, %r9
• The rest are pushed onto the stack (Arg 7 closest to the top of the stack, then Arg 8, …, Arg n)
• Return value is always in %rax
• Just conventions, not laws
• Not necessary if you write both caller and callee, as long as the caller and callee agree
• But necessary to interface with others’ code
Managing Function Local Variables
• Two ways: registers and memory (stack)
• Registers are faster, but limited. Memory is slower, but large. Smart compilers will optimize the usage.
long incr(long *p, long val) {
    long x = *p;
    long y = x + val;
    *p = y;
    return x;
}
Register Saving Conventions
• Any issue with using registers for temporary storage?
• Contents of register %rdx overwritten by who()
• This could be trouble ➙ Need some coordination
Caller:
yoo:
    …
    movq $15213, %rdx
    call who
    addq %rdx, %rax
    …
    ret
Callee:
who:
    …
    subq $18213, %rdx
    …
    ret
Register Saving Conventions
• Conventions used in x86-64 (Part of the Calling Conventions)
• Some registers are saved by the caller, some by the callee.
• Caller saved: %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, %r11
• Callee saved: %rbx, %rbp, %r12, %r13, %r14, %r15
• %rax holds return value, so implicitly caller saved
• %rsp is the stack pointer, so implicitly callee saved
Register Saving Conventions
• Common conventions
• “Caller Saved”
• Caller saves temporary values in its frame (on the stack) before the call
• Callee is then free to modify their values
• “Callee Saved”
• Callee saves temporary values in its frame before using
• Callee restores them before returning to caller
• Caller can safely assume that register values won’t change after the function call
Example
long call_incr2(long x) {
    long v1 = 15213;
    long v2 = incr(&v1, 3000);
    return x+v2;
}
call_incr2:
    pushq %rbx            # save callee-saved %rbx
    pushq $15213          # v1 = 15213, stored on the stack
    movq %rdi, %rbx       # keep x in %rbx across the call
    movl $3000, %esi      # second argument: 3000
    leaq (%rsp), %rdi     # first argument: &v1
    call incr
    addq %rbx, %rax       # return x + v2
    addq $8, %rsp         # deallocate v1
    popq %rbx             # restore %rbx
    ret
Stack after the two pushes (top of stack last): …, Saved %rbx, 15213 ← %rsp
• call_incr2 needs to save %rbx (callee-saved) because it will modify its value
• It can safely use %rbx after call incr because incr will have to save %rbx if it needs to use it (again, %rbx is callee saved)
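char string[12];
Occupies addresses x to x + 12
int val[5];
Element boundaries at x, x + 4, x + 8, x + 12, x + 16; ends at x + 20
double a[3];
Element boundaries at x, x + 8, x + 16; ends at x + 24
char* p[3];
Element boundaries at x, x + 8, x + 16; ends at x + 24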
Byte Ordering
• How are the bytes of a multi-byte variable ordered in memory?
• Example
• Variable x has 4-byte value of 0x01234567
• Address given by &x is 0x100
• Conventions
• Big Endian: Sun, PPC Mac, IBM z, Internet
• Most significant byte has lowest address (MSB first): addresses 0x100 to 0x103 hold 01 23 45 67
• Little Endian: x86, ARM
• Least significant byte has lowest address (LSB first): addresses 0x100 to 0x103 hold 67 45 23 01
Representing Integers
Hex: 00003B6D (15213 as a 32-bit int)
Hex: FFFFC493 (-15213 as a 32-bit two’s complement int)
int val[5];
Values: 1, 5, 2, 1, 3
Addresses: x, x + 4, x + 8, x + 12, x + 16 (the array ends at x + 20)
T A[R][C];
• 2D array of data type T
• R rows, C columns
• Type T element requires K bytes
• Elements run from A[0][0] … A[0][C-1] in the first row to A[R-1][0] … A[R-1][C-1] in the last row
• Array Size
• R * C * K bytes
• Arrangement
• Row-Major Ordering in most languages, including C
int A[R][C];
Layout in memory: A[0][0] … A[0][C-1], A[1][0] … A[1][C-1], …, A[R-1][0] … A[R-1][C-1]
Total size: 4*R*C bytes
int A[R][C];
• Row 0 starts at address A
• Row i starts at address A+(i*C*4)
• Row R-1 starts at address A+((R-1)*C*4)
• Element A[i][j] is at address A+(i*C*4)+(j*4)
Structures
struct rec {
    int a[4];
    double i;
    struct rec *next;
};
Layout of r: a at offset 0, i at offset 16, next at offset 24; the struct ends at offset 32
• Characteristics
• Contiguously-allocated region of memory
• Refer to members within struct by names
• Members may be of different types
int *get_ap(struct rec *r, size_t idx)
{
    return &(r->a[idx]);
}
&(r->a[idx]) is equivalent to &((*r).a[idx])
# r in %rdi, idx in %rsi
leaq (%rdi,%rsi,4), %rax
ret
Alignment
struct S1 {
    char c;
    int i[2];
    double v;
} *p;
• Unaligned Data
Layout: c at p, i[0] at p+1, i[1] at p+5, v at p+9, end at p+17
• Aligned Data
• If the data type requires K bytes, address must be multiple of K
Layout: c at p+0 (multiple of 8), 3 bytes of padding, i[0] at p+4 (multiple of 4), i[1] at p+8, 4 bytes of padding, v at p+16 (multiple of 8), end at p+24 (multiple of 8)
Alignment Principles
• Aligned Data
• If the data type requires K bytes, address must be multiple of K
• Required on some machines; advised on x86-64
• Motivation for Aligning Data: Performance
• Inefficient to load or store data that is unaligned
• Some machines don’t even support unaligned memory access
• Compiler
• Inserts gaps in structure to ensure correct alignment of fields
• sizeof() returns the actual size of structs (i.e., including padding)
Saving Space
• Put large data types first in a struct
• A C compiler will not do this for you (it must lay out fields in declaration order)
• But knowing low-level details empowers a C programmer to write more efficient code
struct S4 {
    char c;
    int i;
    char d;
} *p;
Layout of S4: c, 3 bytes of padding, i, d, 3 bytes of padding
struct S5 {
    int i;
    char c;
    char d;
} *p;
Layout of S5: i, c, d, 2 bytes of padding
Arrays of Structures
• Overall structure length multiple of K, where K is the largest alignment requirement of any field
• Satisfy alignment requirement for every element
struct S2 {
    double v;
    int i[2];
    char c;
} a[10];
void bar() {
    struct S test = foo(3, 4);
    fprintf(stdout, "%d, %d\n", test.a, test.b);
    // prints "3, 4" on the terminal
}
char buf[4];
Characters A, B, C, D fill addresses x, x+1, x+2, x+3; the buffer ends at x+4
Additional characters (E, F) land at x+4 and beyond, outside buf
void call_echo() {
    echo();
}
unix> ./bufdemo-nsp
Type a string:0123
0123
unix> ./bufdemo-nsp
Type a string:01234
Segmentation Fault