
CS 5600
Computer Systems
Lecture 8: Free Memory Management
Recap of Last Week
• Last week focused on virtual memory
– Gives each process the illusion of vast, empty memory
– Offers protection and isolation
[Figure: a 32-bit virtual address (bits 31 to 0) split into PD Index, PT Index, and Offset; the CR3 register points to the page directory, whose entries point to page tables, whose entries map to physical memory]
Dynamic Allocation of Pages
• Page tables allow the OS to dynamically assign physical frames to processes on-demand
– E.g. if the stack grows, the OS can map in an additional page
• On Linux, processes use sbrk()/brk()/mmap() to request additional heap pages
– But, these syscalls only allocate memory in multiples of 4KB pages
[Figure: virtual memory layout: stack at the top (ESP, growing down), heap above the code (growing up), code at the bottom]
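The 4KB granularity can be seen directly with mmap(). Below is a minimal sketch (os_alloc is a hypothetical helper, not part of the lecture's allocator): even a 100-byte request ends up mapping a full, writable 4KB page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

// Hypothetical helper: ask the OS for len bytes. mmap cannot do
// better than whole pages; the kernel rounds the length up to a
// multiple of 4KB.
static void * os_alloc(size_t len) {
    void * p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_ANON | MAP_PRIVATE, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

Even though only 100 bytes were requested, byte 4095 of the mapping is valid and writable, which is exactly why a user-level allocator like malloc() is needed for small objects.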
What About malloc() and free()?
• The OS only allocates and frees memory in units of 4KB pages
– What if you want to allocate <4KB of memory?
– E.g. char * string = (char *) malloc(100);
• Each process manages its own heap memory
– On Linux, glibc implements malloc() and free() and manages objects on the heap
– The JVM uses a garbage collector to manage the heap
• There are many different strategies for managing free memory
Free Space Management
• Today's topic: how do processes manage free memory?
1. Explicit memory management
• Languages like C, C++; programmers control memory allocation and deallocation
2. Implicit memory management
• Languages like Java, Javascript, Python; the runtime takes care of freeing useless objects from memory
• In both cases, software must keep track of the memory that is in use or available
Why Should You Care?
• Regardless of language, all of our code uses dynamic memory
• However, there is a performance cost associated with using dynamic memory
• Understanding how the heap is managed leads to:
– More performant applications
– The ability to diagnose difficult memory-related errors and performance bottlenecks
Key Challenges
• Maximize CPU performance
– Keeping track of memory usage requires effort
• Maximize parallelism
– Heap memory is shared across threads
– Thus, synchronization may be necessary
• Minimize memory overhead
– Metadata is needed to track memory usage
– This metadata adds to the size of each object
• Minimize fragmentation
– Over time, deallocations create useless gaps in memory
• Free Lists
– Basics
– Speeding Up malloc() and free()
– Slab Allocation
– Common Bugs
• Garbage Collectors
– Reference Counting
– Mark and Sweep
– Generational/Ephemeral GC
– Parallel Garbage Collection
Setting the Stage
• Many languages allow programmers to explicitly allocate and deallocate memory
– C, C++
– malloc() and free()
• Programmers can malloc() any size of memory
– Not limited to 4KB pages
• free() takes a pointer, but not a size
– How does free() know how many bytes to deallocate?
• Pointers to allocated memory are returned to the programmer
– As opposed to Java or C# where pointers are "managed"
– Code may modify these pointers
Requirements and Goals
• Keep track of memory usage
– What bytes of the heap are currently allocated/unallocated?
• Store the size of each allocation
– So that free() will work with just a pointer
• Minimize fragmentation…
– … without doing compaction or relocation
– More on this later
• Maintain high performance
– O(1) operations are obviously faster than O(n), etc.
Heap Fragmentation

    obj * obj1, * obj2;
    hash_tbl * ht;
    int array[];
    char * str1, * str2;
    …
    free(obj2);
    free(array);
    …
    str2 = (char *) malloc(300);

• This is an example of external fragmentation
– There is enough empty space for str2, but the space isn't usable
• As we will see, internal fragmentation may also be an issue
[Figure: heap memory with str1, ht, and obj1 still allocated; the gaps left by freeing obj2 and array are each too small for the new 300-byte str2]
The Free List
• A free list is a simple data structure for managing heap memory
• Three key components
1. A linked-list that records free regions of memory
• Free regions get split when memory is allocated
• Free list is kept in sorted order by memory address
2. Each allocated block of memory has a header that records the size of the block
3. An algorithm that selects which free region of memory to use for each allocation request
Free List Data Structures
• The free list is a linked list, stored in heap memory alongside other data
• For malloc(n): num_bytes = n + sizeof(header)
• node: a linked list of regions of free space
– size = bytes of free space
• header: one per block of allocated space
– size = bytes of allocated space

    typedef struct node_t {
        int size;
        struct node_t * next;
    } node;

    typedef struct header_t {
        int size;
    } header;

[Figure: a 4KB heap holding a single free-list node (head, size = 4088, next = NULL)]
Code to Initialize a Heap

    // mmap() returns a pointer to a chunk of free space
    node * head = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
                       MAP_ANON|MAP_PRIVATE, -1, 0);
    head->size = 4096 - sizeof(node);
    head->next = NULL;
Allocating Memory (Splitting)

    char * s1 = (char *) malloc(100); // 104 bytes
    char * s2 = (char *) malloc(100); // 104 bytes
    char * s3 = (char *) malloc(100); // 104 bytes

• On each allocation, the free region is "split" into an allocated region and a smaller free region
• Each allocated block is preceded by a header recording its size (100)
[Figure: the free node shrinks from 4088 to 3984, 3880, and finally 3776 bytes as s1, s2, and s3 are carved off]
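The splitting step above can be sketched in C. This is a hedged, minimal version (the name ff_malloc is mine, not the lecture's): it carves the request plus a header off the end of the first sufficiently large free region, so the node itself stays in place; a real implementation would also unlink regions that shrink to zero.

```c
#include <assert.h>
#include <stddef.h>

// Free-list structures as on the slides.
typedef struct node_t { int size; struct node_t * next; } node;
typedef struct header_t { int size; } header;

// First-fit sketch: carve the allocation off the *end* of the first
// big-enough region. (The slides split from the front; the
// bookkeeping is equivalent.)
static void * ff_malloc(node ** head, int n) {
    int need = n + (int) sizeof(header);
    for (node * cur = *head; cur != NULL; cur = cur->next) {
        if (cur->size >= need) {
            cur->size -= need;                    // shrink the free region
            char * block = (char *) (cur + 1) + cur->size;
            ((header *) block)->size = n;         // record allocation size
            return block + sizeof(header);        // payload follows header
        }
    }
    return NULL; // no region is large enough
}

// Demo: initialize a 4KB heap, then malloc(100) twice.
static int free_bytes_after_two_allocs(void) {
    static char heap[4096];
    node * head = (node *) heap;
    head->size = 4096 - (int) sizeof(node);
    head->next = NULL;
    assert(ff_malloc(&head, 100) != NULL);
    assert(ff_malloc(&head, 100) != NULL);
    return head->size;
}
```

Each malloc(100) consumes 100 bytes plus one header, matching the "104 bytes" annotations above (the exact numbers depend on sizeof(node) and sizeof(header) on the target platform).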
Freeing Memory
• The free list is kept in sorted order
– free() is an O(n) operation

    free(s2); // returns 100 + 4 - 8 bytes
    free(s1); // returns 100 + 4 - 8 bytes
    free(s3); // returns 100 + 4 - 8 bytes

• Each freed block is converted into a free-list node: its 104 bytes, minus the 8-byte node, leave 96 bytes of recorded free space
• All memory is free, but the free list is divided into four regions
• These pointers are "dangling": they still point to heap memory, but the pointers are invalid
Coalescing
• Free regions should be merged with their neighbors
– Helps to minimize fragmentation
– This would be O(n²) if the list was not sorted
[Figure: the 96-byte regions merge with their neighbors (96, then 200, then 304 bytes) until the whole heap is again a single 4088-byte free region]
Choosing Free Regions (1)

    int * i = (int *) malloc(8);
    // 8 + 4 = 12 total bytes

• Which free region should be chosen?
• Fastest option is First-Fit
– Split the first free region with >=8 bytes available
• Problem with First-Fit?
– Leads to external fragmentation
[Figure: the 12-byte allocation splits a 50-byte region near the head of the list, leaving a 38-byte fragment]
Choosing Free Regions (2)

    int * i = (int *) malloc(8);
    // 8 + 4 = 12 total bytes

• Second option: Best-Fit
– Locate the free region with size closest to (and >=) 8 bytes
– Less external fragmentation than First-Fit
• Problem with Best-Fit?
– Requires O(n) time to scan the free list
[Figure: the 12-byte allocation instead splits a 16-byte region elsewhere in the list, leaving only a 4-byte fragment; the 50-byte region stays whole]
Basic Free List Review
• Singly-linked free list
• List is kept in sorted order
– free() is an O(n) operation
– Adjacent free regions are coalesced
• Various strategies for selecting which free region to use for a given malloc(n)
– First-fit: use the first free region with >=n bytes available
• Worst-case is O(n), but typically much faster
• Tends to lead to external fragmentation at the head of the list
– Best-fit: use the region with size closest (and >=) to n
• Less external fragmentation than first-fit, but O(n) time
Improving Performance
1. Use a circular linked list and Next-Fit
– Faster than Best-Fit, less fragmentation than First-Fit
2. Use a doubly-linked free list with footers
– Good: makes free() and coalesce O(1) time
– Bad: small amount of memory wasted due to headers and footers
3. Use bins to quickly locate appropriately sized free regions
– Good: much less external fragmentation, O(1) time
– Bad: much more complicated implementation
– Bad: some memory wasted due to internal fragmentation
Circular List and Next-Fit

    int * i = (int *) malloc(8);

1. Change the singly-linked list to a circular linked list
2. Use First-Fit, but move head after each split
– Known as Next-Fit
– Helps spread allocations around the heap, reducing fragmentation
– Faster allocations than Best-Fit
[Figure: after the split, head points just past the new allocation, so the next search starts there instead of at the front of the list]
Towards O(1) free()
• free() is O(n) because the free list must be kept in sorted order
• Key ideas:
– Move to a doubly linked list
– Add footers to each block
• Footers enable coalescing without sorting the free list
– Thus, free() becomes O(1)

    typedef struct node_t {
        bool free;
        int size;
        struct node_t * next;
        struct node_t * prev;
    } node;

    typedef struct header_t {
        bool free;
        int size;
    } header;

    typedef struct footer_t {
        int size;
    } footer;
Example Blocks
[Figure: a free block holds a node (free flag, size, next and prev pointers) and a trailing footer (size); an allocated block holds only a header (free flag, size), the payload, and a footer]
Locating Adjacent Free Blocks
• Suppose we have free(i)
• Locate the next and previous free blocks using the headers and footers:

    char * p = (char *) i; // for convenience
    // header of the current block
    header * h = (header *) (p - sizeof(header));
    // header of the next block
    header * hn = (header *) (p + h->size + sizeof(footer));
    // previous footer
    footer * f = (footer *) (p - sizeof(header) - sizeof(footer));
    // previous header
    header * hp = (header *) ((char *) f - f->size - sizeof(header));
Coalescing is O(1)

    node * n = (node *) h, * nn, * np;
    n->free = true;
    if (hn->free) { // combine with the next free block
        nn = (node *) hn;
        n->next = nn->next; n->prev = nn->prev;
        nn->next->prev = n; nn->prev->next = n;
        n->size += nn->size + sizeof(header) + sizeof(footer);
        ((footer *) ((char *) n + n->size))->size = n->size;
    }
    if (hp->free) { // combine with the previous free block
        np = (node *) hp;
        np->size += n->size + sizeof(header) + sizeof(footer);
        ((footer *) ((char *) np + np->size))->size = np->size;
    }
    if (!hp->free && !hn->free) {
        // add the new free block to the head of the free list
    }

• Be careful of corner cases:
– The first free block
– The last free block
Speeding Up malloc()
• At this point, free() is O(1)
• But malloc() still has problems
– Next-Fit: O(1) but more fragmentation
– Best-Fit: O(n) but less fragmentation
• Two steps to speed up malloc()
1. Round allocation requests to powers of 2
• Less external fragmentation, some internal fragmentation
2. Divide the free list into bins of similar size blocks
• Locating a free block of size round(x) will be O(1)
Rounding Allocations
• malloc(size):

    size += sizeof(header) + sizeof(footer); // will always be >16 bytes
    if (size > 2048) size = 4096 * ((size + 4095) / 4096); // large allocations use full pages
    else if (size < 128) size = 32 * ((size + 31) / 32);
    else size = round_to_next_power_of_two(size);

• Examples:
– malloc(4) → 32 bytes
– malloc(45) → 64 bytes
– malloc(145) → 256 bytes
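The rounding rules above can be made concrete with a small runnable sketch. The header (8 bytes) and footer (4 bytes) sizes here are assumptions chosen to reproduce the slide's three examples; real allocators use their own metadata sizes.

```c
#include <assert.h>

// Size-class rounding following the slide's rules: small sizes round
// up to 32-byte multiples, medium sizes to the next power of two,
// and large sizes to whole 4KB pages.
static int round_size(int request) {
    int size = request + 8 + 4;                            // assumed header + footer
    if (size > 2048) return 4096 * ((size + 4095) / 4096); // full pages
    if (size < 128)  return 32 * ((size + 31) / 32);       // 32-byte steps
    int p = 128;                                           // power-of-two class
    while (p < size) p *= 2;
    return p;
}
```

With these assumptions, round_size reproduces the slide's examples: 4 → 32, 45 → 64, 145 → 256.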
Binning
• Divide the free list into bins of exact-size blocks
• Most allocations are handled in O(1) time by pulling a free block from the appropriate list
• If no block is available, locate and split a larger block

    node * bins[]; // one free list per size class:
                   // 32, 64, 96, 128, 256, 512, 1024, 2048+ bytes
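One way the bin lookup might be coded (a sketch; bin_index is a hypothetical name, and real allocators often compute the index arithmetically rather than by scanning): because the number of bins is a small constant, the lookup is O(1).

```c
#include <assert.h>

// Map a rounded allocation size to a bin index for the size classes
// on the slide: 32, 64, 96, 128, 256, 512, 1024, and 2048+ bytes.
static int bin_index(int size) {
    static const int limit[] = { 32, 64, 96, 128, 256, 512, 1024 };
    for (int i = 0; i < 7; i++)
        if (size <= limit[i]) return i;
    return 7; // the 2048+ bin collects everything larger
}
```

malloc() would then pop the head of bins[bin_index(size)]; if that list is empty, it searches the larger bins for a block to split.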
Next Problem: Parallelism
• Today's programs are often parallel
• However, our current memory manager has poor performance with >1 threads
• The free list is shared, thus it must be protected by a mutex
• Allocations are filled sequentially in memory
– Objects for different threads may share the same cache line
– This causes contention between CPU cores
[Figure: threads 1 and 2 on separate CPUs allocate obj1 and obj2, which land on the same cache line]
Per-Thread Arenas
• To reduce lock and CPU cache contention, divide the heap into arenas
– Each arena has its own free list
– Each thread is assigned to several arenas
• No (or few) shared locks
• Cache affinity is high unless data is shared between threads
[Figure: threads 1, 2, and 3 each assigned their own subset of arenas 1 through 6]
Two More Things
• How can you make your code manage memory more quickly?
– Slab allocation
• Common memory bugs
– Memory leaks
– Dangling pointers
– Double free
Speeding Up Your Code
• Typically, the memory allocation algorithm is not under your control
– You don't choose what library to use (e.g. glibc)
– You don't know the internal implementation
• How can you make your code faster?
– Avoid the memory allocator altogether!
– Use an object cache plus slab allocation
Memory Management Bugs (1)

    int search_file(char * filename, char * search) {
        unsigned int size;
        char * data;
        FILE * fp = fopen(filename, "r");            // Open the file
        fseek(fp, 0, SEEK_END);                      // Seek to the end of the file
        size = ftell(fp);                            // Tell me the total length of the file
        data = (char *) malloc(size * sizeof(char)); // Allocate buffer
        fseek(fp, 0, SEEK_SET);                      // Seek back to the beginning of the file
        fread(data, 1, size, fp);                    // Read the whole file into the buffer
        return strstr(data, search) > 0;             // Is the search string in the buffer?
    }

    void main(int argc, char ** argv) {
        if (search_file(argv[1], argv[2]))
            printf("String '%s' found in file '%s'\n", argv[2], argv[1]);
        else
            printf("String '%s' NOT found in file '%s'\n", argv[2], argv[1]);
    }

• We forgot to free(data)!
• If this program ran for a long time, eventually it would exhaust all available virtual memory
Memory Management Bugs (2)
• Dangling pointer

    char * s = (char *) malloc(100);
    …
    free(s);
    …
    puts(s);

– Behavior is nondeterministic
– If the memory has not been reused, may print s
– If the memory has been recycled, may print garbage
• Double free

    char * s = (char *) malloc(100);
    …
    free(s);
    …
    free(s);

– Typically, this corrupts the free list
– However, your program may not crash (nondeterminism)
– In some cases, double free bugs are exploitable
• Free Lists
– Basics
– Speeding Up malloc() and free()
– Slab Allocation
– Common Bugs
• Garbage Collectors
– Reference Counting
– Mark and Sweep
– Generational/Ephemeral GC
– Parallel Garbage Collection
Brief Recap
• At this point, we have thoroughly covered how malloc() and free() can be implemented
– Free lists of varying complexity
– Modern implementations are optimized for low fragmentation, high parallelism
• What about languages that automatically manage memory?
– Java, Javascript, C#, Perl, Python, PHP, Ruby, etc…
Garbage Collection
• Invented in 1959
• Automatic memory management
– The GC reclaims memory occupied by objects that are no longer in use
– Such objects are called garbage
• Conceptually simple
1. Scan objects in memory, identify objects that cannot be accessed (now, or in the future)
2. Reclaim these garbage objects
• In practice, very tricky to implement
Garbage Collection Concepts
[Figure: root nodes are global variables (int * p;) and stack variables (struct linked_list * head;); heap objects reachable from the roots are live, while unreachable objects are garbage]
Identifying Pointers
• At the assembly level, anything can be a pointer

    int x = 0x80FCE42;
    char * c = (char *) x; // this is legal

• Challenge: how can the GC identify pointers?
1. Conservative approach: assume any number that might be a pointer, is a pointer
• Problem: may erroneously determine (due to false pointers) that some blocks of memory are in use
2. Deterministic approach: use a type-safe language that does not allow the programmer to use unboxed values as pointers, or perform pointer arithmetic
Approaches to GC
• Reference Counting
– Each object keeps a count of references
– If an object's count == 0, it is garbage
• Mark and Sweep
– Starting at the roots, traverse objects and "mark" them
– Free all unmarked objects on the heap
• Copy Collection
– Extends mark & sweep with compaction
– Addresses CPU and external fragmentation issues
• Generational Collection
– Uses heuristics to improve the runtime of mark & sweep
Reference Counting
• Key idea: each object includes a ref_count
– Assume obj * p = NULL;
– p = obj1; // obj1->ref_count++
– p = obj2; // obj1->ref_count--, obj2->ref_count++
• If an object's ref_count == 0, it is garbage
– No pointers target that object
– Thus, it can be safely freed
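The counting discipline above can be sketched as retain/release helpers in C (names rc_new, rc_retain, rc_release are mine; this is the general technique, not any particular runtime's API): every new pointer to an object retains it, every dropped pointer releases it, and the last release frees the object.

```c
#include <assert.h>
#include <stdlib.h>

// An object whose first field is its reference count.
typedef struct { int ref_count; } rc_obj;

static rc_obj * rc_new(void) {
    rc_obj * o = malloc(sizeof(rc_obj));
    o->ref_count = 1;                 // the creator holds the first reference
    return o;
}

static void rc_retain(rc_obj * o) { o->ref_count++; }

// Returns 1 if this release freed the object, 0 otherwise.
static int rc_release(rc_obj * o) {
    if (--o->ref_count == 0) { free(o); return 1; }
    return 0;
}
```

In a multithreaded program the increment and decrement would need to be atomic, which is exactly the synchronization overhead noted on the next slides.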
Reference Counting Example
[Figure: heap objects labeled with their ref_counts; a group of objects reachable only from each other (a cycle) keeps nonzero counts even though no root can reach them]
• These objects are garbage, but none have ref_count == 0
Pros and Cons of Reference Counting
The Good
• Relatively easy to implement
• Easy to conceptualize
The Bad
• Not guaranteed to free all garbage objects
• Additional overhead (int ref_count) on all objects
• Access to obj->ref_count must be synchronized
Mark and Sweep
• Key idea: periodically scan all objects for reachability
– Start at the roots
– Traverse all reachable objects, mark them
– All unmarked objects are garbage
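The two phases can be sketched over a toy object pool (a hedged illustration, not a real collector: objects here have at most two outgoing references, and "sweeping" just counts the garbage instead of freeing it):

```c
#include <assert.h>
#include <stddef.h>

// A toy heap object: a mark bit plus up to two outgoing references.
typedef struct gcobj {
    int marked;
    struct gcobj * ref[2];
} gcobj;

// Mark phase: depth-first traversal; the mark bit stops cycles.
static void mark(gcobj * o) {
    if (o == NULL || o->marked) return;
    o->marked = 1;
    mark(o->ref[0]);
    mark(o->ref[1]);
}

// Returns the number of garbage (unmarked) objects in the pool.
static int mark_and_sweep(gcobj ** roots, int nroots,
                          gcobj * pool, int npool) {
    for (int i = 0; i < npool; i++) pool[i].marked = 0;  // clear marks
    for (int i = 0; i < nroots; i++) mark(roots[i]);     // mark phase
    int garbage = 0;
    for (int i = 0; i < npool; i++)
        if (!pool[i].marked) garbage++;  // sweep: a real GC frees these
    return garbage;
}
```

Because reachability is computed from the roots, a cycle of objects that no root can reach is correctly counted as garbage, which is what the example on the next slides shows.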
Mark and Sweep Example
[Figure: before collection, the heap holds a mix of objects, some reachable from the roots and some not]
Mark and Sweep Example
[Figure: after traversal from the roots, only reachable objects are marked; mark and sweep correctly identifies unreachable cycles as garbage]
Pros and Cons of Mark and Sweep
The Good
• Overcomes the weakness of reference counting
• Fairly easy to implement and conceptualize
• Guaranteed to free all garbage objects
The Bad
• Mark and sweep is CPU intensive
– Traverses all objects reachable from the roots
– Scans all objects in memory, freeing unmarked objects
• Naïve implementations "stop the world" before collecting
– Threads cannot run in parallel with the GC
– All threads get stopped while the GC runs
Be careful: if you forget to set a reference to NULL, it will never be collected (i.e. Java can leak memory)
Copy Collection
• Problem with mark and sweep:
– After marking, all objects on the heap must be scanned to identify and free unmarked objects
• Key idea: use compaction (aka relocation)
– Divide the heap into start space and end space
– Objects are allocated in start space
– During GC, instead of marking, copy live objects from start space into end space
– Switch the space labels and continue
Compaction/Relocation

    String str2 = new String();

• One way to deal with fragmentation is compaction
– Copy allocated blocks of memory into a contiguous region of memory
– Repeat this process periodically
• This only works if pointers are boxed, i.e. managed by the runtime
[Figure: a runtime pointer table maps each variable (obj1, ht, str, str2) to its object's location; when blocks are copied into a contiguous region, the runtime updates the stored addresses to the new locations]
Copy Collection Example
[Figure: live objects are copied from start space into end space; the copies are compacted (no fragmentation), the space labels swap, and all data left in the old space can be safely overwritten]
Pros and Cons of Copy Collection
The Good
• Improves on mark and sweep
• No need to scan memory for garbage to free
• After compaction, there is no fragmentation
The Bad
• Copy collection is slow
– Data must be copied
– Pointers must be updated
• Naïve implementations are not parallelizable
– "Stop the world" collector
Generational Collection
• Problem: mark and sweep is slow
– Expensive full traversals of live objects
– Expensive scan of heap memory
• Problem: copy collection is also slow
– Expensive full traversals of live objects
– Periodically, all live objects get copied
• Solution: leverage knowledge about object creation patterns
– Object lifetime tends to be inversely correlated with likelihood of becoming garbage (generational hypothesis)
– Young objects die quickly; old objects continue to live
Garbage Collection in Java
• By default, most JVMs use a generational collector
• GC periodically runs two different collections:
1. Minor collection – occurs frequently
2. Major collection – occurs infrequently
• Divides the heap into 4 regions
– Eden: newly allocated objects
– Survivor 1 and 2: objects from Eden that survive minor collection
– Tenured: objects from Survivor that survive several minor collections
Generational Collection Example
[Figure: heap regions Eden, Survivor 1, Survivor 2, and Tenured]
• Minor collection occurs whenever Eden gets full
– Live objects are copied to Survivor
• Survivor 1 and 2 rotate as destinations for a copy collector
• Objects that survive several minor collections move to Tenured
• Tenured objects are only scanned during major collection
• Major collections occur infrequently
More on Generational GC
• Separating young and old objects improves
performance
– Perform frequent, minor collections on young objects
– No need to scan old objects frequently
• Copy collection reduces fragmentation
– Eden and Survivor areas are relatively small, but they
are frequently erased

Parallel and Concurrent GC
• Modern JVMs ship with multiple generational GC implementations, including:
– The Parallel Collector
• Multiple GC threads take part in each minor/major collection
• Pauses the app and uses all CPU cores for GC
• Overall fastest GC, if your app can tolerate pauses
– The Concurrent Mark and Sweep Collector
• Also implements multi-threaded GC
• Runs several GC threads in parallel with user threads
• Best choice if your app is intolerant of pauses
• Selecting and tuning Java GCs is an art
malloc()/free() vs. GC
Explicit Alloc/Dealloc
• Advantages:
– Typically faster than GC
– No GC "pauses" in execution
– More efficient use of memory
• Disadvantages:
– More complex for programmers
– Tricky memory bugs
• Dangling pointers
• Double-free
• Memory leaks
– Bugs may lead to security vulnerabilities
Garbage Collection
• Advantages:
– Much easier for programmers
• Disadvantages:
– Typically slower than explicit alloc/dealloc
– Good performance requires careful tuning of the GC
– Less efficient use of memory
– Complex runtimes may have security vulnerabilities
• JVM gets exploited all the time
Other Considerations
• Garbage collectors are available for C/C++
– Boehm Garbage Collector
– Beware: this GC is conservative
• It tries to identify pointers using heuristics
• Since it can't identify pointers with 100% accuracy, it may conservatively retain memory that is actually garbage
• You can replace the default malloc() implementation if you want to
– Example: Google's high-performance tcmalloc library
– http://goog-perftools.sourceforge.net/doc/tcmalloc.html
