The way that memory is allocated and used is an oft-misunderstood topic. I’ve never found a satisfactory yet simple reference of everything a programmer should know about it, so here is my feeble attempt to write one.
Modern Operating Systems use a memory management design known as virtual memory. The term is widely misunderstood, even amongst experienced programmers. If you think that virtual memory is “when the hard drive is used as memory”, then you need to read very closely, because that definition is completely wrong, and it will remain wrong regardless of how certain Operating System vendors choose to label their interfaces. 🙂
Virtual memory is a memory architecture where processes are presented a view of memory as a single flat address space, when in fact the data may be stored in one or more physical devices, or in some cases, not stored at all. Using a hard drive in lieu of memory is known as “swapping”, and virtual memory is nice because it makes swapping easy and transparent to the programmer. There are plenty of benefits to virtual memory that have nothing to do with swapping, though.
One of the big benefits of virtual memory is that each process gets its own address space to play with. It is not possible for one process to access another process’s address space, because the OS maintains a table for each process which shows which addresses map to what physical storage for each process. So the address 0x00FABE0F20 will likely refer to one piece of physical memory in one process, and a completely different piece of physical memory in another process.
Memory is divided up into pages. A page is typically 4KB, and rarely less than 4KB, though some systems use (much) larger page sizes. When a process wants memory, it must use a system call to allocate one or more pages. This is typically the mmap() system call on Linux. The Operating System will set up a mapping, allocating an address range in the process’s address space.
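To make the idea of asking the OS for whole pages concrete, here is a minimal sketch in C, assuming Linux. The helper names (page_size, round_up_to_pages, alloc_pages, free_pages) are my own for illustration; only sysconf(), mmap() and munmap() are real system interfaces.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the system page size in bytes (commonly 4096). */
size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
}

/* Rounds len up to a whole number of pages, since the OS only deals in pages. */
size_t round_up_to_pages(size_t len)
{
    size_t ps = page_size();
    return (len + ps - 1) / ps * ps;
}

/* Asks the kernel for n_pages of private, anonymous (not file-backed) memory.
   Returns NULL on failure. The memory reads as all zeroes. */
void *alloc_pages(size_t n_pages)
{
    void *p = mmap(NULL, n_pages * page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hands the pages back to the kernel. Returns 0 on success. */
int free_pages(void *p, size_t n_pages)
{
    return munmap(p, n_pages * page_size());
}
```

The address returned by alloc_pages() is page-aligned, and the kernel is free to pick any unused range in the process’s address space for it.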
The Operating System is responsible for choosing how to physically store the data represented, and will do so in the most efficient way possible. Each page will be stored in one of three possible ways:
(1) unmapped: if the program has not written to the memory region since requesting its allocation, then it is by definition filled with all-zeroes. The Operating System does not have to store it at all, since it knows it’s just filled with zero bytes. Thus the OS will just mark the page as ‘unmapped’ until the program actually writes to it. As a result, on most Operating Systems, when you allocate “memory”, the OS will give you an address range but won’t actually map it to physical storage (yet).
(2) resident: the page corresponds to a page in RAM.
(3) swapped: the page corresponds to a page that has been swapped to disk.
It is quite important to understand the implications of (1). For instance, in C++ it is common to use the std::vector class template as a dynamic array. std::vector over-allocates memory so that it doesn’t have to expand its buffer too often. This may seem expensive, but for a decent size std::vector, you don’t actually pay in terms of physical memory until you start using the memory.
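You can watch state (1) turn into state (2) yourself. The sketch below, assuming Linux, uses the mincore() system call to count how many pages of a mapping are resident. The helper names resident_pages and touch_pages are hypothetical, chosen just for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Counts the pages of [addr, addr + len) that are currently resident in RAM.
   addr must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    assert(len > 0);
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + ps - 1) / ps;
    unsigned char *vec = malloc(n);   /* one status byte per page */
    long count = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1;          /* low bit set means "resident" */
    free(vec);
    return count;
}

/* Writes one byte into each of the first n_pages pages starting at addr,
   forcing the kernel to back those pages with physical memory. */
void touch_pages(void *addr, size_t n_pages)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = addr;

    for (size_t i = 0; i < n_pages; i++)
        p[i * ps] = 1;
}
```

If you mmap() a large anonymous region and call resident_pages() on it straight away, you should see zero. After touch_pages() on a few pages, only those pages (or a few more, if the kernel decides to use larger pages) become resident, and the rest of the region still costs no physical memory.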
For most typical programs these days, it’s actually common for between 10% and 50% of memory to be in state (1). State (2) is the next most common, and most people hate it when things start getting swapped, so (3) is relatively rare.
Let’s look at what top has to say about Frogatto’s memory usage:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1427 david     20   0  119m  76m  17m S   22  2.5   0:02.70 frogatto
VIRT refers to the amount of address space that Frogatto is using, i.e. the amount of memory that is in states (1), (2), or (3) above. RES refers to the physical memory that Frogatto is using, i.e. the amount of memory that is in state (2) above. As we can see, Frogatto has allocated 119MB of memory, but only 76MB is resident. Unless we have a good reason to think swapping has occurred, our best guess is that the rest is in state (1) above. So Frogatto has allocated 119MB, but only actually started using 76MB, and so the Operating System has only needed to actually allocate 76MB worth of physical memory for it.
It should be noted that on a 64 bit system, VIRT is not a very constrained resource. The address space is huge, and so VIRT can grow almost without bound. Most OSes will reject allocations that they consider ridiculously above the amount of physical storage space, though Linux at least can be configured to allow any allocation.
On a 32 bit system, each process can generally only allocate 2GB to 3GB of virtual memory (depending on the OS), due to lack of address space. This makes VIRT still a rather constrained resource for many applications, though for desktop applications 2GB is still considered a very large amount of memory, and most servers are moving toward 64 bit architectures.
Now, what is this “SHR” thing? It is how much memory is “shared”. Another feature that virtual memory facilitates nicely is making it so a virtual address range in use by one process happens to map to some physical memory that a completely different virtual address range in a different process also maps to. Some programs do this explicitly, as a communication mechanism — a very fast communication mechanism for that matter — however the most common reason this is done is by the OS itself as an efficient way of storing shared libraries in memory. It is common for a shared library to be used by many programs that are running at once, and rather than storing a copy of the shared library for each program, the OS will store it just once, and map each process to it.
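Here is a small sketch of explicit sharing in C, assuming Linux. It maps one anonymous region, forks a child, lets the child write to the region, and reports what the parent sees afterwards. The function name child_write_visible is made up for this example. Note that because the child is created with fork(), both processes happen to see the region at the same virtual address, which is a simplification of the general case described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one int, forks a child that stores `value` (non-zero) into it, and
   returns what the parent reads once the child has exited.
   shared != 0 uses MAP_SHARED, shared == 0 uses MAP_PRIVATE.
   Returns -1 on error. */
int child_write_visible(int shared, int value)
{
    assert(value != 0);   /* so the write is distinguishable from the initial 0 */
    int flags = MAP_ANONYMOUS | (shared ? MAP_SHARED : MAP_PRIVATE);
    volatile int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                              flags, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {       /* child: write and leave immediately */
        *cell = value;
        _exit(0);
    }

    int status;
    int seen = -1;
    if (waitpid(pid, &status, 0) == pid)
        seen = *cell;     /* parent: look at the same virtual address */
    munmap((void *)cell, sizeof *cell);
    return seen;
}
```

With a shared mapping, both processes’ page tables point at the same physical page, so the parent sees the child’s write. With a private mapping, the child gets its own copy of the page when it writes, and the parent still reads zero.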
Since we know that Frogatto doesn’t do any explicit sharing, this tells us that at least 17MB of Frogatto’s memory usage is due just to libraries. Being shared memory, it means that even if Frogatto wasn’t using this memory, another process would be, so Frogatto is actually only adding 59MB, not 76MB, to the system’s usage of RAM.
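On Linux, a process can read the same three numbers about itself from /proc/self/statm, whose first three fields are the total program size, the resident set size, and the shared pages, all counted in pages. The following is a rough sketch in C; the struct mem_usage type and the function names are assumptions of mine, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Sizes in KB, named after the corresponding columns in top. */
struct mem_usage {
    long virt_kb;
    long res_kb;
    long shr_kb;
};

/* Parses the first three fields of a statm-style line (counts of pages)
   and converts them to KB. Returns 0 on success, -1 on a malformed line. */
int parse_statm(const char *line, long page_kb, struct mem_usage *out)
{
    assert(line != NULL && out != NULL);
    long size, resident, shared;

    if (sscanf(line, "%ld %ld %ld", &size, &resident, &shared) != 3)
        return -1;
    out->virt_kb = size * page_kb;
    out->res_kb = resident * page_kb;
    out->shr_kb = shared * page_kb;
    return 0;
}

/* Reads the calling process's own usage. Returns 0 on success. */
int read_self_usage(struct mem_usage *out)
{
    char line[256];
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_statm(line, sysconf(_SC_PAGESIZE) / 1024, out);
}
```

Subtracting shr_kb from res_kb gives the same kind of estimate as the 59MB figure above: the resident memory that belongs to this process alone.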
You can actually break down all the memory usage of a program using a cool little command called pmap. Running pmap on Frogatto gives output that looks like this:
08048000   2132K r-x-- /home/david/frogatto/frogatto
0825d000      4K r---- /home/david/frogatto/frogatto
0825e000      4K rw--- /home/david/frogatto/frogatto
0825f000      8K rw--- [ anon ]
09218000  31400K rw--- [ anon ]
b288f000   2048K rw-s- /dev/dri/card0
b2a8f000   1024K rw-s- /dev/dri/card0
b2c90000    772K rw--- [ anon ]
b2d52000   2004K rw--- [ anon ]
b2f48000   3500K rw--- [ anon ]
b32b4000   1936K rw--- [ anon ]
b3499000   2476K rw--- [ anon ]
b3705000   1032K rw--- [ anon ]
b3807000      4K ----- [ anon ]
b3808000   8192K rwx-- [ anon ]
b4008000     12K r-x-- /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400b000      4K r---- /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400c000      4K rw--- /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400d000     64K rw-s- [ shmid=0x5c8016 ]
b401d000     84K r-x-- /lib/tls/i686/cmov/libnsl-2.9.so
[snip]
This shows us the starting address of the mapping, the size of the mapping — which is always a multiple of 4KB on this system with 4KB pages — the permissions of the memory, and then the underlying device.
On Linux, files may be “memory mapped”. That is, a file is loaded into a segment of memory, and (for a shared mapping) modifications of that memory modify the underlying file. This is very different to simply reading a file into memory; the memory becomes an actual representation of the file that the kernel maintains. We can see that the Frogatto binary itself occupies several megabytes, as do various libraries. The most common use of memory mapped files is to access executables and shared libraries, though it is possible to map any file into memory.
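To see that a mapping really is the file, and not a copy of it, here is a hedged sketch in C, assuming Linux and a writable /tmp. It creates a temporary file, changes its first byte through a shared mapping, and then reads the file back with an ordinary read-style call. The function name and the temporary file template are invented for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `initial` to a temporary file, overwrites its first byte with
   `first` through a shared mapping, then reads the file contents back
   into `out` as a NUL-terminated string. Returns 0 on success. */
int modify_file_through_mapping(const char *initial, char first,
                                char *out, size_t out_size)
{
    assert(initial != NULL && out != NULL);
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* assumed scratch location */
    size_t len = strlen(initial);
    int rc = -1;

    if (len == 0 || len >= out_size)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* file lives until fd is closed */

    if (write(fd, initial, len) == (ssize_t)len) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            map[0] = first;                   /* a plain memory write */
            munmap(map, len);
            if (pread(fd, out, len, 0) == (ssize_t)len) {
                out[len] = '\0';
                rc = 0;
            }
        }
    }
    close(fd);
    return rc;
}
```

Had the mapping been created with MAP_PRIVATE instead, the write would have gone to a private copy of the page and the file would have been left untouched. That is how writable data sections of executables and libraries are mapped.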
The [ anon ] mappings are requests the application has made for “just plain memory”, not mapped to any file. This is the most common form of memory an application obtains.
For most applications, getting memory in multiples of 4KB isn’t very convenient. Most modern applications tend to allocate very many objects, of varying sizes, most of which are far smaller than 4KB. So, most programming frameworks have an additional memory management layer which will allocate memory from the Operating System, and then carve it up into smaller and different sizes for the application to use. In C, this is usually malloc() and free(), and in Java it is the memory management/garbage collection system provided by the VM.
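The simplest possible version of such a layer is a “bump” allocator, sketched below in C under the assumption of a Linux system. It is only an illustration of carving up pages, not how malloc() is actually implemented: real allocators track free blocks, reuse them, and do a great deal more. All names here (struct arena, arena_init, arena_alloc, arena_destroy) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* A region obtained from the OS, handed out piece by piece. */
struct arena {
    unsigned char *base;   /* start of the mapped region */
    size_t size;           /* total bytes obtained from the OS */
    size_t used;           /* bytes already handed out */
};

/* Obtains `size` bytes (ideally a multiple of the page size) from the OS.
   Returns 0 on success, -1 on failure. */
int arena_init(struct arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Carves n bytes out of the arena, rounded up to a multiple of 16 so that
   every allocation is suitably aligned. Returns NULL if it does not fit. */
void *arena_alloc(struct arena *a, size_t n)
{
    assert(a->used <= a->size);
    size_t aligned = (n + 15) & ~(size_t)15;

    if (aligned < n || aligned > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* The only way to give memory back: release every allocation at once. */
void arena_destroy(struct arena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Note that there is no way to free an individual allocation here, and the pages only go back to the OS when the whole arena is destroyed. General-purpose allocators are far more flexible, but they face a milder version of the same constraint.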
Because memory is allocated in chunks of at least 4KB and then carved up, it is very difficult for a program to release memory that it has allocated back to the Operating System. A page might be carved up into 40 or 50 or 100 different small allocations, and every one of these would have to be released for it to be possible to release the memory back to the OS. For this reason, most programs stay at or near their high water mark in terms of memory usage.
Most memory allocators tend to satisfy large allocations by calling the OS directly. For instance, it is typical in dlmalloc (one of the oldest allocators, from which many modern ones are derived) to implement malloc() calls above a size threshold, on the order of 64KB or more depending on the allocator and its configuration, as a direct call to mmap() on Linux, and then call munmap() when free() is called.
This has advantages, though it also has disadvantages. Generally, if one accesses memory (or at least writes to it), it must be resident, i.e. in state (2) above. If the memory is actually in state (3), swapped, then it must be copied from disk to RAM. This is called a major page fault, and is terrible for performance. However, even if it’s in state (1), unmapped, the kernel must still spend time mapping it to physical RAM. This is called a minor page fault, which isn’t nearly as bad as a major page fault (indeed, it is typically several thousand times faster). Nevertheless, if your application regularly allocates and uses large buffers of memory, it might be better to cache and reuse the buffers than to continuously allocate and deallocate them, since every new allocation of a buffer will trigger a new round of minor page faults when you start using the memory.
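If you want to measure this effect, getrusage() reports the number of minor page faults a process has incurred so far. Below is a rough sketch in C, assuming Linux; the function names are hypothetical and the exact counts you get will vary with the kernel and its configuration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults incurred by this process so far, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Fills a len-byte buffer `rounds` times and returns the number of minor
   faults this caused. With reuse != 0 a single buffer is mapped once and
   kept; with reuse == 0 a fresh buffer is mapped and unmapped every round. */
long faults_for_rounds(size_t len, int rounds, int reuse)
{
    assert(len > 0 && rounds > 0);
    unsigned char *buf = NULL;
    long before = minor_faults();

    for (int i = 0; i < rounds; i++) {
        if (buf == NULL) {
            buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                return -1;
        }
        memset(buf, i + 1, len);   /* touch every page of the buffer */
        if (!reuse) {
            munmap(buf, len);      /* next round starts from state (1) again */
            buf = NULL;
        }
    }
    if (buf != NULL)
        munmap(buf, len);
    return minor_faults() - before;
}
```

Comparing faults_for_rounds(len, rounds, 0) against faults_for_rounds(len, rounds, 1) for a buffer of a few megabytes should show the fresh-buffer variant paying for its faults again on every round, while the reused buffer only pays once.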
The way in which the kernel allocates memory also has a somewhat unfortunate effect on the behavior when memory exhaustion occurs. The C memory management system was designed so that malloc() returns NULL if the memory couldn’t be allocated. However, unless you try to allocate something truly ridiculous, or run out of address space on a 32 bit machine, malloc() will almost certainly succeed on a modern OS. Instead what will happen is that when you actually start using the memory, the minor page fault will be unable to be satisfied, and the OS will kill your application. That is, if the user didn’t become sufficiently frustrated with all the swapping to kill it first.
Another feature of the way Operating Systems use memory is the filesystem cache. If you’re developing an OS, you know it’ll improve performance to use some memory to cache filesystem accesses, but how much do you use? Simple: use all the memory that applications aren’t using. This is how Linux and most other modern OSes do it. Let us look at the output of the free -m command:
             total       used       free     shared    buffers     cached
Mem:          3030       2948         82          0        222       1332
-/+ buffers/cache:       1392       1637
Swap:         4110        547       3563
At first glance it may appear that this system has 3030MB, is using 2948MB, and only has 82MB available. However, 1332MB are ‘cached’ — that is, being used for filesystem cache. 222MB more is being used in kernel buffers. If applications start using more memory, the OS will generally rather happily allow their requests to be satisfied by shrinking the size of the filesystem cache.
Thus for all intents and purposes, it is recommended to use the -/+ buffers/cache line and say that this machine is using 1392MB, and has 1637MB free. The OS uses the filesystem cache to fill up memory that isn’t otherwise being used, but usually prioritizes application storage over it.
I say “usually”, because Linux and other OSes sometimes decide that certain memory allocated by applications is being used very infrequently, while the filesystem cache is being put to good use, and actually swap out some portion of applications in favor of keeping more space for filesystem cache. Whether this is a good idea is sometimes debatable, but filesystem cache can indeed speed up the performance of many systems.
Speaking of swap, it is important to understand that swapping out generally isn’t very expensive. It can be done in the background. What is expensive and kills performance is when memory that has been swapped out is accessed and needs to be swapped back in. This generates major page faults: accesses to memory which require reading the page that the memory resides in back in from swap. This is usually especially terrible because if you were, say, traversing a linked list, each node in the list might reside in a different page, meaning you will have repeated page faults, and each time you have to wait for the page to be loaded before you can work out where the next node is.
So we’ve talked about various kinds of faults. Another popularly known fault is a segmentation fault. What is that exactly? Generally it occurs when a program tries to access a memory address that is not in the kernel’s table of mappings for that process, or if it accesses a page that is in the table of mappings, but does so in a way that it doesn’t have permission for. For instance, in the pmap output above, we can see that executables and libraries do not have write permission in their mappings.
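The permission case is easy to reproduce deliberately. The sketch below, in C and assuming Linux, maps one page with the given protection, then has a child process write to it and reports which signal (if any) killed the child. The function name signal_from_write is invented for this example; doing the write in a child simply keeps the crash from taking down the calling program.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one anonymous page with protection `prot` (e.g. PROT_READ), forks a
   child that writes a byte to it, and returns the number of the signal that
   killed the child, 0 if the child exited normally, or -1 on error. */
int signal_from_write(int prot)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    char *page = mmap(NULL, (size_t)ps, prot,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, (size_t)ps);
        return -1;
    }
    if (pid == 0) {
        *(volatile char *)page = 1;   /* faults if the page is not writable */
        _exit(0);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid)
        result = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    munmap(page, (size_t)ps);
    return result;
}
```

Calling signal_from_write(PROT_READ | PROT_WRITE) should report a normal exit, while signal_from_write(PROT_READ) should report SIGSEGV: the address is in the table of mappings, but the write is not permitted.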
Understanding how memory works gives us a better understanding of how buggy C programs will behave. Suppose you malloc() a buffer and then write to that buffer, overrunning it. It is possible that your overrun will end up straying into a page that you don’t have access to, though this is relatively unlikely. More likely is that you will simply overwrite some part of your own program’s memory that you didn’t intend to. However, it is quite likely that you will overwrite the value of a pointer, and the new value you write in the pointer will not correspond to a valid address in your application. Next time that pointer is dereferenced, you will have a segmentation fault.
Hopefully this article gives a good overview of how memory works from an OS and performance perspective. I welcome any comments or suggestions.