Linux
VIRTUAL MEMORY BASICS
Linux configures the memory management unit (MMU) of the CPU to present
a virtual address space to a running program that begins at zero and ends at the highest
address, 0xffffffff, on a 32-bit processor. This address space is divided into pages of
4 KiB by default.
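A program can confirm the page size at run time; a minimal sketch using sysconf():

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Report the page size the kernel is using; typically 4096 bytes */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}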
Linux divides this virtual address space into an area for applications, called user space,
and an area for the kernel, called kernel space.
Pages in this virtual address space are mapped to physical addresses by the MMU, which
uses page tables to perform the mapping.
Each page of virtual memory may be unmapped or mapped as follows:
• Unmapped, so that trying to access these addresses will result in a SIGSEGV.
• Mapped to a page of physical memory that is private to the process.
• Mapped to a page of physical memory that is shared with other processes.
• Mapped and shared with a copy-on-write (CoW) flag set: a write is trapped by the
kernel, which makes a copy of the page and maps it into the process in place of the
original page before allowing the write to take place (a short demonstration follows this list).
• Mapped to a page of physical memory that is used by the kernel.
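As a small illustration of the copy-on-write case, the following sketch maps a file privately and writes to the mapping; the write is satisfied from a private copy of the page and the file itself is untouched (/etc/passwd is used only as a convenient, non-empty example file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd < 0)
        return 1;

    /* MAP_PRIVATE: a write triggers copy-on-write, so the file is never modified */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("first character before the write: %c\n", p[0]);
    p[0] = '#';    /* the kernel traps this and gives the process its own copy of the page */
    printf("first character after the write:  %c\n", p[0]);

    munmap(p, 4096);
    close(fd);
    return 0;
}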
Advantages of virtual memory
• Invalid memory accesses are trapped and applications are alerted by SIGSEGV.
• Processes run in their own memory space, isolated from other processes.
• Efficient use of memory through the sharing of common code and data, for
example, in libraries.
• The possibility of increasing the apparent amount of physical memory by adding
swap files, although swapping on embedded targets is rare.
KERNEL SPACE MEMORY LAYOUT
Kernel memory is managed in a fairly straightforward way. It is not demand-paged,
which means that for every allocation made using kmalloc() or a similar function, there is
real physical memory behind it. Kernel memory is never discarded or paged out.
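As a sketch only (the module and symbol names are made up), a minimal kernel module that obtains memory with kmalloc(); the allocation is backed by physical memory as soon as the call succeeds, and it stays resident until kfree() is called:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static void *demo_buf;

static int __init demo_init(void)
{
    /* Physically contiguous memory from the slab allocator (lowmem) */
    demo_buf = kmalloc(4096, GFP_KERNEL);
    if (!demo_buf)
        return -ENOMEM;
    pr_info("demo: allocated 4 KiB of kernel memory\n");
    return 0;
}

static void __exit demo_exit(void)
{
    kfree(demo_buf);   /* kernel memory is never reclaimed automatically */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");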
Consumers of kernel space memory include the following:
• The kernel itself, in other words, the code and data loaded from the kernel image
file at boot time. This is reported in the kernel log at boot time as the .text, .init,
.data, and .bss segments. The .init segment is freed once the kernel has
completed initialization.
• Memory allocated through the slab allocator, which is used for kernel data
structures of various kinds. This includes allocations made using kmalloc(). They
come from the region marked lowmem.
• Memory allocated via vmalloc(), usually for chunks of memory larger than can be
obtained through kmalloc(). These are in the vmalloc area.
• Mapping for device drivers to access registers and memory belonging to various bits
of hardware, which you can see by reading /proc/iomem. These also come from
the vmalloc area, but since they are mapped to physical memory that is outside of
main system memory, they do not take up any real memory.
• Kernel modules, which are loaded into the area marked modules.
• Other low-level allocations that are not tracked anywhere else.
HOW MUCH MEMORY DOES THE KERNEL USE?
Unfortunately, there isn't a precise answer to the question of how much memory the
kernel uses, but what follows is as close as we can get.
Firstly, you can see the memory taken up by the kernel code and data in the kernel log
printed at boot time, or you can run the size command on the vmlinux file, which reports
the sizes of the text, data, and bss segments.
Usually, the amount of memory taken by the kernel for the static code and data segments
shown here is small when compared to the total amount of memory. If that is not the case,
you need to look through the kernel configuration and remove the components that you
don't need.
You can get more information about memory usage by reading /proc/meminfo.
The kernel memory usage is the sum of the following fields (a small sketch that computes this sum follows the list):
• Slab: The total memory allocated by the slab allocator
• KernelStack: The stack space used when executing kernel code
• PageTables: The memory used to store page tables
• VmallocUsed: The memory allocated by vmalloc()
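A rough sketch that adds up those four fields by reading /proc/meminfo (the parsing is simplified; values are in kiB as reported by the kernel):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char name[64];
    long kb, total = 0;

    if (!f)
        return 1;
    /* Each line looks like "Slab:    123456 kB" */
    while (fscanf(f, "%63[^:]: %ld kB\n", name, &kb) == 2) {
        if (!strcmp(name, "Slab") || !strcmp(name, "KernelStack") ||
            !strcmp(name, "PageTables") || !strcmp(name, "VmallocUsed"))
            total += kb;
    }
    fclose(f);
    printf("approximate kernel memory: %ld kB\n", total);
    return 0;
}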
USER SPACE MEMORY LAYOUT
Linux employs a lazy allocation strategy for user space, only mapping physical pages of memory when the
program actually accesses them. For example, allocating a buffer of 1 MiB using malloc() returns a pointer to a
block of memory addresses but no actual physical memory. A flag is set in the page table entries such that any
read or write access is trapped by the kernel; only at this point does the kernel attempt to find a page of physical
memory and add it to the page table mapping for the process (a short demonstration appears below).
The process memory map contains a heap area and a stack area. Memory allocated using malloc() comes from the
heap (except for very large allocations, which we will come to later); allocations on the stack come from the stack
area. The maximum size of both areas is controlled by the process's ulimit:
• Heap: ulimit -d, default unlimited
• Stack: ulimit -s, default 8 MiB
When running out of memory, the kernel may decide to discard pages that are mapped to a file and are read-only. If
that page is accessed again, it will cause a major page fault and be read back in from the file.
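The following sketch makes lazy allocation visible by printing the process's maximum resident set size (via getrusage()) before the allocation, after malloc() returns, and after the buffer has actually been touched:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Print the peak resident set size so far; ru_maxrss is in KiB on Linux */
static void print_rss(const char *label)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%s: max RSS %ld KiB\n", label, ru.ru_maxrss);
}

int main(void)
{
    print_rss("before malloc");
    char *buf = malloc(1024 * 1024);     /* 1 MiB of address space, no physical pages yet */
    print_rss("after malloc");
    memset(buf, 0, 1024 * 1024);         /* touching the pages faults them in */
    print_rss("after touching the buffer");
    free(buf);
    return 0;
}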
SWAPPING
• The idea of swapping is to reserve some storage where the kernel can place pages of memory that are not mapped to a file,
freeing up the memory for other uses. It increases the effective size of physical memory by the size of the swap file.
• It is not a panacea: there is a cost to copying pages to and from a swap file, which becomes apparent on a system that has too
little real memory for the workload it is carrying and so swapping becomes the main activity. This is sometimes known as disk
thrashing.
• Swap is seldom used on embedded devices because it does not work well with flash storage, where constant writing would
wear it out quickly. However, you may want to consider swapping to compressed RAM (zram).
Swapping to compressed memory (zram)
zram creates RAM-based block devices named /dev/zram0, /dev/zram1, and so on. When data is written to these
devices, it is first compressed and then stored in RAM.
With compression ratios in the range of 30% to 50%, you can expect an overall increase in free memory of about 10%, at the
expense of more processing and a corresponding increase in power usage. Once the zram module is loaded, you set a size for
the device (256M below is just an example), format it as swap, and enable it:
# echo 256M > /sys/block/zram0/disksize
# mkswap /dev/zram0
# swapon /dev/zram0
# swapoff /dev/zram0
• Swapping memory out to zram is better than swapping out to flash storage, but neither technique is a
substitute for adequate physical memory.
• User space processes depend on the kernel to manage virtual memory for them. Sometimes a program
wants greater control over its memory map than the kernel's defaults provide. There is a system call,
mmap, that lets a process map files or device memory into its address space for more direct access from user space.
MAPPING MEMORY WITH MMAP
• A process begins life with a certain amount of memory mapped to the text (the code) and data segments of the
program file, together with the shared libraries that it is linked with.
• It can allocate memory on its heap at runtime using malloc() and on the stack through locally scoped variables
and memory allocated through alloca(). It may also load libraries dynamically at runtime using dlopen().
• A process can also manipulate its memory map in an explicit way using mmap.
Using mmap to allocate private memory
• You can use mmap to allocate an area of private memory by setting MAP_ANONYMOUS in
the flags parameter and setting the file descriptor fd to -1 (see the sketch after this list).
• Anonymous mappings are better for large allocations because they do not pin down the heap with
chunks of memory, which would make fragmentation more likely.
• malloc() stops allocating memory from the heap for requests over 128 KiB (by default) and uses
mmap in this way.
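A minimal sketch of an anonymous, private mapping of 1 MiB:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1024 * 1024;

    /* Private, anonymous mapping: no backing file, fd is -1, offset is 0 */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memset(p, 0, len);     /* physical pages are allocated as they are touched */
    munmap(p, len);
    return 0;
}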
Using mmap to share memory
POSIX (Portable Operating System Interface) shared memory requires mmap to access the memory segment. In
this case, you set the MAP_SHARED flag and use the file descriptor returned by shm_open(), as sketched below.
Another process uses the same calls, filename, length, and flags to map the same region and share the memory.
Subsequent calls to msync(2) control when updates to memory are carried through to the underlying file.
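A minimal sketch of one side of such a sharing arrangement, assuming a shared memory object name of /demo_shm (the name and size are arbitrary; on older C libraries you may need to link with -lrt):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 4096;

    /* A second process would use the same object name to see the data */
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, len) < 0)
        return 1;

    /* MAP_SHARED: writes are visible to every process that maps the same object */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    strcpy(p, "hello from one process");

    munmap(p, len);
    close(fd);
    /* shm_unlink("/demo_shm") removes the object when it is no longer needed */
    return 0;
}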
Using mmap to access device memory
It is possible for a driver to allow its device node to be memory mapped and share some of the device
memory with an application. The exact implementation is dependent on the driver
One example is the Linux framebuffer, /dev/fb0. The framebuffer interface is defined in
/usr/include/linux/fb.h and includes ioctl commands to query the size of the display and the bits
per pixel; a sketch follows.
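A sketch of mapping the framebuffer and clearing the display; it assumes a simple linear framebuffer in which smem_len covers the visible screen:

#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;

    if (fd < 0)
        return 1;

    /* Query the display geometry and the length of the framebuffer memory */
    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0)
        return 1;
    printf("%ux%u, %u bits per pixel\n", var.xres, var.yres, var.bits_per_pixel);

    /* Map the device memory and clear the screen by writing zeros to it */
    unsigned char *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED)
        return 1;
    memset(fb, 0, fix.smem_len);

    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}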
HOW MUCH MEMORY DOES MY APPLICATION USE?
As with kernel space, the different ways of allocating, mapping, and sharing user space memory make it
quite difficult to answer this seemingly simple question.
You can force the kernel to free up caches by writing 1, 2, or 3 to
/proc/sys/vm/drop_caches (1 frees the page cache, 2 frees dentries and inodes, and 3 frees both).
The free command tells us how much memory is being used and how much is left; it tells us
neither which processes are using that memory nor in what proportions.
PER-PROCESS MEMORY USAGE
There are several metrics for measuring the amount of memory a process is using. I will begin with the
two that are easiest to obtain: the virtual set size (VSS) and the resident set size (RSS):
• VSS: Called VSZ in the ps command and VIRT in top, this is the total amount of memory
mapped by a process
• RSS: Called RSS in ps and RES in top, this is the sum of memory that is mapped to physical
pages of memory
Using top and ps
Both ps and top report these values (as VSZ and RSS, or VIRT and RES). However, because pages shared between
processes are counted in full against every process that maps them, adding up the RSS of all processes over-estimates
the total memory in use.
Using smem
In 2009, Matt Mackall began looking at the problem of accounting for shared pages in process
memory measurement and added two new metrics called unique set size, or USS, and proportional
set size, or PSS
• USS: This is the amount of memory that is committed to physical memory and is unique to a
process; it is not shared with any others
• PSS: This splits the accounting of shared pages that are committed to physical memory between
all the processes that have them mapped
Information about PSS is available in /proc/<PID>/smaps, which contains additional
information for each of the mappings shown in /proc/<PID>/maps
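As an illustration of what these tools compute, a rough sketch that sums the Pss fields of one process (PID 1 is used here purely as an example; reading another process's smaps generally requires appropriate permissions):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/1/smaps", "r");
    char line[256];
    long kb, total = 0;

    if (!f)
        return 1;
    /* Each mapping has a "Pss:   <n> kB" line; add them all up */
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "Pss: %ld kB", &kb) == 1)
            total += kb;
    }
    fclose(f);
    printf("PSS: %ld kB\n", total);
    return 0;
}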
There is a tool named smem that collates information from the smaps files and presents it in various
ways, including as pie or bar charts.
Since smem is written in Python, installing it on an embedded device requires a Python environment, which can
be too much of a burden for a single tool. To help in this case, there is a small program called smemcap, which
captures the state from /proc on the target device and saves it to a TAR file for later analysis on a host computer.
IDENTIFYING MEMORY LEAKS
• A memory leak occurs when memory is allocated but not freed when it is no longer needed.
• Memory leakage is by no means unique to embedded systems, but it becomes an issue partly because targets don't have
much memory in the first place and partly because they often run for long periods of time without rebooting.
• You will realize that there is a leak when you run free or top and see that free memory is continually going down even if
you drop caches, as shown in the preceding section. You will be able to identify the culprit (or culprits) by looking at the
USS and RSS per process.
• There are several tools for identifying memory leaks in a program. I will look at two: mtrace and valgrind.
mtrace
mtrace is a component of glibc that traces calls to malloc, free, and related functions, and identifies areas of
memory not freed when the program exits.
How does it work?
You add a call to mtrace() (declared in mcheck.h) near the start of the program, set the MALLOC_TRACE
environment variable to the name of a trace file before running it, and then post-process the trace with the
mtrace command, which lists the allocations that were never freed.
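A minimal sketch of a program instrumented with mtrace; the deliberately leaked buffer will appear in the report. Compile with debugging information, run with MALLOC_TRACE set to a file name, and then run the mtrace command on the binary and that file:

#include <mcheck.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    mtrace();                      /* start tracing malloc/free calls */

    char *leak = malloc(64);       /* never freed: will show up in the report */
    strcpy(leak, "this memory is never freed");

    return 0;                      /* muntrace() is optional; tracing stops at exit */
}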
valgrind
Valgrind contains several analysis tools; you select the one you want with the --tool option. Valgrind runs on the
major embedded platforms: Arm (Cortex-A), PowerPC, MIPS, and x86, in both 32-bit and 64-bit variants.
To find our memory leak, we need to use the default memcheck tool with the --leak-check=full option, which
prints the lines where the leaks were found (the program must be compiled with debug symbols for file and line information).
RUNNING OUT OF MEMORY
• However, there is always the possibility that a particular workload will cause a group of processes to
try to cash in on the allocations they have been promised at the same time and so demand more than
there really is. This is an out-of-memory (OOM) situation. At this point, there is no alternative
but to kill off processes until the problem goes away. This is the job of the out-of-memory killer.
Before we get to that, there is a tuning parameter for kernel allocations in
/proc/sys/vm/overcommit_memory, which you can set to the following values:
• 0: Heuristic over-commit: this is the default and the best choice in the majority of cases.
• 1: Always over-commit, never check: this is only really useful if you run programs that work with large
sparse arrays and allocate large areas of memory but write to a small proportion of them. Such
programs are rare in the context of embedded systems.
• 2: Always check, never over-commit: this is a reasonable choice if you are worried about running out
of memory, perhaps in a mission-critical or safety-critical application. It will fail allocations that are greater
than the commit limit, which is the size of the swap space plus the total memory multiplied by the
overcommit ratio. The overcommit ratio is controlled by /proc/sys/vm/overcommit_ratio and has a
default value of 50%.
As an example, suppose you have a device with 512 MiB of system RAM and you set a really
conservative ratio of 25%; the resulting commit limit is worked through below.
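A quick worked calculation, assuming no swap space is configured:

CommitLimit = swap + total RAM × (overcommit_ratio / 100)
            = 0 + 512 MiB × 0.25
            = 128 MiB

So user space would only be able to commit 128 MiB in total before allocations start to fail.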
There is another important variable in /proc/meminfo, called Committed_AS. This is the total
amount of memory that is needed to fulfill all the allocations made so far. With over-commit mode 2,
allocations that would push Committed_AS above the commit limit will fail.
• In all cases, the final defense is the OOM killer. It uses a heuristic method to calculate a badness score between 0
and 1,000 for each process and then terminates processes with the highest score until enough memory has been
freed. It logs a message in the kernel log each time it kills a process.