Linux Memory 2
[Figure: address space spanning 0 to 2^N - 1]
Managing virtual memory (user space address)
[Figure: a process issues mmap(size, type, permissions) and munmap( ) calls; the kernel performs virtual address space management over the range 0 to 2^N - 1]
Alternatives
➔ contiguous allocation based on memory region type
◆ inflexible
◆ scalability issues
➔ sparse allocation
◆ sorted list of used ranges
◆ scalability issues
● Can be solved using balanced search trees
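The sparse-allocation alternative can be sketched as a toy model in Python. This is only an illustration, not kernel code: the class name `RangeList`, its methods and the 2^48 upper bound are assumptions made for the example. Lookup uses binary search, but inserting into a Python list still shifts elements, which is one reason a balanced search tree scales better.

```python
import bisect


class RangeList:
    """Toy model: used address ranges kept as a sorted list of (start, end)."""

    def __init__(self):
        self.ranges = []  # sorted by start; end is exclusive

    def add(self, start, end):
        if start >= end:
            raise ValueError("empty range")
        # Find the insertion point, then reject overlaps with either neighbour.
        i = bisect.bisect_left(self.ranges, (start, end))
        if i > 0 and self.ranges[i - 1][1] > start:
            raise ValueError("overlaps previous range")
        if i < len(self.ranges) and self.ranges[i][0] < end:
            raise ValueError("overlaps next range")
        self.ranges.insert(i, (start, end))

    def lookup(self, addr):
        # Binary search for the last range whose start <= addr.
        i = bisect.bisect_right(self.ranges, (addr, float("inf"))) - 1
        if i >= 0 and self.ranges[i][0] <= addr < self.ranges[i][1]:
            return self.ranges[i]
        return None

    def find_gap(self, size, lo=0, hi=2**48):
        # First-fit linear scan for a free hole of at least `size` bytes.
        prev_end = lo
        for start, end in self.ranges:
            if start - prev_end >= size:
                return prev_end
            prev_end = max(prev_end, end)
        return prev_end if hi - prev_end >= size else None
```

`lookup` plays the role of "which used range contains this address?", and `find_gap` plays the role of "where can a new mapping of this size go?".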
How does Linux do it?
[Figure: struct task_struct → (mm) → struct mm_struct → chain of struct vm_area_struct (include/linux/mm_types.h); each vma records start, end and perms]
➔ the [start, end) ranges of two vm areas never overlap
➔ can merge/extend vmas if permissions match
➔ Linux maintains both an rb_tree and a sorted list (see mm/mmap.c)
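The non-overlap and merge rules can be illustrated with a small Python model. The `Vma` class and `insert_and_merge` function are hypothetical names for this sketch. The kernel's own merge logic checks more than permissions (for example the backing file and offset), so treat this as a simplification.

```python
from dataclasses import dataclass


@dataclass
class Vma:
    start: int  # first address in the area
    end: int    # first address after the area (exclusive)
    perms: str  # e.g. "rw-"


def insert_and_merge(vmas, new):
    """Return a new start-sorted list containing `new`, with adjacent
    areas that have identical permissions merged into one."""
    merged = []
    for v in sorted(vmas + [new], key=lambda a: a.start):
        last = merged[-1] if merged else None
        if last and v.start < last.end:
            raise ValueError("vm areas must not overlap")
        if last and last.end == v.start and last.perms == v.perms:
            last.end = v.end  # extend the previous area instead of adding one
        else:
            merged.append(Vma(v.start, v.end, v.perms))
    return merged
```

Filling the hole between two "rw-" areas with another "rw-" area collapses all three into a single area, while a "r--" area in the same hole leaves three separate areas.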
Example usage
➔ mmap( ), munmap( ), mremap( ) system calls
◆ some useful calls: find_vma( ), get_unmapped_area( ), vma_merge( )
◆ can be found in mm/mmap.c
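From user space, the mmap( )/munmap( ) pair can be tried out through Python's standard `mmap` module, which wraps these system calls on Linux. This is a minimal sketch: the four-page size is an arbitrary choice, and the kernel-side functions listed above are not called directly from here.

```python
import mmap

PAGE = mmap.PAGESIZE

# fileno -1 requests an anonymous mapping (memory not backed by a file).
region = mmap.mmap(-1, 4 * PAGE)

region[0:5] = b"hello"       # write into the first page
data = bytes(region[0:5])    # read it back
untouched = region[PAGE]     # anonymous memory starts out zero-filled

region.close()               # releases the mapping (munmap on Linux)
```

Each successful mapping created this way shows up as a vm area of the calling process, which can be observed on Linux in /proc/self/maps.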
➔ Challenges in paging
◆ Size of translation meta-data
◆ Additional memory accesses during translation
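A back-of-the-envelope calculation shows why the size of translation meta-data is a concern. The numbers below assume a 48-bit virtual address, 4 KiB pages and 8-byte entries; the variable names are made up for this sketch.

```python
VA_BITS = 48       # assumed virtual address width
PAGE_SHIFT = 12    # 4 KiB pages
ENTRY_BYTES = 8    # assumed size of one translation entry

# A single flat table needs one entry per virtual page.
flat_entries = 1 << (VA_BITS - PAGE_SHIFT)
flat_bytes = flat_entries * ENTRY_BYTES

# A 4-level tree only allocates the tables that are actually needed:
# mapping one page needs one table per level.
LEVELS = 4
table_bytes = 512 * ENTRY_BYTES
min_tree_bytes = LEVELS * table_bytes

flat_gib = flat_bytes // 2**30
```

The flat table costs 512 GiB per address space, while the tree needs 16 KiB to map a single page. The price is the second challenge: a translation that misses in the TLB walks one table per level, so up to four extra memory accesses.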
4-level page tables (48-bit virtual address)
[Figure: the 48-bit virtual address is split into four 9-bit table indices followed by a 12-bit page offset; CR3 → mm->pgd]
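The 9/9/9/9/12 split can be written out as shifts and masks. The function name `split_va` is an assumption for this sketch; the index names follow the pgd, pud, pmd and pte levels used in these slides.

```python
def split_va(va):
    """Split a 48-bit virtual address into (pgd, pud, pmd, pte, offset)."""
    offset = va & 0xFFF          # low 12 bits: byte within the 4K page
    pte = (va >> 12) & 0x1FF     # next 9 bits: index into the pte table
    pmd = (va >> 21) & 0x1FF     # next 9 bits: index into the pmd table
    pud = (va >> 30) & 0x1FF     # next 9 bits: index into the pud table
    pgd = (va >> 39) & 0x1FF     # top 9 bits: index into the pgd table
    return pgd, pud, pmd, pte, offset
```

Each 9-bit index selects one of 512 entries, which is why a table of 8-byte entries fills exactly one 4K page.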
X86_64 page table entries (48-bit)
[Figure: bit layouts of the CR3 register and of the pgd, pud, pmd, pte entries, with field boundaries marked at bits 63, 52, 11 and 0]
➔ Recall the huge page bit (PS bit) in page table entries
➔ What are the challenges?
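To make the entry format concrete, here is a small decoder for the commonly used flag bits of an x86_64 page table entry. The constant and function names are chosen for this sketch. The address mask assumes an entry that points to a 4K frame or to a next-level table; huge-page entries use some of the low address bits differently.

```python
PRESENT = 1 << 0    # P: translation is valid
WRITABLE = 1 << 1   # R/W: writes allowed
USER = 1 << 2       # U/S: user-mode access allowed
ACCESSED = 1 << 5   # A: set by hardware on access
DIRTY = 1 << 6      # D: set by hardware on write
PS = 1 << 7         # PS: entry maps a huge page (pud/pmd level)
NX = 1 << 63        # XD/NX: instruction fetch not allowed
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 12..51


def decode_entry(entry):
    """Return the flag bits and address field of a 64-bit entry."""
    return {
        "present": bool(entry & PRESENT),
        "writable": bool(entry & WRITABLE),
        "user": bool(entry & USER),
        "accessed": bool(entry & ACCESSED),
        "dirty": bool(entry & DIRTY),
        "huge": bool(entry & PS),
        "no_exec": bool(entry & NX),
        "addr": entry & ADDR_MASK,
    }
```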
Paging with multiple page sizes: example
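[Figure: starting from CR3 → mm->pgd, the four 9-bit indices select a pgd_t, pud_t, pmd_t and pte_t entry, and the 12-bit offset selects a byte in a 4K physical frame; a pmd_t with the huge bit set (H) points directly to a 2M physical frame]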
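The walk in the figure can be simulated in software. In this sketch physical memory is modelled as a Python dict from table address to a sparse dict of entries; `walk`, `tables` and `root` are hypothetical names. Only 2M huge pages at the pmd level are handled, and 1G pages are left out.

```python
PRESENT = 1 << 0
PS = 1 << 7
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF
HUGE_2M_MASK = (1 << 21) - 1


def walk(tables, root, va):
    """Translate `va` by walking simulated page tables.

    Returns the physical address, or None where hardware would raise
    a page fault because an entry is not present."""
    table = root
    for shift in (39, 30, 21, 12):          # pgd, pud, pmd, pte
        entry = tables[table].get((va >> shift) & 0x1FF, 0)
        if not entry & PRESENT:
            return None
        base = entry & ADDR_MASK
        if shift == 12:                     # pte: 4K frame + 12-bit offset
            return base | (va & 0xFFF)
        if shift == 21 and entry & PS:      # huge pmd: 2M frame + 21-bit offset
            return (base & ~HUGE_2M_MASK) | (va & HUGE_2M_MASK)
        table = base                        # descend to the next level
    return None
```

With a huge pmd entry the walk stops one level early, and the 9 bits that would have indexed the pte table become part of the offset into the 2M frame.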
Page faults
➔ Synchronous or asynchronous?
➔ When can it happen? What are the triggers?
➔ Do we require any information regarding the faulting task? What and why?
➔ Why do we require a page walk in software?
Page fault handling
➔ A page fault is an exception (vector 14 on x86)
◆ Translation missing at any level of the page table hierarchy
◆ Translation present, but access rights do not permit the access
◆ Error code provided by the CPU distinguishes the above scenarios
➔ Code reference
◆ arch/x86/mm/fault.c do_page_fault( )
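The error code that the CPU pushes for a page fault can be decoded bit by bit. The constants below mirror the X86_PF_* names used in the kernel sources for the low five bits; `describe_fault` is a hypothetical helper for this sketch. The faulting address itself is reported separately in the CR2 register.

```python
X86_PF_PROT = 1 << 0    # 0: translation not present, 1: protection violation
X86_PF_WRITE = 1 << 1   # 0: read access, 1: write access
X86_PF_USER = 1 << 2    # 0: kernel-mode access, 1: user-mode access
X86_PF_RSVD = 1 << 3    # reserved bit set in a page table entry
X86_PF_INSTR = 1 << 4   # fault was an instruction fetch


def describe_fault(error_code):
    """Turn a page fault error code into a readable summary."""
    if error_code & X86_PF_INSTR:
        access = "instruction fetch"
    elif error_code & X86_PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": ("protection violation" if error_code & X86_PF_PROT
                  else "translation not present"),
        "access": access,
        "mode": "user" if error_code & X86_PF_USER else "kernel",
        "reserved_bit": bool(error_code & X86_PF_RSVD),
    }
```

For example, error code 0x6 describes a user-mode write to an address with no present translation, the typical first touch of a freshly mapped page.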
What about kernel virtual addresses?
Addressing in kernel
➔ Kernel executes on behalf of ….
➔ Kernel state is accessible from all processes in kernel mode
◆ chardev example: P1 can read( ) a buffer allocated by a write( ) call from P2
◆ How is the buffer (a virtual address) accessible from P1’s kernel context?
➔ Alternative 1: During process entry into the kernel, change the page tables
◆ Manage one or more kernel page tables
◆ Switch the page table when entering kernel mode
◆ Why not design a system like this?
A possible solution
[Figure: Process - 0 and Process - 1 each have their own CR3 → mm->pgd; the top-level tables PG1 and PG2 hold pgd_t entries E1 to Ek with pud_t tables below them; PG2 is produced from PG1 by copy]
➔ fork( ): PG2 is a copy of PG1 (initially)
➔ Any restriction on kernel usable VA range?
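One way to read the figure is that copying the top-level table copies the entries, not the tables they point to. The toy model below shows what follows from that reading. Everything in it is an assumption for illustration: tables are Python dicts, `fork_pgd` is a made-up helper, and index 256 stands in for a kernel entry E1.

```python
def fork_pgd(parent_pgd):
    """Copy the top-level table only; each copied entry still refers to
    the same lower-level table object as the parent's entry."""
    return dict(parent_pgd)


shared_pud = {}                  # lower-level table reached through E1
pg1 = {256: shared_pud}          # parent's top-level table (PG1)
pg2 = fork_pgd(pg1)              # child's top-level table (PG2)

# A kernel mapping added below an entry that existed at fork time
# is reachable through both top-level tables.
shared_pud[7] = "chardev buffer"
visible_in_both = pg1[256][7] == pg2[256][7]

# A top-level entry added to PG1 after the fork is not seen through PG2.
pg1[300] = {}
missing_in_pg2 = 300 not in pg2
```

In this model, kernel addresses covered by entries present at copy time are shared automatically, while addresses that need a new top-level entry are not, which is one way to approach the question about restrictions on the kernel-usable VA range.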
All the best !!