Linux Memory 2
Uploaded by xanarry

Linux Memory Management

Virtual memory layout: a closer look


➔ Applications require memory with different properties
◆ access permissions
◆ sharing
◆ file backed vs. anonymous
◆ dynamically sized
➔ /proc/{pid}/maps and mmap() system call

➔ Why should the OS worry about how user-space virtual addresses are managed?
◆ alternative: let a user-space library handle it
◆ the OS manages only virtual-to-physical translation
◆ is this possible?
Managing virtual memory (user-space addresses)

Some design considerations
◆ the virtual address space is quite large (for 64-bit)
◆ cannot assume the virtual address usage size in advance
◆ efficiency concerns: CPU and memory
◆ address space requirements → hardware structures (MMU)

[Figure: a process issues mmap(size, type, permissions) and munmap( )
calls; the kernel's virtual address space management decides where in
the 0 .. 2^N - 1 range each mapping is placed.]
Managing virtual memory (user-space addresses)

Virtual address space management alternatives
➔ contiguous allocation based on memory region type
◆ inflexible
◆ scalability issues
➔ sparse allocation
◆ sorted list of used ranges
◆ scalability issues
● can be solved using balanced search trees

[Figure: the same mmap( )/munmap( ) call flow; the kernel must place
each new mapping somewhere in the 0 .. 2^N - 1 virtual address space.]
How does Linux do it?

[Figure: struct task_struct's mm field points to a struct mm_struct,
which links the process's vm_area_structs, each recording start, end
and permissions (include/linux/mm_types.h).]

➔ start and end never overlap between two vm areas
➔ vmas can be merged/extended if permissions match
➔ Linux maintains both an rb_tree and a sorted list of vmas (see mm/mmap.c)

Example usage
➔ mmap( ), munmap( ), mremap( ) system calls
◆ some useful functions: find_vma( ), get_unmapped_area( ), vma_merge( )
◆ can be found in mm/mmap.c

➔ Page fault handler
◆ requires the vm_area access permissions to fix the page fault
◆ Ex: fault handling for a read-only vm_area vs. a read-write vm_area

➔ Feature: vm_area-specific page fault handlers
◆ struct vm_operations_struct *vm_ops
◆ mechanism to register callbacks on page faults (and some other events)
Demand paging: background
➔ Why not use physical addressing?

➔ Considering applications' expectations of virtual address space flexibility
◆ Why not use a segmentation-only design?
◆ Why use paging?

➔ Challenges in paging
◆ Size of translation meta-data
◆ Additional memory accesses during translation
4-level page tables (48-bit virtual address)

[Figure: a 48-bit virtual address split as 9 + 9 + 9 + 9 + 12 bits.
CR3 → mm->pgd; pgd_offset, pud_offset, pmd_offset and pte_offset select
the pgd_t, pud_t, pmd_t and pte_t entries, and the final 12 bits index
into the 4K physical frame.]
x86-64 page table entries (48-bit)

CR3 register (bits 63..52 | 51..12 | 11..0):
40-bit, 4K-aligned physical address of the PGD

pgd, pud, pmd, pte entries (bits 63..52 | 51..12 | 11..0):
40-bit, 4K-aligned physical address of the next level

Some important flags
◆ bit 0 (present/absent), bit 1 (read/write), bit 2 (user/supervisor),
  bit 5 (accessed), bit 7 (huge page), bit 63 (execute-disable)

*Source: Intel manual Vol. 3A, §4.5


Walk page table in s/w for fun and profit

#define PAGE_SIZE 4096
#define PAGE_SHIFT 12

/* Sketch of a 4-level walk (no locking or error checks) */
unsigned long get_pa(struct task_struct *tsk, unsigned long va)
{
    unsigned long address = (va >> PAGE_SHIFT) << PAGE_SHIFT;
    struct mm_struct *mm = tsk->mm;
    pgd_t *pgd = pgd_offset(mm, address);
    pud_t *pud = pud_offset(pgd, address);
    pmd_t *pmd = pmd_offset(pud, address);
    pte_t *pte = pte_offset_kernel(pmd, address); /* pte_flags, pte_none ... */
    unsigned long pfn = pte_pfn(*pte);
    return (pfn << PAGE_SHIFT) + (va - address);
}
Page table translations: debate
➔ Can the physical frames holding page tables themselves be swapped out?
Why or why not?
➔ How much memory is required to maintain the page tables in a system?
What are the determinant parameters?
➔ How can paging performance be optimized? For the proposed
optimizations:
◆ What are the tradeoffs?
◆ What are the assumptions?
Multiple page size support (4K, 2M and 1G)
➔ The same process can have multiple page sizes
◆ It all depends on how the page table is organized

➔ Strategy: collapse page tables starting from the lowest level (pte)
◆ pte level removed: a pmd entry addresses 21 bits = 2 MB
◆ pte and pmd removed: a pud entry addresses 30 bits = 1 GB

➔ Recall the huge page bit (PS bit) in page table entries
➔ What are the challenges?
Paging with multiple page sizes: example

[Figure: the same 9 + 9 + 9 + 9 + 12-bit walk from CR3 → mm->pgd, but a
pmd_t entry with the huge (PS) bit set maps a 2M physical frame
directly, while an ordinary pmd_t continues to a pte_t and a 4K frame.]
Page faults

➔ Synchronous or asynchronous?
➔ When can it happen? What are the triggers?
➔ Do we need any information about the faulting task? What and why?
➔ Why do we need a page table walk in software?
Page fault handling
➔ Page fault is an exception (#14 on x86)
◆ Translation missing at some level of the page table hierarchy
◆ Translation present, but access rights do not permit the access
◆ Error code provided by the CPU to distinguish the above scenarios

➔ Page fault handling
◆ CR2 register on x86 holds the faulting address
◆ Perform a software walk to find the missing entries
◆ Allocate missing entries in the page table hierarchy
◆ May require new physical allocation

➔ Code reference
◆ arch/x86/mm/fault.c: do_page_fault( )
What about kernel virtual addresses?
Addressing in kernel
➔ Kernel executes on behalf of ….
➔ Kernel state is accessible from all processes in kernel mode
◆ chardev example: P1 can read( ) a buffer allocated by a write( ) call from P2
◆ How is the buffer (a virtual address) accessible from P1's kernel context?

➔ Alternative 1: on process entry into the kernel, change the page tables
◆ Manage one or more kernel page tables
◆ Switch the page table when entering kernel mode
◆ Why not design a system like this?

➔ Alternative 2: kernel memory mapped into every process page table
◆ Pros and cons?
Monolithic kernel: every process has the same heart!

[Figure: Process 1 and Process 2 each have their own page table for
user-mode accesses; kernel-mode accesses from both processes go through
the same kernel mappings to the same physical memory.]

➔ Kernel virtual address mappings should be present in both process
page tables
➔ How to design this?
◆ Kernel can perform dynamic allocation → it should be reflected in
every process page table
◆ Processes are dynamically created and destroyed
Linux thrives on family values!
➔ A child process page table inherits the kernel mappings of its parent
➔ By implication, the inheritance tree is rooted at the first process
➔ What about changes in the mappings?
◆ Can be made by any process in kernel context
◆ Update the mapping in every process?

➔ What about reserving (and propagating) some pgd entries (pointing to
pud-level translation page frames) for kernel virtual addresses?
◆ Will it work?
◆ What happens when kernel virtual address mappings change?
A possible solution

[Figure: Process 0's CR3 points to mm->pgd; pgd entry E1 points to a
pud-level page PG1 reserved for kernel mappings.]

➔ One (or more) entries at the PGD level (level 4) reserved for the kernel
➔ VA range covered by one entry = ?
➔ How many entries?
A possible solution

[Figure: after fork( ), Process 1's page PG2 is a copy of Process 0's
PG1; pgd entries E1 .. Ek in both page tables point to the same kernel
pud-level translation pages.]

➔ PG2 is a copy of PG1 (initially)
➔ Any restriction on the kernel-usable VA range?
All the best !!
