5.2 WinMemManagFundamentals
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze
Copyright Notice
© 2000-2005 David A. Solomon and Mark Russinovich
Roadmap for Section 5.2.
Windows Memory Management Fundamentals
Classical virtual memory management
Flat virtual address space per process
Private process address space
Global system address space
Per session address space
Object based
Section object and object-based security (ACLs...)
Demand paged virtual memory
Pages are read in on demand & written out when necessary (to make room for other memory needs)
Provides flat virtual address space
32-bit: 4 GB, 64-bit: 16 Exabytes (theoretical)
Windows Memory Management Fundamentals
Lazy evaluation
Sharing – usage of prototype PTEs (page table entries)
Extensive use of copy-on-write
...whenever possible
Shared memory with copy-on-write
Mapped files (fundamental primitive)
Provides basic support for the file system cache manager
Memory Manager Components
System services for allocating, deallocating, and managing virtual memory
An access fault trap handler for resolving hardware-detected memory management exceptions and making virtual pages resident on behalf of a process
Six system threads
Working set manager (priority 16) – drives overall memory management policies, such as working set trimming, aging, and modified page writing
Process/stack swapper (priority 23) – performs both process and kernel thread stack inswapping and outswapping
Modified page writer (priority 17) – writes dirty pages on the modified list back to the appropriate paging files
Mapped page writer (priority 17) – writes dirty pages from mapped files to disk
Dereference segment thread (priority 18) – responsible for cache and page file growth and shrinkage
Zero page thread (priority 0) – zeros out pages on the free list
MM: Process Support
MmCreateProcessAddressSpace – 3 pages
The page directory
Points to itself
Map the page table of the hyperspace
Map system paged and nonpaged areas
Map system cache page table pages
The page table page for working set
The page for the working set list
MmInitializeProcessAddressSpace
Initialize PFN for PD and hyperspace PDEs
MiInitializeWorkingSetList
Optional: MmMapViewOfSection for image file
MmCleanProcessAddressSpace, MmDeleteProcessAddressSpace
MM: Process Swap Support
MmOutSwapProcess / MmInSwapProcess
MmCreateKernelStack
MiReserveSystemPtes for stack and no-access page
MmDeleteKernelStack
MiReleaseSystemPtes
MmGrowKernelStack
MmOutPageKernelStack
Signature (thread_id) written on top of stack before write
The page goes to transition list
MmInPageKernelStack
Check signature after stack page is read; bugcheck on mismatch
MM: Working Sets
Working Set:
The set of pages in memory at any time for a given process, or
All the pages the process can reference without incurring a page fault
Per process, private address space
WS limit: maximum number of pages a process can own
Implemented as array of working set list entries (WSLE)
Soft vs. Hard Page Faults:
Soft page faults resolved from memory (standby/modified page lists)
Hard page faults require disk access
Working Set Dynamics:
Page replacement when WS limit is reached
NT 4.0: page replacement based on modified FIFO
Windows 2000: Least Recently Used algorithm (uniprocessor systems)
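The working set dynamics above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the Windows algorithm: the 3-page limit, the 2-entry standby list, and the names `toy_ws` and `toy_ws_touch` are hypothetical, and replacement is plain LRU. A fault on a page that is still on the standby list counts as soft; any other fault counts as hard.

```c
#include <assert.h>
#include <stddef.h>

#define WS_LIMIT 3       /* hypothetical working set limit, in pages */
#define STANDBY_LIMIT 2  /* hypothetical standby list capacity */

typedef struct {
    int ws[WS_LIMIT];            /* resident pages; ws[0] is least recently used */
    size_t ws_count;
    int standby[STANDBY_LIMIT];  /* trimmed but still in memory; standby[0] is oldest */
    size_t standby_count;
    unsigned soft_faults, hard_faults;
} toy_ws;

/* Remove arr[i], shifting later entries down; returns the removed value. */
static int remove_at(int *arr, size_t *count, size_t i) {
    int v = arr[i];
    for (size_t j = i + 1; j < *count; j++) arr[j - 1] = arr[j];
    (*count)--;
    return v;
}

/* Index of `page` in arr, or -1 if absent. */
static long find(const int *arr, size_t count, int page) {
    for (size_t i = 0; i < count; i++)
        if (arr[i] == page) return (long)i;
    return -1;
}

/* Record one reference to `page`. */
void toy_ws_touch(toy_ws *w, int page) {
    assert(w != NULL);
    long i = find(w->ws, w->ws_count, page);
    if (i >= 0) {                                  /* hit: becomes most recently used */
        remove_at(w->ws, &w->ws_count, (size_t)i);
        w->ws[w->ws_count++] = page;
        return;
    }
    long s = find(w->standby, w->standby_count, page);
    if (s >= 0) {                                  /* soft fault: reclaimed from memory */
        remove_at(w->standby, &w->standby_count, (size_t)s);
        w->soft_faults++;
    } else {
        w->hard_faults++;                          /* hard fault: would need disk I/O */
    }
    if (w->ws_count == WS_LIMIT) {                 /* at the WS limit: trim the LRU page */
        int victim = remove_at(w->ws, &w->ws_count, 0);
        if (w->standby_count == STANDBY_LIMIT)     /* oldest standby page is repurposed */
            remove_at(w->standby, &w->standby_count, 0);
        w->standby[w->standby_count++] = victim;
    }
    w->ws[w->ws_count++] = page;
}
```

With the reference string 1, 2, 3, 4, 1, 5, 2 this model records five hard faults and two soft faults: pages 1 and 2 are trimmed to the standby list and later reclaimed without disk I/O.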
MM: Working Set Management
MM: I/O Support
MM: Cache Support
Memory Manager: Services
Protecting Memory
Attribute | Description
PAGE_NOACCESS | Read/write/execute causes access violation
PAGE_READONLY | Write/execute causes access violation; read permitted
PAGE_READWRITE | Read/write accesses permitted
PAGE_EXECUTE | Any read/write causes access violation; execution of code is permitted (relies on special processor support)
PAGE_EXECUTE_READ | Read/execute access permitted (relies on special processor support)
PAGE_EXECUTE_READWRITE | All accesses permitted (relies on special processor support)
PAGE_WRITECOPY | Write access causes the system to give the process a private copy of this page; attempts to execute code cause access violation
PAGE_EXECUTE_WRITECOPY | Write access causes creation of a private copy of the page; execution permitted
PAGE_GUARD | Any read/write attempt raises EXCEPTION_GUARD_PAGE and turns off guard page status
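The protection table can be read as a decision function. The sketch below encodes it; the constant values are the ones defined in the Windows SDK header winnt.h, while `check_access`, `access_kind`, and `outcome` are hypothetical names used only for illustration. It assumes a processor that enforces no-execute and does not model combinations the table does not cover.

```c
#include <assert.h>

/* Protection constants as defined in winnt.h. */
#define PAGE_NOACCESS          0x01
#define PAGE_READONLY          0x02
#define PAGE_READWRITE         0x04
#define PAGE_WRITECOPY         0x08
#define PAGE_EXECUTE           0x10
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80
#define PAGE_GUARD             0x100

typedef enum { ACC_READ, ACC_WRITE, ACC_EXECUTE } access_kind;
typedef enum { ALLOWED, PRIVATE_COPY, ACCESS_VIOLATION, GUARD_PAGE_HIT } outcome;

/* What the table says happens when an access of `kind` touches a page. */
outcome check_access(unsigned protect, access_kind kind) {
    assert(kind == ACC_READ || kind == ACC_WRITE || kind == ACC_EXECUTE);
    if (protect & PAGE_GUARD)      /* first touch raises the guard-page exception; */
        return GUARD_PAGE_HIT;     /* the system then clears the guard status      */
    switch (protect) {
    case PAGE_NOACCESS:          return ACCESS_VIOLATION;
    case PAGE_READONLY:          return kind == ACC_READ ? ALLOWED : ACCESS_VIOLATION;
    case PAGE_READWRITE:         return kind == ACC_EXECUTE ? ACCESS_VIOLATION : ALLOWED;
    case PAGE_EXECUTE:           return kind == ACC_EXECUTE ? ALLOWED : ACCESS_VIOLATION;
    case PAGE_EXECUTE_READ:      return kind == ACC_WRITE ? ACCESS_VIOLATION : ALLOWED;
    case PAGE_EXECUTE_READWRITE: return ALLOWED;
    case PAGE_WRITECOPY:
        if (kind == ACC_WRITE) return PRIVATE_COPY;   /* copy-on-write */
        return kind == ACC_READ ? ALLOWED : ACCESS_VIOLATION;
    case PAGE_EXECUTE_WRITECOPY: return kind == ACC_WRITE ? PRIVATE_COPY : ALLOWED;
    default:                     return ACCESS_VIOLATION;  /* not modeled */
    }
}
```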
Reserving & Committing Memory
Optional 2-phase approach to memory allocation:
1. Reserve address space (in multiples of page size)
2. Commit storage in that address space
Can be combined in one call (VirtualAlloc, VirtualAllocEx)
Reserved memory:
Range of virtual addresses reserved for future use (contiguous buffer)
Accessing reserved memory results in access violation
Fast, inexpensive
Committed memory:
Has backing store (pagefile.sys, memory-mapped file)
Either private or mapped into a view of a section
Decommit via VirtualFree, VirtualFreeEx
A thread’s user-mode stack is constructed using this 2-phase approach: the initial reserved size is 1 MB, and only 2 pages are committed (stack & guard page)
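The reserve-then-commit pattern, including the 1 MB stack reservation with only two committed pages, has a rough POSIX analog that runs outside Windows. This is a sketch, not the Windows API: `mmap` with `PROT_NONE` stands in for a reservation and `mprotect` for committing; `reserve_then_commit` is a hypothetical helper, and Linux commit accounting differs from the Windows commit charge.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve reserve_bytes of address space with no access (like a reservation),
   then make the first commit_bytes readable and writable (like committing). */
void *reserve_then_commit(size_t reserve_bytes, size_t commit_bytes) {
    assert(commit_bytes <= reserve_bytes);
    void *base = mmap(NULL, reserve_bytes, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) return NULL;
    /* "Commit" the leading part; touching the rest still faults (SIGSEGV),
       much as touching reserved memory raises an access violation. */
    if (commit_bytes > 0 &&
        mprotect(base, commit_bytes, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, reserve_bytes);
        return NULL;
    }
    return base;
}
```

Usage mirrors the stack example: reserve 1 MB, commit two pages (size from `sysconf(_SC_PAGESIZE)`), and release the whole range with `munmap` when done.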
Features new to Windows 2000 Memory Management
Support of 64 GB physical memory on Intel platform
PAE – physical address extension (36-bit, changes PDE/PTE structures)
New version of kernel (ntkrnlpa.exe, ntkrpamp.exe)
/PAE switch in boot.ini
Integrated support for Terminal Server
HydraSpace: per-session address space
In NT 4, Terminal Server required a separate kernel
Driver Verifier: verifier.exe
Pool checking, IRQL checking
Low resources simulation, pool tracking, I/O verification
Features new to Windows XP/2003 Memory Management
64-bit support
Up to 1024 GB physical memory supported
Support for Data Execution Prevention (DEP)
Memory manager supports HW no-execute protection
Performance & scalability enhancements
Shared Memory & Mapped Files
Virtual Address Space Allocation
Large Pages
Large pages allow a single page directory entry to map a larger region
x86: 4 MB (2 MB with PAE), x64: 2 MB, IA64: 16 MB
Advantage: improves performance
Single TLB entry used to map larger area
Large pages are used to map NTOSKRNL, HAL, nonpaged pool, and the PFN database if a “large memory system”
Windows 2000: more than 127 MB
Windows XP/2003: more than 255 MB
In other words, most systems…
Disadvantage: disables kernel write protection
With small pages, OS/driver code pages are mapped as read only; with large pages, the entire area must be mapped read/write
Drivers can then modify/corrupt system & driver code without immediately crashing the system
Driver Verifier turns large pages off
Can also override by changing HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageMinimum to FFFFFFFF
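The TLB benefit of large pages is simple arithmetic: the number of mappings needed to cover a region shrinks by the ratio of the page sizes. A minimal sketch with a hypothetical helper name and a hypothetical 16 MB region; it ignores TLB capacity and only counts mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page mappings (one TLB entry each, if cached) needed to cover
   `bytes` with pages of `page_size` bytes. */
uint64_t mappings_needed(uint64_t bytes, uint64_t page_size) {
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;   /* round up */
}
```

For 16 MB, 4 KB pages need 4096 mappings, 2 MB pages need 8, and 4 MB pages need 4.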
Large Pages: Server 2003 Enhancements
Can specify other drivers to map with large pages:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers (multi-string)
Applications can use large pages for process memory
VirtualAlloc with MEM_LARGE_PAGE flag
Can query if system supports large pages with GetLargePageMinimum
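Before calling VirtualAlloc with MEM_LARGE_PAGE, an application has to size the request to a multiple of the value GetLargePageMinimum reports (the Windows documentation also requires MEM_RESERVE | MEM_COMMIT and the “lock pages in memory” privilege). The helper below, a hypothetical name, does only the rounding and runs anywhere; it assumes the minimum is a power of two, which holds for the large page sizes listed earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to a multiple of `large_min`, the value an application would
   obtain from GetLargePageMinimum(); 0 means large pages are unsupported. */
size_t round_to_large_pages(size_t size, size_t large_min) {
    if (large_min == 0) return 0;                      /* no large-page support */
    assert((large_min & (large_min - 1)) == 0);        /* power of two assumed */
    return (size + large_min - 1) & ~(large_min - 1);
}
```

A 5 MB request with a 2 MB minimum becomes 6 MB; a minimum of 0 is passed through so the caller can fall back to normal pages.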
Data Execution Prevention
Windows XP SP2 and Windows Server 2003 SP1 support Data Execution Prevention (DEP)
Prevents code from executing in a memory page not specifically marked as executable
Stops exploits that rely on getting code executed in data areas
Relies on hardware ability to mark pages as non-executable
AMD calls it NX (“No Execute”)
Intel calls it XD (“Execute Disable”)
Processor support:
Intel Itanium had this in 2001, but Windows didn’t support it until now
AMD64 was the next to support it
Then, AMD added Sempron (32-bit processor with NX support)
Intel added it first with their 64-bit extension chips (Xeon/Pentium 4s with EM64T)
More recently, Intel added it to their 32-bit processor line (anything ending in “J”)
Data Execution Prevention
Controlling DEP
DEP on 64-bit Windows
Always applied to all 64-bit processes and device drivers
Protects user and kernel stacks, paged pool, session pool
32-bit processes depend on configuration settings
DEP on 32-bit Windows
Hardware DEP used when running 32-bit Windows on systems that support it
When enabled, system boots PAE kernel (Ntkrnlpa.exe)
Kernel mode: applied to kernel stacks, but not paged/session pool
User mode: depends on system configuration
Even on processors without hardware DEP, some limited protection implemented for exception handlers
Mapped Files
A way to take part of a file and map it to a range of virtual addresses (address space is 2 GB, but files can be much larger)
Called “file mapping objects” in Windows API
Bytes in the file then correspond one-for-one with bytes in the region of virtual address space
Read from the “memory” fetches data from the file
Pages are kept in physical memory as needed
Changes to the memory are eventually written back to the file (can request explicit flush)
Initial mapped files in a process include:
The executable image (EXE)
One or more Dynamically Linked Libraries (DLLs)
Processes can map additional files as desired (data files or additional DLLs)
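Mapped-file behavior (reads served from the file, changes written back, optional explicit flush) can be tried with the UNIX mmap() call this section mentions for comparison. This is a POSIX sketch, not the Windows API: `msync` plays the role of an explicit flush (FlushViewOfFile on Windows), and `uppercase_first_byte` and `temp_file_with` are hypothetical helpers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file shared, change its first byte through memory, and flush.
   Returns 0 on success, -1 on failure. */
int uppercase_first_byte(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) return -1;
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return -1;
    if (p[0] >= 'a' && p[0] <= 'z')      /* reading memory reads the file */
        p[0] = (char)(p[0] - 'a' + 'A'); /* writing memory dirties the file page */
    int rc = msync(p, len, MS_SYNC);     /* explicit flush back to the file */
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}

/* Helper for the usage example: an anonymous temporary file holding `text`. */
FILE *temp_file_with(const char *text) {
    FILE *f = tmpfile();
    if (f == NULL) return NULL;
    size_t n = strlen(text);
    if (fwrite(text, 1, n, f) != n || fflush(f) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

After `uppercase_first_byte(fileno(temp_file_with("hello")))`, an ordinary read of the file returns "Hello": the store made through memory reached the file.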
Section Objects (mapped files)
Called “file mapping objects” in Windows API
Files may be mapped into v.a.s.
// first, do EITHER ...
hMapObj = CreateFileMapping (hFile, security, protection, sizeHigh, sizeLow, mapname);
// … OR …
hMapObj = OpenFileMapping (accessMode, inheritflag, mapname);
// … then, pass the resulting handle to a mapping object (section) to ...
lpvoid = MapViewOfFile (hMapObj, accessMode, offsetHigh, offsetLow, cbMap);
Bytes in the file then correspond one-for-one with bytes in the region of virtual address space
Read from the “memory” fetches data from the file
Changes to the memory are written back to the file
Pages are kept in physical memory as needed
If desired, can map to only a part of the file at a time
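Mapping only part of a large file requires choosing the view offset. MapViewOfFile expects the offset, split into offsetHigh and offsetLow, to be a multiple of the system allocation granularity (SYSTEM_INFO.dwAllocationGranularity, typically 64 KB). The portable sketch below, with hypothetical names, does that arithmetic.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t view_offset;              /* aligned file offset to map from */
    uint64_t delta;                    /* position of the wanted byte within the view */
    uint32_t offset_high, offset_low;  /* view_offset split into two DWORD halves */
} view_window;

/* Plan a view that contains file offset `want`, given the allocation granularity. */
view_window plan_view(uint64_t want, uint32_t granularity) {
    assert(granularity != 0);
    view_window w;
    w.view_offset = want - want % granularity;      /* round down */
    w.delta = want - w.view_offset;
    w.offset_high = (uint32_t)(w.view_offset >> 32);
    w.offset_low = (uint32_t)(w.view_offset & 0xFFFFFFFFu);
    return w;
}
```

The caller then maps `delta + length` bytes and adds `delta` to the returned view address to reach the wanted data.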
Shared Memory
[Figure: Process 1 virtual memory]
Viewing DLLs & Memory Mapped Files
Process Explorer lists memory mapped files
Copy-On-Write Pages
Used for sharing between process address spaces
Pages are originally set up as shared, read-only, faulted from the common file
Access violation on write attempt alerts pager
Pager makes a copy of the page and allocates it privately to the process doing the write, backed to the paging file
So, only need unique copies for the pages in the shared region that are actually written (example of “lazy evaluation”)
Original values of data are still shared
e.g. writable data initialized with C initializers
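Copy-on-write can be observed with a private file mapping, the POSIX counterpart of a PAGE_WRITECOPY view. A sketch assuming a POSIX system; `write_through_private_view` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first byte of a file privately, write through the mapping, then
   compare. Returns the byte now seen in memory (or -1 on error) and stores
   the byte still in the file in *file_byte. */
int write_through_private_view(FILE *f, char *file_byte) {
    int fd = fileno(f);
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return -1;
    p[0] = 'X';       /* first store: this process gets a private copy of the page */
    int seen = p[0];
    munmap(p, 1);     /* the private copy is discarded; the file is untouched */
    if (pread(fd, file_byte, 1, 0) != 1) return -1;
    return seen;
}
```

The store lands in the process’s private copy of the page, so the view reads 'X' while the file keeps its original first byte.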
How Copy-On-Write Works
Before
[Figure: two process address spaces and physical memory; Page 1: original data, Page 2: original data, Page 3]
How Copy-On-Write Works
After
[Figure: Page 1: original data, Page 2: modified data, Page 3]
Shared Memory = File Mapped by Multiple Processes
[Figure: Process A and Process B, each with a user-accessible v.a.s. from 00000000 to 7FFFFFFF]
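Sharing one mapping between two processes can be sketched on a POSIX system with an anonymous MAP_SHARED region inherited across fork(), roughly comparable to a paging-file-backed section (CreateFileMapping with INVALID_HANDLE_VALUE). The helper name is hypothetical; for unrelated processes, a named mapping opened with OpenFileMapping would be the Windows route.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child stores `value` in a shared page; the parent reads it back.
   Returns the value the parent sees, or -1 on error. */
int value_written_by_child(int value) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;
    pid_t pid = fork();
    if (pid < 0) { munmap(shared, sizeof *shared); return -1; }
    if (pid == 0) {              /* child: same physical page, different process */
        *shared = value;
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);    /* wait so the read below sees the child's store */
    int result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```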
Virtual Address Space (V.A.S.)
Process space contains:
The application you’re running (.EXE and .DLLs)
A user-mode stack for each thread (automatic storage)
All static storage defined by the application
[Figure: 00000000–7FFFFFFF user accessible, unique per process; 80000000–FFFFFFFF kernel-mode accessible, system-wide]
Virtual Address Space (V.A.S.)
System space contains:
Executive, kernel, and HAL
Statically-allocated system-wide data cells
Page tables (remapped for each process)
Executive heaps (pools)
Kernel-mode device drivers (in nonpaged pool)
File system cache
A kernel-mode stack for every thread in every process
[Figure: 00000000–7FFFFFFF user accessible, unique per process; 80000000–FFFFFFFF kernel-mode accessible, system-wide]
Windows User Process Address Space Layout
Range | Size | Function
0x0 – 0xFFFF | 64 KB | No-access region to catch incorrect pointer references
0x10000 – 0x7FFEFFFF | 2 GB minus at least 192 KB | The private process address space
0x7FFDE000 – 0x7FFDEFFF | 4 KB | Thread Environment Block (TEB) for the first thread; more TEBs are created at the page prior to that page
3GB Process Space Option
Only available on: Windows Server 2003, Enterprise Edition & Windows 2000 Advanced Server, XP SP2
Limits physical memory to 16 GB
/3GB option in BOOT.INI
Windows Server 2003 and XP SP2 support variations from 2 GB to 3 GB (/USERVA=)
Provides 3 GB per-process address space
Commonly used by database servers (for file mapping)
.EXE must have “large address space aware” flag in image header, or they’re limited to 2 GB (specify at link time or with imagecfg.exe from ResKit)
Chief “loser” in system space is file system cache
Better solution: address windowing extensions
Even better: 64-bit Windows
[Figure: 00000000–BFFFFFFF unique per process (= per appl.), accessible in user or kernel mode: .EXE code, globals, per-thread user mode stacks, .DLL code, process heaps; from C0000000, per process, accessible only in kernel mode: process page tables, hyperspace; up to FFFFFFFF, system wide, accessible only in kernel mode: Exec, kernel, HAL, drivers, etc.]
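The /3GB and /USERVA= options move the boundary between user and system space. A sketch of that arithmetic, treating the /USERVA= value as megabytes of user space and ignoring the small no-access regions near the boundary; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Last user-mode address for a given amount of user space in MB (2048-3072). */
uint32_t user_space_last_address(uint32_t user_mb) {
    assert(user_mb >= 2048 && user_mb <= 3072);
    return (uint32_t)((uint64_t)user_mb * 1024 * 1024 - 1);
}
```

2048 gives 0x7FFFFFFF, the default split; 3072 gives 0xBFFFFFFF, matching the 3 GB layout; 2560 gives 0x9FFFFFFF.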
Large Address Space Aware Images
Images marked as “large address space aware”:
Lsass.exe – Security Server
Inetinfo.exe – Internet Information Server
Chkdsk.exe – Check Disk utility
Dllhst3g.exe – special version of Dllhost.exe (for COM+ applications)
Esentutl.exe – Jet database repair tool
To see this, type:
Imagecfg \windows\system32\*.exe > large_images.txt
Then search for “large” in large_images.txt
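The “large address space aware” flag is the bit IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020) in the Characteristics field of the image’s COFF file header. The sketch below checks it in an in-memory PE image, relying only on the documented header layout: e_lfanew at offset 0x3C, then the PE\0\0 signature and a 20-byte file header whose last field is Characteristics. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020  /* value from winnt.h */

static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* 1 if the flag is set, 0 if not, -1 if buf does not look like a PE image. */
int is_large_address_aware(const unsigned char *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return -1;  /* DOS header */
    uint32_t pe = read_le32(buf + 0x3C);                          /* e_lfanew */
    if (pe > len || len - pe < 24) return -1;
    const unsigned char *sig = buf + pe;
    if (sig[0] != 'P' || sig[1] != 'E' || sig[2] != 0 || sig[3] != 0) return -1;
    /* Characteristics: last 2 bytes of the 20-byte file header after the signature. */
    uint16_t ch = (uint16_t)(sig[4 + 18] | sig[4 + 19] << 8);
    return (ch & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
}
```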
Large Address Space Aware on 64-bit Windows
32-bit images marked large address space aware get a full 4 GB process virtual address space
OS isn’t mapped there, so the space is available for the process
Physical Memory
Physical Memory Limits (in GB)
Edition | x86 | x64 32-bit | x64 64-bit | IA-64 64-bit
XP Home | 4 | 4 | n/a | n/a
XP Professional | 4 | 4 | 16 | n/a
Physical Memory Usage on Systems in PAE Mode
Virtual address space is still 4 GB, so how can you “use” > 4 GB of memory?
1. Although each process can only address 2 GB, many may be in memory at the same time (e.g. 5 * 2 GB processes = 10 GB)
2. Files in system cache remain in physical memory
Although file cache doesn’t know it, memory manager keeps unmapped data in physical memory
[Figure: 64 GB physical memory: system working set assigned to virtual cache (960 MB), standby list (~60 GB), other]
3. New Address Windowing Extensions allow Windows processes to use more than 2 GB of memory
Address Windowing Extensions
AWE functions allow Windows processes to allocate large amounts of physical memory and then map “windows” into that memory
Applications: database servers can cache large databases
Up to programmer to control AWE memory
Like DOS expanded memory (EMS) with more bits…
64-bits removes this need
[Figure: process virtual memory with windows mapped onto AWE memory in physical memory]
Windows Memory Allocation APIs
Windows API Memory Management Architecture
[Figure: Windows program, physical memory, disk & file system]
Windows Memory Management
Managing Heap Memory
LPVOID HeapAlloc( HANDLE hHeap,
DWORD dwFlags,
SIZE_T dwBytes );
dwFlags:
HEAP_GENERATE_EXCEPTIONS: raise an SEH exception on memory allocation failure
(STATUS_NO_MEMORY, STATUS_ACCESS_VIOLATION)
HEAP_NO_SERIALIZE: no serialization of concurrent (multithreaded) requests
HEAP_ZERO_MEMORY: initialize allocated memory to zero
dwBytes:
Size of the block of memory to allocate
For non-growable heaps: must be less than 0x7FFF8 (0.5 MB)
HeapFree(), HeapReAlloc(), HeapCompact(), HeapValidate()
HeapLock(), HeapUnlock(): manage concurrent accesses to the heap
Excerpt: Sorting with Binary Search Tree
#define NODE_HEAP_ISIZE 0x8000
__try {
    /* Open the input file. */
    hIn = CreateFile (fname, GENERIC_READ, 0, NULL,
                      OPEN_EXISTING, 0, NULL);
    if (hIn == INVALID_HANDLE_VALUE)
        fprintf(stderr, "Failed to open input file"), exit(1);
    /* Allocate the two heaps. */
    hNode = HeapCreate (
        HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE,
        NODE_HEAP_ISIZE, 0);
    hData = HeapCreate (
        HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE,
        DATA_HEAP_ISIZE, 0);
    /* Process the input file, creating the tree, actual search. */
    pRoot = FillTree (hIn, hNode, hData);
Heap Management Example (contd.)
    /* Display the tree in Key order. */
    printf ("Sorted file: %s\n", fname); Scan (pRoot);
Virtual Address Space Descriptors (VADs)
[Figure: bottom 2 GB reserved for app; 122880 bytes* reserved, PAGE_READWRITE]
Example: Committing Address Space
VirtualAlloc(lpMem + 6 * 1024, 7 * 1024, MEM_COMMIT, PAGE_READWRITE);
[Figure: bottom 2 GB reserved for app; 122880 bytes reserved, PAGE_READWRITE; 12KB* committed, PAGE_READWRITE]
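The 12 KB in the commit example follows from page rounding: committing a range commits every page that any byte of the range falls on. With 4 KB pages, bytes 6 KB through 13 KB touch the pages starting at 4 KB, 8 KB, and 12 KB. A minimal sketch of the arithmetic, with a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes committed when committing [offset, offset + size) with `page`-byte pages. */
uint64_t committed_bytes(uint64_t offset, uint64_t size, uint64_t page) {
    assert(page != 0 && size != 0);
    uint64_t first = offset / page * page;                    /* round start down */
    uint64_t end = (offset + size + page - 1) / page * page;  /* round end up */
    return end - first;
}
```

`committed_bytes(6 * 1024, 7 * 1024, 4096)` yields 12288 bytes, the 12 KB shown in the example.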
Memory-Mapped Files
File Mapping Object
HANDLE CreateFileMapping (HANDLE hFile,
LPSECURITY_ATTRIBUTES lpsa,
DWORD fdwProtect,
DWORD dwMaximumSizeHigh,
DWORD dwMaximumSizeLow,
LPCTSTR lpszMapName );
Parameters:
hFile:
handle to open file with access rights compatible with fdwProtect
hFile == INVALID_HANDLE_VALUE (0xFFFFFFFF): backed by the paging file, no need to create a separate file (a nonzero size must then be specified)
fdwProtect:
PAGE_READONLY, PAGE_READWRITE, PAGE_WRITECOPY
dwMaximumSizeHigh, dwMaximumSizeLow:
Zero: current file size is used
lpszMapName:
Name of mapping object for sharing between processes, or NULL
Shared Memory
HANDLE OpenFileMapping (DWORD dwDesiredAccess,
BOOL bInheritHandle,
LPCTSTR lpName );
Mapping Process Address Space to Mapping Objects
LPVOID MapViewOfFile( HANDLE hMapObject,
DWORD fdwAccess, DWORD dwOffsetHigh,
DWORD dwOffsetLow, SIZE_T cbMap );
BOOL UnmapViewOfFile ( LPCVOID lpBaseAddress );
UNIX: 4.3BSD/SysV.4 have the mmap() call; see also shmget(), shmctl(), shmat(), shmdt()
Example: File Conversion with Memory Mapping (Excerpt)
/* Open the input file. */
hIn = CreateFile (fIn, GENERIC_READ, 0, NULL,
                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hIn == INVALID_HANDLE_VALUE) fprintf(stderr, "Failure opening input file."), exit(1);
/* Create a file mapping object on the input file. Use the file size. */
hInMap = CreateFileMapping (hIn, NULL, PAGE_READONLY, 0, 0, NULL);
/* CreateFileMapping returns NULL (not INVALID_HANDLE_VALUE) on failure. */
if (hInMap == NULL) fprintf(stderr, "Failure creating input map."), exit(2);
Example (contd.)
__try {
    /* ... earlier part of the __try block (output file, maps, views) not shown ... */
    pIn = pInFile; /* actual file conversion */
    pOut = pOutFile;
    while (pIn < pInFile + FsLow) {
        *pOut = (WCHAR) *pIn; pIn++; pOut++;
    }
    Complete = TRUE;
}
__except (EXCEPTION_EXECUTE_HANDLER) {
    /* Delete the output file if the operation did not complete successfully. */
    if (!Complete)
        DeleteFile (fOut);
    return FALSE;
}
Memory Management APIs
Memory protection may be changed on a per-page basis (VirtualProtect)
PAGE_GUARD
PAGE_NOCACHE
Memory Management Information
VOID GetSystemInfo(LPSYSTEM_INFO lpSystemInfo);
Querying Address Space
SIZE_T VirtualQuery(LPCVOID lpAddress,
PMEMORY_BASIC_INFORMATION lpBuffer, SIZE_T dwLength);
Returns the number of bytes written to lpBuffer (0 on failure); the buffer receives:
typedef struct _MEMORY_BASIC_INFORMATION {
PVOID BaseAddress; // Block base
PVOID AllocationBase; // Region base
DWORD AllocationProtect;// Region prot
SIZE_T RegionSize; // # bytes in block
DWORD State; // State of block:
// MEM_RESERVE, MEM_COMMIT, MEM_FREE
DWORD Protect; // Pages prot
DWORD Type; // Type:
// MEM_IMAGE, MEM_MAPPED, MEM_PRIVATE
} MEMORY_BASIC_INFORMATION;
Memory Management Information
VOID GlobalMemoryStatus(LPMEMORYSTATUS lpms);
Further Reading
Source Code References