Windows Kernel Internals II

Processes, Threads, Virtual Memory
University of Tokyo – July 2004*

Dave Probert, Ph.D.


Advanced Operating Systems Group
Windows Core Operating Systems Division
Microsoft Corporation

© Microsoft Corporation 2004 1


Windows Architecture
[Figure: layered block diagram, top to bottom]
User-mode:
  Applications
  DLLs, System Services, Login/GINA
  Subsystem servers, Kernel32, Critical services, User32 / GDI
  ntdll / run-time library
Kernel-mode:
  Trap interface / LPC
  Security refmon, IO Manager, Virtual memory, Procs & threads, Win32 GUI
  Components below these: File filters, File systems, Volume mgrs, Device stacks, FS run-time, Cache mgr, Scheduler, exec synchr
  Object Manager / Configuration Management
  Kernel run-time / Hardware Adaptation Layer

Process
Container for an address space and threads
Associated User-mode Process Environment Block (PEB)
Primary Access Token
Quota, Debug port, Handle Table etc
Unique process ID
Queued to the Job, global process list and Session list
MM structures like the WorkingSet, VAD tree, AWE etc



Thread
Fundamental schedulable entity in the system
Represented by ETHREAD that includes a KTHREAD
Queued to the process (both E and K thread)
IRP list
Impersonation Access Token
Unique thread ID
Associated User-mode Thread Environment Block (TEB)
User-mode stack
Kernel-mode stack
Processor Control Block (in KTHREAD) for cpu state when not running

Job
Container for multiple processes
Queued to global job list, processes and jobs in the job set
Security token filters and job token
Completion ports
Counters, limits etc



Process/Thread structure
[Figure: block diagram. Labels: Any Handle Table, Object Manager, Process Object, Process’ Handle Table, Virtual Address Descriptors, and six Thread boxes. Object types shown: Files, Events, Devices, Drivers.]


KPROCESS fields
DISPATCHER_HEADER Header
ULONG_PTR DirectoryTableBase[2]
KGDTENTRY LdtDescriptor
KIDTENTRY Int21Descriptor
USHORT IopmOffset
UCHAR Iopl
volatile KAFFINITY ActiveProcessors
ULONG KernelTime
ULONG UserTime
LIST_ENTRY ReadyListHead
SINGLE_LIST_ENTRY SwapListEntry
LIST_ENTRY ThreadListHead
KSPIN_LOCK ProcessLock
KAFFINITY Affinity
USHORT StackCount
SCHAR BasePriority
SCHAR ThreadQuantum
BOOLEAN AutoAlignment
UCHAR State
BOOLEAN DisableBoost
UCHAR PowerState
BOOLEAN DisableQuantum
UCHAR IdealNode


EPROCESS fields
KPROCESS Pcb
EX_PUSH_LOCK ProcessLock
LARGE_INTEGER CreateTime
LARGE_INTEGER ExitTime
EX_RUNDOWN_REF RundownProtect
HANDLE UniqueProcessId
LIST_ENTRY ActiveProcessLinks
Quota Fields
SIZE_T PeakVirtualSize
SIZE_T VirtualSize
LIST_ENTRY SessionProcessLinks
PVOID DebugPort
PVOID ExceptionPort
PHANDLE_TABLE ObjectTable
EX_FAST_REF Token
PFN_NUMBER WorkingSetPage
KGUARDED_MUTEX AddressCreationLock
KSPIN_LOCK HyperSpaceLock
struct _ETHREAD *ForkInProgress
ULONG_PTR HardwareTrigger
PMM_AVL_TABLE PhysicalVadRoot
PVOID CloneRoot
PFN_NUMBER NumberOfPrivatePages
PFN_NUMBER NumberOfLockedPages
PVOID Win32Process
struct _EJOB *Job
PVOID SectionObject
PVOID SectionBaseAddress
PEPROCESS_QUOTA_BLOCK QuotaBlock

EPROCESS fields cont.
PPAGEFAULT_HISTORY WorkingSetWatch
HANDLE Win32WindowStation
HANDLE InheritedFromUniqueProcessId
PVOID LdtInformation
PVOID VadFreeHint
PVOID VdmObjects
PVOID DeviceMap
PVOID Session
UCHAR ImageFileName[ 16 ]
LIST_ENTRY JobLinks
PVOID LockedPagesList
LIST_ENTRY ThreadListHead
ULONG ActiveThreads
PPEB Peb
IO Counters
PVOID AweInfo
MMSUPPORT Vm
Process Flags
NTSTATUS ExitStatus
UCHAR PriorityClass
MM_AVL_TABLE VadRoot


KTHREAD fields
DISPATCHER_HEADER Header
LIST_ENTRY MutantListHead
PVOID InitialStack, StackLimit
PVOID KernelStack
KSPIN_LOCK ThreadLock
ULONG ContextSwitches
volatile UCHAR State
KIRQL WaitIrql
KPROC_MODE WaitMode
PVOID Teb
KAPC_STATE ApcState
KSPIN_LOCK ApcQueueLock
LONG_PTR WaitStatus
PRKWAIT_BLOCK WaitBlockList
BOOLEAN Alertable, WaitNext
UCHAR WaitReason
SCHAR Priority
UCHAR EnableStackSwap
volatile UCHAR SwapBusy
LIST_ENTRY WaitListEntry
NEXT SwapListEntry
PRKQUEUE Queue
ULONG WaitTime
SHORT KernelApcDisable
SHORT SpecialApcDisable
KTIMER Timer
KWAIT_BLOCK WaitBlock[N+1]
LIST_ENTRY QueueListEntry
UCHAR ApcStateIndex
BOOLEAN ApcQueueable
BOOLEAN Preempted
BOOLEAN ProcessReadyQueue
BOOLEAN KernelStackResident

KTHREAD fields cont.
UCHAR IdealProcessor
volatile UCHAR NextProcessor
SCHAR BasePriority
SCHAR PriorityDecrement
SCHAR Quantum
BOOLEAN SystemAffinityActive
CCHAR PreviousMode
UCHAR ResourceIndex
UCHAR DisableBoost
KAFFINITY UserAffinity
PKPROCESS Process
KAFFINITY Affinity
PVOID ServiceTable
PKAPC_STATE ApcStatePtr[2]
KAPC_STATE SavedApcState
PVOID CallbackStack
PVOID Win32Thread
PKTRAP_FRAME TrapFrame
ULONG KernelTime, UserTime
PVOID StackBase
KAPC SuspendApc
KSEMAPHORE SuspendSema
PVOID TlsArray
LIST_ENTRY ThreadListEntry
UCHAR LargeStack
UCHAR PowerState
UCHAR Iopl
CCHAR FreezeCnt, SuspendCnt
UCHAR UserIdealProc
volatile UCHAR DeferredProc
UCHAR AdjustReason
SCHAR AdjustIncrement

ETHREAD fields
KTHREAD tcb
Timestamps
LPC locks and links
CLIENT_ID Cid
ImpersonationInfo
IrpList
pProcess
StartAddress
Win32StartAddress
ThreadListEntry
RundownProtect
ThreadPushLock



Process Synchronization
ProcessLock – Protects thread list, token
RundownProtect – Cross process address space, image section and handle table references
Token, Prefetch – Uses fast referencing
Token, Job – Torn down at last process dereference without synchronization


[Figure: Kernel Thread Transition Diagram (thread scheduling states), 2003/04/06 v0.4b]
States: Initialized, Transition (k stack swapped), Deferred Ready, Ready (process swapped), Ready, Standby, Running, Waiting, Terminated
Transition labels:
  PspCreateThread, KeInitThread (Initialized)
  KiReadyThread, KiInsertDeferredReadyList
  KiRetireDpcList / KiSwapThread / KiExitDispatcher
  KiSetAffinityThread, KiSetPriorityThread
  KiProcessDeferredReadyList, KiDeferredReadyThread
  no avail. processor (to Ready); Idle processor or preemption (to Standby)
  KiSelectNextThread
  KiUnwaitThread, KiReadyThread (Affinity ok / Affinity not ok)
  KiQuantumEnd, KiIdleSchedule, KiSwapThread, KiExitDispatcher, NtYieldExecution
  preemption
  KeTerminateThread (to Terminated)

Thread scheduling states
• Main quasi-states:
– Ready – able to run
– Running – current thread on a processor
– Waiting – waiting on an event
• For scalability Ready is three real states:
– DeferredReady – queued on any processor
– Standby – will imminently start Running
– Ready – queued on target processor by priority
• Goal is granular locking of thread priority queues
• Red states (in the transition diagram) relate to swapped stacks and processes

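The split of the Ready quasi-state into three real states can be sketched as a small lookup. This is only an illustrative model: the state names follow the slides, but the numeric values and the helper quasi_state are assumptions, not the kernel's actual encoding.

```python
from enum import Enum


class ThreadState(Enum):
    # Names follow the slides; the numeric values are arbitrary
    # placeholders, not the kernel's real encoding.
    INITIALIZED = 0
    DEFERRED_READY = 1
    READY = 2
    STANDBY = 3
    RUNNING = 4
    WAITING = 5
    TRANSITION = 6
    TERMINATED = 7


# The three real states that together form the "Ready" quasi-state.
READY_QUASI_STATE = {
    ThreadState.DEFERRED_READY,
    ThreadState.READY,
    ThreadState.STANDBY,
}


def quasi_state(state):
    """Collapse a real state into 'Ready', 'Running' or 'Waiting'.

    Returns None for states outside the three main quasi-states
    (a modelling choice made for this sketch).
    """
    if state in READY_QUASI_STATE:
        return "Ready"
    if state is ThreadState.RUNNING:
        return "Running"
    if state is ThreadState.WAITING:
        return "Waiting"
    return None
```
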
Process Lifetime
Created as an empty shell
Address space created with only ntdll and the main image unless forked
Handle table created empty or populated via duplication from parent
Process is partially destroyed on last thread exit
Process totally destroyed on last dereference


Thread Lifetime
Created within a process with a CONTEXT record
Starts running in the kernel but has a trap frame to return to user mode
Kernel queues user APC to do ntdll initialization
Terminated by a thread calling NtTerminateThread/Process


Summary: Native NT Process APIs
NtCreateProcess() NtCreateThread()
NtTerminateProcess() NtTerminateThread()
NtQueryInformationProcess() NtSuspendThread()
NtSetInformationProcess() NtResumeThread()
NtGetNextProcess() NtGetContextThread()
NtGetNextThread() NtSetContextThread()
NtSuspendProcess() NtQueryInformationThread()
NtResumeProcess() NtSetInformationThread()
NtAlertThread()
NtQueueApcThread()



Virtual Memory Manager
Features
Provides 4 GB flat virtual address space (IA32)
Manages process address space
Handles pagefaults
Manages process working sets
Manages physical memory
Provides memory-mapped files
Allows pages shared between processes
Facilities for I/O subsystem and device drivers
Supports file system cache manager



Virtual Memory Manager
NT Internal APIs
NtCreatePagingFile
NtAllocateVirtualMemory (Proc, Addr, Size, Type, Prot)
Process: handle to a process
Protection: NOACCESS, EXECUTE, READONLY, READWRITE,
NOCACHE
Flags: COMMIT, RESERVE, PHYSICAL, TOP_DOWN, RESET,
LARGE_PAGES, WRITE_WATCH
NtFreeVirtualMemory(Process, Address, Size, FreeType)
FreeType: DECOMMIT or RELEASE
NtQueryVirtualMemory
NtProtectVirtualMemory



Virtual Memory Manager
NT Internal APIs

Pagefault

NtLockVirtualMemory, NtUnlockVirtualMemory
– locks a region of pages within the working set list
– requires PROCESS_VM_OPERATION on target
process and SeLockMemoryPrivilege
NtReadVirtualMemory, NtWriteVirtualMemory (Proc, Addr, Buffer, Size)
NtFlushVirtualMemory

Virtual Memory Manager
NT Internal APIs
NtCreateSection
– creates a section but does not map it
NtOpenSection
– opens an existing section
NtQuerySection
– query attributes for section
NtExtendSection
NtMapViewOfSection (Sect, Proc, Addr, Size, …)
NtUnmapViewOfSection



Virtual Memory Manager
NT Internal APIs
APIs to support AWE (Address Windowing Extensions)
– Private memory only
– Map only in current process
– Requires LOCK_VM privilege
NtAllocateUserPhysicalPages (Proc, NPages, &PFNs[])
NtMapUserPhysicalPages (Addr, NPages, PFNs[])
NtMapUserPhysicalPagesScatter
NtFreeUserPhysicalPages (Proc, &NPages, PFNs[])

NtResetWriteWatch
NtGetWriteWatch
Read out dirty bits for a section of memory since last reset

Allocating kernel memory (pool)
• Tightest x86 system resource is KVA (Kernel Virtual Address space)
• Pool allocates in small chunks:
– < 4KB: 8B granularity
– >= 4KB: page granularity
• Paged and Non-paged pool
– Paged pool backed by pagefile
• Special pool used to find corruptors
• Lots of support for debugging/diagnosis
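The two granularities can be illustrated with a rounding helper. This is a rough sketch of the rule stated on the slide only: pool_allocation_size is a hypothetical name, a 4096-byte page is assumed, and per-allocation header overhead in the real allocator is ignored.

```python
PAGE_SIZE = 4096          # assumed x86 page size
SMALL_GRANULARITY = 8     # granularity for requests below one page


def pool_allocation_size(requested):
    """Round a request up to the granularity the slide describes.

    Requests under 4KB round up to a multiple of 8 bytes; requests of
    4KB or more round up to whole pages. Header overhead is ignored.
    """
    if requested <= 0:
        raise ValueError("requested size must be positive")
    unit = SMALL_GRANULARITY if requested < PAGE_SIZE else PAGE_SIZE
    return (requested + unit - 1) // unit * unit
```

For example, a 13-byte request occupies 16 bytes under this rule, while a request one byte over a page occupies two pages.
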


[x86 system virtual address space, by start address]
80000000  System code, initial non-paged pool
A0000000  Session space (win32k.sys)
A4000000  Sysptes overflow, cache overflow
C0000000  Page directory self-map and page tables
C0400000  Hyperspace (e.g. working set list)
C0800000  Unused – no access
C0C00000  System working set list
C1000000  System cache
E1000000  Paged pool
E8000000  Reusable system VA (sysptes)
          Non-paged pool expansion
FFBE0000  Crash dump information
FFC00000  HAL usage

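As a reading aid, the layout can be turned into a lookup that names the region containing a given address. This is a sketch under assumptions: the boundaries are copied from the table, region_for is a hypothetical helper, and the non-paged pool expansion area is folded into the E8000000 region because the slide gives no start address for it.

```python
from bisect import bisect_right

# (start address, contents) pairs copied from the layout slide.
REGIONS = [
    (0x80000000, "System code, initial non-paged pool"),
    (0xA0000000, "Session space (win32k.sys)"),
    (0xA4000000, "Sysptes overflow, cache overflow"),
    (0xC0000000, "Page directory self-map and page tables"),
    (0xC0400000, "Hyperspace (e.g. working set list)"),
    (0xC0800000, "Unused - no access"),
    (0xC0C00000, "System working set list"),
    (0xC1000000, "System cache"),
    (0xE1000000, "Paged pool"),
    (0xE8000000, "Reusable system VA (sysptes), non-paged pool expansion"),
    (0xFFBE0000, "Crash dump information"),
    (0xFFC00000, "HAL usage"),
]
STARTS = [start for start, _ in REGIONS]


def region_for(va):
    """Return the name of the system region containing va, or None
    if va lies below the first region in the table."""
    if not 0 <= va <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit virtual address")
    index = bisect_right(STARTS, va) - 1
    return None if index < 0 else REGIONS[index][1]
```
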
Valid x86 Hardware PTEs
Bit(s)  Field
31-12   Pageframe
11-9    Reserved (R)
8       Global (G)
7       Reserved (R)
6       Dirty (D)
5       Accessed (A)
4       Cache disabled (Cd)
3       Write through (Wt)
2       Owner (O)
1       Write (W)
0       Valid (always 1)

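The bit assignments above can be exercised with a small decoder. This is a minimal sketch that follows the bit positions on the slide; decode_valid_pte and the dictionary keys are illustrative names, not kernel identifiers.

```python
def decode_valid_pte(pte):
    """Decode a valid 32-bit x86 hardware PTE into its fields."""
    if not pte & 1:
        raise ValueError("bit 0 is clear: not a valid hardware PTE")
    return {
        "page_frame": (pte >> 12) & 0xFFFFF,     # bits 31-12
        "global": bool(pte & (1 << 8)),
        "dirty": bool(pte & (1 << 6)),
        "accessed": bool(pte & (1 << 5)),
        "cache_disabled": bool(pte & (1 << 4)),
        "write_through": bool(pte & (1 << 3)),
        "owner": bool(pte & (1 << 2)),
        "write": bool(pte & (1 << 1)),
    }
```

For instance, the made-up value 0x12345067 decodes to page frame 0x12345 with the dirty, accessed, owner and write bits set.
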


Virtual Address Translation

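[Figure: CR3 points to the page directory (PD, 1024 PDEs); a PDE points to a page table (PT, 1024 PTEs); a PTE points to a page (4096 bytes) holding the DATA.]
32-bit virtual address: 0000 0000 00 | 00 0000 0000 | 0000 0000 0000
(PD index | PT index | byte offset)
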
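The 1024 x 1024 x 4096 structure corresponds to a 10/10/12-bit split of the virtual address. A minimal sketch of that split, with split_va as a hypothetical helper name:

```python
def split_va(va):
    """Split a 32-bit virtual address into
    (page directory index, page table index, byte offset)."""
    pd_index = (va >> 22) & 0x3FF   # top 10 bits select one of 1024 PDEs
    pt_index = (va >> 12) & 0x3FF   # next 10 bits select one of 1024 PTEs
    offset = va & 0xFFF             # low 12 bits select a byte in the page
    return pd_index, pt_index, offset
```
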
Self-mapping page tables
• Page Table Entries (PTEs) and Page Directory Entries (PDEs) contain Physical Frame Numbers (PFNs)
– But Kernel runs with Virtual Addresses
• To access PDE/PTE from kernel use the self-map for the current process:
PageDirectory[0x300] uses PageDirectory as PageTable
– GetPdeAddress(va): 0xc0300000[va>>20]
– GetPteAddress(va): 0xc0000000[va>>10]
– (the shifted value is a byte offset, with its low two bits cleared)
• PDE/PTE formats are compatible!
• Access another process VA via thread ‘attach’

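The two address formulas can be written out explicitly. This is a sketch rather than the kernel's macros: the function names are adapted from the slide, the base addresses are the slide's, and the shift-then-scale form is one way to express "byte offset with the low two bits cleared" for 4-byte entries.

```python
PTE_BASE = 0xC0000000   # start of the self-mapped page tables
PDE_BASE = 0xC0300000   # the page directory as seen through the self-map


def get_pde_address(va):
    """Virtual address of the PDE that maps va (4-byte entries)."""
    return PDE_BASE + ((va >> 22) << 2)


def get_pte_address(va):
    """Virtual address of the PTE that maps va (4-byte entries)."""
    return PTE_BASE + ((va >> 12) << 2)
```

For va 0xe4321000 this gives a PTE address of 0xc0390c84, and for va 0xc0300000 both functions give 0xc0300c00, the self-map entry itself.
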
Self-mapping page tables
Virtual Access to PageDirectory[0x300]
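[Figure: CR3 points to the PD; PD entry 0x300 refers back to the PD, which then also serves as the PT holding the PTE.]
Phys: PD[0xc0300000>>22] = PD
Virt: *(0xc0300c00) == PD
0xc0300c00 = 1100 0000 00 | 11 0000 0000 | 1100 0000 0000
(PD index 0x300 | PT index 0x300 | byte offset 0xc00)
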
Self-mapping page tables
Virtual Access to PTE for va 0xe4321000

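GetPteAddress: 0xe4321000 => 0xc0390c84
[Figure: CR3 points to the PD; PD entry 0x300 (the self-map) selects the PD as page table; its entry 0x390 selects the PT for the va; entry 0x321 of that PT is the PTE.]
0xc0390c84 = 1100 0000 00 | 11 1001 0000 | 1100 1000 0100
(PD index 0x300 | PT index 0x390 | byte offset 0xc84 = 0x321 * 4)
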
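To see why the self-map works, the two-level walk can be simulated on a toy physical memory. Everything here is an assumption made for illustration: the frame numbers 0x100 to 0x102, the ToyMemory class and the translate helper are invented, and entries carry only a frame number and a valid bit.

```python
SELF_MAP_INDEX = 0x300


class ToyMemory:
    """Physical memory modelled as frame number -> 1024 32-bit entries."""

    def __init__(self):
        self.frames = {}

    def alloc(self, pfn):
        self.frames[pfn] = [0] * 1024
        return pfn


def make_entry(pfn):
    """Build a PDE/PTE holding a frame number with the valid bit set."""
    return (pfn << 12) | 1


def translate(mem, cr3_pfn, va):
    """Two-level walk; returns the physical address or None if unmapped."""
    pde = mem.frames[cr3_pfn][(va >> 22) & 0x3FF]
    if not pde & 1:
        return None
    pte = mem.frames[pde >> 12][(va >> 12) & 0x3FF]
    if not pte & 1:
        return None
    return ((pte >> 12) << 12) | (va & 0xFFF)


mem = ToyMemory()
PD = mem.alloc(0x100)     # hypothetical frame holding the page directory
PT = mem.alloc(0x101)     # hypothetical frame holding one page table
DATA = mem.alloc(0x102)   # hypothetical data frame

mem.frames[PD][SELF_MAP_INDEX] = make_entry(PD)  # the self-map entry
mem.frames[PD][0x390] = make_entry(PT)           # PDE covering 0xe4321000
mem.frames[PT][0x321] = make_entry(DATA)         # PTE for 0xe4321000

# Walking the PTE's virtual address lands on the PTE's physical location.
pte_phys = translate(mem, PD, 0xC0390C84)
pde_self_phys = translate(mem, PD, 0xC0300C00)
```

Because entry 0x300 of the page directory points at the directory itself, the first level of the walk stays in the PD and the second level reads a PDE as if it were a PTE, which only works because the two formats are compatible.
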
x86 Invalid PTEs
Page file PTE:
Bits 31-12  Page file offset
Bit 11      Transition = 0
Bit 10      Prototype = 0
Bits 9-5    Protection
Bits 4-1    PFN (page file number)
Bit 0       0 (invalid)

Transition PTE:
Bits 31-12  Page file offset
Bit 11      Transition = 1
Bit 10      Prototype = 0
Bits 9-5    Protection
Bits 4-1    HW ctrl (Cache disable, Write through, Owner, Write)
Bit 0       0 (invalid)

x86 Invalid PTEs

Demand zero: Page file PTE with zero offset and PFN

Unknown: PTE is completely zero or Page Table doesn’t exist yet. Examine VADs.

Pointer to Prototype PTE

pPte bits 7-27 pPte bits 0-6 0
31 12 11 10 9 8 7 5 4 1 0
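The invalid PTE formats can be told apart by testing a few bits in order. This is a sketch under assumptions drawn from the slides: bit 0 is the valid bit, bit 10 the prototype flag, bit 11 the transition flag, bits 4-1 the page file number and bits 31-12 the page file offset. classify_pte is a hypothetical helper, not a kernel routine.

```python
def classify_pte(pte):
    """Classify a 32-bit x86 PTE using the bit positions on the slides."""
    if pte & 1:
        return "valid"
    if pte == 0:
        return "unknown"           # consult the VADs
    if pte & (1 << 10):
        return "prototype"         # pointer to a prototype PTE
    if pte & (1 << 11):
        return "transition"
    offset = pte >> 12
    page_file_number = (pte >> 1) & 0xF
    if offset == 0 and page_file_number == 0:
        return "demand zero"       # only protection bits are set
    return "page file"
```

The prototype flag is tested before the transition flag because, in the prototype format, bit 11 is part of the stored pointer and may be set.
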


Prototype PTEs

• Kept in array in the segment structure associated with section objects
• Six PTE states:
– Active/valid
– Transition
– Modified-no-write
– Demand zero
– Page file
– Mapped file



Physical Memory Management

[Figure: Physical Page State Changes]
Process/System Working Set
– Trim Clean: to Standby List
– Trim Dirty: to Modified List
– Soft Fault: from Standby List or Modified List back to the working set
– Delete Page: to Free List
Modified Page-writer: Modified List to Standby List
Free List, Zero Thread, Zero List
Labels on the remaining arrows: MM Low Memory, Hardfault (DISK), Zerofault (FILL)


Paging Overview
Working Sets: list of valid pages for each process (and the kernel)
Pages ‘trimmed’ from a working set go onto lists:
Standby list: pages backed by disk
Modified list: dirty pages to push to disk
Free list: pages not associated with disk
Zero list: supply of demand-zero pages
Modified/standby pages can be faulted back into a working set w/o disk activity (soft fault)
Background system threads trim working sets, write modified pages and produce zero pages based on memory state and config parameters

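The list transitions can be mimicked with a toy model. This is a deliberately simplified sketch: PageLists and its method names are invented, pages are plain labels, and a hard fault is modelled only as "the page was on neither the standby nor the modified list", without taking a frame from the free or zero list.

```python
class PageLists:
    """Toy model of a working set plus the standby/modified/free/zero lists."""

    def __init__(self):
        self.working_set = {}   # page -> dirty flag
        self.standby = set()
        self.modified = set()
        self.free = set()
        self.zero = set()

    def trim(self, page):
        """Remove a page from the working set onto the matching list."""
        dirty = self.working_set.pop(page)
        (self.modified if dirty else self.standby).add(page)

    def write_modified(self):
        """Modified page writer: written pages become standby pages."""
        self.standby |= self.modified
        self.modified.clear()

    def zero_free_pages(self):
        """Zero thread: turn free pages into zeroed pages."""
        self.zero |= self.free
        self.free.clear()

    def fault(self, page):
        """Fault a page into the working set; report 'soft' or 'hard'."""
        if page in self.standby:
            self.standby.remove(page)
            self.working_set[page] = False
            return "soft"
        if page in self.modified:
            self.modified.remove(page)
            self.working_set[page] = True   # still dirty
            return "soft"
        self.working_set[page] = False
        return "hard"
```
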
Managing Working Sets
Aging pages: Increment age counts for pages which haven't been accessed
Estimate unused pages: count in working set and keep a global count of the estimate
When getting tight on memory: replace rather than add pages when a fault occurs in a working set with significant unused pages
When memory is tight: reduce (trim) working sets which are above their maximum
Balance Set Manager: periodically runs Working Set Trimmer, also swaps out kernel stacks of long-waiting threads

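The aging and trimming steps can be sketched with a dictionary of age counts. This is a simplified illustration, not the Balance Set Manager's algorithm: the function names, the reset-to-zero on access, the age threshold of 3 and the oldest-first trimming order are all assumptions.

```python
def age_pages(ages, accessed):
    """One aging pass: pages not accessed get older, accessed pages reset.

    The reset to zero on access is an assumption of this sketch.
    """
    return {page: 0 if page in accessed else age + 1
            for page, age in ages.items()}


def estimate_unused(ages, threshold=3):
    """Count pages whose age has reached a (hypothetical) threshold."""
    return sum(1 for age in ages.values() if age >= threshold)


def trim(ages, maximum):
    """Trim a working set down to its maximum, dropping the oldest pages."""
    youngest_first = sorted(ages, key=lambda page: ages[page])
    return {page: ages[page] for page in youngest_first[:maximum]}
```

A short run: three pages start at age 0, page "a" is touched on every pass and page "b" on one of three passes, so after three passes "c" has age 3 and is the first candidate to be trimmed.
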
Discussion
