The Windows Operating System
Goals
Hardware-portable
Used to support MIPS, PowerPC and Alpha
Currently supports x86, ia64, and amd64
Multiple vendors build hardware
Software-portable
POSIX, OS/2, and Win32 subsystems
OS/2 is dead
POSIX is still supported, as a separate product
Lots of Win32 software out there in the world
Goals
High performance
Anticipated PC speeds approaching minicomputers and mainframes
Async IO model is standard
Support for large physical memories
SMP was an early design goal
Designed to support multi-threaded processes
Kernel has to be reentrant
Process Model
Threads and processes are distinct
Process:
Address space
Handle table (Handles => file descriptors)
Process default security token
Thread:
Execution context
Optional thread-specific security token
Tokens
Who you are: a list of identities
Each identity is a SID
Also contains Privileges
Shutdown, Load drivers, Backup, Debug
Can be passed through LPC ports and named pipe requests
Server side can use this to selectively impersonate the client.
Object Manager
Uniform interface to kernel mode objects
Handles are 32-bit opaque integers
Per-process handle table maps handles to objects and permissions on the objects
Implements refcount GC
Pointer count: total number of references
Handle count: number of open handles
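The two counts can be illustrated with a small model. This is only a sketch: ModelObject and the model_* functions are hypothetical names, not the kernel's object header or API, and it assumes that every open handle also holds one pointer reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an object header; not the real kernel layout. */
typedef struct {
    int  pointer_count;  /* every reference, including those held by handles */
    int  handle_count;   /* open handles only */
    bool destroyed;      /* set once the last reference goes away */
} ModelObject;

/* The creator starts out holding one pointer reference and no handle. */
void model_create(ModelObject *o) {
    o->pointer_count = 1;
    o->handle_count = 0;
    o->destroyed = false;
}

void model_reference(ModelObject *o) {
    assert(!o->destroyed);
    o->pointer_count++;
}

/* Dropping the last reference destroys the object. */
void model_dereference(ModelObject *o) {
    assert(o->pointer_count > 0);
    if (--o->pointer_count == 0) {
        o->destroyed = true;
    }
}

/* Assumption: opening a handle bumps both counts. */
void model_open_handle(ModelObject *o) {
    model_reference(o);
    o->handle_count++;
}

void model_close_handle(ModelObject *o) {
    assert(o->handle_count > 0);
    o->handle_count--;
    model_dereference(o);
}
```

In this model an object outlives its creator's reference for as long as a handle is open, and goes away when the last handle closes.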
Object Manager
Implements an object namespace
Win32 objects are under \BaseNamedObjects
Devices under \Device
This includes filesystems
Drive letters are symbolic links
\??\C: => the appropriate filesystem device
Some things have other names
Processes and threads are opened by specifying a CID: (Process.Thread)
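Treating a drive letter as a symbolic link amounts to a prefix substitution in the object namespace. The sketch below models that step only. The table contents are assumptions: \Device\HarddiskVolume1 is a made-up target, and model_resolve is a hypothetical helper, not an Object Manager routine.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical symbolic-link table: link name -> target device path. */
typedef struct {
    const char *link;
    const char *target;
} ModelSymlink;

const ModelSymlink model_links[] = {
    { "\\??\\C:", "\\Device\\HarddiskVolume1" },  /* assumed target */
};

/* Rewrites a leading symbolic link in `path` into `out`.
 * Paths that start with no known link are copied unchanged.
 * Returns 1 on success, 0 if `out` is too small. */
int model_resolve(const char *path, char *out, size_t out_size) {
    size_t count = sizeof model_links / sizeof model_links[0];
    int written;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(model_links[i].link);
        /* Match only a whole path component, not a longer name. */
        if (strncmp(path, model_links[i].link, n) == 0 &&
            (path[n] == '\0' || path[n] == '\\')) {
            written = snprintf(out, out_size, "%s%s",
                               model_links[i].target, path + n);
            return written >= 0 && (size_t)written < out_size;
        }
    }
    written = snprintf(out, out_size, "%s", path);
    return written >= 0 && (size_t)written < out_size;
}
```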
Standard operations on handles
CloseHandle()
DuplicateHandle()
Takes source and destination process
Very useful for servers
WaitForSingleObject(), WaitForMultipleObjects()
Wait for something to happen
Can wait on up to 64 handles at once
Security Descriptors
Each object has a Security Descriptor
Owner: special SID, CREATOR_OWNER
Group: special SID, CREATOR_GROUP
DACL
Discretionary Access Control List
List of SIDs and granted or denied access rights
SACL
System Access Control List
List of SIDs and access rights to be audited
Access Rights
typedef struct _ACCESS_MASK {
    USHORT SpecificRights;
    UCHAR StandardRights;
    UCHAR AccessSystemAcl : 1;
    UCHAR Reserved : 3;
    UCHAR GenericAll : 1;
    UCHAR GenericExecute : 1;
    UCHAR GenericWrite : 1;
    UCHAR GenericRead : 1;
} ACCESS_MASK;
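In practice the mask is usually handled as a 32-bit integer rather than through bitfields. A minimal sketch of the same layout with shifts and masks follows. The MODEL_* constants and model_* helpers are illustrative names; their values follow the field order of the struct above (low 16 bits specific, next 8 standard, top 4 generic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions implied by the struct layout, lowest field first. */
#define MODEL_ACCESS_SYSTEM_ACL 0x01000000u
#define MODEL_GENERIC_ALL       0x10000000u
#define MODEL_GENERIC_EXECUTE   0x20000000u
#define MODEL_GENERIC_WRITE     0x40000000u
#define MODEL_GENERIC_READ      0x80000000u

/* Object-type-specific rights live in the low 16 bits. */
uint16_t model_specific_rights(uint32_t mask) {
    return (uint16_t)(mask & 0xFFFFu);
}

/* Standard rights occupy the next 8 bits. */
uint8_t model_standard_rights(uint32_t mask) {
    return (uint8_t)((mask >> 16) & 0xFFu);
}

/* True when every bit in `bits` is present in `mask`. */
bool model_has_rights(uint32_t mask, uint32_t bits) {
    return (mask & bits) == bits;
}
```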
Security Use
Objects are referred to via handles
Security checks occur when an object is opened
Open requests contain a mask of requested access rights
If granted to the token by the DACL, the handle contains those access rights
Access rights are checked on use
Just a bit test: very fast
Object Open
evt = OpenEvent(EVENT_MODIFY_STATE, FALSE, "SomeName");
Finds the event object by name
Walks the DACL, looking for token SIDs
Keeps looking until all requested permissions are granted
If access is granted, inserts a handle to the object into the process's handle table, with EVENT_MODIFY_STATE access
Object Use
SetEvent(evt);
SetEvent() requires EVENT_MODIFY_STATE access, and an event object. The kernel looks up the handle in the process's handle table. It checks that the handle maps to an event object, and that the granted access bits contain the EVENT_MODIFY_STATE bit. If all is good, the event is set.
Object Use
WaitForSingleObject(evt, INFINITE);
WaitForSingleObject() requires a synchronization object (like an event) and SYNCHRONIZE access. evt maps to an event object, but SYNCHRONIZE access was not requested when the handle was opened. Even if the DACL permits it, the wait fails.
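The open-time walk and the use-time bit test from the last three slides can be modeled together. This is a simplified sketch, not the kernel's access check: SIDs are plain integers, ModelAce and ModelHandle are hypothetical types, and the two MODEL_* values mirror EVENT_MODIFY_STATE and SYNCHRONIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EVENT_MODIFY_STATE 0x00000002u
#define MODEL_SYNCHRONIZE        0x00100000u

/* One DACL entry: allow or deny `mask` for `sid`. */
typedef struct {
    bool     deny;
    int      sid;
    uint32_t mask;
} ModelAce;

/* A handle remembers only what was granted at open time. */
typedef struct {
    bool     valid;
    uint32_t granted;
} ModelHandle;

bool model_token_has_sid(const int *sids, size_t n_sids, int sid) {
    for (size_t i = 0; i < n_sids; i++) {
        if (sids[i] == sid) return true;
    }
    return false;
}

/* Open: walk the DACL in order until every requested bit is granted. */
ModelHandle model_open(const ModelAce *dacl, size_t n_aces,
                       const int *sids, size_t n_sids, uint32_t desired) {
    ModelHandle h = { false, 0 };
    uint32_t remaining = desired;
    for (size_t i = 0; i < n_aces && remaining != 0; i++) {
        if (!model_token_has_sid(sids, n_sids, dacl[i].sid)) continue;
        if (dacl[i].deny) {
            if (dacl[i].mask & remaining) return h;  /* explicit deny */
        } else {
            remaining &= ~dacl[i].mask;              /* these bits granted */
        }
    }
    if (remaining != 0) return h;  /* some requested right never granted */
    h.valid = true;
    h.granted = desired;
    return h;
}

/* Use: a bit test against the handle, with no DACL walk. */
bool model_check_use(ModelHandle h, uint32_t required) {
    return h.valid && (h.granted & required) == required;
}
```

A handle opened for MODEL_EVENT_MODIFY_STATE passes the set-event check but fails the wait check, even when the DACL would have allowed MODEL_SYNCHRONIZE had it been requested.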
Types of Objects
Events
State is set or clear.
Can clear when a wait completes (auto-reset)
Mutexes
Can be acquired by a single thread at a time.
Automatically release when owner exits.
Semaphores
Maintain a count
Waits decrement the count
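The wait semantics of events and semaphores can be sketched with non-blocking "try wait" helpers, which behave like a wait with a zero timeout. ModelEvent, ModelSemaphore and the model_* functions are hypothetical; real waits block inside the kernel scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Event: a flag that an auto-reset wait clears on success. */
typedef struct {
    bool signaled;
    bool auto_reset;
} ModelEvent;

/* Returns true if the wait is satisfied. */
bool model_event_try_wait(ModelEvent *e) {
    if (!e->signaled) return false;
    if (e->auto_reset) e->signaled = false;  /* clears when the wait completes */
    return true;
}

/* Semaphore: a count that each successful wait decrements. */
typedef struct {
    int count;
} ModelSemaphore;

bool model_semaphore_try_wait(ModelSemaphore *s) {
    if (s->count <= 0) return false;
    s->count--;
    return true;
}
```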
More objects
Threads, Processes, Timers: like events
Registry Keys
Manipulate data in the registry, a centralized store of system configuration info.
LPC Ports
Fast local RPC
Security tokens can transfer over LPC calls
Files
Files & IO
File objects maintain a current offset, and a pointer to the underlying stream.
Default internal model is asynchronous
Synchronous IO just waits for the IO to complete
Async IO can set an event, or run a callback in the thread which queued the IO, or post a message to an IO completion port.
Each request is an IRP
IRPs
Maintain state of IO requests, independent of the thread working on the IO
IRPs are handed off through the device stack to their destinations
Threads process IRPs
Initiating thread processes the IRP until a device returns STATUS_PENDING
Subsequent processing can be done in kernel worker threads
Interrupts
IRQL: Interrupt Request Level
0 => PASSIVE_LEVEL
Processor is running threads
All usermode code is at IRQL 0
1 => APC_LEVEL; threads, APCs disabled
2 => DISPATCH_LEVEL
Running as the processor: can't stop!
Can't take a page fault
Only locks available are KSPIN_LOCKs
Interrupts
3-26 => Device Interrupt Service Routines
Device interrupts are mapped to an IRQL and an interrupt service routine; ISR is called at that IRQL
27 => PROFILE_LEVEL: profiling
28 => CLOCK2_LEVEL: clock interrupt
29 => IPI_LEVEL: interprocessor interrupt
Requests another processor to do something
30 => POWER_LEVEL: power failure
31 => HIGH_LEVEL: interrupts disabled
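The IRQL table reduces to two comparisons. The sketch below uses the numeric levels listed on these slides. The ModelIrql enum and model_* helpers are hypothetical names, and the preemption rule (only interrupts above the current IRQL are taken) is the usual reading of the scheme, stated here as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Levels as listed on the slides. */
typedef enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_APC_LEVEL      = 1,
    MODEL_DISPATCH_LEVEL = 2,
    MODEL_DEVICE_FIRST   = 3,   /* 3-26: device ISRs */
    MODEL_DEVICE_LAST    = 26,
    MODEL_PROFILE_LEVEL  = 27,
    MODEL_CLOCK2_LEVEL   = 28,
    MODEL_IPI_LEVEL      = 29,
    MODEL_POWER_LEVEL    = 30,
    MODEL_HIGH_LEVEL     = 31
} ModelIrql;

/* Page faults can only be serviced below DISPATCH_LEVEL. */
bool model_may_page_fault(ModelIrql irql) {
    return irql < MODEL_DISPATCH_LEVEL;
}

/* Assumption: an interrupt is taken only if its IRQL is above the current one. */
bool model_interrupt_is_taken(ModelIrql current, ModelIrql incoming) {
    return incoming > current;
}
```

This is why an ISR has to be short: while it runs at a device IRQL, every interrupt at that level or below waits.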
Interrupts
Hardware signals an interrupt
The interrupt's ISR runs at device IRQL
Has to be fast; get off the processor and allow other ISRs to run
Typically queues a DPC, acknowledges the interrupt, and returns
DPC: Deferred Procedure Call
Further processing at DISPATCH_LEVEL
Queues work to kernel worker threads
IO Completion
Driver calls IO Manager to complete the IRP
IO Manager queues a kernel mode APC to the initiating thread
APC: Asynchronous Procedure Call
Kernel mode APC preempts thread execution
Writes data back to user mode in the context of the thread which initiated the IO
Signals completion of the IO
IO Cache
Classic: block cache
Page mappings translate directly to blocks on the underlying partition.
Windows: stream cache
Page mappings are offsets within a stream.
IO Cache Manager uses the same mappings.
All cache management (trimming) is centralized in the memory manager
All modifications show up in mapped views.
Virtual Memory
Sections: another object type
Can be created to map a file
Can also be created off the pagefile
Optionally named, for shared memory
Reservation
Range of VA which will not be handed out for some other purpose
Committed
VA which actually maps to something
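The difference between reserving and committing can be shown with a toy address space of 16 pages. This is a model only: ModelAddressSpace and the model_* functions are hypothetical, and the rule that a page must be reserved before it is committed is a simplification of the real interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 16

typedef enum { MODEL_FREE, MODEL_RESERVED, MODEL_COMMITTED } ModelPageState;

typedef struct {
    ModelPageState page[MODEL_PAGES];
} ModelAddressSpace;

/* True when [first, first + count) is a non-empty range inside the space. */
bool model_range_ok(size_t first, size_t count) {
    return count > 0 && first < MODEL_PAGES && count <= MODEL_PAGES - first;
}

/* Reserve: claim free VA so it is not handed out for anything else. */
bool model_reserve(ModelAddressSpace *as, size_t first, size_t count) {
    if (!model_range_ok(first, count)) return false;
    for (size_t i = first; i < first + count; i++) {
        if (as->page[i] != MODEL_FREE) return false;
    }
    for (size_t i = first; i < first + count; i++) {
        as->page[i] = MODEL_RESERVED;
    }
    return true;
}

/* Commit: back previously reserved VA with something. */
bool model_commit(ModelAddressSpace *as, size_t first, size_t count) {
    if (!model_range_ok(first, count)) return false;
    for (size_t i = first; i < first + count; i++) {
        if (as->page[i] == MODEL_FREE) return false;
    }
    for (size_t i = first; i < first + count; i++) {
        as->page[i] = MODEL_COMMITTED;
    }
    return true;
}

/* Only committed pages can be touched without a fault. */
bool model_can_access(const ModelAddressSpace *as, size_t page) {
    return page < MODEL_PAGES && as->page[page] == MODEL_COMMITTED;
}
```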
Aside: CreateProcess
Just a user mode Win32 API
{
    NtCreateFile(&file, szImage);
    NtCreateSection(&sec, file);
    NtCreateProcess(&proc, sec);
    NtCreateThread(&thrd, proc);
}
WaitForSingleObject(proc);
Virtual Memory
Memory Manager maintains processor-specific page table entry mappings.
Some parts of the address space are shared between processes, for instance the kernel's address space and the per-session space.
On a page fault, mm reads in the data
Pages can be mapped without the appropriate access: what to do?
Signals
With threads, signals don't work very well.
Some software designs expect to touch inaccessible memory.
Large structured files
Concurrent garbage collection
SLists
Single global handler has to somehow know about all possible situations.
Structured Exception Handling
Exceptions unwind the stack
Almost like C++!
C++ matches against a type hierarchy
SEH calls exception filter code: filters are Turing-complete.
Two ways to deal with exceptions:
try/finally
try/except
try/finally
res = AllocateSomeResource();
try {
    SomeOperation(res);
} finally {
    if (AbnormalTermination()) {
        FreeSomeResource(res);
    }
}
return res;
try/except
try {
    SomeOperationWhichMayAV();
} except (Filter(GetExceptionCode(), GetExceptionInformation())) {
    DoSomethingElse();
}
try/except
GetExceptionCode()
A code indicating the cause of the exception
GetExceptionInformation()
Additional code-specific info
The full processor context
Filter decides what to do
EXCEPTION_EXECUTE_HANDLER
EXCEPTION_CONTINUE_SEARCH
EXCEPTION_CONTINUE_EXECUTION
Structured Exception Handling
On x86, TEB points to stack of EXCEPTION_REGISTRATION_RECORD
auto structs, pointing to handler code
pushed by function prolog
popped by function epilog
On exception, RtlDispatchException() walks the list.
Runs the filters to figure out what to do
Calls handler functions
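The list walk and the three filter results can be modeled in portable C. This is a sketch of the idea only, not RtlDispatchException(): ModelRecord, ModelOutcome and the model_* functions are hypothetical, there is no unwinding, and the three MODEL_* values mirror EXCEPTION_EXECUTE_HANDLER (1), EXCEPTION_CONTINUE_SEARCH (0) and EXCEPTION_CONTINUE_EXECUTION (-1).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EXECUTE_HANDLER      1
#define MODEL_CONTINUE_SEARCH      0
#define MODEL_CONTINUE_EXECUTION (-1)

#define MODEL_ACCESS_VIOLATION 0xC0000005ul

typedef int (*ModelFilter)(unsigned long code);

/* One registration record per active frame, innermost first. */
typedef struct ModelRecord {
    struct ModelRecord *next;   /* next outer frame */
    ModelFilter         filter;
} ModelRecord;

typedef enum { MODEL_UNHANDLED, MODEL_HANDLED, MODEL_RESUMED } ModelOutcome;

/* Walk the list, running each filter until one stops the search. */
ModelOutcome model_dispatch(ModelRecord *head, unsigned long code,
                            const ModelRecord **chosen) {
    *chosen = NULL;
    for (ModelRecord *r = head; r != NULL; r = r->next) {
        int disposition = r->filter(code);
        if (disposition == MODEL_CONTINUE_SEARCH) continue;
        *chosen = r;
        return disposition == MODEL_EXECUTE_HANDLER ? MODEL_HANDLED
                                                    : MODEL_RESUMED;
    }
    return MODEL_UNHANDLED;  /* falls through to the top-level filter */
}

/* Example filters: one accepts only access violations, one passes on everything. */
int model_filter_av_only(unsigned long code) {
    return code == MODEL_ACCESS_VIOLATION ? MODEL_EXECUTE_HANDLER
                                          : MODEL_CONTINUE_SEARCH;
}

int model_filter_pass(unsigned long code) {
    (void)code;
    return MODEL_CONTINUE_SEARCH;
}
```

Because a filter is ordinary code, it can inspect anything before deciding, which is the sense in which filters are more general than matching on a type.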
Structured Exception Handling
On x86, there's some overhead with pushing and popping the registration record
On ia64, there is no overhead
Stack traces are reliable
It's always possible to look up the handler
Exception handling is very slow
Especially on ia64
Used only for truly exceptional conditions
Structured Exception Handling
Used in kernel mode too!
Most user mode access will just work
Still need to validate address ranges & data
Works great for SMP when another thread might be in the middle of modifying the address space
Expected read exceptions are returned as status codes from system calls
Expected writes are returned as SUCCESS
Unexpected => buggy kernel => blue screen
Top-level Exception Filter
Top frame on each thread defines a catch-all exception filter
Top-level exception filter:
Notifies the debugger (if being debugged)
Launches a just-in-time debugger (if set up)
Loads faultrep.dll to report the failure
Faultrep.dll
faultrep.dll offers to report the failure back to Microsoft
We analyze the failures
A significant number are recognized instantly; we can tell the user what happened and how to fix it.
The others go through the standard triage process; developers analyze the dumps and figure out what happened.
OCA
67 million machines running XP
Tens of thousands of drivers
Over 100 drivers on any given machine
One bug in one driver => Crash
A significant number of crashes come from third-party drivers (some of which ship on the CD)
Lots of different problems, though
Driver Verifier
Controlled by verifier.exe
Special-pools allocations
Detects allocation overruns & use after free
Validates some behaviors
IRQL: touching paged memory?
DMA buffers
Can inject failures: useful for testing behavior under sub-optimal conditions
Stress
Every night, a couple hundred machines run stress on the latest build
Stress exercises filesystems, memory, GUI, scheduler, &c, trying to uncover low-memory handling problems and race conditions
Every morning, the stress test team triages failed machines
Developers debug the failures
Questions?