Windows Internals
Advanced Troubleshooting
Part 1: Kernel Architecture
David Solomon
David Solomon Expert Seminars
Mark Russinovich
Winternals Software
Outline
1. Kernel Architecture
2. Troubleshooting Processes and Threads
3. Troubleshooting Memory Problems
4. Crash Dump Analysis
[Figure: Windows architecture diagram]
User mode: NTDLL.DLL
Kernel mode: System Threads; System Service Dispatcher (kernel mode callable interfaces)
Executive: I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call
Win32 USER, GDI; Graphics Drivers
Device & File Sys. Drivers
Kernel
Hardware Abstraction Layer (HAL)
hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)
Original copyright by Microsoft Corporation. Used by permission.
Windows XP
Six variants:
1. Windows XP Professional: replaces Windows 2000
Professional
2. Windows XP Home Edition (new)
First consumer-focused release of NT
Replaces Windows ME (Millennium Edition)
Has slightly fewer features than Windows XP Professional
3. Windows XP Professional 64-bit Edition (new)
First 64-bit version of NT - 64-bit pointers, much larger
address space
Runs on Intel Itanium & Itanium 2 (later: AMD Opteron)
4. Windows XP Embedded
Same kernel as regular 32-bit XP
Configurable to remove unnecessary components
Boot and execute from ROM (OS runs from RAM, apps
from ROM)
5. Windows XP Media Center Edition
6. Windows XP Tablet PC Edition
Windows Server 2003
New features:
More scalable: 64 processor systems, 8 node clusters, larger
memory maximums
IIS 6.0 (HTTP in the kernel, Connection failover)
Active Directory enhancements
Many new group policies
Remote Installation Support (RIS)
Bundles .NET Framework
Kernel Debugger
Allows exploring internal system state & data
structures
Part of Windows Debugging Tools
Download from http://www.microsoft.com/ddk/debugging
XP & Server 2003 support live kernel debugging
But not all commands work
LiveKD - tool on Inside Windows 2000 book CD
Allows using standard Microsoft kernel debuggers to view
“live” system state
Works on NT4, Windows 2000, Windows XP, Server 2003
Make sure to get patch from
www.sysinternals.com/insidew2k.shtml
Kernel Architecture
Process Execution Environment
Architecture Overview
Interrupt Handling & Time Accounting
System Threads
Process-based code
Summary
Scheduling Priorities
Realtime Levels 16-31:
31 Realtime Time Critical
24 Realtime
16 Realtime Idle
Dynamic Levels 1-15:
15 Time Critical (top of the dynamic range)
13 High
10 Above Normal
8 Normal
6 Below Normal
4 Idle
1 Dynamic Idle
0 System Idle
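A minimal sketch of the priority ranges above, in Python. The base values per priority class (Realtime 24, High 13, Above Normal 10, Normal 8, Below Normal 6, Idle 4) are the standard Windows values; the dictionary and function names are made up for illustration and are not a Windows API.

```python
# Base priority for each process priority class (standard Windows values).
BASE_PRIORITY = {
    "Realtime": 24,
    "High": 13,
    "Above Normal": 10,
    "Normal": 8,
    "Below Normal": 6,
    "Idle": 4,
}


def priority_range(level):
    """Name the range a priority level (0-31) falls in."""
    if level == 0:
        return "system idle"
    if 1 <= level <= 15:
        return "dynamic"
    if 16 <= level <= 31:
        return "realtime"
    raise ValueError("priority levels run from 0 to 31")


if __name__ == "__main__":
    for name, base in BASE_PRIORITY.items():
        print(f"{name:13} base {base:2} -> {priority_range(base)}")
```

Only the Realtime class has a base priority inside the 16-31 range; every other class starts in the dynamic range.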
32-Bit Virtual Address Space (x86)
00000000-7FFFFFFF: 2 GB per-process user space
Unique per process, accessible in user or kernel mode
Code: EXE/DLLs
Data: EXE/DLL static storage, per-thread user mode stacks, process heaps, etc.
Address space of one process is not directly reachable from other processes
80000000-FFFFFFFF: 2 GB system-wide system space
The operating system is loaded here, and appears in every process’s address space
The operating system is not a process (though there are processes that do things for the OS, more or less in “background”)
System wide, accessible only in kernel mode
Code: NTOSKRNL, HAL, drivers
Data: kernel stacks, file system cache, non-paged pool, paged pool
C0000000: per process, accessible only in kernel mode
Process page tables, hyperspace
3 GB user space and Address Windowing Extensions (AWE): t.b.d.
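The boundaries above can be checked with a few lines of Python. This is only a sketch of the default 2 GB / 2 GB split shown on the slide; it ignores the 3 GB user space option, and the function name is hypothetical.

```python
USER_SPACE_END = 0x7FFFFFFF       # last byte of per-process user space
SYSTEM_SPACE_START = 0x80000000   # first byte of system space
ADDRESS_SPACE_END = 0xFFFFFFFF    # top of the 32-bit address space


def region(address):
    """Classify a 32-bit virtual address under the default 2 GB / 2 GB split."""
    if not 0 <= address <= ADDRESS_SPACE_END:
        raise ValueError("not a 32-bit address")
    if address <= USER_SPACE_END:
        return "user space (per process)"
    return "system space (kernel mode only)"


if __name__ == "__main__":
    for addr in (0x00400000, 0x7FFFFFFF, 0x80000000, 0xC0000000):
        print(f"{addr:08X}: {region(addr)}")
```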
[Figure: 64-bit virtual address space; surviving label: System Space, E000000000000000-E000060000000000]
[Figure: simplified architecture. User mode: System & Service Processes, User Application, Subsystem DLL, environment subsystems (OS/2, POSIX, Win32), NTDLL.DLL. Kernel mode: Executive, Kernel, Device Drivers, Win32 User/GDI, Hardware Abstraction Layer (HAL).]
Subsystem Components
1 API DLLs
For Win32: Kernel32.DLL, Gdi32.DLL, User32.DLL, etc.
2 Subsystem process
For Win32: CSRSS.EXE (Client Server Runtime SubSystem)
3 For Win32 only: kernel-mode GDI code
Win32K.SYS – (this code was formerly part of CSRSS)
Environment Subsystems
[Figure: architecture diagram with numbered callouts. 1 = Subsystem DLL (user mode), 2 = environment subsystem processes (OS/2, POSIX, Win32), 3 = Win32 User/GDI (kernel mode).]
1 API DLLs
Export the APIs defined by the subsystem
Implement them by calling Windows “native” services, or by asking the
subsystem process to do the work
2 Subsystem process
Maintains global state of subsystem
Implements a few APIs that require subsystem-wide state changes
Processes and threads created under a subsystem
Drive letters
Window management for apps with no window code of their own (character-
mode apps)
Handle and object tables for subsystem-specific objects
3 Win32K.Sys
Implements Win32 User and GDI functions; calls routines in
GDI drivers
Also used by Posix and OS/2 subsystems to access the display
SMP Scalability
More efficient locking mechanism (pushlocks)
Minimized lock contention for hot locks
E.g., PFN (Page Frame Database) lock
Some locks completely eliminated
Charging nonpaged/paged pool quotas, allocating and
mapping system page table entries, charging
commitment of pages, allocating/mapping physical
memory through
AWE functions
Even better in Server 2003:
Further reduction of use of spinlocks & length they are
held
Dispatcher (scheduling) database locking now per-
CPU
Many Packages…
1. Windows XP Home Edition
1 CPU, 4GB RAM
2. Windows 2000 & XP Professional
Desktop version (but also is a fully functional server system)
2 CPUs, 4GB RAM
3. Windows Server 2003, Web Edition (new)
Reduced functionality Standard Server (no domain controller)
2 CPUs, 2GB RAM
4. Windows 2000 Server/Windows Server 2003, Standard Edition
Adds server and networking features (active directory-based domains,
host-based mirroring and RAID 5, NetWare gateway, DHCP server,
WINS, DNS, …)
Also is a fully capable desktop system
4 CPUs (2 in Server 2003), 4GB RAM
5. Windows 2000 Advanced Server/Windows Server 2003, Enterprise
Edition
3GB per-process address space option, Clusters (8 nodes)
8 CPUs, 8GB RAM (32GB in Server 2003 32-bit; 64GB on 64-bit)
6. Windows 2000/Server 2003 Datacenter Edition
Process Control Manager
Licensed for 32 CPUs, 64GB RAM (128GB on 64-bit edition)
NTOSKRNL.EXE
Core operating system image
Contains Executive and Kernel
Kernel versions
Windows NT 4.0 is 4.0 (client and server)
Windows 2000 is 5.0 (client and server)
Windows XP is 5.1 (client only)
Windows Server 2003 is 5.2 (server only)
Kernel evolution
NT4->Windows 2000 – significant change
Windows 2000->Windows XP – modest change
Windows XP->Server 2003 – minimal change
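The version list above is easy to keep as a lookup table. A small Python sketch, using only the versions named on the slide; the names KERNEL_VERSIONS and product_for are made up for illustration.

```python
# (major, minor) kernel version -> product, as listed on the slide.
KERNEL_VERSIONS = {
    (4, 0): "Windows NT 4.0 (client and server)",
    (5, 0): "Windows 2000 (client and server)",
    (5, 1): "Windows XP (client only)",
    (5, 2): "Windows Server 2003 (server only)",
}


def product_for(major, minor):
    """Return the product name for a kernel version, or 'unknown'."""
    return KERNEL_VERSIONS.get((major, minor), "unknown")


if __name__ == "__main__":
    print(product_for(5, 1))
```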
Four variations:
4GB or less
NTOSKRNL.EXE Uniprocessor
NTKRNLMP.EXE Multiprocessor
>4GB (new as of Windows 2000)
NTKRNLPA.EXE Uniprocessor w/extended
addressing support
NTKRPAMP.EXE Multiprocessor w/extended
addressing support
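The four variations form a simple two-by-two table. The sketch below mirrors that table in Python; it is not how Setup or the loader actually decides, and the function and parameter names are hypothetical.

```python
def kernel_image(multiprocessor, extended_addressing):
    """Pick the kernel image name from the two-by-two table on the slide.

    multiprocessor: True for a multiprocessor system.
    extended_addressing: True when more than 4GB must be addressed.
    """
    if extended_addressing:
        return "NTKRPAMP.EXE" if multiprocessor else "NTKRNLPA.EXE"
    return "NTKRNLMP.EXE" if multiprocessor else "NTOSKRNL.EXE"


if __name__ == "__main__":
    for mp in (False, True):
        for pae in (False, True):
            print(f"multiprocessor={mp}, >4GB={pae}: {kernel_image(mp, pae)}")
```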
NT distribution
CD-ROM:\i386 contains NTOSKRNL.EXE, NTKRNLPA.EXE, NTKRNLMP.EXE, NTKRPAMP.EXE, HAL.DLL, HALACPI.DLL, etc.
NT Setup copies the images that match the system to the Boot Partition, \Windows\System32, as NTOSKRNL.EXE, NTKRNLPA.EXE and HAL.DLL
(see \windows\repair\setup.log)
Dismiss interrupt
Process-Based NT Code
System Startup Processes
First two processes aren’t real processes
Not running a user mode .EXE
No user-mode address space
Different utilities report them with different names
Data structures for these processes (and their initial threads) are
“pre-created” in NtosKrnl.Exe and loaded along with the code
(Idle) Process id 0
Part of the loaded system image
Home for idle thread(s) (not a real process nor real threads)
Called “System Process” in many displays
(System) Process id 2 (8 in Windows 2000; 4 in XP)
Part of the loaded system image
Home for kernel-defined threads (not a real process)
Thread 0 (routine name Phase1Initialization) launches the first
“real” process, running smss.exe...
...and then becomes the zero page thread
Win32 Services
An overloaded generic term
A process created and managed by the Service
Control Manager (Services.exe)
E.g. Solitaire can be configured as a service, but is
killed shortly after starting
Similar in concept to Unix daemon processes
Typically configured to start at boot time (if started
while logged on, survive logoff)
Typically do not interact with the desktop
Note: Prior to Windows 2000 this was one way to
start a process on a remote machine (now you
can do it with WMI)
Service Controller/Manager (Services.Exe)
System boot/initialization
SCM reads registry, starts services as directed
Management/maintenance
Control panel can start and stop services and change startup parameters
[Figure: Control Panel and Service Controller/Manager controlling Service Processes]
Logon Process
1. Winlogon sends username/password to Lsass
Either on local system for local logon, or to Netlogon service on a domain controller for a network logon
Windows XP enhancement: Winlogon doesn’t wait for Workstation service to start if:
Account doesn't depend on a roaming profile
Domain policy that affects logon hasn't changed since last logon
2. Creates a process to run
HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
By default: Userinit.exe
Runs logon script, restores drive-letter mappings, starts shell
3. Userinit creates a process to run
HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell
By default: Explorer.exe
There are other places in the Registry that control
programs that start at logon
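To see what steps 2 and 3 will launch on a given machine, the two values can be read with Python's standard winreg module. This is a sketch only: the helper name is made up, and on a non-Windows system (or if the value is missing) it simply returns None.

```python
import sys

# Key from the slide, relative to HKEY_LOCAL_MACHINE.
WINLOGON_KEY = r"Software\Microsoft\Windows NT\CurrentVersion\Winlogon"


def read_winlogon_value(name):
    """Return the named Winlogon value, or None if unavailable."""
    if sys.platform != "win32":
        return None  # winreg exists only on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WINLOGON_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("Userinit", "Shell"):
        print(name, "=", read_winlogon_value(name))
```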
[Figure: Windows architecture diagram, repeated from the architecture overview: NTDLL.DLL, System Service Dispatcher, executive components, Win32 USER/GDI, device and file system drivers, Kernel, Hardware Abstraction Layer (HAL)]
Agenda
Introduction to Tools
Identifying the Process
Analyzing Process/Thread Activity
Application Failures
Task Manager: Applications vs. Processes
Applications tab: List of top level visible windows
Windows are owned by threads
“Running” means waiting for window messages
Right-click on a window and select “Go to process”
Processes tab: List of processes
Can configure with View->Select columns
Process Explorer
Process tree
If left justified, parent has exited
Disappears if you sort by any column
Bring back with View->Show Process Tree
Additional details in process list
Icon and description (from .EXE)
User Name shows which security database account
is from (e.g. which domain)
Highlight Own, Services Processes
Difference highlighting
Green: new, Red: gone
View->Update speed->Paused
Process Properties
Image tab:
Description, company name, version
(from .EXE)
Full image path
Command line used to start process
Current directory
Parent process
User name
Start time
Performance tab:
Basic process CPU/memory usage
Security tab:
Access token (groups list, privilege list)
Environment tab: environment
variables
Services tab (only for service
processes):
List of services hosted by process
Handle View
Lower half of display shows either:
Open handles
Loaded DLLs & mapped files
Handle View
Sort by handle
Objects of type “File” and “Key” are most
interesting for general troubleshooting
DLL View
Click on View->DLL View
Shows more than just loaded DLLs
Includes .EXE and any “memory mapped files”
High speed file access mechanism
Makes file appear as virtual memory
Uses:
Detect DLL versioning problems
Compare the output from a working process with that of a
failing one (use File->Save As)
Find which processes are using a specific DLL
(search for it)
Show Relocated DLLs option
Highlights relocated DLLs in yellow
Using Regmon/Filemon
Two basic techniques:
Go to end of log and look backwards to where problem
occurred or is evident, and focus on the last things
done
Compare a good log with a bad log
Often comparing the I/O and Registry activity of a
failing process with one that works may point to
the problem
Have to first massage log file to remove data that differs
run to run
Delete first 3 columns (they are always different: line #, time,
process id)
Easy to do with Excel by deleting columns
Then compare with FC (built in tool) or Windiff
(Resource Kit)
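The "massage, then compare" technique can also be scripted instead of using Excel plus FC or Windiff. A minimal Python sketch, assuming the saved logs are tab-delimited text with line #, time and process id in the first three columns; the function names and sample lines are made up for illustration.

```python
import difflib


def strip_volatile_columns(lines, skip=3):
    """Drop the first `skip` tab-separated columns (line #, time, process id)."""
    stripped = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        stripped.append("\t".join(fields[skip:]))
    return stripped


def compare_logs(good_lines, bad_lines):
    """Return unified-diff lines between a good log and a bad log."""
    good = strip_volatile_columns(good_lines)
    bad = strip_volatile_columns(bad_lines)
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))


if __name__ == "__main__":
    good = ["1\t10:00:01\t100\tOpenKey\tHKCU\\Software\\App\tSUCCESS"]
    bad = ["7\t11:30:09\t244\tOpenKey\tHKCU\\Software\\App\tNOTFOUND"]
    for line in compare_logs(good, bad):
        print(line)
```

Lines that differ only in the stripped columns produce no diff output, so what remains is the activity that really changed between the two runs.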
Controlling Filemon
Start/stop logging (Control/E)
Clear display (Control/X)
Open Explorer window to folder containing
file:
Double click on a line does this
Find – finds text within window
Save to log file
History depth
Advanced mode
Filemon Lab 1
1. Run Filemon
2. Set filter to only include Notepad.exe
3. Run Notepad
4. Type some text
5. Save file as “test.txt”
6. Go back to Filemon
7. Stop logging
8. Set highlight to “test.txt”
9. Find line representing creation of new file
Hint: look for create operation
Access Denied
Many applications don’t report access
denied errors well
Example: try to save a file with Notepad to a
folder you don’t have access to
Use Filemon to verify access denied
errors are not occurring on file opens
Check Result column
Locked Files
Attempting to open or delete a file that is
in use simply reports “file locked”
With Process Explorer search (in handle
view) you can determine what process is
holding a file or directory open
Can even close open files (be careful!)
DLL Problems
But sometimes it’s the order of DLL loads
that clues you in, so use Filemon!
Missing DLLs often not reported correctly
Look for “NOTFOUND” or “ACCESS DENIED”
May be opening wrong versions due to files in
PATH
Look at the last DLL opened before the
application died
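The checks above can be run over a saved Filemon log. A rough Python sketch, assuming a plain-text log with one operation per line and the result text on the same line as the path; the function names and sample lines are hypothetical.

```python
FAILURE_MARKERS = ("NOTFOUND", "ACCESS DENIED")


def failed_dll_opens(lines):
    """Return log lines that mention a DLL together with a failure result."""
    hits = []
    for line in lines:
        upper = line.upper()
        if ".DLL" in upper and any(marker in upper for marker in FAILURE_MARKERS):
            hits.append(line.rstrip("\r\n"))
    return hits


def last_dll_line(lines):
    """Return the last log line that mentions a DLL, or None."""
    last = None
    for line in lines:
        if ".dll" in line.lower():
            last = line.rstrip("\r\n")
    return last


if __name__ == "__main__":
    log = [
        "1\t10:00:01\tapp.exe\tOPEN\tC:\\App\\helper.dll\tNOTFOUND",
        "2\t10:00:01\tapp.exe\tOPEN\tC:\\Windows\\System32\\helper.dll\tSUCCESS",
        "3\t10:00:02\tapp.exe\tOPEN\tC:\\App\\data.txt\tSUCCESS",
    ]
    print(failed_dll_opens(log))
    print(last_dll_line(log))
```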
Configuration Problems
Missing, corrupted or overly-secure Registry
settings often lead to application crashes and
errors
Some applications don’t completely remove
registry data at uninstall
Regmon may yield the answer…
Controlling Regmon
Start/stop logging (Control/E)
Clear display (Control/X)
Regedit jump (opens Registry Editor and
jumps directly to key)
Double clicking on a line does this
Filtering/Highlighting
Find
Save to log file
Regmon Lab 1
1. Run Regmon
2. Highlight Notepad.exe
3. Run Notepad
4. Change font to “Times New Roman”
5. Exit
6. Go back to Regmon
7. Stop logging
8. Find line showing storing of font name in
registry
Hint: search for “times”
Example Problem
Internet Explorer failed to start:
Solution:
Looked backwards from end of Regmon log
Last queries were to:
HKCU\Software\Microsoft\Internet Connection Wizard
Looked here and found a single value “Completed”
set to 0
Compared to other users—theirs was 1
Set this manually to 1 and problem went away
Regmon Lab 2
1. Run Notepad
2. Change Font and point size
3. Enable Word wrap
4. Run Regmon & filter to Notepad.exe
5. Exit Notepad
6. In Regmon log, find location of user-specific
Notepad settings
7. Double click on a line to jump to Regedit
8. Delete top level Notepad user settings key
9. Re-run Notepad and confirm font and word
wrap reset to default setting
Solution
Ran Regmon
Looked backwards from end (at the point
IE was hung)
Found references to ATT under a
PhoneBook key
Renamed ATT key and problem went away
Conclusion: registry junk was left from
uninstall
Filemon/Regmon as a Service
Sometimes need to capture I/O or registry
activity during the logon or logoff process
E.g. errors occurring during logon/logoff
Solution:
Run Filemon/Regmon with AT command
Install and run Filemon/Regmon as a service
Use Srvany tool from Resource Kit
In either case, the tools remain running
after logoff
Process Crashes
Troubleshooting Memory
Problems
System and process memory usage may
degrade performance
Or eventually cause process failures
How do you determine memory leaks?
Process vs. system?
How do you know if you need more memory?
How do you size your page file?
What do system and process memory counters
really mean?
Understanding process and system memory
information can help answer these questions…
32-Bit Virtual Address Space (x86)
[Figure: same layout as in Part 1. 2 GB per-process user space at 00000000-7FFFFFFF; 2 GB system-wide space at 80000000-FFFFFFFF, with per-process page tables and hyperspace at C0000000.]
64-Bit Virtual Address Space (Itanium)
0: User-Mode User Space
6FC00000000: Kernel-Mode User Space
1FFFFF0000000000: User Page Tables
2000000000000000: Session Space
E000000000000000-E000060000000000: System Space
PerfMon
Process “WorkingSet”
1 “Mem Usage” = physical memory used by process (working set size, not working set limit)
Note: Shared pages are counted in each process
2 “VM Size” = private (not shared) committed virtual space in processes == potential pagefile usage
3 “Mem Usage” in status bar is not total of “Mem Usage” column (see later slide)
Paging Lists
Paging Dynamics
[Figure: page movement between process working sets and the paging lists. Surviving labels: demand zero page faults; page read from disk or kernel allocations; working set replacement; “global valid” faults; Standby Page List; Modified Page List; private pages at process exit.]
6 “Available” = sum of free, standby, and zero page lists (physical)
Majority are likely standby pages
“System Cache” = size of standby list + size of system working set (file cache, paged pool, pageable OS/driver code & data)
Screen snapshot from: Task Manager | Performance tab
Zeroed: 0 ( 0 kb)
Free: 3 ( 12 kb)
Standby: 98248 (392992 kb)
Modified: 563 ( 2252 kb)
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 93437 (373748 kb)
Transition: 1 ( 4 kb)
Unknown: 0 ( 0 kb)
TOTAL: 192252 (769008 kb)
Screen snapshot from: kernel debugger !memusage command
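As a worked example, the !memusage figures above can be turned into the Task Manager “Available” number (free + standby + zero page lists). The Python below is only arithmetic on the listing; the 4 KB page size is inferred from the listing itself (3 pages = 12 kb), and the variable names are made up.

```python
PAGE_KB = 4  # x86 page size, consistent with "Free: 3 (12 kb)"

# Page counts copied from the !memusage listing.
pages = {
    "Zeroed": 0,
    "Free": 3,
    "Standby": 98248,
    "Modified": 563,
    "ModifiedNoWrite": 0,
    "Active/Valid": 93437,
    "Transition": 1,
    "Unknown": 0,
}

# "Available" = zero + free + standby page lists.
available_pages = pages["Zeroed"] + pages["Free"] + pages["Standby"]
total_pages = sum(pages.values())

if __name__ == "__main__":
    print("Available:", available_pages, "pages =", available_pages * PAGE_KB, "kb")
    print("Total:", total_pages, "pages =", total_pages * PAGE_KB, "kb")
```

Nearly all of the available memory in this snapshot is on the standby list, which matches the remark that the majority of available pages are likely standby pages.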
Page Files
What gets sent to the paging file?
Not code – only modified data (code can be re-read
from image file anytime)
When do pages get paged out?
Only when necessary
Page file space is only reserved at the time pages
are written out
Once a page is written to the paging file, the space is
occupied until the memory is deleted (e.g., at
process exit), even if the page is read back from disk
Can run with no paging file
Windows NT4/Windows 2000: Zero pagefile size
actually created a 20MB temporary page file
Memory Leaks
Two options:
Poolmon
In the Support Tools and the Device Driver Kit
(DDK)
Requires that you turn on Pool Tagging with
Gflags on Windows NT and Windows 2000
Driver Verifier
Select all drivers
Turn on pool tracking
Outline
What causes crashes?
Crash dump options
Analysis with WinDbg/Kd
Debugging hung systems
Microsoft On-line Crash Analysis
Using Driver Verifier
Live kernel debugging
Getting past a crash
Dump Options
Complete memory dump (Windows NT 4,
Windows 2000, Windows XP)
Full contents of memory written to
<systemroot>\memory.dmp
Kernel memory dump (Windows 2000, Windows
XP)
System memory written to <systemroot>\memory.dmp
Small memory dump (Windows 2000, Windows
XP)
Also called a minidump or triage dump
64KB of summary written to
<systemroot>\minidump\MiniMMDDYY-NN.dmp
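The small-dump naming pattern MiniMMDDYY-NN.dmp can be illustrated in Python. This is a sketch under assumptions: NN is treated as a two-digit, zero-padded sequence number for dumps written on the same day, and the function name is hypothetical.

```python
from datetime import date


def minidump_name(day, sequence):
    """Build a name of the form MiniMMDDYY-NN.dmp.

    Assumption: NN is a zero-padded sequence number for that day.
    """
    return f"Mini{day:%m%d%y}-{sequence:02d}.dmp"


if __name__ == "__main__":
    print(minidump_name(date(2003, 7, 4), 1))
```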
In Windows NT 4:
Enabling Dumps
In Windows 2000/XP:
[Figure: dump recovery at reboot, steps 1-4. User mode: Session Manager, WinLogon, SaveDump, Memory.dmp. Kernel mode: NtCreatePagingFile, Paging File.]
At The Reboot
Session Manager process (\winnt\system32\smss.exe) initializes paging file (step 1)
NtCreatePagingFile
NtCreatePagingFile determines if the dump has a crash header (step 2)
Protects the dump from use
WinLogon calls NtQuerySystemInformation to tell if there’s a dump
Symbol Files
Before you can use any crash analysis tool you
need symbol files
Symbol files contain global function and variable names
At the minimum, get the symbol file(s) for ntoskrnl.exe,
ntkrnlmp.exe, ntkrnlpa.exe, ntkrpamp.exe
Symbols are service pack-specific and have an
installer (default directory is \winnt\symbols)
Windows NT 4: *.dbg
Windows 2000: *.dbg, *.pdb
Windows XP: *.pdb
Note: SP symbols only include updates
Debugger Commands
Two types of commands
Dot commands are built-in
Bang commands are provided with extension
DLLs
Extension DLLs allow Microsoft and third-
parties to dynamically add commands
The main extension DLL is the kernel-
debugger extension DLL, kdexts.dll
Each OS has a subdirectory with its own
kdexts.dll version as well as other,
development-area specific, extension DLLs
(e.g. Rpcexts.dll, ndiskd.dll, …)
Useful Commands
Hung Systems
To configure keystroke-crash:
Set HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll to 1
Hold right-ctrl and press scroll-lock twice to crash the system
Use !thread to see what’s running
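One way to apply the keystroke-crash setting is to import a .reg file. The Python sketch below only generates the file text and changes nothing on the system. The value name used is CrashOnCtrlScroll, the spelling Microsoft documents for this setting, written as a REG_DWORD; the function name is made up for illustration.

```python
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters"


def reg_file_text(value_name="CrashOnCtrlScroll", data=1):
    """Return .reg file text that sets a DWORD value under KEY."""
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY}]\r\n"
        f'"{value_name}"=dword:{data:08x}\r\n'
    )


if __name__ == "__main__":
    # Print the text; save it as a .reg file and import it to apply.
    print(reg_file_text())
```

A reboot is needed before the keyboard driver picks up the new value.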
Special Pool
Special pool is a kernel buffer area where buffers are sandwiched with invalid pages
[Figure: a buffer bounded by invalid pages; page n+2, toward higher addresses, is marked invalid]
Conditions for a driver
More Information
Inside Windows 2000, 3rd edition – section
on System Crashes in chapter 4
Debugging Tools help file
Knowledge Base Articles
http://www.microsoft.com/ddk/debugging
Other books:
http://www.microsoft.com/ddk/newbooks.asp
The debugger team wants your feedback
and bug reports