
Windows Internals and Advanced Troubleshooting
Part 1: Kernel Architecture

David Solomon
David Solomon Expert Seminars
Mark Russinovich
Winternals Software


About The Speaker


• 10 years doing operating systems development at Digital (VMS)
• Since 1992, researching, writing, and teaching about Windows NT operating system internals
• Live classes
  • Windows NT/2000/XP/Server 2003 Internals
• Books
  • Co-author of Inside Windows 2000, 3rd edition (Microsoft Press, 2000)
  • Author of Inside Windows NT, 2nd edition (Microsoft Press, 1998)
  • Author of Windows NT for OpenVMS Professionals (Digital Press, 1996)
• Video training
  • 11 hour interactive internals tutorial on DVD or CD
  • Used by Microsoft for their internal training

Copyright © 2002-2003 by David A. Solomon and Mark E. Russinovich
Purpose of Tutorial
• Give IT professionals a foundational understanding of the Windows OS kernel architecture
  • Note: this is a small but important part of Windows
  • The “plumbing in the boiler room”
• Condensed from a 5 day internals class
• Benefits:
  • Able to troubleshoot problems more effectively
  • Understand system performance issues
• Applies to NT4, Windows 2000, Windows XP, and Windows Server 2003


Outline
1. Kernel Architecture
2. Troubleshooting Processes and Threads
3. Troubleshooting Memory Problems
4. Crash Dump Analysis

Kernel Architecture
[Figure: Windows architecture block diagram.
 User mode: System Processes (Session Manager, WinLogon, LSASS), Services (Service Control Mgr., Services.Exe, SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe), Applications (Task Manager, Explorer, User Application), and Environment Subsystems (Win32, POSIX, OS/2), all above the Subsystem DLLs and NTDLL.DLL.
 Kernel mode: System Threads; System Service Dispatcher (kernel mode callable interfaces); I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call; Win32 USER, GDI; Device & File Sys. Drivers; Graphics Drivers; Kernel; Hardware Abstraction Layer (HAL).
 Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)]
Original copyright by Microsoft Corporation. Used by permission.

Windows XP
• Six variants:
  1. Windows XP Professional: replaces Windows 2000 Professional
  2. Windows XP Home Edition (new)
     • First consumer-focused release of NT
     • Replaces Windows ME (Millennium Edition)
     • Has slightly fewer features than Windows XP Professional
  3. Windows XP Professional 64-bit Edition (new)
     • First 64-bit version of NT: 64-bit pointers, much larger address space
     • Runs on Intel Itanium & Itanium 2 (later: AMD Opteron)
  4. Windows XP Embedded
     • Same kernel as regular 32-bit XP
     • Configurable to remove unnecessary components
     • Boot and execute from ROM (OS runs from RAM, apps from ROM)
  5. Windows XP Media Center Edition
  6. Windows XP Tablet PC Edition
Windows Server 2003
• Replacement for Windows 2000 Server family
• Name changes for flavors
    Windows Server 2003, Web Edition (new package)
    Windows Server 2003, Standard Edition (was Server)
    Windows Server 2003, Enterprise Edition (was Advanced Server)
    Windows Server 2003, Datacenter Edition (no change)
• New features:
  • More scalable: 64 processor systems, 8 node clusters, larger memory maximums
  • IIS 6.0 (HTTP in the kernel, connection failover)
  • Active Directory enhancements
  • Many new group policies
  • Remote Installation Support (RIS)
  • Bundles .NET Framework

Level Of Kernel Change


• Windows Server 2003 & Windows XP are modest upgrades compared to the changes from Windows NT 4.0 to Windows 2000
• Kernel architecture is basically unchanged
  • No new subsystems
  • No new API sets
• Internal version numbers confirm this
  • Windows 2000 is 5.0
  • Windows XP is 5.1 (not 6.0)
  • Windows Server 2003 is 5.2
    • Not the same kernel as XP (a superset)
• But, nonetheless, still lots of interesting kernel changes…
Tools used to dig in
• Many tools available to dig into Windows 2000/XP internals
  • Helps to see internals behavior “in action”
• We’ll use these tools to explore the internals
  • Many of these tools are also used in the labs that you can do after each module
• Several sources of tools
  • Support Tools
  • Resource Kit Tools
  • Debugging Tools
  • Sysinternals.com
  • Inside Windows 2000, 3rd edition book CD
• Additional tool packages with internals information
  • Platform Software Development Kit (SDK)
  • Device Driver Development Kit (DDK)

Kernel Debugger
• Allows exploring internal system state & data structures
• Part of Windows Debugging Tools
  • Download from http://www.microsoft.com/ddk/debugging
• XP & Server 2003 support live kernel debugging
  • But not all commands work
• LiveKD: tool on Inside Windows 2000 book CD
  • Allows using standard Microsoft kernel debuggers to view “live” system state
  • Works on NT4, Windows 2000, Windows XP, Server 2003
  • Make sure to get patch from www.sysinternals.com/insidew2k.shtml
Symbols
• What are they?
  • Files that contain the debugging information needed for kernel & user debugging
    • Names of functions and global variables
  • One symbol file (.PDB) for every EXE, DLL, driver
  • Data “stripped” from .EXEs, .DLLs, drivers after they are linked
• Can install symbols on local machine (>600MB)
  • Note: if running a Service Pack or hot fix, must get matching symbols
• Easiest solution: use on-demand symbol server on internet
  • See www.microsoft.com/ddk/debugging/symbols.asp for details


Kernel Architecture
• Process Execution Environment
• Architecture Overview
• Interrupt Handling & Time Accounting
• System Threads
• Process-based code
• Summary

Processes And Threads
• What is a process?
  • Represents an instance of a running program
    • You create a process to run a program
    • Starting an application creates a process
  • Process defined by
    • Address space
    • Resources (e.g., open handles)
    • Security profile (token)
• What is a thread?
  • An execution context within a process
  • Unit of scheduling (threads run, processes don’t run)
  • All threads in a process share the same per-process address space
    • Services provided so that threads can synchronize access to shared resources (critical sections, mutexes, events, semaphores)
  • All threads in the system are scheduled as peers to all others, without regard to their “parent” process

[Figure: several threads inside a per-process address space, above the system-wide address space]


Scheduling Priorities

Level    Meaning
31       Realtime Time Critical
16-31    Realtime levels (Realtime class base: 24)
16       Realtime Idle
15       Top of the dynamic range
1-15     Dynamic levels (class bases: High 13, Above Normal 10, Normal 8, Below Normal 6, Idle 4)
1        Dynamic Idle
0        System Idle
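The levels in the chart can be tied together with a small calculation. The C sketch below is an illustration only, not the scheduler's code: it assumes a thread's base priority is its process priority class base (Idle 4, Below Normal 6, Normal 8, Above Normal 10, High 13, Realtime 24) plus a relative offset, with "idle" and "time critical" pinned to the ends of the dynamic (1-15) or realtime (16-31) range. All type and function names are hypothetical.

```c
#include <assert.h>

/* Priority classes; each enumerator's value is the class base level. */
typedef enum {
    CLASS_IDLE = 4,
    CLASS_BELOW_NORMAL = 6,
    CLASS_NORMAL = 8,
    CLASS_ABOVE_NORMAL = 10,
    CLASS_HIGH = 13,
    CLASS_REALTIME = 24
} priority_class;

/* Relative thread priorities (hypothetical names for this sketch). */
typedef enum {
    REL_IDLE = -15,
    REL_LOWEST = -2,
    REL_BELOW_NORMAL = -1,
    REL_NORMAL = 0,
    REL_ABOVE_NORMAL = 1,
    REL_HIGHEST = 2,
    REL_TIME_CRITICAL = 15
} relative_priority;

/* Combine a class and a relative priority into a level in 1..31. */
int base_priority(priority_class cls, relative_priority rel)
{
    int realtime = (cls == CLASS_REALTIME);
    int lowest = realtime ? 16 : 1;    /* bottom of the band */
    int highest = realtime ? 31 : 15;  /* top of the band */
    int level;

    assert(rel == REL_IDLE || rel == REL_TIME_CRITICAL ||
           (rel >= REL_LOWEST && rel <= REL_HIGHEST));

    if (rel == REL_IDLE)
        return lowest;
    if (rel == REL_TIME_CRITICAL)
        return highest;

    level = (int)cls + (int)rel;
    if (level < lowest)
        level = lowest;
    if (level > highest)
        level = highest;
    return level;
}
```

For example, `base_priority(CLASS_NORMAL, REL_NORMAL)` gives 8, the default for most application threads, and `base_priority(CLASS_HIGH, REL_HIGHEST)` reaches 15, the top of the dynamic range.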

Processes And Threads
• Every process starts with one thread
  • First thread executes the program’s “main” function
    • Can create other threads in the same process
    • Can create additional processes
• Why divide an application into multiple threads?
  • Perceived user responsiveness, parallel/background execution
    • Example: Word background print (can continue to edit during print)
  • Take advantage of multiple processors
    • On an MP system with n CPUs, n threads can literally run at the same time
• Questions
  • Given a single threaded application, will adding a second processor make it run faster?
  • Will a multithreaded application run faster on an MP system?
    • Depends on whether the application’s internal synchronization permits this
    • Having too many runnable threads causes excess context switching
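One common way to reason about the two questions above is Amdahl's law. It is not part of the original slides and is brought in here only as an illustration: if a fraction p of the work can run in parallel, the best-case speedup on n CPUs is 1 / ((1 - p) + p / n). The function name is hypothetical, and the model ignores synchronization and context-switch overhead, which only make things worse.

```c
#include <assert.h>

/* Best-case speedup on `cpus` processors when `parallel_fraction`
 * (0.0 to 1.0) of the work can run concurrently (Amdahl's law). */
double amdahl_speedup(double parallel_fraction, int cpus)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cpus >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cpus);
}
```

Under this model a single threaded application (p = 0) gets a speedup of exactly 1 from a second processor, while an application that is half serialized by its own locking (p = 0.5) tops out at about 1.33 on two CPUs.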

32-Bit Virtual Address Space (x86)

00000000-7FFFFFFF  Unique per process, accessible in user or kernel mode
    Code: EXE/DLLs
    Data: EXE/DLL static storage, per-thread user mode stacks, process heaps, etc.
80000000           System wide, accessible only in kernel mode
    Code: NTOSKRNL, HAL, drivers
    Data: kernel stacks
C0000000           Per process, accessible only in kernel mode
    Process page tables, hyperspace
up to FFFFFFFF     System wide, accessible only in kernel mode
    File system cache, non-paged pool, paged pool

• 2 GB per-process
  • Address space of one process is not directly reachable from other processes
• 2 GB system-wide
  • The operating system is loaded here, and appears in every process’s address space
  • The operating system is not a process (though there are processes that do things for the OS, more or less in the “background”)
• 3 GB user space and Address Windowing Extensions (AWE)
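The split described above amounts to a single comparison against the start of system space. The sketch below is a simplified model, not operating system code: it assumes the default boundary at 80000000 and, for the 3 GB user space option, a boundary at C0000000. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_USER, REGION_SYSTEM } region;

/* Classify a 32-bit virtual address. `three_gb_user` is nonzero when
 * the system is assumed to be running with 3 GB of user space. */
region classify_address(uint32_t va, int three_gb_user)
{
    uint32_t system_start = three_gb_user ? 0xC0000000u : 0x80000000u;
    return va < system_start ? REGION_USER : REGION_SYSTEM;
}
```

With the default layout, `classify_address(0x7FFFFFFF, 0)` is user space and `classify_address(0x80000000, 0)` is system space; with the 3 GB option the same 0x80000000 address falls in user space.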

64-Bit Virtual Address Space (Itanium)

Address                                Region
0                                      User-Mode User Space
6FC00000000                            Kernel-Mode User Space
1FFFFF0000000000                       User Page Tables
2000000000000000                       Session Space
3FFFFF0000000000                       Session Space Page Tables
E000000000000000 - E000060000000000    System Space
FFFFFF0000000000                       System Space Page Tables

                      64-bit Windows    32-bit Windows
User Address Space    7152 GB           2 or 3 GB
System PTE Space      128 GB            2 GB
System Cache          1 TB              960 MB
Paged pool            128 GB            650 MB
Non-paged pool        128 GB            256 MB

Memory Protection Model


• No user process can touch another user process’s address space
  • Without first opening the process (means passing through NT security)
• All kernel components share a single address space
  • This is how driver bugs can cause “blue screens”
  • Most other commercial OSs (Unix, Linux, VMS etc.) have the same design

Memory Protection Model
• Controlled by using two hardware access modes: user and kernel
  • x86: Ring 0, Ring 3
  • Itanium: Privilege Level 0 & 3
• Each memory page is tagged to show the required mode for access
• The access mode is associated with threads
  • Threads can change from user to kernel mode and back (via a secure interface)
  • Part of saved context, along with registers, etc.
  • Does not affect scheduling
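The page tagging rule can be pictured as a comparison between the current access mode and the mode a page requires. The following C sketch is a deliberately reduced model (it ignores read/write/execute protection and everything else a real page table entry carries). The names are hypothetical; the numeric values follow the x86 ring numbers from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Access modes, numbered like the x86 rings (0 = most privileged). */
typedef enum { MODE_KERNEL = 0, MODE_USER = 3 } access_mode;

/* A page tagged with the least privileged mode allowed to touch it. */
typedef struct {
    access_mode required_mode;
} page_tag;

/* Nonzero when code running in `current` mode may access the page. */
int access_allowed(access_mode current, const page_tag *page)
{
    assert(page != NULL);
    return current <= page->required_mode;
}
```

In this model kernel mode code can touch both user and kernel pages, while user mode code is refused on a page tagged for kernel mode.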


Accounting for Kernel-Mode Time


• “Processor Time” = total busy time of processor (equal to elapsed real time - idle time)
• “Processor Time” = “User Time” + “Privileged Time”
• “Privileged Time” = time spent in kernel mode
• “Privileged Time” includes:
  • Interrupt Time
  • DPC Time
  • Explained later…

[Screen snapshot from: Programs | Administrative Tools | Performance Monitor; click on the “+” button, or select Edit | Add to chart...]
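The two identities above are enough to derive privileged time from values that are easy to observe. A minimal sketch, assuming all times are measured in milliseconds over the same interval; the struct and function names are hypothetical and are not performance counter APIs.

```c
#include <assert.h>

/* Times observed for one processor over one sampling interval. */
typedef struct {
    double elapsed_ms;  /* elapsed real time */
    double idle_ms;     /* time spent idle */
    double user_ms;     /* time spent in user mode */
} cpu_sample;

/* Processor Time = elapsed real time - idle time. */
double processor_time_ms(const cpu_sample *s)
{
    assert(s->idle_ms <= s->elapsed_ms);
    return s->elapsed_ms - s->idle_ms;
}

/* Privileged Time = Processor Time - User Time. */
double privileged_time_ms(const cpu_sample *s)
{
    double busy = processor_time_ms(s);
    assert(s->user_ms <= busy);
    return busy - s->user_ms;
}
```

For a 1000 ms interval with 600 ms idle and 250 ms of user time, processor time is 400 ms and privileged time is 150 ms; interrupt and DPC time are part of that 150 ms.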
Kernel Architecture
• Process Execution Environment
• Architecture Overview
• Interrupt Handling & Time Accounting
• System Threads
• Process-based code
• Summary


Multiple OS Personality Design


[Figure: layered diagram. User mode: System & Service Processes, User Application with its Subsystem DLL, and the Environment Subsystems (OS/2, POSIX, Win32), all above NTDLL.DLL. Kernel mode: Executive, Win32 User/GDI, Device Drivers, Kernel, Hardware Abstraction Layer (HAL).]

Environment Subsystems
• Windows NT 4.0 shipped with three environment subsystems
  • Win32: 32-bit Windows API
  • OS/2: 1.x character-mode apps only
    • Removed in Windows 2000
  • POSIX: only POSIX 1003.1 (bare minimum Unix services: no networking, windowing, threads, etc.)
    • Removed in Windows XP/Server 2003; enhanced version ships with Services For Unix 3.0
• Of the three, Win32 provides access to the majority of the native functions
• Of the three, Win32 is required to be running
  • System crashes if Win32 subsystem process exits
  • POSIX and OS/2 subsystems are Win32 programs
  • POSIX and OS/2 start on demand (first time an app is run)
    • Stay running until system shutdown

Subsystem Components

1. API DLLs
   • For Win32: Kernel32.DLL, Gdi32.DLL, User32.DLL, etc.
2. Subsystem process
   • For Win32: CSRSS.EXE (Client Server Runtime SubSystem)
3. For Win32 only: kernel-mode GDI code
   • Win32K.SYS (this code was formerly part of CSRSS)

[Figure: the Multiple OS Personality Design diagram with callouts: 1 = Subsystem DLL, 2 = Environment Subsystems, 3 = Win32 User/GDI]

Role Of Subsystem Components

1. API DLLs
   • Export the APIs defined by the subsystem
   • Implement them by calling Windows “native” services, or by asking the subsystem process to do the work
2. Subsystem process
   • Maintains global state of subsystem
   • Implements a few APIs that require subsystem-wide state changes
     • Processes and threads created under a subsystem
     • Drive letters
   • Window management for apps with no window code of their own (character-mode apps)
   • Handle and object tables for subsystem-specific objects
3. Win32K.Sys
   • Implements Win32 User and GDI functions; calls routines in GDI drivers
   • Also used by POSIX and OS/2 subsystems to access the display

Symmetric Multiprocessing (SMP)


• No master processor
  • All the processors share just one memory space
  • Interrupts can be serviced on any processor
  • Any processor can cause another processor to reschedule what it’s running
• Current implementation supports up to 32 CPUs (64-bit edition is 64 internally)
  • Not an architectural limit, just implementation
• Maximum # of CPUs stored in registry
  HKLM\System\CurrentControlSet\Control\Session Manager\LicensedProcessors

[Figure: SMP system in which several CPUs, each with an L2 cache, share memory and I/O]

SMP Scalability
• Scalability is a function of parallelization and resource contention
  • Can’t make a general statement
  • Different for different applications (e.g., file server versus SQL versus Exchange)
• Windows kernel provides a scalable foundation
  • Multiple threads of execution within a single process, each of which can execute simultaneously on different processors
  • Ability to run operating system code on any available processor and on multiple processors at the same time
  • Fine-grained synchronization within the kernel as well as within device drivers allows more components to run concurrently on multiple processors
  • Multiple programming mechanisms that facilitate scalable server applications (e.g. I/O completion ports)

SMP Scalability
• More efficient locking mechanism (pushlocks)
• Minimized lock contention for hot locks
  • E.g., PFN (Page Frame Database) lock
• Some locks completely eliminated
  • Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
• Even better in Server 2003:
  • Further reduction of use of spinlocks & length they are held
  • Dispatcher (scheduling) database locking now per-CPU

New MP Configurations
• NUMA (non-uniform memory architecture) systems
  • Groups of physical processors (called “nodes”) that have “local memory”
    • Still an SMP system (i.e., any processor can access all of memory)
    • But node-local memory is faster
  • Scheduling algorithms take this into account
• Hyperthreading support
  • CPU fools OS into thinking there are multiple CPUs
    • Example: each Xeon with hyperthreading presents 2 logical processors, so a dual Xeon system presents 4
  • Windows Server 2003 is hyperthreading aware
    • Logical processors don’t count against physical processor limits
    • Scheduling algorithms take into account logical vs physical processors
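The licensing rule for hyperthreading can be sketched as arithmetic on physical processors only. This is an assumed model for illustration, with hypothetical names: the license caps the number of physical processors used, and every logical processor on a used physical processor is available.

```c
#include <assert.h>

/* Logical processors the OS would use, assuming the license limit is
 * applied to physical processors and not to logical ones. */
int usable_logical_cpus(int physical_cpus, int logical_per_physical,
                        int licensed_physical)
{
    int used_physical;

    assert(physical_cpus >= 1);
    assert(logical_per_physical >= 1);
    assert(licensed_physical >= 1);

    used_physical = physical_cpus < licensed_physical
                        ? physical_cpus
                        : licensed_physical;
    return used_physical * logical_per_physical;
}
```

Under this model a 1-CPU license on a single hyperthreaded processor still yields 2 logical processors, and a 2-CPU license on a dual hyperthreaded system yields 4.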


Many Packages…
1. Windows XP Home Edition
   • 1 CPU, 4GB RAM
2. Windows 2000 & XP Professional
   • Desktop version (but also is a fully functional server system)
   • 2 CPUs, 4GB RAM
3. Windows Server 2003, Web Edition (new)
   • Reduced functionality Standard Server (no domain controller)
   • 2 CPUs, 2GB RAM
4. Windows 2000 Server/Windows Server 2003, Standard Edition
   • Adds server and networking features (Active Directory-based domains, host-based mirroring and RAID 5, NetWare gateway, DHCP server, WINS, DNS, …)
   • Also is a fully capable desktop system
   • 4 CPUs (2 in Server 2003), 4GB RAM
5. Windows 2000 Advanced Server/Windows Server 2003, Enterprise Edition
   • 3GB per-process address space option, Clusters (8 nodes)
   • 8 CPUs, 8GB RAM (32GB in Server 2003 32-bit; 64GB on 64-bit)
6. Windows 2000/Server 2003 Datacenter Edition
   • Process Control Manager
   • Licensed for 32 CPUs, 64GB RAM (128GB on 64-bit edition)
…But one OS
• Through Windows 2000, core operating system executables are identical
  • NTOSKRNL.EXE, HAL.DLL, xxxDRIVER.SYS, etc.
  • XP & Server 2003 have different kernel versions, but not substantially different
• Registry indicates system type (set at install time)
  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ProductOptions
    • ProductType: WinNT=Workstation, ServerNT=Server not a domain controller, LanManNT=Server that is a Domain Controller
    • ProductSuite: indicates type of Server (Advanced, Datacenter, or for Windows NT 4.0: Enterprise Edition, Terminal Server, …)
• Code in the operating system tests these values and behaves slightly differently in a few places
  • Licensing limits (number of processors, number of inbound network connections, etc.)
  • Boot-time calculations (mostly in the memory manager)
  • Default length of time slice
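The ProductType values listed above map directly onto three system roles. The sketch below decodes a value that is assumed to have already been read from the registry as a string; it does not call any registry API, the names are hypothetical, and the exact-case comparison is a simplification.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    PRODUCT_UNKNOWN,
    PRODUCT_WORKSTATION,        /* WinNT */
    PRODUCT_SERVER,             /* ServerNT: server, not a domain controller */
    PRODUCT_DOMAIN_CONTROLLER   /* LanManNT: server that is a domain controller */
} product_kind;

/* Decode a ProductType string into a role. */
product_kind decode_product_type(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "WinNT") == 0)
        return PRODUCT_WORKSTATION;
    if (strcmp(value, "ServerNT") == 0)
        return PRODUCT_SERVER;
    if (strcmp(value, "LanManNT") == 0)
        return PRODUCT_DOMAIN_CONTROLLER;
    return PRODUCT_UNKNOWN;
}
```

Code that varies its behavior by system type, such as a default time slice choice, could then switch on the returned role instead of comparing strings in several places.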

NTOSKRNL.EXE
• Core operating system image
  • Contains Executive and Kernel
• Kernel versions
  • Windows NT 4.0 is 4.0 (client and server)
  • Windows 2000 is 5.0 (client and server)
  • Windows XP is 5.1 (client only)
  • Windows Server 2003 is 5.2 (server only)
• Kernel evolution
  • NT4->Windows 2000: significant change
  • Windows 2000->Windows XP: modest change
  • Windows XP->Server 2003: minimal change

NTOSKRNL Variants

• Four variations:
  • 4GB or less
      NTOSKRNL.EXE    Uniprocessor
      NTKRNLMP.EXE    Multiprocessor
  • >4GB (new as of Windows 2000)
      NTKRNLPA.EXE    Uniprocessor w/extended addressing support
      NTKRPAMP.EXE    Multiprocessor w/extended addressing support
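The four variants form a two-by-two table keyed by processor count and by whether extended addressing (more than 4GB) is needed. The sketch below encodes that table; it is a simplified picture rather than Setup's actual logic, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Pick the kernel image name for a machine with `cpu_count` processors;
 * `extended_addressing` is nonzero when more than 4GB must be addressed. */
const char *select_kernel_image(int cpu_count, int extended_addressing)
{
    assert(cpu_count >= 1);
    if (extended_addressing)
        return cpu_count > 1 ? "NTKRPAMP.EXE" : "NTKRNLPA.EXE";
    return cpu_count > 1 ? "NTKRNLMP.EXE" : "NTOSKRNL.EXE";
}

/* Nonzero when `image` names one of the two multiprocessor variants. */
int is_multiprocessor_image(const char *image)
{
    assert(image != NULL);
    return strcmp(image, "NTKRNLMP.EXE") == 0 ||
           strcmp(image, "NTKRPAMP.EXE") == 0;
}
```

For instance, a 4-CPU machine that needs extended addressing maps to NTKRPAMP.EXE, while a single-CPU machine with 4GB or less maps to NTOSKRNL.EXE.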


HAL – Hardware Abstraction Layer


• Responsible for a small part of “hardware abstraction”
  • Components on the motherboard not handled by drivers
    • System timers, cache coherency, and flushing
    • SMP support, hardware interrupt priorities
• Subroutine library for the kernel and device drivers
  • Isolates OS & drivers from platform-specific details
  • Presents uniform model of I/O hardware interface to drivers
• Reduced role in Windows 2000
  • Bus support moved to bus drivers
  • Majority of HALs are vendor-independent

NTOSKRNL And HAL Selection
• Selected at installation time
  • See \windows\repair\setup.log to find out which one
• Can select manually at boot time with /HAL= in boot.ini

[Figure: NT Setup copies a kernel image (NTOSKRNL.EXE, NTKRNLPA.EXE, NTKRNLMP.EXE, or NTKRPAMP.EXE) and a HAL (HAL.DLL, HALACPI.DLL, etc.) from the NT distribution CD-ROM:\i386 to \Windows\System32 on the boot partition, where they appear as NTOSKRNL.EXE, NTKRNLPA.EXE, and HAL.DLL (see \windows\repair\setup.log)]


NTOSKRNL And HAL Selection


• NTOSKRNL & HAL considered to be the “device drivers” for the “computer”
  • Go to Control Panel->System, Hardware tab
  • Click on “Device Manager”
  • Click on “Computer”
  • Right click/Properties on “driver” for PC

[Screen snapshot from: Control Panel | System | Hardware | Device Manager | Computer properties | Driver Details]
Debug Version “Checked Build”
• Special debug version of system called “Checked Build”
  • Provided with MSDN
  • Primarily for driver testing, but can be useful for catching timing bugs in multithreaded applications
• Built from same source files as “free build” (a.k.a. “retail build”)
  • “DBG” compile-time symbol defined which enables:
    • Error tests for “can’t happen” conditions in kernel mode (ASSERTs)
    • Validity checks on arguments passed from one kernel mode routine to another

      #if DBG    /* DBG is 1 in the checked build, 0 in the free build */
      if (something that should never happen has happened)
          KeBugCheckEx(…);
      #endif

  • Multiprocessor kernel (of course, runs on UP systems)
• Since no checked Server CD provided, can copy checked NTOSKRNL, HAL, to a normal Server system
  • Select debug kernel and HAL with Boot.ini /KERNEL=, /HAL= switches
  • See Knowledge Base article 314743 (HOWTO: Enable Verbose Debug Tracing in Various Drivers and Subsystems)

Kernel Architecture
• Process Execution Environment
• Architecture Overview
• Interrupt Handling & Time Accounting
• System Threads
• Process-based code
• Summary

Interrupt Dispatching
An interrupt arrives while user or kernel mode code is running and is handled in kernel mode. Note, no thread or process context switch!

Interrupt dispatch routine:
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine (called in step 4):
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller


Interrupt Precedence Via IRQLs


• IRQL = Interrupt Request Level
  • The “precedence” of the interrupt with respect to other interrupts
  • Different interrupt sources have different IRQLs
  • Not the same as IRQ
• IRQL is also a state of the processor
  • Servicing an interrupt raises processor IRQL to that interrupt’s IRQL
  • This masks subsequent interrupts at equal and lower IRQLs
• User mode is limited to IRQL 0
• No waits or page faults at IRQL >= DISPATCH_LEVEL

IRQL   Name                        Kind
31     High                        Hardware interrupts
30     Power fail                  Hardware interrupts
29     Interprocessor Interrupt    Hardware interrupts
28     Clock                       Hardware interrupts
...    Device n ... Device 1       Hardware interrupts
2      Dispatch/DPC                Deferrable software interrupts
1      APC                         Deferrable software interrupts
0      Passive                     Normal thread execution
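The masking rule above can be modeled with one integer per processor. The following C sketch is a toy simulation, not kernel code: the names are hypothetical and the level constants simply mirror the table. An interrupt is deliverable only when its IRQL is strictly above the processor's current IRQL, and waiting is allowed only below the dispatch level.

```c
#include <assert.h>

enum {
    IRQL_PASSIVE = 0,
    IRQL_APC = 1,
    IRQL_DISPATCH = 2,
    IRQL_CLOCK = 28,
    IRQL_HIGH = 31
};

/* Simulated per-processor state: just the current IRQL. */
typedef struct {
    int irql;
} sim_cpu;

/* Raise the processor IRQL and return the previous level. */
int raise_irql(sim_cpu *cpu, int new_irql)
{
    int old = cpu->irql;
    assert(new_irql >= old && new_irql <= IRQL_HIGH);
    cpu->irql = new_irql;
    return old;
}

/* Restore a previously saved (lower or equal) IRQL. */
void lower_irql(sim_cpu *cpu, int old_irql)
{
    assert(old_irql >= IRQL_PASSIVE && old_irql <= cpu->irql);
    cpu->irql = old_irql;
}

/* Interrupts at equal and lower IRQLs are masked. */
int interrupt_deliverable(const sim_cpu *cpu, int interrupt_irql)
{
    return interrupt_irql > cpu->irql;
}

/* No waits (or page faults) at IRQL >= dispatch level. */
int may_wait(const sim_cpu *cpu)
{
    return cpu->irql < IRQL_DISPATCH;
}
```

While the simulated processor is servicing the clock interrupt at level 28, a DPC at level 2 stays pending and only the levels above 28 can still interrupt; once the IRQL is lowered back to passive, the pending work becomes deliverable.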

Deferred Procedure Calls (DPCs)

• Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
  • Driver (usually ISR) queues request
    • One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
  • Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) completed
• Used heavily for driver “after interrupt” functions
  • Also used for quantum end and timer expiration

[Figure: per-CPU queue head linked to a chain of DPC objects]
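The queue head and chain of DPC objects can be sketched as a singly linked FIFO, one per CPU. This is a user-mode illustration with hypothetical names, not the kernel's data structures: the "ISR" queues a DPC object, and the queue is drained later, standing in for the moment when all higher-IRQL work has completed.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dpc_routine)(void *context);

/* One deferred call: a routine plus the context it will receive. */
typedef struct dpc_object {
    dpc_routine routine;
    void *context;
    struct dpc_object *next;
} dpc_object;

/* Per-CPU queue head. */
typedef struct {
    dpc_object *head;
    dpc_object *tail;
} dpc_queue;

/* Called from the simulated ISR: append the DPC to this CPU's queue. */
void queue_dpc(dpc_queue *queue, dpc_object *dpc)
{
    assert(queue != NULL && dpc != NULL && dpc->routine != NULL);
    dpc->next = NULL;
    if (queue->tail != NULL)
        queue->tail->next = dpc;
    else
        queue->head = dpc;
    queue->tail = dpc;
}

/* Run every queued DPC in FIFO order; returns how many were run. */
int drain_dpc_queue(dpc_queue *queue)
{
    int executed = 0;
    while (queue->head != NULL) {
        dpc_object *dpc = queue->head;
        queue->head = dpc->next;
        if (queue->head == NULL)
            queue->tail = NULL;
        dpc->routine(dpc->context);
        executed++;
    }
    return executed;
}

/* Example deferred routine: counts how often it was called. */
void count_call(void *context)
{
    (*(int *)context)++;
}
```

Queuing two objects that both point at `count_call` leaves the counter untouched until `drain_dpc_queue` runs, which mirrors how the ISR returns quickly and the bulk of the work happens afterwards at dispatch level.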


Interrupt Time Accounting


• Time spent servicing interrupts is NOT charged to the interrupted thread
  • Time spent at IRQL 2 appears as “% DPC time”
  • Time spent at IRQL >2 appears as “% interrupt time”
  • Hence no process appears to be running
• What if system is not idle, but no process appears to be running?
  • Must be due to interrupt-related activity
• Performance counters (Processor object):
  • % Interrupt time: time spent processing hardware interrupts
  • % DPC time: software generated interrupts
  • Can also look at Interrupts/sec & DPCs Queued/sec

Time Accounting Quirks
• Looking at total CPU time for each process may not reveal where system has spent its time
• CPU time accounting is driven by programmable interrupt timer
  • Normally 10 msec (15 msec on some MP Pentiums)
• Thread execution and context switches between clock intervals NOT accounted
  • E.g., one or more threads run and enter a wait state before clock fires
  • Thus threads may run but never get charged
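The quirk above is easy to reproduce in a small simulation. The sketch below assumes a simplified accounting rule, namely that each clock tick charges one full interval to whichever thread was running when the tick fired; the names, and the use of microseconds, are choices made for this illustration only.

```c
#include <assert.h>

/* One stretch of execution: which thread ran, from when, for how long. */
typedef struct {
    int thread_id;
    long start_us;
    long length_us;
} run_slice;

/* Charge one full tick to the thread that was running when each tick fired.
 * charged_us must have room for thread_count entries. */
void charge_by_ticks(const run_slice *slices, int slice_count,
                     long tick_us, long total_us,
                     long *charged_us, int thread_count)
{
    long t;
    int i;

    assert(tick_us > 0);
    for (i = 0; i < thread_count; i++)
        charged_us[i] = 0;

    for (t = tick_us; t <= total_us; t += tick_us) {
        for (i = 0; i < slice_count; i++) {
            long end_us = slices[i].start_us + slices[i].length_us;
            if (slices[i].start_us < t && t <= end_us) {
                assert(slices[i].thread_id >= 0 &&
                       slices[i].thread_id < thread_count);
                charged_us[slices[i].thread_id] += tick_us;
                break;
            }
        }
    }
}
```

If thread 1 runs for 2 ms at the start of every 10 ms interval and then waits, while thread 0 runs for the remaining 8 ms, every tick lands on thread 0: thread 1 is charged nothing even though it consumed 20% of the processor.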


Kernel Architecture
• Process Execution Environment
• Architecture Overview
• Interrupt Handling & Time Accounting
• System Threads
• Process-based code
• Summary

System Threads
• Functions in OS and some drivers that need to run as real threads
  • E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work
  • Always run in kernel mode
  • Preemptible (unless they raise IRQL to 2 or above)
  • For details, see DDK documentation on PsCreateSystemThread
• What process do they appear in?
  • “System” process (Windows NT 4.0: PID 2, Windows 2000: PID 8, Windows XP: PID 4)
  • In Windows 2000 and XP, windowing system threads (from Win32k.sys) appear in “csrss.exe” (Win32 subsystem process)


Examples Of System Threads


• Memory Manager
  • Modified Page Writer for mapped files
  • Modified Page Writer for paging files
  • Balance Set Manager
  • Swapper (kernel stack, working sets)
  • Zero page thread (thread 0, priority 0)
• Security Reference Monitor
  • Command Server Thread
• Network
  • Redirector and Server Worker Threads
• Threads created by drivers for their exclusive use
  • Examples: Floppy driver, parallel port driver
• Pool of Executive Worker Threads
  • Used by drivers, file systems, …
  • Accessed via ExQueueWorkItem

Understanding System Threads

• Later we’ll see how to determine which system thread is running when the System process is consuming CPU time…


Kernel Architecture
• Process Execution Environment
• Architecture Overview
• Interrupt Handling & Time Accounting
• System Threads
• Process-based code
• Summary

Process-Based Code
• OS components that run in separate executables (.exes), in their own processes
  • Started by system
  • Not tied to a user logon
• Three types
  • Environment subsystems (already described)
  • System startup processes
    • Note: “system startup processes” is not an official Microsoft defined name
  • Win32 Services
• Let’s examine the system process “tree”
  • Use Tlist /T or Process Explorer

Process-Based NT Code
System Startup Processes
• First two processes aren’t real processes
  • Not running a user mode .EXE
  • No user-mode address space
  • Different utilities report them with different names
  • Data structures for these processes (and their initial threads) are “pre-created” in NtosKrnl.Exe and loaded along with the code

(Idle)      Process id 0
            Part of the loaded system image
            Home for idle thread(s) (not a real process nor real threads)
            Called “System Process” in many displays
(System)    Process id 2 (8 in Windows 2000; 4 in XP)
            Part of the loaded system image
            Home for kernel-defined threads (not a real process)
            Thread 0 (routine name Phase1Initialization) launches the first “real” process, running smss.exe...
            ...and then becomes the zero page thread

Process-Based NT Code
System Startup Processes
smss.exe        Session Manager
                The first “created” process
                Takes parameters from \HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager
                Launches required subsystems (csrss) and then winlogon
csrss.exe       Win32 subsystem
winlogon.exe    Logon process: launches services.exe & lsass.exe; presents first login prompt
                When someone logs in, launches apps in \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
services.exe    Service Controller; also, home for many NT-supplied services
                Starts processes for services not part of services.exe (driven by \Registry\Machine\System\CurrentControlSet\Services)
lsass.exe       Local Security Authentication Server
userinit.exe    Started after logon; starts Explorer.exe (see \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell) and exits (hence Explorer appears to be an orphan)
explorer.exe    It and its children are the creators of all interactive apps

Win32 Services
• An overloaded generic term
• A process created and managed by the Service Control Manager (Services.exe)
  • E.g. Solitaire can be configured as a service, but is killed shortly after starting
• Similar in concept to Unix daemon processes
  • Typically configured to start at boot time (if started while logged on, survive logoff)
  • Typically do not interact with the desktop
• Note: Prior to Windows 2000 this was one way to start a process on a remote machine (now you can do it with WMI)
Life Of A Service
• Install time
  • Setup application tells Service Controller about the service
• System boot/initialization
  • SCM reads registry, starts services as directed
• Management/maintenance
  • Control panel can start and stop services and change startup parameters

[Figure: the Setup Application calls CreateService; the Service Controller/Manager (Services.Exe) works with the Registry and starts the Service Processes; the Control Panel talks to the Service Controller]

Mapping Services to Service Processes
• Service properties displayed through Control Panel (services.msc) show name of .EXE
  • But not which process started services are in
• Tlist /S or Tasklist /svc (new as of XP) list internal name of services inside service processes
• Process Explorer shows both internal and external name

Services Infrastructure Improvements
• Two new less privileged accounts for built-in services
  • LOCAL SERVICE, NETWORK SERVICE
  • Fewer rights than LocalSystem
    • Reduces possibility of damage if system compromised
• More services run in generic service host process (svchost.exe)
  • Reduces number of processes
  • Four instances (at least)
    • SYSTEM
    • SYSTEM (2nd instance, for RPC)
    • LOCAL SERVICE
    • NETWORK SERVICE
  • Later we’ll see how to determine WHICH service is consuming CPU time when a multi-service process is running

Logon Process
1. Winlogon sends username/password to Lsass
  • Either on local system for local logon, or to Netlogon service on a domain controller for a network logon
  • Windows XP enhancement: Winlogon doesn’t wait for Workstation service to start if:
    • Account doesn't depend on a roaming profile
    • Domain policy that affects logon hasn't changed since last logon
2. Creates a process to run HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
  • By default: Userinit.exe
  • Runs logon script, restores drive-letter mappings, starts shell
3. Userinit creates a process to run HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell
  • By default: Explorer.exe
• There are other places in the Registry that control programs that start at logon

Processes Started at Logon
• Autoruns (Sysinternals)
  • Displays order of processes configured to start at logon time
• Also can use new XP built-in tool called “System Configuration Utility” (Msconfig, in \Windows\pchealth\helpctr\binaries)
  • To run, click on Start->Help, then “Use Tools…”, then System Configuration Utility
  • Only shows what’s defined to start vs Autoruns which shows all places things CAN be defined to start


Kernel Architecture
• Process Execution Environment
  • Architecture Overview
  • Interrupt Handling & Time Accounting
  • System Threads
  • Process-based code
  • Summary

Kernel Architecture
[Diagram: Windows kernel architecture]
User mode:
• System Processes: Service Control Mgr., LSASS, WinLogon, Session Manager
• Services: SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe, Services.Exe
• Applications: Task Manager, Explorer, User Application
• Environment Subsystems: POSIX, OS/2, Win32
• Subsystem DLLs
• NTDLL.DLL
Kernel mode:
• System Threads
• System Service Dispatcher (kernel mode callable interfaces)
• I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call
• Win32 USER, GDI
• Device & File Sys. Drivers, Graphics Drivers
• Kernel
• Hardware Abstraction Layer (HAL)
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc., etc.)
Original copyright by Microsoft Corporation. Used by permission.

Four Contexts For Executing Code


• Full process and thread context
  • User applications
  • Win32 Services
  • Environment subsystem processes
  • System startup processes
• Have thread context but no “real” process
  • Threads in “System” process
• Routines called by other threads/processes
  • Subsystem DLLs
  • Executive system services (NtReadFile, etc.)
  • GDI32 and User32 APIs implemented in Win32K.Sys (and graphics drivers)
• No process or thread context (“arbitrary thread context”)
  • Interrupt dispatching
  • Device drivers
Core Kernel System Files
• Kernel32.Dll, Gdi32.Dll, User32.Dll
  • Export Win32 entry points
• NtDll.Dll
  • Provides user-mode access to system-space routines
  • Also contains heap manager, image loader, thread startup routine
• NtosKrnl.Exe (or NtkrnlMp.Exe)
  • Executive and kernel
  • Includes most routines that run as threads in “system” process
• Win32K.Sys
  • The loadable module that includes the now-kernel-mode Win32 code (formerly in csrss.exe)
• Hal.Dll
  • Hardware Abstraction Layer
• drivername.Sys
  • Loadable kernel drivers

End of Kernel Architecture

Next: Process & Thread Troubleshooting

Windows Internals and
Advanced Troubleshooting
Part 2: Troubleshooting Processes &
Threads


Agenda

Introduction to Tools
Identifying the Process
Analyzing Process/Thread Activity
Application Failures

Tools for Obtaining Process &
Thread Information
Many overlapping tools (most show one item the others do not)
Built-in tools in Windows 2000/XP:
Task Manager, Performance Tool
Tasklist (new in XP)
Support Tools
pviewer - process and thread details (GUI)
pmon - process list (character cell)
tlist - shows process tree and thread details (character cell)
Resource Kit tools:
apimon - system call and page fault monitoring (GUI)
oh – display open handles (character cell)
pviewer - processes and threads and security details (GUI)
ptree – display process tree and kill remote processes (GUI)
pulist - lists processes and usernames (character cell)
pstat - process/threads and driver addresses (character cell)
qslice - can show process-relative thread activity (GUI)
Tools from www.sysinternals.com
Process Explorer – super Task Manager – shows open files, loaded DLLs, security info,
etc.
Pslist – list processes on local or remote systems
Ntpmon - shows process/thread create/deletes (and context
switches on MP systems only)
Listdlls - displays full path of EXE & DLLs loaded in each process

Tools We’ll Look At


Task Manager – see what’s using CPU
Process Explorer (Procexp) – view
process details
Filemon – monitors file I/O
Regmon – monitors registry I/O
Pssuspend – suspends a process
Strings – dumps printable strings in files

Agenda

Introduction & Data Structures


Identifying the Process
Analyzing Process/Thread Activity
Application Failures


The CPU Is Busy – Why?


System is busy
(may be slow)
What is running?
A user or system
process?
Interrupt activity?
What’s it doing?
File I/O? Network
I/O? Registry
calls?
Application code?

Which Process Is Running?
Determine which process’
threads are consuming
the most CPU time
Quick method:
Open Task Manager
->Processes
Sort processes by “CPU”
usage column
Other tools
Qslice.exe (Resource Kit)
Performance Monitor
(monitor %Processor Time
counter in process object
for all processes)

Task Manager: Applications vs. Processes
Applications tab: List of top level visible windows
Windows are owned by threads
Right-click on a window and select “Go to process”
“Running” means waiting for window messages
Processes tab: List of processes
Can configure with View->Select columns
Dealing with a CPU Hog

Option 1: Kill the process


Option 2: Lower the priority
Option 3: Suspend the process with PsSuspend
Another use: you’ve started a long running job but
want to pause it to do something else
Lowering the priority still leaves it running…
Option 4: Try and figure out what it’s doing using
monitoring tools explained later in this talk


Identify The Image


Once you’ve found the process of interest,
what is it?
Sometimes name of .EXE identifies clearly
(e.g., Winword.exe)
Often, it doesn’t since Task Manager doesn’t
show the full path of the image
We need more information!

Process Explorer (Sysinternals)
“Super Task Manager”
Shows full image path, command line, environment
variables, parent process, security access token, open
handles, loaded DLLs & mapped files


Process Explorer
Process tree
If left justified, parent has exited
Disappears if you sort by any column
Bring back with View->Show Process Tree
Additional details in process list
Icon and description (from .EXE)
User Name shows which security database the account is from (e.g. which domain)
Highlight Own, Services Processes
Difference highlighting
Green: new, Red: gone
View->Update speed->Paused
Process Tree
System keeps track of
parent/child relationship
What if parent exits?
System only keeps track of
parent PID
If parent exits, no way to
find its ancestors (without a
trace of process creations)
Process Explorer shows
orphans left justified
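
The parent-PID bookkeeping described on this slide can be sketched in a few lines of Python. This is only an illustration: the `(pid, parent_pid, name)` tuples and the `build_tree` helper are hypothetical, not the output or API of any real tool. A process whose recorded parent PID is no longer among the live processes is treated as an orphan and placed at the root.

```python
from collections import defaultdict

def build_tree(processes):
    """Group processes under their parent PID and return (roots, children).

    processes: iterable of (pid, parent_pid, name) tuples (made-up sample data).
    A process is a root (orphan) when its parent PID is not among the live PIDs.
    """
    live = {pid for pid, _, _ in processes}
    children = defaultdict(list)
    roots = []
    for pid, ppid, _name in processes:
        if ppid in live and ppid != pid:
            children[ppid].append(pid)
        else:
            roots.append(pid)
    return roots, dict(children)

# Hypothetical snapshot: PID 500 (userinit.exe) has already exited.
sample = [
    (4, 0, "System"),
    (600, 500, "explorer.exe"),  # parent 500 is gone, so this is an orphan
    (700, 600, "notepad.exe"),
    (710, 600, "cmd.exe"),
]
roots, children = build_tree(sample)
```

Because only the parent PID is stored, nothing in this data can recover who started PID 500, which matches the point that ancestors are lost once a parent exits.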


Process Properties
Image tab:
Description, company name, version
(from .EXE)
Full image path
Command line used to start process
Current directory
Parent process
User name
Start time
Performance tab:
Basic process CPU/memory usage
Security tab:
Access token (groups list, privilege list)
Environment tab: environment
variables
Services tab (only for service
processes):
List of services hosted by process

Process Explorer Lab
1. Run Process Explorer
2. Sort on first column (“Process”) and note tree
view disappears
3. Click on View->Show Process Tree to bring it
back
4. Change update speed to paused
5. Run Notepad
6. In ProcExp, hit F5 and notice new process
7. Find value of PATH environment variable in
Notepad
8. Exit Notepad
9. In ProcExp, hit F5 and notice Notepad in red

Handle View
Lower half of display shows either:
Open handles
Loaded DLLs & mapped files
Handle View
Sort by handle
Objects of type “File” and “Key” are most
interesting for general troubleshooting

Uses of Handle View
Solve file locked errors
Use the search feature to determine what
process is holding a file or directory open
Can even close open files (be careful!)
Understand resources (e.g. files) used by
an application
Detect handle leaks using refresh
difference highlighting


DLL View
Click on View->DLL View
Shows more than just loaded DLLs
Includes .EXE and any “memory mapped files”
High speed file access mechanism
Makes file appear as virtual memory
Uses:
Detect DLL versioning problems
Compare the output from a working process with that of a
failing one (use File->Save As)
Find which processes are using a specific DLL
(search for it)
Show Relocated DLLs option
Highlights relocated DLLs in yellow
Identifying Processes
Often, name of process .EXE doesn’t
clearly identify what it is
Check “Description” column in Process Explorer
Taken from .EXE header


Identify The Image


Sometimes description is not meaningful

Check full path of .EXE with Process Explorer
Often pinpoints which product
Identify The Image
Often, applications are
installed in
\Windows\System32
Or in folders with
unrecognizable names
Check company name
or copyright
Process Explorer: double
click on process
Explorer->right-click,
properties on .EXE


Identify The Image


What if image properties say
nothing?
Examine open handles
Open files or registry keys may
give a clue

Identifying Processes
If you still don’t know what the EXE is, run
Strings on it
Dumps printable strings in binary
Need to run twice
No switches dumps Unicode strings
“–a” switch dumps ANSI strings
Printable strings may yield clues
Registry keys
Help/error message text
May also need to dump DLLs used by process
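
To make the idea of "printable strings in a binary" concrete, here is a minimal Python sketch. It is not how the Sysinternals Strings tool is implemented; the `printable_strings` helper, the minimum length of 4, and the sample bytes are assumptions chosen for illustration. It scans once for ANSI runs and once for UTF-16LE ("Unicode") runs, mirroring the two passes mentioned above.

```python
import re

def printable_strings(data, min_len=4):
    """Return (ansi, wide) lists of printable strings found in a bytes object."""
    # ANSI: runs of printable ASCII bytes.
    ansi_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE: printable ASCII characters, each followed by a zero byte.
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ansi = [m.group().decode("ascii") for m in ansi_pat.finditer(data)]
    wide = [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return ansi, wide

# Made-up stand-in for the contents of an .EXE file.
blob = (
    b"\x00\x01Software\\Vendor\x00\xff"
    + "Error: file missing".encode("utf-16-le")
    + b"\x00\x00\x02ab"
)
ansi, wide = printable_strings(blob)
```

In practice the bytes would come from reading the file in binary mode; registry paths and error text recovered this way are the kind of clues the slide refers to.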

Agenda

Introduction & Data Structures


Identifying the Process
Analyzing Process/Thread Activity
Application Failures

Multi-service Processes
Some processes host multiple services
E.g. Svchost.exe, Inetinfo.exe (IIS)
If still not clear what process is doing,
need to peer inside process and examine
which thread(s) are running and what
code they are executing
With Performance Monitor, monitor
%Processor Time for threads inside
a process
Find thread(s) consuming CPU time

Analyzing Thread Activity


Then try and determine what code they are
executing by finding which code module
the thread started in:
1. Get thread start address with Tlist
2. With Process Explorer DLL view, sort by
base address and find in which module the
address lies
Can also do this with Tlist
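
Step 2 above (sort by base address, then find the module the address lies in) can be sketched as a small lookup. The module names, base addresses, and sizes below are invented sample values, and `module_for_address` is a hypothetical helper, not part of Tlist or Process Explorer.

```python
from bisect import bisect_right

def module_for_address(modules, address):
    """Return the name of the module containing address, or None.

    modules: list of (base_address, size, name) tuples.
    """
    ordered = sorted(modules)
    bases = [base for base, _, _ in ordered]
    # Index of the last module whose base is <= address.
    i = bisect_right(bases, address) - 1
    if i < 0:
        return None
    base, size, name = ordered[i]
    return name if address < base + size else None

# Invented load map for illustration only.
sample_modules = [
    (0x77E60000, 0x0E6000, "kernel32.dll"),
    (0x01000000, 0x013000, "notepad.exe"),
    (0x77F50000, 0x0A7000, "ntdll.dll"),
]
owner = module_for_address(sample_modules, 0x77E7A000)
```

The size check matters: an address that falls in the gap after one module and before the next belongs to neither, so the sketch returns `None` rather than guessing.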

Analyzing Thread Activity
Start address may not be enough
May need to look at call stack
Can attach with Windbg or Ntsd and issue
“k” command
Caution: pre-XP, exiting the debugger kills the debuggee if it was a real debugger attachment
Attach “noninvasive”
Freezes threads while connected
Allows viewing information in process, but not
changing data


Analyzing Call Stacks

With Windbg, click on File->Attach to Process
Then View->Call Stack
Then View->Processes and Threads
Select thread of interest

Call Stacks
If not obvious from
function names, note
name of DLL and look
at description in
Process Explorer
Run Strings
(Sysinternals) on DLL
or EXE


Examining System Threads


If System threads are consuming CPU time,
cannot use WinDbg to attach to process and
examine user stack
System threads always run in kernel mode
No user stack
Need to find out what code is running, since it
could be any one of a variety of components
Memory manager modified page writer
Swapper
File server worker threads

Examining System Threads
With user-mode tools:
1. PerfMon: monitor %Processor time for
each thread in System process
2. Determine which thread(s) are running
3. From this, get “Start address” (address
of thread function) in Pviewer
4. Run pstat to find which driver thread
start address falls in
Look for what driver starts near the thread
start address


Examining System Threads


With Kernel Debugger:
ln (“List Near”) <startaddress> will give name of
driver and function
Use !process or !thread to see kernel stack
lkd> ln 8061adb8
(8061adb8) nt!MiModifiedPageWriter | (8061af38)
lkd> !process 4

THREAD 816113e0 Cid 8.50 WAIT: (Executive) KernelMode Non-Alertable
f5c67d70 NotificationTimer
80482540 SynchronizationEvent
Start Address nt!KeBalanceSetManager (0x804634e0)
Stack Init f5c68000 Current f5c67cc0 Base f5c68000 Limit f5c65000 Call 0
ChildEBP RetAddr Args to Child
f5c67cd8 8042d5a3 ffffffff ff676980 00000000 nt!KiSwapThread+0xc5
f5c67d0c 8046355e 00000002 f5c67d98 00000001 nt!KeWaitForMultipleObjects+0x266
f5c67da8 80454faf 00000000 00000000 00000000 nt!KeBalanceSetManager+0x7e
f5c67ddc 80468ec2 804634e0 00000000 00000000 nt!PspSystemThreadStartup+0x69
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
Agenda

Introduction & Data Structures


Identifying the Process
Analyzing Process/Thread Activity
Application Failures


Troubleshooting Application Failures

Most applications do a poor job of reporting file-related or registry-related errors
E.g. permissions problems
Missing files
Missing or corrupt registry data

Troubleshooting Application Failures

When in doubt, run Filemon and Regmon!


Filemon monitors File I/O; Regmon monitors
registry I/O
Ideal for troubleshooting a wide variety of
application failures
Also useful to understand and tune file system access
E.g. understanding hard drive activity
Work on all Windows® OSs
Used extensively within Microsoft

Using Regmon/Filemon
Two basic techniques:
Go to end of log and look backwards to where problem occurred or is evident, and focus on the last things done
Compare a good log with a bad log
Often comparing the I/O and Registry activity of a
failing process with one that works may point to
the problem
Have to first massage log file to remove data that differs
run to run
Delete first 3 columns (they are always different: line #, time,
process id)
Easy to do with Excel by deleting columns
Then compare with FC (built in tool) or Windiff
(Resource Kit)
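
The "massage, then compare" technique can also be scripted instead of using Excel and FC. The sketch below assumes a tab-separated log whose first three columns are the volatile ones named above (line #, time, process id); that layout, the helper names, and the sample rows are assumptions for illustration, not the exact Filemon or Regmon export format.

```python
import difflib

def strip_volatile_columns(lines, drop=3, sep="\t"):
    """Remove the first `drop` columns from each row of a saved log."""
    return [sep.join(line.rstrip("\n").split(sep)[drop:]) for line in lines]

def compare_logs(good_lines, bad_lines):
    """Return unified-diff lines between two logs after stripping volatile columns."""
    good = strip_volatile_columns(good_lines)
    bad = strip_volatile_columns(bad_lines)
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))

# Made-up rows: #, time, pid, request, path, result.
good_log = [
    "1\t10:00:01\t812\tOPEN\tC:\\app\\a.dll\tSUCCESS",
    "2\t10:00:02\t812\tOPEN\tC:\\app\\b.dll\tSUCCESS",
]
bad_log = [
    "1\t11:30:05\t1440\tOPEN\tC:\\app\\a.dll\tSUCCESS",
    "2\t11:30:06\t1440\tOPEN\tC:\\app\\b.dll\tNOTFOUND",
]
diff = compare_logs(good_log, bad_log)
```

With the volatile columns gone, only the row whose result differs shows up as a `-`/`+` pair in the diff.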

Filemon
# - operation number
Process: image name + process id
Request: internal I/O request code
Result: return code from I/O operation
Other: flags passed on I/O request


Controlling Filemon
Start/stop logging (Control/E)
Clear display (Control/X)
Open Explorer window to folder containing
file:
Double click on a line does this
Find – finds text within window
Save to log file
History depth
Advanced mode
Limiting Filemon Output
Can set filters for including, excluding, and
highlighting output


Filemon Lab 1
1. Run Filemon
2. Set filter to only include Notepad.exe
3. Run Notepad
4. Type some text
5. Save file as “test.txt”
6. Go back to Filemon
7. Stop logging
8. Set highlight to “test.txt”
9. Find line representing creation of new file
Hint: look for create operation
Filemon Example

While typing in the document Word XP closes without any prompts
Filemon log showed this:

User looked up what .LEX file was
Related to Word proofing tools
Uninstalled and reinstalled proofing tools & problem went away

Access Denied
Many applications don’t report access
denied errors well
Example: try to save a file with Notepad to a
folder you don’t have access to
Use Filemon to verify access denied
errors are not occurring on file opens
Check Result column

Example: Access Denied

AOL reported this error:

Filemon showed this:


waol.exe OPEN C:\PROGRA~1\AMERIC~1.0\IDB\main.ind ACCESS DENIED

User did not have admin rights to AOL directory


Example: Access Denied


For example, an application failed with this error:

Ran Filemon and found it was getting Access Denied

Someone had misread a request to remove EDIT rights and removed all rights
Hot File Analysis
Understand disk activity system-wide
Run Filemon for a period of time
Save output in a log file
Import into Excel and make a pie chart
by file name or operation type
Example: used Filemon on a server
to determine which file(s) were being
accessed most frequently
Moved these files to a different disk on a different controller
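
The counting step behind the pie chart can be sketched without Excel. This assumes the saved log has already been reduced to `(request, path)` pairs; the `hottest_files` helper and the sample rows are hypothetical.

```python
from collections import Counter

def hottest_files(rows, top=3):
    """Return the most frequently accessed paths as (path, count) pairs.

    rows: iterable of (request, path) pairs taken from a saved log.
    Paths are lower-cased because Windows file names are case-insensitive.
    """
    counts = Counter(path.lower() for _, path in rows)
    return counts.most_common(top)

# Made-up sample of logged operations.
sample_rows = [
    ("READ", "D:\\data\\orders.mdb"),
    ("WRITE", "D:\\data\\orders.mdb"),
    ("READ", "D:\\Data\\Orders.mdb"),
    ("READ", "D:\\logs\\app.log"),
    ("READ", "D:\\logs\\app.log"),
    ("OPEN", "C:\\boot.ini"),
]
hot = hottest_files(sample_rows, top=2)
```

Grouping by the request field instead of the path would give the "by operation type" view mentioned above.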

Locked Files
Attempting to open or delete a file that is
in use simply reports “file locked”
With Process Explorer search (in handle
view) you can determine what process is
holding a file or directory open
Can even close open files (be careful!)

Process Explorer Lab: Locked File
1. Run ProcExp
Click on View->Update speed – change to
Paused
2. Run Microsoft Word
3. Create a file called “test.doc” and save it
(but don’t close it)
4. From a command prompt try and delete
“test.doc” (should get file locked)
5. In ProcExp, hit F5 and then use Search to
find open handle to test.doc

Access Denied on Mapped Files

Attempting to delete a DLL or EXE that is in use gets “access denied”, not “file locked”
Can be misleading
In Process Explorer DLL View, search for
file
Example: try and delete Notepad.exe while
you’re running it

DLL Problems
DLL version mismatches can cause strange
application failures
Most applications do a poor job of reporting
DLL version problems
Process Explorer can help detect DLL
versioning problems
Compare the output from a working process
with that of a failing one (use File->Save As)


DLL Problems
But sometimes it’s the order of DLL loads
that clues you in, so use Filemon!
Missing DLLs often not reported correctly
Look for “NOTFOUND” or “ACCESS DENIED”
May be opening wrong versions due to files in
PATH
Look at the last DLL opened before the
application died
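
The two checks on this slide (failed DLL opens, and the last DLL opened before the application died) can be sketched over a saved log. The `(request, path, result)` row layout, the `dll_clues` helper, and the sample rows are assumptions for illustration; only the result strings "NOTFOUND" and "ACCESS DENIED" come from the slide.

```python
def dll_clues(rows):
    """Return (failed_dll_opens, last_dll_opened) from log rows in time order.

    rows: list of (request, path, result) tuples.
    """
    dll_rows = [r for r in rows if r[1].lower().endswith(".dll")]
    failed = [(path, result) for _, path, result in dll_rows
              if result in ("NOTFOUND", "ACCESS DENIED")]
    opened = [path for request, path, result in dll_rows
              if request == "OPEN" and result == "SUCCESS"]
    return failed, (opened[-1] if opened else None)

# Made-up log excerpt ending where the application died.
sample_log = [
    ("OPEN", "C:\\Windows\\System32\\ole32.dll", "SUCCESS"),
    ("OPEN", "C:\\App\\helper.dll", "NOTFOUND"),
    ("OPEN", "C:\\Tools\\helper.dll", "SUCCESS"),
    ("READ", "C:\\App\\settings.ini", "SUCCESS"),
    ("OPEN", "C:\\Windows\\System32\\spool\\drivers\\prn.dll", "SUCCESS"),
]
failed, last_dll = dll_clues(sample_log)
```

In this invented excerpt, `helper.dll` is missing from the application folder and is then found elsewhere, which is the "wrong version via PATH" pattern worth a closer look.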

Example Problem: Word Dies
Word97 starts and a few seconds later
gets a Dr. Watson (access violation)
Customer tried re-installing Office – still failed
Solution:
Ran Filemon, looked at last DLL loaded
before Dr. Watson
It was a printer DLL
Uninstalled printer – problem went away


Example Problem: Help Fails

The Help command in an application failed on Win95, but worked fine on Win98/ME/NT4/Win2000/WinXP
Failed with meaningless error message

Solution
Ran Filemon on failing system and working
system
Reduced log to file opens
Compared logs
At the point they diverged, looked backwards to
last common thing done
An OLE system DLL was loaded
Noticed this OLE DLL was loaded from a directory in
the user’s PATH on Win95, but from
\Windows\System on other versions
Conclusion:
DLL loaded on Win95 system was not for Win95
Got proper version for Win95, problem went away


Example Problem: Access Hangs

Problem: Access 2000 would hang when trying to import an Excel file
Worked fine on other users’ workstations
User had Access 97 and Access 2000
installed
Compared a Filemon log from the working
and failing system
Failing system was loading an old Access
DLL from \windows\system32 due to having
installed Access 97 previously
Removed DLL and problem went away

DLL Version Mismatch Lab
With Word XP installed in the default folder:
1. Go to folder: \Program Files\Microsoft Office\Office\1033
2. Rename MSO9INTL.DLL to “MSO9INTL.DLL1”
3. Copy OUTLLIBR.DLL to MSO9INTL.DLL
4. Try and start Word
Send error report to Microsoft ☺
5. Use FileMon to confirm which DLL is likely
causing the problem

Configuration Problems
Missing, corrupted or overly-secure Registry
settings often lead to application crashes and
errors
Some applications don’t completely remove
registry data at uninstall
Regmon may yield the answer…

Regmon Output
Request: OpenKey, CreateKey, SetValue,
QueryValue, CloseKey
Path
HKCU=HKEY_CURRENT_USER (per-user settings)
HKLM=HKEY_LOCAL_MACHINE (system wide settings)
Result – return code from Registry operation
Other – extended information or results


Controlling Regmon
Start/stop logging (Control/E)
Clear display (Control/X)
Regedit jump (opens Registry Editor and
jumps directly to key)
Double clicking on a line does this
Filtering/Highlighting
Find
Save to log file

Regmon Filtering
Normally, registry activity should be only at
application/system startup and exit
But, sadly, lots of processes perform needless
registry querying…
Filtering options:
Process name or registry path (or partial name)
Success/failure, read/write


Regmon Lab 1
1. Run Regmon
2. Highlight Notepad.exe
3. Run Notepad
4. Change font to “Times New Roman”
5. Exit
6. Go back to Regmon
7. Stop logging
8. Find line showing storing of font name in
registry
Hint: search for “times”

Using Regmon

Identify missing Registry keys


Search for status “NOTFOUND”
Troubleshoot permission problems
Search for status “ACCESS DENIED”
Find incorrect or corrupt data
Examine values read and/or written (in
Other column)


Example Problem
Internet Explorer failed to start:

Solution:
Looked backwards from end of Regmon log
Last queries were to:
HKCU\Software\Microsoft\Internet Connection Wizard
Looked here and found a single value “Completed”
set to 0
Compared to other users—theirs was 1
Set this manually to 1 and problem went away

Regmon Applications
If you suspect registry data is causing
problems, rename the key and re-run the
application
Most applications re-create user settings
when run
In this way, the data won’t be seen by the
application
Can always rename the key back


Regmon Lab 2
1. Run Notepad
2. Change Font and point size
3. Enable Word wrap
4. Run Regmon & filter to Notepad.exe
5. Exit Notepad
6. In Regmon log, find location of user-specific
Notepad settings
7. Double click on a line to jump to Regedit
8. Delete top level Notepad user settings key
9. Re-run Notepad and confirm font and word
wrap reset to default setting
Example Problem
Internet Explorer hung when started
Default internet connection was set, but
wasn’t being dialed
Dialing the connection first manually and
then running IE worked
Background information:
User had previously installed the AT&T
Dialer program, but had uninstalled it and
created dial up connection manually


Solution
Ran Regmon
Looked backwards from end (at the point
IE was hung)
Found references to ATT under a
PhoneBook key
Renamed ATT key and problem went away
Conclusion: registry junk was left from
uninstall

Example Problem
User somehow disabled all toolbars and
menus in Word
No way to open files, change settings etc.
With Regmon, captured startup of Word
Found location of user-specific settings for
Word
Deleted this Registry key
Re-ran Word – menus and toolbars were
back!
Word re-created user settings from scratch

Filemon/Regmon as a Service
Sometimes need to capture I/O or registry
activity during the logon or logoff process
E.g. errors occurring during logon/logoff
Solution:
Run Filemon/Regmon with AT command
Install and run Filemon/Regmon as a service
Use Srvany tool from Resource Kit
In either case, the tools remain running after logoff

Analyzing Process Crashes

If you still can’t determine why a process is crashing, next step is to get a process dump to the developer
But, until XP, few knew there was a process
dump…


Process Crashes

Windows Error Reporting
On XP & Server 2003, when an unhandled
exception occurs:
System first runs DWWIN.EXE
DWWIN creates a process microdump and XML file and offers
the option to send the error report
Then runs debugger (Drwtsn32.exe)


Windows Error Reporting


Configurable with System Properties->Advanced->Error Reporting
HKLM\SOFTWARE\Microsoft\PCHealth\ErrorReporting
Configurable with group policies
HKLM\SOFTWARE\Policies\Microsoft\PCHealth
Dr. Watson
User message box
doesn’t mention most
important thing:
A dump file was created!
Can customize by
running
“DRWTSN32.EXE”
Note: servers default to
no visual notification
To set Dr. Watson as
default debugger:
Drwtsn32 -i

Dumping a Running Process


Instead of killing a hung process (leaving no
debug info), run Dr. Watson on it
Dr. Watson creates a crash dump file and then kills
process
drwtsn32 -p processid
Autodump (Debugging Tools) will snapshot a
process without killing it
E.g. a server process that is having problems on a
production system
Snapshot the process and debug offline
Determine if the process needs to be restarted or not

End of Troubleshooting Processes
& Threads

Next: Troubleshooting Memory Problems

Windows Internals and
Advanced Troubleshooting
Part 3: Troubleshooting Memory
Problems


Troubleshooting Memory
Problems
• System and process memory usage may degrade performance
  • Or eventually cause process failures
• How do you determine memory leaks?
  • Process vs. system?
• How do you know if you need more memory?
• How do you size your page file?
• What do system and process memory counters really mean?
• Understanding process and system memory information can help answer these questions…
Windows Memory Management
• Demand paged virtual memory
  • Unit of protection and usage is one page
    • x86: 4 KB
    • Itanium: 8 KB
  • Pages are read in on demand and written out when necessary (to make room for other memory needs)
• Provides illusion of flat virtual address space to each process
  • 32-bit: 4 GB, 64-bit: 16 Exabytes (theoretical)
• Supports up to 64 GB (32-bit systems) or 512 GB (64-bit systems) physical memory
• Intelligent, automatic sharing of memory
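
The page sizes and the 4 GB figure above lend themselves to a quick arithmetic check. The sketch below is illustrative only; `pages_spanned` and the `PAGE_SIZE` table are hypothetical names, and the page sizes are the ones listed on this slide.

```python
PAGE_SIZE = {"x86": 4 * 1024, "itanium": 8 * 1024}  # bytes, from the slide

def pages_spanned(address, length, arch="x86"):
    """Number of pages touched by the byte range [address, address + length)."""
    if length <= 0:
        return 0
    size = PAGE_SIZE[arch]
    first = address // size
    last = (address + length - 1) // size
    return last - first + 1

# A 32-bit virtual address space is 2**32 bytes, i.e. 4 GB.
address_space_32 = 2 ** 32
x86_pages_in_4gb = address_space_32 // PAGE_SIZE["x86"]
```

Since the page is the unit of protection and usage, a 2-byte access that straddles a page boundary touches two pages: `pages_spanned(4095, 2)` counts both on x86, while the same range fits in one 8 KB page on Itanium.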

Process Memory Usage

Process Memory Usage
• Process virtual size
  • By default, 2 GB on 32-bit Windows
  • 64-bit Windows: 7152 GB
  • Up to 3 GB with Windows Server 2003 Enterprise Edition (/USERVA= or /3GB)
    • Application must be marked large address space aware
• What limits total process virtual memory?
  • Page file size + (most of) physical memory
  • Called “Commit limit”
• What limits physical size of a process?
  • Physical memory + Memory Manager policies
  • Based on memory demands and paging rates


32-Bit Virtual Address Space (x86)
• 2 GB per-process
  • Address space of one process is not directly reachable from other processes
• 2 GB system-wide
  • The operating system is loaded here, and appears in every process’s address space
  • The operating system is not a process (though there are processes that do things for the OS, more or less in “background”)
• 3 GB user space and Address Windowing Extensions (AWE) t.b.d.

[Diagram: 32-bit virtual address space layout]
00000000-7FFFFFFF  Unique per process, accessible in user or kernel mode
                   Code: EXE/DLLs
                   Data: EXE/DLL static storage, per-thread user mode stacks, process heaps, etc.
80000000-BFFFFFFF  Code: NTOSKRNL, HAL, drivers
                   Data: kernel stacks
C0000000-FFFFFFFF  Process page tables, hyperspace (per process, accessible only in kernel mode)
                   File system cache, Non-paged pool, Paged pool (system wide, accessible only in kernel mode)

1-6

3GB Process Space Option
[Figure: address space layout with /3GB, low to high addresses]
00000000 - BFFFFFFF: Unique per process (= per appl.), user mode; accessible in user or kernel mode
.EXE code, globals, per-thread user mode stacks, .DLL code, process heaps
C0000000: Process page tables, hyperspace (per process, accessible only in kernel mode)
Up to FFFFFFFF: Exec, kernel, HAL, drivers, etc. (system wide, accessible only in kernel mode)
ƒ /3GB option in BOOT.INI
ƒ Provides up to 3 GB per-process address space
ƒ Windows Server 2003 supports variations from 2GB to 3GB (/USERVA=)
ƒ Restrictions to use:
ƒ Only available on Windows 2000 Advanced Server & Server 2003 Enterprise Edition
ƒ Limits memory to 16 GB
ƒ .EXE must have “large address space aware” flag in image header, or they’re limited to 2 GB (specify at link time or with imagecfg.exe in Resource Kit)
ƒ Better solution: address windowing extensions

1-7

64-Bit Virtual Address Space (Itanium)
Address                                Region
0                                      User-Mode User Space
6FC00000000                            Kernel-Mode User Space
1FFFFF0000000000                       User Page Tables
2000000000000000                       Session Space
3FFFFF0000000000                       Session Space Page Tables
E000000000000000 - E000060000000000    System Space
FFFFFF0000000000                       Session Space Page Tables

                      64-bit Windows   32-bit Windows
User Address Space    7152 GB          2 or 3 GB
System PTE Space      128 GB           2 GB
System Cache          1 TB             960 MB
Paged pool            128 GB           650 MB
Non-paged pool        128 GB           256 MB
1-8

Process Memory Usage: “Working Set”
ƒ Working set: All the physical pages “owned” by a
process
ƒ Essentially, all the pages the process can reference
without incurring a page fault
ƒ A process always starts with an empty working set
ƒ Pages itself into existence
ƒ XP prefetches pages to speed up application startup
ƒ Many page faults may be resolved from memory

[Figure: working set pages ordered from newer to older. PerfMon counter: Process “WorkingSet”]

1-9

Process Memory Information
Task Manager, Processes tab (numbers refer to callouts on the screen snapshot):
1. “Mem Usage” = physical memory used by process (working set size, not working set limit)
   Note: Shared pages are counted in each process
2. “VM Size” = private (not shared) committed virtual space in processes == potential pagefile usage
3. “Mem Usage” in status bar is not total of “Mem Usage” column (see later slide)
Screen snapshot from: Task Manager | Processes tab
1-10

Shared Memory
ƒ Like most modern OSs, Windows provides a way for processes to share memory
ƒ High speed IPC (used by LPC, which is used by RPC)
ƒ Threads share address space, but applications may be divided into multiple processes for stability reasons
ƒ Processes can also create shared memory sections
ƒ Called page file backed file mapping objects
ƒ Full Windows security
ƒ Windows shares memory automatically for shareable pages
ƒ E.g., code pages in an EXE or DLL
[Figure: the address spaces of Process 1 and Process 2 both map the same DLL code pages in physical memory]
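
A minimal sketch of explicit shared memory, assuming Python 3.8 or later. Python's `multiprocessing.shared_memory` module is used here only as a convenient stand-in: on Windows it creates a named, page file backed file mapping, which is the kind of section object described above. The function name `share_bytes` is made up for this example, and both views are opened in one process to keep the sketch short; a second process would attach by the same name.

```python
from multiprocessing import shared_memory


def share_bytes(payload: bytes) -> bytes:
    """Write payload through one view of a shared section and read it
    back through a second, independently opened view of the same section."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # A cooperating process would open the section by name like this.
        viewer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(viewer.buf[:len(payload)])
        finally:
            viewer.close()
    finally:
        owner.close()
        owner.unlink()  # no effect on Windows, releases the name elsewhere
```

Both views refer to the same physical pages, so no data is copied between them until `bytes(...)` takes a snapshot.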
1-11

Viewing the Working Set


ƒ Working set size counts shared pages in each
working set
ƒ Vadump (Resource Kit) can dump the breakdown
of private, shareable, and shared pages
C:\> vadump -o -p 3968
Module Working Set Contributions in pages
    Total   Private Shareable    Shared Module
       14         3        11         0 NOTEPAD.EXE
       46         3         0        43 ntdll.dll
       36         1         0        35 kernel32.dll
        7         2         0         5 comdlg32.dll
       17         2         0        15 SHLWAPI.dll
       44         4         0        40 msvcrt.dll
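
To see how much of this working set is shared, the Vadump rows above can be totalled. The sketch below simply hard-codes the six module rows from the slide; the 4 KB page size assumes x86, and the function name is hypothetical.

```python
# (total, private, shareable, shared, module) rows copied from the Vadump output.
VADUMP_ROWS = [
    (14, 3, 11, 0, "NOTEPAD.EXE"),
    (46, 3, 0, 43, "ntdll.dll"),
    (36, 1, 0, 35, "kernel32.dll"),
    (7, 2, 0, 5, "comdlg32.dll"),
    (17, 2, 0, 15, "SHLWAPI.dll"),
    (44, 4, 0, 40, "msvcrt.dll"),
]
PAGE_KB = 4  # x86 page size in KB


def summarize(rows):
    """Sum each column and check that every row's total is consistent."""
    totals = {"total": 0, "private": 0, "shareable": 0, "shared": 0}
    for total, private, shareable, shared, module in rows:
        if total != private + shareable + shared:
            raise ValueError(f"inconsistent row for {module}")
        totals["total"] += total
        totals["private"] += private
        totals["shareable"] += shareable
        totals["shared"] += shared
    return totals


summary = summarize(VADUMP_ROWS)
shared_kb = summary["shared"] * PAGE_KB
```

For these rows, most pages fall in the Shared column, which is why adding up per-process working set sizes overstates real physical memory use.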

1-12

Working Set Replacement
ƒ When process reaches working set maximum, must give up pages to make room for new pages
ƒ Pages given up are kept in memory on standby or modified page list
ƒ This is called a local page replacement policy (versus a global replacement policy common on Unix)
ƒ Means that a single process cannot take over all of physical memory unless other processes aren’t using it
ƒ Page replacement algorithm is least recently accessed
ƒ Windows 2000: only on uniprocessor; Windows XP and Server 2003: All systems

1-13

Paging Lists

1-14

Managing Physical Memory
ƒ System keeps unowned physical pages on
one of several lists
ƒ Free page list
ƒ Modified page list
ƒ Standby page list
ƒ Zero page list
ƒ Bad page list – pages that failed memory test at
system startup

1-15

Standby And Modified Page Lists

ƒ Modified pages go to modified (dirty) list


ƒ Avoids writing pages back to disk too soon
ƒ Unmodified pages go to standby (clean) list
ƒ They form a system-wide cache of “pages likely
to be needed again”
ƒ Pages can be faulted back into a process from the
standby and modified page list
ƒ These are counted as page faults, but not
page reads

1-16

Free And Zero Page Lists
ƒ Free Page List
ƒ Used for page reads
ƒ Private modified pages go here on process exit
ƒ Pages contain junk in them (e.g., not zeroed)
ƒ On most busy systems, this is empty
ƒ Zero Page List
ƒ Used to satisfy demand zero page faults
ƒ References to private pages that have not been created
yet
ƒ When free page list has 8 or more pages, a priority
zero thread is awoken to zero them
ƒ On most busy systems, this is empty too
1-17

Paging Dynamics
[Figure: page movement among working sets and the standby, modified, free, zero, and bad page lists. Labeled transitions: demand zero page faults; page read from disk or kernel allocations; “soft” page faults; modified page writer; zero page thread; “global valid” faults; working set replacement; private pages at process exit]
1-18

Memory Management Information
Task Manager, Performance tab (callout 6 on the screen snapshot):
ƒ “Available” = sum of free, standby, and zero page lists (physical)
ƒ Majority are likely standby pages
ƒ “System Cache” = size of standby list + size of system working set (file cache, paged pool, pageable OS/driver code & data)
Screen snapshot from: Task Manager | Performance tab
1-19

Viewing the Paging Lists

ƒ Only way to get actual size of physical memory lists is to use !memusage in Kernel Debugger
lkd> !memusage
loading PFN database

Zeroed:               0 (     0 kb)
Free:                 3 (    12 kb)
Standby:          98248 (392992 kb)
Modified:           563 (  2252 kb)
ModifiedNoWrite:      0 (     0 kb)
Active/Valid:     93437 (373748 kb)
Transition:           1 (     4 kb)
Unknown:              0 (     0 kb)
TOTAL:           192252 (769008 kb)
Screen snapshot from: kernel debugger !memusage command
1-20
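
The !memusage figures can be tied back to Task Manager's “Available” number (free + standby + zero). The sketch below hard-codes the page counts from the output above and assumes 4 KB x86 pages; the function name is made up for illustration.

```python
PAGE_KB = 4  # x86 page size in KB

# Page counts copied from the !memusage output.
MEMUSAGE_PAGES = {
    "Zeroed": 0,
    "Free": 3,
    "Standby": 98248,
    "Modified": 563,
    "ModifiedNoWrite": 0,
    "Active/Valid": 93437,
    "Transition": 1,
    "Unknown": 0,
}


def available_pages(pages):
    """'Available' memory = zero + free + standby page lists."""
    return pages["Zeroed"] + pages["Free"] + pages["Standby"]


total_pages = sum(MEMUSAGE_PAGES.values())
available_kb = available_pages(MEMUSAGE_PAGES) * PAGE_KB
```

On this system nearly all of the available memory is standby pages, consistent with the earlier remark that the free and zero lists are usually close to empty on a busy machine.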

Page Files

1-21

Page Files
ƒ What gets sent to the paging file?
ƒ Not code – only modified data (code can be re-read
from image file anytime)
ƒ When do pages get paged out?
ƒ Only when necessary
ƒ Page file space is only reserved at the time pages
are written out
ƒ Once a page is written to the paging file, the space is
occupied until the memory is deleted (e.g., at
process exit), even if the page is read back from disk
ƒ Can run with no paging file
ƒ Windows NT4/Windows 2000: Zero pagefile size
actually created a 20MB temporary page file
1-22

Do I Need More Memory?
ƒ If heavy paging activity:
ƒ Monitor Memory->Page Reads/sec
ƒ Not Page Faults/sec (which includes soft faults)
ƒ Should not stay high for sustained period
ƒ Some hard page faults unavoidable
ƒ Process startup
ƒ Normal file I/O done via paging
ƒ To eliminate normal file I/O, subtract
System->File Read Operations/sec
ƒ Or, use Filemon to determine what file(s) are
having paging I/O (asterisk next to I/O function)
1-23

Sizing The Page File


ƒ Given understanding of page file usage,
how big should the total paging file space
be?
(Windows supports multiple paging files)
ƒ Size should depend on total private virtual
memory used by applications and drivers
ƒ Therefore, not related to RAM size (except for
taking a full memory dump)

1-24

Sizing The Page File
ƒ Worst case: Windows has to page all private data
out to make room for code pages
ƒ To handle, minimum size should be the maximum of VM
usage (“Commit Charge Peak”)
ƒ Hard disk space is cheap, so why not double this?
ƒ Normally, make maximum size same as minimum
ƒ But, max size could be much larger if there will be
infrequent demands for large amounts of page file
space
ƒ Performance problem: Page file extension will likely be very
fragmented
ƒ Extension is deleted on reboot, thus returning to a contiguous
page file
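
The sizing rule of thumb above can be written down as a small calculation. This is a sketch of the slide's guidance only, not a Microsoft formula: the function name and the `safety_factor` parameter are assumptions for illustration.

```python
def suggest_page_file_mb(commit_charge_peak_mb: int, safety_factor: int = 2):
    """Return (minimum_mb, maximum_mb) for total paging file space.

    Minimum covers the observed Commit Charge Peak, multiplied by a
    safety factor (the slide suggests doubling). Maximum is set equal
    to the minimum so the page file is not extended and fragmented.
    """
    minimum_mb = commit_charge_peak_mb * safety_factor
    return minimum_mb, minimum_mb
```

For example, a system whose Commit Charge Peak reached 600 MB would get a 1200 MB minimum and maximum under this rule. Note that RAM size does not appear in the calculation.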

1-25

Memory Management Information
Task Manager, Performance tab (numbers refer to callouts on the screen snapshot):
3. Total committed private virtual memory (total of “VM Size” in process tab + Kernel Memory Paged)
ƒ Not all of this space has actually been used in the paging files; it is “how much would be used if it was all paged out”
4. “Commit charge limit” = sum of physical memory available for processes + current total size of paging file(s)
ƒ Does not reflect true maximum page file sizes (expansion)
ƒ When “total” reaches “limit”, further VirtualAlloc attempts by any process will fail
Screen snapshot from: Task Manager | Performance tab
1-26
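
The relationship between commit charge total and limit can be sketched as follows. The function names and the megabyte figures in the usage note are hypothetical; the arithmetic is just the definition given on the slide.

```python
def commit_limit_mb(physical_for_processes_mb: int, page_file_sizes_mb) -> int:
    """Commit limit = physical memory available to processes plus the
    current (not maximum) size of every paging file."""
    return physical_for_processes_mb + sum(page_file_sizes_mb)


def can_commit(total_mb: int, limit_mb: int, request_mb: int) -> bool:
    """A new commitment succeeds only if it fits under the limit."""
    return total_mb + request_mb <= limit_mb
```

With 700 MB of usable physical memory and paging files of 1024 MB and 512 MB, the limit is 2236 MB; once the total is at 2000 MB, a request for more than 236 MB fails unless a paging file can expand.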

Why Page File Usage on Systems with
Ample Free Memory?
ƒ Because memory manager doesn’t let process working
sets grow arbitrarily
ƒ Processes are not allowed to expand to fill available memory
(previously described)
ƒ Bias is to keep free pages for new or expanding processes
ƒ This will cause page file usage early in the system life even with
ample memory free
ƒ We talked about the standby list, but there is another list of
modified pages recently removed from working sets
ƒ Modified private pages are held in memory in case the process asks
for them back
ƒ When the list of modified pages reaches a certain threshold, the
memory manager writes them to the paging file (or mapped file)
ƒ Pages are moved to the standby list, since they are still “valid” and
could be requested again
1-27

Memory Leaks

1-28

Process Memory Leaks
ƒ System says “running low on virtual
memory”
ƒ Before increasing size of page file, look for a
process (or system) memory leak
ƒ Look for who is consuming pagefile space
ƒ Process memory leak: Check Task Manager,
Processes tab, VM Size column
ƒ Or Perfmon “private bytes”, same counter

1-29

Leakyapp Test Program


ƒ Leakyapp.exe is in the Resource Kit
ƒ Continuously allocates private,
nonshareable virtual memory
ƒ When there is no more, it just keeps trying...
ƒ Run several copies to fill pagefile more
quickly

1-30

Handle Leaks
ƒ Processes that open resources but don’t
close them can exhaust system memory
ƒ Check total handle count in Task Manager
Performance tab
ƒ To find offending process, on Process tab add
Handle Count and sort by that column
ƒ Using Process Explorer handle view with
differences highlighting you can even find which
handle(s) are not being closed

1-31

Kernel Memory Leaks


ƒ A driver leaking nonpaged pool shows up as large and growing Nonpaged pool usage
ƒ Or, a growing Memory Usage and Paged pool usage

1-32

Kernel Memory Pools
ƒ Two system memory pools
ƒ “Nonpaged Pool” and “Paged Pool”
ƒ Used for systemwide persistent data (visible
from any process context)
ƒ Pool sizes are a function of memory size &
Server vs. Workstation
ƒ Can be overridden in Registry:
HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management

1-33

Kernel Memory Pools


ƒ Nonpaged pool
ƒ Has initial size and upper limit (can be grown dynamically,
up to the max)
ƒ 32-bit upper limit: 256 MB on x86 (NT4: 128MB)
ƒ 64-bit limit: 128 GB
ƒ Paged pool
ƒ 32-bit upper limit: 650MB (Windows Server 2003), 470MB
(Windows 2000), 192MB (Windows NT 4.0)
ƒ 64-bit limit: 128 GB
ƒ Pool size performance counters display current size,
not maximum
ƒ To display maximums, use “!vm” kernel debugger
command
1-34

Debugging Pool Leaks

ƒ Two options:
ƒ Poolmon
ƒ In the Support Tools and the Device Driver Kit
(DDK)
ƒ Requires that you turn on Pool Tagging with
Gflags on Windows NT and Windows 2000
ƒ Driver Verifier
ƒ Select all drivers
ƒ Turn on pool tracking

1-35

Troubleshooting with Poolmon


ƒ Poolmon.exe (Support Tools)
ƒ Shows paged and nonpaged pool consumption by data structure “tag”
ƒ Must first turn on “pool tagging” with Resource Kit gflags tool & reboot
ƒ On by default in Windows Server 2003

1-36

Troubleshooting with Poolmon

ƒ Once you find pool tag that is leaking:
ƒ Look up in Windows Debugging Tools subfolder \triage\pooltag.txt
ƒ May not be there if 3rd party driver
ƒ Run Strings (from Sysinternals) on all drivers:
strings \windows\system32\drivers\*.sys | findstr Xyzz
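
If Strings is not at hand, the same search can be approximated with a few lines of script. This is a rough equivalent, not a replacement for the Sysinternals tool: it reports which driver files contain the tag's ASCII bytes anywhere in the image. The function name is made up, and `Xyzz` is the placeholder tag from the slide.

```python
from pathlib import Path


def drivers_containing_tag(driver_dir, tag: str):
    """Return names of *.sys files in driver_dir whose raw bytes contain
    the ASCII pool tag, in sorted order."""
    needle = tag.encode("ascii")
    hits = []
    for path in sorted(Path(driver_dir).glob("*.sys")):
        if needle in path.read_bytes():
            hits.append(path.name)
    return hits
```

On a live system this would be called as `drivers_containing_tag(r"C:\windows\system32\drivers", "Xyzz")`. A hit only means the four bytes appear in the file, so short or common tags can produce false positives.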

1-37

Troubleshooting with Driver Verifier


ƒ Use Driver Verifier to enable pool tracking
for all drivers (or ones of interest)
ƒ System tracks pool usage by driver
ƒ Poolmon looks at pool usage by structure tag

1-38

Looking for Leaks
ƒ Reboot and look at the pool usage of each driver
ƒ A leaker exhibits the following
ƒ Current allocations is always close to or equal to the peak
ƒ The peak grows over time
ƒ If the leak is significant the peak allocations or bytes will be large
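
The leak signature described above can be turned into a simple check over periodic samples of a driver's pool statistics. This is a heuristic sketch only: the function name and the 90% closeness threshold are assumptions, not values used by Driver Verifier.

```python
def looks_like_leak(samples, closeness: float = 0.9) -> bool:
    """samples: (current_allocations, peak_allocations) pairs in time order.

    Flags a leak when current stays close to peak in every sample and
    the peak never shrinks and ends higher than it started.
    """
    if len(samples) < 2:
        return False
    current_near_peak = all(cur >= closeness * peak for cur, peak in samples)
    peaks = [peak for _cur, peak in samples]
    never_shrinks = all(b >= a for a, b in zip(peaks, peaks[1:]))
    peak_grows = never_shrinks and peaks[-1] > peaks[0]
    return current_near_peak and peak_grows
```

A driver that allocates and frees normally tends to show current allocations well below a stable peak, so it fails both tests.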

1-39

Causing a Pool Leak


ƒ Run NotMyFault and select “Leak Pool”
(available from http://www.sysinternals.com/files/notmyfault.zip)
ƒ Allocates paged pool buffers and doesn’t free them
ƒ Stops leaking when you select “Stop Leaking”

1-40

End of Troubleshooting Memory
Problems

Next: Crash Dump Analysis

1-41

Windows Internals and
Advanced Troubleshooting
Part 4: Crash Dump Analysis

1-1

Outline
ƒ What causes crashes?
ƒ Crash dump options
ƒ Analysis with WinDbg/Kd
ƒ Debugging hung systems
ƒ Microsoft On-line Crash Analysis
ƒ Using Driver Verifier
ƒ Live kernel debugging
ƒ Getting past a crash

1-2

Why Analyze Dumps?

ƒ The debuggers and Microsoft Online Crash Analysis (OCA) often solve crashes
ƒ Sometimes, however, they do not, so your analysis might tell you:
ƒ What driver to disable, update, or replace with different hardware
ƒ What OEM to send the dump to

1-3

You Can Do It!


ƒ Many systems administrators ignore
Windows NT/Windows 2000’s crash dump
options
ƒ “I don’t know what to do with one”
ƒ “It’s too hard”
ƒ “It won’t tell me anything anyway”
ƒ Basic crash dump analysis is actually pretty
straightforward
ƒ Even if only 1 out of 5 or 10 dumps tells you
what’s wrong, isn’t it worth spending a few
minutes?
1-4

What Causes Crashes?
ƒ System crashes when a fatal error prevents
further execution
ƒ Any kernel-mode component can crash the
system
ƒ Drivers and the OS share the same memory
space
ƒ Therefore, any driver or OS component can,
due to a bug, corrupt system memory
ƒ Note: This is for performance reasons and is the
same on Linux, most Unixes, VMS, etc…
1-5

Dump Options
ƒ Complete memory dump (Windows NT 4,
Windows 2000, Windows XP)
ƒ Full contents of memory written to
<systemroot>\memory.dmp
ƒ Kernel memory dump (Windows 2000, Windows
XP)
ƒ System memory written to <systemroot>\memory.dmp
ƒ Small memory dump (Windows 2000, Windows
XP)
ƒ Also called a minidump or triage dump
ƒ 64KB of summary written to
<systemroot>\minidump\MiniMMDDYY-NN.dmp

1-7
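
The minidump naming pattern MiniMMDDYY-NN.dmp from the previous slide can be reproduced like this, which helps when matching a dump file to a crash date. The function name is hypothetical, and the two-digit zero padding of NN is an assumption based on the pattern as written.

```python
from datetime import date


def minidump_name(crash_date: date, sequence: int) -> str:
    """Build a file name following the MiniMMDDYY-NN.dmp pattern."""
    return f"Mini{crash_date:%m%d%y}-{sequence:02d}.dmp"
```

For a first crash on 7 May 2003 this yields `Mini050703-01.dmp`.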

Enabling Dumps

ƒ In Windows NT 4:

1-8

Enabling Dumps
ƒ In Windows 2000/XP:

1-9

At The Crash
ƒ A component calls KeBugCheckEx, which takes
five arguments:
ƒ Stop code
ƒ 4 stop-code defined parameters
ƒ KeBugCheckEx:
ƒ Turns off interrupts
ƒ Tells other CPUs to stop
ƒ Paints the blue screen
ƒ Notifies registered drivers of the crash
ƒ If a dump is configured:
ƒ Verifies checksums
ƒ Calls dump I/O functions
1-10

Common Stop Codes


ƒ There are about 150 defined stop codes
ƒ IRQL_NOT_LESS_OR_EQUAL (0x0A)
ƒ Usually an invalid memory access
ƒ UNEXPECTED_KERNEL_MODE_TRAP (0x7F) and KMODE_EXCEPTION_NOT_HANDLED (0x1E)
ƒ Generated by executing garbage instructions
ƒ It’s usually caused when a stack is trashed
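
A stop code plus its four parameters is all KeBugCheckEx receives, so a small lookup is enough to label the common cases. The sketch below covers only the three codes on this slide (0x7F is listed under its documented name, UNEXPECTED_KERNEL_MODE_TRAP); the function name and the output layout are made up for illustration and are not the exact blue screen text.

```python
# Only the stop codes discussed on this slide.
STOP_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
}


def describe_stop(code: int, params) -> str:
    """Format a stop code and its four stop-code-defined parameters."""
    if len(params) != 4:
        raise ValueError("a bug check carries exactly four parameters")
    name = STOP_CODES.get(code, "UNKNOWN")
    args = ", ".join(f"0x{p:08X}" for p in params)
    return f"STOP 0x{code:08X} ({args}) {name}"
```

The meaning of each parameter depends on the stop code, which is why !analyze -v is needed to interpret them.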
1-11

At The Reboot

[Figure: dump processing at reboot. User mode: Session Manager, WinLogon, SaveDump, Memory.dmp. Kernel mode: NtCreatePagingFile, paging file. Steps are numbered 1 to 4]
1-12

At The Reboot
ƒ Session Manager process (\winnt\system32\smss.exe) initializes paging file (step 1)
ƒ NtCreatePagingFile
ƒ NtCreatePagingFile determines if the dump has a crash header (step 2)
ƒ Protects the dump from use
ƒ WinLogon calls NtQuerySystemInformation to tell if there’s a dump
1-13

At The Reboot
ƒ If there’s a dump, Winlogon executes SaveDump (\winnt\system32\savedump.exe) (step 3)
ƒ Writes an event to the System event log
ƒ SaveDump writes contents to appropriate file (step 4)
ƒ Crash dump portion of paging file is in use
during copy, so virtual memory can run low

1-14

Why Crash Dumps Fail


ƒ Most common reasons:
ƒ Paging file on boot volume is too small
ƒ Not enough free space for extracted dump
ƒ Less common:
ƒ The crash corrupted components involved in the
dump process
ƒ Miniport driver doesn’t implement dump I/O
functions
ƒ Windows 2000 and Windows XP storage drivers
must implement dump I/O to get a Microsoft®
signature
1-15

Generating A Test Dump
ƒ Get BSOD from Sysinternals:
www.sysinternals.com/ntw2k/freeware/bluesave.shtml
ƒ It crashes the system by:
ƒ Allocating kernel memory
ƒ Freeing the memory
ƒ Raising the IRQL
ƒ Touching the freed memory

1-16

Analyzing a Crash Dump

ƒ There are two kernel-level debuggers:
ƒ WinDbg – Windows program
ƒ Kd – command-line program
ƒ Same functionality

1-17

Debugging Tools
ƒ Get the latest from:
www.microsoft.com/ddk/debugging
ƒ Supports Windows NT 4, Windows 2000,
Windows XP, Server 2003
ƒ Check for updates frequently
ƒ Don’t use older version on install media
ƒ Install to c:\Debuggers
ƒ Easy access from command prompt

1-18

Symbol Files
ƒ Before you can use any crash analysis tool you
need symbol files
ƒ Symbol files contain global function and variable names
ƒ At the minimum, get the symbol file(s) for ntoskrnl.exe,
ntkrnlmp.exe, ntkrnlpa.exe, ntkrpamp.exe
ƒ Symbols are service pack-specific and have an
installer (default directory is \winnt\symbols)
ƒ Windows NT 4: *.dbg
ƒ Windows 2000: *.dbg, *.pdb
ƒ Windows XP: *.pdb
ƒ Note: SP symbols only include updates
1-19

Microsoft Symbol Server
ƒ WinDbg and Kd can download symbols
automatically from Microsoft
ƒ Pick a directory to install symbols and add the following to the debugger’s symbol path:
SRV*directory*http://msdl.microsoft.com/download/symbols
ƒ The debugger automatically detects the OS
version of a dump and downloads the
symbols on-demand
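
Building the symbol path string is mechanical, as this small sketch shows. The function name and the `c:\symbols` directory in the usage note are only examples; the SRV*directory*URL form and the server URL are the ones given on the slide.

```python
SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir: str) -> str:
    """Return a symbol path that caches downloaded symbols in cache_dir."""
    return f"SRV*{cache_dir}*{SYMBOL_SERVER}"
```

Calling `symbol_path(r"c:\symbols")` gives `SRV*c:\symbols*http://msdl.microsoft.com/download/symbols`, which can be entered in the debugger's symbol path setting or placed in the `_NT_SYMBOL_PATH` environment variable.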
1-20

Installing the Symbol Files


ƒ On CDs:
ƒ Windows NT 4: on Windows NT 4 Setup CD under
\support\debug
ƒ Windows 2000 SP0/Windows XP SP0 on Customer
Support Diagnostics CD
ƒ Windows 2000 SP1 on SP1 CD
ƒ Online:
ƒ Windows NT 4: All (US) service packs are at
ftp.microsoft.com:\bussys\winnt\winnt-public\fixes\usa\nt40
ƒ Windows 2000/XP:
http://www.microsoft.com/ddk/debugging/symbols.asp

1-21

Automated Analysis
ƒ When you open a crash dump with Windbg
or Kd you get a basic crash analysis:
ƒ Stop code and parameters
ƒ A guess at offending driver
ƒ The analysis is the result of the automated
execution of the !analyze debugger
command

1-22

Debugger Commands
ƒ Two types of commands
ƒ Dot commands are built-in
ƒ Bang commands are provided with extension
DLLs
ƒ Extension DLLs allow Microsoft and third parties
to dynamically add commands
ƒ The main extension DLL is the kernel-
debugger extension DLL, kdexts.dll
ƒ Each OS has a subdirectory with its own
kdexts.dll version as well as other,
development-area specific, extension DLLs
(e.g. Rpcexts.dll, ndiskd.dll, …)
1-23

Deeper Analysis
ƒ Always execute !analyze with the -v option
to get more information
ƒ Text description of stop code
ƒ Meaning (if any) of parameters
ƒ Stack dump
ƒ !Analyze uses heuristics to walk up the
stack and determine what driver is the likely
cause of the crash
ƒ “Followup” is taken from optional triage.ini file

1-24

Useful Commands

ƒ When you load a dump into the debugger it executes !analyze
ƒ Sometimes identifies the cause of a crash
ƒ Always execute !analyze -v to see more
ƒ The next steps:
Look at the current process:      !process
List all processes:               !process 0 0
Look at a thread:                 !thread <thread address or ID>
List loaded drivers:              lm kv
Look at an I/O request packet:    !irp <irp address>
Disassemble code:                 u <address or function name>

1-25

Hung Systems
ƒ You can tackle a hung system, but only if you’ve
prepared:
ƒ Boot in debug mode, or
ƒ Set the keystroke-crash Registry value
ƒ For debug mode you need a second system (the
debugger host) connected to the target via serial
cable
ƒ Run Windbg/Kd on the host
ƒ Edit the target’s boot.ini file:
ƒ /debugport=comX /baudrate=XXX
ƒ When the system hangs, connect with the debugger
and hit Ctrl-C
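
As an illustration of the boot.ini edit, the sketch below appends the two switches from the slide to an existing boot entry line. The function name and the sample entry in the usage note are hypothetical; on a real system the line comes from the [operating systems] section of the target's boot.ini.

```python
def add_debug_switches(boot_entry: str, com_port: int, baud_rate: int) -> str:
    """Append /debugport and /baudrate switches to a boot.ini entry line."""
    switches = f"/debugport=com{com_port} /baudrate={baud_rate}"
    return f"{boot_entry.rstrip()} {switches}"
```

For example, the entry `multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000" /fastdetect` with COM1 at 115200 baud becomes the same line followed by `/debugport=com1 /baudrate=115200`. The host debugger must be started with matching port settings.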

1-26

Hung Systems
ƒ To configure keystroke-crash:
ƒ Set HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScrl to 1
ƒ Enter right-ctrl+[scroll-lock, scroll-lock] to crash
the system
ƒ Use !thread to see what’s running

1-27

Microsoft On-line Crash Analysis
(OCA)
ƒ Have Microsoft process dumps at
oca.microsoft.com
ƒ XP asks you if you want to submit after a crash
ƒ You can visit OCA and manually submit a dump
ƒ OCA accepts Win2K and XP dumps, but is
focused on XP
ƒ Currently requires a Passport account to check
crash analysis status if it doesn’t know right away

1-28

What Does OCA Do?


ƒ Server farm uses !analyze, but uses
Microsoft’s Triage.ini file and database that
includes information about known problems
ƒ Several ways to get OCA results:
ƒ Via e-mail
ƒ At the OCA site
ƒ Sometimes OCA will point you at KB
articles that describe the problem
ƒ KB articles may tell you to use Windows
Update to get newer drivers, a hotfix, or install
a Service Pack
1-29

Driver Verifier
ƒ This tool was introduced in Windows 2000
and can be useful to validate a suspicion
about a driver
ƒ The Verifier performs the following checks:
ƒ IRQL rule adherence
ƒ I/O request consistency
ƒ Proper memory usage

1-30

Special Pool
ƒ Special pool is a kernel buffer area where buffers are sandwiched with invalid pages
ƒ Conditions for a driver allocating from special pool:
ƒ Driver Verifier is verifying driver
ƒ Special pool is enabled
ƒ Allocation is slightly less than one page (4 KB on x86)
[Figure: three consecutive pages toward higher addresses. Page n: invalid. Page n+1: signature, then buffer. Page n+2: invalid]
1-31

Driver Verifier
ƒ If the Verifier detects a violation it crashes
the system and identifies the driver
ƒ If you find a driver in a crash dump that looks like
it might be the cause of the crash, turn on
verification for it
ƒ Use “Last Known Good” if the verifier detects a bug
during the boot
ƒ If a bug is detected in a third-party product check for
updates and/or contact the vendor’s support
ƒ Note that the Verifier means fewer crashes on Windows XP than on
Windows 2000, and fewer on Windows 2000 than on Windows NT 4

1-32

Getting Past a Crash


ƒ Last-Known Good
ƒ Boots with driver/kernel configuration last used during
a successful boot
ƒ Safe Mode
ƒ Boots the system with core set of drivers and services
ƒ Network and non-network
ƒ The Recovery Console
ƒ Manually disable offending service, replace corrupt
images, update files
ƒ ERD Commander 2002
ƒ Registry Editor, Explorer, Driver/Service Manager,
password changer, Event Log viewer, Notepad

1-33

The Bluescreen Screen Saver

ƒ Scare your enemies and fool your friends with the Sysinternals Bluescreen Screen Saver
ƒ Be careful, your job may be on the line!

1-34

More Information
ƒ Inside Windows 2000, 3rd edition – section
on System Crashes in chapter 4
ƒ Debugging Tools help file
ƒ Knowledge Base Articles
ƒ http://www.microsoft.com/ddk/debugging
ƒ Other books:
ƒ http://www.microsoft.com/ddk/newbooks.asp
ƒ The debugger team wants your feedback and bug reports
1-35

End of Tutorial

Thank you for coming!

1-36
