Kernel

The kernel represents the core of the operating system. In general, it is not necessary to monitor the operation of the kernel on a regular basis. However, some issues are caused by bottlenecks in the kernel. This is often the case when a bottleneck cannot be traced to a specific subsystem.

The kernel maintains several memory areas for data structures that are required for the proper operation of the system. The Non-Paged Pool stores data that cannot be paged out to disk because it is crucial to the operation of the kernel (such as the data structures for process and memory management). It therefore occupies precious space in physical memory. In contrast, the Paged Pool contains data that may be paged out to disk. Both pools are described by an upper limit and the current usage, which Process Explorer can display when debugging symbols are configured.

Many performance issues arise from these memory areas being depleted. Due to their architecture, 32-bit systems have inherently low upper limits for the paged and non-paged pools. Even when set to the maximum value, these memory areas are easily exhausted on highly loaded systems. For example, terminal servers suffer from this effect when the user density is pushed. On 64-bit systems, these limits have been raised so high that they are unlikely to be reached in the foreseeable future; instead, both pools are primarily limited by the amount of physical memory available to the operating system.

Apart from these special memory areas, the kernel also collects data about how it services requests (e.g. interrupts) and processes (context switches). Context switches provide a very interesting piece of information: the number of context switches per second measures how often a new thread is granted a time slice on a processor core. The higher this metric, the more time the processors spend switching between threads. As this task is expensive in terms of processing time, the processors may waste valuable time switching between competing threads, causing the processor resource to perform worse.
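The pool and context switch figures described above can be turned into a simple check. The following Python sketch assumes that pool usage and limits have already been read (for example from Process Explorer) and that the context switch rate comes from System\Context Switches/sec. The 80% pool threshold and the per-core normalization are illustrative assumptions, not values given on this poster.

```python
def pool_utilization(usage_bytes: int, limit_bytes: int) -> float:
    """Return the share of a kernel pool (paged or non-paged) in use, in percent."""
    if limit_bytes <= 0:
        raise ValueError("pool limit must be positive")
    return 100.0 * usage_bytes / limit_bytes


def pool_nearly_depleted(usage_bytes: int, limit_bytes: int,
                         threshold_pct: float = 80.0) -> bool:
    """Flag a pool whose usage reached the threshold (80% is an assumed default)."""
    return pool_utilization(usage_bytes, limit_bytes) >= threshold_pct


def context_switches_per_core(switches_per_sec: float, cores: int) -> float:
    """Normalize the system-wide context switch rate per core (an assumption,
    useful for comparing systems with different core counts)."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return switches_per_sec / cores


if __name__ == "__main__":
    MIB = 2**20
    # Hypothetical readings from a 32-bit terminal server
    print(f"{pool_utilization(200 * MIB, 256 * MIB):.1f}% of the non-paged pool in use")
    print("nearly depleted:", pool_nearly_depleted(200 * MIB, 256 * MIB))
    print("context switches/sec per core:", context_switches_per_core(48_000, 4))
```

Normalizing per core is only a convenience for comparing machines; the poster itself judges the raw counter against what is typical for the system.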
It is important to note that context switching is an internal task that does not show up as processor activity.

Processor

All processes running on a system compete for execution time. As modern processors have several cores, the operating system does a very good job of moving processes between cores to spread the load evenly. The load on the processor is monitored by two metrics: activity and competition. Activity is expressed as the percentage of time the processor was actively servicing processes. Competition is determined from a queue containing the processes waiting for execution time. Often, only processor activity is examined, resulting in an incomplete view of this subsystem. In addition to processes, the processor also services hardware interrupts (notifications sent by hardware components) and internal kernel tasks.

Processes

Monitoring processes is very similar to monitoring the whole system. As each and every process uses all subsystems covered on this poster, the impact of a process on the system can be determined by analyzing its impact on every subsystem. Usually, however, a process exerts most of its load on a single subsystem. A process performing complex calculations on data increases the load on the processor. A process working on a large amount of data may increase its memory footprint. Exchanging data with other processes or systems can raise the traffic on physical disks or the network.

Physical Disk

The physical disk is responsible for storing various kinds of data. First of all, the operating system causes load itself by accessing vital files, libraries and configuration data. Next, applications are launched from the disk and require more or less frequent reads and writes. Finally, the pagefile increases disk activity because it is used to extend the physical memory built into the system. It is inherently hard to separate the load caused by these three sources.
Although disk I/O can be measured per process, it is not practical to add all those values to a performance analysis. With Windows Vista and Windows Server 2008, Microsoft introduced the Resource Monitor (perfmon /res), which, among other metrics, displays the disk activity caused by every process. It is a very handy tool for a quick overview of how running processes use the disk.

The fact that the pagefile is stored on the physical disk creates a close relationship with the memory subsystem. Whenever the memory subsystem is under load, paging causes disk activity by accessing the pagefile, thereby increasing the load on the physical disk. The box labeled Memory contains hints on how to measure the effect of the memory subsystem on the disk subsystem.

Performance data is exposed for each physical disk individually. A list of local hard drives can be retrieved from WMI with the command wmic diskdrive get Index,Model,Size; the pagefiles stored on them can be listed with wmic pagefile get Name,AllocatedBaseSize.

Two metrics are of high interest when examining the physical disk: activity and competition. Activity is expressed as a percentage of time, and competition as the number of I/O requests waiting to be serviced. To determine the load caused by the memory subsystem, refer to the details about paging in the box Memory.

Memory

The memory subsystem maintains the available virtual memory, which consists of the physical memory built into the system and the pagefile(s) configured for the system. To gain a valid overview of the state of the memory subsystem, both aspects need to be considered. Windows begins paging memory to the pagefile soon after system startup to keep as much physical memory unused as possible, so that processes can quickly allocate a maximum amount of memory. Therefore, monitoring pagefile usage is as important as monitoring available memory. But using the pagefile causes I/O operations on the corresponding physical disk, so the disk subsystem needs to be examined as well.
The amount of physical memory is only presented through WMI because it is a hardware asset. The installed memory modules can be listed with wmic memorychip get BankLabel,Capacity, whereas wmic memphysical get Name,MaxCapacity reports the maximum amount of memory (in KB) that the mainboard supports.

As the name implies, a 32-bit system can address a maximum of 4GB of physical memory. Using Physical Address Extension (PAE), a 32-bit system is able to use up to 64GB of memory. The maximum amount of usable memory also depends on the Windows edition. Although PAE can be enabled on Windows Server 2003/2008 Standard Edition, it only serves to counter the negative effects of drivers masking physical memory. Drivers often require I/O address ranges to access hardware devices. These address ranges are usually placed below the 4GB limit, rendering portions of the physical memory unusable. By enabling PAE on Windows Server 2003/2008 Standard Edition, the memory hidden by these address ranges is made available above the 4GB border, so that additional physical memory becomes available to the kernel and processes. This does not work for Windows clients, which are limited to an address range of 4GB regardless of PAE. The different behaviour is caused by the stricter requirements for drivers on Windows servers: as PAE makes driver development more complex, it is not enforced for consumer devices. On a typical physical server, PAE can make up to 650MB of additional physical memory available to the system. On modern laptops, more than 1GB is masked by address ranges for device drivers.

Examining the memory subsystem by looking at available bytes and pagefile usage is not conclusive, because Windows begins paging memory to the pagefile as soon as it has booted. Therefore, this poster is based on examining the commit charge. The operating system maintains a metric called Commit Limit, which denotes the overall amount of virtual memory available to the system. It is the sum of the physical memory built into the system and the pagefile(s) available to the system.
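The address limits above follow from simple arithmetic: 32 address bits cover 2^32 bytes (4GB), and the 36-bit physical addresses used by PAE cover 2^36 bytes (64GB). The Python sketch below also models, in a deliberately simplified way, how device address ranges below 4GB reduce usable memory on a 32-bit client. The function names are hypothetical, the model assumes the masked ranges overlap installed memory completely, and edition-specific licensing limits are ignored.

```python
GIB = 2**30
MIB = 2**20


def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given number of address bits."""
    return 2**address_bits


def usable_memory(installed: int, masked_below_4gb: int, remap_above_4gb: bool) -> int:
    """Simplified model of usable physical memory on a 32-bit Windows system.

    Assumption: device I/O ranges occupy `masked_below_4gb` bytes just below
    the 4GB line. If the hidden memory is remapped above 4GB (PAE on a server),
    all installed memory is usable; otherwise only the part of the 4GB address
    range not taken by device ranges is usable. Edition limits are ignored.
    """
    if remap_above_4gb:
        return installed
    return min(installed, addressable_bytes(32) - masked_below_4gb)


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GB without PAE")  # prints: 4 GB without PAE
    print(addressable_bytes(36) // GIB, "GB with PAE")     # prints: 64 GB with PAE
    client = usable_memory(4 * GIB, 650 * MIB, remap_above_4gb=False)
    print(client // MIB, "MB usable on a 32-bit client")   # prints: 3446 MB usable ...
```

With 4GB installed and 650MB masked, the client keeps 3446MB, while a server that remaps the hidden memory keeps the full 4GB, matching the 650MB gain mentioned above.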
The Commit Limit may change while the system is running, because Windows may be allowed to manage the size of the pagefile(s) automatically and decide to shrink or enlarge them. The metric Committed Bytes in Performance Monitor exposes the number of bytes used by the system regardless of their location (physical memory or pagefile). Using the commit charge instead of the available physical bytes indicates when the system will hit the hard limit of the available virtual memory.

In addition, monitoring the effect of paging on the performance of the physical disk is essential, because paging activity is the primary metric to determine whether the memory subsystem is overloaded. If paging activity increases, the disk subsystem needs to be analyzed. Don't use the metric Page Faults/sec instead of Pages/sec, because it also includes soft page faults, which are resolved without reading pages from the physical disk.

Whenever the operating system flags a page to be removed from physical memory, it is not written to the pagefile immediately; instead, the page reference is added to the standby list maintained by the kernel. As long as the page is only flagged for paging out, an access to the process memory mapped to that page causes a soft page fault, because it can easily be resolved by restoring the page to the active page list. Even after the page has been written to the pagefile, it may still be recovered quickly if the corresponding physical memory has not been reused. Only when the physical memory has been assigned again will accessing the original virtual memory area cause a hard page fault. The page then has to be retrieved from the physical disk, a time-consuming process that delays the normal execution of the corresponding process.
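The difference between soft and hard page faults can be illustrated with a toy model. This Python sketch is not how the Windows memory manager is implemented: it collapses the pagefile write and the kernel's page lists into a single standby list, assumes that any non-resident page must be read back from the pagefile, and all class and method names are hypothetical.

```python
from collections import OrderedDict


class ToyPager:
    """Toy model of trimmed pages, the standby list and page fault types."""

    def __init__(self):
        self.active = set()         # pages currently mapped for the process
        self.standby = OrderedDict()  # trimmed pages whose frame is not yet reused

    def trim(self, page):
        """Flag a page for removal: it goes to the standby list, not straight to disk."""
        self.active.discard(page)
        self.standby[page] = None

    def reuse_oldest_frame(self):
        """Another consumer needs memory: the oldest standby frame is reassigned."""
        if self.standby:
            self.standby.popitem(last=False)

    def access(self, page):
        """Return 'hit', 'soft' (resolved in memory) or 'hard' (read from disk)."""
        if page in self.active:
            return "hit"
        if page in self.standby:
            del self.standby[page]   # restore the page to the active list
            self.active.add(page)
            return "soft"
        self.active.add(page)        # toy assumption: page comes from the pagefile
        return "hard"


if __name__ == "__main__":
    pager = ToyPager()
    pager.active.add(7)          # page 7 is resident
    pager.trim(7)                # flagged for removal, kept on standby
    print(pager.access(7))       # soft
    pager.trim(7)
    pager.reuse_oldest_frame()   # frame handed to another consumer
    print(pager.access(7))       # hard
```

Only the second access would show up in Memory\Pages/sec; both count toward Page Faults/sec, which is why the latter overstates disk-bound paging.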
About the Author

Nicholas Dille, MVP for Remote Desktop Services (http://blogs.sepago.de/nicholas, @NicholasDille)

Nicholas Dille is an IT architect at sepago GmbH and has been engaged in enterprise projects for many years. He specializes in centralizing IT infrastructures, consolidating resources and managing capacity. In his community blog, he writes articles with deep technical insight into Remote Desktop Services and related technologies. sepago is an IT consultancy located in Cologne, Germany (http://www.sepago.de) and has achieved Microsoft Gold partner status as well as Citrix Platinum partner status.

Copyright 2011 by Nicholas Dille

The Performance Monitoring Poster

Introduction

Performance monitoring is a mysterious domain. Often an issue is easily traced to a certain subsystem, but sometimes a bottleneck is much harder to uncover. This poster offers a thorough overview of the relations between the individual subsystems and provides a guide to support the analysis of more complex issues that are hard to narrow down.

The five subsystems covered on this poster are ordered by complexity. To begin with, the well-known processor, memory and disk subsystems are covered, followed by a dive into the analysis of running processes and the kernel. The analysis for each subsystem is structured into steps of increasing complexity. If necessary or appropriate, a step contains references to other subsystems that may be participating in the issue. This poster describes performance analysis for the most important subsystems by providing vital insights, the relation to other subsystems and a step-by-step guide on how to analyze each subsystem. It is recommended that you follow the five steps in the analysis of performance issues. Updates to this poster will be published at http://www.sepago.de/d/sepago-backstage/blogs/tags/poster.

Many of the described checks mention a threshold without providing a value. This is because the threshold depends on the environment as well as the workload. For example, a threshold for processor activity may be 80% to account for expected peaks. In another environment, the threshold describes a rather constant upper limit for the load and may therefore be set to 90% processor activity.

Several steps ask for a metric to be high but do not specify what high means. Again, the notion of a high value depends on the analyzed environment. As a rule of thumb, a value can be considered high when it deviates strongly from a typical corridor. The left graph displays a characteristic increase in processor activity by 50% due to a process gone astray and claiming a single CPU core on a two-core system. In such a case, activity changes by 1/(number of cores). Resolving the issue (often by killing the responsible process) results in an equivalent decrease of activity.

Humans are very good at spotting such peaks by merely looking at a graph. But at the same time, humans are easily distracted by values that seem extreme at first glance. This process can be (somewhat) formalized by plotting a histogram of all values contained in a series. An especially high or low value is set apart from the cluster of values. The right graph shows that activity is around 20% most of the time, exposing values of 70-80% as unusually high in a few cases. This should prompt a detailed analysis of when and why these high values are observed.

Virtual Memory

It is a very common mistake to confuse the address space with the overall memory. The address space denotes the number of bytes addressable by the system. Within it resides the memory available to the system, which may take up only a fraction of the address space or even all of it.
For example, with PAE a 32-bit system is able to address 64GB of memory, but it may be equipped with only 4GB of physical memory. System memory is addressed in pages, which are mapped into the address space as required by the kernel or processes through allocation. The address space is split into two equally sized parts: one is reserved for the kernel and the other is presented to processes as their own virtual address space. But due to the mapping of memory into the address space, there is no such segmentation of system memory; rather, memory is reserved as necessary by either the kernel or a process. As explained in the box Memory, only half of the address space is available to processes. In fact, every process works in its own virtual address space as if half of the address space were exclusively available to it. Read more about this topic in the Windows Internals books (http://technet.microsoft.com/en-us/sysinternals/bb963901).

Check: Privileged time (Process\% Privileged Time is high?)
Whenever a process issues an API call, the kernel takes over. Privileged time usually stresses the system more than user time.
Warning: Many I/O operations
Privileged time is closely related to servicing I/O operations. Have a look at the disk subsystem to narrow down the cause.

Check: User time (Process\% User Time is high?)
This expresses how much time the processor spends serving the process in its normal operation.
Warning: Heavy data processing
This may be typical for the application, but sometimes processes enter an endless loop and thereby increase processor activity.

Check: I/O behaviour (Process\IO Data Bytes/sec)
This counter exposes all data bytes read from or written to files, the network and devices.
Warning: Many I/O operations
The process is heavily exchanging data with one or more of the sources mentioned. Have a look at networking metrics for the overall system to narrow down the cause.
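The three process checks can be combined into a small helper that maps counter readings to the warnings named above. In this Python sketch, the sample values are assumed to come from Process\% Privileged Time, Process\% User Time and Process\IO Data Bytes/sec. Both thresholds are placeholders to be adapted to the environment, and the percentages are assumed to be normalized already (Process counters are summed across cores, so they can exceed 100% on multi-core systems).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessSample:
    name: str
    privileged_pct: float     # Process\% Privileged Time (assumed normalized)
    user_pct: float           # Process\% User Time (assumed normalized)
    io_bytes_per_sec: float   # Process\IO Data Bytes/sec


def process_warnings(sample: ProcessSample,
                     cpu_threshold_pct: float = 50.0,
                     io_threshold_bytes: float = 10 * 2**20) -> List[str]:
    """Map one sample of a process to the warnings of the process checks.

    The default thresholds are hypothetical placeholders, not poster values.
    """
    warnings = []
    if sample.privileged_pct > cpu_threshold_pct:
        warnings.append("Many I/O operations: have a look at the disk subsystem")
    if sample.user_pct > cpu_threshold_pct:
        warnings.append("Heavy data processing: check for an endless loop")
    if sample.io_bytes_per_sec > io_threshold_bytes:
        warnings.append("Many I/O operations: have a look at network metrics")
    return warnings
```

For example, process_warnings(ProcessSample("app.exe", 70.0, 10.0, 0.0)) yields only the disk-related warning, since privileged time is high while user time and I/O stay low.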
Check: Process memory (Process\Working Set)
The working set usually provides a convenient value to judge the load a process puts on the memory subsystem.
Warning: Process requires a lot of memory
The process may be using a lot of memory to fulfill its purpose, or it may even suffer from a memory leak. Either way, depending on the workload, the system may scale worse because of this process.

Check: Thread switching (System\Context Switches/sec is high?)
Thread switching takes time, which reduces the amount of processor time available to user and system processes.
Warning: Too many threads!
The system is suffering from too many active threads. The kernel is spending too much processing time on switching back and forth between them.

Check: Interrupts (Processor\% Interrupt Time)
Interrupts represent messages from hardware devices. Note that this metric is tracked per processor core.
Warning: Too many interrupts!
Interrupts are interfering with normal operation. For example, a network interface may be receiving too many packets (each causing an interrupt).

Check: Disk activity (PhysicalDisk\% Disk Time > Threshold?)
Disk activity measures the amount of time required to service I/O requests.
Warning: Disk activity is high
Although the threshold depends somewhat on the system, activity beyond 80% must be considered high. Even so, the disk may just be handling the load.

Check: Disk competition (PhysicalDisk\Average Disk Queue Length > 2?)
The queue length expresses the number of I/O operations waiting to be serviced. A backlog indicates high competition for the disk.
Warning: Disk competition is too high
Too many I/O operations are competing for the disk. Examine the memory subsystem to determine whether this is caused by paging.

Check paging (Memory\Pages/sec is high?)
The counter shows the number of pages read from and written to the pagefile. Disk activity increases with this counter.
Check: Virtual memory (Memory\Committed Bytes > Size(RAM) + 0.5 * Size(Pagefile)?)
Warning: Memory depleted
Although the system may still be responsive, the situation should be resolved quickly. Processes will fail to launch and requests for memory will be denied.
Warning: Virtual memory is depleted
The system may be fighting for life. Examine the disk subsystem!

Check: Activity (Processor\% Processor Time > Threshold?)
Processor activity measures the time required to satisfy the need of running processes for execution time.
Warning: Processor activity is high
This may not be a problem, as the processor may just be handling the load. Is this a temporary peak? Do you see a steady increase in activity?

Check: Competition (System\Processor Queue Length > 2/core?)
The queue length expresses the number of processes waiting for the processor.
Warning: Processor competition is high
There is a backlog of processes waiting for execution time. The processor is overloaded. Reduce the concurrency by decreasing the number of processes, users or requests.
Important note: The processor queue length is only provided on a per-system basis. This is a nuisance, because the threshold presented for processor competition (2 per core) needs to be multiplied by the number of cores present in the system. The number of cores is only exposed through WMI: wmic cpu get DeviceID,NumberOfLogicalProcessors (on systems with several processors, add up the values of all listed devices).

Check: Hardware events (Processor\% Interrupt Time > Threshold?)
Whenever a hardware interrupt is issued, it causes a temporary pause in the normal execution of processes.
Warning: Hardware is causing high activity
The remedy highly depends on the hardware causing the interrupts. For example, NICs may be receiving too many packets.

Check other subsystems: No indication but still sluggish response?
1. High disk activity may be slowing down the system. Examine the disk subsystem.
2. The kernel may be using the processor for internal tasks. Examine the kernel.
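The checks that come with explicit thresholds can be expressed directly in code. This Python sketch assumes the counter values have already been collected (for example with Performance Monitor) and that the logical processor count was obtained with wmic cpu. The function names are hypothetical; the 80% disk activity default is the value suggested in the disk warning, and the other thresholds are the ones stated in the checks.

```python
def disk_activity_high(pct_disk_time: float, threshold_pct: float = 80.0) -> bool:
    # PhysicalDisk\% Disk Time > threshold (80% as suggested in the warning)
    return pct_disk_time > threshold_pct


def disk_competition_high(avg_disk_queue_length: float) -> bool:
    # PhysicalDisk\Average Disk Queue Length > 2
    return avg_disk_queue_length > 2


def virtual_memory_low(committed_bytes: int, ram_bytes: int, pagefile_bytes: int) -> bool:
    # Memory\Committed Bytes > Size(RAM) + 0.5 * Size(Pagefile)
    return committed_bytes > ram_bytes + 0.5 * pagefile_bytes


def processor_competition_high(queue_length: int, logical_processors: int) -> bool:
    # System\Processor Queue Length is system-wide, so the 2-per-core
    # threshold is multiplied by the number of (logical) processors.
    return queue_length > 2 * logical_processors


if __name__ == "__main__":
    GIB = 2**30
    print(virtual_memory_low(11 * GIB, 8 * GIB, 4 * GIB))  # True: 11 GB > 8 GB + 2 GB
    print(processor_competition_high(6, 4))                # False: 6 is not above 8
```

Keeping each check as a separate predicate mirrors the flowchart: each one answers a single question and leaves the interpretation to the matching warning.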
[Figures: virtual memory page mapping into the kernel address space and the process address spaces; processor activity [%] over time; histogram of activity values (number of values per activity [%]).]
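The histogram technique from the threshold discussion in the introduction can be sketched in a few lines. This Python version assumes 10% bins and treats a bin as unusual when it holds less than 5% of all samples; both parameters are illustrative choices rather than values from this poster, and the sample data is made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def histogram(values: Sequence[float], bin_width: float = 10.0) -> Dict[float, int]:
    """Count samples per bin; each key is the lower edge of a bin (0, 10, 20, ...)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def unusual_values(values: Sequence[float], bin_width: float = 10.0,
                   min_share: float = 0.05) -> List[float]:
    """Return samples in bins holding less than min_share of all samples
    (min_share = 5% is an assumed cut-off for 'set apart from the cluster')."""
    hist = histogram(values, bin_width)
    total = len(values)
    rare = {edge for edge, n in hist.items() if n / total < min_share}
    return [v for v in values if (v // bin_width) * bin_width in rare]


if __name__ == "__main__":
    # Hypothetical processor activity samples in percent
    samples = [18, 21, 19, 22, 20, 23, 17, 24, 21, 19] * 5 + [72, 78]
    for edge, n in histogram(samples).items():
        print(f"{edge:5.0f}-{edge + 10:.0f}% {'#' * n}")
    print("unusual:", unusual_values(samples))
```

A text histogram like this makes the cluster around 20% and the isolated 70-80% samples visible at a glance, just as the right graph does.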