Off-CPU Analysis

Performance issues can be categorized into one of two types:

On-CPU: where threads are spending time running on-CPU.

Off-CPU: where time is spent waiting while blocked on I/O, locks, timers, paging/swapping, etc.

Off-CPU analysis is a performance methodology where off-CPU time is measured and studied, along with context such as stack traces. It differs from CPU profiling, which only examines threads if they are executing on-CPU. Here, the targets are thread states that are blocked and off-CPU, as shown in blue in the diagram of generic thread states.

Off-CPU analysis is complementary to CPU analysis, so that 100% of thread time can be understood. This method
also differs from tracing techniques that instrument application functions that commonly block, since this method
targets the kernel scheduler's notion of blocking, which is a convenient catch-all.
Threads can leave CPU for a number of reasons, including I/O and locks, but also some that are unrelated to the
current thread's execution, including involuntary context switching due to high demand for CPU resources, and
interrupts. Whatever the reason, if this occurs during a workload request (a synchronous code-path), then it is
introducing latency.
On this page I'll introduce off-CPU time as a metric, and summarize techniques for off-CPU analysis. As an
example implementation of off-CPU analysis I'll apply it to Linux, then summarize other OSes in later sections.
Table of contents:

Prerequisites
Introduction
Overhead
Off-CPU Time
Off-CPU Analysis
Request-Synchronous Context
Caveats
Flame Graphs
Wakeups
Origin
Prerequisites
Off-CPU analysis requires stack traces to be available to tracers, which you may need to fix first. Many applications are compiled with the -fomit-frame-pointer gcc option, breaking frame pointer-based stack walking. VM runtimes like Java compile methods on the fly, and tracers may not find their symbol information without additional help, causing the stack trace to be hexadecimal only. There are other gotchas as well. See my previous write-ups on fixing Stack Traces and JIT Symbols for perf.
Introduction
To explain the role of off-CPU analysis, I'll summarize CPU sampling and tracing first for comparison. I'll then
summarize two off-CPU analysis approaches: tracing, and sampling. While I've been promoting off-CPU analysis
for over a decade, it is still not a widely used methodology, in part due to the lack of tools in production Linux
environments to measure it. That is now changing with eBPF and newer Linux kernels (4.8+).
1. CPU Sampling
Many traditional profiling tools use timed sampling of activity across all CPUs, collecting snapshots of the current instruction address (program counter) or entire stack back trace at a certain time interval or rate (eg, 99 Hertz). This will give counts of either the running function or the stack trace, allowing reasonable estimates to be calculated of where the CPU cycles are being spent. On Linux, the perf tool in sampling mode (eg, -F 99) does timed CPU sampling.

Consider application function A() calls function B(), which makes a blocking system call:

CPU Sampling ----------------------------------------------->
 |  |  |  |  |  |  |  |  |  |  |  |
 A  A  A  A  B  B  B  B  A  A  A  A
A(---------.                .----------)
           |                |
           B(--------.   .--)
                     |   |            user-land
- - - - - - - - - - syscall - - - - - - - - - - - - - - - - -
                     |   |            kernel
                     X  Off-CPU  |
                     block . . . . . interrupt

While this can be very effective for studying on-CPU issues, including hot code-paths and adaptive mutex spins, it doesn't gather data when the application has blocked and is waiting off-CPU.

2. Application Tracing
App Tracing ------------------------------------------------->
A(---------.                .----------)
           |                |
           B(--------.   .--)
                     |   |            user-land
- - - - - - - - - - syscall - - - - - - - - - - - - - - - - -
                     |   |            kernel
                     X  Off-CPU  |
                     block . . . . . interrupt
Here functions are instrumented so that timestamps are collected when they begin "(" and end ")", so that the time
spent in functions can be calculated. If the timestamps include elapsed time and CPU time (eg, using times(2) or
getrusage(2)), then it is also possible to calculate which functions were slow on-CPU vs which functions were slow
because they were blocked off-CPU. Unlike sampling, these timestamps can have a very high resolution
(nanoseconds).
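To illustrate the timestamp arithmetic, here is a minimal Python sketch (the workload function is a stand-in, and os.times() is process-wide, so this assumes a single-threaded application):

import os, time

def timed(func):
    t0, cpu0 = time.monotonic(), os.times()
    ret = func()
    t1, cpu1 = time.monotonic(), os.times()
    elapsed = t1 - t0
    # user + system CPU time consumed between the two readings:
    on_cpu = (cpu1.user - cpu0.user) + (cpu1.system - cpu0.system)
    # whatever wasn't spent on-CPU was spent blocked (or waiting to run):
    print("elapsed %.3fs, on-CPU %.3fs, off-CPU %.3fs" %
          (elapsed, on_cpu, elapsed - on_cpu))
    return ret

timed(lambda: open("/etc/hosts").read())   # stand-in blocking workload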
While this works, a disadvantage is that you either trace all application functions, which can have a significant
performance impact (and affect the performance you are trying to measure), or you pick the functions that are likely
to block, and hope you didn't miss any.
3. Off-CPU Tracing
I'll summarize this here, then explain it in more detail in the next sections.
With this approach, only the kernel functions that switch the thread off-CPU are traced, along with timestamps and
user-land stack traces. This focuses on the off-CPU events, without needing to trace all of the application functions,
or needing to know what the application is. This approach works for any blocking event, for any application:
MySQL, Apache, Java, etc.
Off-CPU tracing captures all wait events for any application.

Later on this page I'll trace kernel off-CPU events and include some application-level instrumentation to filter out asynchronous wait times (eg, threads waiting for work). Unlike application-level instrumentation, I don't need to hunt down every place that may block off-CPU; I just need to identify that the application is in a time-sensitive code path (eg, during a MySQL query), so that the latency is synchronous to the workload.

Off-CPU tracing is the main approach I've been using for off-CPU analysis. But there's also sampling.
4. Off-CPU Sampling
Off-CPU Sampling --------------------------------------------->
                      |  |  |  |  |  |  |
                      O  O  O  O  O  O  O
A(---------.                .----------)
           |                |
           B(--------.   .--)
                     |   |            user-land
- - - - - - - - - - syscall - - - - - - - - - - - - - - - - -
                     |   |            kernel
                     X  Off-CPU  |
                     block . . . . . interrupt
This approach uses timed sampling to capture blocked stack traces from threads that are not running on-CPU. It
can also be accomplished by a wall-time profiler: one that samples all threads all the time, regardless of whether
they are on- or off-CPU. The wall-time profile output can then be filtered to look for just the off-CPU stacks.
Off-CPU sampling is rarely used by system profilers. Sampling is usually implemented as per-CPU timer interrupts,
that then examine the currently running process that was interrupted: producing an on-CPU profile. An off-CPU
sampler must work in a different way: either setting up timers in each application thread to wake them up and
capture stacks, or having the kernel walk all threads at an interval and capture their stacks.
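To illustrate the kernel-walk approach from user space, here is a rough Python sketch of a wall-time kernel-stack sampler that reads /proc/PID/task/*/stack (root is required, and the 99 Hertz rate and 10 second duration are arbitrary choices):

#!/usr/bin/env python3
import collections, glob, sys, time

pid = sys.argv[1]
counts = collections.Counter()
for _ in range(99 * 10):                     # ~10 seconds at ~99 Hertz
    for path in glob.glob("/proc/%s/task/*/stack" % pid):
        try:
            with open(path) as f:
                counts[f.read()] += 1        # count each unique kernel stack
        except OSError:                      # thread exited between glob and read
            pass
    time.sleep(1.0 / 99)

for stack, n in counts.most_common(10):
    print("--- %d samples ---\n%s" % (n, stack))

This is in the same spirit as the proc-profiler mentioned later on this page, and it shares the overhead problem described next.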
Overhead
WARNING: With off-CPU tracing, be aware that scheduler events can be very frequent – in extreme cases, millions
of events per second – and although tracers may only add a tiny amount of overhead to each event, due to the
event rate that overhead can add up and become significant. Off-CPU sampling has overhead issues too: systems may have tens of thousands of threads, all of which must be constantly sampled, which is orders of magnitude more overhead than CPU sampling, which only samples across the CPU count.
To use off-CPU analysis, you'll want to pay attention to overheads every step of the way. Tracers that dump every
event to user-space for post processing can easily become prohibitive, creating Gbytes of trace data per minute.
That data must be written to the file system and storage devices and post-processed, costing more CPU. This is why
tracers that can do in-kernel summaries, like Linux eBPF, are so important for reducing overhead and making off-
CPU analysis practical. Also beware of feedback loops: tracers tracing events caused by themselves.
If I'm completely in the dark with a new scheduler tracer, I'll begin by tracing for one tenth of a second only (0.1s),
and then ratchet it up from there, while closely measuring the impact on system CPU utilization, application request
rates, and application latency. I'll also consider the rate of context switches (eg, measured via the "cs" column in
vmstat), and be more careful on servers with higher rates.
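The "cs" column is system-wide; if you want the raw counter, /proc/stat exports a running total in its "ctxt" line, and a trivial Python sketch can turn that into a per-second rate:

import time

def ctxt():
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt "):
                return int(line.split()[1])

prev = ctxt()
while True:
    time.sleep(1)
    cur = ctxt()
    print("%d context switches/s" % (cur - prev))
    prev = cur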
To give you some idea of the overheads, I tested an 8 CPU system running Linux 4.15, with a heavy MySQL load
causing 102k context switches per second. The server was running at CPU saturation (0% idle) on purpose, so that
any tracer overhead will cause a noticeable loss in application performance. I then compared off-CPU analysis via
scheduler tracing with Linux perf and eBPF, which demonstrate different approaches: perf for event dumping, and
eBPF for in-kernel summarizing:
Using perf to trace every scheduler event caused a 9% drop in throughput while tracing, with occasional 12%
drops while the perf.data capture file was flushed to disk. That file ended up 224 Mbytes for 10 seconds of
tracing. The file was then post processed by running perf script to do symbol translation, which cost a 13%
performance drop (the loss of 1 CPU) for 35 seconds. You could summarize this by saying the 10 second perf
trace cost 9-13% overhead for 45 seconds.
Using eBPF to count stacks in kernel context instead caused a 6% drop in throughput during the 10 second trace, which began with a 13% drop for 1 second as eBPF was initialized, and was followed by 6 seconds of post-processing (symbol resolution of the already-summarized stacks) costing a 13% drop. So a 10 second trace cost 6-13% overhead for 17 seconds. Much better.
What happens when the trace duration is increased? For eBPF it's only capturing and translating unique stacks, which won't scale linearly with the trace duration. I tested this by increasing the trace from 10 to 60 seconds, which only increased eBPF post-processing from 6 to 7 seconds. The same test with perf increased its post-processing from 35 seconds to 212 seconds, as it needed to process 6x the volume of data. To understand this fully, it's worth noting that post-processing is a user-level activity that can be tuned to interfere less with the production workload, such as by using different scheduler priorities. Imagine capping CPU at 10% (of one CPU) for this activity: the performance loss may then be negligible, and the eBPF post-processing stage may then take 70 seconds – not too bad. But the perf script time may then take 2120 seconds (35 minutes), which would stall an investigation. And perf's overhead isn't just CPU, it's also disk I/O.

How does this MySQL example compare to production workloads? It was doing 102k context switches per second, which is relatively high: many production systems I see at the moment are in the 20-50k/s range. That would suggest that the overheads described here are about 2x higher than I would see on those production systems. However, the MySQL stack depth is relatively light, often only 20-40 frames, whereas production applications can exceed 100 frames. That matters as well, and may mean that my MySQL stack-walking overheads are perhaps only half what I would see in production. So this may balance out for my production systems.
There are many tracers available on Linux for off-CPU analysis. I'll use eBPF here, as it can easily do in-kernel
summaries of stack traces and times. eBPF is part of the Linux kernel, and I'll use it via the bcc frontend tools.
These need at least Linux 4.8 for stack trace support.
You may wonder how I did off-CPU analysis before eBPF. Lots of different ways, including completely different
approaches for each blocking type: storage tracing for storage I/O, kernel statistics for scheduler latency, and so on.
To actually do off-CPU analysis before, I've used SystemTap, and also perf event logging – although that has higher
overhead (I wrote about it in perf_events Off-CPU Time Flame Graph). At one point I wrote a simple wall-time
kernel-stack profiler called proc-profiler.pl, which sampled /proc/PID/stack for a given PID. It worked well
enough. I'm not the first to hack up such a wall-time profiler either, see poormansprofiler and Tanel Poder's
quick'n'dirty troubleshooting.
Off-CPU Time
This is the time that threads spent waiting off-CPU (blocked time), and not running on-CPU. It can be measured as
totals across a duration (already provided by /proc statistics), or measured for each blocking event (usually requires
a tracer).
To start with, I'll show total off-CPU time from a tool that may already be familiar to you: the time(1) command. Eg, timing tar(1):
real 0m50.798s
user 0m1.048s
sys 0m11.627s
tar took about one minute to run, but the time command shows it only spent 1.0 seconds of user-mode CPU time,
and 11.6 seconds of kernel-mode CPU time, out of a total 50.8 seconds of elapsed time. We are missing 38.2
seconds! That is the time the tar command was blocked off-CPU, no doubt doing storage I/O as part of its archive
generation.
To examine off-CPU time in more detail, either dynamic tracing of kernel scheduler functions or static tracing using
the sched tracepoints can be used. The bcc/eBPF project includes cpudist, developed by Sasha Goldshtein, which has a -O mode that measures off-CPU time. This requires Linux 4.4 or higher. Measuring tar's off-CPU time:
# /usr/share/bcc/tools/cpudist -O -p `pgrep -nx tar`
Tracing off-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 50       |                                        |
         4 -> 7          : 289      |                                        |
         8 -> 15         : 342      |                                        |
        16 -> 31         : 517      |                                        |
        32 -> 63         : 5862     |***                                     |
        64 -> 127        : 30135    |****************                        |
       128 -> 255        : 71618    |****************************************|
       256 -> 511        : 37862    |*********************                   |
       512 -> 1023       : 2351     |*                                       |
      1024 -> 2047       : 167      |                                        |
      2048 -> 4095       : 134      |                                        |
      4096 -> 8191       : 178      |                                        |
      8192 -> 16383      : 214      |                                        |
     16384 -> 32767      : 33       |                                        |
     32768 -> 65535      : 8        |                                        |
     65536 -> 131071     : 9        |                                        |
This shows that most of the blocking events were between 64 and 511 microseconds, which is consistent with flash storage I/O latency (this is a flash-based system). The slowest blocking events, while tracing, reached the 65 to 131 millisecond range (the last bucket in this histogram).
What does this off-CPU time consist of? Everything from when a thread blocked to when it began running again,
including scheduler delay.
At the time of writing this, cpudist uses kprobes (kernel dynamic tracing) to instrument the finish_task_switch()
kernel function. (It should use the sched tracepoint, for API stability reasons, but the first attempt wasn't successful
and was reverted for now.)
To give you an idea of how this tool works: The finish_task_switch() function is called in the context of the next-
running thread. An eBPF program can instrument this function and argument using kprobes, fetch the current PID
(via bpf_get_current_pid_tgid()), and also fetch a high resolution timestamp (bpf_ktime_get_ns()). This is all the
information needed for the above summary, which uses an eBPF map to efficiently store the histogram buckets in
kernel context. Here is the full source to cpudist.
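To make that concrete, here is a simplified bcc/eBPF sketch of the same mechanism. This is a histogram-only toy, not the real cpudist source: it traces all threads, and omits PID filtering and the other options:

#!/usr/bin/python
from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

BPF_HASH(start, u32, u64);      // thread ID -> switch-out timestamp
BPF_HISTOGRAM(dist);            // log2 histogram of off-CPU time, usecs

// finish_task_switch() runs in the context of the next-running thread:
int kprobe__finish_task_switch(struct pt_regs *ctx, struct task_struct *prev)
{
    u64 ts = bpf_ktime_get_ns();

    // the previous thread has just gone off-CPU: record when
    u32 prev_tid = prev->pid;
    start.update(&prev_tid, &ts);

    // the current thread has just come back on-CPU: compute its off-CPU time
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp != 0) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&tid);
    }
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing off-CPU time... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")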
eBPF is not the only tool on Linux for measuring off-CPU time. The perf tool provides a "wait time" column in its
perf sched timehist output, which excludes scheduler time as it's shown in the adjacent column separately. That
output shows the wait time for each scheduler event, and costs more overhead to measure than the eBPF histogram
summary.
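For example (perf sched record traces scheduler events system-wide, so keep the duration short):

# perf sched record -- sleep 10
# perf sched timehist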
Measuring off-CPU times as a histogram is a little bit useful, but not a lot. What we really want to know is context –
why are threads blocking and going off-CPU. This is the focus of off-CPU analysis.
Off-CPU Analysis
Off-CPU analysis is the methodology of analyzing off-CPU time along with stack traces to identify the reason that threads were blocking. The off-CPU tracing analysis technique can be easy to implement due to this principle:

Application stack traces don't change while the thread is off-CPU.

This means we only need to measure the stack trace once, either at the beginning or end of the off-CPU period. The end is usually easier, since you're recording the time interval then anyway. Here is tracing pseudocode for measuring off-CPU time with stack traces (a sketch of the logic; the map names are illustrative):
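on context switch finish:
    sleeptime[prev_thread_id] = timestamp
    if !sleeptime[thread_id]
        return
    delta = timestamp - sleeptime[thread_id]
    totaltime[pid, execname, user stack, kernel stack] += delta
    sleeptime[thread_id] = 0

on tracer exit:
    print totaltime map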
[...]
finish_task_switch
__schedule
schedule
io_schedule
generic_file_read_iter
xfs_file_buffered_aio_read
xfs_file_read_iter
__vfs_read
vfs_read
SyS_read
entry_SYSCALL_64_fastpath
- tar (18235)
18413238
I've truncated the output to the last three stacks. The last, showing a total of 18.4 seconds of off-CPU time, is in
the read syscall path ending up with io_schedule() – this is tar reading file contents, and blocking on disk I/O. The
stack above it shows 662 milliseconds in a stat syscall, which also ends up waiting for storage I/O via
xfs_buf_submit_wait(). The top stack, with a total of 203 milliseconds, appears to show tar blocking on locks while
doing a getdents syscall (directory listing).
Interpreting these stack traces takes a little familiarity with the source code, which depends on how complex the
application is and its language. The more you do this, the quicker you'll become, as you'll recognize the same
functions and stacks.
[...]
finish_task_switch
__schedule
schedule
io_schedule
generic_file_read_iter
xfs_file_buffered_aio_read
xfs_file_read_iter
__vfs_read
vfs_read
SyS_read
entry_SYSCALL_64_fastpath
__read_nocancel
dump_file0
dump_file
dump_dir0
dump_dir
dump_file0
dump_file
dump_dir0
dump_dir
dump_file0
dump_file
dump_dir0
dump_dir
dump_file0
dump_file
create_archive
main
__libc_start_main
[unknown]
- tar (15113)
426525
[...]
Ok, so it looks like tar has a recursive walk algorithm for the file system tree.
Those stack traces are great: they show why the application was blocking and waiting off-CPU, and for how long. This is exactly the sort of information I'm usually looking for. However, blocking stack traces aren't always so interesting, as sometimes you need to look for request-synchronous context.
Request-Synchronous Context
Applications that wait for work, like web servers with pools of threads waiting on a socket, present a challenge for
off-CPU analysis: often most of the blocking time will be in stacks waiting for work, rather than doing work. This
floods the output with stacks that aren't very interesting.
As an example of this phenomenon, here are off-CPU stacks for a MySQL server process that is doing nothing.
Zero requests per second:
finish_task_switch
__schedule
schedule
read_events
do_io_getevents
SyS_io_getevents
entry_SYSCALL_64_fastpath
[unknown]
LinuxAIOHandler::poll(fil_node_t**, void**, IORequest*)
os_aio_handler(unsigned long, fil_node_t**, void**, IORequest*)
fil_aio_wait(unsigned long)
io_handler_thread
start_thread
- mysqld (29896)
3500863
[...]
Various threads are polling for work and other background tasks. These background stacks can dominate the
output, even for a busy MySQL server. What I'm usually looking for is off-CPU time during a database query or
command. That's the time that matters – the time that's hurting the end customer. To find those in the output, I
need to hunt around for stacks in query context.
[...]
This stack identifies some time (latency) during a query. The do_command() -> mysql_execute_command() code path is a giveaway. I know this because I'm familiar with the code from all parts of this stack: MySQL and kernel internals.
You can imagine writing a simple text post-processor that plucked out the stacks of interest based on some application-specific pattern matching. And that might work fine. There's another way, which is a little more efficient, although it also requires application specifics: extending the tracing program to also instrument application requests (the do_command() function, in this MySQL server example), and to then only record off-CPU time if it occurred during the application request. I've done it before, and it can help.
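Here is a bcc/eBPF sketch of that idea, building on the histogram sketch from earlier. The mysqld path and the mangled do_command() symbol name are illustrative assumptions that vary by build (check with nm(1)):

#!/usr/bin/python
from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

BPF_HASH(in_request, u32, u8);  // thread ID -> 1 while inside do_command()
BPF_HASH(start, u32, u64);      // thread ID -> switch-out timestamp
BPF_HISTOGRAM(dist);            // off-CPU time during requests, usecs

int request_begin(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u8 one = 1;
    in_request.update(&tid, &one);
    return 0;
}

int request_end(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    in_request.delete(&tid);
    return 0;
}

int kprobe__finish_task_switch(struct pt_regs *ctx, struct task_struct *prev)
{
    u64 ts = bpf_ktime_get_ns();
    u32 prev_tid = prev->pid;

    // only time switch-outs that happen during an application request:
    if (in_request.lookup(&prev_tid))
        start.update(&prev_tid, &ts);

    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp != 0) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&tid);
    }
    return 0;
}
"""

b = BPF(text=prog)
sym = "_Z10do_commandP3THD"     # illustrative; the mangled name varies by build
b.attach_uprobe(name="/usr/sbin/mysqld", sym=sym, fn_name="request_begin")
b.attach_uretprobe(name="/usr/sbin/mysqld", sym=sym, fn_name="request_end")
print("Tracing request-synchronous off-CPU time... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")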
Caveats
The biggest caveat is the overhead of off-CPU analysis, as described earlier in the overhead section, followed by
getting stack traces to work, which I summarized in the earlier Prerequisites section. There is also scheduler latency
and involuntary context switches to be aware of, which I'll summarize here, and wakeup stacks which I'll discuss in a
later section.
Scheduler Latency
Something that's missing from these stacks is if the off-CPU time includes time spent waiting on a CPU run queue.
This time is known as scheduler latency, run queue latency, or dispatcher queue latency. If the CPUs are running at
saturation, then any time a thread blocks, it may endure additional time waiting its turn on a CPU after being woken
up. That time will be included in the off-CPU time.
You can use extra trace events to tease apart off-CPU time into time blocked vs scheduler latency, but in practice,
CPU saturation is pretty easy to spot, so you are unlikely to be spending much time studying off-CPU time when
you have a known CPU saturation issue to deal with.
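If you do need to quantify the scheduler latency component on its own, bcc's runqlat tool measures it directly (here with an arbitrary 10 second interval, printed once):

# /usr/share/bcc/tools/runqlat 10 1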
Involuntary Context Switching

If you see user-level stack traces that don't make sense – that show no reason to be blocking and going off-CPU – it
could be due to involuntary context switching. This often happens when the CPUs are saturated, and the kernel
CPU scheduler gives threads turns on CPU, then kicks them off when they reach their time slice. The threads can
be kicked off anytime, such as in the middle of a CPU heavy code-path, and the resulting off-CPU stack trace
makes no sense.
Here is an example stack from offcputime that is likely an involuntary context switch:
# /usr/share/bcc/tools/offcputime -p `pgrep -nx mysqld`
Tracing off-CPU time (us) of PID 29887 by user + kernel stack... Hit Ctrl-C to end.
[...]

finish_task_switch
__schedule
schedule
exit_to_usermode_loop
prepare_exit_to_usermode
swapgs_restore_regs_and_return_to_usermode
Item_func::type() const
JOIN::make_join_plan()
JOIN::optimize()
st_select_lex::optimize(THD*)
handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long)
[unknown]
mysql_execute_command(THD*, bool)
Prepared_statement::execute(String*, bool)
Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*)
mysqld_stmt_execute(THD*, unsigned long, unsigned long, unsigned char*, unsigned long)
dispatch_command(THD*, COM_DATA const*, enum_server_command)
do_command(THD*)
handle_connection
pfs_spawn_thread
start_thread
- mysqld (30022)
13
[...]
It's not clear (based on the function names) why this thread blocked in Item_func::type(). I suspect this is an
involuntary context switch, as the server was CPU saturated.
On Linux, involuntary context switches occur for state TASK_RUNNING (0), whereas the blocking events we're
usually interested in are in TASK_INTERRUPTIBLE (1) or TASK_UNINTERRUPTIBLE (2), which offcputime
can match on using --state. I used this feature in my Linux Load Averages: Solving the Mystery post.
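For example, to exclude these involuntary switches and match only uninterruptible sleeps, which are usually the I/O and lock waits of interest:

# /usr/share/bcc/tools/offcputime --state 2 -p `pgrep -nx mysqld`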
Flame Graphs
Flame Graphs are a visualization of profiled stack traces, and are very useful for quickly comprehending the
hundreds of pages of stack trace output that can be generated by off-CPU analysis. Yichun Zhang first created off-
CPU time flame graphs, using SystemTap.
The offcputime tool has a -f option for emitting stack traces in "folded format": semi-colon delimited on one line,
followed by the metric. This is the format that my FlameGraph software takes as input.
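For example (the 30 second duration is arbitrary, and -d inserts the kernel/user delimiter frames discussed below):

# /usr/share/bcc/tools/offcputime -df -p `pgrep -nx mysqld` 30 > out.stacks
$ git clone https://github.com/brendangregg/FlameGraph
$ cd FlameGraph
$ ./flamegraph.pl --color=io --title="Off-CPU Time Flame Graph" --countname=us < out.stacks > out.svg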
Then open out.svg in a web browser. It looks like this (SVG, PNG):
Much better: this shows all off-CPU stack traces, with stack depth on the y-axis, and the width corresponds to the
total time in each stack. The left-to-right ordering has no meaning. There are delimiter frames "-" between the
kernel and user stacks, which were inserted by offcputime's -d option.
You can click to zoom. For example, click on the "do_command(THD*)" frame on the right bottom, to zoom into
the blocking paths that happened during a query. You might want to generate flame graphs which only show these
paths, which can be as simple as grep, since the folded format is one line per stack:
# grep do_command < out.stacks | ./flamegraph.pl --color=io --title="Off-CPU Time Flame Graph" --countname=us > o
For more on off-CPU flame graphs, see my Off-CPU Flame Graphs page.
Wakeups
Now that you know how to do off-CPU tracing and generate flame graphs, you can start really looking at these flame graphs and interpreting them. You may find that many off-CPU stacks show the blocking path, but not the full reason it was blocked. That reason and code path is with another thread: the one that called a wakeup on the blocked thread. This happens all the time.
I cover this topic in my Off-CPU Flame Graphs page, along with two tools: wakeuptime and offwaketime, to
measure wakeup stacks and also to associate them with off-CPU stacks.
Origin
I started using this methodology in about 2005 after exploring uses of the DTrace sched provider and its
sched:::off-cpu probe. I called it off-CPU analysis and the metric off-CPU time, after the DTrace probe name (not a
perfect name: when teaching a DTrace class in Adelaide in 2005, a Sun engineer said I shouldn't call it off-CPU
since the CPU isn't "off"). The Solaris Dynamic Tracing Guide had some examples of measuring the time from
sched:::off-cpu to sched:::on-cpu for some specific cases and process states. I didn't filter on process state, and
instead captured all off-CPU events, and included the stack trace to explain why. I think this approach is obvious, so
I doubt I'm the first to do this, but I seem to have been the only person really promoting this methodology for a
while. I wrote about it in 2007 in DTracing Off-CPU Time, and in later posts and talks.
Yichun Zhang created off-CPU flame graphs using SystemTap and gave the talk Introduction to off CPU Time Flame Graphs (PDF).
I included off-CPU flame graphs in my USENIX LISA 2013 talk Blazing Performance with Flame Graphs.
I posted FreeBSD Off-CPU Flame Graphs to explore the procstat -ka off-CPU sampling approach.
I posted Linux perf_events Off-CPU Time Flame Graph to show how these could be created with perf event logging – if all you had was perf (use eBPF instead).
I posted Linux eBPF Off-CPU Flame Graph to show the value of this and help make the case for stack traces in eBPF (I had to hack them in for this proof of concept).
I posted Linux Wakeup and Off-Wake Profiling to show wakeup stack analysis.
I posted Who is waking the waker? (Linux chain graph prototype) as a proof of concept of walking a chain of wakeups.
I described the importance of off-CPU analysis at the start of my Facebook Performance@Scale talk: Linux BPF Superpowers (slides).
I used a few off-CPU flame graphs in my Linux Load Averages: Solving the Mystery post.
I summarized off-CPU, wakeup, off-wake, and chain graphs in my USENIX ATC 2017 talk on Flame Graphs: youtube, slides.