Linux Performance Analysis and Tools: Brendan Gregg
Brendan Gregg
Lead Performance Engineer
[email protected]
@brendangregg
SCaLE11x
February, 2013
[Diagram: the full system stack: applications (DBs, all server types, ...); system libraries; the Linux kernel (scheduler, sockets, TCP/UDP, IP, Ethernet, ZFS, LVM, virtual memory, device drivers); and hardware (CPUs and CPU interconnect, memory bus and DRAM, I/O bus, I/O bridge, expander interconnect, I/O controller, disks, network controller, interface transports, ports)]
whoami
Lead Performance Engineer
Work/Research: tools, visualizations, methodologies
Was Brendan@Sun Microsystems, Oracle, now Joyent
Joyent
High-Performance Cloud Infrastructure
Compete on cloud instance/OS performance
Public/private cloud provider
OS-Virtualization for bare metal performance (Zones)
Core developers of SmartOS and node.js
KVM for Linux guests
http://dtrace.org/blogs/brendan/2012/01/30/performance-analysis-talk-at-scale10x/
Systems Performance: Enterprise and the Cloud, Brendan Gregg, Prentice Hall, 2013
Agenda
Background
Linux Analysis and Tools
Basic
Intermediate
Advanced
Methodologies
Challenges
Performance
Why do performance analysis?
Reduce IT spend: find and eliminate waste, find areas to tune, and do more with less
Systems Performance
Why study the operating system?
Find and fix kernel-based perf issues
Perspectives
System analysis can be top-down, or bottom-up:
[Diagram: two perspectives on the software stack: workload analysis (top-down, typically developers) starts from the application and workload and works down through system libraries and system calls; resource analysis (bottom-up, typically sysadmins) starts from devices and the kernel and works up]
Kernel Internals
Eventually you'll need to know some kernel internals
[Diagram: the operating system: applications (DBs, all server types, ...) and system libraries at user-level; the Linux kernel (VFS, ext3/..., ZFS, LVM, sockets, TCP/UDP, IP, Ethernet, scheduler, virtual memory, device drivers) at kernel-level]
[Fragment of example iostat output: %user, %steal, %idle and per-device kB_read/s, kB_wrtn/s, kB_read, kB_wrtn columns]
[Diagram: observability tool coverage: strace, netstat, perf, top, pidstat, mpstat, dstat, iostat, iotop, blktrace, dtrace, vmstat, slabtop, free, and ping mapped onto the system stack and hardware (CPUs, DRAM, I/O bridge and controllers, disks, network controller and ports); sar and /proc cover various areas]
Tools: Basic
uptime
top or htop
mpstat
iostat
vmstat
free
ping
nicstat
dstat
uptime
Shows load averages, which are also shown by other tools:
$ uptime
 16:23:34 up 126 days,  1:03,  1 user,  load average: ...
If the load is greater than the CPU count, it might mean the CPUs are saturated (100% utilized), and threads are suffering scheduler latency. Might. There's that disk I/O factor too: on Linux, load averages also include tasks blocked in uninterruptible disk I/O.
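A quick way to put the load averages in context is to compare them with the CPU count (an illustrative sketch, not from the original slides):
$ nproc       # number of online CPUs
$ uptime      # compare the 1/5/15 minute load averages to that count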
top
System-wide and per-process summaries:
$ top
top - 01:38:11 up 63 days, 1:17, 2 users, load average: 1.57, 1.81, 1.77
Tasks: 256 total,   2 running, 254 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.0%us,  3.6%sy,  0.0%ni, 94.2%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  49548744k total, 16746572k used, 32802172k free,   182900k buffers
Swap: 100663292k total,        0k used, 100663292k free, 14925240k cached

  PID USER    PR ...
11721 web     20 ...
11715 web     20 ...
   10 root    20 ...
   51 root    20 ...
11724 admin   20 ...
    1 root    20 ...
[...]
top, cont.
Interview questions:
1. Does it show all CPU consumers?
2. A process has high %CPU: next steps for analysis?
top, cont.
1. top can miss:
short-lived processes
kernel threads (tasks), unless included (see top options)
2. analyzing high-CPU processes:
identify why: profile the code path
identify what: execution or stall cycles
High %CPU time may be stall cycles on memory I/O; upgrading to faster CPUs doesn't help!
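For step 2, one way to profile the code path of a hot process is with perf (a sketch; the PID is hypothetical, and perf is covered later in this deck):
# perf record -g -p 12345 -- sleep 10    # sample stacks of PID 12345 for 10 seconds
# perf report                            # summarize the sampled code paths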
htop
Super top. Super configurable. E.g., its basic CPU visualization.
mpstat
Check for hot threads, unbalanced workloads:
$ mpstat -P ALL 1
02:47:49     CPU    %usr  %nice    %sys %iowait   %irq   %idle
02:47:50     all   54.37   0.00   33.12    0.00   0.00   12.50
02:47:50       0   22.00   0.00   57.00    0.00   0.00   21.00
02:47:50       1   19.00   0.00   65.00    0.00   0.00   16.00
02:47:50       2   24.00   0.00   52.00    0.00   0.00   24.00
02:47:50       3  100.00   0.00    0.00    0.00   0.00    0.00
02:47:50       4  100.00   0.00    0.00    0.00   0.00    0.00
02:47:50       5  100.00   0.00    0.00    0.00   0.00    0.00
02:47:50       6  100.00   0.00    0.00    0.00   0.00    0.00
02:47:50       7   16.00   0.00   63.00    0.00   0.00   21.00
02:47:50       8  100.00   0.00    0.00    0.00   0.00    0.00
[...]
iostat
Disk I/O statistics. 1st output is summary since boot.
$ iostat -xkdz 1
Linux 2.6.35-32-server (prod21)   02/20/13   _x86_64_   (16 CPU)

Device:  rrqm/s  wrqm/s     r/s    w/s    rkB/s   wkB/s  ...  avgqu-sz  svctm  %util
sda        0.00    0.00    0.00   0.00     0.00    0.00  ...      0.00   0.84   0.00
sdb        0.00    0.35    0.00   0.05     0.10    1.58  ...      0.00   0.30   0.00

Device:  rrqm/s  wrqm/s     r/s    w/s    rkB/s   wkB/s  ...  avgqu-sz  svctm  %util
sdb        0.00    0.00  591.00   0.00  2364.00    0.00  ...      0.95   1.61  95.00

(the left columns describe the workload input; the right columns the resulting performance)
iostat, cont.
%util: usefulness depends on the target; virtual devices backed by multiple disks may accept more work at 100% utilization
vmstat
Virtual-Memory statistics, and other high-level summaries:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd      free   buff   cache   si   so   bi   bo    in    cs us sy id wa
15  0   2852  46686812 279456 1401196    0    0    0    0     0     0  0  0 100 0
16  0   2852  46685192 279456 1401196    0    0    0    0  2136 36607 56 33 11  0
15  0   2852  46685952 279456 1401196    0    0    0   56  2150 36905 54 35 11  0
15  0   2852  46685960 279456 1401196    0    0    0    0  2173 36645 54 33 13  0
[...]
free
Memory usage summary (Kbytes default):
$ free
             total       used       free     shared    buffers     cached
Mem:      49548744   32787912   16760832          0      61588     342696
-/+ buffers/cache:   32383628   17165116
Swap:    100663292          0  100663292
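For a more readable summary, the units can be changed (an illustrative extra, not from the original slide):
$ free -m     # same summary, in Mbytes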
ping
Simple network test (ICMP):
$ ping www.hilton.com
PING a831.b.akamai.net (63.234.226.9): 56 data bytes
64 bytes from 63.234.226.9: icmp_seq=0 ttl=56 time=737.737 ms
Request timeout for icmp_seq 1
64 bytes from 63.234.226.9: icmp_seq=2 ttl=56 time=819.457 ms
64 bytes from 63.234.226.9: icmp_seq=3 ttl=56 time=897.835 ms
64 bytes from 63.234.226.9: icmp_seq=4 ttl=56 time=669.052 ms
64 bytes from 63.234.226.9: icmp_seq=5 ttl=56 time=799.932 ms
^C
--- a831.b.akamai.net ping statistics ---
6 packets transmitted, 5 packets received, 16.7% packet loss
round-trip min/avg/max/stddev = 669.052/784.803/897.835/77.226 ms
nicstat
Network statistics tool, ver 1.92 on Linux:
# nicstat -z 1
    Time      Int    rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
01:20:58     eth0     0.07     0.00     0.95     0.02   79.43   64.81   0.00   0.00
01:20:58     eth4     0.28     0.01     0.20     0.10  1451.3   80.11   0.00   0.00
01:20:58  vlan123     0.00     0.00     0.00     0.02   42.00   64.81   0.00   0.00
01:20:58      br0     0.00     0.00     0.00     0.00   42.00   42.07   0.00   0.00
    Time      Int    rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
01:20:59     eth4  42376.0    974.5  28589.4  14002.1  1517.8   71.27   35.5   0.00
    Time      Int    rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
01:21:00     eth0     0.05     0.00     1.00     0.00   56.00    0.00   0.00   0.00
01:21:00     eth4  41834.7    977.9  28221.5  14058.3  1517.9   71.23   35.1   0.00
    Time      Int    rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
01:21:01     eth4  42017.9    979.0  28345.0  14073.0  1517.9   71.24   35.2   0.00
[...]
This was the tool I wanted, and finally wrote it out of frustration
(Tim Cook ported and enhanced it on Linux)
dstat
A better vmstat-like tool. Does coloring (FWIW).
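For example (an illustrative invocation; these are standard dstat options):
$ dstat -tcmdn 1     # timestamps, CPU, memory, disk, and network, every second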
[Diagram: the basic tools mapped onto the stack: top, mpstat, and dstat for CPUs; vmstat and free for virtual memory and DRAM; iostat for disks (device stats inferred); nicstat and ping for network controllers and ports; dstat spans several areas]
Tools: Intermediate
sar
netstat
pidstat
strace
tcpdump
blktrace
iotop
slabtop
sysctl
/proc
sar
System Activity Reporter. E.g., paging statistics with -B:
$ sar -B 1
Linux 3.2.6-3.fc16.x86_64 (node104)   02/20/2013   _x86_64_   (1 CPU)

05:24:34 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  ...   %vmeff
05:24:35 PM      0.00      0.00    267.68      0.00  ...     0.00
05:24:36 PM     19.80      0.00    265.35      0.99  ...     0.00
05:24:37 PM     12.12      0.00   1339.39      1.01  ...   100.00
05:24:38 PM      0.00      0.00    534.00      0.00  ...     0.00
05:24:39 PM    220.00      0.00    644.00      3.00  ...     0.00
05:24:40 PM   2206.06      0.00   6188.89     17.17  ...   100.00
[...]
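Other sar reports are also handy (illustrative invocations using standard sar options; some appear again in the USE Method checklist later):
$ sar -u 1       # CPU utilization
$ sar -q 1       # run-queue length and load averages
$ sar -n DEV 1   # network interface statistics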
netstat
Various network protocol statistics using -s:
$ netstat -s
[...]
Tcp:
127116 active connections openings
165223 passive connection openings
12904 failed connection attempts
19873 connection resets received
20 connections established
662889209 segments received
354923419 segments send out
405146 segments retransmited
6 bad segments received.
26379 resets sent
[...]
TcpExt:
2142 invalid SYN cookies received
3350 resets received for embryonic SYN_RECV sockets
7460 packets pruned from receive queue because of socket buffer overrun
2932 ICMP packets dropped because they were out-of-window
96670 TCP sockets finished time wait in fast timer
86 time wait sockets recycled by time stamp
1007 packets rejects in established connections because of timestamp
[...many...]
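To watch a particular counter change over time, one quick approach (an illustrative one-liner, not from the original slides):
# watch -d 'netstat -s | grep -i retrans'    # highlight changes in retransmit counters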
pidstat
Very useful process breakdowns:
# pidstat 1
Linux 3.2.6-3.fc16.x86_64 (node107)   02/20/2013   _x86_64_   (1 CPU)

05:55:18 PM       PID    %usr %system  %guest    %CPU   CPU  Command
05:55:19 PM     12642    0.00    1.01    0.00    1.01     0  pidstat
05:55:19 PM     12643    5.05   11.11    0.00   16.16     0  cksum

05:55:19 PM       PID    %usr %system  %guest    %CPU   CPU  Command
05:55:20 PM     12643    6.93    6.93    0.00   13.86     0  cksum
[...]

# pidstat -d 1
Linux 3.2.6-3.fc16.x86_64 (node107)   02/20/2013   _x86_64_   (1 CPU)

05:55:22 PM       PID    kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:55:23 PM       279       0.00     61.90      0.00  jbd2/vda2-8
05:55:23 PM     12643  151985.71      0.00      0.00  cksum

05:55:23 PM       PID    kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:55:24 PM     12643   96616.67      0.00      0.00  cksum
[...]
strace
System call tracer:
$ strace -tttT -p 12670
1361424797.229550 read(3, "REQUEST 1888 CID 2"..., 65536) = 959 <0.009214>
1361424797.239053 read(3, "", 61440)           = 0 <0.000017>
1361424797.239406 close(3)                     = 0 <0.000016>
1361424797.239738 munmap(0x7f8b22684000, 4096) = 0 <0.000023>
1361424797.240145 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000017>
[...]
strace, cont.
-c: print summary:
# strace -c dd if=/dev/zero of=/dev/null bs=512 count=1024k
[...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.32    0.028376           0   1048581           read
 48.68    0.026911           0   1048579           write
  0.00    0.000000           0         7           open
[...]
Note: this dd workload ran about 200x slower while being traced (strace overhead).
tcpdump
Sniff network packets, dump to output files for post analysis:
# tcpdump -i eth4 -w /tmp/out.tcpdump
tcpdump: listening on eth4, link-type EN10MB (Ethernet), capture size 65535 bytes
^C33651 packets captured
34160 packets received by filter
508 packets dropped by kernel

# tcpdump -nr /tmp/out.tcpdump
reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet)
06:24:43.908732 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ...
06:24:43.908922 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ...
06:24:43.908943 IP 10.2.203.2.22 > 10.2.0.2.55502: Flags [.], seq ...
06:24:43.909061 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ...
tcpdump, cont.
Does have overhead in terms of CPU and storage; the previous example dropped packets
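One common way to reduce that overhead (an illustrative sketch, not from the original slides) is to capture fewer bytes per packet and filter in the kernel, here limiting the capture to the SSH traffic seen above:
# tcpdump -i eth4 -s 128 -w /tmp/out.tcpdump 'tcp port 22'   # snap length 128 bytes, port 22 only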
blktrace
Block device I/O event tracing. Launch it using btrace, e.g.:
# btrace /dev/sdb
  8,16   3   1     0.429604145 20442  A   R 184773879
  8,16   3   2     0.429604569 20442  Q   R 184773879
  8,16   3   3     0.429606014 20442  G   R 184773879
  8,16   3   4     0.429607624 20442  P   N [cksum]
  8,16   3   5     0.429608804 20442  I   R 184773879
  8,16   3   6     0.429610501 20442  U   N [cksum] 1
  8,16   3   7     0.429611912 20442  D   R 184773879
  8,16   1   1     0.440227144     0  C   R 184773879
[...]
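btrace is a shorthand for the underlying tools; roughly the same trace can be produced directly (a sketch):
# blktrace -d /dev/sdb -o - | blkparse -i -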
iotop
Disk I/O by process:
# iotop -bod5
Total DISK READ: ...          Total DISK WRITE:      39.50 K/s
  TID  PRIO  USER   ...   SWAPIN      IO    COMMAND
12824  be/4  root   ...   0.00 %  80.59 %  cksum ...
  279  be/3  root   ...   0.00 %   2.21 %  [jbd2/vda2-8]
12716  be/4  root   ...   2.35 %   0.00 %  sshd: root@pts/0
12816  be/4  root   ...   0.89 %   0.00 %  python /usr/bin/iotop -bod5
[...]
IO: time the thread was waiting on I/O (this is even more useful than pidstat's Kbytes)
slabtop
Kernel slab allocator usage top:
# slabtop -sc
 Active / Total Objects (% used)    : 900356 / 1072416 (84.0%)
 Active / Total Slabs (% used)      : 29085 / 29085 (100.0%)
 Active / Total Caches (% used)     : 68 / 91 (74.7%)
 Active / Total Size (% used)       : 237067.98K / 260697.24K (90.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.24K / 10.09K

  OBJS  ACTIVE ...
112035  110974
726660  579946
  4608    4463
 83496   76878
 23809   23693
 11016    9559
  3488    2702
   510     431
 10948    9054
  2585    1930
[...]
sysctl
System settings:
# sysctl -a
[...]
net.ipv4.tcp_fack = 1
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_mem = 24180        32240   48360
net.ipv4.tcp_wmem = 4096        16384   1031680
net.ipv4.tcp_rmem = 4096        87380   1031680
[...]
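Individual settings can also be read, or changed at runtime (illustrative; the tunable names are taken from the output above):
# sysctl net.ipv4.tcp_rmem                 # read one setting
# sysctl -w net.ipv4.tcp_reordering=3      # change one setting, effective immediately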
/proc
Read statistic sources directly:
$ cat /proc/meminfo
MemTotal:        8181740 kB
MemFree:           71632 kB
Buffers:          163288 kB
Cached:          4518600 kB
SwapCached:         7036 kB
Active:          4765476 kB
Inactive:        2866016 kB
Active(anon):    2480336 kB
Inactive(anon):   478580 kB
Active(file):    2285140 kB
Inactive(file):  2387436 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2932728 kB
SwapFree:        2799568 kB
Dirty:                76 kB
Writeback:             0 kB
[...]
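Other commonly read sources include (an illustrative list of standard procfs files):
$ cat /proc/stat            # system-wide CPU and scheduler counters
$ cat /proc/diskstats       # per-device disk I/O counters
$ cat /proc/net/dev         # per-interface network counters
$ cat /proc/PID/status      # per-process memory and state summary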
Tools: Advanced
perf
DTrace
SystemTap
and more ...
perf
Originally Performance Counters for Linux (PCL), focusing on CPU performance counters (programmable registers); now a multi-tool with many subcommands, including:
kvm, list, lock, probe, record, report, sched, stat, ...
perf stat summarizes counters for a command, for example:
   ...  task-clock-msecs       #      0.901 CPUs
   ...  context-switches       #      0.000 M/sec
   ...  CPU-migrations         #      0.000 M/sec
   ...  page-faults            #      0.000 M/sec
   ...  cycles                 #   2395.230 M/sec
   ...  instructions           #      2.221 IPC      (yay)
   ...  branches               #    550.641 M/sec
   ...  branch-misses          #      1.032 %
   ...  cache-references       #      2.059 M/sec
   ...  cache-misses           #      1.211 M/sec

   2.546444859  seconds time elapsed
Low IPC (<0.2) means stall cycles (likely memory); look for ways to reduce memory I/O, and improve locality (NUMA)
Specific events can also be counted, e.g. memory-related events:
   ...  instructions           #      2.199 IPC
   ...  cycles
   ...  L1-dcache-load-misses
   ...  LLC-load-misses
   ...  dTLB-load-misses

   2.332492555  seconds time elapsed
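The commands behind output like the above might look as follows (a sketch; "gzip file1" is a stand-in workload, and the event names are standard perf events):
$ perf stat gzip file1
$ perf stat -e instructions,cycles,L1-dcache-load-misses,LLC-load-misses,dTLB-load-misses gzip file1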
[Diagram: perf stat coverage of CPUs, memory bus, and DRAM; for advanced activity (CPU interconnect, I/O bus, expander interconnect), refer to the processor manuals]
perf: Profiling
Profiling (sampling) CPU activity:
# perf record -a -g -F 997 sleep 10
[ perf record: Woken up 44 times to write data ]
[...]
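The recorded samples can then be summarized by code path (an illustrative follow-on step):
# perf report --stdio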
perf list shows the available events, including many static tracepoints ([Tracepoint event]) grouped by subsystem.
[Diagram: tracepoint groups mapped onto the stack: syscalls: for the system call interface; net:, sock:, skb: for networking; ext4: for the file system; block: for block I/O; sched: for the scheduler; vmscan:, kmem: for virtual memory; scsi:, irq: for device drivers; device stats can be inferred]
[perf report output showing kernel symbols, e.g. [k] tcp_sendmsg]
[Diagram: dynamic tracing can observe the entire kernel stack; for advanced activity, refer to the kernel source code; device stats can be inferred]
DTrace
Programmable, real-time, dynamic and static tracing
Perf analysis and troubleshooting, without restarting anything
Used on Solaris, illumos/SmartOS, Mac OS X, FreeBSD, ...
Two ports in development for Linux (that we know of):
1. dtrace4linux
Mostly by Paul Fox
2. Oracle Enterprise Linux DTrace
Steady progress
There are a couple of awesome books about DTrace too
DTrace: Installation
dtrace4linux version:
1. https://github.com/dtrace4linux/dtrace
2. README:
tools/get-deps.pl            # if using Ubuntu
tools/get-deps-fedora.sh     # RedHat/Fedora
make all
make install
make load
(need to be root or have sudo access)
# make load
tools/load.pl
13:40:14 Syncing...
13:40:14 Loading: build-3.2.6-3.fc16.x86_64/driver/dtracedrv.ko
13:40:15 Preparing symbols...
13:40:15 Probes available: 281887
13:40:18 Time: 4s
DTrace: Programming
Programming capabilities allow for powerful, efficient one-liners and scripts, with in-kernel custom filtering and aggregation. For example, summarizing sshd's tcp_sendmsg() sizes:
# dtrace -n 'fbt::tcp_sendmsg:entry /execname == "sshd"/ {
    @["bytes"] = quantize(arg3); }'
dtrace: description 'fbt::tcp_sendmsg:entry ' matched 1 probe
^C
  bytes
[power-of-two distribution of send sizes, with buckets from 16 to 8192 bytes]
The /execname == "sshd"/ predicate is the filter; quantize() is the aggregation, which summarizes in-kernel.
DTrace: Real-Time
Multiple GUIs use DTrace for real-time statistics. E.g., Joyent Cloud Analytics, showing real-time cloud-wide syscall latency.
DTrace, cont.
Has advanced capabilities, but is not necessarily difficult to use; you may just use ready-made one-liners and scripts, such as the following:
DTrace: Scripts
#!/usr/sbin/dtrace -s
fbt::vfs_read:entry
{
self->start = timestamp;
}
fbt::vfs_read:return
/self->start/
{
@[execname, "ns"] = quantize(timestamp - self->start);
self->start = 0;
}
# ./vfsread.d
dtrace: script './vfsread.d' matched 2 probes
  cksum                                              ns
           value  ------------- Distribution ------------- count
[...]
          262144 |                                         0
          524288 |@@@@@@@@@@                                834
         1048576 |                                          8
         2097152 |                                          30
         4194304 |                                          40
         8388608 |@                                         66
        16777216 |                                          28
        33554432 |                                          1

(read latency distribution, 0.5ms -> 33ms: disks)
DTrace: Basics
CLI syntax:
dtrace -n 'provider:module:function:name /predicate/ { action }'
  provider:module:function:name    probe description
  /predicate/                      optional filter
  { action }                       do this when the probe fires
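For example, counting which processes call vfs_read() (an illustrative one-liner in the same style as the vfsread.d script above):
# dtrace -n 'fbt::vfs_read:entry { @[execname] = count(); }'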
DTrace: Providers
[Diagram: DTrace providers mapped onto the stack: pid plus language providers (java, javascript, node, perl, python, php, ruby, erlang, objc, tcl, mysql, postgres, ...) for applications; syscall and plockstat at the system library / system call level; fbt and profile for the kernel; tcp, udp, ip for networking; sched, proc, cpc, vminfo, io elsewhere; device stats can be inferred]
DTrace: ext4slower.d
Show me:
ext4 reads and writes
slower than a specified latency (milliseconds)
with time, process, direction, size, latency, and file name
# ./ext4slower.d 10
Tracing ext4 read/write slower than 10 ms
TIME                  PROCESS   D   KB   ms  FILE
2013 Feb 22 17:17:02  cksum     R   64   35  100m
2013 Feb 22 17:17:02  cksum     R   64   16  1m
2013 Feb 22 17:17:03  cksum     R   64   18  data1
2013 Feb 22 17:17:03  cksum     R   64   23  data1
DTrace: tcpretransmit.d
Show me:
TCP retransmits
destination IP address
kernel stack (shows why)
in real-time
Don't sniff all packets: only trace retransmits, to minimize overhead
... can trace those stack functions directly for more detail
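A minimal sketch of the idea (not the full tcpretransmit.d: it assumes the fbt provider can probe the kernel's tcp_retransmit_skb() function, and it omits extracting the destination IP):
#!/usr/sbin/dtrace -s

fbt::tcp_retransmit_skb:entry
{
    printf("%Y", walltimestamp);   /* time of the retransmit */
    stack();                       /* kernel stack: shows why */
}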
SystemTap
Created when there weren't DTrace ports for Linux
[Diagram: SystemTap, like DTrace, can observe the entire stack: applications, system libraries, the kernel, and (by inference) devices]
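For flavor, a minimal SystemTap one-liner (a sketch; it assumes SystemTap and the kernel debuginfo are installed) counting vfs reads by process name for ten seconds:
# stap -e 'global reads; probe vfs.read { reads[execname()] <<< 1 }
           probe timer.s(10) { exit() }'
The aggregated counts are printed automatically when the session ends.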
Methodologies
Selected four:
Streetlight Anti-Method
Workload Characterization Method
Drill-Down Analysis Method
USE Method
Methodologies give beginners a starting point, casual users a
checklist, and experts a reminder
Streetlight Anti-Method
1. Pick observability tools that are
familiar
found on the Internet
found at random
2. Run tools
3. Look for obvious issues
Included for comparison (don't use this methodology)
[Screenshot: top output]
Drill-Down Analysis Method
[Diagrams: drilling down through the stack (system libraries, system call interface, VFS, ext4, block device interface, device drivers), and examining a latency distribution]
Pros:
Will identify root cause(s)
Cons:
Time consuming, especially when drilling in the wrong direction
USE Method
For every resource, check:
1. Utilization
2. Saturation
3. Errors
[Diagram: for every hardware resource (CPUs, CPU interconnect, memory bus, DRAM, I/O bus, I/O bridge, expander interconnect, I/O controller, disks, network controller, interface transports, ports), check utilization, saturation, and errors]
Example checklist entry, for the CPU resource:
Utilization: per-cpu: mpstat -P ALL 1, %idle; sar -P ALL, %idle; system-wide: vmstat 1, id; sar -u, %idle; dstat -c, idl; per-process: top, %CPU; htop, CPU%; ps -o pcpu; pidstat 1, %CPU; per-kernel-thread: top/htop (K to toggle), where VIRT == 0 (heuristic) [1]
Saturation: system-wide: vmstat 1, r > CPU count [2]; sar -q, runq-sz > CPU count; dstat -p, run > CPU count; per-process: /proc/PID/schedstat 2nd field (sched_info.run_delay); perf sched latency (shows Average and Maximum delay per-schedule); dynamic tracing, e.g. SystemTap schedtimes.stp "queued(us)" [3]
Errors: perf (LPE) if processor-specific error events (CPC) are available; e.g. AMD64's "04Ah Single-bit ECC Errors Recorded by Scrubber" [4]
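For example, a quick system-wide CPU saturation check from the checklist above (an illustrative sketch):
$ nproc       # CPU count
$ vmstat 1    # compare the "r" (run queue) column to the CPU count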
Cons:
Limited to a class of issues
Other Methodologies
Include:
Blame-Someone-Else Anti-Method
Tools Method
Ad-Hoc Checklist Method
Problem Statement Method
Scientific Method
Latency Analysis
Stack Profile Method
http://dtrace.org/blogs/brendan/2012/12/13/usenix-lisa-2012-performance-analysis-methodology/
Challenges
Performance counter analysis (e.g., bus or interconnect port analysis) is time consuming; we would like tools for convenience
Cloud Computing
Performance may be limited by cloud resource controls, rather
than physical limits
Free trial for new customers: good for $125 of usage value (~
one Small 1GB SmartMachine for 60 days). All prices subject
to change. Limited time only. Sign up at joyent.com
References
Linux man pages, source, /Documentation
USE Method: http://queue.acm.org/detail.cfm?id=2413037
http://dtrace.org/blogs/brendan/2012/03/07/the-use-methodlinux-performance-checklist/
http://dtrace.org/blogs/brendan/2012/12/13/usenix-lisa-2012performance-analysis-methodology/
https://github.com/dtrace4linux, http://www.dtracebook.com,
http://illumos.org, http://smartos.org
Thank you!
email: [email protected]
twitter: @brendangregg
blog: http://dtrace.org/blogs/brendan
blog resources:
http://dtrace.org/blogs/brendan/tag/linux-2/
http://dtrace.org/blogs/brendan/2012/02/29/the-use-method/
http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
http://dtrace.org/blogs/brendan/2011/10/15/using-systemtap/
http://dtrace.org/blogs/brendan/2012/03/07/the-use-method-linux-performancechecklist/
http://dtrace.org/blogs/brendan/2012/03/17/linux-kernel-performance-flamegraphs/