Enhancing The Monitoring Using Linux
To identify system bottlenecks and come up with solutions to fix them, you
should understand how the various components of Linux work.
For example, you should know how to identify performance-related issues such as
high CPU load, high memory utilization, high disk I/O, and high swap utilization,
and which tools and commands help narrow down the issue.
Course Contents
Subsystems to monitor:
• CPU (context switches, run queue, CPU utilization, and load average)
• Memory
• I/O
• Network
Commands:
• top
• vmstat
• iostat
• free
• lsof
• tcpdump (network packet analyzer)
Labs:
• Lab on SAR (System Activities Statistics)
• Lab on tcpdump (network packet analyzer)
• For example: number of packets received (transmitted) through the network card,
statistics of packet failures, etc.
• The lsof command, available on many Linux/Unix-like systems, displays a list of all
open files and the processes that opened them.
Linux system administrators should be proficient in Linux performance
monitoring and tuning.
At a very high level, the following four subsystems need to
be monitored:
CPU
Memory
I/O
Network
CPU
You should understand the four critical performance metrics for the CPU:
context switches, run queue, CPU utilization, and load average.
Context Switch
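When the CPU switches from one process (or thread) to another, it is called
a context switch.
Context switching is a normal part of multitasking. However, a high rate of
context switching can cause performance issues, because the CPU spends time
saving and restoring state instead of running application code.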
You can view information about your process's context switches in /proc/<pid>/status.
$ pid=307
$ grep ctxt /proc/$pid/status
voluntary_ctxt_switches: 41
nonvoluntary_ctxt_switches: 16
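The two counters above can be combined into a single summary. The following is a minimal sketch, not part of the original course material: the function name ctxt_summary is hypothetical, and it assumes the status text uses the two field names shown in the output above.

```shell
# Summarize context switches from the text of a /proc/<pid>/status file.
# Reads the status text on stdin and prints: voluntary nonvoluntary total
# (ctxt_summary is a hypothetical helper name used for illustration.)
ctxt_summary() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
        END { printf "%d %d %d\n", v, n, v + n }
    '
}

# Usage on a Linux host (PID 307 is the example PID from the text):
#   ctxt_summary < /proc/307/status
```

A large share of nonvoluntary switches suggests the process is being preempted, which usually means it is competing with other processes for the CPU.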
Run Queue
The run queue holds the processes that are runnable and waiting for the
CPU.
When the CPU is ready to execute a process, it picks one from the run
queue based on the priority of the process.
Please note that processes in a sleep state or an I/O wait state are
not in the run queue.
A consistently large number of processes in the run queue means processes
are waiting for CPU time, which can cause performance issues.
CPU Utilization
Load Average
This indicates the average CPU load over a specific time period.
On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15
minutes. This is helpful to see whether the overall load on the system is going up
or down.
For example, a load average of “0.75 1.70 2.10” indicates that the load on the
system is coming down. 0.75 is the load average in the last 1 minute. 1.70 is the
load average in the last 5 minutes. 2.10 is the load average in the last 15 minutes.
Please note that this load average is calculated by combining the total
number of processes in the run queue and the total number of processes in the
uninterruptible task state.
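The comparison of the three load averages described above can be sketched as a small helper. This is an illustration only: load_trend is a hypothetical name, and it assumes input in the /proc/loadavg format, where the first three fields are the 1-, 5-, and 15-minute averages.

```shell
# Classify the load trend from the text of /proc/loadavg (read on stdin).
# Fields 1-3 are the 1-, 5-, and 15-minute load averages.
# (load_trend is a hypothetical helper name used for illustration.)
load_trend() {
    awk '{
        if ($1 < $2 && $2 < $3)      print "falling"
        else if ($1 > $2 && $2 > $3) print "rising"
        else                         print "mixed"
    }'
}

# Usage on a Linux host:
#   load_trend < /proc/loadavg
```

With the example values from the text, "0.75 1.70 2.10", the helper reports a falling load.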
Memory
As you know, RAM is your physical memory. If you have 4GB RAM installed
on your system, you have 4GB of physical memory.
Virtual memory = Swap space available on the disk + Physical memory. The
virtual memory contains both user space and kernel space.
Whether you use a 32-bit or a 64-bit system makes a big difference in
how much memory a process can use.
On a 32-bit system, a process can address at most 4GB of virtual
memory. On a 64-bit system, the addressable limit is far larger and is not a
practical constraint.
Swap
Swap space in Linux is used when the amount of physical memory (RAM) is
full. If the system needs more memory resources and the RAM is full,
inactive pages in memory are moved to the swap space. While swap space
can help machines with a small amount of RAM, it should not be considered
a replacement for more RAM. Swap space is located on hard drives, which
have a slower access time than physical memory.
Swap space can be a dedicated swap partition (recommended), a swap file,
or a combination of swap partitions and swap files.
I/O
I/O wait is the amount of time the CPU spends waiting for I/O. If you see consistently
high I/O wait on your system, it usually points to a bottleneck in the disk subsystem.
You should also monitor reads per second and writes per second. These are measured
in blocks, i.e. the number of blocks read or written per second, and are also
referred to as bi and bo (blocks in and blocks out).
tps indicates total transactions per second, which is the sum of rtps (read
transactions per second) and wtps (write transactions per second).
Network
For network interfaces, you should monitor the total number of packets (and
bytes) received and sent through the interface, the number of packets dropped, etc.
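The counters mentioned above can be read from the kernel's per-interface table. The following is a rough sketch rather than a prescribed method: net_dev_summary is a hypothetical name, and it assumes the usual /proc/net/dev layout of two header lines followed by one line per interface with eight receive fields and then eight transmit fields.

```shell
# Print per-interface packet and drop counters from /proc/net/dev text (stdin).
# Output columns: interface rx_packets tx_packets rx_dropped tx_dropped
# (net_dev_summary is a hypothetical helper name used for illustration.)
net_dev_summary() {
    awk 'NR > 2 {
        sub(/:/, " ")    # turn "eth0:123" into "eth0 123" so fields split cleanly
        # After the split: $3 = rx packets, $5 = rx drops,
        #                  $11 = tx packets, $13 = tx drops
        print $1, $3, $11, $5, $13
    }'
}

# Usage on a Linux host:
#   net_dev_summary < /proc/net/dev
```

The counters are cumulative since boot, so run the helper twice and compare the results to see the current rate.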
Commands to manage performance issues in Linux Servers
Listed below are some of the commands, including top, vmstat, iostat, free, and
sar. They may help in resolving performance issues quickly and easily.
Top
vmstat
The ‘vmstat’ command reports current CPU, I/O, process, and memory usage.
Like the top command, it can update dynamically. To print a new report
every 10 seconds, run:
$ vmstat 10
The output below includes inactive and active memory, which vmstat reports
when run with the -a option:
# vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 810420  97380  70628    0    0   115     4   89   79  1  6 90  3  0
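For the CPU metrics discussed earlier, the most relevant vmstat columns are r (run queue), cs (context switches), and wa (I/O wait). The sketch below pulls out just those three; vmstat_pick is a hypothetical name, and the helper assumes the column header row starts with "r" as in the output above.

```shell
# Extract the r (run queue), cs (context switches), and wa (I/O wait)
# columns from vmstat output read on stdin. Columns are located by name
# from the header row, so extra columns do not matter.
# (vmstat_pick is a hypothetical helper name used for illustration.)
vmstat_pick() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header row
        ("r" in col) && $1 ~ /^[0-9]+$/ {
            print $(col["r"]), $(col["cs"]), $(col["wa"])
        }
    '
}

# Usage:
#   vmstat 1 5 | vmstat_pick
```

Locating the columns by name rather than by fixed position keeps the helper working whether or not the -a option is used.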
sar
Use the ‘sar’ command line tool to collect, view and record performance data.
This command is considerably more sophisticated than all the commands
discussed above. It can collect and display data over longer periods.
iostat
To identify whether I/O is causing system slowness, you can use several commands, but
the easiest is the Unix command top: check the I/O wait value (wa) on its CPU line.
free
The ‘free’ command shows memory statistics for both main memory and
swap. Specify the -t switch to add a line with the combined total.
Specify the -b switch to display the amounts in bytes, or the -m switch to
display them in megabytes (the default is kilobytes).
free can also run continuously when you give the -s switch a delay
in seconds:
$ free -s 5
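The figures that free prints can be turned into a used-memory percentage. This is a minimal sketch under stated assumptions: mem_used_percent is a hypothetical name, and it assumes the "Mem:" row lists total in the second field and used in the third. What counts as "used" differs between versions of free, so treat the result as approximate.

```shell
# Print used memory as a percentage of total, from free output read on stdin.
# Assumes the "Mem:" row has total in field 2 and used in field 3.
# (mem_used_percent is a hypothetical helper name used for illustration.)
mem_used_percent() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%.1f\n", $3 * 100 / $2 }'
}

# Usage:
#   free -m | mem_used_percent
```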
The lsof command, available on many Linux/Unix-like systems, displays a
list of all open files and the processes that opened them. The open files
included are disk files, network sockets, pipes, and devices. One of the main
reasons for using this command is when a disk cannot be unmounted and displays
an error that files are in use or open. With this command you can
easily identify which files are in use. The most common form of this
command is:
$ lsof
SAR (System Activities Statistics)
Using sar you can monitor performance of various Linux subsystems (CPU,
Memory, I/O, Network Statistics) in real time.
Using sar, you can also collect all performance data on an on-going basis,
store them, and do historical analysis to identify bottlenecks.
First, make sure the latest version of sar is available on your system. Install
it using any one of the following methods depending on your distribution.
Once installed, verify the sar version using “sar -V”. Version 10 is
the current stable version of sysstat.
$ sar -V
LinuxGuru@Server#sar -u 1 2
Linux 2.6.18-404.el5 04/09/17
sar -u                      Displays CPU usage for the current day, collected up to that point.
sar -u 1 3                  Displays real-time CPU usage every 1 second, 3 times.
sar -u ALL                  Same as “sar -u” but displays additional fields.
sar -u ALL 1 3              Same as “sar -u 1 3” but displays additional fields.
sar -u -f /var/log/sa/sa10  Displays CPU usage for the 10th day of the month from the sa10 file.
If you have 4 cores on the machine and would like to see what the
individual cores are doing, use the -P option.
“-P ALL” indicates that sar should display statistics for ALL the individual
cores.
LinuxGuru@Server#sar -P ALL 1 1
Linux 2.6.18-404.el5 04/09/17
LinuxGuru@Server#sar -r 1 3
Linux 2.6.18-404.el5 04/09/17
sar -P ALL                      Displays CPU usage broken down by core for the current day.
sar -P ALL 1 3                  Displays real-time CPU usage for ALL cores every 1 second, 3 times.
sar -P 1                        Displays CPU usage for core number 1 for the current day.
sar -P 1 1 3                    Displays real-time CPU usage for core number 1 every 1 second, 3 times.
sar -P ALL -f /var/log/sa/sa10  Displays CPU usage broken down by core for the 10th day of the month from the sa10 file.
sar -r
sar -r 1 3
sar -r -f /var/log/sa/sa10
tps – Transactions per second (this includes both reads and writes)
rtps – Read transactions per second
wtps – Write transactions per second
bread/s – Blocks read per second (a block is 512 bytes)
bwrtn/s – Blocks written per second
LinuxGuru@Server#sar -b 1 3
Linux 2.6.18-404.el5 04/09/17
sar -b
sar -b 1 3
sar -b -f /var/log/sa/sa10
Note: Use “sar -v” to display the number of inode handlers, file handlers,
and pseudo-terminals used by the system.
LinuxGuru@Server#sar -d 1 1
Linux 2.6.18-404.el5 04/09/17
10:41:07      DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
10:41:08   dev8-0   2.00      0.00    176.00     88.00      0.00   1.00   1.00   0.20
10:41:08   dev8-1   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:41:08   dev8-2   2.00      0.00    176.00     88.00      0.00   1.00   1.00   0.20
10:41:08  dev8-16   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:41:08  dev8-17   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.20
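On a system with many devices, it can help to list only the busy ones. The following sketch filters "sar -d" output by the %util column; busy_devices and its default threshold of 80 are hypothetical choices, and the helper assumes a 24-hour timestamp in the first field, as in the output above.

```shell
# List devices whose %util exceeds a threshold, from "sar -d" output on stdin.
# Argument 1 is the threshold in percent (defaults to 80, an arbitrary choice).
# Assumes a single timestamp field precedes the DEV column.
# (busy_devices is a hypothetical helper name used for illustration.)
busy_devices() {
    awk -v limit="${1:-80}" '
        $2 == "DEV" {                       # header row: find the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 > limit + 0 { print $2, $u }
    '
}

# Usage:
#   sar -d 1 3 | busy_devices 50
```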
The device name (DEV column) can display the actual device name (for example: sda, sda1, sdb1, etc.) if you
use the -p option (pretty print), as shown below.
LinuxGuru@Server#sar -p -d 1 1
Linux 2.6.18-404.el5 04/09/17
10:42:18      DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
10:42:19      sda   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19     sda1   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19     sda2   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19      sdb   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19     sdb1   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19      sdc   0.00      0.00      0.00      0.00      0.00   0.00   0.00   0.00
sar -d
sar -d 1 3
sar -d -f /var/log/sa/sa10
sar -p -d
“sar -q” reports the run queue size and the load average for the last 1 minute, 5
minutes, and 15 minutes. “1 3” reports every 1 second, a total
of 3 times.
Linux Performance Monitoring and Tuning
Introduction
LinuxGuru@Server#sar -q 1 3
Linux 2.6.18-404.el5 04/09/17
Note: The “blocked” column displays the number of tasks that are currently
blocked and waiting for an I/O operation to complete.
sar -q
sar -q 1 3
sar -q -f /var/log/sa/sa10
Use “sar -n KEYWORD” to report network statistics. KEYWORD can be one of the following:
DEV – Displays vital statistics for network devices such as eth0, eth1, etc.
EDEV – Display network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
ALL – This displays all of the above information. The output will be very long.
$ sar -n DEV 1 1
LinuxGuru@Server#sar -n DEV 1 1
Linux 2.6.18-404.el5 04/09/17
When you view historic sar data from the /var/log/sa/saXX file using the
“sar -f” option, it displays all the sar data for that specific day starting from
12:00 a.m.
Using the “-s hh:mi:ss” option, you can specify the start time. For example, if
you specify “sar -s 10:00:00”, it displays the sar data starting from
10 a.m. (instead of starting from midnight).
For example, to report the load average on the 26th of this month starting
from 10 a.m., combine the -q and -s options with the -f option for that
day's file:
# sar -q -f /var/log/sa/sa26 -s 10:00:00
CPU Utilization:
# sar -f /var/log/sa/sa11 -u 2 -s 06:30:00 -e 07:30:00
Linux 2.6.32-431.20.3.el6.s390x 11/10/16 _s390x_ (1 CPU)
06:40:01 vg_root-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
06:50:01 vg_root-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
07:00:01 vg_root-lv_swap 0.12 0.75 0.25 8.00 0.00 15.33 2.40 0.03
07:10:01 vg_root-lv_swap 0.12 0.21 0.72 8.00 0.00 17.29 6.57 0.08
07:20:01 vg_root-lv_swap 0.00 0.00 0.01 8.00 0.00 10.00 10.00 0.00
Average: vg_root-lv_swap 0.05 0.19 0.20 8.00 0.00 16.23 4.45 0.00
Disk await is high during the same period, and the device is the swap volume. The system is trying to access the swap
disk, but its requests wait a long time to be served. So swap utilization looks normal, but swap-in and swap-out requests are slow to complete.
Tcpdump
# tcpdump -i eth0
# tcpdump -c 5 -i eth0
# tcpdump -D
1.eth0
2.eth1
To read and analyze the captured packet file 0001.pcap, use the command with the -r
option, as shown below.
# tcpdump -r 0001.pcap
# tcpdump -n -i eth0
To capture packets for a specific port, for example port 22, specify the
port number in the command:
# tcpdump -i eth0 port 22
lsof
High CPU Utilization
• The commands below can be used to find the processes consuming the
most CPU:
• top
• ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 2 | head
High Memory Utilization
• top
• ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 1 | head
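The CPU and memory pipelines differ only in the sort column, so they can share one helper. The following is a sketch, not the course's own tooling: top_by_column is a hypothetical name, column 1 is pmem and column 2 is pcpu in the field order used above, and the default of 10 rows mirrors the default of head.

```shell
# Sort ps output (read on stdin) by a numeric column, highest first.
# Argument 1: column number to sort on (1 = pmem, 2 = pcpu in the
#             "pmem,pcpu,pid,args" field order used above).
# Argument 2: number of rows to print (defaults to 10).
# (top_by_column is a hypothetical helper name used for illustration.)
top_by_column() {
    tail -n +2 | sort -rn -k "$1,$1" | head -n "${2:-10}"
}

# Usage:
#   ps -eo pmem,pcpu,pid,args | top_by_column 2      # biggest CPU consumers
#   ps -eo pmem,pcpu,pid,args | top_by_column 1 5    # top 5 memory consumers
```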
Swap
How to Increase Swap in Linux
END of this Course Module.
Thanks