Linux on z/VM - Understanding Disk I/O
http://zvmperf.wordpress.com/
Linux on z/VM Tuning Objective
Resource Efficiency
Achieve SLA at minimal cost
• “As Fast As Possible” is a very expensive SLA target
Scalability has its limitations
• The last 10% peak capacity is often the most expensive
Benchmark Challenges
[Charts: benchmark throughput (MB/s) for write, rewrite, read, reread, random read and random write]
Anatomy of Basic Disk I/O
Selection Criteria
Capacity
Price
Reality: in comparison, disk I/O today is slow

                   IBM 3380-AJ4 (1981)   Seagate Momentus 7200.3 (2011)
Seek Time          12 ms                 11 ms
Device Interface   2.7 MB/s              150 MB/s

© 2010 Brocade, SHARE in Seattle, “Understanding FICON I/O Performance”
Anatomy of Basic Disk I/O
Average I/O Operation
• Seek over 1/3 of the tracks ~ 10 ms
• Wait for 1/2 a rotation ~ 3 ms
• Read the data ~ 1 ms
[Diagram: host issues Start I/O; disk performs Seek, Locate, Transfer Data; I/O response time at the host versus I/O rate at the disk]
Host and disk decoupled by speed matching buffer
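A back-of-the-envelope check of these numbers, as a small Python sketch using the figures above:

seek_ms = 10.0      # seek over ~1/3 of the tracks
rotate_ms = 3.0     # wait for ~1/2 a rotation
transfer_ms = 1.0   # read the data

service_ms = seek_ms + rotate_ms + transfer_ms
print(f"~{service_ms:.0f} ms per random I/O, roughly {1000 / service_ms:.0f} I/O per second per disk")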
Classic DASD Configuration
Contemporary Disk Subsystem
RAID Configuration
Performance Considerations
The drives are “just disks”
RAID does not avoid latency
Large data cache to avoid I/O
Cache replacement strategy
RAID Configuration
Cache read hit
• Data available in subsystem cache
• No DISC time
Cache read miss
• Back-end reads to collect data
• Service time unrelated to logical I/O
[Chart: probability distribution of response time]
RAID Configuration
Example:
Cache Hit Ratio 90%
Average DISC 0.5 ms
Service Time Miss 5 ms
Read Prediction
Detecting sequential I/O
ECKD: Define Extent
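A quick check of the example numbers, as a small Python sketch:

hit_ratio = 0.90
disc_hit_ms = 0.0    # cache hit: data already in the subsystem cache, no DISC time
disc_miss_ms = 5.0   # cache miss: service time of the back-end reads

avg_disc_ms = hit_ratio * disc_hit_ms + (1 - hit_ratio) * disc_miss_ms
print(f"average DISC = {avg_disc_ms:.1f} ms")   # 0.5 ms, matching the example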
Disk I/O Example
         <----------Rates (per sec)-------->
<Processor Pct Util> Idle <-Swaps-> <-Disk IO-> Switch Intrpt
Time Node Total Syst User Nice Pct In Out In Out Rate Rate
-------- -------- ----- ---- ---- ---- ---- ---- ---- ----- ----- ------ ------
15:12:00 roblnx2 5.9 5.7 0.2 0 60.2 0 0 0 210K 272.1 0

Callouts: 210K blocks per second = 105 MB/s -> 6.3 GB written
          105 MB/s & 272 context switches -> ~ 400 KB I/O's
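The callouts can be reproduced with a few lines of Python; the 512-byte block size and the one-minute monitor interval are assumptions, not part of the report:

blocks_per_sec = 210_000   # Disk IO "Out" column: 210K blocks per second
io_rate = 272.1            # context switch rate, used as a proxy for I/O completions
block_bytes = 512          # assumption: 512-byte blocks
interval_sec = 60          # assumption: one-minute monitor interval

kb_per_sec = blocks_per_sec * block_bytes / 1024           # 105,000 KB/s, i.e. ~105 MB/s
kb_per_io = kb_per_sec / io_rate                           # ~386 KB, i.e. ~400 KB per I/O
gb_per_interval = kb_per_sec * interval_sec / 1_000_000    # ~6.3 GB written per interval
print(f"{kb_per_sec / 1000:.0f} MB/s, ~{kb_per_io:.0f} KB per I/O, {gb_per_interval:.1f} GB per interval")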
Parallel Access Volumes
[Diagram: z/VM LPAR and channel subsystem connected through FICON channels to the disk subsystem (cache, ECKD emulation); volumes a, b and c each reached through a single subchannel]
Parallel Access Volumes
[Diagram: same configuration with alias subchannels; each logical volume (a, b, c) is now addressed through multiple subchannels]
Parallel Access Volumes
Example
Cache hit ratio of 90%
• Cache hit response time 0.5 ms
• Cache miss response 5.5 ms
[Timeline: base and alias subchannel activity over elapsed time; CONN 0.3 ms]
Parallel Access Volumes
[Timeline: a single subchannel serializes the I/O (PEND 0.2 ms, DISC 5.0 ms, CONN 0.3 ms on the base); alias subchannels let further I/O proceed during the same elapsed time]
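A minimal Python sketch of the effect, using the response times from the example; the subchannel counts and the assumption that the cache back-end can absorb the extra concurrency are illustrative:

hit_ms, miss_ms, hit_ratio = 0.5, 5.5, 0.90

# The average response time of a single I/O is unchanged by PAV:
avg_ms = hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms
print(f"average response time per I/O: {avg_ms:.1f} ms")          # 1.0 ms

# But each alias lets one more I/O be in flight against the same logical volume,
# so the volume is no longer capped at one I/O per avg_ms (back-end limits ignored).
for subchannels in (1, 2, 4):   # base only, base + 1 alias, base + 3 aliases
    print(f"{subchannels} subchannel(s): up to {subchannels * 1000 / avg_ms:.0f} I/O per second")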
Parallel Access Volumes
Performance Benefits
1. Access to cached data while previous I/O is still active
• Avoids DISC time for cache miss
2. Queuing the request closer to the device
• Avoid IOSQ and PEND time
3. Multiple operations in parallel retrieving data from cache
• Utilize multiple channels for single logical volume
Restrictions
Parallel Access Volumes
Static PAV
Alias devices assigned in DASD Subsystem configuration
Association observed by host Operating System
Dynamic PAV
Assignment can be changed by higher power (z/OS WLM)
Moving an alias takes coordination between parties
Linux and z/VM tolerate but do not initiate Dynamic PAV
HyperPAV
Pool of alias devices is associated with set of base devices
Alias is assigned for the duration of a single I/O
Closest to “infinite number of alias devices assumed”
Linux Disk I/O
Diagnose I/O
• High-level Disk I/O protocol
• Easier to manage
• Synchronous and Asynchronous
[Diagram: z/VM LPAR and channel subsystem with virtual disks x, y, z]
Linux Disk I/O
• Minidisk or dedicated DASD
No obvious performance favorite
• Very workload dependent
[Diagram: z/VM LPAR and channel subsystem with disks x, y, z]
Linux Disk I/O
[Diagram: the virtual machine issues Start I/O; CP performs CCW translation, paging and dispatching before starting the real I/O; the real I/O response time consists of IOSQ, PEND (command transfer), DISC and CONN (data transfer); when the real I/O completes, CP reflects a virtual I/O interrupt and the data is available to the guest]
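A small Python sketch that adds up the components; the values are illustrative, not measured:

cp_ms   = 0.1   # CP overhead seen by the guest: CCW translation, paging, dispatching
iosq_ms = 0.0   # IOSQ: queued in the operating system waiting for the subchannel
pend_ms = 0.2   # PEND: command transfer until the device accepts the I/O
disc_ms = 0.5   # DISC: disconnected while the subsystem locates or stages the data
conn_ms = 0.3   # CONN: connected to the channel, transferring the data

real_ms = iosq_ms + pend_ms + disc_ms + conn_ms
virtual_ms = real_ms + cp_ms
print(f"real I/O response ~ {real_ms:.1f} ms, as seen by the virtual machine ~ {virtual_ms:.1f} ms")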
Linux Disk I/O
Buffered I/O
By default Linux will buffer application I/O using Page Cache
• Lazy Write – updates written to disk at “later” point in time
• Data Cache – keep recently used data “just in case”
• Read Ahead – avoid I/O for sequential reading
Performance improvement
• More efficient disk I/O
[Chart: buffered I/O read and write throughput (MB/s) by block size, 4 KB to 512 KB]
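The three page cache behaviours can be exercised from user space; a minimal Python sketch (Linux only, file name arbitrary):

import os

# Lazy write: a buffered write completes as soon as the data is in the page cache;
# fsync() forces it out to the disk subsystem.
with open("buffered.dat", "wb") as f:
    f.write(b"A" * (1 << 20))     # 1 MiB lands in the page cache
    f.flush()                     # drain Python's own userspace buffer
    os.fsync(f.fileno())          # now force the page cache out to disk

# Data cache and read ahead: advise the kernel we will read sequentially so it can
# prefetch; the read is then likely served from the page cache.
fd = os.open("buffered.dat", os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
data = os.read(fd, 1 << 20)
os.close(fd)
print(len(data), "bytes read")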
Linux Disk I/O
Direct I/O
• Overlap of I/O and processing
[Chart: direct I/O vs buffered I/O write throughput (MB/s) by block size, 4 KB to 512 KB]
http://zvmperf.wordpress.com/2012/04/17/cpu-cost-of-buffered-io/
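For comparison, a minimal Python sketch of direct I/O (Linux only): O_DIRECT bypasses the page cache and requires block-aligned buffers, lengths and offsets, which is why the buffer comes from mmap. The block size and file name are assumptions:

import mmap
import os

BLOCK = 4096                       # assumed logical block size used for alignment

buf = mmap.mmap(-1, BLOCK)         # anonymous mmap memory is page aligned
buf.write(b"x" * BLOCK)

# O_DIRECT: the write bypasses the page cache and goes straight to the device,
# so there is no lazy write and nothing is left behind in the data cache.
fd = os.open("direct.dat", os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
try:
    os.write(fd, buf)
finally:
    os.close(fd)
    buf.close()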
Linux Disk I/O
Synchronous I/O
Single threaded application model
Processing and I/O are interleaved
[Diagram: a transaction as alternating CPU and I/O phases]
Asynchronous I/O
Allow for overlap of processing and I/O
Improves single application throughput
Assumes a balance between I/O and CPU
[Diagram: CPU phases overlapping the I/O phases]
Matter of Perspective
From a high level everything is asynchronous
Looking closer, everything is serialized again
Linux on z/VM
Many virtual machines competing for resources
Processing of one user overlaps I/O of the other
Unused capacity is not wasted
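A minimal Python sketch of the overlap idea (not the benchmark behind the charts): a worker thread reads the next chunk while the main thread processes the current one; file name and sizes are arbitrary:

from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024
PATH = "testdata.bin"

with open(PATH, "wb") as f:            # create some test data so the sketch is self-contained
    f.write(b"\xab" * (8 * CHUNK))

def read_chunk(offset):
    # ordinary synchronous read, executed on the worker thread
    with open(PATH, "rb") as f:
        f.seek(offset)
        return f.read(CHUNK)

def process(chunk):
    # stand-in for the CPU half of the transaction
    return sum(chunk) & 0xFF

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(read_chunk, 0)             # start the first read
    for offset in range(CHUNK, 8 * CHUNK, CHUNK):
        chunk = pending.result()                     # wait for the outstanding read
        pending = pool.submit(read_chunk, offset)    # issue the next read ...
        process(chunk)                               # ... and overlap CPU work with it
    process(pending.result())
print("done")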
Linux Disk I/O
top - 11:49:20 up 38 days, 21:27, 2 users, load average: 0.57, 0.13, 0.04
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 1.3%sy, 0.0%ni, 0.0%id, 96.7%wa, 0.3%hi, 0.3%si, 1.0%st
top - 11:53:32 up 38 days, 21:31, 2 users, load average: 0.73, 0.38, 0.15
Tasks: 55 total, 3 running, 52 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 31.1%sy, 0.0%ni, 0.0%id, 62.5%wa, 0.3%hi, 4.3%si, 1.7%st
Linux Disk I/O
top - 11:53:32 up 38 days, 21:31, 2 users, load average: 0.73, 0.38, 0.15
Tasks: 55 total, 3 running, 52 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 31.1%sy, 0.0%ni, 0.0%id, 62.5%wa, 0.3%hi, 4.3%si, 1.7%st
http://zvmperf.wordpress.com/2013/02/28/explaining-linux-steal-percentage/
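The %wa and %st columns come from the per-mode tick counters in /proc/stat; a small Python sketch that samples them the way top does (Linux only):

import time

def cpu_ticks():
    # aggregate "cpu" line: user nice system idle iowait irq softirq steal
    with open("/proc/stat") as f:
        return [int(v) for v in f.readline().split()[1:9]]

before = cpu_ticks()
time.sleep(5)
delta = [a - b for a, b in zip(cpu_ticks(), before)]
total = sum(delta) or 1

labels = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]
print(", ".join(f"{100 * d / total:.1f}%{l}" for l, d in zip(labels, delta)))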
Linux Disk I/O
[Diagram: logical volume built by concatenation versus striping across physical volumes]
Linux Disk I/O
Disk Striping
Function provided by LVM and mdadm
Engage multiple disks in parallel for your workload
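A minimal Python sketch of the striping idea itself; the stripe size and DASD names are made up and this is not an LVM or mdadm API:

STRIPE = 64 * 1024                                  # assumed stripe (chunk) size in bytes
DISKS = ["dasdb1", "dasdc1", "dasdd1", "dasde1"]    # hypothetical physical volumes

def locate(logical_offset):
    chunk = logical_offset // STRIPE
    disk = DISKS[chunk % len(DISKS)]                # chunks rotate round-robin over the disks
    disk_offset = (chunk // len(DISKS)) * STRIPE + logical_offset % STRIPE
    return disk, disk_offset

# 256 KB of sequential I/O touches all four disks, so they can work in parallel
for off in range(0, 256 * 1024, STRIPE):
    print(f"logical {off:>7} -> {locate(off)}")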
The Mystery of Lost Disk Space
¹ Claim in various IBM presentations
Conclusion
Linux on z/VM Performance