High Performance Storage with blk-mq and scsi-mq
Christoph Hellwig
2014 Storage Developer Conference
Problem Statement
[Figure: aggregate IOPS (y-axis, ticks 0 to 800,000) versus number of LUNs (x-axis, 1 to 16).]
Linux Storage Stack - Issues
BIO submission
Device mapper, Software RAID
Request layer
SCSI layer
Processes
Software contexts (per-CPU)
Hardware contexts (based on HW capabilities)
HBA
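The layering above (processes feed per-CPU software contexts, which feed a number of hardware contexts set by the HBA) can be illustrated with a small sketch. This is not the kernel's mapping code; the function name `map_cpus_to_hw_queues` and the simple even-spread rule are assumptions made for illustration, and the real mapping may also take CPU topology into account.

```python
from typing import List


def map_cpus_to_hw_queues(nr_cpus: int, nr_hw_queues: int) -> List[int]:
    """Return, for each CPU's software context, the hardware context it feeds.

    Hypothetical helper: CPUs are spread evenly over the hardware contexts.
    When the hardware offers more contexts than there are CPUs, only the
    first nr_cpus contexts are used, one per CPU.
    """
    if nr_cpus < 1 or nr_hw_queues < 1:
        raise ValueError("need at least one CPU and one hardware context")
    usable = min(nr_cpus, nr_hw_queues)
    # Integer scaling keeps neighbouring CPUs on the same hardware context.
    return [cpu * usable // nr_cpus for cpu in range(nr_cpus)]


if __name__ == "__main__":
    for cpus, hw in [(8, 2), (4, 8), (4, 1)]:
        print(cpus, "CPUs,", hw, "HW contexts ->", map_cpus_to_hw_queues(cpus, hw))
```

With 8 CPUs and 2 hardware contexts, CPUs 0 to 3 share context 0 and CPUs 4 to 7 share context 1. With a single hardware context, every software context funnels into it.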
Blk-mq – request allocation and tagging
[Figure: aggregate IOPS (y-axis, ticks 200,000 to 1,000,000) versus number of LUNs, comparing Linux 2.6.32 with 3.17-rc3 (with blk-mq).]
SCSI profiling data
Linux 2.6.32
46.13% [kernel] [k] _spin_lock_irq
26.92% [kernel] [k] _spin_lock_irqsave
9.32% [kernel] [k] _spin_lock
0.47% [kernel] [k] kmem_cache_alloc
0.45% [kernel] [k] scsi_request_fn
0.39% [kernel] [k] _spin_unlock_irqrestore
0.33% [kernel] [k] kref_get
0.32% [kernel] [k] __blockdev_direct_IO_newtrunc
0.32% [kernel] [k] kmem_cache_free
0.30% [kernel] [k] native_write_msr_safe

Linux 3.17-rc3 (with blk-mq)
2.67% [kernel] [k] do_blockdev_direct_IO
2.60% [kernel] [k] __bt_get
2.43% [kernel] [k] __blk_mq_run_hw_queue
2.07% [kernel] [k] put_compound_page
1.87% [kernel] [k] __blk_mq_alloc_request
1.60% [kernel] [k] _raw_spin_lock
1.59% [kernel] [k] kmem_cache_alloc
1.58% [kernel] [k] scsi_queue_rq
1.44% [kernel] [k] _raw_spin_lock_irqsave
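To put a number on how much of each profile goes to spinlocks, the listed samples can be summed. The sketch below copies the percentages from the two profiles on this slide. The helper names `parse_profile` and `spinlock_share` are hypothetical, and matching on the substring `spin_` is an assumed heuristic rather than anything perf provides.

```python
import re

# Samples copied from the "SCSI profiling data" slide.
PROFILE_2_6_32 = """
46.13% [kernel] [k] _spin_lock_irq
26.92% [kernel] [k] _spin_lock_irqsave
9.32% [kernel] [k] _spin_lock
0.47% [kernel] [k] kmem_cache_alloc
0.45% [kernel] [k] scsi_request_fn
0.39% [kernel] [k] _spin_unlock_irqrestore
0.33% [kernel] [k] kref_get
0.32% [kernel] [k] __blockdev_direct_IO_newtrunc
0.32% [kernel] [k] kmem_cache_free
0.30% [kernel] [k] native_write_msr_safe
"""

PROFILE_3_17_RC3 = """
2.67% [kernel] [k] do_blockdev_direct_IO
2.60% [kernel] [k] __bt_get
2.43% [kernel] [k] __blk_mq_run_hw_queue
2.07% [kernel] [k] put_compound_page
1.87% [kernel] [k] __blk_mq_alloc_request
1.60% [kernel] [k] _raw_spin_lock
1.59% [kernel] [k] kmem_cache_alloc
1.58% [kernel] [k] scsi_queue_rq
1.44% [kernel] [k] _raw_spin_lock_irqsave
"""

# One sample per line: "<percent>% [kernel] [k] <symbol>"
LINE = re.compile(r"^\s*(\d+\.\d+)%\s+\[kernel\]\s+\[k\]\s+(\S+)\s*$")


def parse_profile(text):
    """Return a list of (symbol, percent) pairs from perf-style lines."""
    samples = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            samples.append((match.group(2), float(match.group(1))))
    return samples


def spinlock_share(samples):
    """Sum the percentages of symbols whose name contains 'spin_'."""
    return sum(pct for symbol, pct in samples if "spin_" in symbol)


if __name__ == "__main__":
    for name, text in [("2.6.32", PROFILE_2_6_32), ("3.17-rc3", PROFILE_3_17_RC3)]:
        share = spinlock_share(parse_profile(text))
        print(f"Linux {name}: {share:.2f}% of samples in spinlock symbols")
```

Counting only the symbols listed on the slide, spinlock functions account for roughly 82.8% of samples on 2.6.32 and roughly 3.0% on 3.17-rc3 with blk-mq.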
Linux SCSI Performance
Multiple LUN performance, single threaded - SRP attached null_io target
[Figure: IOPS (left axis, ticks 200,000 to 1,400,000) and CPU usage (right axis, ticks 20% to 140%) versus number of LUNs.]
[Figure: IOPS (ticks 0 to 1,200,000) for kernels 3.14.3 and 3.16+ across four workloads: random read, 12 threads; random write, 12 threads; random read, 1 thread; random write, 1 thread.]
SCSI blk-mq status - near term work
Benchmarks:
– Bart van Assche (Fusion-io / Sandisk):
• https://docs.google.com/file/d/0B1YQOreL3_FxWmZfbl8xSzRfdGM/edit?pli=1