NVMe-oF An Advanced Introduction
http://nvmexpress.org/resources/specifications/
Why Use NVMe
• NVMe is an alternative to SCSI (Small Computer System Interface)
• SCSI became a standard in 1986 for connecting and transferring data between hosts and storage devices (HDD and tape)
• SCSI commands work well with disk storage, but they deliver far smaller performance gains when used with flash systems
• How?
  • Simplified I/O stack (particularly on the host side)
  • Parallel requests are easy thanks to enhanced queuing capabilities (see the sketch below)
  • NVMe provides a large number of queues (up to 64,000) and supports massive queue depth (up to 64,000 commands per queue)
  • I/O locking is not required
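To make the queuing point concrete, here is a minimal Python sketch (conceptual only, not NVMe driver code) contrasting a single lock-protected command queue, as in the traditional SCSI path, with one submission queue per core, as in NVMe:

```python
# Conceptual illustration only: this is NOT NVMe driver code. It models why many
# independent submission queues (NVMe) avoid the lock contention of a single
# shared command queue (the traditional SCSI mid-layer path).
import threading
from collections import deque

NUM_CORES = 4
IOS_PER_CORE = 10_000

# --- Single shared queue guarded by one lock (SCSI-like model) ---
shared_queue = deque()
shared_lock = threading.Lock()

def submit_shared(core_id: int) -> None:
    for i in range(IOS_PER_CORE):
        with shared_lock:                 # every core contends on one lock
            shared_queue.append((core_id, i))

# --- One queue per core, no shared lock (NVMe-like model) ---
per_core_queues = [deque() for _ in range(NUM_CORES)]

def submit_per_core(core_id: int) -> None:
    q = per_core_queues[core_id]          # private queue: no contention
    for i in range(IOS_PER_CORE):
        q.append((core_id, i))

def run(fn) -> None:
    threads = [threading.Thread(target=fn, args=(c,)) for c in range(NUM_CORES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    run(submit_shared)     # submissions serialized by the single lock
    run(submit_per_core)   # each core fills its own submission queue in parallel
    print("shared queue depth:", len(shared_queue))
    print("per-core queue depths:", [len(q) for q in per_core_queues])
```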
[Figure: host I/O stack: Block Layer → SCSI Mid Layer → SCSI Driver Layer → HBA]
Workload
• IO size 4KB, 70% read / 30% write, cache hit
• Identical workload of 200K IOPS on SCSI and NVMeFC
• 32 SCSI devices with total QD=32
• 32 NVMeFC devices with 4 associations and 64 queues total
Analysis
• Identical response time for the typical workload
• Identical IOPS for the typical workload
• SCSI maxes out at QD=32
• NVMeFC delivers the same IOPS at half the CPU consumption
• Code efficiency in the NVMeFC host stack
• Queuing in NVMeFC gives the same response time at lower CPU cost than SCSI
Workload
• IO size 4KB, 70% read / 30% write, cache hit
• Maximum workload with QD=512
• 32 SCSI devices
• 32 NVMeFC devices with 4 associations and 64 queues total
Analysis
• NVMeFC IOPS scale to 400K–500K
• NVMeFC IOPS are limited by storage target port capability
• NVMeFC shows a 50% latency drop compared with SCSI
• SCSI IOPS are limited to 220K
• SCSI performance is limited by a host stack bottleneck
• SCSI drives CPU usage to almost 70%
[Figure: NVMe inside the host chassis vs. network-attached NVMe]
** RDMA means remote direct memory access
RoCE v1 stack: RDMA Software Stack / IB Transport Protocol / IB Network Layer / Ethernet Link Layer (Ethernet management)
RoCE v2 stack: RDMA Software Stack / IB Transport Protocol / UDP/IP / Ethernet Link Layer (Ethernet / IP management)
[Figure: NVMe transport options (Fibre Channel, iWARP, RoCE, InfiniBand, PCIe function) in front of an NVMe subsystem]
FC-NVMe storage
[Figure sequence: the host connects to the storage subsystem's Discovery Controller, then to its Storage Controller.]
NOTE: At this point, the host can disconnect from the NVMe Storage Subsystem's Discovery Controller.
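For reference, a minimal sketch of this discovery-then-connect sequence as seen from a Linux host driving the nvme-cli utility; the FC transport addresses (WWNN/WWPN pairs) and the subsystem NQN below are made-up placeholders, not values from this deck:

```python
# Minimal sketch: drives nvme-cli from Python to mirror the discovery flow above.
# All WWNN/WWPN values and the NQN are hypothetical placeholders.
import subprocess

# FC transport addresses of the target port and the local HBA port (placeholders).
TARGET_TRADDR = "nn-0x200000109b1c1234:pn-0x100000109b1c1234"
HOST_TRADDR = "nn-0x200000109b1c5678:pn-0x100000109b1c5678"

def run(cmd: list) -> str:
    """Run a command and return its stdout (raises on failure)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Connect to the Discovery Controller and read the discovery log page.
discovery_log = run([
    "nvme", "discover",
    "--transport=fc",
    f"--traddr={TARGET_TRADDR}",
    f"--host-traddr={HOST_TRADDR}",
])
print(discovery_log)

# 2. Connect to the storage (I/O) controller advertised in the log page.
#    The NQN would normally be parsed from the discovery log; this one is made up.
SUBSYS_NQN = "nqn.2018-01.com.example:subsystem-01"
run([
    "nvme", "connect",
    "--transport=fc",
    f"--traddr={TARGET_TRADDR}",
    f"--host-traddr={HOST_TRADDR}",
    f"--nqn={SUBSYS_NQN}",
])

# 3. Once the I/O controller association is up, the host no longer needs the
#    Discovery Controller connection and can drop it.
```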
Write Operation Flow
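A schematic of the typical FC-NVMe write exchange, assuming the standard FCP-style IU sequence that FC-NVMe reuses (the step names below are descriptive, not taken from this deck):

```python
# Schematic only: the typical IU exchange for an FC-NVMe write, assuming the
# standard FCP-style sequence. Step descriptions are illustrative.
WRITE_FLOW = [
    ("Host -> Target", "Command IU carrying the NVMe write command capsule"),
    ("Target -> Host", "Transfer Ready (XFER_RDY) indicating buffer space"),
    ("Host -> Target", "Data IU(s) carrying the write payload"),
    ("Target -> Host", "Response IU carrying the NVMe completion entry"),
]

for direction, description in WRITE_FLOW:
    print(f"{direction}: {description}")
```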
Exploiting Dual Protocol FCP and FC-NVMe
Dual-protocol infrastructure for easiest migration
• Deploy NVMe-based arrays
• Leverage existing infrastructure
• Easily supported dual infrastructure
• How long will the transition take?
• Avoid risks
• Incremental migration
  • Applications dictate how individual volumes can be migrated
  • Changes can be rolled back easily without disruption to hardware or cabling
• FCP and NVMe over FC can both leverage FC zoning (see the sketch below)
  • SAN zoning alone improves security
  • Zoning restricts devices from accessing network areas they should not visit
• Discovery
  • Emulex creates drivers that leverage FCP for device discovery, then check for NVMe support
[Figure: servers with dual-protocol Gen6 HBAs (SCSI-on-FC and FC-NVMe) connected through the existing 32Gb FC fabric (dual-protocol SAN) to existing enterprise storage: SCSI-on-FC arrays and FC-NVMe arrays]
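As a conceptual illustration of the zoning point above (not switch configuration syntax), a small sketch modeling a zone as a set of member WWPNs and checking which device pairs are allowed to see each other:

```python
# Conceptual sketch only: models FC zoning as sets of member WWPNs.
# The WWPNs and zone names are hypothetical; real zoning is configured on the
# FC switches, not in host code.
from itertools import combinations

zones = {
    "zone_scsi_prod": {"10:00:00:90:fa:e0:01:01", "50:05:07:68:0b:21:ff:aa"},
    "zone_nvme_prod": {"10:00:00:90:fa:e0:01:02", "50:05:07:68:0b:21:ff:bb"},
}

def can_communicate(wwpn_a: str, wwpn_b: str) -> bool:
    """Two ports may talk only if at least one zone contains both of them."""
    return any(wwpn_a in members and wwpn_b in members for members in zones.values())

if __name__ == "__main__":
    all_ports = sorted(set().union(*zones.values()))
    for a, b in combinations(all_ports, 2):
        print(f"{a} <-> {b}: {'allowed' if can_communicate(a, b) else 'blocked'}")
```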