1 - Introduction to InfiniBand
Unit 1
Outline
• InfiniBand Trade Association (IBTA)
• What is InfiniBand?
• Why InfiniBand?
Introduction to InfiniBand
What Is InfiniBand?
An interconnect technology for connecting processor nodes and I/O nodes to form a system area network
▪ The architecture is independent of the host operating system (OS) and processor platform
▪ Open industry-standard specification
▪ Defines an input/output architecture
▪ Offers point-to-point bidirectional serial links
InfiniBand Trade Association (IBTA)
InfiniBand is an open standard interconnect protocol developed by the
InfiniBand® Trade Association (IBTA)
▪ The InfiniBand specification was released in 2000
▪ The specification provides a solution starting from the hardware layer to the application layer
▪ InfiniBand software is developed under the OpenFabrics Alliance
▪ The InfiniBand technology is based on the InfiniBand specification
▪ InfiniBand now connects more than 50 percent of the TOP500 supercomputing list
InfiniBand Main Use Cases
▪ Link bandwidth example: 4 lanes × 50 Gb/s per single lane = 200 Gb/s
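The lane arithmetic above can be sketched as a quick calculation (the 4 × 50 Gb/s figures are the example values from the slide):

```python
def link_bandwidth_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate link bandwidth: number of lanes times per-lane rate."""
    return lanes * lane_rate_gbps

# A 4x link built from 50 Gb/s lanes, as in the example above:
print(link_bandwidth_gbps(4, 50))  # 200.0 Gb/s
```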
CPU Offloads and RDMA
( RDMA – Remote Direct Memory Access )
▪ The InfiniBand architecture supports packet transportation with minimal CPU intervention, enabling much more effective utilization of the host's CPU resources and better traffic latency performance
▪ This is achieved thanks to:
▪ RDMA support
▪ Hardware-based transport protocol (transport offload)
▪ Kernel bypass
▪ Zero copy
▪ Hardware - Host Channel Adapter (HCA)
(Diagram: the RDMA data path goes directly from USER space to the HARDWARE, bypassing the KERNEL)
CPU Offload With RDMA
                                   Without RDMA    With RDMA and Offload
CPU Utilization (User Space)          ~53%                ~88%
CPU Overhead/Idle (System Space)      ~47%                ~12%
Low Latency
Latency is the time taken to send a unit of data between two points in a fabric.
▪ GPU-Direct allows direct data transfer from the memory of one GPU to the memory of another.
▪ It enables accelerated computing applications such as:
▪ Artificial Intelligence (AI)
▪ Deep learning and data science
▪ Lower latency and improved performance, as provided by GPU-based computation.
Simplified Management
▪ Subnet Manager (SM) - a program that runs and manages the fabric
▪ Every InfiniBand fabric has a single (master) SM
▪ The SM makes the fabric management very simple:
▪ Plug & Play end nodes environment
▪ Centralized route manager
Quality of Service (QoS)
▪ QoS is the ability to provide different priority to different:
▪ Applications
▪ Users
▪ Data flows
▪ QoS implementation can be achieved by:
▪ Defining I/O channels at the adapter level
▪ Defining Virtual Lanes at the link level
▪ QoS allows control of congestion on the network
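At the link level, QoS works by mapping each packet's Service Level (SL) to one of the Virtual Lanes (VLs) the link supports. A minimal sketch of such a mapping table follows; the modulo assignment and the choice of 4 data VLs are illustrative, not taken from the specification:

```python
# Hypothetical SL-to-VL table: each of the 16 Service Levels carried in
# a packet header maps to a Virtual Lane, giving traffic classes
# separate buffering and arbitration on the same physical link.
NUM_DATA_VLS = 4  # example switch supporting 4 data VLs
SL2VL = {sl: sl % NUM_DATA_VLS for sl in range(16)}

def vl_for_packet(service_level: int) -> int:
    """Look up which virtual lane a packet with this SL travels on."""
    return SL2VL[service_level]

print(vl_for_packet(5))  # 1 under this example table
```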
Scalability and Flexibility
▪ InfiniBand offers easy scaling for data centers with great flexibility
▪ InfiniBand enables a single subnet to scale up to 48,000 managed nodes
▪ An InfiniBand network can scale to more than a million nodes (with multiple subnets)
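The ~48,000-node single-subnet figure follows from the size of the 16-bit Local Identifier (LID) address space, in which unicast LIDs occupy the range 0x0001 to 0xBFFF (the remainder is used for multicast and reserved values):

```python
# Unicast LID range in a single InfiniBand subnet (16-bit LID field):
UNICAST_LID_FIRST = 0x0001
UNICAST_LID_LAST = 0xBFFF

unicast_lids = UNICAST_LID_LAST - UNICAST_LID_FIRST + 1
print(unicast_lids)  # 49151 addressable endpoints per subnet
```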
InfiniBand Fabric Components
▪ Subnet Manager (SM)
▪ Host/Server
▪ Host Channel Adapter (HCA)
▪ Switch
▪ Router
▪ Gateway
Common InfiniBand Network Topology Icons
• What is InfiniBand?
• Why InfiniBand?
▪ InfiniBand bandwidth
▪ Low latency
▪ CPU offloads
▪ Simplified management
▪ Quality of Service
▪ Scalability and flexibility