M Via
Michael Welcome
Paul Hargrove
Lawrence Berkeley National Lab
Overview
[Figure: overview diagram. Labels: VI, RDMA, Descriptor, VI NIC (two), Network]
M-VIA: Modular VIA for Linux
• Goals:
• Reference implementation
• Emulated drivers for non-VIA aware NICs
• Open Source – BSD style license
• Compaq Servernet II
• SysKonnect
• Portable to other Architectures
• Thread safe
• Kernel recompilation not required – Loadable Modules
• Co-exist with traditional network protocols
• Promote rapid development of new drivers
M-VIA Architecture
• VIPL implementation is driver-independent – IOCTL and Fast Trap
[Figure: M-VIA architecture diagram. Labels: Application, M-KA, M-KA Plugin, VI Hardware]
M-VIA 2 Features
• Plugins are dynamically loaded
• User-level and kernel level loading
• Multiple simultaneous plugins possible
• Applications do not need to be recompiled or relinked to
use different NICs, even though applications interact
directly with the NIC
• Installation easier
• Software distribution much easier
• Example: Single application could use multiple devices:
• M-VIA loop-back for communication within node
• hardware NIC(s) for communication with other nodes.
• Status: in design stage, some code written.
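The example above (loop-back within a node, hardware NICs between nodes) can be illustrated with a small sketch. This is not M-VIA code: M-VIA 2 is described as still in the design stage, and `Plugin`, `PluginRegistry`, `load`, and `select` are hypothetical names chosen only to show how several plugins loaded at run time could coexist behind one interface, so the application itself does not change.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plugin:
    """Hypothetical stand-in for a device plugin loaded at run time."""
    name: str
    # Predicate: can this device carry traffic from `local` to `remote`?
    can_reach: Callable[[str, str], bool]


class PluginRegistry:
    """Hypothetical registry holding several plugins at the same time."""

    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def load(self, plugin: Plugin) -> None:
        # Stands in for dynamic loading; nothing is linked into the application.
        self._plugins.append(plugin)

    def select(self, local: str, remote: str) -> Plugin:
        # Return the first loaded plugin able to reach the remote node.
        for plugin in self._plugins:
            if plugin.can_reach(local, remote):
                return plugin
        raise LookupError(f"no plugin can reach {remote} from {local}")


registry = PluginRegistry()
# Loop-back device: only for communication within the node.
registry.load(Plugin("loopback", lambda local, remote: local == remote))
# Hardware NIC: for communication with other nodes.
registry.load(Plugin("hw_nic", lambda local, remote: local != remote))

print(registry.select("node0", "node0").name)
print(registry.select("node0", "node1").name)
```

Swapping or adding a NIC in this sketch means loading a different plugin; the calling code stays the same.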
MVICH
• MVICH is an implementation of the MPICH ADI-2 for VIA
• Implements point-to-point message passing
• From-scratch implementation of ADI-2
• No channels, no chameleon
• Multi-device support stripped out
• To Build:
• Put MVICH source tree in MPICH/mpid/via
• Configure with your VIA device
• Build
[Figure: MPICH layering diagram. Labels: MPI (mpich), ADI, channel, mvich, TCP, “devices”]
MVICH Implementation
• N-1 VIs are created on each node, one for each node-to-node
communication channel.
• Buffering
• “vbufs” are VIA memory-registered MPI-managed buffers
• Contain control info and, in certain cases, message data
• Flow control – VIA recv must be posted before send.
• Credit scheme implements accounting system for pre-posted recvs.
• Initially, each node pre-posts M recv vbufs on each VI, senders
given M credits on each VI.
• Sender decrements credit on send.
• Receiver posts another vbuf after each recv; “refresh” credits are
piggybacked on rendezvous acknowledgements.
• Credit scheme throttles sender
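The credit accounting described above can be sketched from the sender's side. This is an illustrative model, not MVICH source: the class name `CreditedVI`, its methods, and the choice to queue messages in a backlog when credits run out are all assumptions. The slides say only that the scheme throttles the sender.

```python
from collections import deque


class CreditedVI:
    """Sender-side view of one VI under a credit scheme (hypothetical model)."""

    def __init__(self, initial_credits: int) -> None:
        # The receiver pre-posted M recv vbufs, so the sender starts with M credits.
        self.credits = initial_credits
        # Assumption: messages that cannot be sent yet wait here.
        self.backlog = deque()

    def send(self, msg) -> bool:
        """Try to send; return True if sent now, False if throttled."""
        if self.credits == 0:
            self.backlog.append(msg)
            return False
        self.credits -= 1  # each send consumes one pre-posted recv
        return True

    def refresh(self, n: int) -> list:
        """Apply n refreshed credits and return the backlog messages released."""
        self.credits += n
        released = []
        while self.backlog and self.credits > 0:
            released.append(self.backlog.popleft())
            self.credits -= 1
        return released


vi = CreditedVI(initial_credits=2)
for m in ("a", "b", "c"):
    print(m, "sent" if vi.send(m) else "throttled")
print("released after refresh:", vi.refresh(1))
```

The point of the accounting is that a send is issued only when a matching receive is known to be posted, which is the requirement VIA imposes.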
Protocols
• Short/Eager
• Send data in one or more packets through vbufs
• Requires buffering on receiver
• “R3” Rendezvous
• Standard 3-way rendezvous through vbufs with pipelining
• “Rput” rendezvous
• Zero copy RDMA write from sender to receiver
• “Rget” rendezvous
• Zero copy RDMA read by receiver
• Rput and Rget revert to R3 if either side fails to register the
user’s data area.
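The fallback rule above can be written as a small selection function. This is a sketch under stated assumptions: the slides do not say how MVICH chooses between eager and rendezvous, so the size threshold `EAGER_LIMIT`, the function name `choose_protocol`, and its parameters are hypothetical. Only the rule that Rput and Rget revert to R3 when registration fails on either side comes from the slides.

```python
# Hypothetical threshold in bytes; the slides give no value.
EAGER_LIMIT = 8 * 1024


def choose_protocol(nbytes: int,
                    preferred: str = "Rput",
                    sender_registered: bool = True,
                    receiver_registered: bool = True) -> str:
    """Pick a protocol name for a message of `nbytes` bytes."""
    if preferred not in ("R3", "Rput", "Rget"):
        raise ValueError(f"unknown rendezvous protocol: {preferred}")
    # Assumption: small messages go eagerly through vbufs.
    if nbytes <= EAGER_LIMIT:
        return "eager"
    # From the slides: the zero-copy RDMA protocols need the user's data area
    # registered on both sides; otherwise they revert to R3.
    if preferred in ("Rput", "Rget") and not (sender_registered and receiver_registered):
        return "R3"
    return preferred


print(choose_protocol(512))
print(choose_protocol(1 << 20, preferred="Rget", receiver_registered=False))
```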
Dynamic memory registration
• Some advantages:
• Support
• Base implementation targeted for VIA (MPICH is “general”).
• Thread safe
• Full asynchronous communication
• Datatype optimizations
For more info...
• http://www.nersc.gov/research/ftg/via