VIAP
Virtual Interface
Architecture Primer
Traditional operating system interfaces to network hardware prevent application programs from taking advantage of the performance improvements (i.e. high bandwidth and extremely low latencies) of gigabit-per-second interconnects like Fibre Channel. To eliminate this bottleneck, a group of independent hardware and software vendors have defined the Virtual Interface (VI) Architecture. VI is a standard and extremely efficient way to reduce software overhead between a high-performance CPU/memory subsystem and a high-performance network.
This primer introduces the VI Architecture and provides an overview of the key elements of VI. Using the clustering challenge as an example, it describes how these elements work to provide an efficient way of moving data between applications and network hardware.
The motivation for creating the VI specification was the clustering challenge, and that is where we begin.

THE CLUSTERING CHALLENGE
Clustering is the scaling of computer power to increase overall performance and availability. This is accomplished by connecting processors so that they can collaboratively perform computations while appearing as a single computing resource to the application or client (see Figure 1). From the application's point of view, a computer cluster is a virtual mainframe because distribution of processing is completely transparent. The performance gains realized by scaling with clusters are highly dependent on the underlying network and communication protocols. The goal is to obtain linear performance scaling (i.e. a return of a dollar of performance gain for every dollar of investment).
The overhead of standard communication protocols and the inefficiencies of their interaction with operating systems greatly reduce the performance benefits of scaling. For example, some of the best TCP/IP protocol stack implementations have significant throughput bottlenecks (see discussion in "Profiling and Reducing Processing Overheads in TCP/IP" by Kay and Pasquale).

Figure 1. Clustered servers in a Fibre Channel Network.

VIRTUAL INTERFACE ARCHITECTURE (VI)
As a standard mechanism to deliver high performance directly to applications, and to eliminate overhead and inefficiencies common to operating systems and network stacks, Intel®, Compaq® and Microsoft® have jointly defined and developed the Virtual Interface Architecture (VI). The development of VI now allows application developers and operating system vendors to code to a standard architecture for lightweight messaging. Further, it allows hardware vendors to supply products implementing it, thus creating a total end-to-end solution with multi-vendor support.
VI bypasses much of the overhead in traditional protocol stacks and provides more direct access to the network interface hardware. VI gives a concise set of operations for moving data between network connections with latency characteristics closer to memory movement than network operations. It eliminates most of the copying inherent in the movement of data between an application and the supporting operating system. It also reduces wasted CPU cycles typically required to manage the interaction between the operating system and the network interface hardware.
VI defines a combination of hardware, firmware and operating system driver interaction to improve the efficiencies of network communications. In fact, one of the major tenets of VIA is that the architecture be simple enough to be integrated in silicon. VI-compliant hardware, such as a Fibre Channel NIC with hardware assist for VI connections and compatible operating system drivers, off-loads the host CPU by performing much of the work involved in transferring data directly in and out of application-based buffers. At the same time, VI drivers are fully compatible with both general purpose and real-time operating systems such as Windows NT™, Linux, VxWorks® and LynxOS™.

- duplicate/redundant data copying as it moves between application and network hardware
- inefficient protocol stacks resulting in costly CPU overhead (e.g., IP checksum handled by CPU).
The VI network architecture (see Figure 3) eliminates this CPU overhead by providing direct communication between application software and the network hardware, including:
- direct access from NIC to application memory, eliminating virtual to physical memory mapping overhead
- user-level access to NIC via a Kernel Agent, reducing user to kernel context switches
- elimination of operating system driver interrupts
- direct data copies between application memory and NIC
- NIC-based protocol processing, off-loading the host card CPU.
[...] point that can be logically connected to another VI to support point-to-point communication of data between processes (see Figure 5). Each VI consists of one Send queue and one Receive queue; each NIC can support as many as 64K separate VIs, and a process can have multiple VIs.

CREATING A VI
The VIPL provides an application call that in turn commands the VI Kernel Agent to initiate the creation of a VI. The creation process consists of associating a doorbell and Send/Receive queues with a VI and returning the addresses of these to the calling application. No connection is established when a VI is created, and no data will flow between processes until a connection is established between a pair of VIs.

ESTABLISHING A CONNECTION
VI provides a flexible connection management scheme with support for both a client-server and a peer-to-peer model. The client-server model offers the traditional blocking connection scheme: one side of the connection waits for connection requests, and the connection is established after a remote process issues a connection request. The peer-to-peer model provides a non-blocking connection scheme that allows a connection request to time out if there is no matching connection request from a peer after a fixed time period.
Establishing a VI connection is typically an infrequent task that results in a long-lived connection used for many data exchanges. The connection remains until an explicit disconnect request is issued by an application on one side of the connection.

MEMORY REGISTRATION AND PROTECTION
Most modern operating systems support virtual memory management schemes that force a virtual-to-physical address map translation so that network hardware can access user data. These address translations are required for every data transfer request, introducing a significant overhead for every network transfer.
The VI architecture requires that data blocks used for data transfer over a VI be tied to a physical memory address rather than a virtual address in User space. This provides the network hardware with free access to application memory without participation of the operating system. This reduces software overhead in the performance-critical data transfer path, and eliminates the need for intermediate data copying from User to Kernel space.
The VI architecture enforces a memory registration scheme to tie down physical memory. Memory registration requires that application processes allocate a region of memory in User space and pass the address of that region to the VI Kernel Agent as a parameter of the VIPL register-memory command. A memory handle returned by the VIPL function can then be used in subsequent data transfer requests.
By eliminating operating system level memory management from the critical path, there is a risk that one also eliminates the protection mechanisms that ensure that applications can't interfere with each other. VI provides applications with direct access to the network interface hardware while retaining access protection. It offers a mechanism for memory protection in the form of protection tags that ensure that a user process cannot send out of or receive into memory locations that it does not own.

Sending and Receiving Data
A pair of processes connected by a VI can use that channel to communicate data with minimal host processor and operating system overhead. Two different data transfer mechanisms are supported: a traditional Send/Receive model and the Remote Direct Memory Access (RDMA) model.

Send/Receive Model
The Send/Receive model is the traditional method of transferring data between two end-points. With this model, the receiving end-point executes a Receive command and the transmitting end-point executes a Send command. The Receive-side process specifies the memory location where data will be placed and the Send-side process specifies the memory location from which data will be sent. For synchronization purposes, both the Send-side and Receive-side processes are notified when their respective requests have completed.

RDMA Model
Somewhat unique to VI is the RDMA model, whereby the initiator of the data transfer specifies both the source memory region and the destination memory region. The RDMA model includes both an RDMA write and an RDMA read capability. This is a very useful operation because it allows a node to transfer data to or from remote memory without intervention by the remote CPU. RDMA operations do require that the remote end pre-registers the memory regions to be used for RDMA operations.

SYNCHRONIZATION AND COMPLETION
VIA supports both polled and interrupt mechanisms to synchronize and inform the application program that an operation has completed. By using a VIPL call to poll on the head of each VI queue, an application can check on completion of a request. If the descriptor in the work queue is complete, the VIPL call removes the descriptor from the head of the queue and returns the descriptor address to the calling process; otherwise, it returns a unique status and the head of the queue does not change.
The user may also use a blocking call to wait on the completion of a request. When the request has completed, an operating system interrupt will be generated. VIA supports both an interrupt to awaken a blocked process and a callback function.
As an alternative to synchronizing on individual VI queues, there is a construct that supports coalescing completion notifications from multiple work queues into a single completion queue. This simplifies the synchronization activities and reduces overhead for application software with multiple VIs and multiple outstanding requests.
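
The polling and completion-queue behaviour described above can be sketched as a small in-memory model. This is an illustration only, not the VIPL API: the names `Descriptor`, `WorkQueue`, `CompletionQueue`, `complete_next` and `NOT_DONE` are hypothetical, the NIC is simulated by an explicit method call, and descriptors are assumed to complete in the order they were posted.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical stand-in for the "unique status" returned when nothing has completed.
NOT_DONE = None


@dataclass
class Descriptor:
    """A posted request; `done` is set when the (simulated) NIC finishes it."""
    name: str
    done: bool = False


class CompletionQueue:
    """Collects completion notifications from any number of work queues."""

    def __init__(self):
        self._ready = deque()

    def notify(self, work_queue):
        self._ready.append(work_queue)

    def poll(self):
        """Return a work queue whose head descriptor has completed, or NOT_DONE."""
        return self._ready.popleft() if self._ready else NOT_DONE


class WorkQueue:
    """One Send or Receive queue of a VI, optionally tied to a completion queue."""

    def __init__(self, label, completion_queue=None):
        self.label = label
        self._posted = deque()
        self._cq = completion_queue

    def __len__(self):
        return len(self._posted)

    def post(self, descriptor):
        self._posted.append(descriptor)

    def complete_next(self):
        """Simulate the NIC finishing the oldest pending descriptor (in-order assumption)."""
        for descriptor in self._posted:
            if not descriptor.done:
                descriptor.done = True
                if self._cq is not None:
                    self._cq.notify(self)
                return

    def poll(self):
        """Non-blocking check: dequeue the head descriptor only if it has completed."""
        if self._posted and self._posted[0].done:
            return self._posted.popleft()
        return NOT_DONE  # the head of the queue is left unchanged


def demo():
    cq = CompletionQueue()
    send_q = WorkQueue("vi0.send", cq)
    recv_q = WorkQueue("vi1.recv", cq)
    send_q.post(Descriptor("send-A"))
    recv_q.post(Descriptor("recv-B"))

    results = [send_q.poll()]  # nothing has completed yet
    recv_q.complete_next()
    send_q.complete_next()

    # One loop over the completion queue replaces polling each work queue in turn.
    while (work_queue := cq.poll()) is not NOT_DONE:
        results.append((work_queue.label, work_queue.poll().name))
    return results


if __name__ == "__main__":
    print(demo())
```

With only two queues the saving is small, but an application holding many VIs would otherwise have to poll every Send and Receive queue separately; the single completion queue tells it which queue to dequeue from next.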

LOOKING FORWARD
VIA introduces open standard networking concepts, data structures, architecture and API that allow hardware and software vendors to build compatible, interoperable, high performance, and competitive networking products. It allows application programs to take advantage of the high bandwidth and low latency capabilities of high speed network technologies, such as Fibre Channel, by reducing the overhead and inefficiencies of traditional operating systems and network stacks.

FOR MORE INFORMATION ON VIA:

Virtual Interface Architecture Specification
This is the primary VIA specification that defines the VI architecture and all related elements. This document can be downloaded from the Intel developer's site at http://developer.intel.com/design/servers/vi/developer/ia_imp_guide.htm. This site also provides the VI Conformance Tests, a set of tests that can be used by independent hardware and software vendors to demonstrate conformance of VI-related products to the VIPL specification.

VIA Developer's Guide
The specification defines the standard application program interface. This API is called the VI Provider Library (VIPL). It is defined in an operating system-independent manner. The VIPL specification can be downloaded from the VI Architecture web site at www.viarch.org.

FC-VI Specification (Fibre Channel Mapping to VI)
The mapping of the VI architecture onto the Fibre Channel framing protocol is defined by the FC-VI (ANSI T11) standards working group. A copy of the standard can be downloaded from the T11 web site at www.t11.org. The goal of the FC-VI group is to provide a mapping between Fibre Channel and VI that fully exploits the potential of both. Included within the recommended scope of this project is the creation of mappings to Fibre Channel for the transport of VIA and support for the full range of Fibre Channel topologies, including loops and fabrics, to enable scalable clustering solutions.

VI Developers Forum (VIDF)
The VI Developers Forum (VIDF) is an organization of independent software and hardware vendors that works to define and maintain VI as an open independent standard. VIDF working group topics include:
- Operating System support extensions for VI
- VIPL standardization
- socket APIs on top of VIPL
- fault tolerance
- The defining standard for multi-pathing and fail-over

VIA implementation for Linux (M-VIA)
M-VIA (Modular-VIA) is a complete high-performance implementation of the VIA for Linux. It was written at the NERSC center at Lawrence Berkeley National Laboratory. It is essentially shareware that can be ported to new hardware platforms. The NERSC web site provides a downloadable image as well as information on performance numbers and hardware support for M-VIA: http://www.nersc.gov/research/ftg/via/.

MPI layered on top of VIA
Message Passing Interface (MPI), a de facto standard for parallel communications, has been implemented on VIA to provide an efficient interconnect for clusters of workstations in a gigabit/second system area network. MPI has become the programming interface of choice in the parallel computing world and is used to interconnect digital signal processors.
For information on MPI Softech's implementation of MPI on VIA see http://www.giganet.com/cluster/whitepapers/MPIimpforvi.html

Wilf Sullivan is the Product Marketing Manager of Channel 1* networking products and Pentium* single board computers at DY 4 Systems Inc. Building on his Master's degree in Computer Science, Wilf has over 10 years' experience as a designer and developer of software for the real-time embedded environment.

REFERENCES
1. Dunning et al., "The Virtual Interface Architecture," IEEE Micro, March/April 1998, pp. 66-73.
2. Kay and Pasquale, "Profiling and Reducing Processing Overheads in TCP/IP," IEEE/ACM Transactions on Networking, Vol. 4, No. 6, December 1996, pp. 817-828.
3. Mukherjee et al., "Making Network Interfaces Less Peripheral," IEEE Computer, Oct. 1998, pp. 70-76.
4. "Next Generation I/O: A New Approach to Server I/O Architectures," Aberdeen Group Technical White Paper, February 1999, http://www.ngioforum.org/events/02991357.html.
5. Skjellum et al., "An Efficient MPI Implementation for Virtual Interface (VI) Architecture-Enabled Cluster Computing," http://www.giganet.com/cluster/whitepapers/MPIimpforvi.html.
6. "Virtual Interface (VI) Architecture Developer's Guide," Revision 1.0, September 1998, www.viarch.org.
7. "Virtual Interface Architecture for Clustered Systems," Dell Computer Corporation White Paper, September 1998, http://www.dell.com/r&d/whitepapers/wpvia/wpvia.htm.
8. "Virtual Interface Architecture Specification," Version 1.0, December 1997, http://developer.intel.com/design/servers/vi/developer/ia_imp_guide.htm.
9. Von Eicken et al., "Evolution of the Virtual Interface