
MARKET

Virtual Interface Architecture Primer

By Wilf Sullivan, Product Marketing Manager, DY 4 Systems Inc.
Traditional operating system interfaces to network hardware prevent application programs from taking advantage of the performance improvements (i.e. high bandwidth and extremely low latencies) of gigabit-per-second interconnects like Fibre Channel. To eliminate this bottleneck, a group of independent hardware and software vendors have defined the Virtual Interface (VI) Architecture. VI is a standard and extremely efficient way to reduce software overhead between a high-performance CPU/memory subsystem and a high-performance network.

This primer introduces the VI Architecture and provides an overview of the key elements of VI. Using the clustering challenge as an example, it describes how these elements work to provide an efficient way of moving data between applications and network hardware.

A TECHNICAL OVERVIEW
Although gigabit-per-second interconnects like Fibre Channel provide network bandwidths with extremely low latencies, traditional operating system interfaces to network hardware prevent application programs from taking advantage of these performance improvements. To eliminate this bottleneck, a group of independent hardware and software vendors have defined the Virtual Interface (VI) Architecture. VI is a standard and extremely efficient way to reduce the software overhead between a high-performance CPU/memory subsystem and a high-performance network.

This primer is intended as an introduction to the VI Architecture and includes an overview of the key elements of VI as well as a description of how these elements work to provide an efficient way for moving data between applications and network hardware. The motivation for creating the VI specification was the clustering challenge - and that's where we begin.

THE CLUSTERING CHALLENGE
Clustering is the scaling of computer power to increase overall performance and availability. This is accomplished by connecting processors so that they can collaboratively perform computations while appearing as a single computing resource to the application or client (see Figure 1). From the application's point of view, a computer cluster is a virtual mainframe because distribution of processing is completely transparent. The performance gains realized by scaling with clusters are highly dependent on the underlying network and communication protocols. The goal is to obtain linear performance scaling (i.e. a return of a dollar of performance gain for every dollar of investment).

Figure 1. Clustered servers in a Fibre Channel Network.

The overhead of standard communication protocols and the inefficiencies of their interaction with operating systems greatly reduce the performance benefits of scaling. For example, some of the best TCP/IP protocol stack implementations have significant throughput bottlenecks (see discussion in "Profiling and Reducing Processing Overheads in TCP/IP" by Kay and Pasquale). Further, they are CPU intensive and extremely inefficient with respect to end-to-end latencies.

VIRTUAL INTERFACE ARCHITECTURE (VI)
As a standard mechanism to deliver high performance directly to applications, and to eliminate overhead and inefficiencies common to operating systems and network stacks, Intel®, Compaq® and Microsoft® have jointly defined and developed the Virtual Interface Architecture (VI). The development of VI now allows application developers and operating system vendors to code to a standard architecture for lightweight messaging. Further, it allows hardware vendors to supply products implementing it, thus creating a total end-to-end solution with multi-vendor support.

VI bypasses much of the overhead in traditional protocol stacks and provides a more direct access to the network interface hardware. VI gives a concise set of operations for moving data between network connections with latency characteristics closer to memory movement than network operations.

12 Copyright 2000 by Dedicated Systems Magazine - 2000 Q1 (http://www.dedicated-systems.com)



Figure 2. Contemporary Network Architecture.

It eliminates most of the copying inherent in the movement of data between an application and the supporting operating system. It also reduces wasted CPU cycles typically required to manage the interaction between the operating system and the network interface hardware.

VI defines a combination of hardware, firmware and operating system driver interaction to improve the efficiencies of network communications. In fact, one of the major tenets of VIA is that the architecture be simple enough to be integrated in silicon. VI-compliant hardware, such as a Fibre Channel NIC with hardware assist for VI connections and compatible operating system drivers, off-loads the host CPU by performing much of the work involved in transferring data directly in and out of application-based buffers. At the same time, VI drivers are fully compatible with both general purpose and real-time operating systems such as Windows NT™, Linux, VxWorks® and LynxOS™.

NETWORK ARCHITECTURE
The contemporary network architecture (see Figure 2) consists of many layers of software interaction that sit between application software and the network hardware.

Overhead in this system includes:
- virtual to physical memory mapping of data
- operating system context switches
- interrupt latencies associated with hardware interrupts from NIC
- duplicate/redundant data copying as it moves between application and network hardware
- inefficient protocol stacks resulting in costly CPU overhead (e.g., IP checksum handled by CPU).

The VI network architecture (see Figure 3) eliminates this CPU overhead by providing direct communication between application software and the network hardware, including:
- direct access from NIC to application memory, eliminating virtual to physical memory mapping overhead
- user-level access to NIC via a Kernel Agent reducing user to kernel context switches
- elimination of operating system driver interrupts
- direct data copies between application memory and NIC
- NIC-based protocol processing, off-loading the host card CPU.

VIA MODEL
This section introduces VIA nomenclature and describes the major elements of the VI Architecture Model. Figure 4 illustrates the relationship between these major elements.

User vs. Kernel Space
When describing the elements of VI, User space and Kernel space are often distinguished. User space refers to application processes while operating system functions (including device drivers, described below) operate in Kernel or protected mode. Application processes operate in their own User space that is protected from other applications.

Figure 3. VI Network Architecture.


Figure 4. VI Architecture Model.

In User mode, the operating system controls application access to network hardware through operating system extensions known as device drivers. Device drivers provide a set of function calls that can be used to send and receive data using the network hardware. In traditional operating system interfaces to network hardware, these driver-level calls result in intermediate data copies from User to Kernel space and also in operating system context switches. Context switches and intermediate data copies result in increased CPU overhead and increased latencies in data movement that VI attempts to reduce or eliminate.

Application Program
An application program is any software process or processes that will initiate the movement of data over the network to another process. An application operates in a protected region of memory, where it maintains data that can be communicated over network interface hardware.

VI Provider Library (VIPL)
The VI Provider Library (VIPL) is the interface between the application program and the network hardware (called the VI Network Interface Card or VI NIC). The VIPL is the standard API that provides a common set of function calls independent of the underlying hardware implementation.

The VIPL provides two communication paths to the VI NIC, one through the VI Kernel Agent and the other directly to the VI NIC. VIPL commands used to create and manage VI tasks do this through the VI Kernel Agent. The more common VIPL operations that support performance-critical data movement operations bypass the VI Kernel Agent overhead and communicate directly with the hardware on the VI NIC. Because both the application program and VIPL reside in User space, data transfers between the application and VI NIC can be performed without incurring context switches between User and Kernel space.

Virtual Interface (VI)
A virtual interface is the interface between a VI NIC and an application program process that gives the VI direct access to application program memory. Each VI consists of a Send queue and a Receive queue. The queues hold descriptors that are data structures used by the application to communicate Send and Receive requests to the NIC.

VI Kernel Agent
The VI Kernel Agent is a privileged part of the operating system responsible for management functions for registering communication memory, and setting up and breaking down VIs. The VI Kernel Agent is implemented as an operating system driver and operates in Kernel rather than User space. Functions handled by the Kernel Agent are infrequent because VIs are typically established once and have long lives, thereby minimizing the overhead in switching between user and kernel space. In fact, once the VI is established, application programs can communicate using VIPL Send and Receive calls that communicate directly to the hardware, bypassing the operating system overhead.

Completion Queues
A completion queue stores information used to notify a VI of the completion of a Send or Receive operation. Completion queues reside in VI NIC memory and simplify synchronization activities by combining completion notifications from multiple interfaces into a single completion queue. Completion queues are created by an application through a VIPL call.

Hardware Doorbells
Each Send and Receive queue has an associated work notification mechanism called a "doorbell." A doorbell is a mechanism for a process to notify the VI NIC that work has been placed on a Send or Receive queue. A doorbell is typically implemented as a memory-mapped register on the VI NIC.

VI NIC
A VI NIC is a network interface controller that complies with the VIA specification. The VI NIC incorporates Send and Receive queues, completion queues, doorbells and all of the processing logic for mapping VI data to a network such as Fibre Channel. For an example of a VI NIC implementation, see DY 4's PMC-642 Fibre Channel module at http://www.dy4.com/ps/datasht/pdf/pmc642.pdf.

Figure 5. VI Dataflow.

WHAT'S A VI?
Conceptually, VIA provides multiple independent application processes with their own direct control of the network hardware. Each VI is a communication endpoint that can be logically connected to another VI to support point-to-point communication of data between processes (see Figure 5).
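To make the relationship between these elements concrete, here is a minimal Python sketch of two connected VIs. It is a toy model under stated assumptions, not the VIPL API: the names `Descriptor`, `VI`, `ToyNic`, `post_send` and `post_recv` are hypothetical, the "doorbell" is just a counter, and the data movement that a real VI NIC performs in hardware is simulated with an in-process copy.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical work request: a data buffer plus completion status."""
    buffer: bytearray
    length: int = 0
    done: bool = False


@dataclass
class VI:
    """Toy Virtual Interface: one Send queue and one Receive queue."""
    name: str
    send_q: deque = field(default_factory=deque)
    recv_q: deque = field(default_factory=deque)
    peer: Optional["VI"] = None   # set once a connection exists
    doorbell_rings: int = 0       # stands in for the memory-mapped register


class ToyNic:
    """Stands in for the VI NIC; a real NIC does this work in hardware."""

    def connect(self, a: VI, b: VI) -> None:
        # Logically connect two endpoints (point-to-point).
        a.peer, b.peer = b, a

    def post_recv(self, vi: VI, desc: Descriptor) -> None:
        vi.recv_q.append(desc)
        vi.doorbell_rings += 1    # tell the NIC that work was queued

    def post_send(self, vi: VI, desc: Descriptor) -> None:
        vi.send_q.append(desc)
        vi.doorbell_rings += 1
        self._process_send(vi)

    def _process_send(self, vi: VI) -> None:
        if vi.peer is None:
            raise RuntimeError("VI is not connected")
        send = vi.send_q[-1]
        waiting = [d for d in vi.peer.recv_q if not d.done]
        if not waiting:
            raise RuntimeError("peer has no Receive descriptor posted")
        recv = waiting[0]
        data = bytes(send.buffer[: send.length])
        if len(data) > len(recv.buffer):
            raise ValueError("Receive buffer too small")
        # Copy straight from the sender's buffer into the receiver's buffer.
        recv.buffer[: len(data)] = data
        recv.length = len(data)
        send.done = recv.done = True


nic = ToyNic()
vi_a, vi_b = VI("a"), VI("b")
nic.connect(vi_a, vi_b)

rx = Descriptor(bytearray(16))
nic.post_recv(vi_b, rx)                      # receiver posts its buffer first
tx = Descriptor(bytearray(b"hello"), length=5)
nic.post_send(vi_a, tx)

print(bytes(rx.buffer[: rx.length]))         # prints b'hello'
```

The sketch leaves out the VI Kernel Agent entirely: in the architecture described above, the Kernel Agent would be involved in creating the VIs and connecting them, while the posting of descriptors and the ringing of doorbells happen in User space.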


Each VI consists of one Send queue and one Receive queue, each NIC can support as many as 64K separate VIs, and a process can have multiple VIs.

CREATING A VI
The VIPL provides an application call that in turn commands the VI Kernel Agent to initiate the creation of a VI. The creation process consists of associating a doorbell and Send/Receive queues with a VI and returning addresses of these to the calling application. There is no connection established upon creation of a VI and no data will flow between processes until a connection is established with a pair of VIs.

ESTABLISHING A CONNECTION
VI provides a flexible connection management scheme with support for both a client-server and peer-to-peer model. The client-server model offers the traditional blocking connection scheme; one side of the connection waits for connection requests, and the connection is established after another remote process issues a connection request. The peer-to-peer scheme provides a non-blocking connection scheme that allows a connection request to time out if there is no matching connection request from a peer after a fixed time period.

Establishing a VI connection is typically an infrequent task that results in a long-life connection for many data exchanges. The connection will remain until an explicit disconnect request is issued by an application on one side of the connection.

MEMORY REGISTRATION AND PROTECTION
Most modern operating systems support virtual memory management schemes that force a virtual-to-physical address map translation so that network hardware can access user data. These address translations are required for every data transfer request, introducing a significant overhead for every network transfer.

The VI architecture requires that data blocks used for data transfer over a VI be tied to a physical memory address rather than a virtual address in User space. This provides the network hardware with free access to application memory without participation of the operating system. This reduces software overhead in the performance-critical data transfer path, and eliminates the need for intermediate data copying from User to Kernel space.

The VI architecture enforces a memory registration scheme to tie down physical memory. Memory registration requires that application processes allocate a region of memory in User space and pass the address of that region to the VI Kernel Agent as a parameter of the VIPL register-memory command. A memory handle returned by the VIPL function can then be used in subsequent data transfer requests.

By eliminating operating system level memory management from the critical path, there is a risk that one also eliminates the protection mechanisms that ensure that applications can't interfere with each other. VI provides applications with direct access to the network interface hardware while retaining access protection. It offers a mechanism for memory protection in the form of protection tags that ensure that a user process cannot send out of or receive into memory locations that it does not own.

Sending and Receiving Data
A pair of processes connected by a VI can use that channel to communicate data with minimal host processor and operating system overhead. Two different data transfer mechanisms are supported, a traditional Send/Receive model and the Remote Direct Memory Access (RDMA) model.

Send/Receive Model
The Send/Receive model is the traditional method of transferring data between two end-points. With this model, the receiving end-point executes a Receive command and the transmitting end-point executes a Send command. The Receive-side process specifies the memory location where data will be placed and the Send-side process specifies the memory location from which data will be sent. For synchronization purposes, both the Send-side and Receive-side processes are notified when their respective requests have completed.

RDMA Model
Somewhat unique to VI is the RDMA model, whereby the initiator of the data transfer specifies both the source memory region and the destination memory region. The RDMA model includes both an RDMA write and an RDMA read capability. This is a very useful operation because it allows a node to transfer data to or from remote memory without intervention by the remote CPU. RDMA operations do require that the remote end pre-register the memory regions to be used for RDMA operations.

SYNCHRONIZATION AND COMPLETION
VIA supports both polled and interrupt mechanisms to synchronize and inform the application program that an operation has completed. By using a VIPL call to poll on the head of each VI queue, an application can check on completion of a request. If the descriptor in the work queue is complete, the VIPL call removes the descriptor from the head of the queue and returns the descriptor address to the calling process; otherwise, it returns a unique status and the head of the queue does not change.

The user may also use a blocking call to wait on the completion of a request. When the request has completed, an operating system interrupt will be generated. VIA supports both an interrupt to awaken a blocked process as well as a callback function.

As an alternative to synchronizing on individual VI queues, there is a construct that supports coalescing completion notifications from multiple work queues into a single completion queue. This simplifies the synchronization activities and reduces overhead for application software with multiple VIs and multiple outstanding requests.
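The polling rules described in this section can be illustrated with a short Python sketch. It is a simulation under stated assumptions rather than VIPL code: `WorkQueue`, `CompletionQueue`, `complete_oldest` and the `NOT_DONE` status are hypothetical names, the NIC's work is faked by `complete_oldest`, and the blocking wait and callback mechanisms are not modeled.

```python
from collections import deque
from dataclasses import dataclass

NOT_DONE = None  # hypothetical stand-in for the "unique status" return


@dataclass
class Descriptor:
    tag: str
    done: bool = False


class CompletionQueue:
    """Coalesces completion notifications from several work queues."""

    def __init__(self):
        self.entries = deque()   # work queues that have a completion ready

    def poll(self):
        """Return (work_queue, descriptor), or NOT_DONE if nothing finished."""
        if not self.entries:
            return NOT_DONE
        wq = self.entries.popleft()
        return wq, wq.poll()


class WorkQueue:
    """A Send or Receive queue of a VI, optionally tied to a completion queue.

    In this toy, a queue tied to a completion queue should be polled only
    through that completion queue.
    """

    def __init__(self, name, completion_queue=None):
        self.name = name
        self.pending = deque()
        self.cq = completion_queue

    def post(self, desc):
        self.pending.append(desc)

    def complete_oldest(self):
        """Simulate the NIC finishing the oldest outstanding descriptor."""
        for desc in self.pending:
            if not desc.done:
                desc.done = True
                if self.cq is not None:
                    self.cq.entries.append(self)
                return desc
        raise LookupError("no outstanding descriptor")

    def poll(self):
        # Complete head: remove it and hand it back to the caller.
        if self.pending and self.pending[0].done:
            return self.pending.popleft()
        # Otherwise report "not done" and leave the head untouched.
        return NOT_DONE


# Polling a single queue.
solo = WorkQueue("vi0.send")
solo.post(Descriptor("s0"))
first_poll = solo.poll()          # NOT_DONE, head unchanged
solo.complete_oldest()
second_poll = solo.poll()         # the completed descriptor

# One completion queue shared by two work queues.
cq = CompletionQueue()
send_q = WorkQueue("vi1.send", cq)
recv_q = WorkQueue("vi2.recv", cq)
send_q.post(Descriptor("s1"))
recv_q.post(Descriptor("r1"))
recv_q.complete_oldest()
send_q.complete_oldest()
order = [cq.poll()[1].tag, cq.poll()[1].tag]
print(first_poll, second_poll.tag, order)
```

With a shared completion queue the application makes one polling call regardless of how many VIs it owns, which is the reduction in synchronization effort the text refers to.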

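Returning to the memory registration scheme and the RDMA model described earlier, the following Python sketch shows how a registered region, a memory handle and a protection tag might fit together for an RDMA write. Everything here is an illustrative assumption: `RegistrationTable`, `MemoryHandle`, `register_memory` and `rdma_write` are hypothetical names, the sketch folds the Kernel Agent's bookkeeping and the NIC's checks into one class, and no real pinning of physical memory takes place.

```python
from dataclasses import dataclass


class ProtectionError(Exception):
    """Raised when a transfer touches memory the caller may not access."""


@dataclass(frozen=True)
class MemoryHandle:
    """Hypothetical handle returned by registration."""
    region_id: int


class RegistrationTable:
    """Toy registry of memory regions and their protection tags."""

    def __init__(self):
        self._regions = {}    # region_id -> (buffer, protection_tag)
        self._next_id = 0

    def register_memory(self, buffer: bytearray, protection_tag: int) -> MemoryHandle:
        # Infrequent, Kernel Agent style operation: done once per region.
        handle = MemoryHandle(self._next_id)
        self._regions[handle.region_id] = (buffer, protection_tag)
        self._next_id += 1
        return handle

    def rdma_write(self, data: bytes, dest: MemoryHandle, offset: int, vi_tag: int) -> None:
        # The initiator names the destination; the remote CPU is not involved.
        entry = self._regions.get(dest.region_id)
        if entry is None:
            raise ProtectionError("destination region is not registered")
        buffer, region_tag = entry
        if vi_tag != region_tag:
            raise ProtectionError("protection tag mismatch")
        if offset < 0 or offset + len(data) > len(buffer):
            raise ProtectionError("write falls outside the registered region")
        buffer[offset : offset + len(data)] = data


table = RegistrationTable()
remote_buffer = bytearray(8)
handle = table.register_memory(remote_buffer, protection_tag=7)

table.rdma_write(b"abc", handle, offset=2, vi_tag=7)    # allowed
try:
    table.rdma_write(b"x", handle, offset=0, vi_tag=9)  # wrong tag
    rejected = False
except ProtectionError:
    rejected = True
```

The design point being illustrated is that the checks run against state set up once at registration time, so the per-transfer path needs no operating system call.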

LOOKING FORWARD
VIA introduces open standard networking concepts, data structures, architecture and API that allow hardware and software vendors to build compatible, interoperable, high performance, and competitive networking products. It allows application programs to take advantage of the high bandwidth and low latency capabilities of high speed network technologies, such as Fibre Channel, by reducing the overhead and inefficiencies of traditional operating systems and network stacks.

FOR MORE INFORMATION ON VIA:

Virtual Interface Architecture Specification
This is the primary VIA specification that defines the VI architecture and all related elements. This document can be downloaded from the Intel developer's site at http://developer.intel.com/design/servers/vi/developer/ia_imp_guide.htm. This site also provides the VI Conformance Tests - a set of tests that can be used by independent hardware and software vendors to demonstrate conformance of VI-related products to the VIPL specification.

VIA Developer's Guide
The specification defines the standard application program interface. This API is called the VI Provider Library. It is defined in an operating system-independent manner. The VIPL specification can be downloaded from the VI Architecture web site at www.viarch.org.

FC-VI Specification (Fibre Channel Mapping to VI)
The mapping of the VI architecture onto the Fibre Channel framing protocol is defined by the FC-VI (ANSI T11) standards working group. A copy of the standard can be downloaded from the T11 web site at www.t11.org. The goal of the FC-VI group is to provide a mapping between Fibre Channel and VI that fully exploits the potential of both. Included within the recommended scope of this project is the creation of mappings to Fibre Channel for the transport of VIA and support for the full range of Fibre Channel topologies, including loops and fabrics, to enable scalable clustering solutions.

VI Developers Forum (VIDF)
The VI Developers Forum (VIDF) is an organization of independent software and hardware vendors that works to define and maintain VI as an open independent standard. VIDF working group topics include:
- Operating System support extensions for VI
- VIPL standardization
- socket APIs on top of VIPL
- fault tolerance
- The defining standard for multi-pathing and fail-over

Information on the VIDF can be requested by sending a request to [email protected].

VIA implementation for Linux (M-VIA)
M-VIA (Modular-VIA) is a complete high-performance implementation of the VIA for Linux. It was written at the NERSC center at Lawrence Berkeley National Laboratory. It is essentially shareware that can be ported to new hardware platforms. The NERSC web site provides a downloadable image as well as information on performance numbers and hardware support for M-VIA: http://www.nersc.gov/research/ftg/via/.

MPI layered on top of VIA
Message Passing Interface (MPI), a de facto standard for parallel communications, has been implemented on VIA to provide an efficient interconnect for clusters of workstations in a gigabit/second system area network. MPI has become the programming interface of choice in the parallel computing world and is used to interconnect digital signal processors.

For information on MPI Softech's implementation of MPI on VIA see http://www.giganet.com/cluster/whitepapers/MPIimpforvi.html

Wilf Sullivan is the Product Marketing Manager of Channel 1* networking products and Pentium* single board computers at DY 4 Systems Inc. Building on his Master's degree in Computer Science, Wilf has over 10 years' experience as a designer and developer of software for the real-time embedded environment.

REFERENCES
1. Dunning et al., "The Virtual Interface Architecture," IEEE Micro, March/April 1998, pp. 66-73.
2. Kay and Pasquale, "Profiling and Reducing Processing Overheads in TCP/IP," IEEE/ACM Transactions on Networking, Vol. 4, No. 6, December 1996, pp. 817-828.
3. Mukherjee et al., "Making Network Interfaces Less Peripheral," IEEE Computer, Oct. 1998, pp. 70-76.
4. "Next Generation I/O: A New Approach to Server I/O Architectures," Aberdeen Group Technical White Paper, February 1999, http://www.ngioforum.org/events/02991357.html.
5. Skjellum et al., "An Efficient MPI Implementation for Virtual Interface (VI) Architecture-Enabled Cluster Computing," http://www.giganet.com/cluster/whitepapers/MPIimpforvi.html.
6. "Virtual Interface (VI) Architecture Developer's Guide," Revision 1.0, September 1998, www.viarch.org.
7. "Virtual Interface Architecture for Clustered Systems," Dell Computer Corporation White Paper, September 1998, http://www.dell.com/r&d/whitepapers/wpvia/wpvia.htm.
8. "Virtual Interface Architecture Specification," Version 1.0, December 1997, http://developer.intel.com/design/servers/vi/developer/ia_imp_guide.htm.
9. Von Eicken et al., "Evolution of the Virtual Interface
