Enabling Inference Inside Software Switches
I. INTRODUCTION
Software Defined Networking (SDN) has emerged to address the limitations of traditional network architectures. In SDN, a network is divided into the control plane and the data plane. All network management is handled in software by the control plane, while switches are only in charge of simple data forwarding. Recent research has also explored how to virtualize traditional network services through Network Function Virtualization (NFV). NFV eliminates the need to provide network functions with specialized, dedicated hardware (such as firewalls, routers, etc.). Instead, network functions are implemented in software, which can be flexibly deployed into (or removed from) the network as needed. It also becomes possible to develop a variety of value-added services. By integrating NFV and SDN, more diverse services can be designed and deployed.

On the other hand, with the evolution of artificial intelligence, systems can leverage data processing and model prediction to quickly process large amounts of data and make accurate decisions. In recent years, artificial intelligence has been widely used in different fields, such as smart factories, the Internet of Things, computer vision, and unmanned stores. Many studies [1]-[3] have also used artificial intelligence techniques to solve network management problems. If these artificial intelligence services can be implemented as virtual network functions (VNFs) in software, an SDN can be made intelligent and managed more efficiently.

However, combining NFV and SDN still requires transferring data streams from switches to different virtual machines for inference, as illustrated in Fig. 1. The data exchange between switches and virtual machines incurs a fairly long delay and a large amount of data forwarding load, which can easily saturate the bottleneck link and lead to congestion. To resolve this problem, in this work we propose to enable inference capability inside switches, as illustrated in Fig. 2. As advanced software switches, e.g., P4 switches, are designed with a certain degree of computational capability, it now becomes possible to perform simple inference directly in switches, without the assistance of backend servers or virtual machines.

Fig. 2: Intra-Network Inference

To achieve this goal, this paper presents a new architecture, called Intra-Network Inference (INI). We develop a data forwarding and processing system that allows packets to be cloned to the kernel of a switch for on-line inference. To enable in-switch inference, we leverage a new hardware device, the neural compute stick (NCS), developed by Intel Movidius and recently released on the market. By connecting an NCS to a P4 switch over a USB interface, the NCS can process the cloned packets and perform real-time inference. An INI-enabled switch is capable of filtering the packets that are useful for inference, and hence reduces the cost of data cloning. To verify the practicality of our design, we implement a prototype of INI using P4 switches and empirically measure the execution time required by each phase of INI.

The rest of this paper is organized as follows. Section II summarizes recent work on network management via machine learning. We then describe the design of INI in Section III and show some preliminary results in Section IV.
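To make the INI data path described above more concrete, the sketch below shows how a small daemon on the switch could consume cloned packets from a kernel-visible port, filter the ones worth analyzing, and hand a feature vector to the NCS. This is a minimal illustration under assumed names: the interface name "cpu-port0", the feature extraction, and the ncs_infer() stub are hypothetical and not part of the INI prototype; the actual NCS invocation (e.g., loading a compiled graph and calling LoadTensor()/GetResult() in the Movidius NCSDK) is omitted.

```python
# Sketch of an in-switch inference daemon (illustrative only).
# Assumptions: packets cloned by the P4 pipeline appear on a kernel
# interface named "cpu-port0", and ncs_infer() wraps the vendor SDK
# call that runs a pre-compiled model on the Neural Compute Stick.
import socket

CLONE_IFACE = "cpu-port0"   # hypothetical CPU/clone port of the switch
IP_PROTO_TCP = 6

def wants_inference(frame: bytes) -> bool:
    """Cheap filter: only IPv4/TCP frames are forwarded to the NCS."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":   # not IPv4
        return False
    return frame[23] == IP_PROTO_TCP                      # IP protocol field

def to_features(frame: bytes) -> list:
    """Placeholder feature extraction: first 256 payload bytes, normalized."""
    payload = frame[54:54 + 256]          # assumes 20-byte IP + 20-byte TCP headers
    return [b / 255.0 for b in payload.ljust(256, b"\x00")]

def ncs_infer(features: list) -> int:
    """Placeholder for the NCS call; returns a class label."""
    raise NotImplementedError("wire up the vendor SDK here")

def main():
    # Raw socket bound to the clone interface (requires root on Linux).
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
    sock.bind((CLONE_IFACE, 0))
    while True:
        frame, _ = sock.recvfrom(2048)
        if wants_inference(frame):
            label = ncs_infer(to_features(frame))
            print(f"inference result: {label}")

if __name__ == "__main__":
    main()
```

Filtering in this sketch happens in user space for clarity; in INI the same kind of filter can be expressed as match-action rules in the P4 pipeline so that only useful packets are cloned in the first place.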
TABLE I
Metrics (ms)               Legacy    INI
Transmission per packet    39.53     0.378
Image transformation       0.35      0.38
Inference                  3.78      8.56
Total                      43.66     9.318
Figure 7 plots the number of inference requests per millisecond over time. The figure shows that, most of the time, there are more than 10 requests per millisecond. However, each inference takes 8.56 ms to process on average, as shown in Table I. This result verifies that the arrival rate of inference requests is much higher than the service rate of an NCS (and, in fact, of a GPU as well). That is, it is not possible to rely on a single NCS to handle all the inference requests in a network. It is hence worth studying, in future work, how to leverage multiple NCS-equipped switches to share the inference load of a network.
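To put these two numbers together, the back-of-the-envelope estimate below combines an assumed arrival rate of 10 requests per millisecond (the lower bound observed in Fig. 7) with the 8.56 ms per-request NCS service time from Table I. The script and its variable names are illustrative, not part of the INI prototype.

```python
# Rough capacity estimate for sharing inference load across switches.
# Assumed numbers: arrival rate from Fig. 7 (>10 requests/ms) and the
# per-request NCS service time from Table I (8.56 ms).
import math

arrival_rate = 10.0    # requests per millisecond (assumed lower bound)
service_time = 8.56    # milliseconds per request on one NCS (Table I)

# Offered load expressed in "busy NCS devices": rho = lambda * service time.
offered_load = arrival_rate * service_time     # ~85.6
min_devices = math.ceil(offered_load)          # at least 86 NCS units

print(f"offered load: {offered_load:.1f} NCS-equivalents")
print(f"minimum NCS-equipped switches for stability: {min_devices}")
```

Even this optimistic estimate, which ignores queueing delay entirely, calls for on the order of 86 NCS devices, which is why distributing the inference load over many NCS-equipped switches is a natural next step.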
Fig. 6: Malware Classification Accuracy

[Fig. 7: Number of inference requests per millisecond over time (y-axis: Number of Requests, log scale)]

V. CONCLUSION

In this work, we presented an intra-network inference architecture, called Intra-Network Inference (INI). We combine programmable switches with a recently released component, the neural compute stick (NCS), to enable switches to perform local inference. We develop filtering rules in the switch and communication channels between the switch and P4Runtime to realize in-network inference. By doing so, our architecture avoids the heavy load of data forwarding between the data and control planes and further enables real-time network management inside the network. We implement a prototype of INI to verify the practicality of our design and to measure the execution time of each of its phases.