Enabling Inference Inside Software Switches

Yung-Sheng Lu and Kate Ching-Ju Lin


Department of Computer Science, National Chiao Tung University, Taiwan
{yungshenglu, katelin}@cs.nctu.edu.tw

Abstract—Software Defined Networking (SDN) has emerged to address the limitations of traditional network architectures. The programmability of modern switches gives us an opportunity to perform computational tasks inside the switches themselves. Motivated by this property, in this work we investigate the potential of enabling machine learning inside a network. To this end, we propose a new architecture, Intra-Network Inference (INI), which equips each switch with a recently released component, called the neural compute stick (NCS), to enable intra-switch neural network inference. Unlike conventional SDN architectures, which rely on backend servers for inference, INI performs inference locally at the switches and thereby reduces both the data forwarding overhead and the inference latency.

Index Terms—SDN, P4, Neural Networks

Fig. 1: Legacy Architecture for Inference

I. INTRODUCTION
Software Defined Networking (SDN) has emerged to address the limitations of traditional network architectures. In SDN, a network is divided into a control plane and a data plane. All network management is handled by the control plane in software, while switches are only in charge of simple data forwarding. Recent research has also explored how to virtualize traditional network services through Network Function Virtualization (NFV). NFV eliminates the need to provide network functions on specialized, dedicated hardware (such as firewalls, routers, etc.). Instead, network functions are implemented in software and can be flexibly deployed into (or removed from) the network as needed, which also makes it possible to develop a variety of value-added services. By integrating NFV and SDN, more diverse services can be designed and deployed.

On the other hand, with the evolution of artificial intelligence, systems can leverage data processing and model prediction to quickly process large amounts of data and make accurate decisions. In recent years, artificial intelligence has been widely used in different fields, such as smart factories, the Internet of Things, computer vision and unmanned stores. Many studies [1]–[3] have also used artificial intelligence techniques to solve network management problems. If these artificial intelligence services can be implemented as virtual network functions (VNFs) in software, an SDN can be made intelligent and managed more efficiently.

However, the combination of NFV and SDN still needs to transfer data streams from switches to different virtual machines for inference, as illustrated in Fig. 1. The data exchange between switches and virtual machines incurs a fairly long delay and a large data forwarding load, which can easily saturate the bottleneck link and lead to congestion. To resolve this problem, in this work we propose to enable inference capability inside switches, as illustrated in Fig. 2. Since advanced software switches, e.g., P4 switches, are designed with certain computational capability, it now becomes possible to perform simple inference directly in the switches, without the assistance of backend servers or virtual machines.

Fig. 2: Intra-Network Inference

To achieve this goal, this paper presents a new architecture, called Intra-Network Inference (INI). We develop a data forwarding and processing system that allows packets to be cloned to the kernel of a switch for on-line inference. To enable in-switch inference, we leverage a new hardware component, the neural compute stick (NCS), developed by Intel Movidius and recently released on the market. By connecting an NCS to a P4 switch over a USB interface, the NCS can process the cloned packets and perform real-time inference. An INI-enabled switch is capable of filtering the packets that are useful for inference and hence reduces the cost of data cloning. To verify the practicality of our design, we implement a prototype of INI using P4 switches and empirically measure the execution time required by each phase of INI.

The rest of this paper is organized as follows. Section II summarizes recent work on network management via machine learning. We then describe the design of INI in Section III and show some preliminary results in Section IV. Finally, Section V concludes this work and summarizes some directions for future study.

Fig. 3: INI Architecture

Fig. 4: Inference Process

II. RELATED WORK

Recent work has shown the possibility of leveraging machine learning to improve the efficiency of network management. The studies [4]–[7] propose to detect and monitor the elephant flows of a network via traffic classification or rule-based algorithms. The work [4] combines switch-side filtering and controller-assisted classification to enable real-time classification. A cost-sensitive learning method [5] is then proposed to further improve the inference speed. FLight [6] alternatively develops a rule-based detector based on TCP communication behaviors. Later work [7] not only detects elephant flows but also counts the size of those flows for traffic engineering. The above approaches are designed mainly based on conventional machine learning classification, whose performance is sensitive to feature selection.

Some work [8], [9] investigates the network traffic classification problem. The problem of identifying end-user applications, such as Facebook, Twitter and Skype, is explored in [8]. [9] gives a thorough survey of the challenges of network application identification, including port abuse, random port usage and tunneling.

Recent efforts then exploit deep learning techniques to enable more network services. The work [1] enables traffic optimization, such as flow scheduling, using reinforcement learning. It develops a system that consists of two components: a peripheral system (PS), which runs on all end-hosts to collect flow information and make local decisions, and a central system (CS), where global traffic information is aggregated and processed. The interaction between the two components mimics the design of reinforcement learning so as to optimize network performance. DDNN [10] enables distributed deep learning by allowing edge devices and end users to cooperatively finish inference tasks. To reduce the number of rules in TCAM, [2] proposes a reinforcement learning scheme to determine which rules are crucial and should be kept in TCAM. A malware classification system based on a CNN is developed in [3]. In this work, we select [3] as the application to demonstrate the effectiveness of our in-network inference capability.

III. INI DESIGN

A. Architecture

Figure 3 illustrates the hardware architecture of the proposed INI. We use the Edgecore Wedge 100-32X switch, with programmable Tofino switch silicon from Barefoot Networks, as our P4 switch. Each P4 switch is loaded with the Open Network Install Environment (ONIE) software installer, which is compatible with Open Network Linux (ONL). Each P4 switch also provides a USB Type-A port, to which an NCS device can be connected. The P4Runtime server and the local controller are the two main components running on a P4 switch; they communicate with each other via gRPC, a modern open-source, high-performance RPC framework that can run in any environment. ONOS (Open Network Operating System) is one of the controllers that support P4Runtime and is designed for high availability, performance and scale-out. We use ONOS as the main controller that manages the whole network, and let ONOS and the P4 switches communicate via gRPC.

Figure 4 illustrates the framework of INI, which consists of two phases. The first phase is offline model training, which creates the inference model for the second phase. We use the CNN architecture LeNet-5 for offline training, since the NCS currently supports only CNN models. Before training, we process the traffic and partition packets into sessions, each of which is defined as a bidirectional flow, i.e., it includes both directions of traffic. Since the input size of a CNN model must be fixed, we extract only the first n bytes (e.g., n = 784) of each session for model training and inference. In general, the first few bytes of a session contain the important connection information (e.g., the MAC-layer and network-layer headers) and a small portion of the payload, which should characterize the session well. When trimming all sessions to a uniform length, 0x00 bytes are appended to pad a session to n bytes if it is shorter than n bytes. After trimming, each session is converted into the input format of the inference model for training and testing. Once the model has been trained, it must be configured and compiled into an inference model that can be installed on the NCS.
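As a concrete illustration, the following Python sketch shows one possible implementation of the preprocessing described above: group packets into bidirectional sessions keyed by a direction-normalized five-tuple, keep only the first n = 784 bytes, pad short sessions with 0x00, and reshape the bytes into a 28 x 28 gray-scale image. The function names, packet representation, and the [0, 1] normalization are our assumptions, not part of the INI implementation.

```python
import numpy as np

N_BYTES = 784  # 28 * 28, the fixed CNN input size used in this paper

def session_key(pkt):
    """Normalize the five-tuple so both directions of a flow map to one session."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])

def sessions_to_images(packets):
    """packets: iterable of dicts with five-tuple fields and raw 'bytes', in arrival order."""
    sessions = {}
    for pkt in packets:
        buf = sessions.setdefault(session_key(pkt), bytearray())
        if len(buf) < N_BYTES:                       # keep only the first N_BYTES per session
            buf.extend(pkt["bytes"][:N_BYTES - len(buf)])
    images = {}
    for key, buf in sessions.items():
        buf = bytes(buf).ljust(N_BYTES, b"\x00")     # pad short sessions with 0x00
        img = np.frombuffer(buf, dtype=np.uint8).reshape(28, 28)
        images[key] = img.astype(np.float32) / 255.0  # scaling to [0, 1] is our choice
    return images
```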

After generating an inference model for an NCS, we can run the model inside the NCS for online inference. When a P4 switch receives the network traffic of a full session, we preprocess it to generate the input data for the NCS to perform inference. After obtaining the inference result, the manager can choose to send either the whole result or only an anomaly notification to the local controller. Based on the inference results, the controller can adapt its strategy by modifying the forwarding rules of the switches or banning misbehaving flows.
B. Flow Filtering

To capture only the first n bytes of each session for inference, we leverage the programmability of P4 switches to direct only those bytes to P4Runtime for further processing. By doing this, we significantly reduce the load of forwarding data between the P4 switch and P4Runtime. Figure 5 plots the process of the proposed P4 filtering rule. In the beginning, Tofino triggers a packet-in event and sends the packet to the local controller during the ingress stage; the local controller also buffers the packets. The local controller then adds an entry into the NCS table and into a second table used for connection tracking. A byte counter tracks the number of received bytes according to the connection-tracking table. The byte counter is driven by the connection-tracking table, rather than the NCS table, to ensure that the byte count accumulates from the first packet of a session and that this packet is not lost when a table-miss occurs in the NCS table. After collecting the first n bytes of a session, Tofino notifies the local controller that the queue is ready for inference.

Fig. 5: P4 filtering rule
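The switch-side behavior can be approximated in software as follows. This is a minimal Python sketch of the byte-counting filter described in this subsection, not the actual P4 program: it accumulates per-session byte counts in a connection-tracking table and signals the controller once the first n bytes have been collected. All names are illustrative.

```python
N = 784  # number of bytes to collect per session before inference

conn_track = {}   # session key -> bytes collected so far (connection-tracking table)
ncs_table = {}    # session key -> buffered bytes awaiting inference (NCS table)

def on_packet_in(key, payload, notify_controller):
    """Sketch of the per-packet filtering decision made at the ingress stage."""
    count = conn_track.get(key, 0)
    if count >= N:
        return                        # session already queued for inference; forward normally
    take = payload[:N - count]        # clone only the bytes still needed
    ncs_table.setdefault(key, bytearray()).extend(take)
    conn_track[key] = count + len(take)
    if conn_track[key] >= N:
        notify_controller(key)        # queue is ready: first N bytes collected
```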
C. In-Switch Inference

Upon receiving the notification, our system transforms the queued data into the input format of the model and sends it to the NCS. The NCS then performs inference and returns the result to the local controller. According to the inference result, the local controller adds an entry into the NCS table and into the connection-tracking table. By doing this, we can promptly detect misbehavior or infer network properties at the switches and configure rules immediately to react to network changes.
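For completeness, the sketch below shows how the queued bytes could be handed to the NCS from Python. It assumes the Intel Movidius NCSDK v1 Python API (the mvncapi module) and a pre-compiled graph file; the calls differ in NCSDK v2 and OpenVINO, so treat this as an illustration of the step rather than the INI code itself.

```python
import numpy as np
from mvnc import mvncapi as mvnc  # NCSDK v1; API names differ in later SDKs

def classify_session(image_28x28, graph_path="lenet5.graph"):
    """Run one inference on the NCS for a preprocessed 28x28 session image."""
    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("no NCS device found on the USB bus")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    with open(graph_path, "rb") as f:
        graph = device.AllocateGraph(f.read())     # load the compiled inference model
    graph.LoadTensor(image_28x28.astype(np.float16), None)
    probs, _ = graph.GetResult()                   # class scores from the model
    graph.DeallocateGraph()
    device.CloseDevice()
    return int(np.argmax(probs))                   # predicted class (e.g., malware family)
```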
IV. EVALUATION

We implement a prototype of INI to evaluate its performance. In our implementation, an Intel Movidius Neural Compute Stick 1 (Intel NCS-1) is connected to a P4 Wedge100BF-32X switch via a USB port. We compare INI with a legacy architecture, where the P4 switch is connected to a server with two GeForce GTX 1080 Ti GPUs for remote inference. To verify the effectiveness of our design, we use malware classification as the application. We use a real data set, DeepTraffic (USTC-TFC2016) [11], which includes 791,615 packets, to train a CNN-based classification model. We define packets with the same five-tuple as a flow, and the data set includes 288,614 flows. The CNN model contains two convolutional layers, two max-pooling layers and two fully connected layers, and is trained for 20,000 epochs. Each session is transformed into a gray-scale 28 × 28 image as the input of the inference model. Hence, we let each switch forward the first 784 bytes of a flow to P4Runtime for inference in the NCS.
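The paper does not list the exact layer sizes, so the following PyTorch sketch only mirrors the stated structure: a LeNet-5-style network with two convolutional layers, two max-pooling layers and two fully connected layers over a 1 × 28 × 28 input. The filter counts and the number of classes are placeholders of our own choosing.

```python
import torch
import torch.nn as nn

class SessionCNN(nn.Module):
    """LeNet-5-style classifier for 28x28 gray-scale session images (layer sizes are ours)."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),   # conv 1 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # maxpool 1 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),             # conv 2 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                             # maxpool 2 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),                  # fully connected 1
            nn.ReLU(),
            nn.Linear(120, num_classes),                 # fully connected 2
        )

    def forward(self, x):  # x: (batch, 1, 28, 28)
        return self.classifier(self.features(x))

# Example: one forward pass on a dummy batch
logits = SessionCNN()(torch.zeros(1, 1, 28, 28))
```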

TABLE I: Inference Execution Time

Metrics (ms)               Legacy    INI
Transmission per packet     39.53    0.378
Image transformation         0.35    0.38
Inference                    3.78    8.56
Total                       43.66    9.318

Execution Time of In-Switch Inference: We partition our INI framework into three steps: packet forwarding (transmission), transformation from packets to images, and inference. Table I reports the time required by each step for the legacy GPU architecture and for INI, respectively. To measure the transmission time between the P4 switch and P4Runtime (or the GPU server), we transmit the real traffic received by Tofino either to P4Runtime via the Tofino ingress or to the remote GPU server through a 100 Mbps link, and calculate the average transmission time per packet. Note that the legacy architecture requires each switch to forward its packets to a backend server. To emulate this scenario, we generate 9 additional connections that send a file to the GPU server at the same time. As multiple switches share the bottleneck link connecting to the GPU server, the transmission time is fairly high. In contrast, INI only needs to redirect packets from the P4 switch to P4Runtime; the data forwarding path is extremely short, which reduces the transmission time significantly.

As for image conversion, since both the legacy architecture and INI perform this step with comparable computing power, the processing time is similar. Finally, to measure the inference time, we perform inference for all the flows, record the aggregate time required to finish all the inference tasks, and report the average inference time per flow in Table I. As the NCS is a less capable device, its inference speed is slightly worse than that of the GPU. Moreover, our result overestimates the performance of the GPU, since we simply compute the average time required by a single inference request without considering the queueing delay; a GPU server has to serve many switches and may buffer many requests in its queue. In other words, the actual inference time in the legacy architecture, including the queueing delay, could be even longer. Summing up all the processing times, we can see that INI reduces the overall execution time significantly compared to the legacy architecture, demonstrating the effectiveness of our intra-network inference design.

Fig. 6: Malware Classification Accuracy

Inference accuracy: The trained model is a multi-class classification model, which classifies a flow into one of the candidate applications (normal or malware). We reserve 80% of the flows to train the CNN model and use the remaining 20% to test its accuracy. The prediction results are summarized as a confusion matrix in Figure 6. The figure shows that most of the classes can be classified accurately, with a prediction ratio higher than 94%. Although the processing speed of the NCS is slightly lower than that of the GPU server, it still achieves accurate predictions, showing its potential for realizing intra-network machine learning.
Inference request arrival rate: We finally examine whether the inference capability of the NCS is sufficiently high for real-time traffic. To determine the arrival rate of inference requests, we extract the timestamp of each packet and find the timestamp of the packet that contains the 784-th byte of each flow. When this packet arrives, a new inference request is sent to P4Runtime; that is, the timestamp of the packet containing the 784-th byte is exactly the timestamp of an inference request. We then count the number of inference requests arriving every millisecond.

Fig. 7: Inference Request Interval
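A possible offline implementation of this measurement, using Scapy to read the trace, is sketched below: it records the timestamp at which each flow reaches its 784-th byte and bins those timestamps per millisecond. The use of Scapy, the flow key, and the function names are our assumptions.

```python
from collections import Counter
from scapy.all import rdpcap, IP, TCP, UDP

N_BYTES = 784

def request_arrivals(pcap_path):
    """Return a Counter mapping millisecond bins to the number of inference requests."""
    byte_count, arrivals = {}, []
    for pkt in rdpcap(pcap_path):
        if IP not in pkt:
            continue
        l4 = TCP if TCP in pkt else UDP if UDP in pkt else None
        if l4 is None:
            continue
        key = tuple(sorted([(pkt[IP].src, pkt[l4].sport), (pkt[IP].dst, pkt[l4].dport)]))
        before = byte_count.get(key, 0)
        byte_count[key] = before + len(bytes(pkt))
        if before < N_BYTES <= byte_count[key]:
            arrivals.append(float(pkt.time))      # this packet carries the 784-th byte
    return Counter(int(t * 1000) for t in arrivals)  # requests per millisecond bin
```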
Figure 7 plots the number of inference requests per millisecond over time. The figure shows that, most of the time, there are more than 10 requests per millisecond. However, each inference takes 8.56 ms on average, as shown in Table I. This result verifies that the arrival rate of inference requests is far higher than the service rate of an NCS (and, in fact, of a GPU as well). That is, a single NCS cannot handle all the inference requests in a network by itself. It is hence worth studying how to leverage multiple NCS-equipped switches to share the inference load of a network in the future.
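The mismatch can be quantified with a back-of-the-envelope calculation that simply restates Table I and Fig. 7: at more than 10 requests per millisecond and 8.56 ms of service time per inference, a single NCS is overloaded by roughly two orders of magnitude, so on the order of a hundred NCS-equipped switches would be needed to absorb this load.

```python
arrival_rate = 10 / 1e-3         # >= 10 inference requests per millisecond (Fig. 7)
service_time = 8.56e-3           # seconds per inference on one NCS (Table I)
service_rate = 1 / service_time  # about 117 inferences per second per NCS

utilization = arrival_rate / service_rate
print(f"offered load is {utilization:.0f}x what one NCS can serve")
# -> roughly 86x, i.e., on the order of a hundred NCS devices are needed
```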
V. CONCLUSION

In this work, we presented an intra-network inference architecture, called Intra-Network Inference (INI). We combine programmable switches with a recently released component, the neural compute stick (NCS), to enable switches to perform local inference. We develop filtering rules in the switch and communication channels between the switch and P4Runtime to realize in-network inference. By doing this, our architecture avoids the heavy load of forwarding data between the data and control planes and further enables real-time network management inside the network. We implement a prototype of INI to measure the execution time required by each processing step and point out some potential future research directions.

REFERENCES

[1] L. Chen, J. Lingys, K. Chen, and F. Liu, "AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization," in ACM SIGCOMM, 2018.
[2] T.-Y. Mu, A. Al-Fuqaha, K. Shuaib, F. M. Sallabi, and J. Qadir, "SDN flow entry management using reinforcement learning," ACM Trans. Auton. Adapt. Syst., vol. 13, no. 2, Nov. 2018.
[3] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, "Malware traffic classification using convolutional neural network for representation learning," in International Conference on Information Networking (ICOIN), Jan. 2017.
[4] Y. Huang, W. Shih, and J. Huang, "A classification-based elephant flow detection method using application round on SDN environments," in 19th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sep. 2017.
[5] P. Xiao, W. Qu, H. Qi, Y. Xu, and Z. Li, "An efficient elephant flow detection with cost-sensitive in SDN," in 1st International Conference on Industrial Networks and Intelligent Systems (INISCom), Mar. 2015.
[6] A. AlGhadhban and B. Shihada, "FLight: A fast and lightweight elephant-flow detection mechanism," in IEEE International Conference on Distributed Computing Systems (ICDCS), Jul. 2018.
[7] S. C. Madanapalli, M. Lyu, H. Kumar, H. H. Gharakheili, and V. Sivaraman, "Real-time detection, isolation and monitoring of elephant flows using commodity SDN system," in IEEE/IFIP Network Operations and Management Symposium, Apr. 2018.
[8] B. Yamansavascilar, M. A. Guvensan, A. G. Yavuz, and M. E. Karsligil, "Application identification via network traffic classification," in International Conference on Computing, Networking and Communications (ICNC), Jan. 2017.
[9] A. Tongaonkar, R. Keralapura, and A. Nucci, "Challenges in network application identification," in USENIX Workshop on Large-Scale Exploits and Emergent Threats, 2012.
[10] S. Teerapittayanon, B. McDanel, and H. T. Kung, "Distributed deep neural networks over the cloud, the edge and end devices," in IEEE International Conference on Distributed Computing Systems (ICDCS), Jun. 2017.
[11] "DeepTraffic dataset." [Online]. Available: https://github.com/echowei/DeepTraffic/tree/master/1.malware_traffic_classification

© Copyright IEICE – The 20th Asia-Pacific Network Operations and Management Symposium (APNOMS) 2019
