
2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN)
DOI: 10.1109/ICMLCN59089.2024.10625142

ROAR: Routing Packets in P4 Switches With Multi-Agent Decisions Logic

Antonino Angi∗, Alessio Sacco∗, Flavio Esposito†, Guido Marchetto∗
∗ Department of Control and Computer Engineering, Politecnico di Torino, Italy
† Computer Science Department, Saint Louis University, USA

Abstract—The soaring complexity of networks has led to more complex methods to efficiently manage and orchestrate the multitude of network environments. Recent advances in machine learning (ML) have opened new opportunities for network management automation, exploiting existing advances in software-defined infrastructures. Advanced routing strategies have been proposed to accommodate the traffic demand of interactive systems, where the common architecture is composed of a data-driven network management schema collecting network data that feed a reinforcement learning (RL) algorithm. However, the overhead introduced by the SDN controller and its operations can be mitigated if the networking architecture is redesigned. In this paper, we propose ROAR, a novel architectural solution that implements Deep Reinforcement Learning (DRL) inside P4 programmable switches to perform adaptive routing policies based on network conditions and traffic patterns. The network devices act independently in a multi-agent reinforcement learning (MARL) framework but are able to learn cooperative behaviors to reduce the queuing time of transmitting packets. Experimental results show that, for an increasing amount of traffic in the network, there is both a throughput and a delay improvement in the transmission compared to traditional approaches.

Index Terms—deep reinforcement learning, P4, routing

I. INTRODUCTION

In recent years, there has been a rapid increase in the number of brand-new applications, which not only places more and more demands on communication technologies, e.g., 5G and 6G, but also poses significant difficulties for the Internet. Each application, in particular, has distinct yet strict requirements for latency, jitter, throughput, and packet loss rate. We can observe that as networks continue to evolve and become more complex, the need for efficient routing mechanisms becomes increasingly crucial.

One dictating trend is applying Machine Learning (ML) and Deep Learning (DL) to routing with the aim of leveraging information about past traffic conditions to learn good routing configurations for future conditions [1]. The flexibility provided by SDN enables fast reactive and proactive network management, in which the SDN controller can easily observe the network and react to changes in traffic demand and evolutions [2], [3]. One class of ML is Reinforcement Learning (RL), which fits routing problems well given its automatic learning improvements to achieve the optimal policy [4]. In recent years, many research works have integrated RL and Deep Reinforcement Learning (DRL) into SDN networks for routing optimizations [5], [6].

However, these approaches based on centralized controllers are inherently too slow to respond to fine-grained traffic changes, as in short traffic bursts. Moreover, even when the software control plane is local on the switches, the ability to select new routes is often limited and not fast enough [7].

To overcome these limitations, programmable data planes have recently gained popularity, and different works have developed mechanisms that, operating entirely in the data plane, enable real-time adaptation [8], [9]. These solutions can deliver considerable performance benefits over more static mechanisms and centralized approaches by using fine-grained performance information on hardware timescales. However, these techniques are limited to trivial performance-aware policies that are unable to learn during the execution and, consequently, to adapt to multiple scenarios. Moreover, as the complexity of the network grows, determining the optimal routing policies to avoid congestion and improve performance has become increasingly essential but also challenging.

To automate routing decisions directly in the network device, we designed Reinforcement learning for Autonomous Routers (ROAR). In our solution, we use network programmability in general, and P4 [10] programmable switches in particular, to perform distributed routing decisions via Deep Reinforcement Learning (DRL). As such, every switch of the network is an agent of the DRL system, which uses the algorithm to decide the forwarding port for the incoming packet according to two main factors: the next hop to the destination, known from the topology of interest, and the port's outgoing queue, to be minimized. Since the switches constantly learn from the environment, the model is periodically trained to consider the impact of different routing decisions on network performance and select the best route based on learned policies. Designing a DRL model with P4 is known to be challenging since the architecture does not support loops, complex arithmetical operations, or if-else conditions in action blocks, which are essential for the DRL algorithm. To overcome this limitation, we modify the P4-16 compiler in order to adapt it to a C++ external module, which is the fundamental intermediary between the P4 application and the DRL algorithm. We evaluated our solution on an emulated network over Mininet, showing that when the network starts being congested, the benefits of ROAR can be observed in the increase of throughput and the reduction of delay.

The rest of the paper is structured as follows. Section II presents literature about RL-based routing. In Section III, we describe ROAR's components. Section IV presents the experimental results, and finally, Section V concludes the paper.


II. RELATED WORK

With the increasing number of connected devices and more demanding applications, e.g., Tactile Internet, metaverse, the volume of data traffic flowing through networks has grown exponentially. This has created a pressing need for efficient routing mechanisms to optimize network performance, reduce latency, and ensure reliable data transmission, bringing researchers to propose different automated and reactive approaches, usually combining ML techniques, such as RL and DRL, to perform traffic prediction for load balancing and routing [7]. One example is QR-SDN [11], where the authors use tabular RL (Q-learning) techniques to reduce latency across the network by optimizing multipath routing. An SDN centralized controller routes packets using a flow-preserving strategy that aims to minimize the latency of transmissions. Another interesting study is RSIR [12], which implements a knowledge plane, in connection with the management plane, to store the data, which are then used by the SDN centralized controller to find the shortest routing path and balance the load, comparing their results with the classic Dijkstra algorithm. As the authors suggest, adopting a centralized controller with a global view brought a low response time in case of topology changes. This solution works well in limited scenarios, but in large network scenarios and with an increasing number of flows, we witness a degradation of throughput and overall delay due to the constant interaction with a centralized controller that cannot ensure reliable packet transmissions.

Over the last few years, Deep Reinforcement Learning (DRL) methods have also been quickly adopted in many fields of networking, from the Internet of Things (IoT) to concurrent multipath transfer data scheduling, mobile-edge computing, and heterogeneous networks, focusing especially on routing optimization and congestion control [5]. An example of a DRL-based approach is [13], where the authors propose DROM, a deep policy gradient routing optimization mechanism that uses neural networks to improve the network's overall performance. However, this work requires human intervention for the network strategy customization and maintenance that affects the rewarding function [7]. Routing optimization mechanisms were also essential in heterogeneous networks where various devices with different capabilities, such as cell phones, laptops, and tablets, are connected, and efficient resource utilization was vital to avoid delays and maximize the throughput. An example of an application in a heterogeneous network is SmartCC [14], a multipath congestion control approach based on DRL where an asynchronous RL framework learns a set of congestion rules and adapts the congestion windows accordingly. Several attempts to alleviate network congestion and balance network load utilize DRL methods running without [15] or with an SDN controller, such as DRLS [16] and IQoR-LSE [17], where great focus is brought to the collection of network measurements.

While with the advent of programmable switches there was an attempt to move part of the computation inside network devices, e.g., [18], [19], to the best of our knowledge there is no current routing solution having a data-driven DRL routing algorithm running inside the device. As shown in Fig. 1, differently from other centralized solutions that face the issue of data collection, we propose a routing optimization mechanism aiming at providing a general solution adaptable to any network scenario and any P4-compatible switch.

Fig. 1: Design space for adaptive routing techniques, highlighting differences with the state-of-the-art: non-ML approaches (e.g., ECMP, OSPF) cannot adapt to topology changes; supervised ML demands a comprehensive dataset for all scenarios; centralized RL (e.g., RSIR, QR-SDN) is slow to react for large topologies; distributed RL with external decision logic (e.g., DROM, SmartCC) has limited scalability; ROAR sits in the distributed, internal-decision-logic category.

III. SYSTEM ARCHITECTURE AND COMPONENTS

In this section, we describe the design of our solution, as well as the principles behind this definition. As represented in Fig. 2, we can observe that ROAR revolves around using P4 switches to control the forwarding plane directly inside the network device, adopting forwarding decisions according to the outcome of a deep reinforcement learning approach. Every switch of our network is composed of three main blocks: (i) the P4 application, which constitutes the logic of the network device, (ii) the Deep Reinforcement Learning (DRL) module, which runs the learning algorithm to route the packets, and (iii) the Inter-process communication (IPC) module, which is composed of the high-level modules and data structures needed to connect the P4 application to the DRL module.

Fig. 2: Overview of ROAR and its differences compared to a centralized solution. The solution is based on the control of packet forwarding planes directly in networking devices, also showing the blocks composing every network switch.

A. DRL Module in ROAR

In ROAR, every switch of the network is an agent of the DRL, making the solution a Multi-Agent Reinforcement Learning (MARL) [20] scenario that applies forwarding decisions according to the result of the DRL model in an environment where other agents are also trying to reach the same goal. We consider a system of M agents (routers) within a shared environment without a centralized controller responsible for gathering rewards or making decisions on behalf of the agents. In this setup, the collection of agents is represented as M_t, where M denotes the cardinality of the set, and every agent has the capability to communicate with all other agents. In particular, the composition of the agent set can possibly vary over time (e.g., failures) and is defined as M_t at a given time t.


Our time-varying MARL process is defined as a tuple ⟨{S^i}_{i∈M}, {A^i}_{i∈M}, P, {R^i}_{i∈M}, {M_t}_{t≥0}⟩, where S^i denotes the local state space of agent i in M_t, and A^i is the action set that agent i can execute. Besides, A = ∏_{i=1}^{M} A^i is the joint action space of all agents, also referred to as the global action profile. We then proceed by defining the local reward function of agent i, denoted as R^i : S × A → R, and the state transition probability function P : S × A × S → [0, 1]. In this setup, we assume that the states and actions have a global impact but are locally observable, as are the rewards, which are only observed locally. At each time step t, given the state s_t ∈ S and the joint actions of the agents a_t = (a_t^1, ..., a_t^M) ∈ A, each agent receives an individual reward r_{t+1}^i. This reward is given by an equation that captures the incentive that the learning model wants to encode and is determined by R^i(s_t, a_t). Additionally, the system transitions to a new state s_{t+1} ∈ S with a probability of P(s_{t+1} | s_t, a_t). Our model is considered fully decentralized and individual, as each agent receives rewards locally and performs actions independently. As opposed to an SDN centralized scenario, where the controller has a global view of the network, we design a fully distributed solution. The leading idea is to train agents independently of each other to simplify the coordination process among routers and reduce the overhead caused by continuous updates. On the contrary, routers only exchange information about the known topology to consolidate it into a virtual global network view. In this scenario, agents are independent of each other, considering that they do not share network state information or model parameters.

In each ROAR agent i, the action A^i is a discrete number ranging from 1 to N, where N is the number of ports the switch uses. The state S^i, instead, is composed of three elements: (i) the current destination, which is the current packet's destination IP address, (ii) the future destinations, which is a list of the L next packets' destination IP addresses that follow the same route as the current one, and (iii) the action history, which is a list of the last k actions adopted for the current packet's destination. Every time a given action for a certain state has been performed, e.g., a packet has been forwarded towards a specific port, the reward function is evaluated to update the expected cumulative reward (Q-value) for that state-action couple. Thus, it takes into account two main factors: (i) the queuing time, which is the time every packet has spent in the output queue, and (ii) the distance of the chosen next hop from the final destination. While we want to minimize the time spent by the packets in the outgoing queues, the distance of the next hop from the final destination is the only information the agent knows about the global topology. The reward function R^i also considers two indicators: delivered, σ1, set to 1 if the packet is correctly routed, 0 otherwise, and dropped, σ2, set to 1 if the packet has been dropped, 0 otherwise. Their value is always set to 0 in the case of spine switches, as they are not directly connected to any destination host. Summarizing, the reward function for each agent i is:

R^i = λ1 ∗ σ1 − λ2 ∗ q − λ3 ∗ σ2 − λ4 ∗ σ3 ∗ d    (1)

where: (i) the λ values are the model's hyper-parameters, set during the training to check the performance of the algorithm, (ii) q is the time that the packet has spent in the queue before being sent, and (iii) σ3 is a parameter indicating whether the switch is a spine one and is multiplied by d, i.e., the distance to the destination switch.
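To make these definitions concrete, the following minimal Python sketch mirrors the state, action, and reward of Eq. (1); the class layout, the example λ weights, and the flag handling are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import List

@dataclass
class AgentState:
    """Local state S^i of one switch-agent."""
    current_destination: str        # destination IP of the packet being routed
    future_destinations: List[str]  # next L destination IPs following the same route
    action_history: List[int]       # last k output ports chosen for this destination

# The action A^i is simply an output-port index in {1, ..., N}.

def reward(delivered, dropped, queue_time, is_spine, distance,
           lambdas=(1.0, 0.5, 1.0, 0.1)):
    """Eq. (1): R^i = λ1*σ1 - λ2*q - λ3*σ2 - λ4*σ3*d.
    delivered (σ1) and dropped (σ2) are 0/1 flags, forced to 0 on spine switches;
    queue_time (q) is the time spent in the output queue; distance (d) is the
    distance to the destination switch. The λ weights here are placeholders:
    the paper tunes them as hyper-parameters during training."""
    l1, l2, l3, l4 = lambdas
    if is_spine:  # spine switches are not directly attached to any destination host
        delivered, dropped = 0, 0
    return l1 * delivered - l2 * queue_time - l3 * dropped - l4 * is_spine * distance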
Our Neural Network. In ROAR, we determined the number of input features for our NN, i.e., the state space, by computing their mean reward and selecting the list length for which they achieve the highest value. After different settings and evaluations, we set as input the last two taken decisions, i.e., the length of the "future destinations" and "action history" state. Being categorical features, we need to convert them into numerical ones using encoding techniques. For this work, we used the One-Hot-Encoding method, which uses dummy variables to perform categorical encoding and performs better than other techniques according to the precision-recall curve (PR-AUC) metric [21].

These encoded features are now ready to be the inputs of our NN model. We evaluated different NN structures on leaf and backbone switches, finding that an architecture composed of 3 hidden layers of 128, 64, and 32 neurons can achieve the best performance. It is important to notice that the DRL algorithm is not applied every time a switch receives a packet: the overhead would be too high, and it would take tens of milliseconds to make a routing decision, which would be intolerable at line rate for modern switches. The resulting trade-off is to adopt a static routing guided by the DRL algorithm by means of periodical updates. It has been tested that updating the single route towards a specific destination every 10,000 packets maximizes the performance of our network.
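As an illustration of this input pipeline, the sketch below one-hot encodes the two categorical state components and feeds them to a 128-64-32 fully connected network that scores the candidate output ports. PyTorch, the port count, and the destination vocabulary size are assumptions made here for concreteness; the paper does not name the framework.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_PORTS = 4   # assumed number of usable output ports (action space size)
N_DESTS = 10  # assumed destination "vocabulary": the 10 servers of the testbed

def encode_state(future_dests, action_history):
    """One-hot encode the last two decisions kept in the state: the next L=2
    destination indices and the last k=2 output ports, then concatenate them."""
    parts = [F.one_hot(torch.tensor(d), N_DESTS) for d in future_dests]
    parts += [F.one_hot(torch.tensor(p), N_PORTS) for p in action_history]
    return torch.cat(parts).float()

# Q-network with 3 hidden layers of 128, 64, and 32 neurons, one output per port.
q_net = nn.Sequential(
    nn.Linear(2 * N_DESTS + 2 * N_PORTS, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, N_PORTS),
)

state = encode_state(future_dests=[3, 7], action_history=[1, 2])
# The chosen port is refreshed periodically (the paper uses every 10,000 packets),
# not re-computed for every incoming packet.
best_port = int(torch.argmax(q_net(state)))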


B. P4-based Actions

P4 is a networking programming language that allows defining the data plane processing of a switch in a high-level structure and generates efficient code that can run on different hardware targets, including ASICs, FPGAs, and CPUs. It provides a way to define how packets should be processed through a network device, including how they are parsed, matched against rules, and modified. To do so, P4 is composed of three main blocks: a parser, a match-action pipeline, and a deparser. The parser is designed as a finite-state machine that analyzes and extracts headers. For example, a packet may begin with an Ethernet header, followed by an IPv4 or IPv6 header; the parser extracts all of these headers and passes them to the next step, the match-action pipeline. This stage is composed of checksum verification algorithms and ingress and egress pipelines, which are composed of structures (e.g., tables, registers, counters) that P4 uses to customize switch behavior and implement routing strategies based on policy. The deparser is the final stage, defining how outgoing packets are constructed from a set of header fields. In the following, we mostly focus on the match-action pipeline's ingress and egress, while considering that the parser recognizes and extracts the Ethernet, IP, UDP, TCP, and ICMP headers.

When a ROAR router receives a packet, it performs two main operations: (i) it inserts the current packet's IP destination address inside the "future destinations" data structure, accessed by the IPC module, and (ii) it chooses the output port given by the DRL module. The egress performs two other main operations: (i) it forwards the packet according to a FIFO criterion and removes its destination IP address from the "future destinations" data structure, and (ii) it interacts with the IPC module to determine the reward for that specific forwarding action according to the time spent by the packet in the outgoing queue. Due to P4 limitations that prevent implementing any machine-learning method inside the default P4 compiler, we had to modify it and enable "extern" instances, which allow implementing external methods outside the P4 program [22]. The P4 program can reference the extern object and pass inputs to it, but the implementation details of how the object works are hidden from the P4 program. This makes the separation between the network's control and data plane easier, allowing the P4 program to focus on packet processing logic while leaving the low-level details of interacting with hardware to the extern objects.

C. IPC module

In ROAR, we use an IPC module deployed inside each switch to act as an intermediary between our P4-based network application and the DRL algorithm. The IPC block serves to bridge the two modules by providing a means for the P4 application to communicate the packet counters to the DRL, and for the DRL to communicate the chosen actions, i.e., the next hop, to the P4 forwarding plane. As mentioned previously, our IPC module includes functionality for preprocessing and transforming data to make it suitable for the DRL algorithm and, in particular, for the NN method, as well as for monitoring and collecting feedback on the performance of the system for the reward function. More in detail, our IPC module uses a sockets abstraction written in C++ to establish a communication channel and exchange data with the DRL module.
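The paper states that the switch side of this channel is a C++ socket abstraction, but it does not describe the wire format or the DRL-side endpoint; the following sketch is therefore only a plausible guess at that endpoint, assuming a local TCP socket and a line-delimited JSON exchange of counters and chosen ports.

import json
import socket

HOST, PORT = "127.0.0.1", 9999   # assumed local endpoint exposed to the C++ IPC module

def serve(choose_port):
    """Accept counter/queue updates from the P4/IPC side and reply with the next-hop port.
    choose_port: callable mapping the decoded state dict to an output-port index."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn, conn.makefile("rw") as stream:
            for line in stream:                      # one JSON message per line
                msg = json.loads(line)               # e.g. {"dst": 3, "queue_time": 0.8}
                port = choose_port(msg)
                stream.write(json.dumps({"port": port}) + "\n")
                stream.flush()

# Example: always forward on port 1 (a real deployment would query the DRL policy).
# serve(lambda msg: 1)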
IV. EVALUATION

This section describes our experimental settings and the results obtained over a virtual testbed like Mininet, focusing on a comparison between ROAR, a traditional routing protocol, and a centralized SDN solution.

A. Evaluation settings

To validate ROAR's benefits, we used Mininet, a network emulator that allows reproducing virtual networks and using them as a testbed for simulation purposes. Being specifically designed for software-defined networking (SDN), it supports P4-compatible switches via the Behavioral Model Version 2 (BMV2), which allows compiling a P4 program into packet-processing actions of C++11 software switches. The topology used for testing is composed of 10 servers connected to their switches, which are in turn connected to 4 other switches, in a leaf-spine fashion, with all the links of the topology having 100 Mbps bandwidth. In all the performed tests, we use iperf3, a tool to perform measurements on the network according to different bandwidths, protocols, and buffer setups. Each server of the network sends packets to any other server, varying the number of receiving servers (from 1 to 9) and replicating the workload as described in [23]. For each network load, we then computed the average of the obtained results (i.e., RTT, FPS, throughput, and packet loss) and drew the two-tailed confidence interval at 95%. For the context of this work, we compare our results against two alternatives: a traditional routing protocol implementation such as OSPF, which uses longest prefix match tables to perform routing across the network and does not depend on the current network load; and a centralized SDN solution, QR-SDN [11], that routes packets according to the output of a tabular RL algorithm.
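A minimal Mininet sketch of such a testbed is shown below. It uses default switches and 100 Mbps TCLinks, with one leaf switch per server (an assumption, since the paper only states 10 servers, 4 spine switches, and a leaf-spine layout); the BMV2 P4 switch integration, the P4 program, and the table population are omitted.

from mininet.topo import Topo
from mininet.net import Mininet
from mininet.link import TCLink

class LeafSpine(Topo):
    """10 hosts, one leaf switch each, fully meshed to 4 spine switches, 100 Mbps links."""
    def build(self, n_hosts=10, n_spines=4):
        spines = [self.addSwitch('s%d' % i) for i in range(1, n_spines + 1)]
        for j in range(1, n_hosts + 1):
            leaf = self.addSwitch('l%d' % j)
            host = self.addHost('h%d' % j)
            self.addLink(host, leaf, bw=100)
            for spine in spines:
                self.addLink(leaf, spine, bw=100)

if __name__ == '__main__':
    net = Mininet(topo=LeafSpine(), link=TCLink)
    net.start()
    # iperf3 server/client pairs would be launched on the hosts here.
    net.stop()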
B. Random Traffic Generation

To study the performance of our solution, we started by generating traffic using the iperf3 tool, which helped us control the level of congestion of our network and compare the results with QR-SDN and the traditional routing implementation, i.e., OSPF (Fig. 3). We can see from Fig. 3a that when the load is low (10% to 20%), the network is not congested, and QR-SDN shows a lower Round Trip Time (RTT), while ROAR and OSPF perform similarly. This might be caused by the fact that ROAR uses DRL to choose the optimal route and, every 10,000 packets, the IPC module interacts with the DRL one to retrieve information about the best port to forward the packet. This interaction can indeed impact the network performance, decreasing the overall throughput. However, when the network load starts to increase (20% to 40%), a significant difference is visible with OSPF, while QR-SDN still achieves a lower RTT than ROAR. When our network is highly congested (from 50% to 90% of network load), we can clearly identify the benefits brought by ROAR, where the RL method accurately chooses the best forwarding port to avoid congestion and decrease the RTT, while for QR-SDN the interaction with an SDN controller impacts the overall performance of the network.


Fig. 3: Comparison of ROAR, OSPF and QR-SDN, measuring the (a) RTT evolution, (b) throughput, and (c) packet loss at increasing network load.

Fig. 4: Comparison of ROAR and QR-SDN with realistic traffic workloads, measuring the (a) FPS evolution and (b) throughput at increasing network load. (c) Throughput in 200 seconds when the network load is at 60%.
A similar behavior is visible when evaluating the throughput, as in Fig. 3b. The figure shows that when the network is not congested and our load is low (10% to 20%), ROAR performs as the traditional routing implementation, while QR-SDN achieves higher throughput. However, a different behavior can be seen when our network load increases. While QR-SDN achieves better throughput than OSPF, ROAR allows our network to handle congestion better and perform better than the compared solutions.

Packet delivery is another relevant metric when evaluating a solution, as it is an indicator of how well the network responds to the implemented strategy. Dealing with a high number of packet losses means a low transmission quality, with the need to send packets again, negatively impacting the performance of our model and the network resource utilization. For this reason, we report in Fig. 3c the packet loss when transmitting UDP packets while the network runs ROAR, compared to QR-SDN and OSPF. For this experiment, we chose UDP as the default transport protocol to validate only the effect of routing on losses, since with TCP we would also have the TCP congestion control protocol running on the hosts and interfering. As visible from the figure, our solution can constantly reduce the number of packets lost compared to the alternatives, due to the faster reaction when congestion starts appearing. By promptly reacting and adapting the next hop, ROAR is able to reduce the number of losses. This result is particularly important for delay-sensitive transmissions, where keeping the number of re-transmissions to the minimum is crucial.

C. Trace-based Evaluation

To evaluate our solution in the context of a realistic workload scenario, we used captures taken from publicly available datasets [24] and replayed them in our network using the well-known tcpreplay tool. The file, representing traffic collected from a data center, has been analyzed to extract and replicate the flows in our topology by opportunely adapting the IP addresses.
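The paper does not detail how the IP addresses are adapted before replay; one possible preprocessing step, sketched here with Scapy (a tool choice made for illustration, not mentioned in the paper), remaps every address in the capture onto the emulated hosts so that the rewritten file can then be injected with tcpreplay.

from scapy.all import rdpcap, wrpcap, IP

def remap_ips(in_pcap, out_pcap, hosts):
    """Rewrite every IPv4 address in the capture onto the testbed host addresses,
    keeping a consistent mapping so that each original flow stays a single flow."""
    mapping = {}
    packets = rdpcap(in_pcap)
    for pkt in packets:
        if IP in pkt:
            for field in ("src", "dst"):
                old = getattr(pkt[IP], field)
                if old not in mapping:
                    mapping[old] = hosts[len(mapping) % len(hosts)]
                setattr(pkt[IP], field, mapping[old])
            del pkt[IP].chksum   # force checksum recomputation on write/replay
    wrpcap(out_pcap, packets)

# remap_ips("datacenter.pcap", "replay.pcap", ["10.0.0.%d" % i for i in range(1, 11)])
# The rewritten file is then injected with: tcpreplay -i <iface> replay.pcap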
We reported in Fig. 4 the results of our evaluation. We start in Fig. 4a by counting the flows per second (fps) that the network can handle while running the file capture. When our network is not congested and the network load is low (10% to 30%), QR-SDN is able to send a higher number of fps than our solution. However, when the network starts being more congested and the network load increases (from 40%), it is notable that ROAR performs better, being able to send a high number of fps even at the highest network load. As the previous experiments showed, when the network state (expressed as load) is more prominent, the RL module can differentiate actions and select a more appropriate path.

A similar behavior is visible when evaluating the throughput of both our solution and QR-SDN (Fig. 4b). While QR-SDN achieves higher throughput at lower network loads, ROAR is able to outperform it when the network is fully congested.


Finally, we considered the first 200 seconds of execution at a fixed network load of 60% and reported the result in Fig. 4c. It is visible from the figure that, for the entire evaluation period, ROAR achieves better throughput than QR-SDN, with results coherent with the ones reported before. These results are extremely important to assess the validity of local performance-aware routing not only with synthetic traffic, but also with more realistic traffic patterns.

D. Can ROAR run over real switches?

Aware of the impact on resource consumption that a DRL model might cause when implemented on physical switches (e.g., FPGA, Tofino), we computed the RAM and CPU usage of ROAR at a varying network load. To do so, we took as reference the X308P-48Y-T programmable switch [25], combined with its embedded Data Processing Unit (DPU), proportioning the results to its computing power. We reported the results in Fig. 5. In Fig. 5a, we can see how our solution consumes a low amount of RAM when the network load is low and the network is not congested. This amount increases only up to 2% at the highest loads (70%-90%). The same behavior is visible in Fig. 5b, where the CPU consumption only increases at high network loads. These figures prove that, despite the DRL module, the IPC module interaction, and the NN algorithm, ROAR requires small hardware resources, making it suitable to deploy on real programmable switches.

Fig. 5: (a) RAM and (b) CPU consumption as a percentage of available resources of a reference Intel Tofino switch.

V. CONCLUSION

In this paper, we propose ROAR, a distributed ML-based solution that uses the novel Deep Reinforcement Learning (DRL) mechanism to optimize the routing process of the network while doing all the computation directly inside the P4 programmable switches. This approach allows taking into consideration the current link load and re-routing packets over less congested paths. In the experimental results, we compared our solution to a traditional routing implementation and a centralized SDN solution. It has been proven that the absence of an interaction with a centralized SDN controller reduces the RTT even when the network is congested, also achieving higher throughput, especially at higher network loads.

ACKNOWLEDGMENT

This work has been partially supported by NSF awards 2133407 and 2201536.

REFERENCES

[1] A. Sacco, F. Esposito, and G. Marchetto, "Resource Inference for Sustainable and Responsive Task Offloading in Challenged Edge Networks," IEEE Transactions on Green Communications and Networking, vol. 5, no. 3, pp. 1114–1127, 2021.
[2] Y.-J. Wu, P.-C. Hwang, W.-S. Hwang, and M.-H. Cheng, "Artificial intelligence enabled routing in software defined networking," Applied Sciences, vol. 10, no. 18, p. 6564, 2020.
[3] A. Sacco, F. Esposito, and G. Marchetto, "RoPE: An Architecture for Adaptive Data-Driven Routing Prediction at the Edge," IEEE Transactions on Network and Service Management, vol. 17, no. 2, pp. 986–999, 2020.
[4] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, "Applications of deep reinforcement learning in communications and networking: A survey," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3133–3174, 2019.
[5] T. Fu, C. Wang, and N. Cheng, "Deep-learning-based joint optimization of renewable energy storage and routing in vehicular energy network," IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6229–6241, 2020.
[6] C. Liu, M. Xu, Y. Yang, and N. Geng, "DRL-OR: Deep reinforcement learning-based online routing for multi-type service requirements," in IEEE INFOCOM - IEEE Conference on Computer Communications. IEEE, 2021, pp. 1–10.
[7] R. Amin, E. Rojas, A. Aqdus, S. Ramzan, D. Casillas-Perez, and J. M. Arco, "A survey on machine learning techniques for routing optimization in SDN," IEEE Access, vol. 9, pp. 104582–104611, 2021.
[8] K.-F. Hsu, R. Beckett, A. Chen, J. Rexford, and D. Walker, "Contra: A programmable system for performance-aware routing," in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). USENIX Association, 2020, pp. 701–721.
[9] L. Yu, J. Sonchack, and V. Liu, "Mantis: Reactive programmable switches," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '20). ACM, 2020, pp. 296–309.
[10] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese et al., "P4: Programming protocol-independent packet processors," ACM SIGCOMM Computer Communication Review, vol. 44, no. 3, pp. 87–95, 2014.
[11] J. Rischke, P. Sossalla, H. Salah, F. H. Fitzek, and M. Reisslein, "QR-SDN: Towards Reinforcement Learning States, Actions, and Rewards for Direct Flow Routing in Software-Defined Networks," IEEE Access, vol. 8, pp. 174773–174791, 2020.
[12] D. M. Casas-Velasco, O. M. C. Rendon, and N. L. da Fonseca, "Intelligent Routing Based on Reinforcement Learning for Software-Defined Networking," IEEE Transactions on Network and Service Management, vol. 18, no. 1, pp. 870–881, 2020.
[13] C. Yu, J. Lan, Z. Guo, and Y. Hu, "DROM: Optimizing the routing in software-defined networks with deep reinforcement learning," IEEE Access, vol. 6, pp. 64533–64539, 2018.
[14] W. Li, H. Zhang, S. Gao, C. Xue, X. Wang, and S. Lu, "SmartCC: A reinforcement learning approach for multipath TCP congestion control in heterogeneous networks," IEEE Journal on Selected Areas in Communications, vol. 37, no. 11, pp. 2621–2633, 2019.
[15] S. S. Bhavanasi, L. Pappone, and F. Esposito, "Dealing with changes: Resilient routing via graph neural networks and multi-agent deep reinforcement learning," IEEE Transactions on Network and Service Management, vol. 20, no. 3, pp. 2283–2294, 2023.
[16] L. Zhao, J. Wang, J. Liu, and N. Kato, "Routing for Crowd Management in Smart Cities: A Deep Reinforcement Learning Perspective," IEEE Communications Magazine, vol. 57, no. 4, pp. 88–93, 2019.
[17] B. Dai, Y. Cao, Z. Wu, and Y. Xu, "IQoR-LSE: An Intelligent QoS On-Demand Routing Algorithm With Link State Estimation," IEEE Systems Journal, vol. 16, no. 4, pp. 5821–5830, 2022.
[18] C. Yu, W. Quan, D. Gao, Y. Zhang, K. Liu, W. Wu, H. Zhang, and X. Shen, "Reliable cybertwin-driven concurrent multipath transfer with deep reinforcement learning," IEEE Internet of Things Journal, vol. 8, no. 22, pp. 16207–16218, 2021.
[19] A. Sapio, M. Canini et al., "Scaling Distributed Machine Learning with In-Network Aggregation," in 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI '21), 2021, pp. 785–808.
[20] L. Buşoniu, R. Babuška, and B. De Schutter, "Multi-agent reinforcement learning: An overview," Innovations in Multi-Agent Systems and Applications-1, pp. 183–221, 2010.
[21] C. Seger, "An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing," 2018.
[22] J. S. da Silva, F.-R. Boyer, L.-O. Chiquette, and J. P. Langlois, "Extern objects in P4: an ROHC header compression scheme case study," in 2018 4th IEEE Conference on Network Softwarization and Workshops (NetSoft). IEEE, 2018, pp. 517–522.
[23] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, "Data center TCP (DCTCP)," in Proceedings of the ACM SIGCOMM 2010 Conference, 2010, pp. 63–74.
[24] T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, 2010, pp. 267–280.
[25] "48x25Gb+8x100Gb, Intel Tofino P4 programmable bare metal switch: Asterfusion," July 2022. [Online]. Available: https://cloudswit.ch/product/48x25gb8x100gb-intel-tofino-p4-programmable-bare-metal-switch-asterfusion/
