Abstract—Deep Reinforcement Learning (DRL) has shown a dramatic improvement in decision-making and automated control problems. Consequently, DRL represents a promising technique to efficiently solve many relevant optimization problems (e.g., routing) in self-driving networks. However, existing DRL-based solutions applied to networking fail to generalize, which means that they are not able to operate properly when applied to network topologies not observed during training. This lack of generalization capability significantly hinders the deployment of DRL technologies in production networks. This is because state-of-the-art DRL-based networking solutions use standard neural networks (e.g., fully connected, convolutional), which are not suited to learn from information structured as graphs.

In this paper, we integrate Graph Neural Networks (GNN) into DRL agents and we design a problem-specific action space to enable generalization. GNNs are Deep Learning models inherently designed to generalize over graphs of different sizes and structures. This allows the proposed GNN-based DRL agent to learn and generalize over arbitrary network topologies. We test our DRL+GNN agent in a routing optimization use case in optical networks and evaluate it on 180 and 232 unseen synthetic and real-world network topologies, respectively. The results show that the DRL+GNN agent is able to outperform state-of-the-art solutions in topologies never seen during training.

Index Terms—Graph Neural Networks, Deep Reinforcement Learning, Routing, Optimization

Paul Almasan, José Suárez-Varela, Pere Barlet-Ros and Albert Cabellos-Aparicio are with the Barcelona Neural Networking Center, Universitat Politècnica de Catalunya, Barcelona, Spain. E-mail: {felician.paul.almasan, jose.suarez-varela, pere.barlet, alberto.cabellos}@upc.edu
Krzysztof Rusek is with the Institute of Telecommunications, AGH University of Science and Technology, Krakow, Poland, and with the Barcelona Neural Networking Center, Universitat Politècnica de Catalunya, Barcelona, Spain. E-mail: [email protected]

NOTE: This work has been accepted for publication in the Computer Communications journal. Please use the following reference to cite this work: Paul Almasan, José Suárez-Varela, Krzysztof Rusek, Pere Barlet-Ros and Albert Cabellos-Aparicio, "Deep reinforcement learning meets graph neural networks: Exploring a routing optimization use case", Computer Communications, 2022, doi: https://doi.org/10.1016/j.comcom.2022.09.029.

I. INTRODUCTION

In the last years, industrial advances (e.g., Industry 4.0, IoT) and changes in social behavior created a proliferation of modern network applications (e.g., Vehicular Networks, AR/VR, Real-Time Communications), imposing new requirements on backbone networks (e.g., high throughput and low latency). Consequently, network operators need to efficiently manage the network resources, ensuring the customer's Quality of Service and fulfilling the Service Level Agreements. This is typically done using expert knowledge or solvers leveraging Integer Linear Programming (ILP) or Constraint Programming (CP). However, real-world production networks have in the order of hundreds of nodes, and solvers based on ILP or CP would take a large amount of time to solve network optimization problems [1], [2]. In addition, heuristic-based solutions are far from being optimal.

Deep Reinforcement Learning (DRL) has shown significant improvements in sequential decision-making and automated control problems [3], [4]. As a result, the network community is already investigating DRL as a key technology for network optimization (e.g., routing) with the goal of enabling self-driving networks [5]–[8]. However, existing DRL-based solutions still fail to generalize when applied to different network scenarios [9], [10]. In this context, generalization refers to the ability of the DRL agent to adapt to new network scenarios not seen during training (e.g., network topologies, configurations).

We argue that generalization is an essential property for the successful adoption of DRL technologies in production networks. Without generalization, DRL solutions would have to be trained in the same network where they are deployed, which is not possible or affordable in general. Training a DRL agent is a very costly and lengthy process. It often requires significant computing power and instrumentation of the network to observe its performance (e.g., delay, jitter). Additionally, decisions made by a DRL agent during training can lead to degraded performance or even to service disruption. Thus, training a DRL agent in the customer's network may be unfeasible.

With generalization, a DRL agent can be trained with multiple, representative network topologies and configurations. Afterwards, it can be applied to other topologies and configurations, as long as they share some common properties. Such a "universal" model can be trained in a laboratory and later on be incorporated in a product or a network device (e.g., router, load balancer). The resulting solution would be ready to be deployed to a production network without requiring any further training or instrumentation in the customer network¹.

¹ Note that solutions based on transfer learning do not offer this property, as DRL agents need to be re-trained on the network where they finally operate.
Unfortunately, existing DRL proposals for networking were designed to operate in the same network topology seen during training [9], [11], [12], thereby limiting their potential deployment on production networks. The main reason behind this strong limitation is that computer networks are fundamentally represented as graphs. For instance, the network topology and routing policy are typically represented as such. However, state-of-the-art proposals [11], [13]–[15] use traditional neural network (NN) architectures (e.g., fully connected, convolutional) that are not well suited to model graph-structured information [16].

In this paper, we integrate Graph Neural Networks (GNN) [17] into DRL agents to solve network optimization problems. Particularly, our architecture is intended to solve routing optimization in optical networks and to generalize over never-seen arbitrary topologies. The GNN integrated in our DRL agent is inspired by Message-Passing Neural Networks (MPNN), which were successfully applied to solve a relevant chemistry-related problem [18]. In our case, the GNN was specifically designed to capture meaningful information about the relations between the links and the traffic flowing through the network topologies.

The evaluation results show that our agent achieves a strong generalization capability compared to state-of-the-art DRL (SoA DRL) algorithms [15]. Additionally, to further test the generalization capability of the proposed DRL-based architecture, we evaluated it on a set of 232 different real-world network topologies. The results show that the proposed DRL+GNN architecture is able to achieve outstanding performance over networks never seen during training. Finally, we explore the generalization limitations of our architecture and discuss its scalability properties.

Overall, our DRL+GNN architecture for network optimization has the following features:
• Generality: It can work effectively in network topologies and scenarios never seen during training.
• Deployability: It can be deployed to production networks without requiring training nor instrumentation in the customer network.
• Low overhead: Once trained, the DRL agent can make routing decisions in only one step (≈ ms), while its cost scales linearly with the network size.
• Commercialization: Network vendors can easily embed it in network devices or products, and successfully operate "arbitrary" networks.

We believe the combination of these features can enable the development of a new generation of networking solutions based on DRL that are more cost-effective than current approaches based on heuristics or linear optimization. All the topologies and scripts used in the experiments, as well as the source code of our DRL+GNN agent, are publicly available [19].

II. BACKGROUND

The solution proposed in this paper combines two machine learning mechanisms. First, we use a GNN to model computer network scenarios. GNNs are neural network architectures specifically designed to generalize over graph-structured data [16]. In addition, they offer near real-time operation in the scale of milliseconds (see Section VI-B). Second, we use Deep Reinforcement Learning to build an agent that learns how to efficiently operate networks following a particular optimization goal. DRL applies the knowledge obtained in past optimizations to later decisions, without the necessity to run computationally intensive algorithms.

A. Graph Neural Networks

Graph Neural Networks are a novel family of neural networks designed to operate over graphs. They were introduced in [17] and numerous variants have been developed since [20], [21]. In their basic form, they consist of associating some initial states to the different elements of an input graph, and combining them considering how these elements are connected in the graph. An iterative algorithm updates the elements' state and uses the resulting states to produce an output. The particularities of the problem to solve will determine which GNN variant is more suitable, depending on, for instance, the nature of the graph elements (i.e., nodes and edges) involved.

Message Passing Neural Networks (MPNN) [18] are a well-known type of GNNs that apply an iterative message-passing algorithm to propagate information between the nodes of the graph. In a message-passing step, each node k receives messages from all the nodes in its neighborhood, denoted by N(k). Messages are generated by applying a message function m(·) to the hidden states of node pairs in the graph. Then, they are combined by an aggregation function, for instance, a sum (Equation 1). Finally, an update function u(·) is used to compute a new hidden state for every node (Equation 2).

M_k^{t+1} = Σ_{i ∈ N(k)} m(h_k^t, h_i^t)    (1)

h_k^{t+1} = u(h_k^t, M_k^{t+1})    (2)

where the functions m(·) and u(·) can be learned by neural networks. After a certain number of iterations, the final node states are used by a readout function r(·) to produce an output for the given task. This function can also be implemented by a neural network and is typically tasked to predict properties of individual nodes (e.g., the node's class) or global properties of the graph.

GNNs have been able to achieve relevant performance results in multiple domains where data is typically structured as a graph [18], [22]. Since computer networks are fundamentally represented as graphs, GNNs are inherently well suited to model them and offer unique advantages compared to traditional neural network architectures (e.g., fully connected NN, Convolutional NN, etc.).
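As a minimal illustration of Equations 1 and 2, the following sketch runs T message-passing iterations over a toy graph. The message and update functions m(·) and u(·) are simple numeric stand-ins (in an MPNN they would be learned neural networks), and all names are illustrative rather than taken from the paper's implementation.

import numpy as np

def message_passing(adj, h, m_fn, u_fn, T):
    # adj: dict mapping each node k to its neighborhood N(k)
    # h:   dict mapping each node k to its hidden state (1-D numpy array)
    for _ in range(T):
        new_h = {}
        for k, neighbors in adj.items():
            # Eq. (1): aggregate the messages from all neighbors with a sum
            M = sum(m_fn(h[k], h[i]) for i in neighbors)
            # Eq. (2): update the hidden state of node k
            new_h[k] = u_fn(h[k], M)
        h = new_h
    return h

# Toy stand-ins for the learnable functions m(.) and u(.)
m_fn = lambda hk, hi: 0.5 * (hk + hi)
u_fn = lambda hk, M: np.tanh(hk + M)

adj = {0: [1, 2], 1: [0], 2: [0]}
h0 = {k: np.random.rand(4) for k in adj}
final_states = message_passing(adj, h0, m_fn, u_fn, T=3)
# A readout r(.) would combine final_states into a graph-level output.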
B. Deep Reinforcement Learning

DRL algorithms aim at learning a long-term strategy that maximizes an objective function in an optimization problem. DRL agents start from a tabula rasa state and learn the optimal strategy by an iterative process that explores the state and action spaces. These are denoted by a set of states (S) and a set of actions (A). Given a state s ∈ S, the agent will perform an action a ∈ A that produces a transition to a new state s' ∈ S, and will provide the agent with a reward r. Then, the objective is to find a strategy that maximizes the cumulative reward by the end of an episode. The definition of the end of an episode depends on the optimization problem to address.

Q-learning [23] is an RL algorithm whose goal is to make an agent learn a policy π : S → A. The algorithm creates a table (a.k.a., q-table) with all the possible combinations of states and actions. At the beginning of the training, the table is initialized (e.g., with zeros or random values) and, during training, the agent updates these values according to the rewards obtained after selecting an action. These values, called q-values, represent the expected cumulative reward after applying action a from state s, assuming that the agent follows the current policy π during the rest of the episode. During training, q-values are updated using the Bellman equation (see Equation 3), where Q(s_t, a_t) is the q-value function at time-step t, α is the learning rate, r(s_t, a_t) is the reward obtained from selecting action a_t from state s_t, and γ ∈ [0, 1] is the discount factor.

Q(s_t, a_t) = Q(s_t, a_t) + α ( r(s_t, a_t) + γ max_{a'} Q(s'_t, a') − Q(s_t, a_t) )    (3)

Deep Q-Network (DQN) [24] is a more advanced algorithm based on Q-learning that uses a Deep Neural Network (DNN) to approximate the q-value function. As the q-table size becomes larger, Q-learning faces difficulties to learn a policy from high-dimensional state and action spaces. To overcome this problem, the authors proposed to use a DNN as a q-value function estimator, relying on the generalization capabilities of DNNs to estimate the q-values of states and actions not seen in advance. For this reason, a NN well suited to understand and generalize over the input data of the DRL agent is crucial for the agent to perform well when facing states (or environments) never seen before. Additionally, DQN uses an experience replay buffer to store past sequential experiences (i.e., it stores tuples of {s, a, r, s'}).
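The tabular update of Equation 3 translates directly into code. The sketch below is a generic Q-learning step with ε-greedy action selection; the state encoding, action set and hyperparameter values are placeholders and are not tied to the OTN scenario introduced later.

import random
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Bellman update of Equation 3
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps):
    if random.random() < eps:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: Q[(s, a)])      # exploit

Q = defaultdict(float)       # q-table, implicitly initialized to zeros
actions = [0, 1, 2, 3]       # e.g., indices of candidate actions

# One hypothetical transition (s, a, r, s') observed from an environment
s, s_next = "state-0", "state-1"
a = epsilon_greedy(Q, s, actions, eps=1.0)
r = 1.0
q_learning_step(Q, s, a, r, s_next, actions)

DQN replaces the table with a neural network that approximates Q and trains it on mini-batches of {s, a, r, s'} tuples sampled from the experience replay buffer.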
III. NETWORK OPTIMIZATION SCENARIO

In this paper, we explore the potential of a GNN-based DRL agent to address the routing problem in Optical Transport Networks (OTN). Particularly, we consider a network scenario based on Software-Defined Networking, where the DRL agent (located in the control plane) has a global view of the current network state, and has to make routing decisions on every traffic demand as it arrives. We consider a traffic demand as the volume of traffic sent from a source to a destination node. This is a relevant optimization scenario that has been studied in the last decades in the optical networking community, where many solutions have been proposed [11], [15], [25].

In our OTN scenario, the DRL agent makes routing decisions at the electrical domain, over a logical topology where nodes represent Reconfigurable Optical Add-Drop Multiplexers (ROADM) and edges are predefined lightpaths connecting them (see Figure 1). The DRL agent receives traffic demands with different bandwidth requirements defined by the tuple {src, dst, bandwidth}, and it has to select an end-to-end path for every demand. Particularly, end-to-end paths are defined as sequences of lightpaths connecting the source and destination of a demand. Since the agent operates at the electrical domain, traffic demands are defined as requests of Optical Data Units (ODUk), whose bandwidth requirements are defined in the ITU-T Recommendation G.709 [26]. The ODUk signals are then multiplexed into Optical Transport Units (OTUk), which are data frames including Forward Error Correction. Eventually, OTUk frames are mapped to different optical channels within the lightpaths of the topology.

Fig. 1: Schematic representation of the DRL agent in the OTN routing scenario.

In this scenario, the routing problem is defined as finding the optimal routing policy for each incoming source-destination traffic demand. The learning process is guided by an objective function that aims to maximize the traffic volume allocated in the network in the long term. We consider that a demand is properly allocated if there is enough available capacity in all the lightpaths forming the end-to-end path selected. Note that lightpaths are the edges in the logical topology where the agent operates. The demands do not expire, occupying the lightpaths until the end of a DRL episode. This implies a challenging task for the agent, since it has not only to identify critical resources in the network (e.g., potential bottlenecks), but also to deal with the uncertainty in the generation of future traffic demands. The following constraints summarize the traffic demand routing problem in the OTN scenario:
• The agent must make sequential routing decisions for every incoming traffic demand.
• Traffic demands cannot be split over multiple paths.
• Previous traffic demands cannot be rerouted, and they occupy the links' capacities until the end of the episode.

The optimal solution to the OTN optimization problem can be found by solving its Markov Decision Process (MDP) [27]. To do this, we can use techniques such as Dynamic Programming, which consist of an iterative process over all the MDP's states until convergence. The MDP for the traffic demand allocation problem consists of all the possible network topology states and the transition probabilities between states. Notice that in our scenario we have uniform transition probabilities from one state to the next. One limitation of solving MDPs optimally is that it becomes infeasible for large and complex optimization problems. As the problem size grows, so does the MDP's state space, where the space complexity (in number of states) is S ≈ O(N^E), with N the number of different capacities a link can have and E the number of links. Therefore, as the problem grows, solving the MDP requires iterating over an exponentially increasing number of states.
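As an illustration of the allocation rule above (a demand fits only if every lightpath of the selected end-to-end path has enough spare capacity, and allocations are never released within an episode), the following sketch mimics the environment dynamics. It is not the simulator released in [19]; the data structures and names are hypothetical.

def allocate_demand(capacity, path_links, bw):
    """Try to place a demand of size bw on every lightpath of the path.

    capacity:   dict {link: available capacity in ODU0 units}
    path_links: list of links (lightpaths) forming the end-to-end path
    Returns (reward, done): reward is bw if the demand is allocated, 0
    otherwise; done=True ends the episode (the demand did not fit).
    """
    if all(capacity[l] >= bw for l in path_links):
        for l in path_links:
            capacity[l] -= bw   # demands never expire within the episode
        return bw, False
    return 0, True

capacity = {("A", "B"): 200, ("B", "C"): 200}   # toy logical topology
reward, done = allocate_demand(capacity, [("A", "B"), ("B", "C")], bw=8)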
TABLE I: Input features of the link hidden states. N corresponds to the size of the hidden state vector.

Notation   Description
x1         Link available capacity
x2         Link betweenness
x3         Action vector (bandwidth allocated)
x4 - xN    Zero padding

[...] represent both the network state and the action, which is the input needed to model the q-value function Q(s, a).

The size of the hidden states is typically larger than the number of features they contain. This is to enable each link to store information about itself (i.e., its own initial features) plus the aggregated information coming from all the link's neighbors (see Section IV-C). If the hidden state size is equal to the number of link features, the links will not have space to store information about the neighboring links without losing information. This results in a poor graph embedding after the readout function. On the contrary, if the state size is very large, it can lead to a large GNN model, which can overfit to the data. A common approach is to set the state size larger than the number of features and to fill the rest of the vector with zeros.
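Following Table I, the initial hidden state of a link can be built by placing the link features at the beginning of a fixed-size vector and zero-padding the rest (the evaluation in Section V uses 27-element vectors). The helper below is a minimal sketch; for simplicity, the action is summarized here by a single allocated-bandwidth value.

import numpy as np

def initial_link_state(available_capacity, betweenness, allocated_bw,
                       hidden_size=27):
    # x1-x3 from Table I, followed by zero padding up to hidden_size (x4-xN)
    features = np.array([available_capacity, betweenness, allocated_bw],
                        dtype=np.float32)
    state = np.zeros(hidden_size, dtype=np.float32)
    state[:features.size] = features
    return state

h0 = initial_link_state(available_capacity=200, betweenness=0.12,
                        allocated_bw=0)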
C. GNN Architecture

The GNN model is based on the Message Passing Neural Network [18] model. In our case, we consider the link entity and perform the message passing process between all links. We choose link entities, instead of node entities, because the link features are what define the OTN routing optimization problem. Node entities could be added when addressing an optimization problem that needs to incorporate node-level features (e.g., I/O buffer size, scheduling algorithm). Algorithm 1 shows a formal description of the message passing process, where the algorithm receives as input the links' features (x_l) and outputs a q-value (q).

The algorithm performs T message passing steps. A graphical representation can be seen in Figure 3, where the algorithm iterates over all links of the network topology. For each link, its features are combined with those of the neighboring links using a fully-connected neural network, corresponding to M in Figure 3. The outputs of these operations are called messages, according to the GNN notation. Then, the messages computed for each link with its neighbors are aggregated using an element-wise sum (line 5 in Algorithm 1). Afterwards, a Recurrent NN (RNN) is used to update the link hidden states h_l with the newly aggregated information (line 6 in Algorithm 1). At the end of the message passing phase, the resulting link states are aggregated using an element-wise sum (line 7 in Algorithm 1).

[...] store information from links that are farther and farther apart. Therefore, the concept of time appears. RNNs are a NN architecture tailored to capture sequential behavior (e.g., text, video, time-series). In addition, some RNN architectures (e.g., GRU) are designed to process large sequences (e.g., long text sentences in NLP). Specifically, they internally contain gates that are designed to mitigate the vanishing gradients, a common problem with large sequences [28]. This makes RNNs suitable to learn how the links' states evolve during the message passing phase, even for large T.

Fig. 3: Message passing and readout in the proposed GNN (executed T times): messages M between neighboring links, element-wise aggregation, RNN update of the link states, and a readout that produces the q-value.

Algorithm 1 Message Passing
Input: x_l
Output: h_l^T, q
1: for each l ∈ L do
2:     h_l^0 ← [x_l, 0, . . . , 0]
3: for t = 1 to T do
4:     for each l ∈ L do
5:         M_l^{t+1} = Σ_{i∈N(l)} m(h_l^t, h_i^t)
6:         h_l^{t+1} = u(h_l^t, M_l^{t+1})
7: rdt ← Σ_{l∈L} h_l^T
8: q ← R(rdt)
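Algorithm 1 translates almost line by line into code. The sketch below keeps the link-level message passing, the element-wise sum aggregation and the final readout over the summed link states; the message function m, the update u (an RNN in the paper) and the readout R are reduced to simple stand-ins rather than the trainable TensorFlow models of the released implementation [19], and the neighborhood N(l) is assumed to be the set of links sharing an endpoint with l.

import numpy as np

def link_neighbors(links):
    # N(l): links that share an endpoint with l
    return {l: [o for o in links if o != l and set(o) & set(l)] for l in links}

def gnn_q_value(link_features, links, m, u, R, T=7, hidden_size=27):
    # Lines 1-2: initial hidden states = link features plus zero padding
    h = {}
    for l in links:
        h[l] = np.zeros(hidden_size)
        h[l][:len(link_features[l])] = link_features[l]
    N = link_neighbors(links)
    # Lines 3-6: T message-passing steps over the link entities
    for _ in range(T):
        M = {l: sum(m(h[l], h[i]) for i in N[l]) for l in links}
        h = {l: u(h[l], M[l]) for l in links}
    # Lines 7-8: element-wise sum of link states, then readout
    rdt = sum(h[l] for l in links)
    return R(rdt)

# Toy stand-ins for the learnable functions
m = lambda hl, hi: 0.5 * (hl + hi)
u = lambda hl, Ml: np.tanh(hl + Ml)        # the paper uses an RNN (e.g., GRU)
R = lambda rdt: float(rdt.mean())          # readout to a scalar q-value

links = [("A", "B"), ("B", "C"), ("A", "C")]
feats = {l: [200.0, 0.1, 0.0] for l in links}   # Table I features (x1-x3)
q = gnn_q_value(feats, links, m, u, R)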
D. DRL Agent Operation

The DRL agent operates by interacting with the environment. In Algorithm 2 we show a pseudocode describing the DRL agent operation. At the beginning, we initialize the environment env by initializing all the link features. At the same time, the environment generates a traffic demand to be allocated, given by the tuple {src, dst, bw}, and an environment state s. We also initialize the cumulative reward to zero, define the action set size and create the experience replay buffer (agt.mem). Afterwards, we execute a while loop (lines 3-16) that finishes when there is some demand that cannot be allocated in the network topology. For each of the k=4 shortest paths, we allocate the demand along all the links forming the path and compute a q-value (lines 7-9). Once we have the q-value for each state-action pair, the next action a to apply is selected using an ε-greedy exploration strategy (line 10) [24]. The action is then applied to the environment, leading to a new state s', a reward r and a flag Done indicating if there is some link without enough capacity to support the demand. Additionally, the environment returns a new traffic demand tuple {src', dst', bw'}. The information about the state transition is stored in the experience replay buffer (line 13). This information will be used later on to train the GNN in the agt.replay() call (line 15), which is executed every M training iterations.

Algorithm 2 DRL Agent Operation
1: s, src, dst, bw ← env.init_env()
2: reward ← 0, k ← 4, agt.mem ← {}, Done ← False
3: while not Done do
4:     k_q_values ← {}
5:     k_shortest_paths ← compute_k_paths(k, src, dst)
6:     for i in 0, . . . , k do
7:         p' ← get_path(i, k_shortest_paths)
8:         s' ← env.alloc_demand(s, p', src, dst, bw)
9:         k_q_values[i] ← compute_q_value(s', p')
10:    q_value ← epsilon_greedy(k_q_values, ε)
11:    a ← get_action(q_value, k_shortest_paths, s)
12:    r, Done, s', src', dst', bw' ← env.step(s, a)
13:    agt.rmb(s, src, dst, bw, a, r, s', src', dst', bw')
14:    reward ← reward + r
15:    if training_steps % M == 0: agt.replay()
16:    src ← src'; dst ← dst'; bw ← bw'; s ← s'
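One iteration of the loop in Algorithm 2 (lines 4-11) evaluates the GNN once per candidate path and then applies ε-greedy selection over the k resulting q-values. The sketch below reproduces that decision step; compute_q_value stands for the GNN forward pass of Algorithm 1 on the tentative state, and the helper names are illustrative rather than the API of the released code [19].

import random

def allocate_on_path(capacity, path_links, bw):
    # Tentatively consume capacity on every lightpath of the candidate path
    for l in path_links:
        capacity[l] = capacity[l] - bw

def drl_decision_step(capacity, k_shortest_paths, bw, compute_q_value, epsilon):
    # Lines 4-9 of Algorithm 2: one q-value per candidate path
    k_q_values = []
    for path in k_shortest_paths:
        tentative = dict(capacity)
        allocate_on_path(tentative, path, bw)
        k_q_values.append(compute_q_value(tentative, path))
    # Lines 10-11: epsilon-greedy selection over the k candidate actions
    if random.random() < epsilon:
        i = random.randrange(len(k_shortest_paths))
    else:
        i = max(range(len(k_shortest_paths)), key=lambda j: k_q_values[j])
    return k_shortest_paths[i], k_q_values

# Toy usage with a constant stand-in for the GNN of Algorithm 1
capacity = {("A", "B"): 200, ("B", "C"): 200, ("A", "C"): 200}
paths = [[("A", "C")], [("A", "B"), ("B", "C")]]
best_path, qs = drl_decision_step(capacity, paths, bw=8,
                                  compute_q_value=lambda s, p: -sum(s.values()),
                                  epsilon=0.1)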
V. EXPERIMENTAL RESULTS

In this section we evaluate our GNN-based DRL agent to optimize the routing configuration in the OTN scenario described in Section III. Particularly, the experiments in this section are focused on evaluating the performance and generalization capabilities of the proposed DRL+GNN agent. Afterwards, in Section VI, we analyze the scalability properties of our solution and discuss other relevant aspects related to the deployment on production networks.

A. Evaluation Setup

We implemented the DRL+GNN solution described in Section IV with TensorFlow [29] and evaluated it on an OTN network simulator implemented using the OpenAI Gym framework [30]. The source code, together with all the training and evaluation results, is publicly available [19].

In the OTN simulator, we consider three traffic demand types (ODU2, ODU3, and ODU4), whose bandwidth requirements are expressed in terms of multiples of ODU0 signals (i.e., 8, 32, and 64 ODU0 bandwidth units respectively) [26]. When the DRL agent correctly allocates a demand, it receives an immediate reward equal to the bandwidth of the current traffic demand; otherwise the reward is 0. We consider that a demand is successfully allocated if all the links in the path selected by the DRL agent have enough available capacity to carry such demand. Likewise, episodes end when a traffic demand is not correctly allocated. Traffic demands are generated by uniformly selecting a source-destination node pair and a traffic demand type (ODUk). This makes the problem even more difficult for the DRL agent, since the uniform traffic distribution hinders the exploitation of prediction systems to anticipate possible demands difficult to allocate. In other words, all traffic demands are equally probable to appear in the future, making it more difficult for the DRL agent to estimate the expected future rewards.

Initial experiments were carried out to choose an appropriate gradient-based optimization algorithm and to find the hyperparameter values for the DRL+GNN agent. For the GNN model, we defined the links' hidden states h_l as 27-element vectors (filled with the features described in Table I). Note that the size of the hidden states is related to the amount of information they may potentially encode. Larger network topologies and complex network optimization scenarios might need larger sizes for the hidden state vectors. In every forward propagation of the GNN we execute T=7 message passing steps using batches of 32 samples. The optimizer used is Stochastic Gradient Descent [31] with a learning rate of 10^-4 and a momentum of 0.9. We start the ε-greedy exploration strategy with ε=1.0 and maintain this value during 70 training iterations. Afterwards, ε decays exponentially every episode. The experience buffer stores 4,000 samples and is implemented as a FIFO queue (first in, first out). We applied L2 regularization and dropout to the readout function with a coefficient of 0.1 in both cases. The discount factor γ was set to 0.95.

B. Methodology

We divided the evaluation of our DRL+GNN agent in two sets of experiments. In the first set, we focused on reasoning about the performance and generalization capabilities of our solution. For illustration purposes, we chose two particular network scenarios and analyzed them extensively. As a baseline, we implemented the DRL-based system proposed in [15], a state-of-the-art solution for routing optimization in OTNs. Later on, in Section VI, we evaluated our solution on real-world network topologies and analyzed its scalability in terms of computation time and generalization capabilities.

Finding the optimal MDP solution to the OTN optimization problem is infeasible due to its complexity. Take as an example a small network topology with 6 nodes and 8 edges, where the links have capacities of 3 ODU0 units, there is only one bandwidth type available (1 ODU0), and there are 4 possible actions. The resulting number of states of the MDP is 5^8 · 6 · 5 · 1 = 11,718,750 ≈ 1.17·10^7. To find a solution to the MDP we can use Dynamic Programming algorithms such as value iteration. However, this algorithm has a time complexity of O(S^2·A) to solve the MDP, where S and A are the number of states and actions respectively and S ≈ O(N^E), with N the number of different capacities a link can have and E the number of links.

As an alternative, we compare the DRL+GNN agent performance with a theoretical fluid model (labeled as Theoretical Fluid). This model is a theoretical approach which considers that traffic demands may be split into the k=4 candidate paths proportionally to the available capacity they have. This routing policy is aimed at avoiding congestion on links. For instance, [...]

[...] scenario on the 14-node Nsfnet topology [32], where we considered that the links represent lightpaths with capacity for 200 ODU0 signals. Note that the capacity is shared on both directions of the links and that the bandwidth of different traffic demands is expressed in multiples of ODU0 signals (i.e., 8, 32 or 64 ODU0 bandwidth units). We ran 1,000 training iterations where the agent received traffic demands and allocated them on one of the k=4 shortest paths available in the action set. The model with the highest performance was selected to be benchmarked against traditional routing optimization strategies and state-of-the-art DRL-based solutions.

Fig. 4: Performance evaluation against state-of-the-art DRL ((c) evaluation on Nsfnet, (d) evaluation on Geant2; x-axis of (c) and (d): relative performance to the Theoretical Fluid). Notice that the vertical lines in 4c and 4d indicate the same performance as the theoretical fluid model.

C. Performance evaluation against state-of-the-art DRL-based solutions

In this evaluation experiment, we compare our DRL+GNN agent against state-of-the-art DRL-based solutions. Particularly, we adapted the solution proposed in [15] to operate in scenarios where links share their capacity in both directions. We trained two different instances of the state-of-the-art DRL agent in two network scenarios: the 14-node Nsfnet and the 24-node Geant2 topologies. [...] state-of-the-art DRL solution, were evaluated over the same list of generated demands.

Fig. 5: Evaluation on Geant2 of DRL-based solutions trained on Nsfnet ((a) bandwidth allocated, (b) CDF).
We ran two experiments to compare the performance of our DRL+GNN with the results obtained by the state-of-the-art DRL (SoA DRL). In the first experiment, we evaluated the DRL+GNN agent against the SoA DRL agent trained on Nsfnet, the LB routing policy, and the theoretical fluid model. We evaluated the four routing strategies on the Nsfnet topology and compared their performance. In Figure 4a, we can observe a boxplot with the evaluation results of 1,000 evaluation experiments. The y-axis indicates the agent score, which corresponds to the bandwidth allocated by the agent. Figure 4c shows the Cumulative Distribution Function (CDF) of the relative score obtained with respect to the fluid model. In this experiment we could also observe that the proposed DRL+GNN agent slightly outperforms the SoA DRL-based solution, allocating 6.6% more bandwidth. In the second experiment, we evaluated the same models (DRL+GNN, SoA DRL, LB, and Theoretical Fluid) on the Geant2 topology, but in this case the SoA DRL agent was trained on Geant2. The resulting boxplot can be seen in Figure 4b and the CDF of the evaluation samples in Figure 4d. Similarly, in this case our agent performs slightly better than the SoA DRL approach (3% more bandwidth).

We ran another experiment to compare the generalization capabilities of our DRL+GNN agent. In this experiment, we evaluated the DRL+GNN agent (trained on Nsfnet) against the SoA DRL agent trained on Nsfnet, and evaluated both agents on the Geant2 topology. The resulting boxplot can be seen in Figure 5a and the corresponding CDF in Figure 5b. The results indicate that in this scenario the DRL+GNN agent also outperforms the SoA DRL agent. In this case, in 80% of the experiments our DRL+GNN agent achieved more than 45% performance improvement with respect to the SoA DRL proposal. These results show that, while the proposed DRL+GNN agent is able to generalize and achieve outstanding performance in the unseen Geant2 topology (Figure 5a and Figure 5b), the SoA DRL agent performs poorly when applied to topologies not seen during training. This reveals the lack of generalization capability of the latter DRL-based solution compared to the agent proposed in this paper.
Fig. 6: DRL+GNN evaluation on a use case with link failures (x-axis: number of links removed; the plots compare the DRL+GNN agent trained on NSFNet against the Theoretical Fluid model).

[...] we generated 20 topologies and we evaluated the agent on 1,000 episodes. To do this, we used the NetworkX Python library [36] to generate random network topologies between 20 and 100 nodes with a similar average node degree to Nsfnet. This allows us to analyze how the network size affects the performance.

Figure 8a shows how the performance decreases as the topology size grows. For benchmark purposes, we computed the relative score with respect to the theoretical fluid model. The agent shows a remarkable performance in unseen topologies. As an example, the agent has a similar performance to the the- [...] seen during training differs from the evaluation samples (see Section VI-C).
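The text above only states that the random topologies are generated with NetworkX [36], with 20 to 100 nodes and an average node degree similar to Nsfnet (about 3, see Table III). The sketch below is one plausible way to do this; the gnm_random_graph generator and the resampling-until-connected loop are assumptions for illustration, not necessarily the exact procedure used in the paper.

import random
import networkx as nx

def random_topology(n_nodes, target_avg_degree=3.0, seed=0):
    # For an undirected graph, average degree = 2*E/N, so choose E accordingly
    n_edges = max(n_nodes - 1, round(n_nodes * target_avg_degree / 2))
    rng = random.Random(seed)
    while True:
        g = nx.gnm_random_graph(n_nodes, n_edges, seed=rng.randint(0, 2**31 - 1))
        if nx.is_connected(g):
            return g

# Topologies between 20 and 100 nodes with a node degree similar to Nsfnet
topologies = [random_topology(n, seed=n) for n in range(20, 101, 10)]
for g in topologies:
    avg_deg = 2 * g.number_of_edges() / g.number_of_nodes()
    print(g.number_of_nodes(), g.number_of_edges(), round(avg_deg, 2))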
2) Real-world network topologies: In this section we evaluate the generalization capabilities of our DRL+GNN agent, trained in Nsfnet, on 232 real-world topologies obtained from the Topology Zoo [2] dataset. Specifically, we take all the topologies that have up to 100 nodes. In Table II we can see the features extracted from the resulting topologies. The diameter feature corresponds to the maximum eccentricity (i.e., the maximum distance from one node to another node). The ranges of the different topology features indicate that our topology dataset contains different topology distributions.

We executed 1,000 evaluation episodes and computed the average reward achieved by the DRL+GNN agent, the LB, and the theoretical fluid routing strategies for each topology. Then, we computed the relative performance (in %) of our agent and the LB policy with respect to the theoretical fluid model. Figure 8b shows the results where, for readability, we sort the topologies according to the difference of score between the DRL+GNN agent and the LB policy. In the left side of the figure we observe some topology samples where the scores of all three routing strategies coincide. This kind of behavior is normal in topologies where, for each input traffic demand, there are not many paths to route the traffic demand (e.g., in ring or star topologies). As the number of paths increases, routing optimization becomes necessary to maximize the number of traffic demands allocated.

We also trained a DRL+GNN agent only in the Geant2 topology. The mean relative score (with respect to the theoretical fluid) of evaluating the model on all real-world topologies was +4.78%. In the interest of space, we omit this figure. These results indicate that our DRL+GNN architecture generalizes well to topologies never seen during training, independently of the topology used during training.

These experiments show the robustness of our architecture to operate in real-world topologies that largely differ from the scenarios seen during training. Even when trained in a single 14-node topology, the agent achieves good performance in topologies of up to 100 nodes.

TABLE II: Real-world topology features (minimum and maximum values).

Feature            Minimum    Maximum
Num. Nodes         6          92
Num. Edges         5          101
Avg. node degree   1.667      8
Var. node degree   0.001      41.415
Diameter           1          31

TABLE III: Features for the synthetic network topologies. The values correspond to the mean of all topologies from each topology size. As a reference, the first row corresponds to the Nsfnet topology used during training.

Topology Size      Mean Node Degree   Var. Node Degree   Node Betwee.   Edge Betwee.   DRL+GNN Perf. w.r.t. Fluid (%)
Nsfnet (training)  3                  0.2857             0.0952         0.1020         -
20 Nodes           2.90               0.1050             0.1036         0.0988         4.305
30 Nodes           2.93               0.0956             0.0844         0.0764         -0.649
40 Nodes           2.95               0.1025             0.0704         0.0623         -3.945
50 Nodes           2.96               0.1104             0.0620         0.0538         -6.422
60 Nodes           2.97               0.1056             0.0559         0.0476         -8.103
70 Nodes           2.97               0.0920             0.0522         0.0437         -10.064
80 Nodes           2.98               0.0956             0.0474         0.0395         -11.380
90 Nodes           2.98               0.1062             0.0436         0.0361         -13.610

B. Computation Time

In this section we analyze the computation time of an already trained DRL+GNN agent when deployed in a realistic scenario. For this purpose, we used the synthetic topologies generated before in Section VI-A1, executed 1,000 episodes for each one, and measured the computation time. This is the time the agent takes to select the best path to allocate all the incoming traffic requests. For this experiment we used off-the-shelf hardware without any specific hardware accelerator (64-bit Ubuntu 16.04 LTS with an Intel Core i5-8400 processor at 2.80 GHz × 6 cores and 8 GB of RAM). Results should be understood only as a reference to analyze the scalability properties of our solution. Real implementations in a network device would be highly optimized.

Figure 9 shows the computation time for all episodes. The dots correspond to the average agent operation time over all the episodes and the confidence interval corresponds to the 5/95 percentiles. The execution time is in the order of a few milliseconds and grows linearly with the size of the topology. This is expected due to the way the message passing in the GNN has been designed. The results indicate that, in terms of deployment, the proposed DRL+GNN agent has interesting features. It is capable of optimizing unseen networks achieving good performance, as optimization algorithms do, but in one single step and in tens of milliseconds, as heuristics do.

Fig. 9: DRL+GNN average computation time (in milliseconds) over different topology sizes.

C. Discussion

In this paper we propose a data-driven solution to solve a routing problem in OTN. This means that our DRL agent [...]
[...] agents perform poorly when they are evaluated in different topologies that were not seen during the training.

There have been several attempts to use GNN in the communication networks field. In [46] they use GNN to learn shortest-path routing and max-min routing in a supervised learning approach. In [47] they combine GNN with DRL to solve a network planning problem. Another relevant work is the one from [48], where they use a distributed setup of DRL agents to solve a Traffic Engineering problem in a decentralized way. The work from [10] proposes to use GNN to predict network metrics and a traditional optimizer to find the routing that minimizes some network metrics (e.g., average delay). Finally, GNNs have been proposed to learn job scheduling policies in a data-center scenario without human intervention [49].

VIII. CONCLUSION

In this paper, we presented a DRL architecture based on GNNs that is able to generalize to unseen network topologies. The use of GNNs to model the network environment allows the DRL agent to operate in networks different from those used for training. We believe that the lack of generalization was the main obstacle preventing the use and deployment of DRL in production networks. The proposed architecture represents a first step towards the development of a new generation of DRL-based products for networking.

In order to show the generalization capabilities of our DRL+GNN solution, we selected a classic problem in the field of optical networks. This served as a baseline benchmark to validate the generalization performance of our architecture. Our results show that the proposed DRL+GNN agent is able to effectively operate in networks never seen during training. Previous DRL solutions based on traditional neural network architectures were not able to generalize to other topologies.

A fundamental challenge that remains to be addressed towards the deployment of DRL techniques for self-driving networks is their black-box nature. DRL does not provide guaranteed performance for all network scenarios and its operation cannot be understood easily by humans. As a result, DRL-based solutions are inherently complex to troubleshoot and debug by network operators. In contrast, computer networks have been built around well-understood analytical and heuristic techniques, and such mechanisms are based on well-known assumptions that perform reasonably well across different scenarios. Such issues are not unique to self-driving networks, but rather common to the application of machine learning to many critical use-cases, such as self-driving cars.

ACKNOWLEDGMENT

This publication is part of the Spanish I+D+i project TRAINER-A (ref. PID2020-118011GB-C21), funded by MCIN/AEI/10.13039/501100011033. This work is also partially funded by the Catalan Institution for Research and Advanced Studies (ICREA) and the Secretariat for Universities and Research of the Ministry of Business and Knowledge of the Government of Catalonia and the European Social Fund. This work was also supported by the Polish Ministry of Science and Higher Education with the subvention funds of the Faculty of Computer Science, Electronics and Telecommunications of AGH University and by the PL-Grid Infrastructure.

REFERENCES

[1] R. Hartert, S. Vissicchio, P. Schaus, O. Bonaventure, C. Filsfils, T. Telkamp, and P. Francois, "A declarative and expressive approach to control forwarding paths in carrier-grade networks," ACM SIGCOMM Computer Communication Review, vol. 45, no. 4, pp. 15–28, 2015.
[2] S. Knight, H. X. Nguyen, N. Falkner, R. Bowden, and M. Roughan, "The internet topology zoo," IEEE Journal on Selected Areas in Communications, vol. 29, no. 9, pp. 1765–1775, 2011.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529–533, 2015.
[4] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," arXiv preprint arXiv:1712.01815, 2017.
[5] N. Feamster and J. Rexford, "Why (and how) networks should run themselves," arXiv preprint arXiv:1710.11583, 2017.
[6] M. Wang, Y. Cui, X. Wang, S. Xiao, and J. Jiang, "Machine learning for networking: Workflow, advances and opportunities," IEEE Network, vol. 32, no. 2, pp. 92–99, 2017.
[7] A. Mestres, A. Rodriguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, M. Solé, V. Muntés-Mulero, D. Meyer, S. Barkai, M. J. Hibbett et al., "Knowledge-defined networking," ACM SIGCOMM Computer Communication Review, vol. 47, no. 3, pp. 2–10, 2017.
[8] P. Kalmbach, J. Zerwas, P. Babarczi, A. Blenk, W. Kellerer, and S. Schmid, "Empowering self-driving networks," in Proceedings of the Afternoon Workshop on Self-Driving Networks, 2018, pp. 8–14.
[9] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, "Learning to route," in Proceedings of the ACM Workshop on Hot Topics in Networks (HotNets), 2017, pp. 185–191.
[10] K. Rusek, J. Suárez-Varela, P. Almasan, P. Barlet-Ros, and A. Cabellos-Aparicio, "RouteNet: Leveraging graph neural networks for network modeling and optimization in SDN," IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2260–2270, 2020.
[11] X. Chen, J. Guo, Z. Zhu, R. Proietti, A. Castro, and S. J. B. Yoo, "DeepRMSA: A deep-reinforcement-learning routing, modulation and spectrum assignment agent for elastic optical networks," in Proceedings of the Optical Fiber Communications Conference (OFC), 2018.
[12] Z. Xu, J. Tang, J. Meng, W. Zhang, Y. Wang, C. H. Liu, and D. Yang, "Experience-driven networking: A deep reinforcement learning based approach," in IEEE Conference on Computer Communications (INFOCOM), 2018, pp. 1871–1879.
[13] L. Chen, J. Lingys, K. Chen, and F. Liu, "AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization," in Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 2018, pp. 191–205.
[14] A. Mestres, E. Alarcón, Y. Ji, and A. Cabellos-Aparicio, "Understanding the modeling of computer network delays using neural networks," in Proceedings of the ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA), 2018, pp. 46–52.
[15] J. Suárez-Varela, A. Mestres, J. Yu, L. Kuang, H. Feng, A. Cabellos-Aparicio, and P. Barlet-Ros, "Routing in optical transport networks with deep reinforcement learning," IEEE/OSA Journal of Optical Communications and Networking, vol. 11, no. 11, pp. 547–558, 2019.
[16] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner et al., "Relational inductive biases, deep learning, and graph networks," arXiv preprint arXiv:1806.01261, 2018.
[17] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
[18] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in Proceedings of the International Conference on Machine Learning (ICML) - Volume 70, 2017, pp. 1263–1272.
[19] https://github.com/knowledgedefinednetworking/DRL-GNN.
[20] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, "Gated graph sequence neural networks," arXiv preprint arXiv:1511.05493, 2015.
[21] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[22] P. W. Battaglia, R. Pascanu, M. Lai, D. J. Rezende et al., "Interaction networks for learning about objects, relations and physics," in Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016, pp. 4502–4510.
[23] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[25] J. Kuri, N. Puech, and M. Gagnaire, "Diverse routing of scheduled lightpath demands in an optical transport network," in Proceedings of the IEEE International Workshop on Design of Reliable Communication Networks (DRCN), 2003, pp. 69–76.
[26] "ITU-T Recommendation G.709/Y.1331: Interface for the optical transport network," 2019, https://www.itu.int/rec/T-REC-G.709/.
[27] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[28] K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," arXiv preprint arXiv:1409.1259, 2014.
[29] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 265–283.
[30] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, 2016.
[31] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the International Conference on Computational Statistics (COMPSTAT), 2010, pp. 177–186.
[32] X. Hei, J. Zhang, B. Bensaou, and C.-C. Cheung, "Wavelength converter placement in least-load-routing-based optical networks using genetic algorithms," Journal of Optical Networking, vol. 3, no. 5, pp. 363–378, 2004.
[33] F. Barreto, E. C. Wille, and L. Nacamura Jr., "Fast emergency paths schema to overcome transient link failures in OSPF routing," arXiv preprint arXiv:1204.2465, 2012.
[34] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure, "Achieving sub-second IGP convergence in large IP networks," ACM SIGCOMM Computer Communication Review, vol. 35, no. 3, pp. 35–44, 2005.
[35] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu et al., "B4: Experience with a globally-deployed software defined WAN," ACM SIGCOMM Computer Communication Review, vol. 43, no. 4, pp. 3–14, 2013.
[36] A. Hagberg, P. Swart, and D. S. Chult, "Exploring network structure, dynamics, and function using NetworkX," Los Alamos National Lab. (LANL), Los Alamos, NM (United States), Tech. Rep., 2008.
[37] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, "How powerful are graph neural networks?" arXiv preprint arXiv:1810.00826, 2018.
[38] L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, "Efficient resource allocation for all-optical multicasting over spectrum-sliced elastic optical networks," IEEE/OSA Journal of Optical Communications and Networking, vol. 5, no. 8, pp. 836–847, 2013.
[39] L. Gong, X. Zhou, W. Lu, and Z. Zhu, "A two-population based evolutionary approach for optimizing routing, modulation and spectrum assignments (RMSA) in O-OFDM networks," IEEE Communications Letters, vol. 16, no. 9, pp. 1520–1523, 2012.
[40] M. Klinkowski, M. Ruiz, L. Velasco, D. Careglio, V. Lopez, and J. Comellas, "Elastic spectrum allocation for time-varying traffic in flexgrid optical networks," IEEE Journal on Selected Areas in Communications, vol. 31, no. 1, pp. 26–38, 2012.
[41] Y. Wang, X. Cao, and Y. Pan, "A study of the routing and spectrum allocation in spectrum-sliced elastic optical path networks," in IEEE International Conference on Computer Communications (INFOCOM), 2011, pp. 1503–1511.
[42] K. Christodoulopoulos, I. Tomkos, and E. A. Varvarigos, "Elastic bandwidth allocation in flexible OFDM-based optical networks," Journal of Lightwave Technology, vol. 29, no. 9, pp. 1354–1366, 2011.
[43] P. Sun, Z. Guo, J. Lan, J. Li, Y. Hu, and T. Baker, "ScaleDRL: A scalable deep reinforcement learning approach for traffic engineering in SDN with pinning control," Computer Networks, vol. 190, p. 107891, 2021.
[44] J. Zhang, M. Ye, Z. Guo, C.-Y. Yen, and H. J. Chao, "CFR-RL: Traffic engineering with reinforcement learning in SDN," IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2249–2259, 2020.
[45] S. Troia, F. Sapienza, L. Varé, and G. Maier, "On deep reinforcement learning for traffic engineering in SD-WAN," IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 2198–2212, 2020.
[46] F. Geyer and G. Carle, "Learning and generating distributed routing protocols using graph-based deep learning," in Proceedings of the ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA), 2018, pp. 40–45.
[47] H. Zhu, V. Gupta, S. S. Ahuja, Y. Tian, Y. Zhang, and X. Jin, "Network planning with deep reinforcement learning," in Proceedings of the 2021 ACM SIGCOMM Conference, 2021, pp. 258–271.
[48] G. Bernárdez, J. Suárez-Varela, A. López, B. Wu, S. Xiao, X. Cheng, P. Barlet-Ros, and A. Cabellos-Aparicio, "Is machine learning ready for traffic engineering optimization?" in 2021 IEEE 29th International Conference on Network Protocols (ICNP). IEEE, 2021, pp. 1–11.
[49] H. Mao, M. Schwarzkopf, S. B. Venkatakrishnan, Z. Meng, and M. Alizadeh, "Learning scheduling algorithms for data processing clusters," in Proceedings of ACM SIGCOMM, 2019, pp. 270–288.