
electronics
Article
A Reinforcement Learning-Based Traffic Engineering Algorithm
for Enterprise Network Backbone Links
Haixiu Cheng 1,2 , Yingxin Luo 1,2 , Ling Zhang 1,2, * and Zhiwen Liao 3
1 Guangdong Province Key Laboratory of Computer Network, South China University of Technology,
Guangzhou 510641, China; [email protected] (H.C.); [email protected] (Y.L.)
2 School of Computer Science & Engineering, South China University of Technology, Guangzhou 510641, China
3 School of Artificial Intelligence, Zhuhai City Polytechnic, Zhuhai 519090, China
* Correspondence: [email protected]
Abstract: Large enterprise networks typically rely on expensive, high-speed backbone links to connect
multiple campuses across diverse regions. As the volume of traffic traversing these backbone links
increases, traffic engineering techniques are employed to filter or redirect traffic flows. Nevertheless,
simple rerouting strategies can introduce business disruptions such as packet reordering, which
significantly impact the user experience. To address this issue, we introduce an enhanced traffic
scheduling algorithm named Critical Flow Rerouting with Weight-Reinforcement Learning (CFRW-RL),
which builds upon the critical flow rerouting-reinforcement learning (CFR-RL) algorithm.
CFRW-RL incorporates the principles of reinforcement learning, accounting for both the weights
and classifications of data flows. This approach enables the algorithm to prioritize flows with lower
weights for rerouting. The simulation results demonstrate that CFRW-RL significantly minimizes
the rerouting of high-priority business flows and reduces business interference compared with the
CFR-RL algorithm and that it maintains a similar computational complexity.
Keywords: reinforcement learning; enterprise network backbone links; traffic classification; user
experience
Citation: Cheng, H.; Luo, Y.; Zhang, L.; Liao, Z. A Reinforcement Learning-Based Traffic
Engineering Algorithm for Enterprise Network Backbone Links. Electronics 2024, 13, 1441.
https://doi.org/10.3390/electronics13081441
Academic Editor: Franco Cicirelli
Received: 10 March 2024
Revised: 4 April 2024
Accepted: 9 April 2024
Published: 11 April 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
In the current digital era, data centers, serving as the central hub for cloud computing,
big data processing, and various online services, present both challenges and opportunities.
As the data center scale expands and service types increase, distributed data centers
have become prevalent [1]. The network traffic within these distributed data centers
has experienced rapid growth, necessitating efficient traffic scheduling mechanisms to
optimize network resource allocation and enhance network performance [2]. In the context
of distributed data centers, high bandwidth costs and suboptimal resource utilization
are common issues in inter-domain networks [3]. The complexity of inter-domain traffic
scheduling in data centers stems from diverse application and service requirements, each
with unique and dynamic use of network resources [4]. Therefore, the implementation
of more intelligent traffic engineering is crucial for effective scheduling of inter-domain
network traffic.
Video distribution networks (VDNs) and content distribution networks (CDNs) are
also confronted with the issue of inter-domain traffic scheduling [5]. These networks
operate in a distributed environment, highlighting the importance of intelligent scheduling
for inter-domain traffic. By employing traffic engineering to schedule inter-domain network
traffic for data centers, VDNs, and CDNs, not only can network resource utilization be
enhanced, but service quality can be optimized and user experience improved.
In the field of traffic engineering (TE), reinforcement learning (RL) and its evolution
into deep reinforcement learning (DRL) are recognized as pivotal technologies for optimizing
network performance. As network scales grow and traffic patterns become more intricate,
traditional TE methods face challenges in meeting the demands of modern networks.
To address this evolving landscape, researchers have started to utilize RL technologies
to adaptively adjust network configurations. Guo et al. [6] introduced an innovative TE
method based on DRL, focusing on optimizing traffic allocation in shared data center wide
area networks. Their approach minimizes link bandwidth utilization and transmission
delay to achieve load balancing while maintaining low latency for high-priority traffic. Xu
et al. [7] proposed a novel network control strategy using an experience-driven deep rein-
forcement learning approach, which enhances decision-making accuracy by supervising the
learning of network dynamics through Deep Neural Networks (DNNs). Furthermore, Guo
et al. [8] implemented RL in hybrid software-defined network environments, dynamically
adjusting routing strategies based on learned traffic patterns to improve the quality of
service for critical traffic and optimize network resource utilization.
Sun et al. [9] and Sun et al. [10] introduced deep reinforcement learning methods that
consider time relevance and scalability, respectively, providing new insights and solutions
for TE by addressing the real-time and scalable nature of networks. Rischke et al. [11]
proposed QR-SDN, a reinforcement learning approach for direct flow routing in
software-defined networks (SDN), defining states, actions, and rewards to optimize routing decisions.
Wu et al. [12] utilized a multi-agent deep reinforcement learning strategy for simultaneous
traffic control and multichannel reassignment in the core backbone network of SDN-
IoT, effectively improving packet throughput at the link layer and reducing packet loss
and delay.
Prior work also highlights the importance of investigating potential configuration
overhead and business disturbances (such as packet disorder and service interruptions)
during the traffic scheduling process in TE. Zhang et al. [13] initially suggested a cautious
routing scheme to address packet disorder issues in TE by evaluating the benefits of path
switching for routing decisions. Expanding on this concept, Zhang et al. [14] introduced the
CFR-RL algorithm, a TE method based on RL, which optimizes load balancing by routing
critical data flows while minimizing network interference and business disruptions.
The inter-domain networks of distributed data centers carry traffic for both data
centers and user Internet applications [8]. These two types of traffic differ in their network
requirements, such as latency, jitter, and bandwidth, and therefore require different handling
and prioritization. Alibaba’s traffic scheduling system NetO [15], as well as studies by
Ran et al. [16] and Kandula et al. [17], recognize the presence of these distinct levels of
traffic priority. This underscores the importance of implementing differentiated handling
strategies for traffic with different priorities in inter-domain networks of distributed data
centers to ensure optimal performance.
Previous studies [13,14] have not adequately accounted for traffic prioritization and
treat all data streams uniformly, which overlooks the varying impacts that different levels
of data streams can have on the network. We therefore introduce a data stream weight that
signifies the importance of each flow. By prioritizing the rerouting of critical flows with
lower weights, we can minimize disruptions to business operations while maintaining network
performance, which helps preserve the stability and continuity of essential business processes.
The remainder of this paper is organized as follows: Section 2 presents the CFRW-RL
algorithm model, Section 3 outlines its implementation, Section 4 discusses the simulation
and result analysis, and, finally, Section 5 offers the conclusions.
2. The CFRW-RL Model

This section provides an overview of the architecture and theoretical foundations of
the CFRW-RL model.
2.1. Model Overview

The CFRW-RL model is dedicated to accurately identifying critical flows in the network,
that is, flows that significantly impact network performance yet have relatively low
average weights, and to rerouting them intelligently. The model’s aim is
to enhance the overall network performance while minimizing disruptions to business
processes and reducing the risk of service interruptions and delays. Leveraging
reinforcement learning (RL), the CFRW-RL model continually adapts
to the dynamic changes in network conditions and formulates more precise and efficient
traffic rerouting strategies.
In addressing the task of critical flow selection, the CFRW-RL model employs an Actor–
Critic architecture, which ingeniously integrates both policy-based and value-based learn-
ing methods. The Actor component is responsible for making real-time traffic rerouting
decisions based on the learned policy, aiming to optimize the network’s overall operational
efficiency. Meanwhile, the Critic component evaluates the decisions made by the Actor,
providing feedback by estimating the value of states, which guides the Actor in adjust-
ing and refining its strategy. This interactive, two-way learning mechanism enables the
CFRW-RL model to effectively manage critical flows while minimizing interference with
the network’s routine operations. Through this coordinated learning and decision-making
process, the CFRW-RL model demonstrates its efficiency and reliability in modern network
traffic engineering.
2.2. State and Action Space

1. Input/State Space
The state space is represented by the following formula:

$$S = \{S_1, S_2, \ldots, S_t, \ldots, S_n\}, \qquad S_t = TM_t \cup TW_t \tag{1}$$
where $S_t$ represents the state at time $t$, and $TM_t$ is the traffic matrix at time $t$, which expresses
the bandwidth requirements among different data flows, indicating how much bandwidth
each data flow needs for transmission. $TW_t$ is the data-flow weight matrix at time $t$, used
to represent the weights of different categories of traffic, measuring the negative impact
of different categories of data flows on the network during rerouting. $TM_t$ and $TW_t$ are
isomorphic, with the only difference being the meanings of their elements, as indicated in
Equation (2).
$$T = \begin{bmatrix}
T_{11} & T_{12} & \cdots & T_{1j} & \cdots & T_{1n^2} \\
T_{21} & T_{22} & \cdots & T_{2j} & \cdots & T_{2n^2} \\
\vdots & \vdots & & \vdots & & \vdots \\
T_{i1} & T_{i2} & \cdots & T_{ij} & \cdots & T_{in^2} \\
\vdots & \vdots & & \vdots & & \vdots \\
T_{m1} & T_{m2} & \cdots & T_{mj} & \cdots & T_{mn^2}
\end{bmatrix} \tag{2}$$
In this context, n signifies the total count of nodes, while m corresponds to the number
of time steps involved.
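To make the shape of Equation (2) concrete, the following minimal sketch builds the two isomorphic matrices from a short history of snapshots. It assumes that each row of T is one n × n snapshot flattened row by row; the helper names `flatten` and `build_state` are hypothetical and are not part of the paper.

```python
def flatten(matrix):
    """Flatten one n x n snapshot into a row of length n^2 (row-major order, an assumption)."""
    return [value for row in matrix for value in row]


def build_state(tm_history, tw_history):
    """Build the m x n^2 traffic matrix and weight matrix from m snapshots each.

    tm_history and tw_history are lists of n x n nested lists (hypothetical input layout).
    """
    tm = [flatten(snapshot) for snapshot in tm_history]
    tw = [flatten(snapshot) for snapshot in tw_history]
    # TM and TW must be isomorphic: same number of rows and columns.
    if [len(row) for row in tm] != [len(row) for row in tw]:
        raise ValueError("traffic matrix and weight matrix must have the same shape")
    return {"TM": tm, "TW": tw}


# Two time steps (m = 2) for a 2-node network (n = 2, so each row has n^2 = 4 entries).
state = build_state(
    tm_history=[[[0.0, 5.0], [3.0, 0.0]], [[0.0, 6.0], [2.0, 0.0]]],
    tw_history=[[[0.0, 0.7], [0.2, 0.0]], [[0.0, 0.6], [0.3, 0.0]]],
)
print(state["TM"])
```

Keeping the two matrices in the same layout means that entry (i, j) of the weight matrix always describes the same flow as entry (i, j) of the traffic matrix.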
Data-flow weight values represent the proportion of network service traffic demanded
by users within a data flow. A larger weight value indicates a higher demand for net-
work service traffic by users. This definition is chosen to enhance the speed of network
scheduling. The algorithm does not schedule each small network service individually
but rather schedules each large data flow as a whole. Each large data flow comprises
both user-demand network services and data center-demand network services. Data-flow
weight values are used to distinguish their importance.
2. Action Space
The action space can be represented by Equation (3).
$$A_t = \{0, 1, \ldots, N \times (N - 1) - 1\} \tag{3}$$

For each state $S_t$, CFRW-RL selects $K$ critical flows. In a network system with $N$ switch
nodes, considering the source-destination switch perspective, a total of $N \times (N-1)$ data
flows exist, meaning that any switch acting as the source node can establish data flows
with all other switches acting as destination nodes. Therefore, the action space contains
all possible combinations of data flows, totaling $C_{N \times (N-1)}^{K}$ combinations, where
$C_{N \times (N-1)}^{K}$ denotes the number of combinations of choosing $K$ data flows from
$N \times (N-1)$ flows. The action space has a large scale, and directly computing all possible
combinations would result in excessively high computational complexity. To reduce time
complexity, CFRW-RL employs the method proposed by Zhang et al. [14] and Mao et al. [18],
directly selecting $K$ data flows from $N \times (N-1)$ flows and associating each possible action
with a specific data flow. This definition effectively reduces the scale of the action space,
making the algorithm more efficient in addressing data-flow scheduling and rerouting problems.
During the algorithm’s iteration process, at each time step, K different data flows are
chosen as the actions the agent will take in the next state:
$$a_t^K = \{a_t^1, a_t^2, \ldots, a_t^K\}. \tag{4}$$
This action-selection method ensures the algorithm’s rationality while reducing com-
putational complexity, enabling CFRW-RL to better adapt to large-scale network systems
and improve the efficiency of data-flow scheduling and rerouting.
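The reduction in action-space size can be illustrated with a short sketch. It compares the number of K-flow combinations with the N × (N − 1) per-flow actions, and draws K distinct flow indices one at a time from a probability vector. The index-to-pair mapping, the value K = 13, and the uniform probabilities are assumptions for illustration only; in CFRW-RL the probabilities would come from the policy network.

```python
import math
import random


def flow_index_to_pair(index, n_nodes):
    """Map an action index in {0, ..., N(N-1)-1} to a (source, destination) pair.

    The ordering is an assumed convention: flows are grouped by source node.
    """
    src, offset = divmod(index, n_nodes - 1)
    dst = offset if offset < src else offset + 1  # skip the diagonal (src == dst)
    return src, dst


def sample_critical_flows(probs, k, rng):
    """Draw k distinct flow indices, one at a time, in proportion to probs."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


n_nodes = 12                                # e.g., a 12-node topology
n_flows = n_nodes * (n_nodes - 1)           # size of the per-flow action space
k = 13                                      # hypothetical number of critical flows
n_combinations = math.comb(n_flows, k)      # size of the combinatorial action space

rng = random.Random(0)
uniform_probs = [1.0 / n_flows] * n_flows   # placeholder for the policy output
actions = sample_critical_flows(uniform_probs, k, rng)
pairs = [flow_index_to_pair(a, n_nodes) for a in actions]
print(n_flows, n_combinations > 10**15, pairs[:3])
```

Sampling without replacement guarantees that the K selected flows are different, as Equation (4) requires.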
2.3. Reward Function

The reward function is
$$r_t = \frac{1}{U_t} - \varepsilon \cdot \frac{\sum_{f_i \in F_K} w_{f_i}}{K}, \tag{5}$$
where $U_t$ represents the minimized maximum link bandwidth occupancy rate after the critical
flows have been rerouted, $F_K = \{f_1, f_2, \ldots, f_i, \ldots, f_K\}$ represents the set of $K$ critical
flows, $w_{f_i}$ represents the weight of the $i$-th critical flow $f_i$, $\varepsilon$ represents the weight
factor of critical flows in the reward function, and $K$ is the number of critical flows. The term
$1/U_t$ captures the effect of reducing the maximum link bandwidth occupancy rate: a lower
$U_t$ gives a larger $1/U_t$ and therefore a higher reward. The second term is the average weight
of the selected $K$ critical flows, scaled by $\varepsilon$; a smaller value leads
to a higher reward. This is intended to encourage the algorithm to choose critical flows
with a smaller impact on the network, thereby minimizing business interference.
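As a small numerical illustration of Equation (5), the sketch below evaluates the reward for two candidate selections that reach the same maximum utilization but differ in average weight. The function name `reward` and all numbers are hypothetical.

```python
def reward(u_t, critical_weights, epsilon):
    """Reward of Equation (5): 1/U_t minus epsilon times the average critical-flow weight."""
    average_weight = sum(critical_weights) / len(critical_weights)
    return 1.0 / u_t - epsilon * average_weight


# Same maximum utilization after rerouting, different flow weights (made-up values).
low_weight_selection = reward(u_t=0.5, critical_weights=[0.1, 0.1], epsilon=1.0)
high_weight_selection = reward(u_t=0.5, critical_weights=[0.9, 0.9], epsilon=1.0)
print(low_weight_selection, high_weight_selection)
```

With equal load-balancing results, the selection with the lower average weight receives the higher reward, which is the behavior the reward function is designed to encourage.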
2.4. Actor Network and Critic Network

The CFRW-RL model utilizes a combined Actor–Critic architecture, where the Actor
network is responsible for selecting actions based on a learned policy function. This policy
function maps states to action probabilities, allowing for both the exploration of new
strategies and the exploitation of known effective actions. The Critic network, on the other
hand, estimates the value function, which predicts the expected cumulative reward for
a given state or state–action pair. The Critic’s valuations provide feedback to the Actor,
guiding it toward optimal policies that enhance network performance. The policy network
and the value network’s structural diagrams are depicted in Figures 1 and 2, respectively.
[Figure 1 diagram. Labels: traffic matrix, traffic weight matrix, input layer, convolutional layer, fully connected layer.]
Figure 1. Structural diagram of the policy network.
[Figure 2 diagram: Value Network (Critic). Labels: traffic matrix, traffic weight matrix, convolution, feature, action a, Dense, concatenate, q(s,a,w).]
Figure 2. Structural diagram of the value network.
As illustrated in Figure 3, the model structure of the Actor–Critic method is presented.
Figure 3. The model structure of the Actor–Critic method.
The Actor is the policy network, responsible for generating actions, that is, selecting an
action to perform in a given state. The Actor is typically parameterized by a policy function,
which can be either deterministic or stochastic. The goal of the Actor is to optimize its
policy through learning so as to achieve the maximum cumulative reward in the long
term. The Critic, on the other hand, is the value network, responsible for evaluating the
effectiveness of the actions taken by the Actor. It learns a value function to estimate the
expected return of taking a specific action in a particular state. This value function can be
either a state-value function or an action-value function, providing feedback to the Actor
about the quality of its actions.
The state-value function of the Actor–Critic is defined as Equation (6):
$$V(s; \theta, w) = \sum_{a} \pi(a \mid s) \cdot q(s, a; w). \tag{6}$$
The equation represents the approximation of the state-value function using neural
networks:
• V (s; θ, w): this is the value function for state s, where θ represents the parameters
of the policy network (Actor) and w represents the parameters of the value function
network (Critic);
• Σa: this denotes the summation over all possible actions a, indicating the summation
over the action space;
• q(s, a; w): this is the State-action Value Function (Q-value Function) generated by the
value function network, representing the expected value of taking action a in state s;
• π(a|s): this is the action probability generated by the policy network, representing
the probability of taking action a in state s.
In essence, this equation approximates the state-value function V (s; θ, w) as a weighted
sum of the state–action-value function q(s, a; w) for all possible actions in state s, weighted
by the corresponding action probabilities π ( a|s) generated by the policy network.
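A minimal numerical sketch of Equation (6) follows. The probabilities and Q-values are made-up numbers standing in for the outputs of the policy and value networks for a single state.

```python
def state_value(action_probs, q_values):
    """Equation (6): sum over actions of pi(a|s) * q(s, a; w)."""
    return sum(p * q for p, q in zip(action_probs, q_values))


pi = [0.2, 0.5, 0.3]   # hypothetical policy output for one state
q = [1.0, 2.0, 3.0]    # hypothetical critic output for the same state
v = state_value(pi, q)
print(v)
```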
3. The CFRW-RL Algorithm

This section elaborates on the CFRW-RL algorithm, focusing on the critical flow
selection and rerouting processes that are central to its TE approach.
3.1. Critical Flow Identification

The steps for updating the policy and value networks in the Actor–Critic method are
as follows:
1. Observe the current state $S_t$ of the environment at time step $t$, use it as input, and
calculate the probability distribution with the policy network. Randomly sample an
action $a_t$ based on the computed probabilities.
2. The agent executes the action $a_t$, the environment transitions to a new state
$S_{t+1}$, and a reward $r_t$ is given.
3. Use the new state $S_{t+1}$ as input and calculate the probability distribution with the policy
network. Randomly sample an action $\tilde{a}_{t+1}$ (note that this action is used only for computing
the Q-value and is not actually executed).
4. Evaluate the value network using Equations (7) and (8).
$$q_t = q(S_t, a_t; w_t) \tag{7}$$
$$q_{t+1} = q(S_{t+1}, \tilde{a}_{t+1}; w_t) \tag{8}$$
5. Calculate the Temporal Difference (TD) error using Equation (9).
$$\delta_t = q_t - (r_t + \gamma \cdot q_{t+1}) \tag{9}$$
6. Differentiate the value network with respect to its parameters using Equation (10).
$$d_{w,t} = \left. \frac{\partial q(S_t, a_t; w)}{\partial w} \right|_{w = w_t} \tag{10}$$
7. Update the value network using Equation (11). This is a gradient descent step that
moves the predicted value closer to the TD target $r_t + \gamma \cdot q_{t+1}$.
$$w_{t+1} = w_t - \alpha \cdot \delta_t \cdot d_{w,t} \tag{11}$$
8. Differentiate the policy network with respect to its parameters using Equation (12).
$$d_{\theta,t} = \left. \frac{\partial \log \pi(a_t \mid S_t; \theta)}{\partial \theta} \right|_{\theta = \theta_t} \tag{12}$$
9. Update the policy network using Equation (13). This is a gradient ascent step that
increases the score of the Actor’s action.
$$\theta_{t+1} = \theta_t + \beta \cdot q_t \cdot d_{\theta,t} \tag{13}$$
Each iteration of the Actor–Critic method requires the above nine steps, during which
only one action is taken, one reward is observed, and the parameters of the neural network
are updated once.
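The nine steps can be traced in code with a deliberately simplified stand-in for the two networks. The sketch below assumes a tabular critic (one Q-value per state–action pair) and a tabular softmax actor (one logit per state–action pair) instead of the convolutional networks of CFRW-RL, so the gradients in Equations (10) and (12) have closed forms. The function `actor_critic_step`, the toy environment, and the learning rates are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Sample one index in proportion to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def actor_critic_step(theta, w, s, env_step, rng, alpha=0.1, beta=0.05, gamma=0.9):
    """One iteration of steps 1-9; theta and w are updated in place."""
    pi = softmax(theta[s])
    a = sample(pi, rng)                               # step 1: sample a_t
    s_next, r = env_step(s, a)                        # step 2: observe S_{t+1}, r_t
    a_next = sample(softmax(theta[s_next]), rng)      # step 3: sampled but not executed
    q_t, q_next = w[s][a], w[s_next][a_next]          # step 4: Equations (7) and (8)
    delta = q_t - (r + gamma * q_next)                # step 5: Equation (9)
    # Steps 6-7: for a tabular critic, dq/dw is 1 at entry (s, a) and 0 elsewhere.
    w[s][a] -= alpha * delta                          # Equation (11)
    # Steps 8-9: for a softmax actor, d log pi(a|s) / d theta[s][b] = 1{b == a} - pi[b].
    for b in range(len(pi)):
        grad_log = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += beta * q_t * grad_log          # Equation (13)
    return s_next, r, delta


# Toy environment (an assumption): a single state, every action yields reward 1.
rng = random.Random(0)
theta = [[0.0, 0.0]]
w = [[0.0, 0.0]]
s_next, r, delta = actor_critic_step(theta, w, 0, lambda s, a: (0, 1.0), rng)
print(delta, w, theta)
```

In the first iteration the critic starts at zero, so the TD error is −1 and only the critic moves; the actor update is scaled by $q_t$ and therefore stays at zero until the critic has learned non-zero values.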
The pseudocode for the critical flow selection algorithm is shown in Algorithm 1.
Algorithm 1: Pseudocode for the critical flow selection algorithm

Initialize actor network parameters θ_actor and critic network parameters θ_critic
Initialize actor optimizer optimizer_actor and critic optimizer optimizer_critic
for each training iteration do:
    Collect a batch of inputs, actions, rewards (and possibly next states) from the environment
    with tf.GradientTape() as tape1:
        Compute the critic model’s value predictions (values) from inputs
        Calculate value loss and advantages using value_loss_fn with rewards and values
    Compute gradients for the critic network (critic_gradients) with tape1
    Update critic network parameters using optimizer_critic and critic_gradients
    with tf.GradientTape() as tape2:
        Compute the actor model’s policy logits from inputs
        Calculate policy loss and entropy using policy_loss_fn with logits, actions,
        advantages, and entropy_weight
    Compute gradients for the actor network (actor_gradients) with tape2
    Update actor network parameters using optimizer_actor and actor_gradients
end for
In this pseudocode:
• ‘θ_actor’ and ‘θ_critic’ represent the parameters of the Actor network and Critic
network, respectively.
• ‘optimizer_actor’ and ‘optimizer_critic’ are the optimizers used to update the network
parameters.
• During each training iteration, a batch of inputs, actions, and rewards (and possibly
next states) is first collected from the environment.
• ‘tf.GradientTape’ is used to track the gradients of the Critic network, and to calculate
the value loss and advantage values.
• The gradients computed are used to update the parameters of the Critic network.
• ‘tf.GradientTape’ is used again to track the gradients of the Actor network, and to
calculate the policy loss and entropy.
• The gradients computed are used to update the parameters of the Actor network.
This process is repeated over multiple training iterations until the network parameters
converge to a point that optimizes the policy performance.
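Algorithm 1 names two loss helpers, value_loss_fn and policy_loss_fn, without giving their bodies. The sketch below shows one common way such losses are defined, written in plain Python so that the arithmetic is visible: a mean squared error for the critic, and a policy-gradient term with an entropy bonus for the actor. These definitions are assumptions and may differ from the authors' implementation; for simplicity, each sample here carries a single action rather than K critical flows.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def value_loss_fn(rewards, values):
    """Assumed critic loss: mean squared advantage, where advantage = reward - value."""
    advantages = [r - v for r, v in zip(rewards, values)]
    loss = sum(a * a for a in advantages) / len(advantages)
    return loss, advantages


def policy_loss_fn(logits, actions, advantages, entropy_weight):
    """Assumed actor loss: -mean(log pi(a) * advantage) - entropy_weight * mean entropy."""
    n = len(actions)
    pg_term, entropy = 0.0, 0.0
    for row, action, advantage in zip(logits, actions, advantages):
        log_probs = log_softmax(row)
        pg_term += -log_probs[action] * advantage
        entropy += -sum(math.exp(lp) * lp for lp in log_probs)
    entropy /= n
    return pg_term / n - entropy_weight * entropy, entropy


# Made-up batch of two samples with four possible actions each.
v_loss, adv = value_loss_fn(rewards=[1.0, 2.0], values=[0.5, 2.5])
p_loss, ent = policy_loss_fn([[0.0] * 4, [0.0] * 4], [0, 3], adv, entropy_weight=0.01)
print(v_loss, adv, p_loss, ent)
```

The entropy bonus lowers the loss for more uniform policies, which is a standard way to keep the Actor exploring during early training.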
3.2. Critical Flow Rerouting Strategy

The traffic in the network consists of two parts: the critical flows that need to be
rerouted and the residual flows outside the critical flows. Under the routing optimization
strategy used in this paper, the residual flows are routed by default using the ECMP
algorithm, while the critical flows are rerouted using linear programming. To minimize the
maximum link utilization ratio in the network, we employed a linear programming (LP)
model. A description of the symbols used in the model and their meanings are provided
in Table 1.
Table 1. Symbols and their meanings.
Symbol | Meaning
$C_{i,j}$ | The physical bandwidth of link $\langle i,j \rangle \in E$, $i \in V$, $j \in V$.
$l_{i,j}$ | The total load of link $\langle i,j \rangle$, $\langle i,j \rangle \in E$, $i \in V$, $j \in V$.
$l'_{i,j}$ | The initial load of link $\langle i,j \rangle \in E$, $i \in V$, $j \in V$.
$f_K$ | The set of $K$ critical flows identified by the CFRW-RL algorithm.
$f_{ECMP}$ | The residual flows excluding $f_K$.
$D^{s,d}$ | The bandwidth requirement for the data flow from source node $s$ to destination node $d$, $s \in V$, $d \in V$, $s \neq d$.
$p_{i,j}^{s,d}$ | The probability of the data flow from source node $s$ to destination node $d$ traversing link $\langle i,j \rangle$, $s \in V$, $d \in V$, $s \neq d$, $\langle i,j \rangle \in E$.
$U$ | The maximum bandwidth utilization across various links in the network.
$D(f_{ECMP})$ | The traffic demand employing the ECMP algorithm.
$D(f_K)$ | The traffic demand of critical flow $f_K$.
$d(i)^+$ | The in-degree of node $i$.
$d(i)^-$ | The out-degree of node $i$.
$l_{i,j}^{f_K}$ | The traffic load of critical flow allocated to link $\langle i,j \rangle$.
If there are $L$ links, with the load (i.e., traffic) on each link being $l_1, l_2, \ldots, l_L$, and the
capacity of each link being $C_1, C_2, \ldots, C_L$, then the maximum utilization can be
expressed as
$$U = \max_{i=1}^{L} \left( \frac{l_i}{C_i} \right) \tag{14}$$
Here, $l_i / C_i$ represents the utilization of the $i$-th link, which is the ratio of the traffic $l_i$
to the link capacity $C_i$. By taking the maximum of the utilization ratios across all links, the
maximum utilization rate of the entire network is obtained.
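The model’s objective function is outlined in Equation (15), while the constraints are
detailed in Equations (16)–(22).
$$\text{obj} = \text{minimize } U \tag{15}$$
$$d(i)^+ = \sum_{k:\langle k,i \rangle \in E} p_{k,i}^{s,d} \tag{16}$$
$$d(i)^- = \sum_{k:\langle i,k \rangle \in E} p_{i,k}^{s,d} \tag{17}$$
$$l'_{i,j} = \sum_{\langle s,d \rangle \in f_{ECMP}} p_{i,j}^{s,d} \cdot D^{s,d} \tag{18}$$
$$l_{i,j}^{f_K} = \sum_{\langle s,d \rangle \in f_K} p_{i,j}^{s,d} \cdot D^{s,d} \tag{19}$$
$$l_{i,j} = l'_{i,j} + l_{i,j}^{f_K} \quad (\langle i,j \rangle \in E,\ i \in V,\ j \in V) \tag{20}$$
$$l_{i,j} \leq C_{i,j} \cdot U \tag{21}$$
$$d(i)^+ - d(i)^- = \begin{cases} -1, & i = s \\ 0, & \text{otherwise} \\ 1, & i = d \end{cases} \tag{22}$$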
The in-degree $d(i)^+$ and out-degree $d(i)^-$ are calculated as shown in Equations
(16) and (17), respectively. The in-degree is the sum of the probabilities of traffic
flowing into the node from other nodes, whereas the out-degree is the sum of the
probabilities of traffic flowing out of the node to other nodes.
The initial load of the link $\langle i,j \rangle$, representing the residual traffic assigned by the
ECMP algorithm (that is, all traffic except the critical flows), is given by Equation (18). The load
of critical flows on the link $\langle i,j \rangle$, as allocated by the critical flow rerouting algorithm, is
given by Equation (19), which determines how the critical traffic is distributed
across the network’s links. The total load on the link $\langle i,j \rangle$, encompassing both the initial
load and the critical flow load, is given by Equation (20).
To ensure that link traffic remains within acceptable limits, Equation (21) requires
that the total load on a link be less than or equal to the
product of its physical bandwidth and the maximum utilization rate. This constraint
prevents any link from being loaded beyond its capacity.
Adhering to the principle of traffic conservation, data flows within the network
must satisfy Equation (22) at every node. At the origin of a path, a node’s in-degree is zero and its
out-degree is one, resulting in a net difference of −1. Conversely, at the path’s terminus,
the in-degree is one and the out-degree is zero, leading to a net difference of 1. For nodes
that serve as intermediaries along the path, the in-degree equals the out-degree, with
no net difference between them.
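The constraints can be checked by hand on a toy network. The sketch below does not solve the LP; it only evaluates Equations (14) and (18)–(22) for candidate routings, which is what an LP solver would do implicitly while searching over the probabilities p. The 4-node topology, the capacities, the demands, and the helper names are hypothetical.

```python
def link_loads(routing, demands):
    """Equations (18)/(19): load per link as the sum of p * D over the given flows.

    routing[(s, d)][(i, j)] is the fraction of flow (s, d) that traverses link (i, j).
    """
    loads = {}
    for flow, fractions in routing.items():
        for link, p in fractions.items():
            loads[link] = loads.get(link, 0.0) + p * demands[flow]
    return loads


def conserves_flow(fractions, src, dst, nodes, tol=1e-9):
    """Equation (22): in-degree minus out-degree is -1 at src, 1 at dst, 0 elsewhere."""
    for i in nodes:
        d_in = sum(p for (k, j), p in fractions.items() if j == i)
        d_out = sum(p for (j, k), p in fractions.items() if j == i)
        expected = -1.0 if i == src else (1.0 if i == dst else 0.0)
        if abs((d_in - d_out) - expected) > tol:
            return False
    return True


def max_utilization(residual, critical, capacity):
    """Equations (14), (20), (21): smallest U with l'_{ij} + l^{fK}_{ij} <= C_{ij} * U."""
    return max(
        (residual.get(link, 0.0) + critical.get(link, 0.0)) / cap
        for link, cap in capacity.items()
    )


nodes = [0, 1, 2, 3]
capacity = {(0, 1): 10.0, (1, 3): 10.0, (0, 2): 10.0, (2, 3): 10.0}
demands = {(1, 3): 6.0, (0, 3): 8.0}

# Residual flow (1, 3) stays on its default path; flow (0, 3) is the critical flow.
residual = link_loads({(1, 3): {(1, 3): 1.0}}, demands)
ecmp_split = {(0, 1): 0.5, (1, 3): 0.5, (0, 2): 0.5, (2, 3): 0.5}
rerouted = {(0, 1): 0.125, (1, 3): 0.125, (0, 2): 0.875, (2, 3): 0.875}

u_ecmp = max_utilization(residual, link_loads({(0, 3): ecmp_split}, demands), capacity)
u_rerouted = max_utilization(residual, link_loads({(0, 3): rerouted}, demands), capacity)
print(u_ecmp, u_rerouted)
```

In this toy case, shifting most of the critical flow onto the path that the residual flow does not use lowers the maximum utilization from 1.0 to 0.7, while both routings satisfy the conservation constraint.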
4. Simulation and Results Analysis

4.1. Experimental Setup
4.1.1. Dataset
In this study, we utilize the Abilene open dataset, which was collected by Zhang [19],
describing traffic information between core nodes of the Abilene network (the U.S. Ed-
ucation Network). The network topology consists of 12 network nodes and 30 links.
This dataset records the traffic matrices between each pair of nodes in the network, with
each traffic matrix size being 12 × 12. Traffic matrices are collected every 5 min, result-
ing in a total of 288 traffic matrices per day. The dataset spans 6 months, comprising
51,840 traffic matrices.
4.1.2. Evaluation Metrics

1. Load-balancing Performance Ratio
To evaluate the load-balancing performance of the proposed CFRW-RL algorithm,
we use the load-balancing performance ratio (P) as a metric, calculated as
$$P = \frac{U_{opt}}{U_{CFRW}}, \tag{23}$$
where $U_{opt}$ represents the minimum achievable maximum link bandwidth occupancy,
obtained through linear programming-based route optimization when the critical flow
selection ratio is 100%. $U_{CFRW}$ is the minimized maximum link bandwidth occupancy
obtained after selecting critical data flows based on the given critical flow selection ratio
and optimizing their routing. Because rerouting all flows can do no worse than rerouting
a subset of them, P is at most 1, and a higher P value indicates better routing optimization
effectiveness.
2. Rerouting Disruption (RD)
RD is evaluated as the weighted average of the selected critical flows’ weights, calculated as
$$RD = \varepsilon \cdot \frac{\sum_{f_i \in F_K} w_{f_i}}{K}, \tag{24}$$
where $K$ represents the number of critical flows, $F_K$ denotes the set of critical flows, $f_i$
represents the $i$-th critical flow, $w_{f_i}$ represents the weight of the $i$-th critical flow, and
$\varepsilon$ is the weight factor of critical flows used in Equation (5).
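Both metrics are simple ratios, as the following sketch with made-up values shows. The function names are hypothetical.

```python
def load_balancing_ratio(u_opt, u_cfrw):
    """Equation (23): P = U_opt / U_CFRW."""
    return u_opt / u_cfrw


def rerouting_disruption(critical_weights, epsilon):
    """Equation (24): epsilon times the average weight of the selected critical flows."""
    return epsilon * sum(critical_weights) / len(critical_weights)


# Hypothetical values: the optimum reaches 0.5, the evaluated selection reaches 0.625.
p_ratio = load_balancing_ratio(u_opt=0.5, u_cfrw=0.625)
rd = rerouting_disruption(critical_weights=[0.1, 0.3], epsilon=1.0)
print(p_ratio, rd)
```

A selection is preferable when it keeps P close to 1 while keeping RD small.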
3. Comparison Algorithms
To validate the effectiveness of the CFRW-RL algorithm, we employ three different
methods for critical flow selection. Two rule-based heuristic algorithms are used: Top-
K algorithm and Top-K critical algorithm. Additionally, a reinforcement learning-based
algorithm is employed.
(1) Top-K Algorithm
Selects the K flows with the highest traffic from the traffic matrix. The basic idea is
that flows with higher traffic have a more significant impact on network performance.
(2) Top-K Critical Algorithm
Selects the K flows with the highest traffic from the most congested links. The moti-
vation is that flows passing through the most congested links have a larger impact on the
network.
(3) CFR-RL Algorithm
CFR-RL algorithm is a reinforcement learning-based TE solution. It dynamically
adjusts routes based on network state and traffic demand by autonomously learning policies
for selecting critical flows. It leverages the flexibility of software-defined networking (SDN).
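The two rule-based baselines can be sketched in a few lines. The data layout (a dictionary of demands per flow and a dictionary of per-link flow loads) and the parameter `n_links`, which sets how many of the most congested links are inspected, are assumptions for illustration.

```python
def top_k(demands, k):
    """Top-K: the k flows with the highest traffic demand."""
    return sorted(demands, key=demands.get, reverse=True)[:k]


def top_k_critical(link_flows, utilization, k, n_links=1):
    """Top-K critical: the k largest flows among those crossing the most congested links.

    link_flows[link][flow] is the traffic that flow places on link.
    """
    worst_links = sorted(utilization, key=utilization.get, reverse=True)[:n_links]
    candidates = {}
    for link in worst_links:
        for flow, load in link_flows[link].items():
            candidates[flow] = max(candidates.get(flow, 0.0), load)
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Made-up demands and link data for a three-node example.
demands = {("A", "B"): 9.0, ("A", "C"): 4.0, ("B", "C"): 7.0, ("C", "A"): 1.0}
utilization = {"l1": 0.4, "l2": 0.9}
link_flows = {
    "l1": {("A", "B"): 9.0},
    "l2": {("A", "C"): 4.0, ("B", "C"): 7.0, ("C", "A"): 1.0},
}
print(top_k(demands, 2))
print(top_k_critical(link_flows, utilization, 2))
```

In this example, the largest flow overall does not cross the congested link, so the two heuristics select different flows. Neither heuristic considers flow weights.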
4.2. Experiments and Results Analysis

4.2.1. Generation of Data-Flow Weight Values
To ensure the reasonability of data-flow weight values, we assume a similar dis-
tribution for the network service traffic demanded by users and the sizes of data flow
traffic. Drawing inspiration from the distribution patterns of data center network service
sizes studied in previous work [20], this study randomly selects some traffic matrices
from the Abilene dataset. Common probability distributions are used to fit the traffic
distribution patterns of each traffic matrix, and the goodness of fit for each probability dis-
tribution is calculated. The distribution with the highest fit is selected to generate data-flow
weight values.
1. Data Fitting
The specific steps for data fitting in this study are as follows. In the Abilene dataset,
1000 traffic matrices are randomly selected. Each traffic matrix is treated as an array,
and the fitter toolkit is used to fit the distribution patterns of data items within each
array. Fifteen common distribution patterns are used for data fitting, including norm, t,
laplace, erlang, chi2, expon, exponpow, gamma, lognorm, uniform, pareto, weibull_min,
weibull_max, exponweib, and dweibull. The distribution with the highest goodness of fit is
selected, and the proportions of occurrence for these distributions are calculated. To ensure
accuracy, we repeat the data-fitting process 10 times, and the average of the results is taken.
Table 2 illustrates the fitted distribution patterns for data-flow weight values. The top four
distribution patterns with the highest fit probability are exponweib, weibull_min, gamma,
and lognorm.
Table 2. Fitted distribution of data stream weights.
Probability Distribution    exponweib    weibull_min    gamma     lognorm    Other
Percentage                  43.95%       20.19%         17.7%     12.81%     5.35%
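The selection procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it uses only the Python standard library instead of the fitter toolkit, it fits four distributions with closed-form estimates rather than the fifteen listed, and it assumes that goodness of fit is measured by the sum of squared errors between a density histogram and the fitted PDF. The function names and the bin count are hypothetical.

```python
import math
import statistics
from collections import Counter


def fit_candidates(data):
    """Return {name: pdf} built from closed-form parameter estimates.

    Assumes non-degenerate data (more than one distinct value).
    """
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    med = statistics.median(data)
    b = statistics.fmean(abs(x - med) for x in data)  # Laplace scale
    lo, hi = min(data), max(data)
    scale = mu - lo  # exponential scale with location fixed at the minimum
    return {
        "norm": lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2)
        / (sigma * math.sqrt(2 * math.pi)),
        "laplace": lambda x: math.exp(-abs(x - med) / b) / (2 * b),
        "expon": lambda x: math.exp(-(x - lo) / scale) / scale if x >= lo else 0.0,
        "uniform": lambda x: 1.0 / (hi - lo) if lo <= x <= hi else 0.0,
    }


def histogram_density(data, bins=50):
    """Return bin centers and the normalized histogram (area equals 1)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def best_fit(data, bins=50):
    """Pick the candidate with the smallest sum of squared errors."""
    centers, density = histogram_density(data, bins)
    errors = {
        name: sum((pdf(x) - d) ** 2 for x, d in zip(centers, density))
        for name, pdf in fit_candidates(data).items()
    }
    return min(errors, key=errors.get), errors


def tally_best_fits(arrays):
    """Share of arrays for which each distribution fits best."""
    counts = Counter(best_fit(a)[0] for a in arrays)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

Applying `tally_best_fits` to the flattened traffic matrices yields per-distribution shares of the same kind as those reported in Table 2.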
2. Generating Weights Based on Probability Distribution and Normalization

Drawing upon the findings presented in Table 2, this study opts for the “exponweib”
probability distribution to assign weights to the traffic matrices in both the training and
testing datasets. Subsequently, these weights are normalized to facilitate the training and
evaluation of the CFRW-RL algorithm.
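A minimal sketch of this weight-generation step follows. It rests on several assumptions that the text does not confirm: the exponentiated Weibull distribution is sampled by inverting its CDF F(x) = (1 - exp(-x^c))^a, the shape parameters `a` and `c` are placeholder values rather than fitted ones, and normalization is taken to be min-max scaling over the off-diagonal entries. The helper names are hypothetical.

```python
import math
import random


def sample_exponweib(a, c, rng):
    """Draw one exponentiated-Weibull sample by inverse-CDF sampling."""
    # Clamp just below 1 so the logarithm stays defined after rounding.
    p = min(rng.random() ** (1.0 / a), math.nextafter(1.0, 0.0))
    return (-math.log1p(-p)) ** (1.0 / c)


def weight_matrix(n, a=2.0, c=1.5, seed=0):
    """Return an n x n matrix of flow weights scaled to [0, 1].

    The diagonal is zero because a node sends no flow to itself.
    The values of a and c are illustrative placeholders.
    """
    rng = random.Random(seed)
    raw = [
        [0.0 if i == j else sample_exponweib(a, c, rng) for j in range(n)]
        for i in range(n)
    ]
    off_diag = [raw[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off_diag), max(off_diag)
    return [
        [0.0 if i == j else (raw[i][j] - lo) / (hi - lo) for j in range(n)]
        for i in range(n)
    ]
```

One such matrix would be drawn for each traffic matrix in the training and testing sets. Fixing the seed keeps the assignment reproducible.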
4.2.2. Impact of Key Flow Setting Ratio on CFRW-RL Algorithm

This section explores how the performance and effectiveness of the CFRW-RL algorithm
vary with the critical flow selection ratio. To this end, we conduct a series of
simulation experiments on the Abilene dataset. We divide the critical flow selection ratio
from 0 to 30% uniformly into 16 groups, where 0% represents the default ECMP algorithm.
We randomly select 2500 traffic matrices as experimental data, with 2000 used for training
the CFRW-RL model and 500 for testing the CFRW-RL algorithm. The network load-balancing
performance ratio under different critical flow ratios is calculated using Equation (12),
and histograms are generated, as shown in Figure 4.
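The sweep described above can be enumerated as follows. This is a sketch under assumptions not stated in this section: the Abilene topology is taken to have 12 nodes, giving N × (N − 1) = 132 source-destination flows, and the number of critical flows for a given ratio is obtained by rounding to the nearest integer. The function name is hypothetical.

```python
def critical_flow_counts(num_nodes=12, max_pct=30, groups=16):
    """Return (ratio in percent, number of critical flows) for each group.

    A ratio of 0% selects no critical flow, which corresponds to plain ECMP.
    """
    total_flows = num_nodes * (num_nodes - 1)
    sweep = []
    for i in range(groups):
        pct = i * max_pct / (groups - 1)  # evenly spaced ratios
        sweep.append((pct, round(total_flows * pct / 100)))
    return sweep
```

With these defaults the ratios advance in steps of 2%, and a 10% ratio corresponds to 13 of the 132 flows.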
As depicted in Figure 4, the left y-axis corresponds to the load-balancing ratio, while
the right y-axis indicates the average weight of the selected critical flows. Figure 4
shows that at critical flow ratios of 10, 12, 14, 16, 18, and 20%, the load-balancing
ratios consistently surpass 97%. The primary objective of the CFRW-RL algorithm is to
reduce the average weight of the chosen critical flows, thereby mitigating business
interference. Among these six critical flow ratios, the minimum average weight of the
selected critical flows is achieved at a ratio of 10%. Consequently, a critical flow ratio
of 10% is used in the subsequent experiments to optimize the trade-off between the
load-balancing ratio and the rate of business interference.
Figure 4. Influence of different critical flow ratios on link load ratios.
4.2.3. Effect of Weight Proportion ε on CFRW-RL Algorithm
To analyze how the weight proportion affects the CFRW-RL algorithm, we fixed the
proportion of critical flows at 10% and varied the weight ratio ε from 0 to 20 in steps
of 0.5. For each value of ε, we recorded the load ratio and the average reward value, as
illustrated in Figure 5. A weight ratio of 0 means that the traffic weight matrix is not
applied, in which case the algorithm is equivalent to CFR-RL.
The load-balancing ratio and the average weight of the selected critical flows are
delineated in Figure 5, with the former indicated by the left vertical axis and the latter by
the right vertical axis. Notably, the load-balancing ratio achieves a figure above 99% at
weight coefficients of 3.0, 4.5, 16.5, and 17.0. The CFRW-RL algorithm’s primary goal is to
optimize the balance between maintaining a high load-balancing ratio and minimizing the
average weight of critical flows. Among the coefficients tested, the lowest average weight
is observed at a coefficient of 4.5, signifying an optimal configuration for balancing load
and reducing flow weights. In light of these findings, a weight coefficient of 4.5 is selected
for implementation in subsequent experimental trials.
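The selection rule described above can be expressed as a short sketch. It assumes that a coefficient qualifies when its load-balancing ratio exceeds a threshold (99% here, following the text) and that, among qualifying coefficients, the one with the lowest average weight of critical flows is chosen. The function names and the shape of the results dictionary are hypothetical, and no measured values are included.

```python
def epsilon_grid(start=0.0, stop=20.0, step=0.5):
    """Weight coefficients from start to stop inclusive."""
    steps = int(round((stop - start) / step))
    return [start + i * step for i in range(steps + 1)]


def select_coefficient(results, min_lb_ratio=0.99):
    """Pick the coefficient with the lowest average weight.

    results maps each coefficient to a pair
    (load_balancing_ratio, average_weight_of_critical_flows).
    Only coefficients whose ratio exceeds min_lb_ratio are considered.
    """
    eligible = {
        eps: avg_weight
        for eps, (lb_ratio, avg_weight) in results.items()
        if lb_ratio > min_lb_ratio
    }
    if not eligible:
        raise ValueError("no coefficient meets the load-balancing threshold")
    return min(eligible, key=eligible.get)
```

The grid contains 41 coefficients. Given the measured pairs for each of them, `select_coefficient` returns the single coefficient to carry forward, which the experiments above report as 4.5.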
Figure 5. The influence of traffic weight proportion on the CFRW-RL algorithm.
4.2.4. Validation of the Effectiveness of a Reinforcement Learning Network Structure

The convolutional layers in the policy network are utilized to extract features related
to the weights of data flows. To investigate the performance of the CFRW-RL algorithm
under different policy network architectures, we vary the number of convolutional kernels,
setting them to 64, 128, and 256, respectively. Three sets of experiments are conducted
using the Abilene dataset, comparing the three network structures based on indicators
such as the load-balancing ratio, algorithm optimization time, and business interference
rate. The comparison of business interference rates for different network structures with
CFRW-RL is shown in Figure 6.
From the results presented in Figure 6, it is evident that an increase in the number of
nodes leads to a corresponding increase in the optimization time required by the algorithm.
This is attributable to the fact that a higher node count demands greater learning time and
computational resources. Moreover, when the number of nodes in the convolutional layer
is set to 128, the algorithm achieves the optimal balance, with the highest load-balancing
ratio and the lowest business disturbance rate. Consequently, this study has opted for a
configuration of 128 nodes.
It is important to note that having too few nodes can result in insufficient learning
capacity for the policy network, while an excessive number of nodes can lead to an increased
business disturbance rate. This is primarily due to the risk of overfitting in the policy
network, which can compromise the robustness of the algorithm. Additionally, as the size of
the policy network grows, so does the time required to train the model. Therefore, selecting
an appropriate network size is crucial for effectively balancing algorithm performance and
training duration.
Figure 6. Performance comparison of the CFRW-RL algorithm for different numbers of
convolutional layer nodes. (a) Comparison of load-balancing ratios for different numbers
of nodes in convolutional layers; (b) comparison of algorithm optimization time for
different numbers of nodes in convolutional layers; (c) comparison of business disruption
rates for different numbers of nodes in convolutional layers.
4.2.5. Performance Comparison between CFRW-RL Algorithm and Heuristic Algorithms
The comparison between the CFRW-RL algorithm and the heuristic algorithms Crit_TopK
and TopK in terms of the optimization time, load-balancing ratio, and delay is illustrated in
Figure 7a–c.
As depicted in Figure 7, the CFRW-RL algorithm shows a modest increase in optimization
time and delay when compared with the heuristic algorithms TopK and Crit_TopK. Despite
this, the CFRW-RL algorithm significantly outperforms these two heuristic algorithms in
terms of load-balancing ratio. This distinction is particularly crucial in scenarios that
demand high real-time responsiveness, with video applications being a prime example.
The slight increase in optimization time and delay for the CFRW-RL algorithm can
be attributed to its more sophisticated approach to learning and adapting to network
conditions. The algorithm's reinforcement learning framework involves a continuous
process of evaluating and adjusting its strategy to achieve better long-term performance,
which inherently requires more computational effort and time compared to heuristic
algorithms that follow simpler, rule-based procedures.
However, the superior load-balancing ratio achieved by the CFRW-RL algorithm
justifies the additional computational overhead. A higher load-balancing ratio ensures
that network resources are utilized more efficiently, leading to a more stable and reliable
network performance, which is essential for maintaining the quality of service (QoS) for
real-time applications, such as video streaming or video conferencing.
Figure 7. Performance comparison between the CFRW-RL algorithm and heuristic algorithms.
(a) Comparison of time consumption for optimization between CFRW-RL algorithm and
heuristic algorithm; (b) comparison of load-balancing ratios between CFRW-RL algorithm and
heuristic algorithm; (c) comparison of optimization latency between CFRW-RL algorithm and
heuristic algorithm.
4.2.6. Performance Comparison between the CFRW-RL Algorithm and CFR-RL Algorithm
The comparison results of CFRW-RL and CFR-RL algorithms in terms of optimization
time consumption, load-balancing ratio, and delay are shown in Figure 8a–c, respectively.
The business interference rate is calculated in two ways: the first one is based on the
calculation of the business interference rate in the CFR-RL algorithm, which computes
the ratio of the traffic of critical flows to the total traffic; the second one is based on the
calculation of the business interference proportion in the CFRW-RL algorithm, which
calculates the average weight of the selected critical flows. The comparisons of the two
business interference rates of the algorithm are shown in Figure 8d,e.
As depicted in Table 3, the comparison of CFRW-RL and CFR-RL algorithms across
various key performance indicators such as time consumption, load balancing, delay,
average traffic of critical flows, and average weight of critical flows is presented. This
table provides a detailed comparative analysis, allowing for a better understanding of the
strengths and suitable applications of each algorithm.
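The two business interference measures described in this subsection can be written down directly. The sketch below is an illustration only: flows are assumed to be keyed by (source, destination) pairs, and the function names are hypothetical rather than taken from either algorithm's implementation.

```python
def traffic_interference(traffic, critical_flows):
    """Share of total traffic carried by the selected critical flows.

    This is the measure attributed to CFR-RL in the text.
    traffic maps (src, dst) to the traffic demand of that flow.
    """
    total = sum(traffic.values())
    if total == 0:
        return 0.0
    return sum(traffic[flow] for flow in critical_flows) / total


def weight_interference(weights, critical_flows):
    """Average weight of the selected critical flows.

    This is the measure attributed to CFRW-RL in the text.
    weights maps (src, dst) to the business weight of that flow.
    """
    if not critical_flows:
        return 0.0
    return sum(weights[flow] for flow in critical_flows) / len(critical_flows)
```

The first measure depends only on how much traffic is rerouted, while the second depends on which flows are rerouted. This is why two selections with nearly identical traffic shares can differ widely in average weight.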
The results presented in Figure 8, along with the data from Table 3, indicate that the
time consumption of the CFRW-RL algorithm is marginally higher than that of the
CFR-RL algorithm, by 0.6 ms (56.04 ms versus 55.44 ms). This slight difference suggests
that accounting for flow weights adds little computational overhead. Both algorithms
exhibit a high load-balancing ratio, exceeding 95%, which is indicative of their
effectiveness in distributing network traffic evenly and preventing potential bottlenecks.
Figure 8. Performance comparison between CFRW-RL and CFR-RL. (a) Comparison of time
consumption for optimization between CFRW-RL and CFR-RL; (b) comparison of load-balancing
ratios between CFRW-RL and CFR-RL; (c) comparison of optimization latency between the
CFRW-RL algorithm and the CFR-RL algorithm; (d) comparison of business interference rates
(average weight of critical flows) between CFRW-RL and CFR-RL; (e) comparison of business
interference rates (average traffic of critical flows) between CFRW-RL and CFR-RL.
Table 3. Performance comparison between CFRW-RL and CFR-RL algorithms.
Algorithm    Time Consumption (ms)    Load-Balancing Ratio    Delay (ms)    The Average Traffic    The Average Weight
CFRW-RL      56.04                    95.7%                   0.8075        0.1972                 0.0550
CFR-RL       55.44                    95.6%                   0.8068        0.1965                 0.1673
One of the most notable differences between the two algorithms is observed in their
handling of critical flows. The CFRW-RL algorithm selects critical flows with an average
traffic rate that is almost identical to that of the CFR-RL algorithm. However, the CFRW-RL
algorithm achieves this with a significantly lower average weight of critical flows, which is
more than 67% less than that of the CFR-RL algorithm. This lower average weight implies
that the CFRW-RL algorithm introduces less interference with business operations, which
is a crucial factor in maintaining the smooth functioning of business applications.
The CFRW-RL algorithm accomplishes this through its reward computation mechanism,
which deducts the average weight of critical flows from the rewards calculated by
the CFR-RL algorithm. By doing so, the CFRW-RL algorithm incentivizes the selection
of critical flows with lower average weights, as lower average weights result in higher
rewards. This approach not only reduces business interference but also enhances the overall
efficiency of network traffic management.
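The reward mechanism and the 67% figure can be checked with a short sketch. It assumes that the deducted term is the average weight of the selected critical flows scaled by the weight coefficient ε; this section states only that the average weight is deducted, so the scaling and the function names are assumptions. The two average weights passed to `relative_reduction` in the usage note are the values from Table 3.

```python
def cfrw_reward(cfr_reward, critical_weights, epsilon=1.0):
    """CFR-RL reward minus the scaled average weight of critical flows.

    epsilon = 0 leaves the CFR-RL reward unchanged (assumed scaling).
    """
    average_weight = sum(critical_weights) / len(critical_weights)
    return cfr_reward - epsilon * average_weight


def relative_reduction(baseline, value):
    """Fraction by which value is lower than baseline."""
    return (baseline - value) / baseline
```

Calling `relative_reduction(0.1673, 0.0550)` gives roughly 0.671, which is consistent with the reported reduction of more than 67%. For a fixed CFR-RL reward, a selection with a lower average weight always receives the higher reward whenever ε is positive.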
The significant reduction in business interference, as highlighted by the lower average
weight of critical flows, underscores the effectiveness of the CFRW-RL algorithm in miti-
gating disruptions to business operations. This improvement is particularly important in
scenarios where network performance directly impacts business outcomes, as it ensures
that critical business applications run smoothly without unnecessary delays or interference.
In conclusion, the CFRW-RL algorithm, as demonstrated by the data in Table 3 and
the visualization in Figure 8, offers a refined approach to network traffic management. It
keeps time consumption and load balancing on par with CFR-RL while minimizing
business interference, making it a potentially superior choice for environments
that require high performance and minimal disruption to business operations.
5. Conclusions
In the domain of backbone link traffic scheduling within enterprise networks, the aim
to minimize the disruption to business activities during the process of traffic scheduling has
led to the introduction of data-flow weights in this study. We propose an enhanced learning
algorithm that leverages these weights, known as CFRW-RL. This algorithm is an extension
of the CFR-RL algorithm and integrates a traffic-weight matrix. This integration enables
the algorithm to automatically identify and prioritize critical flows that are associated
with lower business interference for rerouting. This approach facilitates more intelligent
and precise traffic scheduling, ensuring that the network’s performance is optimized with
minimal impact on business processes.
The simulation experiments conducted to evaluate the performance of CFRW-RL
have demonstrated its superiority over the CFR-RL algorithm. The findings reveal that
the CFRW-RL algorithm is capable of achieving near-optimal performance by rerouting
only a small fraction of the total traffic. This strategic rerouting significantly reduces the
interference typically caused by traffic scheduling adjustments on business operations. By
focusing on the average traffic of critical flows and their associated weights, CFRW-RL
effectively balances network efficiency with the operational needs of the enterprise, leading
to a more streamlined and business-friendly network environment.
In future work, we aim to focus on the following areas for research and improvement:
1. Adaptive Weight Adjustment Mechanism: We will research and develop a mechanism
that can dynamically adjust the weights of data flows based on real-time network
conditions and business requirements. This will help achieve more refined and
effective traffic management in changing network environments.
2. Algorithmic Scalability: We will investigate the scalability of the algorithm for large
enterprise networks, particularly in the face of complex network topologies and high
volumes of traffic, to ensure the effectiveness and efficiency of the algorithm.
3. Real-World Deployment and Testing: We plan to deploy and test the CFRW-RL
algorithm in real enterprise network environments to verify its practical applicability
and to further optimize the algorithm’s performance based on actual data.
Author Contributions: Conceptualization, H.C. and L.Z.; methodology, H.C. and Y.L.; software,
H.C. and Y.L.; validation, Z.L.; formal analysis, H.C.; data curation, Y.L.; writing—original draft
preparation, H.C.; writing—review and editing, L.Z.; visualization, Z.L.; supervision, L.Z.; project
administration, H.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Guangdong Province’s Special Fund for Science and Tech-
nology Innovation Strategy, grant number pdjh2023b0986; Guangdong Vocational and Technical
Education Association’s Fourth Council Research Planning Project, grant number 202212G266; Zhuhai
City Polytechnic scientific research projects, grant number KY2022Z01Z; and Zhuhai Education Re-
search Planning Project, grant number 2023ZHGHKT261.
Data Availability Statement: Data are available in a publicly accessible repository that does not
issue DOIs. Publicly available datasets were analyzed in this study. This data can be found here:
[https://www.cs.utexas.edu/~yzhang/research/AbileneTM (accessed on 8 April 2024)].
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Garani, S.S.; Zhang, T.; Motwani, R.H.; Pozidis, H.; Vasic, B. Guest Editorial Channel Modeling, Coding and Signal Processing for
Novel Physical Memory Devices and Systems. IEEE J. Sel. Areas Commun. 2016, 34, 2289–2293. [CrossRef]
2. Chang, H.; Kodialam, M.; Lakshman, T.V.; Mukherjee, S.; Van der Merwe, J.K.; Zaheer, Z. MAGNet: Machine Learning Guided
Application-Aware Networking for Data Centers. IEEE Trans. Cloud Comput. 2023, 11, 291–307. [CrossRef]
3. Garcia-Dorado, J.L.; Rao, S.G. Cost-aware Multi Data-Center Bulk Transfers in the Cloud from a Customer-Side Perspective. IEEE
Trans. Cloud Comput. 2019, 7, 34–47. [CrossRef]
4. Zhu, J.; Jiang, X.; Yu, Y.; Jin, G.; Chen, H.; Li, X.; Qu, L. An efficient priority-driven congestion control algorithm for data center
networks. China Commun. 2020, 17, 37–50. [CrossRef]
5. Islam, S.U.; Javaid, N.; Pierson, J.M. A novel utilisation-aware energy consumption model for content distribution networks. Int.
J. Web Grid Serv. 2017, 13, 290. [CrossRef]
6. Guo, Y.; Ma, Y.; Luo, H.; Wu, J. Traffic Engineering in a Shared Inter-DC WAN via Deep Reinforcement Learning. IEEE Trans.
Netw. Sci. Eng. 2022, 9, 2870–2881. [CrossRef]
7. Xu, Z.; Tang, J.; Meng, J.; Zhang, W.; Wang, Y.; Liu, C.H.; Yang, D. Experience-driven Networking: A Deep Reinforcement
Learning based Approach. In Proceedings of the IEEE INFOCOM 2018—IEEE Conference on Computer Communications,
Honolulu, HI, USA, 16–19 April 2018; pp. 1871–1879.
8. Guo, Y.; Wang, W.; Zhang, H.; Guo, W.; Wang, Z.; Tian, Y.; Yin, X.; Wu, J. Traffic Engineering in Hybrid Software Defined Network
via Reinforcement Learning. J. Netw. Comput. Appl. 2021, 189, 103116. [CrossRef]
9. Sun, P.; Hu, Y.; Lan, J.; Tian, L.; Chen, M. TIDE: Time-relevant deep reinforcement learning for routing optimization. Future Gener.
Comput. Syst. 2019, 99, 401–409. [CrossRef]
10. Sun, P.; Guo, Z.; Lan, J.; Li, J.; Hu, Y.; Baker, T. ScaleDRL: A Scalable Deep Reinforcement Learning Approach for Traffic
Engineering in SDN with Pinning Control. Comput. Netw. 2021, 190, 107891. [CrossRef]
11. Rischke, J.; Sossalla, P.; Salah, H.; Fitzek, F.H.P.; Reisslein, M. QR-SDN: Towards Reinforcement Learning States, Actions, and
Rewards for Direct Flow Routing in Software-Defined Networks. IEEE Access 2020, 8, 174773–174791. [CrossRef]
12. Wu, T.; Zhou, P.; Wang, B.; Li, A.; Tang, X.; Xu, Z.; Chen, K.; Ding, X. Joint traffic control and multichannel reassignment for core
backbone network in SDN-IoT: A multi-agent deep reinforcement learning approach. IEEE Trans. Netw. Sci. Eng. 2020, 8, 231–245.
[CrossRef]
13. Zhang, H.; Zhang, J.; Bai, W.; Chen, K.; Chowdhury, M. Resilient datacenter load balancing in the wild. In Proceedings of the
Conference of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 21–25 August 2017; pp. 253–266.
14. Zhang, J.; Ye, M.; Guo, Z.; Yen, C.-Y.; Chao, H.J. CFR-RL: Traffic Engineering with Reinforcement Learning in SDN. IEEE J. Sel.
Areas Commun. 2020, 38, 2249–2259. [CrossRef]
15. Wu, X.; Huang, C.; Tang, M.; Sang, Y.; Zhou, W.; Wang, T.; He, Y.; Cai, D.; Wang, H.; Zhang, M. NetO: Alibaba’s WAN
Orchestrator[EB/OL]. 2017. Available online: http://conferences.sigcomm.org/sigcomm/2017/files/program-industrial-
demos/sigcomm17industrialdemos-paper1.pdf (accessed on 26 October 2017).
16. Wang, R.; Zhang, Y.; Wang, W.; Xu, K.; Cui, L. Algorithm of Mixed Traffic Scheduling Among Data Centers Based on Prediction. J.
Comput. Res. Dev. 2021, 58, 1307–1317. [CrossRef]
17. Kandula, S.; Menache, I.; Schwartz, R.; Babbula, S.R. Calendaring for wide area networks. In Proceedings of the 2014 ACM
conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 515–526.
18. Mao, H.; Alizadeh, M.; Menache, I.; Kandula, S. Resource management with deep reinforcement learning. In Proceedings of the
15th ACM Workshop on Hot Topics in Networks, Atlanta, GA, USA, 9–10 November 2016; pp. 50–56.
19. Zhang, Y. Abilene Network Traffic Data[EB/OL]. 2023. Available online: https://www.cs.utexas.edu/~yzhang/research/
AbileneTM (accessed on 26 October 2017).
20. Benson, T.; Akella, A.; Maltz, D.A. Network traffic characteristics of data centers in the wild. In Proceedings of the 10th ACM
SIGCOMM Conference on Internet measurement, Melbourne, Australia, 1–3 November 2010; pp. 267–280.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Electronics 2024, 13, 1441. https://doi.org/10.3390/electronics13081441 https://www.mdpi.com/journal/electronics
