A Reinforcement Learning-Based Traffic Engineering
Article
A Reinforcement Learning-Based Traffic Engineering Algorithm
for Enterprise Network Backbone Links
Haixiu Cheng 1,2 , Yingxin Luo 1,2 , Ling Zhang 1,2, * and Zhiwen Liao 3
1 Guangdong Province Key Laboratory of Computer Network, South China University of Technology,
Guangzhou 510641, China; [email protected] (H.C.); [email protected] (Y.L.)
2 School of Computer Science & Engineering, South China University of Technology, Guangzhou 510641, China
3 School of Artificial Intelligence, Zhuhai City Polytechnic, Zhuhai 519090, China
* Correspondence: [email protected]
Abstract: Large enterprise networks typically rely on expensive, high-speed backbone links to connect
multiple campuses across diverse regions. As the volume of traffic traversing these backbone links
increases, traffic engineering techniques are employed to filter or redirect traffic flows. Nevertheless,
simple rerouting strategies can introduce business disruptions such as packet reordering, which
significantly impact the user experience. To address this issue, we introduce an enhanced traffic
scheduling algorithm named Critical Flow Rerouting with Weight-Reinforcement Learning (CFRW-RL),
which builds upon the critical flow rerouting-reinforcement learning (CFR-RL) algorithm.
CFRW-RL incorporates the principles of reinforcement learning, accounting for both the weights
and classifications of data flows. This approach enables the algorithm to prioritize flows with lower
weights for rerouting. The simulation results demonstrate that CFRW-RL significantly minimizes
the rerouting of high-priority business flows and reduces business interference compared with the
CFR-RL algorithm and that it maintains a similar computational complexity.
Keywords: reinforcement learning; enterprise network backbone links; traffic classification; user
experience
cate, traditional TE methods face challenges in meeting the demands of modern networks.
To address this evolving landscape, researchers have started to utilize RL technologies
to adaptively adjust network configurations. Guo et al. [6] introduced an innovative TE
method based on DRL, focusing on optimizing traffic allocation in shared data center wide
area networks. Their approach minimizes link bandwidth utilization and transmission
delay to achieve load balancing while maintaining low latency for high-priority traffic. Xu
et al. [7] proposed a novel network control strategy using an experience-driven deep rein-
forcement learning approach, which enhances decision-making accuracy by supervising the
learning of network dynamics through Deep Neural Networks (DNNs). Furthermore, Guo
et al. [8] implemented RL in hybrid software-defined network environments, dynamically
adjusting routing strategies based on learned traffic patterns to improve the quality of
service for critical traffic and optimize network resource utilization.
Sun et al. [9] and Sun et al. [10] introduced deep reinforcement learning methods that
consider time relevance and scalability, respectively, providing new insights and solutions
for TE by addressing the real-time and scalable nature of networks. Rischke et al. [11]
proposed QR-SDN, a reinforcement learning approach for direct flow routing in software-defined
networks (SDN), defining states, actions, and rewards to optimize routing decisions.
Wu et al. [12] utilized a multi-agent deep reinforcement learning strategy for simultaneous
traffic control and multichannel reassignment in the core backbone network of SDN-
IoT, effectively improving packet throughput at the link layer and reducing packet loss
and delay.
The study also highlights the importance of investigating potential configuration
overhead and business disturbances (such as packet disorder and service interruptions)
during the traffic scheduling process in TE. Zhang et al. [13] initially suggested a cautious
routing scheme to address packet disorder issues in TE by evaluating the benefits of path
switching for routing decisions. Expanding on this concept, Zhang et al. [14] introduced the
CFR-RL algorithm, a TE method based on RL, which optimizes load balancing by routing
critical data flows while minimizing network interference and business disruptions.
The inter-domain networks of distributed data centers manage traffic for both data
centers and user Internet applications [8]. These two types of traffic have varying network
parameters like latency, jitter, and bandwidth, which require different handling and prioriti-
zation. Alibaba’s traffic scheduling system NetO [15], as well as studies by W. Ran et al. [16]
and Kandula S et al. [17], recognize the presence of these distinct levels of traffic priority.
This underscores the importance of implementing differentiated handling strategies for
traffic with different priorities in inter-domain networks of distributed data centers to
ensure optimal performance.
Previous studies [13,14] have not adequately accounted for traffic prioritization, lead-
ing to a uniform treatment of all data streams, which overlooks the varying impacts that
different levels of data streams can have on the network. We therefore introduce a data-stream
weight to signify the importance of each flow. By prioritizing the rerouting of critical
flows with lower weights, we can minimize disruptions to business operations and maintain
network performance. This approach
ensures the stability and continuity of essential business processes.
The remainder of this paper is organized as follows: Section 2 presents the CFRW-RL
algorithm model, Section 3 outlines its implementation, Section 4 discusses the simulation
and result analysis, and, finally, Section 5 offers the conclusions.
where S_t represents the state at time t; TM_t is the traffic matrix at time t, which expresses
the bandwidth requirements of the different data flows, that is, how much bandwidth each data
flow needs for transmission; and TW_t is the data-flow weight matrix at time t, which holds
the weights of the different categories of traffic and measures the negative impact that
rerouting each category of data flow has on the network. TM_t and TW_t have the same structure
and differ only in the meaning of their elements, as indicated in Equation (2).
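$$
T = \begin{bmatrix}
T_{11} & T_{12} & \cdots & T_{1j} & \cdots & T_{1n^2} \\
T_{21} & T_{22} & \cdots & T_{2j} & \cdots & T_{2n^2} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
T_{i1} & T_{i2} & \cdots & T_{ij} & \cdots & T_{in^2} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
T_{m1} & T_{m2} & \cdots & T_{mj} & \cdots & T_{mn^2}
\end{bmatrix} \tag{2}
$$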
In this context, n signifies the total count of nodes, while m corresponds to the number
of time steps involved.
Data-flow weight values represent the proportion of network service traffic demanded
by users within a data flow. A larger weight value indicates a higher demand for net-
work service traffic by users. This definition is chosen to enhance the speed of network
scheduling. The algorithm does not schedule each small network service individually
but rather schedules each large data flow as a whole. Each large data flow comprises
both user-demand network services and data center-demand network services. Data-flow
weight values are used to distinguish their importance.
2. Action Space
The action space can be represented by Equation (3).
For each state St , CFRW-RL selects K critical flows. In a network system with N switch
nodes, considering the source-destination switch perspective, a total of N × ( N − 1) data
flows exist, meaning that any switch acting as the source node can establish data flows
with all other switches acting as destination nodes. Therefore, the action space contains
all possible combinations of data flows, totaling $C_{N \times (N-1)}^{K}$ combinations, where
$C_{N \times (N-1)}^{K}$ denotes the number of combinations of choosing K data flows from
N × (N − 1) flows.
The action space has a large scale, and directly computing all possible combinations would
result in excessively high computational complexity. To reduce time complexity, CFRW-RL
employs the method proposed by Zhang et al. [14] and Mao et al. [18], directly selecting K
data flows from N × ( N − 1) flows and associating each possible action with a specific data
flow. This definition effectively reduces the scale of the action space, making the algorithm
more efficient in addressing data-flow scheduling and rerouting problems.
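To give a sense of scale, the following sketch compares the two action-space sizes discussed above: one action per K-subset of flows versus one action per flow. The function names and the values N = 12 and K = 13 are illustrative assumptions, not figures taken from the experiments.

```python
from math import comb


def num_flows(n_nodes: int) -> int:
    """Number of source-destination flows among n_nodes switches: N * (N - 1)."""
    return n_nodes * (n_nodes - 1)


def combinatorial_actions(n_nodes: int, k: int) -> int:
    """Action-space size if every K-subset of flows were treated as one action."""
    return comb(num_flows(n_nodes), k)


if __name__ == "__main__":
    n, k = 12, 13  # hypothetical topology size and number of critical flows
    print("per-flow action space:", num_flows(n))
    print("K-subset action space:", combinatorial_actions(n, k))
```

With the per-flow formulation, the output layer of the policy network only needs N × (N − 1) entries, from which K distinct flows are drawn.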
During the algorithm’s iteration process, at each time step, K different data flows are
chosen as the actions the agent will take in the next state:
$$a_t^{K} = \left\{ a_t^{1}, a_t^{2}, \ldots, a_t^{K} \right\}. \tag{4}$$
This action-selection method ensures the algorithm’s rationality while reducing com-
putational complexity, enabling CFRW-RL to better adapt to large-scale network systems
and improve the efficiency of data-flow scheduling and rerouting.
where U_t represents the minimized maximum link bandwidth occupancy rate after optimization
of critical flow rerouting, F_K = {f_1, f_2, ..., f_i, ..., f_K} represents the set of K
critical flows, w_{f_i} represents the weight of the i-th critical flow f_i, ε represents the
weight factor of critical flows in the reward function, and K is the number of critical flows.
The term 1/U_t in the reward function captures the effect of reducing the maximum link
bandwidth occupancy rate: the smaller U_t is, the larger 1/U_t becomes and the higher the
reward. The term ε · (Σ_i w_{f_i})/K is the average weight of the selected K critical flows
scaled by ε, where a smaller value leads to a higher reward. This is intended to encourage the
algorithm to choose critical flows with a smaller impact on the network, thereby minimizing
business interference.
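The reward described above can be sketched as follows. This is a minimal illustration that assumes the two terms are combined as 1/U_t minus ε times the average weight, which is how the text characterizes them; the function name `cfrw_reward` and its argument names are hypothetical.

```python
from typing import Sequence


def cfrw_reward(max_util: float, weights: Sequence[float], epsilon: float) -> float:
    """Assumed reward form: 1/U_t - epsilon * (average weight of the K critical flows)."""
    if max_util <= 0:
        raise ValueError("maximum link utilization must be positive")
    if not weights:
        raise ValueError("at least one critical flow is required")
    avg_weight = sum(weights) / len(weights)  # (sum of w_{f_i}) / K
    return 1.0 / max_util - epsilon * avg_weight


if __name__ == "__main__":
    # Same utilization, different flow weights: the lower-weight selection scores higher.
    print(cfrw_reward(0.5, [0.1, 0.1], epsilon=4.5))
    print(cfrw_reward(0.5, [0.9, 0.9], epsilon=4.5))
```

Setting `epsilon` to 0 removes the weight term, leaving only the load-balancing part of the reward.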
Figure 1. Structural diagram of the policy network. (Diagram labels: traffic matrix and traffic
weight matrix as inputs; input layer, convolutional layer, fully connected layer.)
[Figure: value network (Critic). Diagram labels: traffic matrix, traffic weight matrix,
action a, convolution, Dense, feature, concatenate, output q(s, a, w).]
policy through learning so as to achieve the maximum cumulative reward in the long
term. The Critic, on the other hand, is the value network, responsible for evaluating the
effectiveness of the actions taken by the Actor. It learns a value function to estimate the
expected return of taking a specific action in a particular state. This value function can be
either a state-value function or an action-value function, providing feedback to the Actor
about the quality of its actions.
The state-value function of the Actor–Critic is defined as Equation (6):
The equation represents the approximation of the state-value function using neural
networks:
• V (s; θ, w): this is the value function for state s, where θ represents the parameters
of the policy network (Actor) and w represents the parameters of the value function
network (Critic);
• Σa: this denotes the summation over all possible actions a, indicating the summation
over the action space;
• q(s, a; w): this is the State-action Value Function (Q-value Function) generated by the
value function network, representing the expected value of taking action a in state s;
• π(a|s): this is the action probability generated by the policy network, representing
the probability of taking action a in state s.
In essence, this equation approximates the state-value function V (s; θ, w) as a weighted
sum of the state–action-value function q(s, a; w) for all possible actions in state s, weighted
by the corresponding action probabilities π ( a|s) generated by the policy network.
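Read term by term, the description above corresponds to the following expression. It is a reconstruction from the bullet list rather than a quotation of Equation (6), with θ written explicitly as the parameters of the policy network:

```latex
$$V(s; \theta, w) = \sum_{a} \pi(a \mid s; \theta)\, q(s, a; w) \tag{6}$$
```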
$$q_t = q(S_t, a_t, w_t) \tag{7}$$
$$q_{t+1} = q(S_{t+1}, \tilde{a}_{t+1}, w_t) \tag{8}$$
5. Calculate the Temporal Difference (TD) error using Equation (9).
$$d_{w,t} = \left. \frac{\partial q(s_t, a_t; w)}{\partial w} \right|_{w = w_t} \tag{10}$$
Electronics 2024, 13, 1441 7 of 17
7. Update the value network using Equation (11). This involves gradient descent to
make the predicted value closer to the TD target.
$$d_{\theta,t} = \left. \frac{\partial \log \pi(a_t \mid S_t, \theta)}{\partial \theta} \right|_{\theta = \theta_t} \tag{12}$$
9. Update the policy network using Equation (13). This involves gradient ascent to
increase the score of the Actor’s action.
Each iteration of the Actor–Critic method requires the above nine steps, during which
only one action is taken, one reward is observed, and the parameters of the neural network
are updated once.
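One iteration of this kind can be sketched in plain Python for a small tabular problem. The sketch rests on several assumptions that are not confirmed by the text: the TD error is taken as q_t − (r_t + γ·q_{t+1}), the critic is a lookup table (so its gradient is 1 at the visited entry), the actor is a softmax over per-state preferences, and the policy step scales the log-probability gradient by q_t. The names `actor_critic_step`, `alpha`, `beta`, and `gamma` are hypothetical; the paper's networks are convolutional, not tabular.

```python
import math
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Numerically stable softmax over one state's action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta: List[List[float]], w: List[List[float]],
                      s: int, a: int, r: float, s_next: int, a_next: int,
                      alpha: float = 0.1, beta: float = 0.1,
                      gamma: float = 0.9) -> float:
    """Apply one update in place and return the TD error.

    theta[s][a] are actor preferences, w[s][a] are tabular critic values.
    a_next is the action sampled for s_next; it is only evaluated, not executed.
    """
    q_t = w[s][a]                 # critic score of the action actually taken
    q_next = w[s_next][a_next]    # critic score of the hypothetical next action
    td_error = q_t - (r + gamma * q_next)

    # Critic: gradient descent moves q(s, a) toward the TD target.
    # For a table, dq/dw is 1 at [s][a] and 0 elsewhere.
    w[s][a] -= alpha * td_error

    # Actor: gradient ascent on log pi(a|s), scaled by the critic's score.
    # For a softmax policy, d log pi(a|s) / d theta[s][b] = 1{b == a} - pi(b|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += beta * q_t * grad_log_pi
    return td_error
```

Each call consumes exactly one transition (one action, one reward) and updates both parameter sets once, mirroring the per-iteration behavior described above.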
The pseudocode for the critical selection algorithm is shown in Algorithm 1.
In this pseudocode:
• ‘θ_actor’ and ‘θ_critic’ represent the parameters of the Actor network and Critic
network, respectively.
• ‘optimizer_actor’ and ‘optimizer_critic’ are the optimizers used to update the network
parameters.
• During each training iteration, a batch of inputs, actions, and rewards (and possibly
next states) is first collected from the environment.
• ‘tf.GradientTape’ is used to track the gradients of the Critic network, and to calculate
the value loss and advantage values.
• The gradients computed are used to update the parameters of the Critic network.
• ‘tf.GradientTape’ is used again to track the gradients of the Actor network, and to
calculate the policy loss and entropy.
• The gradients computed are used to update the parameters of the Actor network.
This process is repeated over multiple training iterations until the network parameters
converge to a point that optimizes the policy performance.
Symbol           Meaning
C_{i,j}          The physical bandwidth of link <i,j> ∈ E, i ∈ V, j ∈ V.
l_{i,j}          The total load of link <i,j>, <i,j> ∈ E, i ∈ V, j ∈ V.
l'_{i,j}         The initial load of link <i,j> ∈ E, i ∈ V, j ∈ V.
f_K              The set of K critical flows identified by the CFRW-RL algorithm.
f_ECMP           The residual flows excluding f_K.
D^{s,d}          The bandwidth requirement for the data flow from source node s to destination node d, s ∈ V, d ∈ V, s ≠ d.
p^{s,d}_{i,j}    The probability of the data flow from source node s to destination node d traversing link <i,j>, s ∈ V, d ∈ V, s ≠ d, <i,j> ∈ E.
U                The maximum bandwidth utilization across various links in the network.
D(f_ECMP)        The traffic demand employing the ECMP algorithm.
D(f_K)           The traffic demand of critical flow f_K.
d(i)^+           The in-degree of node i.
d(i)^-           The out-degree of node i.
l_{i,j}^{f_K}    The traffic load of critical flow allocated to link <i,j>.
If there are L links, with the load (i.e., traffic) on each link being l_1, l_2, ..., l_L, and
the capacity of each link being C_1, C_2, ..., C_L, then the maximum utilization can be
expressed as
$$U = \max_{i=1}^{L} \left( \frac{l_i}{C_i} \right) \tag{14}$$
Here, li /Ci represents the utilization of the i-th link, which is the ratio of the traffic li
to the link capacity Ci . By taking the maximum of the utilization ratios across all links, the
maximum utilization rate of the entire network is obtained.
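Equation (14) translates directly into code. The sketch below is only an illustration; the function name and the sample loads and capacities are made up for the example.

```python
from typing import Sequence


def max_link_utilization(loads: Sequence[float], capacities: Sequence[float]) -> float:
    """Return U = max_i(l_i / C_i) over all links."""
    if len(loads) != len(capacities) or not loads:
        raise ValueError("loads and capacities must be non-empty and of equal length")
    if any(c <= 0 for c in capacities):
        raise ValueError("link capacities must be positive")
    return max(load / cap for load, cap in zip(loads, capacities))


if __name__ == "__main__":
    # Three hypothetical links; the second one is the bottleneck.
    print(max_link_utilization([2.0, 9.0, 5.0], [10.0, 10.0, 20.0]))
```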
The model’s objective function is outlined in Equation (15), while the constraints are
detailed in Equations (16)–(22).
$$\text{obj} = \text{minimize } U \tag{15}$$
$$d(i)^{+} = \sum_{k:\langle k,i \rangle \in E} p^{s,d}_{k,i} \tag{16}$$
$$d(i)^{-} = \sum_{k:\langle i,k \rangle \in E} p^{s,d}_{i,k} \tag{17}$$
The in-degree d(i)^+ and out-degree d(i)^- can be calculated as shown in Equations
(16) and (17), respectively. The in-degree accounts for the sum of probabilities of traffic
flow incoming to the node from others, whereas the out-degree corresponds to the sum of
probabilities of traffic flow outgoing from the node to other nodes.
The initial load of the link <i,j>, representing the residual traffic assigned by the
ECMP algorithm minus the critical flows' traffic, is derived from Equation (18). The load
of critical flows on the link <i,j>, as allocated by the critical flow routing algorithm, is
calculated using Equation (19), which determines how the critical traffic is distributed
across the network's links. The total load on the link <i,j>, encompassing both the initial
load and the critical flow load, is given by Equation (20).
To ensure that link traffic remains within acceptable limits, Equation (21) requires that
the total load on a link be less than or equal to the product of its physical bandwidth and
the maximum utilization rate. This constraint prevents any link from being loaded beyond
its capacity.
Data flows within the network must obey traffic conservation, as expressed in Equation (22).
At the origin of a path, a node's in-degree is zero and its out-degree is one, so the
in-degree minus the out-degree is −1. Conversely, at the path's terminus, the in-degree is
one and the out-degree is zero, giving a difference of 1. For nodes that serve as
intermediaries along the path, both the in-degree and out-degree are one, with
no net difference between them.
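The conservation rule just described can be checked mechanically for a single flow. The sketch below assumes the routing of one source-destination flow is given as a dictionary mapping each directed link (i, j) to the fraction of the flow it carries; the function names and the example topology are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]


def net_flow(node: str, link_prob: Dict[Link, float]) -> float:
    """In-degree minus out-degree of `node` for one flow."""
    inflow = sum(p for (i, j), p in link_prob.items() if j == node)
    outflow = sum(p for (i, j), p in link_prob.items() if i == node)
    return inflow - outflow


def satisfies_conservation(nodes: Iterable[str], src: str, dst: str,
                           link_prob: Dict[Link, float], tol: float = 1e-9) -> bool:
    """Check: -1 at the source, +1 at the destination, 0 at every other node."""
    for v in nodes:
        expected = -1.0 if v == src else (1.0 if v == dst else 0.0)
        if abs(net_flow(v, link_prob) - expected) > tol:
            return False
    return True


if __name__ == "__main__":
    # A flow from A to D split evenly over the paths A-B-D and A-C-D.
    routing = {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "D"): 0.5, ("C", "D"): 0.5}
    print(satisfies_conservation("ABCD", "A", "D", routing))
```

Because the link values are fractions, the same check covers both single-path routing (all values 0 or 1) and split routing.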
3. Comparison Algorithms
To validate the effectiveness of the CFRW-RL algorithm, we employ three different
methods for critical flow selection. Two rule-based heuristic algorithms are used: Top-
K algorithm and Top-K critical algorithm. Additionally, a reinforcement learning-based
algorithm is employed.
(1) Top-K Algorithm
Selects the K flows with the highest traffic from the traffic matrix. The basic idea is
that flows with higher traffic have a more significant impact on network performance.
(2) Top-K Critical Algorithm
Selects the K flows with the highest traffic from the most congested links. The moti-
vation is that flows passing through the most congested links have a larger impact on the
network.
(3) CFR-RL Algorithm
The CFR-RL algorithm is a reinforcement learning-based TE solution. It dynamically
adjusts routes based on network state and traffic demand by autonomously learning policies
for selecting critical flows. It leverages the flexibility of software-defined networking (SDN).
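The two rule-based baselines above can be sketched as follows. The data layout is an assumption made for illustration: demands are keyed by (source, destination), `flow_links` lists the links each flow traverses, and `utilization` gives the current utilization of each link. For Top-K critical, this sketch walks the links from most to least congested and takes the largest flows on each until K flows are collected, which is one plausible reading of "the most congested links".

```python
from typing import Dict, List, Set, Tuple

Flow = Tuple[str, str]
Link = Tuple[str, str]


def top_k(demands: Dict[Flow, float], k: int) -> List[Flow]:
    """Top-K: the K flows with the highest demand in the traffic matrix."""
    return sorted(demands, key=lambda f: demands[f], reverse=True)[:k]


def top_k_critical(demands: Dict[Flow, float],
                   flow_links: Dict[Flow, Set[Link]],
                   utilization: Dict[Link, float],
                   k: int) -> List[Flow]:
    """Top-K critical: largest flows taken from the most congested links first."""
    selected: List[Flow] = []
    for link in sorted(utilization, key=lambda l: utilization[l], reverse=True):
        on_link = [f for f in demands
                   if link in flow_links.get(f, set()) and f not in selected]
        on_link.sort(key=lambda f: demands[f], reverse=True)
        for f in on_link:
            if len(selected) == k:
                return selected
            selected.append(f)
    return selected
```

The two heuristics can disagree: a moderately sized flow on a hot link may be selected by Top-K critical while Top-K prefers a larger flow on an idle path.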
Figure 4. Influence of different critical flow ratios on link load ratios.
4.2.3. Effect of Weight Proportion ε on CFRW-RL Algorithm
To analyze the effect of the weight proportion ε on the CFRW-RL algorithm, we fixed the
proportion of critical flows at 10% and varied the weight ratio ε from 0 to 20, incrementing
by 0.5 for each step. For each distinct value of ε, we documented the load ratio and the
average reward value, as illustrated in Figure 5. A weight value of 0 means that the traffic
weight matrix is not applied, so the weighting that distinguishes CFRW-RL from CFR-RL is not
in use.
The load-balancing ratio and the average weight of the selected critical flows are
delineated in Figure 5, with the former indicated by the left vertical axis and the latter by
the right vertical axis. Notably, the load-balancing ratio achieves a figure above 99% at
weight coefficients of 3.0, 4.5, 16.5, and 17.0. The CFRW-RL algorithm’s primary goal is to
optimize the balance between maintaining a high load-balancing ratio and minimizing the
average weight of critical flows. Among the coefficients tested, the lowest average weight
is observed at a coefficient of 4.5, signifying an optimal configuration for balancing load
and reducing flow weights. In light of these findings, a weight coefficient of 4.5 is selected
for implementation in subsequent experimental trials.
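The selection rule described above (keep the coefficients whose load-balancing ratio exceeds a threshold, then take the one with the lowest average weight) can be written as a small helper. The tuple layout, the function name, and the numbers in the usage example are placeholders chosen for illustration; they are not measurements from Figure 5.

```python
from typing import List, Tuple

# (epsilon, load_balancing_ratio, average_weight); this layout is an assumption.
Result = Tuple[float, float, float]


def pick_weight_coefficient(results: List[Result], min_ratio: float = 0.99) -> float:
    """Among coefficients above the load-balancing threshold, return the one
    whose selected critical flows have the lowest average weight."""
    candidates = [r for r in results if r[1] > min_ratio]
    if not candidates:
        raise ValueError("no coefficient meets the load-balancing threshold")
    return min(candidates, key=lambda r: r[2])[0]


if __name__ == "__main__":
    # Placeholder sweep results, not data from the paper.
    sweep = [(1.0, 0.995, 0.20), (2.0, 0.992, 0.08), (3.0, 0.970, 0.01)]
    print(pick_weight_coefficient(sweep))
```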
Figure 5. The influence of traffic weight proportion on the CFRW-RL algorithm.
Figure 6. Performance comparison of the CFRW-RL algorithm for different numbers of
convolutional layer nodes. (a) Comparison of load-balancing ratios for different numbers of
nodes in convolutional layers; (b) comparison of algorithm optimization time for different
numbers of nodes in convolutional layers; (c) comparison of business disruption rates for
different numbers of nodes in convolutional layers.
From the results presented in Figure 6, it is evident that an increase in the number of
nodes leads to a corresponding increase in the optimization time required by the algorithm.
This is attributable to the fact that a higher node count demands greater learning time and
computational resources. Moreover, when the number of nodes in the convolutional layer is set
to 128, the algorithm achieves the optimal balance, with the highest load-balancing ratio and
the lowest business disturbance rate. Consequently, this study has opted for a configuration
of 128 nodes.
It is important to note that having too few nodes can result in insufficient learning
capacity for the policy network, while an excessive number of nodes can lead to an increased
business disturbance rate. This is primarily due to the risk of overfitting in the policy
network, which can compromise the robustness of the algorithm. Additionally, as the size of
the policy network grows, so does the time required to train the model. Therefore, selecting
an appropriate network size is crucial for effectively balancing algorithm performance and
training duration.
4.2.5. Performance Comparison between CFRW-RL Algorithm and Heuristic Algorithms
The comparison between the CFRW-RL algorithm and heuristic algorithms Crit_TopK
and TopK in terms of the optimization time, load-balancing ratio, and delay is illustrated in
Figure 7a–c.
As depicted in Figure 7, the CFRW-RL algorithm shows a modest increase in optimization
time and delay when compared with the heuristic algorithms, namely TopK and Crit_TopK.
Despite this, the CFRW-RL algorithm significantly outperforms these two heuristic algorithms
in terms of load-balancing ratio. This distinction is particularly crucial in scenarios that
demand high real-time responsiveness, with video applications being a prime example.
The slight increase in optimization time and delay for the CFRW-RL algorithm can
be attributed to its more sophisticated approach to learning and adapting to network
conditions. The algorithm's reinforcement learning framework involves a continuous
process of evaluating and adjusting its strategy to achieve better long-term performance,
which inherently requires more computational effort and time compared to heuristic
algorithms that follow simpler, rule-based procedures.
However, the superior load-balancing ratio achieved by the CFRW-RL algorithm
justifies the additional computational overhead. A higher load-balancing ratio ensures
that network resources are utilized more efficiently, leading to a more stable and reliable
network performance, which is essential for maintaining the quality of service (QoS) for
real-time applications, such as video streaming or video conferencing.
Figure 7. Performance comparison between the CFRW-RL algorithm and heuristic algorithms.
(a) Comparison of time consumption for optimization between CFRW-RL algorithm and heuristic
algorithm; (b) comparison of load-balancing ratios between CFRW-RL algorithm and heuristic
algorithm; (c) comparison of optimization latency between CFRW-RL algorithm and heuristic
algorithm.
4.2.6. Performance Comparison between the CFRW-RL Algorithm and CFR-RL Algorithm
The comparison results of CFRW-RL and CFR-RL algorithms in terms of optimization
time consumption, load-balancing ratio, and delay are shown in Figure 8a–c, respectively.
The business interference rate is calculated in two ways: the first one is based on the
calculation of the business interference rate in the CFR-RL algorithm, which computes
the ratio of the traffic of critical flows to the total traffic; the second one is based on the
calculation of the business interference proportion in the CFRW-RL algorithm, which
calculates the average weight of the selected critical flows. The comparisons of the two
business interference rates of the algorithm are shown in Figure 8d,e.
As depicted in Table 3, the comparison of CFRW-RL and CFR-RL algorithms across
various key performance indicators such as time consumption, load balancing, delay,
average traffic of critical flows, and average weight of critical flows is presented. This
table provides a detailed comparative analysis, allowing for a better understanding of the
strengths and suitable applications of each algorithm.
The results presented in Figure 8, along with the data from Table 2, indicate that the
CFRW-RL algorithm has a marginal improvement in time consumption compared to the
CFR-RL algorithm, with a reduction of 0.6 s. This slight decrease suggests a potential
for increased efficiency in the CFRW-RL algorithm. Both algorithms exhibit a high
load-balancing ratio, exceeding 95%, which is indicative of their effectiveness in distributing
network traffic evenly and preventing potential bottlenecks.
In conclusion, the CFRW-RL algorithm, as demonstrated by the data in Table 2 and
the visualization in Figure 8, offers a refined approach to network traffic management. It
successfully balances the reduction of time consumption and load balancing with the
minimization of business interference, making it a potentially superior choice for
environments that require high performance and minimal disruption to business operations.
Figure 8. Performance comparison between CFRW-RL and CFR-RL. (a) Comparison of time
consumption for optimization between CFRW-RL and CFR-RL; (b) comparison of load-balancing
ratios between CFRW-RL and CFR-RL; (c) comparison of optimization latency between the
CFRW-RL algorithm and the CFR-RL algorithm; (d) comparison of business interference rate
(average weight of critical flows) between CFRW-RL and CFR-RL; (e) comparison of business
interference rates (average traffic of critical flows) between CFRW-RL and CFR-RL.
Table 3. Performance comparison between CFRW-RL and CFR-RL algorithms.
Algorithm   Time Consumption (ms)   Load-Balancing Ratio   Delay (ms)   Average Traffic   Average Weight
CFRW-RL     56.04                   95.7%                  0.8075       0.1972            0.0550
CFR-RL      55.44                   95.6%                  0.8068       0.1965            0.1673
One of the most notable differences between the two algorithms is observed in their
handling of critical flows. The CFRW-RL algorithm selects critical flows with an average
traffic rate that is almost identical to that of the CFR-RL algorithm. However, the CFRW-RL
algorithm achieves this with a significantly lower average weight of critical flows, which is
more than 67% less than that of the CFR-RL algorithm. This lower average weight implies
that the CFRW-RL algorithm introduces less interference with business operations, which
is a crucial factor in maintaining the smooth functioning of business applications.
The CFRW-RL algorithm accomplishes this through its reward computation mechanism,
which deducts the average weight of critical flows from the rewards calculated by
the CFR-RL algorithm. By doing so, the CFRW-RL algorithm incentivizes the selection
of critical flows with lower average weights, as lower average weights result in higher
rewards. This approach not only reduces business interference but also enhances the overall
efficiency of network traffic management.
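The reward adjustment described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `cfrw_reward`, the handling of an empty selection, and the example numbers are assumptions; the text states only that the average weight of the critical flows is deducted from the reward calculated by CFR-RL.

```python
from typing import Sequence


def cfrw_reward(cfr_reward: float, critical_flow_weights: Sequence[float]) -> float:
    """Deduct the average weight of the selected critical flows from the CFR-RL reward."""
    if not critical_flow_weights:
        # Assumption: with no selected flows there is nothing to deduct.
        return cfr_reward
    avg_weight = sum(critical_flow_weights) / len(critical_flow_weights)
    return cfr_reward - avg_weight


# Two hypothetical selections that earn the same CFR-RL reward (1.0)
# but differ in the business weights of the flows they reroute.
low_weight_reward = cfrw_reward(1.0, [0.25, 0.25])   # average weight 0.25
high_weight_reward = cfrw_reward(1.0, [0.5, 0.75])   # average weight 0.625
print(low_weight_reward, high_weight_reward)
```

Under these assumptions, the selection with the lower average weight receives the higher reward, which is the incentive the paragraph describes.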
The significant reduction in business interference, as highlighted by the lower average
weight of critical flows, underscores the effectiveness of the CFRW-RL algorithm in mitigating
disruptions to business operations. This improvement is particularly important in
scenarios where network performance directly impacts business outcomes, as it ensures
that critical business applications run smoothly without unnecessary delays or interference.
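As a quick check of the size of this reduction, the relative difference can be recomputed from the average-weight and average-traffic values reported in Table 3. The helper name `relative_reduction` is introduced here purely for illustration.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Values taken from Table 3.
avg_weight_cfr_rl = 0.1673
avg_weight_cfrw_rl = 0.0550
avg_traffic_cfr_rl = 0.1965
avg_traffic_cfrw_rl = 0.1972

weight_reduction = relative_reduction(avg_weight_cfr_rl, avg_weight_cfrw_rl)
traffic_gap = abs(avg_traffic_cfrw_rl - avg_traffic_cfr_rl) / avg_traffic_cfr_rl

print(f"average weight reduction: {weight_reduction:.1%}")  # 67.1%
print(f"average traffic gap: {traffic_gap:.2%}")
```

The average weight drops by roughly two thirds while the average traffic of the selected critical flows differs by well under 1%, consistent with the "more than 67%" figure quoted in the text.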
In conclusion, the CFRW-RL algorithm, as demonstrated by the data in Table 3 and
the visualization in Figure 8, offers a refined approach to network traffic management. It
keeps time consumption and load balancing close to those of CFR-RL while minimizing
business interference, making it a potentially superior choice for environments
that require high performance and minimal disruption to business operations.
5. Conclusions
In the domain of backbone link traffic scheduling within enterprise networks, the aim
to minimize the disruption to business activities during the process of traffic scheduling has
led to the introduction of data-flow weights in this study. We propose an enhanced learning
algorithm that leverages these weights, known as CFRW-RL. This algorithm is an extension
of the CFR-RL algorithm and integrates a traffic-weight matrix. This integration enables
the algorithm to automatically identify and prioritize critical flows that are associated
with lower business interference for rerouting. This approach facilitates more intelligent
and precise traffic scheduling, ensuring that the network’s performance is optimized with
minimal impact on business processes.
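To make the role of the traffic-weight matrix concrete, the sketch below computes the two business interference measures used in the evaluation (average traffic and average weight of the selected critical flows) for a toy network. It is only an illustration under stated assumptions: the function `interference_metrics`, the 3-node matrices, and the two candidate selections are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict, List, Tuple

Flow = Tuple[int, int]  # (source node, destination node)


def interference_metrics(
    traffic: List[List[float]],
    weights: List[List[float]],
    critical_flows: List[Flow],
) -> Dict[str, float]:
    """Average traffic and average weight over the selected critical flows."""
    if not critical_flows:
        return {"avg_traffic": 0.0, "avg_weight": 0.0}
    n = len(critical_flows)
    avg_traffic = sum(traffic[s][d] for s, d in critical_flows) / n
    avg_weight = sum(weights[s][d] for s, d in critical_flows) / n
    return {"avg_traffic": avg_traffic, "avg_weight": avg_weight}


# Hypothetical 3-node traffic matrix and traffic-weight matrix.
traffic = [[0.0, 4.0, 2.0], [4.0, 0.0, 2.0], [1.0, 3.0, 0.0]]
weights = [[0.0, 0.75, 0.25], [0.25, 0.0, 0.75], [0.5, 0.5, 0.0]]

selection_a = interference_metrics(traffic, weights, [(0, 1), (1, 2)])
selection_b = interference_metrics(traffic, weights, [(1, 0), (0, 2)])
print(selection_a)
print(selection_b)
```

In this toy case both selections reroute the same average traffic, but the second touches only low-weight flows, which is the kind of selection a weight-aware algorithm would be expected to prefer.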
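The simulation experiments conducted to evaluate CFRW-RL show that, compared with the
CFR-RL algorithm, it selects critical flows with an average weight more than 67% lower while
keeping time consumption, load-balancing ratio, and delay nearly unchanged. The findings reveal that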
the CFRW-RL algorithm is capable of achieving near-optimal performance by rerouting
only a small fraction of the total traffic. This strategic rerouting significantly reduces the
interference typically caused by traffic scheduling adjustments on business operations. By
focusing on the average traffic of critical flows and their associated weights, CFRW-RL
effectively balances network efficiency with the operational needs of the enterprise, leading
to a more streamlined and business-friendly network environment.
In future work, we aim to focus on the following areas for research and improvement:
1. Adaptive Weight Adjustment Mechanism: We will research and develop a mechanism
that can dynamically adjust the weights of data flows based on real-time network
conditions and business requirements. This will help achieve more refined and
effective traffic management in changing network environments.
2. Algorithmic Scalability: We will investigate the scalability of the algorithm for large
enterprise networks, particularly in the face of complex network topologies and high
volumes of traffic, to ensure the effectiveness and efficiency of the algorithm.
3. Real-World Deployment and Testing: We plan to deploy and test the CFRW-RL
algorithm in real enterprise network environments to verify its practical applicability
and to further optimize the algorithm’s performance based on actual data.
Author Contributions: Conceptualization, H.C. and L.Z.; methodology, H.C. and Y.L.; software,
H.C. and Y.L.; validation, Z.L.; formal analysis, H.C.; data curation, Y.L.; writing—original draft
preparation, H.C.; writing—review and editing, L.Z.; visualization, Z.L.; supervision, L.Z.; project
administration, H.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Guangdong Province’s Special Fund for Science and Technology
Innovation Strategy, grant number pdjh2023b0986; Guangdong Vocational and Technical
Education Association’s Fourth Council Research Planning Project, grant number 202212G266; Zhuhai
City Polytechnic scientific research projects, grant number KY2022Z01Z; and Zhuhai Education Research
Planning Project, grant number 2023ZHGHKT261.
Data Availability Statement: Data are available in a publicly accessible repository that does not
issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here:
https://www.cs.utexas.edu/~yzhang/research/AbileneTM (accessed on 8 April 2024).
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Garani, S.S.; Zhang, T.; Motwani, R.H.; Pozidis, H.; Vasic, B. Guest Editorial Channel Modeling, Coding and Signal Processing for
Novel Physical Memory Devices and Systems. IEEE J. Sel. Areas Commun. 2016, 34, 2289–2293. [CrossRef]
2. Chang, H.; Kodialam, M.; Lakshman, T.V.; Mukherjee, S.; Van der Merwe, J.K.; Zaheer, Z. MAGNet: Machine Learning Guided
Application-Aware Networking for Data Centers. IEEE Trans. Cloud Comput. 2023, 11, 291–307. [CrossRef]
3. Garcia-Dorado, J.L.; Rao, S.G. Cost-aware Multi Data-Center Bulk Transfers in the Cloud from a Customer-Side Perspective. IEEE
Trans. Cloud Comput. 2019, 7, 34–47. [CrossRef]
4. Zhu, J.; Jiang, X.; Yu, Y.; Jin, G.; Chen, H.; Li, X.; Qu, L. An efficient priority-driven congestion control algorithm for data center
networks. China Commun. 2020, 17, 37–50. [CrossRef]
5. Islam, S.U.; Javaid, N.; Pierson, J.M. A novel utilisation-aware energy consumption model for content distribution networks. Int.
J. Web Grid Serv. 2017, 13, 290. [CrossRef]
6. Guo, Y.; Ma, Y.; Luo, H.; Wu, J. Traffic Engineering in a Shared Inter-DC WAN via Deep Reinforcement Learning. IEEE Trans.
Netw. Sci. Eng. 2022, 9, 2870–2881. [CrossRef]
7. Xu, Z.; Tang, J.; Meng, J.; Zhang, W.; Wang, Y.; Liu, C.H.; Yang, D. Experience-driven Networking: A Deep Reinforcement
Learning based Approach. In Proceedings of the IEEE INFOCOM 2018—IEEE Conference on Computer Communications,
Honolulu, HI, USA, 16–19 April 2018; pp. 1871–1879.
8. Guo, Y.; Wang, W.; Zhang, H.; Guo, W.; Wang, Z.; Tian, Y.; Yin, X.; Wu, J. Traffic Engineering in Hybrid Software Defined Network
via Reinforcement Learning. J. Netw. Comput. Appl. 2021, 189, 103116. [CrossRef]
9. Sun, P.; Hu, Y.; Lan, J.; Tian, L.; Chen, M. TIDE: Time-relevant deep reinforcement learning for routing optimization. Future Gener.
Comput. Syst. 2019, 99, 401–409. [CrossRef]
10. Sun, P.; Guo, Z.; Lan, J.; Li, J.; Hu, Y.; Baker, T. ScaleDRL: A Scalable Deep Reinforcement Learning Approach for Traffic
Engineering in SDN with Pinning Control. Comput. Netw. 2021, 190, 107891. [CrossRef]
11. Rischke, J.; Sossalla, P.; Salah, H.; Fitzek, F.H.P.; Reisslein, M. QR-SDN: Towards Reinforcement Learning States, Actions, and
Rewards for Direct Flow Routing in Software-Defined Networks. IEEE Access 2020, 8, 174773–174791. [CrossRef]
12. Wu, T.; Zhou, P.; Wang, B.; Li, A.; Tang, X.; Xu, Z.; Chen, K.; Ding, X. Joint traffic control and multichannel reassignment for core
backbone network in SDN-IoT: A multi-agent deep reinforcement learning approach. IEEE Trans. Netw. Sci. Eng. 2020, 8, 231–245.
[CrossRef]
13. Zhang, H.; Zhang, J.; Bai, W.; Chen, K.; Chowdhury, M. Resilient datacenter load balancing in the wild. In Proceedings of the
Conference of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 21–25 August 2017; pp. 253–266.
14. Zhang, J.; Ye, M.; Guo, Z.; Yen, C.-Y.; Chao, H.J. CFR-RL: Traffic Engineering with Reinforcement Learning in SDN. IEEE J. Sel.
Areas Commun. 2020, 38, 2249–2259. [CrossRef]
15. Wu, X.; Huang, C.; Tang, M.; Sang, Y.; Zhou, W.; Wang, T.; He, Y.; Cai, D.; Wang, H.; Zhang, M. NetO: Alibaba’s WAN
Orchestra-tor[EB/OL]. 2017. Available online: http://conferences.sigcomm.org/sigcomm/2017/files/program-industrial-
demos/sigcomm17industrialdemos-paper1.pdf (accessed on 26 October 2017).
16. Wang, R.; Zhang, Y.; Wang, W.; Xu, K.; Cui, L. Algorithm of Mixed Traffic Scheduling Among Data Centers Based on Prediction. J.
Comput. Res. Dev. 2021, 58, 1307–1317. [CrossRef]
17. Kandula, S.; Menache, I.; Schwartz, R.; Babbula, S.R. Calendaring for wide area networks. In Proceedings of the 2014 ACM
conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 515–526.
18. Mao, H.; Alizadeh, M.; Menache, I.; Kandula, S. Resource management with deep reinforcement learning. In Proceedings of the
15th ACM Workshop on Hot Topics in Networks, Atlanta, GA, USA, 9–10 November 2016; pp. 50–56.
19. Zhang, Y. Abilene Network Traffic Data[EB/OL]. 2023. Available online: https://www.cs.utexas.edu/~yzhang/research/
AbileneTM (accessed on 26 October 2017).
20. Benson, T.; Akella, A.; Maltz, D.A. Network traffic characteristics of data centers in the wild. In Proceedings of the 10th ACM
SIGCOMM Conference on Internet measurement, Melbourne, Australia, 1–3 November 2010; pp. 267–280.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.