Networks
Abstract—To accommodate the rapid advancements in cloud computing, big data, and related technologies, the integration of Software-Defined Networking (SDN) with data center networks has been proposed to enhance flexibility and simplify network management. Leveraging this integration, researchers have explored various routing strategies. However, traditional strategies that rely heavily on manual design face challenges in achieving optimal solutions in dynamic network environments. To address these limitations, artificial intelligence (AI)-driven strategies are gaining traction.

This paper introduces a novel routing strategy based on Deep Q-Learning (DQL) to autonomously determine optimal routing paths in SDN-enabled data center networks. Recognizing the distinct requirements of mice flows and elephant flows in such networks, separate deep Q-networks are trained for each flow type. The objective is to minimize latency and packet loss for mice flows while maximizing throughput and minimizing packet loss for elephant flows. Additionally, considering the traffic distribution and the constrained resources of data center networks and SDN, this work employs port rate and flow table utilization to represent network states.

Simulation results demonstrate that, compared to traditional Equal-Cost Multipath (ECMP) routing and the Selective Randomized Load Balancing (SRL) FlowFit approach, the proposed strategy significantly reduces the average latency and packet loss rate for mice flows while enhancing the average throughput for elephant flows.

Index Terms—Software-Defined Networking, data center networks, Deep Q-Learning, routing, elephant flows, mice flows

I. INTRODUCTION

The rapid advancements in cloud computing, big data, and related technologies have led to the continuous expansion of data center networks [1]. Traditional network architectures face significant challenges in meeting the demands of modern data center networks due to limitations in management and deployment flexibility. The advent of Software-Defined Networking (SDN) offers a promising solution by introducing a novel architecture that decouples the control plane from the data plane in forwarding devices. This separation enables centralized control, allowing data center networks to make intelligent and dynamic decisions.

Routing remains a critical area of research within data center networks and has been extensively studied. Leveraging the global network view provided by SDN, routing strategies can be deployed more conveniently and flexibly. To enhance routing efficiency, network flows are generally categorized into two types: elephant flows and mice flows. Elephant flows are characterized by large volumes of data and long durations, whereas mice flows involve smaller data volumes and shorter durations. Given these traffic characteristics, several studies [2]–[11] have explored SDN-based routing approaches in data center networks. However, these strategies often rely on manual design, making it challenging to achieve optimal solutions in dynamic network environments.

The emergence of Artificial Intelligence (AI) introduces new possibilities for addressing routing challenges. Reinforcement Learning (RL), exemplified by Q-learning (QL) [12], is a key branch of AI that identifies optimal strategies through iterative trial-and-error interactions with the environment. However, the dynamic nature of network states can result in an excessively large Q-table, rendering traditional Q-learning impractical. Deep Q-Learning (DQL) [13], which employs neural networks to approximate Q-tables, effectively addresses this issue. NetworkAI [14] proposes an intelligent architecture combining SDN with deep reinforcement learning (DRL), providing a foundation for self-learning control strategies in SDN networks. While it offers an example of using DQL for Quality of Service (QoS) routing, the detailed design of routing schemes remains unexplored.

In this paper, we present a DQL-based routing strategy for SDN-enabled data center networks. Two separate deep Q-networks (DQNs) are designed to make intelligent routing decisions: one focuses on elephant flows to optimize throughput and reduce packet loss, while the other targets mice flows to minimize latency and packet loss. This approach enables the development of near-optimal routing strategies tailored to the specific characteristics of each flow type.

Contributions: The key contributions of this work are summarized as follows:

1) Intelligent Network Architecture: We develop an intelligent architecture for data center network routing. Based on the traffic characteristics of data center networks, this architecture dynamically generates optimal routing strategies for elephant flows and mice flows.
2) DQL Algorithm Design: We provide a detailed design of the DQL algorithm, including the state space, action space, and reward function. To better capture the network state, we incorporate port rate and flow table occupancy rate in switches, reflecting traffic distribution and resource utilization, such as link bandwidth and flow table resources (a sketch of this state encoding follows the list).
3) Performance Validation: The proposed routing algorithm's effectiveness is validated through simulations. Results demonstrate improvements in packet loss rate and delay for mice flows, as well as enhanced throughput and reduced packet loss rate for elephant flows.
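As a concrete illustration of the state representation named in contribution 2, the following minimal Python sketch assembles a per-switch state vector from port rates and flow table occupancy. The function name, the uniform port capacity, and the normalization scheme are our own illustrative assumptions, not the paper's exact implementation.

import numpy as np

# Illustrative sketch: per-port utilization and flow table occupancy of one
# switch, both normalized to [0, 1]; names and values are assumptions.
def build_state(port_rates_bps, port_capacity_bps, flow_entries, table_capacity):
    port_util = np.asarray(port_rates_bps, dtype=float) / port_capacity_bps
    table_util = flow_entries / table_capacity  # flow table occupancy rate
    return np.append(port_util, table_util)

# Example: a 4-port, 1 Gb/s switch holding 512 of 1024 flow rules.
print(build_state([2e8, 5e8, 1e8, 0.0], 1e9, 512, 1024))
# -> [0.2 0.5 0.1 0.  0.5]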
II. RELATED WORKS
Routing has long been a critical focus of research in data
center networks. With the emergence of Software-Defined
Networking (SDN) in recent years, numerous SDN-based
routing strategies have been developed, enabling fine-grained
flow control. Among these, load-balancing routing strategies
are widely studied to enhance network sustainability. These
strategies aim to maintain high transmission quality for flows
while preserving resources for future traffic.
Several works [2]–[4] have focused on load balancing
specifically for elephant flows. For example, [2] selects routes
capable of accommodating flows, while [3] prioritizes the
least congested paths. Meanwhile, [4] introduces dynamic
path splitting, distributing elephant flows across multiple paths
based on computed ratios. Further, studies such as [5]–[7] pro-
pose rerouting mechanisms to continuously balance network
loads. These approaches periodically assess the network’s load Fig. 1. Enter Caption
balance using specific parameters, such as load balance degree.
If a threshold is exceeded, flow scheduling or splitting is
triggered, effectively reducing packet loss rates and improving simple example of using DQL for QoS routing. While the
throughput for elephant flows. architecture combining DRL and SDN is emphasized, the
While these strategies primarily address link load balancing, another perspective, flow table load balancing, has recently gained attention. This method considers the limited flow table capacities in SDNs. For example, research [8] proposes flow table load balancing for mice flows, which constitute the majority of data center traffic, to prevent packet loss caused by flow table overflow.
Routing schemes for mice flows also consider their low-latency requirements. Literature [9] selects paths with the lowest delay, while [10] dedicates specific low-latency paths for these flows. In addition, [11] minimizes latency by reducing the number of flow rules required for mice flow transmission. However, these solutions are largely based on manual designs, which lack intelligence and adaptability. Consequently, similar traffic patterns often result in the same routing decisions, even when such strategies lead to suboptimal network performance. Moreover, these approaches are unable to learn from past experiences [15].
To address these limitations, artificial intelligence (AI) offers promising solutions. Studies such as [15], [16] leverage deep learning to mitigate congestion in various network scenarios. For instance, convolutional neural networks (CNNs) are trained to predict congestion states for specific path combinations based on input traffic patterns. Similarly, [12] employs Q-learning (QL), a reinforcement learning algorithm, to realize QoS-aware adaptive routing, introducing a QoS-sensitive reward function to guide the learning process. However, the dynamic and fine-grained control requirements of modern networks make QL-based approaches challenging due to the significant storage space required for maintaining Q-tables. To overcome this limitation, Deep Q-Learning (DQL), which integrates deep learning with QL, has been proposed. For example, [14] applies deep reinforcement learning (DRL) to address large-scale network control problems, presenting a simple example of using DQL for QoS routing. While the architecture combining DRL and SDN is emphasized, the specific design of routing schemes is not detailed.

Recent DRL-based mechanisms, such as DROM [17] and TIDE [18], have been proposed for routing optimization. However, these methods rely on modifying link weights, with optimal paths determined indirectly through shortest path algorithms, limiting their direct routing optimization capabilities.

III. SYSTEM ARCHITECTURE

To integrate SDN-based data center networks with Deep Q-Learning (DQL), an AI agent is introduced within an AI plane, extending the traditional SDN architecture to enable intelligent routing decisions. The proposed system architecture, illustrated in Figure 1, consists of three distinct planes: the data plane, the control plane, and the AI plane. Each plane's specific functions are detailed below.

Fig. 1. The proposed system architecture: data plane, control plane, and AI plane.

A. DATA PLANE

The data plane primarily consists of switches responsible for packet forwarding, all of which are compatible with the OpenFlow protocol. In this study, the Fat-Tree topology [20], a widely adopted structure for data center networks, is used as the data plane. The Fat-Tree topology, depicted in Figure 2, offers multiple paths between source and destination nodes, ensuring high bandwidth and robust fault tolerance for data center operations. A construction sketch of this topology is given below.

Fig. 2. The Fat-Tree topology used as the data plane.
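To make the multipath property concrete, the following sketch builds the switch-level k-ary Fat-Tree with networkx and counts the equal-length paths between two edge switches. The node naming and the use of networkx are our illustrative choices; the wiring rules follow the standard Fat-Tree construction of [20] (k must be even).

import itertools
import networkx as nx

def fat_tree(k):
    g = nx.Graph()
    core = [f"core_{i}" for i in range((k // 2) ** 2)]
    for p in range(k):
        aggs = [f"agg_{p}_{a}" for a in range(k // 2)]
        edges = [f"edge_{p}_{e}" for e in range(k // 2)]
        # Every edge switch in a pod links to every aggregation switch in it.
        g.add_edges_from(itertools.product(edges, aggs))
        # Aggregation switch a links to the a-th group of k/2 core switches.
        for a, agg in enumerate(aggs):
            for c in core[a * (k // 2):(a + 1) * (k // 2)]:
                g.add_edge(agg, c)
    return g

g = fat_tree(4)
print(g.number_of_nodes())  # 20 switches: 4 core + 8 aggregation + 8 edge
# The multipath property: several equal-length inter-pod paths exist.
print(len(list(nx.all_shortest_paths(g, "edge_0_0", "edge_1_0"))))  # 4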
B. CONTROL PLANE

The control plane communicates with the data plane via the southbound interface protocol (OpenFlow). Using the Link Layer Discovery Protocol (LLDP), the controller retrieves the network topology and periodically sends state query messages to each switch to collect their status information, including flow table and port states.

When a new flow enters the network, the controller calculates the flow rate and determines its type based on flow statistics. If the flow rate exceeds a predefined threshold (commonly set at 5), the flow is classified as an elephant flow; otherwise, it is treated as a mice flow.

Additionally, the control plane gathers network performance metrics such as packet loss rate, delay, and throughput. This information is used to formulate routing strategies. The controller then translates these strategies into flow rules and deploys them on the relevant switches. A sketch of the rate-based flow classification follows.
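The sketch below mirrors the controller's rate-based classification. The byte-counter polling details are assumptions, and since the source truncates the threshold value ("commonly set at 5"), the 5 Mb/s figure here is purely a placeholder.

ELEPHANT_THRESHOLD_BPS = 5e6  # placeholder; unit of "5" is truncated in source

def flow_rate_bps(bytes_now, bytes_prev, interval_s):
    """Estimate a flow's rate from two successive OpenFlow byte-counter samples."""
    return (bytes_now - bytes_prev) * 8 / interval_s

def classify_flow(bytes_now, bytes_prev, interval_s,
                  threshold_bps=ELEPHANT_THRESHOLD_BPS):
    """Label the flow 'elephant' or 'mice' by comparing its rate to the threshold."""
    rate = flow_rate_bps(bytes_now, bytes_prev, interval_s)
    return "elephant" if rate > threshold_bps else "mice"

# Example: 10 MB observed over a 2 s polling interval -> 40 Mb/s -> elephant.
print(classify_flow(10_000_000, 0, 2.0))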
IV. PROBLEM FORMULATION

We model the data center network as a directed graph $G = (V, E)$, where $V$ represents the set of all switch nodes and $E$ denotes the set of links between switches. The flow table capacity of each switch is $R_m$, and the capacity of each link is $C_m$. During the period from $t_1$ to $t_n$, we assume that the sets of all mice-flows and all elephant-flows are $F_{mice} = \{f_w \mid w \in [1, p]\}$ and $F_{elephant} = \{f_v \mid v \in [1, q]\}$, where $p$ and $q$ are the numbers of mice-flows and elephant-flows, respectively. Furthermore, we denote the set of flows existing in the network at time $t_i$ as $F_{t_i}$. A minimal encoding of this model is sketched below.
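The formulation can be encoded directly, for example as follows. The attribute names and numeric values are hypothetical; they simply mirror $G = (V, E)$, the table capacity $R_m$, and the link capacity $C_m$.

import networkx as nx

g = nx.DiGraph()
g.add_node("s1", table_capacity=1024)  # R_m: flow table size of switch s1
g.add_node("s2", table_capacity=1024)
g.add_edge("s1", "s2", capacity=1e9)   # C_m: link capacity in bit/s
g.add_edge("s2", "s1", capacity=1e9)   # directed graph: both directions modeled

# Flows present at time t_i (F_ti), keyed by flow id, with type and rate.
flows_ti = {"f_1": {"type": "mice", "rate_bps": 2e5},
            "f_2": {"type": "elephant", "rate_bps": 8e8}}
print(g.edges(data=True))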
For elephant-flows, the goal is to minimize packet loss rate and maximize throughput, and for mice-flows, the goal is to minimize packet loss rate and latency. Therefore, the reward functions are set up as follows. For elephant-flows,

$R_{elephant} = \alpha(1 - PLR) + \beta \cdot TP$   (12)

where $PLR$ represents the average packet loss rate of elephant-flows in the network, and $TP$ is the average throughput of elephant-flows after processing (average throughput divided by the maximum receiving rate at the receiving end). This normalization brings the two indicators into the same order of magnitude (0-1) to facilitate comprehensive evaluation. $\alpha$ and $\beta$ are the weights of the two indicators, respectively, indicating their importance, and they satisfy $\alpha + \beta = 1$. For mice-flows,

$R_{mice} = \alpha(1 - PLR_2) + \beta(1 - DL)$   (13)

where $PLR_2$ represents the average packet loss rate of mice-flows, and $DL$ is the normalized average delay of mice-flows. Both of these indicators are between 0 and 1. $\alpha$ and $\beta$ are the weights of the two indicators, respectively, and $\alpha + \beta = 1$. A sketch of these reward computations follows.
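Rewards (12) and (13) translate directly into code. The weight values $\alpha = \beta = 0.5$ below are illustrative assumptions; the surviving text only requires $\alpha + \beta = 1$.

def reward_elephant(plr, tp_norm, alpha=0.5, beta=0.5):
    """Eq. (12): plr is the average packet loss rate in [0, 1]; tp_norm is the
    average throughput divided by the maximum receiving rate."""
    return alpha * (1 - plr) + beta * tp_norm

def reward_mice(plr2, dl_norm, alpha=0.5, beta=0.5):
    """Eq. (13): plr2 is the mice-flow loss rate; dl_norm is the normalized
    average delay, both in [0, 1]."""
    return alpha * (1 - plr2) + beta * (1 - dl_norm)

print(reward_elephant(plr=0.02, tp_norm=0.8))  # 0.89
print(reward_mice(plr2=0.01, dl_norm=0.3))     # 0.845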
V. ALGORITHM DESIGN

RL is a tool for solving MDP problems. QL is a classical value-based RL algorithm; it sacrifices some of its current earnings for its long-term earnings. Q stands for $Q(s, a)$, the expected benefit of taking action $a$ ($a \in A$) in a certain state $s$ ($s \in S$). The main idea of the algorithm is to build a Q-table to store the Q values and then select the action that can obtain a large profit according to the Q value. However, in our scenario the state space is too large to build a Q-table in finite memory. To address this problem, DQL is adopted here.

A. DQL ALGORITHM

In this section, we introduce DQN in detail and show our improvement to DQN for the routing problem. DQL is an algorithm that combines deep neural networks and QL. A deep neural network has good generalization ability and can approximate almost any nonlinear function. Therefore, on the basis of the QL algorithm, a deep neural network is used to establish the mapping between state and action, which accelerates the solution of the problem and overcomes the dimension disaster caused by the large scale of the system state. QL updates the value function as follows:

$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$   (14)

where $\alpha \in (0, 1]$ is the learning rate and $\gamma$ is the discount factor; $Q(s, a)$ and $Q(s', a')$ are the Q values of the current moment and the next moment, respectively. A tabular sketch of this update rule follows.
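The tabular form of update (14) can be sketched as below; the table sizes, learning rate, and discount factor are illustrative assumptions.

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

def ql_update(s, a, r, s_next):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

ql_update(s=0, a=2, r=1.0, s_next=5)
print(Q[0, 2])  # 0.1 after one update starting from an all-zero table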
Instead of searching for Q values in a Q-table, DQL uses a deep neural network, such as a CNN, to estimate $Q(s, a)$, i.e., $Q(s, a; \theta) \approx Q(s, a)$, where $\theta$ represents the set of weights and biases that are the parameters of the neural network. The network is trained by minimizing the loss, and the loss function can be expressed as follows:

$L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^2 \right]$   (15)

where $r + \gamma \max_{a'} Q(s', a'; \theta^{-})$ is the target Q value calculated by QL, while $Q(s, a; \theta)$ is the estimated Q value. Our goal is to make the estimated Q approach the target Q. To obtain the two types of Q values, we adopt two independent neural networks with the same structure: the evaluated Q-network and the target Q-network. The former generates the estimated Q according to the current state and changes its parameters in each episode to decrease the loss; the latter outputs the Q corresponding to the next state, preparing for the calculation of the target Q, and it copies its parameters from the evaluated Q-network every fixed number of steps. A sketch of one training step with these two networks follows.
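The following PyTorch sketch performs one gradient step on loss (15) using an evaluated network and a frozen target network. The layer sizes, optimizer, and hyperparameters are assumptions; the surviving text does not specify the paper's architecture.

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 10, 4, 0.9

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

eval_net, target_net = make_net(), make_net()
target_net.load_state_dict(eval_net.state_dict())  # start from identical weights
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)

def train_step(s, a, r, s_next):
    """s: (batch, STATE_DIM) floats; a: (batch,) int64 actions; r: (batch,) rewards."""
    q_est = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; theta)
    with torch.no_grad():  # target network parameters stay fixed this step
        q_target = r + GAMMA * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_est, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every fixed number of steps, copy parameters into the target network:
# target_net.load_state_dict(eval_net.state_dict())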
To provide training samples, DQN has a replay memory which stores historical experiences. Experiences are selected randomly from the replay memory to train the neural network. In this way, the problem of time-correlation among samples is solved and the stability of training is improved. We summarize the workflow of DQL in Figure 4.

Fig. 3. Enter Caption

Fig. 4. Basic SDN components.

In particular, in the routing scenario, the information about the flow arriving at the next moment is unknown, including the flow type as well as the source and destination IP addresses of the flow, and the available paths for the flow are uncertain. We let $Q_1$ and $Q_2$ be the action value functions for mice-flows and elephant-flows, respectively. We set the alternative … A sketch of the replay memory and the per-flow-type network selection follows.
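Finally, a sketch of the replay memory and of dispatching each new flow to the DQN matching its type ($Q_1$ for mice-flows, $Q_2$ for elephant-flows). Whether the two networks share one replay memory is not stated in the source; the shared buffer, its size, and the greedy path choice are assumptions.

import random
from collections import deque

memory = deque(maxlen=10_000)  # stores (state, action, reward, next_state)

def remember(transition):
    memory.append(transition)

def sample_batch(batch_size=32):
    """Uniform random sampling breaks the time-correlation between samples."""
    return random.sample(memory, batch_size)

def choose_path(flow_type, state, q1_net, q2_net, paths):
    """Pick the path with the largest Q value, using the flow-type's network."""
    net = q1_net if flow_type == "mice" else q2_net
    q_values = net(state)  # one Q value per alternative path (action)
    return paths[int(q_values.argmax())]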