IEEE Conference Template 1

This paper discusses the integration of Software-Defined Networking (SDN) with data center networks to improve routing strategies, particularly through the use of Deep Q-Learning (DQL) to optimize paths for different flow types. The proposed DQL-based routing strategy aims to minimize latency and packet loss for mice flows while maximizing throughput for elephant flows. Simulation results indicate that this approach outperforms traditional routing methods, significantly reducing average latency and packet loss rates for mice flows and enhancing throughput for elephant flows.

Uploaded by

faithjewellery1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views5 pages

IEEE Conference Template 1

Uploaded by

faithjewellery1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Routing Schemes in SDN-Based Data Center

Networks

Abstract—To accommodate the rapid advancements in cloud data center networks. However, these strategies often rely
computing, big data, and related technologies, the integration of on manual design, making it challenging to achieve optimal
Software-Defined Networking (SDN) with data center networks solutions in dynamic network environments.
has been proposed to enhance flexibility and simplify network
management. Leveraging this integration, researchers have ex- The emergence of Artificial Intelligence (AI) introduces new
plored various routing strategies. However, traditional strategies possibilities for addressing routing challenges. Reinforcement
that rely heavily on manual design face challenges in achieving Learning (RL), exemplified by Q-learning (QL) [12], is a
optimal solutions in dynamic network environments. To address
these limitations, artificial intelligence (AI)-driven strategies are key branch of AI that identifies optimal strategies through
gaining traction. iterative trial-and-error interactions with the environment.
This paper introduces a novel routing strategy based on However, the dynamic nature of network states can result in
Deep Q-Learning (DQL) to autonomously determine optimal an excessively large Q-table, rendering traditional Q-learning
routing paths in SDN-enabled data center networks. Recognizing impractical. Deep Q-Learning (DQL) [13], which employs
the distinct requirements of mice flows and elephant flows in
such networks, separate deep Q-networks are trained for each neural networks to approximate Q-tables, effectively addresses
flow type. The objective is to minimize latency and packet loss this issue. NetworkAI [14] proposes an intelligent architecture
for mice flows while maximizing throughput and minimizing combining SDN with deep reinforcement learning (DRL),
packet loss for elephant flows. Additionally, considering the providing a foundation for self-learning control strategies in
traffic distribution and the constrained resources of data center SDN networks. While it offers an example of using DQL
networks and SDN, this work employs port rate and flow table
utilization to represent network states. for Quality of Service (QoS) routing, the detailed design of
Simulation results demonstrate that, compared to traditional routing schemes remains unexplored.
Equal-Cost Multipath (ECMP) routing and the Selective Ran- In this paper, we present a DQL-based routing strategy
domized Load Balancing (SRL) FlowFit approach, the proposed
strategy significantly reduces the average latency and packet loss
for SDN-enabled data center networks. Two separate deep Q-
rate for mice flows while enhancing the average throughput for networks (DQNs) are designed to make intelligent routing de-
elephant flows. cisions: one focuses on elephant flows to optimize throughput
Index Terms—component, formatting, style, styling, insert and reduce packet loss, while the other targets mice flows to
minimize latency and packet loss. This approach enables the
I. I NTRODUCTION development of near-optimal routing strategies tailored to the
1) INTRODUCTION: The rapid advancements in cloud specific characteristics of each flow type.
computing, big data, and related technologies have led to the 2) Contributions: The key contributions of this work are
continuous expansion of data center networks [1]. Traditional summarized as follows:
network architectures face significant challenges in meeting
the demands of modern data center networks due to limita- 1) Intelligent Network Architecture: We develop an intel-
tions in management and deployment flexibility. The advent ligent architecture for data center network routing. Based
of Software-Defined Networking (SDN) offers a promising on the traffic characteristics of data center networks,
solution by introducing a novel architecture that decouples the this architecture dynamically generates optimal routing
control plane from the data plane in forwarding devices. This strategies for elephant flows and mice flows.
separation enables centralized control, allowing data center 2) DQL Algorithm Design: We provide a detailed design
networks to make intelligent and dynamic decisions. of the DQL algorithm, including the state space, ac-
Routing remains a critical area of research within data tion space, and reward function. To better capture the
center networks and has been extensively studied. Leveraging network state, we incorporate port rate and flow table
the global network view provided by SDN, routing strategies occupancy rate in switches, reflecting traffic distribution
can be deployed more conveniently and flexibly. To enhance and resource utilization, such as link bandwidth and flow
routing efficiency, network flows are generally categorized into table resources.
two types: elephant flows and mice flows. Elephant flows are 3) Performance Validation: The proposed routing algo-
characterized by large volumes of data and long durations, rithm’s effectiveness is validated through simulations.
whereas mice flows involve smaller data volumes and shorter Results demonstrate improvements in packet loss rate
durations. Given these traffic characteristics, several studies and delay for mice flows, as well as enhanced throughput
[2]–[11] have explored SDN-based routing approaches in and reduced packet loss rate for elephant flows.
II. RELATED WORKS
Routing has long been a critical focus of research in data
center networks. With the emergence of Software-Defined
Networking (SDN) in recent years, numerous SDN-based
routing strategies have been developed, enabling fine-grained
flow control. Among these, load-balancing routing strategies
are widely studied to enhance network sustainability. These
strategies aim to maintain high transmission quality for flows
while preserving resources for future traffic.
Several works [2]–[4] have focused on load balancing
specifically for elephant flows. For example, [2] selects routes
capable of accommodating flows, while [3] prioritizes the
least congested paths. Meanwhile, [4] introduces dynamic
path splitting, distributing elephant flows across multiple paths
based on computed ratios. Further, studies such as [5]–[7] pro-
pose rerouting mechanisms to continuously balance network
loads. These approaches periodically assess the network’s load Fig. 1. Enter Caption
balance using specific parameters, such as load balance degree.
If a threshold is exceeded, flow scheduling or splitting is
triggered, effectively reducing packet loss rates and improving simple example of using DQL for QoS routing. While the
throughput for elephant flows. architecture combining DRL and SDN is emphasized, the
While these strategies primarily address link load balancing, specific design of routing schemes is not detailed.
another perspective—flow table load balancing—has recently Recent DRL-based mechanisms, such as DROM [17] and
gained attention. This method considers the limited flow table TIDE [18], have been proposed for routing optimization.
capacities in SDNs. For example, research [8] proposes flow However, these methods rely on modifying link weights, with
table load balancing for mice flows, which constitute the optimal paths determined indirectly through shortest path algo-
majority of data center traffic, to prevent packet loss caused rithms, limiting their direct routing optimization capabilities.
by flow table overflow.
Routing schemes for mice flows also consider their low-
latency requirements. Literature [9] selects paths with the low- III. SYSTEM ARCHITECTURE
est delay, while [10] dedicates specific low-latency paths for To integrate SDN-based data center networks with Deep
these flows. In addition, [11] minimizes latency by reducing Q-Learning (DQL), an AI agent is introduced within an AI
the number of flow rules required for mice flow transmission. plane, extending the traditional SDN architecture to enable in-
However, these solutions are largely based on manual designs, telligent routing decisions. The proposed system architecture,
which lack intelligence and adaptability. Consequently, similar illustrated in Figure 1, consists of three distinct planes: the
traffic patterns often result in the same routing decisions, even data plane, the control plane, and the AI plane. Each plane’s
when such strategies lead to suboptimal network performance. specific functions are detailed below.
Moreover, these approaches are unable to learn from past 1) DATA PLANE: The data plane primarily consists of
experiences [15]. switches responsible for packet forwarding, all of which are
To address these limitations, artificial intelligence (AI) of- compatible with the OpenFlow protocol. In this study, the Fat-
fers promising solutions. Studies such as [15], [16] leverage Tree topology [20], a widely adopted structure for data center
deep learning to mitigate congestion in various network sce- networks, is used as the data plane. The Fat-Tree topology,
narios. For instance, convolutional neural networks (CNNs) depicted in Figure 2, offers multiple paths between source
are trained to predict congestion states for specific path and destination nodes, ensuring high bandwidth and robust
combinations based on input traffic patterns. Similarly, [12] fault tolerance for data center operations.
employs Q-learning (QL), a reinforcement learning algorithm,
to realize QoS-aware adaptive routing, introducing a QoS- A. CONTROL PLANE
sensitive reward function to guide the learning process. The control plane communicates with the data plane via
However, the dynamic and fine-grained control requirements the southbound interface protocol (OpenFlow). Using the Link
of modern networks make QL-based approaches challenging Layer Discovery Protocol (LLDP), the controller retrieves the
due to the significant storage space required for maintaining Q- network topology and periodically sends state query messages
tables. To overcome this limitation, Deep Q-Learning (DQL), to each switch to collect their status information, including
which integrates deep learning with QL, has been proposed. flow table and port states.
For example, [14] applies deep reinforcement learning (DRL) When a new flow enters the network, the controller cal-
to address large-scale network control problems, presenting a culates the flow rate and determines its type based on
satisfy that 1. For mice- ows, Rmice (1 PLR2) (1 DL) (13)
where PLR2 represents the average packet loss rate of mice
ows, and DL is the normalized average delay of mice- ows.
Both of these indicators are between 0 and 1. and are the
weights of the two indicators respectively and 1.
V. ALGORITHM DESIGN
RL is a tool to solve the MDP problem. QL is a classical
RL algorithm, which is based on value. It sacri ces some of
its current earnings for its long-term earnings. Q stands for
Q(s a), it is the expected bene t of taking action a(a A) at
a certain state s(s S). The main idea of the algorithm is to
build a Q-table to store Q, and then select the action that can
obtain a large pro t according to the Q value. However, the
Fig. 2. Enter Caption
state space is too large to build a Q-table in nite memory
in ourscenario. To address the problem,DQLisadoptedhere.
A. DQL ALGORITHM In this section, we introduce DQN
in detail and show our improvement to DQN for the routing
problem. DQL is an algorithm that combines deep neural
network and QL. Deepneural network has good generalization
ability and can approximate almost any nonlinear function.
There fore, on the basis of QL algorithm, the deep neural
network is used to establish the mapping relationship between
state and action, so as to realize the accelerated solution of
Fig. 3. Enter Caption the problem and solve the dimension disaster problem caused
by the large scale of system state. QLupdates the value
function as follow Q(s a) Q(s a) (r max a Q(s a) Q(s a)) (14)
flow statistics. If the flow rate exceeds a predefined thresh- where (01] is the learning rate. Q(s a) and Q(s a) are the
old—commonly set at 5 Q values of current moment and next moment, respectively.
Additionally, the control plane gathers network performance Instead of searching Q values in Q-table, DQL uses deep
metrics such as packet loss rate, delay, and throughput. neural network such as CNN to estimate Q(s a), i.e., Q(s a )
This information is used to formulate routing strategies. The Q(s a), where represents the set of weights and biases which
controller then translates these strategies into flow rules and are the parameters of neural network. The network is trained
deploys them on the relevant switches. by minimizing the loss, the loss function can be expressed as
follow L( ) E[(r r max a Q(s a ) Q(s a ))2] (15) maxaQ(s a ) is
IV. PROBLEM FORMULATION
the target Q value calculated by QL. While, Q(s a )istheQvalue
We model the data center network as a directed graph G estimated. Our goal is to get the estimate Q close to the target
V E , where V represents the set of all switch nodes and Q. To obtain the two types of Q value, we adopt two indepen
E denotes the set of links between switches. The ow table dent neural networks with the same structure: evaluated Q
capacity of each switch is Rm, and the capac ity of each link network and target Q-network. The former generates the
is Cm. During the period from t1 to tn, n , we assume that esti mateQaccordingtothecurrentstate.Itchangesparametersin
the set of all mice- ows and all elephant- ows are Fmice fw w each episode to decrease the loss. While, the latter outputs
[1p] and Felephant fv v [1q] , where p and q are the number Q correspondedtothenextstate, preparing for thecalculation
of mice ows and elephant- ows, p , q . Further more, we set of the target Q. It updates parameters with evaluated Q-
the existing ows in the network at time ti as Fti network every some steps. To provide training samples, DQN
loss rate and maximize throughput. And for mice- ow, has a reply memory which stores historical experiences.
the goalistominimizepacketlossrateandlatency.Therefore, the Experiences are selected randomly from the reply memory
reward functions are set up as follows: For elephant- ows, to train the neural network. In this way, the problem of
Relephant (1 PLR) TP (12) where PLR represents the average time-correlation of samples is solved and the stability of
packet loss rate of elephant- ows in the network, TP is the training is improved. Wesummarize the work ow of DQL in
average through put of elephant- ows after processing(Average Figure 4. In particular, in the routing scenario, the information
throughput divided by the maximum receiving rate at the about the arrival ow of next moment is unknown, including
receiving end). This is done for bringing the two indicators the ow type as well as the source and destination IP address
into the same order of magnitude(0-1) to facilitate compre- of the ow. And the available paths for the ow are uncertain.
hensive evaluation. and are the weights of the two indicators, We let Q1 and Q2 be the action value function for mice
respectively, indicating the importance of the indicators. They ows and elephant- ows, respectively. We set the alternative
Fig. 4. basic SDN componants

Fig. 5. basic SDN componants

simulation, we use a Fat-Tree [21] topology with a param

eter of 4 as an example, which contains 20 switches and
16servers. The link capacity and owtablecapacity are set to
100 and 20, respectively. To emulate the traf c in data center
networks, we set 20 percent of the ows to be mice- ows,
and the others to be elephant- ows. The threshold of the Fig. 6. Enter Caption
two types of ows is 0.5 percent of the link bandwidth. And
the duration is 5s for mice- ows, while 30s for elephant-
ows. The related parameters of reward function are set SRL randomly selects two equivalent shortest paths,
as follows: 05. B. PERFORMANCE AND RESULTS In andthepathwiththeleastloadwillbetheinital path. Furthermore,
order to demonstrate the effectiveness of the proposed FlowFit periodically monitors the state of the network and
scheme,experimentswereperformedunderdifferentnetwork reassigns ows to optimal links. As shown in Table 2 to Table
loads (0.1, 0.5, 0.9). We take 0.9 for example, Figure 5 4, the proposed scheme reduces the delay of mice- ows by
shows the training process of the AI agent under this load. an average of 55.08the network load of 0.1 to 0.9, compared
From 5(a) and 5(b), we can nd that the average delay of with ECMP and SRL FlowFit. The average packet loss rate
mice- ows and average packet loss rate display a decreasing is 33.1725.5while, the average throughput of elephant- ows
trend with the increase of training steps. While the average is 35.822.68clarity, we explain the calculation method of
throughput TABLE 1. Convergence steps (k). TABLE 2. the above results here. Taking Table 2 as an example, we
Average delay comparison of mice-flows (ms). TABLE 3. calculate the delay reduction of mice- ows of the proposed
Average packet loss rate comparison of all flows (TABLE scheme compared with other schemes under each load, and
4. Average throughout comparison of elephant-flows (of then calculate the average valuecommand that updates the
elephant- ows shows an upward trend in this interval in counter it’s supposed to
5(c). All indicators will level off after a certain number
VI. CONCLUSION
of steps. Werecordtheconvergencestepsofeachstandardunder
each network load in Table 1. In particular, when the load In this paper, we focus on solving routing problems in SDN
increases, the indicators tend to stabilize in a shorter period based data center networks. DQL is employed to achieve the
of time. It should be noted that the average packet loss rate optimal routing. In the learning phase, we constantly adjust
we measureinthispaperisforalltraf cinthenetwork,including our routing strategies through trial and error, and train CNNs
mice- ows and elephant- ows. We compare our scheme to generate the optimal paths. It spends a lot of time and
with two methods. One is the classic data center network computing resources. For the subsequent routing phase, we
routing algorithm ECMP [22], which adopts polling to can obtain the optimal routing strategies according to the
allocate ows, not considering the network status. The other trained CNNs accurately and quickly without extra calcula
is SRL FlowFit [23]. As a routing initialization algorithm, tions. Aiming at the two types of ows in data center net
works, elephant- ows and mice- ows, two DQNsarebuilt to [16] B. Mao, F. Tang, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, and K.
train and generate the corresponding routing strategy respec Mizutani, “A novel non-supervised deep-learning-based network traffic
control method for software-defined wireless networks,” IEEE Wireless
tively. Meanwhile, the ow table utilization and port rate are Commun., vol. 25, no. 4, pp. 74–81, Aug. 2018.
both taken into account to describe the network state in the [17] C. Yu, J. Lan, Z. Guo, and Y. Hu, “DROM: Optimizing the routing
scheme. We have successfully veri ed the effectiveness of in software-defined networks with deep reinforcement learning,” IEEE
Access, vol. 6, pp. 64533–64539, 2018.
the proposed mechanism in a simulated data center network. [18] P. Sun, Y. Hu, J. Lan, L. Tian, and M. Chen, “TIDE: Time-relevant deep
Simulation results show that, the proposed routing scheme reinforcement learning for routing optimization,” Future Gener. Comput.
can not only provide optimized routing strategy intelligently, Syst., vol. 99, pp. 401–409, Oct. 2019.
[19] Mininet. Accessed: Aug. 28, 2018. [Online]. Available:
but also improve the network performance. In the future, http://mininet.org/
we will improve the neural network structure to reduce the [20] Ryu. Accessed: May 23, 2020. [Online]. Available:
path computationtime.Furthermore,theroutingproblemwill be http://ryu.readthedocs.io/en/latest/
[21] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity
researched in a more complex scenario, where multiple ows data center network architecture,” in Proc. ACM SIGCOMM Conf. Data
arrive at the network at the same interval. Commun. SIGCOMM, Aug. 2008, pp. 95–104.
[22] M. Chiesa, G. Kindler, and M. Schapira, “Traffic engineering with
R EFERENCES Equal-Cost-multipath: An algorithmic perspective,” in Proc. IEEE IN-
[1] W. Xia, P. Zhao, Y. Wen, and H. Xie, “A survey on data center net- FOCOM Conf. Comput. Commun., Apr. 2014, pp. 1590–1598.
working (DCN): Infrastructure and operations,” IEEE Commun. Surveys [23] W. Sehery and T. Charles Clancy, “Load balancing in data center
Tuts., vol. 19, no. 1, pp. 640–656, 1st Quart., 2017. networks with folded-Clos architectures,” in Proc. 1st IEEE Conf. Netw.
[2] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, Softwarization (NetSoft), Apr. 2015, pp. 1–6.
“Hedera: Dynamic flow scheduling for data center networks,” in Proc.
USENIX NSDI, San Jose, CA, USA, 2010, pp. 281–296.
[3] A. R. Curtis, W. Kim, and P. Yalagandula, “Mahout: Low-overhead data
center traffic management using end-host-based elephant detection,” in
Proc. IEEE INFOCOM, Apr. 2011, pp. 1629–1637.
[4] J. Liu, J. Li, G. Shou, Y. Hu, Z. Guo, and W. Dai, “SDN based load
balancing mechanism for elephant flow in data center networks,” in Proc.
Int. Symp. Wireless Pers. Multimedia Commun. (WPMC), Sep. 2014, pp.
486–490.
[5] H. Long, Y. Shen, M. Guo, and F. Tang, “LABERIO: Dynamic load-
balanced routing in OpenFlow-enabled networks,” in Proc. IEEE 27th
Int. Conf. Adv. Inf. Netw. Appl. (AINA), Mar. 2013, pp. 290–297.
[6] G. Xiao, W. Wenjun, Z. Jiaming, F. Chao, and Z. Yanhua, “An
OpenFlow-based dynamic traffic scheduling strategy for load balancing,”
in Proc. 3rd IEEE Int. Conf. Comput. Commun. (ICCC), Dec. 2017, pp.
531–535.
[7] Y.-C. Wang and S.-Y. You, “An efficient route management framework
for load balance and overhead reduction in SDN-based data center
networks,” IEEE Trans. Netw. Service Manage., vol. 15, no. 4, pp. 1422–
1434, Dec. 2018.
[8] Z. Guo, Y. Xu, R. Liu, A. Gushchin, K.-Y. Chen, A. Walid, and H. J.
Chao, “Balancing flow table occupancy and link utilization in software-
defined networks,” Future Gener. Comput. Syst., vol. 89, pp. 213–223,
Dec. 2018.
[9] C. Wang, G. Zhang, H. Chen, and H. Xu, “An ACO-based elephant and
mice flow scheduling system in SDN,” in Proc. IEEE 2nd Int. Conf. Big
Data Anal. (ICBDA), Mar. 2017, pp. 859–863.
[10] W. Wang, Y. Sun, K. Zheng, M. A. Kaafar, D. Li, and Z. Li, “Freeway:
Adaptively isolating the elephant and mice flows on different transmis-
sion paths,” in Proc. IEEE 22nd Int. Conf. Netw. Protocols, Oct. 2014,
pp. 362–367.
[11] F. Amezquita-Suarez, F. Estrada-Solano, N. L. S. da Fonseca, and O.
M. C. Rendon, “An efficient mice flow routing algorithm for data
centers based on software-defined networking,” in Proc. IEEE Int. Conf.
Commun. (ICC), May 2019, pp. 1–6.
[12] J. Ho, D. W. Engels, and S. E. Sarma, “HiQ: A hierarchical Q-Learning
algorithm to solve the reader collision problem,” in Proc. Int. Symp.
Appl. Internet Workshops (SAINTW), 2006, pp. 88–91.
[13] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G.
Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S.
Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D.
Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep
reinforcement learning,” Nature, vol. 518, pp. 529–533, Feb. 2015.
[14] H. Yao, T. Mai, X. Xu, P. Zhang, M. Li, and Y. Liu, “NetworkAI:
An intelligent network architecture for self-learning control strategies in
software-defined networks,” IEEE Internet Things J., vol. 5, no. 6, pp.
4319–4327, Dec. 2018.
[15] B. Mao, F. Tang, Z. M. Fadlullah, and N. Kato, “An intelligent route
computation approach based on real-time deep learning strategy for
software-defined communication systems,” IEEE Trans. Emerg. Topics
Comput., early access, Feb. 14, 2019, doi: 10.1109/TETC.2019.2899407.