
ns3-gym: Extending OpenAI Gym for Networking

Piotr Gawłowicz and Anatolij Zubow
{gawlowicz, zubow}@tkn.tu-berlin.de
Technische Universität Berlin, Germany

Abstract—OpenAI Gym is a toolkit for reinforcement learning (RL) research. It includes a large number of well-known problems that expose a common interface, allowing the performance of different RL algorithms to be compared directly. For many years, the ns-3 network simulation tool has been the de-facto standard for academic and industry research into networking protocols and communications technology. Numerous scientific papers report results obtained with ns-3, and hundreds of models and modules have been written and contributed to the ns-3 code base. A major trend in network research today is the use of machine learning tools such as RL. What is missing is the integration of an RL framework like OpenAI Gym into the network simulator ns-3. This paper presents the ns3-gym framework. First, we discuss the design decisions that went into the software. Second, two illustrative examples implemented using ns3-gym are presented. Our software package is provided to the community as open source under a GPL license and hence can be easily extended.

Index terms— Machine Learning, Reinforcement Learning, OpenAI Gym, network simulator, ns-3, networking research

I. Introduction

We see a boom in the usage of machine learning in general, and reinforcement learning (RL) in particular, for the optimization of communication and networking systems, ranging from scheduling [1], [2], resource management [3], congestion control [4], [5], [6] and routing [7] to adaptive video streaming [8]. Each proposed approach shows significant improvements compared to traditionally designed algorithms. Unfortunately, the results are often not directly comparable: some researchers use different RL libraries, others different network simulators or experimentation testbeds. This paper takes a first step towards unification, i.e. using the same RL libraries and the same network simulation environment, so that the performance of different RL-based solutions can be compared directly in a well-defined, controlled environment with a common API, which should accelerate the development of novel RL-based networking solutions. Moreover, as the selected ns-3 network simulator provides emulation capabilities for evaluating network protocols in real testbeds, our toolkit integrates the Gym API with real networking hardware. Hence, it allows researchers to validate RL algorithms even in real networking environments.

II. Background

A. Reinforcement Learning

RL has been used successfully in robotics for years, as it allows the design of sophisticated behaviors that are hard to engineer by hand [9]. The main advantage of RL is its ability to learn to interact with the surrounding environment based on its own experience. RL agents learn to find the series of actions that maximizes the cumulated reward (i.e., the objective function) by interacting with the environment, as illustrated in Fig. 1. RL is very well suited to networking problems. First, it is good at solving optimization problems for which no accepted closed-form solution exists. Second, RL agents are of low complexity, so they can even be used in real production systems where actions have to be taken at very high speed, e.g. adapting the contention window of the Transmission Control Protocol (TCP) [4] at line speed.

[Fig. 1. Reinforcement Learning: the agent observes state St+1 and reward Rt+1 from the environment and takes action At.]

B. RL Tools

OpenAI Gym [10] is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents for a variety of applications, ranging from playing video games like Pong or Pinball to problems in robotics [10], [11], [12]. Gym is easy to use, as widely used ML libraries like Tensorflow and Scikit-Learn can be employed with it. It is well documented, tested and accepted by the research community. Moreover, as agents can be written in the high-level programming language Python, it is suitable for beginners.

C. Ns-3 Network Simulator

Ns-3 is a discrete-event network simulator for Internet systems, targeted primarily at research and educational use. Ns-3 is a general purpose network simulator comprising features like a full-fledged TCP/IP protocol stack, support for numerous wireless technologies such as LTE, WiFi and WiMAX, and the possibility of integration with testbeds and real applications. It is free software, licensed under the GNU GPLv2 license, and is publicly available [13], [14]. Ns-3 is a de-facto standard, as results obtained with it are accepted by the research community.
III. Design Principles

The main goal of our work is to facilitate and shorten the time required for prototyping novel RL-based networking solutions. Therefore we have identified the following design principles:

• scalability - it should be possible to run multiple ns-3 instances, even in a distributed environment; moreover, both time-based and event-based observations should be supported,
• low entry overhead - it should be easy to convert existing legacy ns-3 simulation scripts for use in the OpenAI Gym environment,
• fast prototyping - the loose coupling between agent and environment allows easy debugging of locally running Gym agent scripts, i.e. the ns-3 worker may run on a powerful server,
• easy maintenance - the framework is just a normal ns-3 module like LTE or WiFi, i.e. no changes are required inside the ns-3 simulation kernel.

IV. System Design

A. Overview

The architecture of our framework, shown in Fig. 2, consists of two main software blocks, namely OpenAI Gym and the ns-3 network simulator. Following the RL nomenclature, the Gym framework is used to implement agents, while ns-3 acts as the environment. Optionally, the framework can be executed in a real network testbed — see the detailed description in Section IV-E.

[Fig. 2. Proposed architecture for OpenAI Gym for networking: the agent (algorithm) in OpenAI Gym communicates through the ns3gym interface (IPC, e.g. sockets) with the ns-3 network simulator, which can optionally be connected to a real testbed.]

The main contribution of this work is the design and implementation of a generic interface between OpenAI Gym and ns-3 that allows for the seamless integration of those two frameworks. The interface takes care of the management of the ns-3 simulation process life cycle as well as delivering state and action information between the Gym agent and the simulation environment. In the following subsections, we describe our ns3-gym framework in detail.

B. ns3-gym Example

Listing 1 shows the execution of a single episode using the ns3-gym framework. First, the ns-3 environment and the agent are initialized — lines 5–7. Note that the creation of the ns3-v0 environment is achieved using the standard Gym API. Behind the scenes, the ns3-gym engine starts the ns-3 simulation script located in the current working directory and uses it as the environment. This way, the entire environment is defined inside the simulation script, making the Python code environment-independent and less prone to errors.

1  import gym
2  import PyOpenGymNs3
3  import MyAgent
4
5  env = gym.make('ns3-v0')
6  obs = env.reset()
7  agent = MyAgent.Agent()
8
9  while True:
10     action = agent.get_action(obs)
11     obs, reward, done, info = env.step(action)
12
13     if done:
14         break
15 env.close()

Listing 1. An OpenAI Gym agent written in the Python language

At each step, the agent takes the observation and returns, based on the implemented logic, the next action to be executed in the environment — lines 9–11. Note that the agent class is not provided by the framework and developers are free to define it as they want; for example, the simplest agent performs random actions (a sketch is given below). The execution of the episode terminates (lines 13–14) when the environment returns done=true, which can be caused by the end of the simulation or by meeting the predefined game-over condition.
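As an illustration, a minimal random agent that could play the role of the MyAgent module imported in Listing 1 might look as follows. This is only a sketch which assumes nothing beyond the get_action method used in Listing 1; it is not part of the ns3-gym framework.

# MyAgent.py - illustrative sketch, not provided by ns3-gym
class Agent:
    def __init__(self, action_space=None):
        # Optionally remember the Gym action space so that sampled
        # actions are always valid for the environment at hand.
        self.action_space = action_space

    def get_action(self, obs):
        # Ignore the observation and pick a random valid action.
        if self.action_space is not None:
            return self.action_space.sample()
        return 0  # placeholder when no action space was provided

Passing env.action_space to the constructor keeps the random actions valid for whatever action space the ns-3 script defines.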
C. Generic Environments

Following our main design decision, any ns-3 simulation script can be used as a Gym environment. This requires only instantiating an OpenGymInterface (see Listing 2) and implementing the ns3-gym C++ interface, which consists of the functions listed in Listing 3. Note that the functions can be defined separately or grouped together inside an object inheriting from the GymEnv base class.

Ptr<OpenGymInterface> openGymInterface =
    CreateObject<OpenGymInterface> (openGymPort);
Ptr<MyGymEnv> myGymEnv = CreateObject<MyGymEnv> ();
myGymEnv->SetOpenGymInterface (openGymInterface);

Listing 2. Adding the OpenAI Gym interface to an ns-3 simulation

Ptr<OpenGymSpace> GetObservationSpace ();
Ptr<OpenGymSpace> GetActionSpace ();
Ptr<OpenGymDataContainer> GetObservation ();
float GetReward ();
bool GetGameOver ();
std::string GetExtraInfo ();
bool ExecuteActions (Ptr<OpenGymDataContainer> action);

Listing 3. The ns3-gym C++ interface

The functions GetObservationSpace and GetActionSpace are used to define the observation and action spaces, respectively. They are called only once, during the initialization of the environment. The definitions are used to create the corresponding spaces in Python — our framework takes care of this automatically. Currently, we support the most useful spaces defined in the OpenAI Gym framework, namely:
1) Discrete — a single discrete number with a value between 0 and N.
2) Box — a vector or matrix of numbers of a single type with values bounded between low and high limits.
3) Tuple — a tuple of simpler spaces.
4) Dict — a dictionary of simpler spaces.

Listing 4 shows an example definition of the observation space as a C++ function. The space is going to be used to store the queue lengths of all the nodes available in the simulation. The queue size was set to 100 packets, hence the values are integers bounded between 0 and 100.

Ptr<OpenGymSpace> GetObservationSpace ()
{
  uint32_t nodeNum = NodeList::GetNNodes ();
  float low = 0.0;
  float high = 100.0;
  std::vector<uint32_t> shape = {nodeNum,};
  std::string dtype = TypeNameGet<uint32_t> ();
  Ptr<OpenGymBoxSpace> space =
      CreateObject<OpenGymBoxSpace> (low, high, shape, dtype);
  return space;
}

Listing 4. An example definition of the GetObservationSpace function
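On the Python side, ns3-gym builds the corresponding Gym space automatically from this definition. For reference, a hand-written sketch of the equivalent space would look as follows; the node count used here is an arbitrary example value.

import numpy as np
from gym import spaces

# Python-side equivalent of Listing 4 (sketch): one queue-length value per
# node, bounded between 0 and 100 packets. Created automatically by ns3-gym;
# written out here only for clarity.
node_num = 5
observation_space = spaces.Box(low=0.0, high=100.0, shape=(node_num,), dtype=np.uint32)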
During every step execution the framework collects the current state of the environment by calling the following functions:

1) GetObservation – collect the values of the observed variables and/or parameters at any network node and in each layer of the network protocol stack;
2) GetReward – measure the reward achieved during the last step;
3) GetGameOver – check a predefined game-over condition;
4) GetExtraInfo – get extra information associated with the current environment state.

On the Python side these map directly onto the tuple returned by env.step() in Listing 1, i.e. obs, reward, done and info, respectively. Note that a step in our framework can be executed at a predefined time interval (time-based step), e.g. every 100 ms, or be triggered by the occurrence of a specific event (event-based step), e.g. a packet loss.

Listing 5 shows an example implementation of the GetObservation function. First, the box data container is created according to the observation space definition. Then the box is filled with the current size of the WiFi interface queue of each node.

Ptr<OpenGymDataContainer> GetObservation ()
{
  uint32_t nodeNum = NodeList::GetNNodes ();
  std::vector<uint32_t> shape = {nodeNum,};
  Ptr<OpenGymBoxContainer<uint32_t>> box =
      CreateObject<OpenGymBoxContainer<uint32_t>> (shape);

  for (uint32_t i = 0; i < nodeNum; i++)
    {
      Ptr<Node> node = NodeList::GetNode (i);
      // GetQueue is a helper defined elsewhere in the example that
      // returns the WiFi MAC queue of the given node.
      Ptr<WifiMacQueue> queue = GetQueue (node);
      box->AddValue (queue->GetNPackets ());
    }
  return box;
}

Listing 5. An example definition of the GetObservation function

The ns3-gym framework delivers the collected environment state to the agent, which in return sends back the action to be executed. Similarly to the observation, the action is also encoded as numerical values in a container. The user is responsible for implementing the ExecuteActions function, which maps those numerical values to proper actions, e.g. the transmission power or MCS of the WiFi interface in each node.

Note that the mapping of all the described functions between the corresponding C++ and Python functions is done by the ns3-gym framework automatically, hiding the entire complexity behind an easy to use API.

As already mentioned, the environment is defined entirely inside the ns-3 simulation script. Optionally, it can also be adjusted by passing command line arguments when starting the script (e.g. seed, simulation time, number of nodes, etc.). This, however, requires using the Ns3Env(args={arg=value,...}) constructor instead of the standard gym.make('ns3-v0').
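A sketch of this variant is shown below; the import path follows Listing 1 and the argument names are illustrative assumptions, as only the general Ns3Env(args={...}) form is specified above.

from PyOpenGymNs3 import Ns3Env   # import path assumed, following Listing 1

# Hypothetical example: start the ns-3 script with a custom seed, simulation
# time and node count instead of using the plain gym.make('ns3-v0') shortcut.
env = Ns3Env(args={'--simSeed': 42, '--simTime': 20, '--nNodes': 5})
obs = env.reset()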

D. Custom Environments

In addition to the generic ns3-gym interface, with which one can observe any variable in a simulation, we also provide custom environments for specific use-cases. For example, in TCPNs3Env the observation state, action and reward function for the problem of flow and congestion control (TCP) are predefined, using the RL mapping proposed in [4]. This dramatically simplifies the development of own RL-based TCP solutions, and it can further be used as a benchmarking suite that allows comparing the performance of different RL approaches in the context of TCP.

DASHNs3Env is another predefined environment, intended for testing adaptive video streaming solutions using our framework. Again, the RL mapping for observation state, action and reward is predefined, i.e. as proposed in [8].
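A minimal usage sketch of such a predefined environment is given below, assuming TCPNs3Env exposes the standard Gym API (reset/step); the import path and the no-argument constructor are illustrative assumptions.

from PyOpenGymNs3 import TCPNs3Env   # assumed import path

# The observation (EWMAs, RTT ratio, current CWND), the CWND-adjustment
# actions and the reward are already predefined by the environment.
env = TCPNs3Env()
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # e.g. a random CWND adjustment
    obs, reward, done, info = env.step(action)
env.close()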
Fig. 3 shows the meta-model of the environments we provide. Note that the user of our framework is free to extend it by providing his own custom environments.

[Fig. 3. Meta-model of the OpenAI Gym environments provided by the ns3-gym framework: the fully generic Ns3Env; the use-case specific TCPNs3Env (state: EWMA, RTT ratio, current CWND; action: CWND += k, k = -1,...,3; reward: r() function) and DASHNs3Env; and user-defined CustomNs3Env environments.]

E. Emulation

Since ns-3 allows the usage of real Linux protocol stacks inside the simulation [15] and can be run in emulation mode for evaluating network protocols in real testbeds [16] (possibly interacting with real-world implementations), it can act as a bridge between an agent implemented in Gym and a real-world environment.

These features give researchers the possibility to train their RL agents in a simulated network (possibly very quickly, using parallel environments) and to test them afterward in a real testbed without having to change a single line of code. We believe that this intermediate step is of great importance for the testing of ML-based network control algorithms.
V. Implementation

The ns3-gym toolkit consists of two modules (one written in C++ and the other in Python) which are add-ons to the existing ns-3 and OpenAI Gym frameworks and enable information exchange between them. The communication is realized over ZMQ (http://zeromq.org/) sockets using the Protocol Buffers (https://developers.google.com/protocol-buffers/) library for the serialization of messages. This, however, is hidden from the users behind an easy to use API.

The simulation environments are defined using purely standard ns-3 models, while agents can be developed using popular ML libraries like Tensorflow, Keras, etc.

Our software package, together with clarifying examples, is provided to the community as open source under a GPL license at https://github.com/tkn-tub/ns3-gym.

VI. Illustrative Examples

In this section, we present two networking related examples implemented using our ns3-gym framework.

A. Random Access

Controlling the random access in an IEEE 802.11 mesh network is challenging, as the network nodes compete for the shared radio resources. It is known that assigning the same channel access probability to each node is not optimal [17], and the literature therefore proposes solutions where, for example, the channel access probability depends on the network load (queue size) of a node. In this section, we show how our toolkit can be used to learn the channel access probability of each node as a function of the network load. We created a linear topology in ns-3 consisting of five nodes and set up a saturated UDP packet flow from the leftmost to the rightmost node.

Our proposed RL mapping is:
• observation - queue lengths of each node,
• actions - set the channel access probability for each node; here we set both CWmin and CWmax to the same value, i.e. a uniform backoff (the window stays constant even in case of packet collisions),
• reward - the number of packets received at the flow's ultimate destination during the last step interval,
• gameover - end of the simulation time.

Our RL agent was able to learn to assign lower CWmin/CWmax values to nodes closer to the flow destination. Hence it was able to outperform the baseline in which all nodes were assigned the same CWmin/CWmax. The full source code of the example can be found in our repository under ./examples/opengym/linear-mesh/.
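The agent side of this example follows the same pattern as Listing 1. The sketch below only illustrates the mapping above, with the action expressed as one CWmin=CWmax value per node; the value range and the toy policy are illustrative assumptions, not the code shipped in ./examples/opengym/linear-mesh/.

import gym
import PyOpenGymNs3   # registers the ns3-v0 environment

env = gym.make('ns3-v0')   # the linear-mesh simulation script runs underneath
obs = env.reset()          # obs: current queue length of each of the five nodes
done = False
while not done:
    # Toy policy: nodes with longer queues get a smaller (more aggressive)
    # contention window; a learning agent would replace this heuristic.
    action = [max(1, 32 - 4 * int(q)) for q in obs]
    obs, reward, done, info = env.step(action)   # reward: packets received at the sink
env.close()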
B. Cognitive Radio

We consider the problem of radio channel selection in a wireless multi-channel environment, e.g. 802.11 networks with external interference. The objective of the agent is to select, for the next time slot, a channel that is free of interference. We consider a simple illustrative example in which the external interference follows a periodic pattern, i.e. it sweeps over channels one to four in the same order, as shown in the table below.

[Table: periodic interference pattern over time slots 1-9 — the interferer occupies channel 1 in slot 1, channel 2 in slot 2, channel 3 in slot 3, channel 4 in slot 4, and then repeats the sweep.]

We created such a scenario in ns-3 using existing functionality, i.e. the interference is created using the WaveformGenerator class and the sensing is performed using the SpectrumAnalyzer class.

Such a periodic interferer can easily be learned by an RL-agent, so that, based on the current observation of the occupation of each channel in a given time slot, the correct channel can be determined for the next time slot, avoiding any collision with the interferer.

Our proposed RL mapping is:
• observation — occupation of each channel in the current time slot, i.e. wideband sensing,
• actions — set the channel to be used for the next time slot,
• reward — +1 in case of no collision with the interferer; otherwise -1,
• gameover — if more than three collisions happened during the last ten time slots.

Fig. 4 shows the learning performance when using a simple neural network with a fully connected input and an output layer. We see that after around 80 episodes the agent is able to perfectly predict the next channel state from the current observation and hence to avoid any collision with the interference.

[Fig. 4. Learning performance of the RL-agent in the Cognitive Radio example: steps, reward and time per episode over 200 episodes.]

The full source code of the example can be found in our repository under ./examples/opengym/interference-pattern/.
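A sketch of one way such a simple fully connected model could look in Keras is given below; the layer sizes, the epsilon-greedy action selection and all hyper-parameters are illustrative assumptions rather than the exact configuration used to produce Fig. 4.

import numpy as np
from tensorflow import keras

NUM_CHANNELS = 4   # channels 1-4, matching the interference pattern above

# One fully connected hidden layer over the wideband-sensing observation and
# one output per selectable channel, interpreted as a per-channel score.
model = keras.Sequential([
    keras.layers.Dense(NUM_CHANNELS, activation='relu', input_shape=(NUM_CHANNELS,)),
    keras.layers.Dense(NUM_CHANNELS, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')

def choose_channel(observation, epsilon=0.1):
    # Epsilon-greedy selection of the channel for the next time slot.
    if np.random.rand() < epsilon:
        return int(np.random.randint(NUM_CHANNELS))
    scores = model.predict(np.array([observation]), verbose=0)[0]
    return int(np.argmax(scores))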
Note that in a more realistic scenario the simple waveform generator in this example can be replaced by a real wireless technology like LTE unlicensed (LTE-U). Here an RL-agent running on a WiFi node might be trained to detect co-located LTE-U BSs from observing the radio spectrum, as proposed in [18].

VII. Related Work

Related work falls into three categories:

RL for networking applications: In the literature, a variety of works can be found that propose to use RL to solve networking related problems. We present two of them in more detail, with emphasis on the proposed RL mapping.

Li et al. [4] proposed an RL-based Transmission Control Protocol (TCP) where the objective is to learn to adjust the TCP congestion window (CWND) so as to increase a utility function, which is computed based on the measurement of flow throughput and latency. The identified state space consists of the EWMA of the ACK inter-arrival time, the EWMA of the packet inter-sending time, the RTT ratio, the slow start threshold and the current CWND, all of which are available in the provided environment. Moreover, the action space consists of increasing and decreasing the CWND, respectively. Finally, the reward is specified by the value of a utility function reflecting the desirability of the picked action.
Mao et al. proposed an RL-based adaptive video streaming system [8] called Pensieve which learns the Adaptive Bitrate (ABR) algorithm automatically through experience. The observation state consists, among other things, of the past chunk throughput, the download time and the current buffer size. The action space consists of the different bitrates which can be selected for the next video chunk. Finally, the reward signal is derived directly from the QoE metric, which considers three QoE factors: bitrate, rebuffering and smoothness.

Extension of OpenAI Gym: Zamora et al. [19] provided an extension of the OpenAI Gym for robotics using the Robot Operating System (ROS) and the Gazebo simulator, with a focus on creating a benchmarking system for robotics that allows the direct comparison of different techniques and algorithms using the same virtual conditions. Our work aims at the same goal but targets the networking community.

Chinchali et al. [1] built a custom network simulator for IoT using OpenAI's Gym environment in order to study the scheduling of cellular network traffic. With our framework, it would be easier to perform such an analysis, as ns-3 already contains many MAC schedulers which could serve as the baseline for comparison.

Custom RL solutions for networking: Winstein et al. [5] implemented an RL-based TCP congestion control algorithm on the basis of the outdated ns-2 network simulator. Newer work on Q-learning for TCP can be found in [6]. In contrast to our work, both proposed approaches are not generic, as only an API meant for reading and controlling TCP parameters was presented. Moreover, custom RL libraries were used. Finally, the source code of the above mentioned extensions is not available.

VIII. Conclusions & Future Work

In this paper, we presented the ns3-gym toolkit, which dramatically simplifies the usage of reinforcement learning for solving problems in the area of networking. This is achieved by connecting the OpenAI Gym to the ns-3 network simulator. As the framework is open source, it can easily be extended by the community.

In the future, we plan to define a set of well-known networking problems, e.g. network transport control, which can be used to benchmark different RL techniques. Moreover, we will adjust the framework and provide examples showing how it can be used with more advanced RL techniques, e.g. A3C [20], which uses multiple agents interacting with their own copies of the environment for more efficient learning; the independent, hence more diverse, experience of each agent is periodically fused into the global network.

We believe that ns3-gym will foster machine learning research in the networking area and that a research community will grow around it. Finally, we plan to set up a website allowing researchers to share their results and compare the performance of algorithms for various environments under the same virtual conditions — a so-called leaderboard.

Acknowledgments: We are grateful to Georg Hoelger for helping us with the implementation of the presented illustrative examples.

References

[1] S. Chinchali, P. Hu, T. Chu, M. Sharma, M. Bansal, R. Misra, M. Pavone, and S. Katti, "Cellular network traffic scheduling with deep reinforcement learning," in AAAI, 2018.
[2] R. Atallah, C. Assi, and M. Khabbaz, "Deep reinforcement learning-based scheduling for roadside communication networks," in WiOpt, 2017.
[3] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, "Resource management with deep reinforcement learning," in HotNets. ACM, 2016.
[4] W. Li, F. Zhou, K. R. Chowdhury, and W. M. Meleis, "QTCP: Adaptive congestion control with reinforcement learning," IEEE Transactions on Network Science and Engineering, 2018.
[5] K. Winstein and H. Balakrishnan, "TCP ex machina: Computer-generated congestion control," in SIGCOMM. ACM, 2013.
[6] Y. Kong, H. Zang, and X. Ma, "Improving TCP congestion control with machine intelligence," in NetAI. ACM, 2018.
[7] A. Valadarsky, M. Schapira, D. Shahaf, and A. Tamar, "Learning to route with deep RL," in NIPS, 2017.
[8] H. Mao, R. Netravali, and M. Alizadeh, "Neural adaptive video streaming with Pensieve," in SIGCOMM. ACM, 2017.
[9] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[10] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," CoRR, 2016. [Online]. Available: http://arxiv.org/abs/1606.01540
[11] OpenAI, "OpenAI Gym documentation," https://gym.openai.com, accessed: 2018-09-20.
[12] ——, "OpenAI Gym source code," https://github.com/openai/gym, accessed: 2018-09-20.
[13] NS-3 Consortium, "ns-3 documentation," https://www.nsnam.org, accessed: 2018-09-20.
[14] ——, "ns-3 source code," http://code.nsnam.org, accessed: 2018-09-20.
[15] H. Tazaki, F. Urbani, E. Mancini, M. Lacage, D. Camara, T. Turletti, and W. Dabbous, "Direct code execution: revisiting library OS architecture for reproducible network experiments," in CoNEXT. ACM, 2013.
[16] G. Carneiro, H. Fontes, and M. Ricardo, "Fast prototyping of network protocols through ns-3 simulation model reuse," Simulation Modelling Practice and Theory, 2011.
[17] C. Buratti and R. Verdone, “L-CSMA: A MAC Protocol for Multihop
Linear Wireless (Sensor) Networks,” IEEE Transactions on Vehicular
Technology, 2016.
[18] M. Olbrich, A. Zubow, S. Zehl, and A. Wolisz, “Wiplus: Towards lte-u
interference detection, assessment and mitigation in 802.11 networks,”
in European Wireless 2017; 23th European Wireless Conference; Pro-
ceedings of. VDE, 2017, pp. 1–8.
[19] I. Zamora, N. G. Lopez, V. M. Vilches, and A. H. Cordero, “Extending
the openai gym for robotics: a toolkit for reinforcement learning using
ros and gazebo,” arXiv preprint arXiv:1608.05742, 2016.
[20] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap,
T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods
for Deep Reinforcement Learning,” CoRR, 2016. [Online]. Available:
http://arxiv.org/abs/1602.01783
