Machine Learning in Traffic Classification of SDN - Final Project Report
Machine Learning in Traffic Classification of SDN - Final Project Report
Abstract
Traffic Classification has become an important research area especially with the advancements
of Machine Learning and Software Defined Networking. In this project, two machine learning models –
Logistic Regression (supervised) and K-Means Clustering (unsupervised) were used to classify DNS,
Telnet, Ping, and Voice traffic flows simulated by the Distributed Internet Traffic Generator (D-ITG) tool.
Each host in the network was connected through an overlay network to an Open vSwitch (OVS). The OVS
was connected to a Ryu controller which collected basic flow statistics between hosts. These statistics
were then parsed by a Python traffic classification script which periodically outputted the learned traffic
labels of each flow. Logistic Regression was found to work much better than K-Means Clustering. Further
improvements in the project could be to add functionality to detect unique traffic flows between the
The classification of traffic flows in today’s IP networks has become an important research area with the
adoption of Machine Learning (ML) techniques and Software Defined Networking (SDN) principles.
Traditional methodologies including identifying traffic based on port number and payload inspection are
not effective due to the dynamic and encrypted nature of current traffic. This project will attempt to
utilize Supervised and Unsupervised ML algorithms to classify flows by their required bandwidth,
required QoS, and their application based on various flow level details as features.
As mentioned earlier, traffic classification using Machine Learning is a growing trend in the network
analytics domain. One application of these techniques is in cybersecurity. Large datasets produced from
enormous Internet traffic flows are difficult to process and analyze, even for talented experts in the field
using sophisticated tools. In addition, ML classification and clustering of flows can help identify network
hotspots and potential bottlenecks. When using bandwidth and QoS of flows as classifiers, we can use
Traffic Engineering (TE) to adjust flow paths and add virtual resources to the network infrastructure.
Finally, identification of applications or web-based protocols is important for forecasting future trends
and ensuring the network can meet the demand. Network classification is therefore of great interest to
Implementation
A simple network topology was created using VirtualBox as shown in Figure 1. It consisted of 5 Virtual
Machines: 1 Controller, 1 Layer 2 Switch, and 3 Hosts. The nodes in the network were modeled as VMs
so that network traffic would experience some delay. Another option was to use Mininet but it wasn’t
chosen due to the network simulation taking place within a single VM. An overlay network was deployed
so that the traffic generated by the hosts went through the Switch VM instead of using the VirtualBox
internal switching mechanism to communicate. The hosts in the network used OVS with 2 interfaces.
The first interface was internal and the second connected towards the Switch VM OVS using VXLAN
tunnelling. The Switch OVS had VXLAN interfaces towards the hosts and connected to the Controller VM
directly (i.e. using underlay IP). The Ryu controller was deployed on the Controller VM.
After the network is successfully deployed and configured, the controller should be aware of packets
flowing through the switch between any of the hosts. The sample Python script, simple_monitor_13.py,
This data was used as input data to the traffic_classifier.py script. The script used the input data to
• time_start: the UTC clock value at the time the flow is first detected
• forward_packets: the total number of packets seen in the forward direction (src -> dst)
• forward_bytes: the total number of Bytes seen in the forward direction (src -> dst)
• forward_delta_packets: the number of packets seen since the last forward flow detection
• forward_delta_bytes: the number of Bytes seen since the last forward flow detection
• forward_inst_pps: the instantaneous packets per second in the forward direction (src->dst)
• forward_avg_pps: the average packets per second in the forward direction (src->dst)
• forward_inst_bps: the instantaneous Bytes per second in the forward direction (src->dst)
• forward_avg_bps: the average Bytes per second in the forward direction (src->dst)
• forward_last_time: the UTC clock value of the last time forward flow was detected
• reverse_packets: the total number of packets seen in the reverse direction (dst -> src)
• reverse _bytes: the total number of Bytes seen in the reverse direction (dst -> src)
• reverse _delta_packets: the number of packets seen since the last reverse flow detection
• reverse _delta_bytes: the number of Bytes seen since the last reverse flow detection
• reverse _inst_pps: the instantaneous packets per second in the reverse direction (dst->src)
• reverse _avg_pps: the average packets per second in the reverse direction (dst->src)
• reverse _inst_bps: the instantaneous Bytes per second in the reverse direction (dst->src)
• reverse _avg_bps: the average Bytes per second in the reverse direction (dst->src)
• reverse _last_time: the UTC clock value of the last time reverse flow was detected
1) Collect Training Data – collect training data for a specified traffic type. The traffic must be
2) Classify using Supervised Machine Learning – classify traffic type of flow between hosts using
Logistic Regression
3) Classify using Unsupervised Machine Learning – classify traffic type of flow between hosts using
K-Means Clustering.
The Distributed Internet Traffic Generator (D-ITG) application was used to generate the traffic flow data
used for training the Machine Learning models. D-ITG is described as ‘a platform capable to produce
IPv4 and IPv6 traffic by accurately replicating the workload of current Internet applications. D-ITG can
generate traffic following stochastic models for packet size (PS) and inter departure time (IDT) that
mimic application-level protocol behavior. D-ITG is able to replicate statistical properties of traffic of
different well-known applications (e.g. Telnet, VoIP – G.711, G.723, G.729, Voice Activity Detection,
Compressed RTP – DNS, network games’ [1]. For the purposes of this proof of concept traffic
classification, the following traffic types were used: Ping, Telnet, DNS, Voice (G.711). The choice of traffic
classes was due to limitations in simulation tools and issues faced in using D-ITG as discussed further in
• Telnet - Generates traffic with Telnet characteristics. It works with TCP transport layer protocol.
• DNS - Generates traffic with DNS characteristics. It works with both UDP and TCP transport layer
protocols.
• VoIP (voice) - Generate traffic with VoIP characteristics. It only works with UDP transport layer
protocol. Different settings will be ignored. The emulation of G.711 codec is used.
For Ping traffic, a simple ‘ping’ command was run with the overlay IP of the destination host.
The process to collect training data for the models is as follows: First simulate the flow of the specific
traffic between a certain pair of hosts using D-ITG or other tools. Second, start the traffic_classifier.py
script with the appropriate options for training the traffic type. The script starts the Ryu controller and
simple monitor script is collected and transformed to update the attributes of a Flow object. The
attributes of the Flow object are then periodically outputted to a CSV file. After the CSV files for each
traffic type are generated, they are combined in a complete Pandas Dataframe object used for the
The first Machine Learning algorithm used was a supervised Logistic Regression model. It is used to
predict categorical target variables. Mostly, the outcome is a binary value but in the case of multiple
targets, it picks the target with the highest probability of occurring. In our program, Logistic Regression
performed exceptionally well with an accuracy of over 99%. It is clear to see from the decision
boundaries, Figure 2, formed with the first two Principle Components why the accuracy is so high.
A Confusion Matrix can help pinpoint where the model is failing to accurately decide the target. It is a
matrix with predicted labels on the y-axis and true labels on the x-axis. If a model is tending to predict a
certain traffic class as another class more often, then it will be clearly evident in the Confusion Matrix.
However, we see, in Figure 3, that for Logistic Regression, there is almost no failure.
Figure 3: Confusion Matrix for Logistic Regression
K-Means Clustering is an unsupervised Machine Learning model that groups data points in multi-
dimensional space together to form clusters and label each data point in the cluster the same. The initial
desired number of clusters are chosen – in our case of four traffic types, four clusters were used as an
input parameter to the model. In Figure 4 below, we can attempt to visualize where each centroid of the
four clusters are. Each square represents a dimension. The darker the color, the higher the value in that
dimension. This shows where in the 12-dimensional vector space, the cluster centroids lie.
dimensions from 12 to 2 and then plot in two-dimensional space. We can clearly understand why model
performance for this algorithm was only around 30%. It is not very good at clustering non-circularly
groups of data.
Finally, we confirm our understanding by looking at the confusion matrix and seeing a very disbalanced
matrix.
Although the proof of concept design of this problem works well, it is limited in the following cases:
1) It does not classify multiple flows between the same source and destination nodes. For example,
if we start a ping command from host1 to host2, then add voice traffic from host1 to host2, it
will assume the ping and voice are 1 unique flow and classify as such.
2) If a flow is started and stopped, the algorithm will not delete the old flow but instead update it
with the latest flow statistics. This is a problem because the flow classifier considers average
packet size and average number of Bytes. If the flow is stopped for a significant amount of time,
these two features will be reduced, and the resulting label of traffic will be inaccurate.
3) The simulation tool (D-ITG) was unable to generate flows for game play or other video traffic.
With a lot of internet traffic being video nowadays, this would have been helpful for real world
classification situations.
4) We see that unsupervised learning using K-Means clustering performs very poorly. Perhaps this
model needs to be further tuned using Hyperparameter tuning. We can also consider other
models such as DBSCAN (Density Based Scanning) or CNN (Convoluted Neural Networks) for
In addition to the above, the following items will also help improve the quality of the project.
• Use visual analytics to point out ‘hot’ areas of high bandwidth and QoS traffic
• Connect ML application to actual SAVI testbed controllers and visualize real traffic flows
References
[3] QoS -aware Traffic Classification Architecture Using Machine Learning and Deep Packet Inspection
https://reader.elsevier.com/reader/sd/pii/S1877050918307129?token=481A2C61587FE2C6743C320DD
C7612381E99A6BAD5300B53B6C87D475380BC37D4A801FA23FA6B514383FD67EA186865
[5] Identification and Selection of Flow Features for Accurate Traffic Classification in SDN
https://ieeexplore.ieee.org/document/7371715?ALU=LU1046369