
Network Traffic Classification for Inferring Network Behavior

We present our work for the track of “Case studies demonstrating (dis)advantages of choosing AI/ML techniques for networking over more traditional ones”. The focus of this research is a comparative analysis of classical ML models and neural models in their ability to infer known network behavior from payload-independent communication patterns, observed as statistical characteristics over a time window, by building a supervised multi-class classifier. We carry out our experiments using the NIMS data trace, a publicly available network dataset that has been widely used to build network models. Our premise is to build machine learning models using all available traditional techniques, including statistics, classic machine learning algorithms, and neural networks. The aim of this exploration is to highlight the importance of each learning paradigm and to show how all of them can be leveraged together for an overall benefit, rather than choosing one over another. Such a premise leads us to think about the different ways an augmented solution can be built.
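To make "statistical characteristics over a time window" concrete, the following minimal sketch (in Python/pandas, matching the repository's Jupyter notebooks) groups a hypothetical packet trace by flow and time window and derives payload-independent features. The file name, column names, window length, and feature set are assumptions for illustration, not the exact pipeline used in this work.

```python
import pandas as pd

# Hypothetical per-packet trace with columns: flow_id, timestamp (seconds), pkt_size.
# The real NIMS trace and the feature set used in the notebooks may differ.
packets = pd.read_csv("nims_packets.csv")

WINDOW = "10s"  # assumed window length, chosen only for illustration

# Payload-independent statistical features per flow and time window.
features = (
    packets
    .assign(window=pd.to_datetime(packets["timestamp"], unit="s").dt.floor(WINDOW))
    .groupby(["flow_id", "window"])
    .agg(
        pkt_count=("pkt_size", "size"),
        mean_pkt_size=("pkt_size", "mean"),
        std_pkt_size=("pkt_size", "std"),
        bytes_total=("pkt_size", "sum"),
    )
    .fillna(0.0)
    .reset_index()
)
print(features.head())
```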

Decision trees

We have used J48 trees in our experiments. J48 is an open-source Java implementation of the C4.5 algorithm in the Weka data mining tool.
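The experiments themselves use Weka's J48. As a rough Python analogue for readers of the notebooks, the sketch below fits scikit-learn's DecisionTreeClassifier with the entropy criterion, which approximates C4.5's information-based splitting but is not an exact J48 reimplementation. The feature matrix and labels are assumed to come from the windowed statistics sketched above; the label vector is hypothetical.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# X: windowed statistical features; y: known application/behavior labels (assumed to exist).
X = features[["pkt_count", "mean_pkt_size", "std_pkt_size", "bytes_total"]]
y = labels  # hypothetical multi-class label vector aligned with `features`

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Entropy-based splits mimic C4.5's information-gain criterion (CART-style, not true J48).
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```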

Algorithm

C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data is a set of already classified samples. Each sample consists of a vector which contains features of the sample, as well as the class in which the sample falls. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the partitioned sublists.
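To make the splitting criterion concrete, the short sketch below computes entropy and the normalized information gain (gain ratio) of a single candidate attribute on a toy label set. The variable names and toy data are illustrative only.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(labels, attribute):
    """C4.5-style normalized information gain of splitting `labels` by `attribute` values."""
    total = len(labels)
    cond_entropy = 0.0
    split_info = 0.0
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        weight = len(subset) / total
        cond_entropy += weight * entropy(subset)
        split_info -= weight * np.log2(weight)
    info_gain = entropy(labels) - cond_entropy
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: six samples, one binary attribute, two classes.
y = np.array(["dns", "dns", "ssh", "ssh", "ssh", "dns"])
attr = np.array([0, 0, 1, 1, 1, 0])
print(gain_ratio(y, attr))  # 1.0 here, since the attribute perfectly separates the classes
```

At each node, C4.5 evaluates this quantity for every candidate attribute and splits on the one with the highest value.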

Advantages

  1. Simple to understand and interpret
  2. Can generate important insights
  3. Help determine worst, best, and expected values for different scenarios
  4. Use a white-box model
  5. Can be combined with other decision techniques

Disadvantages

  1. They are unstable: a small change in the data can lead to a large change in the structure of the optimal decision tree
  2. They are often relatively inaccurate on their own, but random forests of multiple decision trees can give better results (see the sketch after this list)
  3. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked
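As a follow-up to the second point, here is a minimal sketch of the ensemble alternative: a scikit-learn RandomForestClassifier trained on the same (assumed) feature matrix, so its accuracy can be compared against the single tree fitted earlier. The feature and label variables are the hypothetical ones from the previous sketches.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Bagged ensemble of decision trees; usually more stable and accurate than a single tree.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("single tree:  ", accuracy_score(y_test, clf.predict(X_test)))
print("random forest:", accuracy_score(y_test, rf.predict(X_test)))
```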
