Intrusion Detection Using Decision Tree Approach
Jagdish Bhatta
Central Department of Computer Science and IT, Tribhuvan University, Nepal
[email protected]
Youbaraj Paudyal
Central Campus of Science and Technology, Midwestern University, Nepal
[email protected]
Nawaraj Paudel
Central Department of Computer Science and IT, Tribhuvan University, Nepal
[email protected]
Introduction
• With the tremendous growth in the use of computers over networks and in the applications running on various platforms, network security has attracted increasing attention.
• Intrusion is any attack that aims to compromise the security of a network. To detect such intrusions, an intrusion detection system (IDS) is used.
Introduction
• An Intrusion Detection System (IDS) is a system that monitors a network for harmful activities and reports events that do not meet the security criteria to the network administrator. The IDS concept was first introduced in 1980 by James P. Anderson and later improved by Dorothy Denning in 1987.
Introduction
• There are various classifiers that are applicable to misuse-based detection. Some are tree based, such as decision trees and random forests; some are rule based, such as OneR; and some are function based, such as SVM (Support Vector Machine).
• In this study, the J48 and CART decision trees [1, 2] are implemented and analyzed to classify input data from NSL-KDD as normal or anomalous.
• Decision trees are used to describe a decision-making process. A decision tree is a tree-like graph consisting of internal nodes, which represent tests on attributes; branches, which denote the outcomes of those tests; and leaf nodes, which signify class labels.
Introduction
• Decision trees can analyze data and identify significant characteristics of network traffic that indicate malicious activity. They can add value to many real-time security systems by analyzing large sets of intrusion detection data, and they can recognize trends and patterns that support further investigation, the development of attack signatures, and other monitoring activities.
Literature Review
• D.P. Gaikwad and R.C. Thool applied a Genetic Algorithm (GA) for intrusion detection and reported an accuracy of 79% [3].
• P. Aggarwal and S.K. Sharma [4] evaluated classification algorithms such as Random Forest, C4.5, Naïve Bayes, and Decision Table. Random Tree showed the best results overall, while C4.5 and Random Forest achieved high detection rates with low false alarm rates.
• S. Mallissery, S. Kolekar, and R. Ganiga [5] show how useful the NSL-KDD dataset is for various intrusion detection models. They use the PCA technique for dimensionality reduction and report that SVM gives the highest accuracy.
Literature Review
• M. Elbasiony et al. [6] have proposed a hybrid detection framework that depends on data mining classification and clustering techniques. In misuse detection, the random forests classification algorithm is used to build intrusion patterns automatically from a training dataset, and network connections are then matched against these patterns to detect intrusions. In anomaly detection, the k-means clustering algorithm is used to detect novel intrusions by clustering the network connection data so that most intrusions are grouped into one or more clusters.
Literature Review
• M. Kumar et al. [7] analyzed the performance of IDS using different decision tree algorithms and found that C5.0 performed best.
• Dataset
– The NSL-KDD dataset [9, 10] is used.
Methodology
J48 Classifier
• The C4.5 algorithm for building decision trees is implemented in Weka as a classifier called J48. It follows a greedy, top-down approach to decision tree construction.
• It is used for classification, in which new data is labeled according to already existing observations (the training data set).
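• As an illustration, the following is a minimal Weka (Java) sketch of this workflow; the ARFF file name and the 10-fold cross-validation setup are assumptions for illustration, not necessarily the exact configuration used in this study.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        // Load the training data (illustrative ARFF file name).
        Instances data = new DataSource("KDDTrain.arff").getDataSet();
        // The class attribute (normal/anomaly) is assumed to be the last one.
        data.setClassIndex(data.numAttributes() - 1);

        // Build a C4.5 decision tree; J48 is Weka's implementation of C4.5.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + " %");
    }
}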
Methodology
J48 Classifier
• Step-1: Create a root node N;
• Step-2: If all records in T belong to the same class C
{ make N a leaf node; mark N with class C; return N; }
• Step-3: For i = 1 to n
{ calculate Information_gain(Ai); }
• Step-4: ta = testing attribute;
• Step-5: N.ta = attribute having the highest information gain;
• Step-6: If (N.ta is continuous)
{ find a threshold for it; }
• Step-7: For each subset Ti obtained by splitting T on N.ta
if (Ti is empty) { the corresponding child of N is a leaf node; }
else { the child of N = dtree(Ti); /* recursive call */ }
• Step-8: Calculate the classification error rate of node N;
• Step-9: Return N;
Methodology
CART Classifier
• The SimpleCart method is the CART (Classification And Regression Trees) analysis implemented in Weka.
• It is used for data exploration as well as prediction. Classification and regression trees are classification methods that use historical data to construct decision trees. CART builds the decision tree from a learning sample, i.e., a set of historical data with preassigned classes for all observations.
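• A corresponding minimal Weka (Java) sketch for the CART classifier is shown below; it assumes that the SimpleCart classifier is available (in recent Weka versions it is distributed as the separate simpleCART package) and again uses an illustrative file name.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.SimpleCart;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CartExample {
    public static void main(String[] args) throws Exception {
        // Load the training data and mark the class attribute (assumed last).
        Instances data = new DataSource("KDDTrain.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // SimpleCart builds a binary CART tree with cost-complexity pruning.
        SimpleCart cart = new SimpleCart();
        cart.buildClassifier(data);

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new SimpleCart(), data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + " %");
    }
}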
Methodology
CART Classifier
• Step-1: Start at the root node.
• Step-2: For each ordered variable X,
convert it to an unordered variable X' by grouping its values in the node
into a small number of intervals; if X is unordered, set X' = X.
• Step-3: Perform a chi-squared test of independence of each X’ variable
versus Y on the data in the node and compute its significance probability.
• Step-4: Choose the variable X∗ associated with the X’ that has the smallest
significance probability.
• Step-5: Find the split set {X∗ ∈ S∗} that minimizes the sum of Gini indexes
and use it to split the node into two child nodes.
• Step-6: If a stopping criterion is reached, exit.
Otherwise, apply steps 2–5 to each child node.
• Step-7: Prune the tree with the CART method.
Methodology
• Information Gain
• The information gain is used to select the splitting attribute at each node in the tree. The attribute with the highest information gain is chosen as the splitting attribute for the current node:

Gain(S, a) = Entropy(S) - \sum_{v \in values(a)} \frac{|S_v|}{|S|} Entropy(S_v)

• where S is the set of all the examples in the given training set, S_v is the subset of S for which attribute a has value v, values(a) is the set of all possible values of attribute a, and |S_v| and |S| denote the number of examples in S_v and S.
• Entropy can be calculated as

Entropy(S) = - \sum_{i} p_i \log_2 p_i

where p_i is the proportion of examples in S that belong to class i.
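• To make these formulas concrete, the small Java sketch below computes the entropy of a two-class (normal/anomaly) set and the information gain of a candidate split; the class counts are made-up illustrative numbers, not values from the NSL-KDD experiments.

public class InfoGainExample {
    // Entropy of a set with the given per-class counts: -sum p_i * log2(p_i).
    static double entropy(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Illustrative parent set S: 70 normal and 30 anomalous records.
        double parent = entropy(70, 30);
        // Illustrative split of S on an attribute a into two subsets.
        double gain = parent
                - (65.0 / 100) * entropy(60, 5)    // subset with 65 records
                - (35.0 / 100) * entropy(10, 25);  // subset with 35 records
        System.out.printf("Entropy(S) = %.4f, Gain(S, a) = %.4f%n", parent, gain);
    }
}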
Methodology
• Information Gain
• The information gain of an attribute in the CART classifier is computed from the Gini index (the quantity minimized in Step-5 of the algorithm above):

Gini(S) = 1 - \sum_{i} p_i^2

and the gain of splitting S on attribute a into subsets S_v is

\Delta Gini(S, a) = Gini(S) - \sum_{v} \frac{|S_v|}{|S|} Gini(S_v)

where p_i is the proportion of examples in S that belong to class i.
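• For comparison with the entropy-based criterion, the sketch below computes the Gini index and the Gini-based gain for the same made-up counts used in the previous example.

public class GiniGainExample {
    // Gini index of a set with the given per-class counts: 1 - sum p_i^2.
    static double gini(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double sumSq = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    public static void main(String[] args) {
        // Same illustrative parent set and split as in the information-gain example.
        double parent = gini(70, 30);
        double weighted = (65.0 / 100) * gini(60, 5) + (35.0 / 100) * gini(10, 25);
        System.out.printf("Gini(S) = %.4f, Gini gain = %.4f%n", parent, parent - weighted);
    }
}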
Results
[Figure: Graph showing the accuracy in percentage of the J48 and CART classifiers on the normal/anomaly classification (x-axis: Classifier; y-axis: Accuracy in Percentage, approximately 0.9854 to 0.9866).]
Results
Graph Showing time in seconds to build tree by J48 and CART Classifier
Conclusion
• Intrusion is any attack that aims to compromise the security of a network. An intrusion detection decision tree assists the network administrator in deciding whether incoming traffic is malicious or not by providing a model that separates malicious from non-malicious traffic.
• In this study, the J48 and CART classifiers are used to detect possible intrusions, i.e., to determine whether the input data is malicious or non-malicious. During the experiment, the input test data are classified as normal or anomalous.
• From the experimental results, it is observed that the accuracy of the J48 classifier is 98.59% and that of the CART classifier is 98.67%.
• J48 is found to be more computationally efficient, as it builds the model in 3.94 seconds while CART takes 36.48 seconds.
References
[1] N. Bhargava, G. Sharma, R. Bhargava, and M. Mathuria (2013), "Decision Tree Analysis on J48 Algorithm for Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering.
[2] P.N. Tan, M. Steinbach, and V. Kumar (2005), Introduction to Data Mining, Pearson Addison Wesley.
[3] D.P. Gaikwad and R.C. Thool (2015), Intrusion Detection System Using Bagging with Partial Decision Tree Base Classifier, Proc. of the 4th International Conference on Advances in Computing, Communication and Control.
[4] P. Aggarwal and S.K. Sharma (2015), An Empirical Comparison of Classifiers to Analyze Intrusion Detection, Proc. of the Fifth International Conference on Advanced Computing and Communication Technologies.
[5] S. Mallissery, S. Kolekar, and R. Ganiga (2013), Accuracy Analysis of Machine Learning Algorithms for Intrusion Detection System using NSL-KDD Dataset, Proc. of the International Conference on Future Trends in Computing and Communication (FTCC).
References
[6] M. Elbasiony, E.A. Sallam, T.E. Eltobely, and M.M. Fahmy (2013), A hybrid network intrusion detection framework based on random forests and weighted k-means, Ain Shams Engineering Journal, 753-762.
[7] M. Kumar, M. Hanumanthappa, and T.V. Suresh Kumar (2012), Intrusion Detection System Using Decision Tree Algorithm, Proc. of the IEEE 14th International Conference on Communication Technology, DOI: 10.1109/ICCT.2012.6511281.
[8] G. Zhai and C. Liu (2010), Research and Improvement on ID3 Algorithm in Intrusion Detection System, Sixth International Conference on Natural Computation.
[9] M. Tavallaee, E. Bagheri, W. Lu, and A.A. Ghorbani (2009), A detailed analysis of the KDD CUP 99 data set, in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl.
[10] NSL-KDD dataset [online]. Available: http://nsl.cs.unb.ca/nsl-kdd/.