A Decision Tree Algorithm for Intrusion Detection
Keywords - Decision Tree, Information Gain, Gain Ratio, NSL-KDD, Signature-based IDS
Date of Submission: Dec 22, 2015 Date of Acceptance: Dec 28, 2015
1. INTRODUCTION

An Intrusion Detection System (IDS) is a system that monitors a network for harmful activities and reports events that do not meet the security criteria to the network administrator. IDSs are categorized as signature based and anomaly based. A signature based (or misuse based) IDS uses various techniques to locate similarity between system behavior and previously known attacks stored in a signature database. An anomaly based IDS detects activities in a network that deviate from the normal behaviors stored in a database of system profiles. Various classifiers are applicable to misuse based detection: some are tree based, such as the decision tree [1] and the random forest [2]; some are rule based, such as OneR [3]; and some are function based, such as the SVM (Support Vector Machine) [4]. In this paper, the decision tree classifier is used to classify input data as normal or anomalous.

A decision tree is a tree-like graph consisting of internal nodes, which represent a test on an attribute; branches, which denote the outcomes of the test; and leaf nodes, which signify a class label. Classification rules are formed by the path selected from the root node to a leaf. To divide the input data, the root node is chosen first, as it is the most prominent attribute for separating the data. The tree is constructed by identifying the attributes, and their associated values, that will be used to analyze the input data at each intermediate node of the tree. Once the tree is formed, it can classify newly arriving data by traversing from the root node to a leaf node, visiting all the internal nodes on the path according to the test conditions on the attributes at each node [5]. The main issue in constructing a decision tree is which value to choose for splitting each node of the tree; this issue is addressed in Section 3.

Decision trees can analyze data and identify significant characteristics in the network that indicate malicious activity. They can add value to many real-time security systems by analyzing large sets of intrusion detection data, and they can recognize trends and patterns that support further investigation, the development of attack signatures, and other monitoring activities. The main advantage of decision trees over other classification techniques is that they provide a rich set of rules that are easy to understand and can be effortlessly integrated with real-time technologies [6].

NSL-KDD is the latest dataset for intrusion detection. This dataset consists of 41 features; however, not all of these features are of equal importance.
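The root-to-leaf traversal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the node structure, thresholds, and the choice of NSL-KDD feature names ('src_bytes', 'count') are hypothetical:

```python
# Minimal sketch of classifying a record by traversing a decision tree.
# The tree layout and the attribute thresholds below are illustrative only.

class Node:
    def __init__(self, attribute=None, threshold=None,
                 left=None, right=None, label=None):
        self.attribute = attribute    # attribute tested at this internal node
        self.threshold = threshold    # split value used in the test
        self.left = left              # branch taken when value <= threshold
        self.right = right            # branch taken when value > threshold
        self.label = label            # class label (set only on leaf nodes)

def classify(node, record):
    """Walk from the root to a leaf, applying the test at each internal node."""
    while node.label is None:
        if record[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label

# A toy two-level tree: the root tests 'src_bytes'; one child tests 'count'.
tree = Node(attribute="src_bytes", threshold=500.0,
            left=Node(label="normal"),
            right=Node(attribute="count", threshold=10.0,
                       left=Node(label="normal"),
                       right=Node(label="anomaly")))

print(classify(tree, {"src_bytes": 1500.0, "count": 25.0}))  # anomaly
```

Each classification rule corresponds to one root-to-leaf path; here, "src_bytes > 500 and count > 10 implies anomaly" is one such rule.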
Int. J. Advanced Networking and Applications 2829
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
If the complete feature set is used to classify the input data, the classifier takes more time to detect intrusions, and the irrelevant features can also affect its accuracy. Therefore, before performing any classification, this set needs to be reduced by applying some feature selection method. Feature selection is done to remove irrelevant and redundant features. In the literature there are various feature selection methods, such as information gain [7], PCA (Principal Component Analysis), and GA (Genetic Algorithm). For the classification of network data, several classifiers are available, such as KNN (k-nearest neighbor), SVM, ANN (Artificial Neural Network), and the decision tree. C4.5 builds a decision tree using the notion of information entropy from a set of training data. At each node of the tree, the algorithm picks the attribute that most efficiently divides the given data into smaller subsets associated with the classes in the training set. The dividing factor here is the gain ratio: the attribute with the highest gain ratio is selected for the split [8].

The rest of the paper is structured as follows. Section 2 gives a brief review of related work on intrusion detection based on feature selection and classifiers. Section 3 provides the proposed algorithm, DTS, for developing the decision tree. Section 4 shows the experimental results on the NSL-KDD dataset. Section 5 includes the conclusion and the scope for future work.

2. RELATED WORK

A literature review has been done covering recent papers that train and test their systems on the NSL-KDD dataset. The review is organized around feature selection and classification.

2.1 REVIEW BASED ON FEATURE SELECTION

Gaikwad and Thool [9] applied a Genetic Algorithm (GA) to the NSL-KDD dataset to select relevant features. The GA selects 15 of the 41 available features. These 15 features give 79% accuracy on the test data with a decision tree as the classifier, and it takes 176 seconds to build the model.

Bajaj and Arora [10] discuss various feature selection methods such as information gain, gain ratio, and correlation based feature selection. In their paper they select 33 of the 41 features for classification and compare the results of various classifiers. The Simple CART algorithm gives the highest accuracy, 66.77%, whereas the classification accuracy of the C4.5 decision tree is only 65.65%. Alazab et al. [11] also select features using information gain and use a decision tree to detect both old and new attacks.

Thaseen and Kumar [12] used two feature selection methods, namely Correlation based Feature Selection (CFS) and Consistency-based Feature Selection (CONS). In their paper, 8 features are selected using CFS, classification is done using Naïve Bayes, the C4.5 decision tree, and the AD (Alternating Decision) tree, and the results are compared. CONS selects 10 useful features out of 41, after which classification with techniques such as Random Forest and Random Tree is analyzed.

Revathi and Malathi [13] select 15 features using CFS and test various classifiers, namely Random Forest, the C4.5 decision tree, SVM, CART, and Naïve Bayes. The results of these classifiers are compared, and the outcome shows that Random Forest gives the highest accuracy in detecting attacks.

Jyothsna and Prasad [14] evaluate and compare the performance of IDSs under different feature selection techniques, namely information gain, gain ratio, Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO), and Principal Component Analysis (PCA). The results show that OLSP-QPSO reduces the largest number of attributes and achieves a low false alarm rate with a high detection rate compared with the remaining feature selection techniques.

In [15], the authors proposed a hybrid KNN and neural network based multilevel classification model. In this model, KNN was used as a classifier for anomaly detection with two classes, namely 'normal' and 'abnormal'. A neural network was then used to detect the specific type of attack within the 'abnormal' class. The NSL-KDD dataset was used for the experiments. First, all the features of the dataset were used for classification; then classification was performed on 25 features selected by Rough Set Theory and by information gain separately. In this classification model, information gain with 25 features of the NSL-KDD dataset produced better results than the 25 features selected with Rough Set Theory, as well as better results than all 41 features of the NSL-KDD dataset.

2.2 REVIEW BASED ON CLASSIFIERS

Elekar and Waghmare [16] implement different classifiers, namely the C4.5 decision tree, Random Forest, Hoeffding Tree, and Random Tree, for intrusion detection and compare the results using WEKA. The results show that the Hoeffding Tree gives the best result among these classifiers for detecting attacks on the test data.

Aggarwal and Sharma [1] evaluate ten classification algorithms, including Random Forest, C4.5, Naïve Bayes, and Decision Table, simulating them in WEKA on the KDD'99 dataset. The ten classifiers are analyzed according to metrics such as accuracy, precision, and F-score. Random Tree shows the best results overall, while the algorithms with a high detection rate and a low false alarm rate are C4.5 and Random Forest.

In [17], the authors show how useful the NSL-KDD dataset is for various intrusion detection models.
For dimensionality reduction, the PCA technique was used in that paper. Six different algorithms, namely ID3, Bayes Net, J48, CART, SVM, and Naïve Bayes, were used for experimentation with and without feature reduction, and the results made it clear that SVM gives the highest accuracy in both cases.

In [18], the authors designed a multi-layer hybrid machine learning IDS. PCA was used for attribute selection, and only 22 features were selected in the first layer of the IDS. A GA was used in the next layer to generate detectors that can distinguish between normal and abnormal behavior. In the third layer, classification was done using several classifiers. The results demonstrate that Naïve Bayes has good accuracy for two types of attacks, namely User-to-Root (U2R) and Remote-to-Local (R2L) attacks, whereas the decision tree gives higher accuracy: up to 82% for Denial-of-Service attacks and 65% for probe attacks.

The system proposed by Raeeyat et al. in [19] consists of four modules, namely a data pre-processing module, a misuse detection module, an anomaly detection module, and an evaluation and comparison module. Data are pre-processed by the data pre-processing module before being passed to the other modules. In the misuse detection module, the pre-processed data are given to PCA to extract the important features; the data are then examined using an AdaBoost algorithm based on the C4.5 decision tree to determine whether a packet is normal or an intrusion, and the outcome of the decision tree is passed on to the next module for evaluation and comparison. When data are sent to the misuse detection module, they are simultaneously sent to the anomaly detection module as well. There, the correlation among features is found by the correlation unit using Pearson correlation, and a data correlation graph is used to show deviation from normal behavior. Finally, the evaluation and comparison module takes the outputs of the misuse and anomaly detection modules and marks an instance as an intrusion only if both modules flag it as one.

In [20], a hybrid IDS is proposed. Random Forest is used for classification in misuse detection to build patterns of intrusion from a training dataset, while weighted k-means clustering is used in anomaly detection. Because of the low correlation between clusters, the false alarm rate grows as the number of clusters is increased to detect a larger number of attacks.

The steps of the proposed DTS algorithm for constructing the decision tree are as follows:

1. If all the given training examples belong to the same class, then a leaf node is created for the decision tree by choosing that class.

2. For every feature 'a', calculate the gain ratio by dividing the information gain of the attribute by the split value of that attribute:

   GainRatio(a) = IG(a) / Split(a)

   where S is the set of all the examples in the given training set.

3. The information gain of an attribute is computed as

   IG(a) = Ent(S) - Σ_{a_val ∈ values(a)} (|S_a| / |S|) × Ent(S_a)

   where S_a is the subset of examples of S for which attribute 'a' takes the value a_val, values(a) is the set of all possible values of attribute 'a', and |·| denotes the number of examples in a set.

4. Entropy can be calculated as

   Ent(S) = - Σ_{j=1}^{num_class} (freq(L_j, S) / |S|) × log2(freq(L_j, S) / |S|)

   where L = {L_1, L_2, …, L_n} is the set of classes and num_class is the number of distinct classes. For our purposes there are only two classes, namely 'normal' and 'anomaly', so num_class = 2.

5. The split value of an attribute is chosen by taking the average of all the values in the domain of that particular attribute:

   Split(a) = (Σ_{i=1}^{m} a_val_i) / m

   where m is the number of values of attribute 'a'.

6. Find the attribute with the highest gain ratio. Suppose the highest gain ratio is for the attribute 'a_best'.

7. Construct a decision node that divides the dataset on the attribute 'a_best'.
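The splitting criterion of steps 2–5 can be sketched as follows. This is an illustrative reimplementation, not the authors' code; it assumes numeric attribute values and a binary partition of the examples at the mean split value, which the steps above leave implicit:

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(S) = -sum_j (freq(L_j, S)/|S|) * log2(freq(L_j, S)/|S|)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_value(values):
    """Split(a): the average of all values in the attribute's domain."""
    return sum(values) / len(values)

def gain_ratio(values, labels):
    """GainRatio(a) = IG(a) / Split(a), partitioning S at the mean of 'a'."""
    s = split_value(values)
    left  = [lab for v, lab in zip(values, labels) if v <= s]
    right = [lab for v, lab in zip(values, labels) if v > s]
    n = len(labels)
    ig = entropy(labels) - sum(
        (len(part) / n) * entropy(part) for part in (left, right) if part
    )
    return ig / s

# Toy data: one numeric attribute and the two classes used in the paper.
vals   = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ['normal', 'normal', 'normal', 'anomaly', 'anomaly', 'anomaly']
print(split_value(vals))                 # 6.5 -> mean split value
print(round(gain_ratio(vals, labels), 4))
```

In this toy case the mean split (6.5) separates the classes perfectly, so the information gain equals the full entropy (1.0) and the gain ratio is 1.0 / 6.5. Note that, unlike C4.5, DTS divides the information gain by the raw attribute mean rather than by split information.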
Figure 2. Comparison of ROC Curve of proposed algorithm with various other classifiers
Figure 3. Time of construction of model of the proposed algorithm with various tree based classifiers
5. CONCLUSION AND FUTURE SCOPE OF WORK

The decision tree assists the network administrator in deciding about incoming traffic, i.e., whether the incoming data is malicious or not, by providing a model that separates malicious from non-malicious traffic. DTS modifies the split value calculation by taking the average of all the values in the domain of an attribute, so the algorithm gives uniform weightage to all the values in the domain. It allows a smaller number of attributes to be used while providing acceptable accuracy in a reasonable amount of time. From the experimental results, it is concluded that the proposed algorithm for signature based intrusion detection is more efficient at finding attacks in the network with fewer features, and that it takes less time to construct the model. It is also concluded that the efficiency depends on the size of the dataset and the number of features used to construct the decision tree. The formula used in DTS to calculate the gain ratio can also be used in attribute selection for feature reduction. Our future scope of work is to improve the split value by using concepts such as the geometric mean, which also gives uniform weightage to the domain values.

REFERENCES

[1] P. Aggarwal, and S.K. Sharma, An Empirical Comparison of Classifiers to Analyze Intrusion Detection, Proc. of the Fifth International Conference on Advanced Computing and Communication Technologies, 2015.

[2] T.K. Ho, Random Decision Forests, Proc. of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995, pp. 278–282.

[4] C. Cortes, and V. Vapnik, Support-vector networks, Machine Learning 20 (3): 273, 1995.

[5] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2005.

[6] J. Markey, Using Decision Tree Analysis for Intrusion Detection: A How-To Guide, SANS Institute InfoSec Reading Room, June, 2011.

[7] T.M. Mitchell, Machine Learning, The McGraw-Hill Companies, Inc., 1997, ISBN 0070428077.

[8] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[9] D.P. Gaikwad, and R.C. Thool, Intrusion Detection System Using Bagging with Partial Decision Tree Base Classifier, Proc. of the 4th International Conference on Advances in Computing, Communication and Control, 2015.

[10] K. Bajaj, and A. Arora, Improving the Intrusion Detection using Discriminative Machine Learning Approach and Improve the Time Complexity by Data Mining Feature Selection Methods, International Journal of Computer Science, vol. 76, Aug, 2013.

[11] A. Alazab, M. Hobbs, J. Abawajy, and M. Alazab, Using Feature Selection for Intrusion Detection System, International Symposium on Communications and Information Technologies, 2012.

[12] S. Thaseen, and Ch. A. Kumar, An Analysis of Supervised Tree Based Classifiers for Intrusion Detection System, Proc. of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering, Feb, 2013.