
Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290

Decision Tree Based Algorithm for Intrusion Detection
Kajal Rai
Research Scholar, Department of Computer Science and Applications, Panjab University, Chandigarh, India
Email: [email protected]
M. Syamala Devi
Professor, Department of Computer Science and Applications, Panjab University, Chandigarh, India
Email: [email protected]
Ajay Guleria
System Manager, Computer Center, Panjab University, Chandigarh, India
Email: [email protected]
-----------------------------------------------------ABSTRACT------------------------------------------------------
An Intrusion Detection System (IDS) is a defense measure that supervises the activities of a computer network and
reports malicious activities to the network administrator. Intruders make many attempts to gain access to the
network and to harm the organization's data, so security is a critical concern for any type of organization. For
these reasons, intrusion detection has been an important research issue. An IDS can be broadly classified as a
Signature based IDS or an Anomaly based IDS. In the proposed work, a decision tree algorithm is developed based
on the C4.5 decision tree approach. Feature selection and the split value are important issues in constructing a
decision tree, and the algorithm presented in this paper is designed to address these two issues. The most
relevant features are selected using information gain, and the split value is chosen in such a way that the
classifier is unbiased towards the most frequent values. Experimentation is performed on the NSL-KDD (Network
Security Laboratory Knowledge Discovery and Data Mining) dataset with varying numbers of features. The time
taken by the classifier to construct the model and the accuracy achieved are analyzed. It is concluded that the
proposed Decision Tree Split (DTS) algorithm can be used for signature based intrusion detection.

Keywords - Decision Tree, Information Gain, Gain Ratio, NSL-KDD, Signature-based IDS

--------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: Dec 22, 2015                Date of Acceptance: Dec 28, 2015
--------------------------------------------------------------------------------------------------------------------------------------
1. INTRODUCTION

An Intrusion Detection System (IDS) is a system that monitors a network for harmful activities and reports events that do not meet the security criteria to the network administrator. IDSs are categorized as Signature based and Anomaly based. A Signature or Misuse based IDS uses various techniques to locate the similarity between system behavior and previously known attacks stored in a signature database. An Anomaly based IDS detects activities in a network which deviate from the normal behaviors stored in a system profiles database. There are various classifiers that are applicable to misuse based detection. Some are tree based, such as the decision tree [1] and random forest [2], some are rule based, such as OneR [3], while some are function based, such as the SVM (Support Vector Machine) [4]. In this paper, the decision tree classifier is used to classify input data as normal or anomalous.

A Decision Tree is a tree-like graph consisting of internal nodes, which represent a test on an attribute; branches, which denote the outcomes of the test; and leaf nodes, which signify a class label. The classification rules are formed by the path selected from the root node to a leaf. To divide the input data, first the root node is chosen, as it is the most prominent attribute for separating the data. The tree is constructed by identifying the attributes and their associated values which will be used to analyze the input data at each intermediate node of the tree. After the tree is formed, it can prefigure newly arriving data by traversing from the root node to a leaf node, visiting the internal nodes in the path depending upon the test conditions of the attributes at each node [5]. The main issue in constructing a decision tree is which value is chosen for splitting a node of the tree. This issue is taken care of in Section 3.

Decision trees can analyze data and identify significant characteristics in the network that indicate malicious activities. They can add value to many real-time security systems by analyzing large sets of intrusion detection data. They can recognize trends and patterns that support further investigation, the development of attack signatures, and other monitoring activities. The main advantage of using decision trees instead of other classification techniques is that they provide a rich set of rules that are easy to understand and can be effortlessly integrated with real-time technologies [6].

NSL-KDD is the latest dataset for intrusion detection. This dataset consists of 41 features; however, not all the

features are of equal importance. If the complete feature set is used to classify the input data, the classifier will take more time to detect intrusions, and irrelevant features can also affect its accuracy. That is why, before performing any classification, we need to reduce this set by applying some feature selection method. Feature selection is done to remove irrelevant and redundant features. In the literature, there are various feature selection methods such as information gain [7], PCA (Principal Component Analysis), and GA (Genetic Algorithm). For classification of network data several classifiers are available, such as KNN (k-nearest neighbor), SVM, ANN (Artificial Neural Network), and the decision tree. C4.5 builds a decision tree by using the notion of information entropy from a set of training data. At each node of the tree, the algorithm picks out the attribute which most efficiently divides the given data into smaller subsets associated with a class in the given training set. The dividing factor here is the gain ratio: the attribute with the highest gain ratio is selected for the judgment [8].

The rest of the paper is structured as follows. Section 2 gives a brief review of related work on intrusion detection based on feature selection and classifiers. Section 3 provides the proposed DTS algorithm for developing the decision tree. In Section 4, the experimental results using the NSL-KDD dataset are shown. Section 5 includes the conclusion and scope for future work.

2. RELATED WORK

A literature review has been done, including recent papers that perform training and testing of the system on the NSL-KDD dataset. The review is organized by feature selection and classification.

2.1 REVIEW BASED ON FEATURE SELECTION

Gaikwad and Thool [9] have applied a Genetic Algorithm (GA) on the NSL-KDD data set to select relevant features. The GA selects 15 features out of the 41 available in the data set. These 15 features give 79% accuracy on test data with a decision tree as the classifier, and it takes 176 seconds to build the model.

Bajaj and Arora [10] discuss various feature selection methods such as information gain, gain ratio, and correlation based feature selection. In their paper, they select 33 features out of 41 for classification, and the results for various classifiers are compared. The Simple Cart algorithm gives the highest accuracy, 66.77%, whereas the classification result of the C4.5 decision tree is only 65.65%. Alazab et al. [11] also select features using information gain and a decision tree to detect both old and new attacks.

Thaseen and Kumar [12] used two useful methods for feature selection, namely, Correlation based Feature Selection (CFS) and Consistency-based Feature Selection (CONS). In their paper, 8 features are selected using CFS, and the classification is done using Naïve Bayes, the C4.5 decision tree, and the AD (Alternating Decision) tree, and their results have been compared. CONS selects 10 useful features out of 41, and the classification using various techniques such as Random Forest and Random Tree has been analyzed.

Revathi and Malathi [13] select 15 features using CFS and test various classifiers such as Random Forest, the C4.5 decision tree, SVM, CART, and Naïve Bayes. The results of the above mentioned classifiers have been compared, and the outcome shows that Random Forest gives the highest accuracy in detecting attacks.

Jyothsna and Prasad [14] evaluate and compare the performance of IDSs for different feature selection techniques such as information gain, gain ratio, Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO), and Principal Component Analysis (PCA). Results show that the OLSP-QPSO technique reduces more attributes and has a low false alarm rate with a high detection rate when compared with the remaining feature selection techniques.

In [15], the authors proposed a hybrid KNN and Neural Network based multilevel classification model. In this model, KNN was used as a classifier for anomaly detection with two classes, namely, 'normal' and 'abnormal'. After that, a neural network was used to detect the specific type of attack in the 'abnormal' class. For the experiments, the NSL-KDD dataset was used. First, all the features of the dataset were used for classification. Then classification was performed on 25 selected features, with selection done by Rough Set Theory and by Information Gain separately. In this classification model, Information Gain with 25 features of the NSL-KDD dataset produced better results than 25 features with Rough Set Theory, as well as all 41 features of the NSL-KDD dataset.

2.2 REVIEW BASED ON CLASSIFIERS

Elekar and Waghmare [16] implement different classifiers such as the C4.5 decision tree, Random Forest, Hoeffding Tree, and Random Tree for intrusion detection and compare the results using WEKA. The results show that the Hoeffding Tree gives the best result among the various classifiers for detecting attacks on the test data.

Aggarwal and Sharma [1] evaluate ten classification algorithms such as Random Forest, C4.5, Naïve Bayes, and Decision Table. They simulate these classification algorithms in WEKA with the KDD'99 dataset. The ten classifiers are analyzed according to metrics such as accuracy, precision, and F-score. Random Tree shows the best results overall, while the algorithms with a high detection rate and low false alarm rate were C4.5 and Random Forest.

In [17] the authors show how useful the NSL-KDD dataset is for various intrusion detection models. For

dimensionality reduction, the PCA technique was used. Six different algorithms, namely, ID3, Bayes Net, J48, CART, SVM, and Naïve Bayes, were used for the experimentation with and without feature reduction, and from the results it was clear that SVM gives the highest accuracy in both cases.

In [18], the authors designed a multi-layer hybrid machine learning IDS. PCA was used for attribute selection, and only 22 features were selected in the first layer of the IDS. A GA was used in the next layer for generating detectors, which can distinguish between normal and abnormal behavior. In the third layer, classification was done using several classifiers. Results demonstrate that Naïve Bayes has good accuracy for two types of attacks, namely, User-to-Root (U2R) and Remote-to-Local (R2L) attacks; however, the decision tree gives higher accuracy, up to 82% for Denial-of-Service attacks and 65% for probe attacks.

The system proposed by Raeeyat et al. in [19] consists of 4 modules, namely, a data pre-processing module, a misuse detection module, an anomaly detection module, and an evaluation and comparison module. Data are pre-processed by the data pre-processing module before passing to the other modules. In the misuse detection module, the pre-processed data are given to PCA to take out the important features. After that, the data are examined using the Adaboost algorithm based on the C4.5 decision tree to determine whether each packet is normal or an intrusion. The outcome of the decision tree is then passed on to the next module for evaluation and comparison. When the data are sent to the misuse detection module, they are simultaneously sent to the anomaly detection module as well. The correlation among features is also found by the correlation unit using Pearson Correlation, and a data correlation graph is used to show deviation from normal behavior. Then, the evaluation and comparison module determines whether an instance is an intrusion or not by taking the outputs from the misuse and anomaly detection modules; only if both modules flag it as an intrusion is the instance considered an intrusion.

In [20], a hybrid IDS is proposed. Random Forest is used for classification in misuse detection to build patterns of intrusion from a training dataset. Weighted k-means clustering is used in anomaly detection. Due to low correlation between clusters, there is a high false alarm rate as the number of clusters increases for detecting a larger number of attacks.

3. DECISION TREE SPLIT (DTS) ALGORITHM

The Decision Tree Split (DTS) algorithm is based on the C4.5 decision tree algorithm [11] [21]. The main issue in constructing a decision tree is the split value of a node, and the proposed algorithm gives a novel approach to selecting this value. The steps of the algorithm are as follows:

1. If all the given training examples belong to the same class, then a leaf node is created for the decision tree by choosing that class.

2. For every attribute 'a', calculate the gain ratio by dividing the information gain of the attribute by its split value. The formula for the gain ratio is

       GainRatio(a) = IG(a) / Split(a)

   where S is the set of all the examples in the given training set.

3. The information gain of an attribute is computed as

       IG(a) = Ent(S) - Σ_{a_val ∈ values(a)} (|S_a| / |a|) * Ent(S_a)

   where S_a is the subset of S taking value a_val, values(a) is the set of all possible values of attribute 'a', and |a| is the total number of values of attribute 'a'.

4. The entropy can be calculated as

       Ent(S) = - Σ_{j=1}^{num_class} (freq(L_j, S) / |S|) * log2(freq(L_j, S) / |S|)

   where L = L1, L2, …, Ln is the set of classes and num_class is the number of distinct classes. In our case num_class has only two values, namely, 'normal' and 'anomaly'.

5. The split value of an attribute is chosen by taking the average of all the values in the domain of that particular attribute. It can be formulated as

       Split(a) = (Σ_{i=1}^{m} a_val_i) / m

   where m is the number of values of attribute 'a'.

6. Find the attribute with the highest gain ratio. Suppose the highest gain ratio is for the attribute 'a_best'.

7. Construct a decision node that divides the dataset on the attribute 'a_best'.

8. Repeat steps 1 to 7 on each subset produced by dividing the set on attribute 'a_best', and insert those nodes as descendants of the parent node.

For comparison, the C4.5 algorithm uses the following function for calculating the split value of an attribute:

       Split(a) = - Σ_{a_val ∈ values(a)} (|S_a| / |a|) * log2(|S_a| / |a|)
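The scoring functions in steps 2 to 5 can be sketched in Python. This is a minimal illustration assuming numeric attribute values and the paper's two-class labels; it is not the authors' WEKA/MATLAB implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(S): class-frequency entropy of a list of labels (step 4)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(a): entropy of the whole set minus the entropy of the subsets
    induced by each distinct attribute value, weighted by subset size (step 3)."""
    n = len(values)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    weighted = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    return entropy(labels) - weighted

def dts_split(values):
    """DTS split value: the plain average of the attribute's values (step 5)."""
    return sum(values) / len(values)

def gain_ratio(values, labels):
    """GainRatio(a) = IG(a) / Split(a), as defined in step 2."""
    return information_gain(values, labels) / dts_split(values)

# A toy attribute that separates the two classes perfectly:
vals = [0, 0, 4, 4]
labs = ["normal", "normal", "anomaly", "anomaly"]
print(entropy(labs))                 # 1.0 (two equally frequent classes)
print(information_gain(vals, labs))  # 1.0 (perfect separation)
print(dts_split(vals))               # 2.0 (average of the domain values)
```

At each node the attribute with the highest `gain_ratio` would be chosen as 'a_best' (step 6), and the data partitioned on it (step 7).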

3.1 SIGNIFICANCE OF THE PROPOSED ALGORITHM

To select the split value, the C4.5 algorithm first sorts all the values of an attribute. Then, from these sorted values, say P_i, P_{i+1}, …, P_n, the gain ratio of every candidate is calculated by choosing the lower of P_i and P_{i+1} as the threshold value and then calculating the split value with the above mentioned formula. The value which gives the highest gain ratio is chosen as the split value for that particular node. Instead of all these calculations, which make the technique more complex and difficult to understand, we use a simple and effective approach. In our approach, there is no need to sort the attribute values to calculate the split value. We calculate the split value by taking the average of the values in the domain of a particular attribute at each node. This gives uniform weightage to all the values in the domain, making the classifier unbiased towards the most frequent values in the domain of an attribute. Sometimes, the gain ratio may choose an attribute as the split attribute just because its intrinsic information is very low. This limitation can be overcome by considering only those attributes whose information gain is greater than the average information gain.

4. IMPLEMENTATION AND TESTING

The DTS algorithm is implemented on a 64-bit Windows 8.1 operating system with 8 GB of RAM and a Pentium(R) processor with a CPU speed of 2.20 GHz, using the tools WEKA and MATLAB. The proposed algorithm is compared with existing ones such as the Classification and Regression Tree (CART), C4.5, and the AD Tree.

The experiments were done for a performance comparison of different tree based classifiers and the DTS algorithm. The analysis is based on several parameters: the number of seconds the classifier takes to construct the model, the false positive rate, the true positive rate, and the accuracy. True Positive (TP) represents the examples that are correctly predicted as normal. True Negative (TN) shows the instances which are correctly predicted as attacks. False Positive (FP) identifies the instances which are predicted as attacks while they are not. False Negative (FN) represents the cases which are prefigured as normal while they are attacks in reality. Accuracy can be defined as the proportion of correct predictions and is computed as

       Accuracy = (TP + TN) / (TP + TN + FP + FN)

The Receiver Operating Characteristic (ROC) curve is also plotted for the various techniques. The ROC plots the curve between the true positive rate (TPR) and the false positive rate (FPR) of an algorithm. TPR and FPR are computed as

       TPR = TP / (TP + FN)   and   FPR = FP / (FP + TN)

4.1 DATASET

The efficiency of the proposed technique is evaluated by experimentation with the NSL-KDD data set, which is a revised version of the KDD'99 data set. The reason for using the NSL-KDD dataset in our experiments is that the KDD'99 data set has a large number of redundant records in both the training and testing data sets. For binary classification, NSL-KDD classifies the network traffic into two classes, namely, normal and anomaly. The experiments were performed on the full training data set of 125973 records and the test data set of 22544 records.

First, we compute the information gain of all the attributes of the data set. We found that there are 16 attributes whose information gain is greater than the average information gain. Therefore, in the preprocessing step, we can choose 16 or fewer attributes for further processing based on information gain, because the remaining features will not have much effect on the classification of the dataset. Then, the data set with these selected attributes is passed to the algorithm for constructing, training, and testing the decision tree.

4.2 RESULTS AND ANALYSIS

The performance of the proposed algorithm is compared with that of various techniques. The comparison is based on the accuracy in detecting attacks on the NSL-KDD test dataset. The results are taken from the literature, where techniques such as Self Organizing Maps (SOM), the Hoeffding tree, and the Ripple Down Rule learner Intrusion Detection (RDRID) are used for training and testing the detection model. It is observed that our proposed algorithm for constructing the decision tree is efficient in attack detection, as shown in Fig. 1.

Various other classifiers such as CART, Naïve Bayes (NB) Tree, and AD Tree, along with the proposed algorithm, are tested using the NSL-KDD test dataset. ROC curves of the AD Tree, C4.5, CART, and the DTS algorithm without feature selection on the NSL-KDD test data are plotted in Fig. 2. The time taken by the several classifiers is also measured, and a bar graph is plotted in Fig. 3. It is observed from Fig. 2 and Fig. 3 that the true positive rate of DTS is better than that of C4.5; however, CART shows the best performance in terms of true positive rate. But if we compare the results in terms of the delay to build the model, we can see that CART takes much longer than the other techniques.

The results of the comparison of various classifiers with different numbers of features are presented in Fig. 4. It can be seen from the results that, with the proposed technique, instead of training with all the features we get good accuracy with even a smaller number of features selected using information gain.

Figure 1. Results of comparison of the proposed algorithm with various other techniques

Figure 2. Comparison of ROC Curve of proposed algorithm with various other classifiers

Figure 3. Model construction time of the proposed algorithm and various tree based classifiers

Figure 4. Comparison of accuracy of several classifiers with proposed algorithm
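The preprocessing rule from Section 4.1, keeping only the attributes whose information gain exceeds the average gain, can be sketched as follows. The attribute names and gain values here are hypothetical and for illustration only.

```python
def select_features(gain_by_attribute):
    """Keep only the attributes whose information gain exceeds the
    average information gain, as described in Section 4.1."""
    avg = sum(gain_by_attribute.values()) / len(gain_by_attribute)
    return [a for a, g in gain_by_attribute.items() if g > avg]

# Hypothetical gains for four of the 41 NSL-KDD attributes:
gains = {"duration": 0.02, "src_bytes": 0.85, "dst_bytes": 0.71, "land": 0.001}
print(select_features(gains))  # ['src_bytes', 'dst_bytes']
```

On the full dataset this rule retains the 16 above-average attributes reported in Section 4.1.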

5. CONCLUSION AND FUTURE SCOPE OF WORK

A decision tree assists the network administrator in deciding about incoming traffic, i.e., whether the arriving data is malicious or not, by providing a model that separates malicious and non-malicious traffic. We modified the split value calculation by taking the average of all the values in the domain of an attribute. The algorithm provides uniform weightage to all the values in the domain. It allows taking a smaller number of attributes and provides acceptable accuracy in a reasonable amount of time. From the results of the experiments, it is concluded that the proposed algorithm for signature based intrusion detection is more efficient with respect to finding attacks in the network with fewer features, and it takes less time to construct the model. It is also concluded that the efficiency depends on the size of the data set and the number of features used to construct the decision tree. The formula used in DTS to calculate the gain ratio can also be used in attribute selection for feature reduction. Our future scope of work is to improve the split value by using concepts such as the geometric mean, which also gives uniform weightage to the domain values.

REFERENCES

[1] P. Aggarwal, and S.K. Sharma, An Empirical Comparison of Classifiers to Analyze Intrusion Detection, Proc. of the Fifth International Conference on Advanced Computing and Communication Technologies, 2015.

[2] T.K. Ho, Random Decision Forests, Proc. of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14-16 August 1995, pp. 278-282.

[3] http://www.saedsayad.com/oner.htm

[4] C. Cortes, and V. Vapnik, Support-vector networks, Machine Learning, vol. 20, no. 3, 1995, p. 273.

[5] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2005.

[6] J. Markey, Using Decision Tree Analysis for Intrusion Detection: A How-To Guide, SANS Institute InfoSec Reading Room, June 2011.

[7] T.M. Mitchell, Machine Learning, The McGraw-Hill Companies, Inc., 1997, ISBN 0070428077.

[8] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[9] D.P. Gaikwad, and R.C. Thool, Intrusion Detection System Using Bagging with Partial Decision Tree Base Classifier, Proc. of the 4th International Conference on Advances in Computing, Communication and Control, 2015.

[10] K. Bajaj, and A. Arora, Improving the Intrusion Detection using Discriminative Machine Learning Approach and Improve the Time Complexity by Data Mining Feature Selection Methods, International Journal of Computer Science, vol. 76, Aug 2013.

[11] A. Alazab, M. Hobbs, J. Abawajy, and M. Alazab, Using Feature Selection for Intrusion Detection System, International Symposium on Communications and Information Technologies, 2012.

[12] S. Thaseen, and Ch. A. Kumar, An Analysis of Supervised Tree Based Classifiers for Intrusion Detection System, Proc. of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering, Feb 2013.

[13] S. Revathi, and A. Malathi, A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection, International

Journal of Engineering Research and Technology, vol. 2, Issue 12, Dec 2013.

[14] V. Jyothsna, and V.V. Prasad, A Comparative Study on Performance Evaluation of Intrusion Detection System through Feature Reduction for High Speed Networks, Global Journal of Computer Science and Technology: E Network, Web and Security, vol. 14, Issue 7, Version 1.0, 2014.

[15] P. Ghosh, C. Debnath, D. Metia, and R. Dutta, An Efficient Hybrid Multilevel Intrusion Detection System in Cloud Environment, IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, vol. 16, Issue 4, Ver. VII, Jul-Aug 2014, pp. 16-26.

[16] K.S. Elekar, and M.M. Waghmare, Comparison of Tree base Data Mining Algorithms for Network Intrusion Detection, International Journal of Engineering, Education and Technology, vol. 3, Issue 2, 2015.

[17] S. Mallissery, S. Kolekar, and R. Ganiga, Accuracy Analysis of Machine Learning Algorithms for Intrusion Detection System using NSL-KDD Dataset, Proc. of the International Conference on Future Trends in Computing and Communication (FTCC 2013), Bangkok, July 2013.

[18] A.S.A. Aziz, A.E. Hassanien, S. El-Ola Hanafy, and M.F. Tolba, Multi-layer hybrid machine learning techniques for anomalies detection and classification approach, 13th International Conference on Hybrid Intelligent Systems (HIS), IEEE, 2013.

[19] A. Raeeyat, and H. Sajedi, HIDS: DC-ADT: An Effective Hybrid Intrusion Detection System based on Data Correlation and Adaboost based Decision Tree classifier, Journal of Academic and Applied Studies, vol. 2(12), Dec 2012, pp. 19-33.

[20] R.M. Elbasiony, E.A. Sallan, T.E. Eltobely, and M.M. Fahmy, A hybrid network intrusion detection framework based on random forests and weighted k-means, Ain Shams Engineering Journal, vol. 4, Issue 4, Dec 2013, pp. 753-762.

[21] D.P. Gaikwad, and R.C. Thool, Intrusion Detection System using Ripple Down Rule learner and Genetic Algorithm, International Journal of Computer Science and Information Technologies, vol. 5, 2014, pp. 6976-6980.

[22] L.M. Ibrahim, D.T. Basheer, and M.S. Mahmod, A comparison study for intrusion database (KDD99, NSL-KDD) based on Self Organization Map (SOM) Artificial Neural Network, Journal of Engineering Science and Technology, vol. 8, No. 1, 2013, pp. 107-119.
