Intrusion Detection System Based On Multi-Layer Perceptron Neural Networks and Decision Tree
Abstract—The growth of Internet attacks is a major problem for today's computer networks. Hence, implementing security methods to prevent such attacks is crucial for any computer network. With the help of Machine Learning and Data Mining techniques, Intrusion Detection Systems (IDSs) are able to diagnose attacks and system anomalies more effectively. However, most of the methods studied in this field, including rule-based expert systems, cannot successfully identify attacks whose patterns differ from the expected ones. By using Artificial Neural Networks (ANNs), it is possible to identify attacks and classify the data even when the dataset is nonlinear, limited, or incomplete. In this paper, a method based on the combination of the Decision Tree (DT) algorithm and the Multi-Layer Perceptron (MLP) ANN is proposed, which is able to identify attacks with high accuracy and reliability.

Keywords-Intrusion Detection Systems; Decision Tree; Neural Networks; Machine Learning

978-1-4673-7485-9/15/$31.00 ©2015 IEEE

I. INTRODUCTION

With the help of abnormal traffic, attackers and intruders are able to penetrate computer networks and cause serious problems for users and network administrators. An attack on a business corporation, for example, could lead to the loss of much sensitive information and a significant amount of money. Although IDSs cannot block traffic by themselves, they are able to diagnose attack traffic and report it to the firewall, which then blocks the reported traffic. Hence, a well-trained IDS alongside a powerful firewall can protect the network properly. The IDS introduced in this work is an offline IDS, which means it has been trained before installation on a network. In contrast, an online IDS operates in real time and is trained on online traffic data.

Decision Trees (DT) [1] and Multi-Layer Perceptron (MLP) Neural Networks [2] are two of the best-known and most widely applied machine learning algorithms for classification. By generating various purposeful rules, DTs are able to detect abnormal traffic. The algorithm formulates these rules by studying the features of the dataset. Once a DT is trained, the type of a traffic record can be diagnosed: the record is applied to the tree, its path through the tree is determined by the rules generated at the tree's nodes, and when it reaches a leaf, the type of traffic is determined. Artificial neural networks are another powerful approach to classification. ANNs are computational models inspired by the biological nervous system of the human brain and are widely used in machine learning. A typical ANN is depicted as a system of interconnected "neurons" that compute values from their inputs [3]. In this paper, an MLP ANN is used to classify the records of the dataset. This network can create relationships between attack entries and help the system correctly identify them. In many cases, ANNs help the system work reliably even under uncertainty. For the simulation, the Defense Advanced Research Projects Agency (DARPA) KDD CUP 99 [4] dataset has been used. This dataset contains 41 studied network traffic features plus one label, which together make up the 42 attributes of each record [4].

One of the most widely used datasets in IDS research is KDD Cup 99. DT and MLP have been used in many cases for classification purposes. In [5], the authors use DT- and NN-oriented algorithms to classify attack and normal records. In their hybrid IDS, they use a Decision Support System (DSS) for misuse detection on KDD CUP 99. In [6], the authors take advantage of Support Vector Machines (SVM), DT, and Simulated Annealing (SA). In their bulky method, they use SVM and SA for feature selection and SVM and DT for classification, and their method shows a high anomaly detection rate. In [7], the authors propose a multiple-level hybrid classifier that combines supervised DT and unsupervised Bayesian clustering. Their method shows a low false alarm rate on KDD CUP 99. These three recent papers use DT alongside a number of other algorithms to achieve better performance, which indicates that DT performs decently when combined with another well-chosen algorithm. In [8], a misuse detection model is built based on DT, and then the normal training data is decomposed into smaller subsets, on which multiple one-class SVM models are used. Their results on KDD CUP 99 show a low false alarm rate, which is very important for an IDS. In [9], the authors present an ensemble classification method involving MLP and radial basis function (RBF) networks. Their results show superior performance compared with using either of these two algorithms alone. In [10], the authors use the K-means clustering algorithm to divide the dataset into subspaces and then use a set of MLPs to study each subspace. Their model has shown appropriate performance on the DARPA dataset. In [11], the authors use SVMMLP, which uses SVM based on MLP. Their method has lower space and time complexity than using SVM alone, and their model's accuracy on DARPA NSL-KDD is noticeable. The authors of [12] employ a combination of SVM, K-means, and MLP algorithms to build a powerful IDS, and their method shows superior performance in comparison with other existing methods.

Most researchers use DT and MLP together with other classification or clustering algorithms to boost their performance. These two methods are especially useful because of their relative simplicity, whereas other methods such as RBF and SVM require more meticulous setup. In this regard, a hybrid IDS that combines the effectiveness of both DT and MLP can reach excellent results in a simpler manner. What matters is to employ a method simple enough to be usable in real networks. Most of the mentioned papers, unlike this paper's method, take advantage of DT and
MLP abilities through several bulky and complicated methods, which might be cumbersome to use in the real world.

One of the major focuses of researchers is to consider all the attack types as a single major attack class [13-16]. In offline detection, where the IDS has enough time to build its best possible model, this preprocessing step could help reduce the false positive and false negative rates, which is vital for an effective IDS. The proposed method uses all the features in KDD CUP 99 and exploits the productive prediction abilities of DT and MLP. A well-trained MLP can perform very well on its own, and combining this powerful neural network with a DT yields an enhanced IDS. The proposed method exploits the positive effects of these two algorithms on a well-constructed dataset.

II. PROPOSED METHOD

KDD CUP 99 is a very large dataset: the original training dataset has more than 4 million records, so using it in full for the simulation is arduous. Accordingly, researchers create random training and test datasets for simulation purposes [17-21]. In this method, a dataset with 43,000 training records and 21,000 test records is selected. In each of these two datasets, 10,000 records are normal and the rest are attacks. It should be taken into account that redundancy in the KDD CUP 99 dataset is overwhelming. Because numerous traffic records were captured by listening to traffic over a short amount of time, most of the records in the dataset are completely identical. Therefore, the random sampling phase starts by deleting the redundant records [22]. Normal and attack records are then sampled from the non-redundant dataset, and the attack records are drawn from the various attack types provided in the original KDD CUP 99 dataset. This type of selection helps the model simulate real conditions and, consequently, be reliable in real-world applications. Table 1 shows the distribution of records in the original dataset and in our dataset.

TABLE 1: NUMBER OF RECORDS IN EACH DATASET

          DARPA's Training   DARPA's Test   Our Random Training   Our Random Test
Attack    3,925,650          250,436        33,000                11,000
Normal    972,781            60,591         10,000                10,000
Total     4,898,431          311,027        43,000                21,000

MLP is one of the most applicable and powerful methods for classification in IDS [23]. By adjusting the design of the MLP, a researcher is able to build a well-trained model to classify the desired classes. Considering the relationship between the number of layers and neurons, the speed, and the error rate of the simulation, the proposed MLP network has been designed with two hidden layers, each including 8 neurons. These numbers of neurons are selected to effectively cover the size of the input data. In these simulations, Levenberg-Marquardt has been chosen as the training function. The ratio of training records to test records is 43/21.

Considering the performance of the DT and MLP algorithms in this paper's experiments and in previous works of other researchers, the proposed method tries to benefit from the abilities of both algorithms to achieve better results. The proposed method is designed in two separate phases. In the first phase, the random dataset with 41 features (which we call the original dataset from now on) is used to train both DT and MLP, and their outputs (predicted values) are obtained. These algorithms predict the type of traffic from their input records: they build rational relationships between input and output data and produce a model that is able to predict new inputs accurately. Hence, the algorithms predict "Attack" or "Normal" for each input traffic record. The MLP network used in the first phase is illustrated in Diagram 1. It has 41 inputs to cover the 41 features of the original dataset, 8 neurons in each of its two hidden layers, and two outputs to represent Attack and Normal records.

Diagram 1: The MLP network used in first phase

In the second phase, a new dataset is created by putting the outputs of DT and MLP and the original labels of the dataset together. In this way, the resulting dataset (which we call the new dataset from now on) has two specific features, which are the predicted values of running DT and MLP on every single record of the original dataset. In other words, the results of DT and MLP for every single record in phase one are captured and fed into the new dataset. The outcome of DT and MLP is either "Normal" or "Attack", and we replace these values with the equivalent numbers "1" and "2". Finally, the label, which indicates whether the record is "Normal" or "Attack", is entered into the new dataset. The new dataset, which has three columns (two features and one label), is then used to train the MLP network again. After the new dataset is fed to the well-trained MLP, the results are evaluated. The MLP network used in the second phase is illustrated in Diagram 2. It has two inputs for the outcomes of DT and MLP from the first phase, 8 neurons in each of its two hidden layers, and two outputs to represent Attack and Normal records.

Diagram 2: The MLP network used in second phase
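The data flow of the two phases can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions and not the authors' implementation. The trained DT and MLP are represented by hypothetical callables (`toy_dt`, `toy_mlp`). The phase-two 2-8-8-2 MLP, which the paper trains with Levenberg-Marquardt, is replaced by a stand-in learner of our own, `PairLookupClassifier`. The encoding Normal = 1, Attack = 2 is assumed from the order given in the text.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Numeric encoding of phase-one outputs (assumed order from the text): Normal -> 1, Attack -> 2.
ENCODE: Dict[str, int] = {"Normal": 1, "Attack": 2}

# Hypothetical interface: a trained classifier maps one record to "Normal" or "Attack".
Classifier = Callable[[Sequence[float]], str]
Row = Tuple[int, int, str]  # (DT output, MLP output, original label)


def build_new_dataset(records: Sequence[Sequence[float]], labels: Sequence[str],
                      dt: Classifier, mlp: Classifier) -> List[Row]:
    """Run both trained phase-one classifiers on every original record and keep
    their encoded predictions next to the record's original label."""
    return [(ENCODE[dt(x)], ENCODE[mlp(x)], y) for x, y in zip(records, labels)]


class PairLookupClassifier:
    """Stand-in for the phase-two 2-8-8-2 MLP (NOT the paper's model).

    Stores the majority training label for each (DT, MLP) input pair and falls
    back to the overall majority label for pairs never seen in training.
    """

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, int], str] = {}
        self.default = "Attack"

    def fit(self, rows: Sequence[Row]) -> "PairLookupClassifier":
        votes: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
        overall: Counter = Counter()
        for dt_out, mlp_out, label in rows:
            votes[(dt_out, mlp_out)][label] += 1
            overall[label] += 1
        self.table = {pair: c.most_common(1)[0][0] for pair, c in votes.items()}
        if overall:
            self.default = overall.most_common(1)[0][0]
        return self

    def predict(self, dt_out: int, mlp_out: int) -> str:
        return self.table.get((dt_out, mlp_out), self.default)


def two_phase_predict(x: Sequence[float], dt: Classifier, mlp: Classifier,
                      phase2: PairLookupClassifier) -> str:
    """Classify one record: the phase-one outputs become the phase-two inputs."""
    return phase2.predict(ENCODE[dt(x)], ENCODE[mlp(x)])


# Toy usage: hypothetical threshold rules on 2-feature records stand in for the
# trained DT and MLP (the paper's records have 41 features).
def toy_dt(x: Sequence[float]) -> str:
    return "Attack" if x[0] > 0.5 else "Normal"


def toy_mlp(x: Sequence[float]) -> str:
    return "Attack" if x[0] + x[1] > 1.0 else "Normal"


records = [(0.9, 0.4), (0.6, 0.1), (0.2, 0.3), (0.1, 0.95), (0.7, 0.6)]
labels = ["Attack", "Normal", "Normal", "Attack", "Attack"]

new_dataset = build_new_dataset(records, labels, toy_dt, toy_mlp)
phase2 = PairLookupClassifier().fit(new_dataset)
print(new_dataset)
# [(2, 2, 'Attack'), (2, 1, 'Normal'), (1, 1, 'Normal'), (1, 2, 'Attack'), (2, 2, 'Attack')]
print(two_phase_predict((0.6, 0.1), toy_dt, toy_mlp, phase2))  # Normal
```

The stand-in is a reasonable illustration of the data flow for one reason. With two inputs that each take only the values 1 or 2, any phase-two classifier, including an MLP, sees at most four distinct input pairs. Its decision can therefore depend only on which of those four pairs a record produces.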
Diagram 3 represents the proposed method.

Diagram 3: Block diagram of the proposed method

As shown in Diagram 3, the outcomes of DT and MLP, which have been run on the original dataset, form the new dataset. In the second phase, the well-trained MLP is executed on the new dataset. Finally, the proposed IDS is judged on the three criteria described below. In this work, the results are evaluated using the False Alarm Rate (FAR), Accuracy (A), and Detection Rate (DR). These three criteria are major tools for evaluating IDS performance. The next section presents the outcomes of the proposed method and shows how it improves performance.

III. SIMULATION AND RESULTS

Every IDS is evaluated against typical criteria that determine whether its performance is satisfactory. These criteria help estimate the system's performance in real-world applications. An appropriate IDS should have higher A and DR and a lower FAR. These terms are defined as follows:

$$A = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$DR = \frac{TP}{TP + FP} \qquad (2)$$

$$FAR = \frac{FP}{FP + TN} \qquad (3)$$

Where:
• TP (True Positive): the number of Attack records that were correctly classified as Attack
• TN (True Negative): the number of Normal records that were correctly classified as Normal
• FP (False Positive): the number of Normal records that were incorrectly classified as Attack
• FN (False Negative): the number of Attack records that were incorrectly classified as Normal

As the learning system in the proposed method, the DT and MLP classification algorithms have been run on the original dataset and their results obtained. Table 2 presents the results of these two algorithms and compares their A, FAR, and DR.

TABLE 2: RESULTS OF SIMULATIONS ON ORIGINAL DATASET

As shown in Table 2, both classification algorithms perform appropriately, and MLP outperforms DT. One of the most important criteria for evaluating an IDS is FAR, which reflects its reliability in real-world use. MLP has a very small FAR, which supports its effectiveness and reliability. FAR depends directly on FP, so an IDS must keep FP small. It may seem harmless to classify a suspicious traffic record that is in fact normal as an attack. At large scale, however, such errors can decrease the trustworthiness of an IDS.

By supplying the new dataset, the proposed method helps the MLP perform more efficiently. This indicates that the proposed method exploits the abilities of both algorithms to achieve enhanced results. DT results are clearly inferior to those of MLP, yet DT helps MLP improve its performance. In the second phase, the new dataset contains valuable information from DT's outputs. This information helps the MLP achieve results that were unreachable in the first phase, as shown in Figure 1.

Figure 1 shows the performance of DT and MLP in the two phases of the proposed method, with the phase number indicated next to each classification algorithm. As Figure 1 shows, MLP performance is superior in both phases. The DR of DT is the same on both datasets (98.01), while its accuracy (97.94) shows a slight improvement of 0.01 percent on the new dataset compared with the original dataset. This might seem a negligible improvement. However, an IDS processes a significant number of records in a short amount of time, so even this improvement lets it correctly classify many more records. The best DR and A belong to MLP (DR: 99.993 and A: 99.75), which shows that applying the MLP to the new dataset can build an improved IDS.

Figure 1. DR and A of two classification algorithms in different phases

Figure 2 shows the FAR of DT and MLP on the original and new datasets. The FAR of DT is the same on both datasets, while MLP performs better on the new dataset in the second phase. The lowest FAR in this simulation, 0.08, belongs to MLP on the new dataset. Therefore, with the proposed method, FAR achieves the best overall result, as DR and A do. Since FAR is one of the major criteria for the reliability of an IDS, using MLP with the new dataset could promise decent performance in the real world.
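Equations (1)-(3) can be computed directly from true and predicted labels, with Attack as the positive class, as in the TP/TN/FP/FN definitions above. The sketch below is a minimal Python illustration, and the function names are our own.

Note that Eq. (2), as printed, divides by TP + FP, a quantity commonly called precision. Many IDS papers instead define the detection rate as TP / (TP + FN), the true positive rate or recall. The sketch therefore implements Eq. (2) as printed and adds a separate `recall` helper for comparison. The example labels are invented for illustration and are not the paper's data.

```python
from typing import Sequence, Tuple

ATTACK, NORMAL = "Attack", "Normal"


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating Attack as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (ATTACK, NORMAL) or p not in (ATTACK, NORMAL):
            raise ValueError(f"unexpected label pair: {t!r}, {p!r}")
        if t == ATTACK and p == ATTACK:
            tp += 1  # attack correctly flagged
        elif t == NORMAL and p == NORMAL:
            tn += 1  # normal traffic correctly passed
        elif t == NORMAL:
            fp += 1  # normal traffic flagged as attack (a false alarm)
        else:
            fn += 1  # attack missed
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. (1)


def detection_rate(tp: int, fp: int) -> float:
    return tp / (tp + fp)  # Eq. (2) as printed (commonly called precision)


def false_alarm_rate(fp: int, tn: int) -> float:
    return fp / (fp + tn)  # Eq. (3)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # TP / (TP + FN), the other common "detection rate"


# Invented example labels (not the paper's data). Each ratio raises
# ZeroDivisionError if its denominator is zero.
y_true = [ATTACK] * 4 + [NORMAL] * 4
y_pred = [ATTACK, ATTACK, ATTACK, NORMAL, ATTACK, ATTACK, NORMAL, NORMAL]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 2 2 1
print(accuracy(tp, tn, fp, fn), detection_rate(tp, fp),
      false_alarm_rate(fp, tn), recall(tp, fn))  # 0.625 0.6 0.5 0.75
```

In this example, the two detection-rate definitions give different values (0.6 versus 0.75). This is why it matters which definition is used when comparing DR figures across papers.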
Figure 2. FAR of DT and MLP on different datasets

These results illustrate the efficiency of the proposed method, which helps both algorithms perform better. The new dataset contains purposeful information about the performance of both algorithms. By using this dataset for training and evaluating its performance with classification algorithms, we can exploit the abilities of both classifiers and obtain improved results.

IV. CONCLUSION

In this paper, a method based on Artificial Neural Networks and Decision Trees is proposed to design an accurate Intrusion Detection System with a high detection rate and a low false alarm rate. The method consists of two phases. The first phase creates a new dataset from the classification results of the DT and the MLP network on the random dataset. In the second phase, the MLP network is used again to classify the data in the new dataset, and the results are then evaluated. This hybrid method makes it possible to achieve promising outcomes. With its very low FAR, this system can promise reliable results in real-world applications. Accordingly, future work is expected to examine the capability of such hybrid methods more thoroughly.

ACKNOWLEDGMENT

The authors would like to thank the UCI Machine Learning Repository [24] for making the KDD CUP 99 dataset publicly available.

REFERENCES

[1] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, pp. 129-141, 2012.
[3] Simon O. Haykin, "Neural Networks: A Comprehensive Foundation," 3rd ed., New York: Macmillan, 2009.
[4] Y. Yi, J. Wu, and W. Xu, "Incremental SVM based on reserved set for network intrusion detection," Expert Systems with Applications, vol. 38, pp. 7698-7707, Jun. 2011.
[5] Ozgur Depren, Murat Topallar, Emin Anarim, and M. Kemal Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," in Expert Systems
[7] Cheng Xiang, Png Chin Yong, and Lim Swee Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918-924, May 2008.
[8] Gisung Kim, Seungmin Lee, and Sehun Kim, "A novel hybrid intrusion detection method integrating anomaly detection with misuse detection," Expert Systems with Applications, vol. 41, no. 4, pt. 2, pp. 1690-1700, Mar. 2014.
[9] M. Govindarajan and RM. Chandrasekaran, "Intrusion detection using neural based hybrid classification methods," Computer Networks, vol. 55, no. 8, pp. 1662-1671, Jun. 2011.
[10] Hongying Zheng, Lin Ni, and Di Xiao, "Intrusion Detection Based on MLP Neural Networks and K-Means Algorithm," in Advances in Neural Networks – ISNN 2005, Lecture Notes in Computer Science, vol. 3498, Springer, 2005, pp. 434-438.
[11] Yong Hou and Xue Feng Zheng, "SVM Based MLP Neural Network Algorithm and Application in Intrusion Detection," in Artificial Intelligence and Computational Intelligence, Lecture Notes in Computer Science, vol. 7004, Springer, 2011, pp. 340-345.
[12] A. M. Chandrashekhar and K. Raghuveer, "Amalgamation of K-means Clustering Algorithm with Standard MLP and SVM Based Neural Networks to Implement Network Intrusion Detection System," in Advanced Computing, Networking and Informatics, Volume 2, Smart Innovation, Systems and Technologies, vol. 28, Springer, 2014, pp. 273-283.
[13] M. J. Fadaeieslam, B. Minaei-Bidgoli, M. Fathy, and M. Soryani, "Comparison of Two Feature Selection Methods in Intrusion Detection Systems," in 7th IEEE International Conference on Computer and Information Technology (CIT 2007), 2007, pp. 83-86.
[14] S. Zaman and F. Karray, "Lightweight IDS Based on Features Selection and IDS Classification Scheme," in International Conference on Computational Science and Engineering (CSE '09), 2009, pp. 365-370.
[15] L. Sang Min, K. Dong Seong, Y. YoungHyun, and P. Jong Sou, "Quantitative Intrusion Intensity Assessment Using Important Feature Selection and Proximity Metrics," in 15th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC '09), 2009, pp. 127-134.
[17] F. Amiri, M. Rezaei Yousefi, C. Lucas, A. Shakery, and N. Yazdani, "Mutual information-based feature selection for intrusion detection systems," Journal of Network and Computer Applications, vol. 34, pp. 1184-1199, Jul. 2011.
[18] P. Sangkatsanee, N. Wattanapongsakorn, and C. Charnsripinyo, "Practical real-time intrusion detection using machine learning approaches," Computer Communications, vol. 34, pp. 2227-2235, Dec. 2011.
[19] C. R. Pereira, R. Y. M. Nakamura, K. A. P. Costa, and J. P. Papa, "An Optimum-Path Forest framework for intrusion detection in computer networks," Engineering Applications of Artificial Intelligence, vol. 25, pp. 1226-1234, Sep. 2012.