Intrusion Detection Using Deep Belief Network and Probabilistic Neural Network (2017)
Abstract—This paper addresses several problems in intrusion detection using neural networks: redundant information, large amounts of data, long training times, and a tendency to fall into local optima. An intrusion detection method using a deep belief network (DBN) and a probabilistic neural network (PNN) is proposed. First, the raw data are converted to low-dimensional data while retaining their essential attributes, using the nonlinear learning ability of the DBN. Second, to obtain the best learning performance, a particle swarm optimization algorithm is used to optimize the number of hidden-layer nodes per layer. Next, the PNN is used to classify the low-dimensional data. Finally, the KDD CUP 1999 dataset is employed to test the performance of the method. The experimental results show that the method performs better than the traditional PNN, PCA-PNN and unoptimized DBN-PNN.

Keywords—intrusion detection; deep belief network; particle swarm optimization; probabilistic neural network; network security

I. INTRODUCTION

Any machine exposed to the Internet today is at risk of being attacked and compromised. Detecting attack attempts, be they successful or not, is important for securing networks (servers, end-hosts and other assets) as well as for forensic analysis [1]. Intrusion detection plays an important role in network security, and traditional intrusion detection systems can detect many types of attack. However, as network environments become increasingly complex, network structures more diverse and network attacks evolve quickly, traditional intrusion detection struggles to meet network security demands. What is more, many Advanced Persistent Threats or Targeted Attacks and the organizations behind them have surfaced since the exposure of Stuxnet in 2010 [2], so more intelligent intrusion detection systems are needed to address these challenges.

Intelligent intrusion detection methods include data mining, decision trees, support vector machines, genetic algorithms, artificial neural networks (ANN) and so on. Among them, ANN has been widely used in intrusion detection systems because of its unique model structure, nonlinear simulation ability, and strong learning and adaptive ability [3]. With ever-growing network traffic, intrusion detection systems are troubled by high-dimensional intrusion data and complex attack types. While neural networks are widely used in intrusion detection, there are still some drawbacks. For example, Reference [4] uses a genetic algorithm to address the slow convergence of the neural network and its tendency to fall into local optima. However, the size of the genetic algorithm's solution space is related to the number of network parameters, which increases the network training time. In addition, although principal component analysis (PCA) can reduce the dimension of the data to improve detection speed, it also reduces the effective information in the raw data [5]. The emergence of deep learning provides a new direction for intrusion detection. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer [6]. This kind of learning is well suited to detecting a variety of high-dimensional intrusion behaviors, so this paper proposes an intrusion detection method based on deep learning. First, the raw data are converted by deep learning into low-dimensional data that further reflect the characteristics of the data. Second, the intrusion detection model is established using the probabilistic neural network (PNN). Next, the number of hidden-layer nodes is optimized by the particle swarm optimization (PSO) algorithm. Finally, the model is trained and tested with the KDD CUP 1999 dataset.

II. DEEP LEARNING

Deep learning is a new field of machine learning that is essentially based on multi-layer neural networks. By simulating the biological characteristics of the visual cells of the animal brain, a similar deep neural network is established to understand the data. Such multi-layer neural networks usually have 5-10 or more hidden layers. By learning a deep nonlinear network structure, they combine low-level features to form more abstract high-level representations of attribute categories or features.

A. Restricted Boltzmann Machine

A Restricted Boltzmann Machine (RBM) is a two-layer network: every unit has a two-way connection to the units of the other layer, and units within the same layer are not connected. Its structure is shown in Fig. 1.
Corresponding author: Cuixiao Zhang, Email: [email protected]
The energy function of the RBM is:

$$E(v, h) = -\sum_{i \in V} a_i v_i - \sum_{j \in H} b_j h_j - \sum_{i,j} v_i h_j w_{ij} \tag{1}$$

Assume that each neuron node takes a value of 0 or 1, where v_i and h_j represent the binary states of visible-layer unit i and hidden-layer unit j respectively, a_i is the bias of unit i, b_j is the bias of unit j, and w_{ij} is the weight between i and j.

The joint probability of v and h is defined as:

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)} \tag{2}$$

where Z is called the partition function:

$$Z = \sum_{v,h} e^{-E(v, h)} \tag{3}$$

From (2) and (3), the probability of the visible layer can be obtained as follows:

$$p(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h)} \tag{4}$$

Training an RBM means learning the parameters (w, a, b) for which the likelihood reaches its maximum. The derivative of the log-likelihood function with respect to the weights w is:

$$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \tag{5}$$

and the gradient update rule is:

$$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right) \tag{6}$$

where \varepsilon is the learning rate. The angle brackets denote an expectation under the probability distribution indicated by the subscript. The expectation under the data distribution is easy to obtain, whereas the expectation under the model distribution requires prolonged Gibbs sampling. In 2002, Hinton proposed the contrastive divergence algorithm [7]. The objective function to be optimized is changed from the log-likelihood function to the contrastive divergence function, which accelerates training. The gradient update rule then becomes:

$$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \right) \tag{7}$$

where the subscript recon denotes the distribution of the reconstructed data.

Fig. 2. Deep belief network structure

Learning of a DBN is divided into two parts: pre-training and fine-tuning [8]. Pre-training is an unsupervised learning process. The first RBM is trained on the input data, the output of the first RBM is then used as the input of the second RBM, and training proceeds layer by layer until all layers have been trained. After pre-training, the parameters of each layer, including the weights and biases, serve as the initial values of the network parameters. The BP algorithm is then used to adjust the parameters of the entire network, which is a supervised learning process.

III. INTRUSION DETECTION MODEL

A. Particle Swarm Optimization Algorithm

PSO is a parallel algorithm with the advantages of easy implementation, high precision and fast convergence [9]. To find the best solution, PSO initializes a set of random solutions in the solution space. These solutions are called particles, and each particle has a velocity v_i and a position x_i. A fitness function is then used to evaluate the position of each particle, and pbest and gbest record the best position found by the individual particle and by the whole group respectively. For each particle, the fitness value is calculated. The position becomes the new pbest if it is better than the current pbest, and the new gbest if it is better than the current gbest. The particle velocity and position are then updated.

The particle velocity and position update rules are as follows:

$$v_i = w v_i + c_1 \times rand() \times (pbest_i - x_i) + c_2 \times rand() \times (gbest_i - x_i) \tag{8}$$

$$x_i = x_i + v_i \tag{9}$$

where v_i is the velocity of the particle, w is the inertia weight, rand() is a random value between 0 and 1, x_i is the current position of the particle, and c_1 and c_2 are acceleration factors. If the velocity or the position of a particle goes beyond the search range, it is set to the maximum velocity or to the boundary position. After the particles have been updated, the algorithm continues to iterate until the best solution is found. The search usually stops when the best position has been found or the maximum number of iterations has been reached. In a DBN, the number of hidden-layer nodes affects both the reconstruction in the unsupervised learning phase and the fine-tuning in the supervised learning phase. Therefore, the number of hidden-layer nodes needs to be optimized by the PSO algorithm to improve the performance of the network.

The proposed intrusion detection model works as follows. First, to reduce the dimension of the training data and test data, the features of the raw data are mapped to a low-dimensional space using the feature learning ability of the DBN. Next, the PSO algorithm is used to optimize the number of hidden-layer nodes of the DBN, improving the learning performance of the network. Finally, the data are input to the PNN for training and testing.
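The velocity and position updates in (8) and (9) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MATLAB implementation: the function name pso_maximize, the toy fitness function, the velocity limit and the random seed are all assumptions made for the example, while the default inertia weight, acceleration factors, population size and iteration count follow the values the paper lists for its PSO run.

```python
import random


def pso_maximize(fitness, dim, lower, upper, n_particles=20, n_iter=50,
                 w=0.8, c1=2.0, c2=2.0, seed=0):
    """Maximize `fitness` over the box [lower, upper]^dim with basic PSO."""
    rng = random.Random(seed)
    v_max = upper - lower  # assumed velocity limit: one full search range

    # Random initial positions and velocities.
    xs = [[rng.uniform(lower, upper) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
          for _ in range(n_particles)]

    # pbest: best position seen by each particle; gbest: best seen by the swarm.
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    best = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    history = [gbest_fit]

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                # Eq. (8): inertia + pull toward pbest + pull toward gbest.
                vs[k][d] = (w * vs[k][d]
                            + c1 * rng.random() * (pbest[k][d] - xs[k][d])
                            + c2 * rng.random() * (gbest[d] - xs[k][d]))
                # Out-of-range velocities are set to the maximum velocity.
                vs[k][d] = max(-v_max, min(v_max, vs[k][d]))
                # Eq. (9): move, then clamp to the search boundary.
                xs[k][d] = max(lower, min(upper, xs[k][d] + vs[k][d]))
            fit = fitness(xs[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = xs[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = xs[k][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def toy_fitness(x):
    # Hypothetical stand-in for a real fitness score; its peak is at (3, 3, 3).
    return -sum((xi - 3.0) ** 2 for xi in x)


best_x, best_f, hist = pso_maximize(toy_fitness, dim=3, lower=-10.0, upper=10.0)
print(best_x, best_f)
```

In the paper's setting, each particle would presumably hold the three hidden-layer sizes (rounded to integers, which is an assumption here) and the fitness would be the classification score of the resulting DBN-PNN, so every fitness evaluation involves training a network.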
B. Probabilistic Neural Network

The probabilistic neural network (PNN) is a feedforward neural network developed from the radial basis function neural network. Its theoretical basis is the Bayesian minimum risk rule [10]. PNN is a local approximation network: for a given subset of the input, only a few neurons determine the output of the network. As a result, PNN learns faster and does not easily fall into local optima. Unlike traditional neural networks with a sigmoid function, PNN usually uses a Gaussian function as the network activation function. Its hierarchical model consists of an input layer, a pattern layer, a summation layer and an output layer. The basic structure of the network is shown in Fig. 3.

IV. EXPERIMENT AND RESULT ANALYSIS

The experimental environment in this paper is MATLAB R2010a. To assess the merits of the algorithm, DBN-PNN without PSO, PCA-PNN and the traditional PNN are used as comparison experiments. The KDD CUP 1999 dataset is selected to evaluate the intrusion detection model proposed above. It contains four classes of attack data: DoS, R2L, U2R and Probe. The 10% training dataset and the 10% testing dataset are usually used in studies [11]. Several test indicators are defined as follows:

Detection accuracy: the proportion of correctly classified data in the total test data.
Detection rate: the proportion of correctly classified intrusion data in the total intrusion data.
False alarm rate: the proportion of misclassified normal data in the total normal data.

To simulate a real network environment, a total of 10,000 records, including normal data and the four classes of attack data, were randomly drawn from the training dataset and the test dataset, of which normal data accounted for 96%. The distribution of the data is shown in Table 1.
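The three indicators defined above can be computed directly from true and predicted labels. The sketch below is illustrative only: the function name detection_metrics, the label strings and the toy data are hypothetical, and it assumes that a record counts as correctly classified when its predicted class equals its true class.

```python
def detection_metrics(y_true, y_pred, normal_label="normal"):
    """Return (accuracy, detection_rate, false_alarm_rate) as fractions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    normal = [(t, p) for t, p in pairs if t == normal_label]
    attacks = [(t, p) for t, p in pairs if t != normal_label]

    correct = sum(t == p for t, p in pairs)
    attacks_correct = sum(t == p for t, p in attacks)
    normal_misclassified = sum(t != p for t, p in normal)

    # Detection accuracy: correctly classified records over all records.
    accuracy = correct / len(pairs)
    # Detection rate: correctly classified intrusions over all intrusions.
    detection_rate = attacks_correct / len(attacks) if attacks else float("nan")
    # False alarm rate: misclassified normal records over all normal records.
    false_alarm_rate = (normal_misclassified / len(normal)
                        if normal else float("nan"))
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical toy labels (not taken from the paper's experiments).
y_true = ["normal"] * 6 + ["dos", "dos", "probe", "r2l"]
y_pred = ["normal"] * 5 + ["dos"] + ["dos", "normal", "probe", "probe"]
acc, dr, far = detection_metrics(y_true, y_pred)
print(acc, dr, far)
```

With normal traffic making up 96% of the sample, accuracy is dominated by the normal class, which is why the detection rate and false alarm rate are reported alongside it.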
the number is determined manually, the optimal network structure is not necessarily obtained, so we use the PSO algorithm to find the number of nodes in each hidden layer and thereby determine the optimal network structure.

The fitness function of the PSO algorithm is:

$$f = \frac{1}{n} \sum_{i=1}^{n} w_i, \quad n = 5, \quad w_i = \frac{correct_i}{sum_i} \tag{10}$$

where correct_i is the number of records of the i-th type that are correctly identified, and sum_i is the total number of records of the i-th type. All the parameters of PSO are shown in Table 2.

TABLE II. EXPERIMENT PARAMETERS

Name                  Value   Description
inertia weight        0.8     influences the global and local convergence speed
acceleration factor   2, 2    recommended c1 + c2 <= 4
iterations            50      beyond 50 iterations the experimental results show no significant change
population            20      the number of particles involved in the search at the same time
particle dimension    3       there are three hidden layers

The numbers of nodes in the three hidden layers obtained by PSO optimization are 90, 21 and 17. The dimension-reduced data are then input into the PNN for training and testing, and the detection indicators defined above are calculated. The experimental results are shown in Table 3.

TABLE III. EXPERIMENTAL RESULT

Methods               Running time   Detection accuracy   Detection rate   False alarm rate
unoptimized DBN-PNN   5.71 s         99.31%               91.75%           0.375%
optimized DBN-PNN     5.48 s         99.14%               93.25%           0.615%
PCA-PNN               6.16 s         98.28%               89%              1.33%
PNN                   35.38 s        99.04%               89.25%           0.55%

The experimental results show that the PSO algorithm can find a better DBN structure, which improves feature representation, so the optimized DBN-PNN has the highest detection rate (93.25%), although its detection accuracy and false alarm rate are slightly worse than those of the unoptimized DBN-PNN. Its running time is also shorter than that of the network without dimensionality reduction. PCA-PNN reduces the dimension of the data using PCA, but PCA loses some of the information in the data, so its detection rate is lower than that of DBN-PNN. The PNN has the longest running time, and its detection rate is lower than that of DBN-PNN, because it involves neither dimensionality reduction nor optimization.

V. CONCLUSION

In applying neural networks to intrusion detection, there are many problems, such as redundant information, large amounts of data, long training times, and a tendency to fall into local optima. This paper proposes an intrusion detection method based on DBN and PNN. The method uses the DBN to shorten the PNN training and testing time by converting the raw data into low-dimensional data. At the same time, to improve the feature representation of the DBN, the PSO algorithm is used to optimize the number of DBN hidden-layer nodes. The experimental results show that the combination of deep learning, the PSO algorithm and PNN is effective and provides a reference for solving the above problems in intrusion detection. The experiment was done on a public dataset, whereas real network environments are more complex than the dataset. Therefore, the next step is to apply the method to a real network and to improve it through feedback from the network. What is more, deep neural networks have many parameters that need to be trained. When the network has many layers, training takes a long time, so GPU acceleration or big data processing technology could be used to speed it up. The ultimate goal is to build a system capable of near-real-time intrusion detection.

ACKNOWLEDGMENT

We thank the Graduate School of Shijiazhuang Tiedao University for its financial support. We thank Teacher Feng for guidance on artificial neural networks, and Teacher Yang and Teacher Ma for discussing deep learning with us and providing useful suggestions.

REFERENCES

[1] Divakaran, Dinil Mon, et al. “Evidence gathering for network security and forensics,” Digital Investigation, 2017, pp. 56-65.
[2] Langner, Ralph. “Stuxnet: Dissecting a Cyberwarfare Weapon,” IEEE Security & Privacy Magazine, vol. 9:3, 2011, pp. 49-51.
[3] Shah B, H Trivedi B. “Artificial Neural Network based Intrusion Detection System: A Survey,” International Journal of Computer Applications, vol. 39:6, 2012, pp. 13-18.
[4] Kumar, Gulshan, and K. Kumar. “The Use of Multi-Objective Genetic Algorithm Based Approach to Create Ensemble of ANN for Intrusion Detection,” International Journal of Intelligence Science, vol. 2:4, 2012, pp. 115-127.
[5] Hoz E D L, Hoz E D L, Ortiz A, et al. “PCA filtering and probabilistic SOM for network intrusion detection,” Neurocomputing, 2014, pp. 71-81.
[6] Lecun Y, Bengio Y, Hinton G. “Deep learning,” Nature, vol. 521:7553, 2015, pp. 436-444.
[7] Hinton G E. “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14:8, 2002, pp. 1771-1800.
[8] Hu Y J, Ling Z H. “DBN-based Spectral Feature Representation for Statistical Parametric Speech Synthesis,” IEEE Signal Processing Letters, vol. 23:3, 2016, pp. 321-325.
[9] Fong S, Wong R, Vasilakos A V. “Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data,” IEEE Transactions on Services Computing, vol. 9:1, 2016, pp. 33-45.
[10] Chasset, Pierre Olivier. “pnn: Probabilistic neural networks,” MIT Press, 2013, pp. 109-118.
[11] Shah, B, and B. H. Trivedi. “Reducing Features of KDD CUP 1999 Dataset for Anomaly Detection Using Back Propagation Neural Network,” Fifth International Conference on Advanced Computing & Communication Technologies, 2015, pp. 247-251.