ISSN: 2278-0181
Vol. 3 Issue 4, April - 2014
Abstract - Intrusion detection is a critical issue in network security, for protecting network resources. Therefore an accurate system for detecting intrusions must be built to give assurance for information in any organization, either public or private. The main goal is to increase the detection rate and reduce the false alarm rate. Since existing Intrusion Detection Systems (IDSs) use all the features to detect known intrusions, they achieve poor results. We have proposed an algorithm, Factor Analysis based Support Vector Machine (FA-SVM), for developing an efficient IDS by making use of the popular statistical technique called Factor Analysis (FA), through which the features are analyzed as factors. To design more effective and efficient IDSs it is essential to select the best classifiers; therefore we used Support Vector Machines (SVMs), which have high generalization ability. The tests were conducted on the Knowledge Discovery and Data Mining (KDD) Cup dataset. The performance of this approach was analyzed and compared with existing approaches such as Principal Component Analysis (PCA) with SVM, and with SVM classification without feature selection. The results show that the proposed method enhances intrusion detection and outperforms the existing approaches, thus modeling a computationally efficient IDS with minimum false positive rates.

Key words: Intrusion Detection System (IDS), Network Security, Factor Analysis (FA), Support Vector Machines (SVMs), Principal Component Analysis (PCA).

1. INTRODUCTION

Intrusion detection is a critical issue in network security, for protecting network resources. Therefore an accurate system for detecting intrusions must be built to give assurance for information in any organization, either public or private. The main goal is to increase the detection rate and reduce the false alarm rate. An Intrusion Detection System (IDS) is a method which dynamically monitors the events occurring in a system and decides whether these events are signs of an attack or constitute an authorized use of the system [1][2][3]. There are many types of IDSs in terms of monitoring the network traffic, such as Network Intrusion Detection System (NIDS), Host Based Intrusion Detection System (HIDS) and Hybrid Intrusion Detection System.

An IDS has to monitor a large amount of audit data even for a small network; therefore analysis becomes more difficult, which leads to poor detection of suspicious activities. There are diverse affinities between features, so an IDS has to decrease the quantity of the data to be processed by removing the features that contain false correlations and redundant information. This results in better accuracy and lower computation time. The IDS task is commonly modeled as a classification procedure in a machine-learning context. Many methods have been proposed to develop an efficient IDS; among those, Support Vector Machines (SVMs) have gained significant importance in intrusion detection using various kernels [4]. In modeling an efficient IDS, it is necessary to reduce the features, which has shown a great improvement in performance [5].

For constructing an Intrusion Detection System the research mainly falls in two directions: detection model generation and intrusion feature selection. In achieving the most accurate results, preprocessing techniques like feature selection and feature reduction have become crucial in Intrusion Detection Systems [6]. A recent study illustrated an improved false positive rate using Artificial Neural Networks (ANN) in an intrusion detection mechanism with Principal Component Analysis (PCA) as the feature selection strategy [7]. There are numerous studies which show reasonably good results with feature reduction using a Support Vector Machine (SVM) as the classifier tool [8][9][10][11]. In another study, using Classification and Regression Trees (CART) and Bayesian Networks (BN), Chebrolu et al. have given ensemble feature selection algorithms which result in a lightweight IDS [12]. More recently, a study on Generalized Discriminant Analysis as a feature selection technique achieved good results [17].

Even though SVM is a good classification technique, many problems occur when it is applied to massive datasets. Since solving an SVM amounts to solving a quadratic optimization problem, it needs large computational time and memory when the dimensionality increases. Meanwhile, for a pattern classification problem such as intrusion detection, it is difficult to decide which features are useful for classifying attack or normal activity. With IDS data there are a large number of dimensions d as well as examples k, which leads to inaccurate results. Therefore there is a need to select the most significant features and apply high-performance classifiers like SVMs, which results in low false alarm rates.

In this paper we have taken a popular statistical technique called Factor Analysis (FA) as a dimensionality
reduction technique through which the features are analyzed as factors. The rest of the paper is organized as follows. Section 2 gives an overview of Support Vector Machines and Factor Analysis. Section 3 describes the proposed IDS model with a novel algorithm, and Section 4 gives experimental results followed by conclusions and future work.

2. MACHINE LEARNING PERSPECTIVE: AN OVERVIEW

2.1. SUPPORT VECTOR MACHINES

Classification in IDS mainly deals with false positive reduction and discriminating between normal and attack patterns; therefore Support Vector Machines (SVMs) are well-suited classifiers. SVMs are supervised learning techniques. SVM is based on statistical learning theory and was developed by Vapnik [13][14][15]. SVMs are built using support vectors, which are responsible for the classification of data points with the Maximal Marginal Hyperplane (MMH). The main aim is to classify the data points using the MMH by solving a quadratic optimization problem [16]. SVMs have small running times and give highly accurate classification results. The attractiveness of SVMs lies in their mathematical formulation and pictorial illustrations.

An SVM is constructed from support vectors, which are the decisive points in both of the classes. Once the support vectors are identified it is easy to draw the hyperplane which separates the positive and negative classes; in this way the classification process is done in SVM. It uses class labels, so SVMs are called supervised learning techniques. By training the model we obtain the weight vector and bias values, which are used to identify the support vectors. SVM construction can be done both in the linearly separable case and in the linearly inseparable case. When the data is linearly separable, the MMH is constructed based on the training points and the class boundary. When the data is linearly inseparable, the data is mapped to a high-dimensional feature space and classification is done there; the function performing this mapping is called the kernel function. Figure 1 illustrates the classification of SVM.

Figure 1: General idea of SVM organization (the maximal marginal hyperplane w·x + b = 0 with margins w·x + b = ±1, support vectors lying on the margins, margin width 1/||w||, and class labels y = +1 and y = -1)

But training with SVMs on huge datasets is time consuming. In recent years, there has been a lot of work done to improve learning methods using SVMs. One approach is to optimize the SVM algorithm [20, 21] to solve the convex optimization problem. Other approaches include a simplification phase that reduces the training set size [22, 23]. To perform training using an SVM, model selection is crucial. Even though SVM algorithms are less sensitive to the curse of dimensionality, dimensionality reduction techniques can enhance the efficiency of SVMs. In SVM, the generalization ability depends on the choice of the SVM's parameters.

In training the dataset using SVMs, the user should provide the type of kernel function to be applied [21]. There are several kernel functions, namely linear, sigmoid, polynomial, radial basis (Gaussian) and so on. The performance of an SVM depends mainly on the kernel selected. General studies have shown that the Radial Basis Function (RBF) is the most popular choice of kernel because of its localized and finite responses across the entire range of the real x-axis [2]. The SVM work flow is given with the following algorithm [16].

SVM Algorithm

Input: D = {(x1, y1), (x2, y2), …, (xk, yk)}, x ∈ R^n, y ∈ {-1, +1}
Define: w, b and λi, where w is the weight vector, b is the bias, λi is the Lagrange multiplier of instance i, n is the number of attributes and k is the number of instances.
Solve: LD = Σi λi - ½ Σi Σj λi λj yi yj (xi·xj), where LD is the dual form; it is maximized to obtain the λi.
Calculate: w and b are obtained by substituting the λi > 0 values into w = Σi λi yi xi; b is obtained from λi (yi (w·xi + b) - 1) = 0.
Classifier: f(x) = sgn(w*·x + b*); if the sign is '+' the class is positive, if the sign is '-' the class is negative.

2.2 FACTOR ANALYSIS

Factor Analysis is a popular statistical technique which can be seen as an extension of Principal Component Analysis (PCA); it is useful in overcoming the shortcomings of PCA and is a multivariate statistical analysis technique. Factor Analysis assumes that the attributes can be grouped by their correlations [24]. Its significance is in finding the intercorrelations between n attributes by reducing them to a set of f factors, where f is considerably smaller than n, the number of attributes [25][27]. It can be viewed as an attempt to approximate the covariance matrix Σ; therefore it reduces the dimensionality of the dataset. Factor analysis produces a table in which the rows are the raw indicator variables and the columns are factors that explain as much of the variance in these variables as possible. The cells in the table contain factor loadings, and the importance of the factors lies in observing which variables are heavily loaded on certain factors. Factor loadings are therefore nothing but the correlations between the variables and the factors.

There are three main steps involved in Factor Analysis.

A. Calculate the initial factor loading matrix: This can be done using two approaches: the principal component method and principal axis factoring.
B. Factor rotation: The objective of the rotation is to try to make sure that all variables have high loadings only on one
factor. There are two types of rotation methods, namely orthogonal and oblique rotation. Generally orthogonal rotation is used when the common factors are independent.
C. Calculation of factor scores: When calculating factor scores (for m factors f1, f2, …, fm), a decision has to be made as to how many factors to include. One vital check is that the retained factors explain more than 75% of the total variance of the original variables; m is then commonly chosen as the number of eigenvalues greater than 1.

3. PROPOSED INTRUSION DETECTION SYSTEM

SVMs are powerful classifiers; they have yielded good results when applied to intrusion detection. They can be applied to data with a large number of features, but their performance is drastically improved by reducing the number of features [19]. In building the IDS, the KDD Cup 99 dataset, which is a benchmark in the area of intrusion detection and security evaluation frameworks, is used. Generally IDS is a classification technique in a machine-learning framework. In the proposed model we have added another phase to reduce the number of features before performing the classification task. The key objective is to increase the detection rate and reduce the false alarm rate. The model consists of five phases: collection of the raw KDD Cup 99 dataset, pre-processing, the feature reduction scheme, parameter selection using SVM, and testing. The proposed model of IDS is described in the figure below.

Figure 2: Proposed Model of Intrusion Detection System (Collection of raw KDD Cup dataset → Feature Reduction Scheme → Parameter Selection using SVM → Testing)

3.1. COLLECTION OF KDD CUP 99 DATASET
The KDD Cup 99 dataset is 10% of the TCP/IP dump data collected from a US Air Force LAN in the year 1998. It contains 494,020 records, of which 97277, 391458, 4107, 1126 and 52 are Normal, DOS, Probe, R2L and U2R respectively. Because the dataset cannot be processed in its raw format, various pre-processing techniques have to be applied to it.

3.2. PRE-PROCESSING PHASE
The given dataset is not ready for the detection process; it has to be processed before undergoing the intrusion detection procedure. Pre-processing techniques are applied to the dataset in order to improve the time, cost and quality of the results. After completing the pre-processing phase, it is necessary to collect the important attributes; to do this we go for the feature reduction phase.

3.3. FEATURE REDUCTION PHASE
For the system to be effective it is necessary to reduce the features in the raw data based on data analysis. Feature reduction is done to obtain the most essential attributes and remove those that are poorly associated. In the feature reduction phase a popular statistical technique called Factor Analysis (FA) is used as the dimensionality reduction technique. Factor Analysis is responsible for binding the number of features or attributes to a number of factors based on the correlations between the features. Therefore, a novel feature reduction scheme based on the FA-SVM algorithm is proposed, in which factor analysis and SVM are applied. The algorithm is given as follows.

FA-SVM Algorithm

Input: Dij is a dataset where 'i' is the number of instances and 'j' is the number of features.
Step 1: Normalize the dataset using suitable pre-processing techniques.
Step 2: Calculate the factor loading matrix of Dij.
Step 3: Find the cumulative variance and determine the principal factors using the eigenvalues.
Step 4: Rotate the factor loading matrix, then compute the factor scores and make them the new features.
Step 5: Train the dataset with the transformed features using SVM classification with C=10 and γ=0.01, using the Radial Basis kernel function.

3.4. INTRUSION DETECTION: PARAMETER SELECTION USING SVM
The dataset is classified using SVM, which assigns the classes as either attack or normal data. Since SVMs are only capable of binary classification, we need to use five SVMs for the 5-class classification in intrusion detection. We separate the data into the two classes of "Normal" and "Others" (Probe, DOS, U2R, R2L) patterns, where "Others" is the group of the four classes of attack instances in the data set. The objective is to divide normal and attack traffic. We repeat this process for all classes. Training is conducted using the RBF (radial basis function) kernel.

3.5. TESTING
In this approach, we conduct 10-fold cross-validation. The dataset is partitioned at random into 10 equal parts, in which the classes are represented in approximately the same proportions as in the full dataset. Each part is held out in turn and training is conducted on the remaining 9 parts; then the testing (error rate) is computed on the holdout set. The training procedure is thus executed a total of 10 times on different training sets, and finally the 10 error rates are averaged to yield an overall error estimate.
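The FA-SVM algorithm above (Steps 1-5), together with the RBF-kernel parameters of Section 3.4 (C=10, γ=0.01) and the 10-fold cross-validation of Section 3.5, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the synthetic data stands in for the pre-processed KDD Cup 99 records, the 12-factor target is an assumption, and scikit-learn's `FactorAnalysis` and `SVC` are assumed stand-ins for the factor-analysis and SVM steps.

```python
# Sketch of the FA-SVM pipeline: normalize, reduce 41 features to
# 12 factor scores, then classify with an RBF-kernel SVM; evaluated
# with 10-fold cross-validation. Synthetic data is a hypothetical
# stand-in for the pre-processed KDD Cup 99 records.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 41))       # 41 attributes, as in KDD Cup 99
y = rng.integers(0, 2, size=500)     # toy labels: 1 = attack, 0 = normal

fa_svm = make_pipeline(
    StandardScaler(),                     # Step 1: normalize the dataset
    FactorAnalysis(n_components=12),      # Steps 2-4: factor scores as new features
    SVC(kernel="rbf", C=10, gamma=0.01),  # Step 5: RBF-kernel SVM, C=10, gamma=0.01
)

# Section 3.5: 10-fold cross-validation; error rate = 1 - mean accuracy
scores = cross_val_score(fa_svm, X, y, cv=10)
print("estimated error rate: %.3f" % (1 - scores.mean()))
```

One pipeline like this would be trained per class for the one-vs-rest scheme of Section 3.4 ("Normal" vs. "Others", repeated for all five classes).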
Figure 3: The Distribution of the dataset (bar chart of the record counts per class: Normal, DOS, Probe, R2L and U2R)

All the symbolic attributes are converted to numeric; therefore the attributes protocol_type, service and flag are converted to numeric values. Then the redundant records are removed from all the classes, and the attributes duration, src_bytes and dst_bytes are discretized. Sampling is then applied to take a subset of the data, since using the entire set of data is too expensive and time consuming to process. Then the feature reduction scheme is applied, which involves finding the factors that dominate the attributes in the dataset. This can be done by applying our FA-SVM algorithm. The outcome is obtained with 12 factors (the 41 features are transformed into 12 different factors), which are fed to the SVM classifier.

4.3 RESULTS
We have performed three types of experiments.
1) The dataset taken contains 14027 records; with no feature selection, i.e. taking all 41 attributes, we applied SVM.
2) In the second experiment we have applied Principal Component Analysis (PCA) with SVM.

TABLE I: DETECTION RATE OBTAINED

            Normal    DOS    Probe    R2L    U2R
  SVM        93.5     79.4    77.7     9.4    9.6
  PCA+SVM    95.2     84.5    84.4    16.4   17.3
  FA+SVM     96.7     93.8    95.1    35     25

TABLE II: FALSE ALARM RATE OBTAINED

            Normal    DOS    Probe    R2L    U2R
  SVM        19.3      9.0    1.24    0.91   0.09
  PCA+SVM    13.4      7.3    2.5     0.3    0.02
  FA+SVM      6.03     5.5    0.9     0.15   0

The Detection Rates and False Alarm Rates of the three experiments SVM, PCA+SVM and FA+SVM are depicted in the charts in Figure 4 & Figure 5 for evaluation of the results in a precise way.

Figure 4: Comparison of Performance Results: Detection Rate (bar chart of the detection rates of SVM, PCA+SVM and FA+SVM per class)
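The pre-processing steps described above (symbolic-to-numeric conversion of protocol_type, service and flag, removal of redundant records, discretization of duration, src_bytes and dst_bytes, and sampling) can be sketched as below. This is a hedged illustration assuming pandas; the tiny inline DataFrame and its values are invented stand-ins for KDD Cup 99 records, not the authors' code or data.

```python
# Sketch of the pre-processing phase on a toy stand-in for the
# KDD Cup 99 data: encode symbolic attributes, drop redundant
# (duplicate) records, discretize the count attributes, and
# sample a subset to cut processing cost.
import pandas as pd

df = pd.DataFrame({                  # toy records (invented values)
    "protocol_type": ["tcp", "udp", "tcp", "tcp"],
    "service": ["http", "dns", "http", "http"],
    "flag": ["SF", "SF", "S0", "SF"],
    "duration": [0, 2, 300, 0],
    "src_bytes": [181, 45, 0, 181],
    "dst_bytes": [5450, 44, 0, 5450],
})

# 1) Symbolic attributes -> numeric codes
for col in ["protocol_type", "service", "flag"]:
    df[col] = df[col].astype("category").cat.codes

# 2) Remove redundant (duplicate) records
df = df.drop_duplicates()

# 3) Discretize the continuous attributes (here: 3 equal-width bins)
for col in ["duration", "src_bytes", "dst_bytes"]:
    df[col] = pd.cut(df[col], bins=3, labels=False)

# 4) Random sampling to take a subset (here: 2 of the remaining rows)
sample = df.sample(n=2, random_state=0)
print(sample)
```

The bin count and sample size here are arbitrary; the paper itself does not state the discretization granularity beyond naming the three attributes.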
are more accurate for the detection of attacks.

REFERENCES

1. Ghosh A. K., "Learning Program Behavior Profiles for Intrusion Detection", USENIX, 1999.
2. Mukkamala S., Janoski G., Sung A. H., "Intrusion Detection Using Neural Networks and Support Vector Machines", Proceedings of the IEEE International Joint Conference on Neural Networks, 2002, pp. 1702-1707.
3. H. Debar, M. Dacier and A. Wespi, "Towards a taxonomy of intrusion-detection systems", Computer Networks, vol. 31, pp. 805-822, 1999.
4. Wun-Hwa Chen, Sheng-Hsun Hsu, "Application of SVM and ANN for intrusion detection", Computers & Operations Research, Elsevier, 2005.
5. Rupali Datti, Bhupendra Verma, "Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis", (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010, pp. 1072-1078.
6. Andrew Sung, S. Mukkamala, "Feature Selection for Intrusion Detection using Neural Networks and Support Vector Machines", Transportation Research Record: Journal of the Transportation Research Board 1822.1, 2003, pp. 33-39.
7. Ravi Kiran Varma, V. Valli Kumari, "Feature Optimization and Performance Improvement of a Multiclass Intrusion Detection System using PCA and ANN", International Journal of Computer Applications (0975-8887), Vol. 44, No. 13, April 2012.
8. Safaa Zaman and Fakhri Karray, "Features Selection for Intrusion Detection Systems Based on Support Vector Machines", Consumer Communications and Networking Conference (CCNC 2009), 6th IEEE, 2009.
9. Gopi K. Kuchimanchi, Vir V. Phoha, Kiran S. Balagani, Shekhar R. Gaddam, "Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems", Proceedings of the 2004
21. Available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm, 2001.
22. Yu, H., Yang, J., Han, J., "Classifying large data sets using SVM with hierarchical clusters", In: SIGKDD, 2003, pp. 306-315.
23. Lebrun, G., Charrier, C., Cardot, H., "SVM training time reduction using vector quantization", In: ICPR, Volume 1, 2004, pp. 160-163.
24. Nitin Khosla, "Dimensionality reduction using factor analysis", Masters Thesis, http://researchhub.griffith.edu.au/display/n26993f96c6bc6146d5444ea116009424, 2006.
25. R. J. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, New Jersey, 1998.
26. M. Hall, et al., "The WEKA data mining software: an update", ACM SIGKDD Explorations Newsletter, vol. 11, pp. 10-18, 2009.
27. http://www.cs.cmu.edu/~pmuthuku/mlsp_page/lectures/slides/JFA_presentation_final.pdf