An Anomaly Detection Model Based On One-Class SVM
Abstract—Intrusion detection occupies a decisive position in solving network security problems. Support Vector Machines (SVMs) are among the most widely used intrusion detection techniques. However, the commonly used two-class SVM algorithms face difficulties in constructing the training dataset: in many real application scenarios, normal connection records are easy to obtain, but attack records are not. We propose an anomaly detection model based on one-class SVM to detect network intrusions. The one-class SVM adopts only normal network connection records as its training dataset, yet after being trained it is able to distinguish normal traffic from various attacks. This exactly meets the requirements of anomaly detection. Experimental results on the KDDCUP99 dataset show that, compared to the Probabilistic Neural Network (PNN) and C-SVM, our anomaly detection model based on one-class SVM achieves higher detection rates and yields better average performance in terms of precision, recall and F-value.

Keywords—intrusion detection; anomaly detection; one-class SVM; scaling strategy

I. INTRODUCTION

Nowadays, we are living in an information age. We immerse ourselves in the endless joy and great convenience brought by the Internet. Especially with the rapid growth of Web applications, everything seems so easy. However, in recent years, "attack", "intrusion" and similar words have frequently appeared before people's eyes. Like any great invention, the Internet is a double-edged sword: we enjoy its benefits, but at the same time we suffer from increasing network threats. The well-known Internet security corporation Symantec warns in its annual Internet Security Threat Report (ISTR) that cybercrime remains prevalent and that damaging threats from cybercriminals continue to loom over businesses and consumers [1]. Another Web security company, Cenzic, reported in 2014 that 96% of the tested Internet applications had vulnerabilities, with a median of 14 per application, and that hackers are increasingly focusing on, and succeeding with, layer 7 (application layer) attacks [2]. These reports show that network security must not be ignored and that effective security measures are much needed.

Among the important ways to solve security problems, intrusion detection is an effective and high-profile method. Intrusion detection was first introduced by Anderson in [3], and a large body of research has been carried out since [4]. Generally, there are two main approaches to intrusion detection: signature-based detection and anomaly-based detection. Signature-based detection, also called misuse detection, usually builds a model based on known attacks; any activity that matches a known attack signature is regarded as an intrusion. A signature-based detection model has good prior knowledge of known attacks, but seldom covers new types of attacks; hence, in practice, it can miss a significant number of real attacks [5]. By contrast, anomaly detection builds a profile from normal behaviors, and any violation of this profile is reported as an intrusion. Theoretically, it is capable of detecting both known and unknown attacks. In the current complicated network environment, anomaly detection is much more needed and has better application prospects. In this paper, we focus on anomaly detection.

With the network developing at an unprecedented pace, traditional intrusion detection approaches face more and more challenges, so many new techniques have been introduced to conduct intrusion detection [6], among which the Support Vector Machine (SVM) is one of the most widely used [7, 8]. The standard SVM algorithm was proposed by Corinna Cortes and Vladimir Vapnik and published in 1995 [9]. An SVM algorithm constructs a hyperplane in a high- or infinite-dimensional space that has the largest distance to the nearest training data point of any class, which can be used for classification and other tasks. By using slack variables and kernel tricks, the SVM can find a hyperplane that achieves a good separation. However, in actual intrusion detection scenarios, the conventional two-class SVM algorithms face some practical problems. For example, in many cases normal network records can be obtained easily, but intrusion records cannot, so it is difficult to construct the training dataset. Moreover, intrusion detection is not a straightforward binary classification problem: the attacks can be divided into many categories. Though some researchers have tried using multi-class SVMs for intrusion detection, a multi-class SVM is just a combination of many two-class SVMs and encounters similar problems.

Given this, we propose to adopt the one-class SVM [10], which uses only normal connection records as the training dataset and can distinguish normal traffic from various attacks, to conduct anomaly detection.
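As a concrete, minimal illustration of this idea (a sketch only: it uses scikit-learn's OneClassSVM as a stand-in rather than the implementation evaluated in this paper, and randomly generated placeholder features instead of real connection records):

```python
# Minimal sketch: train a one-class SVM on normal connection records only,
# then flag unseen records as normal (+1) or anomalous (-1).
# scikit-learn's OneClassSVM implements the Scholkopf et al. formulation [10].
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 10))  # placeholder "normal" feature vectors
X_unseen = rng.normal(loc=3.0, scale=1.0, size=(20, 10))   # placeholder unseen traffic

# nu bounds the fraction of training points treated as outliers;
# gamma is the Gaussian (RBF) kernel parameter.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1)
model.fit(X_normal)                      # trained on normal records only

pred = model.predict(X_unseen)           # +1 = normal, -1 = anomaly (possible intrusion)
print("flagged as intrusion:", int((pred == -1).sum()), "of", len(pred))
```

Everything the model learns comes from the normal records alone; anything it cannot reconcile with that learned profile is flagged, which is exactly the anomaly detection setting described above.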
B. Feature extracting module

Almost no intrusion detection model can distinguish intrusive connections from normal connections directly from raw packets; the detection module must be fed formatted data. Feature extracting obtains useful information from the raw data and formats it so that it can be interpreted by the detection module. There is no permanent standard for extracting features; it may be better to extract features based on the actual network environment in order to find whether attacks are hidden in the connections. Extracting proper features helps the detection module make more accurate predictions. For network intrusions, some frequently used features deserve attention, such as the length (in seconds) of the connection, the protocol type (e.g. tcp, udp), the number of data bytes transferred, the number of "root" accesses, and so forth. In our one-class SVM based detection model, the feature extracting module takes the raw data as input and extracts the expected features to form the formatted data. Moreover, the feature extracting module is in charge of dividing the formatted data into two parts, the training data and the testing data. This process is fairly simple: the normal records form the training data and the remaining (intrusive) records form the testing data. This reflects the detection mechanism of the one-class SVM (detailed later).

C. One-class SVM module

Here, we adopt the one-class SVM proposed by Schölkopf et al. [10]. First, consider the training dataset

D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i = +1\}_{i=1}^{l},  (1)

where $x_i$ is the feature vector of dimension $n$, $y_i = +1$ means all the training patterns are normal observations, and $l$ is the number of training patterns.

The algorithm basically separates all the data points from the origin. Suppose the hyperplane has the form

w \cdot \phi(x) - \rho = 0,  (2)

then the distance from the hyperplane to the origin is $\rho / \|w\|$. Maximizing this distance leads to the following quadratic programming problem:

\min_{w, \xi, \rho} \ \frac{1}{2}\|w\|^2 + \frac{1}{vl}\sum_{i=1}^{l}\xi_i - \rho
\quad \text{s.t.} \quad w \cdot \phi(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0.  (3)

Here, $\phi(x_i)$ is the feature mapping function that maps $x_i$ from the input space to a feature space, $\xi_i$ is the slack variable for outlier $x_i$ that allows it to lie on the other side of the decision boundary (hyperplane), and $v \in (0, 1]$ is the regularization parameter, which is an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors.

If $w$ and $\rho$ solve this problem, we have the following decision function:

f(x) = \mathrm{sgn}(w \cdot \phi(x) - \rho).  (4)

That is, if $w \cdot \phi(x) - \rho \ge 0$, $x$ is declared a normal event; otherwise, it is declared intrusive.

To solve the quadratic program (3), we introduce multipliers $\alpha_i, \beta_i \ge 0$ and obtain the Lagrangian

L(w, \xi, \rho, \alpha, \beta) = \frac{1}{2}\|w\|^2 + \frac{1}{vl}\sum_{i=1}^{l}\xi_i - \rho - \sum_{i=1}^{l}\alpha_i\big(w \cdot \phi(x_i) - \rho + \xi_i\big) - \sum_{i=1}^{l}\beta_i\xi_i.  (5)

The following Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient for the quadratic programming problem (3):

\frac{\partial L}{\partial w} = 0, \quad \frac{\partial L}{\partial \rho} = 0, \quad \frac{\partial L}{\partial \xi_i} = 0, \quad \alpha_i \ge 0, \quad \beta_i \ge 0, \quad i = 1, 2, \ldots, l.  (6)

According to the KKT conditions, setting the derivatives with respect to the primal variables $w$, $\xi_i$, $\rho$ equal to zero yields

w = \sum_{i=1}^{l}\alpha_i\phi(x_i), \quad \alpha_i = \frac{1}{vl} - \beta_i \ \Leftrightarrow\ 0 \le \alpha_i \le \frac{1}{vl}, \quad \sum_{i=1}^{l}\alpha_i = 1.  (7)

In (7), an $x_i$ with $\alpha_i = 0$ is an inner point that is irrelevant to solving $w$; an $x_i$ with $0 < \alpha_i < 1/(vl)$ is called a support vector and is crucial for solving $w$; an $x_i$ with $\alpha_i = 1/(vl)$ is an outlier.

Now we can rewrite the decision function (4) using the kernel function:

f(x) = \mathrm{sgn}\Big(\sum_{i=1}^{l}\alpha_i k(x_i, x) - \rho\Big).  (8)
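To make the kernelized decision function (8) concrete, the following minimal sketch evaluates it directly with the Gaussian kernel introduced below as (10); the support vectors, coefficients $\alpha_i$ and offset $\rho$ here are synthetic placeholders, not values obtained from the optimization described in this paper.

```python
# Sketch of decision function (8) with a Gaussian kernel:
# f(x) = sgn( sum_i alpha_i * k(x_i, x) - rho ),  k(x_i, x) = exp(-gamma * ||x_i - x||^2)
import numpy as np

def gaussian_kernel(a, b, gamma):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, support_vectors, alpha, rho, gamma):
    score = sum(a_i * gaussian_kernel(x_i, x, gamma)
                for a_i, x_i in zip(alpha, support_vectors)) - rho
    return 1 if score >= 0 else -1   # +1: normal, -1: intrusive

# Placeholder values (illustrative only).
support_vectors = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
alpha = np.array([0.4, 0.3, 0.3])    # must satisfy 0 <= alpha_i <= 1/(v*l) and sum to 1
rho, gamma = 0.5, 0.1

print(decision(np.array([0.3, 0.1]), support_vectors, alpha, rho, gamma))
```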
In (8), $k(x_i, x)$ is the kernel function, with the equivalent form

k(x_i, x) = \phi(x_i) \cdot \phi(x).  (9)

For the one-class SVM used in our detection model, we use the Gaussian kernel:

k(x_i, x) = e^{-\gamma \|x_i - x\|^2}.  (10)

Substituting (7) into (5), we obtain the dual problem:

\min_{\alpha} \ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j k(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le \frac{1}{vl}, \quad \sum_{i=1}^{l}\alpha_i = 1.  (11)

The solution of the dual problem (11) is also the solution of the primal quadratic programming problem (3), and solving the dual problem is much easier and more feasible. We use the SMO (Sequential Minimal Optimization) algorithm [26] to solve the dual problem.

We know that any support vector $x_i$ satisfies the equality

\rho = \sum_{j=1}^{l}\alpha_j k(x_j, x_i).  (12)

The attacks in the KDDCUP99 dataset fall into four main categories. (1) DOS: denial of service attacks. (2) Probe: surveillance and probing to steal information or find vulnerabilities, e.g. port scanning. (3) R2L: remote to local, unauthorized access from a remote machine, e.g. password guessing. (4) U2R: user to root, exploiting the system's vulnerabilities to gain super user (root) privileges, e.g. various "buffer overflow" attacks. The attacks in each category are further divided into many types, giving a total of 24 training attack types. It is important to note that the testing data contains some specific types not present in the training data and does not follow the same probability distribution as the training data. This makes the intrusion detection task more realistic.

To test the model's ability to detect different kinds of attacks, we randomly selected different types of records from the raw testing data. The sampling proportion is about 1%. Some types of attacks, such as R2L and U2R, were selected in full due to their low proportion in the KDDCUP99 dataset. Finally, 32426 normal connection records from the raw training data and 31415 connection records from the raw testing data were randomly selected. TABLE 1 shows the details of the different categories of records; "Other" indicates the new types of attacks not present in the four main categories.
Based on the counts of true positives (TP), false positives (FP) and false negatives (FN), we compute the evaluation values. The precision, recall and F-value are defined as follows:

Precision = \frac{TP}{TP + FP},  (14)

Recall = \frac{TP}{TP + FN},  (15)

F\text{-}value = \frac{2 \times Recall \times Precision}{Recall + Precision}.  (16)

C. Results and Discussions

In this section, we compare our one-class SVM based model with two other well-known models, the probabilistic neural network (PNN) [30] and C-SVM (proposed by Cortes and Vapnik in [9]), given that both adopt the radial basis function (Gaussian kernel) as the one-class SVM does and are often used to detect intrusions due to their good classification performance. The PNN used in our experiments is taken from the MATLAB R2013b toolbox and the C-SVM from the software LIBSVM [31]. Because the training data used by PNN and C-SVM must contain both normal and abnormal records, we conducted a stratified random sampling of the raw training data in KDDCUP99 with a proportion of around 1%. The resulting training data consists of 49567 records, including 9728 Normal (19.63%), 39167 DOS (79.02%), 412 Probe (0.83%), 208 R2L (0.42%), and 52 U2R (0.10%) records. All three models use the same testing data as described in TABLE 1.

The detection rates for the different categories of attacks are shown in TABLE 2 and Figure 2, and are produced in this way: first, any attack detected by the one-class SVM is declared abnormal, without any further distinction. Then we compute the detection rate for each category of attacks according to the labels in the testing data. We can see that for DOS attacks, the three models all achieve excellent results (all above 99%). For Probe attacks, the one-class SVM reaches the top detection rate of 100%, while the detection rates of PNN and C-SVM are relatively lower, at 98.73% and 86.98% respectively. We should note that for the R2L, U2R and "Other" categories of attacks, the results of all three models are not very satisfactory. We believe one of the main reasons is that the number of attacks in these three categories is relatively small (see TABLE 1), so the test results have some limitations. Another reason may be that these attacks are too covert to be detected by the models. Even so, the detection results of the one-class SVM are considerably better than those of the other two. Furthermore, for PNN and C-SVM, the "Other" category consists of new attacks not present in their training data, so it is especially difficult for them to detect such attacks; for the one-class SVM, the new attacks receive exactly the same treatment as the other categories of attacks.

Next, we use three other criteria, precision, recall and F-value, to compare performance. The results are shown in TABLE 3 and Figure 3. As illustrated by Figure 3, the one-class SVM produces a slightly lower precision than PNN and C-SVM, but the precisions of all three models are very high (above 99%). The recall and F-value of the one-class SVM are clearly higher than those of the others.
TABLE 3. PRECISION, RECALL AND F-VALUE OF DIFFERENT MODELS
PNN C-SVM One-class SVM
Precision 0.9988 0.9957 0.9903
Recall 0.8916 0.9041 0.9161
F-value 0.9422 0.9477 0.9518
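As a quick sanity check of definition (16) against TABLE 3, each F-value can be recomputed from the reported precision and recall; for the one-class SVM, 2 x 0.9903 x 0.9161 / (0.9903 + 0.9161) is approximately 0.9518, which matches the table. A short sketch in Python:

```python
# Recompute the F-values in TABLE 3 from the reported precision and recall (eq. 16).
table3 = {"PNN": (0.9988, 0.8916), "C-SVM": (0.9957, 0.9041), "One-class SVM": (0.9903, 0.9161)}
for model, (precision, recall) in table3.items():
    f_value = 2 * recall * precision / (recall + precision)
    print(f"{model}: F-value = {f_value:.4f}")   # 0.9422, 0.9477, 0.9518
```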
The commonly used two-class SVM algorithms face difficulties in constructing the training dataset. This is because, in many real application scenarios, it is easy to obtain normal connection records but difficult to obtain attack records, or the number of attack records is very limited, whereas the distribution of the training records affects the detection results of a two-class SVM to a great extent. Hence, we propose to use the one-class SVM, which adopts only normal network connection records as its training data, to conduct anomaly detection. The experimental results on the KDDCUP99 dataset show that, compared to PNN and C-SVM, our one-class SVM achieves higher detection rates for different categories of attacks and has better average performance in terms of precision, recall and F-value. The deficiency is that both our one-class SVM based model and the other two models show relatively low detection rates for low-frequency attacks, such as R2L and U2R. The insufficient amount of such data is partly to blame for the limited accuracy of these results, but the detection model could also be enhanced. We leave this as future work.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their comments and suggestions, our advisors for their valuable guidance, and all the authors of the referenced works, whose efforts helped us carry out this research.

REFERENCES

[1] Symantec Enterprise. Internet Security Threat Report 2014. http://www.symantec.com/content/en/us/enterprise/other_resources/b-istr_main_report_v19_21291018.en-us.pdf (accessed 15 Apr. 2015).
[2] Cenzic. Application Vulnerability Trends Report 2014. http://www.cenzic.com/downloads/Cenzic_Vulnerability_Report_2014.pdf (accessed 15 Apr. 2015).
[3] Anderson, James P. Computer security threat monitoring and surveillance. Vol. 17. Technical report, James P. Anderson Company, Fort Washington, Pennsylvania, 1980.
[4] Axelsson, Stefan. Intrusion detection systems: A survey and taxonomy. Vol. 99. Technical report, 2000.
[5] Kruegel, Christopher, and Thomas Toth. "Using decision trees to improve signature-based intrusion detection." Recent Advances in Intrusion Detection. Springer Berlin Heidelberg, 2003.
[6] Patcha, Animesh, and Jung-Min Park. "An overview of anomaly detection techniques: Existing solutions and latest technological trends." Computer Networks 51.12 (2007): 3448-3470.
[7] Li, Yuping, Weidong Li, and Guoqiang Wu. "An intrusion detection approach using SVM and multiple kernel method." IJACT: International Journal of Advancements in Computing Technology 4.1 (2012): 463-469.
[8] Li, Yinhui, et al. "An efficient intrusion detection system based on support vector machines and gradually feature removal method." Expert Systems with Applications 39.1 (2012): 424-430.
[9] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.
[10] Schölkopf, Bernhard, et al. "Estimating the support of a high-dimensional distribution." Neural Computation 13.7 (2001): 1443-1471.
[11] Taylor, Carol, and Jim Alves-Foss. "Low cost network intrusion detection." (2000).
[12] Barbara, Daniel, Ningning Wu, and Sushil Jajodia. "Detecting novel network intrusions using Bayes estimators." SDM, 2001.
[13] Shyu, Mei-Ling, et al. "A novel anomaly detection scheme based on principal component classifier." Miami Univ. Coral Gables FL, Dept. of Electrical and Computer Engineering, 2003.
[14] Qin, Min, and Kai Hwang. "Frequent episode rules for intrusive anomaly detection with Internet datamining." USENIX Security Symposium, 2004.
[15] Denning, Dorothy E. "An intrusion-detection model." IEEE Transactions on Software Engineering SE-13.2 (1987): 222-232.
[16] Wang, Gang, et al. "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering." Expert Systems with Applications 37.9 (2010): 6225-6232.
[17] Sinclair, Chris, Lyn Pierce, and Sara Matzner. "An application of machine learning to network intrusion detection." Proceedings of the 15th Annual Computer Security Applications Conference (ACSAC '99). IEEE, 1999.
[18] Tsai, Chih-Fong, et al. "Intrusion detection by machine learning: A review." Expert Systems with Applications 36.10 (2009): 11994-12000.
[19] Ryan, Jake, Meng-Jang Lin, and Risto Miikkulainen. "Intrusion detection with neural networks." Advances in Neural Information Processing Systems (1998): 943-949.
[20] Kim, Dong Seong, and Jong Sou Park. "Network-based intrusion detection with support vector machines." Information Networking. Springer Berlin Heidelberg, 2003.
[21] Sung, Andrew H., and Srinivas Mukkamala. "Identifying important features for intrusion detection using support vector machines and neural networks." Proceedings of the 2003 Symposium on Applications and the Internet. IEEE, 2003: 209-216.
[22] Mukkamala, Srinivas, Guadalupe Janoski, and Andrew Sung. "Intrusion detection using neural networks and support vector machines." Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02), Vol. 2. IEEE, 2002.
[23] Ambwani, Tarun. "Multi class support vector machine implementation to intrusion detection." Proceedings of the International Joint Conference on Neural Networks, Vol. 3. IEEE, 2003.
[24] Khan, Latifur, Mamoun Awad, and Bhavani Thuraisingham. "A new intrusion detection system using support vector machines and hierarchical clustering." The VLDB Journal 16.4 (2007): 507-521.
[25] Horng, Shi-Jinn, et al. "A novel intrusion detection system based on hierarchical clustering and support vector machines." Expert Systems with Applications 38.1 (2011): 306-313.
[26] Platt, John. "Sequential minimal optimization: A fast algorithm for training support vector machines." (1998).
[27] UCI KDD Archive. KDDCUP99 dataset. http://kdd.ics.uci.edu/databases/kddcup99/ (accessed 15 Apr. 2015).
[28] MIT Lincoln Laboratory. DARPA Intrusion Detection Data Sets. http://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data/index.html (accessed 15 Apr. 2015).
[29] Georgiou, V. L., et al. "Optimizing the performance of probabilistic neural networks in a bioinformatics task." Proceedings of the EUNITE 2004 Conference, 2004.
[30] Specht, Donald F. "Probabilistic neural networks." Neural Networks 3.1 (1990): 109-118.
[31] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology 2.3 (2011): 27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.