2006MONAM

Neural networks vs. decision trees for intrusion detection

Yacine Bouzida
Mitsubishi Electric ITE-TCL
1, allée de Beaulieu CS 10806
35708 Rennes, France
[email protected]

Frédéric Cuppens
Département RSM, GET/ENST Bretagne
2, rue de la Châtaigneraie
F-35576 Cesson Sévigné, France
[email protected]

ABSTRACT

Signature based intrusion detection systems cannot detect new attacks. These systems are the most used and developed ones. Current anomaly based intrusion detection systems are also unable to detect all kinds of new attacks, because they are designed for restricted applications in limited environments. Current hackers are using new attacks where neither preventive techniques, mainly based on access control, nor current intrusion detection systems can prevent the devastating results of these attacks against information systems. We enhance the notion of anomaly detection and use both neural networks and decision trees for intrusion detection. Since these techniques are mainly applicable to misuse detection, we use our anomaly detection enhancement to improve these techniques for anomaly detection. Experimental results demonstrate that while neural networks are highly successful in detecting known attacks, decision trees are more interesting for detecting new attacks. The proposed methods outperform previous work in detecting both known and new attacks.

KEY WORDS
Intrusion Detection, Anomaly Detection, Neural Networks, Decision Trees.

I. INTRODUCTION

Anomaly intrusion detection systems are not as well studied or explored as misuse detection ones. Misuse detection consists in using patterns of well known intrusions to match and identify known labels for unlabeled datasets. In fact, many commercial and open source intrusion detection systems (IDSs) are misuse based ones. Recently, attackers have carried out serious break-ins at many commercial and government sites where serious damage has occurred. The different intrusions that were used were new. This situation was foreseeable, because attackers are attempting to develop new attack forms that neither misuse detection tools nor access control tools (such as firewalls) installed in our networks may detect or stop.

Anomaly detection, on the other hand, consists in building profiles of normal behaviors and then detecting any deviation of a new behavior from the learned normal profiles. This definition (of anomaly detection) is restrictive, because only one class, which corresponds to the normal behavior, is learned.

In this paper, we extend the definition of anomaly detection to not only take into account normal profiles but also handle known attacks, and we explore supervised machine learning techniques, particularly neural networks and decision trees, for intrusion detection. In fact, the decision tree induction algorithm has proven its efficiency in predicting the different classes of the unlabeled data in the test data set of the KDD 99 intrusion detection contest [9]. Since machine learning techniques generally cannot find boundaries between known and unknown classes, an extension of neural networks and decision trees is introduced to deal with new unknown anomalies.

The rest of the paper is organized as follows. Section II presents the state of the art of current anomaly detection methods in general and the limitations of current anomaly detection tools, which only learn normal behaviors and flag suspicion when a deviation from the established normal behavior is observed. Based on this state of the art, we enhance the anomaly detection notion for detecting novel attacks. Section III briefly presents background on neural networks and an improvement of this technique to handle new attacks, with the different results obtained for both the standard multilayer neural network and its enhancement for new attack detection. In Section IV, we present the decision tree induction algorithm with an improvement for anomaly detection and the corresponding results. Finally, Section V concludes the paper.

II. PROBLEM STATEMENT AND MOTIVATION

Anomaly intrusion detection is the first intrusion detection method; it was introduced to monitor computer systems by Anderson [1] in 1980 and then improved by Denning [6] in 1987. At that time, intrusion detection was immature, since only user behavior and some system events were taken into account. In fact, this approach consisted in establishing a normal behavior profile for user and system activity and observing significant deviations of the actual user activity with respect to the established habitual profile. Significant deviations are flagged as anomalous and should raise suspicion. This definition did not take into account the expert knowledge of known vulnerabilities and known attacks. This is why we enhance the notion of anomaly detection not only by considering normal profiles but also by taking into account abnormal behaviors that are extracted from known attacks.
Since we have knowledge about known vulnerabilities and their corresponding attacks, we may enhance anomaly detection by adding to the learning step the abnormal behavior corresponding to known attacks. Therefore, anomaly detection would consist in learning all known normal and attack profiles. Based on this learnt knowledge, anomaly detection then has to detect whether a new observed profile is normal or abnormal, in which case its corresponding known attack is determined, or whether the observed profile is new and therefore considered a novel unknown behavior. Thereafter, we suggest that a diagnosis should be done on the observed traffic that caused the detection of the new anomaly, in order to find out the reason for this new observation. Thus, if it corresponds to a new activity that was not seen before, it is flagged either as a normal profile or as a new attack. The new observations with their real classification would then be considered for further investigation. We note that the diagnosis of the newly observed behaviors is not our main objective here.

To our knowledge, all the efforts made by different researchers for detecting new attacks in KDD 99 either consider unrealistic hypotheses or obtain uninteresting results. In the following, we recall the different measures that are used to rank the different proposed methods for this task. We use these measures to assess our results discussed in Sections III and IV.

To rank the different results, a cost matrix C is defined [7]. Given the cost matrix illustrated in Table I and the confusion matrix obtained subsequent to an empirical testing process, a cost per test (CPT) is calculated using the formula given in Equation (1).

Actual \ Predicted   Normal  Probing  DoS  U2R  R2L
Normal                  0       1      2    2    2
Probing                 1       0      2    2    2
DoS                     2       1      0    2    2
U2R                     3       2      2    0    2
R2L                     4       2      2    2    0

TABLE I: The cost per test matrix.

CPT = \frac{1}{N} \sum_{i=1}^{5} \sum_{j=1}^{5} C_{i,j} \cdot CM_{i,j}    (1)

where C corresponds to the cost matrix, N is the number of instances in the test data set and CM corresponds to the confusion matrix obtained subsequent to the method that is used in the classification task.

The accuracy of each experiment is based on the percentage of successful prediction (PSP) on the test data set:

PSP = \frac{\text{number of successful instance classifications}}{\text{number of instances in the test set}}    (2)

The different techniques that have been applied to the KDD 99 data sets did not detect unknown attacks as new ones, because the methods used for this task are not anomaly based techniques. Rather, they consisted in learning the signatures of the connections (attacks or normal traffic) using the 41 attributes composing the different connections. The new connections are then compared to the learning data set using the model constructed during the training phase. We call this technique "misuse detection by learning", because it differs from other signature based techniques, such as snort, bro, etc., where only attack signatures are used. In the latter case, it is the role of the expert to write the different rules using the corresponding known vulnerabilities.

Based on the above considerations and limitations, and on the need to explain the failure of supervised techniques to detect U2R and R2L attacks, we investigate two supervised techniques, namely neural networks and decision trees, in order to explain the failure of machine learning techniques in the KDD 99 contest.

In our experiments, we use the KDD 99 data sets without altering any sample or adding any new sample, as described in Table II. The attacks that have any occurrence in the learning set should be detected as known attacks, and the others, those that are absent in the training set and present in the test set, are considered anomalies and should be predicted as new attacks.

The default supervised algorithms do not deal with unknown classes. They are interesting since they can generate alarms in real time at the end of a connection, in contrast to unsupervised techniques that remain unusable for real time intrusion detection. Separate modules for anomaly and misuse detection may not be as efficient as a single module combining the two techniques at the same time. These observations have motivated us to enhance supervised anomaly detection techniques as presented in the following sections.

In the following, we motivate our research with neural networks to compare their results with the best entries. We consider them also as a cross validation technique with decision trees for the task of detecting new attacks. While the successful detection rate of new U2R attacks is increased in our experiments, that of R2L attacks remains very low. This suggested that we focus on the transformation done by the MADAM/ID tool and prove that this transformation is not an appropriate one. The low detection rate of new R2L attacks is due to this transformation and not to the enhanced machine learning algorithms, particularly the decision tree induction algorithm.
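As an illustration, the two evaluation measures, CPT from Equation (1) and PSP from Equation (2), can be computed from a confusion matrix as follows (a minimal sketch; the function names are ours, and the cost matrix is that of Table I):

```python
# Sketch: computing the cost per test (CPT, Equation (1)) and the
# percentage of successful prediction (PSP, Equation (2)) from a
# 5x5 confusion matrix. Function names are illustrative.

# Cost matrix C from Table I (rows: actual class, columns: predicted
# class; order: Normal, Probing, DoS, U2R, R2L).
COST = [
    [0, 1, 2, 2, 2],
    [1, 0, 2, 2, 2],
    [2, 1, 0, 2, 2],
    [3, 2, 2, 0, 2],
    [4, 2, 2, 2, 0],
]

def cost_per_test(cm):
    """CPT = (1/N) * sum_{i,j} C[i][j] * CM[i][j]."""
    n = sum(sum(row) for row in cm)  # N: number of test instances
    return sum(COST[i][j] * cm[i][j]
               for i in range(5) for j in range(5)) / n

def psp(cm):
    """Fraction of test instances predicted in their actual class."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(5)) / n
```

For a perfect classifier the confusion matrix is diagonal, so CPT = 0 and PSP = 1; misclassifying an R2L attack as normal is the most expensive error (cost 4).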
Probing (4,107; 4,166): ipsweep (1,247; 306), mscan (0; 1,053), nmap (231; 84), portsweep (1,040; 354), saint (0; 736), satan (1,589; 1,633).
DoS (391,458; 229,853): apache2 (0; 794), back (2,203; 1,098), land (21; 9), mailbomb (0; 5,000), neptune (107,201; 58,001), pod (264; 87), processtable (0; 759), smurf (280,790; 164,091), teardrop (979; 12), udpstorm (0; 2).
U2R (52; 228): buffer overflow (30; 22), httptunnel (0; 158), loadmodule (9; 2), perl (3; 2), ps (0; 16), rootkit (10; 13), sqlattack (0; 2), xterm (0; 13).
R2L (1,126; 16,189): ftp write (8; 3), imap (12; 1), multihop (7; 18), named (0; 17), phf (4; 2), guess passwd (53; 4,367), sendmail (0; 17), snmpgetattack (0; 7,741), snmpguess (0; 2,406), spy (2; 0), warezclient (1,020; 0), warezmaster (20; 1,602), worm (0; 2), xlock (0; 9), xsnoop (0; 4).

TABLE II: The different attack types and their corresponding occurrence numbers in the training and test data sets, respectively.

III. NEURAL NETWORKS

A. Backpropagation technique for intrusion detection

Backpropagation is a neural network learning algorithm. A neural network is a set of connected units following a particular topology. Each neuron is described by a unit that has an input and an output. Two neurons are connected if the output of one of them is connected to the input of the other. Each connection in a neural network has a weight associated to it. The topology of the neural network, the training methodology for weight adjustment and the connections between the different neurons define the type of the corresponding neural network. In our study, we are interested in multilayer neural networks using the backpropagation learning algorithm [14]. In a multilayer neural network, there are three kinds of layers. Each layer contains a set of neurons. The first layer, called the input layer, sets the activation of its neurons according to the provided pattern in question. The output layer provides the answer of the network. A multilayer network may contain one or many hidden layers, although in practice usually one is used.

Like any supervised learning technique, a multilayer neural network has two phases: a learning phase, where the network adjusts its weights so as to be able to predict the correct class label of new input patterns, and a test phase.

Before the training process, one should define the number of hidden layers (if more than one) and the number of neurons on each layer. The number of neurons on the input layer corresponds to the number of attributes that represent a sample. However, input values should be numerical to perform the backpropagation algorithm. Therefore, the discrete values are transformed into a vector, as explained in the following. Each different discrete value of an attribute is assigned a neuron on the input layer. For example, for the protocol type (tcp, udp, icmp), there are three inputs, say I0, I1, I2, assigned to this attribute. Each unit is initialized to 0. If the protocol type of the current connection is tcp (resp. udp, icmp) then I0 is set to 1 (resp. I1 is set to 1, and so on). One output unit, on the output layer, may be used to represent exactly one class. So, if the output of a neuron on the output layer is equal to 1, then the corresponding class is designated as the predicted class. The number of hidden layers and the number of units on each hidden layer are established by experience during the training phase, since there are no clear rules for setting the best number of hidden layer units.

The use of neural networks in intrusion detection is not new; at least two works were developed during the last decades. The first model is used in Hyperview [5] for user behavior modeling. The second one is discussed in [4]. The latter was used as a misuse detection tool where only packet header attributes are considered for analysis, to detect denial of service and port scan attacks. While these works used neural networks for either user anomaly detection or misuse detection, we use them here for both network misuse and anomaly detection, particularly over the different KDD 99 data sets [9].

B. Experimental methodology and results

Some parameters of the neural network are known a priori from the given problem. The number of neurons on the input layer, in our example using the KDD 99 data sets, is equal to 125 units, because the discrete attributes among the 41 attributes are translated into continuous ones. The number of neurons on the output layer is equal to the total number of classes, corresponding to the five classes considered in the KDD 99 contest (normal, probing, DoS, U2R and R2L, respectively).

Other parameters, such as the number of hidden layers, the number of neurons in the hidden layers, the momentum, the learning rate and the number of iterations, are determined by experience.

In the following, the number of hidden layers we consider in our neural network architecture is limited to only one. After performing different tests using one, two and then three hidden layers, we did not obtain a significant improvement in comparison with using only one hidden layer. The momentum is fixed to 0.60 after many experiments in which this parameter varied over the interval [0.20, 0.90]. The learning rate is fixed to 0.20 after varying it over the interval [0.10, 0.50]. The weight values of the different connections in the whole network are randomly initialized in the interval [−0.50, 0.50].

Since each neuron on the output layer corresponds to one class, the neuron with the highest value defines the predicted class. Using this technique, every sample will be assigned a class among the five classes defined a priori.

Table III presents the confusion matrix related to the best percentage of successful prediction obtained after combining the best parameters of the neural network.

Actual \ Predicted   %Normal  %Probing  %DoS   %U2R  %R2L
Normal (60,593)        97.87     0.75     1.20   0.00   0.18
Probing (4,166)        10.68    71.63    15.34   0.00   2.35
DoS (229,853)           2.62     0.36    97.00   0.00   0.02
U2R (228)              86.84     7.02     3.95   0.00   2.19
R2L (16,189)           73.20     0.06     0.06   0.00  26.68
PSP = 93.10%, CPT = 0.2072

TABLE III: Confusion matrix when using the backpropagation technique with the best parameters.

We mention, from Table III, that the prediction ratio PSP = 93.10% and the cost per test CPT = 0.2072 outperform all the results of the previous works done over KDD 99. However, the U2R class is undetectable.
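The transformation of discrete attribute values into input units described in Section III-A can be sketched as follows (a minimal illustration for the protocol type attribute only; encoding all the discrete attributes among the 41, together with the numerical ones, yields the 125 input units the same way):

```python
# Sketch: translating a discrete attribute into input-layer units,
# as described for protocol_type. Each possible value gets one unit;
# only the unit matching the connection's value is set to 1.

PROTOCOL_VALUES = ["tcp", "udp", "icmp"]  # units I0, I1, I2

def encode_protocol(value):
    """Return the input units [I0, I1, I2] for one connection."""
    units = [0.0] * len(PROTOCOL_VALUES)
    units[PROTOCOL_VALUES.index(value)] = 1.0
    return units
```

For example, encode_protocol("udp") yields [0.0, 1.0, 0.0], i.e., I1 is set to 1 and the other units stay at 0.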
This is because the number of samples of the U2R class in the learning step is the lowest one (52 instances out of 494,021). Therefore, it is difficult to learn this category using neural networks. Our first goal does not consist in outperforming previous work done on the KDD 99 intrusion detection contest. Rather, we want to understand why all these algorithms fail to detect the last two attack classes, namely U2R and R2L. We also note that the two attack classes U2R and R2L are often detected as normal traffic (86.84% for U2R and 73.20% for R2L) in almost all the techniques that are used for this purpose. There are many R2L and U2R instances that are new in the test data sets (see Table II), since their corresponding attack type is not present in the learning data set.

In order to detect these new attacks, we improve the classification process of the neural networks as follows. A threshold θ is defined. If the value of the highest output neuron is below this threshold, the corresponding connection is considered momentarily anomalous, and a diagnosis should be performed for further investigation. The diagnosis is not a goal here. Figure 1 presents this algorithm.

IF (all neurons on the output layer are less than a threshold) THEN
    the corresponding connection is new;
    a diagnosis should be performed
ELSE
    let the kth neuron be the most activated one;
    this connection corresponds to the kth class
FI

Fig. 1. Classification process using a threshold.

Figure 2 shows the variation of the percentage of successful prediction versus the variation of the a priori fixed threshold θ.

Fig. 2. PSP variation according to the considered threshold value.

The results shown in Figure 2 are obtained over the same neural network using the best parameters. We mention that the attack instances that are predicted as new attacks are counted as successful predictions. While the overall successful prediction ratio increases, the corresponding prediction ratio of each class decreases, respectively. Figure 3 shows the different prediction ratios of each class¹ when varying the threshold.

Fig. 3. Different classes PSP variation according to the considered threshold value.

According to Figure 3, even if the threshold is set to 0.90, the two classes DoS and Normal remain detectable in their actual classes. This means that the neural network has correctly learned these two classes. However, the Probing and R2L classes are not predicted in their corresponding actual class when the threshold equals 0.70 for R2L and 0.90 for the Probing attack class. This means that the test instances corresponding to these two classes are not well learned or are not close to their corresponding instances in the learning data set.

We report in Figure 4 the prediction ratios of the different classes that are detected as new ones. Figure 4 shows that, as the threshold value increases, the two attack classes R2L and Probing are detected as new ones. This means that they move from the actual class they received when no threshold was considered to a new class, as if they differed from their real class.

Fig. 4. Different classes ratios detected as a new class.

Figure 5 presents the different attack classes that are detected as the normal class while increasing the value of the threshold.

¹The U2R class is not presented since it is not detected even before considering the threshold.
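The decision rule of Figure 1 can be sketched in code as follows (illustrative only; the inputs are assumed to be the activations of the five output neurons):

```python
# Sketch of the algorithm in Fig. 1: if no output neuron reaches the
# threshold theta, the connection is flagged as new (a candidate
# anomaly to be diagnosed); otherwise the most activated neuron
# gives the predicted class.

CLASSES = ["Normal", "Probing", "DoS", "U2R", "R2L"]

def classify_with_threshold(outputs, theta=0.70):
    """outputs: activations of the five output neurons."""
    if max(outputs) < theta:
        return "new"  # momentarily anomalous; diagnose further
    return CLASSES[outputs.index(max(outputs))]
```

With theta = 0, this degenerates to the standard argmax rule of Section III-B, where every sample is forced into one of the five known classes.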
It is interesting to note that the prediction ratio of these attacks as normal traffic remains stable for all of them, even when the value of this threshold is equal to 0.90. The two classes U2R and R2L are always detected as normal, with rates exceeding 76.75% for U2R and 64.4% for the R2L class. This means that the new instances of these two classes are seemingly close to the instances of the normal connections present in the training set.

Fig. 5. Ratios of the different attack classes detected as a normal class.

Although the neural network outperforms all previous works done on the KDD 99 intrusion detection data sets, it fails to detect the attacks that are absent or present in low numbers in the training data set, particularly the U2R attack instances, which are almost never predicted in the whole set of experiments.

While neural networks transform the discrete values of the different attributes into numerical values, the decision tree algorithm works not only with numerical attribute values but also with discrete values. In Section IV, we investigate the decision tree induction algorithm to test whether it is possible to detect new attacks, especially the two classes R2L and U2R that remain undetectable. Our goal consists in detecting the last two classes as attacks rather than improving the successful prediction ratio over the whole test data. If this is not the case, we should give the reasons why they are always detected as normal connections.

IV. DECISION TREES

A. Background

Decision tree learners trace their origins back to the work of Hunt and others in the late 1950s [8]. At least two seminal works are to be mentioned: those by Quinlan [12] and by Breiman et al. [3]. A decision tree is a tree that has three main components: nodes, arcs, and leaves. Each node is labeled with the feature attribute which is most informative among the attributes not yet considered in the path from the root, each arc out of a node is labeled with a feature value for the node's feature, and each leaf is labeled with a category or class.

Decision tree classifiers are based on the "divide and conquer" strategy to construct an appropriate tree from a given learning set containing a finite and non-empty set of labeled instances. The decision tree is constructed during the learning phase; it is then used to predict the classes of new instances.

Most decision tree algorithms use a top down strategy, i.e., from the root to the leaves. Two main processes are necessary to use the decision tree: the building process and the classification process.

Besides the construction and classification steps, many decision tree algorithms use another, optional step. This step consists in removing some edges that are considered useless, in order to improve the performance of the tree in the classification step. Pruning simplifies the tree, since many useless edges are removed, rendering complex trees more comprehensible for interpretation. In addition, a tree that is already built is pruned only when pruning gives better classification results than before [11].

In practice, one successful method for finding high accuracy hypotheses is based on pruning the rules issued from the tree constructed during the learning phase. This method is used in C4.5rules [12], a companion program to C4.5.

After the building process, each attribute test along the path from the root to a leaf becomes a rule antecedent (precondition) and the classification at the leaf node becomes the rule consequence (postcondition). To illustrate rule post pruning, let us consider the following rule generated from the tree:

IF (protocol type = icmp) ∧ (count > 87) THEN class = smurf

This rule is pruned by removing any antecedent whose removal does not worsen its estimated accuracy.

In addition to the advantages cited by Mitchell [11], the pruned rules have many advantages in intrusion detection. Since the rules have the "IF ... THEN ..." format, they can be used as a model for rule based intrusion detection. The different C4.5 rules that are generated are concise and intuitive. Therefore, they can be checked and inspected by a security expert for further investigation. We notice that C4.5rules has interesting properties for intrusion detection, since it provides good generalization accuracy. New intrusions whose forms are quite similar to the known attacks considered a priori may appear after the building process. Using the generalization accuracy of the rules, such new attack variations could then be detected by the different rules. Real time IDSs require short rules for efficiency. Post pruning the rules generates accurate conditions and hence improves the execution time for a real time use of decision trees in intrusion detection.

B. Improving the classification process

While the rules are efficient for detecting intrusions and their variants, they remain limited to known attacks and normal traffic. This is because the C4.5 decision tree algorithm written by Quinlan [12] presents a drawback with respect to the set of instances that are not covered by any of the rules generated from the decision tree.
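How such post-pruned rules classify a connection, including the default class fallback for uncovered instances, can be sketched as follows (the rule encoding and field names are illustrative, not Quinlan's C4.5rules format; the two rules are those used as examples in this paper):

```python
# Sketch: classification with post-pruned "IF ... THEN ..." rules and
# a default class for instances covered by no rule.

RULES = [
    # IF (protocol type = icmp) AND (count > 87) THEN class = smurf
    ([("protocol_type", "==", "icmp"), ("count", ">", 87)], "smurf"),
    # IF duration <= 2 AND num failed logins > 5 THEN class = guess_passwd
    ([("duration", "<=", 2), ("num_failed_logins", ">", 5)], "guess_passwd"),
]

OPS = {"==": lambda a, b: a == b,
       ">": lambda a, b: a > b,
       "<=": lambda a, b: a <= b}

def matches(conn, antecedents):
    """True when the connection satisfies every antecedent test."""
    return all(OPS[op](conn[attr], val) for attr, op, val in antecedents)

def classify_connection(conn, default="normal"):
    for antecedents, label in RULES:
        if matches(conn, antecedents):
            return label
    return default  # fallback to the default class
```

Post pruning would drop an individual (attr, op, val) antecedent whenever doing so does not worsen the rule's estimated accuracy, shortening the rules that have to be evaluated at run time.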
instances. The default class is defined as that with most items class from the known classes in the training data set is
not covered by any rule. In the case of conflict, ties are automatically assigned to any new instance that may not be
resolved in favor of the most frequent class. An example of covered by any of the different rules. In the second step, the
such a classification is illustrated in Table IV. enhanced C4.5 algorithm, as explained in Section IV-B is used
to handle new instances.
C4.5 rule Meaning
duration <= 2, If the duration of the connection is less Table V presents the confusion matrix for the 5 classes
num failed logins> 5 or equal to 2 seconds and the number of when using the rules from the decision trees generated by
− > class guess passwd failed logins is greater than 5 then this
connection (telnet or rsh) is a guessing the standard C4.5rules algorithm of Quinlan [13].
password attack.
protocol type = icmp, If the protocol type is icmp and the length Predicted %Normal %Probing %DoS %U2R %R2L
src bytes> 333 of the packets coming from the source is Actual
− > class smurf greater than 333 bytes then this connection Normal(60,593) 99.47 0.40 0.12 0.01 0.00
is a smurf attack. Probing (4,166) 18.24 72.73 2.45 0.00 6.58
. . DoS (229,853) 2.62 0.06 97.14 0.00 0.18
. .
. . U2R (228) 82.89 4.39 0.44 7.02 5.26
R2L (16,189) 81.60 14.85 0.00 0.70 2.85
Default: Normal If none of the rules matches then the cur- P SP = 92.30%, CP T = 0.2342
rent connection corresponds to a normal
one. TABLE V
TABLE IV C ONFUSION M ATRIX RELATIVE USING THE RULES GENERATED BY THE

C LASSIFICATION USING THE POST PRUNED RULES . STANDARD C4.5 RULES ALGORITHM .

From Table V, the two classes R2L and U2R are badly pre-
Using this principle, a default class from the learning data
dicted. On the other hand, many probing and DoS instances are
set is assigned to any observed instance that may be a normal
misclassified within the normal category. Most misclassified
connection, known or unknown attack. This classification is
instances are predicted as normal. This is due to the supervised
useful only if it is exclusive. Since we are interested in
C4.5rules algorithm that assigns a default class among known
detecting novel attacks this classification would not be able
classes as explained in Section IV-B. We note that the class
that has the highest number of uncovered instances, according
to the different pruned rules in the learning data set, is the
normal class corresponding to the normal traffic. Hence, if a
new instance is presented that is different (see Definition 4.1
below) from all other known normal or abnormal instances in
the learning step, it is automatically classified as the default
class normal. The standard algorithm is therefore unable to
detect new attacks that normally are not covered by any rule
from the tree built during the learning step.

Definition 4.1: An instance A is different from all other
instances present in the training data set, according to the
different generated rules, if none of the rules matches this
instance.

To overcome this problem, instances that do not have a
corresponding class in the training data set are assigned to a
default class denoted new class. Therefore, if any new instance
does not match any of the rules generated by the decision tree,
it is classified as a new class instead of being assigned to a
default class. Let us call this algorithm enhanced C4.5.

To illustrate the effectiveness of this new classification, we
conduct, in Section IV-C, our experiments on the KDD 99
database, since it contains many new attacks in the test data
set that are not present in the training data set, as shown in
Table II. We also applied this technique to real traffic in our
laboratory network; this traffic contains some new attacks that
were not available when DARPA98 was built, such as the
Slammer worm and different DDoS attacks. These experiments
showed the effectiveness of our algorithm. We do not present
them here because of space limitations (for more details, see
[2], Chapter 5).

This proposal may be generalized to any problem, similar to
the KDD 99 contest, that seeks to find new instances in the
test data set, where some classes should be detected as new
ones and not as one of the categories listed in the training
data set. The fact that new attacks are not considered is one
of the reasons that prevents the different methods applied to
the KDD 99 contest from predicting any new attack.

The confusion matrix obtained when we use the enhanced
C4.5rules algorithm, which classifies uncovered instances into
the new class instead of the default class, is presented in
Table VI.

Actual \ Predicted   %Normal  %Probing  %DoS   %U2R  %R2L   %New
Normal (60,593)        99.43     0.40    0.12   0.01  0.00    0.04
Probing (4,166)         8.19    72.73    2.45   0.00  6.58   10.06
DoS (229,853)           2.26     0.06   97.14   0.00  0.18    0.36
U2R (228)              21.93     4.39    0.44   7.02  5.26   60.96
R2L (16,189)           79.41    14.85    0.00   0.70  2.85    2.20
PSP = (92.30 + 0.57)%, CPT = 0.2228

TABLE VI. Confusion matrix when using the generated rules from the
enhanced C4.5 algorithm.
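The decision step of the enhanced algorithm reduces to a simple fallback: try every rule generated from the tree, and emit the new class when none fires, where standard C4.5 would emit the default class normal. The sketch below illustrates this in Python; the rule set, attribute names, and thresholds are invented for illustration and are not the rules actually induced from KDD 99.

```python
# Sketch of the enhanced C4.5 decision step: an instance covered by no rule
# is labeled "new" instead of the majority default class "normal".
# Rules and feature values below are hypothetical, not real C4.5 output.

def classify(instance, rules, enhanced=True):
    """rules: list of (conditions, class_label) pairs, where conditions maps
    an attribute name to a predicate. Returns the first matching rule's class."""
    for conditions, label in rules:
        if all(pred(instance[attr]) for attr, pred in conditions.items()):
            return label
    # No rule covers this instance: the two variants diverge here.
    return "new" if enhanced else "normal"

rules = [
    ({"service": lambda v: v == "http", "src_bytes": lambda v: v > 10_000}, "dos"),
    ({"service": lambda v: v == "smtp"}, "normal"),
]

print(classify({"service": "smtp", "src_bytes": 300}, rules))           # normal
print(classify({"service": "snmp", "src_bytes": 105}, rules))           # new
print(classify({"service": "snmp", "src_bytes": 105}, rules, False))    # normal
```

The only change with respect to the standard algorithm is the last line of `classify`: the uncovered case is reported instead of being absorbed by the default class.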
C. Experimental Analysis of KDD 99

We first present the different experiments and results obtained
when using the different rules generated from the standard
C4.5 algorithm. Applying this algorithm, a default class (the
normal class) is assigned to every instance that is not covered
by any rule; the corresponding results are those reported in
Table V.

By using the enhanced C4.5 algorithm, the detection rate of
the U2R class is increased by 60.96% (corresponding to the
httptunnel attack), which decreases the false negative rate of
this class from 82.89% (189/228) to 21.93% (50/228). The
detection rate of the Probing class is also enhanced by 10.06%,
corresponding to 413 instances which are not classified as
normal traffic but as a new class.

We note that the different ratios presented in Table VI are the
same as those in Table V, except for the normal column, where
the corresponding ratios have decreased from Table V to
Table VI. This is expected since the normal class is the default
class in the first experiment, whereas in the second experiment
all the instances that were classified using the default class are
classified in the new class.

We should mention that the highest ratio for the U2R class has
never exceeded 14% according to the different results available
in the literature. Using our approach, this attack class is
detected as abnormal traffic with a detection rate of 67.98%.
The false positive rate is increased by a small ratio
corresponding to 24 instances (0.04%). However, the false
negative rate of the R2L class remains stable.

We also performed two different tests to check the coherence
of the learning and test databases of KDD 99. In the first case,
we use the default training data set of KDD 99 as the training
data set, and in the second test we use the test data set as the
training set. In each test, we examine the percentage of
successful prediction (PSP) using the learning data set of each
test as a test set. The objective of this analysis is to help us
discover whether the two data sets (learning and test data sets)
are incoherent. Therefore, the different prediction ratios of the
different data sets may help us find out whether the enhanced
C4.5 algorithm we proposed is inefficient, or whether the
different KDD 99 data sets present some anomalies such as
incoherence.

Definition 4.2: A database is said to be coherent if all the
training instances characterized by the same attributes' values
belong to the same class. It is said to be incoherent if there are
at least two instances having the same attributes' values but
different classes.

Table VII presents the confusion matrix obtained from testing
the enhanced C4.5 algorithm using the training data set as both
the learning and the testing data set.

Actual \ Predicted   %Normal  %Probing  %DoS   %U2R   %R2L   %New
Normal (97,278)        99.94     0.01    0.00   0.00   0.00    0.05
Probing (4,107)         0.17    99.78    0.00   0.00   0.00    0.05
DoS (391,458)           0.00     0.00   99.99   0.00   0.00    0.01
U2R (52)                1.92     1.92    0.00  90.39   0.00    5.77
R2L (1,126)             0.62     0.00    0.00   0.09  98.93    0.36
PSP = 99.99%

TABLE VII. Confusion matrix obtained using the enhanced C4.5 algorithm
on the initial KDD 99 learning database.

We notice that the different classes are predicted with high
rates when using the learning database both to construct the
tree and to generate the different rules. The successful
prediction ratio is PSP = 99.99%.

In the field of supervised machine learning techniques, a
method is said to be powerful if it learns and predicts the
different instances of the training set with a low detection error
and then generalizes its knowledge to predict the class of new
instances. Unfortunately, the C4.5 induction algorithm has
efficiently learned the different instances of the training set,
according to Table VII, but could not classify new instances
into their appropriate category, according to the bad results
reported in Table V.

We also examined in detail the classification of the new
instances belonging to the R2L class presented in Table II,
namely {named, sendmail, snmpgetattack, snmpguess, worm,
xlock, xsnoop}. Table VIII presents the confusion matrix
corresponding to these new R2L attacks in the test data set.

Actual \ Predicted      %Normal  %Probing  %DoS   %U2R   %R2L   %New
named (17)                70.59     0.00    0.00   0.00   0.00  29.41
sendmail (17)            100.00     0.00    0.00   0.00   0.00   0.00
snmpgetattack (7,741)    100.00     0.00    0.00   0.00   0.00   0.00
snmpguess (2,406)         99.88     0.04    0.00   0.00   0.00   0.08
worm (2)                 100.00     0.00    0.00   0.00   0.00   0.00
xlock (9)                100.00     0.00    0.00   0.00   0.00   0.00
xsnoop (4)                50.00     0.00    0.00  25.00  25.00   0.00
PSP ≃ 0.00% (PSP ≃ 0.00%)

TABLE VIII. Confusion matrix relative to new R2L attacks using the
enhanced C4.5 algorithm.

From Table VIII, there is only one instance of type xsnoop
that is classified properly as an R2L attack and another one in
the U2R class, and one instance of type snmpguess is classified
as a Probing attack; these are common results of the two
algorithms, standard C4.5 and enhanced C4.5. However, there
are only two instances of type snmpguess that are classified as
new attacks, and five others of type named.

All the remaining instances concerning the new R2L attacks
are predicted as normal connections, i.e. 10,186 (resp. 10,193)
using the enhanced C4.5 algorithm (resp. the standard C4.5
algorithm). The false negative rate of the new R2L attacks
present in the test data set is thus about 99.10% (resp. 99.97%)
for the enhanced C4.5 algorithm (resp. the standard C4.5
algorithm).

These results show that these new R2L connections are not
distinct from the normal connections issued after the
transformation done by MADAM/ID.

In the second test, we invert the two databases. Using the
standard and the enhanced C4.5 algorithms, we obtained the
confusion matrix presented in Table IX.

Actual \ Predicted   %Normal  %Probing  %DoS   %U2R   %R2L   %New
Normal (60,593)        98.34     0.02    0.03   0.01   1.50    0.11
Probing (4,166)         0.19    99.35    0.07   0.00   0.00    0.38
DoS (229,853)           0.01     0.00   99.99   0.00   0.00    0.00
U2R (228)               2.19     0.00    0.00  96.93   0.00    0.88
R2L (16,189)           36.40     0.02    0.01   0.05  63.33    0.19
PSP = 97.70%

TABLE IX. Confusion matrix relative to five classes using the rules
generated by the enhanced C4.5 algorithm over the learning database of
the second test.
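Definition 4.2 lends itself to a direct mechanical check: group the instances by their attribute vector and report any vector that occurs with more than one label. The sketch below is illustrative (the helper name and the three-feature records are invented abbreviations of the 41-feature connection records); it flags exactly the kind of identical snmpgetattack/normal pair shown in Table X.

```python
# Minimal coherence check in the sense of Definition 4.2: a data set is
# incoherent if two instances share the same attribute values but carry
# different class labels. Records are abbreviated to 3 features for clarity.

from collections import defaultdict

def incoherent_groups(instances):
    """instances: iterable of (attribute_tuple, class_label) pairs.
    Returns {attribute_tuple: set_of_labels} for vectors with > 1 label."""
    labels_by_attrs = defaultdict(set)
    for attrs, label in instances:
        labels_by_attrs[attrs].add(label)
    return {attrs: labels for attrs, labels in labels_by_attrs.items()
            if len(labels) > 1}

data = [
    (("udp", "snmp", 105), "snmpgetattack"),  # same features, two labels:
    (("udp", "snmp", 105), "normal"),         # this pair is incoherent
    (("tcp", "http", 300), "normal"),
]

conflicts = incoherent_groups(data)
print(sorted(conflicts[("udp", "snmp", 105)]))  # ['normal', 'snmpgetattack']
```

A single pass of this kind over the transformed KDD 99 records would surface the attack/normal collisions that the transformation discussion below attributes to lost tcpdump information.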
Although the percentage of successful prediction obtained
from the confusion matrix of Table IX is PSP = 97.70%, it is
considered very low, since it consists in classifying the known
labeled instances of the learning data set. This rate is
considered very low in the machine learning domain because
the classifier could not learn instances whose classes are
known a priori. This means that the C4.5 algorithm failed to
learn some instances with their appropriate labels. On the other
hand, the R2L class is highly misclassified: the classifier has
learned only 63.33% of all the R2L labeled instances.

Most misclassified R2L instances are predicted as normal
connections. This result confirms our observation stated in the
first test, i.e. after transformation, the new R2L attacks are not
distinct from the normal connections.

Since there are similarities between many attack connections
and many normal connections, the question one has to ask is
why different attacks have the same attributes as the normal
connections: is the corresponding tcpdump traffic of the
different attacks similar to that of normal connections, or is
the transformation done over these data sets incorrect?

All instances of snmpgetattack are predicted as normal (within
the R2L class in Table VIII). Indeed, the snmpgetattack traffic
is recognized as normal because the attacker logs in as if he
were a non-malicious user, since he has guessed the password.
Table X shows that the connections corresponding to the
snmpgetattack are the same as those of the normal traffic.
However, the snmpguess category should be recognized as a
new attack or as a dictionary attack. Unfortunately, there is not
any attribute among the 41 attributes to test the SNMP
community password in the SNMP request, as is the case with
some attributes that verify whether a password is a root or a
guest password; this is considered only in the case of the
telnet, rlogin, etc. services. The corresponding connections of
the snmpguess category are the same as those of the normal
traffic after transformation using the MADAM/ID programs
[10]. Hence, some interesting information with which we
might have distinguished the traffic generated by the
snmpguess attack from the normal traffic is lost after
transformation. We set necessary conditions that should be
satisfied by a rich transformation function to prevent these
similarities (for more details, see [2]).

0,udp,snmp,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,
0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,snmp,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,
0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.

TABLE X. snmpgetattack attack and normal connection similarity.

V. CONCLUSION

In this paper, we investigated two different techniques for
anomaly intrusion detection, namely neural networks and
decision trees. These two techniques fail to detect new attacks
that are not present in the training data set. We improve them
for anomaly intrusion detection and test them over the KDD 99
data sets and over real network traffic in real time. While the
neural networks are very interesting for generalization and
very poor for new attack detection, the decision trees have
proven their efficiency in both generalization and new attack
detection. The results obtained with these two techniques
outperform the winning entry of the KDD 99 intrusion
detection contest. Another interesting contribution presented
here is the introduction of the new class, to which new
instances should be assigned, for anomaly intrusion detection
using supervised machine learning techniques. Since the
different MADAM/ID programs [10] are not available and
present many shortcomings, we have written the different
programs that transform tcpdump traffic into connection
records. The objective of our contribution in this paper is
twofold. It first consists in extending the notion of anomaly
intrusion detection by considering both normal and known
intrusions during the learning step. The second is the necessity
to improve machine learning methods by adding a new class
into which novel instances should be classified, since they
should not be classified as any of the known classes present
in the learning data set. As future work, we are investigating
the use of this technique with explicit or semi-explicit alert
correlation tools. Since these tools do not deal with unknown
attacks, we are currently investigating their extension to handle
the new attacks reported by the new anomaly detection
approach, in order to integrate them in the ongoing correlation
attack scenarios.

ACKNOWLEDGMENT

This work was funded by the RNRT OSCAR and ACI DADDi
projects.

REFERENCES

[1] J. P. Anderson. Computer Security Threat Monitoring and
Surveillance. Technical report, James P. Anderson Co., Fort
Washington, Pennsylvania, 1980.
[2] Y. Bouzida. Principal Component Analysis for Intrusion Detection
and Supervised Learning for New Attack Detection. PhD Thesis,
March 2006.
[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone.
Classification and Regression Trees. Wadsworth, 1984.
[4] J. Cannady. Artificial Neural Networks for Misuse Detection. In
Proceedings of the 1998 National Information Systems Security
Conference (NISSC'98), Arlington, VA, USA, October 5-8, 1998.
[5] H. Debar, M. Becker, and D. Siboni. A neural network component
for an intrusion detection system. In Proceedings of the 1992 IEEE
Symposium on Research in Computer Security and Privacy, Oakland,
CA, May 1992.
[6] D. Denning. An Intrusion Detection Model. IEEE Transactions on
Software Engineering, 13(2):222–232, 1987.
[7] C. Elkan. Results of the KDD'99 Classifier Learning. ACM SIGKDD
Explorations, 1:63–64, 2000.
[8] E. B. Hunt. Concept Learning: An Information Processing Problem.
Wiley, 1962.
[9] KDD 99 Task. Available at:
http://kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
[10] W. Lee. A Data Mining Framework for Constructing Features and
Models for Intrusion Detection Systems. PhD Thesis, June 1999.
[11] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
[12] J. R. Quinlan. Induction of decision trees. Machine Learning,
1:81–106, 1986.
[13] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan
Kaufmann Publishers, 1993.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning
representations by back-propagating errors. Nature, 323:533–536,
1986.