
2010 17th International Conference on Telecommunications

Identifying Important Characteristics in the KDD99 Intrusion Detection Dataset by Feature Selection using a Hybrid Approach

Nelcileno Araújo (1), Ruy de Oliveira (2), Ailton Akira Shinoda (3), EdWilson Ferreira (4), Bharat Bhargava (5)

(1) Institute of Computing, Federal University of Mato Grosso, Cuiabá, MT, Brazil ([email protected])
(2), (4) Department of Informatics, Federal Institute of Mato Grosso, Cuiabá, MT, Brazil ([email protected], [email protected])
(3) Department of Electrical Engineering, State University Júlio de Mesquita Filho, Ilha Solteira, SP, Brazil ([email protected])
(5) Department of Computer Science, Purdue University, West Lafayette, IN, USA ([email protected])

Abstract: Intrusion detection datasets play a key role in fine-tuning Intrusion Detection Systems (IDSs). Using such datasets one can distinguish between regular and anomalous behavior of a given node in the network. Building such a dataset is not straightforward, though, as only the most significant features of the collected data for detecting the node's behavior should be considered. We propose in this paper a technique for selecting relevant features out of KDD99 using a hybrid approach toward an optimal subset of features. Unlike existing work that only detects attack or no-attack conditions, our approach efficiently identifies which sort of attack each register in the dataset refers to. The evaluation results show that the optimized subset of features can improve the performance of typical IDSs.

Keywords: KDD99, Feature Selection, Hybrid Approach, K-Means, Information Gain Ratio

I. INTRODUCTION

Over the past ten years, the number of security related incidents registered at CERT.br (Center for Studies, Response and Handling of Security Incidents in Brazil) has increased about 100-fold [1]. This demonstrates the inherent vulnerability of the Internet, which calls for permanent development of efficient security mechanisms. As a result, various security tools, such as firewalls, cryptography, and Intrusion Detection Systems (IDSs), have been developed, rendering computing systems more reliable.

In particular, IDSs have received great attention from researchers all over the globe because of their ability to keep track of the network behavior, so that abnormal behavior can be detected quickly. The detection can occur in two distinct ways. One technique uses previously known attack patterns to infer intrusions. This technique is normally called misuse detection. Another way of detection is called anomaly detection, in which there are no known attack patterns, but only regular patterns. Everything that is not regular is taken as anomalous and consequently may be linked to an intrusion [2].

Comparing the two IDS approaches, one can say that misuse detection provides accurate results in recognizing patterns, but it is limited to known attacks. This means new attacks that are not included in the signature database cannot be detected. On the other hand, the anomaly detection based approach provides good performance in detecting new forms of attacks, but gives high false positive rates (false alarms), due to the difficulty of characterizing a practical normal behavior pattern for the nodes in the network. In fact, regardless of the IDS approach in place, for the sake of reliability, it is necessary to choose appropriate detection metrics to either represent the attack pattern efficiently or define the regular behavior expected for the network.

In order to choose proper intrusion detection metrics, several training datasets for IDSs have been created. One of the most popular such datasets is the Knowledge Discovery and Data Mining KDD99 dataset [3], which was developed by the Massachusetts Institute of Technology (MIT) for the international competition on data mining in 1999. In this dataset each connection (TCP connection) is represented by 41 features, but experiments have shown that using all these features does not guarantee efficiency for attacks based on the packet contents [4].

With that in mind, this paper proposes optimizing the existing metrics in the KDD99 training dataset through a feature selection technique using a hybrid approach, which generates an optimal subset of features. Differently from existing work, our approach takes into account all the categories of connections in KDD99 (attack or no attack), i.e., Normal, DoS, Probe, U2R, R2L. It is also a purpose of this paper to check the impact of using such a dataset on the IDS accuracy.

The remainder of this paper is organized as follows. In Section 2, we describe the KDD99 intrusion detection dataset and discuss the procedures used to generate it. Section 3 addresses the selection of the most relevant features in KDD99 through a hybrid approach that combines the information gain ratio and the k-means classifier toward the optimized dataset. This section also shows comparative results on detection accuracy for ten distinct datasets generated from the so-called "10%KDD99" training dataset. In Section 4, we conclude the work and outline suggestions for future work.

II. KDD99 INTRUSION DETECTION DATASET

This dataset is composed of various training and test data for IDSs. It was developed from a project at MIT Lincoln Labs, in 1999, where comparative evaluations among several distinct methodologies for intrusion detection were conducted. Fig. 1 illustrates the simulated network topology used for KDD99. It is a fictitious military network with three target machines running various operating systems and services. Moreover, there are three additional machines for generating traffic from different sources. A sniffer captures the network flow in TCP dump format. The simulation ran for seven weeks.

Figure 1. Topology of the simulated network for KDD99

The logs from the sniffer were divided into five categories [2], [3], [4]:

Normal: connections that fit the expected profile in the military network.

Denial of Service (DoS): connections trying to prevent legitimate users from accessing the service in the target machine.

Scanning (Probe): connections scanning a target machine for information about potential vulnerabilities.

Remote to Local (R2L): connections in which the attacker attempts to obtain non-authorized access into a machine or network.

User to Root (U2R): connections in which a target machine is already invaded and the attacker attempts to gain access with superuser privileges.

The files generated during the data collection were put in a standard format that contains 41 features for each registered connection. A connection here refers to a sequence of TCP packets with a well defined time duration, transmitted over a well defined protocol between a source machine and a destination machine [3]. Each connection is labeled as either normal or under a specific sort of attack. Each connection register is about 100 bytes long.

The combination of the 41 features of each connection determines to which of the five connection categories mentioned above the audited connection belongs. We call this procedure categorization of the connections. Accordingly, to better understand the contribution of each feature within the dataset to this categorization, the features were gathered into four groups [2], [3], [4], as follows:

Basic features: identify the properties in the packet header, which represent critical metrics in a connection.

Content features: information extracted from the packets that is only useful for experts who are able to associate it with known forms of attacks. An example of such a metric is the number of non-authorized access attempts into a given machine.

Time based traffic features: features of a traffic profile computed over a time interval of two seconds. Crucial information related to some sorts of attacks can only be obtained if the time duration is taken into consideration. A good example here is the number of connections to a single machine within a time interval of two seconds.

Host based traffic features: in this case, the metrics that describe the traffic profile are calculated from historical data estimated from the last hundred used connections. A metric employed in this group is the number of connections to the same destination machine.

KDD99 is actually composed of three datasets. The largest one is called "Whole KDD", which contains about 4 million registers. This is the original dataset created out of the data collected by the sniffer.

Since the amount of data to be processed is very high, it is interesting to reduce the computational costs involved as much as possible. Thus, a subset containing only 10% of the training data, taken randomly from the original dataset, was created. This resulted in the "10% KDD" dataset used to train the IDS.

In addition to the "10% KDD" and "Whole KDD", there is a testing dataset known as "Corrected KDD". This dataset does not have the same distribution of probability of attacks as is the case in the other bases. This happens because the "Corrected KDD" includes 14 new types of attacks aiming at
checking the IDS performance against unknown forms of attacks. Note that in the complete dataset (Whole KDD) and in the training dataset (10% KDD) there are 22 types of attacks in total [4].

It is also important to mention that the KDD training dataset contains a large number of connections for the categories normal, probe and DoS. Together they represent approximately 99.76% of the whole dataset.

III. OPTIMIZING THE KDD99 INTRUSION DETECTION DATASET USING A HYBRID APPROACH FOR FEATURE SELECTION

In general, it is not a good idea to feed the IDS learning mechanisms with the originally collected dataset. It needs to be optimized, since there are features that are either irrelevant or redundant for the learning algorithm. Without a proper treatment of the dataset, the detector accuracy is degraded and the test and training procedures may get really slow [5]. Hence, it is important to determine an optimal set of features that accurately represents the characteristics of the traffic being evaluated. Experiments have shown that a proper set of features results in up to 50% of time reduction for the IDS test and training phases [6].

A. Feature Selection

Feature selection is crucial for designing intrusion detection models. In this process, only the most relevant features are extracted from the whole dataset. This prevents the irrelevant features from causing noise in the categorization of the connections.

Currently, there exist two main approaches to carry out feature selection: filter and wrapper. In the former, an independent metric, such as correlation or PCA [6], is used to compute the relevance of a set of features, resulting in an optimal subset that contains the important features classified in accordance with the measured values of the used metric. The latter uses machine learning algorithms to rate the importance of one or more features in order to build an optimal subset with the most representative features. Wrapper is more complex in terms of computing than the filter approach, but gives better results [6], [7], [8], [9].

These approaches have some drawbacks. For instance, feeding the classifier with random features can result in biased outcomes, and the search for the optimal set of features can result in thousands of combinations in the classifier, which leads to very high computational costs. For example, the KDD99 dataset encompasses 41 features; considering all possible combinations in the classifier to verify which set best contributes to the detection models, we would have hundreds of billions of feature combinations, which can render the use of the dataset unviable.

Different techniques have been employed to mitigate the feature selection problem. In [10], the authors used classification algorithms to reduce the set of features of the KDD99 dataset (originally with 41 parameters) to an optimal subset having only 6 features. They used the Support Vector Machines (SVM), Multivariate Adaptive Regression Splines (MARS) and Linear Genetic Programming (LGP) algorithms to associate a weight with each feature. The Sequential Backward Search technique was employed in [11] and [12] to identify the subset of relevant features. In their approach the whole dataset is initially used and after each iteration a feature is removed from the dataset, until the desirable precision for the classifier is reached. Another popular approach, known as the hybrid approach, combines both techniques, filter and wrapper. The work in [6] shows the efficiency of the hybrid approach with large datasets, in which the computational demand for finding the optimal subset of features is similar to that of the filter approach. In [5], the authors use the hybrid approach over a dataset obtained in an infrastructure wireless network based on the IEEE 802.11 model. They applied the Information Gain Ratio metric to classify the original set of features on the basis of the obtained grade, and a so-called k-means classifier to build an optimal subset of features that increases the detector accuracy and at the same time reduces the learning time.

B. Proposed Model

Our proposed scheme for feature selection is based on the hybrid approach published in [5]. Nevertheless, while the work in [5] evaluates the quality of the optimal subset of features considering only whether the connection is normal or under attack, our evaluation takes into account all the categories of connections in KDD99, i.e., Normal, DoS, Probe, U2R, R2L. Besides, the captured data in our evaluations were not collected in an infrastructure wireless network but in a wired military-like network. We also used two metrics to evaluate the detection capability of the IDS: the detection ratio over the whole dataset and the recognition accuracy ratio of each connection category.

The feature selection algorithm proposed here is shown in Fig. 2. Initially, the information gain ratio of each of the 41 features of KDD99 is computed, and the features are then ranked in accordance with their values. In the sequence, at each iteration the k-means classifier extracts the feature with the highest IGR from the dataset and assesses the detection rate of the optimal subset of features. Additionally, the accuracy level in detecting the right category for the connection in the optimal subset is verified. The selection process stops when either the classifier accuracy is above the adjusted threshold or the accuracy value is below the previously calculated value.

The IGR metric was used here mainly because of its good results in the filter approach, as well as its low computational cost [4], [5], [13]. This metric is computed as shown in (1) [14]:

IGR(D, A) = Gain(D, A) / SplitInformation(D, A)    (1)

where
D is the training data with N features, and
A is a set of features in the dataset.
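To make (1) concrete, the following Python sketch computes the information gain ratio of a single categorical feature against the connection categories, using the entropy, gain and split-information quantities defined in (2), (3) and (4) in the next subsection. The toy arrays and helper names are illustrative only; they are not part of the original experiments, which were carried out with WEKA [17].

    # Minimal sketch of (1): information gain ratio of one categorical feature.
    # The toy arrays below are illustrative, not taken from KDD99.
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a label sequence, as in (2)."""
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def information_gain_ratio(feature_values, labels):
        """Gain(D, A) / SplitInformation(D, A), as in (1), (3) and (4)."""
        total = len(labels)
        gain = entropy(labels)
        split_info = 0.0
        for value, count in Counter(feature_values).items():
            subset = [lab for f, lab in zip(feature_values, labels) if f == value]
            weight = count / total
            gain -= weight * entropy(subset)          # Gain(D, A), eq. (3)
            split_info -= weight * math.log2(weight)  # SplitInformation, eq. (4)
        return gain / split_info if split_info > 0 else 0.0

    # Toy usage: a protocol_type-like feature against connection categories.
    feature = ["tcp", "tcp", "udp", "icmp", "icmp", "tcp"]
    category = ["normal", "dos", "normal", "dos", "dos", "normal"]
    print(information_gain_ratio(feature, category))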

Algorithm: Feature Selection based on IGR/K-means
Input:
  D          Training data with N features
  IGR        Information Gain Ratio
  C          k-means classifier
  AC         Current accuracy
  AP         Previous accuracy
  Threshold  Accuracy gain threshold
Output:
  Soptimum   Optimal subset of features
Begin
  // Filter approach
  For each feature f compute IGR(f)
  Rank the features in D based on IGR(f)
  // Wrapper approach
  Initialize Soptimum = EMPTY and AC = 0
  Repeat
    AP = AC
    f = getNext(D)
    Soptimum = Soptimum U {f}
    D = D - {f}
    AC = ACCURACY(C, Soptimum)
  Until (AC - AP) < Threshold or AC < AP
End

Figure 2. Feature selection algorithm based on IGR/k-means

The information gain ratio is a quantitative measure used to grade the relevance of the features based on the values such features take in the dataset [15]. Nonetheless, before computing the information gain ratio, it is necessary to check the noise (misclassification) inserted in the training set. This check is called entropy and is computed using (2):

Entropy(P) = - \sum_{i=1}^{n} p_i \log_2 p_i    (2)

where
p_i is the probability of a given feature (or attribute) value being in the sampled set of the dataset, and
n is the maximum value assigned to a feature.

After computing the entropy of D, the gain formula in (3) is used to determine the best feature to be used as root:

Gain(D, A) = Entropy(D) - \sum_{v \in Attributes(A)} (|D_v| / |D|) Entropy(D_v)    (3)

where
D_v is the amount of samples of the dataset that contain repetitions of the evaluated feature, and
D is the total number of samples of the training dataset.

The entropy gives us information about the probability of a given feature value being in the dataset (p_i). The split information represents the potential information generated by dividing the base D into m subsets, as defined in (4):

SplitInformation(D, A) = - \sum_{v \in Attributes(A)} (|D_v| / |D|) \log_2 (|D_v| / |D|)    (4)

The K-Means algorithm [16] is one of the oldest and most important algorithms available in the literature for performing grouping. Although it was published over forty years ago, it is still largely used these days. The main reasons for this popularity include its simplicity and high performance. K-Means complexity is O(nK), where n is the cardinality of the original dataset and K is the number of groups [9]. Besides, K-Means is easy to implement and has been evaluated extensively in recent years, which has leveraged the development of various novelties in the way it works. Because of these characteristics, and noting that K-Means performs much better than similar tools, we have adopted it in our scheme.

C. Experimental Evaluations

In order to evaluate the efficiency of the hybrid approach, using IGR/k-means, in optimizing KDD99, we used for the experiments the parameter setup shown in Table I. The subset "10% KDD99" was chosen because it was created exactly to be used in training IDS learning modules [3]. This subset is composed of approximately 490,000 samples including all kinds of connection categories defined in KDD99 (Normal, DoS, Probe, U2R, R2L).

The feature selection was carried out with the data mining tool called WEKA [17]. This tool performed efficiently in related work such as [5], [6], [10], [11], [12], [13], and so we adopted it here as well.

TABLE I. PARAMETER SETUP FOR THE EXPERIMENTS

Components        | Configuration
Dataset           | 10% KDD99
Programming tools | WEKA, MS-Excel 2007
Computer          | Notebook, Intel Celeron M 440 processor, 1.86 GHz, 2 GB RAM, 250 GB hard disk
Operating system  | Microsoft Windows XP Professional (SP2)

Regarding the evaluated scenarios, two distinct scenarios were considered. The first one was used to optimize the KDD dataset and the second one to check the optimization's effects on the performance of an IDS, as follows:

Scenario 1: the IGR was applied to the dataset "10%KDD99" to measure the relevance of the 41 features, resulting in a sorted classification. Then, the k-means classifier is used to compute the optimal subset of features.

Scenario 2: the dataset "10%KDD99" was divided into ten subsets, containing about 49,000 connection registers each. Subsequently, each subset was processed by an IDS based on the decision tree algorithm, using the optimal subset of features.
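For illustration, the wrapper step of Fig. 2, as exercised in Scenario 1, could be sketched in Python as follows, assuming the features have already been ranked by decreasing IGR (the filter step) and the connection categories are integer-encoded (0 to 4 for Normal, DoS, Probe, U2R, R2L). The use of scikit-learn's KMeans with majority-label scoring, as well as all helper names, is our own illustrative choice and not the exact setup used with WEKA in the experiments.

    # Sketch of the Fig. 2 selection loop, under the assumptions stated above.
    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_accuracy(X_subset, y, n_clusters):
        """Cluster the feature subset and score each cluster by its majority category."""
        y = np.asarray(y)
        clusters = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=0).fit_predict(X_subset)
        correct = 0
        for c in np.unique(clusters):
            members = y[clusters == c]
            # every member of the cluster is predicted as the cluster's majority category
            correct += np.bincount(members).max()
        return correct / len(y)

    def select_features(X, y, ranked, threshold=0.01, n_clusters=5):
        """Wrapper step of the hybrid approach: add features by IGR rank until the
        accuracy gain falls below the threshold or the accuracy drops."""
        selected, ac = [], 0.0
        for f in ranked:
            ap = ac
            selected.append(f)
            ac = kmeans_accuracy(X[:, selected], y, n_clusters)
            if len(selected) > 1 and ((ac - ap) < threshold or ac < ap):
                selected.pop()  # the last feature did not help; discard it and stop
                break
        return selected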

Figure 3. Descending classification of the IGR for the features in the dataset "10%KDD99".

In both scenarios, the validation of the results was conducted with the so-called 10-fold cross-validation technique [18]. The idea here was to obtain low error rates and find out the intrusion detection rate.

1) Results for Scenario 1

Fig. 3 shows the classification of the 41 features of the dataset "10%KDD99" sorted in descending order of the information gain ratio. Most of the features have an IGR under the average of the dataset (IGR average = 0.22). In fact, only 18 features are above the average. This shows that the original database has its data concentrated in a small group of values. Features that result in a convergence of connection categories within a small group of values contribute little to describing a node's behavior. This indicates that the original dataset may contain irrelevant data for the IDS and so needs to be optimized.

After obtaining the ranked set of features through the IGR, the optimal subset of features was determined by the k-means classifier. At each iteration of the classifier, the most relevant feature, in accordance with the IGR, was added to the optimal subset of features. The classifier keeps track of the accuracy rate of the connection categories in the new subset, and once the accuracy either reaches 90% or falls below the value calculated in the previous iteration, the classification process ends and the optimal subset of features is determined.

In Fig. 4(a) one can notice that the best results are obtained when the optimal subset contains the 14 most important features of the evaluated dataset. With fewer features than that, the U2R class has accuracy close to zero, which means that despite the high detection rate depicted in Fig. 4(b), the algorithm does not provide enough accuracy in recognizing U2R connections. Hence, whenever the evaluated dataset contains connection categories with a large percentage of the samples, the detection rate is not a good criterion for evaluating the quality of such a subset. For this evaluated dataset, the DoS category accounts for 80% of the whole sampled connections. The optimal subset of features for "10%KDD99" comprises the following features: dst host diff srv rate, logged in, dst host srv diff host rate, diff srv rate, destination bytes, root shell, is guest login, urgent, service, dst host count, srv diff host rate, source bytes and protocol type.

2) Results for Scenario 2

The purpose of the second scenario is to provide us with good insights into the effects of an optimized dataset on the performance of an IDS. The dataset "10%KDD99" was divided into 10 subsets, as depicted in Fig. 5. Each subset has its own distribution of categories of connections, except for subsets 5 and 6. It is possible to distinguish a pattern in most subsets, since there are far more DoS connection registers than registers of the other connections. This is interesting for evaluating our previous statement that subsets with a strong prevalence of a single connection category might render the detection rate an unsuitable criterion.

Subsequently, we used the features inside each generated optimal subset of features to feed an IDS based on a decision tree algorithm. The outcome is shown in Fig. 6 through three parameters: detection rate, accuracy rate and true positive rate. The false positive parameter was ignored because its values are too close to zero, which does not contribute to the evaluation of the quality of the optimal subset of features.

(a) Accuracy rate    (b) Detection rate

Figure 4. Performance of alternate subsets (optimal subsets) of "10%KDD99" by two stop criteria.
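As a concrete illustration of this Scenario 2 style of evaluation, the sketch below trains a decision tree on a hypothetical matrix X_sel restricted to the selected features and derives, via 10-fold cross-validation [18] and a confusion matrix, both an overall rate and the per-category accuracy that exposes weak classes such as U2R. Function and variable names are illustrative and not taken from the paper's tooling.

    # Sketch of a Scenario 2 style evaluation with a decision-tree classifier.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    def evaluate_subset(X_sel, y, class_names):
        # 10-fold (stratified, by scikit-learn default) cross-validated predictions
        pred = cross_val_predict(DecisionTreeClassifier(random_state=0),
                                 X_sel, y, cv=10)
        cm = confusion_matrix(y, pred)
        overall = np.trace(cm) / cm.sum()           # overall rate over all connections
        per_class = cm.diagonal() / cm.sum(axis=1)  # accuracy (recall) per category
        for name, acc in zip(class_names, per_class):
            print(f"{name}: {acc:.3f}")
        print(f"overall: {overall:.3f}")
        return overall, per_class

    # e.g. evaluate_subset(X_sel, y, ["Normal", "DoS", "Probe", "U2R", "R2L"])

Because a dominant category such as DoS inflates the overall rate, the per-category figures are what reveal the weaknesses that the paper attributes to minority classes.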

Figure 5. Composition of datasets generated from the 10%KDD99.

From the results we learn more about the quality of the optimal subset of features in terms of connection category detection. As shown in Fig. 6, the detection rate for all subsets surpasses 99%. To ensure reliability in our evaluations, we also included the accuracy rate. Fig. 6 shows that all connection categories yielded high accuracy (over 90%) except the U2R category, which in the best case reached 60% accuracy. This is a result of the low relevance of this category in the sample space of each subset, which corroborates our finding that the detection rate parameter alone is not a reliable indicator of the quality of the optimal subset of features.

Finally, the high values of the true positive rate strengthen the viability of our proposal, as they indicate that the IDS is capable of recognizing the connection categories efficiently. It is important to note that in some cases the rate is below 80%, which occurred again due to the low impact of the evaluated category on the sample space of the dataset. As an example, subset 7 has a total of 11 connections of the category Normal, but none of them is recognized by the IDS.

IV. CONCLUSIONS AND OUTLOOK

We have proposed the use of a hybrid approach to select the best features from the KDD99 training dataset toward a reduced dataset that improves IDS efficiency. The hybrid approach combines the information gain ratio (IGR) and the k-means classifier. The former is responsible for classifying the features on the basis of the IGR measure. The latter generates an optimal subset of features by evaluating the features' accuracy from the ranked data provided by the IGR.

The evaluation results suggest that the detection rate on its own does not provide reliability in detecting intrusions. The main reason lies in the differences in the weight that each category of connection in the training dataset has on the proposed mechanism. Categories with low weight face detection problems even though the detection rate remains high, which occurs due to the "giant" categories of connections in place.

To address this problem, we propose using the detection rate and the accuracy rate jointly. By using the dataset features in a fairer way, without favoring any category, the accuracy rate corrects the distortions caused by the "giant" categories of connections.

Since the computational costs for large datasets are non-negligible, and the results here showed that the optimized dataset provided outcomes similar to the original dataset (with 41 features), we can say that our proposal is indeed worthwhile. By using it, an IDS will be trained much faster than it would be with the original dataset.

The following tasks are left for future work: applying the feature selection technique used here to datasets collected from other network environments, such as sensor, mesh, and WiMAX wireless networks; using alternate programming tools, such as C and FORTRAN, for conducting the feature selection, as WEKA [17], which was used here, is based on Java and so demanded too much of both the memory and the processing capabilities of the machine used in the experiments; and, finally, using metaheuristics (genetic algorithms, tabu search, and simulated annealing) to perform feature selection through the computation of the optimal subset of features.

Figure 6. Results obtained by the decision tree based IDS on the 10 datasets generated from the "10% KDD99".

ACKNOWLEDGMENT

This material is based on a research project funded by the Foundation for Research Support of Mato Grosso (FAPEMAT) under the supervision of the Network and Security Research Group (GPRS). GPRS is managed by the Federal Institute of Mato Grosso (IFMT) in conjunction with the Federal University of Mato Grosso (UFMT), State University Júlio de Mesquita Filho (UNESP) and Federal University of Uberlândia (UFU). The authors acknowledge the facilities and equipment provided by IFMT for the development of this work.

REFERENCES

[1] CERT.br, Computer Emergency Response Team Brazil. http://www.cert.br/stats/incidentes/, Last access: August 2009.
[2] P. Souza, "Study about anomaly based intrusion detection systems: an approach using neural networks," M.Sc. Thesis, Salvador University, Salvador, 2008.
[3] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579-595, 2000.
[4] H. G. Kayacik, A. N. Zincir-Heywood and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99," in Proceedings of the Third Annual Conference on Privacy, Security and Trust, 2005.
[5] M. Guennoun, A. Lbekkouri and K. El-Khatib, "Optimizing the feature set of wireless intrusion detection systems," International Journal of Computer Science and Network Security, vol. 8, no. 10, pp. 127-131, 2008.
[6] Y. Chen, Y. Li, X. Cheng and L. Guo, "Survey and taxonomy of feature selection algorithms in intrusion detection system," Lecture Notes in Computer Science, vol. 4318, pp. 153-167, 2006.
[7] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic, 1998.
[8] R. A. M. Horta and F. J. dos S. Alves, "Data mining techniques in feature selection for prediction of insolvency: implementation and evaluation using a recent Brazilian dataset," in Proceedings of the XXXII Meeting of ANPAD, 2008, pp. 1-15.
[9] J. de A. Soares, "Preprocessing data in data mining: a comparative study on imputation," D.Sc. Thesis, Federal University of Rio de Janeiro, Rio de Janeiro, 2007.
[10] H. Sung and S. Mukkamala, "The feature selection and intrusion detection problems," in Proceedings of the 9th Asian Computing Science Conference, Lecture Notes in Computer Science, 2004, vol. 3321, pp. 468-482.
[11] H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the 2003 Symposium on Applications and the Internet, 2003, pp. 209-217.
[12] G. Stein, B. Chen, A. S. Wu and K. A. Hua, "Decision tree classifier for network intrusion detection with GA-based feature selection," in Proceedings of the 43rd Annual Southeast Regional Conference, 2005, vol. 2, pp. 136-141.
[13] Bsila, S. Gombault and A. Belghith, "Improving traffic transformation to detect novel attacks," in Proceedings of the 4th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications, 2007.
[14] O. Maimon and L. Rokach, Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications, World Scientific Publishing Co, 2005.
[15] T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
[16] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.
[17] R. R. Bouckaert et al., WEKA Manual for Version 3-7-0. http://www.cs.waikato.ac.nz/ml/weka/, Last access: August 2009.
[18] Y. Bengio and Y. Grandvalet, "No unbiased estimator of the variance of k-fold cross-validation," Journal of Machine Learning Research, vol. 5, pp. 1089-1105, 2004.
