Intrusion Detection Using Decision Tree Approach
Jagdish Bhatta
Central Department of Computer Science and IT, Tribhuvan University, Nepal
[email protected]
Youbaraj Paudyal
Central Campus of Science and Technology, Midwestern University, Nepal
[email protected]
Nawaraj Paudel
Central Department of Computer Science and IT, Tribhuvan University, Nepal
[email protected]
Introduction
• With the tremendous growth in the use of computers over networks and in the applications running on various platforms, network security has attracted increasing attention.
• Intrusion is any attack that aims to compromise the security of a network. To detect such intrusions, an intrusion detection system (IDS) is used.
Introduction
• An Intrusion Detection System (IDS) is a system that monitors a network for harmful activities and reports events that do not meet the security criteria to the network administrator. The IDS concept was first introduced in 1980 by James P. Anderson and later improved by Dorothy Denning in 1987.
Introduction
• There are various classifiers that are applicable to misuse-based detection. Some are tree based, such as decision trees and random forests; some are rule based, such as OneR; and some are function based, such as SVM (Support Vector Machine).
• In this study, the J48 and CART decision trees [1, 2] are implemented and analyzed to classify input data from NSL-KDD as normal or anomalous.
• Decision trees are used to describe a decision-making process. A decision tree is a tree-like graph consisting of internal nodes, which represent tests on attributes; branches, which denote the outcomes of those tests; and leaf nodes, which signify class labels.
Introduction
• Decision trees can analyze data and identify significant characteristics of network traffic that indicate malicious activity. They can add value to many real-time security systems by analyzing large sets of intrusion detection data, and they can recognize trends and patterns that support further investigation, the development of attack signatures, and other monitoring activities.
Literature Review
• D.P. Gaikwad and R.C. Thool applied a Genetic Algorithm (GA) for intrusion detection and reported an accuracy of 79% [3].
• P. Aggarwal and S.K. Sharma [4] evaluated classification algorithms such as Random Forest, C4.5, Naïve Bayes, and Decision Table. Random Tree showed the best results overall, while C4.5 and Random Forest achieved high detection rates with low false alarm rates.
• S. Mallissery, S. Kolekar, and R. Ganiga [5] show how useful the NSL-KDD dataset is for various intrusion detection models. They use the PCA technique for dimensionality reduction and report that SVM gives the highest accuracy.
Literature Review
• M. Elbasiony et al. [6] have proposed a hybrid detection framework that depends on data mining classification and clustering techniques. In misuse detection, the random forests classification algorithm is used to build intrusion patterns automatically from a training dataset, and network connections are then matched against these patterns to detect intrusions. In anomaly detection, the k-means clustering algorithm is used to detect novel intrusions by clustering the network connection data so that most intrusions are grouped into one or more clusters.
Literature Review
• M. Kumar et al. [7] analyzed the performance of IDS using different decision tree algorithms and found that C5.0 performed best.
• Dataset
– The NSL-KDD dataset [9, 10] is used.
Methodology
J48 Classifier
• The C4.5 algorithm for building decision trees is implemented in Weka as a classifier called J48. It follows a greedy, top-down approach to decision tree construction.
• It is used for classification, in which new data is labeled according to already existing observations (the training data set).
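• As an illustration, the following is a minimal Weka (Java) sketch of this workflow; the ARFF file name and the 10-fold cross-validation setup are assumptions for illustration, not necessarily the exact configuration used in this study.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        // Load the training data (illustrative ARFF file name).
        Instances data = new DataSource("KDDTrain.arff").getDataSet();
        // The class attribute (normal/anomaly) is assumed to be the last one.
        data.setClassIndex(data.numAttributes() - 1);

        // Build a C4.5 decision tree; J48 is Weka's implementation of C4.5.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + " %");
    }
}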
Methodology
J48 Classifier
• Step-1: Create a root node N;
• Step-2: If all records in T belong to the same class C
{ make N a leaf node; mark N with class C; return N; }
• Step-3: For i = 1 to n
{ calculate Information_gain(Ai); }
• Step-4: ta = testing attribute;
• Step-5: N.ta = attribute having the highest information gain;
• Step-6: If (N.ta is continuous)
{ find a threshold for it; }
• Step-7: For each subset Ti obtained by splitting T on N.ta
if (Ti is empty) { the corresponding child of N is a leaf node; }
else { the child of N = dtree(Ti); /* recursive call */ }
• Step-8: Calculate the classification error rate of node N;
• Step-9: Return N;
Methodology
CART Classifier
• The SimpleCart method is the CART (Classification And Regression Trees) analysis implemented in Weka.
• It is used for data exploration as well as prediction. Classification and regression trees are classification methods that use historical data to construct decision trees. CART builds the decision tree from a learning sample, i.e., a set of historical data with preassigned classes for all observations.
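• A corresponding minimal Weka (Java) sketch for the CART classifier is shown below; it assumes that the SimpleCart classifier is available (in recent Weka versions it is distributed as the separate simpleCART package) and again uses an illustrative file name.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.SimpleCart;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CartExample {
    public static void main(String[] args) throws Exception {
        // Load the training data and mark the class attribute (assumed last).
        Instances data = new DataSource("KDDTrain.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // SimpleCart builds a binary CART tree with cost-complexity pruning.
        SimpleCart cart = new SimpleCart();
        cart.buildClassifier(data);

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new SimpleCart(), data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + " %");
    }
}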
Methodology
CART Classifier
• Step-1: Start at the root node.
• Step-2: For each ordered variable X,
convert it to an unordered variable X' by grouping its values in the node
into a small number of intervals; if X is unordered, set X' = X.
• Step-3: Perform a chi-squared test of independence of each X’ variable
versus Y on the data in the node and compute its significance probability.
• Step-4: Choose the variable X∗ associated with the X’ that has the smallest
significance probability.
• Step-5: Find the split set {X∗ ∈ S∗} that minimizes the sum of Gini indexes
and use it to split the node into two child nodes.
• Step-6: If a stopping criterion is reached, exit.
Otherwise, apply steps 2–5 to each child node.
• Step-7: Prune the tree with the CART method.
Methodology
• Information Gain
• The information gain is used to select the splitting attribute at each node in the tree. The attribute with the highest information gain is chosen as the splitting attribute for the current node:

Gain(S, a) = Entropy(S) - \sum_{v \in values(a)} \frac{|S_v|}{|S|} Entropy(S_v)

• where S is the set of all the examples in the given training set, S_v is the subset of S for which attribute a has value v, values(a) is the set of all possible values of attribute a, and |S_v| and |S| denote the number of examples in S_v and S.
• Entropy can be calculated as

Entropy(S) = - \sum_{i} p_i \log_2 p_i

where p_i is the proportion of examples in S that belong to class i.
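• To make these formulas concrete, the small Java sketch below computes the entropy of a two-class (normal/anomaly) set and the information gain of a candidate split; the class counts are made-up illustrative numbers, not values from the NSL-KDD experiments.

public class InfoGainExample {
    // Entropy of a set with the given per-class counts: -sum p_i * log2(p_i).
    static double entropy(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Illustrative parent set S: 70 normal and 30 anomalous records.
        double parent = entropy(70, 30);
        // Illustrative split of S on an attribute a into two subsets.
        double gain = parent
                - (65.0 / 100) * entropy(60, 5)    // subset with 65 records
                - (35.0 / 100) * entropy(10, 25);  // subset with 35 records
        System.out.printf("Entropy(S) = %.4f, Gain(S, a) = %.4f%n", parent, gain);
    }
}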
Methodology
• Information Gain
• The information gain of an attribute in the CART classifier is computed from the Gini index (the quantity minimized in Step-5 of the algorithm above):

Gini(S) = 1 - \sum_{i} p_i^2

and the gain of splitting S on attribute a into subsets S_v is

\Delta Gini(S, a) = Gini(S) - \sum_{v} \frac{|S_v|}{|S|} Gini(S_v)

where p_i is the proportion of examples in S that belong to class i.
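• For comparison with the entropy-based criterion, the sketch below computes the Gini index and the Gini-based gain for the same made-up counts used in the previous example.

public class GiniGainExample {
    // Gini index of a set with the given per-class counts: 1 - sum p_i^2.
    static double gini(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double sumSq = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    public static void main(String[] args) {
        // Same illustrative parent set and split as in the information-gain example.
        double parent = gini(70, 30);
        double weighted = (65.0 / 100) * gini(60, 5) + (35.0 / 100) * gini(10, 25);
        System.out.printf("Gini(S) = %.4f, Gini gain = %.4f%n", parent, parent - weighted);
    }
}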
Results
[Figure: Graph showing the accuracy in percentage of the J48 and CART classifiers on the normal/anomaly classification (x-axis: Classifier; y-axis: Accuracy in Percentage, approximately 0.9854 to 0.9866).]
Results
Graph Showing time in seconds to build tree by J48 and CART Classifier
Conclusion
• Intrusion is any attack that aims to compromise the security of a network. An intrusion detection decision tree assists the network administrator in deciding whether incoming traffic is malicious or not by providing a model that separates malicious from non-malicious traffic.
• In this study, the J48 and CART classifiers are used to detect possible intrusions, i.e., to determine whether the input data is malicious or non-malicious. During the experiment, the input test data are classified as normal or anomalous.
• From the experimental results, it is observed that the accuracy of the J48 classifier is 98.59% and that of the CART classifier is 98.67%.
• J48 is found to be more computationally efficient, as it builds the model in 3.94 seconds while CART takes 36.48 seconds.
References
[1] N. Bhargava, G. Sharma, R. Bhargava, and M. Mathuria (2013), "Decision Tree Analysis on J48 Algorithm for Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering.
[2] P.N. Tan, M. Steinbach, and V. Kumar (2005), Introduction to Data Mining, Pearson Addison Wesley.
[3] D.P. Gaikwad and R.C. Thool (2015), Intrusion Detection System Using Bagging with Partial Decision Tree Base Classifier, Proc. of the 4th International Conference on Advances in Computing, Communication and Control.
[4] P. Aggarwal and S.K. Sharma (2015), An Empirical Comparison of Classifiers to Analyze Intrusion Detection, Proc. of the Fifth International Conference on Advanced Computing and Communication Technologies.
[5] S. Mallissery, S. Kolekar, and R. Ganiga (2013), Accuracy Analysis of Machine Learning Algorithms for Intrusion Detection System using NSL-KDD Dataset, Proc. of the International Conference on Future Trends in Computing and Communication (FTCC).
References
[6] M. Elbasiony, E.A. Sallam, T.E. Eltobely, and M.M. Fahmy (2013), A hybrid network intrusion detection framework based on random forests and weighted k-means, Ain Shams Engineering Journal, 753-762.
[7] M. Kumar, M. Hanumanthappa, and T.V. Suresh Kumar (2012), Intrusion Detection System Using Decision Tree Algorithm, Proc. of the IEEE 14th International Conference on Communication Technology, DOI: 10.1109/ICCT.2012.6511281.
[8] G. Zhai and C. Liu (2010), Research and Improvement on ID3 Algorithm in Intrusion Detection System, Sixth International Conference on Natural Computation.
[9] M. Tavallaee, E. Bagheri, W. Lu, and A.A. Ghorbani (2009), A detailed analysis of the KDD CUP 99 data set, in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl.
[10] NSL-KDD dataset [online]. Available: http://nsl.cs.unb.ca/nsl-kdd/.