Lab 7
Table of Contents
Section A
Section B - Isaac
Section C
Section A
1-a) Table containing performance measures.
According to the results generated by Splunk with an 80:20 train/test split, the most precise
models were the Decision Tree and the Random Forest. As the table below shows, both the
Decision Tree and the Random Forest had a precision score of 0.99 (99%), with the same scores
for recall, accuracy, and F1.
Looking at the confusion matrices, the false-negative (Type II error) rates for the four
algorithms are 16.4% (657), 11.1% (444), 1.3% (51), and 1.1% (46), while the false-positive
(Type I error) rates are 24% (1455), 0.5% (32), 1.4% (83), and 0.8% (49).
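The Type I and Type II percentages above come straight from the confusion matrices. As a minimal sketch of that arithmetic, using a hypothetical binary confusion matrix rather than the lab's actual Splunk counts:

```python
# Hypothetical counts for illustration only; not the lab's Splunk output.
def binary_metrics(tp, fp, fn, tn):
    """Derive the standard measures from a binary confusion matrix."""
    precision = tp / (tp + fp)                  # hurt by false positives (Type I)
    recall = tp / (tp + fn)                     # hurt by false negatives (Type II)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

metrics = binary_metrics(tp=950, fp=10, fn=50, tn=990)
print(metrics)
```

The same counts also give the error rates quoted above: the false-negative rate is fn / (tp + fn) and the false-positive rate is fp / (fp + tn).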
1-b) The most accurate algorithm is the Random Forest, but the Decision Tree uses less
memory and yields almost the same result. If speed were the priority, I would pick the
Decision Tree; for accuracy, the Random Forest.
Below are the results of k-means clustering with K=2, K=3, and K=4. Comparing the results
shows which value of K yields the best clustering; the higher the value of K, the tighter
the clusters.
i.
ii.
iii.
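The k-means sweep above can be sketched as follows, assuming a scikit-learn workflow on a synthetic stand-in dataset (the lab's Splunk data is not reproduced here):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Stand-in data with three natural groups; the lab used its own dataset.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

sil = {}
for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil[k] = silhouette_score(X, km.labels_)
    print(k, round(km.inertia_, 1), round(sil[k], 3))
```

Note that inertia (within-cluster distance) always shrinks as K grows, so "tighter" alone does not mean "better"; a relative measure such as the silhouette score is the usual way to judge which K clusters best.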
2-b) DBSCAN
Below are the results of the DBSCAN clustering algorithm with eps = 0.4, eps = 0.2, and eps = 1.0.
Comparing the results shows which value of eps yields the best clustering.
i.
ii.
iii.
iv.
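The eps sweep above can be sketched in the same assumed scikit-learn style, again on stand-in data rather than the lab's dataset:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Stand-in two-cluster data; the lab used its own dataset.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

results = {}
for eps in (0.2, 0.4, 1.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(eps, n_clusters, n_noise)
```

Unlike K in k-means, eps does not fix the number of clusters: too small an eps fragments the data and marks points as noise, while too large an eps merges everything into one cluster.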
i. R2 Statistic: 0.8759
ii. Root Mean Squared Error (RMSE): 75.63
i. R2 Statistic: 0.8321
ii. Root Mean Squared Error (RMSE): 86.78
3-c) Screenshot for Random Forest regression, R2, and RMSE. Commented on the performance of the
three regressors.
i. R2 Statistic: 0.9400
ii. Root Mean Squared Error (RMSE): 66.69
The three regression models fit the data in the following order from worst to best: Decision
Tree, Linear, and Random Forest. The R2 values range from 0.83 to 0.94 (83% to 94%), which
is reasonably good.
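The three-regressor comparison above can be sketched as follows, assuming scikit-learn models on synthetic stand-in data (the lab's dataset and hyperparameters are not shown here, so the exact scores will differ):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in regression data; the lab used its own dataset.
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

scores = {}
for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(random_state=1)),
                    ("forest", RandomForestRegressor(random_state=1))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (r2_score(y_te, pred),
                    np.sqrt(mean_squared_error(y_te, pred)))  # RMSE
    print(name, round(scores[name][0], 4), round(scores[name][1], 2))
```

Evaluating R2 and RMSE on the same held-out split is what makes the three models directly comparable.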
Section B
a) Loading dataset.
b) Discussed the performance of these classifiers and provided a reflection on completing this exercise.
According to the table, the random forest and decision tree algorithms performed
exceptionally well, logistic regression performed moderately, and the support vector
machine algorithm performed the weakest.
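A minimal sketch of such a four-classifier comparison, assuming scikit-learn models on a stand-in dataset rather than the lab's own data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in binary classification data; the lab used its own dataset.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

accs = {}
for name, model in [("random_forest", RandomForestClassifier(random_state=0)),
                    ("decision_tree", DecisionTreeClassifier(random_state=0)),
                    ("logistic_regression", LogisticRegression(max_iter=5000)),
                    ("svm", SVC())]:
    accs[name] = model.fit(X_tr, y_tr).score(X_te, y_te)  # test accuracy
    print(name, round(accs[name], 3))
```

The relative ranking of the four models depends heavily on the dataset and on preprocessing (SVMs in particular are sensitive to feature scaling), so the ordering observed in the lab's table need not reproduce here.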
Section C
1. Execute the code in line 82 and capture the screenshot showing the distribution of the 22
attack types.
2. Execute the code in line 86 and capture the screenshot showing the distribution of the 5
categories (benign and the four attack categories). Comment on the distribution of these
attack categories in terms of how it might affect the classification.
The distribution is heavily skewed toward benign and DoS packets. The lack of user
escalation packets is a problem for classification, as it can cause more error than
necessary. A more even spread across the categories would likely reduce that error.
3. Provide the confusion matrix of the decision tree classifier in line 163 and the error rate
in line 164. Comment on the success rate of this classifier. An error rate of 0.2 means the
success rate is 0.8.
With a success rate of around 75.7%, it falls slightly behind the k-nearest neighbors
approach, being about 0.1% less effective, which is statistically insignificant.
4. Provide the confusion matrix of the k-nearest neighbor’s classifier in line 177 and the
error rate in line 178. Comment on the success rate of this classifier. We didn't discuss
this one in class but you can read about it on pages 52 and 53 of the course textbook.
The confusion matrix can be seen above. With a success rate of around 75.8%, this approach
was the most effective of the three, but it should be tuned further, as that success rate
is still fairly low.
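The "line 177" and "line 178" references are to the lab's own script, which is not reproduced here; as a hedged sketch of the same steps, assuming scikit-learn on a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in multiclass data; the lab used a network-traffic dataset.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
pred = knn.predict(X_te)
cm = confusion_matrix(y_te, pred)     # rows: true class, columns: predicted
error_rate = 1 - knn.score(X_te, y_te)  # success rate = 1 - error rate
print(cm)
print(round(error_rate, 3))
```

The diagonal of the confusion matrix holds the correct predictions, so the success rate is the diagonal sum divided by the total, which is exactly 1 minus the error rate.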
5. Provide the confusion matrix of the support vector machine classifier in line 191 and the
error rate in line 192. Comment on the success rate of this classifier.
The confusion matrix can be seen above, with a success rate of around 72.2%. From this we
concluded that this approach was less effective than the other two, with roughly 3% more
error.
6. Execute the code in lines 232, 233 and 234. Comment on the values you get for these
three parameters.
The values for completeness, homogeneity, and v-measure are very low, showing that the
data is highly mixed and incompletely labeled. This would explain some of the errors
produced by the three approaches.
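As a small sketch of what these three parameters measure, assuming the scikit-learn metric functions and hypothetical label arrays (not the lab's output):

```python
from sklearn.metrics import (completeness_score, homogeneity_score,
                             v_measure_score)

# Hypothetical labels for illustration only.
true_labels = [0, 0, 0, 1, 1, 1]
mixed_clusters = [0, 1, 0, 1, 0, 1]  # each cluster mixes both classes -> low scores
clean_clusters = [1, 1, 1, 0, 0, 0]  # perfect match up to renaming -> scores of 1.0

for pred in (mixed_clusters, clean_clusters):
    print(round(homogeneity_score(true_labels, pred), 3),
          round(completeness_score(true_labels, pred), 3),
          round(v_measure_score(true_labels, pred), 3))
```

Homogeneity asks whether each cluster contains only one class, completeness asks whether each class lands in one cluster, and v-measure is their harmonic mean; all three are invariant to renaming the cluster IDs, which is why the `clean_clusters` case still scores perfectly.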