Threat Monitoring and Intelligent Data Analytics of Network Traffic

The document discusses building a machine learning model for threat monitoring and network traffic classification. It proposes a system with online and offline modes to classify network packets in real-time as legitimate or malicious using algorithms like Random Forest and XGBoost on the NSL_KDD dataset. The system architecture includes modules for data collection using HDFS, processing using Apache Spark for real-time analysis at scale, and visualization with Elasticsearch and Kibana. The goal is to develop a scalable tool that can intelligently identify new attacks through distributed machine learning on network data.

Uploaded by

Saipriya Vempalli

Threat Monitoring using Machine Learning
39111069-BHANU KIRAN.V
39111127-ROHITH KUMAR.Y
ABSTRACT
Cyber crimes and data breaches have become increasingly prevalent and
are causing huge losses and hazards to companies and netizens. For
example, a 17-year-old allegedly hacked the social media accounts of
famous personalities such as Jeff Bezos and Barack Obama. Late detection
of such attacks increases the possibility of irreparable damage. These
attacks cause high financial losses to organizations and also affect the
lives of individuals. Recent studies estimate that cybercrime will cause
losses of over US$6 trillion by 2021, and the size, complexity, and
frequency of these attacks are growing exponentially. With the worldwide
pandemic, the volume of data being generated is also increasing rapidly,
and with it the chances of breaches.
Network security is therefore essential for both organizations and
individuals.
In this project, we are building an ML model that classifies any network
packet into one of two categories, legitimate or malicious. Once deployed in
network devices, the model forwards only legitimate packets to the end system.
To build the model, we use the NSL_KDD data set and implement the “Random
Forest” and “XGBoost” algorithms for classification. The data set is built
from various fields in the packet header.
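The forwarding behaviour described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `forward_legitimate` and `toy_classifier` are hypothetical names, and the toy rule merely stands in for a trained Random Forest or XGBoost model. The field names (`protocol_type`, `flag`, `src_bytes`) follow the NSL_KDD column naming.

```python
from typing import Callable, Dict, Iterable, Iterator

# A packet is represented by its header-derived fields (assumed layout).
Packet = Dict[str, object]


def forward_legitimate(
    packets: Iterable[Packet],
    is_malicious: Callable[[Packet], bool],
) -> Iterator[Packet]:
    """Yield only the packets that the classifier labels as legitimate."""
    for packet in packets:
        if not is_malicious(packet):
            yield packet


def toy_classifier(packet: Packet) -> bool:
    """Hypothetical stand-in for a trained model; not a real detection rule."""
    return packet["flag"] == "S0" and packet["src_bytes"] == 0


packets = [
    {"protocol_type": "tcp", "flag": "SF", "src_bytes": 181},
    {"protocol_type": "tcp", "flag": "S0", "src_bytes": 0},
    {"protocol_type": "udp", "flag": "SF", "src_bytes": 105},
]

# Only packets not flagged as malicious reach the end system.
forwarded = list(forward_legitimate(packets, toy_classifier))
```

Passing the classifier in as a callable keeps the forwarding logic independent of whichever model is eventually selected.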
Introduction to Problem Domain
Threat Monitoring and Intelligent Data Analytics of Network Traffic aims at
better security solutions: we have developed a tool that analyzes threats in
real time using machine learning algorithms. It focuses on scalability,
performance, and correctness in order to identify a diverse range of new
attacks. The novelty of the paper lies in performing the processing entirely
in a distributed fashion, using Scala and the Resilient Distributed Dataset,
DataFrame, and Dataset data structures available in Apache Spark. The proposed
tool works in two modes, online and offline. A major problem is the lack of
public datasets for network-based attacks; of the existing ones, the NSL_KDD
dataset was chosen because it is the most efficient one, with no redundant data.
Existing system
The dataset has attributes such as protocol type, flag, source bytes,
and logged in, and the 24 different attack types are broadly grouped
into four categories: DoS, Probe, U2R, and R2L. A packet is classified
as normal or anomaly based on its attributes. Various machine learning
algorithms were used to build and train the model, including Decision
Tree, Support Vector Machine, Gradient Boost, Logistic Regression, and
Random Forest. The model with the best performance is selected using
performance measures such as F1_score, accuracy_score, recall_score,
and precision.
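As a rough illustration of the labelling and evaluation described above, the sketch below groups a few attack labels into the four categories and computes the listed measures by hand. The mapping is only an illustrative subset of NSL_KDD attack names, the predictions are made up for the example, and `to_binary` and `binary_metrics` are hypothetical helper names.

```python
from typing import Dict, List

# Illustrative subset of NSL_KDD attack labels and their categories.
ATTACK_CATEGORY: Dict[str, str] = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "guess_passwd": "R2L", "warezclient": "R2L",
}


def to_binary(label: str) -> int:
    """Map a raw label to 0 (normal) or 1 (anomaly)."""
    return 0 if label == "normal" else 1


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


labels = ["normal", "neptune", "satan", "normal", "rootkit", "normal"]
y_true = [to_binary(label) for label in labels]
y_pred = [0, 1, 0, 0, 1, 1]  # made-up predictions for illustration only
metrics = binary_metrics(y_true, y_pred)
```

Comparing these measures across the candidate algorithms is one way to pick the best-performing model; which measure to prioritise is a design choice the slides do not specify.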
Proposed System
The architecture works in two modes, online and offline. The online
mode classifies packets in real time, whereas the offline mode is used
to record the performance of multiple classifiers on the dataset. The
three main modules of the architecture are data collection, processing,
and visualization. The Hadoop Distributed File System (HDFS), a widely
used big data tool, collects and retains the data used to train and
then test the model. Processing is carried out on an Apache Spark
cluster because it processes data at high speed in real time. The
visualization module is implemented using Elasticsearch and Kibana.
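The split between the two modes can be sketched in plain Python. This is only an assumption-laden outline: in the described system the data would come from HDFS and the processing would run on Spark, whereas here classifiers are simple callables and `offline_mode`, `online_mode`, and the two toy candidates are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Features = Dict[str, object]
Classifier = Callable[[Features], int]  # 1 = malicious, 0 = legitimate


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score for the malicious class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def offline_mode(
    candidates: Dict[str, Classifier],
    test_set: List[Tuple[Features, int]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on labelled data; return the best name and all scores."""
    y_true = [label for _, label in test_set]
    scores = {
        name: f1(y_true, [clf(x) for x, _ in test_set])
        for name, clf in candidates.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


def online_mode(
    clf: Classifier, stream: Iterable[Features]
) -> Iterator[Tuple[Features, int]]:
    """Classify records one at a time as they arrive."""
    for record in stream:
        yield record, clf(record)


# Toy candidates standing in for trained models.
candidates: Dict[str, Classifier] = {
    "always_legitimate": lambda x: 0,
    "flag_rule": lambda x: 1 if x["flag"] != "SF" else 0,
}
test_set = [
    ({"flag": "SF"}, 0),
    ({"flag": "S0"}, 1),
    ({"flag": "REJ"}, 1),
    ({"flag": "SF"}, 0),
]

best_name, scores = offline_mode(candidates, test_set)
results = list(online_mode(candidates[best_name], [{"flag": "S0"}, {"flag": "SF"}]))
```

The offline step only records and compares scores; the online step reuses whichever classifier was chosen, so the two modes can evolve independently.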
Ideation map
Modules
• Data Visualization module
• Data Processing module
• Data collection module