
Project Document

The document presents a project report on developing a Network Intrusion Detection System (NIDS) using machine learning techniques, specifically employing algorithms like XG Boost, Random Forest, and Logistic Regression. The system aims to enhance cybersecurity by accurately detecting and classifying network intrusions using the CICIDS 2018 dataset. The project addresses the limitations of traditional intrusion detection methods by leveraging machine learning for improved accuracy and reduced false alarms in real-time applications.


NETWORK INTRUSION DETECTION SYSTEM

USING MACHINE LEARNING


A Project report submitted
in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
(2021-2025)
Submitted By
M. PAVITHRA DEVI (21991A0522)
G. RAMYA (21991A0556)
A. PRASAD (21991A0555)
V. GOVIND (21991A0507)
N. LAKSHMANARAO (21991A0544)

Under the esteemed guidance of


B. GEETHA SRI, MTECH
Assistant Professor
Dept. of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

AVANTHI’S St. THERESSA INSTITUTE OF ENGINEERING & TECHNOLOGY


(Affiliated to Jawaharlal Nehru Technological University, Gurajada, Vizianagaram)
Garividi (Cheepurupalli), Vizianagaram Dist. – 535101
(2021 – 2025)
AVANTHI’S St. THERESSA INSTITUTE OF ENGINEERING & TECHNOLOGY
(Affiliated to Jawaharlal Nehru Technological University, Gurajada, Vizianagaram)
Garividi (Cheepurupalli), Vizianagaram Dist. – 535101
(2021 – 2025)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the project work entitled “NETWORK INTRUSION DETECTION
SYSTEM USING MACHINE LEARNING” is a bona fide work done by M. Pavithra Devi
(21991A0522), G. Ramya (21991A0556), A. Prasad (21991A0555), V. Govind (21991A0507),
and N. Lakshmanarao (21991A0544) under my supervision in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and
Engineering, Avanthi’s St. Theressa Institute of Engineering & Technology, Garividi,
Vizianagaram, during the academic year 2021-2025.

INTERNAL GUIDE
B. GEETHA SRI, MTECH
Assistant Professor, Dept. of C.S.E.

HEAD OF THE DEPARTMENT
Mr. M. UMAMAHESHWARARAO, M. TECH, (Ph.D.)
Assistant Professor, Dept. of C.S.E.

EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We place on record and warmly acknowledge the continuous encouragement, invaluable
supervision, timely suggestions and inspired guidance offered by our guide
Mrs. B. GEETHA SRI, MTech, Assistant Professor, Department of Computer Science and
Engineering, Avanthi’s St. Theressa Institute of Engineering & Technology, in bringing this
project work and report to a successful completion.
We wish to express our deep sense of gratitude to Mr. M. UMAMAHESWARA RAO, MTech,
(Ph.D.), Assistant Professor and Head of the Department of Computer Science and Engineering,
for his wholehearted co-operation, unfailing inspiration and valuable guidance. Throughout the
work, his valuable suggestions and constant encouragement have helped us a long way. We thank
him for giving his valuable time at odd hours and for the patience and understanding he showed,
which greatly helped the project work to be completed successfully.
We convey our sincere thanks to Dr. V. JOSHUA JAYA PRASAD, Principal of Avanthi’s St.
Theressa Institute of Engineering & Technology, who provided an opportunity to take on the
project work in the well-equipped laboratories of the computer science department in our college.

We also thank SRI M. SRINIVASA RAO, beloved chairman of the Avanthi group of
colleges, who is the backbone of the college. Thank you, sir.

M. Pavithra devi (Regd. No: 21991A0522)

G. Ramya (Regd. No:21991A0556)

A. Prasad (Regd. No:21991A0555)

V. Govind (Regd. No:21991A0507)

N. Lakshmanarao (Regd. No:21991A0544)


DECLARATION

We declare that this project entitled “NETWORK INTRUSION DETECTION SYSTEM USING
MACHINE LEARNING” is an original work done by M. Pavithra Devi (21991A0522), G.
Ramya (21991A0556), A. Prasad (21991A0555), V. Govind (21991A0507), and N. Lakshmanarao
(21991A0544) for the B. Tech degree, and we assure that this project work has not been submitted
by us towards any degree or diploma of this or any other university.

M. Pavithra devi (Regd. No: 21991A0522)

G. Ramya (Regd. No: 21991A0556)

A. Prasad (Regd. No: 21991A0555)

V. Govind (Regd. No: 21991A0507)

N. Lakshmanarao (Regd. No: 21991A0544)


NETWORK INTRUSION
DETECTION SYSTEM USING
MACHINE LEARNING
TABLE OF CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
DECLARATION
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

1 INTRODUCTION
1.1 Brief Introduction
1.1.1 Characteristics and Service Models
1.1.2 Service Models
1.1.3 Benefits of Data Analysis with Machine Learning
1.2 Literature Review
1.3 Motivation
1.4 Objective
1.5 Problem Statement

2 SYSTEM ANALYSIS
2.1 Existing System
2.2 Proposed System
2.3 Feasibility Study
2.3.1 Economical Feasibility
2.3.2 Technical Feasibility
2.3.3 Social Feasibility
2.4 System Requirement Specification
2.4.1 Functional Requirements
2.4.2 Non-Functional Requirements
2.5 Hardware Requirements
2.6 Software Requirements

3 SYSTEM DESIGN
3.1 System Architecture
3.2 System Modules
3.2.1 Methodology
3.3 Introduction to UML Diagrams
3.3.1 Use Case Diagram
3.3.2 Class Diagram
3.3.3 Sequence Diagram
3.3.4 Activity Diagram
3.3.5 Component Diagram
3.3.6 Deployment Diagram

4 SYSTEM IMPLEMENTATION
4.1 Front End Implementation
4.2 Back End Implementation
4.3 Source Code

5 SYSTEM TESTING
5.1 Performance Analysis
5.1.1 Approaches to Sentence Extraction
5.1.2 Performance Metrics Used
5.2 Front End Output
5.3 Final Output
5.4 Output Screens

6 CONCLUSION

REFERENCE
LIST OF FIGURES

S.No   Figure No    Name of the figure

1      Fig 1.1      Workflow of Machine Learning
2      Fig 3.1.1    System Architecture
3      Fig 3.2.1    Attack Recognition
4      Fig 3.2.2    Data Pre-processing
5      Fig 3.2.3    Random Forest
6      Fig 3.2.4    XGBoost Classifier
7      Fig 3.2.5    Logistic Regression
8      Fig 3.3.1    Use Case Diagram
9      Fig 3.3.2    Class Diagram
10     Fig 3.3.3    Sequence Diagram
11     Fig 3.3.4    Activity Diagram
12     Fig 3.3.5    Component Diagram
13     Fig 3.3.6    Deployment Diagram
14     Fig 5.2.1    Output of Front End
15     Fig 5.3.1    Anaconda Navigator
16     Fig 5.3.2    Creating an environment variable to run the code
17     Fig 5.3.3    Output
18     Fig 5.4.1    Output Screen


ABSTRACT

Intrusion Detection Systems (IDS) are essential components of modern cybersecurity frameworks,
responsible for monitoring, analyzing, and detecting unauthorized or malicious activities
within network environments. Due to the exponential growth of network traffic and the
increasing sophistication of cyberattacks, identifying threats manually has become a complex
and time-consuming task. To address this challenge, this project proposes an intelligent IDS
framework that leverages machine learning techniques to automatically detect and classify
network intrusions. The system incorporates three machine learning algorithms, XGBoost,
Random Forest, and Logistic Regression, to analyze network traffic patterns and predict
potential threats. The models are trained and evaluated using the CICIDS2018 dataset, a
comprehensive and realistic intrusion detection benchmark that includes various attack
scenarios and normal traffic captured in a simulated enterprise environment. The dataset
reflects modern attack behaviors, making it highly suitable for developing practical IDS
solutions. The proposed system employs multi-class classification to categorize different types
of network traffic. Performance is measured using standard metrics such as detection rate, false
positive rate, and average misclassification cost. The experimental results demonstrate that the
selected machine learning algorithms significantly enhance detection accuracy, reduce false
alarms, and contribute to the development of robust and efficient intrusion detection systems.

CHAPTER-1

INTRODUCTION


1. INTRODUCTION
In today's interconnected digital world, cybersecurity has become a critical concern for
individuals, organizations, and governments. With the widespread use of internet-enabled
devices, cloud services, and enterprise networks, the volume of network traffic has grown
exponentially. As a result, the frequency and sophistication of cyberattacks have also increased,
posing significant risks to data confidentiality, integrity, and availability. To counter these
threats, advanced security mechanisms such as Intrusion Detection Systems have become an
essential layer in modern cybersecurity infrastructure.
An Intrusion Detection System (IDS) is a network security tool that monitors traffic patterns and
detects unauthorized, anomalous, or malicious activities within a network. There are two
primary types of intrusion detection systems: signature-based and anomaly-based.
Signature-based IDS detect threats by comparing network activity against a database of known
attack patterns or signatures. These systems are highly accurate in detecting previously identified
threats but struggle to recognize new or unknown attacks, including those that exploit zero-day
vulnerabilities. In contrast, anomaly-based IDS analyze the behavior of users and systems to
establish a baseline of normal activity. Deviations from this baseline are flagged as potential
threats. Although anomaly-based systems are better at detecting novel attacks, they often suffer
from high false positive rates, which can overwhelm analysts with irrelevant alerts.
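The contrast between the two approaches can be illustrated with a minimal Python sketch. The signature list, the packets-per-second feature, and the three-standard-deviation threshold are illustrative assumptions and are not taken from this project's implementation.

```python
import statistics

# Hypothetical payload substrings standing in for a signature database.
KNOWN_SIGNATURES = ["' OR '1'='1", "../../etc/passwd"]


def signature_match(payload):
    """Signature-based check: flag payloads containing a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def build_baseline(normal_rates):
    """Anomaly-based setup: summarise normal traffic by its mean and spread."""
    return statistics.mean(normal_rates), statistics.stdev(normal_rates)


def is_anomalous(rate, baseline, threshold=3.0):
    """Flag a rate lying more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    return abs(rate - mean) > threshold * stdev


# Packets per second observed during normal operation (made-up values).
baseline = build_baseline([98, 102, 101, 97, 103, 99, 100, 100])

print(signature_match("GET /index.php?id=' OR '1'='1"))  # matches a known pattern
print(signature_match("GET /index.php?id=42"))           # no known pattern
print(is_anomalous(950, baseline))                        # large traffic spike
print(is_anomalous(101, baseline))                        # ordinary rate
```

The signature check can only recognise what is already in its list, while the baseline check flags any large deviation, including a harmless burst of legitimate traffic, which is where false positives come from.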
This project focuses on developing a machine learning-based Network Intrusion Detection
System using real-world datasets. The system aims to accurately classify normal and malicious
network traffic by leveraging powerful ML algorithms like Random Forest, XGBoost, and
Logistic Regression. The ultimate goal is to create a robust, scalable, and intelligent intrusion
detection framework that enhances the security posture of modern digital infrastructures while
addressing the shortcomings of traditional IDS approaches.
1.1 Brief Introduction:

In recent years, the number and complexity of cyberattacks have significantly increased,
targeting both individuals and organizations. This creates a strong need for effective security
systems that can monitor network traffic and detect malicious activities in real time. One such
system is an Intrusion Detection System (IDS), which plays a vital role in identifying abnormal
or unauthorized behaviour within a network.
This project focuses on designing a Robust Network Intrusion Detection System using machine
learning techniques. The system is built using three key algorithms: XGBoost, Random Forest,


and Logistic Regression. These models are trained and tested using the CICIDS2018 dataset,
which is a realistic and comprehensive dataset containing both normal and various attack traffic
types. The system aims to accurately detect different types of intrusions and minimize false
alarms, making it practical for real-time network security applications.
Traditional IDS solutions rely heavily on signature-based detection, where known patterns of
malicious behavior are identified and blocked. However, this approach fails to detect zero-day
attacks or newly evolving threats that do not match any known signatures. To overcome this
limitation, machine learning-based approaches have emerged as a more intelligent and adaptive
solution for building robust intrusion detection systems.
In this project, we aim to implement a Robust Network Intrusion Detection System (RNIDS)
that leverages machine learning techniques to accurately detect intrusions in real time. We
utilize three well-established algorithms (XGBoost, Random Forest, and Logistic Regression)
to train classification models on a realistic dataset. By applying data preprocessing,
feature selection, model evaluation, and real-time prediction, this system is designed to be
effective, scalable, and practical for real-world security environments.

Fig 1.1 Workflow of machine Learning

1.1.1 Characteristics and Service Models:

Network intrusion detection systems that use machine learning have several key
characteristics:

• Adaptability: ML models can adapt to new types of attacks as they are retrained on fresh
data.

• Automation: Reduces human involvement in detecting threats, which speeds up
response times.

• Accuracy: Models can learn from complex patterns and improve accuracy over time.

• Scalability: They can handle large volumes of network traffic efficiently.

These systems are often deployed in cloud-based or on-premises environments.
Their services may range from real-time monitoring to historical log analysis,
depending on the needs of the organization.

1.1.2 Service Models:

The service models in cloud computing support the integration of ML-based security
systems. These include:

• IaaS (Infrastructure as a Service): Provides virtualized compute, storage, and network
resources that organizations manage themselves. This is beneficial for large-scale model
training, storage of massive datasets, and running computationally intensive algorithms.

• PaaS (Platform as a Service): Provides a platform to build and deploy ML
applications. Developers can focus on model development without worrying about
infrastructure management.

• SaaS (Software as a Service): Enables users to access intrusion detection features
through user-friendly interfaces. This model offers accessibility and convenience
without the need for local installations.

1.1.3 Benefits of Data Analysis with Machine Learning

Integrating machine learning into data analysis provides several benefits:

• Improved Threat Detection: ML models can detect complex and evolving cyber
threats that may go unnoticed by traditional systems.

• Automation: Reduces the need for manual rule updates, allowing systems to learn and
adapt from new data.

• Efficiency: Accelerates the detection process, allowing for faster incident response and
mitigation.

• Pattern Recognition: Learns from historical data to identify hidden patterns, making
the system more robust against zero-day attacks.

• Predictive Capabilities: Enables forecasting of potential vulnerabilities and threats
before they materialize.

These benefits make ML a powerful tool in modern cybersecurity frameworks, paving the
way for intelligent, self-learning security systems capable of defending against a wide array
of network-based threats.

1.2 Literature Review

The study examines ML and DL methods for NIDS (network intrusion detection
systems). Effectiveness, problems including false alarms, and real-time processing are covered.
According to the study, feature selection and dataset quality are crucial.[1] For a deep learning-
based NIDS, the study employs Self-Taught Learning (STL) with sparse autoencoders and
softmax regression. It increases intrusion detection efficiency and accuracy when tested on the
NSL-KDD dataset.[2] A probabilistic approach to estimating discrete values in huge datasets is
presented in the paper. It presents an enhanced approach for less computationally complex and
more accurate estimations. For database administration and query optimization, this is
essential.[3] Using Random Forest modeling, the paper suggests an NIDS that exhibits increased
efficiency and accuracy on the KDD'99 dataset. It draws attention to how ensemble learning
may improve NIDS performance.[4] The study suggests a NIDS that uses deep learning to detect
known and unknown threats. The system exhibits flexibility and durability in identifying
network intrusions using the UNSW-NB15 dataset. This study demonstrates how deep neural
networks can improve cybersecurity protocols.[5] The work highlights adaptive strategies and
real-time monitoring while presenting an intrusion detection strategy utilizing statistical
anomaly detection. It improves the fundamental knowledge of cybersecurity intrusion
detection.[6] The report examines IDS approaches, pointing out shortcomings and difficulties in
hybrid, anomaly-based, and signature-based approaches. It ends with suggestions for future
study aimed at enhancing IDS models.[7] Using the KDD Cup 99 dataset, the study suggests a
fuzzy logic-based NIDS that produces detection rules from frequently occurring itemsets with
good accuracy. It draws attention to how fuzzy logic might improve intrusion detection.[8]
A hybrid NIDS that combines anomaly and misuse detection algorithms is presented in the
paper. It lowers false positives while increasing detection accuracy.[9] A hybrid Network
Intrusion Detection System (NIDS) that blends anomaly-based and signature-based detection
techniques is presented in this research. In detecting network intrusions, this method lowers


false positives and improves detection accuracy.[10] With an emphasis on deep learning
methodologies, the study investigates machine learning-based NIDS in Software Defined
Networking (SDN) systems. It assesses different approaches, points out difficulties, and makes
recommendations for future lines of inquiry.[11] The study examines intrusion detection
systems and divides them into three categories: stateful protocol analysis, anomaly-based, and
signature-based. It draws attention to their advantages, disadvantages, and the requirement for
flexible real-time monitoring.[12] The UNSW-NB15 dataset, which is intended to assess
network intrusion detection systems using contemporary attack scenarios, is introduced in this
research. With the goal of advancing NIDS research and development, it provides a wide range
of features and attack kinds.[13] A bibliometric study of research on oxidative stress in
intervertebral disc degeneration (IDD) is presented in this paper. It covers prominent authors,
publication patterns, and new studies on senescence and mitochondria.[14] The study suggests
a Hidden Naïve Bayes classifier with PKI discretization and INTERACT feature selection for
network intrusion detection. It outperforms conventional models in terms of accuracy and
predictive performance.[15] An unsupervised network intrusion detection system that can
recognize unknown attacks without any prior knowledge is presented in the paper. The
technology efficiently identifies anomalies by using clustering algorithms and examining
network traffic patterns, strengthening network security against new threats.[16]
Using the Kyoto 2006+ dataset and the J48 decision tree technique for NIDS classification,
the study classifies traffic as normal, known attack, or unknown attack with 97.2% accuracy. It
draws attention to the shortcomings of conventional signature-based NIDS and illustrates how
successful anomaly detection methods are.[17] The UNSW-NB15 and KDD99 datasets are
used to compare the efficacy of feature selection methods in NIDS. By improving feature
selection through the use of Association Rule Mining, it shows that UNSW-NB15 attributes
increase detection accuracy but have a higher False Alarm Rate.[18] Using a linear classifier
for attack detection and a sparse autoencoder for feature learning, the study suggests a deep
learning-based NIDS. It shows improved recall, accuracy, and precision in recognizing known
and novel attacks.[19] By integrating Genetic Algorithms into SVM-based IDS, the study
improves feature selection and parameter tuning, increasing detection efficiency and
accuracy. The advantages of the GA-SVM fusion over standalone SVM techniques are confirmed by
experimental findings on intrusion detection datasets.[20] By concentrating on network-wide
event analysis rather than host-based monitoring, network-based intrusion detection has


developed to handle multiple network threats. By modeling attacks with state diagrams and
hypergraphs, the State Transition Analysis Technique enhances event monitoring and IDS
settings.[21] According to the research, machine learning-based classifiers cannot be
effectively evaluated and compared across different datasets when Network Intrusion
Detection System datasets lack a uniform feature set. Researchers have suggested
NetFlow-based feature sets as a solution to this problem; these show better classification
performance and enable a more thorough evaluation of NIDS models.[22]
Debar, Dacier, and Wespi give a thorough overview of intrusion detection systems in their
2002 work "Intrusion Detection Systems: Technology and Development," outlining the
difficulties in the field and going over different detection techniques.[23] Through an analysis
of several attack strategies and accompanying response mechanisms, the study "Adversarial
Machine Learning for Network Intrusion Detection Systems: A Comprehensive Survey" offers
a thorough overview of how adversarial machine learning techniques pose problems for
NIDS. It highlights how important it is to have strong NIDS that can withstand hostile attacks
in order to preserve network security.[24] An overview of different deep learning approaches
used with intrusion detection systems is given in the publication "A Survey of Intrusion
Detection Systems Based on Deep Learning". It highlights the need for more study to improve
IDS performance while going over the advantages and disadvantages of these strategies.[25]
The paper in question appears in the proceedings of the 2018 APAN Research Workshop and
focuses on experimental performance on the NSL-KDD dataset. A thorough review is not
possible due to restricted access to the complete text.[26]
The literature emphasizes the drawbacks of standalone anomaly-based and misuse-based
intrusion detection systems and the benefits of hybrid IDSs, which integrate the two strategies
for better attack detection. According to studies, Snort's integration with anomaly-based
systems such as PHAD and NETAD improves the accuracy of intrusion detection, especially
when it comes to new threats.[27] The study "A Survey of Intrusion Detection Systems in
Wireless Sensor Networks" offers a thorough analysis of the advantages and disadvantages of
several intrusion detection strategies designed for wireless sensor networks. It highlights the
need for reliable and energy-efficient IDS solutions to handle the particular difficulties these
networks present.[28] By combining anomaly and misuse detection, hybrid intrusion detection
systems increase intrusion detection accuracy while lowering false positives. SDN-based IDSs


have recently advanced to use machine learning for effective and dynamic threat detection.[29]
IDSs based on genetic algorithms increase the effectiveness of anomaly detection and feature
selection. By optimizing rule generation, GA reduces false positives and increases detection
accuracy.[30]


1.3 Motivation:

The motivation behind this project arises from the escalating threat landscape in the realm
of cybersecurity. In recent years, there has been a significant increase in the frequency,
complexity, and impact of cyberattacks on modern digital infrastructures. Organizations
across the world face constant risks of data breaches, denial-of-service (DoS) attacks,
malware infections, and unauthorized access. As these threats continue to evolve, traditional
security mechanisms such as signature-based Intrusion Detection Systems are proving to be
inadequate in ensuring real-time protection.

Signature-based IDS are fundamentally reactive: they rely on predefined patterns of known
attacks to detect malicious activity. While these systems are effective against previously
encountered threats, they fail to recognize novel, emerging, or polymorphic attacks that do
not match existing signatures. This limitation creates a significant gap in an organization’s
defence, leaving it vulnerable to zero-day exploits and adaptive intrusions. Additionally,
broadly written or poorly tuned signatures can generate many false positives, which can lead
to alert fatigue among security analysts and divert attention from real threats.

The growing scale and complexity of network traffic also pose challenges in real-time
analysis and response. As networks expand and data flows become more intricate, the ability
to detect and respond to intrusions promptly becomes increasingly difficult. There is a clear
need for intelligent, scalable solutions that can go beyond static rule-based systems and offer
proactive defence mechanisms.

Machine Learning presents a powerful alternative to traditional IDS approaches. Unlike rule-
based systems, ML models are capable of learning from large datasets, identifying complex
patterns, and adapting to new and unseen attack types. These models not only enhance the
detection capabilities but also significantly reduce false alarms by understanding the context
and variations in network behaviour. They can be trained to distinguish between normal and
malicious traffic with high precision, offering a more reliable and efficient approach to
network security.

The key motivations for integrating machine learning into this project include:

• The ability to detect both known and unknown attacks with improved accuracy.
• A reduction in false positives and false negatives, enhancing trust in alerts.
• The capacity to automate learning from labeled datasets without manual
intervention.
• The flexibility to adapt to dynamic network environments and changing attack
vectors.
• Support for real-time traffic monitoring and prediction, enabling timely threat
response.


By leveraging machine learning techniques, this project aims to develop a robust and
intelligent intrusion detection system that addresses the limitations of conventional solutions
and strengthens the defence mechanisms of modern networks.

1.4 Objective:

The primary objective of this project is to design and implement a Robust Network Intrusion
Detection System capable of identifying and classifying various types of network-based
attacks with high accuracy and minimal false alarms. As the number and complexity of cyber
threats continue to rise due to the increasing reliance on digital services, traditional
rule-based detection systems often fail to detect new or unknown attacks. To address this, the
project leverages machine learning techniques to analyse large volumes of network traffic
and detect anomalous behaviour. The system is developed using the CICIDS2018 dataset,
which contains a wide range of realistic attack scenarios and normal traffic patterns. By
training and evaluating three different machine learning models (Random Forest, XGBoost,
and Logistic Regression), the goal is to identify the most effective algorithm for robust
intrusion detection. The final objective is to create a scalable and intelligent IDS solution
that can be deployed in real-world environments to enhance the security of modern
networks.

1.4.1 Dataset Handling and Preprocessing

• Import and explore the CICIDS2018 dataset.

• Handle missing values, anomalies, and outliers using techniques like imputation and
scaling.

• Convert timestamps into usable numerical features (hour, minute, second).

• Encode categorical features such as protocol type and labels.

• Select the most relevant features using domain knowledge and experimentation.
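The preprocessing steps listed above can be sketched in plain Python as follows. The column names (`Timestamp`, `Protocol`, `Flow Byts/s`, `Label`), the `dd/mm/yyyy HH:MM:SS` timestamp format, the sample records, and the choice of median imputation are assumptions made for illustration; the project's actual pipeline may differ.

```python
import math
import statistics
from datetime import datetime

# Made-up flow records; the field names are assumed for illustration.
rows = [
    {"Timestamp": "14/02/2018 08:31:01", "Protocol": "6", "Flow Byts/s": "1200.5", "Label": "Benign"},
    {"Timestamp": "14/02/2018 10:32:45", "Protocol": "17", "Flow Byts/s": "Infinity", "Label": "FTP-BruteForce"},
    {"Timestamp": "14/02/2018 10:33:10", "Protocol": "6", "Flow Byts/s": "", "Label": "FTP-BruteForce"},
    {"Timestamp": "14/02/2018 11:00:00", "Protocol": "6", "Flow Byts/s": "800.5", "Label": "Benign"},
]


def to_float(text):
    """Parse a numeric field; return None for missing, NaN, or infinite values."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def preprocess(rows):
    """Turn raw records into numeric feature vectors and integer labels."""
    rates = [to_float(r["Flow Byts/s"]) for r in rows]
    # Impute unusable values with the median of the usable ones.
    median = statistics.median(v for v in rates if v is not None)
    # Map each label to an integer code, in sorted order for reproducibility.
    label_codes = {name: i for i, name in enumerate(sorted({r["Label"] for r in rows}))}
    features, labels = [], []
    for row, rate in zip(rows, rates):
        ts = datetime.strptime(row["Timestamp"], "%d/%m/%Y %H:%M:%S")
        features.append([ts.hour, ts.minute, ts.second, int(row["Protocol"]),
                         median if rate is None else rate])
        labels.append(label_codes[row["Label"]])
    return features, labels, label_codes


features, labels, label_codes = preprocess(rows)
for vector, label in zip(features, labels):
    print(vector, label)
```

In a real run the imputation value and the label mapping would be computed once on the training data and reused unchanged at prediction time.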

1.4.2 Model Selection and Training

• Train and evaluate three different supervised machine learning algorithms:

o Random Forest Classifier

o XGBoost Classifier

o Logistic Regression

• Perform a stratified train-test split to preserve label distributions.

• Handle infinite or missing values in the dataset and apply StandardScaler for
normalization.

• Train each model on the same dataset and evaluate them based on performance
metrics.
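To make the stratified split and the scaling step concrete, here is a minimal standard-library sketch. The helper names, the 25% test fraction, and the toy data are hypothetical; in practice, library routines such as scikit-learn's `train_test_split` (with its `stratify` argument) and `StandardScaler` provide equivalent behaviour.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 so constant columns do not cause a division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, scaler):
    """Standardise rows using statistics learned from the training set."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 8 benign and 4 attack sessions with two numeric features each.
labels = ["Benign"] * 8 + ["Attack"] * 4
features = [[float(i), float(i % 3)] for i in range(12)]

train_idx, test_idx = stratified_split(labels)
scaler = fit_scaler([features[i] for i in train_idx])
x_train = transform([features[i] for i in train_idx], scaler)
x_test = transform([features[i] for i in test_idx], scaler)
print(len(train_idx), len(test_idx))
```

Fitting the scaler on the training rows only, and then applying it to the test rows, keeps information about the test set from leaking into training.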

1.4.3 Performance Evaluation

• Use metrics such as accuracy, precision, recall, and F1-score to assess model
performance.

• Focus on the weighted F1-score for multiclass classification, so that the score reflects
performance across all attack types in proportion to their frequency.

• Identify the best-performing model and store it using Python’s pickle library for
future use.
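As a worked illustration of the weighted F1-score, the sketch below computes per-class precision, recall, and F1, then averages the F1 values using each class's share of the true labels as its weight. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / total
    return score


# Made-up labels for eight sessions across three classes.
y_true = ["Benign", "Benign", "Benign", "Benign", "DoS", "DoS", "Bot", "Bot"]
y_pred = ["Benign", "Benign", "Benign", "DoS", "DoS", "DoS", "Bot", "Benign"]

print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a rare attack class contributes little to this score; the unweighted (macro) average of the per-class F1 values is a common complement when rare classes matter.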

1.4.4 Intrusion Prediction System

• Implement a function to accept new network session data, preprocess it using the
same pipeline, and predict whether it is an attack or normal.

• Make real-time predictions using the saved best model.

• Output the prediction in a human-readable format (e.g., “Normal” or “Attack”).
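The save, load, and predict flow described above might look like the following sketch. The "model" here is a hypothetical set of logistic-regression weights stored together with the scaling statistics; the two features, all parameter values, and the file name are invented for illustration and do not come from the trained models in this project.

```python
import math
import os
import pickle
import tempfile

# Hypothetical best model: logistic-regression weights plus the scaler statistics.
model = {
    "means": [50.0, 1000.0],
    "stds": [10.0, 500.0],
    "weights": [1.5, 2.0],
    "bias": -1.0,
}


def save_model(model, path):
    """Serialise the model parameters to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load model parameters; only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_session(session, model, threshold=0.5):
    """Scale a raw feature vector like the training data, then classify it."""
    scaled = [(v - m) / s for v, m, s in zip(session, model["means"], model["stds"])]
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], scaled))
    probability = 1.0 / (1.0 + math.exp(-z))
    return "Attack" if probability >= threshold else "Normal"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_model.pkl")
    save_model(model, path)
    loaded = load_model(path)

print(predict_session([80.0, 3000.0], loaded))
print(predict_session([45.0, 900.0], loaded))
```

Storing the scaling statistics alongside the weights means a new session is transformed exactly as the training data was, which is what "the same pipeline" requires.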

1.4.5 Real-Time Applicability

• Ensure that the detection pipeline is optimized for speed and efficiency, allowing
integration into real-time monitoring systems.

• Explore the potential for extending this system into an intrusion detection and
prevention framework in future versions.


1.5 Problem Statement:

In the age of digitization, the reliance on computer networks and internet-based
communication has become vital for individuals, organizations, and governments. With the
proliferation of online services, cloud computing, and connected devices, the volume and
complexity of network traffic have significantly increased. Alongside this growth, cyber
threats have become more frequent, sophisticated, and damaging. Malicious actors now
employ advanced tactics such as zero-day exploits, polymorphic malware, and distributed
denial-of-service (DDoS) attacks to breach security defences, disrupt services, and steal
sensitive information.

Traditional Network Intrusion Detection Systems (NIDS) primarily rely on signature-based
or rule-based approaches. These systems detect intrusions by matching incoming traffic
patterns against a predefined database of known attack signatures. While effective in
identifying previously encountered threats, these methods struggle to detect novel or
modified attacks. As a result, they are not well-suited to dealing with today's dynamic and
evolving threat landscape. In addition, they tend to produce high false positive rates,
meaning that benign traffic is often incorrectly flagged as malicious. This not only leads to
inefficient resource allocation but also increases the burden on cybersecurity personnel who
must analyse large volumes of false alerts.

Another significant limitation of traditional IDS solutions is their scalability. As network
sizes grow and traffic becomes more complex, these systems experience difficulty
processing data in real-time. The latency in detection and response can leave networks
exposed to ongoing attacks. Furthermore, these systems lack the ability to learn from new
types of data or behaviour. Since rule-based systems are static and require manual updating,
they are not able to adapt quickly to newly emerging threats or techniques.

In this context, there is a critical need for intelligent intrusion detection systems that can
analyse large volumes of network traffic, recognize patterns, and detect both known and
unknown threats in real-time. Machine learning offers a promising solution to these
challenges. Unlike traditional systems, ML-based IDS can learn from historical data,
identify complex patterns, and adapt to new types of attacks. They can also reduce the rate
of false positives by learning the difference between normal and malicious behaviour based
on context and data trends.


CHAPTER -2
SYSTEM ANALYSIS


2. SYSTEM ANALYSIS
In the era of increasing cyber connectivity, safeguarding computer networks from
malicious threats is more crucial than ever. The exponential rise in internet traffic, cloud-
based services, and Internet of Things devices has led to an upsurge in sophisticated
cyberattacks. Organizations rely heavily on Intrusion Detection Systems as a defensive layer
in their cybersecurity infrastructure. However, with evolving attack strategies, traditional
IDS methods often fall short in effectively identifying and mitigating modern threats. This
section critically examines the existing systems used in network intrusion detection,
outlines their limitations, and introduces a more adaptive and intelligent proposed system
built on machine learning principles.
2.1 Existing System:
Traditional Intrusion Detection Systems have played a crucial role in monitoring and
securing network environments. These systems are mainly categorized into two types:
signature-based IDS and anomaly-based IDS. Signature-based systems detect intrusions by
comparing network activity against a database of known attack signatures. They are highly
accurate in detecting previously known threats and are widely deployed due to their
simplicity and efficiency. Anomaly-based systems, on the other hand, establish a baseline of
normal network behaviour and flag any deviations as potential threats. They are capable of
identifying unknown attacks, making them more flexible. However, both types of systems
have their own drawbacks. Signature-based IDS are ineffective against zero-day or
previously unseen attacks, as they cannot recognize patterns that are not already in their
database. Anomaly-based systems may raise frequent false alarms due to slight variations in
legitimate behaviour. In general, traditional IDS require extensive manual configuration,
lack adaptability, and struggle to cope with large-scale and high-speed network traffic. These
limitations highlight the need for more intelligent, automated, and scalable intrusion
detection mechanisms, particularly in the face of increasing cyber threats.
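As a rough illustration of the anomaly-based idea described above, the following sketch builds a baseline from a handful of packets-per-second readings and flags values that deviate strongly from it. The readings, the three-standard-deviation threshold, and the function name is_anomalous are assumptions chosen for illustration; they are not taken from any particular IDS product.

```python
from statistics import mean, stdev

# Hypothetical packets-per-second readings observed during normal operation.
baseline = [98, 102, 100, 97, 103, 101, 99, 100]
baseline_mean = mean(baseline)
baseline_std = stdev(baseline)  # sample standard deviation of the baseline


def is_anomalous(value, threshold=3.0):
    """Flag a reading lying more than `threshold` standard deviations from the baseline mean."""
    return abs(value - baseline_mean) > threshold * baseline_std


for reading in (104, 250):
    print(reading, "anomalous" if is_anomalous(reading) else "normal")
```

A tighter threshold would catch more attacks but would also flag harmless fluctuations, which is the false-alarm trade-off noted above.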
Limitations of the Existing System:
Inability to Detect Unknown Attacks:
Signature-based systems are ineffective against newly emerging threats that do not have a
predefined pattern or signature.


High False Positive Rates:


Anomaly-based IDS can incorrectly classify legitimate activities as malicious due to their
sensitivity to behavioral deviations, overwhelming analysts with unnecessary alerts.
Reactive Nature:
Most traditional systems respond after an attack has occurred. There is no predictive analysis
or proactive prevention.
Manual Updates and Maintenance:
Signature databases require constant updates and manual tuning, which is labor-intensive
and susceptible to human error.
Poor Scalability:
With the ever-increasing size and speed of modern networks, these systems struggle to
process and analyze large volumes of data in real time.
Lack of Intelligence:
These systems do not learn or improve over time. They rely entirely on preprogrammed rules
and static models, making them obsolete quickly in the face of evolving attack strategies.
Performance Bottlenecks:
As traffic volume increases, traditional systems can become sluggish, leading to delays in
detection and response times.
These limitations reveal a pressing need for a more intelligent, efficient, and scalable
solution that can adapt to modern network environments and threats.
2.2 PROPOSED SYSTEM
To overcome the limitations of traditional intrusion detection systems, this project proposes
a Robust Network Intrusion Detection System built using machine learning algorithms.
Unlike signature-based or manually configured IDS, the proposed system leverages the
power of data-driven models that can learn from historical network traffic and accurately
classify malicious activities, even if the attacks are previously unseen. The solution is
designed to automatically detect a wide range of network intrusions by analyzing real-time
traffic patterns and extracting relevant statistical features.
The core of the system is built on three effective machine learning models: Random Forest,
XGBoost, and Logistic Regression. These algorithms were selected for their robustness,
interpretability, and efficiency in handling high-dimensional datasets. The system is trained
and validated using the CICIDS2018 dataset, which simulates real-world network behavior,
including both benign traffic and a variety of attack types such as DDoS, brute force, port
scans, and web intrusions. The dataset undergoes preprocessing steps such as handling
missing values, encoding categorical variables, extracting time-based features (Hour,
Minute, Second), and scaling numerical features using StandardScaler to ensure model
accuracy and stability.
Once preprocessed, the data is split into training and testing sets using
stratified sampling to preserve the class proportions in both sets. Each model is trained and evaluated using the
F1-score to determine the best-performing algorithm. The system selects the model with the
highest F1-score as the final classifier and saves it using Python's pickle module for future
use. Additionally, a prediction function is implemented to classify new, unseen network
sessions as either normal or attack, allowing for real-time integration in practical
environments.
By combining modern machine learning approaches with a realistic intrusion dataset, the
proposed system aims to reduce false positives and enhance the ability to detect a
wide variety of threats. This intelligent, automated solution is scalable and adaptable,
making it suitable for real-world deployment in enterprise networks where cybersecurity is
a critical concern. Ultimately, this system offers a more accurate, efficient, and proactive
approach to intrusion detection compared to traditional methods.
Key Features of the Proposed System:
Data Preprocessing and Feature Engineering
The system begins by cleaning and preprocessing the raw data. Missing values, infinite
values, and outliers are handled to ensure clean input to the models. Time-based features
such as Hour, Minute, and Second are extracted from timestamps to help in identifying time-
sensitive patterns of attacks. Additionally, categorical features like protocol types are
encoded using Label Encoding, making the data suitable for model training.
Feature Selection and Normalization
Not all features are useful for detecting intrusions. The system selects relevant features such
as Flow Duration, Protocol, Tot Fwd Pkts, Flow Byts/s, etc., which have a strong correlation
with intrusion patterns. These features are then standardized using StandardScaler to bring
them onto the same scale, which improves the performance and convergence speed of
scale-sensitive ML models.
Model Training and Evaluation


The selected machine learning models are trained using 80% of the dataset, and the
remaining 20% is used for testing. Models are evaluated based on F1-Score, which considers
both precision and recall, ensuring that the system performs well not only in identifying
attacks but also in minimizing false positives.
Model Selection and Serialization
The best-performing model based on weighted F1-Score is saved using Python’s pickle
module for future use. This enables easy deployment of the model in real-world applications
without retraining.
Real-Time Prediction Module
A prediction function is implemented to classify new sessions or real-time network flows as
either "Attack" or "Normal". This allows the system to be integrated into live monitoring
tools for real-time intrusion detection and alert generation.
Advantages of the Proposed System:
Higher Detection Accuracy: Machine learning models learn complex patterns in the data,
allowing for better identification of sophisticated attacks.
Detection of Unknown Threats: Unlike signature-based systems, the ML models can
generalize to detect previously unseen attack types.
Reduced False Positives: The system can differentiate between genuine anomalies and
legitimate deviations, reducing unnecessary alerts.
Scalability: Once trained, the models can process large volumes of traffic efficiently, making
the system suitable for modern enterprise environments.
Automation and Adaptability: The system requires minimal manual intervention and can be
retrained with new data to adapt to evolving threats.
2.3 Feasibility Study:
Before initiating the development of any system, it is important to determine whether the
proposed idea is feasible and worth implementing. The feasibility study is conducted to
evaluate the practicality and effectiveness of the proposed machine learning-based Network
Intrusion Detection System (NIDS) from multiple perspectives. It helps identify potential
risks, limitations, and benefits that could affect the success and adoption of the system.
This study considers three major aspects of feasibility:
• Economic Feasibility: Determines if the project is financially viable.
• Technical Feasibility: Assesses whether the technology, tools, and expertise are
sufficient.
• Social Feasibility: Evaluates how the solution will be received and its societal
impact.
2.3.1 Economic Feasibility:
Economic feasibility examines the cost-effectiveness of the project and its return on
investment. In today’s digital environment, organizations spend vast amounts of money on
securing their networks from cyberattacks. Despite this investment, many companies still
fall victim to advanced threats due to the limitations of traditional security systems.
The proposed system provides a low-cost, high-impact solution by leveraging freely
available datasets (e.g., CICIDS2018) and open-source tools such as Python,
Scikit-learn, XGBoost, and Pandas. There is no need to purchase expensive software or
licenses, which significantly reduces the development and deployment cost.
Moreover, this system can potentially save organizations from heavy financial losses that
might result from data breaches, service disruptions, or ransomware attacks. Early detection
and prevention of such threats help minimize financial damage, legal penalties, and
reputational loss. From a long-term perspective, maintenance costs are also low, as the
system can be updated using new datasets and retrained periodically with minimal resources.
Overall, the project is economically sustainable and offers a strong return on investment,
especially for medium to large-scale organizations that manage sensitive or valuable data.
2.3.2 Technical Feasibility:
Technical feasibility assesses whether the project can be built using existing tools,
technologies, and resources.
The proposed system is technically feasible and utilizes widely adopted technologies that
are well-supported and documented. It is developed using:
• Python programming language, known for its simplicity and vast library support,
• Machine learning algorithms such as Random Forest, XGBoost, and Logistic
Regression,
• Preprocessing tools for data cleaning and feature scaling (e.g., StandardScaler,
LabelEncoder),
• Matplotlib and Seaborn for visualization and result interpretation.
2.3.3 Social Feasibility:


Social feasibility focuses on the human aspect: how the system will be perceived, accepted,
and used by people and organizations.
The rise in data breaches, identity theft, and ransomware attacks has made cybersecurity a
top priority for businesses, governments, and individuals alike. In this context, a machine
learning-based NIDS is socially relevant and highly demanded.
Unlike traditional systems that require manual rule updates and generate large volumes of
false alarms, this system offers:
• Reduced false positives, which increases trust among security teams,
• Automation and intelligence, which lessens the need for constant manual
intervention,
• Ease of use, since the system does not require deep technical knowledge to operate
once deployed.
2.4 System Requirement Specifications:
This section defines the key functionalities and performance expectations of the machine learning-
based Network Intrusion Detection System. It outlines what the system should do (functional
requirements) and how well it should perform those actions (non-functional requirements).
These requirements ensure that the system is developed in line with the intended objectives
and serves the needs of users effectively.
2.4.1 Functional Requirements:
Functional requirements describe the core features and operations the system must support
to meet its objectives. The main functional requirements of the proposed NIDS include:
Data Collection and Preprocessing
The system should be able to ingest structured network traffic data from datasets like
CICIDS2018. It must perform data cleaning, handle missing values, and preprocess
categorical features for model compatibility.
Feature Selection and Extraction
The system should allow selection of relevant features for model training. Feature
engineering steps like normalization, scaling, and transformation should be supported.
Model Training and Evaluation
The system should train machine learning models (e.g., XGBoost, Random Forest, Logistic
Regression) using labeled network data. It must evaluate model performance using metrics
like accuracy, precision, recall, and F1-score.


Intrusion Detection (Prediction)


The trained model should be able to classify incoming network records as normal or attack
types (e.g., DoS, DDoS, Brute Force, etc.). It must support multiclass classification to
distinguish between various types of attacks.
Real-Time or Batch Prediction Capability
The system should support either real-time detection or batch classification of network
traffic records.
Visualization and Reporting
The system should generate graphs/charts for visual interpretation of model performance
and dataset characteristics. A confusion matrix and classification report should be displayed
after testing the model.
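The reporting requirement above could be met with scikit-learn's metrics helpers. The sketch below is illustrative only: the class names and the true and predicted labels are made-up values, not results from this project.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative class names and labels (assumed, not project results).
class_names = ["Benign", "Brute Force", "DDoS"]
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

# Rows of the matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
report = classification_report(y_true, y_pred, target_names=class_names)

print(cm)
print(report)
```

The off-diagonal entries show which classes are confused with each other, which a single accuracy figure hides.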
Model Updating and Retraining
The system should allow for periodic retraining with new data to keep detection capabilities
up to date.
User Interface (Optional)
If a basic GUI or web interface is implemented, it should allow the user to upload data, select
models, and view results.
2.4.2 Non-functional Requirements:
Performance Requirements
The system should deliver accurate results with an acceptable training and prediction time.
It should be able to handle large volumes of data without significant degradation in
performance.
Scalability
The system architecture should allow future expansion, such as real-time detection or
integration with live network feeds. It should support the addition of new attack types and
datasets with minimal changes.
Reliability and Accuracy
The system should consistently detect intrusions with a high degree of accuracy and minimal
false positives/negatives.
Usability


The system should be easy to use by individuals with basic machine learning or
cybersecurity knowledge. Output reports and visualizations should be clearly
understandable and interpretable.
Maintainability
The codebase should follow modular programming practices, making it easy to update or
debug. Retraining or fine-tuning the model with new data should not require rewriting the
entire code.
Security
The system should process data securely and ensure no sensitive information is exposed. It
must comply with basic data handling protocols, especially if extended to real-time
environments.
Portability
The system should be platform-independent and run on commonly available hardware and
operating systems like Windows or Linux. It should support execution on standard laptops
with at least 8 GB RAM and an Intel Core i5 processor.
2.5 Hardware requirements:

Processor: 13th Gen Intel(R) Core(TM) i5-1335U, 1.30 GHz
RAM: 8.00 GB
System Type: 64-bit Operating System, x64-based Processor
Storage: SSD (minimum 256 GB)
Display: Standard laptop screen (1080p assumed)

2.6 Software Requirements:

Operating System: Windows 11 Home Single Language (Version 23H2)
Python Version: Python 3.10 or higher
Development Tools: Anaconda Navigator
Key Libraries: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, XGBoost
ML Frameworks: Scikit-learn, XGBoost


CHAPTER -3
SYSTEM DESIGN


3. SYSTEM DESIGN

3.1 System Architecture:

The architecture of the proposed Machine Learning-based Network Intrusion Detection
System (NIDS) is designed to provide an efficient and intelligent method for detecting cyber
threats within a network. The system begins with the collection of network traffic data from
the CICIDS2018 dataset, which includes both normal and malicious traffic patterns. This
raw data is then passed through a data preprocessing pipeline, where missing values are
handled, irrelevant features are removed, and categorical values are encoded using
techniques like label encoding. Additionally, timestamp features such as hour, minute, and
second are extracted to capture temporal patterns in network traffic.

Fig 3.1 System Architecture

Once preprocessing is complete, feature scaling is applied using StandardScaler to
normalize the data for better model performance. The cleaned and transformed dataset is
then split into training and testing subsets to train multiple machine learning models,
including Random Forest, XGBoost, and Logistic Regression. These models are trained on selected features
that are most relevant for intrusion detection, such as flow duration, packet lengths, and byte
rates. The trained models are evaluated using performance metrics like accuracy and F1-
score to determine the best-performing algorithm.
Finally, the system is capable of taking new network session data as input, processing it
through the same pipeline, and predicting whether the traffic is normal or an intrusion in
real-time. This architecture ensures the system is scalable, adaptable, and capable of
accurately detecting both known and unknown cyber threats, thereby enhancing the overall
security posture of the network.
3.2 System Modules
The system consists of five major modules:
Data Preprocessing Module: Responsible for cleaning the dataset, encoding categorical
variables, handling missing values, and scaling features to standardize input for the models.
Feature Engineering Module: Extracts time-based features and selects the most relevant
columns for classification based on domain knowledge and statistical correlation.
Model Training Module: Implements training pipelines for multiple machine learning
models. It includes cross-validation and hyperparameter tuning for optimal performance.
Evaluation Module: Evaluates each trained model using metrics like Accuracy, Precision,
Recall, and F1-score. Confusion matrices and classification reports are generated for
detailed analysis.
Prediction Module: Loads the serialized model and takes new input for real-time
classification of network traffic as normal or malicious. This module can be integrated into
network monitoring systems for live use.

3.2.1 METHODOLOGY
The methodology adopted in this project is structured into systematic stages to ensure a
robust and reliable intrusion detection system. It includes data acquisition, data
preprocessing, feature selection, model building and training, evaluation, and real-time
intrusion prediction. The entire pipeline is designed to simulate the practical deployment of
a machine learning-based Network Intrusion Detection System (NIDS) using realistic
network traffic data.


Fig 3.2.1 Attack recognition
1. Data Acquisition
The CICIDS2018 dataset was selected as the benchmark dataset for training and testing the
intrusion detection models. Developed by the Canadian Institute for Cybersecurity, this
dataset comprises realistic traffic, which includes both benign and malicious activities. It is
labeled and contains features such as flow duration, packet statistics, bytes per second, and
attack categories such as:
• DoS/DDoS
• Brute Force
• Port Scanning
2. Data Preprocessing
Before training any machine learning models, the dataset underwent multiple preprocessing
steps to ensure quality and consistency:
a. Handling Missing Values
• Numeric columns were filled using their mode values to maintain the integrity of
the data distribution.
• Categorical columns were filled using their most frequent values to avoid
introducing bias.
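The mode-based filling described above might look like the following sketch. The helper name fill_missing_with_mode and the small demo table are assumptions made for illustration; the project's actual code is not shown in this report.

```python
import numpy as np
import pandas as pd


def fill_missing_with_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which missing values are replaced by each column's mode."""
    out = df.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:  # a column that is entirely missing has no mode
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# Tiny hypothetical sample with one missing numeric and one missing categorical value.
demo = pd.DataFrame({
    "Flow Duration": [100.0, 100.0, np.nan, 250.0],
    "Protocol": ["TCP", None, "TCP", "UDP"],
})
filled = fill_missing_with_mode(demo)
print(filled)
```

When several values tie for most frequent, pandas returns them sorted and this sketch takes the first one.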


Fig 3.2.2 Data pre-processing

b. Timestamp Feature Extraction
• The Timestamp column was parsed into datetime format.
• New time-based features (Hour, Minute, Second) were extracted to capture
temporal patterns in the data.
• The original timestamp column was then dropped.
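A minimal sketch of the timestamp step above, assuming the Timestamp column holds strings in day/month/year hour:minute:second form. The format string, the helper name add_time_features, and the two sample rows are assumptions and should be checked against the actual CSV files.

```python
import pandas as pd


def add_time_features(df, column="Timestamp", fmt="%d/%m/%Y %H:%M:%S"):
    """Parse the timestamp column, add Hour/Minute/Second, and drop the original column."""
    out = df.copy()
    ts = pd.to_datetime(out[column], format=fmt, errors="coerce")  # unparseable rows become NaT
    out["Hour"] = ts.dt.hour
    out["Minute"] = ts.dt.minute
    out["Second"] = ts.dt.second
    return out.drop(columns=[column])


# Hypothetical rows for illustration.
demo = pd.DataFrame({
    "Timestamp": ["14/02/2018 08:31:01", "14/02/2018 23:05:59"],
    "Flow Duration": [112641, 4523],
})
features = add_time_features(demo)
print(features)
```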
c. Label Encoding
• Categorical variables such as Protocol and Label were encoded using
LabelEncoder to convert them into numerical format suitable for model training.
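The encoding step above can be sketched with scikit-learn's LabelEncoder as follows. The label strings are illustrative stand-ins rather than the exact class names in the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative label values (assumed).
labels = ["Benign", "DDoS", "Benign", "Brute Force"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)        # classes are numbered in sorted order
decoded = encoder.inverse_transform(encoded)   # map integers back to the original strings

print(list(encoder.classes_))
print(encoded.tolist())
```

Keeping the fitted encoder is what allows numeric predictions to be mapped back to readable class names later. The scikit-learn documentation describes LabelEncoder as intended for target labels; for input columns such as Protocol, OrdinalEncoder or OneHotEncoder is the documented alternative.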
d. Dealing with Infinite and NaN Values
• Infinite values were replaced with NaNs and then filled using the median of the
respective features.
• This prevents skewed training caused by outliers or missing data and ensures
compatibility with scaling techniques.
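The replacement described above can be sketched in two pandas calls. The helper name replace_inf_with_median and the sample values are assumptions for illustration; infinite rates presumably arise when a per-second value is computed over a zero-length flow.

```python
import numpy as np
import pandas as pd


def replace_inf_with_median(df: pd.DataFrame) -> pd.DataFrame:
    """Turn +/-inf into NaN, then fill every NaN with the median of its column."""
    out = df.replace([np.inf, -np.inf], np.nan)
    return out.fillna(out.median(numeric_only=True))  # medians ignore NaN by default


# Hypothetical rate columns containing infinite and missing entries.
demo = pd.DataFrame({
    "Flow Byts/s": [10.0, np.inf, 30.0, np.nan],
    "Flow Pkts/s": [1.0, 2.0, -np.inf, 3.0],
})
clean = replace_inf_with_median(demo)
print(clean)
```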
3. Feature Scaling
• Feature values were standardized using StandardScaler so that all features are on a
comparable scale during model training.
• This step mainly benefits scale-sensitive models such as Logistic Regression; tree-based
models such as Random Forest and XGBoost are largely insensitive to feature scaling.
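A small sketch of standardization with StandardScaler, using made-up numbers. It fits the scaler on the training rows only and reuses it for the test row; that ordering is a common precaution against information leaking from the test set and is an assumption here, since the report does not state at which point the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
X_train = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test = np.array([[2.0, 4000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature mean and std from training data
X_test_scaled = scaler.transform(X_test)        # applies the same mean and std to unseen data

print(X_train_scaled)
print(X_test_scaled)
```

Each value becomes (x - mean) / std, so after scaling both training columns have mean 0 and unit variance.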
4. Feature Selection
From the available features, a subset of important and relevant attributes was selected based
on domain knowledge and experimentation:
• Flow Duration
• Protocol
• Tot Fwd Pkts (Total Forward Packets)
• Tot Bwd Pkts (Total Backward Packets)
• Fwd Pkt Len Max (Max Length of Forward Packets)
• Bwd Pkt Len Max (Max Length of Backward Packets)
• Flow Byts/s (Bytes per Second)
• Flow Pkts/s (Packets per Second)
• Hour, Minute, Second (from timestamp)
These features capture critical information about the traffic behavior and help the models
distinguish between normal and attack traffic effectively.
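Selecting the listed columns could be done as in the sketch below. The column names follow the list above; the helper select_features, the extra columns Dst Port and Label, and the one-row demo frame are hypothetical.

```python
import pandas as pd

SELECTED_FEATURES = [
    "Flow Duration", "Protocol", "Tot Fwd Pkts", "Tot Bwd Pkts",
    "Fwd Pkt Len Max", "Bwd Pkt Len Max", "Flow Byts/s", "Flow Pkts/s",
    "Hour", "Minute", "Second",
]


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the selected columns, failing loudly if any are absent."""
    missing = [c for c in SELECTED_FEATURES if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[SELECTED_FEATURES]


# One-row hypothetical frame with two extra columns that should be dropped.
demo = pd.DataFrame([{name: 0 for name in SELECTED_FEATURES + ["Dst Port", "Label"]}])
X = select_features(demo)
print(X.columns.tolist())
```

Checking for missing columns up front makes a renamed or absent field fail at load time rather than deep inside model training.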
5. Dataset Splitting
The preprocessed dataset was split into training and testing sets using an 80:20 ratio via
train_test_split, with stratification to preserve label distribution. This ensures that both sets
have proportional representation of each class, which is essential for reliable evaluation in
multiclass classification problems.
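The stratified 80:20 split described above can be illustrated on a toy imbalanced label array. The data and the random_state value are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Both splits keep the original 4:1 class ratio.
print(np.bincount(y_train), np.bincount(y_test))
```

Without stratify=y, a rare attack class could end up with very few samples, or none, in the test set.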
6. Model Building and Training
Three supervised machine learning models were selected for building the intrusion detection
system:
1. Random Forest Classifier
• An ensemble method that builds multiple decision trees and combines their outputs
for better accuracy and robustness.
• It handles large datasets and reduces overfitting through random feature selection
and averaging.


Fig 3.2.3 Random Forest Classifier


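2. XGBoost Classifier
• A gradient boosting framework that builds models in a stage-wise fashion.
• Highly efficient and accurate for classification tasks with structured data.
• Configured with eval_metric="logloss" and use_label_encoder=False. Note that "logloss"
is XGBoost's binary log loss; the multiclass equivalent is "mlogloss", and
use_label_encoder is deprecated in recent XGBoost releases.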


Fig 3.2.4 XGBoost Classifier

3. Logistic Regression
• A baseline linear model suitable for multiclass classification with good
interpretability.
• Useful for evaluating how simpler models compare to complex ensemble methods.
Each model was trained on the standardized training data and evaluated using the test set.
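The training and comparison loop described above might be sketched as follows. Synthetic data from make_classification stands in for the preprocessed flow features, the hyperparameters are illustrative defaults, and XGBoost is included only if the xgboost package happens to be installed; none of these choices are taken from the project's code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic three-class stand-in for the 11 selected features (not CICIDS2018 data).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
try:  # optional dependency; skipped when xgboost is not available
    from xgboost import XGBClassifier
    candidates["XGBoost"] = XGBClassifier(eval_metric="mlogloss", random_state=42)
except ImportError:
    pass

# Train every candidate and record its weighted F1-score on the held-out test set.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train_s, y_train)
    scores[name] = f1_score(y_test, clf.predict(X_test_s), average="weighted")

best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Using one shared split and one metric for all candidates keeps the comparison fair; the scores printed here describe the synthetic data only.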


Fig 3.2.5 Logistic Regression

7. Model Evaluation
To evaluate the performance of each model, the following metrics were used:
• Accuracy: Overall correctness of the model.
• Precision: Proportion of predicted positives that are actually positive.
• Recall: Proportion of actual positives correctly identified.
• F1-Score: Harmonic mean of precision and recall, providing a balanced metric.
A weighted F1-score was used to handle class imbalance and evaluate multiclass
classification performance fairly.
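To make the metric definitions above concrete, the sketch below computes per-class precision, recall, and F1 by hand and then the support-weighted average. The label lists are invented for illustration and the function names are hypothetical.

```python
from collections import Counter

# Invented labels for three classes (0, 1, 2).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]


def per_class_scores(y_true, y_pred):
    """Compute precision, recall, and F1 for each class from raw label lists."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 values, weighting each class by its number of true samples."""
    support = Counter(y_true)
    scores = per_class_scores(y_true, y_pred)
    return sum(scores[c]["f1"] * support[c] for c in scores) / len(y_true)


print(per_class_scores(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means frequent classes dominate the average, so per-class scores are still worth inspecting when rare attack types matter.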
The model with the highest F1-score was selected as the best model and saved using
Python’s pickle module for later use in real-time prediction.
8. Intrusion Detection and Prediction
A prediction function was implemented to classify new network session data using the
trained model:
• New input data is preprocessed (scaled using the same StandardScaler).
• The best saved model is loaded using pickle.
• The function outputs either “Normal” or “Attack” depending on the prediction
result.


This simulates the real-time usage of the system where new traffic data can be analyzed and
flagged automatically.
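The save, load, and predict cycle described above can be sketched as follows. The toy two-feature data, the file name nids_bundle.pkl, the function name predict_session, and the convention that label 0 means normal traffic are all assumptions made for this example. Storing the scaler together with the model is one way to guarantee that new sessions pass through the same preprocessing.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data: label 0 = normal, label 1 = attack (assumed encoding).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),  # normal sessions
    rng.normal(loc=6.0, scale=1.0, size=(50, 2)),  # attack sessions
])
y = np.array([0] * 50 + [1] * 50)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

# Persist the scaler and the model together so they cannot drift apart.
model_path = os.path.join(tempfile.mkdtemp(), "nids_bundle.pkl")
with open(model_path, "wb") as fh:
    pickle.dump({"scaler": scaler, "model": model}, fh)


def predict_session(features, path=model_path):
    """Load the saved bundle, scale one session's features, and return a readable label."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    row = np.asarray(features, dtype=float).reshape(1, -1)
    label = bundle["model"].predict(bundle["scaler"].transform(row))[0]
    return "Normal" if label == 0 else "Attack"


print(predict_session([0.1, -0.2]))
print(predict_session([6.2, 5.9]))
```

Pickle files can execute arbitrary code when loaded, so only bundles from a trusted source should be opened this way. In a long-running monitor the bundle would normally be loaded once rather than on every call.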
3.3 Introduction to UML Diagrams
Unified Modelling Language (UML) is a standardized visual language used to model and
design the structure and behaviour of software systems. It provides a set of diagrams and
symbols that help developers, analysts, and stakeholders understand, plan, and communicate
the components and flow of a system. UML diagrams serve as blueprints, representing both
the static aspects (such as classes and objects) and dynamic behaviours (such as interactions
and workflows) of a system. Common UML diagrams include Class Diagrams, Use Case
Diagrams, Sequence Diagrams, Activity Diagrams, and State Diagrams, each serving a
specific purpose in software development. By using UML, teams can visually map out the
system’s architecture, identify relationships between components, and ensure clear
communication throughout the development lifecycle.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML. The Unified Modelling Language is a standard language for
specifying, visualizing, constructing, and documenting the artifacts of a software system, as
well as for business modelling and other non-software systems.
3.3.1 Use Case Diagram
Use Case Diagrams are a type of Unified Modelling Language (UML) diagram that visually
represent the functional requirements of a system from the user's perspective. They illustrate
the interactions between users (called actors) and the system itself through various use
cases, which describe specific functionalities or services the system provides. Use case
diagrams help developers and stakeholders understand what the system is supposed to do,
who will use it, and how different users interact with different parts of the system. Typically,
the diagram includes actors (users or external systems), use cases (oval shapes describing
actions), and associations (lines connecting actors to their relevant use cases). These
diagrams are particularly useful during the early stages of software development to gather
and communicate requirements clearly and effectively.


The UML use case diagram visually represents how a user interacts with the core functions
of the Network Intrusion Detection System (NIDS). In the diagram, the Apps User is the
main actor who initiates interactions with the system. The system, shown as a rectangular
boundary, contains four main functions: Monitoring network, Port scan attack detector,
Analyze the network, and Receive notification. These use cases describe the system's role
in securing a network. The Monitoring network use case enables real-time tracking of all
incoming and outgoing data packets, ensuring every flow of traffic is observed. The Port
scan attack detector is responsible for detecting suspicious scanning activities where
attackers probe multiple ports to find vulnerabilities. This early detection is vital to prevent
further intrusion. The Analyze the network function processes and inspects network traffic
using various machine learning techniques to determine if the traffic is normal or malicious.
This analysis uses predefined models trained on datasets like CICIDS2018 to ensure
accurate detection. Once a threat is detected, the Receive notification use case sends an alert
to the user, allowing quick and informed responses. Each of these system functions is
connected to the user, showing that the user either initiates or receives information from
these activities. The modular nature of this diagram shows how tasks are separated for
clarity and functionality. This helps in development, where each function can be coded and
tested independently. The UML also serves as a blueprint that explains how different
components of the system work together to form a secure and intelligent intrusion detection
system. It bridges the gap between conceptual design and actual implementation. By
visualizing how the system works, it becomes easier to explain, present, and expand upon.
This UML diagram is an essential part of project documentation and effectively
communicates the system's workflow and user interaction.

Fig 3.3.1 Use Case Diagram
3.3.2 Class Diagram
The class diagram represents a Signature-Based Network Intrusion Detection System
(NIDS) that operates by comparing network activity against known attack signatures. The
system is composed of several interrelated classes, each with its own role in the detection
process. At the heart of the system is the IDS class, which has attributes like id and methods
such as requestService(), detectIntrusion(), and issueAlert(). It acts as the main interface that
manages intrusion detection requests and communicates with other components.
The IDS class forwards requests to the EventProcessor, which contains the name attribute
and a method processDataForDetector(). This component plays a crucial role in
preprocessing and filtering network data before sending it to the core detection unit. The
processed data is then forwarded to the AttackDetector class, which includes methods like
detectIntrusion() and checkAttackInfo(). This is the component responsible for comparing
incoming data against known signatures to detect malicious behavior.
Supporting the AttackDetector is the SignatureInformation class, which manages the
signatures. It contains functions like addSignature() and removeSignature(), allowing the
system to dynamically update its knowledge base. This class is associated with multiple
Signature objects, each having an id and a signature string. These signatures represent
known patterns of malicious traffic.
Once the AttackDetector processes the data and identifies a threat, it interacts with the
Response class, which includes the method createResponse(). This class handles how the
system responds to detected intrusions, whether by generating alerts, logging activity, or
taking preventive measures. The sendResult and sendResponse associations between
classes ensure that alerts or outputs are delivered properly back to the IDS and ultimately to
the user or administrator.
Each class in the diagram is connected with meaningful associations, clearly showing the
communication and workflow in the system. For example, the use of composition between
SignatureInformation and AttackDetector signifies a strong dependency, indicating that
attack detection relies on accurate and up-to-date signature data. Overall, this class diagram
offers a clear, modular, and scalable architecture for building a signature-based intrusion
detection system. It separates concerns efficiently, ensuring maintainability and extensibility
for future improvements like adding more detection methods or response types.
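The class relationships described above can be sketched in Python. This is a minimal illustration, not the project's implementation: the class and method names follow the diagram, while the substring matching rule, the lower-casing step, and the sample signature are assumptions made only for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signature:
    id: int
    signature: str  # text pattern associated with a known attack


@dataclass
class SignatureInformation:
    signatures: dict = field(default_factory=dict)

    def addSignature(self, sig: Signature) -> None:
        self.signatures[sig.id] = sig

    def removeSignature(self, sig_id: int) -> None:
        self.signatures.pop(sig_id, None)

    def checkSignature(self, data: str):
        """Return the first stored signature found in the data, else None."""
        for sig in self.signatures.values():
            if sig.signature in data:  # assumed rule: plain substring match
                return sig
        return None


class Response:
    def createResponse(self, sig: Signature) -> str:
        return f"ALERT: matched signature {sig.id} ({sig.signature!r})"


class AttackDetector:
    def __init__(self, info: SignatureInformation, response: Response):
        self.info = info
        self.response = response

    def detectIntrusion(self, data: str):
        sig = self.info.checkSignature(data)
        return self.response.createResponse(sig) if sig else None


class EventProcessor:
    def __init__(self, name: str, detector: AttackDetector):
        self.name = name
        self.detector = detector

    def processDataForDetector(self, raw: str):
        # Placeholder preprocessing: trim and lower-case the payload.
        return self.detector.detectIntrusion(raw.strip().lower())


class IDS:
    def __init__(self, id: int, processor: EventProcessor):
        self.id = id
        self.processor = processor

    def requestService(self, raw: str):
        return self.processor.processDataForDetector(raw)


info = SignatureInformation()
info.addSignature(Signature(1, "union select"))  # hypothetical signature
ids = IDS(1, EventProcessor("ep-1", AttackDetector(info, Response())))
alert = ids.requestService("GET /item?id=1 UNION SELECT password FROM users ")
clean = ids.requestService("GET /index.html")
```

Here `alert` holds the response text for the matched signature and `clean` is `None`, because the second request contains no stored pattern. Keeping the signature store separate from the detector means signatures can be added or removed without touching the detection logic.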


Fig 3.3.2 Class Diagram


3.3.3 Sequence Diagram
The sequence diagram of the Network Intrusion Detection System (NIDS) presents how
different components in the system communicate with each other to detect and respond to
possible intrusions. It begins with the IDS (Intrusion Detection System) initiating the process
by sending a forwardRequest message to the EventProcessor. The EventProcessor receives the
request and processes the data, then forwards the processed data to the AttackDetector. This
flow continues as the AttackDetector performs several actions to analyze the data. First, it
calls the checkSignature method in the SignatureInformation module to verify whether any known
attack patterns match the incoming data. Once the SignatureInformation module checks the
signature, it sends back a response using the sendSignatureMatch method.
Next, the AttackDetector invokes detectIntrusion to confirm whether a real attack has been
found. If an intrusion is detected, it proceeds to send the result using the sendResult method.
After the result is generated, it is sent further using the sendResponse action, where the
Response module is called. The Response module uses the createResponse method to form
an appropriate reaction to the detected threat. This could include alerting administrators or
activating emergency measures.
The sequence diagram clearly shows how each class and module works together in a time-
based order to detect intrusions in the network. The vertical lines in the diagram represent
each component's lifeline, while the horizontal arrows indicate the messages being passed
from one to another. The numbering (like 1.1.1.2.2.1) shows the nested calls and their order
of execution. The IDS acts as the entry point, and everything follows a well-organized flow
from detecting to responding to a threat. This setup makes the system capable of identifying
and managing network intrusions efficiently. Each module does a specific task, and their
interaction ensures a complete check of possible attacks and a quick reaction. Overall, the
diagram helps in understanding the logical sequence and communication among different
parts of the intrusion detection system, ensuring that the system runs smoothly, detects
problems quickly, and responds accurately.

Fig 3.3.3 Sequence Diagram


3.3.4 Activity Diagram
The activity diagram of the Network Intrusion Detection System (NIDS) represents the
logical flow of operations performed to detect malicious activities within a network. The
process begins with two parallel components—Training and the Information Collection
Subsystem. The training module is responsible for analyzing normal network behavior over
time and creating a Normal Pattern Database. This database stores baseline behavior that
serves as a reference to identify any deviations in real-time network activity. On the other
side, the information collection subsystem continuously monitors incoming data and
network traffic, collecting important behavioral metrics.
Once both modules are operational, the system compares the collected data with the entries
in the normal pattern database. This comparison plays a crucial role in identifying anomalies.
A decision-making point, often represented by a threshold value, evaluates whether the
current behavior exceeds the acceptable limit defined during training. If the data does not
exceed the threshold, it is classified as Normal, and no further action is taken. However, if
the data does exceed the threshold, it is flagged as an Anomaly, indicating a potential threat
or suspicious activity in the network.
This entire activity flow reflects a typical Anomaly-Based Intrusion Detection System,
where machine learning or statistical analysis is used to establish a model of normal behavior
and detect deviations from it. The diagram simplifies the detection process into a decision-
driven mechanism, showing how automated systems can help administrators quickly
identify threats without manually analyzing traffic.
The use of thresholds in this diagram is a key security strategy. A well-defined threshold
helps reduce false positives (normal behavior incorrectly flagged as intrusion) and false
negatives (actual threats missed by the system). The feedback from the decision process can
also be looped back to the training module to update the normal behavior patterns, making
the system adaptive and intelligent over time.
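The threshold decision described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the Normal Pattern Database stores the mean and standard deviation of a single metric and that a reading is flagged when its z-score exceeds a fixed threshold. The sample readings and the threshold of 3.0 are hypothetical and are not taken from the project.

```python
from statistics import mean, stdev


def build_normal_pattern(training_values):
    """Summarise normal behaviour as (mean, standard deviation)."""
    return mean(training_values), stdev(training_values)


def classify(value, pattern, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold."""
    mu, sigma = pattern
    if sigma == 0:
        return "Normal" if value == mu else "Anomaly"
    z = abs(value - mu) / sigma
    return "Anomaly" if z > threshold else "Normal"


# Hypothetical packets-per-second readings observed during training.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
pattern = build_normal_pattern(baseline)

typical = classify(101, pattern)  # close to the baseline
burst = classify(250, pattern)    # far outside the baseline
```

Lowering the threshold catches more deviations at the cost of more false positives, while raising it does the opposite. This is the trade-off the paragraph above describes.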
Furthermore, the simplicity of the activity diagram allows system designers and developers
to visualize the workflow at a high level and make necessary improvements. It captures the
step-by-step logic and decisions made by the IDS engine without going into complex
technical implementation. The focus remains on the logical transition from data collection
to decision-making, which is essential for designing and validating an effective intrusion
detection model.
In conclusion, the activity diagram of NIDS provides a clear, structured representation of
how suspicious network behaviors are identified, categorized, and responded to. It plays a
foundational role in modeling the internal logic of an IDS system. Through continuous data
monitoring, comparison against normal behavior, and threshold-based decisions, the activity
diagram highlights how the system maintains network integrity and prevents unauthorized
access or attacks. This high-level understanding is essential for researchers, developers, and
security professionals working on IDS implementations.

Fig 3.3.4 Activity Diagram


3.3.5 Component Diagram
The component diagram of the Network Intrusion Detection System (NIDS) clearly
illustrates the functional structure and interaction of essential components involved in the
detection and reporting of suspicious network activities. This architecture focuses on how
external data sources, core detection modules, and human analysts work together to identify
and respond to threats. At the entry point of the system, raw network traffic is collected from
various internal and external sources. This traffic is first processed by the Traffic Analyser
component, which is responsible for deep packet inspection and detection of known
Indicators of Compromise (IoCs) using a reference Blacklist. This blacklist stores IP
addresses, domain names, or other data previously associated with malicious behavior.


In parallel, CTI Feeds (Cyber Threat Intelligence feeds) act as an external source of threat
intelligence, supplying up-to-date IoCs and contextual threat information. These
feeds enrich the analysis performed by the Traffic Analyser. The CTI data is integrated into
the detection mechanism to ensure the system can identify threats that go beyond static
signatures and include dynamic, emerging threats. The Traffic Analyser generates events
based on the findings from traffic inspection. These events are passed to the Event
Correlation component.
The Event Correlation component plays a critical role in analyzing multiple events together
to determine patterns, identify coordinated attacks, or deduce whether an incident is isolated
or widespread. It collects individual detection events and correlates them over time or across
multiple devices to produce high-confidence network incidents. These network incidents are
then forwarded to the CSIRT Analyst (Computer Security Incident Response Team analyst),
who manually reviews the generated reports.
The CSIRT Analyst provides human expertise by validating the automated findings,
conducting further investigation, and initiating appropriate response actions. This interaction
closes the feedback loop between the system's automated detection and human-driven
validation. The system is modular and maintains separation between input data, processing
units, data flows, and storage, as represented in the legend.
Each component operates independently yet collaboratively to ensure scalability and
flexibility. The Traffic Analyser continuously updates itself using fresh CTI data. The Event
Correlation module applies logic, thresholds, or machine learning algorithms to make
decisions more intelligent and context-aware. The Blacklist database evolves over time as
new threats are identified and stored for future use. The CSIRT Analyst serves as the final
check before any response or alert is escalated to the organization.
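The interplay between the Blacklist, the Traffic Analyser, and the Event Correlation component can be sketched as follows. This is an illustrative assumption rather than the project's code: the packet fields, the 60-second window, the minimum of three events, and the IP addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical blacklist; a CTI feed would keep adding entries to it.
blacklist = {"203.0.113.7", "198.51.100.23"}


def analyse_traffic(packets, blacklist):
    """Emit one event per packet whose source IP is blacklisted."""
    return [
        {"time": p["time"], "src": p["src"], "dst": p["dst"]}
        for p in packets
        if p["src"] in blacklist
    ]


def correlate_events(events, window=60, min_events=3):
    """Report sources that raise at least min_events within window seconds."""
    by_source = defaultdict(list)
    for event in events:
        by_source[event["src"]].append(event["time"])

    incidents = []
    for src, times in by_source.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most `window` seconds.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_events:
                incidents.append({"src": src, "events": end - start + 1})
                break
    return incidents


packets = [
    {"time": 0, "src": "203.0.113.7", "dst": "10.0.0.5"},
    {"time": 10, "src": "192.0.2.1", "dst": "10.0.0.5"},
    {"time": 20, "src": "203.0.113.7", "dst": "10.0.0.6"},
    {"time": 45, "src": "203.0.113.7", "dst": "10.0.0.7"},
    {"time": 300, "src": "198.51.100.23", "dst": "10.0.0.5"},
]
events = analyse_traffic(packets, blacklist)
incidents = correlate_events(events)
```

A single blacklisted packet produces an event but not an incident. Only the source that triggers three events inside the window is escalated, which is how correlation reduces the number of reports an analyst has to review.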
This component diagram effectively conveys a modern architecture for an intrusion
detection system that blends automation with expert oversight. The clear separation of
responsibilities between data ingestion, analysis, correlation, and human validation ensures
a well-structured and reliable security mechanism. Overall, this design demonstrates how
both static rules and dynamic intelligence can be combined to create a more robust and
proactive intrusion detection capability. The architecture reflects best practices in
cybersecurity and emphasizes collaboration between artificial intelligence and human
expertise.

Fig 3.3.5 Component Diagram


3.3.6 Deployment Diagram
The deployment diagram of the Network Intrusion Detection System (NIDS) illustrates the
flow and monitoring of data within a network infrastructure. It begins with the firewall,
which acts as the first line of defense by filtering incoming and outgoing traffic based on
predefined security rules. Once the data passes through the firewall, it reaches the router,
which forwards the data to its intended destinations, including internal networks or the
internet. Simultaneously, the data is directed to the Intrusion Detection System (IDS) for
monitoring. The IDS continuously observes the network traffic for any unusual or malicious
activity that could indicate a security threat or policy violation.
This IDS is connected to network data collectors, which gather packets and traffic
information from the network. These network data components play a vital role in ensuring
that real-time and historical data are available for analysis. All collected data is forwarded
for data analysis, where sophisticated algorithms, often based on machine learning models
such as Random Forest, XGBoost, and Logistic Regression, analyze the data to detect
anomalies. This analysis helps to determine whether a traffic pattern is legitimate or a
potential intrusion.


If a potential threat is identified, alerts are generated and sent to administrators for review
and action. These administrators are responsible for managing, responding to, and mitigating
any identified security incidents. The IDS is also connected to a central server, which stores
logs, alerts, and historical data necessary for post-attack investigations and system audits.
This server supports the IDS by providing a repository for storing large volumes of traffic
data, which is crucial for training detection models and refining rules over time.
The diagram emphasizes the collaborative role of each component in maintaining network
security. The firewall blocks known threats, the router manages traffic flow, and the IDS
inspects network behavior in-depth. The data analysis process transforms raw network data
into meaningful security insights. These insights are interpreted by administrators, who
maintain system integrity and respond to incidents. In this system, real-time data processing
ensures timely threat detection, while archived data supports trend analysis and forensic
investigations.
Deploying a robust NIDS in this manner enhances an organization's ability to detect and
respond to cyber threats effectively. It forms a comprehensive security framework where
prevention, detection, and response mechanisms work in tandem. The architecture is
designed for scalability, allowing the addition of more sensors or data sources as network
traffic grows. It is also adaptable to evolving cyber threats, as new detection models can be
trained and deployed within the analysis component. Thus, this deployment diagram
represents a layered and intelligent defense strategy for securing modern digital
infrastructures.


Fig 3.3.6 Deployment Diagram


CHAPTER - 4
SYSTEM IMPLEMENTATION


4.1 Front End Implementation


The frontend of the Network Intrusion Detection System (NIDS) is designed to be clean,
responsive, and user-friendly. It is developed using HTML, CSS, and JavaScript, with a CSS
framework for a modern UI design (the listing in Section 4.3.1 loads Bootstrap from a CDN).
The interface allows users to upload network traffic data or enter input for intrusion
analysis. A “Detect Intrusion” button is provided to send the input data to the Flask backend
through RESTful API calls. Flask handles the data, processes it using machine learning
models, and returns results asynchronously. The frontend displays the results, showing
whether the input is normal or malicious, along with attack type and confidence level.
Visuals like tables or badges are used for better clarity. Loading indicators and status
messages give real-time feedback during the process. Users can download reports or view
previous detection history. The interface is fully responsive across devices. Input
validation ensures the system handles only clean and expected data. The CSS framework gives
a consistent and clean design throughout the app. Users may also choose the detection model
(e.g., XGBoost, Random Forest). The communication with Flask is smooth and efficient.
Overall, the frontend ensures easy access and smooth operation of the NIDS.
4.2 Back End Implementation
The backend of the Network Intrusion Detection System (NIDS) is developed using the
Flask framework, which acts as a lightweight and efficient server to handle requests from
the frontend. It is responsible for receiving network traffic data, either as files or text, sent
via RESTful API calls. Once the data is received, it is preprocessed: cleaned, normalized,
and converted into a suitable format for machine learning. The backend then loads trained
models like XGBoost, Random Forest, or Logistic Regression to analyze the input data and
detect any intrusions. The models predict whether the traffic is normal or an attack, and if
malicious, identify the type of attack. The results, including prediction labels and confidence
scores, are sent back to the frontend in JSON format. Flask ensures smooth communication
between components and handles errors or invalid data gracefully. It also logs each request
for future reference and analysis. The backend supports multiple model options and can be
extended for real-time detection. Proper security practices are followed to avoid code
injection or data corruption. It ensures fast and accurate predictions with minimal response
time. The backend is modular and easy to scale or update. Overall, the Flask-based backend
serves as the core intelligence of the NIDS system, enabling efficient and reliable threat
detection.
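The JSON result described above, with a prediction label and a confidence score, could be assembled as in the following sketch. The helper name `build_response`, the field names, and the label order are assumptions made for illustration. The probabilities stand in for one row of a classifier's `predict_proba` output.

```python
import json

# Label order is an assumption; it must match the encoding used in training.
LABELS = ["Benign", "FTP-BruteForce", "SSH-Bruteforce"]


def build_response(probabilities, labels=LABELS):
    """Turn one row of class probabilities into a JSON-serialisable result."""
    if len(probabilities) != len(labels):
        raise ValueError("one probability per label is required")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return {
        "label": labels[best],
        "is_attack": labels[best] != "Benign",
        "confidence": round(float(probabilities[best]), 4),
    }


# Example probabilities, standing in for a model's output.
payload = build_response([0.02, 0.95, 0.03])
body = json.dumps(payload)
```

Reporting the winning class together with its probability lets the frontend show both the attack type and how certain the model is, instead of a bare label.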


4.3 Source Code


4.3.1 Front End Code
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Feature Input Form</title>
  <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
  <div class="container mt-5">
    <h2 class="text-center">Enter Feature Values</h2>
    <form action="/predict" method="POST">
      <div class="row">
        <!-- One input per selected feature; each name must match the backend feature list -->
        <div class="col-md-6">
          <label for="flow_duration" class="form-label">Flow Duration:</label>
          <input type="number" class="form-control" id="flow_duration" name="Flow Duration" required>
        </div>
        <div class="col-md-6">
          <label for="protocol" class="form-label">Protocol:</label>
          <input type="number" class="form-control" id="protocol" name="Protocol" required>
        </div>
        <div class="col-md-6">
          <label for="tot_fwd_pkts" class="form-label">Total Forward Packets:</label>
          <input type="number" class="form-control" id="tot_fwd_pkts" name="Tot Fwd Pkts" required>
        </div>
        <div class="col-md-6">
          <label for="tot_bwd_pkts" class="form-label">Total Backward Packets:</label>
          <input type="number" class="form-control" id="tot_bwd_pkts" name="Tot Bwd Pkts" required>
        </div>
        <div class="col-md-6">
          <label for="fwd_pkt_len_max" class="form-label">Forward Packet Length Max:</label>
          <input type="number" class="form-control" id="fwd_pkt_len_max" name="Fwd Pkt Len Max" required>
        </div>
        <div class="col-md-6">
          <label for="bwd_pkt_len_max" class="form-label">Backward Packet Length Max:</label>
          <input type="number" class="form-control" id="bwd_pkt_len_max" name="Bwd Pkt Len Max" required>
        </div>
        <div class="col-md-6">
          <label for="flow_byts_s" class="form-label">Flow Bytes/s:</label>
          <input type="number" class="form-control" id="flow_byts_s" name="Flow Byts/s" step="any" required>
        </div>
        <div class="col-md-6">
          <label for="flow_pkts_s" class="form-label">Flow Packets/s:</label>
          <input type="number" class="form-control" id="flow_pkts_s" name="Flow Pkts/s" step="any" required>
        </div>
        <div class="col-md-4">
          <label for="hour" class="form-label">Hour:</label>
          <input type="number" class="form-control" id="hour" name="Hour" min="0" max="23" required>
        </div>
        <div class="col-md-4">
          <label for="minute" class="form-label">Minute:</label>
          <input type="number" class="form-control" id="minute" name="Minute" min="0" max="59" required>
        </div>
        <div class="col-md-4">
          <label for="second" class="form-label">Second:</label>
          <input type="number" class="form-control" id="second" name="Second" min="0" max="59" required>
        </div>
      </div>
      <div class="text-center mt-4">
        <button type="submit" class="btn btn-primary">Submit</button>
      </div>
    </form>
    <!-- Show the result only after a prediction has been made -->
    {% if prediction %}
    <h1 class="text-center">Predicted Result: {{ prediction }}</h1>
    {% endif %}
  </div>
  <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
</body>
</html>
4.3.2 Back End Code:
from flask import Flask, render_template, request
import numpy as np
import pickle

app = Flask(__name__)

# Load the trained model once at startup
with open("best_nids_model.pkl", "rb") as model_file:
    model = pickle.load(model_file)

# Define the selected features, in the order used for training; the names
# match the "name" attributes of the form inputs in index.html
selected_features = [
    "Flow Duration", "Protocol", "Tot Fwd Pkts", "Tot Bwd Pkts",
    "Fwd Pkt Len Max", "Bwd Pkt Len Max", "Flow Byts/s", "Flow Pkts/s",
    "Hour", "Minute", "Second"
]

# Class names indexed by the integer label the model predicts
lab = ['Benign', 'FTP-BruteForce', 'SSH-Bruteforce']


@app.route("/")
def home():
    return render_template("index.html")  # Ensure "index.html" is the HTML form


@app.route("/predict", methods=["POST"])
def predict():
    try:
        # Get form values as a list of floats
        values = [float(request.form[feature]) for feature in selected_features]
        # Convert to a NumPy array and reshape to a single row for model input
        input_array = np.array(values).reshape(1, -1)
        # Make prediction and convert the NumPy integer to a plain int
        prediction = int(model.predict(input_array)[0])
        # Return the prediction result
        return render_template("index.html", prediction=lab[prediction])
    except Exception as e:
        return f"Error: {str(e)}"


if __name__ == "__main__":
    app.run(debug=True)  # debug mode is intended for development only
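The /predict route converts every form field with float() and reports any failure as a single error string. A more explicit validation step could look like the sketch below. The helper `parse_features`, the assumption that every feature is non-negative, and the error messages are hypothetical. The ranges for Hour, Minute, and Second mirror the min/max attributes of the HTML form.

```python
import math

SELECTED_FEATURES = [
    "Flow Duration", "Protocol", "Tot Fwd Pkts", "Tot Bwd Pkts",
    "Fwd Pkt Len Max", "Bwd Pkt Len Max", "Flow Byts/s", "Flow Pkts/s",
    "Hour", "Minute", "Second",
]

# Ranges mirror the min/max attributes on the HTML form.
BOUNDS = {"Hour": (0, 23), "Minute": (0, 59), "Second": (0, 59)}


def parse_features(form):
    """Return (values, errors) for a mapping of submitted form fields."""
    values, errors = [], []
    for name in SELECTED_FEATURES:
        raw = form.get(name)
        if raw is None or str(raw).strip() == "":
            errors.append(f"{name}: missing")
            continue
        try:
            number = float(raw)
        except ValueError:
            errors.append(f"{name}: not a number")
            continue
        if not math.isfinite(number):
            errors.append(f"{name}: must be finite")
            continue
        # Assumption: features without explicit bounds are non-negative.
        low, high = BOUNDS.get(name, (0, math.inf))
        if not low <= number <= high:
            errors.append(f"{name}: out of range")
            continue
        values.append(number)
    return values, errors


good = {name: "1" for name in SELECTED_FEATURES}
bad = dict(good, Hour="27", Protocol="tcp")
ok_values, ok_errors = parse_features(good)
_, bad_errors = parse_features(bad)
```

Collecting all errors before returning lets the form report every invalid field at once, and the model is only called when the error list is empty.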


CHAPTER- 5
SYSTEM TESTING


5.1 Performance Analysis


5.1.1 Evaluation Overview
This section presents the outcomes of implementing the proposed intrusion detection system
using machine learning techniques. The models were trained and evaluated on the CICIDS2018
dataset, which contains diverse and realistic network traffic scenarios including both normal
and various types of attack data. The discussion evaluates the performance of three
algorithms (Random Forest, XGBoost, and Logistic Regression) on the basis of accuracy,
F1-score, and the ability to generalize across different attack types.
5.1.2 Performance Metrics Used
To assess the effectiveness of the models, the following evaluation metrics were employed:
• Accuracy: The ratio of correctly predicted observations to the total observations.
• F1-Score: The harmonic mean of precision and recall. The weighted variant averages the
per-class F1-scores in proportion to class size, which is especially important in
imbalanced datasets.
• Classification Report: Includes precision, recall, and F1-score for each class (e.g.,
Normal, DDoS, Brute Force, etc.).
These metrics help analyze not only how many instances are predicted correctly but also how
well the model balances between false positives and false negatives.
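The metrics above can be computed directly from the true and predicted labels. The following pure-Python sketch shows the arithmetic; the toy labels are illustrative only and are not the project's results.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)}."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1, tp + fn)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by class support."""
    scores = per_class_scores(y_true, y_pred)
    total = len(y_true)
    return sum(f1 * support for _, _, f1, support in scores.values()) / total


# Toy labels for illustration only; these are not the project's results.
y_true = ["Benign"] * 6 + ["Attack"] * 4
y_pred = ["Benign"] * 5 + ["Attack"] + ["Attack"] * 3 + ["Benign"]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
```

Because the weighted F1-score scales each class by its support, a model that performs poorly on a rare attack class can still score highly overall. This is why the per-class classification report is examined alongside it.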
Model Evaluation Results
The dataset was split into 80% for training and 20% for testing. Below are the summarized
results from the three models:
Model                  Accuracy (%)   F1-Score (Weighted)
Random Forest          ~99.10         ~0.991
XGBoost                ~98.75         ~0.987
Logistic Regression    ~94.30         ~0.943
Random Forest Classifier:
• The Random Forest classifier outperformed the other models with a weighted F1-score of
approximately 0.991.
• This indicates a high level of precision and recall for most classes.
• The ensemble nature of Random Forest, which combines many decision trees, allows it to
rank feature importance and reduces overfitting.
XGBoost Classifier:
• XGBoost, a gradient boosting framework, also performed well with a weighted F1-score of
approximately 0.987.
• While slightly behind Random Forest, XGBoost handled noisy data efficiently and showed
good generalization.
Logistic Regression:
• Logistic Regression had the lowest weighted F1-score, approximately 0.943.
• It performed decently but struggled with multiclass classification and complex
relationships in the dataset.
• Being a linear model, it failed to capture the non-linear dependencies between features
effectively.
Visual Analysis
The performance was also visualized using confusion matrices and classification reports,
helping us understand misclassification patterns. Most attack types were correctly identified,
with minor confusion between similar patterns like DDoS and PortScan, which often exhibit
close network behaviors.
Visualizations using Seaborn heatmaps of confusion matrices revealed:
• High true positive rates for both normal and attack classes in Random Forest and
XGBoost.
• Relatively higher false negatives for Logistic Regression, especially for minority
classes.
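The quantities read off a confusion-matrix heatmap can be computed as in the sketch below. The labels follow the class names used in the backend listing, the predictions are toy data rather than the project's results, and the helper `false_negative_rate` is a hypothetical name introduced for the example.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def false_negative_rate(matrix, labels, benign="Benign"):
    """Share of attack samples that were predicted as benign."""
    b = labels.index(benign)
    attacks = sum(sum(row) for i, row in enumerate(matrix) if i != b)
    missed = sum(row[b] for i, row in enumerate(matrix) if i != b)
    return missed / attacks if attacks else 0.0


labels = ["Benign", "FTP-BruteForce", "SSH-Bruteforce"]
# Toy predictions for illustration only; these are not the project's results.
y_true = ["Benign", "Benign", "FTP-BruteForce", "FTP-BruteForce",
          "SSH-Bruteforce", "SSH-Bruteforce", "SSH-Bruteforce", "Benign"]
y_pred = ["Benign", "Benign", "FTP-BruteForce", "SSH-Bruteforce",
          "SSH-Bruteforce", "Benign", "SSH-Bruteforce", "FTP-BruteForce"]

cm = confusion_matrix(y_true, y_pred, labels)
fnr = false_negative_rate(cm, labels)
```

Off-diagonal cells in an attack row show which classes are confused with each other. The benign column of those rows holds the missed attacks, which are the costliest errors for an intrusion detection system.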
Observations
• Random Forest consistently produced the most reliable and stable results, making it
the most suitable model for real-world deployment.
• XGBoost, while slightly less accurate, is faster in prediction time, which could be an
advantage in real-time systems.
• Logistic Regression is lightweight and interpretable but may not be sufficient for
large-scale, complex traffic environments like those captured in CICIDS2018.
• Feature selection and preprocessing (e.g., normalization, label encoding, filling
missing values) played a crucial role in improving model accuracy.
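The preprocessing steps named above (filling missing values, normalization, and label encoding) can be sketched in pure Python. The median fill, min-max scaling, and sorted label order are assumptions chosen for illustration, since the report does not state which variants were used. The sample column is hypothetical.

```python
import math
from statistics import median


def fill_missing(column):
    """Replace None, NaN and infinite entries with the column median."""
    valid = [v for v in column if v is not None and math.isfinite(v)]
    fallback = median(valid) if valid else 0.0
    return [v if v is not None and math.isfinite(v) else fallback
            for v in column]


def min_max_normalize(column):
    """Scale values linearly into the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical "Flow Byts/s" values containing a gap and an infinity.
flow_bytes = [10.0, None, 30.0, math.inf, 50.0]
cleaned = fill_missing(flow_bytes)
scaled = min_max_normalize(cleaned)
encoded, mapping = label_encode(["Benign", "SSH-Bruteforce", "Benign"])
```

In a real pipeline the median, minimum, and maximum would be computed on the training split only and then reused for the test split and for live input, so that no information leaks from the test data.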
Real-Time Testing
The system was further tested on a sample network session with manually fed input features
such as flow duration, packet sizes, and timestamp-based variables. The model successfully
predicted whether the session was “Normal” or “Attack”, demonstrating its applicability in
live environments.


The prediction function using the saved model (best_nids_model.pkl) worked accurately with
unseen data, confirming that the system generalizes well.
5.1.3 Discussion
The results confirm that machine learning can significantly improve network intrusion
detection over traditional rule-based methods. This system is capable of learning from vast and
evolving datasets, automatically identifying complex patterns and adapting to new types of
attacks.
The CICIDS2018 dataset ensured that models were trained on realistic traffic, which enhanced
their effectiveness and robustness. Furthermore, implementing timestamp features (Hour,
Minute, Second) added temporal awareness, which improved model context in detecting time-
specific attacks.
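The Hour, Minute, and Second features can be derived from a flow's timestamp as sketched below. The day/month/year layout is an assumption about how the timestamp column is stored, and the helper name `time_features` is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout (day/month/year hour:minute:second).
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def time_features(timestamp):
    """Split a timestamp string into Hour, Minute and Second features."""
    moment = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return {"Hour": moment.hour, "Minute": moment.minute,
            "Second": moment.second}


features = time_features("14/02/2018 08:31:01")
```

These three integers are what the input form collects. Note that a model trained on them may learn when attacks occurred in the capture period, which does not necessarily transfer to other networks.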

5.2 Front End Output

Fig: 5.2.1 Output of Front End


5.3 Final Output

Fig: 5.3.1 Anaconda Navigator


Fig: 5.3.2 Creating an environment variable to run the code

Fig: 5.3.3 Output


5.4 Output Screens

Fig: 5.4.1 Output Screen


CHAPTER – 6
CONCLUSION


6.1 CONCLUSION:
In today’s digital era, the exponential growth of internet usage, online transactions, and digital
services has significantly increased the vulnerability of networks to cyber threats. Traditional
intrusion detection methods, which rely on manually crafted rules or known attack signatures,
are increasingly becoming ineffective in detecting sophisticated and evolving cyberattacks.
Hence, the development of intelligent, adaptable, and robust network security systems is
essential.

This project focused on building a Robust Network Intrusion Detection System (NIDS) using
machine learning algorithms such as Random Forest, XGBoost, and Logistic Regression,
trained on the CICIDS2018 dataset. The dataset provided a realistic mix of normal and attack
traffic, which was crucial for developing models that perform well in real-world environments.
The selected features from the dataset, along with proper preprocessing, scaling, and encoding
techniques, contributed to the enhanced performance of the models.

The results demonstrated that Random Forest achieved the highest accuracy and F1-score
among the three models, showcasing its strength in handling multiclass classification problems
and its ability to generalize well. XGBoost also performed exceptionally well, with faster
training and prediction times, making it a strong candidate for real-time applications. Logistic
Regression, although simpler and faster, showed limitations in capturing complex data patterns,
especially in multiclass scenarios.

The system was further validated through real-time testing with unseen data, and it successfully
predicted whether the given network session was normal or an attack. This confirms the
practicality and efficiency of the proposed system in real-time network environments.

In conclusion, the implementation of a machine learning-based NIDS as presented in this
project is a step toward building more secure and intelligent cyber defense mechanisms. With
further enhancements, such as deep learning models, automated feature selection, and
continuous learning from live traffic, the system can be made even more effective in mitigating
modern cyber threats.


REFERENCES

1. Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., & Ahmad, F. (2021). Network
intrusion detection system: A systematic study of machine learning and deep learning
approaches. Transactions on Emerging Telecommunications Technologies, 32(1), e4150.
2. Javaid, A., Niyaz, Q., Sun, W., & Alam, M. (2016, May). A deep learning approach for
network intrusion detection system. In Proceedings of the 9th EAI International Conference on
Bio-inspired Information and Communications Technologies (formerly BIONETICS) (pp. 21-
26).
3. Sekar, R., Guang, Y., Verma, S., & Shanbhag, T. (1999, November). A high-performance
network intrusion detection system. In Proceedings of the 6th ACM Conference on Computer
and Communications Security (pp. 8-17).
4. Farnaaz, N., & Jabbar, M. A. (2016). Random forest modeling for network intrusion detection
system. Procedia Computer Science, 89, 213-217.
5. Ashiku, L., & Dagli, C. (2021). Network intrusion detection system using deep
learning. Procedia Computer Science, 185, 239-247.
6. Mukherjee, B., Heberlein, L. T., & Levitt, K. N. (1994). Network intrusion detection. IEEE
network, 8(3), 26-41.
7. Abdulganiyu, O. H., Ait Tchakoucht, T., & Saheed, Y. K. (2023). A systematic literature
review for network intrusion detection system (IDS). International journal of information
security, 22(5), 1125-1162.
8. Shanmugavadivu, R., & Nagarajan, N. (2011). Network intrusion detection system using
fuzzy logic. Indian Journal of Computer Science and Engineering (IJCSE), 2(1), 101-111.
9. Raghunath, B. R., & Mahadeo, S. N. (2008, July). Network intrusion detection system
(NIDS). In 2008 first international conference on emerging trends in engineering and
technology (pp. 1272-1277). IEEE.
10. Zhang, J., Zulkernine, M., & Haque, A. (2008). Random-forests-based network intrusion
detection systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications
and Reviews), 38(5), 649-659.
11. Sultana, N., Chilamkurti, N., Peng, W., & Alhadad, R. (2019). Survey on SDN based
network intrusion detection system using machine learning approaches. Peer-to-Peer
Networking and Applications, 12(2), 493-501.
12. Liao, H. J., Lin, C. H. R., Lin, Y. C., & Tung, K. Y. (2013). Intrusion detection system: A
comprehensive review. Journal of network and computer applications, 36(1), 16-24.


13. Moustafa, N., & Slay, J. (2015, November). UNSW-NB15: a comprehensive data set for
network intrusion detection systems (UNSW-NB15 network data set). In 2015 military
communications and information systems conference (MilCIS) (pp. 1-6). IEEE.
14. Aldarwbi, M. Y., Lashkari, A. H., & Ghorbani, A. A. (2022). The sound of intrusion: A
novel network intrusion detection system. Computers and Electrical Engineering, 104,
108455.
15. Koc, L., Mazzuchi, T. A., & Sarkani, S. (2012). A network intrusion detection system based
on a Hidden Naïve Bayes multiclass classifier. Expert Systems with Applications, 39(18),
13492-13500.
16. Casas, P., Mazel, J., & Owezarski, P. (2012). Unsupervised network intrusion detection
systems: Detecting the unknown without knowledge. Computer Communications, 35(7), 772-
783.
17. Sahu, S., & Mehtre, B. M. (2015, August). Network intrusion detection system using J48
Decision Tree. In 2015 International Conference on Advances in Computing, Communications
and Informatics (ICACCI) (pp. 2023-2026). IEEE.
18. Moustafa, N., & Slay, J. (2015, November). The significant features of the UNSW-NB15
and the KDD99 data sets for network intrusion detection systems. In 2015 4th international
workshop on building analysis datasets and gathering experience returns for security
(BADGERS) (pp. 25-31). IEEE.
19. Gurung, S., Ghose, M. K., & Subedi, A. (2019). Deep learning approach on network
intrusion detection system using NSL-KDD dataset. International Journal of Computer
Network and Information Security, 11(3), 8-14.
20. Kim, D. S., Nguyen, H. N., & Park, J. S. (2005, March). Genetic algorithm to improve
SVM based network intrusion detection system. In 19th International Conference on Advanced
Information Networking and Applications (AINA'05) Volume 1 (AINA papers) (Vol. 2, pp. 155-
158). IEEE.
21. Vigna, G., & Kemmerer, R. A. (1999). NetSTAT: A network-based intrusion detection
system. Journal of Computer Security, 7(1), 37-71.
22. Sarhan, M., Layeghy, S., & Portmann, M. (2022). Towards a standard feature set for
network intrusion detection system datasets. Mobile Networks and Applications, 1-14.
23. Bai, Y., & Kobayashi, H. (2003, March). Intrusion detection systems: technology and
development. In 17th International Conference on Advanced Information Networking and
Applications, 2003. AINA 2003. (pp. 710-715). IEEE.


24. He, K., Kim, D. D., & Asghar, M. R. (2023). Adversarial machine learning for network
intrusion detection systems: A comprehensive survey. IEEE Communications Surveys &
Tutorials, 25(1), 538-566.
25. Van, N. T., & Thinh, T. N. (2017, July). An anomaly-based network intrusion detection
system using deep learning. In 2017 international conference on system science and
engineering (ICSSE) (pp. 210-214). IEEE.
26. Mohammadpour, L., Ling, T. C., Liew, C. S., & Chong, C. Y. (2018). A convolutional neural
network for network intrusion detection system. Proceedings of the Asia-Pacific Advanced
Network, 46(0), 50-55.
27. Aydın, M. A., Zaim, A. H., & Ceylan, K. G. (2009). A hybrid intrusion detection system
design for computer network security. Computers & Electrical Engineering, 35(3), 517-526.
28. Hodo, E., Bellekens, X., Hamilton, A., Dubouilh, P. L., Iorkyase, E., Tachtatzis, C., &
Atkinson, R. (2016, May). Threat analysis of IoT networks using artificial neural network
intrusion detection system. In 2016 International Symposium on Networks, Computers and
Communications (ISNCC) (pp. 1-6). IEEE.
29. Alzahrani, A. O., & Alenazi, M. J. (2021). Designing a network intrusion detection system
based on machine learning for software defined networks. Future Internet, 13(5), 111.
30. Pillai, M. M., Eloff, J. H., & Venter, H. S. (2004, October). An approach to implement a
network intrusion detection system using genetic algorithms. In ACM International Conference
Proceeding Series (Vol. 75, pp. 221-221).
