Mini Project Report Format
by
We declare that this written submission represents our ideas in our own words and, where
others’ ideas or words have been included, we have adequately cited and referenced
the original sources. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated, or falsified any
idea/data/fact/source in our submission. We understand that any violation of the above
will be cause for disciplinary action by the institute and can also evoke penal action from
the sources which have thus not been properly cited or from whom proper permission has
not been taken when needed.
Date:
Project Report Approval for Bachelor of Engineering
Examiners:
Date:
Place:
This is to certify that the project entitled “Face Detection Attendance System” is
a bonafide work of “Piyush Shashikant Nandre (25), Harsh Jitendra Patil (28),
Karan Ramanand Kushwaha (23), Mohd. Faisal Mohd. Aamir Shaikh (38)”
submitted to the University of Mumbai in partial fulfillment of the requirement for the
award of the degree of “Bachelor of Engineering” in “Computer Engineering” has
been carried out under my supervision in the Department of Computer Engineering at
Theem College of Engineering, Boisar. The work is comprehensive, complete, and fit for
evaluation.
ACKNOWLEDGEMENT

First and foremost, we thank God Almighty for blessing us immensely and empowering us
at times of difficulty like a beacon of light. Without His divine intervention, we would
not have been able to accomplish this project.
We are also grateful to the Management of Theem College of Engineering for their
kind support. Moreover, we thank our beloved Principal, Dr. Riyazoddin Siddiqui,
and our Director, Dr. N. K. Rana, for their constant encouragement and valuable advice
throughout the course.
We are profoundly indebted to Prof. Mubashir Khan, Head of the Department of
Computer Engineering, and Prof. Shakeel Shaikh, Project Coordinator, for helping
us technically and giving us valuable advice and suggestions from time to time. They are
always our source of inspiration.
Also, we would like to take this opportunity to express our profound thanks to our guide
Prof. Akash Gojare, Assistant Professor, Computer Engineering, for his valuable
advice and wholehearted cooperation, without which this project would not have seen
the light of day.
We express our sincere gratitude to all Teaching/Non-Teaching staff members of
Computer Engineering department for their co-operation and support during this project.
ABSTRACT

Manual attendance marking systems are often time-consuming, prone to errors, and lack
security. This mini project addresses these limitations by developing a face detection
attendance system. The system utilizes computer vision techniques to detect faces in
real-time video streams and automatically mark attendance upon successful recognition.
This project focuses on implementing a core system with the following functionalities:
real-time face detection, user enrollment, and attendance recording with timestamps. We
evaluate the system’s performance in terms of accuracy and efficiency. This project lays
the foundation for a more comprehensive attendance management system with potential
future advancements like liveness detection, data security measures, and integration with
existing systems.
LIST OF FIGURES
5.1.0.1 System Architecture 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2.0.1 System Architecture 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.4.1.1 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.4.2.1 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4.3.1 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.4.1 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
ii
LIST OF TABLES

CONTENTS

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
1 INTRODUCTION 1
2 LITERATURE REVIEW 3
2.1 Automated Attendance System Using Face Recognition: . . . . . . . . . 3
2.2 An Attendance Marking System Based on Face Recognition: . . . . . . . 3
2.3 Face Recognition-based Attendance System using Machine Learning
Algorithms: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 Class Attendance System based on Face Recognition": . . . . . . . . . . 4
2.5 Prototype model for an Intelligent Attendance System based on facial
Identification: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 LIMITATIONS OF EXISTING SYSTEM OR RESEARCH GAP 7
4 PROBLEM STATEMENT AND OBJECTIVE 8
5 PROPOSED SYSTEM 10
5.1 System Architecture 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 System Architecture 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3.1 Random Forest Algorithm: . . . . . . . . . . . . . . . . . . . . . . 14
5.3.2 Support Vector Machine Algorithm: . . . . . . . . . . . . . . . . . 17
5.4 UML Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.4.1 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.4.2 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4.3 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.4 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6 Experimental Setup 25
6.1 Introduction: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2 Dataset Selection: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3 Data Preprocessing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.4 Model Training: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.5 Model Evaluation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.6 Web Application Development: . . . . . . . . . . . . . . . . . . . . . . . 26
6.7 Model Deployment: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.8 Integration with Frontend: . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.9 Evaluation and Analysis Display: . . . . . . . . . . . . . . . . . . . . . . 27
6.10 Details about Inputs to the System . . . . . . . . . . . . . . . . . . . . . 27
6.11 Evaluation Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.12 Software and Hardware Setup . . . . . . . . . . . . . . . . . . . . . . . . 29
6.12.1 Software and Packages Requirements: . . . . . . . . . . . . . . . . 29
6.12.2 Hardware Requirements: . . . . . . . . . . . . . . . . . . . . . . . 29
7 Implementation Plan of Next Semester 30
8 Conclusion 31
Chapter 1
INTRODUCTION
Ensuring accurate and efficient attendance tracking is crucial for various organizations,
including educational institutions, workplaces, and event venues. Traditional attendance
marking methods, such as paper-based sign-in sheets, calling names, or using ID
cards, present several challenges. These manual processes are often time-consuming
for administrators or event organizers, leading to delays and disruptions. Additionally,
they are susceptible to human error, resulting in missed entries, buddy punching (one
person marking attendance for another), or inaccurate roll calls. Furthermore, manual
systems lack security features, making them vulnerable to proxy attendance (someone
attending for another person).
In recent years, technological advancements have opened doors for innovative
solutions to address these concerns. Facial recognition technology, a powerful tool in
the realm of computer vision, offers a promising approach to attendance management.
This technology leverages algorithms to analyze facial features and identify individuals
based on their unique characteristics. By integrating facial recognition with attendance
marking processes, organizations can achieve a more efficient, accurate, and secure
system.
This mini project aims to develop a face detection attendance system as a prototype
for a more comprehensive attendance management solution. The core functionality of
this system lies in utilizing real-time face detection to automate attendance marking.
Unlike traditional methods, the proposed system will eliminate the need for manual
intervention.
The system will operate through the following key stages:
Face Detection: A webcam will continuously capture video streams. The system
will employ computer vision algorithms to detect faces within the captured frames.
These algorithms will be trained on a dataset of diverse facial images to ensure robust
performance across various lighting conditions, poses, and angles.
User Enrollment: Authorized users will be enrolled in the system by capturing their
facial data through the webcam. This data will be securely stored in a database, forming
a reference for future recognition.
Face Recognition: When a user approaches the system, the face detection module will
identify their presence. The captured facial information will then be compared against
the stored database of enrolled users.
Attendance Marking: Upon successful recognition of a registered user, the system
will automatically mark their attendance. This will include recording the timestamp
and any other relevant details, such as date, location, or course/meeting attended.
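To make these stages concrete, the following is a minimal Python sketch of the detection-and-marking loop. It assumes OpenCV's bundled Haar cascade for face detection; the recognize() helper and the attendance.csv log file are illustrative placeholders rather than the project's final design:

# Illustrative sketch only: detection via OpenCV's Haar cascade; the
# recognize() helper and attendance.csv path are assumptions, not the
# project's final implementation.
import csv
import datetime
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recognize(face_img):
    """Placeholder: compare face_img against the enrolled templates.
    Returns a user name on a match, otherwise None."""
    return None

def mark_attendance(name, path="attendance.csv"):
    # Append the user name with a timestamp to a simple CSV log.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([name, datetime.datetime.now().isoformat()])

cap = cv2.VideoCapture(0)               # webcam stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        name = recognize(gray[y:y + h, x:x + w])
        if name is not None:
            mark_attendance(name)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Attendance", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()

In the full system, recognize() would compare the detected face region against the enrolled templates stored in the database.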
Chapter 2
LITERATURE REVIEW
This research describes a proposed intrusion detection system (IDS) that uses machine
learning algorithms to detect four different types of attacks: denial of service (DoS), user
to root (U2R), remote to local (R2L), and probe. The IDS is implemented in Python and
uses the KDD dataset for training and testing. The paper compares the performance
of four machine learning algorithms: decision tree, random forest, logistic regression,
and k-nearest neighbors (KNN). The results show that the random forest algorithm has
the highest accuracy, followed by the decision tree algorithm. The logistic regression
and KNN algorithms have lower accuracy, but they are still able to detect a significant
number of attacks.
The paper also discusses the use of Snort, a popular open-source IDS, to perform
vulnerability assessment and protocol analysis. The authors argue that Snort can be
used to complement the proposed IDS by providing additional detection capabilities.
However, there are also some potential limitations to the proposed IDS. First, machine
learning algorithms can be computationally expensive to train and deploy. This may
make the proposed IDS impractical for use in some environments. Second, the proposed
IDS is trained on the KDD dataset, which is a well-known dataset, but it is also relatively
old. It is possible that the proposed IDS may not perform as well on real-world data.
Overall, the proposed IDS is a promising approach to intrusion detection. It has several
advantages over traditional rule-based IDSs, but it is important to be aware of the
potential limitations before deploying it in a production environment.
The paper provides a valuable contribution to the field of IDS research. It highlights
the challenges of detecting modern malware and discusses the limitations of existing
IDSs. The authors also provide several suggestions for future research, including
the development of new datasets and new IDS techniques that can overcome evasion
techniques.

This paper discusses the challenges of designing and building intrusion
detection systems (IDSs) that are capable of detecting modern malware. The authors
argue that existing IDSs are often ineffective against zero-day attacks and other
sophisticated attacks that use evasion techniques.
The authors also review several machine learning techniques that have been proposed
for IDS research. However, they argue that these techniques struggle to generate and
update information about new attacks and can yield high false-alarm rates or poor
accuracy. The authors conclude that there is a need for newer and
more comprehensive datasets that contain a wide spectrum of malware activities. They
also argue that IDS research should focus on developing systems that are capable of
overcoming evasion techniques.
This paper compares the performance of two network intrusion detection systems
(NIDS): Suricata and Snort. The authors evaluate the two systems based on accuracy,
performance, and scalability. The authors found that Suricata has higher accuracy than
Snort, meaning that it is better at detecting attacks. However, Suricata also has higher
system requirements than Snort, and it can reach its operational limit sooner under high
traffic loads. The authors also found that Suricata is more scalable than Snort. This
means that Suricata can handle more traffic without dropping packets. However, the
authors note that Snort can be scaled by running multiple instances of Snort on multiple
CPU cores. The paper provides a valuable comparison of two popular NIDS systems.
The authors’ findings are consistent with other research on the performance of Suricata
and Snort. One limitation of the paper is that it is based on a single dataset. It would be
interesting to see if the authors’ findings are consistent when evaluated on other datasets.
Another limitation of the paper is that it does not evaluate other NIDS systems, such as
Zeek (formerly known as Bro). It would be interesting to see how Suricata and Snort compare to other
NIDS systems in terms of accuracy, performance, and scalability.
This paper discusses the use of machine learning classifiers in intrusion detection systems
(IDSs). The authors review a number of research papers published from 2015 to 2020.
The authors find that ensemble and hybrid classifiers have better predictive accuracy
and detection rate than single classifiers. They also discuss several directions for future
research, including:
• Using hybrid and ensemble classifiers more often.
• Developing models that can perform efficiently on multiple datasets.
• Considering feature selection to improve the efficiency and detection rate of IDSs.
• Using more recently updated datasets to deal with the most recent malicious
intrusions and attacks.
The paper provides a valuable overview of the use of machine learning classifiers in IDSs.
The authors’ findings are consistent with other research on this topic. One limitation of
the paper is that it is based on a relatively small number of research papers. It would
be interesting to see if the authors’ findings are consistent when evaluated on a larger
set of research papers. Another limitation of the paper is that it does not discuss the
challenges of using machine learning classifiers in IDSs. For example, machine learning
classifiers can be susceptible to overfitting and adversarial attacks. Overall, the paper
provides a valuable contribution to the field of IDS research. The authors’ findings and
suggestions for future research can help researchers and practitioners to develop more
effective and robust IDSs.
A further reviewed paper examines anomaly detection in NFV (network function
virtualization) environments; its findings can likewise help researchers and
practitioners to develop more effective and robust anomaly detection systems for NFV
networks.
Chapter 3
LIMITATIONS OF EXISTING SYSTEM OR
RESEARCH GAP
These ten research papers collectively contribute to the field of intrusion detection
and anomaly detection, offering valuable insights while also highlighting some common
limitations and research gaps. Several of the papers introduce novel intrusion detection
systems and machine learning techniques, demonstrating promising results but often
failing to delve into the computational complexities and practical challenges of deploying
these systems in real-world scenarios. Moreover, many of the papers rely on older datasets
for training and testing, raising concerns about the adaptability of the proposed methods
to more recent and diverse datasets. In addition, while some papers present comparisons
of various intrusion detection systems and machine learning algorithms, they often
focus on specific choices, leaving research gaps in broader comparative analyses. The
research papers collectively emphasize the importance of addressing these limitations by
optimizing computational efficiency, developing practical solutions for countering modern
malware and evasion techniques, and exploring more recent and diverse datasets, as
well as conducting comprehensive comparative studies. Additionally, there is a need
for research that addresses multi-class scenarios, adversarial attacks, and the broader
applicability of intrusion detection models in various network environments, all while
considering scalability and adaptability. These gaps in the literature signify opportunities
for future research to further enhance the field of intrusion detection, bridging the divide
between theoretical advancements and practical implementation, and addressing the
evolving challenges posed by modern network threats.
Chapter 4
PROBLEM STATEMENT AND OBJECTIVE
4.1 Problem Statement

• Inaccuracy: Manual processes are prone to errors such as missed entries, buddy
punching (someone marking attendance for another person), or inaccurate roll calls.
4.2 Objectives
• Design a User Interface: Create a user-friendly interface to manage the system.
Allow for user enrollment by capturing and storing facial data securely.
Chapter 5
PROPOSED SYSTEM
5.1 System Architecture 1

1. Frontend: The frontend of the system is responsible for collecting data from
the network or system being monitored and displaying the results to users. The
frontend is implemented using Flask, a lightweight web framework for Python.
HTML, CSS, and JavaScript are used to create the user interface of the web
application.
2. Backend: The backend of the system is responsible for preprocessing the collected
data, training the intrusion detection model, deploying the model, and performing
intrusion detection. The backend is implemented using Python. Two machine
learning algorithms, SVM and Random Forest, are used to train the intrusion
detection model. The CICIDS 2018 and UNSW 2015 datasets are used to train
and evaluate the model.
3. Cloud Deployment: The intrusion detection model is deployed on the cloud
using AWS or other cloud platforms. This allows the model to be scaled to meet
the needs of the system and to be accessed from anywhere in the world.
1. Data collection: The frontend collects data from the network or system being
monitored. This data can be in the form of network traffic packets, system logs,
or other types of data.
2. Data preprocessing: The backend cleans and preprocesses the collected data so
that it is in a form suitable for the machine learning algorithms.

3. Model training: The backend trains the intrusion detection model using the
preprocessed data. Two machine learning algorithms, SVM and Random Forest,
are used in the system.
4. Model evaluation and analysis: The backend evaluates the performance of the
trained intrusion detection model on the two datasets and displays the results to
users in the frontend.
5. Model deployment: The trained model is deployed, for example on a cloud
platform, so that it can be used by the web application.

6. Intrusion detection: The web application uses the deployed intrusion detection
model to detect intrusions in the network or system being monitored.
The following are some of the benefits of the proposed system architecture:
1. Accuracy: The system uses two machine learning algorithms, SVM and Random
Forest, which are known for their high accuracy in intrusion detection.
2. Scalability: The system can be scaled to meet the needs of the system by
deploying the intrusion detection model on the cloud.
3. Accessibility: The web application can be accessed from anywhere in the world,
provided that there is an internet connection.
4. Flexibility: The system can be customized to meet the specific needs of the
organization.
5.2 System Architecture 2

This architecture is based on the idea of using a large and diverse dataset of labeled
network traffic to train a machine learning model to detect intrusions.
The system architecture works as follows:
1. Data collection: The system collects network traffic data from a variety of
sources, such as production networks, honeypots, and sandboxes. Production
networks are real-world networks that are used by organizations. Honeypots
are fake networks that are designed to attract attackers. Sandboxes are isolated
environments that are used to test suspicious code or software.
The data is collected in a variety of formats, such as packet captures, flow logs,
and system logs. Packet captures contain all of the packets that are transmitted
on a network interface. Flow logs contain aggregated information about the traffic
that flows through a network. System logs contain information about events that
occur on a system, such as login attempts and process executions.
2. Feature engineering: The collected data is cleaned and transformed into an
engineered dataset of features suitable for the machine learning models.

3. Model training: The machine learning models are trained on the engineered
dataset. Two machine learning algorithms, SVM and Random Forest, are used
in the system. SVM is a supervised learning algorithm that can be used for
both classification and regression tasks. Random Forest is an ensemble learning
algorithm that combines the predictions of multiple decision trees to produce a
more accurate prediction.
4. Model evaluation: The trained models are evaluated on a held-out test set to
assess their performance. This helps to identify any areas where the models need
improvement. The evaluation metrics that are used depend on the specific machine
learning algorithms that are used. For example, common evaluation metrics for
classification tasks include accuracy, precision, recall, and F1 score.
5. Model deployment: The trained models are deployed to production. This can
be done by deploying them to a cloud platform or to on-premises servers. The
specific deployment method that is used depends on the specific requirements of
the organization.
6. Intrusion detection: The deployed models are used to detect intrusions in real
time. The models analyze network traffic and generate alerts when they detect
malicious activity. The alerts can be sent to security analysts or to automated
systems that can take action to respond to the threat.
5.3 Model Training

The SVM and Random Forest algorithms will be implemented to train the IDS models.
SVM is a supervised learning algorithm that separates data points into different classes
using hyperplanes in high-dimensional space. Random Forest is an ensemble learning
method that constructs multiple decision trees and combines their predictions. The
training process involves splitting the datasets into training and testing sets. The models
will be trained on the training set and evaluated on the testing set to measure their
performance. Various metrics, such as accuracy, precision, recall, and F1-score, will be
used to assess the models’ effectiveness in detecting intrusions.
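As an illustration of this training-and-evaluation flow, the sketch below trains both classifiers with scikit-learn. Synthetic data stands in for the preprocessed CICIDS 2018/UNSW records, and the chosen hyperparameters are assumptions for demonstration only:

# Sketch of the split/train/evaluate flow described above; synthetic data
# stands in for the preprocessed intrusion detection datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(kernel="rbf", C=1.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)         # train on the training split
    y_pred = model.predict(X_test)      # evaluate on the held-out split
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, y_pred),
          "precision=%.3f" % precision_score(y_test, y_pred),
          "recall=%.3f" % recall_score(y_test, y_pred),
          "f1=%.3f" % f1_score(y_test, y_pred))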
5.3.1 Random Forest Algorithm:

Random Forest is a popular machine learning algorithm that is used for both classification
and regression tasks. It is an ensemble learning method that combines multiple decision
trees to make predictions. The algorithm gets its name from the fact that it creates a
"forest" of decision trees, where each tree is built using a random subset of the training
data. The key components of Random Forest are:
1. Random Sampling of Training Data: Each decision tree is built from a bootstrap
sample of the training data, i.e., a random subset drawn with replacement, so that
every tree sees a slightly different view of the data.

2. Random Feature Selection: At each split, only a random subset of the features
is considered as split candidates. This promotes diversity among the trees and
prevents any single feature from dominating the decision-making process.
3. Building Decision Trees: Each decision tree in the Random Forest is built using
a recursive process called recursive partitioning. The goal is to split the data at
each node based on a selected feature and its corresponding threshold value. The
splitting criterion, such as Gini impurity or information gain, is used to determine
the best feature and threshold for each split. The process continues until a stopping
criterion is met, such as reaching a maximum depth or minimum number of samples
in a leaf node.
4. Voting and Aggregation: Once all the decision trees are built, the Random
Forest algorithm combines their predictions to make the final prediction. For
classification tasks, the most common class predicted by the individual trees is
selected as the final prediction. For regression tasks, the average or median of
the predicted values is taken as the final prediction. This voting and aggregation
process helps reduce the variance and improve the overall accuracy of the model.
Algorithm:

1. Initialize the number of trees (T) and the maximum depth of each tree (d).

2. For each of the T trees:
(a) Draw a random subset (bootstrap sample) of the training data.
(b) Select a random subset of the features to consider at each split.
(c) Train a decision tree on the sampled data and features.
(d) Repeat steps (a)–(c) until all T trees are trained.

3. For each trained tree:
(a) Compute the prediction for the test data (ŷ) using the trained decision tree.
(b) Compute the error (e) between the predicted output (ŷ) and the actual output (y).

4. Combine the predictions of all trees to produce the final output (y_rand_forest).
Implementation in Scikit-learn:
For each decision tree, Scikit-learn calculates a node's importance using Gini importance,
assuming only two child nodes (binary tree):

n_j = w_j C_j − w_left(j) C_left(j) − w_right(j) C_right(j)    (5.1)

where,
• n_j = the importance of node j
• w_j = the weighted number of samples reaching node j
• C_j = the impurity value of node j
• left(j), right(j) = the child nodes resulting from the left and right split on node j

The importance of each feature on a decision tree is then calculated as:

fi_i = ( Σ_{j : node j splits on feature i} n_j ) / ( Σ_{k ∈ all nodes} n_k )    (5.2)

where,
• fi_i = the importance of feature i
• n_j = the importance of node j

These can then be normalized to a value between 0 and 1 by dividing by the sum of all
feature importance values:

normfi_i = fi_i / ( Σ_{j ∈ all features} fi_j )    (5.3)

The final feature importance, at the Random Forest level, is its average over all the
trees. The sum of the feature's importance values over all trees is calculated and divided
by the total number of trees T:

RFfi_i = ( Σ_{j ∈ all trees} normfi_ij ) / T    (5.4)

where,
• RFfi_i = the importance of feature i calculated from all trees in the Random Forest
model
• normfi_ij = the normalized importance of feature i in tree j
• T = the total number of trees
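Scikit-learn exposes this tree-averaged, normalized quantity directly through the fitted estimator's feature_importances_ attribute. A brief sketch, again using synthetic data as a stand-in:

# Sketch: inspecting the Gini-based feature importances that scikit-learn
# computes as described above (synthetic data used as a stand-in).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is already normalized to sum to 1 across features.
for i, importance in enumerate(forest.feature_importances_):
    print(f"feature {i}: {importance:.3f}")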
5.3.2 Support Vector Machine Algorithm:

Support Vector Machines (SVM) is a powerful machine learning algorithm that is widely
used for classification and regression tasks. It is based on the principles of statistical
learning theory and aims to find an optimal hyperplane that separates different classes or
predicts continuous values. In this explanation, we will delve into the mathematical terms
and concepts behind SVM. Let’s start by considering a binary classification problem
where we have a set of labeled training data points. Each data point is represented by a
feature vector x and belongs to one of two classes, either positive (+1) or negative (-1).
The goal of SVM is to find a hyperplane in the feature space that maximally separates
the two classes.
Mathematically, we can represent the hyperplane as a linear equation:
w · x + b = 0    (5.5)
where w is the normal vector to the hyperplane and b is the bias term. The sign of
w · x + b determines on which side of the hyperplane a data point lies. If w · x + b > 0,
then the data point belongs to the positive class, otherwise it belongs to the negative
class.
To find the optimal hyperplane, SVM aims to maximize the margin between the
hyperplane and the closest data points from each class. These data points are known as
support vectors, hence the name "Support Vector Machines". The margin is defined as
the perpendicular distance between the hyperplane and these support vectors.
Let’s denote the set of support vectors as S. For any given data point xi in S, we have:
w · xi + b = 1 if yi = +1 (5.6)
w · xi + b = −1 if yi = −1 (5.7)
where yi represents the class label of xi. The margin can be calculated as:
margin = (w / ∥w∥) · (x_i^+ − x_i^−) = 2 / ∥w∥    (5.8)

where x_i^+ and x_i^− are two support vectors from the positive and negative classes,
respectively. The objective of SVM is to maximize this margin, which can be formulated
as an optimization problem.
To handle cases where the data is not linearly separable, SVM introduces the concept
of slack variables. These variables allow for a certain amount of misclassification or
overlapping between the classes. Let’s denote the slack variables as ξ_i, where ξ_i ≥ 0 for
all data points. The optimization problem can then be formulated as:
minimize:  (1/2) ∥w∥² + C Σ_i ξ_i
subject to:  y_i (w · x_i + b) ≥ 1 − ξ_i,   ξ_i ≥ 0    (5.9)
where C is a hyperparameter that controls the trade-off between maximizing the
margin and minimizing the misclassification errors. A larger value of C allows for fewer
misclassifications but may result in a smaller margin, while a smaller value of C allows
for a larger margin but may lead to more misclassifications.
To solve this optimization problem, we can use techniques from convex optimization,
such as quadratic programming or Lagrange duality. The solution will provide us with
the optimal values of w and b, which define the hyperplane that separates the classes.
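In practice this optimization is solved by an off-the-shelf library rather than by hand. The sketch below shows how the regularization parameter C of Eq. (5.9) is exposed by scikit-learn's SVC; the data and the chosen values of C are illustrative assumptions:

# Sketch: the soft-margin trade-off of Eq. (5.9) is controlled by the C
# argument of scikit-learn's SVC; larger C penalizes misclassification more.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for C in (0.1, 1.0, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C}: test accuracy={clf.score(X_test, y_test):.3f}")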
5.4 UML Diagrams
5.4.1 Use Case Diagram

Users can view alerts generated by the intrusion detection system, view the system
status, and download reports.
Actors: User
Use Cases:
1. Configure System
2. Train Model
3. Deploy Model
4. Monitor System
Users would be able to log into the web application to view the following:
1. A list of alerts generated by the intrusion detection system, including the time of
the alert, the type of alert, and the source IP address of the malicious activity.
2. The status of the intrusion detection system, such as whether it is running, stopped,
or in maintenance mode.
3. Reports on the performance of the intrusion detection system, such as the number
of alerts generated, the types of attacks detected, and the accuracy of the system.
This information would allow users to stay informed about the security of their
network and to take action to respond to threats.
5.4.2 Activity Diagram

This activity diagram shows the high-level steps involved in developing and deploying
an intrusion detection system using machine learning.
Actors:
Activities:
1. Collect and prepare data: Collect network traffic data and prepare it for
training and testing the machine learning models.
2. Train machine learning models: Train two machine learning models, SVM and
random forest, on the prepared data.
3. Evaluate machine learning models: Evaluate the trained models on held-out
test data and compare their performance.

4. Deploy machine learning models: Deploy the two machine learning models to
a cloud platform.
5. Develop web application: Develop a web application using Flask, HTML, CSS,
and JavaScript to allow users to view the results of the intrusion detection system
and manage the system.
6. Deploy web application: Deploy the web application to AWS or another cloud
platform.
5.4.3 Sequence Diagram
The sequence diagram provides a high-level overview of the steps involved in the intrusion
detection system using machine learning. It shows the interactions between the different
components of the system, including the user, the web application, and the intrusion
detection model.
The sequence diagram shows the following steps involved in the intrusion detection
system using machine learning:
1. The user submits network traffic data (a request) through the web application.

2. The web application forwards the request to the intrusion detection model.
3. The intrusion detection model analyzes the request and generates a prediction.
4. The intrusion detection model returns the prediction to the web application.
5. The web application displays the prediction to the user.

6. The web application collects data about the performance of the intrusion detection
models.
7. The web application analyzes the data to identify areas where the models can be
improved.
5.4.4 Class Diagram
Class Diagram for Intrusion Detection System Using Machine Learning with 3 classes:
• Building_trees(): This method of the Random_Forest class builds a set of
decision trees from the random vectors.
3. SVM: This class represents a support vector machine (SVM) machine learning
model. It has the following attributes and methods:
1. The Detects_Intrusion class uses the Random_Forest and SVM classes to detect
intrusions in network traffic.
2. The Random_Forest and SVM classes use the Protocol_type, Service, and
srv_error_rate attributes to build their machine learning models.
Chapter 6
EXPERIMENTAL SETUP
6.1 Introduction:
The purpose of this experimental setup is to develop an intrusion detection system (IDS)
using machine learning algorithms, specifically Support Vector Machines (SVM) and
Random Forest. The IDS will be trained and evaluated on two datasets: CICIDS 2018
and UNSW 2015. Additionally, a web application will be built using Flask, HTML,
CSS, and JavaScript to provide a user-friendly interface for interacting with the IDS.
The model will be deployed on a cloud platform, such as AWS, to ensure scalability and
availability.
6.2 Dataset Selection:

The first step in the experimental setup is to select appropriate datasets for training
and evaluation. The CICIDS 2018 dataset is a widely used benchmark dataset for
network intrusion detection research. It contains a large number of network traffic records
with various types of attacks and normal traffic. The UNSW 2015 dataset is another
popular dataset that includes both normal and attack traffic captured in a controlled
environment.
6.3 Data Preprocessing:

Before training the machine learning models, it is essential to preprocess the datasets to
ensure data quality and compatibility with the algorithms. This step involves removing
duplicates, handling missing values, normalizing features, and encoding categorical
variables if necessary. Additionally, feature selection techniques can be applied to reduce
dimensionality and improve model performance.
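A possible shape of this preprocessing pipeline with pandas and scikit-learn is sketched below; the file name traffic.csv and the label column are placeholders, not the actual schema of the datasets:

# Sketch of the preprocessing steps above; "traffic.csv" and the "label"
# column are placeholder names, not the real dataset schema.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv("traffic.csv")
df = df.drop_duplicates()                         # remove duplicate records
df = df.fillna(df.median(numeric_only=True))      # fill numeric gaps

# Encode categorical columns such as protocol or service names.
for col in df.select_dtypes(include="object").columns:
    if col != "label":
        df[col] = LabelEncoder().fit_transform(df[col])

# Normalize features so that no single column dominates training.
features = df.drop(columns=["label"])
X = MinMaxScaler().fit_transform(features)
y = LabelEncoder().fit_transform(df["label"])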
6.4 Model Training:
In this step, the SVM and Random Forest algorithms will be trained on the preprocessed
datasets. SVM is a supervised learning algorithm that separates data points into
different classes using hyperplanes in high-dimensional space. Random Forest is an
ensemble learning method that combines multiple decision trees to make predictions.
Both algorithms have been proven effective in intrusion detection tasks.
6.5 Model Evaluation:

To evaluate the performance of the trained models, various metrics will be used,
including accuracy, precision, recall, F1-score, and area under the receiver operating
characteristic curve (AUC-ROC). The evaluation will be conducted using cross-validation
techniques to ensure robustness and avoid overfitting. Additionally, confusion matrices
and classification reports will be generated to provide detailed insights into the model’s
performance.
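A small sketch of this evaluation step, with synthetic data standing in for the real datasets:

# Sketch: k-fold cross-validation plus a confusion matrix and
# classification report, as described above (synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
model = RandomForestClassifier(n_estimators=100, random_state=7)

scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print("5-fold F1 scores:", scores, "mean:", scores.mean())

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
y_pred = model.fit(X_train, y_train).predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))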
6.6 Web Application Development:

To make the IDS accessible to users, a web application will be developed using Flask,
HTML, CSS, and JavaScript. Flask is a lightweight web framework that allows easy
integration with Python-based machine learning models. The frontend of the web
application will be designed using HTML, CSS, and JavaScript to provide an intuitive
and user-friendly interface for interacting with the IDS.
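A minimal Flask sketch of such an interface is shown below; the /predict route, the JSON payload shape, and the model.joblib file name are assumptions made for illustration:

# Minimal Flask sketch; the /predict route, feature field name, and
# model.joblib file are illustrative assumptions.
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")   # trained SVM or Random Forest

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [0.1, 0.3, ...]}.
    features = np.array(request.json["features"]).reshape(1, -1)
    label = model.predict(features)[0]
    return jsonify({"prediction": int(label)})

if __name__ == "__main__":
    app.run(debug=True)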
6.7 Model Deployment:

The trained machine learning models will be deployed on a cloud platform, such as AWS
or another suitable platform. This ensures that the IDS is accessible from anywhere
and can handle a large number of concurrent requests. The deployment process involves
containerizing the models using technologies like Docker and deploying them on cloud
instances or serverless architectures.
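Whichever hosting option is chosen, the trained estimator must first be serialized so it can be packaged into a container image or copied onto a cloud instance. A sketch using joblib (the file name is a placeholder):

# Sketch: persisting a trained model with joblib so it can be packaged
# (e.g., into a Docker image) and loaded by the deployed service.
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=3)
model = RandomForestClassifier(n_estimators=100, random_state=3).fit(X, y)

joblib.dump(model, "model.joblib")        # written into the build context
restored = joblib.load("model.joblib")    # what the deployed service would do
print(restored.predict(X[:5]))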
6.8 Integration with Frontend:

The deployed models will be integrated with the frontend of the web application. This
integration allows users to input network traffic data through the web interface and
receive real-time predictions from the IDS. The frontend will communicate with the
deployed models through APIs or other communication protocols.
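From the frontend's perspective, the exchange could look like the following request/response pair; the URL and payload match the illustrative Flask sketch above and are assumptions rather than the final API:

# Sketch of the frontend-to-model exchange; the URL and payload shape are
# illustrative assumptions matching the Flask sketch above.
import requests

payload = {"features": [0.12, 0.0, 1.0, 0.87]}   # example feature vector
response = requests.post("http://localhost:5000/predict", json=payload)
print(response.json())    # e.g. {"prediction": 0} for benign traffic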
6.9 Evaluation and Analysis Display:
The evaluation results obtained during model training and testing will be displayed to
the users in the frontend of the web application. This includes metrics such as accuracy,
precision, recall, F1-score, AUC-ROC, as well as visualizations like confusion matrices
and classification reports. The purpose of displaying these results is to provide users
with insights into the performance of the intrusion detection system and help them
make informed decisions.
6.10 Details about Inputs to the System

1. Datasets: The system will be trained on datasets of labeled network traffic. This
helps the system to learn to identify different types of intrusions. The datasets we
will be using are CICIDS 2018 and UNSW 2015.
2. Packet captures: Packet captures contain all of the packets that are transmitted
on a network interface. This information can be used to identify suspicious patterns
of traffic, such as a large number of packets from a single IP address or a large
number of packets with the same destination port.
3. Flow logs: Flow logs contain aggregated information about the traffic that flows
through a network. This information can be used to identify unusual traffic
patterns, such as a sudden increase in traffic to a particular service.
4. System logs: System logs contain information about events that occur on a
system, such as login attempts and process executions. This information can be
used to identify suspicious activity, such as a large number of failed login attempts
or a process that is using a lot of resources.
6.11 Evaluation Parameters

These evaluation parameters will be used to assess the performance of the intrusion
detection system using two machine learning algorithms, Random Forest and Support
Vector Machine, on two datasets, CICIDS 2018 and UNSW-NB15. The following
evaluation parameters will be used for the intrusion detection system:
1. Accuracy: The accuracy of the intrusion detection system is the percentage of
correctly classified data points. It is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false
positives, and false negatives, respectively.

2. Precision: The precision of the intrusion detection system is the percentage of
predicted positive cases that are actually positive. It is calculated as follows:
Precision = TP / (TP + FP)

3. Recall: The recall of the intrusion detection system is the percentage of actual
positive cases that are correctly predicted. It is calculated as follows:
Recall = TP / (TP + FN)

4. F1-score: The F1-score is the harmonic mean of precision and recall, providing a
single measure that balances the two. It is calculated as follows:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
5. Area under the ROC curve (AUC): The AUC is a measure of the overall
performance of a classifier. It is calculated by plotting the true positive rate (TPR)
against the false positive rate (FPR) at different thresholds. The AUC ranges from
0 to 1, with a higher AUC indicating better performance.
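These formulas map directly onto the four confusion-matrix counts. A short worked sketch with made-up counts, purely for illustration:

# Sketch: the evaluation formulas above applied to illustrative
# confusion-matrix counts (the numbers are made up for demonstration).
TP, TN, FP, FN = 900, 950, 50, 100

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")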
6.12 Software and Hardware Setup
6.12.1 Software and Packages Requirements:

1. Tools:
(a) Python: Python will be used to implement the entire system, including data
preprocessing, feature extraction, evaluation, and application development.
(b) OpenCV: OpenCV will be used to train and evaluate the model for face
detection.
2. Web Application:
(a) Flask: Flask will be used to develop a web application that can be used to
interact with the face detection system.
3. Data Analysis:
(a) Pandas: Pandas will be used to preprocess the network traffic data and
extract features that are relevant for face detection.
(b) Joblib: Joblib will be used to cache the results of expensive function calls and
to persist trained models, avoiding repeated recomputation and saving time.
(c) Numpy: Numpy will be used for numerical operations and linear algebra
computations.
4. Databases:
6.12.2 Hardware Requirements:

1. Operating System: We have used Windows 10, which offers features and
compatibility suitable for development and deployment.
Chapter 7
IMPLEMENTATION PLAN OF NEXT SEMESTER
Chapter 8
CONCLUSION
This mini project successfully developed a core prototype of a face detection attendance
system. The system demonstrated the feasibility of utilizing real-time face detection
technology to automate attendance marking. We achieved the key objectives of
implementing a face detection module, designing a user interface for enrollment, and
integrating attendance recording with timestamps.
The evaluation process revealed promising results in terms of accuracy and efficiency.
The system effectively detected faces in real-time video streams and accurately recognized
enrolled users under various lighting conditions.
This project serves as a stepping stone for further development of a comprehensive
attendance management system. Here’s a glimpse into the future:
Liveness detection techniques can be incorporated to ensure only real people are
marking attendance. Robust data security measures can be implemented to guarantee
user privacy and compliance with regulations. Integration with existing systems like
HR software or attendance management platforms can streamline data management.
The success of this mini project highlights the potential of face detection technology
to revolutionize attendance tracking. The proposed system offers significant advantages
over traditional methods, promoting efficiency, accuracy, and security. As technology
continues to evolve, facial recognition holds immense potential to transform attendance
management across diverse organizations.
While this project focused on a core functionality, it lays the foundation for a future
where attendance marking becomes a seamless and secure process. Further advancements
in this technology can contribute significantly to improved workflows and data accuracy
in various sectors.
REFERENCES
[2] R. H. Khem Puthea and R. Hidayat, “An attendance marking system based on face
recognition,” Cybersecurity, vol. 02, no. 01, pp. 2–20, Dec. 2019.
[5] A. V. S. R. Raj Malik, Praveen Kumar, “Prototype model for an intelligent attendance
system based on facial identification,” Sensors 2023, vol. 10, no. 01, pp. 1–26, Jun.
2023.