Artificial Intelligence Based Real-Time Attendance System Using Face Recognition

This document is a project report submitted in partial fulfillment of a Bachelor of Engineering degree. It describes developing an artificial intelligence based real-time attendance system using face recognition. The system takes group photos of students in a classroom using a high-resolution camera and then extracts individual faces to recognize students using a convolutional neural network trained on a student face database. The experimental results show this approach performs better than other attendance marking systems in terms of effectiveness, usability, and implementation as it requires minimal human-machine interaction. The report also provides details on the software and hardware platforms, algorithms, and methodology used to build this autonomous attendance tracking system based on facial recognition technology.


ARTIFICIAL INTELLIGENCE BASED REAL-TIME
ATTENDANCE SYSTEM USING FACE RECOGNITION

A PROJECT REPORT

Submitted by

HARIRAM S (191EC156)
GOWTHAM G K (191EC146)
GURURAJ R (191EC152)

In partial fulfilment for the award of the degree


of
BACHELOR OF ENGINEERING
in
ELECTRONICS AND COMMUNICATION ENGINEERING

BANNARI AMMAN INSTITUTE OF TECHNOLOGY


(An Autonomous Institution Affiliated to Anna University, Chennai)
SATHYAMANGALAM-638401

ANNA UNIVERSITY: CHENNAI 600 025

MARCH 2023
BONAFIDE CERTIFICATE

Certified that this project report “ARTIFICIAL INTELLIGENCE BASED REAL-TIME ATTENDANCE SYSTEM USING FACE RECOGNITION” is the bonafide work of HARIRAM S (191EC156), GOWTHAM G K (191EC146), and GURURAJ R (191EC152), who carried out the project work under my supervision.

SIGNATURE                                SIGNATURE
Dr C. POONGODI                           Dr C. RAJU
HEAD OF THE DEPARTMENT,                  SUPERVISOR,
Professor & Head,                        Assistant Professor,
Department of ECE,                       Department of ECE,
Bannari Amman Institute of               Bannari Amman Institute of
Technology - 638401.                     Technology - 638401.

Submitted for project Viva Voce examination held on ____________.

Internal Examiner External Examiner


DECLARATION

We affirm that the project work titled “ARTIFICIAL INTELLIGENCE BASED REAL-TIME ATTENDANCE SYSTEM USING FACE RECOGNITION”, being submitted in partial fulfillment for the award of the degree of Bachelor of Engineering, is the record of original work done by us under the guidance of Dr C. RAJU, Supervisor, Assistant Professor, Department of ECE. It has not formed a part of any other project work(s) submitted for the award of any degree or diploma, either in this or any other University.

HARIRAM S GOWTHAM G K GURURAJ R

(191EC156) (191EC146) (191EC152)

I certify that the declaration made above by the candidates is true.

SIGNATURE
Dr C. RAJU
SUPERVISOR,
Assistant Professor,
Department of ECE,
Bannari Amman Institute Of
Technology-638401.
ACKNOWLEDGEMENT

We would like to express heartfelt thanks to our esteemed Chairman Sri. S. V. Balasubramaniam and the respected Director Dr. M. P. Vijaykumar for providing excellent facilities and support during the course of study in this institute.

We are grateful to Dr. C. POONGODI, Professor and Head of the Department of Electronics and Communication Engineering, for the valuable suggestions to carry out the project work successfully.

We wish to express our sincere thanks to our faculty guide Dr. C. RAJU, Assistant Professor, Department of Electronics and Communication Engineering, for his constructive ideas, inspiration, encouragement, excellent guidance, and much-needed technical support extended to complete our project work.

We would like to thank our friends, faculty, and non-teaching staff who have directly or indirectly contributed to the success of this project.

HARIRAM S (191EC156)

GOWTHAM G K (191EC146)
GURURAJ R (191EC152)
ABSTRACT

Attendance marking is a widespread practice used to keep track of students' daily presence in academic institutions across all grade levels. In the past, attendance was recorded using manual methods. These methods are accurate and leave no potential for recording phoney attendance, but they take a lot of time and demand effort from many pupils.

Automated solutions based on biometric technologies such as radio-frequency identification, fingerprint, face, and iris scanning have been created to address the shortcomings of manual methods. Each method has benefits and drawbacks; however, all of these systems share the drawback of requiring human interaction to mark each person's attendance one at a time. In this work, we present a robust and effective attendance marking system that works from a single group photograph, using face detection and recognition algorithms to overcome the shortcomings of existing manual and automated attendance systems.

In this technique, a group photo of every student seated in a classroom is taken


using a high-resolution camera set at a fixed point. Following the extraction of the
face pictures from the group image using an algorithm, a convolutional neural
network trained on a student face database is used for recognition. We tested our
system using several group picture formats and database types. Our experimental
findings demonstrate that, in terms of effectiveness, usability, and implementation,
the suggested framework performs better than other attendance marking systems.
The suggested system is an autonomous attendance system that can be readily
integrated into a smart classroom because it necessitates minimal contact between
humans and machines.

This project describes a multiple-attendance system based on a face recognition algorithm. Using a deep learning algorithm, a reliable and effective facial detection-based attendance system is implemented. Facial recognition technology is commonly used to identify faces.

Keywords: facial recognition, deep learning, autonomous, multiple attendance, radio-frequency identification, fingerprint, face, iris scanning.
TABLE OF CONTENTS

CHAPTER NO    TITLE                                              PAGE NO

              ABSTRACT                                           v
              LIST OF FIGURES                                    xi
1             INTRODUCTION                                       1
              1.1 ADVANTAGES                                     2
              1.2 APPLICATIONS                                   3
2             LITERATURE SURVEY                                  4
3             OBJECTIVES AND METHODOLOGY                         8
              3.1 OBJECTIVES                                     8
              3.2 METHODOLOGY                                    8
                  3.2.1 Building Dataset                         10
                  3.2.2 Training Dataset                         10
                  3.2.3 Object Detection Model Testing           11
4             PLATFORMS                                          12
              4.1 SOFTWARE REQUIREMENT                           12
                  4.1.1 H/W System Configuration                 12
                  4.1.2 S/W System Configuration                 12
              4.2 SOFTWARE ENVIRONMENT                           12
                  4.2.1 Python Technology                        12
                  4.2.2 Python Programming                       12
5             YOLOV5 AND PYTHON ARCHITECTURE                     14
              5.1 YOLO ALGORITHM                                 14
                  5.1.1 Advantages                               15
              5.2 YOLOV3 NETWORK                                 16
              5.3 YOLOV5 NETWORK                                 16
              5.4 PYTHON                                         17
                  5.4.1 The Python Platform                      19
                  5.4.2 What Does Python Technology Do?          20
                  5.4.3 Productivity and Speed                   20
                  5.4.4 Python Is Popular for Web Apps           20
              5.5 OPEN SOURCE AND FRIENDLY COMMUNITY             21
                  5.5.1 Python Is Quick to Learn                 21
                  5.5.2 Broad Application                        21
6             EXPERIMENTAL PROCEDURE                             22
              6.1 IMAGE PROCESSOR                                22
              6.2 IMAGE PREPROCESSING                            23
              6.3 IMAGE SEGMENTATION                             23
              6.4 FEATURE EXTRACTION                             24
              6.5 IMAGE ENHANCEMENT                              24
              6.6 CLASSIFICATION                                 24
              6.7 API DOCUMENTATION GENERATORS                   25
                  6.7.1 Uses                                     25
              6.8 PANDAS                                         27
                  6.8.1 Library Features                         27
              6.9 CSV READER                                     28
7             PROCESSOR                                          30
              7.1 INTRODUCTION TO PROCESSOR                      30
              7.2 GENERAL PURPOSE PROCESSOR                      30
              7.3 MICROPROCESSOR                                 30
                  7.3.1 Basic Components of Processor            31
                  7.3.2 Primary CPU Processor Operations         31
              7.4 TYPES OF PROCESSOR                             31
                  7.4.1 Single Core Processor                    32
                  7.4.2 Dual Core Processor                      32
                  7.4.3 Multi Core Processor                     32
                  7.4.4 Quad Core Processor                      33
                  7.4.5 Octa Core Processor                      33
              7.5 WEB CAM                                        33
                  7.5.1 Video Calling and Video Conferencing     34
                  7.5.2 Video Security                           34
                  7.5.3 Video Clips and Stills                   35
                  7.5.4 Input Control Devices                    35
                  7.5.5 Astro Photography                        35
              7.6 OPEN CV                                        36
              7.7 YOLO                                           37
                  7.7.1 How YOLO Improves over Previous
                        Object Detection Methods                 38
              7.8 CNN ARCHITECTURE                               39
                  7.8.1 Deep Neural Network                      40
8             RESULTS                                            41
              8.1 DATASET CREATION                               41
              8.2 DATA COLLECTION                                42
              8.3 FACE EXTRACTION                                43
              8.4 OUTPUT                                         43
              8.5 EXPECTED RESULT                                44
9             CONCLUSION                                         44
              9.1 CONCLUSION                                     44
              9.2 REFERENCE                                      45
              ANNEXURE 1                                         46
              ANNEXURE 2                                         47
              ANNEXURE 3                                         52
LIST OF FIGURES

FIGURE NO    FIGURE NAME                                         PAGE NO
1            Methodology Chart                                   9
2            Labelling and Roboflow                              10
3            Flowchart of Dataset to Weight Model                10
4            Flowchart of Input to Output                        11
5            Overall Flowchart                                   11
6            A Repository Architecture of IDE                    17
7            Workflow of a Source Code                           19
8            Block Diagram of Fundamental Sequence Involved
             in an Image Processing System                       22
9            Software Libraries                                  27
10           Python Panda Features                               28
11           Opening a CSV Reader                                29
12           Webcam                                              33
13           CNN Architecture                                    39
CHAPTER 1

INTRODUCTION

Marking attendance is a common practice in workplaces and educational organizations alike. In educational institutions, attendance is seen as a crucial metric for both students and professors, yet keeping track of student attendance in the classroom takes considerable effort. Attendance systems fall into two main categories: manual and automatic. The most common manual approach is the roll-call method, in which a teacher records attendance by calling out each student's name one at a time. This method is badly out of date: it can consume more than ten minutes of class time per day and offers the greatest opportunity for marking proxy attendance.

The second approach involves marking one's presence on an attendance register or sheet. It is the most time-consuming method and, if unchecked, is easily manipulated and falsified. It is therefore crucial to create an automated attendance system that can efficiently record attendance without human involvement. Face recognition is the best option for building attendance systems: it is the least invasive means of identification, it can capture images from a distance, it is cost-effective, it leaves no chance of recording a proxy as present, and it is a simple yet dependable process. In this study, we created an automated attendance system that tracks students' attendance by facial recognition using video captured by a webcam.

In this study, we propose a face detection and recognition-based attendance system that can record multiple attendances from a single input, increasing the system's efficiency while eliminating the possibility of proxy attendance. The method begins by taking a group photo of the class from a live CCTV feed, after which the faces are detected. The proposed system recognizes faces in group images using a deep convolutional neural network (DCNN). Face data for each user is initially captured using OpenCV software.

More than 1,000 photos were gathered per user. The collected data is preprocessed in four crucial steps: grayscale conversion, resizing, normalization, and augmentation. The processed data was used to build and train the CNN architecture, and the trained model is stored and used by the automated face detection block. The trained model accurately identifies the faces it has been trained on; if a face is recognized, attendance is marked, and recognizing a student counts as marking him or her present. To improve effectiveness, the procedure is repeated several times, and the final results are recorded in an Excel file. Because it operates in the background with little to no involvement from professors or students, this automatic attendance system helps pupils preserve their valuable study time.
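The four preprocessing steps named above (grayscale conversion, resizing, normalization, and augmentation) can be sketched in NumPy as follows. This is an illustrative sketch only: the 64x64 target size, the nearest-neighbour resize, and the flip-based augmentation are assumptions, not values stated in this report.

```python
import numpy as np

def preprocess_face(img, size=(64, 64)):
    """Grayscale, resize, and normalise one face image.

    `img` is an H x W x 3 uint8 RGB array; the target size is assumed.
    """
    # 1. Grayscale conversion using ITU-R BT.601 luma weights.
    gray = img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114
    # 2. Resize by nearest-neighbour index sampling.
    h, w = gray.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = gray[rows][:, cols]
    # 3. Normalisation to the [0, 1] range expected by a CNN input layer.
    return (resized / 255.0).astype(np.float32)

def augment(face):
    """4. Augmentation: a horizontal flip doubles the training set."""
    return [face, face[:, ::-1]]
```

In practice a library such as OpenCV or Pillow would perform the resize, but the shape of the pipeline (image in, normalised fixed-size array out) stays the same.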

1.1 ADVANTAGES

• The main advantage of this system is that attendance is marked on a highly secure server, where no one can mark another person's attendance.
• Time saving.
• Ease in maintaining attendance.
• Reduced paperwork.
• Automatically operated and accurate.
• Reliable and user friendly.
1.2 APPLICATIONS

● To verify identities in government organizations.
● Enterprises.
● Attendance in libraries.
● To detect fake entries at international borders.
● Industries.
CHAPTER 2

LITERATURE SURVEY

[1] Vyavahare M. D. and Kataria S. S., in “Library Management Using Real-Time Face Recognition System”, proposed an automatic system for human face identification in a vast home-made dataset of a person's faces in a real-time background setting. The task is extremely challenging, since real-time background removal from an image is still an open problem; in addition, there is significant variation in the size, position, and expression of the human face, most of which the proposed approach collapses. Ada-boost with cascade is performed to recognize human faces in real time, and fast, simple PCA and LDA are used to identify the detected faces; the matching face is then used to record attendance in the lab. This library management system tracks real-time attendance using human face recognition and achieves a high accuracy rate. There are two databases: one for the library system and one for students.

[2] K. Susheel Kumar, Shitala Prasad, Vijay Bhaskar Semwal, and R. C. Tripathi, in “Real Time Face Recognition using Ada-boost Improved Fast PCA Algorithm”, provide an automated method for human face identification in a big home-made dataset of a person's faces in a real-time background setting. The task is extremely challenging, since real-time background removal from an image is still an open problem; in addition, there is significant variation in the size, position, and expression of the human face, most of which the suggested approach collapses. Ada-boost with cascade is used to detect human faces in real time, and fast, simple PCA and LDA are used to identify the detected faces; the matching face is then utilized to record attendance in the lab. This biometric system is a real-time attendance system.
[3] Chengji Liu, Yufan Tao, Jiawei Liang, Kai Li, and Yihang Chen, in “Object Detection Based on YOLO Network”, proposed a generic object identification network created by applying degradation techniques such as noise, blurring, rotation, and cropping to the training sets. The model's generalization and resilience were improved by using degraded training sets during training. The experiments demonstrated that a model trained only on standard data has subpar resilience and generalization for degraded pictures; training the model on damaged photos increased its average precision. It was established that the generic degradation-trained model outperformed the standard model in terms of average accuracy on deteriorated photos.

[4] Rumin Zhang and Yifeng Yang, in “An Algorithm for Obstacle Detection Based on YOLO and Light Field Camera”, combine the YOLO object detection algorithm with a light field camera to create an obstacle detection system for indoor environments. The algorithm categorizes objects into groups and marks them in the picture. To train YOLO, photographs of typical obstacles were tagged, and unimportant obstructions are eliminated using an object filter. The usefulness of this obstacle identification algorithm is illustrated using several scene types, such as pedestrians, chairs, and books.

[5] S. Aravindh, R. Athira, and M. J. Jeevitha, in “Automated Attendance Management Reporting System using Face Recognition”, highlight the development of a system that automatically marks attendance by recognizing students' faces. The procedure is broken into several stages, the crucial ones being face detection and face recognition. First, a picture of each student's face is needed to record their attendance.

[6] Swarnendu Ghosh, Mohammed Shafi KP, Neeraj Mogal, Prabhu Kalyan Nayak, and Biswajeet Champaty, in “Smart Attendance System”, suggest that the Android application be restricted to authorized staff in order to track student attendance and communicate information for library records. The gadget is extremely secure, since only authorized personnel's fingerprints may be used to activate it.

[7] Vassilios Tsakanikas and Tasos Dagiuklas, in “Video Surveillance Systems - Current Status and Future Trends”, make an effort to document the current state of video surveillance systems. The fundamental elements of a surveillance system are presented and carefully examined, along with algorithms for object detection, object tracking, object recognition, and object re-identification. The most popular surveillance system modalities are examined, with a focus on video in terms of available resolutions and cutting-edge imaging techniques such as High Dynamic Range video. The most significant features and statistics are offered, together with the most popular methods for improving image and video quality. The most significant deep learning algorithms and the intelligent analytics they employ are described. Finally, before examining the difficulties and potential future directions of surveillance, the paper reports on augmented reality and the role it can play in a surveillance system.
[8] S. Aravindh, R. Athira, and M. J. Jeevitha delivered the idea of automated attendance in the paper titled “Automated Attendance Management and Reporting System using Face Recognition”. Manually managing an attendance system is challenging, so various biometrics-based smart and automated attendance systems are frequently used, face recognition being one of them. This method largely resolves the issue of proxies and fake attendance. Older facial recognition-based attendance systems had certain drawbacks, such as sensitivity to sunlight intensity and head posture. Thus, a number of methods, including illumination invariance, the Viola-Jones algorithm, and principal component analysis, are utilized to overcome these problems. The two basic processes in this system are face detection and face recognition, after which the detected faces are compared by cross-referencing with the student face database. This technique makes it easier to keep track of students' attendance and records. Taking attendance manually in a classroom full of pupils is a laborious and time-consuming operation; a system that marks pupils' attendance by identifying their faces addresses this effectively.
[9] Swarnendu Ghosh, Mohammed Shafi KP, Neeraj Mogal, Prabhu Kalyan Nayak, and Biswajeet Champaty proposed the “Automated Attendance System”. For the best possible use of teaching and learning time, the study outlines the design and development of a smart attendance system for students in schools and colleges. The suggested gadget is a biometric attendance recorder built around an Arduino UNO and a fingerprint sensor. Through an enrollment procedure, the device recorded the fingerprints of all faculty members and students at an institute. During the attendance process, students' fingerprints were compared against the enrolled database; if there was a match, the student's name was stored on the device and wirelessly communicated over Bluetooth to an Android application created in-lab. Only approved staff members have access to the Android app, which is used to share and track student attendance. The device is very secure, since only the authorised faculty concerned may activate it using their fingerprints. The gadget is affordable, reliable, portable, and user-friendly, and its portability and affordability give it an advantage over items already on the market. The technology shortens class periods, increasing instructors' and students' valuable teaching and learning time and offering them more opportunities to teach and learn, respectively.
CHAPTER 3

OBJECTIVES AND METHODOLOGY

3.1 OBJECTIVES

● To detect multiple faces in real-time scenarios for monitoring attendance in the workplace and producing a daily report for an authorized person.
● To increase identity security using a face recognition system.
● To prevent the misuse of the identity of an individual.

3.2 METHODOLOGY

The library attendance system is built around its three subsystems: the API Service, Face Recognition Using YOLOv5, and the Visitor Identification System. The stages of development are as follows:

1. Creation of a library attendance API service

2. Developing facial recognition software using YOLOv5

3. Creating a system for visitor identification 

4. Thorough system testing.


Figure 1 Methodology chart

The general software development life cycle is used in phases 1 and 3 to construct the API Service and the Visitor Identification System. Stage 2, facial recognition development using YOLOv5, calls for more careful attention. YOLO (You Only Look Once) is a highly accurate real-time object detection system that uses a convolutional neural network for object identification. To find items in a picture, YOLO divides the image into regions and predicts a bounding box and class probability for each region. YOLO excels at both prediction and classification, and it can recognize many objects at the same time. YOLOv5 includes four models for training data: YOLOv5-s, YOLOv5-m, YOLOv5-l, and YOLOv5-x. The four models differ in network design, number of layers, and number of parameters.
3.2.1 BUILDING A DATASET

The first step is to create a dataset before beginning training.

Figure 2 Labelling and Roboflow

A collection of JPG images is required to create the dataset, and each photo is subsequently tagged or annotated using labelling software. The annotation output takes the form of an XML file. For picture pre-processing, the XML files in the dataset are then concatenated.
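The XML annotations produced by common labelling tools usually follow the Pascal VOC layout; as a sketch, one file can be parsed into (label, box) tuples with the standard-library XML module. The exact schema used in this project is assumed here, not stated in the report, and the sample label is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) tuples from one
    Pascal VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):          # one <object> per labelled face
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

# Hypothetical annotation for a single labelled face.
sample = """<annotation>
  <object><name>student_01</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""
```

Concatenating the per-image results of such a parser yields the combined annotation set described above.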

3.2.2 TRAINING DATASET

The second stage employs a custom YOLOv5-s model; the outputs of the training phase, the YOLOv5 weight model, are used for detection.

Figure 3 Flowchart of Dataset to Weight Model Yolov5

The dataset is read, and a class file is created to serve as the basis for YOLOv5's custom detection model. The dataset is then trained using this file. When training is finished, pictures and videos may be used to test the trained model.
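As a command-line sketch, training a custom YOLOv5-s model with the official Ultralytics repository is typically invoked as shown below. The dataset YAML name, image size, batch size, and epoch count are illustrative assumptions, not values taken from this report.

```shell
# Clone the Ultralytics YOLOv5 repository and install its requirements.
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

# Train the small (yolov5s) variant on a custom dataset described by a
# data YAML file; best weights land in runs/train/exp/weights/best.pt.
python train.py --img 640 --batch 16 --epochs 100 \
    --data student_faces.yaml --weights yolov5s.pt
```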
3.2.3 OBJECT DETECTION MODEL TESTING

The final procedure is object detection using a trained model.

Figure 4 Flowchart of Input to Output

The picture above represents the testing step, which uses images or videos as input. The built-in model is loaded to begin the detection phase, followed by classification and prediction using bounding boxes and confidence ratings. Prediction boxes, confidence values, and object classes are shown as the results. The fourth stage is thorough system testing: the three sub-systems are integrated, and the integrated system is evaluated comprehensively.

Figure 5 Overall Flowchart

The input begins with a face captured by the camera. The built-in model is then loaded and used by the detection process to perform classification and prediction using bounding boxes and confidence scores. Prediction boxes, confidence values, and object classes are shown as the results. The report database saves each detection result as an object class (cls), and the report page then displays the data from the report database. The report page shows information in the form of a number, user name, NPM, attendance date, and time.
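The report step above can be sketched as appending one row per recognized face to a CSV file. The column order mirrors the report-page fields (number, user name, NPM, date, time), while the file name, helper function, and sample row are hypothetical, not taken from the report's implementation.

```python
import csv
from datetime import datetime

def record_attendance(path, number, name, npm, when=None):
    """Append one detection result as a report row: No, Name, NPM, Date, Time."""
    when = when or datetime.now()
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [number, name, npm,
             when.strftime("%Y-%m-%d"), when.strftime("%H:%M:%S")])

# Hypothetical example row using a roll number from this report.
record_attendance("report.csv", 1, "HARIRAM S", "191EC156",
                  datetime(2023, 3, 1, 9, 0, 0))
```

Opening the file in append mode means each detection pass simply adds rows, which matches the report-database-then-report-page flow described above.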
CHAPTER 4
PLATFORMS

4.1 SOFTWARE REQUIREMENT

4.1.1 H/W SYSTEM CONFIGURATION:-

• Processor - Intel

• RAM - 4 GB (min)

• Hard Disk - 20 GB

4.1.2 S/W SYSTEM CONFIGURATION:-

• Operating System : Windows 7 or 8


• Software : Python Idle

4.2 SOFTWARE ENVIRONMENT

4.2.1 PYTHON TECHNOLOGY:


Python is a general-purpose, interpreted programming language. Programming
paradigms including procedural, object-oriented, and functional programming are
all supported. Due to its extensive standard library, Python is frequently referred to
as a "batteries included" language.

4.2.2 PYTHON PROGRAMING LANGUAGE:


A multi-paradigm programming language is Python. A large number of its
features allow functional programming and aspect-oriented programming, including
metaprogramming and met objects (magic methods), and object-oriented
programming and structured programming are both fully supported. Extensions are
available for many additional paradigms, such as design by contract and logic
programming.

Python offers a wide range of features, including:

● Easy to Learn and Use


● Expressive Language
● Interpreted Language
● Cross-platform Language
● Free and Open Source
● Object-Oriented Language
● Extensible
● Large Standard Library
● GUI Programming Support
● Integrated

Python's memory management system combines reference counting and a cycle-


detecting garbage collector with dynamic typing. Moreover, it has a dynamic name
resolution (late binding) capability that binds variable and method names as the
programme is being run.
Python was made to be highly extensible rather than having all of its features built into its core. Its compact modularity has made it especially popular for adding programmable interfaces to existing applications. Van Rossum's dissatisfaction with ABC, which advocated the opposite strategy, led to his vision of a small core language with a large standard library and an easily extensible interpreter.
CHAPTER 5
YOLOV5 AND PYTHON
ARCHITECTURE

5.1 YOLO ALGORITHM

YOLO is an algorithm that uses neural networks to provide real-time object detection. Object detection in computer vision means identifying different things in digital images or videos, and YOLO is an algorithm that can find and identify different items in images. The YOLO (You Only Look Once) framework approaches object identification in a distinctive way: it predicts the bounding box coordinates and class probabilities for these boxes from the complete picture in a single pass. The main benefit of adopting YOLO is its outstanding speed; it can process 45 frames per second. Moreover, YOLO learns generalizable object representations.

YOLO is popular because it achieves great accuracy while running in real time. The method “only looks once” at the picture in the sense that making predictions takes only one forward propagation pass through the neural network. Following non-max suppression (which ensures that the object detection algorithm identifies each object only once), it returns the identified objects together with their bounding boxes. In YOLO, a single CNN predicts multiple bounding boxes and the class probabilities for those boxes. YOLO trains on entire images and directly optimizes detection performance. This approach offers some advantages over conventional object detection methods: YOLO is extremely fast, and because it sees the complete image during training and testing, it implicitly encodes contextual information.

When you feed an image into the YOLO algorithm, it divides the picture into an SxS grid and uses the grid to determine whether a given bounding box contains an object (or part of one); it then uses this knowledge to determine what class the object belongs to. Before describing how the method works in depth, we must understand how the algorithm creates and specifies each bounding box. The YOLO algorithm predicts an outcome using four components and one extra value:

1. The center of the bounding box (bx, by)

2. Width (bw)

3. Height (bh)

4. The class of the object (c)

The final predicted value is the confidence (pc), which indicates the likelihood that an object is present inside the bounding box; the (bx, by) coordinates represent the centre of the box. Since most bounding boxes will not typically contain an object, the predictions must be filtered: non-max suppression is a technique we may use to get rid of extra boxes that are unlikely to contain items and those that share large regions with other boxes.
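The filtering just described can be sketched in plain Python: boxes are (x1, y1, x2, y2, confidence) tuples, low-confidence boxes are dropped, and greedy non-max suppression keeps the highest-scoring box while discarding heavily overlapping ones. The two thresholds are assumed values, not parameters from this report.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, conf_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then greedily keep the highest-scoring
    box and discard any remaining box that overlaps it too much."""
    boxes = sorted((b for b in boxes if b[4] >= conf_threshold),
                   key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(b)
    return kept
```

For example, two strongly overlapping detections of the same face collapse to the higher-confidence one, while a detection elsewhere in the frame survives.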

5.1.1 ADVANTAGES

In the past, object identification tasks were completed as a pipeline of multiple steps using techniques like Region-based Convolutional Neural Networks (R-CNN), including Fast R-CNN. R-CNN trains each component individually while concentrating on a particular region of the picture. This procedure takes a long time, since R-CNN must classify about 2,000 regions per image (47 seconds per test image); as a result, real-time implementation is not possible. Moreover, R-CNN employs a fixed region-proposal method, meaning no learning takes place at that stage and the network may produce subpar region recommendations.

As a result, object detection networks like R-CNN are slower than YOLO and are more difficult to improve. YOLO employs just one neural network to perform all of the task's components, making it quicker (45 frames per second) and simpler to tune than earlier techniques.

To comprehend what YOLO is, we must first investigate its design and algorithm.

5.2 YOLOV3 NETWORK

YOLOv3 is an incremental improvement over YOLOv2 that employs a different variant of the Darknet backbone. The YOLOv3 architecture has 106 layers: 53 trained on ImageNet and another 53 tasked with object detection. While this significantly increased network accuracy, it also lowered network speed from 45 to 30 frames per second.

5.3 YOLOV5 NETWORK

Similar to a standard CNN, a YOLO network has convolution and max-pooling layers, followed by two fully connected layers. Regarding the loss function: as the YOLO method predicts numerous bounding boxes for each grid cell, we only want one of the predicted bounding boxes to be accountable for the object within the image. To achieve this, we compute the loss for each true positive using the loss function, and the bounding box with the highest Intersection over Union (IoU) with the ground truth must be chosen in order to increase the efficiency of the loss function. By specializing bounding boxes in this way, the technique improves predictions for particular aspect ratios and sizes.

5.4 PYTHON
Python is designed to be a language that is simple to read. Its formatting is visually clean, and it frequently uses English keywords where other languages use punctuation. It differs from many other languages in that blocks are not delimited by curly brackets, and semicolons after statements are optional. Compared to C or Pascal, it features fewer syntactic exceptions and special cases.
Figure 6: A repository architecture for an IDE

Python gives developers a choice in their development style while aiming for a
simpler, less cluttered syntax and grammar. In contrast to Perl's "there is more
than one way to do it" motto, Python adheres to the design ethos that "there should
be one, and preferably only one, obvious way to do it." As Alex Martelli, a Fellow
of the Python Software Foundation and author of several Python books, puts it,
"To label anything as 'smart' is not considered praise in the Python culture."
The Python developers try to avoid premature optimization, and they reject changes
to non-critical parts of the Python reference implementation that would offer slight
speed improvements at the expense of readability. When speed is crucial, a Python
programmer can use PyPy, a just-in-time compiler, or move time-critical
functions to extension modules written in languages like C. There is also Cython,
which translates a Python script into C and makes direct C-level API calls into
the Python interpreter.
The developers of Python prioritise keeping the language enjoyable to use.
This is reflected in the name of the language, which pays homage to the British
comedy group Monty Python, as well as in the language's occasionally lighthearted
approach to tutorials and reference materials, as in the use of examples like spam
and eggs (from a well-known Monty Python sketch) rather than the more traditional
foo and bar.
Python uses duck typing: it has typed objects but untyped variable
names. Type constraints are not checked at compile time; instead, operations on an
object may fail at run time because the object's type is unsuitable. Despite its
dynamic typing, Python is strongly typed: it forbids operations that are not well
defined rather than silently attempting to make sense of them.

Figure 7 Work flow of a source code

5.4.1 The Python Platform:


Python's platform module can be used to get data on the underlying platform,
including details on the hardware, operating system, and interpreter version. Tools
to view information about the hardware, operating system, and interpreter version of
the platform where the programme is running are included in the platform module.
Four functions report on the current Python interpreter. python_version() and
python_version_tuple() return the interpreter's major, minor, and patch-level
components in different forms. python_compiler() reports the compiler used to
build the interpreter, and python_build() returns a version string for the
interpreter's build.
platform() returns a string containing a general-purpose platform identifier. The
function accepts two optional Boolean arguments: if aliased is true, the names in
the return value are converted from their formal names to more common informal
forms, and if terse is true, a minimal value with some parts omitted is returned.
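A minimal sketch of these calls follows; the printed values naturally vary from machine to machine, so none are shown as fixed outputs.

```python
import platform

# Query the interpreter and host platform (values vary by machine)
print(platform.python_version())        # e.g. "3.11.4"
print(platform.python_version_tuple())  # major, minor, patch as a tuple
print(platform.python_compiler())       # compiler used to build the interpreter
print(platform.python_build())          # build number and date

print(platform.platform())                          # full identifier
print(platform.platform(aliased=True, terse=True))  # informal, shortened form
```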

5.4.2 WHAT DOES PYTHON TECHNOLOGY DO?


Python is very well-liked by programmers, but actual usage demonstrates that
business owners also believe in Python development, and for good reason. Its
reputation as one of the simplest programming languages to learn and its simple
syntax make it a favourite among software engineers. The fact that there is a
framework for almost everything, from web apps to machine learning, is
appreciated by business owners or CTOs.
However, it is more of a technology platform that was created by a massive
collaboration amongst hundreds of independent professional developers who came
together to establish a sizable and unusual community of enthusiasts.
What specific advantages does the language offer to those who adopt it as their
primary technology? Here are just a few of the reasons.

5.4.3 PRODUCTIVITY AND SPEED


It is often claimed in the programming community that writing Python
code can be up to 10 times faster than writing Java or C/C++ code for the same
application. The clear object-oriented design, strong process-control capabilities,
excellent integration, and text-processing facilities all contribute to this
outstanding advantage in development time. Moreover, Python's own unit-testing
framework makes a significant contribution to productivity.

5.4.4 PYTHON IS POPULAR FOR WEB APPS

The market still favors technology for quick and efficient web development because
web development isn't showing any signs of slowing down. Together with
JavaScript and Ruby, Python also offers excellent support for creating web apps and
is very well-liked in the web development world thanks to its most well-known web
framework, Django.

5.5 OPEN-SOURCE AND FRIENDLY COMMUNITY


As noted on the official website, Python is developed under an OSI-approved
open-source licence, making it freely usable and distributable. Furthermore, the
community actively organizes conferences, meet-ups, hackathons, and similar
events to promote camaraderie and knowledge sharing.

5.5.1 PYTHON IS QUICK TO LEARN


The language is relatively simple to pick up, so you can get results quickly
without spending too much time on constant refinements or digging into the
complex engineering details of the technology. Python programmers are in high
demand these days, and the language's friendliness and approachability only
increase the number of people eager to master it.

5.5.2 BROAD APPLICATION

Python is used for the broadest spectrum of activities and applications in nearly
every industry, ranging from simple automation tasks to gaming, web development,
and even complex enterprise systems. These are the areas where the technology
still leads with little or no competition:
● Machine learning as it has a plethora of libraries implementing machine learning
algorithms.
● Web development as it provides back end for a website or an app.
● Cloud computing as Python is also known to be among one of the most popular
cloud-enabled languages even used by Google in numerous enterprise-level
software apps.
● Scripting.
● Desktop GUI applications.

CHAPTER 6
EXPERIMENTAL PROCEDURE

6.1 IMAGE PROCESSOR


An image processor carries out the tasks of image acquisition, storage,
preprocessing, segmentation, representation, recognition, and interpretation, and
finally displays or records the resulting image. The basic steps of an image
processing system are shown in the block diagram below.

[Block diagram: problem domain → image acquisition → preprocessing →
segmentation → representation & description → recognition & interpretation →
result, with every stage guided by the knowledge base]

Fig 8 Block diagram of the fundamental sequence involved in an image
processing system

The process begins with image acquisition: an imaging sensor and a digitizer
capture and digitize the image, as shown in the diagram. The next phase is
preprocessing, in which the image is enhanced before being fed as input to the
later stages; preprocessing typically involves contrast improvement, noise
removal, and region isolation. Segmentation divides an image into its individual
objects or components. The result of segmentation is usually raw pixel data
describing either a region's boundary or all the pixels within it. Representation
converts this raw pixel data into a form the computer can use for further
processing, and description extracts key characteristics that distinguish one class
of objects from another. Recognition assigns a label to an object based on the
information provided by its descriptors, and interpretation assigns meaning to an
ensemble of recognised objects. The knowledge base holds information about the
particular problem domain; it guides the operation of each processing module and
regulates how the modules communicate with one another. Not all modules are
required for a given task: the application determines how the image processing
system is built. The image processor typically operates at 25 frames per second
or less.

6.2 IMAGE PREPROCESSING:

The input image may vary in size, contain noise, and use a different colour
scheme; these properties must be adjusted to match the requirements of the
processing pipeline. Image noise is most visible in regions with low signal levels,
such as shadow areas or underexposed photographs. There are numerous kinds of
noise, including film grain, salt-and-pepper noise, and others, and filtering
algorithms are used to remove them; the Wiener filter is one of the many filters
employed. In this module the acquired image is processed so that subsequent
stages produce accurate output. Preprocessing must be applied to all images to
improve the final result.
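As a small illustration of the noise-removal step, here is a 3x3 median filter in NumPy. It is a stand-in chosen because the median is a textbook remedy for the salt-and-pepper noise mentioned above, not the project's actual filter, and the function name is ours.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter: a common remedy for salt-and-pepper noise.
    Edges are handled by padding with the border values."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Collect the 9 shifted views covering each pixel's 3x3 neighbourhood
    stack = np.stack([padded[i:i + h, j:j + w]
                      for i in range(3) for j in range(3)])
    return np.median(stack, axis=0)
```

A single bright "salt" speck surrounded by dark pixels is replaced by the neighbourhood median and disappears, while large uniform regions are left untouched.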

6.3 IMAGE SEGMENTATION:


Facial image segmentation is the technique and process of dividing a facial
image into distinct, meaningful regions and locating objects of interest within
those regions. Facial image segmentation technology has become widely employed
in processing facial data in recent years.

6.4 FEATURE EXTRACTION:

Statistics is the study of the collection, organization, analysis, and
interpretation of data. It covers every aspect of this, including the design of
surveys and experiments and the planning of data collection. In this module the
following statistical characteristics of the image are computed: mean, variance,
skewness, and standard deviation.
Texture is analysed using the gray-level co-occurrence matrix (GLCM), also
referred to as the gray-level spatial dependence matrix: a statistical technique
for examining texture that takes into account the spatial relationship of pixels.
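The four first-order statistical features listed above can be sketched with NumPy as follows; the function name and the returned dictionary layout are illustrative assumptions.

```python
import numpy as np

def statistical_features(img):
    """First-order statistical features of a grayscale image."""
    pixels = np.asarray(img, dtype=float).ravel()
    mean = pixels.mean()
    variance = pixels.var()
    std = pixels.std()
    # Skewness: third standardized moment (0 for a symmetric distribution)
    skewness = ((pixels - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    return {"mean": mean, "variance": variance,
            "std": std, "skewness": skewness}
```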

6.5 IMAGE ENHANCEMENT:

Image enhancement methods can be grouped into spatial-domain approaches and
frequency-domain techniques, and different techniques suit different images.
Enhancement includes smoothing the image and removing noise, blur, and similar
defects. Gabor filters have been shown to be effective in removing noise and blur,
and the filtered image is advantageous in the phase that follows.

6.6 CLASSIFICATION:

To classify a piece of data into one of several classes or categories, the
relationship between the data and the classes must be clearly understood, and for
a computer to accomplish this it must be trained; training is essential for
classification success. Features are characteristics of the data items that serve as
the basis for assigning them to different groups.
1) The image classifier acts as a discriminant, favouring some classes over others.
2) In the multiclass case, the discriminant value is highest for one class and
lower for the other classes.
3) In the two-class case, the discriminant value is positive for one class and
negative for the other.
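The two discriminant rules can be illustrated in a few lines; the score values below are made up purely for the example.

```python
import numpy as np

# Hypothetical discriminant scores from a 3-class classifier for one sample:
# the predicted class is the one with the highest score (multiclass rule).
scores = np.array([0.1, 2.3, -0.7])
predicted_class = int(np.argmax(scores))

# Two-class rule: a single discriminant value, positive for one class
# and negative for the other.
two_class_score = -1.4
predicted_label = 1 if two_class_score > 0 else -1
```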

6.7 API DOCUMENTATION GENERATORS

Generators of Python API documentation include:


● Sphinx
● Epydoc
● HeaderDoc
● Pydoc

6.7.1 USES

Python has been successfully embedded as a scripting language in many software
products, including finite element method software such as Abaqus, 3D
parametric modellers like FreeCAD, 3D animation packages such as 3ds Max,
Blender, Cinema 4D, LightWave, Houdini, Maya, modo, MotionBuilder, and
Softimage, the visual effects compositor Nuke, 2D imaging programs like GIMP,
Inkscape, Scribus, and Paint Shop Pro, and musical notation programs like
scorewriter and capella. GNU Debugger uses Python as a pretty printer to
display complex structures such as C++ containers. Esri recommends Python as
the best language for writing scripts for ArcGIS. It has also been incorporated
into a number of video games, and Google App Engine chose it as the first of the
three programming languages it offers, the other two being Java and Go.

With the aid of packages like TensorFlow, Keras, and Scikit-learn, Python is
frequently used in artificial intelligence projects. Python is frequently used for
natural language processing because it is a scripting language with a modular
architecture, easy syntax, and rich text processing facilities.

Python comes preinstalled on many operating systems and can be used from the
command line; it is included with most Linux distributions, AmigaOS 4, FreeBSD
(as a package), NetBSD, OpenBSD (as a package), and macOS (via the terminal).
Several Linux distributions use Python-based installers: Red Hat Linux and
Fedora use the Anaconda installer, while Ubuntu uses the Ubiquity installer.
Gentoo Linux's Portage package manager is also written in Python.

Python is widely used in the information security sector, notably for the creation of
exploits.

Python is the primary programming language used in Sugar Labs' development of
the One Laptop per Child XO software, and it has been chosen as the main
user-programming language for the Raspberry Pi single-board computer project.

Python is a component of LibreOffice, which aims to replace Java with it; its
Python Scripting Provider became a core component with version 4.0, released
on 7 February 2013.
6.8 PANDAS
pandas is a software library for the Python programming language designed
for data manipulation and analysis. It provides data structures and operations
for working with numerical tables and time series, and it is free software
distributed under the three-clause BSD license. The name is derived from
"panel data," a term used in econometrics for data sets that contain
observations of the same individuals over multiple time periods.

Fig 9 Software libraries


6.8.1 LIBRARY FEATURES

● A data manipulation object called a DataFrame with integrated indexing.
● Tools for reading and writing data between in-memory data structures and
different file formats.
● Data alignment and integrated handling of missing data.
● Reshaping and pivoting of data sets.
● Label-based slicing, fancy indexing, and subsetting of large data sets.
● Inserting and deleting columns from data structures.
● A groupby engine allowing split-apply-combine operations on data sets.
● Merging and joining of data sets.
● Hierarchical axis indexing to work with high-dimensional data in a
lower-dimensional data structure.
● Time series functionality: date shifting, frequency conversion, lagging,
moving window statistics, and moving window linear regressions.
● Data filtering.
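A small sketch showing several of these features in action; the attendance-style column names and values are invented for illustration only.

```python
import pandas as pd

# A small attendance-style frame (names and values are illustrative)
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Mina", "Ravi"],
    "date": pd.to_datetime(["2023-01-09", "2023-01-09",
                            "2023-01-10", "2023-01-10"]),
    "present": [True, True, False, True],
})

# Boolean filtering / subsetting
present_only = df[df["present"]]

# Split-apply-combine with the groupby engine
per_student = df.groupby("name")["present"].sum()

# Inserting a derived column (time series functionality)
df["weekday"] = df["date"].dt.day_name()
```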

Fig 10 Python panda features


6.9 CSV READER

The CSV (Comma Separated Values) file format is straightforward and used to
store tabular data in spreadsheets and databases. Tabular data (numbers and text) is
stored as plain text in a CSV file. The file's lines each contain a data record. One or
more fields, separated by commas, make up each record. The name of this file
format is derived from the fact that fields are separated by commas.

Python includes a module called csv that may be used to open and read CSV files.

Fig 11 Opening a CSV reader
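A minimal example of the csv module follows; an in-memory string stands in for a real file, and the column names are illustrative.

```python
import csv
import io

# A CSV file is plain text: one record per line, fields separated by commas.
data = "name,roll,present\nAsha,101,yes\nRavi,102,no\n"

# csv.reader accepts any iterable of lines; io.StringIO stands in
# for open("attendance.csv", newline="")
rows = list(csv.reader(io.StringIO(data)))
header, records = rows[0], rows[1:]
```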


CHAPTER-7
PROCESSOR
7.1 INTRODUCTION TO PROCESSOR

The processor is a chip or logic circuit that responds to and processes the basic
instructions that drive a computer. Its primary tasks are the fetching, decoding,
execution, and write-back of instructions. Often called the brain of the system, a
processor is found in any computing device: computers, laptops, smartphones,
embedded systems, and so on. A processor has two main components: the control
unit (CU) and the arithmetic logic unit (ALU). The control unit functions like a
traffic cop, managing the flow of instructions, while the arithmetic logic unit
performs all mathematical operations such as addition, multiplication,
subtraction, and division. The processor also communicates with the other
components: input/output devices, memory, and storage devices.

7.2 GENERAL PURPOSE PROCESSOR

There are five different types of general-purpose processor: the media processor,
embedded processor, DSP, microcontroller, and microprocessor.

7.3 MICROPROCESSOR
In embedded systems, the microprocessor is the representative general-purpose
processor, and numerous types from various manufacturers are on the market. A
microprocessor includes a control unit, an ALU, and a number of registers:
control registers, status registers, and scratchpad registers. There may also be
on-chip memory, along with ports, interrupt lines, and other lines for memory
and interfaces for interacting with the outside world. Ports are often called
programmable ports because they can be programmed to operate as either inputs
or outputs. General-purpose processors are listed in the table below.

7.3.1 BASIC COMPONENTS OF PROCESSOR

⮚ The ALU (arithmetic logic unit) executes all arithmetic and logic operations.
⮚ The FPU (floating point unit), also called the math coprocessor, handles
mathematical calculations.
⮚ Registers store instructions and data; they supply operands to the ALU and
hold the results of operations.
⮚ Cache memory reduces the time needed to fetch data from main memory.

7.3.2 PRIMARY CPU PROCESSOR OPERATIONS

⮚ Fetch – the instruction is obtained from the main memory unit (RAM).
⮚ Decode – the instruction is converted into a form the other components of the
CPU can act on; this entire operation is performed by the decoder.
⮚ Execute – the operation is carried out, with every CPU component needed to
perform the instruction activated.
⮚ Write-back – after execution, the result is written back.
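The four operations above can be sketched as a toy interpreter loop; the two-register machine and its instruction names are invented purely for illustration.

```python
def run(program):
    """Toy fetch-decode-execute-write-back loop for a two-register machine."""
    regs = {"A": 0, "B": 0}
    pc = 0                            # program counter
    while pc < len(program):
        instr = program[pc]           # fetch from "memory"
        op, *args = instr.split()     # decode into opcode and operands
        if op == "LOAD":              # execute ...
            result = int(args[1])
        elif op == "ADD":
            result = regs[args[0]] + regs[args[1]]
        regs[args[0]] = result        # ... and write back the result
        pc += 1
    return regs

regs = run(["LOAD A 2", "LOAD B 3", "ADD A B"])
```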
7.4 TYPES OF PROCESSOR

Here we discuss the different types of CPU (processor) used in computers. In
short, there are five types of processor.

7.4.1 SINGLE CORE PROCESSOR

Single-core CPUs were used in traditional computers. These CPUs could perform
only one operation at a time, so they were poorly suited to multitasking, and
running multiple programs at the same time degraded the overall performance of
the computer system.

In a single-core CPU, a FIFO (first in, first out) model is used: operations go
to the CPU for processing in order, and the remaining operations wait until the
first operation is completed.
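The FIFO model can be sketched with a queue; the operation names below are placeholders.

```python
from collections import deque

def run_fifo(operations):
    """Single-core sketch: one operation at a time, first in, first out."""
    queue = deque(operations)
    completed = []
    while queue:
        op = queue.popleft()   # only the front operation gets the core
        completed.append(op)   # simulate running it to completion
    return completed

order = run_fifo(["op1", "op2", "op3"])
```

Operations complete strictly in arrival order, which is why a long-running front operation stalls everything behind it.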

7.4.2 DUAL CORE PROCESSOR

A dual-core processor combines two processors on a single integrated circuit.
Each core has its own local cache and controller, enabling it to complete
demanding tasks faster than a single-core CPU.

The Intel Core Duo, AMD X2, and the dual-core PowerPC G5 are a few examples
of dual-core CPUs in use.

7.4.3 MULTI CORE PROCESSOR


A multi-core processor is designed with several processing units ("cores") on one
chip, and every core can carry out its own tasks. For example, if you are doing
multiple activities at the same time, such as using WhatsApp while playing a
game, one core can handle WhatsApp while another core manages the game.

7.4.4 QUAD CORE PROCESSOR

A quad-core processor is a high-powered CPU in which four processor cores are
combined into one processor. Each core can execute and process instructions on
its own without relying on the other cores, so quad-core processors can execute
large numbers of instructions at a time without long wait queues. A quad-core
CPU enhances the processing power of a computer system, but its performance
also depends on the other computing components in use.

7.4.5 OCTA CORE PROCESSOR

An octa-core processor is designed with a multiprocessor architecture, and this
design yields higher processing speed. Octa-core processors excel at multitasking
and boost the efficiency of the CPU; processors of this type are mostly used in
smartphones.

7.5 WEB CAM

A webcam is a video camera that streams or sends its images live to or through a
computer network. The computer can capture the video stream, which can then be
saved, viewed, or shared across networks via the internet or as an email
attachment, or transferred on to a remote destination.
Unlike an IP camera, which connects using Ethernet or Wi-Fi, a webcam is
typically connected through a USB or similar cable, or is integrated into the
computer hardware, as in laptops.

Fig 12 Web cam

7.5.1 VIDEO CALLING AND VIDEOCONFERENCING

One-to-one live video communication over the Internet (videotelephony,
videophone calls, and videoconferencing) has now reached millions of ordinary
PC users worldwide. Cameras can be added to instant messaging, text chat
services such as AOL Instant Messenger, and VoIP systems such as Skype, and
with improved visual quality webcams are beginning to replace traditional video
conferencing solutions.
Webcams are becoming more and more popular thanks to new capabilities like
automatic lighting adjustments, real-time upgrades (retouching, wrinkle smoothing,
and vertical stretch), automatic face tracking, and autofocus. Program, computer
operating system, and CPU capabilities can all affect the webcam's functionality
and performance. Several well-known instant messaging apps now include
capabilities for video calling.
7.5.2 VIDEO SECURITY

Security cameras can be made from webcams. There is software that enables
PC-connected cameras to listen for sound and watch for movement, recording
when either is detected. These recordings can then be downloaded from the
Internet, saved
to a PC, or sent via email. In one well-known instance, the owner of the computer
was able to provide authorities with a clear photograph of the burglar's face even
after the computer had been stolen because the burglar e-mailed pictures of himself
while the computer was being stolen. Webcam access without authorization might
cause serious privacy problems (see "Privacy" section below).

7.5.3 VIDEO CLIPS AND STILLS

Both static images and video can be captured using webcams. For this, a variety of
widely used software programmes can be used, such as Pic Master (for Windows
operating systems), Photo Booth (Mac), or Cheese (on Unix systems). See the
Comparison of webcam software for a more comprehensive list.

7.5.4 INPUT CONTROL DEVICES

Specialised software can use the video stream from a webcam to facilitate or
improve a user's control over applications and games. Video features such as
faces, shapes, models, and colours can be detected and tracked to provide this
kind of control. For instance, a head-mounted light would enable hands-free
computing and significantly improve computer accessibility, and the position of a
single light source can be tracked and used to emulate a mouse pointer.
In games this can give players more control, better involvement, and a more
immersive experience. FreeTrack is a free webcam motion-tracking program for
Windows that can track a special head-mounted model in up to six degrees of
freedom and output the data to mouse, keyboard, joystick, and
FreeTrack-compatible games. The webcam's IR filter can be removed so that IR
LEDs, which are invisible to the naked eye, can be used without distracting the
user. TrackIR is a commercial application of this technology.

7.5.5 ASTRO PHOTOGRAPHY

A select few webcam models with very low-light capability are frequently used
by astronomers and astrophotographers to capture images of the night sky. These
cameras often have manual focus and a somewhat older CCD array rather than a
CMOS array. The camera lens is removed and the camera body is mounted on a
telescope to record stills, video, or both. In more recent methods, video of
extremely faint objects is captured for a few seconds, and all of the video's
frames are then "stacked" together to create a still image with decent contrast.
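The stacking idea can be sketched numerically: averaging many noisy frames of the same scene suppresses the noise (which falls roughly as 1/sqrt(N) for independent noise). The scene, noise level, and frame count below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
scene = np.full((50, 50), 10.0)                  # faint, constant target
frames = [scene + rng.normal(0, 5, scene.shape)  # 100 noisy video frames
          for _ in range(100)]

stacked = np.mean(frames, axis=0)                # the "stacked" still image
```

Each individual frame has a pixel noise standard deviation of about 5, while the stacked image's noise drops to roughly 0.5, letting the faint target stand out.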

7.6 OPENCV

A group of enthusiastic programmers created the Open Source Computer Vision
Library (OpenCV) in 1999 to bring image processing to a wide range of
programming languages. It runs on Windows, Linux, Android, and macOS and
features C++, C, and Python interfaces.
The OpenCV project, formally launched in 1999, began as an Intel Research
initiative to advance CPU-intensive applications; it was one of several projects
that also included real-time ray tracing and 3D display walls. The project's major
contributors included the Intel Performance Library Team and several
optimization specialists from Intel Russia. To make code more readable and
transferable, a standard infrastructure was created that developers could use to
spread vision knowledge, and by making portable, performance-optimized code
freely available under a license that does not require the code built on it to be
open or free, the project aimed to advance commercial vision-based applications.
The first alpha version of OpenCV was released to the public at the IEEE
Conference on Computer Vision and Pattern Recognition in 2000, and five beta
versions followed between 2001 and 2005. The initial 1.0 version was released in
2006, and a version 1.1 "pre-release" was made available in October 2008.
OpenCV's second major release came in October 2009: OpenCV 2 made
significant changes to the C++ interface, aiming at new functions, easier and
more type-safe patterns, and improved performance for existing ones (especially
on multi-core systems). The frequency of official releases has since increased to
every six months, and development is backed by independent Russian developers
and private businesses. In August 2012, maintenance of OpenCV was taken over
by the non-profit foundation OpenCV.org, which runs a development and user
website. In May 2016, Intel signed an agreement to acquire Itseez, a well-known
OpenCV developer.

7.7 YOLO

YOLO is a method that provides real-time object detection using neural
networks. Object detection in computer vision is the problem of identifying
different things in digital images or video, and YOLO is an algorithm that can
find and identify multiple objects in an image. The YOLO framework (You Only
Look Once) approaches object detection differently from earlier methods: it
predicts the bounding box coordinates and the class probabilities for those boxes
from the complete image in a single pass. The main benefit of adopting YOLO is
its outstanding speed; it can process 45 frames per second.
Moreover, YOLO is aware of generalized object representation. This is one
of the best object detection algorithms and has demonstrated performance that is
comparable to R-CNN algorithms. We shall learn about various methods employed
by the YOLO algorithm in the parts that follow. One of the fundamental issues in
computer vision is object detection, where the goal is to identify what and where,
particularly what things are present in a given image and where they are located
within the image. Object detection is a more challenging task than classification,
which can also identify items but does not tell the viewer where they are in the
image. Moreover, classification fails for photos with several objects. YOLO takes a
very different tack. A smart convolutional neural network (CNN) called YOLO is
used to recognise objects in real time. A single neural network is applied to the
entire image by the algorithm, which then divides it into regions and forecasts
bounding boxes and probabilities for each region. The projected probabilities are
used to weight these bounding boxes.
Because it can run in real-time and attain great accuracy, YOLO is well-liked.
In the sense that it only needs to perform one forward propagation run through the
neural network to produce predictions, the algorithm "only looks once" at the
image. It outputs recognised items along with the bounding boxes after non-max
suppression. A single CNN predicts several bounding boxes and class probabilities
for those boxes simultaneously with YOLO. YOLO moves quite quickly. YOLO
implicitly encodes contextual information about classes in addition to their outward
appearance because it views the full image during training and testing. YOLO
develops generalizable representations of objects, outperforming previous top
detection techniques when trained on natural photos and tested on creative works. A
network called You Only Look Once (YOLO) employs Deep Learning (DL)
techniques for object detection. YOLO accomplishes object detection by
categorising certain things in the image and locating them on it. A YOLO network,
for instance, will produce a vector of bounding boxes for each individual sheep and
identify it as such if you input a picture of a herd of sheep.
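The non-max suppression step mentioned above, which reduces many overlapping predictions to one box per object, can be sketched as follows. The (x1, y1, x2, y2) box layout and the threshold value are illustrative assumptions, not the report's actual YOLO code.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, dropping overlaps above iou_thresh.
    boxes is an array of (x1, y1, x2, y2) rows; a minimal NMS sketch."""
    order = np.argsort(scores)[::-1]       # indices, best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]    # suppress heavy overlaps
    return keep
```

Two near-duplicate boxes on the same object collapse to the higher-scoring one, while a distant box on a different object survives.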
7.7.1 HOW YOLO IMPROVES OVER PREVIOUS OBJECT DETECTION
METHODS-

In the past, object detection was performed in a multi-step pipeline using
techniques such as Region-based Convolutional Neural Networks (R-CNN),
including Fast R-CNN. R-CNN trains each component separately while
concentrating on a particular region of the image.

This procedure takes a long time because R-CNN must classify 2000 regions per
image (47 seconds per individual test image), so real-time implementation is not
possible. Furthermore, R-CNN employs a fixed region-selection method, meaning
no learning takes place at this stage and the network may produce poor region
proposals.

As a result, object detection networks like R-CNN are slower than YOLO and are
more difficult to improve. YOLO uses just one neural network to run all
components of the task, making it faster (45 frames per second) and simpler to
optimise than earlier techniques.

To understand what YOLO is, we must first examine its architecture and
algorithm.

YOLO architecture: structural design and algorithm operation

A YOLO network has three essential components: first, the algorithm, sometimes
referred to as the predictions vector; second, the network; and third, the loss
function.

7.8 CNN ARCHITECTURE

This research work describes image classification using a deep neural network
combined with HOG feature extraction and a K-means segmentation algorithm,
with classification through an SVM classifier for higher accuracy. The proposed
system has the following advantages:
1) The proposed CNN method reduces the number of preprocessing steps.
2) Extra shape features extracted with the HOG algorithm provide better
accuracy.
3) The SVM classifier reduces the complexity of the work and improves the
robustness of the system.

7.8.1 DEEP NEURAL NETWORK

A complete 2D convolutional neural network consists of an image input layer,
convolution layers, ReLU layers, max-pooling (maxpooling2d) layers, a fully
connected layer, a softmax layer, and a classification layer. A detailed
description of each layer follows.
(1) Image input layer: the image input layer receives the input image; this first
step defines the pixel dimensions of the input image.
(2) Convolution layer: the convolution layer extracts features from the output of
the image input layer. It consists of one or more kernels with different weights
that are used to extract features of the input image; depending on the weights
associated with each filter, different image features are extracted.
(3) Pooling layer: the pooling layer applies downsampling to the convolved image
features after the non-linearity, reducing the dimensions of the feature maps.
(4) Fully connected layer: the fully connected layer connects to the 26 image
classes; the five blocks of layers above are interconnected and feed into it, and
the predicted class is chosen from the class scores it produces.
Fig 13 CNN Architecture
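The layer stack described above can be sketched as a minimal pure-Python forward pass. The toy 6×6 input, the 3×3 edge kernel, and the random fully connected weights are illustrative assumptions, not the project's trained network; only the sequence of operations (convolution → ReLU → max-pooling → fully connected → softmax over 26 classes) mirrors the text:

```python
import math
import random

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def maxpool2d(fmap, size=2):
    """Non-overlapping max pooling: halves each spatial dimension."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, w - size + 1, size)]
            for i in range(0, h - size + 1, size)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 6x6 "image" and a 3x3 vertical-edge kernel (illustrative).
image = [[float((i + j) % 5) for j in range(6)] for i in range(6)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

features = maxpool2d(relu(conv2d(image, kernel)))  # 4x4 map pooled to 2x2
flat = [v for row in features for v in row]

# Fully connected layer mapping the pooled features to 26 class scores.
random.seed(0)
weights = [[random.uniform(-0.1, 0.1) for _ in flat] for _ in range(26)]
scores = [sum(w * x for w, x in zip(wrow, flat)) for wrow in weights]
probs = softmax(scores)
print(len(probs))  # 26 class probabilities, summing to 1
```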

CHAPTER 8
RESULTS

8.1 DATASET CREATION


In this module, the dataset for each student is created using OpenCV-Python.
One thousand images were collected from every student to create the dataset.

Fig 14 Dataset creation
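A minimal sketch of such a capture loop, assuming OpenCV's bundled Haar cascade detector, a webcam at index 0, and a one-folder-per-student file layout (the `frame_path` helper and the layout are illustrative choices, not the project's exact code):

```python
import os

def frame_path(root, student_id, index):
    """Illustrative dataset layout: one folder per student,
    zero-padded numbered face crops inside."""
    folder = os.path.join(root, student_id)
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, f"{index:04d}.jpg")

def capture_dataset(student_id, root="dataset", n_images=1000):
    """Grab grayscale face crops from the default webcam
    (needs OpenCV and a camera at runtime)."""
    import cv2  # imported here so frame_path stays usable without OpenCV
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cam = cv2.VideoCapture(0)
    count = 0
    while count < n_images:
        ok, frame = cam.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            cv2.imwrite(frame_path(root, student_id, count),
                        gray[y:y + h, x:x + w])
            count += 1
    cam.release()

print(frame_path("dataset", "191EC152", 7))
```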

8.2 DATA COLLECTION



Fig 15 Dataset collection

8.3 FACE EXTRACTION

Fig 16 Face extraction

8.4 OUTPUT
Fig 17 Result

8.5 EXPECTED RESULT


The intended outcome of the proposed system is to automatically record each
student's attendance in class from a single group video. Based on each student's
presence or absence, as determined by the face-recognition model, attendance is
ultimately marked on a website that records the date, time, and name of each
person present.
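The final marking step described above can be sketched as a small helper that writes one row per recognized student with name, date, and time. The CSV format, the `mark_attendance` name, and the de-duplication of repeated detections are illustrative assumptions, not the project's exact website back end:

```python
import csv
import io
from datetime import datetime

def mark_attendance(recognized_names, stream, when=None):
    """Append one CSV row per recognized student: name, date, time.
    Repeated detections of the same face are collapsed to one row."""
    when = when or datetime.now()
    writer = csv.writer(stream)
    for name in sorted(set(recognized_names)):
        writer.writerow([name,
                         when.strftime("%Y-%m-%d"),
                         when.strftime("%H:%M:%S")])

# Illustrative run with a fixed timestamp and hypothetical names.
buf = io.StringIO()
mark_attendance(["GURURAJ", "HARIRAM", "GURURAJ"], buf,
                when=datetime(2023, 4, 10, 9, 15, 0))
print(buf.getvalue())
```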

CHAPTER 9

9.1 CONCLUSION
The automated attendance system is intended to lessen the shortcomings of the
previous, manual method. This attendance system demonstrates the application of
image-processing techniques in the classroom. The proposed automated attendance
system using face recognition is a practical way of recording student attendance
in a classroom, and it also helps reduce the likelihood of proxies and fake
attendance. Many biometric methods are available in the modern world, yet facial
recognition emerges as a promising solution due to its high accuracy and minimal
need for human participation. The system's goal is to offer a high level of
security. Beyond simply assisting with attendance, this technique can also
enhance an institution's reputation.

9.2 REFERENCES

[1] P. Cocca, F. Marciano, and M. Alberti, ``Video surveillance systems to
enhance occupational safety: A case study,'' Saf. Sci., 2016.
[2] M. L. Garcia, Vulnerability Assessment of Physical Protection Systems.
Oxford, U.K.: Heinemann, 2006.
[3] M. P. J. Ashby, ``The value of CCTV surveillance cameras as an
investigative tool: An empirical analysis,'' Eur. J. Criminal Policy Res., 2017.
[4] B. C. Welsh, D. P. Farrington, and S. A. Taheri, ``Effectiveness and
social costs of public area surveillance for crime prevention,'' 2015.
[5] The Effectiveness of Public Space CCTV: A Review of Recent Published
Evidence Regarding the Impact of CCTV on Crime, Police Community Saf.
Directorate, Scottish Government, Edinburgh, U.K., 2009.
[6] W. Hu, T. Tan, L. Wang, and S. Maybank, ``A survey on visual surveillance
of object motion and behaviors,'' IEEE Trans. Syst., Man, Cybern. C, Appl.
Rev., 2004.
[7] P. L. Venetianer and H. Deng, ``Performance evaluation of an intelligent
video surveillance system: A case study,'' Comput. Vis. Image Understand.,
Nov. 2010.
[8] V. Tsakanikas and T. Dagiuklas, ``Video surveillance systems: Current
status and future trends,'' Comput. Electr. Eng., Aug. 2018.
[9] L. Patino, T. Nawaz, T. Cane, and J. Ferryman, ``PETS 2017: Dataset and
challenge,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops
(CVPRW), Honolulu, HI, USA, Jul. 2017.
[10] G. Awad, A. Butt, J. Fiscus, D. Joy, A. Delgado, M. Michel,
A. F. Smeaton, Y. Graham, W. Kraaij, G. Quénot, M. Eskevich, R. Ordelman,
G. J. F. Jones, and B. Huet, ``TRECVID 2017: Evaluating ad-hoc and instance
video search, events detection, video captioning, and hyperlinking,'' 2018.
ANNEXURE – I
WORK CONTRIBUTION

PROJECT TITLE: ARTIFICIAL INTELLIGENCE BASED REAL-TIME ATTENDANCE SYSTEM USING
FACE RECOGNITION

INDIVIDUAL CONTRIBUTION OF STUDENT 1

Student Name: GURURAJ R

Register Number: 191EC152

Role in the project: Designing UI using Angular JS for creation of login page,
signup page and for all the dashboard visualizations.

INDIVIDUAL CONTRIBUTION OF STUDENT 2

Student Name: HARIRAM S

Register Number: 191EC156

Role in the project: Data collection, data organization, feature extraction, and
output design execution.

INDIVIDUAL CONTRIBUTION OF STUDENT 3

Student Name: GOWTHAM G K


Register Number: 191EC146
Role in the project: Data visualization in Python coding at the end of the project.
ANNEXURE 2:
ANNEXURE 3:
