
Volume 9, Issue 10, October 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT808

Precision-based Face Detection Algorithm Implementation on FPGA

Rohit Kumar Singh¹, MTech, VLSI and Embedded Systems, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India
Hridayjit Nayak², Senior Year, Folsom High School, California, USA
Arti Singh³, MTech, Electronics and Communication Engineering, Nirma University, Ahmedabad, Gujarat, India

Abstract:- Face detection is a crucial step in implementing a face recognition and tracking system, which is used in security, surveillance, biometrics, artificial intelligence, etc. Face detection is a technique in which face(s) in an image or video, and their location, are identified. Face detection can be implemented using different algorithms, depending on the accuracy and the processing capabilities of the system on which it is implemented. The accuracy of detection is highly influenced by factors like illumination, head pose and occlusion. This paper discusses the implementation of the Viola-Jones algorithm for face detection in an image. This algorithm works on Haar features extracted from a face. The Viola-Jones algorithm is highly accurate, but it requires a large amount of resources. The complexity level of this algorithm is very high, and it is suited to applications where the accuracy of the system is a major concern.

Keywords:- Accuracy, Algorithm, Haar Features, Viola-Jones.

I. INTRODUCTION

Face detection is an application of object detection. Object detection is finding the location of objects of a particular class, like car, building, face, etc., in an image or video. So, face detection is a technique to find the location of face(s) in a given image/video. The available algorithms for face detection focus on the detection of the frontal human face. Since lots of robust hardware and algorithms are available, there is motivation to use face detection in a wide range of applications. Although powerful algorithms and hardware are available today, face detection systems are still not 100% accurate because they work within some constraints. Illumination, pose, occlusion, etc. are some factors which affect the performance of the system. An efficient face detection algorithm is one which takes care of all the factors like pose, illumination and occlusion. But accounting for these factors increases the complexity of the system as well as the time taken by the algorithm to detect the face. The response time of the face detection system acts as a bottleneck in system development, so a minimum response time of the face detection system is the need of the hour.

The detection of a face in an input image/video is the first step in any face tracking or face recognition system, and it plays an important role in surveillance systems, where it can be very helpful in many cases, such as finding suspects or convicts. For example, if a webcam is connected to a display, it can detect any face that walks by in front of the webcam. Once this information is stored, a number of operations can be performed on it in order to detect gender/race/age. Face detection systems also have many applications in the fields of biometrics, robotics, human interfaces and other commercial uses.

A. Factors affecting Face Detection:
Below are some factors which can affect the result of face detection in an input image/video:

• Head Pose –
Due to the pose of the head, some of the facial features like the nose, eyes, cheeks or lips may be blocked partially or fully. In an input image the location of the faces may vary due to profile, half-profile, frontal-plane rotation and upside-down orientation. Figure 1 below shows different poses of the head.

Fig 1 Different Poses of Head

• Occlusion –
Occlusion is a type of obstruction of a face in an input image, in which the face is covered either partially or fully by some other object. Figure 2 below shows some occluded faces, which can affect the result of face detection.

IJISRT24OCT808 www.ijisrt.com 1385



Fig 2 Occluded Face Images

• Illumination –
Illumination determines the quality of images and affects the result of face detection algorithms. This illumination factor is correlated with the lighting and the angle at which light falls on the face in the image. Figure 3 below shows a face under different illumination conditions.

Fig 3 Same Faces Under Different Illumination Conditions

• Image Orientation –
This factor depends upon the nature of the input image, which may appear upside down, rotated, inverted, or in the correct form.

• Computation Time and Speed –
For real-time applications, an algorithm's computation time plays an important and critical role. The computation time depends upon the algorithm's complexity and the availability of hardware resources.

• Facial Expression –
A human's emotions are expressed through facial expression. Emotions like anger or happiness directly shape an individual's facial expression [1].

B. Features of Human Faces:
A human face has many unique features, such as two eyes, eyebrows, nose, lips, mouth, cheeks, skin colour, etc. These features have some patterns: for example, the skin region under the eye is darker than the cheek area, and the bridge of the nose is brighter than the rest of the nose. Our face detection technique belongs to the pattern recognition problem class. So, to detect a face in an image, we need to use the features of a face model.

Fig 4 Features of Human Face


II. CLASSIFICATION OF FACE DETECTION TECHNIQUES

There are four main categories into which face detection techniques can be broadly divided: knowledge-based, feature-based, template-based and statistics-based face detection techniques. Each category is subdivided into different techniques, as shown in the figure below [3].

Fig 5 Classification of Face Detection Methods

• Knowledge based Face Detection Technique:
This is a rule-based face detection technique, in which some rules are defined to detect faces in the input image. These rules can be extended to detect faces in a complicated background. The rules encode facial features such as 2 ears, 1 nose, 2 eyes and 1 mouth. For example, one rule might be that a face usually has two symmetric eyes, and the area under the eyes is darker than the cheeks. In the input image, first the facial features are extracted, and then the face is detected according to the defined rules [4].

The knowledge-based face detection technique tries to capture knowledge of human faces and encode it into well-defined criteria. When the input image meets the criteria, it is declared a face. The difficult part of this method is building such appropriate rules.

• Feature-based Face Detection Technique:
This face detection technique depends on the features of human faces. A human face can be distinguished from other objects by using features such as: the area under the eye is darker than the cheek area, the edge of the nose is brighter than the surrounding area, etc. This technique depends on features extracted from the human face that do not change due to factors like occlusion, illumination and pose. Skin colour, nose, eyes, ears, mouth, eyebrows, etc. are some features that can be used in face detection techniques.
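As a toy illustration of such a rule, the "eye band darker than cheek band" check can be sketched in Python (the paper's implementations are in MATLAB and VHDL; the band positions and the margin below are invented purely for illustration):

```python
# Toy sketch (not the paper's code) of a single knowledge/feature-based
# rule: the eye band of a candidate window should be darker than the
# cheek band just below it. Band rows and margin are illustrative.

def region_mean(img, top, bottom, left, right):
    """Mean intensity of img[top:bottom][left:right] (2D list of 0-255 ints)."""
    total = count = 0
    for row in img[top:bottom]:
        for v in row[left:right]:
            total += v
            count += 1
    return total / count

def eye_darker_than_cheek(window, margin=20):
    """Rule: the eye band's mean is darker than the cheek band's by `margin`."""
    h, w = len(window), len(window[0])
    eye = region_mean(window, h // 4, h // 2, 0, w)        # upper-middle band
    cheek = region_mean(window, h // 2, 3 * h // 4, 0, w)  # band just below
    return cheek - eye >= margin

# Synthetic 8x8 window: dark eye band (rows 2-3), bright elsewhere.
window = [[200] * 8 for _ in range(8)]
for r in (2, 3):
    window[r] = [60] * 8
print(eye_darker_than_cheek(window))  # -> True
```

A real system combines many such rules; as the text notes, tuning them so they are neither too general nor too detailed is the hard part.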

Generally, an edge detector is used to extract these features. Then a statistical model is created to provide a relationship between the characteristics extracted from the features, and this model is used to determine the presence of a human face [3]. Furthermore, some studies have shown that human skin colour is an excellent feature of a human face and can be utilized to distinguish human faces from other objects [5]. Along with this, human faces can be differentiated from other objects with the help of the textures of the human face, because human faces have particular textures. Edges of the features can also help to distinguish a face from other objects.

• Template based Face Detection Technique:
A template-based face detection technique tries to define a function for a face. This technique tries to find a global function for all faces; this function acts as a template. The features of a face act as variables for the template, and different features can be defined individually. For example, a human face can be divided into mouth, eyes, nose, etc., and for these different parts a relationship can be defined in terms of brightness and darkness. For face detection, this method uses the relation between the pattern present in the input image and the pattern defined for the face or for its features. This technique can be divided into two categories: predetermined template-based face detection and deformable template-based face detection.

Fig 6 Sample of Defined Template's Images

In the predetermined template-based technique, a standard template is calculated, and then we calculate the correlation value between the detection area and the template. When this value is within the defined limits, the detection area is a human face. In the deformable template-based technique, a template parameter set is first developed, and then, according to the detection area, the parameters are modified until convergence, in order to achieve face detection.

• Statistics based Face Detection Technique:
This technique depends on statistical analysis and machine learning to find the features of a face. A feature of an image is a variable which has some probability of belonging to a face. This is a learning-based technique in which classifiers are trained using a number of positive images (containing faces) and negative images (containing no faces). Using AdaBoost (Adaptive Boosting), some weak classifiers are combined to create a stage, and these stages are cascaded into multiple stages [8]. The first-stage weak classifiers check each single window, and windows that pass the threshold value are passed on to the next classifiers. This action continues stage by stage until the last stage. The figure below shows how weak classifiers are combined to make a strong classifier.

Fig 7 AdaBoost Cascade Classifiers
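The stage-by-stage evaluation pictured in Figure 7 can be illustrated with a small Python sketch (not the paper's code; the weak classifiers, weights and stage thresholds below are invented for illustration):

```python
# Minimal sketch of cascade evaluation: each stage is a set of weak
# classifiers whose weighted votes must clear the stage threshold; a
# window is rejected as soon as any stage fails. All numbers invented.

def weak_vote(feature_value, threshold, polarity):
    """Weak classifier: vote 1 if polarity*value < polarity*threshold."""
    return 1 if polarity * feature_value < polarity * threshold else 0

def run_cascade(feature_values, stages):
    """stages: list of (stage_threshold, [(feat_idx, thr, polarity, alpha), ...])."""
    for stage_threshold, weak_classifiers in stages:
        score = 0.0
        for feat_idx, thr, polarity, alpha in weak_classifiers:
            score += alpha * weak_vote(feature_values[feat_idx], thr, polarity)
        if score < stage_threshold:
            return False          # rejected early: likely background
    return True                   # survived every stage: likely a face

# Two toy stages over three hypothetical Haar feature values.
stages = [
    (0.5, [(0, 10.0, 1, 1.0)]),                     # stage 0: one weak clf
    (1.0, [(1, 5.0, 1, 0.8), (2, -2.0, -1, 0.7)]),  # stage 1: two weak clfs
]
print(run_cascade([3.0, 2.0, 1.0], stages))   # -> True  (passes both stages)
print(run_cascade([20.0, 2.0, 1.0], stages))  # -> False (rejected at stage 0)
```

Early rejection is what makes the cascade fast: most background windows exercise only the cheap first stage.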


III. ADVANTAGES AND CHALLENGES IN FACE DETECTION

• The Advantages and Challenges of the above-described Face Detection Methods are given in the Table below:

Table 1 Advantages and Challenges in Face Detection Methods

Feature Based
- Advantages: These techniques have rotation independence and scale independence, and the execution time is low when compared with other strategies [7].
- Challenges: The main challenge is feature "restoration". This can happen when the algorithm tries to retrieve features that are imperceptible because of large variations, such as head pose, when matching a profile picture with a frontal picture [8].

Knowledge Based
- Advantages: Knowledge-based algorithms are easy to implement.
- Challenges: Building an appropriate set of rules which can apply to all faces under different conditions. There could be many false positives if the rules are too general, while there might be numerous false negatives if the rules are too detailed [9].

Statistic Based
- Advantages: In this technique, a window containing a non-face is dismissed in the early stages; therefore the execution time is reduced and the accuracy is also increased [12].
- Challenges: In this method, the system structure requires many adjustments (number of layers of nodes, learning rate, etc.) to acquire the desired performance [3].

Template Based
- Advantages: Template-based algorithms are simple to implement, and some assumptions can be made in advance [1].
- Challenges: These algorithms are dependent on size, scale, occlusion and rotation, so care must always be taken that the input image is a frontal, un-occluded image [1].
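The predetermined-template comparison described in Section II can be sketched as a normalized cross-correlation check (an illustrative Python sketch, not the paper's method; the 0.8 acceptance limit is an assumption, not taken from the paper):

```python
# Sketch of the predetermined-template idea: compute a correlation score
# between the detection area and a stored template and accept when the
# score is within limits. Pure-Python NCC on equal-size 2D lists.

import math

def ncc(window, template):
    """Normalized cross-correlation of two equal-size 2D intensity arrays."""
    a = [v for row in window for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def is_face(window, template, limit=0.8):
    """Accept the detection area when its correlation with the template is high."""
    return ncc(window, template) >= limit

template = [[10, 200], [200, 10]]
print(is_face([[12, 190], [205, 15]], template))  # similar pattern -> True
print(is_face([[200, 10], [10, 200]], template))  # inverted pattern -> False
```

This matches the table's caveat: a plain correlation score degrades quickly under changes of size, rotation or occlusion unless the input is a frontal, un-occluded face.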

IV. VIOLA JONES ALGORITHM

The Viola-Jones face detection technique can detect the presence of frontal face(s) in an input image and determine their location. This technique can scan the input image rapidly and gives a high accuracy rate: the detection rate of the Viola-Jones technique is high, while the false positive rate is very small. There are three main properties of this algorithm, characterized briefly as:

• The first is the representation of the input image, in which the location of faces needs to be determined, in a new format known as the "Integral Image". This format of the image allows the detector to calculate the "features" rapidly, and it also helps to calculate the features at different scale values. It can be obtained with only a few mathematical operations per pixel. With the help of the integral image, we can obtain the "Haar"-like features of an image very quickly and in constant time, irrespective of the location of the pixel.
• The second property is the introduction of "Adaptive Boosting (AdaBoost)" to select the critical features of the face out of all the computed features. AdaBoost is a learning algorithm; after learning from different examples of faces and non-faces, it gives a classifier which can classify the face in an image. Out of all the features, the irrelevant features are rejected by the learning process so that only the critical features are processed, achieving fast classification.
• The third and most important contribution is the application of a cascaded structure of strong classifiers. This cascaded structure enables the quick rejection of background regions, so that most of the computation is spent on promising face-like regions. Due to this structure, regions of the image which do not contain a face are discarded early.

• Haar-Like Feature:
Our face contains a number of features like the nose, eyes, lips, cheeks and eyebrows. In a face detection scheme, we prefer to use features of the face rather than the pixels directly, for a number of reasons. The first is the speed of detection, because a feature-based system is faster than a pixel-based system. The second is that features can be used to encode ad-hoc domain knowledge that is hard to learn from a finite quantity of training data. The features used in this technique are the same as the Haar basis functions used by Papageorgiou et al. (1998). The figure below shows some rectangular Haar-like features.

Fig 8 Some Haar-Like Features

The two, three and four rectangle Haar-like features are shown in the figure below. Each Haar-like feature gives one value after computation, which can be used to categorize subsections of an image.


One of the common features of the human face is that the region of the eyes is darker than the region of the cheeks. So, during the training phase of the classifier over the database of images, we make a set of two adjacent rectangles that lie over the eye and the cheek regions and save the dataset for the testing phase. To compute these features, we compute the sums of the pixel intensities under the black and white rectangles and take their difference. To compute these summations rapidly and in equal time, we use a new type of image representation: the integral image.

Fig 9 Rectangle Haar-Like Features

The computation of the two rectangle features (figure 9 (A), (B)) is done by taking the difference between the summation of the pixels under the white region and the black region. The computation of the three rectangle feature (figure 9 (C)) is done by taking the summation of the pixels under the two outside rectangles and subtracting it from the summation of the pixels under the centre rectangle. Similarly, for the four rectangle feature (figure 9 (D)), the value is computed by taking the summation of the pixels under each diagonal pair of rectangles and then taking the difference of the two summed values.

• How Haar-Like Feature Works:
Haar-like features are nothing but adjacent rectangular regions, and their value is calculated by taking the difference between the sum of the pixel intensities in the white rectangular region and in the black rectangular region. These types of features are used in machine learning, where a function is trained on a number of positive images (which contain human faces) and negative images (which contain no human faces) and is then utilized to detect the location of human faces in an image. The input image is scanned and searched for the Haar-like features of the current stage. The size and weight of every feature are computed by a machine learning algorithm like AdaBoost. A number of features can be applied to the face, as shown in the figure below:

Fig 10 Applying Haar-Like Feature on a Face

• Integral Image:
The "integral image" is an intermediate representation of the input image. The main objective of this representation is to compute the summation of the pixel intensities under any rectangular region quickly and in constant time, irrespective of the location where it is needed. The value of this representation at pixel location (i, j) is obtained by taking the summation of the pixel intensities above and to the left of location (i, j), as given by equation (3.1):

ii(i, j) = Σ_{i' ≤ i, j' ≤ j} I(i', j')    (3.1)

where I(i', j') is the intensity of the input image at pixel (i', j').

V. MATLAB IMPLEMENTATION AND RESULTS

The Viola-Jones face detection algorithm is a machine learning technique in which a cascaded function is trained on a number of positive images (which contain faces) and negative images (which contain no faces). This function is then able to classify face(s) in other images as well. The cascade classification function is obtained by taking a weighted sum of weak classifiers. These weak classifiers are built from the features extracted from the training data.

For the MATLAB implementation of the Viola-Jones algorithm, a pre-trained classifier provided by Dr. Rainer Lienhart, professor in the Computer Science department of the University of Augsburg, has been used [10]. This is one of the best-trained cascaded classifiers based on the Viola-Jones approach and is widely used by prominent companies such as Intel, Microsoft and Apple for face detection applications. After training, the classifier can be used to classify/detect objects. The flow of the implementation of the Viola-Jones algorithm is shown in the figure below:


Fig 11 MATLAB Implementation Flow of Face Detector
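The integral-image representation of equation (3.1), which this flow relies on, together with the constant-time rectangle sum and a two-rectangle (eye/cheek) feature built from it, can be sketched in Python (an illustrative sketch, not the paper's MATLAB or VHDL code):

```python
# Sketch of Eq. (3.1) and the constant-time rectangle sum it enables,
# plus a two-rectangle Haar feature built on top of it. Pure Python on
# 2D lists; one row/column of zero padding simplifies corner lookups.

def integral_image(img):
    """ii[i][j] = sum of img[0..i-1][0..j-1], per Eq. (3.1), with padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += img[i][j]
            ii[i + 1][j + 1] = ii[i][j + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over any rectangle from just four integral-image values."""
    b, r = top + height, left + width
    return ii[b][r] - ii[top][r] - ii[b][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """White (upper) rectangle sum minus black (lower) rectangle sum."""
    half = height // 2
    white = rect_sum(ii, top, left, half, width)
    black = rect_sum(ii, top + half, left, half, width)
    return white - black

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))          # -> 12  (1+2+4+5)
print(two_rect_feature(ii, 0, 0, 4, 3))  # -> -36 ((1..6) - (7..12))
```

Note that `rect_sum` touches exactly four values regardless of the rectangle's size or position, which is what makes multi-scale feature evaluation cheap.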

The trained data, which contains all the features at multiple stages, is taken in .xml format. This file is converted into a .mat file which stores all the variables that contain the values of the required constraints. All the features are then used to make a cascaded structure of Haar features, where every stage has its own threshold. This cascaded structure is used for the detection process. For detection, an input image in RGB/grayscale is given to the algorithm. This input image is scaled at multiple values in order to detect faces of any size. After the scaling process, the image is given to the integral image generation unit to generate the integral image. With the help of this integral image, we can calculate the sum of any rectangle of pixels using just four values, irrespective of the number of pixels to be summed. Through this integral image, the Haar features in the image are computed. Each Haar feature value is compared with the feature values taken from the trained data. If the computed feature value crosses the threshold value, the window is passed to the next stage of the cascaded structure; otherwise it is rejected by the face detection process. If a face is found in an image, the algorithm returns the starting co-ordinate of the face and its width and height. With the help of the co-ordinate, width and height, we can get the extreme co-ordinates of the face, and using these co-ordinates the face region can be bounded by a box. The bounding box is drawn by changing the colour of the pixels of the original image to the colour of the bounding box.

• Classifiers Details:
The trained classifier used here is taken from OpenCV to detect the frontal human face using the Viola-Jones algorithm. Training of this cascaded classifier is done with frontal faces of size 20x20. The total number of stages used here is 22, the total number of Haar classifiers is 2135, and the total number of features used is 4630. The number of classifiers used in each stage is shown in the table below. As shown in the table, the number of classifiers in each stage increases, and thus the complexity of each stage also increases.

Table 2 Number of Weak Classifiers in each Stage


Stage No. No. of classifiers Stage No. No. of classifiers Stage No. No. of classifiers
0 3 8 56 16 140
1 16 9 71 17 160
2 21 10 80 18 177
3 39 11 103 19 182
4 33 12 111 20 211
5 44 13 102 21 213
6 50 14 135 Total 2135
7 51 15 137
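The scaling, window-scanning and bounding-box steps described above can be sketched as follows (a Python illustration with a stand-in classifier stub, not the actual 22-stage cascaded classifier):

```python
# Sketch of the detection flow: scale the image, slide a fixed-size
# window, run a classifier on each window, and draw a box by recolouring
# border pixels. The classifier below is a stand-in stub.

def downscale(img, factor):
    """Nearest-neighbour downscale of a 2D list by an integer factor."""
    return [row[::factor] for row in img[::factor]]

def scan(img, window, classify):
    """Yield (top, left, height, width) for every window that classifies as a face."""
    h, w = len(img), len(img[0])
    for top in range(0, h - window + 1):
        for left in range(0, w - window + 1):
            patch = [row[left:left + window] for row in img[top:top + window]]
            if classify(patch):
                yield (top, left, window, window)

def draw_box(img, top, left, height, width, colour=255):
    """Draw the bounding box by overwriting border pixels with `colour`."""
    for j in range(left, left + width):
        img[top][j] = colour
        img[top + height - 1][j] = colour
    for i in range(top, top + height):
        img[i][left] = colour
        img[i][left + width - 1] = colour

# Stub classifier: "face" means the patch mean is below 100 (illustrative).
classify = lambda p: sum(sum(r) for r in p) / (len(p) * len(p[0])) < 100

img = [[200] * 6 for _ in range(6)]
for i in (2, 3):
    img[i][2] = img[i][3] = 10          # a dark 2x2 blob to detect
hits = list(scan(img, 2, classify))
print(hits[0])                           # -> (2, 2, 2, 2)
```

In the real flow, `scan` runs at every scale produced by `downscale`, and `classify` is the cascaded Haar classifier operating on the integral image rather than on raw pixels.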

• Performance of MATLAB Implementation:
In order to obtain the accuracy, a performance measurement of the implemented code has been done. Two different databases have been used to measure the performance of the implemented Viola-Jones algorithm. The first database contains 100 images of different people, collected from the internet at 320x240 pixels. It contains frontal face images of different people against complex backgrounds and under different lighting conditions. The table below shows the accuracy of the MATLAB implementation of the Viola-Jones algorithm with this database.


Table 3 Accuracy of MATLAB Implementation for Viola-Jones Algorithm


Total Images Image Type Correct detection False Detection Accuracy
100 Frontal face images 80 20 80%

The second database is given by Cambridge University, named "Pointing'04 ICPR Workshop" [11]. This database contains face-pointing images of 15 different people with different head poses, with the head pose angle varying from -90° to +90°. For the performance measurement, only the 0° to 45° variation in head pose is taken. Along with the head poses, this database also contains facial images of the 15 people with and without spectacles, so the accuracy of the implemented algorithm is also tested for images with and without spectacles. The image size in this database is 384x288 pixels.

Fig 12 Sample of Image Database

The performance measurement of the implemented Viola-Jones algorithm with the 'Pointing'04 ICPR Workshop' database [11] is tabulated in the table below. This table shows the accuracy of face detection as the face is rotated by 0°, 15°, 30° and 45°. We can observe that as the rotation angle increases, the accuracy of detection decreases, because an increase in the face rotation angle leads to a decrease in the visible facial features. At 45° rotation, the right eye is less than 50% as visible as at 0° rotation, i.e. in the frontal face image. Therefore, with 45° facial rotation the detection accuracy is at its minimum.

Table 4 Accuracy of MATLAB Implementation of Viola-Jones Algorithm with Database of Cambridge University
Person ID Facial Rotation
0° 15° 30° 45°
Person-1 T T T F
Person-2 T T T T
Person-3 T T F F
Person-4 T T T T
Person-5 T T F F
Person-6 T T T F
Person-7 T T T T
Person-8 F F F F
Person-9 F F F F
Person-10 T T T F
Person-11 T T F F
Person-12 T T T F
Person-13 T T T F
Person-14 T T T F
Person-15 T T F F
Accuracy 86.67% 86.67% 60% 13.34%


Along with the head-pose images, the 'Pointing'04 ICPR Workshop' database [11] also consists of frontal face images of the 15 different people with and without spectacles, and the implemented algorithm is also tested with these images. It gives lower accuracy compared to the images of people without spectacles, because the features related to the eyes get blocked by the spectacles; this comes under occlusion of facial features. The table below shows the comparison of the implemented algorithm on the images containing frontal faces with and without spectacles. The table shows that with spectacles the accuracy of the implemented algorithm decreases, as some of the features of the face get blocked.

Table 5 Accuracy of MATLAB Implementation of Viola-Jones Algorithm with and without Spectacles using
Database of Cambridge University
Person ID Frontal Face
With Spectacles Without Spectacles
Person-1 T T
Person-2 T T
Person-3 F T
Person-4 F T
Person-5 F T
Person-6 F F
Person-7 T T
Person-8 F T
Person-9 F T
Person-10 T T
Person-11 T T
Person-12 F T
Person-13 T T
Person-14 T T
Person-15 F F
Accuracy 46.67% 86.67%
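The accuracy figures can be recomputed directly from the per-person T/F outcomes transcribed from the table (a short Python sketch):

```python
# Recomputing Table 5's accuracy columns from the T/F outcomes.
# The two strings transcribe the per-person outcomes (persons 1..15).

def accuracy(results):
    """Percentage of 'T' outcomes, rounded to two decimals."""
    return round(100 * results.count("T") / len(results), 2)

with_spectacles    = "TTFFFFTFFTTFTTF"
without_spectacles = "TTTTTFTTTTTTTTF"

print(accuracy(with_spectacles))     # -> 46.67  (7 of 15)
print(accuracy(without_spectacles))  # -> 86.67  (13 of 15)
```

Seven of fifteen 'T' outcomes in the with-spectacles column give 46.67%, consistent with the figure quoted in the running text.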

Therefore, this implementation gives 46.67% detection accuracy when a person wears spectacles and 86.67% without spectacles.

• Face Detection Result:
For the detection of face(s) in an image, different images were given to the implemented algorithm. Some of these images contain a single face and some contain multiple faces. The complexity of the background also varies across the images.

For a single face, the input image and its corresponding output are shown below:

Fig 13 Input and Output Image of Viola-Jones Algorithm with 1 Face

For an input image which contains multiple faces in RGB space, the corresponding output is shown in the figure below:

Fig 14 Input and Output Image of Viola-Jones Algorithm with 3 Faces

VI. HARDWARE IMPLEMENTATION AND RESULTS

For the hardware implementation of the Viola-Jones algorithm, an FPGA is used as the processing element, a camera is used to take the input image, and a display is used to show the result. The hardware components used in this project are:

• OV7670 CMOS Camera Module:
The OV7670 is a CMOS camera module which can operate at a maximum of 30 FPS at 640x480 resolution, equivalent to 0.3 megapixels. This camera provides the input image to the Zedboard.

• VGA Display:
In this project, the images are displayed on a VGA (Video Graphics Array) display with a resolution of 640x480. Like other displays (e.g. TFT), it has horizontal rows containing a fixed number of pixels, called the number of columns of the screen. At each pixel location, the RGB colour information in the video signal is used to control the colour of the pixel. By changing the analog levels of the three RGB signals, all other colours are produced.

• FPGA Development Board:
The Zedboard development board was chosen for the development of our project. The Zedboard is an evaluation and development board based on the Xilinx Zynq-7000 All Programmable SoC (AP-SoC). This development kit implements a Xilinx Zynq-7000 AP SoC XC7Z020-CLG484, which has 4.9 Mb of Block RAM, 106,400 flip-flops, 53,200 LUTs (look-up tables) and 85K programmable logic cells. In this project, the FPGA kit plays the role of the heart of the entire system: it captures images from the camera, processes the captured image to find the facial features in it, and displays the faces on the VGA display monitor. The OV7670 camera is interfaced with the Zedboard via the GPIO pins on the board, and the VGA display is interfaced with the VGA connector available on the board. A number of general-purpose I/Os, switches and LEDs are used for the implementation of user-controlled activity.

• Hardware Setup used:
The block diagram of the hardware setup and the connections of the Viola-Jones face detection system is shown in the figure below. The OV7670 camera and a VGA display are connected to the Zedboard. In this setup, the input image to the system has a resolution of 320x240. Since the display has a resolution of 640x480, only the top-left corner is used to display the input image or video. The VHDL language is used to develop the code; the VHDL code is compiled and synthesized in the Xilinx ISE software and programmed onto the FPGA board. A VGA cable is used to connect the VGA display, and GPIOs are used to connect the OV7670 camera. Four switches provide user control over the system: [SW0] resets the entire system, [SW1] resends the configuration data to the camera registers, [SW2] defines the capture mode (whether the input is an image or video), and [SW3] takes a snapshot with the camera.
Fig 15 Block Diagram of Hardware Setup used

• The mapping of the switches on the FPGA board and a detailed description is given in the table below.

Table 6 Mapping of Switches used for User Control

Function | Switch Name | Mapping to Zedboard | Description
RESET | [SW0] | F22 | System reset
Reconfigure Camera | [SW1] | G22 | Resend the configuration data to the camera's pins
Capture Mode | [SW2] | H22 | Select capture or video mode ([0]: video mode, [1]: capture mode)
Capture | [SW3] | F21 | Capture an image ([0]: snapshot, [1]: video)
IJISRT24OCT808 www.ijisrt.com 1394
Volume 9, Issue 10, October– 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT808
VII. VHDL IMPLEMENTATION

The Viola-Jones face detection algorithm is implemented on the Zedboard using VHDL. The entire implementation is divided into four parts:

• Capturing the input image using the OV7670 camera and saving it into BRAM
• Generation of the integral image of a part of the image
• Processing the integral image to find the face
• Face box creation over the face

• Top Level Implementation:
To control the different modules of the system, a top-level state machine has been implemented. The top level of the design consists of a number of modules, responsible for capturing the image, processing it and displaying it. All these top-level modules are synchronized with each other and controlled from a top-level entity. A PLL is used to generate the different clock frequencies (25 MHz, 40 MHz, 50 MHz and 80 MHz) from the 50 MHz on-board crystal oscillator. The figure below shows the top-level interconnection between the different modules.

Fig 16 Block Diagram of Top Level Hardware Implementation
The top-level entity controls the camera module, the display module and the processing part of the input image. For processing, the top-level entity takes the source image from the image frame buffer and processes it. The processing of the source image is done by first generating the integral image and then comparing the weak and strong stage thresholds against the pre-trained classifiers. Initially, an integral image of 40x60 pixels is generated. After this, 17 parallel subwindows scan the integral image and evaluate it for faces. In parallel with the subwindow scanning, the integral image for the next subwindow is generated, so that there is no delay in supplying a new integral image to the subwindow evaluation; hence there is no latency between integral image generation and subwindow evaluation. This integral image generation and subwindow evaluation process continues until the entire current scaled image is processed. After the evaluation of the last subwindow, the current image is scaled down further and the integral image generation and subwindow evaluation process takes place again. The system evaluates the source image for face candidates at 4 scale values and then creates a red box around the face in the source image. The final processed image is shown on the VGA display. The scanning of the image by subwindows and the integral image is shown in the figure below:
Fig 17 Sub-Window Scanning and Integral Image
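As a rough software illustration of the scan order described above (not the paper's VHDL, which uses 17 parallel subwindows and a pipelined integral-image generator), the sketch below enumerates the subwindow positions at one scale and downscales the image between scales. The nearest-neighbour scaler, the step size and the scale factor are assumptions for illustration only.

```python
def downscale(gray, factor):
    """Nearest-neighbour downscale (a stand-in for the hardware image scaler)."""
    h, w = len(gray), len(gray[0])
    nh, nw = int(h / factor), int(w / factor)
    return [[gray[int(y * factor)][int(x * factor)] for x in range(nw)]
            for y in range(nh)]

def scan_positions(width, height, win=24, step=1):
    """Yield the top-left corner of every win x win subwindow at one scale."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y)
```

At each scale the subwindow positions are evaluated, the image is scaled down, and the process repeats for the stated 4 scales.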
• Capturing of Input Image and Saving it into BRAM:
The input to the system is an image, captured by the OV7670 camera module. The image is captured by using an on-board switch which is connected to one of the pins of the FPGA. The captured image is saved into the Block RAM, which has 320x240 = 76,800 addressable memory locations, each of which stores 12 bits of data (4:4:4 = R:G:B). A dual-port BRAM is instantiated to save the original image. One port is used to read the original image in order to convert it into the integral image, which is saved in a separate memory. The other port is used to draw the box over the detected faces in the original image.

• Generation of Integral Image:
The next stage of the implementation is the generation of the integral image. The integral image is generated only for a portion of the source image in order to use minimum memory resources. The integral image at any location (x,y) is the summation of the grayscale pixels above and to the left of (x,y).

The 12-bit RGB input image is taken from the image buffer and converted into a grayscale image, which is used for the generation of the integral image. The squared integral image is also generated, for the calculation of the variance normalization factor. The generator uses accumulators and recursive computation to obtain the resulting integral image. For the current row, an accumulator computes the running sum of the grayscale pixel values. If the current row is not the first row of the source image, this sum must be added to the previous row's integral image value at (x, y-1) in order to get the correct integral value at location (x,y). A multiplexer is used to select the plain pixel summation for the first row of the image. After the first row, the integral image generator also requires the summation of the pixels of the previous rows, so that data is read back from memory and added to the summation of the pixel values of the current row. The block diagram of the integral image generator is shown in the figure:

Fig 18 Block Diagram of Integral Image Generator
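The accumulator-plus-readback scheme described above can be sketched in software as follows. This is an illustrative model of the recurrence, not the VHDL implementation:

```python
def integral_images(gray):
    """Build the integral image and the squared integral image from a
    list-of-rows grayscale image, the way the hardware is described:
    a running row accumulator plus the previous row's value read back."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * w for _ in range(h)]    # integral image
    sqi = [[0] * w for _ in range(h)]   # integral image of squared pixels
    for y in range(h):
        row_sum = row_sq = 0            # accumulators for the current row
        for x in range(w):
            p = gray[y][x]
            row_sum += p
            row_sq += p * p
            # first row: accumulator only; otherwise add the (x, y-1) value
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
            sqi[y][x] = row_sq + (sqi[y - 1][x] if y > 0 else 0)
    return ii, sqi
```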
The simulation waveform for the integral image generation is shown in the figure below. In this figure, wrdata_buff_2A[19:0] shows the waveform of the integral image and wrdata_buff_3A[27:0] shows the waveform of the squared integral image.

Fig 19 Simulation Waveform of Integral Image Generator

The figure below shows the computation time of an integral image. The high pulse indicates that the computation of an integral image is done, and the low period of the pulse corresponds to the time spent computing the integral image.
Fig 20 Integral Image done Signal and Computation Time of Integral Image
On the Zedboard, a GPIO pin is used to bring this signal out, and it is displayed on a DSO. The table below shows the measured computation time of the integral image.

Table 7 Integral Image Computation Time

Integral image size | Integral image computation time
39x59 | 60 µs
• Processing of Sub-Window to Find Face:
A cascaded classifier is used to reject non-faces and detect the faces in an image. This cascaded classifier is a trained chain of facial features. Feature evaluations are carried out across 22 strong stages in a cascaded manner. The feature evaluation is done in a 24x24 pixel subwindow area. The figure below shows the chosen window area for the subwindow processing. The sums of grayscale pixel values within these rectangular areas are used to obtain the difference between the dark and light regions of human faces.
Fig 21 24x24 Window Area
After the feature calculation, the accumulated value is compared with the threshold of the strong stage. If the accumulated value crosses this threshold, the currently evaluated subwindow is considered to contain a face element and is passed to the next stage for further processing. If the accumulated value does not cross this threshold, a non-face is detected, the currently evaluated subwindow is rejected, and the processing of the next subwindow starts. If a subwindow passes the last stage of the cascaded structure without being rejected as a non-face, the subwindow is determined to contain a face. The figure below shows the sequential processing of a subwindow in a cascaded classifier structure.
Fig 22 Cascade Classifier
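A minimal software model of the stage-by-stage rejection described above is given below. The data structures and weak-classifier form are assumptions for illustration; the actual design evaluates its 22 trained stages in hardware:

```python
def classify_window(features, stages):
    """Run one subwindow's feature values through a cascade of strong stages.

    stages: list of (weak_classifiers, strong_threshold), where each weak
    classifier is (feature_index, weak_threshold, left_val, right_val).
    """
    for weak_classifiers, strong_threshold in stages:
        accumulated = 0.0
        for f_idx, weak_t, left_val, right_val in weak_classifiers:
            # pick the left- or right-tree value against the weak threshold
            accumulated += left_val if features[f_idx] < weak_t else right_val
        if accumulated < strong_threshold:
            return False        # non-face: reject, move to the next subwindow
    return True                 # passed every stage: subwindow contains a face
```

Early rejection is what makes the cascade cheap on average: most subwindows fail in the first few stages and never reach the later, more expensive ones.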
The figure below shows the processing of a subwindow, including the calculation of the features and their comparison with the weak and strong thresholds. The implementation has two data paths: the first is for the variance normalization of the subwindow and the second is for the feature evaluation.
Fig 23 Processing of Sub-Window
In order to bring the light level of the subwindow to the light levels of the training images, variance normalization is used. The formula used to calculate the variance normalization factor (VNF) is given by the equation below:

Here the mean (m) is obtained from the integral image (s0 : s3) and the sum of the squares of the pixel values (p2) is obtained from the squared integral image (ss0 : ss3). One subwindow contains 24 x 24 = 576 pixels in total, but for the value of N we take 512 for ease of division, implemented as a right shift by 9 bits.

The second data path is for the calculation of the feature (f). The integral image representation allows a quick summation of the pixel values within a rectangle. The equations below show how to calculate the feature (f) value.
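The rectangle sums and the variance computation can be sketched as below. The four-corner lookup is the standard integral-image identity; since the paper's exact VNF equation is given in its figures, the variance form here (sigma^2 = sum(p^2)/N - m^2, with N taken as 512 so both divisions become 9-bit right shifts) is an assumed reconstruction:

```python
def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h) from 4 corner taps."""
    def tap(cx, cy):
        return ii[cy][cx] if cx >= 0 and cy >= 0 else 0
    return (tap(x + w - 1, y + h - 1) - tap(x - 1, y + h - 1)
            - tap(x + w - 1, y - 1) + tap(x - 1, y - 1))

def subwindow_variance(ii, sqi, x, y, win=24):
    """Variance of a win x win subwindow, dividing by N = 512 (>> 9) as in the text."""
    s = rect_sum(ii, x, y, win, win)      # sum of pixel values
    s2 = rect_sum(sqi, x, y, win, win)    # sum of squared pixel values
    m = s >> 9                            # mean, approximating N = 576 by 512
    return (s2 >> 9) - m * m              # assumed form: E[p^2] - m^2
```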
According to the normalized weak threshold, a left-tree or right-tree value is accumulated into a register for the strong-threshold comparison at the end of a stage.

• Face Box Creation:
The face box creation stage draws a red box around the detected faces in the source image. To draw the box over a face, the actual position (x and y) and the scale value are required. A red box can be drawn simply by changing the colour of the desired pixels to red. For this, a memory is used in a first-in first-out manner; it stores the detections from all 16 subwindows. When the subwindow processing is completed for all scales of the image, the desired pixel values of the source image in the image buffer are changed to red.
Fig 24 Face Box Creator
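Changing the border pixels to red, as described, can be modelled like this. It is a sketch only; the value 0xF00 for pure red is an assumption consistent with the 12-bit 4:4:4 frame-buffer format given earlier:

```python
def draw_face_box(frame, x, y, w, h, red=0xF00):
    """Overwrite the border pixels of a detection rectangle with red (4:4:4)."""
    for i in range(w):                    # top and bottom edges
        frame[y][x + i] = red
        frame[y + h - 1][x + i] = red
    for j in range(h):                    # left and right edges
        frame[y + j][x] = red
        frame[y + j][x + w - 1] = red
```

Only border pixels are rewritten, so the face itself stays visible inside the box.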
• Performance Measurement of the Implemented Face Detection System on FPGA:
The performance of the implemented system on the FPGA is measured as the total time taken to detect a face. To obtain the performance of the FPGA-based face detection system, the FPS is measured. FPS is simply the number of frames processed per second. To compute the FPS of the system, the time between capturing an image and displaying the detected result is measured; more precisely, the time between the start of the frame capture and the end of the face box drawing on the detected faces. On the Zedboard FPGA, a register is used to generate this signal. The output of the register is mapped to a GPIO pin of the FPGA, and a DSO is used to measure the time after which the output of the register goes high. The waveform of the face detection, taken from the GPIO pin, is shown in the figure below. The low period of the signal shows the processing time of the face detection; when the system detects the face after processing, the signal goes high.
Fig 25 Waveform of Face Detected
In the figure above, one square corresponds to 60 ms, so the total time for which the signal stays low is approximately 130 ms. The time taken to detect a face and the detection frequency are shown in the table below:
Table 8 Face Detection Time

Input image size | Detection time | Detection frequency | FPS
320x240 | 130 ms | 7.69 Hz | 7.69
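The FPS figure in the table follows directly from the measured low period of the signal:

```python
# One frame takes ~130 ms from capture to face-box drawing, so:
detection_time_s = 0.130
fps = 1.0 / detection_time_s
print(round(fps, 2))   # -> 7.69
```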
VIII. CONCLUSION

There are several algorithms available for face detection. The selection of an algorithm depends entirely on the requirements at hand. If a system is required that can detect the faces in an image with high accuracy, this leads to more computation and thus requires more powerful hardware. If the hardware is not powerful, the accuracy must be compromised. In this paper the Viola-Jones algorithm is used for the implementation; it gives high accuracy but requires more computation.

The Viola-Jones algorithm is implemented for face detection in MATLAB and then, using VHDL, on the Zedboard FPGA. The MATLAB implementation and simulations verify how accurately the Viola-Jones algorithm can detect faces; the accuracy of the MATLAB implementation of the Viola-Jones algorithm is 86.67%. For the hardware implementation, the algorithm is developed in VHDL and implemented on the Zedboard FPGA. The detection rate of the hardware implementation, measured in processed frames per second, is 7.69 FPS.
ACKNOWLEDGMENT

I would like to acknowledge my professor, guide and friends who have supported this work.