
CHAPTER 1
INTRODUCTION

Age and gender detection refers to the use of computer
algorithms and artificial intelligence to identify the age and gender
of individuals from images or videos. Since the growth of social
media and social networks, a vast number of application developers
have turned to automatic age identification.
Age and gender are the most fundamental facial qualities in social
interaction. Face detection includes classifying images into two
classes: one with faces (targets) and the other containing the
background (clutter), which needs to be removed. The objective is to
accurately identify and isolate the face or faces in the image while
filtering out the background and other irrelevant elements. This
process is a crucial first step in many facial recognition and biometric
identification systems, as it allows for the extraction and analysis of
key facial features such as the shape and position of the eyes, nose,
and mouth. The problem is further complicated by differing lighting
conditions, image quality, and geometry; partial occlusion and
disguise are also possibilities. Age and gender detection technologies
are used in various fields for different purposes. Some of the main
areas where age and gender detection are commonly used are:
Marketing: Age and gender detection technologies are used to
target specific demographics with advertising and marketing
campaigns. This information helps businesses tailor their ads and
products to specific groups.

Retail: Retail stores use age and gender detection technologies to
gather data on their customers' demographics. This information can
be used to improve store layout, product placement, and marketing
strategies.

Healthcare: Age and gender detection technologies are used in
healthcare to identify potential health risks and personalize
treatment plans. For example, doctors can use this information to
prescribe medication or recommend lifestyle changes that are
appropriate for a patient's age and gender.

Security: Age and gender detection technologies are used in
security systems to improve accuracy and efficiency. For example,
facial recognition technology can be used to identify potential
threats or suspects.

Entertainment: Age and gender detection technologies are used in
the entertainment industry to provide personalized experiences to
users. For example, streaming platforms can use this information to
suggest content that is tailored to a user's age and gender.

Overall, age and gender detection technologies are used to gather
information and provide personalized experiences in various fields,
from marketing to healthcare to security.

MOTIVATION:

The motivation behind human age and gender detection projects
stems from several reasons:

Demographic analysis: Age and gender are fundamental
demographic attributes that provide valuable insights into various
fields. This data can be utilized in market research, consumer
behavior analysis, targeted advertising, and public policy planning.

Personalized user experience: In many applications, such as
e-commerce, social media, and entertainment platforms, providing a
personalized user experience is crucial. Knowing the age and gender
of the users can help tailor content, recommendations, and
advertisements to better suit their preferences and needs.

Biometric identification: Age and gender detection can be
utilized as part of biometric identification systems, alongside other
traits like facial recognition. These systems have applications in
security and access control, where age and gender information can
be used to enhance identification accuracy and improve overall
security measures.

Social sciences and healthcare research: Age and gender are
essential factors in social sciences and healthcare research.
Analyzing age and gender distributions in various populations can
help researchers understand social behaviors, health disparities,
and disease prevalence, leading to better-informed policies and
interventions.

Data-driven decision-making: By accurately detecting age and
gender, organizations can gather valuable data for decision-making
processes. This data can be used to understand target audiences,
segment customer bases, and improve marketing strategies,
ultimately driving business growth and efficiency.

It is important to note that the development and use of age and
gender detection technologies raise ethical considerations related to
privacy, bias, and potential misuse. Responsible development and
deployment of these technologies should prioritize fairness,
transparency, and respect for individual privacy rights.

CHAPTER 2
LITERATURE SURVEY

 Understanding and Comparing Deep Neural Networks for Age
and Gender Classification

Recently, deep neural networks have demonstrated excellent
performance in recognizing the age and gender of human face
images. However, these models were applied in a black-box manner
with no information provided about which facial features are
actually used for prediction and how these features depend on
image preprocessing, model initialization and architecture choice.
We present a study investigating these different effects. In detail,
our work compares four popular neural network architectures,
studies the effect of pretraining, evaluates the robustness of the
considered alignment preprocessings via cross-method test set
swapping and intuitively visualizes the model's prediction strategies
in given preprocessing conditions using the recent Layer-wise
Relevance Propagation (LRP) algorithm. Our evaluations on the
challenging Adience benchmark show that suitable parameter
initialization leads to a holistic perception of the input,
compensating artificial data representations. With a combination of
simple preprocessing steps, we reach state-of-the-art performance
in gender recognition.

 A hybrid deep learning CNN–ELM for age and gender
classification

Automatic age and gender classification has been widely used in
a large number of applications, particularly in human-computer
interaction, biometrics, visual surveillance, electronic customer
relationship management, and commercial applications. In this
paper, we introduce a hybrid structure which includes a
Convolutional Neural Network (CNN) and an Extreme Learning
Machine (ELM), and integrates the synergy of the two classifiers to
deal with age and gender classification. The hybrid architecture
makes the most of their advantages: the CNN is used to extract
features from the input images while the ELM classifies the
intermediate results. We not only give the detailed deployment of
our structure, including the design of parameters and layers,
analysis of the hybrid architecture, and the derivation of
back-propagation in this system during the iterations, but also
adopt several measures to limit the risk of overfitting. After that,
two popular datasets, MORPH-II and the Adience benchmark, are
used to verify our hybrid structure. Experimental results show that
our hybrid architecture outperforms other studies on the same
datasets by exhibiting significant performance improvement in
terms of accuracy and efficiency.

 Age and gender classification from speech and face images by
jointly fine-tuned deep neural networks

The classification of a human's age and gender from speech and
face images is a challenging task that has important applications in
real life, and its applications are expected to grow further in the
future. Deep neural networks (DNNs) and convolutional neural
networks (CNNs) are considered state-of-the-art systems as feature
extractors and classifiers and are proven to be very efficient in
analyzing problems with complex feature spaces. In this work, we
propose a new cost function for fine-tuning two DNNs jointly. The
proposed cost function is evaluated using speech utterances and
unconstrained face images for the age and gender classification
task. The proposed classifier design consists of two DNNs trained
on different feature sets, which are extracted from the same input
data. Mel-frequency cepstral coefficients (MFCCs) together with the
fundamental frequency (F0), and the shifted delta cepstral
coefficients (SDC), are extracted from speech as the first and second
feature sets, respectively. Facial appearance and depth information
are extracted from face images as the first and second feature sets,
respectively. Jointly training the two DNNs with the proposed cost
function improved the classification accuracies and minimized the
over-fitting effect for both the speech-based and image-based
systems. Extensive experiments have been conducted to evaluate
the performance and accuracy of the proposed work. Two publicly
available databases, the Age-Annotated Database of German
Telephone Speech (aGender) and the Adience database, are used to
evaluate the proposed system. The overall accuracy of the proposed
system is 56.06% for seven speaker classes, and the overall exact
accuracy is 63.78% for the Adience database.

 Local Deep Neural Networks for Age and Gender Classification

Local deep neural networks have been recently introduced for
gender recognition. Although they achieve very good performance,
they are very computationally expensive to train. In this work, we
introduce a simplified version of local deep neural networks which
significantly reduces the training time. Instead of using hundreds of
patches per image, as suggested by the original method, we propose
to use 9 overlapping patches per image which cover the entire face
region. This results in a much reduced training time, since just 9
patches are extracted per image instead of hundreds, at the expense
of a slightly reduced performance. We tested the proposed modified
local deep neural networks approach on the LFW and Adience
databases for the task of gender and age classification. For both
tasks and both databases the performance is up to 1% lower
compared to the original version of the algorithm. We have also
investigated which patches are more discriminative for age and
gender classification. It turns out that the mouth and eyes regions
are useful for age classification, whereas just the eye region is
useful for gender classification.

 Age and gender classification using brain–computer interface

With the development of the Internet of Things (IoT), it is now
possible to connect various heterogeneous devices together over the
Internet. The devices are able to share their information for various
applications, including health care, security and monitoring. IoT
enables patients to self-monitor their physiological states
continuously and doctors to monitor their patients remotely.
Electroencephalography (EEG) provides a method to record such
electrical activities of the brain using sensors. In this paper, we
present an automatic age and gender prediction framework for
users based on their neurosignals, captured during an eyes-closed
resting state. The brain activities of 60 individuals of different age
groups, varying between 6 and 55 years, and of both genders (male
and female) have been recorded using a wireless EEG sensor.
Discrete wavelet transform frequency decomposition has been
performed for feature extraction. Next, a random forest classifier
has been applied to model the brain signals. Lastly, the accuracies
have been compared with support vector machine and artificial
neural network classifiers. The performance of the system has been
tested using a user-independent approach, with accuracies of
88.33% and 96.66% in age and gender prediction, respectively. It
has been observed that oscillations in the beta and theta band
waves give the best age prediction, whereas the delta rhythm leads
to the highest gender classification rates. The proposed method can
be extended to different IoT applications in the healthcare sector,
where age and gender information can be automatically transmitted
to hospitals and clinics through the Internet.

 Age and gender recognition in the wild with deep attention

Face analysis of images in the wild still poses a challenge for
automatic age and gender recognition tasks, mainly due to their
high variability in resolution, deformation, and occlusion. Although
performance has greatly increased thanks to Convolutional Neural
Networks (CNNs), it is still far from optimal when compared to other
image recognition tasks, mainly because of the high sensitivity of
CNNs to facial variations. In this paper, inspired by biology and the
recent success of attention mechanisms on visual question
answering and fine-grained recognition, we propose a novel
feedforward attention mechanism that is able to discover the most
informative and reliable parts of a given face for improving age and
gender classification. In particular, given a downsampled facial
image, the proposed model is trained based on a novel end-to-end
learning framework to extract the most discriminative patches from
the original high-resolution image. Experimental validation on the
standard Adience, Images of Groups, and MORPH II benchmarks
shows that including attention mechanisms enhances the
performance of CNNs in terms of robustness and accuracy.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Microsoft Azure: Microsoft Azure offers a cognitive services
platform that includes an age and gender detection API. It uses
machine learning models to analyze facial features and predict age
and gender.

The Face API uses machine learning models to analyze facial
features such as the distance between the eyes, the shape of the
nose and mouth, and the texture of the skin to estimate age and
gender. The age estimation feature returns a range of ages, while
the gender estimation feature returns either male or female.

For age estimation, the Face API returns a range of ages rather than a
specific age, since age estimation based on facial features is not a
precise science. The age range is typically divided into five-year
increments, for example, "18-24", "25-29", and so on.

The Face API can be easily integrated into web or mobile applications,
making it accessible to developers who want to add age and gender
detection features to their projects without having to build the models
from scratch. Microsoft Azure also provides tools for customizing and
training the Face API models to improve their accuracy and
performance for specific use cases.

3.2 PROPOSED SYSTEM

Age and gender recognition are important areas of research in
computer vision and image processing. They have wide applications
in areas such as security, marketing, healthcare, and
entertainment. The use of deep learning techniques has significantly
improved the accuracy of age and gender recognition systems. The
Caffe library is one such deep learning framework that is widely
used for image classification, segmentation, and recognition. This
chapter proposes a system for age and gender recognition using the
Caffe library.

The proposed system for age and gender recognition consists of
three main components: data collection, feature extraction, and
model training.

The first step in developing an age and gender recognition
system is to collect a large dataset of images with labelled age and
gender information. There are several publicly available datasets
that can be used for this purpose, such as the Adience dataset, the
IMDB-WIKI dataset, and the MORPH-II dataset. These datasets
contain a large number of images of faces with labels indicating the
age and gender of the person in the image.

The second step in developing an age and gender recognition
system is to extract features from the images in the dataset. The
Caffe library provides several pre-trained deep neural network
models that can be used for feature extraction. These models
include AlexNet, VGGNet, and GoogLeNet. The features extracted
from these models can be used to train a classifier for age and
gender recognition.

The final step in developing an age and gender recognition
system is to train a classifier using the features extracted from the
images in the dataset. The classifier can be trained using several
machine learning algorithms, such as logistic regression, support
vector machines, or neural networks. The Caffe library provides
several pre-trained models for classification, such as CaffeNet,
which is based on the AlexNet architecture.

Once the classifier has been trained, it can be evaluated using a
test dataset. The accuracy of the system can be measured using
metrics such as precision, recall, and F1-score. The system can also
be evaluated using a confusion matrix, which provides information
about the true positive, false positive, true negative, and false
negative rates of the system.

The proposed system for age and gender recognition using the
Caffe library consists of three main components: data collection,
feature extraction, and model training. The system can be evaluated
using metrics such as precision, recall, and F1-score. The accuracy of
the system can be improved by using larger datasets, better feature
extraction techniques, and more advanced machine learning
algorithms. The proposed system has wide applications in areas such
as security, marketing, healthcare, and entertainment.

CHAPTER 4
FEASIBILITY STUDY

♦ In this phase, the project's viability is evaluated, and a
preliminary business plan with cost estimates is presented. In
the system analysis phase, the feasibility of the proposed
system is assessed to ensure that it won't pose any financial
burden on the company. To conduct the feasibility analysis, it's
important to have a basic understanding of the system's major
requirements.
♦ ECONOMIC FEASIBILITY
♦ TECHNICAL FEASIBILITY
♦ SOCIAL FEASIBILITY

4.1 ECONOMIC FEASIBILITY

The purpose of this study is to assess how the proposed system
will affect the organization's finances. As the amount of money the
company can allocate for the system's research and development is
limited, any expenses must be reasonable and justifiable. Therefore,
the system was developed within the budget, with most of the
technologies used being available for free. The only expenses were
for customized products that needed to be purchased.

4.2 TECHNICAL FEASIBILITY

The purpose of this study is to evaluate the technical feasibility
of the proposed system, specifically its technical requirements. It's
important to ensure that the system's development won't place a
heavy demand on the organization's technical resources, which
would result in a burden on the client. The ideal system should
have modest technical requirements, so that minimal or no changes
are needed to implement it.

4.3 SOCIAL FEASIBILITY

This study focuses on assessing the level of user acceptance of
the proposed system, which includes providing training to ensure
efficient use of the system. It's crucial that users don't feel
intimidated by the system, but rather view it as a necessary tool.
The level of user acceptance depends on the methods used to
educate and familiarize them with the system, as well as on
building their confidence to provide constructive feedback, which is
valuable as they are the system's final users.

4.4 HARDWARE REQUIREMENTS:

• System : Pentium Dual Core
• Hard Disk : 120 GB
• Monitor : 15’’ LED
• Input Devices : Keyboard, Mouse
• RAM : 1 GB
• Internet : NodeMCU (Wi-Fi Module ESP8266)
• Camera : Any moderate camera module

4.5 SOFTWARE REQUIREMENTS:

• Operating System : Windows 11
• Coding Language : Python
• Tool : PyCharm
• Libraries : OpenCV, Dlib, Pillow, NumPy



CHAPTER 5
SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE

Fig 1: Block diagram of the system

Fig 2: Flow Chart of Gender and Age Detection



5.2 Convolutional Neural Networks

Age and gender detection can be performed using Convolutional
Neural Networks (CNNs) in Python. A CNN is a type of deep
learning algorithm that is commonly used for image processing
tasks such as object detection, image classification, and facial
recognition. Here are the general steps to create a CNN model for
age and gender detection, with a code sketch after the list:
1. Data collection: Collect a large dataset of labeled images that
represent different ages and genders.
2. Data preprocessing: Preprocess the collected data by resizing
the images, converting them to grayscale, and normalizing the
pixel values.
3. Model architecture: Define the architecture of the CNN model.
The architecture should consist of convolutional layers, pooling
layers, and fully connected layers. The number of layers and the
number of neurons in each layer depend on the complexity of
the problem.
4. Training the model: Train the CNN model on the preprocessed
dataset. Use a training algorithm like stochastic gradient
descent (SGD) to update the model parameters.
5. Testing the model: Test the trained model on a separate
dataset to evaluate its performance. Use metrics like accuracy,
precision, and recall to measure the model's performance.
6. Predicting age and gender: Once the model is trained and
tested, it can be used to predict the age and gender of new
images.
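
To make these steps concrete, the following is a minimal sketch of
such a model written with the Keras API (discussed in Section 5.4).
The input size, layer sizes, and the use of eight age buckets are
illustrative assumptions, not the exact configuration of this project.

from tensorflow.keras import layers, models

def build_model(num_age_buckets=8):
    # Grayscale face crop (step 2: resized, grayscale, normalized pixels).
    inputs = layers.Input(shape=(64, 64, 1))
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)  # regularization (see Dropout Layer below)
    # Two heads: an age-bucket classifier and a binary gender classifier.
    age = layers.Dense(num_age_buckets, activation="softmax", name="age")(x)
    gender = layers.Dense(1, activation="sigmoid", name="gender")(x)
    model = models.Model(inputs, [age, gender])
    model.compile(optimizer="sgd",  # step 4: SGD training
                  loss={"age": "sparse_categorical_crossentropy",
                        "gender": "binary_crossentropy"},
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
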

LAYERS OF CNN

There are six main types of layers in a CNN:

1. Convolutional Layer
2. Pooling Layer
3. Activation Layer
4. Fully Connected Layer
5. Dropout Layer
6. Batch Normalization Layer

1. CONVOLUTIONAL LAYER

A Convolutional Layer is the core building block of a
Convolutional Neural Network (CNN). Its purpose is to extract
meaningful features from an input image or signal by applying a set
of learnable filters, also known as kernels or weights, that slide over
the input to produce a set of output feature maps.
The convolution operation can be thought of as a sliding window
that moves over the input image in a specific direction, multiplying the
values of the input and filter elements at each location and summing
the result to produce a single output value for that location in the
output feature map.
The size of the output feature maps can be controlled by the
following parameters:
Stride: the number of pixels by which the filter is shifted after each
convolution operation. A larger stride reduces the output size, while a
smaller stride preserves more spatial information.
Padding: the number of pixels added to the borders of the input to
maintain the size of the output. Padding can be used to avoid border
effects and to preserve spatial resolution.
Filter Size: the size of the filter, typically square, that is slid over the
input image. The size of the filter determines the receptive field of the
layer and the spatial frequency of the features that can be detected.
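
As a worked example of how these parameters interact, the spatial
output size of a convolution is commonly computed as

output = floor((N + 2P - F) / S) + 1

where N is the input size, F the filter size, P the padding, and S the
stride. The following is a minimal NumPy sketch of a single-channel
convolution with no padding and stride 1; it is for illustration only,
not this project's implementation.

import numpy as np

def conv2d_valid(image, kernel, stride=1):
    # Single-channel 2D convolution, no padding (illustrative sketch).
    n, f = image.shape[0], kernel.shape[0]
    out = (n - f) // stride + 1  # output size: (N - F) / S + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)  # multiply and sum
    return result

image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1]] * 3)  # simple vertical-edge filter
print(conv2d_valid(image, kernel).shape)  # (6, 6): (8 - 3) / 1 + 1
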
The filters in a convolutional layer are initialized randomly and
then adjusted during training to minimize the error between the
predicted output and the ground truth labels. This is achieved
through a process called backpropagation, which computes the
gradient of the loss function with respect to the filter weights and
updates them accordingly using an optimization algorithm such as
Stochastic Gradient Descent (SGD).
Convolutional Layers are particularly effective at capturing local
spatial correlations and patterns in an input image, such as edges,
corners, and textures, which are essential for tasks such as image
classification, object detection, and segmentation. The successive
layers in a CNN typically learn increasingly complex and abstract
features by combining and recombining the lower-level features
learned in the earlier layers.
In addition to the basic convolution operation, Convolutional Layers
often include additional components to enhance their performance
and flexibility:
Activation Function: The output of the convolutional layer is passed
through a non-linear activation function, such as ReLU, to introduce
non-linearity and increase the expressive power of the network.
Pooling Layer: The output of the convolutional layer is downsampled
using a pooling operation, such as max pooling or average pooling, to
reduce the size of the feature maps and improve the robustness of the
network to small spatial variations.
Batch Normalization: The output of the convolutional layer is
normalized across the batch dimension to improve the stability and
convergence of the training process and to reduce the sensitivity to
hyperparameters.
Dropout: Randomly selected activations in the output of the
convolutional layer are set to zero during training to prevent
overfitting and improve the generalization performance of the network.
Convolutional Layers are a fundamental component of Convolutional
Neural Networks that enable the extraction of meaningful features
from input images or signals. They use learnable filters to perform a
sliding convolution operation over the input and produce a set of
output feature maps that capture local spatial correlations and
patterns. Additional components, such as activation functions,
pooling layers, batch normalization, and dropout, can be added to
enhance the performance and flexibility of the layer.

Fig 3: Convolutional Layer

2. POOLING LAYER

The Pooling Layer is a crucial component of Convolutional
Neural Networks (CNNs) used in computer vision tasks such as
image recognition, object detection, and segmentation. The main
purpose of this layer is to reduce the dimensionality of the feature
maps produced by the convolutional layers while maintaining the
important information.
The Pooling Layer takes as input a 3D tensor representing the
feature maps produced by the convolutional layer. It operates on each
feature map independently, using a sliding window to partition the
map into non-overlapping regions, called pooling regions or pooling
windows. The most common pooling operation is max pooling, which
takes the maximum value within each pooling region and outputs it
as the new value for that region.
The main advantage of max pooling is that it is a non-linear
operation that preserves the most important features of the input
data, while reducing the dimensionality of the feature maps. By
reducing the number of parameters in the network, pooling also helps
to reduce overfitting and improve the generalization of the model.
Another advantage of pooling is that it introduces some degree
of translational invariance, meaning that small translations of the
input image will not affect the output of the pooling layer. This can
be helpful in object recognition tasks, where the location of the
object in the image may vary.
In addition to max pooling, there are other types of pooling
operations, such as average pooling, which computes the average
value within each pooling region. However, max pooling is the most
commonly used operation in CNNs, as it has been shown to be more
effective than average pooling in many cases.
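
As a concrete illustration, the following NumPy sketch implements
2x2 max pooling with stride 2, the most common configuration; it is
illustrative rather than this project's implementation.

import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    # Max pooling over windows of the feature map (illustrative sketch).
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            pooled[i, j] = window.max()  # keep the strongest activation
    return pooled

fm = np.arange(16).reshape(4, 4)
print(max_pool2d(fm))  # [[ 5.  7.] [13. 15.]]
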
The pooling layer can also be used with different parameters, such as
the size of the pooling window and the stride, which determines the
distance between adjacent pooling regions. A larger pooling window
reduces the dimensionality of the feature maps more aggressively,
while a smaller window preserves more detail. A larger stride reduces
the output size of the pooling layer, while a smaller stride preserves
more spatial information. However, it is important to note that pooling
can also have some disadvantages. For example, it can cause a loss of
information by discarding some of the input data, which can be
problematic in certain tasks, such as image segmentation. In addition,
pooling can also introduce some distortion or blurring of the features,
which can affect the accuracy of the model.
To mitigate these issues, some recent works have proposed
alternative methods for reducing the dimensionality of the feature
maps, such as strided convolution and dilated convolution, which
perform a similar function to pooling but without discarding any
information. However, pooling remains a widely used and effective
technique for reducing the dimensionality of the feature maps in
CNNs.
The pooling layer is an essential component of CNNs used in
computer vision tasks. It reduces the dimensionality of the feature
maps produced by the convolutional layers while maintaining the
important information, introducing some degree of translational
invariance, and reducing overfitting. Max pooling is the most
commonly used pooling operation in CNNs, but other types of
pooling and different parameters can also be used. While pooling
has some disadvantages, it remains a widely used and effective
technique for reducing the dimensionality of the feature maps in
CNNs.

Fig 4: Pooling Layer

3. ACTIVATION LAYER

The activation layer is a fundamental component of neural
networks, including convolutional neural networks (CNNs), and
plays a crucial role in introducing non-linearity into the model. The
activation layer applies a non-linear function to the output of the
previous layer, which allows the neural network to learn more
complex representations of the input data.
The activation layer is typically inserted after the convolutional and
pooling layers in a CNN, and its output is then passed to the next
layer, such as a fully connected layer or another convolutional layer.
The activation function can be any non-linear function, such as ReLU
(Rectified Linear Unit), sigmoid, or tanh.
ReLU is the most commonly used activation function in deep learning,
including CNNs. ReLU applies the following function to each element
of the input tensor:

f(x) = max(0, x)

where x is the input tensor. In other words, ReLU sets any negative
value in the input tensor to zero and leaves any positive value
unchanged. This simple non-linear function has several advantages,
including computational efficiency and improved convergence
during training.

Another popular activation function is the sigmoid function,
which maps the input tensor to a range between 0 and 1:

f(x) = 1 / (1 + e^(-x))

The sigmoid function is often used in binary classification tasks,
where the goal is to predict a binary outcome, such as whether an
image contains a cat or not.
The tanh function is similar to the sigmoid function but maps the
input tensor to a range between -1 and 1:

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

The tanh function can be used in place of the sigmoid function
in some cases and has the advantage of being symmetric around
zero.
The choice of activation function depends on the specific task and the
architecture of the neural network. In general, ReLU is a good default
choice for most deep learning tasks, as it has been shown to be
effective in practice and is computationally efficient.
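
As a quick illustration, the three functions above can be written
directly in NumPy; this is a sketch for clarity, not a library
implementation.

import numpy as np

def relu(x):
    return np.maximum(0, x)  # f(x) = max(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # f(x) = 1 / (1 + e^(-x))

def tanh(x):
    return np.tanh(x)  # f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))     # [0. 0. 2.]
print(sigmoid(x))  # approximately [0.119 0.5 0.881]
print(tanh(x))     # approximately [-0.964 0. 0.964]
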
In addition to introducing non-linearity into the model, the activation
layer also plays a crucial role in preventing the vanishing gradient
problem. This problem occurs when the gradients in the
backpropagation step become too small and prevent the neural
network from learning. The non-linear activation function ensures
that the gradients do not become too small, which helps to prevent the
vanishing gradient problem and allows the neural network to learn
more effectively.

The activation layer is a critical component of neural networks,
including CNNs. It applies a nonlinear function to the output of the
previous layer, which introduces non-linearity into the model and
prevents the vanishing gradient problem. The choice of activation
function depends on the specific task and architecture of the neural
network, but ReLU is a good default choice for most deep learning
tasks.

4. FULLY CONNECTED LAYER

The Fully Connected Layer, also known as the Dense Layer, is a
crucial component of a neural network that connects every neuron
in one layer to every neuron in the next layer. This layer is often
used as the final layer in a neural network for classification tasks,
as it outputs the predicted class probabilities based on the learned
features.
Working of Fully Connected Layer:
The input to the fully connected layer is a flattened vector of the
output from the previous layer. The fully connected layer then applies
a set of weights and biases to this input, producing a new set of
activations that are passed through an activation function.
The weights in the fully connected layer are learned through the
process of backpropagation, where the gradients of the loss function
with respect to the weights are computed and used to update the
weights using an optimization algorithm such as stochastic gradient
descent.
During training, the fully connected layer adjusts its weights and
biases to minimize the error between the predicted outputs and the
true outputs. This allows the model to learn the underlying patterns
in the input data and make accurate predictions on new data.
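
In equation form, a fully connected layer computes y = f(Wx + b),
where W and b are the learned weights and biases and f is the
activation function. A minimal NumPy sketch of the forward pass
follows; the layer sizes are illustrative assumptions.

import numpy as np

def dense_forward(x, W, b):
    # Fully connected forward pass: y = ReLU(Wx + b).
    return np.maximum(0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)        # flattened output of the previous layer
W = rng.standard_normal((10, 128))  # 10 neurons, each seeing all 128 inputs
b = np.zeros(10)
print(dense_forward(x, W, b).shape)  # (10,)
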
The number of neurons in the fully connected layer is typically chosen
based on the complexity of the task and the size of the input data. A
larger number of neurons can increase the capacity of the model, but
can also lead to overfitting if not properly regularized.
The activation function used in the fully connected layer is often a
nonlinear function, such as the Rectified Linear Unit (ReLU), which
helps introduce non-linearity into the model and allows it to learn
more complex representations. Other commonly used activation
functions include the sigmoid and hyperbolic tangent functions.
Fully connected layers are commonly used in deep learning models for
a variety of tasks, including image classification, natural language
processing, and speech recognition. In computer vision, fully
connected layers are often used in conjunction with convolutional
layers to form a Convolutional Neural Network (CNN), which is a type
of neural network that is particularly well-suited for image
classification tasks.
The Fully Connected Layer is a crucial component of a neural network
that connects every neuron in one layer to every neuron in the next
layer. It is commonly used as the final layer in a neural network for
classification tasks, and is learned through the process of
backpropagation. Despite its limitations, fully connected layers remain
an important tool in the deep learning toolbox and are used in a
variety of applications.

5. DROPOUT LAYER

The dropout layer is a regularization technique commonly used
in neural networks, including Convolutional Neural Networks
(CNNs). The main idea behind dropout is to prevent overfitting by
randomly dropping out (i.e., setting to zero) some of the neurons in
a layer during training.
Overfitting occurs when a model learns to fit the training data too well
and performs poorly on new, unseen data. Dropout helps to prevent
overfitting by forcing the network to learn more robust features that
are not dependent on the presence of specific neurons.
In practice, dropout works by randomly selecting a subset of the
neurons in a layer and setting their outputs to zero during each
training iteration. The probability of dropping out a neuron is a
hyperparameter that needs to be set by the user, typically between
0.1 and 0.5. During inference (i.e., when making predictions on new
data), all neurons are used, and their outputs are scaled by the
probability of being present during training.
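
A minimal NumPy sketch of this behavior follows. It uses the
common "inverted dropout" convention, which rescales the
surviving activations at training time instead of scaling at inference;
this is mathematically equivalent to the scaling described above and
is an assumption about implementation style, not this project's
code.

import numpy as np

def dropout(activations, drop_prob=0.5, training=True):
    # Inverted dropout: zero out units at train time, rescale the rest.
    if not training or drop_prob == 0.0:
        return activations  # inference: all neurons are used
    keep_prob = 1.0 - drop_prob
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob  # expected value unchanged

a = np.ones(8)
print(dropout(a, drop_prob=0.5))                  # ~half zeros, rest = 2.0
print(dropout(a, drop_prob=0.5, training=False))  # unchanged at inference
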
Dropout has been shown to be an effective regularization technique in
many deep learning applications, including image classification,
speech recognition, and natural language processing. It has several
benefits, including:
Reducing overfitting: Dropout prevents the network from memorizing
the training data too well and forces it to learn more robust features
that generalize better to new, unseen data.
Improving generalization: Dropout helps the network to avoid learning
features that are only present in the training data and encourages it
to learn more diverse features that are useful for a wider range of
inputs.

6. BATCH NORMALIZATION LAYER

Batch Normalization is a technique used in deep learning to
improve the training of neural networks. It is an important part of
many state-of-the-art architectures, such as ResNet, DenseNet, and
Inception. Batch Normalization helps to stabilize the training
process, prevent overfitting, and improve the performance of the
model. In this section, we will discuss the concept and working of
the Batch Normalization Layer.
Batch Normalization is a normalization technique applied to the
activations of the neural network. The main idea behind Batch
Normalization is to normalize the inputs to each layer in a way that
the distribution of inputs is approximately Gaussian. This is done by
normalizing the activations of the previous layer over a mini-batch of
data.
The Batch Normalization Layer has four main steps:

Mean and Variance Computation: The first step is to compute the
mean and variance of the activations over the mini-batch of data.
This is done by taking the average of the activations and computing
the variance.

Normalization: The next step is to normalize the activations by
subtracting the mean and dividing by the standard deviation. This
ensures that the activations have zero mean and unit variance.

Scaling and Shifting: The normalized activations are then scaled and
shifted using learnable parameters. This allows the network to learn
the optimal scale and shift for each activation.

Activation Function: The final step is to apply the activation
function to the scaled and shifted activations. This produces the
output of the Batch Normalization Layer.
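
In formula terms, the first three steps compute
x_hat = (x - mean) / sqrt(var + eps) followed by
y = gamma * x_hat + beta, where gamma and beta are the learnable
scale and shift. A minimal NumPy sketch of the training-time
forward pass follows; the small constant eps and the running
statistics used at inference are implementation details assumed
here for illustration.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Training-time batch norm over a mini-batch of shape (batch, features).
    mean = x.mean(axis=0)                    # step 1: per-feature mean
    var = x.var(axis=0)                      # step 1: per-feature variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # step 2: zero mean, unit variance
    return gamma * x_hat + beta              # step 3: learned scale and shift

x = np.random.randn(32, 4) * 10 + 5  # mini-batch with skewed statistics
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3))  # approximately 0 per feature
print(y.std(axis=0).round(3))   # approximately 1 per feature
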

The benefits of Batch Normalization include:

Improved Stability: Batch Normalization helps to stabilize the
training process by reducing internal covariate shift. This is the
phenomenon where the distribution of inputs to each layer changes
during training, making it harder for the network to learn.

Improved Generalization: Batch Normalization helps to prevent
overfitting by reducing the dependence of each layer on the other
layers. This means that the network can learn more independent
features, which improves its ability to generalize to new data.

Faster Convergence: Batch Normalization speeds up the training
process by allowing the use of larger learning rates. This is because
the normalization of the inputs reduces the gradient magnitudes,
making it easier for the optimizer to find the optimal parameters.

Batch Normalization is a powerful technique used in deep
learning to improve the training of neural networks. It helps to
stabilize the training process, prevent overfitting, and improve the
performance of the model. The Batch Normalization Layer
normalizes the activations of the neural network by computing the
mean and variance over a mini-batch of data, normalizing the
activations, scaling and shifting them, and applying the activation
function.

5.3 Cascade

Cascade classifiers are a popular technique for gender and age
detection in images. Cascade classifiers are machine learning
algorithms that use Haar-like features and AdaBoost to classify
images based on certain features.
Haar-like features are a type of feature used to detect specific patterns
in images, such as edges or lines. AdaBoost is an algorithm used to
train a classifier on a dataset by iteratively improving the classifier's
accuracy.
Cascade classifiers work by breaking down an image into
smaller and smaller regions and then classifying each region as
positive or negative for the target feature (in this case, gender or age).
The algorithm then combines these regions to make a final
classification for the entire image.
The advantage of cascade classifiers is that they are fast and efficient,
making them ideal for real-time applications like video streaming or
live camera feeds. The downside is that they may not be as accurate
as other techniques like CNNs, especially when dealing with complex
or ambiguous images.
To use cascade classifiers for gender and age detection in
Python, we can use the OpenCV library. OpenCV ships with
pre-trained cascade classifiers for face detection, and pre-trained
age and gender cascades from third-party sources can be loaded in
the same way. The pre-trained classifiers are XML files that contain
the Haar-like features and AdaBoost parameters that are used to
detect the target features (age and gender) in images. Cascade
classifier files commonly used in this kind of pipeline include:
1. haarcascade_frontalface_alt2.xml: This is a general-purpose face
detector that can be used to detect faces in images. It is not
specifically designed for age or gender detection, but it can be
used as a pre-processing step to isolate the face region before
applying the age or gender classifiers.
2. haarcascade_frontalface_default.xml: This is another general-
purpose face detector that can be used to detect faces in
images.
3. haarcascade_age.xml: This classifier is specifically designed to
detect the age of a person in an image. It is trained on a dataset
of images that are labeled with the age of the person.
4. haarcascade_gender_female.xml: This classifier is designed to
detect the gender of a female person in an image. It is trained on
a dataset of images that are labeled with the gender of the
person.
5. haarcascade_gender_male.xml: This classifier is designed to
detect the gender of a male person in an image. It is trained on
a dataset of images that are labeled with the gender of the
person.

To use these cascade classifiers in your Python code, you can load the
XML file using the CascadeClassifier class provided by OpenCV, and
then use the detectMultiScale function to detect the target feature
(age or gender) in the image.
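
A minimal sketch of this flow is shown below, using the
general-purpose face cascade that ships with OpenCV (located via
cv2.data.haarcascades). The image path is a placeholder, and a
third-party age or gender cascade XML, where available, would be
loaded in exactly the same way.

import cv2

# Load the general-purpose face detector that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("test.jpg")  # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar cascades use grayscale

# detectMultiScale returns one (x, y, w, h) rectangle per detection.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", img)
print("Detected", len(faces), "face(s)")
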
Advantages:

1. Fast and efficient: Cascade classifiers are designed to be fast
and efficient, making them ideal for real-time applications like
video streaming or live camera feeds. They can process images
quickly and accurately, even on low-powered devices like mobile
phones or embedded systems.
2. Low memory usage: Cascade classifiers require very little
memory to operate, making them ideal for use in low-memory
environments.
3. Easy to use: Cascade classifiers are easy to use, with simple
APIs provided by popular libraries like OpenCV. You can easily
incorporate them into your Python code and use them for age
and gender detection with minimal effort.
4. Good accuracy: While cascade classifiers may not be as
accurate as other techniques like CNNs, they can still achieve
good accuracy in many cases, especially when trained on large
datasets.
5. Robust: Cascade classifiers are designed to be robust to
changes in lighting, pose, and other factors that can affect the
accuracy of age and gender detection. They can detect features
even when they are partially occluded or obscured, making
them suitable for use in a wide range of applications.

Overall, cascade classifiers are a useful technique for age and gender
detection in images, especially in applications where speed and
efficiency are critical. While they may not be as accurate as other
techniques in all cases, they are a good option to consider when
designing an image-based application.

5.4 DNN (Deep Neural Network)

Fig 5: Deep Neural Network

Deep Neural Networks (DNNs) have been at the forefront of
revolutionizing machine learning, particularly in the fields of
computer vision, speech recognition, natural language processing,
and robotics. These networks are built with multiple layers that
transform input data into output data, enabling them to learn and
make predictions with high accuracy. However, developing these
networks can be quite challenging, given the complexity of their
architecture and the large amounts of data required for training.
This is where DNN libraries come in handy.

A DNN library is a software package that contains pre-written
code to help developers build deep learning models with ease. These
libraries come equipped with a variety of features, including
algorithms for optimizing model performance, pre-trained models,
and tools for data processing and visualization.

One of the most popular DNN libraries in use today is
TensorFlow. TensorFlow is an open-source software library for
building and deploying machine learning models. Developed by the
Google Brain Team, TensorFlow has become a go-to tool for data
scientists and researchers due to its extensive functionality and
ease of use.

TensorFlow provides a high-level API called Keras, which
simplifies the process of building neural networks. Keras provides a
user-friendly interface that allows developers to create complex
models with just a few lines of code. It also includes a variety of
pre-trained models that can be used for a range of applications.

Another popular DNN library is PyTorch, an open-source
machine learning library developed by Facebook. PyTorch provides
a dynamic computational graph that allows developers to easily
define, modify, and debug their neural networks. It also includes a
variety of tools for data processing, including support for popular
data formats like NumPy and Pandas.

PyTorch also features a high-level API called TorchVision, which
is specifically designed for computer vision applications.
TorchVision includes pre-trained models for a variety of tasks, such
as object detection, image classification, and segmentation.

Caffe is another popular DNN library, developed by the Berkeley
Vision and Learning Center. Caffe is a fast and efficient framework
that is designed to work with large datasets. It includes a variety of
pre-trained models, including models for image classification and
object detection.

One of the advantages of Caffe is its speed. Caffe is optimized
for high-performance computing and can be used on a range of
hardware, including GPUs and CPUs. This makes it a popular
choice for researchers and data scientists who need to process
large amounts of data quickly.

In addition to TensorFlow, PyTorch, and Caffe, there are many
other DNN libraries available, including Theano, MXNet, and
Chainer. Each library has its own set of features and advantages,
and the choice of library depends on the specific requirements of
the application.

In conclusion, DNN libraries are an essential tool for building
deep learning models. These libraries simplify the process of
building and deploying models by providing pre-written code and
pre-trained models. TensorFlow, PyTorch, and Caffe are some of the
most popular DNN libraries in use today, but there are many other
options available. By using these libraries, developers and data
scientists can focus on developing innovative solutions rather than
spending time on the technical details of building a neural network
from scratch.

LAYERS OF DNN

There are three types of layers in a DNN:

1. Input Layer
2. Hidden Layer
3. Output Layer

1. INPUT LAYER

The input layer is the first layer of a deep neural network (DNN)
that receives the input data. It is responsible for transforming the raw
input data into a format that can be processed by the network. The
number of nodes in the input layer corresponds to the number of
input features or variables in the data.

Each node in the input layer represents a feature or variable in
the input data. For example, if the input data consists of an image
with height, width, and color channels, the input layer nodes will
correspond to each pixel value in the image. The input layer nodes
can also represent non-image features, such as numerical or
categorical data.

The input layer nodes do not perform any computation or
transformation on the input data; they simply pass the data to the
next layer in the network. However, the input layer is essential for
the network to learn meaningful representations of the input data
through the subsequent layers.

During the training process, the weights and biases of the
connections between the input layer nodes and the next layer nodes
are adjusted to optimize the network's performance. This allows the
network to learn patterns and relationships in the input data, and
make accurate predictions or classifications.

The input layer of a DNN is responsible for receiving the input
data and passing it to the subsequent layers of the network. It is a
crucial component of the network architecture and plays a
significant role in the network's ability to learn from the input data.

2. HIDDEN LAYER

Hidden layers are the intermediate layers in a deep neural
network (DNN) between the input and output layers. They are called
"hidden" because their internal workings are not directly visible to
the user and are not part of the final output of the network. Hidden
layers are responsible for processing the input data by applying a
set of nonlinear transformations to the input features, which allows
the network to learn complex representations of the input data.

Each hidden layer in a DNN consists of multiple nodes or
neurons, which are connected to the nodes in the previous and next
layers. Each connection is associated with a weight and a bias,
which are learned during training to optimize the network's
performance. The nodes in each hidden layer apply an activation
function to the weighted sum of their inputs, which introduces
nonlinearity into the network and enables it to model complex
relationships between the input and output variables.

The number of hidden layers and the number of nodes in each
layer can vary depending on the complexity of the input data and
the task at hand. Generally, increasing the number of hidden layers
and nodes can improve the network's ability to learn complex
features and generalize to new data, but it also increases the risk of
overfitting if the network is too large or the training data is
insufficient.

Hidden layers play a crucial role in the success of DNNs,
allowing them to learn complex representations of input data and
make accurate predictions or classifications.

3. OUTPUT LAYER

The output layer of a deep neural network (DNN) is the final
layer in the network that produces the output prediction or
classification based on the input data. The design of the output
layer depends on the nature of the task being performed by the
network. For example, in a binary classification problem where the
goal is to predict whether an input belongs to one of two classes
(such as spam vs. not spam), the output layer would typically
consist of a single node with a sigmoid activation function. The
output of this node would represent the predicted probability that
the input belongs to the positive class.

In a multiclass classification problem where there are more than
two classes (such as classifying images of animals into different
species), the output layer would typically consist of multiple nodes,
with each node representing a different class. The nodes would
typically use a softmax activation function, which produces a
probability distribution over the different classes.
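
As a small worked example, a softmax output layer turns the raw
scores (logits) of the final layer into a probability distribution; the
NumPy sketch below uses three illustrative classes.

import numpy as np

def softmax(logits):
    # Convert raw output-layer scores into a probability distribution.
    z = logits - logits.max()  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])  # illustrative scores for three classes
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099], sums to 1
print(probs.argmax())  # index of the predicted class: 0
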

In regression problems where the goal is to predict a continuous
numerical value (such as the price of a house), the output layer
would typically consist of a single node with a linear activation
function.

During training, the weights and biases of the connections
between the nodes in the output layer and the previous layer(s) are
adjusted to minimize the difference between the predicted output
and the actual output. The loss function used during training
depends on the specific task being performed by the network.

Fig 6: Structure of DNN

5.5 CAFFE LIBRARY

Caffe: A High-Performance Deep Learning Framework

Caffe (Convolutional Architecture for Fast Feature Embedding)
is a deep learning framework developed by the Berkeley Vision and
Learning Center (BVLC) at the University of California, Berkeley. It
is a free and open-source software library written in C++ and CUDA
that allows users to train and deploy deep neural networks for a
wide range of computer vision tasks.

One of the main advantages of Caffe is its high performance,
which is achieved through the use of efficient algorithms and
parallel computing techniques. The library is optimized for NVIDIA
GPUs, which allow it to train and deploy models significantly faster
than traditional CPU-based approaches.

Caffe supports a wide range of deep learning architectures,
including convolutional neural networks (CNNs), recurrent neural
networks (RNNs), and deep belief networks (DBNs). It also includes
a range of pre-trained models for popular computer vision tasks
such as image classification, object detection, and segmentation.

One of the key features of Caffe is its flexibility, which allows
users to easily customize and extend the library to meet their
specific needs. For example, users can define their own layers, loss
functions, and optimization algorithms, or modify existing ones to
better suit their data and models.

Caffe also provides a user-friendly interface for building and
training deep neural networks. Users can create and configure
network architectures using a simple text-based description
language, which allows them to define the number and type of
layers as well as their parameters and connections.

In addition, Caffe includes a powerful visualization toolkit that
allows users to monitor and analyze the training process in real
time. This includes tools for visualizing the network structure,
displaying feature maps and filters, and plotting training and
validation curves.

One of the most popular applications of Caffe is image
classification, where it has achieved state-of-the-art performance on
a range of benchmark datasets. For example, in the ImageNet Large
Scale Visual Recognition Challenge (ILSVRC), Caffe-based models
have performed competitively, with reported top-1 and top-5 error
rates of 15.3% and 7.4%, respectively, in the 2014 competition.

Another area where Caffe has been widely used is object
detection, where it has been applied to a range of tasks, such as
pedestrian detection, face detection, and vehicle detection.
Caffe-based models have achieved state-of-the-art performance on
benchmark datasets such as PASCAL VOC and MS COCO,
demonstrating the library's versatility and adaptability to different
applications.

Overall, Caffe is a powerful and flexible deep learning
framework that has been widely adopted in the computer vision
community. Its high performance and ease of use make it an
attractive choice for researchers and practitioners looking to train
and deploy deep neural networks for a wide range of applications.
With its active development community and growing user base,
Caffe is likely to continue to play a key role in the advancement of
deep learning in the years to come.

5.6 deploy_gender.prototxt

deploy_gender.prototxt is a file that contains the architecture of
a convolutional neural network (CNN) used for gender classification.
It is an important file used in deep learning models to classify the
gender of a person in images or videos. In this section, we will
discuss the deploy_gender.prototxt file, its structure, and how it is
used in gender classification.

The deploy_gender.prototxt file is a text file that describes the
architecture of the CNN used for gender classification. It is written
in the Protocol Buffer format, which is a language-neutral,
platform-neutral, extensible way of serializing structured data for
use in communications protocols, data storage, and more. The file
contains the network's layers, their types, and their parameters.
These layers include convolutional layers, pooling layers, and fully
connected layers.

The convolutional layers are responsible for extracting features
from the input image. These layers consist of a set of filters that
slide over the image and compute the dot product between the filter
weights and the pixel values in the input image. The output of these
layers is a set of feature maps that capture different features of the
input image.

The pooling layers are used to reduce the size of the feature maps and
make the network more computationally efficient. These layers
perform a non-linear down-sampling operation, which reduces the
spatial dimensions of the input.

The fully connected layers are used to perform the final classification. These layers take the output of the previous layers and
convert it into a probability distribution over the two classes, male
and female. The class with the highest probability is then selected as
the predicted gender.

The deploy_gender.prototxt file is used during the inference phase of the CNN. During this phase, an image is passed through the
network, and the output is computed using the weights of the
network. The weights are learned during the training phase, where the
network is trained on a large dataset of labeled images.

In order to use the deploy_gender.prototxt file, it must be loaded into a deep learning framework such as TensorFlow, Caffe, or
PyTorch. Once loaded, the network can be used to classify the gender
of a person in an image or video.
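
A minimal sketch of this inference phase, using OpenCV's dnn module with the file names adopted in this project, could look as follows (the cropped face image is a placeholder; the preprocessing values match those used in the appendix):

import cv2

GENDER_LIST = ['Male', 'Female']
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)

# Architecture from the prototxt, learned weights from the caffemodel.
gender_net = cv2.dnn.readNetFromCaffe("deploy_gender.prototxt",
                                      "gender_net.caffemodel")

face_img = cv2.imread("face_crop.jpg")  # placeholder: a pre-cropped face
blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0,
                             size=(227, 227), mean=MODEL_MEAN_VALUES,
                             swapRB=False)
gender_net.setInput(blob)
preds = gender_net.forward()            # one probability per class
print(GENDER_LIST[preds[0].argmax()], preds[0].max())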

The deploy_gender.prototxt file is an important file used in deep learning models for gender classification. It contains the architecture of the CNN used for classification, including its layers, types, and parameters. The file is used during the inference phase of the CNN, where it is loaded into a deep learning framework and used to classify the gender of a person in an image or video.

5.7 gender_net.caffemodel

Gender_net.caffemodel is a pre-trained convolutional neural network model that is designed to recognise the gender of a person in
an image. This model is part of the Caffe deep learning framework,
which is widely used for computer vision tasks. In this essay, we will
discuss the architecture and functionality of gender_net.caffemodel,
as well as its applications and limitations.

The gender_net.caffemodel is a deep neural network that consists of multiple layers. It is based on the Convolutional Neural
Network (CNN) architecture, which is a popular deep learning model
for image classification. The model takes an input image and passes it
through multiple layers of convolutional filters and pooling operations
to extract features from the image. These features are then fed into
fully connected layers to classify the gender of the person in the
image.

The architecture of gender_net.caffemodel is designed to recognise gender from facial features such as the shape of the face,
the presence of facial hair, and the length and style of the hair. The
model was trained on a large dataset of images containing both males
and females. During training, the model learned to differentiate
between male and female features and to assign a probability score to
each gender.

The functionality of gender_net.caffemodel is straightforward. Given an input image, the model outputs a probability score for each
gender. The gender with the highest probability score is considered
the gender of the person in the image. The model is capable of
recognising gender with high accuracy, making it useful for a variety
of applications.
One application of gender_net.caffemodel is in security systems. For example, security cameras can be set up to use the model to
recognise the gender of people entering a building. This can be useful
for tracking the movements of individuals and ensuring the safety of
the building. The model can also be used in marketing and advertising
to target products and services at specific genders. For example, a
clothing retailer can use the model to show clothing options to
customers based on their gender.

However, there are also limitations to the use of gender_net.caffemodel. Firstly, the model may not work well on images
that do not contain clear facial features. This can be due to poor
lighting, poor image quality, or images that do not contain faces.
Secondly, the model may not be accurate in recognising gender for
individuals who do not conform to traditional gender norms. This can
be due to variations in hairstyles, facial hair, or other features that
may not fit into traditional gender categories.

In conclusion, gender_net.caffemodel is a pre-trained convolutional neural network model that is designed to recognise the
gender of a person in an image. The model is based on the CNN
architecture and is capable of recognising gender with high accuracy.
The model has a variety of applications in security systems,
marketing, and advertising. However, there are also limitations to the
use of the model, particularly in images of poor quality or for
individuals who do not conform to traditional gender norms.

5.8 deploy.prototxt

Deploy.prototxt is a configuration file that specifies the architecture of a neural network and its parameters. It is an essential
component of the Caffe framework, which is an open-source deep
learning library developed by the Berkeley Vision and Learning Center.
This file is responsible for defining the structure of the neural network
and providing the necessary information for the model to be deployed.
The deploy.prototxt file describes the structure of the neural network architecture, including the input and output layers as well as
any intermediate layers. It specifies the layer types, such as
convolutional, pooling, and fully connected layers, and their
associated parameters, including the kernel size, stride, and activation
functions. Additionally, the file contains information about the input
data shape, batch size, and mean values, which are used to normalize
the input data.

The deploy.prototxt file is used in the inference phase, where the trained model is applied to new data to make predictions. During
this phase, the file is loaded along with the trained model weights, and
the input data is fed through the network. The output of the model is
then compared to the ground truth to evaluate its performance.

One of the advantages of using the deploy.prototxt file is its flexibility. It allows users to customise the neural network architecture
and its parameters to suit their specific needs. Additionally, the file
can be easily modified using a text editor, making it easy to
experiment with different configurations and architectures.

To create a deploy.prototxt file, one must first define the neural network architecture using the Caffe model definition language. This
language is used to specify the layers and their parameters, as well as
the connections between the layers. Once the model architecture is
defined, the deploy.prototxt file can be generated by specifying the
input and output layers and any other necessary parameters.
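
As a rough illustration, the sketch below loads only a deploy file and lists the layers it declares; this relies on OpenCV's readNetFromCaffe() being able to build the network graph from the prototxt alone, and the file name is illustrative.

import cv2

# Build the network graph from the deploy file alone (no weights).
net = cv2.dnn.readNetFromCaffe("deploy.prototxt")

# Walk the declared layers and print each one's name and type.
for name in net.getLayerNames():
    layer = net.getLayer(net.getLayerId(name))
    print(layer.name, "->", layer.type)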

The deploy.prototxt file is a crucial component of the Caffe deep learning framework. It provides a way to define the neural network
architecture and its parameters, making it possible to deploy the
trained model on new data. Its flexibility and ease of modification
make it an essential tool for researchers and developers working with
deep learning models.

5.9 res10_300x300_ssd_iter_140000_fp16.caffemodel

Res10_300x300_ssd_iter_140000_fp16.caffemodel is a deep learning model that is widely used for face detection. It follows the Single Shot Multibox Detector (SSD) architecture with a ResNet-10 backbone; as the file name indicates, it expects 300x300 input images, was trained for 140,000 iterations, and is stored in half-precision (FP16) floating point. This essay will discuss the Res10_300x300_ssd_iter_140000_fp16.caffemodel model, its architecture, and its applications.

Object detection is the process of identifying and locating objects in an image or video. It is a crucial task in computer vision and has numerous applications, including surveillance, autonomous vehicles, and robotics. Deep learning has revolutionized the field of computer vision and led to significant advancements in object detection. One such advancement is the Single Shot Multibox Detector (SSD) architecture, which is a real-time object detection model; face detection is the special case in which the only object category of interest is the human face.

The SSD architecture is a feedforward convolutional neural network (CNN) that produces a set of bounding boxes and class scores in a single forward pass. The Res10_300x300_ssd_iter_140000_fp16.caffemodel is a pre-trained SSD model that is based on the ResNet-10 architecture. ResNet-10 is a smaller version of the ResNet (Residual Network) architecture that was introduced in 2015 and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that year.

ResNet-10 consists of 10 layers and uses residual blocks, which are a type of skip connection that allows the network to learn residual functions. The Res10_300x300_ssd_iter_140000_fp16.caffemodel combines this ResNet-10 backbone with an SSD detection head and was trained as a single-class detector on a large collection of face images. It is distributed with OpenCV as its default deep-learning face detector and is the face detection model used in this project.
The Res10_300x300_ssd_iter_140000_fp16.caffemodel has several advantages over other face detection approaches. Firstly, it is a real-time model, which means it can detect faces in video frames at a rate of up to 30 frames per second on modest hardware. This makes it suitable for applications such as autonomous vehicles and drones. Secondly, it is a lightweight model whose half-precision (FP16) weights halve its storage footprint, which means it can run on devices with limited computational resources, such as smartphones and embedded systems. Finally, it is markedly more robust to variations in pose, scale, and lighting than classical detectors such as Haar cascades.
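
The sketch below shows how the model's raw output is decoded into pixel-space bounding boxes; it mirrors the get_faces() routine in the appendix, and the input image name is a placeholder.

import cv2
import numpy as np

face_net = cv2.dnn.readNetFromCaffe(
    "deploy.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")

frame = cv2.imread("group_photo.jpg")  # placeholder image path
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104.0, 177.0, 123.0))
face_net.setInput(blob)
output = np.squeeze(face_net.forward())  # rows of candidate detections

# Each row holds [image_id, label, confidence, x1, y1, x2, y2],
# with corners expressed as fractions of the frame size.
for row in output:
    if row[2] > 0.5:  # confidence threshold
        x1, y1, x2, y2 = (row[3:7] * np.array([w, h, w, h])).astype(int)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)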

The Res10_300x300_ssd_iter_140000_fp16.caffemodel has numerous applications in various fields. In the field of autonomous vehicles, it can be used for driver and passenger monitoring. In the field of surveillance, it can be used for identifying and tracking individuals. In the field of robotics, it can be used to detect people for natural human-robot interaction. In this project, it supplies the cropped face regions that are passed to the age and gender networks.

The Res10_300x300_ssd_iter_140000_fp16.caffemodel is a deep learning model that is widely used for face detection. It is a part of the Single Shot Multibox Detector (SSD) family and is built on the ResNet-10 architecture. The model has several advantages over other detectors, including real-time performance, a lightweight design, and high accuracy, and it has numerous applications in various fields, including autonomous vehicles, surveillance, and robotics.

5.10 deploy_age.prototxt

Deploy_age.prototxt is a file used for deploying deep learning models. It contains a network definition that describes the architecture of the model and the input and output data formats; the trained parameters themselves live in the companion age_net.caffemodel file. This file is used to define the architecture of the model and the layers that make it up.
Deep learning models are becoming increasingly popular in various fields. They are used for image recognition, natural language
processing, speech recognition, and many other applications.
However, deploying these models can be a challenging task. The
deploy_age.prototxt file is designed to make this process easier.

The deploy_age.prototxt file is a text file that contains a set of instructions for deploying the deep learning model. It defines the architecture of the model and the input and output data formats. This file is used to deploy the model on various platforms, such as CPUs, GPUs, and mobile devices.

The first section of the deploy_age.prototxt file contains the network definition. This section defines the architecture of the model and the layers that make it up. Each layer in the model is defined by a set of parameters, such as the type of layer, the size of the filter, and the number of filters. The network definition also specifies the input and output data formats for the model.

The second component needed at deployment time is the set of trained parameters. These parameters are the weights and biases that were learned during the training phase of the model; they are stored in the age_net.caffemodel file and are loaded into the network defined by deploy_age.prototxt in order to make predictions on new data.

The deploy_age.prototxt file can be used to deploy the model on different platforms. For example, if the model needs to be deployed on
a CPU, the deploy_age.prototxt file can be used to generate a CPU
implementation of the model. Similarly, if the model needs to be
deployed on a mobile device, the deploy_age.prototxt file can be used
to generate a mobile implementation of the model.

One of the advantages of using the deploy_age.prototxt file is that it is portable. The file can be used to deploy the model on different platforms without having to rewrite the entire code. This saves time and effort and makes the deployment process more efficient.

Another advantage of using the deploy_age.prototxt file is that it is easy to use. The file contains all the information required to deploy
the model, so the user does not have to worry about the details of the
implementation. This makes it easier for non-experts to deploy deep
learning models.

The deploy_age.prototxt file is an essential tool for deploying deep learning models. It contains a network definition that describes the architecture of the model and the input and output data formats, into which the trained parameters are loaded. This file is used to define the architecture of the model and the layers that make it up. The deploy_age.prototxt file is portable, making it easy to deploy the model on different platforms.
It is also easy to use, making it accessible to non-experts. Overall, the
deploy_age.prototxt file is a valuable asset for deploying deep learning
models.

5.11 age_net.caffemodel

AgeNet is a deep learning model that is used for age estimation. It was introduced by Gil Levi and Tal Hassner and is based on the Convolutional Neural Network (CNN) architecture. AgeNet is designed to predict the age of a person based
on their facial features. The model is pre-trained using a large dataset
of facial images and is capable of accurately predicting age even when
presented with images that are not part of the training set.

The AgeNet model is based on the Caffe deep learning framework. Caffe is a popular framework for building and training
deep neural networks. It is written in C++ and optimized for
performance on both CPU and GPU architectures. The Caffe
framework provides a set of pre-trained models, including AgeNet, that
can be used for a variety of computer vision tasks.
The AgeNet model consists of three convolutional layers and two fully connected layers, followed by a softmax output layer. The input to the model is a 227x227-
pixel image of a face. The output of the model is a probability
distribution over age ranges. The age ranges used in the AgeNet model
are 0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, and 60-100. The
model is trained using a large dataset of facial images with labelled
ages.

The AgeNet model is available as a pre-trained Caffe model file, age_net.caffemodel. This file contains the weights and biases learned
by the model during training. The model can be loaded into Caffe or
another deep learning framework and used for age estimation tasks.
The AgeNet model can also be fine-tuned on a new dataset to improve
its accuracy on specific tasks.
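
A minimal sketch of such an age estimation pass, using the preprocessing constants from the appendix (the face crop path is a placeholder):

import cv2

AGE_RANGES = ['(0-2)', '(4-6)', '(8-13)', '(15-20)',
              '(25-32)', '(38-43)', '(48-53)', '(60-100)']
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)

age_net = cv2.dnn.readNetFromCaffe("deploy_age.prototxt",
                                   "age_net.caffemodel")

face_img = cv2.imread("face_crop.jpg")  # placeholder: a pre-cropped face
blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0,
                             size=(227, 227), mean=MODEL_MEAN_VALUES,
                             swapRB=False)
age_net.setInput(blob)
preds = age_net.forward()               # one probability per age range
idx = preds[0].argmax()
print("Predicted range:", AGE_RANGES[idx], "confidence:", preds[0][idx])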

Age estimation is an important task in computer vision and has applications in a variety of fields, including security, marketing, and
healthcare. The AgeNet model is a state-of-the-art approach to age
estimation and has been shown to outperform other methods on
several benchmark datasets. The model is also computationally
efficient, making it well suited for deployment on mobile and
embedded devices.

AgeNet is a deep learning model for age estimation that is based on the Caffe deep learning framework. The model is pre-trained on a
large dataset of facial images and can accurately predict age based on
facial features. The AgeNet model is available as a pre-trained Caffe
model file, age_net.caffemodel, which can be used for age estimation
tasks. The model has applications in a variety of fields and is a state-
of-the-art approach to age estimation.

5.12 NUMPY

NumPy, short for Numerical Python, is an open-source library for the Python programming language that provides support for large, multi-dimensional arrays and matrices, along with a vast library of mathematical functions to manipulate these arrays. NumPy is widely
used in the scientific community for data analysis, machine learning,
and scientific computing. In this essay, we will discuss the features
and applications of NumPy.

NumPy was created in 2005 by Travis Oliphant and is currently maintained by a large community of developers. The library is built on top of the C programming language and is optimised for performance,
making it one of the fastest numerical computing libraries available.
NumPy is also designed to integrate seamlessly with other libraries in
the scientific Python ecosystem, such as SciPy and Pandas.

One of the key features of NumPy is its support for multi-dimensional arrays. NumPy arrays are homogeneous, meaning they
contain elements of the same data type. This allows for efficient
storage and manipulation of large datasets. NumPy arrays can be
created from lists, tuples, and other arrays and can be manipulated
using a variety of mathematical and logical operations.

NumPy also provides a wide range of mathematical functions for manipulating arrays. These include functions for performing basic
arithmetic operations, such as addition and multiplication, as well as
more advanced functions for performing linear algebra, Fourier
transforms, and statistical analysis. NumPy also provides tools for
indexing and slicing arrays, allowing for efficient access to specific
elements or subsets of an array.
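
The snippet below illustrates these core features (array creation, element-wise arithmetic, aggregation, and slicing):

import numpy as np

# Create a homogeneous 2-D array from nested lists.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Element-wise arithmetic and aggregate functions.
b = a * 2.0 + 1.0
print(b.mean(), b.sum(axis=0))

# Indexing and slicing select specific elements or subsets.
first_row = a[0]      # array([1., 2., 3.])
last_col = a[:, -1]   # array([3., 6.])

# A taste of linear algebra: matrix product with the transpose.
print(a @ a.T)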

In addition to its support for arrays, NumPy also provides tools for reading and writing data to and from disk. NumPy can read and
write a variety of file formats, including CSV, HDF5, and binary
formats. This makes it easy to work with data stored in different
formats and to share data between different applications.
NumPy is widely used in the scientific community for a variety of applications. In data analysis, NumPy is used to manipulate and
analyze large datasets, perform statistical analysis, and create
visualizations. In machine learning, NumPy is used to preprocess
data, train and evaluate models, and perform predictions. In scientific
computing, NumPy is used to solve differential equations, perform
numerical simulations, and model physical systems.

NumPy is a powerful and versatile library for numerical computing in Python. Its support for multi-dimensional arrays,
mathematical functions, and data I/O make it an essential tool for
scientific computing and data analysis. With its efficient performance
and seamless integration with other Python libraries, NumPy has
become a standard tool in the scientific Python ecosystem.

5.13 CNN VS DNN

Convolutional neural networks (CNNs) and deep neural networks (DNNs) are two families of machine learning models that have
revolutionized the field of computer vision. CNNs and DNNs are both
capable of learning from vast amounts of data and making predictions
with high accuracy. However, there are fundamental differences
between these two types of neural networks, which we will explore in
detail in this essay.

First, let's define what CNNs and DNNs are. A convolutional neural network is a type of neural network designed for image
recognition and classification. The network consists of multiple layers,
including convolutional layers, pooling layers, and fully connected
layers. The convolutional layers are responsible for identifying features
in the image, while the pooling layers help reduce the dimensionality
of the feature maps. The fully connected layers then perform the
classification task.
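
To make these building blocks concrete, here is a toy NumPy sketch of the convolution and pooling operations described above (illustrative only, with a hypothetical 2x2 edge kernel):

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take the dot product at
    # every position (no padding, stride 1).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Non-linear down-sampling: keep the maximum of each block.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)                   # toy grayscale input
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # hypothetical edge filter
features = max_pool(conv2d(image, kernel))
print(features.shape)                          # (3, 3)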
On the other hand, a deep neural network is a general term for any neural network with multiple hidden layers. A DNN can be used
for a wide range of applications, including image recognition, speech
recognition, natural language processing, and more.

The main difference between CNNs and DNNs is their architecture. CNNs are specifically designed for image recognition and
classification tasks, whereas DNNs can be used for a wide range of
applications. CNNs use convolutional layers to extract features from
the input image, while DNNs use a combination of fully connected
layers, convolutional layers, and other types of layers to extract
features from the input data.

Another significant difference between CNNs and DNNs is their training process. CNNs are typically trained using large datasets of
images, with the goal of learning the underlying patterns and features
that are common across all images in the dataset. This training
process is often referred to as supervised learning, as the network is
provided with labeled examples of images and their corresponding
labels. The network learns to map the input image to the correct label
by adjusting the weights of its neurons during the training process.

In contrast, DNNs can be trained using a variety of methods, including supervised learning, unsupervised learning, and
reinforcement learning. In supervised learning, the network is
provided with labeled examples of input data and their corresponding
labels, and it learns to map the input data to the correct label. In
unsupervised learning, the network is provided with unlabeled
examples of input data and learns to extract the underlying patterns
and features from the data. In reinforcement learning, the network
learns to make decisions based on feedback from the environment.

Another key difference between CNNs and DNNs is their computational complexity. CNNs typically have lower computational complexity than DNNs, as they are designed to process images efficiently. This makes CNNs well-suited for applications that require real-time processing, such as self-driving cars, robotics, and other autonomous systems.

DNNs, on the other hand, are more computationally intensive and require more processing power than CNNs. This makes DNNs
better suited for applications that require higher accuracy, such as
speech recognition, natural language processing, and other complex
tasks.

In summary, the main difference between CNNs and DNNs lies in their architecture, training process, and computational complexity.
CNNs are specifically designed for image recognition and classification
tasks and use convolutional layers to extract features from the input
image. DNNs, on the other hand, can be used for a wide range of
applications and use a combination of fully connected layers,
convolutional layers, and other types of layers to extract features from
the input data. CNNs are typically trained using supervised learning,
while DNNs can be trained using a variety of methods. Finally, CNNs have a lower computational complexity than DNNs and are better suited for applications that require real-time processing, while DNNs are better suited for complex tasks that demand higher accuracy.

CHAPTER 6
IMPLEMENTATION

6.1 MODULES:

 User
 Application
 Face detection with Haar cascades
 Gender Recognition with CNN
 Age Recognition with CNN

MODULES DESCRIPTION:

 User
The human who provides a frame or video for gender detection and age prediction.

 APPLICATION

The application uses a CNN pipeline: it first detects faces in each frame. Once faces are found, their features are extracted and the gender is determined by the gender network. Next, the age network assigns each face to one of the age intervals used by the model [(0-2), (4-6), (8-12), (15-20), (21-24), (25-32), (38-43), (48-53), (60-100)], and the result is shown to the user.

 Face detection with Haar cascades

OpenCV provides convenient methods to import Haar cascades for detecting faces in images. Face detection involves detecting the presence and location of a face, but not identifying the individual. OpenCV's Haar cascade XML file serves as a classifier to identify specific objects from a webcam stream. The "haarcascade_frontalface_default.xml" provided by OpenCV can
be used to recognize frontal faces. By connecting to a webcam,
users can scan their faces for classification based on age,
gender, and emotion by extracting 128-d feature vectors (called
"embeddings") that quantify each face in the image.

 Gender Recognition with CNN

OpenCV's "dnn" package includes a class called "Net" that


allows users to build a neural network. This package also
supports importing pre-trained neural network models from
popular deep learning frameworks like Caffe, TensorFlow, and
Torch. In this case, we will be using the Cafe Importer to import
a pre-trained CNN model.

 Age Recognition with CNN

This is almost similar to the gender detection part, except that the corresponding prototxt file and caffemodel file are "deploy_age.prototxt" and "age_net.caffemodel". Furthermore, the CNN's output layer (probability layer) in this CNN consists of 8 values for 8 age classes ("0-2", "4-6", "8-13", "15-20", "25-32", "38-43", "48-53" and "60-100").

1. prototxt — The definition of the CNN goes in here. This file defines the layers in the neural network and each layer's inputs, outputs, and functionality.
2. caffemodel — This contains the learned parameters of the trained neural network (the trained model).

CHAPTER 7
RESULT

Fig 7: ACCURACY AND GENDER DETECTION IN WEBCAM GROUP PHOTO

Fig 8: ACCURACY AND GENDER DETECTION IN WEBCAM SINGLE PHOTO

Fig 9: ACCURACY AND GENDER DETECTION IMAGE IN GROUP PHOTO

Fig 10: ACCURACY AND GENDER DETECTION IMAGE IN SINGLE PHOTO

No. of people        Gender detected     Accuracy
7 people, all male   Male                73.21%
3 people, all male   Male                Approx. 85%
1 person             Male                87%

Table 1: Gender Analysis

No. of people        Age detected        Accuracy
7 people, all male   Between (20-49)     59%
3 people, all male   Between (21-24)     81%
1 person             Between (24-25)     87%

Table 2: Age Analysis

The given code is a Python script for predicting the age and gender of faces in images or from a live webcam feed using pre-trained deep learning models. Here's an explanation of the code:

The code begins by importing the necessary libraries: `cv2` for computer vision tasks, `numpy` for numerical operations, and `argparse` for command-line argument parsing.

The script defines various constants:
- `FACE_PROTO` and `FACE_MODEL`: Paths to the prototxt file and pre-trained model file for face detection.
- `GENDER_PROTO` and `GENDER_MODEL`: Paths to the prototxt file and pre-trained model file for gender classification.
- `AGE_PROTO` and `AGE_MODEL`: Paths to the prototxt file and pre-trained model file for age estimation.
- `MODEL_MEAN_VALUES`: Mean values used for preprocessing the input image.
- `GENDER_LIST`: List of gender labels ('Male' and 'Female').
- `AGE_INTERVALS`: List of age intervals.

The script defines the `frame_width` and `frame_height` variables, which specify the desired width and height of the processed frames.

The code uses OpenCV's `dnn` module to load the pre-trained models into memory using the `cv2.dnn.readNetFromCaffe()` function. The face detection, gender classification, and age estimation models are loaded separately.

The `get_faces()` function takes an input frame and performs face detection using the face detection model. It returns a list of bounding box coordinates for detected faces that meet a given confidence threshold.

The `display_img()` function displays an image in a window using OpenCV's `imshow()` function. It waits for a key press and then closes the window using `destroyAllWindows()`.

The `image_resize()` function resizes an input image to the specified width and height while maintaining the aspect ratio. It uses OpenCV's `resize()` function.

The `get_gender_predictions()` function takes a face image as input and performs gender classification using the gender classification model. It returns the gender predictions.

The `get_age_predictions()` function takes a face image as input and performs age estimation using the age estimation model. It returns the age predictions.

The `predict_age_and_gender()` function is the main function that performs age and gender prediction. It takes an optional `input_path` parameter, which specifies the path to an input image file. If the `input_path` is provided, it reads the image from the file and performs the prediction on that image. Otherwise, it initializes the webcam and performs real-time prediction on the captured frames.

If an `input_path` is provided, the function reads the image from the file and resizes it if necessary. It then calls the `get_faces()` function to detect faces in the image. For each detected face, it extracts the face region, performs age and gender predictions using the `get_age_predictions()` and `get_gender_predictions()` functions, and retrieves the predicted age and gender labels and confidence scores. It draws a rectangle around the face and overlays the predicted age and gender labels on the image. Finally, it displays the annotated image using the `display_img()` function and saves it as "output.jpg".

If no `input_path` is provided, the function initializes the webcam using `cv2.VideoCapture()` and starts a loop to capture frames from the webcam. It performs the same steps as described above for each captured frame. The annotated frames are displayed in a window until the user presses the 'q' key to quit.

The `__name__ == "__main__"` block at the end of the script checks if the script is being run directly and not imported as a module. If command-line arguments are provided, the first argument is assumed to be the input image file path, which is passed to the `predict_age_and_gender()` function. Otherwise, the function is called without any arguments, and it performs real-time prediction from the webcam.

Overall, the code uses pre-trained deep learning models to detect faces, estimate age, and classify gender in images or from a live webcam feed. It provides a simple interface for age and gender prediction tasks.

CHAPTER 8
ADVANTAGES

1. Personalization: Age and gender are two key demographic factors that can be used to personalize content and the user
experience. With age and gender prediction models, companies
can tailor their offerings to individual users based on their age
and gender, which can lead to increased engagement and
loyalty.

2. Targeted advertising: Knowing a user's age and gender can help advertisers create more targeted and relevant ads. By using age
and gender prediction models, advertisers can ensure that their
ads are shown to the right demographic group, which can
increase the effectiveness of their campaigns.

3. Fraud prevention: Age and gender prediction models can be used to detect fraudulent activity. For example, if a user claims
to be a certain age or gender but the prediction model suggests
otherwise, this could be a red flag for fraudulent behavior.

4. Improved customer service: Age and gender prediction models can help companies provide more personalized and relevant
customer service. For example, if a customer contacts a
company for support, the company can use the age and gender
prediction model to understand the customer's demographic
profile and provide tailored support.

5. Better data analysis: Age and gender prediction models can help companies analyze their customer data more effectively.

CHAPTER 9
CONCLUSION

The increasing demand for applications such as visual surveillance, medical diagnosis, and marketing intelligence has highlighted the need for a robust and efficient methodology to predict age and gender from facial images. Age and gender are important facial attributes that have a significant impact on social interactions, making age and gender estimation from a single face image a crucial task for various intelligent applications such as access control, marketing intelligence, and visual surveillance. To achieve this, a system was developed that consists of seven stages: video capturing, frame selection, face detection, feature extraction, gender detection, age prediction, and result display. The CNN approach was used for age and gender estimation, and a pre-trained model was utilized to extract features from the image. In the results analysis, the accuracy of age and gender prediction was found to increase with the use of the Caffe model. The system was implemented in the Python language and allows for both real-time and static detection of faces. Further improvements in accuracy could be achieved by using a more complex CNN architecture and more reliable image processing approaches in the future.

CHAPTER 10
FUTURE ENHANCEMENT

For future work, we will look into a more complex CNN architecture and a more reliable image processing approach for estimating exact ages. This project can be applied to e-commerce customer analysis and crowd behavior analysis. With a change of dataset, the same model can be trained to predict other attributes, such as emotion or ethnicity. Age and gender classification can be used to predict age and gender in uncontrolled real-time situations such as train stations, banks, buses, and airports. For example, depending on the number of male and female passengers of each age group at a train station, toilets and restrooms can be built to facilitate transportation.

BIBLIOGRAPHY

[1] J. Lu, V. E. Liong, and J. Zhou, ''Cost-sensitive local binary feature learning for facial age estimation,'' IEEE Trans. Image Process., vol. 24, no. 12, pp. 5356–5368, Dec. 2015.
[2] Z. Niu, M. Zhou, X. Gao, and G. Hua. The Asian Face Age Dataset (AFAD). Accessed: Jun. 18, 2020. [Online]. Available: https://afadataset.github.io/
[3] Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua, ''Ordinal regression with multiple output CNN for age estimation,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 4920–4928.
[4] H. Noh, S. Hong, and B. Han, ''Learning deconvolution network for semantic segmentation,'' in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1520–1528.
[5] T. Qin, X.-D. Zhang, D.-S. Wang, T.-Y. Liu, W. Lai, and H. Li, ''Ranking with multiple hyperplanes,'' in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), 2007, pp. 279–286.
[6] R. Rothe, R. Timofte, and L. Van Gool, ''DEX: Deep EXpectation of apparent age from a single image,'' in Proc. IEEE Int. Conf. Comput. Vis. Workshop (ICCVW), Dec. 2015, pp. 10–15.
[7] R. Rothe, R. Timofte, and L. Van Gool, ''Deep expectation of real and apparent age from a single image without facial landmarks,'' Int. J. Comput. Vis., vol. 126, nos. 2–4, pp. 144–157, Apr. 2018.
[8] W. Samek, A. Binder, S. Lapuschkin, and K.-R. Müller, ''Understanding and comparing deep neural networks for age and gender classification,'' in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2017, pp. 1629–1638.
[9] C. Sammut and G. I. Webb, Eds., ''Mean absolute error,'' in Encyclopedia of Machine Learning. Boston, MA, USA: Springer, 2010, p. 652, doi: 10.1007/978-0-387-30164-8_525.
[10] A. V. Savchenko, ''Efficient facial representations for age, gender and identity recognition in organizing photo albums using multi-output ConvNet,'' PeerJ Comput. Sci., vol. 5, p. e197, Jun. 2019, doi: 10.7717/peerj-cs.197.
[11] P. Smith and C. Chen, ''Transfer learning with deep CNNs for gender recognition and age estimation,'' in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2018, pp. 2564–2571.
[12] N. Srinivas, H. Atwal, D. C. Rose, G. Mahalingam, K. Ricanek, and D. S. Bolme, ''Age, gender, and fine-grained ethnicity prediction using convolutional neural networks for the East Asian face dataset,'' in Proc. 12th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2017, pp. 953–960, doi: 10.1109/FG.2017.118.
[13] R. K. Srivastava, K. Greff, and J. Schmidhuber, ''Training very deep networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 2377–2385.
[14] P. Viola and M. J. Jones, ''Robust real-time face detection,'' Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, May 2004.
[15] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, ''Residual attention network for image classification,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3156–3164.
[16] X. Wang, R. Guo, and C. Kambhamettu, ''Deeply-learned feature for age estimation,'' in Proc. IEEE Winter Conf. Appl. Comput. Vis., Jan. 2015, pp. 534–541.
[17] M. Xia, X. Zhang, L. Weng, and Y. Xu, ''Multi-stage feature constraints learning for age estimation,'' IEEE Trans. Inf. Forensics Security, vol. 15, pp. 2417–2428, 2020, doi: 10.1109/TIFS.2020.2969552.
[18] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, ''Aggregated residual transformations for deep neural networks,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1492–1500.
[19] K. Zhang, C. Gao, L. Guo, M. Sun, X. Yuan, T. X. Han, Z. Zhao, and B. Li, ''Age group and gender estimation in the wild with deep RoR architecture,'' IEEE Access, vol. 5, pp. 22492–22503, 2017, doi: 10.1109/ACCESS.2017.2761849.
[20] S. Yan, H. Wang, Y. Fu, J. Yan, X. Tang, and T. S. Huang, ''Synchronized submanifold embedding for person-independent pose estimation and beyond,'' IEEE Trans. Image Process., vol. 18, no. 1, pp. 202–210, Jan. 2009.
[21] S. Yan, H. Wang, T. S. Huang, Q. Yang, and X. Tang, ''Ranking with uncertain labels,'' in Proc. IEEE Multimedia Expo Int. Conf., Jul. 2007, pp. 96–99.
[22] S. Yan, H. Wang, X. Tang, and T. S. Huang, ''Learning auto-structured regressor from uncertain nonnegative labels,'' in Proc. IEEE 11th Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[23] P. Yang, L. Zhong, and D. Metaxas, ''Ranking model for facial age estimation,'' in Proc. 20th Int. Conf. Pattern Recognit., Aug. 2010, pp. 3404–3407.

APPENDIX
import cv2
import numpy as np

MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)

GENDER_LIST = ['Male', 'Female']

# Architecture definitions (prototxt) and trained weights (caffemodel).
GENDER_PROTO = 'deploy_gender.prototxt'
GENDER_MODEL = 'gender_net.caffemodel'

FACE_PROTO = "deploy.prototxt"
FACE_MODEL = "res10_300x300_ssd_iter_140000_fp16.caffemodel"

AGE_PROTO = 'deploy_age.prototxt'
AGE_MODEL = 'age_net.caffemodel'

AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)', '(21, 24)',
                 '(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']

frame_width = 1280
frame_height = 720

# Load the three networks: face detection, age estimation, and
# gender classification.
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
age_net = cv2.dnn.readNetFromCaffe(AGE_PROTO, AGE_MODEL)
gender_net = cv2.dnn.readNetFromCaffe(GENDER_PROTO, GENDER_MODEL)


def get_faces(frame, confidence_threshold=0.5):
    """Return bounding boxes of the faces detected in a frame."""
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
    face_net.setInput(blob)
    output = np.squeeze(face_net.forward())
    faces = []
    for i in range(output.shape[0]):
        confidence = output[i, 2]
        if confidence > confidence_threshold:
            # Scale the fractional box corners to pixel coordinates.
            box = output[i, 3:7] * \
                np.array([frame.shape[1], frame.shape[0],
                          frame.shape[1], frame.shape[0]])
            start_x, start_y, end_x, end_y = box.astype(int)
            # Widen the box a little and clamp it to the frame.
            start_x, start_y, end_x, end_y = start_x - \
                10, start_y - 10, end_x + 10, end_y + 10
            start_x = 0 if start_x < 0 else start_x
            start_y = 0 if start_y < 0 else start_y
            end_x = 0 if end_x < 0 else end_x
            end_y = 0 if end_y < 0 else end_y
            # append to our list
            faces.append((start_x, start_y, end_x, end_y))
    return faces


def image_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    """Resize to the given width or height, keeping the aspect ratio."""
    dim = None
    (h, w) = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the dimensions
        r = width / float(w)
        dim = (width, int(h * r))
    # resize the image
    return cv2.resize(image, dim, interpolation=inter)


def get_gender_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
    )
    gender_net.setInput(blob)
    return gender_net.forward()


def get_age_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False
    )
    age_net.setInput(blob)
    return age_net.forward()


def predict_age_and_gender():
    """Predict the age and gender of the faces showing in the webcam feed."""
    # create a new cam object
    cap = cv2.VideoCapture(0)
    while True:
        _, img = cap.read()
        # Take a copy of the initial image and resize it
        frame = img.copy()
        # resize if wider than frame_width
        if frame.shape[1] > frame_width:
            frame = image_resize(frame, width=frame_width)
        # detect the faces
        faces = get_faces(frame)
        # Loop over the faces detected
        for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
            face_img = frame[start_y:end_y, start_x:end_x]
            # predict age
            age_preds = get_age_predictions(face_img)
            # predict gender
            gender_preds = get_gender_predictions(face_img)
            gender_idx = gender_preds[0].argmax()
            gender = GENDER_LIST[gender_idx]
            gender_confidence_score = gender_preds[0][gender_idx]
            age_idx = age_preds[0].argmax()
            age = AGE_INTERVALS[age_idx]
            age_confidence_score = age_preds[0][age_idx]
            # Draw the box and compose the label
            label = f"{gender}-{gender_confidence_score*100:.1f}%, " \
                    f"{age}-{age_confidence_score*100:.1f}%"
            yPos = start_y - 15
            while yPos < 15:
                yPos += 15
            box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
            cv2.rectangle(frame, (start_x, start_y), (end_x, end_y),
                          box_color, 2)
            # Label processed image
            cv2.putText(frame, label, (start_x, yPos),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.54, box_color, 2)
        # Display processed image
        cv2.imshow("Gender Estimator", frame)
        if cv2.waitKey(1) == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    predict_age_and_gender()
