Neural Network
Arjun Singh¹, Nishant Rai², Prateek Sharma³,*, Preeti Nagrath⁴ and Rachna Jain⁵
¹⁻⁵ Computer Science and Engineering Department, Bharati Vidyapeeth College of Engineering, New Delhi
¹[email protected], ²[email protected], ³[email protected], ⁴[email protected], ⁵[email protected]
Abstract: Automatic age and gender prediction from face images has lately attracted a great deal of interest and has applications in a variety of facial analyses. With this technology we can identify a person's age and gender from just a quick glance at a camera feed, photograph, or video. This research paper presents a convolutional neural network (CNN) that uses deep learning, along with the methods and algorithms involved and how everything works together to classify gender and age. The paper also highlights the technology's importance and how it may be used to enhance our daily lives. For age-gender classification, the dataset was obtained from IMDB-WIKI, and for emotion detection, from Kaggle's FER-2013 dataset. Two models are used in this design: one is trained to predict age and gender using a wide ResNet framework, while the other is trained to recognise emotions using a traditional CNN architecture. Compared to classifier-based approaches, our technique exhibits higher classification accuracy for both age and gender. The paper also demonstrates how this technology may be utilised to our advantage and examines the wide range of businesses where it might be put to use, including matrimonial sites, CCTV cameras, and government organisations.
1 Introduction
Deep learning introduces the idea of end-to-end learning, in which the system is instructed on what to search for with respect to each particular object class, and discovers for itself the features and characteristics that are most important to each class. To put it another way, neural networks are given the task of identifying the underlying patterns present in collections of photographs [3].
One of the many approaches that may be used to determine a person's age and gender, as well as their emotions, is the use of deep convolutional neural networks for emotion identification: a trained neural network can distinguish emotions on a human face by analysing an input video stream in real time [4].
In the past, methods for estimating or identifying these characteristics from face photos depended on differences in the dimensions of facial features [5] or on "tailored" face descriptors. Most have used classification systems developed specifically for age or gender estimation tasks [6].
The requirements of the module in the process of recognising age, gender, and emotion include:
• Training the networks requires recent, large, and diverse datasets.
• Training requires high computational power due to the large size of the dataset and the high number of parameters of the CNN models.
• Post-deployment results will only be near real time, due to the low FPS of webcams.
A variety of challenges must often be overcome when attempting to determine a person's age and gender, as well as their emotions. According to the research done so far, ageing significantly impacts the ability to recognise facial expressions. In addition, gender is a significant factor in the identification of feelings: emotion identification performs better on female faces than on male faces [3].
The system we propose uses a hierarchical method to determine the age, gender, and state of mind of the individual standing in front of the camera. The approach uses two separate modules. The first, the "emotion" module, receives the recognised facial features as input and produces an emotion label; its output is then used as an input to the age-gender module. The output of the age-gender module is the ultimate integrated result of both modules, which displays the age, gender, and emotion for the input data. The emotions recognised are the following (a small label-mapping sketch is given after the list):
i. Anger
ii. Disgust
iii. Happy
iv. Surprise
v. Fear
vi. Neutral
vii. Sad
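For concreteness, a minimal Python sketch of this label set follows. The 0-6 index ordering is an assumption taken from the conventional FER-2013 class layout; the paper itself does not state its index mapping.

```python
# Emotion labels recognised by the system. The 0-6 ordering shown here
# follows the conventional FER-2013 class layout and is an assumption,
# not something stated in the paper.
EMOTION_LABELS = {
    0: "Anger",
    1: "Disgust",
    2: "Fear",
    3: "Happy",
    4: "Sad",
    5: "Surprise",
    6: "Neutral",
}

def decode_emotion(class_index: int) -> str:
    """Map a predicted class index to its human-readable emotion name."""
    return EMOTION_LABELS[class_index]
```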
Two datasets are used: the IMDB-WIKI dataset for age and gender detection, and FER-2013 for emotion detection.
In this research paper, we study age, gender, and emotion prediction from face images and suggest an effective method and a significantly optimised neural architecture for this purpose.
The first section is the introduction, which gives a short overview of the project's domain and the various methods used in its execution. The second section contains the literature survey, which covers previous research related to this project. The third section describes deep learning and the architectures used. The fourth section describes the CNN, which is used for image classification and recognition. The fifth section presents our proposed system along with the architecture diagram, which explains how the project works. The sixth section covers the experiments, showing the setup required for running the module and the various datasets used. The seventh section discusses the results obtained after the experiments were conducted, analysing them together with plots of accuracy and loss versus the number of epochs. The eighth section states the conclusion of the project, and the final section lists the research consulted while preparing it.
2 Literature Survey
Before describing the proposed method, we briefly review related methods for emotion, age, and gender classification and provide a cursory overview of deep convolutional networks.
2.1 Emotion, Age and Gender Classification
We began our background search with research papers and blogs posted online related to our topic. The relevant studies are summarised below.
Emotion Classification:
A study on facial emotion recognition in [7] describes the characteristics of the dataset used and the classifiers applied to it. To further analyse strategies for emotion identification, [8] examines the visual characteristics of the picture and discusses several classifier algorithms. The study in [9] investigated the use of several classes of classifiers to predict responses to pictures based on the identification of emotions.
Using filter banks and a deep CNN [10], which achieves a high accuracy rate, emotions can be identified from face photos. From this we may deduce that deep learning can also be utilised to detect emotions. Additionally, picture spectrograms with deep convolutional networks, as demonstrated in [11], may be used to recognise facial expressions.
Using several classifiers, including KNN, HMM, GMM, and SVM, different picture types and emotions were investigated for deducing expressions from faces [12]. The article [13] discusses how to acquire important features for facial emotion identification, including salient discriminative feature analysis, local invariant feature learning, and support vector machine training. Convolutional neural networks are used to analyse and train on a variety of key features to identify emotions, with data taken from a variety of emotional databases, including SAVEE, Emo-DB, DES, and MES.
Age Classification:
Numerous approaches have been proposed in recent years to solve the issue of automatically extracting age-related information from face photos. A thorough analysis of these techniques may be found in [14] and, more recently, in [15]. The techniques surveyed below may be used for either purpose, even though our emphasis is on age-group categorisation rather than accurate age estimation (i.e., age regression).
Early age-prediction techniques are founded on calculating ratios between various measurements of facial traits [16]. After localising facial features (such as the eyes, nose, mouth, and chin) and measuring their sizes and distances, proportions between them are computed and used to categorise the face into age groups in accordance with hand-crafted rules. A more recent study [17] models age progression in people under the age of 18 using a similar methodology. These techniques require accurate localisation of facial features, a difficult challenge in and of itself, and are therefore unsuitable for the in-the-wild photographs one could anticipate encountering on social media sites.
Gender Classification:
A detailed analysis of gender-classification techniques can be found in [18] and, more recently, in [19]. Here we briefly review the key techniques. One of the earliest approaches to gender classification [20] was a neural network trained on a limited collection of near-frontal facial pictures. The authors of [21] combined picture intensities with the 3D anatomy of the skull (obtained with a laser scanner) to determine gender. SVM classifiers applied directly to picture intensities were employed in [22]. Instead of SVM, [23] employed AdaBoost for the same purpose, again on picture intensities. Finally, [24] offered viewpoint-invariant age and gender categorisation.
3 Deep Learning
The use of deep learning involves the construction of artificial neural networks with numerous layers of linked
nodes. These layers are structured in a hierarchy, with the lower layers usually learning basic characteristics such
as edges and forms, and the higher layers learning more complicated functions that are combinations of the
features learned at the lower levels of the hierarchy. The neural network will make adjustments to the weights and
biases of the connections between nodes as part of the training process. This is done with the objective of reducing
error and increasing the precision of the network's predictions.
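The weight-and-bias adjustment described above can be illustrated with a single gradient-descent step for one linear node; this is a toy sketch, not any particular framework's training loop.

```python
import numpy as np

def sgd_step(w, b, x, target, lr=0.01):
    """One gradient-descent update for a single linear node: nudge the
    weights and bias in the direction that reduces squared error."""
    pred = np.dot(w, x) + b   # the node's current prediction
    error = pred - target     # how far off the prediction is
    w -= lr * error * x       # adjust the weights ...
    b -= lr * error           # ... and the bias to shrink the error
    return w, b

# Example: repeated updates drive the prediction toward the target.
w, b = np.zeros(3), 0.0
for _ in range(100):
    w, b = sgd_step(w, b, np.array([1.0, 2.0, 3.0]), target=5.0)
```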
Deep learning has been shown to be effective in a wide variety of applications, including image and sound recognition, natural language processing, and even the playing of games like chess and Go. Anomaly detection, medical diagnosis, and the search for new drugs are some of its other applications. One of the primary benefits of deep learning is that it can learn from unstructured or unlabelled data, because it recognises patterns and characteristics in the data on its own. This makes it an effective tool for dealing with datasets that are both huge and complicated.
4 CNN
Convolutional neural networks (CNNs) are a kind of neural network whose architecture makes them well suited to image-classification tasks. They were inspired by the structure of the visual cortex in the brain, which processes visual information by segmenting it into smaller pieces and analysing each component on its own. One recent and noteworthy example is the application of CNNs to the difficult task of picture classification on the ImageNet benchmark [26].
The input layer of a convolutional neural network (CNN) is a matrix of pixel values representing a picture. This matrix is passed through one or more convolutional layers, each of which applies a set of filters (also known as kernels or weights) to the input in order to derive features from it. Each filter is a tiny matrix of weights used to identify a particular feature in the input, such as an edge or a corner. To apply the filters to the input, one slides them over the input matrix and carries out a dot product at each position.
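As a concrete illustration of this sliding dot product, the following minimal NumPy sketch (not the paper's implementation) applies one filter to a single-channel image; like most CNN libraries, it computes the cross-correlation form of convolution.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a filter over the input and take a dot product at each
    position ('valid' convolution, stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # dot product
    return feature_map

# A simple vertical-edge filter (Sobel-like weights).
edge_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]])
```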
The output of the convolutional layer is a new matrix, referred to as the feature map, comprising the values produced by the filters at each position in the input. The feature map is then processed by a non-linear activation function, typically the rectified linear unit (ReLU), which introduces non-linearity into the model and makes it possible to learn more complicated patterns in the data.
After the activation function, the output is often sent to a pooling layer, which applies a down-sampling operation to the feature map in order to make it more compact. This lowers the computational cost of the model and makes it more resilient to small shifts in the input.
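The activation and pooling steps can be sketched in the same illustrative NumPy style, assuming a 2x2 pool with stride 2:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit: max(0, x)."""
    return np.maximum(0, x)

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Downsample by taking the maximum of each non-overlapping
    2x2 block, halving both spatial dimensions."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2            # trim odd edges if needed
    fm = feature_map[:h, :w]
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```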
After one or more convolutional and pooling layers, the output is typically routed through a fully-connected layer, which carries out a linear operation on the input and generates a set of output values. These output values are used to predict which of several categories the input picture belongs to.
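Putting the pieces together, a minimal Keras sketch of such a conv-pool-dense-softmax stack is given below, sized for 48x48 grayscale FER-2013 faces. The layer counts and filter sizes are illustrative assumptions, not the paper's exact architecture.

```python
from tensorflow.keras import layers, models

# Illustrative conv -> ReLU -> pool -> dense -> softmax stack for
# 48x48 grayscale faces; layer sizes are assumptions, not the paper's.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # fully-connected layer
    layers.Dense(7, activation="softmax"),  # one output per emotion class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```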
CNNs have proven effective in a broad variety of image-classification tasks, and their performance has reached state-of-the-art levels on many different benchmarks.
5 Proposed System
Using a webcam and OpenCV, the model first uses YOLO [5], a real-time object-detection tool, to identify the face of the subject in front of the camera. Each frame is then passed as input to the network design, which comprises several deep neural networks. The input picture first reaches a Conv2D layer, whose output is fed into a MaxPool2D layer used to resize the input and remove extraneous elements. Other hidden layers in between extract facial traits and carry those features forward. After taking the maximum values of the image's features, the MaxPool2D layer outputs to a softmax layer, which converts the result into a probability. The output of the emotion model is saved as an HDF5 file and given as input to the age-gender module. This input passes through several hidden layers that extract features from the face before reaching a GlobalAveragePooling2D layer, which computes the average of all values and passes its output to a softmax layer. This produces the final integrated result of both modules, displaying the age, gender, and emotion for the input data.
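A condensed sketch of this live pipeline is given below. It is illustrative only: the model file names and input sizes are assumptions, and a bundled OpenCV Haar cascade stands in for the YOLO face detector the paper uses, to keep the sketch self-contained.

```python
import cv2
from tensorflow.keras.models import load_model

# Stand-in face detector: the paper uses YOLO; a Haar cascade is used
# here only so the sketch runs without external model files.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Hypothetical file names for the two trained models (HDF5 format).
emotion_model = load_model("emotion_model.hdf5")
age_gender_model = load_model("age_gender_model.hdf5")

cap = cv2.VideoCapture(0)                      # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # Emotion module: 48x48 grayscale crop -> softmax probabilities.
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        emotion_probs = emotion_model.predict(face[None, :, :, None])
        # Age-gender module: here it receives the face crop directly;
        # in the paper the emotion output is forwarded to this module.
        crop = cv2.resize(frame[y:y + h, x:x + w], (64, 64)) / 255.0
        age_gender_out = age_gender_model.predict(crop[None])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, str(int(emotion_probs.argmax())), (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Live Prediction", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):      # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```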
A. Challenges
The difficulties encountered while working on this project were:
• Integrating the age, gender, and emotion modules was a demanding and difficult undertaking.
• Some emotions were mistaken for others during emotion identification; for example, disgust and anger can appear quite similar and may be confused with one another.
• The size of the dataset and the number of parameters of the CNN models demand substantial computational power for training.
6 Requirement Analysis
6.1 Software
• Python
• NumPy
• TensorFlow
• Matplotlib
6.2 Hardware
Intel Core processor with high GPU power and frequency
6.3 Dataset
The IMDB-WIKI collection includes more than 5,000 photos that may be found on Wikipedia and IMDb. The following categories are included in this dataset: 1) acquisition date, 2) gender, 3) date of birth (DOB), 4) face score (FS).
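A minimal sketch for reading this metadata with SciPy follows, assuming the field names published on the dataset page [1] for the WIKI portion ('dob' is a MATLAB serial date number and 'photo_taken' is the acquisition year):

```python
from datetime import datetime, timedelta
from scipy.io import loadmat

def matlab_datenum_to_year(datenum: float) -> int:
    """Convert a MATLAB serial date number to a calendar year. MATLAB
    counts days from year 0, Python from year 1, hence the 366-day
    offset."""
    day = (datetime.fromordinal(int(datenum))
           + timedelta(days=datenum % 1) - timedelta(days=366))
    return day.year

# 'wiki.mat' ships with the WIKI portion of the download; field names
# follow the dataset's published metadata description.
meta = loadmat("wiki.mat")["wiki"][0, 0]
dob = meta["dob"][0]                  # date of birth (MATLAB datenum)
photo_taken = meta["photo_taken"][0]  # acquisition year
gender = meta["gender"][0]            # 0 = female, 1 = male, NaN = unknown
face_score = meta["face_score"][0]    # detector confidence for the face

# Age label = acquisition year minus birth year.
ages = [int(year) - matlab_datenum_to_year(d)
        for d, year in zip(dob, photo_taken)]
```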
The FER-2013 dataset from Kaggle [2] is used for the purpose of emotion detection. This dataset comprises more than 35,000 face photos with seven categories of emotion: 1) Angry, 2) Disgust, 3) Fear, 4) Happy, 5) Sad, 6) Surprise, 7) Neutral.
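A minimal sketch for loading this dataset, assuming the standard fer2013.csv layout from the Kaggle download [2]:

```python
import numpy as np
import pandas as pd

# fer2013.csv has three columns: 'emotion' (class index 0-6),
# 'pixels' (2304 space-separated grayscale values forming a 48x48
# face) and 'Usage' (the train/test split).
df = pd.read_csv("fer2013.csv")
train = df[df["Usage"] == "Training"]

X = np.stack([np.array(row.split(), dtype=np.uint8).reshape(48, 48)
              for row in train["pixels"]])
y = train["emotion"].to_numpy()
print(X.shape, y.shape)  # (num_training_faces, 48, 48) and their labels
```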
7 Results
Results have been obtained on several publicly available datasets, i.e. FDDB, IMDB-WIKI, UTKFace, and FER-2013. The accuracy achieved was recorded and compared with various previously available methods.
[Figure: Live prediction]
References
[1] https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
[2] https://www.kaggle.com/deadskull7/fer2013
[3] J. N. Bailenson et al., "Real-time classification of evoked emotions using facial feature tracking and physiological responses," International Journal of Human-Computer Studies, 66(5):303–317, 2008.
[4] G. Levi and T. Hassner, "Age and gender classification using convolutional neural networks," in CVPR Workshops (CVPRW), 2015.
[5] Y. H. Kwon and N. da Vitoria Lobo, "Age classification from facial images," in Proc. Conf. Comput. Vision Pattern Recognition, pages 762–767. IEEE, 1994.
[6] J.-J. Ding, "Facial age estimation based on label-sensitive learning and age-oriented regression," Pattern Recognition, 46(3):628–641, 2013.
[7] G. Hinton, A. Graves, and A. Mohamed, "Emotion recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 6645–6649.
[8] K. Weint and C.-W. Huang, "Characterizing types of convolution in deep convolutional recurrent neural networks for robust speech emotion recognition," pp. 1–19, 2017.
[10] K.-Y. Hueng, C.-H. Wiu, T.-H. Yieng, M.-H. Sha, and J.-H. Chiu, "Emotion recognition using autoencoder bottleneck features and LSTM," in 2016 International Conference on Orange Technologies (ICOT), 2016, pp. 1–4.
[11] M. N. Stolar, M. Lech, R. S. Bolia, and M. Skinner, "Real time emotion recognition using RGB image classification and transfer learning," in 2017 11th Int. Conf. Signal Process. Commun. Syst., pp. 1–8, 2017.
[12] R. Ashrafidust, S. Setaeyeshi, and A. Shaarifi, "Recognizing emotional state changes using facial expression," in 2016 Eur. Model. Symp., pp. 41–46, 2016.
[13] C. Busso et al., "IEMOCAP: Interactive emotional dyadic motion capture database," Lang. Resour. Eval., vol. 42, no. 4, pp. 335–359, 2008.
[15] H. Han, C. Otto, and A. K. Jain, "Age estimation from face images: Human vs. machine performance," in International Conference on Biometrics (ICB). IEEE, 2013.
[16] Y. H. Kwon and N. da Vitoria Lobo, "Age classification from facial images," in Proc. Conf. Comput. Vision Pattern Recognition, pages 762–767. IEEE, 1994.
[17] N. Ramanathan and R. Chellappa, "Modeling age progression in young faces," in Proc. Conf. Comput. Vision Pattern Recognition, volume 1, pages 387–394. IEEE, 2006.
[18] E. Mäkinen and R. Raisamo, "Evaluation of gender classification methods with automatically detected and aligned faces," IEEE Trans. Pattern Anal. Mach. Intell., 30(3):541–547, 2008.
[19] A. J. O'Toole, T. Vetter, N. F. Troje, H. H. Bülthoff, et al., "Sex classification is better with three-dimensional head structure than with image intensity information," Perception, 26:75–84, 1997.
[20] B. A. Golomb, D. T. Lawrence, and T. J. Sejnowski, "SexNet: A neural network identifies sex from human faces," in Neural Inform. Process. Syst., pages 572–579, 1990.
[21] A. J. O'Toole, T. Vetter, N. F. Troje, H. H. Bülthoff, et al., "Sex classification is better with three-dimensional head structure than with image intensity information," Perception, 26:75–84, 1997.
[22] B. Moghaddam and M.-H. Yang, "Learning gender with support faces," IEEE Trans. Pattern Anal. Mach. Intell., 24(5):707–711, 2002.
[23] S. Baluja and H. A. Rowley, "Boosting sex identification performance," Int. J. Comput. Vision, 71(1):111–119, 2007.
[24] M. Toews and T. Arbel, "Detection, localization, and sex classification of faces from arbitrary viewpoints and under occlusion," IEEE Trans. Pattern Anal. Mach. Intell., 31(9):1567–1581, 2009.
[25] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Inform. Process. Syst., pages 1097–1105, 2012.