Assignment 2
Introduction:
This assignment is about gender classification and age estimation. Since the rise of social platforms and
social media, age estimation and gender classification have become relevant to an increasing number of
applications. Estimating age and gender from a facial image is an important task in
intelligent applications such as access control, human-computer interaction, law enforcement,
marketing intelligence, and visual surveillance [1]. This assignment uses a labeled dataset of 5000
images of both men and women, with a maximum age of 116. It requires
training a convolutional neural network (CNN) model on the provided labeled dataset. After the
model has been trained, it is evaluated on unseen data.
I have used one convolutional neural network model with one input, the image of a person, and
two outputs, the age and gender of that person. The size of the input image is fixed to 128*128. I have used six
convolutional layers: the input layer and five further layers, each with kernel size 3 and the activation
function ‘relu’. The activation function in a neural network converts the node's summed
weighted input into the node's activation, or output, for that input.
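To make the idea of a kernel-size-3 layer with ‘relu’ concrete, here is a minimal sketch in plain Python. It is not the code used in this assignment: the function name conv2d_relu, the absence of padding, the single channel, and the example kernel are all assumptions chosen for illustration.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide a square kernel over a single-channel image (no padding), then apply ReLU.

    Like deep-learning frameworks, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Summed weighted input of this node: bias plus pixel * weight over the window.
            s = bias
            for di in range(k):
                for dj in range(k):
                    s += image[i + di][j + dj] * kernel[di][dj]
            # ReLU: keep positive sums, replace the rest with zero.
            row.append(max(0.0, s))
        output.append(row)
    return output


# Hypothetical 4x4 image with a dark-to-bright vertical edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_relu(image, kernel)
```

Under these assumptions a 128*128 input gives a 126*126 feature map, because a 3*3 window without padding fits 126 times along each side.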
The rectified linear activation function, abbreviated ReLU, is a piecewise linear function that outputs the
input directly if it is positive; otherwise, it outputs zero. It has become the default activation function for
many types of neural networks because it is easier to train and often results in better performance [2]. Here
ReLU is used in the convolutional layers, while the two output layers (age and gender) use other activations: one
output layer uses sigmoid for age estimation and the other uses softmax for gender classification. The
sigmoid function is used primarily because its output lies between 0 and 1. As a result, it is particularly useful
for models that predict a probability as an output; for age estimation it requires the age labels to be
scaled to the same range. In neural network models that predict a
multinomial probability distribution, the softmax function is used as the activation function in the
output layer. That is, softmax is used as the activation function in classification problems
where the output is a probability for each of two or more class labels.
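The three activation functions discussed above can be written in a few lines of plain Python. This is a sketch for illustration only, not the assignment's code. The helper age_from_sigmoid and the idea of scaling ages by the maximum age of 116 are assumptions about how a sigmoid output could be mapped back to years.

```python
import math


def relu(x):
    """Return x if it is positive, otherwise zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real number into the interval (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


MAX_AGE = 116  # maximum age in the dataset, as stated in the introduction


def age_from_sigmoid(p):
    """Assumption: ages were scaled to [0, 1] by dividing by MAX_AGE before training."""
    return p * MAX_AGE
```

For example, softmax([2.0, 1.0]) gives two gender probabilities that sum to 1, and a sigmoid output of 0.5 would correspond to an age of 58 under the scaling assumed here.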
For training, the data is split 70/30: seventy percent of the data is used to train the CNN model and 30
percent is used to test it. The RMSprop optimizer is used because it scales each update by a running
average of recent gradient magnitudes, which shrinks the step for large gradients, so that updates do not
explode, and enlarges it for small gradients. The metric used for age
estimation is ‘mae’ and for gender classification ‘accuracy’. MAE (mean absolute error) is used because it
averages the absolute difference between the real value and the predicted value. The
model is trained for 10 epochs, which took 80 minutes on the dataset of 5000 images.
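The split, the optimizer step, and the age metric described above can be sketched in plain Python as follows. The function names, the fixed random seed, and the hyperparameter values (lr, rho, eps) are assumptions chosen for illustration; they are not taken from the assignment's code.

```python
import math
import random


def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and split it 70% train / 30% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


def rmsprop_step(weight, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """Apply one RMSprop update to a single scalar weight.

    avg_sq is the running average of squared gradients. Dividing by its
    square root shrinks steps for large gradients and enlarges them for
    small ones.
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    weight = weight - lr * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between real and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = split_70_30(range(5000))           # 5000 images, as in the text
w_large, _ = rmsprop_step(0.0, 100.0, 0.0)       # very large gradient
w_small, _ = rmsprop_step(0.0, 0.01, 0.0)        # very small gradient
age_mae = mean_absolute_error([25, 40], [30, 38])  # hypothetical ages in years
```

In this sketch the two gradients differ by a factor of 10,000, yet the resulting weight updates are almost the same size, which is the balancing effect described above. The example MAE is 3.5 years.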
The following learning curves show the loss and accuracy obtained during training and validation.
The figure shows the training and validation accuracy and loss for gender classification and age estimation.
The training loss decreases almost linearly to nearly zero, while the validation loss shows an irregular
trend. The gender training and validation accuracy increase with the epochs. The age training accuracy
decreases with increasing epochs and the loss shows the reverse trend. This may be due to the limited
amount of training and validation data.
Pre-trained CNN:
Because the UTKFace dataset used here is too small to capture the complexities of age and gender estimation, I
concentrated on transfer learning. I have used VGG16, which is a convolutional
neural network model for image recognition. On the ImageNet classification benchmark, which has 1000
classes and is drawn from a database of over 14 million images, the model achieves 92.7 percent top-5 test
accuracy. It comes with pre-trained weights; I have reused these weights so that what the model learned on
ImageNet is transferred to this task. A dense layer with 512 hidden
units and ReLU activation is added. The final layer uses a sigmoid activation function because
we want a classification as output. Age and gender estimation is treated here as a classification problem, so
the binary_crossentropy loss function is used; it computes the loss between the actual labels and the
predicted probabilities. The model has produced the following results.
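As an illustration of what the binary_crossentropy loss measures, here is a minimal plain-Python sketch. It is not the library implementation: the function below and its clipping constant eps are assumptions made for this example.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]  # hypothetical binary labels for two images
confident = binary_crossentropy(labels, [0.9, 0.1])  # good, confident predictions
unsure = binary_crossentropy(labels, [0.6, 0.4])     # correct but hesitant predictions
```

The loss is lower for the confident correct predictions than for the hesitant ones, and a prediction of 0.5 for every image gives a loss of log(2), the value for random guessing.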
As with the first model, the RMSprop optimizer is used, with ‘mae’ as the metric for age estimation and
‘accuracy’ for gender classification. The model is trained for 10 epochs,
which took 60 minutes. This fine-tuned model gives a large improvement in results.
The gender output accuracy improved from 85 percent to about 97 percent, and the age output accuracy
improved from 80 percent to 89 percent. This is because VGG16 was pre-trained on the much larger and more
varied ImageNet dataset, on which it achieved about 93 percent top-5 accuracy. Moving from a CNN trained
only on the 5000-image dataset to a CNN with weights pre-trained on ImageNet resulted in this improvement.
Whether fine-tuning improves the classification accuracy over just using transfer learning depends on
the pre-trained model, the transfer layer you choose, your dataset, and how you train the new model.
Fine-tuning may result in improved performance, or it may result in worse performance if the fine-tuned
model overfits your training data. In this age and gender classification problem, fine-tuning has proved to
be beneficial and produced excellent results [3]. The VGG16 model was trained on the
ImageNet dataset, which may have contained many images of people. The lower layers of a
convolutional neural network can recognize many different shapes or features in an image. It is the last
few fully-connected layers that combine these features to classify gender and age.
References:
[1] Age and Gender Prediction using Convolutional Neural Network. Available at:
https://www.ukessays.com/assignments/predicting-age-and-gender-from-facial-images-4366.php
[2] Jason Brownlee, A Gentle Introduction to the Rectified Linear Unit (ReLU), January 9, 2019, in
Deep Learning Performance. Available at:
https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/