Methodology

Overview of the System

This research focuses on the development of an interactive display


system for custom content delivery. The system is designed to
process group images and extract key audience attributes, including
age, gender, emotion, and social relationships such as families and
couples. Using this extracted data, the system employs an expert
system to analyse the audience profile and generate
recommendations for the most suitable advertisement categories
from a predefined set of 11 categories.

To ensure a seamless user experience, a graphical user interface


(GUI) has been developed. The GUI allows users to upload group
images, visualize the extracted audience data, and display the
recommended advertisements. This integration of machine learning
models, expert systems, and a user-friendly interface ensures
accurate and efficient delivery of custom content based on audience
characteristics.

Dataset and Preprocessing

Dataset and Preprocessing for the Age Classification Model

Datasets

The model used the combined_faces dataset, which aggregates


images from diverse public datasets containing labelled face data.
Additionally, the training dataset was augmented to include a total
of 234,000 images. The augmentation aimed to address issues of
class imbalance and enhance the model's ability to generalize
across diverse facial characteristics.

Images were classified into seven age classes based on intuitive age
ranges. The dataset included faces with varying lighting conditions,
angles, and expressions to simulate real-world scenarios. The labels
were curated to avoid overlap between classes, ensuring clarity in
classification tasks.

Preprocessing

Input images were converted to grayscale to reduce the


computational complexity of the model. Since age prediction
primarily depends on facial texture and feature contours, colour
information was deemed unnecessary. Implemented using
TensorFlow's decode_jpeg function with the channels=1 parameter.
The conversion reduced the model input size and focused
computations on relevant features.

All images were resized to 200x200 pixels to ensure uniformity in


input dimensions. This standardization allowed the CNN to process
images consistently, irrespective of their original sizes. Implemented
using TensorFlow's resize function with bilinear interpolation to
minimize pixelation effects. Smaller dimensions were chosen to
balance feature preservation and computational efficiency.

Age values from the dataset were mapped into seven distinct age
classes for multi-class classification:

Class 0: 1–2 years, Class 1: 3–9 years, Class 2: 10–20 years, Class
3: 21–27 years, Class 4: 28–45 years, Class 5: 46–65 years and
Class 6: 66+ years

This mapping reduced the complexity of predicting continuous


values, making it easier for the model to focus on key transitions in
age-related features. Labels were normalized and stored alongside
their corresponding images to facilitate training.
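As an illustration of this mapping, the conversion from a raw age value to one of the seven class labels can be written as a simple lookup. This is a minimal sketch in which the function name is hypothetical; the class boundaries follow the list above.

def age_to_class(age: int) -> int:
    """Map a raw age value to one of the seven age-class labels used for training."""
    if age <= 2:
        return 0   # 1-2 years
    elif age <= 9:
        return 1   # 3-9 years
    elif age <= 20:
        return 2   # 10-20 years
    elif age <= 27:
        return 3   # 21-27 years
    elif age <= 45:
        return 4   # 28-45 years
    elif age <= 65:
        return 5   # 46-65 years
    else:
        return 6   # 66+ years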
To enhance model generalization and mitigate the effects of class
imbalance, the dataset was pre-augmented using the following
techniques:

 Rotation: Random rotations (up to ±20 degrees) simulated


variations in head orientation.
 Flipping: Horizontal flipping represented mirrored facial
features.
 Scaling: Random scaling ensured the model learned to
handle slight changes in facial size.

Augmentation was applied to the training dataset only, ensuring the


test dataset remained representative of real-world data.
Filenames and age labels were converted into TensorFlow tensors for
seamless integration with the TensorFlow data pipeline. Each image
was read from the file path, decoded into arrays, and passed
through the grayscale conversion and resizing pipelines. This
conversion allowed preprocessing operations to be batched and
parallelized, reducing the overall training time.

Training and testing datasets were divided into batches of 512


images. This batch size was chosen to optimize GPU memory usage
and training throughput. The tf.data.Dataset API handled batching,
shuffling, and prefetching to maximize data-loading efficiency
during training. Each batch included a balanced mix of images from
all seven classes, ensuring consistent gradient updates across
epochs.
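A minimal sketch of such a loading pipeline is shown below. The variable names (file_paths, class_labels) and the shuffle buffer size are assumptions; the grayscale decoding with channels=1, the 200x200 bilinear resize, the batch size of 512, and the shuffling and prefetching follow the steps described above.

import tensorflow as tf

BATCH_SIZE = 512
IMG_SIZE = (200, 200)

def load_and_preprocess(path, label):
    """Read a JPEG, convert to single-channel grayscale, and resize to 200x200."""
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=1)            # grayscale
    image = tf.image.resize(image, IMG_SIZE, method="bilinear")
    image = tf.cast(image, tf.float32)
    return image, label

# file_paths and class_labels are assumed to be Python lists prepared earlier
# (e.g., class labels produced by age_to_class above).
train_dataset = tf.data.Dataset.from_tensor_slices((file_paths, class_labels))
train_dataset = (train_dataset
                 .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
                 .shuffle(buffer_size=10_000)   # shuffling applies to the training split
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))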

Dataset and Preprocessing for the Gender Model

Dataset

The model used the UTKFace dataset, which contains images with
metadata embedded in their filenames in the following format:
[age]_[gender]_[race]_[date and time].jpg. For this project, only the
gender labels (denoted by the second field in the filename) were
extracted to train the gender classification model.

The dataset includes a wide variety of faces with diverse


characteristics such as lighting conditions, angles, and expressions.
Gender labels are binary. (0: Male, 1: Female)

Preprocessing Steps

The UTKFace dataset was imported from Google Drive. Filenames


were parsed to extract gender labels using Python's split() function.
The extracted gender labels were stored in an array for mapping
with corresponding image data.

Each image was read using OpenCV's cv2.imread() function. To


reduce computational complexity and focus on essential features, all
images were converted to grayscale using cv2.cvtColor() with the
cv2.COLOR_BGR2GRAY flag. This conversion helped emphasize
facial structure over colour information.

Images were resized to a uniform size of 100x100 pixels using OpenCV's cv2.resize() function. This ensured all input images had consistent dimensions, allowing for efficient processing by the convolutional neural network (CNN).

The dataset was split into training and testing sets using the
train_test_split() function from the sklearn library. (Training Set: 75%
of the dataset, Test Set: 25% of the dataset)

Pixel values were normalized to a range of [0, 1] by dividing by 255,


standardizing the data and aiding in faster convergence during
training.

Images and labels were converted into NumPy arrays for


compatibility with TensorFlow's data pipeline. Data batches were
created during training for efficient computation.
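The gender preprocessing steps can be sketched as follows. The dataset path and variable names are assumptions; the filename parsing, grayscale conversion, 100x100 resizing, [0, 1] normalization, and 75/25 split mirror the description above.

import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

dataset_dir = "/content/drive/MyDrive/UTKFace"   # assumed Google Drive mount point

images, labels = [], []
for filename in os.listdir(dataset_dir):
    # Filename format: [age]_[gender]_[race]_[date and time].jpg
    gender = int(filename.split("_")[1])          # 0: Male, 1: Female
    img = cv2.imread(os.path.join(dataset_dir, filename))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # drop colour information
    img = cv2.resize(img, (100, 100))             # uniform input size
    images.append(img)
    labels.append(gender)

X = np.array(images, dtype=np.float32) / 255.0    # normalize pixel values to [0, 1]
X = X.reshape(-1, 100, 100, 1)                    # add the channel dimension
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)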

Dataset and Preprocessing for the Emotion Model

Dataset:
The CK+ (Cohn-Kanade Plus) dataset was used to train the emotion
recognition model. This dataset consists of facial images annotated
with discrete emotion labels, including anger, contempt, disgust,
fear, happiness, sadness, and surprise.

Preprocessing Steps:

To align with the model’s use case, the original emotion labels were
mapped into three categories:

 Positive: Includes happiness and surprise.


 Negative: Includes anger and sadness.
 Neutral: Includes contempt, disgust, and fear.

All images were converted to grayscale to simplify computations


while retaining essential facial features necessary for emotion
detection.

Images were resized to 48x48 pixels to maintain consistency in


input size across the dataset.
Pixel values were scaled to the range of 0 to 1 by dividing by 255.
This step was performed to ensure computational efficiency and
faster convergence during training.

Emotion labels were one-hot encoded to represent the three


emotion categories ([1,0,0] for positive, [0,1,0] for negative, and
[0,0,1] for neutral).

The pre-processed dataset was divided into training and testing sets
using an 80:20 split, ensuring the model was trained on diverse
samples while retaining a portion for evaluation.

These preprocessing steps ensured uniformity in input data, reduced


complexity, and improved the model's ability to generalize to
unseen facial expressions.
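A minimal sketch of the emotion label mapping and preprocessing is given below. The variable names and exact label strings are assumptions; the three-way mapping, the 48x48 grayscale resize, the [0, 1] scaling, the one-hot encoding, and the 80:20 split follow the steps described above.

import cv2
import numpy as np
from sklearn.model_selection import train_test_split

# Mapping of the original CK+ labels to the three categories used here.
LABEL_MAP = {
    "happiness": 0, "surprise": 0,             # Positive
    "anger": 1, "sadness": 1,                  # Negative
    "contempt": 2, "disgust": 2, "fear": 2,    # Neutral
}

def one_hot(category: int, num_classes: int = 3) -> np.ndarray:
    """[1,0,0] for positive, [0,1,0] for negative, [0,0,1] for neutral."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[category] = 1.0
    return vec

def preprocess_face(img_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)   # grayscale
    gray = cv2.resize(gray, (48, 48))                  # uniform 48x48 input
    return gray.astype(np.float32) / 255.0             # scale to [0, 1]

# raw_images and raw_labels are assumed to be loaded from the CK+ folders.
X = np.array([preprocess_face(img) for img in raw_images]).reshape(-1, 48, 48, 1)
y = np.array([one_hot(LABEL_MAP[lbl]) for lbl in raw_labels])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)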

Model Architecture and Training Process

Model Architecture and Training Process of the Age Model

The Convolutional Neural Network (CNN) architecture is designed for


age range classification. It is a layered structure optimized to extract
spatial features from input images and classify them into predefined
categories. Below is a detailed explanation of its components,

CNN Architecture
_________________________________________________________________
Layer (type)                                      Output Shape           Param #
=================================================================
conv2d (Conv2D)                                   (None, 198, 198, 32)   320
average_pooling2d (AveragePooling2D)              (None, 99, 99, 32)     0
conv2d_1 (Conv2D)                                 (None, 97, 97, 64)     18496
average_pooling2d_1 (AveragePooling2D)            (None, 48, 48, 64)     0
conv2d_2 (Conv2D)                                 (None, 46, 46, 128)    73856
average_pooling2d_2 (AveragePooling2D)            (None, 23, 23, 128)    0
conv2d_3 (Conv2D)                                 (None, 21, 21, 256)    295168
average_pooling2d_3 (AveragePooling2D)            (None, 10, 10, 256)    0
global_average_pooling2d (GlobalAveragePooling2D) (None, 256)            0
dense (Dense)                                     (None, 132)            33924
dense_1 (Dense)                                   (None, 7)              931
=================================================================

The model begins with an input layer that accepts grayscale images of size 200×200 pixels, matching the preprocessing described earlier. The grayscale format reduces computational complexity by eliminating the need to process multiple colour channels (i.e., RGB). The image is represented as a matrix of pixel values, typically ranging from 0 to 255 for grayscale. The input layer prepares these values for further processing by the convolutional layers.

The first convolutional layer applies 32 filters, each with a kernel size of 3×3, to the input image. A filter is essentially a small matrix that slides over the image to detect patterns such as edges or textures.
The ReLU (Rectified Linear Unit) activation function is applied to
introduce non-linearity. ReLU replaces any negative values in the
feature map with zeros, helping the model learn more complex
patterns. Following this, an AveragePooling2D layer is used to
reduce the spatial size of the feature maps by downsampling. This
helps reduce computational load and retains only the most essential
features.
The architecture includes three sets of convolutional layers. Each set
contains:

 A Conv2D layer with filters: 64, 128, and 256, respectively.


As you progress through these layers, the model begins
detecting increasingly complex features, from simple edges
to more intricate patterns and textures.

 Each convolutional layer is followed by an AveragePooling2D


layer, which further reduces the size of the feature maps,
focusing on the most important features.

The increase in the number of filters as you move deeper into the
network allows the model to capture more abstract and hierarchical
patterns.

After the convolutional layers, the Global Average Pooling (GAP)


layer takes the output from the previous layers and averages each
feature map into a single value. This reduces the output from a large
3D tensor into a 1D vector of fixed length, in this case, 256
dimensions (equal to the number of filters in the final convolutional
layer). The GAP layer helps to reduce the model’s complexity and
also mitigates the risk of overfitting by summarizing the feature
maps into a fixed-size vector.

Following the GAP layer, the model has a Dense layer with 132
neurons. Each neuron is connected to every output from the
previous layer, allowing the model to learn high-level abstractions
from the features extracted by the convolutional layers. This Dense
layer uses ReLU activation to introduce non-linearity and help the
model learn complex patterns.

The final layer is another Dense layer with 7 neurons, corresponding


to 7 age range categories. Each neuron represents a possible class
(i.e., an age range). Softmax activation is used here to convert the
output into a probability distribution. The softmax function ensures
that the sum of the output probabilities is 1, and the highest
probability corresponds to the predicted age range.
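For reference, a Keras definition that reproduces the layer summary printed above could look like the following sketch. Hyperparameters not stated in the text (such as pooling size and padding) are inferred from the printed output shapes and should be treated as assumptions.

from tensorflow.keras import layers, models

def build_age_model(input_shape=(200, 200, 1), num_classes=7):
    """CNN matching the layer summary shown above."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),    # -> (198, 198, 32)
        layers.AveragePooling2D((2, 2)),                 # -> (99, 99, 32)
        layers.Conv2D(64, (3, 3), activation="relu"),    # -> (97, 97, 64)
        layers.AveragePooling2D((2, 2)),                 # -> (48, 48, 64)
        layers.Conv2D(128, (3, 3), activation="relu"),   # -> (46, 46, 128)
        layers.AveragePooling2D((2, 2)),                 # -> (23, 23, 128)
        layers.Conv2D(256, (3, 3), activation="relu"),   # -> (21, 21, 256)
        layers.AveragePooling2D((2, 2)),                 # -> (10, 10, 256)
        layers.GlobalAveragePooling2D(),                 # -> (256,)
        layers.Dense(132, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model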

Training Process

The categorical cross-entropy loss function is used because this is a


multi-class classification problem. Categorical cross-entropy
measures how far the predicted probability distribution (from the
output layer) is from the actual distribution (the true labels). During
training, the goal is to minimize this loss by adjusting the weights of
the network to reduce the difference between the predicted and
actual values.

The Adam optimizer is employed for efficient training. Adam


combines the benefits of momentum and adaptive learning rates,
making it well-suited for training deep networks. It adapts the
learning rate for each parameter based on the gradients of the loss
function, which helps the model converge faster and more reliably.
Adam is preferred because it works well with noisy gradients and is
computationally efficient.

The model is trained for 60 epochs, meaning that the entire training
dataset is passed through the network 60 times. Each epoch helps
the model gradually adjust its weights to improve performance. The
batch size is set to 512, meaning that 512 images are processed in
one go before the model’s weights are updated. This helps to
stabilize the training process by averaging gradients over a larger
set of data.

TensorBoard is used for real-time visualization of training progress,


showing metrics like loss and accuracy across epochs. This helps in
monitoring the learning process and diagnosing any issues like
overfitting or underfitting. ModelCheckpoint saves the model’s
weights whenever it achieves a better validation accuracy, ensuring
that the best-performing model is preserved.

During training, the images (with their corresponding labels) are fed
into the network in batches. For each batch:

 The model generates predictions for the input images.

 The loss function (categorical cross-entropy) calculates the


error between the predicted output and the true label.

 The optimizer (Adam) updates the model's weights based on


the computed gradients.

This process continues across the 60 epochs until the model has
learned to make accurate predictions.
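A sketch of the corresponding compile-and-fit step is shown below. The log directory, output file name, and dataset variable names are assumptions; the loss, optimizer, epoch count, and callbacks follow the description above, with the batch size of 512 handled by the tf.data pipeline.

import tensorflow as tf

model = build_age_model()   # defined in the architecture sketch above

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir="./logs"),          # assumed log path
    tf.keras.callbacks.ModelCheckpoint(
        "./output/age_model.h5",                               # assumed file name
        monitor="val_accuracy",                                # keep the best-performing weights
        save_best_only=True,
    ),
]

history = model.fit(
    train_dataset,               # batched tf.data.Dataset from the preprocessing step
    validation_data=test_dataset,
    epochs=60,
    callbacks=callbacks,
)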
Epoch 58/60
458/458 [==============================] - ETA: 0s - loss: 0.2502
- accuracy: 0.9002
Epoch 00058: val_accuracy did not improve from 0.82481
458/458 [==============================] - 137s 299ms/step -
loss: 0.2502 - accuracy: 0.9002 - val_loss: 0.6888 -
val_accuracy: 0.8015
Epoch 59/60
458/458 [==============================] - ETA: 0s - loss: 0.2570
- accuracy: 0.8969
Epoch 00059: val_accuracy did not improve from 0.82481
458/458 [==============================] - 136s 296ms/step -
loss: 0.2570 - accuracy: 0.8969 - val_loss: 0.7584 -
val_accuracy: 0.7819
Epoch 60/60
458/458 [==============================] - ETA: 0s - loss: 0.2733
- accuracy: 0.8898
Epoch 00060: val_accuracy did not improve from 0.82481
458/458 [==============================] - 136s 296ms/step -
loss: 0.2733 - accuracy: 0.8898 - val_loss: 0.7837 -
val_accuracy: 0.7792

CNN Architecture and Training Process for the Gender Classification Model

This model is designed for classifying gender based on facial images


using the UTKFace dataset. The following describes the architecture
and training procedure of the CNN model for gender classification.

CNN Architecture
_________________________________________________________________
Layer (type)                        Output Shape              Param #
=================================================================
input_1 (InputLayer)                [(None, 100, 100, 1)]     0
conv2d (Conv2D)                     (None, 100, 100, 32)      320
dropout (Dropout)                   (None, 100, 100, 32)      0
activation (Activation)             (None, 100, 100, 32)      0
max_pooling2d (MaxPooling2D)        (None, 50, 50, 32)        0
conv2d_1 (Conv2D)                   (None, 50, 50, 64)        18496
dropout_1 (Dropout)                 (None, 50, 50, 64)        0
activation_1 (Activation)           (None, 50, 50, 64)        0
max_pooling2d_1 (MaxPooling2D)      (None, 25, 25, 64)        0
conv2d_2 (Conv2D)                   (None, 25, 25, 128)       73856
dropout_2 (Dropout)                 (None, 25, 25, 128)       0
activation_2 (Activation)           (None, 25, 25, 128)       0
max_pooling2d_2 (MaxPooling2D)      (None, 12, 12, 128)       0
conv2d_3 (Conv2D)                   (None, 12, 12, 256)       295168
dropout_3 (Dropout)                 (None, 12, 12, 256)       0
activation_3 (Activation)           (None, 12, 12, 256)       0
max_pooling2d_3 (MaxPooling2D)      (None, 6, 6, 256)         0
flatten (Flatten)                   (None, 9216)              0
dense (Dense)                       (None, 128)               1179776
dropout_4 (Dropout)                 (None, 128)               0
dense_1 (Dense)                     (None, 2)                 258
=================================================================

The model begins by accepting input images of size 100×100 pixels,


which are converted to grayscale (single-channel images) to simplify
the computation and focus on structural features. The image data is
loaded from the dataset, and labels (representing gender) are
extracted from the image filenames. The grayscale conversion
ensures that only intensity information is used, which helps the
model focus on the essential features rather than colour.

This layer applies 32 filters of size 3×3 on the input images, aiming
to detect low-level features like edges and textures. The ReLU
activation function is used to introduce non-linearity into the model,
which allows it to learn complex patterns. Negative values are set to
zero by the ReLU function. A MaxPooling2D layer follows the
convolution to reduce the spatial dimensions of the feature map,
downsampling the output by a factor of 2 in both dimensions (height
and width).

This layer applies 64 filters of size 3×3 continuing to detect more


complex features. Similar to the first layer, the ReLU activation
function is applied, and the output is downsampled using another
MaxPooling2D layer. MaxPooling helps in reducing computational
load while retaining the most important spatial features from the
previous layer.

Here, 128 filters of size 3×3 are applied, allowing the model to
capture even more abstract and high-level features from the data.
Again, ReLU activation is used, followed by MaxPooling2D, further
reducing the feature map size and emphasizing the most relevant
features.

The model uses 256 filters of size 3×3 to capture the most complex
patterns, leading to a deeper understanding of the data. This layer
is followed by MaxPooling2D to reduce the spatial dimensions of the
feature map and retain the critical features.

The output of the last pooling layer is a 3D tensor. The Flatten layer
converts this 3D tensor into a 1D vector, which can then be
processed by fully connected layers (Dense layers). This flattening
operation is crucial because it reshapes the feature map from the
convolutional layers into a format that can be fed into the Dense
layers.

A Dense layer with 128 neurons is used to learn high-level features


and patterns based on the flattened input from the previous layer.
The ReLU activation function is applied to introduce non-linearity.

A Dropout layer with a rate of 0.2 is included to prevent overfitting


by randomly setting 20% of the neurons to zero during training. This
helps the model generalize better on unseen data.
The final layer is a Dense layer with 2 neurons, one for each class
(Male, Female), as the gender classification problem is a binary
classification task. The Sigmoid activation function is used to output
a probability value between 0 and 1, indicating the predicted class.
A value closer to 1 would indicate "Female", and a value closer to 0
would indicate "Male".
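A Keras sketch that reproduces the printed summary is given below. The dropout rate inside the convolutional blocks is not stated in the text and is assumed to be 0.2; the final two-neuron sigmoid output follows the description above.

from tensorflow.keras import layers, models

def conv_block(x, filters, drop_rate=0.2):
    """Conv2D -> Dropout -> Activation -> MaxPooling, as in the summary above.
    The dropout rate inside the convolutional blocks is an assumption."""
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.Dropout(drop_rate)(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return x

def build_gender_model(input_shape=(100, 100, 1)):
    inputs = layers.Input(shape=input_shape)
    x = conv_block(inputs, 32)     # -> (50, 50, 32)
    x = conv_block(x, 64)          # -> (25, 25, 64)
    x = conv_block(x, 128)         # -> (12, 12, 128)
    x = conv_block(x, 256)         # -> (6, 6, 256)
    x = layers.Flatten()(x)        # -> (9216,)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(2, activation="sigmoid")(x)   # per the description above
    return models.Model(inputs, outputs)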

Training Process

The Sparse Categorical Cross-Entropy loss function is used because


this is a binary classification problem, where the goal is to minimize
the difference between the predicted gender and the true gender
label. This loss function is ideal when the labels are integer-encoded
(i.e., 0 for male and 1 for female).

The model uses the Adam optimizer, which combines the benefits of
momentum and adaptive learning rates. This optimizer is efficient
for training deep networks and helps the model converge quickly to
an optimal solution.

The model is trained for 30 epochs, meaning the entire training


dataset is passed through the network 30 times. The model will
process a batch of images before updating its weights. The
ModelCheckpoint callback is used to save the best model based on
the lowest training loss, ensuring that the model with the best
performance is preserved.

Epoch 28/30
555/556 [============================>.] - ETA: 0s - loss: 0.2340
- accuracy: 0.9283
Epoch 00028: loss improved from 0.23508 to 0.23439, saving model
to ./output/gender_model.h5
556/556 [==============================] - 8s 14ms/step - loss:
0.2344 - accuracy: 0.9282 - val_loss: 0.3232 - val_accuracy:
0.8929
Epoch 29/30
555/556 [============================>.] - ETA: 0s - loss: 0.2374
- accuracy: 0.9288
Epoch 00029: loss did not improve from 0.23439
556/556 [==============================] - 7s 13ms/step - loss:
0.2373 - accuracy: 0.9288 - val_loss: 0.3397 - val_accuracy:
0.8900
Epoch 30/30
556/556 [==============================] - ETA: 0s - loss: 0.2353
- accuracy: 0.9302
Epoch 00030: loss did not improve from 0.23439
556/556 [==============================] - 7s 13ms/step - loss:
0.2353 - accuracy: 0.9302 - val_loss: 0.3113 - val_accuracy:
0.8947

CNN Architecture and Training Process for Emotion Model


This model is designed for classifying emotion based on facial images using the CK+ dataset. The following describes the architecture and training procedure of the CNN model for emotion classification.

CNN Architecture
_________________________________________________________________
Layer (type)                        Output Shape              Param #
=================================================================
input_1 (InputLayer)                [(None, 48, 48, 1)]       0
conv2d (Conv2D)                     (None, 48, 48, 32)        320
dropout (Dropout)                   (None, 48, 48, 32)        0
activation (Activation)             (None, 48, 48, 32)        0
max_pooling2d (MaxPooling2D)        (None, 24, 24, 32)        0
conv2d_1 (Conv2D)                   (None, 24, 24, 64)        18496
dropout_1 (Dropout)                 (None, 24, 24, 64)        0
activation_1 (Activation)           (None, 24, 24, 64)        0
max_pooling2d_1 (MaxPooling2D)      (None, 12, 12, 64)        0
conv2d_2 (Conv2D)                   (None, 12, 12, 128)       73856
dropout_2 (Dropout)                 (None, 12, 12, 128)       0
activation_2 (Activation)           (None, 12, 12, 128)       0
max_pooling2d_2 (MaxPooling2D)      (None, 6, 6, 128)         0
conv2d_3 (Conv2D)                   (None, 6, 6, 256)         295168
dropout_3 (Dropout)                 (None, 6, 6, 256)         0
activation_3 (Activation)           (None, 6, 6, 256)         0
max_pooling2d_3 (MaxPooling2D)      (None, 3, 3, 256)         0
flatten (Flatten)                   (None, 2304)              0
dense (Dense)                       (None, 128)               295040
dropout_4 (Dropout)                 (None, 128)               0
dense_1 (Dense)                     (None, 3)                 387
=================================================================

The model begins by accepting input images of size 48x48 pixels,


which are grayscale (single-channel images). Grayscale conversion
simplifies computations by using only intensity information, focusing
on the structural features necessary for emotion detection. The
dataset provides the images along with corresponding emotion
labels, which are extracted from the filenames. The grayscale
format helps the model concentrate on patterns such as facial
expressions, which are crucial for emotion classification, rather than
color details.

In the first convolutional layer, the model applies 32 filters of size


3x3 to the input images. These filters are designed to detect low-
level features like edges, lines, and textures, which are foundational
for emotion recognition. The ReLU activation function is applied to
introduce non-linearity into the model, enabling it to learn more
complex patterns from the data. After the convolution operation, a
MaxPooling2D layer with a pool size of 2x2 follows, reducing the
spatial dimensions of the feature map by a factor of 2 in both height
and width. This helps in downsampling the data while retaining the
important features, making the model more efficient.

The second convolutional layer applies 64 filters of size 3x3 to


detect more complex features in the data. Similar to the first layer,
the ReLU activation function is used to capture non-linear patterns,
and a MaxPooling2D layer follows to reduce the size of the feature
map while keeping the critical features intact. This process enables
the model to focus on increasingly abstract features that are
important for emotion recognition.

Here, the model applies 128 filters of size 3x3, allowing it to learn
even more intricate and abstract representations of the input
images. The ReLU activation function is again used for non-linearity,
and MaxPooling2D follows to reduce the spatial dimensions of the
feature map, enabling the model to focus on higher-level features
and reduce computational complexity.

In this layer, the model uses 256 filters of size 3x3 to capture the
most complex patterns in the image. These patterns are more
abstract and help the model better understand complex facial
expressions associated with different emotions. A MaxPooling2D
layer is again applied after the convolution to downsample the
feature map, preserving the most significant features and enhancing
the model’s ability to generalize.

After the convolution and pooling layers, the output is a 3D tensor.


The Flatten layer converts this 3D tensor into a one-dimensional
vector, which can then be processed by the fully connected (Dense)
layers. This step is crucial because the convolutional layers
generate complex feature maps and flattening them prepares the
data for the dense layers.
The flattened output is passed through a Dense layer with 128
neurons, which helps the model learn high-level features and
patterns. The ReLU activation function is applied here to introduce
non-linearity and enable the model to capture more intricate
relationships in the data.

To prevent overfitting and improve generalization, a Dropout layer


with a rate of 0.2 is included. This means that during training, 20%
of the neurons are randomly set to zero, helping the model avoid
dependency on specific neurons and forcing it to learn more robust
features.

The final layer is another Dense layer with 3 neurons, one for each emotion category (Positive, Negative, Neutral), as reflected in the layer summary above. The Softmax activation function is used here to output a probability distribution over the 3 possible classes. The Softmax function ensures that the output values are between 0 and 1 and that their sum is equal to 1, making it suitable for multi-class classification problems. The model predicts the class with the highest probability, which corresponds to the recognized emotion.
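A compact Keras sketch of this architecture is given below; it follows the same Conv2D-Dropout-Activation-MaxPooling block pattern as the gender model sketch, adjusted to a 48x48 input and a three-way softmax output. The per-block dropout rate is an assumption.

from tensorflow.keras import layers, models

def build_emotion_model(input_shape=(48, 48, 1), num_classes=3):
    """Emotion CNN matching the layer summary above (3-way softmax output)."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128, 256):      # four Conv -> Dropout -> ReLU -> MaxPool blocks
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.Dropout(0.2)(x)          # per-block dropout rate assumed
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)                 # -> (2304,)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)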

Training Process

The model uses Categorical Cross-Entropy as the loss function. This loss function is ideal for multi-class classification tasks, where each image is associated with one of the three emotion categories. It computes the difference between the predicted probability distribution and the actual one-hot encoded emotion label. The goal is to minimize this loss during training, which ensures that the predicted probabilities align with the true class labels.

The Adam optimizer is used for training the model. Adam is an


efficient optimization algorithm that combines the benefits of
momentum and adaptive learning rates, making it suitable for
training deep neural networks. It helps the model converge quickly
and efficiently to an optimal solution by adjusting the learning rate
dynamically for each parameter.

The model is trained for 50 epochs, meaning the entire training


dataset is passed through the network 50 times. During each epoch,
the model updates its weights based on the loss function and
optimizer. The batch size, typically set to a power of 2 (e.g., 32, 64),
determines how many images are processed before updating the
model’s weights. The ModelCheckpoint callback is used during
training to save the best model based on the validation loss,
ensuring that the model with the best performance is preserved for
future use.

Epoch 48/50
17/23 [=====================>........] - ETA: 0s - loss: 0.0945 -
accuracy: 0.9963
Epoch 00048: loss did not improve from 0.09191
23/23 [==============================] - 0s 9ms/step - loss: 0.0946
- accuracy: 0.9959 - val_loss: 0.1396 - val_accuracy: 0.9837
Epoch 49/50
17/23 [=====================>........] - ETA: 0s - loss: 0.0936 -
accuracy: 0.9945
Epoch 00049: loss did not improve from 0.09191
23/23 [==============================] - 0s 9ms/step - loss: 0.0919
- accuracy: 0.9959 - val_loss: 0.1374 - val_accuracy: 0.9837
Epoch 50/50
16/23 [===================>..........] - ETA: 0s - loss: 0.0893 -
accuracy: 0.9961
Epoch 00050: loss improved from 0.09191 to 0.08838, saving model to
./output/emotion_model.h5
23/23 [==============================] - 0s 14ms/step - loss:
0.0884 - accuracy: 0.9959 - val_loss: 0.1243 - val_accuracy: 0.9797

Social Relationship Identification


Grouping Faces Based on Proximity

The task of grouping faces based on proximity focuses on identifying


how close faces are to each other in an image and creating groups
based on these spatial relationships. This step is essential for
understanding how people are related in the context of a social
group, which might include families or couples.

The algorithm detects faces and marks them with bounding boxes.
For each face, a rectangle (bounding box) is drawn around the face,
which has four coordinates: x,y (top-left corner), and w,h (width and
height). The center of each bounding box is calculated to identify
the exact location of the face in the image.
The horizontal distance between the centers of two bounding boxes
is computed using the calculate_horizontal_distance() function. It
uses the following formula to calculate the horizontal distance
between two faces:

Distance = |(x1 + w1/2) − (x2 + w2/2)|

where x1 and w1 are the top-left x-coordinate and width of the first bounding box, and x2 and w2 are the corresponding values for the second bounding box.

A face is considered "close" to another if the horizontal distance


between their bounding box centers is less than 200 pixels. This
threshold can be adjusted based on the context of the images (e.g.,
a crowded room might need a smaller threshold, while a more
spacious scene might require a larger threshold).
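A sketch of this proximity check is shown below. The function name follows the one referenced in the text (calculate_horizontal_distance); the (x, y, w, h) tuple format and the are_close() helper are assumptions.

PROXIMITY_THRESHOLD = 200  # pixels; adjustable, as discussed above

def calculate_horizontal_distance(box1, box2):
    """Horizontal distance between the centers of two (x, y, w, h) bounding boxes."""
    x1, _, w1, _ = box1
    x2, _, w2, _ = box2
    return abs((x1 + w1 / 2) - (x2 + w2 / 2))

def are_close(box1, box2):
    """Two faces are 'close' if their horizontal centers lie within the threshold."""
    return calculate_horizontal_distance(box1, box2) < PROXIMITY_THRESHOLD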

Once the horizontal distances between the faces are calculated, the
program checks if any face is close to another. If two faces are
within the proximity threshold (200 pixels), they are added to the
same group.

The faces are iteratively compared to each other to determine if


they belong to an existing group. If a face is close to a face in an
existing group, it is added to that group. If no groups exist that the
face is close to, a new group is created, and the face is placed in it.
As an example,
 If Face A and Face B are close enough, they form a group.

 If Face C is far from Face A and Face B, a new group is


created for Face C.

 If Face D is close to Face A, Face B, and Face C, it is added to


the existing group.
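The iterative grouping just described can be sketched as a greedy loop, reusing are_close() from the sketch above; the function name and data layout are assumptions.

def group_faces(boxes):
    """Greedy proximity grouping: a face joins the first existing group that
    contains a face within the threshold; otherwise it starts a new group."""
    groups = []
    for box in boxes:
        placed = False
        for group in groups:
            if any(are_close(box, member) for member in group):
                group.append(box)
                placed = True
                break
        if not placed:
            groups.append([box])
    return groups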

Once faces have been grouped based on proximity, the program


proceeds to classify each group as a "couple" or a "family" based on
gender and age predictions. These predictions are generated using
pre-trained models that assess the gender and age of each detected
face.

Couples Identification:

Each face in the group has a gender prediction based on the output
of the gender prediction model. The model classifies each face as
either "male" or "female."

A group is classified as a "couple" if it contains exactly one male and


one female. This classification works because a couple is
traditionally defined as two people of opposite genders. The
program iterates over each group and counts the number of males
and females in the group:

If the group contains exactly one male and one female, it is


identified as a couple. As an example,

 Group A consists of two faces: one male and one female. It is


classified as a couple.

 Group B consists of two males or two females. It is not


classified as a couple.

Families Identification:

Each face in the group is classified into one of several age groups
based on the output of the age prediction model. The age groups
used in the classification are:

 Children: 1-2 years, 3-9 years, 10-20 years.


 Adults: 21-27 years, 28-45 years, 46-65 years, 66-116 years.

A group is classified as a "family" if it contains at least one adult and


at least one child. The rationale behind this is that a family usually
consists of parents (adults) and children. The program counts the
number of adults and children in each group:
If a group contains one or more adults and one or more children, it is
classified as a family. As an example,

 Group C consists of two adults (ages 30 and 35) and one


child (age 5). This group is classified as a family.

 Group D consists of two adults (ages 40 and 45) but no


children. This group is not classified as a family.
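A minimal sketch of this couple/family classification is given below. The member attribute names, and the priority given to the family rule when both criteria could apply, are assumptions; the counting logic follows the rules above.

CHILD_CLASSES = {0, 1, 2}      # 1-2, 3-9, 10-20 years
ADULT_CLASSES = {3, 4, 5, 6}   # 21-27, 28-45, 46-65, 66+ years

def classify_group(members):
    """Label a proximity group as 'Family', 'Couple', or None.
    Each member is assumed to be a dict with 'gender' ('male'/'female')
    and 'age_class' (0-6) predicted by the models."""
    males = sum(1 for m in members if m["gender"] == "male")
    females = sum(1 for m in members if m["gender"] == "female")
    adults = sum(1 for m in members if m["age_class"] in ADULT_CLASSES)
    children = sum(1 for m in members if m["age_class"] in CHILD_CLASSES)

    if adults >= 1 and children >= 1:          # family rule checked first (assumption)
        return "Family"
    if males == 1 and females == 1:            # exactly one male and one female
        return "Couple"
    return None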

Attention Detection
The attention detection mechanism in the code is designed to
evaluate whether individuals in an image are paying attention based
on their face orientation. This evaluation is achieved through a
combination of frontal and side-profile face detection techniques,
filtering to ensure unique detections, and labeling with visualization.
Below is a more detailed explanation of each step:

Face Detection

Face detection is the foundational step in identifying individuals


within an image. This is carried out in two stages:

Frontal Face Detection:

The Dlib frontal face detector is employed, which utilizes a


Histogram of Oriented Gradients (HOG) combined with a linear
classifier. The input image is processed to detect rectangular regions
where faces are present. These regions are known as bounding
boxes. Each bounding box represents the coordinates of a face
detected in the frontal view.

Side Profile Detection:

Haar Cascade classifiers are used for detecting left and right profile
faces. Haar features capture intensity differences in images and are
effective for this task. The grayscale version of the input image is
passed through a pre-trained Haar Cascade classifier for left profile
detection. Detected faces are returned as bounding boxes.

For right profile detection:

The image is flipped horizontally. The flipped image is processed


using the same Haar Cascade classifier. Bounding box coordinates
for right profiles are adjusted to map back to the original image
orientation.

This two-step process ensures the detection of faces regardless of


their orientation.
Filtering Unique Side Profile Faces

Since a single face may appear in both frontal and side-profile


detections, a filtering mechanism is applied to ensure accuracy and to avoid counting a face multiple times if it is detected as both frontal
and side-profile. For each side profile bounding box, its coordinates
are compared with the coordinates of frontal face bounding boxes. If
a significant overlap is found (calculated using Intersection over
Union, IoU), the side profile detection is discarded. Non-overlapping
side profile faces are considered unique and included in the final
detection results.
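A sketch of this detection-and-filtering pipeline is given below. The Haar cascade file name, detectMultiScale parameters, and IoU threshold value are assumptions; the frontal detection with Dlib, the horizontal flip for right profiles, and the overlap filtering follow the description above.

import cv2
import dlib

frontal_detector = dlib.get_frontal_face_detector()
profile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")   # left-profile cascade

def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def detect_attention(image, iou_threshold=0.3):   # threshold value is an assumption
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Frontal faces (HOG + linear classifier) are treated as "paying attention".
    frontal = [(r.left(), r.top(), r.width(), r.height())
               for r in frontal_detector(gray)]

    # Left profiles on the original image, right profiles on the flipped image.
    profiles = list(profile_cascade.detectMultiScale(gray, 1.1, 5))
    flipped = cv2.flip(gray, 1)
    width = gray.shape[1]
    for (x, y, pw, ph) in profile_cascade.detectMultiScale(flipped, 1.1, 5):
        profiles.append((width - x - pw, y, pw, ph))   # map back to the original orientation

    # Keep only side profiles that do not overlap a frontal detection.
    unique_profiles = [p for p in profiles
                       if all(iou(p, f) < iou_threshold for f in frontal)]
    return frontal, unique_profiles   # unique_profiles -> "not paying attention"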

Expert System Design


The expert system was designed using a rule-based inference
framework to determine the most suitable advertisement category
based on audience attributes. The system integrates data such as
audience demographics, social relationships, emotional states, and
attention levels to make informed decisions.
Structure and Logic

The expert system employs a decision matrix to map audience


attributes to 11 advertisement categories. These categories include
10 primary categories (e.g., Family Oriented, Travel & Adventure,
Health & Fitness) and an additional category, Attention-Grabbing
Content, which is prioritized under specific conditions.

The system processes inputs such as the number of people detected


(total_faces_count), individuals not paying attention
(not_attention_count), the number of families, couples, males, and
females etc. This data is dynamically fed into the system for real-
time decision-making.

A predefined set of rules governs how audience data influences


category scores. Rules ensure alignment with the detected audience
composition, such as prioritizing "Family Oriented" content for
groups with families or emphasizing "Luxury & Fashion" for couples.
These rules are designed to maximize relevance by tailoring
advertisements to the audience and enhance engagement by
adapting to varying social contexts and demographics.

Attention-Grabbing Content

A unique feature of the system is its ability to detect attention levels


and adapt accordingly,

If more than 50% of individuals are not paying attention, the system
shifts focus to Attention-Grabbing Content. This content type is
designed to re-engage distracted audiences through visually striking
or highly engaging material.

When attention-grabbing content is prioritized, it receives a 100%


score in the decision matrix, overriding all other categories. This
ensures the advertisement is specifically tailored to capture and
hold the audience's focus.

Dynamic Weighting

In cases where the attention level does not necessitate prioritization


of attention-grabbing content, the system dynamically adjusts
category weights based on demographic and social relationship
data. For example:

 Groups with families increase the relevance of "Family


Oriented" and "Entertainment."

 The presence of couples boosts categories like "Luxury &


Fashion" and "Travel & Adventure."

The weights are normalized to ensure that all category scores


collectively sum to 100%, maintaining balance and fairness in
category prioritization.
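An illustrative sketch of this rule-based scoring is given below. The input dictionary keys, the individual rule weights, and the subset of categories shown are assumptions rather than the report's exact decision matrix; the 50% attention override and the normalization to 100% follow the logic described above.

def recommend_categories(audience):
    """Return advertisement-category scores (percentages summing to 100).
    `audience` is assumed to be a dict such as:
    {'total_faces_count': 4, 'not_attention_count': 1,
     'families': 1, 'couples': 0, 'males': 2, 'females': 2}"""
    total = audience["total_faces_count"]

    # Attention override: >50% not paying attention -> attention-grabbing content only.
    if total and audience["not_attention_count"] / total > 0.5:
        return {"Attention-Grabbing Content": 100.0}

    scores = {"Family Oriented": 1.0, "Entertainment": 1.0,
              "Luxury & Fashion": 1.0, "Travel & Adventure": 1.0,
              "Health & Fitness": 1.0}   # base weights for an illustrative subset of categories

    # Rule-based adjustments driven by the detected social composition (weights illustrative).
    scores["Family Oriented"] += 3.0 * audience["families"]
    scores["Entertainment"] += 2.0 * audience["families"]
    scores["Luxury & Fashion"] += 2.5 * audience["couples"]
    scores["Travel & Adventure"] += 2.0 * audience["couples"]

    # Normalize so the category scores collectively sum to 100%.
    total_weight = sum(scores.values())
    return {cat: round(100.0 * w / total_weight, 2) for cat, w in scores.items()}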

Graphical User Interface (GUI)


The Graphical User Interface (GUI) for the advertisement
recommendation system was developed using CustomTkinter, a
Python library for creating modern and responsive interfaces. The
GUI was designed to facilitate seamless interaction with the system,
allowing users to upload images, analyse audience data, and view
targeted advertisements.

Features of the GUI

Image Upload and Display:

A file selection dialog allows users to upload an image for analysis.


The uploaded image is displayed on the GUI after resizing and
maintaining the original aspect ratio.

Audience Data Visualization:

Detected faces are analysed, and their attributes such as gender,


age, and emotion are displayed in a tabular format. Each row
corresponds to a detected face, with separate columns for
attributes.

Category Percentage Display:

Categories such as "Family Oriented" or "Health & Fitness" are


displayed along with their respective likelihood percentages, sorted
in descending order. This provides users with clear insight into
audience segmentation.
Advertisements Recommendation and Display:

The “Advertisements Recommendation and Display” feature is a


core aspect of the advertisement recommendation system,
designed to engage users by showcasing relevant advertisements
dynamically based on the analysed audience data. This feature
ensures that the content displayed aligns with the identified
preferences and demographics of the audience, providing a
personalized and interactive experience.

After analysing the uploaded image, the system assigns likelihood


percentages to predefined categories, such as "Family Oriented,"
"Health & Fitness," or "Technology & Gadgets". These percentages
indicate how well the audience attributes match each category, with
higher percentages reflecting stronger relevance.

The system selects the top 5 categories based on their likelihood


percentages. For each category, the system retrieves a folder of
advertisement images stored locally in organized directories, e.g.,
Ads/family_oriented/ or Ads/health_fitness/.

Advertisement Display Logic,

 The GUI displays one advertisement at a time in a dedicated


area of the interface.

 Every 5 seconds, the displayed advertisement cycles to


another randomly selected advertisement from the folders
corresponding to the top 5 categories.

 This approach keeps the content engaging and varied, ensuring that users are exposed to multiple advertisements within the most relevant categories. A minimal sketch of this display loop is given below.
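The following CustomTkinter sketch illustrates the cycling behaviour. The widget layout, image size, and example folder names are assumptions; the Ads/<category>/ directory structure and the 5-second cycle follow the description above.

import os
import random
import customtkinter as ctk
from PIL import Image

ADS_ROOT = "Ads"   # e.g. Ads/family_oriented/, Ads/health_fitness/ (assumed layout)

class AdDisplay(ctk.CTkFrame):
    """Cycles a random advertisement from the top-5 category folders every 5 seconds."""

    def __init__(self, master, top_categories):
        super().__init__(master)
        self.folders = [os.path.join(ADS_ROOT, c) for c in top_categories]
        self.label = ctk.CTkLabel(self, text="")
        self.label.pack(expand=True, fill="both")
        self.show_next_ad()

    def show_next_ad(self):
        folder = random.choice(self.folders)
        ad_file = random.choice(os.listdir(folder))
        image = ctk.CTkImage(light_image=Image.open(os.path.join(folder, ad_file)),
                             size=(400, 300))
        self.label.configure(image=image)
        self.label.image = image                  # keep a reference to the image
        self.after(5000, self.show_next_ad)       # cycle every 5 seconds

# Usage sketch (category folder names assumed):
# app = ctk.CTk()
# AdDisplay(app, ["family_oriented", "health_fitness", "technology_gadgets",
#                 "travel_adventure", "entertainment"]).pack(expand=True, fill="both")
# app.mainloop()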
Results and Discussion
Introduction
In this section, the outcomes of the research are presented and
analysed in detail to evaluate the performance and practical
applicability of the proposed advertisement recommendation
system. The results are aligned with the objectives of the study,
which are:

1. To develop a robust system capable of detecting and


analyzing audience attributes such as age, gender, and
emotion with high accuracy.

2. To identify social relationships, including families and couples,


using facial proximity and contextual analysis.

3. To recommend advertisements dynamically based on


audience segmentation using an expert system.

4. To provide a seamless user experience through a responsive


Graphical User Interface (GUI).

The structure of this section begins with a presentation of key


results, including metrics for accuracy, system performance, and
user satisfaction. This is followed by an in-depth discussion that
interprets the findings, compares them with existing systems, and
explores their implications. Challenges encountered during the
study and recommendations for future improvements are also
highlighted. Finally, the broader applications and potential of the
system in diverse settings are discussed to showcase its versatility.

Age Prediction Model

The accuracy scores improved significantly, although with a slight degree of overfitting that remains within acceptable limits. The plots below show the changes in loss and accuracy scores as the Age Prediction CNN model trained over 60 epochs.

Loss vs Epochs

[Figure: Loss vs Epoch plot for the Age Prediction model, showing Train Loss and Validation Loss over epochs 1 to 60.]

The Loss vs Epochs graph visualizes the evolution of the model's


training and validation loss during the 60 epochs. In this graph, the
x-axis represents the epoch number (ranging from 1 to 60), and the
y-axis represents the loss values.

Training Loss:

The training loss decreases steadily over the 60 epochs, indicating


that the model is learning and improving its ability to predict the
age. This suggests that the model is fitting to the training data more
effectively as the number of epochs increases. The slight
fluctuations observed could be due to the stochastic nature of
gradient descent, but the overall downward trend highlights the
model's improving accuracy.

Validation Loss:

The validation loss also shows a generally decreasing trend but with
some fluctuations. At the beginning of training, the validation loss
starts relatively high and gradually decreases as the model
improves. However, the red line shows more variability compared to
the training loss, which is common when evaluating a model on
unseen data. This fluctuation could indicate the model’s struggle to
generalize well in some epochs, but the overall decrease suggests
improvement.

Interpretation:

Both the training and validation loss decrease over time, suggesting
that the model is improving in its ability to predict age from the
input data. However, the gap between the training loss and
validation loss could also suggest a slight overfitting issue, where
the model fits the training data well but experiences some difficulty
generalizing to the validation set.

Accuracy vs Epoch

[Figure: Accuracy vs Epoch plot for the Age Prediction model, showing Train Accuracy and Validation Accuracy over epochs 1 to 60.]

The Accuracy vs Epochs graph compares the training and validation


accuracy over the 60 epochs.

Training Accuracy:

The training accuracy shows a strong upward trend throughout the


epochs. It starts from around 30% accuracy and increases to
approximately 90% by the 60th epoch. This rapid increase indicates
that the model is successfully learning the patterns in the training
data, improving its performance in predicting the correct age labels.

Validation Accuracy:

The validation accuracy follows a similar trend but with some


fluctuations. The accuracy increases from around 35% at the start to
roughly 75-80% towards the end. While the validation accuracy is
lower than the training accuracy, it still shows a steady
improvement, reflecting the model's general ability to predict age
for previously unseen data.

Interpretation:

The significant increase in both training and validation accuracy


over the epochs indicates that the model is successfully learning
and generalizing to unseen data. However, the gap between the two
lines suggests a mild overfitting problem, where the model performs
better on the training set compared to the validation set. This can
be addressed by using regularization techniques or obtaining more
diverse training data.

Confusion Matrix:

A confusion matrix is a tool used to evaluate the performance of a


classification model by showing the count of true vs predicted class
labels. It helps identify how well the model classifies each category,
and highlights misclassifications. For age range classification, the
matrix breaks down how accurately the model assigns people to
specific age groups (e.g., '1-2', '3-9', '10-20'). Diagonal values
represent correct predictions, while off-diagonal values show errors,
helping us understand where the model excels or needs
improvement.

True \ Pred     1-2     3-9   10-20   21-27   28-45   46-65   66-116
1-2             948       1       1       1       4       3        0
3-9             164     518      69      58      26       5        5
10-20             3       6     566     260      91      15        0
21-27             1       0      11    1674     308      13        0
28-45             0       0       9     480    2221      91        6
46-65             2       0       0      43     359    1203       72
66-116            0       0       0       8      29      74      698

Interpretation:

High Accuracy for '1-2' Age Range:

The model performs extremely well for the '1-2' age range, with 948
correct predictions (diagonal element). The misclassifications are
minimal, with only 1 instance predicted as '3-9', 1 as '10-20', 1 as
'21-27', and a few others spread across other ranges. This suggests
that the model is highly accurate in predicting the '1-2' age group,
with a very low rate of confusion.

Moderate Performance for '3-9' Age Range:

The '3-9' range shows 518 correct predictions, but there are notable
misclassifications, such as 164 instances predicted as '1-2' and 69
as '10-20'. This suggests that the model struggles to distinguish
between '3-9' and the '1-2' age range, and occasionally confuses it
with the '10-20' range.

Good Performance for '10-20' Age Range:

For the '10-20' range, there are 566 correct predictions, but
misclassifications are also notable. For example, 260 instances
predicted as '21-27' and 91 as '28-45'. This indicates that while the
model correctly predicts '10-20' most of the time, it occasionally
confuses it with higher age ranges.

High Accuracy for '21-27' Age Range:

The model performs well for the '21-27' range, with 1674 correct
predictions, but it also misclassifies some as '28-45' (480 instances),
suggesting the model tends to overestimate the age in this range.
There is a low level of confusion with other ranges, which suggests
good model performance for this age group.

Excellent Performance for '28-45' Age Range:

For '28-45', the model performs very well, with 2221 correct
predictions. However, it misclassifies a few as '21-27' (480
instances) and a few as '46-65' (91 instances). The confusion is
relatively low, indicating strong generalization within this range.

Solid Performance for '46-65' Age Range:

The '46-65' range has 1203 correct predictions, but there are 43
instances misclassified as '21-27' and 359 as '28-45'. The model
shows some confusion between '46-65' and '28-45', which might be
due to the overlap in age appearance in real-life scenarios. This
suggests that the model is somewhat less accurate at distinguishing
between these two age ranges.

Good Performance for '66-116' Age Range:

The '66-116' range also performs well, with 698 correct predictions.
The model confuses this group with '28-45' (29 instances) and '46-
65' (74 instances) but performs relatively well in this range. The
confusion with '46-65' may stem from age-related appearance
overlap, where older adults may appear younger due to various
factors like health or lifestyle.

Gender Prediction Model

The accuracy scores improved significantly, although with a slight degree of overfitting that remains within acceptable limits. The plot below shows the changes in loss and accuracy scores as the Gender Prediction CNN model trained over 30 epochs.

Loss vs Epoch Graph

[Figure: Loss vs Epoch plot for the Gender Prediction model, showing Loss and Validation Loss over epochs 1 to 30.]

The Loss vs Epoch graph visualizes the evolution of the model's


training and validation loss during the 30 epochs. The x-axis
represents the epoch number (ranging from 1 to 30), and the y-axis
represents the loss values.

Training Loss:

The training loss shows a steady decrease over the 30 epochs,


starting from a higher value of 1.4473 and decreasing to 0.2353 by
the 30th epoch. This consistent reduction suggests that the model is
improving its ability to classify gender correctly and is learning from
the training data effectively. The fluctuations in the curve, especially
in the initial epochs, indicate the stochastic nature of gradient
descent, but the overall downward trend shows the model's
improvement.

Validation Loss:

The validation loss also decreases over time, starting from 0.5918 in
the first epoch and ending at 0.3113 by the 30th epoch. Similar to
the training loss, the validation loss generally decreases, although it
shows some fluctuations. This variability is typical when a model is
evaluated on unseen data. It indicates that while the model
performs well on the training data, it may experience some
challenges in generalizing to the validation set, especially in certain
epochs.

Interpretation:

Both the training and validation loss decrease over the epochs,
indicating that the model is learning to predict gender more
accurately. However, the validation loss fluctuates more than the
training loss, which may suggest some difficulty in generalizing to
new, unseen data. The gap between the training and validation loss
might indicate a slight overfitting issue, where the model fits the
training data well but struggles to generalize. This can be addressed
by improving regularization or increasing the diversity of the
training dataset.

Accuracy vs Epoch Graph

[Figure: Accuracy vs Epoch plot for the Gender Prediction model, showing Accuracy and Validation Accuracy over epochs 1 to 30.]

The Accuracy vs Epoch graph compares the training and validation


accuracy over the 30 epochs.

Training Accuracy:

The training accuracy shows a significant upward trend throughout


the 30 epochs, starting from 74.00% in epoch 1 and steadily
increasing to 93.02% by epoch 30. This steady increase suggests
that the model is learning effectively from the training data and
improving its gender classification capabilities. The accuracy
reaches above 90% by the later epochs, indicating strong
performance on the training data.
Validation Accuracy:

The validation accuracy shows a similar trend but with more


fluctuation. It starts at 83.65% in epoch 1 and increases to
approximately 89.47% by the 30th epoch. Although the validation
accuracy is slightly lower than the training accuracy, it still reflects
steady improvement, indicating that the model is able to generalize
reasonably well to unseen data.

Interpretation:

Both the training and validation accuracy increase steadily, showing


that the model is successfully learning to classify gender. The gap
between the training and validation accuracy, while present, is
relatively small, suggesting that the model is not overfitting
severely. The fluctuations in the validation accuracy curve may
reflect some challenges in generalizing, but overall, the model
appears to be improving its performance on unseen data. A slight
increase in the validation accuracy towards the later epochs also
shows that the model is adapting better to the validation set over
time.

Emotion Prediction Model

The accuracy scores improved significantly, although with a slight degree of overfitting that remains within acceptable limits. The plot below shows the changes in loss and accuracy scores as the Emotion Prediction CNN model trained over 50 epochs.

Loss vs Epoch Graph

[Figure: Loss vs Epoch plot for the Emotion Prediction model, showing Training Loss and Validation Loss over epochs 1 to 50.]
The Loss vs Epoch graph demonstrates how the model's training
and validation loss evolved over the course of 50 epochs. The x-axis
represents the epoch number (1 to 50), and the y-axis represents
the loss values.

Training Loss:

The training loss exhibits a steady decline as the epochs progress,


beginning at a high value of 1.3038 in the first epoch and reducing
to 0.0884 by the 50th epoch. This consistent reduction indicates
that the model is effectively learning to classify emotions by
minimizing the error on the training dataset. Fluctuations observed
in the initial epochs are typical due to the stochastic nature of
gradient descent. The downward trend overall reflects improved
learning over time.

Validation Loss:

The validation loss also follows a decreasing trend, starting at


1.2193 in the first epoch and reaching 0.1243 by epoch 50.
Although there are intermittent fluctuations in the curve, the
general trend of decreasing loss demonstrates the model's ability to
generalize its learning to unseen data. The fluctuations, especially in
the middle epochs, suggest periodic challenges in adapting to the
validation set.

Interpretation:

The reduction in both training and validation loss over the epochs
indicates that the emotion prediction model is effectively learning
and adapting. The presence of occasional fluctuations in the
validation loss curve suggests potential overfitting in specific
epochs; however, the overall alignment between the training and
validation loss curves indicates that the model is generalizing
reasonably well.

Accuracy vs Epoch Graph

[Figure: Accuracy vs Epoch plot for the Emotion Prediction model, showing Training Accuracy and Validation Accuracy over epochs 1 to 50.]

The Accuracy vs Epoch graph illustrates the progression of training


and validation accuracy over the 50 epochs.

Training Accuracy:

Training accuracy starts at 44.90% in the first epoch and rises


consistently, reaching an impressive 99.59% by epoch 50. This
increase demonstrates that the model effectively learns to classify
emotions over the training data and achieves high accuracy in later
epochs.

Validation Accuracy:

Validation accuracy shows a steady upward trend with some


fluctuations, beginning at 47.56% in the first epoch and climbing to
98.37% by the final epoch. While the fluctuations indicate periodic
challenges in generalization, the overall improvement signifies that
the model adapts well to unseen data and performs consistently.

Interpretation:
Both training and validation accuracy improve significantly over the
epochs, with the gap between them remaining relatively small. This
small gap suggests that the model maintains a balance between
learning and generalization, avoiding severe overfitting. The
validation accuracy's stabilization and high performance towards
the later epochs indicate the model's ability to predict emotions with
a high degree of accuracy on unseen data.

Social Relationship Identification


The social relationship identification system demonstrates a robust
methodology for detecting and categorizing social groups such as
families and couples in real-world scenarios. This section explores
the analysis and insights gained through the implemented
algorithm.

Group Formation and Analysis

The algorithm initially detects faces within an image using pre-


trained models for frontal and side-profile face detection. Each
detected face is assigned attributes such as age, gender, and
proximity to others. Faces are grouped based on their horizontal
distance, leveraging proximity as a primary criterion for establishing
social connections. A distance threshold of 200 pixels between
bounding boxes ensures that closely situated individuals are
considered part of the same group.

Categorization of Groups

The algorithm categorizes groups into distinct social units—couples


and families—based on demographic attributes:

Couples: Groups containing exactly one male and one female are
classified as couples. This criterion ensures precision by combining
gender-based filtering with proximity-based grouping.

Families: Groups containing at least one adult (age categories: 21-27, 28-45, 46-65, 66-116) and one child (age categories: 1-2, 3-9,
10-20) are identified as families. This classification leverages the
age attribute, facilitating an accurate distinction between familial
and non-familial groups.
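
These two rules can be expressed as a short sketch, assuming each face carries a "gender" string and an "age_class" index matching the seven age classes used by the age model; the exact representation in the implemented program may differ.

CHILD_CLASSES = {0, 1, 2}      # 1-2, 3-9 and 10-20 years
ADULT_CLASSES = {3, 4, 5, 6}   # 21-27, 28-45, 46-65 and 66+ years

def classify_group(group):
    genders = sorted(face["gender"] for face in group)
    ages = [face["age_class"] for face in group]
    # Couple rule: exactly one male and one female in the group.
    if len(group) == 2 and genders == ["Female", "Male"]:
        return "Couple"
    # Family rule: at least one adult together with at least one child.
    if any(a in ADULT_CLASSES for a in ages) and any(a in CHILD_CLASSES for a in ages):
        return "Family"
    return None  # no labelled relationship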

Final Social Relationship Classification:

After grouping the faces based on proximity and applying the gender and age-based classification criteria, the program identifies
which groups are couples and which are families. The groups are
then labelled accordingly in the output image.

The program uses the draw_lines_between_centers() function to visually connect the centers of the faces within each group. This
step is important for indicating which faces belong together as a
couple or family. A label ("Couple" or "Family") is added at the
center of the group to further clarify the relationship.

The image is then displayed with bounding boxes around faces, lines
connecting the centers of faces within groups, and labels indicating
whether the group is a "Couple" or a "Family."
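
Assuming OpenCV is used for rendering, such a helper might look roughly as follows; the signature of the project's own draw_lines_between_centers() function is not reproduced here, and the colours are illustrative.

import cv2

def draw_lines_between_centers(image, group, label):
    centers = [(f["x"] + f["w"] // 2, f["y"] + f["h"] // 2) for f in group]
    # Connect consecutive face centers so grouped faces are visually linked.
    for (x1, y1), (x2, y2) in zip(centers, centers[1:]):
        cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 2)
    # Place the "Couple" or "Family" label at the mean center of the group.
    mean_x = sum(c[0] for c in centers) // len(centers)
    mean_y = sum(c[1] for c in centers) // len(centers)
    cv2.putText(image, label, (mean_x, mean_y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)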

Attention Identification

The attention identification component of the system analyses facial orientations to determine if individuals are paying attention within the frame. This is achieved by combining frontal face detection and side-profile face detection. The main outcomes are:

Detection of Frontal Faces:


The algorithm detects frontal faces using Dlib's face detector.
Frontal faces are considered as individuals potentially paying
attention. For each detected frontal face, bounding boxes are drawn,
and relevant facial features are analysed for further classification
into age, gender, and emotion categories.
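
A minimal sketch of this step with Dlib's HOG-based frontal detector is given below; the green box colour follows the description above, while the variable names and parameters are illustrative.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def detect_frontal_faces(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = []
    for rect in detector(gray, 1):  # upsample once to catch smaller faces
        x, y, w, h = rect.left(), rect.top(), rect.width(), rect.height()
        boxes.append((x, y, w, h))
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)  # green box
    return boxes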

Identification of Side Profile Faces:

To identify individuals not paying attention, the system employs a Haar Cascade classifier for side-profile face detection. Both left and
right-side profiles are detected, with the latter being identified by
horizontally flipping the image for symmetry. Adjustments are made
to ensure detected right-side profiles are correctly mapped back to
the original frame.
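
The flipping trick can be sketched as follows, assuming OpenCV's bundled haarcascade_profileface.xml (which detects one profile orientation); boxes found in the mirrored frame are mapped back with x' = width - x - w. The detection parameters are assumed values.

import cv2

profile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_side_profiles(gray):
    height, width = gray.shape[:2]
    left = profile_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    flipped = cv2.flip(gray, 1)  # mirror horizontally to catch the other profile
    right = profile_cascade.detectMultiScale(flipped, scaleFactor=1.1, minNeighbors=5)
    # Map boxes detected in the mirrored image back to original coordinates.
    right_mapped = [(width - x - w, y, w, h) for (x, y, w, h) in right]
    return list(left) + right_mapped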

Unique Side-Profile Filtering:

A key feature of the algorithm is filtering out overlapping faces between frontal and side-profile detections. This ensures that only
unique side profiles, which are not already identified as frontal
faces, are considered as "not paying attention."
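
One way to realize this filtering is an intersection-over-union check between each side-profile box and every frontal box, as sketched below; the 0.3 threshold is an assumed value, not the figure used in the original code.

def overlap_ratio(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def unique_side_profiles(profiles, frontal_boxes, threshold=0.3):
    # Keep only side profiles that do not overlap any frontal detection.
    return [p for p in profiles
            if all(overlap_ratio(p, f) <= threshold for f in frontal_boxes)]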

Visual Representation:

• Frontal Faces: These are marked with green bounding boxes.

• Unique Side Profiles: These are marked with green bounding boxes and labeled "Not Paying Attention" above each face, indicating their inattention status.

Quantitative Results:

The system calculates the number of individuals not paying attention by counting the unique side-profile faces. This provides a
numerical metric for inattention, which can be used in subsequent
analyses or integrated into decision-making processes, such as
advertisement suitability predictions.

Expert System for Advertisement Category Prediction

The expert system implemented in this research provides a dynamic and adaptive method to recommend the most suitable
advertisement categories based on real-time audience attributes.
This section discusses the outcomes generated by the expert
system and its ability to interpret audience characteristics
effectively.

Core Functionality and Rule-Based Decision-Making


The expert system is designed to analyze audience data including
attention levels, demographic attributes (e.g., gender distribution,
number of families, and couples), and other contextual inputs to
prioritize advertisement categories. The system employs a rule-
based approach, making decisions dynamically based on the
collected data.

For instance, if more than 50% of the audience is not paying attention to the displayed content, the system automatically assigns
100% priority to "Attention-Grabbing Content," ensuring that the
advertisement displayed is designed to capture audience focus. This
decision-making rule reflects the system's capability to adapt to
real-time engagement levels.
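
A hedged sketch of this rule is shown below; the dictionary-based weight representation is an assumption about how the rule base might be organized, not the project's exact data structure.

def apply_attention_rule(total_faces, inattentive_faces, weights):
    # If more than half of the audience is not paying attention, give the
    # "Attention-Grabbing Content" category full priority and suppress the rest.
    if total_faces and inattentive_faces / total_faces > 0.5:
        weights = {category: 0.0 for category in weights}
        weights["Attention-Grabbing Content"] = 100.0
    return weights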

Category Weight Adjustment Based on Audience Attributes

When the audience shows sufficient attention, the system evaluates secondary attributes, such as the number of families and couples
present, to fine-tune its recommendations.

• Impact of Families: The presence of families increases the weightage of categories such as "Family Oriented," "Food &
Beverages," "Entertainment," and "Education & Learning."
This outcome is aligned with behavioral trends indicating
families often show interest in educational and recreational
content.

• Impact of Couples: The detection of couples raises the priority of categories like "Luxury & Fashion" and "Travel &
Adventure," reflecting preferences commonly associated with
pairs planning leisure or lifestyle activities. Secondary weight
increases are also assigned to "Food & Beverages" and
"Entertainment," emphasizing shared activities.

Normalization and Distribution of Weights

The system normalizes all calculated category weights to ensure the total equals 100%. This ensures equitable allocation across
categories based on the contextual significance of each audience
attribute. For example, in a scenario with high family presence,
categories such as "Family Oriented" might dominate the
percentage distribution while still allowing for smaller contributions
from complementary categories like "Food & Beverages."
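
The weight adjustment for families and couples, followed by normalization to a 100% total, might be sketched as follows; the increment values are illustrative assumptions rather than the tuned values of the actual rule base.

def adjust_and_normalize(weights, num_families, num_couples):
    if num_families:
        for cat in ("Family Oriented", "Food & Beverages",
                    "Entertainment", "Education & Learning"):
            weights[cat] = weights.get(cat, 0.0) + 10.0 * num_families
    if num_couples:
        for cat in ("Luxury & Fashion", "Travel & Adventure"):
            weights[cat] = weights.get(cat, 0.0) + 10.0 * num_couples
        for cat in ("Food & Beverages", "Entertainment"):  # secondary boost
            weights[cat] = weights.get(cat, 0.0) + 5.0 * num_couples
    total = sum(weights.values())
    if total == 0:
        return weights
    # Normalize so that the category percentages sum to 100%.
    return {cat: round(w / total * 100, 2) for cat, w in weights.items()}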

Robustness in Diverse Scenarios

The expert system has demonstrated versatility across various simulated audience compositions:

• High Attention Deficit: When over half the audience showed
disengagement, the system shifted to prioritizing "Attention-
Grabbing Content," suppressing all other categories. This
ensures alignment with the system's primary goal of
maximizing audience engagement.

• Diverse Audiences: In mixed groups containing families, couples, and individual audience members, the system
allocated balanced weightings across several categories,
creating an inclusive content strategy.

Insights Derived from Outputs

The outputs generated by the expert system offer valuable insights into audience behavior and preferences:

• Audience Behavior Trends: The system's dynamic weighting reveals correlations between demographic features and
content interest areas, enabling better-targeted
advertisements.

• Content Strategy Development: The ability to prioritize "Attention-Grabbing Content" highlights areas where
engagement strategies can be improved in future iterations.

Graphical User Interface (GUI)

The Graphical User Interface (GUI) developed for the advertisement recommendation system serves as an interactive and user-friendly
platform that integrates multiple functionalities, ensuring efficient
visualization and real-time analysis of the results. The design and
implementation of the GUI address two primary objectives:
seamless interaction with the system and effective communication
of outputs.

Design and Functionality


The GUI is organized into three primary frames, each serving
distinct roles:

1. Preview Frame: This frame is split into two sections. The first
displays the original input image, and the second presents the
processed image with annotated details, including detected
faces and their respective attributes (e.g., age, gender,
emotion). This dual visualization allows for a side-by-side
comparison, aiding in the verification of system outputs.

2. Faces Frame: This dynamic frame provides detailed textual information for each detected face. Attributes such as Face
Number, Gender, Age, and Emotion are systematically
displayed in a tabular format. The frame's flexibility ensures
that it accommodates variable numbers of faces detected in
the input image, dynamically adjusting rows and labels as
needed.

3. Output Frame: The rightmost frame displays the advertisement category predictions, sorted by their relevance
(percentage). Each category is paired with its computed
weight, providing users with a clear understanding of the
system’s recommendation priorities. The top-ranked
categories are updated in real time and linked to a
corresponding advertisement display.
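
A minimal tkinter sketch of this three-frame arrangement is given below; widget options and geometry are illustrative and do not reproduce the project's actual GUI code.

import tkinter as tk

root = tk.Tk()
root.title("Advertisement Recommendation System")

preview_frame = tk.Frame(root, bd=2, relief="groove")  # original and processed images
faces_frame = tk.Frame(root, bd=2, relief="groove")    # per-face attribute table
output_frame = tk.Frame(root, bd=2, relief="groove")   # ranked ad categories

preview_frame.grid(row=0, column=0, sticky="nsew")
faces_frame.grid(row=0, column=1, sticky="nsew")
output_frame.grid(row=0, column=2, sticky="nsew")

# Control buttons at the bottom of the preview frame (see User Interaction below).
tk.Button(preview_frame, text="Load Image").pack(side="bottom")
tk.Button(preview_frame, text="Analyze and Show Ad").pack(side="bottom")

root.mainloop()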

User Interaction

Two buttons at the bottom of the preview frame streamline the user experience:

• Load Image: Allows users to upload an image file for
analysis.

• Analyze and Show Ad: Triggers the processing pipeline, which includes face detection, attribute extraction, and
advertisement recommendation.

Ad Display Mechanism

The advertisement display integrates the recommendation system's results by dynamically cycling through the top five categories.
Advertisements from the most relevant category are shown
prominently, with updates occurring every 5 seconds. This feature
not only ensures that the highest-priority content is emphasized but
also provides diversity by showcasing ads from multiple categories
in succession.
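
Assuming the GUI is built with tkinter, the 5-second rotation might be scheduled with the widget's after() method, as in the sketch below; get_ad_image_for() is a hypothetical helper that returns a PhotoImage for a given category.

AD_INTERVAL_MS = 5000  # rotate to the next category every 5 seconds

def cycle_ads(root, ad_label, top_categories, index=0):
    category = top_categories[index % len(top_categories)]  # cycle the top five
    photo = get_ad_image_for(category)  # hypothetical loader returning a PhotoImage
    ad_label.config(image=photo, text=category, compound="top")
    ad_label.image = photo              # keep a reference so tkinter retains it
    # Schedule the next rotation.
    root.after(AD_INTERVAL_MS, cycle_ads, root, ad_label, top_categories, index + 1)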

Conclusions

This research aimed to create an advanced, AI-driven advertisement recommendation system that integrates face detection,
demographic analysis, emotion recognition, social relationship
identification, and an expert system, all presented through an
interactive Graphical User Interface (GUI). The successful realization
of these goals underscores the study’s contributions to personalized
advertising technologies and computer vision applications. By
leveraging advanced machine learning techniques and intuitive
system design, this project demonstrated how artificial intelligence
can enhance the targeting and personalization of advertisements in
diverse settings.

The face detection module served as the foundation of the system, employing MTCNN to accurately identify both frontal and side-profile
faces in a variety of image settings. Its robustness ensured a high
degree of reliability in identifying individuals within group scenarios.
On top of this, the demographic classification models effectively
analyzed detected faces to determine their age, gender, and
emotional states. These models, trained on comprehensive datasets
and optimized for grayscale image inputs, exhibited strong
performance metrics, enabling the system to categorize audiences
into meaningful demographic segments. This information was
further enriched through a unique feature that identified social
relationships, grouping individuals into families and couples based
on spatial proximity and contextual cues. This multi-layered analysis
provided a deeper understanding of audience dynamics, which plays
a crucial role in targeted advertising strategies.

The attention detection feature added a sophisticated layer to the
system by estimating head poses to determine whether individuals
were engaged with the content being displayed. This real-time
assessment of audience focus enhanced the system’s contextual
understanding, ensuring that recommendations were not only
personalized but also relevant to actively attentive viewers.

Central to the system was an expert recommendation engine that combined these demographic, emotional, and relational attributes
into actionable insights for advertisement selection. The rule-based
expert system employed a structured framework to compute
relevance scores across ten advertisement categories, prioritizing
those that aligned most closely with the detected audience
characteristics. By incorporating factors such as age, gender,
emotion, and the presence of families or couples, the system offered
highly tailored advertisement suggestions that reflected the
nuanced preferences of diverse audiences. This integration
demonstrated the practical potential of AI-driven recommendations
to improve engagement and maximize the effectiveness of
advertising campaigns.

The GUI provided a dynamic, user-friendly interface to visualize the system’s outputs and facilitate seamless interaction. Organized into
three main frames, it allowed users to view the original and
processed images side by side, inspect detailed demographic and
emotional data in tabular form, and explore advertisement
recommendations ranked by relevance. The inclusion of interactive
buttons for loading images and triggering analysis streamlined the
user experience, while the advertisement display mechanism
dynamically cycled through top categories, ensuring the most
relevant content was prominently featured. This intuitive design
bridged the gap between complex AI processes and practical user
interaction, making the system accessible and efficient.

Key findings of the study highlight the system’s ability to reliably detect and analyze audience attributes, achieving high accuracy in
demographic and emotional classification. The expert system’s
advertisement recommendations were both relevant and adaptive,
showcasing the practical applications of this technology in real-
world settings such as retail environments, public displays, and
online platforms. The integration of advanced analysis techniques
with interactive design marks a significant step forward in the field
of personalized advertising, demonstrating the feasibility and
impact of combining AI and human-centered interfaces.

The significance of this research lies in its contributions to multiple
domains. In computer vision, the study presented an innovative
approach to demographic and emotion-based audience
segmentation. In advertising technology, it demonstrated a practical
solution for enhancing engagement through personalized content
delivery. The novelty of combining demographic analysis, social
relationship identification, and an expert recommendation system
within a cohesive GUI underscores the study’s role in advancing
interactive and intelligent systems for targeted advertising.

Despite its achievements, the research faced several limitations. Dataset constraints limited the diversity of demographic
representation, particularly in capturing cultural and contextual
variations in age, gender, and emotional expression. Real-time
processing presented challenges in terms of computational
efficiency, highlighting the need for further optimization to ensure
scalability and applicability in resource-limited environments.
Additionally, the system’s reliance on static images restricted its
potential for dynamic, video-based applications, suggesting a future
direction for real-time video stream integration. These limitations,
while noteworthy, do not diminish the system’s demonstrated
effectiveness and its potential for further development.

In conclusion, this research successfully combined cutting-edge AI techniques with practical system design to create a robust,
interactive advertisement recommendation platform. The system’s
ability to detect, analyze, and respond to complex audience
attributes demonstrates its potential to transform the landscape of
personalized advertising. By addressing its current limitations and
exploring future enhancements, such as integrating live video
processing and expanding dataset diversity, the system can achieve
even greater impact, offering innovative solutions to the evolving
demands of advertising and audience engagement.

Future Works

The success of this research demonstrates the feasibility of integrating face detection, demographic analysis, attention
identification, and advertisement recommendations into a cohesive
system. While the current implementation showcases significant
advancements, there is immense potential to enhance its
functionality, scalability, and adaptability. Future developments
could focus on overcoming current limitations, expanding features,
and introducing new deployment methods to maximize the system’s
real-world applicability.

System Optimization

Optimizing computational efficiency and accuracy remains a critical area for improvement. Leveraging lightweight neural network
architectures, such as MobileNet, EfficientNet, or TensorFlow Lite
models, can enable real-time processing on resource-constrained
devices like smartphones or embedded systems. Techniques such as
model quantization, pruning, and knowledge distillation could be
implemented to reduce computational requirements while
maintaining high accuracy. These optimizations would allow the
system to perform seamlessly in diverse environments, including
mobile and web-based platforms.

Web Application Development

To enhance usability and accessibility, future work will involve developing a web application to handle the system. A web-based
interface would allow users to manage the system remotely, upload
datasets, configure parameters, and analyze outputs in real time.
The application could integrate the system's core functionalities—
such as face detection, demographic analysis, attention
identification, and expert recommendations—within an intuitive and
interactive dashboard. Features like data visualization, live feed
processing through connected cameras, and report generation could
further streamline the user experience. Implementing cloud-based
processing within the web application would ensure scalability,
enabling the system to handle large datasets and simultaneous
inputs from multiple users.

The web application would also provide an ideal platform for incorporating user-friendly tools, such as drag-and-drop interfaces
for content updates or region-based configurations for targeted
advertisements.

Expanded Functionality

Enhancing the system’s capabilities to provide a richer user experience is another promising direction. Integrating voice-based
commands would facilitate hands-free interaction, making the
system suitable for public environments and accessible to
individuals with disabilities. Multilingual support could broaden its
usability across different linguistic and cultural groups.
Personalization features for recurring users, such as adaptive
algorithms that refine recommendations based on previous
interactions, would add a layer of sophistication. Expanding the
system's ability to detect complex social dynamics, such as
identifying friendships, professional relationships, or group
structures, could deepen its understanding of audience behavior.

Integration with Real-World Applications

The system’s versatility makes it suitable for deployment across multiple domains. In retail environments, it could dynamically adjust
in-store advertisements based on real-time demographic and
emotional data. Public spaces like airports, malls, and transportation
hubs could use the system to display targeted messages to large,
diverse audiences. The integration of IoT devices and edge
computing could further enable real-time applications, such as
interactive advertising boards that adjust content instantaneously.
By connecting to smart cameras and sensors, the system could
become a fully automated solution for live audience analysis and
dynamic content delivery.

Advanced User Behavior Analysis

Future developments could explore more advanced methods of analyzing user behavior. Features like gaze detection, biometric data
analysis, and motion tracking would provide deeper insights into
audience engagement and attention spans. Tracking long-term
engagement patterns, such as repeated interactions with certain
types of advertisements, could refine the recommendation engine to
deliver more effective and contextually relevant content. These
enhancements would further solidify the system’s role as a precision
tool for targeted advertising.

Broader Data Sources and Real-Time Camera Integration

A key limitation of the current system is its reliance on datasets that lack sufficient representation of local demographics, particularly in
countries like Sri Lanka. Future efforts should prioritize creating or
incorporating datasets that reflect diverse facial features,
expressions, and cultural nuances. This improvement would
significantly enhance the system’s generalizability and accuracy.

Additionally, integrating real-time camera feeds into the system would transform it into a dynamic, live-processing tool. While this
feature was excluded in the current project due to dataset
constraints, its inclusion would allow the system to process real-
world inputs instantly and offer on-the-spot recommendations. Using
regionally diverse training data would be essential to ensure
accuracy and reliability in real-time applications.

Scalability and Deployment


Scaling the system for large-scale deployment is a critical
consideration. Future work could focus on server-side optimization,
enabling the system to handle simultaneous inputs from multiple
devices and users. Cloud-based solutions could facilitate storage
and processing for larger datasets, while edge computing would
allow real-time analysis in low-latency environments. Deploying the
system on mobile platforms and web applications would expand its
accessibility, making it usable across various devices and settings.

Exploration of Ethical Implications

As AI-powered systems become more widespread, addressing ethical implications is crucial. Data privacy must remain a priority,
with measures such as anonymization, secure data storage, and
clear user consent protocols implemented at every stage. Future
developments should focus on building transparent mechanisms
that allow users to control their data usage and understand how
their information is processed. Additionally, efforts to reduce biases
in the system—whether in demographic representation or
algorithmic decision-making—are essential to ensure fairness and
inclusivity.
