AMED Project 2
Monsoon 2024
SUBMITTED BY
SALINI C P (M240869EC)
SREEDARSHANA C V (M241161EC)
OBJECTIVE
To design and implement a facial emotion detection system using a convolutional neural network
(CNN) trained on a dataset of labeled facial images, enabling accurate classification of human
emotions into predefined categories such as angry, disgust, fear, happy, sad, surprise, and neutral.
The system involves preprocessing images, detecting faces using Haar Cascade, and leveraging a
trained deep learning model to predict emotions from images, with the potential for real-time
applications.
THEORY
Facial emotion recognition (FER) is the process of detecting and analyzing human emotions
from facial expressions using computer vision and machine learning techniques. This technology
relies on the identification of specific patterns and features in facial images, such as eye
movement, mouth curvature, and other expressions that correspond to emotional states like
happiness, sadness, anger, or surprise.
The significance of FER lies in its ability to enable machines to understand and interpret human
emotions, thereby bridging the gap between human behavior and artificial intelligence. It
facilitates more intuitive human-computer interactions and helps in understanding psychological
and social cues. FER is particularly valuable in applications that require empathy,
communication, or behavioral analysis.
CNNs are a class of deep learning models specifically designed to process and analyze grid-like
data such as images. They have revolutionized fields like computer vision by automating the
process of feature extraction and enabling accurate classification, detection, and recognition
tasks.
CNNs work by applying a series of operations, such as convolutions and pooling, that learn to
extract and prioritize important features from input data. These features are then passed to fully
connected layers for decision-making, such as predicting an image's category or class.
The architecture of a CNN is inspired by the structure of the visual cortex, which processes
visual information hierarchically. A typical CNN architecture includes several layers like
convolutional, pooling, flattening, fully connected (dense), and dropout layers.
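As an illustration of this hierarchy, the short Keras sketch below (with an assumed filter count and kernel size, not the project's full model) shows how one convolution and one pooling step transform a 48x48 grayscale input; model.summary() prints the resulting output shapes.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D

demo = Sequential([
    Input(shape=(48, 48, 1)),               # one 48x48 grayscale image
    Conv2D(32, (3, 3), activation='relu'),  # learns 32 feature maps -> (46, 46, 32)
    MaxPooling2D((2, 2)),                   # halves the spatial size -> (23, 23, 32)
])
demo.summary()  # prints the layer-by-layer output shapes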
In this project, the CNN architecture was defined entirely from scratch without using any
pre-trained models like ResNet or VGGNet, in compliance with the guidelines. The architecture
was designed to effectively handle the task of facial emotion recognition while maintaining
computational efficiency.
1. Number of Layers:
○ The model consists of 3 convolutional layers, 3 max pooling layers, 4 dropout
layers, 1 flattening layer, and 2 fully connected (dense) layers.
○ The sequence of these layers was carefully selected to ensure proper feature
extraction, dimensionality reduction, and robust classification.
2. Neurons per Layer:
○ The number of filters in the convolutional layers increases progressively:
32, 64, 128. This gradual increase allows the network to capture
increasingly complex features at each stage.
○ The first dense layer has 1024 neurons, enabling the network to learn complex
relationships between features, while the final dense layer has 7 neurons,
corresponding to the 7 emotion classes.
3. Activation Functions:
○ ReLU (Rectified Linear Unit) is a popular activation function in deep learning,
used here in the convolutional and hidden dense layers. It is defined as
f(x) = max(0, x)
and introduces non-linearity while remaining computationally cheap.
○ Softmax is used in the output layer and is defined as
softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)
where xᵢ represents a specific logit. This allows the model to interpret outputs as
probabilities, making it easy to determine the most likely class. Softmax is
particularly effective when the task requires mutually exclusive class predictions;
a worked numeric example follows this list.
4. Optimizer:
○ The Adam (Adaptive Moment Estimation) optimizer was chosen for its
adaptive learning rate, which helps accelerate convergence. Adam combines the
advantages of two earlier optimizers, AdaGrad and RMSProp: it maintains both
the first moment (mean) and the second moment (uncentered variance) of the
gradients and uses them to compute an adaptive learning rate for each parameter
individually. Parameters with large gradients receive smaller updates and
parameters with small gradients receive larger updates, which improves
convergence speed and reduces the need for manual tuning of the learning rate.
Adam also applies bias-correction terms to keep the moment estimates accurate
during the initial stages of training. With its robust performance and low memory
requirements, Adam is widely used for training deep neural networks.
○ Learning rate: 0.0001.
5. Loss Function:
○ In deep learning, a loss function (also known as the cost function) is crucial for
training models because it measures how well the model's predictions match the
true values (targets). During training, the model learns by adjusting its weights to
minimize this loss function. The process of minimizing the loss function is done
using an optimization algorithm (like Adam or SGD), which updates the model's
parameters based on the gradients of the loss with respect to the model's weights.
○ In the context of classification tasks, where the goal is to predict a class or
category (like recognizing emotions from facial expressions), a suitable loss
function is necessary to quantify how far off the model’s predicted class
probabilities are from the true class labels.
○ Categorical Cross-Entropy is a loss function specifically designed for
multi-class classification problems where the goal is to classify an input into one
of several possible categories. It is commonly used when the model's output is a
set of probabilities for each class (as in a softmax output layer).
It is defined as
L = −Σᵢ yᵢ log(pᵢ)
where yᵢ is the true label for class i (typically a one-hot encoded vector for
multi-class classification, where the correct class is 1 and all other classes are 0)
and pᵢ is the predicted probability for class i (the output of the softmax layer of
the model).
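To make the softmax and cross-entropy formulas above concrete, here is a small NumPy sketch that evaluates both for a single 7-class prediction; the logit values are made up purely for illustration.

import numpy as np

logits = np.array([2.0, 0.1, 0.1, 3.0, 0.5, 0.2, 0.1])  # one made-up value per emotion class

# softmax(x_i) = exp(x_i) / sum_j exp(x_j)
p = np.exp(logits) / np.sum(np.exp(logits))

# one-hot true label: suppose the correct class is index 3
y = np.zeros(7)
y[3] = 1.0

# categorical cross-entropy: L = -sum_i y_i * log(p_i)
loss = -np.sum(y * np.log(p))
print(p.round(3), loss)  # loss is small when p[3] is close to 1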
The Haar Cascade Classifier is a popular and efficient method for object detection, particularly
for tasks like face detection. It is based on the concept of using Haar-like features to identify
specific regions of interest in an image. This technique is widely used in computer vision due to
its balance of accuracy and computational efficiency.
Haar Cascade is an object detection framework proposed by Paul Viola and Michael Jones in
their landmark 2001 paper titled “Rapid Object Detection using a Boosted Cascade of Simple
Features”. It is especially effective for detecting objects with consistent and distinguishable
features, such as faces.
This framework relies on machine learning to train a classifier using positive images (containing
the object of interest, such as faces) and negative images (not containing the object). Once
trained, the classifier can be used to detect objects in real-time from new images or video frames.
It works by applying Haar-like features, which compare the intensity of adjacent rectangular
regions in an image, to detect specific patterns such as edges or lines that characterize facial
structures.
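As a minimal usage sketch (the input filename is a placeholder, and the scaleFactor/minNeighbors values are typical defaults rather than project-specific settings), face detection with OpenCV's bundled frontal-face cascade looks like this:

import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('input.jpg')                  # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # the cascade operates on grayscale

# Returns a list of (x, y, w, h) bounding boxes for detected faces
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)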
The dataset used for training and validation in this project plays a crucial role in building an
effective facial emotion recognition model. We use the FER-2013 (Facial Expression
Recognition 2013) dataset from Kaggle, which contains facial images labeled with seven
emotions: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral. This dataset is essential for
teaching the model to generalize across different facial expressions.
To prepare the dataset for training, several preprocessing techniques were applied to ensure
consistency and optimize computational efficiency. Rescaling of pixel values (normalization)
was performed by dividing pixel intensities by 255, transforming the values to the range between
0 and 1. This step helps the neural network converge faster and improves numerical stability
during training. Grayscale conversion was applied to the images to reduce computational
complexity and focus on intensity-based features, as color information is not critical for
detecting emotions. Finally, the images were resized to a fixed dimension of 48x48 pixels,
ensuring a uniform input size to the CNN while retaining sufficient detail for emotion
recognition. These preprocessing steps streamline the training process and enhance the model's
ability to learn from the data effectively.
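A minimal sketch of these three preprocessing steps applied to a single image (the filename is a placeholder) could look as follows:

import cv2
import numpy as np

img = cv2.imread('face.jpg')                       # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # grayscale conversion
resized = cv2.resize(gray, (48, 48))               # fixed 48x48 input size
normalized = resized.astype('float32') / 255.0     # rescale pixels to [0, 1]

# add the batch and channel axes the CNN expects: shape (1, 48, 48, 1)
x = np.expand_dims(normalized, axis=(0, -1))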
ALGORITHM DESCRIPTION
● Import libraries such as numpy, cv2, keras, and matplotlib for handling image processing,
model training, and visualization.
1. Dataset Extraction:
○ Upload a dataset ZIP file and extract its contents to obtain the training (train_dir)
and validation (val_dir) datasets.
2. Data Preprocessing:
○ Rescaling: Normalize pixel values to the range [0, 1] by dividing by 255 using
ImageDataGenerator.
○ Grayscale Conversion: Convert images to grayscale to reduce computational
complexity.
○ Image Resizing: Resize images to 48x48 pixels to standardize input size for the
model.
3. Data Augmentation:
○ Use ImageDataGenerator to augment the dataset by applying transformations like
zoom, shift, and flips, improving model generalization.
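A sketch of such an augmentation pipeline, assuming the directory layout from step 1 and illustrative augmentation ranges, is shown below:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation ranges here are illustrative assumptions
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalization step
    zoom_range=0.2,           # random zoom
    width_shift_range=0.1,    # random horizontal shift
    height_shift_range=0.1,   # random vertical shift
    horizontal_flip=True)     # random horizontal flips

train_generator = train_datagen.flow_from_directory(
    'train_dir',              # extracted training directory
    target_size=(48, 48),
    color_mode='grayscale',
    batch_size=64,
    class_mode='categorical')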
Step 3: CNN Model Definition
1. Architecture Design:
○ Define a custom Convolutional Neural Network (CNN) model.
○ Include three convolutional blocks with:
■ Convolutional Layers: Extract spatial features using filters (e.g., 32, 64,
128 filters).
■ Batch Normalization: Normalize feature maps to stabilize training.
■ MaxPooling: Reduce dimensionality and retain key features.
■ Dropout: Prevent overfitting by randomly deactivating neurons during
training.
○ Add fully connected layers:
■ Flatten: Transform the 2D feature map into a 1D array.
■ Dense Layers: Perform high-level reasoning and classification.
■ Use Dropout to further reduce overfitting.
○ The final layer uses the softmax activation function to output probabilities for 7
emotion classes.
2. Compile the Model:
○ Loss Function: categorical_crossentropy for multi-class classification.
○ Optimizer: Adam with a learning rate of 0.0001.
○ Metrics: Accuracy to evaluate the model's performance.
○ Training Parameters:
■ Batch size: 64
■ Number of epochs: 10
■ Steps per epoch: Calculated based on dataset size.
○ Use augmented training data and validation data to fit the model.
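Putting the above together, here is a minimal Keras sketch of the described architecture. The layer counts (3 convolutional, 3 max pooling, 4 dropout, 1 flatten, 2 dense), filter progression (32/64/128), batch normalization, dense widths, optimizer, learning rate, and loss follow the report; kernel sizes and dropout rates are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPooling2D, Dropout, Flatten, Dense)
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Input(shape=(48, 48, 1)),                   # 48x48 grayscale input

    # Block 1: 32 filters
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                              # dropout rate is an assumption

    # Block 2: 64 filters
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.25),

    # Block 3: 128 filters
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.25),

    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(7, activation='softmax'),             # one probability per emotion class
])

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy'])

Training would then call model.fit with the augmented training generator and the validation generator for 10 epochs, using the batch size of 64 specified above.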
Face Detection and Capture:
○ Detect faces using cv2.CascadeClassifier and crop the detected face region.
○ Save the cropped face as capture.jpg.
Step 8: Visualization
● Display the predicted emotion scores as a bar graph with labels for each emotion.
● Print the emotion probabilities for better interpretability.
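A sketch of this visualization step, assuming model is the trained CNN from the earlier sketch, capture.jpg is the cropped face saved previously, and the class labels follow flow_from_directory's default alphabetical ordering:

import cv2
import matplotlib.pyplot as plt

# Alphabetical label order matches flow_from_directory's default class
# ordering (an assumption about the training setup)
labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']

face = cv2.imread('capture.jpg', cv2.IMREAD_GRAYSCALE)
face = cv2.resize(face, (48, 48)).astype('float32') / 255.0

# `model` is assumed to be the trained CNN defined earlier
scores = model.predict(face.reshape(1, 48, 48, 1))[0]  # 7 class probabilities

plt.bar(labels, scores)                 # bar graph of predicted emotion scores
plt.ylabel('Predicted probability')
plt.show()

for label, score in zip(labels, scores):
    print(f'{label}: {score:.3f}')      # print probabilities for interpretability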
CONCLUSION
The Facial Emotion Detection System successfully demonstrates the use of Convolutional
Neural Networks (CNNs) combined with Haar Cascade for real-time emotion recognition. By
employing a well-designed custom CNN architecture, the model effectively classifies facial
expressions into seven distinct emotions with high accuracy. Preprocessing techniques such as
grayscale conversion, normalization, and resizing ensure computational efficiency, while data
augmentation enhances the model's generalization capabilities.
The project highlights the importance of deep learning and computer vision in solving real-world
problems, such as emotion analysis, which has applications in diverse fields like healthcare,
psychology, and human-computer interaction. The use of Haar Cascade for face detection
ensures accurate input data for the CNN, while the visualization of predictions makes the system
user-friendly.
Overall, this project serves as a foundation for building more advanced emotion recognition
systems by incorporating additional features such as real-time video analysis, multi-face
detection, and integration with IoT devices for real-world applications. It showcases the potential
of machine learning in improving human-computer interaction and understanding human
emotions.
REFERENCES
● https://www.kaggle.com/datasets/msambare/fer2013/data
● https://github.com/komalck/FACIAL-EMOTION-RECOGNITION/blob/master/Facial_emotion_recognition.ipynb
● https://blog.clairvoyantsoft.com/emotion-recognition-with-deep-learning-on-google-colab-24ceb015e5
● P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
● S. Alizadeh and A. Fazel, "Convolutional Neural Networks for Facial Expression Recognition," Stanford University.