
COMPUTER VISION FA2

Name: Sejal Anil Rahane
PRN: 122B2B298
Div: D
Problem Statement: Emotion Recognition System Using Computer Vision

REPORT

1. Introduction
Emotion recognition is the process of identifying and classifying human emotions based on facial
expressions, body language, voice intonations, or physiological signals. Human emotions play a
critical role in social interactions and decision-making processes. Understanding these emotions can
provide valuable insights for enhancing user experiences, improving communication, and facilitating
intelligent systems. As such, emotion recognition has attracted considerable attention across diverse
fields including psychology, healthcare, marketing, and human-computer interaction.
In recent years, emotion recognition has advanced rapidly, thanks to the development of powerful
computational techniques such as Computer Vision and Deep Learning. Traditional methods of
emotion detection required manual feature extraction and were often prone to inaccuracies due to the
complexity and variability of human emotions. However, with the advent of Deep Learning, especially
Convolutional Neural Networks (CNNs), the task of emotion recognition has become more efficient
and accurate. CNNs can automatically learn and extract meaningful features from facial images, which
reduces the need for hand-crafted features.
Computer Vision, in combination with Deep Learning models, has empowered machines to analyze
visual data in real-time, making them capable of recognizing human emotions from a variety of inputs,
including static images and live video feeds. The application of CNNs has proven particularly effective
in tasks like image classification, object detection, and face recognition, and this technology is now
being leveraged for emotion classification as well.
This project explores the development of an Emotion Recognition System using cutting-edge Deep
Learning methodologies. The system focuses on detecting the seven emotion classes angry, disgust,
fear, happy, sad, surprise, and neutral by analyzing facial expressions in images. In addition to using
CNNs, the project integrates Transfer Learning with pre-trained models like VGG16 to improve
performance by utilizing learned features from vast datasets. Furthermore, the project applies
Generative Adversarial Networks (GANs) to generate synthetic images, augmenting the dataset and
enhancing the model's ability to generalize to diverse emotional states.
The ability to accurately detect and interpret human emotions holds significant promise in various
applications:
• Psychology and Mental Health: Emotion recognition systems can assist in diagnosing mental health
conditions such as depression or anxiety, where monitoring emotional patterns can offer critical
insights.
• Human-Computer Interaction (HCI): Emotion recognition enables the development of systems that
can respond empathetically to user emotions, improving user experience in applications like virtual
assistants and gaming.
• Surveillance and Security: Recognizing emotions in public spaces can help in identifying distress or
aggression, thus aiding in maintaining public safety.
• Marketing and Customer Service: Companies can utilize emotion recognition to analyze customer
reactions to products or services, improving personalized marketing strategies and customer service
responses.
The aim of this project is to develop a robust system that can identify emotions accurately and
efficiently in both controlled and real-world settings, thereby contributing to advancements in emotion-
aware technologies.

2. Objective
The primary objectives of this project are:
• To create a system that can identify emotions using Computer Vision techniques and Deep Learning
models.
• To implement and evaluate a Convolutional Neural Network (CNN) for emotion classification.
• To enhance the performance using Transfer Learning with pre-trained models like VGG16.
• To employ Generative Adversarial Networks (GANs) for generating synthetic images to further
improve training datasets.
• To analyze and compare the results in terms of accuracy and generalization on unseen data.

3. Dataset
In this project, the FER-2013 (Facial Expression Recognition 2013) dataset was utilized, which consists of 35,887
grayscale images, each measuring 48x48 pixels. These images capture human faces, and each is labeled
with one of seven distinct emotions: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
The dataset is organized into three main sets:
• Training Set: This is the largest portion of the dataset and is used to train the model.
• Validation Set: This set is used to fine-tune model parameters and evaluate intermediate performance
during training.
• Test Set: This set contains unseen images and is employed to assess the final performance and
generalization of the model.
The images were preprocessed to ensure uniform size and format, including resizing to maintain a
consistent input dimension and normalizing pixel values for optimal model training. This preprocessing
step is crucial to improve the efficiency and accuracy of the deep learning model by standardizing input
data, which in turn reduces computational complexity and enhances model convergence.
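As an illustration, the loading and normalization step might look like the following Keras sketch, assuming the FER-2013 images are stored in one folder per emotion class; the directory path is a hypothetical placeholder, not a detail from this project.

```python
# Minimal sketch: loading FER-2013 images from class-labelled folders.
# Assumes a layout like data/train/<emotion>/*.png, one common way the
# dataset is distributed; adjust the path to the actual setup.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1] for stable training.
datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = datagen.flow_from_directory(
    "data/train",              # hypothetical path
    target_size=(48, 48),      # enforce the uniform 48x48 input size
    color_mode="grayscale",    # FER-2013 images are single-channel
    class_mode="categorical",  # one-hot labels for the seven emotions
    batch_size=64,
)
```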

4. Methodology
Before training the model, images need to be processed to extract important features and reduce noise.
Key steps include:
• Grayscale Conversion: Images are converted from RGB to grayscale to simplify computations while
retaining critical facial features.
• Gaussian Filtering: Applied to smooth the image and reduce noise.
• Edge Detection: Canny edge detection is applied to detect and emphasize edges in the image, such as
the contours of facial expressions.
• Corner Detection: The Harris corner detection algorithm is used to identify distinct corners in the
image that may correspond to specific facial features (e.g., the corners of the mouth, eyes).
Visualization of Preprocessing Steps:
For a better understanding, the results of each preprocessing technique can be visualized as follows:
• Original Image: Unprocessed input image.
• Grayscale Image: Grayscale representation.
• Gaussian Blur: Smoothed image using Gaussian filtering.
• Edge Detection: Result of applying Canny edge detection.
• Corner Detection: Points marked where corners were detected.
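The preprocessing steps above correspond to standard OpenCV operations. The sketch below assumes OpenCV (cv2) as the implementation library; the input file name, kernel size, and Canny thresholds are illustrative choices rather than values specified in this report.

```python
import cv2
import numpy as np

# Hypothetical input path; any face image works for illustration.
img = cv2.imread("face.png")

# 1. Grayscale conversion: simplifies computation while retaining
#    the facial structure needed for expression analysis.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Gaussian filtering: a 5x5 kernel smooths the image and reduces noise.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# 3. Canny edge detection: emphasizes contours of facial features.
#    The thresholds (100, 200) are illustrative.
edges = cv2.Canny(blurred, 100, 200)

# 4. Harris corner detection: responds at distinct corners such as
#    the corners of the mouth and eyes. Input must be float32.
corners = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)

# Mark strong corner responses in red on a copy of the original image.
marked = img.copy()
marked[corners > 0.01 * corners.max()] = [0, 0, 255]
```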

5. Convolutional Neural Network (CNN)


A CNN was developed to automatically extract features from the image and classify emotions. CNNs are
highly effective in image-based tasks as they learn spatial hierarchies of patterns.
Network Architecture:
1. Input Layer: Takes images of size 48x48 pixels.
2. Convolution Layers:
o Conv1: 32 filters of size (3x3), followed by a ReLU activation.
o Conv2: 64 filters of size (3x3), followed by a ReLU activation.
o Conv3: 128 filters of size (3x3), followed by a ReLU activation.
3. Pooling Layers: Max-pooling is applied after each convolution layer to downsample the feature maps and
reduce computational complexity.
4. Fully Connected Layer (FC): After flattening the output from the convolutional layers, a fully
connected layer with 128 neurons is used.
5. Output Layer: A softmax layer is used for classifying emotions into one of the seven categories.
Training Details:
• Optimizer: Adam optimizer with a learning rate of 0.001.
• Loss Function: Categorical cross-entropy.
• Batch Size: 64.
• Epochs: 50 epochs of training were performed.
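The described architecture and training configuration can be reconstructed as a short Keras model definition. This is a sketch based on the description above, so unspecified details (such as the 2x2 pooling window) are assumptions.

```python
from tensorflow.keras import layers, models

# Three conv blocks (32, 64, 128 filters of 3x3 with ReLU), each
# followed by max-pooling, then a 128-neuron dense layer and a
# 7-way softmax, as described in the architecture above.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),            # 48x48 grayscale input
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),                 # pooling size assumed
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),       # seven emotion classes
])

# Training configuration from the report: Adam at 0.001 (the Keras
# default), categorical cross-entropy, batch size 64, 50 epochs.
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_gen, validation_data=val_gen, epochs=50)
```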

6. Transfer Learning using VGG16


To improve the performance of the system, Transfer Learning was employed. A pre-trained model, VGG16
(trained on the large ImageNet dataset), was used to leverage pre-learned features. The last few layers of
VGG16 were replaced with custom layers suited to the emotion classification task.
Modifications to VGG16:
• The convolutional base of VGG16 was kept intact, but the top layers were replaced.
• A fully connected layer with 128 neurons was added, followed by a softmax layer for emotion
classification.
Training and Fine-Tuning:
The VGG16-based model was fine-tuned on the emotion dataset using the Adam optimizer and a lower
learning rate.
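A possible Keras realization of this setup is sketched below, assuming the convolutional base is frozen as described; since the report does not state the exact fine-tuning learning rate, the value used here is illustrative. Note that VGG16 expects three-channel input, so the 48x48 grayscale images would need to be converted to RGB (for example, by stacking the single channel) before training.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# Load the ImageNet-pretrained convolutional base without its top
# classifier layers; 48x48 is above VGG16's 32x32 minimum input size.
base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = False  # keep the pre-learned features intact

# Custom top layers suited to the emotion classification task.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # added fully connected layer
    layers.Dense(7, activation="softmax"),  # seven emotion classes
])

# Fine-tuning uses a lower learning rate than training from scratch;
# 1e-4 is an illustrative choice, not a value from the report.
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```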

7. Generative Adversarial Networks (GANs)


Generative Adversarial Networks (GANs) were used to generate synthetic emotion images that can
augment the existing dataset. GANs consist of two networks:
1. Generator: Creates synthetic images from random noise.
2. Discriminator: Tries to distinguish between real and synthetic images.
The two networks are trained in competition: the generator attempts to create increasingly realistic
emotion images, while the discriminator improves its ability to classify images as real or fake.
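Since the report does not specify the GAN architecture, the following is a minimal sketch of the two networks in Keras; the latent dimension and layer sizes are assumptions chosen to produce 48x48 grayscale outputs matching the dataset.

```python
from tensorflow.keras import layers, models

LATENT_DIM = 100  # size of the random noise vector fed to the generator

# Generator: maps noise to a 48x48 grayscale image. The tanh output
# assumes real images are rescaled to [-1, 1] during training.
generator = models.Sequential([
    layers.Input(shape=(LATENT_DIM,)),
    layers.Dense(12 * 12 * 128, activation="relu"),
    layers.Reshape((12, 12, 128)),
    layers.Conv2DTranspose(64, (4, 4), strides=2, padding="same",
                           activation="relu"),   # 12x12 -> 24x24
    layers.Conv2DTranspose(1, (4, 4), strides=2, padding="same",
                           activation="tanh"),   # 24x24 -> 48x48
])

# Discriminator: classifies a 48x48 image as real (1) or synthetic (0).
discriminator = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(64, (4, 4), strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Conv2D(128, (4, 4), strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
```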

8. Results
Here are the results of the Emotion Recognition System project, detailing the accuracy achieved by
different approaches:
1. Convolutional Neural Network (CNN): The CNN model demonstrated the highest accuracy at 81%.
This suggests that the CNN architecture effectively captured the features in the FER-2013 dataset,
making it the most reliable method among the tested approaches for emotion recognition.
2. Transfer Learning: The transfer learning approach yielded an accuracy of 42%. This lower
performance indicates that the pre-trained model may not have generalized well to the specific features
of the FER-2013 dataset, suggesting potential limitations in adapting knowledge from the original model
to this particular task.
3. Generative Adversarial Network (GAN): The GAN approach achieved an accuracy of 52%. While this
result is better than transfer learning, it reflects the challenges inherent in training GANs for emotion
recognition, where the model may struggle with effectively distinguishing between different emotional
states.
Overall, the CNN approach significantly outperformed the other methods, highlighting its effectiveness in
emotion recognition tasks.
9. Conclusion

The Emotion Recognition System project demonstrated varying levels of accuracy across three different
methodologies: Convolutional Neural Networks (CNN), Transfer Learning, and Generative Adversarial
Networks (GAN). The CNN approach emerged as the most effective, achieving an accuracy of 81%.
This suggests that CNNs are well-suited for extracting meaningful features from the FER-2013 dataset,
allowing for robust emotion classification.

In contrast, the Transfer Learning and GAN methods recorded accuracies of 42% and 52%, respectively.
These results indicate challenges in adapting existing models and architectures for this specific emotion
recognition task. The lower performance of Transfer Learning points to potential inadequacies in the
model's ability to generalize, while the GAN's results highlight the difficulties associated with training
generative models in this context.

Overall, the findings reinforce the significance of model selection and architecture design in achieving
optimal performance in emotion recognition tasks. Future work may involve exploring further
enhancements to Transfer Learning and GAN strategies, as well as experimenting with additional
architectures to improve overall accuracy and reliability in emotion recognition systems.
