
Facial Emotion Detection

Sahil Kumar1, Aanjelina Rout2, Dona Priyadarshini3, Abhinav Kumar Sinha4, Monalisa Panda5

Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India

[email protected]
[email protected]
[email protected]
[email protected]

Abstract. Facial Emotion Detection (FED) is now established as a key contributor to intelligent human-computer interaction. This project proposes a real-time FED system capable of operating offline on commonly available non-GPU hardware. The system incorporates two key elements: Haarcascade for fast and lightweight face detection, and a Convolutional Neural Network (CNN), trained on the FER-2013 dataset, for classifying facial emotions. The system recognizes seven basic emotional states - Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral - from 48×48 grayscale facial images captured by webcam. It delivers live, low-latency predictions through a simple GUI that displays the classified emotion label in real time. Our goals included minimizing computation and ensuring user privacy and accessibility by avoiding internet connectivity and cloud-based resources. The modular structure of the FED system allows the detection and classification components to be upgraded separately, accommodating future work such as extensions to multi-modal data or advanced face detectors. We show that the FED system achieves reasonable performance and accuracy under benchmark testing as well as real-world conditions.

Keywords: Facial Emotion Recognition, Convolutional Neural Network, Haarcascade Classifier

1 Introduction
Human emotional reactions heavily influence communication, behavior, and decision-making. In face-to-face interaction, people instinctively read emotional signals from facial expressions, yet machine-based communication systems lack the ability to sense emotions. To address this, we developed Facial Emotion Detection using CNN and Haarcascade, a computational system that detects facial expressions in real time.

1.1 Motivation
The growing reliance on digital interaction has exposed a gap: today's society increasingly depends on machines that lack emotional sensitivity. Traditional human-machine communication often misses the tonal quality and emotional awareness, whether implicit, explicit, or relational, that face-to-face interaction carries naturally. This absence is particularly evident in education, mental-health services, customer service, and telemedicine, where emotions are critical to effective and successful service delivery. The need for a socially responsible application motivated this project: a system that analyzes and interprets human emotions from facial expressions, in response to otherwise emotionally unaware machines. Our goals sit at the intersection of the social sciences and the technical components we developed; we aim for an application that is both socially responsible and technically feasible. Real-time monitoring makes it possible to assess emotional states while the subject is engaged in an actual interaction, that is, in real-world use.

1.2 Objectives
The main goal is real-time emotion classification with limited computing resources, so that the system runs on devices without a dedicated GPU and with only intermittent internet connectivity. A particular focus is integrating Haarcascade for lightweight face detection with a CNN model trained on the FER-2013 dataset for accurate emotion classification. The project is oriented toward end users: it includes an accessible graphical overlay that shows which emotions (if any) are detected in the video stream, providing instant feedback. A crucial requirement is modularity, meaning that the detection algorithm and/or the deep learning model can be changed or updated later without reworking the entire solution. This allows progress to be made by continually adapting to new technologies.

1.3 Original Contributions


Emphasizing accessibility and privacy, this project offers an offline, low-cost facial emotion recognition system suitable for ordinary consumer devices. Unlike many systems that require premium GPUs or cloud computing, all processing happens on-device. The modular construction separates face detection from emotion classification, enabling easy updates and future multi-modal extensions such as voice or gesture input. A customized CNN trained on the FER-2013 dataset with data augmentation (a minimal augmentation sketch follows this section) improves accuracy. The user-friendly GUI displays real-time emotion labels on the webcam feed, making results quick and simple to comprehend. This balance of performance, flexibility, and ethical design supports several practical applications.
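
To make the augmentation step concrete, below is a minimal sketch using Keras' ImageDataGenerator. The specific transform ranges are our illustrative assumptions; the paper does not report the exact augmentation settings used.

```python
# Minimal augmentation sketch for 48x48 grayscale FER-2013 images.
# Transform ranges are illustrative assumptions, not reported settings.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=10,       # small random rotations (degrees)
    width_shift_range=0.1,   # horizontal translation (fraction of width)
    height_shift_range=0.1,  # vertical translation (fraction of height)
    zoom_range=0.1,          # mild random zoom
    horizontal_flip=True,    # a mirrored face keeps its emotion label
)

# Typical usage with a class-labeled image directory (path is hypothetical):
# train_flow = augmenter.flow_from_directory(
#     "data/fer2013/train", target_size=(48, 48),
#     color_mode="grayscale", class_mode="categorical", batch_size=64)
```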

1.4 Paper Layout


This paper is organized into sections so that specific information is easy to locate. Section 1 presents the introduction; Section 2 delves into the literature review; Section 3 presents our proposed model. Section 4 reports our experimental findings and evaluation metrics. Section 5 concludes the paper and discusses potential future developments.

2 Literature Survey

For over a decade, facial emotion recognition (FER) has been an active research topic in human-computer interaction and computer vision. Early FER systems relied on hand-designed features, using methods such as Local Binary Patterns, Gabor filters, and Principal Component Analysis to extract features, and statistical learning models, for example Support Vector Machines and Decision Trees, to classify them. These systems achieved remarkable performance in ideal contexts but struggled to generalize to real-world scenarios: their fixed feature-extraction methods, and the curated datasets they were built on, were not robust to changes in conditions such as lighting, noise, or pose.

In recent years the field has seen a major shift with the introduction of deep learning, particularly Convolutional Neural Networks (CNNs), which learn features directly from raw data. The FER-2013 dataset [4], assembled from web images for a 2013 facial expression recognition challenge, became a standard benchmark. AffectNet [3], published in 2019, provided a larger dataset that is more robust to the wide nuances of in-the-wild facial images. Although these learning-based systems recognize features directly from data, improvements in recognition performance have also come from advances in learning increasingly complex image features.

3 Proposed Model

3.1 Methodologies Used

The system relies on a hybrid of classical and deep learning methods. The Haarcascade classifier, a classical model derived from the original Viola-Jones algorithm [2], is used for face detection. Haarcascade implements a cascade of classifiers trained on both positive and negative images of faces, allowing it to detect human faces in real time with a high level of accuracy. Best suited to frontal-face images, it detects faces fully offline and is lightweight, which makes it well matched to this project (a minimal detection sketch is given below).
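
The following is a minimal sketch of Haarcascade face detection with OpenCV. The scaleFactor and minNeighbors values are common defaults assumed here, not parameters reported by the paper.

```python
# Minimal Haarcascade face-detection sketch with OpenCV.
import cv2

# OpenCV ships the pretrained frontal-face cascade with the library.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

frame = cv2.imread("sample.jpg")                # any BGR image (path is hypothetical)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the cascade operates on grayscale

# detectMultiScale returns (x, y, w, h) bounding boxes, one per detected face.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```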

3.2 Schematic Layout of the Proposed System

Fig 1: Model Diagram

The system is conceptualized as a sequential modular design for simplicity, transparency, and flexibility. There are two core modules: (1) Face Detection and (2) Emotion Classification. Each module is designed independently, ensuring that changes made to one module do not affect the functionality of the other. The first module processes real-time video data from a webcam and localizes faces. The second module classifies emotions using a CNN trained on the FER-2013 dataset. The final prediction is overlaid as text on the frame, giving an immediate visual representation.

3.3 System Requirements

The proposed system is designed to run on mid-range consumer platforms, making it accessible and deployable in a range of environments. The hardware requirements are a machine (laptop or desktop) with at least an Intel Core i5 processor, 8 GB of RAM, and a webcam. A GPU is not needed to run the system, though one may speed up model training during development.
As for software, the system was developed in Python 3.8 for its ease of use and the maturity and stability of its ecosystem. Key libraries include OpenCV (live face detection and video rendering), TensorFlow and Keras (constructing and training the CNN model), NumPy (numerical processing), Matplotlib (visualization), and Scikit-learn (evaluation metrics such as the confusion matrix and F1-score).
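
To illustrate how these libraries fit together, below is a minimal Keras sketch of the kind of compact CNN described in Section 3.4: three convolutional layers with ReLU activations, max pooling, dropout, and a softmax output over the seven emotion classes. The filter counts, dropout rate, and optimizer are illustrative assumptions; the paper does not report its exact hyperparameters.

```python
# Illustrative sketch of a compact CNN for 48x48 grayscale emotion images.
# Filter counts and dropout rate are assumptions, not the paper's values.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),               # 48x48 grayscale input
    layers.Conv2D(32, (3, 3), activation="relu"),  # conv block 1
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),  # conv block 2
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"), # conv block 3
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                           # regularization
    layers.Dense(7, activation="softmax"),         # seven emotion classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```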

3.4 Proposed Algorithm

The algorithm operates as a sequential pipeline, starting with video capture and ending with emotion classification and display. The step-by-step process is as follows (a code sketch of the full pipeline is given after the list):
1) Video Frame Capture: frames are grabbed from the webcam in real time using OpenCV.
2) Face Detection: each captured frame is passed through the Haarcascade frontal-face classifier, which detects faces within the frame. When a face is detected, its bounding-box coordinates are output.
3) Image Preprocessing: each detected face is cropped from the frame, resized to 48×48 pixels, converted to grayscale, and its pixel values normalized.
4) Emotion Prediction: the preprocessed face image is fed to the trained CNN model, which consists of three convolutional layers with ReLU activations, max-pooling layers, dropout layers, and a dense output layer with SoftMax.
5) Emotion Label Output: the CNN outputs a probability for each of the seven emotion classes; the class with the highest score is taken as the detected emotion.
6) Real-Time Display: the predicted emotion is overlaid in real time on the live video stream using OpenCV's text overlay function.
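
Below is an end-to-end sketch of steps 1-6. The model file name is a placeholder assumption, and the class ordering is assumed to match the order used during training.

```python
# End-to-end sketch of the pipeline described in steps 1-6.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Class order is assumed to match the order used when training the CNN.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("fer_cnn.h5")                  # placeholder file name

cap = cv2.VideoCapture(0)                         # step 1: webcam capture
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)      # step 2: detect
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # step 3: crop/resize
        face = face.astype("float32") / 255.0                # step 3: normalize
        probs = model.predict(face.reshape(1, 48, 48, 1),
                              verbose=0)[0]                  # step 4: predict
        label = EMOTIONS[int(np.argmax(probs))]              # step 5: top class
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),               # step 6: overlay
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("Facial Emotion Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):         # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```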

4 Experimentation and Model Evaluation

Evaluating the accuracy and overall performance of a facial emotion recognition model is a central part of model comparison in machine learning. In addition to techniques such as cross-validation and confusion matrices, this evaluation typically includes metrics like precision, recall, and F1-score. The evaluation results show how well the system can identify and categorize the different emotions.

4.1 Depicted Results

Fig 2: Accuracy and Loss Over Epochs

Fig 3: Dataset Division

Fig 4: Confusion Matrix

4.2 System Performance Evaluation

Performance Metrics: We used a variety of criteria to assess the approach's capabilities. Most of them are derived from the confusion matrix, a table summarizing how the model performs on the test set. Accuracy reflects the proportion of observations that are predicted correctly, whether positive or negative. The following equations define these metrics and characterize a system's correctness.

 Accuracy: the most widely used way of validating an ML model for classification problems.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

 Recall: also known as the true-positive rate; the proportion of data samples belonging to a class of interest that the model correctly identifies as such, out of all samples of that class.
Recall = TP / (TP + FN) (2)

 F1-Score: combines precision and recall; it can be used as the objective function when optimizing a model for the best trade-off between the two.
F1-Score = TP / (TP + (1/2)(FP + FN)) (3)

 Precision: evaluates the model's ability to correctly identify positive samples among all samples it labels positive. Precision is especially valuable when the cost of false positives is high.
Precision = TP / (TP + FP) (4)
where TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative
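
As a sketch of how these metrics can be computed with Scikit-learn (which the paper lists among its evaluation tools), the example below uses small placeholder label arrays; macro averaging weights each of the seven emotion classes equally.

```python
# Sketch of the above metrics with scikit-learn; y_true/y_pred are
# placeholder arrays standing in for test-set labels and predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [0, 1, 2, 3, 3, 4, 5, 6]   # ground-truth emotion indices (example)
y_pred = [0, 1, 2, 3, 4, 4, 5, 6]   # model predictions (example)

print("Accuracy :", accuracy_score(y_true, y_pred))
# 'macro' averages the per-class score equally across the 7 emotions.
print("Precision:", precision_score(y_true, y_pred, average="macro",
                                    zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro",
                                 zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average="macro",
                             zero_division=0))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```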

4.3 Discussions on Contributions

This project validates the premise that a lightweight, offline-capable facial emotion detection system can operate using open-source tools and inexpensive, ubiquitous hardware. The hybrid architecture, consisting of a Haarcascade face detection component and a CNN-based emotion classification component, serves as a strong basis for real-time affective computing. The key contributions are that the design is entirely user-centric (it is meant for human use), modular (the face detection module and the classification module can be updated independently), unplugged (not reliant on a continuous internet connection or cloud-based services), and, importantly, privacy-preserving.

The modularity gives the user future upgrade options: because modules can be updated independently, the system can be used long-term without obstacles as more advanced artificial intelligence components emerge. Furthermore, by using grayscale input and a small CNN, the system can classify the basic human emotions known to researchers while minimizing computational and power demands.

An ethical advantage of this approach is that it avoids the cloud, and with it the dilemma of sending facial data to a non-user entity; this is paramount in healthcare and surveillance settings. The real-time feedback, presented as a graphical overlay, creates an engaging interface that can further enhance behavioral monitoring for mental health, classroom/learning engagement systems, and customer service.

5 Conclusion and Future Scope

This project shows that a real-time Facial Emotion Detection (FED) system can be implemented using a combination of classical computer vision and deep learning. We have demonstrated the advantages of pairing a Haarcascade face detector with a CNN trained on the FER-2013 dataset to create a modular, efficient, and non-intrusive system capable of detecting seven basic human emotions. Our model achieved a test accuracy of 66%, an acceptable value for real-time applications, especially when employing small CNN models.

Future extensions should explore face detectors stronger than Haarcascades (e.g., Dlib or MediaPipe; a sketch of such a swap is given below), deeper CNN architectures, and video-sequence inputs for recurrent models such as LSTMs. Further research could lead to a multi-modal approach to emotion recognition that also incorporates audio or physiological data.
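
As a hypothetical illustration of the modularity argument, the sketch below swaps the Haarcascade detector for MediaPipe's face-detection solution while keeping the same (x, y, w, h) interface as detectMultiScale. It uses MediaPipe's legacy mp.solutions API; the thresholds are assumed values.

```python
# Hypothetical drop-in replacement for the Haarcascade detector using
# MediaPipe's legacy mp.solutions face-detection API. MediaPipe returns
# bounding boxes normalized to [0, 1], so they are scaled to pixels here.
import cv2
import mediapipe as mp

mp_fd = mp.solutions.face_detection

def detect_faces(frame_bgr):
    """Return (x, y, w, h) pixel boxes, mirroring detectMultiScale's output."""
    h, w = frame_bgr.shape[:2]
    boxes = []
    with mp_fd.FaceDetection(model_selection=0,
                             min_detection_confidence=0.5) as detector:
        results = detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        for det in results.detections or []:
            box = det.location_data.relative_bounding_box
            boxes.append((int(box.xmin * w), int(box.ymin * h),
                          int(box.width * w), int(box.height * h)))
    return boxes
```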

References

[1] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

[2] Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of Simple Features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I–511. IEEE (2001)

[3] Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild. IEEE Transactions on Affective Computing 10(1), 18–31 (2019)

[4] Kaggle: Challenges in Representation Learning: Facial Expression Recognition Challenge (FER-2013). https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge

[5] OpenCV Documentation. https://docs.opencv.org

[6] Keras API Documentation. https://keras.io

[7] TensorFlow Guide. https://www.tensorflow.org (Accessed: 2025-05-20)

[8] Rehman, S.U., et al.: Low-Cost Smart Home Automation System with Advanced Features. Quaid-E-Awam University Research Journal of Engineering Science and Technology Nawabshah 20(1), 74–82 (2022)
