Manuscript J9
Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
[email protected]
1 Introduction
Communication, behavior, and decision-making depend heavily on human emotional reactions. Human interaction requires people to instinctively detect emotional signals through facial expressions, yet machine-based communication systems lack the ability to sense emotions in their interactions. A model named Facial Emotion Detection using CNN and Haarcascade has been created to develop a computational system that can detect facial expressions in real time.
1.1 Motivation
The growing reliance on digital interactions has created an opportunity to address a gap in today's society: we increasingly depend on machines that lack emotional sensitivity. Traditional human-machine communication is often devoid of the tonal quality that carries the emotional awareness of face-to-face interactions, and this absence of emotional awareness, whether implicit, explicit, or relational, depends on the mode of communication. It is particularly evident in areas such as education, mental health services, customer service, and telemedicine, where emotions are critical to the effective and successful delivery of service. The need for a socially responsible application motivated the development of this project. In response to emotionally unaware machines, the project developed a system that can analyze and interpret human emotions via facial expressions. Our goals sit at the intersection of the social sciences and the technical components we have developed: a socially responsible application that is technically feasible. The ability to monitor emotions in real time makes it possible to assess emotional states while the subject is engaged in an actual interaction, that is, in a real-world application.
1.2 Objectives
The main goal is to classify emotions in real time using limited computing resources, so that the system works on devices with no dedicated GPU and only intermittent internet connectivity. A particular focus is on integrating Haarcascade for lightweight face detection with a CNN model trained on the FER-2013 dataset for accurate emotion classification. The system is oriented toward end users: it includes an accessible graphical overlay that shows which emotions (if any) are being detected in the video stream, providing instant feedback. A crucial requirement is modularity, meaning that if either the detection algorithm or the deep learning model needs to be changed or updated later, this can be done without changing the entire solution. This allows the system to progress by continually adapting to new technologies.
2 Literature Survey
For over a decade, facial emotion recognition (FER) has been an active research topic in human-computer interaction and computer vision. Early FER systems relied on hand-designed features, using methods such as Local Binary Patterns, Gabor filters, and Principal Component Analysis to extract features, and then applying statistical learning models such as Support Vector Machines and Decision Trees to classify them. These systems achieved remarkable performance under ideal conditions, but they struggled to generalize to real-world scenarios, owing both to their fixed feature extraction methods and to training datasets that were not robust to changes in real-world conditions such as lighting, noise, or pose.
In recent years, the field has witnessed a major technological shift with the introduction of deep learning, particularly Convolutional Neural Networks (CNNs), which learn features directly from raw data. FER-2013, a facial expression dataset assembled from web images, was released in 2013, and AffectNet [3], published in 2019, provided a larger dataset covering a wider range of nuances in facial images, improving robustness. Although these learning-based systems rely on learned features and can successfully recognize emotions under varied conditions, they generally require more computational resources than classical methods.
3 Proposed Model
The system relies on a hybrid of classical and deep learning methods. The Haarcascade classifier, a classical model derived from the original Viola-Jones algorithm [2], is used for face detection. Haarcascade implements a cascade of classifiers trained on both positive and negative face images, so it can detect human faces in real time with a high level of accuracy. Ideal for frontal-face images, it works offline, is lightweight, and is therefore well suited to this project. The emotion classification module uses a CNN trained on the FER-2013 dataset. The final prediction is overlaid as text on the frame, providing immediate visual feedback.
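As an illustration of this pipeline, the following Python sketch shows how each webcam frame could be processed; the model path emotion_model.h5 is a placeholder, and the label order assumes the standard FER-2013 class indices.

# Minimal sketch of the detect-classify-overlay loop (file names are assumptions).
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Standard FER-2013 label order (classes 0-6).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("emotion_model.h5")  # CNN trained on FER-2013 (assumed path)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # FER-2013 images are 48x48 grayscale; normalize pixels to [0, 1].
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(roi.reshape(1, 48, 48, 1), verbose=0)[0]
        label = EMOTIONS[int(np.argmax(probs))]
        # Overlay the prediction as text on the frame.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("Facial Emotion Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()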
The proposed system was designed to run on mid-range consumer platforms, making it accessible and deployable in a wide range of environments. The hardware requirements are a machine (laptop or tower) with no less than an Intel Core i5 processor, 8 GB of RAM, and a webcam. A GPU is not needed to run the system, although one may accelerate model training and visualization during development.
As for software, the system was developed with Python 3.8 because of its ease of use and the maturity and stability of its ecosystem. Key software libraries included OpenCV (for live face detection and video rendering), TensorFlow and Keras (for constructing and training the CNN model), NumPy (for numerical processing), Matplotlib (for visualization), and Scikit-learn (for evaluation metrics and related reporting, such as the confusion matrix and F1-score).
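As a concrete illustration, a compact CNN of the kind described here might be defined in Keras as follows; the specific layer sizes are illustrative assumptions, not the exact architecture used in the project.

# Illustrative compact CNN for 48x48 grayscale FER-2013 input (layer sizes assumed).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                    # regularization against overfitting
    layers.Dense(7, activation="softmax"),  # seven basic emotion classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # expects one-hot labels
              metrics=["accuracy"])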
Evaluating the accuracy and overall performance of a facial emotion recognition model is part of the model comparison process in machine learning. In addition to techniques such as cross-validation and confusion matrices, this evaluation typically includes metrics like precision, recall, and F1-score. The evaluation results show how well the model can identify and categorize the different emotion classes. Each prediction is counted as either correct or incorrect, and the following equation models overall performance and determines the system's correctness:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
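These metrics can be computed with Scikit-learn, as the brief sketch below shows; y_true and y_pred are placeholders for the test labels and the model's predictions.

# Computing evaluation metrics with scikit-learn (y_true / y_pred are placeholders).
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_true = [3, 0, 6, 3, 4]  # ground-truth FER-2013 class indices (toy example)
y_pred = [3, 0, 4, 3, 4]  # model predictions for the same samples

print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall, and F1-score in one report.
print(classification_report(y_true, y_pred, zero_division=0))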
This project validates the premise that a lightweight, offline-capable facial emotion detection system can operate using open-source tools and inexpensive, ubiquitous hardware. The hybrid architecture, consisting of a Haarcascade face detection component and a CNN-based emotion classification component, serves as a strong basis for real-time affective computing. The project's key contributions are that the design is user-centric (it is meant for human use), modular (the face detection module and the classification module can be updated independently), offline (not reliant on a continuous internet connection or cloud-based services), and, importantly, privacy-preserving.
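One minimal way to express this modularity in Python is sketched below; the class and method names are illustrative assumptions, not identifiers from the actual codebase.

# Illustrative module boundaries (class and method names are assumptions).
from typing import List, Tuple
import numpy as np

class FaceDetector:
    """Any detector (Haarcascade, Dlib, MediaPipe) can implement this."""
    def detect(self, frame: np.ndarray) -> List[Tuple[int, int, int, int]]:
        raise NotImplementedError  # returns (x, y, w, h) boxes

class EmotionClassifier:
    """Any classifier (the FER-2013 CNN or a future model) can implement this."""
    def classify(self, face: np.ndarray) -> str:
        raise NotImplementedError  # returns an emotion label

def process_frame(frame, detector: FaceDetector, clf: EmotionClassifier):
    # The pipeline depends only on the interfaces, so either side can be
    # swapped without touching the rest of the system.
    return [(box, clf.classify(frame[box[1]:box[1] + box[3],
                                     box[0]:box[0] + box[2]]))
            for box in detector.detect(frame)]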
Because the modular design allows each component to be updated independently, the system can be used long-term and upgraded toward more advanced artificial intelligence techniques as they emerge. Furthermore, by using grayscale input and smaller CNNs, the system can classify the basic human emotions recognized by researchers while minimizing computational and power demands. An ethical advantage of this approach is that it avoids the cloud, and with it the dilemma of sending facial data to a non-user entity; this is paramount in healthcare and surveillance contexts. The real-time feedback, presented as a graphical overlay, creates an engaging interface that can further enhance behavioral monitoring in mental health, classroom and learning engagement systems, and customer service.
This project shows that a real-time Facial Emotion Detection (FED) system can be implemented using a combination of classical computer vision and deep learning. We have shown the advantages of using a Haarcascade face detector as the front end for a CNN trained on the FER-2013 dataset, creating a modular, efficient, and non-intrusive system capable of detecting seven basic human emotions. Our model achieved a test accuracy of 66%, an acceptable value for real-time applications, especially when employing small CNN models.
Future extensions should explore face detectors beyond Haarcascades (e.g., Dlib or MediaPipe), deeper CNN architectures, and video sequences as inputs to recurrent models such as LSTMs. Future research could also pursue a multi-modal approach to emotion recognition that incorporates audio or physiological data alongside facial expressions.
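As a hint of how such a detector swap might look under the modular design, the sketch below wraps MediaPipe's face detection behind the same bounding-box output format as Haarcascade; this is an assumption for illustration, not the project's code.

# Sketch of a drop-in MediaPipe-based detector (an assumption, not project code).
import cv2
import mediapipe as mp

mp_detector = mp.solutions.face_detection.FaceDetection(
    model_selection=0, min_detection_confidence=0.5)

def detect_faces(frame_bgr):
    """Return (x, y, w, h) boxes, matching the Haarcascade output format."""
    h, w = frame_bgr.shape[:2]
    results = mp_detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    boxes = []
    for det in results.detections or []:  # detections is None when no face is found
        r = det.location_data.relative_bounding_box
        boxes.append((int(r.xmin * w), int(r.ymin * h),
                      int(r.width * w), int(r.height * h)))
    return boxes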
References
[1] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cam-
bridge (2016)
[2] Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of
Simple Features. In: Proceedings of the 2001 IEEE Computer Society Con-
ference on Computer Vision and Pattern Recognition, vol. 1, pp. I–511.
IEEE (2001)
[3] Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: A Database for
Facial Expression, Valence, and Arousal Computing in the Wild. IEEE
Transactions on Affective Computing 10(1), 18–31 (2019)