Emotion Detection Final Paper
When people communicate with each other, they can quickly tell how the other feels. Yan and Shen [3] pursue the same goal for feelings in audio: their model utilizes a bidirectional gated recurrent unit (BGRU) network with an attention layer to gather deep time-series information, together with spectrogram features and spatial data extracted by a convolutional neural network with residual blocks. On the IEMOCAP sentiment corpus, the model's accuracy improves over prior approaches.

Zhang et al. [4] describe the increasing interest in multi-modal emotion detection and the important role of recognizing feelings in human interaction. The authors recommend an approach that increases the accuracy of emotion identification by utilizing text, video, and audio modalities. After preprocessing, they extract deep emotional features from each modality and integrate them at the feature level. The model's results on the IEMOCAP dataset are discussed in the paper, exhibiting increased accuracy over speech-only emotion identification.
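To make the feature-level fusion step described in [4] concrete, here is a minimal sketch, assuming hypothetical per-modality feature vectors; the dimensions and the simple concatenation are illustrative, not the authors' implementation:

```python
import numpy as np

# Hypothetical per-utterance deep features from each modality
# (dimensions are illustrative, not those used in [4]).
audio_feat = np.random.rand(256)   # e.g., from a speech encoder
video_feat = np.random.rand(512)   # e.g., from a video encoder
text_feat  = np.random.rand(300)   # e.g., from a text encoder

# Feature-level fusion: concatenate the modality features into one
# vector that a downstream classifier maps to emotion classes.
fused = np.concatenate([audio_feat, video_feat, text_feat])
print(fused.shape)  # (1068,)
```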
Han et al. [5] recommend using a deep residual shrinkage network with a bidirectional gated recurrent unit (DRSN-Bi-GRU) to identify speech emotions. The approach makes use of the Mel-spectrogram, a speech representation that carries information in both the time and frequency domains. A convolutional network, a residual shrinkage network, a bidirectional recurrent unit, and a fully-connected network are all included in the DRSN-Bi-GRU model. To improve feature learning and screen out distracting information, a self-attention mechanism is used. The approach beats existing models, with accuracy rates of 86.03%, 86.07%, and 70.57% on three emotion datasets (CASIA, IEMOCAP, and MELD).
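As a rough sketch of this family of architectures (a convolutional front end with a residual block, a bidirectional GRU over time frames, and self-attention), the following Keras model is an illustration under stated assumptions, not the published DRSN-Bi-GRU configuration: all layer sizes are placeholders, and a plain residual block stands in for the residual shrinkage blocks.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(time_steps=128, n_mels=64, n_classes=6):
    inp = layers.Input(shape=(time_steps, n_mels, 1))  # Mel-spectrogram input

    # Convolutional front end with a simple residual block for spatial features
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    res = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    res = layers.Conv2D(32, 3, padding="same")(res)
    x = layers.ReLU()(layers.Add()([x, res]))
    x = layers.MaxPooling2D((1, 2))(x)  # pool frequency, keep time resolution

    # Collapse the frequency axis so the sequence runs over time frames
    x = layers.Reshape((time_steps, -1))(x)

    # Bidirectional GRU captures temporal dynamics in both directions
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)

    # Self-attention over time frames to emphasize emotion-relevant frames
    x = layers.Attention()([x, x])
    x = layers.GlobalAveragePooling1D()(x)

    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```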
Wani et al. [6] give an in-depth examination of systems for Speech Emotion Recognition (SER). The design components and methodologies of SER systems, including databases, preprocessing, feature extraction, and classification techniques, are addressed. Along with highlighting ongoing research in the subject, the study also tackles the difficulties encountered in SER.

Yang et al. [7] trace the evolution of research on speech emotion recognition based on the discrete emotion model. The paper provides an overview of speech emotion feature parameters and frequently utilized emotion databases, and it describes the emotion recognition and feature extraction methods employed in current Chinese research. It also discusses the challenges in recognizing emotions in speech and the directions in which future research might proceed.

Barhoumi et al. [8] demonstrate a real-time speech emotion recognition system developed via data augmentation and deep learning methods. The goal is to identify emotions from the tone of the voice alone. The system makes use of three separate datasets and a variety of feature extraction techniques, including chroma, Root Mean Square (RMS) energy, Mel spectrograms, Zero Crossing Rate (ZCR), and Mel Frequency Cepstral Coefficients (MFCC). Emotion recognition is carried out by three distinct deep learning models: a Convolutional Neural Network (CNN), a Multi-Layer Perceptron (MLP), and a hybrid model incorporating a CNN with Bidirectional Long Short-Term Memory (Bi-LSTM). In evaluations of the system's efficacy in real-time scenarios, the CNN + Bi-LSTM model appears to be the strongest.
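To illustrate the feature set named in [8], the sketch below computes each feature with librosa and averages the frame-level values over time into one fixed-length vector per clip. The file path, sampling rate, and mean-pooling choice are assumptions rather than the authors' exact pipeline.

```python
import numpy as np
import librosa

# "sample.wav" is a placeholder path.
y, sr = librosa.load("sample.wav", sr=22050)

zcr    = librosa.feature.zero_crossing_rate(y)        # (1, frames)
rms    = librosa.feature.rms(y=y)                     # (1, frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # (12, frames)
mel    = librosa.feature.melspectrogram(y=y, sr=sr)   # (128, frames)
mfcc   = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # (40, frames)

# Time-average each feature and stack into a single utterance vector
features = np.hstack([f.mean(axis=1) for f in (zcr, rms, chroma, mel, mfcc)])
print(features.shape)  # (182,)
```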
Uthayashangar [9] draws attention to speech emotion recognition (SER) and its potential applications in a number of fields. The study uses Mel Frequency Cepstral Coefficients (MFCCs) to extract attributes from voice data and convolutional neural networks (CNNs) to classify emotions. Preprocessing speech data, feature selection, and background noise reduction are all part of the recommended approach, and using data augmentation techniques increases the model's dependability. The CNN algorithm is utilized for classification because of its adaptability and history of success with classification problems. When compared to earlier methods, the findings demonstrate that the suggested method achieves high precision in speech emotion identification.
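Since both [8] and [9] rely on data augmentation to improve dependability, here is a minimal sketch of common waveform-level augmentations (noise injection, pitch shifting, time stretching); the parameter values and the file path are illustrative assumptions, not the settings used in either paper.

```python
import numpy as np
import librosa

def add_noise(y, noise_level=0.005):
    """Inject Gaussian noise scaled by a small amplitude factor."""
    return y + noise_level * np.random.randn(len(y))

def pitch_shift(y, sr, n_steps=2):
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y=y, sr=sr, n_steps=n_steps)

def time_stretch(y, rate=1.1):
    """Speed the clip up (rate > 1) or slow it down (rate < 1)."""
    return librosa.effects.time_stretch(y=y, rate=rate)

y, sr = librosa.load("sample.wav", sr=22050)  # placeholder path
augmented = [add_noise(y), pitch_shift(y, sr), time_stretch(y)]
```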
Olatinwo et al. [10] propose using the Internet of Things (IoT) to develop a WBAN (Wireless Body Area Network) system that is emotion-aware and capable of grasping patients' expressed emotions. The technology uses a combination of machine learning algorithms and IoT sensors to assess and forecast patients' moods based on their speech. The authors examine several feature extraction methods, normalization techniques, and deep learning and machine learning algorithms. Additionally, they create a regularized CNN model and a hybrid deep learning model to lower computational complexity and boost prediction accuracy. The accuracy of the suggested models is around 98% when compared to an existing model.

Iliev et al. [11] investigate the application of deep learning techniques in artificial intelligence to determine emotions through speech. The chapter discusses how essential emotions are in human communication and how difficult it may be to separate emotions clearly from speech signals. It examines and compares the performance of several deep learning and machine learning classifiers used in emotion detection; the limitations of these approaches are also discussed.
…a few of the features that the authors extract from sound files. To identify emotions, a one-dimensional Convolutional Neural Network (CNN) uses these properties as inputs. The suggested approaches exceed current frameworks and achieve excellent classification accuracy, setting a new standard for emotion identification.
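Below is a minimal sketch of the kind of one-dimensional CNN classifier the excerpt describes, taking one fixed-length acoustic feature vector per clip as input. The input length (182, matching the feature-extraction sketch above), layer widths, and class count are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(182, 1)),           # feature vector treated as a 1D signal
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.3),
    layers.Dense(8, activation="softmax"),  # e.g., 8 emotion classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```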
III. Methodology and approaches:
Methodology:

The suggested system relies on emotion detection and uses a specified dataset for system training. Before training, several preprocessing methods are applied, and feature extraction is then carried out. The proposed method uses this dataset to classify the emotions into different categories. Two classification methods are used by the system, and the training data is utilized for classification.

1. Data Collection: To train the emotion recognition system, the researchers gather audio recordings from 24 people.

2. Feature Extraction: Chromagram, Mel-scaled spectrogram, spectral contrast, and tonal centroid features are some of the acoustic characteristics obtained from the speech data. These features capture multiple speech-signal characteristics that are crucial for emotion recognition.

3. Deep Neural Network Model: This is the classification model for emotion recognition; the system also uses other models such as SVM and a Convolutional Neural Network (CNN).

4. Training and Evaluation: The researchers trained the speech DNN using the resulting features and the collected audio recordings (a minimal training sketch follows this list).
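The training-and-evaluation step might look like the following sketch, in which random arrays stand in for the collected recordings' features and labels, and a small dense network stands in for the speech DNN; all sizes and hyperparameters are assumptions.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

X = np.random.rand(480, 182).astype("float32")  # placeholder feature vectors
y = np.random.randint(0, 8, size=480)           # placeholder emotion labels

# Hold out a stratified test set for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# A small dense DNN classifier, as in step 3 (layer sizes are assumptions)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(182,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(8, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.1)
loss, acc = model.evaluate(X_test, y_test)
print(f"test accuracy: {acc:.3f}")
```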
IV. Approaches:

S. No | Methods | Parameters | Challenges
1 | MFCC, LPCC, CNN, DELTA | Accuracy, Error rate, Time | ---
2 | Data balancing techniques, Vectorization methods, Word embedding techniques | --- | Lack of a dedicated emotion recognition method; unavailability of methods
There remain several paths for further study and advancement in this field. Systems that can recognize and analyse speech signals in real time are growing more and more important. Future research may concentrate on creating effective structures and algorithms that offer real-time emotion identification capabilities, enabling applications in fields like affective computing, virtual assistants, and human-computer interaction.

Cross-Cultural and Multilingual Emotion Recognition: various cultures and languages have different ways of expressing emotions. Future studies might work on creating emotion recognition algorithms that remain accurate in identifying and interpreting emotions in cross-cultural and multilingual settings.

VII. Conclusion:

Deep learning algorithms can produce fruitful outcomes. We successfully described a model for emotion recognition, and it scored 96% in testing. It should be noted that perceiving emotions is subjective, and different listeners may assign different emotional values to the same piece of audio. For the same reason, the algorithm occasionally generates inconsistent results when trained on human-rated emotions. The system was trained using datasets such as RAVDESS, in which mainly the speakers' accents may result in unexpected outputs. Even so, the system seeks to convey the speaker's emotional state more accurately through speech.

VIII. References:

[1] Kumar, Sandeep, Mohd Anul Haq, Arpit Jain, C. Andy Jason, Nageswara Rao Moparthi, Nitin Mittal, and Zamil S. Alzamil. "Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance." Computers, Materials & Continua 75, no. 1 (2023).

[2] Płaza, Mirosław, Robert Kazała, Zbigniew Koruba, Marcin Kozłowski, Małgorzata Lucińska, Kamil Sitek, and Jarosław Spyrka. "Emotion Recognition Method for Call/Contact Centre Systems." Applied Sciences 12, no. 21 (2022): 10951.

[3] Yan, Yu, and Xizhong Shen. "Research on speech emotion recognition based on AA-CBGRU network." Electronics 11, no. 9 (2022): 1409.

[4] Zhang, Xue, Ming-Jiang Wang, and Xing-Da Guo. "Multi-modal emotion recognition based on deep learning in speech, video and text." In 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), pp. 328-333. IEEE, 2020.

[5] Han, Tian, Zhu Zhang, Mingyuan Ren, Changchun Dong, Xiaolin Jiang, and Quansheng Zhuang. "Speech Emotion Recognition Based on Deep Residual Shrinkage Network."
[6] Wani, Taiba Majid, Teddy Surya Gunawan, Syed Asif Ahmad Qadri, Mira Kartiwi, and Eliathamby Ambikairajah. "A comprehensive review of speech emotion recognition systems." IEEE Access 9 (2021): 47795-47814.

[7] Yang, Chunfeng, Jiajia Lu, Qiang Wu, and Huiyu Chen. "Research progress of speech emotion recognition based on discrete emotion model." In Journal of Physics: Conference Series, vol. 2010, no. 1, p. 012110. IOP Publishing, 2021.

[8] Barhoumi, Chawki, and Yassine Ben Ayed. "Real-Time Speech Emotion Recognition Using Deep Learning and Data Augmentation." (2023).

[9] Uthayashangar, S. "Speech Emotion Recognition Using Machine Learning." Journal of Coastal Life Medicine 11 (2023): 1564-1570.

[10] Olatinwo, Damilola D., Adnan Abu-Mahfouz, Gerhard Hancke, and Hermanus Myburgh. "IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients." Sensors 23, no. 6 (2023): 2948.

[12] Pucci, Francesco, Pasquale Fedele, and Giovanna Maria Dimitri. "Speech emotion recognition with artificial intelligence for contact tracing in the COVID-19 pandemic." Cognitive Computation and Systems 5, no. 1 (2023): 71-85.

[13] Saini, Anu, Amit Ramesh Khaparde, Sunita Kumari, Salim Shamsher, Jeevanandam Joteeswaran, and Seifedine Kadry. "An investigation of machine learning techniques in speech emotion recognition." Indonesian Journal of Electrical Engineering and Computer Science 29, no. 2 (2023): 875-882.

[14] Koppula, Neeraja, Koppula Srinivas Rao, Shaik Abdul Nabi, and Allam Balaram. "A novel optimized recurrent network-based automatic system for speech emotion identification." Wireless Personal Communications 128, no. 3 (2023): 2217-2243.

[15] Tambat, Aditi Manoj, Ramkumar Solanki, and Pawan R. Bhaladhare. "Sentiment Analysis-Emotion Recognition." Int. J. of Aquatic Science 14, no. 1 (2023): 381-390.

[16] Jayanthi, K., and S. Mohan. "An integrated framework for emotion recognition using speech and static images"