0% found this document useful (0 votes)

2 views6 pages

Exploring The Effectiveness of Advanced Machine Learning Models in Speech Emotion Recognition

The document presents a study on the effectiveness of advanced machine learning models, particularly LSTM and CNN, in speech emotion recognition (SER) using the RAVDESS dataset. The research highlights that LSTM achieved the highest accuracy of 91%, outperforming traditional models like SVM and Random Forests. The findings suggest that deep learning techniques can significantly enhance the detection of emotional states in speech, which has applications in various fields such as healthcare and security.

Uploaded by

nbprathyusha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views6 pages

Exploring The Effectiveness of Advanced Machine Learning Models in Speech Emotion Recognition

Uploaded by

nbprathyusha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE)

Exploring the Effectiveness of Advanced Machine

Learning Models in Speech Emotion Recognition
Kanika Jangra Deepika Ghai Sandeep Kumar
Department of Electronics and Department of Electronics and Department of Computer Science and
Communication Enginnering Communication Enginnering Engineering
Lovely Professional University, Phagwara Lovely Professional University, Phagwara Koneru Lakshmaiah Educational
Punjab, India Punjab, India Foundation
2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE) | 979-8-3503-6684-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/IC3SE62002.2024.10593399

[email protected] [email protected] Vaddeswaram, India

[email protected]

Abstract— The importance of recognizing emotion from (MLP), and Support Vector Machines (SVM) are the linear
voice stems from the basic human need to understand and models that are most often used to recognize emotions. The
communicate emotional states, which is vital in enhancing speech sound is not usually thought of as being fixed. Since
security, health care, etc. This study compares several advanced this is the case, nonlinear models should do well in SER. SER
machine learning models to assess their effectiveness in can be used with several different nonlinear classification
recognizing emotions from speech, using the widely accepted methods. These are frequently employed to put data into
RAVDESS, i.e. Ryerson Audiovisual Database of Emotional groups based on basic-level traits.
Speech Song. Our research focuses on the study of depth models
of Convolutional Neural Networks (CNNs) and Long Short- A lot of the time, energy-based traits like Perceptual Linear
Term Memory networks (LSTMs) versus conventional machine Prediction cepstrum coefficients (PLP), Mel-Frequency
learning algorithms, like Support Vector Machines (SVMs), Cepstrum Coefficients (MFCC), Linear Predictor Coefficients
Random Forests (RFs), and Long-Range Machines (GBM). (LPC), and Mel Energy-spectrum Dynamic Coefficients
Through careful preprocessing, feature extraction using Mel- (MEDC) are used to pick out feelings in speech accurately.
Frequency Cepstral Coefficients (MFCCs). The research Deep learning techniques for SER have multiple advantages
concludes that LSTM performs better at 91% than the other over traditional methods. For example, they can find complex
implemented models. Thus, in future, voice-based emotion structures and includes without requiring human feature
recognition can help with diagnosis with ongoing monitoring of extraction and tuning. They also prefer to extract features at a
mental health conditions like depression, anxiety and stress by
low level from raw data and can work with data that has yet to
detecting emotional distress or mood changes.
be labelled. Deep Neural Networks (DNNs) with
Keywords— Machine Learning, emotion detection, voice Convolutional Neural Networks (also known as CNN) are
detection, CNN, LSTM, SVM, GBM, RF, MFCC. good at handling images and videos. Speech-based
classification tasks like natural language processing (NLP)
I. INTRODUCTION and speech recognition (SER) should use recurrent designs,
Human speech includes numerous features that the listener such as recurrent neural networks (RNNs) with Long Short-
examines to understand the complicated information supplied Term Memory (LSTM). In other words, the study focuses on
by the speaker. Inadvertently, the speaker conveys tone, the efficiency of machine learning and deep-learning
intensity, tempo, and other auditory properties, which help to algorithms to detect feelings.
capture both the subtext or meaning and the precise words. II. LITERATURE WORK
Emotion detection has many applications in medical treatment,
security, forensic sciences, and other fields. Models such as Speech emotion recognition enhances human-machine
LSTM do computations in a timestep sequence. Numeric interaction through emotional classification [1]. Fusion of
features are fed into a network of neural networks, which spatial and temporal feature depictions for speech emotion
outputs the logit vector. LSTMs, the decoder, was built to be recognition [2] achieves higher accuracy on RAVDESS and
an attention-based machine that trained on the encoder's learnt IEMOCAP datasets and outperforms state-of-the-art models.
representation to produce an output chance for the following Emotion recognition based on speech and audio features using
character sequence. When examining MFCCs as time-series MFCC and CNN+LSTM algorithms [3]. Anger and neutral
information, LSTMs or their more complicated counterparts emotion performed best, yielding an accuracy of 61.07%.
are used to address the issue of the speech emotion recognition Deep learning techniques are critical solutions for SER [4].
problem of classification. CNNs work with MFCCs in a single Speech emotion recognition (SER) is a method for extracting
dimension or acquire to recognize Mel spectrograms applying emotions from human speech. The analysis used the
2D filters. RAVDESS data set and achieved an accuracy of 80.64% with
CNN LSTM [5]. Hybrid MFCCT features with CNN
SET, which stands for speech emotion identification, has performed better than MFCC and domain features [6].
two steps: extracting features and categorizing features.
Speech-processing researchers have developed several A proposed self-concept-based deep learning model for
features, such as source-based excitement features, prosodic speech perception recognition. The optimized data set
characteristics, vocal sliding factors, and mixed features. In obtained an experimental accuracy rate of 90% [7]. Bilingual
the second step, nonlinear and linear algorithms are used to Arabic English Speech Emotion Recognition System. High
sort the features into groups. Bayesian networks (BN), which performance with low computing costs". Speech emotion
are sometimes called the Minimum Likelihood Principle recognition using audio features achieves an accuracy of
85%[8]. Detection of sarcasm was done with a score of 75%.

507
979-8-3503-6684-6/24/$31.00 ©2024 IEEE

Authorized licensed use limited to: Nitte Meenakshi Institute of Technology. Downloaded on April 22,2025 at 06:20:11 UTC from IEEE Xplore. Restrictions apply.
2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE)

"Human speech emotion recognition using CNN[9]. The shown in Figure 1. The model involves various steps for the
model outperformed the other models and achieved an detection of 6 classes of emotions.
accuracy of 94.38%. A variety of audio and machine learning
algorithms are used for emotion recognition. A proposed
method for speech perception detection using masked sliding
windows[10]. A deep neural network-based classifier achieves
high accuracy with sentiment data sets. Emotion recognition
uses speech signals in the intelligence system[11]. Deep
learning techniques for feature extraction and model building.
This paper describes a set of sound structures means built on
Match Frequency Cepstral Coefficient (MFCC)[12], Wavelet
Packet Transformation (WPT), Linear Predictive Cepstral
Coefficient (LPCC), Zero Crossing Rate (ZCR), Spectrum
Center, Spectral Rolloff. Spectral Kurtosis[13], Root surface
square (RMS), pitch, jitter, and shimmer to improve a
particular feature[14]. This paper explains acoustic text
features in hidden space are used to select a perceptual class
with minimum generalized reconstruction error as an SER
result, which can be used as an indicator to decide whether the
class is neutral or not and thus can be applied to it other classes
of perception[15]. Voice is a powerful emotional state;
loudness and tone often betray underlying emotional states.
Advances in SER systems have been characterized by the
inherently language-driven nature of consumer engagement to
enhance user experience through responsive and sensitive
technology[16].
Early approaches to SER, as described in the literature,
included the development of unique classifiers based on
extraction methods from speech signals. These classifiers Fig. 1. Flow Chart of Proposed Work
were trained on tone, pitch, and strength to distinguish
between emotional states[17]. One study stated that linear A. Dataset Used
discriminant analysis (LDA) and support vector machine We used the RAVDESS dataset for this study because it is
(SVM) were used to detect four primary emotions: happy, sad, an open-source collection that scientists can use to find out
angry, and neutral. Deep learning, especially 2D how people feel when they talk. Research the Ryerson Audio-
Convolutional Neural Networks (CNNs), shows essential Visual Database of Emotional Speech and Songs, also known
progress in the field. CNNs have shown promise in classifying as RAVDESS, has 7356 recordings showing emotions. These
emotions, with a reported accuracy of around 70% when files have three types: full AV, video-only, and audio-only.
analyzing data sets[18]. Including CNNs highlights the shift There are also two voice channels, one for spoken text and one
towards architectures that can extract and learn the most for song. One character in each file plays one of the eight
suitable features for SER tasks with little domain knowledge. feelings below: neutral, happy, sad, angry, scared, shocked, or
A notable case in this area is the RAVDESS, which contains sickened..
linguistic content with various sensory properties. This data
set has contributed to developing SER systems that are more B. Data Visualization
nuanced and capable of understanding complex human Figure 2 shows the count of emotions in the dataset; it
emotions[19]. Research suggests that gender-based training describes the colour of each bar indicates a specific emotion,
can help develop more accurate SER models, emphasizing the while its height indicates the frequency of that emotion.
importance of individualized programs. One of the recent
studies proposed a less complex SER algorithm that showed
good performance using only Mel-frequency cepstral
coefficients (MFCC)[20-21]. Thus, the survey describes that
extensive research in advanced machine learning techniques
and deep learning techniques are used for speech emotion
detection (SER) [22-23]. However, real-world applications
often have environments with variable noise and sound, which
can reduce the performance of SER models that are not
explicitly designed to handle such situations; our proposed
model performs augmentation by noise pitching to check the
efficiency of the mode. Thus, the research is also efficient in
giving excellent analysis by comparing the outcomes of
multiple machine learning and deep learning algorithms.
III. PROPOSED WORK Fig. 2. Count of Emotions in RAVDESS Dataset

The proposed model describes the efficient method of The research analysed the highest count of emotions, such
detecting emotions using machine learning algorithms, as as anger, sadness, fear, happiness, and disgust; each visual

508

shows a waveform and a spectrogram with frequency. They for emotional information; data augmentation describes the
explain waves: the figure on the left shows a wave that is the process of affectedly expanding the amount of data by
visual magnitude of an acoustic signal. The x-axis indicates manipulating existing data. Figure 8 shows the original voice;
time, and the y-axis represents amplitude. Peaks in the we have performed some standard data augmentation methods
waveform indicate where the sound is loudest (highest applied to voice emotion recognition, and Figure 8 shows the
amplitude) and troughs quietest (lowest amplitude). original voice.
Spectrogram with fundamental frequency: The graph on the
right is a spectrogram, which explains the representation of the
spectrum frequencies in sound or other signals as they vary
with time. Here again, time at x-3. Axis: the y-axis represents
frequency (in Hz) and colours at any given time. The brighter
the colour, which indicates the intensity or signal on each
frequency, the more energy there is. The cyan line that appears
to be traced through the centre of the spectrogram indicates
the dominant frequency at which the signal evolves, the lowest Fig. 8. Original Voice
frequency of the sound perceived as the pitch of the tone.
Figure 3 shows anger. While Figure 4 shows disgust, Figure 5 Noise Injection: Adding background noises to clean audio
fear, Figure 6 is happy, and Figure 7 is sad. samples helps to stabilize the image in real-world situations
with background noise, the sample voice, by injecting the
voice is as shown in Figure 9.

Fig. 3. Emotion: Angry on RAVDESS Dataset

Fig. 9. Noised Voice

Time Stretch/Voice Modification: Using this model, we can

recognize emotion even when the speech is different by
varying the pronunciation speed without affecting the
pronunciation rate (time dilation) or not changing the speed
Fig. 4. Emotion: Disgust on RAVDESS Dataset (modification), as shown in Figure 10.

Fig. 5. Emotion: Fear on RAVDESS Dataset

Fig. 10. Stretched Voice

Shifted Voice: In the context of audio processing, “ shifted

voice” refers to a change in the original pitch or tempo of a
voice recording, as shown in Figure 11.

Fig. 6. Emotion: Happy on RAVDESS Dataset

Fig. 11. Shifted Voice

Pitched Voice: The frequency at which the vocal cords

generate sound waves is thought to change the pitch of the
Fig. 7. Emotion: Sad on RAVDESS Dataset
voice, as shown in Figure 12.

C. Pre-processing
In the research, we have implemented the augmentation
method for pre-processing in the context of voice recognition

509

IV. RESULT ANALYSIS

Analysis of the RAVDESS data set using various machine
learning models provides in-depth analysis to check the
performance of each to detect the emotions using deep
learning models, especially CNN and LSTM, which showed
Fig. 12. Pitched Voice slightly better performance than ensemble methods such as RF
and GBM, as well as traditional classifiers like SVM -Metrics
D. Feature Extraction: also reinforce this, with CNN and LSTM. The CNN model
Mel Frequency Cepstral Coefficients (MFCCs) are used in balances precision and harmonic approaches to accuracy and
speech and audio processing. The MFCC includes several recall, as evidenced by its F1-score of 0.9. Similarly, LSTM
steps; the process of MFCC is applied to each implemented achieves a commendable 92.3% accuracy and 0.91 F1 score,
model in the proposed system; the sample results of MFCC highlighting the suitability of capturing temporal features in
are as shown in Figure13-a for anger, Figure13-b for disgust, audio data. Analysis of the outcome of models trained on the
Figure13-c for fear, Figure13-d for happiness, Figure13-e for RAVDESS dataset is necessary to assess the performance of
sad respectively. each algorithm in the context of emotion recognition. This
procedure compares the ability of each model to accurately
predict emotion state, which distinguishes them based on
features extracted from audio recordings; the proposed models
give an efficient comparison for the detection of emotions, as
shown in Table 1 and Figure 14.

TABLE I. COMPARATIVE ANALYSIS

(a) Angry (b) Disgust
Model Accuracy % Precision % Recall % F1-Score
CNN 0.91 0.9 0.91 0.9
SVM 0.85 0.84 0.86 0.85
LSTM 0.92.3 0.9 0.92 0.91
RF 0.87 0.86 0.88 0.87
GBM 0.89 0.88 0.9 0.89
(c) Fear (d) Happiness

COMPARATIVE ANALYSIS
PERFORMANCE METRIC

Accuracy Precision Recall F1-Score

0.92
0.92
0.91
0.91

0.91

0.89

0.89
0.9
0.9

0.9

0.9
0.88

0.88
0.87

0.87
0.86

0.86
0.85

0.85
0.84

(e):Sad
Fig. 13. (a) Angry, (b) Disgust, (c) Fear, (d) Happiness (e) Sad

E. Models Implemented: CNN SVM LSTM RF GBM

MODEL IMPLEMENTED
SVM (Support Vector Machine): Classifies voice emotion
states by finding the best boundary between emotion classes
in the RAVDESS dataset feature space. Fig. 14. Comparative Analysis
LSTM (Long Short-Term Memory): Predicts emotional
cues from voice data by learning time-dependence and speech
processing in RAVDESS sequences.
CNN (Convolutional Neural Network): Automatically
extracts peaks from spectrogram or MFCC of RAVDESS
audio to classify perceptual segmentation.
RF (random forest): Aggregates the decisions of multiple
decision tree classifiers trained on different subsets of the
RAVDESS data set to improve emotion state prediction from
voice data.
GBM (Gradient Boosting Machine): It also builds a series
of decision trees, where each tree learns to correct the errors
of the previous ones, which are applied to the RAVDESS
dataset to improve the tone sensitivity recognition.
Fig. 15. ROC curve of implemented models

510

Further the analysis of the proposed model is shown by the The confusion matrix describes the performance of the
ROC curve for each model. AUC values close to equal scores CNN model, which is used to distinguish between visually
of 1.00 across samples indicate good classification. The distinct emotions, such as 'happiness' and 'sadness' but
analysis of each model is explained as below: struggles with more subtle distinctions, such as 'fear' and
'shock', as shown in Figure 16-a. The SVM model, shown in
LSTM (long-term memory): The ROC curve of the LSTM Figure 16-b, which is known to perform well in high-
model slopes almost to the upper left, indicating a high area dimensional environments, has strongly defined limitations.
under the curve (AUC) of 0.99. This means a high true An LSTM model, as shown in Figure 16-c, that is adept at
positive for the range of decision thresholds of the LSTM processing sequences succeeds with regard to temporal
model. The rate is a low number and a false positive. sensitivity. The clustered random forest approach can lead to
CNN (Convolutional Neural Network): CNN has an robust overall performance with less apparent weaknesses.
absolute AUC of 1.00, indicating that it discriminates well The RF, as described in Figure 16-d, the diagonal cell,
between classes for all thresholds. predicted values match the actual value, and the darker cells
indicate a higher number of correct predictions. The GBM
SVM (Support Vector Machine): Like CNN, SVM also model, as Figure16-e, in capturing complex patterns but at risk
shows an AUC of 1.00, meaning that the positive and negative of overfitting, can potentially lose generalization. Black
classifications can be perfectly told apart. triangles along diagonals in their respective uncertainty
GBM (Gradient Boosting Machine): The ROC curve for matrices will indicate correct predictions. In contrast, any
GBM is in the upper left corner, indicating an AUC of 1.00, obvious diagonal excess pattern reflects systematic
indicating good performance. misclassification, such as 'silent' and 'neutral a neutral' or
'scared' conflated with 'surprised' balanced model respects
RF (Random Forest): RF has a ROC curve with an AUC accuracy in all emotions, maintains high levels of true
of 0.99, which is close to equal scores, indicating that it also positives, and reduces false positives and false information on
performs very well in discriminating between studies. negative as the most effective for emotion seeing this
The diagonal dashed line ultimately represents AUC = 0.5 particular task. The LSTM algorithm shows better results
for random classification. If the classifier's ROC curve regarding the confusion matrix than other machine learning
exceeds this line and moves towards the corner on the upper and deep learning models.
left, it appears strong. The test is more accurate when the slope
V. CONCLUSION
stays close to the left and upper edges of the ROC space. Thus,
we analyzed that LSTM performed better on the RAVDESS In summary, machine education image analysis is obtained
dataset for emotion detection than the other models. by analyzing the RAVDESS data in the experimental analysis.
Their superior displays evidence this CNN accomplished an
accuracy of 0.91 and an F1-score of 0.9. At the same time,
LSTM notched an impressive 92.3% accuracy and 0.91 F1-
score-sensitive audio data, highlighting their excellence in
capturing spatial and temporal aspects of the types of content.
The AUC values close to the model score of 1.00 indicate that
this model can discriminate well in sensitivity classification.
However, such high AUC values deserve to be interpreted
cautiously to ensure that they are not the result of overfitting
but rather the accuracy of the models' generalizability. The
(a) CNN (b) SVM
ROC curves further support the power of CNN and LSTM
models; an LSTM shows a high AUC of 0.99. Finally, the
outcome of the LSTM model on the RAVDESS dataset is
outstanding, indicating that it is considered more suitable for
sensing recognition tasks compared to its other existing modes;
the further future scope for the proposed model can be the
enhancement model implemented on multiple datasets, with
better accuracy as compared to the proposed model.
(c) LSTM (d) RF
REFERENCES
[1] S. Shreya, P. Likitha, G. Saicharan, and Dr. Shruti Bhargava Choubey,
“Speech Emotion Detection Through Live Calls,” International Journal
for Research in Applied Science & Engineering Technology
(IJRASET), vol. 11, no. 5, May 2023.
[2] R. Ullah et al., “Speech Emotion Recognition Using Convolution
Neural Networks and Multi-Head Convolutional Transformer,”
Sensors, vol. 23, no. 13, p. 6212, 2023.
[3] Q. Ouyang, “Speech emotion detection based on MFCC and CNN-
LSTM architecture,” in Proceedings of the 3rd International
Conference on Signal Processing and Machine Learning, Sichuan,
China, 2023.
(e) GBM
[4] G. Liu, S. Cai, and C. Wang, “Speech emotion recognition based on
Fig. 16. (a) CNN, (b) SVM, (c) LSTM, (d) RF (e) GBM emotion perception,” EURASIP Journal on Audio, Speech, and Music
Processing, vol. 1, no.1, p.1-10, 2023.
[5] M. C. Pentu Saheb, P. Sai Srujana, P. Lalitha Rani, and M. Siva Jyothi,
“Speech Emotion Recognition,” International Journal of Food and

511

Nutritional Sciences (IJFANS), vol.11, no. 12, pp.1920-1927, Dec. acoustic and text features in latent space,” in Proceedings of the 2022
2022. Asia-Pacific Signal and Information Processing Association Annual
[6] A. S. Alluhaidan, O. Saidani, R. Jahangir, and O. S. Neffati, “Speech Summit and Conference (APSIPA ASC), Chiang Mai, Thailand,
Emotion Recognition through Hybrid Features and Convolutional pp.1678-1683, 2022.
Neural Network,” Appl. Sci., vol.13, no.8, p. 4750, 2023. [16] S. M. B. R., S. B., S. L., and K. K., “Speech Based Emotion
[7] J. Singh, L. B. Saheer, and O. Faust, “Speech Emotion Recognition Recognition System,” International Journal of Engineering Technology
Using Attention Model,” Int. J. Environ. Res. Public Health, vol.20, and Management Sciences, vol.7, no.1, pp.332-337, 2023.
no.6, p.5140, 2023. [17] J. Indra, R. K. Shankar, and R. D. Priya, "Speech Emotion Recognition
[8] M. E. Seknedy and S. Fawzi, "Arabic English Speech Emotion Using Support Vector Machine and Linear Discriminant Analysis," in
Recognition System," in Proceedings of the 20th Learning and Intelligent Systems Design and Applications. ISDA 2022, A. Abraham,
Technology Conference (L&T), Jeddah, Saudi Arabia, pp.167-170, S. Pllana, G. Casalino, K. Ma, and A. Bajaj, Eds., vol. 715, no.1, 2023.
2023. [18] R. Aswani, A. Gawale, B. Dhawale, A. Shivade, N. Donde, and Prof.
[9] Q. Q. Oh, C. K. Seow, M. Yusuff, S. Pranata, and Q. Cao, “The Impact U. Tambe, “Speech Emotion Recognition,” International Journal of
of Face Mask and Emotion on Automatic Speech Recognition (ASR) Creative Research Thoughts (IJCRT), vol. 9, no. 5, May 2021.
and Speech Emotion Recognition (SER),” in Proceedings of the 8th [19] S. M. M. Naidu, V. Shinde, V. Kulkarni, A. Wadekar, and Y. A. Chavan,
International Conference on Cloud Computing and Big Data Analytics “Speech-based Emotion Recognition Methodologies,” The Ciencia &
(ICCCBDA), Chengdu, China, 2023. Engenharia - Science & Engineering Journal, vol. 11, no. 1, pp. 798-
[10] M. D. A. I. Majumder et al., “Human Speech Emotion Recognition 807, 2023.
Using CNN,” in Proceedings of the 25th International Conference on [20] R. Mittal, S. Vart, P. Shokeen and M. Kumar, “Speech Emotion
Computer and Information Technology (ICCIT), Cox's Bazar, Recognition,” 2022 2nd International Conference on Intelligent
Bangladesh, pp.25-30, 2022. Technologies (CONIT), Hubli, India, pp. 1-6, 2022.
[11] A. Sayar et al., “Emotion Recognition From Speech via the Use of [21] Kumar, Sandeep, Mohd Anul Haq, Arpit Jain, C. Andy Jason,
Different Audio Features, Machine Learning and Deep Learning Nageswara Rao Moparthi, Nitin Mittal, and Zamil S. Alzamil.
Algorithms,” Artificial Intelligence and Social Computing, vol. 72, "Multilayer Neural Network Based Speech Emotion Recognition for
no.1, pp.111-120, 2023. Smart Assistance." Computers, Materials & Continua 75, no. 1 (2023).
[12] N. T. Pham, S. D. Nguyen, V. S. T. Nguyen, B. N. H. Pham, and D. N. [22] Kumar, Sandeep, Sanjana Mathew, Navya Anumula, and K. Shravya
M. Dang, “Speech emotion recognition using overlapping sliding Chandra. "Portable camera-based assistive device for real-time text
window and Shapley additive explainable deep neural network,” recognition on various products and speech using android for blind
Journal of Information and Telecommunication, vol.7, no.3, pp.317- people." In Innovations in Electronics and Communication
335, 2023. Engineering: Proceedings of the 8th ICIECE 2019, pp. 437-448.
[13] S. Harsha Vardhan, M. P. Rahul, P. Kavyasri, and A. Sraavani, Springer Singapore, 2020.
“Emotion Recognition using Speech Signals,” International Journal of [23] Srilakshmi, Regula, Vidya Kamma, Shilpa Choudhary, Sandeep Kumar,
Advanced Research in Science, Communication and Technology and Munish Kumar. "Building an Emotion Detection System in Python
(IJARSCT), vol. 2, no. 3, pp.126-131, November 2022. Using Multi-Layer Perceptrons for Speech Analysis." In 2023 3rd
[14] K. Bhangale and M. Kothandaraman, “Speech Emotion Recognition International Conference on Technological Advancements in
Based on Multiple Acoustic Features and Deep Convolutional Neural Computational Sciences (ICTACS), pp. 139-143. IEEE, 2023.
Network,” Electronics, vol. 12, no. 4, p. 839, 2023.
[15] J. Santoso, R. Sekiguchi, T. Yamada, K. Ishizuka, T. Hashimoto, and S.
Makino, “Speech emotion recognition based on the reconstruction of

512

Authorized licensed use limited to: Nitte Meenakshi Institute of Technology. Downloaded on April 22,2025 at 06:20:11 UTC from IEEE Xplore. Restrictions apply.

Lo and Rs 2025 Grade 12 Preparatory Exam Timetable Draft
No ratings yet
Lo and Rs 2025 Grade 12 Preparatory Exam Timetable Draft
1 page
Pmes Repository Folder...
No ratings yet
Pmes Repository Folder...
32 pages
Project-PPT-Speech Emotion Recognition
85% (13)
Project-PPT-Speech Emotion Recognition
10 pages
Narrative Report in Elln
100% (7)
Narrative Report in Elln
2 pages
Speech Emotion Journal Phase 2-3
No ratings yet
Speech Emotion Journal Phase 2-3
6 pages
Speech Emotion Recognition Using Machine Learning
No ratings yet
Speech Emotion Recognition Using Machine Learning
14 pages
EMOTIONDETECTION (1) Mini Project
No ratings yet
EMOTIONDETECTION (1) Mini Project
5 pages
Sample GRADE 5
No ratings yet
Sample GRADE 5
3 pages
Review 3 PPT Final1)
No ratings yet
Review 3 PPT Final1)
51 pages
Research Paper
No ratings yet
Research Paper
5 pages
Speech Emotion Recognition Using Deep Learning Techniques: A Review
No ratings yet
Speech Emotion Recognition Using Deep Learning Techniques: A Review
19 pages
Comparison Between SVM Other Classifiers For Ser IJERTV2IS1457
No ratings yet
Comparison Between SVM Other Classifiers For Ser IJERTV2IS1457
6 pages
Sat - 82.Pdf - Election Prediction With Automated Speech Emotion Recognition
No ratings yet
Sat - 82.Pdf - Election Prediction With Automated Speech Emotion Recognition
11 pages
Emotion Detection Final Paper
No ratings yet
Emotion Detection Final Paper
15 pages
Real-Time Speech Emotion Recognition Using Deep Le
No ratings yet
Real-Time Speech Emotion Recognition Using Deep Le
40 pages
Zhao 2019
No ratings yet
Zhao 2019
12 pages
XEmoAccent Embracing Diversity in Cross-Accent Emo
No ratings yet
XEmoAccent Embracing Diversity in Cross-Accent Emo
19 pages
MiniProject 5
No ratings yet
MiniProject 5
11 pages
SECOND - s11042 023 16849 X
No ratings yet
SECOND - s11042 023 16849 X
18 pages
Pay Raise Letter Request
No ratings yet
Pay Raise Letter Request
2 pages
MS Thesis Final
No ratings yet
MS Thesis Final
47 pages
Winter Semester 2021-22 CSE4020-Machine Learning Digital Assignment-1
No ratings yet
Winter Semester 2021-22 CSE4020-Machine Learning Digital Assignment-1
20 pages
Deep Learning Based Emotion Recognition System Using Speech Features and Transcriptions
No ratings yet
Deep Learning Based Emotion Recognition System Using Speech Features and Transcriptions
12 pages
Set Conference Draft Paper - 223585
No ratings yet
Set Conference Draft Paper - 223585
6 pages
DL Emotion MFCC
No ratings yet
DL Emotion MFCC
6 pages
Speech Emotion Recognition Based On SVM Using Matlab PDF
No ratings yet
Speech Emotion Recognition Based On SVM Using Matlab PDF
6 pages
1 ST
No ratings yet
1 ST
23 pages
Research Paper On Speech Emotion Recogtion System
No ratings yet
Research Paper On Speech Emotion Recogtion System
9 pages
2019 BE Emotionrecognition ICESTMM19
No ratings yet
2019 BE Emotionrecognition ICESTMM19
8 pages
Recognition of Emotions in Speech Using Deep CNN A
No ratings yet
Recognition of Emotions in Speech Using Deep CNN A
18 pages
SER (Research Paper)
No ratings yet
SER (Research Paper)
5 pages
Speech Emotion Recognition With Deep Learning
No ratings yet
Speech Emotion Recognition With Deep Learning
5 pages
SER Final
No ratings yet
SER Final
10 pages
Serdl 2
No ratings yet
Serdl 2
10 pages
JETIR2106163
No ratings yet
JETIR2106163
5 pages
1 s2.0 S0003682X23002906 Main
No ratings yet
1 s2.0 S0003682X23002906 Main
11 pages
Wa0007
No ratings yet
Wa0007
6 pages
GROUP7 Researchpaper
No ratings yet
GROUP7 Researchpaper
9 pages
10 1109@access 2019 2936124
No ratings yet
10 1109@access 2019 2936124
19 pages
Speech Emotion Recognization
No ratings yet
Speech Emotion Recognization
65 pages
9 - Yogendra
No ratings yet
9 - Yogendra
5 pages
Sensors 23 06212 v2
No ratings yet
Sensors 23 06212 v2
20 pages
IJRPR4210
No ratings yet
IJRPR4210
12 pages
Deep Learning Structure For Emotion Prediction Using MFCC From Native Languages
No ratings yet
Deep Learning Structure For Emotion Prediction Using MFCC From Native Languages
13 pages
Human Speech Emotion Recognition Using Artificial Neural Networks Technique
No ratings yet
Human Speech Emotion Recognition Using Artificial Neural Networks Technique
7 pages
Enhanced Speech Emotion Detection Using Deep Neural Networks
No ratings yet
Enhanced Speech Emotion Detection Using Deep Neural Networks
14 pages
SPRINGERIJST
No ratings yet
SPRINGERIJST
11 pages
Final Presentation
No ratings yet
Final Presentation
50 pages
Speech Emotion Recognition Using Deep Learning
No ratings yet
Speech Emotion Recognition Using Deep Learning
6 pages
Unit 4
No ratings yet
Unit 4
14 pages
Chethana H N REPORT
No ratings yet
Chethana H N REPORT
12 pages
Speech-Emotion-Recognition Using SVM, Decision Tree and LDA Report
No ratings yet
Speech-Emotion-Recognition Using SVM, Decision Tree and LDA Report
7 pages
XEmoAccent Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning
No ratings yet
XEmoAccent Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning
18 pages
Speech Emotion Recognition Based On SVM Using MATLAB: March 2016
No ratings yet
Speech Emotion Recognition Based On SVM Using MATLAB: March 2016
7 pages
Speech Emotion Recognition: Submitted by Manoj Rajput 2019PEC5303
No ratings yet
Speech Emotion Recognition: Submitted by Manoj Rajput 2019PEC5303
11 pages
Machine Learning and Deep Learning Techniques For Emotion Recognition From Human Speech Using Acoustic Analysis
No ratings yet
Machine Learning and Deep Learning Techniques For Emotion Recognition From Human Speech Using Acoustic Analysis
10 pages
Multimodal Speech Emotion Recognition and Ambiguity Resolution
No ratings yet
Multimodal Speech Emotion Recognition and Ambiguity Resolution
9 pages
Speaker Emotion Recognition: Leveraging Self-Supervised Models For Feature Extraction Using Wav2Vec2 and Hubert
No ratings yet
Speaker Emotion Recognition: Leveraging Self-Supervised Models For Feature Extraction Using Wav2Vec2 and Hubert
9 pages
Electronics 12 00839 v2
No ratings yet
Electronics 12 00839 v2
17 pages
A Review On Speech Emotion Classification Using Linear Predictive Coding and Neural Networks
No ratings yet
A Review On Speech Emotion Classification Using Linear Predictive Coding and Neural Networks
5 pages
Speech Emotion Recognition Using Deep Learning
No ratings yet
Speech Emotion Recognition Using Deep Learning
6 pages
Speech Emotion Recognition With Deep Learning
No ratings yet
Speech Emotion Recognition With Deep Learning
5 pages
Deep Learning Report 1 3
No ratings yet
Deep Learning Report 1 3
3 pages
Human Emotion Detection With Speech Recognition Using Mel-Frequency Cepstral Coefficient and CNN - New
No ratings yet
Human Emotion Detection With Speech Recognition Using Mel-Frequency Cepstral Coefficient and CNN - New
2 pages
Speech Emotion Recognition Using Machine Learning
No ratings yet
Speech Emotion Recognition Using Machine Learning
8 pages
GenAI Curriculum (DataSpoof)
No ratings yet
GenAI Curriculum (DataSpoof)
4 pages
Unit 1 Lesson 1 Computer Systems
No ratings yet
Unit 1 Lesson 1 Computer Systems
8 pages
2024 Visual Communication Designs D
No ratings yet
2024 Visual Communication Designs D
46 pages
Air Canada SMS
No ratings yet
Air Canada SMS
42 pages
The Problem and Its Background: Thesis Title: Learning Virtues Through Literary Selections in English
No ratings yet
The Problem and Its Background: Thesis Title: Learning Virtues Through Literary Selections in English
12 pages
Grade 11 Mathematics Sequence and Series - Arithmetic Progression 2 Editable Lesson Plan
No ratings yet
Grade 11 Mathematics Sequence and Series - Arithmetic Progression 2 Editable Lesson Plan
2 pages
And He
No ratings yet
And He
1 page
Customer Relationship Management in Banking Sector
No ratings yet
Customer Relationship Management in Banking Sector
21 pages
My Internship Overview1
No ratings yet
My Internship Overview1
15 pages
Nadavant-ul-Ulma, Ali Garh and Deoband
No ratings yet
Nadavant-ul-Ulma, Ali Garh and Deoband
5 pages
Wind Turbine Design Project: Investigate
No ratings yet
Wind Turbine Design Project: Investigate
5 pages
Yoga Instructor Resume
100% (2)
Yoga Instructor Resume
5 pages
Dissertation Sukanya
No ratings yet
Dissertation Sukanya
53 pages
Program Evaluation Read 180
No ratings yet
Program Evaluation Read 180
15 pages
Pengaruh Perawatan Perianal Hygiene Dengan Minyak Zaitun Terhadap Pencegahan Ruam Popok Pada Bayi
No ratings yet
Pengaruh Perawatan Perianal Hygiene Dengan Minyak Zaitun Terhadap Pencegahan Ruam Popok Pada Bayi
9 pages
EXAMÉN EDUSOFT-Intermediate 1 Exit Test
No ratings yet
EXAMÉN EDUSOFT-Intermediate 1 Exit Test
7 pages
Stcgan Shadow
No ratings yet
Stcgan Shadow
10 pages
Gr. 11 Civic Mirror Assessment Outline For Students
No ratings yet
Gr. 11 Civic Mirror Assessment Outline For Students
4 pages
Case Based 1 - Week 9
No ratings yet
Case Based 1 - Week 9
3 pages
Assignment No.3
No ratings yet
Assignment No.3
8 pages
Summary of Promotional Vacancies 2024
No ratings yet
Summary of Promotional Vacancies 2024
2 pages
Dulcie September - A Voice of Reason
No ratings yet
Dulcie September - A Voice of Reason
3 pages
Banners
No ratings yet
Banners
2 pages
GENJ653BDUGKZNMVZZA78M6A8RK4Y3PLPDPNHSVL 6567d098
No ratings yet
GENJ653BDUGKZNMVZZA78M6A8RK4Y3PLPDPNHSVL 6567d098
2 pages
Machine Learning Algorithms for Data Scientists: An Overview
From Everand
Machine Learning Algorithms for Data Scientists: An Overview
Vinaitheerthan Renganathan
No ratings yet

Exploring The Effectiveness of Advanced Machine Learning Models in Speech Emotion Recognition

Uploaded by

Exploring The Effectiveness of Advanced Machine Learning Models in Speech Emotion Recognition

Uploaded by

2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE)

Exploring the Effectiveness of Advanced Machine

[email protected] [email protected] Vaddeswaram, India

Fig. 3. Emotion: Angry on RAVDESS Dataset

Fig. 9. Noised Voice

Time Stretch/Voice Modification: Using this model, we can

Fig. 5. Emotion: Fear on RAVDESS Dataset

Shifted Voice: In the context of audio processing, “ shifted

Fig. 6. Emotion: Happy on RAVDESS Dataset

Fig. 11. Shifted Voice

Pitched Voice: The frequency at which the vocal cords

IV. RESULT ANALYSIS

TABLE I. COMPARATIVE ANALYSIS

Accuracy Precision Recall F1-Score

E. Models Implemented: CNN SVM LSTM RF GBM

You might also like