
Arabic English Speech Emotion Recognition System

Mai El Seknedy, Sahar Fawzi
Biomedical Systems Group, Center of Informatics Science
Nile University
Giza, Egypt
[email protected], [email protected]

Abstract— The Speech Emotion Recognition (SER) system is an approach to identify individuals' emotions. This is important for human-machine interface applications and for the emerging Metaverse. This paper presents a bilingual Arabic-English speech emotion recognition system using the EYASE and RAVDESS datasets. A novel feature set was composed of spectral and prosodic parameters to obtain high performance at a low computational cost. Different machine learning classifiers were applied, including Multi-Layer Perceptron, Support Vector Machine, Random Forest, Logistic Regression, and Ensemble learning. The execution time of the proposed feature set was compared to the benchmarked "INTERSPEECH 2009" feature set. Promising results were obtained using the proposed feature set. SVM resulted in the best emotion recognition rate and execution performance. The best accuracies achieved were 85% on RAVDESS and 64% on EYASE. Ensemble learning detected the valence emotion with 90% on RAVDESS and 87.6% on EYASE.

Keywords: Bilingual speech emotion recognition, Cross corpus, Mel frequency cepstral coefficients, prosodic features

I. INTRODUCTION

Speech Emotion Recognition (SER) has a wide range of applications in human-interacting systems to enhance the interactive experience. Useful applications include human-computer interaction [1], the emerging technology of the Metaverse [2], call centers [3], medical applications [4], autonomous vehicles [5], e-learning engagement evaluation [6], and commercial applications [7]. Most of these applications allow the user to choose the language to use. In Egypt and most Arab countries, applications provide Arabic and English language choices.

An Arabic-English SER can be integrated with online customer support services to predict clients' satisfaction [8]. This will improve the quality of services by analyzing the client's psychological attitude and taking the needed actions on the spot. Furthermore, e-learning may benefit from tracking the emotional status of students/attendants, which will improve the instructors' communication skills [9].

This paper introduces an SER model based on a novel feature set to identify the emotional status of the speaker. The model performance is validated using the EYASE dataset for Arabic [8] and the RAVDESS dataset for English [10]. The performance of the proposed feature set was compared with that of the benchmarked IS09 feature set (based on the INTERSPEECH 2009 Emotion Challenge) [11]. Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Simple Logistic Regression (SLR), and Random Forest (RF) machine learning classification models were used [12]. The performance of SER was analyzed using 10 folds to ensure model generalization and stability. Three evaluation metrics were used: accuracy, recall, and precision.

This paper is organized as follows: a literature review of the evolution of SER is presented in Section 2, the methodology applied with the proposed feature set is presented in Section 3, experiments and results are presented in Section 4, and finally, Section 5 presents the conclusion and the proposed future work.

II. LITERATURE REVIEW

A significant interest in SER research has evolved over the past two decades. Several acted, elicited, and non-acted datasets in different languages are now available for use in SER systems [7]. Features representing different domains have been used in SER systems. Prosodic features, which describe the speech intonation, rhythm, and pitch trajectories, were the main components of the proposed feature sets [13]. Spectral features such as contrast, bandwidth, centroid, signal energy (RMS), and Mel spectrogram features were also used extensively. The most widely used features in the SER domain are the Mel-frequency cepstral coefficients (MFCC), as they represent the natural speech perception of humans [14-15].

MFCC features and spectrogram images of the audio signals were used to train Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) deep neural network systems [16]. Other important features include linear prediction coefficients (LPC) and voice quality features such as jitter and shimmer [17,18].

Different classification algorithms have been applied in the SER domain, such as Hidden Markov Models (HMM) [7], Gaussian Mixture Models (GMM), tree-based models (Random Forest) [19], Support Vector Machine (SVM) [20,21], K-Nearest Neighbor (KNN) [22], Logistic Regression [23], and Artificial Neural Networks (ANN) [24].



Artificial neural networks such as CNNs, LSTMs, auto-encoders, RNNs, and attention-based models are currently the dominant stream in SER [25]. Transfer learning through a pre-trained CNN model was applied using spectrogram images of the speech [26]. Multimodal systems were also implemented: the integration of speech and text for emotion classification was introduced in [27], and speech and visual images were combined for emotion recognition, as presented in [28].

III. METHODOLOGY
Two datasets were used to evaluate the efficiency of the proposed feature set against the benchmarked INTERSPEECH 2009 feature set.

A. Datasets

RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) is a dynamic dataset of lexically matched statements in an American accent. Twenty-four actors (12 male and 12 female) acted eight emotions: angry, happy, neutral, sad, calm, fearful, surprise, and disgust. Each expression is recorded at two emotional intensity levels, in addition to neutral. It consists of 1440 utterances in .wav format with a sampling rate of 48 kHz [10].

EYASE (Egyptian Arabic Semi-Natural Emotion) is a speech dataset that includes 579 statements representing four basic emotions (angry, happy, neutral, and sad), pronounced by 3 male and 3 female professional actors. The statements were extracted from an Egyptian drama series. The files are in .wav format with a sampling rate of 44.1 kHz [8].
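For illustration, a minimal loading sketch (not part of the paper) is shown below. It assumes the standard RAVDESS filename convention, in which the third dash-separated field encodes the emotion, and it loads the audio at the native 48 kHz rate; EYASE would need its own label-parsing logic, which depends on how that corpus is organized.

```python
import glob
import os
import librosa

# Standard RAVDESS filename convention (assumption): the third dash-separated
# field encodes the emotion, e.g. 03-01-05-01-02-01-12.wav -> emotion code "05".
RAVDESS_EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
                    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise"}

def load_ravdess(root_dir, sr=48000):
    """Yield (waveform, emotion) pairs for every RAVDESS .wav file under root_dir."""
    for path in glob.glob(os.path.join(root_dir, "**", "*.wav"), recursive=True):
        emotion_code = os.path.basename(path).split("-")[2]
        signal, _ = librosa.load(path, sr=sr)   # RAVDESS is recorded at 48 kHz
        yield signal, RAVDESS_EMOTIONS[emotion_code]
```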
B. Features extraction

As mentioned in the literature review, prosodic and spectral features are commonly used for emotion detection [8,18]. The proposed feature set consists of prosodic, spectral, and statistical parameters and was developed using Librosa [29] and pYAAPT (a pitch tracker tool for Python users) [30]. The Chi-square test was applied to select the most significant features, i.e. those with the best Chi-square scores [31]. Pitch features, the Mel spectrogram, and MFCCs showed high Chi-square scores, which indicates a high impact on the results. Fig. 1 shows the top Chi-square scores for the selected features.

Fig. 1. Chi-square scores for the top-ranked proposed features

Further details about the proposed feature set and the benchmarked INTERSPEECH 2009 Emotion Challenge feature set (IS09) [11], obtained using openSMILE [32], are displayed in Table I.

TABLE I. FEATURES' SET DESCRIPTION

IS09 feature set (384 features):
- Tool used: openSMILE
- Components: RMS, 12 MFCC, ZCR, voicing probability, and fundamental frequency (F0)
- Statistical functions: min, max, mean, range, standard deviation, maxPos, minPos, linregc1, linregc2, linregerrQ, skewness, and kurtosis

Proposed feature set (122 features):
- Tool used: Librosa + pYAAPT
- Components: RMS, 14 MFCC, 8 Mel-spectrogram, ZCR, 12 Chroma, Tonnetz, 8 Contrast, fundamental frequency (F0), pitch contour, and signal's low-frequency band mean energy (SLFME)
- Statistical functions: min, max, standard deviation, mean, range, and percentiles (25, 50, 75, 90)
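The sketch below illustrates this kind of pipeline under stated assumptions; it is not the authors' code. The helper extract_features is hypothetical, it computes only a subset of the Table I descriptors (the pYAAPT pitch features are omitted for brevity), and the Chi-square selection uses scikit-learn's SelectKBest, which requires non-negative inputs, hence the min-max scaling of the feature matrix before scoring.

```python
import numpy as np
import librosa
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

def extract_features(signal, sr):
    """Hypothetical per-utterance feature vector: statistics of a few of the
    spectral descriptors listed in Table I (not the full 122-dimensional set)."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=14)
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=8)
    zcr = librosa.feature.zero_crossing_rate(y=signal)
    rms = librosa.feature.rms(y=signal)
    feats = np.concatenate([mfcc, mel, zcr, rms], axis=0)      # (n_descriptors, n_frames)
    stats = [feats.min(1), feats.max(1), feats.mean(1), feats.std(1),
             np.percentile(feats, 75, axis=1) - np.percentile(feats, 25, axis=1)]
    return np.concatenate(stats)                               # one vector per utterance

def select_top_features(X, y, k=40):
    """Rank features with the Chi-square test; X is utterances x features, y the labels."""
    X_pos = MinMaxScaler().fit_transform(X)       # chi2 requires non-negative values
    selector = SelectKBest(chi2, k=k).fit(X_pos, y)
    return selector.get_support(indices=True), selector.scores_
```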
C. Feature Scaling

Different normalization techniques have been used in the literature, such as the Standard Scaler and the Minimum-Maximum Scaler (MMS) [23],[26]. The Minimum-Maximum Scaler (MMS) method was adopted, as given by Eq. (1):

X_scaled = (X - X_min) / (X_max - X_min)                  (1)

where X is an input feature and X_min and X_max are the minimum and maximum values of that feature.
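Eq. (1) corresponds to scikit-learn's MinMaxScaler; a minimal sketch follows (X_train and X_test are assumed to be the feature matrices produced earlier). The scaler should be fit on the training folds only and then applied to the test fold to avoid information leakage.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                          # implements X' = (X - X_min) / (X_max - X_min)
X_train_scaled = scaler.fit_transform(X_train)   # learn per-feature min/max on training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training min/max on the test fold
```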



D. Machine Learning Models

Four classification techniques were considered. The Support Vector Machine (SVM) was selected for its high performance on higher-dimensional data such as audio data. A Random Forest tree-based ensemble classifier of 500 decision trees with a maximum tree depth of 20 was implemented. The Logistic Regression algorithm was also used to analyze the linear model's performance. Finally, a 3-layered feedforward neural network, the Multi-Layer Perceptron (MLP), was applied.

For hyperparameter tuning, the GridSearchCV method was used to fine-tune the classifiers' parameters. For SVM, the kernel function used is the radial basis function (rbf), the decision function shape is set to one-vs-rest (ovr) with shape (n_samples, n_classes), and the regularization parameter C is set to 10 (the regularization strength is inversely proportional to C). For the Random Forest, n_estimators was set to 500 with a maximum tree depth of 20 and the "entropy" criterion (the function that measures the quality of a split), whereas the "lbfgs" solver with the l2 penalty was used for Logistic Regression, with the maximum number of solver iterations set to 1000 (max_iter). For MLP, the number of neurons in the hidden layer was 400, the solver used was 'adam', the activation function was set to the default 'relu', the mini-batch size for the stochastic optimizer was 5 (batch_size), and the learning rate schedule was 'constant'.
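As an illustration of this configuration, the sketch below sets up the four scikit-learn estimators with the parameters listed above and runs GridSearchCV for the SVM only; the search grid itself is merely an example, since the paper does not list the full search space, and X_train_scaled/y_train are assumed to have been prepared beforehand.

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

models = {
    "svm": SVC(kernel="rbf", C=10, decision_function_shape="ovr"),
    "rf": RandomForestClassifier(n_estimators=500, max_depth=20, criterion="entropy"),
    "lr": LogisticRegression(solver="lbfgs", penalty="l2", max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(400,),   # one hidden layer of 400 neurons
                         solver="adam", activation="relu",
                         batch_size=5, learning_rate="constant"),
}

# Illustrative grid for the SVM; the other classifiers can be tuned the same way.
param_grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovr"),
                      param_grid, cv=10, scoring="accuracy")
search.fit(X_train_scaled, y_train)        # X_train_scaled, y_train prepared earlier
print(search.best_params_, search.best_score_)
```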
E. Evaluation Metrics

The 10-fold cross-validation was applied to ensure statistical stability and generalization of the model. In 10-fold cross-validation, the database is randomly partitioned into 10 equal-size subsamples. Of the 10 subsamples, one subsample (10% of the database) is used as the testing data to validate the classification model, and the remaining 9 subsamples are used as training data. The reported accuracy is the average over the 10 folds.

We used 4 evaluation metrics during our experiments.

Accuracy: gives an overall measure of the percentage of correctly classified instances.

Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn)                (2)

where
Tp: true positives (positive examples predicted positive)
Tn: true negatives (negative examples predicted negative)
Fp: false positives (negative examples predicted positive)
Fn: false negatives (negative examples predicted negative)

Precision: measures the true positive cases relative to all positively predicted emotional classes.

Precision = Tp / (Tp + Fp)                                (3)

Recall: shows how many of the actual positive emotional classes were correctly predicted.

Recall = Tp / (Tp + Fn)                                   (4)

Confusion Matrix: a representation used to analyze the model's performance by comparing the actual and predicted labels.
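A short sketch of how these metrics can be collected with 10-fold cross-validation in scikit-learn is given below. Macro averaging over the emotion classes is an assumption (the paper does not state the averaging mode), and X_scaled/y are assumed to be the scaled feature matrix and labels from the previous steps.

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate, cross_val_predict
from sklearn.metrics import confusion_matrix

clf = SVC(kernel="rbf", C=10)
scoring = {"accuracy": "accuracy",
           "precision": "precision_macro",   # assumption: macro-averaged over the emotion classes
           "recall": "recall_macro"}

cv_results = cross_validate(clf, X_scaled, y, cv=10, scoring=scoring)
print({name: cv_results["test_" + name].mean() for name in scoring})   # fold-averaged scores

# Confusion matrix built from out-of-fold predictions
y_pred = cross_val_predict(clf, X_scaled, y, cv=10)
print(confusion_matrix(y, y_pred))
```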
IV. RESULTS AND DISCUSSION

This section elaborates on the classification models' results for emotion recognition and compares this work with previous related research. The recognition performance was analyzed for the classifiers MLP, SVM, Random Forest, and Logistic Regression on two datasets in two different languages (English and Arabic): RAVDESS and EYASE, respectively. 10-fold cross-validation was used for evaluation to ensure the model's generalization and stability.

A. Single Corpus Multi Emotions Classification

The models are trained and tested with the same language. Close results were obtained by applying the different classifiers on the two feature sets, as shown in Table II.

TABLE II. SINGLE CORPUS MULTI-EMOTION CLASSIFICATION USING 10 FOLDS (ANGRY/HAPPY/NEUTRAL/SAD)

Dataset | Feature set | Metric    | MLP  | SVM  | Random Forest | Logistic Regression | Ensemble Learning
RAVDESS | Proposed    | Accuracy  | 78.3 | 85.4 | 76.2 | 70.8 | 79.4
RAVDESS | Proposed    | Precision | 77.5 | 83.2 | 76.5 | 70.7 | 78.5
RAVDESS | Proposed    | Recall    | 76.7 | 84.8 | 74.6 | 68.1 | 77.7
RAVDESS | IS09        | Accuracy  | 81.2 | 84.7 | 74.6 | 80.8 | 82.7
RAVDESS | IS09        | Precision | 81   | 84.4 | 76.8 | 80.7 | 83.8
RAVDESS | IS09        | Recall    | 82.1 | 83.2 | 71.4 | 81.2 | 82.2
EYASE   | Proposed    | Accuracy  | 64.6 | 64.1 | 62.5 | 61.3 | 64
EYASE   | Proposed    | Precision | 64.7 | 64.7 | 60   | 60.3 | 63.7
EYASE   | Proposed    | Recall    | 63.9 | 63   | 61   | 60.4 | 62.6
EYASE   | IS09        | Accuracy  | 61   | 64.6 | 61.5 | 63   | 64.2
EYASE   | IS09        | Precision | 60   | 64.5 | 60.6 | 62   | 64
EYASE   | IS09        | Recall    | 60   | 63.7 | 60.7 | 62   | 62.7

B. Cross Corpus Multi Emotion Classification

The models are trained with both languages and tested with one language at a time.
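A possible realization of this protocol is sketched below (an illustration, not the authors' exact procedure, which reports 10-fold results); it assumes the English and Arabic feature matrices and labels have already been extracted and share the same four emotion classes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def cross_corpus_eval(X_en, y_en, X_ar, y_ar, test_size=0.2, seed=0):
    """Train on a mix of both corpora, then test on the held-out part of each corpus separately."""
    Xe_tr, Xe_te, ye_tr, ye_te = train_test_split(X_en, y_en, test_size=test_size,
                                                  stratify=y_en, random_state=seed)
    Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X_ar, y_ar, test_size=test_size,
                                                  stratify=y_ar, random_state=seed)
    clf = SVC(kernel="rbf", C=10)
    clf.fit(np.vstack([Xe_tr, Xa_tr]), np.concatenate([ye_tr, ya_tr]))  # bilingual training set
    return clf.score(Xe_te, ye_te), clf.score(Xa_te, ya_te)             # per-language accuracy
```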



TABLE III. CROSS CORPUS MULTI-EMOTION CLASSIFICATION USING PROPOSED FEATURE SET (ANGRY/HAPPY/NEUTRAL/SAD)

Dataset | Feature set | Metric    | MLP  | SVM  | Random Forest | Logistic Regression | Ensemble Learning
RAVDESS | Proposed    | Accuracy  | 65.6 | 66.3 | 63.8 | 63.6 | 66.2
RAVDESS | Proposed    | Precision | 63   | 64   | 62.7 | 63.5 | 64
RAVDESS | Proposed    | Recall    | 63.4 | 65.8 | 63.4 | 63.2 | 65.2
RAVDESS | IS09        | Accuracy  | 64.6 | 64.3 | 64.8 | 63.8 | 65.7
RAVDESS | IS09        | Precision | 62.8 | 64.6 | 62.3 | 64   | 64.6
RAVDESS | IS09        | Recall    | 63   | 65.4 | 65   | 63.8 | 65.7
EYASE   | Proposed    | Accuracy  | 61.6 | 62.1 | 60.5 | 59.3 | 62
EYASE   | Proposed    | Precision | 61.3 | 61   | 60.2 | 58.3 | 62.7
EYASE   | Proposed    | Recall    | 60.9 | 61.6 | 61   | 59.4 | 62.2
EYASE   | IS09        | Accuracy  | 61.7 | 62.6 | 59.5 | 59.6 | 61.2
EYASE   | IS09        | Precision | 62.6 | 61.5 | 58.6 | 57.3 | 60.7
EYASE   | IS09        | Recall    | 60.8 | 61.7 | 58.7 | 58.5 | 61.7

Since the main objective of this bilingual proposed system is to detect the speaker's emotion through the duration of the talk, the valence-arousal emotion classification was considered, as shown in Table IV. Valence describes the emotion as positive/satisfying or negative/dissatisfying, while arousal describes the strength of the emotion [33].
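For illustration only, binary valence labels could be derived from the categorical emotions with a mapping like the one below; the paper does not spell out its mapping, so the assignment of each emotion (neutral in particular) to a valence class is an assumption.

```python
# Hypothetical emotion-to-valence mapping (assumption; not stated explicitly in the paper)
VALENCE = {"happy": "positive",
           "neutral": "positive",   # neutral might instead be excluded or treated separately
           "angry": "negative",
           "sad": "negative"}

valence_labels = [VALENCE[e] for e in emotion_labels]   # emotion_labels built during data loading
```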
TABLE IV. VALENCE EMOTION CLASSIFICATION

Dataset | Metric    | MLP  | SVM  | Random Forest | Logistic Regression | Ensemble Learning
RAVDESS | Accuracy  | 88.8 | 89.8 | 86.8 | 84.2 | 90
RAVDESS | Precision | 88.3 | 88.2 | 87   | 84.1 | 89.5
RAVDESS | Recall    | 88.2 | 87.3 | 87.2 | 84.4 | 89.7
EYASE   | Accuracy  | 85   | 86.5 | 85.8 | 82.2 | 87.6
EYASE   | Precision | 85   | 86.2 | 85.9 | 82   | 88
EYASE   | Recall    | 85.2 | 86.3 | 85.8 | 82.2 | 86.5
V. CONCLUSION

In this paper, a novel speech feature set was used to train different ML models. The impact of each feature was studied, and it was found that MFCC is one of the most dominant features across the used classifiers. SVM proved to be the optimum SER classifier for its accuracy and efficiency. MLP is a very promising classifier, but it needs a long training time. Promising results were reached: the recognition rates of the single-corpus multi-emotion classification system were 85% for RAVDESS and 64.6% for EYASE using SVM and MLP respectively, and the recognition rates of the cross-corpus multi-emotion classification system were 66% for RAVDESS and 62% for EYASE using SVM.

REFERENCES

[1] A. J. and R. A. L. Matsane, "The use of Automatic Speech Recognition in education for identifying attitudes of the Speakers," in IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020.
[2] S.-M. Park and Y.-G. Kim, "A Metaverse: Taxonomy, Components, Applications, and Open Challenges," IEEE Access, vol. 10, pp. 4209-4251, 2022, doi: 10.1109/ACCESS.2021.3140175.
[3] E. Blumentals and A. Salimbajevs, "Emotion Recognition in Real-World Support Call Center Data for Latvian Language," CEUR Workshop Proceedings, vol. 3124, 2022.
[4] M. A. Rashidan et al., "Technology-Assisted Emotion Recognition for Autism Spectrum Disorder (ASD) Children: A Systematic Literature Review," IEEE Access, vol. 9, pp. 33638-33653, 2021.
[5] L. Tan et al., "Speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 3, pp. 2830-2842, March 2022, doi: 10.1109/TITS.2021.3119921.
[6] Y. Du, R. G. Crespo, and O. S. Martínez, "Human emotion recognition for enhanced performance evaluation in e-learning," Progress in Artificial Intelligence, pp. 1-13, 2022, doi: 10.1007/S13748-022-00278-2.
[7] M. B. Akçay and K. Oğuz, "Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers," Speech Commun., vol. 116, pp. 56-76, 2020.
[8] L. Abdel-Hamid, "Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features," Speech Commun., vol. 122, pp. 19-30, 2020.
[9] S. Mirsamadi, E. Barsoum, and C. Zhang, "Automatic speech emotion recognition using recurrent neural networks with local attention," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 2227-2231.
[10] S. Livingstone and F. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)," vol. 13, 2018.
[11] B. Schuller, S. Steidl, and A. Batliner, "The INTERSPEECH 2009 emotion challenge," INTERSPEECH, 2010.
[12] S. G. Koolagudi, Y. V. S. Murthy, and S. P. Bhaskar, "Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition," Int. J. Speech Technol., vol. 21, no. 1, pp. 167-183, 2018.
[13] M. El Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognit., vol. 44, no. 3, pp. 572-587, 2011.
[14] S. Lalitha, D. Geyasruti, R. Narayanan, and M. Shravani, "Emotion Detection Using MFCC and Cepstrum Features," Procedia Comput. Sci., vol. 70, pp. 29-35, 2015.
[15] K. A. Araño, P. Gloor, C. Orsenigo, and C. Vercellis, "When Old Meets New: Emotion Recognition from Speech Signals," Cognit. Comput., 2021.
[16] J. Ancilin and A. Milton, "Improved speech emotion recognition with Mel frequency magnitude coefficient," Applied Acoustics, vol. 179, p. 108046, 2021.
[17] Mustaqeem and S. Kwon, "A CNN-assisted enhanced audio signal processing for speech emotion recognition," Sensors (Switzerland), vol. 20, no. 1, 2020.
[18] A. Koduru, H. B. Valiveti, and A. K. Budati, "Feature extraction algorithms to improve the speech emotion recognition rate," Int. J. Speech Technol., vol. 23, no. 1, pp. 45-55, 2020.
[19] N. Vryzas, L. Vrysis, M. Matsiola, R. Kotsakis, C. Dimoulas, and G. Kalliris, "Continuous Speech Emotion Recognition with Convolutional Neural Networks," J. Audio Eng. Soc., vol. 68, no. 1/2, pp. 14-24, 2020.
[20] A. Bhavan, P. Chauhan, Hitkul, and R. R. Shah, "Bagged support vector machines for emotion recognition from speech," Knowledge-Based Syst., vol. 184, p. 104886, 2019.
[21] J. Parry, D. Palaz, G. Clarke, P. Lecomte, R. Mead, M. A. Berger, and Hofer, "Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition," Interspeech, 2019.
[22] W. Zehra, A. R. Javed, Z. Jalil, H. U. Khan, and T. R. Gadekallu, "Cross corpus multi-lingual speech emotion recognition using ensemble learning," Complex Intell. Syst., vol. 7, pp. 1845-1854, 2021.
[23] S. Goel and H. Beigi, "Cross-Lingual Cross-Corpus Speech Emotion Recognition," arXiv, 2020.
[24] Z. Peng, Y. Lu, S. Pan, and Y. Liu, "Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3020-3024.
[25] N.-C. Ristea, L. C. Duţu, and A. Radoi, "Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks," in 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2019, pp. 1-6.
[26] Z. T. Liu, A. Rehman, M. Wu, W. H. Cao, and M. Hao, "Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence," Inf. Sci. (Ny)., vol.
[27] M. Caschera, P. Grifoni, and F. Ferri, "Emotion Classification from Speech and Text in Videos Using a Multimodal Approach," Multimodal Technologies and Interaction, 6(4):28, April 2022, doi: 10.3390/mti6040028.
[28] Y. Li, Q. He, Y. Zhao, and H. Yao, "Multi-modal Emotion Recognition Based on Speech and Image," Advances in Multimedia Information Processing, May 2018, doi: 10.1007/978-3-319-77380-3_81.
[29] https://librosa.org/
[30] http://bjbschmitt.github.io/AMFM_decompy/pYAAPT.html
[31] https://www.analyticsvidhya.com/blog/2020/10/feature-selection-techniques-in-machine-learning/
[32] https://www.audeering.com/research/opensmile/
[33] https://cxl.com/blog/valence-arousal-and-how-to-kindle-an-emotional-fire/, last updated: Aug 25, 2022.

979-8-3503-0030-7/23/$31.00 ©2023 IEEE
