Speech Emotion Recognition For Enhanced User Experience: A Comparative Analysis of Classification Methods
Abstract:- Speech recognition has gained significant importance in facilitating user interactions with various technologies. Recognizing human emotions and affective states from speech, known as Speech Emotion Recognition (SER), has emerged as a rapidly growing research subject. Unlike humans, machines lack the innate ability to perceive and express emotions. Therefore, leveraging speech signals for emotion detection has become an adaptable and accessible approach. This paper presents a project aimed at classifying emotional states in speech for applications such as call centers, measuring emotional attachment in phone calls, and real-time emotion recognition in online learning. The classification methods employed in this study include Support Vector Machines (SVM), Logistic Regression (LR), and Multi-Layer Perceptron (MLP). The project utilizes features such as Mel-frequency cepstrum coefficients (MFCC), chroma, and mel to extract relevant information from speech signals and train the classifiers. Through a comparative analysis of these classification methods, this research aims to enhance the understanding of speech emotion recognition and contribute to the development of more effective and accurate emotion recognition systems.

Keywords:- Speech Emotion Recognition (SER), Emotion Classification, Support Vector Machines (SVM), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Mel-frequency Cepstrum Coefficients (MFCC), Chroma, Mel Features.

I. INTRODUCTION

Speech recognition has become increasingly important in recent years as a means of assisting users with ease of use. Several well-known technology companies, including Google, Samsung, and Apple, have used speech recognition to convert human speech into sentences so that their customers may quickly navigate around their products. Unlike humans, machines lack the ability to perceive and express emotions. Speech, physiological signals, facial expressions, and other modalities can all be used to detect emotions. Speech signals are far more adaptable and simple to acquire than other modalities. Mel-frequency cepstrum coefficients (MFCC), chroma, and mel features are extracted from the speech signals and used to train the classifiers.

Our project aims to classify the emotional state of speech, which can be used in a number of applications such as call centers, measuring the degree of emotional attachment from phone calls, and real-time emotion recognition in online learning. Three classification methods are used in this project for analyzing the emotions (calm, happy, fearful, angry, disgust, surprised): SVM, Logistic Regression (LR), and Multi-Layer Perceptron (MLP).

Motivations for Doing the Project

In today's world, identifying the emotion exhibited in a spoken utterance has various applications. Human-Computer Interaction (HCI) is a branch of study that looks into how humans and computers interact with each other. A computer system that understands more than simply words is required for an efficient HCI application. Voice-based inputs are used by several real-world IoT applications, including Amazon Alexa, Google Home, and Mycroft; in IoT applications, voice plays a critical role. According to a recent survey, about 12% of all IoT applications will be completely functional by 2022. Self-driving automobiles are one example of an emerging field that uses voice commands to operate several of its tasks. In emergency scenarios where the user may be unable to offer a clear spoken command, the emotion communicated through the user's tone of voice can be used to activate specific car emergency functions.

Objectives

The primary objective of speech emotion recognition is to improve the human-machine interaction interface by detecting the emotional state of a person from their speech.
Fig 1 Methodology
The above figure shows the general flow-chart of our project, which consists of 5 different phases. The data is stored in files in the project directory. The files are loaded using different Python libraries, and unnecessary files are removed. We then extract different features of the sound files, such as MFCC, mel, and chroma, which are used as inputs for the classifier function. The dataset is then divided into two different sets: a training set and a testing set. We then build different classifier models and train them using the training set. After that, we use the testing set for the evaluation and accuracy calculation of each model.

Phase 1: Data Collection

The RAVDESS dataset is used in the project. The dataset is downloaded into our system. Audio files in the directory are loaded using the following libraries: os, glob, and soundfile.

We use the glob module, which finds all path names matching a specified pattern, because the dataset consists of audio files named in a specific pattern that encodes the emotion value directly in the file name. The os module is used to get the base name of each file. Then, using the soundfile library, we read each sound file along with the sample rate of the audio.
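The following is a minimal sketch of the loading and feature-extraction steps described above. The "ravdess/Actor_*" folder layout, the variable names, and the use of the librosa library for the MFCC, chroma, and mel features are illustrative assumptions; the paper itself names only os, glob, and soundfile.

import glob
import os

import librosa
import numpy as np
import soundfile
from sklearn.model_selection import train_test_split

# RAVDESS encodes the emotion as the third dash-separated field of the
# file name, e.g. "03-01-06-01-02-01-12.wav" -> code "06" (fearful).
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def extract_features(path):
    audio, sample_rate = soundfile.read(path)  # samples and sample rate
    if audio.ndim > 1:
        audio = audio.mean(axis=1)             # down-mix stereo to mono
    stft = np.abs(librosa.stft(audio))
    # Average each feature over time to obtain one fixed-length vector.
    mfcc = np.mean(librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=audio, sr=sample_rate).T, axis=0)
    return np.hstack([mfcc, chroma, mel])

features, labels = [], []
for path in glob.glob("ravdess/Actor_*/*.wav"):   # assumed folder layout
    code = os.path.basename(path).split("-")[2]   # emotion code from file name
    features.append(extract_features(path))
    labels.append(EMOTIONS[code])

# Divide the dataset into training and testing sets, as in the flow-chart.
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42)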
SVM
SVMs (Support Vector Machines) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. It is a supervised machine-learning model for linearly separable binary sets. The goal of this model is to compute a hyperplane that correctly classifies all training vectors. After creating the hyperplane, the next step is to maximize the margin between the hyperplane and the nearest data points; these nearest points are called support vectors.
Fig 3 SVM
kernel="linear",
C=1
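Given the features and labels from the train/test split sketched earlier (the x_train and y_train names are illustrative), the classifier is then fit in the usual scikit-learn way:

model.fit(x_train, y_train)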
Phase 4: Evaluation
Evaluation of the experiment involves comparing the classification report and accuracy of each model. It includes comparing the accuracy both across the multiple experiments run for each algorithm and across the different algorithms.
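As a sketch of this step, scikit-learn's accuracy_score and classification_report can produce the numbers being compared, reusing the fitted model and the x_test/y_test split from above (the variable names are illustrative):

from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(x_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1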
In this phase, different algorithm models are used for classifying the emotions: the MLP Classifier, Logistic Regression, and SVM.
The table below compares the different algorithms; it contains the best result of each algorithm after performing multiple experiments:
MLP has the best accuracy, F1-score, precision, and recall when it has a single hidden layer with 600 hidden units. SVM gives its best results with a linear kernel and a value of 1 for the regularization parameter, and LR gives its best results with newton-cg as the optimization solver.
Comparing the models, we get the best results with the MLP Classifier, while the other two, SVM and LR, give similar results.
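A sketch of the three best-performing configurations reported above, written against scikit-learn (the max_iter values are assumptions added so that the MLP and newton-cg solvers converge; they are not stated in the paper):

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

mlp = MLPClassifier(hidden_layer_sizes=(600,), max_iter=500)  # one hidden layer, 600 units
svm = SVC(kernel="linear", C=1)                               # linear kernel, C = 1
lr = LogisticRegression(solver="newton-cg", max_iter=500)     # newton-cg solver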
VIII. CODE
Code snippets for the implementation of the different methods have been specified and discussed above.
https://github.com/SamjhanaP/speechemotionrecognition