Speech Emotion Detection Using Machine Learning

ABSTRACT

The speech signal is one of the most natural and fastest methods of communication between humans, and many systems have been developed by various researchers to identify emotions from the speech signal. Which speech features are most useful for differentiating between various emotions is not clear, and this is what makes emotion recognition from a speaker's speech very difficult. A number of datasets are available for speech emotion, its modelling and its types, which help in knowing the type of speech. After feature extraction, another important part is the classification of the speech emotions, so this work compares and reviews the different classifiers that are used to differentiate emotions such as sadness, neutral, happiness, surprise, anger, etc. The research also shows the improvement obtained in the emotion recognition system by adding a deep neural network to the automatic emotion recognition system. The analysis has also been performed using different ML techniques to assess speech emotion recognition accuracy in different languages.

i
TABLE OF CONTENTS
CHAPTER NO TITLE PAGE NO

ABSTRACT i
LIST OF FIGURES iii
LIST OF ABBREVIATIONS iv
LIST OF TABLES v
1 INTRODUCTION 1
1.1 OUTLINE 1
1.2 DATA OVERVIEW 2
2 LITERATURE REVIEW 3
3 METHODOLOGY 5
PURPOSE 5
SCOPE 5
EXISTING SYSTEM 5
PROPOSED SYSTEM 5
ALGORITHMS USED 6
CLASSIFIERS 6
REGRESSORS 12
SYSTEM DESIGN 19
DATA FLOW DIAGRAM 20
MODULES 21
SYSTEM ARCHITECTURE 23
4 RESULT AND DISCUSSION 24
5 CONCLUSION AND FUTURE WORK 24
REFERENCES 25
APPENDICES 26

ii
LIST OF FIGURES

FIGURE NO FIGURE NAME PAGE NO.

1 RAVDESS dataset 2
2 TESS dataset 2
3 SVC Without HyperPlane 7
4 SVC With HyperPlane 7
5 SVC Best HyperPlane 8
6 Decision tree classifier 11
7 Support Vector Regression 13
8 Procedure of Random Forest 13
9 Structure of Random Forest 14
10 A Simple Regression Example 15
11 An Average of Outputs 16
12 Predictions for different Values 17
13 KNN Regression 17
14 MLP Regression 18
15 Decision Tree Regression 19
16 Data Flow Diagram 21
17 System Architecture 23
18 Histogram on different classifiers 24
19 Output Screenshots 45
20 Output Screenshots of Happy wav file 45
21 Output Screenshots of sad wav file 46
22 Output Screenshots of Neutral wav file 46

iii
LIST OF ABBREVIATIONS

ABBREVIATION EXPANSION
MLP Multi-layer perceptron
SVC Support Vector Classifier
KNN K Nearest Neighbours

iv
LIST OF TABLES

TABLE NO TABLE NAME PAGE NO

1 Confusion Matrix 24

v
CHAPTER 1
INTRODUCTION

1.1 OUTLINE

One of the fastest and most natural methods of communication between humans is the speech signal, and for interaction between human and machine the speech signal is likewise the fastest and most efficient method. Humans naturally use all of their available senses to gain maximum awareness of a received message. Detecting emotion is natural for humans but a very difficult task for a machine, so an emotion recognition system uses knowledge related to emotion in such a way that communication between machine and human is improved. In speech emotion recognition, the emotions of female or male speakers are identified from their speech, and the extracted speech features form the basis for this processing. Which speech features are most useful for differentiating between various emotions is not clear, and this is what makes emotion recognition from a speaker's speech very difficult. Acoustic variability is introduced by the existence of different speaking rates, styles, sentences and speakers, and this affects the features of speech. Different emotions may be expressed by the same utterance, and different portions of the spoken utterance may correspond to different emotions, which makes it difficult to separate these portions of the utterance. Emotion expression also depends on the culture and environment of the speaker, which creates another problem, because the style of speaking varies with the environment and culture. Emotions may be transient or long term, and it is not always clear which type of emotion is detected by the recognizer. Finally, the emotions recognized from speech information may be speaker independent or speaker dependent.

1
1.2 DATA OVERVIEW

Figure 1 – RAVDESS dataset.

This dataset is a combination of song files and speech files; it covers a total of 24 actors in both the speech and song sets [13]. The emotions are labelled as follows: 01-'neutral', 02-'calm', 03-'happy', 04-'sad', 05-'angry', 06-'fearful', 07-'disgust', 08-'surprised'.
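As an illustration of the labelling scheme above, the short sketch below parses the emotion code from RAVDESS file names, whose third dash-separated field carries the emotion. The directory layout (data/ravdess/Actor_*) is an assumption for this example, not part of the report.

import glob
import os

# RAVDESS encodes the emotion as the third field of the file name,
# e.g. "03-01-06-01-02-01-12.wav" -> code "06" -> "fearful".
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def load_ravdess_labels(root="data/ravdess"):  # hypothetical path
    """Return (path, emotion) pairs for every .wav file under root."""
    samples = []
    for path in glob.glob(os.path.join(root, "Actor_*", "*.wav")):
        code = os.path.basename(path).split("-")[2]
        samples.append((path, RAVDESS_EMOTIONS[code]))
    return samples

if __name__ == "__main__":
    labelled = load_ravdess_labels()
    print(f"Found {len(labelled)} labelled files")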

Figure 2 – TESS dataset.


This dataset contains a total of 2,800 audio files.

2
CHAPTER 2
LITERATURE REVIEW

An i-vector GPLDA System for Speech based Emotion Recognition

In this paper, we propose the use of a Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) back-end for utterance-level emotion classification based on i-vectors representing the distribution of frame-level MFCC features. Experimental results based on the IEMOCAP corpus show that the GPLDA back-end outperforms an SVM-based back-end while being less sensitive to i-vector dimensionality, making the proposed framework more robust to parameter tuning during system development.

Reconstruction-error-based learning for continuous emotion recognition in speech

To advance the performance of continuous emotion recognition from speech, we introduce a reconstruction-error-based learning framework with memory-enhanced Recurrent Neural Networks (RNNs). In the framework, two successive RNN models are adopted: the first model is used as an autoencoder for reconstructing the original features, and the second is employed to perform emotion prediction. The reconstruction error (RE) of the original features is used as a complementary descriptor, which is merged with the original features and fed to the second model. The assumption of this framework is that the system has the ability to learn its 'drawback', which is expressed by the RE. Experimental results on the RECOLA database show that the proposed framework significantly outperforms the baseline systems without any RE information in terms of the Concordance Correlation Coefficient.

Speech-based Emotion Recognition and Next Reaction Prediction

Communication through voice is one of the main components of affective computing in human-computer interaction. In this type of interaction, properly comprehending the meanings of the words or the linguistic category and recognizing the emotion included in the speech are essential for enhancing the performance. In order to model the emotional state, the speech waves are utilized, which bear signals standing for emotions such as boredom, fear, joy and sadness. In the first step of the emotional reaction prediction system proposed in this paper, different emotions are recognized by means of different types of classifiers. The second step is the prediction of a sequence of the next emotional reactions using neural networks. The sequence is extracted based on the speech signals being digitized at tenths of a second, after concatenating the different speech signals of each subject. The prediction problem is solved as a nonlinear auto-regression time-series neural network with the assumption that the variables are defined as data-feedback time-series. The best average recognition rate is 86.25%, which is achieved by the Random Forest classifier; the average prediction rate of the next reactions is obtained using neural networks.

Speech Based Emotion Recognition Using Spectral Feature Extraction and an Ensemble of kNN Classifiers

Security (and cyber security) is an important issue in existing and developing technology. It is imperative that cyber security go beyond password-based systems to avoid criminal activities. A human biometric and emotion based recognition framework implemented in parallel can enable applications to access personal or public information securely. The focus of this paper is on the study of speech based emotion recognition using a pattern recognition paradigm with spectral feature extraction and an ensemble of k nearest neighbour (kNN) classifiers. The five spectral features are the linear predictive cepstrum, mel frequency cepstrum, line spectral frequencies, adaptive component weighted cepstrum and the post-filter cepstrum. The bagging algorithm is used to train the ensemble of kNNs. Fusion is implicitly accomplished by ensemble classification. The LDC Emotional Prosody Speech database is used in all the experiments. Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN.

Towards Robust Speech-Based Emotion Recognition


Maintaining the robustness of a speech processing system in the presence of noise is a challenge. This paper shows that a frequency subband architecture can improve the robustness of an emotion recognition system when signals are corrupted by selective noise. This paper also demonstrates that feature selection based on a mutual-information criterion can give us the most effective subset of features and thereby a better result.

4
CHAPTER 3
METHODOLOGY

PURPOSE
The purpose of this document is to describe speech emotion detection using machine learning algorithms. In detail, this document provides a general description of our project, including user requirements, product perspective, an overview of requirements, and general constraints. In addition, it provides the specific requirements and functionality needed for this project, such as the interface, functional requirements and performance requirements.

SCOPE
The scope of this SRS document persists for the entire life cycle of the project. This document defines the final state of the software requirements agreed upon by the customers and designers. Finally, at the end of project execution, all the functionalities should be traceable from the SRS to the product. The document describes the functionality, performance, constraints, interface and reliability for the entire life cycle of the project.

EXISTING SYSTEM
The speech emotion detection system is implemented as a Machine Learning (ML) model. The steps of implementation are comparable to any other ML project, with additional fine-tuning procedures to make the model function better. The first step is data collection, which is of prime importance: the model being developed will learn from the data provided to it, and all the decisions and results that the developed model produces are guided by that data. The second step, called feature engineering, is a collection of several machine learning tasks that are executed over the collected data; these procedures address several data representation and data quality issues. The third step is often considered the core of an ML project, where an algorithm-based model is developed. This model uses an ML algorithm to learn about the data and train itself to respond to any new data it is exposed to. The final step is to evaluate the functioning of the built model. Very often, developers repeat the steps of developing a model and evaluating it to compare the performance of different algorithms, and the comparison results help to choose the ML algorithm most relevant to the problem.
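A minimal sketch of the train-and-evaluate steps just described is given below. It assumes a feature matrix X and label vector y have already been produced by the feature-engineering step; the choice of an MLP classifier and its parameters is illustrative only, not the report's fixed configuration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

def train_and_evaluate(X: np.ndarray, y: np.ndarray):
    # Split the engineered features and fit an algorithm-based model (step 3).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    model = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on held-out data (step 4); developers repeat this per algorithm.
    y_pred = model.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    return model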

PROPOSED SYSTEM
In this current study, we present an automatic speech emotion recognition (SER) system that uses machine learning algorithms to classify emotions. The performance of the emotion detection system can greatly influence the overall performance of the application in many ways and can provide many advantages over the existing functionalities of these applications. This research presents a speech emotion detection system with improvements over an existing system in terms of data, feature selection, and methodology, aiming to classify speech percepts based on emotion more accurately.
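One possible feature-extraction step for such an SER system is sketched below using librosa MFCCs; the number of coefficients and the mean pooling over frames are assumptions for illustration, not the configuration fixed by this report.

import numpy as np
import librosa

def extract_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
    """Return one fixed-length feature vector per .wav utterance."""
    signal, sample_rate = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average each coefficient across frames to get a single vector.
    return np.mean(mfcc, axis=1)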

ALGORITHMS USED

CLASSIFIERS
Classification is defined as the process of recognizing, understanding, and grouping objects and ideas into preset categories, also known as "sub-populations". With the help of these pre-categorized training datasets, classification programs in machine learning leverage a wide range of algorithms to classify future datasets into the respective, relevant categories. Classification algorithms used in machine learning utilize input training data for the purpose of predicting the likelihood or probability that the data that follows will fall into one of the predetermined categories. One of the most common applications of classification is filtering emails into "spam" or "non-spam", as used by today's top email service providers. In short, classification is a form of "pattern recognition": classification algorithms applied to the training data find the same patterns (similar number sequences, words or sentiments, and the like) in future data sets.
We will explore classification algorithms in detail, and discover how text analysis software can perform actions like sentiment analysis, used for categorizing unstructured text by opinion polarity (positive, negative, neutral, and the like).
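A toy sketch of this idea, with invented data, shows a classifier being fit on pre-categorized training samples and then assigning a new sample to one of the preset categories:

from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: two features per sample, two preset categories.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y_train = ["sad", "sad", "happy", "happy"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.predict([[0.85, 0.15]]))  # -> ['happy']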

SVC
SVM algorithms classify data and train models to very fine degrees of polarity, creating a 3-dimensional classification model that goes beyond just X/Y predictive axes. Take a look at the visual representation to understand how SVM algorithms work. We have two tags, red and blue, with two data features, X and Y, and we train our classifier to output an X/Y coordinate as either red or blue.
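A minimal sketch of that red/blue example, with made-up coordinates, might look like this:

from sklearn.svm import SVC

# Invented (X, Y) coordinates with their red/blue tags.
points = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
tags = ["red", "red", "blue", "blue"]

# Fit a linear SVC and classify a new coordinate as red or blue.
clf = SVC(kernel="linear")
clf.fit(points, tags)
print(clf.predict([[0.8, 1.0]]))  # -> ['red']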
