Speech Emotion Detection Using Machine Learning
Speech is one of the most natural and fastest means of communication
between humans, and researchers have developed many systems to
identify emotions from the speech signal. Speech features are the most
useful cue for differentiating between emotions; when those features
are unclear, recognizing emotion from a speaker's speech becomes very
difficult. A number of datasets are available for speech emotion, its
modelling, and its types, and they help in identifying the type of
speech. After feature extraction, the other important stage is the
classification of speech emotions, so this paper compares and reviews
the classifiers used to differentiate emotions such as sadness, neutral,
happiness, surprise, and anger. The research also shows how adding a
deep neural network to build an automatic emotion recognition system
improves recognition. Finally, the accuracy of speech emotion
recognition is analysed for different ML techniques in different
languages.
TABLE OF CONTENTS
CHAPTER NO TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF TABLES
1 INTRODUCTION
1.1 OUTLINE
1.2 DATA OVERVIEW
2 LITERATURE REVIEW
3 METHODOLOGY
PURPOSE
SCOPE
EXISTING SYSTEM
PROPOSED SYSTEM
ALGORITHMS USED
CLASSIFIERS
REGRESSORS
SYSTEM DESIGN
DATA FLOW DIAGRAM
MODULES
SYSTEM ARCHITECTURE
4 RESULT AND DISCUSSION
5 CONCLUSION AND FUTURE WORK
REFERENCES
APPENDICES
LIST OF FIGURES
1 RAVDESS dataset
2 TESS dataset
3 SVC Without HyperPlane
4 SVC With HyperPlane
5 SVC Best HyperPlane
6 Decision tree classifier
7 Support Vector Regression
8 Procedure of Random Forest
9 Structure of Random Forest
10 A Simple Regression Example
11 An Average of Outputs
12 Predictions for different Values
13 KNN Regression
14 MLP Regression
15 Decision Tree Regression
16 Data Flow Diagram
17 System Architecture
18 Histogram on different classifiers
19 Output Screenshots
20 Output Screenshots of Happy wav file
21 Output Screenshots of Sad wav file
22 Output Screenshots of Neutral wav file
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
MLP Multilayer Perceptron
SVC Support Vector Classifier
KNN K Nearest Neighbours
LIST OF TABLES
1 Confusion Matrix
CHAPTER 1
INTRODUCTION
1.1 OUTLINE
1.2 DATA OVERVIEW
CHAPTER 2
LITERATURE REVIEW
paper,
different emotions are recognized by means of different types of
classifiers. The second step is the prediction of a sequence of the next
emotional reactions using neural networks. The sequence is extracted
based on the speech signals being digitized at tenths of a second, after
concatenating the different speech signals of each subject. The
prediction problem is solved as a nonlinear auto-regression time-series
neural network with the assumption that the variables are defined as
data-feedback time-series. The best average recognition rate is 86.25%,
which is achieved by the Random Forest classifier, and the average
prediction rate of reactions by using neural networks.
CHAPTER 3
METHODOLOGY
PURPOSE
This document describes a system for speech emotion detection using
machine learning algorithms. It provides a general description of our
project, including user requirements, product perspective, an overview
of requirements, and general constraints. It also sets out the specific
requirements and functionality needed for this project, such as
interface, functional, and performance requirements.
SCOPE
The scope of this software requirements specification (SRS) document
persists for the entire life cycle of the project. The document defines
the final state of the software requirements agreed upon by the
customers and designers. At the end of project execution, all
functionality should be traceable from the SRS to the product. The
document describes the functionality, performance, constraints,
interface, and reliability for the entire life cycle of the project.
EXISTING SYSTEM
The speech emotion detection system is implemented as a Machine
Learning (ML) model. The implementation steps are comparable to those
of any other ML project, with additional fine-tuning procedures to make
the model perform better. The flowchart gives a pictorial overview of
the process (see Figure 1). The first step is data collection, which is of
prime importance: the model learns from the data provided to it, and
all the decisions and results it produces are guided by that data. The
second step, called feature engineering, is a collection of machine
learning tasks executed over the collected data. These procedures
address data representation and data quality issues. The third step is
often considered the core of an ML project: an algorithm-based model
is developed. This model uses an ML algorithm to learn about the data
and train itself to respond to any new data it is exposed to. The final
step is to evaluate the built model. Developers very often repeat the
steps of developing and evaluating a model to compare the performance
of different algorithms. The comparison results help in choosing the ML
algorithm most relevant to the problem.
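The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the data are synthetic two-feature points standing in for extracted speech features, the labels "sad" and "happy" are placeholders, and the two candidate models (nearest centroid and one-nearest-neighbour) were chosen only because they fit in a short, dependency-free example. All function names are hypothetical.

```python
import math
import random


def collect_data(n_per_class=60, seed=0):
    """Step 1, data collection: synthetic stand-in for labelled speech features."""
    rng = random.Random(seed)
    centres = {"sad": (0.0, 0.0), "happy": (3.0, 3.0)}  # placeholder classes
    rows = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            rows.append(([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)], label))
    rng.shuffle(rows)
    return rows


def standardize(train, test):
    """Step 2, feature engineering: rescale features using training statistics only."""
    dims = range(len(train[0][0]))
    means = [sum(x[i] for x, _ in train) / len(train) for i in dims]
    stds = [
        math.sqrt(sum((x[i] - means[i]) ** 2 for x, _ in train) / len(train)) or 1.0
        for i in dims
    ]

    def scale(rows):
        return [([(x[i] - means[i]) / stds[i] for i in dims], y) for x, y in rows]

    return scale(train), scale(test)


def fit_nearest_centroid(train):
    """Step 3 (candidate A): predict the class whose mean feature vector is closest."""
    sums, counts = {}, {}
    for x, y in train:
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [a + b for a, b in zip(prev, x)]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def fit_one_nearest_neighbour(train):
    """Step 3 (candidate B): predict the label of the closest training sample."""
    return lambda x: min(train, key=lambda row: math.dist(x, row[0]))[1]


def accuracy(predict, test):
    """Step 4, evaluation: fraction of held-out samples labelled correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def run_pipeline():
    rows = collect_data()
    cut = int(0.75 * len(rows))  # 75% for training, 25% held out
    train, test = standardize(rows[:cut], rows[cut:])
    candidates = {
        "nearest_centroid": fit_nearest_centroid,
        "one_nearest_neighbour": fit_one_nearest_neighbour,
    }
    return {name: accuracy(fit(train), test) for name, fit in candidates.items()}


scores = run_pipeline()
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Evaluating every candidate on the same held-out split is what makes the comparison between algorithms meaningful; in a real system the synthetic points would be replaced by features extracted from the audio recordings.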
PROPOSED SYSTEM
In this study, we present an automatic speech emotion recognition
(SER) system that uses machine learning algorithms to classify
emotions. The performance of the emotion detection system can greatly
influence the overall performance of the applications that use it and
can add to the functionality of those applications. This research
presents a speech emotion detection system that improves on an
existing system in terms of data, feature selection, and methodology,
with the aim of classifying speech percepts by emotion more accurately.
ALGORITHMS USED
CLASSIFIERS
Classification is the process of recognizing, understanding, and
grouping objects and ideas into preset categories, also known as
"sub-populations". Using pre-categorized training datasets,
classification programs in machine learning apply a wide range of
algorithms to assign future data to the relevant categories.
Classification algorithms use the input training data to predict the
likelihood or probability that the data that follows will fall into one of
the predetermined categories. One of the most common applications of
classification is filtering emails into "spam" or "non-spam", as done by
today's top email service providers. In short, classification is a form of
"pattern recognition": classification algorithms applied to the training
data find the same patterns (similar number sequences, words or
sentiments, and the like) in future datasets.
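To make the idea of predicting the probability of each predetermined category concrete, the following sketch trains a tiny spam filter. The choice of a multinomial naive Bayes model with add-one smoothing is an assumption made for illustration, as are the four training messages and the function names; none of them come from this project.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per category from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return the estimated probability of each category for a new text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        log_p = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            log_p += math.log(count / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalise the log scores so that the probabilities sum to one.
    peak = max(log_scores.values())
    unnormalised = {k: math.exp(v - peak) for k, v in log_scores.items()}
    norm = sum(unnormalised.values())
    return {k: v / norm for k, v in unnormalised.items()}


# Made-up training messages, for illustration only.
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("project report attached", "non-spam"),
]
model = train_naive_bayes(training)
probs = predict_proba(model, "claim your free prize")
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

The classifier assigns the new message to the category with the highest estimated probability, which is the pattern-matching behaviour described above: words that were frequent in one training category pull future messages towards that category.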
We will explore classification algorithms in detail and see how text
analysis software can perform tasks such as sentiment analysis, which
categorizes unstructured text by opinion polarity (positive, negative,
neutral, and the like).
SVC
SVM algorithms classify data by training a model that separates the
classes with a decision boundary called a hyperplane, which allows the
classification model to go beyond simple X/Y predictive axes into
higher dimensions. Figures 3 to 5 give a visual representation of how
SVM algorithms work. We have two tags, red and blue, with two data
features, X and Y, and we train our classifier to label any given X/Y
coordinate as either red or blue.
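The red/blue example can be sketched as a linear SVM trained from scratch. This is a minimal sketch under stated assumptions, not the classifier used in this project: it minimises the L2-regularised hinge loss by batch sub-gradient descent, supports only a straight-line boundary, and uses eight made-up points. The function names and the values of `lam`, `lr`, and `epochs` are illustrative choices. In practice a library classifier such as scikit-learn's `SVC`, which also supports non-linear kernels, would normally be used.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise the L2-regularised hinge loss by batch sub-gradient descent.

    labels must be +1 or -1. Returns the weights w and bias b of the
    separating line w[0]*x + w[1]*y + b = 0.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradient of the regularisation term (lam / 2) * ||w||^2.
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x, y), label in zip(points, labels):
            margin = label * (w[0] * x + w[1] * y + b)
            if margin < 1:  # inside the margin or misclassified
                grad_w[0] -= label * x / n
                grad_w[1] -= label * y / n
                grad_b -= label / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Tag a point by the side of the line it falls on."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "blue" if score >= 0 else "red"


# Made-up training points: red in the lower left, blue in the upper right.
red = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (-1.0, -1.5)]
blue = [(2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (1.0, 1.5)]
points = red + blue
labels = [-1] * len(red) + [1] * len(blue)

w, b = train_linear_svm(points, labels)
print(predict(w, b, (1.5, 1.0)), predict(w, b, (-1.0, -0.5)))
```

Only points with a margin below 1 contribute to the update, so the boundary ends up being determined by the training points closest to it, the support vectors, rather than by the bulk of each cluster.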