DEMO PPT

This document discusses emotion recognition from German speech using feature selection and a decision support system. It describes extracting features from speech like MFCCs, energy, pitch, and centroid. Literature reviews recent papers on emotion recognition from speech that use features like MFCCs and support vector machines. The objective is to develop a system to recognize emotions using feature selection and an SVM, calculate the system's accuracy, and compare results to other literature. The document outlines extracting time-based, spectral, and prosodic features from speech and using them with feature selection and an SVM for emotion recognition.


Emotion Recognition from German Speech Using Feature Selection Based Decision

Support System

Conference logo

Presented By:
Sanjay Sharma

Name of Conference
LIST OF CONTENTS
 Introduction
 Literature Review
 Problem Statement and Objectives
 Feature Extraction
 Feature Selection
 Application of Machine Learning
 Support Vector Machine
 Process Flow Diagram
 Result
 Conclusion
 Future Scope
 References
Introduction

• Emotion recognition is the process of identifying human emotion.

• Emotion recognition is one of the most challenging tasks in the field of speech signal processing.

• Emotions are feelings which bring a change in physical and psychological state and influence a person's behaviour.

• Emotion identification is used in different applications such as lie detectors, and can be used as a voice tag in database access systems. A voice tag is used in telephone shopping and at ATMs as a password for accessing a particular account, and by SIRI on the iPhone.

• It is useful in customer relationship management, where an employee can understand and resolve a customer's problems through the emotional change in the speech.

• Researchers are working on finding and improving the accuracy of recognizing the emotions of others.
Literature Reviews
• Tripathi et al. (2020) review "emotion detection using vocal audios". The vocals mainly consist of speech, which is characterized by its signals. Emotion recognition from speech is an old and challenging problem in the field of artificial intelligence.

• Guo, L. et al (2019) proposes a dynamic fusion framework to utilize the


potential advantages of the complementary spectrogram-based statistical
features and the auditory-based empirical features. In addition, a kernel
extreme learning machine (KELM) is adopted as the classifier to distinguish
emotions. To validate the proposed framework, we conduct experiments on
two public emotional databases, including Emo-DB and IEMOCAP
databases. The experimental results demonstrate that the proposed fusion
framework significantly outperforms the existing state-of-the-art methods.
The results also show that the proposed method, by integrating the auditory-
based features with spectrogram-based features, could achieve a notably
improved performance over the conventional methods.
Literature Reviews
• Aouani, H., et al. (2018) designed a system with a two-stage approach, namely feature extraction and a classification engine. Firstly, a set of 39 Mel-Frequency Cepstral Coefficient (MFCC) features is extracted. Secondly, they use the
Support Vector Machine (SVM) as the main classifier engine since it is the most common
technique in the field of speech recognition. Besides that, we investigate the importance of the
recent advances in machine learning including the deep kernel learning, as well as the various
types of auto-encoder (the basic auto-encoder and the stacked auto-encoder). A large set of
experiments are conducted on the SAVEE audio database. The experimental results show that
DSVM method outperforms the standard SVM with a classification rate of 69.84% and 68.25%
using 39 MFCC, respectively. Additionally, the auto-encoder method outperforms the standard
SVM, yielding a classification rate of 73.01%.

• Bennasar et al (2015) introduces two new nonlinear feature selection methods, namely Joint
Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information
Maximisation (NJMIM); both these methods use mutual information and the ‘maximum of the
minimum’ criterion, which alleviates the problem of overestimation of the feature significance
as demonstrated both theoretically and experimentally. The proposed methods are compared
using eleven publicly available datasets with five competing methods. The results
demonstrate that the JMIM method outperforms the other methods on most tested public
datasets, reducing the relative average classification error by almost 6% in comparison to the
next best performing method. The statistical significance of the results is confirmed by the
ANOVA test. Moreover, this method produces the best trade-off between accuracy and
stability.
Problem Statement and Objectives
Problem statement:
It is difficult to identify and recognize the emotions of human beings and to protect people from stress, suicide, etc. The signs and symptoms of such pressure include a state of high excitement, elevated heart rate, excessive adrenaline production, failure of coping mechanisms, a sense of tension and exhaustion, and an inability to concentrate fully. To minimize the risk factors and protect people at an early stage of tension and stress, we study "Emotion Recognition From German Speech Using Feature Selection Based Decision Support System" in detail.

Objectives:
• Develop a system that will recognize emotion using feature selection and a support vector machine.
• Calculate the accuracy of the system by providing test signals to the model.
• Compare the result obtained from this method with the available literature.
• Detect the emotions.
Feature Extraction
• Time-based features: These characteristics give the temporal properties of voiced and unvoiced segments. They operate entirely on the time-domain signal.

• Mel-Frequency Cepstral Coefficients: These coefficients are obtained by transforming the signal into the cepstrum domain, in order to capture information about the time-varying spectral envelope. A cepstrum is obtained by applying a Fourier Transform to the logarithm of the signal's spectrum, separating the slowly varying spectral envelope from the more rapidly varying spectral fine structure.
Power cepstrum = |FT{ log( |FT{x(n)}|² ) }|²
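As an illustration, a minimal NumPy sketch of the power-cepstrum formula above, applied to a toy 440 Hz frame (the frame length and sample rate here are illustrative, and a full MFCC front end would add mel filtering and a DCT):

```python
import numpy as np

def power_cepstrum(x):
    """Power cepstrum per the formula above: |FT{log(|FT{x(n)}|^2)}|^2."""
    power_spectrum = np.abs(np.fft.fft(x)) ** 2
    # small floor avoids log(0) on silent bins
    log_spectrum = np.log(power_spectrum + 1e-12)
    return np.abs(np.fft.fft(log_spectrum)) ** 2

# toy frame: a 440 Hz tone sampled at 16 kHz
t = np.arange(512) / 16000.0
frame = np.sin(2 * np.pi * 440.0 * t)
c = power_cepstrum(frame)
print(c.shape)  # (512,)
```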

• Energy: The energy of the signal x within a particular window of N samples is given by:
Feature Extraction

    E_n = Σ_{n=1}^{N} x(n) · x*(n)

• Zero Crossing Rate (ZCR): This feature counts the number of times the sign of the speech signal amplitude changes within a frame in the time domain. For simple voiced speech signals, zero crossings give a rough estimate of the fundamental frequency; for more complex signals, they are a simple measure of noisiness. The zero crossing rate measures how often the speech signal changes its sign:

    ZCR = (1/2) Σ_{n=1}^{N} |sgn(x_n) − sgn(x_{n−1})|

• Pitch: The speech signal originates in the vocal tract, beginning at the larynx, where the vocal cords are located, and ending at the lips. The shape of the vocal tract and the vibration of the vocal cords are controlled by nerves from the brain. The sounds we make can be classified as voiced or unvoiced.
Feature Extraction
During the generation of unvoiced sounds the vocal cords do not vibrate and remain open, whereas during voiced sounds they vibrate and generate what is known as the glottal pulse. A pulse is a sum of a sinusoidal wave at the fundamental frequency and its harmonics (amplitude decreases as frequency increases). The fundamental frequency of the glottal pulse is called the pitch.
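The slides do not specify a pitch estimator; as one common choice, a minimal autocorrelation-based sketch (the search range and test tone are illustrative assumptions, not values from the paper):

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (pitch) as the lag of the
    autocorrelation peak, searched within [fmin, fmax] Hz."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# toy stand-in for a voiced frame: a 200 Hz tone at 16 kHz
fs = 16000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 200.0 * t)
print(round(pitch_autocorr(tone, fs)))  # 200
```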

• Centroid: The spectral centroid is defined as the centre of gravity of the spectrum. The centroid is a measure of spectral shape; higher centroid values correspond to brighter textures with more high-frequency content:

    C_r = ( Σ_{k=1}^{N/2} f[k] · X_r[k] ) / ( Σ_{k=1}^{N/2} X_r[k] )

where f[k] is the frequency at bin k. The centroid tracks the sharpness of a sound, and sharpness corresponds to the high-frequency content of the spectrum. Because of its effectiveness in describing spectral shape, centroid measures are widely used in audio classification tasks.
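The centroid formula above translates directly to NumPy; a sketch checked on a pure 1 kHz tone, whose centroid should sit at about 1000 Hz:

```python
import numpy as np

def spectral_centroid(x, fs):
    """C_r = sum_k f[k] * X_r[k] / sum_k X_r[k] over the first N/2 bins,
    with X_r[k] the magnitude spectrum of the frame."""
    N = len(x)
    X = np.abs(np.fft.fft(x))[: N // 2]
    f = np.fft.fftfreq(N, d=1.0 / fs)[: N // 2]
    return float(np.sum(f * X) / np.sum(X))

fs = 8000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
print(round(spectral_centroid(tone, fs)))  # 1000
```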
Feature Extraction
• Flatness: This feature yields a vector of flatness values per frame. The flatness of a band is defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum coefficients within that band. Each vector is reduced to a scalar by computing the mean value over the bands for each frame, giving a single scalar feature that describes the overall flatness.
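A minimal single-band sketch of this ratio (noise should score flatter than a tone; the sample rate and test signals are illustrative):

```python
import numpy as np

def spectral_flatness(x):
    """Flatness = geometric mean / arithmetic mean of the power
    spectrum coefficients (near 1 for noise, near 0 for tones)."""
    p = np.abs(np.fft.fft(x)) ** 2
    p = p[: len(p) // 2] + 1e-12  # floor avoids log(0)
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))

rng = np.random.default_rng(0)
noise = rng.standard_normal(2048)
t = np.arange(2048) / 8000.0
tone = np.sin(2 * np.pi * 500.0 * t)
print(spectral_flatness(noise) > spectral_flatness(tone))  # True
```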
Feature Selection
From the set of extracted features, only a few are significant. This means we have to identify and exclude irrelevant features from the retrieved feature set. For this purpose, this work uses information theory to rank features on the basis of their Joint Mutual Information (JMI) with respect to the remaining features as well as the associated class label. Let A and B be two discrete random variables representing elements of the feature set; the mutual information between these two features is defined as:

    MI(A, B) = Σ_A Σ_B P(A, B) · log₂ [ P(A, B) / ( P(A) · P(B) ) ]

where P is a probability distribution and MI(A, B) is a measure of the dependency between variable A and variable B. Let the features of a sample instance be associated with a class label C; then the mutual information between features A and B given class label C can be written as:

    I(A; B | C) = I(A; B) − I(A; B; C)

Using the joint mutual information principle, features are ranked according to their information with respect to the next-ranked feature and the associated class label for that instance.
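A minimal sketch of the pairwise mutual-information term above for discrete (e.g. quantised) feature values, using only the definition and empirical counts:

```python
import numpy as np
from collections import Counter

def mutual_information(a, b):
    """MI(A, B) = sum over (a, b) of P(a, b) * log2(P(a, b) / (P(a) P(b)))."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in pab.items():
        # P(a,b) / (P(a) P(b)) simplifies to c*n / (count_a * count_b)
        mi += (c / n) * np.log2(c * n / (pa[x] * pb[y]))
    return mi

# two identical binary features share exactly one bit of information
a = [0, 0, 1, 1] * 25
print(mutual_information(a, a))  # 1.0
```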
Application of Machine Learning
• Machine learning is a field of study in which computers learn by themselves without being explicitly programmed. It makes computers similar to humans in the task of decision making.
• It involves computers learning from data provided so that they carry out
certain tasks.
• Machine learning is a subfield of Artificial Intelligence.
• Machine learning approaches are traditionally divided into three broad
categories, depending on the nature of the "signal" or "feedback" available
to the learning system
Supervised Machine learning
Unsupervised Machine learning
Reinforcement learning
Fig: Architecture of Machine Learning
Fig: Emotion Recognition Using Machine Learning
Applications Of Machine learning
• Self Driving Cars
• Machine Learning in Healthcare
• Image Recognition
• Traffic prediction
• Speech Recognition
• Virtual Personal Assistant
• Financial Fraud Detection
Support Vector Machine
• In machine learning, the Support Vector Machine is one of the most popular supervised learning algorithms.

• It is mostly used in classification problems in machine learning.

• The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future. This best decision boundary is called
a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

• It has two types: Linear SVM and Non-linear SVM.


Fig: Linear separable data

Fig: Non- Linear separable data


Algorithm:
Input: D = {(x₁, y₁), (x₂, y₂), …, (x_l, y_l)}, x ∈ Rⁿ, y ∈ {−1, +1}

Classifier: f(x) = sgn(w*·x + b*); if sgn is '+' the class is positive, if sgn is '−' the class is negative.

    w*·x₊ + b* ≥ +1
    w*·x₋ + b* ≤ −1

Multiplying each constraint by its label yᵢ gives the single condition:

    yᵢ(w*·xᵢ + b*) ≥ 1
Radial basis kernel:
The Radial Basis Function (RBF) kernel is a kernel function used in machine learning to obtain a non-linear classifier. A kernel function transforms the n-dimensional input into an m-dimensional space, where m is much higher than n. The main idea behind using a kernel is that a linear separating hyperplane in the higher-dimensional space corresponds to a non-linear decision boundary in the original, lower-dimensional space.
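A minimal sketch of the RBF kernel and the kernel-form decision function f(x) = sgn(Σᵢ αᵢ yᵢ K(xᵢ, x) + b). The toy support vectors, alphas, and bias below are hand-picked for illustration; in practice they come from solving the SVM dual problem (e.g. with a library trainer):

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

def svm_decision(x, support_vectors, alphas, labels, b, gamma=1.0):
    """f(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# toy hand-picked support vectors (real alphas come from the dual solver)
svs, alphas, labels, b = [[0.0, 0.0], [2.0, 2.0]], [1.0, 1.0], [-1, 1], 0.0
print(svm_decision([0.1, 0.0], svs, alphas, labels, b))  # -1
print(svm_decision([2.0, 1.9], svs, alphas, labels, b))  # 1
```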
Result
Objective-1
A system that provides recognition of emotion

Fig: Graph of the model recognizing emotion using 421 speech signals of different classes

Output

Fig: GUI of the training-model graph and the testing graph of the training model
Objective-2
Calculate the accuracy of the system by providing test signals to the model

• The calculation of accuracy has been done.

• The graphical user interface showing the training-model graph and the testing graph applied to the training model has been completed.

• The confusion matrix of the test signals has been obtained from the trained model.
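As an illustration of how the confusion matrix and accuracy relate, a minimal sketch with toy labels for three emotion classes (not the paper's data):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of class-i samples predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Overall accuracy = trace (correct predictions) / total count."""
    return float(np.trace(cm) / cm.sum())

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(accuracy(cm))  # 0.8
```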
Objective-3
Compare the result obtained from this method with the available literature

Work on information theory is done in the current research work; the current research work is more accurate.

Database                  | Percentage Accuracy (With SVM) | Percentage Accuracy (With SVM and JMI)
Emo-DB (Berlin Database)  | 75.75%                         | 92%
Objective-4
Detect the emotions

• The proposed system identified the correct emotions with 92 percent accuracy.
Conclusion
• Emotion plays a very significant role in human life. Speech is one way of communicating with other persons.

• In this work we developed a model to classify German speech signals into various emotion labels. The research objective is to improve on the performance of a raw Support Vector Machine model.

• The efficiency of emotion classification was successfully improved by using the Joint Mutual Information based feature selection method, which excludes irrelevant features from the extracted feature set.
Conclusion
• The experimental outcomes show promising results, with an emotion classification accuracy of 92 percent.
Future Scope
• Real-time assessment of data for further classification.

• Application of more recent Support Vector Machine variants.

• Comparative assessment of a self-trained Support Vector Machine.


References

• Ghai, M., Lal, S., Duggal, S., Manik, S. ,“Emotion recognition on speech
signals using machine learning”, International Conference on Big Data
Analytics and Computational Intelligence (ICBDAC), pp.34-39, 2017.

• Korkmaz, O. E., & Atasoy, A., “Emotion recognition from speech signal using mel frequency cepstral coefficients”, International Conference on Electrical and Electronics Engineering (ELECO), Vol. 9, 2015.

• M. S. Likitha, S. R. R. Gupta, K. Hasitha,A. U. Raju, “Speech based human


emotion recognition using MFCC” International Conference on Wireless
Communications, Signal Processing and Networking (WiSPNET), pp. 2257-
2260, 2017

• A. Benba, A. Jilbab, A. Hammouch, “Detecting patients with parkinson’s


disease with mel frequency cepstral coefficient and support vector
machine”, International Journal on Electrical Engineering and Informatics,
Vol. 7, pp 297-307, 2015.

References
• S. Ladake and A. Gurjar, “Analysis and dissection of sanskrit divine sound Om using digital signal processing to study the science behind Om chanting”, International Conference on Intelligent Systems, Modelling and Simulation, Bangkok, Vol. 7, pp. 169-173, 2016.

• Bhoomika Panda, Debananda Padhi, Kshamamayee Dash, “Use of SVM Classifier & MFCC in Speech Emotion Recognition System”, IJARCSSE, Vol. 2, No. 3, 2012.
• Harshini, D., Pranjali, B., Ranjitha, M., Rushali, J., Manikandan, J., “Design and
Evaluation of Speech based Emotion Recognition System using Support Vector
Machines”, IEEE India Council International Conference (INDICON), Vol. 16,2019.

• Guo, L., Wang, L., Dang, J., Liu, Z., & Guan, H., “Exploration of Complementary
Features for Speech Emotion Recognition Based on Kernel Extreme Learning
Machine” IEEE Access, Vol.7, pp.75798–75809, 2019.

• Aouani, H., & Ben Ayed, Y., “Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder”, International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Vol. 4, 2018.
Thank You
