
Python Mini Project – Speech Emotion Recognition with librosa


Speech emotion recognition is one of the best Python mini projects. The best example of it can be seen at call centers. If you have ever noticed, call center employees never talk to customers in the same manner; their way of pitching and talking changes from customer to customer. This happens with ordinary people too, but how is it relevant to call centers? Here is your answer: the employees recognize customers’ emotions from speech, so they can improve their service and convert more people. In this way, they are using speech emotion recognition. So, let’s discuss this project in detail.

Speech emotion recognition is a simple Python mini-project, which you are going to practice with DataFlair.

What is Speech Emotion Recognition?


Speech Emotion Recognition, abbreviated as SER, is the act of attempting
to recognize human emotion and affective states from speech. It
capitalizes on the fact that voice often reflects underlying emotion through
tone and pitch. This is also the phenomenon that animals like dogs and
horses use to understand human emotion.

SER is tough because emotions are subjective and annotating audio is challenging.
What is librosa?
librosa is a Python library for analyzing audio and music. It offers a flat
package layout, standardized interfaces and names, backwards
compatibility, modular functions, and readable code. Further, in this
Python mini-project, we demonstrate how to install it (and a few other
packages) with pip.

What is JupyterLab?
JupyterLab is an open-source, web-based UI for Project Jupyter. It has
all the basic functionality of the Jupyter Notebook, like notebooks, terminals,
text editors, file browsers, rich outputs, and more. However, it also provides
improved support for third-party extensions.

To run code in JupyterLab, you’ll first need to launch it from the command
prompt:
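Assuming JupyterLab is already installed, it is typically launched with:

jupyter lab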

This will open a new session in your browser. Create a new Console
and start typing in your code. JupyterLab can execute multiple lines of code
at once; pressing Enter will not execute your code, you’ll need to press
Shift+Enter for that.

Speech Emotion Recognition – Objective
To build a model to recognize emotion from speech using the librosa and
sklearn libraries and the RAVDESS dataset.
Speech Emotion Recognition – About the Python Mini Project

In this Python mini project, we will use the libraries librosa, soundfile, and
sklearn (among others) to build a model using an MLPClassifier. This will
be able to recognize emotion from sound files. We will load the data, extract
features from it, then split the dataset into training and testing sets. Then,
we’ll initialize an MLPClassifier and train the model. Finally, we’ll calculate
the accuracy of our model.

The Dataset
For this Python mini project, we’ll use the RAVDESS dataset; this is the
Ryerson Audio-Visual Database of Emotional Speech and Song dataset, and
it is free to download. This dataset has 7356 files rated by 247 individuals 10
times on emotional validity, intensity, and genuineness. The entire dataset
is 24.8 GB of recordings from 24 actors.

Prerequisites
You’ll need to install the following libraries with pip:
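A typical install command for the packages used in this project (librosa, soundfile, numpy, scikit-learn; adjust the names to your environment) looks like this:

pip install librosa soundfile numpy scikit-learn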

If you run into issues installing librosa with pip, you can try it with conda.
Steps for the Speech Emotion Recognition Python Project
1. Make the necessary imports:

Screenshot:
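The screenshot is not reproduced here; based on the functions used in the following steps, the imports would look roughly like this:

import glob
import os
import librosa
import soundfile
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score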
2. Define a function extract_feature to extract the mfcc, chroma, and mel
features from a sound file. This function takes four parameters: the file name
and three Boolean flags, one for each feature:

• mfcc: Mel Frequency Cepstral Coefficients, representing the short-term power spectrum of a sound
• chroma: pertains to the 12 different pitch classes
• mel: Mel Spectrogram Frequency

Open the sound file with soundfile.SoundFile using with-as so it’s
automatically closed once we’re done. Read from it and call it X. Also, get
the sample rate. If chroma is True, get the Short-Time Fourier Transform of
X.

Let result be an empty numpy array. Now, for each of the three features, if it
is requested, make a call to the corresponding function from librosa.feature
(e.g. librosa.feature.mfcc for mfcc), and get the mean value. Call the function
hstack() from numpy with result and the feature value, and store this in
result. hstack() stacks arrays in sequence horizontally (in a columnar
fashion). Then, return the result.
Screenshot:
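The code is only shown as a screenshot in the original; here is a sketch of extract_feature following the description above (the exact parameter values, such as using 40 MFCCs, are illustrative):

def extract_feature(file_name, mfcc, chroma, mel):
    # Open the sound file so it is closed automatically when we're done
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
        if chroma:
            # chroma is computed from the Short-Time Fourier Transform of X
            stft = np.abs(librosa.stft(X))
        result = np.array([])
        if mfcc:
            mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
            result = np.hstack((result, mfccs))
        if chroma:
            chroma_feature = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
            result = np.hstack((result, chroma_feature))
        if mel:
            mel_feature = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
            result = np.hstack((result, mel_feature))
    return result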
3. Now, let’s define a dictionary to map numbers to the emotions
available in the RAVDESS dataset, and a list to hold the ones we want to
observe: calm, happy, fearful, disgust.

Screenshot:
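Based on the emotion codes documented for the RAVDESS dataset, the dictionary and list could look like this:

# Emotion labels encoded in RAVDESS file names
emotions = {
    '01': 'neutral',
    '02': 'calm',
    '03': 'happy',
    '04': 'sad',
    '05': 'angry',
    '06': 'fearful',
    '07': 'disgust',
    '08': 'surprised'
}

# Emotions we actually want to classify
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']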
4. Now, let’s load the data with a function load_data() – this takes in the
relative size of the test set as a parameter. x and y are empty lists; we’ll use
the glob() function from the glob module to get all the pathnames for the
sound files in our dataset. The pattern we use for this is:
“D:\\DataFlair\\ravdess data\\Actor_*\\*.wav”. This is because our
dataset looks like this:

Screenshot:

So, for each such path, get the basename of the file and the emotion by
splitting the name around ‘-’ and extracting the third value:
Screenshot:

Using our emotions dictionary, this number is turned into an emotion, and
our function checks whether this emotion is in our list of
observed_emotions; if not, it continues to the next file. It makes a call to
extract_feature and stores what is returned in ‘feature’. Then, it appends
the feature to x and the emotion to y. So, the list x holds the features and y
holds the emotions. We call the function train_test_split with these, the
test size, and a random state value, and return that.
Screenshot:
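Putting this step together, load_data could look like the sketch below (the path pattern is the example given above, and the random state is just an arbitrary fixed seed for reproducibility):

def load_data(test_size=0.2):
    x, y = [], []
    for file in glob.glob("D:\\DataFlair\\ravdess data\\Actor_*\\*.wav"):
        file_name = os.path.basename(file)
        # The third '-'-separated field of the file name encodes the emotion
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)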

5. Time to split the dataset into training and testing sets! Let’s keep the test
set at 25% of the data and use the load_data function for this.

Screenshot:
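In code, this could be:

x_train, x_test, y_train, y_test = load_data(test_size=0.25)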
6. Observe the shape of the training and testing datasets:

Screenshot:
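For example, printing the number of training and testing samples:

print((x_train.shape[0], x_test.shape[0]))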

7. And get the number of features extracted.

Output Screenshot:
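A simple way to do this, given that each row of x_train is one feature vector:

print(f'Features extracted: {x_train.shape[1]}')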

8. Now, let’s initialize an MLPClassifier. This is a Multi-layer Perceptron
Classifier; it optimizes the log-loss function using LBFGS or stochastic
gradient descent. Unlike SVM or Naive Bayes, the MLPClassifier has an
internal neural network for the purpose of classification. This is a
feedforward ANN model.

Screenshot:
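One reasonable configuration is sketched below; the hyperparameter values are illustrative and may differ from those in the original screenshot:

model = MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08,
                      hidden_layer_sizes=(300,), learning_rate='adaptive',
                      max_iter=500)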
9. Fit/train the model.

Output Screenshot:
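That is:

model.fit(x_train, y_train)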

10. Let’s predict the values for the test set. This gives us y_pred (the predicted emotions for
the features in the test set).

Screenshot:
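That is:

y_pred = model.predict(x_test)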
11. To calculate the accuracy of our model, we’ll call the accuracy_score()
function we imported from sklearn. Finally, we’ll round the accuracy to 2
decimal places and print it out.

Output Screenshot:
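A sketch of the accuracy calculation:

accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))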

Summary
In this Python mini project, we learned to recognize emotions from speech.
We used an MLPClassifier for this and made use of the soundfile library to
read the sound file, and the librosa library to extract features from it. As
you’ll see, the model delivered an accuracy of 72.4% on this dataset. That’s good enough
for us for now.

Hope you enjoyed this Python mini project.
