Python Mini Project
What is JupyterLab?
JupyterLab is an open-source, web-based UI for Project Jupyter. It has all the
basic functionality of the Jupyter Notebook, like notebooks, terminals,
text editors, file browsers, rich outputs, and more. However, it also provides
improved support for third-party extensions.
To run code in JupyterLab, you’ll first need to launch it from the command
prompt:
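Assuming JupyterLab is already installed, the standard launch command looks like this:

    jupyter lab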
This will open a new session in your browser. Create a new Console
and start typing in your code. JupyterLab can execute multiple lines of code
at once; pressing Enter will not execute your code, you’ll need to press
Shift+Enter for that.
In this Python mini project, we will use the libraries librosa, soundfile, and
sklearn (among others) to build a model using an MLPClassifier that will be
able to recognize emotion from sound files. We will load the data, extract
features from it, then split the dataset into training and testing sets. Then,
we’ll initialize an MLPClassifier and train the model. Finally, we’ll calculate
the accuracy of our model.
The Dataset
For this Python mini project, we’ll use the RAVDESS dataset; this is the
Ryerson Audio-Visual Database of Emotional Speech and Song dataset, and
is free to download. This dataset has 7356 files, rated 10 times each by 247
individuals on emotional validity, intensity, and genuineness. The entire
dataset is 24.8 GB and features 24 actors.
Prerequisites
You’ll need to install the following libraries with pip:
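A typical install command for the libraries this project uses might look like the following (note that sklearn is installed under the package name scikit-learn):

    pip install librosa soundfile numpy scikit-learn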
If you run into issues installing librosa with pip, you can try it with conda.
Steps for speech emotion recognition
1. Make the necessary imports:
Screenshot:
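The original imports aren’t reproduced here, but a sketch covering everything the steps below rely on would look something like this:

    import glob
    import os
    import numpy as np
    import soundfile
    import librosa
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score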
2. Define a function extract_feature to extract the mfcc, chroma, and mel
features from a sound file. This function takes 4 parameters: the file name
and three Boolean flags, one for each of the three features:
Let result be an empty numpy array. Now, for each of the three features, if its
flag is set, make a call to the corresponding function from librosa.feature (e.g.
librosa.feature.mfcc for mfcc) and get the mean value. Call the function
hstack() from numpy with result and the feature value, and store this back in
result; hstack() stacks arrays in sequence horizontally (in a column-wise
fashion). Then, return result.
Screenshot:
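A minimal sketch of such a function is shown below; the choice of n_mfcc=40 is an illustrative assumption, not a prescribed value:

    def extract_feature(file_name, mfcc, chroma, mel):
        # Read the samples and the sample rate from the sound file
        with soundfile.SoundFile(file_name) as sound_file:
            X = sound_file.read(dtype="float32")
            sample_rate = sound_file.samplerate
            result = np.array([])
            if chroma:
                # chroma_stft works on the magnitude of the short-time Fourier transform
                stft = np.abs(librosa.stft(X))
            if mfcc:
                mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
                result = np.hstack((result, mfccs))
            if chroma:
                chroma_feature = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
                result = np.hstack((result, chroma_feature))
            if mel:
                mel_feature = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
                result = np.hstack((result, mel_feature))
        return result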
3. Now, let’s define a dictionary mapping the numbers used in the dataset’s
filenames to the emotions available in the RAVDESS dataset, and a list to hold
the ones we want to observe: calm, happy, fearful, and disgust.
Screenshot:
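A sketch of the two structures; the number-to-emotion mapping follows the RAVDESS filename convention:

    # Emotions in the RAVDESS dataset, keyed by the code used in the filenames
    emotions = {
        '01': 'neutral',
        '02': 'calm',
        '03': 'happy',
        '04': 'sad',
        '05': 'angry',
        '06': 'fearful',
        '07': 'disgust',
        '08': 'surprised'
    }
    # Emotions we want to observe
    observed_emotions = ['calm', 'happy', 'fearful', 'disgust']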
4. Now, let’s load the data with a function load_data(); this takes the
relative size of the test set as a parameter. x and y are empty lists; we’ll use
the glob() function from the glob module to get all the pathnames for the
sound files in our dataset. The pattern we use for this is:
“D:\\DataFlair\\ravdess data\\Actor_*\\*.wav”. This is because our
dataset looks like this:
Screenshot:
So, for each such path, get the base name of the file and extract the emotion
by splitting the name around ‘-’ and taking the third value:
Screenshot:
Using our emotions dictionary, this number is turned into an emotion, and
our function checks whether this emotion is in our list of
observed_emotions; if not, it continues to the next file. It makes a call to
extract_feature and stores what is returned in ‘feature’. Then, it appends
the feature to x and the emotion to y. So, the list x holds the features and y
holds the emotions. We call the function train_test_split with these, the
test size, and a random state value, and return that.
Screenshot:
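Putting the above together, a sketch of load_data() could look like this; the random_state value of 9 is an arbitrary illustrative choice:

    def load_data(test_size=0.2):
        x, y = [], []
        for file in glob.glob("D:\\DataFlair\\ravdess data\\Actor_*\\*.wav"):
            file_name = os.path.basename(file)
            # The emotion code is the third '-'-separated field of the filename
            emotion = emotions[file_name.split("-")[2]]
            if emotion not in observed_emotions:
                continue
            feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
            x.append(feature)
            y.append(emotion)
        return train_test_split(np.array(x), y, test_size=test_size, random_state=9)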
5. Time to split the dataset into training and testing sets! Let’s keep the test
set at 25% of the whole dataset and use the load_data function for this.
Screenshot:
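For example:

    x_train, x_test, y_train, y_test = load_data(test_size=0.25)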
6. Observe the shape of the training and testing datasets:
Screenshot:
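Something along these lines prints the number of samples in each set:

    print((x_train.shape[0], x_test.shape[0]))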
Output Screenshot:
7. Get the number of features extracted.
Output Screenshot:
8. Now, let’s initialize an MLPClassifier. This is a Multi-layer Perceptron
Classifier; it optimizes the log-loss function using LBFGS or stochastic
gradient descent, and being a neural network, it learns to classify the
extracted features into the observed emotions.
Screenshot:
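A sketch of these two steps; the hyperparameter values shown are illustrative assumptions rather than the original settings:

    # Number of features extracted per sample
    print(f'Features extracted: {x_train.shape[1]}')

    # Initialize the Multi Layer Perceptron Classifier
    model = MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08,
                          hidden_layer_sizes=(300,), learning_rate='adaptive',
                          max_iter=500)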
9. Fit/train the model.
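With the model object from the previous step, training is a single call:

    model.fit(x_train, y_train)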
Output Screenshot:
10. Let’s predict the values for the test set. This gives us y_pred (the predicted emotions for
the features in the test set).
Screenshot:
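For example:

    y_pred = model.predict(x_test)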
11. To calculate the accuracy of our model, we’ll call the accuracy_score()
function we imported from sklearn. Finally, we’ll round the accuracy to 2
decimal places and print it out.
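A sketch of that calculation:

    accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
    print("Accuracy: {:.2f}%".format(accuracy * 100))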
Output Screenshot:
Summary
In this Python mini project, we learned to recognize emotions from speech.
We used an MLPClassifier for this, along with the soundfile library to read
the sound files and the librosa library to extract features from them. As you
can see, the model delivered an accuracy of 72.4%. That’s good enough for us
for now.