Speech Emotion Recognition (Sound Classification)
Hackers Realm
The dataset used for this project is the Toronto Emotional Speech Set (TESS),
available on Kaggle; additional datasets exist and could be incorporated to
enrich training. TESS contains 2,800 samples labeled by emotion, which supports
effective model training.
The classifier was developed on Kaggle, where the data was downloaded and
processed. The audio files, in .wav format, are organized into folders according
to the emotion they represent. The computational work was done in a Kaggle
notebook, which works much like a Jupyter notebook and enables efficient code
execution and analysis.
Key libraries such as numpy, os, seaborn, matplotlib, librosa, and IPython.display
were imported for data handling and visualization. Data processing included
extracting the file paths and labels from the dataset and loading them into a
pandas DataFrame for ease of manipulation.
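This path-and-label extraction step can be sketched as follows. This is a minimal illustration, not the author's exact code: the root directory name is hypothetical, and it assumes the TESS convention that each file name ends with its emotion label (e.g. OAF_back_angry.wav).

```python
import os
import pandas as pd

def collect_audio_paths(root_dir):
    """Walk the dataset directory and pair each .wav path with its emotion label.

    Assumes TESS-style file names, where the emotion is the last
    underscore-separated token before the extension.
    """
    paths, labels = [], []
    for dirpath, _, filenames in os.walk(root_dir):
        for fname in filenames:
            if fname.lower().endswith(".wav"):
                paths.append(os.path.join(dirpath, fname))
                # e.g. "OAF_back_angry.wav" -> "angry"
                labels.append(fname.split("_")[-1].split(".")[0].lower())
    return pd.DataFrame({"speech": paths, "label": labels})

# Hypothetical dataset root; adjust to wherever TESS is unpacked.
df = collect_audio_paths("TESS Toronto emotional speech set data")
print(df.head())
```

Keeping paths and labels in one DataFrame makes later steps, such as counting samples per emotion with seaborn or iterating over files for feature extraction, straightforward.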
Training used a GPU accelerator to iterate quickly through epochs, and accuracy
climbed rapidly to approximately 99% on both the training and validation sets,
indicating that the model performs strongly on this dataset for speech emotion
recognition.
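The architecture itself is not detailed above; the sketch below shows one common setup for this task, an LSTM classifier over a per-clip MFCC feature vector. The layer sizes, dropout rates, 40 MFCC coefficients, and 7-class output are all assumptions for illustration, not the author's confirmed configuration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

def build_model(n_mfcc=40, n_classes=7):
    """LSTM classifier: each clip is a 40-step sequence of scalar MFCC values."""
    model = Sequential([
        LSTM(128, input_shape=(n_mfcc, 1)),  # summarize the MFCC sequence
        Dropout(0.2),
        Dense(64, activation="relu"),
        Dropout(0.2),
        Dense(n_classes, activation="softmax"),  # one probability per emotion
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```

With one-hot encoded labels, training would then be a single `model.fit(X, y, validation_split=0.2, epochs=...)` call, which is where a GPU accelerator speeds up the per-epoch iteration described above.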
Final Thoughts
The journey from understanding the dataset to building and validating a proficient
model is a rewarding exercise in deep learning and artificial intelligence. Speech
emotion recognition can play a pivotal role across sectors by helping to gauge
human emotions, making it an asset for many practical applications in technology.