ANN Report
Report On
Implement Automatic Music Emotion Classification
Gargi Bendale--(16)
Riya Birnale--(18)
Krishna Jogi--(35)
Guide
CERTIFICATE
This is to certify that the project titled “Implement Automatic Music Emotion Classification” has been satisfactorily completed by
Gargi Bendale--(16)
Riya Birnale--(18)
Krishna Jogi--(35)
The course is a part of semester VI of the Department of Artificial Intelligence and Data Science
during the academic year 2024-2025. The said work has been assessed and is found to be
satisfactory.
College seal
Table of Contents
1. Introduction
2. Literature Survey
3. Problem Definition
4. Working of Project
5. Analysis of Different Methods
6. Performance Parameters
7. Flowchart of Program
8. Result
9. Conclusion
10. References
1. Introduction
Traditional methods of music classification were based on genre or manually annotated metadata.
However, these approaches fail to capture the complex emotional impact of music. Recent
advancements in machine learning and deep learning, particularly Convolutional Neural Networks
(CNNs) and Long Short-Term Memory (LSTM) networks, have shown promising results in
extracting meaningful features from music and understanding its emotional content.
This report focuses on the development of a Music Emotion Recognition (MER) system using deep learning techniques, specifically a CNN+LSTM architecture, to analyze spectrograms of music and classify them into predefined emotional categories.
Objectives
Develop an MER system using a CNN+LSTM architecture.
Extract Mel Spectrograms from songs to analyze spatial and temporal features.
Classify music into four emotion categories: Happy/Energetic, Calm/Peaceful, Angry/Tense, and Sad/Melancholic.
Enable real-time emotion prediction for new songs.
Contribute to AI-driven music analytics, personalized recommendations, and affective computing.
2. Literature Survey
3. Problem Definition
Music classification based on genre is common, but recognizing emotions from music remains
challenging due to:
The subjective nature of emotions.
Variations in tempo, pitch, and harmony affecting perceived emotions.
The need for real-time classification in applications like music streaming and therapy.
This project aims to develop a neural network-based MER system that effectively classifies emotions from music, leveraging a CNN+LSTM architecture.
4. Working of Project
Dataset: The dataset consists of 1,802 MP3 songs, each associated with arousal and valence
values recorded at 500ms intervals. These values represent the song’s emotional intensity
and positivity.
Labeling Process: Since explicit emotion labels were unavailable, we manually labeled the dataset by mapping arousal and valence values to four emotional categories: Happy/Energetic, Calm/Peaceful, Angry/Tense, and Sad/Melancholic (a labeling sketch is shown after this list).
Feature Extraction: Mel Spectrograms are extracted from the audio files, capturing frequency and time information (an extraction sketch is shown after this list).
Data Preprocessing: Normalization and augmentation techniques are applied to improve
model generalization.
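The labeling step above amounts to a simple quadrant rule over the arousal-valence plane. The following is a minimal sketch of that mapping; the 0.5 threshold (assuming values normalized to [0, 1]) and the annotations file and column names are illustrative assumptions, not details taken from the dataset.

```python
import pandas as pd

def label_emotion(valence, arousal, threshold=0.5):
    """Map a (valence, arousal) pair to one of the four emotion quadrants.
    The 0.5 threshold assumes both values are normalized to [0, 1]."""
    if valence >= threshold and arousal >= threshold:
        return "Happy/Energetic"
    if valence >= threshold:
        return "Calm/Peaceful"
    if arousal >= threshold:
        return "Angry/Tense"
    return "Sad/Melancholic"

# Hypothetical annotations file: one row per song, with arousal and valence
# averaged over the 500 ms frames.
annotations = pd.read_csv("annotations.csv")  # assumed columns: song_id, valence, arousal
annotations["emotion"] = [
    label_emotion(v, a) for v, a in zip(annotations["valence"], annotations["arousal"])
]
print(annotations["emotion"].value_counts())
```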
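Below is a minimal feature-extraction sketch using the librosa library. The sampling rate, clip duration, number of mel bands, and example file name are illustrative assumptions, not the exact settings used in this project.

```python
import librosa
import numpy as np

def extract_mel_spectrogram(path, sr=22050, n_mels=128, duration=30.0):
    """Load a fixed-length audio clip and convert it to a log-scaled Mel Spectrogram."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Min-max normalize to [0, 1] so the CNN sees inputs on a consistent scale.
    return (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)

spec = extract_mel_spectrogram("song_0001.mp3")  # hypothetical file name
print(spec.shape)  # (n_mels, time_frames)
```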
Model Architecture
1. CNN Layer: Extracts spatial features from spectrograms.
2. LSTM Layer: Processes extracted features over time to detect emotion transitions.
3. Fully Connected Layer: Classifies into four emotion categories.
4. Softmax Layer: Outputs the final emotion probabilities over the four categories (a model sketch is shown below).
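A minimal Keras sketch of this CNN+LSTM stack is shown below; the layer widths, input dimensions, and training settings are illustrative assumptions rather than the exact configuration used in the project.

```python
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_CLASSES = 128, 1292, 4   # assumed spectrogram size and class count

model = models.Sequential([
    layers.Input(shape=(N_MELS, N_FRAMES, 1)),             # spectrogram as a 1-channel image
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Permute((2, 1, 3)),                              # reorder to (time, freq, channels)
    layers.Reshape((-1, (N_MELS // 4) * 64)),               # one feature vector per time step
    layers.LSTM(64),                                        # temporal modeling of emotion transitions
    layers.Dense(64, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),          # probabilities for the four emotions
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The Permute/Reshape pair turns the pooled spectrogram into a sequence of per-time-step feature vectors, so the LSTM can track how the emotional character of the song evolves over time before the dense layers produce the final class probabilities.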
5. Analysis of Different Methods
Comparison of Approaches
Supervised Learning (CNN+LSTM): Offers high accuracy and learns hierarchical audio patterns but requires labeled data.
Unsupervised Learning (Clustering): Useful for exploratory analysis but lacks accuracy in
classification.
Traditional Feature Extraction (MFCCs, Chroma): Simple and interpretable but requires
manual feature selection.
Transformers: Capture long-term dependencies but demand large datasets and high computational resources.
Experimental Analysis
CNN+LSTM achieved the best balance between accuracy and computational efficiency.
Clustering-based methods were less reliable for real-time applications.
Transformers showed potential but were resource-intensive.
6. Performance Parameters
7. Flowchart of Program
1. Start
The system begins by initializing all necessary components, including importing libraries and setting
up the environment for data processing.
9. End
The system outputs predicted emotions and can be used in applications like music
recommendation systems, therapy, gaming, and media analysis.
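As an illustration of how the trained system could output a prediction for a new song, the sketch below reuses the model from the architecture sketch and the extract_mel_spectrogram helper from the feature-extraction sketch; the class ordering and the input file name are assumptions.

```python
import numpy as np

# Assumed class order; must match the label encoding used during training.
EMOTIONS = ["Happy/Energetic", "Calm/Peaceful", "Angry/Tense", "Sad/Melancholic"]

def predict_emotion(model, path):
    """Extract features for one song and return (predicted label, confidence)."""
    spec = extract_mel_spectrogram(path)        # assumes the clip yields the training frame count
    batch = spec[np.newaxis, ..., np.newaxis]   # shape (1, n_mels, time_frames, 1)
    probs = model.predict(batch, verbose=0)[0]  # softmax output over the four classes
    return EMOTIONS[int(np.argmax(probs))], float(probs.max())

label, confidence = predict_emotion(model, "new_song.mp3")  # hypothetical input file
print(f"Predicted emotion: {label} (confidence {confidence:.2f})")
```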
8. Result
Model Performance
Accuracy: Achieved an accuracy of approximately 85% on test data.
Emotion Trends:
o High valence + high arousal → Happy
o High valence + low arousal → Calm
o Low valence + high arousal → Angry
o Low valence + low arousal → Sad
9. Conclusion
Music Emotion Recognition (MER) using deep learning is a significant advancement in the
field of artificial intelligence, allowing for a more profound understanding of how music
influences human emotions. This project successfully implemented a CNN+LSTM-based
model that classifies music into four primary emotional categories: Happy/Energetic,
Calm/Peaceful, Angry/Tense, and Sad/Melancholic. By leveraging arousal and valence
values, the system effectively analyzes music patterns and assigns corresponding emotions
with high accuracy. The manually labeled dataset, created using arousal-valence mapping,
played a crucial role in training the model to recognize emotional variations accurately.
The impact of MER extends across multiple domains. Music streaming services can
integrate MER to provide mood-based recommendations, enhancing user engagement.
Mental health applications can benefit from emotion-driven playlists that assist in therapy
and relaxation techniques. In interactive AI systems, MER can enhance gaming experiences
by adapting in-game soundtracks to match player emotions, creating a more immersive and
dynamic environment. Moreover, the application of MER in film scoring can help automate
the selection of background music that aligns with the mood of a scene.
Despite its promising capabilities, MER faces several challenges. Subjectivity in emotional
perception makes classification difficult, as different listeners may interpret the same song
differently. Additionally, the overlap of emotional categories complicates classification,
requiring more nuanced models for better accuracy. Computational complexity is another
hurdle, as training deep learning models on large-scale audio data demands significant
processing power. The trade-off between real-time processing and model accuracy must
also be addressed for MER to be effectively deployed in commercial applications.
10. References