ANN Report

The project report details the implementation of an Automatic Music Emotion Classification system using a CNN+LSTM architecture to classify music into four emotional categories: Happy/Energetic, Calm/Peaceful, Angry/Tense, and Sad/Melancholic. It highlights the significance of Music Emotion Recognition (MER) in applications such as music streaming services and mental health therapy, while also addressing challenges like subjective emotional perception and computational complexity. The model achieved approximately 85% accuracy, demonstrating its effectiveness in analyzing music patterns and predicting emotions.


K J Somaiya Institute of Technology

An Autonomous Institute Permanently Affiliated to University of Mumbai.

A Project Based Learning

Report On

“Implement Automatic Music Emotion Classification Using NN”
SUBMITTED BY

Gargi Bendale--(16)

Riya Birnale--(18)

Krishna Jogi--(35)

Guide

Prof. Milind Nemade


Department of Artificial Intelligence and Data Science
2024-25
K J Somaiya Institute of Technology
UNIVERSITY OF MUMBAI

CERTIFICATE

This is to certify that the project titled “Implement Automatic Music Emotion Classification Using NN” is completed under my supervision and guidance in partial fulfilment of the requirements of the course ANN, by the following students:

Gargi Bendale--(16)

Riya Birnale--(18)

Krishna Jogi--(35)
The course is a part of semester VI of the Department of Artificial Intelligence and Data Science
during the academic year 2024-2025. The said work has been assessed and is found to be
satisfactory.

(Internal guide name and sign.)

College seal
Table of Contents
Sr.No Content Page No.
1 Introduction 2
2 Literature Survey 4
3 Problem Definition 5
4 Working of Project 6
5 Analysis of Different Methods 7
6 Performance parameters 8
7 Flowchart of Program 9
8 Result 11
9 Conclusion 12
10 References 13

1. Introduction

Background and Motivation


Music has always played a significant role in shaping human emotions, evoking feelings of happiness,
sadness, excitement, or calmness. The ability to recognize and categorize these emotions is crucial
for various applications such as music recommendation systems, emotional therapy, and interactive
AI-driven music platforms. The field of Music Emotion Recognition (MER) seeks to automatically
classify music into different emotional categories based on its acoustic and structural properties.

Importance of Music Emotion Recognition


 Music Streaming Services: Personalized playlists based on mood (e.g., Spotify and Apple
Music).
 Music Therapy: Assisting in emotional well-being and mental health treatments.
 AI and Human-Computer Interaction: Enhancing the experience in gaming, virtual reality,
and assistive technologies.

Traditional methods of music classification were based on genre or manually annotated metadata.
However, these approaches fail to capture the complex emotional impact of music. Recent
advancements in machine learning and deep learning, particularly Convolutional Neural Networks
(CNNs) and Long Short-Term Memory (LSTM) networks, have shown promising results in
extracting meaningful features from music and understanding its emotional content.

This report focuses on the development of an MER system using deep learning techniques,
specifically CNN+LSTM architecture, to analyze spectrograms of music and classify them into
predefined emotional categories.

Objectives
 Develop an MER system using a CNN+LSTM architecture.
 Extract Mel Spectrograms from songs to analyze spatial and temporal features.
 Classify music into four emotion categories: Happy/Energetic, Calm/Peaceful,
Angry/Tense, and Sad/Melancholic.
 Enable real-time emotion prediction for new songs.
 Contribute to AI-driven music analytics, personalized recommendations, and affective
computing.

2. Literature Survey

Traditional Feature-Based Approaches


Early studies in MER relied on handcrafted features such as MFCCs (Mel-Frequency Cepstral
Coefficients), Chroma Features, and Rhythm Patterns. Research by Laurier et al. (2009) used
Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN) for emotion classification based
on such features. While these methods achieved moderate accuracy, they struggled with complex
emotional dynamics.

Arousal-Valence Models in MER


Russell’s Circumplex Model of Affect (1980) has been widely adopted for emotion
representation, mapping emotions on arousal (energy level) and valence (positivity or negativity)
dimensions. Studies like Yang et al. (2012) developed regression-based models for continuous
emotion prediction using arousal and valence values at different time intervals.

Deep Learning in MER


 CNNs for Feature Extraction: CNNs extract spatial and frequency-based features from
spectrograms, identifying pitch, tone, and intensity.
 LSTMs for Capturing Time-Series Dependencies: LSTMs model emotional progression in
music by analyzing sequential patterns over time.
 Comparison with Other Architectures:
o CNN+LSTM vs. GRUs: GRUs are computationally efficient but similar in
performance to LSTMs.
o CNN+LSTM vs. Transformers: Transformers handle long-term dependencies better
but require more computational resources.
o Why CNN+LSTM? CNN+LSTM balances efficiency and accuracy, making it a
reliable choice for MER tasks.

3. Problem Definition

Music classification based on genre is common, but recognizing emotions from music remains
challenging due to:
 The subjective nature of emotions.
 Variations in tempo, pitch, and harmony affecting perceived emotions.
 The need for real-time classification in applications like music streaming and therapy.
This project aims to develop a neural-network-based MER system that effectively classifies emotions
from music, leveraging a CNN+LSTM architecture.

4. Working of Project

Dataset and Feature Extraction

 Dataset: The dataset consists of 1,802 MP3 songs, each associated with arousal and valence
values recorded at 500ms intervals. These values represent the song’s emotional intensity
and positivity.
 Labeling Process: Since explicit emotion labels were unavailable, we manually labeled the
dataset by mapping arousal and valence values to four emotional categories:
Happy/Energetic, Calm/Peaceful, Angry/Tense, and Sad/Melancholic.
 Feature Extraction: Mel Spectrograms are extracted from the audio files, capturing frequency and time information (a short extraction sketch follows this list).
 Data Preprocessing: Normalization and augmentation techniques are applied to improve model generalization.
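The following is a minimal sketch of the extraction and normalization steps referenced above, assuming the librosa library; the sampling rate, 30-second analysis window, and number of Mel bands are illustrative choices, not values taken from the report.

```python
import librosa
import numpy as np

def extract_mel_spectrogram(mp3_path, sr=22050, duration=30.0, n_mels=128):
    """Load an audio clip and return a normalized log-Mel spectrogram of shape (n_mels, time_frames)."""
    y, sr = librosa.load(mp3_path, sr=sr, duration=duration)         # decode and resample the MP3
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)  # power Mel spectrogram
    log_mel = librosa.power_to_db(mel, ref=np.max)                   # convert to a log (dB) scale
    # Min-max normalization so every song lies in [0, 1], as in the preprocessing bullet above.
    return (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
```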

Model Architecture
1. CNN Layer: Extracts spatial features from the Mel spectrograms.
2. LSTM Layer: Processes the extracted features over time to detect emotion transitions.
3. Fully Connected Layer: Maps the learned features to the four emotion categories.
4. Softmax Layer: Outputs the final probability for each emotion category (a minimal architecture sketch follows this list).
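The sketch below shows one way to realize this architecture in Keras. It is an illustration under stated assumptions, not the report's exact network: the layer sizes, the 128×1292 input shape (128 Mel bands over roughly a 30-second clip), and the optimizer are assumptions.

```python
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_CLASSES = 128, 1292, 4   # assumed spectrogram size and the four emotion classes

def build_cnn_lstm(n_mels=N_MELS, n_frames=N_FRAMES, n_classes=N_CLASSES):
    inputs = layers.Input(shape=(n_mels, n_frames, 1))                  # Mel spectrogram as a 1-channel image
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)                                  # downsample mel and time axes
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Permute((2, 1, 3))(x)                                    # put the (reduced) time axis first
    x = layers.Reshape((n_frames // 4, (n_mels // 4) * 64))(x)          # one feature vector per time step
    x = layers.LSTM(64)(x)                                              # model emotional progression over time
    outputs = layers.Dense(n_classes, activation="softmax")(x)          # probability for each emotion category
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The Permute/Reshape pair is what hands the CNN feature maps to the LSTM as a sequence along the time axis, which is the essence of the CNN+LSTM design described above.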

Training and Evaluation


 Training: The model is trained using a supervised learning approach with labels derived from
arousal-valence thresholds.
 Hyperparameter Tuning: Batch size, learning rate, and model depth are tuned; illustrative values appear in the training sketch after this list.
 Evaluation Metrics: Performance is measured using accuracy, precision, recall, F1-score,
and confusion matrix analysis.
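A minimal training-and-evaluation sketch under the same assumptions, reusing build_cnn_lstm from the architecture sketch above; the random stand-in data, epoch count, and batch size are purely illustrative, not the report's actual settings.

```python
import numpy as np

# Stand-in data; in the real pipeline X comes from the Mel-spectrogram step
# and y from the arousal-valence labelling step (integer classes 0-3).
X = np.random.rand(64, 128, 1292, 1).astype("float32")
y = np.random.randint(0, 4, size=64)
X_train, X_test = X[:48], X[48:]
y_train, y_test = y[:48], y[48:]

model = build_cnn_lstm()                        # defined in the architecture sketch above
history = model.fit(X_train, y_train,
                    validation_split=0.2,       # hold out part of the training set for tuning
                    epochs=30, batch_size=16)   # illustrative hyperparameter values
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```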

5. Analysis of Different Methods

Comparison of Approaches
 Supervised Learning (CNN+LSTM): Offers high accuracy and learns hierarchical audio
patterns but requires labelled data.
 Unsupervised Learning (Clustering): Useful for exploratory analysis but lacks accuracy in
classification.
 Traditional Feature Extraction (MFCCs, Chroma): Simple and interpretable but requires
manual feature selection.
 Transformers: Captures long-term dependencies but demands large datasets and high
computational resources.

Experimental Analysis
 CNN+LSTM achieved the best balance between accuracy and computational efficiency.
 Clustering-based methods were less reliable for real-time applications.
 Transformers showed potential but were resource-intensive.

6. Performance Parameters

Key Metrics for Evaluation

 Accuracy: Measures the percentage of correctly classified emotions.


Accuracy = (TP + TN) / (TP + TN + FP + FN)
 Precision: Of the songs predicted as a given emotion, the fraction that truly belong to it.
Precision = TP / (TP + FP)
 Recall (Sensitivity): Of the songs that truly belong to a given emotion, the fraction the model correctly detects.
Recall = TP / (TP + FN)
 F1-Score: Balances precision and recall.
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
 Confusion Matrix: Analyzes classification errors and misclassifications.
A matrix representation of True Positives (TP), False Positives (FP), True Negatives (TN),
and False Negatives (FN) to evaluate model performance.
 Loss Function (Cross-Entropy Loss): Measures the difference between the predicted and actual probability distributions.
Cross-Entropy Loss = -Σ (y * log(y_pred))
Where y is the actual (one-hot) class label and y_pred is the predicted probability for that class. A short sketch computing these metrics follows this list.
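The sketch referenced above computes these metrics with scikit-learn; the tiny hand-written labels and random probabilities are stand-ins for the model's real test-set outputs, and macro averaging over the four classes is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, log_loss)

# Stand-ins for the test labels and the model's softmax outputs over the 4 classes.
y_true = np.array([0, 1, 2, 3, 0, 2, 1, 3])
y_prob = np.random.dirichlet(np.ones(4), size=len(y_true))
y_pred = y_prob.argmax(axis=1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Cross-entropy loss:", log_loss(y_true, y_prob, labels=[0, 1, 2, 3]))
```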

7. Flowchart of Program

1. Start
The system begins by initializing all necessary components, including importing libraries and setting
up the environment for data processing.

2. Load Dataset (Arousal and Valence CSV, MP3 Files)


 The dataset consists of 1,802 MP3 files and their corresponding arousal and valence values,
recorded at 500ms intervals.
 These CSV files contain song IDs as keys, with numerical values representing emotional intensity (arousal) and positivity (valence); a loading sketch follows this list.
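A minimal loading sketch, assuming pandas and a layout with one row per song, a song_id column, and one column per 500 ms annotation step; the file names and column name are assumptions about the dataset, not details confirmed by the report.

```python
import pandas as pd

# Assumed layout: one row per song, indexed by song_id, one column per 500 ms time step.
arousal = pd.read_csv("arousal.csv", index_col="song_id")
valence = pd.read_csv("valence.csv", index_col="song_id")

# Collapse the per-500ms annotations into a single mean value per song,
# which is what the labelling step below works from.
song_arousal = arousal.mean(axis=1)
song_valence = valence.mean(axis=1)
```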

3. Preprocess Data: Normalize Values, Generate Emotion Labels


 Normalization: Arousal and valence values are scaled to a uniform range (e.g., between 0
and 1) to improve model efficiency.
 Label Generation: Since explicit labels are unavailable, emotions are derived from arousal and valence values (a mapping sketch follows this list):
o High Arousal + High Valence → Happy/Energetic
o Low Arousal + High Valence → Calm/Peaceful
o High Arousal + Low Valence → Angry/Tense
o Low Arousal + Low Valence → Sad/Melancholic
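The quadrant mapping above can be expressed as a small helper; this is a sketch, and the decision threshold (0 here, assuming annotations centred around zero; use 0.5 for values scaled to [0, 1]) is an assumption rather than the report's exact cut-off.

```python
def label_emotion(arousal, valence, threshold=0.0):
    """Map a song's mean arousal/valence pair to one of the four emotion quadrants."""
    if arousal >= threshold and valence >= threshold:
        return "Happy/Energetic"      # high arousal, high valence
    if arousal < threshold and valence >= threshold:
        return "Calm/Peaceful"        # low arousal, high valence
    if arousal >= threshold and valence < threshold:
        return "Angry/Tense"          # high arousal, low valence
    return "Sad/Melancholic"          # low arousal, low valence

print(label_emotion(0.4, 0.6))    # Happy/Energetic
print(label_emotion(-0.3, -0.5))  # Sad/Melancholic
```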

4. Extract Mel Spectrograms from Audio


 Mel Spectrograms are generated from each MP3 file to capture frequency and time-based
information.
 CNN is used to extract spatial features from these spectrograms.

5. Reshape Data for CNN + LSTM


 The extracted spectrograms are reshaped into a format suitable for deep learning models (a short reshape sketch follows).
 CNN layers handle feature extraction, while LSTM layers process sequential dependencies over time.
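A short sketch of the reshape step, assuming the per-song spectrograms from Step 4 all share the same shape; the dummy arrays only illustrate the target tensor layout.

```python
import numpy as np

# Stand-ins for per-song log-Mel spectrograms of identical shape (n_mels, time_frames).
spectrograms = [np.random.rand(128, 1292).astype("float32") for _ in range(4)]

X = np.stack(spectrograms)   # (num_songs, n_mels, time_frames)
X = X[..., np.newaxis]       # add a channel axis for the Conv2D layers
print(X.shape)               # (4, 128, 1292, 1) -- the input format used in the architecture sketch
```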

6. Train CNN + LSTM Model


 The CNN extracts spatial features from spectrograms.
 The LSTM models temporal dependencies to understand emotional progressions within a
song.
 The model is trained using supervised learning, with labels derived from the arousal-valence
mapping.

7. Evaluate Performance (Accuracy, Loss, Confusion Matrix)


 The model’s performance is tested using metrics like accuracy, precision, recall, F1-score,
and a confusion matrix.
 Loss functions (such as cross-entropy loss) measure how closely the predicted probabilities match the true labels on held-out data.

8. Predict Emotion for New Song


 When a new song is provided, it undergoes the same preprocessing and feature-extraction steps as the training data (see the inference sketch below).
 The CNN+LSTM model then classifies the song into one of the four emotion categories.
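A minimal end-to-end prediction sketch, reusing the extract_mel_spectrogram and build_cnn_lstm sketches from earlier sections; the class ordering in EMOTIONS, the hypothetical file path, and the requirement that the clip length match the training input shape are assumptions.

```python
import numpy as np

EMOTIONS = ["Happy/Energetic", "Calm/Peaceful", "Angry/Tense", "Sad/Melancholic"]  # assumed label order

def predict_emotion(mp3_path, model):
    """Apply the training-time preprocessing to a new song and return its predicted emotion."""
    mel = extract_mel_spectrogram(mp3_path)      # same extraction sketch as in Section 4
    x = mel[np.newaxis, ..., np.newaxis]         # add batch and channel dimensions
    probs = model.predict(x)[0]                  # softmax probabilities over the 4 classes
    return EMOTIONS[int(np.argmax(probs))], probs

# Example usage with a trained model and a hypothetical file path:
# emotion, probs = predict_emotion("new_song.mp3", trained_model)
```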

9. End
The system outputs predicted emotions and can be used in applications like music
recommendation systems, therapy, gaming, and media analysis.

8. Result

Model Performance
 Accuracy: Achieved an accuracy of approximately 85% on test data.
 Emotion Trends:
o High valence + high arousal → Happy
o Low valence + high arousal → Angry
o Low valence + low arousal → Sad

Visualization & Interpretability


 Spectrogram representations for different emotional categories.
 Accuracy and loss graphs over training epochs.
 Confusion matrix to analyze misclassifications.
 Comparison of ground truth vs. predicted emotions.

Spectrogram of a Happy Music Clip

9. Conclusion

Music Emotion Recognition (MER) using deep learning is a significant advancement in the
field of artificial intelligence, allowing for a more profound understanding of how music
influences human emotions. This project successfully implemented a CNN+LSTM-based
model that classifies music into four primary emotional categories: Happy/Energetic,
Calm/Peaceful, Angry/Tense, and Sad/Melancholic. By leveraging arousal and valence
values, the system effectively analyzes music patterns and assigns corresponding emotions
with high accuracy. The manually labeled dataset, created using arousal-valence mapping,
played a crucial role in training the model to recognize emotional variations accurately.

The impact of MER extends across multiple domains. Music streaming services can
integrate MER to provide mood-based recommendations, enhancing user engagement.
Mental health applications can benefit from emotion-driven playlists that assist in therapy
and relaxation techniques. In interactive AI systems, MER can enhance gaming experiences
by adapting in-game soundtracks to match player emotions, creating a more immersive and
dynamic environment. Moreover, the application of MER in film scoring can help automate
the selection of background music that aligns with the mood of a scene.

Despite its promising capabilities, MER faces several challenges. Subjectivity in emotional
perception makes classification difficult, as different listeners may interpret the same song
differently. Additionally, the overlap of emotional categories complicates classification,
requiring more nuanced models for better accuracy. Computational complexity is another
hurdle, as training deep learning models on large-scale audio data demands significant
processing power. The trade-off between real-time processing and model accuracy must
also be addressed for MER to be effectively deployed in commercial applications.

10. References

 IEEE Paper: Music Emotion Recognition Using Deep Learning.


 YouTube Tutorials:
o Music Emotion Recognition | Deep Learning Project.
o Spectrograms and Feature Extraction.
 Hizlisoy, S., Yildirim, S., & Tufekci, Z. (2020). Music emotion recognition using convolutional
long short-term memory deep neural networks. Neural Computing and Applications, 32(10),
6629-6641.
 Zhao, S., Li, Y., Yao, X., Nie, W., Xu, P., Yang, J., & Keutzer, K. (2020). Emotion-based
end-to-end matching between image and music in valence-arousal space. arXiv preprint
arXiv:2009.05103.
 https://ieeexplore.ieee.org/document/10729602
