
MUSIC GENRE CLASSIFICATION: HYBRID MODEL

INNOVATIVE / MULTI-DISCIPLINARY PROJECT REPORT


SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE AWARD OF THE
DEGREE OF BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE AND ENGINEERING
OF THE ANNA UNIVERSITY

May 2024

Submitted by
AKSHARA SHRI V – 722821104009
DHIVYA PARAMESHWARI S – 722821104041
ISWARIYA G – 722821104061

BATCH
2021 – 2025
Under the Guidance of

DR. S. ANANTHI M.E., Ph.D.

ASSISTANT PROFESSOR, CSE

COMPUTER SCIENCE AND ENGINEERING

SRI ESHWAR COLLEGE OF ENGINEERING


(An Autonomous Institution – Affiliated to Anna University)

COIMBATORE – 641 202


BONAFIDE CERTIFICATE

Certified that this project report "MUSIC GENRE CLASSIFICATION: HYBRID MODEL" is the bonafide work of

AKSHARA SHRI V [21CS009]


DHIVYA PARAMESHWARI S [21CS042]
ISWARIYA G [21CS062]

who carried out the project work under my supervision.

…………………………………
SIGNATURE
Dr. R. SUBHA M.E., Ph.D.
Professor & Head,
Dept. of Computer Science & Engineering,
Sri Eshwar College of Engineering,
Coimbatore – 641 202.

…………………………………
SIGNATURE
Dr. S. ANANTHI M.E., Ph.D.
Project Guide, Assistant Professor,
Dept. of Computer Science & Engineering,
Sri Eshwar College of Engineering,
Coimbatore – 641 202.

Submitted for the Autonomous Semester End Innovative / Multi-Disciplinary Project Viva-Voce held on ………………….

……………………… ………………………

(Internal Examiner) (External Examiner)

DECLARATION

AKSHARA SHRI V [21CS009]

DHIVYA PARAMESHWARI S [21CS042]

ISWARIYA G [21CS061]

We hereby declare that the project entitled "MUSIC GENRE CLASSIFICATION: HYBRID MODEL", submitted in partial fulfilment to the University as the project work of the Bachelor of Engineering (Computer Science and Engineering) degree, is a record of original work done by us under the supervision and guidance of Dr. S. ANANTHI M.E., Ph.D., Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore.

Place: Coimbatore

Date:

[AKSHARA SHRI V]

[DHIVYA PARAMESHWARI S]

[ISWARIYA G]

Guided by,

[Dr. S. ANANTHI, AP/CSE]


ACKNOWLEDGEMENT

The success of any work depends on teamwork and cooperation. We take this opportunity to express our gratitude and thanks to everyone who helped us in our project. We would like to thank the management for the constant support they provided to complete this project.

It is indeed our great honour and bounden duty to thank our beloved Chairman, Mr. R. Mohanram, for the academic interest he has shown towards the students.

We are indebted to our Director, Mr. R. Rajaram, for motivating us and providing all facilities.

We wish to express our sincere regards and deep sense of gratitude to our Principal, Dr. Sudha Mohanram, M.E., Ph.D., for the excellent facilities and encouragement provided during the course of the study and project.

We are indebted to Dr. R. Subha, M.E., Ph.D., Head of the Computer Science and Engineering Department, for having permitted us to carry out this project and for giving us complete freedom to utilize the resources of the department.

We express our sincere thanks to our mini-project Coordinator, Ms. D. Mohana Priya, M.E., Assistant Professor, Department of Computer Science and Engineering, for the valuable guidance and encouragement given to us for this project.

We also extend our heartfelt thanks to our Project Guide, Dr. S. Ananthi, M.E., Ph.D., Assistant Professor, Department of Computer Science and Engineering, for her support and guidance, which really helped us.

We solemnly express our thanks to all the teaching and non-teaching staff of the Computer Science and Engineering Department, and to our family and friends, for their valuable support, which inspired us to work on this project.
TABLE OF CONTENTS

S. NO    TITLE

         ABSTRACT
         LIST OF FIGURES
         LIST OF ABBREVIATIONS
1.       INTRODUCTION
2.       SYSTEM ANALYSIS AND DESIGN
         2.1 Existing Scenario
         2.2 Problem Statement
3.       PROPOSED SOLUTION
         3.1 Overview
         3.2 Block Diagram
4.       SYSTEM SPECIFICATION
         4.1 Hardware Requirements
         4.2 Software Requirements
5.       PROJECT DESCRIPTION
         5.1 Methodology
         5.2 Implementation
             5.2.1 Data Extraction
             5.2.2 Preprocessing
             5.2.3 Machine Learning Algorithm
             5.2.4 Integration and Analysis
             5.2.5 Performance Measures
6.       RESULTS AND IMPLEMENTATION
7.       CONCLUSION
8.       REFERENCES
ABSTRACT

This work presents an approach to music genre classification built on current techniques in machine learning and signal processing. The methodology integrates Convolutional Neural Networks (CNNs), multi-model architectures, and transfer learning to classify music accurately by genre. The process involves feature extraction from both wavelet and spectrogram representations of the audio data, providing comprehensive insight into the temporal and frequency characteristics of the music. By employing CNNs, the model learns hierarchical representations of the audio features, capturing intricate patterns crucial for genre discrimination. The incorporation of multi-model architectures enhances the model's robustness and generalization by aggregating diverse features learned from different modalities. Transfer learning further allows the model to leverage pre-trained networks, facilitating effective knowledge transfer and reducing the need for extensive training data. Through extensive experimentation and evaluation, the proposed approach demonstrates superior performance compared to traditional methods, achieving high accuracy and robustness in music genre classification tasks. Overall, this project contributes to advancing the state of the art in music analysis and showcases the potential of deep learning techniques in understanding and categorizing complex auditory signals.

LIST OF FIGURES

FIGURE NO.    TITLE

3.1    Block Diagram
6.1    Wavelet Transform of the Audio Data
6.2    Model Implementation
6.3    Multi-Model Classification Output
6.4    Transfer Learning Classification Output
LIST OF ABBREVIATIONS

ACRONYM    EXPANSION

CNN    Convolutional Neural Network

GTZAN    The genre dataset of Tzanetakis and Cook (not a true acronym; named after its creator, George Tzanetakis)

ML    Machine Learning

TL    Transfer Learning
CHAPTER 1
INTRODUCTION

1.1 OBJECTIVE
Browsing and searching by genre can be very effective tools for users of the rapidly growing networked music archives. The current lack of a generally accepted automatic genre classification system necessitates manual classification, which is both time-consuming and inconsistent. Most existing studies have focused on the difficult task of feature extraction from music and audio data. The nebulous and changing nature of genre definitions makes the task well suited to machine learning systems such as Convolutional Neural Networks (CNNs).

This work aims to leverage the GTZAN dataset, a widely used benchmark in the field of music genre classification, to train and evaluate the classification model. It seeks to employ CNNs, a type of deep learning architecture well suited to analysing spatial data, to automatically learn discriminative features from wavelet and spectrogram representations of music audio signals. CNNs are capable of capturing complex patterns within the audio data, enabling effective genre classification.
In addition to CNNs, this project intends to investigate the effectiveness of multi-
model architectures, which combine features extracted from multiple modalities (e.g.,
wavelet and spectrogram representations) to enhance classification performance. By
fusing information from diverse sources, multi-model architectures have the potential to
improve the model's ability to discern subtle genre-specific characteristics in the audio
data.
This work also leverages transfer learning, a technique that transfers knowledge learned by pre-trained neural networks to the music genre classification task. To capture both the temporal and frequency-domain information inherent in music audio signals, features are extracted using wavelet and spectrogram representations. Wavelet transforms enable analysis of the time-frequency characteristics of the audio signal, while spectrograms provide a visual representation of the signal's frequency content over time.
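To make the two feature views concrete, the following is a minimal sketch of how such features could be extracted in Python, assuming the librosa and PyWavelets libraries are available; the file name, wavelet choice, and parameters are illustrative, not the project's exact configuration.

import librosa
import numpy as np
import pywt

# Load one 30-second GTZAN-style clip (file name is illustrative).
y, sr = librosa.load("blues.00000.wav", sr=22050, duration=30.0)

# Spectrogram view: log-scaled mel spectrogram (frequency content over time).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Wavelet view: multi-level discrete wavelet decomposition, summarized
# as simple statistics per coefficient band.
coeffs = pywt.wavedec(y, wavelet="db4", level=5)
wavelet_stats = np.array([(c.mean(), c.std()) for c in coeffs]).ravel()

print(log_mel.shape, wavelet_stats.shape)  # e.g. (128, 1292) and (12,)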

CHAPTER 2
SYSTEM ANALYSIS AND DESIGN

2.1 EXISTING SCENARIO

In the current landscape of music genre classification, there is a growing demand for automated systems capable of accurately categorizing music tracks into distinct genres. Traditional approaches often rely on handcrafted feature extraction techniques and shallow learning algorithms, which may struggle to capture the complex, high-dimensional nature of audio data. Researchers and practitioners are therefore increasingly turning to CNNs as a powerful tool for music genre classification. CNNs are well suited to analysing spatial data and have been successfully adapted to process audio spectrograms, which represent the frequency content of audio signals over time. By leveraging the hierarchical feature learning capabilities of CNNs, researchers can automatically extract discriminative features from spectrograms, facilitating accurate genre classification. Moreover, there is growing interest in multi-model architectures for music genre classification, which combine features extracted from different modalities, such as wavelet and spectrogram representations. By fusing information from multiple sources, multi-model architectures aim to enhance the model's ability to capture diverse aspects of the audio data, leading to improved classification performance.

2.2 PROBLEM STATEMENT

This project focuses on the development of an automated music genre classification system using the GTZAN dataset, employing convolutional neural networks, multi-model architectures, and transfer learning methods. By extracting wavelet and spectrogram representations from the audio data, the project aims to address the challenges posed by the high-dimensional and complex nature of music signals. The objective is to accurately categorize music tracks into predefined genres while accommodating genre variability, limited annotated data, and the need for generalization across diverse music styles. Through the integration of deep learning techniques and feature extraction methodologies, this work strives to create a scalable and efficient classification model capable of processing large volumes of music data in real time, contributing to advancements in automated music analysis and facilitating applications such as music recommendation and content organization.

CHAPTER 3
PROPOSED SOLUTION

3.1 OVERVIEW

The project on music genre classification aims to develop a sophisticated system capable of automatically categorizing music tracks into distinct genres, with the GTZAN dataset employed as the primary data source. The proposed work combines advanced techniques, including convolutional neural networks, multi-model architectures, and transfer learning methods, to achieve accurate genre classification. A key aspect of this project is the extraction of wavelet and spectrogram features from the audio data. These representations capture both temporal and frequency-domain information, providing a comprehensive understanding of the music signals' characteristics. By utilizing wavelets and spectrograms, the model can effectively represent the complex nature of music audio, facilitating more robust genre classification. The use of CNNs allows the system to automatically learn discriminative features from the extracted wavelet and spectrogram representations. CNNs are well suited to analysing spatial data, and their hierarchical feature learning capabilities enable them to capture the intricate patterns within the audio data that are crucial for accurate genre classification.

In addition to CNNs, this project explores the effectiveness of multi-model architectures, which combine features extracted from multiple modalities. By fusing information from wavelet and spectrogram representations, the aim is to enhance the model's ability to discern genre-specific characteristics, leading to improved classification performance. Furthermore, the model leverages transfer learning to capitalize on the knowledge of pre-trained neural network models. By fine-tuning these models on the music genre classification task, the training process is expedited and the model's performance can improve, particularly in scenarios with limited annotated data.
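The transfer-learning idea can be sketched as follows, assuming a TensorFlow/Keras environment; VGG16 stands in for any of the pre-trained backbones mentioned in this report, and the input shape and layer sizes are assumptions rather than the project's exact settings.

import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained convolutional base: ImageNet weights, no classifier head.
base = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(128, 128, 3))
base.trainable = False  # freeze learned features for initial training

# New classifier head for the ten GTZAN genres (an assumption).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Freezing the convolutional base and training only the new head is one common fine-tuning strategy; unfreezing the top convolutional blocks afterwards at a low learning rate is another.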

3.2 BLOCK DIAGRAM

Figure 3.1 – Block Diagram

CHAPTER 4
SYSTEM SPECIFICATION

4.1 HARDWARE REQUIREMENTS

❖ Processor Type : Core i3

❖ Speed : 3.40 GHz

❖ RAM : 4 GB DDR2 RAM

❖ Hard disk : 500 GB

❖ Keyboard : 101/102 Standard Keys

❖ Mouse : Optical Mouse

4.2 SOFTWARE REQUIREMENTS

❖ Operating System : Windows 10+

❖ Software : Google Colaboratory

❖ Coding Language : Python

CHAPTER 5
PROJECT DESCRIPTION

5.1 METHODOLOGY

The methodology for music genre classification integrates advanced techniques, including convolutional neural networks, multi-model fusion, and transfer learning, applied to the GTZAN dataset. The project begins by preprocessing the audio data to ensure consistency in format and duration across the dataset, which is crucial for effective training and evaluation. This preprocessing step involves audio normalization, resampling, and segmentation to standardize the data. Additionally, any noise or artifacts present in the audio samples are removed or mitigated to enhance the quality of the input data.

The model also exploits wavelet transforms, which decompose audio signals into time-frequency representations for deeper insight into their spectral characteristics; a small illustration appears at the end of this section.

The model architecture combines CNNs with transfer learning. CNN architectures are tailored to process spectrogram inputs, extracting hierarchical features through layers of convolution and max-pooling. Concurrently, transfer learning harnesses pre-trained CNN models such as VGG, ResNet, or Inception, adapting features learned from image datasets to the task of music genre classification. The fusion of multi-modal representations further enhances model performance, capitalizing on diverse feature sets to capture nuanced genre distinctions. Through rigorous training, careful evaluation, and deployment, the project culminates in a robust system capable of accurately categorizing music genres, suited to both research and real-world applications in audio analysis.
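As the illustration referred to above, here is one hedged way to obtain a wavelet-based time-frequency map in Python, assuming the PyWavelets library; the synthetic test signal, the Morlet wavelet, and the scale range are illustrative assumptions, not the project's documented choices.

import numpy as np
import pywt

sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t)  # synthetic stand-in for an audio frame

# Continuous wavelet transform with a Morlet wavelet over 64 scales.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / sr)
scalogram = np.abs(coeffs)  # (64, 22050) time-frequency magnitude map
print(scalogram.shape)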

5.2 IMPLEMENTATION
5.2.1 Data Extraction
The foundation of the music genre classification project lies in the careful extraction of data from the GTZAN dataset, a widely recognized benchmark in the field. The GTZAN dataset comprises audio samples spanning ten genres, including rock, pop, jazz, classical, and disco. Each audio sample in the dataset is associated with a genre label, providing ground-truth annotations for training and evaluation purposes.
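A minimal loading sketch follows, assuming a local copy of GTZAN in the common genres/<genre>/<file>.wav layout and the librosa library; the root path is an assumption.

import os
import librosa

GTZAN_ROOT = "genres"  # assumed layout: genres/blues/blues.00000.wav, ...
samples, labels = [], []

for genre in sorted(os.listdir(GTZAN_ROOT)):
    genre_dir = os.path.join(GTZAN_ROOT, genre)
    if not os.path.isdir(genre_dir):
        continue
    for fname in sorted(os.listdir(genre_dir)):
        if fname.endswith(".wav"):
            y, sr = librosa.load(os.path.join(genre_dir, fname), sr=22050)
            samples.append(y)
            labels.append(genre)  # directory name is the ground-truth label

print(len(samples), "clips across", len(set(labels)), "genres")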

5.2.2 Preprocessing
Data preprocessing is a critical step in ensuring uniformity and quality across the GTZAN dataset. The pipeline begins by loading the dataset and standardizing the audio format and duration across all samples to ensure consistency. Normalization is then applied to bring the audio data to a common scale, followed by resampling to ensure uniform sampling rates. Longer audio samples are segmented into shorter clips to facilitate processing, while noise reduction techniques mitigate background noise and artifacts. Finally, wavelet and spectrogram features are extracted from the pre-processed audio, providing a comprehensive representation of the signals. This preprocessing ensures that the classification models receive high-quality input data, enabling them to learn genre-specific patterns effectively. By iteratively refining the preprocessing pipeline and integrating feature engineering and augmentation techniques, the project aims to extract rich, informative representations from the GTZAN dataset, empowering the classification models to discern intricate genre distinctions with high accuracy and reliability.
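The normalization, resampling, and segmentation steps described above could look like the following sketch, assuming librosa; the 3-second segment length is an illustrative choice, not the project's documented setting.

import numpy as np
import librosa

def preprocess(y, sr, target_sr=22050, seg_seconds=3.0):
    """Resample, peak-normalize, and split one clip into equal segments."""
    if sr != target_sr:  # common time base across the dataset
        y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    peak = np.max(np.abs(y))
    if peak > 0:  # common amplitude scale
        y = y / peak
    seg_len = int(seg_seconds * target_sr)
    n_segs = len(y) // seg_len  # trailing remainder is dropped
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]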

5.2.3 Machine Learning Algorithm
In this system, convolutional neural networks serve as the primary machine learning architecture for categorizing music tracks into predefined genres. CNNs are adept at automatically learning hierarchical feature representations from spectrogram representations of audio signals. By leveraging the spatial hierarchies inherent in spectrogram data, CNNs can capture intricate patterns and characteristics indicative of different music genres. Through multiple layers of convolutional filters and pooling operations, CNNs extract relevant features at varying levels of abstraction, enabling the model to discern genre-specific patterns across different frequencies and time intervals. Furthermore, the integration of transfer learning enhances CNN performance by leveraging pre-trained models' knowledge, accelerating training and improving classification accuracy, particularly when annotated data is limited.

Fig. 5.2.3 – Architecture of the CNN
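A small Keras CNN of the kind described, stacking convolution and max-pooling layers over a log-mel spectrogram input, might look like the sketch below; the filter counts, input shape, and ten-genre output are assumptions based on the GTZAN setup, not the report's exact architecture.

import tensorflow as tf
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(128, 130, 1)),       # (mel bins, frames, channel)
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(10, activation="softmax"),  # one unit per genre
])
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])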

5.2.4 Integration and Analysis

Integration and analysis are pivotal stages of the project, in which convolutional neural networks, multi-model architectures, and transfer learning methods are combined into a unified framework. During integration, the pre-processed audio data is carefully merged with the wavelet and spectrogram features, ensuring compatibility across, and optimization of, each model architecture. The integrated models are then trained and fine-tuned, with transfer learning techniques initializing the models' parameters and optimizing their performance on the music genre classification task.
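One hedged way to realize the fusion described above is a two-branch Keras functional model, one branch over the spectrogram and one over wavelet summary statistics, concatenated before the classifier; all shapes here are assumptions consistent with the earlier sketches.

import tensorflow as tf
from tensorflow.keras import layers, Model

# Branch 1: CNN over the spectrogram image.
spec_in = layers.Input(shape=(128, 130, 1), name="spectrogram")
x = layers.Conv2D(32, (3, 3), activation="relu")(spec_in)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: dense layers over wavelet summary statistics.
wav_in = layers.Input(shape=(12,), name="wavelet_stats")
w = layers.Dense(64, activation="relu")(wav_in)

# Fuse the two modalities, then classify.
merged = layers.concatenate([x, w])
out = layers.Dense(10, activation="softmax")(merged)

fusion = Model(inputs=[spec_in, wav_in], outputs=out)
fusion.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])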

5.2.5 Performance Measures

A confusion matrix is a table that summarizes the performance of a classification algorithm. It presents the counts of true positive, true negative, false positive, and false negative predictions.

True Positive (TP): correctly predicted positive instances.
True Negative (TN): correctly predicted negative instances.
False Positive (FP): incorrectly predicted positive instances (Type I error).
False Negative (FN): incorrectly predicted negative instances (Type II error).

Comprehensive evaluation metrics are employed to assess the performance of the classification system. In addition to standard metrics such as accuracy and precision, the evaluation also considers metrics suited to the multi-class classification problem, such as recall, the F1-score, and confusion matrices. This holistic approach ensures that the model's performance is thoroughly analysed and validated across multiple dimensions.

Precision = TP / (TP + FP)    - (1)

Recall = TP / (TP + FN)    - (2)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    - (3)

Accuracy = Number of Correct Predictions / Total Number of Predictions    - (4)
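Equations (1)–(4) can be computed directly with scikit-learn, as in the sketch below; y_true and y_pred are placeholder labels, and macro averaging is one reasonable choice for this multi-class setting.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 2, 2, 1, 0]  # placeholder genre indices
y_pred = [0, 1, 2, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))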

CHAPTER 6
RESULTS AND IMPLEMENTATION

Figure 6.1 – Wavelet Transform of the Audio Data

Figure 6.2 – Model Implementation

Figure 6.3 – Multi-Model Classification Output across Various Music Genres

Figure 6.4 – Transfer Learning Classification Output across Various Music Genres

CHAPTER 7
CONCLUSION

In conclusion, this project on music genre classification has demonstrated the efficacy of integrating convolutional neural networks, multi-model architectures, and transfer learning methods to accurately categorize music by genre using the GTZAN dataset. Through the extraction of wavelet and spectrogram representations, the model captured intricate temporal and frequency characteristics of the audio data, enriching its understanding of genre-specific patterns. Leveraging CNNs, the work learned hierarchical representations of these features and achieved robust classification performance. Transfer learning expedited training and improved model generalization, especially in scenarios with limited annotated data. The methodology's success underscores the potential of deep learning in automating music analysis tasks, offering promising avenues for music recommendation systems and content organization tools, while continuous refinement promises further advances in this field.

CHAPTER 8
REFERENCES

1. Wu, J.; Hong, Q.; Cao, M.; Liu, Y.; Fujita, H. A group consensus-based travel destination evaluation method with online reviews. Appl. Intell. 2022, 52, 1306–1324.
2. Zhao, C.; Chang, X.; Xie, T.; Fujita, H.; Wu, J. Unsupervised anomaly detection-based method of risk evaluation for road traffic accident. Appl. Intell. 2022, 1–16.
3. Ganeva, M.G. Music Digitalization and Its Effects on the Finnish Music Industry Stakeholders. Ph.D. Thesis, Turku School of Economics, Turku, Finland, June 2012.
4. Tzanetakis, G.; Cook, P. Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302.
5. Chen, K.; Gao, S.; Zhu, Y.; Sun, Q. Music genres classification using text categorization method. In Proceedings of the 2006 IEEE Workshop on Multimedia Signal Processing, Victoria, BC, Canada, 3–6 October 2006; pp. 221–224.
6. Dai, J.; Liang, S.; Xue, W.; Ni, C.; Liu, W. Long short-term memory recurrent neural network-based segment features for music genre classification. In Proceedings of the 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), Tianjin, China, 17–20 October 2016; pp. 1–5.
7. Sanden, C.; Zhang, J.Z. Enhancing multi-label music genre classification through ensemble techniques. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 705–714.
8. Vishnupriya, S.; Meenakshi, K. Automatic music genre classification using convolution neural network. In Proceedings of the 2018 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 4–6 January 2018; pp. 1–4.
9. Ajoodha, R.; Klein, R.; Rosman, B. Single-labelled music genre classification using content-based features. In Proceedings of the 2015 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 26–27 November 2015; pp. 66–71.
10. Bahuleyan, H. Music genre classification using machine learning techniques. arXiv 2018, arXiv:1804.01149.
11. Silla, C.N.; Koerich, A.L.; Kaestner, C.A. A machine learning approach to automatic music genre classification. J. Braz. Comput. Soc. 2008, 14, 7–18.
12. Karami, A.; Guerrero-Zapata, M. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing 2015, 149, 1253–1269.
13. Silla, C.N., Jr.; Koerich, A.L.; Kaestner, C.A. Feature selection in automatic music genre classification. In Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia, Berkeley, CA, USA, 15–17 December 2008; pp. 39–44.
14. Cheng, G.; Ying, S.; Wang, B.; Li, Y. Efficient performance prediction for Apache Spark. J. Parallel Distrib. Comput. 2021, 149, 40–51.
15. Karami, A. A framework for uncertainty-aware visual analytics in big data. In Proceedings of the 3rd International Workshop on Artificial Intelligence and Cognition (AIC) 2015, Turin, Italy, 28–29 September 2015; Volume 1510, pp. 146–155.
16. Karami, A.; Lundy, M.; Webb, F.; Boyajieff, H.R.; Zhu, M.; Lee, D. Automatic Categorization of LGBT User Profiles on Twitter with Machine Learning. Electronics 2021, 10, 1822.
17. Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.; Amde, M.; Owen, S.; et al. MLlib: Machine learning in Apache Spark. J. Mach. Learn. Res. 2016, 17, 1235–1241.
