Music Genre Classification Report
MODEL PROJECT WORK
Submitted by
AKSHARA SHRI V – 722821104009
DHIVYA PARAMESHWARI S – 722821104041
ISWARIYA G – 722821104061
BATCH
2021 – 2025
Under the Guidance of
DECLARATION
Place: Coimbatore
Date:
[AKSHARA SHRI V]
[DHIVYA PARAMESHWARI S]
[ISWARIYA G]
ACKNOWLEDGEMENT
The success of any work depends on teamwork and cooperation. We take this
opportunity to express our gratitude to everyone who helped us with this project.
We would also like to thank the management for their constant support in
completing this project.
We are indebted to our Director, Mr. R. Rajaram, for motivating us and providing
all the necessary facilities.
We are grateful to Dr. R. Subha, M.E., Ph.D., Head of the Department of Computer
Science and Engineering, for permitting us to carry out this project and giving us
complete freedom to utilize the department's resources.
We also extend our heartfelt thanks to our project guide, Dr. S. Ananthi, M.E.,
Ph.D., Assistant Professor, Department of Computer Science and Engineering, for
her constant support and guidance.
We sincerely thank all the teaching and non-teaching staff of the Department of
Computer Science and Engineering, and our family and friends, for the valuable
support that inspired us throughout this project.
TABLE OF CONTENTS
S. NO   TITLE
        ABSTRACT
        LIST OF FIGURES
        LIST OF ABBREVIATIONS
1.      INTRODUCTION
        1.1 Objective
2.      SYSTEM ANALYSIS AND DESIGN
        2.2 Problem Statement
3.      PROPOSED SOLUTION
        3.1 Overview
4.      SYSTEM SPECIFICATION
5.      PROJECT DESCRIPTION
        5.1 Methodology
        5.2 Implementation
            5.2.1 Data Extraction
            5.2.2 Preprocessing
            5.2.3 Machine Learning Algorithm
6.      RESULTS AND IMPLEMENTATION
7.      CONCLUSION
8.      REFERENCES
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS

ACRONYM   EXPANSION
ML        Machine Learning
TL        Transfer Learning
CNN       Convolutional Neural Network
CHAPTER 1
INTRODUCTION
1.1 OBJECTIVE
Browsing and searching by genre can be very effective tools for users of rapidly
growing networked music archives. The current lack of a generally accepted
automatic genre classification system necessitates manual classification, which is
both time-consuming and inconsistent. Most existing studies have focused on the
difficult task of feature extraction from music/audio data. The nebulous and
changing nature of genre definitions makes the task well suited to machine learning
systems such as convolutional neural networks (CNNs).
This work aims to leverage the GTZAN dataset of Tzanetakis and Cook [4], a
widely used benchmark in the field of music genre classification, to train and
evaluate the classification model.
It seeks to employ CNNs, a deep learning architecture well suited to analysing
spatially structured data, to automatically learn discriminative features from wavelet
and spectrogram representations of music audio signals. CNNs can capture complex
patterns within the audio data, enabling effective genre classification.
In addition to CNNs, this project investigates the effectiveness of multi-modal
architectures, which combine features extracted from multiple modalities (e.g.,
wavelet and spectrogram representations) to enhance classification performance. By
fusing information from diverse sources, multi-modal architectures have the potential
to improve the model's ability to discern subtle genre-specific characteristics in the
audio data.
This work also leverages transfer learning, a technique that transfers knowledge
learned by pre-trained neural networks to the music genre classification task. To
capture both the temporal and frequency-domain information inherent in music
audio signals, features are extracted using wavelet and spectrogram representations.
Wavelet transforms enable analysis of the time-frequency characteristics of the
audio signal, while spectrograms provide a visual representation of the signal's
frequency content over time.
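As an illustration of the spectrogram representation, the following minimal Python sketch extracts a log-scaled mel spectrogram from a single audio clip using the librosa library; the file path, sampling rate, and window parameters are illustrative assumptions rather than the project's confirmed settings.

import librosa
import numpy as np

# Load one clip; GTZAN clips are roughly 30 s of mono audio at 22,050 Hz
y, sr = librosa.load("genres/blues/blues.00000.wav", sr=22050)

# Mel spectrogram: the signal's frequency content over time
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # log scale, suitable as CNN input

print(S_db.shape)  # (n_mels, n_frames): a time-frequency "image"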
CHAPTER 2
SYSTEM ANALYSIS AND DESIGN
2.2 PROBLEM STATEMENT
CHAPTER 3
PROPOSED SOLUTION
3.1 OVERVIEW
This project on music genre classification aims to develop a sophisticated system
capable of automatically categorizing music tracks into distinct genres. The GTZAN
dataset is employed as the primary data source. The proposed work combines advanced
techniques, including convolutional neural networks, multi-modal architectures, and
transfer learning, to achieve accurate genre classification. A key element of the
approach is the extraction of wavelet and spectrogram features from the audio data.
These representations capture both temporal and frequency-domain information,
providing a comprehensive understanding of the music signals' characteristics. By
utilizing wavelets and spectrograms, the model can effectively represent the complex
nature of music audio, facilitating more robust genre classification. The use of CNNs
allows the system to automatically learn discriminative features from the extracted
wavelet and spectrogram representations. CNNs are well suited to analysing spatially
structured data, and their hierarchical feature learning capabilities enable them to
capture the intricate patterns within the audio data that are crucial for accurate genre
classification.
Transfer learning is employed to leverage the knowledge of pre-trained neural
network models. By fine-tuning these models on the music genre classification task,
the project expedites the training process and potentially improves the model's
performance, particularly in scenarios with limited annotated data.
CHAPTER 4
SYSTEM SPECIFICATION
❖ Speed : 3.40 GHz
CHAPTER 5
PROJECT DESCRIPTION
5.1 METHODOLOGY
The methodology of this music genre classification project integrates advanced
techniques, including convolutional neural networks, multi-modal fusion, and transfer
learning, applied to the GTZAN dataset. The project begins by preprocessing the
audio data to ensure consistency in format and duration across the dataset, which is
crucial for effective training and evaluation. This preprocessing step involves audio
normalization, resampling, and segmentation to standardize the data. Additionally,
any noise or artifacts present in the audio samples are removed or mitigated to
enhance the quality of the input data.
Furthermore, the model applies wavelet transforms to decompose the audio signals
into time-frequency representations, giving deeper insight into their spectral
characteristics.
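A minimal sketch of such a decomposition, assuming the PyWavelets (pywt) package in Python; the wavelet family ('db4') and decomposition level are illustrative choices, not settings confirmed by this report.

import numpy as np
import pywt

frame = np.random.randn(22050)  # stand-in for one second of audio at 22.05 kHz

# Multi-level discrete wavelet decomposition into time-frequency bands
coeffs = pywt.wavedec(frame, wavelet="db4", level=5)

# coeffs[0] is the coarse approximation; the rest are detail bands.
# Simple per-band statistics yield a compact feature vector.
features = np.array([np.mean(np.abs(c)) for c in coeffs])
print(features.shape)  # one value per band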
The model architecture combines CNNs with transfer learning. CNN architectures
are tailored to process spectrogram inputs, extracting hierarchical features through
layers of convolution and max-pooling. Concurrently, transfer learning harnesses
pre-trained CNN models such as VGG, ResNet, or Inception, adapting features
learned from image datasets to the nuanced task of music genre classification. The
fusion of multi-modal representations further enhances model performance,
capitalizing on diverse feature sets to capture subtle genre distinctions, as sketched
below. Through rigorous training and careful evaluation, the project culminates in a
robust system capable of accurately categorizing music genres, suited to both
research and real-world applications in audio analysis.
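The sketch below illustrates one way such a multi-modal fusion could be wired up with the Keras functional API: a small CNN branch for spectrogram patches and a dense branch for wavelet feature vectors, concatenated before the genre classifier. All input shapes and layer sizes are assumptions for illustration.

from tensorflow.keras import layers, Model

spec_in = layers.Input(shape=(128, 128, 1))  # log-mel spectrogram patch
wave_in = layers.Input(shape=(64,))          # wavelet feature vector

# Convolutional branch over the spectrogram "image"
x = layers.Conv2D(32, 3, activation="relu")(spec_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Dense branch over the wavelet features
w = layers.Dense(32, activation="relu")(wave_in)

merged = layers.Concatenate()([x, w])                 # multi-modal fusion
out = layers.Dense(10, activation="softmax")(merged)  # ten GTZAN genres

model = Model(inputs=[spec_in, wave_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])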
5.2 IMPLEMENTATION
5.2.1 Data Extraction
The foundation of the music genre classification project lies in the careful
extraction of data from the GTZAN dataset, a widely recognized benchmark in the
field of music genre classification. The GTZAN dataset comprises audio clips
spanning ten genres, including rock, pop, jazz, classical, and blues. Each audio
sample in the dataset is associated with a genre label, providing ground-truth
annotations for training and evaluation purposes.
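A minimal sketch of this extraction step, assuming a Python implementation and the standard GTZAN directory layout (genres/<genre>/<genre>.000NN.wav); the root directory name is an assumption.

from pathlib import Path

DATA_DIR = Path("genres")  # assumed GTZAN root directory

paths, labels = [], []
for genre_dir in sorted(DATA_DIR.iterdir()):
    if genre_dir.is_dir():
        for wav in sorted(genre_dir.glob("*.wav")):
            paths.append(wav)
            labels.append(genre_dir.name)  # the folder name is the genre label

print(len(paths), "clips across", len(set(labels)), "genres")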
5.2.2 Preprocessing
Data preprocessing is a critical step for ensuring uniformity and quality in the
GTZAN dataset. The model starts by loading the dataset and standardizing the
audio format and duration across all samples to ensure consistency. Normalization
is then applied to bring the audio data to a common scale, followed by resampling
to ensure uniform sampling rates. Longer audio samples are subsequently
segmented into shorter clips to facilitate processing, while noise reduction
techniques mitigate background noise and artifacts. Finally, wavelet and
spectrogram features are extracted from the preprocessed audio, providing a
comprehensive representation of the signals. This preprocessing ensures that the
classification models receive high-quality input data, enabling them to learn
genre-specific patterns effectively. By iteratively refining the preprocessing
pipeline and integrating feature engineering and augmentation techniques, the
project aims to extract rich, informative representations from the GTZAN dataset
so that the classification models can discern fine-grained genre distinctions
accurately and reliably.
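The following sketch strings the described steps together in Python with librosa: loading, resampling, amplitude normalization, and segmentation. The target sampling rate and three-second segment length are illustrative assumptions.

import librosa
import numpy as np

TARGET_SR = 22050        # assumed common sampling rate
SEGMENT_SECONDS = 3      # assumed segment length

def preprocess(path):
    # Resampling to TARGET_SR and downmixing to mono happen on load
    y, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    # Peak normalization brings every clip to a common amplitude scale
    y = y / (np.max(np.abs(y)) + 1e-9)
    # Segment the ~30 s clip into fixed-length pieces, dropping the remainder
    seg_len = TARGET_SR * SEGMENT_SECONDS
    n_segments = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]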
5.2.3 Machine Learning Algorithm
In this music genre classification project, convolutional neural networks serve as
the primary machine learning architecture for accurately categorizing music tracks
into predefined genres. CNNs are adept at automatically learning hierarchical
feature representations from spectrograms of audio signals. By leveraging the
spatial hierarchies inherent in spectrogram data, CNNs can capture intricate
patterns and characteristics indicative of different music genres. Through multiple
layers of convolutional filters and pooling operations, CNNs extract relevant
features at varying levels of abstraction, enabling the model to discern genre-specific
patterns across different frequencies and time intervals. Furthermore, the
integration of transfer learning enhances CNN performance by leveraging
pre-trained models' knowledge, accelerating the training process and improving
classification accuracy, particularly in scenarios with limited annotated data.
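A minimal Keras sketch of the kind of spectrogram CNN described above; the input shape, filter counts, and dropout rate are assumptions for illustration, not the project's verified configuration.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),        # log-mel spectrogram patch
    layers.Conv2D(32, 3, activation="relu"),  # low-level time-frequency patterns
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),  # mid-level structures
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), # higher-level genre cues
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(10, activation="softmax"),   # one output per genre
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])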
Model training and evaluation then occur, with transfer learning techniques
initializing the models' parameters and optimizing their performance on the specific
music genre classification task.
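As a sketch of this initialization step, the snippet below loads ImageNet-pretrained VGG16 weights (one of the backbone choices mentioned in Chapter 5) and attaches a new classification head; the freezing strategy, input shape, and learning rate are illustrative assumptions.

from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# Spectrograms tiled to three channels so they match VGG16's expected input
base = VGG16(weights="imagenet", include_top=False, input_shape=(128, 128, 3))
base.trainable = False  # freeze pretrained features while the new head trains

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # ten GTZAN genres
])
model.compile(optimizer=optimizers.Adam(1e-4),  # low learning rate for fine-tuning
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# After the head converges, the top convolutional block can be unfrozen
# and fine-tuned with an even lower learning rate.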
CHAPTER 6
RESULTS AND IMPLEMENTATION
Figure 6.3 – Multi-modal model output simulated across various music genres
Figure 6.4 – Transfer learning model output simulated across various music genres
CHAPTER 7
CONCLUSION
In conclusion, this project on music genre classification has demonstrated the efficacy
of integrating convolutional neural networks, multi-modal architectures, and transfer
learning to accurately categorize music by genre using the GTZAN dataset. Through
the extraction of wavelet and spectrogram features, the model captured intricate
temporal and frequency characteristics of the audio data, enriching its understanding
of genre-specific patterns. Leveraging CNNs, the work successfully learned
hierarchical representations of these features, achieving robust classification
performance. Transfer learning expedited training and improved model
generalization, especially in scenarios with limited annotated data. The
methodology's success underscores the potential of deep learning for automating
music analysis tasks, offering promising avenues for music recommendation systems
and content organization tools, while continuous refinement promises further
advances in this field.
CHAPTER 8
REFERENCES
1. Wu, J.; Hong, Q.; Cao, M.; Liu, Y.; Fujita, H. A group consensus-based travel destination evaluation method with online reviews. Appl. Intell. 2022, 52, 1306–1324.
2. Zhao, C.; Chang, X.; Xie, T.; Fujita, H.; Wu, J. Unsupervised anomaly detection-based method of risk evaluation for road traffic accident. Appl. Intell. 2022, 1–16.
3. Ganeva, M.G. Music Digitalization and Its Effects on the Finnish Music Industry Stakeholders. Ph.D. Thesis, Turku School of Economics, Turku, Finland, June 2012.
4. Tzanetakis, G.; Cook, P. Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302.
5. Chen, K.; Gao, S.; Zhu, Y.; Sun, Q. Music genres classification using text categorization method. In Proceedings of the 2006 IEEE Workshop on Multimedia Signal Processing, Victoria, BC, Canada, 3–6 October 2006; pp. 221–224.
6. Dai, J.; Liang, S.; Xue, W.; Ni, C.; Liu, W. Long short-term memory recurrent neural network-based segment features for music genre classification. In Proceedings of the 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), Tianjin, China, 17–20 October 2016; pp. 1–5.
7. Sanden, C.; Zhang, J.Z. Enhancing multi-label music genre classification through ensemble techniques. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 705–714.
8. Vishnupriya, S.; Meenakshi, K. Automatic music genre classification using convolution neural network. In Proceedings of the 2018 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 4–6 January 2018; pp. 1–4.
9. Ajoodha, R.; Klein, R.; Rosman, B. Single-labelled music genre classification using content-based features. In Proceedings of the 2015 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 26–27 November 2015; pp. 66–71.
10. Bahuleyan, H. Music genre classification using machine learning techniques. arXiv 2018, arXiv:1804.01149.
11. Silla, C.N.; Koerich, A.L.; Kaestner, C.A. A machine learning approach to automatic music genre classification. J. Braz. Comput. Soc. 2008, 14, 7–18.
12. Karami, A.; Guerrero-Zapata, M. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing 2015, 149, 1253–1269.
13. Silla, C.N., Jr.; Koerich, A.L.; Kaestner, C.A. Feature selection in automatic music genre classification. In Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia, Berkeley, CA, USA, 15–17 December 2008; pp. 39–44.
14. Cheng, G.; Ying, S.; Wang, B.; Li, Y. Efficient performance prediction for Apache Spark. J. Parallel Distrib. Comput. 2021, 149, 40–51.
15. Karami, A. A framework for uncertainty-aware visual analytics in big data. In Proceedings of the 3rd International Workshop on Artificial Intelligence and Cognition (AIC) 2015, Turin, Italy, 28–29 September 2015; Volume 1510, pp. 146–155.
16. Karami, A.; Lundy, M.; Webb, F.; Boyajieff, H.R.; Zhu, M.; Lee, D. Automatic categorization of LGBT user profiles on Twitter with machine learning. Electronics 2021, 10, 1822.
17. Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.; Amde, M.; Owen, S.; et al. MLlib: Machine learning in Apache Spark. J. Mach. Learn. Res. 2016, 17, 1235–1241.