Mood Based Music Player

Anuja Arora, Aastha Kaul, Vatsala Mittal


Computer Science Department
Jaypee Institute of Information Technology
Noida, India
[email protected], [email protected], [email protected]

Abstract— The ability of music to produce an emotional response in its listeners is one of its most fascinating and yet least understood properties. Music not only conveys emotion and meaning but can also stir a listener's mood. This paper studies various classification algorithms in order to provide a clear methodology to i) classify songs into 4 mood categories and ii) detect a user's mood from facial expressions, and then combines the two to generate a user-customized music playlist. Songs have been classified by two approaches: by directly training models, namely KNN, Support Vector Machines (SVM), Random Forest and MLP, on selected audio features, and by predicting a song's arousal and valence values from these audio features. The first approach attains a maximum accuracy of 70% using MLP, while the latter achieves an accuracy of 81.6% using SVM regression. The face mood classifier, built with the HAAR classifier and the Fisherface algorithm, attains a precision of 92%.

Keywords—mood; classification; Multi-Layer Perceptron; SVM regression; Valence; Arousal

I. INTRODUCTION

Listening to music has become a day-to-day activity, and there is a huge number of music categories for a person to listen to. Human emotions are closely related to music, since we choose to listen to songs that relate to our mood at a particular time. Several studies on Music Information Retrieval [14, 15, 18, 19] have been carried out in recent decades.

Facial expressions are a great indicator of an individual's state of mind; they are indeed the most natural and basic way to express emotion [1, 2, 11]. In spite of this strong correlation, most of today's music software still does not provide mood-aware playlist generation.

The emotional meaning of music is subjective and thus depends on many factors such as place, tradition and culture, while the mood category of a song varies with several psychological conditions. Music listeners, collectors or psychologists may use mood-wise organization to categorize their music collections or to help soothe their clients. Despite such extensive use, this field of research remains unexplored by many, which makes the classification task much more difficult, yet important.

The music database keeps growing as the amount of audio data in the digital world increases, and there has been some development in creating archives for these kinds of databases. Owing to technological progress and the steady growth of the web, the amount of music available online increases continuously. This has produced an extensive database of music, which is certainly difficult to categorize manually by mood. Consequently, there is a need for a less time-consuming technique for such a big task. Changes in the meaning of a mood over time have additionally made its categorization harder; for instance, the music we listen to today is very different from what it was 20 years ago.

In this research, we study methods for classifying moods with the help of the features of an audio file and propose approaches based on machine learning classifiers. The DEAM dataset has been used for mood classification; it contains more than 2800 songs annotated with 4 moods (Happy, Sad, Angry and Relax) as well as their valence and arousal values. The focus of this paper is on how well audio features can predict the mood of an audio file. We also validate how well these audio features predict valence and arousal values.

The following section reviews a portion of the existing work on music and mood classification. Section 3 explains the methods and approaches undertaken in this research. Section 4 presents the experiments conducted and their results. Section 5 summarizes the research and states future work.

II. STATE OF THE ART

Music Emotion Recognition (MER) is a very interesting field of study with many applications. This section outlines related and well-known work done by specialists in the field of audio and mood recognition of audio files.

In 2005, Wieczorkowska et al. [3] published a paper whose goal was to aid the user in finding pieces of music for specific moods. They classified a dataset of 327683 .mp3 songs into 6 emotions using KNN, and testing yielded an overall accuracy of 37%. In 2008, [21] used a regression approach to the MER problem and achieved 64% accuracy for arousal and 59% for valence.
In 2012, Song et al. [22] evaluated music features for the MER task. A dataset of 2904 songs, each tagged with one of "happy", "sad", "angry" or "relaxed", was used with SVM as the training algorithm; spectral features were shown to outperform the rest of the acoustic features. In the same year, [16] performed music emotion classification on 903 songs in 5 categories. SVM was chosen as the classification algorithm with 10-fold cross validation, achieving an F-measure of 47.2% with a precision of 46.8% and a recall of 47.6%. In 2014, Aathreya et al. [16] used a multilayer neural network for classifying songs based on mood. In 2015, Jacek Grekow [12] used 3 different audio tools for emotion detection; a dataset with 4 emotions was trained and tested with various algorithms such as KNN and Random Forest in WEKA, and the best results were obtained after applying attribute selection to the data from jAudio. In the same year, Braja Gopal et al. [5] prepared two systems: the first detected the valence and arousal of each song, and the second was a mood classification system for Hindi songs that achieved a maximum F-score of 72.32.

In 2016, Renu Taneja et al. [17] extracted audio features such as tempo, beats and RMSE using jAudio to form clusters of 4 different emotions. Kee Moe Han et al. [13] took the average of the emotions assigned by 15 people as the emotion of a song and trained a classifier with these labels; audio features such as pitch and timbre were then extracted, and the emotion of a music signal could then be classified with a probabilistic classifier. More recently, in 2017, a music system using facial recognition [20] was built by V. R. Ghule et al.
III. METHODOLOGY

Different techniques can be used for classification depending on the properties of the database, and each leads to different outcomes with respect to the ground-truth data. In this section we explain the techniques used and the approaches selected for our mood classification experiments on audio files.
A. Model

In this approach we convert audio files of different moods into numerical values using features of these files, extracted in Python. Feature extraction retrieves different kinds of audio features, such as harmonic, spectral, rhythm, energy and chroma vectors of the audio files, and this paper explores many classification algorithms in order to suggest a new approach to classify and detect moods. We investigate basic classification models, namely K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and Random Forest. The dataset is used to train the above-mentioned models, and the trained models then make predictions on the test set. Arousal and valence values are a major factor in improving the training models: valence is positive or negative affectivity, whereas arousal measures how calming or exciting the information is. Rather than using the audio features directly, we can also predict the valence and arousal values using SVM regression with different kernels; these predictions are then mapped into the 2-D Valence-Arousal space to identify the mood.

[Figure: Two Dimensional Valence Arousal Space [23]]
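For concreteness, the quadrant mapping implied by the valence-arousal space can be sketched as below. The assignment of quadrants to the four moods follows Russell's circumplex model; the zero thresholds are an assumption and would need adjusting if the valence-arousal annotations are not centered at zero.

```python
def mood_from_valence_arousal(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair to one of the four mood quadrants."""
    if valence >= 0 and arousal >= 0:
        return "Happy"   # positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "Angry"   # negative valence, high arousal
    if valence < 0 and arousal < 0:
        return "Sad"     # negative valence, low arousal
    return "Relax"       # positive valence, low arousal
```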
PyAudioAnalysis and the librosa library [6, 7, 8, 9] in Python were used to extract features from the audio files. The total number of features extracted is 36. Each audio file is approximately 5 MB to 10 MB in size and 30 to 60 seconds long. Details of all the features are given in Table I; a sketch of the extraction follows the table.

TABLE I. EXTRACTED FEATURES

Feature ID | Feature Name | Description
1-2 | Root Mean Square Energy (RMSE) | Root-mean-square (RMS) energy computed for each frame from the audio samples.
3 | Total Beats | Total number of beats detected.
4 | Tempo | Estimated tempo (beats per minute).
5-6 | Harmonic | Harmonic component of the signal.
7-8 | Chroma STFT | Chromagram computed from a waveform or power spectrogram.
9-10 | Chroma CQ | Constant-Q chromagram.
11-12 | Chroma CENS | Chroma variant "Chroma Energy Normalized" (CENS).
13-14 | Melspectrogram | Mel-scaled spectrogram.
15-18 | MFCC | Mel Frequency Cepstral Coefficients, a cepstral representation whose frequency bands are distributed according to the mel scale rather than linearly.
19-20 | Spectral Centroid | The center of gravity of the spectrum.
21-22 | Spectral Bandwidth | pth-order spectral bandwidth.
23-24 | Spectral Contrast | Spectral contrast.
25-26 | Spectral Rolloff | The frequency below which 90% of the magnitude distribution of the spectrum is concentrated.
27-28 | Poly_features | Coefficients of an nth-order polynomial fit to the columns of a spectrogram.
29-30 | Tonnetz | Tonal centroid features (tonnetz).
31-32 | Zero Crossing Rate | The rate of sign changes of the signal within a frame.
33-34 | Percussive | Percussive component extracted from the audio time series.
35-36 | Frames_to_time | Frame indices converted to time stamps (seconds).
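A minimal librosa-based sketch of this extraction is given below, assuming librosa ≥ 0.7; the exact frame settings and library version used in the paper are not reported, and the features are grouped for brevity rather than listed in strict Table I order.

```python
import numpy as np
import librosa

def extract_features(path):
    """Return a 36-value vector: mean/std of each frame-level feature,
    plus total beats, tempo, and harmonic/percussive statistics."""
    y, sr = librosa.load(path, duration=60)             # files are 30-60 s
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)  # features 3-4
    harmonic = librosa.effects.harmonic(y)              # 5-6
    percussive = librosa.effects.percussive(y)          # 33-34
    beat_times = librosa.frames_to_time(beats, sr=sr)   # 35-36
    mfcc = librosa.feature.mfcc(y=y, sr=sr)

    frame_features = [
        librosa.feature.rms(y=y),                       # 1-2
        librosa.feature.chroma_stft(y=y, sr=sr),        # 7-8
        librosa.feature.chroma_cqt(y=y, sr=sr),         # 9-10
        librosa.feature.chroma_cens(y=y, sr=sr),        # 11-12
        librosa.feature.melspectrogram(y=y, sr=sr),     # 13-14
        mfcc, librosa.feature.delta(mfcc),              # 15-18
        librosa.feature.spectral_centroid(y=y, sr=sr),  # 19-20
        librosa.feature.spectral_bandwidth(y=y, sr=sr), # 21-22
        librosa.feature.spectral_contrast(y=y, sr=sr),  # 23-24
        librosa.feature.spectral_rolloff(y=y, sr=sr),   # 25-26
        librosa.feature.poly_features(y=y, sr=sr),      # 27-28
        librosa.feature.tonnetz(y=harmonic, sr=sr),     # 29-30
        librosa.feature.zero_crossing_rate(y),          # 31-32
    ]
    vec = []
    for f in frame_features:
        vec += [np.mean(f), np.std(f)]
    vec += [len(beats), tempo,
            np.mean(harmonic), np.std(harmonic),
            np.mean(percussive), np.std(percussive),
            np.mean(beat_times), np.std(beat_times)]
    return np.array(vec)
```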
B. Feature Selection

Not all features affect musical mood in the same way, and hence we need to carefully select the features we intend to use for mood detection from audio tracks. The audio features are grouped into 4 suitable dimensions, namely Dynamic, Harmony, Rhythm and Spectral (Table II). Using forward feature selection, sketched below, it becomes clear that the spectral, dynamic and harmony features used together achieve the best accuracy.
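A greedy sketch of this forward selection over the four dimensions might look as follows. It assumes X is the 36-column feature matrix with columns laid out in the order of Tables I and II (an assumption about the layout) and y holds the mood labels; the classifier and scoring choices are illustrative.

```python
from itertools import chain
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Column indices of each feature dimension, following Table II's grouping.
dimensions = {
    "Dynamic": [0, 1],
    "Rhythm": [2, 3],
    "Harmony": list(range(4, 12)),
    "Spectral": list(range(12, 36)),
}

def forward_select(X, y):
    selected, best_score, remaining = [], 0.0, set(dimensions)
    while remaining:
        # Score each candidate dimension added to the current selection.
        scores = {}
        for dim in remaining:
            cols = list(chain.from_iterable(
                dimensions[d] for d in selected + [dim]))
            scores[dim] = cross_val_score(
                MLPClassifier(max_iter=1000), X[:, cols], y).mean()
        best = max(scores, key=scores.get)
        if scores[best] <= best_score:
            break  # adding another dimension no longer helps
        best_score = scores[best]
        selected.append(best)
        remaining.remove(best)
    return selected, best_score
```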

TABLE II. DIMENSIONS OF AUDIO FEATURES

Dimension | Audio Features
Dynamic | rmse_mean, rmse_std
Rhythm | Total Beats, Tempo
Harmony | harm_mean, harm_std, chroma_stft_mean, chroma_stft_std, chroma_cq_mean, chroma_cq_std, chroma_cens_mean, chroma_cens_std
Spectral | melspectrogram_mean, melspectrogram_std, mfcc_mean, mfcc_std, mfcc_delta_mean, mfcc_delta_std, cent_mean, cent_std, spec_bw_mean, spec_bw_std, contrast_mean, contrast_std, rolloff_mean, rolloff_std, poly_mean, poly_std, tonnetz_mean, tonnetz_std, zcr_mean, zcr_std
C. Classification

For classification this paper presents two approaches. First, we test and predict directly on the audio features, employing four base classification algorithms: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Multi-Layer Perceptron (MLP). The dataset was split into training and test sets, with 80% of the data dedicated to training and the remaining 20% to testing. Fig. 1 shows the overall flow of the first approach: initially a Python library is used to extract features from the audio emotion dataset, then feature evaluation is performed in order to provide error-free feature content to the classification models, and the four classification models mentioned above are applied. A sketch of this approach is given below.

Fig 1. Mood Classification Approach (KNN: K-Nearest Neighbor, SVM: Support Vector Machine Classifier, RF: Random Forest Classifier, MLP: Multi-Layer Perceptron)
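A minimal scikit-learn sketch of the first approach, assuming X holds the selected audio features and y the four mood labels; the paper does not report hyperparameters, so the defaults below are illustrative only.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# 80/20 split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scaling helps KNN/SVM/MLP
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```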
Second, we use the selected audio features to predict the valence and arousal values using SVM regression with 3 different kernels: linear, poly and RBF. The predictions are then mapped into the 2-D valence-arousal space to identify the mood; Fig. 2 shows the flow of this valence-arousal approach. After feature selection, the most important and highest-ranking features are used to train the models, and a confusion matrix is created from the predicted moods and the known moods of the audio samples in the test set. Accuracy is calculated using the V-A model, and the confusion matrix is formed as the final outcome. A sketch of this approach follows Fig. 2.

Fig 2. Valence-Arousal Classifier Model
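A minimal sketch of the valence-arousal approach, assuming y_valence_train and y_arousal_train hold the DEAM mean valence/arousal annotations of the training songs, and reusing the quadrant helper sketched in Section III-A; one regressor is fit per target.

```python
from sklearn.svm import SVR

# The RBF kernel gave the best results in the paper; "linear" and
# "poly" are the other kernels evaluated.
val_model = SVR(kernel="rbf").fit(X_train, y_valence_train)
aro_model = SVR(kernel="rbf").fit(X_train, y_arousal_train)

pred_moods = [
    mood_from_valence_arousal(v, a)  # quadrant mapping sketched earlier
    for v, a in zip(val_model.predict(X_test), aro_model.predict(X_test))
]
```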
D. Facial Mood Detection

To detect a user's mood this paper makes use of facial expressions. The pre-trained HAAR frontal-face classifier is used to detect the user's face on screen. Before training the model, the images must be preprocessed and standardized by keeping only the face portion of each image and converting it to grayscale.

To create our model we use the Fisherface algorithm, collecting 16 images per mood category over a 5-second time span. Once the model is trained successfully, it can be used to detect a user's mood. After detecting the user's mood, a confusion matrix is plotted and precision is calculated from it. A sketch of this pipeline is given below.
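A minimal OpenCV sketch of the face-mood pipeline, assuming a build with the contrib modules (which provide cv2.face). The variables faces, labels and frame are placeholders: faces is a list of equally sized grayscale face crops, labels their mood indices, and frame a webcam image.

```python
import cv2
import numpy as np

# Pre-trained HAAR frontal-face detector shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(image_bgr):
    """Detect the largest face and return it as a standardized gray crop."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    rects = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(rects) == 0:
        return None
    x, y, w, h = max(rects, key=lambda r: r[2] * r[3])
    return cv2.resize(gray[y:y + h, x:x + w], (200, 200))

recognizer = cv2.face.FisherFaceRecognizer_create()
recognizer.train(faces, np.array(labels))  # e.g. 16 crops per mood category

label, confidence = recognizer.predict(crop_face(frame))
```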
IV. EXPERIMENTS AND DISCUSSION

The features extracted from the audio files are divided into a training set and a test set. The performance of the classification models is measured by training on the former and predicting the labels of the latter. Performance in predicting the correct mood was improved over numerous runs of the classification models.
A. Data

Classification of the mood of an audio file depends heavily on the dataset used and on the features extracted. The DEAM dataset is used for this research. The MediaEval Database for Emotional Analysis of Music (DEAM) is a diverse dataset annotated with mean valence and mean arousal values along with the mood of each audio file. Metadata including song title, genre and artist is also provided.

The DEAM [10] dataset contains more than 2800 musical audio excerpts belonging to four moods: Happy, Sad, Angry and Relax. Categories such as Tense, Excited and Fear are not available in this dataset. The training dataset consists of .wav (waveform) audio files with an average duration of around 45 seconds, covering the 4 moods. Table III shows the number of audio files per mood across the training set of the DEAM dataset.

For the user mood detection task, a dataset of 448 .jpeg images was created manually, with an equal number of images in each of the 4 mood classes. Table IV shows the number of images per mood in the face mood detection dataset.
TABLE III. AUDIO FILES IN DATASET

Mood Category | Number of Audio Files
Happy | 753
Sad | 759
Angry | 637
Relax | 749

TABLE IV. IMAGES IN DATASET

Mood Category | Number of Images
Happy | 112
Sad | 112
Angry | 112
Relax | 112
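The code below is a hedged sketch of assembling the training matrix from the DEAM annotations, reusing the extract_features helper sketched earlier; the file name "annotations.csv" and its column names are hypothetical placeholders that should be checked against the files distributed at http://cvml.unige.ch/databases/DEAM/.

```python
import numpy as np
import pandas as pd

ann = pd.read_csv("annotations.csv")  # hypothetical columns: song_id,
                                      # valence_mean, arousal_mean, mood
X = np.vstack([extract_features(f"audio/{sid}.wav")
               for sid in ann["song_id"]])
y = ann["mood"].values
```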
B. Classification Results

The results of the models trained on valence and arousal are better than those of the models that use the audio features directly. It is also observed that not all features contribute equally to the task of mood classification, and hence feature selection is crucial to this problem.

Table V shows the evaluation of the audio feature dimensions using MLP as the classifier. Table VI shows the accuracy scores of the different models trained directly on the audio features. To evaluate the kernels in SVM regression we calculate their accuracy scores, shown in Table VII. Table VIII shows the confusion matrix for face mood detection, from which the precision can be obtained.
TABLE V. EVALUATION OF AUDIO FEATURES

Features | Accuracy
Dynamic | 38.83
Rhythm | 27.38
Harmony | 35.35
Spectral | 61.52
Harmony and Rhythm | 46.79
Harmony and Dynamic | 43.32
Spectral and Dynamic | 60.13
Spectral and Harmony | 61.52
Spectral and Rhythm | 65.68
Rhythm and Dynamic | 33.10
Spectral, Dynamic and Rhythm | 61.98
Spectral, Dynamic and Harmony | 70.88
Spectral, Dynamic and Rhythm | 49.04
All features | 61.35

TABLE VI. MOOD CLASSIFICATION USING AUDIO FEATURES

Learning Model | Accuracy
KNN | 60.08
SVM | 56.50
RF | 55
MLP | 70.88

TABLE VII. EVALUATION OF KERNELS IN SVM REGRESSION

Kernel | Accuracy
RBF | 81.6
Linear | 59
Poly | 76.4

TABLE VIII. CONFUSION MATRIX

Actual/Predicted | Happy | Sad | Angry | Relax
Happy | 101 | 11 | 0 | 0
Sad | 3 | 97 | 0 | 12
Angry | 9 | 0 | 102 | 1
Relax | 0 | 5 | 0 | 107

It is deduced that the Spectral, Dynamic and Harmony features used together give better results than using all the features. For mood classification with models trained on audio features, MLP outperforms the rest with an accuracy score of 70.88%. For predicting valence and arousal values with SVM regression, the RBF kernel provides the maximum accuracy of 81.6% and hence should be used. Using Table VIII, the precision of the face mood classifier is calculated to be 92%; a worked check follows.
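Per-class precision can be read directly off Table VIII: the diagonal count divided by its column (predicted) sum. The single reported figure depends on how the per-class values are averaged and rounded.

```python
import numpy as np

cm = np.array([[101, 11, 0, 0],  # rows: actual Happy, Sad, Angry, Relax
               [3, 97, 0, 12],   # columns: predicted, in the same order
               [9, 0, 102, 1],
               [0, 5, 0, 107]])
precision = np.diag(cm) / cm.sum(axis=0)
print(dict(zip(["Happy", "Sad", "Angry", "Relax"], precision.round(3))))
print("average precision:", precision.mean().round(3))
```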
V. CONCLUSION AND FUTURE SCOPE

Through the course of our exploration into recognizing the mood of audio files using audio features, we have reached some important conclusions. First, not all features contribute equally to the task of mood classification, and hence it is imperative to perform feature selection. For the classifiers trained on audio features, MLP (70.88%) gives the best performance on the music mood classification task, after reducing over-fitting and tuning its parameters. The task of predicting valence and arousal values for music mood classification is done using SVM regression; of the 3 kernels tried, RBF (81.6%) performs best. The extracted audio features may be sufficient to predict moods, but the models could still be improved by applying feature extraction tailored to each individual audio file rather than treating all files uniformly, since each audio file has its own most informative features; this may lead to an increase in the accuracies obtained. For the task of user mood detection, the model was trained with the HAAR frontal-face classifier and the Fisherface algorithm; this model has a precision of 92%.

To make our work more robust and usable, we may vary the classifier combinations to improve the weaker cases in the results. However, in order to improve the performance of our classifiers more effectively, refining the list of features is the most important factor. The region and culture from which an audio file originates is also an important factor in classifying its mood.
REFERENCES

[1] A. Lehtiniemi and J. Holm, "Using Animated Mood Pictures in Music Recommendation," in 16th International Conference on Information Visualisation, 2012.
[2] A. S. Dhavalikar and R. K. Kulkarni, "Face Detection and Facial Expression Recognition System," in International Conference on Electronics and Communication Systems (ICECS), 2014.
[3] A. Wieczorkowska, P. Synak, R. Lewis, and Z. W. Raś, "Extracting emotions from music data," in International Symposium on Methodologies for Intelligent Systems, May 2005, pp. 456-465. Springer, Berlin, Heidelberg.
[4] A. Aljanaki, Y. H. Yang, and M. Soleymani, "Emotion in music task at MediaEval 2015," in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[5] B. G. Patra, D. Das, and S. Bandyopadhyay, "Music emotion recognition system," in Proceedings of the International Symposium Frontiers of Research Speech and Music (FRSM-2015), 2015.
[6] B. G. Patra, P. Maitra, D. Das, and S. Bandyopadhyay, "Feed-Forward Neural Network based Music Emotion Recognition," in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[7] B. G. Patra, P. Maitra, D. Das, and S. Bandyopadhyay, "Feed-Forward Neural Network based Music Emotion Recognition," in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[8] B. G. Patra, D. Das, and S. Bandyopadhyay, "Unsupervised approach to Hindi music mood classification," in Mining Intelligence and Knowledge Exploration, Springer International Publishing, 2013, pp. 62-69.
[9] B. G. Patra, D. Das, and S. Bandyopadhyay, "Automatic Music Mood Classification of Hindi Songs," in 3rd Workshop on Sentiment Analysis where AI meets Psychology (SAAIP-2013), 2013.
[10] DEAM dataset: The MediaEval Database for Emotional Analysis of Music [Online]. Available: http://cvml.unige.ch/databases/DEAM/
[11] F. Abdat, C. Maaoui, and A. Pruski, "Human-computer interaction using emotion recognition from facial expression," in UKSim 5th European Symposium on Computer Modelling and Simulation, 2011.
[12] J. Grekow, "Emotion Detection Using Feature Extraction Tools," in International Symposium on Methodologies for Intelligent Systems, October 2015, pp. 267-272. Springer, Cham.
[13] K. Han, T. Zin, and H. M. Tun, "Extraction Of Audio Features For Emotion Recognition System Based On Music," in International Journal of Scientific & Technology Research, June 2016.
[14] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 5-18, Jan. 2006.
[15] M.-Y. Wang, N.-Y. Zhang, and H.-C. Zhu, "User-adaptive music emotion recognition," in Proc. Int. Conf. Signal Processing, 2004, pp. 1352-1355.
[16] R. Panda and R. P. Paiva, "Music emotion classification: Dataset acquisition and comparative analysis," in 15th International Conference on Digital Audio Effects (DAFx-12), 2012.
[17] R. Taneja, A. Bhatia, J. Monga, and P. Marwaha, "Emotion detection of audio files," in IEEE Computing for Sustainable Global Development (INDIACom), 3rd International Conference on, March 2016, pp. 2397-2400.
[18] T.-L. Wu and S.-K. Jeng, "Extraction of segments of significant emotional expressions in music," in Proc. Int. Workshop Comput. Music Audio Technol., 2006, pp. 76-80.
[19] V. Carvalho and C. Chao, "Sentiment retrieval in popular music based on sequential learning," in Proc. ACM SIGIR, 2005.
[20] V. R. Ghule, A. B. Benke, S. S. Jadhav, and S. A. Joshi, "Emotion Based Music Player Using Facial Recognition," in International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, issue 2, February 2017.
[21] Y. H. Yang, Y. C. Lin, Y. F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448-457, 2008.
[22] Y. Song, S. Dixon, and M. Pearce, "Evaluation of Musical Features for Emotion Classification," in ISMIR, October 2012, pp. 523-528.
[23] 2D Valence Arousal Space [Online]. Available: https://www.researchgate.net/figure/The-2D-valence-arousal-emotion-space-Russell-1980-the-position-of-the-affective_fig1_254004106
