Music Player
Abstract— The ability of music to produce an emotional response in its listeners is one of its most exciting and yet least understood properties. Music not only conveys emotion and meaning but can also stir a listener's mood. This paper studies various classification algorithms to provide a clear methodology to i) classify songs into 4 mood categories and ii) detect the user's mood from facial expressions, and then combines the two to generate a user-customized music playlist. Songs have been classified by two approaches: by directly training the models KNN, Support Vector Machine (SVM), Random Forest and MLP on selected audio features, and by predicting a song's arousal and valence values from these audio features. The first approach attains a maximum accuracy of 70% using MLP, while the latter achieves an accuracy of 81.6% using SVM regression. The face mood classifier, built with the HAAR classifier and the Fisherface algorithm, attains a precision of 92%.

Keywords—mood; classification; Multi Layer Perceptron; SVM regression; Valence; Arousal

I. INTRODUCTION

Listening to music has become a day-to-day activity, and there is a huge number of categories of music for a person to listen to. Human emotions are closely related to music, since we choose to listen to songs that relate to our mood at a particular time. Several studies on Music Information Retrieval [14, 15, 18, 19] have been carried out in recent decades.

Facial expressions are a great indicator of an individual's state of mind; they are indeed the most natural and basic way to express emotions [1, 2, 11]. In spite of this strong correlation, most of today's music software still does not provide mood-aware playlist generation.

The emotional meaning of music is subjective and thus depends upon many factors such as place, tradition and culture, whereas the mood category of a song varies depending upon several psychological conditions. Music listeners, collectors or psychologists may widely use mood-wise music to categorize their music collections or to help soothe their clients. Despite such extensive use, this field of research remains unexplored by many, and thus the classification task becomes much more difficult, yet important.

The music database keeps on growing as audio data in the digital world increases, and there has been some development in creating archives for this kind of database. Because of the progress in technology and the steady growth of the web, the amount of music on the network increases continuously. This has produced an extensive database of music, which is difficult to categorize manually on the basis of mood. Consequently, there is a need for a less time-consuming technique for such a big task. Shifts in the meaning of a mood over time have additionally made its categorization harder; for instance, the music we listen to today is very different from what it was 20 years ago.

In this research, we study methods for classifying moods with the help of the features of an audio file and propose approaches based on machine learning classifiers. The DEAM dataset has been used for mood classification; it has more than 2800 songs annotated with 4 moods, Happy, Sad, Angry and Relax, together with their valence and arousal values. The idea behind this paper is to examine how well audio features predict the mood of an audio file. We also validate how well these audio features predict valence and arousal values.

The following section reviews a portion of the existing work on music and mood classification. In Section 3, we explain the methods and approaches undertaken for this research. Section 4 presents the results of the investigations and the experiments done. Section 5 gives an overview of the research and states future work.

II. STATE OF THE ART

The problem of Music Emotion Recognition (MER) is a very interesting field of study and has many applications. This section gives an outline of the related and well-known work done by specialists in the field of audio and mood recognition of audio files.

In 2005, Wieczorkowska et al. [3] published a paper whose goal was to help the user find pieces of music for specific moods. They classified a dataset of 327683 .mp3 songs into 6 emotions using KNN, and the general accuracy on the test set was 37%. In 2008, Yang et al. [21] used a regression approach to the problem of MER and achieved 64% accuracy for arousal and 59% for valence.
In 2012, Yading Song et al. [22] evaluated music features for the task of MER. A dataset of 2904 songs, each tagged with one of "happy", "sad", "angry" or "relaxed", was used, with SVM as the training algorithm. The results showed that spectral features outperformed the rest of the acoustic features.

[…]

…how exciting the information is. Now, rather than using these audio features directly, we predict the valence and arousal values using SVM Regression with different kernels. These can then be mapped into the 2-D Valence-Arousal space [23] to identify the mood.
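As a concrete illustration of that mapping step, here is a minimal Python sketch (the paper gives no code; the midpoint value of 5.0 assumes DEAM-style annotations on a 1-9 scale and should be adjusted to the dataset's actual range) that assigns one of the four moods from a predicted (valence, arousal) pair:

```python
def mood_from_va(valence, arousal, mid=5.0):
    """Map a (valence, arousal) pair to a quadrant of the 2-D space."""
    if arousal >= mid:
        return "Happy" if valence >= mid else "Angry"
    return "Relax" if valence >= mid else "Sad"

print(mood_from_va(7.2, 6.8))  # high valence, high arousal -> "Happy"
print(mood_from_va(3.1, 2.4))  # low valence, low arousal   -> "Sad"
```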
[Audio feature table, partially lost in extraction; surviving rows: "…Bandwidth"; "23-24  Spectral Contrast  Compute spectral contrast"]

…classification models. The four above-mentioned classification models have been used.
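Descriptors like those named in the surviving table rows (bandwidth, spectral contrast) could be extracted per song with, for example, the librosa library; this is an assumption, as the paper does not name its extraction tool, and song.wav is a hypothetical input file. A minimal sketch covering one descriptor per feature family:

```python
import numpy as np
import librosa

# Load a 45-second excerpt (the average clip length reported for the dataset).
y, sr = librosa.load("song.wav", duration=45.0)

# One descriptor per feature family, averaged over frames into a song-level vector.
rms = librosa.feature.rms(y=y)                              # dynamics
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)              # rhythm
chroma = librosa.feature.chroma_stft(y=y, sr=sr)            # harmony
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)    # spectral contrast
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # spectral bandwidth

features = np.concatenate(
    [rms.mean(axis=1), np.atleast_1d(tempo), chroma.mean(axis=1),
     contrast.mean(axis=1), bandwidth.mean(axis=1)])
```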
IV. EXPERIMENTS AND DISCUSSION

The features extracted from the audio files are divided into two sets, a train set and a test set. The performance of the classification models was measured by training on the train set and predicting the labels of the test set, and performance in predicting the correct mood was improved over numerous runs of the classification models.
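An illustrative sketch of this experimental loop follows; the paper does not specify the split ratio, hyper-parameters, or toolkit, so scikit-learn defaults, an 80/20 split, and a placeholder feature matrix are assumed here:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((500, 20))        # placeholder audio feature vectors
y = rng.integers(0, 4, 500)      # 4 moods: Happy, Sad, Angry, Relax

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF":  RandomForestClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```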
A. Data

Classification of the mood of an audio file depends heavily on the dataset being used and on the features to be extracted. The DEAM dataset is used for this research. The MediaEval Database for Emotional Analysis of Music (DEAM) is a diverse dataset annotated with mean valence and mean arousal values along with the mood of each audio file. Metadata, including song title, genre and artist, is also provided.

The DEAM [10] dataset contains more than 2800 musical audio excerpts belonging to four moods: Happy, Sad, Angry and Relax. Categories such as Tense, Excited and Fear are not available in this dataset. The training dataset consists of audio files of type .wav (waveform) with an average duration of around 45 seconds, covering the 4 moods. Table III shows the number of audio files dedicated to each mood across the training set of the DEAM dataset.

For mood detection of the user, a dataset of 448 .jpeg images was manually created in which all 4 mood classes have an equal number of images. Table IV shows the number of files per mood across the face mood detection dataset.

TABLE III. AUDIO FILES IN DATASET

[…]

To evaluate the kernels in SVM Regression we calculate the accuracy scores, shown in Table VII. Table VIII shows the confusion matrix for face mood detection. The precision can be obtained from the confusion matrix.
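For reference, a small sketch of how per-class precision follows from a confusion matrix, using the convention that rows are true classes and columns are predicted classes:

```python
import numpy as np

def precision_per_class(cm):
    """Precision of class j = cm[j, j] / sum of column j (all predictions of j)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=0)

cm = np.array([[50, 5], [10, 35]])   # toy 2-class example
print(precision_per_class(cm))       # [50/60, 35/40] = [0.833, 0.875]
```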
TABLE V. EVALUATION OF AUDIO FEATURES

Features                         Accuracy (%)
Dynamic                          38.83
Rhythm                           27.38
Harmony                          35.35
Spectral                         61.52
Harmony and Rhythm               46.79
Harmony and Dynamic              43.32
Spectral and Dynamic             60.13
Spectral and Harmony             61.52
Spectral and Rhythm              65.68
Rhythm and Dynamic               33.10
Spectral, Dynamic and Rhythm     61.98
Spectral, Dynamic and Harmony    70.88
Spectral, Dynamic and Rhythm     49.04
All features                     61.35

TABLE VI. MOOD CLASSIFICATION USING AUDIO FEATURES

Learning Model    Accuracy (%)
KNN               60.08
SVM               56.50
RF                55
MLP               70.88

TABLE VII. EVALUATION OF KERNELS IN SVM REGRESSION

Kernel    Accuracy (%)
RBF       81.6
Linear    59
Poly      76.4
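The kernel comparison of Table VII could be reproduced along the following lines; since the paper does not define how regression "accuracy" is computed, the R² score is used here as a stand-in, and the data below are placeholders:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((500, 20))          # placeholder audio features
y = rng.uniform(1, 9, 500)         # placeholder valence (or arousal) targets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("rbf", "linear", "poly"):
    model = SVR(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, r2_score(y_te, model.predict(X_te)))
```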
V. CONCLUSION AND FUTURE WORK

Not all features contribute equally to the task of mood classification, and hence it is imperative to perform feature selection. Among the classifiers trained on audio features, MLP (70.88%) shows the best performance for music mood classification, obtained by reducing over-fitting and tuning different parameters. The task of predicting valence and arousal values for music mood classification is done using SVM regression; 3 different kernels were tried, and rbf (81.6%) performs best of the three. The extracted audio features may be sufficient to predict moods, but the models could still be improved by applying feature extraction to every audio file individually rather than treating them all the same way, since every audio file has different features that matter most to it; this may lead to an increase in the accuracies obtained. For the task of detecting the user's mood, the model was trained with the HAAR frontal-face classifier and the Fisherface algorithm, and attains a precision of 92%.
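A minimal sketch of this face pipeline with OpenCV follows; the paper does not list its exact calls, cv2.face requires the opencv-contrib-python package, the training crops here are random placeholders, and user.jpg with a 100x100 crop size are assumptions:

```python
import cv2
import numpy as np

# Haar frontal-face cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# FisherFace recognizer reused as a 4-class mood model, trained here on
# placeholder 100x100 grayscale crops, two per mood class.
rng = np.random.default_rng(0)
train_faces = [rng.integers(0, 255, (100, 100), dtype=np.uint8) for _ in range(8)]
train_labels = np.array([0, 0, 1, 1, 2, 2, 3, 3], dtype=np.int32)  # mood ids
recognizer = cv2.face.FisherFaceRecognizer_create()
recognizer.train(train_faces, train_labels)

# Detect the face in a (hypothetical) input image and classify its mood.
img = cv2.imread("user.jpg", cv2.IMREAD_GRAYSCALE)
for (x, y, w, h) in cascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=5):
    face = cv2.resize(img[y:y + h, x:x + w], (100, 100))
    mood_id, distance = recognizer.predict(face)  # lower distance = more confident
```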
To make our work more robust and usable, we may vary classifier combinations to improve the weaker cases in the results. However, to improve the performance of our classifiers more effectively, refining the list of features is the most important factor. The region and culture from which an audio file originates are also important factors in classifying its mood.
REFERENCES

[1] A. Lehtiniemi and J. Holm, "Using Animated Mood Pictures in Music Recommendation", in 16th International Conference on Information Visualisation, 2012.
[2] A. S. Dhavalikar and R. K. Kulkarni, "Face Detection and Facial Expression Recognition System", in International Conference on Electronics and Communication System (ICECS-2014), 2014.
[3] A. Wieczorkowska, P. Synak, R. Lewis and Z. W. Raś, "Extracting emotions from music data", in International Symposium on Methodologies for Intelligent Systems, May 2005, pp. 456-465. Springer, Berlin, Heidelberg.
[4] A. Aljanaki, Y. H. Yang and M. Soleymani, "Emotion in music task at MediaEval 2015", in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[5] B. G. Patra, D. Das and S. Bandyopadhyay, "Music emotion recognition system", in Proceedings of the International Symposium Frontiers of Research on Speech and Music (FRSM-2015), 2015.
[6] B. G. Patra, P. Maitra, D. Das and S. Bandyopadhyay, "Feed-Forward Neural Network based Music Emotion Recognition", in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[7] B. G. Patra, P. Maitra, D. Das and S. Bandyopadhyay, "Feed-Forward Neural Network based Music Emotion Recognition", in MediaEval 2015 Workshop, Wurzen, Germany, 2015.
[8] B. G. Patra, D. Das and S. Bandyopadhyay, "Unsupervised approach to Hindi music mood classification", in Mining Intelligence and Knowledge Exploration, Springer International Publishing, 2013, pp. 62-69.
[9] B. G. Patra, D. Das and S. Bandyopadhyay, "Automatic Music Mood Classification of Hindi Songs", in 3rd Workshop on Sentiment Analysis where AI meets Psychology (SAAIP-2013), 2013.
[10] DEAM dataset: The MediaEval Database for Emotional Analysis of Music [Online]. Available: http://cvml.unige.ch/databases/DEAM/
[11] F. Abdat, C. Maaoui and A. Pruski, "Human-computer interaction using emotion recognition from facial expression", in UKSim 5th European Symposium on Computer Modelling and Simulation, 2011.
[12] J. Grekow, "Emotion Detection Using Feature Extraction Tools", in International Symposium on Methodologies for Intelligent Systems, October 2015, pp. 267-272. Springer, Cham.
[13] K. Han, T. Zin and H. M. Tun, "Extraction Of Audio Features For Emotion Recognition System Based On Music", in International Journal of Scientific & Technology Research, June 2016.
[14] L. Lu, D. Liu and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals", IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 5-18, Jan. 2006.
[15] M.-Y. Wang, N.-Y. Zhang and H.-C. Zhu, "User-adaptive music emotion recognition", in Proc. Int. Conf. Sig. Process., 2004, pp. 1352-1355.
[16] R. Panda and R. P. Paiva, "Music emotion classification: Dataset acquisition and comparative analysis", in 15th International Conference on Digital Audio Effects (DAFx-12), 2012.
[17] R. Taneja, A. Bhatia, J. Monga and P. Marwaha, "Emotion detection of audio files", in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), March 2016, pp. 2397-2400.
[18] T.-L. Wu and S.-K. Jeng, "Extraction of segments of significant emotional expressions in music", in Proc. Int. Workshop Comput. Music Audio Technol., 2006, pp. 76-80.
[19] V. Carvalho and C. Chao, "Sentiment retrieval in popular music based on sequential learning", in Proc. ACM SIGIR, 2005.
[20] V. R. Ghule, A. B. Benke, S. S. Jadhav and S. A. Joshi, "Emotion Based Music Player Using Facial Recognition", in International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, issue 2, February 2017.
[21] Y. H. Yang, Y. C. Lin, Y. F. Su and H. H. Chen, "A regression approach to music emotion recognition", IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448-457, 2008.
[22] Y. Song, S. Dixon and M. Pearce, "Evaluation of Musical Features for Emotion Classification", in ISMIR, October 2012, pp. 523-528.
[23] 2D Valence Arousal Space [Online]. Available: https://www.researchgate.net/figure/The-2D-valence-arousal-emotion-space-Russell-1980-the-position-of-the-affective_fig1_254004106