Music Recommendation System Using Facial Expression Recognition Using Machine Learning
https://doi.org/10.22214/ijraset.2022.44396
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VI June 2022- Available at www.ijraset.com
Abstract: This project, "Music Recommendation System Using Facial Expression Recognition Using Machine Learning", addresses visual sentiment analysis: the study of human emotional responses to visual stimuli such as photographs and videos, which remains a fascinating and difficult problem. The aim is to understand the high-level information carried by visual data. The success of present models is largely due to strong computer vision algorithms. Most existing models attack the problem by proposing either more robust features or more sophisticated models, and in particular take visual features from the entire image or video as the main input. Local regions, which we believe contribute substantially to a person's emotional reaction to an image, have received less attention. In this work, image recognition is used to locate people in photographs, analyse their emotions, and play songs that match the detected emotion. The system accomplishes this by utilising the Google platform's Vision services: given an image, it looks for faces, identifies them, draws a rectangle around each one, and describes the emotion it discovers.
Keywords: Emotion Recognition; music recommendation; Facial Extraction; Emotion Extraction Module; Audio Feature
Extraction Module
I. INTRODUCTION
Listeners with libraries of hundreds of tracks find it difficult to create and organise playlists manually. It is also hard to keep track of every song: some are added but never played, wasting device storage and forcing the user to locate and delete them by hand. Each time they listen, users must actively choose music that fits their interests and mood, and when listening habits change it is equally hard to reorganise and play the collection. We therefore apply machine learning, in the form of face scanning and facial-feature tracking, to assess the user's mood and generate a tailored playlist based on it.
H. Immanuel James, J. James Anto Arnold, J. Maria Masilla Ruban, and M. Tamilarasan (2019) proposed "Emotion Based Music Recommendation". The system scans and analyses facial expressions and then constructs a playlist based on them. By producing a suitable playlist from an individual's emotional traits, the tiresome work of manually sorting music into distinct lists is reduced. The suggested method focuses on sensing human emotions in order to build emotion-based music players. Face detection is performed with a linear classifier. A facial landmark map of a given face picture is constructed using regression trees trained with a gradient-boosting approach on the pixel intensity values indexed at each landmark point. Emotions are classified with a multiclass SVM classifier into four categories: happy, angry, sad, and surprised. The suggested system's shortcomings are that, due to the limited number of images in the dataset employed, it cannot yet record all emotions accurately, and no more diverse feelings are covered. In addition, handcrafted features are frequently insufficiently generalizable in the wild.
Ali Mollahosseini, Behzad Hasani, and Mohammad H. Mahoor (2017) proposed "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild". More than 1,000,000 facial images were obtained from the Internet by querying three major search engines with 1,250 emotion-related keywords in six different languages. Around half of the retrieved photos were manually annotated for the presence of seven distinct facial expressions as well as the intensity of valence and arousal. Two baselines are proposed: one to categorise pictures in the categorical model and one to predict the value of valence and arousal in the continuous domain of the dimensional model. There were certain limitations. VGG16 improves on AlexNet only by replacing large kernel-sized filters with multiple stacked 3×3 kernels, since several stacked smaller kernels are better than one larger kernel over a given receptive field. Furthermore, the AffectNet database lacks strongly expressive samples, that is, samples with valence or arousal close to 1 or -1.
Fig. 3: System architecture of the music recommendation system using facial expression recognition
A. Modules
1) User: Through this module, the user connects a camera to the application; live video of the user with various emotional expressions is tracked, frames are captured, and the data is sent to a CNN model that predicts the live emotion. Based on the values returned by the CNN model, the live camera view is annotated with the type of sentiment detected (sad, happy, etc.).
2) Dataset Collection: Collect data from the FER-2013 dataset, which provides pixel values as features and emotions as labels.
3) Pre-processing: This stage splits the dataset into features and labels, which are stored as x and y values.
4) Initializing CNN: The CNN model is initialised at this step; the features and labels are passed to the fit statement and the algorithm is trained. The model is stored as a .h5 file.
5) CNN Model Training: The FER-2013 dataset of facial emotions is used as input, and a CNN model is trained and saved to disk for prediction of live emotions. (A minimal sketch of these data-loading and training steps follows this list.)
6) Prediction: At this point, the camera opens, faces are detected and their emotions predicted with the trained model, and a song matching the detected emotion is played.
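The following is a minimal sketch, not taken from the paper, of the dataset-collection, pre-processing, and training steps listed above. It assumes the publicly distributed FER-2013 CSV layout (an 'emotion' label column and a 'pixels' column of space-separated grayscale intensities); file names such as fer2013.csv and emotion_model.h5 are illustrative, and build_model() refers to the architecture sketch given later in Section V.

# Hedged sketch of modules 2-5: load FER-2013, split into features/labels,
# train a CNN, and save it as a .h5 file.
import numpy as np
import pandas as pd
from tensorflow.keras.utils import to_categorical

def load_fer2013(csv_path="fer2013.csv"):
    data = pd.read_csv(csv_path)
    # Each 'pixels' entry is a string of 48*48 = 2304 grayscale values.
    x = np.stack([np.asarray(p.split(), dtype="float32") for p in data["pixels"]])
    x = x.reshape(-1, 48, 48, 1) / 255.0           # features
    y = to_categorical(data["emotion"].values, 7)  # one-hot emotion labels
    return x, y

if __name__ == "__main__":
    x, y = load_fer2013()
    # model = build_model()   # CNN as sketched in Section V (Implementation)
    # model.fit(x, y, epochs=30, batch_size=64, validation_split=0.1)
    # model.save("emotion_model.h5")  # .h5 file used later for live prediction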
V. IMPLEMENTATION
We gathered data for our CNN architecture from the FER-2013 [12] database. It was constructed using the Google image search API and presented at the ICML 2013 Challenges. Faces in the database were automatically cropped and resized to 48×48 pixels. The FER-2013 collection contains 35,887 pictures with 7 expression labels (28,709 training images, 3,589 validation images, and 3,589 test images), with a fixed number of photos representing each emotion. Putting everything together: we used the OpenCV package to capture live webcam frames and applied the Haar cascades approach to detect the user's face. Haar cascades use the AdaBoost learning method.
The AdaBoost learning method selects a small number of relevant features from a vast set in order to produce an effective classifier. For image augmentation we used Keras' ImageDataGenerator class, which let us rotate, shift, shear, zoom, and flip the training photos. The parameters used were rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.1, and horizontal_flip=True. Our CNN model then employs four convolutional layers, four pooling layers, and two fully connected layers. The ReLU function adds nonlinearity to the model, batch normalisation normalises the activations of the previous layer at each batch, and L2 regularisation applies penalties to the model's parameters.
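As an illustration, the augmentation and CNN configuration described above could be written in Keras roughly as follows. The augmentation parameters are the ones stated in the text; the filter counts, dense-layer width, and L2 penalty value are illustrative assumptions, since the paper specifies only the overall structure (four convolutional layers, four pooling layers, two fully connected layers, ReLU, batch normalisation, and L2 regularisation).

# Hedged sketch of the augmentation and CNN model described in this section.
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters as stated in the text.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

def build_model(num_classes=7, weight_decay=0.01):
    """Four conv + pooling blocks followed by two fully connected layers."""
    model = models.Sequential()
    for i, filters in enumerate((32, 64, 128, 256)):   # filter counts are assumptions
        conv_kwargs = {"input_shape": (48, 48, 1)} if i == 0 else {}
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu",
                                kernel_regularizer=regularizers.l2(weight_decay),
                                **conv_kwargs))
        model.add(layers.BatchNormalization())         # normalise previous layer's activations
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu",     # first fully connected layer
                           kernel_regularizer=regularizers.l2(weight_decay)))
    model.add(layers.Dense(num_classes, activation="softmax"))  # output layer
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training with augmented batches, e.g.:
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=50,
#           validation_data=(x_val, y_val))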
We trained our convolutional neural network model on the FER-2013 database, which covers seven emotions (happiness, anger, sadness, disgust, neutral, fear, and surprise). Before being fed into the CNN model, the detected face images were scaled to 48×48 pixels and converted to grayscale.
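A minimal sketch of this live-prediction path is given below, under stated assumptions: OpenCV's bundled Haar cascade locates the face in each webcam frame, the face crop is converted to grayscale and resized to 48×48 pixels as described above, and the saved model predicts the emotion. The emotion label ordering and the play_song() helper are illustrative placeholders, not part of the paper.

# Hedged sketch of live face detection and emotion prediction.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Assumed FER-2013 label order; verify against the dataset actually used.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

model = load_model("emotion_model.h5")                 # model saved after training
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                              # live webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)).astype("float32") / 255.0
        probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
        label = EMOTIONS[int(np.argmax(probs))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        # play_song(label)  # hypothetical helper: queue a track matching the emotion
    cv2.imshow("Emotion-based music player", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press q to quit
        break
cap.release()
cv2.destroyAllWindows()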
B. Recommendation
Every system, including the emotion-based music player, is subject to modifications and improvements.
The first is to improve emotion-detection accuracy. This can be done by expanding the set of facial features used in emotion recognition: at the moment the model extracts only the mouth and eyes, and other features such as the eyebrows and cheeks could be added in future development. Noise-reduction routines could also be included in the model to automatically remove noise from still or recorded images. Beyond this, the proposed model could be enhanced by including auto-adjustment of image resolution, brightness, and contrast, since the quality of the loaded photographs has a significant impact on the accuracy of emotion recognition in the current application. With auto-adjustment, the user could load an image of any quality and the future model would adjust it to meet its needs.
Furthermore, real-time emotion recognition could be used to improve the interaction between the user and the programme. Once the app is activated, the future model would detect and extract facial features so that the emotion is identified in real time.
VII. ACKNOWLEDGEMENT
First and foremost, the author wishes to express their heartfelt gratitude and admiration to the project's principal supervisor and internal guide, Asst. Prof. Mr. V Seetharama Rao, who kept a close eye on the project's development throughout the semesters. The constructive remarks and recommendations he supplied contributed greatly to the project's success.
These thanks extend to everyone who took the time to fill out the survey. The author is grateful for the input and collaboration of the participants, which aided the development and enhancement of the system prototype.
Last but not least, the author would like to thank their family and friends for their unwavering support and constructive criticism throughout the process.
REFERENCES
[1] Savva, V. Stylianou, K. Kyriacou, and F. Domenach, "Recognizing student facial expressions: A web application," in 2018 IEEE Global Engineering Education Conference (EDUCON), Tenerife, 2018, pp. 1459-1462.
[2] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014; presented at ICLR 2015.
[3] D. Borth, T. Chen, R. Ji, and S.-F. Chang, "SentiBank: Large-scale ontology and classifiers for detecting sentiment and emotions in visual content."
[4] U. Kumar and J. L. Raheja, "Human facial expression detection from detected in captured image using back propagation neural network," International Journal of Computer Science & Information Technology (IJCSIT), vol. 2, no. 1, 2010.
[5] Z. Zhang, "Feature-based facial expression recognition: Sensitivity analysis and experiments with a multi-layer perceptron," Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 6, pp. 893-911, 1998.
[6] E. Cerezo, I. Hupont, C. Manresa, J. Varona, S. Baldassarri, F. J. Perales, and F. J. Seron, "Real-time facial expression recognition for natural interaction," in J. Martí et al. (Eds.), IbPRIA 2007, Part II, LNCS 4478, pp. 40-47. Berlin Heidelberg: Springer-Verlag.
[7] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: A comprehensive study," Image Vis. Comput., vol. 27, no. 6, pp. 803-816, May 2009.
[8] A. L. Bouhuys, G. M. Bloem, and T. G. G. Groothuis, "Induction of depressed and elated mood by music influences the perception of facial emotional expressions in healthy subjects," Journal of Affective Disorders, vol. 33, no. 2, pp. 215-226.