Intelligent Music Player
By
PINISETTI JAYAPRAKASH
REG. NO 38110405
MARCH 2022
SATHYABAMA
INSTITUTE OF SCIENCE AND
TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI
CHENNAI – 600119
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of PINISETTI JAYAPRAKASH (38110405) and P. KUMARSWAMY (38110409), who carried out the project entitled “Emotion Aware Smart Music Recommender System using CNN” under my supervision from NOV 2021 to MAR 2022.
Internal Guide
Internal Examiner    External Examiner
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project Guide, Dr. T. Sasikala, M.E., Ph.D., whose valuable guidance, suggestions and constant encouragement paved the way for the successful completion of my project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the Department
of Computer Science and Engineering who were helpful in many ways for the completion of
the project.
TABLE OF CONTENTS

CHAPTER NO.  TITLE

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1  INTRODUCTION
   1.1  CNN ALGORITHM
   1.2  CONVOLUTION LAYER
   1.3  POOLING LAYER
   1.4  FULLY CONNECTED LAYER
   1.5  PROBLEM STATEMENT
   1.6  PROBLEM DESCRIPTION
2  LITERATURE SURVEY
   2.1  MOOD BASED MUSIC RECOMMENDER SYSTEM
   2.2  AN EMOTION-AWARE PERSONALIZED MUSIC RECOMMENDATION SYSTEM USING A CONVOLUTIONAL NEURAL NETWORKS APPROACH
   2.3  DEEP LEARNING IN MUSIC
   2.4  REVIEW ON FACIAL EXPRESSION BASED MUSIC PLAYER
   2.5  SMART MUSIC PLAYER BASED ON FACIAL EXPRESSION
3  SOFTWARE REQUIREMENTS
   3.1  EXISTING SYSTEM
   3.2  PROPOSED SYSTEM
   3.3  ARCHITECTURE OVERVIEW
   3.4  SYSTEM REQUIREMENTS
4  SYSTEM ANALYSIS
   4.1  PURPOSE
   4.2  SCOPE
   4.3  EXISTING SYSTEM
   4.4  PROPOSED SYSTEM
5  SYSTEM DESIGN
   5.1  INPUT DESIGN
   5.2  OUTPUT DESIGN
   5.3  DATA FLOW
   5.4  UML DIAGRAMS
6  MODULES
   6.1  DATA COLLECTION MODULE
   6.2  EMOTION EXTRACTION MODULE
   6.3  AUDIO EXTRACTION MODULE
   6.4  EMOTION-AUDIO INTEGRATION MODULE
7  SYSTEM IMPLEMENTATION
   7.1  SYSTEM ARCHITECTURE
8  SYSTEM TESTING
   8.1  TEST PLAN
   8.2  VERIFICATION
   8.3  VALIDATION
   8.4  WHITE BOX TESTING
   8.5  BLACK BOX TESTING
   8.6  TYPES OF TESTING
   8.7  REQUIREMENT ANALYSIS
   8.8  FUNCTIONAL ANALYSIS
   8.9  NON-FUNCTIONAL ANALYSIS
9  CONCLUSION
REFERENCES
PLAGIARISM REPORT
ABSTRACT
A CNN takes an image as input and classifies it under a certain category such as dog, cat, lion or tiger. The computer sees the image as an array of pixels whose size depends on the image resolution: it sees h * w * d, where h = height, w = width and d = depth (the number of channels). For example, an RGB image is a 6 * 6 * 3 array of the matrix, while a grayscale image is a 4 * 4 * 1 array of the matrix.
In a CNN, each input image passes through a sequence of convolution layers along with pooling layers, fully connected layers and filters (also known as kernels). After that, we apply the softmax function to classify the object with probabilistic values between 0 and 1.
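As a minimal sketch of this pipeline (convolution, pooling, a fully connected layer and softmax), assuming Keras is available; the layer sizes and the 48 * 48 * 1 input are illustrative choices, not the exact model used in this project:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # convolution layers extract features from a 48x48 grayscale input
    Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    MaxPooling2D(pool_size=(2, 2)),   # pooling down-samples the feature maps
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),                        # flatten feature maps into a vector
    Dense(128, activation='relu'),    # fully connected layer
    Dense(7, activation='softmax'),   # softmax: one probability per emotion class
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()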
Convolution Layer
The convolution layer is the first layer used to extract features from an input image. By learning image features over small squares of input data, it preserves the spatial relationship between pixels. Convolution is a mathematical operation that takes two inputs, an image matrix and a kernel (filter).
Convolving a 5 * 5 image matrix with a 3 * 3 filter matrix produces a 3 * 3 output called a "feature map".
Strides
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride equals 1, we move the filter one pixel at a time; similarly, when the stride equals 2, we move the filter two pixels at a time. For example, with a stride of 2, convolving a 5 * 5 input with a 3 * 3 filter produces a 2 * 2 feature map.
Padding
Padding plays a crucial role in building a convolutional neural network. Without padding, the image shrinks at every convolution; in a network with hundreds of layers, we would be left with a tiny image after all the filtering.
What happens if we convolve a grayscale image with a 3 * 3 filter? A pixel in the corner is covered by the filter only once, while a pixel in the middle is covered many times, so the network uses more information from the middle pixels. Unpadded convolution therefore has two downsides:
o Shrinking outputs
o Losing information at the corners of the image.
Zero-padding addresses both by adding a border of zeros around the input, so that corner pixels are covered as often as interior ones and the output can keep the same spatial size ("same" padding).
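The combined effect of filter size f, stride s and padding p on an n * n input is captured by the standard formula output = floor((n + 2p - f) / s) + 1. A small sketch (the helper name is ours, for illustration):

# hypothetical helper illustrating the standard output-size formula
def conv_output_size(n, f, s=1, p=0):
    # n x n input, f x f filter, stride s, padding p
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 3))          # stride 1, no padding -> 3
print(conv_output_size(5, 3, s=2))     # stride 2 -> 2
print(conv_output_size(5, 3, p=1))     # "same" padding p = (f - 1) / 2 -> 5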
Pooling Layer
The pooling layer reduces the spatial size of the feature maps, which lowers the amount of computation and helps control overfitting. Common variants are max pooling, average pooling and sum pooling, as sketched below.
Max Pooling
Max pooling keeps only the largest value in each pooling region of the input.
Average Pooling
Down-scaling is performed through average pooling by dividing the input into rectangular pooling regions and computing the average value of each region.
Syntax (MATLAB Deep Learning Toolbox)
layer = averagePooling2dLayer(poolSize)
layer = averagePooling2dLayer(poolSize,Name,Value)
Sum Pooling
The sub-regions for sum pooling or mean pooling are set exactly the same as for max pooling, but instead of using the max function we use the sum or the mean.
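A minimal NumPy sketch contrasting the three pooling variants over non-overlapping 2 * 2 regions of a 4 * 4 feature map (values illustrative):

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 2, 8]])

# reshape into 2x2 blocks, then reduce each block with max / mean / sum
blocks = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(blocks.max(axis=(2, 3)))    # max pooling     -> [[6 4] [7 9]]
print(blocks.mean(axis=(2, 3)))   # average pooling
print(blocks.sum(axis=(2, 3)))    # sum pooling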
Fully Connected Layer
The fully connected layer is a layer in which the input from the previous layers is flattened into a vector and fed forward. It transforms the network's output into the desired number of classes.
The feature map matrix is converted into a vector x1, x2, x3, ..., xn by the fully connected layers. These layers combine the features to create a model, and an activation function such as softmax or sigmoid is applied to classify the outputs as, for example, car, dog or truck.
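A small sketch of that final step, assuming NumPy: softmax turns raw class scores into probabilities that sum to 1, and the largest one picks the class.

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw outputs for e.g. car / dog / truck
probs = softmax(scores)
print(probs)             # ~[0.659, 0.242, 0.099], sums to 1
print(probs.argmax())    # index 0 -> the predicted class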
CHAPTER 2
LITERATURE SURVEY
Emotion recognition has many useful applications in daily life. In this paper, we present a potential approach to detecting human emotion in real time. For any face detected by the camera, we extract the corresponding facial landmarks and examine different kinds of features and models for predicting human emotion. The experiments show that our proposed system can naturally detect human emotion in real time and achieves an average accuracy of about 70.65%.
Music plays a very important role in daily life and in modern advanced technologies. Usually, the user faces the task of manually browsing through a playlist to select songs. Here we propose an efficient and accurate model that generates a playlist based on the current emotional state and behavior of the user. Existing methods for automating playlist generation are computationally slow, less accurate, and sometimes even require additional hardware such as EEG devices or other sensors. Speech is the most ancient and natural way of expressing feelings, emotions and mood, but its processing incurs high computational time and cost. The proposed system is based on real-time extraction of facial expressions, together with extraction of audio features from songs, to classify both into a specific emotion and generate a playlist automatically at relatively low computational cost.
CHAPTER 3
SYSTEM REQUIREMENTS
3.1 HARDWARE REQUIREMENTS
System : Pentium i3 Processor
Hard Disk : 500 GB.
Monitor : 15’’ LED
Input Devices : Keyboard, Mouse
RAM : 2 GB
3.2 SOFTWARE REQUIREMENTS
Operating system : Windows 10
Coding Language : Python
Python is open source. Its source code is available under the Python Software Foundation License, which is GPL-compatible.
⮚ Easy-to-read − Python code is clearly defined and readable.
⮚ Portable − Python can run on a wide variety of hardware platforms and has
the same interface on all platforms.
CHAPTER 4
SYSTEM ANALYSIS
4.1 PURPOSE
The purpose of this document is to describe a real-time facial-expression-based music recommender system using machine learning algorithms. In detail, this document provides a general description of our project, including user requirements, product perspective, an overview of requirements and general constraints. In addition, it provides the specific requirements and functionality needed for this project, such as the interface, functional requirements and performance requirements.
4.2 SCOPE
The scope of this SRS document persists for the entire life cycle of the project. This document defines the final state of the software requirements agreed upon by the customers and designers. Finally, at the end of the project execution, all the functionalities should be traceable from the SRS to the product. The document describes the functionality, performance, constraints, interface and reliability for the entire life cycle of the project.
4.3 EXISTING SYSTEM
Nikhil et al. determine the mindset of the user from facial expressions. Humans often express their feelings through facial expressions, hand gestures and tone of voice, but most of all through the face.
An emotion-based music player reduces the time the user spends searching. Generally, people have a large number of songs in their playlist, and playing songs at random does not satisfy the user's mood. This system helps the user play songs automatically according to their mood. The image of the user is captured by the web camera and saved, then converted from RGB to binary format. This way of representing the data is called the feature-point detection method; it can also be done using the Haar cascade classifiers provided by OpenCV. The music player is developed as a Java program; it manages the database and plays songs according to the mood of the user.
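A minimal sketch of the capture-and-detect step described above, assuming the standard Haar cascade file bundled with the opencv-python package:

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

camera = cv2.VideoCapture(0)            # open the web camera
ret, frame = camera.read()              # capture one frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # convert RGB to grayscale
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    # draw a box around each detected face
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imwrite('captured_face.png', frame)  # save the annotated image
camera.release()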
4.4 PROPOSED SYSTEM
The proposed system detects the facial expressions of the user, extracts facial landmarks from them, and classifies the landmarks to obtain the user's current emotion. Once the emotion has been classified, songs matching it are presented to the user. In this work we develop a prototype of a dynamic music recommendation system based on human emotions: drawing on each user's listening pattern, songs are associated with each emotion. By integrating feature extraction and machine learning techniques, the emotion is detected from the live face image, and once the mood is derived from the input image, songs for that specific mood are played to engage the user. In this approach the application connects with human feelings, giving the user a personal touch. Our proposed system therefore concentrates on identifying human emotions in order to build an emotion-based music player using computer vision and machine learning techniques. For the experimental results, we use OpenCV for emotion detection and music recommendation.
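As an illustrative sketch of the recommendation step, a detected emotion could be mapped to a folder of songs and one track played; the folder layout mirrors the sample code at the end of this report, but the mapping itself is hypothetical:

import os
import random
from playsound import playsound

# hypothetical emotion-to-folder mapping, for illustration only
EMOTION_PLAYLISTS = {
    'happy': 'songs/happy',
    'sad': 'songs/sad',
    'angry': 'songs/angry',
    'neutral': 'songs/neutral',
}

def play_for_emotion(emotion):
    folder = EMOTION_PLAYLISTS.get(emotion, 'songs/neutral')
    track = random.choice(os.listdir(folder))   # pick a song for this mood
    playsound(os.path.join(folder, track))

play_for_emotion('happy')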
CHAPTER 5
SYSTEM DESIGN
UML DIAGRAMS
GOALS:
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities
and actions with support for choice, iteration and concurrency. In the Unified
Modeling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.
CHAPTER 6
MODULES
⮚ Data Collection Module
⮚ Emotion Extraction Module
⮚ Audio Extraction Module
⮚ Emotion - Audio Integration Module
MODULE DESCRIPTION
Data Collection Module
A survey was collected from users based on three questions: 1. What type of songs would they want to listen to when they are happy? 2. What type of songs would they want to listen to when they are sad? 3. What type of songs would they want to listen to when they are angry?
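A sketch of how these survey answers could be stored per emotion; the genre entries below are placeholders, not the actual survey results:

# placeholder survey data, for illustration only
survey_preferences = {
    'happy': ['dance', 'pop'],
    'sad':   ['melody', 'classical'],
    'angry': ['rock', 'instrumental'],
}
print(survey_preferences['happy'])   # genres reported for the happy mood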
CHAPTER 8
SYSTEM TESTING
8.2 Verification
Verification is the process to make sure the product satisfies the conditions
imposed at the start of the development phase. In other words, to make sure the
product behaves the way we want it to.
8.3 Validation
Validation is the process to make sure the product satisfies the specified
requirements at the end of the development phase. In other words, to make sure
the product is built as per customer requirements.
There are two basic approaches to software testing: black box testing and white box testing.
Black box testing is a testing technique that ignores the internal mechanism
of the system and focuses on the output generated against any input and execution
of the system. It is also called functional testing.
White box testing is a testing technique that takes into account the internal
mechanism of a system. It is also called structural testing and glass box testing.
Black box testing is often used for validation and white box testing is often used
for verification.
TYPES OF TESTING
● Unit Testing
● Integration Testing
● Functional Testing
● System Testing
● Stress Testing
● Performance Testing
● Usability Testing
● Acceptance Testing
● Regression Testing
● Beta Testing
System testing is the testing to ensure that by putting the software in different
environments (e.g., Operating Systems) it still works. System testing is done with
full system implementation and environment. It falls under the class of black box
testing.
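As a minimal black-box-style sketch, assuming a hypothetical playlist_for helper like the mapping sketched in Chapter 4: the tests check outputs for known inputs without inspecting internals.

import unittest

def playlist_for(emotion):
    # hypothetical helper under test, for illustration only
    mapping = {'happy': 'songs/happy', 'sad': 'songs/sad'}
    return mapping.get(emotion, 'songs/neutral')

class TestEmotionMapping(unittest.TestCase):
    def test_known_emotion(self):
        self.assertEqual(playlist_for('sad'), 'songs/sad')

    def test_unknown_emotion_falls_back(self):
        self.assertEqual(playlist_for('confused'), 'songs/neutral')

if __name__ == '__main__':
    unittest.main()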
REQUIREMENT ANALYSIS
Requirement analysis, also called requirement engineering, is the process of determining user expectations for a new or modified product. It encompasses the tasks of analyzing, documenting, validating and managing software or system requirements. The requirements should be documentable, actionable, measurable, testable and traceable, related to identified business needs or opportunities, and defined to a level of detail sufficient for system design.
FUNCTIONAL REQUIREMENTS
These are the technical specification requirements for the software product. They form the first step in the requirement analysis process, listing the requirements of the particular software system, including functional, performance and security requirements. The functioning of the system depends mainly on the quality of the hardware used to run the software with the given functionality.
Usability
This specifies how easy the system must be to use. It is easy to ask queries in any format, short or long; the Porter stemming algorithm produces the desired response for the user.
Robustness
This refers to a program that performs well not only under ordinary conditions but also under unusual ones. It is the ability of the system to cope with errors from irrelevant queries during execution.
Security
Security is the state of providing protected access to resources. The system provides good security: unauthorized users cannot access it, thereby ensuring high security.
Reliability
This is the probability of how often the software fails, often expressed as MTBF (Mean Time Between Failures). The requirement is needed to ensure that the processes work correctly and completely without being aborted. The system can handle any load, survive failures, and is even capable of working around any failure.
Compatibility
The system is supported by all recent versions of major web browsers. Using a web server such as localhost gives the system a real-time experience.
Flexibility
The flexibility of the project is provided in such a way that it has the ability to run in different environments, executed by different users.
Safety
Safety is a measure taken to prevent trouble. Every query is processed in a secure manner without letting others know one's personal information.
NON- FUNCTIONAL REQUIREMENTS
Portability
It is the usability of the same software in different environments. The project can
be run in any operating system.
Performance
These requirements determine the resources required, time interval, throughput
and everything that deals with the performance of the system.
Accuracy
The result of the requested query is very accurate, and information is retrieved at high speed. The degree of security provided by the system is high and effective.
Maintainability
Maintainability defines how easy it is to maintain the system: to analyse, change and test the application. Maintenance of this project is simple, as further updates can easily be done without affecting its stability.
CHAPTER 9
CONCLUSION
REFERENCES
[1] Emanuel I. Andelin and Alina S. Rusu, "Investigation of facial micro-expressions of emotions in psychopathy - a case study of an individual in detention", 2015, published by Elsevier Ltd.
[2] Paul Ekman, Wallace V. Friesen, and Phoebe Ellsworth, Emotion in the Human Face: Guidelines for Research and an Integration of Findings, Elsevier, 2013.
[3] F. De la Torre and J. F. Cohn, "Facial expression analysis", Visual Analysis of Humans, pp. 377–410, 2011.
[4] Sandeep Bavkar, Jyoti Rangole, and Deshmukh, "Geometric Approach for Human Emotion Recognition using Facial Expression", International Journal of Computer Applications, 2015.
[5] Z. Zhang, "Feature-based facial expression recognition: Sensitivity analysis and experiments with a multilayer perceptron", International Journal of Pattern Recognition and Artificial Intelligence.
[6] Rémi Delbouys, Romain Hennequin, Francesco Piccoli, Jimena Royo-Letelier, and Manuel Moussallam, "Music mood detection based on audio and lyrics with Deep Neural Net", 19th International Society for Music Information Retrieval Conference, Paris, France, 2018.
[7] Krittrin Chankuptarat et al., "Emotion-Based Music Player", IEEE 2019 conference.
[8] Y. Kim, "Convolutional Neural Networks for Sentence Classification", in Proceedings of the 2014 Conference on EMNLP, pp. 1746–1751, 2014.
[9] S. Tripathi and H. Beigi, "Multi-Modal Emotion recognition on IEMOCAP Dataset using Deep Learning", arXiv:1804.05788, 2018.
[10] Teng et al., "Recognition of Emotion with SVMs", Lecture Notes in Computer Science, August 2006.
[11] B.T. Nguyen, M.H. Trinh, T.V. Phan, and H.D. Nguyen, "An efficient real-time emotion detection using camera and facial landmarks", 2017 Seventh International Conference on Information Science and Technology (ICIST), 2017.
SAMPLE CODE
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import imutils
import cv2
import numpy as np
from playsound import playsound
import time

# parameters for loading the face detector and the emotion model
detection_model_path = 'haarcascade_files/haarcascade_frontalface_default.xml'
emotion_model_path = 'models/_mini_XCEPTION.102-0.66.hdf5'

# the following three lines were missing from the original listing and are
# reconstructed from how the variables are used below
face_detection = cv2.CascadeClassifier(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)
EMOTIONS = ["angry", "disgust", "scared", "happy", "sad", "surprised", "neutral"]

camera = cv2.VideoCapture(0)
while True:
    # grab a frame, convert to grayscale and detect the largest face
    frame = imutils.resize(camera.read()[1], width=300)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detection.detectMultiScale(gray, scaleFactor=1.1,
                                            minNeighbors=5, minSize=(30, 30))
    canvas = np.zeros((250, 300, 3), dtype="uint8")
    frameClone = frame.copy()
    if len(faces) == 0:
        continue
    (fX, fY, fW, fH) = sorted(faces, reverse=True,
                              key=lambda f: f[2] * f[3])[0]

    # prepare the face ROI for the CNN: resize, scale to [0, 1], add batch axis
    roi = gray[fY:fY + fH, fX:fX + fW]
    roi = cv2.resize(roi, (64, 64)).astype("float") / 255.0
    roi = np.expand_dims(img_to_array(roi), axis=0)

    preds = emotion_classifier.predict(roi)[0]
    print(preds)
    emotion_probability = np.max(preds)
    label = EMOTIONS[preds.argmax()]

    for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
        # construct the label text and draw a probability bar per emotion
        text = "{}: {:.2f}%".format(emotion, prob * 100)
        w = int(prob * 300)
        cv2.rectangle(canvas, (7, (i * 35) + 5),
                      (w, (i * 35) + 35), (0, 0, 255), -1)
        cv2.putText(canvas, text, (10, (i * 35) + 23),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (255, 255, 255), 2)
    cv2.putText(frameClone, label, (fX, fY - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
    cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)

    cv2.imshow('your_face', frameClone)
    cv2.imshow("Probabilities", canvas)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    # the original compared the leaked loop variable `emotion`, which always
    # holds the last entry of EMOTIONS; comparing `label` is the intended logic
    if label == 'happy':
        # song path was not given in the original listing
        camera.release()
        cv2.destroyAllWindows()
        time.sleep(20)
        break
    if label == 'sad':
        playsound(r'C:/Users/Kumara Swamy/Documents/facial emotion final/songs/sad/enn-kathalle.mp3')
        camera.release()
        cv2.destroyAllWindows()
        time.sleep(20)
        break
    if label in ('neutral', 'scared', 'surprised', 'angry'):
        pass  # song paths were not given in the original listing
    if label == 'fear':
        # note: 'fear' is not in EMOTIONS (the model's class is 'scared'),
        # so this branch never fires; kept as in the original listing
        playsound('SONGS/fear/agayam.mp3')
        camera.release()
        cv2.destroyAllWindows()
        time.sleep(20)
        break

camera.release()
cv2.destroyAllWindows()