
Emotions Based Music System

ACKNOWLEDGEMENT

It is our privilege to express our sincere regards to our project guide, Mrs. A. S. Shinde, for her valuable inputs, able guidance, encouragement, whole-hearted cooperation and constructive criticism throughout the duration of our project. We deeply express our sincere thanks to our Head of Department, Mr. S. D. Jadhav, for encouraging and allowing us to present the project on the topic 'Emotions Based Music System' at our department premises for the partial fulfillment of the requirements leading to the award of the Diploma in Information Technology Engineering.
We take this opportunity to thank all our lecturers who have directly or indirectly helped us with our project. We pay our respect and love to our parents and all other family members and friends for their love and encouragement throughout our career. Last but not the least, we express our thanks to our friends for their cooperation and support.

Bardapure Durga (Enrollment no:-2200160553)


Birajdar Laxmi (Enrollment no:- 2200160555)
Birle Geeta (Enrollment no:- 2200160556)
Kolpuke Ankita (Enrollment no:- 2200160578)
Mule Gitanjali (Enrollment no:- 23510210572)

Department of Information Technology

Puranmal Lahoti Government Polytechnic, Latur 413512



ABSTRACT

The face is an important aspect in predicting human emotions and mood. Usually, human emotions are extracted with the use of a camera. Many applications are being developed based on detection of human emotions, such as business notification recommendation, e-learning, mental disorder and depression detection, and criminal behaviour detection. In this proposed system, we develop a prototype of a dynamic music recommendation system based on human emotions. Based on each user's listening pattern, the songs for each emotion are trained. By integrating feature extraction and machine learning techniques, the emotion is detected from the real face; once the mood is derived from the input image, the songs for that specific mood are played to engage the user. In this approach, the application connects with human feelings, giving a personal touch to the users. Our proposed system therefore concentrates on identifying human feelings to develop an emotion-based music player using computer vision and machine learning techniques. For experimental results, we use a CNN model architecture for emotion detection and music recommendation.


CONTENTS

1. Introduction
   1.1. Overview
   1.2. Literature Survey

2. System Requirements
   2.1. Python
        2.1.1. Introduction
        2.1.2. History of Python
        2.1.3. Python Features

3. System Design
   3.1. System Architecture
   3.2. UML Diagrams
   3.3. Use Case Diagram
   3.4. Class Diagram
   3.5. Sequence Diagram
   3.6. Activity Diagram
   3.7. Proposed System

4. Implementation
   4.1. Modules
   4.2. Modules Description

5. Input Design and Output Design
   5.1. Input Design
   5.2. Output Design

6. Actual Output

7. Results and Conclusion
   7.1. Advantages
   7.2. Applications
   7.3. Conclusion
   7.4. Future Scope

8. Bibliography/References
   8.1. References



CHAPTER 1

INTRODUCTION

Overview
Machine learning involves computers discovering how they can perform tasks without being
explicitly programmed to do so. It involves computers learning from data provided so that they
carry out certain tasks. For simple tasks assigned to computers, it is possible to program
algorithms telling the machine how to execute all steps required to solve the problem at hand;
on the computer's part, no learning is needed. For more advanced tasks, it can be challenging for
a human to manually create the needed algorithms. In practice, it can turn out to be more
effective to help the machine develop its own algorithm, rather than having human programmers
specify every needed step.

The discipline of machine learning employs various approaches to teach computers to accomplish tasks where no fully satisfactory algorithm is available. In cases where vast numbers of potential answers exist, one approach is to label some of the correct answers as valid. This can then be used as training data for the computer to improve the algorithm(s) it uses to determine correct answers. For example, to train a system for the task of digital character recognition, the MNIST dataset of handwritten digits has often been used.

Machine learning approaches


Machine learning approaches are traditionally divided into three broad categories, depending on
the nature of the "signal" or "feedback" available to the learning system:
Supervised learning: The computer is presented with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).


Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle or playing a game against an opponent). As it navigates its problem space, the program is provided feedback that's analogous to rewards, which it tries to maximize.
Other approaches have been developed which don't fit neatly into this three-fold categorisation, and sometimes more than one is used by the same machine learning system; for example, topic modeling, dimensionality reduction, or meta-learning.

As of 2020, deep learning has become the dominant approach for much ongoing work in the
field of machine learning.


LITERATURE SURVEY

1. Smart Music Player Integrating Facial Emotion Recognition and Music Mood Recommendation

AUTHORS: Shlok Gilda, Husain Zafar, Chintan Soni and Kshitija Waghurdekar

Songs, as a medium of expression, have always been a popular choice to depict and understand
human emotions. Reliable emotion based classification systems can go a long way in helping us
parse their meaning. However, research in the field of emotion-based music classification has
not yielded optimal results. In this paper, we present an affective cross-platform music player,
EMP, which recommends music based on the real-time mood of the user. EMP provides smart
mood based music recommendation by incorporating the capabilities of emotion context
reasoning within our adaptive music recommendation system. Our music player contains three
modules: Emotion Module, Music Classification Module and Recommendation Module. The
Emotion Module takes an image of the user's face as an input and makes use of deep learning
algorithms to identify their mood with an accuracy of 90.23%. The Music Classification Module
makes use of audio features to achieve a remarkable result of 97.69% while classifying songs
into 4 different mood classes. The Recommendation Module suggests songs to the user by
mapping their emotions to the mood type of the song, taking into consideration the preferences
of the user.

2. Current emotion research in music psychology

AUTHORS: Swathi Swaminathan and E. Glenn Schellenberg

Music is universal at least partly because it expresses emotion and regulates affect. Associations
between music and emotion have been examined regularly by music psychologists. Here, we
review recent findings in three areas: (a) the communication and perception of emotion in
music, (b) the emotional consequences of music listening, and (c) predictors of music
preferences.


3. Mood Classification from Musical Audio Using User Group-dependent Models

AUTHORS: Kyogu Lee and Minsu Cho

In this paper, we propose a music mood classification system that reflects a user's profile, based on the belief that music mood perception is subjective and can vary depending on the user's profile, such as age or gender. To this end, we first define a set of generic mood descriptors. Secondly, we make up several user profiles according to age and gender. We then obtain musical items for each group to separately train the statistical models. Using the two different user models, we verify our hypothesis that user profiles play an important role in mood perception by showing that both models achieve higher classification accuracy when the test data and the mood model are of the same kind. Applying our system to automatic playlist generation, we also demonstrate that considering the difference between user groups in mood perception has a significant effect in computing music similarity.


CHAPTER 2
SYSTEM REQUIREMENTS

Software Requirements:

 Operating system: Windows 10
 Python 3.8
 numpy==1.17.4
 opencv-python==4.2.0.32
 tensorflow==2.0.0
 python-vlc==3.0.7110

Hardware Requirements:

 System: Pentium i3 processor
 Hard Disk: 500 GB
 Monitor: 15'' LED
 Input Devices: Keyboard, Mouse
 RAM: 4 GB
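Assuming pip is available in the Python 3.8 environment, the pinned packages above can be installed in one step:

    pip install numpy==1.17.4 opencv-python==4.2.0.32 tensorflow==2.0.0 python-vlc==3.0.7110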

2.1. Python

2.1.1. Introduction

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to Perl and PHP.

• Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.

• Python is Object-Oriented − Python supports the object-oriented style or technique of programming that encapsulates code within objects.

• Python is a Beginner's Language − Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.


2.1.2 History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted, and like Perl, its source code is available under an open-source license (the Python Software Foundation License).

Python is now maintained by a core development team, although Guido van Rossum still holds a vital role in directing its progress.

2.1.3 Python Features

Python's features include −

• Easy-to-learn − Python has few keywords, simple structure, and a clearly defined syntax. This allows the student to pick up the language quickly.

• Easy-to-maintain − Python's source code is fairly easy to maintain.

• A broad standard library − The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.

• Interactive Mode − Python has support for an interactive mode which allows interactive testing and debugging of snippets of code.

• Portable − Python can run on a wide variety of hardware platforms and has the same interface on all platforms.

• Extendable − You can add low-level modules to the Python interpreter. These modules enable programmers to add to or customize their tools to be more efficient.

• Databases − Python provides interfaces to all major commercial databases.

• GUI Programming − Python supports GUI applications that can be created and ported to many system calls, libraries and windowing systems, such as Windows MFC, Macintosh, and the X Window System of Unix.


Chapter 3

SYSTEM DESIGN

SYSTEM ARCHITECTURE:

[Figure: System Architecture. Face Detection using Webcam -> Facial Landmarks Extraction -> Facial Expression Detection -> Music Player. Emotion classes: Angry, Disgusted, Fearful, Happy, Neutral, Sad, Surprised]


DATA FLOW DIAGRAM:


1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, the external entities that interact with the system, and the information flows in the system.
3. The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction, and may be partitioned into levels that represent increasing information flow and functional detail.


[Figure: Data Flow Diagram. Input Live Webcam Video -> Preprocessing -> CNN Feature Extraction (trained on the training dataset) -> Prediction/Classification (Facial Expression Detection) -> Play music according to the recognized expression]


UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing, constructing and documenting the artifacts of a software system, as well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns and components.
7. Integrate best practices.

USE CASE DIAGRAM:



A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors in
the system can be depicted.

[Figure: Use Case Diagram. Actor: User. Use cases: Input Live Webcam Video, Preprocessing, Training, Classification]

CLASS DIAGRAM:


In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains
which class contains information.

[Figure: Class Diagram. Input class: Image Acquisition, Input Live Video; operation Preprocessing(). Output class: Feature Extraction, Classification; operation Display Result(), which plays music based on the detected expression once the input is finally classified]

SEQUENCE DIAGRAM:


A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.

[Figure: Sequence Diagram with participants Data Collection, Training and Testing. Messages: collect the images from the user; send the data to the training stage; perform preprocessing; train the images; extract the features from the images and send them to the testing stage; give input from the live web camera; predict the type using the proposed algorithm and play a song accordingly]

ACTIVITY DIAGRAM:


Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

[Figure: Activity Diagram. Input Webcam Video -> Preprocessing -> Training -> CNN Model Architecture -> Play music according to the predicted expression]

PROPOSED SYSTEM:


• The proposed system is divided into two parts: the front end, which is the user interface, and the back end, which performs all facial expression related operations. The entire application is implemented in Python; the front end is implemented using the Tkinter module and the back end is implemented using the Keras module.
• When the application is started it automatically triggers the prediction module, which is used for prediction of facial expression. The prediction module calls the system camera to retrieve an image, and the image is then classified using the pre-trained model. This process is repeated N times to get N predictions; from the N predictions, the value with the maximum count is taken and returned to the application (see the sketch after this list).
• The application then suggests a playlist based on the predicted value. Users can either start that playlist or listen to their regular songs. Users can perform general music player operations such as play, pause, next and previous. The application will suggest a mood-based playlist after every K songs.
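A minimal sketch of the repeated-prediction step described above. The helper name capture_frame() and the emotion label list are assumptions for illustration; the model is the pre-trained CNN described in Chapter 4:

    from collections import Counter

    EMOTIONS = ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

    def predict_mood(model, capture_frame, n=10):
        # Classify n frames and return the emotion predicted most often,
        # which smooths out momentary changes in expression.
        votes = []
        for _ in range(n):
            face = capture_frame()                     # one preprocessed 48x48x1 face image
            probs = model.predict(face[None, ...])[0]  # add a batch axis, get 7 probabilities
            votes.append(EMOTIONS[probs.argmax()])
        return Counter(votes).most_common(1)[0][0]

The returned mood string is what the application maps to a playlist suggestion.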

Chapter 4


IMPLEMENTATION

MODULES:
 Dataset
 Importing the necessary libraries
 Retrieving the images
 Splitting the dataset
 Building the model
 Apply the model and plot the graphs for accuracy and loss
 Accuracy on test set
 Saving the Trained Model
 Face expression in Live webcam

MODULES DESCRIPTION:
Dataset:
In the first module, we developed the system to get the input dataset for the training and testing purpose. The dataset for face expression detection is included in the project folder itself.

The dataset consists of 35,887 face expression images.

Importing the necessary libraries:


We will be using the Python language for this. First we will import the necessary libraries such as Keras for building the main model, sklearn for splitting the training and test data, PIL for converting the images into arrays of numbers, and other libraries such as pandas, numpy, matplotlib and tensorflow.
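A minimal sketch of these imports, assuming the TensorFlow 2.0 / Keras stack pinned in Chapter 2:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import tensorflow as tf
    from PIL import Image                                  # images to arrays of numbers
    from sklearn.model_selection import train_test_split   # train/test split
    from tensorflow.keras.models import Sequential         # the main model
    from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense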

Retrieving the images:


We will retrieve the images and their labels, then resize the images to (48,48), as all images should have the same size for recognition, and convert the images into a numpy array.
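A minimal sketch of this step, assuming (hypothetically) that the dataset is laid out as one folder of images per emotion label under a dataset/ directory:

    import os
    import numpy as np
    from PIL import Image

    def load_dataset(root='dataset'):
        images, labels = [], []
        # Sorted folder names give a deterministic label order
        for label, emotion in enumerate(sorted(os.listdir(root))):
            folder = os.path.join(root, emotion)
            for name in os.listdir(folder):
                img = Image.open(os.path.join(folder, name)).convert('L')  # grayscale
                img = img.resize((48, 48))       # all images must share the same size
                images.append(np.asarray(img))
                labels.append(label)
        return np.array(images), np.array(labels)

    X, y = load_dataset()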

Splitting the dataset:


Split the dataset into train and test: 80% train data and 20% test data.
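A minimal sketch of the 80/20 split using scikit-learn's train_test_split:

    from sklearn.model_selection import train_test_split

    # X: (N, 48, 48) image array and y: (N,) integer labels from the previous step
    X = X.reshape(-1, 48, 48, 1).astype('float32') / 255.0   # add channel axis, scale to [0, 1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)                # 80% train, 20% test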


Convolutional Neural Networks


The objectives behind this module are:
• To understand the convolution operation
• To understand the pooling operation
• Remembering the vocabulary used in convolutional neural networks (padding, stride, filter, etc.)
• Building a convolutional neural network for multi-class classification in images

Computer Vision
Some of the computer vision problems which we will be discussing in this section are:
1. Image classification
2. Object detection
3. Neural style transfer
One major problem with computer vision problems is that the input data can get really big. Suppose an image is of size 64 X 64 X 3. The input feature dimension then becomes 12,288. This will be even bigger for larger images (say, of size 720 X 720 X 3). Now, if we pass such a big input to a neural network, the number of parameters will swell up to a huge number (depending on the number of hidden layers and hidden units). This will result in more computational and memory requirements, not something most of us can deal with.

Edge Detection Example


Earlier, we saw that the early layers of a neural network detect edges in an image. Deeper layers might be able to detect parts of objects, and even deeper layers might detect complete objects (like a person's face).
In this section, we will focus on how edges can be detected in an image. Suppose we are given an image containing many vertical and horizontal edges. The first thing to do is to detect these edges:


But how do we detect these edges? To illustrate this, let’s take a 6 X 6 grayscale image (i.e. only
one channel):

Next, we convolve this 6 X 6 matrix with a 3 X 3 filter:

After the convolution, we will get a 4 X 4 image. The first element of the 4 X 4 matrix will be
calculated as:

So, we take the first 3 X 3 matrix from the 6 X 6 image and multiply it with the filter. Now, the
first element of the 4 X 4 output will be the sum of the element-wise product of these values, i.e.
3*1 + 0 + 1*-1 + 1*1 + 5*0 + 8*-1 + 2*1 + 7*0 + 2*-1 = -5. To calculate the second element of


the 4 X 4 output, we will shift our filter one step towards the right and again get the sum of the
element-wise product:

Similarly, we will convolve over the entire image and get a 4 X 4 output:

So, convolving a 6 X 6 input with a 3 X 3 filter gave us an output of 4 X 4. Consider one more
example:

Note: Higher pixel values represent the brighter portion of the image and the lower pixel values
represent the darker portions. This is how we can detect a vertical edge in an image.
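A minimal NumPy sketch of this valid (no-padding) convolution, using the 3 X 3 vertical edge filter discussed above:

    import numpy as np

    def convolve2d_valid(image, kernel):
        # Slide the f x f kernel over the n x n image; output is (n-f+1) x (n-f+1)
        n, f = image.shape[0], kernel.shape[0]
        out = np.zeros((n - f + 1, n - f + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
        return out

    vertical_edge = np.array([[1, 0, -1],
                              [1, 0, -1],
                              [1, 0, -1]])
    image = np.random.randint(0, 10, (6, 6))             # a 6 x 6 grayscale image
    print(convolve2d_valid(image, vertical_edge).shape)  # (4, 4)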

More Edge Detection


The type of filter that we choose helps to detect vertical or horizontal edges, and there are several commonly used filters for this, the Sobel filter being one example.

The Sobel filter puts a little bit more weight on the central pixels. Instead of using these filters,
we can create our own as well and treat them as a parameter which the model will learn using
backpropagation.

Padding
We have seen that convolving an input of 6 X 6 dimension with a 3 X 3 filter results in 4 X 4
output. We can generalize it and say that if the input is n X n and the filter size is f X f, then the
output size will be (n-f+1) X (n-f+1):
• Input: n X n
• Filter size: f X f
• Output: (n-f+1) X (n-f+1)
There are primarily two disadvantages here:
1. Every time we apply a convolutional operation, the size of the image shrinks.
2. Pixels present in the corners of the image are used far fewer times during convolution than the central pixels, so information from the corners is under-used and can be lost.
To overcome these issues, we can pad the image with an additional border, i.e., we add one pixel all around the edges. This means that the input will be an 8 X 8 matrix (instead of a 6 X 6 matrix). Applying a 3 X 3 convolution on it will result in a 6 X 6 matrix, the original shape of the image. This is where padding comes to the fore:
• Input: n X n


• Padding: p
• Filter size: f X f
• Output: (n+2p-f+1) X (n+2p-f+1)
There are two common choices for padding:
1. Valid: It means no padding. If we are using valid padding, the output will be (n-f+1) X
(n-f+1)
2. Same: Here, we apply padding so that the output size is the same as the input size, i.e.,
n+2p-f+1 = n
So, p = (f-1)/2
We now know how to use padded convolution. This way we don’t lose a lot of information and
the image does not shrink either. Next, we will look at how to implement strided convolutions.

Strided Convolutions
Suppose we choose a stride of 2. So, while convoluting through the image, we will take two
steps – both in the horizontal and vertical directions separately. The dimensions for stride s will
be:
• Input: n X n
• Padding: p
• Stride: s
• Filter size: f X f
• Output: [(n+2p-f)/s+1] X [(n+2p-f)/s+1]
Stride helps to reduce the size of the image, a particularly useful feature.
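A minimal helper that evaluates the output-size formula above for given values of n, f, p and s:

    def conv_output_size(n, f, p=0, s=1):
        # Output side length: floor((n + 2p - f) / s) + 1
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(6, 3))        # 4  (valid convolution, no padding)
    print(conv_output_size(6, 3, p=1))   # 6  ('same' padding, p = (f-1)/2)
    print(conv_output_size(7, 3, s=2))   # 3  (strided convolution)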

Convolutions Over Volume


Suppose, instead of a 2-D image, we have a 3-D input image of shape 6 X 6 X 3. How will we
apply convolution on this image? We will use a 3 X 3 X 3 filter instead of a 3 X 3 filter. Let’s
look at an example:
• Input: 6 X 6 X 3
• Filter: 3 X 3 X 3
The dimensions above represent the height, width and channels in the input and filter. Keep in mind that the number of channels in the input and the filter should be the same. This will result in an output of 4 X 4. Since there are three channels in the input, the filter consequently also has three channels.


After convolution, the output shape is a 4 X 4 matrix. So, the first element of the output is the
sum of the element-wise product of the first 27 values from the input (9 values from each
channel) and the 27 values from the filter. After that we convolve over the entire image.
Instead of using just a single filter, we can use multiple filters as well. How do we do that? Let’s
say the first filter will detect vertical edges and the second filter will detect horizontal edges
from the image. If we use multiple filters, the output dimension will change. So, instead of
having a 4 X 4 output as in the above example, we would have a 4 X 4 X 2 output (if we have
used 2 filters):

Generalized dimensions can be given as:


• Input: n X n X nc
• Filter: f X f X nc
• Padding: p
• Stride: s
• Output: [(n+2p-f)/s+1] X [(n+2p-f)/s+1] X nc’
Here, nc is the number of channels in the input and filter, while nc’ is the number of filters.

One Layer of a Convolutional Network


Once we get an output after convolving over the entire image using a filter, we add a bias term
to those outputs and finally apply an activation function to generate activations. This is one
layer of a convolutional network. Recall that the equation for one forward pass is given by:
z[1] = w[1]*a[0] + b[1]

a[1] = g(z[1])

In our case, the input (6 X 6 X 3) is a[0] and the filters (3 X 3 X 3) are the weights w[1]. These activations from layer 1 act as the input for layer 2, and so on. Clearly, the number of parameters in a convolutional neural network is independent of the size of the image. It essentially depends on the filter size. Suppose we have 10 filters, each of shape 3 X 3 X 3. What will be the number of parameters in that layer? Let's try to solve this:
• Number of parameters for each filter = 3*3*3 = 27
• There will be a bias term for each filter, so total parameters per filter = 28
• As there are 10 filters, the total parameters for that layer = 28*10 = 280
No matter how big the image is, the parameters only depend on the filter size. Awesome, isn't it? Let's have a look at the summary of notations for a convolution layer:
• f[l] = filter size
• p[l] = padding
• s[l] = stride
• n[c][l] = number of filters


Let’s combine all the concepts we have learned so far and look at a convolutional network
example.

Simple Convolutional Network Example


We'll take things up a notch now. Let's look at how a simple convolutional network works. We take an input image (size = 39 X 39 X 3 in our case), convolve it with 10 filters of size 3 X 3, with a stride of 1 and no padding. This will give us an output of 37 X 37 X 10. We convolve this output further through more layers until we get an output of 7 X 7 X 40. Finally, we take all these numbers (7 X 7 X 40 = 1960), unroll them into a large vector, and pass them to a classifier that will make predictions. This is a microcosm of how a convolutional network works.
There are a number of hyperparameters that we can tweak while building a convolutional network, such as the number of filters, the size of the filters, the stride and the padding. Just keep in mind that as we go deeper into the network, the size of the image shrinks whereas the number of channels usually increases.
In a convolutional network (ConvNet), there are basically three types of layers:
1. Convolution layer


2. Pooling layer
3. Fully connected layer
Let’s understand the pooling layer in the next section.

Pooling Layers
Pooling layers are generally used to reduce the size of the inputs and hence speed up the computation. Consider a 4 X 4 matrix. Applying max pooling with a filter of size 2 and a stride of 2 on this matrix results in a 2 X 2 output: for every 2 X 2 block, we take the maximum number. These are the hyperparameters for the pooling layer. Apart from max pooling, we can also apply average pooling where, instead of taking the max of the numbers, we take their average. In summary, the hyperparameters for a pooling layer are:
1. Filter size
2. Stride
3. Max or average pooling
If the input of the pooling layer is nh X nw X nc, then the output will be [{(nh – f) / s + 1} X {(nw
– f) / s + 1} X nc].
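A minimal NumPy sketch of max pooling on a single channel with filter size 2 and stride 2:

    import numpy as np

    def max_pool(x, f=2, s=2):
        # Take the max over every f x f block, moving s steps at a time
        h = (x.shape[0] - f) // s + 1
        w = (x.shape[1] - f) // s + 1
        out = np.zeros((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = x[i*s:i*s+f, j*s:j*s+f].max()
        return out

    x = np.array([[1, 3, 2, 1],
                  [4, 6, 5, 2],
                  [7, 2, 9, 8],
                  [1, 5, 3, 4]])
    print(max_pool(x))   # [[6. 5.]
                         #  [7. 9.]]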

CNN Example
Let's now look at how a complete convolutional neural network with convolutional, pooling and fully connected layers works. Suppose we have an input of shape 32 X 32 X 3: there is a combination of convolution and pooling layers at the beginning, a few fully connected layers at the end, and finally a softmax classifier to classify the input into various categories. There are a lot of hyperparameters in this network which we have to specify as well. Generally, we take the set of hyperparameters which have been used in proven research and they end up doing well. As seen in this example, the height and width of the input shrink as we go


deeper into the network (from 32 X 32 to 5 X 5) and the number of channels increases (from 3
to 10).
All of these concepts and techniques bring up a very fundamental question – why convolutions?
Why not something else?

Why Convolutions?
There are primarily two major advantages of using convolutional layers over using just fully
connected layers:
1. Parameter sharing
2. Sparsity of connections

Consider an example with a 32 X 32 X 3 input and a 28 X 28 X 6 output. If we had used just a fully connected layer, the number of parameters would be 32*32*3*28*28*6, which is nearly 14 million! Makes no sense, right?
If we count the number of parameters in the corresponding convolutional layer, with six filters of shape 5 X 5 X 3 and a bias term for each, it will be (5*5*3 + 1) * 6, which is equal to 456. Convolutional layers reduce the number of parameters and speed up the training of the model significantly.
In convolutions, we share the parameters while convolving through the input. The intuition
behind this is that a feature detector, which is helpful in one part of the image, is probably also
useful in another part of the image. So a single filter is convolved over the entire input and
hence the parameters are shared.
The second advantage of convolution is the sparsity of connections. For each layer, each output
value depends on a small number of inputs, instead of taking into account all the inputs.

Building the model:


For building the model we will use the Sequential model from the Keras library, then add the layers to make the convolutional neural network. In the first 2 Conv2D layers we have used 32 filters and the kernel size is (3,3).
In the MaxPool2D layer we have kept the pool size (2,2), which means it will select the maximum value of every 2 x 2 area of the image. By doing this, the dimensions of the image will reduce by a factor of 2. In the dropout layer we have kept the dropout rate = 0.25, which means 25% of neurons are removed randomly.

We apply these 4 layers again with some change in parameters. Then we apply a flatten layer to convert the 2-D data to a 1-D vector. This layer is followed by a dense layer, a dropout layer and a dense layer again. The last dense layer outputs 7 nodes, one for each face expression type. This layer uses the softmax activation function, which gives a probability value for each class and predicts which of the 7 classes has the highest probability.
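A minimal Keras sketch matching the layer description above. The filter count in the second block and the hidden dense size are assumptions, since the text only says the parameters change:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

    model = Sequential([
        # First block: two Conv2D layers with 32 filters of kernel size (3, 3)
        Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPool2D(pool_size=(2, 2)),    # halves each spatial dimension
        Dropout(0.25),                  # randomly drops 25% of neurons
        # Same four layers again with changed parameters (64 filters assumed)
        Conv2D(64, (3, 3), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        Flatten(),                      # 2-D feature maps to a 1-D vector
        Dense(512, activation='relu'),  # hidden size is an assumption
        Dropout(0.25),
        Dense(7, activation='softmax')  # 7 face expression classes
    ])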

Apply the model and plot the graphs for accuracy and loss:
We will compile the model and apply it using the fit function. The batch size will be 64. Then we will plot the graphs for accuracy and loss. We got an average validation accuracy of 96.6% and an average training accuracy of 95.3%.
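A minimal sketch of compiling, fitting with batch size 64, and plotting the accuracy and loss curves; the optimizer and number of epochs are assumptions:

    import matplotlib.pyplot as plt

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',   # integer class labels
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train, batch_size=64, epochs=30,
                        validation_data=(X_test, y_test))

    for metric in ('accuracy', 'loss'):
        plt.figure()
        plt.plot(history.history[metric], label='train')
        plt.plot(history.history['val_' + metric], label='validation')
        plt.title(metric)
        plt.legend()
    plt.show()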

Accuracy on test set:

We got an accuracy of 95.3% on the test set.

Saving the Trained Model:

Once you're confident enough to take your trained and tested model into a production-ready environment, the first step is to save it into a .h5 or .pkl file, the latter using a library like pickle. Make sure pickle is available in your environment; then import the module and dump the model into a .pkl file.
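A minimal sketch of both saving options. Note that pickling a Keras model can fail depending on the version, so the native .h5 save is the more reliable route:

    import pickle

    # Option 1: Keras's native HDF5 format
    model.save('emotion_model.h5')

    # Option 2: dump the model into a .pkl file
    with open('emotion_model.pkl', 'wb') as f:
        pickle.dump(model, f)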

Face expression in Live webcam:


Here, we capture the video. The read() function reads one frame from the video source, which in this example is the webcam. It returns:
 A return code
 The actual video frame read (one frame on each loop)
The return code tells us if we have run out of frames, which will happen if we are reading from a file. This doesn't matter when reading from the webcam, since we can record forever, so we will ignore it.


Again, this code should be familiar. We are merely searching for the face in our captured frame. The result will be one of angry, disgust, fear, happy, neutral, sad or surprise. After capturing the emotion, a list of songs is suggested based on it.
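A minimal OpenCV sketch of the capture-and-classify loop described above, assuming the model saved earlier and OpenCV's bundled Haar cascade for face detection; the label order is an assumption:

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    model = load_model('emotion_model.h5')
    emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    cap = cv2.VideoCapture(0)                  # the webcam
    while True:
        ret, frame = cap.read()                # return code, then the frame
        if not ret:
            break                              # ran out of frames (file input only)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
            face = cv2.resize(gray[y:y+h, x:x+w], (48, 48)) / 255.0
            probs = model.predict(face.reshape(1, 48, 48, 1))[0]
            print(emotions[probs.argmax()])    # feed this into the playlist suggestion logic
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
            break
    cap.release()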

Chapter 5


INPUT DESIGN AND OUTPUT DESIGN

INPUT DESIGN

The input design is the link between the information system and the user. It comprises the developing specification and procedures for data preparation, and those steps that are necessary to put transaction data into a usable form for processing. This can be achieved by having the computer read data from a written or printed document, or by having people key the data directly into the system. The design of input focuses on controlling the amount of input required, controlling the errors, avoiding delay, avoiding extra steps and keeping the process simple. The input is designed in such a way that it provides security and ease of use while retaining privacy. Input design considered the following things:

 What data should be given as input?
 How should the data be arranged or coded?
 The dialog to guide the operating personnel in providing input.
 Methods for preparing input validations and steps to follow when errors occur.

OBJECTIVES

1. Input Design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and
show the correct direction to the management for getting correct information from the
computerized system.

2. It is achieved by creating user-friendly screens for data entry to handle large volumes of data. The goal of designing input is to make data entry easier and free from errors. The data entry screen is designed in such a way that all the data manipulations can be performed. It also provides record viewing facilities.

3. When the data is entered it will be checked for validity. Data can be entered with the help of screens. Appropriate messages are provided as and when needed, so that the user is not left in a maze at any instant. Thus the objective of input design is to create an input layout that is easy to follow.


OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the information clearly. In any system, the results of processing are communicated to the users and to other systems through outputs. In output design, it is determined how the information is to be displayed for immediate need, and also the hard copy output. It is the most important and direct source of information to the user. Efficient and intelligent output design improves the system's relationship with the user and helps in decision-making.

1. Designing computer output should proceed in an organized, well thought out manner; the right output must be developed while ensuring that each output element is designed so that people will find the system easy and effective to use. When analysts design computer output, they should identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create document, report, or other formats that contain information produced by the system.

The output form of an information system should accomplish one or more of the following
objectives.

 Convey information about past activities, current status or projections of the future.
 Signal important events, opportunities, problems, or warnings.
 Trigger an action.
 Confirm an action.


Chapter 6

Actual output


Chapter 7

Results and conclusion

7.1 Advantages

7.1.1 ADVANTAGES OF PROPOSED SYSTEM:

 The smart music player is an application that runs on the idea that we can detect a person's mood based on the expression on their face.
 The expression on the face is detected using convolutional neural networks (CNNs). A set of images is taken from the camera of the device and these images are given to a pretrained CNN, which returns the facial expression to the application. Based on the facial expression, a song playlist is suggested.
 This is an additional feature on top of the existing features of a music player. Usually a facial expression changes within seconds and is not consistent, so it may lead to a wrong playlist suggestion. To overcome this problem, the application collects N images when it is started and takes the facial expression that has appeared the maximum number of times.

7.2 Applications

1. Personalized playlists: Music streaming services could create personalized playlists based on the user's mood, allowing users to easily find music that matches how they're feeling at any given moment.
2. Therapeutic purposes: Music therapy programs could use emotions-based music systems
to tailor playlists for individuals dealing with anxiety, depression, or other mental health
issues.
3. Entertainment: Video games and movies could use dynamic music systems that adapt to
the emotions of the scene or gameplay, enhancing the overall experience for the player
or viewer.
4. Retail and marketing: Retail stores and businesses could use music to influence the
emotions of customers, creating a more pleasant shopping experience or encouraging
certain behaviors like relaxation or excitement.
5. Health and wellness apps: Meditation and mindfulness apps could incorporate music that
aligns with the user's emotional state, helping them to relax, focus, or energize as
needed.

7.3 Conclusion

This application can be added as an additional feature to current advanced music players, which suggest songs based on previous song history. Adding a facial expression detection system to a music player would increase the situations where the system suggests the songs the user needs, and this would increase user satisfaction. This facial recognition model can also be used in several different situations, such as movie suggestions, activity suggestions, etc.

7.4 Future scope

We can add incremental learning to the application in such a way that it learns from new data generated by the application. The application asks for feedback from the user on whether it has predicted correctly or not, and it learns based on that feedback. This process increases model accuracy and results in improved quality. We can also add new features such as heart rate, which is somewhat connected to human emotions, to increase the correctness of the model. We can also consider the background while predicting the emotion; this way we can get better results than with the previous method. For example, if we are in the gym, the application should detect objects in the gym and play motivational songs that are suited for the gym.

Chapter 8

Bibliography/References

REFERENCES

[1] Swathi Swaminathan and E. Glenn Schellenberg, “Current emotion research


in music psychology,” Emotion Review, vol. 7, no. 2, pp. 189–197, Apr. 2015.

[2] “How music changes your mood”, Examined Existence. [Online]. Available:
http://examinedexistence.com/how-music-changes-yourmood/.

[3] Kyogu Lee and Minsu Cho, “Mood Classification from Musical Audio Using
User Group-dependent Models.”

[4] Daniel Wolff, Tillman Weyde, and Andrew MacFarlane, “Culture-aware


Music Recommendation.”

[5] Mirim Lee and Jun-Dong Cho, “Logmusic: context-based social music
recommendation service on mobile device,” Ubicomp’14 Adjunct, Seattle,
WA, USA, Sep. 13–17, 2014.

[6] D. Gossi and M. H. Gunes, “Lyric-based music recommendation,” in Studies


in computational intelligence. Springer Nature, pp. 301–310, 2016.

[7] Bo Shao, Dingding Wang, Tao Li, and Mitsunori Ogihara, “Music
recommendation based on acoustic features and user access patterns,” IEEE
Transactions on Audio, Speech, and Language Processing, vol. 17, no. 8, Nov.
2009.


[8] Ying-li Tian, T. Kanade, and J. Cohn, “Recognizing lower face action units for facial expression analysis,” in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG’00), Mar. 2000, pp. 484–490.

[9] Gil Levi and Tal Hassner, “Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns.”

[10] E. E. P. Myint and M. Pwint, “An approach for multi-label music mood classification,” in 2010 2nd International Conference on Signal Processing Systems, Dalian, 2010, pp. V1-290–V1-294.

[11] Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel, and
Marcus Liwicki, “DeXpression: Deep Convolutional Neural Network for
Expression Recognition.”

[12] Ujjwalkarn, “An intuitive explanation of Convolutional neural networks,” the


data science blog, 2016. [Online]. Available:
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/.

[13] Ian J. Goodfellow et al., “Challenges in Representation Learning: A report on


three machine learning contests.”

[14] S. Lawrence, C. L. Giles, Ah Chung Tsoi, and A. D. Back, “Face recognition:


a convolutional neural-network approach,” in IEEE Transactions on Neural
Networks, vol. 8, no. 1, pp. 98–113, Jan. 1997.

[15] A. Kołakowska, A. Landowska, M. Szwoch, W. Szwoch, and M. R. Wróbel, “Emotion recognition and its applications,” in Human-Computer Systems Interaction: Backgrounds and Applications 3, Cham: Springer International Publishing, 2014, pp. 51–62.

[16] Brian McFee, Matt McVicar, Colin Raffel, Dawen Liang, Oriol Nieto, Eric
Battenberg, ..., and Adrian Holovaty, (2015). librosa: 0.4.1 [Data set].
Zenodo. http://doi.org/10.5281/zenodo.32193.

[17] The aubio team, “Aubio, a library for audio labelling,” 2003. [Online].
Available: http://aubio.org/.

[18] J. S. Downie, The music information retrieval evaluation exchange (mirex).


D-Lib Magazine, 12(12), 2006.

[19] Cyril Laurier, Perfecto Herrera, M Mandel and D Ellis, “Audio music mood
classification using support vector machine.”

[20] “Unsupervised feature learning and deep learning Tutorial,” [Online].


Available: http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/.

[21] A. S. Bhat, V. S. Amith, N. S. Prasad, and D. M. Mohan, “An efficient


classification algorithm for music mood detection in western and hindi music
using audio feature extraction,” in 2014 Fifth International Conference on
Signal and Image Processing, Jeju Island, 2014, pp. 359–364.
