Sentiment Analysis Based on Images and Videos
Major Project
Submitted in fulfillment of the requirements
for the degree of
Master of Technology in Computer Science and Engineering (Information
and Network Security)
Submitted By
Vihar Sanghavi
( 17MCEI13 )
Guided By
Dr. Sharada Valiveti
Statement of Originality
I, Vihar Sanghavi, 17MCEI13, give undertaking that the Major Project entitled "Sentiment Analysis Based on Images and Videos" submitted by me, towards the partial fulfillment of the requirements for the degree of Master of Technology in Computer Science & Engineering (Information and Network Security) of Institute of Technology, Nirma University, Ahmedabad, contains no material that has been awarded for any degree or diploma in any university or school in any territory to the best of my knowledge. It is the original work carried out by me and I give assurance that no attempt of plagiarism has been made. It contains no material that is previously published or written, except where reference has been made. I understand that in the event of any similarity found subsequently with any published work or any dissertation work elsewhere, it will result in severe disciplinary action.
———————–
Signature of Student
Date:
Place:
Endorsed by
Acknowledgements
I would also like to thank the institution and all faculty members of the Computer Engineering Department, Nirma University, Ahmedabad, for their special attention and suggestions towards the project work.
- Vihar Sanghavi
17MCEI13
Abstract
Abbreviations
CNN Convolutional Neural Network
RNN Recurrent Neural Network
LSTM Long Short Term Memory
S2VT Sequence to Sequence - Video to Text
SVM Support Vector Machine
Contents

Certificate
Statement of Originality
Acknowledgements
Abstract
Abbreviations
List of Figures

1 Introduction
  1.1 Problem Statement
  1.2 Objective of Study
  1.3 Scope of Work
  1.4 Application

2 Literature Survey
  2.1 General Sentiment Analysis Model
  2.2 Stages of Sentiment Analysis
  2.3 Visual Sentiment Framework
  2.4 Classification of Action in Video
  2.5 Multimodal Sentiment Analysis
  2.6 Sequence to Sequence - Video to Text (S2VT)

3 System Analysis
  3.1 Tools & Technology
  3.2 Flow of Implementation

4 Implementation
  4.1 Result of Image Sentiment Analysis
  4.2 Result of Video Sentiment Analysis
  4.3 Confusion Matrix of Image Sentiment Analysis
  4.4 Confusion Matrix of Video Sentiment Analysis

References
List of Figures

4.12 Correctly Classified Video as Negative (2)
4.13 Correctly Classified Video as Negative (3)
4.14 Correctly Classified Video as Neutral (1)
4.15 Correctly Classified Video as Neutral (2)
4.16 Correctly Classified Video as Neutral (3)
List of Tables
Chapter 1
Introduction
data. We need to extract features from images and videos to perform sentiment analysis on them. We need to create a model that takes images and videos as input and produces their features as output. Sentiment analysis is then performed on these features rather than on the full images and videos.
Nowadays, people share their views using images and videos rather than text. Many people simply forward images and videos without considering their impact on others or the integrity of the content, which affects society negatively. Images and videos convey thousands of words and are forwarded easily, so their impact is greater than that of textual data. Sentiment analysis of this kind of data is therefore required.
A large number of images and videos are uploaded online every day. Videos and images contain textual, visual, and audio features that complement each other. Multimodal sentiment analysis refers to the combination of two or more input modalities in order to improve the performance of the analysis. The applications of sentiment analysis are powerful and broad; the ability to extract insights from social data is a practice being widely adopted by organizations across the world [1].
1.2 Objective of Study
The objective of this study is to perform sentiment analysis of images and videos using data from Twitter accounts. Currently, many people share images and videos without checking content integrity or the impact on others. This framework analyzes the sentiment of images and videos and classifies them as positive or negative. Based on this classification, we can decide which content should be forwarded and which should not be allowed to be forwarded.
1.3 Scope of Work
to the world to maintain security.
1.4 Application
Sentiment analysis is a method that allows companies to categorize and analyze the massive amount of unstructured feedback data from social media platforms. It is used in social media monitoring and to track survey responses, customer reviews, and competitors [3].
Chapter 2
Literature Survey
People share images and videos without taking care of their impact on others. Anyone who forwards an image or video should know about its integrity, but in reality people do not take care about integrity, and no one is ready to accept responsibility for forwarded content [5].
2.1 General Sentiment Analysis Model
Figure 2.1: General Sentiment Analysis[1]
2.2 Stages of Sentiment Analysis
Figure 2.2: Stages of Sentiment Analysis[2]
2.3 Visual Sentiment Framework
Figure 2.3: Visual Sentiment Framework[3]
2.4 Classification of Action in Video
In figure 2.4, the selected deep features and hand-crafted features are encoded with a Fisher vector and merged, and an SVM is used to classify the action videos into different categories. SURF is a scale-invariant feature detector and descriptor for feature matching between images and has been widely used in applications such as object recognition [4].
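As a rough illustration of the detector mentioned above, the sketch below runs SURF keypoint detection with OpenCV. Note that SURF ships only in the contrib build (opencv-contrib-python) and is patented, so its availability is an assumption about the environment; this is not the pipeline of [4], only the detector step.

```python
# Illustrative only: SURF keypoint detection via OpenCV's contrib module.
# Requires opencv-contrib-python with nonfree support; "frame.jpg" is a
# placeholder input image, not a file from this project.
import cv2

img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(img, None)
print(f"{len(keypoints)} keypoints, descriptor shape: {descriptors.shape}")
```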
2.5 Multimodal Sentiment Analysis
obtaining information about sentiment expressed through facial expressions, physical gestures, and any other sentiment that can be observed in visual multimedia [3]. Ekman and Keltner (1970) are pioneers in this field of research; they carried out extensive studies on facial emotions. They argued that it is possible to detect basic emotions such as anger, joy, sadness, disgust, and surprise from facial expression cues. In figure 2.5, we present various studies on the use of visual features for multimodal affective analysis [7].
2.6 Sequence to Sequence - Video to Text (S2VT)
Convolutional Neural Network (CNN). The first LSTM layer encodes the video sequence. After viewing all the frames, the second LSTM layer learns to decode this state into a sequence of words. This can be viewed as using one LSTM layer to model the visual features and a second LSTM layer to convert the visual representation into language [8].
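A minimal sketch of this two-layer idea is given below, assuming pre-extracted per-frame CNN features and illustrative dimensions. It is an encoder-decoder approximation of the S2VT idea, not the exact architecture of [8].

```python
# Simplified encoder-decoder sketch of S2VT: one LSTM encodes the per-frame
# CNN features, a second LSTM decodes that state into words.
# All sizes (FRAMES, FEAT_DIM, VOCAB_SIZE, MAX_WORDS) are assumptions.
from tensorflow.keras import layers, Model

FRAMES, FEAT_DIM, VOCAB_SIZE, MAX_WORDS = 80, 4096, 5000, 20

# Encoder: consume the video's frame features and keep the final LSTM state.
frame_feats = layers.Input(shape=(FRAMES, FEAT_DIM))
_, h, c = layers.LSTM(512, return_state=True)(frame_feats)

# Decoder: generate the caption word by word, initialized with the video state.
prev_words = layers.Input(shape=(MAX_WORDS,))
embedded = layers.Embedding(VOCAB_SIZE, 512, mask_zero=True)(prev_words)
decoded = layers.LSTM(512, return_sequences=True)(embedded, initial_state=[h, c])
word_probs = layers.TimeDistributed(
    layers.Dense(VOCAB_SIZE, activation="softmax"))(decoded)

model = Model([frame_feats, prev_words], word_probs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```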
Chapter 3
System Analysis

3.1 Tools & Technology
• Dataset: ILSVRC-2012-OLS
• No. of Epochs: 10
• Hardware: 2.4 GHz Processor, 16 GB RAM
• No. of Epochs: 40
• Batch size: 25
3.2 Flow of Implementation

Figure 3.1 shows the framework for image sentiment analysis. First, an image is given to the model as input. The model is built using a CNN and an RNN. The CNN is used to generate tokens from the image; for example, for an image of people playing with colors, the CNN output would be tokens such as "people", "play", and "color". The RNN is then used to form a meaningful sentence from these tokens; for the above example, the output of the RNN would be "people are playing with colors". This sentence is given to the sentiment analyzer, which classifies the image into one of two categories: positive or negative.
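The sketch below illustrates one common way to wire such a CNN-plus-RNN captioner in Keras, assuming CNN features are extracted beforehand. The layer sizes and vocabulary are illustrative assumptions, not the exact model used in this project.

```python
# Minimal "merge" style image-captioning sketch: a pre-extracted CNN feature
# vector and the words generated so far are combined to predict the next word.
# VOCAB_SIZE, MAX_LEN, and FEAT_DIM are assumed values.
from tensorflow.keras import layers, Model

VOCAB_SIZE, MAX_LEN, FEAT_DIM = 5000, 20, 2048

# Image branch: project the CNN feature vector into the decoder's space.
img_feats = layers.Input(shape=(FEAT_DIM,))
img_embed = layers.Dense(256, activation="relu")(img_feats)

# Language branch: embed the partial caption and summarize it with an LSTM.
caption_in = layers.Input(shape=(MAX_LEN,))
word_embed = layers.Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_in)
lang_state = layers.LSTM(256)(word_embed)

# Merge both branches and predict a distribution over the next word.
merged = layers.add([img_embed, lang_state])
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model([img_feats, caption_in], next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```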
For sentiment analysis, the TextBlob library of Python is used. TextBlob takes a sentence as input and returns the polarity value of the given statement. If the polarity value is greater than 0, the sentiment is positive; if it is less than 0, the sentiment is negative; and if it is exactly 0, the sentiment is neutral.
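As a concrete sketch of this thresholding, the snippet below maps a TextBlob polarity score to the three labels described here.

```python
# Polarity thresholding with TextBlob, exactly as described above:
# > 0 -> positive, < 0 -> negative, 0 -> neutral.
from textblob import TextBlob

def classify_sentence(sentence: str) -> str:
    polarity = TextBlob(sentence).sentiment.polarity  # value in [-1.0, 1.0]
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

print(classify_sentence("people are playing with colors"))
```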
Figure 3.2 shows the framework for video sentiment analysis. First, a video is given to the model as input. The model is built using LSTM. Long short-term memory (LSTM) is a type of artificial recurrent neural network used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points such as images, but also entire sequences of data such as video and speech. Therefore, to generate a description from a video, LSTM is used in place of a simple RNN. Using LSTM, a feature sentence is generated from the video; for example, if a video shows a train moving from one end to the other, the model gives "train is moving" as output.
This output serves as the input for sentiment analysis, which is performed in the same way as for image sentiment analysis: TextBlob takes the sentence as input and returns the polarity value of the given statement. If the polarity value is greater than 0, the sentiment is positive; if it is less than 0, the sentiment is negative; and if it is exactly 0, the sentiment is neutral.
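Putting the pieces together, a hedged sketch of the full video path is shown below; generate_caption is a hypothetical stand-in for the LSTM captioning model described above, not a real API.

```python
# End-to-end sketch of the video path described above. `generate_caption` is
# a hypothetical placeholder for the LSTM video-captioning model.
from textblob import TextBlob

def video_sentiment(frames) -> str:
    caption = generate_caption(frames)  # hypothetical, e.g. "train is moving"
    polarity = TextBlob(caption).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"
```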
Chapter 4
Implementation
4.1 Result of Image Sentiment Analysis
Figure 4.2: Correctly Classified Image as Positive(2)
Figures 4.1 and 4.2 show images which actually have positive sentiment and were classified as positive. In figure 4.1 people are playing with colors, and in figure 4.2 people are celebrating a birthday. The images were given as input to the model, and the model generated sentences describing the activity happening in each image. For the first image it generated "people playing with colors".
Figure 4.3: Correctly Classified Image as Negative(1)
Figure 4.3 shows an image which actually has negative sentiment and was classified as negative. In figure 4.3 people are fighting with each other, so the sentiment of the image is negative.
Figure 4.4: Correctly Classified Image as Negative(2)
Figure 4.6: Incorrectly Classified Image as Positive
Figure 4.5 shows an image which actually has neutral sentiment and was classified as neutral. In figure 4.5 a person is reading a book; since the image expresses no sentiment, it is classified as neutral. Figure 4.6 shows a case in which the model fails to identify the correct sentiment of the image.
Figure 4.7: Incorrectly Classified Image as Negative
Figures 4.6 and 4.7 show images which actually have negative and positive sentiment respectively, but were classified as positive and negative. In figure 4.6 people are in a sad mood, and in figure 4.7 people are happy and enjoying a function.
4.2 Result of Video Sentiment Analysis
Figure 4.8 is a snippet of a full video which actually has positive sentiment and was classified as positive. In the full video a boy is dancing; the model output is "young man is dancing", and using that feature the video was classified as having positive sentiment.
Figure 4.9: Correctly Classified Video as Positive(2)
Figures 4.9 and 4.10 are snippets of videos which actually have positive sentiment and were classified as positive. In figure 4.8 a boy is dancing, in figure 4.9 the player made a great catch, and in figure 4.10 a boy is talking.
Figure 4.12: Correctly Classified Video as Negative(2)
Figures 4.12 and 4.13 are snippets of videos which actually have negative sentiment and were classified as negative. In figure 4.11 a monkey is doing martial arts, in figure 4.12 a monkey is teasing a dog, and in figure 4.13 a dog drags a girl.
Figure 4.15: Correctly Classified Video as Neutral(2)
Figures 4.15 and 4.16 are snippets of videos which actually have neutral sentiment and were classified as neutral. In figure 4.15 a man is doing push-ups, and in figure 4.16 a woman is singing.
Figure 4.16: Correctly Classified Video as Neutral(3)
4.3 Confusion Matrix of Image Sentiment Analysis

Accuracy = 75.33%
4.4 Confusion Matrix of Video Sentiment Analysis
Accuracy = 71.33%
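For reference, the sketch below shows how such an accuracy figure falls out of a confusion matrix with scikit-learn. The label arrays are illustrative toy data, not the experiment's actual predictions.

```python
# Illustrative only: accuracy is the trace of the confusion matrix divided by
# its sum. The toy labels below are not the report's actual data.
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["positive", "negative", "neutral"]
y_true = ["positive", "negative", "neutral", "positive", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows: true class, columns: predicted class
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")  # cm.trace() / cm.sum()
```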
Chapter 5
5.2 Conclusion
As sentiment analysis on images and videos is more complicated than text analysis, and accuracy is also a concern, a CNN and an RNN are used to identify features from images. After the features of an image are retrieved, sentiment analysis is performed on them using the TextBlob library of Python. As a video contains multiple image frames that are dependent on each other, LSTM is used instead of a simple RNN to generate features from videos.
References

[1] Intisar O. Hussien and Yahia Hasan Jazyah. Multimodal sentiment analysis: A comparison study. Journal of Computer Science, 2018.

[2] Adyan Marendra Ramadhani and Hong Soon Goo. Twitter sentiment analysis using deep learning methods. 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, 2017.

[3] Víctor Campos, Brendan Jou, and Xavier Giró-i-Nieto. From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction. Image and Vision Computing, 2017.

[4] Salah Alghyaline, Jun-Wei Hsieh, and Chi-Hung Chuang. Video action classification using symmelets and deep learning. IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017.

[5] Stuti Jindal and Sanjay Singh. Image sentiment analysis using deep convolutional neural networks with domain specific fine tuning. International Conference on Information Processing (ICIP), 2015.

[6] Eva Cetinic, Tomislav Lipic, and Sonja Grgic. Fine-tuning convolutional neural networks for fine art classification. Expert Systems With Applications, 2018.

[7] Quanzeng You, Jiebo Luo, Hailin Jin, and Jianchao Yang. Robust image sentiment analysis using progressively trained and domain transferred deep networks. Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.