
Sentiment Analysis Based on Images and Videos

Submitted By

Vihar Sanghavi

17MCEI13

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


INSTITUTE OF TECHNOLOGY
NIRMA UNIVERSITY
AHMEDABAD-382481
May 2019
Sentiment Analysis Based on Images
and Videos

Major Project
Submitted in fulfillment of the requirements
for the degree of
Master of Technology in Computer Science and Engineering (Information
and Network Security)

Submitted By
Vihar Sanghavi
(17MCEI13)

Guided By
Dr. Sharada Valiveti

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


INSTITUTE OF TECHNOLOGY
NIRMA UNIVERSITY
AHMEDABAD-382481
May 2019
Certificate
This is to certify that the major project entitled "Sentiment Analysis Based on Images and Videos" submitted by Vihar Sanghavi (17MCEI13), towards the fulfillment of the requirements for the award of the degree of Master of Technology in Computer Science and Engineering (Information and Network Security) of Nirma University, Ahmedabad, is the record of work carried out by him under my supervision and guidance. In my opinion, the submitted work has reached a level required for being accepted for examination. The results embodied in this major project part-I, to the best of my knowledge, have not been submitted to any other university or institution for the award of any degree or diploma.

Dr. Sharada Valiveti


Guide & Associate Professor
Coordinator M.Tech - CSE (Information and Network Security)
CSE Department, Institute of Technology,
Nirma University, Ahmedabad.

Dr. Madhuri Bhavsar
Professor and Head,
CSE Department, Institute of Technology,
Nirma University, Ahmedabad.

Dr. Alka Mahajan
Director,
Institute of Technology,
Nirma University, Ahmedabad.

Statement of Originality
—————————————————————————————————-
I, Vihar Sanghavi, 17MCEI13, give undertaking that the Major Project entitled "Sentiment Analysis Based on Images and Videos" submitted by me, towards the partial fulfillment of the requirements for the degree of Master of Technology in Computer Science & Engineering (Information and Network Security) of Institute of Technology, Nirma University, Ahmedabad, contains no material that has been awarded for any degree or diploma in any university or school in any territory to the best of my knowledge. It is the original work carried out by me and I give assurance that no attempt of plagiarism has been made. It contains no material that is previously published or written, except where reference has been made. I understand that in the event of any similarity found subsequently with any published work or any dissertation work elsewhere, it will result in severe disciplinary action.

———————–
Signature of Student
Date:
Place:

Endorsed by

Dr. Sharada Valiveti


(Signature of Guide)

Acknowledgements

It gives me immense pleasure in expressing thanks and profound gratitude to Dr. Sharada Valiveti, Associate Professor, Computer Science & Engineering Department, Institute of Technology, Nirma University, Ahmedabad, for her valuable guidance and continual encouragement throughout this work. The appreciation and continual support she has imparted have been a great motivation to me in reaching a higher goal. Her guidance has triggered and nourished my intellectual maturity that I will benefit from for a long time to come.

It gives me immense pleasure to thank Dr. Madhuri Bhavsar, Hon'ble Head of the Computer Science & Engineering Department, Institute of Technology, Nirma University, Ahmedabad, for her kind support and for providing basic infrastructure and a healthy research environment.

A special thank you is expressed wholeheartedly to Dr. Alka Mahajan, Hon'ble Director, Institute of Technology, Nirma University, Ahmedabad, for the motivation she has extended throughout the course of this work.

I would also like to thank the Institution and all faculty members of the Computer Engineering Department, Nirma University, Ahmedabad, for their special attention and suggestions towards the project work.

- Vihar Sanghavi
17MCEI13

Abstract

Sentiment analysis is the process of identifying the polarity of opinions as positive or negative. A large number of images and videos are uploaded online every day. Videos and images contain textual, visual and audio features that complement each other. Here, sentiment analysis of images and videos is performed using data from Twitter accounts. Currently, many people share images and videos without checking content integrity and the impact on others. This framework analyzes the sentiment of images and videos and then classifies them as positive or negative. Based on this classification, we can decide which content should be forwarded and which should not be allowed to be forwarded. We use a CNN and an RNN to extract features from images and videos, and the Twitter API to load images from Twitter.

Abbreviations
CNN Convolutional Neural Network
RNN Recurrent Neural Network
LSTM Long Short Term Memory
S2VT Sequence to Sequence - Video to Text
SVM Support Vector Machine

Contents

Certificate
Statement of Originality
Acknowledgements
Abstract
Abbreviations
List of Figures
List of Tables

1 Introduction
1.1 Problem Statement
1.2 Objective of Study
1.3 Scope of Work
1.4 Application

2 Literature Survey
2.1 General Sentiment Analysis Model
2.2 Stages of Sentiment Analysis
2.3 Visual Sentiment Framework
2.4 Classification of action in video
2.5 Multimodal Sentiment Analysis
2.6 Sequence to Sequence - Video to Text (S2VT)

3 System Analysis
3.1 Tools & Technology
3.2 Flow of Implementation

4 Implementation
4.1 Result of Image Sentiment Analysis
4.2 Result of Video Sentiment Analysis
4.3 Confusion Matrix of Image Sentiment Analysis
4.4 Confusion Matrix of Video Sentiment Analysis

5 Future Work and Conclusion
5.1 Future Work
5.2 Conclusion

References

List of Figures

2.1 General Sentiment Analysis[1]
2.2 Stages of Sentiment Analysis[2]
2.3 Visual Sentiment Framework[3]
2.4 Classification of action in video[4]
2.5 Multimodal Sentiment Analysis[2]
2.6 Sequence to Sequence - Video to Text (S2VT) Framework

3.1 Framework for Image Sentiment Analysis
3.2 Framework for Video Sentiment Analysis

4.1 Correctly Classified Image as Positive(1)
4.2 Correctly Classified Image as Positive(2)
4.3 Correctly Classified Image as Negative(1)
4.4 Correctly Classified Image as Negative(2)
4.5 Correctly Classified Image as Neutral
4.6 Incorrectly Classified Image as Positive
4.7 Incorrectly Classified Image as Negative
4.8 Correctly Classified Video as Positive(1)
4.9 Correctly Classified Video as Positive(2)
4.10 Correctly Classified Video as Positive(3)
4.11 Correctly Classified Video as Negative(1)
4.12 Correctly Classified Video as Negative(2)
4.13 Correctly Classified Video as Negative(3)
4.14 Correctly Classified Video as Neutral(1)
4.15 Correctly Classified Video as Neutral(2)
4.16 Correctly Classified Video as Neutral(3)

List of Tables

4.1 Confusion Matrix for Image Sentiment Analysis
4.2 Confusion Matrix for Video Sentiment Analysis

Chapter 1

Introduction

Sentiment analysis is the process of identifying how perspectives and opinions relate to attitude and emotion. It measures the inclination of people's views through computational linguistics and natural language processing, which are used to retrieve and process subjective information from social media platforms like Twitter and Facebook. The analyzed data quantifies reactions toward certain people, ideas, or products, captures the general public's sentiments, and reveals the contextual polarity of the information.

Sentiment analysis is mainly focused on the automatic recognition of the polarity of opinions as positive or negative. These days, it is supplanting the old electronic surveys and conventional polling strategies conducted by organizations for discovering popular sentiment about entities like products and services, so as to improve the outcome of advertisements and their marketing strategy, while at the same time improving customer service.

Performing sentiment analysis on text is easy compared to sentiment analysis on images and videos, because the latter contain more information than textual data. We need to extract features from images and videos in order to perform sentiment analysis on them. We need to create a model which takes images and videos as input and produces their features as output. Then we perform sentiment analysis on the features rather than on the full images and videos.

1.1 Problem Statement


Social media platforms like Facebook, Twitter and Instagram provide more and more convenient facilities to their users. Today these platforms are the most important sources for people to share their views and information. These platforms contain various types of data media like text, image, audio and video.

Nowadays people share their views using images and videos rather than text. Many people simply forward images and videos without caring about their impact on others or about content integrity, which impacts society negatively. An image or video can convey thousands of words and is forwarded easily, so it has more impact than textual data. Sentiment analysis of this kind of data is therefore required.

A large number of images and videos are uploaded online every day. Videos and images contain textual, visual and audio features that complement each other. Multimodal sentiment analysis refers to the combination of two or more input modalities in order to improve the performance of the analysis. The applications of sentiment analysis are powerful and broad. The ability to extract insights from social data is a practice that is being widely adopted by organizations across the world[1].
1.2 Objective of Study
The objective of this study is to perform sentiment analysis of images and videos using data from Twitter accounts. Currently, many people share images and videos without checking content integrity and the impact on others. This framework analyzes the sentiment of images and videos and then classifies them as positive or negative. Based on this classification, we can decide which content should be forwarded and which should not be allowed to be forwarded.

• Aim to provide an interactive automatic system which predicts the sentiment of images and videos posted by people on Twitter, related to human psychology.

• This framework analyzes the sentiment of images and videos, then classifies them as positive or negative.

• Based on the classification of images and videos, we can decide which content should be forwarded and which should not be allowed to be forwarded.

1.3 Scope of Work


Sentiment analysis can be useful for many different aspects. In this project we focus mainly on sentiment analysis of images and videos which are shared through social media platforms like Twitter, Facebook, Instagram, etc. It is directly useful for checking the integrity of images and videos before sharing them with the world, in order to maintain security.

In this project the behaviour of a group of persons is analyzed. On the basis of their activities, images and videos are classified into three main categories: positive, negative and neutral. As an example, an image which contains people celebrating a festival is considered positive, people arguing with each other is considered negative, and people sitting in a garden is considered neutral.

1.4 Application
Sentiment analysis is a method which allows companies to categorize and analyze the massive amount of unstructured feedback data from social media platforms. It is used in social media monitoring and to track survey responses, customer reviews and competitors[3].

• Image and video sentiment analysis can be used in surveillance systems to observe the behavior of people.

• Sentiment analysis of video is useful to find the characteristics of different people.

• Sentiment analysis of images and videos is helpful to restrict unethical activities on social media platforms like Facebook, Twitter and Instagram.

Chapter 2

Literature Survey

People share images and videos without taking care of their impact on others. Anyone who forwards an image or video should know about the integrity of that image or video, but in reality people do not take care about integrity, and no one is ready to accept responsibility for forwarded content[5].

Many frameworks have been created to perform sentiment analysis of images and videos. Both differ from sentiment analysis of textual data: sentiment analysis of images also involves facial expressions, while sentiment analysis of videos involves facial expressions together with sound. Different frameworks related to sentiment analysis of images and videos are studied below[6].

2.1 General Sentiment Analysis Model


A modern and recent development of multimodal sentiment analysis is visual sentiment analysis. Users of social media platforms frequently share their text messages along with videos, and those kinds of visual multimedia are an additional source of information in expressing users' sentiment[1].

Figure 2.1: General Sentiment Analysis[1]

Figure 2.1 shows a generic sentiment analysis model. It shows that data from various social platforms like Facebook and Twitter contains mainly three types of data: textual, speech and visual. Visual data consists of images and videos, which have textual and sound features along with facial expressions.

2.2 Stages of Sentiment Analysis


The design of the tweet sentiment analysis method is illustrated in figure 2.2. Text cleaning and text preprocessing make up the preprocessing stage. In the training stage, the model is trained on the tweets. The final stage is the sentiment test using the test dataset[2].

Figure 2.2 simply represents the flow of any sentiment analysis process using Twitter data. First we need to download data from Twitter and then clean the data, because it also contains unnecessary things. Using a deep learning method we can train our neural network with the help of the training dataset, and using the testing dataset we test the model[2].
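
The sketch below illustrates this download-and-clean stage, assuming the Tweepy library with v1.1 API credentials; the account handle and keys are hypothetical placeholders, not values from this project.

    # Download tweets from an account and strip noise from their text.
    import re
    import tweepy

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    def clean_text(text):
        """Remove URLs, @mentions and extra whitespace from a tweet."""
        text = re.sub(r"http\S+|@\w+", "", text)
        return re.sub(r"\s+", " ", text).strip()

    for tweet in api.user_timeline(screen_name="some_account", count=50):
        print(clean_text(tweet.text))
        # Media entities, when present, carry the image URLs to analyze.
        for media in tweet.entities.get("media", []):
            print(media["media_url_https"])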

Figure 2.2: Stages of Sentiment Analysis[2]

2.3 Visual Sentiment Framework


In figure 2.3, an image of a dog is analyzed using a convolutional neural network, often known as a CNN. CNNs mostly contain a large number of parameters that need to be learned, and also often require large datasets when training from scratch. In visual sentiment prediction tasks, though, the size of the datasets is usually constrained due to the difficulty and expense of acquiring labels that depend so much on subjective reasoning[3].

Figure 2.3: Visual Sentiment Framework[3]

A typical way to deal with this issue of small dataset size is to use transfer learning, reusing knowledge from a pre-trained network, trained on a large amount of data, to discriminate in the smaller dataset. It is not always readily evident how much each layer adds to the ultimate performance of a network. In our system we used a CNN to analyze features of images and videos[3].
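
As an illustration of this transfer-learning idea, the sketch below freezes a pre-trained backbone and retrains only a small classification head. It assumes PyTorch with a three-class sentiment output; it is a sketch of the technique, not the exact network used in this project.

    # Freeze a pre-trained CNN and swap in a trainable 3-class head.
    import torch.nn as nn
    from torchvision import models

    model = models.resnet152(pretrained=True)

    # Keep the pre-trained convolutional features fixed...
    for param in model.parameters():
        param.requires_grad = False

    # ...and learn only the new positive/negative/neutral classifier.
    model.fc = nn.Linear(model.fc.in_features, 3)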

2.4 Classification of action in video

Figure 2.4: Classification of action in video[4]

For video action classification, the "symmelet" is used. It is basically a symmetrical pair consisting of a SURF point and its corresponding symmetrical point in the same frame. With symmelets, the symmetry property can be effectively recovered and used to filter out background objects if they do not include any symmelets. Using this technique, objects with more significance can be selected and fed into two-stream ConvNets to learn significant features for video classification[4].

In figure 2.4, all the selected significant features and hand-crafted features are then encoded by a Fisher vector and fed together into an SVM to classify action videos into different categories. SURF is a scale-invariant feature detector and descriptor for feature matching among different images and has been widely used in various applications, for example object recognition[4].
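
A minimal sketch of SURF keypoint extraction with OpenCV is shown below, assuming the opencv-contrib-python build (SURF lives in the xfeatures2d contrib module); the file name is a placeholder.

    # Detect SURF keypoints and descriptors in one video frame.
    import cv2

    img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = surf.detectAndCompute(img, None)
    print(len(keypoints), "SURF keypoints detected")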

2.5 Multimodal Sentiment Analysis

Figure 2.5: Multimodal Sentiment Analysis[2]

Visual language is a type of non-verbal communication in which physical behavior, as opposed to words, is used to express or convey information. Such behavior includes facial expressions, body posture, gestures, eye movement, touch and the use of space. Processing sentiment analysis using computer vision is a relatively recent area of research. The main research tasks in visual sentiment analysis focus on detecting, modeling and obtaining information about sentiment expressed through facial expressions, physical gestures and anything else that can be observed in visual multimedia[3].

Ekman and Keltner (1970) are pioneers in this field of research; they carried out extensive studies on facial emotions. They argued that it is possible to detect basic emotions such as anger, joy, sadness, disgust and surprise from cues of facial expressions. Figure 2.5 summarizes various studies on the use of visual features for multimodal affective analysis[7].

2.6 Sequence to Sequence - Video to Text (S2VT)

Figure 2.6: Sequence to Sequence - Video to Text (S2VT) Framework

S2VT uses a sequence-to-sequence approach that maps an input sequence of video frame features into a fixed-dimensional vector and then converts it into a sequence of words. It consists of a stack of two LSTM layers. The input to the first LSTM layer is a sequence of frame features obtained from a Convolutional Neural Network (CNN). The first LSTM layer encodes the video sequence. After viewing all the frames, the second LSTM layer learns to decode this state into a sequence of words. This can be viewed as using one LSTM layer to model the visual features, and a second LSTM layer to convert the visual representation into language[8].
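
The sketch below captures this two-layer encode-then-decode idea in PyTorch; the dimensions, greedy decoding loop and random stand-in frame features are illustrative assumptions, not the paper's exact configuration.

    # One LSTM encodes CNN frame features; a second decodes word logits.
    import torch
    import torch.nn as nn

    feat_dim, hidden, vocab = 2048, 512, 10000   # assumed sizes

    encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
    decoder = nn.LSTM(hidden, hidden, batch_first=True)
    to_vocab = nn.Linear(hidden, vocab)

    frames = torch.randn(1, 30, feat_dim)        # 30 frames of features
    _, (h, c) = encoder(frames)                  # encode the whole clip

    # Greedy decode: roll the decoder forward one step at a time.
    step = torch.zeros(1, 1, hidden)
    words = []
    for _ in range(10):
        out, (h, c) = decoder(step, (h, c))
        words.append(to_vocab(out).argmax(dim=-1).item())
        step = out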

Long short-term memory (LSTM) is one type of artificial recurrent neural network used in the field of deep learning. Unlike standard feedforward neural networks, the LSTM has feedback connections. It can process not only single data points such as images, but also entire sequences of data like video and speech. So, to generate a description from video, LSTM is used in place of a plain RNN.

Chapter 3

System Analysis

3.1 Tools & Technology


For Image Sentiment Analysis

• Hardware: 2.4 GHz Processor, 16 GB RAM

• Dataset: ILSVRC-2012-OLS

• Model: RESNET 152

• Technology: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN)

• No. of Epochs: 10

• Batch size: 128

• Learning Rate: 0.001

• Language: Python 3.6.4
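
The sketch below wires these image-model hyperparameters into a standard training loop, assuming PyTorch; the stand-in data and linear model are placeholders for the real ILSVRC images and the ResNet-152 backbone, and the Adam optimizer is an assumption.

    # Train for 10 epochs at batch size 128 and learning rate 0.001.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    EPOCHS, BATCH_SIZE, LR = 10, 128, 0.001      # values listed above

    data = TensorDataset(torch.randn(256, 64), torch.randint(0, 3, (256,)))
    model = nn.Linear(64, 3)                     # stand-in 3-class model

    loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=LR)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(EPOCHS):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()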

For Video Sentiment Analysis

• Hardware: 2.4 GHz Processor, 16 GB RAM

• Dataset: Movie Video Description(MVD)

• Technology: Long short-term memory (LSTM)

• No. of Epochs: 40

• Batch size: 25

• Learning Rate: 0.001

• Language: Python 3.6.4

3.2 Flow of Implementation

Figure 3.1: Framework for Image Sentiment Analysis

Figure 3.1 shows the framework for image sentiment analysis. First, the image is given to the model as input. The model is made using a CNN and an RNN. The CNN is used to generate tokens from images. As an example, if there is an image of people playing with colors, then the CNN result would be tokens such as people, play, color.

The RNN is used to make a meaningful sentence from these tokens. For the above example, the output of the RNN would be a sentence like 'people are playing with colors'. This feature is given to sentiment analysis. Based on the sentiment analysis result, it classifies images into two categories: positive and negative.

For the sentiment analysis step, the TextBlob library of Python is used. TextBlob takes a sentence as input and returns the polarity value of the given statement. If the polarity value is greater than 0 the sentiment is positive, if it is less than 0 the sentiment is negative, and the sentiment is neutral for 0.
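
This polarity rule is small enough to show directly; the sketch below uses TextBlob's documented sentiment.polarity score with the thresholds just described (the example sentence is illustrative).

    # Map a generated sentence to positive / negative / neutral.
    from textblob import TextBlob

    def classify(sentence):
        polarity = TextBlob(sentence).sentiment.polarity
        if polarity > 0:
            return "positive"
        if polarity < 0:
            return "negative"
        return "neutral"

    print(classify("the player made a great catch"))  # 'great' scores positive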

Figure 3.2: Framework for Video Sentiment Analysis

Figure 3.2 shows the framework for video sentiment analysis. First, the video is given to the model as input. The model is made using LSTM. Long short-term memory (LSTM) is one type of artificial recurrent neural network used in the field of deep learning. Unlike standard feedforward neural networks, the LSTM has feedback connections.

It can process not only single data points such as images, but also entire sequences of data like video and speech. So, to generate a description from video, LSTM is used in place of a plain RNN. Using the LSTM, a feature is generated from the video. As an example, if a video contains a train which is moving from one end to another, then the model gives 'train is moving' as output.

This output is the input for sentiment analysis, which is the same as the sentiment analysis done for images: TextBlob takes a sentence as input and returns the polarity value of the given statement. If the polarity value is greater than 0 the sentiment is positive, if it is less than 0 the sentiment is negative, and the sentiment is neutral for 0.

Chapter 4

Implementation

4.1 Result of Image Sentiment Analysis

Figure 4.1: Correctly Classified Image as Positive(1)

Figure 4.2: Correctly Classified Image as Positive(2)

Figures 4.1 and 4.2 show images which actually have positive sentiment and were classified as positive. In figure 4.1 people are playing with colors, and in figure 4.2 people are celebrating a birthday. The images were given as input to the model, and the model generated a sentence describing the activity happening in each image. For the first image it generated 'people playing with colors'.

Figure 4.3: Correctly Classified Image as Negative(1)

Figure 4.3 shows an image which actually has negative sentiment and was classified as negative. In figure 4.3 people are fighting with each other, so the sentiment of the image is negative.

Figure 4.4: Correctly Classified Image as Negative(2)

Figure 4.5: Correctly Classified Image as Neutral

Figure 4.6: Incorrectly Classified Image as Positive

Figure 4.5 shows an image which actually has neutral sentiment and was classified as neutral. In figure 4.5 a person is reading a book, so there is no sentiment in the image and it is labelled as neutral. Figure 4.6 shows a case in which the model fails to identify the correct sentiment of an image.

Figure 4.7: Incorrectly Classified Image as Negative

Figures 4.6 and 4.7 show images which actually have negative and positive sentiment respectively, but were classified as positive and negative. In figure 4.6 the people are in a sad mood, and in figure 4.7 the people are happy and enjoying a function.

4.2 Result of Video Sentiment Analysis

Figure 4.8: Correctly Classified Video as Positive(1)

Figure 4.8 is a snippet of a full video which actually has positive sentiment and was classified as positive. In the full video a boy is dancing; the model output is 'young man is dancing', and using that feature the video was classified as positive.

Figure 4.9: Correctly Classified Video as Positive(2)

Figure 4.10: Correctly Classified Video as Positive(3)

Figures 4.9 and 4.10 are snippets of videos which actually have positive sentiment and were classified as positive. In figure 4.8 a boy is dancing, in figure 4.9 the player makes a great catch, and in figure 4.10 a boy is talking.

Figure 4.11: Correctly Classified Video as Negative(1)

Figure 4.11 is a snippet of a video which actually has negative sentiment and was classified as negative. In the actual video a monkey is doing martial arts, but the model generates 'monkey is fighting with man', so it produces a negative sentiment.

Figure 4.12: Correctly Classified Video as Negative(2)

Figure 4.13: Correctly Classified Video as Negative(3)

Figures 4.12 and 4.13 are snippets of videos which actually have negative sentiment and were classified as negative. In figure 4.11 a monkey is doing martial arts, in figure 4.12 a monkey is teasing a dog, and in figure 4.13 a dog drags a girl.

Figure 4.14: Correctly Classified Video as Neutral(1)

Figure 4.14 is a snippet of a video in which a train is moving from one end to another. There is no sentiment in the video, so it was classified as neutral.

Figure 4.15: Correctly Classified Video as Neutral(2)

Figures 4.15 and 4.16 are snippets of videos which actually have neutral sentiment and were classified as neutral. In figure 4.15 a man is doing push-ups, and in figure 4.16 a woman is singing.

Figure 4.16: Correctly Classified Video as Neutral(3)

The sentiment of 'a woman is singing' depends on the sound of the video, but sound is currently not considered while performing sentiment analysis, so the video is given a neutral sentiment.

4.3 Confusion Matrix of Image Sentiment Analysis

             Positive   Negative   Neutral
Positive        78         15         7
Negative        10         75        15
Neutral         14         13        73

Table 4.1: Confusion Matrix for Image Sentiment Analysis

Accuracy = 75.33%

4.4 Confusion Matrix of Video Sentiment Analysis

             Positive   Negative   Neutral
Positive        38          4         8
Negative        10         34         6
Neutral          6          9        35

Table 4.2: Confusion Matrix for Video Sentiment Analysis

Accuracy = 71.33%
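
Both accuracy figures follow directly from the tables: correct predictions lie on the diagonal, so accuracy is the matrix trace over its total. A quick numpy check:

    # Verify the reported accuracies from tables 4.1 and 4.2.
    import numpy as np

    image_cm = np.array([[78, 15, 7], [10, 75, 15], [14, 13, 73]])
    video_cm = np.array([[38, 4, 8], [10, 34, 6], [6, 9, 35]])

    for name, cm in [("image", image_cm), ("video", video_cm)]:
        acc = np.trace(cm) / cm.sum()
        print(name, "accuracy:", round(100 * acc, 2), "%")  # 75.33, 71.33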

Chapter 5

Future Work and Conclusion

5.1 Future Work


Currently, the sound of a video is not considered while performing sentiment analysis on video, so that functionality needs to be added. Our model fails on low-quality images, so such images need to be handled. Video sentiment analysis should also be extended to other video formats, as it currently works only for the .avi format.

5.2 Conclusion
Sentiment analysis on images and videos is more complicated than text analysis, and accuracy is also a concern. A CNN and an RNN are used to identify features from images; after retrieval of the features of an image, its sentiment is determined using the TextBlob library of Python. As a video contains multiple image frames which are dependent on each other, LSTM is used instead of a simple RNN to generate features from video.

References

[1] Intisar O. Hussien and Yahia Hasan Jazyah. Multimodal sentiment analysis: A comparison study. Journal of Computer Science, 2018.

[2] Adyan Marendra Ramadhani and Hong Soon Goo. Twitter sentiment analysis using deep learning methods. 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, 2017.

[3] Víctor Campos, Brendan Jou, and Xavier Giró-i-Nieto. From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction. Image and Vision Computing, 2017.

[4] Salah Alghyaline, Jun-Wei Hsieh, and Chi-Hung Chuang. Video action classification using symmelets and deep learning. IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017.

[5] Stuti Jindal and Sanjay Singh. Image sentiment analysis using deep convolutional neural networks with domain specific fine tuning. International Conference on Information Processing (ICIP), 2015.

[6] Eva Cetinic, Tomislav Lipic, and Sonja Grgic. Fine-tuning convolutional neural networks for fine art classification. Expert Systems With Applications, 2018.

[7] Quanzeng You, Jiebo Luo, Hailin Jin, and Jianchao Yang. Robust image sentiment analysis using progressively trained and domain transferred deep networks. Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[8] Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, and Kate Saenko. Improving LSTM-based video description with linguistic knowledge mined from text. Computer Vision and Pattern Recognition, 2016.
