Image Sentiment Analysis Using Deep Learning

This document summarizes research on using deep learning techniques for image sentiment analysis. It discusses how deep learning models like deep neural networks (DNNs) and convolutional neural networks (CNNs) have been applied to the task and have achieved significant performance. The paper reviews past work using techniques like DNNs, CNNs, region-based CNNs and Fast R-CNNs. It analyzes the performance and limitations of these approaches and discusses challenges and opportunities in the emerging field of image sentiment analysis.
2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)

Image Sentiment Analysis using Deep Learning


Namita Mittal, Department of Computer Science, MNIT, Jaipur, India
Divya Sharma, Department of Computer Science, The IIS University, Jaipur, India
Manju Lata Joshi, Department of Computer Science, International School of Informatics & Management, Jaipur, India

978-1-5386-7325-6/18/$31.00 ©2018 IEEE
DOI 10.1109/WI.2018.00-11
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on September 04,2023 at 18:20:06 UTC from IEEE Xplore. Restrictions apply.

Abstract—Sentiments are feelings, emotions, likes and dislikes, or opinions, which can be articulated through text, images, or videos. Sentiment analysis of web data is becoming a budding research area of social analytics. Users express their sentiments on the web by exchanging texts and uploading images through a variety of social media such as Instagram, Facebook, Twitter, and WhatsApp. A lot of research has been done on sentiment analysis of textual data, but only limited work focuses on analyzing the sentiment of image data. Image sentiment concepts are Adjective Noun Pairs (ANPs), automatically discovered tags of web images that are useful for detecting the emotions or sentiments conveyed by an image. The major challenge is to predict or identify the sentiments of unlabelled images. To overcome this challenge, deep learning techniques are used for sentiment analysis, as deep learning models are capable of effectively learning the image behavior or polarity.
Image recognition, image prediction, image sentiment analysis, and image classification are some of the fields where Neural Networks (NN) have performed well, implying significant performance of deep learning in image sentiment analysis. This paper focuses on some of the noteworthy deep learning models, namely Deep Neural Network (DNN), Convolutional Neural Network (CNN), Region-based CNN (R-CNN), and Fast R-CNN, along with the suitability of their applications in image sentiment analysis and their limitations. The study also discusses the challenges and perspectives of this rising field.

Keywords—image sentiment analysis, deep learning, deep neural network, convolutional neural network, region-based CNN, Fast R-CNN.

I. INTRODUCTION

Sentiments can be expressed using text, images, or videos [1]. A plethora of research papers is available on text sentiment analysis, but image sentiment analysis is still not much explored. With the increasing use of social media to express sentiments, it has become an important area of research, and in the last few years plenty of research has been done to achieve optimum results.
Multiple techniques and algorithms have been proposed for image sentiment analysis. They are broadly classified into two groups: machine learning based algorithms and lexicon based algorithms. Machine learning based algorithms include Support Vector Machine (SVM), Neural Network, Naïve Bayes, Bayesian Network, and Maximum Entropy, while lexicon based algorithms include statistical and semantic-based techniques.
Deep learning is a subfield of machine learning that makes the computer intelligent enough to learn from experience and to perceive the world in terms of concepts. The computer gathers knowledge from real-world experience, without a human having to specify everything it needs to understand a situation or make decisions [2].
Deep learning techniques have also been explored for image sentiment analysis and are providing significant results. A deep learning model can match, and sometimes exceed, human-level accuracy and can perform tasks efficiently. The term "deep" in deep learning refers to the number of hidden layers in the neural network. Deep learning models are trained using a large set of labelled data and a neural network architecture that learns features or parameters directly from the given data, without human intervention or manual feature extraction. Deep learning plays a major role in image sentiment analysis by providing techniques such as Convolutional Neural Network (CNN), Recursive Neural Network (RNN), Deep Neural Network (DNN), and Deep Belief Network (DBN). Deep learning can be seen as a framework that produces accurate parameter learning for image classification. This paper studies and analyzes different deep learning techniques, namely DNN, CNN, R-CNN, and Fast R-CNN. Section II discusses the research work done so far on image sentiment analysis using these techniques, along with their outcomes. Section III analyzes the performance and limitations of the techniques discussed in Section II. Section IV concludes the paper.

II. LITERATURE REVIEW

Though researchers have explored a number of techniques for image sentiment analysis, machine learning based techniques perform significantly well. Among the various machine learning based techniques, deep learning based techniques perform best for image sentiment analysis. This section analyzes some important research works that use deep learning techniques, along with their outcomes.

A. Deep Neural Network (DNN)

DNN is used for visual as well as textual sentiment analysis. A neural network uses multiple layers: the image is first fed to the input layer and is then processed to give an output through the output layer. Between the input and output layers, multiple hidden layers further process the input image; because of these multiple hidden layers, the network is known as a Deep Neural Network. A computer understands an image only in matrix form, with each pixel holding some value; the values representing the pixels are referred to as activations. Each neuron is connected to other neurons, and the activations of one layer determine the activations of the next layer. The goal is to bind the image pixels into edges, edges into sub-patterns, and finally
combine the identified patterns into the image for analyzing sentiments.
The study [5] proposed an approach to visual sentiment analysis based on deep coupled adjective and noun neural networks. The main aim of the work is to reduce the amount of inter-class variance. The process first learns a middle-level sentiment representation with a deep neural network, then performs prediction with optimization, and finally trains the system with mutual supervision between the learned adjective and noun networks using the Rectified Kullback-Leibler loss (ReKL). ReKL is used to eliminate unreasonable outcomes and to train an efficient sentiment representation. The resulting DCAN (Deep Coupled Adjective and Noun Neural Networks) is used for visual sentiment analysis. The system can analyze the features of noisy web images along with ANPs, and results are reported on the Twitter and SentiBank datasets.
The paper [1] presented multiple techniques for sentiment analysis using deep learning, covering both image and text sentiment. The paper includes models such as CNN, which is best suited for visual sentiment analysis. Recursive Neural Network falls under supervised learning, while Deep Belief Network, which consists of several hidden layers, is used for unlabelled data to overcome the limitation of unlabelled data. Their study conveys that deep learning networks perform better than SVM and ordinary neural networks because deep neural networks have more hidden layers.

B. Convolutional Neural Network (CNN)

CNN is a feed-forward neural network mainly used for image processing, image classification, or image prediction, so one of its most important applications is the analysis of images or visuals. Image sentiment analysis using CNN performs a sequence of functions: a convolutional layer, followed by a nonlinear layer, followed by a pooling layer and a fully connected layer.

Fig. 1. Basic CNN model with one layer [17]. (Figure not reproduced; visible labels: "Input neurons", "First hidden".)

The first layer in CNN image classification is the convolutional layer. When the image is entered, the reading of the image is assumed to start from the top left corner, and a small matrix known as a filter is applied to one small segment of the image at a time. There can be multiple convolutional layers: the image passes through one convolutional layer, and the output of that layer is the input for the next. The nonlinear layer is the second function in the pipeline, after the convolution operation. It consists of an activation function, which gives the CNN nonlinear behavior. The pooling layer follows the nonlinear layer and reduces the workload by reducing the features of an image; the image volume is reduced if the given image is large. For example, if a feature has already been identified by the model in a previous convolution operation, it is not processed further for identification. This is referred to as down-sampling or sub-sampling. After the pooling layer the output is still awaited, and the fully connected layer is introduced. It takes the output data from the convolutional layers: the matrix is flattened into a vector and then passed to the fully connected layer.
The study [3] introduced visual sentiment analysis with a deep CNN and evaluated it on two datasets, Twitter and Tumblr. The data was collected based on photo tags or hashtags and on lexicons that identify the emotion behind a sentence at the word level. To get the true sentiment of each image, a survey was conducted to classify the images as strongly positive, weakly negative, neutral, and so on. The study takes two baseline techniques, low-level visual features and SentiBank, and concludes from the baseline experiments that SentiBank is more powerful than low-level features for efficient sentiment analysis.
The concept of DeepSentiBank was introduced in [10] for visual sentiment concept classification with a DCNN. The experiment was run on a single server with a 16-core dual Intel E5-2650L processor and 64 GB of memory. The deep CNN model was trained using Caffe. Based on the experiments in their study, the CNN based approach achieves higher accuracy than SVM.
The study in [7] introduced a model trained on large-scale image data. It used Flickr photos for analysis, as Flickr contains a huge number of images with a large number of variations. The approach uses progressive training as well as transfer learning on the labeled image dataset. Their study concludes that a convolutional neural network can achieve high accuracy and high performance for image sentiment analysis.
The research in [9] designed a new CNN architecture and introduced a new training technique to cope with large-scale training data that is noisy in nature. The experiment was performed on a large set of Flickr images from SentiBank for training the CNN, and the model was implemented in Caffe, which is publicly available. All experiments were run on Linux x86_64 systems with 32 GB RAM. Of the Flickr image data, 90% was randomly selected for training and 10% was used for testing. They also perform fine-tuning [13].
The study in [6] presented a sentiment analysis technique for image as well as text sentiment. The method combines CNN architectures for text sentiment analysis and image sentiment analysis to perform multimedia sentiment analysis. The experiment is based on Twitter datasets containing both positive and negative tweets. For textual sentiment, the proposed text CNN is compared with Naïve Bayes, Logistic Regression, and SVM; for image sentiment, the image CNN is compared with low-level features, Sentribute, and SentiBank. In this experiment, it is concluded that the multi
CNN has performed significantly better than the other techniques.
The study in [4] proposed a system that uses a CNN to extract certain parameters from an image and classify the image into an appropriate class according to its behavior and parameters. The authors also created different neural networks to train the model to observe patterns by itself, tested the performance and accuracy of the model on CPU and GPU, and stated that CNN is a good choice for image classification [7].
The study in [12] explored the range of emotions an image can convey with the help of deep learning. Five emotion categories are identified in the paper: Love, Happiness, Violence, Fear, and Sadness. Data was collected from Flickr for these categories, and experiments were performed using various classification methods such as SVM and fine-tuning. The paper proposed a few deep learning based methods for image sentiment analysis and concluded that ResNet-50 performed best on the Flickr data.
A rigorous empirical study [15] compares a number of fine-tuned CNN architectures for visual sentiment analysis. The study found that deep architectures can learn features useful for identifying image sentiment in social networks, and it presented state-of-the-art models evaluated on Twitter datasets. Their work also demonstrates that the choice of pre-trained model initialization can make a difference when the dataset is small; otherwise cost increases.

C. R-CNN (Region Convolutional Neural Network)

R-CNN, introduced in [13], is a method that searches for selective regions to detect objects in a given image, so that the sentiment of the image can be analyzed with the help of the objects in it.
R-CNN extracts 2000 regions from the input image, generates a bounding box around each region, and feeds the regions into a CNN, which is why it is known as Region Convolutional Neural Network. The CNN extracts features or parameters from the input regions, and the extracted features are then passed to an SVM to classify the object present in each region. The major challenge for R-CNN is that it takes more time and memory, as it trains the network to classify and analyze 2000 different regions for a single image.
In the work proposed in [8], the authors suggested a model based on mid-level features of images that combines the techniques of SentiBank, R-CNN, and SentiStrength. Results of experiments conducted on the Flickr image dataset show that their approach achieves better sentiment classification accuracy.
The work in [16] proposed a framework to leverage affective regions. The authors first used an off-the-shelf objectness tool to generate candidates and applied a candidate selection method to remove redundant and noisy proposals. They then computed sentiment scores using a CNN, and finally combined the objectness score and the sentiment score to discover affective regions automatically. Their framework requires only image-level labels, which significantly reduces the annotation burden for training. Experiments conducted on eight benchmark datasets show that the proposed algorithm outperforms state-of-the-art approaches.

D. Fast R-CNN

To overcome the limitations of R-CNN, that is, to reduce processing time and memory usage, a faster object detection algorithm was proposed [14, 18], popularly known as Fast R-CNN. It is similar to R-CNN. In Fast R-CNN, the image is not initially divided into regions; instead, the image is first input to the CNN, which generates a convolutional feature map. Regions are identified from the convolutional feature map and bounding boxes are formed. To feed the fully connected layer, each region is reshaped using an ROI (Region of Interest) pooling layer so that the fully connected layer receives inputs of the same size. Finally, a softmax layer is used to predict the class. Therefore, Fast R-CNN does not need to input 2000 regions to the CNN each time; the convolution operation is performed only once per image.

Fig. 2. Extracted regions from the input

The research in [8] introduced a model for image sentiment analysis that combines features of multiple techniques, namely SentiBank, R-CNN, and SentiStrength; the experiment was performed on a Flickr image dataset. The results show that object detection for sentiment based on SIFT features is less efficient than R-CNN, which gives better output and achieves a mean average precision (mAP) of 53.3%.

III. ANALYSIS

In this paper, different deep learning techniques for image sentiment analysis are discussed, including DNN, CNN, R-CNN, and Fast R-CNN.
This work finds that CNN is far more efficient and accurate than DNN, R-CNN, and Fast R-CNN for image sentiment analysis. A neural network or DNN with a large number of hidden layers also increases cost, which is not efficient. CNN performs better and generates the optimum result for analysis, increasing accuracy by almost 20 percent while requiring fewer features [11]. The time to train the model is also reduced with CNN. Fast R-CNN is more efficient than R-CNN, as it occupies less space and takes less time to process, because Fast R-CNN does not need to input the 2000 regions every time.
The datasets available and used by researchers for image sentiment analysis include the SUN database, the first large-scale scene attribute database, which contains more than 800 categories and 14,340 images as well as discriminative attributes labeled by crowd-sourced human studies. The other datasets are the Flickr dataset, the Twitter testing dataset (consisting of image tweets), the standard Twitter dataset, the SentiBank Twitter dataset, the MOUD dataset, and the Multimodal Opinion-Level Sentiment Intensity (MOSI) dataset.
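The efficiency argument made in this section for Fast R-CNN over R-CNN can be illustrated with a small sketch. It is not taken from any of the cited implementations: the function names conv_passes and roi_max_pool, the toy 6 x 6 feature map, and the integer binning scheme are all assumptions made for illustration. The first function counts CNN forward passes under each scheme, and the second shows how regions of different sizes on one shared feature map can be pooled to the same fixed size before a fully connected layer.

```python
# Illustrative sketch only: toy stand-ins, not the implementations from [13] or [14].

def conv_passes(num_images, regions_per_image, share_feature_map):
    """Count the CNN forward passes needed for a set of images.

    share_feature_map=False mimics R-CNN: one pass per extracted region.
    share_feature_map=True mimics Fast R-CNN: one pass per image.
    """
    if share_feature_map:
        return num_images
    return num_images * regions_per_image


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map to a fixed out_h x out_w grid.

    feature_map: list of equal-length rows of numbers.
    roi: (top, left, bottom, right) in feature-map cells, bottom/right exclusive.
    Assumes the region is at least out_h x out_w cells (hypothetical simplification).
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Integer bin edges that tile the region without gaps or overlaps.
        r0 = top + (i * height) // out_h
        r1 = top + ((i + 1) * height) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * width) // out_w
            c1 = left + ((j + 1) * width) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy feature map standing in for the output of a single convolution pass.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

# Two regions of different sizes are both reduced to 2 x 2.
small = roi_max_pool(feature_map, (0, 0, 2, 2), 2, 2)
large = roi_max_pool(feature_map, (1, 1, 6, 5), 2, 2)

print(conv_passes(1, 2000, share_feature_map=False))  # R-CNN style count
print(conv_passes(1, 2000, share_feature_map=True))   # Fast R-CNN style count
print(small, large)
```

With 2000 regions per image, the per-region scheme needs 2000 forward passes for one image while the shared feature map needs one, which is the saving the text attributes to Fast R-CNN. The pooling step is what lets regions of any size feed a fully connected layer that expects a fixed input size.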

TABLE I. COMPARATIVE ANALYSIS OF DEEP LEARNING APPROACHES BASED IMAGE SENTIMENT ANALYSIS

| Researchers' Names and Year | Model Used | Purpose | Data Set Used | Results |
|---|---|---|---|---|
| J. Wang, J. Fu, Y. Xu, and T. Mei (2016) | DCAN | To reduce the amount of inter-class variance | Twitter and SentiBank | The system can efficiently get trained on mid-level sentiment from noisy web images |
| Q. T. Ain, M. Ali, et al. (2017) | DNN, CNN and RNN | To get a review for sentiment analysis techniques | ----- | Deep learning is an optimum choice for sentiment analysis |
| C. Xu, S. Cetintas, K. C. Lee, and L. J. Li (2014) | DCNN | Visual sentiment prediction | Twitter and Tumblr | SentiBank is more powerful than low-level visual features |
| T. Chen, D. Borth, and S. F. Chang (2014) | DCNN | Visual sentiment concept classification | Flickr images | CNN based approach is more accurate than SVM |
| S. Jindal and S. Singh (2015) | DCNN | Image sentiment analysis with fine tuning | Flickr | Using CNN one can achieve high accuracy |
| Q. You, J. Luo, and J. Yang (2015) | CNN and Progressive CNN | Robust image sentiment analysis | Flickr images from SentiBank | PCNN shows better and more efficient results compared to CNN |
| G. Cai and B. Xia (2015) | CNN | Multimedia sentiment analysis | Flickr images from SentiBank and from Twitter | Multi CNN performed better in all |
| V. Bharadi, A. I. Mukadam, et al. (2018) | CNN | Image classification | Images by digital camera or from databases | CNN is a better choice for image classification |
| J. Mandhyani, L. Khatri, et al. (2017) | CNN | Image sentiment analysis and categorizing emotions | Images from SentiBank | R-CNN is better and generates 53.3% (mAP) |
| J. Yang, D. She, P. Rosin, and L. Wang (2018) | R-CNN | Visual sentiment prediction | IAPS, ArtPhoto, Twitter, Flickr, Instagram | The proposed method outperforms the methods on the popular affective datasets |

IV. CONCLUSION

Image sentiment analysis is one of the imperative research areas for study, as people now increasingly use visual data to converse. Inspired by the emerging problem of image sentiment analysis and its promising solution through deep learning techniques, this paper has addressed some of the significant past studies on image sentiment analysis using deep learning techniques. As research in this area continues, it is expected that researchers will soon propose more efficient techniques to achieve optimum results.

REFERENCES

[1] Q. T. Ain, M. Ali, A. Riaz, A. Noureen, M. Kamran, B. Hayat, and A. Rehman, "Sentiment analysis using deep learning techniques: a review". Int J AdvComputSciAppl, vol. 8 issue 6, 424, 2017.
[2] I. Goodfellow, Y. Bengio, and A. Courville, "Deep learning", vol. 1. Cambridge: MIT Press, 2016.
[3] C. Xu, S. Cetintas, K. C. Lee, and L. J. Li, "Visual sentiment prediction with deep convolutional neural networks". arXiv preprint arXiv:1411.5731, 2014.
[4] V. Bharadi, A. I. Mukadam, M. N. Panchbhai, and N. N. Rode, "Image Classification Using Deep Learning". 2018.
[5] J. Wang, J. Fu, Y. Xu, and T. Mei, "Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks". In IJCAI, pp. 3484-3490, 2016.
[6] G. Cai and B. Xia, "Convolutional neural networks for multimedia sentiment analysis". In Natural Language Processing and Chinese Computing, pp. 159-167. Springer, Cham, 2015.
[7] S. Jindal and S. Singh, "Image sentiment analysis using deep convolutional neural networks with domain-specific fine tuning". In 2015 International Conference on Information Processing (ICIP), pp. 447-451, IEEE, 2015.
[8] J. Mandhyani, L. Khatri, V. Ludhrani, V. Nagdev, and S. Sahu, "Image Sentiment Analysis". International Journal of Engineering Science, 4566.
[9] Q. You, J. Luo, H. Jin, and J. Yang, "Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks". In AAAI, pp. 381-388, 2015.
[10] T. Chen, D. Borth, T. Darrell, and S. F. Chang, "Deepsentibank: Visual sentiment concept classification with deep convolutional neural network". arXiv preprint arXiv:1410.8586, 2014.
[11] C. Wang and Y. Xi, "Convolutional Neural Network for Image Classification". Johns Hopkins University, Baltimore, MD, 21218.
[12] V. Gajrala and A. Gupta, "Emotion detection and sentiment analysis of Images". Georgia Institute of Technology, 2015.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142-158, 2016.
[14] R. Girshick, "Fast R-CNN". In IEEE International Conference on Computer Vision (ICCV), 2015.
[15] V. Campos, B. Jou, and X. Giro-i-Nieto, "From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction". Image and Vision Computing, 65, 15-22, 2017.
[16] J. Yang, D. She, M. Sun, M. M. Cheng, P. Rosin, and L. Wang, "Visual sentiment prediction based on automatic discovery of affective regions". IEEE Transactions on Multimedia, 2018.
[17] R. Dalai and K. K. Senapati, "Comparison of Various RCNN techniques for Classification of Object from Image". International Research Journal of Engineering and Technology (IRJET), vol. 4 issue 07, 2017.
[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real time object detection with region proposal networks". In Advances in Neural Information Processing Systems, pp. 91-99, 2015.
