Image Sentiment Analysis Using Deep Learning
Abstract— Sentiments are feelings, emotions, likes and dislikes, or opinions, which can be articulated through text, images or videos. Sentiment analysis on web data is becoming a budding research area of social analytics. Users express their sentiments on the web by exchanging texts and uploading images through a variety of social media such as Instagram, Facebook, Twitter, WhatsApp, etc. A lot of research work has been done on sentiment analysis of textual data; there has been limited work that focuses on analyzing the sentiment of image data. Image sentiment concepts are ANPs, i.e. Adjective Noun Pairs: automatically discovered tags of web images which are useful for detecting the emotions or sentiments conveyed by the image. The major challenge is to predict or identify the sentiments of unlabelled images. To overcome this challenge, deep learning techniques are used for sentiment analysis, as deep learning models are capable of effectively learning the image behavior or polarity.
Image recognition, image prediction, image sentiment analysis, and image classification are some of the fields where Neural Networks (NN) have performed well, implying significant performance of deep learning in image sentiment analysis. This paper focuses on some of the noteworthy deep learning models, namely Deep Neural Network (DNN), Convolutional Neural Network (CNN), Region-based CNN (R-CNN) and Fast R-CNN, along with the suitability of their applications in image sentiment analysis and their limitations. The study also discusses the challenges and perspectives of this rising field.
Keywords—image sentiment analysis, deep learning, deep neural network, convolutional neural network, Region based CNN and Fast R-CNN.
I. INTRODUCTION
Deep learning techniques have also been explored for image sentiment analysis and provide significant results. A deep learning model can achieve accuracy that sometimes exceeds human-level performance and can perform tasks efficiently. The term “Deep” in Deep Learning refers to the number of hidden layers in the neural network. Deep learning models are trained using a large set of labelled data and a neural network architecture that learns features or parameters directly from the given data, without human intervention or manual feature extraction. Deep learning plays a major role in image sentiment analysis by offering various techniques such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Deep Neural Network (DNN) and Deep Belief Network (DBN) to get optimum results. Deep learning can be seen as a framework that produces accurate parameter learning for image classification. This paper studies and analyzes different deep learning techniques, namely DNN, CNN, R-CNN and Fast R-CNN. Section II discusses the research work done so far on image sentiment analysis using the above-mentioned techniques and their outcomes. Section III analyzes the performance and limitations of the techniques discussed in Section II. Section IV concludes the paper.
II. LITERATURE REVIEW
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on September 04,2023 at 18:20:06 UTC from IEEE Xplore. Restrictions apply.
CNN has performed significantly better than other techniques.
The study in [4] proposed a system that uses a CNN to extract certain parameters from an image and classify the image into an appropriate class according to its behavior and parameters. The authors also created different neural networks to train the model to observe patterns by itself, tested the performance and accuracy of the model on CPU and GPU, and stated that CNN is one of the good choices for image classification [7].
The study in [12] explored the possible emotions of an image with the help of deep learning. Five emotion categories are identified in the paper: Love, Happiness, Violence, Fear, and Sadness. Data was collected from Flickr for these categories, and experiments were performed using various classification methods such as SVM and fine-tuning. The paper proposed a few deep learning based methods for image sentiment analysis and concluded that the ResNet-50 method performed best on the Flickr data.
A rigorous empirical study by [15] compares a number of fine-tuned CNN architectures for visual sentiment analysis. The study found that deep architectures can learn features useful for identifying image sentiments in social networks, and state-of-the-art models evaluated on Twitter datasets were presented. Their work also demonstrates that the choice of pre-training for model initialization can make a difference when the data set is small; otherwise cost increases.
C. R-CNN (Region Convolutional Neural Network)
R-CNN, introduced in [14], is a method that searches selective regions to detect objects in a given image, so that the sentiment of the image can be analyzed with the help of the objects it contains.
R-CNN extracts 2000 regions from the input image, generates a bounding box around each region, and feeds the regions into a CNN, which is why it is known as Region Convolutional Neural Network. The CNN extracts features or parameters from the input regions, and the extracted features are then passed to an SVM to classify the object present in each region. The major challenge for R-CNN is that it takes more time and more memory, as it trains the network to classify and analyze 2000 different regions for a single image.
In the work proposed by [8], the authors suggested a model based on mid-level image features that combines the techniques of SentiBank, R-CNN and SentiStrength. Results of experiments conducted on the Flickr image dataset show that their approach achieves better sentiment classification accuracy.
The work done in [16] proposed a framework that takes advantage of regions. The authors first used an off-the-shelf objectness tool to generate candidates and applied a candidate selection method to remove redundant and noisy objects. Further, they computed sentiment scores using a CNN. Finally, they combined the objectness score and the sentiment score to discover effective regions automatically. Their framework requires only image-level labels, which significantly reduces the annotation burden required for training. Experiments conducted on 8 benchmark datasets show that the proposed algorithm outperforms state-of-the-art approaches.
D. Fast R-CNN
To overcome the limitations of R-CNN, that is, to reduce processing time and memory usage, a fast object detection algorithm popularly known as Fast R-CNN was proposed [14, 18]. It is similar to R-CNN. In Fast R-CNN, the image is not initially divided into different regions; instead, the image is first input to a CNN, which generates a convolutional feature map. Regions are identified from the convolutional feature map and bounding boxes are formed. An ROI (Region of Interest) pooling layer then reshapes each region to a fixed size so that the fully connected layer always receives inputs of the same size. Finally, a softmax layer is used to predict the class. Therefore, Fast R-CNN does not need to feed 2000 regions to the CNN each time; the convolution operation is performed only once per image.
Fig. 2. Extracted regions from the input
The study in [9] introduced a model for image sentiment analysis that combines features of multiple techniques such as SentiBank, R-CNN, and SentiStrength. The experiment was performed on a Flickr image dataset. The results show that sentiment object detection based on SIFT features is less efficient than R-CNN, which gives better and more efficient output and achieves a mean average precision (mAP) of 53.3%.
III. ANALYSIS
In this paper, different deep learning techniques for image sentiment analysis are discussed, including DNN, CNN, R-CNN, and Fast R-CNN.
This work finds that CNN is more efficient and accurate than DNN, R-CNN and Fast R-CNN for image sentiment analysis. A neural network or DNN with a large number of hidden layers also increases cost, which is not efficient. CNN performs better and generates the optimum result for analysis, increasing accuracy by almost 20 percent while requiring fewer features [11]. The time to train the model is also reduced with CNN. Fast R-CNN is more efficient than R-CNN, as it occupies less space and takes less time to process, because Fast R-CNN does not need to input the 2000 regions every time.
The datasets available and used by researchers for image sentiment analysis include the SUN database, which is the first large-scale scene attribute database, containing more than 800 categories and 14,340 images as well as discriminative attributes labeled by crowd-sourced human studies. Other datasets are the Flickr dataset, the Twitter testing dataset (consisting of image tweets), the standard Twitter dataset, the SentiBank Twitter dataset, the MOUD dataset, the Multimodal Opinion-Level Sentiment Intensity (MOSI) dataset, etc.
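The efficiency argument for Fast R-CNN rests on the ROI pooling step: the feature map is computed once, and every candidate region is then cut out of that shared map and pooled to a fixed size. The following is a minimal NumPy sketch of that idea, not the implementation from [14, 18]. The function name `roi_max_pool`, the 6x6 toy feature map, the region coordinates and the 2x2 output size are all assumptions chosen for illustration.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a 2-D feature map to an out_size x out_size grid.

    roi is (y0, x0, y1, x1) in feature-map coordinates, end-exclusive.
    Assumes the region spans at least out_size rows and columns.
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    # Split the region's rows and columns into out_size roughly equal bins.
    row_bins = np.array_split(np.arange(region.shape[0]), out_size)
    col_bins = np.array_split(np.arange(region.shape[1]), out_size)
    pooled = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i, rows in enumerate(row_bins):
        for j, cols in enumerate(col_bins):
            # Keep the strongest activation inside each bin.
            pooled[i, j] = region[np.ix_(rows, cols)].max()
    return pooled

# Stand-in for the convolutional feature map, computed once per image.
feature_map = np.arange(36, dtype=np.float32).reshape(6, 6)

# Two hypothetical regions of different sizes on the same feature map.
rois = [(0, 0, 4, 4), (1, 2, 6, 5)]
pooled = [roi_max_pool(feature_map, r) for r in rois]

for r, p in zip(rois, pooled):
    print(r, "->", p.shape)
```

Both regions come out as 2x2 arrays even though one covers 4x4 cells and the other 5x3, which is what allows a single fully connected layer to consume all of them. In R-CNN, by contrast, each of the 2000 regions would be passed through the CNN separately.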
TABLE I. COMPARATIVE ANALYSIS OF DEEP LEARNING APPROACHES BASED IMAGE SENTIMENT ANALYSIS

| Researchers Name and Year | Model Used | Purpose | Data Set Used | Results |
|---|---|---|---|---|
| J. Wang, J. Fu, Y. Xu, and T. Mei (2016) | DCAN | To reduce the amount of inter-class variance | Twitter and SentiBank | The system can efficiently get trained on mid-level sentiment from noisy web images |
| Q. T. Ain, M. Ali, et al. (2017) | DNN, CNN and RNN | To get a review for sentiment analysis techniques | ----- | Deep Learning is an optimum choice for sentiment analysis |
| C. Xu, S. Cetintas, K. C. Lee, and L. J. Li (2014) | DCNN | Visual Sentiment Prediction | Twitter and Tumblr | SentiBank is more powerful than Low-Level Visual Feature |
| T. Chen, D. Borth, and S. F. Chang (2014) | DCNN | Visual Sentiment Concept Classification | Flickr images | CNN based approach is more accurate than SVM |
| S. Jindal and S. Singh (2015) | DCNN | Image Sentiment Analysis with fine tuning | Flickr | Using CNN one can achieve high accuracy |
| Q. You, J. Luo, and J. Yang (2015) | CNN and Progressive CNN | Robust Image Sentiment Analysis | Flickr images from SentiBank | PCNN shows better and more efficient results compared to CNN |
| G. Cai and B. Xia (2015) | CNN | Multimedia Sentiment Analysis | Flickr images from SentiBank and from Twitter | Multi CNN have performed better in all |
| V. Bharadi, A. I. Mukadam, et al. (2018) | CNN | Image classification | Images by digital camera or from databases | CNN is better choice for image classification |
| J. Mandhyani, L. Khatri, et al. (2017) | CNN | Image Sentiment Analysis and categorizing emotions | Images from SentiBank | R-CNN is better and generates 53.3% (mAP) |
| J. Yang, D. She, Rosin, and L. Wang (2018) | R-CNN | Visual Sentiment Prediction | IAPS, ArtPhoto, Twitter, Flickr, Instagram | The proposed method outperforms the methods on the popular affective datasets |
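The R-CNN result is reported as mean average precision (mAP). As a reminder of what that metric aggregates, here is a minimal sketch, assuming the non-interpolated definition of average precision and assuming each detection has already been marked as correct (1) or incorrect (0). The function names and the toy scores are hypothetical, and the cited work may use a different variant of the metric.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at each rank where a correct item appears."""
    # Rank items from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_class):
    """Average the per-class AP values; per_class is a list of (scores, labels)."""
    return sum(average_precision(s, l) for s, l in per_class) / len(per_class)

# Toy data for two hypothetical classes.
class_a = ([0.9, 0.8, 0.3], [1, 0, 1])  # AP = (1/1 + 2/3) / 2
class_b = ([0.6, 0.4], [0, 1])          # AP = 1/2
map_value = mean_average_precision([class_a, class_b])
print(round(map_value, 4))
```

A value of 53.3% therefore means that, averaged over classes, roughly half of the ranked detections are correct at the points where true objects are retrieved.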