Deep Learning Based Sentiment
January 2023
Shashank Kalluri
The authors declare that they are the sole authors of this thesis and that they have not used
any sources other than those listed in the bibliography and identified as references. They further
declare that they have not submitted this thesis at any other institution to obtain a degree.
Contact Information:
Author(s):
Shashank Kalluri
E-mail: [email protected]
University advisor:
Hüseyin Kusetogullari
Department of Computer Science
Background: Text data includes customer reviews, complaints, and tweets from social media platforms. Sentiment models are used when analyzing such text-based data. News headlines, blogs, the stock market, political debates, and film reviews are some of the areas where sentiment analysis is applied. The results of a sentiment analysis may be used to help evaluate whether a review is positive, negative, or neutral. In this thesis we explore the performance of several algorithms.
Objectives: The challenges of natural language processing (NLP), however, make it harder for sentiment analysis to be effective and accurate. In the past few years, deep learning models have been shown to be a promising way to address some of NLP's problems. This thesis looks at the most recent studies that used deep learning to solve sentiment analysis problems and examines their performance metrics.
Methods: A literature review is conducted to determine which algorithms are best suited to achieving the above goals. An experiment is conducted to understand how deep learning works and which metrics should be used to determine the best model for sentiment analysis. Several datasets have been used to test models that use term frequency-inverse document frequency (TF-IDF) and word embedding.
Results: The experiment indicated that the CNN model strikes the best balance between processing speed and accuracy. When used with word embedding, the RNN model was the most accurate, but it took a long time to process and did not work well with TF-IDF. The DNN model delivered average processing times and results.
Conclusions: The primary objective of this research is to learn more about the fundamentals of deep learning models and related approaches that have been used for sentiment analysis of social network data. Before feeding the data to deep learning models, we transformed it using TF-IDF and word embedding. DNN, CNN, and RNN architectures were investigated after performing the literature review. The processing time gap was addressed, and the best combination was identified.
I finally thank my family members and friends who supported me all the time, including the wonderful people I have met during my thesis.
Contents
Abstract
Acknowledgments
1 Introduction
  1.1 Problem statement
  1.2 Aim and Objectives
    1.2.1 Aim
    1.2.2 Objectives
  1.3 Research Questions
  1.4 Outline
2 Background
  2.1 Machine Learning
  2.2 Traditional sentiment classification techniques
  2.3 Deep Learning
    2.3.1 Deep Neural Network
    2.3.2 Convolutional Neural Networks (CNN)
    2.3.3 Recurrent Neural Networks (RNN)
  2.4 Sentiment Analysis
  2.5 BERT
  2.6 Performance Measures
    2.6.1 Accuracy
    2.6.2 Precision
    2.6.3 Recall
    2.6.4 F1 Score
3 Related Work
4 Method
  4.1 Literature Review
  4.2 Experiment
    4.2.1 Software Tools
    4.2.2 Data
    4.2.3 Dataset
    4.2.4 Data Preparation
    4.2.5 Training, Validation, and Test sets
    4.2.6 Data Cleaning
    4.2.7 Word Embedding
    4.2.8 TF-IDF
    4.2.9 Implementation
5 Results and Analysis
6 Discussion
  6.1 Threats to Validity
    6.1.1 Internal Validity
    6.1.2 External Validity
    6.1.3 Conclusion Validity
7 Conclusions and Future Work
References
Chapter 1
Introduction
Since we often base our choices on the experiences and perspectives of others, feelings and viewpoints play a significant part in human behaviour. Learning about someone else's experiences and ideas helps in gaining a more well-rounded perspective, since everyone's views are subjective and influenced by their own unique circumstances. Institutions, businesses, and political figures have been interested in public opinion for a significant length of time. The opinions of a population as a whole may be used to develop information about future subjects and trends, as well as provide insight into who will win elections. In addition, it helps assess public opinion on goods and services, which in turn assists marketing teams in deciding on a marketing plan, enhancing a current product or manufacturing a new one, and improving customer support [1, 2, 6]. As a result, the recognition of sentiment in a variety of professions is of the utmost importance.
At first, people's opinions were gathered via questionnaires and surveys and then analysed by hand. However, as people's familiarity with the internet grew, they began posting their thoughts and behaviours on the web more regularly [1]. The proliferation of social media platforms in recent years has made it possible for users of these platforms to disseminate a broad range of information and has offered a greater number of ways in which they may express their thoughts [2]. Blogs, discussion forums, reviews, comments, and microblogging services like Twitter and Facebook all serve as valuable data sources since they include audio recordings, video files, image files, and opinions. Other rich data sources include reviews and comments [3, 4].
This thesis looks at how people feel and think about a furniture store's new product based on online reviews of it, in particular how much attention it receives and how many positive and negative feelings it evokes. Researchers have developed many strategies and algorithms for determining how people feel about something. This kind of analysis assists businesses in better understanding their customers' attitudes towards their brand efforts. Sentiment Analysis is a form of Natural Language Processing that makes use of a variety of techniques, including Machine Learning algorithms, Lexicon-based algorithms, and Hybrid algorithms, to classify data [7, 9]. In the past few years, a number of studies have proposed deep-learning-based sentiment analyses. These analyses have different features and levels of performance. This work looks at the most recent studies that used deep learning models to solve different problems related to sentiment analysis [5].
1.2.2 Objectives
• To apply different word embedding methods with deep learning techniques.
Justification: RQ3 addresses the gap in the processing time of different models. The experimentation method is used to answer this question. This question also helps in determining the computing cost.
1.4 Outline
This section describes the thesis structure.
Chapter 1: Gives the introduction and motivation of the thesis and an overview of the problem we are trying to solve. It states the aim, objectives, and research questions.
Chapter 3: The methodology section gives a summary of the algorithms that were selected as a direct consequence of the literature review and the recommended strategy.
Chapter 4: The experiments that were carried out in order to address the research questions are the primary emphasis of this chapter.
Chapter 5: This chapter presents the findings from the experiment.
Chapter 6: This chapter contains a discussion and analysis of the acquired results.
Chapter 7: This chapter provides the conclusion and future work of the thesis.
Chapter 2
Background
Machine learning, as the name implies, is the process of computers learning without explicit human programming. First, they are given good data, and then they are trained by developing several machine learning models utilising the data and different techniques. Machine learning is primarily divided into two types: supervised and unsupervised learning [3, 29].
Traditionally, supervised machine learning methods such as Naive Bayes (NB), Support Vector Machine (SVM), or Logistic Regression (LR) have been used in an effort to solve the text sentiment categorization issue [22]. Pang et al. [10] produced one of the first studies to suggest that machine learning may be utilized for the categorization of content on online platforms based on the sentiment of the text. On the IMDb movie review dataset, Pang et al. compared the performance of the NB, Maximum Entropy (ME), and SVM classifiers [17, 18]. SVM was able to achieve an accuracy of about 83%, which was considered good. Since that time, the
use of social media has steadily increased over the years, and these days, millions
of individuals express their thoughts and ideas via online platforms. An interest in
attaining automated sentiment categorization has been sparked as a result of the vast
amounts of emotive data that are made accessible on the majority of social network
sites. [29]
Twitter has been one of the social media platforms investigated the most in terms of sentiment analysis up to this point. Twitter gives users the ability to communicate their thoughts in the form of brief messages known as tweets. Users are compelled to organize their ideas in a way that is succinct but gets to the point, since the available space is restricted. This results in data that is rich in sentiment, making it suited for use in NLP tasks. Neethu and Rajasree investigated and compared the effectiveness of the SVM [28], NB [27], and ME algorithms on the categorization of electronic product tweets, reaching 93% accuracy with SVM and ME. Agarwal et al. obtained 75% accuracy using SVM for binary classification on non-domain-specific data [11, 12]. There has also been some work done with YouTube datasets, such as the classification of YouTube cooking videos using SVM (with an accuracy of 95.3% achieved) [19] or the classification of popular Arabic YouTube videos using their comments, with an F1-score of 0.88 achieved by SVM with the Radial Basis Function kernel [12, 16].
The classic machine learning algorithms perform poorly with cross-lingual or cross-domain data [15] and have been underperforming in contrast to deep learning. Although these techniques may yield accurate sentiment predictions for text, they also have drawbacks.
By incorporating a multi-layer structure into the neural network's hidden layers, deep learning is able to achieve more complex results. In conventional machine learning methods, features are specified and extracted by hand or via the use of feature selection techniques. Deep learning models, on the other hand, automatically learn and extract features, leading to improved accuracy and performance. Classifier models' hyperparameters are often tuned automatically as well. A comparison of standard machine learning (Support Vector Machine (SVM), Bayesian networks, and decision trees) with deep learning for sentiment polarity categorization is shown in Figures 2.1 and 2.2. When it comes to solving difficult problems in areas like image and voice recognition and NLP, the state-of-the-art solutions are those that use artificial neural networks and deep learning. In this part, we go through a variety of deep learning approaches.
• Hidden Layer: This layer contains neurons that receive signals from the preceding input layer. Each hidden layer trains its own set of features; the more hidden layers, the more intricate the abstractions.
• Output Layer: This layer is made up of neurons that receive input from the hidden layer and produce the output value.
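As a minimal illustration of such a feed-forward network, the following Keras sketch stacks two dense hidden layers between an input layer and a sigmoid output for binary sentiment prediction. The feature size and layer widths are assumptions chosen for illustration, not the exact configuration used in this thesis.

```python
import tensorflow as tf

# Hypothetical input size; the input is assumed to be a fixed-length numeric
# feature vector for each review (for example a TF-IDF vector of 5,000 terms).
NUM_FEATURES = 5000

dnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(128, activation="relu"),   # first hidden layer
    tf.keras.layers.Dense(64, activation="relu"),    # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer: positive/negative
])
dnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```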
In CNNs, the output is computed using convolutions over the input layer. This
produces local connections in which each input area is linked to a neuron in the
output. Each layer applies several filters, generally hundreds or thousands as seen
above, and mixes the resulting images. A CNN automatically learns the values of its
filters based on the desired task during the training phase.
For instance, a CNN for image classification may learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, and finally use these simple shapes to detect higher-level features, such as facial shapes, in higher layers. The last layer is a classifier that employs these high-level features. Instead of picture pixels, the input to the majority of NLP tasks is a matrix of phrases or texts. Each row of the matrix represents one token, which is often a word but might also be a character. In other words, each row is a vector representing a word. These vectors are often word embeddings (low-dimensional representations) such as word2vec or GloVe, but they may also be one-hot vectors that index the word into a dictionary.
The data were preprocessed to form the embedding matrix. Figure 2.4 depicts four convolution layers and two max pooling layers processing an input embedding matrix. The first two convolution layers use 64 and 32 filters to train various features; a max pooling layer reduces output complexity and prevents overfitting. The third and fourth convolution layers use 16 and 8 filters, followed by max pooling. The last layer is a fully connected layer that reduces the 8-dimensional vector to a 1-dimensional output vector (Positive, Negative).
Initially, approaches based on a lexicon were used for sentiment analysis. They are separated into dictionary-based and corpus-based techniques [25]. In the first kind, sentiment categorization is accomplished through the use of a lexical dictionary, such as SentiWordNet or WordNet. Corpus-based sentiment analysis, however, does not rely on a predefined dictionary but rather on a statistical analysis of the contents of a collection of documents, using techniques such as k-nearest neighbors (k-NN), conditional random fields (CRF) [22], and hidden Markov models (HMM), among others.
Machine learning techniques offered for sentiment analysis problems fall into two categories: (1) standard models and (2) deep learning models. Traditional models refer to classical machine learning algorithms, such as the naive Bayes classifier, the maximum entropy classifier [21, 23], and support vector machines (SVM). These algorithms receive as input lexical characteristics, sentiment lexicon-based features, parts of speech, as well as adjectives and adverbs. The precision of these systems relies on the selected features. Deep learning models can deliver superior results compared to traditional methods. CNN, DNN, and RNN are among the deep learning models that may be utilized for sentiment analysis. These methods handle categorization issues at the document, sentence, and aspect levels. The next section covers these deep learning approaches.
The hybrid techniques [26] combine lexicon-based and machine learning methodologies. Commonly, sentiment lexicons play a crucial part in the bulk of these tactics.
2.5 BERT
BERT is an open-source machine learning framework for natural language processing (NLP). It uses word embeddings to help computers understand the meaning of ambiguous words in text by leveraging the surrounding text to build context [13].
A basic Transformer consists of an encoder that reads the text input and a decoder that generates a prediction for the task. Since the objective of BERT is to construct a language representation model, it only requires the encoder. The encoder input for BERT is a sequence of tokens, which are transformed into vectors and then processed by the neural network [14]. Some alternative options are Hugging Face's DistilBERT, GPT-2/3, and XLNet. They are efficient, but BERT outperforms them all and has been state of the art in 7 of 11 NLP tasks.
2.6.1 Accuracy
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2.6.2 Precision
Precision = TP / (TP + FP)
2.6.3 Recall
Recall is calculated as the true positives divided by the sum of true positives and false negatives.
Recall = TP / (TP + FN)
2.6.4 F1 Score
The F1 score is the harmonic mean of precision and recall, accounting for both measurements using the following equation. We use the harmonic mean rather than a simple average since it penalizes extreme values. To design a classification model with the ideal combination of recall and precision, we maximize the F1 score.
F1 Score = (2 * Precision * Recall) / (Precision + Recall)
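As a small worked example of these formulas, the snippet below computes all four measures from raw confusion-matrix counts; the counts are made-up values used only to exercise the equations.

```python
# Hypothetical confusion-matrix counts for a binary sentiment classifier.
TP, TN, FP, FN = 420, 380, 70, 130

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1_score = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # 0.800
print(f"Precision: {precision:.3f}")  # 0.857
print(f"Recall:    {recall:.3f}")     # 0.764
print(f"F1 score:  {f1_score:.3f}")   # 0.808
```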
Chapter 3
Related Work
This study aims to examine various techniques and methodologies in sentiment anal-
ysis that might serve as a reference for future empirical research.
Recently, deep learning models (such as DNN, CNN, and RNN) have been used to improve the efficiency of sentiment analysis tasks. In this part, cutting-edge approaches to sentiment analysis based on deep learning are examined.
Since 2015, several scholars have studied this trend. Tang et al. [37] proposed deep learning-based algorithms for a variety of sentiment studies, including learning word embeddings, sentiment categorization, and opinion extraction. Zhang and Zheng [26] addressed the use of machine learning for sentiment analysis. Both study teams employed POS as a text feature and TF-IDF to compute the weight of words for analysis. Sharef et al. [32] explored the advantages of big data sentiment analysis methodologies. In the most recent studies [3, 7, 33], deep-learning-based techniques (namely CNN, RNN, and LSTM) were reviewed and compared with each other in the context of sentiment analysis problems.
Other research has applied sentiment analysis based on deep learning to many areas, including banking [2, 4], tweets about the weather [5], travel advisers, recommender systems for cloud services [34], and movie reviews [6, 29]. In [5], where text characteristics were automatically retrieved from several data sources, Word2vec was used to translate user information and weather knowledge into word embeddings. Several papers [2, 34] use the same methodologies. Combining topic modeling with the findings of a sentiment analysis conducted on customer-generated social media data, Jeong et al. [43] highlighted product development prospects. It has been used as a tool for real-time monitoring and analysis of changing client demands in situations with fast-developing products. Pham et al. [35] analyzed travel evaluations and determined opinions for five criteria, including value, room, location, and cleanliness. Polarity-based sentiment deep learning has also been applied to tweets [8, 9, 22]. The authors revealed how they employed deep learning models to boost the accuracy of their sentiment assessments. Most of the models are used for material posted in English, although there are a handful that handle tweets in other languages, including Spanish, Thai, and Persian [36]. Researchers in the past have examined tweets using various models of polarity-based sentiment deep learning. Those models include DNN, CNN, and hybrid techniques [9].
From these studies, we identified three prominent models for sentiment polarity analysis using deep learning: DNN [15, 29], CNN [9], and hybrid [29]. In [7, 8, 37], CNN, RNN, and LSTM were evaluated independently on distinct data sets. However, a comparative analysis of these three methodologies was lacking [8].
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) exhibit very good overall accuracy [8, 29] when the performance of a single approach is evaluated on a single dataset within a certain domain. Hassan and Mahmood showed that CNN and RNN models may circumvent the short-text limitation of deep learning algorithms. Qian et al. [10] showed that Long Short-Term Memory (LSTM) performs well when applied to varying text levels of weather-and-mood tweets.
The authors proposed deep learning sentiment analysis algorithms to classify Twitter data reviews [38]. Their results demonstrate that deep learning outperforms conventional methods such as Naive Bayes and SVM without Maximum Entropy. In their research, the authors used LSTM and DCNN models, with word2vec [21] employed to train word vectors for the DCNN and LSTM models. In this research, a Twitter dataset was used. This study showed that DNN is superior to LSTM for sentiment analysis using deep learning [38]. Furthermore, a large and meaningful data sample is required for mining.
Chapter 4
Method
This chapter covers the methodology used in the study. The research begins with a systematic assessment of the literature to identify frequently used algorithms for performing Sentiment Analysis. It is followed by experiments to compare algorithm performance. Experiments, case studies, and surveys [44] are the most prevalent empirical methods.
This research may be conducted objectively by any industry or applied to many topics, and the results can be used appropriately. Consequently, experimentation is selected as one of the study methods. The experimentation method [44] is an analytical and scholarly strategy in which the researcher systematically conducts an experiment. The primary purpose of experimentation is to apply and assess the chosen algorithms using defined evaluation procedures. In addition, our dataset includes dependent and independent variables, which motivates us to conduct experiments. The experimental research approach is used to answer research questions 2 and 3. The experiments utilize the same hardware and software described in this chapter.
For RQ1, literature research helps us uncover deep learning approaches and choose the most popular algorithms. We examine relevant papers to determine which metrics to use to evaluate the algorithms for efficient sentiment analysis. The research builds on the literature study findings.
How to do SLR:
• Make a short list of the helpful resources that have something to do with the
thesis.
• Add the criteria for which articles are included and which are not.
• Choose the research papers and publications that will help with the thesis.
• Look at the papers you found after your search.
• Write a summary of what you found and use it in the next steps of your research.
4.2 Experiment
Hardware Environment
4.2.1.1 Pandas
The pandas package is used for data pre-processing and data handling [40]. For the next steps, it is very important to create a data structure for the scraped data, and pandas provides fast and flexible structuring of data. Its multipurpose functionality for handling data is an advantage; all the data that is scraped for the thesis work is converted to a data frame for further analysis and prediction [40].
4.2.1.2 NumPy
NumPy, which stands for "Numerical Python", is used for implementing numerical computations on vectors and matrices. It provides up to 50 times faster computation than Python lists. This library is used for data analysis and numerical calculation in the thesis [38].
4.2.1.3 nltk
Natural Language Toolkit (NLTK) is a standard library that eases the use and implementation of natural language processing and information retrieval tasks like tokenization, stemming, parsing, and semantic text relationships [40].
20 Chapter 4. Method
4.2.1.4 Sklearn
Scikit-learn provides tools and functionality for machine learning and statistical modeling, covering classification, clustering, and other predictions. For example, it can split data into train, validation, and test subsets, create features for text inputs, create tokens, and build count vectors such as frequency counts for TF-IDF. For the classification task, data is split into train and test sets [41].
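As a hedged sketch of the splitting described above, the snippet below uses scikit-learn's train_test_split twice to carve a data frame into training, validation, and test subsets. The 80/10/10 proportions and the toy review data are assumptions for illustration; the thesis text does not state the exact split used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the scraped review data frame described in this chapter.
df = pd.DataFrame({
    "text": ["great sofa", "terrible delivery", "okay quality", "love it"] * 25,
    "label": [1, 0, 0, 1] * 25,
})

# First hold out 20% of the rows, then split the holdout evenly into validation and test.
train_df, holdout_df = train_test_split(df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(holdout_df, test_size=0.5, random_state=42)

print(len(train_df), len(val_df), len(test_df))  # 80 10 10
```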
4.2.1.5 seaborn
Like matplotlib, the seaborn library is also used for data visualization and exploratory
data analysis, built on Matplotlib to create customized plots [40].
4.2.1.6 TensorFlow
TensorFlow is an end-to-end open-source library for creating deep learning models
to handle extensive data and implementing complex models like BERT to simplify
and speed up the process [18].
4.2.1.7 Keras
Like TensorFlow, Keras is an open-source, high-level Application Programming Interface (API) that provides a Python interface for artificial neural networks; it acts as an interface for the TensorFlow library. It is more user-friendly and a little faster compared to TensorFlow. This package is used for the implementation of the Bidirectional Encoder Representations from Transformers (BERT) model in the thesis [18].
4.2.2 Data
In order to conduct sentiment analysis, researchers may either create their own data or make use of existing databases. Creating a new dataset allows for the use of data that is relevant to the issue being analyzed, and the usage of personal data guarantees that no privacy rules are broken [29].
The purpose of this thesis was to collect data and public opinion on the furniture
shop through various social media outlets. As part of the process, we have collected
data and built datasets from Twitter, Reddit, and several consumer forum websites
that include reviews of the furniture store’s items. Web scraping was utilized to
produce the dataset.
Web scraping is the automatic collection of web data and information; it is essentially the extraction of web data. Web scraping is used for information retrieval, news gathering, web monitoring, and more [15]. Web scraping makes accessing the large quantity of information available online quick and straightforward. It is far quicker and less complicated than manually pulling data from websites.
• Access the HTML of the webpage and extract useful information/data from it. This technique is called web scraping, web harvesting, or web data extraction.
4.2.2.2 BeautifulSoup
Beautiful Soup offers straightforward techniques for exploring, finding, and editing
a parse tree in HTML and XML files. It converts a complicated HTML page to a
Python object tree. It also transforms the page to Unicode automatically, so you
don’t have to worry about encodings. This program allows you not only scrape
data but also clean it. Figure 4.1 shows an example of the implementation of this
library. [16]
Figure 4.1: Web Scraping of the Website ’www.reviews.io’ done using beautifulsoup
library
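The snippet below is a minimal sketch of this kind of scraping: it fetches a page, parses it with Beautiful Soup, and collects the review text into a pandas data frame. The URL and the CSS class used to locate reviews are placeholders, since every site marks up its reviews differently.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder URL and CSS class; both must be adapted to the page actually scraped.
url = "https://www.example.com/furniture-store-reviews"
html = requests.get(url, timeout=30).text

soup = BeautifulSoup(html, "html.parser")
reviews = [node.get_text(strip=True)
           for node in soup.find_all("div", class_="review-text")]

# Keep only the review text in a single-column data frame.
reviews_df = pd.DataFrame({"text": reviews})
```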
Some of the other methods that I have used for data collection are tools like OctaParse, a cloud-based online data extraction system that enables users to collect pertinent data from a variety of websites. It allows users from many sectors to scrape unstructured data and store it in a number of formats. I have also considered using Tweepy, an open-source Python library that facilitates easy access to the Twitter API from Python and is useful for extracting tweets. During web scraping, a lot of unnecessary attributes, like user id and time of post, are pulled out. So, we removed those columns and cleaned up our table so that the main attribute is the review of the furniture store.
Figure 4.2: A sample dataset formed after web scraping one of the social media channels.
4.2.3 Dataset
The datasets were gathered from various sites and cover different aspects of the topic so that a wide range of experiments can be done. Because of this, the results have made it possible to compare the performance of deep learning models in sentiment analysis in a wide range of ways. The following explains these datasets:
• Twitter Dataset is the primary dataset. It included close to 1.2 million tweets that discussed various opinions and thoughts about the furniture store. It had various fields like 'user id', 'date', 'tweet url', and 'text', which contained the main review.
• Reddit Dataset has been obtained from the social networking site Reddit, using the furniture store as the search string. This dataset has close to 2,200 samples.
• Reviewsio Dataset has a collection of reviews about the store, with about 23,100 samples.
• Consumer Affairs is the dataset with the most extensive history of reviews. There are around 51,000 samples in all.
A representative sample of tweets taken from one of the datasets is shown in Figure 4.3, which includes data pertaining to several fields. In order to carry out the experiment, we made use of the "text" field from the datasets.
• Download and extract the dataset, then explore the directory structure.
• Choose which BERT model to load from TensorFlow Hub and fine-tune. There are multiple BERT models available.
Using this BERT model we have been able to classify the reviews on a scale of 1-5, as shown in Figure 4.4. This has been translated into positive and negative.
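As a hedged sketch of this step, the model below wires a TensorFlow Hub preprocessing layer and a BERT encoder into a small Keras classifier with a five-way softmax head, one class per star rating. The particular Hub handles, the dropout rate, and the dense head are illustrative assumptions; the thesis only states that a Hub-hosted BERT model was fine-tuned for this labelling.

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 - registers the ops used by the preprocessing model

# Example Hub handles; any matching preprocess/encoder pair can be substituted.
PREPROCESS_HANDLE = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_HANDLE = "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/2"

def build_review_classifier(num_classes: int = 5) -> tf.keras.Model:
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="review_text")
    encoder_inputs = hub.KerasLayer(PREPROCESS_HANDLE, name="preprocessing")(text_input)
    encoder_outputs = hub.KerasLayer(ENCODER_HANDLE, trainable=True, name="bert")(encoder_inputs)
    pooled = encoder_outputs["pooled_output"]           # one vector per review
    x = tf.keras.layers.Dropout(0.1)(pooled)
    ratings = tf.keras.layers.Dense(num_classes, activation="softmax", name="rating")(x)
    return tf.keras.Model(text_input, ratings)

model = build_review_classifier()
# Labels are assumed to be integers 0-4 (star rating minus one).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```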
4.2.6.1 Tokenization:
Separating the phrase into words.
4.2.6.3 StopWords
Stop words are widely used terms (a, an, the, etc.) that are eliminated from documents. These words have no practical significance since they do not discriminate between two documents.
4.2.6.4 Stemming:
It is the transformation of a word into its basic form.
4.2.6.5 Lemmatization:
Lemmatization, unlike stemming, reduces words to an existing term in the language. The construction of a stemmer is simpler than that of a lemmatizer, since the latter needs extensive understanding of linguistics to develop dictionaries that look up the lemma of a word.
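A minimal sketch of the cleaning steps described in this section using NLTK: tokenization, stop-word removal, stemming, and lemmatization. The sample sentence is made up, and the resource downloads only need to run once.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the resources used below.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The sofas were delivered late but the cushions are amazingly comfortable"

tokens = word_tokenize(text.lower())                         # tokenization
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]          # stop-word removal

stems = [PorterStemmer().stem(t) for t in tokens]            # stemming: crude base forms
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # lemmatization: dictionary words

print(stems)
print(lemmas)
```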
4.2.8 TF-IDF
TF-IDF measures the mathematical importance of words in a document [2]. The vectorization resembles one-hot encoding, but instead of 1, the word's value is its TF-IDF score, obtained by multiplying TF and IDF. Term frequency is the ratio of occurrences of the target term to the overall number of terms in the document. IDF is the logarithm of the ratio of the total number of documents to the number of documents containing the target term. We used the vectorizer class from the scikit-learn package for TF-IDF. The formula used to compute the tf-idf for a term t of a document d in a document set is tf-idf(t, d) = tf(t, d) * idf(t), and the idf is computed as idf(t) = log[n / df(t)] + 1, where n is the total number of documents in the document set and df(t) is the document frequency of t.
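A short sketch of this vectorization with scikit-learn; setting smooth_idf=False reproduces the idf formula quoted above, whereas the library default adds one to the document frequencies. The example documents are placeholders standing in for the cleaned reviews.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the sofa is comfortable and the delivery was fast",
    "delivery was late and the sofa arrived damaged",
    "comfortable chair great quality",
]

# smooth_idf=False matches idf(t) = log(n / df(t)) + 1 as described above.
vectorizer = TfidfVectorizer(smooth_idf=False)
X = vectorizer.fit_transform(docs)      # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())
print(X.toarray().round(3))             # note: rows are l2-normalized by default
```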
4.2.9 Implementation
Depending on the dataset, a particular processing approach was then used to ease model construction. For instance, for the Twitter dataset, we eliminated columns that are not relevant for sentiment analysis, such as "id", "date", "query", and "username", and transformed the class label into positive and negative values.
After cleansing the datasets, sentences were separated into individual words and reduced to their fundamental form by lemmatization. At this step, phrases were transformed into continuous vectors using two approaches, word embedding and TF-IDF, which convert the text into vectors of words. Both types of feature vectors were inputs to the deep learning algorithms assessed in this research: the Convolutional Neural Network, Deep Neural Network, and Recurrent Neural Network algorithms. Thus, separate models were constructed, one for each kind of feature vector.
As mentioned earlier, k-fold cross-validation with k equal to ten is used to determine the efficacy of the various embeddings. The first layer is the embedding layer, initialized with random weights, which learns an embedding for all terms in the training dataset. In this instance, the vocabulary size is 17,000 and the maximum sequence length is 40 tokens. The output is a 40 by 300 matrix.
The initial 1D CNN layer uses a kernel size of 3. For this layer, 64 filters are defined, which permits 64 distinct features to be trained in the initial layer. Consequently, the output of the first neural network layer is a 40 by 64 matrix, and the output of the first CNN layer is fed into the next one.
Again, 32 distinct filters are defined for training at this level. Using the same reasoning as for the first layer, the resulting matrix has dimensions of 40 by 32.
A max pooling layer is often employed after a CNN layer to minimize the output's complexity and avoid overfitting. In this instance, we select a pool size of three, which means that the output matrix size of this layer is 13 by 32. A 13 by 16 matrix and a 13 by 8 matrix come out of the third and fourth 1D Convolutional Neural Network layers.
An average pooling layer is then used to prevent overfitting. We utilize the average value rather than the highest value in this instance, since it provides superior results. The size of the output matrix is 1 by 8 neurons. A fully connected layer with sigmoid activation is the last layer; it reduces the 8-dimensional vector to 1 for prediction ("positive", "negative").
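Read literally, the architecture described above corresponds roughly to the following Keras model; this is a sketch rather than the thesis code. The 'same' padding is an assumption made so that the convolutions keep the sequence length at 40, as the shapes quoted above imply.

```python
import tensorflow as tf

VOCAB_SIZE = 17000       # vocabulary size stated above
SEQUENCE_LENGTH = 40     # maximum review length in tokens
EMBEDDING_DIM = 300      # embedding width, giving the 40 x 300 input matrix

cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),               # 40 x 300
    tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),   # 40 x 64
    tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),   # 40 x 32
    tf.keras.layers.MaxPooling1D(pool_size=3),                          # 13 x 32
    tf.keras.layers.Conv1D(16, 3, padding="same", activation="relu"),   # 13 x 16
    tf.keras.layers.Conv1D(8, 3, padding="same", activation="relu"),    # 13 x 8
    tf.keras.layers.GlobalAveragePooling1D(),                           # 1 x 8
    tf.keras.layers.Dense(1, activation="sigmoid"),                     # positive / negative
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```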
Chapter 5
Results and Analysis
In this chapter, we show the results of the Literature Review, the Experiments, and
the analysis performed in response to the research questions.
1. Article: Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information [42]. Results: CNN was the approach used for the study on Twitter sentiment analysis. The dataset utilized was from the SemEval 2016 workshop, and the goal of the experiment was to extract features based on information about user behavior.
2. Article: Sentiment analysis through recurrent variants latterly on convolutional neural network of Twitter [9]. Results: Sentiment analysis using recent recurrent variants was the focus of this work. CNN and RNN are the techniques employed in an effort to create domain-specific word embeddings on Twitter.
3. Article: Big Data: Deep Learning for financial sentiment analysis [2]. Results: This paper's emphasis was deep learning for the analysis of financial sentiment. LSTM, Word2vec, and CNN were among the techniques used. These were applied to the StockTwits dataset with the goal of enhancing the effectiveness of the sentiment analysis for StockTwits.
4. Article: Sentiment Analysis of a document using deep learning approach and decision trees [45]. Results: Google's Word2Vec-aided deep learning sentiment analysis. Preprocessing extracts characteristics initially. Word2Vec uses CBOWs to forecast the current word and skip-grams to anticipate the surrounding words. Data is trained using an Elman-type RNN and clustered. Deep learning outperformed CBOW, although the accuracy difference was small.
Figure 5.1: Accuracy values of the models with TF-IDF and WordEmbedding
The first performance metric, accuracy, is shown in Figure 5.1, where the different portions of the data follow the same pattern for both TF-IDF and Word Embedding, with the exception of CNN with TF-IDF, which rises sharply as it gets closer to the last 10%. CNN's accuracy with TF-IDF was 0.7 most of the time and 0.8 at the end. With Word Embedding, CNN consistently achieved an accuracy of around 0.8. Both TF-IDF and Word Embedding gave DNN an accuracy of between 0.75 and 0.8. With an average accuracy of around 0.55, RNN with TF-IDF performed the worst. With an accuracy of over 0.8, RNN with Word Embedding performed better than all the others.
Figure 5.2: Recall values of the models with TF-IDF and WordEmbedding
Both TF-IDF and Word Embedding gave DNN a recall value between 0.75 and 0.8. CNN performed slightly better with Word Embedding, with a recall value closer to 0.8, compared to TF-IDF, which was close to 0.7. RNN with Word Embedding was again the best performer, with a value above 0.8; however, it had a very nonlinear trend with TF-IDF, as seen in Figure 5.2.
Figure 5.3: Precision values of the models with TF-IDF and WordEmbedding
Figure 5.3 shows that Word Embedding has performed consistently with CNN, RNN, and DNN, with an approximate precision of 0.8. TF-IDF was able to equal this value with DNN, but it failed badly with RNN, scoring a dismal 0.5. CNN performed decently, with precision ranging from 0.7 to 0.8.
Figure 5.4: F-Score values of the models with TF-IDF and WordEmbedding
Figure 5.5: AUC values of the models with TF-IDF and WordEmbedding
Figures 5.4 and 5.5 illustrate a similar pattern for F-Score and AUC. The average
value for Word Embedding is around 0.8, with RNN doing the best, followed by CNN
and DNN. DNN has the highest TF-IDF score of 0.75, followed closely by CNN with
0.7. RNN has performed badly once again, with an AUC value of 0.55 and an F-Score
of 0.6.
Figure 5.6: Performance measures of the Recurrent Neural Network with each of the
Word Embedding Technique
Figures 5.1–5.6 clearly demonstrate the superior efficiency of the models when utilizing word embedding versus TF-IDF for all studied criteria. This increase is particularly noteworthy for the Recurrent Neural Network, which is the algorithm that produces the best results when combined with word embedding. In contrast, RNN is the least effective of the three algorithms examined when combined with TF-IDF. Figure 5.6 illustrates the metric values produced by the RNN models.
In the graphs above, we can also see that when it comes to word embedding, there are no big differences between the values of the evaluation measures for the three deep learning methods. However, when it comes to TF-IDF, the differences between the three methods are noticeable. Regarding the size of the dataset, its effect on the outcomes is negligible for word embedding but somewhat stronger and inconsistent for the TF-IDF approach.
Based on the study of the Twitter dataset's findings, we can conclude that word embedding is a better approach than TF-IDF. Furthermore, its use would allow us to deal with a smaller set of data, representing fifty percent of the overall sample, at a reduced computing cost and with negligible differences in the outcomes. The tables below show the results for the datasets used.
Metrics      TF-IDF                      word2vec
             CNN     DNN     RNN         CNN     DNN     RNN
Accuracy     0.7563  0.7548  0.5432      0.8001  0.7702  0.815
Recall       0.7321  0.7423  0.7623      0.8012  0.7865  0.8241
Precision    0.7366  0.748   0.7635      0.8023  0.7845  0.8269
F Score      0.7542  0.764   0.6412      0.8074  0.7888  0.818
AUC          0.754   0.746   0.7557      0.8006  0.7875  0.8214
Metrics      TF-IDF                      word2vec
             CNN     DNN     RNN         CNN     DNN     RNN
Accuracy     0.6651  0.6924  0.5421      0.7142  0.7012  0.7567
Recall       0.6678  0.6932  0.8421      0.7241  0.7004  0.8014
Precision    0.6623  0.7012  0.4327      0.7123  0.7023  0.7423
F Score      0.678   0.6978  0.5874      0.7145  0.7088  0.7784
AUC          0.6642  0.6933  0.5023      0.7414  0.7023  0.762
Metrics      TF-IDF                      word2vec
             CNN     DNN     RNN         CNN     DNN     RNN
Accuracy     0.7124  0.7542  0.5062      0.7541  0.7325  0.7247
Recall       0.7244  0.7321  0.6231      0.8102  0.7294  0.7384
Precision    0.7144  0.7655  0.5541      0.7321  0.7325  0.7221
F Score      0.7144  0.7452  0.5210      0.7622  0.7322  0.7215
AUC          0.7211  0.7451  0.501       0.7514  0.7358  0.7214
Metrics      TF-IDF                      word2vec
             CNN     DNN     RNN         CNN     DNN     RNN
Accuracy     0.8122  0.8412  0.5594      0.8547  0.835   0.864
Recall       0.7954  0.8321  0.4512      0.8324  0.8365  0.8547
Precision    0.8255  0.8411  0.6021      0.8542  0.8369  0.8632
F Score      0.8011  0.8423  0.4523      0.8471  0.8214  0.874
AUC          0.8114  0.8544  0.5513      0.8541  0.8331  0.8641
Metrics      TF-IDF                      word2vec
             CNN     DNN     RNN         CNN     DNN     RNN
Accuracy     0.7921  0.8423  0.5741      0.8142  0.8014  0.8563
Recall       0.7412  0.8654  0.5541      0.8214  0.7714  0.8741
Precision    0.8241  0.8365  0.5847      0.8102  0.8001  0.8475
F Score      0.7845  0.8425  0.5632      0.8102  0.7821  0.8541
AUC          0.795   0.8475  0.5741      0.8147  0.7956  0.8547
The findings drawn from the study of the Twitter dataset are further validated by the results of the additional datasets. In general, the pairing of RNN with word embedding exhibits the best performance, although there are outliers. These occur for "Reddit", where the values of all metrics, barring recall, are slightly greater for DNN with TF-IDF than they are for RNN with word2vec. CNN with Word2vec provided the top results for Recall, Precision, and AUC for Reddit. These are minuscule distinctions between the biggest and smallest datasets. Similarly, as stated before, we can confirm that word embedding is a more suitable approach than TF-IDF for doing sentiment analysis, in spite of the small gains observed with TF-IDF for particular datasets.
The tables demonstrate that TF-IDF, which generates less accurate models, consumes more computing time than word embedding. This is another reason why the latter strategy is the most recommended. With both TF-IDF and word embeddings, RNN is the more time-consuming approach. Given that the advances of RNN relative to CNN and DNN are not particularly substantial in the latter situation, the usage of DNN and CNN might be seen as more acceptable when reducing computational cost is a priority.
When comparing the Deep Neural Network and Convolutional Neural Network models, it is clear that CNN requires more time to process data but yields superior assessment metrics.
5.4 Observations
Based on the Performance Metrics, we highlight some of the observations of the
sentiment analysis techniques.
• When word embedding is used, the Recurrent Neural Network model provides the best level of dependability; however, its computing time is also the largest. When performing sentiment analysis on the tweet and review datasets, a Recurrent Neural Network with TF-IDF takes much more time than the other models, and the accuracy of the findings is only around half as good, at around 50 percent.
• The DNN model is easy to construct and generates results in a short amount of time, around one minute for the majority of datasets, with the exception of the Twitter dataset, for which the model required twelve minutes to generate the results. Even though the model can be trained in a short amount of time, its accuracy is only satisfactory (between 75% and 80%) across all of the validated datasets, which include tweets and reviews.
• The CNN model may also be trained and evaluated quickly, although it may be a little less fast than DNN in this regard. The model achieves a greater level of accuracy (above 80%) when applied to the tweet data as well as the review datasets.
Chapter 6
Discussion
In this part, we summarize the acquired findings and how they contribute to answering the research questions. We also explore aspects that contradict the findings.
RQ1 is answered via the systematic literature review discussed in 5.1. The SLR assisted us in identifying suitable algorithms for our research question. Popular strategies derived from the SLR have aided in identifying the optimal algorithm for addressing the remaining research questions.
The experiment and the results presented in 5.2 answer RQ2: 'Which combination of word embedding performs the best with the deep learning model?' The experiment has shown that before feeding text data (tweets, reviews) into a deep learning model, it must be converted into a numeric vector using TF-IDF or word embedding. The results produced by TF-IDF are inferior to those produced by word embedding. Moreover, the TF-IDF approach used with the RNN model provides less accurate findings. However, when RNN is combined with word embedding, the outcomes are much improved. Future research may investigate how these and other strategies might be improved to provide even better outcomes.
The experiment and the results presented in 5.3 answer RQ3: 'Which deep learning model has the best processing time?' The experiments on sentiment analysis included three deep learning models (RNN, DNN, and CNN). It was discovered that the CNN model provides the optimal balance between processing speed and accuracy of output. Although the RNN model was the most accurate when employed alongside word embedding, its processing time was ten times greater compared to the CNN model. The RNN model is ineffective when combined with the TF-IDF approach, and its much longer processing time does not provide significantly superior results. DNN is a straightforward deep learning model with average processing times and outcomes. Continued studies on deep learning models may concentrate on improving the trade-off between the accuracy of the findings and the processing time.
Chapter 7
Conclusions and Future Work
7.1 Conclusion
In this study, we present the fundamentals of deep learning models and associated approaches that have been applied to sentiment analysis of social network data. We transformed input data using TF-IDF and word embedding before feeding it to deep learning models. DNN, CNN, and RNN architectures were investigated and integrated with TF-IDF and word embedding for sentiment analysis. We ran a series of experiments to test DNN, CNN, and RNN models on various datasets, including tweets and reviews, covering diverse subject matter. Additionally, we addressed relevant studies in the field. This information, together with the outcomes of our experiments, provides us with a comprehensive understanding of applying deep learning models to sentiment analysis and integrating these models with text preparation approaches.
CNN, DNN, and hybrid techniques were found to be the most popular models for sentiment analysis after a study of the relevant literature. Another finding derived from the study was that popular approaches, such as RNN and CNN, are individually evaluated on various datasets in these papers, but there is no comparative analysis of these techniques. In addition, the majority of articles report outcomes in terms of accuracy without regard for processing time.
The experiments that were carried out as part of this study were planned with the intention of contributing to filling in the gaps stated before. We investigated the effects of a variety of datasets, feature extraction methods, and deep learning models, with a particular emphasis on the problem of sentiment analysis. When it comes to performing a sentiment analysis, the findings indicate that it is preferable to use a combination of deep learning algorithms and word embedding rather than TF-IDF. The trials also showed that CNN works better than other models and strikes a decent balance between accuracy and the amount of processing time required. In most datasets, the RNN has a reliability that is somewhat better than that of the CNN; nevertheless, its computing time is much greater. The efficiency of
The primary focus can be on investigating hybrid approaches, which involve the combination of multiple models and techniques in order to improve the accuracy of sentiment classification attained by the single models while concurrently decreasing the amount of computational effort required. The purpose of this is to broaden the scope of the comparative research so that it incorporates not only new methodologies but also new kinds of data [30, 31]. As a result, the dependability and processing speed of hybrid models would be assessed using a variety of data, including the status updates, comments, and news found on social media platforms. We also intend to tackle the issue of aspect-based sentiment analysis to get a more in-depth understanding of user feelings by linking them with certain characteristics or subjects. This is of tremendous importance to a vast number of businesses since it enables them to collect in-depth feedback from customers and, as a result, determine which areas of their goods or services need to be enhanced [32].
References
[1] K. Jain and S. Kaushal, "A Comparative Study of Machine Learning and Deep
Learning Techniques for Sentiment Analysis," 2018 7th International Conference
on Reliability, Infocom Technologies and Optimization (Trends and Future Direc-
tions) (ICRITO), 2018, pp. 483-487, doi: 10.1109/ICRITO.2018.8748793.
[2] Sohangir, S.; Wang, D.; Pomeranets, A.; Khoshgoftaar, T.M. Big Data: Deep
Learning for financial sentiment analysis. J. Big Data 2018, 5, 3
[3] Ain, Q.T.; Ali, M.; Riaz, A.; Noureen, A.; Kamran, M.; Hayat, B.; Rehman, A.
Sentiment analysis using deep learning techniques: A review. Int. J. Adv. Comput.
Sci. Appl. 2017, 8, 424.
[4] Jangid, H.; Singhal, S.; Shah, R.R.; Zimmermann, R. Aspect-Based Financial
Sentiment Analysis using Deep Learning. In Proceedings of the Companion of the
The Web Conference 2018 on The Web Conference, Lyon, France, 23–27 April
2018; pp. 1961–1966
[5] H. A. Shehu et al., "Deep Sentiment Analysis: A Case Study on Stemmed Turkish
Twitter Data," in IEEE Access, vol. 9, pp. 56836-56854, 2021, doi: 10.1109/AC-
CESS.2021.3071393.
[6] Kraus, M.; Feuerriegel, S. Sentiment analysis based on rhetorical structure theory:
Learning deep neural networks from discourse trees. Expert Syst. Appl. 2019, 118,
65–79.
[7] Singhal, P.; Bhattacharyya, P. Sentiment Analysis and Deep Learning: A Survey;
Center for Indian Language Technology, Indian Institute of Technology: Bombay,
Indian, 2016.
[8] Erenel, Z.; Adegboye, O.R.; Kusetogullari, H. A New Feature Selection
Scheme for Emotion Recognition from Text. Appl. Sci. 2020, 10, 5351.
https://doi.org/10.3390/app10155351
[9] Abid, F.; Alam, M.; Yasir, M.; Li, C.J. Sentiment analysis through recurrent vari-
ants latterly on convolutional neural network of Twitter. Future Gener. Comput.
Syst. 2019, 95, 292–308
[10] Aggarwal, C.C. Neural Networks and Deep Learning;
[11] “Clustering — scikit-learn 0.24.2 documentation.” [Online]. Available: https://www.scikit-yb.org/en/latest/api/cluster/elbow.html
[12] Ajay Shrestha and Ausif Mahmood. “Review of deep learning algorithms and
architectures”. In: IEEE Access 7 (2019), pp. 53040–53065.
[13] Shanshan Yu, Jindian Su, and Da Luo. “Improving bert-based text classification
with auxiliary sentence and domain knowledge”. In: IEEE Access 7 (2019), pp.
176600– 176612.
[14] Mickel Hoang, Oskar Alija Bihorac, and Jacobo Rouces. “Aspect-based sen-
timent analysis using bert”. In: Proceedings of the 22nd Nordic Conference on
Computational Linguistics (2019), pp. 187–196.
[15] Lei Zhang, Shuai Wang, and Bing Liu. “Deep learning for sentiment analysis:
A survey”. In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Dis-
covery 8.4 (2018), e1253.
[16] Vineeth G Nair. Getting started with beautiful soup. Packt Publishing Ltd, 2014
[17] S Chris Colbert et al. “The NumPy array: a structure for efficient numerical
computation”. In: Computing in Science Engineering. Citeseer. 2011.
[18] Aurélien Géron. Hands-on machine learning with Scikit-Learn, Keras, and Ten-
sorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly
Media, 2019
[19] Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey.
WIREs Data Min. Knowl. Discov. 2018, 8, e1253.
[20] Britz, D. Recurrent Neural Networks Tutorial, Part 1–Introduction to RNNs. Available online: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ (accessed on 12 March 2020).
[21] Ruangkanokmas, P.; Achalakul, T.; Akkarajitsakul, K. Deep Belief Networks
with Feature Selection for Sentiment Classification. In Proceedings of the 2016
7th International Conference on Intelligent Systems, Modelling and Simulation
(ISMS), Bangkok, Thailand, 25–27 January 2016; pp. 9–14.
[22] Vateekul, P.; Koomsubha, T. A study of sentiment analysis using deep learning
techniques on Thai Twitter data. In Proceedings of the 2016 13th International
Joint Conference on Computer Science and Software Engineering (JCSSE), Khon
Kaen, Thailand, 13–15 July 2016; pp. 1–6
[23] Ghosh, R.; Ravi, K.; Ravi, V. A novel deep learning architecture for sentiment
classification. In Proceedings of the 2016 3rd International Conference on Recent
Advances in Information Technology (RAIT), Dhanbad, India, 3–5 March 2016;
pp. 511–516.
[24] Bhavitha, B.; Rodrigues, A.P.; Chiplunkar, N.N. Comparative study of machine
learning techniques in sentimental analysis. In Proceedings of the 2017 Interna-
tional Conference on Inventive Communication and Computational Technologies
(ICICCT), Coimbatore, India, 10–11 March 2017; pp. 216–221
[25] Salas-Zárate, M.P.; Medina-Moreira, J.; Lagos-Ortiz, K.; Luna-Aveiga, H.;
Rodriguez-Garcia, M.A.; Valencia-García, R.J.C. Sentiment analysis on tweets
about diabetes: An aspect-level approach. Comput. Math. Methods Med. 2017,
2017. [CrossRef] [PubMed]
[26] Zhang, X.; Zheng, X. Comparison of Text Sentiment Analysis Based on Machine
Learning. In Proceedings of the 2016 15th International Symposium on Parallel and
Distributed Computing (ISPDC), Fuzhou, China, 8–10 July 2016; pp. 230–233.
[27] Malik, V.; Kumar, A. Communication. Sentiment Analysis of Twitter Data
Using Naive Bayes Algorithm. Int. J. Recent Innov. Trends Comput. Commun.
2018, 6, 120–125.
[28] Firmino Alves, A.L.; Baptista, C.d.S.; Firmino, A.A.; Oliveira, M.G.d.; Paiva,
A.C.D. A Comparison of SVM versus naive-bayes techniques for sentiment analysis
in tweets: A case study with the 2013 FIFA confederations cup. In Proceedings of
the 20th Brazilian Symposium on Multimedia and the Web, João Pessoa, Brazil,
18–21 November 2014; pp. 123–130.
[29] Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and appli-
cations: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113. [CrossRef]
[30] Jain, A.P.; Dandannavar, P. Application of machine learning techniques to
sentiment analysis. In Proceedings of the 2016 2nd International Conference on
Applied and Theoretical Computing and Communication Technology (iCATccT),
Karnataka, India, 21–23 July 2016; pp. 628–632.
[31] Tang, D.; Qin, B.; Liu, T. Deep learning for sentiment analysis: Successful ap-
proaches and future challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
2015, 5, 292–303.
[32] Sharef, N.M.; Zin, H.M.; Nadali, S. Overview and Future Opportunities of Sen-
timent Analysis Approaches for Big Data. JCS 2016, 12, 153–168
[33] Qian, J.; Niu, Z.; Shi, C. Sentiment Analysis Model on Weather Related Tweets
with Deep Neural Network. In Proceedings of the 2018 10th International Confer-
ence on Machine Learning and Computing, Macau, China, 26–28 February 2018;
pp. 31–35
[34] Roshanfekr, B.; Khadivi, S.; Rahmati, M. Sentiment analysis using deep learning
on Persian texts. In Proceedings of the 2017 Iranian Conference on Electrical
Engineering (ICEE), Tehran, Iran, 2–4 May 2017; pp. 1503–1508.
[35] Ramadhani, A.M.; Goo, H.S. Twitter sentiment analysis using deep learning
methods. In Proceedings of the 2017 7th International Annual Engineering Seminar
(InAES), Yogyakarta, Indonesia, 1–2 August 2017; pp. 1–4
[36] Paredes-Valverde, M.A.; Colomo-Palacios, R.; Salas-Zárate, M.D.P.; Valencia-
García, R. Sentiment analysis in Spanish for improvement of products and services:
A deep learning approach. Sci. Program. 2017, 2017.
[37] Tang, D.; Zhang, M. Deep Learning in Sentiment Analysis. In Deep Learning
in Natural Language Processing; Springer: Berlin, Germany, 2018; pp. 219–253.
[38] Araque, O.; Corcuera-Platas, I.; Sanchez-Rada, J.F.; Iglesias, C.A. Enhancing
deep learning sentiment analysis with ensemble techniques in social applications.
Expert Syst. Appl. 2017, 77, 236–24
[39] Liu, J.; Chang, W.-C.; Wu, Y.; Yang, Y. Deep learning for extreme multi-
label text classification. In Proceedings of the 40th International ACM SIGIR