CSIT LRS Modified

SENTIMENT ANALYSIS USING MACHINE LEARNING

A Literature Survey Report

Submitted by

Ramesh Prasad Bhatta

RMS Id: 23PHD0061

Under the guidance of

Dr. Akhtar Husain

Associate Professor

Department of Computer Science and Information Technology

MJP Rohilkhand University, Bareilly- 243 006 (UP)


JUNE, 2024
Declaration (On university letter head)

(To be attached with the LSR by the candidate)

1. I understand what plagiarism is and I am aware of the UGC plagiarism policy in
this regard.

2. I declare that this report is my own original work. Where other people's work has
been used (either from a printed source, Internet, or any other source), this has been
properly acknowledged and referenced, and given due credit.

3. This work has not previously been produced or submitted, by me or by any other
person, to any other institution for the award of the Pre-Ph.D. course work.

4. I declare that the similarity percentage of this synopsis is not more than 20%. In
case of any discrepancy, the report evaluation may be cancelled at any step.

Name of the Scholar: Ramesh Prasad Bhatta

Registration/Enrolment: 23004489

Signature:

Date:

Supervisor

(Signature with stamp)

Institution Name:
Introduction

Sentiment analysis is a method for investigating public sentiment. Many social media
platforms on the Internet let people express their views on sociological, cultural,
political, religious, and many other topics of interest, and this information is the raw
material for sentiment analysis. Sentiment analysis helps to assess people's opinions,
to guide society in the right direction, and to support authorities in making and
correcting decisions. In this review I discuss general methods for sentiment analysis
and the results reported so far in the sectors of concern. The survey also explores
sentiment analysis using Deep Learning models to extract emotions from text data,
which aim to overcome the limitations of Machine Learning models in handling large
datasets and feature extraction tasks.

The aim of this survey is to present sentiment analysis, the methods and materials used
in it, the features and datasets used, the domains of study, and a comparison between
the various methods.

This study introduces ABCDM, a model combining CNN and RNN with attention
mechanisms for sentiment analysis tasks. ABCDM uses bidirectional information flow
and attention mechanisms to capture contextual information effectively. With the
escalating pace at which data is being generated by online users across different
platforms, it is imperative for Defense and other Government entities to analyze and
leverage this data in order to understand public sentiment. This will enable these
entities to govern their activities effectively and determine appropriate courses of
action, which is of utmost significance during critical national events.

With the help of tools like sentiment analysis and the proliferation of user-generated
material on the internet, marketers can now obtain insights into how customers feel
about their products [2]. By identifying attitudes in product reviews, marketers can
better target customers who require further attention, increasing customer satisfaction
and sales—all of which are ultimately advantageous to businesses [4].
Thus, by combining the findings of various secondary studies, the goal of this study is
to gain a deeper understanding of the sentiment analysis research field.

Systematic Literature Review (SLR): The objectives of a systematic literature review
are to find pertinent primary studies, gather the data needed to address the research
questions, and synthesize that data. It employs a clear process and conducts an
objective, repeatable assessment of the literature [5].
Sentiment classification has a wide range of uses since it makes it possible to
automatically analyze massive amounts of textual data and provides insightful
information that can guide decision-making. Sentiment analysis, also known as opinion
mining, is a technique used to examine public opinion in textual data [6]. Researchers
are increasingly using deep learning techniques such as recurrent neural networks
(RNNs) because of how well they can handle multiple problems at once [7]. RNNs, and
especially the Long Short-Term Memory (LSTM) architecture, are appropriate for
sentiment analysis since they have been used effectively in natural language processing
tasks [8]. The goal of this survey is to design a machine learning and deep learning
system for sentiment analysis of expressions. The survey highlights the value of
sentiment analysis in interpreting the beliefs and feelings found in textual data.


Sentiment analysis involves the extraction and analysis of subjective information
from textual data to determine the sentiment or opinion expressed by the author. This
involves natural language processing and the use of computational methods to
understand and classify the sentiments contained in the text [17]. With the advances in
technology and the widespread use of social media and online review platforms, it has
become increasingly important to understand user opinions and sentiments regarding
a particular product, service or issue. This is caused by several factors.

The sentiment analysis process involves collecting text or data related to a particular
topic or entity, such as product reviews, social media posts, or news articles [26].
Then, the text is analyzed computationally using various techniques and algorithms to
identify and categorize the feelings expressed in the text. Three basic categories are
typically used to categorize emotions: positive, negative, and neutral [25].

Sentiment analysis, also known as opinion mining, is an important task in natural
language processing and data mining. The main purpose of sentiment analysis is to
identify, collect, and understand the opinions, attitudes, and emotions contained in the
text or data being analyzed. It aims to extract subjective information from texts, which
is then used to understand individual or group views or responses to a topic, product,
service, brand, or event [23].

Sentiment analysis plays an important role in understanding and interpreting user
opinions and sentiments. Using natural language processing techniques and
computational methods, sentiment analysis helps decompose vast textual data into
useful information that can be used for better decision-making in various aspects of
business, marketing, and product and service development [22].

In the reviewed studies I found that sentiment analysis has the following scopes, and
many other fields are still to be discovered.

1. Customer experience and feedback analysis: Sentiment analysis can be used to
analyze customer reviews, social media posts, and other customer-generated
content to understand customer sentiment and identify areas for improvement.
2. Brand monitoring and reputation management: Companies can use sentiment
analysis to monitor their brand's reputation and track how their products or
services are perceived by the public.
3. Political and social analysis: Sentiment analysis can be used to analyze
political discourse, social movements, and public opinion on various issues.
4. Financial and market analysis: Sentiment analysis can be used to analyze
financial news, earnings reports, and social media discussions to predict
market trends and investor sentiment.
5. Product and service improvement: Sentiment analysis can be used to identify
customer pain points and areas for improvement in product or service
offerings.
6. Personalization and recommendation systems: Sentiment analysis can be used
to personalize content and make recommendations based on user preferences
and sentiments.

Thus the importance of sentiment analysis is increasing day by day, and the challenges
are increasing with it. It has a direct impact on business, politics, society, and every
other area of life in which sentiment can be assessed. I observed the following impacts
in the literature survey.

1. Broader user participation

In the contemporary digital age, numerous individuals utilize social media and online
review platforms for the dissemination of their experiences, viewpoints, and
assessments regarding diverse products or services. The volume of content produced
by these users is substantial, and deciphering the sentiments encapsulated in such
content can furnish a company or organization with invaluable perspectives.

2. Direct impact on reputation

User reviews and opinions can have a direct impact on the reputation of a product,
service or organization. Positive reviews can enhance brand image and influence
purchasing decisions, while negative reviews can damage reputation and lead to
decreased sales. Therefore, accurately understanding user sentiment and responding
promptly to it is very important.

3. Product and service improvement opportunities

Sentiment analysis can provide valuable insight into the strengths and weaknesses of a
product or service from a user perspective. By understanding user sentiment in depth,
companies can identify areas of improvement and take steps to improve the quality of
their products or services, as well as increase customer satisfaction.

4. Monitoring issues and trends

Social media and online review platforms are also valuable sources of information to
monitor current issues and trends in society. By monitoring user sentiment regarding
social, political, or environmental issues, organizations can better understand public
views and responses, and direct their strategies and policies according to public needs
and expectations.

The objectives of my study are as follows:

1. To find the scope and features of sentiment or opinion analysis.

2. To study and compare the methods used in sentiment analysis.

3. To study the domains and datasets used in sentiment analysis.

4. To find the research gap in the studied domains.

Body

Sentiment analysis is the field that deals with and analyzes people's opinions,
sentiments, attitudes, and behavioral responses to certain events or incidents, based on
the written language available. It has become one of the most active areas of research
due to the surge in fields like machine learning and deep learning, and the blending of
such areas with the earlier statistical methods of Natural Language Processing. The
sudden bloom in sentiment analysis comes with the growing interest and active
participation in social media: reviews of various topics and arts, forum discussions
about prevalent issues, blogs and micro-blogs for sharing opinions and information, and
Twitter and other social networks where people talk about what is trending locally or
globally.

2.1 Sentiment classification

Sentiment classification is a natural language processing task that involves
determining the sentiment or emotional tone of a given piece of text, such as a review,
tweet, or message. It is frequently used in applications like customer service, social
media analysis, and product feedback analysis. The basic idea behind sentiment
classification is to train a machine learning model to recognize patterns in text that are
associated with positive, negative, or neutral sentiment. This can be done using
various techniques, such as:

Rule-based Approaches: These methods use a set of predefined rules or lexicons to
determine the sentiment of a text based on the presence of specific words or phrases.

Machine Learning Approaches: These methods use supervised learning algorithms,
such as logistic regression, support vector machines, or deep neural networks, to learn
patterns in labeled sentiment data and make predictions on new, unseen text.

Hybrid Approaches: These methods combine rule-based and machine learning
techniques to leverage the strengths of both approaches.
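
The rule-based approach described above can be illustrated with a minimal sketch. The
lexicon, the negator list, and the function name classify are hypothetical and chosen
only for illustration; a real system would rely on a published sentiment lexicon and
far richer rules.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (illustrative values only).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
# Hypothetical list of negators that flip the polarity of the next word.
NEGATORS = {"not", "no", "never"}


def classify(text: str) -> str:
    """Label text as positive, negative, or neutral from summed lexicon scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        # Flip the polarity when the previous token is a negator ("not good").
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ("The battery is great and I love it",
                   "This is not good",
                   "The parcel arrived on Monday"):
        print(sample, "->", classify(sample))
```

A hybrid approach could, for example, feed the lexicon score computed here as one
additional feature into a supervised classifier.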

The term "polarity determination" is frequently and incorrectly used as if it were
synonymous with sentiment analysis. It is only a subtask of sentiment classification,
whose goal is to determine the sentiment polarity of each text document. Polarity is
often categorized as either positive or negative [3]. Neutral is a third class that is
included in some studies.

Extraction of domain-invariant characteristics whose distribution in the source
domain is similar to that of the target domain is a widely used technique [7].
Information specific to the target domain can be added to the model. Similar
procedures are used for cross-language analysis, such as training a model on a dataset
in the source language and testing it on a language where there is less data, for example
by translating the target language to the source language first. One of the issues that
needs to be resolved for sentiment analysis is word polarity ambiguity. One study
claimed that, using the Bayesian approach, opinion-level context can help resolve
sentiment word polarity ambiguity, and [13] demonstrated that for word polarity
disambiguation, the information retrieval-based model is an alternative to machine
learning-based techniques.

2.1.1 Classification of Subjectivity

The goal of subjectivity categorization is to identify any instances of subjectivity
within the text [12]. Its aim is to filter out unwanted objective data so that it is not
passed on for further processing (Kamal 2013). It is frequently regarded as the initial
phase of sentiment analysis. Subjectivity categorization looks for subjective cues,
emotional phrases, or subjective ideas like "better," "expensive," and "easy."

2.1.2 Identifying opinion spam

The increasing prevalence of review and e-commerce websites has made opinion
spam identification a major problem in sentiment analysis. Opinion spam consists of
well-written remarks that either promote or discredit a product; such remarks are also
known as misleading or fake reviews. Opinion spam detection looks at three
characteristics that can indicate a fake review: the review's content, its metadata, and
real-world knowledge of the product [1].

2.1.3 Detection of implicit language

Implicit language includes irony, sarcasm, and humor. This type of communication is
ambiguous and vague, and can occasionally be difficult even for humans to recognize.
Nevertheless, a sentence's implicit meaning has the power to entirely change its
polarity. The goal of implicit language detection is frequently to comprehend
event-related facts. For instance, the factual term "pain" carries a negative polarity in
the statement "I love pain." Irony, humor, and sarcasm can all be seen in the
contradiction between the subjective term "love" and the factual word "pain."
Conventional techniques for identifying implicit language involve examining cues
like emoticons, laughing expressions, and heavy use of punctuation [2].
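
As a rough sketch of the cue-based techniques mentioned above, the following counts
emoticons, laughing expressions, and heavy punctuation in a text. The patterns and
the function name implicit_cues are assumptions made for illustration only; they are
far from a complete sarcasm or irony detector.

```python
import re

# Illustrative cue patterns (assumed, not taken from any cited study).
CUE_PATTERNS = {
    # Simple western emoticons such as :) ;-) :D :P
    "emoticon": re.compile(r"[:;]-?[)(DPp]"),
    # Laughing expressions such as "haha", "ahahaha", "lol", "lmao"
    "laughter": re.compile(r"\b(?:a?ha(?:ha)+|lol|lmao)\b", re.IGNORECASE),
    # Runs of exclamation/question marks, or an ellipsis of three or more dots
    "heavy_punctuation": re.compile(r"[!?]{2,}|\.{3,}"),
}


def implicit_cues(text: str) -> dict:
    """Count how often each surface cue occurs in the text."""
    return {name: len(pattern.findall(text))
            for name, pattern in CUE_PATTERNS.items()}


if __name__ == "__main__":
    print(implicit_cues("Oh great, another delay!!! hahaha ;)"))
    print(implicit_cues("The train leaves at noon."))
```

Such counts only signal that implicit language may be present; deciding whether the
polarity is actually reversed still needs the surrounding context.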

2.1.4 Extraction of aspects

Retrieving the target entity and its components from the document is known as aspect
extraction. A product, person, event, company, etc. can be the target entity [1]. For
fine-grained sentiment analysis, it is necessary to identify people's thoughts regarding
different aspects of a product [1]. Given that social media and blogs frequently lack
set themes for sentiment analysis, aspect extraction is particularly crucial in these
situations.
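
A minimal sketch of how extracted aspects can be paired with opinions is given below.
It assumes a hand-made list of aspect terms and a toy opinion lexicon (both
hypothetical) and simply ties each aspect to the opinion words found in the same
clause; actual aspect extraction methods are considerably more elaborate.

```python
import re

# Hypothetical aspect terms and opinion lexicon, for illustration only.
ASPECTS = {"battery", "screen", "price"}
OPINIONS = {"great": 1, "sharp": 1, "poor": -1, "dim": -1}


def aspect_sentiment(review: str) -> dict:
    """Return a polarity score per aspect term mentioned in the review."""
    results = {}
    # Split on punctuation and simple conjunctions so that an opinion word
    # is only linked to aspects that occur in the same clause.
    for clause in re.split(r"[.,;!?]|\bbut\b|\band\b", review.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        score = sum(OPINIONS.get(t, 0) for t in tokens)
        for t in tokens:
            if t in ASPECTS:
                results[t] = results.get(t, 0) + score
    return results


if __name__ == "__main__":
    print(aspect_sentiment("The battery is great but the screen is poor."))
```

In this example the review as a whole is mixed, while the per-aspect scores separate
the positive opinion about the battery from the negative one about the screen.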

2.2 Levels of analysis

Sentiment analysis can be implemented at the following three levels: document,
sentence, and aspect level. They are discussed in the following paragraphs.

Fig:1 Levels of analysis

2.2.1 Document-level

According to Wang et al. (2014), document-level analysis uses the entire text
document as its analytical unit. It is a straightforward task that assumes all of the
document's opinions come from one person. One of the challenges associated with
document-level analysis is that a text may present contradictory viewpoints in a variety
of ways, sometimes through implicit language [1]. Documents are usually processed at
the phrase or aspect level before the polarity of the whole text document is
determined.

2.2.2 Sentence-level

Sentence-level analysis, which is particularly useful for classifying subjectivity,
examines individual sentences within a text. Sentences in text documents are usually
either opinionated or not. Sentences inside a document are analyzed using subjectivity
categorization to determine whether they include facts, feelings, or opinions.
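
Sentence-level subjectivity categorization can be sketched as follows. The cue set
reuses the example cues "better," "expensive," and "easy" from the subjectivity
section; the naive sentence splitter and the function names are assumptions for
illustration, and a practical system would use a trained classifier instead.

```python
import re

# Example cues taken from the subjectivity section; the set is illustrative.
SUBJECTIVE_CUES = {"better", "expensive", "easy"}


def split_sentences(document: str) -> list:
    """Naively split a document after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def subjective_sentences(document: str) -> list:
    """Keep only the sentences that contain at least one subjective cue."""
    selected = []
    for sentence in split_sentences(document):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if tokens & SUBJECTIVE_CUES:
            selected.append(sentence)
    return selected


if __name__ == "__main__":
    doc = "The phone weighs 180 grams. It is easy to use. Sadly it is expensive."
    print(subjective_sentences(doc))
```

The objective first sentence is filtered out, so only opinionated sentences are passed
on to polarity classification.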

Fig: 2 Milestones of sentiment analysis research for the last decade

2.2.3 Aspect-level

Aspect-level analysis is a difficult subject in sentiment analysis. Its primary
objective is to examine feelings toward particular entities and their features within a
written document, rather than just the content's general attitude [25]. Another name
for it is feature- or entity-level analysis. A document's overall sentiment may be
categorized as positive or negative, yet different opinions can exist regarding
particular features of an item [1]. It is necessary to identify the characteristics of the
entity in order to measure aspect-level opinion. [17] claimed that because aspect-based
sentiment analysis extracts consumer sentiments in a clear manner, it is advantageous
for business managers. Additionally, they reported that detecting ironic expressions in
TripAdvisor reviews is still an unsolved issue, and that the labeling of reviews should
take user sentiment into account as well, since some people comment positively
alongside unfavorable user ratings and vice versa. [16] enhanced the LDA (Latent
Dirichlet Allocation) algorithm with semantic similarity for aspect-based sentiment
analysis and presented a new algorithm known as Sentic LDA. They concluded that, by
applying common-sense computing, this technique helps researchers move from
syntactic analysis to semantic analysis in aspect-based sentiment analysis [9] and
enhances the clustering process.

[9] claimed that categorization or classification alone is insufficient for sentiment
analysis and that a comprehensive strategy is necessary. They presented the problem as
a three-layer structure comprising the following 15 Natural Language Processing
(NLP) challenges:

Syntactic layer: lemmatization, POS tagging, text chunking, sentence boundary
disambiguation, and microtext normalization.

Semantics layer: word sense disambiguation, concept extraction, named entity
recognition, anaphora resolution, and subjectivity detection.

Pragmatics layer: polarity detection, aspect extraction, metaphor comprehension,
sarcasm detection, and personality recognition.

[9] describe three categories of approaches for sentiment analysis and affective
computing: knowledge-based techniques, statistical approaches (such as machine
learning and deep learning approaches), and hybrid techniques that combine the
statistical and knowledge-based approaches.

Fig: 3 Sentiment Analysis concepts overview

[1] gives an extensive review of sentiment analysis techniques, applications, and
issues. It discusses the value of sentiment analysis in obtaining and examining user
opinions from a variety of online platforms, including blogs and social media. The
survey is structured around the following elements: Level of Sentiment Analysis, Data
Collection, Feature Extraction, Feature Selection Method, General Methodology for
Sentiment Analysis, Sentiment Analysis Applications in Various Domains, and
Challenges in Sentiment Analysis. The article uses the Lexicon Based Approach,
Machine Learning Approach, and Hybrid Approach as its three primary methods for
sentiment analysis in text data. Sentiment analysis is investigated at several levels,
including the document, sentence, phrase, and aspect levels, with an emphasis on
comprehending structured and unstructured sentiments. The article also highlights
particular tools and methods for sentiment analysis, like the Wrapper approach, which
uses machine learning algorithms for feature selection, and FastText, an open-source
toolkit for text classification and word vectorization.

The paper does not investigate in depth the specific strategies for addressing the
challenges posed by informal writing styles and language variations, leaving room for
further exploration and research in these areas.

The review titled "Source detection of rumor in social network" [2] focuses on the
detection of rumors and misinformation in social networks, highlighting the importance
of identifying the sources to control the spread of false information. It discusses the
challenges of rumor diffusion in social networks due to the rapid sharing of unverified
information during events like natural disasters or epidemics. The paper categorizes
source detection approaches into single-source and multiple-source detection,
emphasizing the need for accurate and quick identification of rumor sources in various
application domains like disease outbreaks or virus spread. The rise of social
networking platforms has led to the widespread dissemination of misinformation, with
real-world consequences like fear and anxiety among the population, making it crucial
to mitigate the negative impacts of rumor diffusion on society. There is a wide
variation in the accuracy of current source detection methods, indicating the need for
further research and improvement in source detection approaches for rumors in social
networks.

[7], "Convolutional Neural Network for Image Classification", Johns Hopkins
University, Baltimore, MD 21218: Image sentiment analysis is one of the important
research areas, as people are now more accustomed to conversing through visual data.
The work is inspired by the emerging problem of image sentiment analysis and its
promising solution through deep learning techniques.

The growth of user-generated content on the Internet is mostly due to Web 2.0. The
thoughts, feelings, and lifestyles of the users are strongly tied to this content. As a
result, examining this user-generated data can help with decision-making and public
opinion tracking. One of the most often used text-based analytics applications is
sentiment analysis, which may be used to mine people's attitudes, feelings, evaluations,
and views about various situations, entities, subjects, events, and items [9]. Sentiment
analysis can be used to quantify the strength or weakness of emotions as well as
categorize them as positive, negative, or neutral in unstructured texts. These days,
sentiment analysis is extensively employed in a number of industries, including
services, business, finance, politics, and education. This analytical method has been
widely accepted not just among scientists but also among authorities, organizations,
and businesses [3]. It aids the decision-making of businesspeople, policymakers, and
employees. Sentiment analysis is made more challenging by the fact that unstructured
text makes up the majority of user-generated content. Researchers have been
investigating strategies to improve the precision of this kind of analysis since 2000.
The widespread use of social media platforms has facilitated global human connection.
The study areas, application domains, fundamental techniques, and technologies of
sentiment analysis are constantly evolving due to the rapid progress of technology. To
acquire a thorough grasp of an area, scholars can benefit from comparing and
evaluating publications from related fields. Numerous surveys on sentiment analysis
have been conducted [1]. The relationships between research methodologies and topics
in the field, how they have changed over time, and the boundaries within the field are,
nevertheless, not sufficiently discussed [3]. Researchers, particularly those who are
new to the subject, might benefit from such guidance in determining research
directions, avoiding repeated research, and better identifying and understanding the
current trends in this field [9].

The study by Schouten et al. concentrated on aspect-level sentiment analysis and
consolidated the methods of aspect-level sentiment analysis used prior to 2014,
including frequency-based, syntax-based, supervised machine learning, unsupervised
machine learning, and hybrid approaches. They came to the conclusion that the state of
the art had advanced past its infancy [8]. As deep learning technologies gained
popularity and significant advancements were achieved in their development,
sentiment analysis studies grew in number, and scholars began to focus more on
sentiment analysis techniques and procedures. Specifically, deep learning techniques
became the main topic of discussion among researchers [1]. At the sentence and
aspect/object levels of sentiment analysis, a variety of deep learning techniques, such
as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long
Short-term Memory (LSTM), were examined [1].

[8] evaluated the effectiveness of deep learning techniques on particular datasets and
suggested that models such as Bidirectional Encoder Representations from
Transformers (BERT), sentiment-specific word embedding models, cognitive-based
attention models, and commonsense knowledge could be used to enhance
performance.

2.3 Approaches and Methodology

Sentiment analysis has significantly influenced research in various fields, and a
multitude of methodologies are available for implementing it. Ongoing research
continues to seek improved alternatives, which underscores how critical this process
is in the current context.

2.1 Machine Learning Approaches

Machine learning techniques first train an algorithm on a training set of data with
specific inputs and known outputs, so that it can later function with fresh, unseen
data [2]. The following are some of the most well-known machine learning-based
works:

2.1.1. Vector Machine Support

The Support Vector Machine (SVM) is a widely used algorithm for sentiment analysis,
the practice of identifying the sentiment or emotional tone of a textual document,
such as a product review, social media post, or customer feedback.

The SVM algorithm finds the best hyperplane to divide the data into distinct classes,
such as positive or negative sentiment. The data points nearest to the hyperplane are
referred to as the support vectors, and the method seeks to maximize the margin
between the hyperplane and these points. Because this method can handle
high-dimensional feature spaces, like the bag-of-words representation of text, and
because it is somewhat resilient against overfitting, it is especially useful for
sentiment analysis.
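The notions of margin and support vectors can be illustrated with a minimal sketch. It does not train an SVM; it only measures, for a hand-picked hyperplane, how far each labeled point lies from it and which points are nearest. The two-dimensional features (counts of positive and negative words), the data points, and the hyperplane are hypothetical values chosen for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support_vectors(w, b, points, labels):
    """Return the geometric margin and the points that attain it."""
    # label * distance is positive only when a point lies on its own side.
    dists = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    margin = min(dists)
    support = [x for x, d in zip(points, dists) if math.isclose(d, margin)]
    return margin, support

# Hypothetical features: (count of positive words, count of negative words).
points = [(3, 0), (2, 1), (0, 3), (1, 2)]
labels = [1, 1, -1, -1]      # +1 = positive review, -1 = negative review
w, b = (1.0, -1.0), 0.0      # assumed candidate hyperplane x1 - x2 = 0

margin, support = margin_and_support_vectors(w, b, points, labels)
```

An SVM trainer would search over `w` and `b` for the hyperplane that makes this margin as large as possible while keeping every point on its correct side.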

Fig 4: (a) Linear classifier, (b) SVM illustration

2.1.2 N-gram Sentiment Analysis

In the domains of probability and linguistics, an n-gram is a contiguous sequence of
n elements from a given sequence of text or speech. Depending on the application,
the elements may be phonemes, syllables, letters, words, or base pairs. Usually, the
n-grams are gathered from a corpus of speech or text. N-grams are sometimes referred
to as shingles when the elements are words. The authors of [5] take the sentence as a
whole into consideration and use four different kinds of lexicons: exception
lexicons, aspect lexicons, sentiment phrase lexicons, and sentiment strength
lexicons.
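As a small illustration of the definition above, the following sketch extracts word-level n-grams from a tokenized sentence. The function name `ngrams` and the example sentence are assumptions made for this illustration and do not come from [5].

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the battery life is not good".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
# The bigram ("not", "good") keeps the negation that the unigram "good" loses.
```

This is one reason why combining 1-grams and 2-grams as features can help a sentiment classifier compared with single words alone.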

2.1.3 Naïve Bayes Method

Naïve Bayes is a probabilistic classifier that is primarily utilized when the training
set is small. It belongs to the family of simple probabilistic classifiers in machine
learning that are based on the Bayes theorem. The Bayes rule (1) determines the
conditional probability that an event X occurs given the evidence Y.

$$P(X \mid Y) = \frac{P(X)\,P(Y \mid X)}{P(Y)} \quad (1)$$

Therefore, in order to get the sentiment, the equation is changed to (2) [6].

$$P(\text{sentiment} \mid \text{sentence}) = \frac{P(\text{sentiment})\,P(\text{sentence} \mid \text{sentiment})}{P(\text{sentence})} \quad (2)$$

P(sentence | sentiment) is calculated as the product of P(token | sentiment) over the
tokens of the sentence [6], where each factor is given by (3).

$$P(\text{token} \mid \text{sentiment}) = \frac{\text{Count}(\text{this token in class}) + 1}{\text{Count}(\text{all tokens in class}) + \text{Count}(\text{all tokens})} \quad (3)$$

Here, adding 1 in the numerator and the count of all tokens in the denominator is
known as add-one or Laplace smoothing.
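The Naïve Bayes computation can be sketched in a few lines. This is a minimal illustration, not the implementation used in [6]: the toy training data are invented, and "count of all tokens" in the smoothing term is interpreted here as the number of distinct tokens (the vocabulary size), which is the usual reading of add-one smoothing. Log probabilities are summed instead of multiplying probabilities, to avoid numerical underflow.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label). Returns class counts, token counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    token_counts = {label: Counter() for label in class_counts}
    for tokens, label in docs:
        token_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    return class_counts, token_counts, vocab

def predict(tokens, class_counts, token_counts, vocab):
    """Return the label with the highest (unnormalized) log posterior."""
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        logp = math.log(class_counts[label] / total_docs)   # log P(sentiment)
        tokens_in_class = sum(token_counts[label].values())
        for t in tokens:
            # Add-one (Laplace) smoothing; vocabulary size is an assumed reading
            # of "count of all tokens".
            p = (token_counts[label][t] + 1) / (tokens_in_class + len(vocab))
            logp += math.log(p)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical toy training set.
docs = [
    ("great phone love it".split(), "pos"),
    ("love the screen".split(), "pos"),
    ("terrible battery hate it".split(), "neg"),
    ("hate the screen".split(), "neg"),
]
model = train(docs)
```

The denominator P(sentence) is omitted because it is the same for every class and does not change which class scores highest.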

4.Maximum Entropy Classifier

The Maximum Entropy (MaxEnt) classifier is a common machine learning approach for
Natural Language Processing (NLP) problems. This discriminative model seeks the
probability distribution that maximizes the entropy, subject to constraints given by
the training data. The main idea underlying the MaxEnt classifier is to estimate the
conditional probability of a target variable (such as a class label) given the
observed features. This probability is modeled as an exponential function of a linear
combination of the feature values, and the weights of the features are chosen during
the training phase. A Maximum Entropy (ME) classifier, also known as a conditional
exponential classifier, is parameterized by a set of weights that combine the joint
features generated from a feature set by an encoding. The encoding maps every pair of
feature set and label to a vector. Since ME classifiers extract a set of features from
the input, combine them linearly, and use the sum as the exponent, they fall under the
category of classifiers known as exponential or log-linear classifiers. If this method
is used in an unsupervised manner, Point-wise Mutual Information (PMI) is used to
determine the co-occurrence of a word with positive and negative words.
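The log-linear form described above can be sketched as follows. The feature names and weights are hypothetical; in practice the weights are fitted during training, which this sketch does not show.

```python
import math

def maxent_probabilities(features, weights):
    """P(label | features) under a log-linear (conditional exponential) model.

    features: dict mapping feature name -> value
    weights:  dict mapping label -> dict of feature name -> weight
    """
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exp_scores.values())
    return {label: e / z for label, e in exp_scores.items()}

# Hypothetical weights for two binary features.
weights = {
    "positive": {"contains_good": 1.5, "contains_not": -0.5},
    "negative": {"contains_good": -1.0, "contains_not": 1.0},
}
probs = maxent_probabilities({"contains_good": 1, "contains_not": 1}, weights)
```

Each label receives a linear score, the scores are exponentiated, and dividing by their sum turns them into a probability distribution over the labels.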

2.1.4 K-NN and Weighted K-NN

The foundation of the K-Nearest Neighbour (k-NN) approach is the idea that instances
located close to one another in vector space will have relatively similar
classifications. Additional study was conducted on the weighted k-Nearest Neighbour
approach, wherein training set elements were assigned weights, which were then
utilized to calculate sentiment in text on a word-by-word basis [8]. In this case,
the score is determined using (4).

$$\text{positivity score} = \frac{\sum_{1}^{j} \text{score}(pos) + \sum_{1}^{k} \text{score}(neg)}{\sum_{1}^{s} \text{maximum score}} \quad (4)$$

In this case, s = j + k, the total of the positive and negative counts. In the
weighted k-NN approach, the sentences are tokenized before the stop words are removed
from the tweets.

The algorithm suggested by the authors of [8] consists of two parses. Following the
initial parse, each review is given a positivity score. This is sent for a second
parse, together with a neutral review input, and the score is adjusted if necessary.
This yields a better determination of positivity, and an output file with the review
ID and positivity score is produced.
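The basic k-NN idea, and the effect of weighting, can be shown with a short sketch. Note that the weighting used here is inverse-distance weighting of neighbours, a common generic variant; it is not the word-level weighting scheme of [8]. The feature vectors and labels are invented for illustration.

```python
import math
from collections import defaultdict

def knn_predict(query, examples, k=3, weighted=False):
    """examples: list of (vector, label). Returns the predicted label."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = defaultdict(float)
    for vector, label in nearest:
        d = math.dist(query, vector)
        # Assumed weighting: closer neighbours count more (inverse distance);
        # the small constant avoids division by zero.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical 2-D document vectors with sentiment labels.
examples = [
    ((0.6, 0.5), "pos"),
    ((0.0, 1.0), "neg"),
    ((0.1, 0.8), "neg"),
]
plain = knn_predict((0.6, 0.4), examples, k=3)
by_distance = knn_predict((0.6, 0.4), examples, k=3, weighted=True)
```

With plain majority voting the two distant negative examples outvote the single very close positive one, whereas the distance-weighted vote follows the closest neighbour.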

2.1.5 Multilingual Sentiment Analysis

Customers can now express their opinions in a variety of languages, so researchers
should take into account postings written in multiple languages in order to get
better results. A multilingual framework for identifying the polarity of a text is
described in [9]. Several natural language toolkits are used in the process. First,
language models are used to identify the language of the text. Following
identification, standard translation software is used to convert the text to English;
PROMT eXcellent Translation (XT) Technology is used in [9] for this purpose. The
translated text then proceeds to the sentiment classification process [10].

2.1.6 Feature Driven Sentiment Analysis

Feature Driven Sentiment Analysis is a natural language processing (NLP) technique
that analyzes the sentiment of text data by concentrating on particular textual
features or qualities. This method works especially well in situations where the
general tone of a document may not be obvious or when the tone depends on certain
textual features. The general feature-driven sentiment analysis process includes the
following steps:

1. Feature extraction: This process involves locating the pertinent textual elements or
characteristics that are crucial for sentiment analysis. These characteristics may
consist of words, sentences, entities, or other language components.

2. Sentiment scoring: Using machine learning models or a predetermined sentiment
lexicon, each extracted feature is given a sentiment score (e.g., positive,
negative, or neutral). In the Fuzzy Domain Ontology Sentiment Tree (FDOST), the root
node represents the product, the leaf nodes represent the polarity, and the non-leaf
nodes represent the sub-features of their corresponding parent features.

3. Aggregation: To obtain an overall sentiment score for the full text, combine the
different feature-level sentiment values.

Feature Driven Sentiment Analysis can be very helpful for topics like product
evaluations, social media analysis, and customer feedback, where the sentiment
regarding particular features or characteristics of a product or service is of
interest. Because understanding the features and how they relate to one another is
important for an improved marketing plan, the extraction of product features is
crucial to the evaluation of the products. In [11], this extraction is carried out
using the Fuzzy Domain Ontology Sentiment Tree (FDOST).
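The three steps of feature extraction, sentiment scoring, and aggregation can be sketched in a deliberately flat form. This is not the FDOST of [11], which organizes features in a tree; the aspect keywords, the lexicon, and the review sentences below are hypothetical.

```python
from collections import defaultdict

# Hypothetical resources: aspect keywords and a tiny sentiment lexicon.
ASPECTS = {"battery": "battery", "screen": "display", "display": "display"}
LEXICON = {"great": 1, "sharp": 1, "poor": -1, "weak": -1}

def feature_sentiment(sentences):
    """Score each aspect by the opinion words in the sentences that mention it."""
    scores = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().replace(".", "").split()
        # Step 1: feature extraction
        aspects = {ASPECTS[t] for t in tokens if t in ASPECTS}
        # Step 2: sentiment scoring
        polarity = sum(LEXICON.get(t, 0) for t in tokens)
        for aspect in aspects:
            scores[aspect] += polarity
    # Step 3: aggregation into an overall score
    overall = sum(scores.values())
    return dict(scores), overall

review = ["The screen is sharp.", "Battery life is poor.", "Great display."]
per_feature, overall = feature_sentiment(review)
```

The per-feature scores show that the review is positive about the display and negative about the battery, a distinction that a single document-level score would hide.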

2.2 Rule Based Approach

Rule-based sentiment analysis is a natural language processing (NLP) technique that
ascertains the sentiment (positive, negative, or neutral) of a given text by applying a
specified set of rules. This method is frequently used in place of machine
learning-based sentiment analysis, which depends on a sizable dataset of labeled text
for model training. The main features of rule-based sentiment analysis are:

1. Lexicon: The method makes use of a pre-established dictionary, or lexicon, of
terms and expressions that are associated with a neutral, positive, or negative
sentiment. Usually, domain specialists construct this lexicon by hand, or it is
generated from pre-existing sentiment resources.

2. Rule-based classification: To ascertain the sentiment of the input text, the system
applies a set of rules to it. These rules could consist of:

a. Counting the number of positive and negative terms

b. Taking into account the text's grammatical structure and context, and handling
intensifiers, negation, and other language aspects

3. Sentiment scoring: Based on how the rules are applied, the system gives the input
text a sentiment score, usually on a scale from -1 (negative) to 1 (positive), with 0
denoting a neutral sentiment.

The rule-based method involves creating a set of rules for obtaining opinions; every
sentence in every document is tokenized and tested for the presence of the listed
terms. A term was given a score of +1 if it was present and had a positive
connotation.

Every post began with a neutral score of zero and was rated as positive if the final
polarity score was greater than zero, or negative if the total score was less than
zero [12]. After the rule-based approach produces its result, the system verifies, or
asks, whether the result is accurate. Words that are present in the input text but
absent from the database, and that could aid in the analysis of a movie review, should
be added to the database. This is a form of supervised learning in which the system is
trained to learn from new input.
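A rule-based scorer of this kind can be sketched as below. The word lists, the negation rule, and the intensifier multipliers are assumptions chosen for illustration, and the score is left unnormalized rather than scaled to the range from -1 to 1.

```python
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "boring"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}   # assumed multipliers

def rule_based_score(text):
    """Start at zero; add +1 or -1 per opinion word, applying negation and intensifier rules."""
    score, negate, boost = 0.0, False, 1.0
    for token in text.lower().split():
        if token in NEGATIONS:
            negate = True
        elif token in INTENSIFIERS:
            boost = INTENSIFIERS[token]
        elif token in POSITIVE or token in NEGATIVE:
            value = 1.0 if token in POSITIVE else -1.0
            score += (-value if negate else value) * boost
            negate, boost = False, 1.0   # rules apply to the next opinion word only
    return score

def label(score):
    """Map the final polarity score to a class, with zero as neutral."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

For example, `label(rule_based_score("the plot was not good"))` yields `"negative"` because the negation rule flips the +1 of "good".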

2.3 Lexical Based Approach

The lexical-based approach to sentiment analysis is a widely used technique in natural
language processing (NLP). It involves analyzing the sentiment of a given text by
looking at the individual words or lexical items within the text and their associated
sentiment polarity (positive, negative, or neutral).

The lexical-based approach typically involves the following steps:

Lexicon Creation: The first step is to create a sentiment lexicon, which is a dictionary
or database of words and their associated sentiment scores or polarities. This lexicon
can be manually curated or automatically generated using various techniques, such as
corpus-based methods or dictionary-based methods.

Text Preprocessing: The input text is preprocessed, which may include steps like
tokenization, stop-word removal, stemming, or lemmatization, to prepare the text for
sentiment analysis.

Sentiment Scoring: The preprocessed text is then analyzed by looking up the
sentiment scores of individual words in the sentiment lexicon. The overall sentiment
of the text is then calculated based on the aggregation of the individual word scores.
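The three steps above can be sketched as follows. The lexicon scores and the stop-word list are hypothetical, and averaging the word scores is only one possible aggregation choice; stemming and lemmatization are omitted to keep the sketch short.

```python
import re

# Hypothetical sentiment lexicon and stop-word list.
LEXICON = {"love": 0.8, "good": 0.5, "slow": -0.4, "terrible": -0.9}
STOP_WORDS = {"the", "is", "a", "but", "and", "i"}

def preprocess(text):
    """Lowercase, tokenize on letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def lexicon_score(text):
    """Average the lexicon scores of the words found; 0.0 if none are found."""
    hits = [LEXICON[t] for t in preprocess(text) if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

tokens = preprocess("The camera is good, but the app is slow.")
score = lexicon_score("The camera is good, but the app is slow.")
```

Because the sentence contains one mildly positive and one mildly negative word, its averaged score is close to zero, which shows how a purely word-level lexicon treats mixed opinions.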

2.4 Deep learning-based approaches

Deep learning has become a popular approach for sentiment analysis, which is the
process of determining the emotional tone or sentiment expressed in a piece of text.

In the deep learning approach to sentiment analysis, neural networks are trained on
large datasets of labeled text data, where the text has been annotated with the
corresponding sentiment (e.g., positive, negative, or neutral). The neural network
learns to extract relevant features from the text and map them to the appropriate
sentiment label. Some common deep learning architectures used for sentiment
analysis include:

1. Convolutional Neural Networks (CNNs): CNNs are effective at capturing local
patterns in text, making them well-suited for sentiment analysis.

2. Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory networks
(LSTMs) and Gated Recurrent Units (GRUs): These models can capture the
sequential nature of text and understand the context of words.

3. Transformer-based models, such as BERT, RoBERTa, and GPT: These pre-trained
language models have shown impressive performance in various natural language
processing tasks, including sentiment analysis.

Deep learning models, such as recurrent neural networks (RNNs) and convolutional
neural networks (CNNs), can learn complex patterns and relationships from text data,
allowing them to accurately classify the sentiment (e.g., positive, negative, or
neutral) of a given piece of text. One of the key advantages of using deep learning
for sentiment analysis is its ability to capture contextual information and
understand the nuances of language. Traditional machine learning approaches often
rely on hand-crafted features, which can be time-consuming to design and may not
capture the full complexity of language. In contrast, deep learning models can
automatically learn relevant features from the text data, making them more flexible
and adaptable to different domains and languages.

[1] provided an overview of thirty-two deep learning-based sentiment analysis
articles and examined the effectiveness of Deep Neural Networks (DNN), Convolutional
Neural Networks (CNN), and Recurrent Neural Networks (RNN) on eight datasets. These
algorithms were chosen because they were the most popular deep learning algorithms in
the reviewed articles. The publications covered deal with sentiment classification at
the document level, sentence level, and aspect level. The algorithms employed at each
analysis level are listed below:

1. Document-level sentiment classification: Artificial Neural Networks (ANN), Stacked
Denoising Autoencoder (SDA), Denoising Autoencoder, CNN, LSTM, GRU, Memory Network,
and GRU-based Encoder

2. Sentence-level sentiment classification: CNN, RNN, Semi-supervised Recursive
Autoencoders Network (RAE), and Recursive Neural Network

3. Aspect-level sentiment classification: Adaptive Recursive Neural Network, LSTM,
Bi-LSTM, Attention-based LSTM, Memory Network, Interactive Attention Network,
Recurrent Attention Network, and Dyadic Memory Network

2.5 Hybrid approaches

The literature contains a variety of hybrid approaches. A few of them seek to
incorporate lexicon-based information into machine learning models [8]. The objective
is to integrate both approaches, using an effective feature set drawn from both
lexicon-based and machine learning-based techniques, to produce optimal outcomes [1].
In this manner, the shortcomings and restrictions of both strategies can be addressed.

The merging of symbolic and sub-symbolic Artificial Intelligence (AI) for sentiment
analysis has been the subject of recent research [9]. Sub-symbolic AI, which covers
machine learning and in particular deep learning, is regarded as a bottom-up
methodology. It is quite helpful for looking through a large quantity of data and
finding interesting patterns within the data. While this kind of bottom-up strategy is
quite effective for image classification tasks, it does poorly on tasks involving
natural language processing. In order to communicate effectively, we must acquire
various skills, like common sense and cultural understanding, top-down rather than
bottom-up [9]. In order to identify patterns in text, these researchers used
sub-symbolic AI (deep learning), and they used symbolic AI (logic and semantic
networks) to represent those patterns in a knowledge base. For the sentiment analysis
task, they developed a new commonsense knowledge base called SenticNet. They came to
the conclusion that combining symbolic AI and sub-symbolic AI is essential to go from
natural language processing to the stage of natural language understanding. [1]
created an ensemble model with the CNN and LSTM algorithms and showed that this
ensemble model performs better than the individual models.

2.6 Advantages, disadvantages, and performance of the models

Comparison of three approaches

Approaches | Classification | Advantages | Disadvantages
Machine Learning | Supervised or unsupervised | Dictionary is not necessary. Demonstrates high accuracy of classification. | A classifier trained on texts in one domain in most cases does not work with other domains.
Rule Based | Supervised or unsupervised | Performance accuracy of 91% at the review level and 86% at the sentence level. Sentence-level sentiment classification performs better than the word level. | Efficiency and accuracy depend on the defining rules.
Lexicon | Unsupervised | Labelled data and the procedure of learning are not required. | Requires powerful linguistic resources, which are not always available.

Table 1: Comparison of three approaches

Comparison of various machine learning methods

Methods | Advantages | Disadvantages
SVM | High-dimensional input space. Few irrelevant features. Document vectors are sparse. | A large training set is required. Data collection is tedious.
N-gram | Using 1-grams and 2-grams as features for sentiment prediction can increase the accuracy of the model in comparison with single-word features only. | Long-range dependencies are not captured. Dependent on having a corpus of data to train from.
NB Method | Simple and intuitive method. It combines efficiency with reasonable accuracy. | Mainly used when the size of the training set is small. It assumes conditional independence among the linguistic features.
ME Method | This method does not assume independent features like the NB method. Can handle large amounts of data. | Simplicity is hard.
KNN Method | Based on the fact that the classification of an instance will be somewhat similar to those nearby it in the vector space. It is considered computationally efficient. | Large storage required. Computationally intensive recall.
Multilingual Method | The texts of different languages are evaluated without translation. Deals with 15 different languages. | A training corpus for each language is needed.
Feature Driven Method | Adaptable to large projects. It is a concise process. | Not powerful on smaller projects.
Deep Learning | Automatic feature extraction, with the model learning relevant features from the data. | Typically more complex, with multiple layers of neural networks.

Table 2: Advantages and disadvantages of various methods

Table 3 compares various machine learning and deep learning methods for sentiment
analysis:

Method | Accuracy | F1-score | Precision | Recall | Complexity
Logistic Regression | 85% | 0.84 | 0.86 | 0.82 | Low
SVM | 88% | 0.87 | 0.89 | 0.85 | Moderate
Multilayer Perceptron | 90% | 0.89 | 0.91 | 0.87 | High
CNN | 92% | 0.91 | 0.93 | 0.89 | High
RNN | 91% | 0.90 | 0.92 | 0.88 | High
LSTM | 93% | 0.92 | 0.94 | 0.90 | High
Bi-LSTM | 94% | 0.93 | 0.95 | 0.91 | High

Table 3: Performance analysis of various algorithms
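The F1-score reported in Table 3 is the harmonic mean of precision and recall. As a brief check, the following sketch recomputes it from the precision and recall values listed in the table; the helper name `f1_score` is chosen for this illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1-score) as listed in Table 3.
table3 = {
    "Logistic Regression": (0.86, 0.82, 0.84),
    "SVM": (0.89, 0.85, 0.87),
    "Multilayer Perceptron": (0.91, 0.87, 0.89),
    "CNN": (0.93, 0.89, 0.91),
    "RNN": (0.92, 0.88, 0.90),
    "LSTM": (0.94, 0.90, 0.92),
    "Bi-LSTM": (0.95, 0.91, 0.93),
}

# True where the reported F1-score matches the recomputed one to two decimals.
consistent = {
    method: round(f1_score(p, r), 2) == f1
    for method, (p, r, f1) in table3.items()
}
```

For every row, the recomputed value rounds to the reported F1-score, so the three columns of the table are mutually consistent.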

Numerous investigations have been conducted to evaluate the effectiveness of current
sentiment analysis models. Every model has pros and cons of its own. Three
categories of models were used for aspect-based sentiment analysis [1]: CNN-based,
RNN-based, and Recursive Neural Network-based models. CNN-based models have the
advantages of being able to represent nonlinear dynamics, extract local patterns, and
compute quickly. The large demand for data is a drawback of the CNN-based model.
RNN-based models have the advantages of requiring fewer parameters, requiring less
input, and having a distributed hidden state that stores prior calculations.

Their drawbacks are that they represent the sentence with the last hidden state and
are unable to capture long-term dependencies. Recursive neural networks have the
advantage of being able to learn tree structures and having simple architectures.
Their drawbacks are that they still need parsers at an early stage and could be slow.
More research is needed on recursive neural networks, and it was claimed that
RNN-based models perform better than CNN-based models.

Deep learning-based models are becoming more and more popular for various
sentiment analysis applications, according to [1]. Researchers focused on RNN
algorithms, especially LSTM, for sentence-level sentiment classification, and claimed
that CNN and LSTM, an RNN method, provide the highest accuracy for document-level
sentiment classification. RNN models were also reported as the top performers for
multi-domain sentiment classification and for aspect-level sentiment classification.
The advantages and disadvantages of CNN, RNN, LSTM, GRU, DBN, and Recursive Neural
Network (RecNN) models were also covered.

A drawback of LSTM models is their inability to maintain long-term dependencies and
their disregard for such long-distance features. RecNNs have the benefit of
performing better on NLP tasks because they are adept at learning hierarchical
structures. RecNN models have the drawback that their effectiveness is significantly
reduced on informal data lacking grammatical norms, and that training them can be
challenging because the structure varies from sample to sample. The choice of
hyperparameters also has an impact on the models' performance.

Conclusion

The survey provides a comprehensive overview of sentiment analysis methods,
applications, and challenges in various fields. Various levels of sentiment analysis
were discussed, including document level, sentence level, phrase level, and aspect
level. Supervised machine learning methods, particularly algorithms like Naive
Bayes (NB) and Support Vector Machines (SVM), are widely utilized due to their
simplicity and high accuracy. In the comparison reported in Table 3, deep learning
methods achieve higher accuracy than traditional machine learning methods. The
significance of sentiment analysis in applications such as market research, brand
image monitoring, and consumer opinion investigation was highlighted, showcasing its
importance in business intelligence. Challenges in sentiment analysis were identified,
including computational costs, informal writing styles, and language variations,
emphasizing the need for further research in this area. Hybrid approaches combining
lexicon-based and machine learning techniques have shown promising results in
enhancing sentiment analysis accuracy, indicating a potential direction for future
research. The survey concludes that sentiment analysis remains relatively unexplored
in subjects such as educational feedback, violence against women, and children's
learning habits at the basic level. Many other domains are still unexplored and are
challenging but important, such as prisoners in jail, the behavior of police or army
personnel in public or at borders, the detection of drug addiction, and the
experiences of people or students abroad; analysis in these domains would be very
useful for government decision making and for guiding society in the right direction.
Sentiment analysis is crucial for decision support, business applications,
predictions, and trend analysis. In the future, deep learning-based hybrid models
should be developed for better results.

References

[1] Wankhade, Mayur, et al. A Survey on Sentiment Analysis Methods, Applications, and Challenges.
Artificial Intelligence Review, vol. 55, no. 55, 7 Feb. 2022,
https://doi.org/10.1007/s10462-022-10144-1.

[2] Birjali M, Kasri M, Beni-Hssane A (2021) A comprehensive survey on sentiment analysis:
approaches, challenges and trends. Knowl-Based Syst 226:107134

[3] Chen X, Wang Y, Liu Q (2017) Visual and textual sentiment analysis using deep fusion
convolutional neural networks. In: 2017 IEEE international conference on image processing
(ICIP). IEEE, pp 1557–1561

[4] Cheng Y, Yao L, Xiang G, Zhang G, Tang T, Zhong L (2020) Text sentiment orientation analysis
based on multi-channel CNN and bidirectional GRU with attention mechanism. IEEE Access
8:134964–134975

[5] Babu, Nirmal Varghese, and E. Grace Mary Kanaga. Sentiment Analysis in Social Media Data for
Depression Detection Using Artificial Intelligence: A Review. SN Computer Science, vol. 3, no. 1, 19
Nov. 2021, https://doi.org/10.1007/s42979-021-00958-1.

[6] Raghunathan, N., & Saravanakumar, K. (2023). Challenges and issues in sentiment analysis: A
comprehensive survey. IEEE Access, 11, 69626-69642. https://doi.org/10.1109/access.2023.3293041

[7] J. Zhou, J. X. Huang, Q. Chen, Q. V. Hu, T. Wang and L. He, Deep Learning for Aspect-Level
Sentiment Classification: Survey, Vision, and Challenges, in IEEE Access, vol. 7, pp. 78454-78483,
2019, doi: 10.1109/ACCESS.2019.2920075.

[8] K. Schouten and F. Frasincar, Survey on Aspect-Level Sentiment Analysis, in IEEE Transactions
on Knowledge and Data Engineering, vol. 28, no. 3, pp. 813-830, 1 March 2016, doi:
10.1109/TKDE.2015.2485209.

[9] D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, and R. Zimmermann, ICON: Interactive
Conversational Memory Network for Multimodal Emotion Detection, in Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing, Oct.-Nov. 2018, Brussels,
Belgium. [Online]. Available: https://aclanthology.org/D18-1280. DOI: 10.18653/v1/D18-1280.

[10] K. Soni, P. Yadav, and Rahul, Comparative Analysis of Rotten Tomatoes Movie Reviews using
Sentiment Analysis, in 2022 6th International Conference on Intelligent Computing and Control
Systems (ICICCS), Madurai, India, 2022, pp. 1494-1500, doi:10.1109/ICICCS53718.2022.9788287.

[11] Zhang, S., Wei, Z., Wang, Y., & Liao, T. (2018). Sentiment analysis of Chinese micro-blog text
based on extended sentiment dictionary. Future Generation Computer Systems, 81, 395-403.

[12] Munish Kumar Tiwari, A. M. (2023). A comprehensive review of the literature on machine
learning-based road safety prediction techniques for internet of vehicles (iov)-enabled vehicles. Tuijin
Jishu/Journal of Propulsion Technology, 44(4), 5978-5996. https://doi.org/10.52783/tjjpt.v44.i4.2030

[13] Verma, B., & Thakur, R. S. (2018). Sentiment analysis using lexicon and machine learning-based
approaches: A survey. In Proceedings of International Conference on Recent Advancement on
Computer and Communication: ICRAC 2017 (pp. 441-447). Springer Singapore.

[14] C. Yang, X. Wang and B. Jiang, Sentiment Enhanced Multi-Modal Hashtag Recommendation for
Micro-Videos, in IEEE Access, vol. 8, pp. 78252-78264, 2020, doi: 10.1109/ACCESS.2020.2989473.

[15] R. Wang, Z. Li, J. Cao, T. Chen and L. Wang, Convolutional Recurrent Neural Networks for Text
Classification,2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary,
2019, pp. 1-6, doi: 10.1109/IJCNN.2019.8852406.

[16] Nilaa Raghunathan, Saravanakumar Kandasamy, Challenges and Issues in Sentiment Analysis:
A Comprehensive Survey, IEEE Access, DOI 10.1109/ACCESS.2023.3293041

[17] Wei Yen Chong, Bhawani Selvaretnam and Lay-Ki-Soon, Natural Language Processing for
sentiment analysis, (2014) International conference, 2014.

[18] Cambria, E.; Schuller, B.; Xia, Y.; Havasi, C. New Avenues in Opinion Mining and Sentiment
Analysis. IEEE Intell. Syst. 2013, 28, 15–21.

[19] Katragadda, S.; Ravi, V.; Kumar, P.; Lakshmi, G.J. Performance Analysis on Student Feedback
using Machine Learning Algorithms.

[20] In Proceedings of the 2020 6th International Conference on Advanced Computing and
Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 1161–1163.

[21] Akhade, M., Mahapatra, B., & Bhatt, A. (2023). Sentiment analysis using NLP & Deep learning
on social media tweets. 2023 1st DMIHER International Conference on Artificial Intelligence in
Education and Industry 4.0 (IDICAIEI). https://doi.org/10.1109/idicaiei58380.2023.10406572

[22] Devika M D, Sunitha C, Amal Ganesh (2016) Sentiment Analysis: A Comparative Study On
Different Approaches. Procedia Computer Science 87 (2016) 44–49

[23] Neha S. Joshi, Suhasini A. Itkat, A Survey on Feature Level Sentiment Analysis (IJCSIT)
International Journal of Computer Science and Information Technologies, Vol. 5 (4) , 2014, 5422-
5425.

[24] Zhiwei Liu, Tianlin Zhang, Kailai Yang, Paul Thompson, Zeping Yu, Sophia Ananiadou ,
Emotion detection for misinformation: A review Information Fusion 107 (2024) 102300

[25] B. Liu. Sentiment Analysis and Opinion Mining. Morgan and Claypool Publishers: Synthesis
Lectures on Human Language Technologies, 2012

[26] A. Mudinas, D. Zhang, and M. Levene. Combining Lexicon and Learning based Approaches for
Concept Level Sentiment Analysis. In 2012 International Workshop on Issues of Sentiment Discovery
and Opinion Mining, 51-58

[27] H. Nguyen, T. Xuan, A. Cuong Le, and L. M. Nguyen. Linguistic features for subjectivity
classification. In International Conference on Asian Language Processing (IALP), 17-20, 2012
