

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Belagavi-590018

Synopsis on

“ONLINE REVIEWS SENTIMENT ANALYSIS”

Submitted in partial fulfillment as per VTU curriculum for VI semester
for the award of degree of

Bachelor of Engineering
In
Artificial Intelligence & Data Science

Submitted By
ABHISHEK REDDY - 1EP21AD002
BHUMIKA S - 1EP21AD013
MOHITH L C - 1EP21AD033

Internal Guide
Dr. Vanshika Rastogi
Asst. Professor
Dept. of AI & DS, EPCET

Department of Artificial Intelligence & Data Science


Jnana Prabha Campus, Virgo Nagar Post, Bidarahalli.
Bengaluru – 560049
2023
TABLE OF CONTENTS

Chapter No. Description

1 Introduction
1.1 Introduction to Online Reviews Sentiment Analysis
1.1.1 Levels of Sentiment Analysis
1.2 Problem Statement
2 Literature Review
3 Objectives
4 Methodology
4.1 Data Collection
4.2 Prepare Data for Training
4.2.1 Preprocess Data
4.3 Train the Model
4.3.1 Transformer-Based Models
4.3.2 BERT
4.4 Performance Measure
5 References
Chapter 1
INTRODUCTION

1.1 Introduction to Online Reviews Sentiment Analysis

In the digital age, online reviews have become a crucial component of consumer
decision-making. Whether purchasing products, booking services, or choosing entertainment
options, consumers increasingly rely on the experiences and opinions of others shared through
various online platforms. However, the sheer volume of these reviews can be overwhelming, so
efficient ways to interpret and analyze this data are needed. This is where sentiment analysis
comes into play.

Sentiment analysis, also known as opinion mining, is a field within natural language
processing (NLP) that focuses on identifying and categorizing opinions expressed in a piece of
text. Specifically, it aims to determine the writer’s attitude towards a particular topic, product,
or service—whether it is positive, negative, or neutral. In the context of online reviews,
sentiment analysis helps businesses and consumers make sense of the collective feedback and
opinions embedded within large sets of textual data.

Common use cases of sentiment analysis include monitoring customer feedback, identifying
individual customers whose service experience can be improved, and tracking how a change in a
product or service affects how customers feel. It also helps to track customer sentiment over
time. From opinion polls to marketing strategy, sentiment analysis has changed the way many
businesses operate.

1.1.1 Levels of Sentiment Analysis

Various levels of sentiment analysis can be performed, depending on the specific focus and
objective of the analysis. Some common levels include:

• Document-Level Sentiment Analysis: This type of analysis determines the overall
sentiment expressed in a document, such as a review or an article. It aims to classify the
entire text as positive, negative, or neutral.
• Sentence-Level Sentiment Analysis: Here, the sentiment of each sentence within a
document is analyzed. This type provides a more granular understanding of the sentiment

Dept. of AI & DS 2023-24 Page | 1


expressed in different parts of the text.
• Aspect-Based Sentiment Analysis: This approach focuses on identifying and extracting
the sentiment associated with specific aspects or entities mentioned in the text. For
example, in a product review, the sentiment towards different features of the product (e.g.,
performance, design, usability) can be analyzed separately.
• Entity-Level Sentiment Analysis: This type of analysis identifies the sentiment expressed
towards specific entities or targets mentioned in the text, such as people, companies, or
products. It helps understand the sentiment associated with different entities within the
same document.
• Comparative Sentiment Analysis: This approach involves comparing the sentiment
between different entities or aspects mentioned in the text. It aims to identify the relative
sentiment or preferences expressed towards various entities or features.
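
To make the difference between the first two levels concrete, the sketch below labels one
review at document level and at sentence level. The tiny word lexicon, the function names,
and the sample review are assumptions made for illustration only; the system proposed in
this synopsis uses BERT rather than a lexicon.

```python
import re

# Toy lexicon: an assumption for illustration, not a real sentiment resource.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "terrible": -1}


def score(text):
    """Sum the lexicon scores of the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(value):
    """Map a numeric score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def document_level(review):
    """Return one label for the whole review."""
    return label(score(review))


def sentence_level(review):
    """Return one (sentence, label) pair per sentence, splitting on '.', '!' and '?'."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]
    return [(s, label(score(s))) for s in sentences]


review = "The camera is great. The battery is terrible. It arrived on Monday."
print(document_level(review))
for sentence, sentiment in sentence_level(review):
    print(sentiment, "|", sentence)
```

In this example the positive and negative words cancel out at document level, while the
sentence-level view keeps the two opinions apart.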

1.2 Problem Statement

The main aim behind the implementation is to develop a sentiment analysis system that
categorizes online reviews as positive, negative, or neutral. This system should efficiently
handle large volumes of unstructured text data and provide actionable insights to aid consumer
decision-making and business strategy, ensuring accuracy and reliability across various review
platforms.



Chapter 2
LITERATURE REVIEW

This chapter reviews earlier work on sentiment analysis and gives an overview of existing
knowledge in this field of research.

Recent research has explored more sophisticated approaches for sentiment analysis, such as
LSTM (Long Short-Term Memory) models. For instance, a study by Santhosh Kumar T (2022)
employed an LSTM model to classify sentiment in Amazon Alexa product reviews, achieving
an accuracy of 90.9%. The success of this model underscores the potential of advanced neural
network architectures in sentiment analysis tasks. Moreover, the author suggests that further
optimization through hyperparameter tuning could improve the model's performance, which
points to avenues for future research in this area. The study shows the value of exploring
diverse methodologies, including deep learning techniques like LSTMs, to improve sentiment
classification accuracy and overall model robustness in analyzing product reviews [1].

Kuzminykh presents a comprehensive exploration of sentiment analysis, covering its
fundamental principles and practical applications. After an introduction to sentiment
analysis, the article uses the TextBlob library for sentiment analysis tasks on both imported
sentences and tweets, showing how easily sentiment analysis techniques can be applied to
diverse textual data sources. It also discusses the nuances of sentiment analysis, including the
challenges posed by linguistic variation and context sensitivity. By combining theory with
hands-on implementation, the article is a useful resource for practitioners and enthusiasts
who want to apply sentiment analysis to applications ranging from social media monitoring
to customer feedback analysis [2].

Tasdemir performs sentiment analysis on a dataset of reviews using two different techniques:
VADER (Valence Aware Dictionary and sEntiment Reasoner) and a pre-trained RoBERTa model
used through the Hugging Face pipeline. VADER is a lexicon- and rule-based sentiment
analysis tool specifically attuned to sentiments expressed in social media, and it also works
well on texts from other domains. RoBERTa is a pre-trained model developed by the Facebook
AI Research (FAIR) team; it is a modified version of BERT that keeps the same architecture
but improves the pre-training procedure. The Hugging Face pipeline is an API in the
Transformers library that offers a simple and efficient way to perform natural language
processing tasks using pre-trained models. For a task such as sentiment analysis or named
entity recognition, it wraps tokenization, model inference, and post-processing in a single
call. Overall, the project gives an overview of the two techniques and their implementation
in Python using the NLTK library, the Hugging Face pipeline, and the Amazon Food review
dataset [3].

Brownlee outlines the development of a bag-of-words model for sentiment prediction in movie
reviews. Initially, the text data undergoes preprocessing, involving cleaning and vocabulary
restriction. Subsequently, the bag-of-words model is applied to transform the text data into a
numerical format suitable for modeling. A multilayer perceptron (MLP) model is then
constructed and trained using the prepared data. Evaluation is conducted on test data to assess
model performance. Finally, the trained MLP bag-of-words model is used to make
predictions on new review text, demonstrating the practical application of the approach.
The tutorial provides a structured methodology for sentiment analysis in movie reviews,
covering data preparation, model development, and prediction [4].
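
As an illustration of the bag-of-words step described above, the following sketch turns
reviews into count vectors over a restricted vocabulary. It is a minimal sketch and not
the code from [4]; the function names, the `min_count` threshold, and the sample reviews
are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(reviews, min_count=2):
    """Vocabulary restriction: keep words that occur at least min_count times."""
    counts = Counter(word for review in reviews for word in tokenize(review))
    return sorted(word for word, count in counts.items() if count >= min_count)


def vectorize(review, vocab):
    """Represent a review as word counts, one position per vocabulary word."""
    counts = Counter(tokenize(review))
    return [counts[word] for word in vocab]


reviews = ["A great film with a great cast", "A dull film", "Great fun"]
vocab = build_vocab(reviews)
print(vocab)
print([vectorize(review, vocab) for review in reviews])
```

Vectors of this kind are what a model such as an MLP would receive as input; word order is
discarded, which is the main limitation of the representation.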

Lokare explores the use of a transformer model for sentiment prediction in movie reviews,
covering text preprocessing, model construction, and prediction. The review data is cleaned
and structured for input into the transformer architecture. Using a pre-trained transformer,
the article reports 78% accuracy without any further training on the review data, which
illustrates the effectiveness of transformers in natural language understanding tasks. The
article also points to follow-up work: fine-tuning transformer models on custom datasets,
and creating and training transformer models from scratch. It provides a good foundation for
sentiment analysis in movie reviews and a starting point for more advanced work with
transformer methods in natural language processing [5].



A brief summary of the literature review is shown below:

Table 2.1 Literature Review



Chapter 3
OBJECTIVES

The main objectives of online review sentiment analysis can be summarized as follows:

• Analyze the sentiments and opinions expressed in product reviews to assess how users feel
about the product and to derive insights for the future.
• Detect trends from reviews, comments, and discussions in their context.



Chapter 4
METHODOLOGY
This chapter describes the systematic approach and procedures used to carry out the project,
including the strategies, techniques, and tools employed to collect, analyze, and interpret data.

Fig 4.1 A typical sentiment analysis model

4.1 Data Collection

Data collection is the process of gathering and measuring information on targeted variables in
a systematic manner, which enables researchers to answer specific research questions, test
hypotheses, and evaluate outcomes. For this project, a large-scale dataset of online product
reviews is obtained from sources such as e-commerce platforms or review aggregation
websites. The dataset should represent diverse product domains and contain reviews with a
wide range of sentiment expressions.
Reviews can be gathered from sources such as Amazon, Yelp, TripAdvisor, or any other
platform relevant to the analysis. A suitable dataset can be downloaded from Kaggle; it should
include both the review text and any associated metadata, such as ratings, review dates, and
product/service identifiers.
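
A downloaded review file usually has to be turned into (text, label) pairs before anything
else. The sketch below shows one way to do this with the standard library. The column
names `Text` and `Score`, the rating-to-label thresholds, and the in-memory sample are
assumptions; they need to be adapted to the dataset that is actually chosen.

```python
import csv
import io


def rating_to_label(rating):
    """Assumed mapping: 1-2 stars negative, 3 stars neutral, 4-5 stars positive."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def load_reviews(csv_file, text_col="Text", rating_col="Score"):
    """Read (text, label) pairs from an open CSV file, skipping empty reviews."""
    rows = []
    for row in csv.DictReader(csv_file):
        text = (row.get(text_col) or "").strip()
        if not text:
            continue
        rows.append((text, rating_to_label(int(row[rating_col]))))
    return rows


# A small in-memory sample stands in for the downloaded file.
sample = io.StringIO(
    "Score,Text\n"
    "5,Loved it\n"
    "3,It is okay\n"
    "1,Broke after a day\n"
    "4,\n"
)
reviews = load_reviews(sample)
print(reviews)
```

With a real file, the same function can be called on a file opened with
`open(path, newline="", encoding="utf-8")`, where `path` is wherever the dataset was saved.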

4.2 Prepare Data for Training



Preparing data for training is a critical step in the machine learning workflow. It involves
transforming raw data into a format that can be efficiently and effectively used by machine
learning algorithms. This process includes several key tasks such as cleaning the data,
encoding categorical variables, splitting the data into training and test sets, normalizing or
standardizing features, and creating any necessary data structures. Ensure the data is free from
errors and inconsistencies that could affect the model's performance.
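
One of the tasks listed above, splitting the data into training and test sets, can be
sketched as follows. The split is stratified so that each sentiment label keeps roughly the
same share in both sets. The 80/20 ratio, the fixed seed, and the synthetic examples are
assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=42):
    """Split (text, label) pairs so each label has a similar share in both sets."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # At least one test example per label, even for rare labels.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


examples = [(f"review {i}", "positive" if i % 2 else "negative") for i in range(20)]
train, test = stratified_split(examples)
print(len(train), len(test))
```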

4.2.1 Preprocess Data


Preprocessing includes several steps, some of which are listed below:
• Text Cleaning: Remove irrelevant content such as HTML tags, special characters,
and excessive whitespace.
• Tokenization: Split the text into individual words or tokens. BERT uses WordPiece
tokenization, which can split a word into smaller subword units.
• Lowercasing: Convert all text to lowercase (depending on the tokenizer's requirements).
• Remove Stop Words: Optionally, remove common stop words if they don't add value to
the analysis (though BERT can handle these well).
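
The cleaning and tokenization steps above can be sketched as follows. `clean_text` strips
HTML, special characters, and extra whitespace and lowercases the result. `wordpiece`
applies the greedy longest-match-first rule that WordPiece uses to split a word into
subword units. The set of characters kept, the toy vocabulary, and the function names are
assumptions; a real BERT tokenizer ships with its own vocabulary of about 30,000 entries.

```python
import html
import re


def clean_text(text):
    """Decode HTML entities, drop tags and special characters, collapse whitespace."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)              # HTML tags
    text = re.sub(r"[^A-Za-z0-9.,!?' ]+", " ", text)  # special characters
    return re.sub(r"\s+", " ", text).strip().lower()


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word greedily into the longest pieces found in vocab.

    Pieces that continue a word carry the '##' prefix, as in BERT vocabularies.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"un", "##believ", "##able", "good"}  # hypothetical vocabulary
print(clean_text("<p>Great&nbsp;phone!!</p>   Works   well"))
print(wordpiece("unbelievable", toy_vocab))
```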

4.3 Train the Model

Training a model refers to the process of teaching a machine learning algorithm to make
predictions or decisions based on data. During training, the model learns patterns and
relationships within the training data, adjusting its parameters to minimize errors and improve
accuracy. Selecting a suitable machine learning model or algorithm based on the problem type
and data characteristics plays a vital role. In this project we use BERT (Bidirectional Encoder
Representations from Transformers), a transformer-based model. It represents a significant
advancement in natural language processing (NLP) by leveraging the transformer architecture
for a wide range of language understanding tasks. The pre-trained BERT model is fine-tuned
on the sentiment analysis task using a labeled dataset. This involves adjusting the model
weights using the labeled examples to optimize performance for the sentiment classification
task.
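
The fine-tuning step could look roughly like the sketch below, which assumes that the
`torch` and `transformers` packages are installed. The checkpoint name
`bert-base-uncased`, the three-label scheme, and the hyperparameters (one epoch, batch size
8, learning rate 2e-5) are illustrative assumptions, not settled choices of this project.
The library imports sit inside the function so that the file can be loaded without them.

```python
LABELS = ["negative", "neutral", "positive"]
LABEL_TO_ID = {name: index for index, name in enumerate(LABELS)}


def fine_tune(examples, model_name="bert-base-uncased", epochs=1, batch_size=8, lr=2e-5):
    """Fine-tune a pre-trained BERT classifier on (text, label) pairs."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # A new classification head with one output per label is added on top of BERT.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABELS)
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for start in range(0, len(examples), batch_size):
            batch = examples[start : start + batch_size]
            texts = [text for text, _ in batch]
            labels = torch.tensor([LABEL_TO_ID[label] for _, label in batch])
            inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
            # Passing labels makes the model return the classification loss.
            loss = model(**inputs, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model
```

Calling `fine_tune(train_examples)` on a list of (text, label) pairs downloads the
checkpoint on first use and returns the tokenizer together with the updated model. A
practical setup would also shuffle the data each epoch and monitor a validation set.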

4.3.1 Transformer-Based Models


Transformer-based models are among the most advanced natural language processing
techniques. The original Transformer follows an encoder-decoder architecture and relies on
self-attention to yield impressive results; BERT uses only the encoder part. Although one can
always build a transformer model from scratch, it is a tedious task. Instead, we can use
pre-trained transformer models available on Hugging Face. Hugging Face is an open-source AI
community that offers a multitude of pre-trained models for NLP applications. These models
can be used as they are or fine-tuned for specific tasks. To perform a task with the
Transformers library, we first import the pipeline function from transformers. Then a pipeline
object is created by calling this function with the task to be performed as an argument
(sentiment-analysis in our case). We can also specify the model to use for the task.
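
The pipeline usage described above can be sketched as follows, assuming the `transformers`
package is installed. The checkpoint named here is a publicly available two-class
(POSITIVE/NEGATIVE) English sentiment model and is used only as an example; the helper
names `build_classifier`, `classify`, and `summarize` are hypothetical.

```python
def build_classifier(model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Create a sentiment-analysis pipeline (downloads the model on first use)."""
    from transformers import pipeline  # local import: needs the transformers package

    return pipeline("sentiment-analysis", model=model_name)


def classify(texts, classifier=None):
    """Run the classifier on a list of texts; each result has 'label' and 'score'."""
    if classifier is None:
        classifier = build_classifier()
    return classifier(texts)


def summarize(results):
    """Count how many predictions fall under each label."""
    counts = {}
    for result in results:
        counts[result["label"]] = counts.get(result["label"], 0) + 1
    return counts
```

For example, `classify(["I love this phone.", "The battery died in a week."])` returns one
dictionary per review with a `label` and a confidence `score`, and `summarize` turns such a
list into per-label counts.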

4.3.2 BERT
BERT (Bidirectional Encoder Representations from Transformers), developed by Google AI in
2018, advanced natural language processing by introducing a model that reads text
bidirectionally, using the encoder of the Transformer architecture for better context
understanding. Unlike earlier language models that processed text in one direction, BERT
captures context from both directions simultaneously, leading to a deeper comprehension of
word meanings. Pre-trained on a massive corpus and fine-tuned for specific tasks, BERT set
new performance benchmarks in tasks such as question answering, sentiment analysis, and
named entity recognition, and it has since been applied to other tasks such as text
summarization and machine translation. Its release as an open-source model has widened
access to advanced NLP capabilities, fostering innovation and development in the field.
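
The difference between one-directional and bidirectional context comes down to which
positions a token is allowed to attend to. The sketch below computes scaled dot-product
self-attention weights for three toy token vectors, once without a mask (bidirectional, as
in BERT) and once with a causal mask (left to right). Using the same vector as query and
key is a simplifying assumption; a real Transformer applies learned projections and several
attention heads.

```python
import math


def attention_weights(vectors, causal=False):
    """Scaled dot-product attention weights, with each vector as both query and key."""
    dim = len(vectors[0])
    weights = []
    for i, query in enumerate(vectors):
        scores = []
        for j, key in enumerate(vectors):
            if causal and j > i:
                scores.append(float("-inf"))  # later tokens are hidden
            else:
                dot = sum(q * k for q, k in zip(query, key))
                scores.append(dot / math.sqrt(dim))
        # Softmax over the scores, shifted by the maximum for numerical stability.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
bidirectional = attention_weights(tokens)
left_to_right = attention_weights(tokens, causal=True)
print(bidirectional[0])   # the first token attends to all three positions
print(left_to_right[0])   # the first token can only attend to itself
```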

4.4 Performance Measure

Performance measurement is the process of assessing a trained model on unseen data. It
involves measuring how well the model generalizes to new, previously unseen examples and
how effectively it accomplishes the task it was trained for. Evaluation provides insights into
the model's strengths, weaknesses, and overall effectiveness, helping to guide further
improvements or decisions.
Performance metrics such as accuracy, precision, recall, and F1 score are measured.

• Accuracy is the most straightforward evaluation metric. It measures the proportion of
correctly classified instances out of the total instances.

• Precision focuses on the accuracy of the positive predictions made by the model. It is
particularly useful when the cost of false positives is high.

• Recall (also known as sensitivity or true positive rate) measures the ability of the model to
identify all relevant instances of the positive class. It is crucial when the cost of false
negatives is high.

• The F1 score is the harmonic mean of precision and recall. It provides a single metric that
balances both concerns, especially useful when you need to balance the trade-off between
precision and recall.
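
The four metrics can be computed directly from the true and predicted labels. With TP, FP,
and FN as the true positives, false positives, and false negatives for one class, precision
is TP / (TP + FP), recall is TP / (TP + FN), and F1 is 2 x precision x recall / (precision +
recall). The sketch below treats one label as the positive class, which is an assumption;
for the three-class problem the same computation can be repeated per class and averaged.
The function name and sample labels are illustrative.

```python
def evaluate(y_true, y_pred, positive="positive"):
    """Accuracy over all classes; precision, recall and F1 for one chosen class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "positive", "negative", "neutral"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here four of the six predictions are correct, and the positive class has two true
positives, one false positive, and one false negative, so precision, recall, and F1 are all
2/3.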



REFERENCES
[1]. Santhosh Kumar T, "Natural Language Processing: Sentiment Analysis using LSTM", 26 Aug
2022.
[2]. Natalia Kuzminykh, "Sentiment Analysis in Python With TextBlob", 12 Jan 2023.
[3]. Ahmet Tasdemir, "Customer Reviews Sentiment Analysis (Two Different Techniques)", 2023.
[4]. Jason Brownlee, "How to Develop a Deep Learning Bag-of-Words Model for Sentiment
Analysis (Text Classification)", Deep Learning for Natural Language Processing, 3 Sep 2020.
[5]. Ganesh Lokare, "Effortless Sentiment Analysis with Hugging Face Transformers: A
Beginner's Guide", 2023.
[6]. Tomáš Horváth, "Sentiment Analysis and Opinion Mining Techniques for Learning
Analytics", 2018.
[7]. "Amazon Product Reviews Sentiment Analysis in Python", GeeksforGeeks.
[8]. Lakshay Bharadwaj, "Sentiment Analysis in Online Product Reviews: Mining Customer
Opinions for Sentiment Classification", September 2023.
DOI: 10.36948/ijfmr.2023.v05i05.6090
[9]. Abhijit Bhowmik et al., "A comprehensive dataset for aspect-based sentiment analysis in
evaluating teacher performance", AJSE, Vol. 22, Issue 2. ISSN: 1608-3679 (print),
2520-4890 (online).
[10]. Sampathirao Suneetha et al., "Aspect-based sentiment analysis: a comprehensive survey of
techniques and applications". ISSN: 1004-9037. DOI: 10.5281/zenodo.777648
[11]. Duyu Tang et al., "Aspect Level Sentiment Classification with Deep Memory Network",
arXiv:1605.08900v2 [cs.CL], 24 Sep 2016.
[12]. Nandwani P., Verma R., "A review on sentiment analysis and emotion detection from
text", Soc. Netw. Anal. Min., 11, 81 (2021). https://doi.org/10.1007/s13278-021-00776-6
[13]. Fang X., Zhan J., "Sentiment analysis using product review data", Journal of Big Data, 2, 5
(2015). https://doi.org/10.1186/s40537-015-0015-2
[14]. Grljevic O., Bosnjak Z., Kovacevic A., "Opinion mining in higher education: A corpus-based
approach", Enterprise Inf. Syst. (2020), 10.1080/17517575.2020.1773542
[15]. Kastrati Z., Ahmedi L., Kurti A., Kadriu F., Murtezaj D., Gashi F., "A deep learning sentiment
analyser for social media comments in low-resource languages", Electronics, 10 (10) (2021),
10.3390/electronics10101133
[16]. Rehman A.U., Malik A.K., Raza B., Ali W., "A hybrid CNN-LSTM model for improving
accuracy of movie reviews sentiment analysis", Multimedia Tools Appl., 78 (18) (2019), pp.
26597-26613, 10.1007/s11042-019-07788-7



[17]. Singla C., Al-Wesabi N.F., Pathania Y. Singh, Alfurhood B. Sulaiman, Hilal A. Mustafa,
Rizwanullah M., Hamza M. Ahmed, Mahzari M., "An optimized deep learning model for emotion
classification in tweets", Comput. Mater. Continua, 70 (3) (2022), pp. 6365-6380,
10.32604/cmc.2022.020480
[18]. Abd D.H., Abbas A.R., Sadiq A.T., "Analyzing sentiment system to specify polarity by
lexicon-based", Bull. Electr. Eng. Inf., 10 (1) (2021), pp. 283-289, 10.11591/eei.v10i1.2471
[19]. Zainuddin N., Selamat A., Ibrahim R., "Hybrid sentiment classification on twitter
aspect-based sentiment analysis", Appl. Intell., 48 (5) (2018)
[20]. Hameed Z., Garcia-Zapirain B., "Sentiment classification using a single-layered BiLSTM
model", IEEE Access, 8 (2020), pp. 73992-74001
[21]. Ansar W., Goswami S., Chakrabarti A., Chakraborty B., "An efficient methodology for
aspect-based sentiment analysis using BERT through refined aspect extraction", J. Intell.
Fuzzy Syst., 40 (5) (2021)
[22]. Stieglitz S., Mirbabaie M., Ross B., Neuberger C., "Social media analytics – challenges in
topic discovery, data collection, and data preparation", Int. J. Inf. Manage., 39 (2018)
