Synopsis2 2
Belagavi-590018
Synopsis on
Bachelor of Engineering
In
Artificial Intelligence & Data Science
In the digital age, online reviews have become a crucial component of consumer decision-
making. Whether purchasing products, booking services, or choosing entertainment options,
consumers increasingly rely on the experiences and opinions of others shared through various
online platforms. However, the sheer volume of these reviews can be overwhelming,
which calls for efficient ways to interpret and analyze this data. This is where
sentiment analysis comes into play.
Sentiment analysis, also known as opinion mining, is a field within natural language
processing (NLP) that focuses on identifying and categorizing opinions expressed in a piece of
text. Specifically, it aims to determine the writer’s attitude towards a particular topic, product,
or service—whether it is positive, negative, or neutral. In the context of online reviews,
sentiment analysis helps businesses and consumers make sense of the collective feedback and
opinions embedded within large sets of textual data.
Common use cases of sentiment analysis include monitoring customer feedback, targeting
individual customers to improve the service they receive, and tracking how a change in a
product or service affects how customers feel. It also helps to track customer sentiment over
time. From opinion polls to marketing strategy, sentiment analysis has changed the way
businesses operate.
Various levels of sentiment analysis can be performed, depending on the specific focus and
objective of the analysis. Some common levels include:
The main aim behind the implementation is to develop a sentiment analysis system that
categorizes online reviews as positive, negative, or neutral. This system should efficiently
handle large volumes of unstructured text data and provide actionable insights to aid consumer
decision-making and business strategy, ensuring accuracy and reliability across various review
platforms.
This section reviews previous work on sentiment analysis and provides an overview of
existing knowledge in this field of research.
Recent research has explored more sophisticated approaches for sentiment analysis, such as
LSTM (Long Short-Term Memory) models. For instance, a study by Santosh Kumar T (2022)
employed an LSTM model to classify sentiment in Amazon Alexa product reviews, achieving
a notable accuracy of 90.9%. The success of this model underscores the potential for
leveraging advanced neural network architectures in sentiment analysis tasks. Moreover, the
authors suggest that further optimization through hyperparameter tuning could potentially
enhance the model's performance, highlighting avenues for future research in this area. This
literature review emphasizes the importance of exploring diverse methodologies, including
deep learning techniques like LSTMs, to improve sentiment classification accuracy and overall
model robustness in analyzing product reviews [1].
A sentiment analysis was carried out on a dataset of reviews using two different techniques:
VADER (Valence Aware Dictionary and sEntiment Reasoner) and a RoBERTa pretrained model from Hugging
Face’s pipeline. VADER is a lexicon and rule-based sentiment analysis tool specifically
attuned to sentiments expressed in social media and works well on texts from other domains.
RoBERTa is a pre-trained machine learning model developed by the Facebook AI Research
(FAIR) team, which is a modified version of the BERT model that improves upon its
architecture in several ways. Hugging Face Pipeline is a library provided by the Hugging Face
The most important objectives of online review sentiment analysis can be summarized as follows:
Data collection is the process of gathering and measuring information on targeted variables in
a systematic manner, which enables researchers to answer specific research questions, test
hypotheses, and evaluate outcomes. The first step is to obtain a large-scale dataset of online
product reviews from various sources, such as e-commerce platforms or review aggregation
websites. The dataset should represent diverse product domains and contain reviews with a
wide range of sentiment expressions.
Online reviews are gathered from sources such as Amazon, Yelp, TripAdvisor, or any other
platform relevant to the analysis. The dataset can be downloaded from Kaggle and includes
both the review text and any associated metadata, such as ratings, review dates, and
product/service identifiers.
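The loading step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`review_text`, `rating`, `review_date`, `product_id`), the three sample rows, and the rule that maps star ratings to sentiment labels are all assumptions made for the example, and a real Kaggle export may use different headers.

```python
import csv
import io

# Made-up sample rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """review_text,rating,review_date,product_id
"Works great, very happy with it",5,2023-01-14,B001
"Stopped working after a week",1,2023-02-02,B001
"It is okay for the price",3,2023-02-20,B002
"""

def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to a coarse label (assumed thresholds)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"

def load_reviews(handle):
    """Read review rows, skipping rows with empty text or a non-numeric rating."""
    reviews = []
    for row in csv.DictReader(handle):
        text = (row.get("review_text") or "").strip()
        rating = (row.get("rating") or "").strip()
        if not text or not rating.isdigit():
            continue
        reviews.append({
            "text": text,
            "rating": int(rating),
            "label": rating_to_label(int(rating)),
            "date": row.get("review_date"),
            "product_id": row.get("product_id"),
        })
    return reviews

# In practice the handle would be an opened CSV file rather than a string.
reviews = load_reviews(io.StringIO(SAMPLE_CSV))
for review in reviews:
    print(review["label"], "-", review["text"])
```

Deriving labels from ratings is only one possible choice; a dataset that already ships with sentiment labels would not need the `rating_to_label` step.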
Training a model refers to the process of teaching a machine learning algorithm to make
predictions or decisions based on data. During training, the model learns patterns and
relationships within the training data, adjusting its parameters to minimize errors and improve
accuracy. Selecting a suitable machine learning model or algorithm based on the problem type
and data characteristics plays a vital role. In this case we are using BERT (Bidirectional
Encoder Representations from Transformers), a transformer-based model. It
represents a significant advancement in natural language processing (NLP) by leveraging the
power of the transformer architecture for a wide range of language understanding tasks. Once
the model is selected, the pre-trained BERT model is fine-tuned on the sentiment analysis task
using the labeled dataset. This involves adjusting the model weights using the labeled examples to
optimize performance for the sentiment classification task.
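To make the idea of "adjusting parameters to minimize errors" concrete, the sketch below trains a deliberately tiny classifier with gradient descent. It is not BERT and not the fine-tuning procedure used in this work: logistic regression is swapped in because it fits in a few lines of plain Python. The two features (counts of positive and negative words per review) and the four training rows are assumptions invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Probability that a feature vector belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(features, labels, weights, bias):
    """Average cross-entropy error over the dataset."""
    total = 0.0
    for x, y in zip(features, labels):
        p = predict_proba(x, weights, bias)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(features)

def train(features, labels, lr=0.5, epochs=200):
    """Batch gradient descent: nudge the weights against the error gradient."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical features: [positive-word count, negative-word count]
X = [[2, 0], [1, 0], [0, 2], [0, 1]]
y = [1, 1, 0, 0]  # 1 = positive review, 0 = negative review

loss_before = log_loss(X, y, [0.0, 0.0], 0.0)
weights, bias = train(X, y)
loss_after = log_loss(X, y, weights, bias)
print("loss before:", round(loss_before, 3), "loss after:", round(loss_after, 3))
```

Fine-tuning BERT follows the same principle at a much larger scale: the pre-trained weights are the starting point instead of zeros, and the labeled reviews drive the weight updates.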
4.3.2 BERT
BERT (Bidirectional Encoder Representations from Transformers), developed by Google AI in 2018,
revolutionized natural language processing by introducing a model that reads text
bidirectionally, leveraging the Transformer architecture for enhanced context understanding.
Unlike previous models that processed text in one direction, BERT captures context from both
directions simultaneously, leading to deeper comprehension of word meanings. Pre-trained on
a massive corpus and fine-tuned for specific tasks, BERT has set new performance
benchmarks in tasks such as question answering, sentiment analysis, and named entity
recognition, and BERT-based models have also been applied to text summarization and
machine translation. Its release as an open-source model has democratized
access to advanced NLP capabilities, fostering widespread innovation and development in the
field.
Model evaluation refers to the process of assessing the performance of a trained model on unseen data. It
involves measuring how well the model generalizes to new, previously unseen examples and
how effectively it accomplishes the task it was trained for. Evaluation provides insights into
the model's strengths, weaknesses, and overall effectiveness, helping to guide further
improvements or decisions.
Various performance metrics such as accuracy, precision, recall, and F1 score are measured.
Precision focuses on the accuracy of the positive predictions made by the model. It is particularly
Recall (also known as sensitivity or true positive rate) measures the ability of the model to
identify all relevant instances of the positive class. It is crucial when the cost of false
negatives is high.
The F1 score is the harmonic mean of precision and recall. It provides a single metric that
balances both concerns, especially useful when you need to balance the trade-off between
precision and recall.
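The metrics described above can be computed directly from lists of true and predicted labels. The following is a minimal sketch that treats one class as the "positive" class in a one-vs-rest fashion; the function names and the six example labels are assumptions made for illustration, not results from this work.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # precision = TP / (TP + FP); recall = TP / (TP + FN)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]

print("accuracy:", accuracy(y_true, y_pred))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

For a three-class problem (positive, negative, neutral), the same function can be called once per class and the per-class scores averaged, which is one common way to summarize multi-class performance.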