Review Analysis and Sentiment Learning Using NLP

The project report titled 'Review Analysis and Sentiment Learning Using NLP' by Rohan Sahu and Sahil focuses on utilizing Natural Language Processing to analyze online reviews and determine sentiment polarity (positive, negative, or neutral). The report highlights the growing importance of automated systems for processing vast amounts of user-generated textual data in e-commerce and social media, addressing challenges such as sarcasm and linguistic nuances. The project aims to develop a scalable and accurate sentiment analysis system to aid businesses in understanding consumer behavior and improving their offerings.

Review Analysis and Sentiment Learning Using NLP

A PROJECT REPORT
SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE AWARD OF DEGREE
OF
BACHELOR OF TECHNOLOGY
IN
SOFTWARE ENGINEERING
Submitted by:
ROHAN SAHU
(2k22/SE/139)
SAHIL
(2k22/SE/149)
Under the supervision of
Dr. SONIKA DAHIYA
ASSISTANT PROFESSOR

DEPARTMENT OF SOFTWARE ENGINEERING


DELHI TECHNOLOGICAL UNIVERSITY
(Formerly Delhi College of Engineering)
Bawana Road Delhi-110042
DECEMBER,2024

CANDIDATE’S DECLARATION

We, Rohan Sahu (2k22/SE/139) and Sahil (2k22/SE/149), students of B.Tech. (Software
Engineering), hereby declare that the project dissertation titled “Review Analysis and
Sentiment Learning using NLP” which is submitted by us to the Department of Software
Engineering, Delhi Technological University, Delhi in partial fulfillment of the requirements
for the award of the degree of Bachelor of Technology, is original and not copied from any
source without proper citation. This work has not previously formed the basis for the award
of any Degree, Diploma, Associateship, Fellowship or other similar title or recognition.

Place: Delhi
Date: 12th December 2024

ROHAN SAHU (2K22/SE/139)
SAHIL (2K22/SE/149)

CERTIFICATE
I hereby certify that the Project Dissertation titled “Review Analysis and Sentiment Learning
using NLP” which is submitted by Rohan Sahu (2k22/SE/139) and Sahil (2k22/SE/149),
Department of Software Engineering, Delhi Technological University, Delhi in partial
fulfilment of the requirement for the award of the degree of Bachelor of Technology, is a
record of the project work carried out by the students under my supervision. To the best of
my knowledge, this work has not been submitted in part or full for any Degree or Diploma to
this University or elsewhere.

Place: Delhi
Date: 12th December, 2024

Dr. SONIKA DAHIYA
ASSISTANT PROFESSOR
DEPARTMENT OF SOFTWARE ENGINEERING
SUPERVISOR

ABSTRACT

As e-commerce grows exponentially, online reviews and customer opinions have become
critical to companies from both a business and a survey point of view. Our project,
“Review Analysis and Sentiment Learning”, aims to extract textual data from product or
movie reviews and identify the polarity (negative, positive or neutral) of the opinion.
The system seeks to provide an accurate and scalable solution, with its accuracy achieved
through the use of Natural Language Processing (NLP). Future enhancements include
real-time analysis of live review streams.

ACKNOWLEDGEMENT
I would like to express my gratitude and appreciation to all those who gave me the
opportunity to complete this dissertation. Special thanks go to my supervisor, Dr.
Sonika Dahiya (Assistant Professor), Department of Software Engineering, whose
stimulating suggestions and encouragement helped and guided me in completing this
project. It is with their supervision that this work came into existence.

I would like to thank the Department of Software Engineering for providing the
infrastructure, facilities and opportunity to work on this educational project, which
helped me learn new things about this fast-growing technology.

I also want to give special thanks to all my classmates, who supported me in every course
and shared their valuable ideas and thoughts with me.

ROHAN SAHU (2K22/SE/139)
SAHIL (2K22/SE/149)

TABLE OF CONTENTS

CANDIDATE’S DECLARATION
SUPERVISOR’S CERTIFICATE
ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS, ABBREVIATIONS AND NOMENCLATURE

1. INTRODUCTION
1.1 BACKGROUND AND MOTIVATION
1.2 PROBLEM STATEMENT
2. RELATED WORK
2.1 BACKGROUND RESEARCH
3. METHODOLOGY
3.1 NATURAL LANGUAGE PROCESSING
4. DATA, SOFTWARE AND HARDWARE
5. EXPLORATORY DATA ANALYSIS
6. EXPERIMENTAL WORK
6.1 BUILDING MODEL
6.2 EXPLANATION OF MODEL
6.2.1 STEPS OF MODELING
6.2.2 THE CODE OF THE MODEL CONSTRUCTION
7. RESULTS
8. LIMITATIONS
9. CONCLUSION AND FUTURE WORK
REFERENCES

CHAPTER-1
INTRODUCTION
1.1 BACKGROUND AND MOTIVATION

In the age of e-commerce and online streaming platforms, user-posted content such
as reviews, feedback, and social media posts has become a valuable resource for
understanding consumer behaviour and sentiment, helping businesses understand
what is trending in the market. Businesses rely heavily on this textual data to assess
customer satisfaction, monitor brand perception, and make cost-effective and meaningful
decisions. However, manually analysing such vast and diverse data is not only
time-consuming but also prone to bias, inconsistency and inaccuracy. This has driven the
need for automated systems capable of processing and extracting meaning from textual data.

In this regard, the field of natural language processing (NLP) has become revolutionary,
providing sophisticated models and algorithms for efficient text analysis. One important
use of natural language processing (NLP) is sentiment analysis, which helps businesses
identify underlying themes, categorize reviews as neutral, negative, or positive, and
assess the emotional tone of customer feedback. Despite tremendous advancements in
this area, problems like managing sarcasm, domain-specific language, and linguistic
subtleties still exist.

The growing need for scalable, precise, and affordable tools to interpret unstructured text
data is what inspired this project. This project intends to close current gaps and develop a
system that can provide useful insights by utilizing cutting-edge NLP techniques. In
addition to helping companies improve their goods and services, being able to handle
customer reviews and sentiments effectively builds stronger customer relationships,
which in turn boosts growth and competitiveness.

1.2 PROBLEM STATEMENT


The amount of user-generated textual data, including reviews, comments, and feedback,
has increased to previously unheard-of levels due to the quick development of social
media, e-commerce websites, and online multimedia platforms. Important information
about user preferences, opinions, and satisfaction levels can be found in this abundance
of data. But it can be difficult to glean valuable information from such diverse and
unstructured text.

Because manual review analysis takes a lot of time, is prone to human error, and cannot
keep up with the sheer volume of data, it is not scalable. Accurate text interpretation is
further complicated by elements like sarcasm, domain-specific terminology, ambiguous
language, and cultural quirks. Current automated solutions frequently find it difficult to
manage these complexities, which results in less-than-ideal sentiment detection and
thematic analysis outcomes.
As a result, a reliable, scalable, and accurate system that can classify sentiments, analyze
and interpret user-generated text data, and offer useful insights is desperately needed.
By utilizing cutting-edge Natural Language Processing (NLP) techniques, this project
seeks to close this gap and create a system that can overcome these obstacles,
empowering researchers and businesses to make defensible decisions based on
trustworthy sentiment analysis and review interpretation.
CHAPTER-2

RELATED WORK
2.1. BACKGROUND RESEARCH
Sentiment analysis aims to automatically identify and classify the sentiment expressed
in text using computer technology and linguistic knowledge. The field has seen many
developments and improvements, with authors and universities applying different
machine learning techniques to evaluate textual sentiment.
The TextRank algorithm is a graph-based text summarization method that represents
words or phrases as nodes in a graph, with edge weights capturing semantic similarity.
Another method, Word2Vec, learns distributed word representations that capture
semantic relationships within a continuous vector space.
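
The graph-based ranking idea behind TextRank can be illustrated with a minimal Python sketch. It is not the code used in this project, and it makes simplifying assumptions: edges are unweighted co-occurrence links within a small window rather than semantic-similarity weights, and the function name `textrank_keywords`, its parameters, and the sample sentence are chosen here purely for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=30):
    """Rank words with a PageRank-style iteration over a co-occurrence graph."""
    # Build an undirected graph: words are nodes, and an edge links two
    # different words that appear within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start every node with the same score, then repeatedly let each node
    # pass its score on to its neighbors in equal shares.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative input only; a real pipeline would normally remove stop words first.
tokens = "battery life is great but battery charging is slow".split()
ranking = textrank_keywords(tokens)
print(ranking[:3])
```

Words that co-occur with many other words collect score from many neighbors, which is why well-connected words rise to the top of the ranking.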

Deep learning-based approaches have also been used extensively in sentiment analysis,
including deep neural networks, CNNs, and attention mechanism-based networks.
Meena et al. used a CNN to classify the sentiment polarity of social media data. They
categorized comments preferred by people into sentiment polarities such as positive,
negative, and neutral, achieving an accuracy of 95.4%. Similarly, Kruspe et al. used a
neural network with pre-trained word and sentence embeddings to perform sentiment
analysis on European COVID-19-related Twitter messages. 79,000 of the 4.6 million
tweets that this model examined contained COVID-19 keywords along with semantic
information.
Other work applied various RNN variants to Amazon reviews to classify customer
sentiment as negative, neutral, or positive. These RNNs were combined with different
word embeddings for feature extraction, reaching a highest accuracy of 93.75%.
In sentiment learning, both machine learning and deep learning are used to understand
the meaning of an opinion, but they work in different ways.
Machine learning-based methods are quick to build and work well on small datasets,
but they fail to deliver high accuracy on large and complex datasets. Deep
learning-based methods handle larger and more complicated datasets and deliver high
accuracy, but they are difficult to develop, take more time, and need more labelled
data.

Summary of related work (S.No, title of paper and year, author, objective, limitation, link/reference):

1. Title: Combining an SVM Classifier and Character N-gram Language Models for
   Sentiment Analysis on Twitter Text (2013)
   Author: Qi Han, Junfei Guo and Hinrich Schutze
   Objective: The objective of this model is to develop an effective sentiment analysis
   system for classifying Twitter and SMS text messages. It utilizes an SVM classifier
   with various features, including bag-of-words (unigrams and bigrams), POS tags,
   stylistic features, readability scores, emoticons, and domain-specific elements. To
   address lexical variation in Twitter text, character n-gram language models are
   incorporated. By combining these approaches, the model aims to achieve high
   performance and robustness across different datasets.
   Limitation: The N-gram model relies on fixed sequences of words, which may not
   effectively capture the deeper context or meaning of sentences. The model also has a
   scalability issue, as it is intended for smaller data.
   Link: https://aclanthology.org/S13-2086.pdf

2. Title: Recurrent Attention Network on Memory for Aspect Sentiment Analysis (2017)
   Author: Peng Chen, Zhongqian Sun, Lidong Bing, Wei Yang
   Objective: The objective of this model is to develop a neural network-based
   framework for sentiment analysis of opinion targets in reviews and comments. It uses
   a multiple-attention mechanism to capture long-distance sentiment features, combined
   with a recurrent neural network for improved expressive power. A weighted-memory
   mechanism eliminates the need for extensive feature engineering. The model is
   evaluated on four datasets, outperforming existing methods in various contexts.
   Limitation: Depends on a large labelled dataset, uses a complex model, and may be
   affected by ambiguity in the dataset.
   Link: https://aclanthology.org/D17-1047.pdf

3. Title: TextRank: Bringing Order into Texts (2004)
   Author: Rada Mihalcea and Paul Tarau
   Objective: A graph-based text summarization methodology that represents words or
   phrases as nodes in a graph, with edge weights capturing semantic similarity.
   Limitation: Lacks true semantic understanding, struggles with varying text lengths
   and structures, and has difficulty handling synonyms or polysemy effectively due to
   its reliance on preprocessing quality and statistical co-occurrence without deeper
   linguistic analysis.
   Link: https://aclanthology.org/W04-3252.pdf

4. Title: Categorizing Sentiment Polarities in Social Networks Data Using
   Convolutional Neural Network (2021)
   Author: Gaurav Meena, Krishna Kumar Mohbey and Ajay Indian
   Objective: Aimed to classify the sentiment polarity in social media data. They
   categorized comments preferred by people into sentiment polarities such as positive,
   negative, and neutral, achieving an impressive accuracy of 95.4%.
   Limitation: Social network data is full of unstructured text, slang, and
   abbreviations, making it hard to identify sentiments like positive, negative, or
   neutral. Traditional methods often fail to handle these challenges, so a better model
   is needed to accurately analyze and classify sentiments.
   Link: https://link.springer.com/article/10.1007/s42979-021-00993-y#Bib1

5. Title: Cross-language sentiment analysis of European Twitter messages during
   COVID-19 pandemic (2020)
   Author: Anna Kruspe, Matthias Häberle, Iona Kuhn, Xiao Xiang Zhu
   Objective: Used a neural network with pre-trained word and sentence embeddings to
   perform sentiment analysis on European COVID-19-related Twitter messages. 79,000 of
   the 4.6 million tweets that this model examined contained COVID-19 keywords along
   with semantic information.
   Limitation: During the COVID-19 pandemic, analyzing public sentiment across
   different European countries is challenging due to the multilingual nature of Twitter
   messages. Traditional sentiment analysis methods often struggle with cross-language
   processing and lack the ability to accurately capture sentiments in diverse linguistic
   contexts. This creates a need for a robust approach to analyze and compare sentiments
   across multiple languages to better understand public reactions during the crisis.
   Link: https://arxiv.org/abs/2008.12172
CHAPTER-3

METHODOLOGY
3.1 NATURAL LANGUAGE PROCESSING
Natural language processing (NLP) is a branch of computer science, and more
specifically of artificial intelligence. It is closely related to information retrieval,
knowledge representation, and computational linguistics, a branch of linguistics,
since its main goal is to enable computers to process data that has been encoded in
natural language. Usually, rule-based, statistical, or neural-based methods for
machine learning and deep learning are used to gather data from text corpora.
NLP research has helped enable the era of AI. From the communication skills of large
language models to image generation, and from voice assistants to the recommendation
systems of social media, NLP is already part of the day-to-day life of many people.

BENEFITS OF NLP
NLP makes it easier for humans to communicate with machines by allowing them to
interact with machines in natural human language. Its benefits include:
1. Automation of repetitive tasks
2. Sentiment analysis
3. Enhanced research work
4. Content generation
NLP TASKS
The following tasks are made easier by NLP and can be performed with high accuracy:
1. Sentiment analysis
Sentiment analysis identifies the sentiment, or underlying meaning, of a given
sentence. NLP identifies the polarity of a word or phrase as positive, negative or
neutral. Large companies and movie makers use it to analyse consumer feedback on a
particular movie or product so that they can make business decisions.

2. Text summarization
NLP summarizes text by analysing the text provided to it and extracting its meaning.
The model then generates new text with a meaning similar to that of the original text
given to it.

3. Speech recognition
After training on an audio dataset, an NLP system can recognise a person's voice. This
capability is used in various fields, such as voice recognition locks, Google Assistant,
and Apple Siri.

4. Text classification
Text classification assigns predefined categories to a given text, for example in spam
detection, sentiment analysis, or topic labelling.
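
As an illustration of how text classification can assign a sentiment label, the following Python sketch trains a small bag-of-words Naive Bayes classifier. It is an assumed example for explanation only, not the model built in this report; the function names `train_naive_bayes` and `predict` and the training reviews are made up for the sketch.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        # Log prior plus log likelihoods with add-one (Laplace) smoothing,
        # so unseen words do not produce a zero probability.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training reviews, for illustration only.
training_data = [
    ("great phone and great battery", "positive"),
    ("love the camera", "positive"),
    ("terrible screen and poor battery", "negative"),
    ("poor quality and bad support", "negative"),
]
model = train_naive_bayes(training_data)
print(predict("great camera", model))  # positive
print(predict("poor screen", model))   # negative
```

The same structure extends to a neutral class or to other categories such as spam and topic labels by adding labelled examples for them.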

CHALLENGES IN NLP
Biased Training
NLP models can inherit biases from their training data, leading to skewed results,
especially in sensitive areas like healthcare or HR. If the data is biased, the model’s
predictions will be too, affecting accuracy and fairness.

Misinterpretation (Garbage In, Garbage Out)


NLP systems, especially speech-to-text, can struggle with unclear inputs, such as
dialects, slang, or background noise. Mispronunciations, grammar errors, or
fragmented speech can lead to inaccurate results.

New Vocabulary
Language is constantly evolving, and NLP systems may struggle with new words or
shifting grammar. This can lead to incorrect guesses or confusion, especially in fast-
changing fields like technology or pop culture.
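
The new-vocabulary problem can be seen in a small Python sketch. It assumes a simple whitespace tokenizer and a hypothetical `<unk>` placeholder token; the helper names and sample sentences are illustrative and are not part of this project.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for unseen words


def build_vocabulary(texts, min_count=1):
    """Map each sufficiently frequent training word to an integer id."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {UNK: 0}
    for word, count in counts.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert text to ids, sending unseen words to the UNK id."""
    return [vocab.get(word, vocab[UNK]) for word in text.lower().split()]


def oov_rate(text, vocab):
    """Fraction of words in the text that the vocabulary has never seen."""
    words = text.lower().split()
    return sum(word not in vocab for word in words) / len(words)


vocab = build_vocabulary(["the phone is good", "the screen is bad"])
print(encode("the phone is mid", vocab))    # [1, 2, 3, 0]
print(oov_rate("the phone is mid", vocab))  # 0.25
```

Here the slang word "mid" carries the opinion, yet it collapses to the placeholder id, so a model trained on this vocabulary has no signal to work with for that review.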

Tone of Voice
NLP struggles to capture the emotional tone or intent behind words, such as sarcasm
or emphasis. This makes sentiment analysis and understanding user intent more
difficult and less reliable.
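
A short Python sketch shows why tone is hard for simple approaches. It uses a deliberately naive word-counting scorer with a hypothetical mini-lexicon, which is an assumption made for illustration and not the method of this report.

```python
# Hypothetical mini-lexicon; real systems use far larger word lists.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "poor", "broke"}


def lexicon_polarity(text):
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The battery life is excellent."))              # positive
print(lexicon_polarity("Great, it broke after one day. Just great."))  # positive
```

The second review is sarcastic and clearly negative to a human reader, but the scorer sees two positive words against one negative word and labels it positive.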
