Sentiment Analysis of Reviews Using Machine Learning

Uploaded by

Isha Singh

SENTIMENT ANALYSIS OF REVIEWS USING MACHINE LEARNING

Abstract
• Our project focuses on sentiment analysis of customers'
reviews on one of the most popular e-commerce segments,
women's clothing shopping sites, using machine learning.
• Machine learning helps improve the shopping experience by
taking personal preferences into account and making
personalized recommendations, based on purchase history,
when the consumer makes a new purchase. Sentiment analysis
allows e-commerce platforms to understand the opinions
expressed in customer feedback. Along with identifying the
emotion in the feedback, it also analyzes the reasons behind
those opinions.
• The dataset includes attributes like: Clothing ID, Age, Title,
Review Text, Rating, Recommended IND, Positive Feedback
Count, Division Name, Department Name, and Class Name.
• We attempt to understand the correlation between the
variables in customer reviews on a women's clothing
e-commerce site, and to classify each review by the
meaning of its words. These words in turn help us
predict whether the reviewed product is recommended or
not and whether the review carries positive, negative, or
neutral sentiment. To achieve these goals, we employed
the Multinomial Naive Bayes algorithm.
• To understand the dataset we use data representation
techniques such as bar graphs, which show the reviews
vs. age and category. We also build a confusion matrix
to check the performance of our classifier.
Aim
Our aim is to analyze and classify the reviews
in the dataset by clothing category and age
group, and from that to provide a
recommendation and a rating based on a
customer's new review of an item of clothing.
Objective
• To focus on e-commerce reviews from women's clothing
shopping sites and help summarize the product
reviews.
• To help retailers: e-commerce platforms can use the
summary of customer feedback to improve the
quality of their products.
• To help existing and prospective customers
decide which products interest them.
Application of Project
• Recommend the best clothes in each age
group to the customer.
• Recommend the best clothes in each
category of clothes to the customer.
• Determine the Recommended IND based
on the reviews provided on the
e-commerce platform.
• Determine the rating based on the reviews
provided on the e-commerce platform.
Workflow: Pre-processing
Dataset -> Remove unnecessary data -> Into lower case -> Tokenization and single representation -> Lemmatization -> Pre-processed clean data

Workflow: Polarity
Pre-processed clean data -> Polarization -> neutral / positive / negative
User interests -> Analyser -> Suggestion

Workflow: Classification
Pre-processed clean data -> Hot encoding / sparsing -> Bag of words -> Splitting -> Training data / Test data
Multinomial Naïve Bayes -> Classification model -> Desired output
Step 1: GATHER THE DATASET
• The dataset used is the Women’s Clothing E-Commerce
dataset, which revolves around reviews written by
customers.
• Its nine supportive features offer a great environment
to parse out the text through its multiple dimensions.
• The independent attributes include Clothing ID, Age,
Department Name, and Title.
• Among the dependent attributes, Division Name and Class
Name depend on Department Name and Clothing ID, while
Review Text, Rating, and Positive Feedback Count depend
on Age and Title.
Step 2: CLEANING OF DATASET
• We first extract the main attributes from the given dataset:
Clothing ID, Age, Review Text, Rating,
Recommended IND, and Class Name.
• Since we have to pre-process and clean the text, we
extract the Review Text attribute column.
• We define a function called remove_noise, which
does the following:
• Uses the lower() function to convert the text to
lowercase.
• Uses the strip() function to remove surrounding whitespace.
• Uses the replace() function together with the re library,
which provides regular expressions, to remove repeated
numbers and punctuation (e.g. "Beautiful!!!!" becomes
"beautiful" by matching "!*").
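The slides name remove_noise but do not show its body, so the following is only a minimal sketch of the three steps listed above. The regular expression is an assumption, and the sketch uses re.sub because Python's plain str.replace does not accept patterns.

```python
import re


def remove_noise(text: str) -> str:
    """Lowercase a review, trim it, and drop digits and punctuation."""
    text = text.lower().strip()
    # Assumed pattern: replace every run of characters that are neither
    # letters nor whitespace (digits, "!!!!", "%", ".") with one space.
    text = re.sub(r"[^a-z\s]+", " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(remove_noise("  Beautiful!!!! Fits 100% true to size.  "))
```

Note that this pattern also splits contractions such as "don't" into two pieces, which may or may not match what the authors did.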
• Remove stop words such as "and", "are", "because", and "at" from
the review text by comparing it with the stop-word list provided
by the stop words library.
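As an illustration of the comparison described above, here is a small sketch. The STOP_WORDS set is a made-up sample for demonstration only; the slides do not say which stop-word list was used, and a real run would load a full one.

```python
# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"and", "are", "because", "at", "the", "is", "a", "it"}


def remove_stop_words(text: str) -> list[str]:
    """Split cleaned text on whitespace and keep only non-stop words."""
    return [word for word in text.split() if word not in STOP_WORDS]


print(remove_stop_words("the dress is lovely and the colors are bright"))
```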
• Tokenize the text by separating it into individual words, and
tag each word with its part of speech (example: the word
“Beautiful” is an adjective, tag JJ), in preparation for
converting the words to vector form.
• Lemmatize the words to reduce them to their base form
(example: “studied”, “studies”, and “studying” all become
the simple form “study”).
Step 3: COMPUTING POLARITY
• The pre-processed data is sent to the textblob library, whose
sentiment module exposes a variable called polarity.
• Polarity is a float that is derived from the meaning of the
word given by British English and rates words in the range
-1 to 1: values from -1 up to 0 indicate negative words, values
above 0 up to 1 indicate positive words, and 0 indicates a
neutral word.
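The ranges above can be turned into labels with a few comparisons. This sketch assumes the polarity score has already been computed, for instance with TextBlob(text).sentiment.polarity; the function name polarity_label is hypothetical.

```python
def polarity_label(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


for score in (-0.6, 0.0, 1.0):
    print(score, polarity_label(score))
```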
• As our objective is to give the customer the clothing IDs
that the reviews show to be highly recommended, we
consider the clothing IDs that have the highest possible
polarity, that is 1. We then send these IDs to the analyser.
• Based on user interests such as age group or category of
clothes, the analyser suggests the clothes.
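The slides do not show how the analyser is implemented. One possible reading is sketched below: keep the reviews whose polarity is 1, then filter by the user's category and age range. The Review class, the suggest function, and the sample rows are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Review:
    clothing_id: int
    age: int
    class_name: str
    polarity: float


def suggest(reviews, class_name=None, min_age=None, max_age=None):
    """Return clothing IDs that have a polarity-1 review matching the filters."""
    picked = set()
    for review in reviews:
        if review.polarity < 1.0:
            continue  # keep only the highest possible polarity
        if class_name is not None and review.class_name != class_name:
            continue
        if min_age is not None and review.age < min_age:
            continue
        if max_age is not None and review.age > max_age:
            continue
        picked.add(review.clothing_id)
    return sorted(picked)


# Made-up sample rows, not taken from the dataset.
sample = [
    Review(101, 25, "Dresses", 1.0),
    Review(102, 41, "Dresses", 1.0),
    Review(103, 27, "Knits", 1.0),
    Review(104, 26, "Dresses", 0.4),
]
print(suggest(sample, class_name="Dresses", min_age=20, max_age=30))
```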
Step 4: FINDING A GOOD DATA REPRESENTATION
• We use the CountVectorizer class of the sklearn library to achieve the
following:
• Build a vocabulary of all the unique words in our dataset, and
associate a unique index with each word in the vocabulary using
nltk.corpus.
• Next, the fit function matches the words of our filtered dataset
against this vocabulary and records the corresponding value
(weightage) for each word.
• The transform function then creates a matrix of size 23,486 reviews
× the number of words computed by the fit function. This is called a
sparse matrix.
• We then add the weightage of each word in our filtered dataset at
its corresponding position in the sparse matrix: at each index in a
review's row, we record how many times the given word appears in
that review. The fit_transform function performs both steps in one call.
• Now we get the final sparse matrix which we use for further
processing.
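To make the fit and transform steps concrete, here is a plain-Python sketch of a bag-of-words count matrix. It is not the scikit-learn implementation: CountVectorizer returns a SciPy sparse matrix and applies its own tokenization, whereas this sketch splits on whitespace and returns dense lists. The function names and documents are illustrative.

```python
from collections import Counter


def fit_vocabulary(documents):
    """Assign a unique column index to every distinct word (the 'fit' step)."""
    words = sorted({word for doc in documents for word in doc.split()})
    return {word: index for index, word in enumerate(words)}


def transform(documents, vocabulary):
    """Count word occurrences per document (the 'transform' step)."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.split())
        # The dict preserves insertion order, so columns follow the indices.
        matrix.append([counts.get(word, 0) for word in vocabulary])
    return matrix


docs = ["soft fabric soft color", "poor fabric"]
vocab = fit_vocabulary(docs)
print(vocab)
print(transform(docs, vocab))
```

Each row corresponds to one review and each column to one vocabulary word. With 23,486 reviews most entries are zero, which is why a sparse representation is used in practice.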
• For visualization purposes we use the get_feature_names function
(get_feature_names_out in recent scikit-learn versions) to look up the
word that occurred the maximum number of times, and its weight, in
the reviews for a particular clothing ID.
Step 5: BUILDING A CLASSIFIER
• Using the train_test_split function of the model_selection
module of the sklearn library, we split the sparse matrix into
training and testing data in the ratio 80:20 via the
test_size parameter.
• Next we decide which classification model to use
to train our prediction model.
• We used Multinomial Naive Bayes because it bases the
likelihood on the count of a word/token (random variable),
and Naive Bayes calculates the likelihood as follows:
[formula shown as an image on the original slide]
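The formula on the original slide did not survive extraction. For reference, the textbook form of the multinomial Naive Bayes model is sketched below; it is not recovered from the slide. The symbols are assumptions introduced here: x_i is the count of word w_i in review d, N_{ci} is the total count of w_i in training reviews of class c, N_c is the total word count in class c, |V| is the vocabulary size, and α is the smoothing parameter.

```latex
P(c \mid d) \;\propto\; P(c) \prod_{i=1}^{|V|} P(w_i \mid c)^{x_i},
\qquad
\hat{P}(w_i \mid c) = \frac{N_{ci} + \alpha}{N_c + \alpha\,|V|}
```

With α = 1 this is Laplace smoothing, which keeps a word that never appears in a class from forcing the whole product to zero.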
• To achieve this we use the MultinomialNB class
from the naive_bayes module of the sklearn library.
• Our ultimate goal is for the model to learn
the probabilities needed to make a
classification decision. We achieve this by
training the model on the training set of
data.
Step 6: USING TRAINED MODEL FOR PREDICTION
• We now use the prediction model obtained
from training to compute further
predictions. We pass the test data to the
predict function, which returns the
predicted values, such as the Recommended IND
or the rating.
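Steps 4 to 6 can be sketched end to end with scikit-learn as follows. This is a minimal sketch, not the authors' code: the ten reviews and their labels are made up to stand in for the Review Text and Recommended IND columns, and random_state is added only to make the split repeatable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-ins for the Review Text and Recommended IND columns.
reviews = [
    "love this dress fits perfectly",
    "beautiful fabric and great fit",
    "very comfortable and flattering",
    "great quality love the color",
    "perfect fit highly recommend",
    "poor quality fabric felt cheap",
    "too small and very itchy",
    "returned it terrible fit",
    "cheap material not flattering",
    "disappointed poor stitching",
]
recommended = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Step 4: bag-of-words counts as a sparse matrix.
features = CountVectorizer().fit_transform(reviews)

# Step 5: 80:20 split, then train Multinomial Naive Bayes.
X_train, X_test, y_train, y_test = train_test_split(
    features, recommended, test_size=0.2, random_state=0
)
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 6: predict Recommended IND for the held-out reviews.
predictions = model.predict(X_test)
print(predictions, y_test)
```

The same pattern would apply to the rating target by swapping the label list.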
Data visualization
• Histogram of Review vs. Age
• Review vs. Category name
• Item Id vs. Popularity
Confusion matrix
Confusion matrix for rating
Performance Measurement: Rating
• Performance report:
              precision    recall  f1-score   support
1                  0.82      0.33      0.47       199
5                  0.96      1.00      0.98      3266
micro avg          0.96      0.96      0.96      3465
macro avg          0.89      0.66      0.72      3465
weighted avg       0.95      0.96      0.95      3465
Confusion matrix
Confusion matrix for Recommended IND
Performance Measurement: Recommended IND
• Performance report:
              precision    recall  f1-score   support
0                  0.70      0.57      0.63      1036
1                  0.91      0.95      0.93      4863
micro avg          0.88      0.88      0.88      5899
macro avg          0.81      0.76      0.78      5899
weighted avg       0.87      0.88      0.88      5899
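The f1-score and average rows in these reports follow from the per-class precision, recall, and support. The sketch below recomputes a few of them from the rounded figures printed above, so results agree with the reports only up to rounding. The helper names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes, weighted by the number of samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# (precision, recall, support) per class, copied from the Rating report.
rating = {"1": (0.82, 0.33, 199), "5": (0.96, 1.00, 3266)}

for label, (precision, recall, support) in rating.items():
    print(label, round(f1(precision, recall), 2))

precisions = [p for p, _, _ in rating.values()]
supports = [s for _, _, s in rating.values()]
print("macro precision:", round(macro_average(precisions), 2))
print("weighted precision:", round(weighted_average(precisions, supports), 2))
```

The gap between the macro and weighted rows reflects class imbalance: rating 5 has 3266 test samples against 199 for rating 1, so the weighted averages are dominated by the majority class.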
Conclusion
• Through this project we were able to explore the vast libraries
and modules available for NLP and also understand and
implement a few machine learning algorithms.
• Sentiment analysis helped us provide users with the best
recommended clothing IDs.
• Also, the Multinomial Naïve Bayes algorithm helped us give
retailers an easy and efficient way to know whether
users have genuine interest in their products, by providing
them with true ratings and Recommended IND values.
• Thus, we were able to achieve our goals of improving the user
experience and the retailers' service.
