Sentiment Analysis of Reviews Using Machine Learning

Uploaded by

Isha Singh

SENTIMENT ANALYSIS OF REVIEWS USING MACHINE LEARNING


Abstract
• Our project focuses on sentiment analysis of customers’
reviews on one of the most popular kinds of e-commerce platform,
women’s clothing shopping sites, using machine learning.
• Machine learning helps to improve the shopping experience by
taking personal preferences into account and making
personalized recommendations for new purchases based on
purchase history. Sentiment analysis allows e-commerce
platforms to understand the opinions expressed in customer
feedback. Besides capturing the emotions in customer feedback,
it also analyzes the opinions tied to a particular reason.
• The dataset includes attributes such as Clothing ID, Age, Title,
Review Text, Rating, Recommended IND, Positive Feedback
Count, Division Name, Department Name, and Class Name.
• We attempt to understand the correlation between different
variables in customer reviews on a women’s clothing
e-commerce site, and to classify each review by the meaning
of its words. These words in turn help us to predict whether
the reviewed product is recommended and whether the review
carries positive, negative, or neutral sentiment. To achieve
these goals, we employed the Multinomial Naive Bayes algorithm.
• To understand the dataset we use data visualization
techniques such as bar graphs, which show reviews vs. age
and category. We also create a confusion matrix to check
the performance of our classifier.
Aim
Our aim is to analyze and classify the reviews
present in the dataset by clothing category and
age group, and hence to provide recommendations
and a rating based on a customer’s new review
of a garment.
Objective
• To focus on e-commerce reviews on women’s clothing
shopping sites, with the aim of summarizing the
product reviews.
• To help retailers: e-commerce platforms can use the
summary of customer feedback to improve the quality
of the products.
• To help existing and prospective customers decide
which products interest them.
Application of Project
• Recommend the best clothes in each age
group to the customer.
• Recommend the best clothes in each
clothing category to the customer.
• Determine the Recommended IND based
on the reviews provided on the
e-commerce platform.
• Determine the rating based on the reviews
provided on the e-commerce platform.
Workflow: Pre-processing
• Dataset
• Remove unnecessary data
• Convert to lower case
• Tokenization and single representation
• Lemmatization
• Pre-processed clean data


Workflow: Polarity
• Pre-processed clean data
• Polarization: neutral, positive, negative
• Analyser (input: user interests; output: suggestion)

Workflow: Classification
• Pre-processed clean data
• One-hot encoding / sparse representation
• Bag of words
• Splitting: training data, test data
• Multinomial Naïve Bayes
• Classification model
• Desired output
Step 1: GATHER THE DATASET
• The dataset used is the Women’s Clothing E-Commerce
dataset, which revolves around reviews written by
customers.
• Besides the review text, its nine supporting features
offer a rich setting for analyzing the text along
multiple dimensions.
• The independent attributes include Clothing ID, Age,
Department Name, and Title.
• The dependent attributes Division Name and Class Name
depend on Department Name and Clothing ID. Review Text,
Rating, and Positive Feedback Count depend on Age and
Title.
Step 2: CLEANING OF DATASET
• We first extract the main attributes from the given dataset:
Clothing ID, Age, Review Text, Rating, Recommended IND,
and Class Name.
• Since we have to pre-process and clean the text, we
extract the Review Text column.
• We define a function called remove_noise, which
does the following:
• Uses the lower() function to convert the text to
lowercase.
• Uses the strip() function to remove leading and
trailing whitespace.
• Uses the replace() function together with the re
(regular expression) library to remove repeated
numbers and punctuation (e.g. “Beautiful!!!!” matches
the pattern beautiful(!*) and becomes “beautiful”).
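The cleaning steps listed for remove_noise can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it uses re.sub rather than replace(), and the exact patterns (dropping all digits and all punctuation) are assumptions.

```python
import re

def remove_noise(text: str) -> str:
    """Lowercase, trim, and strip digits and punctuation from one review."""
    text = text.lower().strip()
    # Drop runs of digits (assumption: numbers carry no sentiment here).
    text = re.sub(r"\d+", " ", text)
    # Drop punctuation, including repeated marks such as "!!!!".
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(remove_noise("  Beautiful!!!! Fits like a size 10...  "))
```

Applied to a whole column, the same function would be called once per review text.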
• Remove stop words such as “and”, “are”, “because”, and “at” from
the review text by comparing each word against the stop-word
list provided by the stop words library.
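Stop-word filtering amounts to a membership test against a list. The sketch below uses a tiny hand-written list as a hypothetical stand-in for the library-provided one, so the names and the list contents are assumptions.

```python
# Hypothetical mini stop-word list; a real run would load a library-provided list.
STOP_WORDS = {"and", "are", "because", "at", "the", "is", "it"}

def remove_stop_words(text: str) -> str:
    """Keep only the words that are not in the stop-word list."""
    kept = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(kept)

print(remove_stop_words("the fabric is soft and the colors are bright"))
```

Using a set keeps each lookup constant-time, which matters when the same check runs over tens of thousands of reviews.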
• Tokenize the text by splitting it into individual words and
tagging each word with its part of speech (example: the word
“Beautiful” is an adjective, tag JJ), in preparation for
converting the words to vector form.
• Lemmatize the words to reduce them to their base form
(example: “studied”, “studies”, and “studying” all become
“study”).
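To make the idea concrete without depending on downloaded language data, here is a toy sketch. The regular-expression tokenizer and the LEMMAS lookup table are hypothetical stand-ins for a real NLP library's tokenizer and lemmatizer, and part-of-speech tagging is left out.

```python
import re

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMAS = {"studied": "study", "studies": "study", "studying": "study"}

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(token, token) for token in tokens]

tokens = tokenize("She studied fabrics and studies patterns")
print(lemmatize(tokens))
```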
Step 3: COMPUTING POLARITY
• The pre-processed data is passed to the TextBlob library, whose
sentiment property returns a value called polarity.
• Polarity is a float derived from the word meanings in
TextBlob’s English lexicon. It scores text in the range -1 to 1:
values below 0 are negative, values above 0 are positive, and 0
is neutral.
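The mapping from a polarity score to a sentiment label can be sketched as below. The function name and the sample scores are hypothetical; in the project the score would presumably be read from TextBlob, typically as TextBlob(text).sentiment.polarity.

```python
def polarity_label(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1 and 1")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# Hypothetical scores; in the project they would come from TextBlob.
for score in (0.8, 0.0, -0.35):
    print(score, polarity_label(score))
```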
• As our objective is to give the customer the clothing IDs that
the reviews recommend most highly, we consider the clothing
IDs with the highest possible polarity, which is 1. We then
send these IDs to the analyser.
• Based on user interests such as age group or category of
clothes, the analyser suggests clothes.
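One way to picture the analyser is as a filter over scored reviews. The sketch below is an assumption about its shape, not the authors' implementation: the ScoredReview record, the suggest function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScoredReview:
    clothing_id: int
    age: int
    class_name: str
    polarity: float

def suggest(reviews, min_age, max_age, class_name=None):
    """Return clothing IDs with polarity 1.0 that match the user's interests."""
    ids = {
        r.clothing_id
        for r in reviews
        if r.polarity == 1.0
        and min_age <= r.age <= max_age
        and (class_name is None or r.class_name == class_name)
    }
    return sorted(ids)

# Hypothetical records for illustration only.
reviews = [
    ScoredReview(101, 28, "Dresses", 1.0),
    ScoredReview(102, 31, "Knits", 0.4),
    ScoredReview(103, 45, "Dresses", 1.0),
    ScoredReview(104, 26, "Knits", 1.0),
]
print(suggest(reviews, 25, 35))             # IDs for the 25-35 age group
print(suggest(reviews, 25, 35, "Dresses"))  # narrowed to one category
```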
Step 4: FINDING A GOOD DATA REPRESENTATION
• We use the CountVectorizer class of the sklearn library to achieve the
following:
• Build a vocabulary of all the unique words in our dataset, and
associate a unique index with each word in the vocabulary, using
nltk.corpus.
• The fit function learns this vocabulary from our filtered dataset.
The weight of each word is its occurrence count, not a learned
importance score.
• Next, the transform function creates a matrix with 23,486 rows (one
per review text) and one column per vocabulary word found by fit.
Because most entries are zero, it is stored as a sparse matrix.
• In each row, the entry at a word’s index records how many times that
word appears in the review. The fit_transform function performs the
fit and transform steps in one call.
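A small bag-of-words example may help here. The three reviews are hypothetical stand-ins for the real data, and the vectorizer is used with its default settings, which is an assumption about the project's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned reviews standing in for the 23,486 real ones.
reviews = [
    "love this dress",
    "this dress runs small",
    "love love the fabric",
]

vectorizer = CountVectorizer()
# fit learns the vocabulary; transform counts words; fit_transform does both.
matrix = vectorizer.fit_transform(reviews)

print(matrix.shape)            # (number of reviews, vocabulary size)
print(vectorizer.vocabulary_)  # word -> column index
print(matrix.toarray())        # dense view, only sensible for tiny data
```

Each row is one review and each column one vocabulary word, so the word "love" in the third review contributes a count of 2 in its column.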
• Now we get the final sparse matrix which we use for further
processing.
• For visualization we use the get_feature_names function
(get_feature_names_out in recent sklearn versions) to map matrix
columns back to words. This lets us find the word that occurs most
often, and its count, in the reviews for a particular clothing ID.
Step 5: BUILDING A CLASSIFIER
• Using the train_test_split function from the model_selection
module of the sklearn library, we split the sparse matrix
into training and testing data in an 80:20 ratio via the
test_size parameter.
• We then decide which classification model to use to
train our prediction model.
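An 80:20 split can be sketched as below. The toy features and labels are hypothetical, and random_state is an added assumption that only makes the shuffle reproducible.

```python
from sklearn.model_selection import train_test_split

# Hypothetical feature rows and labels; the project splits the sparse matrix.
features = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```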
• We used Multinomial Naive Bayes because it models the
likelihood through the count of each word/token (the random
variable) in a review.
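For reference, the standard textbook form of the multinomial Naive Bayes model is sketched below. The notation is an assumption and may differ from what the authors used: d is a review, c a class, x_i the count of word w_i in d, N_{ci} the total count of w_i in training reviews of class c, N_c the sum of N_{ci} over all words, |V| the vocabulary size, and \alpha a smoothing constant (\alpha = 1 gives Laplace smoothing).

```latex
P(c \mid d) \;\propto\; P(c) \prod_{i=1}^{|V|} P(w_i \mid c)^{x_i},
\qquad
P(w_i \mid c) = \frac{N_{ci} + \alpha}{N_c + \alpha\,|V|}
```

The predicted class is the c that maximizes the left-hand expression.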
• To achieve this we use the MultinomialNB class
from the naive_bayes module of the sklearn library.
• Our ultimate goal is to train the model to learn
the probabilities needed to make a
classification decision. We achieve this by
fitting the model on the training set of
data.
Step 6: USING TRAINED MODEL FOR PREDICTION
• We now use the trained model to compute
predictions. We pass the test data to the
predict function, which returns the
predicted Recommended IND or rating.
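Training and prediction together can be sketched end to end on a toy corpus. The reviews and labels are hypothetical, and the default MultinomialNB settings are an assumption about the project's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical cleaned reviews with Recommended IND labels (1 = recommended).
train_texts = [
    "love this dress perfect fit",
    "great fabric love it",
    "poor quality returned it",
    "terrible fit very disappointed",
]
train_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

model = MultinomialNB()  # default Laplace smoothing, alpha=1.0
model.fit(X_train, train_labels)

# New reviews must be transformed with the vocabulary learned above.
X_new = vectorizer.transform(["love the fabric", "poor quality"])
predictions = model.predict(X_new)
print(predictions)
```

Note that new text goes through transform, not fit_transform, so the columns line up with those the model was trained on.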
Data visualization
Figure: Histogram of Review vs. Age
Figure: Review vs. Category name
Figure: Item Id vs. Popularity
Confusion matrix
Figure: Confusion matrix for rating
Performance Measurement: Rating
• Performance report:
              precision  recall  f1-score  support
1                  0.82    0.33      0.47      199
5                  0.96    1.00      0.98     3266
micro avg          0.96    0.96      0.96     3465
macro avg          0.89    0.66      0.72     3465
weighted avg       0.95    0.96      0.95     3465
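The averaged rows of the rating report follow from the per-class rows. As a rough check, the sketch below recomputes the precision averages from the rounded values in the report; recomputing the other columns from rounded inputs can differ in the last digit.

```python
# Per-class rows copied from the rating report: (precision, support).
rows = {1: (0.82, 199), 5: (0.96, 3266)}

total_support = sum(support for _, support in rows.values())

# Macro average: plain mean over classes, ignoring class size.
macro_precision = sum(p for p, _ in rows.values()) / len(rows)
# Weighted average: mean over classes weighted by support.
weighted_precision = sum(p * s for p, s in rows.values()) / total_support

print(total_support, round(macro_precision, 2), round(weighted_precision, 2))
```

The gap between the macro and weighted values reflects how heavily class 5 dominates the test set.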
Confusion matrix
Figure: Confusion matrix for Recommended IND
Performance Measurement: Recommended IND
• Performance report:
              precision  recall  f1-score  support
0                  0.70    0.57      0.63     1036
1                  0.91    0.95      0.93     4863
micro avg          0.88    0.88      0.88     5899
macro avg          0.81    0.76      0.78     5899
weighted avg       0.87    0.88      0.88     5899
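A confusion matrix and a report of this shape can be produced with sklearn's metrics module. The labels below are hypothetical and far fewer than the 5,899 test reviews, so the numbers will not match the report.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical true and predicted Recommended IND labels.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# Rows are true classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_true, y_pred))
```

Precision for a class divides its diagonal entry by the column sum, and recall divides it by the row sum.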
Conclusion
• Through this project we were able to explore the vast libraries
and modules available for NLP and to understand and
implement a few machine learning algorithms.
• Sentiment analysis helped us to provide users with the most
highly recommended clothing IDs.
• The Multinomial Naïve Bayes algorithm also helped us to give
retailers an easy and efficient way to know whether users have
genuine interest in their products, by providing them with
predicted ratings and Recommended IND values.
• Thus, we were able to achieve our goals of improving the user
experience and retailers’ service.
