Amazon Product Review Sentiment Analysis With Machine Learning
Volume 5 Issue 4, May-June 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
INTRODUCTION
As online marketplaces have grown in popularity over the years, online retailers and vendors have encouraged their customers to share their thoughts on the items they have purchased. Thousands of reviews are written every day on the Internet about a wide range of products, programmes, and locations. As a result, the Internet has surpassed all other sources for collecting information and opinions on a product or service.
The Internet has revolutionized the way we purchase products. Product testing is often not feasible in the retail e-commerce environment of an online marketplace. Furthermore, in today's retail environment, a large number of new products are introduced on a regular basis. As a result, consumers rely heavily on product feedback to shape their opinions in preparation for a more complex cognitive process during purchasing. Users, on the other hand, find it challenging to search through and compare text reviews. As a result, we want a better numerical rating system that is backed up by feedback, so that consumers can easily make a buying decision.
Customers may need a rating mechanism at some point during their decision-making process in order to locate useful feedback as quickly as possible. As a result, models that can predict a user's rating from the text of a review are critical. Obtaining the overall sentiment of a textual review can improve customer service. It can also help businesses increase sales and improve their products by gaining a better understanding of what their customers want.
The Amazon electronic product review dataset was taken into account. The reviews and ratings that customers gave to particular products, as well as reviews about the customer's product(s), were also taken into account.
LITERATURE SURVEY
Sentiment analysis has received a lot of attention in recent years thanks to the abundance of online reviews. As a result, numerous studies have been conducted in this area. Some of the research most relevant to this thesis is discussed in this section.
Joachims (1998) tested SVM for text classification and found that it performed well in all experiments, with lower error levels than other classification methods.
Using SVM, Naive Bayes, and maximum entropy classification, Pang, Lee, and Vaithyanathan (2002) applied supervised learning to classify movie reviews into two groups, positive and negative. In terms of precision, all three methods performed well. They experimented with different features and found that the machine learning algorithms performed better when a bag of words was used as the feature set in the classifiers.
Three supervised machine learning algorithms, Naive Bayes, SVM, and an N-gram model, were tested on online feedback about various travel destinations around the world in a
@ IJTSRD | Unique Paper ID – IJTSRD42372 | Volume – 5 | Issue – 4 | May-June 2021 Page 720
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
recent survey conducted by Ye et al. (2009). They found in this study that well-trained machine learning algorithms classify travel destination reviews with very high accuracy. They also showed that the SVM and N-gram models outperformed the Naive Bayes approach. However, increasing the amount of training data reduced the gap between the algorithms significantly.
Chaovalit and Zhou (2005) compared a supervised machine learning algorithm with an unsupervised approach to movie review classification called semantic orientation, and found that the supervised approach was more efficient than the unsupervised one.
Naive Bayes and SVM are two of the most widely used methods for sentiment classification problems, according to several studies (Joachims 1998; Pang et al. 2002; Ye et al. 2009). As a result, this study attempts to apply supervised machine learning algorithms such as Naive Bayes and SVM to Amazon's beauty product reviews.
PROPOSED SYSTEM
The method entails gathering product-based datasets from various e-commerce sites such as amazon.com, epinion.com, and others. The feedback covers items such as phones, iPods, and other electronic devices. The aim of this project is to use algorithms such as random forest, decision tree, and SVM to evaluate and predict product reviews by classifying them as positive, negative, or neutral. Since the input consists of unstructured product reviews, we perform pre-processing, extract the features on which comments are made, measure the polarity of the feedback, and plot a graph of the result. Dealing with negation is also covered in the results. For instance, "the Nokia phone is not bad" is a positive review despite the negative word "not." The approach is shown in the flow diagram below, and its steps are explained in detail in the following subsections.
Random forest Classifier (RFC)
Random Forest is an ensemble method that combines multiple decision trees. Single-tree classifiers, such as decision tree classifiers, can run into issues like outliers or noisy data, which can affect the performance of the classifier, whereas Random Forest introduces randomness and is therefore highly resistant to noise and outliers. This classifier uses two different forms of randomness: data randomness and feature randomness. Because it combines multiple decision trees, this classifier has a number of hyperparameters, such as:
How many trees should be built in the forest?
What is the maximum number of features that can be selected at random?
The maximum height of each tree.
Since it uses the concepts of bootstrapping and bagging, Random Forest is considered a reliable and accurate classifier.
Support vector machine (SVM)
Support vector machines (SVMs) are a type of supervised learning method that can be used to solve sentiment classification problems (Cristianini & Shawe-Taylor 2000). This approach places labelled training data on a decision plane, then uses an algorithm to find an optimal hyperplane that divides the data into groups or classes. As shown in Figure 1, the best hyperplane is the one that separates the groups by the largest margin. This is done by choosing the hyperplane that is furthest from the nearest data points of each class (Berk 2016). "The groups are not separated in H1. H2 separates them, but only by a small margin. H3 divides them by the greatest possible margin." Weinberg, Zack (2012).
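The maximum-margin criterion from the SVM description can be illustrated with a small numerical sketch. The six 2-D points and the three candidate lines below are hypothetical values chosen to mirror the H1/H2/H3 caption; they are not taken from the paper's dataset. The code only compares given candidates and does not train an SVM.

```python
import math

# Hypothetical 2-D training points with labels -1 and +1 (not from the paper).
points = [
    ((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
    ((4.0, 4.0), +1), ((5.0, 4.0), +1), ((4.0, 5.0), +1),
]

# Candidate lines w[0]*x + w[1]*y + b = 0, named after the quoted caption.
candidates = {
    "H1": ((1.0, 0.0), -1.5),  # x = 1.5, cuts through the negative group
    "H2": ((1.0, 0.0), -2.5),  # x = 2.5, separates but passes close to (2, 1)
    "H3": ((1.0, 1.0), -5.5),  # x + y = 5.5, midway between the groups
}


def margin(w, b, data):
    """Return the smallest signed distance from any point to the line.

    The value is negative if at least one point lies on the wrong side.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x + w[1] * y + b) / norm
               for (x, y), label in data)


margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"{name}: margin = {value:.3f}")
print("largest margin:", best)
```

In this toy setting H1 gets a negative margin because it misclassifies a point, H2 separates the groups narrowly, and H3 has the largest margin of the three candidates.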
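For the random forest classifier described earlier, the two sources of randomness (bootstrap sampling of the data and random selection of features) and the final vote over trees can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the four toy reviews, and the values of `n_trees`, `max_features`, and `n_features` are all hypothetical, and the trees themselves (including the maximum-height hyperparameter) are not built here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Data randomness: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, max_features, rng):
    """Feature randomness: pick max_features distinct column indices."""
    return sorted(rng.sample(range(n_features), max_features))


def majority_vote(predictions):
    """Combine the per-tree predictions into a single label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples

# Hypothetical labelled reviews standing in for the real dataset.
rows = [
    ("great phone", "positive"),
    ("stopped working", "negative"),
    ("does the job", "neutral"),
    ("love it", "positive"),
]

n_trees = 3        # how many trees to build
max_features = 2   # features each tree may consider
n_features = 5     # assumed length of the feature vector

for tree_id in range(n_trees):
    sample = bootstrap_sample(rows, rng)
    features = random_feature_subset(n_features, max_features, rng)
    print(tree_id, [text for text, _ in sample], features)

# Each tree would predict a label; the forest returns the most common one.
print(majority_vote(["positive", "negative", "positive"]))
```

Because every tree sees a different resampled dataset and a different feature subset, a single noisy review or outlier influences only some of the trees, which is the intuition behind the robustness claim above.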
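The negation example given for the proposed system ("the Nokia phone is not bad") can be made concrete with a small rule-based sketch. The paper does not state how negation is handled, so the rule used here (flip the sign of the first sentiment word that follows a negator) is an assumption, and the lexicon and the `polarity` function are hypothetical.

```python
# Tiny hypothetical lexicon; a trained classifier would learn such weights.
LEXICON = {"good": 1, "great": 1, "excellent": 1,
           "bad": -1, "poor": -1, "terrible": -1}
NEGATORS = {"not", "no", "never"}


def polarity(review):
    """Classify a review as positive, negative, or neutral.

    Assumed rule: a negator flips the sign of the next sentiment word.
    """
    score = 0
    negate = False
    for token in review.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been consumed
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The Nokia phone is not bad"))
print(polarity("The battery is bad."))
print(polarity("It arrived on Monday"))
```

Without the negation rule, the first review would be scored on the word "bad" alone and labelled negative, which is the failure case the example in the text describes.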
A logistic regression, on the other hand, yields a logistic curve with values ranging from 0 to 1. In logistic regression, rather than using the probability, the natural logarithm of the target variable's "odds" is used to construct the curve. Furthermore, the predictors do not have to be normally distributed or have the same variance in each category for the method to be effective.
Decision Tree Classifier (DTC)
A decision tree is a hierarchical tree structure in which attributes are represented by decision nodes and attribute values are represented by edges. This tree-like representation makes it possible to derive decision rules for classifying new data instances.
A decision tree is a decision-support tool that uses a tree-like model of decisions and their possible outcomes, such as chance event outcomes, resource costs, and utility. It is one way of displaying an algorithm that consists entirely of conditional control statements.
Result and Discussion
The predictive accuracy of the models is calculated after training and testing on the dataset to decide which model is the best classifier for classifying feedback. As seen in the table, the SVM model has the best predictive accuracy of the four models, whereas the Decision Tree model has the worst predictive accuracy.
Model Name                        Accuracy
Logistic Regression Classifier    93.92%
Support Vector Machine            93.94%
Random Forest Classifier          93.50%
Decision Tree Classifier          90.10%
After testing a few arbitrary reviews, our features appear to work properly for Positive, Neutral, and Negative outcomes.
We can also see that our Support Vector Machine Classifier improved to 94.08 percent accuracy after running the grid search.
Conclusion and Future Work
Sentiment analysis is the process of recognizing and aggregating user sentiment or opinions. It determines whether the polarity of the text in a document or sentence is positive, negative, or neutral. Four approaches have been compared, and a result has been calculated for each approach on the product review dataset. The accuracy of Logistic Regression is 93.92%, SVM 93.94%, Decision Tree 90.10%, and Random Forest 93.50%. Among the four models, the SVM model has the highest predictive accuracy. We also observed that very large text files take a long time to process. Automatic sentiment analysis is a powerful tool for detecting and forecasting current and future patterns. While opinions at the feature level have been examined, there are still many limitations that can be explored further. Potential directions for future development:
Supporting product reviews in a variety of languages.
Addressing the issue of slang mapping.
Dealing with sarcastically expressed views.
Identifying comparative views and determining which of the two products under consideration is better.
Dealing with anaphora resolution, that is, determining what the opinion is really about.
In the future, the work could be extended to perform multiclass classification of reviews, which would give consumers a clearer picture of the review's essence, allowing them to make better product decisions. It can also be used to predict a product's rating based on the review. This would provide consumers with a trustworthy rating, because the product's rating and the sentiment of the review often contradict each other. The proposed extension would be extremely beneficial to the e-commerce industry by increasing customer loyalty and confidence.
ACKNOWLEDGEMENT:
I acknowledge the support and encouragement of all the people who helped me throughout the completion of this project.
I would like to thank Dr. Dinesh Nilkhant, Director - JGI, Knowledge Campus, Bangalore, Karnataka, for providing the facilities to carry out this research work. His leadership and management skills are a continual source of inspiration.
I also wish to thank Dr. M. N Nachappa, Dean, School of Computer Science & IT, Jain deemed to be university, Knowledge campus, Bangalore, Karnataka, for his support and cordial cooperation.
I would like to thank our MCA & program coordinator, Dr. Bhuvana J, Mentor and Associate Professor, Department of Master of Computer Application, for providing the support and guidance to carry out this research work. Her timely direction and motivation helped me to keep my patience throughout this journey.
Moving further, I would like to express my sincere gratitude to the project coordinators, Dr. Lakshmi JVN and Dr. Gangotri, Assistant Professor, Department of Master of Computer Application, for sharing their experience, which helped me complete my thesis in the best possible way. In addition, they helped in critically reviewing and proofreading my work and my project thesis.