Sentiment Classification Based On Machin
Arockiasamy Soosaimanickam
Department of Information Systems, University of Nizwa, Oman
[email protected]
Received: 15 March 2023 | Revised: 14 April 2023 | Accepted: 23 April 2023
Licensed under a CC-BY 4.0 license | Copyright (c) by the authors | DOI: https://doi.org/10.48084/etasr.5854
ABSTRACT
Online retailers and merchants increasingly request feedback from their clients on the products they
purchase. This has led to a significant increase in the number of product reviews posted online, as more
people are making purchases online. The opinions expressed in these customer reviews have a significant
impact on other customers' purchase decisions, as they are influenced by other customers'
recommendations or complaints. This study examined sentiment classification on a dataset of reviews
from Amazon, a well-known and widely used e-commerce platform, using several machine learning
techniques. First, the reviews were transformed into vector representations using the Bag-of-Words
approach. A word cloud was used to illustrate the words in terms of the frequency with which they
appear in the reviews. Subsequently, decision tree and logistic regression models were trained. Both
models achieved high accuracy on the dataset. Specifically, the Decision Tree model outperformed the
Logistic Regression one, achieving an accuracy of 99% compared to 94% for the latter.
Keywords-sentiment analysis; Amazon customer reviews; dataset; feature extraction; text classification;
machine learning
www.etasr.com Kausar et al.: Sentiment Classification based on Machine Learning Approaches in Amazon Product …
Engineering, Technology & Applied Science Research Vol. 13, No. 3, 2023, 10849-10855 10850
determine the overall significance of customer evaluations and characterize them as positive or negative. The data used were reviews of Amazon Titan Men Watches, collected from amazon.in. The study employed supervised algorithms to address the issue of sentiment classification for online reviews. The goal was to determine the overall importance of customer evaluations by categorizing them as either positive or negative.
II. RELATED WORKS
Many studies used data collected from numerous sources, such as Twitter [3-4], product reviews, consumer feedback, etc. Customer engagement programs help businesses to strengthen their emotional relationships with clients from all over the world. Customer involvement is also influenced by product reviews [5]. The study of product reviews allows a company to learn how customers feel about its products. According to [6], online consumer reviews have an important influence in molding customers' online shopping decisions. The term "review" refers to the process of determining whether a product or service is suitable for a certain purpose. Every day, a massive quantity of fresh information is added to the Web [7], hence a method was proposed based on simultaneous web crawling using mobile agents.
Sentiment analysis is used by businesses to increase their competitiveness in the market, as it enables them to comprehend the opinions and experiences of their clients on their products and services [8]. Sentiment analysis is also used to capture market sentiments to design a better forecasting model for the stock market [9]. Modern organizations use sentiment analysis to improve their word-of-mouth marketing approach. Competitive companies use text-mining tools to better understand client experiences and extract valuable information from social networking platforms, newspapers, and other sources. In [10], k-Nearest Neighbors (k-NN), Decision Trees (DT), and Artificial Neural Network (ANN) models were used to identify client behavior in the banking industry. As a result, a mix of sentiment analysis, text mining, and other methods is critical in ensuring that businesses understand and capitalize on online consumer evaluations. Sentiment extraction is an excellent method of understanding customer assumptions online [11], as the data collected from internet platforms and product review sites allow businesses to improve their marketing methods. Product reviews also influence client purchase decisions. Competitive organizations, such as Amazon, use such information to make decisions [12], and leading merchants in the United States rely on Internet product reviews to improve their marketing efforts and company procedures [13]. For example, if a given product receives a large number of unfavorable reviews for its cost or quality, the company examines the problem and tries to address it as soon as possible.
According to [6], most of the product evaluations on the Internet are less polar and more positively balanced. Also, the distribution of product reviews for similar items differs by platform [14]. The diversity of reviews is driven by several factors, including the rating system, the business strategy of the online platform, and the review frequency [6]. In [15], a Web Crawling Model based on Java Aglets was presented. Furthermore, it is important to mention that firms use sentiment analysis to improve company operations and client retention, as analyzing product evaluations allows a company to understand client experiences [16]. A client can leave a review to indicate whether he is happy or unhappy with a particular product or service. However, the majority of product reviews do not represent the level of client happiness. In [16], a study was conducted based on reviews on the Internet to classify consumer happiness, according to auditory and language characteristics, into four classes: extremely positive, positive, neutral, or highly unfavorable. The findings of this study on the impact of review length on online sentiments were consistent with [17]. Customers prefer to rely on comprehensive and insightful product reviews before making a final purchase [18]. As a result, companies must discover the true degree of customer satisfaction with their products and services in order to make an informed decision based on online product reviews.
Authors in [19] evaluated the way internet reviews influence Amazon book sales. According to this study, customers view online reviews as a reliable source of information and prefer more accessible and detailed reviews. According to this study, internet reviews have a major impact on user experiences and product costs. These findings were in agreement with those of [20] on online reviews and feelings. The study also looked at the valence of internet reviews. In [19], the influence of online reviews on purchases was found to be contradictory. Some studies concluded that the valence of online reviews has a major influence on sales, while others found that it has no impact. The impact is also affected by variables such as product categories and qualitative text qualities. In [21], an analysis was conducted on 142.8 million Amazon user reviews, focusing on determining the usefulness and unhelpfulness of each review by examining the summary headline, the product remark, and helpfulness information, filtering out blank and non-English product evaluations to improve the accuracy of the results, and choosing only those with the most votes. The study concluded that an investigation of online product evaluations on Amazon plays an important part in today's e-commerce. Helpful reviews provide thorough information on specific products or services based on client feedback [22]. Customers rely on reviews with the most votes to make a purchase decision. The results are congruent with the findings of [23] on how customer reviews impact internet purchases. Positive product reviews enable clients to build greater trust in the things they want to buy online.
Sentiment analysis has gaps due to issues with the accuracy and reliability of the utilized methods, as well as the lack of standardization in sentiment interpretation and classification. More research is needed to optimize customer engagement programs and improve the understanding of customer sentiments. This study aims to address the potential gaps in the sentiment analysis based on Machine Learning Approaches in Amazon Product Reviews using Decision Trees and Regression models. Specifically, this study investigated the effectiveness of different customer engagement strategies in influencing customer sentiments and explored the impact of these strategies on sentiment classification accuracy.
III. METHODOLOGY
Sentiment analysis is the study of people's sentiments, feelings, and views as conveyed in writing,
and is useful in understanding other people's points of view on any subject. Opinions can be
positive, negative, or neutral. The proposed method was based on an ML prediction model that
classifies reviews as positive or negative (binary classification). Figure 1 shows the steps of this
method, beginning with data collection and ending with the evaluation of each classification model.
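The overall flow, from vectorized reviews to the evaluation of each classifier, could be sketched as follows. This is a minimal illustration and not the authors' code: the toy reviews and their hand-assigned labels are hypothetical, the hyperparameters are assumptions, and the score is computed on the training data only, so it says nothing about the accuracies reported in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy reviews; 1 = positive, 0 = negative (labels assigned by hand).
reviews = [
    "great watch and looks premium",
    "excellent quality for the price",
    "very happy with this purchase",
    "nice design and good battery",
    "poor quality strap broke quickly",
    "stopped working after one week",
    "waste of money very disappointed",
    "bad product do not buy",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The two model families named in the paper; the settings here are assumptions.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, clf in models.items():
    # Bag-of-Words vectorization followed by the classifier.
    pipe = make_pipeline(CountVectorizer(), clf)
    pipe.fit(reviews, labels)
    # Training accuracy only; a real evaluation needs a held-out test split.
    scores[name] = accuracy_score(labels, pipe.predict(reviews))

print(scores)
```

On a real dataset, the fit and the scoring would use separate training and test subsets, for example via `sklearn.model_selection.train_test_split`.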
Fig. 1. Overall sentiment analysis approach for Amazon reviews.
A. Programming Environment
Python is one of the most used programming languages in data science and ML, as it offers a large library collection for solving various ML problems. Python was chosen due to its extensive libraries and ease of use. Scikit-learn is a Python package that provides supervised ML algorithms [24], including many classification algorithms, such as SVM and Naive Bayes, and feature extraction techniques.
B. The Dataset
Figure 2 presents an example of an Amazon review to better grasp the dataset's structure and format. An Amazon user review consists of the following four key components that assist in comprehending and analyzing the reviews:
Summary: The title of the review.
Review text: The review's actual content.
Rating: The product's user rating on a scale of 1 to 5.
Helpfulness: The percentage of persons who considered the review beneficial.
Fig. 2. Actual Amazon customer review sample.
Data collection was the initial stage of the study. Raw data were obtained from the website amazon.in and converted to Comma Separated Values (CSV) format. The data set contained 4960 Titan Men Watches reviews.
C. Data Cleaning
This is a critical stage in examining any type of data and has a significant influence on the success of ML models. There are numerous types of pre-processing procedures and the appropriate ones must be selected. This study applied four distinct data preparation steps to the reviews:
1. Remove Emojis.
2. Remove HTML tags.
3. Lowercase all letters.
4. Filter numbers and special characters.
1) Emoji Removal
Although people use emojis to express their feelings, they were removed since they did not affect the identification of the polarity of the review.
2) HTML Tag Removal
The HTML tags of the retrieved reviews were removed because they did not affect the determination of the polarity of the review.
3) Converting all Letters to Lowercase
In different reviews, there is a good chance that identical words will appear with different capitalization and the system will recognize them as distinct words. Converting all letters to lowercase was used to avoid such problems.
4) Filtering Numbers and Special Characters
All the elements that are unnecessary for determining the review's polarity were eliminated to make the data tidy and clean. Special characters and digits were eliminated.
D. Feature Extraction
ML algorithms interpret data in specified formats. The text data were turned into numerical feature vectors, a process known as vectorization. Bag of Words is one such approach that involves tokenization, normalization, and counting. This study used CountVectorizer to represent the reviews as a Bag of Words. CountVectorizer accepts an n-gram range, which is a tuple consisting of the minimum and maximum length of the word sequences to be regarded as features.
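The four cleaning steps of Section C and the Bag-of-Words vectorization of Section D could be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper `clean_review`, the emoji code-point ranges, the sample reviews, and the choice `ngram_range=(1, 2)` are all hypothetical, and the character filter assumes English text.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer

# Assumed emoji ranges (common pictographs and symbols only, not exhaustive).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_review(text: str) -> str:
    """Apply the four cleaning steps in the order listed in Section C."""
    text = EMOJI.sub(" ", text)               # 1. remove emojis
    text = re.sub(r"<[^>]+>", " ", text)      # 2. remove HTML tags
    text = text.lower()                       # 3. lowercase all letters
    text = re.sub(r"[^a-z\s]", " ", text)     # 4. drop digits and special characters
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace


# Hypothetical raw reviews, for illustration only.
raw_reviews = ["Great <b>Watch</b>!! 😀 10/10", "Not a great watch :("]
cleaned = [clean_review(r) for r in raw_reviews]

# ngram_range=(1, 2) keeps unigrams and bigrams as features (an assumed setting).
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(cleaned)      # sparse document-term count matrix
features = sorted(vectorizer.vocabulary_)  # learned n-gram vocabulary

print(cleaned)
print(features)
print(X.toarray())
```

Note that the default tokenizer of `CountVectorizer` ignores single-character tokens, so the word "a" in the second review does not become a feature.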
Fig. 4. Most common words used in the reviews.
A. Unigrams
Unigrams capture the frequency of single words in the reviews. Figures 7-8 show the top 20 unigrams
in the positive and negative reviews, respectively.
B. Bigrams
Unigrams alone do not offer a clear understanding of a consumer's intended message. Bigrams combine
two adjacent words into a single unit, which offers valuable insight into the context and meaning of
a text, and show the frequency of two-word combinations in the reviews. For instance, the sentence
"I love this product" contains the bigram "love this", while "I hate this product" contains the
bigram "hate this". Analyzing the most prevalent bigrams in a collection of reviews can help to
understand what consumers are attempting to convey regarding their experience with a product or
service.
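Counting the most frequent n-grams can be illustrated with a few lines of standard-library Python. This is a simplified sketch, not the procedure used in the study: the function `top_ngrams`, the whitespace tokenization, and the example reviews are assumptions made for illustration.

```python
from collections import Counter


def top_ngrams(reviews, n=2, k=20):
    """Return the k most frequent word n-grams across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = review.lower().split()  # naive whitespace tokenization
        # Slide a window of n tokens over the review and count each n-gram.
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(k)


# Hypothetical reviews built around the examples in the text.
reviews = ["I love this product", "I hate this product", "love this watch"]

print(top_ngrams(reviews, n=1, k=3))  # most frequent unigrams
print(top_ngrams(reviews, n=2, k=2))  # most frequent bigrams
```

Setting `n=1` gives unigram counts like those in Figs. 7-8, while `n=2` gives the two-word combinations discussed in this subsection.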
Fig. 7. Top 20 positive review unigram.
Fig. 8. Top 20 negative review unigram.
VI. CONCLUSION
Sentiment analysis is a necessary and popular technique for collecting information from text data on e-commerce websites. E-commerce platforms produce enormous volumes of text data every day in the form of suggestions, reviews, tweets, and comments. Additionally, emoticons, ratings, and reviews all suggest people's opinions. A customer can learn more about a product and make an informed choice by extrapolating information from reviews. This study used multiclass and binary classification for Amazon reviews of a product using supervised machine learning methods such as Logistic Regression and Decision Tree. Logistic Regression produced an excellent outcome with 94% accuracy and Decision Tree produced outstanding results with 99% accuracy. E-commerce websites should consider various feature extraction methods and machine learning techniques, examine additional product categories, analyze unstructured data, and incorporate sentiment analysis into customer experience strategies to enhance customer satisfaction and loyalty.
REFERENCES
[1] X. Fang and J. Zhan, "Sentiment analysis using product review data," Journal of Big Data, vol. 2, no. 1, Jun. 2015, Art. no. 5, https://doi.org/10.1186/s40537-015-0015-2.
[2] J. McAuley, "Amazon product data," Recommender Systems and Personalization Datasets. https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews.
[3] M. A. Kausar, A. Soosaimanicka, and M. Nasar, "Public Sentiment Analysis on Twitter Data during COVID-19 Outbreak," International Journal of Advanced Computer Science and Applications, vol. 12, no. 2, 2021, https://doi.org/10.14569/IJACSA.2021.0120252.
[4] M. Mahyoob, J. Algaraady, M. Alrahiali, and A. Alblwi, "Sentiment Analysis of Public Tweets Towards the Emergence of SARS-CoV-2 Omicron Variant: A Social Media Analytics Framework," Engineering, Technology & Applied Science Research, vol. 12, no. 3, pp. 8525–8531, Jun. 2022, https://doi.org/10.48084/etasr.4865.
[5] N. Nandal, R. Tanwar, and J. Pruthi, "Machine learning based aspect level sentiment analysis for Amazon products," Spatial Information Research, vol. 28, no. 5, pp. 601–607, Oct. 2020, https://doi.org/10.1007/s41324-020-00320-2.
[6] V. Schoenmueller, O. Netzer, and F. Stahl, "The Polarity of Online Reviews: Prevalence, Drivers and Implications," Journal of Marketing Research, vol. 57, no. 5, pp. 853–877, Oct. 2020, https://doi.org/10.1177/0022243720941832.
[7] Md. A. Kausar, V. S. Dhaka, and S. K. Singh, "An Effective Parallel Web Crawler based on Mobile Agent and Incremental Crawling," Journal of Industrial and Intelligent Information, vol. 1, no. 2, pp. 86–90, Jun. 2013, https://doi.org/10.12720/jiii.1.2.86-90.
[8] I. Karamitsos, S. Albarhami, and C. Apostolopoulos, "Tweet Sentiment Analysis (TSA) for Cloud Providers Using Classification Algorithms and Latent Semantic Analysis," Journal of Data Analysis and Information Processing, vol. 7, no. 4, Nov. 2019, Art. no. 69212, https://doi.org/10.4236/jdaip.2019.74016.
[9] U. P. Gurav and S. Kotrappa, "Sentiment Aware Stock Price Forecasting using an SA-RNN-LBL Learning Model," Engineering, Technology & Applied Science Research, vol. 10, no. 5, pp. 6356–6361, Oct. 2020, https://doi.org/10.48084/etasr.3805.
[10] A. Rahman and M. N. A. Khan, "A Classification Based Model to Assess Customer Behavior in Banking Sector," Engineering, Technology & Applied Science Research, vol. 8, no. 3, pp. 2949–2953, Jun. 2018, https://doi.org/10.48084/etasr.1917.
[11] V. K. Jain, S. Kumar, and P. Mahanti, "Sentiment Recognition in Customer Reviews Using Deep Learning," International Journal of Enterprise Information Systems (IJEIS), vol. 14, no. 2, pp. 77–86, Apr. 2018, https://doi.org/10.4018/IJEIS.2018040105.
[12] J. Lim, M. Park, S. Anitsal, M. M. Anitsal, and I. Anitsal, "Retail Customer Sentiment Analysis: Customers’ Reviews of Top Ten U.S. Retailers’ Performance," Global Journal of Management and Marketing, vol. 3, no. 1, pp. 124–150, 2019.
[13] R. S. Jagdale, V. S. Shirsat, and S. N. Deshmukh, "Sentiment Analysis on Product Reviews Using Machine Learning Techniques," in Cognitive Informatics and Soft Computing, Singapore, 2019, pp. 639–647, https://doi.org/10.1007/978-981-13-0617-4_61.
[14] V. Vyas and V. Uma, "Approaches to Sentiment Analysis on Product Reviews," in Sentiment Analysis and Knowledge Discovery in Contemporary Business, IGI Global, 2019, pp. 15–30.
[15] Md. A. Kausar, V. S. Dhaka, and S. K. Singh, "Web Crawler Based on Mobile Agent and Java Aglets," International Journal of Information Technology and Computer Science, vol. 5, no. 10, pp. 85–91, Sep. 2013, https://doi.org/10.5815/ijitcs.2013.10.09.
[16] S. Govindaraj and K. Gopalakrishnan, "Intensified Sentiment Analysis of Customer Product Reviews Using Acoustic and Textual Features," ETRI Journal, vol. 38, no. 3, pp. 494–501, 2016, https://doi.org/10.4218/etrij.16.0115.0684.
[17] M. Ghasemaghaei, S. P. Eslami, K. Deal, and K. Hassanein, "Reviews’ length and sentiment as correlates of online reviews’ ratings," Internet Research, vol. 28, no. 3, pp. 544–563, Jan. 2018, https://doi.org/10.1108/IntR-12-2016-0394.
[18] P. Sasikala and L. Mary Immaculate Sheela, "Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS," Journal of Big Data, vol. 7, no. 1, May 2020, Art. no. 33, https://doi.org/10.1186/s40537-020-00308-7.
[19] S. K. Sharma, S. Chakraborti, and T. Jha, "Analysis of book sales prediction at Amazon marketplace in India: a machine learning approach," Information Systems and e-Business Management, vol. 17, no. 2, pp. 261–284, Dec. 2019, https://doi.org/10.1007/s10257-019-00438-3.
[20] A. Y. L. Chong, B. Li, E. W. T. Ngai, E. Ch’ng, and F. Lee, "Predicting online product sales via online reviews, sentiments, and promotion strategies: A big data architecture and neural network approach," International Journal of Operations & Production Management, vol. 36, no. 4, pp. 358–383, Jan. 2016, https://doi.org/10.1108/IJOPM-03-2015-0151.
[21] J. Du, J. Rong, S. Michalska, H. Wang, and Y. Zhang, "Feature selection for helpfulness prediction of online product reviews: An empirical study," PLOS ONE, vol. 14, no. 12, 2019, Art. no. e0226902, https://doi.org/10.1371/journal.pone.0226902.
[22] Meenakshi, A. Banerjee, N. Intwala, and V. Sawant, "Sentiment Analysis of Amazon Mobile Reviews," in ICT Systems and Sustainability, Singapore, 2020, pp. 43–52, https://doi.org/10.1007/978-981-15-0936-0_4.
[23] K. Q. Anh, Y. Nagai, and L. M. Nguyen, "Extracting Customer Reviews from Online Shopping and Its Perspective on Product Design," Vietnam Journal of Computer Science, vol. 06, no. 01, pp. 43–56, Feb. 2019, https://doi.org/10.1142/S2196888819500088.
[24] F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," The Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[25] A. Tripathy, A. Agrawal, and S. K. Rath, "Classification of sentiment reviews using n-gram machine learning approach," Expert Systems with Applications, vol. 57, pp. 117–126, Sep. 2016, https://doi.org/10.1016/j.eswa.2016.03.028.