Sentiment Analysis of Amazon Reviews Using Machine Learning Algorithms
K.E Hemapriya1, Siva Ranjini.C2, Siva Roopini.C3
Sri Krishna Arts and Science College, Coimbatore, India
Abstract
Amazon is the world's largest online retailer and marketplace by revenue and market
share. It also leads the smart speaker industry, offers cloud computing services through
AWS, and operates the live-streaming service Twitch. This paper uses machine learning
methods available in Weka to analyze the sentiment expressed in Amazon product reviews.
As e-commerce giants like Amazon have grown rapidly, businesses must analyze customer
sentiment from reviews to make informed decisions. In this work, we use three machine
learning algorithms, the Enhanced predictive accuracy classifier (EPAC), the Efficient J48
decision tree classifier (EJ48DTC), and Probabilistic classifiers reverend features
independence efficiency machine learning (PCRFIEML), to categorize reviews as positive or
negative. To ascertain the efficacy of these algorithms in sentiment analysis tasks, we
assess their performance in terms of accuracy, precision, recall, and F1 score. We also
investigate how different text preparation methods affect classification results. Our
results indicate how well each technique suits sentiment analysis of Amazon reviews,
helping businesses extract actionable intelligence from vast amounts of customer feedback.

Keywords: Sentiment analysis (SA), Machine Learning Algorithms (MLA), Amazon reviews,
EPAC, EJ48DTC, PCRFIEML, Weka

1. Introduction
Online retailers like Amazon have completely changed how customers engage with
goods and services in the age of digital commerce. Amazon.com is an e-commerce platform
that sells a wide range of products, including media (books, movies, music, and software),
apparel, baby products, consumer electronics, beauty products, gourmet food, groceries,
health and personal care products, industrial and scientific supplies, kitchen items, watches,
jewelry, turf and garden products, musical instruments, outdoor equipment, tools, automotive
products, toys and activities, farm supplies, and consulting services. Customer reviews help
shoppers learn more about a product and decide whether it suits them, so they should offer
honest feedback from other buyers. Amazon maintains a zero-tolerance policy toward any
review that attempts to mislead or manipulate customers.

Because millions of people contribute to enormous collections of reviews,
understanding the attitude expressed in these texts has become critical for organizations
looking to improve customer satisfaction and their offerings. Sentiment analysis, a branch of
natural language processing, automatically classifies attitudes as neutral, positive, or
negative, providing a methodical way to draw conclusions from such textual data. It is one of
the most widely used techniques for analyzing text-based data and identifying sentiment
content, and it is also known as opinion mining. A diverse body of text data is being
created in the form of recommendations, feedback, tweets, and comments. E-commerce
platforms such as Amazon.com generate a large amount of data every day in the form of
customer reviews.

Analyzing e-commerce data can help online companies understand customer
expectations, provide a better shopping experience, and increase sales. This study uses
machine learning techniques provided by the Weka tool to explore sentiment analysis in the
context of Amazon reviews. Weka is an open-source application that combines several
machine learning algorithms for data mining tasks. It includes tools for data preparation,
classification, regression, clustering, association rule mining, and visualization. Through the
use of Weka, we aim to decipher the complexity involved in sentiment analysis of Amazon
reviews. Through a comparative analysis, we highlight the merits, shortcomings, and relative
performance of these algorithms in classifying reviews as positive or negative. Deriving
knowledge from unstructured data found in social networks is a difficult research challenge.
Sentiment analysis or text analysis is the use of statistics and machine learning approaches
to extract, identify, or otherwise characterize the sentiment content of a text unit.
Furthermore, we investigate the influence of various text preparation approaches on
classification results, contributing to the wider discussion of best practices in sentiment
analysis.

Naive Bayes is a probabilistic classifier that is easy to use and efficient because it
assumes independence between features. However, it has trouble capturing intricate
correlations between variables, which can result in less accurate sentiment analysis,
particularly when dealing with sarcasm or subtle language.
The Efficient J48 decision tree classifier is an interpretable decision tree technique
that can handle both categorical and numerical input. Unfortunately, it often overfits noisy
data, which reduces its efficacy in sentiment analysis, especially with large feature spaces or
imbalanced datasets. The Enhanced predictive accuracy classifier is an ensemble technique
that combines several decision trees to increase robustness and decrease overfitting.
However, because it relies on majority voting over numerous trees, its decisions can be hard
to interpret, and it may fail to pick up on finer subtleties in language. In this paper, the
algorithms are enhanced to address these drawbacks.

The study's results have major significance for organizations functioning in the digital
marketplace since they provide actionable insights into customer attitudes that may guide
strategic decision-making processes. By shedding light on the efficacy of various machine
learning approaches in the context of Amazon reviews, this study hopes to advance the state-
of-the-art in sentiment analysis and pave the way for more nuanced and sophisticated
methodologies for extracting meaningful intelligence from textual data. Through meticulous
investigation and empirical analysis, we hope to provide a comprehensive understanding of
the complexities involved in sentiment analysis of Amazon reviews, ultimately contributing
to the improvement of customer-centric strategies and the optimization of business outcomes
in the digital age.

2. Literature Survey
The study by Muhammad Ali et al. presents an approach that incorporates sentiment
features based on the item's attributes. Amazon consumer data was used to build and verify
the approach [1]. The world's first data center to measure public mood. The system performs
pre-processing actions such as stemming, tokenization, packing, and stop-word removal on
the dataset.
Rezaul Haque, Naimul Islam, Mayisha Tasneem and Amit Kumar Das describe
multi-class sentiment analysis on social media comments, a challenging problem that has
garnered scholarly interest [2]. Their work aims to identify the sentiment behind social
media messages written in Bangla, one of the most frequently spoken languages in the
world, for which earlier studies have not been sufficiently effective in predicting textual
sentiment.
Manish Bhargava and Himanshu Arora state the importance of Support Vector
Machine (SVM) in the analysis of tweets dataset on Weka tool and performance of the model
is analyzed [3].
Pahuriray et al. seek to create a Flexible Learning Experience Analyzer (FLExA) model
using five supervised machine learning techniques. The WEKA tool was used to evaluate the
model's efficacy with a 10-fold cross-validation strategy [4], and the results were then
compared.
Mihir P. Mehta, Gopal Kumar and M. Ramkumar created a new scale for evaluating
consumer feedback by using topic modeling and sentiment analysis to examine TripAdvisor
hotel reviews. This method enhanced the study of the hospitality experience by better
classifying the feedback [5]. The results showed that consumer satisfaction decreased during
the COVID-19 epidemic, although North America and Europe did comparatively well. Among
Asian nations, Sri Lanka had the highest rate of consumer satisfaction.
Jin Zhou and Jun-Min Ye conclude that sentiment analysis has a significant influence
on education research and that qualitative methods should be used to verify the results and
look into the psychological underpinnings of emotion in learning [6].
Vasundhara and Suraiya Parveen suggest that business development improves by
reviewing products and understanding client needs. Combining relevant features such as
Stanford's part-of-speech (POS) tagging, the SentiWordNet lexicon, and classifier methods
can improve results and accuracy [7]. They analyzed a dataset using the WEKA tool and
concluded that support vector machines outperform alternative classification techniques.
Tanjim Ul Haque et al. compared their findings with comparable studies on product
reviews. They analyzed a small sample of Amazon product reviews to identify polarized
opinions towards the goods [8]. They achieved over 90% on accuracy, precision, recall, and
the F1 measure.
The technique is beneficial for customers searching for items, organizations tracking
brand sentiment, and other applications. The use of classifier ensembles and lexicons for
sentiment categorization in microblogging services like Twitter has received limited attention
in the literature[9]. Experiments using public tweet sentiment datasets demonstrate that
classifier ensembles combining Multinomial Naive Bayes, SVM, Enhanced predictive
accuracy classifier, and Logistic Regression increase classification accuracy.
In [10], opinion mining was conducted on a small dataset of Amazon product reviews
to identify polarized sentiments regarding the goods.
3. Methodology
Sentiment analysis on the Amazon data is performed with three algorithms: the
Efficient J48 decision tree classifier, Probabilistic classifiers reverend features
independence efficiency machine learning, and the Enhanced predictive accuracy classifier.
Performing sentiment analysis on an Amazon dataset using the Weka tool involves the
following steps.
STEP 1: Data Preparation
Obtain the Amazon dataset, which includes reviews and sentiment labels (positive or
negative). Ensure that the dataset is correctly prepared, with each review paired with its
sentiment label.
STEP 2: Data pre-processing
Load the dataset into Weka in a supported file format (e.g., CSV or ARFF).
To clean up the text data, apply preprocessing steps such as tokenization, lowercasing, stop-
word removal, punctuation removal, and stemming. Convert the preprocessed text data to a
Weka-compatible representation, such as bag-of-words or TF-IDF vectors.
STEP 3: Feature Extraction
Select characteristics from the preprocessed text data to represent each review.
Depending on the representation used, this stage may include constructing a feature vector for
each review based on word frequencies or other linguistic properties.
STEP 4: Model Training
Divide the dataset into training and testing groups (e.g., 70% training and 30%
testing). Train sentiment analysis models with three machine learning algorithms: EJ48DTC,
PCRFIEML, and EPAC. Use Weka's built-in classifiers for EJ48DTC, PCRFIEML, and
EPAC, or create custom classifiers if necessary.
STEP 5: Model Evaluation
Assess the trained models using relevant measures such as accuracy, precision, recall,
and F1-score. Compare the efficacy of the EJ48DTC, PCRFIEML, and EPAC classifiers in
sentiment analysis on the Amazon dataset. Analyze any substantial performance
differences and determine each model's strengths and weaknesses.
STEP 6: Result Interpretation
Interpret the sentiment analysis model outputs and draw inferences based on their
performance on the Amazon dataset. Identify trends and insights from the data, such as
which algorithm is best for sentiment categorization and what factors influence its
performance.
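
Steps 2 to 4 can be illustrated outside Weka with a minimal Python sketch. It is not the configuration used in this study: the stop-word list, the four example reviews, and the helper names (preprocess, tfidf_vectors, train_test_split) are assumptions made for illustration only. In Weka, comparable preprocessing is typically configured through the StringToWordVector filter.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; not the list used in the study).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "i", "was"}

def preprocess(text):
    """Lowercase, replace punctuation with spaces, tokenize, drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def tfidf_vectors(token_lists):
    """Return (vocabulary, vectors) using tf * log(N / df) weights."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    vocab = sorted(doc_freq)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle reproducibly and cut into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]

# Hypothetical reviews, made up for this example.
reviews = [
    ("This phone is great, I love it!", "positive"),
    ("Terrible battery and poor screen.", "negative"),
    ("Great value, works well.", "positive"),
    ("Poor quality, it broke.", "negative"),
]
tokens = [preprocess(text) for text, _ in reviews]
vocab, vectors = tfidf_vectors(tokens)
labels = [label for _, label in reviews]
train, test = train_test_split(list(zip(vectors, labels)))
print(tokens[0], len(vocab), len(train), len(test))
```

A word that occurs in every review receives a weight of zero under this weighting, which is the intended effect of the IDF term: such words carry no information about the sentiment class.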
3.1. Efficient J48 decision tree classifier (EJ48DTC):
EJ48DTC in Weka creates decision trees by recursively splitting data based on
attribute values, aiming for homogeneous subsets. It chooses features that provide
considerable information gain or entropy reduction. To perform sentiment analysis,
preprocess the Amazon reviews, extract features, and train EJ48DTC on labeled data.
Evaluate its performance using criteria such as accuracy, then inspect the resulting tree to
gain insight into sentiment categorization. EJ48DTC provides a straightforward and effective
technique for assessing sentiment in Amazon reviews: it finds influential variables and their
thresholds and assigns positive or negative sentiment in a streamlined process.

Fig.3.1. Flow diagram for EJ48DTC algorithm

3.1.1. Formula For EJ48DTC:


EJ48DTC tries to create decision trees by recursively splitting data according to
attribute values. The critical step is to choose qualities that yield the greatest information gain
or reduction in entropy.
Split Criterion = Information Gain or Entropy Reduction
The information gain or entropy reduction is determined as follows:
$$\text{Information Gain} = H(B) - H(B \mid A) \quad (1)$$
$$\text{Entropy Reduction} = H(B) - \sum_{i=1}^{n} \frac{N_i}{N}\, H(B_i \mid A) \quad (2)$$

Information Gain and Entropy Reduction are the key metrics for attribute selection in the
EJ48DTC method implemented in Weka for sentiment analysis on the Amazon dataset.
Information Gain quantifies the decrease in uncertainty about the class label (sentiment)
obtained by splitting the data on a given attribute: it compares the entropy of the original
class distribution with the entropy after the split. Entropy Reduction expands on this idea
by taking the weighted average of the entropy over all possible attribute values. EJ48DTC
selects the most informative attributes for classification based on these measures.

Parameter    Description
H(B)         Entropy of the target variable B.
H(B∣A)       Conditional entropy of B given the attribute A.
H(Bi∣A)      Conditional entropy of B within the subset where A = ai.
Ni           Number of instances in subset i.
N            Total number of instances.
Table 1: Formula Description
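
Equations (1) and (2) can be checked on a small example. The sketch below is illustrative only: the six labels and the two binary word-presence attributes are made up, and the function names are assumptions. C4.5, on which J48 is based, additionally normalises the gain by the split information (gain ratio); that step is omitted here.

```python
import math
from collections import Counter

def entropy(labels):
    """H(B): Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """H(B) minus the N_i/N-weighted entropy of each subset A = a_i."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical data: sentiment labels and two word-presence attributes.
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
contains_great = [1, 1, 1, 0, 0, 0]  # separates the classes perfectly
contains_the = [1, 1, 0, 1, 1, 0]    # same class mix in both subsets

print(information_gain(contains_great, labels))
print(information_gain(contains_the, labels))
```

The first attribute removes all uncertainty about the label, so its gain equals the full entropy of the labels (1 bit). The second leaves each subset as mixed as the original data, so its gain is zero and a tree learner would not split on it.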

3.1.2. EJ48DTC Algorithm Steps


Step 1: Load the Amazon data collection.
Step 2: Split the data into training and testing sets.
Step 3: Apply filters to obtain a bag-of-words representation or TF-IDF vectors.
Step 4: Train the EJ48DTC decision tree classifier on the training set while supplying the
target variable (sentiment labels) and input characteristics.
Step 5: Assess the trained EJ48DTC model on the testing set using performance measures
like accuracy, precision, recall, and F1-score.
Step 6: Examine the created decision tree to understand and discover key attributes.
Step 7: Evaluate the EJ48DTC decision tree classifier's performance against alternative
machine learning techniques, such as Naive Bayes or Random Forest.
Step 8: Perform pruning techniques such as reduced-error pruning or cost-complexity
pruning to prevent overfitting and improve the generalization ability of the decision tree
model.
Step 9: Implement feature scaling to ensure that numerical features contribute effectively to
the decision-making process of the decision tree.
Step 10: Tune parameters such as the minimum number of instances per leaf or the confidence
factor used for pruning to optimize the performance of the decision tree on sentiment analysis.
Step 11: Handle missing values in the dataset using strategies like imputation or utilizing
surrogate splits to prevent biased decision making.
Step 12: Experiment with ensemble methods like bagging or boosting to enhance the
performance of the EJ48DTC decision tree model by combining multiple trees.
Step 13: Interpret the decisions made by the EJ48DTC decision tree model by analyzing
feature importance and visualizing the decision tree structure.
Step 14: Generate a detailed report including performance metrics, feature importance
analysis, and any insights gained from model evaluation.
Step 15: Stop the process.

3.2. Probabilistic classifiers reverend features independence efficiency machine learning (PCRFIEML):
PCRFIEML is a classification technique based on Bayes' theorem that assumes the
predictors are independent. Despite its simplicity, it is useful in practice, especially in text
categorization and sentiment analysis. PCRFIEML calculates the probability that an instance
belongs to each class from the feature probabilities and assigns the instance to the class
with the highest probability. Its computational efficiency and compatibility with
high-dimensional data make it a popular choice, particularly for text categorization tasks.
Weka includes PCRFIEML as a classifier for text classification problems. It is trained on a
labeled dataset in which each instance represents a document or text sample together with
its sentiment label (positive or negative). Weka's PCRFIEML implementation computes the
conditional probability of each class given the input data (words or features derived from
the text). In Amazon sentiment analysis, PCRFIEML can categorize reviews as positive or
negative by learning the probability distributions of the words or attributes associated with
each sentiment class. It uses the occurrence and frequency of terms in reviews to produce
predictions, which makes it well suited to text classification tasks such as sentiment
analysis, especially when dealing with large datasets like those found in Amazon reviews.

Fig.3.2. Flow diagram for PCRFIEML algorithm

3.2.1. Formula For PCRFIEML:

Assuming feature independence, PCRFIEML computes the probabilities based on the
frequency and occurrence of words or other characteristics in the text input. Subsequently, the
input text is categorized into the sentiment category with maximum posterior probability. By
utilizing statistical probabilities, this method enables PCRFIEML to assess sentiment in
textual data efficiently.

$$P(b \mid a) = \frac{P(a \mid b) \times P(b)}{P(a)} \quad (3)$$

Parameter    Description
P(a)         Probability of observing the input features a; it acts as the normalization
             factor.
P(b)         Prior probability of class b, which shows how likely each sentiment label is
             to appear in the dataset.
P(b|a)       Posterior probability of class b (a sentiment label such as positive or
             negative) given the input features a.
P(a|b)       Likelihood of the input features a given class b (a sentiment label such as
             positive or negative).
Table 2: Formula Description
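
A compact way to see how these quantities combine is a word-count Naive Bayes classifier. The sketch below is an illustration under stated assumptions, not Weka's implementation: it uses add-one (Laplace) smoothing, ignores words never seen in training, and works with log probabilities. P(a) is omitted because it is the same for every class and does not change which class scores highest. The training documents and function names are made up.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocab = {tok for tokens in documents for tok in tokens}
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Return the class b maximising log P(b) + sum of log P(word | b)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior, log P(b)
        for tok in tokens:
            if tok in vocab:  # words unseen in training are skipped
                smoothed = (word_counts[label][tok] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)  # log likelihood term, log P(a_j | b)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, already tokenized training reviews.
documents = [["great", "love"], ["great", "value"], ["poor", "broke"], ["terrible", "poor"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(documents, labels)

print(predict(["great", "phone"], model))
print(predict(["poor", "quality"], model))
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long reviews, and the smoothing keeps a single unseen word-class pair from forcing the whole product to zero.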

3.2.2. Improved PCRFIEML Algorithm Steps


Step 1: Load the Amazon dataset containing reviews and their corresponding sentiment
labels.

Step 2: Split the dataset into training and testing sets to facilitate model evaluation.

Step 3: Perform extensive feature engineering to extract relevant features from the Amazon
review dataset. Consider features like word frequency, n-grams, sentiment lexicons, etc.

Step 4: Preprocess the text data, including steps like tokenization, lowercasing, stop-word
removal, and possibly stemming. Also, handle negation and context in the text using
techniques like negation handling and part-of-speech tagging.

Step 5: Utilize techniques such as information gain or chi-square test to select the most
informative features that contribute significantly to sentiment analysis.

Step 6: Convert the preprocessed text data into a suitable format, such as bag-of-words
representation or TF-IDF vectors.

Step 7: Train the PCRFIEML classifier on the training set, specifying the target variable
(sentiment labels) and input features (selected textual characteristics).

Step 8: Evaluate the trained PCRFIEML model on the testing set using performance metrics
like accuracy, precision, recall, and F1-score to assess its effectiveness in sentiment analysis.

Step 9: Analyze the learned probabilities from the PCRFIEML model to understand the
importance of different words/features in predicting sentiment.
Step 10: Combine multiple PCRFIEML classifiers using techniques like bagging or boosting
to improve classification accuracy.

Step 11: Compare the performance of the PCRFIEML classifier against alternative machine
learning techniques to determine the most suitable approach for sentiment analysis on the
Amazon dataset.

Step 12: Generate a detailed report including performance metrics, feature importance
analysis, and any insights gained from model evaluation.

Step 13: Stop the process.

3.3. Enhanced predictive accuracy classifier (EPAC):


A potent ensemble learning technique, EPAC builds many decision trees during
training and outputs the mean prediction (regression) or the mode of the classes
(classification) of the individual trees. To lessen overfitting and decorrelate the trees, each
decision tree is built using a random subset of features and a random subset of training data.
The EPAC classifier in Weka lets users set parameters such as the number of trees in the
forest, the number of features considered at each split, and other controls over the
tree-building process. In sentiment analysis on Amazon data, EPAC may be used to categorize
reviews as positive or negative according to features extracted from the text data. EPAC can
aid in the accurate classification of sentiment in Amazon reviews through its capacity to
handle high-dimensional data and capture intricate relationships between features and
sentiment labels. This can provide valuable insights for businesses seeking to comprehend
customer opinions and preferences.
Fig.3.3. Flow diagram for EPAC algorithm

3.3.1. Formula For EPAC:


Each decision tree in an EPAC ensemble is trained using a random selection of
features and a portion of the training data. Following training, every decision tree predicts the
sentiment of a given Amazon review on its own. These individual tree predictions are
aggregated to form the final sentiment prediction made by the EPAC ensemble: the
sentiment label that appears most frequently among the decision trees' predictions is chosen
as the final prediction for the review. The EPAC classifier in Weka handles this aggregation
automatically. By combining the predictions of several decision trees, EPAC improves the
generalization and robustness of sentiment analysis on Amazon data and helps identify the
subtleties and underlying patterns in customer reviews.

$$\hat{y}_{RF} = \operatorname{mode}(\hat{y}_1, \hat{y}_2, \hat{y}_3, \ldots, \hat{y}_n) \quad (5)$$

Parameter             Description
ŷRF                   The sentiment label predicted by the Improved Random Forest
                      ensemble.
mode()                The function that returns the mode (most common) sentiment label
                      among all decision tree predictions.
ŷ1, ŷ2, ŷ3, …, ŷn     The sentiment labels predicted by the individual decision trees in the
                      Improved Random Forest ensemble.
Table 3: Parameter Description for EPAC algorithm
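
The aggregation in equation (5) amounts to a majority vote. A minimal sketch, in which the function name and the five votes are hypothetical:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most common label among the individual tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one review.
votes = ["positive", "negative", "positive", "positive", "negative"]
print(majority_vote(votes))  # three of the five votes are "positive"
```

With an even number of trees a tie is possible; in this sketch the label that appears first among the tied votes wins, which is a simplification rather than a documented Weka behaviour.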

3.3.2. EPAC Algorithm Steps

Step 1: Load the Amazon dataset into Weka.

Step 2: Divide the dataset into training and testing sets.

Step 3: Preprocess the text data by lowercasing, tokenizing, and removing stop words.

Step 4: Convert the preprocessed text data into a format appropriate for Weka, such as TF-
IDF vectors or bag-of-words representation.

Step 5: Choose the EPAC classifier from the Weka interface under the "Classify" tab,
indicating parameters like the number of trees in the forest, the number of features to take
into account at each split, and other considerations.

Step 6: Utilize ensemble techniques within EPAC, such as feature subset selection or random
feature selection, to introduce diversity among the trees and improve model performance.

Step 7: Use the training set to train the EPAC classifier.

Step 8: Assess the trained EPAC model's efficacy in sentiment analysis by utilizing
performance measures such as accuracy, precision, recall, and F1-score on the testing set.

Step 9: Analyze the feature importances acquired via the EPAC model to determine the
important features in predicting sentiment.

Step 10: Compare the EPAC classifier's performance against that of other machine learning
approaches, such as Naive Bayes or Support Vector Machines.

Step 11: Analyze the outcomes and learnings from the EPAC model on sentiment analysis on
the Amazon dataset.
Step 12: Interpret the decisions made by the EPAC model by visualizing individual decision
trees or feature importance plots, which can provide insights into how the model is analyzing
the text data for sentiment analysis.

Step 13: Generate a detailed report including performance metrics, feature importance
analysis, and any insights gained from model evaluation.

Step 14: Stop the process.
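
The core of Steps 5 to 7 (bootstrap sampling, a random feature subset per tree, and majority voting) can be sketched as follows. This is a deliberately simplified illustration, not Weka's implementation: each "tree" is a one-level decision stump, the features are binary word-presence flags, and the data, parameter defaults, and function names are all assumptions.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Among the given features, pick the one whose value -> majority-label
    table makes the fewest mistakes on this sample."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        by_value = {}
        for row, label in zip(rows, labels):
            by_value.setdefault(row[f], Counter())[label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(table[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, table)
    _, feature, table = best
    return feature, table, default

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the stumps; unseen values fall back to the default."""
    votes = [table.get(row[f], default) for f, table, default in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical word-presence features: [has_great, has_love, has_poor].
rows = [[1, 1, 0]] * 4 + [[0, 0, 1]] * 4
labels = ["positive"] * 4 + ["negative"] * 4
forest = train_forest(rows, labels)
print(predict(forest, [1, 1, 0]), predict(forest, [0, 0, 1]))
```

Because every tree sees a different sample and a different subset of features, their individual errors are less correlated, which is what allows the vote to be more reliable than any single tree.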

4. Performance Evaluation
In this study, we used three classifiers from the WEKA tool, EJ48DTC, PCRFIEML and
EPAC, to compare sentiment analysis on Amazon product reviews. The experiment used 400
Amazon product reviews, each labeled as expressing positive or negative sentiment. To
evaluate the classifiers' performance, a range of assessment measures were used during
training and testing.

According to the findings, Naive Bayes (PCRFIEML) emerged as the preferred model for
sentiment analysis on the Amazon data. It attained an accuracy of 88.25%, correctly
classifying 353 out of 400 instances. In addition, Naive Bayes fared better than the EPAC
Classifier and EJ48DTC Decision Tree in terms of the Kappa statistic, relative absolute error,
mean absolute error, root mean squared error, and root relative squared error.

In particular, Naive Bayes performed better than other methods, as evidenced by its
Kappa score of 0.7647, which suggests significant agreement that goes beyond chance. In
comparison to the other classifiers, it also showed decreased mean absolute error and root
mean squared error, indicating improved sentiment classification precision. The Naive Bayes
algorithm had the highest accuracy and lowest relative absolute error as well as root relative
squared error.

The comparison research demonstrates the accuracy and robustness of Naive Bayes in
identifying both positive and negative attitudes expressed in product evaluations, highlighting
its usefulness in sentiment analysis on Amazon data. In light of this, we suggest that Naive
Bayes be used as the model of choice for sentiment analysis in comparable situations. This
will provide insightful information for companies and scholars who want to use sentiment
analysis to analyze client feedback and inform decision-making.

4.1. Parameters Used for Evaluation


4.1.1. Correctly Classified Instances
The number and percentage of examples that each classifier properly categorized are
shown by this parameter. It is a crucial indicator of the classification model's accuracy.

4.1.2. Incorrectly Classified Instances


This parameter gives the total number and percentage of cases that each classifier
misclassified. It offers information on the model's classification mistakes.

4.1.3. Kappa Statistic


The Kappa statistic measures the degree of agreement between the actual and
predicted categories while correcting for agreement that could arise by chance. A higher
Kappa statistic indicates better agreement between the actual and predicted categories.
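
For a two-class problem, accuracy, precision, recall, F1-score, and the Kappa statistic can all be derived from the four cells of a confusion matrix. The sketch below shows the arithmetic. The matrix is hypothetical: it was chosen so that 353 of 400 instances are correct, but how the 47 errors divide into false positives and false negatives is an assumption, since the confusion matrices are not reported here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for a binary confusion matrix with 'positive' as the target class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance: product of actual and predicted class rates.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical split of the errors (assumption, not taken from the experiments).
metrics = classification_metrics(tp=167, fp=25, fn=22, tn=186)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With roughly balanced classes the chance agreement is close to 0.5, so an accuracy near 88% corresponds to a Kappa of roughly 0.76.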

4.1.4. Mean Absolute Error


The mean absolute error measures the average absolute difference between the
predicted and actual values. It indicates how accurate the predictions were overall, with
lower values denoting greater accuracy.

4.1.5. Root Mean Squared Error


The root mean squared error (RMSE) is the square root of the average squared
difference between the actual and predicted values. It offers a prediction-error-based
performance indicator for the model, with lower values denoting better performance.

4.1.6. Relative Absolute Error


This parameter expresses the total absolute error as a percentage of the total
absolute error of a simple baseline that always predicts the mean of the actual values. It
offers a relative indicator of prediction accuracy, where lower values indicate greater
accuracy.

4.1.7. Root Relative Squared Error


The root relative squared error compares the model's squared error with that of a
simple baseline that always predicts the mean of the actual values: the ratio of the two total
squared errors is square-rooted and reported as a percentage. In terms of prediction errors,
it gives a relative indicator of the model's performance, with lower values denoting better
performance.
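
The four error measures can be sketched for a binary problem as follows. This is a simplified illustration that assumes the relative measures are normalised by a baseline that always predicts the mean of the actual values, as these measures are commonly defined; Weka's exact computation may differ in detail. The four example predictions and the function name are made up.

```python
import math

def error_measures(actual, predicted_prob):
    """actual: 0/1 labels; predicted_prob: predicted probability of class 1."""
    n = len(actual)
    baseline = sum(actual) / n  # baseline always predicts the mean of the actual values
    abs_err = [abs(p - a) for a, p in zip(actual, predicted_prob)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted_prob)]
    base_abs = [abs(baseline - a) for a in actual]
    base_sq = [(baseline - a) ** 2 for a in actual]
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae_percent": 100 * sum(abs_err) / sum(base_abs),
        "rrse_percent": 100 * math.sqrt(sum(sq_err) / sum(base_sq)),
    }

# Hypothetical labels (1 = positive) and predicted probabilities.
actual = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.2, 0.4]
for name, value in error_measures(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

A relative error of 100% means the classifier does no better than the baseline, which is why values well below 100% are read as an improvement.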

SUMMARY                             EJ48DTC          PCRFIEML         EPAC
Correctly Classified Instances      347 (86.75 %)    353 (88.25 %)    400 (100 %)
Incorrectly Classified Instances    53 (13.25 %)     47 (11.75 %)     0 (0 %)
Kappa statistic                     0.7359           0.7647           1
Mean absolute error                 0.2023           0.175            0.1198
Root mean squared error             0.318            0.3093           0.1429
Relative absolute error             40.5771 %        35.1046 %        24.0223 %
Root relative squared error         63.7007 %        61.9583 %        28.6276 %
Total Number of Instances           400              400              400

Table 4. Comparison Table of Results

5. RESULTS AND DISCUSSION


For those working in the fields of sentiment analysis and natural language processing,
the results of the sentiment analysis using WEKA classifiers offer useful information. The
results show how well three classifiers, EJ48DTC, PCRFIEML, and EPAC, classify sentiment in
Amazon product reviews. Of these, Naive Bayes (PCRFIEML) was the best-performing model,
with an accuracy of 88.25% and the best performance on many assessment measures. For
companies and researchers looking to use sentiment analysis for consumer feedback analysis
and decision-making processes, these findings have useful implications. Stakeholders can
make informed decisions about model selection, resource allocation, and future sentiment
analysis research by being aware of the advantages and disadvantages of the various
classifiers. Overall, the results are relevant to readers interested in natural language
processing applications and contribute to the discipline of sentiment analysis. The figures
below illustrate the results generated through the sentiment analysis of Amazon data in the
Weka tool.
Fig.5.1. Imported Amazon data in Weka Tool

Fig.5.2. Analysis Report of EPAC Algorithm

Fig.5.3. PCRFIEML Report


Fig.5.4. PCRFIEML Classifier Graph

Fig .5.5. EJ48DTC Classifier Report

Fig.5.6. Visualization of EJ48DTC Tree Classifier


Fig.5.7. Bar chart For Overall Amazon Classified dataset (bar values: 211 and 189; classes: negative, positive)

Fig. 5.8. Incorrectly classified instance comparison chart (J48 classifier: 13.25%; Naive Bayes: 11.75%; Random Forest classifier: 0%)

Fig. 5.9. Kappa Statistic comparison chart (pie chart; segment labels: 29%, 40%, 31%; legend: J48 classifier, Naive Bayes, Random Forest classifier)


Fig. 5.10. Mean absolute error comparison chart (J48 classifier: 0.2023; Naive Bayes: 0.175; Random Forest classifier: 0.1198)

Fig. 5.11. Root mean squared error comparison chart (J48 classifier, Naive Bayes, Random Forest classifier)

Fig. 5.12. Relative absolute error comparison chart (J48 classifier: 40.58%; Naive Bayes: 35.10%; Random Forest classifier: 24.02%)


Fig. 5.13. Root relative squared error comparison chart (J48 classifier: 63.70%; Naive Bayes: 61.96%; Random Forest classifier: 28.63%)

Fig. 5.14. Accuracy Comparison Chart (J48 classifier, Naive Bayes, Random Forest classifier)

6. CONCLUSION
Using WEKA classifiers for sentiment analysis on Amazon data, the best model was
found to be PCRFIEML, which demonstrated strong performance and high accuracy across a
range of assessment measures. In the future, improving the project may entail investigating
sophisticated feature engineering techniques, experimenting with ensemble approaches,
adjusting classifier settings, resolving class imbalance problems, tailoring analysis for certain
domains, and putting ongoing learning strategies into practice. With these improvements,
sentiment analysis should become even more accurate and efficient, giving academics and
companies more useful information about what customers believe.

7. REFERENCES
[1] Muhammad Ali, Faqeer Hussain, Bilal Ahmad and Muhammad Usman, The Natural
Language Processing Based Approach for Sentiment Analysis of User Reviews of Amazon
Product by Using Machine and Deep Learning Algorithms, Volume: 13, Issue: 3, June 2023,
International Journal of Current Engineering and Technology.

[2] Rezaul Haque, Naimul Islam, Mayisha Tasneem and Amit Kumar Das, Multi Class
Sentiment Classification On Bengali Social Media Comments Using Machine Learning,
Volume 4, June 2023, International Journal Of Cognitive Computing In Engineering.

[3] Manish Bhargava and Himanshu Arora, Comparative Analysis and Design Of Different
Approaches for Twitter Sentiment Analysis and Classification of SVM, Volume: 10, Issue: 9,
30 September 2022, International Journal of Recent Innovation Trends in Computing and
Communication.

[4] Archolito V Pahuriray, Joe D. Basanta, Jan Carlo T. Arroyo and Allemar Jhone P.
Delima, Flexible Learning Experience Analyzer (FLExA): Sentiment Analysis of College
Students through Machine Learning Algorithms with Comparative Analysis using WEKA,
Volume 12, Issue 12, December 2022, International Journal of Emerging Technology and
Advanced Engineering.

[5] Mihir P. Mehta, Gopal Kumar and M. Ramkumar, Customer Expectations In The
Hotel Industry During The COVID-19 Pandemic: A Global Perspective Using Sentiment
Analysis, Volume 48, Issue 1, 18 March 2021, Tourism Recreation Research.

[6] Jin Zhou and Jun-Min Ye, Sentiment Analysis In Education Research: A Review Of
Journal Publications, Volume 31, Issue 3, 01 October 2020, Interactive Learning
Environments.

[7] Vasundhara and Suraiya Parveen, Towards Sentiment Analysis: A Powerful Technique
for Data Analytics, Volume: 179, No. 7, September 08, 2020, International Conference on
Electronics and Sustainable Communication Systems.

[8] Tanjim Ul Haque, Nudrat Nawal Saber and Faisal Muhammad Shah, Sentiment
Analysis on Large Scale, 11-12 May 2018, IEEE International Conference on Innovative
Research and Development.

[9] N.F.F. da Silva, Tweet sentiment analysis with classifier ensembles, Decision Support
Systems (2014).

[10] Rain, Callen. "Sentiment Analysis in Amazon Reviews Using Probabilistic Machine
Learning." Swarthmore College (2013).
