MADHU-IEEE Update
Abstract—The advent of Web 2.0 provides an ideal platform for individuals to share their ideas, opinions, and feelings. Web Opinion Mining / Sentiment Analysis is a text mining task that aims to build a system that automatically extracts, identifies, and categorizes user opinions from natural language text, user created material, or user generated media. In this work, the performance of different classifiers was analyzed and compared. SVM, with 96%, and Logistic Regression, with 94%, on the Electronics review dataset achieved the best classification accuracy of all the classifiers compared.
Keywords: Machine Learning; Sentiment Analysis; Classification; Bayesian; Support Vector Machine; Random Forest.
1 INTRODUCTION
In data science, artificial intelligence (AI) has become increasingly popular and is a focus of recent research work. Machine Learning (ML) is a part of AI that is used in various applications to build classification models [9]. During the training phase, features and patterns are identified and classifiers are trained on the feature set. The classifier then predicts labels for the test data using what it learned from the training data [13]. ML is used in day-to-day activities. For example, a railway reservation system uses a classification algorithm to predict the probability of seat confirmation, and product recommendations on e-commerce websites are based on purchase history [11][12]. Prediction assigns labels to the test data based on the training phase [11][7][8][10].
Sentiment analysis is an application of text mining that aims to detect opinions, emotions, and perspectives in a given text. Sentiment analysis requires considerably more knowledge of the natural language than text examination and subjective analysis [20]. Earlier algorithms consider only how frequently words occur in a document and fail to identify the target of an opinion, especially when messages are short. Sentiment analysis is the process of classifying subjective information as positive or negative. It helps business organizations and individuals make decisions. Much recent research focuses on the automatic identification of sentiment using machine learning algorithms [21][22].
In this paper, six popular classification techniques, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), and K Nearest Neighbors (KNN), are analysed and their performance in sentiment analysis is compared.
1 http://www.cs.cornell.edu/People/pabo/movie-review-data
Step-3: Forecast the decision, which comprises two or more decision trees.
Step-4: Repeat Step 1 & 2.
Step-5: The predictions of each decision tree are obtained for the testing data; RF chooses the final decision by majority voting.
3.2.6 K NEAREST NEIGHBORS (KNN)
K-Nearest Neighbors (KNN) is a straightforward and powerful classification method. In this method, a data point is classified by a majority vote of its neighbors. It measures the similarity of the new data to the stored data and assigns it to the most similar available category. To classify test data, it retrieves the stored information and assigns a class to the test data on that basis [3]. It is also known as a lazy learner because it stores the data and retrieves it only at classification time [20].
TABLE 2: EXPERIMENT RESULT OF CLOTHING, SHOES AND JEWELLERY
Algorithm             Classification Accuracy (in %)   TP Rate (in %)   FP Rate (in %)   Precision   Recall
Logistic Regression   88                               44.23            44.12            0.88        0.88
SVM                   87.88                            44.06            43.82            0.88        0.88
Random Forest         82                               40.25            41.84            0.81        0.84
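The voting rule shared by Step-5 of the Random Forest procedure and by KNN (Section 3.2.6) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the Euclidean distance, the value k=3, the toy two-dimensional feature vectors, and the function names `majority_vote` and `knn_predict` are all assumptions made for the example.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Return the most common label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Lazy learning: nothing is fitted in advance. The stored data is only
    # consulted when a query arrives.
    distances = [
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return majority_vote(nearest_labels)


# Hypothetical features: (count of positive words, count of negative words).
train_vectors = [(3, 0), (2, 1), (4, 1), (0, 3), (1, 2), (0, 4)]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

prediction = knn_predict(train_vectors, train_labels, query=(3, 1), k=3)

# The same voting rule applied to hypothetical outputs of three decision trees.
tree_votes = ["positive", "negative", "positive"]
forest_decision = majority_vote(tree_votes)

print(prediction, forest_decision)
```

Because KNN defers all work to prediction time, every query scans the full stored training set, which is why it is described as a lazy learner.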
Bayes Bayes
Decision Tree 62 31.37 35.90 .70 .62
Decision Tree 73 25.21 47.58 .92 .50
KNN 50 22.80 27.55 .51 .45
KNN 84 45.07 39.20 .81 .90
In Table 3 above, the SVM classifier achieves the highest accuracy.
4.3 DATASET-III Amazon Health and Personal Care Review dataset [14]. The performance of the various classifiers is shown in Table 4.
TABLE 4: EXPERIMENT RESULT OF HEALTH AND PERSONAL CARE
Classification Accuracy (in %)   TP Rate (in %)   FP Rate (in %)   Precision
84                               42.13            41.98
84                               41.92            41.81
66                               24.58            41.34
60                               14.30            45.93
50                               22.95            27.13
In Table 5 above, the Logistic Regression classifier achieves the highest accuracy.
4.5 DATASET-V Amazon Toys and Games Review dataset [14]. The performance of the various classifiers is shown in Table 6.
TABLE 6: EXPERIMENT RESULT OF TOYS AND GAMES
Algorithm             Classification Accuracy (in %)   TP Rate (in %)   FP Rate (in %)   Precision   Recall
Logistic Regression   88                               45.02            43.11            .88         .88
SVM                   88                               44.82            42.89            .88         .88
Random Forest         79                               40.00            39.09            .80         .78
Naive Bayes           68                               25.64            42.73            .80         .50
Decision Tree         68                               28.70            39.61            .75         .56
In Table 4 above, the SVM and Logistic Regression classifiers achieve the same, highest accuracy. The evaluation and comparison of the different classification algorithms are depicted in the graph in Figure 1.
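The tables do not define how the TP Rate and FP Rate columns are computed. As a minimal arithmetic check, and purely as an observation on the reported numbers rather than a definition given by the authors, the sketch below sums the two rate columns for the labelled rows of Tables 2 and 6 and compares the sum with the reported accuracy. The helper name `rate_gap` is hypothetical.

```python
# Rows copied from Tables 2 and 6: (label, accuracy %, TP rate %, FP rate %).
rows = [
    ("Table 2 Logistic Regression", 88.00, 44.23, 44.12),
    ("Table 2 SVM", 87.88, 44.06, 43.82),
    ("Table 2 Random Forest", 82.00, 40.25, 41.84),
    ("Table 6 Logistic Regression", 88.00, 45.02, 43.11),
    ("Table 6 SVM", 88.00, 44.82, 42.89),
    ("Table 6 Random Forest", 79.00, 40.00, 39.09),
    ("Table 6 Naive Bayes", 68.00, 25.64, 42.73),
    ("Table 6 Decision Tree", 68.00, 28.70, 39.61),
]


def rate_gap(accuracy, tp_rate, fp_rate):
    """Absolute difference between accuracy and the sum of the two rate columns."""
    return abs((tp_rate + fp_rate) - accuracy)


# Round to two decimals, the precision used in the tables.
gaps = {label: round(rate_gap(acc, tp, fp), 2) for label, acc, tp, fp in rows}
largest_gap = max(gaps.values())

for label, gap in gaps.items():
    print(f"{label}: |TP + FP - accuracy| = {gap:.2f}")
```

For these rows the gap stays below half a percentage point, which would be consistent with both columns being expressed as shares of all test samples. The source does not confirm this reading.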
Fig. 1: Various classifiers with different datasets
5 CONCLUSION
In this work, the performance of various classifiers on different datasets is evaluated and compared. The Term Frequency-Inverse Document Frequency (TF-IDF) feature selection technique is used to convert the dataset into a frequency matrix for selecting the best features. From the results we conclude that the SVM and LR classifiers produced the best classification accuracy compared with all other classifiers. Different feature selection techniques need to be implemented in the future.
6 REFERENCES
[1] Abbasi, H. Chen, and A. Salem, “Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums,” ACM Trans. Information Systems, vol. 26, no. 3, pp. 1-34, (2008).
[2] Anjali Devi, S., Sapkota, P., Rohit Kumar, K., Pooja, S., Sandeep, M.S., “Comparison of classification algorithms on twitter data using sentiment analysis”, International Journal of Advanced Trends in Computer Science and Engineering, Vol. 9, Issue 5, (2020), pp. 8170-8173.
[3] Bahrawi, “Sentiment Analysis using Random Forest Algorithm Online Social Media Based”, Journal of Information Technology and its Utilization, Vol. 2, Issue 2, (2019), pp. 29-33.
[4] Pang and L. Lee, “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts,” Proc. 42nd Ann. Meeting on Assoc. for Computational Linguistics (ACL), (2004), pp. 271-278.
[5] Bo Pang and Lillian Lee, “Opinion mining and sentiment analysis”, Foundations and Trends in Information Retrieval, Vol. 2, No. 1-2, (2008), pp. 1-135.
[6] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques”, Proceedings of EMNLP, (2002).
[7] B. Liu, M. Hu, and J. Cheng, “Opinion Observer: Analyzing and Comparing Opinions on the Web,” Proc. Int’l Conf. World Wide Web, (2005), pp. 342-351.
[8] Bulusu, A., Sucharita, V., “Research on machine learning techniques for POS tagging in NLP”, International Journal of Recent Technology and Engineering, Vol. 8, Issue 1, Special Issue 4, (2019).
[9] Hung and H. Lin, “Using Objective Words in SentiWordNet to Improve Word-of-Mouth Sentiment Classification,” IEEE Intelligent Systems, vol. 28, no. 2, pp. 47-54, March-April (2013), doi: 10.1109/MIS.2013.1.
[10] F. Alattar, K. Shaalan, “Survey on Opinion Reason Mining and Interpreting Sentiment Variations”, IEEE Access, Volume 9, (2021).
[11] Karthikeyan, C., Sahaya, A.N.A., Anandan, P., Prabha, R., Mohan, D., Vijendra, B.D., “Predicting Stock Prices Using Machine Learning Techniques”, Proceedings of the 6th International Conference on Inventive Computation Technologies, ICICT 2021, (2021).
[12] Koyel Chakraborty, Siddhartha Bhattacharyya, Rajib Bag, “A Survey of Sentiment Analysis from Social Media Data”, IEEE Transactions on Computational Social Systems, Volume 7, Issue 2, (2020).
[13] Morinaga, S., Yamanishi, K., Tateishi, K. and Fukushima, T., “Mining product reputations on the web”, in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 341-349, ACM, (2002).
[14] Ni, J., Li, J. and McAuley, J., “Justifying recommendations using distantly-labeled reviews and fine-grained aspects”, in Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 188-197, (2019).
[15] Qixuan Hou, Meng Han, Zhipeng Cai, “Survey on data analysis in social media: A practical application aspect”, Big Data Mining and Analytics, Volume 3, Issue 4, (2020).
[16] Sajana, T., Narasingarao, M. R., “Classification of Imbalanced Malaria Disease Using Naïve Bayesian Algorithm”, International Journal of Engineering & Technology, 7(2.7), pp. 786-790, (2018).
[17] S Sakhare, N.N., Sagar Imambi, “Performance analysis of regression-based machine learning techniques for prediction of stock market movement”, International Journal of Recent Technology and Engineering, 7(6), pp. 655-662, (2019).
[18] Sanjay Bhargav, P., Nagarjuna Reddy, G., Ravi Chand, R.V., Pujitha, K., Mathur, A., “Sentiment analysis for hotel rating using machine learning algorithms”, International Journal of Innovative Technology and Exploring Engineering, Vol. 8, Issue 6, pp. 1225-1228.
[19] Shaozhong Zhang, Haidong Zhong, “Mining Users Trust from E-Commerce Reviews Based on Sentiment Similarity Analysis”, IEEE Access, Volume 7, pp. 13523-13535, (2019).
[20] Surbhi Bhatia, “A Comparative Study of Opinion Summarization Techniques”, IEEE Transactions on Computational Social Systems, Vol. 8, No. 1, (2021).
[21] Shyamasundar L. B., Jhansi Rani P., “A Multiple-Layer Machine Learning Architecture for Improved Accuracy in Sentiment Analysis”, The Computer Journal, Volume 63, Issue 1, Jan. 2020, pp. 395-409, (2020).
[23] Vavilapalli, S.S., Reddykorepu, P., Saggam, S., Pentyala, M., Devi, S.A., “Summarizing Sentiment Analysis on Movie Critics Data”, Proceedings of the 6th International Conference on Inventive Computation Technologies, ICICT 2021, (2021).
[24] Yassine Al-Amrani, Mohamed Lazaar, Kamal Eddine lkadiri, “Sentiment Analysis using supervised classification algorithms”, Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, Association for Computing Machinery, Article No. 61, pp. 1-8, (2017).
[25] You Li, Yuming Lin, Jingwei Zhang and Guoyong Cai, “Constructing Domain-Dependent Sentiment Lexicons Automatically for Sentiment Analysis”, Information Technology Journal, 12, pp. 990-996, (2013).