INDIAN JOURNAL OF SCIENCE AND TECHNOLOGY

RESEARCH ARTICLE

Sentiment Analysis Framework of Social Media Text by Feature Extraction and Machine Learning Model

Kajal Mathur1∗, Paresh Jain2, Sunita Gupta3, Puneet Mathur4
1 Research Scholar, Department of Computer Science and Engineering, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
2 Associate Professor, Department of Electronics and Communication Engineering, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
3 Associate Professor, Department of Information Technology, Swami Keshvanand Institute of Technology, Jaipur, Rajasthan, India
4 Assistant Engineer (IT), RVUN, Jaipur, Rajasthan, India

OPEN ACCESS
Received: 21-06-2023
Accepted: 26-06-2023
Published: 03-08-2023
Citation: Mathur K, Jain P, Gupta S, Mathur P (2023) Sentiment Analysis Framework of Social Media Text by Feature Extraction and Machine Learning Model. Indian Journal of Science and Technology 16(29): 2233-2243. https://doi.org/10.17485/IJST/v16i29.1537
∗ Corresponding author. [email protected]
Funding: None
Competing Interests: None
Copyright: © 2023 Mathur et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published By Indian Society for Education and Environment (iSee)
ISSN Print: 0974-6846, Electronic: 0974-5645

Abstract
Objectives: This research paper analyzes sentiment and opinions in online resources such as discussion forums, review sites, and blogs. It also compares the effectiveness of three feature extraction techniques (TF-IDF, Word2Vec, and WAM) and evaluates three machine learning algorithms (Naïve Bayes, SVM, and ANN) for sentiment classification to determine the most accurate algorithm. Methods: The study uses sentiment-rich datasets of IMDB movie reviews, Yelp reviews, and tweets. Three feature extraction techniques are applied to extract relevant features and patterns from the text, and three machine learning algorithms are implemented to classify sentiments into positive, negative, and neutral categories. Accuracy, precision, recall, and F-measure are used to assess algorithm performance. The model is updated and refined three times to ensure reliability. Findings: The Artificial Neural Network (ANN) algorithm outperforms Naïve Bayes and Support Vector Machines, achieving an accuracy of 99.74% for sentiment classification. Precision, recall, and F-measure exceed 98.5% after model refinement, demonstrating the robustness of the approach. The study highlights the potential of sentiment analysis in online resources and the superior accuracy of the ANN, providing insights for future sentiment analysis studies. Novelty: This research combines three popular feature extraction techniques for sentiment analysis, compares three machine learning algorithms on multiple datasets, and achieves an accuracy of 99.74% with the ANN. The study demonstrates the robustness of the approach through model refinement and contributes insights into sentiment analysis of online resources.
Keywords: Dataset; Feature Extraction; Machine Learning; Sentiment Analysis; Accuracy and Precision

https://www.indjst.org/ 2233
Mathur et al. / Indian Journal of Science and Technology 2023;16(29):2233–2243

1 Introduction
Social media has emerged as a dominant platform for online communication, allowing individuals to express their thoughts
and emotions in real time. However, the informal nature of social media text poses challenges for accurate classification and
information extraction. To address this, techniques such as TF-IDF weighting combined with a Word Article Matrix (WAM)
have been proposed to categorize and analyze social media text effectively. Yet, determining the optimal number of iterations for
updating the WAM remains an unexplored area (1–3).
Sentiment analysis techniques have also been applied to movie reviews, comparing supervised machine learning approaches
such as Support Vector Machines (SVM) and Naive Bayes. These studies found Naive Bayes to be superior, particularly for a large
number of reviews, where it achieved higher accuracy than the other methods. With social media playing a vital role in shaping
public opinion on various topics, sentiment analysis enables businesses to gain valuable insights for informed decision-making (4,5).
Sentiment analysis involves predicting sentiments with classification algorithms after text pre-processing. Pre-processing
typically removes symbols and punctuation, reduces words to their stems, and eliminates stop words. A vector space model
built from term frequencies and inverse document frequencies then serves as the foundation for sentiment analysis (6–8).
While previous studies have explored sentiment analysis using various algorithms, there are still gaps in understanding
algorithm performance across different datasets, including movie comments, political tweets, and drug-related tweets.
Furthermore, research conducted on Turkish datasets highlights the significant role of data distribution in the success rate of
classification algorithms. These gaps justify further investigation and can contribute to the advancement of sentiment
analysis on social media text (9,10).
In this paper, a sentiment analysis framework for social media text is proposed that combines advanced feature extraction
techniques with machine learning, and the accuracy, precision, sensitivity, and F-measure of the framework are evaluated.
The novelty of this study, a sentiment analysis framework designed specifically for social media text, is two-fold. Firstly,
the framework focuses on social media text, which presents distinct challenges compared to other types of text, such as
news articles or product reviews. Social media text often contains informal language, abbreviations, emojis, and contextual
references that require specialized techniques for accurate sentiment analysis.
Secondly, the framework integrates feature extraction and machine learning models. Feature extraction involves identifying
relevant aspects of the text that can capture sentiment, such as keywords, linguistic patterns, syntactic structures, or contextual
cues. By leveraging machine learning models, such as Support Vector Machines (SVM), Artificial Neural Network (ANN), and
Naïve Bayes (NB), the framework can learn from the extracted features to accurately classify the sentiment of social media text.
Overall, the novelty of this study lies in its targeted focus on sentiment analysis in the context of social media, as well as
the integration of feature extraction techniques and machine learning models to achieve accurate sentiment classification.
By addressing the unique characteristics of social media text, this framework contributes to advancing the field of sentiment
analysis and enables deeper insights into public opinion, customer feedback, and social media trends.

1.1 Research Gaps

Based on the reviewed literature, the following research gaps were identified:

• Findings of previous studies are limited to specific datasets, and further research is needed to examine whether the results
generalize across different types of online resources, including news articles, forum threads, and social media posts from
various platforms.
• A comprehensive comparison of the effectiveness and performance of TF-IDF, Word2Vec, and Word Article Matrix methods
is lacking. A more extensive evaluation is needed to determine the most suitable feature extraction approach for sentiment
analysis in different contexts.
• A comparison of the accuracy of SVM, NB, and ANN when each is combined with TF-IDF, Word2Vec, and Word Article
Matrix feature extraction is lacking.

2 Methodology
2.1 Data Collection
This paper uses three datasets (11):


• Internet Movie Database (IMDB)
• Twitter Database
• Yelp Database

The Internet Movie Database (IMDB) movie review dataset consists of unprocessed, unlabelled files; 1,400 processed text
files are available in this dataset.
The files of all three datasets are divided into two types according to their classification, ‘Positive’ or ‘Negative’, indicating
the true classification (sentiment) of the component files.

2.2 Text Pre-processing

The initial step in this stage involves obtaining the actual text from the dataset, treating each review as a separate entity.
To achieve this, the content of the file is split based on the end-of-line character, effectively separating individual reviews.
Additionally, the reviews are converted to lowercase to facilitate matching with the AFINN sentiment lexicon being used.
Punctuation marks, numbers, and control characters are removed during this process to improve matching accuracy. In this
research, feature extraction is performed using the following techniques (12–14):

• Term Frequency-Inverse Document Frequency (TF-IDF)

• Word2Vec (W2V)
• Word Article Matrix (WAM)

These techniques are employed to extract meaningful features from the reviews, enabling further analysis and classification (13).
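The pre-processing steps described at the start of Section 2.2 (split on end-of-line, lowercase, drop punctuation, numbers, and control characters) can be sketched as follows. This is a minimal sketch under assumptions: the regular expressions and the whitespace handling are illustrative choices, not the authors' code, and the AFINN lookup itself is not shown.

```python
import re
import unicodedata


def preprocess(file_text):
    """Split a raw file into reviews (one per line) and normalize each one."""
    reviews = []
    for line in file_text.split("\n"):  # each line is treated as one review
        text = line.lower()  # lowercase so words match a lowercase lexicon
        # replace control characters (Unicode category "Cc", e.g. tabs) with spaces
        text = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
        text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation
        text = re.sub(r"\d+", " ", text)  # drop numbers
        text = " ".join(text.split())  # collapse repeated whitespace
        if text:  # skip empty lines
            reviews.append(text)
    return reviews


raw = "Great movie!!! 10/10\nWorst plot\tever...\n"
print(preprocess(raw))  # ['great movie', 'worst plot ever']
```

Control characters are replaced by spaces rather than deleted so that a tab between two words does not merge them into one token.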

2.3 Classification
The following algorithms are employed to obtain the best results:

• SVM
• ANN
• Naïve Bayes

2.4 Proposed Framework

This section provides an overview of the datasets used in the study, including Twitter, IMDB, and Yelp. Feature extraction
techniques such as TF-IDF, Word2Vec, and WAM are employed to extract meaningful features from the data. The paper
also utilizes various classification algorithms from the field of machine learning. The flowchart illustrating the methodology
employed in this paper is presented in Figure 1.

2.5 Datasets
Twitter, co-founded by Jack Dorsey, is a popular microblogging site that allows users to share text, pictures, and videos instantly
within a 280-character limit (10,15). Users can follow other accounts, like tweets, and retweet them to share with their own
followers.
In this research paper, a dataset of 4,500 health-related tweets was collected using the Twitter Application Programming
Interface (API). These tweets were then pre-processed and assigned sentiment scores using a Python program. Of the
collected and labeled tweets, 1,680 were categorized as neutral, 1,220 as positive, and 1,600 as negative (16). The attributes of the
collected tweets obtained via the Python program are presented in Table 1.
In addition to the analysis of Twitter data, the same models were applied to two other datasets. The first dataset consisted
of 500 positive and 500 negative opinions collected by (14) from IMDB movie reviews, as shown in Table 2. The second dataset,
called Yelp, consisted of 200 neutral, 350 positive, and 300 negative reviews, as presented in Table 3.
These datasets serve as valuable resources for examining sentiment analysis techniques and evaluating the performance of
the models applied in the study. The attributes of the collected tweets and reviews provide insights into the data used for analysis
and classification.


Fig 1. Flowchart of Proposed Methodology

Table 1. Twitter dataset

Dataset attribute       Explanation of Attribute
Id                      Index of the tweet in the dataframe
Text                    Tweet text
Created_at              Date and time the tweet was posted
Retweeted               Retweet status (bool)
Retweet_count           Number of retweets
User_screen_name        Username
User_followers_count    Number of followers
User_location           User's location
Hashtags                Tweet hashtags
Sentiment_score         Sentiment score
Sentiment_class         Positive, negative, neutral

Table 2. IMDB dataset of Kotzias

Dataset attribute   Explanation of Attribute
Text                Reviews from IMDB
Year                Year of release
Name_movie          Name of the movie
Genre               Genre of the movie
Runtime             Total runtime of the movie
Sentiment class     Positive, negative


Table 3. Yelp dataset (JSON)

Dataset attribute   Explanation of Attribute
Name                Name of the business
User_id             Customer ID number
Review_id           ID of the review
Business_id         Business ID number
Stars               Rating of business service and quality
Review_count        Number of reviews received
Sentiment class     Positive, negative, neutral

2.6 Feature Extracting Techniques

2.6.1 Term Frequency-Inverse Document Frequency (TF-IDF)
The TF-IDF (Term Frequency-Inverse Document Frequency) technique is used to calculate term weights in a document. The
TF component calculates the frequency of a term in a document, as shown in Equation (1). The IDF component determines the
significance of a term by considering its occurrence in multiple documents and distinguishing it from stop words. It is calculated
by taking the logarithm of the ratio between the total number of documents and the number of documents containing the term,
as shown in Equation (2) (5,8).

$$TF(t, d) = \frac{\text{Frequency of term } t \text{ in document } d}{\text{Total words in document } d} \qquad (1)$$

$$IDF(t) = \log\left(\frac{\text{Total documents}}{\text{Documents with term } t}\right) \qquad (2)$$

where t = term and d = document. The TF-IDF formula is defined as:

$$TF\text{-}IDF(t, d) = TF(t, d) \times IDF(t) \qquad (3)$$

In Equation (1), TF(t, d) is the number of occurrences of term t in document d divided by the total number of words in
document d. In Equation (2), IDF(t) is the logarithm of the ratio between the total number of documents and the number of
documents containing term t. Equation (3) combines these components to determine the importance of a term in a document
based on its frequency in that document and its occurrence across the document collection.
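Equations (1) to (3) translate directly into code. The sketch below is illustrative only: the paper does not state the logarithm base, so the natural logarithm is assumed, and terms that never occur in the corpus are given weight 0 (also an assumption, since Equation (2) is undefined for them). The toy corpus is hypothetical.

```python
import math


def tf(term, doc_tokens):
    """Equation (1): occurrences of term in d divided by total words in d."""
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus):
    """Equation (2): log(total documents / documents containing term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight
    return math.log(len(corpus) / containing)  # natural log assumed


def tf_idf(term, doc_tokens, corpus):
    """Equation (3): TF(t, d) x IDF(t)."""
    return tf(term, doc_tokens) * idf(term, corpus)


corpus = [
    ["great", "movie", "great", "acting"],
    ["bad", "movie"],
    ["great", "plot"],
]
print(round(tf_idf("great", corpus[0], corpus), 4))  # 0.2027
```

"great" appears in two of the three documents, so its IDF (log 1.5) is smaller than that of "acting", which appears in only one; this is how IDF down-weights common words.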

2.6.2 Word2vec
Word2vec is a natural language processing tool based on a shallow artificial neural network that is trained in an unsupervised
manner (3,9,15). It takes text as input and represents each word in the text as a vector. The primary objective of word2vec is to
place words with similar meanings close to each other in vector space. This is achieved through two different learning
architectures: continuous bag of words (CBOW) and skip-gram (SG).
In the CBOW architecture, the model takes the neighboring words (both to the left and to the right) of a given word within a
specific window size and predicts the target word from these neighbors. The skip-gram architecture works in the reverse
direction: given the target word, it predicts the surrounding words.
By employing these learning architectures, word2vec can effectively capture semantic relationships between words and
represent them as vectors, enabling various downstream natural language processing tasks such as sentiment analysis, text
classification, and word similarity calculations.
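The difference between the two architectures is easiest to see in the training examples each one builds from a sentence. The sketch below only generates these (input, prediction target) pairs for a given window size; it does not train the neural network. The sentence and window size are hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """CBOW: the context words within the window are used to predict the target word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram: the target word is used to predict each context word separately."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


tokens = ["the", "movie", "was", "really", "good"]
print(cbow_pairs(tokens, window=1)[1])  # (['the', 'was'], 'movie')
print(skipgram_pairs(tokens, window=1)[:3])  # [('the', 'movie'), ('movie', 'the'), ('movie', 'was')]
```

In practice the embeddings are trained by a library; for example, gensim's `Word2Vec` class selects skip-gram with `sg=1` and CBOW with `sg=0`. The paper does not say which implementation or window size was used.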

2.6.3 Word Article Matrix (WAM)

WAM is a significant data structure (3,5,17). It is a large matrix that captures the weighted relationships between
documents and keywords. The rows of the matrix correspond to document names (articles), while the columns correspond
to words or keywords extracted from the documents. The WAM is filled in by counting the occurrences of keywords within
each document, resulting in a table structure as shown in Table 4.
To generate the initial WAM (i-WAM), the term frequency (TF) value of each word is used. For example, for a training set of
10 documents with a total of 100 words, each count in the WAM is divided by the 100 words to give its TF value, producing the
i-WAM shown in Table 5 (for instance, the Economic count of 5 for “Stock” becomes 5/100 = 0.05). In this representation,
documents and words are represented as vectors. Each row in the matrix represents a document, and the values within the row
form the word vector that represents that document.
Suppose there is a query such as “Microsoft stock got a small boost from the launch of Windows 10”. This query is
transformed into a word-count vector, as illustrated in Table 6.

Table 4. An example of WAM

Article (Category) / Word   Stock   Windows 10   Golf
Economic 5 2 2
IT 2 10 1
Sports 1 7
Entertainment 5 4 4
Foreign 3 5 6
Politics 4 7 2
Regional 1 6 4

Table 5. An example of the i-WAM

Article (Category) / Word   Stock   Windows 10   Golf
Economic 0.05 0.02 0.02
IT 0.02 0.10 0.01
Sports 0.01 0.07
Entertainment 0.05 0.04 0.04
Foreign 0.03 0.05 0.06
Politics 0.04 0.07 0.02
Regional 0.01 0.06 0.04

Table 6. A sample query with word count

Query / Word   Stock   Windows 10   Golf
Query 1 1 0
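The construction of the i-WAM and the query vector can be sketched as follows. This is a minimal sketch assuming a dictionary-of-lists layout; it uses only the Economic and IT rows of Table 4 (a hypothetical subset chosen because those rows are complete), and the query matching is a naive case-insensitive substring count, also an assumption.

```python
# Raw WAM counts from Table 4 (hypothetical subset: Economic and IT rows only)
WORDS = ["Stock", "Windows 10", "Golf"]
WAM_COUNTS = {
    "Economic": [5, 2, 2],
    "IT": [2, 10, 1],
}
TOTAL_WORDS = 100  # total words in the 10-document training set, as stated in the text


def initial_wam(counts, total_words):
    """i-WAM: divide every raw count by the total number of words (TF values)."""
    return {article: [c / total_words for c in row] for article, row in counts.items()}


def query_vector(query, vocabulary):
    """Count how often each WAM word occurs in the query (naive substring count, cf. Table 6)."""
    lowered = query.lower()
    return [lowered.count(word.lower()) for word in vocabulary]


i_wam = initial_wam(WAM_COUNTS, TOTAL_WORDS)
print(i_wam["Economic"])  # [0.05, 0.02, 0.02]
print(query_vector("Microsoft stock got a small boost from the launch of Windows 10", WORDS))  # [1, 1, 0]
```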

In the context of a corpus, the collection of documents can be seen as a set of vectors in a vector space, with each
term representing a unique axis. The similarity between any two documents can be determined using the cosine similarity
technique (4,18), which measures the similarity between their respective vectors.
The cosine similarity (d1, d2) is calculated as the dot product of the document vectors d1 and d2, divided by the product of
their magnitudes (∥d1∥ and ∥d2∥), as shown in Equation (4):

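$$\text{Cosine Similarity}(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert} \qquad (4)$$

Here, the dot product measures how strongly the two vectors point in the same direction, and dividing by the product of their
magnitudes (lengths) normalizes the score, so that for non-negative term weights it lies between 0 and 1.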
Using these cosine similarity values, the similarity between a query and each document can be compared. For example, the
cosine similarity scores obtained for the example query are presented in Figure 2. In Figure 2, the word “Stock” has a high
weight of 0.5 in the Economic category. The results indicate that the query is most likely related to the Economic document,
as it produces the highest cosine similarity score of 0.861.
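Equation (4) can be checked numerically against the Economic row of Table 5. The sketch below is a plain-Python illustration, not the authors' implementation; returning 0 for a zero vector is an added assumption, since Equation (4) is undefined in that case.

```python
import math


def cosine_similarity(d1, d2):
    """Equation (4): dot product divided by the product of the vector magnitudes."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: similarity with an all-zero vector is 0
    return dot / (norm1 * norm2)


query = [1, 1, 0]  # Stock, Windows 10, Golf counts from Table 6
economic = [0.05, 0.02, 0.02]  # Economic row of the i-WAM in Table 5
print(round(cosine_similarity(query, economic), 4))  # 0.8616
```

The result, about 0.8616, is consistent with the 0.861 reported for the Economic category.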

Fig 2. Cosine similarity result

2.7 Classification Algorithms

This research focuses on document-level sentiment analysis, which classifies the sentiment of entire documents rather than
individual sentences or specific attributes. Supervised machine learning models were used for sentiment classification of the
selected reviews. To represent the documents in a machine-readable format, a predefined set of features (f1, f2, ..., fm) was
established, where ni(d) is the frequency of feature fi in document d. Each document d was thus transformed into a document
vector d := (n1(d), n2(d), ..., nm(d)).
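The document representation d := (n1(d), ..., nm(d)) is a feature-count vector. The sketch below builds one; the feature list and the review are hypothetical, since the paper does not list its predefined features.

```python
def document_vector(tokens, features):
    """Return (n_1(d), ..., n_m(d)): the frequency of each feature f_i in document d."""
    return [tokens.count(f) for f in features]


features = ["good", "bad", "great", "boring"]  # hypothetical features f_1..f_m
doc = "a good plot and great acting but a boring ending".split()
print(document_vector(doc, features))  # [1, 0, 1, 1]
```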
The chosen machine learning algorithms, namely SVM, ANN, and NB, are widely recognized for their effectiveness in
sentiment analysis tasks. This study evaluates these algorithms using traditional frequency-based (TF-IDF) and prediction-based
(W2V) text representations. Experimental analysis was conducted on the IMDB and Yelp datasets and on tweets that were
collected and labeled by the researchers according to their sentiment. The results indicated that the model created using W2V
and ANN demonstrated superior performance compared to other approaches (1–4,19).

2.7.1 Naïve Bayes

The Naïve Bayes (NB) algorithm, named after the mathematician Thomas Bayes, belongs to the family of Bayesian algorithms
and is based on Bayes’ theorem. It is a statistical classification technique that uses a Bayesian model to predict class membership,
and it is relatively straightforward to apply.
Consider a set of samples d = {d1, d2, d3, ..., dn} and a set of classes c = {c1, c2, c3, ..., cm}. To classify a given sample d, the
posterior probability of each class c is calculated using Equation (5):

$$P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)} \qquad (5)$$

The probability of each class given the sample is determined, and the class with the highest probability is taken as the
classification result.
Because P(d) is the same for every class, it plays no role in selecting c. It is important to note that the conditional independence
assumption made by the Naive Bayes classifier rarely holds in real-world text. Nevertheless, Naive Bayes-based text classification
tends to perform well, as it is a simple probabilistic classifier based on Bayesian probability. The classifier assumes that the
individual features of a document are conditionally independent of each other given the class: it treats a document as a collection
of words and assumes that the presence and position of each word are independent of the other words. The Naive Bayes
classifier is derived from Bayes’ rule (4,20).
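Equation (5), combined with the independence assumption, gives a working text classifier. The sketch below assumes the common multinomial variant with add-one (Laplace) smoothing and works with log probabilities to avoid numerical underflow; the paper does not state which NB variant it used, and the training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Estimate log P(c) and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_prior, word_counts, vocab


def predict_nb(tokens, model):
    """Return argmax_c log P(c) + sum_w log P(w | c); P(d) is omitted (same for every class)."""
    log_prior, word_counts, vocab = model
    scores = {}
    for c, lp in log_prior.items():
        total = sum(word_counts[c].values())
        score = lp
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # add-one smoothing so unseen (word, class) pairs do not give log(0)
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


docs = [["great", "movie"], ["loved", "it"], ["awful", "movie"], ["boring", "plot"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(predict_nb(["great", "acting"], model))  # positive
print(predict_nb(["boring", "movie"], model))  # negative
```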

2.7.2 Support Vector Machine

The Support Vector Machine (SVM) is a data mining method that operates in a vector space and aims to find a decision
boundary between two classes that is as far as possible from the closest training points of either class. It follows the principle
of structural risk minimization from statistical learning theory, which is one of its key characteristics (3,4).
SVMs have proven efficient for document classification. The fundamental idea behind SVM classification is to find a
hyperplane with the maximum margin that separates the document vectors of one class from those of the other class. Unlike
Naïve Bayes, SVMs are large-margin classifiers rather than probabilistic classifiers. The solution is represented by the vector w:

$$\vec{w} = \sum_j \alpha_j c_j \vec{d}_j, \qquad \alpha_j \ge 0 \qquad (6)$$

where cj ∈ {1, −1} is the class (positive or negative) of document dj. The αj values, obtained by solving a dual optimization
problem, determine the support vectors: only the document vectors with αj greater than zero contribute to the construction of w.
These support vectors are essential for classification, because a test instance is classified according to the side of the hyperplane
defined by w on which it falls.
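Solving the dual problem behind Equation (6) is too long for a short example, so the sketch below swaps in a simpler primal method: a Pegasos-style stochastic sub-gradient solver for a linear SVM without a bias term (the optional projection step is omitted). Every update adds a non-negative multiple of cj·dj to w, so the learned vector has the same form as Equation (6). The regularization constant, epoch count, and toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.1, epochs=100):
    """Linear SVM (no bias term) trained in the primal with Pegasos-style steps.

    X: list of feature vectors d_j; y: class labels c_j in {+1, -1}.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, c in zip(X, y):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = c * sum(wi * xi for wi, xi in zip(w, x))
            # regularization step: shrink w by the factor (1 - 1/t)
            w = [(1.0 - eta * lam) * wi for wi in w]
            # hinge-loss step: add eta * c_j * d_j when the margin is below 1
            if margin < 1:
                w = [wi + eta * c * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w, x):
    """Classify by the side of the hyperplane w . x = 0 on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```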

2.7.3 Artificial Neural Network

Artificial neural networks are computational models inspired by the structure and functioning of the human brain. They are
composed of interconnected processing elements, referred to as neurons, which have their own memory and communicate
through weighted connections. These networks emulate the behavior of biological neural networks and are implemented as
computer programs (2,3).
The structure of an artificial neural network comprises three main components: neurons, connections, and a learning
algorithm. Neurons serve as the fundamental processing units within the network. They receive inputs representing the factors
that influence the problem and produce an output corresponding to the desired outcome. The connections between neurons
form an interconnected network that resembles biological neural connections. In most artificial neural network systems,
neurons are organized into layers; in a feedforward network, information flows in one direction, from the input layer through
the hidden layers to the output layer (15).
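The three components described above (neurons, weighted connections, and a learning algorithm) can be seen in a small feedforward network. The paper does not report its ANN architecture, so everything here is a hypothetical sketch: one hidden tanh layer, a sigmoid output neuron, and full-batch gradient descent on the binary cross-entropy loss, trained on four toy bag-of-words vectors. It assumes NumPy is available.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=4, lr=0.5, epochs=3000, seed=0):
    """One hidden tanh layer and a sigmoid output, trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # forward pass: input layer -> hidden layer -> output neuron
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass for the mean binary cross-entropy loss
        dz2 = (p - t) / len(X)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dh = (dz2 @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        dW1, db1 = X.T @ dh, dh.sum(axis=0)
        # learning algorithm: gradient-descent update of the weighted connections
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
    return (p.ravel() >= 0.5).astype(int)


# hypothetical features: [positive word present, neutral word present, negative word present]
X = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]], dtype=float)
y = np.array([1, 1, 0, 0])
params = train_mlp(X, y)
print(predict(params, X))
```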

2.8 Performance Criteria

In this paper, the models developed using the classification algorithms were evaluated using a confusion matrix (35). Four
statistical measures were used for performance evaluation: accuracy (ACC), sensitivity (SENS), precision (PREC), and F-measure
(F). A True Positive (TP) is a positive instance that the model correctly predicts as positive, and a True Negative (TN) is a negative
instance that the model correctly predicts as negative. A False Negative (FN) occurs when the model predicts the negative class
while the actual class is positive, and a False Positive (FP) occurs when the model predicts the positive class while the actual class
is negative. Sensitivity (also called recall) is the proportion of actual positives that are correctly identified, while specificity is the
proportion of actual negatives that are correctly identified. Accuracy is the overall proportion of instances whose true class is
correctly detected. The F-measure is the harmonic mean of precision and sensitivity, ranging from 0 (worst) to 1 (perfect
precision and sensitivity) (3,15).
The accuracy value is calculated using Equation (7):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

The sensitivity value is calculated using Equation (8):

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (8)$$

Precision is calculated using Equation (9):

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (9)$$

The F-measure value is calculated using Equation (10):

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (10)$$

To establish the models using classifier algorithms and evaluate their performance, the dataset was divided into training and
test sets.
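Equations (7) to (10) can be computed directly from the four confusion-matrix counts. The counts below are hypothetical. For the three-class Twitter and Yelp data, such binary formulas are typically applied per class and then averaged; the paper does not say which averaging it used.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision, and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Equation (7)
    sensitivity = tp / (tp + fn)  # Equation (8)
    precision = tp / (tp + fp)  # Equation (9)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)  # Equation (10)
    return accuracy, sensitivity, precision, f_measure


acc, sens, prec, f = metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, sens, round(prec, 4), round(f, 4))  # 0.85 0.8 0.8889 0.8421
```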


3 Results and Discussion

In this paper, a dataset of 4,500 tweets was used to perform sentiment analysis with the Naïve Bayes (NB), Support Vector
Machine (SVM), and Artificial Neural Network (ANN) classification algorithms. The tweets underwent text pre-processing
and vector space modelling. To evaluate the performance of the algorithms, 5-fold cross-validation was used to split the data
into training and test sets. The evaluation metrics were accuracy (ACC), precision (PREC), sensitivity (SENS), and F-measure (F),
and the results are presented in Tables 8, 9 and 10.
The performance of the classification algorithms on the IMDB dataset, which contains labeled polarities provided by Kotzias,
was also assessed and is presented in Tables 8, 9 and 10.
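The 5-fold split can be sketched as below. This is an illustration under assumptions: the folds are shuffled with a fixed seed and are not stratified by sentiment class, because the paper does not say whether stratification was used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled, non-stratified k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of (nearly) equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(4500, k=5):
    print(len(train), len(test))  # 3600 900 for each of the 5 folds
```

Each tweet appears in exactly one test fold, so every tweet is used once for testing and four times for training.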

Table 7. Results with TF-IDF on Twitter, IMDB and Yelp datasets

Dataset   Algorithm   Accuracy   Precision   Sensitivity   F-Measure
Twitter   SVM         82%        83%         82%           81%
Twitter   NB          72%        73%         72%           76%
Twitter   ANN         86%        87%         84%           85%
IMDB      SVM         83%        84%         84%           84%
IMDB      NB          82%        82%         83%           82%
IMDB      ANN         89%        88%         88%           89%
Yelp      SVM         81%        82%         81%           80%
Yelp      NB          70%        72%         71%           74%
Yelp      ANN         85%        86%         82%           84%

Table 8. Results with TF-IDF on Twitter, IMDB and Yelp datasets

Table 9. Result with W2V on the Twitter, IMDB and Yelp datasets
Dataset   Algorithm   Accuracy   Precision   Sensitivity   F-Measure
Twitter   SVM         84%        80%         84%           82%
Twitter   NB          72%        76%         76%           77%
Twitter   ANN         87%        84%         86%           85%
IMDB      SVM         84%        84%         84%           84%
IMDB      NB          83%        84%         85%           84%
IMDB      ANN         90%        91%         90%           96%
Yelp      SVM         83%        79%         83%           81%
Yelp      NB          71%        75%         75%           75%
Yelp      ANN         86%        83%         85%           84%

To validate the performance results of the classifiers on the IMDB dataset, the same algorithms were applied to the Twitter
and Yelp datasets using the TF-IDF, Word2Vec (W2V), and Word Article Matrix (WAM) methods for vector modelling. The
performance of the algorithms on the three datasets is compared in Tables 8, 9 and 10.

Table 10. Results with WAM on the Twitter, IMDB and Yelp datasets
Dataset   Algorithm   Accuracy   Precision   Sensitivity   F-Measure
Twitter   SVM         99.68%     99.76%      99.11%        99.65%
Twitter   NB          99.60%     99.64%      99.83%        99.50%
Twitter   ANN         99.72%     100%        100%          99.72%
IMDB      SVM         99.68%     99.78%      99.21%        99.61%
IMDB      NB          99.62%     99.60%      99.15%        99.58%
IMDB      ANN         99.74%     100%        100%          100%
Yelp      SVM         99.62%     99.73%      99.08%        99.64%
Yelp      NB          99.58%     99.61%      99.81%        99.50%
Yelp      ANN         99.70%     100%        100%          99.71%

The ANN algorithm achieved the best performance across all three datasets, while the NB algorithm performed worst. Table 9
shows better results than Table 8, and Table 10 in turn shows better results than Table 9. Across the datasets and feature
extraction techniques, ANN reached its highest accuracy, 99.74%, on the IMDB dataset with the WAM technique, as shown in
Table 10.

3.1 Comparison with other Methods

Table 11 compares the results of the proposed model using the ANN classifier with those reported by other researchers. The
proposed method outperforms the other methods in accuracy, precision, sensitivity, and F-measure. This supports the novelty of
our proposal to use the ANN classifier and its advantage over the techniques advocated by a number of previous researchers.

Table 11. Comparison between the proposed method and the methods suggested by previous workers
Ref.              Classifier   Feature Extraction Method   Dataset   Accuracy   Precision   Sensitivity   F-Measure
(1)               SVM          TF-IDF                      Twitter   83%        83%         82%           81%
(1)               SVM          TF-IDF                      IMDB      83%        84%         84%           84%
(1)               SVM          TF-IDF                      Yelp      81%        82%         81%           81%
(1)               SVM          W2V                         Twitter   89%        88%         86%           87%
(1)               SVM          W2V                         IMDB      84%        84%         86%           85%
(1)               SVM          W2V                         Yelp      83%        84%         85%           84%
(1)               NB           TF-IDF                      Twitter   72%        73%         73%           76%
(1)               NB           TF-IDF                      IMDB      82%        82%         83%           82%
(1)               NB           TF-IDF                      Yelp      76%        77%         77%           77%
(1)               NB           W2V                         Twitter   72%        76%         75%           76%
(1)               NB           W2V                         IMDB      83%        84%         85%           84%
(1)               NB           W2V                         Yelp      78%        78%         78%           81%
Proposed Method   ANN          WAM                         Twitter   99.72%     100%        100%          99.72%
Proposed Method   ANN          WAM                         IMDB      99.74%     100%        100%          100%
Proposed Method   ANN          WAM                         Yelp      99.70%     100%        100%          99.71%

4 Conclusions
The study evaluated the effectiveness of classifiers on three diverse datasets, IMDB, Twitter, and Yelp, using various text
representation techniques. By leveraging an existing categorization of online news, the study achieved human-like
categorization of social media text. The classification algorithms employed were Artificial Neural Network (ANN), Support
Vector Machine (SVM), and Naïve Bayes. The results were consistent across all datasets, with ANN outperforming the other
algorithms and Naïve Bayes performing worst. Future studies should explore advanced neural network models for
classification. These findings highlight the potential for accurate categorization of social media text and suggest avenues for
further research and improvement in classification techniques.

References
1) Muhammet SB, Fatih K. Sentiment Analysis on Social Media Reviews Datasets with Deep Learning Approach. Sakarya University Journal of Computer and Information Sciences. 2021;4(1). Available from: https://doi.org/10.35377/saucis.04.01.833026.
2) Bordoloi M, Biswas SK. Sentiment analysis: A survey on design framework, applications and future scopes. Artificial Intelligence Review. 2023. Available from: https://doi.org/10.1007/s10462-023-10442-2.
3) Alantari HJ, Currim IS, Deng Y, Singh S. An empirical comparison of machine learning methods for text-based sentiment analysis of online consumer reviews. International Journal of Research in Marketing. 2022;39(1):1–19. Available from: https://doi.org/10.1016/j.ijresmar.2021.10.011.
4) Bhuvaneshwari P, Rao AN, Robinson YH, Thippeswamy MN. Sentiment analysis for user reviews using Bi-LSTM self-attention based CNN model. Multimedia Tools and Applications. 2022;81(9):12405–12419. Available from: https://doi.org/10.1007/s11042-022-12410-4.
5) Ping W, Li J, Jingrui H. S2SAN: A sentence-to-sentence attention network for sentiment analysis of online reviews. 2021. Available from: https://doi.org/10.1016/j.dss.2021.113603.
6) Li L, Goh TTT, Jin D. How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Computing and Applications. 2020;32(9):4387–4415. Available from: https://doi.org/10.1007/s00521-018-3865-7.
7) Li W, Qi F, Tang M, Yu Z. Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification. Neurocomputing. 2020;387:63–77. Available from: https://doi.org/10.1016/j.neucom.2020.01.006.
8) Zhiqiang G, Guofei C, Yongming H, Lu G, Li F. Semantic relation extraction using sequential and tree-structured LSTM with attention. Information Sciences. 2020;509:183–192. Available from: https://doi.org/10.1016/j.ins.2019.09.006.
9) Jain R, Kumar A, Nayyar A, Dewan K, Garg R, Raman S, et al. Explaining sentiment analysis results on social media texts through visualization. Multimedia Tools and Applications. 2023;82(15):22613–22629. Available from: https://doi.org/10.1007/s11042-023-14432-y.
10) Nandwani P, Verma R. A review on sentiment analysis and emotion detection from text. Social Network Analysis and Mining. 2021;11(1). Available from: https://doi.org/10.1007/s13278-021-00776-6.
11) Rahman H, Tariq J, Masood MA, Subahi AF, Khalaf OI, Alotaibi Y. Multi-Tier Sentiment Analysis of Social Media Text Using Supervised Machine Learning. Computers, Materials & Continua. 2023;74(3):5527–5543. Available from: https://doi.org/10.32604/cmc.2023.033190.
12) Budhi GS, Chiong R, Pranata I, Hu Z. Using Machine Learning to Predict the Sentiment of Online Reviews: A New Framework for Comparative Analysis. Archives of Computational Methods in Engineering. 2021;28(4):2543–2566. Available from: https://doi.org/10.1007/s11831-020-09464-8.
13) Fan FLL, Xiong J, Li M, Wang G. On Interpretability of Artificial Neural Networks: A Survey. IEEE Transactions on Radiation and Plasma Medical Sciences. 2021;5(6):741–760. Available from: https://doi.org/10.1109/TRPMS.2021.3066428.
14) Kaur G, Sharma A. A deep learning-based model using hybrid feature extraction approach for consumer sentiment analysis. Journal of Big Data. 2023;10(1). Available from: https://doi.org/10.1186/s40537-022-00680-6.
15) Sayyida TK, Sohail A, Shehneela N. Transformer-based deep learning models for the sentiment analysis of social media data. Array. 2022;14:100157. Available from: https://doi.org/10.1016/j.array.2022.100157.
16) Qianwen AX, Victor C, Chrisina J. A systematic review of social media-based sentiment analysis: Emerging trends and challenges. Decision Analytics Journal. 2022;3:100073. Available from: https://doi.org/10.1016/j.dajour.2022.100073.
17) Dimple T, Bharti N, Bhoopesh SB, Ashutosh M, Manoj K. A systematic review of social network sentiment analysis with comparative study of ensemble-based techniques. Artificial Intelligence Review. 2023;12:1–55. Available from: https://doi.org/10.1007/s10462-023-10472-w.
18) Muhammet SB, Fatih K. Sentiment analysis with machine learning methods on social media. 2020. Available from: https://doi.org/10.14201/ADCAIJ202093515.
19) Wang H, Wang X. Sentiment analysis of tweets and government translations: Assessing China’s post-COVID-19 landscape for signs of withering or booming. Global Media and China. 2023;8(2):213–233. Available from: https://doi.org/10.1177/20594364231181745.
20) Yin Z, Shao J, Hussain MJ, Hao Y, Chen Y, Zhang X, et al. DPG-LSTM: An Enhanced LSTM Framework for Sentiment Analysis in Social Media Text Based on Dependency Parsing and GCN. Applied Sciences. 2022;13(1):354. Available from: https://doi.org/10.3390/app13010354.