Abstract: Sentiment analysis, also known as opinion mining, is a significant area of artificial intelligence today. Currently, a great deal of data is constantly exchanged as text on social networking and e-commerce platforms such as Facebook, Twitter, and Amazon. Sentiment analysis is therefore an effective technique for businesses to understand what their customers want, so that they can adapt their plans in response to client feedback and expand their customer base. Extracting the exact meaning from text is a difficult task. Our effort here is to obtain the positive and negative sentiment of reviews from the dataset and to enhance sentiment-classification performance through natural language processing (NLP), improving on pre-existing pre-processing techniques and machine learning algorithms. For this purpose, we use an Amazon product review dataset extracted from the Kaggle website. In this study, we aim to remove noise from the dataset and improve the traditional NLP preprocessing technique; after that, we use the Term Frequency-Inverse Document Frequency (TF-IDF) method for feature selection and then classify the result with classification algorithms such as the Artificial Neural Network (ANN), Naïve Bayes (NB), and Support Vector Machine (SVM).
Keywords: Negation handling, N-gram techniques, Sentiment analysis, Pre-processing technique, Machine learning
1 Department of Computer Application, School of CA&IT, Shri Guru Ram Rai University, Dehradun 248001, Uttarakhand, India. ORCID: 0000-0002-6325-242X. Email: [email protected]
2 Department of Computer Application, School of CA&IT, Shri Guru Ram Rai University, Dehradun 248001, Uttarakhand, India. ORCID: 0000-0001-7625-5091. Email: [email protected]
* (Corresponding Author)

promoters. By analyzing their shopping experience and opinions on product quality, businesses can use sentiment analysis to make sure that customers are satisfied with their purchases. This helps businesses to build long-lasting relationships with their customers, as well as improve customer loyalty and create better customer experiences. Additionally, sentiment analysis helps to improve customer service and generate more sales and revenue. Finally, sentiment analysis also helps businesses identify trends and potential issues in their products and services, enabling them to quickly make the necessary changes to ensure customer
International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2024, 12(15s), 63–69 | 63
satisfaction [7]. The author affirmed that sentiment analysis is used for assessing customer reviews and opinions about products and services in order to gain insight into customer attitudes, opinions, and emotions. The technique can also help businesses make decisions on how to improve their offerings [8]. However, it is difficult to identify, filter, and monitor sentiment information available on social media applications on an ongoing basis. Among the complicating factors are the availability of unstructured data, the variety of languages, the variety of websites and social media platforms, and disparate information about people's views. As a result, suitable tools and algorithms are needed to analyze sentiment from data gathered in Big Data, Blockchain, Fog Computing, and IoT-based environments [9]. Accurate interpretation of speech is currently almost impossible in computer programs. There are many difficulties and great challenges in the analysis of negation [10]-[12], since its context-dependent nature complicates bag-of-words (BoW) approaches to natural language. The latter method of document analysis considers only the frequency of words in the document, without taking into account the order of words from beginning to end. However, careful consideration is necessary, since negations occur in a variety of different forms. They reverse the meaning of single words, but also of entire sentences or even phrases [13].

In this paper, we use a dataset of Amazon product reviews to obtain positive and negative sentiments. The TF-IDF technique is used to extract features, and the output is obtained using a machine learning classifier. The remainder of the work is organised as follows: in Section 2, we briefly examine the literature on opinion analysis. The methods utilised are described in Section 3. Section 4 offers a thorough experimental examination of the opinions. Section 5 concludes with a conclusion and a discussion of future work.

2. Literature Review

In the field of text classification and opinion analysis, much research has been done to determine the sentiment values of a text. One of the main challenges is to categorize the sentiment of a text as either negative or positive. To address this challenge, three different levels of classification have been proposed. At the aspect level, sentiment values are classified based on individual aspects of the text. Sentence-level classification focuses on the sentiment values of an individual sentence in a text. Document-level classification takes into account the sentiment values of the entire document. These three forms of classification help to accurately categorize the sentiment of a text as either negative or positive. Worldwide, researchers have studied supervised, unsupervised, and semi-supervised machine learning techniques.

The authors of [14] used sentiment analysis to rate reviews of the iPhone 5 on Amazon as an example. The study of reviews provided useful insights into customer experience and review sentiment. This study also provided information for future product design decisions, illustrating the importance of machine learning in today's world. The technique is a combination of various preprocessing steps aimed at reducing noise in the data, including removing punctuation, HTML tags, and numbers. A part-of-speech (POS) tagger identifies the types of the individual words in a text. After the features are identified, a rule-based procedure is used to categorize the reviews; a set of rules assigns each review to a specific category or sentiment. The authors of [15] conducted a study to analyze the impact of text preprocessing on online movie reviews. To reduce noise in the data, various preprocessing steps such as stemming, HTML tag removal, and data cleaning were conducted. To reduce unnecessary features, the chi-square feature selection technique was applied. After preprocessing, a support vector machine (SVM) was applied to the scores to categorize them into either negative or positive sentiments. The results of the study showed that the preprocessing steps were beneficial for classifying the ratings into the respective mood categories. The authors concluded that text preprocessing is a crucial step in the sentiment analysis of online movie reviews. The authors of [16] started by removing noise from Amazon book reviews using various preprocessing techniques, such as removing URLs, HTML tags, spaces, punctuation, and special characters, as well as stemming. This ensured that the data was ready for the next step of feature selection, which was performed using TF-IDF. TF-IDF was used to represent the preprocessed data by highlighting the most important words in the reviews; this feature selection method allows the most relevant words to be identified and used in the analysis. After feature selection was completed using TF-IDF, the authors compared the accuracy of different classifiers such as K-Nearest Neighbor, Support Vector Machine, Decision Tree, Naive Bayes, and Random Forest. They also evaluated the time required by each classifier as well as the sentiment scores of different books to assess the effectiveness of each model.

Unsupervised learning is a type of machine learning algorithm that works without requiring input data to be labeled or categorized. Instead, it relies on the data itself to identify patterns and trends, which can be used to generate insights and make predictions. Unsupervised learning can be used to estimate models without annotations, but the results are often less accurate overall [17]. Compared to unsupervised learning, supervised learning is known to perform better. For this approach to work, training sets must be created with manual labels for each word. This labeling process is very labor intensive, often requires extensive manual work, and results in only mediocre performance. In addition, this process is highly subjective, which can lead to further inaccuracies. Therefore, developing an automated method for labeling latent negation regions is critical to improving the accuracy of supervised learning [17], [18]. In opinion analysis, the recognition of negation domains in the relevant research is mainly based on rule-based algorithms. Rule-based approaches have several disadvantages, since the list of negations must be determined in advance and the selection criterion for choosing a rule is usually random or determined by cross-validation. Rules that are created with the intention of reflecting "ground truth" are inherently limited in their ability to be learned, because the rules are predefined based on certain assumptions and cannot be adapted to new information or experience. The rules cannot evolve or grow in their understanding of the ground truth, but remain static; since reality is constantly changing and evolving, such rules can never truly reflect it. Those who want to use a learning strategy can alternatively turn to generative probabilistic models [19].

In this research, we aim to address the issue of negations in reviews and feature extraction from Amazon datasets using TF-IDF. Then, to compare and assess the outcomes, we will analyse the accuracy of different classifiers as well as their sentiment
scores. We will use the TF-IDF scores to study the relationships
between features and ratings. We will also use the classifiers to
identify the sentiment of each review, taking into account the
presence of negations. Finally, we will measure the accuracy of
the different classifiers and sentiment scores to determine which
ones are most effective in identifying the sentiment of each
review. In this way, we can better understand the impact of
negations on sentiment analysis and feature extraction. The
classifiers and methods used in the experiment are discussed in
more detail in the following section.
3. Methodology

Most researchers have traditionally used preprocessing techniques to remove noise from reviews, as depicted in Figure 1. These techniques remove negative words such as "not", "no", "wouldn't", "didn't", etc. from reviews, although such negative terms can have a great impact on sentiment, as they can change the entire meaning of a sentence. It is therefore important to consider the impact of such negative words when processing reviews.

Fig. 1. The Traditional approach

Negative terms can have a significant impact on the mood of a sentence. When we remove negative terms from a sentence using preprocessing techniques, the overall meaning and mood of the opinion can change drastically. For instance, consider the following two sentences: "He is as brave as a lion" and "A lion is not braver than him." The first sentence is a positive statement, implying that the person being referred to is very brave, just like a lion. The second sentence, however, uses the word "not" to negate the comparison between the lion's bravery and the person's bravery, implying that the lion is not as brave as the person. When we remove the negative term "not" from the second sentence, the meaning and mood change: the sentence becomes "A lion is braver than him," a positive statement implying that the person being referred to is less brave than a lion. Therefore, it is important to carefully consider the use of negative terms in a sentence, since they can completely change its sentiment and affect the overall meaning of a statement.

Fig. 2. Modification in Traditional approach
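This effect can be sketched in a few lines of Python. The tiny stopword list here is illustrative only (a real pipeline would use a full list such as NLTK's, which does include "not"):

```python
# Illustrative sketch (not the paper's code): dropping "not" during
# stopword removal flips the polarity of the example sentence.
standard_stopwords = {"a", "is", "than", "not"}   # "not" treated as a stopword
edited_stopwords = standard_stopwords - {"not"}   # negation word kept in the text

def remove_stopwords(sentence, stopwords):
    return " ".join(w for w in sentence.lower().split() if w not in stopwords)

sentence = "A lion is not braver than him"
print(remove_stopwords(sentence, standard_stopwords))  # lion braver him -- negation lost
print(remove_stopwords(sentence, edited_stopwords))    # lion not braver him -- negation kept
```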
Removing negative terms from documents is a common side effect of preprocessing with stopword lists. While this can be useful for reducing noise in a document, it is not good practice to eliminate negative words: doing so can result in an incomplete or inaccurate representation of the original document, which can have a negative impact on overall accuracy. Negative terms, then, should never be neglected in a sentence; they go a long way toward ensuring that the message of the sentence is conveyed clearly and correctly.

In Figure 2, we instead remove the negative terms from the list of stopwords. The remaining stopwords are then removed from the documents, so that no negative terms are removed from the reviews. In this way, we can deal with both negative and positive phrases in a document.

3.1 Data Acquisition and Pre-processing

To validate our approach, we selected the Amazon product review dataset. This dataset comprises 21,000 reviews of 30 different items, each having its own DOC_ID, LABEL, RATING, VERIFIED_PURCHASE, PRODUCT_CATEGORY, PRODUCT_ID, PRODUCT_TITLE, REVIEW_TITLE, and REVIEW_TEXT. By analyzing this dataset, we can better understand customers' experiences with different products and brands in the market. We believe this dataset provides us with enough data to accurately assess the validity of our approach. To ensure the quality of our dataset, we decided to remove reviews with a 3-star rating. These reviews are considered neutral, meaning they are neither negative nor positive, and therefore need not be included. Reviews with more than three stars are considered favourable, while reviews with fewer than three stars are considered bad. We move on to the following phase with the remaining reviews, where reviews with more than three stars are assigned a 1, while reviews with fewer than three stars are assigned a 0. After that, we clean the column "REVIEW_TEXT" of interfering factors like URL links, HTML tags, numbers, special characters, etc. (see Figure 3), and convert it into a column "Clean_Text".
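The labelling and cleaning step described above can be sketched as follows; the toy records and regular-expression patterns are illustrative stand-ins, using only the RATING and REVIEW_TEXT fields named in the text:

```python
import re

# Toy records standing in for rows of the Amazon review dataset.
reviews = [
    {"RATING": 5, "REVIEW_TEXT": "Great product! <br> See https://example.com"},
    {"RATING": 3, "REVIEW_TEXT": "It is ok, nothing special"},
    {"RATING": 1, "REVIEW_TEXT": "Terrible, broke in 2 days!!!"},
]

def clean(text):
    text = re.sub(r"https?://\S+", " ", text)   # URL links
    text = re.sub(r"<[^>]+>|&\w+;", " ", text)  # HTML tags and entities
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # numbers and special characters
    return re.sub(r"\s+", " ", text).strip().lower()

labelled = []
for r in reviews:
    if r["RATING"] == 3:                        # drop neutral 3-star reviews
        continue
    labelled.append({
        "LABEL": 1 if r["RATING"] > 3 else 0,   # >3 stars -> 1, <3 stars -> 0
        "Clean_Text": clean(r["REVIEW_TEXT"]),
    })

print(labelled)
# [{'LABEL': 1, 'Clean_Text': 'great product see'},
#  {'LABEL': 0, 'Clean_Text': 'terrible broke in days'}]
```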
In the first step, without editing the stopwords, we remove them from the "Clean_Text" column, and the text is lemmatized (see Figure 1). The result is stored in a new column called "Before_edit_remove_sw" (see Figure 4). This process is useful for streamlining and normalizing the text in our analysis. After removing the stopwords, the text is more concise and easier to understand. Lemmatizing the text allows us to group words with the same root, which makes the analysis and identification of important terms in the data more efficient.

Fig. 3. Screenshot of Text cleaning

In the next step, stopwords are again removed from the "Clean_Text" column, after negative words have been removed from the stopwords list and the text has been lemmatized (see Figure 2). The result is stored in a new column called "After_edit_remove_sw"; the result of this work can be seen in Figure 4. In this step, we chose not to remove any negative words that the client may have expressed, keeping the client's opinion without any influence or manipulation. This decision was made to ensure that the customer's opinion was accurately reflected in the document.

Fig. 4. Screenshot of dataset before edit stopwords and after edit remove stopwords
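The two pre-processing variants compared above can be sketched as below. The real pipeline would use a full stopword list and lemmatizer (e.g. NLTK's); here a tiny list and a crude suffix rule stand in so the contrast is visible:

```python
# Sketch of the two pre-processing variants: "before edit" removes all
# stopwords (negations included); "after edit" keeps negation words.
stopwords = {"the", "is", "was", "not", "no", "it"}
negations = {"not", "no", "never", "don't", "didn't", "wouldn't", "can't"}

def lemmatize(word):
    # placeholder for a real lemmatizer: strips a plural "s"
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def preprocess(text, keep_negations):
    sw = stopwords - negations if keep_negations else stopwords
    return " ".join(lemmatize(w) for w in text.split() if w not in sw)

review = "it is not the quality i expected"
before_edit = preprocess(review, keep_negations=False)
after_edit = preprocess(review, keep_negations=True)
print(before_edit)  # quality i expected      -- negation lost
print(after_edit)   # not quality i expected  -- negation kept
```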
3.2 Feature Extraction

Representation is a key stage in sentiment classification. Often, noise in raw data must be filtered out using various pre-processing procedures. The preprocessed data is used to generate a term document matrix (TDM), which specifies the frequency of each specific word. The TDM supports two feature extraction methods: TF-IDF and the bag of words. A word's TF-IDF score is the product of its TF and IDF. In the TF score, the terms with the highest frequency in the review dataset are weighted more heavily than other terms. The IDF scaling factor increases the relevance of the dataset's least common terms. Compared to other word categories, this score is lower for very rare and very common words; terms with low TF-IDF scores are therefore likely to be eliminated [20]. Information can be recovered from documents using the TF-IDF technique, which takes into account both the inverse document frequency (IDF) and the term frequency (TF). Each phrase and word is assigned a distinct TF and IDF mark, and the product of the two gives its TF-IDF weight. In other words, a phrase is regarded as rarer if its TF-IDF score (weight) is higher, and vice versa. The TF of a term indicates how frequently it is used; the IDF of a word determines how important it is throughout the narrative [21].

3.3 Data Representation

In this section, we present a word cloud of reviews for our Amazon product dataset (see Figure 5). A word cloud is a graphical representation of the frequency of words used in a text, allowing us to get a quick overview of the most frequently used words in the review data. By analyzing this word cloud, we can gain valuable insight into the opinions of customers who have submitted reviews, better understand the sentiment of the reviews, and see the characteristics that customers are most interested in. This word cloud helps us better understand the preferences of the customer, which may be utilized to guide future marketing decisions.
After conducting TF-IDF vectorization on the text in our dataset, we utilize scatter plots to display the data as either negative or positive. Scatter plots are a valuable tool for visualizing correlations between two variables; here, the two variables are the TF-IDF scores for each word in the text and whether the sentiment is positive or negative. TF-IDF is frequently utilized to represent the importance of a word in a text corpus for information retrieval and natural language processing. Once the text has been processed, we can plot the results of TF-IDF vectorization on a scatter plot: the TF-IDF score of each word is represented by x0, and the sentiment (positive or negative) is indicated by x1. Figure 6 shows the dataset's outcome following the computation of the TF-IDF.

Fig. 6. Visualization of text

3.4 Experiment

We next calculated the sentiment analysis accuracy after pre-processing our dataset with two distinct techniques (as depicted in Figures 1 and 2). We achieved this by combining the TF-IDF feature with unigrams, bigrams, and trigrams. Once we had obtained the TF-IDF vectors, we divided them into training and testing sets. Then, classifiers like SVM, ANN, and NB were fed these sets. When we compared the outcomes of the various classifiers, we discovered that the accuracy had increased. The use of the TF-IDF feature, which allowed for a more accurate analysis of the sentiment within the dataset, is responsible for the improvement in accuracy [22]. For the suggested work, the following algorithm was used:

Proposed Algorithm
Input: labeled information
Output: Classifiers' accuracy
1. Load the dataset and preprocess it using the techniques shown in Figure 1 and Figure 2.
2. Create a TF-IDF vector for each document in the dataset along with unigram, bigram, and trigram features.
3. Split the dataset into training and testing sets.
4. Train the classifiers (SVM, ANN, and Naive Bayes) on the training set.
5. Test the classifiers on the testing set and calculate the accuracy of sentiment analysis.
6. Compare the accuracy of each classifier based on unigram, bigram, and trigram features.
7. End

4. Results and Discussion

In this study, we present the technique and algorithm that have been recommended for evaluation. The effectiveness of the suggested technique and algorithm in handling the current issue was evaluated through a series of trials. A dataset of Amazon customer reviews that had been randomly divided into training and testing groups was used for the tests. Developing a classification model is a critical task that requires careful consideration of the features present in the dataset. As noted by the authors of [23], the first step in developing such a model is to identify the relevant features. During model training, the words from a review can be decoded and added to the feature vector. This method is known as a "uni-gram" if only one word is considered, a "bi-gram" if two words are considered, and a "tri-gram" if three words are taken into account. Accurate analysis of substantial amounts of text data is made possible by combining uni-gram and bi-gram approaches; according to studies [24], [25], combining these strategies can greatly increase the accuracy of sentiment analysis. Therefore, when conducting text analysis, it is strongly advised to combine uni-gram and bi-gram approaches.

Using a number of classifiers, namely NB, SVM, and ANN, we estimated the sentiment of these reviews and looked at how accurate our classifiers are for texts that can be found in the real world. Accuracy, recall, precision, and F1-score are among the evaluation criteria used in the proposed study [26], as described below:

4.1 Accuracy

The accuracy measure is an essential metric used in data analysis to determine how many data values have been correctly predicted. The formula for computing accuracy is as follows:

Accuracy = Sum of Correct Predictions / Total Predictions    (1)

4.2 Precision

Among all the predicted positive class samples, the precision measure calculates the number of samples that are actually positive, as follows:

Precision = Sum of True Positives / (Sum of True Positives + Sum of False Positives)    (2)

4.3 F1-score

The F1-score is a statistical measure used in machine learning to assess a classification model's accuracy. It is the harmonic mean of Precision and Sensitivity, and is also referred to as the Dice Similarity Coefficient or the Sorensen-Dice Coefficient. The perfect value is 1. The F1-score is computed as follows:

F1-score = 2 * (Precision * Sensitivity) / (Precision + Sensitivity)    (3)

4.4 Recall

Recall, also known as sensitivity, is a metric used in machine learning to evaluate the performance of a classification model. It measures how many positive samples are correctly predicted by the model among all the positive samples in the test set. The formula for computing sensitivity is as follows:
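Steps 1-6 above can be sketched on a toy corpus as follows. The paper's runs use the 21,000-review Amazon dataset, and classifier hyper-parameters are not specified in the text, so scikit-learn defaults and a small MLP are assumed here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Step 1 is assumed done: a cleaned, labelled toy corpus (1 = positive, 0 = negative).
texts = ["good product", "not good at all", "excellent quality", "bad quality",
         "works well", "broke very quickly", "very happy with it", "not worth the money"] * 10
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 10

# Step 2: TF-IDF vectors with unigram, bigram, and trigram features.
X = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(texts)

# Step 3: split into training and testing sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=42)

# Steps 4-6: train each classifier and compare accuracies.
results = {}
for name, clf in [("NB", MultinomialNB()),
                  ("SVM", LinearSVC()),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                        random_state=42))]:
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))

print(results)
```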
Recall = Sum of True Positives / (Sum of True Positives + Sum of False Negatives)    (4)

Table 1 displays the precision, recall, and F1 score for all three algorithms.
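Equations (1)-(4) can be worked through on hypothetical confusion counts (the TP/TN/FP/FN values below are illustrative, not the paper's results):

```python
# Worked example of equations (1)-(4) on hypothetical confusion counts.
tp, tn, fp, fn = 90, 70, 25, 15

accuracy = (tp + tn) / (tp + tn + fp + fn)          # (1)
precision = tp / (tp + fp)                          # (2)
recall = tp / (tp + fn)                             # (4), a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)  # (3)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```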
A two-phase comparative sentiment analysis is presented in this document. Figure 1 illustrates the first phase, whereas Figure 2 shows the second phase. In the first phase, we present our findings on the Amazon product review dataset. Table 1 displays the results of our analysis; we did not make any modifications to the stopwords in the dataset, and to preprocess the data we followed the traditional technique shown in Figure 1. Upon examining Table 1, it is clear that ANN outperforms the other methods and provides the best results. The results obtained from the implementation of the three machine learning algorithms, namely ANN, SVM, and NB, were evaluated based on their accuracy, precision, recall, and F1 score. The highest accuracy achieved by ANN was 90.97%, based on bigram analysis. SVM, on the other hand, obtained an accuracy of 90.85% using unigram analysis. In the case of NB, the accuracy was 84.42% using bigram and trigram analysis.

Table 1. Classification performance prior to stopword editing

                            Unigram                  Bigram                   Trigram
Performance measure     NB     SVM    ANN       NB     SVM    ANN       NB     SVM    ANN
Classification acc.   84.39  90.85  88.37    84.42  89.94  90.97    84.42  88.51  90.71
Precision             84.41  91.5   92.36    84.42  89.85  92.49    84.42  88.19  94.58
Recall                99.95  98.28  93.87    100    99.29  97.19    100    99.75  94.4
F1-score              91.53  94.77  93.11    91.55  94.33  94.78    91.55  93.61  94.49

Furthermore, Figure 7 visualizes the accuracy results of the three algorithms. It can be seen that ANN and SVM achieved higher accuracy than NB.

The second phase of the procedure was preparing the data using the method depicted in Figure 2, which involves removing negative words from the stopwords. The first step is to identify which words are considered negative; common negative words include "no," "not," "never," "don't," and "can't". Once these words have been identified, they can be removed from the list of stopwords that are commonly excluded from text analysis, because sentiment analysis is strongly affected by negative words.

The outcomes attained in the second phase of the implementation of the three machine learning algorithms, namely ANN, SVM, and NB, were assessed based on their F1 score, recall, precision, and accuracy. Based on trigram analysis, ANN reached its highest accuracy of 92.31%. On the other hand, SVM achieved an accuracy of 90.88% using bigram analysis. Using bigram and trigram analysis, the accuracy for NB was 84.42%, as shown in Table 2. The precision, recall, and F1 score for each of the three algorithms are also shown in Table 2.

Table 2. Classification performance after stopword editing

                            Unigram                  Bigram                   Trigram
Performance measure     NB     SVM    ANN       NB     SVM    ANN       NB     SVM    ANN
Classification acc.   84.39  90.87  84.96    84.42  90.88  91.58    84.42  89.49  92.31
Precision             84.41  91.94  90.13    84.42  90.82  91.92    84.42  89.25  94.41
Recall                99.95  97.75  90.33    100    99.23  98.59    100    99.54  96.61
F1-score              91.53  94.76  90.23    91.55  94.84  95.18    91.55  94.11  95.50

Figure 8 also displays the accuracy results from the three approaches. It can be seen that ANN and SVM performed more accurately than NB.

Fig. 8. Visualization of accuracy

5. Conclusion

In summary, the use of preprocessing techniques and the TF-IDF feature in combination with classifiers such as SVM, ANN, and NB proved effective in achieving higher accuracy rates. In comparison, the ANN classifier achieved the best accuracy rate of 92.31%, while the SVM classifier also performed well. However, the NB classifier did not perform well in terms of accuracy. By comparing the performance of our proposed algorithm with the
traditional algorithm, we showed that our algorithm was able to identify negative words more accurately. This suggests that negative words are more important than other words in the dataset. Using stopwords in NLP to eliminate negative keywords from reviews may not be the best strategy: negative words have a big impact on sentiment analysis, and keeping consumer feedback authentic is crucial. The proposed algorithm demonstrates that greater accuracy can be attained by retaining negative words. In order to preserve the integrity of customer feedback, it is critical for researchers and practitioners in the field of NLP to rethink the conventional advice to delete negative terms.

Overall, future work in this area will require continuous development of the suggested method, which will retain negative terms in our database, so that we can assess the sentiment of customer feedback accurately without eliminating negative terms from the dataset, since negative words carry sentiment in sentences. We can also employ strategies such as rule-based negation handling, unsupervised procedures, and ensemble techniques to further increase sentiment analysis accuracy.

Acknowledgements

Professor (Dr.) Sanjay Sharma, who served as my research adviser, has my heartfelt appreciation for his great guidance and assistance. I sincerely thank him for everything he has done for me and the time he has spent getting to know me.

References

[1] M. A. Palomino and F. Aider, "Evaluating the Effectiveness of Text Pre-Processing in Sentiment Analysis," Appl. Sci., vol. 12, no. 17, Sep. 2022, doi: 10.3390/app12178765.
[2] V. Chang, L. Liu, Q. Xu, T. Li, and C. H. Hsu, "An improved model for sentiment analysis on luxury hotel review," in Expert Systems, John Wiley and Sons Inc, Feb. 2023, doi: 10.1111/exsy.12580.
[3] A. Iqbal, R. Amin, J. Iqbal, R. Alroobaea, A. Binmahfoudh, and M. Hussain, "Sentiment Analysis of Consumer Reviews Using Deep Learning," Sustain., vol. 14, no. 17, Sep. 2022, doi: 10.3390/su141710844.
[4] K. Gupta, N. Jiwani, and P. Whig, "Effectiveness of Machine Learning in Detecting Early-Stage Leukemia," 2023, pp. 461–472, doi: 10.1007/978-981-19-2535-1_34.
[5] R. Prabowo and M. Thelwall, "Sentiment analysis: A combined approach," J. Informetr., vol. 3, no. 2, pp. 143–157, Apr. 2009, doi: 10.1016/j.joi.2009.01.003.
[6] Y. Yu, W. Duan, and Q. Cao, "The impact of social and conventional media on firm equity value: A sentiment analysis approach," Decis. Support Syst., vol. 55, no. 4, pp. 919–926, Nov. 2013, doi: 10.1016/j.dss.2012.12.028.
[7] R. Kumar Behera, S. Kumar Rath, S. Misra, R. Damaševičius, and R. Maskeliūnas, "Distributed Centrality Analysis of Social Network Data Using MapReduce," Algorithms, vol. 12, no. 8, p. 161, Aug. 2019, doi: 10.3390/a12080161.
[8] S. Vohra and J. Teraiya, "Applications and Challenges for Sentiment Analysis: A Survey," Int. J. Eng. Res. Technol., vol. 2, 2013.
[9] A. Alsayat, "Improving Sentiment Analysis for Social Media Applications Using an Ensemble Deep Learning Language Model," Arab. J. Sci. Eng., vol. 47, no. 2, pp. 2499–2511, Feb. 2022, doi: 10.1007/s13369-021-06227-w.
[10] N. P. Cruz, M. Taboada, and R. Mitkov, "A machine-learning approach to negation and speculation detection for sentiment analysis," J. Assoc. Inf. Sci. Technol., vol. 67, no. 9, pp. 2118–2136, Sep. 2016, doi: 10.1002/asi.23533.
[11] J. Serrano-Guerrero, J. A. Olivas, F. P. Romero, and E. Herrera-Viedma, "Sentiment analysis: A review and comparative analysis of web services," Inf. Sci. (Ny)., vol. 311, pp. 18–38, Aug. 2015, doi: 10.1016/j.ins.2015.03.040.
[12] B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis," Found. Trends Inf. Retr., vol. 2, no. 1–2, pp. 1–135, 2008, doi: 10.1561/1500000011.
[13] I. Councill, R. McDonald, and L. Velikovich, "What's great and what's not: Learning to classify the scope of negation for improved sentiment analysis," pp. 51–59, Jun. 2010.
[14] A. Bhatt, A. Patel, H. Chheda, and K. Gawande, "Amazon Review Classification and Sentiment Analysis," Int. J. Comput. Sci. Inf. Technol., vol. 6, no. 6, pp. 5107–5110, 2015.
[15] E. Haddi, X. Liu, and Y. Shi, "The Role of Text Pre-processing in Sentiment Analysis," Procedia Comput. Sci., vol. 17, pp. 26–32, 2013, doi: 10.1016/j.procs.2013.05.005.
[16] K. S. Srujan, S. S. Nikhil, H. Raghav Rao, K. Karthik, B. S. Harish, and H. M. Keerthi Kumar, "Classification of Amazon Book Reviews Based on Sentiment Analysis," 2018, pp. 401–411, doi: 10.1007/978-981-10-7512-4_40.
[17] N. Prollochs, S. Feuerriegel, and D. Neumann, "Enhancing Sentiment Analysis of Financial News by Detecting Negation Scopes," in 2015 48th Hawaii International Conference on System Sciences, IEEE, Jan. 2015, pp. 959–968, doi: 10.1109/HICSS.2015.119.
[18] N. Prollochs, S. Feuerriegel, and D. Neumann, "Detecting Negation Scopes for Financial News Sentiment Using Reinforcement Learning," in 2016 49th Hawaii International Conference on System Sciences (HICSS), IEEE, Jan. 2016, pp. 1164–1173, doi: 10.1109/HICSS.2016.147.
[19] L. Rokach, R. Romano, and O. Maimon, "Negation recognition in medical narrative reports," Inf. Retr. Boston., vol. 11, no. 6, pp. 499–538, Dec. 2008, doi: 10.1007/s10791-008-9061-0.
[20] B. Trstenjak, S. Mikac, and D. Donko, "KNN with TF-IDF based Framework for Text Categorization," Procedia Eng., vol. 69, pp. 1356–1364, 2014, doi: 10.1016/j.proeng.2014.03.129.
[21] T. U. Haque, N. N. Saber, and F. M. Shah, "Sentiment analysis on large scale Amazon product reviews," in 2018 IEEE International Conference on Innovative Research and Development (ICIRD), IEEE, May 2018, pp. 1–6, doi: 10.1109/ICIRD.2018.8376299.
[22] B. S. Rintyarna, R. Sarno, and C. Fatichah, "Enhancing the performance of sentiment analysis task on product reviews by handling both local and global context," Int. J. Inf. Decis. Sci., vol. 12, no. 1, p. 75, 2020, doi: 10.1504/IJIDS.2020.104992.
[23] A. Ritter, Mausam, O. Etzioni, and S. Clark, "Open domain event extraction from twitter," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, Aug. 2012, pp. 1104–1112, doi: 10.1145/2339530.2339704.
[24] A. R. Razon and J. A. Barnden, "A New Approach to Automated Text Readability Classification based on Concept Indexing with Integrated Part-of-Speech n-gram Features."
[25] M. Wankhade, A. C. S. Rao, and C. Kulkarni, "A survey on sentiment analysis methods, applications, and challenges," Artif. Intell. Rev., vol. 55, no. 7, pp. 5731–5780, Oct. 2022, doi: 10.1007/s10462-022-10144-1.
[26] A. P. Rodrigues et al., "Real-Time Twitter Spam Detection and Sentiment Analysis using Machine Learning and Deep Learning Techniques," Comput. Intell. Neurosci., vol. 2022, 2022, doi: 10.1155/2022/5211949.