
Emotion Recognition by Textual Tweets Classification Using Voting Classifier (LR-SGD)

This document summarizes a research paper on emotion recognition from textual tweets using machine learning classifiers. The researchers implemented 7 machine learning models to classify tweets as expressing either happy or unhappy emotions. They found that their proposed voting classifier using logistic regression and stochastic gradient descent, along with TF-IDF feature extraction, achieved the most optimal results with 79% accuracy and 81% F1 score. They further validated the robustness of their approach on two additional datasets, one for binary classification and one for multi-class classification of product reviews.


Received November 26, 2020, accepted December 20, 2020, date of publication December 28, 2020, date of current version January 12, 2021.


Digital Object Identifier 10.1109/ACCESS.2020.3047831

Emotion Recognition by Textual Tweets Classification Using Voting Classifier (LR-SGD)

ANAM YOUSAF 1, MUHAMMAD UMER 1,2, SAIMA SADIQ 1, SALEEM ULLAH 1, SEYEDALI MIRJALILI 3,4,5 (Senior Member, IEEE), VAIBHAV RUPAPARA 6, AND MICHELE NAPPI 7 (Senior Member, IEEE)
1 Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan
2 Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
3 Yonsei Frontier Laboratory (YFL), Yonsei University, Seoul 03722, South Korea
4 Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
5 Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Brisbane, QLD 4006, Australia
6 School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
7 Department of Computer Science, University of Salerno, 84084 Fisciano, Italy

Corresponding authors: Vaibhav Rupapara ([email protected]) and Michele Nappi ([email protected])


This work was supported by the Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology,
Rahim Yar Khan, Pakistan.

ABSTRACT The proliferation of user-generated content on social media has made opinion mining an arduous job. As a microblogging platform, Twitter is being used to collect views about products, trends, and politics. Sentiment analysis is a technique used to analyze the attitudes, emotions, and opinions of different people towards a subject, and it can be carried out on tweets to analyze public opinion on news, policies, social movements, and personalities. By employing machine learning models, opinion mining can be performed without reading tweets manually. The results could assist governments and businesses in rolling out policies, products, and events. Seven machine learning models are implemented for emotion recognition by classifying tweets as happy or unhappy. An in-depth comparative performance analysis shows that the proposed voting classifier (LR-SGD) with TF-IDF produces the best result, with 79% accuracy and an 81% F1 score. To further validate the stability of the proposed approach, it was applied to two more datasets, one binary and one multi-class, and achieved robust results.

INDEX TERMS Sentiment analysis, text classification, machine learning, opinion mining, emotion recognition, artificial intelligence.

I. INTRODUCTION

Automatic emotion recognition, pattern recognition, and computer vision have recently become significantly important in Artificial Intelligence, with applications in a wide range of areas. Recently, social media platforms such as Twitter have generated enormous amounts of structured, unstructured, and semi-structured data. One of the most recent examples is the COVID-19 infodemic, which shows that misinformation on social media can be far more important and devastating than a disaster such as a pandemic.

There is a need to analyze such data and accurately assign sentiment classes on a large scale. To perform such tasks, accurate NLP techniques and machine learning (ML) models for text classification are required. Twitter provides an opportunity to its users to analyze its data from a large and broad point of view. Efficient methods are important to automatically label text data due to its noisy nature. In the past, many studies have been performed on Twitter sentiment classification [1]. Twitter is a fast and efficient micro-blogging service that lets end users transmit short posts, called tweets. Twitter is a highly demanded app worldwide and a successful social media platform.

A free account can be created on Twitter, which can provide an enormous potential audience. For business and marketing, Twitter can prove to be the best platform: through it one can get in touch with rich and famous personalities such as stars and celebrities, so their purchases can be very attractive for them as well as for advertisers. Using Twitter, every celebrity is linked with fans and can communicate with followers. Such a platform is one of the best channels for fans as well. However, it has a short message length, only 140 characters per post, and one can post a message or a link on the website since it has

The associate editor coordinating the review of this manuscript and approving it for publication was F. K. Wang.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
6286 VOLUME 9, 2021
A. Yousaf et al.: Emotion Recognition by Textual Tweets Classification Using VC (LR-SGD)

no cost and is also open to advertisements. There is no problem with clusters of personal ads like those on other social networking sites. It is quick: as soon as a tweet is posted on Twitter, the public following the respective business receives it without delay.

Companies and advertisers can make use of this source to check diverse operational points of view, which are very considerable. With its help, they obtain an immediate response from their followers. Remarkably, many businesses increase their sales through Twitter followers who intend to purchase. Twitter helps followers learn about new businesses, products, services, websites, blogs, eBooks, etc. Consequently, Twitter users might click on a link and invest in a product, or examine the products presented, and get a share in the profit. It is extremely easy to use: people can follow accounts to get news and updates, and organizations can tweet or re-tweet, mark favorites, select people to send tweets to, promote their posts, and invest their money and time through it. Academia, industry, and major sports and entertainment events such as the Super Bowl and the Grammy Awards generate a lot of buzz worldwide by using it.

Competition is rising among different products on Twitter. People love to express their feelings about a particular product on social networks like Twitter. Product owners are ready to spend more money on social media platforms to better advertise their products and to generate more revenue. When a person shares an experience about a product, it helps the owner to change their market strategy and selling schemes and to improve quality. Customer reviews serve as feedback to the owners or manufacturers too. The data generated in this way is large and requires an expert analysis team to classify the customer sentiment from the reviews. Experts can make human errors in sentiment analysis; therefore, machine learning and ensemble learning classifiers are required to accurately classify the sentiment of the customers.

This study compares various machine learning models for emotion recognition by tweet classification using TF and TF-IDF. This research presents a voting classifier (LR-SGD) and aims to estimate the performance of well-known ML classifiers on Twitter datasets. The key contributions are as follows:
• Machine learning-based classifiers including Support Vector Machine (SVM), Decision Tree Classifier (DTC), Naive Bayes (NB), Random Forest (RF), Gradient Boosting Machine (GBM), and Logistic Regression (LR), trained on a Twitter dataset, are compared for emotion recognition.
• A voting classifier (VC) that combines LR and SGD is designed to classify tweets; it outperforms the other models when using TF-IDF.
• The stability of the proposed model is further validated by applying it to two different datasets, one binary dataset (containing hatred or non-hatred classes) and one multi-class dataset (containing product reviews having 1 to 5 ratings).

The rest of the paper is organized as follows. Section II discusses literature related to the current research work. Section III presents the proposed methodology as well as a detailed description of the tweet dataset used in the experiment. Results are presented in Section IV and the stability of the proposed model is given in Section V. Section VI finally concludes the research work and also suggests future work.

II. RELATED WORK

Sentiment analysis helps corporations determine customers' preferences about products, services, and brands. Further, it plays an important role in interpreting information about industries and corporations, supporting them in reviewing entities. Sarlan et al. [2] developed a sentiment analysis by extracting a number of tweets with the help of prototyping; the results classified customers' views expressed in tweets as positive or negative. Their research is divided into two phases. The first is a literature study of the sentiment analysis techniques and methods in use today. In the second, the application requirements and functions are described prior to its development.

In another study, Alsaeedi and Zubair Khan [3] analyzed various kinds of sentiment analysis applied to Twitter datasets and their conclusions. The different approaches and the reported algorithm performance were compared. The methods considered were Twitter sentiment analysis using supervised ML approaches, ensemble approaches, and lexicon-based approaches.

Lexicon-based approaches have been explored by many researchers for emotion classification. Bandhakavi et al. [4] performed emotion-based feature extraction using domain-specific lexicon generation. They captured the association of words and emotions using a unigram mixture model. They used weakly labelled tweets to classify emotions. Their proposed architecture outperformed other state-of-the-art approaches such as Latent Dirichlet Allocation and Pointwise Mutual Information. Event-related tweets have been identified by researchers using geo-related tweets [5]. They used specific tweets about local festivities in one year. They also identified different parameters that helped in event discovery. Alsinet et al. [6] analyzed tweets from political domains. They claimed that accepted tweets are stronger than rejected tweets. Rumor detection in tweets has been performed by using an encoder to analyze human behavior in comments [7].

Hakh et al. [8] used the SMOTE method to handle class imbalance in a Twitter dataset. In addition, they applied different feature selection methods to speed up sentiment analysis. The proposed methodology was evaluated on the dataset and achieved favorable results on all evaluation metrics used. Pre-processing


steps were applied to their dataset, after which TF-IDF features were used to measure the importance weights of terms. Then classification methods were applied (i.e., AdaBoost, Linear SVM, Kernel SVM, Random Forest, Decision Tree, Naïve Bayes, and K-NN), and finally accuracy and F1-score were used to compare the classifiers' effectiveness.

In [9], Xia et al. conducted a comparative study of the effectiveness of ensemble methods for sentiment classification. They used two types of feature sets in the context of sentiment analysis: one based on part-of-speech and one based on word relations. Next, they used three well-known text classification algorithms: maximum entropy, support vector machines, and naive Bayes. Finally, they applied three ensemble strategies: fixed combination, meta-classifier combination, and weighted combination. They used five document-level datasets widely used in the field of sentiment classification. Their experiments show that ensemble techniques are more effective than the individual classifiers, which is also shown in our research: an ensemble of two classifiers, logistic regression and stochastic gradient descent, gives better results than the other classifiers.

Deep learning has been utilized by many researchers for image classification [10] and tweet classification [11]. Rustam et al. [12] presented tweet classification for US airline companies' sentiments. The researchers applied pre-processing to the dataset. The influence of feature extraction methods, including TF, TF-IDF, and word2vec, on classification accuracy was examined. In addition, the performance of long short-term memory (LSTM) was studied on the dataset. The paper proposes a Voting Classifier (VC) to support sentiment analysis for such companies. The Voting Classifier is based on logistic regression (LR) and a Stochastic Gradient Descent classifier (SGDC) and uses a simple ensemble method to reach the final result. Various ML classifiers were tested using accuracy, precision, recall, and F1-score as performance metrics. The results indicate that the proposed VC performs better than the individual classifiers. The experiments also showed that the performance of machine learning classifiers improved when TF-IDF was used as the feature input.

Santos and Bayser [13] examined sentiment analysis of short texts. The researchers propose a new deep convolutional neural network that uses character-level to sentence-level information to perform sentiment analysis of short texts. Mohamed [14] evaluated a sentiment analysis for mining halal food consumers' opinions. That study fills a gap by examining a random sample of 100,000 tweets dealing with halal food. To conduct the analysis, an expert-predefined lexicon of seed descriptors was used. By exploring halal food sentiments expressed on social media, the study adds breadth and depth to the discussion of such an underrepresented area. Descriptive analysis identified mostly positive sentiment toward halal food, while geo-located Twitter maps indicated that the "religious diaspora" widely uses digital posts to communicate about halal food.

Parveen and Pandey [15] studied sentiment analysis on a Twitter dataset using the NB algorithm. The analysts used the Hadoop framework to process a movie dataset available on Twitter in the form of reviews, feedback, and opinions. Sentiment analysis on the Twitter data is performed in three classes: positive, negative, and neutral. Alomari et al. [16] analyzed SVM using TF-IDF. The study presented an Arabic Jordanian Twitter corpus in which tweets are annotated as either positive or negative. It investigated different supervised machine learning sentiment analysis classifiers applied to Arabic users' social media posts on general subjects written in either Modern Standard Arabic (MSA) or Jordanian dialect. Experiments were conducted to evaluate the use of various weighting schemes, stemming, and N-gram techniques and scenarios.

Gamal et al. [17] built a Twitter benchmark dataset for Arabic sentiment analysis. A benchmark Arabic dataset for sentiment analysis is proposed, together with the procedure for gathering recent tweets in various Arabic dialects. The dataset includes more than 151,000 distinct opinions labeled with two classes, negative and positive. ML algorithms are applied for sentiment classification (SC). Sentiment analysis is ordinarily carried out using one of the fundamental approaches, ML-based or lexicon-based. The algorithms applied for SC on the dataset achieved 99.90% precision using TF-IDF.

Kumar and Garg [18] explored the sentiment analysis of multimodal Twitter data. The study used a multimodal sentiment analysis approach to determine the sentiment polarity label of an incoming tweet, which may be textual, image-based, or info-graphic. Image sentiment scoring was accomplished using SentiBank and SentiStrength scoring for Regions with convolutional neural network (R-CNN). For an image posted on Twitter, the image module is executed, which uses an existing SentiBank module along with R-CNN to determine the sentiment label of the image. After pre-processing, the text module uses an ML-based ensemble method, gradient boosting, to classify tweets into polarity categories, namely positive, negative, or neutral. A high accuracy of 91.32% is observed on the random multimodal tweet dataset used to evaluate the proposed model. Sailunaz [19] investigated emotion and sentiment analysis from Twitter texts. The objective of that work was to detect and analyze the sentiment and emotion expressed by people in the text of their Twitter posts and to use them to generate recommendations.

The dataset is used to detect sentiment and emotion from tweets and their replies and to measure the influence scores of users based on various tweet-based and user-based parameters. The strategy used in that work includes


several new approaches: (I) including replies to tweets in the dataset and measurements, (II) introducing the agreement score, sentiment score, and emotion score of replies in the influence score calculation, (III) generating personalized and general recommendations consisting of a list of users who agreed on the same topic and expressed similar emotions.

III. PROPOSED METHODOLOGY

In this research, different ML techniques have been used to meet the study's objectives. A variety of experiments were carried out using different methods and techniques. Multiple classifiers were applied to the dataset, and the voting classifier, an ensemble of Logistic Regression and Stochastic Gradient Descent, outperforms all other ML models in terms of accuracy, recall, precision, and F1-score.

The Twitter dataset used in this experiment is taken from the Kaggle repository. First, the dataset is pre-processed by removing unwanted data. Then, the data is split into two sets: a training set and a testing set. The training set is 70% of the data, while the test set is 30%. After that, feature engineering techniques are applied on the training set. Multiple machine learning classifiers are trained on the training set and tested using the test set. The evaluation parameters used in this experiment are: (a) Accuracy (b) Recall (c) Precision (d) F1-score.

A. DATASET

The dataset contains many tweets of contrasting sentiment. It is called "Sentiment Analysis on Twitter data" and contains 99989 records. Every record is labeled as happy or unhappy according to its sentiment polarity, using the symbols 1 and 0. Only tweets in English are included in the final dataset. The dataset contains different features. Table 1 lists the features with a description of each.

TABLE 1. Dataset specifications.

B. DATA VISUALIZATION

Data visualization helps to understand the hidden patterns lying inside the dataset. It helps to qualitatively get more details about the dataset by visualizing the characteristics of the attributes. Figure 1 shows the ratio of the two target classes, happy and unhappy, and illustrates that the happy class has a larger share than the unhappy class.

Figure 1 shows the percentage of each class: 56.5% of the tweets are happy tweets and 43.5% are unhappy tweets.

FIGURE 1. Countplot showing class-wise data distribution.

1) DATA PRE-PROCESSING

Datasets contain unnecessary data in raw form that can be unstructured or semi-structured. Such unnecessary data increases the training time of the model and might degrade its performance. Pre-processing plays a vital role in improving the efficiency of ML models and saving computational resources. Text pre-processing boosts the prediction accuracy of the model [20]. The following steps are performed in pre-processing: tokenization, case conversion, stopword removal, and removal of numbers.

2) FEATURE EXTRACTION

After the data pre-processing step, the next essential step is the choice of features on the refined dataset. Supervised machine learning classifiers require textual data in vector form to be trained on it. In this work, the textual features are converted into vector form using the TF and TF-IDF techniques [21]–[23]. Feature extraction techniques not only convert textual features into vector form but also help to find the significant features necessary to make predictions. In most cases, not all features contribute to the prediction of the target class. That is the reason feature extraction is an important part of the recognition of happy and unhappy tweets.

Term Frequency (TF) measures how often a term occurs within a document. Because documents differ in length, a term may appear many more times in long documents than in short ones. As a form of normalization, the term frequency is therefore divided by the document length (the total number of terms in the document):

$$TF(t) = \frac{\text{No. of times term } t \text{ appears in a document}}{\text{Total no. of terms in the document}} \quad (1)$$

Inverse Document Frequency (IDF) measures how significant a term is within the text. Every term is treated as equally important when TF is computed. However, certain terms, like "is", "of", and "that", can appear many times yet carry little importance. Therefore, frequent terms need to be weighed down while rare ones are scaled up, by calculating the following:

$$IDF(t) = \log_e\left(\frac{\text{Total no. of documents}}{\text{No. of documents with term } t \text{ in it}}\right) \quad (2)$$

Term frequency (TF) is used in information retrieval and shows how frequently an expression (term, word) occurs in a document.


C. PROPOSED MODELS FOR TWEETS SENTIMENT CLASSIFICATION

In this section, the classifiers utilized for tweet classification are discussed. Figure 2 shows the proposed methodology and the work flow of this research. This work utilized seven supervised machine learning algorithms: Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), Gradient Boosting Machine (GBM), Logistic Regression (LR), and a Voting Classifier (Logistic Regression + Stochastic Gradient Descent classifier).

FIGURE 2. Proposed methodology architecture diagram.

1) RANDOM FOREST

RF is a tree-based classifier in which trees are generated randomly from the input vector. RF uses random features to create multiple decision trees, which together make a forest. The class labels of the test data are then predicted by aggregating the votes of all trees. Higher weights are assigned to the decision trees with low error. Overall prediction accuracy is improved by considering trees with a low error rate.

2) SUPPORT VECTOR MACHINE

The support vector machine (SVM) is known to perform well in sentiment analysis [24]. SVM defines decision boundaries and analyzes the data obtained within the input space [25]. The sets of vectors for every dimension hold the crucial details. The data (represented as vectors) are arranged into classes to achieve this. Next, a boundary is determined between the two classes such that it is far from any point in the training samples [26]. In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis [27].

3) NAIVE BAYES

Naive Bayes (NB) is a classification approach based on Bayes' theorem with strong (naive) independence assumptions among features. The NB classifier assumes that the presence of a particular feature of a class is unrelated to the presence of other features. For instance, a fruit is probably considered an apple if its color is dark red, its shape is round, and it is roughly 3 inches in diameter. In machine learning, Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with naive independence assumptions between the features. They are considered among the simplest Bayesian network models.

D. DECISION TREE

The DT algorithm belongs to the category of supervised ML and is widely used in regression and classification tasks. Its main challenge is selecting the root node at each level of the tree, which is called attribute selection [28]. The Gini index and information gain are the most commonly used methods for attribute selection. In this study, the Gini index is used to select the root node; it is calculated by subtracting the sum of the squared class probabilities from 1.

1) GRADIENT BOOSTING MACHINE

GBM is an ML-based boosting model that is widely used for regression and classification tasks. It builds a model as an ensemble of weak prediction models, commonly decision trees [29], [30]. In boosting, weak learners are converted into strong learners. Each newly generated tree is a modified form of the previous one and uses the gradient of the loss function. The loss measures how well the model coefficients fit the underlying data. The loss function is thus used for model optimization.
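Returning to the attribute selection step of the decision tree, the Gini index can be sketched in a few lines of Python. This is an illustrative example, not the authors' code: the function names `gini_impurity` and `weighted_gini` and the toy labels are assumptions, and the sketch uses the standard definition of the Gini index as one minus the sum of squared class proportions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a set of class labels: 1 - sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def weighted_gini(left_labels, right_labels):
    """Gini index of a binary split, weighted by the size of each branch."""
    total = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / total * gini_impurity(left_labels)
        + len(right_labels) / total * gini_impurity(right_labels)
    )


# Toy labels (1 = happy, 0 = unhappy), made up for this example.
parent = [1, 1, 1, 0, 0, 0]
split_a = ([1, 1, 1], [0, 0, 0])  # perfectly separates the classes
split_b = ([1, 1, 0], [1, 0, 0])  # leaves both branches mixed

print(gini_impurity(parent))
print(weighted_gini(*split_a))
print(weighted_gini(*split_b))
```

A tree builder would prefer the attribute whose split gives the lowest weighted Gini index, which in this toy case is `split_a`.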


2) LOGISTIC REGRESSION
In LR, class probabilities are estimated from the model output: the model predicts that the input belongs to class X with probability x and to class Y with probability y. If x is greater than y, the predicted output class is X, otherwise Y. In statistics, the logistic model is used to model the probability of a certain class or event, e.g., top/bottom, white/black, up/down, positive/negative, or happy/unhappy. This can be extended to model several classes of events, for example, to decide whether an image contains a snake, dog, deer, etc.: every object detected in the image would be assigned a probability between 0 and 1, with the probabilities summing to one [31].
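The decision rule described above can be illustrated with a short Python sketch. It assumes the usual logistic (sigmoid) function for the binary case and a softmax for the multi-class extension; the weights, bias, and feature values are hypothetical and chosen only to exercise the functions, and `predict_binary` is a name introduced here for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_binary(weights, bias, features):
    """Return (label, P(happy)) for one feature vector under a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    p_happy = sigmoid(score)
    # Predict the class with the larger probability, as described in the text.
    return ("happy" if p_happy > 1.0 - p_happy else "unhappy"), p_happy


def softmax(scores):
    """Multi-class extension: probabilities in (0, 1) that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights and features, not taken from the paper.
label, p = predict_binary(weights=[1.5, -2.0], bias=0.1, features=[0.8, 0.2])
print(label, round(p, 3))

probs = softmax([2.0, 1.0, 0.1])
print([round(v, 3) for v in probs], round(sum(probs), 6))
```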

3) STOCHASTIC GRADIENT DESCENT

Stochastic Gradient Descent (SGD) is a type of gradient descent. SGD is an iterative method for optimizing an objective function with suitable smoothness properties (for example, differentiable or sub-differentiable) [32]. It measures the degree of change of a variable in response to changes in other variables. It can be viewed as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the whole dataset) by an estimate thereof (calculated from a randomly chosen subset of the data) [33].

4) VOTING CLASSIFIER

The Voting Classifier (VC) is an ensemble learning method which engages multiple individual classifiers and combines their predictions, and which can attain better performance than a single classifier [34]. It has been shown that a combination of multiple classifiers can be more effective than any individual one [35]. The VC is a meta-classifier for combining similar or conceptually different ML classifiers for classification through majority voting. It implements "hard" and "soft" voting. In hard voting, the final class label is the label that has been predicted most often by the classification models. In soft voting, the class labels are predicted by averaging the class probabilities [36].

Researchers are increasingly interested in ensemble learning because it gives better results [37]. This research builds a voting classifier by merging two classifiers, VC (LR-SGD), and the best results are achieved with it. A voting classifier with multiple parameters is used: it takes the two individual classifiers, LR and SGD, and an additional parameter "voting" set to "soft". SGD is used to handle problems such as redundancy in the dataset and large data. It performs classification using a penalty and a loss function [38]. It is similar to gradient descent but looks at one sample at each step [39]. LR, on the other hand, calculates the posterior probability p(Ct|v) by applying the sigmoid function to the input for binary classification [40]. The VC can be expressed as:

$$\hat{p} = \operatorname{argmax}\left\{ \sum_{i}^{n} LR_i, \; \sum_{i}^{n} SGD_i \right\}. \quad (3)$$

Here $\sum_{i}^{n} LR_i$ and $\sum_{i}^{n} SGD_i$ both give prediction probabilities for each test example. After that, the probabilities produced for each test example by LR and SGD pass through the soft voting criterion, as shown in Figure 3.

FIGURE 3. Proposed voting classifier architecture (LR-SGD).

The functionality of the VC can be explained with an example. When a given sample passes through LR and SGD, a probability score is assigned to each class (positive or negative). Let LR's probability scores be 0.966 and 0.024 for the ProbLR-Pos and ProbLR-Neg classes, and SGD's probability scores be 0.997 and 0.002 for ProbSGD-Pos and ProbSGD-Neg, respectively. Then the average probability for the two classes can be calculated as

Avg-Pos = (0.966 + 0.997)/2 = 0.9815
Avg-Neg = (0.024 + 0.002)/2 = 0.013

The final prediction is MaxProb(Avg-Pos, Avg-Neg). In this example the answer is the positive class. The predicted class is positive, and the actual class in the dataset is also positive.

The proposed VC combines the predicted probabilities of both classifiers to make the final decision. MLR and MSGD are trained on the dataset and then predict the probability for both classes separately. An average probability is calculated for each class from the probabilities predicted by the two classifiers. The decision function then decides the final class of the sample based on the maximum average probability for a class. The working mechanism of LR-SGD is presented in Algorithm 1.

E. EVALUATION METRICS

ML models are evaluated on many commonly used performance indicators such as accuracy, recall, precision

VOLUME 9, 2021 6291


A. Yousaf et al.: Emotion Recognition by Textual Tweets Classification Using VC (LR-SGD)

TABLE 2. Classification result of all machine learning models using TF features.

Algorithm 1 Ensembling of Logistic Regression and Stochastic Gradient Descent (LR-SGD)

Input: input data $(x_i, y_i)_{i=1}^{N}$
MLR = Trained_LR
MSGD = Trained_SGD

1: for i = 1 to M do
2:   if MLR ≠ 0 & MSGD ≠ 0 & training_set ≠ 0 then
3:     ProbSGD-Pos = MSGD.probability(Pos-class)
4:     ProbSGD-Neg = MSGD.probability(Neg-class)
5:     ProbLR-Pos = MLR.probability(Pos-class)
6:     ProbLR-Neg = MLR.probability(Neg-class)
7:     Decision function = $\max\left(\frac{1}{N_{classifier}} \sum_{classifier} \left(Avg(ProbSGD\text{-}Pos, ProbLR\text{-}Pos),\ Avg(ProbSGD\text{-}Neg, ProbLR\text{-}Neg)\right)\right)$
8:   end if
9:   Return final label $\hat{p}$
10: end for
and F1-score in classification tasks. Accuracy measures prediction correctness and is measured as:

$$\text{Accuracy} = \frac{\text{Number of correctly classified predictions}}{\text{Total predictions}} \quad (4)$$

while in the case of binary classification, accuracy is measured as:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (5)$$

where TP is true positive, FP is false positive, TN is true negative, and FN is false negative, defined as follows [10]:
TP: positive instances that are correctly predicted as positive.
FP: negative instances that are incorrectly predicted as positive.
TN: negative instances that are correctly predicted as negative.
FN: positive instances that are incorrectly predicted as negative.
Precision measures the exactness of a classifier and determines the percentage of tuples labeled positive that are actually positive. It can be measured as:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (6)$$

Recall, on the other hand, measures completeness and gives the percentage of actual positive tuples that are correctly labelled. Recall can be measured as:

$$\text{Recall} = \frac{TP}{TP + FN} \quad (7)$$

For an imbalanced dataset, accuracy alone cannot be a good evaluation measure. The F1-score, which is the harmonic mean of recall and precision, can help in such cases. It computes a score between 0 and 1 by considering both the precision and the recall of the model [41]. The F1-score can be computed as:

$$F1\text{-score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \quad (8)$$

IV. RESULTS AND DISCUSSION
This section provides the details of the experiments conducted in this research and a discussion of the obtained results. Classification algorithms are tested using TF and TF-IDF features. The Voting Classifier, as an ensemble of Stochastic Gradient Descent and Logistic Regression, gives the highest accuracy. Table 2 presents the accuracy, recall, precision and F1-score of classification with TF features.
Figure 4 presents the results of all the classifiers and a comparison between them using the TF features. It can be seen that the Voting Classifier is the best among all classifiers, with an accuracy of 78%.
A Voting Classifier displays its best outcome when it works with Stochastic Gradient Descent and Logistic Regression, and provides maximum accuracy.

FIGURE 4. Classification result comparison of all machine learning models using TF features.

TABLE 3. Classification result of all machine learning models using TF-IDF features.

Table 3 shows the accuracy, recall, precision and F1-score of classification with the TF-IDF technique. The Voting Classifier achieved the highest accuracy value with 79% and LR achieved 78%. LR achieved the highest precision value with


79% and the proposed model achieved 78%. The proposed model achieved the highest recall and F1-score, with 84% and 81% respectively. LR individually shows reasonable results with 80% recall and 80% F1-score.
Figure 5 shows the results of all the classifiers and a comparison between them using TF-IDF features. It can be seen clearly that the proposed voting classifier performs best in terms of accuracy, recall and F1-score among all classifiers.

FIGURE 5. Classification result comparison of all machine learning models using TF-IDF features.

V. STABILITY OF THE PROPOSED MODEL
Different experiments are performed on the proposed approach to verify its stability under different types of datasets. The second dataset contains Tweets of 20 garment products, such as Dresses, Pants, Sweaters, Knits and Blouses. The ratings assigned by users range from 1 to 5, as shown in Figure 6. Dataset 3 consists of Tweets that contain supportive and hostile reviews, which are to be classified as non-hatred and hatred. The details of both datasets are presented in Table 4 and Table 5.

FIGURE 6. Ratings assigned by customers.

TABLE 4. Details of datasets used to check proposed model stability.

TABLE 5. List of features of dataset with their description.

The proposed model, which is an ensemble of LR and SGD, is applied to both datasets and the results are shown in Table 7. The results reveal that the proposed model outperforms the other classifiers on both the binary and the multi-class classification datasets. The complete classification report of all classifiers is shown in Table 6.
As can be observed from the above results, the traditional machine learning models did not perform well on all three datasets. The proposed Voting Classifier ensemble outperforms all other traditional models. Regarding the poor performance of RF, especially on dataset 2: RF is an ensemble technique composed of multiple trees, which helps it deal with outliers and noise. But for a large dataset it is difficult to grasp the relationships in the input data [42]. RF works on bootstrap samples, and if the samples are not fully representative, predictions can be inaccurate.
GBM converts weak learners into strong learners and is sensitive to noise and outliers. If the weak learners are trained on noisy data, this can cause overfitting. GBM shows results similar to RF on the Twitter dataset but performs better on Dataset-2 and Dataset-3.
NB works on the assumption that features are independent of one another, which is rarely correct. Features commonly

TABLE 6. Classification report of both datasets.
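The entries of a classification report such as Table 6 follow from Eqs. (5)-(8). The snippet below is a minimal sketch of that arithmetic in plain Python, assuming a binary task and known confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical; they are not results from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total                        # Eq. (5)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # Eq. (6)
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # Eq. (7)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # Eq. (8)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Returning 0.0 when a denominator is zero is a design choice of this sketch; other conventions (for example, raising an error or returning NaN) are equally defensible.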


TABLE 7. Accuracy of classifiers with TF-IDF.

depend on each other, and that is the major reason for the low performance of NB on datasets with diverse features. NB performed better than the tree-based models (RF and GBM) on the Twitter dataset but worse on dataset-2 and dataset-3.
SVM works by separating classes with the help of a hyperplane, and shows good results on binary classification problems. It separates class labels by constructing hyperplanes between classes, but for multiclass problems SVM is mostly not able to separate the data. SVM performed better than most of the traditional ML models, such as RF, GBM and NB, on all datasets.
To overcome the deficiencies of individual ML models, this study utilized a combination of ML models as a voting classifier. It can be seen clearly in Tables 3, 6 and 7 that the proposed VC(LR-SGD) outperformed the traditional ML models on all datasets.

VI. CONCLUSION AND FUTURE WORK
This paper proposed a novel combination of LR and SGD as a voting classifier for emotion recognition by classifying tweets as happy or unhappy. Our experiments showed that one can improve the performance of models by recognizing patterns efficiently and through an effective averaging combination of models. Experiments are conducted to test seven machine learning models: (1) SVM, (2) RF, (3) GBM, (4) LR, (5) DT, (6) NB and (7) VC(LR-SGD). This study also employed two feature representation techniques, TF and TF-IDF. The results showed that all models performed well on the tweet dataset, but the proposed voting classifier VC(LR-SGD) outperforms all others with both TF and TF-IDF. The proposed model achieves its highest results using TF-IDF, with 79% accuracy, 84% recall and 81% F1-score. The proposed model is further validated on two more datasets and achieved robust results. Future work will compare more feature engineering techniques and explore more combinations of ensemble models to improve the performance. In addition, new techniques will be investigated to deal with sarcastic comments.

REFERENCES
[1] N. F. F. da Silva, E. R. Hruschka, and E. R. Hruschka, ‘‘Tweet sentiment analysis with classifier ensembles,’’ Decis. Support Syst., vol. 66, pp. 170–179, Oct. 2014.
[2] C. Kariya and P. Khodke, ‘‘Twitter sentiment analysis,’’ in Proc. Int. Conf. Emerg. Technol. (INCET), Jun. 2020, pp. 212–216.
[3] A. Alsaeedi and M. Zubair, ‘‘A study on sentiment analysis techniques of Twitter data,’’ Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 2, pp. 361–374, 2019.
[4] A. Bandhakavi, N. Wiratunga, D. Padmanabhan, and S. Massie, ‘‘Lexicon based feature extraction for emotion text classification,’’ Pattern Recognit. Lett., vol. 93, pp. 133–142, Jul. 2017.
[5] J. Capdevila, J. Cerquides, J. Nin, and J. Torres, ‘‘Tweet-SCAN: An event discovery technique for geo-located tweets,’’ Pattern Recognit. Lett., vol. 93, pp. 58–68, Jul. 2017.
[6] T. Alsinet, J. Argelich, R. Béjar, C. Fernández, C. Mateu, and J. Planes, ‘‘An argumentative approach for discovering relevant opinions in Twitter with probabilistic valued relationships,’’ Pattern Recognit. Lett., vol. 105, pp. 191–199, Apr. 2018.
[7] W. Chen, Y. Zhang, C. K. Yeo, C. T. Lau, and B. S. Lee, ‘‘Unsupervised rumor detection based on users’ behaviors using neural networks,’’ Pattern Recognit. Lett., vol. 105, pp. 226–233, Apr. 2018.
[8] H. Hakh, I. Aljarah, and B. Al-Shboul, ‘‘Online social media-based sentiment analysis for us airline companies,’’ in New Trends in Information Technology. Amman, Jordan: Univ. of Jordan, Apr. 2017.
[9] R. Xia, C. Zong, and S. Li, ‘‘Ensemble of feature sets and classification algorithms for sentiment classification,’’ Inf. Sci., vol. 181, no. 6, pp. 1138–1152, Mar. 2011.
[10] M. Umer, S. Sadiq, M. Ahmad, S. Ullah, G. S. Choi, and A. Mehmood, ‘‘A novel stacked CNN for malarial parasite detection in thin blood smear images,’’ IEEE Access, vol. 8, pp. 93782–93792, 2020.
[11] S. Sadiq, A. Mehmood, S. Ullah, M. Ahmad, G. S. Choi, and B.-W. On, ‘‘Aggression detection through deep neural model on Twitter,’’ Future Gener. Comput. Syst., vol. 114, pp. 120–129, Jan. 2021.
[12] F. Rustam, I. Ashraf, A. Mehmood, S. Ullah, and G. Choi, ‘‘Tweets classification on the base of sentiments for US airline companies,’’ Entropy, vol. 21, no. 11, p. 1078, Nov. 2019.
[13] C. D. Santos and M. G. D. Bayser, ‘‘Deep convolutional neural networks for sentiment analysis of short texts,’’ in Proc. 25th Int. Conf. Comput. Linguistics, Aug. 2014, pp. 69–78.
[14] M. Mohamed, ‘‘Mining and mapping halal food consumers: A geo-located Twitter opinion polarity analysis,’’ J. Food Products Marketing, vol. 24, pp. 1–22, Dec. 2017.
[15] H. Parveen and S. Pandey, ‘‘Sentiment analysis on Twitter data-set using naive Bayes algorithm,’’ in Proc. Int. Conf. Appl. Theor. Comput. Commun. Technol., Jan. 2016, pp. 416–419.
[16] K. M. Alomari, H. M. Elsherif, and K. Shaalan, ‘‘Arabic tweets sentimental analysis using machine learning,’’ in Proc. Int. Conf. Ind., Eng. Appl. Appl. Intell. Syst., Jun. 2017, pp. 602–610.
[17] D. Gamal, M. Alfonse, E.-S. M. El-Horbaty, and A.-B. M. Salem, ‘‘Twitter benchmark dataset for arabic sentiment analysis,’’ Int. J. Modern Edu. Comput. Sci., vol. 11, no. 1, pp. 33–38, Jan. 2019.
[18] A. Kumar and G. Garg, ‘‘Sentiment analysis of multimodal Twitter data,’’ Multimedia Tools Appl., vol. 78, no. 17, pp. 24103–24119, Sep. 2019.
[19] K. Sailunaz and R. Alhajj, ‘‘Emotion and sentiment analysis from Twitter text,’’ J. Comput. Sci., vol. 36, Sep. 2019, Art. no. 101003.
[20] V. Kalra and R. Aggarwal, ‘‘Importance of text data preprocessing & implementation in RapidMiner,’’ in Proc. 1st Int. Conf. Inf. Technol. Knowl. Manage., vol. 14, Jan. 2018, pp. 71–75.
[21] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, ‘‘Short text classification in Twitter to improve information filtering,’’ in Proc. 33rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), 2010, pp. 841–842.
[22] Scikit Learn. Scikit-Learn Feature Extraction With Countvectorizer. Accessed: Apr. 5, 2019. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.Count/
[23] Scikit Learn. Scikit-Learn Feature Extraction With TF/IDF. Accessed: Apr. 5, 2019. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.Tfidf/
[24] P. Routray, C. K. Swain, and S. P. Mishra, ‘‘A survey on sentiment analysis,’’ Int. J. Comput. Appl., vol. 76, no. 10, pp. 1–8, Aug. 2013.
[25] A. Harb, M. Plantié, G. Dray, M. Roche, F. Trousset, and P. Poncelet, ‘‘Web opinion mining: How to extract opinions from blogs?’’ in Proc. 5th Int. Conf. Soft Comput. Transdisciplinary Sci. Technol. (CSTST). New York, NY, USA: Association for Computing Machinery, 2008, pp. 211–217.
[26] B. Pang, L. Lee, and S. Vaithyanathan, ‘‘Thumbs up? Sentiment classification using machine learning techniques,’’ EMNLP, vol. 10, pp. 1–9, Jun. 2002.
[27] K. P. Bennett and C. Campbell, ‘‘Support vector machines: hype or hallelujah?’’ Acm Sigkdd Explor. Newslett., vol. 2, no. 2, pp. 1–13, 2000.
[28] D. J. Hand and N. M. Adams, ‘‘Data mining,’’ in Wiley StatsRef: Statistics Reference Online. Hoboken, NJ, USA: Wiley, 2014, pp. 1–7.
[29] A. Natekin and A. Knoll, ‘‘Gradient boosting machines, a tutorial,’’ Frontiers Neurorobotics, vol. 7, p. 21, Dec. 2013.
[30] J. Friedman, ‘‘Greedy function approximation: A gradient boosting machine,’’ Ann. Statist., vol. 29, pp. 1189–1232, Nov. 2000.


[31] D. W. H. Jr, S. Lemeshow, and R. X. Sturdivant, Applied Logistic Regression, vol. 398. Hoboken, NJ, USA: Wiley, 2013.
[32] R. Johnson and T. Zhang, ‘‘Accelerating stochastic gradient descent using predictive variance reduction,’’ in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 315–323.
[33] L. Bottou, ‘‘Large-scale machine learning with stochastic gradient descent,’’ in Proc. COMPSTAT. Berlin, Germany: Springer, 2010, pp. 177–186.
[34] Y. Zhang, H. Zhang, J. Cai, and B. Yang, ‘‘A weighted voting classifier based on differential evolution,’’ Abstract Appl. Anal., vol. 2014, pp. 1–6, May 2014.
[35] A. M. Arbib, The Handbook of Brain Theory and Neural Networks, 2nd ed. Cambridge, MA, USA: MIT Press, 2002.
[36] M. Khalid, I. Ashraf, A. Mehmood, S. Ullah, M. Ahmad, and G. S. Choi, ‘‘GBSVM: Sentiment classification from unstructured reviews using ensemble classifier,’’ Appl. Sci., vol. 10, no. 8, p. 2788, Apr. 2020.
[37] Z. S. Li and A. Jain, Encyclopedia of Biometrics. Berlin, Germany: Springer, 2015.
[38] J. H. Friedman, ‘‘Greedy function approximation: A gradient boosting machine,’’ Ann. Statist., vol. 29, pp. 1189–1232, Oct. 2001.
[39] J. Silva, I. Praça, T. Pinto, and Z. Vale, ‘‘Energy consumption forecasting using ensemble learning algorithms,’’ in Proc. Int. Symp. Distrib. Comput. Artif. Intell. Cham, Switzerland: Springer, 2019, pp. 5–13.
[40] M. Vicente, F. Batista, and J. P. Carvalho, ‘‘Gender detection of Twitter users based on multiple information sources,’’ in Interactions Between Computational Intelligence and Mathematics Part 2. Cham, Switzerland: Springer, 2019, pp. 39–54.
[41] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W. On, ‘‘Fake news stance detection using deep learning architecture (CNN-LSTM),’’ IEEE Access, vol. 8, pp. 156695–156706, 2020.
[42] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.

ANAM YOUSAF received the B.S. degree from the Department of Computer Science, The Islamia University of Bahawalpur, Pakistan. She is currently pursuing the Ph.D. degree in computer science with the Khwaja Fareed University of Engineering and Information Technology (KFUEIT). Her recent research interests include data mining, mainly working on natural language processing-based problems.

MUHAMMAD UMER received the B.S. degree from the Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology (KFUEIT), Pakistan, in October 2018. He is currently pursuing the Ph.D. degree in computer science with KFUEIT. He is also working as a Research Assistant with the Fareed Computing and Research Center, KFUEIT. His recent research interests include data mining, mainly working on machine learning and deep learning-based IoT, text mining, and computer vision tasks.

SAIMA SADIQ is currently pursuing the Ph.D. degree in computer science with the Khwaja Fareed University of Engineering and Information Technology (KFUEIT). She is also working as an Assistant Professor with the Department of Computer Science, Government Degree College for Women. Her research interests include data mining, machine learning, and deep learning-based text mining.

SALEEM ULLAH was born in Ahmedpur East, Pakistan, in 1983. He received the B.Sc. degree from The Islamia University Bahawalpur, Pakistan, in 2003, the M.I.T. degree in computer science from Bahauddin Zakariya University, Multan, in 2005, and the Ph.D. degree from Chongqing University, China, in 2012. From 2006 to 2009, he worked as a Network/IT Administrator in different companies. From August 2012 to February 2016, he worked as an Assistant Professor with The Islamia University Bahawalpur. Since February 2016, he has been working as an Associate Dean with the Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan. He has almost 14 years of industry experience in the field of IT. He is an Active Researcher in the field of adhoc networks, IoT, congestion control, data science, and network security.

SEYEDALI MIRJALILI (Senior Member, IEEE) is currently an Associate Professor and the Director of the Centre for Artificial Intelligence Research and Optimization, Torrens University, Australia. He is internationally recognized for his advances in swarm intelligence and optimization, including the first set of algorithms from a synthetic intelligence standpoint - a radical departure from how natural systems are typically understood - and a systematic design framework to reliably benchmark, evaluate, and propose computationally cheap robust optimization algorithms. He has published over 200 publications with over 25,000 citations and an H-index of over 55. As the most cited researcher in Robust Optimization, he is in the list of 1% highly-cited researchers and named as one of the most influential researchers in the world by Web of Science in Computer Science and Engineering. He is an Associate Editor of several journals, including Neurocomputing, Applied Soft Computing, Advances in Engineering Software, Applied Intelligence, and IEEE ACCESS.

VAIBHAV RUPAPARA received the Master of Science degree in computer science from Florida International University, Miami, FL, USA. He has worked in different domains, including finance and healthcare. His expertise contributed towards achieving high quality, scalable deliverability with security. His research interests include machine learning, AI, and deep learning.

MICHELE NAPPI (Senior Member, IEEE) received the laurea degree (cum laude) in computer science from the University of Salerno, Italy, in 1991, the M.Sc. degree in information and communication technology from I.I.A.S.S. E.R. Caianiello, in 1997, and the Ph.D. degree in applied mathematics and computer science from the University of Padova, Italy, in 1997. He is currently a Full Professor of computer science with the University of Salerno. He is a Team Leader of the Biometric and Image Processing Laboratory (BIPLAB). He is the author of more than 180 papers in peer-reviewed international journals, international conferences, and book chapters. His research interests include pattern recognition, image processing, image compression and indexing, multimedia databases and biometrics, human–computer interaction, and VR/AR. He is a GIRPR/IAPR Member. He is also a member of TPC of international conferences. He received several international awards for scientific and research activities. He is the co-editor of several international books. He serves as associate editor and managing guest editor for several international journals. He has been the President of the Italian Chapter of the IEEE Biometrics Council. He was one of the founders of the spin off BS3 (biometric system for security and safety), in 2014.
