2020 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM 2020)
Sundanese Twitter Dataset for Emotion Classification
Oddy Virgantara Putra
Department of Informatics
Universitas Darussalam Gontor
Ponorogo, Indonesia
[email protected]

Fathin Muhammad Wasmanson
Department of Informatics
Universitas Darussalam Gontor
Ponorogo, Indonesia
[email protected]

Triana Harmini
Department of Informatics
Universitas Darussalam Gontor
Ponorogo, Indonesia
[email protected]

Shoffin Nahwa Utama
Department of Informatics
Universitas Darussalam Gontor
Ponorogo, Indonesia
[email protected]

Abstract—The Sundanese are the second-largest ethnic group in Indonesia, and their language possesses many dialects. This has drawn the attention of many researchers to emotion analysis, especially on social media. However, because Sundanese datasets are barely available, understanding Sundanese emotion is a challenging task. In this research, we propose a dataset for emotion classification of Sundanese text. The preprocessing includes case folding, stopword removal, stemming, tokenizing, and text representation. Prior to classification, we generate features using term frequency-inverse document frequency (TF-IDF). We evaluated our dataset using k-fold cross-validation. Our experiments with the proposed method exhibit effective results for machine learning classification. Furthermore, as far as we know, this is the first Sundanese emotion dataset available to the public.

Keywords—emotion classification, dataset, Sundanese, support vector machine, text mining

I. INTRODUCTION

Nowadays, social media is widely known as a new way of communication. People use it for many purposes, such as promoting products, introducing health protocols, and research. One of the most widely used social media platforms is Twitter. In Indonesia, Twitter is used in many different local languages, and the second-largest local language is Sundanese. Sundanese users are quite active: their favorite football club, PERSIB Bandung, has more than 3 million followers on Twitter. Twitter is also prevalent in many types of research, especially emotion analysis [1]–[5].

When people travel to a country with varied ethnicity, such as Indonesia, they must pay attention to local customs before communicating with each other. It is easy to understand someone's expression from their face, even from a subtle movement [6]. On the other hand, interpreting expression through text without emojis is burdensome [4].

In the world of machine learning, emotion extraction is a challenging task. Many algorithms have been proposed in this field, from video-based [7] to text-based recognition [5], [8], [9]. Some of them are used to gather knowledge about customer satisfaction and business trends [2] from Twitter. Therefore, utilizing tweets is promising for data analysis.

Recently, many issues have been emerging on Twitter, especially in West Java, where the Sundanese originate, and they hold considerable potential for sentiment and emotion studies. Today, many Indonesian Twitter datasets are publicly available [2], [10]. However, a public Sundanese Twitter dataset is yet to be found. Hence, in this work, we build a public Sundanese Twitter dataset for emotion classification.

Here, we provide a public dataset from Twitter and propose a model for emotion classification. Furthermore, we perform the classification with K-Nearest Neighbor (KNN), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM). We also evaluate our model using the F1-score, precision, and recall derived from the confusion matrix.

II. RELATED WORKS

Research in emotion analysis has been conducted for more than a decade, resulting in many useful datasets available to the public. A heterogeneous annotated database was made publicly available through the contribution of [11]. This dataset is a combination of headlines, fairy tales, and blogs. Several fundamental emotions, such as fear, disgust, anger, sadness, happiness, and surprise, were employed in the analysis. The final task shows that this dataset performs remarkably well with SVM compared to other classifiers.

A two-stage method was proposed by [10] for Indonesian emotion detection from a Twitter dataset. The proposed method consists of two stages: emotion extraction and emotion classification. Emotions are grouped into five prominent classes: joy, anger, sadness, fear, and love. Various components, such as semantic, linguistic, and orthographic features, were devised to classify the emotion. This work demonstrated superior results and tackled challenging issues in emotion analysis.
A handy work by [2] proposed a public Indonesian emotion dataset. This incredible work gathered data from Twitter for about two weeks. The dataset contains five distinctive emotions: anger, sadness, fear, joy, and love. In the learning process, the dataset was classified using Logistic Regression (LR), RF, and SVM, with 10-fold cross-validation used to split the data into training and test sets. Finally, the results gained for precision, recall, and F1-score are 70%, 68%, and 68%, respectively.
A few months later, a practical text classifier using a pre-trained model was proposed by [1]. This work can be considered groundbreaking in natural language processing (NLP). The dataset was collected from Amazon Reviews. For preprocessing, the dataset was separated into several batch groups, each batch consisting of a tokenized vocabulary of 32,000 items in total.
As interest in microblog services gradually increases, a growing number of topics are produced over time, and microblogs attract much attention in the analysis of emotional expression. Ren [5] proposed emotion extraction from Chinese microblogs. This work is rule-based and contains three tasks: opinion finding, emotion analysis, and opinion target extraction.

III. PROPOSED WORK

In this section, the proposed work is separated into several steps: Dataset Gathering and Annotation, Text Preprocessing, Feature Selection, Text Representation, Emotion Classification, and Model Evaluation.

A. Dataset

• Gathering and Annotation
We gathered the dataset from the Twitter API between January and March 2019, with 2,518 tweets in total. The tweets were filtered using hashtags that represent Sundanese emotion, for instance, #persib, #corona, #saredih, #nyakakak, #garoblog, #sangsara, #gumujeng, #bungah, #sararieun, #ceurik, and #hariwang. The dataset contains four distinctive emotions: anger, joy, fear, and sadness. Each tweet is annotated with its related emotion. For data validation, we consulted a Sundanese language teacher for expert validation.

• Data Identity
Our dataset consists of four distinctive emotions with a balanced amount of data for each class, as shown in Fig. 1. The dataset is available at https://github.com/virgantara/sundanese-twitter-dataset.git.

Fig. 1. Dataset identity.
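The paper does not name its Twitter client or an automatic labeling rule, so the following is only a minimal sketch of the gathering step, assuming the tweepy library (v4). The credentials are placeholders, and the hashtag-to-emotion mapping is our illustrative assumption; in the actual dataset, each tweet's label was validated by a Sundanese language expert.

```python
# Hypothetical sketch of hashtag-based tweet collection with tweepy v4.
# Credentials and the hashtag-to-label mapping are placeholders.
import csv
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")  # placeholders
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Hashtags the paper lists as markers of Sundanese emotion; mapping
# each one to an emotion class is our assumption for illustration.
HASHTAG_LABELS = {
    "#saredih": "sadness", "#ceurik": "sadness", "#sangsara": "sadness",
    "#bungah": "joy", "#gumujeng": "joy", "#nyakakak": "joy",
    "#hariwang": "fear", "#sararieun": "fear", "#garoblog": "anger",
}

with open("sundanese_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["label", "tweet"])
    for tag, label in HASHTAG_LABELS.items():
        # Cursor pages through recent tweets matching the hashtag query
        for status in tweepy.Cursor(
            api.search_tweets, q=tag, tweet_mode="extended"
        ).items(300):
            writer.writerow([label, status.full_text])
```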
B. Text Preprocessing

This step consists of four phases: case folding, filtering, tokenizing, and stemming.

• Case Folding
It is broadly known that case folding is often used in data preprocessing. The task is simple: all letters are reduced to their lowercase form. In some cases, lowercasing is useful for data normalization; on the other hand, a lowered word might turn into another word, in which case it is better to leave the letter case alone. In this step, the data are separated into two groups: a label column and a text column. Tweets may contain inconsistent case, so we uniformly lowercased every single word. For instance, the tweet "Gokar! Punteun, buat yg masih pd nongkrong yg masih jalan2 gajelas" is transformed into "gokar punteun buat yg masih pd nongkrong yg masih jalan gajelas".

• Stopword Filtering
A stopword is defined as a meaningless word that does not contribute much to a sentence, so it is safe to ignore such words. There are many words like this in English, for instance: is, are, were, that, this, and which. Subsequently, removing these words is an important task. Stopword filtering removes unnecessary words because they do not represent any emotion. Here, we gathered stopwords not only from Sundanese but also from Bahasa Indonesia. Twitter is vast and full of mixed languages, and we cannot expect a tweet to contain the language of only one specific country or tribe. Accordingly, our stopword list is a mixture of the Indonesian and Sundanese languages. Here are some examples of our stopwords: "tapi", "sanajan", "salain", "ti", "ku", "kituna", "sabalikna", "malah", "adalah", "nyah", "euy".

• Tokenizing
Tokenizing is the process of splitting a sentence into several words, so each tweet is split into a word vector. This process may use different separators; in our case, we use the space delimiter. For example, "gokar punteun buat yg masih pd nongkrong" is transformed into 'gokar', 'punteun', 'buat', 'yg', 'masih', 'pd', 'nongkrong'.

• Stemming
Stemming is the process of reducing words to their root form. It is similar to normalization but for text-based data, and it is useful for reducing the number of words in the corpus. Since Sundanese and Bahasa have similar words, we adopted stemming from Bahasa.
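A minimal sketch of this pipeline is shown below. It assumes the combined Indonesian-Sundanese stopword list is stored in a plain-text file (the file name stopwords_id_su.txt is hypothetical, one word per line). For stemming we use the Sastrawi stemmer for Bahasa Indonesia, which matches the paper's choice of adopting stemming from Bahasa, although the authors do not name a specific implementation.

```python
# Sketch of the preprocessing pipeline: case folding, stopword
# filtering, tokenizing, and Bahasa-based stemming. The stopword
# file name is hypothetical; Sastrawi is one possible stemmer.
import re
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

# Combined Indonesian + Sundanese stopword list (508 words in the paper)
with open("stopwords_id_su.txt", encoding="utf-8") as f:
    STOPWORDS = set(line.strip() for line in f if line.strip())

stemmer = StemmerFactory().create_stemmer()  # Bahasa Indonesia stemmer

def preprocess(tweet):
    # Case folding: reduce every letter to lowercase
    text = tweet.lower()
    # Keep only letters and spaces (drops punctuation and digits,
    # e.g. "jalan2" becomes "jalan" as in the paper's example)
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenizing: split on the space delimiter
    tokens = text.split()
    # Stopword filtering, then stemming each remaining token
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("Gokar! Punteun, buat yg masih pd nongkrong"))
```

Note that a Bahasa stemmer will leave purely Sundanese word forms untouched, which is consistent with the future-work remark in the conclusion about building a Sundanese-specific stemmer.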
C. Feature Selection

Here, feature selection utilizes stopword removal. This process removes all conjunctions. A list of stopwords was created out of the Bahasa and Sundanese languages; this list contains 508 words. We combined Bahasa and Sundanese because many tweets contain both languages, so the results would suffer if we abandoned them or used only one of them.

D. Text Representation

Text representation (TR) is considered one of the main factors in text mining. Feature extractions that employ Bag-of-Words (BoW) may lose semantic information [12], not to mention suffer from sparsity. Here, in order to ease the classification, every tweet is transformed into a vector. We then work with some basic features: BoW, TF-IDF, and N-grams.
• BoW (Bag-of-Words)
BoW is a way to extract features from text. It is often described as the presence of words: in BoW, the more frequently a word comes up, the more likely it becomes a feature.

• TF-IDF
Different from BoW, TF-IDF measures how important a word is within a document. This feature is composed of two parts. The first part calculates the term frequency (TF), which represents the frequency of a word in a document compared with the total number of words in the same document. The second part is the inverse document frequency (IDF), which takes the logarithm of the total number of documents divided by the count of documents in which the related term appears. Simply put, TF-IDF is a feature extraction in which the more a word appears in a specific document, the more likely it works as a main feature.

• N-Gram
In a nutshell, an N-gram is a sequence of words, similar in meaning to a phrase. A phrase may consist of more than one word, and should a phrase be split into separate words, its meaning may be obscured. For example, if a sentence such as "I do not like fried rice." is tokenized into standalone words, it could take on the opposite meaning, since the word "not" is likely removed as a stopword. Thus, by applying N-grams, we can handle such a sentence and maintain its meaning.
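The three representations above map directly onto common vectorizers; the sketch below is our illustration using scikit-learn (the paper does not state which library it used), with made-up sample tweets. The TF-IDF weight follows the usual form tfidf(t, d) = tf(t, d) × log(N / df(t)).

```python
# Sketch of the three text representations with scikit-learn
# (our assumption; the paper does not name its implementation).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

tweets = [  # illustrative sample tweets, not from the dataset
    "gokar punteun buat yg masih pd nongkrong",
    "bungah pisan hate abdi dinten ieu",
]

# BoW: raw token counts, one column per vocabulary word
bow = CountVectorizer()
X_bow = bow.fit_transform(tweets)

# TF-IDF: counts re-weighted by inverse document frequency,
# tfidf(t, d) = tf(t, d) * log(N / df(t)) (sklearn adds smoothing)
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(tweets)

# N-grams: unigrams plus bigrams, matching the Table III setup
ngram = CountVectorizer(ngram_range=(1, 2))
X_ngram = ngram.fit_transform(tweets)

print(X_bow.shape, X_tfidf.shape, X_ngram.shape)
```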
E. Emotion Classification

In this section, the dataset is processed using several machine learning algorithms. Each algorithm has a similar output: it produces a model, and these models are later used for classification. Prior to model evaluation, we incorporated k-fold cross-validation with k equal to 10. The classifiers exhibited in this research are KNN, RF, NB, LR, and SVM.

F. Model Evaluation

Here, we evaluate the models from each algorithm by calculating their precision, recall, and F1-score, not to mention accuracy.
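A minimal sketch of this evaluation loop, assuming scikit-learn (the paper does not name its tooling): each classifier is paired with the TF-IDF vectorizer in a pipeline, scored with 10-fold cross-validation, and summarized with the per-class precision, recall, and F1-score reported in Tables I-III. The hyperparameters shown are library defaults, not necessarily the paper's.

```python
# Sketch of 10-fold cross-validated classification and evaluation,
# assuming scikit-learn; hyperparameters are defaults, not the paper's.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

classifiers = {
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(),
    "NB": MultinomialNB(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

def evaluate(tweets, labels):
    # tweets: preprocessed tweet strings; labels: emotion per tweet
    for name, clf in classifiers.items():
        model = make_pipeline(TfidfVectorizer(), clf)
        # Out-of-fold predictions from 10-fold cross-validation
        preds = cross_val_predict(model, tweets, labels, cv=10)
        print(name)
        # Per-class precision, recall, F1-score, and overall accuracy
        print(classification_report(labels, preds, digits=2))
```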
IV. EXPERIMENT AND RESULTS

In this part, all models produced by the aforementioned algorithms were tested. They were tested on a laptop with 16 GB of RAM, an Intel Core i7-8750H processor, an NVIDIA GeForce GTX 1050 Ti GPU, and the Ubuntu 18.04 LTS operating system.

TABLE I
EMOTION CLASSIFICATION BASED ON THE TF-IDF FEATURE

Model  Emotion  Prec.  Rec.  F1    Acc.
KNN    Anger    85 %   97 %  90 %  84 %
       Fear     81 %   84 %  82 %
       Joy      83 %   81 %  82 %
       Sadness  87 %   73 %  79 %
RF     Anger    93 %   95 %  94 %  92 %
       Fear     87 %   94 %  90 %
       Joy      98 %   87 %  92 %
       Sadness  92 %   92 %  92 %
NB     Anger    73 %   79 %  76 %  65 %
       Fear     65 %   64 %  65 %
       Joy      65 %   57 %  61 %
       Sadness  57 %   59 %  58 %
LR     Anger    94 %   95 %  95 %  94 %
       Fear     93 %   97 %  95 %
       Joy      98 %   89 %  93 %
       Sadness  90 %   94 %  92 %
SVM    Anger    94 %   98 %  96 %  95 %
       Fear     97 %   98 %  98 %
       Joy      99 %   90 %  95 %
       Sadness  92 %   95 %  94 %

Table I illustrates the results of emotion classification for the generally used algorithms KNN, RF, NB, LR, and SVM. As observed, the majority of the algorithms achieved high accuracy, precision, recall, and F1-score, while only one algorithm scored low on these measurements. First of all, SVM stood above all the remaining algorithms with roughly 95% accuracy across the four emotions, i.e., Anger, Sadness, Joy, and Fear. It can be seen that LR has slightly lower accuracy than SVM, at 94%, followed by RF at 92%. On the other hand, NB has the worst performance, with 65%. In terms of precision, SVM still performed best in nearly all emotions, with more than 95% on average. Surprisingly, RF overpowered all the other algorithms on the Sadness emotion with 95%. SVM also achieved the top rank in recall and F1-score, with the same value of 95.5%.

Fig. 2. Learning curve, scalability, and performance of the model.

As we found that SVM gained the top position, we measured its performance further. Fig. 2 shows the trends of three different evaluators. Overall, it can be seen that the learning curve over training examples changed remarkably throughout training.

To begin with, it is clear that the learning curve climbs dramatically in the first stage, from 200 to 500 training examples. In the second stage, its performance keeps increasing steadily, but slowly.

Second of all, the model-performance curve behaves much the same as the learning curve: it gradually climbs as fit times increase and achieves a score of 0.95. Interestingly, the scalability curve only begins climbing steeply toward its peak beyond roughly 50 fit-time units.
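Fig. 2's three panels match the output of scikit-learn's learning-curve utility (score vs. training examples, fit times vs. training examples, and score vs. fit times); the sketch below shows how such curves could be produced for the SVM pipeline, under the assumption that this is how the figure was generated — the paper does not say.

```python
# Sketch of computing learning-curve, scalability, and performance
# data for the SVM model, assuming sklearn's learning_curve utility
# (the paper does not state how Fig. 2 was produced).
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def svm_curves(tweets, labels):
    model = make_pipeline(TfidfVectorizer(), SVC())
    train_sizes, train_scores, test_scores, fit_times, _ = learning_curve(
        model, tweets, labels,
        cv=10,
        train_sizes=np.linspace(0.1, 1.0, 5),
        return_times=True,  # also return fit times for the scalability panel
    )
    # Learning curve: cross-validated score vs. number of training examples
    # Scalability:    fit time vs. number of training examples
    # Performance:    cross-validated score vs. fit time
    return train_sizes, test_scores.mean(axis=1), fit_times.mean(axis=1)
```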
TABLE II
EMOTION CLASSIFICATION BASED ON THE BOW FEATURE

Model  Emotion  Prec.  Rec.  F1    Acc.
KNN    Anger    67 %   87 %  76 %  69 %
       Fear     63 %   81 %  71 %
       Joy      73 %   61 %  67 %
       Sadness  80 %   44 %  57 %
RF     Anger    91 %   97 %  94 %  91 %
       Fear     87 %   94 %  90 %
       Joy      95 %   85 %  89 %
       Sadness  93 %   89 %  91 %
NB     Anger    80 %   85 %  83 %  75 %
       Fear     68 %   87 %  77 %
       Joy      77 %   69 %  72 %
       Sadness  77 %   58 %  66 %
LR     Anger    91 %   96 %  94 %  93 %
       Fear     90 %   96 %  93 %
       Joy      98 %   89 %  93 %
       Sadness  93 %   90 %  91 %
SVM    Anger    90 %   94 %  92 %  93 %
       Fear     92 %   96 %  94 %
       Joy      97 %   91 %  94 %
       Sadness  93 %   90 %  91 %

We found different results, as illustrated in Table II. Both SVM and LR make it to the top rank with 93%, followed by RF, NB, and KNN with 91%, 75%, and 69%, respectively. Using BoW feature extraction, NB switched positions with KNN and is no longer in the bottom tier of classifiers.

TABLE III
EMOTION CLASSIFICATION BASED ON THE N-GRAM FEATURE

Model  Emotion  Prec.  Rec.  F1    Acc.
KNN    Anger    55 %   72 %  62 %  59 %
       Fear     53 %   71 %  60 %
       Joy      71 %   48 %  58 %
       Sadness  70 %   46 %  55 %
RF     Anger    90 %   98 %  94 %  91 %
       Fear     83 %   98 %  90 %
       Joy      97 %   82 %  89 %
       Sadness  96 %   83 %  89 %
NB     Anger    91 %   96 %  94 %  86 %
       Fear     85 %   89 %  87 %
       Joy      85 %   84 %  84 %
       Sadness  84 %   77 %  80 %
LR     Anger    93 %   96 %  95 %  92 %
       Fear     89 %   98 %  93 %
       Joy      98 %   86 %  92 %
       Sadness  91 %   90 %  91 %
SVM    Anger    93 %   95 %  94 %  93 %
       Fear     89 %   98 %  93 %
       Joy      98 %   88 %  93 %
       Sadness  94 %   91 %  93 %

In Table III, we evaluated our models using unigram and bigram features. It can be seen that SVM still dominates the other algorithms with 93%, followed by LR, RF, NB, and KNN. On the contrary, KNN performance decreased drastically to 59% accuracy, the lowest score compared with its results in Tables I and II.

Fig. 3. Comparison of four emotions with three measurements (Precision, Recall, and F1-Score): (a) Anger, (b) Fear, (c) Joy, and (d) Sadness.
The graph in Fig. 3 compares the performance of all the text representation features, measured by precision, recall, and F1-score. Overall, TF-IDF was significantly higher for all emotions, particularly in precision.

To begin with, the precision of TF-IDF for the Anger class stood at 94 percent, while its other measures were greater than 95 percent. Surprisingly, the recall of TF-IDF peaked at 98 percent, followed by N-gram and BoW, not to mention the F1-score, which gained the top position at 96 percent.

In the Fear class, the precision, recall, and F1-score of TF-IDF were all around 97 percent. In recall, TF-IDF stood at an equal position with N-gram; however, the disparity in F1-score between TF-IDF and the remaining features was dramatically high. Moving to the Joy class, all three features were almost at the same level on average, and the precision was relatively high, at 98 percent.

In terms of the Sadness class, the precision of TF-IDF was defeated by N-gram, followed by BoW. On the contrary, both BoW and N-gram were outclassed entirely by TF-IDF by around 5 percent.
V. CONCLUSION

In this research, we have built a new public dataset for Sundanese emotion classification. Our dataset contains four distinct annotated classes (fear, joy, anger, and sadness). We tested the dataset with five algorithms. As a result, the SVM model gained the highest score, with 95% accuracy, followed by the other algorithms. We also found that different feature extractions yield different results.

In future work, we need to employ stemming built specifically for the Sundanese language and to gather a larger dataset.
REFERENCES

[1] N. Kant, R. Puri, N. Yakovenko, and B. Catanzaro, "Practical Text Classification With Large Pre-Trained Language Models," arXiv:1812.01207 [cs], Dec. 2018.
[2] M. S. Saputri, R. Mahendra, and M. Adriani, "Emotion Classification on Indonesian Twitter Dataset," in 2018 International Conference on Asian Language Processing (IALP), (Bandung, Indonesia), pp. 90–95, IEEE, Nov. 2018.
[3] M. O. Ibrohim and I. Budi, "Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter," in Proceedings of the Third Workshop on Abusive Language Online, (Florence, Italy), pp. 46–57, Association for Computational Linguistics, 2019.
[4] S. Sendari, I. A. E. Zaeni, D. C. Lestari, and H. P. Hariyadi, "Opinion Analysis for Emotional Classification on Emoji Tweets using the Naïve Bayes Algorithm," Knowledge Engineering and Data Science, vol. 3, pp. 50–59, Aug. 2020.
[5] F. Ren and Q. Zhang, "An Emotion Expression Extraction Method for Chinese Microblog Sentences," IEEE Access, vol. 8, pp. 69244–69255, 2020.
[6] N. Muna, U. D. Rosiani, E. M. Yuniarno, and M. H. Purnomo, "Subpixel subtle motion estimation of micro-expressions multiclass classification," in 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), (Singapore), pp. 325–330, IEEE, Aug. 2017.
[7] C. Li, J. Wang, H. Wang, M. Zhao, W. Li, and X. Deng, "Visual-Textual Emotion Analysis With Deep Coupled Video and Danmu Neural Networks," IEEE Transactions on Multimedia, vol. 22, pp. 1634–1646, June 2020.
[8] H. Fei, D. Ji, Y. Zhang, and Y. Ren, "Topic-Enhanced Capsule Network for Multi-Label Emotion Classification," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1839–1848, 2020.
[9] S. Ahmad, M. Z. Asghar, F. M. Alotaibi, and S. Khan, "Classification of Poetry Text Into the Emotional States Using Deep Learning Technique," IEEE Access, vol. 8, pp. 73865–73878, 2020.
[10] J. E. The, A. F. Wicaksono, and M. Adriani, "A two-stage emotion detection on Indonesian tweets," in 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), (Depok, Indonesia), pp. 143–146, IEEE, Oct. 2015.
[11] S. Chaffar and D. Inkpen, "Using a heterogeneous dataset for emotion analysis in text," in Advances in Artificial Intelligence (Canadian AI 2011), pp. 62–67, May 2011.
[12] W. Zhou, H. Wang, H. Sun, and T. Sun, "A Method of Short Text Representation Based on the Feature Probability Embedded Vector," Sensors, vol. 19, p. 3728, Aug. 2019.