
InSet Lexicon: Evaluation of a Word List for
Indonesian Sentiment Analysis in Microblogs

Fajri Koto
Engineering Department, KMK Online
Jakarta, Indonesia
Email: [email protected]

Gemala Y. Rahmaningtyas
Engineering Department, KMK Online
Jakarta, Indonesia
Email: [email protected]

Abstract—In this study we propose InSet, an Indonesian sentiment lexicon built to identify written opinion and categorize it as positive or negative, which can be used to analyze public sentiment towards a particular topic, event, or product. Composed from words collected from Indonesian tweets, InSet was constructed by manually weighting each word and was enhanced with stemming and a synonym set. As a result, we obtained 3,609 positive words and 6,609 negative words with scores ranging between –5 and +5. In our experiments, InSet outperforms the few existing Indonesian lexicons that we used as baselines.

Keywords- sentiment analysis; lexicon; Indonesian; microblog; twitter

I. INTRODUCTION

As Indonesia is an archipelagic country with thousands of islands, a diverse culture, and a high population density, online social media has become a medium for most Indonesian citizens to communicate and share thoughts and ideas across the country. These days, microblogging tools such as Facebook, Twitter, and Instagram have become an inseparable part of modern Indonesian culture, and it has become very easy to create viral news on the Internet and gather Indonesian public opinion on a certain topic.

With the massive number of social media users in Indonesia, it would be interesting, if not profitable, to analyze netizens' sentiment on any topic, a task also known as Sentiment Analysis. Sentiment Analysis is a classification task to determine the polarity of a text and belongs to the broad area of natural language processing, computational linguistics, and text mining [1]. According to [2], there are two kinds of sentiment classification task: 1) polarity classification with classes = {positive, negative}, and 2) subjectivity classification with classes = {subjective, objective}. The positive or negative class indicates the positive or negative polarity of a sentence. On the other hand, an objective sentence conveys information containing facts or news with little argumentation, while a subjective sentence reflects a private point of view, emotion, or belief [3].

Though sentiment analysis has been one of the most studied fields in computer science, and specifically in natural language processing, not much research has been conducted to improve sentiment analysis, or opinion mining, for the Indonesian language. Therefore, in this work we try to improve sentiment analysis for Indonesian by building a precise Indonesian lexicon aimed particularly at microblogs.

Our sentiment lexicon was composed of words gathered from Twitter, as a representative of commonly used social media in Indonesia. We built the lexicon by classifying the polarity of each word and enhanced it with several previously proven methods. The tests and evaluations conducted in this study show that InSet performs satisfactorily as an Indonesian sentiment lexicon for predicting the negative and positive polarity of short written opinions.

II. RELATED WORKS

In computer science, the study of sentiment analysis over microblogs has been one of the main focuses in the fields of natural language processing, information retrieval, and data mining. This research has mainly focused on the construction of English lexicons such as SentiWordNet [6], Liu Lexicon [11], AFINN Lexicon [12], Opinion Finder [9], Senti-Strength [13], HBE Lexicon [14], and the NRC Emotion Lexicon [15]. Research by Koto et al. summarized the performance comparison among these features and revealed that AFINN and Senti-Strength are currently the best features for English Twitter sentiment analysis [16].

Research on sentiment analysis over the Twitter microblog was done by Go et al., who utilized emoticons to annotate tweets with sentiment labels [10]. A later study by Agarwal et al. used manually annotated tweets and a unigram model to perform classification [17]. In other studies, Koto et al. analyzed the sentence patterns of tweets with sentiment labels, in both the subjectivity and the polarity domain [18].

Despite the large number of works on English sentiment analysis, research focusing on the Indonesian language is limited. In [4], Wicaksono et al. proposed a methodology to automatically construct a dataset for Twitter sentiment analysis, while in [5] Lunando et al. combined sarcasm detection with the sentiment analysis task for Indonesian. However, the features used for sentiment analysis were produced only from the translated SentiWordNet [6].

In other work, Naradhipa et al. used only n-grams as the key feature to perform sentiment classification [7], which did not give much improvement over previous techniques. The most satisfying research related to Indonesian sentiment analysis that we found is the work by Vania et al. [8], who constructed an Indonesian sentiment lexicon by translating Opinion Finder [9] and enhancing it by seeding the words. Based on these previous works, we focus here on building a lexicon for the Indonesian language that can be used to analyze public sentiment specifically on microblogs.

Figure 1. Stages of InSet Construction

III. INDONESIA SENTIMENT (INSET) LEXICON CONSTRUCTION

Our sentiment lexicon was constructed in 2017 and utilized the Twitter data stream from around November 2016. The data was collected for three days and filtered for Bahasa Indonesia (the Indonesian language) and two kinds of emoticons that express positive ":)" and negative ":(" polarity. We grouped the tweets into positive and negative sets following the work of Go et al., who utilized emoticons to annotate tweets with sentiment labels [10]. In total we collected around 10,000 tweets and then applied the following preprocessing stages: 1) removing repetitive ads, 2) converting to lowercase, 3) removing URLs and Twitter entities such as @account, and 4) removing special characters and stopwords. To select the word candidates, we applied n-grams (where n = {1, 2, 3}) and removed words with a frequency equal to 1. At this stage, we had 12,503 and 13,164 candidates for positive and negative words, respectively.
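A minimal sketch of this candidate-selection step, assuming the tweets have already been split into emoticon-labelled positive and negative sets; the regular expressions and the stopword list here are illustrative, not the exact ones used by the authors.

import re
from collections import Counter

# Illustrative stopword list; the paper does not publish the exact list used.
STOPWORDS = {"yang", "dan", "di", "ke", "dari"}

def preprocess(tweet):
    """Lowercase, strip URLs, Twitter entities, special characters, and stopwords."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove @account and hashtag entities
    text = re.sub(r"[^a-z\s]", " ", text)       # remove special characters
    return [t for t in text.split() if t not in STOPWORDS]

def ngram_candidates(tweets, max_n=3, min_freq=2):
    """Collect uni-, bi-, and trigram candidates, dropping those seen only once."""
    counts = Counter()
    for tweet in tweets:
        tokens = preprocess(tweet)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {w: c for w, c in counts.items() if c >= min_freq}

# positive_tweets / negative_tweets are the emoticon-labelled sets described above:
# pos_candidates = ngram_candidates(positive_tweets)
# neg_candidates = ngram_candidates(negative_tweets)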
As AFINN [12] has shown very good performance in sentiment analysis of English tweets [16], we followed its approach by scoring the top 5,000 words of each set, ranging from -5 (very negative) to +5 (very positive). To make the labelling process easier, we only scored valence, leaving out subjectivity/objectivity. Manual weighting was done by two native Indonesian speakers who had been given the same instructions before conducting the scoring.

The result of the scoring is shown in Table I and indicates that many words were weighted as 0 by the annotators. The score column was calculated by averaging the scores given by the two annotators and rounding to the greater nearest integer (ceiling). The agreement score between the two annotators can be described as the average of the score differences between them. For the positive set, the agreement score is 0.52 over all words and 1.27 if we exclude the words with score 0, while for the negative set the agreement scores are 0.45 and 0.97. Since the scores range between 0 and +5 or –5, this level of agreement can be regarded as a good annotation result.

Table I
THE MANUAL SCORING RESULT OF INSET CONSTRUCTION

Positive           Negative
Score   Count      Score   Count
0       3421       0       2942
1       420        -1      197
2       529        -2      464
3       382        -3      719
4       205        -4      458
5       44         -5      520
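As a worked illustration of the aggregation described above, the sketch below combines the two annotators' scores with a ceiling average and measures agreement as the mean absolute difference. This is one plausible reading of the description; in particular, applying a plain ceiling to negative averages is an assumption, since the paper does not spell out how the negative set is rounded.

import math

def aggregate_score(a, b):
    """Average two annotator scores and round up to the nearest integer (ceiling).

    Assumption: math.ceil is applied to both positive and negative averages.
    """
    return math.ceil((a + b) / 2)

def agreement(scores_a, scores_b, exclude_zero=False):
    """Mean absolute difference between two annotators (one reading of the paper's metric)."""
    pairs = list(zip(scores_a, scores_b))
    if exclude_zero:
        pairs = [(a, b) for a, b in pairs if aggregate_score(a, b) != 0]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Example: if two annotators score the word "senang" (happy) as 3 and 4,
# aggregate_score(3, 4) returns 4.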
As described in Fig. 1, the word-list construction continued with stemming of the words in Table I. At this stage, we excluded words with score 0, since they can be classified as irrelevant for a sentiment lexicon. Stemming was conducted using Sastrawi1, a library that implements the stemming algorithm for the Indonesian language proposed by [19]. If a stemmed word already exists in the original set, it is excluded; otherwise, the stemmed word takes the score of its original word with the highest polarity. For instance, "memarahi" (to scold) and "dimarahi" (to be scolded) have polarity scores of –4 and –2, respectively. The stemming result of both words is "marah" (angry), which takes the score of "memarahi", the word with the stronger negative polarity.
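The sketch below shows this merge step, assuming the PySastrawi port of the Sastrawi library (pip install PySastrawi) and a {word: score} dictionary per polarity set; taking the score with the larger absolute value as the "highest polarity" is our reading of the rule above.

from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

stemmer = StemmerFactory().create_stemmer()

def add_stemmed_words(lexicon):
    """Extend a {word: score} lexicon with stemmed forms.

    A stemmed form is skipped if it already exists in the lexicon; otherwise it
    takes the score with the largest absolute value among its original words.
    """
    stemmed = {}
    for word, score in lexicon.items():
        root = stemmer.stem(word)
        if root in lexicon:
            continue  # the root form is already a scored entry
        if root not in stemmed or abs(score) > abs(stemmed[root]):
            stemmed[root] = score  # keep the strongest polarity, e.g. "memarahi" (-4) over "dimarahi" (-2)
    return {**lexicon, **stemmed}

# negative_set = {"memarahi": -4, "dimarahi": -2}
# add_stemmed_words(negative_set)  ->  {"memarahi": -4, "dimarahi": -2, "marah": -4}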
After this enhancement through the stemming stage, our lexicon grew to 1,987 positive and 2,593 negative words, as described in Fig. 2. This word list was then used as the input of the next stage, where we added synonyms to enhance the lexicon. Here, we used Indonesian synonyms from SinonimKata2, which consists of 35,711 unique words. As in the stemming stage, words that share a synonym are scored with the highest polarity score.

Figure 2. The Result of Lexicon Construction at Each Stage

1 https://github.com/sastrawi/sastrawi
2 http://www.sinonimkata.com/


We also conducted cleansing as the final stage, particularly on the result of the synonym addition. Some words in the Indonesian language may have a different polarity from their synonyms, which can cause errors if we add all synonym sets. For instance, the word "hotel" (hotel) is a synonym of "gubuk" (shack), but the polarities are totally different: "gubuk" is more suitable as a negative word. The words "bacot" (words) and "perkataan" (words) are also synonyms of each other, but "bacot" carries a strong negative polarity. Therefore, to exclude irrelevant synonym additions, we performed two steps: 1) we only selected words with high polarity and removed synonym words with score = {–2, –1, 1, 2}; 2) for each additional word, we excluded it from the list if it also exists in the additional list of the opposite polarity.
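A minimal sketch of the synonym-expansion stage and the two cleansing steps just described, assuming the SinonimKata entries are available as a {word: set of synonyms} dictionary (the variable names are illustrative; the site itself offers no such API). Reading step 1 as "only expand from words with |score| >= 3" is our interpretation.

def expand_with_synonyms(lexicon, synonyms):
    """Add synonyms of strongly polar words from one polarity set.

    lexicon:  {word: score} for one polarity set (scores -5..-1 or 1..5)
    synonyms: {word: set_of_synonyms} built from the SinonimKata word list
    """
    additions = {}
    for word, score in lexicon.items():
        if abs(score) < 3:
            continue  # step 1: skip weakly polar words (scores -2, -1, 1, 2)
        for syn in synonyms.get(word, set()):
            if syn in lexicon:
                continue
            # words sharing a synonym take the highest polarity score seen
            if syn not in additions or abs(score) > abs(additions[syn]):
                additions[syn] = score
    return additions

def cleanse(pos_additions, neg_additions):
    """Step 2: drop any word that was added to both polarity lists."""
    conflicts = set(pos_additions) & set(neg_additions)
    pos = {w: s for w, s in pos_additions.items() if w not in conflicts}
    neg = {w: s for w, s in neg_additions.items() if w not in conflicts}
    return pos, neg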
Finally, we obtained a word list comprising 10,218 words, which we call InSet (Indonesian Sentiment), as described in Figure 3. The word list is biased towards negative words (6,609, corresponding to 65%) compared to positive words (3,609). However, this bias corresponds closely to the bias found in the Opinion Finder sentiment lexicon (4,911 (64%) negative and 2,718 positive words) and in the AFINN Lexicon (1,598 (65%) negative and 878 positive words).

Figure 3. Histogram of Valences of the InSet Lexicon

IV. EXPERIMENT AND EVALUATION

A. Experimental Set-Up

To evaluate InSet, we used a Twitter dataset from 2015 that had been manually annotated by a native Indonesian speaker. The data was crawled through the Twitter API and contains 1,259 positive and 1,371 negative tweets. The InSet lexicon was used by constructing two values, called InSetPos and InSetNeg, where InSetPos (or InSetNeg) is the sum of the scores of the positive (or negative) words in a tweet that match the lexicon. We then performed binary classification using several supervised algorithms: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Neural Network (NN). Preprocessing was conducted before training and testing and includes 1) converting tweets to lowercase, 2) removing URLs and Twitter entities, and 3) removing special characters.
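A sketch of this evaluation pipeline with scikit-learn, assuming the tweets are already tokenized and InSet is available as a single {word: score} dictionary; the classifier hyperparameters are library defaults, not those reported in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def inset_features(tokens, inset):
    """Two features per tweet: InSetPos and InSetNeg (sums of matching scores)."""
    scores = [inset.get(t, 0) for t in tokens]
    pos = sum(s for s in scores if s > 0)   # InSetPos
    neg = sum(s for s in scores if s < 0)   # InSetNeg
    return [pos, neg]

def evaluate(tweet_tokens, labels, inset):
    """10-fold cross-validated accuracy for the four classifiers used in the paper."""
    X = np.array([inset_features(toks, inset) for toks in tweet_tokens])
    y = np.array(labels)
    classifiers = {
        "LR": LogisticRegression(),
        "NB": GaussianNB(),
        "SVM": SVC(),
        "NN": MLPClassifier(),
    }
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in classifiers.items()}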
Table II
LIST OF FEATURES USED IN THE EXPERIMENT

Technique                     #positive   #negative   #unique
TF                            unigram + bigram of the texts
TF-IDF                        unigram + bigram of the texts
Vania Lexicon [8]                   414         581       994
Translated SentiWordNet [6]       17015       18028     29095
Translated Liu Lexicon [11]        1182        2402      3461
Translated AFINN [12]               878        1598      2476
InSet Lexicon                      3609        6609      9075

As the baselines, we used some existing techniques such as n-grams, the Vania Lexicon [8], and several translated well-known lexicons. Here we followed the work of Lunando et al., who used a translated SentiWordNet to perform the sentiment analysis task for the Indonesian language [5]. Using Google translation, three translated lexicons were included: SentiWordNet, the Liu Lexicon, and the AFINN Lexicon. The word distribution of each polarity is described in Table II. The Vania and Liu Lexicons were used by counting the number of words that match the corresponding sentiment class, while the features of SentiWordNet and AFINN were constructed by summing the scores of the positive or negative words that match the lexicon.
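For contrast with the InSet features above, a short sketch of the two baseline feature styles described here: count-matching for class-list lexicons (Vania, Liu) and score-summing for valence lexicons (translated SentiWordNet, AFINN). The variable names are illustrative.

def count_features(tokens, pos_words, neg_words):
    """Class-list lexicons (Vania, Liu): count matches per sentiment class."""
    return [sum(t in pos_words for t in tokens),
            sum(t in neg_words for t in tokens)]

def score_features(tokens, valence):
    """Valence lexicons (translated SentiWordNet, AFINN): sum matched scores per polarity."""
    scores = [valence.get(t, 0.0) for t in tokens]
    return [sum(s for s in scores if s > 0),
            sum(s for s in scores if s < 0)]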
B. Experiment Result

To evaluate our method, we performed cross-validation (with k = 10) and show the results in Table III. The value in each cell indicates the accuracy of each method with a particular classifier. Our experiment results show that traditional techniques such as TF and TF-IDF do not work well in classifying Indonesian tweets. The translated SentiWordNet, Liu, and AFINN lexicons show similarly weak results. This might be caused by 1) errors of the translation system, 2) the translation system not covering OOV (out-of-vocabulary) or slang words of the Indonesian language, and 3) the lexicons themselves containing too many uncommon words that are rarely used on user-generated platforms such as Twitter.


Table III
THE ACCURACY (%) OF EACH FEATURE SET
(TF through AFINN are baselines; SentiWordNet, Liu Lex., and AFINN are translated English lexicons)

Classifier    TF      TF-IDF   Vania Lex.   SentiWordNet   Liu Lex.   AFINN    InSet
NB            59.52   58.71    61.48        52.66          61.48      57.18    65.13
LR            53.58   52.89    61.44        53.83          61.48      52.13    65.78
SVM           55.56   56.21    61.48        53.76          61.48      59.51    65.51
NN            57.64   54.68    61.41        52.51          59.84      58.97    65.71

In Table III, InSet has the highest accuracy for each classifier compared to the other baselines. It reaches 65.78% at best, better than the Vania Lexicon with 61.48% accuracy. The Vania Lexicon is an Indonesian lexicon built by translating the Opinion Finder Lexicon and then seeding the words for enhancement. Factors which may cause the difference in performance are: 1) the Vania Lexicon was produced using a translation system, while the word candidates of InSet were selected directly from Indonesian-language sources; 2) the Vania Lexicon contains many formal words, while InSet contains both formal and informal words, as it was generated from Twitter data; 3) InSet was constructed by manual labeling by two native speakers, so the bias and the precision of word polarity were a priority of InSet, while the Vania Lexicon was constructed automatically. Even though our lexicon is less reproducible, we argue that this manual effort is still needed, as it works very well, much like AFINN. 4) InSet has a polarity score that indicates the positive or negative sentiment of a word, while the Vania Lexicon is only a list of words in two different classes: a positive and a negative set.

V. CONCLUSION

In this study we have constructed the InSet Lexicon, which comprises 3,609 positive and 6,609 negative words in the Indonesian language. Each word was manually labeled based on its polarity, and the lexicon was then enhanced by adding stemming and a synonym set. Our research contributions lie in two points. First, we constructed a sentiment lexicon for the Indonesian language that is more suitable for microblogs. Second, our approach outperforms all of the existing baseline methods, with a highest accuracy of 65.78%. For future work, the lexicon can be enhanced by incorporating the translated English lexicons into InSet.

REFERENCES

[1] W. J. Trybula, "Data Mining and Knowledge Discovery". In Annual Review of Information Science and Technology (ARIST), 1997, pp. 197–229.

[2] F. Bravo-Marquez, M. Mendoza, and B. Poblete, "Combining strengths, emotions and polarities for boosting Twitter sentiment analysis". In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2013.

[3] S. Raaijmakers and W. Kraaij, "A Shallow Approach to Subjectivity Classification". In ICWSM, 2008.

[4] A. F. Wicaksono, C. Vania, B. Distiawan, and M. Adriani, "Automatically Building a Corpus for Sentiment Analysis on Indonesian Tweets". In PACLIC, 2014, pp. 185–194.

[5] E. Lunando and A. Purwarianti, "Indonesian social media sentiment analysis with sarcasm detection". In Advanced Computer Science and Information Systems (ICACSIS), 2013, pp. 195–198.

[6] S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining". In LREC, 2010, pp. 2200–2204.

[7] A. R. Naradhipa and A. Purwarianti, "Sentiment classification for Indonesian message in social media". In Cloud Computing and Social Networking (ICCCSN), 2012, pp. 1–5.

[8] C. Vania, M. Ibrahim, and M. Adriani, "Sentiment Lexicon Generation for an Under-Resourced Language". In International Journal of Computational Linguistics and Applications, 2014, pp. 59–72.

[9] T. Wilson, J. Wiebe, and P. Hoffmann, "Recognizing contextual polarity in phrase-level sentiment analysis". In Proc. of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 347–354.

[10] A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision". In CS224N Project Report, Stanford, 2009, pp. 1–12.

[11] B. Liu, M. Hu, and J. Cheng, "Opinion observer: analyzing and comparing opinions on the web". In Proceedings of the 14th International Conference on World Wide Web, 2005, pp. 342–351.

[12] F. A. Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs". 2011, available at http://arxiv.org/abs/1103.2903

[13] M. Thelwall, K. Buckley, and G. Paltoglou, "Sentiment strength detection for the social web". In Journal of the American Society for Information Science and Technology, 2012, pp. 163–173.

[14] F. Koto and M. Adriani, "HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis". In Proceedings of the 7th Forum for Information Retrieval Evaluation, 2015, pp. 31–34.

[15] S. M. Mohammad and P. D. Turney, "Crowdsourcing a word-emotion association lexicon". In Computational Intelligence, 2013, pp. 436–465.

[16] F. Koto and M. Adriani, "A comparative study on Twitter sentiment analysis: Which features are good?". In International Conference on Applications of Natural Language to Information Systems, 2015, pp. 453–457.

[17] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, "Sentiment analysis of Twitter data". In Proc. of the Workshop on Languages in Social Media, 2011, pp. 30–38.

[18] F. Koto and M. Adriani, "The use of POS sequence for analyzing sentence pattern in Twitter sentiment analysis". In Advanced Information Networking and Applications Workshops (WAINA), 2015, pp. 547–551.

[19] M. Adriani, J. Asian, B. Nazief, S. M. Tahaghoghi, and H. E. Williams, "Stemming Indonesian: A confix-stripping approach". In ACM Transactions on Asian Language Information Processing (TALIP), 2007, pp. 1–33.


