Emotion Classification on Youtube Comments Using Word Embedding
Abstract— Youtube is one of the most popular video sharing platforms in Indonesia. A person can react to a video by commenting on it. A comment may contain an emotion that can be identified automatically. In this study, we conducted experiments on emotion classification of Indonesian Youtube comments. A corpus containing 8,115 Youtube comments was collected and manually labelled using six basic emotion labels (happy, sad, angry, surprised, disgust, fear) and one neutral label. Word embedding is a popular technique in NLP and has been used in many classification tasks. A word embedding is a representation of a word, not a document, and there are many methods for using word embeddings in a text classification task. Here, we compared several such methods, namely average word vector, average word vector with TF-IDF, paragraph vector, and a Convolutional Neural Network (CNN). We also studied the effect of the parameters used to train the word embeddings. We compared the classification performance with a baseline that was previously state of the art: SVM with unigram TF-IDF. The experiments showed that the best performance is achieved by word embedding with the CNN method, with an accuracy of 76.2%, an improvement over the baseline.

Keywords—comments; classification; word embedding; CNN; emotion

I. INTRODUCTION

Youtube is rapidly growing and is one of the most visited sites in Indonesia [1]. Many people rely on Youtube to make money as content creators. Content creators need to cater to their audiences, and one of the ways is to read the reactions in the comments. The comments may contain emotion that can be detected automatically. Content creators can then gain insight by knowing the emotional reactions to their videos and use that insight to create their future videos more wisely.

To classify the emotion automatically, a text classification system can be built using text processing techniques. The text classification approach is to build classifiers from labeled texts [2], and it can be considered a supervised classification task. The technique can be described as a statistical or machine-learning approach. In a machine-learning approach, aside from the learning algorithm, the feature representation of a text also affects the performance of the classifier. Usually, the text is represented as a bag-of-words [3], where each document is represented by the occurrences of words, which can furthermore be weighted using term frequency and inverse document frequency (TF-IDF). In this paper, we study another type of document representation: word embedding.

Word embeddings have become popular in the NLP community, mainly because they can capture the semantics of words [4]. This feature has led to their use in many text classification tasks, such as sentiment analysis [5] and election classification [6]. The use of word embeddings as features for emotion classification is rare, and they have never been used for the Indonesian language.

However, a word embedding is a representation of a word, not a document. To represent a text document as a word embedding feature for a classifier, various methods can be used, such as averaging the word vectors [6], using the paragraph vector method [7], and using a Convolutional Neural Network [8]. Research is therefore needed to determine the best method for using word embeddings in an emotion classification task on Youtube comments.

II. RELATED WORKS

Several studies have been conducted on emotion classification in text [9][10]. Many methods are used for different emotion classification tasks and different corpora. Word embeddings are mainly used on tweets, not Youtube comments, and are rarely applied to fine-grained text classification such as emotion classification.

There is a study on emotion classification for Youtube comments in the Thai language [10]. Six basic emotion categories are used. The data are Youtube comments collected from videos in the advertisement (AD) and music video (MV) categories. Unigrams with TF-IDF weighting are used as features. The researchers then conducted experiments using different classifier algorithms, namely SVM, Multinomial Naïve Bayes, and Decision Tree. The best accuracy is achieved with Multinomial Naïve Bayes for the MV dataset (84.48%) and SVM for the AD dataset (76.14%).

Another study on an emotion classification task is by Atmadja & Purwarianti [9]. The data used in this research are 7,622 tweets in Bahasa Indonesia. In this research, rule-based and statistical classification were compared. In the rule-based method, the Synesketch algorithm is used and a word lexicon such as WordNet-Affect is utilized. In the statistical based […]
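Tying back to the introduction's bag-of-words discussion: the unigram TF-IDF weighting used by the baseline can be sketched as follows. This is a minimal illustration using one common TF-IDF variant and made-up Indonesian comment tokens; the paper does not specify its exact weighting formula.

```python
import math
from collections import Counter

# Toy tokenized comments standing in for the Youtube corpus (made-up examples).
docs = [
    ["keren", "banget", "videonya"],
    ["sedih", "banget", "lihat", "videonya"],
    ["sedih", "sekali"],
]

def tfidf(doc, corpus):
    """Unigram TF-IDF weights for one tokenized document.

    TF is the raw term count in the document; IDF is log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(corpus)
    weights = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)
        weights[term] = count * math.log(n / df)
    return weights

weights = tfidf(docs[0], docs)
# "keren" occurs in only one document, so it outweighs the common "banget"
```

Each comment becomes a sparse vector of such weights over the unigram vocabulary, which is then fed to the SVM classifier.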
• Paragraph Vector, using Doc2Vec [7]. This method directly learns the vector representation of a document.

• Convolutional Neural Network (CNN). The comment is represented as an array of word vectors, or a matrix, of each word inside the comment. The matrix is of size S × D, where S is the length of the comment and D is the dimensionality of the word vectors.

[…] a letter in a word that occurs more than once is converted to one letter, e.g. "kerreeeen" is converted to "keren".

• One character removal, a word that contains only one character is removed.
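The two normalization steps above can be sketched in Python. The `preprocess` function and its lowercasing step are illustrative assumptions, not the paper's exact pipeline.

```python
import re

def preprocess(comment: str) -> str:
    # Lowercasing is assumed here as a common first step.
    tokens = comment.lower().split()
    cleaned = []
    for tok in tokens:
        # Repeated letter removal: collapse runs of the same character
        # to a single character, e.g. "kerreeeen" -> "keren".
        tok = re.sub(r"(.)\1+", r"\1", tok)
        # One character removal: drop tokens of a single character.
        if len(tok) > 1:
            cleaned.append(tok)
    return " ".join(cleaned)

print(preprocess("Kerreeeen x bangeeet"))  # -> "keren banget"
```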
C. Baseline

To evaluate the various word embedding models, we use a baseline that obtained the best performance in previous research [9]. Support Vector Machine is used as the classifier, and unigrams with TF-IDF weighting are used as the features.

An experiment on feature selection using Information Gain is also conducted, to study the effect of the feature set size on performance. Feature selection also reduces the dimensionality of the features.

D. Word Embedding

In this study, several implementations are used to train the word embeddings, namely Word2Vec [12], Doc2Vec [7], and GloVe [13]. For each implementation, we generate a set of word embedding models by tuning the parameters. In Word2Vec, the parameters and their values are architecture A = {skip-gram, continuous bag-of-words}, context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, negative sampling N = {5, 10, 15, 20, 25, 30}, and iteration I = {5, 10, 15, 20, 25, 30}. For Doc2Vec, the parameters and their values are architecture A = {distributed memory, distributed bag-of-words}, context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, negative sampling N = {5, 10, 15, 20, 25, 30}, and iteration I = {5, 10, 15, 20, 25, 30}. For GloVe, the parameters and their values are context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, and iteration I = {5, 10, 15, 20, 25, 30}.

Aside from the parameter tuning on the word embedding models, an experiment on the word embedding methods is also conducted. There are four methods: Average Word Vector, Average Word Vector with TF-IDF, Paragraph Vector, and Convolutional Neural Network. The parameters of the CNN are based on previous research [8].

V. EXPERIMENTAL RESULTS

[…] then validated using 10-fold cross validation. The algorithm used is Support Vector Machine.

Fig. 1 shows the effect of the number of selected features on accuracy. We can see that accuracy increases until roughly the 1,200 most important features are selected, and generally decreases after that. It can be assumed that many words or features are unimportant and do not discriminate well between the labels. The accuracy with 1,200 features is about 74.1%, and this is set as the baseline accuracy.

B. Word Embedding

To tune the parameters of the word embedding models, we construct the models and then validate them using SVM with 10-fold cross validation. The word embedding method used here is the average word vector, except for the Doc2Vec models, which use the paragraph vector method. For each parameter, we hold the other parameters fixed while varying the parameter being tuned; the best value is then used when tuning the subsequent parameter. For the architecture, in Word2Vec, skip-gram generally gives the better performance, with about 3% higher accuracy. In Doc2Vec, distributed bag-of-words (DBOW) is better, with about 1.6% higher accuracy.

Fig. 2. Effect of dimensionality on accuracy.
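The average word vector method used in this tuning step can be sketched as follows; the embedding values below are made-up toy numbers with D = 3, while the paper tunes D between 100 and 500.

```python
# Toy word-embedding lookup (illustrative values, not trained vectors).
embeddings = {
    "keren":  [0.9, 0.1, 0.0],
    "banget": [0.4, 0.4, 0.2],
    "sedih":  [-0.8, 0.3, 0.1],
}

def average_word_vector(tokens, embeddings):
    """Represent a comment as the elementwise mean of its known word vectors.

    Out-of-vocabulary tokens are skipped; an all-zero vector is returned
    when no token is known.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

doc_vec = average_word_vector(["keren", "banget", "wow"], embeddings)
# elementwise mean of the "keren" and "banget" vectors; "wow" is ignored
```

As commonly implemented, the Average Word Vector with TF-IDF variant replaces this uniform mean with a mean weighted by each word's TF-IDF score.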
Fig. 4. Effect of iteration on accuracy.

For the effect of iteration I on accuracy, see Fig. 4. It shows that, generally, the larger the iteration count, the better the accuracy. In Word2Vec, the improvement seems to stabilize at I = 15. In Doc2Vec, it seems to stabilize at I = 20, although I = 30 resulted in better accuracy. In GloVe, accuracy has not yet stabilized at I = 30, and if time permits, it is better to increase the iteration count further.

TABLE II
Method                        Accuracy
Average Vector                70.7%
Average Vector with TF-IDF    69.5%
Paragraph Vector              59.1%
CNN                           73.4%

In Table II we can see that the best accuracy is obtained by the CNN method. The accuracy of all the methods still cannot beat the accuracy of the baseline. In an attempt to increase accuracy, a feature selection experiment is conducted: in the input data, only words included in the 1,200 best words from the feature selection experiment on the baseline are considered.
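The Information Gain scoring behind this feature selection can be sketched as follows; this is a minimal two-class illustration with made-up comments, while the paper's task has seven labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG(term) = H(labels) - H(labels | term present/absent)."""
    n = len(labels)
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for part in (with_term, without_term):
        if part:
            conditional += len(part) / n * entropy(part)
    return entropy(labels) - conditional

# Toy labelled comments (hypothetical): "sedih" perfectly predicts the label,
# while "banget" appears in both classes and so carries no information.
docs = [{"sedih", "banget"}, {"sedih"}, {"keren", "banget"}, {"keren"}]
labels = ["sad", "sad", "happy", "happy"]
# information_gain(docs, labels, "sedih") is 1.0 bit; "banget" scores 0.0
```

Ranking all vocabulary terms by this score and keeping the top 1,200 yields the reduced feature set.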
Method                        Accuracy
Average Vector                71.5%
Average Vector with TF-IDF    71.1%
Paragraph Vector              59.5%
CNN                           76.2%
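The core operation of the best-performing CNN method can be sketched in pure Python: one convolution filter of height h slides over the S × D comment matrix described earlier, followed by ReLU and 1-max pooling. The real model [8] learns many such filters and adds a classification layer; the numbers below are made up.

```python
def relu(x):
    return max(0.0, x)

def conv_feature(matrix, filt, h):
    """One convolution filter over a comment matrix (S rows of D values).

    The filter spans h consecutive word vectors; each window is flattened,
    dotted with the filter, and passed through ReLU. 1-max pooling over all
    positions yields a single feature regardless of the comment length S.
    """
    S = len(matrix)
    activations = []
    for i in range(S - h + 1):
        window = [v for row in matrix[i:i + h] for v in row]  # flatten h x D
        activations.append(relu(sum(w * x for w, x in zip(filt, window))))
    return max(activations)

# Toy comment matrix with S = 3 words and D = 2 dimensions (made-up values),
# and a filter of height h = 2, so it spans 2 * D = 4 values.
matrix = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
feature = conv_feature(matrix, [1.0, 1.0, 1.0, 1.0], h=2)
# the two window activations are 2.0 and 3.0; max pooling keeps 3.0
```

A full model would concatenate the pooled features from all filters and feed them to a softmax over the seven labels.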
REFERENCES
[1] SimilarWeb Ltd. (2016, November 1). Top Websites in Indonesia -
SimilarWeb Website Ranking. Retrieved from Similar Web:
https://www.similarweb.com/top-websites/indonesia