
Emotion Classification on Youtube Comments using Word Embedding

Julio Savigny
School of Electrical and Informatics Engineering
Institut Teknologi Bandung
Bandung, Indonesia
[email protected]

Ayu Purwarianti
School of Electrical and Informatics Engineering
Institut Teknologi Bandung
Bandung, Indonesia
[email protected]

Abstract— Youtube is one of the most popular video sharing platforms in Indonesia. A person can react to a video by commenting on it, and a comment may contain an emotion that can be identified automatically. In this study, we conducted experiments on emotion classification of Indonesian Youtube comments. A corpus of 8,115 Youtube comments was collected and manually labelled with six basic emotion labels (happy, sad, angry, surprised, disgust, fear) and one neutral label. Word embedding is a popular technique in NLP and has been used in many classification tasks. A word embedding, however, represents a word, not a document, and there are several methods to use word embeddings in a text classification task. Here, we compared such methods, namely the average word vector, the average word vector weighted by TF-IDF, the paragraph vector, and a Convolutional Neural Network (CNN). We also studied the effect of the parameters used to train the word embeddings. We compared the classification performance with a baseline that was previously state of the art: SVM with unigram TF-IDF features. The experiments showed that the best performance is achieved by using word embeddings with the CNN method, with an accuracy of 76.2%, which is an improvement over the baseline.

Keywords—comments; classification; word embedding; CNN; emotion

I. INTRODUCTION

Youtube is growing rapidly and is one of the most visited sites in Indonesia [1]. Many people rely on Youtube to make money as content creators. Content creators need to cater to their audience, and one way to do so is to read the reactions in the comments. The comments may contain emotions that can be detected automatically. Content creators can then gain insight by knowing the emotional reactions to their videos and use that insight to plan their future videos more wisely.

To classify the emotions automatically, a text classification system can be built using text processing techniques. The text classification approach builds classifiers from labeled texts [2] and can be considered a supervised classification task. The technique can be described as a statistical or machine-learning approach. In a machine-learning approach, aside from the learning algorithm, the feature representation of a text also affects the performance of the classifier. Usually, the text is represented as a bag of words [3], where each document is represented by the occurrences of its words, which can furthermore be weighted by term frequency and inverse document frequency (TF-IDF). In this paper, we study another type of document representation, based on word embedding.

Word embeddings have become popular in the NLP community, mainly because they can capture the semantics of words [4]. This property has led to their use in many text classification tasks, such as sentiment analysis [5] and election classification [6]. The use of word embeddings as features for emotion classification is rare, and it has not previously been done for the Indonesian language.

However, a word embedding is a representation of a word, not a document. To represent a text document as a word-embedding feature for a classifier, various methods can be used, such as averaging the word vectors [6], using the paragraph vector method [7], or using a Convolutional Neural Network [8]. Research is therefore needed to determine the best method for using word embeddings in an emotion classification task on Youtube comments.

II. RELATED WORKS

Several studies have been conducted on emotion classification in text [9][10]. Many methods are used for different emotion classification tasks and different corpora. Word embeddings are mainly used on tweets, not Youtube comments, and are rarely applied to fine-grained text classification such as emotion classification.

There is a study on emotion classification of Youtube comments in the Thai language [10]. Six basic emotion categories are used. The data are Youtube comments collected from videos in the advertisement (AD) and music video (MV) categories. Unigrams with TF-IDF weighting are used as features. The researchers then conducted experiments with different classifier algorithms, namely SVM, Multinomial Naïve Bayes, and Decision Tree. The best accuracy is achieved with Multinomial Naïve Bayes for the MV dataset (84.48%) and SVM for the AD dataset (76.14%).

Another study on emotion classification is by Atmadja & Purwarianti [9]. The data used in that research are 7,622 tweets in Bahasa Indonesia, and rule-based and statistics-based classification were compared. In the rule-based method, the Synesketch algorithm is used and a word lexicon such as WordNet-Affect is utilized.



In the statistics-based method, the research compared n-gram features (unigram, bigram, trigram) with TF-IDF weighting. The features are further selected with information gain and a minimum frequency threshold to reduce the dimensionality. Various preprocessing techniques are also implemented and compared, and several learning algorithms (SVM, Naïve Bayes, and Decision Tree) are evaluated. The best accuracy is 83.156%, obtained by the statistical method with the SVM algorithm and unigram TF-IDF features.

Other studies have also been conducted on using word embeddings in text classification tasks. Yang et al. conducted research on using word embeddings for Twitter election classification [6]. The dataset is sampled from collected tweets about the Venezuelan parliamentary election and then manually labelled as "Election-related" or "Not Election-related". Word embedding models are constructed with the Word2Vec implementation on Twitter data and Wikipedia data, varying several parameters, namely the context window size W and the dimension size D. The word embedding models are then passed to a CNN architecture. To compare the classifiers, several baselines were used: a random classifier, SVM with TF-IDF features, and SVM with word embeddings (averaging the word vectors). The best performance is obtained by the CNN model using word embeddings with D = 800 and W = 5, which resulted in an f1-score of 0.771.

III. WORD EMBEDDING METHODS

Word embeddings are representations of words. To represent a Youtube comment as a word-embedding feature, various methods can be used. In this study, we use four different methods (illustrative code sketches for the averaging methods and the CNN representation are given after this list):

• Average Word Vector: the word vector of each word in the comment is averaged along each dimension to construct the feature vector of the comment, as in (1), where D is the document vector of the comment, W_i is the word vector of word i, and N is the number of words in the comment.

\mathbf{D} = \frac{\sum_{i=1}^{N} \mathbf{W}_i}{N}    (1)

• Average Word Vector with TF-IDF: the word vector of each word in the comment is multiplied by the TF-IDF value of that word and then averaged along each dimension to construct the feature vector of the comment, as in (2), where D is the document vector of the comment, W_i is the word vector of word i, and N is the number of words in the comment.

\mathbf{D} = \frac{\sum_{i=1}^{N} \mathbf{W}_i \times \mathrm{TFIDF}(\mathbf{W}_i)}{N}    (2)

• Paragraph Vector, using Doc2Vec [7]. This method directly learns the vector representation of a document.

• Convolutional Neural Network (CNN). The comment is represented as a matrix of word vectors, one row per word in the comment. The matrix has size S × D, where S is the length of the comment and D is the dimensionality of the word vectors. The matrix is then passed to the CNN architecture proposed by Kim Y. [8] as the embedding layer for the convolution process.
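As a concrete illustration of the two averaging methods in (1) and (2), the sketch below computes comment vectors with NumPy, assuming a trained gensim Word2Vec model and a word-to-IDF dictionary are available. It is a minimal sketch under those assumptions, not the exact code used in our experiments.

```python
from collections import Counter
import numpy as np

def average_vector(tokens, w2v, dim):
    """Eq. (1): mean of the word vectors of the tokens present in the embedding model."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def tfidf_average_vector(tokens, w2v, dim, idf):
    """Eq. (2): each word vector is weighted by its TF-IDF value, then averaged.
    `idf` is assumed to be a dict mapping a word to its IDF value."""
    tf = Counter(tokens)
    vecs = [w2v.wv[t] * (tf[t] / len(tokens)) * idf[t]
            for t in tokens if t in w2v.wv and t in idf]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

Each comment is thus mapped to a single D-dimensional vector that can be fed to an SVM classifier.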
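For the CNN method, one possible Keras realization of the Kim-style architecture [8] is sketched below. The filter widths (3, 4, 5), 100 feature maps per width, and 0.5 dropout follow the defaults in [8]; the exact hyperparameters of our experiments are not listed here, so treat the concrete values and the `build_kim_cnn` helper name as assumptions. The embedding matrix is assumed to be built from the pre-trained word vectors, and comments are assumed to be encoded as padded sequences of word indices of length S.

```python
from tensorflow.keras import layers, models

def build_kim_cnn(seq_len, dim, embedding_matrix, num_classes=7):
    """Kim-style CNN over a comment represented as an S x D matrix of word vectors."""
    inputs = layers.Input(shape=(seq_len,), dtype="int32")
    # Embedding layer initialized from the pre-trained word vectors (frozen here).
    emb = layers.Embedding(input_dim=embedding_matrix.shape[0],
                           output_dim=dim,
                           weights=[embedding_matrix],
                           trainable=False)(inputs)
    pooled = []
    for width in (3, 4, 5):                      # convolution filters of several widths
        conv = layers.Conv1D(filters=100, kernel_size=width, activation="relu")(emb)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    merged = layers.Concatenate()(pooled)
    merged = layers.Dropout(0.5)(merged)
    outputs = layers.Dense(num_classes, activation="softmax")(merged)  # 6 emotions + neutral
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```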
IV. EXPERIMENTAL SETUP

A. Dataset

The data are collected from 10 Youtube videos of various genres using the Youtube Data API v3. The 8,115 collected comments were stored and manually labeled by two people, with a third person acting as a tie-breaker whenever the two labels disagreed. The annotators were given forms containing all comments to be labelled and labelled each comment separately (independently from the other annotator). The labels used are the six basic emotions (happy, sad, surprised, disgust, fear, angry) proposed by Paul Ekman [11] and one neutral label for comments that do not contain any emotion. Table I shows the label distribution over the collected data.

TABLE I. DISTRIBUTION OF DATA

Label        Total Comments
Happy        1689
Sad          571
Surprised    427
Disgust      270
Fear         654
Angry        846
Neutral      3658

In addition to the labelled data, another 43,151 Youtube comments were collected to be used as a corpus for training the word embedding models. The data used to train the word embeddings do not need to be labelled.
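As an illustration of the collection step, comments can be fetched through the commentThreads endpoint of the YouTube Data API v3. The sketch below uses the google-api-python-client library with placeholder API key and video ID values; it is an assumption about how the crawl could be done, not our actual collection script.

```python
from googleapiclient.discovery import build

def fetch_comments(video_id, api_key, max_pages=10):
    """Collect top-level comment texts for one video via the YouTube Data API v3."""
    youtube = build("youtube", "v3", developerKey=api_key)
    comments, page_token = [], None
    for _ in range(max_pages):
        response = youtube.commentThreads().list(
            part="snippet", videoId=video_id, maxResults=100,
            textFormat="plainText", pageToken=page_token).execute()
        for item in response["items"]:
            comments.append(
                item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments

# Example with placeholder values:
# comments = fetch_comments("VIDEO_ID", "YOUR_API_KEY")
```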
B. Preprocessing

The collected data are not in a formal form; there are many irregularities, so we preprocess the data first. The preprocessing is implemented mainly in the Python programming language, using regular expression matching.

Various preprocessing techniques are employed (a regular-expression sketch follows this list), such as:

• Base processing: case folding (lowercase conversion), removal of punctuation, and conversion of URLs.
• Emoticon conversion: emoticons in the text are converted to a word that represents the emotion.
• Number removal: numbers are assumed to be irrelevant for emotion classification, so every number in the data is removed.
• Slang dictionary: the informal/slang form of a word is converted to its formal form using a slang dictionary. This step is implemented using InaNLP [14].
• Duplicate character removal: Indonesian Youtube comments often contain mistyped, non-standard words with repeated letters; the repeated letters are collapsed to one, e.g. "kerreeeen" is converted to "keren".
• One-character removal: a word that contains only one character is removed.
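A minimal sketch of these steps is given below, assuming a small hand-made emoticon map and slang dictionary (the paper uses InaNLP [14] for the slang normalization, which is not reproduced here; the dictionary entries are illustrative only).

```python
import re

EMOTICONS = {":)": "senang", ":(": "sedih"}   # illustrative entries only
SLANG = {"gak": "tidak", "bgt": "banget"}      # illustrative entries only

def preprocess(comment):
    text = comment.lower()                               # case folding
    for emo, word in EMOTICONS.items():                  # emoticon conversion
        text = text.replace(emo, " " + word + " ")
    text = re.sub(r"https?://\S+", " url ", text)        # convert URLs to a token
    text = re.sub(r"\d+", " ", text)                     # number removal
    text = re.sub(r"[^\w\s]", " ", text)                 # punctuation removal
    text = re.sub(r"(\w)\1+", r"\1", text)               # collapse repeated letters
    tokens = [SLANG.get(t, t) for t in text.split()]     # slang normalization
    tokens = [t for t in tokens if len(t) > 1]           # one-character removal
    return " ".join(tokens)

print(preprocess("Kerreeeen bgt videonya!!! :)"))  # -> "keren banget videonya senang"
```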
C. Baseline

To evaluate the various word embedding models, we use a baseline that obtained the best performance in previous research [9]: a Support Vector Machine classifier with unigram TF-IDF weighting as features.

An experiment on feature selection using Information Gain is also conducted, to study the effect of the feature set size on performance. Feature selection also reduces the dimensionality of the features.
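A sketch of this baseline with scikit-learn is shown below. Information Gain is approximated with mutual_info_classif, since scikit-learn does not ship an estimator under that name, and a linear SVM is assumed; both are assumptions for illustration. The number of selected features k is swept from 200 to 9,000, as described in Section V.A.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def baseline_accuracy(comments, labels, k):
    """Unigram TF-IDF + top-k feature selection + linear SVM, 10-fold cross validation."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),     # unigram TF-IDF features
        ("select", SelectKBest(mutual_info_classif, k=k)),  # Information-Gain-like selection
        ("svm", LinearSVC()),
    ])
    scores = cross_val_score(pipeline, comments, labels, cv=10, scoring="accuracy")
    return scores.mean()

# The feature count is swept over the range described in Section V.A, e.g.:
# for k in range(200, 9001, 400):
#     print(k, baseline_accuracy(comments, labels, k))
```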
D. Word Embedding

In this study, several implementations are used to train the word embeddings, namely Word2Vec [12], Doc2Vec [7], and GloVe [13]. For each implementation, we generate a set of word embedding models by tuning its parameters. For Word2Vec, the parameters and their values are architecture A = {skip-gram, continuous bag-of-words}, context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, negative sampling N = {5, 10, 15, 20, 25, 30}, and iterations I = {5, 10, 15, 20, 25, 30}. For Doc2Vec, the parameters and their values are architecture A = {distributed memory, distributed bag-of-words}, context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, negative sampling N = {5, 10, 15, 20, 25, 30}, and iterations I = {5, 10, 15, 20, 25, 30}. For GloVe, the parameters and their values are context window size W = {5, 7, 9, 11, 13, 15}, dimensionality D = {100, 200, 300, 400, 500}, and iterations I = {5, 10, 15, 20, 25, 30}.

Aside from the parameter tuning of the word embedding models, an experiment on the word embedding methods is also conducted. There are four methods: Average Word Vector, Average Word Vector with TF-IDF, Paragraph Vector, and Convolutional Neural Network. The parameters of the CNN are based on the previous research [8].
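The parameter search can be sketched as a one-parameter-at-a-time sweep with gensim, as described in Section V.B (GloVe has its own reference trainer and is omitted here). The grids copy the Word2Vec values listed above; the tokenized training corpus and the evaluation function (averaged vectors plus SVM with 10-fold cross validation) are assumed to exist, and the parameter names follow gensim version 4 or later.

```python
from gensim.models import Word2Vec

# Grids from Section IV.D for Word2Vec; sg=1 is skip-gram, sg=0 is CBOW.
GRID = {
    "sg":          [1, 0],
    "window":      [5, 7, 9, 11, 13, 15],
    "vector_size": [100, 200, 300, 400, 500],
    "negative":    [5, 10, 15, 20, 25, 30],
    "epochs":      [5, 10, 15, 20, 25, 30],
}

def sweep(corpus_tokens, evaluate, defaults):
    """Tune one parameter at a time, keeping the others fixed at the current best values."""
    best = dict(defaults)                         # defaults holds a starting value per parameter
    for name, values in GRID.items():
        scores = {}
        for value in values:
            params = {**best, name: value}
            model = Word2Vec(sentences=corpus_tokens, min_count=1, **params)
            scores[value] = evaluate(model)       # e.g. SVM + 10-fold CV on averaged vectors
        best[name] = max(scores, key=scores.get)  # keep the best value for later sweeps
    return best
```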
V. EXPERIMENTAL RESULTS

A. Baseline

The features used are unigrams with TF-IDF. The vocabulary of the corpus contains about 9,000 words, so to reduce the features and select the most important ones we vary the number of selected features from 200 to 9,000, sorted by each feature's Information Gain, and then validate using 10-fold cross validation. The algorithm used is the Support Vector Machine.

Fig. 1. Experimental Result in Feature Selection using Information Gain

Fig. 1 shows the effect of the number of selected features on accuracy. Accuracy increases until about 1,200 of the most important features are selected and generally decreases after that. It can be assumed that many words or features are unimportant and do not discriminate well between the labels. The accuracy with 1,200 features is about 74.1%, and it is set as the baseline accuracy.

B. Word Embedding

To tune the parameters of the word embedding models, we construct the models and then validate them using SVM and 10-fold cross validation. The word embedding method used for this is the average word vector, except for the Doc2Vec model, which uses the paragraph vector method. For each parameter, we keep the other parameters fixed while varying the parameter to be tuned; the best value found is then used when tuning the subsequent parameters. For the architecture, in Word2Vec, skip-gram generally gives the better performance, with about 3% higher accuracy. In Doc2Vec, distributed bag-of-words (DBOW) is better, with about 1.6% higher accuracy.

Fig. 2. Effect of dimensionality on accuracy.

The effect of dimensionality D can be seen in Fig. 2. In Word2Vec, generally, the larger the dimension, the better the accuracy; the best dimensionality for Word2Vec is about 400. In Doc2Vec and GloVe, however, the same behaviour does not hold: accuracy increases until D = 200 and then slightly declines.

Fig. 3. Effect of context window size on accuracy.

Fig. 3 illustrates the effect of the context window size W on accuracy. Generally, the context window has an insignificant effect on accuracy. In Word2Vec, accuracy is generally better with a larger context window size, and it is generally flat for the other models.

Fig. 4. Effect of iteration on accuracy.

The effect of the number of iterations I on accuracy can be seen in Fig. 4. Generally, the larger the number of iterations, the better the accuracy. In Word2Vec, the improvement seems to stabilize at I = 15. In Doc2Vec, it seems to stabilize at I = 20, although I = 30 resulted in better accuracy. In GloVe, the accuracy has not yet stabilized at I = 30, and if time permits it would be better to increase the number of iterations further.

Fig. 5. Effect of negative sample on accuracy.

Fig. 5 shows the effect of the negative sampling size N on the accuracy of the Word2Vec and Doc2Vec models. In the Word2Vec model, the effect of negative sampling is negligible, although accuracy increases slightly when the negative sampling size is larger. In Doc2Vec, however, the negative sampling size is negatively correlated with accuracy: accuracy increases until N = 10 and then decreases.

After the parameter tuning is done, we set each parameter to the value that gives the best accuracy. For Word2Vec, the architecture A = skip-gram, dimensionality D = 400, context window size W = 15, iterations I = 15, and negative sampling size N = 20. For Doc2Vec, the architecture A = DBOW, dimensionality D = 200, context window size W = 15, iterations I = 30, and negative sampling size N = 10. For GloVe, the parameters are dimensionality D = 200, context window size W = 13, and iterations I = 30. Generally speaking, Word2Vec gives a much better performance than GloVe with the average vector method, so we use only Word2Vec and Doc2Vec for subsequent experiments.
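Under the assumption that gensim (version 4 or later) is used for training, the best configurations reported above would correspond roughly to the constructor calls below; `corpus_tokens` stands for the tokenized 43,151-comment training corpus, and the helper name is illustrative.

```python
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_best_models(corpus_tokens):
    """Train Word2Vec and Doc2Vec with the best parameters reported in Section V.B.
    `corpus_tokens` is a list of token lists (the 43,151-comment training corpus)."""
    # Word2Vec: skip-gram, D = 400, W = 15, I = 15, N = 20.
    w2v = Word2Vec(sentences=corpus_tokens, sg=1, vector_size=400,
                   window=15, epochs=15, negative=20, min_count=1)
    # Doc2Vec: DBOW, D = 200, W = 15, I = 30, N = 10.
    tagged = [TaggedDocument(tokens, [i]) for i, tokens in enumerate(corpus_tokens)]
    d2v = Doc2Vec(documents=tagged, dm=0, vector_size=200,
                  window=15, epochs=30, negative=10, min_count=1)
    return w2v, d2v
```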

After setting the parameters of the word embeddings, we proceed to compare the methods of representing the comments as embedding features. We build the features using the four methods stated before and then validate with 10-fold cross validation. SVM is used to train and classify with the Average Vector, Average Vector with TF-IDF, and Paragraph Vector features, and the CNN algorithm is used for the CNN method.

TABLE II. ACCURACY OF WORD EMBEDDING METHODS

Method                        Accuracy
Average Vector                70.7%
Average Vector with TF-IDF    69.5%
Paragraph Vector              59.1%
CNN                           73.4%

Table II shows that the best accuracy is obtained by the CNN method. However, none of the methods beats the accuracy of the baseline. In an attempt to increase accuracy, a feature selection experiment is conducted: in the input data, only words that are included in the 1,200 best words from the baseline feature selection experiment are considered.
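A sketch of this filtering step is shown below: before the embedding features are built, each comment is reduced to the tokens that appear in the set of 1,200 words selected by Information Gain in the baseline experiment. The variable names are illustrative.

```python
def filter_to_selected_vocab(tokenized_comments, selected_words):
    """Keep only the tokens that belong to the 1,200 words chosen by the
    Information Gain feature selection of the baseline experiment."""
    selected = set(selected_words)
    return [[t for t in tokens if t in selected] for tokens in tokenized_comments]

# filtered = filter_to_selected_vocab(tokenized_comments, top_1200_words)
# The filtered comments are then fed to the averaging, paragraph vector, or CNN methods.
```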

TABLE III. ACCURACY OF WORD EMBEDDING METHODS WITH FEATURE SELECTION

Method                        Accuracy
Average Vector                71.5%
Average Vector with TF-IDF    71.1%
Paragraph Vector              59.5%
CNN                           76.2%

Table III shows the effect of the feature selection on the accuracy of the various methods. The accuracy is better for every method compared with the accuracy without feature selection. One of the methods, the CNN method, gives a better accuracy, 76.2%, compared to the baseline accuracy of 74.1%. Still, there is room for improvement. The CNN method works better for this task because it can capture the sequence of the words inside a comment, which the other representations cannot (this is discussed further in Section VI).

From an analysis of the errors, some of them are caused by ambiguous emotions in the comments. There is also a lack of negation handling: for example, "tidak sedih" ("not sad") is classified as sad because it contains the word "sedih", without considering the negation context. Unoptimized preprocessing also causes errors: a word like "waokoawkokwaok" could be interpreted as laughing, and thus as the happy emotion, but it is not converted to its base form "tertawa" ("to laugh"). Also, angry comments, which are usually typed in all capital letters, can be misclassified as neutral because of the case folding in the preprocessing step.
VI. CONCLUSION AND FUTURE WORKS

From the experiments that have been done, we may conclude that using word embeddings for the emotion classification task on Youtube comments can increase the classification accuracy. The best way to use word embeddings for this classification is the Convolutional Neural Network algorithm. The accuracy obtained by using word embeddings with the CNN is 76.2%, which is better than using SVM with unigram TF-IDF features. The CNN method also gives better accuracy than the other word embedding methods. This can be explained by the fact that the neural network approach can capture more features, such as the sequence of the words inside a comment (by convolving over the word matrix, it depends on the order of the words), whereas the other methods are naïve aggregations of word vectors that cannot capture word order. For the word embedding models, to obtain the best result, it is recommended to fine-tune the parameters for every classification task and dataset.

There is still room for improvement, and this research can be developed further. In future work, one could handle negation in the text and add other features, such as a capital-letter feature. Other emotion categorizations, such as the dimensional model, could also be used. The word embedding model could be improved by training on a much larger corpus, and the labelled data could be larger and more balanced to give better results. The preprocessing techniques could also be optimized. Furthermore, the CNN method could be combined with the SVM algorithm: rather than using the softmax layer as the classifier for the features generated by the CNN, an SVM could be used.
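As a sketch of this last idea, the penultimate layer of a trained CNN (such as the one sketched after Section III) could be used as a feature extractor, with a linear SVM trained on those features. This is an assumed realization of the proposed future work, not an evaluated method.

```python
from sklearn.svm import LinearSVC
from tensorflow.keras import models

def cnn_features_plus_svm(cnn_model, x_train, y_train, x_test):
    """Use the CNN's penultimate layer (the concatenated, pre-softmax features)
    as input to a linear SVM instead of the softmax classifier."""
    feature_extractor = models.Model(inputs=cnn_model.input,
                                     outputs=cnn_model.layers[-2].output)
    train_feats = feature_extractor.predict(x_train)
    test_feats = feature_extractor.predict(x_test)
    svm = LinearSVC().fit(train_feats, y_train)
    return svm.predict(test_feats)
```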

REFERENCES
[1] SimilarWeb Ltd. (2016, November 1). Top Websites in Indonesia - SimilarWeb Website Ranking. Retrieved from SimilarWeb: https://www.similarweb.com/top-websites/indonesia
[2] Danisman, T., & Alpkocak, A. (2008). Feeler: Emotion Classification of Text Using Vector Space Model. AISB 2008 Convention: Communication, Interaction and Social Intelligence, 53.
[3] Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell: Kluwer Academic Publishers.
[4] Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. NAACL-HLT.
[5] Yin, Y., & Jin, Z. (2015). Document Sentiment Classification based on the Word Embedding. 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering.
[6] Yang, X., Macdonald, C., & Ounis, I. (2016). Using Word Embeddings in Twitter Election Classification. CoRR.
[7] Le, Q., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. ICML.
[8] Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. CoRR.
[9] Atmadja, A. R., & Purwarianti, A. (2015). Comparison on the rule based method and statistical based method on emotion classification for Indonesian Twitter text. 2015 International Conference on Information Technology Systems and Innovation (ICITSI).
[10] Sarakit, P., Theeramunkong, T., & Haruechaiyasak, C. (2015). Classifying emotion in Thai Youtube comments. 2015 6th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES). IEEE.
[11] Ekman, P. (1972). Universals and Cultural Differences in Facial Expressions of Emotions. Nebraska Symposium on Motivation, 207-282.
[12] Mikolov, T., Corrado, G., Chen, K., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. NIPS.
[13] Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. EMNLP, pp. 1532-1543.
[14] Purwarianti, A., Andhika, A., Wicaksono, A. F., & Afif, I. (2016). InaNLP: Indonesia natural language processing toolkit, case study: Complaint tweet classification. 2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA), 1-5.
