2 Methodology
The first step is the conversion of words into a representative form that computational algorithms can work with. The simplest method, which has been explored in this work, is to attach a unique identifier to each word under consideration. A Keras tokenizer (Chollet et al., 2015) has been used for the Malayalam text-to-integer conversion, but in the case of Hindi it gave poorer results, with a single number representing multiple words. This will result in ambiguity and a considerable reduction in accuracy. Once the sentences are converted from a sequence of words into a sequence of numbers, they need to be padded to vectors of equal length in order to facilitate training by neural networks.
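A minimal sketch of this tokenization and padding step using the Keras tokenizer; the sentence list and variable names are illustrative:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative input; in practice this is the Malayalam corpus.
sentences = ["first example sentence", "second example"]

tokenizer = Tokenizer()              # assigns a unique integer id to each word
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad every sequence to the length of the longest sentence so that
# all input vectors have equal length.
max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len, padding='post')
```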
The number of sentences in the text classification task is much lower than the number of sentences in the Named Entity Recognition task. The Named Entity Recognition task also has a considerably higher number of words classified as 'others' than all other classes combined: the class distribution in the text classification task (Figure 1a) is balanced, while the word counts in Figure 1b tilt heavily in favour of 'others'. This is a big challenge in classification tasks, since the 'others' class is trained on over and over again.

[Figure 1: Number of classes for Malayalam text classification and Hindi Named Entity Recognition. (a) Malayalam news task classes; (b) Hindi Named Entity Recognition categories.]
A 64-unit LSTM is used for the classification of Malayalam text, with a 256-neuron layer with ReLU activation; the final layer, with 3 neurons representing the classes, is given a sigmoid activation, as in Figure 2a. The word embedding is chosen on the basis of the input dimension of the vector under consideration. Dropout, meaning the deactivation of certain neurons during the training process, is applied to prevent overfitting. The difference between sigmoid and ReLU is that ReLU does not have a negative branch.
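A minimal Keras sketch of this classifier; the layer sizes and activations follow the text, while the vocabulary size, embedding dimension, input length, and dropout rate are placeholder assumptions:

```python
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 20000   # assumption: not specified in the text
EMBED_DIM = 100      # assumption
MAX_LEN = 120        # assumption: length of the longest sentence

model = Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(64),                        # 64-unit LSTM
    layers.Dense(256, activation='relu'),   # 256-neuron ReLU layer
    layers.Dropout(0.5),                    # assumption: dropout rate not given
    layers.Dense(3, activation='sigmoid'),  # one neuron per class
])
model.compile(optimizer='rmsprop',          # RMSprop, as used for the smaller data set
              loss='categorical_crossentropy', metrics=['accuracy'])
```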
For the bidirectional LSTM, a single training epoch is carried out on batches; its architecture is given in Figure 2a. That single epoch took one hour, and it was considerably difficult to train the network further, in contrast with the LSTM, which was trained for 10 epochs. The word embedding dimension has been chosen on the basis of the total number of words, and a time-distributed output (see, for example, Altché and de La Fortelle, 2017) is used, which produces an output for every time step of the training. The input vector length is chosen as the size of the longest sentence in both cases. For optimization, RMSprop (Dauphin et al., 2015), which divides the learning rate by a running average of recent gradient magnitudes, was used to suit the smaller data set, while the Hindi NER used the Adam optimizer (Balles and Hennig, 2018) for training, since that data set is considerably bigger. Adam varies the learning rate adaptively and is known to perform well on sparse matrices, which is especially relevant for Natural Language Processing, since we work with sparse matrices owing to all the padding.
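Concretely, RMSprop keeps a running average of squared gradients, v_t = rho * v_{t-1} + (1 - rho) * g_t^2, and scales each update by lr / sqrt(v_t + eps). A minimal sketch of the NER model with a time-distributed output, compiled with Adam as described above; the unit count, output activation, and tag count are placeholder assumptions:

```python
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 30000   # assumption: not specified in the text
EMBED_DIM = 100      # assumption
MAX_LEN = 150        # placeholder: length of the longest sentence
NUM_TAGS = 10        # assumption: number of NER categories

model = Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # return_sequences=True keeps one hidden state per time step
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # the same classifier is applied at every time step,
    # giving one tag prediction per word
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation='softmax')),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```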
3 Results and Discussion

The results for the text classification and the Named Entity Recognition are encouraging, as depicted in 3a and 3b respectively, where a heat plot of different classification metrics is shown for each of the classes. One can easily see that although the accuracy is high, the precision comes down considerably for the text classification, and the recall is very low for the Named Entity Recognition. It is interesting to note that the 'datenum' class has the lowest recall; this can be intuitively understood from the fact that although many of the words the algorithm encountered are dates and numbers, the algorithm

[Figure 3(a): Malayalam news task]
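Per-class precision and recall of the kind shown in these heat plots can be computed, for instance, with scikit-learn; the library choice and label names here are illustrative, not from the text:

```python
from sklearn.metrics import classification_report

# Illustrative gold and predicted labels for the classification task.
y_true = ["sports", "politics", "sports", "business"]
y_pred = ["sports", "sports",   "sports", "business"]

# Per-class precision, recall, and F1 -- the metrics visualized in the heat plots.
print(classification_report(y_true, y_pred, zero_division=0))
```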