
LSTM for Natural Language Processing

Vijay Shankar A and Premjith B
Center for Computational Engineering and Networking
Amrita School of Engineering
Amrita Vishwa Vidyapeetham, India
[email protected]

Abstract

This work studies the application of Long Short-Term Memory (LSTM) networks to two important classification tasks in Natural Language Processing: text classification and named entity recognition. In the former, individual sentences taken as single entities are assigned to a particular category; in the latter, every word of a sentence is classified on the basis of its category. Because contextual information matters more for named entity recognition than for text classification, we try a bidirectional LSTM for named entity recognition. While the LSTM gives an average accuracy of 84 percent on text classification, the named entity recognition model, although reaching an accuracy of about 99 percent on one set of the data, fails on another set with a low F1 score of 20 percent. The work has been carried out on non-English languages, namely Malayalam and Hindi, exploring the prospects of Natural Language Processing in Indian languages using LSTMs.

1 Introduction

Text classification is an important task in Natural Language Processing: a paradigm in which a machine or computer learns the nature of sentences in order to determine what a piece of linguistic information represents. Long Short-Term Memory networks are sequential networks which take into consideration the sequential nature of the words in a sentence, explaining the characteristics of a particular word within the bigger picture of the entire sentence rather than as an individual entity. It is well known that words in a sentence are known by the company they keep (Udupa et al., 2009). For example, we always go to a place or come from a place; the words around a word become its context. In "Raj, run to India", when a neural network wishes to understand "India", it does so on the basis of the words "to" and "run". This learning needs to continue over many iterations so that the network is able to understand the contextual information of each word, so it is important that a sequential network be used instead of a conventional neural network. The use of LSTMs for text classification is explored in Semberecki and Maciejewski (2017), where different vectorial representations are compared. The present study instead proposes direct word-to-index conversion, demonstrating the efficiency of neural networks in understanding a sentence from a minimalist sequential representation. The word embedding used here is thus direct and simple: sequential neural networks work directly on sequences, and their ability to learn and understand those sequences without any complex embedding algorithm is important and interesting. Instead of working on vectorized representations, a direct representation should suffice.

The case of named entity recognition is more demanding. The need for contextual information is similar to that of classification, but here every word is marked for its characteristics. For example, in "I am going to India", India is a location, I can be taken as the subject, and going is the verb. This is the kind of task associated with named entity recognition. In this work we classify Malayalam sentences into three categories, Business, Sports and Entertainment, while the named entity recognition task categorizes Hindi words into Location, Occupation, Date, Number, Name, Event, Things, Organization, and a miscellaneous class, Others. By and large, the number of words is highest for Others. Since a stronger grasp of contextual information is essential here, we use a bidirectional LSTM for named entity recognition, given its strong ability to learn contextual information (Melamud et al., 2016). A bidirectional LSTM considers both the previous and the next word, which helps in named entity recognition.

2 Methodology

The first step is the conversion of words into a representative form that computational algorithms can process. The simplest method, explored here, is to attach a unique identifier to each word under consideration. A Keras tokenizer (Chollet et al., 2015) has been used for the Malayalam text-to-integer conversion; in the case of Hindi, however, it gave poorer results, with a single number representing multiple words. This produces ambiguity and a considerable reduction in accuracy. Once the sentences have been converted from sequences of words into sequences of numbers, they need to be padded to vectors of equal length to facilitate training by neural networks, as sketched below. The number of sentences in the text classification task is much lower than in the named entity recognition task, and the named entity recognition data contains considerably more words labelled Others than all the other classes combined. The classes in the text classification task (Figure 1a) are balanced, whereas the word counts in Figure 1b tilt heavily in favour of Others. This is a major challenge in such classification tasks, since the Others class is trained over and over again.

Figure 1: Number of classes for Malayalam text classification (a) and Hindi named entity recognition (b).
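A minimal sketch of this word-to-index pipeline, using the Keras preprocessing utilities the paper cites, could look as follows; the example sentences, variable names and padding choice are illustrative assumptions, not taken from the paper's data:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative sentences; the paper works on Malayalam and Hindi text.
sentences = [
    "raj run to india",
    "i am going to india",
]

# Attach a unique integer identifier to every word in the corpus.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad every sequence to the length of the longest sentence so that all
# input vectors have equal length for batched training.
max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len, padding="post")
print(padded)  # one fixed-length integer vector per sentence
```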
A 64-unit LSTM is used for the classification of Malayalam text, followed by a layer of 256 neurons with ReLU activation and a final layer of 3 neurons, one per class, with sigmoid activation, as in Figure 2a. The word embedding is chosen on the basis of the input dimension of the vectors under consideration. Dropout, the deactivation of certain neurons during the training process, is applied to prevent overfitting. The difference between sigmoid and ReLU is that ReLU has no negative branch.
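A minimal Keras sketch of an architecture along these lines is given below; the vocabulary size, embedding dimension and dropout rate are assumed placeholders, since the paper does not report its exact values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

vocab_size = 10000  # assumed total number of distinct words
embed_dim = 128     # assumed embedding dimension

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embed_dim),
    LSTM(64),                        # 64-unit LSTM over the word sequence
    Dense(256, activation="relu"),   # 256-neuron ReLU layer
    Dropout(0.5),                    # deactivate neurons to curb overfitting (rate assumed)
    Dense(3, activation="sigmoid"),  # one neuron per news category
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```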
The LSTM is trained for 10 epochs; for the bidirectional LSTM, a single pass of training over the batches is carried out, since one epoch took an hour and it was considerably difficult to train the network further. The architecture of the bidirectional LSTM is given in Figure 2b. The word embedding dimension has been chosen on the basis of the total number of words, and a time-distributed output (see, for example, Altché and de La Fortelle (2017)) is used, which produces an output at every time step. The input vector length is chosen as the size of the longest sentence in both cases. For optimizing the Malayalam text classifier we used RMSprop (Dauphin et al., 2015), which divides the learning rate by a running average of recent gradients, to suit the smaller data set; the Hindi NER model used the Adam optimizer (Balles and Hennig, 2018), since its data set is considerably bigger. Adam varies the learning rate adaptively and is known to perform well on sparse inputs, which is especially relevant in Natural Language Processing, where padding makes the input matrices sparse. A sketch of the resulting sequence-labelling network is given below.
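The following is a hedged sketch of such a network, combining the bidirectional LSTM with a time-distributed output layer; all sizes, and the choice of softmax for the per-word output, are assumptions rather than the paper's reported settings:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     Dense, TimeDistributed)

vocab_size = 30000  # assumed Hindi vocabulary size
embed_dim = 128     # assumed embedding dimension
num_tags = 9        # location, occupation, date, number, name, event,
                    # things, organization, others

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embed_dim),
    # return_sequences=True yields one output per time step, so that
    # every word of the sentence receives its own label.
    Bidirectional(LSTM(64, return_sequences=True)),
    TimeDistributed(Dense(num_tags, activation="softmax")),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```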
3 Results and Discussion

The results for text classification and named entity recognition are encouraging, as depicted in Figures 3a and 3b respectively, where a heat plot of different classification metrics is given for each class. One can easily see that although accuracy is high, precision drops considerably for text classification, and recall is very low for named entity recognition. It is interesting to note that the 'datenum' class has the lowest recall; this can be understood intuitively from the fact that, although many of the words the algorithm encountered are dates and numbers, the algorithm is so selective that it rejects most of the words which actually are dates. The results of the text classification carry over to independent tests, but the named entity recognition fails considerably, giving a very low F1 score of 0.2. This is a clear sign of overfitting. With better hyperparameter tuning and by trying out different architectures, one might be able to develop a closely related architecture that gives good results even without standardized vectorial representations. The per-class metrics behind the heat plots can be computed as sketched below.

Figure 2: Network architectures: (a) LSTM for Malayalam text classification; (b) bidirectional LSTM for Hindi named entity recognition.

Figure 3: Results for the LSTM and bidirectional LSTM applied to Malayalam text classification (a) and Hindi named entity recognition (b).
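As referenced above, per-class precision, recall and F1 scores of this kind can be obtained, for instance, with scikit-learn's classification_report; the paper does not name its evaluation tooling, and the tags and predictions below are illustrative placeholders:

```python
from sklearn.metrics import classification_report

# Placeholder gold and predicted tags for a handful of words.
y_true = ["location", "others", "datenum", "others", "name"]
y_pred = ["location", "others", "others", "others", "name"]

# Prints precision, recall and F1 per class, making low-recall classes
# such as 'datenum' easy to spot.
print(classification_report(y_true, y_pred, zero_division=0))
```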
References

F. Altché and A. de La Fortelle. 2017. An LSTM network for highway trajectory prediction. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pages 353–359.

Lukas Balles and Philipp Hennig. 2018. Dissecting Adam: The sign, magnitude and variance of stochastic gradients. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 404–413, Stockholmsmässan, Stockholm, Sweden. PMLR.

François Chollet et al. 2015. Keras. https://keras.io.

Yann N. Dauphin, Harm de Vries, and Yoshua Bengio. 2015. Equilibrated adaptive learning rates for non-convex optimization.

Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 51–61, Berlin, Germany. Association for Computational Linguistics.

P. Semberecki and H. Maciejewski. 2017. Deep learning methods for subject text classification of articles. In 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pages 357–360.

Raghavendra Udupa, Abhijit Bhole, and Pushpak Bhattacharyya. 2009. "A term is known by the company it keeps": On selecting a good expansion set in pseudo-relevance feedback. In Advances in Information Retrieval Theory, pages 104–115, Berlin, Heidelberg. Springer Berlin Heidelberg.
