
Text Generation Using Long Short-Term Memory Networks

Ishika Dhall, Shubham Vashisth and Shipra Saraswat

Department of CSE, Amity University, Sector-125, Noida, Uttar Pradesh, India

© Springer Nature Singapore Pte Ltd. 2020
D. K. Sharma et al. (eds.), Micro-Electronics and Telecommunication Engineering,
Lecture Notes in Networks and Systems 106, https://doi.org/10.1007/978-981-15-2329-8_66

Abstract The domain of natural language processing has lately achieved exceptional breakthroughs, especially since the advent of deep neural networks. This has enabled machine learning engineers to develop deep models capable of high-level automation, empowering computer systems to interact with humans competently. Using a special type of deep neural network known as the recurrent neural network, it is possible to address various applications in natural language processing, including sentiment analysis, part-of-speech tagging, machine translation, and even text generation. This paper presents a deep, stacked long short-term memory network, an advanced form of recurrent neural network, which can generate text from a random input seed. The paper also discusses the shortcomings of the conventional recurrent neural network, thereby motivating the concept of long short-term memory networks, along with the architecture and methodology adopted.

Keywords Long short-term memory networks · Recurrent neural networks · Natural language processing · Text generation · Machine learning

1 Introduction

Text generation is one of the most significant applications that the domains of natural language processing and machine learning aim to crack. Text generation, or natural language generation, is the process of deliberately generating meaningful text to achieve specific communication goals. Text generation techniques are used to perform automatic letter writing, automatic report generation, automatic documentation systems, etc.
The goal of text generation is to empower computer machines to recognize patterns in a text vocabulary and thereby produce understandable human language. Nonlinguistic information is fed to the system as input, and the expected output can simply be text, tables, graphics, plain ASCII, or formatted LaTeX, RTF, or HTML.


Fig. 1 Basic model of text generating systems
At a higher level of abstraction, it is easy to visualize the working of a text-generating system (see Fig. 1). However, traditional methods, such as the Pollen Forecast for Scotland system presented in [1], were not able to show very promising results, as they worked on simpler information and were not robust to complex data. Recent developments in machine learning and deep learning have given rise to various advanced forms of deep neural networks, such as ANNs, which are being used to crack problems in almost every domain, from classifying ECG signals [2] to problems in computer vision using CNNs and in NLP using RNNs.
This paper contributes an approach to text generation from a random seed by constructing a deep, stacked long short-term memory network model. The trained model flexibly predicts text according to the number of characters provided to it as input. The paper also covers the methodology and architecture of the LSTM model, as well as the complications with the conventional RNN model that brought LSTM networks into the picture.
The rest of the paper is organized as follows: Sect. 2 presents the preliminaries; Sect. 3 presents the related work; Sect. 4 proposes the methodology; and Sect. 5 discusses the results. Finally, the conclusion is given in Sect. 6.

2 Preliminaries

2.1 Recurrent Neural Networks

A recurrent neural network (RNN) is a special type of artificial neural network (ANN). It is designed to perform pattern recognition on sequential data, such as text, numerical time series, genomes, or handwriting. An RNN can work in situations where a feed-forward neural network fails: in a feed-forward neural network, all inputs (X) and outputs (Y) are treated as independent of one another. Therefore, when the task is to predict the next word in a sentence, a traditional feed-forward neural network will not consider the previous outputs when forecasting the subsequent word.

Fig. 2 Working of an RNN
As we know, a deep neural network consists of many hidden layers, and each hidden layer has its own set of weights and biases, e.g., (w0, b0), (w1, b1), etc. In contrast (see Fig. 2), an RNN turns these independent activations into dependent activations by sharing the same set of weights and biases across all layers. This allows the layers to be combined into a single recurrent layer with one shared set of weights and biases.
To train an RNN as presented in [3], the input is provided to the network one time step at a time, and the current state is computed from the previous state and the current input. The current state is then carried forward as the previous state for the next time step; depending on the problem for which the network is designed, one can unroll multiple time steps and combine the information from all previous states. After all time steps are completed, the output is computed from the final state. The error is evaluated by comparing this output with the target output, and the back-propagation algorithm is used to update the weights of the RNN at every iteration.

2.2 Long Short-Term Memory Networks (LSTMs)

A long short-term memory network (LSTM) [4] is a unique and superior category of recurrent neural network (RNN). It is capable of learning long-term dependencies: in the context of text generation, predicting a word in a sentence may rely not on the immediately preceding word but on information located far away from the current position in the text; such a dependency is known as a long-term dependency. The LSTM solves the problem of the "vanishing gradient" that is faced by the standard recurrent neural network (RNN).

Fig. 3 Working of a long short-term memory network

An RNN follows a simple chain structure in which a simple repeating module contains a single squashing activation function such as tanh. An LSTM also follows a chain-like construction, but its repeating module has a different internal structure (see Fig. 3).

3 Related Work

Recurrent neural networks (RNNs) are among the most powerful models in the domain of deep learning, yet the conventional RNN architecture is not well suited to character-level modeling. To overcome these problems, [5] proposed a modified variant of the RNN that resolved its training difficulties by applying Hessian-free optimization and introducing gated (multiplicative) connections. A long short-term memory network (LSTM) is a special form of recurrent neural network designed for the task of complex sequence generation. Such networks produce sequences with long-range structure by predicting one data point at a time. The model demonstrated in [6] can efficiently synthesize cursive handwriting in a wide variety of styles. Another approach to text generation uses neural checklist models [7], which generate output by dynamically adjusting the interpolation within the model. RNNs are difficult to train, and it is hard to realize the full potential of a given RNN model; to address these problems, [8] used the long short-term memory architecture and examined it on English and French language-modeling tasks, showing an improvement of around 8% over other existing RNNs. [9] proposed two novel approaches to text generation that encode the context into a continuous semantic representation and then decode that representation into text sequences with RNNs. Since RNNs have their own pitfalls, such as vanishing gradients, the present work shows how LSTMs can overcome these shortcomings.

4 Methodology

The first phase in an LSTM is to decide which information is no longer needed and should be discarded from the cell state; this is handled by the forget gate. The second phase is to decide which new information will be stored in the cell state, which is carried out in two steps: an input gate selects the values to be updated, and a tanh layer creates a vector of new candidate values that could be added to the state. Finally, an output gate determines which part of the updated cell state is emitted as the hidden state. Using this mathematical model, LSTMs can perform text generation precisely and efficiently.
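For reference, the gating steps sketched above correspond to the standard LSTM cell equations; this is the usual textbook formulation rather than a quotation from the paper (σ is the logistic sigmoid, ⊙ the element-wise product, and W, U, b the gate parameters):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate}\\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate}\\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate values}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell-state update}\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```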

4.1 Data Collection and Preprocessing

Data collection is the first step in the practical implementation. The data must be in ASCII text format and acts as the fuel for our long short-term memory network. The type of text used for training is the governing factor in the text that is eventually generated. For this model, the file "Alice in Wonderland.txt", comprising 163,780 characters, is provided as input to the network.
Data preprocessing is required in order to obtain filtered, clean data that can be fed directly to the neural network. For example, the text is converted to lower case and a mapping of unique characters to integers is created before the long short-term memory network is designed.
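A minimal sketch of this preprocessing in Python is given below; the filename, the window length of 100 characters, and the normalization step are illustrative assumptions, since the paper does not list its exact code:

```python
import numpy as np

SEQ_LENGTH = 100  # assumed context window; not specified in the paper

# Load the corpus and convert it to lower case
with open("alice_in_wonderland.txt", encoding="utf-8") as f:
    raw_text = f.read().lower()

# Map each unique character to an integer (keep the reverse mapping for decoding)
chars = sorted(set(raw_text))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for c, i in char_to_int.items()}

# Slide a window over the text: each input is SEQ_LENGTH characters and the
# target is the character that immediately follows the window
inputs, targets = [], []
for i in range(len(raw_text) - SEQ_LENGTH):
    window = raw_text[i:i + SEQ_LENGTH]
    inputs.append([char_to_int[c] for c in window])
    targets.append(char_to_int[raw_text[i + SEQ_LENGTH]])

# Reshape to (samples, time steps, features), scale, and one-hot encode targets
X = np.reshape(inputs, (len(inputs), SEQ_LENGTH, 1)) / float(len(chars))
y = np.eye(len(chars))[targets]
```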

4.2 Constructing the LSTM Model

The goal is to build a deep network by stacking LSTM layers so that the model can learn complex and long sentences efficiently. This deep LSTM model can be built using libraries such as Keras, TensorFlow, Theano, or CNTK and their functions. The LSTM, as discussed in [10], consists of several layers together with an activation function and an adaptive learning-rate optimizer, which in our case is adaptive moment estimation (Adam). Figure 4 shows the model summary of the LSTM used to perform text generation.
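A sketch of how such a stacked network could be assembled in Keras is shown below, using the hyper-parameters reported later in Table 1 (10 LSTM layers of 300 units each, he_uniform kernel initialization, batch normalization after every LSTM layer, and Adam with a learning rate of 0.001); it is a reconstruction consistent with the paper's description, not the authors' exact code:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, BatchNormalization, Dense
from tensorflow.keras.optimizers import Adam

def build_stacked_lstm(seq_length, n_features, n_vocab,
                       n_layers=10, n_units=300, lr=0.001):
    """Deep stacked LSTM for character-level text generation (per Table 1)."""
    model = Sequential()
    for i in range(n_layers):
        # Every layer except the last returns full sequences so that the
        # next LSTM layer receives a (time steps, units) input
        return_seq = i < n_layers - 1
        if i == 0:
            model.add(LSTM(n_units, return_sequences=return_seq,
                           kernel_initializer="he_uniform",
                           input_shape=(seq_length, n_features)))
        else:
            model.add(LSTM(n_units, return_sequences=return_seq,
                           kernel_initializer="he_uniform"))
        # Batch normalization after each LSTM layer to reduce overfitting
        model.add(BatchNormalization())
    # Softmax over the character vocabulary predicts the next character
    model.add(Dense(n_vocab, activation="softmax"))
    model.compile(loss="categorical_crossentropy",
                  optimizer=Adam(learning_rate=lr),
                  metrics=["accuracy"])
    return model
```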

Fig. 4 Model summary



4.3 Training and Hyper-Parameter Tuning

To train the long short-term memory network, the preprocessed data is fed to the designed LSTM model. The LSTM fits the data and updates its weights and biases accordingly. The training phase is the crux of any deep learning task, as this is where learning happens; it is therefore computationally expensive and requires substantial resources and processing power. In order to avoid overfitting, batch normalization is performed after every LSTM layer. The model was trained on the Google Colab platform with GPU support and took approximately 22 h to complete its training. While training an LSTM, optimal values for the number of epochs and the batch size need to be provided.
Hyper-parameter tuning is performed to find optimal values for parameters such as the learning rate, the number of epochs, the batch size, the number of nodes, and the number of layers.
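Putting this together, a minimal training call with the batch size and epoch count from Table 1 might look as follows (the checkpoint filename is illustrative, and checkpointing itself is a common practice rather than something the paper specifies):

```python
from tensorflow.keras.callbacks import ModelCheckpoint

model = build_stacked_lstm(seq_length=X.shape[1], n_features=X.shape[2],
                           n_vocab=y.shape[1])

# Keep only the weights with the lowest training loss observed so far
checkpoint = ModelCheckpoint("lstm_text_gen.h5", monitor="loss",
                             save_best_only=True)

# Hyper-parameters from Table 1: 100 epochs, batch size of 1000
model.fit(X, y, epochs=100, batch_size=1000, callbacks=[checkpoint])
```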

5 Results

The trained stacked long short-term memory model was successfully able to generate text for a randomly generated seed, with a testing accuracy of 71.22%, as shown in Fig. 5.
The text generated by the LSTM model is fairly realistic and interpretable. The final long short-term memory network used the hyper-parameters listed in Table 1.
Using the hyper-parameters given in Table 1, the long short-term memory network can predict the next word in a sentence while maintaining its memory of earlier context. The achieved testing accuracy of 71.22% may be increased by raising the number of epochs and adding more layers/nodes to the current network.

Fig. 5 Sample text generated using LSTM
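A character-level generation loop consistent with the preprocessing sketched earlier is shown below; the greedy argmax choice of the next character and the 500-character output length are illustrative assumptions:

```python
import numpy as np

def generate_text(model, raw_text, char_to_int, int_to_char,
                  seq_length=100, n_chars=500):
    """Generate n_chars characters from a randomly chosen seed window."""
    n_vocab = len(char_to_int)
    # Pick a random seed sequence from the corpus
    start = np.random.randint(0, len(raw_text) - seq_length - 1)
    pattern = [char_to_int[c] for c in raw_text[start:start + seq_length]]
    generated = []
    for _ in range(n_chars):
        x = np.reshape(pattern, (1, seq_length, 1)) / float(n_vocab)
        probs = model.predict(x, verbose=0)[0]
        next_idx = int(np.argmax(probs))     # greedy choice of next character
        generated.append(int_to_char[next_idx])
        pattern = pattern[1:] + [next_idx]   # slide the window forward
    return "".join(generated)
```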



Table 1 Hyper-parameters for the LSTM model

Parameter (deep LSTM model)      Configuration
Total number of LSTM layers      10
Number of nodes per LSTM layer   300
Kernel initializer               he_uniform
Optimizer                        Adam
Learning rate                    0.001
Batch size                       1000
Number of epochs                 100

6 Conclusion and Future Work

LSTM networks have proved to be among the best models available to date for performing prediction and classification over text-based data. The LSTM successfully resolves the problem faced by standard recurrent neural networks, i.e., the vanishing gradient problem. The LSTM is an efficient model, but it is computationally expensive and requires high processing power, i.e., GPUs, to fit and train. LSTMs are currently used in applications such as voice assistants, smart virtual keyboards, automated chatbots, and sentiment analysis. As future work, the accuracy of the current LSTM model may be surpassed by appending more layers and nodes to the network and by applying transfer learning within the same problem domain.

References

1. Reiter E (2007) An architecture for data-to-text systems. In: Proceedings of the eleventh European workshop on natural language generation. Association for Computational Linguistics
2. Saraswat S, Srivastava G, Shukla S (2018) Classification of ECG signals using cross-recurrence quantification analysis and probabilistic neural network classifier for ventricular tachycardia patients. Int J Biomed Eng Technol 26(2):141–156
3. Mikolov T et al (2010) Recurrent neural network based language model. In: Eleventh annual conference of the international speech communication association
4. Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Fifteenth annual conference of the international speech communication association
5. Sutskever I, Martens J, Hinton GE (2011) Generating text with recurrent neural networks. In: Proceedings of the 28th international conference on machine learning (ICML-11)
6. Graves A (2013) Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850
7. Kiddon C, Zettlemoyer L, Choi Y (2016) Globally coherent text generation with neural checklist models. In: Proceedings of the 2016 conference on empirical methods in natural language processing
8. Sundermeyer M et al (2012) LSTM neural networks for language modeling. In: INTERSPEECH
9. Tang J et al (2016) Context-aware natural language generation with recurrent neural networks. arXiv preprint arXiv:1611.09900
10. Li X, Wu X (2015) Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE
