Dinosaurus Island -- Character-level Language Model
Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they
could find, and compiled them into this dataset (dinos.txt). (Feel free to take a look by clicking the previous link.) To create new dinosaur names,
you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly
generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!
We will begin by loading in some functions that we have provided for you in rnn_utils. Specifically, you have access to functions such as
rnn_forward and rnn_backward which are equivalent to those you've implemented in the previous assignment.
1 - Problem Statement
The characters are a-z (26 characters) plus the "\n" (or newline character), which in this assignment plays a role similar to the <EOS> (or "End of
sentence") token we had discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell
below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary
that maps each index back to the corresponding character. This will help you figure out what index corresponds to what character in the
probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are the python dictionaries.
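One way these dictionaries can be built (a minimal sketch; the provided cell may differ slightly, and it assumes dinos.txt is in the working directory):

data = open('dinos.txt', 'r').read().lower()
chars = sorted(list(set(data)))                 # the 26 letters plus '\n'
vocab_size = len(chars)                         # 27 for this dataset
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}
print(ix_to_char)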
Initialize parameters
Run the optimization loop
Forward propagation to compute the loss function
Backward propagation to compute the gradients with respect to the loss function
Clip the gradients to avoid exploding gradients
Using the gradients, update your parameters with the gradient descent update rule.
Return the learned parameters
At each time-step, the RNN tries to predict the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \ldots, x^{\langle T_x \rangle})$ is a list
of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, \ldots, y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.
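For instance, here is a sketch of one training pair for a hypothetical three-character name "abc", assuming the sorted mapping above where "\n" is index 0 and "a"-"z" are indices 1-26:

name = "abc"                                               # hypothetical short name, not from dinos.txt
X = [None] + [char_to_ix[ch] for ch in name]               # [None, 1, 2, 3]; None stands for the initial zero-vector input
Y = [char_to_ix[ch] for ch in name] + [char_to_ix["\n"]]   # [1, 2, 3, 0]; each y<t> equals x<t+1>, ending with "\n"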
You will then apply these two functions to build the model.
In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of gradients if
needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient
vector is clipped to lie between some range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the
gradient vector is greater than 10, it would be set to 10; and if any component of the gradient vector is less than -10, it would be set to -10. If it is
between -10 and 10, it is left alone.
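As a quick, standalone illustration of this element-wise rule with maxValue = 10 (using numpy's np.clip, which the hint below points to):

import numpy as np
g = np.array([-15., 2., 11.])
print(np.clip(g, -10, 10))   # [-10.   2.  10.] -- values outside [-10, 10] are saturated, the rest are unchanged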
Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold
and returns the clipped versions of your gradients. You can check out this hint (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html).
def clip(gradients, maxValue):
    '''
    Clips the gradients' values between -maxValue and maxValue.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
    # Clip each gradient in place to the range [-maxValue, maxValue] to mitigate exploding gradients
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    return gradients
In [ ]: np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
Expected output:
**gradients["dWaa"][1][2] ** 10.0
**gradients["dWax"][3][1]** -10.0
**gradients["dWya"][1][2]** 0.29713815361
**gradients["db"][4]** [ 10.]
**gradients["dby"][1]** [ 8.45833407]
2.2 - Sampling
Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture
below:
Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:
Step 1: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.
Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:
$$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b) \tag{1}$$
$$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y \tag{2}$$
$$\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle}) \tag{3}$$
Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the
character indexed by "i" is the next character. We have provided a softmax() function that you can use.
Step 3: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if
$\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use np.random.choice (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html).
np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())
This means that you will pick the index according to the distribution:
$P(\text{index}=0) = 0.1,\ P(\text{index}=1) = 0.0,\ P(\text{index}=2) = 0.7,\ P(\text{index}=3) = 0.2$.
Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You
will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then forward
propagate $x^{\langle t+1 \rangle}$ and keep repeating Steps 2-4 until you get a "\n" character, indicating you've reached the end of the dinosaur
name.
def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to the RNN's output probability distributions.

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    # Retrieve parameters and relevant shapes
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]

    # Step 1: Create the zero ("dummy") input x and initialize the hidden state a_prev as zeros
    x = np.zeros((vocab_size, 1))
    a_prev = np.zeros((n_a, 1))

    # Create an empty list of indices; this will contain the indices of the characters to generate
    indices = []
    # idx is a flag to detect a newline character; initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']

    while (idx != newline_character and counter != 50):
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # For grading purposes: make the sampling reproducible
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(np.arange(vocab_size), p=y.ravel())
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a" and the counters
        a_prev = a
        seed += 1
        counter += 1

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices
In [ ]: np.random.seed(2)
n, n_a = 20, 100
a0 = np.random.randn(n_a, 1)
i0 = 1 # first character is ix_to_char[i0]
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
Expected output:
Exercise: Implement this optimization process (one step of stochastic gradient descent).
def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """
    ### START CODE HERE ###
    # Forward propagate through time to compute the loss (rnn_forward is provided in rnn_utils)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)

    # Backpropagate through time to compute the gradients (rnn_backward is provided in rnn_utils)
    gradients, a = rnn_backward(X, Y, parameters, cache)

    # Clip the gradients to the range [-5, 5] to avoid exploding gradients
    gradients = clip(gradients, 5)

    # Update the parameters with one gradient descent step
    # (update_parameters is assumed to be provided alongside rnn_forward/rnn_backward)
    parameters = update_parameters(parameters, gradients, learning_rate)
    ### END CODE HERE ###

    return loss, gradients, a[len(X)-1]
In [ ]: np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]
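# Run one optimization step and print a few entries (a sketch matching the expected output below):
loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])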
Expected output:
**Loss** 126.503975722
**gradients["dWaa"][1][2]** 0.194709315347
**np.argmax(gradients["dWax"])** 93
**gradients["dWya"][1][2]** -0.007773876032
**gradients["db"][4]** [-0.06809825]
**gradients["dby"][1]** [ 0.01538192]
**a_last[4]** [-1.]
Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 100 steps of stochastic gradient
descent, you will sample 10 randomly chosen names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic
gradient descent visits the examples in random order.
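A minimal sketch of that preprocessing (assuming dinos.txt sits in the working directory):

data = open("dinos.txt").read().lower()
examples = [x.strip() for x in data.split('\n') if x.strip()]   # one name per line
np.random.seed(0)        # fix the seed so the shuffled order is reproducible
np.random.shuffle(examples)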
Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example
(X, Y), you can use this:
index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]
Note that we use index = j % len(examples), where j = 1, ..., num_iterations, to make sure that examples[index] is always a valid
index (index is smaller than len(examples)). The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$.
Further, this ensures that Y is equal to X but shifted one step to the left, with an additional "\n" appended to signify the end of the dinosaur
name.
def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names.

    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration.
    vocab_size -- number of unique characters found in the text, size of the vocabulary

    Returns:
    parameters -- learned parameters
    """
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size

    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)

    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)

    # Build the list of training examples and shuffle it
    # (data is assumed to hold the raw contents of dinos.txt, one name per line)
    examples = [x.strip().lower() for x in data.split('\n') if x.strip()]
    np.random.seed(0)
    np.random.shuffle(examples)

    # Initialize the hidden state
    a_prev = np.zeros((n_a, 1))

    # Optimization loop
    for j in range(num_iterations):
        # Use the hint above to define one training example (X, Y)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)

        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate dino_names names with sample() to check if the model is learning properly
        if j % 2000 == 0:
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')

            seed = 0
            for name in range(dino_names):
                # Sample indices and print the corresponding characters
                sampled_indices = sample(parameters, char_to_ix, seed)
                print(''.join(ix_to_char[ix] for ix in sampled_indices), end='')
                seed += 1  # To get the same result for grading purposes, increment the seed by one.

            print('\n')

    return parameters
Run the following cell; you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations,
your model should learn to generate reasonable-looking names.
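A sketch of the training call (assuming data holds the raw text of dinos.txt as loaded earlier):

parameters = model(data, ix_to_char, char_to_ix)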
Conclusion
You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random
characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with
hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and
macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.
If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example,
dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can
pick the coolest!
This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language
requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name model for
quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!
A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a
collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text--e.g., where a
character appearing somewhere in a sequence can influence what should be a different character much later in the sequence. These long-term
dependencies were less important with dinosaur names, since the names were quite short.
We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a
few minutes.
To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called "The Sonnets"
(shakespeare.txt).
Let's train the model for one more epoch. When it finishes training for an epoch---this will also take a few minutes---you can run
generate_output, which will prompt you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare
will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Depending on
whether you include the space at the end, your results might also differ--try it both ways, and try other inputs as well.
In [ ]: # Run this cell to try with different inputs without having to re-train the model
generate_output()
The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:
It uses LSTM cells instead of the basic RNN to capture longer-range dependencies.
The model is a deeper, stacked LSTM model (2 layers).
It uses Keras instead of raw Python/numpy to simplify the code.
If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py