Keras1 - 1.4 Advanced Model Architectures
Keras backend
If we import the Keras backend, we can build a function that takes in an input tensor
from a given layer and returns an output tensor from another (or the same) layer.
TensorFlow is the backend Keras uses in this course, but it could be any other, like
Theano. To define the function with our backend K, we need to give it a list of inputs
and a list of outputs, even if we just want one input and one output. We can then call
the function on a tensor with the same shape as the input layer given during its
definition. If the weights of the layers between our input and output change, the
function's output for the same input will change as well. We can use this to see how
the output of certain layers changes as weights are adjusted during training; we will
check this in the exercises!
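Here is a minimal sketch of how such a function could be built; it assumes a variable
model holding an already-built Keras model and test data X_test shaped like its input
layer (both assumptions here):

import keras.backend as K

# Input tensor of the first layer and output tensor of the layer to inspect
inp = model.layers[0].input
out = model.layers[1].output

# K.function takes a list of inputs and a list of outputs,
# even when we only want one of each
inp_to_out = K.function([inp], [out])

# Calling it returns a list; grab the first (and only) output array
layer_output = inp_to_out([X_test])[0]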
Reproducing a layer's output alone wouldn't be very useful, but since along the way we
decrease the number of neurons, we are effectively making our network learn to
compress its inputs into a small set of neurons.
Neural separation
Put on your gloves because you're going to perform brain surgery!
Neurons learn by updating their weights to output values that help them better
distinguish between the different output classes in your dataset. You will make use of
the inp_to_out() function you just built to visualize the output of two neurons in the first
layer of the Banknote Authentication model as it learns.
The model you built in chapter 2 is ready for you to use, just like X_test and y_test.
Paste show_code(plot) in the console if you want to check plot().
You're performing heavy-duty work; once all is done, click through the graphs to watch
the separation happen live!
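A sketch of how the exercise loop could look; it assumes the trained model, the
inp_to_out() function, the training and test arrays, and the exercise's plot() helper
are all available in the environment:

for i in range(21):
    # Train for a single epoch so we can inspect intermediate weights
    model.fit(X_train, y_train, epochs=1, verbose=0)
    # Output of the first layer's neurons for the test set
    layer_output = inp_to_out([X_test])[0]
    if i % 4 == 0:
        # Every few epochs, check accuracy and plot the neuron outputs
        test_accuracy = model.evaluate(X_test, y_test, verbose=0)[1]
        plot()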
Convolutions
A convolution consists of applying a filter, also known as a kernel, of a given size;
for example, a 3 by 3 kernel. We center the kernel matrix of numbers on each pixel as
we slide through the image, multiplying the kernel and pixel values at each location
and summing the values obtained. This effectively computes a new image where certain
characteristics are amplified depending on the filter used. The secret sauce of CNNs
resides in letting the network itself find the best filter values and combine them to
achieve a given task.
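As a sketch of the operation just described, here is a naive implementation in plain
NumPy; the edge-detecting kernel at the end is just an illustrative choice:

import numpy as np

# Naive 2D convolution: "valid" borders, stride 1
def convolve(image, kernel):
    k = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply kernel and pixel values at this location and sum
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Example: a 3 by 3 kernel that amplifies vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])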
Typical architectures
For a classification problem with many possible classes, CNNs tend to become very
deep. Typical architectures stack convolutional layers, interleaved with other layers
known as pooling layers, which we won't cover here. Convolutional layers perform
feature learning; we then flatten their outputs into a unidimensional vector and pass
it to fully connected layers that carry out the classification.
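A sketch of such an architecture in Keras, for 28 by 28 grayscale images; the filter
counts and layer sizes here are illustrative assumptions, not values from the course:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Convolutional (and pooling) layers perform feature learning
model.add(Conv2D(32, kernel_size=3, activation='relu',
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(64, kernel_size=3, activation='relu'))
# Flatten the feature maps into a unidimensional vector
model.add(Flatten())
# A fully connected layer carries out the classification
model.add(Dense(10, activation='softmax'))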
Looking at convolutions
Inspecting the activations of a convolutional layer is a cool thing. You have
to do it at least once in your lifetime!
To do so, you will build a new model with the Keras Model object, which
takes in a list of inputs and a list of outputs. The output you will provide to
this new model is the output of the first convolutional layer when it is given
an MNIST digit as the input image.
The convolutional model you built in the previous exercise has already been
trained for you. It can now correctly classify MNIST handwritten images.
You can check it with model.summary() in the console.
Let's look at the convolutional masks that were learned in the first
convolutional layer of this model!
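A sketch of how this could look; it assumes model is the trained convolutional model
and that its first layer is the convolutional layer of interest (check with
model.summary()):

from keras.models import Model

# New model mapping the original inputs to the first layer's activations
first_layer_output = model.layers[0].output
activation_model = Model(inputs=model.inputs, outputs=first_layer_output)

# Activations for a single MNIST digit, shape (1, height, width, n_filters)
activations = activation_model.predict(X_test[0:1])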
Intro to LSTMs
It's time to briefly introduce Long Short Term Memory networks, also known as LSTMs.
LSTM units perform several operations: they learn what to ignore, what to keep, and
how to select the most important pieces of past information in order to predict the
future. They tend to work better than simple RNNs for most problems.
LSTMs + Text
Let's go over an example of how to use LSTMs with text data to predict the next word
in a sentence!
Neural networks can only deal with numbers, not text. We need to transform each
unique word into a number. Then these numbers can be used as inputs to an
embedding layer.
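A sketch of this step, using Keras' Tokenizer and a made-up sentence (any text would
do):

from keras.preprocessing.text import Tokenizer

text = 'meet me at the park at noon'

# Map each unique word to an integer
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
print(tokenizer.word_index)

# The numeric sequence can now feed an embedding layer
numeric = tokenizer.texts_to_sequences([text])[0]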
Sequence preparation
We first define some text and choose a sequence length. With a sequence length of 3,
we will end up feeding our model two words, and it will predict the third one. We
split the text into words with the split method. We then need to turn these words
into consecutive lines of 3 words each, which we can do by looping from seq_len to
the number of words + 1 and storing each line, as sketched below.
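A sketch of that loop, reusing the text variable from above:

seq_len = 3
words = text.split()

# Build consecutive lines of seq_len words each
lines = []
for i in range(seq_len, len(words) + 1):
    lines.append(' '.join(words[i - seq_len:i]))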
That's a nice-looking model you've built! You'll see that this model is powerful
enough to learn text relationships, even though we aren't using a lot of text in this
tiny example and our sequences are quite short. This model is trained as usual; you
would just need to compile it with an optimizer like adam and use categorical
cross-entropy loss. This is because we have modeled this next-word prediction task as
a classification problem with all the unique words in our vocabulary as candidate
classes.
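A sketch of what such a model and its compilation could look like; vocab_size and the
layer sizes here are assumptions, not the course's exact values:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 50  # assumed number of unique words (plus one for padding)
seq_len = 3      # two input words, one word to predict

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=8,
                    input_length=seq_len - 1))
model.add(LSTM(32))
model.add(Dense(vocab_size, activation='softmax'))

# One candidate class per unique word in the vocabulary
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])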