02-DL-Deep Learning For Image Data (Convnets) 03
It is often said that deep learning models are "black boxes", learning representations that are difficult to extract and present in a human-readable form.
While this is partially true for certain types of deep learning models, it is definitely not true for convnets. The representations learned by convnets are
highly amenable to visualization, in large part because they are representations of visual concepts. Since 2013, a wide array of techniques have been
developed for visualizing and interpreting these representations. We won't survey all of them, but we will cover three of the most accessible and useful
ones:
Visualizing intermediate convnet outputs ("intermediate activations"). This is useful for understanding how successive convnet layers transform their
input, and for getting a first idea of the meaning of individual convnet filters.
Visualizing convnet filters. This is useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to.
Visualizing heatmaps of class activation in an image. This is useful for understanding which parts of an image were identified as belonging to a given
class, and thus allows you to localize objects in images.
For the first method -- activation visualization -- we will use the small convnet that we trained from scratch on the cat vs. dog classification problem two
sections ago. For the next two methods, we will use the VGG16 model that we introduced in the previous section.
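A sketch of loading that previously trained convnet and printing its summary (the filename used here is an assumption about where the earlier section saved the model):

from tensorflow.keras import models

# Load the small convnet trained earlier on cats vs. dogs (filename assumed)
model = models.load_model('cats_and_dogs_small_2.h5')
model.summary()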
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 34, 34, 128) 73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 15, 15, 128) 147584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense (Dense) (None, 512) 3211776
_________________________________________________________________
dense_1 (Dense) (None, 1) 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
This is the input image we will use -- a picture of a cat that was not part of the images the network was trained on.
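A minimal sketch of how such an image tensor could be prepared (the file path is hypothetical; the 150x150 target size matches the network's input, per the summary above):

import numpy as np
from tensorflow.keras.preprocessing import image

img_path = 'cat_example.jpg'  # hypothetical path to the test image
# Load and resize the image, convert it to an array, add a batch dimension,
# and scale the pixel values to [0, 1]
img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
print(img_tensor.shape)  # (1, 150, 150, 3)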
import matplotlib.pyplot as plt

plt.imshow(img_tensor[0])  # display the single image in the batch
plt.show()
In order to extract the feature maps we want to look at, we will create a Keras model that takes batches of images as input and outputs the activations
of all convolution and pooling layers. To do this, we will use the Keras class Model . A Model is instantiated using two arguments: an input tensor (or
list of input tensors) and an output tensor (or list of output tensors). The resulting object is a Keras model, just like the Sequential models that you
are familiar with, mapping the specified inputs to the specified outputs. What sets the Model class apart is that it allows for models with multiple
outputs, unlike Sequential . For more information about the Model class, see Chapter 7, Section 1.
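A sketch of how this activation-extraction model could be built on top of the trained convnet model from above:

from tensorflow.keras import models

# Outputs of the first 8 layers (the four Conv2D and four MaxPooling2D layers)
layer_outputs = [layer.output for layer in model.layers[:8]]
# A model mapping the original input to those 8 activation tensors
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)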
When fed an image input, this model returns the values of the layer activations in the original model. This is the first time you encounter a multi-output
model in this book: until now, the models you have seen had exactly one input and one output. In the general case, a model could have any
number of inputs and outputs. This one has one input and 8 outputs, one output per layer activation.
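Running our cat image through it might look like this (a sketch; activations becomes a list of 8 Numpy arrays, one per layer):

# Returns a list of 8 arrays: one per convolution/pooling layer activation
activations = activation_model.predict(img_tensor)
first_layer_activation = activations[0]
print(first_layer_activation.shape)  # (1, 148, 148, 32), per the summary above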
For instance, this is the activation of the first convolution layer for our cat image input:
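One way to plot a single channel of that activation (the channel index 3 here is an arbitrary choice; any of the 32 channels can be inspected the same way):

# Plot the 4th channel (index 3) of the first convolution layer's activation
plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')
plt.show()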
This channel appears to encode a diagonal edge detector. Let's try the 30th channel -- but note that your own channels may vary, since the specific
filters learned by convolution layers are not deterministic.
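A corresponding sketch for the 30th channel:

# Plot channel index 30 of the first convolution layer's activation
plt.matshow(first_layer_activation[0, :, :, 30], cmap='viridis')
plt.show()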
This one looks like a "bright green dot" detector, useful for encoding cat eyes. At this point, let's plot a complete visualization of all the activations
in the network. We'll extract and plot every channel in each of our 8 activation maps, and we will stack the results into one big image grid per layer, with
the channels laid out side by side.
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16
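# Sketch of the tiling loop (assumed, not taken from the original cell): it relies
# on `activations` = activation_model.predict(img_tensor) and on `np`/`plt` from
# the code above. Every channel is normalized to a displayable range and copied
# into one grid per layer, shown as one figure titled with the layer's name.
for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]   # number of channels in this layer
    size = layer_activation.shape[1]          # each feature map is size x size
    n_cols = n_features // images_per_row     # rows of the channel grid
    display_grid = np.zeros((size * n_cols, images_per_row * size))
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :,
                                             col * images_per_row + row].copy()
            # Normalize the channel to a displayable 0-255 range
            channel_image -= channel_image.mean()
            if channel_image.std() > 0:
                channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size:(col + 1) * size,
                         row * size:(row + 1) * size] = channel_image
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')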
plt.show()
The first layer acts as a collection of various edge detectors. At that stage, the activations still retain almost all of the information present in
the initial picture.
As we go higher up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts such as
"cat ear" or "cat eye". Higher-up representations carry less and less information about the visual contents of the image, and more and more
information related to the class of the image.
The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image, but in the
following layers more and more filters are blank, meaning that the pattern encoded by that filter isn't found in the input image.