AlexNet - Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN)
This post does not define the basic terminology used in CNNs and
assumes you are familiar with it. In this post, the word Tensor
simply means an image with an arbitrary number of channels.
So, the output image is of size 55x55x96 (one channel for each kernel).
We leave it to the reader to verify the sizes of the outputs of Conv-2,
Conv-3, Conv-4 and Conv-5 using the above image as a guide.
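For readers who would rather check these sizes in code, here is a minimal sketch. It assumes the usual convolution output-size formula, O = (I - K + 2P)/S + 1, and the commonly cited AlexNet hyperparameters (a 227x227x3 input, with the kernel size, stride and padding listed below). Neither is spelled out in this section, and the helper name `conv_output_size` is made up for illustration.

```python
def conv_output_size(in_size, kernel, stride, padding):
    """Width (= height) of a square convolution layer's output."""
    return (in_size - kernel + 2 * padding) // stride + 1

# (name, input width, kernel size, stride, padding, number of kernels)
# Assumed AlexNet settings. The input widths of Conv-2 and Conv-3 are the
# widths after MaxPool-1 and MaxPool-2 respectively.
layers = [
    ("Conv-1", 227, 11, 4, 0, 96),
    ("Conv-2", 27, 5, 1, 2, 256),
    ("Conv-3", 13, 3, 1, 1, 384),
    ("Conv-4", 13, 3, 1, 1, 384),
    ("Conv-5", 13, 3, 1, 1, 256),
]

for name, in_size, kernel, stride, padding, n_kernels in layers:
    out = conv_output_size(in_size, kernel, stride, padding)
    # The output has one channel per kernel.
    print(f"{name}: {out}x{out}x{n_kernels}")
```

With these assumed settings, Conv-1 comes out as 55x55x96, which matches the size quoted above.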
Note that this can be obtained using the formula for the convolution
layer by setting the padding to zero and using the pool size in place of
the kernel size. But unlike the convolution layer, the number of channels in
the MaxPool layer’s output is unchanged.
Example: In AlexNet, the MaxPool layer after the bank of convolution
filters has a pool size of 3 and a stride of 2. We know from the previous
section that the image at this stage is of size 55x55x96. The output image
after the MaxPool layer has width and height (55 - 3)/2 + 1 = 27, so it is
of size 27x27x96.
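The pooling arithmetic in this example can be sketched in a couple of lines. This assumes no padding, as in the note above. The function name `maxpool_output_size` is hypothetical and not taken from any library.

```python
def maxpool_output_size(in_size, pool_size, stride):
    """Width (= height) of a square max-pool layer's output, with no padding."""
    return (in_size - pool_size) // stride + 1

width, channels = 55, 96
out = maxpool_output_size(width, pool_size=3, stride=2)

# Pooling acts on each channel independently, so the channel count is unchanged.
print(f"MaxPool output: {out}x{out}x{channels}")
```

Applying the same function to widths of 27 and 13 gives 13 and 6, assuming the later MaxPool layers use the same pool size and stride.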
Readers can verify that the number of parameters for Conv-2, Conv-3,
Conv-4 and Conv-5 are 614,656, 885,120, 1,327,488 and 884,992 respectively.
The total number of parameters for the Conv Layers is therefore 3,747,200.
Think this is a large number? Well, wait until we see the fully connected
layers. One of the benefits of the Conv Layers is that weights are shared,
and therefore we have fewer parameters than we would have with a
fully connected layer.
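The Conv Layer counts quoted above can be reproduced with a short script. This is a sketch under stated assumptions: each kernel spans all input channels (no grouped convolutions), every kernel has one bias, and the kernel sizes and channel counts are the commonly cited AlexNet values rather than figures given in this section. The name `conv_params` is illustrative.

```python
def conv_params(kernel, in_channels, n_kernels):
    """Weights plus biases of a conv layer with square kernels."""
    weights = kernel * kernel * in_channels * n_kernels
    biases = n_kernels  # one bias per kernel
    return weights + biases

# name: (kernel size, input channels, number of kernels), assumed AlexNet values
conv_layers = {
    "Conv-1": (11, 3, 96),
    "Conv-2": (5, 96, 256),
    "Conv-3": (3, 256, 384),
    "Conv-4": (3, 384, 384),
    "Conv-5": (3, 384, 256),
}

counts = {name: conv_params(*cfg) for name, cfg in conv_layers.items()}
total_conv = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count:,}")
print(f"Total: {total_conv:,}")
```

Note that the count does not depend on the image size at all, which is the weight sharing mentioned above: the same kernel is reused at every spatial position.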
Number of Parameters of a MaxPool Layer
There are no parameters associated with a MaxPool layer. The pool size,
stride, and padding are hyperparameters.
Number of Parameters of a Fully Connected (FC) Layer
There are two kinds of fully connected layers in a CNN. The first FC
layer is connected to the last Conv Layer, while later FC layers are
connected to other FC layers. Let’s consider each case separately.
Case 1: Number of Parameters of a Fully Connected (FC) Layer connected to a Conv Layer
Let’s define,
W_cf = Number of weights of a FC Layer which is connected to a
Conv Layer.
B_cf = Number of biases of a FC Layer which is connected to a
Conv Layer.
O = Size (width) of the output image of the previous Conv Layer.
N = Number of kernels in the previous Conv Layer.
F = Number of neurons in the FC Layer.
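To see how these quantities combine, here is a minimal sketch. It assumes every neuron in the FC Layer is connected to every value in the incoming tensor, so the number of weights is (output width)^2 x (number of kernels) x (number of neurons), with one bias per neuron. The example values are also assumptions: a 6x6x256 incoming tensor (the size after the last MaxPool layer in AlexNet) and 4096 neurons, the commonly cited size of the first FC layer. The function name is made up for illustration.

```python
def fc_after_conv_params(out_width, n_kernels, n_neurons):
    """Weights and biases of an FC layer fed by a square conv/pool output."""
    weights = out_width * out_width * n_kernels * n_neurons
    biases = n_neurons  # one bias per neuron
    return weights, biases

# Assumed example: 6x6x256 tensor feeding 4096 neurons.
weights, biases = fc_after_conv_params(out_width=6, n_kernels=256, n_neurons=4096)
print(f"weights={weights:,} biases={biases:,} total={weights + biases:,}")
```

Under these assumptions, this single layer has roughly ten times as many parameters as all the Conv Layers put together.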
Layer Name   Tensor Size   Weights   Biases   Parameters
MaxPool-1    27x27x96      0         0        0
MaxPool-2    13x13x256     0         0        0
MaxPool-3    6x6x256       0         0        0
Output       1000×1        0         0        0
Total                                         62,378,344
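As a sanity check on the total, the sketch below adds the Conv Layer total stated earlier (3,747,200) to the parameters of three FC layers. The FC sizes (4096, 4096 and 1000 neurons, with the first one fed by the 6x6x256 MaxPool-3 output) are assumed from the standard AlexNet architecture, and the helper `fc_params` is hypothetical. It counts one weight per input-output pair and one bias per neuron.

```python
def fc_params(n_inputs, n_neurons):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_neurons + n_neurons

conv_total = 3_747_200  # total for the Conv Layers, as stated in the text

# Assumed AlexNet FC sizes; MaxPool layers contribute no parameters.
fc1 = fc_params(6 * 6 * 256, 4096)  # fed by the flattened 6x6x256 tensor
fc2 = fc_params(4096, 4096)
fc3 = fc_params(4096, 1000)         # produces the 1000x1 output

total = conv_total + fc1 + fc2 + fc3
print(f"FC-1={fc1:,} FC-2={fc2:,} FC-3={fc3:,} Total={total:,}")
```

Under these assumptions the sum agrees with the total in the table.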