CNN Intro
ANN
We know it is good to learn a small model.
Looking at this fully connected model, do we really need all the edges?
Can some of these be shared?
CNN: Intuition
We found that our ANN model was able to attain high accuracy on the
training data, but not on the testing/validation data. We are passing
our image in as one long string of pixels.
Why might that not be such a good idea?
There is important information contained in how pixels are organized
around each other. When we flatten the picture into a single array,
we lose that information.
What are a kernel and a convolution?
In a drawing program, the small grid of weights applied to the area
around where we click is called a kernel.
We analyze the influence of nearby pixels by using a filter (kernel).
Convolution is the operation of multiplying our kernel with our image.
More generally, applying one function to another function is convolution.
For each point on the image, a value is calculated from the filter
using the convolution operation.
The base image is a function mapping pixel positions to colors, and the
kernel is a function over pixel offsets.
Kernel and Convolution
Example kernels applied to an original image:

Blur:          Sharpen:       Brighten:      Darken:
.06 .13 .06     0 -1  0       0  0   0       0  0   0
.13 .25 .13    -1  5 -1       0 1.5  0       0 0.5  0
.06 .13 .06     0 -1  0       0  0   0       0  0   0
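As a minimal NumPy sketch of how one of these kernels acts on a single pixel neighborhood (the patch values are hypothetical; the brighten kernel is the one above):

```python
import numpy as np

# A 3 x 3 neighborhood of pixel intensities (hypothetical values)
patch = np.array([[10., 20., 30.],
                  [40., 100., 60.],
                  [70., 80., 90.]])

# Brighten kernel: scales only the centre pixel by 1.5
brighten = np.array([[0, 0,   0],
                     [0, 1.5, 0],
                     [0, 0,   0]])

# The new value of the centre pixel is the element-wise product
# of the patch and the kernel, summed up
new_value = np.sum(patch * brighten)
print(new_value)  # 150.0 (the centre pixel 100 scaled by 1.5)
```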
Convolution
These are the network parameters to be learned.

6 x 6 image:       Filter 1:     Filter 2:
1 0 0 0 0 1         1 -1 -1      -1  1 -1
0 1 0 0 1 0        -1  1 -1      -1  1 -1
0 0 1 1 0 0        -1 -1  1      -1  1 -1
1 0 0 0 1 0
0 1 0 0 1 0        …
0 0 1 0 1 0

Each filter detects a small pattern (3 x 3).
Convolution, stride = 1

6 x 6 image:       Filter 1:
1 0 0 0 0 1         1 -1 -1
0 1 0 0 1 0        -1  1 -1
0 0 1 1 0 0        -1 -1  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

The dot product of the filter with the top-left 3 x 3 patch gives 3;
sliding one pixel to the right gives -1.
Convolution, if stride = 2

6 x 6 image:       Filter 1:
1 0 0 0 0 1         1 -1 -1
0 1 0 0 1 0        -1  1 -1
0 0 1 1 0 0        -1 -1  1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

With stride 2 the filter jumps two pixels at a time: the first two
outputs along the top row are 3 and -3.
Convolution, stride = 1, Filter 2
Repeat this for each filter.

Feature map from Filter 1:   Feature map from Filter 2:
 3 -1 -3 -1                  -1 -1 -1 -1
-3  1  0 -3                  -1 -1 -2  1
-3 -3  0  1                  -1 -1 -2  1
 3 -2 -2 -1                  -1  0 -4  3

Two 4 x 4 images, forming a 2 x 4 x 4 matrix.
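The sliding computation above can be sketched in NumPy (a minimal loop implementation; what CNNs call "convolution" is written here as cross-correlation, which is what frameworks actually compute):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and take the sum of
    element-wise products (the dot product) at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow), dtype=image.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])

filter1 = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])
filter2 = np.array([[-1,  1, -1],
                    [-1,  1, -1],
                    [-1,  1, -1]])

fmap1 = convolve2d(image, filter1)              # 4 x 4, top-left value is 3
fmap2 = convolve2d(image, filter2)              # 4 x 4
stacked = np.stack([fmap1, fmap2])              # the 2 x 4 x 4 matrix
strided = convolve2d(image, filter1, stride=2)  # first row: 3, -3
```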
Padding

Original Image:    Zero Padding:
1 0 1 1 0 1        0 0 0 0 0 0 0 0
0 1 0 0 1 0        0 1 0 1 1 0 1 0
0 1 1 1 1 0        0 0 1 0 0 1 0 0
0 1 1 1 1 0        0 0 1 1 1 1 0 0
1 0 1 1 0 1        0 0 1 1 1 1 0 0
1 1 0 0 1 1        0 1 0 1 1 0 1 0
                   0 1 1 0 0 1 1 0
                   0 0 0 0 0 0 0 0
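Zero padding like the figure above is a single call in NumPy (a minimal sketch using the same image):

```python
import numpy as np

image = np.array([[1, 0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 1, 1, 1, 1, 0],
                  [0, 1, 1, 1, 1, 0],
                  [1, 0, 1, 1, 0, 1],
                  [1, 1, 0, 0, 1, 1]])

# Surround the 6 x 6 image with a one-pixel border of zeros -> 8 x 8
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (8, 8)
```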
Color image: RGB 3 channels
[Figure: the 6 x 6 image now consists of three stacked channels (R, G, B),
and each filter correspondingly becomes a stack of three 3 x 3 slices, so
Filter 1 and Filter 2 each hold 3 x 3 x 3 weights.]
3D Filter
In the 1D case, we slide a one-dimensional filter over a one-dimensional input.
In the 2D case, we slide a two-dimensional filter over a two-dimensional input.
What would a 3D filter look like?
It will be 3D, and we will refer to it as a volume.
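For a 3-channel input, the filter is such a volume, and one output value sums over all three channels (a minimal sketch with randomly generated, hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.integers(0, 2, size=(3, 3, 3))    # channels x height x width
kernel = rng.integers(-1, 2, size=(3, 3, 3))  # a 3 x 3 x 3 volume of weights

# One output value: element-wise products summed over ALL channels,
# so three input channels still yield a single number per position
value = np.sum(patch * kernel)
```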
Relation between input size, output size and filter size
Without padding:
W2 = W1 - F + 1
H2 = H1 - F + 1
With padding P:
W2 = W1 - F + 2P + 1
H2 = H1 - F + 2P + 1
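The formulas above as a small helper (the stride-S generalization, W2 = (W1 - F + 2P)/S + 1, is the commonly used final version of this formula):

```python
def conv_output_size(w1, f, p=0, s=1):
    """Output width for input width w1, filter size f,
    padding p, and stride s: (w1 - f + 2*p) // s + 1."""
    return (w1 - f + 2 * p) // s + 1

print(conv_output_size(6, 3))        # 4: 6x6 image, 3x3 filter -> 4x4 map
print(conv_output_size(6, 3, p=1))   # 6: padding keeps the output at 6x6
print(conv_output_size(6, 3, s=2))   # 2: stride 2 gives a 2x2 map
```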
Final Version
Exercise
Convolution v.s. Fully Connected

Convolution:
6 x 6 image:       Filter 1:     Filter 2:
1 0 0 0 0 1         1 -1 -1      -1  1 -1
0 1 0 0 1 0        -1  1 -1      -1  1 -1
0 0 1 1 0 0        -1 -1  1      -1  1 -1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Fully connected: the same 6 x 6 image is flattened into 36 inputs
x1, x2, …, x36, and every input connects to every neuron in the
next layer.
Convolution as a sparsely connected layer:
Flatten the 6 x 6 image into inputs 1-36
(1: 1, 2: 0, 3: 0, 4: 0, …, 7: 0, 8: 1, 9: 0, 10: 0, …,
13: 0, 14: 0, 15: 1, 16: 1, …).
The first output of Filter 1 (value 3) connects only to the 9 inputs of
its 3 x 3 patch (inputs 1, 2, 3, 7, 8, 9, 13, 14, 15): only connected to
9 inputs, not fully connected.

Sliding the filter one step gives the next output (value -1), which
connects to the shifted patch (inputs 2, 3, 4, 8, 9, 10, 14, 15, 16)
using the SAME 9 filter weights: shared weights.
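The savings from sparse connections and shared weights can be counted directly (a small sketch; the layer sizes are the ones from the slides):

```python
# Fully connected: every one of the 36 flattened inputs connects to
# every output. One 4 x 4 feature map has 16 outputs.
fc_weights = 36 * 16   # 576 weights

# Convolution: each output looks at only 9 inputs, and all 16 output
# positions share the same 9 weights of the 3 x 3 filter.
conv_weights = 3 * 3   # 9 weights in total

print(fc_weights, conv_weights)  # 576 vs 9
```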
The whole CNN
Image → Convolution → Max Pooling → Convolution → Max Pooling → …
(can repeat many times) → cat, dog, ……
Max Pooling

Filter 1:        Filter 2:
 1 -1 -1         -1  1 -1
-1  1 -1         -1  1 -1
-1 -1  1         -1  1 -1

Feature map 1:   Feature map 2:
 3 -1 -3 -1      -1 -1 -1 -1
-3  1  0 -3      -1 -1 -2  1
-3 -3  0  1      -1 -1 -2  1
 3 -2 -2 -1      -1  0 -4  3
Why Pooling
Subsampling pixels will not change the object: a subsampled bird is
still a bird.

After 2 x 2 max pooling of the first feature map:
3 0
3 1

The result is smaller than the original image, and the number of
channels is the number of filters.
(Can repeat many times.)
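2 x 2 max pooling over the feature map above can be sketched as:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Keep only the maximum of each size x size block."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size), dtype=fmap.dtype)
    for i in range(0, h, size):
        for j in range(0, w, size):
            out[i // size, j // size] = fmap[i:i + size, j:j + size].max()
    return out

# Feature map produced by Filter 1 on the 6 x 6 image
fmap1 = np.array([[ 3, -1, -3, -1],
                  [-3,  1,  0, -3],
                  [-3, -3,  0,  1],
                  [ 3, -2, -2, -1]])

pooled = max_pool(fmap1)
print(pooled)  # [[3 0]
               #  [3 1]]
```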
The whole CNN
After the last pooling layer, the stack of small feature maps is
flattened into a single vector and fed to fully connected layers
that output the classes (cat, dog, ……).
A few parameters for convolution
1. Stride: a larger stride means we have less data to analyze, but if we
   increase our stride too much, we might miss important information.
2. Padding: if we want to make sure all pixels are used in convolution, or
   if we want the resulting image to be the same size as our input image,
   we can add a border of zeros called padding.
3. Max-Pooling: especially useful when working with large images, because
   it shrinks images down to a smaller size, and smaller images mean less
   computation.
4. Dropout: randomly shuts off neurons at a rate we specify, so each
   shut-off neuron is unable to learn for that step of training.
Demo
https://setosa.io/ev/image-kernels/