Convolutional Neural Networks Introduction To Convolution Neural Networks
We wish to pass the above input through a feed-forward neural network with a single hidden layer made up of 1000 × 1000 hidden units, each of which is fully connected to the full image.
If the number of connections between the first hidden layer and the input image is x, then enter below the value of log10(x), i.e. the logarithm of x to the base 10:
Answer: 12
Solution:
Each hidden unit is connected to all the pixels of the input image, so there are 1000 × 1000 = 10^6 connections between each hidden unit and the input.
Since there are 1000 × 1000 = 10^6 hidden units in the first hidden layer, the total number of connections amounts to
x = 10^6 × 10^6 = 10^12, so log10(x) = 12.
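As a sanity check, the counting argument above can be reproduced in a few lines of Python (the 1000 × 1000 sizes are taken from the question):

```python
import math

image_pixels = 1000 * 1000   # fully connected: every pixel feeds every hidden unit
hidden_units = 1000 * 1000   # hidden layer size from the question

x = image_pixels * hidden_units   # total number of connections
print(x, math.log10(x))           # 1000000000000 12.0
```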
The first convolutional layer with an 11 × 11 filter will have 11 × 11 = 121 parameters that operate on the entire image.
true
false
Solution:
The lecture explains this with a mushroom example. If the mushroom is in a different place, the weight matrix parameters at that location need to learn to recognize the mushroom anew. With convolutional layers, we have translational invariance, as the same filter is passed over the entire image. Therefore, it will detect the mushroom regardless of its location.
(f ∗ g)(t) ≡ ∫_{−∞}^{+∞} f(τ) g(t − τ) dτ
In this integral, τ is the dummy variable of integration and t is the parameter. Intuitively, convolution 'blends' the two functions f and g by expressing the amount of overlap of one function as it is shifted over the other.
Now, suppose we are given two rectangular functions f and g as shown in the figures below.
What is the area under the convolution: ∫_{−∞}^{+∞} (f ∗ g)(t) dt ?
Solution:
We can flip g and shift it over f. f ∗ g stays at 0 where there is no overlap; it increases linearly and reaches its peak when f and g fully overlap with each other.
The area under the convolution:
∫_{−∞}^{+∞} (f ∗ g)(t) dt = ∫_{−∞}^{+∞} [∫_{−∞}^{+∞} f(τ) g(t − τ) dτ] dt
= ∫_{−∞}^{+∞} f(τ) [∫_{−∞}^{+∞} g(t − τ) dt] dτ
= [∫_{−∞}^{+∞} f(τ) dτ] [∫_{−∞}^{+∞} g(t) dt]
That is, the area under f ∗ g is the product of the areas under f and g.
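This product-of-areas identity is easy to verify numerically. The sketch below is a discrete approximation (the rectangle widths and step size are chosen arbitrarily, not taken from the figures) using NumPy:

```python
import numpy as np

dt = 0.01
t = np.arange(0, 5, dt)
f = ((t >= 0) & (t < 1)).astype(float)   # rectangle of width 1, height 1
g = ((t >= 0) & (t < 2)).astype(float)   # rectangle of width 2, height 1

conv = np.convolve(f, g) * dt            # discrete approximation of f * g

area_conv = conv.sum() * dt              # area under the convolution
area_f = f.sum() * dt                    # area under f (= 1)
area_g = g.sum() * dt                    # area under g (= 2)
print(area_conv, area_f * area_g)        # both are 2.0 (up to float error)
```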
(f ∗ g)[n] ≡ ∑_{m=−∞}^{+∞} f[m] g[n − m]
Intuitively, we can get this result by first flipping g[n], shifting it over f[n], and computing the inner product at each step, as shown in the figures below:
In practice, it is common to call the flipped signal g′ the filter or kernel for the input signal or image f.
Since we are forced to pad zeros where the input is not defined, the results at the edges of the input may not be accurate. To avoid this, we can keep only the convolution results where the input f is actually defined. That is, h[n] = [5, 8].
Now suppose the input f = [1, 3, −1, 1, −3] and the filter g′ = [1, 0, −1]. What is the convolutional output of f ∗ g without zero padding on f? Enter your answer as a list below (e.g. [0,0,0])
What is the convolutional output of f ∗ g if we pad a 0 on both edges of f so that the output dimension is the same as the input? Enter your answer as a list below (e.g. [0,0,0,0,0])
Solution:
With zero padding, we extend f to f = [0, 1, 3, −1, 1, −3, 0], so
(f ∗ g)[0] = 0 × 1 + 1 × 0 + 3 × (−1) = −3
(f ∗ g)[1] = 1 × 1 + 3 × 0 + (−1) × (−1) = 2
(f ∗ g)[2] = 3 × 1 + (−1) × 0 + 1 × (−1) = 2
(f ∗ g)[3] = (−1) × 1 + 1 × 0 + (−3) × (−1) = 2
(f ∗ g)[4] = 1 × 1 + (−3) × 0 + 0 × (−1) = 1
Without zero padding, only the positions where g′ fully overlaps f are kept: [2, 2, 2].
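Both answers can be checked with NumPy. One caveat: g′ = [1, 0, −1] is the already-flipped kernel, and np.convolve flips its second argument internally, so we pass it the unflipped g = [−1, 0, 1]:

```python
import numpy as np

f = np.array([1, 3, -1, 1, -3])
g = np.array([-1, 0, 1])   # unflipped g; np.convolve flips it back to g' = [1, 0, -1]

print(np.convolve(f, g, mode="valid"))  # [2 2 2]         no zero padding
print(np.convolve(f, g, mode="same"))   # [-3  2  2  2  1] zero-padded, same length as f
```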
f =
⎡1 2 1⎤
⎢2 1 1⎥
⎣1 1 1⎦
g′ =
⎡ 1  0.5⎤
⎣0.5  1 ⎦
Answer: 15
Solution:
We align the filter with the top-left corner of the image and take the element-wise multiplication of the filter and the 2 × 2 square in the top-left corner. We then shift the filter along the top row, doing the same thing, and then apply the same procedure to the next row. If we went another row down, the bottom row of the filter would not have any numbers to be multiplied with, so we stop.
C =
⎡4 4⎤
⎣4 3⎦
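The sliding-window computation above can be sketched as a small NumPy loop (treating g′ as the already-flipped filter, so each step is just an element-wise multiply-and-sum):

```python
import numpy as np

f = np.array([[1, 2, 1],
              [2, 1, 1],
              [1, 1, 1]], dtype=float)
gp = np.array([[1.0, 0.5],
               [0.5, 1.0]])   # g', the flipped filter

# "valid" 2D convolution: slide gp over f wherever it fully fits
out_h = f.shape[0] - gp.shape[0] + 1
out_w = f.shape[1] - gp.shape[1] + 1
C = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        C[i, j] = np.sum(f[i:i+2, j:j+2] * gp)   # element-wise product, then sum

print(C)          # [[4. 4.]
                  #  [4. 3.]]
print(C.sum())    # 15.0
```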
Pooling Practice
A pooling layer's purpose is to pick up on a feature regardless of where it appears in the image.
true
false
Solution:
A pooling layer �nds the maximum value over a given area. The max value can be seen as a "signal" representing whether or not the
feature exists. For example, a high max value could indicate that the feature did appear in the image.
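A minimal sketch of 2 × 2 max pooling with stride 2 (the feature-map values below are made up for illustration) shows how the maximum acts as a presence signal for each region:

```python
import numpy as np

feature_map = np.array([[0.1, 0.9, 0.2, 0.0],
                        [0.3, 0.4, 0.1, 0.8],
                        [0.0, 0.2, 0.5, 0.1],
                        [0.6, 0.1, 0.3, 0.2]])

# 2x2 max pooling, stride 2: keep the strongest response in each window
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[0.9 0.8]
               #  [0.6 0.5]]
```

Each output entry is the maximum over one non-overlapping 2 × 2 window, so a high value survives pooling no matter where in the window the feature appeared.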