Convolutional Networks
Machine Learning 2024
Michael Wand
TA: Vincent Herrmann ([email protected])
Also called Multi-Layer Perceptron (MLP)
[1] Ciresan et al., Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation 2010
Image Classification – a HUGE field
Image Classification
• Is the MLP architecture optimal for image recognition?
• The MLP can learn/approximate any function from input to output!
• (within reasonable constraints, which we have not covered)
• But…
• Two-dimensional input -> large input dimensionality -> many weights! (see the parameter-count sketch below)
• Disregards spatial information!
• Does not allow sharing of “knowledge” across different parts of the image!
[Figure: a handwritten digit, classified as “8”]
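To make the weight-count argument concrete, here is a small illustrative Python sketch (not from the slides). The 40x40x3 input, 5x5 filters, and 16 filters are the running numbers of this lecture; the 100 hidden units for the MLP layer are an assumption made only for this comparison.

# Parameter count: fully connected (MLP) layer vs. convolutional layer
input_size = 40 * 40 * 3                  # 40x40 image, 3 color channels = 4800 inputs
hidden_units = 100                        # assumed MLP layer width (illustration only)
mlp_params = input_size * hidden_units + hidden_units   # weights + biases = 480100

num_filters, s, C = 16, 5, 3              # 16 filters of size 5x5 over 3 channels
conv_params = num_filters * (s * s * C + 1)             # +1 bias per filter = 1216

print(mlp_params, conv_params)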
Image Classification
• This is what the MLP sees…
[Figure: the 40×40 input image as the MLP sees it]
Convolutional Layer
• Let’s take a sample image – say, 40x40 pixels and 3 color channels
• Slide a convolutional filter over the image, say, 5x5 pixels
• At each offset: Multiply pixel values by filter values and sum, add bias, get
scalar result
$u_{a,b} = \sum_{i=0}^{s} \sum_{j=0}^{s} \sum_{c=0}^{C} w_{i,j,c}\, x_{i+a,j+b,c} + w_0$

where
s = filter size (width / height, usually equal)
C = number of channels
(a, b) = offset of the filter
x = input image
u = resulting feature value
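A direct, unoptimized NumPy rendering of this formula may help make the indexing concrete. It is a minimal sketch under the slide's running numbers, not the lecture's reference code; the function name and shapes are chosen here for illustration.

import numpy as np

def conv_single_filter(image, filt, bias):
    # image: (H, W, C); filt: (s, s, C); returns the feature map u of shape (H-s+1, W-s+1)
    H, W, C = image.shape
    s = filt.shape[0]
    u = np.zeros((H - s + 1, W - s + 1))
    for a in range(H - s + 1):                    # vertical offset a
        for b in range(W - s + 1):                # horizontal offset b
            patch = image[a:a + s, b:b + s, :]    # s x s x C window at offset (a, b)
            u[a, b] = np.sum(patch * filt) + bias # u_{a,b} from the formula above
    return u

# Slide numbers: 40x40 RGB image, one 5x5 filter
image = np.random.rand(40, 40, 3)
filt = np.random.rand(5, 5, 3)
u = conv_single_filter(image, filt, bias=0.0)
print(u.shape)    # (36, 36)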
Convolutional Layer
How to interpret this operation?
• Think about how you measure similarity in a vector space.
• Compute the dot product between two vectors:
$x = (x_1, x_2, \dots, x_N); \quad y = (y_1, y_2, \dots, y_N); \quad x \cdot y = x^T y = \sum_i x_i y_i$
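Seen this way, the response at a single offset is exactly such a dot product: flatten the image patch under the filter and the filter itself, then take the inner product (plus bias). A small self-contained NumPy check with illustrative values (not from the slides):

import numpy as np

image = np.random.rand(40, 40, 3)        # 40x40 RGB image
filt = np.random.rand(5, 5, 3)           # 5x5 filter over all 3 channels
bias = 0.0
a, b, s = 10, 20, 5                      # an arbitrary offset

patch = image[a:a + s, b:b + s, :]
u_ab = patch.flatten() @ filt.flatten() + bias        # dot product of patch and filter
assert np.isclose(u_ab, np.sum(patch * filt) + bias)  # identical to the sum formula
print(u_ab)                                           # large when the patch "matches" the filter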
Convolutional Layer
• Of course, we need several features (i.e. several filters): say 16, for example.
• Thus we get several feature maps.
[Figure: 40×40 input image, 16 filters → a stack of 16 feature maps of size 36×36]
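The 36×36 size of each feature map follows from the usual valid-convolution arithmetic (a standard formula, stated here for completeness rather than taken from the slide):

$W_{\text{out}} = W_{\text{in}} - s + 1 = 40 - 5 + 1 = 36$

With 16 filters, the layer output is therefore a 36 × 36 × 16 stack of feature maps.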
Convolutional Layer
• After computing the feature map, we frequently apply a nonlinearity f, usually component-wise.
• We can reduce the representation size by applying the filter only every n-th pixel (stride, not shown).
[Figure: nonlinearity f maps the 36×36×16 feature maps u to z]
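With a stride of n, the output shrinks further; as a standard formula for the valid case (an addition here, not spelled out on the slide):

$W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} - s}{n} \right\rfloor + 1$, e.g. $\lfloor (40 - 5)/2 \rfloor + 1 = 18$ for a 5×5 filter with stride 2.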
Convolutional Layer Summarized
• Multiple filters with a certain size (often called kernel size) are shifted over the image,
with a specific step size (stride).
• At each offset, the filter coefficients are multiplied by the pixel values (across all channels), and the result is summed to create a scalar output.
• Yields feature maps (K filters → K feature maps), which preserve spatial structure.
$u_{a,b,k} = \sum_{i=0}^{s} \sum_{j=0}^{s} \sum_{c=0}^{C} w^{(k)}_{i,j,c}\, x_{i+a,j+b,c} + w^{(k)}_0$

Because the same filter weights are used at every offset, the gradient with respect to a shared weight accumulates contributions from all positions of the feature map:

$\frac{\partial \mathbf{z}_{\cdot,\cdot,k}}{\partial w^{(k)}_{i,j,c}} = \sum_{a,b} \frac{\partial z_{a,b,k}}{\partial w^{(k)}_{i,j,c}}$
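A rough NumPy sketch of the weight gradient for one filter, illustrating this accumulation over all offsets. It is a minimal illustration, not the lecture's implementation; it assumes the upstream gradient for this filter's feature map is already available as a 36×36 array.

import numpy as np

def conv_weight_grad(image, grad_u, s):
    # image: (H, W, C); grad_u: (H-s+1, W-s+1) upstream gradient for one filter's feature map
    # returns grad_w: (s, s, C) and grad_b: scalar
    H, W, C = image.shape
    grad_w = np.zeros((s, s, C))
    grad_b = 0.0
    for a in range(H - s + 1):
        for b in range(W - s + 1):
            # the same shared weights were used at every offset (a, b),
            # so each position contributes to the same gradient entries
            grad_w += grad_u[a, b] * image[a:a + s, b:b + s, :]
            grad_b += grad_u[a, b]
    return grad_w, grad_b

image = np.random.rand(40, 40, 3)
grad_u = np.random.rand(36, 36)          # stand-in upstream gradient for one filter
grad_w, grad_b = conv_weight_grad(image, grad_u, s=5)
print(grad_w.shape)    # (5, 5, 3)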
ResNet34
[Figure: ResNet34 architecture]
Conclusion / Summary
Today you should have learned
• what the idea of convolutional layers / networks is
• why they are useful in image processing (but not only there)
• how forward and backward propagation are implemented in the
convolutional case
• how to construct a neural network for an image classification task (see the sketch below)
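As a concluding sketch (not part of the slides): a minimal convolutional classifier for 40×40 RGB images in PyTorch, reusing the lecture's numbers (5×5 filters, 16 feature maps); the 10 output classes are an assumption for the example.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5),  # 40x40x3 -> 36x36x16
    nn.ReLU(),                                                  # component-wise nonlinearity f
    nn.Flatten(),                                               # 16 * 36 * 36 = 20736 features
    nn.Linear(16 * 36 * 36, 10),                                # class scores (10 classes assumed)
)

x = torch.randn(1, 3, 40, 40)   # one dummy image: (batch, channels, height, width)
print(model(x).shape)           # torch.Size([1, 10])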