
Convolutional Neural Networks
Machine Learning
Michael Wand
TA: Vincent Herrmann ([email protected])

Fall semester 2023


Recap: Fully connected neural networks
• So far we have covered feedforward neural networks
• (i.e. data flows from input to output with no way back, no recurrence)
• … where neurons are organized in layers which are fully connected
• (i.e. all neurons of layer l are connected to all neurons of layer l+1)
• The training objective is to learn the weights of the connections.
• Also called a Multi-Layer Perceptron (MLP)

[Figure: network diagram with input layer, hidden layers, output layer]
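As a refresher, here is a minimal NumPy sketch of such a fully connected forward pass (the layer sizes, the tanh nonlinearity, and the name mlp_forward are illustrative assumptions, not from the slides):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected (MLP) network:
    every neuron of layer l feeds every neuron of layer l+1,
    so each layer is one matrix multiply plus bias, u = Wz + b,
    followed by a component-wise nonlinearity z = f(u)."""
    z = x
    for W, b in zip(weights, biases):
        u = W @ z + b       # all-to-all connection weights
        z = np.tanh(u)      # component-wise nonlinearity
    return z

# Illustrative sizes: 784 inputs (a flattened 28x28 image),
# one hidden layer of 100 units, 10 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(100, 784)), rng.normal(size=(10, 100))]
biases = [np.zeros(100), np.zeros(10)]
out = mlp_forward(rng.normal(size=784), weights, biases)
```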


A classical example: MNIST
• Let us look at image processing tasks!
• Task: recognize handwritten digits
• Standard training and test sets available, good benchmark task
• Record-breaking GPU implementation of an MLP classifier on MNIST [1]

[1] Ciresan et al., Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation 2010
Image Classification – a HUGE field
Image Classification
• Is the MLP architecture optimal for image recognition?
• The MLP can learn/approximate any function from input to output!
• (within reasonable constraints, which we have not covered)
• But…
• Two-dimensional input -> large input dimensionality -> many weights!
• Disregards spatial information!
• Does not allow sharing “knowledge” across different parts of the image!

[Figure: handwritten digit image classified as “8”]
Image classification
• This is what the MLP sees…

[Figure: the digit image as a flat list of raw pixel values (not shown completely)]


Image classification
• Furthermore, intuitively it is helpful to learn common knowledge from different parts of the input image
• Edge detection, common shapes, materials, shift invariance …
• Buzzword: Parameter Sharing
• Similarly, we may wish to apply a common computation to different parts of the image.
• We derive a method to focus on parts of the image, while sharing information across the image
• Not look at everything at once, but learn from everywhere
• Step by step, extend your field of vision
• Understand the image “hierarchically”
Neocognitron (Fukushima ’79)
• Introduced Convolutional Neural Networks
• Bio-inspired, hierarchical model
• Multiple types of cells:
• S-cells extract local features
• C-cells deal with deformations
• Weight sharing
• Filters are shifted across the input map
• Trained with local, unsupervised Winner-Take-All rules
Convolutional Layer
• Let’s take a sample image – say, 40x40 pixels and 3 color channels
• Slide a convolutional filter over the image, say, 5x5 pixels
• At each offset: multiply pixel values by filter values and sum, add bias, get a scalar result

With offset (a, b), filter size s (width/height, usually equal), and C input channels:

$u_{a,b} = \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \sum_{c=1}^{C} w_{i,j,c} \, x_{i+a,j+b,c} + w_0$

[Figure: a 5x5 filter sliding over the 40x40x3 image]
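To make the operation concrete, here is a minimal NumPy sketch of this computation, at a single offset and over all valid offsets (the function names and the random test data are our own illustrative choices):

```python
import numpy as np

def conv_at_offset(x, w, w0, a, b):
    """One output pixel: dot product of the filter with the
    s x s x C image patch at offset (a, b), plus a bias."""
    s = w.shape[0]
    patch = x[a:a + s, b:b + s, :]
    return np.sum(w * patch) + w0

def conv2d_valid(x, w, w0):
    """Slide the filter over all valid offsets (stride 1, no padding)."""
    H, W, _ = x.shape
    s = w.shape[0]
    u = np.empty((H - s + 1, W - s + 1))
    for a in range(H - s + 1):
        for b in range(W - s + 1):
            u[a, b] = conv_at_offset(x, w, w0, a, b)
    return u

x = np.random.rand(40, 40, 3)    # image: 40x40 pixels, 3 channels
w = np.random.rand(5, 5, 3)      # one 5x5 filter spanning all channels
u = conv2d_valid(x, w, w0=0.1)   # feature map of shape (36, 36)
```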
Convolutional Layer
How to interpret this operation?
• Think about how you measure similarity in a vector space.
• Compute the dot product between two vectors:
$x = (x_1, x_2, \ldots, x_N)$; $y = (y_1, y_2, \ldots, y_N)$; $x \cdot y = x^T y = \sum_i x_i y_i$
• The dot product is a measure of “alignment” between x and y (in particular, it is zero if x and y are orthogonal).
• The convolution operator is nothing but a dot product: it measures how well the filter fits the respective part of an image!
• Example: learned filters from an image classification task, first convolutional layer: can you imagine how some of these detect edges?

[Figure: learned first-layer filters]
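This equivalence is easy to check numerically: flattening the image patch and the filter reduces the convolution at one offset to an ordinary dot product (a self-contained sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((40, 40, 3))   # image
w = rng.random((5, 5, 3))     # 5x5 filter

# The convolution at offset (3, 4) is exactly a dot product
# between the flattened patch and the flattened filter.
patch = x[3:8, 4:9, :]
assert np.isclose(np.sum(w * patch), patch.ravel() @ w.ravel())
```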
Convolutional Layer
• Performing the convolution operation at all possible offsets yields a feature map which describes the presence of the feature represented by the filter across the image.
• The feature map is slightly smaller than the image due to the filter size
• Example: a 5x5 filter on a 40x40x3 image yields a 36x36 feature map
• You can also pad the input to retain the size of the input data.
• Spatial structure preserved!

[Figure: 40x40x3 image mapped to a 36x36 feature map]
Convolutional Layer
• Of course, we need several features (i.e. several filters): say 16, for example.
• Thus we get several feature maps (here: 16 maps, each 36x36).

[Figure: 40x40x3 image mapped to a stack of 16 feature maps of size 36x36]
Convolutional Layer
• After the feature map, we frequently apply a nonlinearity f, usually component-wise: z = f(u).
• We can reduce the representation size by applying the filter only at every n-th pixel (stride, not shown).

[Figure: nonlinearity f maps u to z, a stack of 16 feature maps of size 36x36]
Convolutional Layer Summarized
• Multiple filters with a certain size (often called kernel size) are shifted over the image, with a specific step size (stride).
• At each offset, the filter coefficients are multiplied by the pixel values (across all channels), and the result is summed to create a scalar output.
• Yields feature maps (K filters – K feature maps) which preserve spatial structure:

$u_{a,b,k} = \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \sum_{c=1}^{C} w^{(k)}_{i,j,c} \, x_{i+a,j+b,c} + w^{(k)}_0$

(x = image, w = filter, w0 = bias, c = source channel, k = feature map)
• Finally, compute z = f(u), usually component-wise
• The filter meta-parameters (size, stride, number of filters) must be empirically determined. (A sketch of the full layer follows below.)
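A compact NumPy sketch of the complete layer as summarized above, with K filters, a stride parameter, and a component-wise nonlinearity (ReLU here is an illustrative choice, as is the name conv_layer):

```python
import numpy as np

def conv_layer(x, w, w0, stride=1, f=lambda u: np.maximum(u, 0.0)):
    """Convolutional layer: K filters of size s x s x C shifted
    over the input with the given stride, then z = f(u).

    x:  (H, W, C) image or stack of feature maps
    w:  (K, s, s, C) filter bank; w0: (K,) biases
    Returns z of shape (H', W', K): one feature map per filter."""
    H, W, _ = x.shape
    K, s, _, _ = w.shape
    Ho = (H - s) // stride + 1
    Wo = (W - s) // stride + 1
    u = np.empty((Ho, Wo, K))
    for a in range(Ho):
        for b in range(Wo):
            patch = x[a * stride:a * stride + s, b * stride:b * stride + s, :]
            for k in range(K):
                u[a, b, k] = np.sum(w[k] * patch) + w0[k]
    return f(u)

rng = np.random.default_rng(0)
z = conv_layer(rng.random((40, 40, 3)),
               rng.random((16, 5, 5, 3)), np.zeros(16))
print(z.shape)   # (36, 36, 16)
```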
Convolutional Layers Iterated
• The application of a convolutional layer can be repeated!
• The feature maps of the previous layer take the role of input channels in the subsequent layer.
• But there is an issue: we do not have a way to look at the image as a whole.
• Each pixel of a feature map “sees” only a small part of the input, determined by the filter size.
• This is called the receptive field. (See the small calculation below.)
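For intuition, a tiny helper that computes the receptive field of stacked layers via the standard recurrence (the code and the example numbers are illustrative):

```python
def receptive_field(layers):
    """Receptive field of stacked layers, each given as (kernel, stride).
    Standard recurrence: r grows by (kernel - 1) * jump per layer,
    where jump is the product of all previous strides."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
    return r

# Two stacked 5x5 convolutions (stride 1): each output pixel sees 9x9 input pixels.
print(receptive_field([(5, 1), (5, 1)]))          # 9
# Inserting a 2x2 pooling step (stride 2) grows the field much faster.
print(receptive_field([(5, 1), (2, 2), (5, 1)]))  # 14
```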
Pooling layers
• How to consider consecutively larger receptive fields?
• Pooling layers join several adjacent pixels, thus shrinking the size of the feature map.
• Most common version: max pooling.
• As with the convolutions, the max pooling operator is shifted across each feature map, taking the maximum over several adjacent values.
Pooling layers
Example with a pooling kernel size of k (and stride equal to k, i.e. non-overlapping windows):
• Input: feature map u of size N x N
• Output: pooled feature map ũ of size (N/k, N/k)

$\tilde{u}_{a,b} = \max_{i,j = 0, \ldots, k-1} u_{a \cdot k + i, \, b \cdot k + j}$

• This operation can also be varied by adding a separate stride parameter.
• A small stride makes pooling operations overlap. (A sketch of the non-overlapping case follows below.)
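A minimal NumPy sketch of non-overlapping max pooling as in the formula above, using a reshape into k x k tiles instead of explicit loops:

```python
import numpy as np

def max_pool(u, k):
    """Non-overlapping k x k max pooling of an (N, N) feature map.
    Assumes N is divisible by k; returns shape (N // k, N // k)."""
    N = u.shape[0]
    tiles = u.reshape(N // k, k, N // k, k)   # split into k x k tiles
    return tiles.max(axis=(1, 3))             # max within each tile

u = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(u, 2))
# [[ 5.  7.]
#  [13. 15.]]
```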
Pooling layers
Max pooling allows hierarchies of convolutional layers.
Furthermore, it
• improves generalization
• induces robustness to small position shifts
• decreases dimensionality of the representation!
There are also variants (e.g. average pooling).
Convolutional Network Architecture
• Very common architecture: alternating convolutional / max-pooling layers (sketched in code below)
• Convolutions e.g. of kernel size 3x3 … 7x7, stride 1, followed by a nonlinearity
• Max-pooling with kernel size 2x2 or 3x3, stride == kernel size
• Hierarchically compute representations (feature maps) which cover larger and larger areas
• Size of representations decreases exponentially, e.g. 256x256 -> 128x128 -> 64x64 …
• Depth of computation increases in parallel with decreasing representation size
• Finally, add a few fully connected layers, and the final layer (e.g. softmax for classification)
• End-to-end architecture without hand-engineered features
• Convolutional feature maps take their role instead
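A sketch of such an alternating conv/pool architecture in PyTorch (the channel counts, kernel sizes, and the assumed 3-channel 64x64 input are illustrative choices, not taken from the slides):

```python
import torch
import torch.nn as nn

# Alternating convolution + max-pooling blocks, then fully connected
# layers; sizes in the comments track one 3x64x64 input image.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(),   # 64x64 -> 64x64
    nn.MaxPool2d(2),                                         # -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # -> 32x32
    nn.MaxPool2d(2),                                         # -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # -> 16x16
    nn.MaxPool2d(2),                                         # -> 8x8
    nn.Flatten(),                                            # 64 * 8 * 8 = 4096
    nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
    nn.Linear(128, 10),   # e.g. 10 classes; the softmax is usually
)                         # folded into the cross-entropy loss

logits = model(torch.randn(1, 3, 64, 64))
print(logits.shape)   # torch.Size([1, 10])
```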
Convolutional Network Architecture

[Figure: complete network from input image to output neurons]
Visualizing learned filters

[Figures: learned filters on an image classification task (ImageNet), and the image patches producing the highest activations]

From Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, 2013
Training of Convolutional Networks
• Training is done with backpropagation, layer by layer
• So, we need to find out how to do backprop for convolutional and max-pooling layers
• Max-pooling is easy:
• In the forward step, remember which source neuron had the maximum activation.
• The error which arrives at the max-pooling layer is propagated to that one neuron.
• The neurons which were not maximal receive gradient zero, since changing them slightly does not change the output. (A sketch of this routing follows below.)
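A minimal NumPy sketch of this routing rule (the forward pass stores the argmax of each pooling window; the backward pass sends each incoming gradient to exactly that position):

```python
import numpy as np

def max_pool_forward(u, k):
    """k x k max pooling that also records, per window, which
    source neuron was maximal (needed for the backward pass)."""
    N = u.shape[0]
    tiles = u.reshape(N // k, k, N // k, k).transpose(0, 2, 1, 3)
    flat = tiles.reshape(N // k, N // k, k * k)
    return flat.max(axis=2), flat.argmax(axis=2)

def max_pool_backward(grad_out, argmax, N, k):
    """Route each output gradient to the neuron that was maximal;
    all other neurons receive gradient zero."""
    grad_in = np.zeros((N, N))
    for a in range(N // k):
        for b in range(N // k):
            i, j = divmod(argmax[a, b], k)
            grad_in[a * k + i, b * k + j] = grad_out[a, b]
    return grad_in

u = np.random.rand(4, 4)
pooled, argmax = max_pool_forward(u, 2)
grad_in = max_pool_backward(np.ones((2, 2)), argmax, N=4, k=2)
```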
Training of Convolutional Networks
• What about the convolutional layer?
• Recap fully connected layer:
$u_j = \sum_i w_{ij} x_i + b_j$ and $z_j = f(u_j)$
• We need the derivative of $z_j$ w.r.t. $w_{ij}$:

$\frac{\partial z_j}{\partial w_{ij}} = \frac{\partial z_j}{\partial u_j} \frac{\partial u_j}{\partial w_{ij}}$ with $\frac{\partial z_j}{\partial u_j} = f'(u_j)$ and $\frac{\partial u_j}{\partial w_{ij}} = x_i$
Training of Convolutional Networks
• For the convolutional layer, the situation is almost the same!
• With offset (a, b) and channel (feature) k, for a single pixel $z_{a,b,k}$:

$z_{a,b,k} = f(u_{a,b,k})$ and $u_{a,b,k} = \sum_{i,j,c} w^{(k)}_{i,j,c} \, x_{i+a,j+b,c} + w^{(k)}_0$

where c is the source channel.

• Consequently, we have $\frac{\partial u_{a,b,k}}{\partial w^{(k)}_{i,j,c}} = x_{i+a,j+b,c}$
• The only difference: in the fully connected case, weight updates for $w_{ij}$ come only from output neuron j, whereas for the convolutional layer, weight updates for $w^{(k)}_{i,j,c}$ come from the entire feature map.
Training of Convolutional Networks
• Need to sum weight updates from all offsets. (Why?)
• Because we need to sum over all the “paths” in which a filter influences the output:

$\frac{\partial z_{\cdot,\cdot,k}}{\partial w^{(k)}_{i,j,c}} = \sum_{a,b} \frac{\partial z_{a,b,k}}{\partial w^{(k)}_{i,j,c}}$

• In practice, first collect all weight deltas, then perform the update (requires just one write)
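A NumPy sketch of this accumulation for a single filter, with the upstream gradient dL/du assumed given (stride 1, no padding; the function name is our own):

```python
import numpy as np

def conv_weight_grad(x, grad_u, s):
    """Accumulate dL/dw for one s x s x C filter. The same weights
    are used at every offset (a, b), so the per-offset contributions
    grad_u[a, b] * patch are summed over the entire feature map and
    then applied in a single update."""
    grad_w = np.zeros((s, s, x.shape[2]))
    grad_w0 = 0.0
    Ho, Wo = grad_u.shape
    for a in range(Ho):
        for b in range(Wo):
            grad_w += grad_u[a, b] * x[a:a + s, b:b + s, :]
            grad_w0 += grad_u[a, b]
    return grad_w, grad_w0

x = np.random.rand(40, 40, 3)
grad_u = np.random.rand(36, 36)   # upstream gradient dL/du, assumed given
gw, gw0 = conv_weight_grad(x, grad_u, s=5)
```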
Training is Computationally Expensive

[Figure: training time comparison; … or one week on a GPU (2016)!]
Final remarks
• Convolutional networks have revolutionized image processing
• … which uses two-dimensional convolutions
• But they can also have other dimensionalities
• … and be applied to a variety of other signals
• Example: speech recognition with ConvNets
• Compute a spectrogram of the speech signal (an image)
• … or apply one-dimensional convolutions to raw signals!
• (Compare Alex Waibel’s time delay neural networks)
• Michael works on lipreading: processing images to recognize speech :)
Final remarks
• You now know the standard convolutional architecture
• For examples, have a look at the papers of Dan Ciresan (IDSIA), or google AlexNet
• The convolutional architecture can be varied in a variety of ways
• Inception modules let the model decide on the size of the convolution kernel (Szegedy et al., Going Deeper with Convolutions, 2014)
• ResNet (He et al., Deep Residual Learning for Image Recognition, 2015) uses skip connections to allow deeper networks; the most standard architecture nowadays

[Figures: Inception block; ResNet34]
Conclusion / Summary
Today you should have learned
• what the idea of convolutional layers / networks is
• why they are useful in image processing (but not only there)
• how forward and backward propagation are implemented in the convolutional case
• how to construct a neural network for an image classification task
