
WiSe 2023/24

Deep Learning 1

Lecture 8: Convolutional Neural Networks


Machine Learning for Computer Vision

▶ Traditional approach (before 2012): handcrafted features

Image → Handcrafted Feature Extractor → Trainable Classifier

e.g., SIFT, HoG

▶ Deep Learning (2012-...): end-to-end hierarchical feature learning

Image → Low-level Features → Mid-level Features → High-level Features → Trainable Classifier

learning end-to-end

Figures are adapted from Canziani and LeCun, 2021.

1/42
ImageNet [2] Benchmark
Task: 1000-class classification (∼3M images)

ImageNet Classification Top-5 Error (%). Gordon Cooper, 2019

Remarks: 1): Another key ingredient that helped facilitate the progress on the ImageNet task is the utilization of GPUs in training large CNNs [9]; 2): For more up-to-date results, see https://paperswithcode.com/sota/image-classification-on-imagenet.

2/42
Recap: Multi-layer perceptrons

A sequence of affine and thresholding transformations:

z^(1) = W^(1) x + b^(1)
a^(1) = σ(z^(1))
⋮
a^(j) = σ(W^(j) a^(j−1) + b^(j))
⋮
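
As an illustration (not part of the original slides), a minimal NumPy sketch of this forward pass; the logistic sigmoid for σ and the layer sizes are hypothetical choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    # each layer j computes a^(j) = sigmoid(W^(j) a^(j-1) + b^(j))
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# hypothetical sizes: 3072 -> 128 -> 10
rng = np.random.default_rng(0)
weights = [0.01 * rng.standard_normal((128, 3072)), 0.01 * rng.standard_normal((10, 128))]
biases = [np.zeros(128), np.zeros(10)]
print(mlp_forward(rng.standard_normal(3072), weights, biases).shape)  # (10,)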

3/42
Could we use MLPs for Images?

One might flatten images to be (very) tall vectors and feed these vectors as
inputs to MLPs.

Image → flatten → MLP

For example, a 3-color 32 × 32 image (e.g., R^(32×32×3)) flattens to a tall vector in R^3072.


Issues
▶ space and algorithmic complexity
▶ statistical (learning) inefficiency: We do not exploit correlations of
neighbouring pixels.

4/42
Could we use MLPs for Images? (cont.)

Issue 1: Space and Algorithmic Complexity

Let m be the number of input dimensions (after flattening), e.g., m = 3072, and n be the number of neurons in the first layer of an MLP.

Space Complexity: We need mn + n parameters.


→ mn for weights W^(1) and n for biases b^(1).

Algorithmic Complexity: O(mn)
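
A quick sanity check of this count; the layer width n = 1000 is a hypothetical choice:

m, n = 3072, 1000          # input dimensions after flattening, neurons in the first layer
num_params = m * n + n     # mn weights + n biases
print(num_params)          # 3073000 parameters for a single fully connected layer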

5/42
Could we use MLPs for Images? (cont.)
Issue 2: Statistical Inefficiency
After flattening, we do not exploit local relationships between neighbouring
pixels.

For example, pixels in the regions of car wheels often correlate, and this
correlation might be useful for some learning tasks.

The images are from the Stanford Car dataset [12]; the annotation boxes are annotated by Pat.

6/42
Overview of CNNs
CNNs = learning hierarchical features (from low- to high-level features)
using convolution (learning correlations between neighbouring pixels)
and pooling layers (enlarging the size of neighbourhoods)1

Remarks: 1): In practice, there are other components in CNNs that help increase the performance of the models, but convolution
and pooling layers are the two most important ingredients; 2): Figure is taken from LeCun et al., 2015 [15].

7/42
Convolution Operator

Let x(t) ∈ R be a one-dimensional signal at time t ∈ R and w(τ) be a
weighting function. The convolution operator ∗ is

(x ∗ w)(t) = ∫ x(τ) w(t − τ) dτ                (continuous setting)
           ≈ Σ_{τ=−∞}^{∞} x(τ) w(t − τ)        (discrete setting)

We refer to
▶ x as the input,
▶ w as the kernel,
▶ and (x ∗ w)(t) as the feature map, denoted with a(t).
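
A minimal NumPy sketch of the discrete case, assuming a finite kernel so the infinite sum reduces to the kernel's support (np.convolve computes the same full convolution):

import numpy as np

def conv1d(x, w):
    # (x * w)(t) = sum_tau x(tau) * w(t - tau); "full" output of length len(x) + len(w) - 1
    T, K = len(x), len(w)
    out = np.zeros(T + K - 1)
    for t in range(T + K - 1):
        for tau in range(T):
            if 0 <= t - tau < K:
                out[t] += x[tau] * w[t - tau]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5])
assert np.allclose(conv1d(x, w), np.convolve(x, w))  # matches NumPy's convolution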

8/42
Convolution Operator (cont.)

We can see that the convolution operator is commutative:

(x ∗ w)(t) = Σ_τ x(τ) w(t − τ)
           = Σ_{τ′} x(t − τ′) w(τ′)        (define τ′ := t − τ)
           = (w ∗ x)(t)

In deep learning, we instead use a related operator called cross-correlation1,
denoted with ⋆ (NOT the asterisk ∗):

(x ⋆ w)(t) = Σ_{τ′} x(t + τ′) w(τ′).

Remarks: 1): This operator is what deep learning frameworks actually implement in their convolution layers; we therefore stick with the name convolution and use the symbol ⋆ to indicate the actual operator; 2): The range of τ′ will be made clear in the following slides.
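
A short NumPy check of these identities with an arbitrary example signal and kernel (np.correlate computes the cross-correlation; flipping the kernel turns it back into a convolution):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.1, 0.3, 0.6])

# convolution is commutative: x * w == w * x
assert np.allclose(np.convolve(x, w), np.convolve(w, x))

# the cross-correlation used by deep learning "convolution" layers equals
# convolution with a flipped kernel
assert np.allclose(np.correlate(x, w, mode="valid"),
                   np.convolve(x, w[::-1], mode="valid"))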

9/42
Figure from Wikipedia: Convolution.

10/42
Two-Dimensional Discrete Convolution

A two-dimensional discrete convolution layer consists of

▶ weight (W ∈ R^(dout×din×k×k), or kernel), whose dimensions are governed by
▶ number of input channels din, e.g., din = 3 for a 3-color image
▶ number of output channels dout, chosen by the user
▶ kernel size k, chosen by the user
▶ bias (b ∈ R, optional)

Weight W (k = 3, din = 1, dout = 1)

Discrete convolution using W (no bias) on an input (din = 1).
Figures are adapted from [3].
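
A short PyTorch sketch of these shapes; the specific channel counts, kernel size, and input size are arbitrary choices for illustration:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=True)
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3]) -> (dout, din, k, k)
print(conv.bias.shape)    # torch.Size([16]) -> one bias per output channel (b ∈ R when dout = 1)
x = torch.randn(1, 3, 32, 32)           # a single 3-channel 32x32 image
print(conv(x).shape)      # torch.Size([1, 16, 30, 30]) with stride 1, no padding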

11/42
Numerical Example

Input (5 × 5):
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

Weight W (k = 3):
0 1 2
2 2 0
0 1 2

Output (3 × 3), stride 1, no padding:
12 12 17
10 17 19
 9  6 14

In the original figure, the weight is shown applied at two locations of the input,
with the corresponding (shaded) output entries highlighted.
Figures are from [3].
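
A small sketch that reproduces the output above (valid cross-correlation with stride 1 and no padding), as a check:

import numpy as np

x = np.array([[3, 3, 2, 1, 0],
              [0, 0, 1, 3, 1],
              [3, 1, 2, 2, 3],
              [2, 0, 0, 2, 2],
              [2, 0, 0, 0, 1]], dtype=float)
W = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]], dtype=float)

k = W.shape[0]
out = np.zeros((x.shape[0] - k + 1, x.shape[1] - k + 1))
for m in range(out.shape[0]):
    for n in range(out.shape[1]):
        out[m, n] = np.sum(x[m:m + k, n:n + k] * W)  # element-wise product over the window
print(out)  # [[12. 12. 17.] [10. 17. 19.] [ 9.  6. 14.]]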

12/42
Stride and Padding

Apart from the parameters (W, b), discrete convolution also has two important hyperparameters, namely

▶ Stride (the amount by which the kernel is translated)

Left: stride = 1 (previous example), Right: stride = 2

▶ Padding: How do we handle regions at the boundary of the input?

Padding with sizes 1 and 2, respectively.
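
A brief PyTorch illustration of how stride and padding change the output size; the 5 × 5 input is a hypothetical choice:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
print(nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)(x).shape)  # [1, 1, 3, 3]
print(nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)(x).shape)  # [1, 1, 2, 2]
print(nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)(x).shape)  # [1, 1, 5, 5]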

13/42
Discrete Convolution when dout > 1

So far, we have mainly discussed the case dout = 1. When dout > 1, we repeat
the process dout times and concatenate the outputs together.

Example when dout = 2 (stride = 1 and no padding)

14/42
Two-Dimensional Discrete Convolution

Let a ∈ R^(din×hin×win) be an input and W ∈ R^(dout×din×k×k) be a convolution
weight. Two-dimensional discrete convolution can be expressed as

z_{c′,m,n} = (a ⋆ W)_{c′,m,n} = Σ_{c=1}^{din} Σ_{τ1=1}^{k} Σ_{τ2=1}^{k} a_{c, m+(τ1−1), n+(τ2−1)} W_{c′,c,τ1,τ2},

where ∀c′ ∈ {1, . . . , dout}, ∀m ∈ {1, . . . , hout}, and ∀n ∈ {1, . . . , wout}.

The exact value of hout and wout depends on the kernel size, stride, and
padding.
Suppose we have a square input (hin = win) and use stride = 1 and no padding.
Then we have

hout = wout = hin − k + 1        (Relationship 2 in [3])

See [3] for the relationship between input and output size in other settings.
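
As a sketch, the general relationship derived in [3] can be written as a small helper; with stride = 1 and no padding it reduces to hin − k + 1:

def conv_output_size(h_in, k, stride=1, padding=0):
    # general relationship from [3]: floor((h_in + 2*padding - k) / stride) + 1
    return (h_in + 2 * padding - k) // stride + 1

assert conv_output_size(5, k=3) == 5 - 3 + 1       # Relationship 2 in [3]
assert conv_output_size(5, k=3, stride=2) == 2
assert conv_output_size(5, k=3, padding=1) == 5    # "same" size with stride 1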

15/42
Translation Equivariance

Credit: Christian Wolf

16/42
Pooling Layer
Convolution can detect patterns that are not larger than the kernel size k.
A stack of convolutions can be used to increase this pattern-responding region,
commonly referred to as the receptive field.
Practically, it is more effective to increase the receptive field by subsampling
the input, i.e., pooling.
Commonly used pooling layers are average1 and max pooling.

Input (4 × 4):
0 4 1 0
1 3 0 1
2 2 0 2
1 3 1 1

Average Pooling →
2   0.5
2   1

Max Pooling →
4 1
3 2

Average and max pooling with k = stride = 2.

1): One can express the average pooling using convolution with constant weight
W_{τ1,τ2} = 1/k² and no bias.
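
A minimal NumPy sketch of both pooling operations on the 4 × 4 example above (k = stride = 2):

import numpy as np

x = np.array([[0, 4, 1, 0],
              [1, 3, 0, 1],
              [2, 2, 0, 2],
              [1, 3, 1, 1]], dtype=float)

def pool2d(x, k, reduce_fn):
    h, w = x.shape[0] // k, x.shape[1] // k
    out = np.zeros((h, w))
    for m in range(h):
        for n in range(w):
            out[m, n] = reduce_fn(x[m * k:(m + 1) * k, n * k:(n + 1) * k])
    return out

print(pool2d(x, 2, np.mean))  # [[2.  0.5] [2.  1. ]]
print(pool2d(x, 2, np.max))   # [[4. 1.] [3. 2.]]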

17/42
(Local) Translation Invariance of Max Pooling

Input (4 × 4):
0 4 1 0
1 3 0 1
2 2 0 2
1 3 1 1

Max Pooling → top-left output entry: 4

For this example, the top-left output entry of the max pooling remains the
same if the input is shifted left and down one step.
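
A short self-contained check of this local invariance (the shift fills the vacated border with zeros, an assumption for illustration):

import numpy as np

x = np.array([[0, 4, 1, 0],
              [1, 3, 0, 1],
              [2, 2, 0, 2],
              [1, 3, 1, 1]], dtype=float)

def max_pool_topleft(x, k=2):
    return x[:k, :k].max()

# shift the input one pixel to the left and one pixel down (zero-filled borders)
shifted = np.zeros_like(x)
shifted[1:, :-1] = x[:-1, 1:]
print(max_pool_topleft(x), max_pool_topleft(shifted))  # 4.0 4.0 -> unchanged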

18/42
How the Size of Receptive Field Evolves

Screenshot from [1]. What happens to the receptive field if we change some of
these parameters? Try it at
https://distill.pub/2019/computing-receptive-fields/.
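
A minimal sketch of the recursive receptive-field computation described in [1], assuming a plain chain of layers (r: receptive field size, j: cumulative stride between feature-map positions):

def receptive_field(layers):
    # layers: list of (kernel_size, stride) pairs, applied in order
    r, j = 1, 1
    for k, s in layers:
        r = r + (k - 1) * j   # each layer grows the field by (k - 1) * cumulative stride
        j = j * s
    return r

# e.g., two 3x3 convs with stride 1 followed by 2x2 max pooling with stride 2
print(receptive_field([(3, 1), (3, 1), (2, 2)]))  # 6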

19/42
Blueprint of CNNs for Classification

Feature Extractor: [Conv. → Act. → Pooling] × L
Classifier: Global Average Pooling → MLP
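
A compact PyTorch sketch of this blueprint; the channel sizes, L = 3 blocks, and 10 output classes are hypothetical choices:

import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # one [Conv -> Activation -> Pooling] block of the feature extractor
    return nn.Sequential(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                         nn.ReLU(),
                         nn.MaxPool2d(kernel_size=2))

model = nn.Sequential(
    conv_block(3, 32), conv_block(32, 64), conv_block(64, 128),  # L = 3 blocks
    nn.AdaptiveAvgPool2d(1),   # global average pooling
    nn.Flatten(),
    nn.Linear(128, 10),        # MLP classifier (a single layer here)
)
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])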

20/42
Case Study: LeNet-5 [17]

Key Contribution: Pioneering work on using modern CNNs for handwritten
character recognition with gradient-based learning.

Building on earlier work on CNNs by LeCun [16], which was inspired by the Neocognitron by Fukushima [5].

21/42
Case Study: AlexNet [13]

Key Contributions: First CNN winning ImageNet Challenge (2012); Making


use of GPUs in training

22/42
Case Study: VGG [20]

Key Contributions: Winner⋆ of ImageNet Challenge (2014); demonstrating
the benefit of depth.

⋆: 1st and 2nd places on the localization and classification tracks.

23/42
Case Study: ResNets [8]

Key Contributions: Winner of ImageNet Challenge (2015) and many other
challenges; introducing the residual connection, which allows training CNNs with
many more layers (e.g., 8× deeper than VGG).

24/42
Case Study: ResNets [8] (cont.)

25/42
Residual Connections

▶ Core idea of ResNet by He et al.

▶ For layer l and intermediate representation x_l with layers NN_l:

x_{l+1} = x_l + NN_l(x_l; θ_l)
x_{l+2} = x_{l+1} + NN_{l+1}(x_{l+1}; θ_{l+1}) = x_l + NN_l(x_l; θ_l) + NN_{l+1}(x_{l+1}; θ_{l+1})
...

▶ Better gradient flow by 'shortcutting' over a high number of intermediate
layers between the loss and layer NN_l; terms that do not depend on the
differentiated parameters vanish:

∂_{θ_l} x_{l+1} = ∂_{θ_l} NN_l(x_l; θ_l)
∂_{θ_{l+1}} x_{l+2} = ∂_{θ_{l+1}} NN_{l+1}(x_{l+1}; θ_{l+1})
...
...
▶ Allows training far deeper networks with more parameters, which results
in better performance
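
A minimal PyTorch sketch of one residual block following this idea (a simplified variant of the blocks in [8]: no batch normalization, and the shortcut assumes matching shapes; channel counts are illustrative):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # NN_l: two 3x3 convolutions; padding=1 keeps the spatial size so x + NN_l(x) is valid
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))  # x_{l+1} = x_l + NN_l(x_l; θ_l)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 8, 8)).shape)  # torch.Size([1, 64, 8, 8])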

26/42
What features do CNNs learn? [23]

27/42
Shortcoming of CNNs: Vulnerable to Noise in Input
If the input is slightly (adversarially) perturbed, the prediction of CNNs can
dramatically change.

Top: Adversarial noise causes the prediction of a CNN to change from Panda to
Gibbon; Bottom: A physical adversarial sticker causes stop signs to be detected as a
target speed limit sign. Figures are from [7, 4], respectively.

28/42
Shortcoming of CNNs: Texture Bias [6]

Unlike humans, who often rely on shape information in visual processing [14],
CNNs rely heavily on textural features.

29/42
Applications of CNNs

Apart from image classification, CNNs are often used as feature extractors
in image-based learning tasks, and the concept of CNNs can be generalized
to inputs from other modalities.
▶ Different Tasks: Object Detection, Image Segmentation, Image
Captioning, ...
▶ Other Modalities: Text Classification, Text-to-Speech, ...

30/42
Object Detection: You Only Look Once (YOLO) [18]

The architecture image is from https://dinghow.site/2019/08/24/object-detection-part1.

31/42
Image Segmentation: U-net [19]

Key Contributions: Fast architecture for precise (biomedical) image segmentation.

Key concepts: contracting path (increase what, reduce where) and expanding path (precise localization).

32/42
Image Segmentation: U-net [19] (cont.)

33/42
Image Captioning [21, 10, 22]

Figure from [22].

34/42
Convolution and Pooling for Text Classification [11]

Figure is from https://indiantechwarrior.com/sentence-classification-using-convolutional-neural-networks/

35/42
Transformers: Attention Mechanism
▶ Idea: find global interactions within an
input sequence or between two input
sequences.
▶ In our case: K = V = x ∈ R^T and
Q = y ∈ R^(T′)
▶ "Query with target y_{t′} all source keys x_t
on how much attention to pay to source
value x_t at each target timestep t′"

▶ Compute the pairwise product
QK^T ∈ R^(T′×T) to obtain similarities

▶ Normalize over the source dimension with a
softmax probability

▶ Scale the source values V = x:

Attention(Q, K, V) = Softmax_T(QK^T) V,  where Softmax_T(QK^T) ∈ R^(T′×T)
(1)
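
A minimal NumPy sketch of Eq. (1): single head, no scaling factor, matching the notation on this slide; the sequence lengths and feature dimension are hypothetical:

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q: (T', F), K: (T, F), V: (T, F); attention weights have shape (T', T)
    weights = softmax(Q @ K.T, axis=-1)   # normalize over the source dimension T
    return weights @ V

T_src, T_tgt, F = 5, 3, 4
x = np.random.randn(T_src, F)    # source sequence: keys and values
y = np.random.randn(T_tgt, F)    # target sequence: queries
print(attention(y, x, x).shape)  # (3, 4) -> one attended source summary per target step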

36/42
Transformers: Attention Mechanism
▶ The original attention mechanism in Bahdanau et al. used an MLP as a scalar
projector
▶ QK^T is a dot product, so we can use arbitrary dimensions:
x ∈ R^(T×F) and y ∈ R^(T′×F), giving QK^T ∈ R^(T′×T)

Figure: "Neural Machine Translation by Jointly Learning to Align and Translate"


by Bahndanau et al.

37/42
Summary

▶ CNNs exploit the local structure of (2D) signals and learn hierarchical
representations for a given task.
▶ CNNs share parameters between spatial locations, and they are thus
suitable for learning from signals where features can potentially appear at any
location.
▶ Main ingredients of CNNs: convolution and pooling layers.
▶ CNNs are widely used in many applications and domains (beyond image
data).
▶ CNNs enforce local interactions in the first layers. For some tasks, it is
beneficial to capture global interactions (this can be achieved with
attention layers).

38/42
Bibliography I
[1] A. Araujo, W. Norris, and J. Sim.
Computing receptive fields of convolutional neural networks.
Distill, 2019.
https://distill.pub/2019/computing-receptive-fields.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei.
ImageNet: A Large-Scale Hierarchical Image Database.
In CVPR09, 2009.
[3] V. Dumoulin and F. Visin.
A guide to convolution arithmetic for deep learning.
arXiv preprint arXiv:1603.07285, 2016.

[4] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song.
Robust physical-world attacks on deep learning visual classification.
In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City,
UT, USA, June 18-22, 2018, pages 1625-1634. Computer Vision Foundation / IEEE Computer Society,
2018.
[5] K. Fukushima.
A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in
position.
Biological Cybernetics, 36:193-202, 1980.

[6] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel.


Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and
robustness.
In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May
6-9, 2019. OpenReview.net, 2019.

[7] I. J. Goodfellow, J. Shlens, and C. Szegedy.


Explaining and harnessing adversarial examples.
In Y. Bengio and Y. LeCun, editors, 3rd International
Conference on Learning Representations, ICLR
2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

39/42
Bibliography II
[8] K. He, X. Zhang, S. Ren, and J. Sun.
Deep residual learning for image recognition.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June
2016.
[9] S. Hooker.
The hardware lottery.
Communications of the ACM, 64(12):58-65, 2021.
[10] A. Karpathy and L. Fei-Fei.
Deep visual-semantic alignments for generating image descriptions.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3128-3137,
2015.
[11] Y. Kim.
Convolutional neural networks for sentence classification.
In A. Moschitti, B. Pang, and W. Daelemans, editors, Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting
of SIGDAT, a Special Interest Group of the ACL, pages 1746-1751. ACL, 2014.

[12] J. Krause, M. Stark, J. Deng, and L. Fei-Fei.
3d object representations for fine-grained categorization.
In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney,
Australia, 2013.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton.
Imagenet classification with deep convolutional neural networks.
In P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances
in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing
Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States,
pages 1106-1114, 2012.
[14] B. Landau, L. B. Smith, and S. S. Jones.
The importance of shape in early lexical learning.
Cognitive development, 3(3):299-321, 1988.

40/42
Bibliography III
[15] Y. LeCun, Y. Bengio, and G. Hinton.
Deep learning.
Nature, 521(7553):436-444, 2015.

[16] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel.
Backpropagation applied to handwritten zip code recognition.
Neural computation, 1(4):541-551, 1989.

[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner.
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11):2278-2324, 1998.

[18] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi.
You only look once: Unified, real-time object detection.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779-788,
2016.
[19] O. Ronneberger, P. Fischer, and T. Brox.
U-net: Convolutional networks for biomedical image segmentation.
In International Conference on Medical image computing and computer-assisted intervention, pages
234-241. Springer, 2015.
[20] K. Simonyan and A. Zisserman.
Very deep convolutional networks for large-scale image recognition.
In Y. Bengio and Y. LeCun, editors, 3rd International Conference on
Learning Representations, ICLR
2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

[21] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.
Show and tell: A neural image caption generator.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3156-3164,
2015.
[22] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio.
Show, attend and tell: Neural image caption generation with visual attention.
In International conference on machine learning, pages 2048-2057. PMLR, 2015.

41/42
Bibliography IV

[23] M. D. Zeiler and R. Fergus.
Visualizing and understanding convolutional networks.
In D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision - ECCV 2014 - 13th
European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, volume 8689 of
Lecture Notes in Computer Science, pages 818-833. Springer, 2014.

42/42
