Lecture 5
Mehdi Zakroum
International University of Rabat

1. About The ImageNet Dataset and The ILSVRC Competition
Website: https://www.image-net.org
About ILSVRC
Researchers around the world report their results, and the most successful
and innovative teams are invited to present at the Computer Vision and
Pattern Recognition (CVPR) conference.
AlexNet
AlexNet (2012)
VGG16 (2014)
VGG Accuracy
“It is easy to see that a stack of two 3 × 3 conv. layers (without spatial pooling in between) has an effective
receptive field of 5 × 5; three such layers have a 7 × 7 effective receptive field. So what have we gained by
using, for instance, a stack of three 3 × 3 conv. layers instead of a single 7 × 7 layer? First, we incorporate
three non-linear rectification layers instead of a single one, which makes the decision function more
discriminative. Second, we decrease the number of parameters: assuming that both the input and the
output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by
3(3² C²) = 27C² weights; at the same time, a single 7 × 7 conv. layer would require 7² C² = 49C²
parameters, i.e. 81% more. This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing
them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).”
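The arithmetic in this quote is easy to check numerically. Below is a minimal sketch (assuming PyTorch; C = 64 is an arbitrary choice, and biases are omitted to match the paper's count) comparing a single 7 × 7 convolution with a stack of three 3 × 3 convolutions.

```python
import torch
import torch.nn as nn

C = 64  # arbitrary channel count for illustration

# A single 7x7 convolution mapping C channels to C channels
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# A stack of three 3x3 convolutions with ReLUs in between
stack_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

print(num_params(single_7x7))  # 49 * C^2 = 200704
print(num_params(stack_3x3))   # 27 * C^2 = 110592

# Both preserve the spatial size, and each output unit of the 3x3 stack
# sees a 7x7 region of the input (the effective receptive field).
x = torch.randn(1, C, 56, 56)
print(single_7x7(x).shape, stack_3x3(x).shape)  # both (1, 64, 56, 56)
```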
GoogLeNet (Inception)
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew
Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE
conference on computer vision and pattern recognition, pp. 1-9. 2015.
Inception
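The building block of GoogLeNet is the Inception module: parallel 1 × 1, 3 × 3, and 5 × 5 convolutions (the latter two preceded by 1 × 1 reductions) plus a pooled branch, all concatenated along the channel dimension. The following is a minimal sketch, assuming PyTorch; the branch widths used below are illustrative rather than the exact GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Illustrative Inception-style module (branch widths are illustrative)."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction followed by 3x3 convolution
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Branch 3: 1x1 reduction followed by 5x5 convolution
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True),
        )
        # Branch 4: 3x3 max-pooling followed by 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
print(m(x).shape)  # (1, 64 + 128 + 32 + 32, 28, 28) = (1, 256, 28, 28)
```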
ResNet
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual
learning for image recognition.” In Proceedings of the IEEE conference on
computer vision and pattern recognition, pp. 770-778. 2016.
“The degradation (of training accuracy) indicates that not all systems are similarly easy to
optimize. Let us consider a shallower architecture and its deeper counterpart that adds more
layers onto it. There exists a solution by construction to the deeper model: the added layers
are identity mapping, and the other layers are copied from the learned shallower model. The
existence of this constructed solution indicates that a deeper model should produce no higher
training error than its shallower counterpart. But experiments show that our current solvers
on hand are unable to find solutions that are comparably good or better than the constructed
solution (or unable to do so in feasible time).”
“The degradation problem suggests that the solvers might have difficulties in approximating
identity mappings by multiple nonlinear layers. With the residual learning reformulation, if
identity mappings are optimal, the solvers may simply drive the weights of the multiple
nonlinear layers toward zero to approach identity mappings.”
y = F(x, {Wᵢ}) + Wₛ x, where Wₛ is a linear projection used only to match dimensions (the shortcut is the identity when the dimensions already agree).
▶ The form of the residual function F is flexible; in the illustrative figure, F has two layers, while
more layers are possible.
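As a concrete illustration of the formula above, here is a minimal residual-block sketch, assuming PyTorch and batch-normalized 3 × 3 convolutions for F (the exact form of F varies across ResNet variants).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: y = F(x, {Wi}) + Ws x (illustrative sketch)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # F(x, {Wi}): two 3x3 convolutions with batch norm and ReLU in between
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Ws: identity when shapes match, otherwise a 1x1 projection
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        # Add the shortcut to the residual branch, then apply the final ReLU
        return torch.relu(self.f(x) + self.shortcut(x))

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64, 128, stride=2)(x).shape)  # (1, 128, 28, 28)
```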
ResNet: Architecture
Conclusion
▶ Multi-Layer Perceptrons (Fully-Connected Neural Networks):
  • Do not scale well to images
  • Ignore the information carried by pixel positions and their correlation with neighboring pixels
  • Are not robust to translations of objects in the image
▶ Convolutional Neural Networks:
  • Leverage sparse interactions and parameter sharing to reduce the number of parameters to learn (see the parameter-count sketch after this list)
  • Use the convolution operation to make object detection and classification robust to shifts of objects in the image
  • Have demonstrated remarkable results in image classification, both on benchmark tasks and in practical applications
  • However, CNNs have limited robustness to other geometric transformations such as scaling and rotation: scaling or rotating an image changes the spatial relationships between pixels and can result in a loss of relevant features that the CNN was trained to recognize.
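To make the parameter-sharing point concrete, the sketch below (assuming PyTorch; the layer sizes are arbitrary) compares the number of weights in a fully-connected layer and in a convolutional layer applied to a 224 × 224 RGB image.

```python
import torch.nn as nn

# A single fully-connected layer mapping the flattened image to 100 hidden units
fc = nn.Linear(224 * 224 * 3, 100)

# A convolutional layer with 100 filters of size 3x3 over the same input
conv = nn.Conv2d(3, 100, kernel_size=3)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

print(num_params(fc))    # 15,052,900 parameters
print(num_params(conv))  # 2,800 parameters
```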