
CS436 CS5310 Ee513 L05 CNN2

The document is a lecture on Convolutional Neural Networks (CNNs) presented by Murtaza Taj and Usman Nazir, covering fundamental concepts and case studies of various CNN architectures such as AlexNet, GoogLeNet, and ResNet. It includes details on convolution operations, pooling layers, and parameters associated with different models used in image classification tasks. The lecture also highlights advancements in object detection and segmentation techniques within the field of computer vision.


CS 436 / CS 5310 / EE513

Computer Vision Fundamentals

Murtaza Taj/Usman Nazir

Lecture 5: Convolutional Neural Network


Mon 18th Sep 2023
Figure: A complete CNN (three Convolution + ReLU stages, two Pooling layers, then two Fully connected layers) classifying a 9x9 image of an "X". Output votes: X .92, O .51.

Input image (1 = ink, -1 = background):

-1 -1 -1 -1 -1 -1 -1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1

Brandon Rohrer https://brohrer.github.io/how_convolutional_neural_networks_work.html
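To make the pipeline in the figure concrete, here is a minimal pure-Python sketch of one Convolution → ReLU → Pooling stage applied to the 9x9 X image above. The 3x3 diagonal filter, the averaging of products (roughly as Rohrer's walkthrough describes filtering), and 2x2 max pooling with stride 2 are illustrative assumptions, not the exact settings that produced the figure's .92/.51 votes.

```python
# 9x9 "X" image from the figure: 1 = ink, -1 = background
X_IMAGE = [
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1, -1, -1,  1, -1, -1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
]

# Hypothetical 3x3 filter that responds to a top-left to bottom-right diagonal
DIAG_FILTER = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]


def convolve(image, kernel):
    """Slide the kernel over the image (stride 1, no padding). Each output value
    is the mean of the element-wise products, so a perfect match scores 1.0."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def relu(fmap):
    """Replace every negative value with 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window; a partial window at the
    right/bottom edge is dropped in this sketch."""
    out_size = (len(fmap) - size) // stride + 1
    return [
        [
            max(fmap[r * stride + i][c * stride + j] for i in range(size) for j in range(size))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


feature_map = convolve(X_IMAGE, DIAG_FILTER)  # 7x7
pooled = max_pool(relu(feature_map))          # 3x3
for row in pooled:
    print(" ".join(f"{v:.2f}" for v in row))
```

The filter matches the diagonal stroke exactly at two places, so those positions score 1.0, and pooling keeps that strong response while shrinking the map from 7x7 to 3x3.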


Case Studies
- Methods on ImageNet
  - LeNet-5
  - AlexNet
  - VGG-16
  - ResNet-152
  - Inception / GoogLeNet
  - EfficientNet

- Object Detection
  - YOLO
  - DETR

- Segmentation
  - Segment Anything Model (SAM)
  - CLIPSeg
Case Studies
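Output size of a convolution or pooling layer (input size W, filter size F, padding P, stride S):

(W - F + 2*P)/S + 1

AlexNet, Conv1: (227 - 11 + 2*0)/4 + 1 = 55

227x227x3
→ CONV 11x11, s=4, p=0, f = 96
→ POOL 3x3, s=2
→ CONV 5x5, p=2, f = 256
→ POOL 3x3, s=2
→ CONV 3x3, p=1, f = 384
→ CONV 3x3, p=1, f = 384
→ CONV 3x3, p=1, f = 256
→ POOL 3x3, s=2
→ 9216 → FC 4096 → FC 4096 → Softmax 1000

| Layer Name | No. of filters | Filter size | Stride | Pad | Output Volume | Total Params (with bias) | Total Params (without bias) |
| Input      |                |             |        |     | 227x227x3     |                          |                             |
| Conv1      | 96             | 11x11       | 4      | 0   | 55x55x96      | (11x11x3+1)*96 = 34,944  | (11x11x3)*96 = 34,848       |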

[Krizhevsky et al., 2012. ImageNet classification with deep convolutional neural networks]
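A minimal sketch of the output-size formula and the per-layer parameter count used in the table above. The function names are hypothetical; integer division is used, and this version raises an error when (W - F + 2P) is not divisible by S instead of silently rounding down.

```python
def conv_output_size(w, f, p, s):
    """Spatial output size (W - F + 2P)/S + 1 of a conv or pooling layer."""
    if (w - f + 2 * p) % s != 0:
        raise ValueError("filter does not tile the input evenly with this stride")
    return (w - f + 2 * p) // s + 1


def conv_params(f, c_in, n_filters, bias=True):
    """Learnable parameters of a conv layer with n_filters filters of size f x f x c_in,
    plus one bias per filter when bias=True."""
    return (f * f * c_in + (1 if bias else 0)) * n_filters


print(conv_output_size(227, 11, 0, 4))      # 55
print(conv_params(11, 3, 96))               # 34944
print(conv_params(11, 3, 96, bias=False))   # 34848
```

The same function also covers pooling layers, which have a size and stride but no parameters: a 3x3, stride-2 pool maps 55 to 27.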
AlexNet
227x227x3
→ CONV 11x11, s=4, p=0 → 55x55x96
→ POOL 3x3, s=2 → 27x27x96
→ CONV 5x5, same → 27x27x256
→ POOL 3x3, s=2 → 13x13x256
→ CONV 3x3, same → 13x13x384
→ CONV 3x3, same → 13x13x384
→ CONV 3x3, same → 13x13x256
→ POOL 3x3, s=2 → 6x6x256
→ flatten → 9216 → FC 4096 → FC 4096 → Softmax 1000

[Krizhevsky et al., 2012. ImageNet classification with deep convolutional neural networks]
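The shapes above can be checked by applying the output-size formula layer by layer. This sketch assumes the single-tower layer list drawn on the slide, with "same" padding taken as p = (F - 1)/2 for the stride-1 convolutions; the names `out_size` and `trace_shapes` are hypothetical.

```python
def out_size(w, f, p, s):
    """(W - F + 2P)/S + 1, using integer division."""
    return (w - f + 2 * p) // s + 1


# (filter size F, stride S, padding P, output channels or None for pooling)
ALEXNET_FEATURES = [
    (11, 4, 0, 96),    # CONV 11x11, s=4, p=0
    (3, 2, 0, None),   # POOL 3x3, s=2 (keeps the depth)
    (5, 1, 2, 256),    # CONV 5x5, "same": p = (5 - 1) / 2 = 2
    (3, 2, 0, None),   # POOL 3x3, s=2
    (3, 1, 1, 384),    # CONV 3x3, "same"
    (3, 1, 1, 384),    # CONV 3x3, "same"
    (3, 1, 1, 256),    # CONV 3x3, "same"
    (3, 2, 0, None),   # POOL 3x3, s=2
]


def trace_shapes(size=227, depth=3):
    """Return the (height, width, depth) of the input and of every layer's output."""
    shapes = [(size, size, depth)]
    for f, s, p, c_out in ALEXNET_FEATURES:
        size = out_size(size, f, p, s)
        if c_out is not None:
            depth = c_out
        shapes.append((size, size, depth))
    return shapes


shapes = trace_shapes()
for h, w, d in shapes:
    print(f"{h}x{w}x{d}")
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("flattened:", flat)   # flattened: 9216
```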
AlexNet

58.62M of AlexNet's 62.38M total parameters (about 94%) are in the fully connected (FC) layers.
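The parameter split can be reproduced with a short count. This sketch assumes the single-tower layer sizes drawn on the slides (every filter sees all input channels) and one bias per filter or output unit; the names `CONV_LAYERS`, `FC_LAYERS` and `count_params` are hypothetical.

```python
# Conv layers: (filter size F, input channels C_in, number of filters K)
CONV_LAYERS = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# Fully connected layers: (inputs, outputs); 9216 = 6 x 6 x 256 flattened
FC_LAYERS = [(9216, 4096), (4096, 4096), (4096, 1000)]


def count_params(bias=True):
    """Return (conv_params, fc_params), adding one bias per filter / output unit if bias."""
    b = 1 if bias else 0
    conv = sum((f * f * c_in + b) * k for f, c_in, k in CONV_LAYERS)
    fc = sum((n_in + b) * n_out for n_in, n_out in FC_LAYERS)
    return conv, fc


conv_b, fc_b = count_params(bias=True)
conv_nb, fc_nb = count_params(bias=False)
total_b = conv_b + fc_b

print(f"total (with bias): {total_b / 1e6:.2f}M")   # 62.38M
print(f"FC (with bias):    {fc_b / 1e6:.2f}M")      # 58.63M
print(f"FC (without bias): {fc_nb / 1e6:.2f}M")     # 58.62M
print(f"FC share of total: {fc_b / total_b:.0%}")   # 94%
```

Under these assumptions, the slide's 62.38M matches the total with biases and 58.62M matches the FC weights without biases. Either way, the five convolution layers hold only about 3.75M parameters.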


Case Study: GoogLeNet [Szegedy et al. 2014]

Fei-Fei Li, Karpathy & Justin CS231-2016


1x1 Convolution

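Table 1: 1x1 convolution of a 6x6 image with a single filter of weight 2 (each output pixel is twice the input pixel)

Image          Filter   Conv Output
1 5 4 7 2 3    2         2 10  8 14  4  6
4 3 6 0 2 6              8  6 12  0  4 12
9 1 9 4 3 7             18  2 18  8  6 14
4 0 2 6 4 4              8  0  4 12  8  8
3 5 5 7 7 2              6 10 10 14 14  4
0 9 3 6 7 3              0 18  6 12 14  6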
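Table 1 is the single-channel special case of a 1x1 convolution. The sketch below (pure Python; the function name `conv1x1` is hypothetical) implements the general case, where each output channel is a weighted sum of the input channels at the same pixel, and checks it against Table 1.

```python
def conv1x1(volume, filters):
    """1x1 convolution with stride 1 and no bias.

    volume:  C input channels, each an H x W grid (list of lists)
    filters: K filters, each a list of C weights (one per input channel)
    Returns K output channels. Each output pixel is the weighted sum of the
    C input values at the same pixel; no neighbouring pixels are involved.
    """
    height, width = len(volume[0]), len(volume[0][0])
    return [
        [
            [sum(w * volume[c][r][col] for c, w in enumerate(filt)) for col in range(width)]
            for r in range(height)
        ]
        for filt in filters
    ]


# Image from Table 1 (one channel)
IMAGE = [
    [1, 5, 4, 7, 2, 3],
    [4, 3, 6, 0, 2, 6],
    [9, 1, 9, 4, 3, 7],
    [4, 0, 2, 6, 4, 4],
    [3, 5, 5, 7, 7, 2],
    [0, 9, 3, 6, 7, 3],
]

# Table 1: one input channel, one 1x1 filter with weight 2
out = conv1x1([IMAGE], [[2]])[0]
for row in out:
    print(row)

# Multi-channel case: two copies of the image mixed by one filter with
# (hypothetical) weights [1, 1] gives the same result, since 1*x + 1*x = 2*x
mixed = conv1x1([IMAGE, IMAGE], [[1, 1]])[0]
```

With K filters of C weights each, a 1x1 layer maps an H x W x C volume to H x W x K. It leaves the spatial size unchanged and only mixes or reduces depth.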
1x1 Convolution
[Figure: 1x1 convolution in a ConvNet]

Fei-Fei Li, Karpathy & Justin CS231-2016


Case Study: GoogLeNet [Szegedy et al. 2014]

28x28x192 → CONV 5x5, same, f = 32 → 28x28x32

Multiplications: 5 x 5 x 192 x 28 x 28 x 32 = 120,422,400 ≈ 120M
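The 120M figure comes from one F x F x C_in dot product (F = 5, C_in = 192) for each of the H_out x W_out x C_out output values. Additions and biases are not counted, matching the slide:

```latex
\text{mults} = \underbrace{F \cdot F \cdot C_{\text{in}}}_{\text{per output value}}
  \times \underbrace{H_{\text{out}} \cdot W_{\text{out}} \cdot C_{\text{out}}}_{\text{number of output values}}
  = (5 \cdot 5 \cdot 192) \times (28 \cdot 28 \cdot 32)
  = 4{,}800 \times 25{,}088 = 120{,}422{,}400
```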


Case Study: GoogLeNet [Szegedy et al. 2014]

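28x28x192 → CONV 1x1, 16 filters (each 1x1x192) → 28x28x16 → CONV 5x5, 32 filters (each 5x5x16) → 28x28x32

The 1x1 layer acts as a bottleneck. It shrinks the depth from 192 to 16 before the expensive 5x5 convolution.

Multiplications (1x1 layer): 1 x 1 x 192 x 28 x 28 x 16 = 2.4M

Multiplications (5x5 layer): 5 x 5 x 16 x 28 x 28 x 32 = 10.0M

Total: 12.4M vs. 120M for the direct 5x5 convolution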
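The same counting rule can be checked in a few lines of Python. `conv_mults` is a hypothetical helper, and only multiplications are counted, as on the slides.

```python
def conv_mults(f, c_in, h_out, w_out, c_out):
    """Multiplications in a conv layer: an F*F*C_in dot product per output value."""
    return f * f * c_in * h_out * w_out * c_out


# Direct: 28x28x192 -> 5x5 "same" conv, 32 filters -> 28x28x32
direct = conv_mults(5, 192, 28, 28, 32)

# Bottleneck: 1x1 conv down to 16 channels, then the 5x5 conv on 16 channels
reduce_cost = conv_mults(1, 192, 28, 28, 16)
expand_cost = conv_mults(5, 16, 28, 28, 32)
bottleneck = reduce_cost + expand_cost

print(f"direct:     {direct / 1e6:.1f}M")
print(f"bottleneck: {bottleneck / 1e6:.1f}M")
print(f"saving:     {direct / bottleneck:.1f}x")
```

The output volume is 28x28x32 in both cases. Only the cost of getting there changes, by roughly a factor of 10.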


Case Study: GoogLeNet [Szegedy et al. 2014]

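- Inception Motivation

Apply several filter sizes in parallel to the same 28x28x192 input ("same" padding) and stack the results along the depth axis:

  1x1 CONV, 64 filters → 28x28x64
  3x3 CONV, 128 filters → 28x28x128
  5x5 CONV, 32 filters → 28x28x32
  MAX-POOL → 28x28x32
  Concatenated output: 28x28x256

Multiplications (1x1): 1 x 1 x 192 x 28 x 28 x 64 = 9,633,792

Multiplications (3x3): 3 x 3 x 192 x 28 x 28 x 128 = 173,408,256

Multiplications (5x5): 5 x 5 x 192 x 28 x 28 x 32 = 120,422,400

Total: 303.5M vs. 963.4M (= 5 x 5 x 192 x 28 x 28 x 256, a single 5x5 convolution producing all 256 output channels)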

[Szegedy et al. 2014. Going deeper with convolutions]
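The totals above can be reproduced with the same per-layer counting rule. In this sketch (the helper `conv_mults` is hypothetical), the max-pool branch is assumed to contribute 32 channels and no multiplications, which is what makes the branch sum come to 303.5M. The 963.4M figure is read here as one 5x5 "same" convolution producing all 256 output channels, an interpretation that the arithmetic supports.

```python
def conv_mults(f, c_in, h_out, w_out, c_out):
    """Multiplications in a conv layer: an F*F*C_in dot product per output value."""
    return f * f * c_in * h_out * w_out * c_out


H = W = 28
C_IN = 192
# Convolution branches of the module on the slide: name -> (filter size, filters)
CONV_BRANCHES = {"1x1": (1, 64), "3x3": (3, 128), "5x5": (5, 32)}
POOL_DEPTH = 32  # depth of the max-pool branch as drawn; its cost is not counted

costs = {name: conv_mults(f, C_IN, H, W, k) for name, (f, k) in CONV_BRANCHES.items()}
out_depth = sum(k for _, k in CONV_BRANCHES.values()) + POOL_DEPTH
branch_total = sum(costs.values())
single_5x5 = conv_mults(5, C_IN, H, W, out_depth)

for name, cost in costs.items():
    print(f"{name}: {cost:,}")
print(f"output volume: {H}x{W}x{out_depth}")
print(f"{branch_total / 1e6:.1f}M vs. {single_5x5 / 1e6:.1f}M")
```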


Case Studies
- ImageNet Challenge: 1000-class classification
- Classical Methods
  - LeNet-5 (1998, MNIST digit classification)
  - AlexNet (2012)
  - VGG-16 (2014)
  - ResNet-152 (2015)
  - Inception / GoogLeNet (2014)
Case Study: ResNet [He et al. 2015]

Fei-Fei Li, Karpathy & Justin CS231-2016




CNN Top-5 Error vs. Number of Layers

https://www.researchgate.net/figure/Recent-ConvNets-proposed-in-ILSVRC_fig1_338797371
Next: Object Detection
