
Advanced Convolutional Neural Networks

Nguyen Quang Uy

1
Outline
1. AlexNet

2. VGGNet

3. GoogLeNet

4. ResNet

5. MobileNet

6. EfficientNet
2
Legends

3
Layers

4
Activation functions

5
Modules/Blocks

6
Repeated layers

7
AlexNet

8
Overview
• Paper: ImageNet Classification with Deep Convolutional Neural Networks
• Published in: NeurIPS 2012.
• Considered one of the most impactful papers in computer vision.

9
Novelties
• Uses Rectified Linear Units (ReLUs) as activation functions.
• Uses dropout layers.
• Uses data augmentation.
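A minimal PyTorch sketch of these three ideas (illustrative only, not the authors' original code; the head sizes follow torchvision's AlexNet classifier):

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops and horizontal flips (the paper also
# used PCA-based colour jitter, omitted here for brevity).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# ReLU activations and dropout in an AlexNet-style fully-connected head.
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)
```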

10
Architecture
• AlexNet has 8 layers — 5 convolutional and 3 fully-connected.
• AlexNet has 60M parameters.
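The count is easy to check against the torchvision implementation (a quick sketch, assuming a recent torchvision is installed):

```python
from torchvision.models import alexnet

model = alexnet(weights=None)  # random initialisation, no download needed
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 61M
```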

11
Results
• Top-1 error rate is 37.5%
• Top-5 error rate is 17.0%

12
VGG

13
Overview
• VGG: Visual Geometry Group
• Paper: Very Deep Convolutional Networks for Large-Scale Image
Recognition
• Published in: arXiv, 2014.

14
Novelties
• Designs deeper networks (roughly twice as deep as AlexNet) by stacking uniform convolutions.
• Uses only 3×3 kernels, as opposed to AlexNet's 11×11. This decreases the number of parameters (see the sketch below).
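A back-of-the-envelope check of the saving: two stacked 3×3 convolutions cover the same 5×5 receptive field as one 5×5 convolution, with fewer weights (C is an arbitrary example channel count):

```python
C = 256                        # example channel count
one_5x5 = 5 * 5 * C * C        # one 5x5 convolution, C in / C out, bias ignored
two_3x3 = 2 * 3 * 3 * C * C    # two stacked 3x3 convolutions, same receptive field
print(one_5x5, two_3x3)        # 1638400 vs 1179648 -> ~28% fewer parameters
```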

15
Architecture
• VGG has 13 convolutional and 3 fully-connected layers.
• This network essentially stacks more layers onto the AlexNet design.
• It has 138M parameters.
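A quick sketch splitting that count between the convolutional stack and the fully-connected head (assuming torchvision's vgg16):

```python
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(weights=None)
conv = sum(p.numel() for m in model.features if isinstance(m, nn.Conv2d)
           for p in m.parameters())
fc = sum(p.numel() for m in model.classifier if isinstance(m, nn.Linear)
         for p in m.parameters())
print(f"conv: {conv / 1e6:.1f}M, fc: {fc / 1e6:.1f}M")  # ~14.7M vs ~123.6M
```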

16
VGG results
• Top-1 accuracy is 71.5%
• Top-5 accuracy is 90.1%

17
GoogLeNet

18
Overview
• Also known as Inception-v1
• Paper: Going Deeper with Convolutions
• Published in: 2015 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
• Achieved results competitive with human performance.

19
Novelties
• Builds networks from modules/blocks, instead of simply stacking convolutional layers.
• 1×1 convolutions are used for dimensionality reduction to remove computational bottlenecks.
• Runs parallel convolutions with filters at 1×1, 3×3 and 5×5, followed by concatenation (see the sketch below).
• Uses two auxiliary classifiers to encourage discrimination in the lower stages.
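A minimal sketch of one Inception module (channel sizes borrowed from the paper's inception(3a) block; ReLUs omitted for brevity, so this is an illustration rather than the full network):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception-v1 style module: parallel 1x1, 3x3, 5x5 and pooling branches,
    with 1x1 convolutions as dimensionality reduction, concatenated along the
    channel axis."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(c_in, 96, kernel_size=1),               # reduce channels first
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(
            nn.Conv2d(c_in, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, 32, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionBlock(192)(x).shape)  # torch.Size([1, 256, 28, 28])
```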

20
Architecture

21
Architecture
• Stem and Inception module.

22
Results
• Top-1 accuracy is 78.2%
• Top-5 accuracy is 94.1%
• Human error is 5%-8%.

23
ResNet

24
Overview
• Paper: Deep Residual Learning for Image Recognition.
• Published in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
• The first network to achieve better results than humans.

25
Novelties
• Popularised skip connections (they weren't the first to use them).
• Designed even deeper CNNs (up to 152 layers) without compromising the model's generalisation power (see the sketch below).
• Among the first to use batch normalisation.
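A minimal residual block sketch in PyTorch (identity shortcut only; strided/projection shortcuts omitted):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 conv + batch-norm layers, with the
    input added back through the skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: F(x) + x
```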

26
Architecture
• Conv block and Identity module.

27
Architecture
• Conv block and Identity module.

28
ResNet results
• Top-1 accuracy is 87.0%.
• Top-5 accuracy is 96.3%.
• Top-5 human accuracy: 95.0%.

29
MobileNet

30
Overview
• Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
Applications
• Published in: arXiv, 2017.
• Specially designed for mobile devices.

31
Novelties
• MobileNet uses depthwise separable convolutions, which significantly reduce the number of parameters.
• It introduces two shrinking hyperparameters, a width multiplier and a resolution multiplier, that efficiently trade off between latency and accuracy.

32
Architecture

33
Architecture
• Depthwise separable convolution.

34
Architecture
• Depthwise convolution:
• In a normal convolution, every filter spans all input channels and produces one feature map.
• In a depthwise convolution, each input channel is convolved with its own single-channel filter, producing one feature map per channel.

35
Architecture
• Pointwise convolution:
• In a normal convolution, producing 256 output channels from a 3-channel input takes 256 filters of size 5×5×3.
• In a pointwise convolution, we only need 256 filters of size 1×1×3 (see the sketch below).
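A sketch of that factorisation in PyTorch, using the same example numbers (3 input channels, 256 output channels, 5×5 kernel; groups=3 makes the first convolution depthwise):

```python
import torch
import torch.nn as nn

depthwise = nn.Conv2d(3, 3, kernel_size=5, padding=2, groups=3)  # one 5x5 filter per channel
pointwise = nn.Conv2d(3, 256, kernel_size=1)                     # 256 filters of size 1x1x3

x = torch.randn(1, 3, 32, 32)
print(pointwise(depthwise(x)).shape)  # torch.Size([1, 256, 32, 32])
```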

36
Computation cost
• Standard convolution

• The computational cost can be calculated as

  DK × DK × M × N × DF × DF

• where DF is the spatial dimension of the (square) input feature map, DK is the size of the (square) convolution kernel, and M and N are the number of input and output channels respectively.
37
Computation cost
• Depthwise convolution

• The computational cost can be calculated as

  DK × DK × M × DF × DF

38
Computation cost
• Pointwise convolution

• The computational cost can be calculated as

  M × N × DF × DF

39
Computation cost
• The total computational cost of a depthwise separable convolution can be calculated as

  DK × DK × M × DF × DF + M × N × DF × DF

• Comparing it with the computational cost of standard convolution gives the reduction in computation:

  (DK × DK × M × DF × DF + M × N × DF × DF) / (DK × DK × M × N × DF × DF) = 1/N + 1/DK²
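A quick numeric check of that ratio (M and DF are example values; they cancel out):

```python
D_K, N = 3, 256          # 3x3 kernels, 256 output channels
M, D_F = 64, 56          # example input channels and feature-map size

standard  = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
print(separable / standard)      # ~0.115
print(1 / N + 1 / D_K ** 2)      # same value: 1/N + 1/DK^2
```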

40
Results
• MobileNet matches GoogLeNet and VGG in accuracy with a much lower number of operations and parameters.

41
EfficientNet

42
Overview
• Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
• Published in: International Conference on Machine Learning, 2019.
• It is still considered to be among the state of the art today.

43
Novelties
• Compound Scaling from B0 to B7.
• The EfficientNet architecture (developed using Neural Architecture Search).

44
Architecture

45
Compound scaling
• The most common ways to scale up ConvNets were depth (number of layers), width (number of channels) or image resolution (image size).
• EfficientNets perform compound scaling: all three dimensions are scaled together while maintaining a balance between them.

46
Compound scaling
• This idea of compound scaling makes sense: if the input image is bigger, then the network needs more layers (depth) and more channels (width) to capture more fine-grained patterns.

47
Neural Architecture Search
• This is a reinforcement-learning-based approach, used to develop EfficientNet-B0 by leveraging a multi-objective search that optimizes for both accuracy and FLOPS.

48
Neural Architecture Search
• The objective function can formally be defined as

  ACC(m) × [FLOPS(m) / T]^w

• where m is the candidate model, T is the target FLOPS, and w = −0.07 is a hyperparameter controlling the accuracy/FLOPS trade-off.
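A tiny sketch of that reward (search_reward is a hypothetical helper name; w = −0.07 is the value reported in the paper):

```python
def search_reward(acc, flops, target_flops, w=-0.07):
    """MnasNet-style multi-objective reward: trades accuracy against FLOPS."""
    return acc * (flops / target_flops) ** w
```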

49
Mobile inverted bottleneck convolution (MBConv)
• MBConv without squeeze and excitation operation

50
Mobile inverted bottleneck convolution (MBConv)
• MBConv with squeeze and excitation operation

51
Squeeze and excitation operation
• Gives access to global information.
• Models channel interdependencies.
• Can be regarded as a self-attention function over channels (see the sketch below).
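A minimal squeeze-and-excitation sketch (the reduction ratio is illustrative):

```python
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: global average pooling ('squeeze'), a small
    bottleneck MLP ('excite'), then channel-wise rescaling."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))             # squeeze: global average pooling
        w = self.fc(s).view(b, c, 1, 1)    # excite: per-channel weights in (0, 1)
        return x * w                       # rescale channels
```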

52
Scaling Efficient-B0 to get B1-B7
• Let the network depth (d), width (w) and input image resolution (r) be:

  d = α^φ,  w = β^φ,  r = γ^φ
  subject to α × β² × γ² ≈ 2 and α, β, γ ≥ 1

• We then fix α, β, γ as constants and scale up the baseline network with different φ using Equation 3, to obtain EfficientNet-B1 to B7 (see the sketch below).
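A small numeric sketch using the B0 coefficients reported in the paper (α = 1.2, β = 1.1, γ = 1.15):

```python
alpha, beta, gamma = 1.2, 1.1, 1.15   # depth, width, resolution coefficients

for phi in range(1, 4):
    d, w, r = alpha ** phi, beta ** phi, gamma ** phi
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")

print(alpha * beta ** 2 * gamma ** 2)  # constraint check: roughly 2
```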
53
Results

54
Q&A
Thank you!

55
