CNN Case Studies Unit 4
The classic network architectures apply a series of convolutional layers to the input, periodically downsampling the spatial dimensions while increasing the number of feature maps.
While the classic network architectures consisted simply of stacked convolutional layers, modern architectures explore new and innovative ways of constructing convolutional layers that allow for more efficient learning. Almost all of these architectures are based on a repeatable unit which is used throughout the network.
These architectures serve as general design guidelines which machine learning practitioners then adapt to solve various computer vision tasks. They act as rich feature extractors which can be used for image classification, object detection, image segmentation, and many other more advanced tasks.
Classic network architectures
LeNet-5
AlexNet
VGG-16
Modern network architectures
Inception
ResNet
ResNeXt
DenseNet
LeNet-5
Yann LeCun's LeNet-5 model was developed in 1998 to identify handwritten digits for zip code
recognition in the postal service. This pioneering model largely introduced the convolutional
neural network as we know it today.
Architecture
Convolutional layers use a subset of the previous layer's channels for each filter to reduce
computation and force a break of symmetry in the network. The subsampling layers use a form of
average pooling.
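As a rough illustration, here is a minimal LeNet-5-style sketch in PyTorch (an assumption for illustration; the original predates modern frameworks). It uses average pooling for the subsampling layers but, for simplicity, omits the original's partial channel connectivity in the second convolutional layer.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Sketch of a LeNet-5-style network for 32x32 grayscale digit images."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32 -> 28x28, 6 feature maps
            nn.Tanh(),
            nn.AvgPool2d(2),                   # subsampling: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14 -> 10x10, 16 feature maps
            nn.Tanh(),
            nn.AvgPool2d(2),                   # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a batch of four 32x32 grayscale images.
logits = LeNet5()(torch.randn(4, 1, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Counting this sketch's weights gives roughly 62,000 parameters, close to the figure quoted below; the original's partial connectivity in its second convolutional layer trims this slightly.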
Parameters: 60,000
Paper: Gradient-based learning applied to document recognition
AlexNet
AlexNet was developed by Alex Krizhevsky et al. in 2012 to compete in the ImageNet
competition. The general architecture is quite similar to LeNet-5, although this model is
considerably larger. The success of this model (which took first place in the 2012 ImageNet
competition) convinced a lot of the computer vision community to take a serious look at deep
learning for computer vision tasks.
Architecture
Parameters: 60 million
Paper: ImageNet Classification with Deep Convolutional Neural Networks
VGG-16
The VGG network, introduced in 2014, offers a deeper yet simpler variant of the convolutional
structures discussed above. At the time of its introduction, this model was considered to be very
deep.
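The "simpler" aspect is that VGG uses only 3x3 convolutions and 2x2 max pooling throughout its convolutional trunk. A minimal sketch of one VGG-style stage in PyTorch (an illustrative assumption, not the reference implementation):

```python
import torch.nn as nn

def vgg_stage(in_channels, out_channels, num_convs):
    """One VGG-style stage: repeated 3x3 convolutions followed by 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [
            nn.Conv2d(in_channels if i == 0 else out_channels,
                      out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve the spatial dimensions
    return nn.Sequential(*layers)

# VGG-16's convolutional trunk is five such stages with increasing channel depth.
features = nn.Sequential(
    vgg_stage(3, 64, 2),
    vgg_stage(64, 128, 2),
    vgg_stage(128, 256, 3),
    vgg_stage(256, 512, 3),
    vgg_stage(512, 512, 3),
)
```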
Architecture
Parameters: 138 million
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
Inception (GoogLeNet)
In 2014, researchers at Google introduced the Inception network which took first place in the 2014
ImageNet competition for classification and detection challenges.
The model is built from a basic unit referred to as an "Inception cell", in which we perform a series of convolutions at different scales and subsequently aggregate the results. In order to save computation, 1x1 convolutions are used to reduce the input channel depth before the more expensive convolutions. For each cell, we learn a set of 1x1, 3x3, and 5x5 filters which can learn to extract features at different scales from the input. Max pooling is also used, albeit with "same" padding to preserve the spatial dimensions so that the outputs can be properly concatenated.
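A minimal sketch of an Inception-style cell in PyTorch (the branch sizes below are illustrative assumptions chosen to give a sensible output width):

```python
import torch
import torch.nn as nn

class InceptionCell(nn.Module):
    """Parallel 1x1, 3x3, and 5x5 convolutions plus max pooling, concatenated channel-wise.
    1x1 convolutions reduce the channel depth before the expensive 3x3 and 5x5 filters."""
    def __init__(self, in_ch, ch1x1, ch3x3_reduce, ch3x3, ch5x5_reduce, ch5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_reduce, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3_reduce, ch3x3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_reduce, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5_reduce, ch5x5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # "same" padding keeps H x W
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)  # concatenate along the channel dimension

# Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
cell = InceptionCell(192, 64, 96, 128, 16, 32, 32)
out = cell(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```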
These researchers published a follow-up paper which introduced more efficient alternatives to the
original Inception cell. Convolutions with large spatial filters (such as 5x5 or 7x7) are beneficial
in terms of their expressiveness and ability to extract features at a larger scale, but the computation
is disproportionately expensive. The researchers pointed out that a 5x5 convolution can be more
cheaply represented by two stacked 3x3 filters.
Whereas a 5×5×C filter requires 25C parameters, two stacked 3×3×C filters only
require 18C parameters. In order to most accurately represent a 5x5 filter, we shouldn't use any
nonlinear activations between the two 3x3 layers. However, it was discovered that "linear
activation was always inferior to using rectified linear units in all stages of the factorization."
It was also shown that 3x3 convolutions could be further deconstructed into successive 3x1 and
1x3 convolutions.
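A quick sketch of the two factorizations in PyTorch, with parameter counts to make the savings concrete (the channel depth is an arbitrary choice for illustration):

```python
import torch.nn as nn

C = 64  # illustrative channel depth

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# A single 5x5 convolution...
conv5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2, bias=False)

# ...versus two stacked 3x3 convolutions covering the same 5x5 receptive field.
two_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),  # the paper found ReLU between the factored layers works best
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

# A 3x3 convolution can itself be factored into a 3x1 followed by a 1x3 convolution.
asym_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(3, 1), padding=(1, 0), bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=(1, 3), padding=(0, 1), bias=False),
)

print(n_params(conv5x5))   # 25 * C * C = 102400
print(n_params(two_3x3))   # 18 * C * C = 73728
print(n_params(asym_3x3))  #  6 * C * C = 24576
```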
Architecture
In order to improve overall network performance, two auxiliary outputs are attached at intermediate points in the network. It was later discovered that the earliest auxiliary output had no discernible effect on the final
final quality of the network. The addition of auxiliary outputs primarily benefited the end
performance of the model, converging at a slightly better value than the same network architecture
without an auxiliary branch. It is believed the addition of auxiliary outputs had a regularizing effect
on the network.
A revised, deeper version of the Inception network which takes advantage of the more efficient
Inception cells is shown below.
ResNet
Deep residual networks were a breakthrough idea which enabled the development of much deeper
networks (hundreds of layers as opposed to tens of layers).
It's a generally accepted principle that deeper networks are capable of learning more complex
functions and representations of the input which should lead to better performance. However,
many researchers observed that adding more layers eventually had a negative effect on the final
performance. This behavior was not intuitively expected, as explained by the authors below.
Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it.
There exists a solution by construction to the deeper model: the added layers are identity mapping,
and the other layers are copied from the learned shallower model. The existence of this constructed
solution indicates that a deeper model should produce no higher training error than its shallower
counterpart. But experiments show that our current solvers on hand are unable to find solutions
that are comparably good or better than the constructed solution (or unable to do so in feasible
time).
This phenomenon is referred to by the authors as the degradation problem - alluding to the fact
that although better parameter initialization techniques and batch normalization allow for deeper
networks to converge, they often converge at a higher error rate than their shallower counterparts.
In the limit, simply stacking more layers degrades the model's ultimate performance.
The authors propose a remedy to this degradation problem by introducing residual blocks in which
intermediate layers of a block learn a residual function with reference to the block input. You can
think of this residual function as a refinement step in which we learn how to adjust the input feature
map for higher quality features. This compares with a "plain" network in which each layer is
expected to learn new and distinct feature maps. In the event that no refinement is needed, the
intermediate layers can learn to gradually adjust their weights toward zero such that the residual
block represents an identity function.
Note: It was later discovered that a slight modification to the original proposed unit offers better
performance by more efficiently allowing gradients to propagate through the network during
training.
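A minimal sketch of a basic (post-activation) residual block in PyTorch; the "slight modification" mentioned in the note is the pre-activation ordering from the follow-up paper, which moves batch normalization and ReLU before each convolution:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic two-layer residual block: output = ReLU(F(x) + x).
    The stacked convolutions learn a residual F(x) relative to the block's input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)  # identity shortcut: add the block input back in

block = ResidualBlock(64)
out = block(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```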
Wide residual networks
Although the original ResNet paper focused on creating a network architecture to enable deeper
structures by alleviating the degradation problem, other researchers have since pointed out that
increasing the network's width (channel depth) can be a more efficient way of expanding the
overall capacity of the network.
Architecture
Each colored block of layers represents a series of convolutions of the same dimension. The feature map is periodically downsampled by strided convolution, accompanied by an increase in channel depth to preserve the time complexity per layer. Dotted lines denote residual connections in which we project the input via a 1x1 convolution to match the dimensions of the new block.
The diagram above visualizes the ResNet-34 architecture. For the ResNet-50 model, we simply replace each two-layer residual block with a three-layer bottleneck block which uses 1x1 convolutions to reduce and subsequently restore the channel depth, allowing for a reduced computational load when calculating the 3x3 convolution.
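A sketch of such a bottleneck block, including the 1x1 projection shortcut used when the block changes the spatial or channel dimensions (a minimal illustration, not the reference implementation):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 restore, with an identity or projection shortcut."""
    def __init__(self, in_channels, bottleneck_channels, out_channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, bottleneck_channels, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Project the input with a strided 1x1 convolution when dimensions change
        # (the "dotted line" shortcuts); otherwise use the identity.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

# Example: downsample 56x56x256 -> 28x28x512 at the start of a new stage.
block = BottleneckBlock(256, 128, 512, stride=2)
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 512, 28, 28])
```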
ResNeXt
The ResNeXt architecture is an extension of the deep residual network which replaces the standard residual block with one that leverages the "split-transform-merge" strategy (i.e. branched paths within a cell) used in the Inception models. Simply put, rather than performing convolutions over the full input feature map, the block's input is projected into a series of lower-dimensional (channel-wise) representations, to which we separately apply a few convolutional filters before merging the results.
This idea is quite similar to grouped convolutions, an idea proposed in the AlexNet paper as a way to share the convolution computation across two GPUs. Rather than creating filters with the full channel depth of the input, the input is split channel-wise into groups, with each group convolved separately, as shown below.
It was discovered that using grouped convolutions led to a degree of specialization among groups
where separate groups focused on different characteristics of the input image.
The ResNeXt paper refers to the number of branches or groups as the cardinality of the ResNeXt
cell and performs a series of experiments to understand relative performance gains between
increasing the cardinality, depth, and width of the network. The experiments show that increasing
cardinality is more effective at benefiting model performance than increasing the width or depth
of the network. The experiments also suggest that "residual connections are helpful
for optimization, whereas aggregated transformations are (helpful for) stronger representations."
Architecture
The ResNeXt architecture simply mimics the ResNet models, replacing the ResNet block with the ResNeXt block.
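A sketch of a ResNeXt-style bottleneck block using a grouped 3x3 convolution, where the number of groups is the cardinality (a minimal illustration under assumed channel sizes):

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Bottleneck block whose 3x3 convolution is split into `cardinality` groups,
    each operating on its own slice of channels before the results are merged."""
    def __init__(self, channels, bottleneck_channels=128, cardinality=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            # Grouped convolution: split-transform-merge in a single layer.
            nn.Conv2d(bottleneck_channels, bottleneck_channels, kernel_size=3,
                      padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # residual connection as in ResNet

# Example: cardinality 32 with 4 channels per group (the paper's 32x4d setting).
block = ResNeXtBlock(channels=256, bottleneck_channels=128, cardinality=32)
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```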
DenseNet
The idea behind dense convolutional networks is simple: it may be useful to reference feature
maps from earlier in the network. Thus, each layer's feature map is concatenated to the input
of every successive layer within a dense block. This allows later layers within the network
to directly leverage the features from earlier layers, encouraging feature reuse within the network.
The authors state, "concatenating feature-maps learned by different layers increases variation in
the input of subsequent layers and improves efficiency."
When I first came across this model, I figured that it would have an absurd number of parameters
to support the dense connections between layers. However, because the network is capable of
directly using any previous feature map, the authors found that they could work with very small
output channel depths (i.e. 12 filters per layer), vastly reducing the total number of parameters
needed. The authors refer to the number of filters used in each convolutional layer as a "growth
rate", k, since each successive layer will have k more input channels than the last (as a result of
accumulating and concatenating all previous layers' feature maps into the input).
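A minimal sketch of a dense block in PyTorch, where each layer produces k new feature maps that are concatenated onto the running input (channel sizes here are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all previous feature maps in the block
    and contributes k (the growth rate) new channels of its own."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = x
        for layer in self.layers:
            new_features = layer(features)                          # k new feature maps
            features = torch.cat([features, new_features], dim=1)   # reuse everything so far
        return features

# Example: 24 input channels, growth rate 12, 4 layers -> 24 + 4 * 12 = 72 output channels.
block = DenseBlock(in_channels=24, growth_rate=12, num_layers=4)
print(block(torch.randn(1, 24, 32, 32)).shape)  # torch.Size([1, 72, 32, 32])
```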
When compared with ResNet models, DenseNets are reported to achieve better performance with
less complexity.
Architecture
For a majority of the experiments in the paper, the authors mimicked the general ResNet model
architecture, simply swapping in the dense block as the repeated unit.
Parameters:
0.8 million (DenseNet-100, k=12)