CNN
Convolutional Neural Networks
2018 / 02 / 23
Buzzword: CNN
Convolutional neural networks (CNN, ConvNet) are a class of deep,
feed-forward (not recurrent) artificial neural networks that are applied to
analyzing visual imagery.
Buzzword: CNN
● Convolution
From Wikipedia, the standard definition is shown below.
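For reference, the continuous and discrete convolution of two functions f and g (the definition the slide cites from Wikipedia) can be written as:

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) \, g(t - \tau) \, d\tau

(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \, g[n - m]

In a CNN, the filter plays the role of g and is slid over the input f.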
Buzzword: CNN
● Neural Networks
Background: Visual Signal Perception
Background: Signal Relay
Starting from the V1 primary visual cortex, the visual signal is transmitted upward, becoming
more complex and abstract at each stage.
Background: Neural Networks
Convolutional neural networks are usually composed of a set of layers that can be
grouped by their functionality.
Sample Architecture
Convolution Layer
● The process is a 2D convolution on the inputs.
● The “dot products” between weights and inputs are “integrated” across “channels”.
● Filter weights are shared across receptive fields. Each filter has the same number of channels as the input volume, and the output volume has the same “depth” as the number of filters. A minimal sketch follows.
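A minimal NumPy sketch of this operation (strictly a cross-correlation, as in most deep-learning frameworks; the shapes, names, and valid-padding choice are illustrative assumptions):

import numpy as np

def conv_layer(x, filters):
    """Naive valid-padding convolution layer.

    x:        input volume,  shape (H, W, C_in)
    filters:  filter bank,   shape (K, K, C_in, C_out)
    returns:  output volume, shape (H-K+1, W-K+1, C_out)
    """
    H, W, C_in = x.shape
    K, _, _, C_out = filters.shape
    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = x[i:i + K, j:j + K, :]  # one receptive field
            for f in range(C_out):
                # dot product "integrated" across all input channels
                out[i, j, f] = np.sum(patch * filters[:, :, :, f])
    return out

x = np.random.randn(32, 32, 3)   # e.g., a small RGB image
w = np.random.randn(5, 5, 3, 8)  # 8 filters, each 5x5x3
y = conv_layer(x, w)             # shape (28, 28, 8): depth = number of filters

Each output channel reuses one filter across every receptive field, which is exactly what makes the weights “shared”.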
Activation Layer
● Used to increase the non-linearity of the network without affecting the receptive fields of the conv layers.
● Prefer ReLU: it results in faster training.
● LeakyReLU addresses the vanishing-gradient (“dying ReLU”) problem by keeping a small gradient for negative inputs.
Other types: Leaky ReLU, Randomized Leaky ReLU, Parameterized ReLU, Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), Tanh, hardtanh, softtanh, softsign, softmax, softplus...
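A quick NumPy sketch of the two activations named above (illustrative):

import numpy as np

def relu(x):
    # zeroes out all negative activations
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # keeps a small slope (alpha) for negative inputs,
    # so the gradient never goes fully to zero
    return np.where(x > 0, x, alpha * x)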
Softmax
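For reference, softmax maps a vector of scores z in R^K to a probability distribution:

\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}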
Pooling Layer
● Convolutional layers provide activation maps.
● The pooling layer applies non-linear downsampling to the activation maps.
● Pooling is aggressive (it discards information); the trend is to use smaller filter sizes or to abandon pooling altogether. A sketch of max pooling follows.
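A minimal max-pooling sketch in NumPy (2x2 window, stride 2; names and shapes are illustrative assumptions):

import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over an activation volume of shape (H, W, C)."""
    H, W, C = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size, :]
            out[i, j, :] = window.max(axis=(0, 1))  # keep only the strongest activation
    return out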
FC Layer
● A regular neural network.
● Can be viewed as the final learning phase, which maps the extracted visual features to the desired outputs.
● Usually suited to classification/encoding tasks.
● A common output is a vector, which is then passed through softmax to represent the confidence of classification.
● The outputs can also be used as a “bottleneck”.
In the example above, the FC layer generates a single number, which is then passed through a sigmoid to represent the grasp success probability.
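A sketch of an FC layer followed by softmax (NumPy; the 512-dimensional feature vector and 10 classes are illustrative assumptions):

import numpy as np

def fc_layer(x, W, b):
    # fully connected: every input feature connects to every output unit
    return x @ W + b

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

features = np.random.randn(512)            # flattened conv features (assumed size)
W = 0.01 * np.random.randn(512, 10)        # weights for 10 output classes (assumed)
b = np.zeros(10)
probs = softmax(fc_layer(features, W, b))  # class confidences; sums to 1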
Loss Layer
● L1, L2 loss
● Cross-entropy loss (works well for classification, e.g., image classification); binary and general cases are shown below
● Hinge loss
● Huber loss: more resilient to outliers, with a smooth gradient
● Mean Squared Error (works well for regression tasks, e.g., Behavioral Cloning)
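The binary and general cross-entropy forms referenced above, for a true label y and predicted probability \hat{y}:

Binary case:
L = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]

General case (K classes, one-hot y):
L = -\sum_{k=1}^{K} y_k \log \hat{y}_k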
Regularization
● L1 / L2
● Dropout
● Batch norm
● Gradient clipping
● Max norm constraint
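Of the techniques listed, dropout is the easiest to sketch; a minimal inverted-dropout implementation in NumPy (the rate p=0.5 is an illustrative default):

import numpy as np

def dropout(x, p=0.5, training=True):
    """Inverted dropout: randomly zero activations during training.

    Scaling the surviving activations by 1/(1-p) keeps their expected
    value unchanged, so no rescaling is needed at test time.
    """
    if not training:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask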
Software 1.0 is what we’re all familiar with — it is written in languages such as Python,
C++, etc. It consists of explicit instructions to the computer written by a programmer. By
writing each line of code, the programmer is identifying a specific point in program space
with some desirable behavior.
Software 2.0 is written in neural network weights. No human is involved in writing this
code because there are a lot of weights (typical networks might have millions). Instead,
we specify some constraints on the behavior of a desirable program (e.g., a dataset of
input-output pairs of examples) and use the computational resources at our disposal to
search the program space for a program that satisfies the constraints.
Is CNN the Answer?
Capsule?