This document provides an introduction to convolutional neural networks (CNNs). It explains that CNNs are a class of feed-forward artificial neural networks applied to visual imagery, and then describes the key layers in a CNN: convolution layers, activation layers, pooling layers, fully connected layers, and loss layers. Regularization techniques for CNNs such as dropout and batch normalization are also covered, along with examples of CNN applications in computer vision and robotics. Finally, the document discusses software based on neural networks and potential alternatives to CNNs.


Introduction to Convolutional Neural Networks

2018 / 02 / 23
Buzzword: CNN
Convolutional neural networks (CNN, ConvNet) are a class of deep, feed-forward (not recurrent) artificial neural networks that are applied to analyzing visual imagery.
Buzzword: CNN
● Convolution (illustration from Wikipedia)
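For reference, the discrete 2-D convolution that the Wikipedia illustration depicts (notation mine, not from the original slides):

    (I * K)(i, j) = \sum_{m} \sum_{n} I(i - m,\, j - n)\, K(m, n)

In practice, deep learning libraries implement the closely related cross-correlation, (I ⋆ K)(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n), but still call it convolution.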
Buzzword: CNN
● Neural Networks
Background: Visual Signal Perception
Background: Signal Relay

Starting from the V1 primary visual cortex, the visual signal is transmitted upward, becoming more complex and abstract at each stage.
Background: Neural Networks

Expressing the equations in matrix form, we have:
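The matrix-form equation itself is shown as a figure in the slides; a typical single fully connected layer, written in matrix form (notation mine, not necessarily the slide's), is

    \mathbf{a} = f(W\mathbf{x} + \mathbf{b})

where x is the input vector, W the weight matrix, b the bias vector, and f an element-wise activation function.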
Neural Networks for Images
For computer vision, why can't we just flatten the image and feed it through a neural network?
Neural Networks for Images
Images are high-dimensional vectors. It would take a huge number of parameters to characterize the network.
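A quick back-of-the-envelope calculation makes the point; the sizes below are hypothetical, chosen only for illustration:

    # Hypothetical sizes, for illustration only
    H, W, C = 224, 224, 3            # a 224x224 RGB image
    hidden = 1000                    # one fully connected hidden layer

    # Flattened image -> dense layer: one weight per (pixel, unit) pair
    fc_params = H * W * C * hidden   # ~150 million weights (plus biases)

    # A 3x3 convolution with 64 filters over the same input
    conv_params = 3 * 3 * C * 64     # 1,728 weights (plus 64 biases)

    print(fc_params, conv_params)    # 150528000 vs. 1728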
Convolutional Neural Networks
To address this problem, bionic convolutional neural networks were proposed to reduce the number of parameters and adapt the network architecture specifically to vision tasks.

Convolutional neural networks are usually composed of a set of layers that can be grouped by their functionalities.
Sample Architecture
Convolution Layer
● The process is a 2D convolution on the inputs.
● The "dot products" between weights and inputs are "integrated" across "channels".
● Filter weights are shared across receptive fields. The filter has the same number of channels as the input volume, and the output volume has the same "depth" as the number of filters.
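A minimal sketch of this operation, assuming a naive NumPy implementation with stride 1 and no padding (real libraries use heavily optimized routines, and usually compute cross-correlation under the name "convolution"):

    import numpy as np

    def conv_layer(x, filters):
        """x: input volume (H, W, C); filters: (F, kh, kw, C), one kernel per output channel."""
        H, W, C = x.shape
        F, kh, kw, _ = filters.shape
        H_out, W_out = H - kh + 1, W - kw + 1
        out = np.zeros((H_out, W_out, F))        # output "depth" = number of filters
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    patch = x[i:i + kh, j:j + kw, :]
                    # dot product between weights and inputs, integrated across channels
                    out[i, j, f] = np.sum(patch * filters[f])
        return out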
Activation Layer

● Used to increase the non-linearity of the network without affecting the receptive fields of the conv layers
● ReLU is preferred, as it results in faster training
● Leaky ReLU addresses the vanishing gradient problem

Other types: Leaky ReLU, Randomized Leaky ReLU, Parameterized ReLU, Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), Tanh, hardtanh, softtanh, softsign, softmax, softplus...
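A minimal NumPy sketch of the two activations called out above:

    import numpy as np

    def relu(x):
        # zero out negative activations
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):
        # keep a small slope for x < 0, so the gradient never fully vanishes
        return np.where(x > 0, x, alpha * x)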
Softmax

● A special kind of activation layer, usually applied at the end of the FC layer outputs
● Can be viewed as a fancy normalizer (a.k.a. the normalized exponential function)
● Produces a discrete probability distribution vector
● Very convenient when combined with cross-entropy loss

Given a sample input vector x and weight vectors {wi}, the predicted probability of y = j is shown below.
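The formula itself appears as an image in the original slides; the standard softmax (normalized exponential) it refers to is

    P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^{\top} \mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^{\top} \mathbf{w}_k}}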
Pooling Layer

● Convolutional layers provide activation maps.
● The pooling layer applies non-linear downsampling to the activation maps.
● Pooling is aggressive (it discards information); the trend is to use smaller filter sizes or to abandon pooling altogether.
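A minimal NumPy sketch of 2×2 max pooling, the most common form of this downsampling:

    import numpy as np

    def max_pool(x, size=2, stride=2):
        """x: activation maps of shape (H, W, C)."""
        H, W, C = x.shape
        H_out = (H - size) // stride + 1
        W_out = (W - size) // stride + 1
        out = np.zeros((H_out, W_out, C))
        for i in range(H_out):
            for j in range(W_out):
                window = x[i * stride:i * stride + size, j * stride:j * stride + size, :]
                out[i, j, :] = window.max(axis=(0, 1))   # keep only the strongest activation per channel
        return out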
FC Layer
● A regular neural network
● Can be viewed as the final learning phase, which maps extracted visual features to desired outputs
● Usually adapted to classification/encoding tasks
● The common output is a vector, which is then passed through softmax to represent the confidence of classification
● The outputs can also be used as a "bottleneck"

In the example above, the FC layer generates a number which is then passed through a sigmoid to represent the grasp success probability.
Loss Layer

● L1, L2 loss
● Cross-entropy loss (works well for classification, e.g., image classification); binary and general forms are given below
● Hinge loss
● Huber loss: more resilient to outliers, with a smooth gradient
● Minimum squared error (works well for regression tasks, e.g., behavioral cloning)
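The formulas appear as images in the original slides; the standard cross-entropy forms they refer to are:

Binary case (true label y ∈ {0, 1}, predicted probability ŷ = P(y = 1)):

    L = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]

General case (one-hot label y over K classes, predicted distribution ŷ):

    L = -\sum_{j=1}^{K} y_j \log \hat{y}_j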
Regularization

● L1 / L2
● Dropout
● Batch norm
● Gradient clipping
● Max norm constraint

To prevent overfitting with a huge amount of training data.
Dropout

● During training, randomly ignore (drop) activations with probability p
● During testing, use all activations, but scale them by the keep probability
● Effectively prevents overfitting by reducing correlation between neurons
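A minimal NumPy sketch of the scheme described above, treating p as the drop probability (so the keep probability is 1 − p); note that modern frameworks usually use "inverted dropout", which rescales at training time instead:

    import numpy as np

    def dropout_train(x, p):
        # Randomly zero out each activation with probability p
        mask = np.random.rand(*x.shape) >= p
        return x * mask

    def dropout_test(x, p):
        # Use all activations, scaled by the keep probability
        return x * (1.0 - p)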
Batch Normalization

● Makes networks robust to bad initialization of the weights
● Usually inserted right before activation layers
● Reduces internal covariate shift by normalizing and scaling the inputs
● The scale and shift parameters are trainable, to avoid losing the stability of the network
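A minimal NumPy sketch of the training-time forward pass, with trainable scale (gamma) and shift (beta); the running statistics used at test time are omitted:

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """x: mini-batch of shape (N, D); gamma, beta: trainable parameters of shape (D,)."""
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)   # normalize each feature over the batch
        return gamma * x_hat + beta               # trainable scale and shift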
Example: ResNet
● Residual Network, by Kaiming He et al. (2015)
● Heavy usage of "skip connections", which are similar in spirit to the gating in RNN Gated Recurrent Units (GRU)
● Commonly used as a visual feature extractor in all kinds of learning tasks: ResNet50, ResNet101, ResNet152
● 3.57% top-5 error on ImageNet, surpassing reported human-level performance
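A minimal sketch of the skip-connection idea (the inner layers here are a hypothetical stand-in, not ResNet's exact block):

    def residual_block(x, inner_layers):
        # inner_layers is any function computing the residual F(x),
        # e.g. conv -> batch norm -> ReLU -> conv -> batch norm
        return inner_layers(x) + x   # skip connection: output = F(x) + x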
Applications
A CNN can be viewed as a fancy feature extractor, just like SIFT, SURF, etc.
CNN & Vision: RedEye
https://roblkw.com/papers/likamwa2016redeye-isca.pdf

Two keys: add noise, save bandwidth/energy


CNN & Robotics: RL Example
Usually used together with a Multi-Layer Perceptron (MLP, which can be viewed as a fancy term for a non-trivial feed-forward neural network) for the policy network.
Software 2.0
Quotes from Andrej Karpathy:

Software 1.0 is what we’re all familiar with — it is written in languages such as Python,
C++, etc. It consists of explicit instructions to the computer written by a programmer. By
writing each line of code, the programmer is identifying a specific point in program space
with some desirable behavior.

Software 2.0 is written in neural network weights. No human is involved in writing this
code because there are a lot of weights (typical networks might have millions). Instead,
we specify some constraints on the behavior of a desirable program (e.g., a dataset of
input output pairs of examples) and use the computational resources at our disposal to
search the program space for a program that satisfies the constraints.
Is CNN the Answer?
Capsule?

Is end-to-end not the right way?

You might also like