Deep Learning
(Figure: block diagram of the human nervous system [2]: Stimulus → Receptors → Neural Net → Effectors → Response.)
• Receptors – convert stimuli from the human body or
the external environment into electrical impulses that
convey information to the neural net (brain)
• Neural Net – continually receives information,
perceives it, and makes appropriate decisions
• Effectors – convert electrical impulses generated by
the neural net into discernible responses as system
output
• Arrows pointing from left to right indicate the forward
transmission of information-bearing signals through
the system
• Arrows pointing from right to left signify the presence
of feedback in the system
Feedforward Neural Networks
• Multilayer Perceptrons
• Deep Feedforward Networks
• A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
• There are no feedback connections in which outputs of the model are fed back into it.
• When feedforward neural networks are extended
to include feedback connections, they are called
recurrent neural networks.[7]
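A minimal sketch of the mapping y = f(x; θ) in NumPy; the layer sizes, tanh activation, and random parameter values are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = {
    "W1": rng.normal(size=(4, 3)), "b1": np.zeros(4),  # hidden layer
    "W2": rng.normal(size=(2, 4)), "b2": np.zeros(2),  # output layer
}

def f(x, theta):
    """Forward pass only: information flows from x to y, no feedback."""
    h = np.tanh(theta["W1"] @ x + theta["b1"])
    return theta["W2"] @ h + theta["b2"]

y = f(np.array([1.0, -0.5, 0.3]), theta)  # y = f(x; theta)
```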
Feed-Forward Network
Convolutional Neural Network
• Do we really need all the edges of the network?
• Can some of these be shared?
Local Receptive Field
• The region of the input image that a hidden neuron connects to is called the local receptive field for that hidden neuron.
• Each connection learns a weight.
Hidden Layer
• Slide the local receptive field across the entire input image: shift it one pixel to the right (i.e., by one neuron) and connect it to a second hidden neuron, and so on, building up the first hidden layer (see the sketch after this list).
• For each local receptive field, there is a different hidden
neuron in the first hidden layer.
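A quick sketch of the sizes involved, assuming a square image and a square receptive field (the specific numbers are illustrative):

```python
# Number of hidden neurons per row/column when a local receptive field
# slides across the image one pixel at a time (stride 1).
def hidden_grid_size(image_size, field_size, stride=1):
    return (image_size - field_size) // stride + 1

# e.g. a 6x6 image with a 3x3 receptive field gives a 4x4 hidden layer
assert hidden_grid_size(6, 3) == 4
```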
First Hidden Layer
• The same pattern (e.g., a bird's "beak") appears in different places in the image, so the detectors can be compressed: train many such "small" detectors and let each detector "move around" the image.
(Figure: an "upper-left beak" detector and a "middle beak" detector respond to the same pattern at different locations.)
A Convolutional Layer
• A convolutional layer has a number of filters, each of which performs a convolution operation.
(Figure: a filter acting as a "beak detector".)
Convolution
The filter values are the network parameters to be learned. Each filter detects a small (3 × 3) pattern.

6 × 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:        Filter 2:
 1 -1 -1         -1  1 -1
-1  1 -1         -1  1 -1
-1 -1  1         -1  1 -1
Convolution (stride = 1)
Place Filter 1 on the top-left 3 × 3 patch of the 6 × 6 image and take the dot product: the result is 3. Slide the filter one pixel to the right and repeat: the result is -1.
Convolution (stride = 1, continued)
Sliding Filter 1 over the whole 6 × 6 image produces a 4 × 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
Convolution (stride = 2)
If stride = 2, the filter moves two pixels at a time, so the output is smaller; the first row is 3 -3.
Convolution (Filter 2, stride = 1)
Repeat this for each filter. Filter 2 produces its own 4 × 4 feature map:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
The two filters yield two 4 × 4 images, forming a 2 × 4 × 4 feature map.
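A NumPy sketch of the convolution above; it reproduces the feature maps shown (the loop-based implementation is for clarity, not speed):

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1], [-1, 1,-1], [-1,-1, 1]])
filter2 = np.array([[-1, 1,-1], [-1, 1,-1], [-1, 1,-1]])

def convolve(image, filt, stride=1):
    """Slide filt over image, taking a dot product at each position."""
    k = filt.shape[0]
    out = (image.shape[0] - k) // stride + 1
    return np.array([[np.sum(image[i*stride:i*stride+k,
                                   j*stride:j*stride+k] * filt)
                      for j in range(out)] for i in range(out)])

print(convolve(image, filter1))            # 4x4 map, first row: 3 -1 -3 -1
print(convolve(image, filter1, stride=2))  # 2x2 map, first row: 3 -3
print(convolve(image, filter2))            # 4x4 map, first row: -1 -1 -1 -1
```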
Color Image: RGB, 3 Channels
(Figure: for a color image the input is a 6 × 6 × 3 tensor, one 6 × 6 plane per channel, and each filter is correspondingly 3 × 3 × 3: one 3 × 3 kernel per input channel.)
Convolution vs. Fully Connected
Convolution can be seen as a constrained fully connected layer applied to the same 6 × 6 image: each output neuron connects only to a 3 × 3 patch rather than to every pixel, and the filter weights are shared across positions, so convolution has far fewer parameters.
Convolution and Shallow NN
Subsampling
The two filters produce two 4 × 4 feature maps:

Filter 1:          Filter 2:
 3 -1 -3 -1        -1 -1 -1 -1
-3  1  0 -3        -1 -1 -2  1
-3 -3  0  1        -1 -1 -2  1
 3 -2 -2 -1        -1  0 -4  3
Max Pooling
Partition each 4 × 4 feature map into 2 × 2 blocks and keep only the maximum of each block. The result is a new but smaller image: here, a 2 × 2 image per filter, and each filter is a channel.

Filter 1:   3 0     Filter 2:  -1 1
            3 1                 0 3
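A matching NumPy sketch of 2 × 2 max pooling, applied here to the Filter 1 feature map:

```python
import numpy as np

fmap = np.array([[ 3,-1,-3,-1],
                 [-3, 1, 0,-3],
                 [-3,-3, 0, 1],
                 [ 3,-2,-2,-1]])

def max_pool(fmap, size=2):
    """Keep the maximum of each size x size block."""
    h, w = fmap.shape
    return np.array([[fmap[i:i+size, j:j+size].max()
                      for j in range(0, w, size)]
                     for i in range(0, h, size)])

print(max_pool(fmap))  # [[3 0]
                       #  [3 1]]
```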
The Whole CNN
(Figure: input image → Convolution → Max Pooling → Convolution → Max Pooling (this pair can repeat many times) → Flatten → Fully Connected Feedforward Network → output, e.g. "cat" / "dog".)
The Whole CNN (continued)
Each convolution + max pooling round produces a new image that is smaller than the original; the number of channels equals the number of filters. After repeating this many times, the final feature maps (here the two 2 × 2 pooled maps) are flattened into a single vector and fed into a fully connected feedforward network.
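Putting the pieces together: a sketch of the full pipeline, reusing image, filter1, filter2, convolve() and max_pool() from the sketches above. The dense weights W and b are random placeholders standing in for the learned fully connected layer:

```python
import numpy as np

channels = [max_pool(convolve(image, f)) for f in (filter1, filter2)]
x = np.concatenate([c.ravel() for c in channels])   # flatten: 8-vector

rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, x.size)), np.zeros(2)    # 2 classes: cat, dog
scores = W @ x + b                                  # fully connected layer
```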
A CNN compresses a fully connected network in two ways: it removes most of the edges (each neuron connects only to a local receptive field) and shares the remaining weights (the same filter is reused at every position).
(Figures: overfitting, http://wiki.bethanycrane.com/overfitting-of-data; selection error, https://www.neuraldesigner.com/images/learning/selection_error.svg)
Regularization
Dropout
• Randomly drop units (along with their connections) during training
• Each unit retained with fixed probability p, independent of other units
• Hyper-parameter p to be chosen (tuned)
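A minimal sketch of dropout under these rules; the rescaling by 1/p ("inverted dropout") is one common convention so that no rescaling is needed at test time:

```python
import numpy as np

def dropout(h, p=0.5, rng=None):
    """Keep each unit with probability p, independently; drop it
    (and its connections) otherwise. Used during training only."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape) < p
    return (h * mask) / p        # inverted dropout: rescale kept units

h = np.array([0.2, -1.3, 0.7, 0.5])
h_train = dropout(h, p=0.5)      # training: some units zeroed
h_test = h                       # testing: all units active, no rescale
```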
L2 regularization = weight decay: a penalty λΣw² is added to the cost, so gradient descent shrinks ("decays") each weight toward zero.
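A sketch of that penalty term (the value of λ is an illustrative hyper-parameter):

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """L2 regularization term added to the cost: lam * sum of squared
    weights. Its gradient, 2 * lam * w, decays each weight toward zero."""
    return lam * sum(np.sum(w ** 2) for w in weights)
```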
Cost (loss) functions: cross-entropy, mean squared error (MSE), mean absolute error (MAE).
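Minimal NumPy versions of these three cost functions; the cross-entropy shown is the binary form (y: targets, p: predictions):

```python
import numpy as np

def mse(y, p):
    return np.mean((y - p) ** 2)            # mean squared error

def mae(y, p):
    return np.mean(np.abs(y - p))           # mean absolute error

def binary_cross_entropy(y, p, eps=1e-12):  # p: predicted probabilities
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```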
• When later layers in the network are learning well, early layers
often get stuck during training, learning almost nothing at all.
• In real-world problems, the training and test datasets have often not been generated by the same distribution.
• If the mean and standard deviation of each input feature are instead calculated over the mini-batch, then the batch size must be large enough to be representative of the range of each variable.
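The per-mini-batch normalization these bullets describe is batch normalization; a minimal sketch of its training-time forward pass, where gamma and beta are the learned scale and shift:

```python
import numpy as np

def batch_norm(X, gamma, beta, eps=1e-5):
    """Standardize each feature with the mini-batch statistics,
    then rescale and shift with learned parameters."""
    mu = X.mean(axis=0)                  # per-feature mean over the batch
    var = X.var(axis=0)                  # per-feature variance over the batch
    X_hat = (X - mu) / np.sqrt(var + eps)
    return gamma * X_hat + beta

X = np.random.default_rng(0).normal(size=(32, 4))  # batch of 32, 4 features
out = batch_norm(X, gamma=np.ones(4), beta=np.zeros(4))
```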
Residual Networks
• The identity mapping has no parameters; it simply adds the output from the previous layer to the layer ahead.
• However, x and F(x) may not have the same dimension, because a convolution operation typically shrinks the spatial resolution of an image.
• In that case, x is multiplied by a linear projection Ws so that x and F(x) can be combined as input to the next layer.
• The Ws term can be implemented with 1 × 1 convolutions, which introduces additional parameters into the model.
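A toy sketch of a residual connection with an optional projection Ws; plain matrices stand in for the convolutional F and the 1 × 1 convolution:

```python
import numpy as np

def residual_block(x, F, Ws=None):
    """Return F(x) + x, projecting x with Ws when the shapes differ."""
    fx = F(x)
    if Ws is not None:          # dimensions differ: project the shortcut
        x = Ws @ x
    return fx + x               # identity (or projected) shortcut

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))
F = lambda x: np.maximum(0.0, W1 @ x)   # toy F: R^3 -> R^2 with ReLU
Ws = rng.normal(size=(2, 3))            # projection so x matches F(x)
y = residual_block(np.array([1.0, 2.0, 3.0]), F, Ws)
```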