Deep Learning - Part-1
Outline
Overview of DL
Biological Neurons
ANN Layer
Neural Network
Training NN
Gradient Descent
What is deep learning?
Overview of DL
Deep Learning is a growing trend in general data analysis and has been
termed one of the 10 breakthrough technologies
Deep learning has had a long and rich history, but has gone by many names
reflecting different philosophical viewpoints, and has waxed and waned in
popularity.
Neurons in the Brain
The human brain is composed of about 10 billion neurons, which are interconnected with each other.
Neurons communicate by sending electrical impulses to one another.
A neuron receives inputs from other neurons, carries out some computation, and sends its output to other neurons as electrical impulses.
An ANN uses a very simplified mathematical model of what a biological neuron does.
ANNs are comprised of several interconnected computational units (neurons) arranged in layers.
The basic operating unit in a neural network is a neuron-like node: it takes input from other nodes and sends output to other nodes.
Each node is a computational unit that takes inputs x1, x2, x3, …, xn and outputs y = f(z), where z is the weighted sum of the inputs and f is the activation function (e.g. binary threshold, sigmoid, softmax, ReLU, and others).
Each connection link is associated with a weight that determines the strength of the interconnection.
Model of an Artificial Neuron: Perceptron
z = w₁x₁ + w₂x₂ + w₃x₃ + b
The neuron receives the weighted sum z as input and calculates its output as a function of that input.
Example: compute the output of the following perceptron, which uses a sigmoid activation function and a bias value of 0.5.
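The concrete inputs and weights for this example are in the original figure, which is not reproduced here; the following minimal Python sketch shows the computation with assumed values, keeping only the sigmoid activation and the bias of 0.5 from the slide.

```python
import math

# Assumed example values; the slide's actual inputs and weights are in the figure.
x = [1.0, 0.5, -1.0]    # inputs x1, x2, x3 (assumed)
w = [0.2, 0.4, -0.5]    # weights w1, w2, w3 (assumed)
b = 0.5                 # bias, as given on the slide

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # z = w1*x1 + w2*x2 + w3*x3 + b
y = 1 / (1 + math.exp(-z))                     # sigmoid activation
print(z, y)                                    # z = 1.4, y ≈ 0.802
```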
Logistic Regression vs Perceptron
h_θ(x) = a = σ(z) = σ(θᵀx) = σ(θ₀x₀ + θ₁x₁ + θ₂x₂ + θ₃x₃)
       = σ(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃)        (with x₀ = 1, to account for the intercept term)
       = 1 / (1 + e^-(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃))
[Figure: the same model drawn as a single neuron: inputs x₀, x₁, x₂, x₃ with weights θ₀, θ₁, θ₂, θ₃ feed z = θᵀx, then a = σ(z), giving the output h_θ.]
[Figure: the same neuron redrawn with neural-network notation: weights w₁, w₂, w₃, a bias b, and output ŷ.]
Using the notation of the neural-network literature: θ = w = (w₁, w₂, w₃) (w₀ is not part of this vector here), h_θ = ŷ, and θ₀ = w₀ = b, which is denoted as the bias.
With this notation, z = wᵀx + b and
ŷ = a = σ(z) = σ(wᵀx + b) = σ(w₁x₁ + w₂x₂ + w₃x₃ + b)
  = 1 / (1 + e^-(w₁x₁ + w₂x₂ + w₃x₃ + b))
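As a quick check that the two notations describe the same model, here is a small NumPy sketch (the numeric values are assumptions, chosen only for illustration):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])             # x1, x2, x3 (assumed values)
theta = np.array([0.1, 0.3, -0.2, 0.4])    # theta0, theta1, theta2, theta3 (assumed values)

# Logistic-regression style: prepend x0 = 1 to absorb the intercept term
h_theta = sigmoid(theta @ np.concatenate(([1.0], x)))

# Neural-network style: w = (theta1, theta2, theta3), b = theta0
w, b = theta[1:], theta[0]
y_hat = sigmoid(w @ x + b)

print(h_theta, y_hat)   # identical outputs
```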
Neural Network Architectures
Why not just use a single neuron? Why do we need a larger network?
A single neuron (like logistic regression) only permits a linear decision boundary.
Most real-world problems are considerably more complicated.
Topologies of an ANN
Feedforward (directed, acyclic), completely connected, and recurrent (feedback connections) topologies.
Feedforward versus recurrent networks:
Feedforward: no loops; input → hidden layers → output.
Recurrent: uses feedback (positive or negative); a network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
For regular neural networks, the most common layer type is the fully-connected layer, in which neurons between two adjacent layers are fully pairwise connected, but neurons within a single layer share no connections.
Such a feedforward neural network is an example of a network topology that uses a stack of fully-connected layers.
Multi Layer Perceptron (MLP)
[Figure: an MLP with an Input Layer followed by Layer 1, Layer 2, and Layer 3.]
MLP - 11 neurons, 3 layers: Notations
We can construct a neural network with as many layers, and as many neurons in each layer, as needed.
Notations:
x = input, b = bias term
w = weights
z = net input (sum of weighted inputs)
f = activation function
a = activation, the output to the next layer
MLP - 4 neurons, 2 layers: Notations
[Figure: a smaller MLP with 4 neurons arranged in 2 layers, annotated with the same notation.]
MLP Matrix representations - Example
We can construct a neural network with as many layers, and as many neurons in each layer, as needed.
[Figure: an example network whose Input Layer (or Layer 0) feeds Layer 1; biases b4 = -0.4, b5 = 0.2, and b6 = 0.1 are added to the hidden neurons.]
The superscript [1] indicates that an activation a^[1] is in layer 1.
[Figure: inputs x₀, x₁, x₂, x₃ in the Input Layer (or Layer 0) feed four neurons in Layer 1; neuron i computes a net input z_i^[1] and an activation a_i^[1], for i = 1, …, 4.]
MLP Matrix representations
We can construct a network of neurons (i.e., a neural network) with as many layers, and as many neurons in each layer, as needed.
[Figure: the four Layer-1 activations a₁^[1], …, a₄^[1] feed a single neuron in Layer 2, which computes z₁^[2] and a₁^[2].]
By convention, this neural network is said to have 2 layers (and not 3), since the input layer is typically not counted: the Input Layer (or Layer 0) holds x₁, x₂, x₃; Layer 1 is a hidden layer with 4 neurons; and Layer 2 is the output layer, producing ŷ.
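A minimal NumPy sketch of forward propagation through this 2-layer network (3 inputs, 4 hidden neurons, 1 output), assuming sigmoid activations and randomly initialised parameters:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = np.array([1.4, 2.7, 1.9])   # one input sample (x1, x2, x3)
W1 = rng.normal(size=(4, 3))     # Layer-1 weights: 4 hidden neurons x 3 inputs
b1 = rng.normal(size=4)          # Layer-1 biases
W2 = rng.normal(size=(1, 4))     # Layer-2 (output) weights
b2 = rng.normal(size=1)          # Layer-2 bias

z1 = W1 @ x + b1        # net inputs of the hidden layer, z[1]
a1 = sigmoid(z1)        # hidden activations, a[1]
z2 = W2 @ a1 + b2       # net input of the output neuron, z[2]
y_hat = sigmoid(z2)     # a[2] = y-hat
print(y_hat)
```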
ANN Layer
The more layers we add, the deeper the neural network becomes, giving rise to the concept of deep learning!
Interestingly, neural networks learn their own features!
The output layer looks like logistic regression, but with features that were learnt (i.e., a₁^[1], a₂^[1], a₃^[1], a₄^[1]) and NOT engineered by us (i.e., x₁, x₂, and x₃).
Outline
Overview of DL
Biological Neurons
ANN Layer
Neural Network
Training ANN
Backprop.
ANN
A mathematical model composed of a large number of simple, highly interconnected processing elements. The building blocks of an ANN are the neurons.
z = wᵀx + b
Activation function: limits the amplitude of the neuron's output.
a = f(z)
Commonly Used Activation Functions
Every activation function (or non-linearity) takes a single number and
performs a certain fixed mathematical operation on it. There are several
activation functions you may encounter in practice:
1. Sigmoid
The sigmoid non-linearity has the mathematical form σ(z) = 1 / (1 + e^-z).
It squashes real numbers into the range [0, 1].
3. ReLU
f(z) = 0 for z < 0, and f(z) = z for z ≥ 0
4. Leaky ReLU
Acts like ReLU, but allows small negative outputs:
f(z) = αz for z < 0, and f(z) = z for z ≥ 0
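A small NumPy sketch of these activation functions (the leaky-ReLU slope α is not specified on the slide; 0.01 is an assumed default):

```python
import numpy as np

def sigmoid(z):
    # Squashes real numbers into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # 0 for z < 0, z for z >= 0
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but with a small slope alpha for z < 0 (alpha value assumed)
    return np.where(z < 0, alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), relu(z), leaky_relu(z), sep="\n")
```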
Perceptron Limitations
A single neuron (like logistic regression) only permits a linear decision boundary.
[Figure: a linearly separable problem vs. linearly inseparable problems.]
Each layer may have a different number of nodes and a different activation function:
Commonly, the same activation function is used within one layer.
Typically, a ReLU/tanh activation function is used in the hidden units, and sigmoid/softmax or linear activation functions are used in the output units, depending on the problem.
Sizing Neural Networks…
The two metrics that people commonly use to measure the size of a neural network are the number of neurons or, more commonly, the number of parameters.
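The original slide counts parameters for two example networks shown in a figure that is not reproduced here; as a substitute, here is the count for the 2-layer network sketched earlier (3 inputs, 4 hidden neurons, 1 output):

```python
# Weights are counted per connection, biases per neuron.
hidden_params = 4 * 3 + 4   # 12 weights + 4 biases = 16
output_params = 1 * 4 + 1   #  4 weights + 1 bias   = 5
print(hidden_params + output_params)   # 21 parameters, 5 neurons (4 hidden + 1 output)
```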
A dataset
Fields          class
1.4  2.7  1.9   0
3.8  3.4  3.2   0
6.4  2.8  1.7   1
4.1  0.1  0.2   0
etc …
Training ANN: Intuition
Training the neural network on the dataset above:
Step-1: Initialise the network with random weights.
Step-2: Make a prediction, e.g. feed the first training sample (1.4, 2.7, 1.9) into the network.
Forward-propagating (1.4, 2.7, 1.9) through the randomly initialised network produces a prediction, e.g. 0.8.
Training ANN: Intuition
Step-3: Compare the prediction vs. the target, starting with the first training sample:
Input (1.4, 2.7, 1.9) → prediction 0.8, target 0, loss = 2.32
Input (6.4, 2.8, 1.7) → prediction 0.9, target 1, loss = 0.152
Another sample → prediction 0.5, target 0, loss = 1
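The loss values on these slides are consistent with binary cross-entropy computed with log base 2; a minimal sketch under that assumption:

```python
import math

def cross_entropy_bits(y_hat, y):
    """Binary cross-entropy in bits (log base 2) for prediction y_hat and target y."""
    return -(y * math.log2(y_hat) + (1 - y) * math.log2(1 - y_hat))

print(round(cross_entropy_bits(0.8, 0), 3))   # 2.322  (prediction 0.8, target 0)
print(round(cross_entropy_bits(0.9, 1), 3))   # 0.152  (prediction 0.9, target 1)
print(round(cross_entropy_bits(0.5, 0), 3))   # 1.0    (prediction 0.5, target 0)
```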
Step-4: Adjust the weights and biases based on the error (Backprop).
E.g. for input (1.4, 2.7, 1.9): prediction 0.8, target 0, cost = 1.12.
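A minimal sketch of the weight update behind Step-4: gradient descent moves each weight a small step against its gradient. The learning rate and gradient values below are assumptions for illustration; the gradients themselves are what backpropagation computes.

```python
def gradient_step(weights, gradients, lr=0.1):
    # w <- w - lr * dL/dw for every weight (and, identically, for every bias)
    return [w - lr * g for w, g in zip(weights, gradients)]

print(gradient_step([0.2, -0.4, 0.7], [0.05, -0.10, 0.20]))
# ≈ [0.195, -0.39, 0.68]
```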
Forward propagation (aka "Inference")
Make a prediction for the data and calculate the cost function; the nodes of the output layer give the probabilities that the sample belongs to each class.
Backprop: Example
[Figure: worked backpropagation example.]
Convolutional Neural Network (ConvNet, CNN)
Outline: CNN
Motivation
Building Blocks
Convolution Settings
Transfer Learning
State-of-the-art architectures
Motivation – Image Data
Recognizing images using an ANN: an ANN that looks at the image of a person and tries to predict the identity of that person (face recognition).
You have collected images of different people, labelled with which one is Yohannes.
[Figure: an input image of 1000 x 1000 pixels.]
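To see why this is hard for a plain fully-connected ANN, a quick count of the weights needed by just the first layer (the hidden-layer size below is an assumption, chosen only for illustration):

```python
inputs = 1000 * 1000            # a 1000 x 1000 grayscale image has 1,000,000 input values
hidden_neurons = 1000           # assumed hidden-layer size, for illustration only
print(inputs * hidden_neurons)  # 1,000,000,000 weights in the first fully-connected layer
```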
Motivation – Image Data
Important structures in image data:
Topology of pixels (spatial locality)
Translation invariance: the pattern of pixels that characterizes a cat is the same no matter where in the image the cat occurs
Scale invariance
Issues of lighting and contrast
Knowledge of the human visual system: features need to be "built up", edges → shapes → relations between shapes
Motivation – Image Data
The motivation behind the CNN is that different layers can learn intermediate features.
Features need to be "built up": edges → shapes → relations between shapes.
Identifying textures:
CAT = [two eyes in a certain relation to one another] + [cat fur texture]
=> Eyes = dark circle (pupil) inside another circle
=> Circle = particular combination of edge detectors
=> Fur = edges in a certain pattern
CNNs also address the invariance problem, save computation time, and significantly reduce the amount of training data needed.
Typical CNN Architecture
Typical CNN architectures alternate convolutional and pooling layers, followed by fully-connected layers.
Convolutional Layer: convolution operation - Example
[Figure: a kernel slides over the input image; taking a dot product at each position produces one output value per position (e.g. 51, 60, 20, 31, …, -2 in the worked example).]
Convolutional Layer: Stride
Filter 1:            Filter 2:
 1 -1 -1             -1  1 -1
-1  1 -1             -1  1 -1
-1 -1  1             -1  1 -1
(… more filters)
6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Convolutional Layer: Stride
With stride = 1, Filter 1 is placed on the top-left 3x3 patch of the 6x6 image and a dot product is taken, giving 3; sliding the filter one position to the right gives -1.
Convolutional Layer: Stride
If stride = 2, the filter moves two positions at a time; the first two outputs for Filter 1 are 3 and -3.
Convolutional Layer: Stride
With stride = 1, convolving the full 6x6 image with Filter 1 gives a 4x4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
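A NumPy sketch of this convolution operation; it reproduces the 4x4 output above for stride 1 and the stride-2 values shown earlier:

```python
import numpy as np

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])

filter1 = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

def convolve2d(img, kernel, stride=1):
    """Slide the kernel over the image and take a dot product at each position."""
    N, F = img.shape[0], kernel.shape[0]
    out_size = (N - F) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride + F, j*stride:j*stride + F]
            out[i, j] = np.sum(patch * kernel)
    return out

print(convolve2d(image, filter1, stride=1))   # the 4x4 map above
print(convolve2d(image, filter1, stride=2))   # [[ 3 -3], [-3  0]]
```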
Convolutional Layer: Stride
Repeating the same convolution with Filter 2 (stride = 1) produces a second 4x4 output.
A 7x7 input (spatially), assume a 3x3 filter: determine the output image size.
With stride 1, the filter fits in 5 positions across each dimension, so the output is 5x5.
Applied with stride 2: determine the output image size. The filter now fits in 3 positions, so the output is 3x3.
Stride
The stride is the step size as the kernel moves across the image.
When the stride is greater than 1, it scales down the output dimension.
Convolutional Layer: Stride
A 7x7 input (spatially), assume a 3x3 filter applied with stride 3: determine the output image size.
It doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3.
Also, using kernels directly there will be an edge effect: pixels near the edge will not be used as center pixels, since there are not enough surrounding pixels.
Convolutional Layer: Stride
Can you find a formula for the output image size, given an input image of size N x N, a kernel of size F x F, and stride s?
Output size: (N - F) / s + 1
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (does not fit)
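A small helper that implements this formula; an optional zero-padding term p is included, and with p = 0 it is exactly the formula above:

```python
def conv_output_size(N, F, s, p=0):
    """Output size for an N x N input, F x F kernel, stride s, zero-padding p."""
    size = (N - F + 2 * p) / s + 1
    if size != int(size):
        raise ValueError("the filter does not fit cleanly with this stride")
    return int(size)

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
# conv_output_size(7, 3, 3) raises an error: stride 3 does not fit a 7x7 input
```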
Convolutional Layer: Padding
With zero-padding of p pixels around the border, the output size becomes
(N - F + 2p) / s + 1
(recall, without padding: (N - F) / s + 1).
Each output number is the result of taking a dot product between the filter and a small chunk of the image (e.g. a 5x5x3 chunk for a 5x5x3 filter on a 3-channel image).
Convolutional Layer: convolution for 3D images
For a color image the input has 3 channels, so each filter is 3-dimensional as well (e.g. 3x3x3 versions of Filter 1 and Filter 2), with one slice per channel. The filter slides over the image exactly as in the 2-D case, and each output value is the dot product between the whole 3-D filter and the matching 3-D chunk of the image.
A typical CNN pipeline: Convolution → Max Pooling, which can be repeated many times, followed by Flatten → Fully Connected Feedforward network.
Pooling Layer
Reduces the image size by mapping a patch of pixels to a single value, shrinking the dimensions of the image.
A pooling layer has no parameters, though there are different types of pooling operations:
Max-pool: represent each distinct patch by its maximum
Average-pool: represent each distinct patch by its average
Applying Filter 1 and Filter 2 (stride 1) to the 6x6 image gives two 4x4 feature maps:
Filter 1:              Filter 2:
 3 -1 -3 -1            -1 -1 -1 -1
-3  1  0 -3            -1 -1 -2  1
-3 -3  0  1            -1 -1 -2  1
 3 -2 -2 -1            -1  0 -4  3
Max pooling with a 2x2 window keeps the maximum of each patch, turning each 4x4 map into a 2x2 map:
Filter 1 →  3 0        Filter 2 → -1 1
            3 1                    0 3
The 6x6 image has become a new but smaller 2x2 image.
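A NumPy sketch of non-overlapping 2x2 max pooling; it reproduces the 2x2 maps above from the two 4x4 feature maps:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    H, W = feature_map.shape
    pooled = feature_map[:H - H % size, :W - W % size]
    pooled = pooled.reshape(H // size, size, W // size, size)
    return pooled.max(axis=(1, 3))

conv1 = np.array([[ 3, -1, -3, -1],
                  [-3,  1,  0, -3],
                  [-3, -3,  0,  1],
                  [ 3, -2, -2, -1]])
conv2 = np.array([[-1, -1, -1, -1],
                  [-1, -1, -2,  1],
                  [-1, -1, -2,  1],
                  [-1,  0, -4,  3]])

print(max_pool(conv1))   # [[3 0], [3 1]]
print(max_pool(conv2))   # [[-1 1], [0 3]]
```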
Each filter produces one channel of the new image: the new image is smaller than the original, and its number of channels equals the number of filters. Convolution and Max Pooling can be repeated many times.
Flattened
The pooled feature maps are flattened into a single vector, which is fed into a Fully Connected Feedforward network.
CNN in Keras
Only the network structure and the input format are modified (vector → 3-D tensor).
Input: input_shape = (28, 28, 1)
Convolution: there are 25 3x3 filters.
Max Pooling
(followed by further convolution and max-pooling layers)
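A hedged Keras sketch matching these notes: the first Conv2D layer (25 filters of 3x3, input_shape=(28, 28, 1)) and the max pooling come from the slide; the second convolution block and the dense-layer sizes are assumptions added so the model is complete and runnable.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(25, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # 25 3x3 filters (from the slide)
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(50, (3, 3), activation="relu"),   # assumed second convolution block
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),           # assumed fully-connected layer size
    layers.Dense(10, activation="softmax"),         # assumed 10-class output (e.g. digits)
])
model.summary()
```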
LeNet-5 [LeCun et al., 1998]

Best model:
TOTAL memory: 24M * 4 bytes ≈ 93 MB / image (forward pass only; roughly *2 for the backward pass)
TOTAL params: 138M parameters
GoogLeNet [Szegedy et al., 2014]
Inception module
ILSVRC 2014 winner (6.7% top-5 error)
Fun features:
Only 5 million params! (Removes FC layers completely)
Compared to AlexNet:
- 12x fewer params
- 2x more compute
- 6.67% top-5 error (vs. 15.4%)
ResNet [He et al., 2015]
From a 224x224x3 input, the spatial dimension is quickly reduced to only 56x56!