Lecture 3
A Multilayer Perceptron (MLP) is
considered a relatively shallow
model with one or two hidden layers.
DL models have many more layers
than MLPs.
The bottleneck for stacking more
layers has been the "vanishing
gradient" problem.
The vanishing gradient
problem
a difficulty found in MLPs that use the sigmoid activation
function and are trained with gradient-descent methods and
backpropagation
In such methods, each of the neural network's
weights receives an update proportional to the
partial derivative of the error function with respect to
the current weight in each iteration of training.
The problem is that in some cases the gradient becomes
very small due to the repeated multiplications of the "chain rule",
effectively preventing the weights of the "front" (early)
layers from changing their values.
In the worst case, this may completely stop the
neural network from further learning.
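A minimal sketch (not from the slides) of why the chain rule shrinks gradients with sigmoid activations: the sigmoid derivative is at most 0.25, and backpropagation multiplies one such factor per layer, so the gradient reaching the front layers decays exponentially with depth. The 10-layer setup below is a hypothetical example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

# Hypothetical pre-activations at each of 10 layers (best case for the gradient).
pre_activations = np.zeros(10)

# The chain rule multiplies one derivative factor per layer during backpropagation.
grad = 1.0
for x in pre_activations:
    grad *= sigmoid_derivative(x)

print(grad)  # 0.25**10 ~ 9.5e-7: the gradient reaching the front layers is tiny
```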
DL and Data Science
MNIST example
NN and Vision
For computer vision, why can’t we just
flatten the image and feed it through a
traditional NN such as an MLP?
Images are high-dimensional vectors. It
would take a huge number of parameters to
characterize the network.
(# of parameters = 784*15 + 15 = 11,775 for the
MNIST example: 784 input pixels, 15 hidden units)
CNNs are proposed to reduce the number of
parameters and adapt the network
architecture specifically to vision tasks.
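A rough sketch of the parameter counts behind this argument, assuming the lecture's MNIST setup (28x28 input flattened to 784, one hidden layer of 15 units) and, for contrast, a hypothetical 3x3 convolution filter whose weights are shared across the whole image:

```python
# Fully connected layer: every input pixel connects to every hidden unit.
input_size = 28 * 28           # 784 pixels after flattening
hidden_units = 15
fc_params = input_size * hidden_units + hidden_units  # weights + biases
print(fc_params)               # 11775

# A single 3x3 convolution filter reuses the same weights at every position,
# so it needs only 3*3 weights + 1 bias regardless of the image size.
conv_params_per_filter = 3 * 3 + 1
print(conv_params_per_filter)  # 10
```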
Traditional Recognition Approach
CNN Layers
Convolutional Neural Networks (CNNs)
are
a class of deep, feed-forward
artificial neural networks that are
applied to analyzing visual imagery.
Convolution Layer
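A minimal sketch (not from the slides) of what a convolution layer computes: a small filter slides over the input, and at each position the layer takes the elementwise product of the filter with the covered patch and sums it. The 7x7 image and 3x3 averaging filter below are toy examples chosen to match the output-size calculation that follows.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    N = image.shape[0]   # assume a square N x N input
    F = kernel.shape[0]  # assume a square F x F filter
    out_size = (N - F) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i*stride:i*stride + F, j*stride:j*stride + F]
            out[i, j] = np.sum(patch * kernel)  # elementwise multiply, then sum
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # toy 7x7 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter
print(convolve2d(image, kernel, stride=1).shape)  # (5, 5), matching (7-3)/1 + 1
```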
Output size: (N - F) / stride + 1
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33
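A small helper (hypothetical, just wrapping the formula above) that computes the output size and flags strides that do not fit evenly, as in the stride-3 case:

```python
def conv_output_size(N, F, stride):
    size = (N - F) / stride + 1
    if size != int(size):
        return None  # filter does not fit evenly; stride is invalid without padding
    return int(size)

for stride in (1, 2, 3):
    print(stride, conv_output_size(N=7, F=3, stride=stride))
# stride 1 -> 5, stride 2 -> 3, stride 3 -> None (2.33 is not an integer)
```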
Padding