CII4Q3 - Computer Vision-EAR - Week-11-Intro To Deep Learning v1.0
ARTIFICIAL NEURAL NETWORK
Activation function
ACTIVATION FUNCTION
perceptron
sigmoid
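A minimal Python sketch of the two activations named above (an illustration, not the slide's exact formulas):

```python
import numpy as np

def perceptron_step(z):
    """Perceptron activation: hard threshold (step) on the pre-activation z."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Sigmoid activation: smooth, differentiable squashing to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(perceptron_step(z))  # [0. 1. 1.]
print(sigmoid(z))          # ~[0.119 0.5   0.881]
```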
NEURAL NETWORK: A NEURON
• A neuron is a computational unit in the neural network; neurons exchange messages with one another (see the sketch below).
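A minimal sketch of what one neuron computes: a weighted sum of its inputs plus a bias, passed through an activation (sigmoid assumed here for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, then an activation."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # incoming messages from other neurons
w = np.array([0.8,  0.2, -0.5])  # connection weights
print(neuron(x, w, b=0.1))       # ~0.33
```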
FEED FORWARD/BACKPROPAGATION NEURAL NETWORK
Backpropagation:
• Randomly initialize the parameters
• Calculate the total error at the output, f₆(e)
• Then calculate each node's contribution to the error, δₙ, at each step going backwards (a minimal sketch follows below)
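A minimal numpy sketch of these three steps for a tiny one-hidden-layer network; the layer sizes, sigmoid activation, and squared-error loss are illustrative assumptions, not the slide's exact network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 1) Randomly initialize the parameters of a tiny 2-3-1 network.
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x, y = np.array([0.5, -1.0]), np.array([1.0])

# Forward pass (feed-forward).
h = sigmoid(W1 @ x + b1)        # hidden activations
y_hat = sigmoid(W2 @ h + b2)    # network output

# 2) Total error at the output (squared error here).
loss = 0.5 * np.sum((y_hat - y) ** 2)

# 3) Error contributions (deltas), computed backwards layer by layer.
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # output-layer delta
delta1 = (W2.T @ delta2) * h * (1 - h)       # hidden-layer delta

# Gradients and a single gradient-descent update.
lr = 0.1
W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1
```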
LIMITATIONS OF NEURAL NETWORKS
LET’S GO INTO THE MATH....
COMPONENTS OF MACHINE LEARNING
• Learning algorithm
• Initialized with a set of default parameters θ₁ to θₙ
• The data
• Iterate over the dataset; at each row, feed the attributes X₁ to Xₙ into the learning algorithm → it outputs a prediction of the target variable based on the current set of parameters
• Loss function
• Used to compute how close our prediction is to the actual value of the target as contained in our dataset
• Aggregated across all examples
• Optimization algorithm → gradient descent
• Updates the parameters of the learning algorithm in a direction that reduces the aggregated loss (see the training-loop sketch below)
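A minimal sketch of how these four components fit together, assuming a linear model with mean squared error and plain gradient descent (an illustration, not the course's reference code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # the data: attributes X1..Xn
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3      # target variable

theta = np.zeros(3)                           # learning algorithm's parameters
bias = 0.0
lr = 0.1                                      # learning rate

for epoch in range(200):
    y_hat = X @ theta + bias                      # predictions from current parameters
    loss = np.mean((y_hat - y) ** 2)              # loss aggregated across all examples
    grad_theta = 2 * X.T @ (y_hat - y) / len(y)   # optimization: gradient descent
    grad_bias = 2 * np.mean(y_hat - y)
    theta -= lr * grad_theta
    bias -= lr * grad_bias

print(theta, bias)   # should approach [2, -1, 0.5] and 0.3
```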
STRUCTURE OF DATA
LOSS FUNCTION: REGRESSION
• Given a set of parameters, a loss function helps us evaluate how well our learning algorithm is performing on the training data with the current parameters (see the sketch below).
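A minimal sketch, assuming the regression loss is mean squared error (a common choice; the slide's exact formula may differ):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error: average squared gap between prediction and target."""
    return np.mean((y_pred - y_true) ** 2)

print(mse_loss(np.array([2.5, 0.0, 2.0]), np.array([3.0, -0.5, 2.0])))  # ~0.1667
```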
LOSS FUNCTION: CLASSIFICATION
• Classification → return scores for all the classes available in our dataset
• Softmax: takes these scores and returns probabilities between 0 and 1
• Given a set of scores S, the probability of class i is Pᵢ = exp(Sᵢ) / Σⱼ exp(Sⱼ) (see the sketch below)
• P: probability vector
• e: the base of the natural logarithm ≈ 2.71828
• Sᵢ: score of class i
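A minimal numpy version of this formula (the scores are illustrative values):

```python
import numpy as np

def softmax(scores):
    """P_i = exp(S_i) / sum_j exp(S_j), shifted by max(scores) for numerical stability."""
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

S = np.array([3.2, 5.1, -1.7])   # class scores
print(softmax(S))                # ~[0.13 0.87 0.00], sums to 1
```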
SOFTMAX CROSS ENTROPY LOSS
• Softmax cross entropy loss → the sum, over examples, of the negative log of the softmax probability of the correct class
• The loss is very low when we are making the right prediction, i.e. when the correct class receives a high probability (see the sketch below)
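A minimal sketch of the loss for a single example (over a dataset these per-example losses are summed or averaged):

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

def cross_entropy_loss(scores, correct_class):
    """Negative log of the softmax probability assigned to the correct class."""
    return -np.log(softmax(scores)[correct_class])

scores = np.array([3.2, 5.1, -1.7])
print(cross_entropy_loss(scores, correct_class=1))  # small: the correct class scores highest
print(cross_entropy_loss(scores, correct_class=2))  # large: the correct class has tiny probability
```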
LOSS FUNCTION: PLUS REGULARIZATION
• Prevent overfitting
• Regularization → based on the observation that models usually overfit when the values of the parameters are too large
• Parameter sets with large values tend to result in low loss on the training set but fail to yield correspondingly good performance on the test set
• Penalize large weights
L2 regularizer (see the sketch below)
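A minimal sketch of adding an L2 penalty to the data loss; the regularization strength lam here is an illustrative value:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam=1e-3):
    """Total loss = data loss + lam * sum of squared weights (L2 penalty).
    lam (the regularization strength) is an illustrative value."""
    return data_loss + lam * np.sum(weights ** 2)

W = np.array([0.5, -2.0, 1.5])
print(l2_regularized_loss(data_loss=0.14, weights=W))  # 0.14 + 0.001 * 6.5
```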
OPTIMIZATION: GRADIENT DESCENT
• Finding the right set of parameters → finding the parameters that yield the lowest error on the training set (see the update-rule sketch below).
Learning rate
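A minimal sketch of one gradient-descent update; the learning rate and gradient values are illustrative:

```python
import numpy as np

def gradient_descent_step(theta, grad, learning_rate=0.1):
    """One update: move the parameters against the gradient of the loss.
    theta_new = theta - learning_rate * dLoss/dtheta"""
    return theta - learning_rate * grad

theta = np.array([1.0, -0.5])
grad = np.array([0.4, -0.2])     # gradient of the aggregated loss w.r.t. theta
print(gradient_descent_step(theta, grad))   # [ 0.96 -0.48]
```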
CONVOLUTIONAL NEURAL NETWORK
BASIC OPERATION
Stride, Padding
Max pooling
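A minimal sketch of how kernel size, stride, and padding determine the convolution output size, plus a 2×2 max-pooling example (all values illustrative):

```python
import numpy as np

def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: (W - K + 2P) // S + 1."""
    return (in_size - kernel + 2 * padding) // stride + 1

print(conv_output_size(32, kernel=5, stride=1, padding=2))  # 32 ("same" padding)
print(conv_output_size(32, kernel=3, stride=2, padding=0))  # 15

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a feature map whose sides are even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))   # [[ 5  7] [13 15]]
```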
COMPONENTS OF CNN
• Data Augmentation
THE DEEP LEARNING REVOLUTION
ARCHITECTURE
• AlexNet (2012)
• VGGNet (2014)
• Inception
• Residual Networks
• Evolution of ResNet
• SENet
ALEXNET
• Winner of the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2012
• It was the first time a Convolutional Neural Network outperformed other methods on a large dataset (ImageNet 2012) by a large margin.
• AlexNet was composed of five convolutional layers followed by three fully connected (dense) layers.
• Their most important contribution was the training process:
• They used data augmentation to artificially increase the training dataset.
• The cuda-convnet code, an incredibly efficient implementation of the convolution operation, effectively parallelized the training process across two GPUs. In those days, there were no deep learning libraries.
• 8 layers (see the layer-stack sketch below)
• ReLU is introduced
• Overlapping pooling: the stride is smaller than the kernel size
• Data augmentation: image translation and mirroring, altering the intensity using PCA
• Dropout: probability of 0.5
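A rough PyTorch sketch of the layer stack described above (5 convolutional + 3 dense layers, ReLU, overlapping 3×3/stride-2 pooling, dropout 0.5); the filter counts and sizes follow the original paper from memory and are approximate, not the course's reference implementation:

```python
import torch.nn as nn

# Expects a 227x227 RGB input; output is scores for the 1000 ImageNet classes.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),                     # overlapping pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(p=0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(p=0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)
```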
REFERENCES