
CII4Q3 - Computer Vision-EAR - Week-11-Intro To Deep Learning v1.0

The document provides an overview of deep learning and artificial neural networks. It discusses key concepts like artificial neurons, activation functions, feedforward and backpropagation in neural networks. It also covers limitations of early neural networks and how deep learning addresses them by learning multiple levels of representation. Additionally, it introduces convolutional neural networks, their basic operations like convolution and pooling, and components like dropout and batch normalization. Finally, it discusses the evolution of deep learning models from AlexNet to ResNet and SE Net.



CII4Q3 COMPUTER VISION

Introduction to Deep Learning


ARTIFICIAL NEURAL NETWORK

14
ARTIFICIAL NEURAL NETWORK

• Network of interconnected neurons.


• Each neuron is a mathematical function
• High number of layers in neural networks → deep learning

15
ARTIFICIAL NEURAL NETWORK

• Fundamental operation that occurs within each neuron → a linear function: a weighted sum of the inputs plus a bias
• The result of the linear function is then passed through an activation function
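Below is a minimal sketch of this linear-plus-activation computation for a single neuron; the input, weight, and bias values are illustrative placeholders, not taken from the slides:

```python
import numpy as np

def neuron(x, w, b, activation):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through an activation function."""
    z = np.dot(w, x) + b          # linear function
    return activation(z)          # non-linear activation

# Illustrative values (placeholders)
x = np.array([0.5, -1.2, 3.0])    # inputs
w = np.array([0.4, 0.1, -0.7])    # weights
b = 0.2                           # bias
out = neuron(x, w, b, lambda z: max(0.0, z))   # ReLU activation
```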
16
ACTIVATION FUNCTION

• Perceptron (step function)

• Sigmoid

• ReLU (Rectified Linear Units)

17
NEURAL NETWORK: A NEURON

• A neuron is a computational unit in the neural network; neurons exchange messages with each other.

Possible activation functions:

• Step function / threshold function
• Sigmoid function (a.k.a. logistic function)
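A minimal sketch of these activation functions, using their standard definitions:

```python
import numpy as np

def step(z, threshold=0.0):
    """Step/threshold function, as used by the classic perceptron."""
    return np.where(z >= threshold, 1.0, 0.0)

def sigmoid(z):
    """Sigmoid (logistic) function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU (Rectified Linear Unit): zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)
```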

18
FEED FORWARD/BACKPROPAGATION
NEURAL NETWORK

Feed-forward algorithm:

• Activate the neurons from the bottom (input) to the top (output).

Backpropagation:
• Randomly initialize the parameters
• Calculate the total error at the top (output) of the network
• Then calculate the contributions to the error, 𝛿𝑛, at each step going backwards (see the sketch below).
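A minimal sketch of one feed-forward pass and one backpropagation step for a tiny two-layer network with sigmoid activations and squared-error loss; the layer sizes, learning rate, and training example are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialize the parameters
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x, y = np.array([0.5, -1.0, 2.0]), np.array([1.0])   # one training example (placeholder)
lr = 0.1

# Feed forward: activate neurons from the bottom (input) to the top (output)
h = sigmoid(W1 @ x + b1)
y_hat = sigmoid(W2 @ h + b2)

# Total error at the top, then error contributions (deltas) going backwards
delta_out = (y_hat - y) * y_hat * (1 - y_hat)    # output-layer delta
delta_hid = (W2.T @ delta_out) * h * (1 - h)     # hidden-layer deltas

# Gradient-descent update of each layer's parameters
W2 -= lr * np.outer(delta_out, h);  b2 -= lr * delta_out
W1 -= lr * np.outer(delta_hid, x);  b1 -= lr * delta_hid
```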

19
LIMITATIONS OF NEURAL NETWORKS

Random initialization + densely connected networks lead to:

• High cost
• Each neuron in the neural network can be viewed as a logistic regression.
• Training the entire neural network amounts to training all of the interconnected logistic regressions.
• Difficulty training as the number of hidden layers increases
• Recall that logistic regression is trained by gradient descent.
• In backpropagation, the gradient becomes progressively more diluted; below the top layers, the correction signal 𝛿𝑛 is minimal.
• Getting stuck in local optima
• The objective function of a neural network is usually not convex.
• Random initialization does not guarantee starting in the proximity of the global optimum.
→ Solution:
• Deep learning: learning multiple levels of representation

20
LET’S GO INTO THE MATH....

21
COMPONENTS OF MACHINE LEARNING

• Learning algorithm
• Initialized with a set of default parameters 𝜃1 to 𝜃𝑛
• The data
• Iterate over the dataset; at each row, the attributes 𝑋1 to 𝑋𝑛 are fed into the learning algorithm → it outputs a prediction of the target variable based on the current set of parameters
• Loss function
• Used to compute how close our prediction is to the actual value of the target as contained in our dataset
• Aggregated across all examples
• Optimization algorithm → gradient descent
• Updates the parameters of the learning algorithm in a direction that reduces the aggregated loss (see the sketch after this list)
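A schematic sketch tying the four components together, using a simple linear model; the data arrays, learning rate, and iteration count are placeholders:

```python
import numpy as np

# The data: attributes X and target y (placeholder values)
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([5.0, 4.0, 7.5])

# Learning algorithm: a linear model with default parameters theta
theta = np.zeros(X.shape[1])

def predict(X, theta):
    return X @ theta

def loss(pred, y):
    """Loss function: mean squared error, aggregated across all examples."""
    return np.mean((pred - y) ** 2)

# Optimization algorithm: gradient descent
lr = 0.01
for _ in range(1000):
    pred = predict(X, theta)
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the aggregated loss
    theta -= lr * grad                     # update in the loss-reducing direction
```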

22
STRUCTURE OF DATA

• Each individual piece of data → variable-value pairs


• Variables → features
• Variables
• Continuous: real-valued numbers (price, age, length, area, temperature, etc.)
• Categorical: discrete variables that cannot be expressed as real-valued numbers (gender, race, color, state, etc.)

23
LOSS FUNCTION: REGRESSION

• Given a set of parameters, a loss function helps us to evaluate how well our learning algorithm is
performing on the training data using our current parameters.

• Prediction, given parameters 𝜃1, 𝜃2, and 𝜃3

• Simple loss function

• Average loss over all examples: MSE (Mean Squared Error)
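For reference, the per-example squared-error loss and its average (the MSE) take the standard textbook form below; the slide's own equations are only available as images, so this is the generic definition rather than a transcription:

```latex
L_i = \left(\hat{y}_i - y_i\right)^2, \qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2
```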

24
LOSS FUNCTION: CLASSIFICATION
• Classification → return scores for all the classes available in our dataset.
• Softmax: take these scores and return probabilities between 0 and 1
• Given a set of scores (S)

Example:
P: probability vector
e: base of the natural logarithm ≈ 2.71828
𝑆𝑖: score of each class
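A minimal softmax sketch, using the standard formula P_i = e^{S_i} / Σ_j e^{S_j}; the three class scores below are illustrative, not the slide's example:

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores S into probabilities between 0 and 1."""
    exp_s = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp_s / exp_s.sum()

scores = np.array([2.0, 1.0, 0.1])   # placeholder scores for three classes
probs = softmax(scores)              # ~[0.659, 0.242, 0.099], sums to 1
```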

25
LOSS FUNCTION: CLASSIFICATION

Softmax cross entropy loss (negative log likelihood loss)

• Softmax cross entropy loss → the negative of the log of the softmax score of the correct class, summed over examples

j: the index of the correct class

• Example: scores → softmax → softmax cross entropy loss

• The loss is very low when we are making the right prediction
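A minimal sketch of this loss; the scores are placeholders, and the printed values only illustrate that a correct prediction gives a low loss and a wrong one a high loss:

```python
import numpy as np

def softmax(scores):
    exp_s = np.exp(scores - scores.max())
    return exp_s / exp_s.sum()

def cross_entropy_loss(scores, j):
    """Negative log of the softmax probability of the correct class j."""
    return -np.log(softmax(scores)[j])

scores = np.array([2.0, 1.0, 0.1])         # placeholder scores
print(cross_entropy_loss(scores, j=0))     # correct class has the top score -> low loss (~0.42)
print(cross_entropy_loss(scores, j=2))     # correct class has a low score  -> high loss (~2.32)
```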

26
LOSS FUNCTION: PLUS REGULARIZATION

• Prevents overfitting
• Regularization → based on the observation that models usually overfit when the values of the parameters are too large
• Parameter sets with large values tend to result in low loss on the training set but fail to yield correspondingly good scores on the test set
• Penalizing large weights

• Weight decay: controls the strength of the regularizer
• L2 regularizer
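A minimal sketch of an L2-regularized loss; the parameter vector, data loss, and weight-decay value are placeholders:

```python
import numpy as np

def l2_regularized_loss(data_loss, theta, weight_decay=1e-4):
    """Total loss = data loss + weight_decay * sum of squared parameters.
    weight_decay (lambda) controls the strength of the regularizer."""
    return data_loss + weight_decay * np.sum(theta ** 2)

theta = np.array([3.0, -2.5, 0.7])          # placeholder parameters
total = l2_regularized_loss(0.42, theta)    # placeholder data loss
```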
27
OPTIMIZATION: GRADIENT DESCENT

• Finding the right set of parameters → finding parameters that yield the lowest error on the training set.

• Gradient of the loss with respect to the parameters: usually calculated through backpropagation

• Learning rate
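One gradient-descent update step, written out; the parameter and gradient values are placeholders standing in for the quantities described above:

```python
import numpy as np

theta = np.array([0.5, -0.3, 1.2])     # current parameters (placeholder)
grad = np.array([0.1, -0.2, 0.05])     # dL/dtheta from backpropagation (placeholder)
learning_rate = 0.01

theta = theta - learning_rate * grad   # move against the gradient to reduce the loss
```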
28
STOCHASTIC GRADIENT DESCENT:
MINIBATCH GRADIENT DESCENT

• Operates over one batch of the dataset at a time

• Common batch sizes: 32, 64, 128

• Other modifications of ordinary gradient descent: Gradient Descent with Momentum, Adagrad, AdaDelta, RMSProp, AdamOptimizer
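A minimal minibatch iterator sketch; the batch size is one of the common values above, and `compute_gradient` in the commented usage is a hypothetical helper, not a function defined in the slides:

```python
import numpy as np

def minibatches(X, y, batch_size=32):
    """Yield shuffled minibatches of the dataset, one batch at a time."""
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Usage sketch:
# for X_batch, y_batch in minibatches(X, y, batch_size=64):
#     grad = compute_gradient(X_batch, y_batch, theta)   # hypothetical helper
#     theta -= learning_rate * grad
```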

29
CONVOLUTIONAL NEURAL NETWORK

CNN: feed-forward networks; locally connected.

Works by detecting specific feature patterns across the entire image.

30
CONVOLUTIONAL NEURAL NETWORK

31
BASIC OPERATION

The more feature detectors a CNN has, the better it can classify images.

Feature detector in a CNN: kernel or filter

Each filter → detects a specific pattern

Common filter sizes: 5×5, 3×3, 2×2, 1×1

CNN → locally connected

At each dot product (convolution), a unit is only connected to a local region of the image
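A naive sketch of the convolution operation described above (valid convolution, stride 1); the input image is random placeholder data and the filter is a simple illustrative vertical-edge detector:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the filter over the image and take a dot product with each
    local region (local connectivity), producing a feature map."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(6, 6)                                     # placeholder image
edge_filter = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])     # vertical-edge detector
feature_map = conv2d(image, edge_filter)                         # shape (4, 4)
```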

32
Stride, Padding

Size of output feature map
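The slide's own figure is an image, so as a reference here is the standard formula for the output feature-map size, given input size W, filter size F, padding P, and stride S:

```python
def output_size(W, F, P, S):
    """Output width/height = (W - F + 2P) / S + 1 (floored when it does not divide evenly)."""
    return (W - F + 2 * P) // S + 1

output_size(W=32, F=5, P=0, S=1)   # -> 28
output_size(W=32, F=3, P=1, S=2)   # -> 16
```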


33
POOLING

• Reduces the dimensions of the image

• Helps make CNNs invariant to the exact position of features in the image, by picking the most important feature in a given pooling region

Max pooling
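A minimal max-pooling sketch; the pooling size and stride of 2 are the common defaults, not values given on the slide:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest activation in each pooling region,
    halving the spatial dimensions when size = stride = 2."""
    H, W = feature_map.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = region.max()
    return out

pooled = max_pool(np.random.rand(8, 8))   # shape (4, 4)
```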

34
35
COMPONENTS OF CNN

• Dropout: reduces overfitting

• Switches off some activations by setting them to zero (a minimal sketch follows this list)

• Batch Normalization (Ioffe & Szegedy, 2015)

• Addresses the problem of vanishing gradients

• Normalizes each batch of feature maps to have zero mean (and unit variance)

• Data Augmentation

• Randomly apply flipping, shifting, rotation, scaling, or whitening to our images
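A minimal inverted-dropout sketch; the keep/drop probability of 0.5 matches the value mentioned later for AlexNet, and the scaling of the surviving activations is the standard trick rather than something stated on the slide:

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Randomly switch off activations with probability p by setting them to zero;
    scale the survivors by 1/(1-p) so the expected activation stays the same."""
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask
```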

36
THE DEEP LEARNING REVOLUTION

• Deep ConvNets for Object Recognition


• Semantic Segmentation
• Object Detection

37
ARCHITECTURE

• AlexNet (2012)
• VGGNet (2014)
• Inception
• Residual Networks
• Evolution of ResNet
• SE NET

42
ALEXNET
• Winner of the ILSVRC (ImageNet Large Scale Visual Recognition Competition) 2012
• It was the first time a Convolutional Neural Network significantly outperformed other methods on a large dataset (ImageNet 2012) by a large margin.
• AlexNet was composed of five convolutional layers followed by three fully connected (dense) layers.
• Their most important contribution was the training process:
• They used data augmentation to artificially increase the training dataset.
• cuda-convnet: an incredibly efficient implementation of the convolution operation that effectively parallelized the training process across two GPUs. In those days, there were no deep learning libraries.

43
• 8 layers

• ReLU is introduced

• Overlapping pooling: stride is smaller than the kernel size

• Data augmentation: image translation and mirroring, altering the intensity using PCA

• Dropout: probability of 0.5

• Not used anymore → batch normalization


44
VGGNET

• Invented by the Visual Geometry Group

• Runner-up of the ILSVRC (ImageNet Large Scale Visual Recognition Competition) 2014
• The first year in which deep learning models achieved an error rate under 10%
• Using smaller filter sizes → fewer parameters

45
REFERENCES

• Introduction to Deep Computer Vision, 2018, John Olafenwa & Moses Olafenwa
• https://medium.com/@dataturks/deep-learning-and-computer-vision-from-basic-implementation-to-efficient-methods-3ca994d50e90
• Deep Learning, 2016, Ian Goodfellow, Yoshua Bengio, & Aaron Courville, MIT Press
• Deep Learning, NYU, https://atcold.github.io/pytorch-Deep-Learning/

50
