Deep Learning
Pranita Mahajan
AGENDA
Syllabus
When DL over ML
Fundamentals of Neural Network
Takeaway
SYLLABUS
TIMELINE
• Training, Optimization and Regularization of Deep Neural Networks
• Convolutional Neural Networks (CNN): Supervised Learning
• Autoencoders: Unsupervised Learning
• Recurrent Neural Networks (RNN)
• Recent Trends and Applications
WHEN DL OVER ML
When to use DL for analysis
AI ⊃ ML ⊃ DL: deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.
DL LEARNING ARCHITECTURE

TEXT USE CASE
Deep learning techniques learn categories incrementally through their hidden-layer architecture, defining low-level categories like letters first, then slightly higher-level categories like words, and then higher-level categories like sentences.

IMAGE USE CASE
In the example of image recognition, this means identifying light/dark areas before categorizing lines and then shapes, to allow face recognition. Each neuron or node in the network represents one aspect of the whole, and together they provide a full representation of the image. Each node or hidden layer is given a weight that represents the strength of its relationship with the output, and as the model develops, the weights are adjusted.
OVERCOME THE CURSE OF DIMENSIONALITY
ARCHITECTURE OF NEURAL NETWORK
HOW NN CAN BE USED TO IMPLEMENT FUNCTIONS

AND Function
- Input features: X1 and X2
- Output: y (AND function)
- 2-D feature space representation
One of the many possible linear boundaries
How to construct a feature vector from this linear equation
- Capture the coefficients of the linear boundary separating the two classes
Implementing AND using a Neural Network
Non-linear function
AND operation
How to design a neuron to build the AND operation / function
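As a minimal sketch (not from the original slides), a step-activation neuron with an assumed weight vector [-1.5 1 1] (bias -1.5, input weights 1 and 1) implements AND in NumPy:

import numpy as np

def neuron(x, w):
    # Step-activation neuron: fires when the weighted sum is positive.
    # x is the augmented input [1, x1, x2]; w[0] acts as the bias.
    return int(np.dot(w, x) > 0)

# One possible weight vector for AND: bias -1.5, input weights 1 and 1.
w_and = np.array([-1.5, 1.0, 1.0])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", neuron(np.array([1, x1, x2]), w_and))
# Prints 0, 0, 0, 1: the AND truth table.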
XOR FUNCTION IMPLEMENTATION
• XOR is a non-linear problem.
• Can we break this non-linear problem into a combination of linear problems?
h1 = OR(X1, X2)
h2 = NAND(X1, X2)
XOR(X1, X2) = AND(h1, h2)
XOR FUNCTION
Weight vector of OR (h1): [-0.5 1 1]
NAND (h2) is the negation of AND; its weight vector: [1.5 -1 -1]
HOW MANY NEURONS ARE NEEDED
• We need 3 neurons, arranged in two layers.
• 1st (first layer): OR operation
• 2nd (first layer): NAND operation
• 3rd (second layer): AND operation
A sketch combining these three neurons follows below.
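A minimal NumPy sketch of this two-layer arrangement, using the OR and NAND weight vectors from the previous slide; the AND weight vector [-1.5 1 1] is an assumed (but standard) choice:

import numpy as np

def neuron(x, w):
    # Same step neuron as before; x is the augmented input [1, x1, x2].
    return int(np.dot(w, x) > 0)

w_or   = np.array([-0.5,  1.0,  1.0])   # h1 = OR(x1, x2)
w_nand = np.array([ 1.5, -1.0, -1.0])   # h2 = NAND(x1, x2)
w_and  = np.array([-1.5,  1.0,  1.0])   # output = AND(h1, h2)

def xor(x1, x2):
    h1 = neuron(np.array([1, x1, x2]), w_or)
    h2 = neuron(np.array([1, x1, x2]), w_nand)
    return neuron(np.array([1, h1, h2]), w_and)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor(x1, x2))
# Prints 0, 1, 1, 0: the XOR truth table.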
GENERAL NEURAL NETWORK (MLP)
• A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs.
• An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers.
• The job of each layer is to take input from the layer before it and pass its output to the next layer, hence a feedforward (deep) neural network.
• Every node in an intermediate layer is connected to the nodes of the next layer, and the purpose of training is to find the values of the weights on these connections (a forward-pass sketch follows below).
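To make the layer-to-layer flow concrete, here is an illustrative forward pass in NumPy; the sigmoid activation and the 2-4-1 architecture are arbitrary choices for this sketch, not prescribed by the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # Each layer is a (W, b) pair; the output of one layer
    # is passed as input to the next, hence "feedforward".
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# A 2-4-1 MLP with small randomly initialized weights.
layers = [(rng.normal(size=(4, 2)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
print(forward(np.array([0.5, -1.0]), layers))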
BUILDING BLOCK OF DEEP NEURAL NETWORKS
• Perceptron
Multilayer Perceptron
BUILDING BLOCK OF DEEP NEURAL NETWORKS
• Sigmoid Neuron
Sigmoid neurons are similar to perceptrons, but slightly modified so that the output from the sigmoid neuron is much smoother than the step-function output of the perceptron.
The perceptron model takes several real-valued inputs and gives a single binary output. In the perceptron model, every input xᵢ has a weight wᵢ associated with it. The weights indicate the importance of the input in the decision-making process.
The model output is decided by a threshold w₀: if the weighted sum of the inputs Σᵢ wᵢxᵢ is greater than the threshold w₀, the output is 1, else it is 0. In other words, the neuron fires if the weighted sum exceeds the threshold.
• Let’s see how this harsh thresholding affects a real-world problem.
• Red points indicate that a person would not buy a car and green points indicate that a person would buy one. Isn’t it a bit odd that a person earning 50.1K will buy a car but someone earning 49.9K will not? A small change in the input to a perceptron can sometimes cause the output to completely flip, say from 0 to 1.
• This behavior is a characteristic of the perceptron itself, which behaves like a step function. We can overcome this problem by introducing a new type of artificial neuron called the sigmoid neuron.
• Another limitation of the perceptron model is that its learning algorithm works only if the data is linearly separable.
Sigmoid neurons have an output function that is much smoother than the step function: a small change in the input causes only a small change in the output, as opposed to the stepped output. There are many functions with a characteristic “S”-shaped curve, known as sigmoid functions; the most commonly used is the logistic function.
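A minimal sketch of the logistic function in NumPy, showing that a small change in input produces only a small change in output:

import numpy as np

def logistic(z):
    # Smooth "S"-shaped curve: output is always strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# A small change in input produces only a small change in output,
# unlike the hard 0/1 jump of the perceptron's step function.
print(logistic(-0.1), logistic(0.1))   # ~0.475 vs ~0.525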
SUMMARY
If the Feed Forward algorithm only computed the weighted sums in each neuron, propagated results to the output layer, and stopped there, it wouldn’t be able to learn the weights that minimize the cost function. If the algorithm only computed one iteration, there would be no actual learning.
This is where Backpropagation comes into play.
• Forward Propagation
• Cost Function
• Gradient Descent
• Learning Rate
• Backpropagation
CONVOLUTIONAL NEURAL NETWORK
Convolutional neural networks are applied primarily to image data. Suppose we have an input of size 28×28×3. If we use a normal (fully connected) neural network, a single neuron would already need 2,352 (28×28×3) parameters, and as the size of the image increases, the number of parameters becomes very large. We “convolve” the images to reduce the number of parameters.
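A back-of-the-envelope comparison in Python; the 3×3 filter size is an illustrative assumption:

# Fully connected: every neuron sees all 28*28*3 = 2352 input values,
# so ONE hidden neuron alone already needs 2352 weights.
dense_weights_per_neuron = 28 * 28 * 3
print(dense_weights_per_neuron)  # 2352

# Convolutional: a 3x3 filter applied across the whole image
# needs only 3*3*3 = 27 weights (plus one bias), shared at every position.
conv_weights_per_filter = 3 * 3 * 3
print(conv_weights_per_filter)   # 27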
CHAPTER 2
• Forward Propagation
• Cost Function
• Gradient Descent
• Learning Rate
• Backpropagation
GRADIENT DESCENT
• Designing and training a neural network is not much different from training any other machine learning model with gradient descent.
• The largest difference between the linear models we have seen so far and neural networks is that the nonlinearity of a neural network causes most interesting loss functions to become nonconvex.
• This means that neural networks are usually trained by iterative, gradient-based optimizers that merely drive the cost function to a very low value (see the sketch below).
• For feedforward neural networks, it is important to initialize all weights to small random values.
• Computing the gradient is slightly more complicated for a neural network than for other models, but it can still be done efficiently and exactly.
• Today we will learn to compute the gradient using the back-propagation algorithm and its modern generalizations.
• Back-propagation is an algorithm that computes the chain rule, with a specific order of operations that is highly efficient.
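A minimal gradient-descent sketch on a toy nonconvex cost; the cost function, initial value, learning rate, and iteration count are all illustrative assumptions:

# Gradient descent on a toy nonconvex cost J(w) = w**4 - 3*w**2 + w.
def grad(w):
    return 4 * w**3 - 6 * w + 1

w = 0.1                  # small initial value
lr = 0.01                # learning rate
for _ in range(200):
    w -= lr * grad(w)    # step against the gradient
print(w)                 # settles in ONE of the local minima (~ -1.30)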
• Optimizer algorithms are optimization methods that help improve a deep learning model’s performance. These optimizers strongly affect both the accuracy and the training speed of the deep learning model.
• An optimizer is a function or algorithm that adjusts the attributes of the neural network, such as the weights and the learning rate, and thus helps reduce the overall loss and improve accuracy. Choosing the right weights for the model is a daunting task, as a deep learning model generally consists of millions of parameters.
• Gradient descent can be considered the most popular method among this class of optimizers.
LEARNING FACTORS
• The factors are as follows:
1. Initial weights
2. Steepness of the activation function
3. Learning constant
4. Momentum
5. Network architecture
6. Necessary number of hidden neurons
1. Initial weights:
The weights of the network to be trained are typically initialized at small random values. The initialization strongly affects the ultimate solution.
2. Steepness of the activation function:
The neuron’s continuous activation function is characterized by its steepness factor. The derivative of the activation function also serves as a multiplying factor in building the components of the error signal vectors.
3. Learning constant:
The effectiveness and convergence of the error back-propagation learning algorithm depend significantly on the value of the learning constant.
4. Momentum:
The purpose of the momentum method is to accelerate the convergence of the error back-propagation learning algorithm. The method involves supplementing the current weight adjustment with a fraction of the most recent weight adjustment (see the sketch below).
5. Network architecture:
One of the most important attributes of a layered neural network design is choosing the architecture. The number of input nodes is simply determined by the dimension or size of the input vector to be classified; the input vector size usually corresponds to the total number of distinct features of the input patterns.
6. Necessary number of hidden neurons:
The problem of choosing the size of the hidden layer is under intensive study, with no conclusive answers available. Formulas exist for estimating how many hidden-layer neurons are needed to achieve classification into M classes in an x-dimensional pattern space.
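A minimal sketch of the momentum update; the quadratic cost and the values of the momentum fraction and learning constant are illustrative assumptions:

def grad(w):
    # Placeholder gradient of a toy cost J(w) = (w - 3)**2.
    return 2 * (w - 3)

w, velocity = 0.0, 0.0
beta, lr = 0.9, 0.01         # momentum fraction and learning constant
for _ in range(300):
    # The current update carries a fraction (beta) of the previous one.
    velocity = beta * velocity - lr * grad(w)
    w += velocity
print(w)                     # converges toward the minimum at w = 3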
ACTIVATION FUNCTIONS
Linear
Logistic / Sigmoid
Tanh
ReLU
Leaky ReLU
Softmax
TYPES OF ACTIVATION FUNCTION
• Activation functions are generally of two types:
1. Linear or identity activation function
2. Non-linear activation function
Generally, neural networks use non-linear activation functions, which help the network learn complex data, compute and learn almost any function of the inputs, and provide accurate predictions. They also allow back-propagation because they have a derivative that is a function of the inputs.
SIGMOID / LOGISTIC
• The sigmoid activation function is very simple: it takes a real value as input and gives a probability that is always between 0 and 1. Its curve has an ‘S’ shape.
• Its main advantage is that it is simple and well suited to classifiers. Its big disadvantages are that it gives rise to the “vanishing gradients” problem and that its output isn’t zero-centered (0 < output < 1), which makes the gradient updates go too far in different directions and makes optimization harder. It also incurs high computational cost in the hidden layers of a neural network.
Tanh or Hyperbolic Tangent
• Tanh helps solve the non-zero-centered problem of the sigmoid function: it squashes a real-valued number to the range [-1, 1]. It is non-linear.
• Its derivative has almost the same shape as the sigmoid’s derivative.
• It cannot remove the vanishing gradient problem completely.
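A quick NumPy illustration of tanh and its derivative:

import numpy as np

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))            # squashes into [-1, 1], zero-centered
print(1 - np.tanh(z) ** 2)   # derivative 1 - tanh(z)**2, bell-shaped like sigmoid's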
RELU
This is the most popular activation function, used in the hidden layers of a NN. The formula is deceptively simple: max(0, z). Despite its name and appearance, it is not linear, and it provides the same benefits as sigmoid but with better performance.
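A minimal NumPy sketch of ReLU:

import numpy as np

def relu(z):
    # max(0, z): passes positives through unchanged, zeroes out negatives.
    return np.maximum(0, z)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))  # [0. 0. 0. 2.]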
RELU VS. LEAKY RELU
With plain ReLU, if the learning rate is set very high, a large update during propagation can overshoot and “kill” the neuron (it stops activating). Leaky ReLU mitigates this by keeping a small, non-zero output for negative inputs, although it does not provide consistent predictions for negative input values.
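A minimal sketch of Leaky ReLU, assuming the common slope alpha = 0.01 for negative inputs:

import numpy as np

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope (alpha) for negative inputs
    # so the neuron still receives a gradient and is less likely to "die".
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-3.0, 2.0])))  # [-0.03  2.  ]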
SOFTMAX
Generally, we use this function at the last layer of the neural network; it calculates the probability distribution over ’n’ different events (classes). The main advantage of the function is that it can handle multiple classes.
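A minimal NumPy sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick, not from the slide:

import numpy as np

def softmax(z):
    # Output is a probability distribution over the n classes:
    # all entries are non-negative and sum to 1.
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # e.g. raw outputs of the last layer
print(softmax(scores))               # [~0.66 ~0.24 ~0.10], sums to 1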
MEAN SQUARED ERROR (MSE)
Advantages
1. Easy to interpret.
2. Always differentiable because of the square.
3. Only one local minimum.
Disadvantages
1. The error is in squared units, which is not easily interpreted.
2. Not robust to outliers.
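A minimal NumPy sketch of MSE (the slide’s advantages and disadvantages describe the mean squared error; the sample values below are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: squaring makes it differentiable everywhere
    # but also amplifies the effect of outliers.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
print(mse(y_true, y_pred))   # ~0.1667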
SUMMARY
Types of neural network – DNN, CNN, RNN
Gradient descent
Learning factors – weights (W), learning rate (alpha)
Activation functions – Linear, Sigmoid, Tanh, ReLU
Loss
Cost
THANK YOU
Pranita Mahajan
[email protected]