
DEEP LEARNING

Pranita Mahajan

AGENDA
Syllabus
When DL over ML
Fundamentals of Neural Networks
Takeaways

SYLLABUS


TIMELINE
- Training, Optimization and Regularization of Deep Neural Networks
- Autoencoders: Unsupervised Learning
- Convolutional Neural Networks (CNN): Supervised Learning
- Recurrent Neural Networks (RNN)
- Recent Trends and Applications


WHEN DL OVER ML
When to use DL for analysis

AI >> ML >> DL

MACHINE LEARNING: Machine Learning is a set of algorithms that parse data, learn from it, and then apply what they have learned to make intelligent decisions.

ML LIMITATION: ML algorithms need a lot of domain expertise and human intervention, and are only capable of what they are designed for; nothing more, nothing less.

DEEP LEARNING: A subset of Machine Learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts.

DL PROBLEM SOLVING: Each concept is defined in relation to simpler concepts, and more abstract representations are computed in terms of less abstract ones.

DL LEARNING ARCHITECTURE

TEXT USE CASE
Deep learning techniques learn categories incrementally through their hidden-layer architecture, defining low-level categories like letters first, then slightly higher-level categories like words, and then higher-level categories like sentences.

IMAGE USE CASE
In the example of image recognition this means identifying light/dark areas before categorizing lines and then shapes, to allow face recognition.
Each neuron or node in the network represents one aspect of the whole, and together they provide a full representation of the image.
Each node or hidden layer is given a weight that represents the strength of its relationship with the output, and as the model develops the weights are adjusted.
OVERCOME
CURSE OF
DIMENSIONALITY

ARCHITECTURE OF A NEURAL NETWORK

HUMAN BRAIN vs NEURAL NETWORK

• Information passes through the synaptic connections and is processed at the soma.
• In the artificial neuron this is modelled as a multiplicative conversion: X -> WX.
• Most of the time this is followed by a non-linear function.
• A neural network (NN) is a collection of these neurons.
HOW A NN CAN BE USED TO IMPLEMENT FUNCTIONS

AND function
- Input features: X1 and X2
- Output: y (the AND function)
- 2-D feature space representation

Can it be considered a binary classification problem?

One of the many possible linear boundaries

How to construct a feature vector from this linear equation:
- Capture the coefficients of the linear line separating the two classes.
Implementing AND using a Neural Network

(Diagram: a non-linear function applied to the weighted inputs produces the AND operation.)
How to design a neuron to build the AND operation / function

Hence, with a single neuron with a threshold non-linear function we can implement AND logic.


Similarly, OR logic can be built.

As AND and OR are both linearly separable, we can solve each with a single neuron (see the sketch below); but for problems where X and y are non-linearly separable we need a neural network (MLP).
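A minimal sketch of this claim, assuming a simple step (threshold) non-linearity; the weight and bias values are illustrative choices that happen to realize AND and OR:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)          # threshold non-linearity

def neuron(X, w, b):
    return step(X @ np.array(w) + b)    # weighted sum, then threshold

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(neuron(X, [1, 1], b=-1.5))        # AND -> [0 0 0 1]
print(neuron(X, [1, 1], b=-0.5))        # OR  -> [0 1 1 1]
```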

XOR FUNCTION
IMPLEMENTATION
• It is a non-linear problem.
• Can we break this non-linear problem into a combination of linear problems?

XOR(X1, X2) = OR(X1, X2) AND NAND(X1, X2)

h1 = OR(X1, X2)
h2 = NAND(X1, X2)
XOR(X1, X2) = AND(h1, h2)

XOR FUNCTION IMPLEMENTATION


XOR FUNCTION
Weight vector of OR (gives h1): [-0.5 1 1]
Negation of AND is NAND (gives h2): [1.5 -1 -1]
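A small sketch wiring these weight vectors together (the first entry of each vector is treated as the bias, the threshold sits at 0, and the AND weight vector [-1.5 1 1] is an assumed choice consistent with the earlier AND slide):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def neuron(X, wv):
    b, w = wv[0], np.array(wv[1:])       # first entry = bias, rest = weights
    return step(X @ w + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h1 = neuron(X, [-0.5, 1, 1])             # OR(X1, X2)
h2 = neuron(X, [1.5, -1, -1])            # NAND(X1, X2)
xor = neuron(np.stack([h1, h2], axis=1), [-1.5, 1, 1])  # AND(h1, h2)
print(xor)                               # [0 1 1 0]
```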

HOW MANY NEURONS ARE NEEDED
• We need 3 neurons.
• The 3 neurons are arranged in two layers.
• 1st (first layer): OR operation
• 2nd (first layer): NAND operation
• 3rd (second layer): AND operation


GENERAL NEURAL
NETWORK

In every layer the network computes a non-linear function.

The cascade of these layers gives the overall network function.

In the final (kth) layer the problem can be solved as a linearly separable one.


MULTI LAYER PERCEPTRON (MLP)

A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs.
• An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers.
• Feed-forward network: the job of each layer is to take input from the layer before it and pass its output on to the next layer, hence the name feed-forward NN (deep neural network).
• Every node in an intermediate layer is connected to the nodes of the next layer, and the purpose of training is to find the values of the weights between the layers.
BUILDING BLOCK OF DEEP NEURAL NETWORKS
• Perceptron

Frank Rosenblatt’s model

Multi layer Perceptron

Frank Rosenblatt’s model

BUILDING BLOCK OF DEEP NEURAL NETWORKS
• Sigmoid Neuron
Sigmoid neurons are similar to perceptrons, but they are slightly modified so that the output from the sigmoid neuron is much smoother than the step-function output of the perceptron.
The perceptron model takes several real-valued inputs and gives a single binary output.
In the perceptron model, every input xi has a weight wi associated with it.
The weights indicate the importance of the input in the decision-making process.
The model output is decided by a threshold W₀: if the weighted sum of the inputs is greater than W₀, the output will be 1, else the output will be 0. In other words, the model fires if the weighted sum is greater than the threshold.


• Let's see how this harsh thresholding affects a real-world problem.
• Red points indicate that a person would not buy a car and green points indicate that a person would buy one. Isn't it a bit odd that a person earning 50.1K will buy a car but someone earning 49.9K will not? A small change in the input to a perceptron can sometimes cause the output to completely flip, say from 0 to 1.
• This behaviour is a characteristic of the perceptron itself, which behaves like a step function. We can overcome this problem by introducing a new type of artificial neuron called the sigmoid neuron.
• Another limitation of the perceptron model is that its learning algorithm works only if the data is linearly separable.


In sigmoid neurons the output function is much smoother than the step function: a small change in the input causes only a small change in the output, as opposed to the stepped output. There are many functions with the characteristic "S"-shaped curve, known as sigmoid functions; the most commonly used is the logistic function.
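A tiny sketch of the difference, reusing the car-buying example from the previous slide; treating 50K as the threshold and using the raw difference as the weighted sum is an illustrative assumption:

```python
import numpy as np

def step(z):
    return 1 if z > 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for income in (49.9, 50.1):               # salary in thousands
    z = income - 50.0                      # weighted sum relative to the threshold
    print(income, step(z), round(sigmoid(z), 3))
# the step output flips from 0 to 1, the sigmoid only moves from ~0.475 to ~0.525
```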


SUMMARY
If the Feed Forward algorithm only computed the weighted sums in each neuron,
propagated results to the output layer, and stopped there, it wouldn’t be able to learn the
weights that minimize the cost function. If the algorithm only computed one iteration,
there would be no actual learning.
This is where Backpropagation comes into play.


THREE CLASSES OF DEEP


LEARNING
1. Basics of Neural Networks
2. Convolutional Neural Networks
3. Recurrent Neural Networks


• Forward Propagation
• Cost Function
• Gradient Descent
• Learning Rate
• Backpropagation


CONVOLUTIONAL NEURAL NETWORK
Convolutional neural networks are mainly applied to image data. Suppose we have an input of size 28*28*3. If we used a normal fully connected neural network, every neuron in the first layer would already need 2352 (28*28*3) weights, and as the size of the image increases the number of parameters becomes very large. We "convolve" the images to reduce the number of parameters.
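A back-of-the-envelope sketch of why convolution helps; the 100-unit dense layer and the 100 filters of size 3x3 are assumed sizes for illustration (biases ignored):

```python
h, w, c = 28, 28, 3                 # input image: 28 * 28 * 3
n_units, k = 100, 3                 # assumed layer width and filter size

dense_params = h * w * c * n_units  # one weight per input value per unit
conv_params = k * k * c * n_units   # filter weights are shared across positions
print(dense_params, conv_params)    # 235200 vs 2700
```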


RECURRENT NEURAL NETWORK

• Recurrent neural networks are used especially for sequential data, where the previous output is used to help predict the next one. In this case the networks have loops within them. The loops within the hidden neurons give them the capability to store information about the previous words for some time, so as to be able to predict the output.
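A minimal sketch of that loop, assuming a plain (vanilla) RNN cell with a tanh non-linearity; the dimensions and random weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
Wx = rng.normal(size=(n_hid, n_in))     # input-to-hidden weights
Wh = rng.normal(size=(n_hid, n_hid))    # hidden-to-hidden weights (the loop)
b = np.zeros(n_hid)

h = np.zeros(n_hid)                     # hidden state storing past information
for x_t in rng.normal(size=(5, n_in)):  # a sequence of 5 time steps
    h = np.tanh(Wx @ x_t + Wh @ h + b)  # the previous state feeds into the next step
print(h.shape)                          # (8,)
```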


CHAPTER 2

TRAINING, OPTIMIZATION AND


REGULARIZATION OF DEEP
NEURAL NETWORK


TRAINING FEEDFORWARD DNN


• Forward Propagation
• Cost Function
• Gradient Descent
• Learning Rate
• Backpropagation


GRADIENT DESCENT
• Designing and training a neural network is not much different from training any other machine learning model with gradient descent.
• The largest difference between the linear models we have seen so far and neural networks is that the nonlinearity of a neural network causes most interesting loss functions to become nonconvex.
• This means that neural networks are usually trained with iterative, gradient-based optimizers that merely drive the cost function to a very low value.
• For feedforward neural networks, it is important to initialize all weights to small random values.
• Training a neural network is not much different from training any other model; computing the gradient is slightly more complicated for a neural network but can still be done efficiently and exactly.
• Today we will learn to compute the gradient using the back-propagation algorithm and modern generalizations of it.
• Back-propagation is an algorithm that applies the chain rule with a specific, highly efficient order of operations.

• Optimizer algorithms are optimization methods that help improve a deep learning model's performance. These optimization algorithms, or optimizers, strongly affect the accuracy and training speed of the deep learning model.
• An optimizer is a function or algorithm that adjusts the attributes of the neural network, such as weights and learning rates. It thus helps reduce the overall loss and improve accuracy. Choosing the right weights for the model is a daunting task, as a deep learning model generally consists of millions of parameters.
• Gradient descent can be considered the most popular method among this class of optimizers, as sketched below.
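A minimal sketch of gradient descent on a toy quadratic cost (the cost function, learning rate and starting point are illustrative assumptions):

```python
def loss(w):
    return (w - 3.0) ** 2        # toy cost with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)       # derivative dL/dw

w, lr = 0.0, 0.1                 # initial weight and learning rate
for _ in range(50):
    w = w - lr * grad(w)         # step against the gradient
print(round(w, 3), round(loss(w), 6))   # w approaches 3, the cost approaches 0
```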


LEARNING FACTORS
• The factors are as follows:
1. Initial weights
2. Steepness of activation function
3. Learning constant
4. Momentum
5. Network architecture
6. Necessary number of hidden neurons

1. Initial weights:
The weights of the network to be trained are typically initialized to small random values. The initialization strongly affects the ultimate solution.
2. Steepness of activation function:
The neuron's continuous activation function is characterized by its steepness factor. The derivative of the activation function also serves as a multiplying factor in building the components of the error signal vectors.
3. Learning constant:
The effectiveness and convergence of the error back-propagation learning algorithm depend significantly on the value of the learning constant.
4. Momentum:
The purpose of the momentum method is to accelerate the convergence of the error back-propagation learning algorithm. The method involves supplementing the current weight adjustment with a fraction of the most recent weight adjustment (see the sketch after this list).
5. Network architecture:
One of the most important attributes of a layered neural network design is choosing the architecture. The number of input nodes is simply determined by the dimension or size of the input vector to be classified; the input vector size usually corresponds to the total number of distinct features of the input patterns.
6. Necessary number of hidden neurons:
The problem of choosing the size of the hidden layer is under intensive study, with no conclusive answers available. A formula can be used to estimate how many hidden-layer neurons are needed to achieve classification into M classes in an x-dimensional pattern space.
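A minimal sketch of the momentum update described in item 4, on the same kind of toy quadratic cost; the learning constant and momentum factor are illustrative:

```python
def grad(w):
    return 2.0 * (w - 3.0)               # derivative of the toy cost (w - 3)^2

w, lr, alpha = 0.0, 0.1, 0.9             # weight, learning constant, momentum factor
delta_prev = 0.0                         # most recent weight adjustment
for _ in range(200):
    delta = -lr * grad(w) + alpha * delta_prev  # current step plus a fraction of the last one
    w += delta
    delta_prev = delta
print(round(w, 3))                       # approaches 3
```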


ACTIVATION
FUNCTIONS
Linear
Logistic / Sigmoid
Tanh
ReLU
Leaky ReLU
Softmax


TYPES OF ACTIVATION
FUNCTION
• Activation functions are generally of two types:
1. Linear or Identity Activation Function
2. Non-Linear Activation Function

Generally, neural networks use non-linear activation functions, which help the network learn complex data, compute and learn almost any function representing a question, and provide accurate predictions. They allow back-propagation because they have a derivative function that is related to the inputs. (The common choices listed on the previous slide are sketched below.)
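A minimal sketch of the activation functions listed above (the leaky-ReLU slope of 0.01 is a common but assumed choice):

```python
import numpy as np

def linear(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def softmax(z):
    e = np.exp(z - np.max(z))            # shift for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(z), softmax(z))
```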

SIGMOID / LOGISTIC

• The sigmoid activation function is very simple: it takes a real value as input and gives a probability that is always between 0 and 1. It looks like an "S" shape.
• Its main advantage is that it is simple and good for classifiers. A big disadvantage is that it gives rise to the "vanishing gradients" problem, and its output isn't zero-centered, which makes the gradient updates go too far in different directions. Since 0 < output < 1, optimization becomes harder, and it takes very high computational time in the hidden layers of a neural network.

Tanh or Hyperbolic tangent

• Tanh helps solve the non-zero-centered problem of the sigmoid function. Tanh squashes a real-valued number to the range [-1, 1]. It is non-linear.
• Its derivative behaves much like the sigmoid's derivative.
• It cannot remove the vanishing gradient problem completely.


COMPARING TANH WITH SIGMOID


RELU
This is the most popular activation function, used in the hidden layers of a NN. The formula is deceptively simple: max(0, z). Despite its name and appearance, it is non-linear and provides the same benefits as sigmoid but with better performance.

Its main advantage is that it avoids and rectifies the vanishing gradient problem and is less computationally expensive than tanh and sigmoid.

But it also has a drawback: sometimes some gradients can be fragile during training and can die, which leads to dead neurons.


COMPARING RELU WITH SIGMOID


LEAKY RELU
• It prevents the dying-ReLU problem. This variation of ReLU has a small positive slope in the negative area, so it enables back-propagation even for negative input values.


RELU vs LEAKY RELU
Leaky ReLU does not provide consistent predictions for negative input values. During forward propagation, if the learning rate is set very high the update can overshoot and kill the neuron.


SOFTMAX
Generally, we use this function at the last layer of a neural network. It calculates the probability distribution of an event over 'n' different events. The main advantage of the function is that it can handle multiple classes.


SIGMOID WITH SOFTMAX

Let's take an example:
Sigmoid input values: -0.5, 1.2, -0.1, 2.4
Sigmoid output values: 0.37, 0.77, 0.48, 0.91
Softmax input values: -0.5, 1.2, -0.1, 2.4
Softmax output values: 0.04, 0.21, 0.05, 0.70

The probabilities produced by a sigmoid are independent. Furthermore, they are not constrained to sum to one: 0.37 + 0.77 + 0.48 + 0.91 = 2.53. The reason is that the sigmoid looks at each raw output value separately.
The softmax outputs, in contrast, are interrelated. The softmax probabilities always sum to one by design: 0.04 + 0.21 + 0.05 + 0.70 = 1.00. In this case, if we want to increase the likelihood of one class, the others have to decrease by the same total amount.
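A quick sketch reproducing these numbers (small differences from the slide values are due to rounding):

```python
import numpy as np

z = np.array([-0.5, 1.2, -0.1, 2.4])
sigmoid = 1.0 / (1.0 + np.exp(-z))
softmax = np.exp(z) / np.exp(z).sum()
print(sigmoid.round(2), sigmoid.sum().round(2))  # ~[0.38 0.77 0.48 0.92], sums to ~2.54
print(softmax.round(2), softmax.sum().round(2))  # ~[0.04 0.21 0.06 0.70], sums to 1.0
```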


LOSS AND LOSS FUNCTIONS FOR


TRAINING DEEP LEARNING NEURAL
NETWORKS


WHAT WE WILL STUDY


• Regression Models
• Squared Error loss
• Classification Model
• Cross Entropy
• Choosing output function and loss function


SQUARED ERROR LOSS


• In mathematical optimization and decision theory, a loss or cost function (sometimes also called an error
function) is a function that maps an event or values of one or more variables onto a real number intuitively
representing some “cost” associated with the event.
• In simple terms, the Loss function is a method of evaluating how well your algorithm is modeling your dataset. It
is a mathematical function of the parameters of the machine learning algorithm.
• You can’t improve what you can’t measure. That’s why the loss function comes into the picture to evaluate how
well your algorithm is modeling your dataset.


Advantages
1. Easy to interpret.
2. Always differentiable because of the square.
3. Only one local minimum.

Disadvantages
1. The error is in squared units, so it is not easily interpreted.
2. Not robust to outliers.

Note: in regression, use a linear activation function at the last neuron.
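A minimal sketch of the squared-error (MSE) loss on made-up regression targets and predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)    # average squared error per observation

y_true = np.array([3.0, -0.5, 2.0, 7.0])      # illustrative targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])       # illustrative predictions
print(mse(y_true, y_pred))                    # 0.375
```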


CROSS ENTROPY LOSS

• Binary cross entropy
• It is used in binary classification problems with two classes, for example whether a person has covid or not, or whether my article gets popular or not.
• Binary cross entropy compares each predicted probability to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, that is, how close or far they are from the actual value (see the sketch below).

• Categorical cross entropy
• Categorical cross entropy is used for multiclass classification and softmax regression.
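A minimal sketch of binary cross entropy on made-up predicted probabilities (the small epsilon guards against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])               # actual classes
y_pred = np.array([0.9, 0.1, 0.8, 0.4])       # predicted probabilities
print(round(binary_cross_entropy(y_true, y_pred), 3))   # ~0.34; confident wrong guesses cost more
```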


CHOOSING OUTPUT FUNCTION AND LOSS FUNCTION


• The loss function calculates the error per observation, whilst the cost function calculates the error
over the whole dataset.


SUMMARY
Types of Neural Networks – DNN, CNN, RNN
Gradient descent
Learning factors – W, alpha
Activation functions – Linear, Sigmoid, Tanh, ReLU
Loss
Cost

THANK YOU
Pranita Mahajan
[email protected]
