Deep Neural Networks
Introduction to machine learning
1. An Artificial Neural Network (ANN) models relationships between a set of input data and output data.
2. ANN models are based on the observed behaviour of the biological neural networks in our brains.
https://youtu.be/ySgmZOTkQA8
Machine Learning (Artificial Neural Network) / Perceptron…
[Figure: McCulloch-Pitts neurons for the AND gate (fires when the weighted sum > threshold) and the OR gate (fires when the weighted sum >= threshold)]
1. In McCulloch-Pitts neurons, inputs and outputs are binary. There is only one output, but there can be many inputs.
2. All inputs have the same positive weight (not shown in the figure).
3. The inputs multiplied by the corresponding weights are summed up, and the result is sent to a step function.
4. The threshold of the step function is fixed (e.g. 1 for an "AND" gate with two inputs, each with weight 1).
5. The threshold has to be modified per gate, e.g. for the "AND" gate, depending on the number of inputs. What would the threshold of the "AND" gate be if there are three inputs x1, x2, x3?
1. The objective of research into artificial neurons was to develop a computing system that could learn to do tasks on its own, without being instructed how to do them.
2. Most of the tasks we perform in day-to-day life are classification tasks. Hence, research focused on developing an artificial neuron that could classify.
3. Since computing systems are built on Boolean gates, which generate two classes, it was natural to check whether artificial neurons could learn to mimic gates such as OR and AND.
1. The weights for the inputs are not the same; they can be positive or negative.
2. The output can be -1, 0 or 1, unlike the MCP neuron where the output is only 0 or 1.
3. The neuron is associated with a learning rule that modifies the weights so that, with the same threshold, the neuron can behave like an "AND" gate or an "OR" gate with no need for any threshold modification.
4. This neuron learns from the data: it has the ability to learn the pattern from the data.
AND Gate

from numpy import array, dot

def step_function(result, threshold=1):
    # fires only when the weighted sum exceeds the fixed threshold
    # (threshold 1 for a two-input AND gate with weights of 1, as noted above)
    return 1 if result > threshold else 0

# fixed, equal positive weights: w[0] = 1, w[1] = 1
w = array([1, 1])

training_data = [
    (array([0,0]), 0),
    (array([0,1]), 0),
    (array([1,0]), 0),
    (array([1,1]), 1),
]

for x, _ in training_data:
    result = dot(x, w)
    print("{}: {} -> {}".format(x[:2], result, step_function(result)))
The McCulloch-Pitts model of a neuron is simple. However, it is so simplistic that it only generates a binary output, and the weight and threshold values are fixed.
Machine Learning (Artificial Neural Network) /
Perceptron…
** The relationship between weight adjustment, the errors, and a learning rate is what makes learning possible. Learning is the process of finding the combination of weights that minimizes the errors on the training data set, as sketched below.
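A minimal sketch (the function and variable names are illustrative, not from the slides) of how weight adjustment, error and the learning rate fit together for a single update:

from numpy import array

learning_rate = 0.1

def update_weights(w, x, target, prediction):
    # the error is the difference between the desired and the predicted output
    error = target - prediction
    # each weight moves in proportion to the error, the input and the learning rate
    return w + learning_rate * error * x

w = array([0.0, 0.0])
w = update_weights(w, array([1, 1]), target=1, prediction=0)
print(w)   # -> [0.1 0.1]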
Machine Learning (Artificial Neural Network) / Perceptron…
9. Perceptron Limitations -
a. Minsky and Papert demonstrated that the perceptron was incapable of handling some binary gates, such as XOR**
b. Given that it cannot represent all possible binary gates, it could not be used for all possible computations; hence the objective of computer-based AI looked like a pipe dream!
c. It was subsequently demonstrated that, instead of making a single neuron intelligent, a network of neurons could be used to do what a single neuron could not
d. This was the birth of the Artificial Neural Network
** Ref: https://alan.do/minskys-and-or-theorem-a-single-perceptron-s-limitations-490c63a02e9f
Lab - McCullohPitt_RosenBlat_Neurons.ipynb
Machine Learning (Artificial Neural Network)
10. The processing element of an ANN is called a node, representing the artificial neuron. Each ANN is composed of a collection of nodes grouped in layers; a typical structure is shown below. The initial layer is the input layer and the last layer is the output layer. In between we have the hidden layers.
[Figure: inputs X1, X2, X3 and a bias layer (B1, B2, B3) feeding the network that produces Y_Pred]
Machine Learning (Artificial Neural Network)
11. Mathematical foundations for artificial neural networks
a. Kolmogorov's theorem – any continuous function f defined on an n-dimensional cube can be represented by sums and superpositions of continuous functions of only one variable
b. Cover's theorem – a set of training data that is not linearly separable can, with high probability, be transformed into a linearly separable training set by projecting it into a higher-dimensional space via some non-linear transformation.
Machine Learning (Artificial Neural Network)
12. A given node will fire and feed a signal to nodes in the next layer only if the non-linear function it implements reaches a threshold. In ANNs, the sigmoid function is used more commonly than the step function.
[Figure: node output vs. input; the node fires once the input reaches the threshold]
Machine Learning (Artificial Neural Network)
13. The summation function can be implemented in many ways. It does not have to be a plain mathematical addition of the inputs.
Machine Learning (Artificial Neural Network)
14. The generic ANN architecture
15. A neural net consists of multiple layers. It has two layers at the edges: one is the input layer and the other is the output layer.
16. Between the input and output layers there can be many other layers. These layers are called hidden layers.
Machine Learning (Artificial Neural Network)
17. The input layer is passive and does no processing; it only holds the input data and supplies it to the first hidden layer.
Machine Learning (Artificial Neural Network)
18. Each node in the first hidden layer takes all the input attributes, multiplies them by the corresponding weights, adds a bias, and transforms the result using a non-linear function, e.g. N1Output = Sigmoid(ACC) (see the sketch below).
19. The weights for a given hidden node are fixed during a forward pass, and all the nodes in the hidden layer have their own weights.
20. The output of each node is fed to the output-layer nodes, or to another set of hidden nodes in the next hidden layer.
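A minimal sketch of what a single hidden node computes, assuming three input attributes and a sigmoid non-linearity (the numbers are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # input attributes X1, X2, X3
w = np.array([0.4, 0.1, -0.7])    # this node's own weights
b = 0.2                           # this node's bias

acc = np.dot(x, w) + b            # weighted sum of inputs plus bias (ACC)
n1_output = sigmoid(acc)          # non-linear transformation of ACC
print(n1_output)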
Machine Learning (Artificial Neural Network)
21. The output value of each hidden node is sent to each output node in the output layer
Machine Learning (Artificial Neural Network)
[Figure: inputs X1, X2, X3, X4 feeding Output Node 1 through weights WO11, WO12, WO13, WO14]
ACC = X1*WO11 + X2*WO12 + X3*WO13 + X4*WO14
N1Output = Sigmoid(ACC)
Machine Learning (Artificial Neural Network)
22. In a binary-output ANN, the output node acts like a perceptron, classifying the input into one of the two classes.
Machine Learning (Artificial Neural Network)
24. We can have an ANN with multiple output nodes, where a given output node may or may not get triggered, depending on the input and the weights.
Machine Learning (Artificial Neural Network)
26. The weights required to make a neural network carry out a particular task are found by a learning algorithm, together with examples of how the system should operate.
27. In vehicle identification, the examples could be a large Hadoop file of several million sample segments such as bicycle, motorcycle, car, bus, etc.
28. The learning algorithm calculates the appropriate weights for each classification, for all nodes at all levels in the network.
29. If we consider each input as a dimension, then the ANN labels different regions in the n-dimensional space. In our example, one region is cars, another region is bicycles.
[Figure: feature-space regions labelled Car and Bicycle]
Perceptrons
Perceptron
Perceptron Learning Algorithm (a code sketch follows below) –
1. Select a random sample from the training set as input. Draw the first random line (green) such that the blue triangles lie above it and the red circles below it.
2. If the classification is correct, do nothing. But on the first pass many blue triangles are on the wrong side!
3. If the classification is incorrect, modify the weight vector w and shift the green line.
4. Repeat this procedure until the entire training set is classified correctly.
5. However many times we run this algorithm, it will find a surface that separates the two classes.
[Figure: separating line after Run 1 and after Run n]
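A compact sketch of the procedure above on a toy, linearly separable data set (labels +1 for blue triangles, -1 for red circles; all names and numbers are illustrative):

import numpy as np

def step(z):
    # a ±1 step output, so the update direction follows the sign of the label
    return 1 if z >= 0 else -1

data = [(np.array([2.0, 3.0]),  1), (np.array([1.0, 2.5]),  1),
        (np.array([0.5, 0.2]), -1), (np.array([1.5, 0.4]), -1)]

w, b, lr = np.zeros(2), 0.0, 0.1
converged = False
while not converged:                       # repeat until every sample is classified correctly
    converged = True
    for x, y in data:
        if step(np.dot(w, x) + b) != y:    # misclassified: shift the separating line
            w += lr * y * x
            b += lr * y
            converged = False
print(w, b)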
Test
6. The convergence theorem guarantees that, when the classes are linearly separable in the training set, the perceptron will find a surface that separates the two classes correctly.
7. However, the perceptron algorithm does not guarantee that it will separate the two classes correctly on new data, even when the classes are linearly separable.
8. Why? Because it does not look for an optimal plane. It stops the moment it finds a separator plane (a.k.a. a dichotomizer).
9. Since these planes pass very close to the data points in the training set, the perceptron may not perform well on the test set, where the distribution of the data will be different.
Perceptron Weakness
1. Perceptrons fail on many data distributions, such as XOR, where a single linear boundary cannot segregate the classes (see the check below).
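A small illustrative check (not taken from the slides): a brute-force search over a grid of weights and biases finds a single step-function unit for AND, but none for XOR, because no straight line separates the XOR classes.

import itertools
import numpy as np

def realizable(truth_table):
    # try every weight/bias combination on a coarse grid for one linear threshold unit
    grid = np.linspace(-2, 2, 21)
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((1 if w1 * x1 + w2 * x2 + b > 0 else 0) == y
               for (x1, x2), y in truth_table):
            return True
    return False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print("AND realizable:", realizable(AND))   # True
print("XOR realizable:", realizable(XOR))   # False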
Origins of Neural Networks
1. Perceptrons were replaced with artificial neurons which not only performed a weighted summation but also included a non-linear function.
2. A multitude of such neurons working together could solve problems where an individual perceptron failed.
Activation Functions
2. For the third step, there are multiple mathematical functions available; collectively they are called activation functions.
3. The purpose of the activation function is to act like a switch for the neuron: should the neuron fire or not. Also…
4. The activation function is critical to the overall functioning of the neural network. Without it, the whole neural network mathematically becomes equivalent to one single neuron!
5. The activation function is one of the critical components that give neural networks the ability to deal with complex problems.
4. For example, the neuron G takes as input the weighted sums from D, E and F; G's output is a scaled version of the outputs of D, E and F:
5. G_Out = 3D - 2E + 1F
6.       = 3(1A + 2B + 3C) - 2(-3A + 2B - 1C) + 1(2A + 4B - 2C)
7.       = 11A + 6B + 9C
8. Thus this part of the network is like a single neuron with weights of 11, 6 and 9!!! (A quick numeric check follows below.)
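A quick numeric check of the collapse above, using the weights shown in the expression (a sketch, not part of the original slide):

import numpy as np

W1 = np.array([[ 1, 2,  3],    # D's weights on A, B, C
               [-3, 2, -1],    # E's weights on A, B, C
               [ 2, 4, -2]])   # F's weights on A, B, C
w2 = np.array([3, -2, 1])      # G's weights on D, E, F

# composing two purely linear layers yields one set of effective weights on A, B, C
print(w2 @ W1)                 # -> [11  6  9], i.e. G_Out = 11A + 6B + 9C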
2. This collapse happens when the neurons only do simple additions and multiplications of the inputs. These are called linear operations; linear operations collapse the network.
3. All activation functions are non-linear transformations for exactly this reason. The non-linear transformation not only prevents the collapse, it also empowers the network to do complex tasks, because each neuron then contributes something distinct to the network as a whole.
b. Smooth functions (a code sketch follows this list)
I. Smooth ReLU / Exponential ReLU
II. Sigmoid / Logistic functions
III. Hyperbolic Tangent (tanh)
IV. Swish (combination of Sigmoid and ReLU)
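A minimal sketch of some of these functions (Softplus is used here as the "smooth ReLU"; the definitions are the standard ones, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    # a smooth approximation of ReLU
    return np.log1p(np.exp(z))

def swish(z):
    # the input scaled by its own sigmoid
    return z * sigmoid(z)

z = np.linspace(-3, 3, 7)
for fn in (sigmoid, np.tanh, softplus, swish):
    print(fn.__name__, np.round(fn(z), 3))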
Activation Functions -
Neurons stretch the feature space through non-linear functions, realizing Cover's theorem.
[Figure: Neuron1 computes ACC = m1·X + C1 and N1Output = Sigmoid(ACC); neurons N1, N2, N3 map the input X into a stretched feature space]
Ref: https://cs.stanford.edu/people/karpathy/convnetjs//demo/classify2d.html
SoftMax Function -
1. A kind of operation applied at the output neurons of a classifier network.
2. Used only when we have two or more output neurons, and applied simultaneously to all the output neurons.
3. Turns the raw numbers coming out of the penultimate layer into probability values in the output layer.
4. Suppose the output-layer neurons emit (Op1, Op2, …, Opn). The raw numbers may not make much sense. We convert them into probabilities using softmax, which is more meaningful; e.g. the input being a cycle is 30 times more likely than a sailboat and about 13 times more likely than a car. A code sketch follows the figure below.
[Figure: the entire network emits raw outputs Op1, Op2, …, Opn in the output layer; softmax converts them into the class probabilities 0.07, 0.9 and 0.03]
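A minimal sketch of the softmax operation on illustrative raw scores (chosen so that they reproduce the probabilities shown in the figure):

import numpy as np

def softmax(scores):
    # subtract the max for numerical stability, exponentiate, then normalise to sum to 1
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

raw_outputs = np.array([1.2, 3.8, 0.4])    # Op1, Op2, Op3 from the penultimate layer
print(np.round(softmax(raw_outputs), 2))   # -> [0.07 0.9  0.03]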
Forward Propagation
1. Forward propagation is the directed acyclic path taken by the input data from the input layer, being transformed by non-linear functions into the final network-level outputs.
2. Input data is propagated forward from the input layer through the hidden layers until it reaches the final layer, where predictions are emitted.
4. There may be multiple hidden layers, with multiple neurons in each layer.
5. The last layer is the output layer, which may have a softmax function (if the network is a multi-class classifier).
Forward Propagation
6. Forward prop steps – (a code sketch follows below)
a. Calculate the weighted input to the hidden layer by multiplying X by the hidden weight Wh
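A compact sketch of the forward prop steps for one hidden layer and a single output, following the notation above (Wh for the hidden weights; the sizes and values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X  = np.array([[0.2, 0.7, 1.0]])            # one sample with three input attributes
Wh = np.random.uniform(size=(3, 4))         # input-to-hidden weights
bh = np.ones(4)                             # hidden-layer biases (set to 1, as in the diagram)
Wo = np.random.uniform(size=(4, 1))         # hidden-to-output weights
bo = np.ones(1)                             # output bias

hidden_in  = X @ Wh + bh                    # a. weighted input to the hidden layer (X times Wh)
hidden_out = sigmoid(hidden_in)             # non-linear transformation at each hidden node
y_pred     = sigmoid(hidden_out @ Wo + bo)  # final layer emits the prediction
print(y_pred)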
Note: The diagram shows a step function instead of ReLU in each neuron. The biases are all set to 1; the bias actually supplied to a neuron depends on the weight assigned to the connector connecting the bias to that neuron.
Bias Term
1. Every neuron in the hidden layers is associated with a bias term. The bias term helps us control the firing threshold of each neuron.
3. The bias can be adjusted to lower a neuron's threshold and make it fire! Using bias, the network learns a richer set of patterns.
4. The bias term is also treated as an input, even though it does not come from the data.
1. What is an optimization algorithm and what is its use? Optimization algorithms help us minimize (or maximize) an objective function (another name for the error function) E(x), which is simply a mathematical function of the model's internal learnable parameters, the parameters used to compute the target values (Y) from the set of predictors (X) in the model.
2. C = ½((Σ wi·xi + b) – y)^2. In this expression, the xi and y come from the data and are given. What the ML algorithm learns is the weights wi and the bias b. Thus C = f(wi, b).
3. The optimizer algorithms try to estimate the values of wi and b which, when used, give the minimum or maximum C. In ML we look for the minimum.
[Figure: the fitted line Y = w1·x + c, the actual value y1 at X1, and the error e1 in the prediction]
Hence dw = e1 / x
The change required in the weight w (dw) is e1/x. However, the change required w.r.t. another data point may be different. To prevent jumping around with dw, we moderate the change in w by introducing a learning rate l. Hence dw = l·(e1/x).
Back Propagation
1. Back propagation is the learning process that the neural network employs to re-calibrate the weights and biases at every layer and every node, so as to minimize the error at the output layer.
2. During the first pass of forward propagation, the weights and biases are random numbers. The random numbers are generated within a small range, say 0 – 1.
3. Needless to say, the output of the first iteration is almost always incorrect. The difference between the actual value / class and the predicted value / class is the error.
4. All the nodes in all the preceding layers have contributed to the error, and hence need to get their share of the error and correct their weights.
5. This process of allocating a proportion of the error to all the nodes in the previous layers is back propagation.
6. The goal of back propagation is to adjust the weights and biases in proportion to their error contribution and, through an iterative process, identify the optimal combination of weights.
7. At each layer, at each node, the gradient descent algorithm is applied to adjust the weights.
Back Propagation
[Figure: output error e1 flowing back through weights w(3,1), w(3,2), w(3,3)]
1. The error at the output node, shown as e1, is contributed to by nodes 1, 2 and 3 of layer 2 through the weights w(3,1), w(3,2), w(3,3).
2. The proportionate error assigned back to node 1 of hidden layer 2 is (w(3,1) / (w(3,1) + w(3,2) + w(3,3))) * e1, as sketched below.
3. The error assigned to node 1 of hidden layer 2 is, in turn, proportionately sent back to the hidden layer 1 neurons.
4. All the nodes in all the layers re-adjust their input weights and biases to address the assigned error (for this they use gradient descent).
5. The input layer nodes are not neurons; they are like input parameters to a function and hence have no errors.
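A small sketch of this proportional allocation (the weight values and e1 are illustrative numbers):

e1 = 0.8                                   # error observed at the output node
weights = {"w(3,1)": 2.0, "w(3,2)": 1.0, "w(3,3)": 1.0}

total = sum(weights.values())
for name, w in weights.items():
    # each node of hidden layer 2 receives a share of e1 in proportion to its connecting weight
    print(name, "is assigned", w / total * e1)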
Gradient Descent
The challenge is that all the weights on all the inputs of all the neurons need to be adjusted. It is not possible to find the right combination of weights manually, using brute force. Instead, the neural network algorithm uses a learning procedure called gradient descent.
[Figure: predicted y1 for X1, with prediction error e1 and weight w1]
Gradient Descent
1. Let the target value for a training example X be y, i.e. the data frame used for training has values (X, y).
2. Let the model (represented by random m and c) predict the value for the training example X to be yhat.
3. The error in prediction is E = yhat – y. If we sum these errors across all data points, some will be positive and some negative, and they will cancel out.
4. To prevent the sum of errors from becoming 0, we square the error, i.e. E = (y – yhat)^2. Note: in the squared expression, y – yhat and yhat – y mean the same thing.
5. The sum of (y – yhat)^2 across all the X values is called the SSE (Sum of Squared Errors).
6. We minimize the SSE using gradient descent (descending towards the global minimum). Gradient descent uses partial derivatives, i.e. how the SSE changes on slightly modifying the model parameters m and c, one at a time (expanded below):
d(E) / d(m) = d(sum(yhat – y)^2) / d(m)
d(E) / d(c) = d(sum(yhat – y)^2) / d(c)
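Expanding these with yhat = m·x + c gives the standard expressions (stated here for reference, not on the original slide):
d(E) / d(m) = sum( 2 · (yhat – y) · x )
d(E) / d(c) = sum( 2 · (yhat – y) )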
Gradient Descent
1. Gradient descent is a way to minimize an objective / cost function, such as the Sum of Squared Errors (SSE), that depends on the model parameters: weight / slope and bias.
2. The parameters are updated in the direction opposite to the direction of the gradient (the direction of maximum increase) of the objective function.
3. In other words, we change the values of weight and bias following the direction of the slope of the error surface, down the hill, until we reach a minimum.
4. This movement from the starting weight and bias to the optimal weight and bias may not happen in one shot. It is likely to happen over multiple iterations; the values change in steps.
5. The step size can be influenced using a parameter called the Learning Rate. It decides the size of the steps, i.e. the amount by which the parameters are updated. Too small a learning step will slow down the entire process, while too large a step may overshoot the minimum and never settle.
6. The mathematical expression of gradient descent (a code sketch follows below)
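A minimal sketch of that update rule in code, fitting a straight line y = m·x + c by gradient descent (the data, learning rate and iteration count are illustrative assumptions):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])           # roughly y = 2x + 1, with a little noise

m, c, lr = 0.0, 0.0, 0.01                    # arbitrary starting parameters and learning rate
for _ in range(2000):
    yhat = m * X + c
    dm = np.sum(2 * (yhat - y) * X)          # d(SSE)/d(m)
    dc = np.sum(2 * (yhat - y))              # d(SSE)/d(c)
    m, c = m - lr * dm, c - lr * dc          # step opposite to the gradient
print(round(m, 2), round(c, 2))              # converges near the least-squares fit (~1.9, ~1.2)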
Gradient Descent
[Figure: error contours over weight and bias, with a randomly selected starting point]
About the contour graph –
1. First find d(error)/d(weight) to get the direction of highest increase in error given a unit change in weight. This is the partial derivative w.r.t. the weight.
2. Next find d(error)/d(bias) to get the direction of highest increase in error given a unit change in bias (green arrow). This is the partial derivative w.r.t. the bias.
3. Partial derivatives give the gradient along the given axis, and the gradient is a vector.
4. Add the two vectors to get the direction of the gradient (black arrow), i.e. the direction of maximum increase in error.
Thank You