Lecture 3 - Gradient Descent - IITM - 23-1-200
Mitesh M. Khapra
Acknowledgements
• For Module 3.4, I have borrowed ideas from the videos by Ryan Harris on "visualize backpropagation" (available on YouTube)
• For Module 3.5, I have borrowed ideas from this excellent book, which is available online: http://neuralnetworksanddeeplearning.com/chap4.html
• I am sure I would have been influenced by and borrowed ideas from other sources, and I apologize if I have failed to acknowledge them
Module 3.1: Sigmoid Neuron
The story ahead ...
• Enough about boolean functions!
• What about arbitrary functions of the form y = f(x), where x ∈ R^n (instead of {0, 1}^n) and y ∈ R (instead of {0, 1})?
• Can we have a network which can (approximately) represent such functions?
• Before answering the above question we will have to first graduate from perceptrons to sigmoidal neurons ...
Recall
• A perceptron will fire if the weighted sum of its inputs is greater than the threshold (−w0)
[Figure: a perceptron with input x1, weight w1 = 1, and bias w0 = −0.5, producing output y]
• The thresholding logic used by a perceptron is very harsh!
• For example, let us return to our problem of deciding whether we will like or dislike a movie
[Figure: the perceptron output y as a function of z = Σ_{i=1}^{n} w_i x_i — a hard step from 0 to 1 at the threshold −w0]
• This behavior is not a characteristic of the specific problem we chose or the specific weights and threshold that we chose
• It is a characteristic of the perceptron function itself
• For most real world applications we would expect a smoother decision function which gradually changes from 0 to 1

[Figure: the sigmoid output y as a function of z = Σ_{i=1}^{n} w_i x_i — a smooth S-shaped curve from 0 to 1]
• Introducing sigmoid neurons, where the output function is much smoother than the step function
• Here is one form of the sigmoid function, called the logistic function:

      y = 1 / (1 + e^{−(w0 + Σ_{i=1}^{n} w_i x_i)})

• We no longer see a sharp transition around the threshold −w0
• Also, the output y is no longer binary but a real value between 0 and 1, which can be interpreted as a probability
• Instead of a like/dislike decision we get the probability of liking the movie
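The contrast between the harsh thresholding of the perceptron and the smooth logistic output can be seen numerically. A minimal sketch (the helper names `step` and `sigmoid` are my own):

```python
import math

def step(z):
    # Perceptron thresholding: output jumps from 0 to 1 exactly at z = 0
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Logistic function: output changes gradually from 0 to 1
    return 1 / (1 + math.exp(-z))

# Near the threshold, a tiny change in z flips the perceptron's output,
# while the sigmoid's output barely moves
for z in (-0.1, 0.0, 0.1):
    print(z, step(z), round(sigmoid(z), 3))
```

Crossing z = 0 flips the step output from 0 to 1, while the sigmoid moves only from about 0.475 to 0.525.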
Perceptron vs Sigmoid (logistic) Neuron
[Figure: both neurons take inputs x0 = 1, x1, ..., xn with weights w0 = −θ, w1, ..., wn and produce output y]

Perceptron:
      y = 1 if Σ_{i=0}^{n} w_i x_i ≥ 0
        = 0 if Σ_{i=0}^{n} w_i x_i < 0

Sigmoid (logistic) neuron:
      y = 1 / (1 + e^{−Σ_{i=0}^{n} w_i x_i})
Perceptron vs Sigmoid Neuron
[Figure: the step function vs the smooth sigmoid curve, both going from 0 to 1 as functions of z = Σ_{i=1}^{n} w_i x_i]
• Perceptron: not smooth, not continuous (at the threshold −w0), not differentiable
• Sigmoid: smooth, continuous, differentiable
Module 3.2: A typical Supervised Machine Learning Setup
Sigmoid (logistic) Neuron
[Figure: inputs x0 = 1, x1, ..., xn with weights w0 = −θ, w1, ..., wn feeding a sigmoid neuron with output y]
• What next?
• Well, just as we had an algorithm for learning the weights of a perceptron, we also need a way of learning the weights of a sigmoid neuron
• Before we see such an algorithm we will revisit the concept of error
• Earlier we mentioned that a single perceptron cannot deal with this data because it is not linearly separable
• What does "cannot deal with" mean?
• What would happen if we use a perceptron model to classify this data?
• We would probably end up with a line like this ...
• This line doesn't seem to be too bad
• Sure, it misclassifies 3 blue points and 3 red points, but we could live with this error in most real world applications
• From now on, we will accept that it is hard to drive the error to 0 in most cases and will instead aim to reach the minimum possible error
This brings us to a typical machine learning setup which has the following components...
• Data: {x_i, y_i}_{i=1}^{n}
• Model: Our approximation of the relation between x and y. For example,

      ŷ = 1 / (1 + e^{−(w^T x)})

  or ŷ = w^T x, or ŷ = x^T W x, or just about any function
• Parameters: In all the above cases, w is a parameter which needs to be learned from the data
• Learning algorithm: An algorithm for learning the parameters (w) of the model (for example, the perceptron learning algorithm, gradient descent, etc.)
• Objective/Loss/Error function: To guide the learning algorithm - the learning algorithm should aim to minimize the loss function
As an illustration, consider our movie example
• Data: {x_i = movie, y_i = like/dislike}_{i=1}^{n}
• Model: Our approximation of the relation between x and y (the probability of liking a movie):

      ŷ = 1 / (1 + e^{−(w^T x)})

• Parameter: w
• Learning algorithm: Gradient Descent [we will see soon]
• Objective/Loss/Error function: One possibility is

      L(w) = Σ_{i=1}^{n} (ŷ_i − y_i)^2

  (the squared error between y and ŷ). The learning algorithm should aim to find a w which minimizes this function.
Module 3.3: Learning Parameters: (Infeasible) guess work
[Figure: a sigmoid neuron with a single input x, weight w, bias b, and output ŷ = f(x) = 1 / (1 + e^{−(w·x+b)})]
• Keeping this supervised ML setup in mind, we will now focus on this model and discuss an algorithm for learning the parameters of this model from some given data using an appropriate objective function
• σ stands for the sigmoid function (the logistic function in this case)
• For ease of explanation, we will consider a very simplified version of the model having just 1 input
• Further, to be consistent with the literature, from now on we will refer to w0 as b (bias)
• Lastly, instead of considering the problem of predicting like/dislike, we will assume that we want to predict criticsRating (y) given imdbRating (x) (for no particular reason)
[Figure: the same neuron, x → σ → ŷ = f(x), with parameters w and b, where f(x) = 1 / (1 + e^{−(w·x+b)})]

Input for training
{x_i, y_i}_{i=1}^{N} → N pairs of (x, y)

Training objective
Find w and b such that:

      minimize over (w, b):  L(w, b) = Σ_{i=1}^{N} (y_i − f(x_i))^2

What does it mean to train the network?
• Suppose we train the network with (x, y) = (0.5, 0.2) and (2.5, 0.9)
• At the end of training we expect to find w*, b* such that:
• f(0.5) → 0.2 and f(2.5) → 0.9

In other words...
• We hope to find a sigmoid function such that (0.5, 0.2) and (2.5, 0.9) lie on this sigmoid
Let us see this in more detail....
      σ(x) = 1 / (1 + e^{−(wx+b)})

• Can we try to find such a w*, b* manually?
• Let us try a random guess.. (say, w = 0.5, b = 0)
• Clearly not good, but how bad is it?
• Let us revisit L(w, b) to see how bad it is ...

      L(w, b) = (1/2) * Σ_{i=1}^{N} (y_i − f(x_i))^2
              = (1/2) * [(y_1 − f(x_1))^2 + (y_2 − f(x_2))^2]
              = (1/2) * [(0.9 − f(2.5))^2 + (0.2 − f(0.5))^2]
              = 0.073
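We can check this arithmetic directly. A small sketch (the function names are my own) that evaluates L(w, b) for the guess w = 0.5, b = 0 on the two training points:

```python
import math

def f(x, w, b):
    # the sigmoid neuron: f(x) = 1 / (1 + e^{-(wx + b)})
    return 1 / (1 + math.exp(-(w * x + b)))

def loss(w, b, data):
    # L(w, b) = 1/2 * sum over points of (y_i - f(x_i))^2
    return 0.5 * sum((y - f(x, w, b)) ** 2 for x, y in data)

data = [(0.5, 0.2), (2.5, 0.9)]
print(round(loss(0.5, 0.0, data), 3))   # 0.073, matching the slide
```

The same two functions reproduce the other loss values tried in the table that follows.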
Let us try some other values of w, b

      w       b        L(w, b)
      0.50    0.00     0.0730
      -0.10   0.00     0.1481
      0.94    -0.94    0.0214
      1.42    -1.73    0.0028
      1.65    -2.08    0.0003
      1.78    -2.27    0.0000

Let us keep going in this direction, i.e., increase w and decrease b. With some guess work and intuition we were able to find the right values for w and b.
Let us look at something better than our “guess work” algorithm....
• Since we have only 2 points and 2 parameters (w, b), we can easily plot L(w, b) for different values of (w, b) and pick the one where L(w, b) is minimum
• But of course this becomes intractable once you have many more data points and many more parameters !!
• Further, even here we have plotted the error surface only for a small range of (w, b) [from (−6, 6) and not from (−∞, ∞)]
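The brute-force idea above can be sketched in a few lines: evaluate L(w, b) over a grid covering [−6, 6] × [−6, 6] and keep the best cell (the grid resolution and helper names are my choices):

```python
import math

def f(x, w, b):
    return 1 / (1 + math.exp(-(w * x + b)))

def loss(w, b, data):
    return 0.5 * sum((y - f(x, w, b)) ** 2 for x, y in data)

data = [(0.5, 0.2), (2.5, 0.9)]

# 121 x 121 grid over [-6, 6] x [-6, 6] with 0.1 spacing:
# already 14641 loss evaluations for just 2 parameters
grid = [-6 + 0.1 * i for i in range(121)]
best_loss, best_w, best_b = min(
    (loss(w, b, data), w, b) for w in grid for b in grid
)
print(round(best_w, 1), round(best_b, 1))  # near the hand-found (1.78, -2.27)
```

Doubling the resolution quadruples the work, and every extra parameter multiplies it by another grid dimension, which is exactly why this approach does not scale.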
Let us look at the geometric interpretation of our "guess work" algorithm in terms of this error surface
[Figure: the successive guesses from the table plotted as points on the error surface, moving toward the minimum]
Module 3.4: Learning Parameters : Gradient Descent
Now let us see if there is a more efficient and principled way of doing this
Goal
Find a better way of traversing the error surface so that we can reach the minimum value quickly without resorting to brute force search!
[Figure: a move in parameter space from θ to θ_new along direction ∆θ, with the conservative step η · ∆θ]
• θ = [w, b] is the vector of parameters, say, randomly initialized
• ∆θ = [∆w, ∆b] is the change in the values of w, b
• We moved in the direction of ∆θ
• Let us be a bit conservative: move only by a small amount η

      θ_new = θ + η · ∆θ
For ease of notation, let ∆θ = u. Then from the Taylor series, we have,

      L(θ + ηu) = L(θ) + η * u^T ∇_θ L(θ) + (η^2/2!) * u^T ∇_θ^2 L(θ) u + (η^3/3!) * ... + (η^4/4!) * ...
                = L(θ) + η * u^T ∇_θ L(θ)        [η is typically small, so η^2, η^3, ... → 0]

Note that the move (ηu) would be favorable only if

      L(θ + ηu) − L(θ) < 0        [i.e., if the new loss is less than the previous loss]

This implies,

      u^T ∇_θ L(θ) < 0
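The first-order truncation can be sanity-checked numerically on our toy loss: for a small η, L(θ + ηu) should match L(θ) + η u^T ∇_θ L(θ) up to O(η²). A sketch using a finite-difference gradient (all helper names are mine):

```python
import math

def f(x, w, b):
    return 1 / (1 + math.exp(-(w * x + b)))

def loss(theta, data):
    w, b = theta
    return 0.5 * sum((y - f(x, w, b)) ** 2 for x, y in data)

def grad(theta, data, eps=1e-6):
    # central finite differences approximate the gradient of L at theta
    g = []
    for i in range(len(theta)):
        tp, tm = list(theta), list(theta)
        tp[i] += eps
        tm[i] -= eps
        g.append((loss(tp, data) - loss(tm, data)) / (2 * eps))
    return g

data = [(0.5, 0.2), (2.5, 0.9)]
theta = [0.5, 0.0]
u = [1.0, -1.0]   # an arbitrary direction
eta = 1e-3        # small step: higher-order terms become negligible

g = grad(theta, data)
exact = loss([theta[0] + eta * u[0], theta[1] + eta * u[1]], data)
first_order = loss(theta, data) + eta * (u[0] * g[0] + u[1] * g[1])
print(abs(exact - first_order))   # tiny: the O(eta^2) remainder
```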
Okay, so we have,

      u^T ∇_θ L(θ) < 0

But what is the range of u^T ∇_θ L(θ)? Let β be the angle between u and ∇_θ L(θ), and let k = ||u|| * ||∇_θ L(θ)||. Then,

      −1 ≤ cos(β) = (u^T ∇_θ L(θ)) / (||u|| * ||∇_θ L(θ)||) ≤ 1

Multiplying throughout by k, we get

      −k ≤ k * cos(β) = u^T ∇_θ L(θ) ≤ k

Thus, u^T ∇_θ L(θ) is most negative when cos(β) = −1, i.e., when β is 180°.
Gradient Descent Rule
• The direction u that we intend to move in should be at 180° w.r.t. the gradient
• In other words, move in a direction opposite to the gradient

      w_{t+1} = w_t − η∇w_t
      b_{t+1} = b_t − η∇b_t

      where ∇w_t = ∂L(w, b)/∂w evaluated at w = w_t, b = b_t, and ∇b_t = ∂L(w, b)/∂b evaluated at w = w_t, b = b_t

So we now have a more principled way of moving in the w-b plane than our "guess work" algorithm
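We can verify that 180° is indeed the best direction: take a point on our toy loss surface, try unit directions at several angles β to the gradient, and see which one reduces the loss the most. A sketch (the angles, step size, and names are my choices):

```python
import math

def f(x, w, b):
    return 1 / (1 + math.exp(-(w * x + b)))

def loss(w, b, data):
    return 0.5 * sum((y - f(x, w, b)) ** 2 for x, y in data)

data = [(0.5, 0.2), (2.5, 0.9)]
w, b = 0.5, 0.0
eta, eps = 0.01, 1e-6

# finite-difference gradient of L at (w, b)
gw = (loss(w + eps, b, data) - loss(w - eps, b, data)) / (2 * eps)
gb = (loss(w, b + eps, data) - loss(w, b - eps, data)) / (2 * eps)
norm = math.hypot(gw, gb)

base = loss(w, b, data)
change = {}
for deg in range(0, 360, 45):
    # unit vector at angle `deg` to the gradient direction
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    uw = (c * gw - s * gb) / norm
    ub = (s * gw + c * gb) / norm
    change[deg] = loss(w + eta * uw, b + eta * ub, data) - base

best = min(change, key=change.get)
print(best, change[best] < 0)   # the direction opposite to the gradient wins
```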
• Let us create an algorithm from this rule ...

Algorithm: gradient_descent()
t ← 0;
max_iterations ← 1000;
while t < max_iterations do
    w_{t+1} ← w_t − η∇w_t;
    b_{t+1} ← b_t − η∇b_t;
    t ← t + 1;
end

• To see this algorithm in practice let us first derive ∇w and ∇b for our toy neural network
[Figure: the toy network x → σ → ŷ = f(x), where f(x) = 1 / (1 + e^{−(w·x+b)})]
      ∇w = ∂/∂w [ (1/2) * (f(x) − y)^2 ]
         = (1/2) * [ 2 * (f(x) − y) * ∂/∂w (f(x) − y) ]
         = (f(x) − y) * ∂/∂w (f(x))
         = (f(x) − y) * ∂/∂w [ 1 / (1 + e^{−(wx+b)}) ]

where

      ∂/∂w [ 1 / (1 + e^{−(wx+b)}) ]
         = (−1 / (1 + e^{−(wx+b)})^2) * ∂/∂w (e^{−(wx+b)})
         = (−1 / (1 + e^{−(wx+b)})^2) * e^{−(wx+b)} * ∂/∂w (−(wx + b))
         = (−1 / (1 + e^{−(wx+b)})) * (e^{−(wx+b)} / (1 + e^{−(wx+b)})) * (−x)
         = (1 / (1 + e^{−(wx+b)})) * (e^{−(wx+b)} / (1 + e^{−(wx+b)})) * x
         = f(x) * (1 − f(x)) * x

So,

      ∇w = (f(x) − y) * f(x) * (1 − f(x)) * x
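A quick numerical check of the derived formula: compare (f(x) − y) * f(x) * (1 − f(x)) * x against a central finite difference of the per-example loss (helper names are mine):

```python
import math

def f(x, w, b):
    return 1 / (1 + math.exp(-(w * x + b)))

def grad_w_analytic(x, y, w, b):
    # the expression we just derived
    fx = f(x, w, b)
    return (fx - y) * fx * (1 - fx) * x

def grad_w_numeric(x, y, w, b, eps=1e-6):
    # central finite difference of 1/2 * (f(x) - y)^2 w.r.t. w
    lp = 0.5 * (f(x, w + eps, b) - y) ** 2
    lm = 0.5 * (f(x, w - eps, b) - y) ** 2
    return (lp - lm) / (2 * eps)

x, y, w, b = 2.5, 0.9, 0.5, 0.0
print(abs(grad_w_analytic(x, y, w, b) - grad_w_numeric(x, y, w, b)) < 1e-8)
```

The two values agree to numerical precision, which is a standard way to catch mistakes in hand-derived gradients.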
[Figure: the toy network x → σ → ŷ = f(x)]

∇b is derived identically, except that ∂/∂b (−(wx + b)) = −1 replaces −x. For our two training points, the gradients are accumulated over both examples:

      ∇w = Σ_{i=1}^{2} (f(x_i) − y_i) * f(x_i) * (1 − f(x_i)) * x_i

      ∇b = Σ_{i=1}^{2} (f(x_i) − y_i) * f(x_i) * (1 − f(x_i))
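Putting the update rule and the two gradients together, we can run gradient descent on the two training points and watch it recover values close to the hand-found (1.78, −2.27). A sketch (the learning rate, iteration count, and starting point are my choices):

```python
import math

def f(x, w, b):
    return 1 / (1 + math.exp(-(w * x + b)))

data = [(0.5, 0.2), (2.5, 0.9)]
w, b = 0.5, 0.0   # the earlier random guess
eta = 1.0

for t in range(10000):
    # gradients accumulated over both training points
    dw = sum((f(x, w, b) - y) * f(x, w, b) * (1 - f(x, w, b)) * x for x, y in data)
    db = sum((f(x, w, b) - y) * f(x, w, b) * (1 - f(x, w, b)) for x, y in data)
    w, b = w - eta * dw, b - eta * db

final_loss = 0.5 * sum((y - f(x, w, b)) ** 2 for x, y in data)
print(round(w, 2), round(b, 2))   # should approach roughly (1.79, -2.28)
```

Unlike the guess-work table, no intuition about "increase w, decrease b" is needed: the gradient supplies the direction at every step.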
[Figure: the path taken by gradient descent on the error surface, converging toward the minimum]