Class Notes 02feb2023

- The document discusses linear and non-linear regression models for machine learning.
- For linear regression, the model predicts outputs based on linear combinations of input features.
- For non-linear regression, the model accounts for quadratic and higher-order polynomial relationships between inputs and outputs by adding derived features like squares and cross-terms to the input data.
- However, adding too many higher-order terms can lead to overfitting, so regularization is introduced to penalize complex models.


Lecture 8: 2 February, 2023

Madhavan Mukund
https://www.cmi.ac.in/~madhavan

Data Mining and Machine Learning


January–April 2023
Linear regression
Training input is $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$.

Each input $x_i$ is a vector $(x_{i1}, \ldots, x_{ik})$; add $x_{i0} = 1$ by convention.

$y_i$ is the actual output.

How far away is our prediction $h_\theta(x_i)$ from the true answer $y_i$?

Define a cost (loss) function
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right)^2$$

Essentially, the sum squared error (SSE), justified via MLE. Dividing by $n$ gives the mean squared error (MSE).
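As a concrete illustration (not part of the slides), a minimal NumPy sketch of this cost, assuming the inputs are stacked into a matrix X whose first column is the conventional $x_{i0} = 1$ and theta holds the parameters:

```python
import numpy as np

def sse_cost(X, y, theta):
    """J(theta) = (1/2) * sum_i (h_theta(x_i) - y_i)^2 for the linear model h_theta(x) = x . theta."""
    residuals = X @ theta - y           # h_theta(x_i) - y_i for every training point
    return 0.5 * np.sum(residuals ** 2)

# Tiny synthetic check: y = 3 + 2*x with no noise, so the true parameters give zero cost
X = np.column_stack([np.ones(5), np.arange(5.0)])   # leading column of ones (x_i0 = 1)
y = 3 + 2 * np.arange(5.0)
print(sse_cost(X, y, np.array([3.0, 2.0])))         # 0.0
```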

The non-linear case

What if the relationship is not linear? Here the best possible explanation seems to be a quadratic.

Non-linear: cross dependencies.

Input $x_i$: $(x_{i1}, x_{i2})$

Quadratic dependencies:
$$y = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \theta_{11} x_{i1}^2 + \theta_{22} x_{i2}^2 + \theta_{12} x_{i1} x_{i2}$$
(the terms in $\theta_1, \theta_2$ are linear; the remaining terms are quadratic)

The non-linear case

Recall how we fit a line:
$$\begin{bmatrix} 1 & x_{i1} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$$

For a quadratic, add new coefficients and expand the parameters:
$$\begin{bmatrix} 1 & x_{i1} & x_{i1}^2 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$



The non-linear case

Input $(x_{i1}, x_{i2})$

For the general quadratic case, we are adding new derived "features":
$$x_{i3} = x_{i1}^2, \qquad x_{i4} = x_{i2}^2, \qquad x_{i5} = x_{i1}\, x_{i2}$$


The non-linear case

Original input matrix:
$$\begin{bmatrix}
1 & x_{11} & x_{12} \\
1 & x_{21} & x_{22} \\
\vdots & \vdots & \vdots \\
1 & x_{i1} & x_{i2} \\
\vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2}
\end{bmatrix}$$



The non-linear case

Expanded input matrix:
$$\begin{bmatrix}
1 & x_{11} & x_{12} & x_{11}^2 & x_{12}^2 & x_{11} x_{12} \\
1 & x_{21} & x_{22} & x_{21}^2 & x_{22}^2 & x_{21} x_{22} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_{i1} & x_{i2} & x_{i1}^2 & x_{i2}^2 & x_{i1} x_{i2} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n1}^2 & x_{n2}^2 & x_{n1} x_{n2}
\end{bmatrix}$$

The new columns are computed and filled in from the original inputs.
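A sketch of how this expansion might be computed, assuming the original inputs are already in a NumPy matrix with the leading column of ones (the helper name quadratic_expansion is hypothetical):

```python
import numpy as np

def quadratic_expansion(X):
    """Given columns [1, x1, x2], append the derived features x1^2, x2^2 and x1*x2."""
    x1, x2 = X[:, 1], X[:, 2]
    return np.column_stack([X, x1 ** 2, x2 ** 2, x1 * x2])

# Original input matrix with the conventional leading 1s
X = np.column_stack([np.ones(4),
                     np.array([1.0, 2.0, 3.0, 4.0]),
                     np.array([0.5, 1.5, 2.5, 3.5])])
X_expanded = quadratic_expansion(X)    # columns: 1, x1, x2, x1^2, x2^2, x1*x2
print(X_expanded.shape)                # (4, 6)
```

scikit-learn's PolynomialFeatures(degree=2) produces an equivalent expansion (up to column ordering), which avoids writing the derived columns by hand.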


Exponential parameter blow-up

Cubic derived features (for an input $(x_{i1}, x_{i2}, x_{i3})$):
$$x_{i1}^3,\ x_{i2}^3,\ x_{i3}^3,\quad x_{i1}^2 x_{i2},\ x_{i1}^2 x_{i3},\quad x_{i2}^2 x_{i1},\ x_{i2}^2 x_{i3},\quad x_{i3}^2 x_{i1},\ x_{i3}^2 x_{i2},\quad x_{i1} x_{i2} x_{i3}$$

Quadratic: $x_{i1}^2,\ x_{i2}^2,\ x_{i3}^2,\ x_{i1} x_{i2},\ x_{i1} x_{i3},\ x_{i2} x_{i3}$

Linear: $x_{i1},\ x_{i2},\ x_{i3}$
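The blow-up can be quantified: in $k$ input variables there are $\binom{k+d}{d}$ monomials of degree at most $d$, counting the constant term (a standard combinatorial fact, not stated on the slide). A short check:

```python
from math import comb

def num_poly_features(k, d):
    """Number of monomials of degree <= d in k variables, including the constant term."""
    return comb(k + d, d)

# k = 3 inputs: 1 constant + 3 linear + 6 quadratic + 10 cubic = 20, matching the lists above
print(num_poly_features(3, 3))                            # 20
print([num_poly_features(10, d) for d in range(1, 6)])    # [11, 66, 286, 1001, 3003]
```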

Higher degree polynomials

How complex a polynomial should we try?

Aim for the degree that minimizes SSE.

As the degree increases, the number of features explodes exponentially.



Overfitting

Need to be careful about adding higher degree terms. For $n$ training points, we can always fit a polynomial of degree $(n - 1)$ exactly. However, such a curve would not generalize well to new data points.

Overfitting: the model fits the training data well but performs poorly on unseen data.


Regularization

Need to trade off SSE against curve complexity. So far, the only cost has been SSE.

Add a cost related to the parameters $(\theta_0, \theta_1, \ldots, \theta_k)$. Minimize, for instance,
$$\frac{1}{2} \sum_{i=1}^{n} (z_i - y_i)^2 + \lambda \sum_{j=1}^{k} \theta_j^2$$
where the first term is the SSE and the second term penalizes curve complexity through the coefficients.
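For this squared (L2) penalty, the minimizer also has a closed form, obtained by setting the gradient of the regularized objective to zero. That formula is not on the slides; a minimal NumPy sketch under that standard result might look like:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/2) * ||X theta - y||^2 + lam * sum_{j>=1} theta_j^2.
    Setting the gradient X^T (X theta - y) + 2*lam*D theta to zero gives
    (X^T X + 2*lam*D) theta = X^T y, where D is the identity with D[0, 0] = 0
    so that the intercept theta_0 is not penalized (matching the sum from j = 1)."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + 2 * lam * D, X.T @ y)

# Hypothetical usage on a design matrix whose first column is all ones
t = np.linspace(0, 5, 6)
X = np.column_stack([np.ones(6), t, t ** 2])
y = 1 + 2 * t
print(ridge_fit(X, y, lam=0.0))    # ~ [1, 2, 0]: ordinary least squares
print(ridge_fit(X, y, lam=5.0))    # coefficients shrink as lam grows
```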

Regularization
Variations on regularization: change the contribution of the coefficients to the loss function.

Ridge regression: coefficients contribute $\displaystyle\sum_{j=1}^{k} \theta_j^2$

LASSO regression: coefficients contribute $\displaystyle\sum_{j=1}^{k} |\theta_j|$

Elastic net regression: coefficients contribute $\displaystyle\sum_{j=1}^{k} \lambda_1 |\theta_j| + \lambda_2 \theta_j^2$
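These three variants are available off the shelf; a hedged scikit-learn sketch (the regularization strengths and synthetic data below are arbitrary choices, not from the lecture):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # synthetic inputs, 5 features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

models = {
    "ridge":       Ridge(alpha=1.0),                     # penalty: sum of theta_j^2
    "lasso":       Lasso(alpha=0.1),                     # penalty: sum of |theta_j|
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of the two penalties
}
for name, model in models.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 3))
```

On data like this, LASSO and elastic net tend to drive the irrelevant coefficients to exactly zero, while ridge only shrinks them.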
The non-polynomial case

Percentage of urban population as a function of per capita GDP: not clear what polynomial would be reasonable.

Take the log of GDP. The regression we are computing is
$$y = \theta_0 + \theta_1 \log x_1$$


The non-polynomial case

Reverse the relationship: plot per capita GDP in terms of the percentage of urbanization. Now we take the log of the output variable:
$$\log y = \theta_0 + \theta_1 x_1$$

This is a log-linear transformation; the earlier one was linear-log. We can also use log-log.
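A small NumPy sketch of the two transformed fits, using synthetic numbers that stand in for the GDP/urbanization data (which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
gdp = np.exp(rng.uniform(6, 11, size=50))                     # synthetic per-capita GDP
urban = 10 * np.log(gdp) - 40 + rng.normal(scale=2, size=50)  # roughly linear in log(GDP)

# Linear-log: y = theta_0 + theta_1 * log(x)
theta1, theta0 = np.polyfit(np.log(gdp), urban, deg=1)

# Log-linear (reversed relationship): log(y) = theta_0 + theta_1 * x
phi1, phi0 = np.polyfit(urban, np.log(gdp), deg=1)

print(theta0, theta1)   # roughly -40 and 10 on this synthetic data
print(phi0, phi1)
```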



Regression for classification
Regression line: set a threshold.

Classifier:
Output below threshold : 0 (No)
Output above threshold : 1 (Yes)

The classifier output is a step function.



Smoothen the step

Sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The input $z$ is the output of our regression:
$$\sigma(z) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \cdots + \theta_k x_k)}}$$

Adjust the parameters to fix the horizontal position and steepness of the step.
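A quick numerical sketch (not from the slides) of how the parameters control the step for a single input x: theta_1 sets the steepness, and the transition is centred at x = -theta_0 / theta_1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-5, 5, 11)
# Larger theta_1 gives a sharper step; the midpoint sits at x = -theta_0 / theta_1
print(np.round(sigmoid(1.0 * x + 0.0), 2))    # gentle step centred at x = 0
print(np.round(sigmoid(5.0 * x - 10.0), 2))   # sharp step centred at x = 2
```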



Logistic regression

Compute the coefficients? Solve by gradient descent.

We need derivatives to exist; hence the smooth sigmoid, not the step function.

Check that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ (verified after this slide).

We need a cost function to minimize.
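One way to carry out the check, using the quotient form of the sigmoid:
$$\sigma'(z) = \frac{d}{dz}\,\frac{1}{1 + e^{-z}} = \frac{e^{-z}}{(1 + e^{-z})^2}
= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\,(1 - \sigma(z)),$$
since $1 - \sigma(z) = \dfrac{e^{-z}}{1 + e^{-z}}$.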



MSE for logistic regression and gradient descent
Suppose we take the mean squared error as the loss function:
$$C = \frac{1}{n} \sum_{i=1}^{n} (y_i - \sigma(z_i))^2, \quad \text{where } z_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2}$$

For gradient descent, we compute $\dfrac{\partial C}{\partial \theta_1}$, $\dfrac{\partial C}{\partial \theta_2}$, $\dfrac{\partial C}{\partial \theta_0}$.

Consider two inputs, $x = (x_1, x_2)$. For $j = 1, 2$,
$$\frac{\partial C}{\partial \theta_j}
= -\frac{2}{n} \sum_{i=1}^{n} (y_i - \sigma(z_i)) \cdot \frac{\partial \sigma(z_i)}{\partial \theta_j}
= \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \, \frac{\partial \sigma(z_i)}{\partial z_i} \, \frac{\partial z_i}{\partial \theta_j}
= \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \, \sigma'(z_i) \, x_{ij}$$

$$\frac{\partial C}{\partial \theta_0}
= \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \, \frac{\partial \sigma(z_i)}{\partial z_i} \, \frac{\partial z_i}{\partial \theta_0}
= \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \, \sigma'(z_i)$$

MSE for logistic regression and gradient descent . . .
n n
@C 2X @C 2X
For j = 1, 2, = ( (zi ) yi ) 0 (zi )xji , and = ( (zi ) yi ) 0 (zi )
@✓j n @✓0 n
i=1 i=1

@C @C @C 0 (z
Each term in , , is proportional to i)
@✓1 @✓2 @✓0
Ideally, gradient descent should take large steps when (z) y is large
(z) is flat at both extremes
If (z) is completely wrong,
(z) ⇡ (1 y ), we still have
0 (z) ⇡ 0

Learning is slow even when current


model is far from optimal

Madhavan Mukund Lecture 8: 2 February, 2023 DMML Jan–Apr 2023 19 / 22
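A numerical illustration of this flatness (a sketch, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A confidently wrong prediction: true label y = 1 but z is very negative, so sigma(z) ~ 0
y, z = 1.0, -8.0
grad_factor = (sigmoid(z) - y) * sigmoid(z) * (1 - sigmoid(z))  # per-example factor in dC/dtheta_j
print(grad_factor)   # about -3.3e-04: almost no gradient despite a maximally wrong prediction
```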



Loss function for logistic regression

The goal is to maximize the log likelihood.

Let $h_\theta(x_i) = \sigma(z_i)$. So $P(y_i = 1 \mid x_i; \theta) = h_\theta(x_i)$ and $P(y_i = 0 \mid x_i; \theta) = 1 - h_\theta(x_i)$.

Combine these as $P(y_i \mid x_i; \theta) = h_\theta(x_i)^{y_i} \cdot (1 - h_\theta(x_i))^{1 - y_i}$.

Likelihood:
$$L(\theta) = \prod_{i=1}^{n} h_\theta(x_i)^{y_i} \cdot (1 - h_\theta(x_i))^{1 - y_i}$$

Log-likelihood:
$$\ell(\theta) = \sum_{i=1}^{n} y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i))$$

Equivalently, minimize the cross entropy:
$$-\sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]$$
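A minimal NumPy sketch of this cross-entropy loss; the clipping constant is a hypothetical numerical-stability guard, not part of the lecture:

```python
import numpy as np

def cross_entropy(y, h):
    """-sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ], where h_i = h_theta(x_i)."""
    h = np.clip(h, 1e-12, 1 - 1e-12)    # guard against log(0) for saturated predictions
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1.0, 0.0, 1.0])
h = np.array([0.9, 0.2, 0.6])
print(cross_entropy(y, h))   # small when predictions match the labels, large when confidently wrong
```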

Cross entropy and gradient descent

$$C = -\left[ y \ln(\sigma(z)) + (1 - y) \ln(1 - \sigma(z)) \right]$$

$$\frac{\partial C}{\partial \theta_j}
= \frac{\partial C}{\partial \sigma} \, \frac{\partial \sigma}{\partial \theta_j}
= -\left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \frac{\partial \sigma}{\partial \theta_j}$$

$$= -\left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \frac{\partial \sigma}{\partial z} \, \frac{\partial z}{\partial \theta_j}
= -\left( \frac{y}{\sigma(z)} - \frac{1 - y}{1 - \sigma(z)} \right) \sigma'(z) \, x_j$$

$$= -\frac{y (1 - \sigma(z)) - (1 - y)\, \sigma(z)}{\sigma(z)(1 - \sigma(z))} \, \sigma'(z) \, x_j$$



Cross entropy and gradient descent . . .

$$\frac{\partial C}{\partial \theta_j} = -\frac{y (1 - \sigma(z)) - (1 - y)\, \sigma(z)}{\sigma(z)(1 - \sigma(z))} \, \sigma'(z) \, x_j$$

Recall that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. Therefore,
$$\frac{\partial C}{\partial \theta_j}
= -\left[ y (1 - \sigma(z)) - (1 - y)\, \sigma(z) \right] x_j
= -\left[ y - y\,\sigma(z) - \sigma(z) + y\,\sigma(z) \right] x_j
= (\sigma(z) - y)\, x_j$$

Similarly,
$$\frac{\partial C}{\partial \theta_0} = \sigma(z) - y$$

Thus, as we wanted, the gradient is proportional to $\sigma(z) - y$: the greater the error, the faster the learning.
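Putting the pieces together, a sketch of gradient descent with this update, grad_j = (sigma(z) - y) * x_j, on synthetic data (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient descent on the cross entropy: the gradient component for theta_j
    is sum_i (sigma(z_i) - y_i) * x_ij, averaged here over the n training points."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        errors = sigmoid(X @ theta) - y          # sigma(z_i) - y_i for every training point
        theta -= lr * (X.T @ errors) / len(y)    # dividing by n just rescales the learning rate
    return theta

# Synthetic data with a leading column of ones for theta_0
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
y = (x1 + 0.2 * rng.normal(size=200) > 0).astype(float)
X = np.column_stack([np.ones(200), x1])
print(fit_logistic(X, y))    # theta_1 comes out positive: predict 1 when x1 is large
```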
