
Lecture 4: Linear Neural Network and Linear Regression: Part 2

Md. Shahriar Hussain
ECE Department, NSU
North South University, CSE465 (Source: Andrew NG Lectures)
Linear Regression Single Variable

Important Equations

Hypothesis: h_θ(x) = θ0 + θ1·x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2

Goal: minimize J(θ0, θ1) over θ0, θ1
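These equations can be sketched in a few lines of Python (the toy dataset and function names below are illustrative, not from the slides):

```python
# Hypothesis h_theta(x) = theta0 + theta1 * x and the MSE cost
# J(theta0, theta1) = (1/2m) * sum of squared errors over the training set.
def hypothesis(theta0, theta1, x):
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# Toy data lying exactly on y = 2x, so the cost is 0 at theta0=0, theta1=2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))  # 0.0
```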

Cost Function for two parameters

(For fixed θ0, θ1, the hypothesis h_θ(x) is a function of x; the cost J(θ0, θ1) is a function of the parameters.)

[Figure: housing training data, Price ($) in 1000's on the y-axis (0–500) vs. Size in feet^2 (x) on the x-axis (0–3000).]
Cost Function for two parameters

• Previously we plotted our cost function by plotting
  – θ1 vs J(θ1)
• Now we have two parameters
  – The plot becomes a bit more complicated
  – It generates a 3D surface plot whose axes are
    • X = θ1
    • Z = θ0
    • Y = J(θ0, θ1)

Cost Function for two parameters

• We can see that the height (y) of the graph indicates the value of the cost function
• We need to find where y is at a minimum

Gradient descent

• We want to get min J(θ0, θ1)


• Gradient descent
– Used all over machine learning for minimization

• Outline:
  – Start with some initial θ0, θ1
  – Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Gradient descent

 Start with initial guesses
  Start at (0, 0) (or any other value)
 Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1)
  Each time you change the parameters, you select the gradient step that reduces J(θ0, θ1) the most
 Repeat
  Do so until you converge to a local minimum
 Gradient descent has an interesting property
  Where you start can determine which minimum you end up in
  Here, one initialization point led to one local minimum
  The other led to a different one

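The steps above can be sketched as a plain-Python loop; the learning rate, iteration count, and toy data are illustrative choices, not values from the slides:

```python
# Gradient descent on J(theta0, theta1) = (1/2m) * sum (theta0 + theta1*x - y)^2.
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0                 # start at (0, 0)
    for _ in range(iters):
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m                                   # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m        # dJ/dtheta1
        # Simultaneous update: both gradients are computed before
        # either parameter changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Data generated from y = 2x + 1, so the minimum is at theta0=1, theta1=2.
t0, t1 = gradient_descent([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(round(t0, 3), round(t1, 3))  # 1.0 2.0
```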
Gradient descent

• One initialization point led to one local minimum; the other led to a different one
Gradient Descent Algorithm
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function:

  repeat until convergence: θj := θj − α · ∂J(θ0, θ1)/∂θj   (for j = 0 and j = 1)

• Correct (simultaneous update):
    temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
    temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1
    θ0 := temp0
    θ1 := temp1
• Incorrect (sequential update): θ0 is overwritten before θ1's gradient is computed
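The correct-vs-incorrect distinction is about when the parameters are overwritten. A small sketch (the gradient functions, α, and starting point are made up purely for illustration):

```python
# One gradient-descent step written two ways, with illustrative partial
# derivatives (not derived from a particular cost function here).
def grad0(t0, t1):
    return t0 + t1 - 3.0

def grad1(t0, t1):
    return t0 - t1 + 1.0

alpha = 0.5
t0, t1 = 0.0, 0.0

# Correct: evaluate BOTH gradients at the same point, then update.
new_t0 = t0 - alpha * grad0(t0, t1)
new_t1 = t1 - alpha * grad1(t0, t1)

# Incorrect: theta0 is overwritten first, so theta1's gradient is
# evaluated at a point that is neither the old nor the new iterate.
bad_t0 = t0 - alpha * grad0(t0, t1)
bad_t1 = t1 - alpha * grad1(bad_t0, t1)

print(new_t0, new_t1)  # 1.5 -0.5
print(bad_t0, bad_t1)  # 1.5 -1.25
```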
Learning Rate

• Here, α is the learning rate, a hyperparameter
• It controls how big a step we take
  – If α is small, we take tiny steps
  – If α is big, we have an aggressive gradient descent

Learning Rate

• If α is too small, gradient descent can be slow (higher training time)
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge

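Both failure modes can be seen on the one-dimensional cost J(θ) = θ², whose gradient is 2θ (the α values below are illustrative):

```python
# For J(theta) = theta^2 the update is theta := theta - alpha * 2 * theta,
# i.e. theta is multiplied by (1 - 2*alpha) at every step.
def run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

print(run(0.01))  # too small: still far from the minimum after 20 steps
print(run(0.4))   # reasonable: essentially 0
print(run(1.1))   # too large: |theta| grows every step -- divergence
```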
Local Minima

• Local minimum: the value of the loss function is minimal at that point within a local region
• Global minimum: the value of the loss function is minimal across the entire domain of the loss function

Local Minima

[Figure: loss curve; the gradient ∂J/∂θ is 0 at local minima and at the global minimum.]
Gradient Descent Calculation

• Differentiating the cost function gives the two partial derivatives used in the update rule:

  ∂J(θ0, θ1)/∂θ0 = (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
  ∂J(θ0, θ1)/∂θ1 = (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
Linear Regression Single Variable

Linear Regression Multiple Variable

Linear Regression Multiple Variable

Now: h_θ(x) = θ0·x0 + θ1·x1 + θ2·x2 + … + θd·xd, where x0 = 1

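With multiple variables the hypothesis is a dot product of the parameter vector with the feature vector, where a constant feature x0 = 1 multiplies θ0. A pure-Python sketch (the θ values and features are illustrative):

```python
# h_theta(x) = theta0*x0 + theta1*x1 + ... + thetad*xd, with x0 = 1.
def hypothesis(theta, features):
    x = [1.0] + list(features)  # prepend the bias feature x0 = 1
    return sum(t_j * x_j for t_j, x_j in zip(theta, x))

theta = [50.0, 0.1, 20.0]                 # [theta0, theta1, theta2]
print(hypothesis(theta, [2000.0, 3.0]))   # 50 + 0.1*2000 + 20*3, about 310.0
```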
Gradient Descent for Multi Variables

• With d features, the update rule applies to every parameter (simultaneously, for j = 0, 1, …, d):

  θj := θj − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · xj^(i)
Gradient Descent for Multi Variables Vector Format

• Vector format: h_θ(x) = θᵀx = θ0·x0 + θ1·x1 + … + θd·xd, where x0 = 1

Gradient Descent for Multi Variables Vector Format

• Compute them in matrix format
• The gradient vector:

  ∇_θ J(θ) = (1/m) · Xᵀ(Xθ − y)

  where X is the m × (d+1) matrix of training examples
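The gradient ∇J(θ) = (1/m)·Xᵀ(Xθ − y) can be sketched without a matrix library by writing the two products as loops (the data below is illustrative):

```python
# Gradient of J(theta) = (1/2m) * ||X theta - y||^2, i.e. (1/m) * X^T (X theta - y).
def gradient(X, y, theta):
    m, cols = len(X), len(theta)
    # Residuals r = X theta - y (one per training example).
    r = [sum(X[i][j] * theta[j] for j in range(cols)) - y[i] for i in range(m)]
    # grad[j] = (1/m) * sum_i X[i][j] * r[i] (one entry per parameter).
    return [sum(X[i][j] * r[i] for i in range(m)) / m for j in range(cols)]

# y is generated exactly by theta = [1, 2], so the gradient vanishes there.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # first column is x0 = 1
y = [3.0, 5.0, 7.0]
print(gradient(X, y, [1.0, 2.0]))  # [0.0, 0.0]
print(gradient(X, y, [0.0, 0.0]))  # nonzero: J still decreases from (0, 0)
```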
Gradient Descent for Multi Variables Vector Format

• Suppose we have d features and m training examples, and write xj^(i) for feature j of example i (with x0^(i) = 1). Then:

  X = [ x0^(1)  x1^(1)  x2^(1)  …  xd^(1) ]
      [ x0^(2)  x1^(2)  x2^(2)  …  xd^(2) ]
      [ x0^(3)  x1^(3)  x2^(3)  …  xd^(3) ]
      [   .       .       .          .    ]
      [ x0^(m)  x1^(m)  x2^(m)  …  xd^(m) ]
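Building the design matrix amounts to prepending the constant feature x0 = 1 to every example (the housing-style feature values below are illustrative):

```python
# Turn m raw examples with d features each into the m x (d+1) matrix X.
def design_matrix(samples):
    return [[1.0] + list(row) for row in samples]

samples = [[2104.0, 5.0], [1416.0, 3.0], [1534.0, 3.0]]  # m=3, d=2
X = design_matrix(samples)
print(X[0])  # [1.0, 2104.0, 5.0]
```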
Gradient Descent for Multi Variables Vector Format
• Dimensionality Matching
  – Suppose we have d features and m training examples
  – X is m × (d+1) and θ is (d+1) × 1, so Xθ and y are both m × 1
  – Xᵀ(Xθ − y) is (d+1) × 1, matching the shape of θ

Gradient Descent for Multi Variables Vector Format

• The gradient descent update rule in matrix format:

  θ := θ − α · (1/m) · Xᵀ(Xθ − y)
Batch Gradient Descent

• This formula involves calculations over the full training set X at each gradient descent step
• This is why the algorithm is called Batch Gradient Descent

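Combining the update rule with the matrix-format gradient gives batch gradient descent, which touches every training example on every step. A pure-Python sketch (α, the iteration count, and the data are illustrative):

```python
# Batch gradient descent: theta := theta - alpha * (1/m) * X^T (X theta - y).
def batch_gd(X, y, alpha=0.1, iters=2000):
    m, cols = len(X), len(X[0])
    theta = [0.0] * cols
    for _ in range(iters):
        # Full pass over the training set: residuals, then the gradient.
        r = [sum(X[i][j] * theta[j] for j in range(cols)) - y[i] for i in range(m)]
        grad = [sum(X[i][j] * r[i] for i in range(m)) / m for j in range(cols)]
        # Simultaneous update of all parameters.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # column of ones, then one feature
y = [3.0, 5.0, 7.0]                         # generated by y = 2x + 1
print([round(t, 3) for t in batch_gd(X, y)])  # [1.0, 2.0]
```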
• Reference:
– Andrew Ng Lectures on Machine Learning, Stanford University
