
Lecture 6: Linear Regression

Md. Shahriar Hussain


ECE Department, NSU



What is Linear Regression?

• Linear regression is an algorithm that models a linear relationship between an independent variable and a dependent variable, which can then be used to predict the outcome of future, unseen events



Linear Regression Example

A line of best fit (regression line) is a straight line that best approximates the trend in a scatter plot of data points.


Linear Regression Example

Estimated/predicted value: ŷ = h(x). Actual/true value: y, the ground truth.




Data Set Description

(x, y) = one training example
(x⁽ⁱ⁾, y⁽ⁱ⁾) = the i-th training example

x⁽¹⁾ = 2104    y⁽¹⁾ = 460
x⁽²⁾ = 1416    y⁽²⁾ = 232


Hypothesis

Training Set → Learning Algorithm → h (hypothesis)

New/unseen data x (size of house) → h → estimated price ŷ = h(x)
Hypothesis

• How do we represent h ?

hθ(x) = θ0 + θ1x

θ0 and θ1: parameters/weights that will be trained/determined by the ML model (not hyperparameters)
θ0 = intercept/bias/constant
θ1 = slope/coefficient/gradient

This is linear regression with one variable, also called univariate linear regression.
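As a quick illustration, here is a minimal Python sketch of this hypothesis (the names hypothesis, theta0, theta1 and the numeric values are illustrative choices, not from the slides):

```python
# Minimal sketch of the univariate hypothesis h(x) = theta0 + theta1 * x.
def hypothesis(x, theta0, theta1):
    """Return the predicted value y-hat for input x."""
    return theta0 + theta1 * x

# theta0 = 50 and theta1 = 0.2 are arbitrary illustrative values:
# a 2104 ft^2 house is predicted at 50 + 0.2 * 2104 = 470.8 ($ in 1000's).
print(hypothesis(2104, 50, 0.2))  # 470.8
```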


Hypothesis

The goal is to choose θ0 and θ1 properly so that hθ(x) is close to y.

• A cost function lets us figure out how to fit the best straight line to our data


Hypothesis

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis:  hθ(x) = θ0 + θ1x
θ0, θ1: parameters
How do we choose θ0 and θ1?


Cost Function

• We need to choose θ0 and θ1 so that hθ(x⁽ⁱ⁾) is close to y⁽ⁱ⁾ for all m training examples. The function we minimize is called the cost function:

J(θ0, θ1) = (1/2m) · Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Goal:  minimize J(θ0, θ1) over θ0, θ1


Cost Function

Cost Function:  J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Goal:  minimize J(θ0, θ1)

• This cost function is called the squared error cost function
• It minimizes the squared difference between the predicted house price and the actual house price
• The 1/m means we take the average over the training examples
• The 2 in 1/2m makes the math a bit easier and doesn't change the parameters we determine at all (half the smallest value is still the smallest value!)
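A minimal NumPy sketch of this squared error cost function (compute_cost and the array-based data layout are my own naming choices, not from the slides):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)                          # number of training examples
    predictions = theta0 + theta1 * x   # h_theta(x) for every example
    errors = predictions - y            # predicted minus actual (ground truth)
    return np.sum(errors ** 2) / (2 * m)
```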


Cost Function Calculation

• For simplicity, assume θ0 = 0, so the hypothesis becomes hθ(x) = θ1x

Find the value of θ1 for which J(θ1) is minimum


Cost Function Calculation

[Plots: left, the training points (1, 1), (2, 2), (3, 3) with the line hθ(x) = θ1x; right, J(θ1) vs θ1]

For θ1 = 1:
J(θ1) = 1/(2·3) · [0² + 0² + 0²] = 0
Cost Function Calculation

For θ1 = 0.5:
J(θ1) = ?
Cost Function Calculation

For θ1 = 0.5 (with the training points (1, 1), (2, 2), (3, 3)):
J(0.5) = 1/(2·3) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = 1/6 · [0.25 + 1 + 2.25] = 3.5/6 ≈ 0.58


Cost Function Calculation

For θ1 = 0:
J(θ1) = ?
Cost Function Calculation

For θ1 = 0, every prediction is 0:
J(0) = 1/(2·3) · [1² + 2² + 3²] = 14/6 ≈ 2.33
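Reusing the compute_cost sketch from earlier, the worked values above can be checked directly (assuming the toy training points (1, 1), (2, 2), (3, 3)):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(compute_cost(x, y, 0.0, 1.0))  # 0.0    -> theta1 = 1 fits perfectly
print(compute_cost(x, y, 0.0, 0.5))  # ~0.583 -> matches 3.5/6
print(compute_cost(x, y, 0.0, 0.0))  # ~2.333 -> matches 14/6
```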


Cost Function Calculation

• If we compute J(θ1) over a range of values and plot J(θ1) vs θ1, we get a bowl-shaped curve (for the squared error cost it is a quadratic)
• The optimization objective for the learning algorithm is to find the value of θ1 which minimizes J(θ1)
   So here θ1 = 1 is the best value for θ1
   The line which has the least sum of squared errors is the best-fit line


Important Equations

Hypothesis:     hθ(x) = θ0 + θ1x

Parameters:     θ0, θ1

Cost Function:  J(θ0, θ1) = (1/2m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Goal:           minimize J(θ0, θ1)


Cost Function for two parameters

[Plots: left, for fixed θ0 and θ1, hθ(x) as a function of x — Price ($) in 1000's (0–500) vs Size in feet² (0–3000); right, J(θ0, θ1) as a function of the parameters]


Cost Function for two parameters

• Previously we plotted our cost function by plotting θ1 vs J(θ1)
• Now we have two parameters
  – The plot becomes a bit more complicated
  – It generates a 3D surface plot, where the axes are
    • x = θ1
    • z = θ0
    • y = J(θ0, θ1)


Cost Function for two parameters

• We can see that the height (y) of the surface indicates the value of the cost function
• We need to find where y is at a minimum


Cost Function for two parameters

• A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant-z slices, called contours, in a 2-dimensional format
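As a sketch of how such a contour plot of the cost can be produced (assuming the compute_cost function and toy data from earlier; the grid ranges are arbitrary illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

theta0_vals = np.linspace(-2.0, 2.0, 100)
theta1_vals = np.linspace(-1.0, 3.0, 100)

# Evaluate J on the grid; row i corresponds to theta1_vals[i].
J = np.array([[compute_cost(x, y, t0, t1) for t0 in theta0_vals]
              for t1 in theta1_vals])

# Each contour line connects (theta0, theta1) pairs with equal cost.
plt.contour(theta0_vals, theta1_vals, J, levels=20)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```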




Gradient descent

• We want to find min J(θ0, θ1)

• Gradient descent
  – Used all over machine learning for minimization

• Outline:
  • Start with some initial θ0, θ1
  • Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum


Gradient descent

 Start with initial guesses
   Start at θ0 = 0, θ1 = 0 (or any other values)
 Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1)
   Each time you change the parameters, you move in the direction (the negative gradient) that reduces J(θ0, θ1) the most
 Repeat
   Do so until you converge to a local minimum
 Gradient descent has an interesting property
   Where you start can determine which minimum you end up in
   Here we can see one initialization point led to one local minimum
   The other led to a different one


Gradient descent

• One initialization point led to one local minimum; the other led to a different one
Gradient Descent Algorithm
• Gradient descent minimizes the cost (e.g., the MSE) by repeatedly taking a step along the negative gradient of the cost function:

repeat until convergence {
    θⱼ := θⱼ − α · ∂/∂θⱼ J(θ0, θ1)    (for j = 0 and j = 1)
}

Correct (simultaneous update):
  temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect (sequential update):
  temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  θ0 := temp0
  temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)   ← uses the already-updated θ0
  θ1 := temp1
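A minimal sketch of batch gradient descent with the simultaneous update, using the gradients of the squared error cost (gradient_descent, alpha, and num_iters are illustrative names; the derivatives are sketched in the Gradient Descent Calculation section below):

```python
import numpy as np

def gradient_descent(x, y, theta0=0.0, theta1=0.0, alpha=0.1, num_iters=1000):
    m = len(x)
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y   # h(x_i) - y_i for every example
        # Compute BOTH gradients before updating EITHER parameter
        # (this is the simultaneous update shown above).
        grad0 = np.sum(errors) / m           # dJ/dtheta0
        grad1 = np.sum(errors * x) / m       # dJ/dtheta1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0) on this toy data
```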




Learning Rate

• Here, α is the learning rate, a hyperparameter

• It controls how big a step we take on each update
• If α is small, we take tiny steps
• If α is big, we get an aggressive gradient descent


Learning Rate

If α is too small, gradient descent can be slow
  → higher training time

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
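A quick sketch of this effect, reusing the gradient_descent function and toy data from above (the α values are arbitrary illustrative choices):

```python
for alpha in (0.01, 0.1, 0.9):
    t0, t1 = gradient_descent(x, y, alpha=alpha, num_iters=100)
    print(f"alpha={alpha}: theta0={t0:.3f}, theta1={t1:.3f}")

# alpha=0.01: still noticeably off (0, 1) after 100 steps -> slow, higher training time
# alpha=0.1:  close to (0, 1) -> converges
# alpha=0.9:  the parameters blow up -> overshoots the minimum and diverges on this data
```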




Local Minima

• Local minimum: a point where the value of the loss function is minimum within a local region
• Global minimum: the point where the value of the loss function is minimum across the entire domain of the loss function
• (For linear regression with the squared error cost, J is convex, so its only local minimum is the global minimum)


Local Minima

At a local minimum the slope is zero, so the update θ1 := θ1 − α · 0 leaves θ1 unchanged: gradient descent can get stuck there even though a lower global minimum exists elsewhere.


Gradient Descent Calculation
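Substituting hθ(x) = θ0 + θ1x into J(θ0, θ1) and taking partial derivatives gives the gradients used in the update rule (a standard derivation, sketched here):

```latex
% Gradients of the squared error cost for univariate linear regression
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)

\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
```

Each gradient descent step therefore becomes θ0 := θ0 − α·(1/m)Σ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) and θ1 := θ1 − α·(1/m)Σ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)·x⁽ⁱ⁾. Note that the 2 produced by differentiating the square cancels the 1/2 in 1/2m, which is why that factor was included in the cost.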



• Reference:
  – Andrew Ng, Lectures on Machine Learning, Stanford University
