Lecture 2 - Linear Regression - Part 1
Hanaa Bayomi
Updated By: Prof Abeer ElKorany
➢ Model Representation
➢ Cost Function
➢ Gradient Descent
MODEL REPRESENTATION
[Figure: house prices plotted against house size; price is the dependent variable and size (e.g., a 1250 ft² house) is the independent variable]
Supervised Learning - Regression: the "right answers" (labeled data) are given, and the task is to predict a continuous-valued output (the price).
MODEL REPRESENTATION
Example and notation:
(x, y)            one training example (one row of the training set)
(x^(i), y^(i))    the i-th training example
For instance, x^(1) = 2104, y^(2) = 232, and x^(4) = 852.
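As a quick illustration of this indexing, here is a minimal Python sketch (the numbers are the ones from the housing training set used later in this lecture):

# Housing training set: (size in ft^2, price in $1000's), one tuple per row
data = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]
m = len(data)            # m = number of training examples
x1, y1 = data[0]         # (x^(1), y^(1)) = (2104, 460): the 1st training example
y2 = data[1][1]          # y^(2) = 232
x4 = data[3][0]          # x^(4) = 852
print(m, x1, y2, x4)     # 4 2104 232 852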
MODEL REPRESENTATION
The learned hypothesis h maps an input x to a predicted output y:  x → h → y
[Figure: training data points with a fitted line through them]
Linear Equations
y = θ0 + θ1X
θ1 = slope = ΔY / ΔX (the change in Y divided by the change in X)
θ0 = Y-intercept
[Figure: a straight line plotted on X-Y axes, illustrating the slope and the Y-intercept]
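A tiny worked example (the two points are illustrative, not taken from the figure): given two points on a line, the slope and intercept follow directly.

x1, y1 = 20, 30                  # illustrative point 1
x2, y2 = 40, 50                  # illustrative point 2
theta1 = (y2 - y1) / (x2 - x1)   # slope = change in Y / change in X = 1.0
theta0 = y1 - theta1 * x1        # Y-intercept = 10.0
print(theta0, theta1)            # the line is y = 10 + 1*x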
Thinking Challenge
[Figure: scatter of data points on X-Y axes]
Thinking Challenge
[Figure: two candidate lines through the points; slope changed, intercept unchanged]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: two candidate lines through the points; slope unchanged, intercept changed]
Thinking Challenge
[Figure: two candidate lines through the points; slope changed, intercept changed]
Training Set
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: hθ(x) = θ0 + θ1x
θ0, θ1: parameters (also called weights)
How do we choose the θ's?
[Figure: three example lines hθ(x) = θ0 + θ1x plotted for different choices of θ0 and θ1]
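A minimal sketch of the hypothesis as code; the three parameter settings below are illustrative choices like those in the three small plots above.

def h(theta0, theta1, x):
    # Linear hypothesis h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(h(1.5, 0.0, 2))   # flat line: theta1 = 0, so h(x) = 1.5 everywhere
print(h(0.0, 0.5, 2))   # line through the origin with slope 0.5 -> 1.0
print(h(1.0, 0.5, 2))   # intercept 1 and slope 0.5 -> 2.0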
Least Squares
1. "Best fit" means the difference between the actual Y values and the predicted Y values is a minimum. So we square the errors:
   Σ_{i=1}^{m} (Y_i − h(x_i))² = Σ_{i=1}^{m} ε̂_i²
2. Least squares (LS) minimizes the Sum of the Squared Errors (SSE).
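A short sketch of the SSE on the housing training set above; the candidate parameter values are illustrative, not fitted.

X = [2104, 1416, 1534, 852]      # size in ft^2
Y = [460, 232, 315, 178]         # price in $1000's

def sse(theta0, theta1):
    # Sum of squared errors: sum over i of (Y_i - h(x_i))^2
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(X, Y))

print(sse(0.0, 0.20))            # one candidate line
print(sse(0.0, 0.22))            # a slightly different slope gives a different SSE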
Least Squares Graphically
LS minimizes Σ_{i=1}^{n} ε̂_i² = ε̂_1² + ε̂_2² + ε̂_3² + ε̂_4²
For example, Y_2 = θ0 + θ1X_2 + ε̂_2, while the fitted line is hθ(x_i) = θ0 + θ1X_i
[Figure: four data points, the fitted line, and the vertical residuals ε̂_1 … ε̂_4 between each point and the line]
Least Squared Errors Linear Regression
COST FUNCTION
J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1), i.e., make the predictions hθ(x^(i)) on the training set as close as possible to the actual values y^(i).
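A minimal sketch of this cost function in Python, assuming the 1/(2m) scaling shown above and the housing data from the training-set table:

def J(theta0, theta1, X, Y):
    # J(theta0, theta1) = (1/2m) * sum over i of (h_theta(x^(i)) - y^(i))^2
    m = len(X)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * m)

X = [2104, 1416, 1534, 852]
Y = [460, 232, 315, 178]
print(J(0.0, 0.2, X, Y))         # cost of one illustrative parameter choice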
Cost function visualization with One parameter
Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes:
hθ(x) = θ1x
Evaluate the cost at several slopes, e.g., θ1 = 2, θ1 = 1, and θ1 = 0.5, and compute J(2), J(1), and J(0.5); each θ1 gives a different line through the data and a different cost.
[Figures: as the coefficient θ1 changes (θ1 = 2, 1, 0.5), the fitted line hθ(x) = θ1x in the left panel changes and the corresponding point on the cost curve J(θ1) in the right panel moves; J(θ1) is lowest where the line fits the training points best]
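A small sketch that sweeps θ1 for the simplified hypothesis hθ(x) = θ1x and plots J(θ1). The three training points are illustrative (chosen so the minimum is at θ1 = 1), not necessarily the ones in the slide figure.

import numpy as np
import matplotlib.pyplot as plt

X = np.array([1.0, 2.0, 3.0])    # illustrative inputs
Y = np.array([1.0, 2.0, 3.0])    # illustrative targets, so J(1) = 0

def J(theta1):
    m = len(X)
    return np.sum((theta1 * X - Y) ** 2) / (2 * m)

print(J(0.5), J(1.0), J(2.0))    # J(1) = 0 is the smallest of the three
thetas = np.linspace(-0.5, 2.5, 100)
plt.plot(thetas, [J(t) for t in thetas])
plt.xlabel('theta1')
plt.ylabel('J(theta1)')
plt.show()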
The solution: visualize J(θ0, θ1) over both parameters (red means a high cost, blue means a low cost).
[Figure: surface/contour plot of J(θ0, θ1); starting from an initial point and repeatedly moving downhill leads to a local minimum, and a different starting point can lead to a different local minimum]
Gradient descent Algorithm (LMS)
Gradient descent Algorithm
Example with J(θ1):
θ1 := θ1 − α (d/dθ1) J(θ1)
If the slope of J at θ1 is positive (+ slope), the update subtracts a positive quantity, so θ1 decreases and moves toward the minimum.
If the slope is negative (− slope), then θ1 := θ1 − α·(negative number), so θ1 increases and again moves toward the minimum.
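A sketch of this update implemented for the simplified hypothesis hθ(x) = θ1x, using the analytic derivative dJ/dθ1 = (1/m)·Σ(θ1·x^(i) − y^(i))·x^(i); the data, learning rate, and starting point are illustrative.

X = [1.0, 2.0, 3.0]
Y = [1.0, 2.0, 3.0]
m = len(X)
alpha = 0.1                      # learning rate (illustrative)
theta1 = 2.0                     # start to the right of the minimum at 1.0

for _ in range(20):
    grad = sum((theta1 * x - y) * x for x, y in zip(X, Y)) / m   # dJ/dtheta1
    theta1 = theta1 - alpha * grad   # positive slope here, so theta1 decreases
print(theta1)                    # approaches 1.0 for this data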
Gradient descent Algorithm
Repeat until convergence:
θj := θj − α ∂/∂θj J(θ0, θ1)    (for j = 0 and j = 1, updating θ0 and θ1 simultaneously)
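A minimal sketch of this simultaneous update for both parameters on the housing data; the learning rate, iteration count, and the feature scaling step are illustrative choices, not part of the slides.

X = [2104.0, 1416.0, 1534.0, 852.0]   # size in ft^2
Y = [460.0, 232.0, 315.0, 178.0]      # price in $1000's
m = len(X)
Xs = [x / max(X) for x in X]          # scale the feature so a simple alpha works

theta0, theta1, alpha = 0.0, 0.0, 0.5
for _ in range(2000):
    err = [theta0 + theta1 * x - y for x, y in zip(Xs, Y)]
    grad0 = sum(err) / m                              # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(err, Xs)) / m   # dJ/dtheta1
    # Update theta0 and theta1 simultaneously
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
print(theta0, theta1)                 # fitted intercept and slope (for the scaled feature)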
QUESTION
What do you think one step of gradient descent will do?
Change of the learning rate value α
If α is too small, gradient descent can be slow to converge.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
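A small sketch of both regimes (data, starting point, and α values are illustrative): too small an α converges slowly, while too large an α overshoots and diverges.

def run_gd(alpha, steps=25):
    X, Y = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]   # minimum of J is at theta1 = 1.0
    m, theta1 = len(X), 2.0
    for _ in range(steps):
        grad = sum((theta1 * x - y) * x for x, y in zip(X, Y)) / m
        theta1 -= alpha * grad
    return theta1

print(run_gd(alpha=0.01))   # too small: after 25 steps, still noticeably above 1.0
print(run_gd(alpha=0.2))    # reasonable: very close to 1.0
print(run_gd(alpha=0.5))    # too large: each step overshoots, so it diverges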
Example: fitting the line with gradient descent in TensorFlow (TF1-style session API; the synthetic training data below is an assumption, chosen to match the plot limits):

import numpy as np
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf   # TF1-style graph/session API used below
tf.disable_eager_execution()
# Synthetic training data (assumed; chosen so the fit matches the axis limits below)
x_data = np.random.rand(100).astype(np.float32)
y_data = 0.1 * x_data + 0.3
# Model parameters (weight W and bias b) and the linear hypothesis
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Mean squared error loss, minimized with gradient descent
loss = tf.reduce_mean(tf.square(y - y_data))
train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Carry out 16 iterations
for step in range(16):
    sess.run(train)
    # Print updated weight, bias, and loss value after the current training iteration
    print(step, sess.run(W), sess.run(b), sess.run(loss))

# Draw the predicted data (using the weight and bias learned during training)
plt.plot(x_data, y_data, '.', label='training data')
plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='fitted line')
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(-2, 2)
plt.ylim(0.1, 0.6)
plt.legend()
plt.show()
[Figures: the fitted line and the printed W, b, and loss after each training step, Iteration 1 through Iteration 16]