Lecture 2-Linear-Regression-Part1

Here are the key steps of one iteration of gradient descent: 1. Compute the derivative of the cost function J with respect to each parameter θi. This gives the slope of J at the current values of the θs. 2. The slope indicates how much changing each θi would help reduce the cost. If the slope is positive, reducing θi would lower J. If the slope is negative, increasing θi would lower J. 3. Take a small step in the direction opposite to the slope. This means subtracting a small amount from θi if the slope is positive, or adding a small amount if the slope is negative. 4. The step size is determined by the learning rate α.


Prepared by: Dr. Hanaa Bayomi
Updated by: Prof. Abeer ElKorany

Lecture 2 : Linear Regression


LINEAR REGRESSION WITH ONE VARIABLE

➢ Model Representation

➢ Cost Function

➢ Gradient Descent
MODEL REPRESENTATION

[Figure: housing prices plotted against house size; price is the dependent variable and size (e.g., 1250 square feet) is the independent variable.]

Supervised Learning Regression: the "right answers" (labeled data) are given; the task is to predict a continuous-valued output (the price).
MODEL REPRESENTATION

Example notation:
(x, y)            one training example (one row)
(x^(i), y^(i))    the i-th training example
x^(1) = 2104,  y^(2) = 232,  x^(4) = 852
MODEL REPRESENTATION

Training set → Learning algorithm → h

The job of a learning algorithm is to output a function, usually denoted by lowercase h, where h stands for hypothesis.

x → h → y

The job of the hypothesis function is to take the value of x and try to output the estimated value of y. So h is a function that maps from x's to y's.
MODEL REPRESENTATION

How do we represent h?

[Figure: scatter plot of training points with a straight line fitted through them.]

hθ(x) = θ0 + θ1x

Linear Equations

[Figure: straight line Y = θ0 + θ1X]
θ1 = slope (change in Y (ΔY) over change in X (ΔX))
θ0 = Y-intercept

Linear regression with one variable: univariate linear regression.
Types of Regression Models

➢ Positive Linear Relationship
➢ Negative Linear Relationship
➢ Relationship NOT Linear
➢ No Relationship

COST FUNCTION

▪ The cost function lets us figure out how to fit the best possible straight line to our data.

How to choose the θi's?

Scatter plot
▪ 1. Plot of all (Xi, Yi) pairs
▪ 2. Suggests how well the model will fit

[Figure: scatter plot of (X, Y) points, both axes running from 0 to 60.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the scatter of points with one candidate line drawn through them.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with a different candidate line; slope changed, intercept unchanged.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with another candidate line; slope unchanged, intercept changed.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with yet another candidate line; slope changed, intercept changed.]
Training Set

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: hθ(x) = θ0 + θ1x
θ's: Parameters (also called weights)
How to choose the θ's? (see the sketch below)

[Figure: three example plots showing different lines for different choices of θ0 and θ1.]
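To make the hypothesis concrete, here is a minimal Python sketch that evaluates hθ(x) = θ0 + θ1x on the sizes from the training set above; the values θ0 = 50 and θ1 = 0.2 are arbitrary illustrative assumptions, not fitted parameters.

# Minimal sketch: evaluate the hypothesis h_theta(x) = theta0 + theta1 * x.
# theta0 = 50 and theta1 = 0.2 are arbitrary illustrative values (assumptions).
sizes = [2104, 1416, 1534, 852]      # x values from the training set (sq. feet)
prices = [460, 232, 315, 178]        # y values (price in $1000's)

theta0, theta1 = 50.0, 0.2

for x, y in zip(sizes, prices):
    prediction = theta0 + theta1 * x
    print(f"size={x}, actual={y}, predicted={prediction:.1f}")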
Least Squares

▪ 1. 'Best fit' means the difference between the actual Y values and the predicted Y values is a minimum. So square the errors!

$$\sum_{i=1}^{m} \left( Y_i - h_\theta(x_i) \right)^2 = \sum_{i=1}^{m} \hat{\varepsilon}_i^{\,2}$$

▪ 2. Least squares minimizes the Sum of the Squared Errors (SSE).
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}_1^{\,2} + \hat{\varepsilon}_2^{\,2} + \hat{\varepsilon}_3^{\,2} + \hat{\varepsilon}_4^{\,2}$

[Figure: four data points with the fitted line $h_\theta(x_i) = \theta_0 + \theta_1 X_i$; the vertical distances $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$ are the residuals, e.g. $Y_2 = \theta_0 + \theta_1 X_2 + \hat{\varepsilon}_2$.]
Least Squared Errors Linear Regression

COST FUNCTION

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $h_\theta(x^{(i)})$ are the predictions on the training set and $y^{(i)}$ are the actual values.

Goal: minimize $J(\theta_0, \theta_1)$.
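As a quick check of this definition, the following minimal NumPy sketch (an illustration, not part of the original slides) computes J(θ0, θ1) on the four training examples from the table above; the parameter values passed in are arbitrary assumptions.

import numpy as np

# Training set from the slides: size (x) and price in $1000's (y)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x      # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# theta0 = 0, theta1 = 0.2 is an arbitrary illustrative choice (assumption)
print(cost(0.0, 0.2))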
Cost function visualization with One parameter

Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes:
hθ(x) = θ1x

Each value of θ1 corresponds to a different hypothesis, since θ1 is the slope of the line. Because the y-intercept θ0 is nulled out, every such hypothesis is a line passing through the origin, as shown in the plots below.

[Figure: example lines through the origin at θ1 = 2, θ1 = 1, and θ1 = 0.5, together with their cost values J(2), J(1), and J(0.5).]
Cost function visualization with One parameter

CHANGE OF COEFFICIENT vs. COST FUNCTION

[Figures: as the coefficient θ1 changes, the hypothesis line hθ(x) = θ1x changes on the left, and the corresponding value of the cost function J(θ1) is plotted as a point on the right, e.g. at θ1 = 2, θ1 = 1, and θ1 = 0.5.]

On plotting points like this further, one gets the following graph for the cost function, which depends on the parameter θ1. Each value of θ1 corresponds to a different hypothesis.
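To reproduce this kind of plot, one can evaluate J(θ1) on a grid of θ1 values with θ0 fixed at 0. The sketch below assumes the tiny toy training set (1,1), (2,2), (3,3), which is not stated on the slides but makes θ1 = 1 the obvious minimizer with J(1) = 0:

import numpy as np
import matplotlib.pyplot as plt

# Toy data (an assumption for illustration): y = x exactly, so theta1 = 1 is optimal
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

theta1_grid = np.linspace(-0.5, 2.5, 61)
J = [np.sum((t1 * x - y) ** 2) / (2 * m) for t1 in theta1_grid]

plt.plot(theta1_grid, J)
plt.xlabel('theta1')
plt.ylabel('J(theta1)')
plt.title('Cost as a function of the slope theta1 (theta0 = 0)')
plt.show()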
Cost function visualization with One parameter

What is the optimal value of θ1 that minimizes J(θ1)?

It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.

How do we find the best value for θ1?

Plotting? Not practical, especially in high dimensions.

The solution:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution: e.g., gradient descent.
Plotting the cost function J(θ0, θ1)

Cost function visualization with θ0, θ1
COST FUNCTION (RECAP)
GRADIENT DESCENT

➢ An iterative solution, not only for linear regression; it's actually used all over the place in machine learning.

➢ Objective: minimize any function (here, the cost function J).

PROBLEM SETUP
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in the park as rapidly as possible.

[Figures: surface plot of J(θ0, θ1) over the (θ0, θ1) plane; red means high cost, blue means low cost. Starting from one point and repeatedly stepping downhill, gradient descent reaches a local minimum; with a different starting point, it can reach a different local minimum.]
Gradient descent Algorithm (LMS)

Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{(simultaneously for } j = 0 \text{ and } j = 1\text{)}$$
J(θ1) EXAMPLE

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$

Positive slope: the derivative is positive, so θ1 := θ1 − α · (positive number) and θ1 decreases, moving toward the minimum.

Negative slope: the derivative is negative, so θ1 := θ1 − α · (negative number) and θ1 increases, again moving toward the minimum.
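A minimal numerical sketch of this sign behavior, using the toy cost J(θ1) = (θ1 − 1)² purely as an assumed stand-in for the real cost curve:

# One gradient descent step on a toy cost J(theta1) = (theta1 - 1)^2,
# whose minimum is at theta1 = 1 (an illustrative assumption).
def dJ(theta1):
    return 2.0 * (theta1 - 1.0)   # derivative of the toy cost

alpha = 0.1

# Start to the right of the minimum: slope is positive, so theta1 decreases
theta1 = 2.0
theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # 1.8  (moved left, toward the minimum)

# Start to the left of the minimum: slope is negative, so theta1 increases
theta1 = 0.0
theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # 0.2  (moved right, toward the minimum)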
Gradient descent Algorithm

[Figures: a sequence of slides stepping through the gradient descent updates on the example, one iteration per slide.]
QUESTION
What do you think one step of gradient descent will do?
Change of Learning rate value

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
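Both failure modes can be seen on the same assumed toy cost J(θ1) = (θ1 − 1)²: with a tiny α progress is very slow, and with α larger than 1 (for this particular cost) the iterates overshoot further on every step and diverge.

def step(theta1, alpha):
    """One gradient descent step on the toy cost J(theta1) = (theta1 - 1)^2."""
    return theta1 - alpha * 2.0 * (theta1 - 1.0)

for alpha in (0.01, 0.5, 1.1):          # too small, reasonable, too large
    theta1 = 3.0                         # common starting point
    for _ in range(10):
        theta1 = step(theta1, alpha)
    print(f"alpha={alpha}: theta1 after 10 steps = {theta1:.4f}")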
Local minimum

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative shrinks. So there is no need to decrease α over time.
GRADIENT DESCENT FOR A LINEAR REGRESSION

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right)^2 = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - Y_i \right)^2$$

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right) \cdot x_i$$
G.D. FOR LINEAR REGRESSION

[Figure: the complete gradient descent update rules for θ0 and θ1, applied repeatedly until convergence.]
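Putting the two partial derivatives together, here is a minimal NumPy sketch (an illustration, not code from the slides) of batch gradient descent for univariate linear regression; the synthetic data follows the same y ≈ 0.1x + 0.3 recipe as the TensorFlow example that comes next, and α = 0.5 matches its learning rate.

import numpy as np

# Synthetic 1-D data, generated the same way as in the TensorFlow example below:
# y is approximately 0.1 * x + 0.3 plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.55, size=100)
y = 0.1 * x + 0.3 + rng.normal(0.0, 0.01, size=100)
m = len(x)

theta0, theta1 = 0.0, 0.0     # initial parameters
alpha = 0.5                   # learning rate

for step in range(16):
    predictions = theta0 + theta1 * x
    errors = predictions - y
    # Partial derivatives from the slide above
    grad0 = errors.sum() / m
    grad1 = (errors * x).sum() / m
    # Simultaneous update of both parameters
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)   # should approach roughly 0.3 and 0.1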
Linear Regression Using TensorFlow

1-D Data Example

Data Preparation

import numpy as np

num_of_points = 100  # generate 100 data points

points = []
for i in range(num_of_points):
    # Each point lies near the line y = 0.1 * x + 0.3, with a little Gaussian noise
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.01)
    points.append([x1, y1])

x_data = [v[0] for v in points]
y_data = [v[1] for v in points]
Draw Data

import matplotlib.pyplot as plt

plt.plot(x_data, y_data, 'ro', label='Original data')
plt.legend()
plt.show()

[Figure: scatter plot of the original data.]
Variables and Nodes Preparation

import tensorflow as tf

# Initialize weight "W" and bias "b" (TensorFlow 1.x API)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))

y = W * x_data + b

# Define loss function as the mean of the squared errors
loss = tf.reduce_mean(tf.square(y - y_data))

# Create optimizer to minimize the loss with gradient descent (learning rate 0.5)
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize TensorFlow variables (always required before running the graph)
init = tf.global_variables_initializer()
Execute TensorFlow Graph

# Start a TensorFlow session and carry out variable initialization
sess = tf.Session()
sess.run(init)

# Carry out 16 iterations
for step in range(16):
    sess.run(train)

    # Draw the original data
    plt.plot(x_data, y_data, 'ro', label='Original data')

    # Draw the predicted data (using the weight and bias learned so far)
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='Fitted line')
    plt.xlabel('x')
    plt.xlim(-2, 2)
    plt.ylim(0.1, 0.6)
    plt.ylabel('y')
    plt.legend()
    plt.show()

    # Print the updated weight, bias, and loss value after the current iteration
    print(step, sess.run(W), sess.run(b), sess.run(loss))
[Figures: the fitted line after each of the 16 training iterations, gradually converging toward the trend y ≈ 0.1x + 0.3 in the original data.]
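The code above uses the TensorFlow 1.x graph/session API (tf.Session, tf.train.GradientDescentOptimizer, tf.global_variables_initializer), which was removed from the default API in TensorFlow 2.x. As a rough sketch only, assuming TensorFlow 2.x is installed, the same fit could be written with eager execution and tf.GradientTape:

import tensorflow as tf  # assumes TensorFlow 2.x

# Reuse x_data and y_data from the data-preparation step
x = tf.constant(x_data, dtype=tf.float32)
y_true = tf.constant(y_data, dtype=tf.float32)

W = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
alpha = 0.5  # learning rate, same as in the slides

for step in range(16):
    with tf.GradientTape() as tape:
        y_pred = W * x + b
        loss = tf.reduce_mean(tf.square(y_pred - y_true))
    dW, db = tape.gradient(loss, [W, b])
    W.assign_sub(alpha * dW)      # gradient descent update for the weight
    b.assign_sub(alpha * db)      # gradient descent update for the bias
    print(step, W.numpy(), b.numpy(), loss.numpy())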
