Lecture 2-Linear-Regression-Part1

Here are the key steps of one iteration of gradient descent: 1. Compute the derivative of the cost function J with respect to each parameter θi. This gives the slope of J at the current values of the θs. 2. The slope indicates how much changing each θi would help reduce the cost. If the slope is positive, reducing θi would lower J. If the slope is negative, increasing θi would lower J. 3. Take a small step in the direction opposite to the slope. This means subtracting a small amount from θi if the slope is positive, or adding a small amount if the slope is negative. 4. The step size is determined by the learning rate α.


Prepared by: Dr. Hanaa Bayomi
Updated by: Prof. Abeer ElKorany

Lecture 2 : Linear Regression


LINEAR REGRESSION WITH ONE VARIABLE

➢ Model Representation

➢ Cost Function

➢ Gradient Descent
MODEL REPRESENTATION

[Figure: housing prices plotted against house size; price is the dependent variable and size (e.g., 1250 square feet) is the independent variable.]

Supervised Learning Regression: the "right answers" (labeled data) are given; the task is to predict a continuous-valued output (the price).
MODEL REPRESENTATION

Example notation:
(x, y)            one training example (one row)
(x^(i), y^(i))    the i-th training example
x^(1) = 2104,  y^(2) = 232,  x^(4) = 852
MODEL REPRESENTATION

Training set → Learning algorithm → h

The job of a learning algorithm is to output a function, usually denoted by lowercase h, where h stands for hypothesis.

x → h → y

The job of the hypothesis function is to take the value of x and try to output the estimated value of y. So h is a function that maps from x's to y's.
MODEL REPRESENTATION

How do we represent h?

[Figure: scatter plot of training points with a straight line fitted through them.]

hθ(x) = θ0 + θ1x

Linear Equations

[Figure: straight line Y = θ0 + θ1X]
θ1 = slope (change in Y (ΔY) over change in X (ΔX))
θ0 = Y-intercept

Linear regression with one variable: univariate linear regression.
Types of Regression Models

➢ Positive Linear Relationship
➢ Negative Linear Relationship
➢ Relationship NOT Linear
➢ No Relationship

COST FUNCTION

▪ The cost function lets us figure out how to fit the best possible straight line to our data.

How to choose the θi's?

Scatter plot
▪ 1. Plot of all (Xi, Yi) pairs
▪ 2. Suggests how well the model will fit

[Figure: scatter plot of (X, Y) points, both axes running from 0 to 60.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the scatter of points with one candidate line drawn through them.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with a different candidate line; slope changed, intercept unchanged.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with another candidate line; slope unchanged, intercept changed.]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

[Figure: the same points with yet another candidate line; slope changed, intercept changed.]
Training Set

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: hθ(x) = θ0 + θ1x
θ's: Parameters (also called weights)
How to choose the θ's? (see the sketch below)

[Figure: three example plots showing different lines for different choices of θ0 and θ1.]
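To make the hypothesis concrete, here is a minimal Python sketch that evaluates hθ(x) = θ0 + θ1x on the sizes from the training set above; the values θ0 = 50 and θ1 = 0.2 are arbitrary illustrative assumptions, not fitted parameters.

# Minimal sketch: evaluate the hypothesis h_theta(x) = theta0 + theta1 * x.
# theta0 = 50 and theta1 = 0.2 are arbitrary illustrative values (assumptions).
sizes = [2104, 1416, 1534, 852]      # x values from the training set (sq. feet)
prices = [460, 232, 315, 178]        # y values (price in $1000's)

theta0, theta1 = 50.0, 0.2

for x, y in zip(sizes, prices):
    prediction = theta0 + theta1 * x
    print(f"size={x}, actual={y}, predicted={prediction:.1f}")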
Least Squares

▪ 1. 'Best fit' means the difference between the actual Y values and the predicted Y values is a minimum. So square the errors!

$$\sum_{i=1}^{m} \left( Y_i - h_\theta(x_i) \right)^2 = \sum_{i=1}^{m} \hat{\varepsilon}_i^{\,2}$$

▪ 2. Least squares minimizes the Sum of the Squared Errors (SSE).
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}_1^{\,2} + \hat{\varepsilon}_2^{\,2} + \hat{\varepsilon}_3^{\,2} + \hat{\varepsilon}_4^{\,2}$

[Figure: four data points with the fitted line $h_\theta(x_i) = \theta_0 + \theta_1 X_i$; the vertical distances $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$ are the residuals, e.g. $Y_2 = \theta_0 + \theta_1 X_2 + \hat{\varepsilon}_2$.]
Least Squared Errors Linear Regression

COST FUNCTION

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $h_\theta(x^{(i)})$ are the predictions on the training set and $y^{(i)}$ are the actual values.

Goal: minimize $J(\theta_0, \theta_1)$.
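As a quick check of this definition, the following minimal NumPy sketch (an illustration, not part of the original slides) computes J(θ0, θ1) on the four training examples from the table above; the parameter values passed in are arbitrary assumptions.

import numpy as np

# Training set from the slides: size (x) and price in $1000's (y)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x      # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# theta0 = 0, theta1 = 0.2 is an arbitrary illustrative choice (assumption)
print(cost(0.0, 0.2))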
Cost function visualization with One parameter

Consider a simple case of the hypothesis by setting θ0 = 0; then h becomes:
hθ(x) = θ1x

Each value of θ1 corresponds to a different hypothesis, since θ1 is the slope of the line. Because the y-intercept θ0 is nulled out, every such hypothesis is a line passing through the origin, as shown in the plots below.

[Figure: example lines through the origin at θ1 = 2, θ1 = 1, and θ1 = 0.5, together with their cost values J(2), J(1), and J(0.5).]
Cost function visualization with One parameter

CHANGE OF COEFFICIENT vs. COST FUNCTION

[Figures: as the coefficient θ1 changes, the hypothesis line hθ(x) = θ1x changes on the left, and the corresponding value of the cost function J(θ1) is plotted as a point on the right, e.g. at θ1 = 2, θ1 = 1, and θ1 = 0.5.]

On plotting points like this further, one gets the following graph for the cost function, which depends on the parameter θ1. Each value of θ1 corresponds to a different hypothesis.
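To reproduce this kind of plot, one can evaluate J(θ1) on a grid of θ1 values with θ0 fixed at 0. The sketch below assumes the tiny toy training set (1,1), (2,2), (3,3), which is not stated on the slides but makes θ1 = 1 the obvious minimizer with J(1) = 0:

import numpy as np
import matplotlib.pyplot as plt

# Toy data (an assumption for illustration): y = x exactly, so theta1 = 1 is optimal
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

theta1_grid = np.linspace(-0.5, 2.5, 61)
J = [np.sum((t1 * x - y) ** 2) / (2 * m) for t1 in theta1_grid]

plt.plot(theta1_grid, J)
plt.xlabel('theta1')
plt.ylabel('J(theta1)')
plt.title('Cost as a function of the slope theta1 (theta0 = 0)')
plt.show()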
Cost function visualization with One parameter

What is the optimal value of θ1 that minimizes J(θ1)?

It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.

How do we find the best value for θ1?

Plotting? Not practical, especially in high dimensions.

The solution:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution: e.g., gradient descent.
Plotting the cost function J(θ0, θ1)

Cost function visualization with θ0, θ1
COST FUNCTION (RECAP)
GRADIENT DESCENT

➢ An iterative solution, not only for linear regression; it's actually used all over the place in machine learning.

➢ Objective: minimize any function (here, the cost function J).

PROBLEM SETUP
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in the park as rapidly as possible.

[Figures: surface plot of J(θ0, θ1) over the (θ0, θ1) plane; red means high cost, blue means low cost. Starting from one point and repeatedly stepping downhill, gradient descent reaches a local minimum; with a different starting point, it can reach a different local minimum.]
Gradient descent Algorithm (LMS)

Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{(simultaneously for } j = 0 \text{ and } j = 1\text{)}$$
J(θ1) EXAMPLE

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$

Positive slope: the derivative is positive, so θ1 := θ1 − α · (positive number) and θ1 decreases, moving toward the minimum.

Negative slope: the derivative is negative, so θ1 := θ1 − α · (negative number) and θ1 increases, again moving toward the minimum.
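A minimal numerical sketch of this sign behavior, using the toy cost J(θ1) = (θ1 − 1)² purely as an assumed stand-in for the real cost curve:

# One gradient descent step on a toy cost J(theta1) = (theta1 - 1)^2,
# whose minimum is at theta1 = 1 (an illustrative assumption).
def dJ(theta1):
    return 2.0 * (theta1 - 1.0)   # derivative of the toy cost

alpha = 0.1

# Start to the right of the minimum: slope is positive, so theta1 decreases
theta1 = 2.0
theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # 1.8  (moved left, toward the minimum)

# Start to the left of the minimum: slope is negative, so theta1 increases
theta1 = 0.0
theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # 0.2  (moved right, toward the minimum)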
Gradient descent Algorithm

[Figures: a sequence of slides stepping through the gradient descent updates on the example, one iteration per slide.]
QUESTION
What do you think one step of gradient descent will do?
Change of Learning rate value

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
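Both failure modes can be seen on the same assumed toy cost J(θ1) = (θ1 − 1)²: with a tiny α progress is very slow, and with α larger than 1 (for this particular cost) the iterates overshoot further on every step and diverge.

def step(theta1, alpha):
    """One gradient descent step on the toy cost J(theta1) = (theta1 - 1)^2."""
    return theta1 - alpha * 2.0 * (theta1 - 1.0)

for alpha in (0.01, 0.5, 1.1):          # too small, reasonable, too large
    theta1 = 3.0                         # common starting point
    for _ in range(10):
        theta1 = step(theta1, alpha)
    print(f"alpha={alpha}: theta1 after 10 steps = {theta1:.4f}")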
Local minimum

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative shrinks. So there is no need to decrease α over time.
GRADIENT DESCENT FOR A LINEAR REGRESSION

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right)^2 = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - Y_i \right)^2$$

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - Y_i \right) \cdot x_i$$
G.D. FOR LINEAR REGRESSION

[Figure: the complete gradient descent update rules for θ0 and θ1, applied repeatedly until convergence.]
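Putting the two partial derivatives together, here is a minimal NumPy sketch (an illustration, not code from the slides) of batch gradient descent for univariate linear regression; the synthetic data follows the same y ≈ 0.1x + 0.3 recipe as the TensorFlow example that comes next, and α = 0.5 matches its learning rate.

import numpy as np

# Synthetic 1-D data, generated the same way as in the TensorFlow example below:
# y is approximately 0.1 * x + 0.3 plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.55, size=100)
y = 0.1 * x + 0.3 + rng.normal(0.0, 0.01, size=100)
m = len(x)

theta0, theta1 = 0.0, 0.0     # initial parameters
alpha = 0.5                   # learning rate

for step in range(16):
    predictions = theta0 + theta1 * x
    errors = predictions - y
    # Partial derivatives from the slide above
    grad0 = errors.sum() / m
    grad1 = (errors * x).sum() / m
    # Simultaneous update of both parameters
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)   # should approach roughly 0.3 and 0.1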
Linear Regression Using TensorFlow

1-D Data Example

Data Preparation

import numpy as np

num_of_points = 100  # generate 100 data points

points = []
for i in range(num_of_points):
    # Each point lies near the line y = 0.1 * x + 0.3, with a little Gaussian noise
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.01)
    points.append([x1, y1])

x_data = [v[0] for v in points]
y_data = [v[1] for v in points]
Draw Data

import matplotlib.pyplot as plt

plt.plot(x_data, y_data, 'ro', label='Original data')
plt.legend()
plt.show()

[Figure: scatter plot of the original data.]
Variables and Nodes Preparation

import tensorflow as tf

# Initialize weight "W" and bias "b" (TensorFlow 1.x API)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))

y = W * x_data + b

# Define loss function as the mean of the squared errors
loss = tf.reduce_mean(tf.square(y - y_data))

# Create optimizer to minimize the loss with gradient descent (learning rate 0.5)
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize TensorFlow variables (always required before running the graph)
init = tf.global_variables_initializer()
Execute TensorFlow Graph

# Start a TensorFlow session and carry out variable initialization
sess = tf.Session()
sess.run(init)

# Carry out 16 iterations
for step in range(16):
    sess.run(train)

    # Draw the original data
    plt.plot(x_data, y_data, 'ro', label='Original data')

    # Draw the predicted data (using the weight and bias learned so far)
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='Fitted line')
    plt.xlabel('x')
    plt.xlim(-2, 2)
    plt.ylim(0.1, 0.6)
    plt.ylabel('y')
    plt.legend()
    plt.show()

    # Print the updated weight, bias, and loss value after the current iteration
    print(step, sess.run(W), sess.run(b), sess.run(loss))
[Figures: the fitted line after each of the 16 training iterations, gradually converging toward the trend y ≈ 0.1x + 0.3 in the original data.]
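The code above uses the TensorFlow 1.x graph/session API (tf.Session, tf.train.GradientDescentOptimizer, tf.global_variables_initializer), which was removed from the default API in TensorFlow 2.x. As a rough sketch only, assuming TensorFlow 2.x is installed, the same fit could be written with eager execution and tf.GradientTape:

import tensorflow as tf  # assumes TensorFlow 2.x

# Reuse x_data and y_data from the data-preparation step
x = tf.constant(x_data, dtype=tf.float32)
y_true = tf.constant(y_data, dtype=tf.float32)

W = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
alpha = 0.5  # learning rate, same as in the slides

for step in range(16):
    with tf.GradientTape() as tape:
        y_pred = W * x + b
        loss = tf.reduce_mean(tf.square(y_pred - y_true))
    dW, db = tape.gradient(loss, [W, b])
    W.assign_sub(alpha * dW)      # gradient descent update for the weight
    b.assign_sub(alpha * db)      # gradient descent update for the bias
    print(step, W.numpy(), b.numpy(), loss.numpy())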
