
Lecture 2

Linear Regression

Dr. Le Huu Ton

Hanoi, 09/2016
Outline

Linear Regression

Gradient Descent

Multi Feature Representation

Questions and Answers

Review of Machine Learning

[Diagram] Training data is fed into a system that trains the hypothesis h; given an input X, the trained hypothesis produces the output Y.

For each input x, the output is y = h(x).
Linear Regression

Size (m²)    Price (billion VND)
30           2.5
43           3.4        <- (x(2), y(2))
25           1.8
51           4.5
40           3.2        <- (x(5), y(5))
20           1.6

Table 1: Training data of housing prices in Hanoi

In supervised learning, each training example consists of two elements: the input x (features) and the output y (response).

Notation:
(x, y): one training example
(x(i), y(i)): the i-th training example
Linear Regression

Plotting of the training data:

[Figure: scatter plot, "Price of House in Hanoi" — x-axis: Size (m²), y-axis: Price (billion VND)]

Linear Regression: assume that the output y is a linear function of the input x:

y = h(x) = a·x + b
Linear Regression

Objective:
Learn the function y = h(x) = a·x + b such that it minimizes the error (cost function) over the training data (an optimization problem).

Find the coefficients a, b that minimize the cost function.

Linear Regression

Cost Function:
The error for each training example (data from Table 1):

e^{(1)} = \frac{1}{2}(h(x^{(1)}) - y^{(1)})^2 = \frac{1}{2}(30a + b - 2.5)^2

e^{(2)} = \frac{1}{2}(h(x^{(2)}) - y^{(2)})^2 = \frac{1}{2}(43a + b - 3.4)^2

...

e^{(m)} = \frac{1}{2}(h(x^{(m)}) - y^{(m)})^2 = \frac{1}{2}(x^{(m)}a + b - y^{(m)})^2

The cost function is defined as:

E = \frac{1}{m}(e^{(1)} + e^{(2)} + \dots + e^{(m)}) = \frac{1}{m}\sum_{i=1}^{m} e^{(i)} = \frac{1}{2m}\sum_{i=1}^{m}(h(x^{(i)}) - y^{(i)})^2

E = \frac{1}{2m}\sum_{i=1}^{m}(ax^{(i)} + b - y^{(i)})^2
Gradient Descent

Objective:
Use the gradient to find a minimum of the cost function:

E = \frac{1}{2m}\sum_{i=1}^{m}(ax^{(i)} + b - y^{(i)})^2

Note that E is a function of a and b; we have only two variables, a and b.

Idea:
Choose random values for a and b. The algorithm runs in many steps; at each step, modify a and b such that the cost function is reduced:

a := a - \alpha \frac{\partial}{\partial a} E(a), \qquad b := b - \alpha \frac{\partial}{\partial b} E(b)

Gradient Descent

A demonstration and explanation of the Gradient Descent algorithm can be found at the following website:
http://www.onmyphd.com/?p=gradient.descent

Gradient Descent

Suppose that (x0, y0) is a local minimum of the cost function. What will one iteration of gradient descent do?

1. Leave x0 unchanged
2. Change x0 in a random direction
3. Move x0 toward the global minimum
4. Decrease x0

Gradient Descent

Calculate the derivatives of the cost function, \frac{\partial}{\partial a}E(a) and \frac{\partial}{\partial b}E(b), where:

E = \frac{1}{2m}\sum_{i=1}^{m}(ax^{(i)} + b - y^{(i)})^2

Given the following formulas:

\frac{\partial}{\partial x}(x^2) = 2x, \qquad \frac{\partial}{\partial x}\big(f(x)^2\big) = 2 f(x) \cdot \frac{\partial}{\partial x} f(x)

Gradient Descent

\frac{\partial}{\partial a}E(a) = \frac{1}{m}\sum_{i=1}^{m}\big(ax^{(i)} + b - y^{(i)}\big)\cdot x^{(i)}

\frac{\partial}{\partial b}E(b) = \frac{1}{m}\sum_{i=1}^{m}\big(ax^{(i)} + b - y^{(i)}\big)

Gradient Descent

Exercise:
Starting at a = 0 and b = 0, with α = 0.01: what is the value of the cost function? Calculate the values of a and b after the first iteration (first step). Confirm whether the cost function is reduced or not.

Size (m²)    Price (billion VND)
30           2.5
43           3.4
25           1.8
51           4.5
40           3.2
20           1.6
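The following C++ program is one way to check the exercise (a sketch added here, not part of the original slides): it evaluates the cost at (a, b) = (0, 0), performs one batch update with α = 0.01, and reports whether the cost went down.

```cpp
#include <cstdio>
#include <vector>

int main() {
    // Training data from Table 1: size (m^2) and price (billion VND).
    std::vector<double> x = {30, 43, 25, 51, 40, 20};
    std::vector<double> y = {2.5, 3.4, 1.8, 4.5, 3.2, 1.6};
    const int m = (int)x.size();
    double a = 0.0, b = 0.0;
    const double alpha = 0.01;

    // E(a, b) = (1/2m) * sum_i (a*x_i + b - y_i)^2
    auto cost = [&](double a_, double b_) {
        double e = 0.0;
        for (int i = 0; i < m; i++) {
            double r = a_ * x[i] + b_ - y[i];
            e += r * r;
        }
        return e / (2.0 * m);
    };

    double e0 = cost(a, b);  // at (0,0): (1/12) * sum y_i^2 ~= 4.51

    // Gradients from the previous slide, evaluated at (a, b) = (0, 0).
    double da = 0.0, db = 0.0;
    for (int i = 0; i < m; i++) {
        double r = a * x[i] + b - y[i];
        da += r * x[i];
        db += r;
    }
    da /= m;  // ~= -109.28
    db /= m;  // ~= -2.83

    // One simultaneous update of both coefficients.
    a -= alpha * da;  // a ~= 1.0928
    b -= alpha * db;  // b ~= 0.0283

    double e1 = cost(a, b);
    std::printf("E before: %.4f  E after: %.4f  (%s)\n",
                e0, e1, e1 < e0 ? "reduced" : "NOT reduced");
    // With alpha = 0.01 on unscaled sizes (20-51 m^2) this step
    // overshoots, so the cost grows; see the later slides on the
    // learning rate and on feature rescaling.
    return 0;
}
```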

Gradient Descent

Batch and Stochastic Gradient Descent

Batch Gradient Descent:
Compute the gradient using the whole data set.

Stochastic Gradient Descent:
Compute the gradient using one training example at a time (see the sketch below):
- Randomly reorder the training data
- Use (x(1), y(1)) to calculate the gradient and update a, b
- Use (x(2), y(2)) to update
- ...
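A minimal C++ sketch of one stochastic epoch for the single-feature model (the helper name and signature are illustrative, not from the slides):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// One stochastic epoch for h(x) = a*x + b: shuffle the data, then
// update (a, b) from one training example at a time.
void sgd_epoch(const std::vector<double>& x, const std::vector<double>& y,
               double& a, double& b, double alpha, std::mt19937& rng) {
    // Randomly reorder the training data (same permutation for x and y).
    std::vector<size_t> idx(x.size());
    for (size_t i = 0; i < idx.size(); i++) idx[i] = i;
    std::shuffle(idx.begin(), idx.end(), rng);

    for (size_t k : idx) {
        // Gradient of the single-example error (1/2)(a*x + b - y)^2.
        // The residual is computed once so a and b update consistently.
        double r = a * x[k] + b - y[k];
        a -= alpha * r * x[k];
        b -= alpha * r;
    }
}
```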

Gradient Descent

Mini-Batch Gradient Descent

Compute the gradient for t examples at a time (1 < t < m), as sketched below.
Example: we have 1000 training examples.
Step 1: update the coefficients using examples 1-10
Step 2: update the coefficients using examples 11-20
...
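A sketch of one mini-batch pass with batch size t (again an illustrative helper, assuming the single-feature model):

```cpp
#include <algorithm>
#include <vector>

// One pass over the data in mini-batches of size t: average the gradient
// over each batch of examples, then update (a, b) once per batch.
void minibatch_pass(const std::vector<double>& x, const std::vector<double>& y,
                    double& a, double& b, double alpha, size_t t) {
    const size_t m = x.size();
    for (size_t start = 0; start < m; start += t) {
        size_t end = std::min(start + t, m);
        double da = 0.0, db = 0.0;
        for (size_t i = start; i < end; i++) {
            double r = a * x[i] + b - y[i];
            da += r * x[i];
            db += r;
        }
        size_t n = end - start;  // the last batch may be smaller
        a -= alpha * da / n;
        b -= alpha * db / n;
    }
}
```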

Gradient Descent

A big value of α may lead to an increase of the cost function and a failure to converge.

Gradient Descent works if the cost function decreases at each step.

Gradient Descent

Convergence:
How do we know whether the algorithm has converged?
- The cost function is smaller than a predefined threshold
- After a large enough number of steps
- The cost function decreased by less than a predefined threshold

Gradient Descent

Summarization:
1. Define the cost function
2. Select random values for the coefficients a, b
3. Step by step, modify a and b such that the cost function decreases:

while (not converged) do
    a := a - \alpha \frac{\partial}{\partial a} E(a), \qquad b := b - \alpha \frac{\partial}{\partial b} E(b)

Multiple Input Representation

Example:
Consider the same example, but with more inputs.

Size (m²)   No. of floors   No. of rooms   Price (billion VND)
30          3               6              2.5
43          4               8              3.4
25          2               3              1.8
51          4               9              4.5
40          3               5              3.2
20          1               2              1.6

x^{(i)}: the input of the i-th training example
x_j^{(i)}: the component j of the i-th training example
Multiple Input Representation

Matrix representation:

y = h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

With x_0 = 1, stack the inputs and the coefficients into vectors:

x = [1, x_1, \dots, x_n]^T, \qquad \theta = [\theta_0, \theta_1, \dots, \theta_n]^T

h(x) = [\theta_0\ \ \theta_1\ \ \theta_2\ \cdots\ \theta_n] \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x
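In code, h_\theta(x) = \theta^T x is a single dot product; a minimal sketch, assuming x[0] has already been set to 1:

```cpp
#include <cstddef>
#include <vector>

// h(x) = theta^T x, where x[0] = 1 carries the bias term theta_0.
double hypothesis(const std::vector<double>& theta,
                  const std::vector<double>& x) {
    double h = 0.0;
    for (std::size_t j = 0; j < theta.size(); j++)
        h += theta[j] * x[j];
    return h;
}
```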
Multiple Input Representation

Cost Function:

E(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 = \frac{1}{2m}\sum_{i=1}^{m}\big(\theta^T x^{(i)} - y^{(i)}\big)^2

Multiple Input Representation

Gradient Descent
Start with random values of θ; step by step, modify θ in order to decrease the cost function:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)

\frac{\partial}{\partial \theta_j} E(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\frac{\partial}{\partial \theta_j}\big(\theta^T x^{(i)} - y^{(i)}\big)^2 = \frac{1}{m}\sum_{i=1}^{m}\big(\theta^T x^{(i)} - y^{(i)}\big)\,x_j^{(i)}

Normal Equations

Linear Regression:
Minimize the value of the cost function:

E(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 = \frac{1}{2m}\sum_{i=1}^{m}\big(\theta^T x^{(i)} - y^{(i)}\big)^2

Normal Equations:
Solve the following equation to find the optimal value of θ:

\frac{\partial}{\partial \theta} E(\theta) = 0 \iff \forall j \in \{0, 1, \dots, n\}: \frac{\partial}{\partial \theta_j} E(\theta) = 0
Normal Equations

Solution:
Given a training set of m training examples, each containing n inputs, we build the matrix X of inputs, of size (m, n+1), and the vector of outputs Y:

X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ & \vdots & \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}

The solution of the normal equations is:

\theta = (X^T X)^{-1} X^T Y
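A sketch of the closed-form solution on the multi-feature table; rather than forming (XᵀX)⁻¹ explicitly, it solves the equivalent system (XᵀX)θ = XᵀY by Gaussian elimination (the solver is an illustrative helper, not from the slides):

```cpp
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// Solve the linear system A * t = c by Gaussian elimination with
// partial pivoting.
std::vector<double> solve(std::vector<std::vector<double>> A,
                          std::vector<double> c) {
    const std::size_t n = A.size();
    for (std::size_t k = 0; k < n; k++) {
        std::size_t p = k;  // pick the largest pivot in column k
        for (std::size_t i = k + 1; i < n; i++)
            if (std::abs(A[i][k]) > std::abs(A[p][k])) p = i;
        std::swap(A[k], A[p]);
        std::swap(c[k], c[p]);
        for (std::size_t i = k + 1; i < n; i++) {
            double f = A[i][k] / A[k][k];
            for (std::size_t j = k; j < n; j++) A[i][j] -= f * A[k][j];
            c[i] -= f * c[k];
        }
    }
    std::vector<double> t(n);
    for (std::size_t k = n; k-- > 0;) {  // back substitution
        t[k] = c[k];
        for (std::size_t j = k + 1; j < n; j++) t[k] -= A[k][j] * t[j];
        t[k] /= A[k][k];
    }
    return t;
}

int main() {
    // Multi-feature table: x0 = 1 (bias), size, floors, rooms -> price.
    std::vector<std::vector<double>> X = {
        {1, 30, 3, 6}, {1, 43, 4, 8}, {1, 25, 2, 3},
        {1, 51, 4, 9}, {1, 40, 3, 5}, {1, 20, 1, 2}};
    std::vector<double> Y = {2.5, 3.4, 1.8, 4.5, 3.2, 1.6};
    const std::size_t m = X.size(), p = X[0].size();

    // Form X^T X and X^T Y.
    std::vector<std::vector<double>> XtX(p, std::vector<double>(p, 0.0));
    std::vector<double> XtY(p, 0.0);
    for (std::size_t i = 0; i < m; i++)
        for (std::size_t j = 0; j < p; j++) {
            XtY[j] += X[i][j] * Y[i];
            for (std::size_t k = 0; k < p; k++)
                XtX[j][k] += X[i][j] * X[i][k];
        }

    std::vector<double> theta = solve(XtX, XtY);
    for (std::size_t j = 0; j < p; j++)
        std::printf("theta_%zu = %.4f\n", j, theta[j]);
    return 0;
}
```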
Homework

Write a program (in Matlab/C++) to implement the gradient descent algorithm for the training data above with the different learning methods: batch learning, stochastic, and mini-batch, plus the normal equations. Send your code, and a report on what you observe from the results, to my email before 20 October 2016.

Polynomial Regression

The output is a polynomial function of the input. For example:

h(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_n x^n

Assume x_1 = x, x_2 = x^2, ..., x_n = x^n; then

h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

⇒ Linear Regression
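A small sketch of this substitution: expand a scalar x into the feature vector [1, x, x², ..., xⁿ], after which any of the linear-regression routines sketched above applies unchanged:

```cpp
#include <vector>

// Map a scalar x to the polynomial feature vector [1, x, x^2, ..., x^n],
// turning polynomial regression into linear regression on these features.
std::vector<double> poly_features(double x, int n) {
    std::vector<double> f(n + 1);
    f[0] = 1.0;
    for (int j = 1; j <= n; j++) f[j] = f[j - 1] * x;
    return f;
}
```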
References

http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=02.4-LinearRegressionI-GradientDescent&speed=100

Feature Rescale

Objective: scale all features to the same range, to make computation easier.
Popular ranges: [0, 1], [-0.5, 0.5]

x := x / max(x)
c := mean(x), then x := (x - c) / max(x)
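A sketch of both rescalings for one feature column, following the slide's formulas (many texts divide by max − min instead; here max(x) is used as written):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Scale a feature column to roughly [0, 1]: x := x / max(x).
void scale_max(std::vector<double>& x) {
    double mx = *std::max_element(x.begin(), x.end());
    for (double& v : x) v /= mx;
}

// Mean-centred variant from the slide: x := (x - mean(x)) / max(x),
// putting the feature roughly in [-0.5, 0.5].
void scale_mean(std::vector<double>& x) {
    double mx = *std::max_element(x.begin(), x.end());
    double c = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    for (double& v : x) v = (v - c) / mx;
}
```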

