Lecture 03: Linear Regression
Hanoi, 09/2023
Outline
● Supervised Learning
● Linear Regression with One Variable
○ Model Representation
○ Cost Functions
○ Gradient Descent
● Linear Regression with Multiple Variables
○ Learning rate
○ Normal Equation
Supervised Learning
● Output: the target value $y$ to be predicted
● Training Data: a set of labeled examples $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
● Hypothesis: a function $h$ mapping inputs $x$ to predicted outputs $y$
● Hypothesis space: the set $\mathcal{H}$ of candidate hypotheses the learning algorithm can choose from
A Learning Problem
[Diagram: Input → Unknown Function → Output]
The Statistical Learning Framework
Hypothesis Spaces
● Linear models
[Plot: housing data with Size (feet²) on the x-axis and a linear fit]
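For instance (in set-builder notation assumed here, not shown on the slides), the linear hypothesis space for one input variable can be written as:

```latex
% Each choice of parameters (\theta_0, \theta_1) picks out one hypothesis.
\mathcal{H}_{\mathrm{lin}} = \{\, x \mapsto \theta_0 + \theta_1 x \;:\; \theta_0, \theta_1 \in \mathbb{R} \,\}
```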
Supervised Learning Regression Problem
● Supervised learning: the “right answer” is given for each example in the data.
● Regression problem: predict a real-valued output.
Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
[Diagram: Training set → Learning Algorithm → hypothesis $h$; Size of house ($x$) → $h$ → Estimated price ($y$)]
Linear Regression with One Variable (“Univariate Linear Regression”)
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function (mean squared error, MSE): $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
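A minimal runnable sketch of this cost function in NumPy (the function names are illustrative; the toy numbers are taken to match the housing table used later in the lecture):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """MSE cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2
    for the univariate model h(x) = theta0 + theta1 * x."""
    m = len(y)
    errors = theta0 + theta1 * x - y
    return (errors ** 2).sum() / (2 * m)

# Toy data: x = size of house (feet^2), y = price in $1000's
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(cost(0.0, 0.2, x, y))  # J for one particular choice of parameters
```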
[Plot: training data with Price ($) in 1000’s on the y-axis and Size in feet² ($x$) on the x-axis]
Gradient Descent
Outline:
• Start with some $\theta_0, \theta_1$ (e.g., $\theta_0 = 0$, $\theta_1 = 0$).
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$, until we hopefully end up at a minimum.
Repeat until convergence, updating $\theta_0$ and $\theta_1$ simultaneously:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for $j = 0, 1$)
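A runnable sketch of this loop in NumPy (a minimal implementation of the update rule above; the learning rate and iteration count are illustrative choices, not values from the slides):

```python
import numpy as np

def gradient_descent(x, y, alpha=1e-7, num_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y
        # Compute both partial derivatives first, then update both
        # parameters: this is the simultaneous update.
        grad0 = errors.sum() / m
        grad1 = (errors * x).sum() / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(gradient_descent(x, y))
```

Note that with unscaled features like house sizes in the thousands, $\alpha$ must be very small for the iteration to stay stable; this motivates the learning-rate discussion below.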
Multivariate Linear Regression (1)
Size (feet²) | #Bedrooms | #Floors | Age (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i^{\text{th}}$ training example
$x_j^{(i)}$ = value of feature $j$ in the $i^{\text{th}}$ training example
Multivariate Linear Regression (2)
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$ (defining $x_0 = 1$)
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1, \dots, \theta_n$
Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent:
Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ } (simultaneously update for every $j = 0, \dots, n$)
Gradient Descent
Previously ($n = 1$):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
New algorithm ($n \ge 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
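The same update, vectorized in NumPy (a sketch; it assumes the design matrix `X` already contains a leading column of ones for $x_0$, and `alpha`/`num_iters` are illustrative):

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for h_theta(x) = theta^T x.
    X: (m, n+1) design matrix with x_0 = 1 in the first column.
    y: (m,) vector of targets."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = X @ theta - y       # h_theta(x^(i)) - y^(i) for all i
        gradient = X.T @ errors / m  # one component per theta_j
        theta -= alpha * gradient    # simultaneous update of every theta_j
    return theta
```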
Automatic convergence test: declare convergence if $J(\theta)$ decreases by less than some small threshold (e.g., $10^{-3}$) in one iteration.
[Plot: $J(\theta)$ vs. no. of iterations]
For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
But if $\alpha$ is too small, gradient descent can be slow to converge.
If $\alpha$ is too large, $J(\theta)$ may not decrease on every iteration and may not converge.
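A small experiment illustrating these three regimes (the data and the $\alpha$ values below are made up purely for the demonstration):

```python
import numpy as np

def cost(X, y, theta):
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])  # x_0 = 1
y = np.array([1.0, 2.0, 2.5, 3.5])

for alpha in (0.01, 0.3, 2.5):  # too small, reasonable, too large
    theta = np.zeros(2)
    for _ in range(50):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
    print(f"alpha={alpha}: J = {cost(X, y, theta):.3g}")
```

After 50 iterations the smallest $\alpha$ has barely reduced $J$, the middle one has essentially converged, and the largest has diverged.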
Learning Rate
[Plots: behavior of $J(\theta)$ when $\alpha$ is too small, constant, too large (divergence), or gradually decreased]
Normal Equation
● Analytical solution: $\theta = (X^T X)^{-1} X^T y$
Design matrix $X$ (first column $x_0 = 1$) and targets $y$ from the housing data:
$x_0$ | Size (feet²) | #Bedrooms | #Floors | Age (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178
$(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
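A sketch in NumPy using the table above. Since there are only $m = 4$ examples for 5 parameters, $X^T X$ is singular here, so the sketch substitutes the pseudoinverse (`np.linalg.pinv`) for a true inverse:

```python
import numpy as np

# Design matrix: x_0 = 1, then size, #bedrooms, #floors, age
X = np.array([
    [1.0, 2104.0, 5.0, 1.0, 45.0],
    [1.0, 1416.0, 3.0, 2.0, 40.0],
    [1.0, 1534.0, 3.0, 2.0, 30.0],
    [1.0,  852.0, 2.0, 1.0, 36.0],
])
y = np.array([460.0, 232.0, 315.0, 178.0])

# theta = (X^T X)^+ X^T y  (pseudoinverse form of the normal equation)
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)      # fitted parameters
print(X @ theta)  # predictions for the four training examples
```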
Gradient Descent vs Normal Equation
($m$ training examples, $n$ features)
Gradient Descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.
Normal Equation:
• No need to choose $\alpha$.
• No need to iterate.
• Need to compute $(X^T X)^{-1}$, which is roughly $O(n^3)$.
• Slow if $n$ is very large.
Duc-Trong Le