Lecture03 Linear Regression


UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405 - Machine Learning


Lecture 3: Linear Regression
Duc-Trong Le & Viet-Cuong Ta

Hanoi, 09/2023
Outline
● Supervised Learning
● Linear Regression with One Variable
○ Model Representation
○ Cost Functions
○ Gradient Descent
● Linear Regression with Multiple Variables
○ Learning rate
○ Normal Equation

FIT-CS INT3405 - Machine Learning 2


Recap: Random Variables



Supervised Learning
●Supervised (Inductive) Learning
●Formalization
○ Input:

○ Output:

○ Target function: (unknown)

○ Training Data:

○ Hypothesis:

○ Hypothesis space:
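
One common way to fill in the items above is sketched below. All symbols are assumptions rather than the slide's own notation, and writing y = f(x) assumes noise-free labels:

```latex
\begin{align*}
&\text{Input: } x \in \mathcal{X} \\
&\text{Output: } y \in \mathcal{Y} \\
&\text{Target function: } f : \mathcal{X} \to \mathcal{Y} \quad \text{(unknown)} \\
&\text{Training data: } \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}, \quad y^{(i)} = f(x^{(i)}) \\
&\text{Hypothesis: } h : \mathcal{X} \to \mathcal{Y}, \quad h \approx f \\
&\text{Hypothesis space: } \mathcal{H}, \quad h \in \mathcal{H}
\end{align*}
```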
A Learning Problem

[Diagram: Input → Unknown Function → Output]


The Statistical Learning Framework

Hypothesis Spaces
● Linear models: y = ax + b
○ Infinitely many possible hypotheses!
○ Any choice of the coefficients a and b gives a possible hypothesis
● Polynomial models

● Any nonlinear models
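
A minimal sketch of what "a hypothesis space" means in code: each choice of coefficients yields one hypothesis, and the space is the set of all such functions. The helper names `linear_hypothesis` and `polynomial_hypothesis` are hypothetical, chosen only for illustration.

```python
def linear_hypothesis(a, b):
    """Return the hypothesis h(x) = a*x + b for one choice of (a, b)."""
    return lambda x: a * x + b


def polynomial_hypothesis(coeffs):
    """Return h(x) = c0 + c1*x + c2*x**2 + ... for the given coefficients."""
    return lambda x: sum(c * x ** k for k, c in enumerate(coeffs))


# Three members of two different hypothesis spaces (coefficients are arbitrary).
h1 = linear_hypothesis(2.0, 1.0)              # h(x) = 2x + 1
h2 = linear_hypothesis(-0.5, 3.0)             # h(x) = -0.5x + 3
h3 = polynomial_hypothesis([1.0, 0.0, 2.0])   # h(x) = 1 + 2x^2

print(h1(3.0), h2(2.0), h3(2.0))
```

Every linear hypothesis is also a polynomial hypothesis of degree 1, which is one way the spaces nest.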



Two Views of Learning
● Learning is the removal of our remaining uncertainty.
○ If we know that x and y are linearly dependent, then we can use the
training data to infer the linear function
● Learning requires guessing a good, small hypothesis class.
○ We could start with a very small / simple class, and enlarge it until it
contains a hypothesis that fits the data
● But we could be wrong
○ Our prior knowledge might be wrong
○ Our guess of the hypothesis class could be wrong
■ The smaller the hypothesis class, the more likely we are to be wrong

Two Strategies for Machine Learning
●Develop Languages for Expressing Prior Knowledge
○ Rule grammars and stochastic models

●Develop Flexible Hypothesis Spaces


○ Nested collections of hypotheses, rules, linear models, decision trees,
neural networks, etc.

● In either case, the key is to
○ Develop efficient algorithms for finding a hypothesis that best
approximates the target function given the data

Key Issues in Machine Learning
● What are good hypothesis spaces?
○ Which spaces have been useful in practical applications and why?
● What algorithms can work with these spaces?
○ Are there general design principles for machine learning algorithms?
● How can we find the best hypothesis in an efficient way?
○ How to find the optimal solution efficiently (“optimization” question)
● How can we optimize accuracy on future data?
○ Known as the “overfitting” problem (i.e., “generalization” theory)
● How can we have confidence in the results?
○ How much training data is required to find an accurate hypothesis? (“statistical” question)
● Are some learning problems computationally intractable? (“computational” question)
● How can we formulate application problems as machine learning problems? (“engineering” question)

Regression with One Variable (1)
Housing Prices (Portland, OR)

[Scatter plot: Price (in 1000s of dollars) against Size (feet²)]

● Supervised Learning: given the “right answer” for each example in the data.
● Regression Problem: predict real-valued output.


Regression with One Variable (2)

Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
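
As an illustration of the notation, the four rows visible in the table can be held in a plain Python list. The slide's table continues beyond these rows, so `m = 4` here reflects only what is shown; the helper name `example` is hypothetical.

```python
# (size in square feet, price in $1000s): only the four rows shown on the slide.
training_set = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]

m = len(training_set)                 # number of training examples
xs = [x for x, _ in training_set]     # "input" variable / feature
ys = [y for _, y in training_set]     # "output" / "target" variable


def example(i):
    """Return the i-th training example (x, y), counting from 1 as in the notation."""
    return training_set[i - 1]


print(m, example(2))
```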



Model Representation
[Diagram: Training Set → Learning Algorithm → h. The hypothesis h takes x (size of house) and returns an estimate of y (price).]

How do we represent h?

Linear regression with one variable: “Univariate Linear Regression”.

How do we choose the parameters?

Formulation: Cost Function (1)
Hypothesis:

Parameters:

Cost Function: mean squared error (MSE)

Goal:
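
A standard way to write the four items above for univariate linear regression is sketched below. The parameter names w_0 and w_1, the cost symbol J, and the factor 1/2 (a common convenience that does not change the minimizer) are assumptions, not necessarily the slide's notation:

```latex
\begin{align*}
\text{Hypothesis: } & h_w(x) = w_0 + w_1 x \\
\text{Parameters: } & w_0,\ w_1 \\
\text{Cost function: } & J(w_0, w_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 \\
\text{Goal: } & \min_{w_0,\, w_1} J(w_0, w_1)
\end{align*}
```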



Formulation: Cost Function (2)
Simplified
Hypothesis:

Parameters:

Cost Function:

Goal:
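
A small sketch of the simplified setting, assuming "simplified" means the intercept is fixed at zero so the hypothesis is h(x) = w1 * x with a single parameter. The name `w1`, the 1/(2m) scaling, and the toy data are assumptions made for illustration.

```python
def cost(w1, xs, ys):
    """Squared-error cost for the one-parameter hypothesis h(x) = w1 * x."""
    m = len(xs)
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost at a few parameter values; it is smallest where the line fits.
for w1 in (0.0, 0.5, 1.0, 1.5):
    print(w1, cost(w1, xs, ys))
```

Plotting `cost` against `w1` gives a bowl-shaped curve whose lowest point is the best-fitting slope.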



Cost Function: Examples (1) to (3)

[Figures: three slides, each pairing a plot of the hypothesis (for a fixed parameter, a function of x) with a plot of the cost (a function of the parameter).]


Cost Function (1)

Hypothesis:

Parameters:

Cost Function:

Goal:



Cost Function (2)

[Figures: the hypothesis (for fixed parameters, a function of x), plotted as Price ($) in 1000's against Size in feet² (x), next to the cost (a function of the parameters).]


Cost Function (3)
●Contour plots



Cost Function (4) to (5)

[Figures: two more slides pairing the hypothesis (for fixed parameters, a function of x) with the cost (a function of the parameters).]


Gradient Descent for Optimization (1)

Given some objective function of the parameters

Want to minimize it

Outline:
• Start with some initial parameter values
• Keep changing the parameters to reduce the objective
until we hopefully end up at a minimum
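
The outline can be sketched for a single parameter as follows. The function name `gradient_descent`, the default step size, and the toy objective are assumptions for illustration, not the lecture's own code.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)   # move downhill by a small amount
    return w


# Hypothetical objective J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_min)
```

For this convex objective the iterates approach the single minimum at w = 3; on a non-convex objective the same loop may stop at a different local minimum depending on the starting point.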



Gradient Descent for Optimization (2)



Gradient Descent for Optimization (3)



Gradient Descent Algorithm

Gradient descent algorithm

learning rate parameter


(rule of thumb: 0.1)
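
In symbols, one step of the algorithm is usually written as below, assuming parameters w_j, an objective J, and a learning rate α (all three names are assumptions rather than the slide's notation):

```latex
\begin{align*}
&\text{repeat until convergence:} \\
&\qquad w_j := w_j - \alpha \, \frac{\partial}{\partial w_j} J(w_0, w_1)
\qquad \text{for } j = 0 \text{ and } j = 1
\end{align*}
```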



Gradient Descent for Linear Regression (1)
Gradient descent algorithm Linear Regression Model



Gradient Descent for Linear Regression (2)

Gradient descent algorithm

update both parameters simultaneously
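
A minimal sketch of gradient descent for univariate linear regression, assuming the hypothesis h(x) = w0 + w1 * x and a squared-error cost scaled by 1/(2m). The names `w0`, `w1`, `alpha`, the function names, and the toy data are assumptions. The point to notice is that both gradients are computed from the old parameter values before either parameter changes.

```python
def gradient_step(w0, w1, xs, ys, alpha):
    """One simultaneous update of (w0, w1) for h(x) = w0 + w1 * x."""
    m = len(xs)
    errors = [w0 + w1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                              # dJ/dw0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dw1
    # Both gradients above use the old (w0, w1); only now are both replaced.
    return w0 - alpha * grad0, w1 - alpha * grad1


def fit(xs, ys, alpha=0.1, steps=2000):
    """Run a fixed number of gradient descent steps from (0, 0)."""
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        w0, w1 = gradient_step(w0, w1, xs, ys, alpha)
    return w0, w1


# Hypothetical data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit(xs, ys)
print(w0, w1)
```

Updating `w0` first and then using the new `w0` to compute the gradient for `w1` would be a different algorithm, which is why the step returns both new values together.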



Gradient Descent Examples (1) to (9)

[Figures: nine slides, each pairing a plot of the hypothesis (for fixed parameters, a function of x) with a plot of the cost (a function of the parameters).]


Batch Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples.


Multivariate Linear Regression (1)
Multiple features (variables).

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …

Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
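
The indexing convention can be sketched in Python with the four table rows shown on the slide. Here n is taken to be the number of features, and both indices count from 1 as in the usual lecture notation rather than from 0 as Python lists do. The helper names `example` and `feature` are hypothetical.

```python
# Rows: size (square feet), bedrooms, floors, age (years). Only the rows shown.
X = [
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852, 2, 1, 36],
]
y = [460, 232, 315, 178]   # price in $1000s

m = len(X)       # number of training examples shown
n = len(X[0])    # number of features


def example(i):
    """Feature vector of the i-th training example (1-based)."""
    return X[i - 1]


def feature(i, j):
    """Value of feature j in the i-th training example (both 1-based)."""
    return X[i - 1][j - 1]


print(n, example(2), feature(3, 4))
```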
Multivariate Linear Regression (2)
Hypothesis:

Previously:

For convenience of notation, define x_0 = 1.
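
With a constant feature equal to 1 prepended to every example, the multivariate hypothesis becomes a single dot product between the parameter vector and the feature vector. The sketch below assumes that convention; the function name `predict` and the parameter values are hypothetical and were not fitted to any data.

```python
def predict(w, features):
    """Evaluate h(x) = w[0]*1 + w[1]*x_1 + ... + w[n]*x_n."""
    x = [1.0] + list(features)   # prepend the constant feature x_0 = 1
    return sum(wj * xj for wj, xj in zip(w, x))


# Hypothetical parameters for (intercept, size, bedrooms, floors, age).
w = [50.0, 0.5, 10.0, 5.0, -1.0]
price = predict(w, [2104, 5, 1, 45])
print(price)
```

Folding the intercept into the vector means the same code handles any number of features without a special case for the constant term.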



Gradient Descent for Multivariate LR
Hypothesis:

Parameters:

Cost function:

Gradient descent:
Repeat (simultaneously update every parameter)
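
A sketch of the multivariate update, assuming each row of `X` already starts with the constant feature 1 and the cost is the squared error scaled by 1/(2m). The names `w`, `alpha`, `gradient_step`, `fit`, and the toy data are assumptions for illustration.

```python
def gradient_step(w, X, y, alpha):
    """One simultaneous update of every parameter w[j]."""
    m = len(X)
    errors = [sum(wj * xj for wj, xj in zip(w, row)) - yi
              for row, yi in zip(X, y)]
    # All partial derivatives are computed from the old w before any update.
    grads = [sum(e * row[j] for e, row in zip(errors, X)) / m
             for j in range(len(w))]
    return [wj - alpha * g for wj, g in zip(w, grads)]


def fit(X, y, alpha=0.1, steps=5000):
    """Run a fixed number of steps starting from the zero vector."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        w = gradient_step(w, X, y, alpha)
    return w


# Hypothetical data from y = 1 + 2*x_1 + 3*x_2; the first column is the constant 1.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
w = fit(X, y)
print(w)
```

With a single feature column this reduces to the univariate update, so one implementation covers both cases.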



Univariate LR vs Multivariate LR

Gradient Descent
Previously (n = 1):       New algorithm:
Repeat                    Repeat

(simultaneously update all parameters)


Convergence and Learning Rate

Example automatic convergence test:
Declare convergence if the cost decreases by less than a small threshold in one iteration.

[Plot: cost against No. of iterations]

For a sufficiently small learning rate, the cost should decrease on every iteration.
But if the learning rate is too small, gradient descent can be slow to converge.
If the learning rate is too large, the cost may not decrease on every iteration and gradient descent may not converge.
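
The automatic convergence test can be sketched as a stopping rule inside the descent loop. The threshold name `epsilon`, its default value, the function name `minimize`, and the toy objective are all assumptions; the slide's own threshold is not reproduced here.

```python
def minimize(cost, grad, w, alpha=0.1, epsilon=1e-9, max_iters=10_000):
    """Run gradient descent until the cost drops by less than epsilon in one step."""
    history = [cost(w)]
    for _ in range(max_iters):
        w = w - alpha * grad(w)
        history.append(cost(w))
        decrease = history[-2] - history[-1]
        if decrease < epsilon:
            # Also triggers when the cost went up, a hint that alpha is too large.
            break
    return w, history


# Hypothetical objective J(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final, history = minimize(lambda w: (w - 3.0) ** 2,
                            lambda w: 2.0 * (w - 3.0),
                            w=0.0)
print(w_final, len(history) - 1)
```

Keeping `history` makes it easy to plot the cost against the iteration count and check by eye that it decreases on every step.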
Learning Rate

[Figure: learning rate behaviors, with labels: divergence, too small, too large, constant, gradually decreased.]


Normal Equation (1)
Gradient Descent
• Iterative approach
Normal Equation
• Analytical method to solve for the parameters
Intuition example: the 1D case

Solve equation to find w
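
A worked version of the 1D intuition, assuming the cost is a quadratic in a single parameter w with hypothetical coefficients a > 0, b, and c:

```latex
\begin{align*}
J(w) &= a w^2 + b w + c \\
\frac{dJ}{dw} &= 2 a w + b = 0 \\
w &= -\frac{b}{2a}
\end{align*}
```

Setting the derivative to zero gives the minimizer in one step, with no iteration and no learning rate.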



Normal Equation (2)



Normal Equation (3)
●Matrix-vector formulation

●Analytical solution
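
A sketch of the derivation, assuming X is the matrix whose rows are the training examples (each with a leading 1), y is the vector of targets, and w is the parameter vector. These symbol names and the 1/(2m) scaling are assumptions:

```latex
\begin{align*}
J(w) &= \frac{1}{2m} \lVert X w - y \rVert^2 \\
\nabla_w J(w) &= \frac{1}{m} X^\top (X w - y) = 0 \\
X^\top X \, w &= X^\top y \\
w &= (X^\top X)^{-1} X^\top y \qquad \text{(when } X^\top X \text{ is invertible)}
\end{align*}
```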



The Pseudo-inverse



Normal Equation: Example
Examples:

x_0 (= 1)   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1           2104           5                    1                  45                    460
1           1416           3                    2                  40                    232
1           1534           3                    2                  30                    315
1           852            2                    1                  36                    178

(X^T X)^{-1} is the inverse of the matrix X^T X.
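
A pure-Python sketch of solving the normal equations X^T X w = X^T y. With only the four rows shown and five columns, X^T X has rank at most 4 and cannot be inverted, so this sketch keeps just the constant column and the size feature. The helper names and the use of Gaussian elimination instead of an explicit inverse are choices made here, not the lecture's method.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


# Columns: constant 1, size in square feet. Targets: price in $1000s.
X = [[1.0, 2104.0], [1.0, 1416.0], [1.0, 1534.0], [1.0, 852.0]]
y = [460.0, 232.0, 315.0, 178.0]

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
w = solve(XtX, Xty)   # w[0] is the intercept, w[1] the slope per square foot
print(w)
```

Solving the linear system directly avoids forming the inverse, which is usually the more numerically stable route.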
Gradient Descent vs Normal Equation
m training examples, n features.

Gradient Descent                        Normal Equation
• Need to choose a learning rate.       • No need to choose a learning rate.
• Needs many iterations.                • No need to iterate.
• Works well even when n is large.      • Need to compute (X^T X)^{-1}.
                                        • Slow if n is very large.


Summary
● Supervised Learning
● Linear Regression with One Variable
○ Model Representation
○ Cost Functions
○ Gradient Descent
● Linear Regression with Multiple Variables
○ Learning rate
○ Normal Equation

Duc-Trong Le
