Lecture: Linear Regression
[Figure: overview of machine-learning tasks — predicting continuous outputs is regression; dimensionality reduction is shown as a separate branch]
Linear Regression
• Model representation
• Cost function
• Gradient descent
Learning Algorithm
x (size of house) → h (hypothesis) → y (estimated price)
House pricing prediction
[Plot: training data as points — price ($1000s, ticks 100–400) on the y-axis vs. size in feet² on the x-axis]
• Notation:
• m = number of training examples
• x = input variable / features
• y = output variable / target variable
• (x, y) = one training example
• (x^(i), y^(i)) = i-th training example
Model representation
Training set → learning algorithm → hypothesis h
x (size of house) → h → y (estimated price)
Shorthand: h(x), i.e., h_θ(x) = θ₀ + θ₁x
[Plot: training data and fitted line — price ($1000s) vs. size in feet² (ticks 500–2500)]
Linear Regression
• Model representation
• Cost function
• Gradient descent
• Hypothesis: h_θ(x) = θ₀ + θ₁x
• θ₀, θ₁: parameters/weights
[Three small plots of h_θ(x) over x from 0 to 3, for different choices of θ₀ and θ₁]
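A minimal sketch of this hypothesis in Python (the function and parameter values are illustrative, not from the lecture):

```python
# Minimal sketch of the univariate hypothesis h_theta(x) = theta0 + theta1 * x.
def predict(theta0, theta1, x):
    """Estimated price for a house of the given size."""
    return theta0 + theta1 * x

# Example: with theta0 = 50 and theta1 = 0.1, a 1500 ft^2 house
# is estimated at 50 + 0.1 * 1500 = 200 (i.e., $200k).
print(predict(50, 0.1, 1500))  # 200.0
```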
Cost function
• Idea: choose θ₀, θ₁ so that h_θ(x) = θ₀ + θ₁x is close to y for our training examples (x^(i), y^(i))
• Squared-error cost:
J(θ₀, θ₁) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
[Plot: price ($1000s, ticks 200–400) vs. size, showing the training points and a candidate line h_θ(x)]
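A sketch of computing J on toy data, assuming NumPy; the arrays and values below are made up for demonstration:

```python
import numpy as np

# Sketch of the squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2).
def cost(theta0, theta1, x, y):
    m = len(y)                               # number of training examples
    errors = (theta0 + theta1 * x) - y       # h_theta(x^(i)) - y^(i) for all i
    return (1.0 / (2 * m)) * np.sum(errors ** 2)

x = np.array([1000.0, 1500.0, 2000.0])       # sizes in feet^2 (toy data)
y = np.array([200.0, 250.0, 300.0])          # prices in $1000s (toy data)
print(cost(0.0, 0.2, x, y))                  # cost of the line h(x) = 0.2 * x
```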
Cost function
• Parameters: θ₀, θ₁
• Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
Simplified version for intuition (fix θ₀ = 0): single parameter θ₁, goal: minimize J(θ₁)
[Paired plots, repeated for several values of θ₁: left, h_θ(x) as a function of x (x from 0 to 3); right, J(θ₁) as a function of θ₁ (θ₁ from 0 to 3). Each candidate slope θ₁ gives one line on the left and one point on the cost curve on the right; J(θ₁) is smallest for the slope that best fits the data.]
• Hypothesis: h_θ(x) = θ₀ + θ₁x
• Parameters: θ₀, θ₁
• Cost function: J(θ₀, θ₁) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
• Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
Cost function
How do we find good θ₀, θ₁ that minimize J(θ₀, θ₁)?
Linear Regression
• Model representation
• Cost function
• Gradient descent
Outline:
• Start with some θ₀, θ₁
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁),
until we hopefully end up at a minimum
Gradient descent
Repeat until convergence {
θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)  (for j = 0 and j = 1)
}
Update θ₀ and θ₁ simultaneously.
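A sketch of the simultaneous update in Python (the helper `step` and the toy gradients are hypothetical, not lecture code) — the key point is that both partial derivatives are evaluated at the old parameters before either is overwritten:

```python
# One simultaneous gradient-descent update. grad0 and grad1 stand in for
# dJ/dtheta0 and dJ/dtheta1, each a function of (theta0, theta1).
def step(theta0, theta1, alpha, grad0, grad1):
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1  # assign both only after both are computed

# Toy usage on J(t0, t1) = t0^2 + t1^2, whose gradients are 2*t0 and 2*t1:
t0, t1 = step(1.0, 2.0, 0.1, lambda a, b: 2 * a, lambda a, b: 2 * b)
print(t0, t1)  # 0.8 1.6
```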
[Plot: J(θ₁) over θ₁ from 0 to 3. To the left of the minimum, ∂/∂θ₁ J(θ₁) < 0, so the update θ₁ := θ₁ − α · ∂/∂θ₁ J(θ₁) increases θ₁; to the right, ∂/∂θ₁ J(θ₁) > 0, so the update decreases θ₁. Either way, θ₁ moves toward the minimum.]
α: the learning rate, which controls the size of each update step
Gradient descent for linear regression
Repeat until convergence {
θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)  (for j = 0 and j = 1)
}
• ∂/∂θ₀ J(θ₀, θ₁) = (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))
• ∂/∂θ₁ J(θ₀, θ₁) = (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
Gradient descent for linear regression
Repeat until convergence {
θ₀ := θ₀ − α · (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))
θ₁ := θ₁ − α · (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
(updating θ₀ and θ₁ simultaneously)
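Putting the pieces together, a sketch of the full loop under assumed settings (the learning rate, tolerance, iteration cap, and toy data are illustrative choices, not from the slides):

```python
import numpy as np

# Repeat the theta0/theta1 updates with the linear-regression derivatives
# until J stops decreasing by more than a small tolerance.
def gradient_descent(x, y, alpha=0.01, tol=1e-9, max_iters=100_000):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    prev_cost = float("inf")
    for _ in range(max_iters):
        errors = (theta0 + theta1 * x) - y               # h_theta(x^(i)) - y^(i)
        # Both updates use the same `errors`, so this is a simultaneous update.
        theta0 -= alpha * (1.0 / m) * np.sum(errors)         # d/dtheta0 term
        theta1 -= alpha * (1.0 / m) * np.sum(errors * x)     # d/dtheta1 term
        cost = (1.0 / (2 * m)) * np.sum(((theta0 + theta1 * x) - y) ** 2)
        if prev_cost - cost < tol:                       # simple convergence test
            break
        prev_cost = cost
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))  # approaches (0.0, 1.0)
```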
Linear Regression
• Model representation
• Cost function
• Gradient descent
h_θ(x) = θ₀ + θ₁x
Multiple features (input variables)

Size in feet² (x₁) | Number of bedrooms (x₂) | Number of floors (x₃) | Age of home in years (x₄) | Price in $1000s (y)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …
Notation:
n = number of features
x^(i) = input features of the i-th training example
xⱼ^(i) = value of feature j in the i-th training example
Hypothesis
Previously: h_θ(x) = θ₀ + θ₁x
Now: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
• For convenience of notation, define x₀ = 1 (i.e., x₀^(i) = 1 for all examples)
• Then h_θ(x) = θᵀx, with θ = [θ₀, θ₁, …, θₙ]ᵀ and x = [x₀, x₁, …, xₙ]ᵀ
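A sketch of the vectorized hypothesis, using the first table row as the example house; the θ values are made up for illustration:

```python
import numpy as np

# Vectorized hypothesis h_theta(x) = theta^T x, with x0 = 1 prepended.
theta = np.array([80.0, 0.1, 10.0, 5.0, -1.0])  # [theta0, ..., theta4], toy values
x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])     # x0 = 1, then size, beds, floors, age
print(theta @ x)                                 # theta^T x = 300.4 ($1000s)
```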
Gradient descent
• Previously (n = 1):
Repeat {
θ₀ := θ₀ − α · (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))
θ₁ := θ₁ − α · (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
• New algorithm (n ≥ 1):
Repeat {
θⱼ := θⱼ − α · (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · xⱼ^(i)
}
Simultaneously update θⱼ for j = 0, …, n
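In matrix form, the new algorithm updates every θⱼ at once; a sketch assuming a design matrix X whose first column is all ones (x₀ = 1):

```python
import numpy as np

# One multivariate update: the per-component rule
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
# for all j simultaneously is theta := theta - (alpha/m) * X^T (X theta - y).
def gd_step(theta, X, y, alpha):
    m = len(y)
    return theta - (alpha / m) * (X.T @ (X @ theta - y))

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # toy design matrix, x0 = 1
y = np.array([1.0, 2.0, 3.0])
theta = np.zeros(2)
for _ in range(5000):
    theta = gd_step(theta, X, y, alpha=0.1)
print(theta)  # approaches [0.0, 1.0]
```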
Gradient descent in practice: Feature scaling
• Idea: make sure features are on a similar scale (e.g., roughly −1 ≤ xⱼ ≤ 1)
• E.g. x₁ = size (0–2000 feet²), x₂ = number of bedrooms (1–5)
[Contour plots of J over (θ₁, θ₂): without scaling the contours are long and narrow and gradient descent zig-zags; after scaling they are closer to circular and descent heads more directly to the minimum]
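A sketch of one common scaling choice, standardizing each column of the table above to mean 0 and standard deviation 1 (dividing by the range instead would also fit the slide's idea):

```python
import numpy as np

# Mean normalization: replace each feature by (x - mean) / std so that all
# columns end up on a similar scale before running gradient descent.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])        # size in feet^2, number of bedrooms
mu = X.mean(axis=0)                  # per-feature mean
sigma = X.std(axis=0)                # per-feature standard deviation
X_scaled = (X - mu) / sigma
print(X_scaled)                      # each column now has mean 0, std 1
```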
Gradient descent in practice: Learning rate
• Automatic convergence test: declare convergence when J(θ) decreases by less than some small threshold in one iteration
• α too small: slow convergence
• α too large: J(θ) may not decrease on every iteration; may not converge
• To choose α, try a range of values and watch how J(θ) behaves
Polynomial regression
[Plot: price ($1000s, ticks 100–400) vs. size in feet² (ticks 500–2500), with a curved fit through the data]
Define features as powers of the size:
• x₁ = (size)
• x₂ = (size)²
• x₃ = (size)³
so that h_θ(x) = θ₀ + θ₁(size) + θ₂(size)² + θ₃(size)³
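A sketch of building these polynomial features (sizes are toy values); feature scaling matters especially here, since the ranges of size, size², and size³ differ enormously:

```python
import numpy as np

# Polynomial regression as plain linear regression over constructed features
# x1 = size, x2 = size^2, x3 = size^3, scaled and with x0 = 1 prepended.
size = np.array([1000.0, 1500.0, 2000.0, 2500.0])   # toy sizes in feet^2
X = np.column_stack([size, size**2, size**3])        # x1, x2, x3
X = (X - X.mean(axis=0)) / X.std(axis=0)             # feature scaling
X = np.column_stack([np.ones(len(size)), X])         # prepend x0 = 1
# X can now be fed to the same gradient-descent update used above.
print(X.round(3))
```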