Linear Regression With Multiple Variables
Reading material: Part 1 of Lecture Notes 1
Slides taken from Andrew Ng's course.
Multiple features (variables)

Size (feet²)   Price ($1000)
2104           460
1416           232
1534           315
852            178
…              …
Multiple features (variables)

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
Hypothesis:
Previously (one variable): $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
For convenience of notation, define $x_0 = 1$. Then

$\boldsymbol{x} = \begin{bmatrix} x_0 \\ \vdots \\ x_n \end{bmatrix}, \qquad \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_n \end{bmatrix}, \qquad h_\theta(x) = \boldsymbol{\theta} \cdot \boldsymbol{x} = \boldsymbol{\theta}^T \boldsymbol{x}$
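A minimal numpy sketch of the vectorized hypothesis. Only the feature values come from the first row of the table above; the parameter values are made-up for illustration.

import numpy as np

# First training example from the table above, with x0 = 1 prepended.
x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

# Made-up parameter vector: intercept theta_0 plus one theta_j per feature.
theta = np.array([80.0, 0.1, 10.0, 5.0, -1.0])

# h_theta(x) = theta^T x: the prediction is a single dot product.
h = theta @ x
print(h)  # predicted price in $1000s under the made-up parameters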
Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update $\theta_j$ for every $j = 0, \ldots, n$)
Linear Regression With Multiple Variables: Gradient Descent in Practice I: Feature Scaling
Feature Scaling
Idea: make sure features are on a similar scale, so that gradient descent converges more quickly.
E.g. $x_1$ = size (0–2000 feet²), $x_2$ = number of bedrooms (1–5).
Mean normalization (standardization)
Replace $x_i$ with
$x_i := \dfrac{x_i - \mu_i}{s_i}$
where $\mu_i$ is the mean of feature $i$ and $s_i$ is its standard deviation (or its range), so that features have approximately zero mean. (Do not apply to $x_0 = 1$.)
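A minimal sketch of mean normalization in numpy. The function name is my own, and it uses the standard deviation for $s_i$; the feature's range is a common alternative.

import numpy as np

def mean_normalize(X):
    # X: (m, n) matrix of raw feature values, one row per example.
    # Scale the raw features only -- prepend the x0 = 1 column afterwards.
    mu = X.mean(axis=0)   # per-feature mean mu_i
    s = X.std(axis=0)     # per-feature scale s_i (std here; range also works)
    return (X - mu) / s, mu, s

# Example with the raw features from the table above.
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]], dtype=float)
X_scaled, mu, s = mean_normalize(X)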
Linear Regression With Multiple Variables: Gradient Descent in Practice II: Learning Rate and Stochastic Gradient Descent
Making sure gradient descent is working correctly:
• $J(\theta)$ should decrease after every iteration. Plot $J(\theta)$ against the number of iterations to check.
• Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than some small value $\varepsilon$ (e.g. $10^{-3}$) in one iteration.
• If $J(\theta)$ is increasing (or oscillating), gradient descent is not working: use a smaller $\alpha$.
• To choose $\alpha$, try a range of values such as …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …

[Figure: $J(\theta)$ plotted against the number of iterations, for a converging run and for diverging/oscillating runs]
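A sketch of the automatic convergence test above, assuming the usual squared-error cost from these notes; the helper names are my own.

import numpy as np

def cost(X, y, theta):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    r = X @ theta - y
    return (r @ r) / (2 * m)

def converged(history, eps=1e-3):
    # Declare convergence once J decreased by less than eps in one iteration.
    return len(history) >= 2 and history[-2] - history[-1] < eps

Append cost(X, y, theta) to a list after every gradient step and stop once converged(...) returns True; plotting that list against the iteration number reproduces the $J(\theta)$-vs-iterations check above.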
Gradient Descent

Previously ($n = 1$):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
} (simultaneously update $\theta_0$, $\theta_1$)

New algorithm ($n \geq 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)
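A minimal numpy implementation of the batch update above; every $\theta_j$ is updated simultaneously in one vectorized step. The defaults for alpha and the iteration count are illustrative only.

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
    # X: (m, n+1) design matrix with a leading column of ones (x0 = 1).
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad  # simultaneous update of every theta_j
    return theta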
Stochastic Gradient Descent

Batch Gradient Descent:
• Initialize parameters randomly
• Repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   (for every $j$)
  }

Stochastic Gradient Descent:
• Initialize parameters randomly
• Repeat {
    for $i = 1$ to $m$ {
      $\theta_j := \theta_j - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   (for every $j$)
    }
  }
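A matching sketch of stochastic gradient descent: one parameter update per training example rather than one per full pass over the data. Hyperparameters are illustrative only.

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10, seed=0):
    # X: (m, n+1) design matrix with a leading column of ones (x0 = 1).
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = 0.01 * rng.standard_normal(n)    # random initialization
    for _ in range(epochs):
        for i in range(m):                   # for i = 1 to m
            err = X[i] @ theta - y[i]        # h_theta(x^(i)) - y^(i)
            theta = theta - alpha * err * X[i]  # update every theta_j at once
    return theta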
Linear Regression With Multiple Variables: Normal Equation
1. Gradient descent: minimize $J(\theta)$ iteratively.
2. Normal equation: method to solve for $\theta$ analytically, in one step.
Examples ($m = 4$):

$x_0$   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1       2104           5                    1                  45                    460
1       1416           3                    2                  40                    232
1       1534           3                    2                  30                    315
1       852            2                    1                  36                    178
$m$ training examples; $n$ features. Let $X$ be the $m \times (n+1)$ design matrix whose rows are the training inputs (with $x_0 = 1$), and let $y$ be the $m$-vector of target values. E.g., for the four examples above, $X$ is the $4 \times 5$ matrix shown and $y = (460, 232, 315, 178)^T$. Then

$\theta = (X^T X)^{-1} X^T y$

where $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
numpy: np.linalg.pinv()
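A sketch of the normal equation on the four examples above, using the pseudoinverse exactly as the note suggests (np.linalg.pinv also copes when $X^T X$ is non-invertible, as happens here with more parameters than examples).

import numpy as np

# Design matrix from the table above, x0 = 1 column included.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# theta = (X^T X)^{-1} X^T y, computed with the pseudoinverse.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y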
$m$ training examples, $n$ features.

Gradient Descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.

Normal Equation:
• No need to choose $\alpha$.
• No need to iterate.
• Need to compute $(X^T X)^{-1}$, which costs roughly $O(n^3)$.
• Slow if $n$ is very large.