Linear Regression With Multiple Variables: Reading Material: Part 1 of Lecture Notes 1

Linear regression can be performed with multiple variables or features. Gradient descent can be used to minimize the cost function and update the parameters simultaneously for all features. Feature scaling ensures features are on a similar scale. The learning rate and convergence criteria must be chosen carefully for gradient descent. Alternatively, the normal equation can directly compute the optimal parameters without iterations, but requires inverting a matrix.



Linear Regression with multiple variables

Reading material: Part 1 of Lecture Notes 1
Slides taken from the Andrew Ng course
Multiple features (variables).

| Size (feet²) | Price ($1000) |
|--------------|---------------|
| 2104         | 460           |
| 1416         | 232           |
| 1534         | 315           |
| 852          | 178           |
| …            | …             |
Multiple features (variables).

| Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000) |
|--------------|--------------------|------------------|---------------------|---------------|
| 2104         | 5                  | 1                | 45                  | 460           |
| 1416         | 3                  | 2                | 40                  | 232           |
| 1534         | 3                  | 2                | 30                  | 315           |
| 852          | 2                  | 1                | 36                  | 178           |
| …            | …                  | …                | …                   | …             |
Notation:
- $m$ = number of training examples
- $n$ = number of features
- $x^{(i)}$ = input (features) of the $i$-th training example
- $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
Hypothesis:
Previously (one feature): $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
For convenience of notation, define $x_0 = 1$.

$$\mathbf{x} = \begin{bmatrix} x_0 \\ \vdots \\ x_n \end{bmatrix}, \qquad \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_n \end{bmatrix}$$

$$h_\theta(x) = \boldsymbol{\theta} \cdot \mathbf{x} = \boldsymbol{\theta}^T \mathbf{x}$$

This is multivariate linear regression.
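As a quick illustration, here is a minimal numpy sketch of the vectorized hypothesis (the parameter values are made up for the example, not taken from the slides):

```python
import numpy as np

# One input x = [x_0, ..., x_n] with x_0 = 1, and parameters
# theta = [theta_0, ..., theta_n].
theta = np.array([80.0, 0.1, 25.0])   # made-up values: intercept, $/ft^2, $/bedroom (in $1000s)
x = np.array([1.0, 2104.0, 5.0])      # x_0 = 1, size in feet^2, number of bedrooms

h = theta @ x                         # h_theta(x) = theta^T x = 415.4
```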


Linear Regression with multiple variables
Gradient descent for multiple variables
Hypothesis: $h_\theta(x) = \boldsymbol{\theta}^T \mathbf{x} = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$
Cost function:
$$J(\theta_0, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta_0, \ldots, \theta_n)$
} (simultaneously update for every $j = 0, \ldots, n$)

Gradient descent for linear regression

Previously (n = 1):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
} (simultaneously update $\theta_0, \theta_1$)

New algorithm (n ≥ 1):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)
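A minimal numpy sketch of this update rule, assuming the design matrix X already contains the $x_0 = 1$ column (the function name and default hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: (m, n+1) design matrix whose first column is the x_0 = 1 feature.
    y: (m,) vector of targets.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y                 # h_theta(x^(i)) - y^(i), for all i at once
        theta -= alpha * (X.T @ error) / m    # simultaneous update of every theta_j
    return theta
```

The vectorized form `X.T @ error / m` computes $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$ for every $j$ in one matrix product, which is exactly the simultaneous update the slide prescribes.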
Linear Regression with multiple variables
Gradient descent in practice I: Feature scaling
Feature Scaling
Idea: make sure features are on a similar scale, e.g. by dividing each feature by its range:
- $x_1$ = size (0–2000 feet²)  →  $x_1 = \dfrac{\text{size (feet}^2)}{2000}$
- $x_2$ = number of bedrooms (1–5)  →  $x_2 = \dfrac{\text{number of bedrooms}}{5}$

With features on similar scales, the contours of $J(\theta)$ are closer to circles and gradient descent converges faster.
Mean normalization (standardization)
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$). More generally, scale by the spread as well:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the mean of feature $i$ over the training set and $s_i$ is its range (max minus min) or standard deviation.
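A short sketch of mean normalization in numpy, here using the standard deviation for $s_i$ (the range would work equally well; the function name is mine):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature to approximately zero mean and unit spread.

    X: (m, n) raw feature matrix WITHOUT the x_0 = 1 column,
    since scaling must not be applied to x_0.
    """
    mu = X.mean(axis=0)   # mu_i: mean of feature i over the training set
    s = X.std(axis=0)     # s_i: standard deviation (the range also works)
    return (X - mu) / s, mu, s

# Reuse the same mu and s to scale any new input before making predictions.
```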
Linear Regression with multiple variables
Gradient descent in practice II: Learning rate and stochastic gradient descent
Gradient descent
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate $\alpha$.
Making sure gradient descent is working correctly
$J(\theta)$ should decrease after every iteration. Plot $J(\theta)$ against the number of iterations; the curve should fall steeply at first and then flatten out as it converges.

[Figure: $J(\theta)$ vs. no. of iterations (0–400), a decreasing curve that levels off]

Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than some small threshold $\varepsilon$ (e.g. $10^{-3}$) in one iteration.
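The convergence test folds naturally into the descent loop; here is a sketch (the function name and default eps are illustrative):

```python
import numpy as np

def gradient_descent_with_test(X, y, alpha=0.01, max_iters=10000, eps=1e-3):
    """Gradient descent that stops once J(theta) decreases by less than eps."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    prev_cost = np.inf
    for _ in range(max_iters):
        error = X @ theta - y
        cost = (error @ error) / (2 * m)   # J(theta) at the current theta
        if prev_cost - cost < eps:         # decrease smaller than the threshold;
            break                          # also fires if J went up (alpha too large)
        prev_cost = cost
        theta -= alpha * (X.T @ error) / m
    return theta
```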
Making sure gradient descent is working correctly
If the plot of $J(\theta)$ vs. no. of iterations is increasing, or repeatedly goes down and back up, gradient descent is not working: use a smaller $\alpha$.

[Figures: $J(\theta)$ vs. no. of iterations, one curve rising, one oscillating]

- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.
Summary:
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try a range of values spaced roughly 3× apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
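One way to run that sweep, a sketch that assumes some training data X, y and the gradient_descent helper from the earlier sketch:

```python
# Try learning rates roughly 3x apart and keep the one with the lowest final cost.
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

def final_cost(X, y, alpha):
    """Run a fixed budget of iterations and report the final J(theta)."""
    theta = gradient_descent(X, y, alpha=alpha, num_iters=500)
    error = X @ theta - y
    return (error @ error) / (2 * X.shape[0])

best_alpha = min(alphas, key=lambda a: final_cost(X, y, a))
```

In practice you would also plot $J(\theta)$ per iteration for each $\alpha$, as the slides suggest, rather than rely only on the final value.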
Stochastic Gradient Descent

Batch gradient descent:
• Initialize parameters randomly
• Repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$  for every $j$
  }

Stochastic gradient descent:
• Initialize parameters randomly
• Repeat {
    for $i = 1$ to $m$ {
      $\theta_j := \theta_j - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$  for every $j$
    }
  }

Batch gradient descent uses all $m$ examples for each update; stochastic gradient descent updates after every single example, which is much cheaper per step when $m$ is large.
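A minimal numpy sketch of stochastic gradient descent as described above; visiting the examples in a random order each pass is a common refinement not shown on the slide:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10, seed=0):
    """SGD for linear regression: update theta using one example at a time.

    X: (m, n+1) design matrix with X[:, 0] == 1; y: (m,) targets.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        for i in rng.permutation(m):        # random visiting order each pass
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # update every theta_j at once
    return theta
```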
Linear Regression with multiple variables
Normal equation
1. Gradient descent: minimize $J(\theta)$ iteratively.
2. Normal equation: method to solve for $\theta$ analytically, in one step.
Example ($m = 4$ examples, $n = 4$ features):

| $x_0$ | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000) |
|-------|--------------|--------------------|------------------|---------------------|---------------|
| 1     | 2104         | 5                  | 1                | 45                  | 460           |
| 1     | 1416         | 3                  | 2                | 40                  | 232           |
| 1     | 1534         | 3                  | 2                | 30                  | 315           |
| 1     | 852          | 2                  | 1                | 36                  | 178           |

Stack the feature rows (including $x_0 = 1$) into the $m \times (n+1)$ design matrix $X$ and the prices into the vector $y$. The normal equation gives the minimizer directly:

$$\theta = (X^T X)^{-1} X^T y$$

where $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.

numpy: np.linalg.pinv() computes the pseudo-inverse, which remains well-defined even when $X^T X$ is non-invertible (e.g. redundant features, or $m \le n$).
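A sketch of the normal equation in numpy, applied to the four examples from the table above:

```python
import numpy as np

def normal_equation(X, y):
    """theta = pinv(X^T X) X^T y."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# The four training examples from the table above.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

theta = normal_equation(X, y)   # no learning rate, no iterations
```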
With $m$ training examples and $n$ features:

Gradient descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.

Normal equation:
• No need to choose $\alpha$.
• No need to iterate.
• Need to compute $(X^T X)^{-1}$, which costs roughly $O(n^3)$.
• Slow if $n$ is very large.
