Linear Regression with Multiple Variables
Summary: Linear Regression with Multiple Variables.
Table of Contents
1. Multivariate Linear Regression
– 1a. Multiple Features (Variables)
– 1b. Gradient Descent for Multiple Variables
– 1c. Gradient Descent: Feature Scaling
– 1d. Gradient Descent: Checking
– 1e. Gradient Descent: Learning Rate
– 1f. Features and Polynomial Regression
2. Computing Parameters Analytically
– 2a. Normal Equation
– 2b. Normal Equation Non-invertibility
1. Multivariate Linear Regression
I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages, part of my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are meant for my personal review, but I have open-sourced the repository because many people have found it useful.
1a. Multiple Features (Variables)
Instead of a single input variable, we now have multiple features: x₁, x₂, x₃, x₄, and so on
New hypothesis for multivariate linear regression:
h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
Defining x₀ = 1, the hypothesis reduces to a single number: the transposed θ vector multiplied by the feature vector x, i.e. h_θ(x) = θᵀx
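A minimal NumPy sketch of that reduction (the numbers are my own, not from the course):

    import numpy as np

    theta = np.array([1.0, 0.5, 2.0])   # parameters theta_0, theta_1, theta_2
    x = np.array([1.0, 3.0, 4.0])       # feature vector with x_0 = 1 for the intercept

    h = theta @ x                       # theta' * x: the inner product gives a single number
    print(h)                            # 10.5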
1b. Gradient Descent for Multiple Variables
Summary: the cost function J(θ) and the gradient descent update generalize directly from the one-variable case, now over n + 1 parameters
New algorithm: repeat { θⱼ := θⱼ − α · (1/m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾ }, simultaneously updating θⱼ for all j = 0, …, n
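A minimal NumPy sketch of this update rule; the function name and defaults are my own, and X is assumed to already include the leading column of ones:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1500):
        # X: (m, n + 1) design matrix with a column of ones for theta_0
        # y: (m,) vector of targets
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            error = X @ theta - y                        # h_theta(x^(i)) - y^(i) for every example
            theta = theta - alpha * (X.T @ error) / m    # simultaneous update of all theta_j
        return theta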
1c. Gradient Descent: Feature Scaling
Ensure features are on a similar scale
Gradient descent takes longer to reach the global minimum when the features are not on a similar scale
Feature scaling lets gradient descent reach the global minimum faster
As long as the features are roughly in that range, they need not be exactly between −1 and 1
Mean normalization: replace xᵢ with (xᵢ − μᵢ) / sᵢ, where μᵢ is the mean of the feature and sᵢ its range (max − min) or standard deviation
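A sketch of mean normalization in NumPy, assuming X holds only the raw features (the column of ones is added afterwards, since the intercept column must not be scaled):

    import numpy as np

    def mean_normalize(X):
        # replace each feature x_i with (x_i - mu_i) / s_i
        mu = X.mean(axis=0)                  # per-feature mean
        s = X.max(axis=0) - X.min(axis=0)    # per-feature range; X.std(axis=0) also works
        return (X - mu) / s, mu, s           # keep mu and s to scale future inputs the same way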
1d. Gradient Descent: Checking
You can use a graph to check that gradient descent is working:
x-axis: number of iterations
y-axis: the value of J(θ) after that many iterations; it should decrease on every iteration
Alternatively, use an automatic convergence test: declare convergence if J(θ) decreases by less than some small ε in one iteration
However, it is tough to choose an appropriate ε
If J(θ) increases instead, gradient descent is not working; the usual culprit is a learning rate that is too large
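A sketch combining both checks; it records J(θ) each iteration for plotting and stops early when the decrease falls below a small ε (the helper name and the value of ε are my own):

    import numpy as np

    def gradient_descent_with_history(X, y, alpha, num_iters, epsilon=1e-3):
        m = len(y)
        theta = np.zeros(X.shape[1])
        J_history = []                                    # J(theta) per iteration, for the plot above
        for _ in range(num_iters):
            error = X @ theta - y
            theta = theta - alpha * (X.T @ error) / m
            error = X @ theta - y                         # residuals at the updated theta
            J_history.append(error @ error / (2 * m))     # J(theta) = (1/2m) * sum of squared errors
            # automatic convergence test: stop once J decreases by less than epsilon
            if len(J_history) > 1 and 0 <= J_history[-2] - J_history[-1] < epsilon:
                break
        return theta, J_history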
1e. Gradient Descent: Learning Rate
Alpha (Learning Rate) too small: slow convergence
Alpha (Learning Rate) too large:
J(θ) may not decrease on every iteration
May fail to converge, or even diverge
Start with α = 0.001 and increase roughly 3× each time (0.001, 0.003, 0.01, 0.03, …) until you reach the largest acceptable α
Then choose a value slightly smaller than that
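A sketch of that search, reusing the gradient_descent_with_history helper from the previous section (the synthetic data is my own):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.uniform(0, 10, size=(50, 2))]    # design matrix with bias column
    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 50)  # targets from known parameters

    # candidate alphas spaced roughly 3x apart, as suggested above
    for alpha in (0.001, 0.003, 0.01, 0.03, 0.1, 0.3):
        theta, J_history = gradient_descent_with_history(X, y, alpha, num_iters=200)
        print(alpha, J_history[-1])   # a diverging run shows a huge (or inf/nan) final cost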
1f. Features and Polynomial Regression
Ensure the chosen features capture the pattern in the data
For house prices versus size, a quadratic model does not make sense: it would eventually curve back down as size grows
A cubic or square-root model fits better
There are algorithms that choose features automatically; these are discussed later
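A sketch of hand-crafting such features for house size (the numbers are invented for illustration):

    import numpy as np

    size = np.array([1000.0, 1500.0, 2000.0, 2500.0])   # house size in square feet

    # cubic model: h = theta_0 + theta_1*size + theta_2*size^2 + theta_3*size^3
    X_cubic = np.c_[np.ones_like(size), size, size**2, size**3]

    # square-root model: h = theta_0 + theta_1*size + theta_2*sqrt(size)
    X_sqrt = np.c_[np.ones_like(size), size, np.sqrt(size)]

    # note: size ~ 10^3, size^2 ~ 10^6, size^3 ~ 10^9,
    # so feature scaling becomes essential with polynomial features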
2. Computing Parameters Analytically
2a. Normal Equation
A method to solve for θ analytically, in a single step
If θ is a single real number (a scalar):
To minimise J(θ), take the derivative dJ/dθ and set it equal to zero
Solve for θ
If θ is a vector of parameters:
Take the partial derivative of J with respect to each θⱼ and set each equal to zero
Solve for all the θⱼ simultaneously
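In matrix form, the same steps give the closed-form solution in one pass (a standard least-squares derivation, writing the cost with the design matrix X):

    J(θ) = (1/2m) ‖Xθ − y‖²
    ∇θ J(θ) = (1/m) Xᵀ(Xθ − y) = 0
    ⟹ XᵀX θ = Xᵀ y
    ⟹ θ = (XᵀX)⁻¹ Xᵀ y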
Minimise Cost Function: Specific Example
X: m × (n + 1) design matrix
m: number of training examples
n: number of features
Xᵀ: (n + 1) × m
XᵀX: [(n + 1) × m] · [m × (n + 1)] = (n + 1) × (n + 1)
(XᵀX)⁻¹Xᵀ: [(n + 1) × (n + 1)] · [(n + 1) × m] = (n + 1) × m
θ = (XᵀX)⁻¹Xᵀy: [(n + 1) × m] · [m × 1] = (n + 1) × 1
Minimise Cost Function: General
In general, the solution is θ = (XᵀX)⁻¹Xᵀy, where y is the m × 1 vector of training targets
Minimise Cost: Octave Code
There is no need for feature scaling when using the normal equation
theta = pinv(X' * X) * X' * y
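For comparison, a NumPy equivalent of that Octave one-liner (my own sketch):

    import numpy as np

    def normal_equation(X, y):
        # closed-form least squares: theta = pinv(X' * X) * X' * y
        return np.linalg.pinv(X.T @ X) @ X.T @ y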
Gradient Descent vs Normal Equation
Gradient Descent                            | Normal Equation
--------------------------------------------|------------------------------------------------
Need to choose alpha                        | No need to choose alpha
Needs many iterations                       | No need to iterate
Works well even when n is large (10,000+)   | Slow if n is large (n = 100 or 1,000 is fine)
Use when the number of features > 1000      | Use so long as the number of features < 1000
2b. Normal Equation Non-invertibility
What happens if XᵀX is non-invertible (singular or degenerate)?
Octave: theta = pinv(X' * X) * X' * y
pinv computes the pseudo-inverse, so this works regardless of whether XᵀX is invertible
Intuition and causes of non-invertibility:
Redundant features: two features are linearly dependent, e.g. house size in feet² and in m²
Too many features: more features than training examples (m ≤ n); delete some features or use regularization
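A small sketch of the redundant-feature case; size in m² is an exact multiple of size in feet², so XᵀX is singular, yet pinv still returns a usable θ (the data is invented):

    import numpy as np

    size_ft2 = np.array([1000.0, 1500.0, 2000.0, 3000.0])
    size_m2 = size_ft2 * 0.092903                    # linearly dependent duplicate feature
    X = np.c_[np.ones_like(size_ft2), size_ft2, size_m2]
    y = np.array([200.0, 290.0, 410.0, 590.0])       # prices, in thousands

    # np.linalg.inv(X.T @ X) is unreliable here: the matrix is singular
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y        # the pseudo-inverse works regardless
    print(X @ theta)                                 # predictions still fit the data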