ML - Multiple Linear Regression

Regression Methods in Machine Learning
Multiple Linear Regression
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017

Multiple Linear Regression
• Used to model the relationship between two or more independent
variables and a dependent variable, and to predict the dependent variable.
e.g., Income and Age are correlated with Spending.
• When the data is plotted on a graph, the points appear to
lie near a hyperplane (a plane, when there are two independent variables).
Figure: Y (Dependent Variable) plotted against X1 and X2 (Independent Variables), with a fitted hyperplane.

Simple vs. Multiple Linear Regression
• Simple Linear Regression – one independent variable.
y = b0 + b1x1
• Multiple Linear Regression – multiple independent
variables.
y = b0 + b1x1 + b2x2 + … + bnxn
where b2x2 is the 2nd independent variable and its weight (coefficient), and
bnxn is the nth independent variable and its weight (coefficient).
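As a minimal sketch of the multiple regression equation, the prediction for every row can be computed as the intercept plus a dot product of the features with their weights. The weights and sample values below are made up for illustration; they do not come from the slides.

```python
import numpy as np

# Hypothetical weights: b0 (intercept), then b1 and b2, one per feature.
b0 = 1.0
b = np.array([2.0, 3.0])

# Two samples, each with two independent variables (x1, x2).
X = np.array([
    [1.0, 1.0],
    [2.0, 0.5],
])

# y = b0 + b1*x1 + b2*x2, computed for every row at once.
y = b0 + X @ b
print(y)
```

Adding a third independent variable only means adding one more column to X and one more entry to b.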

Feature Elimination
ID   Income   Age   Height   Spending
37   18000    18    5’8”
38   75000    40    5’9”
39   27000    28    6’1”
40   24000    26    5’6”
41   45000    34    6’2”
• In a dataset, we may not want to keep all the independent
variables (features) in our model:
  • More features = more complex model.
  • If a feature does not contribute to the prediction, it adds noise
  to the model.
• ID fields are like random numbers: they do not contribute
to prediction.
• Height is unlikely to influence spending, or influences it very little.
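A minimal sketch of dropping the ID and Height columns by hand, using the rows of the table above. The column handling with NumPy and the conversion of heights to inches are assumptions made for illustration; the slides do not prescribe a method.

```python
import numpy as np

# Rows from the table above: ID, Income, Age, Height.
# Heights are converted to inches for illustration (5'8" = 68 in).
columns = ["ID", "Income", "Age", "Height"]
data = np.array([
    [37, 18000, 18, 68],
    [38, 75000, 40, 69],
    [39, 27000, 28, 73],
    [40, 24000, 26, 66],
    [41, 45000, 34, 74],
])

# Keep only the features expected to help predict spending.
keep = ["Income", "Age"]
keep_idx = [columns.index(name) for name in keep]
X = data[:, keep_idx]
print(X.shape)  # (5, 2)
```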

Backward Elimination
• A method for identifying and removing independent
variables that do not contribute enough to the model.
• Steps:
  • Fit (train) the model with all the independent variables.
  • Calculate the P-value of each independent variable.
  • If the highest P-value is above the threshold (e.g., 0.05 [5 percent]),
  eliminate the independent variable that has it.
  • Repeat (re-fit) until no independent variable has a
  P-value above the threshold.
Flowchart: All Variables → Train → Is the highest P-value > threshold? Yes: eliminate that variable and train again. No: DONE.

Multiple Linear Regression in Python
• Perform Linear Regression with all independent variables.
from sklearn.linear_model import LinearRegression # scikit-learn class for linear regression
regressor = LinearRegression() # instantiate the linear regression object
regressor.fit(X_train, y_train) # train (fit) the model
• Run the model on the test data to get predictions.
y_pred = regressor.predict(X_test) # y_pred is the array of predicted values
• Compare the predicted values (y_pred) with the actual values (y_test).
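The fit, predict, and compare steps can be tried end to end with a sketch like the one below. The synthetic data, the 80/20 split, and the use of R² as the comparison metric are assumptions for illustration; the slides do not say how X_train and X_test were produced.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): y = 4 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 4.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows as test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Compare predictions with the actual values.
print("intercept (b0):", regressor.intercept_)
print("coefficients (b1, b2):", regressor.coef_)
print("R^2 on test data:", r2_score(y_test, y_pred))
```

Because the data was generated from known weights, the fitted intercept and coefficients should land close to 4, 2, and -3.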

Backward Elimination in Python
• Prepare for Backward Elimination.
  • The statsmodels OLS class does not add the constant b0 (intercept) on its own.
  • Work around this by adding an independent variable x0 = 1 for b0.
import numpy as np
import statsmodels.api as sm # current statsmodels releases expose OLS here
nrows = X.shape[0] # number of rows (samples) in X
X = np.append( arr = np.ones( (nrows, 1) ).astype(int), values = X, axis = 1 )
  np.ones( (nrows, 1) ) creates an array of one column of nrows ones.
  np.append( ..., axis = 1 ) appends the columns of X to this array, so the ones become column 0.
• Create the array of candidate independent variables (features) from which we
will eliminate independent variables.
X_opt = X[:, [0, 1, 2, 3, 4, 5] ]
  The ":" selects all rows; start with all columns (i.e., 0, 1, 2 .. N).
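A small sketch of what adding the column of ones does to the shape of X. The three-row feature matrix is hypothetical and is only there to make the result easy to inspect.

```python
import numpy as np

# A small hypothetical feature matrix: 3 rows, 2 independent variables.
X = np.array([
    [18000.0, 18.0],
    [75000.0, 40.0],
    [27000.0, 28.0],
])
nrows = X.shape[0]

# Prepend x0 = 1 so that b0 becomes the coefficient of column 0.
ones = np.ones((nrows, 1))
X_with_const = np.append(arr=ones, values=X, axis=1)

print(X_with_const.shape)   # (3, 3)
print(X_with_const[:, 0])   # [1. 1. 1.]
```

The original features shift one position to the right, which is why column 0 stands for the constant in the index lists used during elimination.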

Backward Elimination in Python (2)
• Use the Ordinary Least Squares (OLS) class from statsmodels to train (fit)
the model and get the P-values.
ols = sm.OLS( endog = y, exog = X_opt ).fit() # create the OLS object and fit the model
print( ols.summary() ) # display statistical metrics, including P-values
  endog is the dependent variable (label); exog holds the independent variables.
• Example: eliminate an independent variable (x2).
X_opt = X[:, [0, 1, 3, 4, 5] ] # drop column 2 (x2); column 0 is x0 (the constant)
ols = sm.OLS( endog = y, exog = X_opt ).fit()
print( ols.summary() )
• Repeat the elimination until the highest P-value among the independent variables is not
greater than the threshold (e.g., 0.05).
