Lecture 4 - Multiple Linear Regression Imran 20022025 092939am

The document provides a comprehensive overview of Multiple Linear Regression (MLR), explaining its purpose, assumptions, and methodologies for implementation. It details how MLR can be used to analyze relationships between multiple independent variables and a dependent variable, including statistical concepts like t-tests and p-values. Additionally, it outlines various strategies for selecting independent variables and includes Python code examples for practical application.


Multiple Linear Regression
Introduction to Machine Learning
Contents
1. What is multiple linear regression (MLR)?
2. What multiple linear regression can help you do
3. Assumptions of multiple linear regression
4. How to perform a multiple linear regression
   i. T-test
   ii. P-value
   iii. The model
   iv. Selecting the independent variables
   v. Python code
   vi. Example of backward elimination for selection of independent variables
What is MLR
Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable.
What multiple linear regression can help you do
• You can use multiple linear regression when you want to know:
 How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
 The value of the dependent variable at certain values of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Assumptions of multiple linear regression
• Multiple linear regression makes all of the same assumptions as simple linear regression:
 The probability distribution of the error e is normal.
 The mean of e is zero: E(e) = 0.
 The standard deviation of e is a constant σe for all values of X.
 The errors associated with different values of Y are all independent.
Design Requirements
• Two or more independent variables (predictor variables).
• Sample size: >= 50 (at least 10 times as many cases as independent variables).
The formula for a multiple linear regression is:

y = B0 + B1X1 + B2X2 + … + BnXn + e

• y = the predicted value of the dependent variable

• B0 = the y-intercept (value of y when all other parameters are set to 0)

• B1X1 = the regression coefficient (B1) of the first independent variable (X1) times its value (a.k.a. the effect that increasing the value of the independent variable has on the predicted y value)

• … = do the same for however many independent variables you are testing

• BnXn = the regression coefficient of the last independent variable times its value

• e = model error (a.k.a. how much variation there is in our estimate of y)

Best-fit line
• To find the best-fit line for each independent variable, multiple linear regression calculates three things:
 The regression coefficients that lead to the smallest overall model error.
 The t-statistic of the overall model.
 The associated p-value (how likely it is that the t-statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables were true).
T-test
• In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error.

• It is used to evaluate whether two sets of data are statistically significantly different from each other.
• Q.1: Find the t-test value for the following two sets of values:

• A = 7, 2, 9, 8 and

• B = 1, 2, 3, 4

• Solution: For the first data set:

• Number of terms in the first set, n1 = 4

• Mean of the first set: (7 + 2 + 9 + 8) / 4 = 6.5; likewise the mean of the second set is 2.5. Proceeding with the sample variances and the pooled two-sample t formula gives t ≈ 2.38.
• Higher values of the t-value, also called t-score, indicate that a large
difference exists between the two sample sets. The smaller the t-value, the
more similarity exists between the two sample sets.
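The hand computation above can be checked in plain Python using the pooled two-sample t-statistic (the equal-variance form):

```python
from statistics import mean, variance  # variance() uses the sample (n-1) denominator

def two_sample_t(a, b):
    """Pooled two-sample t-statistic for two equal-variance groups."""
    n1, n2 = len(a), len(b)
    # Pooled variance: combine both sample variances, weighted by degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

t = two_sample_t([7, 2, 9, 8], [1, 2, 3, 4])
print(round(t, 3))  # 2.376
```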
P-value
• P-value is the lowest significance level that results in rejecting the null
hypothesis.
Example • Coin toss
 Two possible outcomes
 H0 = This is a fair coin
 H1 = This is not a fair coin
• The P-value test assumes that the H0 hypothesis is true, i.e., the coin is fair.
• Let us assume our threshold value to be 5%, i.e., 0.05.
• Let us assume the outputs are:
 First toss is a Tail (probability = 0.5)
 First two tosses are Tails (probability = 0.25)
 First three tosses are Tails (probability = 0.125)
 First four tosses are Tails (probability = 0.0625)
 First five tosses are Tails (probability = 0.03125)
 First six tosses are Tails (probability = 0.015625)
 After the fifth toss the statistical test is significant: a P-value of less than 5% means hypothesis H0 is rejected and hypothesis H1 is accepted, i.e., the coin is not fair.
Selecting the independent variables being used
• Five strategies are available for selecting the independent variables:
 All in
 Backward elimination
 Forward selection
 Bi-directional elimination
 Score comparison (all possible combinations)
All in
• Use all features.

• Prior knowledge (a data domain expert) tells you which features to keep and which to discard.
Backward Elimination
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05).

2. Fit the model with all predictors.

3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise terminate — the remaining predictors form your feature set.

4. Remove the variable with P > SL.

5. Fit the model without that variable and return to step 3.
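The loop above can be sketched generically. Here `pvalues_fn` is a stand-in (an assumption of this sketch) for whatever routine refits the model and returns a P-value per remaining predictor, e.g. an OLS fit; the stub below returns fixed P-values purely for illustration.

```python
def backward_eliminate(features, pvalues_fn, sl=0.05):
    """Repeatedly drop the predictor with the highest P-value until all are <= sl."""
    features = list(features)
    while features:
        pvals = pvalues_fn(features)        # one P-value per remaining predictor
        worst = max(features, key=pvals.get)
        if pvals[worst] <= sl:              # every predictor is significant: stop
            break
        features.remove(worst)              # step 4: remove, then refit on next pass
    return features

# Stub P-values for illustration (a real run would refit a regression each pass):
fixed = {"x1": 0.001, "x2": 0.30, "x3": 0.04, "x4": 0.72}
kept = backward_eliminate(fixed, lambda fs: {f: fixed[f] for f in fs})
print(kept)  # ['x1', 'x3']  (x4 removed first, then x2)
```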
Forward selection
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05).

2. Fit simple models y -> xn, one predictor at a time, and select the one with the lowest P-value.

3. Keep this variable and fit all possible models with one extra predictor, i.e., add one predictor to the variables you already have.

4. Consider the new predictor with the lowest P-value. If P < SL, go to step 3; otherwise finish (keep the previous model).
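The forward direction can be sketched the same way. `pvalue_fn` is again a stand-in (an assumption of this sketch) for a routine that fits a model with the already-chosen variables plus one candidate and returns that candidate's P-value; the stub below uses fixed scores for illustration.

```python
def forward_select(candidates, pvalue_fn, sl=0.05):
    """Greedily add the candidate with the lowest P-value while it stays below sl."""
    chosen, remaining = [], list(candidates)
    while remaining:
        # Score each candidate given the variables already chosen
        pvals = {f: pvalue_fn(chosen, f) for f in remaining}
        best = min(remaining, key=pvals.get)
        if pvals[best] >= sl:               # no remaining candidate is significant
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Stub scores for illustration (a real run would fit one regression per candidate):
scores = {"x1": 0.001, "x2": 0.30, "x3": 0.04}
picked = forward_select(scores, lambda chosen, f: scores[f])
print(picked)  # ['x1', 'x3']
```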
Bi-directional Elimination
1. Select a significance level to enter (SL_enter) and to stay (SL_stay) in the model.

2. Perform the next step of forward selection (new variables must have P < SL_enter to enter).

3. Perform all steps of backward elimination (old variables must have P < SL_stay to stay in the model).

4. Repeat steps 2-3 until no new variables can enter and no old variables can exit.

FIN: the model is ready.


All possible models
1. Select a criterion of goodness of fit.
2. Construct all possible models: with N variables there are 2^N - 1 models.
3. Select the model with the best criterion.
4. The model is ready.

• Very computationally intensive!

• We will be using the backward elimination strategy.
MLR Implementation in Python

Multiple Linear Regression
Python Implementation
Importing the dataset
Dataset
• Total 50 samples

• Three independent variables:
 Administration
 Marketing spend
 State (categorical data)
  One hot encoding
  Three categories, so three dummy variables

• One dependent variable:
 Profit
Code
• One hot encoding to be applied on column 3

• 80/20 split
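The two steps on this slide were presumably done with a library class in the original code (not reproduced here); the same idea in plain Python, on made-up rows with the dataset's three states, looks like:

```python
# Illustrative rows: [State, spend] -- values are made up, not the lecture's data
rows = [
    ["New York", 165349.2], ["California", 162597.7],
    ["Florida", 153441.5], ["New York", 144372.4], ["California", 142107.3],
]

# One-hot encode the categorical column (index 0): one dummy column per category
categories = sorted({r[0] for r in rows})  # ['California', 'Florida', 'New York']
encoded = [[1.0 if r[0] == c else 0.0 for c in categories] + r[1:] for r in rows]

# 80/20 train-test split (no shuffling here; library splitters usually shuffle)
split = int(0.8 * len(encoded))
train, test = encoded[:split], encoded[split:]
print(len(train), len(test))  # 4 1
```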
Training and testing the model
Evaluating the model
Some points to remember

Q: Do we need to normalize the data in MLR?
• A: No, we do not need to perform normalization for MLR, since the coefficients b0, b1, b2, … in the MLR model automatically scale each feature.

Q: Do we need to check the assumptions of linear regression?
• A: Not strictly; for a new dataset, play and experiment with it. If there are redundant features, the model will perform poorly.

Q: Do we need to use some strategy to avoid the dummy variable trap?
• A: The class used here in Python will automatically do that.

Q: Do we have to use techniques such as backward elimination before applying MLR?
• A: No, because the class we use will automatically do that.
Example of Backward Elimination
(Optional)
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05).

2. Fit the model with all predictors.

3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise terminate — the remaining predictors form your feature set.

4. Remove the variable with P > SL.

5. Fit the model without that variable and return to step 3.
Code
• Importing the dataset
• Dividing the dataset into independent and dependent variables
• One hot encoding for the categorical data
• We do not need to cater for missing values as there are none
• Inserting beta_0 (a column of ones for the intercept)

x6 has the highest P-value, so it is removed.

x5 has the highest P-value, so it is removed.

Now no independent variable has a P-value > 0.05, so we keep the remaining variables.
Comparison between the two approaches using RMSE

The regression model with backward elimination shows a lower RMSE.
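RMSE, the metric used for this comparison, is straightforward to compute; the arrays below are illustrative, not the lecture's results.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # sqrt((1 + 0 + 4) / 3) ~ 1.291
```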


Plotting the output
Example of Forward Selection
(Optional)
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05).

2. Fit simple models y -> xn, one predictor at a time, and select the one with the lowest P-value.

3. Keep this variable and fit all possible models with one extra predictor, i.e., add one predictor to the variables you already have.

4. Consider the new predictor with the lowest P-value. If P < SL, go to step 3; otherwise finish (keep the previous model).
