Multiple Linear Regression and Stepwise Multiple Linear Regression
Multiple Regression
An extension of simple linear regression.
Predicts the value of a variable based on the value of two or more other variables.
For example: whether exam performance can be predicted based on revision time, test anxiety, lecture attendance, and gender.
Whether daily cigarette consumption can be predicted based on smoking duration, age when started smoking, smoking type, income, and gender.
Multiple Regression
Multiple regression also allows us to determine the model's overall fit (variance explained) and the relative contribution of each of the predictors to the total variance explained.
For example, you might want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance and gender "as a whole", and the "relative contribution" of each independent variable in explaining the variance.
Assumptions
1: The dependent variable should be measured on a continuous scale (i.e., an interval or ratio variable). If the dependent variable is measured on an ordinal scale, we need to carry out ordinal regression rather than multiple regression.
2: There should be two or more independent variables, which can be either continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable). Examples of nominal variables include gender (e.g., 2 groups: male and female).
If one of the independent variables is dichotomous and considered a moderating variable, we need to run a dichotomous moderator analysis.
Assumptions
3: Independence of observations (i.e., independence of residuals), which can be checked using the Durbin-Watson statistic.
4: There needs to be a linear relationship between (a) the dependent variable and each of your independent variables, and (b) the dependent variable and the independent variables collectively.
Ways to check for these linear relationships: create scatterplots and partial regression plots and visually inspect them for linearity.
If the relationship displayed in the scatterplots and partial regression plots is not linear, we will have to either run a non-linear regression analysis or "transform" the data.
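As an illustration, the linearity check could be done in Python with statsmodels' partial regression (added-variable) plots. This is a minimal sketch on synthetic data; the column names (heart_rate, age, bmi) are placeholders, not the slides' dataset.

```python
# Sketch: visual linearity check with partial regression (added-variable) plots.
# The DataFrame and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 200)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()

# One added-variable panel per predictor; look for roughly linear point clouds.
fig = sm.graphics.plot_partregress_grid(model)
fig.tight_layout()
plt.show()
```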
Assumptions
5: Data needs to show homoscedasticity, where the variances along the line of best fit remain similar as you move along the line.
6: Data must not show multicollinearity, which occurs when you have two or more independent variables that are highly correlated.
Assumptions
7: There should be no significant outliers, high leverage points or highly influential points. Check for influential points using a measure of influence known as Cook's Distance.
8: Finally, check that the residuals (errors) are approximately normally distributed.
Two standard methods to check this assumption include using:
(a) a histogram (with a superimposed normal curve) and a Normal P-P Plot
(b) a Normal Q-Q Plot of the studentized residuals (a sketch follows below).
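A minimal sketch of these normality checks, assuming Python with statsmodels and synthetic data (the column names heart_rate, age, bmi are placeholders):

```python
# Sketch: histogram and Normal Q-Q plot of the studentized residuals.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 200)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()
studentized = model.get_influence().resid_studentized_internal

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(studentized, bins=20)              # histogram of studentized residuals
axes[0].set_title("Studentized residuals")
sm.qqplot(studentized, line="45", ax=axes[1])   # Normal Q-Q plot
axes[1].set_title("Normal Q-Q plot")
plt.show()
```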


Results
Assumptions
Normality check: the data significantly deviates from a normal distribution.


Assumption #5: Your data needs to show homoscedasticity, where the variances along the line of best fit remain similar as you move along the line.
The p-value associated with the Breusch-Pagan test is 0.540, which is greater than the common significance level of 0.05, so we fail to reject the null hypothesis of homoscedasticity. This suggests that there is no significant evidence of heteroscedasticity in the data according to the Breusch-Pagan test. Therefore, the data is more likely to exhibit homoscedasticity, meaning that the variance of the errors is constant across observations.
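For reference, a Breusch-Pagan test like the one reported above could be run as follows; this is a sketch on synthetic data, not the slides' dataset:

```python
# Sketch: Breusch-Pagan test for heteroscedasticity with statsmodels.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 200)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()

# Returns (LM statistic, LM p-value, F statistic, F p-value).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
# A p-value above 0.05 means we fail to reject the null of homoscedasticity.
```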
Assumption #3: Independence of observations (i.e., independence of residuals), checked using the Durbin-Watson statistic.
Since the Durbin-Watson statistic is close to 2 (1.99), there appears to be no first-order autocorrelation in the residuals.
Additionally, the p-value of 0.720 is greater than the commonly chosen significance level of 0.05.
There is therefore no significant evidence of autocorrelation in the residuals of the model.
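A sketch of computing the Durbin-Watson statistic (statsmodels reports the statistic itself, without a p-value), again on synthetic placeholder data:

```python
# Sketch: Durbin-Watson statistic for independence of residuals.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 200)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()

# Values near 2 indicate no first-order autocorrelation; values toward 0 or 4
# indicate positive or negative autocorrelation, respectively.
print(f"Durbin-Watson statistic: {durbin_watson(model.resid):.2f}")
```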
Assumption #6: Your data must not show multicollinearity, which occurs when you have two or more independent variables that are highly correlated.
1. VIF (Variance Inflation Factor): VIF measures how much the variance of an estimated regression coefficient is increased due to collinearity. It quantifies the extent to which the variance of the estimated regression coefficients is inflated compared to when the predictors are not correlated. Typically, a VIF value above 10 (some sources suggest 5) indicates a problematic level of collinearity.
2. Tolerance: Tolerance is the reciprocal of VIF (1/VIF). It indicates the proportion of variance in the predictor variable that is not explained by the other predictor variables. Tolerance values close to 1 indicate little multicollinearity, while values close to 0 indicate that the predictor is largely redundant with the others (a sketch for computing both follows below).
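A sketch of computing VIF and Tolerance for each predictor, assuming Python/statsmodels and placeholder column names:

```python
# Sketch: VIF and Tolerance for each predictor.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})

X = sm.add_constant(df[["age", "bmi"]])          # design matrix with intercept
for i, name in enumerate(X.columns):
    if name == "const":
        continue                                 # skip the intercept term
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```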
Results
Model Fit Measures are commonly used to assess the goodness-of-fit of a regression model.
The value of R is 0.0713, indicating a weak positive linear relationship between the predictor variables and the response variable.
The value of R² is 0.00509, which means that only approximately 0.51% of the variance in the dependent variable is explained by the independent variables in the model. This suggests that the model's explanatory power is very low.
The value of Adjusted R² is 0.00461. This value is similar to R² but takes into account the number of predictors in the model. The adjusted R² is slightly smaller than R², indicating that including the predictor variables did not substantially improve the model's fit after adjusting for the number of predictors.
Overall, based on these model fit measures, the regression model appears to have very weak explanatory power.
Model Fit Measures
1. R: The multiple correlation coefficient measures the strength of the linear relationship between the predictor variables, taken together, and the response variable. It is the correlation between the observed and model-predicted values of the dependent variable and ranges from 0 to 1, where 1 indicates a perfect linear relationship and 0 indicates no linear relationship. (With a single predictor, R is the absolute value of Pearson's r, which ranges from -1 to 1.)
2. R² (R-squared): R-squared represents the proportion of variance in the
dependent variable that is explained by the independent variables in the
regression model. It ranges from 0 to 1, with higher values indicating that the
independent variables explain a larger proportion of the variance in the
dependent variable. In other words, it quantifies the goodness-of-fit of the
regression model.
3. Adjusted R²: Adjusted R-squared is a modified version of R-squared that
adjusts for the number of predictor variables in the model. It penalizes adding
additional predictors that do not significantly improve the model's fit. Adjusted
R-squared can be particularly useful when comparing models with different
numbers of predictors.
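A sketch showing where these three measures come from in a fitted model, assuming Python/statsmodels and synthetic placeholder data:

```python
# Sketch: reading R, R-squared and adjusted R-squared from a fitted OLS model.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 500), "bmi": rng.uniform(18, 35, 500)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 10, 500)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()
r_squared = model.rsquared
print(f"R = {np.sqrt(r_squared):.4f}")           # multiple correlation coefficient
print(f"R² = {r_squared:.5f}")
print(f"Adjusted R² = {model.rsquared_adj:.5f}")
```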
Predictor: The predictors are 'age' and 'BMI'.
Estimate: This column provides each predictor variable's estimated coefficients (or slopes).
These coefficients represent the expected change in the dependent variable for a one-unit
change in the corresponding predictor variable, holding all other predictors constant.
Intercept (72.0783): This represents the estimated heart rate when all other predictors are
zero. In this case, it suggests that when age and BMI are zero, the estimated heart rate is
approximately 72.0783 beats per minute.
The estimated coefficient for 'age' is -0.0319. This indicates that, on average, heart rate decreases by 0.0319 units for every one-unit increase in age, holding the other variables constant. Since the p-value associated with age is 0.143, which is greater than the common significance level of 0.05, we do not have enough evidence to conclude that age has a significant effect on heart rate in this model.
The estimated coefficient for 'BMI' is 0.2086.
The coefficient for BMI suggests that for every one-unit increase in BMI, the estimated
heart rate increases by approximately 0.2086 beats per minute.
Since the p-value associated with BMI is less than 0.001, we can conclude that BMI has a
statistically significant effect on heart rate in this model.
Standardized Estimate: These indicate the relative importance of each predictor in the model after standardizing the variables. A standardized coefficient represents the change in the response variable (heart rate), in standard deviations, for a one-standard-deviation change in the predictor variable. It is useful for comparing the relative importance of predictors in the model.
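One common way to obtain standardized (beta) estimates is to z-score all variables before fitting; a sketch on synthetic placeholder data:

```python
# Sketch: standardized (beta) coefficients by z-scoring every variable first.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 300), "bmi": rng.uniform(18, 35, 300)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 300)

z = (df - df.mean()) / df.std()                  # z-score every column
beta = smf.ols("heart_rate ~ age + bmi", data=z).fit().params.drop("Intercept")
print(beta)   # change in heart rate (in SDs) per one-SD change in each predictor
```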
Check the Cook's Distance statistic in the regression output. Cook's Distance measures the influence of each observation on the regression coefficients. High Cook's Distance values (> 1) indicate influential outliers that may significantly affect the regression model.
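A sketch of extracting Cook's Distance for every observation and flagging values above the common cut-off of 1 (synthetic placeholder data):

```python
# Sketch: Cook's Distance for each observation from a fitted OLS model.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 70, 200), "bmi": rng.uniform(18, 35, 200)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, 200)

model = smf.ols("heart_rate ~ age + bmi", data=df).fit()
cooks_d, _ = model.get_influence().cooks_distance

# Flag observations whose Cook's Distance exceeds the common cut-off of 1.
influential = np.where(cooks_d > 1)[0]
print(f"Max Cook's Distance: {cooks_d.max():.2e}")
print(f"Influential observations (D > 1): {influential}")
```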

Results
e-4 notation
The notation "e-4" represents a number in scientific notation, where "e" stands for "exponent" and "-4" indicates that the decimal point is moved four places to the left. Specifically, "e-4" is equivalent to multiplying the number by 10⁻⁴, or dividing it by 10,000.
For example:
7.68e-5 is equivalent to 7.68 × 10⁻⁵, which is 0.0000768.
6.83e-4 is equivalent to 6.83 × 10⁻⁴, which is 0.000683.
So, in the context of Cook's Distance:
7.68e-5 means 0.0000768.
6.83e-4 means 0.000683.
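The same notation can be checked directly in Python, which uses "e" notation for scientific notation as well:

```python
# Sketch: "e" notation is ordinary scientific notation.
print(f"{7.68e-5:.7f}")   # prints 0.0000768
print(f"{6.83e-4:.6f}")   # prints 0.000683
```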
Multiple Regression with a Categorical Variable
The coefficient for gender indicates the difference in estimated heart rate
between males and females.
Specifically, the coefficient of -3.0294 bpm suggests that, on average, males
have a heart rate approximately 3.0294 bpm lower than females when
controlling for other predictors in the model.
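A sketch of fitting such a model with a dummy-coded gender variable, assuming Python/statsmodels formulas and synthetic placeholder data (the reference level here is 'female', so the gender coefficient is the male-female difference):

```python
# Sketch: multiple regression with a categorical predictor via dummy coding.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(20, 70, n),
    "bmi": rng.uniform(18, 35, n),
    "gender": rng.choice(["female", "male"], n),
})
df["heart_rate"] = (72 - 0.03 * df["age"] + 0.2 * df["bmi"]
                    - 3.0 * (df["gender"] == "male") + rng.normal(0, 5, n))

# C(gender) creates a dummy variable; 'female' is the reference level, so the
# gender coefficient is the estimated male-female difference in heart rate.
model = smf.ols("heart_rate ~ age + bmi + C(gender)", data=df).fit()
print(model.params)
```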
Hierarchical Regression
Hierarchical regression is a statistical method used to examine the incremental contribution of predictor variables to the variance explained in an outcome variable. It involves entering predictor variables into the regression equation in a pre-specified sequence of blocks, typically based on theoretical or logical considerations.
In hierarchical regression, predictors are added to the model in
separate blocks or steps, allowing researchers to assess the unique
contribution of each set of predictors while controlling for the effects of
previous sets of predictors.
This technique is commonly used to understand how additional
predictors improve the prediction of an outcome variable beyond what
is accounted for by earlier predictors.
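Hierarchical (blockwise) entry can be sketched as a comparison of nested models, reporting the change in R² when a block of predictors is added; the data and column names below are placeholders:

```python
# Sketch: hierarchical regression as a comparison of nested OLS models.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({"age": rng.uniform(20, 70, n), "bmi": rng.uniform(18, 35, n)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, n)

block1 = smf.ols("heart_rate ~ age", data=df).fit()          # step 1: control variable
block2 = smf.ols("heart_rate ~ age + bmi", data=df).fit()    # step 2: add BMI

print(f"Delta R² = {block2.rsquared - block1.rsquared:.4f}")
print(anova_lm(block1, block2))   # F test for the incremental contribution
```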
Stepwise Multiple Linear Regression
Stepwise regression is a step-by-step construction of a regression model in which the independent variables to include are selected automatically.
It involves systematically adding or removing predictor variables from the model based on their statistical significance.
The process typically proceeds in one of two directions: forward selection or backward elimination.
Stepwise regression is a useful tool for identifying a parsimonious set of predictor variables that best explain the variability in the outcome variable.
Forward Selection
1. Start with an empty model (i.e., no predictor variables included).
2. Add one predictor variable at a time based on a predefined criterion, such as the highest correlation with the outcome variable or the lowest p-value from a univariate regression.
3. At each step, assess the statistical significance of the added variable using a predefined significance level (e.g., p-value < 0.05).
4. Continue adding variables until no more variables meet the inclusion criterion (a minimal sketch follows this list).
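The forward-selection sketch below uses a p-value entry criterion; the data, column names and alpha level are illustrative assumptions:

```python
# Sketch: forward selection with a p-value entry criterion (alpha = 0.05).
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({"age": rng.uniform(20, 70, n),
                   "bmi": rng.uniform(18, 35, n),
                   "noise": rng.normal(size=n)})
y = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, n)

selected, remaining, alpha = [], list(df.columns), 0.05
while remaining:
    # p-value of each candidate when added to the predictors already selected
    pvals = {}
    for cand in remaining:
        X = sm.add_constant(df[selected + [cand]])
        pvals[cand] = sm.OLS(y, X).fit().pvalues[cand]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= alpha:          # no candidate meets the entry criterion
        break
    selected.append(best)
    remaining.remove(best)

print("Selected predictors:", selected)
```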
Backward Elimination
1. Start with the full model (i.e., all predictor variables included).
2. Remove one predictor variable at a time based on a predefined criterion, typically the variable with the highest p-value in the current multivariable model.
3. At each step, remove a variable only if it fails a predefined significance level (e.g., p-value > 0.05).
4. Continue removing variables until no remaining variable meets the exclusion criterion (a minimal sketch follows this list).
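A matching backward-elimination sketch using a p-value removal criterion; again, the data and threshold are illustrative:

```python
# Sketch: backward elimination with a p-value removal criterion (alpha = 0.05).
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({"age": rng.uniform(20, 70, n),
                   "bmi": rng.uniform(18, 35, n),
                   "noise": rng.normal(size=n)})
y = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, n)

kept, alpha = list(df.columns), 0.05
while kept:
    model = sm.OLS(y, sm.add_constant(df[kept])).fit()
    pvals = model.pvalues.drop("const")     # ignore the intercept
    worst = pvals.idxmax()
    if pvals[worst] <= alpha:               # every remaining predictor is significant
        break
    kept.remove(worst)                      # drop the least significant predictor

print("Remaining predictors:", kept)
```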
Stepwise Procedure
Stepwise regression alternates between forward selection and
backward elimination steps until a stopping criterion is met.
Common stopping criteria include:
A predefined number of steps.
No more variables meet the inclusion or exclusion criteria.
The model's performance (e.g., adjusted R-squared) no longer
improves significantly with additional variables.
Model Evaluation
Evaluate the final model using appropriate model fit measures
(e.g., R-squared, AIC, BIC) and diagnostic tests (e.g., residual
analysis, multicollinearity assessment).
Validate the model's performance on an independent dataset, if
available, to assess its generalizability.
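A sketch of reading AIC/BIC from a fitted model and checking performance on a held-out split; the data and the 75/25 split are illustrative assumptions:

```python
# Sketch: model fit measures plus a simple hold-out check of generalizability.
# Data and column names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({"age": rng.uniform(20, 70, n), "bmi": rng.uniform(18, 35, n)})
df["heart_rate"] = 72 - 0.03 * df["age"] + 0.2 * df["bmi"] + rng.normal(0, 5, n)

train, test = df.iloc[:300], df.iloc[300:]
model = smf.ols("heart_rate ~ age + bmi", data=train).fit()

print(f"R² = {model.rsquared:.3f}, AIC = {model.aic:.1f}, BIC = {model.bic:.1f}")

# Out-of-sample check: root-mean-squared error on the held-out data.
pred = model.predict(test)
rmse = np.sqrt(np.mean((test["heart_rate"] - pred) ** 2))
print(f"Hold-out RMSE = {rmse:.2f}")
```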
