1 - Multiple Regression

The document discusses multiple linear regression. It begins by defining multiple regression as predicting the value of a variable (dependent variable) based on the values of two or more other variables (independent variables). It then provides more details on: - What multiple regression can tell you, such as the relative influence of independent variables on the dependent variable - The objective of using multiple regression to predict dependent variable values based on known independent variable values - Key terms like coefficients, R-squared, and p-values - The multiple regression formula and assumptions - An 8-step process for performing multiple regression analysis - Techniques used like scatterplots and correlation analysis - An example using stock price as the dependent variable and

Uploaded by

zeref dragneel
Copyright
© © All Rights Reserved

Quotes

“In principle, more analytic power can be achieved by varying multiple things at once in an
uncorrelated (random) way, and doing standard analysis, such as multiple linear regression. In
practice, though, A/B testing is widely used, because A/B tests are easy to deploy, easy to understand,
and easy to explain to management.”

— Christopher D. Manning

What is Multiple Regression?

Multiple linear regression, also known as multiple regression, is an extension of simple linear
regression.
It is used when we want to predict the value of a variable based on the values of two or more
other variables. The variable we want to predict is called the dependent variable (or sometimes
the outcome, target or criterion variable). The variables used to predict it are called the
independent (or predictor) variables.

What Multiple Linear Regression Can Tell You

Simple linear regression is a function that allows an analyst or statistician to make predictions
about one variable based on the information that is known about another variable.
Simple linear regression applies when there are two continuous variables: an independent
variable and a dependent variable.
The independent variable is the variable that is used to calculate the dependent variable or
outcome. A multiple regression model extends this to several explanatory variables.

Before going further into the topic, here is a video that introduces multiple regression:

https://www.youtube.com/watch?v=zITIFTsivN8&t=130s

Definition/description of Multiple Regression

Multiple regression generally explains the relationship between multiple independent (or
predictor) variables and one dependent variable.
A dependent variable is modeled as a function of several independent variables with
corresponding coefficients, along with the constant term.
Multiple regression requires two or more predictor variables, and this is why it is called multiple
regression.

Objective:

The objective of multiple regression analysis is to use the independent variables whose values
are known to predict the value of the single dependent variable (Y).

Terminologies:

R is the measure of association between the observed value and the predicted value of the
dependent variable (the multiple correlation coefficient).

R Square (or the coefficient of determination) is the square of that measure of association. It
indicates the proportion of the variation in the dependent variable that the predictor variables
explain.
Adjusted R2 corrects R2 for the number of predictors in the model. It is often read as an
estimate of the R2 you would get if you used this model with a new data set.

o The coefficient of determination (R2) is a statistical metric that is used to measure
how much of the variation in the outcome can be explained by the variation in the
independent variables.
o R2 never decreases when a predictor is added to the model, even if that predictor is
unrelated to the outcome. R2 by itself therefore cannot be used to identify which
predictors should be included in a model and which should be excluded. R2 can only
be between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of
the independent variables and 1 indicates that the outcome can be predicted without
error from the independent variables.

Independent variable (predictor) - the variable that is used to calculate the dependent
variable or outcome.

Dependent variable (criterion variable) - in an equation, the variable whose value depends on one
or more other variables in the equation.
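
As a rough illustration of the R2 and adjusted R2 definitions above, the sketch below computes both from a set of observed and predicted values. The function names and the four data points are made up for this example, and the adjusted R2 formula used is the common textbook one, 1 - (1 - R2)(n - 1)/(n - p - 1), which the notes above do not state explicitly.

```python
def r_squared(y, y_hat):
    """Share of the variation in y that the predictions y_hat account for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                  # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R2 for using p predictors on n observations (textbook formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed values and model predictions (not from any real data set).
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 4), round(adj_r2, 4))
```

With these numbers R2 comes out at about 0.98 and adjusted R2 at about 0.97. The gap between the two widens as more predictors are used on the same number of observations.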

How does it work/ How to do it

The formula for multiple regression:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$

where, for observations $i = 1, \dots, n$:

$y_i$ = dependent variable

$x_{i1}, \dots, x_{ip}$ = independent variables; with $p$ predictor variables there are $p + 1$
regression parameters in total

$\beta_0$ = y-intercept (constant term)

$\beta_1, \dots, \beta_p$ = slope coefficients for each explanatory variable

$\epsilon$ = the model's error term (estimated from the data by the residuals)
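
To make the formula concrete, here is a minimal sketch of how the coefficients could be estimated by ordinary least squares, that is, by solving the normal equations (X'X)b = X'y. The notes do not prescribe an estimation method, so treat this as one common choice under that assumption. The function name `fit_ols` and the data are hypothetical, and only the Python standard library is used.

```python
def fit_ols(x_rows, y):
    """Estimate [b0, b1, ..., bp] by solving the normal equations (X'X) b = X'y."""
    # Prepend a 1 to each row so that b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in x_rows]
    k = len(X[0])  # p + 1 parameters

    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]

    # After elimination the left block is diagonal, so each b_i is a simple ratio.
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no error term.
x_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

coefficients = fit_ols(x_rows, y)
print([round(b, 6) for b in coefficients])
```

Because the example data contain no noise, the estimates should come back very close to the intercept 1 and the slopes 2 and 3 used to generate them. In practice a statistics package would be used instead of hand-written elimination.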
The multiple regression model is based on the following assumptions:

There is a linear relationship between the dependent variable and the independent variables
The independent variables are not too highly correlated with each other
$y_i$ observations are selected independently and randomly from the population
Residuals should be normally distributed with a mean of 0 and a constant variance $\sigma^2$

8 Steps to Multiple Regression Analysis

Following is a list of 8 steps that could be used to perform multiple regression analysis:

1. Identify a list of potential variables/features, both independent (predictor) and dependent
(response).

2. Gather data on the variables.

3. Check the relationship between each predictor variable and the response variable. This could
be done using scatterplots and correlations.

4. Check the relationship among the predictor variables. This could be done using scatterplots
and correlations. It is also termed a multicollinearity test.

5. Fit and analyze a simple linear regression between each predictor and the response variable.

6. Use the non-redundant predictor variables in the analysis. This is based on checking the
multicollinearity between each pair of predictor variables. If two predictors are highly
correlated, one may want to drop one of them.

7. Analyze one or more models based on some of the following criteria:

o t-statistics of one or more parameters: These are used to test the null hypothesis
that the parameter's value is equal to zero.
o p-value: This is the probability of seeing a result at least as extreme as the one
observed if the null hypothesis (no relationship between the dependent and
independent variable) were true. The smaller the p-value, the greater the
statistical significance of the parameter. This could, in turn, imply that there
exists a relationship between the dependent and independent variable.
o F-value: Tests whether the model as a whole fits the data better than a model
with no predictors.
o R2 (R squared) or adjusted R2: Measures the goodness of fit of the regression model.

8. Use the best-fitting model to make predictions based on the predictor (independent) variables.
The choice of model is based on the statistics mentioned above, such as the t-score, p-value,
R squared and F-value.
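
The t-statistic and F-value criteria above can be sketched with two small formulas. This is an illustrative sketch, not the source's procedure: it uses the standard relations t = coefficient / standard error and F = (R2 / p) / ((1 - R2) / (n - p - 1)), and every number passed in is hypothetical. Turning these statistics into p-values needs the t and F distributions, which is left to statistical software.

```python
def t_statistic(coefficient, standard_error):
    """Test statistic for the null hypothesis that the coefficient equals zero."""
    return coefficient / standard_error


def f_statistic(r2, n, p):
    """Overall F-value of a model with p predictors fitted on n observations."""
    explained_per_predictor = r2 / p
    unexplained_per_df = (1 - r2) / (n - p - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical inputs: a slope of 1.2 with standard error 0.4,
# and a model with R2 = 0.5 built from 4 predictors and 25 observations.
t_value = t_statistic(1.2, 0.4)
f_value = f_statistic(0.5, n=25, p=4)
print(round(t_value, 2), round(f_value, 2))
```

Larger absolute t-values and larger F-values correspond to smaller p-values, which is why they are read as stronger evidence against the respective null hypotheses.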
Techniques used in Multiple Regression Analysis

Following are some of the key techniques that could be used for multiple regression analysis:

Scatterplots: Scatterplots could be used to visualize the relationship between two variables.

Correlation analysis (also includes multicollinearity test): Correlation tests could be used to
find out the following:
o Whether the dependent and independent variables are related
o Whether the independent variables are related to each other. This is also termed
multicollinearity.
o Whether any two given variables are correlated or not.
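
A simple version of the correlation check described above can be sketched as follows. The predictor names and values are hypothetical, and the 0.8 cut-off for flagging a pair as highly correlated is only a rule-of-thumb assumption, not something the notes specify.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictor data (six observations each).
predictors = {
    "oil_price": [60, 62, 65, 70, 68, 75],
    "oil_futures": [61, 63, 67, 71, 70, 77],
    "interest_rate": [2.0, 1.5, 2.5, 1.0, 3.0, 2.0],
}

THRESHOLD = 0.8  # assumed rule-of-thumb cut-off for "too highly correlated"

# Compare every pair of predictors and keep the pairs above the cut-off.
flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson(predictors[a], predictors[b])) > THRESHOLD
]
print(flagged)
```

Here only the oil price and oil futures columns move closely together, so that pair is the one flagged as a candidate for dropping one of the two variables.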

Individual/group regressions: These are done to understand whether there is a relationship
between the dependent variable and each independent variable, given that the coefficients of
all the remaining independent variables are equal to 0.

Example: How to Use Multiple Linear Regression

As an example, an analyst may want to know how the movement of the market affects the price of
ExxonMobil (XOM). In this case, their linear equation will have the value of the S&P 500 index as the
independent variable, and the price of XOM as the dependent variable.

In reality, multiple factors predict the outcome of an event. The price movement of ExxonMobil, for
example, depends on more than just the performance of the overall market. Other predictors such as
the price of oil, interest rates, and the price movement of oil futures can affect the price of XOM and
stock prices of other oil companies. To understand a relationship in which more than two variables are
present, multiple linear regression is used.

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$

Referring to the MLR equation, in our example:

$y_i$ = dependent variable: the price of XOM

$x_{i1}$ = interest rates

$x_{i2}$ = oil price

$x_{i3}$ = value of S&P 500 index

$x_{i4}$ = price of oil futures

$\beta_0$ = y-intercept at time zero

$\beta_1$ = regression coefficient that measures the change in the dependent variable for a one-unit
change in $x_{i1}$: the change in XOM price when interest rates change

$\beta_2$ = coefficient value that measures the change in the dependent variable for a one-unit
change in $x_{i2}$: the change in XOM price when oil prices change

To get the information needed, statistical software is usually used. In this case we will use a
Microsoft Excel spreadsheet.

XOM price = 1.5 + 0.18 × interest rate + 1.15 × oil price – 0.4 × value of S&P 500 index – 0.09 × price of
oil futures

R-squared = 0.47 or 47%

An analyst would interpret this output to mean that, if the other variables are held constant, the price
of XOM will increase by 1.15% if the price of oil in the markets increases by 1% (the figures come from
random data points and are for illustration only). The model also shows that the price of XOM will
increase by 0.18% following a 1% rise in interest rates. R2 indicates that 47% of the variation in the
stock price of ExxonMobil can be explained by changes in the interest rate, oil price, oil futures, and
S&P 500 index.
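
The "other variables held constant" reading can be checked directly on the fitted equation above. In this sketch the coefficients are the illustrative ones from the example, while the function name `predict_xom` and the input values are hypothetical and chosen only to show the mechanics.

```python
def predict_xom(interest_rate, oil_price, sp500, oil_futures):
    """Fitted equation from the example: intercept 1.5 plus four weighted predictors."""
    return (
        1.5
        + 0.18 * interest_rate
        + 1.15 * oil_price
        - 0.4 * sp500
        - 0.09 * oil_futures
    )


# Hypothetical predictor values; only the oil price differs between the two calls.
base = predict_xom(interest_rate=2.0, oil_price=70.0, sp500=45.0, oil_futures=72.0)
bumped = predict_xom(interest_rate=2.0, oil_price=71.0, sp500=45.0, oil_futures=72.0)

# The difference isolates the oil price coefficient.
print(round(bumped - base, 2))
```

Raising the oil price by one unit while leaving the other three inputs unchanged moves the prediction by the oil price coefficient, 1.15, regardless of the starting values.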

Additional:

To calculate a multiple linear regression using an online calculator:

https://stats.blue/Stats_Suite/multiple_linear_regression_calculator.html

Advantages of Multiple Regression

The first advantage is the ability to determine the relative influence of one or more
independent variables on the dependent variable.
o For example, a real estate agent could find that the size of the homes and
the number of bedrooms have a strong correlation to the price of a home,
while the proximity to schools has no correlation at all, or even a negative
correlation if it is primarily a retirement community.

The second advantage is the ability to identify outliers, or anomalies.

o For example, while reviewing the data related to management salaries, the
human resources manager could find that the number of hours worked, the
department size and its budget all had a strong correlation to salaries, while
seniority did not. Alternatively, it could be that all of the listed predictor
values were correlated to each of the salaries being examined, except for one
manager who was being overpaid compared to the others.

Disadvantages of Multiple Regression

Just as with simple regression, multiple regression will not be good at explaining the
relationship of the independent variables to the dependent variable if those relationships
are not linear.
Any disadvantage of using a multiple regression model usually comes down to the data being
used. Two common pitfalls are using incomplete data and falsely concluding that a correlation
is a causation.

o When reviewing the price of homes, for example, suppose the real estate
agent looked at only 10 homes, seven of which were purchased by young
parents. In this case, the apparent relationship between the proximity of
schools and the sale price may lead her to believe that proximity had an effect
on the sale price for all homes being sold in the community. This illustrates
the pitfalls of incomplete data. Had she used a larger sample, she could have
found that, out of 100 homes sold, only ten percent of the home values were
related to a school's proximity. If she had used the buyers' ages as a predictor
value, she could have found that younger buyers were willing to pay more for
homes in the community than older buyers.

Impact on the industry / Application / Latest statistical report of multiple regression

It can be applied to predict the expected crop yield from factors such as rainfall,
temperature and fertilizer level.
It can be used to find the connection between the GPA of a class of students and their number
of study hours and their height. Here the dependent variable is GPA, and the number of study
hours and the students' heights are the explanatory variables.
It can be used to relate the salary of a batch of executives in a company to their years of
experience and age. Here, the dependent variable for this regression is the salary of the
executives, and the experience and age of the executives are the independent variables.
It is widely used in anticipating trends and future values/events, for example the rain forecast
for the coming days, or the price of gold/silver in the coming months.
It can be used to identify the relationship between the distance covered by a cab driver
(dependent variable) and the age of the driver and years of experience (independent variables).

Additional learning materials:

https://www.youtube.com/watch?v=3EokKw3eg78&t=107s

Sources:

https://www.investopedia.com/terms/m/mlr.asp

https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php

https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/multiple-regression/

https://us.sagepub.com/sites/default/files/upm-assets/78103_book_item_78103.pdf

https://sciencing.com/calculate-odds-ratio-contingency-table-8782587.html

https://www.analyticssteps.com/blogs/multiple-linear-regression

https://vitalflux.com/data-science-8-steps-to-multiple-regression-analysis/
