Lecture 2

Multiple linear regression utilizes two or more independent variables to explain the variation in a dependent variable, potentially improving predictions and understanding of financial factors. Analysts must carefully specify the model, select relevant variables, and ensure underlying assumptions are met to avoid spurious relationships. Tools like AIC and BIC help assess model quality and prevent overfitting, guiding analysts in selecting the most parsimonious model for their data.


Basics of Multiple Regression
• Multiple linear regression uses two or more independent variables to describe
the variation of the dependent variable rather than just one independent
variable, as in simple linear regression.
• It allows the analyst to estimate more complex models with multiple
explanatory variables and, if used correctly, may lead to better predictions,
better portfolio construction, or a better understanding of the drivers of
security returns.
• If used incorrectly, however, multiple linear regression may yield spurious
relationships, lead to poor predictions, and offer a poor understanding of
relationships.
• The analyst must first specify the model, making several decisions in the
process and answering the following questions, among others:
• What is the dependent variable of interest?
• What independent variables are important?
• What form should the model take?
• What is the goal of the model—prediction or understanding of the
relationship?
• The analyst specifies the dependent and independent variables and then
employs software to estimate the model and produce related statistics.
• The good news is that the software, such as that shown in Exhibit 1, does the
estimation; our primary tasks are to specify the model and to interpret the
software's output, which are the main subjects of this content.
USES OF MULTIPLE LINEAR REGRESSION

• There are many investment problems in which the analyst needs to consider the
impact of multiple factors on the subject of research rather than a single factor.
• In the complex world of investments, it is intuitive that explaining or
forecasting a financial variable by a single factor may be insufficient.
• The complexity of financial and economic relations calls for models with
multiple explanatory variables, subject to fundamental justification and various
statistical tests.
Examples of how multiple regression may be used include the following:

• A portfolio manager wants to understand how returns are influenced by a
set of underlying factors: the size effect, the value effect, profitability, and
investment aggressiveness. The goal is to estimate a Fama–French five-factor
model that will provide an understanding of the factors that are important for
driving a particular stock's excess returns.
• A financial adviser wants to identify whether certain variables, such as
financial leverage, profitability, revenue growth, and changes in market share,
can predict whether a company will face financial distress.
• An analyst wants to examine the effect of different dimensions of country
risk, such as political stability, economic conditions, and environmental, social,
and governance (ESG) considerations, on equity returns in that country.
• Multiple regression can be used to identify relationships between variables, to test
existing theories, or to forecast. There are many decisions that the analyst must make in
this process.
• For example, if the dependent variable is continuous, such as returns, the traditional
regression model is typically the first step. If, however, the dependent variable is
discrete, such as an indicator variable for whether a company is a takeover target,
then, as we shall see, the model may be estimated as a logistic regression (see the
sketch after this list).
• In either case, the process of determining the best model follows a similar path. The model
must first be specified, including independent variables that may be continuous, such as
company financial features, or discrete (i.e., dummy variables), indicating membership in a
class, such as an industry sector.
• Next, the regression model is estimated and analyzed to ensure it satisfies key underlying
assumptions and meets the analyst’s goodness-of-fit criteria.
• Once the model is tested and its out-of-sample performance is deemed acceptable, then it
can be used for further identifying relationships between variables, for testing existing
theories, or for forecasting.
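
To make this choice concrete, here is a minimal Python sketch using synthetic data and the statsmodels package (an assumption; the lecture does not name any particular software): ordinary least squares when the dependent variable is continuous, and logistic regression when it is binary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = sm.add_constant(rng.normal(size=(200, 3)))  # intercept plus 3 independent variables

# Continuous dependent variable (e.g., returns): estimate by ordinary least squares.
y_cont = X @ np.array([0.1, 0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=200)
ols_fit = sm.OLS(y_cont, X).fit()

# Discrete (binary) dependent variable (e.g., takeover target or not): logistic regression.
y_bin = (X @ np.array([0.0, 1.0, -1.0, 0.5]) + rng.logistic(size=200) > 0).astype(int)
logit_fit = sm.Logit(y_bin, X).fit(disp=False)

print(ols_fit.params)    # estimated intercept and slope coefficients
print(logit_fit.params)  # estimated log-odds coefficients
```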
THE BASICS OF MULTIPLE REGRESSION

• The goal of simple regression is to explain the variation of the dependent
variable, Y, using the variation of an independent variable, X.
• The goal of multiple regression is the same, to explain the variation of the
dependent variable, Y, but using the variations in a set of independent
variables, X1, X2, ..., Xk.
• Recall the variation of Y is

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2,$$

which we also refer to as the sum of squares total.
• When we introduce additional independent variables to help explain the
variation of the dependent variable, we have the multiple regression equation:

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + \varepsilon_i$$

In this equation, the terms involving the k independent variables are the
deterministic part of the model, whereas the error term, εi, is the stochastic or
random part of the model. The model is estimated over n observations, where n
must be larger than k.
• It is important to note that a slope coefficient in a multiple regression, known
as a partial regression coefficient or a partial slope coefficient, must be
interpreted with care.
• A partial regression coefficient, bj, describes the impact of that independent
variable on the dependent variable, holding all the other independent variables
constant.
• For example, in the multiple regression equation

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \varepsilon_i,$$

the coefficient b2 measures the change in Y for a one-unit change in X2,
assuming X1 and X3 are held constant (see the sketch below).
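
To illustrate, here is a minimal Python sketch (synthetic data; statsmodels assumed, as the lecture names no package) confirming that the fitted coefficient on X2 equals the change in the predicted Y when X2 increases by one unit while X1 and X3 are held fixed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                                  # X1, X2, X3
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=500)

fit = sm.OLS(y, sm.add_constant(X)).fit()
b2 = fit.params[2]                                             # partial slope on X2

x_base = np.array([[1.0, 0.2, 0.0, -0.4]])                     # const, X1, X2, X3
x_bump = x_base.copy()
x_bump[0, 2] += 1.0                                            # raise X2 by one unit only

# The change in the prediction equals b2 because X1 and X3 were held constant.
print(fit.predict(x_bump) - fit.predict(x_base), b2)
```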
• Consider an estimated regression equation in which the monthly excess returns
of a bond index (RET) are regressed against the change in monthly government
bond yields (BY) and the change in investment-grade credit spreads (CS).
The estimated regression, using 60 monthly observations, is

$$RET_t = 0.0023 - 5.0585\,BY_t - 2.1901\,CS_t.$$
We learn the following from this regression:

1. The bond index RET yields, on average, 0.0023% per month, or
approximately 0.028% per year, if the changes in the government bond yields
and investment-grade credit spreads are zero.

2. The change in the bond index return for a given one-unit change in the
monthly government bond yield, BY, is –5.0585%, holding CS constant.
3. If the investment-grade credit spreads, CS, increase by one unit, the bond
index returns change by –2.1901%, holding BY constant.
4. For a month in which the change in the credit spreads is 0.001 and the change
in the government bond yields is 0.005, the expected excess return on the bond
index is 0.0023 + (−5.0585)(0.005) + (−2.1901)(0.001) ≈ −0.0252%.
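
This arithmetic can be checked with a few lines of Python:

```python
# Plugging the reported coefficients into the estimated equation.
b0, b_by, b_cs = 0.0023, -5.0585, -2.1901
expected_ret = b0 + b_by * 0.005 + b_cs * 0.001   # changes in BY and CS for the month
print(round(expected_ret, 4))                      # -0.0252, i.e., about -0.0252%
```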
ASSUMPTIONS UNDERLYING MULTIPLE LINEAR REGRESSION

• Linearity: The relationship between the dependent variable and the
independent variables is linear.
• Homoskedasticity: The variance of the regression residuals is the same for all
observations.
• Independence of errors: The observations are independent of one another.
This implies the regression residuals are uncorrelated across observations.
• Normality: The regression residuals are normally distributed.
• Independence of independent variables:
• a. Independent variables are not random.
• b. There is no exact linear relation between two or more of the independent
variables or combinations of the independent variables.
• The independence assumption is needed to enable the estimation of the
coefficients.

• If there is an exact linear relationship between independent variables, the
model cannot be estimated.
• In the more common case of approximate linear relationships, which may be
indicated by significant pairwise correlations between the independent
variables, the model can be estimated, but its interpretation is problematic.
• Regression software produces diagnostic plots, which are a useful tool for
detecting potential violations of the assumptions underlying multiple linear
regression.
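
As a sketch of two such diagnostic plots in Python (statsmodels and matplotlib assumed; the data are synthetic stand-ins): residuals versus fitted values checks linearity and homoskedasticity, and a normal Q-Q plot checks normality of the residuals.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Synthetic stand-in for monthly excess returns regressed on three factors.
rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(120, 3)))          # 120 months, 3 factors
y = X @ np.array([0.2, 1.0, 0.4, 0.3]) + rng.normal(scale=0.5, size=120)
fit = sm.OLS(y, X).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. fitted values: a patternless band suggests linearity and
# homoskedasticity; a funnel shape would suggest heteroskedasticity.
ax1.scatter(fit.fittedvalues, fit.resid, s=10)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")

# Normal Q-Q plot: points near the 45-degree line suggest roughly normal residuals.
sm.qqplot(fit.resid, line="45", fit=True, ax=ax2)
plt.tight_layout()
plt.show()
```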
• To illustrate the use of such plots, we first estimate a regression to analyze 10
years of monthly total excess returns of ABC stock using the Fama–French
three-factor model. As noted previously, this model uses market excess return
(MKTRF), size (SMB) and value (HML) as explanatory variables.

• We start our analysis by generating a scatterplot matrix using software. This
matrix is also referred to as a pairs plot.
• Looking at the scatterplots between the independent variables, SMB and HML
have little or no correlation, as indicated by the relatively flat line for the
SMB–HML pair. This is a desirable characteristic between explanatory
variables.
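
A minimal sketch of generating such a pairs plot in Python (pandas and matplotlib assumed; the factor series here are synthetic stand-ins, as real MKTRF, SMB, and HML data would come from a factor-data provider):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Synthetic stand-in for 120 months of factor returns.
rng = np.random.default_rng(1)
factors = pd.DataFrame(rng.normal(scale=0.04, size=(120, 3)),
                       columns=["MKTRF", "SMB", "HML"])

scatter_matrix(factors, figsize=(8, 8), diagonal="hist")
plt.show()

# Near-zero pairwise correlation between SMB and HML is the desirable case.
print(factors.corr())
```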
Goodness of Fit
• In the simple regression model, the coefficient of determination, also known
as R-squared or R2, is a measure of the goodness of fit of an estimated
regression to the data.
• R2 can also be defined in multiple regression as the ratio of the variation of the
dependent variable explained by the independent variables (sum of squares
regression) to the total variation of the dependent variable (sum of squares
total).
• In multiple linear regression, however, R2 is less appropriate as a measure
of a model's goodness of fit.
• This is because as independent variables are added to the model, R2
will increase or stay the same, but it will not decrease. Problems with
using R2 in multiple regression include the following:
• The R2 cannot provide information on whether the coefficients are
statistically significant.
• The R2 cannot provide information on whether there are biases in the
estimated coefficients and predictions.
• The R2 cannot tell whether the model fit is good. A good model may have
a low R2, as in many asset-pricing models, and a bad model may have a
high R2 due to overfitting and biases in the model.
• Overfitting of a regression model is a situation in which the model is too complex,
meaning there may be too many independent variables relative to the number of
observations in the sample.
• A result of overfitting is that the coefficients on the independent variables may not
represent true relationships with the dependent variable.
• An alternative measure of goodness of fit is the adjusted R2 (often written R̄2), which is
typically part of the multiple regression output produced by most statistical
software packages.
• A benefit of using the adjusted R2 is that it does not automatically increase when
another independent variable is added to a regression. This is because it adjusts for
the degrees of freedom, as follows, where k is the number of independent variables:

$$\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)(1 - R^2)$$
• The following are two key observations about adj. R2 when adding a new
variable to a regression:
• If the coefficient's t-statistic is greater than 1.0 in absolute value, adj. R2 increases.
• If the coefficient's t-statistic is less than 1.0 in absolute value, adj. R2 decreases.
• Note that a t-statistic with an absolute value of 1.0 does not indicate the
coefficient is significantly different from zero at typical levels of significance, 5%
and 1%.
• So, adjusted R2 does not set a very high bar for the statistic to increase.
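
A minimal sketch of the adjusted R2 calculation from the formula above (Python; the numbers are purely illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Example: R-squared of 0.60 with 60 observations and 5 independent variables.
print(adjusted_r2(0.60, n=60, k=5))  # about 0.563, slightly below the raw 0.60
```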
• Consider the regression output provided in Exhibit 1, which shows the results
from the regression of portfolio returns on the returns for five hypothetical
fundamental factors, which we shall call Factors 1 through 5.
• The goal of this regression is to identify the factors that best explain the returns
on the portfolio.
AIC and BIC
• As both the R2 and adjusted R2 may increase when we add independent variables, we
risk model overfitting. Fortunately, there are several statistics to help compare model
quality and identify the most parsimonious model, including two statistics more
commonly known by their acronyms, AIC and BIC.
• We can use Akaike’s information criterion (AIC) to evaluate a collection of models
that explain the same dependent variable. It is often provided in the output of
regression software, but AIC can also be calculated using information in the regression
output, where SSE is the sum of squares error:

$$\text{AIC} = n \ln\left(\frac{\text{SSE}}{n}\right) + 2(k+1)$$

AIC is a measure of model parsimony, so a lower AIC indicates a better-fitting
model. The term 2(k + 1) is the penalty assessed for adding independent variables to
the model.
In a similar manner, Schwarz’s Bayesian information criterion (BIC or SBC)
allows comparison of models with the same dependent variable, as follows:

$$\text{BIC} = n \ln\left(\frac{\text{SSE}}{n}\right) + \ln(n)(k+1)$$
• Compared to AIC, BIC assesses a greater penalty for having more
parameters in a model, so it will tend to prefer smaller, more parsimonious
models. This is because ln(n) is greater than 2, even for very small sample
sizes.

• Practically speaking, AIC is preferred if the model is used for prediction
purposes, but BIC is preferred when the best goodness of fit is desired.

• Importantly, the value of either measure considered alone is meaningless;
the relative values of AIC or BIC among a set of candidate models are what
really matter.
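
A minimal sketch of computing and comparing AIC and BIC from the formulas above (Python; the SSE values are hypothetical):

```python
import math

def aic(n: int, k: int, sse: float) -> float:
    """Akaike's information criterion: n * ln(SSE/n) + 2(k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n: int, k: int, sse: float) -> float:
    """Schwarz's Bayesian information criterion: n * ln(SSE/n) + ln(n)(k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical comparison: a sixth variable barely lowers SSE, so both criteria
# still prefer the five-variable model (lower value = better).
print(aic(60, 5, 1.50), aic(60, 6, 1.49))
print(bic(60, 5, 1.50), bic(60, 6, 1.49))
```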
