QUANTITATIVE METHODS www.dbf-finance.com

Reading Number    Reading Title           Study Session
5                 Multiple Regression     2
6                 Time-Series Analysis
7                 Machine Learning
Luis M. de Alfonso
CFA® Preparation
Multiple Regression
Study Session 2
Reading Number 5
QM – Multiple Regression www.dbf-finance.com
LOS 5.a: Formulate a multiple regression equation to describe the relation between a dependent variable and several independent variables and determine
the statistical significance of each independent variable
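Multiple regression: regression analysis with more than one independent variable
Regression equation: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i$
Residuals: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \dots + \hat{b}_k X_{ki})$
Multiple regression methodology estimates the intercept term and slope coefficients such that the sum of
the squared error terms is minimized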
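A minimal sketch of the least-squares idea, not the method used in the slides: it solves the normal equations (X'X)b = X'y in plain Python. The data are made up, and the helper names `solve` and `fit_ols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, y):
    """Return [b0, b1, ..., bk], the coefficients minimizing the sum of squared residuals."""
    rows = [[1.0] + list(x) for x in xs]  # prepend a column of ones for the intercept
    k1 = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k1)] for i in range(k1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k1)]
    return solve(xtx, xty)


# Illustrative data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise)
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

coefs = fit_ols(xs, y)
residuals = [yi - (coefs[0] + coefs[1] * x1 + coefs[2] * x2)
             for (x1, x2), yi in zip(xs, y)]
print([round(c, 6) for c in coefs])
```

Because the illustrative data contain no noise, the fitted coefficients recover the generating values and the residuals are essentially zero; with real data the residuals would not vanish.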
Intercept term: the value of the dependent variable when the independent variables are all equal to zero
Estimated slope coefficient: each slope coefficient is the estimated change in the dependent variable for a one-unit change in
the independent variable, holding the other independent variables constant (that is why, in
multiple regression, they are called partial slope coefficients)
When a new independent variable is added to a regression equation, the slope coefficients of the previous variables
normally change (unless the new variable is uncorrelated with the previous ones)
LOS 5.c: Formulate a null and an alternative hypothesis about the population value of a regression coefficient, calculate the value of the test statistic, and
determine whether to reject the null hypothesis at a given level of significance
LOS 5.d: Interpret the results of hypothesis tests of regression coefficients
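t-test: $t = \dfrac{\hat{b}_j - b_j}{s_{\hat{b}_j}}$ with $n - k - 1$ degrees of freedom, where $b_j$ is the hypothesized value and $s_{\hat{b}_j}$ is the standard error of the coefficient
Process: compare the t-statistic with the critical t-value; reject the null hypothesis if $|t|$ exceeds the critical value (two-tailed test)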
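A short sketch of the t-test on one coefficient, using the standard form t = (estimate - hypothesized value) / standard error with n - k - 1 degrees of freedom. The coefficient, standard error and sample size are hypothetical; the critical value is the two-tailed 5% value for 40 degrees of freedom, read from a t-table.

```python
def t_statistic(b_hat, se, hypothesized=0.0):
    """t = (estimated coefficient - hypothesized value) / coefficient standard error."""
    return (b_hat - hypothesized) / se


def reject_null(t_stat, t_critical):
    """Two-tailed test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


# Hypothetical inputs: 43 observations, 2 independent variables
n, k = 43, 2
df = n - k - 1                      # degrees of freedom for the t-test

t_stat = t_statistic(b_hat=0.25, se=0.10)
t_critical = 2.021                  # from a t-table: 5% two-tailed, 40 df
significant = reject_null(t_stat, t_critical)
print(df, round(t_stat, 3), significant)
```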
Interpreting p-values
p-value: the smallest level of significance for which the null hypothesis can be rejected
Ø If the p-value is less than the significance level, the null hypothesis can be rejected
Ø If the p-value is greater than the significance level, the null hypothesis cannot be rejected
LOS 5.e: Calculate and interpret 1) a confidence interval for the population value of a regression coefficient and 2) a predicted value for the dependent
variable, given an estimated regression model and assumed values for the independent variables
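Confidence interval: $\hat{b}_j \pm (t_c \times s_{\hat{b}_j})$, where $t_c$ is the critical two-tailed t-value with $n - k - 1$ degrees of freedom
$s_{\hat{b}_j}$: standard error of the regression coefficient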
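The two calculations named in LOS 5.e can be sketched as follows, assuming the usual interval (estimate plus or minus critical t times standard error) and a prediction obtained by plugging assumed values into the estimated equation. All numbers and the function names are hypothetical.

```python
def confidence_interval(b_hat, se, t_critical):
    """Interval b_hat +/- t_critical * se for one regression coefficient."""
    half_width = t_critical * se
    return (b_hat - half_width, b_hat + half_width)


def predict(intercept, slopes, x_values):
    """Predicted Y = intercept + sum of (slope_j * assumed X_j)."""
    return intercept + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical coefficient estimate, standard error and critical t (5%, 40 df)
low, high = confidence_interval(b_hat=0.25, se=0.10, t_critical=2.021)

# Hypothetical estimated model Y = 1.0 + 0.5*X1 - 0.2*X2 with X1 = 4, X2 = 10
y_hat = predict(intercept=1.0, slopes=[0.5, -0.2], x_values=[4.0, 10.0])
print(round(low, 4), round(high, 4), round(y_hat, 4))
```

If the interval does not contain zero, the coefficient is significantly different from zero at the corresponding significance level, which matches the conclusion of the two-tailed t-test.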
LOS 5.g: Calculate and interpret the F-statistic, and describe how it is used in regression analysis
F-Test
Ø An F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable
Ø The F-test is used to test whether at least one of the independent variables explains a significant portion of the variation of the
dependent variable
Hypothesis tested: $H_0: b_1 = b_2 = \dots = b_k = 0$ versus $H_a$: at least one $b_j \neq 0$
F-Statistic: $F = \dfrac{MSR}{MSE} = \dfrac{RSS / k}{SSE / (n - k - 1)}$ (RSS = regression sum of squares, SSE = sum of squared errors)
Process:
1.- Calculate the F-statistic (using ANOVA table data)
2.- Find the critical F-value using the F table and:
df numerator = k (number of independent variables)
df denominator = n – k – 1
3.- Compare the F-statistic with the critical F-value
Always a one-tailed test
If F (test statistic) > Fc (critical value), then $H_0$ is rejected, which means that at least one of the slope coefficients is significantly
different from zero: at least one of the independent variables in the regression model makes a significant contribution to the
explanation of the dependent variable
If you are asked to test all the coefficients simultaneously, use the F-test (do not run a separate t-test on each coefficient)
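The three-step process can be sketched as below, assuming the standard definition F = MSR / MSE with MSR = RSS / k and MSE = SSE / (n - k - 1), where RSS is the regression (explained) sum of squares. The ANOVA inputs are hypothetical; the critical value is the 5% value for F(2, 40) read from an F table.

```python
def f_statistic(rss, sse, n, k):
    """F = MSR / MSE, with MSR = RSS / k and MSE = SSE / (n - k - 1)."""
    msr = rss / k
    mse = sse / (n - k - 1)
    return msr / mse


# Step 1: F-statistic from hypothetical ANOVA data (43 observations, 2 variables)
rss, sse, n, k = 80.0, 20.0, 43, 2
f_stat = f_statistic(rss, sse, n, k)

# Step 2: critical value with df numerator = k = 2 and df denominator = n - k - 1 = 40
f_critical = 3.23  # from an F table, 5% significance, one-tailed

# Step 3: compare; rejecting H0 means at least one slope coefficient differs from zero
reject_h0 = f_stat > f_critical
print(f_stat, reject_h0)
```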
LOS 5.h: Distinguish between and interpret the $R^2$ and adjusted $R^2$ in multiple regression
Coefficient of determination $R^2$: percentage of the variation in the dependent variable (Y) that is explained by the set of
independent variables
$R^2$ increases as the number of independent variables increases, even when the marginal contribution of the new variables is not statistically significant,
so it may not be a reliable measure of the explanatory power of the multiple regression model
This problem is referred to as “overestimating the regression”
Adjusted $R^2$ ($R_a^2$): $R_a^2 \le R^2$
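A small illustration of why the adjusted measure is preferred, assuming the standard formula adjusted R² = 1 - [(n - 1) / (n - k - 1)] × (1 - R²). The R² values and sample size are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - ((n - 1) / (n - k - 1)) * (1 - R^2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


n = 30  # hypothetical number of observations

# Model with 2 independent variables
r2_two, adj_two = 0.600, adjusted_r2(0.600, n, k=2)

# Adding a third, weak variable nudges R^2 up slightly
r2_three, adj_three = 0.605, adjusted_r2(0.605, n, k=3)

print(round(adj_two, 4), round(adj_three, 4))
```

In this made-up case R² rises from 0.600 to 0.605 when the third variable is added, yet adjusted R² falls, signalling that the extra variable does not pull its weight.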
LOS 5.i: Evaluate how well a regression model explains the dependent variable by analyzing the output of the regression equation and an ANOVA table
ANOVA table (analysis of variance): ANOVA is a statistical procedure that provides information on the explanatory power of a regression
From the ANOVA table we can calculate $R^2$, the F-statistic, and the standard error of estimate (SEE)
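As a rough sketch, the three quantities can be derived from the sums of squares in an ANOVA table, assuming the usual relationships SST = RSS + SSE, R² = RSS / SST, SEE = sqrt(MSE) and F = MSR / MSE. The inputs are hypothetical and `anova_summary` is an illustrative name.

```python
import math


def anova_summary(rss, sse, n, k):
    """Derive R^2, SEE and F from the regression and error sums of squares."""
    sst = rss + sse                 # total variation
    mse = sse / (n - k - 1)         # mean squared error
    msr = rss / k                   # mean regression sum of squares
    return {"r2": rss / sst, "see": math.sqrt(mse), "f": msr / mse}


# Hypothetical ANOVA table: RSS = 80, SSE = 20, 43 observations, 2 variables
summary = anova_summary(rss=80.0, sse=20.0, n=43, k=2)
print({name: round(value, 4) for name, value in summary.items()})
```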
LOS 5.j: Formulate a multiple regression equation by using dummy variables to represent qualitative factors and interpret the coefficients and regression results
Dummy variables: binary variables (“1” or “0”) that are often used to quantify the impact of qualitative events; also known
as “qualitative independent variables”
Consider the following regression equation for explaining quarterly EPS (earnings per share) in terms of the quarter of their occurrence:
$EPS_t = b_0 + b_1 Q_{1t} + b_2 Q_{2t} + b_3 Q_{3t}$, where $Q_{1t}$ = 1 if period t is the first quarter and 0 otherwise (likewise for $Q_{2t}$ and $Q_{3t}$)
When performing multiple regression with dummy variables, whenever we want to distinguish between n classes, we must use n – 1 dummy
variables. Otherwise the regression assumption of no exact linear relationship between independent variables would be violated
e.g. If we are studying quarters (there are 4 quarters), we use 3 dummy variables
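A minimal sketch of the n – 1 rule for quarters, with the fourth quarter as the omitted (reference) class. The function name `quarter_dummies` is hypothetical. The last lines show why a fourth dummy would break the no-exact-linear-relationship assumption: the four dummies would always sum to 1, which duplicates the intercept column.

```python
def quarter_dummies(quarter):
    """Return (Q1, Q2, Q3) for a quarter in 1..4; quarter 4 is the omitted class."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))


encoded = {q: quarter_dummies(q) for q in (1, 2, 3, 4)}
print(encoded)

# With FOUR dummies, every row would sum to 1, exactly matching the intercept's column of ones
four_dummies = [tuple(1 if quarter == q else 0 for q in (1, 2, 3, 4)) for quarter in (1, 2, 3, 4)]
always_one = all(sum(row) == 1 for row in four_dummies)
```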
Example:
$EPS_t = b_0 + b_1 Q_{1t} + b_2 Q_{2t} + b_3 Q_{3t}$
$EPS_t = 1.25 + 0.75\,Q_{1t} - 0.20\,Q_{2t} + 0.10\,Q_{3t}$
Average fourth quarter EPS = 1.25
Average first quarter EPS = 1.25 + 0.75 = 2.00
Average second quarter EPS = 1.25 – 0.20 = 1.05
Average third quarter EPS = 1.25 + 0.10 = 1.35
$b_0$: average value of the dependent variable (EPS) for the fourth quarter (dummy variable omitted)
$b_1$: difference in earnings per share between quarter 1 and quarter 4 (omitted) (0.75 = 2.00 – 1.25)
$b_2$: difference in earnings per share between quarter 2 and quarter 4 (omitted) (-0.20 = 1.05 – 1.25)
$b_3$: difference in earnings per share between quarter 3 and quarter 4 (omitted) (0.10 = 1.35 – 1.25)
“Slope coefficient is interpreted as the change in the dependent variable for the case when the dummy variable is one”
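The quarterly averages in the example can be reproduced by plugging each quarter's dummy values into the estimated equation. This is only a sketch: the coefficients are the ones from the example, while the function name `predicted_eps` is illustrative.

```python
# Estimated coefficients from the example: intercept and the three quarter dummies
b0, b1, b2, b3 = 1.25, 0.75, -0.20, 0.10


def predicted_eps(quarter):
    """Average EPS implied by the model for a given quarter (quarter 4 is omitted)."""
    q1, q2, q3 = (1 if quarter == q else 0 for q in (1, 2, 3))
    return b0 + b1 * q1 + b2 * q2 + b3 * q3


averages = {q: round(predicted_eps(q), 2) for q in (1, 2, 3, 4)}
print(averages)
```

For the fourth quarter all three dummies are zero, so the prediction collapses to the intercept, which is why the intercept is read as the average EPS of the omitted class.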
Ø As with all multiple regression results, the F-statistic for the set of coefficients and the $R^2$ should be evaluated to determine if the dummy
variables, individually or collectively, contribute to the explanation of the dependent variable
Ø We can also perform the t-test for each slope coefficient using the following null hypothesis:
$H_0: b_1 = 0$ tests whether fourth quarter EPS = first quarter EPS (recall that $b_1$ is the difference in earnings per share between
quarter 1 and quarter 4 (omitted))
Recall again: $H_0: b_1 = 0$ tests whether fourth quarter EPS = first quarter EPS
$b_1$ is the difference in earnings per share between quarter 1 and quarter 4 (omitted)
Interpretation:
$b_3 > 0$: the dependent variable Y is higher ($b_3$ units higher) when dummy variable $D_3$ takes value “1” than when it takes value “0”
e.g.
If $b_3 > 0$ and is statistically significant, the loan spread (Y) is higher when the loan is used for corporate restructuring than when it is used for
other purposes
1. A linear relationship exists between the dependent variable and the independent variables
2. The independent variables are not random, and there is no exact linear relation between any two or more
independent variables
5. The error for one observation is not correlated with that of another observation
Recall:
Ø $s_{\hat{b}_j}$ is the standard error for coefficient j and is calculated using the standard error of estimate (SEE), which is the standard deviation
of the error term.
Any violation of an assumption that affects the error term will affect the coefficient standard error
Consequently, this will affect the t-statistic and F-statistic and any conclusions drawn from hypothesis tests involving these statistics
LOS 5.k: Explain the types of heteroskedasticity and how heteroskedasticity and serial correlation affect statistical inference
Heteroskedasticity
Serial Correlation (Autocorrelation)
Multicollinearity
Ø We need to know, for each of them:
1.- What is it?
2.- Effects
3.- Detection
4.- Correction
Heteroskedasticity
1.- What is heteroskedasticity: the variance of the residuals is not the same across all observations
Ø Unconditional heteroskedasticity: when heteroskedasticity is not related to the level of the independent
variables (causes no major problems with the regression)
Ø Conditional heteroskedasticity: when heteroskedasticity is related to the level of the independent
variables (it creates significant problems for statistical inference)
(Figure: conditional heteroskedasticity)
b. The coefficient estimates ($\hat{b}_j$) are not affected (coefficients are consistent)
c. If the standard error is too small (standard error underestimated) and the coefficient $\hat{b}_j$ is not affected, the t-statistic
will be too large and the null hypothesis will be rejected too often (Type I error: rejection of the null hypothesis
when it is actually true)
Breusch-Pagan (BP) test: $BP = n \times R^2_{resid}$, chi-square distributed with k degrees of freedom (k = number of independent variables)
$R^2_{resid}$ = $R^2$ from a second regression (squared residuals vs. independent variables)
Ø Formulate a regression where the dependent variable is the squared errors and the independent variables are the original ones (we try to see if
there is a relationship between the independent variables and the squared residuals)
Ø If the calculated BP $\chi^2$ is greater than the critical $\chi^2$ value (obtained from the table), then the null hypothesis is rejected: THERE IS CONDITIONAL
HETEROSKEDASTICITY BECAUSE THE INDEPENDENT VARIABLES SIGNIFICANTLY CONTRIBUTE TO THE EXPLANATION OF THE SQUARED RESIDUALS
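A hedged sketch of the detection step, assuming the standard Breusch-Pagan form BP = n × R² of the second regression (squared residuals on the independent variables), compared with a chi-square critical value with k degrees of freedom. The sample size, the second-regression R² and the number of variables are hypothetical; the critical value is the 5% chi-square value for 2 degrees of freedom from a table.

```python
def breusch_pagan(n, r2_resid):
    """BP statistic = n * R^2 from regressing squared residuals on the independent variables."""
    return n * r2_resid


# Hypothetical inputs: 60 observations, 2 independent variables, R^2_resid = 0.12
n, k = 60, 2
bp_stat = breusch_pagan(n, r2_resid=0.12)

chi2_critical = 5.991  # from a chi-square table: 5% significance, k = 2 degrees of freedom

# One-tailed test: a large BP value signals conditional heteroskedasticity
conditional_heteroskedasticity = bp_stat > chi2_critical
print(round(bp_stat, 2), conditional_heteroskedasticity)
```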
4.- Correcting heteroskedasticity: use “White-corrected” standard errors (also called robust standard errors)
Robust standard errors are then used to recalculate the t-statistics using the original regression coefficients
1.- What is serial correlation: residual terms are correlated with one another (a common problem with time series data)
Ø Positive serial correlation: when a positive regression error in one time period increases the probability of
observing a positive regression error for the next time period
Ø Negative serial correlation: when a positive regression error in one time period increases the probability of
observing a negative regression error for the next time period
a. The coefficient estimates ($\hat{b}_j$) are not affected (coefficients are consistent)
b. With positive serial correlation, standard errors are often underestimated (too many Type I errors: rejection of the null
hypothesis when it is actually true)
b. Durbin-Watson statistic (DW): $DW \approx 2(1 - r)$, where r = correlation coefficient between residuals from one
period and those from the previous period
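A brief sketch of the statistic itself, assuming the standard definition DW = sum of squared changes in consecutive residuals divided by the sum of squared residuals (the slide gives only the approximation). The residual series are made up. Values well below 2 point to positive serial correlation and values well above 2 to negative serial correlation.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that drift slowly: neighbours tend to share the same sign
trending = [1.0, 0.8, 0.6, 0.2, -0.3, -0.7, -0.9, -0.5, 0.1, 0.6]
dw_trending = durbin_watson(trending)

# Made-up residuals that flip sign every period
alternating = [1.0, -1.0, 1.0, -1.0]
dw_alternating = durbin_watson(alternating)

print(round(dw_trending, 3), dw_alternating)
```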
4.- Correcting serial correlation: use the Hansen method to adjust standard errors (also called “Hansen-White
standard errors” or “serial correlation consistent standard errors”)
Hansen-White standard errors are then used in hypothesis testing of the regression coefficients
LOS 5.l: Describe multicollinearity and explain its causes and effects in regression analysis
Multicollinearity
1.- What is multicollinearity: two or more independent variables (or linear combinations of the independent variables)
are highly correlated with each other
b. Standard errors are overestimated (too many Type II errors: greater probability that we will incorrectly conclude
that a variable is not statistically significant, i.e. failing to reject the null hypothesis when it is actually false)
It could be that there is no direct correlation between independent variables but there is correlation
between linear combinations of the independent variables, and therefore there is Multicollinearity
LOS 5.m: Describe how model misspecification affects the results of a regression analysis and describe how to avoid common forms of misspecification
MODEL SPECIFICATION
Regression model specification is the selection of the explanatory (independent) variables to be included in
the regression and the transformations (if any) of those explanatory variables
MODEL MISSPECIFICATION
Models with qualitative dependent variables (e.g. bankrupt versus non-bankrupt) require methods other than ordinary
least squares, such as probit, logit and discriminant analysis
The analysis of regression models with qualitative dependent variables is the same as we have been discussing (examine individual coefficients
using t-tests, determine the validity of the model with the F-test and $R^2$, and look out for heteroskedasticity, serial correlation and
multicollinearity)
The values of the slope coefficients suggest the economic meaning of the relationship between the independent and the
dependent variables.
(Each slope coefficient is the estimated change in the dependent variable for a one-unit change in the independent variable, holding the other
independent variables constant)
But it is important for the analyst to keep in mind that a regression may have statistical significance even when there is no
practical economic significance in the relationship
(e.g. a study of dividend announcements may identify a statistically significant abnormal return following the announcement, but these
returns may not be sufficient to cover transaction costs)