CFA Level 2

Multiple Linear Regression Overview: Involves modeling the relationship between a dependent variable and two or more independent variables. Unlike simple linear regression, which involves only one independent variable, multiple linear regression can provide more complex and potentially more accurate models for prediction, portfolio construction, or understanding security returns. However, incorrect use can lead to misleading results and poor predictions.

Model Specification and Analysis Process:

The analyst defines the dependent variable and selects relevant independent variables.

Decisions include the model's functional form and its purpose (prediction or understanding a relationship).

Software tools are typically used for model estimation and producing statistics. Examples include
Excel, Python (with libraries like scipy.stats, statsmodels, sklearn), R, SAS, and STATA.

The main tasks involve specifying the model and interpreting the software output.
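For instance, here is a minimal sketch of the estimation step in Python with statsmodels (one of the libraries named above); the variable names and data are simulated placeholders, not from the curriculum:

```python
# Minimal multiple regression estimation sketch (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 60  # e.g., 60 monthly observations

# Two hypothetical independent variables and a dependent variable
X = pd.DataFrame({"X1": rng.normal(size=n), "X2": rng.normal(size=n)})
y = 0.5 + 1.2 * X["X1"] - 0.8 * X["X2"] + rng.normal(scale=0.3, size=n)

X = sm.add_constant(X)      # adds the intercept column b0
model = sm.OLS(y, X).fit()  # ordinary least squares estimation
print(model.summary())      # coefficients, t-stats, R², F-stat, AIC/BIC
```

The summary output contains most of the statistics discussed in the rest of these notes.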

Regression Equation: Represented as Yi = b0 + b1X1i + b2X2i + b3X3i + … + bkXki + εi for i = 1 to n.


Here, Y is the dependent variable, Xs are independent variables, b0 is the intercept, and b1 to bk are
slope coefficients, indicating the impact of each independent variable on Y.

Key Assumptions: There are five crucial assumptions in multiple regression: linearity,
homoskedasticity, independence of errors, normality, and independence of independent variables.
Diagnostic tools like scatterplots and residual plots help verify these assumptions.

Applications: Used in financial analysis for explaining relationships, testing theories, or forecasting.
The regression process requires careful selection of variables, model testing, examining fit, and
adjusting as necessary.

Purpose and Application of Multiple Linear Regression:

Addresses investment problems involving multiple factors rather than a single factor. In the complex
investment world, using multiple explanatory variables is essential for an accurate explanation or
forecast.

Used in various scenarios, such as portfolio managers understanding stock returns through factors
like size, value, profitability, and investment aggressiveness; financial advisers predicting financial
distress using variables like leverage and market share; analysts assessing the impact of country risk
factors on equity returns.

Regression Analysis Process:


The process begins with specifying the model, including selecting the dependent and independent
variables. These variables can be continuous (like returns) or discrete (like an indicator for a takeover
target).

Traditional regression models are used for continuous dependent variables, while logistic regression
is employed for discrete dependent variables.

The choice of independent variables may vary, ranging from financial metrics to categorical variables
like industry sectors.

After specification, the model is estimated and analyzed to check that it meets the underlying assumptions and goodness-of-fit criteria.

The final model, once tested and validated, can be used for identifying relationships, testing theories,
or forecasting.

Differentiating Between Continuous and Discrete Dependent Variables:

The approach varies based on the nature of the dependent variable. Continuous variables typically
use traditional regression models, while discrete variables may require logistic regression.

Importance in Investment Analysis:

Multiple regression is crucial for a nuanced understanding of financial relationships and forecasting
in the complex investment landscape. It's a key tool in testing theories and identifying significant
relationships between various financial variables.

Objective of Multiple Linear Regression:

The goal is to explain the variation of a dependent variable (Y) using the variations in a set of
independent variables (X1, X2, ..., Xk). This is an extension of simple regression, which uses only one
independent variable (X) to explain the variation in Y.

Regression Equations:

Simple regression equation: Yi = b0 + b1Xi + εi (for i = 1 to n).

Multiple regression equation: Yi = b0 + b1X1i + b2X2i + b3X3i + … + bkXki + εi (for i = 1 to n).

In multiple regression, terms with independent variables (X1, X2, ..., Xk) form the deterministic part
of the model, while the εi term represents the stochastic or random part.

Interpretation of Coefficients:

Slope coefficients (b1, b2, ..., bk) must be interpreted carefully. Each coefficient bj is a partial slope: it measures the impact on Y of a one-unit change in Xj, holding all other independent variables constant.

Example: b2 measures the change in Y for a one-unit change in X2, with the other independent variables held constant.

Practical Example:

A regression model for monthly excess returns of a bond index (RET) against changes in government
bond yields (BY) and investment-grade credit spreads (CS).

Regression equation: RET = 0.0023 − 5.0585BY − 2.1901CS, based on 60 monthly observations.

Interpretations:

If the changes in BY and CS are both zero, the expected monthly excess return on the bond index is the intercept, 0.0023, or 0.23% (about 2.8% per year).

A one-unit increase in BY decreases RET by 5.0585 (holding CS constant); with both variables in decimal form, a 0.01 (one percentage point) rise in government bond yields lowers the excess return by about 5.06%, indicating an empirical duration of 5.0585 for the bond index.

A one-unit increase in CS decreases RET by 2.1901 (holding BY constant).

Example calculation: For changes of 0.005 in BY and 0.001 in CS, the expected excess return on the bond index is 0.0023 − 5.0585(0.005) − 2.1901(0.001) = −0.0252, or −2.52%.
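This arithmetic as a quick sketch, using the coefficients from the text:

```python
# Bond index example: expected excess return for given factor changes.
b0, b_by, b_cs = 0.0023, -5.0585, -2.1901
d_by, d_cs = 0.005, 0.001  # assumed changes in bond yields and credit spreads

ret = b0 + b_by * d_by + b_cs * d_cs
print(round(ret, 4))  # -0.0252, i.e., an expected excess return of about -2.52%
```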

Summary:

Assumptions of Multiple Linear Regression:

Linearity: The relationship between the dependent variable and the independent variables is linear.

Homoskedasticity: The variance of the regression residuals is constant across all observations.

Independence of Errors: Observations are independent, implying uncorrelated regression residuals.

Normality: Regression residuals should follow a normal distribution.

Independence of Independent Variables: These variables should not be random and should not have
an exact linear relationship among themselves.

Importance of Assumptions:

These assumptions are critical for valid statistical inferences using ordinary least squares (OLS) in
multiple linear regression. Violations can be detected using diagnostic tools and should be addressed
to ensure accurate model interpretation.

Model Estimation and Residual Analysis:

The model is expressed as Yi = b0 + b1X1i + b2X2i + b3X3i + … + bkXki + εi, estimated over n
observations.
Regression software can be used for estimation, producing residual plots to check for assumption
violations.

Example regression: ABC stock's monthly excess returns analyzed using the Fama–French three-factor model.

Using Diagnostic Plots:

Scatterplot Matrix: Helps visualize relationships between variables and detect outliers.

Residual vs. Predicted Value Plot: Checks for homoskedasticity and independence of errors. Outliers
can be identified here.

Residuals vs. Factors Plot: Assesses potential assumption violations in relation to independent
variables.

Normal Q-Q Plot: Compares the distribution of residuals against a normal distribution, checking for
normality. Outliers are identified if they deviate significantly from the diagonal line.

Practical Application:

Using Python and R code, the model can be estimated and residual plots generated for analysis.
Outliers and assumption violations can be identified and addressed to ensure the model's accuracy.
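A sketch of two of these diagnostic plots in Python, assuming `model` is a fitted statsmodels OLS result as in the earlier snippet and matplotlib is available:

```python
# Residual diagnostics sketch: residuals vs. predicted values and a normal Q-Q plot.
import matplotlib.pyplot as plt
import statsmodels.api as sm

resid = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. predicted values: a funnel shape suggests heteroskedasticity;
# a visible pattern suggests correlated errors; isolated points are outliers.
axes[0].scatter(fitted, resid)
axes[0].axhline(0, linestyle="--")
axes[0].set_xlabel("Predicted value")
axes[0].set_ylabel("Residual")

# Normal Q-Q plot: residuals far from the 45-degree line indicate non-normality.
sm.qqplot(resid, line="45", fit=True, ax=axes[1])
plt.tight_layout()
plt.show()
```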

Chapter 2

Goodness of Fit in Multiple Regression:

R-squared (R²): Measures how much variation in the dependent variable is explained by the
independent variables. In multiple regression, R² always increases or remains the same with the
addition of more independent variables.

Limitations of R²: It does not indicate statistical significance of coefficients, biases in estimates, or
quality of model fit. High R² can be misleading due to overfitting.

Adjusted R-squared (Adjusted R²):

Adjusts R² for the number of independent variables relative to the number of observations,
preventing automatic increase with the addition of variables.

Calculated as Adjusted R² = 1 - [(Sum of squares error/(n-k-1)) / (Sum of squares total/(n-1))].

Increases when a newly added variable's coefficient has an absolute t-statistic greater than 1.0, and decreases otherwise.

Overfitting: Occurs when the model is too complex for the data, possibly leading to unreliable coefficients.

ANOVA (Analysis of Variance):

Used to evaluate the model's fit, comparing the sum of squares due to regression with the total sum
of squares.

ANOVA table shows degrees of freedom, sum of squares, mean squares, F-statistic, and significance
of F-statistic.

Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC):

AIC and BIC help compare model quality, assessing model parsimony.

AIC = n * ln(Sum of squares error/n) + 2 * (k + 1), where n is the sample size and k is the number of
independent variables.

BIC = n * ln(Sum of squares error/n) + ln(n) * (k + 1), placing a higher penalty for model complexity.

AIC is preferred when the model is used for prediction; BIC is preferred when the goal is evaluating goodness of fit.
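As a sketch, all three measures can be computed directly from the formulas above; the SSE and SST values below are hypothetical:

```python
# Goodness-of-fit measures from their definitions.
import math

def adjusted_r2(sse, sst, n, k):
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def aic(sse, n, k):
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(sse, n, k):
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical model: n = 60 observations, k = 3 independent variables.
print(adjusted_r2(sse=8.0, sst=20.0, n=60, k=3))  # ~0.579
print(aic(sse=8.0, n=60, k=3), bic(sse=8.0, n=60, k=3))  # lower is better
```

Note that statsmodels reports AIC and BIC from the full log-likelihood, so its values differ from these SSE-based versions by a constant that depends only on n; model rankings on the same data are unaffected.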

Practical Application:

Different models can be evaluated for their explanatory power using these statistical measures. For
example, comparing models with varying numbers of factors (like in portfolio excess returns
regression) to determine the best model based on R², Adjusted R², AIC, and BIC.

For investment decisions, these metrics guide the selection of the most appropriate model – either
focusing on prediction accuracy (AIC) or simplicity and fit (BIC).

Interpreting Goodness-of-Fit Statistics:

In the given research scenario, the portfolio manager must evaluate models based on R², Adjusted R², AIC, and BIC to determine which model (CAPEX only vs. CAPEX and ADV) provides a better fit or is more appropriate for the research objective.

Interpretation of Coefficients in Multiple Regression:

Intercept: Expected value of the dependent variable when all independent variables are zero.

Slope Coefficients: Expected change in the dependent variable for a one-unit change in a given
independent variable, with all other variables held constant.

Testing Individual Coefficients:

Conducted with t-tests, as in simple regression. The test statistic is t = (b̂j − Bj)/sb̂j, where sb̂j is the standard error of the estimated coefficient, with n − k − 1 degrees of freedom.

Null hypothesis (H0): bj = Bj, where bj is the coefficient of the jth variable and Bj is the hypothesized value.

Alternative hypothesis (Ha): bj ≠ Bj.

For significance testing, typically H0: bj = 0 and Ha: bj ≠ 0.
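A sketch of this test with hypothetical estimates, using scipy.stats (named earlier) for the p-value:

```python
# t-test for a single slope coefficient (hypothetical estimates).
from scipy import stats

b_hat, se_b = 0.85, 0.32  # estimated coefficient and its standard error
B_null = 0.0              # hypothesized value (here, a significance test)
n, k = 60, 3              # observations and independent variables

t_stat = (b_hat - B_null) / se_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - k - 1)  # two-tailed
print(t_stat, p_value)  # reject H0 at 5% if p_value < 0.05
```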

Joint Hypothesis Testing:

Used to test the significance of a subset of variables in a multiple regression model.

Compares an unrestricted model (with all variables) to a restricted model (with fewer variables).

Null hypothesis: The coefficients of the variables excluded in the restricted model are zero.

F-test is used to compare the two models.

Comparing Nested Models:

Uses the F-statistic: F = [(Sum of squares error of restricted model − Sum of squares error of unrestricted model) / q] / [Sum of squares error of unrestricted model / (n − k − 1)].

q is the number of restrictions (number of variables omitted in the restricted model).

Example Application:

Testing whether two additional factors in a five-factor model (Factors 4 and 5) are necessary in
explaining portfolio excess returns. An F-test compares the restricted model (using only Factors 1, 2,
and 3) against the unrestricted model (using all five factors).

The null hypothesis that Factors 4 and 5 are not significant is not rejected if the F-statistic is below a
critical value.
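A sketch of this nested-model comparison with hypothetical sums of squares; the degrees of freedom match the five-factor example (q = 2 omitted factors):

```python
# Nested-model F-test: restricted (3 factors) vs. unrestricted (5 factors).
from scipy import stats

sse_restricted, sse_unrestricted = 12.5, 10.0  # hypothetical SSEs
n, k, q = 60, 5, 2  # k: variables in the unrestricted model; q: restrictions

f_stat = ((sse_restricted - sse_unrestricted) / q) / (
    sse_unrestricted / (n - k - 1)
)
f_crit = stats.f.ppf(0.95, dfn=q, dfd=n - k - 1)  # 5% critical value
print(f_stat > f_crit)  # True: reject H0 that the omitted coefficients are zero
```

The overall goodness-of-fit test described next is the special case in which the restricted model contains only the intercept, so q = k.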

Goodness-of-Fit Test:

Tests the significance of the entire regression equation.

Null hypothesis: All slope coefficients are zero.

F-statistic: Mean square regression (MSR) divided by mean square error (MSE).

Degrees of freedom: k in the numerator (number of independent variables) and n-k-1 in the
denominator.

Evaluating Model Fit:

Adjusted R², AIC, BIC, t-statistics, and F-tests are used to assess model fit.

No single metric is definitive; a combination is often used to determine the best model.

Selecting the Best Model for ROA Analysis:


Various models are estimated to identify key drivers of Return on Assets (ROA) for a sample of
diversified manufacturers.

The models include different combinations of capital expenditures (CAPEX), advertising expenditures
(ADV), and R&D spending.

The best model is chosen based on a balance of explanatory power (R², Adjusted R²), model
parsimony (AIC, BIC), and significance of the variables (t-tests, F-tests).

Recommendation and Justification:

Based on the given statistics, the recommended model is selected by considering the trade-off
between explanatory power and model simplicity. The decision is made by comparing R², Adjusted
R², AIC, and BIC across all potential models and considering the significance of individual variables
and combinations thereof.

Forecasting with Multiple Regression:

Predicting the Dependent Variable: The forecasted value of the dependent variable (Ŷf) in multiple regression is obtained by summing the product of each estimated slope coefficient (b̂j) with the assumed value of its corresponding independent variable (Xjf), and then adding the estimated intercept (b̂0).

Formula: Ŷf = b̂0 + b̂1X1f + b̂2X2f + ... + b̂kXkf = b̂0 + Σ(j=1 to k) b̂jXjf.

Example: Using a five-factor model for portfolio returns, the forecasted return is calculated by
plugging in assumed values for each of the factors into the regression equation.
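A sketch of this calculation with hypothetical coefficient estimates and assumed factor values:

```python
# Point forecast from a five-factor model (hypothetical values).
import numpy as np

b0 = 0.0012                                        # estimated intercept
b = np.array([0.8, 0.3, -0.2, 0.1, 0.05])          # estimated slopes b1..b5
x_f = np.array([0.01, 0.002, -0.005, 0.0, 0.003])  # assumed factor values

y_hat = b0 + b @ x_f  # b0 + sum over j of bj * Xjf
print(y_hat)
```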

Cautions in Prediction:

Inclusion of All Variables: If the regression model includes all five independent variables, predictions
must also consider all of them, even if some are not statistically significant. This is because their
correlations were considered in estimating the coefficients.

Inclusion of Intercept: The intercept term must always be included in the prediction.

Uncertainty and Error: Predictions are subject to model error (residuals) and, if using forecasted
independent variables, additional error from inaccuracies in these forecasts.

Standard Error and Prediction Interval:

Model Error: The basic uncertainty in the model, represented by residuals.

Sampling Error: Arises when independent variables are themselves forecasts, adding to the forecast
error.

Resulting Error: The combination of model error and sampling error leads to a larger standard error of the forecast, widening the prediction interval.

Software Use: Software is typically used to calculate the standard error of the forecast and the forecast interval. With the five-factor model, for example, the software reports the point estimate, its standard error, and the confidence bounds.
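A sketch with statsmodels, assuming `model` is the fitted OLS result from the first snippet (intercept plus X1 and X2); get_prediction returns the point estimate, its standard error, and interval bounds:

```python
# Forecast with a 95% prediction interval (hypothetical new observation).
import pandas as pd

x_new = pd.DataFrame({"const": [1.0], "X1": [0.5], "X2": [-0.2]})
pred = model.get_prediction(x_new)

frame = pred.summary_frame(alpha=0.05)
print(frame[["mean", "mean_se", "obs_ci_lower", "obs_ci_upper"]])
```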

Practical Application:

In practice, predictions made using multiple regression must carefully consider the accuracy of the
independent variable values used and acknowledge the inherent uncertainties in the model and the
external forecasts. The calculation of the forecast interval is complex and generally handled by
statistical software, providing a more comprehensive picture of the potential range of the forecasted
dependent variable.
