Lecture 2
Regression
• Multiple linear regression uses two or more independent variables to describe
the variation of the dependent variable rather than just one independent
variable, as in simple linear regression.
• It allows the analyst to estimate more complex models with multiple
explanatory variables and, if used correctly, may lead to better predictions,
better portfolio construction, or a better understanding of the drivers of
security returns.
• If used incorrectly, however, multiple linear regression may yield spurious
relationships, lead to poor predictions, and offer a poor understanding of
relationships.
• The analyst must first specify the model and make several decisions in this
process, answering the following, among other questions:
• What is the dependent variable of interest?
• What independent variables are important?
• What form should the model take?
• What is the goal of the model—prediction or understanding of the
relationship?
• The analyst specifies the dependent and independent variables and then
employs software to estimate the model and produce related statistics.
• The good news is that the software, such as shown in Exhibit 1, does the
estimation; our primary tasks are to focus on specifying the model and
interpreting the output from this software, which are the main subjects of
this content.
USES OF MULTIPLE LINEAR REGRESSION
• There are many investment problems in which the analyst needs to consider the
impact of multiple factors on the subject of research rather than a single factor.
• In the complex world of investments, it is intuitive that explaining or
forecasting a financial variable by a single factor may be insufficient.
• The complexity of financial and economic relations calls for models with
multiple explanatory variables, subject to fundamental justification and various
statistical tests.
Examples of how multiple regression may be used include the following:
The general form of the multiple linear regression model is

Yi = b0 + b1X1i + b2X2i + … + bkXki + εi,   i = 1, 2, …, n.

In this equation, the terms involving the k independent variables are the
deterministic part of the model, whereas the error term, εi, is the stochastic or
random part of the model. The model is estimated over n observations, where n
must be larger than k.
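Such a model is typically estimated by ordinary least squares. The following is a minimal sketch using NumPy with simulated data; the variable names, the choice of k = 2, and all numeric values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Simulated data: n observations, k = 2 hypothetical independent variables
rng = np.random.default_rng(0)
n, k = 60, 2
X = rng.normal(size=(n, k))
eps = rng.normal(scale=0.5, size=n)              # stochastic (error) part
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + eps    # deterministic part + error

# OLS: prepend an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(b)  # estimates of [b0, b1, b2]
```

With n = 60 well above k, the estimated coefficients land close to the true values 1.0, 2.0, and −3.0 used to simulate the data.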
• It is important to note that a slope coefficient in a multiple regression, known
as a partial regression coefficient or a partial slope coefficient, must be
interpreted with care.
• A partial regression coefficient, bj, describes the impact of that independent
variable on the dependent variable, holding all the other independent variables
constant.
• For example, in the multiple regression equation

Excess bond index returni = b0 − 5.0585 BYi − 2.1901 CSi + εi,

where the intercept b0 is estimated from the data:
2. The change in the bond index return for a given one-unit change in the
monthly government bond yield, BY, is –5.0585%, holding CS constant.
3. If the investment-grade credit spreads, CS, increase by one unit, the bond
index returns change by –2.1901%, holding BY constant.
4. For a month in which the change in the credit spreads is 0.001 and the change
in the government bond yields is 0.005, the expected excess return on the bond
index is
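The deterministic part of the prediction in item 4 can be computed directly from the slide's slope estimates. In the sketch below, the intercept b0 is a hypothetical placeholder set to zero, since the excerpt does not state its value; the slopes and the given changes in BY and CS are taken from the slide.

```python
# Deterministic part of the fitted model for the month in item 4.
b0 = 0.0                          # hypothetical intercept (not given in the excerpt)
b_BY, b_CS = -5.0585, -2.1901     # slope estimates from the slide
BY, CS = 0.005, 0.001             # changes given in the example

expected_excess_return = b0 + b_BY * BY + b_CS * CS
print(expected_excess_return)     # slope contribution is about -0.0275, plus b0
```

The slope contribution alone is −5.0585 × 0.005 + (−2.1901) × 0.001 ≈ −0.0275; the actual expected excess return shifts by whatever the estimated intercept turns out to be.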
ASSUMPTIONS UNDERLYING MULTIPLE LINEAR REGRESSION
• Akaike’s information criterion (AIC) allows comparison of models with the same
dependent variable, as follows:

AIC = n ln(SSE/n) + 2(k + 1).

AIC is a measure of model parsimony, so a lower AIC indicates a better-fitting
model. The term 2(k + 1) is the penalty assessed for adding independent variables to
the model.
In a similar manner, Schwarz’s Bayesian information criterion (BIC or SBC)
allows comparison of models with the same dependent variable, as follows:

BIC = n ln(SSE/n) + ln(n) × (k + 1).

Because ln(n) exceeds 2 for n ≥ 8, BIC imposes a heavier penalty than AIC for
adding independent variables, so it favors more parsimonious models.
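Both criteria are simple to compute once the sum of squared errors is known. The sketch below assumes the standard forms n ln(SSE/n) + 2(k + 1) for AIC and n ln(SSE/n) + ln(n)(k + 1) for BIC; the sample size and SSE values in the comparison are hypothetical.

```python
import math

def aic(n, k, sse):
    # AIC = n ln(SSE/n) + 2(k + 1); lower indicates a better-fitting model
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    # BIC = n ln(SSE/n) + ln(n)(k + 1); heavier penalty than AIC once n >= 8
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical comparison: adding a third variable lowers SSE slightly,
# but the penalty term can still favor the smaller model.
print(aic(60, 2, 1.40), aic(60, 3, 1.38))
print(bic(60, 2, 1.40), bic(60, 3, 1.38))
```

In this hypothetical case the small reduction in SSE does not offset the penalty, so both criteria prefer the two-variable model.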